In-memory fuzzing in Linux (with GDB and Python)

Probably, if you’re reading this article, you already know what fuzzing means. In short, fuzz testing is a technique for testing software and searching for vulnerabilities: the targeted software is fed malformed input, in the hope that something abnormal will occur.

In recent years many excellent frameworks have been published to help testers develop effective fuzzers (I like SPIKE, Peach and Sulley very much). But, as always, there are experimental techniques that will drive the future evolution of the field: in this case I’m talking about in-memory fuzzing.

In-memory fuzzing is an advanced, and therefore more complex, technique (but we’ll see how to manage this complexity) that allows the tester to fuzz individual subroutines of the targeted program. This focused type of test has many advantages:

  • The fuzzing process is faster and gives more thorough coverage of the code under test, since you select exactly the piece of code to test.
  • Targeting a specific program subroutine makes it possible to bypass any obfuscation or decoding of the input data, which makes the fuzzing tools simpler to develop.

But how does an in-memory fuzzer work?

If we consider an application as a chain of functions that receives an input, parses and processes it, and produces an output, we can describe in-memory fuzzing as a process that tests only a few specific links of the chain (those dealing with parsing and processing).

Having clarified this point, it is now easy to illustrate the main techniques used today:

Mutation Loop Insertion

Mutation Loop Insertion (MLI) modifies the target program by inserting an infinite loop around the parsing subroutine, isolating it from the rest of the function chain.

This loop can test the targeted function with a large number of inputs in a very short time, skipping the superfluous parts of the program’s code, and requires no interaction from the outside. This makes it the fastest method.

However, it also has drawbacks, being the more difficult method to implement: it requires at least some knowledge of reverse engineering and the ability to write code that can be safely injected into a running process.
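
To make the idea concrete, here is a toy model of a mutation loop in plain Python. Nothing is injected into a real process: decode, parse and handle_request are hypothetical stand-ins for the function chain, and BUF_SIZE is an invented buffer size. The only point is that the loop drives parse directly, so the fuzzer never has to produce correctly encoded input.

```python
import base64

BUF_SIZE = 16  # invented size of the parser's fixed buffer


def decode(raw: bytes) -> bytes:
    """Input decoding step: the outside world sends base64."""
    return base64.b64decode(raw)


def parse(address: bytes) -> bytes:
    """Extract the domain; overly long domains overflow the pretend buffer."""
    _, _, domain = address.partition(b"@")
    if len(domain) >= BUF_SIZE:
        raise OverflowError("domain does not fit in %d bytes" % BUF_SIZE)
    return domain


def handle_request(raw: bytes) -> bytes:
    """The whole chain, as the program would normally run it."""
    return parse(decode(raw))


def mutation_loop(seed: bytes, max_iterations: int = 1000):
    """Call parse() over and over, skipping the rest of the chain.

    Returns (iteration, input) for the first failing input, or None.
    """
    data = seed
    for iteration in range(1, max_iterations + 1):
        try:
            parse(data)
        except OverflowError:
            return iteration, data
        data += b"A"  # mutation step: grow the input by one byte
    return None
```

In a real MLI fuzzer the loop body would be machine code living inside the target, but the control flow is the same: mutate, call the subroutine, repeat.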

Snapshot Restoration Mutation

Snapshot Restoration Mutation (SRM) handles the program’s function chain in a different way: no code is injected, but, through the use of breakpoints, the fuzzer takes a snapshot of the process at the beginning of the tested function and restores it at the end.

The effect is similar to that of an infinite loop, as in the previous case, but managed and monitored by an external process.

This method has several advantages, including the ability to restore the program to a clean state, and, of course, it does not require writing assembly code. This comes at the cost of some performance degradation.
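
A rough way to picture snapshot restoration without a debugger is fork(): every child is a copy of the process frozen at the same point, so each test case starts from a clean state and a crash cannot damage the parent. The sketch below only models that idea on a POSIX system; target, BUF_SIZE and the simulated crash are assumptions made up for the example, not part of any real fuzzer.

```python
import os
import signal

BUF_SIZE = 16  # invented size of the target's buffer


def target(data: bytes) -> None:
    """Stand-in for the tested subroutine: 'crashes' on oversized input."""
    if len(data) > BUF_SIZE:
        # Simulate memory corruption by killing the current process.
        os.kill(os.getpid(), signal.SIGSEGV)


def run_from_snapshot(data: bytes) -> bool:
    """Run target() in a forked copy of this process; True if it crashed."""
    pid = os.fork()              # the child plays the restored snapshot
    if pid == 0:
        target(data)
        os._exit(0)              # clean return: exit without side effects
    _, status = os.waitpid(pid, 0)
    return os.WIFSIGNALED(status)


def fuzz(seed: bytes, max_iterations: int = 100):
    """Grow the seed one byte per loop until a snapshot copy crashes."""
    data = seed
    for iteration in range(1, max_iterations + 1):
        if run_from_snapshot(data):
            return iteration, data
        data += b"A"
    return None
```

The price is visible here too: one fork() and one waitpid() per test case, which is where the performance degradation mentioned above comes from.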

Implementation

OK, on to the practical part. Let’s start with a question: did you know that, since version 7.0, GDB can be scripted in Python?

Yeah, I agree, it’s a wonderful thing: although development is not yet complete at the time of writing, with some adjustments it is possible to exploit all the capabilities of the (u)nix debugger par excellence from within our Python scripts.

Moreover, since version 7.0, GDB is able to take and restore snapshots of the debugged process, with the introduction of the checkpoint command.

Having found such a comfortable “framework” (which, IMHO, has a good chance of surpassing similar solutions available in the Windows environment, excellent as they are, such as PyDbg), I could not resist the temptation to write an in-memory fuzzer implementation for (u)nix environments.

The result was the creation of a small library to support GDB python scripting, with the fuzzer inside the examples directory. Let’s look at its practical use…

 In-memory fuzzing (in practice)

The fuzzer is composed of two scripts: the first, in-memory-break.py, is used to find functions to test.

The script inserts breakpoints at the beginning of every function of the program and prints out their arguments in search of text strings. The goal is to find the function that parses the input.
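
in-memory-break.py itself is not reproduced here; what follows is only a sketch of the argument-decoding idea. It assumes the System V AMD64 calling convention (the first six integer or pointer arguments in rdi, rsi, rdx, rcx, r8, r9) and takes the register values and a memory reader as plain Python objects. Inside GDB these could be backed by gdb.parse_and_eval and gdb.selected_inferior().read_memory; the helper names below are my own.

```python
import string

# Integer/pointer argument registers, in order (System V AMD64 ABI).
ARG_REGISTERS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")
PRINTABLE = set(string.printable.encode("ascii"))


def read_c_string(read_memory, address, max_len=64):
    """Return the printable NUL-terminated string at address, or None."""
    try:
        raw = bytes(read_memory(address, max_len))
    except Exception:            # unreadable address: not a usable pointer
        return None
    text = raw.split(b"\x00", 1)[0]
    if text and all(byte in PRINTABLE for byte in text):
        return text.decode("ascii")
    return None


def describe_arguments(registers, read_memory):
    """Format the six register arguments, appending any string found."""
    lines = []
    for index, name in enumerate(ARG_REGISTERS):
        value = registers[name]
        text = read_c_string(read_memory, value)
        suffix = ' "%s"' % text if text is not None else ""
        lines.append("argument%d = %d%s" % (index, value, suffix))
    return lines
```

Every value is treated as a possible pointer, so small integers and stale registers simply fail the read or the printable check and are shown as plain numbers.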

Let’s try to run the script against the vulnerable program contained in the same directory:

cross@yotsubox$ ./in-memory-break.py getdomain test@email.com
Breakpoint 1 at 0x4004b0
Breakpoint 2 at 0x4004d8
Breakpoint 3 at 0x4004e8
Breakpoint 4 at 0x4004f8
Breakpoint 5 at 0x400508
Breakpoint 6 at 0x400518
Breakpoint 7 at 0x400528
Breakpoint 8 at 0x40056c
Breakpoint 9 at 0x400590
Breakpoint 10 at 0x400600
Breakpoint 11 at 0x40062c
Breakpoint 12 at 0x40064d
Breakpoint 13 at 0x4007f0

Function <__libc_start_main@plt> at *0x4004f8:
	argument0 = 4196079 "UH"
	argument1 = 2
	argument2 = 140737488348088 "L"
	argument3 = 4196192 "H"
	argument4 = 4196176
	argument5 = 140737351962048 "UH"

Function <_init> at *0x4004b0:
	argument0 = 2
	argument1 = 140737488348088 "L"
	argument2 = 140737488348112
	argument3 = 0
	argument4 = 140737351885568
	argument5 = 140737351962048 "UH"

Function <call_gmon_start> at *0x40056c:
	argument0 = 2
	argument1 = 140737488348088 "L"
	argument2 = 140737488348112
	argument3 = 0
	argument4 = 140737351885568
	argument5 = 140737351962048 "UH"

Function <frame_dummy> at *0x400600:
	argument0 = 2
	argument1 = 140737488348088 "L"
	argument2 = 140737488348112
	argument3 = 0
	argument4 = 140737351885568
	argument5 = 140737351962048 "UH"

Function <__do_global_ctors_aux> at *0x4007f0:
	argument0 = 2
	argument1 = 140737488348088 "L"
	argument2 = 140737488348112
	argument3 = 0
	argument4 = 140737351885568
	argument5 = 140737351962048 "UH"

Function <strdup@plt> at *0x400508:
	argument0 = 140737488348816 "test@email.com"
	argument1 = 140737488348088 "L"
	argument2 = 140737488348112
	argument3 = 0
	argument4 = 140737351885568
	argument5 = 140737351962048 "UH"

Function <parse> at *0x40064d:
	argument0 = 6295568 "test@email.com"
	argument1 = 140737488348831 "SSH_AGENT_PID=2952"
	argument2 = 0
	argument3 = 30803244232763745
	argument4 = 140737351888448
	argument5 = 140737348377640

Function <strtok@plt> at *0x400528:
	argument0 = 6295568 "test@email.com"
	argument1 = 4196426 "@"
	argument2 = 0
	argument3 = 30803244232763745
	argument4 = 140737351888448
	argument5 = 140737348377640

Function <strtok@plt> at *0x400528:
	argument0 = 0
	argument1 = 4196426 "@"
	argument2 = 6295573 "email.com"
	argument3 = 6295573 "email.com"
	argument4 = 6295568 "test"
	argument5 = 140737348377640

Function <strcpy@plt> at *0x400518:
	argument0 = 140737488346768
	argument1 = 6295573 "email.com"
	argument2 = 6295582
	argument3 = 6295583
	argument4 = 0
	argument5 = 140737348377640

Function <print_domain> at *0x40062c:
	argument0 = 140737488346768 "email.com"
	argument1 = 6295584
	argument2 = 140737488346777
	argument3 = 0
	argument4 = -72340172838076673
	argument5 = -72219847665292440

Function <printf@plt> at *0x4004d8:
	argument0 = 4196412 "Domain is %s\n"
	argument1 = 140737488346768 "email.com"
	argument2 = 140737488346777
	argument3 = 0
	argument4 = -72340172838076673
	argument5 = -72219847665292440

Domain is email.com
Function <printf@plt> at *0x4004d8:
	argument0 = 4196463 "Domain is valid? %s\n"
	argument1 = 4196428 "YES"
	argument2 = 140737351888368
	argument3 = 4196425
	argument4 = 1
	argument5 = 4196425

Domain is valid? YES
Function <__do_global_dtors_aux> at *0x400590:
	argument0 = 140737488347632 "("
	argument1 = 140737488347632 "("
	argument2 = 140737354127792
	argument3 = 4
	argument4 = 0
	argument5 = 4

[Inferior 1 (process 18083) exited normally]

The parsing function is the one named parse, which receives the input string as its first argument. This case was simple because the binary was not stripped, which made it possible to print the function names. But even with stripped binaries it’s possible to find the function we are interested in by analyzing and trying to decode the arguments.
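
With longer traces, scanning the output by eye gets tedious. A small filter like the following can list the functions that received the test input. It is a sketch that assumes exactly the output layout shown above ("Function <name> at *address:" followed by "argumentN = value" lines); the function name functions_receiving is hypothetical.

```python
import re

HEADER = re.compile(r"^Function <(?P<name>[^>]+)> at \*(?P<addr>0x[0-9a-f]+):$")
ARGUMENT = re.compile(
    r'^\s*argument(?P<index>\d+) = (?P<value>-?\d+)(?: "(?P<text>.*)")?$'
)


def functions_receiving(output: str, needle: str):
    """Return (function, address, argument index) for each decoded string
    argument that contains needle."""
    hits = []
    current = None
    for line in output.splitlines():
        header = HEADER.match(line)
        if header:
            current = (header.group("name"), header.group("addr"))
            continue
        argument = ARGUMENT.match(line)
        if current is None or argument is None:
            continue
        text = argument.group("text")
        if text is not None and needle in text:
            hits.append((current[0], current[1], int(argument.group("index"))))
    return hits
```

Searching for the whole input ("test@email.com") narrows the list down to the functions that see it before it is split; searching for a fragment such as "email.com" shows where the parsed pieces travel afterwards.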

The vulnerable program simply tries to extract the domain from an email address: an overly long domain triggers a buffer overflow on the stack.

Now that we know the function to test and the input type, let’s try to see if our in-memory fuzzer is able to detect the bug:

cross@yotsubox$ ./in-memory-fuzz.py parse getdomain test@email.com
Breakpoint 1 at 0x400651

Breakpoint 1, 0x0000000000400651 in parse ()
fuzz loop: 1
string len: 15
0x601030:	 "test@email.com"
Domain is email.com
0x000000000040072f in main ()
Switching to process 4997
#0  0x0000000000400651 in parse ()
fuzz loop: 2
string len: 16
0x601030:	 "test@email.comA"
Domain is email.comA
0x000000000040072f in main ()
Switching to process 4998
#0  0x0000000000400651 in parse ()
fuzz loop: 3
string len: 17
0x601030:	 "test@email.comAA"
Domain is email.comAA
0x000000000040072f in main ()
Switching to process 4999
...
...
...
#0  0x0000000000400651 in parse ()
fuzz loop: 1031
string len: 1045
0x601030:	 "test@email.com", 'A' <repeats 186 times>...
Domain is email.comAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
0x000000000040072f in main ()
Switching to process 7202
#0 0x0000000000400651 in parse ()
fuzz loop: 1032
string len: 1046
0x601030:	 "test@email.com", 'A' <repeats 186 times>...
Domain is email.comAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Domain is valid? YES

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400849 in ?? ()
Switching to process 7203
#0  0x0000000000400651 in parse ()
fuzz loop: 1033
string len: 1047
0x601030:	 "test@email.com", 'A' <repeats 186 times>...
[Switching to process 7204]

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.

#
# The program has crashed! Stack exhaustion or bug???
# Now is your turn, have fun! :P 
#

A debugging session is active.

	Inferior 1 [process 7204] will be killed.

Quit anyway? (y or n) y

As you can see, fuzzing loop number 1032 triggered the bug: the input strings are generated simply by appending ‘A’ characters to a valid email address, and when the domain part reaches a critical length, a buffer on the stack overflows, overwriting the return address (and the stack canaries).

Although this script is very simple, it’s a basis on which to build more complex fuzzers. It also illustrates several GDB features that are very useful in this field:

  • Allocation of memory on the target process
  • Snapshots/checkpoints and their restoration
  • Breakpoint management
  • Argument analysis

and so on…
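
The input generation used in the session above (a valid address followed by a growing run of ‘A’ characters) is easy to reproduce outside GDB. The sketch below is my own reconstruction from the log, not code taken from in-memory-fuzz.py; it assumes that the "string len" values in the log count the terminating NUL byte, which is what the numbers suggest.

```python
from itertools import count, islice


def growing_inputs(seed: str, filler: str = "A"):
    """Yield (loop number, candidate): the seed, then seed + 1, 2, ... fillers."""
    for loop in count(1):
        yield loop, seed + filler * (loop - 1)


def c_string_size(text: str) -> int:
    """Bytes needed to store text as a C string, terminator included."""
    return len(text.encode("ascii")) + 1


def first_inputs(seed: str, how_many: int):
    """Convenience wrapper: the first how_many candidates as a dict."""
    return dict(islice(growing_inputs(seed), how_many))
```

A more serious fuzzer would swap growing_inputs for a smarter mutation strategy, but the interface (a stream of numbered candidates) can stay the same.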

If you are interested in GDB Python scripting or in-memory fuzzing on (u)nix systems, you can visit the project website:

GDB Python Utils:  https://github.com/crossbowerbt/GDB-Python-Utils/

You will find the scripts illustrated in this article under the examples directory of the project.

I’m also trying to maintain good documentation for the support library, so you may want to take a look at the “Snippets” page of the wiki (https://github.com/crossbowerbt/GDB-Python-Utils/wiki/Snippets/) to see the implemented features.

PS: the fuzzer was specifically developed for 64-bit systems: if you want to use it against 32-bit applications you must adapt it (if you send me an email I can give you a few hints…)

Fun with HexInject and USB protocols

Did you know that the pcap libraries (http://www.tcpdump.org/) can capture raw USB traffic?

I had noticed the various USB interfaces in Wireshark several times, but until now I had never tried to play with them.

Similar interfaces should appear on your system. If not, you can refer to this guide: http://wiki.wireshark.org/CaptureSetup/USB

In this short post I just want to talk about a simple experiment I did with hexinject and awk: the recognition of mouse clicks.

The first thing to do is to find the bus the mouse is connected to. I’m sure there are more elegant ways to do it, but I just looked in Wireshark for the interface that receives packets when the mouse is moved. In my case, it’s USB bus 3 (usbmon3).

Then we can sniff on this interface while performing various actions with the mouse, to see if we can understand at least part of the protocol used.

Captured data in the case of a left mouse click:

80 3A DF 2A 01 88 FF FF 43 01 81 02 03 00 2D 00 8D 43 E7 4D 00 00 00 00 AA 38 00 00 00 00 00 00 06 00 00 00 06 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 04 02 00 00 00 00 00 00 01 00 00 00 00 00
80 3A DF 2A 01 88 FF FF 53 01 81 02 03 00 2D 3C 8D 43 E7 4D 00 00 00 00 BD 38 00 00 8D FF FF FF 06 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 04 02 00 00 00 00 00 00

Captured data in the case of a right mouse click:

80 3A DF 2A 01 88 FF FF 43 01 81 02 03 00 2D 00 AB 43 E7 4D 00 00 00 00 A2 22 03 00 00 00 00 00 06 00 00 00 06 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 04 02 00 00 00 00 00 00 02 00 00 00 00 00
80 3A DF 2A 01 88 FF FF 53 01 81 02 03 00 2D 3C AB 43 E7 4D 00 00 00 00 B4 22 03 00 8D FF FF FF 06 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 04 02 00 00 00 00 00 00

The first dumped line is generated by the mouse, the second is the system acknowledgment. The byte that identifies the pressed button is the 65th of the first line (01 for the left button, 02 for the right one). The sequence 06 00 00 00 06 00 starting at byte 33, together with the five trailing 00 bytes, is what the script below uses to recognize a button action as opposed to a mouse movement.
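
If you want to look a bit further into the dump, the first 64 bytes appear to match the binary header that the Linux kernel documents for usbmon (struct usbmon_packet). The sketch below decodes the first 40 bytes of that header under this assumption, with little-endian fields as on x86; the field names follow the kernel documentation, while the Python names are mine.

```python
import struct
from collections import namedtuple

# First 40 bytes of the 64-byte usbmon binary header, little-endian.
HEADER = struct.Struct("<QcBBBHccqiiII")
HEADER_SIZE = 64  # the remaining 24 bytes are not decoded in this sketch

UsbmonHeader = namedtuple(
    "UsbmonHeader",
    "urb_id type xfer_type epnum devnum busnum flag_setup flag_data "
    "ts_sec ts_usec status length len_cap",
)


def parse_dump_line(line: str):
    """Split one hexinject dump line into (header, payload bytes)."""
    raw = bytes.fromhex(line.strip())
    header = UsbmonHeader._make(HEADER.unpack_from(raw))
    return header, raw[HEADER_SIZE:]
```

Read this way, the first line of each pair has type 'C' (a completed transfer) with 6 payload bytes, and the second has type 'S' (a new submission) with no payload; bus number 3 matches usbmon3, and the button code is simply the first payload byte.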

Using this information it’s very easy to write an awk script that can tell us the type of action performed:

#
# Analyze USB mouse protocol
# and print button actions
#
# use with:
#   source_program | awk --enable-switch -f mouse_click.awk
# or sometimes just:
#   source_program | gawk -f mouse_click.awk
#

/06 00 00 00 06 00 .+ 0[0-9] 00 00 00 00 00$/ {

    # button code check
    switch ($65) {
        case "00": print "click released";     break;
        case "01": print "left click";         break;
        case "02": print "right click";        break;
        case "03": print "left+right click";   break;
        case "04": print "central click";      break;
        default:   print "code " $65 " click"; break;
    }

}

Let’s try it:

$ sudo hexinject -s -i usbmon3 | awk -f mouse_click.awk
left click
click released
central click
click released
left+right click
click released
...
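
If your awk does not support switch, the same filter can be sketched in Python. This is a straight port of the awk script above (same pattern, same 65th field), offered as an alternative rather than as part of hexinject; the names classify and main are mine.

```python
import re

# Same pattern as the awk script: button events only.
EVENT = re.compile(r"06 00 00 00 06 00 .+ 0[0-9] 00 00 00 00 00$")

BUTTONS = {
    "00": "click released",
    "01": "left click",
    "02": "right click",
    "03": "left+right click",
    "04": "central click",
}


def classify(line: str):
    """Return the action described by one dump line, or None."""
    line = line.strip()
    if not EVENT.search(line):
        return None
    fields = line.split()
    if len(fields) < 65:
        return None
    code = fields[64]            # awk's $65 (fields are 1-based there)
    return BUTTONS.get(code, "code %s click" % code)


def main(stream):
    """Print one action per matching line read from stream."""
    for line in stream:
        action = classify(line)
        if action is not None:
            print(action, flush=True)
```

To use it as a drop-in replacement, call main(sys.stdin) at the bottom of the file (after importing sys) and pipe hexinject into it exactly as with awk.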

This successful experiment demonstrates the extreme versatility of the “Data Oriented” approach used by hexinject. In the future I hope to dig deeper into the USB protocol and maybe write a post that uses hexinject in USB injection mode (really cool IMHO).

At the moment I don’t have very in-depth knowledge of USB, but if you want to know the meaning of the rest of the dump you can refer to this document: http://www.usb.org/developers/devclass_docs/HID1_11.pdf, or this (shorter) tutorial: http://www.faculty.iu-bremen.de/birk/lectures/PC101-2003/14usb/FINAL%20VERSION/usb_protocol.html.

Hexinject 1.2 released

HexInject version 1.2 has been released (http://hexinject.sourceforge.net/). Hooray! :)

The release includes some minor fixes and a new feature: the various length fields of the IP, UDP, TCP and ICMP headers are now automatically adjusted when the size of the packet is modified.

Thanks also to the feature that automatically computes packet checksums, hexinject no longer has any limitation in altering network data streams… But let’s take a step back, since you might not know what I’m talking about :)

From the site (http://hexinject.sourceforge.net/):

“HexInject is a very versatile packet injector and sniffer, that provide a command-line framework for raw network access.
It’s designed to work together with others command-line utilities, and for this reason it facilitates the creation of powerful shell scripts capable of reading, intercepting and modifying network traffic in a transparent manner.”

Take a look at the site if you want to see some practical uses of the tool… There’s also a PDF guide to hexinject that includes a lot of examples and some useful cheatsheets: http://hexinject.sourceforge.net/hexinject_introduction.pdf
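
To give an idea of what the automatic length and checksum adjustment mentioned above involves, here is a small sketch for the IPv4 header only. It is not hexinject’s code: it assumes the buffer starts at the IP header (no Ethernet framing) and uses the standard Internet checksum; the function names are mine.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit big-endian words, inverted."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                       # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def fix_ipv4_header(packet: bytes) -> bytes:
    """Rewrite total length and header checksum to match the packet."""
    header_len = (packet[0] & 0x0F) * 4      # IHL is in 32-bit words
    header = bytearray(packet[:header_len])
    struct.pack_into("!H", header, 2, len(packet))   # total length
    struct.pack_into("!H", header, 10, 0)            # zero old checksum
    struct.pack_into("!H", header, 10, internet_checksum(bytes(header)))
    return bytes(header) + packet[header_len:]
```

After a payload has been lengthened or shortened in a pipeline, passing the packet through fix_ipv4_header makes the header consistent again; the UDP, TCP and ICMP fields would need the same kind of treatment, which is not shown here.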

Something personal

I do not know if the same is true for you, but I often need a bit of encouragement to finish my programs and research.

Fortunately, I read some comments at the right time (http://www.reddit.com/r/netsec/comments/f78fb/regex_man_in_the_middle/.compact?sort=new):

“HexInject is a lot of fun! … Running this next to tcptrack in a couple consoles makes me feel like I know what is going in my network.”

“This may be one of my favorite new tools! Thanks for the idea!!”

How could these comments not warm the heart?  :) After reading these few lines, being a sentimental type, I decided to release the new features (which would otherwise have remained in limbo for who knows how long)…

Something historical

HexInject was inspired by linkcat, a tool from Paketto Keiretsu (http://freshmeat.net/projects/paketto/).

This collection of tools, released in late 2002, contained many innovative ideas, including low-level access to the network via a tool similar to cat. The goal was precisely to make it easy to use in a pipeline with other command-line tools.

Compared to linkcat, hexinject uses more modern libraries and is able to automatically calculate the checksum and the size of packets, making it easier to use. But the basic ideas are the same.

For this reason I suggest you read the slides presented at Defcon 11 by the author of Paketto Keiretsu (http://www.defcon.org/images/defcon-11/dc-11-presentations/dc-11-Kaminsky/dc-11-kaminsky.pdf). Truly inspiring slides, IMHO…

Exploiting Arm Linux Systems

Wow, this last month has been pretty intense. Between trips, new articles and projects I haven’t had much free time (although I enjoyed this month).

This was my first article dealing specifically with ARM processors (even though I had already played with ARM-based embedded things…)

You can find the article at this address: http://www.exploit-db.com/download_pdf/16151

The majority of ARM systems are vulnerable and not adequately protected against arbitrary code execution attacks. I’ve tried to bring together, in a single document, the knowledge required to approach the exploitation of ARM Linux systems.

Return-oriented ARM shellcode

I assure you the article will not be a heavy read, because the chapters are full of examples, images and graphics.

A small digression: a friend of mine has proposed a t-shirt design to me. To appreciate the idea, one must have a basic knowledge of the ARM architecture:

we all like dirty tricks...

(Or maybe these words are a veiled reproach? :P )

Cymothoa Ver.1 Alpha, introduction

Cymothoa (http://cymothoa.sourceforge.net/) has just been released: it is a tool written in our spare time by me and codwizard (to whom I take this opportunity to say hello ;) )

As the project page also says, Cymothoa is a tool for creating hidden backdoors by injecting their assembly code directly into processes that are already running. In this way it is possible either to overwrite the code of the process, or to make it fork(), creating a process with the same name that actually executes our code.

Let’s see a brief example of how the tool is used…