.code_seg_0:40207411 movi a4, 1337 ; connect to port 1337
.code_seg_0:40207414 mov.n a2, a1
.code_seg_0:40207416 call0 guessed_connect
.code_seg_0:40207419 movi a2, 1000
.code_seg_0:4020741C call0 delay ; wait 1000ms before next connection attempt
.code_seg_0:4020741F l32i.n a3, a12, 0
.code_seg_0:40207421 l32r a4, port_8000
.code_seg_0:40207424 mov.n a2, a1
.code_seg_0:40207426 call0 guessed_connect ; connect to port 8000
.code_seg_0:40207429 movi a2, 1000
.code_seg_0:4020742C call0 delay
.code_seg_0:4020742F l32i.n a3, a12, 0
.code_seg_0:40207431 l32r a4, port_3306
.code_seg_0:40207434 mov.n a2, a1
.code_seg_0:40207436 call0 guessed_connect ; connect to port 3306
.code_seg_0:40207439 movi a2, 1000
.code_seg_0:4020743C call0 delay
.code_seg_0:4020743F l32i.n a3, a12, 0
.code_seg_0:40207441 l32r a4, port_4545
.code_seg_0:40207444 mov.n a2, a1
.code_seg_0:40207446 call0 guessed_connect ; connect to port 4545
.code_seg_0:40207449 movi a2, 1000
.code_seg_0:4020744C call0 delay
.code_seg_0:4020744F l32i.n a3, a12, 0
.code_seg_0:40207451 mov.n a2, a1
.code_seg_0:40207453 movi a4, 445 ; our final port!
.code_seg_0:40207456 call0 guessed_connect
.code_seg_0:40207459 bnez.n a2, loc_40207469
From the above, we’ve determined that:
.code_seg_0:40207400 l32r a12, hostname
Is loading a pointer to the hostname variable into the a12 register. This is followed by loading of what looks like a port number into various other registers, again followed by a call0 instruction. This behaviour led me to guess this is likely our connect() function.
From this analysis, we’ve determined our port knocking sequence to be as follows:
With the application connecting predictably, to services.ioteeth.com on port 445.
With that, we’ve effectively solved the challenge! All that’s left is to get the secrets!
Getting the secrets!
In order to obtain the secrets, we need to knock on the now known ports in the correct order. We can do this in various ways, using nmap or even netcat, but I prefer to use the knock binary, as it’s purpose-built (and is part of the knockd package).
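For readers without knockd to hand, the same idea can be sketched in a few lines of Python. This is only an illustration, not the knock client itself: the names `knock` and `tcp_connect`, the injectable `connect`/`sleep` parameters and the one second delay are assumptions made for the example. The port order is the one read from the disassembly (1337, 8000, 3306, 4545, with 445 as the final service port).

```python
import socket
import time

KNOCK_PORTS = [1337, 8000, 3306, 4545]  # order read from the disassembly
SERVICE_PORT = 445                       # final port the firmware connects to


def tcp_connect(host, port, timeout=1.0):
    """Attempt one TCP connection; return True if it was accepted."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused or timed out: the SYN was still sent, which is all a knock needs.
        return False


def knock(host, ports=KNOCK_PORTS, delay=1.0, connect=tcp_connect, sleep=time.sleep):
    """Knock on each port in order, pausing between attempts as the firmware does."""
    results = []
    for port in ports:
        results.append((port, connect(host, port)))
        sleep(delay)
    return results
```

After calling `knock("services.ioteeth.com")`, an ordinary connection to `SERVICE_PORT` should succeed if the firewall accepted the sequence.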
Well done! We hope you had fun with this challenge and learned a lot!
In this post, we set out to understand how a particular firmware image communicated with external services to apparently obtain secrets. We knew nothing about the firmware initially and wanted to describe a methodology for analysing unknown formats.
Ultimately we’ve taken the following steps:
Analysed the file using common Linux utilities file, binwalk, strings and hexdump
Made note that our firmware image is based on the ESP8266 and is likely performing a form of port knocking, prior to accessing secrets, based on the strings within.
Performed research, as well as reversed open source tools, to understand the hardware on which the firmware image runs, its processor, boot process and the memory layout, as well as the firmware image format itself.
Equipped our tools with the appropriate additions to understand the Xtensa processor.
Written a loader for IDA that’s capable of loading future firmware images of this format.
Came to understand the format of compiled code prior to being exported as a firmware image.
Written and compiled our own code for the ESP8266 to obtain debugging symbols.
Patched and made use of FireEye’s IDB2PAT IDA plugin, to generate FLIRT signatures from our debug build.
Applied our FLIRT signatures across our target firmware image, to recognise library functions.
Observed the use of vtables to call library functions and used this to classify other unknown library functions.
Used references to functions of known and likely libraries to locate the firmware image’s main processing loop.
Reverse engineered the main loop function to understand our port knocking sequence.
Made use of the knock client to perform our port knocking and reap all of the secrets!
I’d like to think that this methodology can be applied more generally when analysing unknown binaries or firmware images. In this case, we were fortunate in that most of the internals had been documented already and as documented here, our job was to put the pieces together. I’d encourage the reader to look at other firmware images, such as router firmware for example.
Special thanks to the authors of the following for their insight:
https://github.com/esp8266/Arduino/blob/master/libraries/ESP8266WiFi/examples/WiFiClient/WiFiClient.ino – ESP8266 Wifi Connect Example
https://richard.burtons.org/2015/05/17/decompiling-the-esp8266-boot-loader-v1-3b3/ – Decompiling the boot loader
A VTABLE in this context is essentially a collection of function pointers for each module of the application’s libraries. We can see that each library’s function pointers are delimited by three null bytes, represented as the below for example:
Where we can observe two functions of the WiFiClient module, followed by three null bytes (a delimiter) and finally, followed by the function pointers of the next module, in this case SdFile.
This is an important observation, as it will allow us to recognise if an unknown function belongs to a particular library, based on its presence within the VTABLEs amongst the other libraries. Given the below for example:
We can infer that sub_40207AD0, whilst unnamed and unknown in terms of functionality, does in fact belong to the WiFiClient library, which hints at its purpose.
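The inference above can be sketched in Python. This is a simplified model rather than anything IDA provides: the table is assumed to have been flattened to a list of pointer values with the delimiter normalised to a single `0` entry, the names are written as hypothetical `Module::function` strings, and every address other than 0x40207AD0 is made up for illustration.

```python
def group_vtable(entries):
    """Split a flat list of pointers into groups, using 0 as the delimiter."""
    groups, current = [], []
    for ptr in entries:
        if ptr == 0:
            if current:
                groups.append(current)
                current = []
        else:
            current.append(ptr)
    if current:
        groups.append(current)
    return groups


def infer_modules(entries, known_names):
    """Map each unnamed pointer to the module its named neighbours belong to."""
    inferred = {}
    for group in group_vtable(entries):
        modules = {known_names[p].split("::")[0] for p in group if p in known_names}
        if len(modules) == 1:  # only infer when the group is unambiguous
            module = modules.pop()
            for ptr in group:
                if ptr not in known_names:
                    inferred[ptr] = module
    return inferred


# Hypothetical table: two named WiFiClient entries around one unnamed function.
table = [0x40207B10, 0x40207AD0, 0x40207B48, 0, 0x40208000]
names = {
    0x40207B10: "WiFiClient::write",
    0x40207B48: "WiFiClient::read",
    0x40208000: "SdFile::write",
}
print(infer_modules(table, names))
```

Groups whose named members disagree about the module are deliberately skipped, since a wrong label is worse than none.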
Finding the port knock sequence
Armed with all of our obtained knowledge, at this point we’re in a position to find references to the connect() function of the WiFiClient library, or indeed to other functions, including those that are unnamed, in search of our port knocking sequence.
Having searched for references to connect(), I couldn’t find any. I did however, after checking for references to all functions that were part of the WiFiClient library, find a reference to the following unnamed function:
.code_seg_0:4020B214 .int sub_40207A58
The following XREFS were identified:
The function referenced appeared to be quite involved. You can see it below:
As an educated guess, we can assume this is probably the loop function of our image, which is responsible for doing most of the legwork.
Understanding the Xtensa instruction set
In order to understand what the instructions above are doing, we want to have at least a passing familiarity with what registers the processor uses and their purpose, as well as what common instructions do and how conditional jumps work.
It turns out someone has documented most of the common instructions of the Xtensa processor.
An excerpt of this guide, which covers loading and storing instructions, as well as register usage, is below:
This is a load/store machine with either 16 or 24 bit instructions. This leads to higher code density than with constant 32 bit encoding. Some instructions have optional “short” 16 bit encodings indicated by appending “.n” to the mnemonic. The Xtensa implements SPARC like register windows on subroutine calls, but I have never seen this feature used in either the bootrom or code generated by gcc, so this can be ignored.
There are 16 registers named a0 through a15.
a0 is special – it holds the call return address.
a1 is used by gcc as a stack pointer.
a2 gets used to pass a single argument (and to return a function value).
Understanding the Xtensa calling convention
It would also be helpful to understand the calling convention, which describes how arguments are passed to function calls. I found this document, which describes the calling convention as follows:
Arguments are passed in both registers and memory. The first six incoming arguments are stored in registers a2 through a7, and additional arguments are stored on the stack starting at the current stack pointer a1. […].
Thus we can determine that registers a2 to a7, in most cases will be used to store arguments passed to functions. The register a2 is also used when passing a single argument to a function.
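To make the convention concrete, here is a small Python sketch that reads the arguments of a `call0` back out of a register snapshot. The helper name `call_arguments` and the register descriptions are assumptions for illustration; the values mirror the first call in the port knocking listing, where a2 is set from a1, a3 is loaded via a12 (the hostname pointer) and a4 holds 1337.

```python
ARG_REGISTERS = ["a2", "a3", "a4", "a5", "a6", "a7"]


def call_arguments(register_state, argc):
    """Return the first argc arguments of a call, read in order from a2..a7."""
    if argc > len(ARG_REGISTERS):
        raise ValueError("arguments beyond the sixth are passed on the stack")
    return [register_state[reg] for reg in ARG_REGISTERS[:argc]]


# Register snapshot just before the first call0 in the listing (descriptions are mine).
state = {
    "a2": "a1 (object on the stack)",
    "a3": "[a12] (hostname pointer)",
    "a4": 1337,
}
print(call_arguments(state, 3))
```

Read this way, the call looks like a three-argument function taking an object, a host and a port, which is consistent with the guess that it is a connect routine.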
So, why a loader? The main reason was that I wanted something I could re-use when reversing future ESP8266 firmware dumps.
Our loader will be quite simple. IDA loaders typically define the following functions:
def load_file(li, neflags, format):
The first is responsible for identifying an applicable file, based on its signature and is executed when you open a file in IDA for analysis. The second, for interpreting the file, setting entry points, processor, as well as loading and naming segments accordingly. Our loader won’t perform any sanity checking, but should be able to load an image for us.
My loader is derived from the existing loader classes shipped with IDA and, of course, is built to take into account the format we’ve dissected above. It will attempt to identify the firmware image based on signature (image magic), followed by loading each of the segments into memory, whilst trying to guess the names and types of segments based on their loading address.
Below is the Python code for our loader, which lives in IDA’s loader directory:
As you can see, the user segment loading loop, which iterates over each of the segments within ROM 1, attempts to perform some basic classification and naming based on the load address of the given segment, per our rules mentioned earlier.
elif(seg_addr > 0x40100000):
With this loader in use, IDA now recognises our firmware image:
Our segments look a lot tidier:
And we have an entry point! (of the user ROM):
Whilst we’re in a good state to perform cursory analysis, we don’t have any function names to base our analysis on. Ideally, we’d like to identify the routine(s) responsible for connecting to a given port and locate the references to that function, as well as make sense of any other library function calls. This will allow us to discover the ports knocked on, as well as the order in which knocking should take place.
Performing library recognition
There are known and documented methods to identify library functions within a statically linked, stripped image. The best known of these is to use IDA’s Fast Library Acquisition for Identification and Recognition (FLAIR) tools to create Fast Library Identification and Recognition Technology (FLIRT) signatures.
The process of creating FLIRT signatures usually requires a number of prerequisite conditions to exist:
A pattern file must be created via either pelf or similar, followed by use of sigmake
A compiled, relocatable library containing the functions and associated names, of which signatures are to be generated against, must exist
The library must be a recognised format and with a supported instruction set
This poses two problems: the first is that we don’t have such a library available to us at present; the second is that Xtensa is not a supported processor type, as shown below.
ELF parser. Copyright (c) 2000-2015 Hex-Rays SA. Version 1.16
Supported processors: MIPS, I960, ARM, IBM PC, M6812, SuperH
Usage: ./pelf [-switch or @file or $env_var] file [pattern-file]
(wildcards are allowed)
The result is that we can’t create pattern files using IDA’s traditional toolset.
The solutions to these problems, which we’ll tackle in a moment (not without their own obstacles), are as follows:
We need to install a suitable IDE capable of compiling code for the ESP8266
We need to write code that hopefully, uses the same libraries as our target
We need to compile our code into an ELF file that is statically linked, unstripped and with debug info.
We need to find a way to create signatures from said ELF file
The first step is involved and beyond the scope of this blog post. I’ve opted to use Arduino IDE and configured it to compile for a generic ESP8266 module, with verbose compiler output enabled.
With our environment configured, we can look up example sketches for the ESP8266; we want to find one that performs a similar function to our target. Fortunately, a GitHub repository of example code exists, which can help us.
Searching the repository, we see a promising file, WiFiClient.ino, which contains the following code:
This sketch sends data via HTTP GET requests to data.sparkfun.com service.
You need to get streamId and privateKey at data.sparkfun.com and paste them
below. Or just customize this script to talk to other HTTP servers.
const char* ssid = "your-ssid";
const char* password = "your-password";
const char* host = "data.sparkfun.com";
const char* streamId = "....................";
const char* privateKey = "....................";
// We start by connecting to a WiFi network
Serial.print("Connecting to ");
/* Explicitly set the ESP8266 to be a WiFi-client, otherwise, it by default,
would try to act as both a client and an access-point and could cause
network-issues with your other WiFi-devices on your WiFi-network. */
This is a good sign, as it’s indicative that at the very least, we’re compiling a Sketch which uses the relevant, identical or similar libraries (there may be version discrepancies) to our target firmware image. This increases the likelihood of successful function identification, based on the signatures we’ll obtain.
Compiling the above sketch, results in the following notable compiler output:
/tmp/arduino_build_867542/sketch_may24a.ino.elf: ELF 32-bit LSB executable, Tensilica Xtensa, version 1 (SYSV), statically linked, with debug_info, not stripped
Loading this ELF file into IDA, we can see we’ve got sensible function names! As depicted below:
So, how can we generate a pattern file from the above ELF to create a FLIRT signature? After much research, I found FireEye’s IDB2PAT tool, created by FLARE, a division of FireEye.
This tool is described as follows:
This script allows you to easily generate function patterns from an existing IDB database that can then be turned into FLIRT signatures to help identify similar functions in new files. More information is available at: https://www.fireeye.com/blog/threat-research/2015/01/flare_ida_pro_script.html
Having installed this plugin, it initially didn’t work at all for my version of IDA (6.8). This appeared to be because later versions of IDA (7.x) use Qt5 rather than PySide, and the plugin had been migrated to support IDA 7.x at the expense of version 6.8.
Scrolling through the plugin’s known issues, someone pointed out the above and recommended an earlier version be used, which worked with IDA 6.8. I checked out an earlier commit. No more IDA plugin errors.
Did the plugin work? No. It got stuck in an infinite loop upon being launched. It turned out this issue was related to the version I had containing a bug, where functions less than 32 bytes would cause an infinite loop. To fix this issue, I downloaded the latest version of the individual script file, in which the bug was apparently fixed.
The result, yet another issue:
This was seemingly due to a version discrepancy between the installed and targeted IDA SDK. I fixed the plugin by updating the relevant function call “get_name(…)” to “GetFunctionName(…)”. I also added code to ignore functions that started with the word “sub_”, as these were undefined and not useful to me.
See the documentation to learn how to resolve collisions.
We can see six collisions have occurred. In this context, a collision is generated when sigmake encounters the same signature for more than one function. When this happens, it will generate a .exc file listing the collisions, which we can modify to instruct IDA to use one signature over another, for example.
This is a processor plugin for IDA, to support the Xtensa core found in Espressif ESP8266.
With the above information, we’ve also answered our second question of “What is the processor?“.
Understanding the firmware format
Now that IDA can understand the instruction set of the processor, it’s time to learn how firmware images are composed in terms of format, data and code. Indeed, what is the format of our firmware image? To help answer this question, my first port of call was to analyse the existing open source tools published by Espressif for working with the ESP8266.
This leads us to ESPTool, an application written in Python capable of displaying some information about binary firmware images, amongst other things.
The manual for this tool also gives away some important information:
The elf2image command converts an ELF file (from compiler/linker output) into the binary executable images which can be flashed and then booted into.
From this, we can determine that compiled images, prior to their transformation into firmware, exist in the ELF-32 Xtensa format. This will be useful later on.
Moving back to the other features of ESPTool, we see it’s indeed able to present information about our firmware image:
Which represented as a structure would look like this:
We can see from the function load_segment() that following our image header are the image segment headers, followed immediately by the segment data itself, for each segment.
The following code parses a segment header:
(offset, size) = struct.unpack('<II', f.read(8))
Which again, represented as a structure would be as follows:
This is helpful, we now know both the format of the firmware image and a number of the tools available to process such images. It’s worth noting that we haven’t considered elements such as checksums, but these aren’t important to us as we don’t intend on patching the firmware image.
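Putting the two headers together, a parser for this layout can be sketched in Python as follows. It is a minimal sketch based on my reading of ESPTool, not a replacement for it: the function name `parse_image` and the field labels `flash_mode` and `flash_size_freq` are my own, and checksums are ignored, as discussed.

```python
import struct

IMAGE_MAGIC = 0xE9


def parse_image(data, base=0):
    """Parse one image: an 8-byte header, then an 8-byte header plus data per segment."""
    magic, seg_count, flash_mode, flash_size_freq, entry = struct.unpack_from(
        "<BBBBI", data, base
    )
    if magic != IMAGE_MAGIC:
        raise ValueError("bad image magic: 0x%02X" % magic)
    pos = base + 8
    segments = []
    for _ in range(seg_count):
        # Segment header: load address ("offset" in ESPTool) and size, little-endian.
        load_addr, size = struct.unpack_from("<II", data, pos)
        pos += 8
        segments.append((load_addr, size, data[pos:pos + size]))
        pos += size
    return entry, segments
```

Called as `parse_image(open("recovered_file", "rb").read())`, this would return the entry point and a list of `(load_addr, size, bytes)` tuples for the image at the start of the file; the `base` parameter allows an image at another offset to be parsed.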
As a tangent, it’s worth noting that although in this case our format has been documented and tools exist to parse it, this is often not the case. In such cases, I’d advise obtaining as many firmware images as you can from your target devices. At that point, a starting point could be to find commonalities between them, which could indicate what certain bytes mean within the format. Also of use would be to understand how an image is booted into, as the bootloader may act differently depending on certain values at fixed offsets.
Understanding the boot process
So, onto our next question, what is the boot process of the device? Understanding this is important as it will help to clarify our understanding of the image. Richard A. Burton has very helpfully reverse engineered the boot loader and described the following key point:
It finds the flash address of the rom to boot. Rom 1 is always at 0x1000 (next sector after boot loader). Rom 2 is half the chip size + 0x1000 (unless the chip is above a 1mb when its position is kept down to 0x81000).
Checking the 0x1000 offset within our firmware image, there is indeed a second image, as denoted by presence of the image magic signature (0xE9):
josh@ioteeth:/tmp/reversing$ hexdump -s 0x1000 -v -C recovered_file | head
00001070 b0 ff ff 3f 24 10 20 40 00 ed fe 3f 80 6e 10 40 |...?$. @...?.n.@|
00001080 04 ed fe 3f 79 6e 10 40 fc ec fe 3f f8 ec fe 3f |...?yn.@...?...?|
00001090 6b 6e 10 40 61 6e 10 40 f6 ec fe 3f 52 6e 10 40 |kn.@an.@...?Rn.@|
This second firmware image sits almost immediately after the padding bytes we observed earlier. Based on the format, we can see from the second byte (0x04) that this ROM has 4 segments and is likely to be user or custom ROM code, with the first ROM image potentially being the bootloader of the device, responsible for bootstrapping.
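The address arithmetic and the magic check can be sketched in Python. The helper names `rom_offsets` and `probe_rom` are hypothetical, and capping ROM 2 with `min()` is my interpretation of the quoted description rather than code taken from the boot loader.

```python
SECTOR = 0x1000
IMAGE_MAGIC = 0xE9


def rom_offsets(chip_size):
    """Flash offsets of ROM 1 and ROM 2, following the quoted description."""
    rom1 = SECTOR
    # Half the chip size plus one sector, kept down to 0x81000 on larger chips.
    rom2 = min(chip_size // 2 + SECTOR, 0x81000)
    return rom1, rom2


def probe_rom(data, offset):
    """Return the segment count if an image header starts at offset, else None."""
    if offset + 2 > len(data) or data[offset] != IMAGE_MAGIC:
        return None
    return data[offset + 1]
```

For a 512 KB chip, `rom_offsets(0x80000)` gives `(0x1000, 0x41000)`, and probing the ROM 1 offset of our image should report the 4 segments noted above.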
Whilst there are a lot of nuances to the boot process, the above is all we really need to be aware of at this time.
From the information within, we can conclude the following:
0x40100000 – Instruction RAM. Used by bootloader to load SPI Flash <40000h.
0x3FFE8000 – User data RAM. Available to applications.
0x3FFFFFFF – Anything below this address appears to be data, not code
0x40100000 – Anything above this address appears to be code, not data
Anything that doesn’t match an address exactly, we’ll mark as unknown and classify as either code or data based on the rules above.
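These rules translate into a small classification function, sketched here in Python. The segment names `.iram` and `.user_data`, the `.data_seg_N` and `.unknown_seg_N` patterns and the decision to treat the unclassified gap as data are assumptions for illustration; only the `.code_seg_N` pattern is taken from the segment names visible in the disassembly.

```python
KNOWN_SEGMENTS = {
    0x40100000: (".iram", "CODE"),       # instruction RAM
    0x3FFE8000: (".user_data", "DATA"),  # user data RAM
}


def classify_segment(load_addr, index):
    """Return (name, kind) for a segment, where kind is "CODE" or "DATA"."""
    if load_addr in KNOWN_SEGMENTS:
        return KNOWN_SEGMENTS[load_addr]
    if load_addr > 0x40100000:
        return (".code_seg_%d" % index, "CODE")
    if load_addr <= 0x3FFFFFFF:
        return (".data_seg_%d" % index, "DATA")
    # Neither rule applies (between the two boundaries); assume data.
    return (".unknown_seg_%d" % index, "DATA")
```

A segment loaded at 0x40201010, for example, matches no known address and falls through to the code rule.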
It should be noted that simply loading the file as ‘binary’ within IDA, having set the appropriate processor, allows for limited understanding and doesn’t display any xrefs to strings that could guide our efforts:
With this in mind, we can write a simple loader for IDA to identify the firmware image and load the segments accordingly, which should yield better results. We’ll use the memory map above as a guide to name the segments and mark them as code or data accordingly.
As with any unknown binary, our initial analysis will help to uncover any strings that may allude to what we’re looking at, as well as any signatures within the file that could present a point of further analysis. Lastly, we want to look at the hexadecimal representation of the file, in order to identify padding or other blocks of interest.
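As an aside, the strings step is easy to reproduce in Python when the usual utilities aren’t available. This is a rough equivalent of the `strings` tool’s default behaviour, assuming printable ASCII only and a minimum length of four; the function name `ascii_strings` is my own.

```python
import re


def ascii_strings(data, min_len=4):
    """Yield (offset, text) for each run of printable ASCII at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")
```

Keeping the offset alongside each string makes it easier to jump to the same location in a hexdump or, later, in a disassembler.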
It would appear our firmware is potentially performing a form of port knocking, before connecting to services.ioteeth.com to retrieve the aforementioned secrets. Port knocking is a means of instructing a firewall to open a predefined TCP/UDP port if the correct sequence of ports is ‘knocked’ on, which is usually performed via sending a TCP SYN packet to the required ports. The firewall would recognise the sequence and permit access to the defined resources.
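The firewall side of this can be modelled as a small state machine per client. The Python sketch below is a simplified illustration, not how any particular implementation works: the class name `KnockTracker` is hypothetical, the example ports are arbitrary, and real daemons typically also enforce timeouts between knocks, which are omitted here.

```python
class KnockTracker:
    """Track each client's progress through a fixed knock sequence."""

    def __init__(self, sequence):
        self.sequence = list(sequence)
        if not self.sequence:
            raise ValueError("the knock sequence must not be empty")
        self.progress = {}  # client address -> number of correct knocks so far

    def knock(self, client, port):
        """Record one knock; return True once the client completes the sequence."""
        step = self.progress.get(client, 0)
        if step < len(self.sequence) and port == self.sequence[step]:
            step += 1
        else:
            # A wrong port resets progress, unless it restarts the sequence.
            step = 1 if port == self.sequence[0] else 0
        self.progress[client] = step
        return step == len(self.sequence)


tracker = KnockTracker([7000, 8000, 9000])  # arbitrary example ports
for port in (7000, 8000, 9000):
    opened = tracker.knock("10.0.0.5", port)
print(opened)
```

Because progress is tracked per client address, one host completing the sequence does not open the port for anyone else.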
We can perform a fast port scan to check for any filtered ports across a limited number of popular ports:
Completed Connect Scan at 11:29, 1.20s elapsed (100 total ports)
Nmap scan report for services.ioteeth.com (192.168.1.69)
Host is up (0.00059s latency).
Not shown: 98 closed ports
PORT STATE SERVICE
22/tcp open ssh
445/tcp filtered microsoft-ds
Read data files from: /usr/bin/../share/nmap
Nmap done: 1 IP address (1 host up) scanned in 1.23 seconds
We can see that port 445 is filtered, so this could be our target port; equally, this would explain why the service couldn’t be reached by the originator of this challenge. It’s also possible that another port is the one used to obtain/upload secrets, in which case we’ll find it through investigation. For now, we simply note this as a passive observation.
Continuing with our analysis, let’s take a look at the hexdump of our firmware image:
josh@ioteeth:/tmp/reversing$ hexdump -v -C recovered_file | head
This padding is potentially useful, as it could be indicative of multiple files or formats being present within our target firmware image.
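Spotting padding by eye works for a small file, but it can also be automated. The Python sketch below reports long runs of a single repeated byte; the function name `find_padding` and the 64 byte threshold are assumptions, and it makes no claim about which byte value a given device uses for padding.

```python
def find_padding(data, min_len=64):
    """Return (offset, length, byte_value) for each long run of one repeated byte."""
    runs = []
    start = 0
    for i in range(1, len(data) + 1):
        # Close the current run at the end of the data or when the byte changes.
        if i == len(data) or data[i] != data[start]:
            if i - start >= min_len:
                runs.append((start, i - start, data[start]))
            start = i
    return runs
```

The offset immediately after each reported run is a natural place to look for the start of another embedded file or image.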
At this point, we have a number of questions we need to answer before we can continue. We can theorise that our long-term goal, based on the strings observed within the file, is to uncover the port knocking sequence performed by the firmware, which will hopefully allow us to access the external service that’s communicated with. But how do we go about doing that?
It’s worth noting that IDA doesn’t recognise our file, so let’s pause and do some research first.
Questions we need to answer
As with any firmware image, a good starting point prior to reverse engineering is to understand the following:
What is the device in question?
What processor does it operate on?
What is the format of the firmware image in question?
What tools exist to process the type of firmware image we have, if any?
What is the boot process of the device?
What does the physical memory layout look like?
Throughout the course of this section, we’ll work towards learning the answers to the above and understanding how they can help us.
Our penultimate goal will be to load the firmware into IDA for analysis, in a way that allows us to make some sense of what’s happening.
During my time with Cisco Portcullis, I wanted to learn more about reverse engineering embedded device firmware.
This six-part series was written both during my time with Cisco Portcullis and in my spare time (if the tagline of this blog didn’t give that away). It details my analysis of an embedded device’s firmware (in this case, the ESP8266’s), presents a methodology for analysing firmware more generally and, of course, solves the challenge presented. It will be one of many series with a focus on firmware reverse engineering and security assessment of ‘smart’ devices.
We will cover the initial analysis, writing an IDA loader and recovering function symbols (names) via IDA’s FLAIR tools. Finally, we’ll reverse engineer the functionality required to solve the challenge, for extra points, without reliance upon string references.
I chose an ESP8266 firmware image, supplied by a colleague as a contrived reversing challenge, mainly because the simplicity of its firmware format made it a good starting point.
The challenge was described as follows:
We managed to obtain the firmware of an unknown device connected to our wireless access point. We’ve been told it’s connecting to a service and retrieving secrets, but we can’t reach the service. Can you?
Update: You can find a slightly modified version of the supplied binary here (within the zip archive).
This series has been broken down into the following parts: