Reversing ESP8266 Firmware (Part 6)

( original text by @boredpentester )

At this point we’re actually reversing ESP8266 firmware to understand the functionality, specifically, we’d like to understand what the loop function does, which is the main entry point once booted.

Reversing the loop function

I’ve analysed and commented the assembly below to detail guessed ports, functions and hostnames:

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
.code_seg_0:402073D0 loop:                                   ; CODE XREF: .code_seg_0:loc_40209F9Dp
.code_seg_0:402073D0                 addi            a1, a1, 0xA0
.code_seg_0:402073D3                 s32i            a15, a1, 0x4C
.code_seg_0:402073D6                 l32r            a15, p_5000
.code_seg_0:402073D9                 s32i            a0, a1, 0x5C
.code_seg_0:402073DC                 mov.n           a2, a15 ; wait 5 seconds
.code_seg_0:402073DE                 s32i            a12, a1, 0x58
.code_seg_0:402073E1                 s32i            a13, a1, 0x54
.code_seg_0:402073E4                 s32i            a14, a1, 0x50
.code_seg_0:402073E7                 call0           delay
.code_seg_0:402073EA                 l32r            a2, dword_40207390
.code_seg_0:402073ED                 l32r            a14, dword_402072B8
.code_seg_0:402073F0                 l32i.n          a3, a2, 0
.code_seg_0:402073F2                 mov.n           a13, a14
.code_seg_0:402073F4                 addi.n          a3, a3, 1
.code_seg_0:402073F6                 s32i.n          a3, a2, 0
.code_seg_0:402073F8                 l32r            a3, off_402072C0
.code_seg_0:402073FB                 mov.n           a2, a14
.code_seg_0:402073FD                 call0           _ZN5Print5printEPKc ; Print::print(
char
const
*)
.code_seg_0:40207400                 l32r            a12, hostname
.code_seg_0:40207403                 mov.n           a2, a14
.code_seg_0:40207405                 l32i.n          a3, a12, 0
.code_seg_0:40207407                 call0           _ZN5Print7printlnEPKc ; Print::println(
char
const
*)
.code_seg_0:4020740A                 mov.n           a2, a1
.code_seg_0:4020740C                 call0           sub_40208454
.code_seg_0:4020740F                 l32i.n          a3, a12, 0
.code_seg_0:40207411                 movi            a4, 1337 ; connect to port 1337
.code_seg_0:40207414                 mov.n           a2, a1
.code_seg_0:40207416                 call0           guessed_connect
.code_seg_0:40207419                 movi            a2, 1000
.code_seg_0:4020741C                 call0           delay   ; wait 1000ms before next connection attempt
.code_seg_0:4020741F                 l32i.n          a3, a12, 0
.code_seg_0:40207421                 l32r            a4, port_8000
.code_seg_0:40207424                 mov.n           a2, a1
.code_seg_0:40207426                 call0           guessed_connect ; connect to port 8000
.code_seg_0:40207429                 movi            a2, 1000
.code_seg_0:4020742C                 call0           delay
.code_seg_0:4020742F                 l32i.n          a3, a12, 0
.code_seg_0:40207431                 l32r            a4, port_3306
.code_seg_0:40207434                 mov.n           a2, a1
.code_seg_0:40207436                 call0           guessed_connect ; connect to port 3306
.code_seg_0:40207439                 movi            a2, 1000
.code_seg_0:4020743C                 call0           delay
.code_seg_0:4020743F                 l32i.n          a3, a12, 0
.code_seg_0:40207441                 l32r            a4, port_4545
.code_seg_0:40207444                 mov.n           a2, a1
.code_seg_0:40207446                 call0           guessed_connect ; connect to port 4545
.code_seg_0:40207449                 movi            a2, 1000
.code_seg_0:4020744C                 call0           delay
.code_seg_0:4020744F                 l32i.n          a3, a12, 0
.code_seg_0:40207451                 mov.n           a2, a1
.code_seg_0:40207453                 movi            a4, 445 ; our final port!
.code_seg_0:40207456                 call0           guessed_connect
.code_seg_0:40207459                 bnez.n          a2, loc_40207469

From the above, we’ve determined that:

1
.code_seg_0:40207400                 l32r            a12, hostname

Is loading a pointer to the hostname variable into the a12 register. This is followed by loading of what looks like a port number into various other registers, again followed by a call0 instruction. This behaviour led me to guess this is likely our connect() function.

From this analysis, we’ve determined our port knocking sequence to be as follows:

  • 1337
  • 8000
  • 3306
  • 4545

With the application connecting predictably, to services.ioteeth.com on port 445.

With that, we’ve effectively solved the challenge! All that’s left is to get the secrets!

Getting the secrets!

In order to obtain the secrets, we need to knock on the now known ports in the correct order. We can do this in various ways, using nmap or even netcat, but I prefer to use the knock binary, as it’s purpose built (and is part of the knockd package).

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
josh@ioteeth:/tmp$ knock -v services.ioteeth.com 1337 8000 3306 4545
hitting tcp 192.168.1.69:1337
hitting tcp 192.168.1.69:8000
hitting tcp 192.168.1.69:3306
hitting tcp 192.168.1.69:4545
josh@ioteeth:/tmp$ nmap -n -PN -F -v services.ioteeth.com -oN services.ioteeth.com.out
Warning: The -PN option is deprecated. Please use -Pn
Starting Nmap 7.40 ( https://nmap.org ) at 2018-05-25 11:55 BST
Initiating Connect Scan at 11:55
Scanning services.ioteeth.com (192.168.1.69) [100 ports]
Discovered open port 445/tcp on 192.168.1.69
Discovered open port 22/tcp on 192.168.1.69
Completed Connect Scan at 11:55, 0.00s elapsed (100 total ports)
Nmap scan report for services.ioteeth.com (192.168.1.69)
Host is up (0.0012s latency).
Not shown: 98 closed ports
PORT    STATE SERVICE
22/tcp  open  ssh
445/tcp open  microsoft-ds
Read data files from: /usr/bin/../share/nmap
Nmap done: 1 IP address (1 host up) scanned in 0.03 seconds

Accessing the service, we receive the following:

1
2
3
4
josh@ioteeth:/tmp$ curl http://services.ioteeth.com:445/get_secrets
Well done! We hope you had fun with this challenge and learned a lot!
flag{esp8266_reversing_is_awes0me}

Conclusion

In this post, we set out to understand how a particular firmware image communicated with external services to apparently obtain secrets. We knew nothing about the firmware initially and wanted to describe a methodology for analysing unknown formats.

Ultimately we’ve taken the following steps:

  • Analysed the file using common Linux utilities filebinwalkstrings and hexdump
  • Made note that our firmware image is based on the ESP8266 and is likely performing a form of port knocking, prior to accessing secrets, based on the strings within.
  • Performed research, as well as reversed open source tools, to understand the hardware on which the firmware image runs, its processor, boot process and the memory layout, as well the firmware image format itself.
  • Equipped our tools with the appropriate additions to understand the Xtensa processor.
  • Written a loader for IDA that’s capable of loading future firmware images of this format.
  • Came to understand the format of compiled code prior to being exported as a firmware image.
  • Written and compiled our own code for the ESP8266 to obtain debugging symbols.
  • Patched and made use of FireEye’s IDB2PAT IDA plugin, to generate FLIRT signatures from our debug build.
  • Applied our FLIRT signatures across our target firmware image, to recognise library functions.
  • Observed the use of vtable’s to call library functions and used this to classify other unknown library functions.
  • Used references to functions of known and likely libraries to locate the firmware image’s main processing loop.
  • Reverse engineered the main loop function to understand our port knocking sequence.
  • Made use of the knock client to perform our port knocking and reap all of the secrets!

I’d like to think that this methodology can be applied more generally when analysing unknown binaries or firmware images. In this case, we were fortunate in that most of the internals had been documented already and as documented here, our job was to put the pieces together. I’d encourage the reader to look at other firmware images, such as router firmware for example.

References

Special thanks to the author’s of the following for their insight:

  • https://github.com/esp8266/Arduino/blob/master/libraries/ESP8266WiFi/examples/WiFiClient/WiFiClient.ino – ESP8266 Wifi Connect Example
  • https://richard.burtons.org/2015/05/17/decompiling-the-esp8266-boot-loader-v1-3b3/ – Decompiling the boot loader
  • https://github.com/nodemcu/nodemcu-firmware/tree/c8037568571edb5c568c2f8231e4f8ce0683b883 – NodeMCU tools
  • https://github.com/espressif/esptool – used to understand the firmware format
  • http://developers-club.com/posts/255135/ – describes memory layout and format
  • http://developers-club.com/posts/255153/ – describes memory layout and format
  • https://github.com/fireeye/flare-ida – FireEye’s IDB2PAT plugin!
  • https://github.com/esp8266/esp8266-wiki/wiki/Memory-Map – segment memory map!
  • https://en.wikipedia.org/wiki/ESP8266 – Description of the device
  • http://wiki.linux-xtensa.org/index.php/ABI_Interface – Xtensa calling convention
  • http://cholla.mmto.org/esp8266/xtensa.html – Xtensa instruction set
  • https://en.wikipedia.org/wiki/ESP8266 – ESP8266 Wiki page

Tools

  • IDA 6.8.
  • IDA FLAIR Utils 6.8.
  • Xtensa IDA processor plugin.
  • Linux utils: file, strings and hexdump.
  • Binwalk.
  • FireEye’s IDB2PAT.
  • Knock of the Knockd package.
  • Nmap.
  • cURL.

Feedback

I’m always keen to hear feedback, be it corrections or comments more generally. Drop me a tweet and feel free to share this post, as well as your own experiences reverse engineering firmware.

In future posts, I’ll be taking apart common, cheap ‘smart’ products such as doorbells and other things I’d like to use at home.

Reversing ESP8266 Firmware (Part 5)

( original text by @boredpentester )

Recognising VTABLE’s

After analysing our firmware image to some degree, it becomes clear that vtables are in use.

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
.
int
_ZN24BufferedStreamDataSourceI13ProgmemStreamE14release_bufferEPKhj ; BufferedStreamDataSource<ProgmemStream>::release_buffer(uchar
const
*,uint)
.code_seg_0:4020B1E4                 .
int
0, 0, 0
.code_seg_0:4020B1F0 off_4020B1F0    .
int
_ZN10WiFiClient5writeEh
.code_seg_0:4020B1F0                                         ; DATA XREF: .code_seg_0:off_4020804Co
.code_seg_0:4020B1F0                                         ; WiFiClient::write(uchar)
.code_seg_0:4020B1F4                 .
int
_ZN10WiFiClient5writeEPKhj ; WiFiClient::write(uchar
const
*,uint)
.code_seg_0:4020B1F8                 .
int
loc_402082EF+1
.code_seg_0:4020B1FC                 .
int
sub_40207AA0
.code_seg_0:4020B200                 .
int
_ZN10WiFiClient4readEv ; WiFiClient::read(
void
)
.code_seg_0:4020B204                 .
int
loc_4020AF27+1
.code_seg_0:4020B208                 .
int
_ZN6Stream9readBytesEPcj ; Stream::readBytes(
char
*,uint)
.code_seg_0:4020B20C                 .
int
_ZN6Stream9readBytesEPhj ; Stream::readBytes(uchar *,uint)
.code_seg_0:4020B210                 .
int
dword_40207F28+0x18
.code_seg_0:4020B214                 .
int
sub_40207A58
.code_seg_0:4020B218                 .
int
_ZN10WiFiClient4readEPhj ; WiFiClient::read(uchar *,uint)
.code_seg_0:4020B21C                 .
int
_ZN10WiFiClient4stopEv ; WiFiClient::stop(
void
)
.code_seg_0:4020B220                 .
int
_ZN10WiFiClient9connectedEv ; WiFiClient::connected(
void
)
.code_seg_0:4020B224                 .
int
_ZN10WiFiClientcvbEv ; WiFiClient::operator
bool
(
void
)
.code_seg_0:4020B228                 .
int
_ZN10WiFiClientD2Ev ; WiFiClient::~WiFiClient()
.code_seg_0:4020B22C                 .
int
sub_40208098
.code_seg_0:4020B230                 .
int
_ZN10WiFiClient7connectE6Stringt ; WiFiClient::connect(String,ushort)
.code_seg_0:4020B234                 .
int
_ZN10WiFiClient7write_PEPKcj ; WiFiClient::write_P(
char
const
*,uint)
.code_seg_0:4020B238                 .
int
sub_40207AD0
.code_seg_0:4020B23C                 .
int
0, 0, 0
.code_seg_0:4020B248                 .
int
_ZN6SdFile5writeEh ; SdFile::write(uchar)
.code_seg_0:4020B24C                 .
int
_ZN5Print5writeEPKhj ; Print::write(uchar
const
*,uint)
.code_seg_0:4020B250                 .
int
sub_4020AFC4
.code_seg_0:4020B254                 .
int
0, 0, 0
.code_seg_0:4020B260 off_4020B260    .
int
loc_40209714       ; DATA XREF: .code_seg_0:off_402096C8o
.code_seg_0:4020B264                 .
int
loc_4020972C
.code_seg_0:4020B268                 .
int
dword_40209764+4
.code_seg_0:4020B26C                 .
int
loc_40209740
.code_seg_0:4020B270                 .
int
loc_40209700
.code_seg_0:4020B274                 .
int
loc_402096EC
.code_seg_0:4020B278                 .
int
_ZN6Stream9readBytesEPcj ; Stream::readBytes(
char
*,uint)
.code_seg_0:4020B27C                 .
int
_ZN6Stream9readBytesEPhj ; Stream::readBytes(uchar *,uint)

VTABLE in this context is essentially a collection of function pointers per each module of the application’s libraries. We can see that each library’s function pointers are delimited by three nullbytes, represented as the below for example:

1
2
3
4
5
6
[...]
.code_seg_0:4020B234                 .
int
_ZN10WiFiClient7write_PEPKcj ; WiFiClient::write_P(
char
const
*,uint)
.code_seg_0:4020B238                 .
int
sub_40207AD0
.code_seg_0:4020B23C                 .
int
0, 0, 0
.code_seg_0:4020B248                 .
int
_ZN6SdFile5writeEh ; SdFile::write(uchar)
[...]

Where we can observe two functions of the WiFiClient module, followed by three nullbytes (a delimiter) and finally, followed by the function pointers of the next module, in this case SdFile.

This is an important observation, as it will allow us to recognise if an unknown function belongs to a particular library, based on its presence within the VTABLEs amongst the other libraries. Given the below for example:

1
2
3
.code_seg_0:4020B234                 .int _ZN10WiFiClient7write_PEPKcj ; WiFiClient::write_P(char const*,uint)
.code_seg_0:4020B238                 .int sub_40207AD0
.code_seg_0:4020B23C                 .int 0, 0, 0

We can infer that sub_40207AD0 whilst unnamed and unknown in terms of functionality, does in-fact belong to the WiFiClient library, which hints at its purpose.

Finding the port knock sequence

Armed with all of our obtained knowledge, at this point we’re in a position to find references to the connect() function of the WiFiClient library, or indeed to other functions, including those that are unnamed, in search of our port knocking sequence.

Having searched for references to connect(), I couldn’t find any. I did however, after checking for references to all functions that were part of the WiFiClient library, find a reference to the following unnamed function:

1
.code_seg_0:4020B214                 .int sub_40207A58

The following XREFS were identified:

The function referenced appeared to be quite involved. You can see it below:

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
.code_seg_0:402073D0 sub_402073D0:                           ; CODE XREF: .code_seg_0:loc_40209F9Dp
.code_seg_0:402073D0                 addi            a1, a1, 0xA0
.code_seg_0:402073D3                 s32i            a15, a1, 0x4C
.code_seg_0:402073D6                 l32r            a15, dword_4020738C
.code_seg_0:402073D9                 s32i            a0, a1, 0x5C
.code_seg_0:402073DC                 mov.n           a2, a15
.code_seg_0:402073DE                 s32i            a12, a1, 0x58
.code_seg_0:402073E1                 s32i            a13, a1, 0x54
.code_seg_0:402073E4                 s32i            a14, a1, 0x50
.code_seg_0:402073E7                 call0           delay
.code_seg_0:402073EA                 l32r            a2, dword_40207390
.code_seg_0:402073ED                 l32r            a14, dword_402072B8
.code_seg_0:402073F0                 l32i.n          a3, a2, 0
.code_seg_0:402073F2                 mov.n           a13, a14
.code_seg_0:402073F4                 addi.n          a3, a3, 1
.code_seg_0:402073F6                 s32i.n          a3, a2, 0
.code_seg_0:402073F8                 l32r            a3, off_402072C0
.code_seg_0:402073FB                 mov.n           a2, a14
.code_seg_0:402073FD                 call0           _ZN5Print5printEPKc ; Print::print(
char
const
*)
.code_seg_0:40207400                 l32r            a12, off_40207394
.code_seg_0:40207403                 mov.n           a2, a14
.code_seg_0:40207405                 l32i.n          a3, a12, 0
.code_seg_0:40207407                 call0           _ZN5Print7printlnEPKc ; Print::println(
char
const
*)
.code_seg_0:4020740A                 mov.n           a2, a1
.code_seg_0:4020740C                 call0           sub_40208454
.code_seg_0:4020740F                 l32i.n          a3, a12, 0
.code_seg_0:40207411                 movi            a4, 0x539
.code_seg_0:40207414                 mov.n           a2, a1
.code_seg_0:40207416                 call0           sub_40207A58
.code_seg_0:40207419                 movi            a2, 0x3E8
.code_seg_0:4020741C                 call0           delay
.code_seg_0:4020741F                 l32i.n          a3, a12, 0
.code_seg_0:40207421                 l32r            a4, dword_40207398
.code_seg_0:40207424                 mov.n           a2, a1
.code_seg_0:40207426                 call0           sub_40207A58
.code_seg_0:40207429                 movi            a2, 0x3E8
.code_seg_0:4020742C                 call0           delay
.code_seg_0:4020742F                 l32i.n          a3, a12, 0
.code_seg_0:40207431                 l32r            a4, dword_4020739C
.code_seg_0:40207434                 mov.n           a2, a1
.code_seg_0:40207436                 call0           sub_40207A58
.code_seg_0:40207439                 movi            a2, 0x3E8
.code_seg_0:4020743C                 call0           delay
.code_seg_0:4020743F                 l32i.n          a3, a12, 0
.code_seg_0:40207441                 l32r            a4, dword_402073A0
.code_seg_0:40207444                 mov.n           a2, a1
.code_seg_0:40207446                 call0           sub_40207A58
.code_seg_0:40207449                 movi            a2, 0x3E8
.code_seg_0:4020744C                 call0           delay
.code_seg_0:4020744F                 l32i.n          a3, a12, 0
.code_seg_0:40207451                 mov.n           a2, a1
.code_seg_0:40207453                 movi            a4, 0x1BD
.code_seg_0:40207456                 call0           sub_40207A58
.code_seg_0:40207459                 bnez.n          a2, loc_40207469
[...]

As an educated guess, we can assume this is probably the loop function of our image, which is responsible for doing most of the heavy leg work.

Understanding the Xtensa instruction set

In order to understand what the instructions above are doing, we want to have at least a passing familarity with what registers the processor uses and their purpose, as well as what common instructions do and how conditional jumps work.

It turns out someone has documented most of the common instructions of the Xtensa processor.

An excerpt of this guide, which covers loading and storing instructions, as well as register usage, is below:

This is a load/store machine with either 16 or 24 bit instructions. This leads to higher code density than with constand 32 bit encoding. Some instructions have optional “short” 16 bit encodings indicated by appending “.n” to the mnemonic. The Xtensa implements SPARC like register windows on subroutine calls, but I have never seen this feature used in either the bootrom or code generated by gcc, so this can be ignored.
There are 16 tegisters named a0 through a15.

a0 is special – it holds the call return address.
a1 is used by gcc as a stack pointer.
a2 gets used to pass a single argument (and to return a function value).

Understanding the Xtensa calling convention

It would also be helpful to understand the calling convention, which describes how arguments are passed to function calls. I found this document, which describes the calling convention as follows:

Arguments are passed in both registers and memory. The first six incoming arguments are stored in registers a2 through a7, and additional arguments are stored on the stack starting at the current stack pointer a1. […].

Thus we can determine that registers a2 to a7, in most cases will be used to store arguments passed to functions. The register a2 is also used when passing a single argument to a function.

Reversing ESP8266 Firmware (Part 4)

( original text by @boredpentester )

Writing an IDA loader

So, why a loader? The main reason was that I wanted something I could re-use when reversing future ESP8266 firmware dumps.

Our loader will be quite simple. IDA loaders typically define the following functions:

1
2
def
accept_file(li, n):
def
load_file(li, neflags,
format
):

The first is responsible for identifying an applicable file, based on its signature and is executed when you open a file in IDA for analysis. The second, for interpreting the file, setting entry points, processor, as well as loading and naming segments accordingly. Our loader won’t perform any sanity checking, but should be able to load an image for us.

My loader is derived from the existing loader classes shipped with IDA and of-course, is built to take into account the format we’ve dissected above. It will attempt to identify the firmware image based on signature (image magic), followed by loading each of the segments into memory, whilst trying to guess the names and types of segments based on their loading address.

Below is the Python code for our loader, which lives in IDA’s loader directory:

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
#!/usr/bin/python
from
struct
import
unpack_from
from
idaapi
import
*
def
accept_file(li, n):
    
retval
=
0
    
if
n
=
=
0
:
        
li.seek(
0
)
        
if
li.read(
2
)
=
=
"e901"
.decode(
"hex"
):
            
retval
=
"ESP8266 firmware"
    
return
retval
def
load_file(li, neflags,
format
):
    
li.seek(
0
)
    
# set processor type (doesn't appear to work)
    
SetProcessorType(
"xtensa"
, SETPROC_ALL);
    
# load ROM segment
    
(magic, segments, flash_mode, flash_size_freq, entrypoint)
=
struct.unpack(
'<BBBBI'
, li.read(
8
))
    
print
"Reading ROM boot firmware"
    
print
"Magic: %x"
%
magic
    
print
"Segments: %x"
%
segments
    
print
"Entry point: %x"
%
entrypoint
    
print
"\n"
    
(rom_addr, rom_size)
=
unpack_from(
"<II"
,li.read(
8
))
    
li.file2base(
16
, rom_addr, rom_addr
+
rom_size,
True
)
    
add_segm(
0
, rom_addr, rom_addr
+
rom_size,
".boot_rom"
,
"CODE"
)
    
idaapi.add_entry(
0
, entrypoint,
"rom_entry"
,
1
)
    
print
"Reading boot loader code"
    
print
"ROM address: %x"
%
rom_addr
    
print
"ROM size: %x"
%
rom_size
    
print
"\n"
    
# Go to user ROM code
    
li.seek(
0x1000
,
0
)
    
# load ROM segment
    
(magic, segments, flash_mode, flash_size_freq, entrypoint)
=
struct.unpack(
'<BBBBI'
, li.read(
8
))
    
idaapi.add_entry(
1
, entrypoint,
"user_entry"
,
1
)
    
print
"Reading user firmware"
    
print
"Magic: %x"
%
magic
    
print
"Segments: %x"
%
segments
    
print
"Entry point: %x"
%
entrypoint
    
print
"\n"
    
print
"Reading user code"
      
    
for
k
in
xrange
(segments):
        
(seg_addr, seg_size)
=
unpack_from(
"<II"
,li.read(
8
))
        
file_offset
=
li.tell()
        
if
(seg_addr
=
=
0x40100000
):
            
seg_name
=
".user_rom"
            
seg_type
=
"CODE"
        
elif
(seg_addr
=
=
0x3FFE8000
):
            
seg_name
=
".user_rom_data"
            
seg_type
=
"DATA"
        
elif
(seg_addr <
=
0x3FFFFFFF
):
            
seg_name
=
".data_seg_%d"
%
k
            
seg_type
=
"DATA"
        
elif
(seg_addr >
0x40100000
):
            
seg_name
=
".code_seg_%d"
%
k
            
seg_type
=
"CODE"
        
else
:
            
seg_name
=
".unknown_seg_%d"
%
k
            
seg_type
=
"CODE"
        
print
"Seg name: %s"
%
seg_name
        
print
"Seg type: %s"
%
seg_type
        
print
"Seg address: %x"
%
seg_addr
        
print
"Seg size: %x"
%
seg_size
        
print
"\n"
        
li.file2base(file_offset, seg_addr, seg_addr
+
seg_size,
True
)
        
add_segm(
0
, seg_addr, seg_addr
+
seg_size, seg_name, seg_type)
        
        
li.seek(file_offset
+
seg_size,
0
)
    
return
1

As you can see, the user segment loading loop, which iterates over each of the segments within ROM 1, attempts to perform some basic classification and naming based on the load address of the given segment, per our rules mentioned earlier.

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
if
(seg_addr
=
=
0x40100000
):
    
seg_name
=
".user_rom"
    
seg_type
=
"CODE"
elif
(seg_addr
=
=
0x3FFE8000
):
    
seg_name
=
".user_rom_data"
    
seg_type
=
"DATA"
elif
(seg_addr <
=
0x3FFFFFFF
):
    
seg_name
=
".data_seg_%d"
%
k
    
seg_type
=
"DATA"
elif
(seg_addr >
0x40100000
):
    
seg_name
=
".code_seg_%d"
%
k
    
seg_type
=
"CODE"
else
:
    
seg_name
=
".unknown_seg_%d"
%
k
    
seg_type
=
"CODE"

With this loader in use, IDA now recognises our firmware image:

Our segments look a lot tidier:

And we have an entry point! (of the user ROM):

Whilst we’re in a good state to perform cursory analysis, we don’t have any function names to base our analysis on. Ideally, we’d like to identify the routine(s) responsible for connecting to a given port and locate the references to that function, as well as make sense of any other library function calls. This will allow us to discover the ports knocked on, as well as the order of which knocking should take place.

Performing library recognition

There are known and documented methods to identify library functions within a statically linked, stripped image. The most known of which is to use IDA’s Fast Library Acquisition for Identification and Recognition(FLAIR) tools, which in turn creates Fast Library Identification and Recognition Technology (FLIRT) signatures.

The process of creating FLIRT signatures usually requires a number of prerequisite conditions to exist:

  • A pattern file must be created via either pelf or similar, followed by use of sigmake
  • A compiled, relocatable library containing the functions and associated names, of which signatures are to be generated against, must exist
  • The library must be a recognised format and with a supported instruction set

This poses two problems, the first is that we don’t have such a library available to us at present, the second is that Xtensa is not a supported processor type, as shown below.

1
2
3
4
5
josh@ioteeth:/tmp/flair68/bin/linux$ ./pelf
ELF parser. Copyright (c) 2000-2015 Hex-Rays SA. Version 1.16
Supported processors: MIPS, I960, ARM, IBM PC, M6812, SuperH
Usage: ./pelf [-switch or @file or $env_var] file [pattern-file]
    
(wildcards are allowed)

The result is that we can’t create pattern files using IDA’s traditional toolset.

The solution to these problems, which we’ll tackle in a moment (not without their own obstacles) are as follows:

  • We need to install a suitable IDE capable of compiling code for the ESP8266
  • We need to write code that hopefully, uses the same libraries as our target
  • We need to compile our code into an ELF file that is statically linked, unstripped and with debug info.
  • We need to find a way to create signatures from said ELF file

The first step is involved and beyond the scope of this blog post. I’ve opted to use Arduino IDE and configured it to compile for a generic ESP8266 module, with verbose compiler output enabled.

With our environment configured, we can look up example sketches for the ESP8266, we want to find one that performs a similar function to our target. Fortunately, a Github of example code exists, which can help us.

Searching the repository, we see a promising file, WiFiClient.ino, which contains the following code:

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
/*
    
This sketch sends data via HTTP GET requests to data.sparkfun.com service.
    
You need to get streamId and privateKey at data.sparkfun.com and paste them
    
below. Or just customize this script to talk to other HTTP servers.
*/
#include <ESP8266WiFi.h>
const
char
* ssid     =
"your-ssid"
;
const
char
* password =
"your-password"
;
const
char
* host =
"data.sparkfun.com"
;
const
char
* streamId   =
"...................."
;
const
char
* privateKey =
"...................."
;
void
setup() {
  
Serial.begin(115200);
  
delay(10);
  
// We start by connecting to a WiFi network
  
Serial.println();
  
Serial.println();
  
Serial.print(
"Connecting to "
);
  
Serial.println(ssid);
  
/* Explicitly set the ESP8266 to be a WiFi-client, otherwise, it by default,
     
would try to act as both a client and an access-point and could cause
     
network-issues with your other WiFi-devices on your WiFi-network. */
  
WiFi.mode(WIFI_STA);
  
WiFi.begin(ssid, password);
  
while
(WiFi.status() != WL_CONNECTED) {
    
delay(500);
    
Serial.print(
"."
);
  
}
  
Serial.println(
""
);
  
Serial.println(
"WiFi connected"
);
  
Serial.println(
"IP address: "
);
  
Serial.println(WiFi.localIP());
}
int
value = 0;
void
loop() {
  
delay(5000);
  
++value;
  
Serial.print(
"connecting to "
);
  
Serial.println(host);
  
// Use WiFiClient class to create TCP connections
  
WiFiClient client;
  
const
int
httpPort = 80;
  
if
(!client.connect(host, httpPort)) {
    
Serial.println(
"connection failed"
);
    
return
;
  
}
  
// We now create a URI for the request
  
String url =
"/input/"
;
  
url += streamId;
  
url +=
"?private_key="
;
  
url += privateKey;
  
url +=
"&value="
;
  
url += value;
  
Serial.print(
"Requesting URL: "
);
  
Serial.println(url);
  
// This will send the request to the server
  
client.print(String(
"GET "
) + url +
" HTTP/1.1\r\n"
+
               
"Host: "
+ host +
"\r\n"
+
               
"Connection: close\r\n\r\n"
);
  
unsigned
long
timeout = millis();
  
while
(client.available() == 0) {
    
if
(millis() - timeout > 5000) {
      
Serial.println(
">>> Client Timeout !"
);
      
client.stop();
      
return
;
    
}
  
}
  
// Read all the lines of the reply from server and print them to Serial
  
while
(client.available()) {
    
String line = client.readStringUntil(
'\r'
);
    
Serial.print(line);
  
}
  
Serial.println();
  
Serial.println(
"closing connection"
);
}

Based on the included files, we can see that this code uses the ESP8266WiFi library, which was displayed in our strings output earlier:

1
2
3
4
5
6
7
josh@ioteeth:/tmp/reversing$ strings recovered_file | grep -i wifi
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
[...]
ap_probe_send over, rest wifi status to disassoc
WiFi connected

This is a good sign, as it’s indicative that at the very least, we’re compiling a Sketch which uses the relevant, identical or similar libraries (there may be version discrepancies) to our target firmware image. This increases the likelihood of successful function identification, based on the signatures we’ll obtain.

Compiling the above sketch, results in the following notable compiler output:

1
"/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/tools/esptool/esptool" -eo "/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/bootloaders/eboot/eboot.elf" -bo "/tmp/arduino_build_867542/sketch_may24a.ino.bin" -bm qio -bf 40 -bz 512K -bs .text -bp 4096 -ec -eo "/tmp/arduino_build_867542/sketch_may24a.ino.elf" -bs .irom0.text -bs .text -bs .data -bs .rodata -bc -ec

Which presents us with an ELF file, prior to its transformation into firmware, which is as follows:

1
2
josh@ioteeth:/tmp/reversing$ file /tmp/arduino_build_867542/sketch_may24a.ino.elf
/tmp/arduino_build_867542/sketch_may24a.ino.elf: ELF 32-bit LSB executable, Tensilica Xtensa, version 1 (SYSV), statically linked, with debug_info, not stripped

Loading this ELF file into IDA, we can see we’ve got sensible function names! As depicted below:

So, how can we generate a pattern file from the above ELF to create a FLIRT signature? After much research, I found Fire Eye’s IDB2PAT tool, created by the FLARE the division of Fire Eye.

This tool is described as follows:

This script allows you to easily generate function patterns from an existing IDB database that can then be turned into FLIRT signatures to help identify similar functions in new files. More information is available at: https://www.fireeye.com/blog/threat-research/2015/01/flare_ida_pro_script.html

Fixing IDB2PAT

Having installed this plugin, it initially didn’t work at all for my version of IDA (6.8). This appeared to be the result of IDA using QT5 as opposed to Pyside in later versions (7.x), where the plugin was migrated to support version 7.x of IDA and not version 6.8.

Scrolling through the plugin’s known issues, someone pointed out the above and recommended an earlier version be used, which worked with IDA 6.8. I checked out an earlier commit. No more IDA plugin errors.

Did the plugin work? No. It got stuck in an infinite loop upon being launched. It turned out this issue was related to the version I had containing a bug, where functions less than 32 bytes would cause an infinite loop. To fix this issue, I downloaded the latest version of the individual script file, in which the bug was apparently fixed.

The result, yet another issue:

This was seemingly due to a version discrepancy between the installed and targeted IDA SDK. I fixed the plugin by updating the relevant function call “get_name(…)” to “GetFunctionName(…)”. I also added code to ignore functions that started with the word “sub_”, as these were undefined and not useful to me.

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
# ported from IDB2SIG plugin updated by TQN
def
make_func_sig(config, func):
    
"""
    
type config: Config
    
type func: idc.func_t
    
"""
    
logger
=
logging.getLogger(
"idb2pat:make_func_sig"
)
    
if
func.endEA
-
func.startEA < config.min_func_length:
        
logger.debug(
"Function is too short"
)
        
raise
FuncTooShortException()
    
ea
=
func.startEA
    
publics
=
[] 
# type: idc.ea_t
    
refs
=
{} 
# type: dict(idc.ea_t, idc.ea_t)
    
variable_bytes
=
set
([]) 
# type: set of idc.ea_t
    
if
(GetFunctionName(ea).startswith(
"sub_"
)):
        
logger.info(
"Ignoring %s"
, GetFunctionName(ea))
        
raise
FuncTooShortException

Generating our pattern file

With these changes made, the plugin appeared to work:

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
[...]
INFO:idb2pat:make_func_sigs:[ 38 / 1888 ] RC_GetAckTime 0x401010b8L
INFO:idb2pat:make_func_sigs:[ 39 / 1888 ] RC_GetCtsTime 0x401010ccL
INFO:idb2pat:make_func_sigs:[ 40 / 1888 ] RC_GetBlockAckTime 0x40101104L
INFO:idb2pat:make_func_sigs:[ 41 / 1888 ] sub_40101144 0x40101144L
INFO:idb2pat:make_func_sig:Ignoring sub_40101144
INFO:idb2pat:make_func_sigs:[ 42 / 1888 ] sub_40101178 0x40101178L
INFO:idb2pat:make_func_sig:Ignoring sub_40101178
INFO:idb2pat:make_func_sigs:[ 43 / 1888 ] sub_4010122C 0x4010122cL
INFO:idb2pat:make_func_sig:Ignoring sub_4010122C
INFO:idb2pat:make_func_sigs:[ 44 / 1888 ] sub_4010125C 0x4010125cL
INFO:idb2pat:make_func_sig:Ignoring sub_4010125C
INFO:idb2pat:make_func_sigs:[ 45 / 1888 ] sub_401012F4 0x401012f4L
INFO:idb2pat:make_func_sig:Ignoring sub_401012F4
INFO:idb2pat:make_func_sigs:[ 46 / 1888 ] rcUpdateTxDone 0x40101350L
[...]

Finally, we had a generated pattern file, of which we could run sigmake against.

1
2
3
josh@ioteeth:/tmp/flair68/bin/linux$ ./sigmake ../../../reversing/sketch_may24a.ino.pat /tmp/reversing/esp_lib_sigs.sig
/tmp/reversing/esp_lib_sigs.sig: modules/leaves: 1538/1547, COLLISIONS: 6
See the documentation to learn how to resolve collisions.

We can see six collisions have occurred. In this context, a collision is generated when sigmake encounters the same signature for more than one function. When this happens, it will generate a .exc file listing the collisions, which we can modify to instruct IDA to use one signature over another, for example.

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
josh@ioteeth:/tmp/flair68/bin/linux$ scat /tmp/reversing/esp_lib_sigs.exc
;--------- (delete these lines to allow sigmake to read this file)
; add '+' at the start of a line to select a module
; add '-' if you are not sure about the selection
; do nothing if you want to exclude all modules
wifi_register_rfid_locp_recv_cb                     00 0000 12C1F00261008548EA02210012C110800000............................
wifi_unregister_rfid_locp_recv_cb                   00 0000 12C1F00261008548EA02210012C110800000............................
_ZN5Print5printEPKc                                 00 0000 12C1F0093185FCFF083112C1100DF0..................................
udp_new_ip_type                                     00 0000 12C1F0093185FCFF083112C1100DF0..................................
pgm_read_byte_inlined                               00 0000 2030143022C02802D033110003402030913020740DF0....................
pgm_read_byte_inlined_0                             00 0000 2030143022C02802D033110003402030913020740DF0....................
_ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE0EED2Ev  00 0000 31FFFF39020DF0..................................................
_ZN10DataSourceD2Ev                                 00 0000 31FFFF39020DF0..................................................
_ZN14HardwareSerialD2Ev                             00 0000 31FFFF39020DF0..................................................
_ZN2fs8FileImplD2Ev                                 00 0000 31FFFF39020DF0..................................................
_ZN2fs7DirImplD2Ev                                  00 0000 31FFFF39020DF0..................................................
glue2git_err                                        00 0000 32C2101C047C3237340D21FCFF3A322203008022012028310DF0............
git2glue_err                                        00 0000 32C2101C047C3237340D21FCFF3A322203008022012028310DF0............
system_rtc_mem_write                                00 0000 52A0BF2735149C130C37306014CCA6E0921182A3009088C047A8030C020DF047
system_rtc_mem_read                                 00 0000 52A0BF2735149C130C37306014CCA6E0921182A3009088C047A8030C020DF047

I’ve edited my exclusion file as follows:

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
+wifi_register_rfid_locp_recv_cb                    00 0000 12C1F00261008548EA02210012C110800000............................
wifi_unregister_rfid_locp_recv_cb                   00 0000 12C1F00261008548EA02210012C110800000............................
+_ZN5Print5printEPKc                                00 0000 12C1F0093185FCFF083112C1100DF0..................................
udp_new_ip_type                                     00 0000 12C1F0093185FCFF083112C1100DF0..................................
+pgm_read_byte_inlined                              00 0000 2030143022C02802D033110003402030913020740DF0....................
pgm_read_byte_inlined_0                             00 0000 2030143022C02802D033110003402030913020740DF0....................
_ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE0EED2Ev  00 0000 31FFFF39020DF0..................................................
_ZN10DataSourceD2Ev                                 00 0000 31FFFF39020DF0..................................................
_ZN14HardwareSerialD2Ev                             00 0000 31FFFF39020DF0..................................................
+_ZN2fs8FileImplD2Ev                                00 0000 31FFFF39020DF0..................................................
-_ZN2fs7DirImplD2Ev                                 00 0000 31FFFF39020DF0..................................................
+glue2git_err                                       00 0000 32C2101C047C3237340D21FCFF3A322203008022012028310DF0............
git2glue_err                                        00 0000 32C2101C047C3237340D21FCFF3A322203008022012028310DF0............
+system_rtc_mem_write                               00 0000 52A0BF2735149C130C37306014CCA6E0921182A3009088C047A8030C020DF047
system_rtc_mem_read                                 00 0000 52A0BF2735149C130C37306014CCA6E0921182A3009088C047A8030C020DF047

Re-running sigmake and correcting one last collision, followed by running again, finally results in a usable signature file.

Applying these signatures against our firmware file, resolves many of the library functions present, including the connect() call (which we hope is the one we want):

Reversing ESP8266 Firmware (Part 3)

( original text by @boredpentester )

What is it?

So, what is the ESP8266? Wikipedia describes it as follows:

The ESP8266 is a low-cost Wi-Fi microchip with full TCP/IP stack and microcontroller capability produced by Shanghai-based Chinese manufacturer, Espressif Systems.

Moreover, Wikipedia alludes to the processor specifics:

Processor: L106 32-bit RISC microprocessor core based on the Tensilica Xtensa Diamond Standard 106Micro running at 80 MHz”

At present, my version of IDA does not recognise this processor, but looking up “IDA Xtensa” unveils a processor module to support the instruction set, which is described as follows:

This is a processor plugin for IDA, to support the Xtensa core found in Espressif ESP8266.

With the above information, we’ve also answered our second question of “What is the processor?“.

Understanding the firmware format

Now that IDA can understand the instruction set of the processor, it’s time to learn how firmware images are comprised in terms of format, data and code. Indeed, what is the format of our firmware image?. To help answer this question, my first point of call was to analyse existing open source tools published by Expressif, in order to work with the ESP8266.

This leads us to ESPTool, an application written in Python capable of displaying some information about binary firmware images, amongst other things.

The manual for this tool also gives away some important information:

The elf2image command converts an ELF file (from compiler/linker output) into the binary executable images which can be flashed and then booted into.

From this, we can determine that compiled images, prior to their transformation into firmware, exist in the ELF-32 Xtensa format. This will be useful later on.

Moving back to the other features of ESPTool, we see it’s indeed able to present information about our firmware image:

1
2
3
4
5
6
7
josh@ioteeth:/tmp/reversing$ ~/esptool/esptool.py image_info recovered_file
esptool.py v2.4.0-dev
Image version: 1
Entry point: 4010f29c
1 segments
Segment 1: len 0x00568 load 0x4010f000 file_offs 0x00000008
Checksum: 2d (valid)

Clearly, this application understands the format of an image, so let’s take it apart and see how it works.

Browsing through the code, we come across:

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
# Memory addresses
IROM_MAP_START
=
0x40200000
IROM_MAP_END
=
0x40300000
[...]
class
ESPFirmwareImage(BaseFirmwareImage):
    
""" 'Version 1' firmware image, segments loaded directly by the ROM bootloader. """
    
ROM_LOADER
=
ESP8266ROM
    
def
__init__(
self
, load_file
=
None
):
        
super
(ESPFirmwareImage,
self
).__init__()
        
self
.flash_mode
=
0
        
self
.flash_size_freq
=
0
        
self
.version
=
1
        
if
load_file
is
not
None
:
            
segments
=
self
.load_common_header(load_file, ESPLoader.ESP_IMAGE_MAGIC)
            
for
_
in
range
(segments):
                
self
.load_segment(load_file)
            
self
.checksum
=
self
.read_checksum(load_file)
[...]
class
BaseFirmwareImage(
object
):
    
SEG_HEADER_LEN
=
8
    
""" Base class with common firmware image functions """
    
def
__init__(
self
):
        
self
.segments
=
[]
        
self
.entrypoint
=
0
    
def
load_common_header(
self
, load_file, expected_magic):
            
(magic, segments,
self
.flash_mode,
self
.flash_size_freq,
self
.entrypoint)
=
struct.unpack(
'<BBBBI'
, load_file.read(
8
))
            
if
magic !
=
expected_magic
or
segments >
16
:
                
raise
FatalError(
'Invalid firmware image magic=%d segments=%d'
%
(magic, segments))
            
return
segments
    
def
load_segment(
self
, f, is_irom_segment
=
False
):
        
""" Load the next segment from the image file """
        
file_offs
=
f.tell()
        
(offset, size)
=
struct.unpack(
'<II'
, f.read(
8
))
        
self
.warn_if_unusual_segment(offset, size, is_irom_segment)
        
segment_data
=
f.read(size)
        
if
len
(segment_data) < size:
            
raise
FatalError(
'End of file reading segment 0x%x, length %d (actual length %d)'
%
(offset, size,
len
(segment_data)))
        
segment
=
ImageSegment(offset, segment_data, file_offs)
        
self
.segments.append(segment)
        
return
segment

All of the above code is notable. It allows us to discern the structure of the firmware image.

The function load_common_header() details the following format:

1
(magic, segments,
self
.flash_mode,
self
.flash_size_freq,
self
.entrypoint)
=
struct.unpack(
'<BBBBI'
, load_file.read(
8
))

Which represented as a structure would look like this:

1
2
3
4
5
6
7
typedef
struct
{
    
uint8 magic;
    
uint8 sect_count;
    
uint8 flash_mode;
    
uint8 flash_size_freq;
    
uint32 entry_addr;
} rom_header;

We can see from the function load_segment() that following our image header are the image segment headers, followed immediately by the segment data itself, for each segment.

The following code parses a segment header:

1
(offset, size)
=
struct.unpack(
'<II'
, f.read(
8
))

Which again, represented as a structure would be as follows:

1
2
3
4
typedef
struct
{
    
uint32 seg_addr;
    
uint32 seg_size;
} segment_header;

This is helpful, we now know both the format of the firmware image and a number of the tools available to process such images. It’s worth noting that we haven’t considered elements such as checksums, but these aren’t important to us as we don’t intend on patching the firmware image.

Whilst a tangent, it’s worth noting that whilst in this case, our format has been documented and tools exist to parse such formats, often this is not the case. In such cases, I’d advise obtaining as many firmware images as you can from your target devices. At that point, a starting point could be to find commonalities between them, which could indicate what certain bytes mean within the format. Also of use would be to understand how an image is booted into, as the bootloader may act differently depending on certain values at fixed offsets.

Understanding the boot process

So, onto our next question, what is the boot process of the device? Understanding this is important as it will help to clarify our understanding of the image. Richard Aburton has very helpfully reverse engineered the boot loader and described the following key point:

It finds the flash address of the rom to boot. Rom 1 is always at 0×1000 (next sector after boot loader). Rom 2 is half the chip size + 0×1000 (unless the chip is above a 1mb when it’s position it kept down to to 0×81000).

Checking the 0×1000 offset within our firmware image, there is indeed a second image, as denoted by presence of the image magic signature (0xE9):

01
02
03
04
05
06
07
08
09
10
11
josh@ioteeth:/tmp/reversing$ hexdump -s 0x1000 -v -C recovered_file | head
00001000  e9 04 00 00 30 64 10 40  10 10 20 40 c0 ed 03 00  |....0d.@.. @....|
00001010  43 03 ab 83 1c 00 00 60  00 00 00 60 1c 0f 00 60  |C......`...`...`|
00001020  00 0f 00 60 41 fc ff 20  20 74 c0 20 00 32 24 00  |...`A..  t. .2$.|
00001030  30 30 75 56 33 ff 31 f8  ff 66 92 08 42 a0 0d c0  |00uV3.1..f..B...|
00001040  20 00 42 63 00 51 f5 ff  c0 20 00 29 03 42 a0 7d  | .Bc.Q... .).B.}|
00001050  c0 20 00 38 05 30 30 75  37 34 f4 31 f1 ff 66 92  |. .8.00u74.1..f.|
00001060  06 0c d4 c0 20 00 49 03  c0 20 00 29 03 0d f0 00  |.... .I.. .)....|
00001070  b0 ff ff 3f 24 10 20 40  00 ed fe 3f 80 6e 10 40  |...?$. @...?.n.@|
00001080  04 ed fe 3f 79 6e 10 40  fc ec fe 3f f8 ec fe 3f  |...?yn.@...?...?|
00001090  6b 6e 10 40 61 6e 10 40  f6 ec fe 3f 52 6e 10 40  |kn.@an.@...?Rn.@|

This second firmware image sits almost immediately after the padding bytes we observed earlier. Based on the format, we can see from the second byte (0×04) that this ROM has 4 segments and is likely to be user or custom ROM code, with the first ROM image potentially being the bootloader of the device, responsible for bootstrapping.

Whilst there are a lot of nuances to the boot process, the above is all we really need to be aware of at this time.

Understanding the physical memory layout

Next, understanding the physical memory layout will help us to differentiate between data and code segments, assuming consistency between images. Whilst not entirely accurate, the physical memory layout of the ESP8266 has been documented.

From the information within, we can conclude the following:

  • 0×40100000 – Instruction RAM. Used by bootloader to load SPI Flash <40000h.
  • 0x3FFE8000 – User data RAM. Available to applications.
  • 0x3FFFFFFF – Anything below this address appears to be data, not code
  • 0×40100000 – Anything above this address appears to be code, not data

Anything that doesn’t match an address exactly, we’ll mark as unknown and classify as either code or data based on the rules above.

It should be noted that simply loading the file as ‘binary’ within IDA, having set the appropriate processor, allows for limited understanding and doesn’t display any xrefs to strings that could guide our efforts:

With this in mind, we can write a simple loader for IDA to identify the firmware image and load the segments accordingly, which should yield better results. We’ll use the memory map above as a guide to name the segments and mark them as code or data accordingly.

Reversing ESP8266 Firmware (Part 2)

( original text by @boredpentester )

Initial analysis

As with any unknown binary, our initial analysis will help to uncover any strings that may allude to what we’re looking at, as well as any signatures within the file that could present a point of further analysis. Lastly, we want to look at the hexadecimal representation of the file, in order to identify padding or other blocks of interest.

File output:

1
recovered_file: , code offset 0x1+3, Bytes/sector 26688, sectors/cluster 5, FATs 16, root entries 16, sectors 20480 (volumes &amp;lt;=32 MB), Media descriptor 0xf5, sectors/FAT 16400, sectors/track 19228, FAT (12 bit by descriptor)

BinWalk output:

01
02
03
04
05
06
07
08
09
10
11
josh@ioteeth:/tmp/reversing$ binwalk -v recovered_file
Scan Time:     2018-05-24 11:51:24
Target File:   /tmp/reversing/recovered_file
MD5 Checksum:  7e11dd07846ecfe2502df7ad0c75952a
Signatures:    344
DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
251942        0x3D826         Unix path: /tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
292925        0x4783D         Unix path: /tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/cores/esp8266/abi.cpp

We can see from the output above that we’re potentially looking at an ESP8266 firmware image. I’ll disregard the results of file as they’re clearly a false positive, based on the challenge pre-text.

Strings output:

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
[...]
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
_pos &amp;lt;= _streamPos
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
cb == stream_rem
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
_pos + size &amp;lt;= _size
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
_buffer
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
_pos &amp;lt;= _streamPos
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
cb == stream_rem
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
_pos + size &amp;lt;= _size
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
_send_waiting == 0
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/ClientContext.h
_datasource == nullptr
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/ClientContext.h
_connect_pending
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/ClientContext.h
pcb == _pcb
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/ClientContext.h
buffer == _data + _pos
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
_pos + size &amp;lt;= _size
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
[...]
 
AConnecting to
WiFi connected
IP address:
Port knocking failed
/get_secrets
Requesting URL:
GET
 
HTTP/1.1
Host:
Connection: close
&amp;gt;&amp;gt;&amp;gt; Client Timeout !
closing connection
services.ioteeth.com
AjT32156
PCSL_GUEST

Based on the strings above, in particular:

1
2
IP address:
Port knocking failed

It would appear our firmware is potentially performing a form of port knocking, before connecting to services.ioteeth.com to retrieve the aforementioned secrets. Port knocking is a means of instructing a firewall to open a predefined TCP/UDP port if the correct sequence of ports are ‘knocked’ on, which is usually performed via sending a TCP SYN packet to the required ports. The firewall would recognise the sequence and permit access to the defined resources.

We can perform a fast port scan to check for any filtered ports across a limited number of popular ports:

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
josh@ioteeth:/tmp$ nmap -n -PN -F -v services.ioteeth.com -oN services.ioteeth.com.out
Warning: The -PN option is deprecated. Please use -Pn
Starting Nmap 7.40 ( https://nmap.org ) at 2018-05-25 11:29 BST
Initiating Connect Scan at 11:29
Scanning services.ioteeth.com (192.168.1.69) [100 ports]
Discovered open port 22/tcp on 192.168.1.69
Completed Connect Scan at 11:29, 1.20s elapsed (100 total ports)
Nmap scan report for services.ioteeth.com (192.168.1.69)
Host is up (0.00059s latency).
Not shown: 98 closed ports
PORT    STATE    SERVICE
22/tcp  open     ssh
445/tcp filtered microsoft-ds
Read data files from: /usr/bin/../share/nmap
Nmap done: 1 IP address (1 host up) scanned in 1.23 seconds

We can see that port 445 is filtered, so this could be our target port and equally, this would explain why the service couldn’t be reached by the originator of this challenge. It’s also possible another port is the one being used to obtain/upload secrets and ultimately, we’ll find that port through investigation, we note this however as a passive observation.

Continuing with our analysis, let’s take a look at the hexdump of our firmware image:

01
02
03
04
05
06
07
08
09
10
11
josh@ioteeth:/tmp/reversing$ hexdump -v -C recovered_file | head
00000000  e9 01 00 00 9c f2 10 40  00 f0 10 40 68 05 00 00  |.......@...@h...|
00000010  10 10 00 00 50 f5 10 40  1c 4b 00 40 cc 24 00 40  |....P..@.K.@.$.@|
00000020  ff ff ff 3f ff ff 0f 40  ff 7f 10 40 ff ff ff 5f  |...?...@...@..._|
00000030  f0 ff ff 3f 00 10 00 00  1c e2 00 40 00 4a 00 40  |...?.......@.J.@|
00000040  4c 4a 00 40 00 07 00 60  00 00 00 80 e8 2b 00 40  |LJ.@...`.....+.@|
00000050  f0 30 00 40 a0 2f 00 40  b7 1d c1 04 00 12 00 60  |.0.@./.@.......`|
00000060  00 10 00 eb 7c 12 00 60  12 c1 d0 09 b1 f9 a1 fd  |....|..`........|
00000070  01 29 4f 38 4f 21 e6 ff  2a 23 4b 3f 0c 44 01 e6  |.)O8O!..*#K?.D..|
00000080  ff c0 00 00 8c 52 0c 12  86 07 00 00 00 21 e1 ff  |.....R.......!..|
00000090  29 0f 28 0f 28 02 29 2f  28 0f 28 12 29 3f 38 1f  |).(.(.)/(.(.)?8.|

We also note that further into the file is what appears to be padding, followed by more data:

01
02
03
04
05
06
07
08
09
10
11
12
13
00000570  68 f5 10 40 00 00 00 00  00 00 00 00 00 00 00 2d  |h..@...........-|
00000580  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
00000590  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
000005a0  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
000005b0  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
[...]
00000fd0  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
00000fe0  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
00000ff0  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
00001000  e9 04 00 00 30 64 10 40  10 10 20 40 c0 ed 03 00  |....0d.@.. @....|
00001010  43 03 ab 83 1c 00 00 60  00 00 00 60 1c 0f 00 60  |C......`...`...`|
00001020  00 0f 00 60 41 fc ff 20  20 74 c0 20 00 32 24 00  |...`A..  t. .2$.|
00001030  30 30 75 56 33 ff 31 f8  ff 66 92 08 42 a0 0d c0  |00uV3.1..f..B...|

This padding is potentially useful, as it could be indicative of multiple files or formats being present within our target firmware image.

At this point, we have a number of questions we need to answer before we can continue. We can theorise that our long-term goal, based on the strings observed within the file, is to uncover the port knocking sequence performed by the firmware, which will hopefully allow us to access the external service that’s communicated with. But how do we go about doing that?

It’s worth noting that IDA doesn’t recognise our file, so let’s pause and do some research first.

Questions we need to answer

As with any firmware image, a good starting point prior to reverse engineering is to understand the following:

  • What is the device in question?
  • What processor does it operate on?
  • What is the format of the firmware image in question?
  • What tools exist to process the type of firmware image we have, if any?
  • What is the boot process of the device?
  • What does the physical memory layout look like?

Throughout the course of this section, we’ll work towards learning the answers to the above and understanding how they can help us.

Our penultimate goal will be to load the firmware into IDA for analysis, in a way that allows us to make some sense of what’s happening.

Reversing ESP8266 Firmware (Part 1)

( original text by @boredpentester )

During my time with Cisco Portcullis, I wanted to learn more about reverse engineering embedded device firmware.

This six-part series was written both during my time with Cisco Portcullis, as well in my spare time (if the tagline of this blog didn’t give that away). This series intends to detail my analysis of an embedded device’s firmware, in this case, the ESP8266’s, present a methodology for analysing firmware more generally and of course, solve the challenge presented. It will be one of many series with a focus on firmware reverse engineering and security assessment of ‘smart’ devices.

We will cover the initial analysis, writing an IDA loader and recovering function symbols (names) via IDA’s FLAIR tools. Finally, we’ll reverse engineer the functionality required to solve the challenge, for extra points, without reliance upon string references.

I chose an ESP8266 firmware image supplied by a colleague as a contrived reversing challenge.  Mainly because this target made a good starting point due to the simplicity of the firmware format.

The challenge was described as follows:

We managed to obtain the firmware of an unknown device connected to our wireless access point. We’ve been told it’s connecting to a service and retrieving secrets, but we can’t reach the service. Can you?

Update: You can find the a slightly modified version of the supplied binary here (within the zip archive).

This series has been broken down into the following parts:

  • Part 1:
    • Introduction (you’re here now)
  • Part 2:
    • Initial analysis
    • Questions we need to answer
  • Part 3:
    • What is it?
    • Understanding the firmware format
    • Understanding the boot process
    • Understanding the physical memory layout
  • Part 4:
    • Writing an IDA loader
    • Performing library recognition
    • Fixing IDB2PAT
    • Generating our pattern file
  • Part 5:
    • Recognising VTABLE’s
    • Finding the port knock sequence
    • Understanding the Xtensa instruction set
    • Understanding the Xtensa calling convention
  • Part 6:
    • Reversing the loop function
    • Getting the secrets!
    • Conclusion
    • References
    • Tools
    • Feedback

Control Register Access Exiting and Crashing VMware

( origin text  from https://howtohypervise.blogspot.com )

Coinciding with my previous two posts, here’s how you can crash or at least detect VMware and many other hypervisors:
https://gist.github.com/drew1ind/d31840bebbbb1ff1d112a6f46e162c05Backstory:
When I was writing a simple SEH emulator (following the documentation on msdn as well as this excellent blogpost) for my hypervisor, I was testing under VMware.

When trying to execute that, I found that VMware would instantly close without any message. After being stumped for a while, I tried on my PC only to find that my SEH emulator did, in fact, work.
When I talked about it with my friend daax (whose blog can be found over at https://revers.engineering/), he recalled experiencing the exact same issue (albeit with different motivations): when he tried to unset CR0.pe to cause a #GP(0), VMware would just close on him. This eventually drove me to the conclusion that VMware was improperly handling the CR access VM exit for CR0, or more specifically, they don’t check that cr0.pg is already enabled, which would normally cause a #GP(0). Since they write the invalid value into the guest CR0 VMCS field, the processor objects upon VM entry in the form of a VM entry failure, which VMware responds to by just closing itself.
Additional checks in that gist linked above rely on hypervisors not properly emulating CPU behaviour, which includes:
  • Injecting an exception upon updating bits of CR0 required to be a certain value by the CPU for VMX operation or just not updating them at all
  • Not injecting a fault when the guest attempts to set a reserved bit of CR0, which can result in either VM entry failure or a triple fault due to repeated #GP(0)s
  • Updating the state of CR0 even though the write caused a #GP(0)
Fixes for such include:
  • Always checking if a change is valid before changing any CPU state
  • For control register bits that are documented, if they are changed, the hypervisor should ensure that the processor supports the bit and if the processor would inject an exception if the bit was changed (i.e with the VMware example, they should check if CR0.pg is set, and if it is, declare the change as invalid)
  • Control register bits that are forced to be a fixed value should be host owned bits which values only change in the read shadow
  • Control register bits which don’t exist at the time of writing the hypervisor should never be allowed to change — this also means that the hypervisor *must* control CPUID responses to remove reserved bits from responses, as well as reserved leafs