Practical Reverse Engineering Part 5 — Digging Through the Firmware

Projects and learnt lessons on Systems Security, Embedded Development, IoT and anything worth writing about

  • Part 1: Hunting for Debug Ports
  • Part 2: Scouting the Firmware
  • Part 3: Following the Data
  • Part 4: Dumping the Flash
  • Part 5: Digging Through the Firmware

In part 4 we extracted the entire firmware from the router and decompressed it. As I explained then, you can often get most of the firmware directly from the manufacturer’s website: Firmware upgrade binaries often contain partial or entire filesystems, or even entire firmwares.

In this post we’re gonna dig through the firmware to find potentially interesting code, common vulnerabilities, etc.

I’m gonna explain some basic theory on the Linux architecture, disassembling binaries, and other related concepts. Feel free to skip some of the parts marked as [Theory]; the real hunt starts at ‘Looking for the Default WiFi Password Generation Algorithm’. At the end of the day, we’re just: obtaining source code in case we can use it, using grep and common sense to find potentially interesting binaries, and disassembling them to find out how they work.

One step at a time.

Gathering and Analysing Open Source Components

GPL Licenses — What They Are and What to Expect [Theory]

Linux, U-Boot and other tools used in this router are licensed under the General Public License. This license mandates that the source code for any binaries built with GPL’d projects must be made available to anyone who wants it.

Having access to all that source code can be a massive advantage during the reversing process. The kernel and the bootloader are particularly interesting, and not just to find security issues.

When hunting for GPL’d sources you can usually expect one of these scenarios:

  1. The code is freely available on the manufacturer’s website, nicely ordered and completely open to be tinkered with. For instance: apple products or theamazon echo
  2. The source code is available by request
    • They send you an email with the sources you requested
    • They ask you for “a reasonable amount” of money to ship you a CD with the sources
  3. They decide to (illegally) ignore your requests. If this happens to you, consider being nice over trying to get nasty.

In the case of this router, the source code was available on their website, even though it was a huge pain in the ass to find; it took me a long time of manual and automated searching but I ended up finding it in the mobile version of the site:

ls -lh gpl_source

But what if they’re hiding something!? How could we possibly tell whether the sources they gave us are the same they used to compile the production binaries?

Challenges of Binary Verification [Theory]

Theoretically, we could try to compile the source code ourselves and compare the resulting binary with the one we extracted from the device. In practice, that is extremely more complicated than it sounds.

The exact contents of the binary are strongly tied to the toolchain and overall environment they were compiled in. We could try to replicate the environment of the original developers, finding the exact same versions of everything they used, so we can obtain the same results. Unfortunately, most compilers are not built with output replicability in mind; even if we managed to find the exact same version of everything, details like timestamps, processor-specific optimizations or file paths would stop us from getting a byte-for-byte identical match.

If you’d like to read more about it, I can recommend this paper. The authors go through the challenges they had to overcome in order to verify that the official binary releases of the application ‘TrueCrypt’ were not backdoored.

Introduction to the Architecture of Linux [Theory]

In multiple parts of the series, we’ve discussed the different components found in the firmware: bootloader, kernel, filesystem and some protected memory to store configuration data. In order to know where to look for what, it’s important to understand the overall architecture of the system. Let’s quickly review this device’s:

Linux Architecture

The bootloader is the first piece of code to be executed on boot. Its job is to prepare the kernel for execution, jump into it and stop running. From that point on, the kernel controls the hardware and uses it to run user space logic. A few more details on each of the components:

  1. Hardware: The CPU, Flash, RAM and other components are all physically connected
  2. Linux Kernel: It knows how to control the hardware. The developers take the Open Source Linux kernel, write drivers for their specific device and compile everything into an executable Kernel. It manages memory, reads and writes hardware registers, etc. In more complex systems, “kernel modules” provide the possibility of keeping device drivers as separate entities in the file system, and dynamically load them when required; most embedded systems don’t need that level of versatility, so developers save precious resources by compiling everything into the kernel
  3. libc (“The C Library”): It serves as a general purpose wrapper for the System Call API, including extremely common functions like printfmalloc or system. Developers are free to call the system call API directly, but in most cases, it’s MUCH more convenient to use libc. Instead of the extremely common glibc (GNU C library) we usually find in more powerful systems, this device uses a version optimised for embedded devices: uClibc.
  4. User Applications: Executable binaries in /bin/ and shared objects in /lib/(libraries that contain functions used by multiple binaries) comprise most of the high-level logic. Shared objects are used to save space by storing commonly used functions in a single location

Bootloader Source Code

As I’ve mentioned multiple times over this series, this router’s bootloader is U-Boot. U-Boot is GPL licensed, but Huawei failed to include the source code in their website’s release.

Having the source code for the bootloader can be very useful for some projects, where it can help you figure out how to run a custom firmware on the device or modify something; some bootloaders are much more feature-rich than others. In this case, I’m not interested in anything U-Boot has to offer, so I didn’t bother following up on the source code.

Kernel Source Code

Let’s just check out the source code and look for anything that might help. Remember the factory reset button? The button is part of the hardware layer, which means the GPIO pin that detects the button press must be controlled by the drivers. These are the logs we saw coming out of the UART port in a previous post:

UART system restore logs

With some simple grep commands we can see how the different components of the system (kernel, binaries and shared objects) can work together and produce the serial output we saw:

System reset button propagates to user space

Having the kernel can help us find poorly implemented security-related algorithms and other weaknesses that are sometimes considered ‘accepted risks’ by manufacturers. Most importantly, we can use the drivers to compile and run our own OS on the device.

User Space Source Code

As we can see in the GPL release, some components of the user space are also open source, such as busybox and iptables. Given the right (wrong) versions, public vulnerability databases could be enough to find exploits for any of these.

That being said, if you’re looking for 0-days, backdoors or sensitive data, your best bet is not the open source projects. Devic specific and closed source code developed by the manufacturer or one of their providers has not been so heavily tested and may very well be riddled with bugs. Most of this code is stored as binaries in the user space; we’ve got the entire filesystem, so we’re good.

Without the source code for user space binaries, we need to find a way to read the machine code inside them. That’s where disassembly comes in.

Binary Disassembly [Theory]

The code inside every executable binary is just a compilation of instructions encoded as Machine Code so they can be processed by the CPU. Our processor’s datasheet will explain the direct equivalence between assembly instructions and their machine code representations. A disassembler has been given that equivalence so it can go through the binary, find data and machine code andtranslate it into assembly. Assembly is not pretty, but at least it’s human-readable.

Due to the very low-level nature of the kernel, and how heavily it interacts with the hardware, it is incredibly difficult to make any sense of its binary. User space binaries, on the other hand, are abstracted away from the hardware and follow unix standards for calling conventions, binary format, etc. They’re an ideal target for disassembly.

There are lots of disassemblers for popular architectures like MIPS; some better than others both in terms of functionality and usability. I’d say these 3 are the most popular and powerful disassemblers in the market right now:

  • IDA Pro: By far the most popular disassembler/debugger in the market. It is extremely powerful, multi-platform, and there are loads of users, tutorials, plugins, etc. around it. Unfortunately, it’s also VERY expensive; a single person license of the Pro version (required to disassemble MIPS binaries) costs over $1000
  • Radare2: Completely Open Source, uses an impressively advanced command line interface, and there’s a great community of hackers around it. On the other hand, the complex command line interface -necessary for the sheer amount of features- makes for a rather steep learning curve
  • Binary Ninja: Not open source, but reasonably priced at $100 for a personal license, it’s middle ground between IDA and radare. It’s still a very new tool; it was just released this year, but it’s improving and gaining popularity day by day. It already works very well for some architectures, but unfortunately it’s still missing MIPS support (coming soon) and some other features I needed for these binaries. I look forward to giving it another try when it’s more mature

In order to display the assembly code in a more readable way, all these disasemblers use a “Graph View”. It provides an intuitive way to follow the different possible execution flows in the binary:

IDA Small Function Graph View

Such a clear representation of branches, and their conditionals, loops, etc. is extremely useful. Without it, we’d have to manually jump from one branch to another in the raw assembly code. Not so fun.

If you read the code in that function you can see the disassembler makes a great job displaying references to functions and hardcoded strings. That might be enough to help us find something juicy, but in most cases you’ll need to understand the assembly code to a certain extent.

Gathering Intel on the CPU and Its Assembly Code [Theory]

Let’s take a look at the format of our binaries:

$ file bin/busybox
bin/busybox: ELF 32-bit LSB executable, MIPS, MIPS-II version 1 (SYSV), dynamically linked (uses shared libs), corrupted section header size

Because ELF headers are designed to be platform-agnostic, we can easily find out some info about our binaries. As you can see, we know the architecture (32-bit MIPS), endianness (LSB), and whether it uses shared libraries.

We can verify that information thanks to the Ralink’s product brief, which specifies the processor core it uses: MIPS24KEc

Product Brief Highlighted Processor Core

With the exact version of the CPU core, we can easily find its datasheet as released by the company that designed it: Imagination Technologies.

Once we know the basics we can just drop the binary into the disassembler. It will help validate some of our findings, and provide us with the assembly code. In order to understand that code we’re gonna need to know the architecture’s instruction sets and register names:

  • MIPS Instruction Set
  • MIPS Pseudo-Instructions: Very simple combinations of basic instructions, used for developer/reverser convenience
  • MIPS Alternate Register Names: In MIPS, there’s no real difference between registers; the CPU doesn’t about what they’re called. Alternate register names exist to make the code more readable for the developer/reverser: $a0 to $a3 for function arguments, $t0 to $t9 for temporary registers, etc.

Beyond instructions and registers, some architectures may have some quirks. One example of this would be the presence of delay slots in MIPS: Instructions that appear immediately after branch instructions (e.g. beqzjalr) but are actually executed before the jump. That sort of non-linearity would be unthinkable in other architectures.

Some interesting links if you’re trying to learn MIPS: Intro to MIPS Reversing using Radare2MIPS Assembler and Runtime SimulatorToolchains to cross-compile for MIPS targets.

Example of User Space Binary Disassembly

Following up on the reset key example we were using for the Kernel, we’ve got the code that generated some of the UART log messages, but not all of them. Since we couldn’t find the ‘button has been pressed’ string in the kernel’s source code, we can deduce it must have come from user space. Let’s find out which binary printed it:

~/Tech/Reversing/Huawei-HG533_TalkTalk/router_filesystem
$ grep -i -r "restore default success" .
Binary file ./bin/cli matches
Binary file ./bin/equipcmd matches
Binary file ./lib/libcfmapi.so matches

3 files contain the next string found in the logs: 2 executables in /bin/ and 1 shared object in /lib/. Let’s take a look at /bin/equipcmd with IDA:

restore success string in /bin/equipcmd - IDA GUI

If we look closely, we can almost read the C code that was compiled into these instructions. We can see a “clear configuration file”, which would match the ERASEcommands we saw in the SPI traffic capture to the flash IC. Then, depending on the result, one of two strings is printed: restore default success or restore default fail . On success, it then prints something else, flushes some buffers and reboots; this also matches the behaviour we observed when we pressed the reset button.

That function is a perfect example of delay slots: the addiu instructions that set both strings as arguments —$a0— for the 2 puts are in the delay slots of the branch if equals zero and jump and link register instructions. They will actually be executed before branching/jumping.

As you can see, IDA has the name of all the functions in the binary. That won’t necessarily be the case in other binaries, and now’s a good time to discuss why.

Function Names in a Binary — Intro to Symbol Tables [Theory]

The ELF format specifies the usage of symbol tables: chunks of data inside a binary that provide useful debugging information. Part of that information are human-readable names for every function in the binary. This is extremely convenient for a developer debugging their binary, but in most cases it should be removed before releasing the production binary. The developers were nice enough to leave most of them in there 🙂

In order to remove them, the developers can use tools like strip, which know what must be kept and what can be spared. These tools serve a double purpose: They save memory by removing data that won’t be necessary at runtime, and they make the reversing process much more complicated for potential attackers. Function names give context to the code we’re looking at, which is massively helpful.

In some cases -mostly when disassembling shared objects- you may see somefunction names or none at all. The ones you WILL see are the Dynamic Symbols in the .dymsym table: We discussed earlier the massive amount of memory that can be saved by using shared objects to keep the pieces of code you need to re-use all over the system (e.g. printf()). In order to locate pieces of data inside the shared object, the caller uses their human-readable name. That means the names for functions and variables that need to be publicly accessible must be left in the binary. The rest of them can be removed, which is why ELF uses 2 symbol tables: .dynsym for publicly accessible symbols and .symtab for the internal ones.

For more details on symbol tables and other intricacies of the ELF format, check out: The ELF Format — How programs look from the insideInside ELF Symbol Tablesand the ELF spec (PDF).

Looking for the Default WiFi Password Generation Algorithm

What do We Know?

Remember the wifi password generation algorithm we discussed in part 3? (The Pot of Gold at the End of the Firmware) I explained then why I didn’t expect this router to have one, but let’s take a look anyway.

If you recall, these are the default WiFi credentials in my router:

Router Sticker - Annotated

So what do we know?

  1. Each device is pre-configured with a different set of WiFi credentials
  2. The credentials could be hardcoded at the factory or generated on the device. Either way, we know from previous posts that both SSID and password are stored in the reserved area of Flash memory, and they’re right next to each other
    • If they were hardcoded at the factory, the router only needs to read them from a known memory location
    • If they are generated in the device and then stored to flash, there must be an algorithm in the router that -given the same inputs- always generates the same outputs. If the inputs are public (e.g. the MAC address) and we can find, reverse and replicate the algorithm, we could calculate default WiFi passwords for any other router that uses the same algorithm

Let’s see what we can do with that…

Finding Hardcoded Strings

Let’s assume there IS such algorithm in the router. Between username and password, there’s only one string that remains constant across devices:TALKTALK-. This string is prepended to the last 6 characters of the MAC address. If the generation algorithm is in the router, surely this string must be hardcoded in there. Let’s look it up:

$ grep -r 'TALKTALK-' .
Binary file ./bin/cms matches
Binary file ./bin/nmbd matches
Binary file ./bin/smbd matches

2 of those 3 binaries (nmbd and smbd) are part of samba, the program used to use the USB flash drive as a network storage device. They’re probably used to identify the router over the network. Let’s take a look at the other one: /bin/cms.

Reversing the Functions that Uses Them

IDA TALKTALK-XXXXXX String Being Built

That looks exactly the way we’d expect the SSID generation algorithm to look. The code is located inside a rather large function called ATP_WLAN_Init, and somewhere in there it performs the following actions:

  1. Find out the MAC address of the device we’re running on:
    • mac = BSP_NET_GetBaseMacAddress()
  2. Create the SSID string:
    • snprintf(SSID, "TALKTALK-%02x%02x%02x", mac[3], mac[4], mac[5])
  3. Save the string somewhere:
    • ATP_DBSetPara(SavedRegister3, 0xE8801E09, SSID)

Unfortunately, right after this branch the function simply does an ATP_DBSave and moves on to start running commands and whatnot. e.g.:

ATP_WLAN_Init moves on before you

Further inspection of this function and other references to ATP_DBSave did not reveal anything interesting.

Giving Up

After some time using this process to find potentially relevant pieces of code, reverse them, and analyse them, I didn’t find anything that looked like the password generation algorithm. That would confirm the suspicions I’ve had since we found the default credentials in the protected flash area: The manufacturer used proper security techniques and flashed the credentials at the factory, which is why there is no algorithm. Since the designers manufacture their own hardware, the decision makes perfect sense for this device. They can do whatever they want with their manufacturing lines, so they decided to do it right.

I might take another look at it in the future, or try to find it in some other router (I’d like to document the process of reversing it), but you should know this method DOES work for a lot of products. There’s a long history of freely available default WiFi password generators.

Since we already know how to find relevant code in the filesystem binaries, let’s see what else we can do with that knowledge.

Looking for Command Injection Vulnerabilities

One of the most common, easy to find and dangerous vulnerabilities is command injection. The idea is simple; we find an input string that is gonna be used as an argument for a shell command. We try to append our own commands and get them to execute, bypassing any filters that the developers may have implemented. In embedded devices, such vulnerabilities often result in full root control of the device.

These vulnerabilities are particularly common in embedded devices due to their memory constraints. Say you’re developing the web interface used by the users to configure the device; you want to add the possibility to ping a user-defined server from the router, because it’s very valuable information to debug network problems. You need to give the user the option to define the ping target, and you need to serve them the results:

Router WEB Interface Ping in action

Once you receive the data of which server to target, you have two options: You find a library with the ICMP protocol implemented and call it directly from the web backend, or you could use a single, standard function call and use the router’s already existing ping shell command. The later is easier to implement, saves memory, etc. and it’s the obvious choice. Taking user input (target server address) and using it as part of a shell command is where the danger comes in. Let’s see how this router’s web application, /bin/web, handles it:

/bin/web's ping function

A call to libc’s system() (not to be confused with a system call/syscall) is the easiest way to execute a shell command from an application. Sometimes developers wrap system() in custom functions in order to systematically filter all inputs, but there’s always something the wrapper can’t do or some developer who doesn’t get the memo.

Looking for references to system in a binary is an excellent way to find vectors for command injections. Just investigate the ones that look like may be using unfiltered user input. These are all the references to system() in the /bin/web binary:

xrefs to system in /bin/web

Even the names of the functions can give you clues on whether or not a reference to system() will receive user input. We can also see some references to PIN and PUK codes, SIMs, etc. Seems like this application is also used in some mobile product…

I spent some time trying to find ways around the filtering provided byatp_gethostbyname (anything that isn’t a domain name causes an error), but I couldn’t find anything in this field or any others. Further analysis may prove me wrong. The idea would be to inject something to the effects of this:

Attempt reboot injection on ping field

Which would result in this final string being executed as a shell command: ping google.com -c 1; reboot; ping 192.168.1.1 > /dev/null. If the router reboots, we found a way in.

As I said, I couldn’t find anything. Ideally we’d like to verify that for all input fields, whether they’re in the web interface or some other network interface. Another example of a network interface potentially vulnerable to remote command injections is the “LAN-Side DSL CPE Configuration” protocol, or TR-064. Even though this protocol was designed to be used over the internal network only, it’s been used to configure routers over the internet in the past. Command injection vulnerabilities in some implementations of this protocol have been used to remotely extract data like WiFi credentials from routers with just a few packets.

This router has a binary conveniently named /bin/tr064; if we take a look, we find this right in the main() function:

/bin/tr064 using /etc/serverkey.pem

That’s the private RSA key we found in Part 2 being used for SSL authentication. Now we might be able to supplant a router in the system and look for vulnerabilities in their servers, or we might use it to find other attack vectors. Most importantly, it closes the mistery of the private key we found while scouting the firmware.

Looking for More Complex Vulnerabilities [Theory]

Even if we couldn’t find any command injection vulnerabilities, there are always other vectors to gain control of the router. The most common ones are good old buffer overflows. Any input string into the router, whether it is for a shell command or any other purpose, is handled, modified and passed around the code. An error by the developer calculating expected buffer lengths, not validating them, etc. in those string operations can result in an exploitable buffer overflow, which an attacker can use to gain control of the system.

The idea behind a buffer overflow is rather simple: We manage to pass a string into the system that contains executable code. We override some address in the program so the execution flow jumps into the code we just injected. Now we can do anything that binary could do -in embedded systems like this one, where everything runs as root, it means immediate root pwnage.

Introducing an unexpectedly long input

Developing an exploit for this sort of vulnerability is not as simple as appending commands to find your way around a filter. There are multiple possible scenarios, and different techniques to handle them. Exploits using more involved techniques like ROP can become necessary in some cases. That being said, most household embedded systems nowadays are decades behind personal computers in terms of anti-exploitation techniques. Methods like Address Space Layout Randomization(ASLR), which are designed to make exploit development much more complicated, are usually disabled or not implemented at all.

If you’d like to find a potential vulnerability so you can learn exploit development on your own, you can use the same techniques we’ve been using so far. Find potentially interesting inputs, locate the code that manages them using function names, hardcoded strings, etc. and try to trigger a malfunction sending an unexpected input. If we find an improperly handled string, we might have an exploitable bug.

Once we’ve located the piece of disassembled code we’re going to attack, we’re mostly interested in string manipulation functions like strcpystrcat,sprintf, etc. Their more secure counterparts strncpystrncat, etc. are also potentially vulnerable to some techniques, but usually much more complicated to work with.

Pic of strcpy handling an input

Even though I’m not sure that function -extracted from /bin/tr064— is passed any user inputs, it’s still a good example of the sort of code you should be looking for. Once you find potentially insecure string operations that may handle user input, you need to figure out whether there’s an exploitable bug.

Try to cause a crash by sending unexpectedly long inputs and work from there. Why did it crash? How many characters can I send without causing a crash? Which payload can I fit in there? Where does it land in memory? etc. etc. I may write about this process in more detail at some point, but there’s plenty of literature available online if you’re interested.

Don’t spend all your efforts on the most obvious inputs only -which are also more likely to be properly filtered/handled-; using tools like the burp web proxy (or even the browser itself), we can modify fields like cookies to check for buffer overflows.

Web vulnerabilities like CSRF are also extremely common in embedded devices with web interfaces. Exploiting them to write to files or bypass authentication can lead to absolute control of the router, specially when combined with command injections. An authentication bypass for a router with the web interface available from the Internet could very well expose the network to being remotely man in the middle’d. They’re definitely an important attack vector, even though I’m not gonna go into how to find them.

Decompiling Binaries [Theory]

When you decompile a binary, instead of simply translating Machine Code to Assembly Code, the decompiler uses algorithms to identify functions, loops, branches, etc. and replicate them in a higher level language like C or Python.

That sounds like a brilliant idea for anybody who has been banging their head against some assembly code for a few hours, but an additional layer of abstraction means more potential errors, which can result in massive wastes of time.

In my (admittedly short) personal experience, the output just doesn’t look reliable enough. It might be fine when using expensive decompilers (IDA itself supports a couple of architectures), but I haven’t found one I can trust with MIPS binaries. That being said, if you’d like to give one a try, the RetDec online decompiler supports multiple architectures- including MIPS.

Binary Decompiled to C by RetDec

Even as a ‘high level’ language, the code is not exactly pretty to look at.

Next Steps

Whether we want to learn something about an algorithm we’re reversing, to debug an exploit we’re developing or to find any other sort of vulnerability, being able to execute (and, if possible, debug) the binary on an environment we fully control would be a massive advantage. In some/most cases -like this router-, being able to debug on the original hardware is not possible. In the next post, we’ll work on CPU emulation to debug the binaries in our own computers.

Thanks for reading! I’m sorry this post took so long to come out. Between work,hardwear.io and seeing family/friends, this post was written about 1 paragraph at a time from 4 different countries. Things should slow down for a while, so hopefully I’ll be able to publish Part 6 soon. I’ve also got some other reversing projects coming down the pipeline, starting with hacking the Amazon Echo and a router with JTAG. I’ll try to get to those soon, work permitting… Happy Hacking 🙂


Tips and Tricks

Mistaken xrefs and how to remove them

Sometimes an address is loaded into a register for 16bit/32bit adjustments. The contents of that address have no effect on the rest of the code; it’s just a routinary adjustment. If the address that is assigned to the register happens to be pointing to some valid data, IDA will rename the address in the assembly and display the contents in a comment.

It is up to you to figure out whether an x-ref makes sense or not. If it doesn’t, select the variable and press o in IDA to ignore the contents and give you only the address. This makes the code much less confusing.

Setting function prototypes so IDA comments the args around calls for us

Set the cursor on a function and press y. Set the prototype for the function: e.g. int memcpy(void *restrict dst, const void *restrict src, int n);. Note:IDA only understands built-in types, so we can’t use types like size_t.

Once again we can use the extern declarations found in the GPL source code. When available, find the declaration for a specific function, and use the same types and names for the arguments in IDA.

Taking Advantage of the GPL Source Code

If we wanna figure out what are the 1st and 2nd parameters of a function likeATP_DBSetPara, we can sometimes rely on the GPL source code. Lots of functions are not implemented in the kernel or any other open source component, but they’re still used from one of them. That means we can’t see the code we’re interested in, but we can see the extern declarations for it. Sometimes the source will include documentation comments or descriptive variable names; very useful info that the disassembly doesn’t provide:

ATP_DBSetPara extern declaration in gpl_source/inc/cfmapi.h

Unfortunately, the function documentation comment is not very useful in this case -seems like there were encoding issues with the file at some point, and everything written in Chinese was lost. At least now we know that the first argument is a list of keys, and the second is something they call ParamCMOParamCMO is a constant in our disassembly, so it’s probably just a reference to the key we’re trying to set.

Disassembly Methods — Linear Sweep vs Recursive Descent

The structure of a binary can vary greatly depending on compiler, developers, etc. How functions call each other is not always straightforward for a disassembler to figure out. That means you may run into lots of ‘orphaned’ functions, which exist in the binary but do not have a known caller.

Which disassembler you use will dictate whether you see those functions or not, some of which can be extremely important to us (e.g. the ping function in theweb binary we reversed earlier). This is due to how they scan binaries for content:

  1. Linear Sweep: Read the binary one byte at a time, anything that looks like a function is presented to the user. This requires significant logic to keep false positives to a minimum
  2. Recursive Descent: We know the binary’s entry point. We find all functions called from main(), then we find the functions called from those, and keep recursively displaying functions until we’ve got “all” of them. This method is very robust, but any functions not referenced in a standard/direct way will be left out

Make sure your disassembler supports linear sweep if you feel like you’re missing any data. Make sure the code you’re looking at makes sense if you’re using linear sweep.

Реклама

Practical Reverse Engineering Part 4 — Dumping the Flash

Projects and learnt lessons on Systems Security, Embedded Development, IoT and anything worth writing about

  • Part 1: Hunting for Debug Ports
  • Part 2: Scouting the Firmware
  • Part 3: Following the Data
  • Part 4: Dumping the Flash
  • Part 5: Digging Through the Firmware

In Parts 1 to 3 we’ve been gathering data within its context. We could sniff the specific pieces of data we were interested in, or observe the resources used by each process. On the other hand, they had some serious limitations; we didn’t have access to ALL the data, and we had to deal with very minimal tools… And what if we had not been able to find a serial port on the PCB? What if we had but it didn’t use default credentials?

In this post we’re gonna get the data straight from the source, sacrificing context in favour of absolute access. We’re gonna dump the data from the Flash IC and decompress it so it’s usable. This method doesn’t require expensive equipment and is independent of everything we’ve done until now. An external Flash IC with a public datasheet is a reverser’s great ally.

Dumping the Memory Contents

As discussed in Part 3, we’ve got access to the datasheet for the Flash IC, so there’s no need to reverse its pinout:

Flash Pic Annotated Pinout

We also have its instruction set, so we can communicate with the IC using almost any device capable of ‘speaking’ SPI.

We also know that powering up the router will cause the Ralink to start communicating with the Flash IC, which would interfere with our own attempts to read the data. We need to stop the communication between the Ralink and the Flash IC, but the best way to do that depends on the design of the circuit we’re working with.

Do We Need to Desolder The Flash IC? [Theory]

The perfect way to avoid interference would be to simply desolder the Flash IC so it’s completely isolated from the rest of the circuit. It gives us absolute control and removes all possible sources of interference. Unfortunately, it also requires additional equipment, experience and time, so let’s see if we can avoid it.

The second option would be to find a way of keeping the Ralink inactive while everything else around it stays in standby. Microcontrollers often have a Reset pin that will force them to shut down when pulled to 0; they’re commonly used to force IC reboots without interrupting power to the board. In this case we don’t have access to the Ralink’s full datasheet (it’s probably distributed only to customers and under NDA); the IC’s form factor and the complexity of the circuit around it make for a very hard pinout to reverse, so let’s keep thinking…

What about powering one IC up but not the other? We can try applying voltage directly to the power pins of the Flash IC instead of powering up the whole circuit. Injecting power into the PCB in a way it wasn’t designed for could blow something up; we could reverse engineer the power circuit, but that’s tedious work. This router is cheap and widely available, so I took the ‘fuck it’ approach. The voltage required, according to the datasheet, is 3V; I’m just gonna apply power directly to the Flash IC and see what happens. It may power up the Ralink too, but it’s worth a try.

Flash Powered UART Connected

We start supplying power while observing the board and waiting for data from the Ralink’s UART port. We can see some LEDs light up at the back of the PCB, but there’s no data coming out of the UART port; the Ralink must not be running. Even though the Ralink is off, its connection to the Flash IC may still interfere with our traffic because of multiple design factors in both power circuit and the silicon. It’s important to keep that possibility in mind in case we see anything dodgy later on; if that was to happen we’d have to desolder the Flash IC (or just its data pins) to physically disconnect it from everything else.

The LEDs and other static components can’t communicate with the Flash IC, so they won’t be an issue as long as we can supply enough current for all of them. I’m just gonna use a bench power supply, with plenty of current available for everything. If you don’t have one you can try using the Master’s power lines, or some USB power adapter if you need some more current. They’ll probably do just fine.

Time to connect our SPI Master.

Connecting to the Flash IC

Now that we’ve confirmed there’s no need to desolder the Ralink we can connect any device that speaks SPI and start reading memory contents block by block. Any microcontroller will do, but a purpose-specific SPI-USB bridge will often be much faster. In this case I’m gonna be using a board based on the FT232H, which supports SPI among some other low level protocols.

We’ve got the pinout for both the Flash and my USB-SPI bridge, so let’s get everything connected.

Shikra and Power Connected to Flash

Now that the hardware is ready it’s time to start pumping data out.

Dumping the Data

We need some software in our computer that can understand the USB-SPI bridge’s traffic and replicate the memory contents as a binary file. Writing our own wouldn’t be difficult, but there are programs out there that already support lots of common Masters and Flash ICs. Let’s try the widely known and open source flashrom.

flashrom is old and buggy, but it already supports both the FT232H as Master and the FL064PIF as Slave. It gave me lots of trouble in both OSX and an Ubuntu VM, but ended up working just fine on a Raspberry Pi (Raspbian):

flashrom stdout

Success! We’ve got our memory dump, so we can ditch the hardware and start preparing the data for analysis.

Splitting the Binary

The file command has been able to identify some data about the binary, but that’s just because it starts with a header in a supported format. In a 0-knowledge scenario we’d use binwalk to take a first look at the binary file and find the data we’d like to extract.

Binwalk is a very useful tool for binary analysis created by the awesome hackers at /dev/ttyS0; you’ll certainly get to know them if you’re into hardware hacking.

binwalk spidump.bin

In this case we’re not in a 0-knowledge scenario; we’ve been gathering data since day 1, and we obtained a complete memory map of the Flash IC in Part 2. The addresses mentioned in the debug message are confirmed by binwalk, and it makes for much cleaner splitting of the binary, so let’s use it:

Flash Memory Map From Part 2

With the binary and the relevant addresses, it’s time to split the binary into its 4 basic segments. dd takes its parameters in terms of block size (bs, bytes), offset (skip, blocks) and size (count, blocks); all of them in decimal. We can use a calculator or let the shell do the hex do decimal conversions with $(()):

$ dd if=spidump.bin of=bootloader.bin bs=1 count=$((0x020000))
    131072+0 records in
    131072+0 records out
    131072 bytes transferred in 0.215768 secs (607467 bytes/sec)
$ dd if=spidump.bin of=mainkernel.bin bs=1 count=$((0x13D000-0x020000)) skip=$((0x020000))
    1167360+0 records in
    1167360+0 records out
    1167360 bytes transferred in 1.900925 secs (614101 bytes/sec)
$ dd if=spidump.bin of=mainrootfs.bin bs=1 count=$((0x660000-0x13D000)) skip=$((0x13D000))
    5386240+0 records in
    5386240+0 records out
    5386240 bytes transferred in 9.163635 secs (587784 bytes/sec)
$ dd if=spidump.bin of=protect.bin bs=1 count=$((0x800000-0x660000)) skip=$((0x660000))
    1703936+0 records in
    1703936+0 records out
    1703936 bytes transferred in 2.743594 secs (621060 bytes/sec)

We have created 4 different binary files:

  1. bootloader.bin: U-boot. The bootloader. It’s not compressed because the Ralink wouldn’t know how to decompress it.
  2. mainkernel.bin: Linux Kernel. The basic firmware in charge of controlling the bare metal. Compressed using lzma
  3. mainrootfs.bin: Filesystem. Contains all sorts of important binaries and configuration files. Compressed as squashfs using the lzma algorithm
  4. protect.bin: Miscellaneous data as explained in Part 3. Not compressed

Extracting the Data

Now that we’ve split the binary into its 4 basic segments, let’s take a closer look at each of them.

Bootloader

binwalk bootloader.bin

Binwalk found the uImage header and decoded it for us. U-Boot uses these headers to identify relevant memory areas. It’s the same info that the file command displayed when we fed it the whole memory dump because it’s the first header in the file.

We don’t care much for the bootloader’s contents in this case, so let’s ignore it.

Kernel

binwalk mainkernel.bin

Compression is something we have to deal with before we can make any use of the data. binwalk has confirmed what we discovered in Part 2, the kernel is compressed using lzma, a very popular compression algorithm in embedded systems. A quick check with strings mainkernel.bin | less confirms there’s no human readable data in the binary, as expected.

There are multiple tools that can decompress lzma, such as 7z or xz. None of those liked mainkernel.bin:

$ xz --decompress mainkernel.bin
xz: mainkernel.bin: File format not recognized

The uImage header is probably messing with tools, so we’re gonna have to strip it out. We know the lzma data starts at byte 0x40, so let’s copy everything but the first 64 bytes.

dd if=mainkernel of=noheader

And when we try to decompress…

$ xz --decompress mainkernel_noheader.lzma
xz: mainkernel_noheader.lzma: Compressed data is corrupt

xz has been able to recognize the file as lzma, but now it doesn’t like the data itself. We’re trying to decompress the whole mainkernel Flash area, but the stored data is extremely unlikely to be occupying 100% of the memory segment. Let’s remove any unused memory from the tail of the binary and try again:

Cut off the tail; decompression success

xz seems to have decompressed the data successfully. We can easily verify that using the strings command, which finds ASCII strings in binary files. Since we’re at it, we may as well look for something useful…

strings kernel grep key

The Wi-Fi Easy and Secure Key Derivation string looks promising, but as it turns out it’s just a hardcoded string defined by the Wi-Fi Protected Setup spec. Nothing to do with the password generation algorithm we’re interested in.

We’ve proven the data has been properly decompressed, so let’s keep moving.

Filesystem

binwalk mainrootfs.bin

The mainrootfs memory segment does not have a uImage header because it’s relevant to the kernel but not to U-Boot.

SquashFS is a very common filesystem in embedded systems. There are multiple versions and variations, and manufacturers sometimes use custom signatures to make the data harder to locate inside the binary. We may have to fiddle with multiple versions of unsquashfs and/or modify the signatures, so let me show you what the signature looks like in this case:

sqsh signature in hexdump

Since the filesystem is very common and finding the right configuration is tedious work, somebody may have already written a script to automate the task. I came across this OSX-specific fork of the Firmware Modification Kit, which compiles multiple versions of unsquashfs and includes a neat script called unsquashfs_all.sh to run all of them. It’s worth a try.

unsquashfs_all.sh mainrootfs.bin

Wasn’t that easy? We got lucky with the SquashFS version and supported signature, and unsquashfs_all.sh managed to decompress the filesystem. Now we’ve got every binary in the filesystem, every symlink and configuration file, and everything is nice and tidy:

tree unsquashed_filesystem

In the complete file tree we can see we’ve got every file in the system, (other than runtime files like those in /var/, of course).

Using the intel we have been gathering on the firmware since day 1 we can start looking for potentially interesting binaries:

grep -i -r '$INTEL' squashfs-root

If we were looking for network/application vulnerabilities in the router, having every binary and config file in the system would be massively useful.

Protected

binwalk protect.bin

As we discussed in Part 3, this memory area is not compressed and contains all pieces of data that need to survive across reboots but be different across devices. strings seems like an appropriate tool for a quick overview of the data:

strings protect.bin

Everything in there seems to be just the curcfg.xml contents, some logs and those few isolated strings in the picture. We already sniffed and analysed all of that data in Part 3, so there’s nothing else to discuss here.

Next Steps

At this point all hardware reversing for the Ralink is complete and we’ve collected everything there was to collect in ROM. Just think of what you may be interested in and there has to be a way to find it. Imagine we wanted to control the router through the UART debug port we found in Part 1, but when we try to access the ATP CLI we can’t figure out the credentials. After dumping the external Flash we’d be able to find the XML file in the protect area, and discover the credentials just like we did in Part 2 (The Rambo Approach to Intel Gatheringadmin:admin).

If you couldn’t dump the memory IC for any reason, the firmware upgrade files provided by the manufacturers will sometimes be complete memory segments; the device simply overwrites the relevant flash areas using code previously loaded to RAM. Downloading the file from the manufacturer would be the equivalent of dumping those segments from flash, so we just need to decompress them. They won’t have all the data, but it may be enough for your purposes.

Now that we’ve got the firmware we just need to think of anything we may be interested in and start looking for it through the data. In the next post we’ll dig a bit into different binaries and try to find more potentially useful data.

Practical Reverse Engineering Part 3 — Following the Data

Projects and learnt lessons on Systems Security, Embedded Development, IoT and anything worth writing about

  • Part 1: Hunting for Debug Ports
  • Part 2: Scouting the Firmware
  • Part 3: Following the Data
  • Part 4: Dumping the Flash
  • Part 5: Digging Through the Firmware

The best thing about hardware hacking is having full access to very bare metal, and all the electrical signals that make the system work. With ingenuity and access to the right equipment we should be able to obtain any data we want. From simply sniffing traffic with a cheap logic analyser to using thousands of dollars worth of equipment to obtain private keys by measuring the power consumed by the device with enough precision (power analysis side channel attack); if the physics make sense, it’s likely to work given the right circumstances.

In this post I’d like to discuss traffic sniffing and how we can use it to gather intel.

Traffic sniffing at a practical level is used all the time for all sorts of purposes, from regular debugging during the delopment process to reversing the interface of gaming controllers, etc. It’s definitely worth a post of its own, even though this device can be reversed without it.

Please check out the legal disclaimer in case I come across anything sensitive.

Full disclosure: I’m in contact with Huawei’s security team. I tried to contact TalkTalk, but their security staff is nowhere to be seen.

Data Flows In the PCB

Data is useless within its static memory cells, it needs to be read, written and passed around in order to be useful. A quick look at the board is enough to deduce where the data is flowing through, based on IC placement and PCB traces:

PCB With Data Flows and Some IC Names

We’re not looking for hardware backdoors or anything buried too deep, so we’re only gonna look into the SPI data flowing between the Ralink and its external Flash.

Pretty much every IC in the market has a datasheet documenting all its technical characteristics, from pinouts to power usage and communication protocols. There are tons of public datasheets on google, so find the ones relevant to the traffic you want to sniff:

Now we’ve got pinouts, electrical characteristics, protocol details… Let’s take a first look and extract the most relevant pieces of data.

Understanding the Flash IC

We know which data flow we’re interested: The SPI traffic between the Ralink IC and Flash. Let’s get started; the first thing we need is to figure out how to connect the logic analyser. In this case we’ve got the datasheet for the Flash IC, so there’s no need to reverse engineer any pinouts:

Flash Pic Annotated Pinout

Standard SPI communication uses 4 pins:

  1. MISO (Master In Slave Out): Data line Ralink<-Flash
  2. MOSI (Master Out Slave In): Data line Ralink->Flash
  3. SCK (Clock Signal): Coordinates when to read the data lines
  4. CS# (Chip Select): Enables the Flash IC when set to 0 so multiple of them can share MISO/MOSI/SCK lines.

We know the pinout, so let’s just connect a logic analyser to those 4 pins and capture some random transmission:

Connected Logic Analyser

In order to set up our logic analyser we need to find out some SPI configuation options, specifically:

  • Transmission endianness [Standard: MSB First]
  • Number of bits per transfer [Standard: 8]. Will be obvious in the capture
  • CPOL: Default state of the clock line while inactive [0 or 1]. Will be obvious in the capture
  • CPHA: Clock edge that triggers the data read in the data lines [0=leading, 1=trailing]. We’ll have to deduce this

The datasheet explains that the flash IC understands only 2 combinations of CPOL and CPHA: (CPOL=0, CPHA=0) or (CPOL=1, CPHA=1)

Datasheet SPI Settings

Let’s take a first look at some sniffed data:

Logic Screencap With CPOL/CPHA Annotated

In order to understand exactly what’s happenning you’ll need the FL064PIF’s instruction set, available in its datasheet:

FL064PIF Instruction Set

Now we can finally analyse the captured data:

Logic Sample SPI Packet

In the datasheet we can see that the FL064PIF has high-performance features for read and write operations: Dual and Quad options that multiplex the data over more lines to increase the transmission speed. From taking a few samples, it doesn’t seem like the router uses these features much -if at all-, but it’s important to keep the possibility in mind in case we see something odd in a capture.

Transmission modes that require additional pins can be a problem if your logic analyser is not powerful enough.

The Importance of Your Sampling Rate [Theory]

A logic analyser is a conceptually simple device: It reads signal lines as digital inputs every x microseconds for y seconds, and when it’s done it sends the data to your computer to be analysed.

For the protocol analyser to generate accurate data it’s vital that we record digital inputs faster than the device writes them. Otherwise the data will be mangled by missing bits or deformed waveforms.

Unfortunately, your logic analyser’s maximum sampling rate depends on how powerful/expensive it is and how many lines you need to sniff at a time. High-speed interfaces with multiple data lines can be a problem if you don’t have access to expensive equipment.

I recorded this data from the Ralink-Flash SPI bus using a low-end Saleae analyser at its maximum sampling rate for this number of lines, 24 MS/s:

Picture of Deformed Clock Signal

As you can see, even though the clock signal has the 8 low to high transitions required for each byte, the waveform is deformed.

Since the clock signal is used to coordinate when to read the data lines, this kind of waveform deformation may cause data corruption even if we don’t drop any bits (depending partly on the design of your logic analyser). There’s always some wiggle room for read inaccuracies, and we don’t need 100% correct data at this point, but it’s important to keep all error vectors in mind.

Let’s sniff the same bus using a higher performance logic analyser at 100 MS/s:

High Sampling Rate SPI Sample Reading

As you can see, this clock signal is perfectly regular when our Sampling Rate is high enough.

If you see anything dodgy in your traffic capture, consider how much data you’re willing to lose and whether you’re being limited by your equipment. If that’s the case, either skip this Reversing vector or consider investing in a better logic analyser.

Seeing the Data Flow

We’re already familiar with the system thanks to the overview of the firmware we did in Part 2, so we can think of some specific SPI transmissions that we may be interested in sniffing. Simply connecting an oscilloscope to the MISO and MOSI pins will help us figure out how to trigger those transmissions and yield some other useful data.

Scope and UART Connected

Here’s a video (no audio) showing both the serial interface and the MISO/MOSI signals while we manipulate the router:

This is a great way of easily identifying processes or actions that trigger flash read/write actions, and will help us find out when to start recording with the logic analyser and for how long.

Analysing SPI Traffic — ATP’s Save Command

In Post 2 I mentioned ATP CLI has a save command that stores something to flash; unfortunately, the help menu (save ?) won’t tell you what it’s doing and the only output when you run it is a few dots that act as a progress bar. Why don’t we find out by ourselves? Let’s make a plan:

  1. Wait until boot sequence is complete and the router is idle so there’s no unexpected SPI traffic
  2. Start the ATP Cli as explained in Part 1
  3. Connect the oscilloscope to MISO/MOSI and run save to get a rough estimate of how much time we need to capture data for
  4. Set a trigger in the enable line sniffed by the logic analyser so it starts recording as soon as the flash IC is selected
  5. Run save
  6. Analyse the captured data

Steps 3 and 4 can be combined so you see the data flow in real time in the scopewhile you see the charge bar for the logic analyser; that way you can make sure you don’t miss any data. In order to comfortably connect both scope and logic sniffer to the same pins, these test clips come in very handy:

SOIC16 Test Clip Connected to Flash IC

Once we’ve got the traffic we can take a first look at it:

Analysing Save Capture on Logic

Let’s consider what sort of data could be extracted from this traffic dump that might be useful to us. We’re working with a memory storage IC, so we can see the data that is being read/written and the addresses where it belongs. I think we can represent that data in a useful way by 2 means:

  1. Traffic map depicting which Flash areas are being written, read or erased in chronological order
  2. Create binary files that replicate the memory blocks that were read/written, preferably removing all the protocol rubbish that we sniffed along with them.

Saleae’s SPI analyser will export the data as a CSV file. Ideally we’d improve their protocol analyser to add the functionality we want, but that would be too much work for this project. One of the great things about low level protocols like SPI is that they’re usually very straightforward; I decided to write some python spaghetti code to analyse the CSV file and extract the data we’re looking for: binmaker.py andtraffic_mapper.py

The workflow to analyse a capture is the following:

  1. Export sniffed traffic as CSV
  2. Run the script:
    • Iterate through the CSV file
    • Identify different commands by their index
    • Recognise the command expressed by the first byte
    • Process its arguments (addresses, etc.)
    • Identify the read/write payload
    • Convert ASCII representation of each payload byte to binary
    • Write binary blocks to different files for MISO (read) and MOSI (write)
  3. Read the traffic map (regular text) and the binaries (hexdump -C output.bin | less)

The scripts generate these results:

The traffic map is much more useful when combined with the Flash memory map we found in Part 2:

Flash Memory Map From Part 2

From the traffic map we can see the bulk of the save command’s traffic is simple:

  1. Read about 64kB of data from the protect area
  2. Overwrite the data we just read

In the MISO binary we can see most of the read data was just tons of 1s:

Picture MISO Hexdump 0xff

Most of the data in the MOSI binary is plaintext XML, and it looks exactly like the /var/curcfg.xml file we discovered in Part 2. As we discussed then, this “current configuration” file contains tons of useful data, including the current WiFi credentials.

It’s standard to keep reserved areas in flash; they’re mostly for miscellaneous data that needs to survive across reboots and be configurable by user, firmware or factory. It makes sense for a command called save to write data to such area, it explains why the data is perfectly readable as opposed to being compressed like the filesystem, and why we found the XML file in the /var/ folder of the filesystem (it’s a folder for runtime files; data in the protect area has to be loaded to memory separately from the filesystem).

The Pot of Gold at the End of the Firmware [Theory]

During this whole process it’s useful to have some sort of target to keep you digging in the same general direction.

Our target is an old one: the algorithm that generates the router’s default WiFi password. If we get our hands on such algorithm and it happens to derive the password from public information, any HG533 in the world with default WiFi credentials would probably be vulnerable.

That exact security issue has been found countless times in the past, usually deriving the password from public data like the Access Point’s MAC address or its SSID.

That being said, not all routers are vulnerable, and I personally don’t expect this one to be. The main reason behind targeting this specific vector is that it’s caused by a recurrent problem in embedded engineering: The need for a piece of data that is known by the firmware, unique to each device and known by an external entity. From default WiFi passwords to device credentials for IoT devices, this problem manifests in different ways all over the Industry.

Future posts will probably reference the different possibilities I’m about to explain, so let me get all that theory out of the way now.

The Sticker Problem

In this day and era, connecting to your router via ethernet so there’s no need for default WiFi credentials is not an option, using a display to show a randomly generated password would be too expensive, etc. etc. etc. The most widely adopted solution for routers is to create a WiFi network using default credentials, print those credentials on a sticker at the factory and stick it to the back of the device.

Router Sticker - Annotated

The WiFi password is the ‘unique piece of data’, and the computer printing the stickers in the factory is the ‘external entity’. Both the firmware and the computer need to know the default WiFi credentials, so the engineer needs to decide how to coordinate them. Usually there are 2 options available:

  1. The same algorithm is implemented in both the device and the computer, and its input parameters are known to both of them
  2. A computer generates the credentials for each device and they’re stored into each device separately

Developer incompetence aside, the first approach is usually taken as a last resort; if you can’t get your hardware manufacturer to flash unique data to each device or can’t afford the increase in manufacturing cost.

The second approach is much better by design: We’re not trusting the hardware with data sensitive enough to compromise every other device in the field. That being said, the company may still decide to use an algorithm with predictable outputs instead of completely random data; that would make the system as secure as the weakest link between the algorithm -mathematically speaking-, the confidentiality of their source code and the security of the computers/network running it.

Sniffing Factory Reset

So now that we’ve discussed our target, let’s gather some data about it. The first thing we wanna figure out is which actions will kickstart the flow of relevant data on the PCB. In this case there’s 1 particular action: Pressing the Factory Reset button for 10s. This should replace the existing WiFi credentials with the default ones, so the default creds will have to be generated/read. If the key or the generation algorithm need to be retrieved from Flash, we’ll see them in a traffic capture.

That’s exactly what we’re gonna do, and we’re gonna observe the UART interface, the oscilloscope and the logic analyser during/after pressing the reset button. The same process we followed for ATP’s save gives us these results:

UART output:

UART Factory Reset Debug Messages

Traffic overview:

Logic Screencap Traffic Overview

Output from our python scripts:

The traffic map tells us the device first reads and overwrites 2 large chunks of data from the protect area and then reads a smaller chunk of data from the filesystem (possibly part of the next process to execute):

___________________
|Transmission  Map|
|  MOSI  |  MISO  |
|        |0x7e0000| Size: 12    //Part of the Protected area
|        |0x7e0000| Size: 1782
|        |0x7e073d| Size: 63683
| ERASE 0x7e073d  | Size: 64kB
|0x7e073d|        | Size: 195
|0x7e0800|        | Size: 256
|0x7e0900|        | Size: 256
---------//--------
       [...]
---------//--------
|0x7e0600|        | Size: 256
|0x7e0700|        | Size: 61
|        |0x7d0008| Size: 65529 //Part of the Protected area
| ERASE 0x7d0008  | Size: 64kB
|0x7d0008|        | Size: 248
|0x7d0100|        | Size: 256
---------//--------
       [...]
---------//--------
|0x7dff00|        | Size: 256
|0x7d0000|        | Size: 8
|        |0x1c3800| Size: 512   //Part of the Filesystem
|        |0x1c3a00| Size: 512
---------//--------
       [...]
---------//--------
|        |0x1c5a00| Size: 512
|        |0x1c5c00| Size: 512
-------------------

Once again, we combine transmission map and binary files to gain some insight into the system. In this case, the ‘factory reset’ code seems to:

  1. Read ATP_LOG from Flash; it contains info such as remote router accesses or factory resets. It ends with a large chunk of 1s (0xff)
  2. Overwrite that memory segment with 1s
  3. write a ‘new’ ATP_LOG followed by the “current configuration” curcfg.xmlfile
  4. Read compressed (unintelligible to us) memory chunk from the filesystem

The chunk from the filesystem is read AFTER writing the new password to Flash, which doesn’t make sense for a password generation algorithm. That being said, the algorithm may be already loaded into memory, so its absence in the SPI traffic is not conclusive on whether or not it exists.

As part of the MOSI data we can see the new WiFi password be saved to Flash inside the XML string:

Found Current Password MOSI

What about the default password being read? If we look in the MISO binary, it’s nowhere to be seen. Either the Ralink is reading it using a different mode (secure/dual/quad/?) or the credentials/algorithm are already loaded in RAM (no need to read them from Flash again, since they can’t change). The later seems more likely, so I’m not gonna bother updating my scripts to support different read modes. We write down what we’ve found and we’ll get back to the default credentials in the next part.

Since we’re at it, let’s take a look at the SPI traffic generated when setting new WiFi credentials via HTTP: MapMISOMOSI. We can actually see the default credentials being read from the protect area of Flash this time (not sure why the Ralink would load it to set a new password; it’s probably incidental):

Default WiFi Creds In MISO Capture

As you can see, they’re in plain text and separated from almost anything else in Flash. This may very well mean there’s no password generation algorithm in this device, but it is NOT conclusive. The developers could have decided to generate the credentials only once (first boot?) and store them to flash in order to limit the number of times the algorithm is accessed/executed, which helps hide the binary that contains it. Otherwise we could just observe the running processes in the router while we press the Factory Reset button and see which ones spawn or start consuming more resources.

Next Steps

Now that we’ve got the code we need to create binary recreations of the traffic and transmission maps, getting from a capture to binary files takes seconds. I captured other transmissions such as the first few seconds of boot (mapmiso), but there wasn’t much worth discussing. The ability to easily obtain such useful data will probably come in handy moving forward, though.

In the next post we get the data straight from the source, communicating with the Flash IC directly to dump its memory. We’ll deal with compression algorithms for the extracted data, and we’ll keep piecing everything together.

Happy Hacking! 🙂

 

Practical Reverse Engineering Part 2 — Scouting the Firmware

Projects and learnt lessons on Systems Security, Embedded Development, IoT and anything worth writing about

 

  • Part 1: Hunting for Debug Ports
  • Part 2: Scouting the Firmware
  • Part 3: Following the Data
  • Part 4: Dumping the Flash
  • Part 5: Digging Through the Firmware

In part 1 we found a debug UART port that gave us access to a Linux shell. At this point we’ve got the same access to the router that a developer would use to debug issues, control the system, etc.

This first overview of the system is easy to access, doesn’t require expensive tools and will often yield very interesting results. If you want to do some hardware hacking but don’t have the time to get your hands too dirty, this is often the point where you stop digging into the hardware and start working on the higher level interfaces: network vulnerabilities, ISP configuration protocols, etc.

These posts are hardware-oriented, so we’re just gonna use this access to gather some random pieces of data. Anything that can help us understand the system or may come in handy later on.

Please check out the legal disclaimer in case I come across anything sensitive.

Full disclosure: I’m in contact with Huawei’s security team; they’ve had time to review the data I’m going to reveal in this post and confirm there’s nothing too sensitive for publication. I tried to contact TalkTalk, but their security staff is nowhere to be seen.

Picking Up Where We Left Off

Picture of Documented UARTs

We get our serial terminal application up and running in the computer and power up the router.

Boot Sequence

We press enter and get the login prompt from ATP Cli; introduce the credentials admin:admin and we’re in the ATP command line. Execute the command shell and we get to the BusyBox CLI (more on BusyBox later).

-------------------------------
-----Welcome to ATP Cli------
-------------------------------
Login: admin
Password:    #Password is ‘admin'
ATP>shell
BusyBox vv1.9.1 (2013-08-29 11:15:00 CST) built-in shell (ash)
Enter 'help' for a list of built-in commands.
# ls
var   usr   tmp   sbin  proc  mnt   lib   init  etc   dev   bin

At this point we’ve seen the 3 basic layers of firmware in the Ralink IC:

  1. U-boot: The device’s bootloader. It understands the device’s memory map, kickstarts the main firmware execution and takes care of some other low level tasks
  2. Linux: The router is running Linux to keep overall control of the hardware, coordinate parallel processes, etc. Both ATP CLI and BusyBox run on top of it
  3. Busybox: A small binary including reduced versions of multiple linux commands. It also supplies the shell we call those commands from.

Lower level interfaces are less intuitive, may not have access to all the data and increase the chances of bricking the device; it’s always a good idea to start from BusyBox and walk your way down.

For now, let’s focus on the boot sequence itself. The developers thought it would be useful to display certain pieces of data during boot, so let’s see if there’s anything we can use.

Boot Debug Messages

We find multiple random pieces of data scattered across the boot sequence. We’ll find useful info such as the compression algorithm used for some flash segments:

boot msg kernel lzma

Intel on how the external flash memory is structured will be very useful when we get to extracting it.

ram data. not very useful

SPI Flash Memory Map!

And more compression intel:

root is squashfs'd

We’ll have to deal with the compression algorithms when we try to access the raw data from the external Flash, so it’s good to know which ones are being used.

What Are ATP CLI and BusyBox Exactly? [Theory]

The Ralink IC in this router runs a Linux kernel to control memory and parallel processes, keep overall control of the system, etc. In this case, according to the Ralink’s product brief, they used the Linux 2.6.21 SDKATP CLI is a CLI running either on top of Linux or as part of the kernel. It provides a first layer of authentication into the system, but other than that it’s very limited:

ATP>help
Welcome to ATP command line tool.
If any question, please input "?" at the end of command.
ATP>?
cls
debug
help
save
?
exit
ATP>

help doesn’t mention the shell command, but it’s usually either shell orsh. This ATP CLI includes less than 10 commands, and doesn’t support any kind of complex process control or file navigation. That’s where BusyBox comes in.

BusyBox is a single binary containing reduced versions of common unix commands, both for development convenience and -most importantly- to save memory. From ls and cd to top, System V init scripts and pipes, it allows us to use the Ralink IC somewhat like your regular Linux box.

One of the utilities the BusyBox binary includes is the shell itself, which has access to the rest of the commands:

ATP>shell
BusyBox vv1.9.1 (2013-08-29 11:15:00 CST) built-in shell (ash)
Enter 'help' for a list of built-in commands.
# ls
var   usr   tmp   sbin  proc  mnt   lib   init  etc   dev   bin
#
# ls /bin
zebra        swapdev      printserver  ln           ebtables     cat
wpsd         startbsp     pppc         klog         dns          busybox
wlancmd      sntp         ping         kill         dms          brctl
web          smbpasswd    ntfs-3g      iwpriv       dhcps        atserver
usbserver    smbd         nmbd         iwconfig     dhcpc        atmcmd
usbmount     sleep        netstat      iptables     ddnsc        atcmd
upnp         siproxd      mount        ipp          date         at
upg          sh           mldproxy     ipcheck      cwmp         ash
umount       scanner      mknod        ip           cp           adslcmd
tr111        rm           mkdir        igmpproxy    console      acl
tr064        ripd         mii_mgr      hw_nat       cms          ac
telnetd      reg          mic          ethcmd       cli
tc           radvdump     ls           equipcmd     chown
switch       ps           log          echo         chmod
#

You’ll notice different BusyBox quirks while exploring the filesystem, such as the symlinks to a busybox binary in /bin/. That’s good to know, since any commands that may contain sensitive data will not be part of the BusyBox binary.

Exploring the File System

Now that we’re in the system and know which commands are available, let’s see if there’s anything useful in there. We just want a first overview of the system, so I’m not gonna bother exposing every tiny piece of data.

The top command will help us identify which processes are consuming the most resources. This can be an extremely good indicator of whether some processes are important or not. It doesn’t say much while the router’s idle, though:

top

One of the processes running is usbmount, so the router must support connecting ‘something’ to the USB port. Let’s plug in a flash drive in there…

usb 1-1: new high speed USB device using rt3xxx-ehci and address 2
[...]
++++++sambacms.c 2374 renice=renice -n +10 -p 1423

The USB is recognised and mounted to /mnt/usb1_1/, and a samba server is started. These files show up in /etc/samba/:

# ls -l /etc/samba/
-rw-r--r--    1 0        0             103 smbpasswd
-rw-r--r--    1 0        0               0 smbusers
-rw-r--r--    1 0        0             480 smb.conf
-rw-------    1 0        0            8192 secrets.tdb
# cat /etc/samba/smbpasswd
nobody:0:XXXXXXXXXXXXXXXXXXX:564E923F5AF30J373F7C8_______4D2A:[U ]:LCT-1ED36884:

More data, in case it ever comes in handy:

  • netstat -a: Network ports the device is listening at
  • iptables –list: We could set up telnet and continue over the network, but I’d rather stay as close to the bare metal as possible
  • wlancmd help: Utility to control the WiFi radio, plenty of options available
  • /etc/profile
  • /etc/inetd
  • /etc/services
  • /var/: Contains files used by the system during the course of its operation
  • /etc/: System configuration files, etc.

/var/ and /etc/ always contain tons of useful data, and some of it makes itself obvious at first sight. Does that say /etc/serverkey.pem??

Blurred /etc/serverkey.pem

¯\_(ツ)_/¯

It’s not unusual to find private keys for TLS certificates in embedded systems. By accessing 1 single device via hardware you may obtain the keys that will help you attack any other device of the same model.

This key could be used to communicate with some server from Huawei or the ISP, although that’s less common. On the other hand, it’s also very common to findpublic certs used to communicate with remote servers.

In this case we find 2 certificates next to the private key; both are self-signed by the same ‘person’:

  • /etc/servercert.pem: Most likely the certificate for the serverkey
  • /etc/root.pem: Probably used to connect to a server from the ISP or Huawei. Not sure.

And some more data in /etc/ppp256/config and /etc/ppp258/config:

/var/wan/ppp256/config

These credentials are also available via the HTTP interface, which is why I’m publishing them, but that’s not the case in many other routers (more on this later).

With so many different files everywhere it can be quite time consuming to go through all the info without the right tools. We’re gonna copy as much data as we can into the USB drive and go through it on our computer.

The Rambo Approach to Intel Gathering

Once we have as many files as possible in our computer we can check some things very quick. find . -name *.pem reveals there aren’t any other TLS certificates.

What about searching the word password in all files? grep -i -r password .

Grep Password

We can see lots of credentials; most of them are for STUN, TR-069 and local services. I’m publishing them because this router proudly displays them all via the HTTP interface, but those are usually hidden.

If you wanna know what happens when someone starts pulling from that thread, check out Alexander Graf’s talk “Beyond Your Cable Modem”, from CCC 2015. There are many other talks about attacking TR-069 from DefCon, BlackHat, etc. etc.

The credentials we can see are either in plain text or encoded in base64. Of course, encoding is worthless for data protection:

$ echo "QUJCNFVCTU4=" | base64 -D
ABB4UBMN

WiFi pwd in curcfg.xml

That is the current WiFi password set in the router. It leads us to 2 VERY interesting files. Not just because of their content, but because they’re a vital part of how the router operates:

  • /var/curcfg.xml: Current configuration file. Among other things, it contains the current WiFi password encoded in base64
  • /etc/defaultcfg.xml: Default configuration file, used for ‘factory reset’. Does not include the default WiFi password (more on this in the next posts)

Exploring ATP’s CLI

The ATP CLI includes very few commands. The most interesting one -besidesshell— is debug. This isn’t your regular debugger; debug display will simply give you some info about the commands igmpproxycwmpsysuptime or atpversionMost of them don’t have anything juicy, but what about cwmp? Wasn’t that related to remote configuration of routers?

debug display cwmp

Once again, these are the CWMP (TR-069) credentials used for remote router configuration. Not even encoded this time.

The rest of the ATP commands are pretty useless: clear screen, help menu, save to flash and exit. Nothing worth going into.

Exploring Uboot’s CLI

The bootloader’s command line interface offers raw access to some memory areas. Unfortunately, it doesn’t give us direct access to the Flash IC, but let’s check it out anyway.

Please choose operation:
   3: Boot system code via Flash (default).
   4: Entr boot command line interface.
You choosed 4
Stopped Uboot WatchDog Timer.
4: System Enter Boot Command Line Interface.
U-Boot 1.1.3 (Aug 29 2013 - 11:16:19)
RT3352 # help
?       - alias for 'help'
bootm   - boot application image from memory
cp      - memory copy
erase   - erase SPI FLASH memory
go      - start application at address 'addr'
help    - print online help
md      - memory display
mdio   - Ralink PHY register R/W command !!
mm      - memory modify (auto-incrementing)
mw      - memory write (fill)
nm      - memory modify (constant address)
printenv- print environment variables
reset   - Perform RESET of the CPU
rf      - read/write rf register
saveenv - save environment variables to persistent storage
setenv  - set environment variables
uip - uip command
version - print monitor version
RT3352 #

Don’t touch commands like erasemmmw or nm unless you know exactly what you’re doing; you’d probably just force a router reboot, but in some cases you may brick the device. In this case, md (memory display) and printenv are the commands that call my atention.

RT3352 # printenv
bootcmd=tftp
bootdelay=2
baudrate=57600
ethaddr="00:AA:BB:CC:DD:10"
ipaddr=192.168.1.1
serverip=192.168.1.2
ramargs=setenv bootargs root=/dev/ram rw
addip=setenv bootargs $(bootargs) ip=$(ipaddr):$(serverip):$(gatewayip):$(netmask):$(hostname):$(netdev):off
addmisc=setenv bootargs $(bootargs) console=ttyS0,$(baudrate) ethaddr=$(ethaddr) panic=1
flash_self=run ramargs addip addmisc;bootm $(kernel_addr) $(ramdisk_addr)
kernel_addr=BFC40000
u-boot=u-boot.bin
load=tftp 8A100000 $(u-boot)
u_b=protect off 1:0-1;era 1:0-1;cp.b 8A100000 BC400000 $(filesize)
loadfs=tftp 8A100000 root.cramfs
u_fs=era bc540000 bc83ffff;cp.b 8A100000 BC540000 $(filesize)
test_tftp=tftp 8A100000 root.cramfs;run test_tftp
stdin=serial
stdout=serial
stderr=serial
ethact=Eth0 (10/100-M)

Environment size: 765/4092 bytes

We can see settings like the UART baudrate, as well as some interesting memory locations. Those memory addresses are not for the Flash IC, though. The flash memory is only addressed by 3 bytes: [0x000000000x00FFFFFF].

Let’s take a look at some of them anyway, just to see the kind of access this interface offers.What about kernel_addr=BFC40000?

md `badd` Picture

Nope, that badd message means bad address, and it has been hardcoded in md to let you know that you’re trying to access invalid memory locations. These are good addresses, but they’re not accessible to u-boot at this point.

It’s worth noting that by starting Uboot’s CLI we have stopped the router from loading the linux Kernel onto memory, so this interface gives access to a very limited subset of data.

SPI Flash string in md

We can find random pieces of data around memory using this method (such as thatSPI Flash Image string), but it’s pretty hopeless for finding anything specific. You can use it to get familiarised with the memory architecture, but that’s about it. For example, there’s a very obvious change in memory contents at 0x000d0000:

md.w 0x000d0000

And just because it’s about as close as it gets to seeing the girl in the red dress, here is the md command in action. You’ll notice it’s very easy to spot that change in memory contents at 0x000d0000.

Next Steps

In the next post we combine firmware and bare metal, explain how data flows and is stored around the device, and start trying to manipulate the system to leak pieces of data we’re interested in.

Thanks for reading! 🙂

Practical Reverse Engineering Part 1 — Hunting for Debug Ports

Projects and learnt lessons on Systems Security, Embedded Development, IoT and anything worth writing about

  • Part 1: Hunting for Debug Ports
  • Part 2: Scouting the Firmware
  • Part 3: Following the Data
  • Part 4: Dumping the Flash
  • Part 5: Digging Through the Firmware

 

In this series of posts we’re gonna go through the process of Reverse Engineering a router. More specifically, a Huawei HG533.

Huawei HG533

At the earliest stages, this is the most basic kind of reverse engineering. We’re simple looking for a serial port that the engineers who designed the device left in the board for debug and -potentially- technical support purposes.

Even though I’ll be explaining the process using a router, it can be applied to tons of household embedded systems. From printers to IP cameras, if it’s mildly complex it’s quite likely to be running some form of linux. It will also probably have hidden debug ports like the ones we’re gonna be looking for in this post.

Finding the Serial Port

Most UART ports I’ve found in commercial products are between 4 and 6 pins, usually neatly aligned and sometimes marked in the PCB’s silkscreen somehow. They’re not for end users, so they almost never have pins or connectors attached.

After taking a quick look at the board, 2 sets of unused pads call my atention (they were unused before I soldered those pins in the picture, anyway):

Pic of the 2 Potential UART Ports

This device seems to have 2 different serial ports to communicate with 2 different Integrated Circuits (ICs). Based on the location on the board and following their traces we can figure out which one is connected to the main IC. That’s the most likely one to have juicy data.

In this case we’re simply gonna try connecting to both of them and find out what each of them has to offer.

Identifying Useless Pins

So we’ve found 2 rows of pins that -at first sight- could be UART ports. The first thing you wanna do is find out if any of those contacts is useless. There’s a very simple trick I use to help find useless pads: Flash a bright light from the backside of the PCB and look at it from directly above. This is what that looks like:

2nd Serial Port - No Headers

We can see if any of the layers of the PCB is making contact with the solder blob in the middle of the pad.

  1. Connected to something (we can see a trace “at 2 o’clock”)
  2. NOT CONNECTED
  3. 100% connected to a plane or thick trace. It’s almost certainly a power pin, either GND or Vcc
  4. Connections at all sides. This one is very likely to be the other power pin. There’s no reason for a data pin in a debug port to be connected to 4 different traces, but the pad being surrounded by a plane would explain those connections
  5. Connected to something

Soldering Pins for Easy Access to the Lines

In the picture above we can see both serial ports.

The pads in these ports are through-hole, but the holes themselves are filled in with blobs of very hard, very high melting point solder.

I tried soldering the pins over the pads, but the solder they used is not easy to work with. For the 2nd serial port I decided to drill through the solder blobs with a Dremel and a needle bit. That way we can pass the pins through the holes and solder them properly on the back of the PCB. It worked like a charm.

Use a Dremel to Drill Through the Solder Blobs

Identifying the Pinout

So we’ve got 2 connectors with only 3 useful pins each. We still haven’t verified the ports are operative or identified the serial protocol used by the device, but the number and arrangement of pins hint at UART.

Let’s review the UART protocol. There are 6 pin types in the spec:

  • Tx [Transmitting Pin. Connects to our Rx]
  • Rx [Receiving Pin. Connects to our Tx]
  • GND [Ground. Connects to our GND]
  • Vcc [The board’s power line. Usually 3.3V or 5V. DO NOT CONNECT]
  • CTS [Typically unused]
  • DTR [Typically unused]

We also know that according to the Standard, Tx and Rx are pulled up (set to 1) by default. The Transmitter of the line (Tx) is in charge of pulling it up, which means if it’s not connected the line’s voltage will float.

So let’s compile what we know and get to some conclusions:

  1. Only 3 pins in each header are likely to be connected to anything. Those must be Tx, Rx and GND
  2. Two pins look a lot like Vcc and GND
  3. One of them -Tx- will be pulled up by default and be transmitting data
  4. The 3rd of them, Rx, will be floating until we connect the other end of the line

That information seems enough to start trying different combinations with your UART-to-USB bridge, but randomly connecting pins you don’t understand is how you end up blowing shit up.

Let’s keep digging.

A multimeter or a logic analyser would be enough to figure out which pin is which, but if you want to understand what exactly is going on in each pin, nothing beats a half decent oscilloscope:

Channel1=Tx Channel2=Rx

After checking the pins out with an oscilloscope, this is what we can see in each of them:

  1. GND and Vcc verified — solid 3.3V and 0V in pins 2 and 3, as expected
  2. Tx verified — You can clearly see the device is sending information
  3. One of the pins floats at near-0V. This must be the device’s Rx, which is floating because we haven’t connected the other side yet.

So now we know which pin is which, but if we want to talk to the serial port we need to figure out its baudrate. We can find this with a simple protocol dump from a logic analyser. If you don’t have one, you’ll have to play “guess the baudrate” with a list of the most common ones until you get readable text through the serial port.

This is a dump from a logic analyser in which we’ve enabled protocol analysis and tried a few different baudrates. When we hit the right one, we start seeing readable text in the sniffed serial data (\n\r\n\rU-Boot 1.1.3 (Aug...)

Logic Protocol Analyser

Once we have both the pinout and baudrate, we’re ready to start communicating with the device:

Documented UART Pinouts

Connecting to the Serial Ports

Now that we’ve got all the info we need on the hardware side, it’s time to start talking to the device. Connect any UART to USB bridge you have around and start wandering around. This is my hardware setup to communicate with both serial ports at the same time and monitor one of the ports with an oscilloscope:

All Connected

And when we open a serial terminal in our computer to communicate with the device, the primary UART starts spitting out useful info. These are the commands I use to connect to each port as well as the first lines they send during the boot process:

Boot Sequence

Please choose operation:
   3: Boot system code via Flash (default).
   4: Entr boot command line interface.
 0

‘Command line interface’?? We’ve found our way into the system! When we press 4we get a command line interface to interact with the device’s bootloader.

Furthermore, if we let the device start as the default 3, wait for it to finish booting up and press enter, we get the message Welcome to ATP Cli and a login prompt. If the devs had modified the password this step would be a bit of an issue, but it’s very common to find default credentials in embedded systems. After a few manual tries, the credentials admin:admin succeeded and I got access into the CLI:

-------------------------------
-----Welcome to ATP Cli------
-------------------------------

Login: admin
Password:    #Password is ‘admin'
ATP>shell

BusyBox vv1.9.1 (2013-08-29 11:15:00 CST) built-in shell (ash)
Enter 'help' for a list of built-in commands.

# ls
var   usr   tmp   sbin  proc  mnt   lib   init  etc   dev   bin

Running the shell command in ATP will take us directly into Linux’s CLI with root privileges 🙂

This router runs BusyBox, a linux-ish interface which I’ll talk about in more detail in the next post.

Next Steps

Now that we have access to the BusyBox CLI we can start nosing around the software. Depending on what device you’re reversing there could be plain text passwords, TLS certificates, useful algorithms, unsecured private APIs, etc. etc. etc.

In the next post we’ll focus on the software side of things. I’ll explain the differences between boot modes, how to dump memory, and other fun things you can do now that you’ve got direct access to the device’s firmware.

Thanks for reading! 🙂

Reverse Engineering With Radare2 – Part 3

Sorry about the larger delay between the previous post and this one, but I was very busy the last weeks.
(And the technology I wanted to show wasn’t completely implemented in radare2, which means that I had to implement it on my own 😉 ). In case you’re new to this series, you’ll find the previous posts here.

As you may already know, we’ll deal with the third challenge today. The purpose for this one is to introduce
some constructs which are often used in real programs.

Let’s start like the last times by loading the binary into radare2 (r2 -AA ./challenge03).

As you may notice, the main function differs from the previous challenges:

[0x00400a04]> VV @ sym.main (nodes 4 edges 4 zoom 100%) BB-NORM mouse:canvas-y movements-speed:5
        =---------------------------------------------=
        |  0x400a04                                   |
        |   ;-- main:                                 |
        | (fcn) sym.main 115                          |
        | ; var int local_138h @ rbp-0x138            |
        | ; var int local_130h @ rbp-0x130            |
        | ; var int local_124h @ rbp-0x124            |
        | ; var int local_120h @ rbp-0x120            |
        | ; var int local_18h @ rbp-0x18              |
        | ; var int local_10h @ rbp-0x10              |
        | push rbp                                    |
        | mov rbp, rsp                                |
        | sub rsp, 0x140                              |
        | mov dword [rbp - local_124h], edi           |
        | mov qword [rbp - local_130h], rsi           |
        | mov qword [rbp - local_138h], rdx           |
        | mov eax, 0                                  |
        | call sym.banner ;[a]                        |
        | mov dword [rbp - local_120h], 0             |
        | mov qword [rbp - local_18h], obj.passwords  |
        | mov qword [rbp - local_10h], obj.passwords  |
        | lea rax, [rbp - local_120h]                 |
        | mov rdi, rax                                |
        | call sym.checkPassword ;[b]                 |
        | test al, al                                 |
        | je 0x400a66 ;[c]                            |
        =---------------------------------------------=
              t f
      .-------' '-----------------------.
      |                                 |
      |                                 |
=-------------------------=     =---------------------------------=
|  0x400a66               |     |  0x400a5a                       |
| mov edi, str.Wrong_     |     | mov edi, str.Password_accepted_ |
| call sym.imp.puts ;[d]  |     | call sym.imp.puts ;[d]          |
=-------------------------=     | jmp 0x400a70 ;[e]               |
    v                           =---------------------------------=
    |                               v
    '-------------------.-----------'
                        |
                        |
                    =--------------------=
                    |  0x400a70          |
                    | mov eax, 0         |
                    | leave              |
                    | ret                |
                    =--------------------=

versus

[0x004008e3]> VV @ sym.main (nodes 4 edges 4 zoom 100%) BB-NORM mouse:canvas-y movements-speed:5
            =------------------------------------=
            |  0x4008e3                          |
            |   ;-- main:                        |
            | (fcn) sym.main 133                 |
            |   sym.main ();                     |
            | ; var int local_118h @ rbp-0x118   |
            | ; var int local_110h @ rbp-0x110   |
            | ; var int local_104h @ rbp-0x104   |
            | ; var int local_100h @ rbp-0x100   |
            | ; var int local_1h @ rbp-0x1       |
            | push rbp                           |
            | mov rbp, rsp                       |
            | sub rsp, 0x120                     |
            | mov dword [rbp - local_104h], edi  |
            | mov qword [rbp - local_110h], rsi  |
            | mov qword [rbp - local_118h], rdx  |
            | mov eax, 0                         |
            | call sym.banner ;[a]               |
            | mov edi, str.Enter_Password:       |
            | mov eax, 0                         |
            | call sym.imp.printf ;[b]           |
            | lea rax, [rbp - local_100h]        |
            | mov rsi, rax                       |
            | mov edi, str._255s                 |
            | mov eax, 0                         |
            | call sym.imp.__isoc99_scanf ;[c]   |
            | mov byte [rbp - local_1h], 0       |
            | lea rax, [rbp - local_100h]        |
            | mov rdi, rax                       |
            | call sym.checkPassword ;[d]        |
            | test al, al                        |
            | je 0x400957 ;[e]                   |
            =------------------------------------=
                  t f
      .-----------' '-------------------.
      |                                 |
      |                                 |
=-------------------------=     =---------------------------------=
|  0x400957               |     |  0x40094b                       |
| mov edi, str.Wrong_     |     | mov edi, str.Password_accepted_ |
| call sym.imp.puts ;[f]  |     | call sym.imp.puts ;[f]          |
=-------------------------=     | jmp 0x400961 ;[g]               |
    v                           =---------------------------------=
    |                               v
    '-------------------.-----------'
                        |
                        |
                    =--------------------=
                    |  0x400961          |
                    | mov eax, 0         |
                    | leave              |
                    | ret                |
                    =--------------------=

The second half of the function seems the same (pseudo):

int main(int argc, char **argv) {
    banner();
    [...]
    if(checkPassword(...)) {
        puts("Password accepted!\n");
    } else {
        puts("Wrong!\n");
    }
    return 0;
}

Whilst the previous challenges read the user password via scanf, this time some local variables are set.
One of them is set to zero, the other two to an address flagged as obj.passwords. As this seems very promising,
we’ll look what’s at this address (with px @ obj.passwords):

- offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
0x00601060  080b 4000 0000 0000 100b 4000 0000 0000  ..@.......@.....
0x00601070  180b 4000 0000 0000 200b 4000 0000 0000  ..@..... .@.....
0x00601080  280b 4000 0000 0000 0000 0000 0000 0000  (.@.............
0x00601090  4743 433a 2028 474e 5529 2036 2e31 2e31  GCC: (GNU) 6.1.1
0x006010a0  2032 3031 3630 3830 3200 0000 0000 0000   20160802.......
0x006010b0  0000 0000 0000 0000 0000 0000 0000 0000  ................
0x006010c0  0000 0000 0000 0000 0000 0000 0300 0100  ................
0x006010d0  3802 4000 0000 0000 0000 0000 0000 0000  8.@.............
0x006010e0  0000 0000 0300 0200 5402 4000 0000 0000  ........T.@.....
0x006010f0  0000 0000 0000 0000 0000 0000 0300 0300  ................
0x00601100  7402 4000 0000 0000 0000 0000 0000 0000  t.@.............
0x00601110  0000 0000 0300 0400 9802 4000 0000 0000  ..........@.....
0x00601120  0000 0000 0000 0000 0000 0000 0300 0500  ................
0x00601130  0803 4000 0000 0000 0000 0000 0000 0000  ..@.............
0x00601140  0000 0000 0300 0600 4805 4000 0000 0000  ........H.@.....
0x00601150  0000 0000 0000 0000 0000 0000 0300 0700  ................

Hm, this is obviously not a list of readable strings… But if we look closer,
we can see that the values at this address could contain again addresses…
(the 2nd-7th, 10th-15th, 18th-23th, … bytes are the same and fit into valid memory regions).

As this is a 64bit binary, we’ll do the hexdump again, but this time with quadwords (pxq):

0x00601060  0x0000000000400b08  0x0000000000400b10   ..@.......@.....
0x00601070  0x0000000000400b18  0x0000000000400b20   ..@..... .@.....
0x00601080  0x0000000000400b28  0x0000000000000000   (.@.............
0x00601090  0x4e4728203a434347  0x312e312e36202955   GCC: (GNU) 6.1.1
0x006010a0  0x3038303631303220  0x0000000000000032    20160802.......
0x006010b0  0x0000000000000000  0x0000000000000000   ................
0x006010c0  0x0000000000000000  0x0001000300000000   ................
0x006010d0  0x0000000000400238  0x0000000000000000   8.@.............
0x006010e0  0x0002000300000000  0x0000000000400254   ........T.@.....
0x006010f0  0x0000000000000000  0x0003000300000000   ................
0x00601100  0x0000000000400274  0x0000000000000000   t.@.............
0x00601110  0x0004000300000000  0x0000000000400298   ..........@.....
0x00601120  0x0000000000000000  0x0005000300000000   ................
0x00601130  0x0000000000400308  0x0000000000000000   ..@.............
0x00601140  0x0006000300000000  0x0000000000400548   ........H.@.....
0x00601150  0x0000000000000000  0x0007000300000000   ................

This looks far better… As we see, radare also shows the values in green this time, which is another point into
the address direction. By using pxQ we can advise radare2 to print them one per aline and additional give some meta infos about them:

0x00601060 0x0000000000400b08 str.shad7Pe
0x00601068 0x0000000000400b10 str.miTho7i
0x00601070 0x0000000000400b18 str.Wa3suac
0x00601078 0x0000000000400b20 str.ohGhah7
0x00601080 0x0000000000400b28 str.Aibah1a
0x00601088 0x0000000000000000 section.
0x00601090 0x4e4728203a434347 
0x00601098 0x312e312e36202955 
0x006010a0 0x3038303631303220 
0x006010a8 0x0000000000000032 section_end..comment+24
0x006010b0 0x0000000000000000 section.
0x006010b8 0x0000000000000000 section.
0x006010c0 0x0000000000000000 section.
0x006010c8 0x0001000300000000 
0x006010d0 0x0000000000400238 section..interp
0x006010d8 0x0000000000000000 section.
0x006010e0 0x0002000300000000 
0x006010e8 0x0000000000400254 section_end..interp
0x006010f0 0x0000000000000000 section.
0x006010f8 0x0003000300000000 
0x00601100 0x0000000000400274 section_end..note.ABI_tag
0x00601108 0x0000000000000000 section.
0x00601110 0x0004000300000000 
0x00601118 0x0000000000400298 section_end..note.gnu.build_id
0x00601120 0x0000000000000000 section.
0x00601128 0x0005000300000000 
0x00601130 0x0000000000400308 section..dynsym
0x00601138 0x0000000000000000 section.
0x00601140 0x0006000300000000 
0x00601148 0x0000000000400548 section_end..dynsym
0x00601150 0x0000000000000000 section.
0x00601158 0x0007000300000000 

If you want to have an even more annotated view, use pxr:

0x00601060  0x0000000000400b08   ..@..... (.rodata) str.shad7Pe R 0x65503764616873 (shad7Pe) --> ascii
0x00601068  0x0000000000400b10   ..@..... (.rodata) str.miTho7i R 0x69376f6854696d (miTho7i) --> ascii
0x00601070  0x0000000000400b18   ..@..... (.rodata) str.Wa3suac R 0x63617573336157 (Wa3suac) --> ascii
0x00601078  0x0000000000400b20    .@..... (.rodata) str.ohGhah7 R 0x3768616847686f (ohGhah7) --> ascii
0x00601080  0x0000000000400b28   (.@..... (.rodata) str.Aibah1a R 0x61316861626941 (Aibah1a) --> ascii
0x00601088  0x0000000000000000   ........ section_end.GNU_STACK
0x00601090  0x4e4728203a434347   GCC: (GN ascii
0x00601098  0x312e312e36202955   U) 6.1.1 ascii
0x006010a0  0x3038303631303220    2016080 ascii
0x006010a8  0x0000000000000032   2....... (.shstrtab) ascii
0x006010b0  0x0000000000000000   ........ section_end.GNU_STACK
0x006010b8  0x0000000000000000   ........ section_end.GNU_STACK
0x006010c0  0x0000000000000000   ........ section_end.GNU_STACK
0x006010c8  0x0001000300000000   ........
0x006010d0  0x0000000000400238   8.@..... (.interp) section.INTERP R 0x6c2f343662696c2f (/lib64/ld-linux-x86-64.so.2) --> ascii
0x006010d8  0x0000000000000000   ........ section_end.GNU_STACK
0x006010e0  0x0002000300000000   ........
0x006010e8  0x0000000000400254   T.@..... (.note.ABI_tag) section.NOTE R 0x1000000004
0x006010f0  0x0000000000000000   ........ section_end.GNU_STACK
0x006010f8  0x0003000300000000   ........
0x00601100  0x0000000000400274   t.@..... (.note.gnu.build_id) section..note.gnu.build_id R 0x1400000004
0x00601108  0x0000000000000000   ........ section_end.GNU_STACK
0x00601110  0x0004000300000000   ........
0x00601118  0x0000000000400298   ..@..... (.gnu.hash) section_end.NOTE R 0x800000003
0x00601120  0x0000000000000000   ........ section_end.GNU_STACK
0x00601128  0x0005000300000000   ........
0x00601130  0x0000000000400308   ..@..... (.dynsym) section..dynsym R 0x0 --> section_end.GNU_STACK
0x00601138  0x0000000000000000   ........ section_end.GNU_STACK
0x00601140  0x0006000300000000   ........
0x00601148  0x0000000000400548   H.@..... (.dynstr) section..dynstr R 0x6f732e6362696c00 --> ascii
0x00601150  0x0000000000000000   ........ section_end.GNU_STACK
0x00601158  0x0007000300000000   ........

So we know that the obj.passwords is an array of string pointers, suffixed with a null pointer. Let’s look again at
the main function. We see that this list is assigned to local variables (local_18h and local_10h), but it seems
that only a pointer to local_120h is passed as an argument to the checkPassword function (the lea rax, [rbp - local_120h] part).

Based on our current information the following stack layout is used:

rbp-0x120: 0000 0000
rbp-0x11c: ???? ????
[...]
rbp-0x018: &obj.passwords
rbp-0x010: &obj.passwords

This means that if you have a pointer to the local_120h variable, you’ll be able to read from forwards till the other
two variables. If we look into the checkPassword function, we’ll see that the address in the first argument is used
multiple times (stored in local_18h) and the value 0x110 is added to it. If we make some calculations (?-0x120 + 0x110)
We see that this matches to the local_10h variable of the main function:

-16 0xfffffffffffffff0 01777777777777777777760 17179869184.0G fffff000:0ff0 -16 1111111111111111111111111111111111111111111111111111111111110000 -16.0 -16.000000f -16.000000

(-16 is the same as -0x10 or 0xfffffffffffffff0 for quadwords). Such constructs are high indicators for structures.

radare2 has some rudimentary support for structures (primarily for representing data in memory) and a generic type system.

Based on our current knowledge, the structure should be something like this:

struct Foo {
    int field0;
    char unknown[260];
    char** field108;
    char** field110;
};

If we place this definition in a header file, we can tell radare2 to parse it with the to command. To view all currently stored structures, you can use the ts command:

Foo

By using t Foo you can generate a formatting command for viewing data based on the parsed definitions. As you notice, this produces some errors:

Cannot resolve type 'type.char **'
Cannot resolve type 'type.char **'
pf d[260]z field0 unknown

This is because radare2 currently doesn’t have a definition for the type char **. We can either change the field types to void* or define the missing type. For learning purposes, we’ll define the missing type.

Radare2 uses internally a database named sdb. This is more or less a key value storage with support for namespaces.

All types are defined inside the anal/types namespace. To interact with the database, you can use either the k command (complete database)
or the tk command (type namespace only). To define a new type, we’ll have to make it known to radare2. This is done by assigning the type identifer the string type:

tk char **=type

In the next step we tell radare how this value should be printed:

tk type.char **=p

We’ve used the p format here, which means that this type represents a pointer. (You can see all possible formatting options by using pf??).

This time, the t Foo command succeeds:

pf d[260]zpp field0 unknown field108 field110

Note: If you are interested how radare2 stores the information about this struct, run tk~Foo. The format for the fields is typeoffsetarraysize

As we’ve now the structure defined, we can use it in the main function. To do this, we’ll set the type of the local_120h variable to this new type Foo:

afvt local_120h Foo

(afvt stands for analysis function variables type).

This time, the disassembly has some additional annotations:

            ;-- main:
/ (fcn) sym.main 115
|   sym.main ();
|           ; var int local_138h @ rbp-0x138
|           ; var int local_130h @ rbp-0x130
|           ; var int local_124h @ rbp-0x124
|           ; var Foo local_120h @ rbp-0x120
|           ; UNKNOWN XREF from 0x004004a8 (unk)
|           ; DATA XREF from 0x004007dd (entry0)
|           0x00400a04      55             push rbp
|           0x00400a05      4889e5         mov rbp, rsp
|           0x00400a08      4881ec400100.  sub rsp, 0x140
|           0x00400a0f      89bddcfeffff   mov dword [rbp - local_124h], edi
|           0x00400a15      4889b5d0feff.  mov qword [rbp - local_130h], rsi
|           0x00400a1c      488995c8feff.  mov qword [rbp - local_138h], rdx
|           0x00400a23      b800000000     mov eax, 0
|           0x00400a28      e889feffff     call sym.banner
|           0x00400a2d      c785e0feffff.  mov dword [rbp - local_120h.field0], 0
|           0x00400a37      48c745e86010.  mov qword [rbp - local_120h.field108], obj.passwords
|           0x00400a3f      48c745f06010.  mov qword [rbp - local_120h.field110], obj.passwords
|           0x00400a47      488d85e0feff.  lea rax, [rbp - local_120h.field0]
|           0x00400a4e      4889c7         mov rdi, rax
|           0x00400a51      e815ffffff     call sym.checkPassword
|           0x00400a56      84c0           test al, al
|       ,=< 0x00400a58      740c           je 0x400a66
|       |   0x00400a5a      bfe90b4000     mov edi, str.Password_accepted_ ; "Password accepted!" @ 0x400be9
|       |   0x00400a5f      e81cfdffff     call sym.imp.puts
|      ,==< 0x00400a64      eb0a           jmp 0x400a70
|      |`-> 0x00400a66      bffc0b4000     mov edi, str.Wrong_         ; "Wrong!" @ 0x400bfc
|      |    0x00400a6b      e810fdffff     call sym.imp.puts
|      |    ; JMP XREF from 0x00400a64 (sym.main)
|      `--> 0x00400a70      b800000000     mov eax, 0
|           0x00400a75      c9             leave
\           0x00400a76      c3             ret

If your output is different from the one above (e.g. missing local_120h.field110), make sure that the offsets were correctly calculated in the database. You can change every entry with the tk command, e.g. tk struct.Foo.field110=char **,272,0.

As we can see, the output now nicely aligns with the assignments. An additional try to represent the asm code as C code
would now look like:

int main(int argc, char **argv) {
    struct Foo local_120h;
    banner();
    local_120h.field0 = 0;
    local_120h.field108 = obj.passwords;
    local_120h.field110 = obj.passwords;
    if(checkPassword(&local_120h)) {
        puts("Password accepted!\n");
    } else {
        puts("Wrong!\n");
    }
    return 0;
}

The next function (checkPassword) is (obviously) also different to the previous ones:

[0x0040096b]> VV @ sym.checkPassword (nodes 7 edges 8 zoom 100%) BB-NORM mouse:canvas-y movements-speed:5
                     =-----------------------------------------------=
                     |  0x40096b                                     |
                     | (fcn) sym.checkPassword 153                   |
                     |   sym.checkPassword ();                       |
                     | ; var int local_18h @ rbp-0x18                |
                     | ; var int local_4h @ rbp-0x4                  |
                     | push rbp                                      |
                     | mov rbp, rsp                                  |
                     | sub rsp, 0x20                                 |
                     | mov qword [rbp - local_18h], rdi              |
                     | mov rax, qword [rbp - local_18h]              |
                     | mov rdx, qword [rax + section_end..shstrtab]  |
                     | mov rax, qword [rbp - local_18h]              |
                     | mov qword [rax + 0x110], rdx                  |
                     | jmp 0x4009ea ;[a]                             |
                     =-----------------------------------------------=
                         v
                         '-----.
                               |
                               .------------.
                           =----------------------------------=
                           |  0x4009ea      |                 |
                           | mov rax, qword [rbp - local_18h] |
                           | mov rax, qword [rax + 0x110]     |
                           | mov rax, qword [rax]             |
                           | test rax, rax  |                 |
                           | jne 0x40098f ;[b]                |
                           =----------------------------------=
                                 t f        |
        .------------------------' '--------|----------------------.
        |                                   |                      |
        |                                   |                      |
  =-----------------------------------=     |              =--------------------=
  |  0x40098f                         |     |              |  0x4009fd          |
  | mov rax, qword [rbp - local_18h]  |     |              | mov eax, 1         |
  | mov rdi, rax                      |     |              =--------------------=
  | call sym.readPassword ;[c]        |     |                  v
  | mov dword [rbp - local_4h], eax   |     |                  |
  | mov eax, dword [rbp - local_4h]   |     |                  |
  | movsxd rdx, eax                   |     |                  |
  | mov rax, qword [rbp - local_18h]  |     |                  |
  | mov rax, qword [rax + 0x110]      |     |                  |
  | mov rax, qword [rax]              |     |                  |
  | mov rcx, qword [rbp - local_18h]  |     |                  |
  | add rcx, 4                        |     |                  |
  | mov rsi, rax                      |     |                  '------.
  | mov rdi, rcx                      |     |                         |
  | call sym.imp.strncmp ;[d]         |     |                         |
  | test eax, eax                     |     |                         |
  | je 0x4009d0 ;[e]                  |     |                         |
  =-----------------------------------=     |                         |
          f t                               |                         |
        .-' '---------------------.         |                         |
        |                         |         |                         |
        |                         |         |                         |
=--------------------=      =-----------------------------------=     |
|  0x4009c9          |      |  0x4009d0     |                   |     |
| mov eax, 0         |      | mov rax, qword [rbp - local_18h]  |     |
| jmp 0x400a02 ;[f]  |      | mov rax, qword [rax + 0x110]      |     |
=--------------------=      | lea rdx, [rax + 8]                |     |
    v                       | mov rax, qword [rbp - local_18h]  |     |
    |                       | mov qword [rax + 0x110], rdx      |     |
    |                       =-----------------------------------=     |
    '----------------------------.----------'                         |
                                 |------------------------------------'
                                 |
                                 |
                             =--------------------=
                             |  0x400a02          |
                             | leave              |
                             | ret                |
                             =--------------------=

First of all, the pointer to the previously used structure is stored in the local variable local_18h. We’ll rename it
to a more meaningful name (afvn local_18h passwordStruct). Let’s also look into the function block by block (e.g. using the minimap mode of VV (remember from the last post?)).

    ; UNKNOWN XREF from 0x00400508 (unk)
    ; CALL XREF from 0x00400a51 (sym.main)
    0x0040096b      55             push rbp
    0x0040096c      4889e5         mov rbp, rsp
    0x0040096f      4883ec20       sub rsp, 0x20
    0x00400973      48897de8       mov qword [rbp - passwordStruct], rdi
    0x00400977      488b45e8       mov rax, qword [rbp - passwordStruct]
    0x0040097b      488b90080100.  mov rdx, qword [rax + section_end..shstrtab] ; [0x108:8]=0x288 LEA section_end..shstrtab ; section_end..shstrtab
    0x00400982      488b45e8       mov rax, qword [rbp - passwordStruct]
    0x00400986      488990100100.  mov qword [rax + 0x110], rdx
    0x0040098d      eb5b           jmp 0x4009ea

This block does only two things. First, it reserves space on the stack (0x20 or 32 bytes). Second, it stores some values based on the input struct. The input pointer goes to our local variable at 0x400973, this address is afterwards stored into rax to access it’s members (typical for structures). In 0x40097b the field a offset 0x108 is accessed (radare2 was a bit to eager when it tried to replace constants with flags). This means it reads the value of our field108 variable. The value is then stored into a field at offset 0x110(field110). Sadly, the inline annotation of structure offsets is currently not implemented in radare2 (but is already planned for future releases).

int checkPassword(struct Foo *input) {
    // 0x00400973
    struct Foo* passwordStruct = input;
    // 0x00400977 - 0x00400986
    passwordStruct->field110 = passwordStruct->field108;

The following uncondtional jump brings us to the following small block:

    ; JMP XREF from 0x0040098d (sym.checkPassword)
    0x004009ea      488b45e8       mov rax, qword [rbp - passwordStruct]
    0x004009ee      488b80100100.  mov rax, qword [rax + 0x110]        ; [0x110:8]=0x290
    0x004009f5      488b00         mov rax, qword [rax]
    0x004009f8      4885c0         test rax, rax
    0x004009fb      7592           jne 0x40098f

This block simply checks if the pointer in field110 points to a valid pointer itself.

// 0x004009ea - 0x004009fb
if (*passwordStruct->field110) {

If the pointer is valid, it will jump to the largest block of this function:

    0x0040098f      488b45e8       mov rax, qword [rbp - passwordStruct]
    0x00400993      4889c7         mov rdi, rax
    0x00400996      e854ffffff     call sym.readPassword
    0x0040099b      8945fc         mov dword [rbp - local_4h], eax
    0x0040099e      8b45fc         mov eax, dword [rbp - local_4h]
    0x004009a1      4863d0         movsxd rdx, eax
    0x004009a4      488b45e8       mov rax, qword [rbp - passwordStruct]
    0x004009a8      488b80100100.  mov rax, qword [rax + 0x110]        ; [0x110:8]=0x290
    0x004009af      488b00         mov rax, qword [rax]
    0x004009b2      488b4de8       mov rcx, qword [rbp - passwordStruct]
    0x004009b6      4883c104       add rcx, 4
    0x004009ba      4889c6         mov rsi, rax
    0x004009bd      4889cf         mov rdi, rcx
    0x004009c0      e8abfdffff     call sym.imp.strncmp
    0x004009c5      85c0           test eax, eax
    0x004009c7      7407           je 0x4009d0

This block uses the pointer to the structure as an argument for the function readPassword. The result of this function (eax) is then stored into a local variable local_4h. This value is afterwards used for the following strncmp call (third argument, rdx). The first argument of this call is our current unknown field (offset 4) of the structure, the second argument is where field110 points at.

A pseudo representation might look like the following:

// 0x0040098f - 0x0040099b
int x = readPassword(passwordStruct);
// 0x0040099e - 0x004009c7
if (!strncmp(passwordStruct->unknown, *passwordStruct->field110, x)) {

If the inputs are equal, the following block will be exectued

    0x004009d0      488b45e8       mov rax, qword [rbp - passwordStruct]
    0x004009d4      488b80100100.  mov rax, qword [rax + 0x110]        ; [0x110:8]=0x290
    0x004009db      488d5008       lea rdx, [rax + 8]                  ; 0x8
    0x004009df      488b45e8       mov rax, qword [rbp - passwordStruct]
    0x004009e3      488990100100.  mov qword [rax + 0x110], rdx

This is again a smaller one, which takes the value in field110, increments it by 8 (one pointer size on 64bit systems) and stores it.

// 0x004009d0 - 0x004009e3
// passwordStruct->field110 = ((char*)passwordStruct->field110) + 8;
passwordStruct->field110++;

As this starts now starts over at 0x004009ea we’ll continue with the non matching case of the strings.

    0x004009c9      b800000000     mov eax, 0
    0x004009ce      eb32           jmp 0x400a02

This is simple, it just sets eax to zero and jumps to the end of the function

} else {
    // 0x004009c9 & 0x00400a02
    return 0;
}

The last block(s) represent the success case of the function

    0x004009fd      b801000000     mov eax, 1
    ; JMP XREF from 0x004009ce (sym.checkPassword)
    0x00400a02      c9             leave
    0x00400a03      c3             ret
    }
    // 0x004009fd - 0x00400a03
    return 1;
}

With some refactoring based on an overall look (jumping back to the zero check for example) we can represent the function as the following:

int checkPassword(struct Foo *input) {
    // 0x00400973
    struct Foo* passwordStruct = input;
    // 0x00400977 - 0x00400986
    passwordStruct->field110 = passwordStruct->field108;
    // 0x004009ea - 0x004009fb
    while (*passwordStruct.field110) {
        // 0x0040098f - 0x0040099b
        int x = readPassword(passwordStruct);
        // 0x0040099e - 0x004009c7
        if (!strncmp(passwordStruct->unknown, *passwordStruct->field110, x)) {
            // 0x004009d0 - 0x004009e3
            // passwordStruct->field110 = ((char*)passwordStruct->field110) + 8;
            passwordStruct->field110++;
        } else {
            // 0x004009c9 & 0x00400a02
            return 0;
        }
    }
    // 0x004009fd - 0x00400a03
    return 1;
}

As it seems, there is still one additional function missing (readPassword).

/ (fcn) sym.readPassword 124
|   sym.readPassword ();
|           ; var int local_8h @ rbp-0x8
|           ; CALL XREF from 0x00400996 (sym.checkPassword)
|           0x004008ef      55             push rbp
|           0x004008f0      4889e5         mov rbp, rsp
|           0x004008f3      4883ec10       sub rsp, 0x10
|           0x004008f7      48897df8       mov qword [rbp - local_8h], rdi
|           0x004008fb      488b45f8       mov rax, qword [rbp - local_8h]
|           0x004008ff      488b80100100.  mov rax, qword [rax + 0x110] ; [0x110:8]=0x290
|           0x00400906      4889c2         mov rdx, rax
|           0x00400909      488b45f8       mov rax, qword [rbp - local_8h]
|           0x0040090d      488b80080100.  mov rax, qword [rax + section_end..shstrtab] ; [0x108:8]=0x288 LEA section_end..shstrtab ; section_end..shstrtab
|           0x00400914      4829c2         sub rdx, rax
|           0x00400917      4889d0         mov rax, rdx
|           0x0040091a      48c1f803       sar rax, 3
|           0x0040091e      4883c001       add rax, 1
|           0x00400922      4889c6         mov rsi, rax
|           0x00400925      bfcb0b4000     mov edi, str.Enter_Password_no.__d: ; "Enter Password no. %d: " @ 0x400bcb
|           0x0040092a      b800000000     mov eax, 0
|           0x0040092f      e86cfeffff     call sym.imp.printf
|           0x00400934      488b45f8       mov rax, qword [rbp - local_8h]
|           0x00400938      4883c004       add rax, 4
|           0x0040093c      4889c6         mov rsi, rax
|           0x0040093f      bfe30b4000     mov edi, str._255s          ; "%255s" @ 0x400be3
|           0x00400944      b800000000     mov eax, 0
|           0x00400949      e862feffff     call sym.imp.__isoc99_scanf
|           0x0040094e      488b45f8       mov rax, qword [rbp - local_8h]
|           0x00400952      c68003010000.  mov byte [rax + 0x103], 0
|           0x00400959      488b45f8       mov rax, qword [rbp - local_8h]
|           0x0040095d      4883c004       add rax, 4
|           0x00400961      4889c7         mov rdi, rax
|           0x00400964      e827feffff     call sym.imp.strlen
|           0x00400969      c9             leave
\           0x0040096a      c3             ret

This function has no jumps or internal function calls. The summary view (pdfs) shows which flags and calls are used
by the function:

0x00400925 "Enter Password no. %d: "
0x0040092f call sym.imp.printf
0x0040093f "%255s"
0x00400949 call sym.imp.__isoc99_scanf
0x00400964 call sym.imp.strlen

Simply based on this output we could already guess what this function does:

int readPassword(struct Foo* foo) {
    printf("Enter Password no. %d\n");
    scanf("%255s");
    return strlen();
}

Obviously, this is not enough for us, as we want to know exactly what’s really going on. Let’s start with the part 0x004008ef-0x0040090d:

The first instructions represent again a typical function prologue. They reserve 0x10 bytes on the stack and store the input pointer into this buffer (local_8h). Afterwards, field110 is stored in rdx and field108 in rax.

int readPassword(struct Foo* foo) {
    // 0x004008ef-0x0040090d
    struct Foo* local_8h = foo;
    rdx = local_8h->field110;
    rax = local_8h->field108;

Part 0x00400914-0x0040091e consists of some mathematical instructions which can be translated into the following code snippet:

rax = rdx - rax
rax >>= 3
rax += 1

As we know, the rdx and rax registers contain pointers to a list of strings. We also know (based on the checkPassword function) that field110 is incremented during the processing, whilst field108 stays the same. This ultimatively means that rdx - rax describes the offset between the two list entries on which field110 and field108 are pointing at. The next instruction (rax >>= 3) dos a right shift of the value in rax(our offset) by three positions. A right shift by one is equivalent to a division by two, therefore we have a
division by eight here. As we are dealing with 64bit pointers, these two instructions simply calculate how many times the field110 pointer was moved forward. The last increment just increments them by one, as (based on the following string) the current (one based) position should be printed out.

Together with the instructions 0x00400922-0x0040092f we get:

int readPassword(struct Foo* foo) {
    // 0x004008ef-0x0040092f
    struct Foo* local_8h = foo;
    printf("Enter Password no. %d\n", (local_8h->field110 - local_8h-field108) + 1);

Part 0x00400934-0x00400952 represents a scanf call with the unknown field of the struct (offset 4) followed by an explicit zero byte at the end:

// 0x00400934-0x00400952
scanf("%255s", local_8h->unknown);
local_8h->unknown[0x103] = 0;

The end of the function simply returns the result of a strlen call with the input read by scanf.

int readPassword(struct Foo* foo) {
    // 0x004008ef-0x0040092f
    struct Foo* local_8h = foo;
    printf("Enter Password no. %d\n", (local_8h->field110 - local_8h-field108) + 1);
    // 0x00400934-0x00400952
    scanf("%255s", local_8h->unknown);
    local_8h->unknown[259] = 0;
    // 0x00400959 - 0x0040096a
    return strlen(local_8h->unknown);
}

As we have now reversed all functions, we can see that this time multiple passwords are required and that they are stored in an array at obj.passwords. We’ve also reversed the format of the structure and (just for reference) can now rename the corresponding fields. This could be either done by editing the
header file, editing the database entries directly (tk) or by overwriting the definition inline

"td struct Foo {int field0; char password[260]; char** passwordListHead; char** currentPasswordListEntry;};"

Note: the quotes around the command are required to ensure that the definition is parsed correctly. Also be aware that the offsets may be incorrectly parsed.

Let’s test the passwords:

$ ./challenge03
##################################
#          Challenge 3           #
#                                #
#      (c) 2016 Timo Schmid      #
##################################
Enter Password no. 1: shad7Pe
Enter Password no. 2: miTho7i
Enter Password no. 3: Wa3suac
Enter Password no. 4: ohGhah7
Enter Password no. 5: Aibah1a
Password accepted!

Challenge 0x04

You will find the next challenge here (linux64 dynamically linked, linux64 statically linked, win64):

https://github.com/bluec0re/reversing-radare2/tree/master/sources/challenge04

MD5 (challenge04) = 215a8fa0a80c95082f8fb279ad7e5b9b
MD5 (challenge04.exe) = 016b96d38e914a96dca372b30d6d0949
MD5 (challenge04-static) = 87e67337a34fa8892bc2fc3c68c4db7d

SHA1 (challenge04) = 95a042b68c036c1d6b6698820589480be29a1437
SHA1 (challenge04.exe) = e4351d511c05666984c43d24cd4c338b3d715a66
SHA1 (challenge04-static) = 64d1ce0ae9a653860953faa87f88377fc0eda34b

SHA256 (challenge04) = 2829ef89135f9065c8b1122a13cde09df5d19768e25feb162a758ab168e1bfb4
SHA256 (challenge04.exe) = d63aa0768ba64eb5211a4f86c07b4b45b17e6125b79b24620f854c70aab401de
SHA256 (challenge04-static) = f723e4f44657484945501bf5ccdab2e7dd229db719f3568ad3cd88da382dccf9

The goal is again to find the correct password for the login into the binary. Next time, I’ll give you a walkthrough for the fourth challenge and challenge #5.

Happy reversing!

Reverse Engineering With Radare2 – Part 2

Welcome back to the radare2 reversing tutorials. If you’ve missed the previous parts, you can find them here and here.

Last time we’ve used the rabin2 application to view the  strings found inside the challenge01 binary to find password candidates. Based on the results we looked into the assembly to find the correct password. In this post, we’ll go through the next challenge and try out some of the features provided by radare2.

As recommended by the developers, I strongly suggest that you update your radare2 installation regularly, as it gets new features and bugfixes nearly every day.

Today, we’ll have a look into some of the visualization features of radare2.

First of all, you may have been noticed that the radare2 output is rather colorful. If the current colorscheme does not fit your needs, you’ll have multiple options to modify it. Several configuration parameters exist under the scr configuration space. You can view the different settings by using e?scr:

          scr.atport: V@ starts a background http server and spawns an r2 -C
           scr.color: Enable colors
     scr.color.bytes: Colorize bytes that represent the opcodes of the instruction
       scr.color.ops: Colorize numbers and registers in opcodes
         scr.columns: Force console column count (width)
            scr.echo: Show rcons output in realtime to stderr and buffer
        scr.feedback: Set visual feedback level (1=arrow on jump, 2=every key (useful for videos))
           scr.fgets: Use fgets() instead of dietline for prompt input
     scr.fix_columns: Workaround for Prompt iOS SSH client
        scr.fix_rows: Workaround for Linux TTY
             scr.fps: Show FPS in Visual
       scr.highlight: Highlight that word at RCons level
        scr.histsave: Always save history on exit
            scr.html: Disassembly uses HTML syntax
     scr.interactive: Start in interactive mode
            scr.nkey: Select the seek mode in visual
            scr.null: Show no output
           scr.pager: Select pager program (when output overflows the window)
       scr.pipecolor: Enable colors when using pipes
          scr.prompt: Show user prompt (used by r2 -q)
      scr.promptfile: Show user prompt file (used by r2 -q)
      scr.promptflag: Show flag name in the prompt
      scr.promptsect: Show section name in the prompt
         scr.randpal: Random color palete or just get the next one from 'eco'
      scr.responsive: Auto-adjust Visual depending on screen (e.g. unset asm.bytes)
        scr.rgbcolor: Use RGB colors (not available on Windows)
            scr.rows: Force console row count (height) (duplicate?)
            scr.seek: Seek to the specified address on startup
             scr.tee: Pipe output to file of this name
       scr.truecolor: Manage color palette (0: ansi 16, 1: 256, 2: 16M)
            scr.utf8: Show UTF-8 characters instead of ANSI
           scr.wheel: Mouse wheel in Visual; temporaryly disable/reenable by right click/Enter)
       scr.wheelnkey: Use sn/sp and scr.nkey on wheel instead of scroll
      scr.wheelspeed: Mouse wheel speed

For example if you have a terminal supporting utf8 characters, you might want to enable scr.utf8:

Before:

            ;-- main:
/ (fcn) sym.main 133
|           ; var int local_118h @ rbp-0x118
|           ; var int local_110h @ rbp-0x110
|           ; var int local_104h @ rbp-0x104
|           ; var int local_100h @ rbp-0x100
|           ; var int local_1h @ rbp-0x1
|           ; UNKNOWN XREF from 0x00400470 (unk)
|           ; DATA XREF from 0x0040073d (entry0)
|           0x004008e3      55             push rbp
|           0x004008e4      4889e5         mov rbp, rsp
|           0x004008e7      4881ec200100.  sub rsp, 0x120
|           0x004008ee      89bdfcfeffff   mov dword [rbp - local_104h], edi
|           0x004008f4      4889b5f0feff.  mov qword [rbp - local_110h], rsi
|           0x004008fb      488995e8feff.  mov qword [rbp - local_118h], rdx
|           0x00400902      b800000000     mov eax, 0
|           0x00400907      e80affffff     call sym.banner
|           0x0040090c      bf9d0a4000     mov edi, str.Enter_Password: ; "Enter Password: " @ 0x400a9d
|           0x00400911      b800000000     mov eax, 0
|           0x00400916      e8e5fdffff     call sym.imp.printf
|           0x0040091b      488d8500ffff.  lea rax, [rbp - local_100h]
|           0x00400922      4889c6         mov rsi, rax
|           0x00400925      bfae0a4000     mov edi, str._255s          ; "%255s" @ 0x400aae
|           0x0040092a      b800000000     mov eax, 0
|           0x0040092f      e8dcfdffff     call sym.imp.__isoc99_scanf
|           0x00400934      c645ff00       mov byte [rbp - local_1h], 0
|           0x00400938      488d8500ffff.  lea rax, [rbp - local_100h]
|           0x0040093f      4889c7         mov rdi, rax
|           0x00400942      e808ffffff     call sym.checkPassword
|           0x00400947      84c0           test al, al
|       ,=< 0x00400949      740c           je 0x400957
|       |   0x0040094b      bfb40a4000     mov edi, str.Password_accepted_ ; "Password accepted!" @ 0x400ab4
|       |   0x00400950      e88bfdffff     call sym.imp.puts
|      ,==< 0x00400955      eb0a           jmp 0x400961
|      |`-> 0x00400957      bfc70a4000     mov edi, str.Wrong_         ; "Wrong!" @ 0x400ac7
|      |    0x0040095c      e87ffdffff     call sym.imp.puts
|      |    ; JMP XREF from 0x00400955 (sym.main)
|      `--> 0x00400961      b800000000     mov eax, 0
|           0x00400966      c9             leave
\           0x00400967      c3             ret

 

After e scr.utf8=true:

            ;-- main:
╒ (fcn) sym.main 133
│           ; var int local_118h @ rbp-0x118
│           ; var int local_110h @ rbp-0x110
│           ; var int local_104h @ rbp-0x104
│           ; var int local_100h @ rbp-0x100
│           ; var int local_1h @ rbp-0x1
│           ; UNKNOWN XREF from 0x00400470 (unk)
│           ; DATA XREF from 0x0040073d (entry0)
│           0x004008e3      55             push rbp
│           0x004008e4      4889e5         mov rbp, rsp
│           0x004008e7      4881ec200100.  sub rsp, 0x120
│           0x004008ee      89bdfcfeffff   mov dword [rbp - local_104h], edi
│           0x004008f4      4889b5f0feff.  mov qword [rbp - local_110h], rsi
│           0x004008fb      488995e8feff.  mov qword [rbp - local_118h], rdx
│           0x00400902      b800000000     mov eax, 0
│           0x00400907      e80affffff     call sym.banner
│           0x0040090c      bf9d0a4000     mov edi, str.Enter_Password: ; "Enter Password: " @ 0x400a9d
│           0x00400911      b800000000     mov eax, 0
│           0x00400916      e8e5fdffff     call sym.imp.printf
│           0x0040091b      488d8500ffff.  lea rax, [rbp - local_100h]
│           0x00400922      4889c6         mov rsi, rax
│           0x00400925      bfae0a4000     mov edi, str._255s          ; "%255s" @ 0x400aae
│           0x0040092a      b800000000     mov eax, 0
│           0x0040092f      e8dcfdffff     call sym.imp.__isoc99_scanf
│           0x00400934      c645ff00       mov byte [rbp - local_1h], 0
│           0x00400938      488d8500ffff.  lea rax, [rbp - local_100h]
│           0x0040093f      4889c7         mov rdi, rax
│           0x00400942      e808ffffff     call sym.checkPassword
│           0x00400947      84c0           test al, al
│       ┌─< 0x00400949      740c           je 0x400957
│       │   0x0040094b      bfb40a4000     mov edi, str.Password_accepted_ ; "Password accepted!" @ 0x400ab4
│       │   0x00400950      e88bfdffff     call sym.imp.puts
│      ┌──< 0x00400955      eb0a           jmp 0x400961
│      │└─> 0x00400957      bfc70a4000     mov edi, str.Wrong_         ; "Wrong!" @ 0x400ac7
│      │    0x0040095c      e87ffdffff     call sym.imp.puts
│      │    ; JMP XREF from 0x00400955 (sym.main)
│      └──> 0x00400961      b800000000     mov eax, 0
│           0x00400966      c9             leave
╘           0x00400967      c3             ret

 

Currently, it mainly affects the drawn ascii arrows on the left hand side of the disassemblies. Of course you can also disable the coloring entirely by executing e scr.color = false (entirely), e scr.color.bytes = false (only the bytes in the disassembly) or e scr.color.ops = false (only the opcodes). If you like colors, but not the current scheme, you can choose another one by using the eccommands. One option is to choose one of the predefined themes, for example eco solarized:

            ;-- main:
╒ (fcn) sym.main 133
│           ; var int local_118h @ rbp-0x118
│           ; var int local_110h @ rbp-0x110
│           ; var int local_104h @ rbp-0x104
│           ; var int local_100h @ rbp-0x100
│           ; var int local_1h @ rbp-0x1
│           ; UNKNOWN XREF from 0x00400470 (unk)
│           ; DATA XREF from 0x0040073d (entry0)
│           0x004008e3      55             push rbp
│           0x004008e4      4889e5         mov rbp, rsp
│           0x004008e7      4881ec200100.  sub rsp, 0x120
│           0x004008ee      89bdfcfeffff   mov dword [rbp - local_104h], edi
│           0x004008f4      4889b5f0feff.  mov qword [rbp - local_110h], rsi
│           0x004008fb      488995e8feff.  mov qword [rbp - local_118h], rdx
│           0x00400902      b800000000     mov eax, 0
│           0x00400907      e80affffff     call sym.banner
│           0x0040090c      bf9d0a4000     mov edi, str.Enter_Password: ; "Enter Password: " @ 0x400a9d
│           0x00400911      b800000000     mov eax, 0
│           0x00400916      e8e5fdffff     call sym.imp.printf
│           0x0040091b      488d8500ffff.  lea rax, [rbp - local_100h]
│           0x00400922      4889c6         mov rsi, rax
│           0x00400925      bfae0a4000     mov edi, str._255s          ; "%255s" @ 0x400aae
│           0x0040092a      b800000000     mov eax, 0
│           0x0040092f      e8dcfdffff     call sym.imp.__isoc99_scanf
│           0x00400934      c645ff00       mov byte [rbp - local_1h], 0
│           0x00400938      488d8500ffff.  lea rax, [rbp - local_100h]
│           0x0040093f      4889c7         mov rdi, rax
│           0x00400942      e808ffffff     call sym.checkPassword
│           0x00400947      84c0           test al, al
│       ┌─< 0x00400949      740c           je 0x400957
│       │   0x0040094b      bfb40a4000     mov edi, str.Password_accepted_ ; "Password accepted!" @ 0x400ab4
│       │   0x00400950      e88bfdffff     call sym.imp.puts
│      ┌──< 0x00400955      eb0a           jmp 0x400961
│      │└─> 0x00400957      bfc70a4000     mov edi, str.Wrong_         ; "Wrong!" @ 0x400ac7
│      │    0x0040095c      e87ffdffff     call sym.imp.puts
│      │    ; JMP XREF from 0x00400955 (sym.main)
│      └──> 0x00400961      b800000000     mov eax, 0
│           0x00400966      c9             leave
╘           0x00400967      c3             ret

(use eco without any parameters for a complete list). If this does not fits your need, you could also use random colors (ecr) or set the colors yourself (ec for a list of all available keys and ec <key> <frontcolor> (<backcolor>) eg. ec comment rgb:ffff00 blue (use ecs for supported colors)).

Besides the coloring, you’ll find various settings which affect the output of the disassembly in the asm configuration space. They allow you, for example, to customize the spacing between the different metadata of the disassembly (flow arrows, bytes, addresses, opcodes, comments,…) and also turning them on and off.

After you found some pleasing settings, let’s start with the current binary (btw, you can store option commands in the $HOME/.radare2rc file for persistence). After loading the binary into radare2 (r2 -AA ./challenge02), you should find yourself at the address 0x00400720. In case of some of you are wondering about why we are placed here and not at the main function, we’ll use different ways to get a clue. Assuming that we are currently placed inside or at the beginning of a function we could use the aficommand to give us some information about the function we are placed in:

[0x00400720]> afi
#
 offset: 0x00400720
 name: entry0
 size: 43
 realsz: 43
 stackframe: 0
 call-convention: @�adAV
 cyclomatic-complexity: 1
 bits: 64
 type: fcn [NEW]
 num-bbs: 1
 edges: 0
 end-bbs: 1
 call-refs: 
 data-refs: 0x004009e0 0x00400970 0x004008e3 0x00600ff0 
 code-xrefs: 
 data-xrefs: 
 locals:0
 args: 0
 diff: type: new

 

As you may notice, besides the other different information, we get the name of the current function which is entry0. This strongly indicates that we are currently placed at the entrypoint of the application, which typically sets some things up (like parsing the commandline, retrieving environment variables etc.) and calls the actual main function afterwards. To check this assumption, we’ll use the ie command to get the address of the entry point as specified in the file headers (elf64 in this case):

[0x00400720]> ie
[Entrypoints]
vaddr=0x00400720 paddr=0x00000720 baddr=0x00400000 laddr=0x00000000 type=program

1 entrypoints

 

As you can see, the vaddr (virtual address) is matching the current position, so we are actually placed at the beginning of the entry point function.

Sidenote: the virtual address is most of the time a combination of the physical address (paddr; position in the binary) and the base address (baddr).

 

Let’s take a look at the disassembly of this function:

 

            ;-- section_end..plt:
            ;-- section..text:
            ;-- _start:
╒ (fcn) entry0 43
│           ; UNKNOWN XREF from 0x00400440 (unk)
│           0x00400720      31ed           xor ebp, ebp                ; [13] va=0x00400720 pa=0x00000720 sz=706 vsz=706 rwx=--r-x .text
│           0x00400722      4989d1         mov r9, rdx
│           0x00400725      5e             pop rsi
│           0x00400726      4889e2         mov rdx, rsp
│           0x00400729      4883e4f0       and rsp, 0xfffffffffffffff0
│           0x0040072d      50             push rax
│           0x0040072e      54             push rsp
│           0x0040072f      49c7c0e00940.  mov r8, sym.__libc_csu_fini ; sym.__libc_csu_fini
│           0x00400736      48c7c1700940.  mov rcx, sym.__libc_csu_init ; "AWAVA..AUATL.%.. " @ 0x400970
│           0x0040073d      48c7c7e30840.  mov rdi, sym.main           ; "UH..H.. ." @ 0x4008e3
│           0x00400744      ff15a6082000   call qword [rip + 0x2008a6] ; [0x600ff0:8]=0 LEA reloc.__libc_start_main_240 ; reloc.__libc_start_main_240
╘           0x0040074a      f4             hlt

 

After some stack setup between lines 0x400720 and 0x400729, some function is called in line 0x400744with the main function as the first parameter (mov rdi, sym.main) besides some other parameters. As you can see, the call is using a process counter relative addressing. This technique is typically used to produce position independent code (PIC) which is needed for the ASLR technique. As the rip is pointing to the next address to be executed, it will contain the address 0x40074a. This results in the address 0x600ff0(to calculate in radare2: ? 0x40074a+0x2008a6).

If we look into the sections of the binary (iS) we’ll see that this address belongs to the the .got section. This section contains the global object table (GOT) and is filled by the linker during load. It’s used as a jump table for library functions. Radare2 is also able to parse the relocation tables to tell us which functions will be mapped at runtime (ir command):

 

[Relocations]
vaddr=0x00600ff0 paddr=0x00000ff0 type=SET_64 __libc_start_main
vaddr=0x00600ff8 paddr=0x00000ff8 type=SET_64 __gmon_start__
vaddr=0x00601018 paddr=0x00001018 type=SET_64 puts
vaddr=0x00601020 paddr=0x00001020 type=SET_64 strlen
vaddr=0x00601028 paddr=0x00001028 type=SET_64 printf
vaddr=0x00601030 paddr=0x00001030 type=SET_64 __isoc99_scanf

6 relocations

Therefore the __libc_start_main function will be called at runtime :)… Some of you may already noticed that radare is smart enough to provide this information to us in the comment at the call line ;).

Radare is also able to replace the mnemonic RIP-relative addressing with the target function name. To get this, you’ve to enable the asm.relsub option:

call qword [reloc.__libc_start_main_240] ; [0x600ff0:8]=0 LEA reloc.__libc_start_main_240 ; reloc.__libc_start_main_240

 

As the __libc_start_main function is a standard library function provided by the linux library =libc, we’ll jump directly into the main function (s sym.main).

This time, we’ll try to get a fast overview of the referenced flags in this function (strings, global variables, functions, etc.). To get a graph of the references, we have two options. The first one is to generate a graphviz graph by using agc $$ | xdot - ($$ is a variable containing the current address, sym.main in this case):

Reference Graph

The other option would be the ascii graph in the visual mode (we’ll come to this later) which you’ll get by executing VV and then pressing >. As this looks very similar to the last challenge, we could directly look into the checkPassword function. You can do this either by seeking to it (s sym.checkPassword) or in the visual graph view by pressing the tab key until the function is marked. After pressing q, you’ll be placed inside the function (press q multiple times if you want to return to the main interface).

 

As we want to take a look into some of the visual modes of radere2, we’ll stay in the graph mode this time (either by pressing q only once or by entering VV).

[0x0040084f]> VV @ sym.checkPassword (nodes 9 edges 11 zoom 100%) BB-NORM mouse:canvas-y movements-speed:5
                                  =---------------------------------------------=
                                  |  0x40084f                                   |
                                  | (fcn) sym.checkPassword 148                 |
                                  | ; var int local_28h @ rbp-0x28              |
                                  | ; var int local_1ch @ rbp-0x1c              |
                                  | ; var int local_18h @ rbp-0x18              |
                                  | ; var int local_14h @ rbp-0x14              |
                                  | ; var int local_10h @ rbp-0x10              |
                                  | ; var int local_4h @ rbp-0x4                |
                                  | push rbp                                    |
                                  | mov rbp, rsp                                |
                                  | sub rsp, 0x30                               |
                                  | mov qword [rbp - local_28h], rdi            |
                                  | mov qword [rbp - local_10h], str.Sup3rP4ss  |
                                  | mov rax, qword [rbp - local_10h]            |
                                  | mov rdi, rax                                |
                                  | call sym.imp.strlen ;[a]                    |
                                  | mov dword [rbp - local_14h], eax            |
                                  | mov rax, qword [rbp - local_28h]            |
                                  | mov rdi, rax                                |
                                  | call sym.imp.strlen ;[a]                    |
                                  | mov dword [rbp - local_18h], eax            |
                                  | mov eax, dword [rbp - local_14h]            |
                                  | cmp eax, dword [rbp - local_18h]            |
                                  | je 0x400890 ;[b]                            |
                                  =---------------------------------------------=
                                        t f
                           .------------' '------------------------------.
                           |                                             |
                           |                                             |
                     =-----------------------------------=       =--------------------=
                     |  0x400890                         |       |  0x400889          |
                     | mov eax, dword [rbp - local_14h]  |       | mov eax, 0         |
                     | mov dword [rbp - local_1ch], eax  |       | jmp 0x4008e1 ;[f]  |
                     | mov dword [rbp - local_4h], 0     |       =--------------------=
                     | jmp 0x4008d4 ;[c]                 |           v
                     =-----------------------------------=           |
                         v                                           '------.
                         |                                                  |
                         |                                                  |
                         .----------------.                                 |
                     =-----------------------------------=                  |
                     |  0x4008d4          |              |                  |
                     | mov eax, dword [rbp - local_4h]   |                  |
                     | cmp eax, dword [rbp - local_1ch]  |                  |
                     | jl 0x40089f ;[d]   |              |                  |
                     =-----------------------------------=                  |
                           t f            |                                 |
      .--------------------' '------------|-------------.                   |
      |                                   |             |                   |
      |                                   |             |                   |
=-----------------------------------=     |     =--------------------=      |
|  0x40089f                         |     |     |  0x4008dc          |      |
| mov eax, dword [rbp - local_4h]   |     |     | mov eax, 1         |      |
| movsxd rdx, eax                   |     |     =--------------------=      |
| mov rax, qword [rbp - local_28h]  |     |         v                       |
| add rax, rdx                      |     |         |                       |
| movzx edx, byte [rax]             |     |         |                       |
| mov eax, dword [rbp - local_1ch]  |     |         |                       |
| sub eax, 1                        |     |         |                       |
| sub eax, dword [rbp - local_4h]   |     |         |                       |
| movsxd rcx, eax                   |     |         |                       |
| mov rax, qword [rbp - local_10h]  |     |         |                       |
| add rax, rcx                      |     |         '-----------------.     |
| movzx eax, byte [rax]             |     |                           |     |
| cmp dl, al                        |     |                           |     |
| je 0x4008d0 ;[e]                  |     |                           |     |
=-----------------------------------=     |                           |     |
        f t                               |                           |     |
        '---.-------------------------.   |                           |     |
            |                         |   |                           |     |
            |                         |   |                           |     |
    =--------------------=      =-------------------------------=     |     |
    |  0x4008c9          |      |  0x4008d0                     |     |     |
    | mov eax, 0         |      | add dword [rbp - local_4h], 1 |     |     |
    | jmp 0x4008e1 ;[f]  |      =-------------------------------=     |     |
    =--------------------=          `-----'                           |     |
        v                                                             |     |
        '-------------------------------------.-----------------------------'
                                              |
                                              |
                                          =--------------------=
                                          |  0x4008e1          |
                                          | leave              |
                                          | ret                |
                                          =--------------------=

 

This graph is available in different versions, which can be cycled through by pressing p. The view we’ll focus on this time is the minimap view. You’ll get this view by pressing p until you’ll see a BB-SMALL in the first line of the output. It should look like this:

[0x0040084f]> VV @ sym.checkPassword (nodes 9 edges 11 zoom 100%) BB-SMALL mouse:canvas-y movements-speed:5                                   

0x40084f:
(fcn) sym.checkPassword 148
; var int local_28h @ rbp-0x28
; var int local_1ch @ rbp-0x1c
; var int local_18h @ rbp-0x18
; var int local_14h @ rbp-0x14
; var int local_10h @ rbp-0x10
; var int local_4h @ rbp-0x4
push rbp
mov rbp, rsp
sub rsp, 0x30
mov qword [rbp - local_28h], rdi
mov qword [rbp - local_10h], str.Sup3rP4ss
mov rax, qword [rbp - local_10h]
mov rdi, rax
call sym.imp.strlen ;[a]                                                <@@@@@@>
mov dword [rbp - local_14h], eax                                           t f
mov rax, qword [rbp - local_28h]                                 .---------' '---------.
mov rdi, rax                                                     |                     |
call sym.imp.strlen ;[a]                                         |                     |
mov dword [rbp - local_18h], eax                              [_0890_]            [_0889_]
mov eax, dword [rbp - local_14h]                               v                   v
cmp eax, dword [rbp - local_18h]                               |                   |
je 0x400890 ;[b]                                               |                   |
                                                               .                   |
                                                              [_08d4_]             |
                                                               | t f               |
                                                       .-------|-' '---------.     |
                                                       |       |             |     |
                                                       |       |             |     |
                                                    [_089f_]   |        [_08dc_]   |
                                                         f t   |         v         |
                                                  .------' '--.|         '---.     |
                                                  |           ||             |     |
                                                  |           ||             |     |
                                             [_08c9_]      [_08d0_]          |     |
                                              v             `--'             |     |
                                              '---------------------.--------------'
                                                                    |
                                                                    |

This view shows you a minimap in the center and the current selected node as a disassembly on the upper left (the current node is marked with @@@@@@). This view helps us to focus on smaller blocks of a function, whilst always having the overview about the execution flow.

As we are now starting with reading the assembler code, I want to show you another nice feature for those of you who aren’t able to speak fluent assembler. Radare2 has an option to replace the assembler code by a pseudo code which is simpler to read (but sometimes not as detailed). To enable this feature, press : (colon, to open main shell) and enter e asm.pseudo = true. After leaving the shell again (press enter) you should see something like this:

(fcn) sym.checkPassword 148
; var int local_28h @ rbp-0x28
; var int local_1ch @ rbp-0x1c
; var int local_18h @ rbp-0x18
; var int local_14h @ rbp-0x14
; var int local_10h @ rbp-0x10
; var int local_4h @ rbp-0x4
push rbp
rbp = rsp
rsp -= 0x30
qword [rbp - local_28h] = rdi
qword [rbp - local_10h] = str.Sup3rP4ss
rax = qword [rbp - local_10h]
rdi = rax
sym.imp.strlen ()
dword [rbp - local_14h] = eax
rax = qword [rbp - local_28h]
rdi = rax
sym.imp.strlen ()
dword [rbp - local_18h] = eax
eax = dword [rbp - local_14h]
if (eax == dword [rbp - local_18h]
isZero 0x400890)

The first three lines are a typical function prologue. They create a backup of the rbp and reserve 48 bytes of memory on the stack. As you may noticed, radare2 displays constant values per default in a hexadecimal representation. Similar to IDA Pro, we are able to tell radare how we want to see a value at a specific position. Sadly it isn’t easily available in the graph view (yet?), but we can set it manually in the main shell. First we need the address of the the opcodes containing the value we want to change. We can get it for example by using pd 3 (which disassembles 3 opcodes from the current position). This would result in 0x00400853. To tell radare that we want to get the value in decimal, we enter ahi d @ 0x00400853. This stands for analysis hints intermediate, decimal at the address 0x00400853.

As we know from the last challenge, the function checkPassword gets one argument and returns if it contains the correct password. Currently, radare2 not detected that the function has an argument. Let’s tell it to do so. The command afvr rdi password const char* tells radare2 that the rdi register is used at an argument with the name password of the type const char* (based on the System V AMD64 ABI). This means that the local variable local_28h is a pointer to the entered password. Therefore we change the name and the type of the variable (afvn local_28h password_1 and afvbt password_1 const char*). The next line indicates that local_10h points to the string we compare with (I rename it to targetPassword). As rdi is always used in the amd64 ABI as the first argument, it becomes confusing as it is now aliased to password in the function disassembly. That’s why I prefer to leave it after I followed the dataflow (remove the alias by afvr- password).

This means that the next 8 lines calculate the length of the different strings and stores them in local_14hand local_18h respectively. Your current disassembly should look now close to this:

(fcn) sym.checkPassword 148
; var const char* password_1 @ rbp-0x28
; var int local_1ch @ rbp-0x1c
; var int inputPwdLength @ rbp-0x18
; var int targetPwdLength @ rbp-0x14
; var const char* targetPassword @ rbp-0x10
; var int local_4h @ rbp-0x4
; UNKNOWN XREF from 0x004004b8 (unk)
; CALL XREF from 0x00400942 (sym.main)
push rbp
rbp = rsp
rsp -= 48
qword [rbp - password_1] = rdi
qword [rbp - targetPassword] = str.Sup3rP4ss
rax = qword [rbp - targetPassword]
rdi = rax
sym.imp.strlen ()
dword [rbp - targetPwdLength] = eax
rax = qword [rbp - password_1]
rdi = rax
sym.imp.strlen ()
dword [rbp - inputPwdLength] = eax
eax = dword [rbp - targetPwdLength]
if (eax == dword [rbp - inputPwdLength]
isZero 0x400890)

If we directly translate what we have so far, we get:

int checkPassword(const char* password) {
    const char* password_1 = password;
    const char* targetPassword = "Sup3rP4ss";
    int targetPwdLength = strlen(targetPassword);
    int inputPwdLength = strlen(password_1);
    if(targetPwdLength == inputPwdLength) {

Let’s check what happens if the both lengths are not equal (you might already have a strong guess based on the minimap 😉 ). In the graph view you can simply press f or t to follow the true or the false branch. Pressing f brings us to a very small node. This node simply sets eax to zero and then jumps to 0x4008e1. To follow this jump, you can press g and the character(s) behind this jump (f in my case). What now happened in my current version of radare2 is that I’ve got an empty disassembly in the top left. This is one of the situations where the pseudo syntax misses some instructions. After disabling it (e asm.pseudo = false), you’ll see the instructions leave and ret. Those instructions tell the CPU to restore the stack based on the stored registers and jump back to the code which called the current function. In combination with the previous eax line, we have:

int checkPassword(const char* password) {
    const char* password_1 = password;
    const char* targetPassword = "Sup3rP4ss";
    int targetPwdLength = strlen(targetPassword);
    int inputPwdLength = strlen(password_1);
    if(targetPwdLength == inputPwdLength) {
    } else {
        return 0;
    }
}

Current state: 3 blocks decompiled (0x4084f, 0x40889 and 0x408e1).

Let’s continue with the next one (the true case of the comparison).

Sidenote: You can always navigate through the blocks by using tab and TAB (shift+tab)

This block consists of three instructions which take a copy of the targetPwdLength variable and set another variable to zero. This is a typical intro of a loop which sets up the counter variable(s). If we take a look on the minimap, we can see that this is actually the intro to a loop (this block jumps to block 0x4008d4 and block 0x408d0 also jumps back to block 0x4008d4). So let’s rename the variables accordingly (another block finished 🙂 ).

The next one is again a small one. It compares the counter variable to the copy of the length. If it is lower, it seems to continue with another check (based on the minimap) if not, it goes to a block which ends up in the function epilog. I’ll go with this one first as it seems to be a quick one… It stores the value on into eax, which means that the function returns true. Our current decompiled code:

int checkPassword(const char* password) {
    const char* password_1 = password;
    const char* targetPassword = "Sup3rP4ss";
    int targetPwdLength = strlen(targetPassword);
    int inputPwdLength = strlen(password_1);
    if(targetPwdLength == inputPwdLength) {
        int counter = 0;
        int length = targetPwdLength;
        while(counter < length) {
        }
        return 1;
    } else {
        return 0;
    }
}

The true block is a larger one with some calculations and one comparison:

eax = dword [rbp - counter]
rdx = eax
rax = qword [rbp - password_1]
rax += rdx
edx = byte [rax]
eax = dword [rbp - length]
eax -= 1
eax -= dword [rbp - counter]
rcx = eax
rax = qword [rbp - targetPassword]
rax += rcx
eax = byte [rax]
if (dl == al
isZero 0x4008d0)

 

The first five lines store the character from the password_1 array at a position based on the current counter value. The next lines simply takes a character from the targetPassword array, but this time counting from the end. Those values are then compared for equality. Again, let’s check the inequality case: zero is stored in eax and the function returns false. The true case simply increments the counter by one and starts over at the block 0x408d4.

We are now able to decompile all blocks of the function:

int checkPassword(const char* password) {
    const char* password_1 = password;
    const char* targetPassword = "Sup3rP4ss";
    int targetPwdLength = strlen(targetPassword);
    int inputPwdLength = strlen(password_1);
    if(targetPwdLength == inputPwdLength) {
        int counter = 0;
        int length = targetPwdLength;
        while(counter < length) {
            if(password_1[counter] == targetPassword[length-1-counter]) {
                counter += 1;
            } else {
                return 0;
            }
        }
        return 1;
    } else {
        return 0;
    }
}

Or after some manual reordering:

int checkPassword(const char* password) {
    const char* targetPassword = "Sup3rP4ss";
    int targetPwdLength = strlen(targetPassword);
    int inputPwdLength = strlen(password);
    if(targetPwdLength != inputPwdLength) {
        return 0;
    }


    for(int counter = 0; counter < targetPwdLength; counter++) {
        if(password[counter] != targetPassword[targetPwdLength - 1 - counter])
            return 0;
    }
    return 1;
}

Have a clue what this function does?

  1. It checks if the entered password is as long as the required password
  2. It compares the entered password with the reverse of “Sup3rP4ss”

This means our required password is “ss4Pr3puS” :

$ ./challenge02
##################################
#          Challenge 2           #
#                                #
#      (c) 2016 Timo Schmid      #
##################################
Enter Password: ss4Pr3puS
Password accepted!

That’s it!

The last command I would like to show you today is the experimental pseudo code feature:

[0x004008d0]> pdc
function sym.checkPassword () {
    loc_0x40084f:

  ; UNKNOWN XREF from 0x004004b8 (unk)
  ; CALL XREF from 0x00400942 (sym.main)
  push rbp
  rbp = rsp
  rsp -= 0x30
  qword [rbp - password_1] = password
  qword [rbp - targetPassword] = 0x400a93
  rax = qword [rbp - targetPassword]
  password = rax
  0x4006f0 ()                    ; sym.imp.strlen
  dword [rbp - targetPwdLength] = eax
  rax = qword [rbp - password_1]
  password = rax
  0x4006f0 ()                    ; sym.imp.strlen
  dword [rbp - inputPwdLength] = eax
  eax = dword [rbp - targetPwdLength]
  if (eax == dword [rbp - inputPwdLength]
  isZero 0x400890) {
        loc_0x400890:

      eax = dword [rbp - targetPwdLength]
      dword [rbp - length] = eax
      dword [rbp - counter] = 0
      goto 0x4008d4
       do {
            loc_0x4008d4:

          ; JMP XREF from 0x0040089d (sym.checkPassword)
          eax = dword [rbp - counter]
          if (eax == dword [rbp - length]
          jl 0x40089f 
           } while (?);
       } while (?);
      }
      return;
}

It’s currently very incomplete in terms of instruction coverage, but it shows where radare2 features might go in the future (there was a GSoC running which started to create a radare2 based decompiler named radeco).

Challenge 0x03

You will find the next challenge here (linux64 dynamically linked, linux64 statically linked, win64):

https://github.com/bluec0re/reversing-radare2/tree/master/sources/challenge03

MD5 (challenge03) = 748f98ced3ca0b9178f8d85cdcc9d750
MD5 (challenge03.exe) = 2a1dde4f0de546216779394ce8960aa5
MD5 (challenge03-static) = 8c48ac034d6cefe135e9f22d6431e930
SHA1 (challenge03) = 0a2317bfb952f64dcd8ca50d2a084275cf8e7816
SHA1 (challenge03.exe) = 1c00a49d1c2249cb528a06c08cce1d15b062f59c
SHA1 (challenge03-static) = 78fcffdbe56d1cea29723df30dc14dfe6baf101f
SHA256 (challenge03) = 00f7662f22a3d305575a0117ea17b8d7801c133ffa2201986d62ebade65bdbf7
SHA256 (challenge03.exe) = 0c4a760a4a8653c4a26d07e0970a3ef7474cbdb53596166a670f5c08453eb57e
SHA256 (challenge03-static) = cbea4018b2451e84b789bd909f892f77486e8d42aee659e175da942a555c97ae

The goal is again to find the correct password for the login into the binary. Next time, I’ll give you a walkthrough for the third challenge and challenge #4.

Happy reversing!

Reverse Engineering With Radare2 – Part 1

Welcome back to the radare2 reversing tutorials. If you’ve missed the intro, you can find it here.

The last time you got the challenge01 binary and your goal was to find the password for the login. Let’s see how the application looks like:

$ ./challenge01
##################################
#          Challenge 1           #
#                                #
#      (c) 2016 Timo Schmid      #
##################################
Enter Password: test
Wrong!

The first and simplest step would be to look for strings inside the binary. We could do this either by using the unix utility strings or the binary analyzing binary from radare rabin2:

$ rabin2 -z ./challenge01
vaddr=0x00400a68 paddr=0x00000a68 ordinal=000 sz=35 len=34 section=.rodata type=ascii string=##################################
vaddr=0x00400a90 paddr=0x00000a90 ordinal=001 sz=35 len=34 section=.rodata type=ascii string=#          Challenge 1           #
vaddr=0x00400ab8 paddr=0x00000ab8 ordinal=002 sz=35 len=34 section=.rodata type=ascii string=#                                #
vaddr=0x00400ae0 paddr=0x00000ae0 ordinal=003 sz=35 len=34 section=.rodata type=ascii string=#      (c) 2016 Timo Schmid      #
vaddr=0x00400b03 paddr=0x00000b03 ordinal=004 sz=9 len=8 section=.rodata type=ascii string=p4ssw0rd
vaddr=0x00400b0c paddr=0x00000b0c ordinal=005 sz=11 len=10 section=.rodata type=ascii string=n0p4ssw0rd
vaddr=0x00400b17 paddr=0x00000b17 ordinal=006 sz=17 len=16 section=.rodata type=ascii string=Enter Password: 
vaddr=0x00400b28 paddr=0x00000b28 ordinal=007 sz=6 len=5 section=.rodata type=ascii string=%255s
vaddr=0x00400b2e paddr=0x00000b2e ordinal=008 sz=19 len=18 section=.rodata type=ascii string=Password accepted!
vaddr=0x00400b41 paddr=0x00000b41 ordinal=009 sz=7 len=6 section=.rodata type=ascii string=Wrong!

Two strings look interesting: “p4ssw0rd” and “n0p4ssw0rd”. Obviously, we could try both of them if one works, but as this is a reversing tutorial, we’ll figure it out in more depth 😉

So after loading the binary into radare (r2 ./challenge01) and do some anlysis (aaa) we’ll start by looking at the main function of the binary. As radare2 seeks to the entry point, we have to seek to the main function first.

challenge01

As we can see in the output above, the main function contains several calls. Some of them are calling (imported) library functions which are named after the scheme sym.imp.<function name>. As this is a x64 binary on a linux system, it is using the System V AMD64 ABI for the calling conventions. This means that the first argument is placed in the rdi register, the second in the rsi, third in rdx, forth rcx, fifth r8 and sixth r9 register. A stack buffer is seemed to also be used as the rsp register is decremented by 0x120, which allocates 288 bytes on the stack. After that, the registers edirsi and rdx are stored on the stack (maybe as backup). After the call to the banner function, a pointer to the string “Enter Password: ” is stored in edi which means in the first argument for the next function call (printf in this case). The next call (scanf) takes the arguments “%255s” and a pointer to the stack. Based on the information shown by radare, the pointer is pointing at position rdp-0x100 which means that it points at a 256 byte buffer on the current stack frame (0x100 = 256 and the last byte is set to zero after the call). Now it become interesting as the pointer to the buffer is given to the checkPassword function as the first argument. The return value of this function (al) is checked if it is zero or not. If it contains zero, the jump at 0x4009c0 will be taken to the address 0x4009ce and “Wrong!” will be printed. Otherwise “Password accepted!”.

Let’s take a look at the checkPassword function:

challenge01_2

As we can see here, our argument is stored on the stack (at local_18h) after that it is read again to be passed to the strlen function. The result of this function is then used as the third argument to the strncmpfunction. The first argument is our input string and the second argument is …. “n0p4ssw0rd”. This means that the searched password is “n0p4ssw0rd” 🙂 :

$ ./challenge01
##################################
#          Challenge 1           #
#                                #
#      (c) 2016 Timo Schmid      #
##################################
Enter Password: n0p4ssw0rd
Password accepted!

Challenge solved!

Challenge 0x02

You will find the next challenge here (linux64 dynamically linked, linux64 statically linked, win64):

https://github.com/bluec0re/reversing-radare2/tree/master/sources/challenge02

MD5 (challenge02) = 2b26165a67274fca1d23959675114444
MD5 (challenge02.exe) = 5bc1f2451d62ff33b2128185a32cdc9b
MD5 (challenge02-static) = 34ab5bb11a095383c121aecbb689767a
SHA1 (challenge02) = 34c8bb9de8f1c78dbe45edc80c7cedbc176b38a9
SHA1 (challenge02.exe) = 07522e7e3c6444bd3dfae1061d442551b23c6ae6
SHA1 (challenge02-static) = 99d0eac08903ff3d7404d4edb3ea7e6ae524b9d6
SHA256 (challenge02) = 3b743b588e21bb0623a45a39ae34f388d1cf292a96489cfe39a590e769ee6750
SHA256 (challenge02.exe) = f8eeee8c16a121d42915a69a64c3cd32d70b233d512587dc9534db7f7fea0a14
SHA256 (challenge02-static) = 5df7c7aa6bbddf141e0127070e170a3650c45a2586d4bb85250633f0264b9f39

The goal is again to find the correct password for the login into the binary. Next time I’ll give you a walkthrough for the second challenge and challenge #3.

Happy reversing!

Reverse Engineering With Radare2 – Intro

As some of you may know, there is a “new” reverse engineering toolkit out there which tries to compete with IDA Pro in terms of reverse engineering. I’m talking about radare2, a framework for reversing, patching, debugging and exploiting.

It has large scripting capabilities, runs on all major plattforms (Android, GNU/Linux, [Net|Free|Open]BSD, iOS, OSX, QNX, w32, w64, Solaris, Haiku, FirefoxOS and even on your pebble smartwatch 😉 ) and is free.

Sadly, I had some problems finding good tutorials on how to use it, as the interface is currently a bit cumbersome. After fiddling around, I’ve decided to create a little tutorial series where we can learn together ;).

I will publish one binary per post and you will have time to reverse it till the next blog post is published (most of the time you have to find which data you have to enter to make the binary happy).

So let’s start 🙂

Setup

The most advisable way is to use the most current version from the github repository:

https://github.com/radare/radare2

(As taken from the README:)

The easiest way to install radare2 from git is by running the following command:

$ sys/install.sh

If you want to install radare2 in the home directory without using root privileges and sudo, simply run:

$ sys/user.sh

 

If everything goes well, you’ll find multiple tools in your path:

  • r2 – the “main” binary
  • rabin2 – binary to analyze files (list imports, exports, strings, …)
  • rax2 – binary to convert between data formats
  • radiff2 – binary to do some diffing
  • rahash2 – creates hashes from file blocks (and whole file)
  • rasm2 – helps to play with assembler instructions

The default interface for radare2 is the command line. Just type

r2 your_binary

to start it. Radare has a very shortened command syntax (it is always logical, but in my opinion a few more characters wouldn’t hurt). So to start the analysis of your file, type aa (for analyze all; use aaa or aaaa to run even more analysis algorithms). Did you already get the idea behind the commands? Radare uses a tree structure like name of the commands so all commands which corresponds to analyzing something start with a. If you want to print something you have to use…. p. For example disassemble the current function: pdf (print disassembly function).

As no human could remember all available commands in radare (there are many of them), the ? character is used to display context sensitive help text. A single ? shows you which command categories are available and some general help about specifying addresses and data. A p? shows you the help about the different print commands, pd? which commands are available for printing disassemblies of different sources, etc.

An important command to walk through your targets is the seek command. By using

s sym.main

you’ll jumping into the main function (as long as you’d analyzed the binary in the first step and symbols are available). The following shows you the output of the pdf command after seeking to the main function of the /bin/true binary.

            ;-- section_end..plt:
            ;-- section..text:
/ (fcn) main 160
|           ; DATA XREF from 0x0040142d (main)
|           0x00401370      83ff02         cmp edi, 2                  ; [12] va=0x00401370 pa=0x00001370 sz=11465 vsz=11465 rwx=--r-x .text
|       ,=< 0x00401373      7403           je 0x401378                
|       |   0x00401375      31c0           xor eax, eax
|       |   0x00401377      c3             ret
|       `-> 0x00401378      53             push rbx
|           0x00401379      488b3e         mov rdi, qword [rsi]
|           0x0040137c      4889f3         mov rbx, rsi
|           0x0040137f      e84c050000     call fcn.004018d0
|           0x00401384      be6d404000     mov esi, 0x40406d
|           0x00401389      bf06000000     mov edi, 6
|           0x0040138e      e81dffffff     call sym.imp.setlocale
|           0x00401393      bef6404000     mov esi, str._usr_share_locale ; "/usr/share/locale" @ 0x4040f6
|           0x00401398      bfe8404000     mov edi, 0x4040e8           ; "coreutils" @ 0x4040e8
|           0x0040139d      e86efdffff     call sym.imp.bindtextdomain
|           0x004013a2      bfe8404000     mov edi, 0x4040e8           ; "coreutils" @ 0x4040e8
|           0x004013a7      e844fdffff     call sym.imp.textdomain
|           0x004013ac      bf20184000     mov edi, 0x401820
|           0x004013b1      e85a2c0000     call fcn.00404010
|           0x004013b6      488b5b08       mov rbx, qword [rbx + 8]    ; [0x8:8]=0
|           0x004013ba      be08414000     mov esi, str.__help         ; "--help" @ 0x404108
|           0x004013bf      4889df         mov rdi, rbx
|           0x004013c2      e839feffff     call sym.imp.strcmp
|           0x004013c7      85c0           test eax, eax
|       ,=< 0x004013c9      743b           je 0x401406                
|       |   0x004013cb      be0f414000     mov esi, str.__version      ; "--version" @ 0x40410f
|       |   0x004013d0      4889df         mov rdi, rbx
|       |   0x004013d3      e828feffff     call sym.imp.strcmp
|       |   0x004013d8      85c0           test eax, eax
|      ,==< 0x004013da      7526           jne 0x401402               
|      ||   0x004013dc      488b0dcd4d20.  mov rcx, qword [rip + 0x204dcd] ; [0x6061b0:8]=0x404383 str.8.25
|      ||   0x004013e3      488b3d3e4e20.  mov rdi, qword [rip + 0x204e3e] ; [0x606228:8]=0x2029554e4728203a  LEA obj.stdout ; ": (GNU) 5.3.0" @ 0x606228
|      ||   0x004013ea      4531c9         xor r9d, r9d
|      ||   0x004013ed      41b819414000   mov r8d, str.Jim_Meyering   ; "Jim Meyering" @ 0x404119
|      ||   0x004013f3      bae4404000     mov edx, str.GNU_coreutils  ; "GNU coreutils" @ 0x4040e4
|      ||   0x004013f8      be64404000     mov esi, str.true           ; "true" @ 0x404064
|      ||   0x004013fd      e86e220000     call fcn.00403670
|      `--> 0x00401402      31c0           xor eax, eax
|       |   0x00401404      5b             pop rbx
|       |   0x00401405      c3             ret
|       `-> 0x00401406      31ff           xor edi, edi
|           0x00401408      e803010000     call fcn.00401510
\           0x0040140d      0f1f00         nop dword [rax]

Nearly all radare commands which operates on addresses (like disassmble) allow to use an argument to specify a different position instead of the current one by using an @. Therefore a

pdf @sym.main

produces the same result. You like the tree view of IDA? Enter the visual mode of radare by entering V. Now you see an hexeditor were you can navigate through. By typing ?, you’ll get a list of possible keyshortcuts for this mode. Press V (capital v) to get this:

Graph view of /bin/true
Graph view of /bin/true

Awesome, isn’t it? Now let’s rename the function fcn.00401510 in our disassembly to a more meaningful name. This could either be done by

s fcn.00401510

afn better_name

Or by using

afn better_name fcn.00401510

or

afn better_name 0x00401510

or

afn better_name @fcn.00401510

or

afn better_name @0x00401510

 

If you’re a little afraid of this huge amount of command line fu, you could also try the webinterface:

r2 -c=H your_binary

Which gets you this:

Radare 2 Web interface

Challenge 0x01

You will find the first challenge here (linux64 dynamically linked, linux64 statically linked, win64):

https://github.com/bluec0re/reversing-radare2/tree/master/sources/challenge01

MD5 (challenge01) = b6449993d4278f9884f45018fc128d15
MD5 (challenge01.exe) = 1d78ad3807163cfb7b5194c1c22308c4
MD5 (challenge01-static) = 833cdfe2b105dfda82d3d4a28bab04bd
SHA1 (challenge01) = 55d7755be8ea872601e2c22172c8beccefca37cc
SHA1 (challenge01.exe) = 108837eeffb6635aa08e0d5607ba4d502d556f32
SHA1 (challenge01-static) = 2ad83f289e94c3326d3729fc2c04f32582aba8a3
SHA256 (challenge01) = 56f239d74bdafd8c2d27c07dff9c66dc92780a4742882b9b8724bce7cc6fb0d1
SHA256 (challenge01.exe) = 9ef57cbd343489317bef66de436ebe7544760b48414ca2247ced1ce462b4591d
SHA256 (challenge01-static) = 99f15536cc8a4a68e6211c3af6a0b327c875373659bf7100363319e168c0deb7

 

Your goal is to find the correct password for the login in the binary (hint: hardcoded passwords are bad) . Next time I’ll give you a walkthrough for the challenge and challenge #2.

Happy reversing!

Reverse Engineering x64 for Beginners – Linux

As to get started, we will be writing a simple C++ program which will prompt for a password. It will check if the password matches, if it does, it will prompt its correct, else will prompt its incorrect. The main reason I took up this example is because this will give you an idea of how the jump, if else and other similar conditions work in assembly language. Another reason for this is that most programs which have hardcoded keys in them can be cracked in a similar manner except with a bit of more mathematics, and this is how most piracy distributors crack the legit softwares and spread the keys.

Let’s first understand the C++ program that we have written. All of the code will be hosted in my Github profile :-

https://github.com/paranoidninja/ScriptDotSh-Reverse-Engineering

The code is pretty simple here. Our program takes one argument as an input which is basically the password. If I don’t enter any password, it will print the help command. If I specify a password, it gets stored as a char with 10 bytes and will send the password to the check_pass() function. Our hardcoded password is PASSWORD1 in the check_pass() function. Out here, our password get’s compared with the actual password variable mypass with the strcmp() function. If the password matches, it returns Zero, else it will return One. Back to our main function, if we receive One, it prints incorrect password, else it prints correct password.

Now, let’s get this code in our GDB debugger. We will execute the binary with GDB and we will first setup a breakpoint on main before we send the argument. Secondly, we will enable time travelling on our GDB, so that if we somehow go one step ahead by mistake, we can reverse that and come one step back again. This can be done with the following command: target record-full and reverse-stepi/nexti

Dont’ be scared if you don’t understand any of this. Just focus on the gdb$ part and as you can see above, I have given an incorrect password as pass123 after giving the breakpoint with break main. My compiled code should print an incorrect password as seen previously, but as we proceed, we will find two ways to bypass the code; one is by getting out the actual password from memory and second is by modifying the jump value and printing that the password is correct.

Disassembly

The next step is to disassemble the entire code and try to understand what actually is happening:

Our main point of intereset in the whole disassembled code would be the below few things:

1. je – je means jump to an address if its equal to something. If unequal, continue with the flow.

2. call – calls a new function. Remember that after this is loaded, the disassembled code will change from the main disassembly function to the new function’s disassembly code.

3. test – check if two values are equal

4. cmp – compare two values with each other

4. jne – jne means jump to and address if its not equal to something. Else, continue with the flow.

Some people might question why do we have test if we have cmp which does the same thing. The answer can be found here which is explained beautifully:-

https://stackoverflow.com/questions/39556649/linux-assembly-whats-difference-between-test-eax-eax-and-cmp-eax-0

So, if we see the disassembly code above, we know that if we run the binary without a password or argument, it will print help, else will proceed to check the password. So this cmp should be the part where it checks whether we have an arguement. If an arguement doesn’t exist it will continue with the printing of help, else it will jump to <main+70>. If you see that numbers next to the addresses on the left hand side, we can see that at <+70>, we are moving something into the rax register. So, what we will do is we will setup a breakpoint at je, by specifying its address 0x0000000000400972 and then will see if it jumps to <+70> by asking it to continue with c. GDB command c will continue running the binary till it hits another breakpoint.

And now if you do a stepi which is step iteration, it will step one iteration of execution and it should take you to <+70> where it moves a Quad Word into the rax register.

So, since our logic is correct till now, let’s move on to the next interesting thing that we see, which is the call part. And if you see next to it, it says something like <_Z10check_passPc> which is nothing but our check_pass() function. Let’s jump to that using stepi and see what’s inside that function.

Once, you jump into the check_pass() function and disassemble it, you will see a new set of disassembled code which is the code of just the check_pass() function itself. And here, there are four interesting lines of assembly code here:

The first part is where the value of rdx register is moved to rsi and rax is moved to rdi. The next part is strcmp() function is called which is a string compare function of C++. Next, we have the test which compares the two values, and if the values are equal, we jump (je) to <_Z10check_passPc+77> which will move the value Zero in the eax register. If the values are not equal, the function will continue to proceed at <+70> and move the value One in the eax register. Now, these are nothing but just the return values that we specified in the check_pass() function previously. Since we have entered an invalid password, the return value which will be sent would be One. But if we can modify the return value to Zero, it would print out as “Correct Password”.

Also, we can go ahead and check what is being moved into the rsi and the rdi register. So, let’s put a breakpoint there and jump straight right to it.

As you can see from the above image, I used x/s $rdx and x/s $rax commands to get the values from the register. x/s means examine the register and display it as a string. If you want to get it in bytes you can specify x/b or if you want characters, you can specify x/c and so on. There are multiple variations however. Now our first part of getting the password is over here. However, let’s continue and see how we can modify the return value at <_Z10check_passPc+70> to Zero. So, we will shoot stepi and jump to that iteration.

Epilogue

As you can see above, the function moved 0x1 to eax in the binary, but before it can do a je, we modified the value to 0x0 in eax using set $eax = 0x0 and then continued the function with c as below, and Voila!!! We have a value returned as Correct Password!

Learning assembly isn’t really something as a rocket science. But given a proper amount of time, it does become understandable and easy with experience.

This was just a simple example to get you started in assembly and reverse engineering. Now as we go deeper, we will see socket functions, runtime encryption, encoded hidden domain names and so on. This whole process can be done using x64dbg in Windows as well which I will show in my next blogpost.