Compiling C/C++ programs with Microsoft’s command-line compilers is possible, even if you don’t have Visual Studio installed. You can do this with the Build Tools for Visual Studio 2017 (a free download).
Please note: While this article uses IDA Pro to disassemble the compiled code, many of the features of IDA Pro (i.e. graphing, pseudocode translation, etc.) can be found in plugins and builds for other free disassemblers such as radare2. Furthermore, while preparing for this article I took the liberty of changing some variable names in the disassembled code from IDA presets like “v20” to what they correspond to in the C code. This was done to make each portion easier to understand. Finally, please note that this C code was compiled into a 64 bit executable and disassembled with IDA Pro’s 64 bit version. This can be especially seen when calculating array sizes, as the 32 bit registers (i.e. eax) are often doubled in size and transformed into 64 bit registers (i.e rax).
Ok, Let’s begin!
While Part 1 broke down and described basic programming concepts like loops and IF statements, this article is meant to explain more advanced topics that you would have to decipher when reverse engineering.
Let’s begin with Arrays, First, let’s take a look at the code as a whole:
Now, let’s take a look at the decompiled assembly as a whole:
As you can see, the 12 lines of code turned into quite a large block of code. But don’t be intimidated! Remember, all we’re doing here is setting up arrays!
Let’s break it down bit by bit:
When initializing an array with an integer literal, the compiler simply initializes the length through a local variable.
EDIT: The above photo labeled “Declaring an array with a literal — disassembled” is actually labeled incorrectly. While yes, when initializing an array with an integer literal the compiler does first initialize the length through a local variable, the above screenshot is actually the initialization of a stack canary. Stack Canaries are used to detect overflow attacks that may, if unmitigated, lead to execution of malicious code. During compilation the compiler allocated enough space for the only litArray element that would be used, litArray (see photo below labeled “local variables — Arrays” — as you can see, the only litArray element that was allocated for is litArray). Compiler optimization can significantly enhance the speed of applications.
Sorry for the confusion!
Declaring an array with a variable — code
Declaring an array with a variable — assembly
declaring an array with pre-defined objects — code
When declaring an array with pre-defined index definitions the compiler simply saves each pre-defined object into its own variable which represents the index within the array (i.e. objArray4 = objArray)
initializing an array index — code
initializing an array index — assembly
Much like declaring an array with pre-defined index definitions, when initializing (or setting) an index in an array, the compiler creates a new variable for said index.
retrieving an item from an array — code
retrieving an item from an array — assembly
When retrieving items from arrays, the item is taken from the index within the array and set to the desired variable.
creating a matrix with variables — code
When creating a matrix, first the row and column sizes are set to their row and col variables. Next, the maximum and minimum indexes for the rows and columns are calculated and used to calculate the base location / overall size of the matrix in memory.
Dynamic memory allocation using malloc — code
In this function we allocate 11 characters using malloc and then copy “Hello World” into the allocated memory space.
Now, let’s take a look at the assembly:
Please note: Throughout the assembly you may see ‘nop’ instructions. these instructions were specifically placed by me during the preparation stage for this article so that I could easily navigate and comment throughout the assembly code.
dynamic memory allocation using malloc — assembly
When using malloc, first the size of the allocated memory (0x0B) is first moved into the edi register. Next, the _malloc system function is called to allocate memory. The allocated memory area is then stored in the ptr variable. Next, the “Hello World” string is broken down into “Hello Wo” and “rld” as it is copied into the allocated memory space. Finally, the newly copied “Hello World” string is printed out and the allocated memory is freed using the _free system function.