Retargetable Machine-Code Decompiler: RetDec

RetDec is a retargetable machine-code decompiler based on LLVM. The decompiler is not limited to any particular target architecture, operating system, or executable file format:

  • Supported file formats: ELF, PE, Mach-O, COFF, AR (archive), Intel HEX, and raw machine code.
  • Supported architectures (32b only): Intel x86, ARM, MIPS, PIC32, and PowerPC.

 

Features:

  • Static analysis of executable files with detailed information.
  • Compiler and packer detection.
  • Loading and instruction decoding.
  • Signature-based removal of statically linked library code.
  • Extraction and utilization of debugging information (DWARF, PDB).
  • Reconstruction of instruction idioms.
  • Detection and reconstruction of C++ class hierarchies (RTTI, vtables).
  • Demangling of symbols from C++ binaries (GCC, MSVC, Borland).
  • Reconstruction of functions, types, and high-level constructs.
  • Integrated disassembler.
  • Output in two high-level languages: C and a Python-like language.
  • Generation of call graphs, control-flow graphs, and various statistics.

 

After seven years of development, Avast open-sources its machine-code decompiler for platform-independent analysis of executable files. Avast released its analytical tool, RetDec, to help the cybersecurity community fight malicious software. The tool allows anyone to study the code of applications to see what the applications do, without running them. The goal behind open sourcing RetDec is to provide a generic tool to transform platform-specific code, such as x86/PE executable files, into a higher form of representation, such as C source code. By generic, we mean that the tool should not be limited to a single platform, but rather support a variety of platforms, including different architectures, file formats, and compilers. At Avast, RetDec is actively used for analysis of malicious samples for various platforms, such as x86/PE and ARM/ELF.

 

What is a decompiler?

A decompiler is a program that takes an executable file as its input and attempts to transform it into a high-level representation while preserving its functionality. For example, the input file may be application.exe, and the output can be source code in a higher-level programming language, such as C. A decompiler is, therefore, the exact opposite of a compiler, which compiles source files into executable files; this is why decompilers are sometimes also called reverse compilers.

By preserving a program’s functionality, we want the source code to reflect what the input program does as accurately as possible; otherwise, we risk assuming the program does one thing, when it really does another.

Generally, decompilers are unable to perfectly reconstruct original source code, due to the fact that a lot of information is lost during the compilation process. Furthermore, malware authors often use various obfuscation and anti-decompilation tricks to make the decompilation of their software as difficult as possible.

RetDec addresses the above mentioned issues by using a large set of supported architectures and file formats, as well as in-house heuristics and algorithms to decode and reconstruct applications. RetDec is also the only decompiler of its scale using a proven LLVM infrastructure and provided for free, licensed under MIT.

Decompilers can be used in a variety of situations. The most obvious is reverse engineering when searching for bugs, vulnerabilities, or analyzing malicious software. Decompilation can also be used to retrieve lost source code when comparing two executables, or to verify that a compiled program does exactly what is written in its source code.

There are several important differences between a decompiler and a disassembler. The former tries to reconstruct an executable file into a platform-agnostic, high-level source code, while the latter gives you low-level, platform-specific assembly instructions. The assembly output is non-portable, error-prone when modified, and requires specific knowledge about the instruction set of the target processor. Another positive aspect of decompilers is the high-level source code they produce, like  C source code, which can be read by people who know nothing about the assembly language for the particular processor being analyzed.

 

Installation and Use

Currently,RetDec support only Windows (7 or later) and Linux.

 

Windows

  1. Either download and unpack a pre-built package from the following list, or build and install the decompiler by yourself (the process is described below):
  2. Install Microsoft Visual C++ Redistributable for Visual Studio 2015.
  3. Install MSYS2 and other needed applications by following RetDec’s Windows environment setup guide.
  4. Now, you are all set to run the decompiler. To decompile a binary file named test.exe, go into $RETDEC_INSTALLED_DIR/bin and run:
    bash decompile.sh test.exe
    

    For more information, run bash decompile.sh --help.

 

Linux

  1. There are currently no pre-built packages for Linux. You will have to build and install the decompiler by yourself. The process is described below.
  2. After you have built the decompiler, you will need to install the following packages via your distribution’s package manager:
  3. Now, you are all set to run the decompiler. To decompile a binary file named test.exe, go into $RETDEC_INSTALLED_DIR/bin and run:
    ./decompile.sh test.exe
    

    For more information, run ./decompile.sh --help.

 

Build and Installation


Requirements

Linux

On Debian-based distributions (e.g. Ubuntu), the required packages can be installed with apt-get:

sudo apt-get install build-essential cmake git perl python bash coreutils wget bc graphviz upx flex bison zlib1g-dev libtinfo-dev autoconf pkg-config m4 libtool

 

Windows

  • Microsoft Visual C++ (version >= Visual Studio 2015 Update 2)
  • Git
  • MSYS2 and some other applications. Follow RetDec’s Windows environment setup guide to get everything you need on Windows.
  • Active Perl. It needs to be the first Perl in PATH, or it has to be provided to CMake using CMAKE_PROGRAM_PATH variable, e.g. -DCMAKE_PROGRAM_PATH=/c/perl/bin.
  • Python (version >= 3.4)

 

Process

Warning: Currently, RetDec has to be installed into a clean, dedicated directory. Do NOT install it into /usr,/usr/local, etc. because our build system is not yet ready for system-wide installations. So, when running cmake, always set -DCMAKE_INSTALL_PREFIX=<path> to a directory that will be used just by RetDec. 

  • Recursively clone the repository (it contains submodules):
    • git clone --recursive https://github.com/avast-tl/retdec
  • Linux:
    • cd retdec
    • mkdir build && cd build
    • cmake .. -DCMAKE_INSTALL_PREFIX=<path>
    • make && make install
  • Windows:
    • Open MSBuild command prompt, or any terminal that is configured to run the msbuild command.
    • cd retdec
    • mkdir build && cd build
    • cmake .. -DCMAKE_INSTALL_PREFIX=<path> -G<generator>
    • msbuild /m /p:Configuration=Release retdec.sln
    • msbuild /m /p:Configuration=Release INSTALL.vcxproj
    • Alternatively, you can open retdec.sln generated by cmake in Visual Studio IDE.

You have to pass the following parameters to cmake:

  • -DCMAKE_INSTALL_PREFIX=<path> to set the installation path to <path>.
  • (Windows only) -G<generator> is -G"Visual Studio 14 2015" for 32-bit build using Visual Studio 2015, or -G"Visual Studio 14 2015 Win64" for 64-bit build using Visual Studio 2015. Later versions of Visual Studio may be used.

You can pass the following additional parameters to cmake:

  • -DRETDEC_DOC=ON to build with API documentation (requires Doxygen and Graphviz, disabled by default).
  • -DRETDEC_TESTS=ON to build with tests, including all the tests in dependency submodules (disabled by default).
  • -DCMAKE_BUILD_TYPE=Debug to build with debugging information, which is useful during development. By default, the project is built in the Release mode. This has no effect on Windows, but the same thing can be achieved by running msbuild with the /p:Configuration=Debug parameter.
  • -DCMAKE_PROGRAM_PATH=<path> to use Perl at <path> (probably useful only on Windows).

Вычисление с выводом данных в окно консоли: Задано массив А из N = 4 элементов. Написать программу определения суммы элементов массива А, для которых биты 0 и 5 совпадают.

.686; Директива определения типа микропроцессора

. Model flat, stdcall; задачи линейной модели памяти
; И соглашения ОС Windows
option casemap: none; отличие малых и больших букв

include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\fpu.inc
include \masm32\include\user32.inc
includelib \masm32\lib\user32.lib
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\fpu.lib

ExitProcess proto: DWORD

. Data; директива определения данных
st1 db «Вывод суммы массива! А», 0
st2 db 10 dup (?), 0
ifmt db «Сумма =% d», 0
masivA db 75,31,88,32
sum dw 0
iden db 0
work1 db 0
work2 db 0
prom dd 0

. Code; директива начала кода
_start:
mov eax, 0
mov ebx, 0
mov ecx, 3
mov edx, 0
lea esi, masivA

M1:
mov prom, ebx
mov al, byte ptr [esi + ebx]; пересылки значения массива в младший регистр al
inc ebx
mov bl, byte ptr [esi + ebx]; пересылки значения массива в младший регистр bl
mov work1, al
mov work2, bl
and eax, 21h
and ebx, 21h
sub eax, ebx проверка сходимости битов
jz M3
mov iden, 0;идентификатор. Он необходим для суммы.

M2:
mov ebx, prom
inc ebx
loop M1
jmp M4

M3:
mov al, work1
dec iden
jz Q1; если идентификатов = 0, тогда перейти на метку Q1
mov iden, 1
mov bl, work2
Q1: add sum, ax; подсчета суммы
add sum, bx; подсчета суммы
jmp M2

M4:
mov ebx, 0
mov bx, sum; пересылка значение суммы в регистр
invoke wsprintf, \
ADDR st2, \
ADDR ifmt, \
ebx
invoke MessageBox, \; функция вывода значения
NULL, \
addr st2, \
addr st1 \
MB_OK
invoke ExitProcess, 0
end _start; окончания программы

Ввести 2 дробных числа и сложить их и вывести на консоль и в MessageBox.

.686; Директива определения типа микропроцессора

. Model flat, stdcall; задачи линейной модели памяти
; И соглашения ОС Windows
option casemap: none; отличие малых и больших букв

include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\fpu.inc
include \masm32\include\user32.inc
includelib \masm32\lib\user32.lib
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\fpu.lib

ExitProcess proto: DWORD

. Data; директива определения данных
st1 db «Вывод суммы массива! А», 0
st2 db 10 dup (?), 0
ifmt db «Сумма =% d», 0
masivA db 75,31,88,32
sum dw 0
iden db 0
work1 db 0
work2 db 0
prom dd 0

. Code; директива начала кода
_start:
mov eax, 0
mov ebx, 0
mov ecx, 3
mov edx, 0
lea esi, masivA

M1:
mov prom, ebx
mov al, byte ptr [esi + ebx]; пересылки значения массива в младший регистр al
inc ebx
mov bl, byte ptr [esi + ebx]; пересылки значения массива в младший регистр bl
mov work1, al
mov work2, bl
and eax, 21h
and ebx, 21h
sub eax, ebx проверка сходимости битов
jz M3
mov iden, 0;идентификатор. Он необходим для суммы.

M2:
mov ebx, prom
inc ebx
loop M1
jmp M4

M3:
mov al, work1
dec iden
jz Q1; если идентификатов = 0, тогда перейти на метку Q1
mov iden, 1
mov bl, work2
Q1: add sum, ax; подсчета суммы
add sum, bx; подсчета суммы
jmp M2

M4:
mov ebx, 0
mov bx, sum; пересылка значение суммы в регистр
invoke wsprintf, \
ADDR st2, \
ADDR ifmt, \
ebx
invoke MessageBox, \; функция вывода значения
NULL, \
addr st2, \
addr st1 \
MB_OK
invoke ExitProcess, 0
end _start; окончания программы

Программа расчета формулы в языке Ассемблер. Формула имеет вид: D = 4*Pi*Ha*Fi*D0/L

D = 4 * Pi * Ha * Fi * D0 / L

.386; Директива определения типа микропроцессора
. Model flat, stdcall; задачи линейной модели памяти
; И соглашения ОС Windows

option casemap: none; отличие малых и больших букв

include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\fpu.inc
include \masm32\include\user32.inc
includelib \masm32\lib\user32.lib
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\fpu.lib
BSIZE equ 30
. Data; директива определения данных
D0 dword 200;сохранение в 32-разрядной ячейке памяти переменной х
L dword 1; резервирования 32-х разрядов памяти для переменной y
Ha dd 3,4,5
Fi dd 1,2,3
const dd 4

st1 db «Результат вычисления:», 0
st2 db 10 dup (?), 0

. Code; директива начала кода
_start:
lea esi, Ha
lea edi, Fi
mov ebx, 3

m1:
mov ecx, 3
lea edi, Fi
m2:
finit
fldpi; заносим значения Пи
fimul const; Pi * 4
fild dword ptr [esi] fmul; умножаем результат на На
fild dword ptr [edi]; заносим результат Фи
fmul; умножаем результат на Фи
fild D0; заносим значение D0
fmul; умножаем результат на D0
fild L; заносим значение лянда
fdiv; делим результат на лянда
pushad; сохраняем все регистры общего назначения в стек

invoke FpuFLtoA, 0, 10, ADDR st2, SRC1_FPU or SRC2_DIMM
invoke MessageBox, NULL, addr st2, addr st1, MB_OK

popad; считываем из стека

add edi, 4;следующий элемент в массиве
loop m2
add esi, 4; следующий элемент в массиве
dec ebx
jnz m1; переходить на метку m1 пока ebx не станет равна 0

invoke ExitProcess, NULL; возврат управления Windows
; И освобождения ресурсов

end _start;директива окончания программы с именем start