Bypass EDR’s memory protection, introduction to hooking

Original text by Hoang Bui

Introduction

On a recent internal penetration engagement, I was faced against an EDR product that I will not name. This product greatly hindered my ability to access lsass’ memory and use our own custom flavor of Mimikatz to dump clear-text credentials.

For those who recommends ProcDump

The Wrong Path

So now, as an ex-malware author — I know that there are a few things you could do as a driver to accomplish this detection and block. The first thing that comes to my mind was Obregistercallback which is commonly used by many Antivirus products. Microsoft implemented this callback due to many antivirus products performing very sketchy winapi hooks that reassemble malware rootkits. However, at the bottom of the msdn page, you will notice a text saying “Available starting with Windows Vista with Service Pack 1 (SP1) and Windows Server 2008.” To give some missing context, I am on a Windows server 2003 at the moment. Therefore, it is missing the necessary function to perform this block.

After spending hours and hours, doing black magic stuff with csrss.exe and attempting to inherit a handle to lsass.exe through csrss.exe, I was successful in gaining a handle with PROCESS_ALL_ACCESS to lsass.exe. This was through abusing csrss to spawn a child process and then inherit the already existing handle to lsass.

There is no EDR solution on this machine, this was just an PoC

However, after thinking “I got this!” and was ready to rejoice in victory over defeating a certain EDR, I was met with a disappointing conclusion. The EDR blocked the shellcode injection into csrss as well as the thread creation through RtlCreateUserThread. However, for some reason — the code while failing to spawn as a child process and inherit the handle, was still somehow able to get the PROCESS_ALL_ACCESS handle to lsass.exe.

WHAT?!

Hold up, let me try just opening a handle to lsass.exe without any fancy stuff with just this line:

HANDLE hProc = OpenProcess(PROCESS_ALL_ACCESS, FALSE, lsasspid);

And what do you know, I got a handle with FULL CONTROL over lsass.exe. The EDR did not make a single fuzz about this. This is when I realized, I started off the approach the wrong way and the EDR never really cared about you gaining the handle access. It is what you do afterward with that handle that will come under scrutiny.

Back on Track

Knowing there was no fancy trick in getting a full control handle to lsass.exe, we can now move forward to find the next point of the issue. Immediately calling MiniDumpWriteDump() with the handle failed spectacularly.

Let’s dissect this warning further. “Violation: LsassRead”. I didn’t read anything, what are you talking about? I just want to do a dump of the process. However, I also know that to make a dump of a remote process, there must be some sort of WINAPI being called such as ReadProcessMemory (RPM) inside MiniDumpWriteDump(). Let’s look at MiniDumpWriteDump’s source code at ReactOS.

Multiple calls to RPM

As you can see by, the function (2) dump_exception_info(), as well as many other functions, relies on (3) RPM to perform its duty. These functions are referenced by MiniDumpWriteDump (1) and this is probably the root of our issue. Now here is where a bit of experience comes into play. You must understand the Windows System Internal and how WINAPIs are processed. Using ReadProcessMemory as an example — it works like this.

ReadProcessMemory is just a wrapper. It does a bunch of sanity check such as nullptr check. That is all RPM does. However, RPM also calls a function “NtReadVirtualMemory”, which sets up the registers before doing a syscall instruction. Syscall instruction is just telling the CPU to enter kernel mode which then another function ALSO named NtReadVirtualMemory is called, which does the actual logic of what ReadProcessMemory is supposed to do.

— — — — — -Userland — — — —- — — — | — — — Kernel Land — — — —

RPM — > NtReadVirtualMemory —> SYSCALL->NtReadVirtualMemory

Kernel32 — — -ntdll — — — — — — — — — - — — — — — ntoskrnl

With that knowledge, we now must identify HOW the EDR product is detecting and stopping the RPM/NtReadVirtualMemory call. This comes as a simple answer which is “hooking”. Please refer to my previous post regarding hooking here for more information. In short, it gives you the ability to put your code in the middle of any function and gain access to the arguments as well as the return variable. I am 100% sure that the EDR is using some sort of hook through one or more of the various techniques that I mentioned.

However, readers should know that most if not all EDR products are using a service, specifically a driver running inside kernel mode. With access to the kernel mode, the driver could perform the hook at ANY of the level in the RPM’s callstack. However, this opens up a huge security hole in a Windows environment if it was trivial for any driver to hook ANY level of a function. Therefore, a solution is to put forward to prevent modification of such nature and that solution is known as Kernel Patch Protection (KPP or Patch Guard). KPP scans the kernel on almost every level and will triggers a BSOD if a modification is detected. This includes ntoskrnl portion which houses the WINAPI’s kernel level’s logic. With this knowledge, we are assured that the EDR would not and did not hook any kernel level function inside that portion of the call stack, leaving us with the user-land’s RPM and NtReadVirtualMemory calls.

The Hook

To see where the function is located inside our application’s memory, it is as trivial as a printf with %p format string and the function name as the argument, such as below.

However, unlike RPM, NtReadVirtualMemory is not an exported function inside ntdll and therefore you cannot just reference to the function like normal. You must specify the signature of the function as well as linking ntdll.lib into your project to do so.

With everything in place, let’s run it and take a look!

Now, this provides us with the address of both RPM and ntReadVirtualMemory. I will now use my favorite reversing tool to read the memory and analyze its structure, Cheat Engine.

ReadProcessMemory
NtReadVirtualMemory

For the RPM function, it looks fine. It does some stack and register set up and then calls ReadProcessMemory inside Kernelbase (Topic for another time). Which would eventually leads you down into ntdll’s NtReadVirtualMemory. However, if you look at NtReadVirtualMemory and know what the most basic detour hook look like, you can tell that this is not normal. The first 5 bytes of the function is modified and the rest are left as-is. You can tell this by looking at other similar functions around it. All the other functions follows a very similar format:

0x4C, 0x8B, 0xD1, // mov r10, rcx; NtReadVirtualMemory

0xB8, 0x3c, 0x00, 0x00, 0x00, // eax, 3ch — aka syscall id

0x0F, 0x05, // syscall

0xC3 // retn

With one difference being the syscall id (which identifies the WINAPI function to be called once inside kernel land). However, for NtReadVirtualMemory, the first instruction is actually a JMP instruction to an address somewhere else in memory. Let’s follow that.

CyMemDef64.dll

Okay, so we are no longer inside ntdll’s module but instead inside CyMemdef64.dll’s module. Ahhhhh now I get it.

The EDR placed a jump instruction where the original NtReadVirtualMemory function is supposed to be, redirect the code flow into their own module which then checked for any sort of malicious activity. If the checks fail, the Nt* function would then return with an error code, never entering the kernel land and execute to begin with.

The Bypass

It is now very self-evident what the EDR is doing to detect and stop our WINAPI calls. But how do we get around that? There are two solutions.

Re-Patch the Patch

We know what the NtReadVirtualMemory function SHOULD looks like and we can easily overwrite the jmp instruction with the correct instructions. This will stop our calls from being intercepted by CyMemDef64.dll and enter the kernel where they have no control over.

Ntdll IAT Hook

We could also create our own function, similar to what we are doing in Re-Patch the Patch, but instead of overwriting the hooked function, we will recreate it elsewhere. Then, we will walk Ntdll’s Import Address Table, swap out the pointer for NtReadVirtualMemory and points it to our new fixed_NtReadVirtualMemory. The advantage of this method is that if the EDR decides to check on their hook, it will looks unmodified. It just is never called and the ntdll IAT is pointed elsewhere.

The Result

I went with the first approach. It is simple, and it allows me to get out the blog quicker :). However, it would be trivial to do the second method and I have plans on doing just that within a few days. Introducing AndrewSpecial, for my manager Andrew who is currently battling a busted appendix in the hospital right now. Get well soon man.

AndrewSpecial.exe was never caught 😛

Conclusion

This currently works for this particular EDR, however — It would be trivial to reverse similar EDR products and create a universal bypass due to their limitation around what they can hook and what they can’t (Thank you KPP).

Did I also mention that this works on both 64 bit (on all versions of windows) and 32 bits (untested)? And the source code is available HERE.

Thank you again for your time and please let me know if I made any mistake.

Реклама

Userland API Monitoring and Code Injection Detection

( Original text by dtm )

bout This Paper

The following document is a result of self-research of malicious software (malware) and its interaction with the Windows Application Programming Interface (WinAPI). It details the fundamental concepts behind how malware is able to implant malicious payloads into other processes and how it is possible to detect such functionality by monitoring communication with the Windows operating system. The notion of observing calls to the API will also be illustrated by the procedure of hooking certain functions which will be used to achieve the code injection techniques.

Disclaimer: Since this was a relatively accelerated project due to some time constraints, I would like to kindly apologise in advance for any potential misinformation that may be presented and would like to ask that I be notified as soon as possible so that it may revised. On top of this, the accompanying code may be under-developed for practical purposes and have unforseen design flaws.

Introduction

In the present day, malware are developed by cyber-criminals with the intent of compromising machines that may be leveraged to perform activities from which they can profit. For many of these activities, the malware must be able survive out in the wild, in the sense that they must operate covertly with all attempts to avert any attention from the victims of the infected and thwart detection by anti-virus software. Thus, the inception of stealth via code injection was the solution to this problem.


Section I: Fundamental Concepts

Inline Hooking

Inline hooking is the act of detouring the flow of code via hotpatching. Hotpatching is defined as the modification of code during the runtime of an executable image[1]. The purpose of inline hooking is to be able to capture the instance of when the program calls a function and then from there, observation and/or manipulation of the call can be accomplished. Here is a visual representation of how normal execution works:

Normal Execution of a Function Call

| Program | ------ calls function -----> | Function | (execution of function)

versus execution of a hooked function:

This can be separated into three steps. To demonstrate this process, the WinAPI function MessageBox 15 will be used.

  1. Hooking the function

To hook the function, we first require the intermediate function which must replicate parameters of the targetted function. Microsoft Developer Network (MSDN) defines MessageBox as the following:

int WINAPI MessageBox(
    _In_opt_ HWND    hWnd,
    _In_opt_ LPCTSTR lpText,
    _In_opt_ LPCTSTR lpCaption,
    _In_     UINT    uType
);

So the intermediate function may be defined like so:

int WINAPI HookedMessageBox(HWND hWnd, LPCTSTR lpText, LPCTSTR lpCaption, UINT uType) {
    // our code in here
}

Once this exists, execution flow has somewhere for the code to be redirected. To actually hook the MessageBoxfunction, the first few bytes of the code can be patched (keep in mind that the original bytes must be saved so that the function may be restored for when the intermediate function is finished). Here are the original assembly instructions of the function as represented in its corresponding module user32.dll:

; MessageBox
8B FF   mov edi, edi
55      push ebp
8B EC   mov ebp, esp

versus the hooked function:

; MessageBox
68 xx xx xx xx  push <HookedMessageBox> ; our intermediate function
C3              ret

Here I have opted to use the push-ret combination instead of an absolute jmp due to my past experiences of it not being reliable for reasons to be discovered. xx xx xx xx represents the little-endian byte-order address of HookedMessageBox.

  1. Capturing the function call

When the program calls MessageBox, it will execute the push-ret and effectively jump into the HookedMessageBox function and once there, it has complete control over the paramaters and the call itself. To replace the text that will be shown on the message box dialog, the following can be defined in HookedMessageBox:

int WINAPI HookedMessageBox(HWND hWnd, LPCTSTR lpText, LPCTSTR lpCaption, UINT uType) {
    TCHAR szMyText[] = TEXT("This function has been hooked!");
}

szMyText can be used to replace the LPCTSTR lpText parameter of MessageBox.

  1. Resuming normal execution

To forward this parameter, execution needs to continue to the original MessageBox so that the operating system can display the dialog. Since calling MessageBox again will just result in an infinite recursion, the original bytes must be restored (as previously mentioned).

int WINAPI HookedMessageBox(HWND hWnd, LPCTSTR lpText, LPCTSTR lpCaption, UINT uType) {
    TCHAR szMyText[] = TEXT("This function has been hooked!");
    
    // restore the original bytes of MessageBox
    // ...
    
    // continue to MessageBox with the replaced parameter and return the return value to the program
    return MessageBox(hWnd, szMyText, lpCaption, uType);
}

If rejecting the call to MessageBox was desired, it is as easy as returning a value, preferrably one that is defined in the documentation. For example, to return the “No” option from a “Yes/No” dialog, the intermediate function can be:

int WINAPI HookedMessageBox(HWND hWnd, LPCTSTR lpText, LPCTSTR lpCaption, UINT uType) {
    return IDNO;  // IDNO defined as 7
}

API Monitoring

The concept of API monitoring follows on from function hooking. Because gaining control of function calls is possible, observation of all of the parameters is also possible, as previously mentioned hence the name API monitoring. However, there is a small issue which is caused by the availability of different high-level API calls that are unique but operate using the same set of API at a lower level. This is called function wrapping, defined as subroutines whose purpose is to call a secondary subroutine. Returning to the MessageBox example, there are two defined functions: MessageBoxA for parameters that contain ASCII characters and a MessageBoxW for parameters that contain wide characters. In reality, to hook MessageBox, it is required that both MessageBoxAand MessageBoxW be patched. The solution to this problem is to hook at the lowest possible common point of the function call hierarchy.

                                                      +---------+
                                                      | Program |
                                                      +---------+
                                                     /           \
                                                    |             |
                                            +------------+   +------------+
                                            | Function A |   | Function B |
                                            +------------+   +------------+
                                                    |             |
                                           +-------------------------------+
                                           | user32.dll, kernel32.dll, ... |
                                           +-------------------------------+
       +---------+       +-------- hook -----------------> |
       |   API   | <---- +              +-------------------------------------+
       | Monitor | <-----+              |              ntdll.dll              |
       +---------+       |              +-------------------------------------+
                         +-------- hook -----------------> |                           User mode
                                 -----------------------------------------------------
                                                                                       Kernel mode

Here is what the MessageBox call hierarchy looks like:

Here is MessageBoxA:

user32!MessageBoxA -> user32!MessageBoxExA -> user32!MessageBoxTimeoutA -> user32!MessageBoxTimeoutW

and MessageBoxW:

user32!MessageBoxW -> user32!MessageBoxExW -> user32!MessageBoxTimeoutW

The call hierarchy both funnel into MessageBoxTimeoutW which is an appropriate location to hook. For functions that have a deeper hierarchy, hooking any lower could prove to be unecessarily troublesome due to the possibility of an increasing complexity of the function’s parameters. MessageBoxTimeoutW is an undocumented WinAPI function and is defined[2] like so:

int WINAPI MessageBoxTimeoutW(
    HWND hWnd, 
    LPCWSTR lpText, 
    LPCWSTR lpCaption, 
    UINT uType, 
    WORD wLanguageId, 
    DWORD dwMilliseconds
);

To log the usage:

int WINAPI MessageBoxTimeoutW(HWND hWnd, LPCWSTR lpText, LPCWSTR lpCaption, UINT uType, WORD wLanguageId, DWORD dwMilliseconds) {
    std::wofstream logfile;     // declare wide stream because of wide parameters
    logfile.open(L"log.txt", std::ios::out | std::ios::app);
    
    logfile << L"Caption: " << lpCaption << L"\n";
    logfile << L"Text: " << lpText << L"\n";
    logfile << L"Type: " << uType << :"\n";
    
    logfile.close();
    
    // restore the original bytes
    // ...
    
    // pass execution to the normal function and save the return value
    int ret = MessageBoxTimeoutW(hWnd, lpText, lpCaption, uType, wLanguageId, dwMilliseconds);
    
    // rehook the function for next calls
    // ...
    
    return ret;   // return the value of the original function
}

Once the hook has been placed into MessageBoxTimeoutWMessageBoxA and MessageBoxW should both be captured.


Code Injection Primer

For the purposes of this paper, code injection will be defined as the insertion of executable code into an external process. The possibility of injecting code is a natural result of the functionality allowed by the WinAPI. If certain functions are stringed together, it is possible to access an existing process, write data to it and then execute it remotely under its context. In this section, the relevant techniques of code injection that was covered in the research will be introduced.

DLL Injection

Code can come from a variety of forms, one of which is a Dynamic Link Library (DLL). DLLs are libraries that are designed to offer extended functionality to an executable program which is made available by exporting subroutines. Here is an example DLL that will be used for the remainder of the paper:

extern "C" void __declspec(dllexport) Demo() {
    ::MessageBox(nullptr, TEXT("This is a demo!"), TEXT("Demo"), MB_OK);
}

bool APIENTRY DllMain(HINSTANCE hInstDll, DWORD fdwReason, LPVOID lpvReserved) {
    if (fdwReason == DLL_PROCESS_ATTACH)
        ::CreateThread(nullptr, 0, (LPTHREAD_START_ROUTINE)Demo, nullptr, 0, nullptr);
    return true;
}

When a DLL is loaded into a process and initialised, the loader will call DllMain with fdwReason set to DLL_PROCESS_ATTACH. For this example, when it is loaded into a process, it will thread the Demo subroutine to display a message box with the title Demo and the text This is a demo!. To correctly finish the initialisation of a DLL, it must return true or it will be unloaded.

CreateRemoteThread

DLL injection via the CreateRemoteThread 7 function utilises this function to execute a remote thread in the virtual space of another process. As mentioned above, all that is required to execute a DLL is to have it load into the process by forcing it to execute the LoadLibrary function. The following code can be used to accomplish this:

void injectDll(const HANDLE hProcess, const std::string dllPath) {
    LPVOID lpBaseAddress = ::VirtualAllocEx(hProcess, nullptr, dllPath.length(), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
	
    ::WriteProcessMemory(hProcess, lpBaseAddress, dllPath.c_str(), dllPath.length(), &dwWritten);
  
    HMODULE hModule = ::GetModuleHandle(TEXT("kernel32.dll"));
  
    LPVOID lpStartAddress = ::GetProcAddress(hModule, "LoadLibraryA");      // LoadLibraryA for ASCII string
  
    ::CreateRemoteThread(hProcess, nullptr, 0, (LPTHREAD_START_ROUTINE)lpStartAddress, lpBaseAddress, 0, nullptr);
}

MSDN defines LoadLibrary as:

HMODULE WINAPI LoadLibrary(
    _In_ LPCTSTR lpFileName
);

It takes a single parameter which is the path name to the desired library to load. The CreateRemoteThreadfunction allows one parameter to be passed into the thread routine which matches exactly that of LoadLibrary‘s function definition. The goal is to allocate the string parameter in the virtual address space of the target process and then pass that allocated space’s address into the parameter argument of CreateRemoteThread so that LoadLibrary can be invoked to load the DLL.

  1. Allocating virtual memory in the target process

Using VirtualAllocEx allows space to be allocated within a selected process and on success, it will return the starting address of the allocated memory.

Virtual Address Space of Target Process
                                              +--------------------+
                                              |                    |
                        VirtualAllocEx        +--------------------+
                        Allocated memory ---> |     Empty space    |
                                              +--------------------+
                                              |                    |
                                              +--------------------+
                                              |     Executable     |
                                              |       Image        |
                                              +--------------------+
                                              |                    |
                                              |                    |
                                              +--------------------+
                                              |    kernel32.dll    |
                                              +--------------------+
                                              |                    |
                                              +--------------------+
  1. Writing the DLL path to allocated memory

Once memory has been initialised, the path to the DLL can be injected into the allocated memory returned by VirtualAllocEx using WriteProcessMemory.

Virtual Address Space of Target Process
                                              +--------------------+
                                              |                    |
                        WriteProcessMemory    +--------------------+
                        Inject DLL path ----> | "..\..\myDll.dll"  |
                                              +--------------------+
                                              |                    |
                                              +--------------------+
                                              |     Executable     |
                                              |       Image        |
                                              +--------------------+
                                              |                    |
                                              |                    |
                                              +--------------------+
                                              |    kernel32.dll    |
                                              +--------------------+
                                              |                    |
                                              +--------------------+
  1. Get address of LoadLibrary

Since all system DLLs are mapped to the same address space across all processes, the address of LoadLibrary does not have to be directly retrieved from the target process. Simply calling GetModuleHandle(TEXT("kernel32.dll")) and GetProcAddress(hModule, "LoadLibraryA") will do the job.

  1. Loading the DLL

The address of LoadLibrary and the path to the DLL are the two main elements required to load the DLL. Using the CreateRemoteThread function, LoadLibrary is executed under the context of the target process with the DLL path as a parameter.

Virtual Address Space of Target Process
                                              +--------------------+
                                              |                    |
                                              +--------------------+
                                   +--------- | "..\..\myDll.dll"  |
                                   |          +--------------------+
                                   |          |                    |
                                   |          +--------------------+ <---+
                                   |          |     myDll.dll      |     |
                                   |          +--------------------+     |
                                   |          |                    |     | LoadLibrary
                                   |          +--------------------+     | loads
                                   |          |     Executable     |     | and
                                   |          |       Image        |     | initialises
                                   |          +--------------------+     | myDll.dll
                                   |          |                    |     |
                                   |          |                    |     |
          CreateRemoteThread       v          +--------------------+     |
          LoadLibraryA("..\..\myDll.dll") --> |    kernel32.dll    | ----+
                                              +--------------------+
                                              |                    |
                                              +--------------------+

SetWindowsHookEx

Windows offers developers the ability to monitor certain events with the installation of hooks by using the SetWindowsHookEx 6 function. While this function is very common in the monitoring of keystrokes for keylogger functionality, it can also be used to inject DLLs. The following code demonstrates DLL injection into itself:

int main() {
    HMODULE hMod = ::LoadLibrary(DLL_PATH);
    HOOKPROC lpfn = (HOOKPROC)::GetProcAddress(hMod, "Demo");
    HHOOK hHook = ::SetWindowsHookEx(WH_GETMESSAGE, lpfn, hMod, ::GetCurrentThreadId());
    ::PostThreadMessageW(::GetCurrentThreadId(), WM_RBUTTONDOWN, (WPARAM)0, (LPARAM)0);

    // message queue to capture events
    MSG msg;
    while (::GetMessage(&msg, nullptr, 0, 0) > 0) {
        ::TranslateMessage(&msg);
        ::DispatchMessage(&msg);
    }
    
    return 0;
}

SetWindowsHookEx defined by MSDN as:

HHOOK WINAPI SetWindowsHookEx(
    _In_ int       idHook,
    _In_ HOOKPROC  lpfn,
    _In_ HINSTANCE hMod,
    _In_ DWORD     dwThreadId
);

takes a HOOKPROC parameter which is a user-defined callback subroutine that is executed when the specific hook event is trigged. In this case, the event is WH_GETMESSAGE which deals with messages in the message queue. The code initially loads the DLL into its own virtual process space and the exported Demo function’s address is obtained and defined as the callback function in the call to SetWindowsHookEx. To force the callback function to execute, PostThreadMessage is called with the message WM_RBUTTONDOWN which will trigger the WH_GETMESSAGE hook and thus the message box will be displayed.

QueueUserAPC

DLL injection with QueueUserAPC 5 works similar to that of CreateRemoteThread. Both allocate and inject the DLL path into the virtual address space of a target process and then force a call to LoadLibrary under its context.

int injectDll(const std::string dllPath, const DWORD dwProcessId, const DWORD dwThreadId) {
    HANDLE hProcess = ::OpenProcess(PROCESS_ALL_ACCESS, false, dwProcessId);

    HANDLE hThread = ::OpenThread(THREAD_ALL_ACCESS, false, dwThreadId);
    
    LPVOID lpLoadLibraryParam = ::VirtualAllocEx(hProcess, nullptr, dllPath.length(), MEM_COMMIT, PAGE_READWRITE);
    
    ::WriteProcessMemory(hProcess, lpLoadLibraryParam, dllPath.data(), dllPath.length(), &dwWritten);
    
    ::QueueUserAPC((PAPCFUNC)::GetProcAddress(::GetModuleHandle(TEXT("kernel32.dll")), "LoadLibraryA"), hThread, (ULONG_PTR)lpLoadLibraryParam);
    
    return 0;
}

One major difference between this and CreateRemoteThread is that QueueUserAPC operates on alertable states. Asynchronous procedures queued by QueueUserAPC are only handled when a thread enters this state.

Process Hollowing

Process hollowing, AKA RunPE, is a popular method used to evade anti-virus detection. It allows the injection of entire executable files to be loaded into a target process and executed under its context. Often seen in crypted applications, a file on disk that is compatible with the payload is selected as the host and is created as a process, has its main executable module hollowed out and replaced. This procedure can be broken up into four stages.

  1. Creating a host process

In order for the payload to be injected, the bootstrap must first locate a suitable host. If the payload is a .NET application, the host must also be a .NET application. If the payload is a native executable defined to use the console subsystem, the host must also reflect the same attributes. The same is applied to x86 and x64 programs. Once the host has been chosen, it is created as a suspended process using CreateProcess(PATH_TO_HOST_EXE, ..., CREATE_SUSPENDED, ...).

Executable Image of Host Process
                                        +---  +--------------------+
                                        |     |         PE         |
                                        |     |       Headers      |
                                        |     +--------------------+
                                        |     |       .text        |
                                        |     +--------------------+
                          CreateProcess +     |       .data        |
                                        |     +--------------------+
                                        |     |         ...        |
                                        |     +--------------------+
                                        |     |         ...        |
                                        |     +--------------------+
                                        |     |         ...        |
                                        +---  +--------------------+
  1. Hollowing the host process

For the payload to work correctly after injection, it must be mapped to a virtual address space that matches its ImageBase value found in the optional header of the payload’s PE headers.

typedef struct _IMAGE_OPTIONAL_HEADER {
  WORD                 Magic;
  BYTE                 MajorLinkerVersion;
  BYTE                 MinorLinkerVersion;
  DWORD                SizeOfCode;
  DWORD                SizeOfInitializedData;
  DWORD                SizeOfUninitializedData;
  DWORD                AddressOfEntryPoint;          // <---- this is required later
  DWORD                BaseOfCode;
  DWORD                BaseOfData;
  DWORD                ImageBase;                    // <---- 
  DWORD                SectionAlignment;
  DWORD                FileAlignment;
  WORD                 MajorOperatingSystemVersion;
  WORD                 MinorOperatingSystemVersion;
  WORD                 MajorImageVersion;
  WORD                 MinorImageVersion;
  WORD                 MajorSubsystemVersion;
  WORD                 MinorSubsystemVersion;
  DWORD                Win32VersionValue;
  DWORD                SizeOfImage;                  // <---- size of the PE file as an image
  DWORD                SizeOfHeaders;
  DWORD                CheckSum;
  WORD                 Subsystem;
  WORD                 DllCharacteristics;
  DWORD                SizeOfStackReserve;
  DWORD                SizeOfStackCommit;
  DWORD                SizeOfHeapReserve;
  DWORD                SizeOfHeapCommit;
  DWORD                LoaderFlags;
  DWORD                NumberOfRvaAndSizes;
  IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER, *PIMAGE_OPTIONAL_HEADER;

This is important because it is more than likely that absolute addresses are involved within the code which is entirely dependent on its location in memory. To safely map the executable image, the virtual memory space starting at the described ImageBase value must be unmapped. Since many executables share common base addresses (usually 0x400000), it is not uncommon to see the host process’s own executable image unmapped as a result. This is done with NtUnmapViewOfSection(IMAGE_BASE, SIZE_OF_IMAGE).

Executable Image of Host Process
                                        +---  +--------------------+
                                        |     |                    |
                                        |     |                    |
                                        |     |                    |
                                        |     |                    |
                                        |     |                    |
                   NtUnmapViewOfSection +     |                    |
                                        |     |                    |
                                        |     |                    |
                                        |     |                    |
                                        |     |                    |
                                        |     |                    |
                                        |     |                    |
                                        +---  +--------------------+
  1. Injecting the payload

To inject the payload, the PE file must be parsed manually to transform it from its disk form to its image form. After allocating virtual memory with VirtualAllocEx, the PE headers are directly copied to that base address.

Executable Image of Host Process
                                        +---  +--------------------+
                                        |     |         PE         |
                                        |     |       Headers      |
                                        +---  +--------------------+
                                        |     |                    |
                                        |     |                    |
                     WriteProcessMemory +     |                    |
                                              |                    |
                                              |                    |
                                              |                    |
                                              |                    |
                                              |                    |
                                              |                    |
                                              +--------------------+

To convert the PE file to an image, all of the sections must be individually read from their file offsets and then placed correctly into their correct virtual offsets using WriteProcessMemory. This is described in each of the sections’ own section header 1.

typedef struct _IMAGE_SECTION_HEADER {
  BYTE  Name[IMAGE_SIZEOF_SHORT_NAME];
  union {
    DWORD PhysicalAddress;
    DWORD VirtualSize;
  } Misc;
  DWORD VirtualAddress;               // <---- virtual offset
  DWORD SizeOfRawData;
  DWORD PointerToRawData;             // <---- file offset
  DWORD PointerToRelocations;
  DWORD PointerToLinenumbers;
  WORD  NumberOfRelocations;
  WORD  NumberOfLinenumbers;
  DWORD Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;
Executable Image of Host Process
                                              +--------------------+
                                              |         PE         |
                                              |       Headers      |
                                        +---  +--------------------+
                                        |     |       .text        |
                                        +---  +--------------------+
                     WriteProcessMemory +     |       .data        |
                                        +---  +--------------------+
                                        |     |         ...        |
                                        +---- +--------------------+
                                        |     |         ...        |
                                        +---- +--------------------+
                                        |     |         ...        |
                                        +---- +--------------------+
  1. Execution of payload

The final step is to point the starting address of execution to the payload’s aforementioned AddressOfEntryPoint. Since the process’s main thread is suspended, using GetThreadContext to retrieve the relevant information. The context structure is defined as:

typedef struct _CONTEXT
{
     ULONG ContextFlags;
     ULONG Dr0;
     ULONG Dr1;
     ULONG Dr2;
     ULONG Dr3;
     ULONG Dr6;
     ULONG Dr7;
     FLOATING_SAVE_AREA FloatSave;
     ULONG SegGs;
     ULONG SegFs;
     ULONG SegEs;
     ULONG SegDs;
     ULONG Edi;
     ULONG Esi;
     ULONG Ebx;
     ULONG Edx;
     ULONG Ecx;
     ULONG Eax;                        // <----
     ULONG Ebp;
     ULONG Eip;
     ULONG SegCs;
     ULONG EFlags;
     ULONG Esp;
     ULONG SegSs;
     UCHAR ExtendedRegisters[512];
} CONTEXT, *PCONTEXT;

To modify the starting address, the Eax member must be changed to the virtual address of the payload’s AddressOfEntryPoint. Simply, context.Eax = ImageBase + AddressOfEntryPoint. To apply the changes to the process’s thread, calling SetThreadContext and passing in the modified CONTEXT struct is sufficient. All that is required now is to call ResumeThread and payload should start execution.

Atom Bombing

The Atom Bombing is a code injection technique that takes advantage of global data storage via Windows’s global atom table. The global atom table’s data is accessible across all processes which is what makes it a viable approach. The data stored in the table is a null-terminated C-string type and is represented with a 16-bit integer key called the atom, similar to that of a map data structure. To add data, MSDN provides a GlobalAddAtom 4 function and is defined as:

ATOM WINAPI GlobalAddAtom(
    _In_ LPCTSTR lpString
);

where lpString is the data to be stored. The 16-bit integer atom is returned on a successful call. To retrieve the data stored in the global atom table, MSDN provides a GlobalGetAtomName 2 defined as:

UINT WINAPI GlobalGetAtomName(
    _In_  ATOM   nAtom,
    _Out_ LPTSTR lpBuffer,
    _In_  int    nSize
);

Passing in the identifying atom returned from GlobalAddAtom will place the data into lpBuffer and return the length of the string excluding the null-terminator.

Atom bombing works by forcing the target process to load and execute code placed within the global atom table and this relies on one other crucial function, NtQueueApcThread, which is lowest level userland call for QueueUserAPC. The reason why NtQueueApcThread is used over QueueUserAPC is because, as seen before, QueueUserAPC‘s APCProc 1 only receives one parameter which is a parameter mismatch compared to GlobalGetAtomName[3].

VOID CALLBACK APCProc(               UINT WINAPI GlobalGetAtomName(
                                         _In_  ATOM   nAtom,
    _In_ ULONG_PTR dwParam     ->        _Out_ LPTSTR lpBuffer,
                                         _In_  int    nSize
);                                   );

However, the underlying implementation of NtQueueApcThread allows for three potential parameters:

NTSTATUS NTAPI NtQueueApcThread(                      UINT WINAPI GlobalGetAtomName(
    _In_     HANDLE           ThreadHandle,               // target process's thread
    _In_     PIO_APC_ROUTINE  ApcRoutine,                 // APCProc (GlobalGetAtomName)
    _In_opt_ PVOID            ApcRoutineContext,  ->      _In_  ATOM   nAtom,
    _In_opt_ PIO_STATUS_BLOCK ApcStatusBlock,             _Out_ LPTSTR lpBuffer,
    _In_opt_ ULONG            ApcReserved                 _In_  int    nSize
);                                                    );

Here is a visual representation of the code injection procedure:

Atom bombing code injection
                                              +--------------------+
                                              |                    |
                                              +--------------------+
                                              |      lpBuffer      | <-+
                                              |                    |   |
                                              +--------------------+   |
     +---------+                              |                    |   | Calls
     |  Atom   |                              +--------------------+   | GlobalGetAtomName
     | Bombing |                              |     Executable     |   | specifying
     | Process |                              |       Image        |   | arbitrary
     +---------+                              +--------------------+   | address space
          |                                   |                    |   | and loads shellcode
          |                                   |                    |   |
          |           NtQueueApcThread        +--------------------+   |
          +---------- GlobalGetAtomName ----> |      ntdll.dll     | --+
                                              +--------------------+
                                              |                    |
                                              +--------------------+

This is a very simplified overview of atom bombing but should be adequate for the remainder of the paper. For more information on atom bombing, please refer to enSilo’s AtomBombing: Brand New Code Injection for Windows 27.


Section II: UnRunPE

UnRunPE is a proof-of-concept (PoC) tool that was created for the purposes of applying API monitoring theory to practice. It aims to create a chosen executable file as a suspended process into which a DLL will be injected to hook specific functions utilised by the process hollowing technique.

Code Injection Detection

From the code injection primer, the process hollowing method was described with the following WinAPI call chain:

  1. CreateProcess
  2. NtUnmapViewOfSection
  3. VirtualAllocEx
  4. WriteProcessMemory
  5. GetThreadContext
  6. SetThreadContext
  7. ResumeThread

A few of these calls do not have to be in this specific order, for example, GetThreadContext can be called before VirtualAllocEx. However, the general arrangement cannot deviate much because of the reliance on former API calls, for example, SetThreadContext must be called before GetThreadContext or CreateProcess must be called first otherwise there will be no target process to inject the payload. The tool assumes this as a basis on which it will operate in an attempt to detect a potentially active process hollowing.

Following the theory of API monitoring, it is best to hook the lowest, common point but when it comes it malware, it should ideally be the lowest possible that is accessible. Assuming a worst case scenario, the author may attempt to skip the higher-level WinAPI functions and directly call the lowest function in the call hierarchy, usually found in the ntdll.dll module. The following WinAPI functions are the lowest in the call hierarchy for process hollowing:

  1. NtCreateUserProcess
  2. NtUnmapViewOfSection
  3. NtAllocateVirtualMemory
  4. NtWriteVirtualMemory
  5. NtGetContextThread
  6. NtSetContextThread
  7. NtResumeThread

Code Injection Dumping

Once the necessary functions are hooked, the target process is executed and each of the hooked functions’ parameters are logged to keep track of the current progress of the process hollowing and the host process. The most significant hooks are NtWriteVirtualMemory and NtResumeThread because the former applies the injection of the code and the latter executes it. Along with logging the parameters, UnRunPE will also attempt to dump the bytes written using NtWriteVirtualMemory and then when NtResumeThread is reached, it will attempt to dump the entire payload that has been injected into the host process. To achieve this, it uses the process and thread handle parameters logged in NtCreateUserProcess and the base address and size logged from NtUnmapViewOfSection. Using the parameters provided by NtAllocateVirtualMemory may be more appropriate however, due to some unknown reasons, hooking that function results in some runtime errors. When the payload has been dumped from NtResumeThread, it will terminate the target process and its host process to prevent execution of the injected code.

UnRunPE Demonstration

For the demonstration, I have chosen to use a trojanised binary that I had previously created as an experiment. It consists of the main executable PEview.exe and PuTTY.exe as the hidden executable.

105


Section III: Dreadnought

Dreadnought is a PoC tool that was built upon UnRunPE to support a wider variety of code injection detection, namely, those listed in Code Injection Primer. To engineer such an application, a few augmentations are required.

Detecting Code Injection Method

Because there are so many methods of code injection, differentiating each technique was a necessity. The first approach to this was to recognise a “trigger” API call, that is, the API call which would peform the remote execution of the payload. Using this would do two things: identify the completion of and, to an extent, the type of the code injection. The type can be categorised into four groups:

  • Section: Code injected as/into a section
  • Process: Code injected into a process
  • Code: Generic code injection or shellcode
  • DLL: Code injected as DLLs
process-injection

Process%2BInjection%25281%2529.png1024x768
Process Injection Info Graphic[4] by Karsten Hahn 2

Each trigger API is listed underneath Execute. When either of these APIs have been reached, Dreadought will perform a code dumping method that matches the assumed injection type in a similar fashion to what occurs with process hollowing in UnRunPE. Reliance on this is not enough because there is still potential for API calls to be mixed around to achieve the same functionality as displayed from the stemming of arrows.

Heuristics

For Dreadnought to be able to determine code injection methods more accurately, a heuristic should be involved as an assist. In the development, a very simplistic heuristic was applied. Following the process injection infographic, every time an API was hooked, it would increase the weight of one or more of the associated code injection types stored within a map data structure. As it traces each API call, it will start to favour a certain type. Once the trigger API has been entered, it will identify and compare the weights of the relevant types and proceed with an appropriate action.

Dreadnought Demonstration

Process Injection — Process Hollowing

51

DLL Injection — SetWindowsHookEx

29

DLL Injection — QueueUserAPC

21

Code Injection — Atom Bombing

25

13

15


Conclusion

This paper aimed to bring a technical understanding of code injection and its interaction with the WinAPI. Furthermore, the concept of API monitoring in userland was entertained with the malicious use of injection methods utilised by malware to bypass anti-virus detection. The following presents the current status of Dreadnought as of this writing.

Limitations

Dreadnought’s current heuristic and detection design is incredibly poor but was sufficient enough for theoretical demonstration purposes. Practical use may not be ideal since there is a high possibility that there will be collateral with respect to the hooked API calls during regular operations with the operating system. Because of the impossibility to discern benign from malicious behaviour, false positives and negatives may arise as a result.

With regards to Dreadnought and its operations within userland, it may not be ideal use when dealing with sophisticated malware, especially those which have access to direct interactions with the kernel and those which have the capabilities to evade hooks in general.


PoC Repositories


References