MS16-039 — «Windows 10» 64 bits Integer Overflow exploitation by using GDI objects

On April 12, 2016 Microsoft released 13 security bulletins.
Let’s to talk about how I triggered and exploited the CVE-2016-0165, one of the MS16-039 fixes.

Diffing Stage

For  MS16-039, Microsoft released a fix for all Window versions, either for 32 and 64 bits.
Four vulnerabilities were fixed: CVE-2016-0143, CVE-2016-0145, CVE-2016-0165 y CVE-2016-0167.

Diffing «win32kbase.sys» (v10.0.10586.162 vs v10.0.10586.212), I found 26 changed functions.
Among all the functions that had been changed, I focused on a single function: «RGNMEMOBJ::vCreate».


It’s interesting to say that this function started to be exported since Windows 10, when «win32k.sys» was split into 3 parts:  «win32kbase.sys», «win32kfull.sys» and a very small version of «win32k.sys».

If we look at the diff between the old and the new function version, we can see on the right side that in the first red basic block (left-top), there is a call to «UIntAdd» function.
This new basic block checks that the original instruction «lea eax,[rdi+1]» (first instruction on the left-yellow basic block) won’t produce an integer overflow when the addition is made.

In the second red basic block (right-down) there is a call to «UIntMult» function.
This function checks that the original instruction «lea ecx,[rax+rax*2]» (third instruction on the left-yellow basic block) won’t produce an integer overflow when the multiplication is made.

Summing up, two integer overflows were patched in the same function.


Understanding the fix

If we look at the 3rd instruction of the original basic block (left-yellow), we can see this one:

"lea ecx,[rax+rax*2]"

In this addition/multiplication, the «rax» register represents the number of POINT structs to be handled.
In this case, this number is multiplied by 3 (1+1*2).

At the same time, we can see that the structs number is represented by a 64 bit register, but the destination of this calculation is a 32 bit register!

Now, we know that it’s an integer overflow, the only thing we need to know is what number multiplied by 3 gives us a bigger result than 4GB.
The idea is that this result can’t be represented by a 32 bit number.

A simple way to know that is making the next calculation:

(4,294,967,296 (2^32) / 3) + 1 = 1,431,655,766 (0x55555556)

Now, if we multiplied this result by 3, we will obtain the next one:

 0x55555556 x 3 = 0x1'0000'0002 = 4GB + 2 bytes

In the same basic block and two instructions below («shl ecx,4»), we can see that the number «2» obtained previously will be shifted 4 times to the left, which is the same to multiply this one by 16, resulting in the 0x20 value.

So, the «PALLOCMEM2» function is going to allocate 0x20 bytes to be used by 0x55555556 POINT structs … 🙂

Path to the vulnerability

For the development of this exploit, the path I took was via the «NtGdiPathToRegion» function, located in «win32kfull.sys».
This function calls directly to the vulnerable function.


From user space, this function is located in «gdi32.dll» and it’s exported as «PathToRegion«.

Triggering the vulnerability

Now we know the bug, we need 0x55555556 POINT structs to trigger this vulnerability but, is it possible to
reach this number of POINTs?

In the exploit I wrote, the function that I used to create POINT structs was «PolylineTo«.

Looking at the documentation, we see this definition:

BOOL PolylineTo(
 _In_ HDC hdc,
 _In_ const POINT *lppt,
 _In_ DWORD cCount

The second argument is a POINT struct array and the third one is the array size.

It’s easy to think that, if we create 0x55555556 structs and then, we pass this structures as parameter we will trigger the vulnerability but WE WON’T, let’s see why.

If we analyze the «PolylineTo» internal code, we can see a call to «NtGdiPolyPolyDraw».


«NtGdiPolyPolyDraw» is located in «win32kbase.sys», part of the Windows kernel.

If we see this function, there is a check in the POINT struct number passed as argument:


The maximum POINTs number that we can pass as parameter is 0x4E2000.

It’s clear that there is not a direct way to reach the wanted number to trigger this vulnerability, so what is the trick ?

Well, after some tests, the answer was pretty simple: «call many times to PolylineTo until reach the wanted number of POINT structs».

And the result was this:


The trick is to understand that the «PathToRegion» function processes the sum of all POINT structs assigned to the HDC passed as argument.

PALLOCMEM2 function — «Bonus Track»

Triggering this vulnerability is relatively easy in 64 bit targets like Windows 8, 8.1 y 10.
Now, in «Windows 7» 64 bits, the vulnerability is very difficult to exploit.

Let’s see the vulnerable basic block and the memory allocator function:


The destination of the multiplication by 3 is a 64 bit register (rdx), not a 32 bit register like Windows versions mentioned before.

The only feasible way to produce an integer overflow is with the previous instruction:


In this case, the number of POINTs to be assigned to the HDC should be greater than or equal to 4GB.
Unfortunately, during my tests it was easier to get a kernel memory exhaustion than allocate this number of structures.

Now, why Windows 7 is different to the latest Windows versions ?

Well, if we look the previous picture, we can see that there is a call to «__imp_ExAllocatePoolWithTag», instead of «PALLOCMEM2».

What is the difference ?

The «PALLOCMEM2» function receives a 32 bit argument size, but the  «__imp_ExAllocatePoolWithTag» function receives a 64 bit argument size.
The argument type defines how the result of the multiplication will be passed to the function allocator, in this case, the result is casted to «unsigned int».

We could guess that functions that used to call «__imp_ExAllocatePoolWithTag» in Windows 7 and now they call «PALLOCMEM2» have been exposed to integer overflows much easier to exploit.

Analyzing the heap overflow

Once we trigger the integer overflow, we have to understand what the consequences are.

As a result, we obtain a heap overflow produced by the copy of POINT structs, via the «bConstructGET» function (child of the vulnerable function), where every single struct is copied by «AddEdgeToGet».


This heap overflow is produced when POINT structs are converted and copied to the small allocated memory.

It’s intuitive to think that, if 0x55555556 POINT structs were allocated, the same number will be copied.
If this were true, we would have a huge «memcpy» that it would destroy a big part of the Windows kernel heap, which quickly would give us a BSoD.

What makes it a nice bug is that the «memcpy» can be controlled exactly with the number of POINTs that we want, regardless of the total number passed to the the vulnerable function.

The trick here is that only POINT structs are copied when coordinates ARE NOT REPEATED.
E.g: if «POINT.A is X=30/Y=40» and «POINT.B is X=30/Y=40», only one will be copied.

Thus, it’s possible to control exactly how many structures will be used by the heap overflow.

Some exploitation considerations

One of the most important things to know before to start to write the exploit is that, the vulnerable function allocates memory and produces the heap overflow, but when this function finishes, it frees the allocated memory, since this is used only temporarily.


It means that, when the memory is freed, the Windows kernel will check the current heap chunk header and the next one.
If the next one is corrupted, we will get a BSoD.

Unfortunately, only some values to be overwritten are totally controlled by us, so, we are not able to overwrite the next chunk header with its original content.

On the other hand, we could think the alloc/free operation like «atomic», because we don’t have control execution until the «PathToRegion» function returns.

So, How is it possible to successfully exploit this vulnerability ?

Four years ago I explained something similar in the»The Big Trick Behind Exploit MS12-034» blogpost.

Without a deep reading of the blogpost previously mentioned, the only thing to know is that if the allocated memory chunk is at the end of the 4KB memory page, THERE WON’T BE A NEXT CHUNK HEADER.

So, if the vulnerable function is able to allocate at the end of the memory page, the heap overflow will be done in the next page.
It means that the DATA contained by the second memory page will be corrupted but, we will avoid a BSoD when the allocated memory is freed.

Finding the best memory allocator

Considering the previous one, now it’s necessary to create a very precise heap spray to be able to allocate memory at the end of the memory page.


When heap spray requires several interactions, meaning that memory chunks are allocated and freed many times, the name used for this technique is «Heap Feng Shui», making reference to the ancient Chinese technique (

The POOL TYPE used by the vulnerable function is 0x21, which according to Microsoft means «NonPagedPoolSession» + «NonPagedPoolExecute».

Knowing this, it’s necessary to find some function that allow us to allocate memory in this pool type with the best possible accuracy.

The best function that I have found to heap spray the pool type 0x21 is via the «ZwUserConvertMemHandle» undocumented function, located in «gdi32.dll» and «user32.dll».


When this function is called from user space, the «NtUserConvertMemHandle» function is invoked in kernel space, and this one calls «ConvertMemHandle», both located in «win32kfull.sys».

If we look at the «ConvertMemHandle» code, we can see the perfect allocator:


Basically, this function receives 2 parameters, BUFFER and SIZE and returns a HANDLE.

If we only see the yellow basic blocks, we can see that the «HMAllocObject» function allocates memory through «HMAllocObject».
This function allocates SIZE + 0x14 bytes.
After that, our DATA is copied by «memcpy» to this new memory chunk and it will stay there until it’s freed.

To free the memory chunk created by «NtUserConvertMemHandle», we have to call two functions consecutively: «SetClipboardData» and «EmptyClipboard«.

Summing up, we have a function that allows us to allocate and free memory in the same place where the heap overflow will be done.

Choosing GDI objects to be overwritten

Now, we know how to make a good Heap Feng Shui, we need to find something interesting to be corrupted by the heap overflow.

Considering Diego Juarez’s blogpost «Abusing GDI for ring0 exploit primitives» and exchanging some ideas with him, we remembered that GDI objects are allocated in the pool type 0x21, which is exactly what I needed to exploit this vulnerability.

In that blogpost he described how GDI objects are composed:

typedef struct
   BASEOBJECT64 BaseObject;
   SURFOBJ64 SurfObj;

As explained in the blogpost mentioned above, if the «SURFOBJ64.pvScan0» field is overwritten, we could read or write memory where we want by calling «GetBitmapBits/SetBitmapBits».

In my case, the problem is that I don’t control all values to be overwritten by the heap overflow, so, I can’t overwrite this property with an USEFUL ADDRESS.

A variant of abusing GDI object

Taking into account the previous information, I decided to find another GDI object property to be overwritten by the heap overflow.

After some tests, I found a very interesting thing, the «SURFOBJ64.sizlBitmap» field.
This field is a SIZE struct that defines width and height of the GDI object.

This picture shows the content of the GDI object, before and after the heap overflow:

The final result is that the «cx» property of the «SURFOBJ64.sizlBitmap» SIZE struct is set with the 0xFFFFFFFF value.
It means that now the GDI object is width=0xFFFFFFFF and height=0x01.

So, we are able to read/write contiguous memory far beyond the original limits set for «SURFOBJ64.pvScan0»!

Another interesting thing to know is that, when GDI objects are smaller than 4KB, the DATA pointed by «SURFOBJ64.pvScan0» is contiguous to the object properties.

With all these things, it was time to write an exploit …

Exploitation — Step 1

In the exploit I wrote, I used 0x55555557 POINT structs, which is one more point than what I gave as an example.

So, the new calculation is:

0x55555557 x 3 = 0x1'0000'0005

As the result is a 32 bit number, we get 0x5, an then this number is multiplied by 16

0x5 << 4 = 0x50

It means that «PALLOCMEM2» function will allocate 0x50 bytes when the vulnerable function calls it.

The reason why I decided to increase the size by 0x30 bytes is because very small chunk allocations are not always predictable.

Adding the chunk header size (0x10 bytes), the heap spray to do should be like this:ms16_039-heap-spray-candidate

Looking at the previous picture, only one FREE chunk will be used by the vulnerable function.
When this happens, there will be a GDI object next to this one.

For alignment problems between the used small chunk and the «» property, it was necessary to use an extra PADDING chunk.
It means that three different memory chunks were used to make this heap feng shui.

Hitting a breakpoint after the memory allocation, we can see how the heap spray worked and what position, inside the 4KB memory page, was used by the vulnerable function.


Making some calculations, we can see that if we add «0x60 + 0xbf0» bytes to the allocated chunk, we get the first GDI object (Gh15) next to it.

Exploitation — Step 1.5

Once a GDI object has been overwritten by the heap overflow, it’s necessary to know which one it is.

As the heap spray uses a big number of GDI objects, 4096 in my case, the next step is to go through the GDI object array and detect which has been modified by calling «GetBitmapBits».
When this function is able to read beyond the original object limits, it means that the overwritten GDI object has been found.

Looking at the function prototype:

HBITMAP CreateBitmap(
 _In_ int nWidth,
 _In_ int nHeight,
 _In_ UINT cPlanes,
 _In_ UINT cBitsPerPel,
 _In_ const VOID *lpvBits

As an example, we could create a GDI object like this:

CreateBitmap (100, 100, 1, 32, lpvBits);

Once the object has been created, if we call «GetBitmapBits» with a size bigger than 100 x 100 x 4 bytes (32 bits) it will fail, except if this object has been overwritten afterwards.

So, the way to detect which GDI object has been modified is to check when its behavior is different than expected.

Exploitation — Step 2

Now we can read/write beyond the GDI object limits, we could use this new skill to overwrite a second GDI object, and thus, to get an arbitrary write.

Looking at our heap spray, we can see that there is a second GDI object located 0x1000 bytes after from the first one.


So, if from the first GDI object, we are able to write the contiguous memory that we want, it means that we can modify the «SURFOBJ64.pvScan0» property of the second one.

Then, if we use the second GDI object by calling «GetBitmapBits/SetBitmapBits», we are able to read/write where we want to because we control exactly which address will be used.

Thus, if we repeat the above steps, we are able to read/write ‘n’ times any kernel memory address from USER SPACE, and at the same time, we will avoid running ring-0 shellcode in kernel space.

It’s important to say that before overwriting the «SURFOBJ64.pvScan0» property of the second GDI object, we have to read all DATA between both GDI objects, and then overwrite the same data up to the property we want to modify.

On the other hand, it’s pretty simple to detect which is the second GDI object, because when we read DATA between both objects, we are getting a lot of information, including its HANDLE.

Summing up, we use the heap overflow to overwrite a GDI object, and from this object to overwrite a second GDI object next to it.

Exploitation — Final Stage

Once we get a kernel read/write primitive, we could say that the last step is pretty simple.

The idea is to steal the «System» process token and set it to our process (exploit.exe).

As this attack is done from «Low Integrity Level», we have to know that it’s not possible to get TOKEN addresses by calling «NtQuerySystemInformation» («SystemInformationClass = SystemModuleInformation»), so, we have to take the long way.

The EPROCESS list is a linked list, where every element is a EPROCESS struct that contains information about a unique running process, including its TOKEN.

This list is pointed by the «PsInitialSystemProcess» symbol, located in «ntoskrnl.exe».
So, if we get the Windows kernel base, we could get the «PsInitialSystemProcess» kernel address, and then to do the famous TOKEN KIDNAPPING.

The best way I know of leaking a Windows kernel address is by using the «sidt» user-mode instruction.
This instruction returns the size and address of the operating system interrupt list located in kernel space.

Every single entry contains a pointer to its interrupt handler located in «ntoskrnl.exe».
So, if we use the primitive we got previously, we are able to read these entries and get one «ntoskrnl.exe» interrupt handler address.

The next step is to read backwards several «ntoskrnl.exe» memory addresses until you find the well known «MZ», which means it’s the base address of «ntoskrnl.exe».

Once we get the Windows kernel base, we only need to know what the «PsInitialSystemProcess» kernel address is.
Fortunately, from USER SPACE it’s possible to use the «LoadLibrary» function to load «ntoskrnl.exe» and then to use «GetProcAddress» to get the «PsInitialSystemProcess» relative offset.

As a result of what I explained before, I obtained this:


Final notes

It’s important to say that it wasn’t necessary to use the GDI objects memory leak explained by the «Abusing GDI for ring0 exploit primitives» blogpost.

However, it’s interesting to see how «Windows 10» 64 bits can be exploited from «Low Integrity Level» through kernel vulnerabilities, despite all kernel exploit mitigations implemented until now.


Derandomization of ASLR on any modern processors using JavaScript


Запись обращений к кэшу устройством управления памятью (MMU) в процессоре по мере вызова страниц по особому паттерну, разработанному для выявления различий между разными уровнями иерархии таблиц. Например, паттерн «лесенки» (слева) указывает на первый уровень иерархии, то есть PTL1, при вызове страниц по 32K. Для других уровней иерархии тоже есть методы выявленияПятеро исследователей из Амстердамского свободного университета (Нидерланды) доказали фундаментальную уязвимость техники защиты памяти ASLR на современных процессорах. Они выложили исходный код и подробное описание атаки AnC (ASLR⊕Cache), которой подвержены практически все процессоры.

Исследователи проверили AnC на 22 процессорах разных архитектур — и не нашли ни одного, который был бы защищён от такого рода атаки по стороннему каналу. Это и понятно, ведь во всех процессорах используется буфер динамической трансляции для кэширования адресов памяти, которые транслируются в виртуальные адреса. Защититься от этой атаки можно только отключив кэш процессора.

Рандомизация размещения адресного пространства (ASLR) — технология, применяемая в операционных системах, при использовании которой случайным образом изменяется расположение в адресном пространстве процесса важных структур данных, а именно образов исполняемого файла, подгружаемых библиотек, кучи и стека. Технология создана для усложнения эксплуатации нескольких типов уязвимостей. Предполагается, что она должна защищать память от эксплойтов с переполнением буфера — якобы ASLR не даст злоумышленнику узнать, по какому конкретно адресу размещаются структуры данных после переполнения, куда можно записать шелл-код.

В прошлом исследователи несколько раз показывали, что защиту ASLR можно обойти в некоторых случаях. Например, если у злоумышленника есть полные права в системе, он может сломать ASLR на уровне ядра ОС. Но в типичных условиях — против атаки через браузер — ASLR считалась вполне надёжной защитой. Теперь нет.

В 2016 году эта же группа голландских специалистов показала, как с помощью JavaScript обойти защиту ASLR в браузере Microsoft Edge, используя атаку по стороннему каналу дедупликации памяти. Microsoft быстро отключила дедубликацию памяти, чтобы защитить своих пользователей. Но это не решило проблему ASLR на фундаментальном уровне, связанную с работой самого устройства управления памятью (MMU) в процессорах.

В современных процессорах модуль MMU использует иерархию кэша для улучшения производительности прохода по иерархическим таблицам страниц в памяти. Это неотъемлемая функция современных процессоров. Корень проблемы в том, что кэш L3 доступен для любых сторонних приложений, в том числе скриптов JavaScript из браузера.

Организация памяти в современном процессоре Intel

Хакеры сумели определить средствами JavaScript, к каким страницам в таблицах страниц чаще всего обращается модуль MMU. Точность определения вполне достаточна для обхода ASLR даже с энтропией 36 бит.

Принцип атаки

Принцип атаки ASLR⊕Cache основан на том, что в результате прохода MMU по таблицам страниц происходит их запись в кэш процессора. Эта операция осуществляется и во время трансляции виртуального адреса в соответствующий ему физический адрес в памяти. Таким образом, буфер динамической трансляции (TLB) в каждом процессоре всегда хранит самые последние трансляции адресов.

Если происходит промах мимо кэша TLB, то модулю MMU придётся пройти по всем таблицам страниц конкретного процесса (тоже хранящимся в основной памяти), чтобы выполнить трансляцию. Для улучшения производительности MMU в такой ситуации таблицы страниц кэшируются в быстром кэше процессора L3 для ускорения доступа к ним.

Получается, что при использовании механизма безопасности ASLR сами таблицы страниц становятся хранителем секретной информации: начальный номер записи таблицы страниц (через offset) на каждом уровне содержит часть секретного виртуального адреса, который использовался при трансляции.

Проход MMU по таблице страниц для трансляции адреса 0x644b321f4000 в соответствующую страницу памяти на архитектуре x86_64

Хакеры разработали специальную технику сканирования памяти (им нужны были таймеры получше, чем есть в браузерах), чтобы определить наборы кэша (конкретные строки в секторированном кэше прямого отображения), к которым обращается MMU после таргетированного прохода по таблицам страниц, когда происходит разыменование указателя данных или исполнение инструкции кода. Поскольку только определённые адреса могут соответствовать определённым наборам кэша, то получение информации об этих наборах выдаёт начальные номера записей нужных записей таблиц страниц на каждом уровне иерархии — и, таким образом, дерандомизирует ASLR.

Исследователи протестировали эксплойт в Chrome и Firefox на 22-ти современных процессорах и показали его успешную работу.

Успешность атаки в Chrome и Firefox и процент ложных срабатываний

Не спасают даже встроенные механизмы защиты вроде умышленной поломки разработчиками браузеров точного JavaScript-таймера

Сломанный таймер в Chrome и Firefox

Авторы эксплойта написали собственный таймер.

Демонстрация атаки в Firefox

Атака проверена на следующих процессорах:

Модель процессора Микроархитектура Год
Intel Xeon E3-1240 v5 Skylake 2015
Intel Core i7-6700K Skylake 2015
Intel Celeron N2840 Silvermont 2014
Intel Xeon E5-2658 v2 Ivy Bridge EP 2013
Intel Atom C2750 Silvermont 2013
Intel Core i7-4500U Haswell 2013
Intel Core i7-3632QM Ivy Bridge 2012
Intel Core i7-2620QM Sandy Bridge 2011
Intel Core i5 M480 Westmere 2010
Intel Core i7 920 Nehalem 2008
AMD FX-8350 8-Core Piledriver 2012
AMD FX-8320 8-Core Piledriver 2012
AMD FX-8120 8-Core Bulldozer 2011
AMD Athlon II 640 X4 K10 2010
AMD E-350 Bobcat 2010
AMD Phenom 9550 4-Core K10 2008
Allwinner A64 ARM Cortex A53 2016
Samsung Exynos 5800 ARM Cortex A15 2014
Samsung Exynos 5800 ARM Cortex A7 2014
Nvidia Tegra K1 CD580M-A1 ARM Cortex A15 2014
Nvidia Tegra K1 CD570M-A1 ARM Cortex A15; LPAE  2014

К сожалению, описание этой техники в открытом доступе может снова сделать эффективными старые способы атаки на кэш, от которых вроде бы уже давно защитились.

В данный момент атака AnC задокументирована в четырёх бюллетенях безопасности:

  • CVE-2017-5925 — для процессоров Intel;
  • CVE-2017-5926 — для процессоров AMD;
  • CVE-2017-5927 — для процессоров ARM;
  • CVE-2017-5928 — для JavaScript-тайеров в разных браузерах.

По мнению авторов, единственным способом защиты для пользователя является использование программ вроде NoScript, которые блокирует выполнение сторонних скриптов JavaScript в браузере.