( Original text by zerosum0x0 )
Pwning Windows 7 was no problem, but I would re-visit the EternalBlue exploit against Windows XP for a time and it never seemed to work. I tried all levels of patching and service packs, but the exploit would either always passively fail to work or blue-screen the machine. I moved on from it, because there was so much more of FuzzBunch that was unexplored.
Well, one day on a pentest a wild Windows XP appeared, and I figured I would give FuzzBunch a go. To my surprise, it worked! And on the first try.
Why did this exploit work in the wild but not against runs in my «lab»?
tl;dr: Differences in NT/HAL between single-core/multi-core/PAE CPU installs causes FuzzBunch’s XP payload to abort prematurely on single-core installs.
Multiple Exploit Chains
Keep in mind that there are several versions of EternalBlue. The Windows 7 kernel exploit has been well documented. There are also ports to Windows 10 which have been documented by myself and JennaMagius as well as sleepya_.
But FuzzBunch includes a completely different exploit chain for Windows XP, which cannot use the same basic primitives (i.e. SMB2 and SrvNet.sys do not exist yet!). I discussed this version in depth at DerbyCon 8.0 (slides / video).
tl;dw: The boot processor KPCR is static on Windows XP, and to gain shellcode execution the value of KPRCB.PROCESSOR_POWER_STATE.IdleFunction is overwritten.
As it turns out, the exploit was working just fine in the lab. What was failing was FuzzBunch’s payload.
The main stages of the ring 0 shellcode performs the following actions:
- Obtains &nt and &hal using the now-defunct KdVersionBlock trick
- Resolves some necessary function pointers, such as hal!HalInitializeProcessor
- Restores the boot processor KPCR/KPRCB which was corrupted during exploitation
- Runs DoublePulsar to backdoor the SMB service
- Gracefully resumes execution at a normal state (nt!PopProcessorIdle)
Single Core Branch Anomaly
Setting a couple hardware breakpoints on the IdleFunction switch and +0x170 into the shellcode (after a couple initial XOR/Base64 shellcode decoder stages), it is observed that a multi-core machine install branches differently than the single-core machine.
"ba e 1 poi(ffdffc50)+0x170;g;"
The multi-core machine has acquired a function pointer to hal!HalInitializeProcessor.
Presumably, this function will be called to clean up the semi-corrupted KPRCB.
The single-core machine did not find hal!HalInitializeProcessor… sub_547 instead returned NULL. The payload cannot continue, and will now self destruct by zeroing as much of itself out as it can and set up a ROP chain to free some memory and resume execution.
Note: A successful shellcode execution will perform this action as well, just after installing DoublePulsar first.
Root Cause Analysis
The shellcode function sub_547 does not properly find hal!HalInitializeProcessor on single core CPU installs, and thus the entire payload is forced to abruptly abort. We will need to reverse engineer the shellcode function to figure out exactly why the payload is failing.
There is an issue in the kernel shellcode that does not take into account all of the different types of the NT kernel executables are available for Windows XP. Specifically, the multi-core processor version of NT works fine (i.e. ntkrnlamp.exe), but a single core install (i.e. ntoskrnl.exe) will fail. Likewise, there is a similar difference in halmacpi.dll vs halacpi.dll.
The NT Red Herring
The first operation that sub_547 performs is to obtain HAL function imports used by the NT executive. It finds HAL functions by first reading at offset 0x1040 into NT.
On multi-core installs of Windows XP, this offset works as intended, and the shellcode finds hal!HalQueryRealTimeClock:
However, on single-core installations this is not a HAL import table, but instead a string table:
At first I figured this was probably the root cause. But it is a red herring, as there is correction code. The shellcode will check if the value at 0x1040 is an address in the range within HAL. If not it will subtract 0xc40 and start searching in increments of 0x40 for an address within the HAL range, until it reaches 0x1040 again.
Eventually, the single-core version will find a HAL function, this time hal!HalCalibratePerformanceCounter:
This all checks out and is fine, and shows that Equation Group did a good job here for determining different types of XP NT.
HAL Variation Byte Table
Now that a function within HAL has been found, the shellcode will attempt to locate hal!HalInitializeProcessor. It does so by carrying around a table (at shellcode offset 0x5e7) that contains a 1-byte length field followed by an expected sequence of bytes. The original discovered HAL function address is incremented in search of those bytes within the first 0x20 bytes of a new function.
The desired 5 bytes are easily found in the multi-core version of HAL:
However, the function on single-core HAL is much different.
There is a similar mov instruction, but it is not a movzx. The byte sequence being searched for is not present in this function, and consequently the function is not discovered.
It is well known (from many flame wars on Windows kernel development mailing lists) that searching for byte sequences to identify functions is unreliable across different versions and service packs of Windows. We have learned from this bug that exploit developers must also be careful to account for differences in single/multi-core and PAE variations of NTOSKRNL and HAL. In this case, the compiler decided to change one movzx instruction to a mov instruction and broke the entire payload.
It is very curious that the KdVersionBlock trick and a byte sequence search is used to find functions in this payload. The Windows 7 payload finds NT and its exports in, as seen, a more reliable way, by searching backwards in memory from the KPCR IDT and then parsing PE headers.
This HAL function can be found through such other means (it appears readily exported by HAL). The corrupted KPCR can also be cleaned up in other ways. But those are both exercises for the reader.
There is circumstantial evidence that primary FuzzBunch development was started in late 2001. The payload seems maybe it was only written for and tested against multi-core processors? Perhaps this could be a indicator as to how recent the XP exploit was first written. Windows XP was broadly released on October 25, 2001. While this is the same year that IBM invented the first dual-core processor (POWER4), Intel and AMD would not have a similar offering until 2004 and 2005, respectively.
This is yet another example of the evolution of these ETERNAL exploits. The Equation Group could have re-used the same exploit and payload primitives, yet chose to develop them using many different methodologies, perhaps so if one methodology was burned they could continue to reap the benefits of their exploit diversification. There is much esoteric Windows kernel internals knowledge that can be learned from studying these exploits.