OATmeal on the Universal Cereal Bus: Exploiting Android phones over USB

( origin  by Jann Horn, Google Project Zero )

Recently, there has been some attention around the topic of physical attacks on smartphones, where an attacker with the ability to connect USB devices to a locked phone attempts to gain access to the data stored on the device. This blogpost describes how such an attack could have been performed against Android devices (tested with a Pixel 2).

 

After an Android phone has been unlocked once on boot (on newer devices, using the «Unlock for all features and data» screen; on older devices, using the «To start Android, enter your password» screen), it retains the encryption keys used to decrypt files in kernel memory even when the screen is locked, and the encrypted filesystem areas or partition(s) stay accessible. Therefore, an attacker who gains the ability to execute code on a locked device in a sufficiently privileged context can not only backdoor the device, but can also directly access user data.
(Caveat: We have not looked into what happens to work profile data when a user who has a work profile toggles off the work profile.)

 

The bug reports referenced in this blogpost, and the corresponding proof-of-concept code, are available at:
https://bugs.chromium.org/p/project-zero/issues/detail?id=1583 («directory traversal over USB via injection in blkid output»)
https://bugs.chromium.org/p/project-zero/issues/detail?id=1590 («privesc zygote->init; chain from USB»)

 

These issues were fixed as CVE-2018-9445 (fixed at patch level 2018-08-01) and CVE-2018-9488 (fixed at patch level 2018-09-01).

The attack surface

Many Android phones support USB host mode (often using OTG adapters). This allows phones to connect to many types of USB devices (this list isn’t necessarily complete):

 

  • USB sticks: When a USB stick is inserted into an Android phone, the user can copy files between the system and the USB stick. Even if the device is locked, Android versions before P will still attempt to mount the USB stick. (Android 9, which was released after these issues were reported, has logic in vold that blocks mounting USB sticks while the device is locked.)
  • USB keyboards and mice: Android supports using external input devices instead of using the touchscreen. This also works on the lockscreen (e.g. for entering the PIN).
  • USB ethernet adapters: When a USB ethernet adapter is connected to an Android phone, the phone will attempt to connect to a wired network, using DHCP to obtain an IP address. This also works if the phone is locked.

 

This blogpost focuses on USB sticks. Mounting an untrusted USB stick offers nontrivial attack surface in highly privileged system components: The kernel has to talk to the USB mass storage device using a protocol that includes a subset of SCSI, parse its partition table, and interpret partition contents using the kernel’s filesystem implementation; userspace code has to identify the filesystem type and instruct the kernel to mount the device to some location. On Android, the userspace implementation for this is mostly in vold(one of the processes that are considered to have kernel-equivalent privileges), which uses separate processes in restrictive SELinux domains to e.g. determine the filesystem types of partitions on USB sticks.

 

The bug (part 1): Determining partition attributes

When a USB stick has been inserted and vold has determined the list of partitions on the device, it attempts to identify three attributes of each partition: Label (a user-readable string describing the partition), UUID (a unique identifier that can be used to determine whether the USB stick is one that has been inserted into the device before), and filesystem type. In the modern GPT partitioning scheme, these attributes can mostly be stored in the partition table itself; however, USB sticks tend to use the MBR partition scheme instead, which can not store UUIDs and labels. For normal USB sticks, Android supports both the MBR partition scheme and the GPT partition scheme.

 

To provide the ability to label partitions and assign UUIDs to them even when the MBR partition scheme is used, filesystems implement a hack: The filesystem header contains fields for these attributes, allowing an implementation that has already determined the filesystem type and knows the filesystem header layout of the specific filesystem to extract this information in a filesystem-specific manner. When vold wants to determine label, UUID and filesystem type, it invokes /system/bin/blkid in the blkid_untrusted SELinux domain, which does exactly this: First, it attempts to identify the filesystem type using magic numbers and (failing that) some heuristics, and then, it extracts the label and UUID. It prints the results to stdout in the following format:

 

/dev/block/sda1: LABEL=»<label>» UUID=»<uuid>» TYPE=»<type>»

 

However, the version of blkid used by Android did not escape the label string, and the code responsible for parsing blkid’s output only scanned for the first occurrences of UUID=» and TYPE=». Therefore, by creating a partition with a crafted label, it was possible to gain control over the UUID and type strings returned to vold, which would otherwise always be a valid UUID string and one of a fixed set of type strings.

The bug (part 2): Mounting the filesystem

When vold has determined that a newly inserted USB stick with an MBR partition table contains a partition of type vfat that the kernel’s vfat filesystem implementation should be able to mount,PublicVolume::doMount() constructs a mount path based on the filesystem UUID, then attempts to ensure that the mountpoint directory exists and has appropriate ownership and mode, and then attempts to mount over that directory:

 

   if (mFsType != «vfat») {
       LOG(ERROR) << getId() << » unsupported filesystem » << mFsType;
       return -EIO;
   }
   if (vfat::Check(mDevPath)) {
       LOG(ERROR) << getId() << » failed filesystem check»;
       return -EIO;
   }
   // Use UUID as stable name, if available
   std::string stableName = getId();
   if (!mFsUuid.empty()) {
       stableName = mFsUuid;
   }
   mRawPath = StringPrintf(«/mnt/media_rw/%s», stableName.c_str());
   […]
   if (fs_prepare_dir(mRawPath.c_str(), 0700, AID_ROOT, AID_ROOT)) {
       PLOG(ERROR) << getId() << » failed to create mount points»;
       return -errno;
   }
   if (vfat::Mount(mDevPath, mRawPath, false, false, false,
           AID_MEDIA_RW, AID_MEDIA_RW, 0007, true)) {
       PLOG(ERROR) << getId() << » failed to mount » << mDevPath;
       return -EIO;
   }

 

The mount path is determined using a format string, without any sanity checks on the UUID string that was provided by blkid. Therefore, an attacker with control over the UUID string can perform a directory traversal attack and cause the FAT filesystem to be mounted outside of /mnt/media_rw.

 

This means that if an attacker inserts a USB stick with a FAT filesystem whose label string is ‘UUID=»../##’ into a locked phone, the phone will mount that USB stick to /mnt/##.

 

However, this straightforward implementation of the attack has several severe limitations; some of them can be overcome, others worked around:

 

  • Label string length: A FAT filesystem label is limited to 11 bytes. An attacker attempting to perform a straightforward attack needs to use the six bytes ‘UUID=»‘ to start the injection, which leaves only five characters for the directory traversal — insufficient to reach any interesting point in the mount hierarchy. The next section describes how to work around that.
  • SELinux restrictions on mountpoints: Even though vold is considered to be kernel-equivalent, a SELinux policy applies some restrictions on what vold can do. Specifically, the mountonpermission is restricted to a set of permitted labels.
  • Writability requirement: fs_prepare_dir() fails if the target directory is not mode 0700 and chmod() fails.
  • Restrictions on access to vfat filesystems: When a vfat filesystem is mounted, all of its files are labeled as u:object_r:vfat:s0. Even if the filesystem is mounted in a place from which important code or data is loaded, many SELinux contexts won’t be permitted to actually interact with the filesystem — for example, the zygote and system_server aren’t allowed to do so. On top of that, processes that don’t have sufficient privileges to bypass DAC checks also need to be in the media_rw group. The section «Dealing with SELinux: Triggering the bug twice» describes how these restrictions can be avoided in the context of this specific bug.

Exploitation: Chameleonic USB mass storage

As described in the previous section, a FAT filesystem label is limited to 11 bytes. blkid supports a range of other filesystem types that have significantly longer label strings, but if you used such a filesystem type, you’d then have to make it past the fsck check for vfat filesystems and the filesystem header checks performed by the kernel when mounting a vfat filesystem. The vfat kernel filesystem doesn’t require a fixed magic value right at the start of the partition, so this might theoretically work somehow; however, because several of the values in a FAT filesystem header are actually important for the kernel, and at the same time,blkid also performs some sanity checks on superblocks, the PoC takes a different route.

 

After blkid has read parts of the filesystem and used them to determine the filesystem’s type, label and UUID, fsck_msdos and the in-kernel filesystem implementation will re-read the same data, and those repeated reads actually go through to the storage device. The Linux kernel caches block device pages when userspace directly interacts with block devices, but __blkdev_put() removes all cached data associated with a block device when the last open file referencing the device is closed.

 

A physical attacker can abuse this by attaching a fake storage device that returns different data for multiple reads from the same location. This allows us to present, for example, a romfs header with a long label string to blkid while presenting a perfectly normal vfat filesystem to fsck_msdos and the in-kernel filesystem implementation.

 

This is relatively simple to implement in practice thanks to Linux’ built-in support for device-side USB.Andrzej Pietrasiewicz’s talk «Make your own USB gadget» is a useful introduction to this topic. Basically, the kernel ships with implementations for device-side USB mass storage, HID devices, ethernet adapters, and more; using a relatively simple pseudo-filesystem-based configuration interface, you can configure a composite gadget that provides one or multiple of these functions, potentially with multiple instances, to the connected device. The hardware you need is a system that runs Linux and supports device-side USB; for testing this attack, a Raspberry Pi Zero W was used.

 

The f_mass_storage gadget function is designed to use a normal file as backing storage; to be able to interactively respond to requests from the Android phone, a FUSE filesystem is used as backing storage instead, using the direct_io option / the FOPEN_DIRECT_IO flag to ensure that our own kernel doesn’t add unwanted caching.

 

At this point, it is already possible to implement an attack that can steal, for example, photos stored on external storage. Luckily for an attacker, immediately after a USB stick has been mounted,com.android.externalstorage/.MountReceiver is launched, which is a process whose SELinux domain permits access to USB devices. So after a malicious FAT partition has been mounted over /data (using the label string ‘UUID=»../../data’), the zygote forks off a child with appropriate SELinux context and group membership to permit accesses to USB devices. This child then loads bytecode from /data/dalvik-cache/, permitting us to take control over com.android.externalstorage, which has the necessary privileges to exfiltrate external storage contents.

 

However, for an attacker who wants to access not just photos, but things like chat logs or authentication credentials stored on the device, this level of access should normally not be sufficient on its own.

Dealing with SELinux: Triggering the bug twice

The major limiting factor at this point is that, even though it is possible to mount over /data, a lot of the highly-privileged code running on the device is not permitted to access the mounted filesystem. However, one highly-privileged service does have access to it: vold.

 

vold actually supports two types of USB sticks, PublicVolume and PrivateVolume. Up to this point, this blogpost focused on PublicVolume; from here on, PrivateVolume becomes important.
A PrivateVolume is a USB stick that must be formatted using a GUID Partition Table. It must contain a partition that has type UUID kGptAndroidExpand (193D1EA4-B3CA-11E4-B075-10604B889DCF), which contains a dm-crypt-encrypted ext4 (or f2fs) filesystem. The corresponding key is stored at/data/misc/vold/expand_{partGuid}.key, where {partGuid} is the partition GUID from the GPT table as a normalized lowercase hexstring.

 

As an attacker, it normally shouldn’t be possible to mount an ext4 filesystem this way because phones aren’t usually set up with any such keys; and even if there is such a key, you’d still have to know what the correct partition GUID is and what the key is. However, we can mount a vfat filesystem over /data/misc and put our own key there, for our own GUID. Then, while the first malicious USB mass storage device is still connected, we can connect a second one that is mounted as PrivateVolume using the keys vold will read from the first USB mass storage device. (Technically, the ordering in the last sentence isn’t entirely correct — actually, the exploit provides both mass storage devices as a single composite device at the same time, but stalls the first read from the second mass storage device to create the desired ordering.)

 

Because PrivateVolume instances use ext4, we can control DAC ownership and permissions on the filesystem; and thanks to the way a PrivateVolume is integrated into the system, we can even control SELinux labels on that filesystem.

 

In summary, at this point, we can mount a controlled filesystem over /data, with arbitrary file permissions and arbitrary SELinux contexts. Because we control file permissions and SELinux contexts, we can allow any process to access files on our filesystem — including mapping them with PROT_EXEC.

Injecting into zygote

The zygote process is relatively powerful, even though it is not listed as part of the TCB. By design, it runs with UID 0, can arbitrarily change its UID, and can perform dynamic SELinux transitions into the SELinux contexts of system_server and normal apps. In other words, the zygote has access to almost all user data on the device.

 

When the 64-bit zygote starts up on system boot, it loads code from /data/dalvik-cache/arm64/system@framework@boot*.{art,oat,vdex}. Normally, the oat file (which contains an ELF library that will be loaded with dlopen()) and the vdex file are symlinks to files on the immutable/system partition; only the art file is actually stored on /data. But we can instead makesystem@framework@boot.art and system@framework@boot.vdex symlinks to /system (to get around some consistency checks without knowing exactly which Android build is running on the device) while placing our own malicious ELF library at system@framework@boot.oat (with the SELinux context that the legitimate oat file would have). Then, by placing a function with__attribute__((constructor)) in our ELF library, we can get code execution in the zygote as soon as it calls dlopen() on startup.

 

The missing step at this point is that when the attack is performed, the zygote is already running; and this attack only works while the zygote is starting up.

Crashing the system

This part is a bit unpleasant.

 

When a critical system component (in particular, the zygote or system_server) crashes (which you can simulate on an eng build using kill), Android attempts to automatically recover from the crash by restarting most userspace processes (including the zygote). When this happens, the screen first shows the boot animation for a bit, followed by the lock screen with the «Unlock for all features and data» prompt that normally only shows up after boot. However, the key material for accessing user data is still present at this point, as you can verify if ADB is on by running «ls /sdcard» on the device.

 

This means that if we can somehow crash system_server, we can then inject code into the zygote during the following userspace restart and will be able to access user data on the device.

 

Of course, mounting our own filesystem over /data is very crude and makes all sorts of things fail, but surprisingly, the system doesn’t immediately fall over — while parts of the UI become unusable, most places have some error handling that prevents the system from failing so clearly that a restart happens.
After some experimentation, it turned out that Android’s code for tracking bandwidth usage has a safety check: If the network usage tracking code can’t write to disk and >=2MiB (mPersistThresholdBytes) of network traffic have been observed since the last successful write, a fatal exception is thrown. This means that if we can create some sort of network connection to the device and then send it >=2MiB worth of ping flood, then trigger a stats writeback by either waiting for a periodic writeback or changing the state of a network interface, the device will reboot.

 

To create a network connection, there are two options:

 

  • Connect to a wifi network. Before Android 9, even when the device is locked, it is normally possible to connect to a new wifi network by dragging down from the top of the screen, tapping the drop-down below the wifi symbol, then tapping on the name of an open wifi network. (This doesn’t work for networks protected with WPA, but of course an attacker can make their own wifi network an open one.) Many devices will also just autoconnect to networks with certain names.
  • Connect to an ethernet network. Android supports USB ethernet adapters and will automatically connect to ethernet networks.

 

For testing the exploit, a manually-created connection to a wifi network was used; for a more reliable and user-friendly exploit, you’d probably want to use an ethernet connection.

 

At this point, we can run arbitrary native code in zygote context and access user data; but we can’t yet read out the raw disk encryption key, directly access the underlying block device, or take a RAM dump (although at this point, half the data that would’ve been in a RAM dump is probably gone anyway thanks to the system crash). If we want to be able to do those things, we’ll have to escalate our privileges a bit more.

From zygote to vold

Even though the zygote is not supposed to be part of the TCB, it has access to the CAP_SYS_ADMINcapability in the initial user namespace, and the SELinux policy permits the use of this capability. The zygote uses this capability for the mount() syscall and for installing a seccomp filter without setting theNO_NEW_PRIVS flag. There are multiple ways to abuse CAP_SYS_ADMIN; in particular, on the Pixel 2, the following ways seem viable:

 

  • You can install a seccomp filter without NO_NEW_PRIVS, then perform an execve() with a privilege transition (SELinux exec transition, setuid/setgid execution, or execution with permitted file capability set). The seccomp filter can then force specific syscalls to fail with error number 0 — which e.g. in the case of open() means that the process will believe that the syscall succeeded and allocated file descriptor 0. This attack works here, but is a bit messy.
  • You can instruct the kernel to use a file you control as high-priority swap device, then create memory pressure. Once the kernel writes stack or heap pages from a sufficiently privileged process into the swap file, you can edit the swapped-out memory, then let the process load it back. Downsides of this technique are that it is very unpredictable, it involves memory pressure (which could potentially cause the system to kill processes you want to keep, and probably destroys many forensic artifacts in RAM), and requires some way to figure out which swapped-out pages belong to which process and are used for what. This requires the kernel to support swap.
  • You can use pivot_root() to replace the root directory of either the current mount namespace or a newly created mount namespace, bypassing the SELinux checks that would have been performed for mount(). Doing it for a new mount namespace is useful if you only want to affect a child process that elevates its privileges afterwards. This doesn’t work if the root filesystem is a rootfs filesystem. This is the technique used here.

 

In recent Android versions, the mechanism used to create dumps of crashing processes has changed: Instead of asking a privileged daemon to create a dump, processes execute one of the helpers/system/bin/crash_dump64 and /system/bin/crash_dump32, which have the SELinux labelu:object_r:crash_dump_exec:s0. Currently, when a file with such a label is executed by any SELinux domain, an automatic domain transition to the crash_dump domain is triggered (which automatically implies setting the AT_SECURE flag in the auxiliary vector, instructing the linker of the new process to be careful with environment variables like LD_PRELOAD):

 

domain_auto_trans(domain, crash_dump_exec, crash_dump);

 

At the time this bug was reported, the crash_dump domain had the following SELinux policy:

 

[…]
allow crash_dump {
 domain
 -init
 -crash_dump
 -keystore
 -logd
}:process { ptrace signal sigchld sigstop sigkill };
[…]
r_dir_file(crash_dump, domain)
[…]

 

This policy permitted crash_dump to attach to processes in almost any domain via ptrace() (providing the ability to take over the process if the DAC controls permit it) and allowed it to read properties of any process in procfs. The exclusion list for ptrace access lists a few TCB processes; but notably, vold was not on the list. Therefore, if we can execute crash_dump64 and somehow inject code into it, we can then take overvold.

 

Note that the ability to actually ptrace() a process is still gated by the normal Linux DAC checks, andcrash_dump can’t use CAP_SYS_PTRACE or CAP_SETUID. If a normal app managed to inject code intocrash_dump64, it still wouldn’t be able to leverage that to attack system components because of the UID mismatch.

 

If you’ve been reading carefully, you might now wonder whether we could just place our own binary with context u:object_r:crash_dump_exec:s0 on our fake /data filesystem, and then execute that to gain code execution in the crash_dump domain. This doesn’t work because vold — very sensibly — hardcodes theMS_NOSUID flag when mounting USB storage devices, which not only degrades the execution of classic setuid/setgid binaries, but also degrades the execution of files with file capabilities and executions that would normally involve automatic SELinux domain transitions (unless the SELinux policy explicitly opts out of this behavior by granting PROCESS2__NOSUID_TRANSITION).

 

To inject code into crash_dump64, we can create a new mount namespace with unshare() (using ourCAP_SYS_ADMIN capability), then call pivot_root() to point the root directory of our process into a directory we fully control, and then execute crash_dump64. Then the kernel parses the ELF headers ofcrash_dump64, reads the path to the linker (/system/bin/linker64), loads the linker into memory from that path (relative to the process root, so we can supply our own linker here), and executes it.

 

At this point, we can execute arbitrary code in crash_dump context and escalate into vold from there, compromising the TCB. At this point, Android’s security policy considers us to have kernel-equivalent privileges; however, to see what you’d have to do from here to gain code execution in the kernel, this blogpost goes a bit further.

From vold to init context

It doesn’t look like there is an easy way to get from vold into the real init process; however, there is a way into the init SELinux context. Looking through the SELinux policy for allowed transitions into init context, we find the following policy:

 

domain_auto_trans(kernel, init_exec, init)

 

This means that if we can get code running in kernel context to execute a file we control labeled init_exec, on a filesystem that wasn’t mounted with MS_NOSUID, then our file will be executed in init context.

 

The only code that is running in kernel context is the kernel, so we have to get the kernel to execute the file for us. Linux has a mechanism called «usermode helpers» that can do this: Under some circumstances, the kernel will delegate actions (such as creating coredumps, loading key material into the kernel, performing DNS lookups, …) to userspace code. In particular, when a nonexistent key is looked up (e.g. viarequest_key()), /sbin/request-key (hardcoded, can only be changed to a different static path at kernel build time with CONFIG_STATIC_USERMODEHELPER_PATH) will be invoked.

 

Being in vold, we can simply mount our own ext4 filesystem over /sbin without MS_NOSUID, then callrequest_key(), and the kernel invokes our request-key in init context.

 

The exploit stops at this point; however, the following section describes how you could build on it to gain code execution in the kernel.

From init context to the kernel

From init context, it is possible to transition into modprobe or vendor_modprobe context by executing an appropriately labeled file after explicitly requesting a domain transition (note that this is domain_trans(), which permits a transition on exec, not domain_auto_trans(), which automatically performs a transition on exec):

 

domain_trans(init, { rootfs toolbox_exec }, modprobe)
domain_trans(init, vendor_toolbox_exec, vendor_modprobe)

 

modprobe and vendor_modprobe have the ability to load kernel modules from appropriately labeled files:

 

allow modprobe self:capability sys_module;
allow modprobe { system_file }:system module_load;
allow vendor_modprobe self:capability sys_module;
allow vendor_modprobe { vendor_file }:system module_load;

 

Android nowadays doesn’t require signatures for kernel modules:

 

walleye:/ # zcat /proc/config.gz | grep MODULE
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_MODULES=y
# CONFIG_MODULE_FORCE_LOAD is not set
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODULE_SRCVERSION_ALL=y
# CONFIG_MODULE_SIG is not set
# CONFIG_MODULE_COMPRESS is not set
CONFIG_MODULES_TREE_LOOKUP=y
CONFIG_ARM64_MODULE_CMODEL_LARGE=y
CONFIG_ARM64_MODULE_PLTS=y
CONFIG_RANDOMIZE_MODULE_REGION_FULL=y
CONFIG_DEBUG_SET_MODULE_RONX=y

 

Therefore, you could execute an appropriately labeled file to execute code in modprobe context, then load an appropriately labeled malicious kernel module from there.

Lessons learned

Notably, this attack crosses two weakly-enforced security boundaries: The boundary from blkid_untrusted tovold (when vold uses the UUID provided by blkid_untrusted in a pathname without checking that it resembles a valid UUID) and the boundary from the zygote to the TCB (by abusing the zygote’s CAP_SYS_ADMINcapability). Software vendors have, very rightly, been stressing for quite some time that it is important for security researchers to be aware of what is, and what isn’t, a security boundary — but it is also important for vendors to decide where they want to have security boundaries and then rigorously enforce those boundaries. Unenforced security boundaries can be of limited use — for example, as a development aid while stronger isolation is in development -, but they can also have negative effects by obfuscating how important a component is for the security of the overall system.

In this case, the weakly-enforced security boundary between vold and blkid_untrusted actually contributed to the vulnerability, rather than mitigating it. If the blkid code had run in the vold process, it would not have been necessary to serialize its output, and the injection of a fake UUID would not have worked.

Реклама

Windows Exploitation Tricks: Exploiting Arbitrary File Writes for Local Elevation of Privilege

Previously I presented a technique to exploit arbitrary directory creation vulnerabilities on Windows to give you read access to any file on the system. In the upcoming Spring Creators Update (RS4) the abuse of mount points to link to files as I exploited in the previous blog post has been remediated. This is an example of a long term security benefit from detailing how vulnerabilities might be exploited, giving a developer an incentive to find ways of mitigating the exploitation vector.

Keeping with that spirit in this blog post I’ll introduce a novel technique to exploit the more common case of arbitrary file writes on Windows 10. Perhaps once again Microsoft might be able to harden the OS to make it more difficult to exploit these types of vulnerabilities. I’ll demonstrate exploitation by describing in detail the recently fixed issue that Project Zero reported to Microsoft (issue 1428).

An arbitrary file write vulnerability is where a user can create or modify a file in a location they could not normally access. This might be due to a privileged service incorrectly sanitizing information passed by the user or due to a symbolic link planting attack where the user can write a link into a location which is subsequently used by the privileged service. The ideal vulnerability is one where the attacking user not only controls the location of the file being written but also the entire contents. This is the type of vulnerability we’ll consider in this blog post.

A common way of exploiting arbitrary file writes is to perform DLL hijacking. When a Windows executable begins executing the initial loader in NTDLL will attempt to find all imported DLLs. The locations that the loader checks for imported DLLs are more complex than you’d expect but for our purposes can be summarized as follows: 

  1. Check Known DLLs, which is a pre-cached list of DLLs which are known to the OS. If found, the DLL is mapped into memory from a pre-loaded section object.
  2. Check the application’s directory, for example if importing TEST.DLL and the application is in C:\APP then it will check C:\APP\TEST.DLL.
  3. Check the system locations, such as C:\WINDOWS\SYSTEM32 and C:\WINDOWS.
  4. If all else fails search the current environment PATH.

The aim of the DLL hijack is to find an executable which runs at a high privilege which will load a DLL from a location that the vulnerability allows us to write to. The hijack only succeeds if the DLL hasn’t already been found in a location checked earlier.

There are two problems which make DLL hijacking annoying:

 

  1. You typically need to create a new instance of a privileged process as the majority of DLL imports are resolved when the process is first executed.
  2. Most system binaries, executables and DLLs that will run as a privileged user will be installed into SYSTEM32.

The second problem means that in steps 2 and 3 the loader will always look for DLLs in SYSTEM32. Assuming that overwriting a DLL isn’t likely to be an option (at the least if the DLL is already loaded you can’t write to the file), that makes it harder to find a suitable DLL to hijack. A typical way around these problems is to pick an executable that is not located in SYSTEM32 and which can be easily activated, such as by loading a COM server or running a scheduled task.

Even if you find a suitable target executable to DLL hijack the implementation can be quite ugly. Sometimes you need to implement stub exports for the original DLL, otherwise the loading of the DLL will fail. In other cases the best place to run code is during DllMain, which introduces other problems such as running code inside the loader lock. What would be nice is a privileged service that will just load an arbitrary DLL for us, no hijacking, no needing to spawn the “correct” privileged process. The question is, does such a service exist?

It turns out yes one does, and the service itself has been abused at least twice previously, once by Lokihardt for a sandbox escape, and once by me for user to system EoP. This service goes by the name “Microsoft (R) Diagnostics Hub Standard Collector Service,” but we’ll call it DiagHub for short.

The DiagHub service was introduced in Windows 10, although there’s a service that performs a similar task called IE ETW Collector in Windows 7 and 8.1. The purpose of the service is to collect diagnostic information using Event Tracing for Windows (ETW) on behalf of sandboxed applications, specifically Edge and Internet Explorer. One of its interesting features is that it can be configured to load an arbitrary DLL from the SYSTEM32 directory, which is the exact feature that Lokihardt and I exploited to gain elevated privileges. All the functionality for the service is exposed over a registered DCOM object, so in order to load our DLL we’ll need to work out how to call methods on that DCOM object. At this point you can skip to the end but if you want to understand how I would go about finding how the DCOM object is implemented, the next section might be of interest.

Reverse Engineering a DCOM Object

Let’s go through the steps I would take to try and find what interfaces an unknown DCOM object supports and find the implementation so we can reverse engineer them. There are two approaches I will typically take, go straight for RE in IDA Pro or similar, or do some on-system inspection first to narrow down the areas we have to investigate. Here we’ll go for the second approach as it’s more informative. I can’t say how Lokihardt found his issue; I’m going to opt for magic.

For this approach we’ll need some tools, specifically my OleViewDotNet v1.4+ (OVDN) tool from github as well as an installation of WinDBG from the SDK. The first step is to find the registration information for the DCOM object and discover what interfaces are accessible. We know that the DCOM object is hosted in a service so once you’ve loaded OVDN go to the menu Registry ⇒ Local Services and the tool will load a list of registered system services which expose COM objects. If you now find the  “Microsoft (R) Diagnostics Hub Standard Collector Service” service (applying a filter here is helpful) you should find the entry in the list. If you open the service tree node you’ll see a child, “Diagnostics Hub Standard Collector Service,” which is the hosted DCOM object. If you open that tree node the tool will create the object, then query for all remotely accessible COM interfaces to give you a list of interfaces the object supports. I’ve shown this in the screenshot below:

While we’re here it’s useful to inspect what security is required to access the DCOM object. If you right click the class treenode you can select View Access Permissions or View Launch Permissions and you’ll get a window that shows the permissions. In this case it shows that this DCOM object will be accessible from IE Protected Mode as well as Edge’s AppContainer sandbox, including LPAC.

Of the list of interfaces shown we only really care about the standard interfaces. Sometimes there are interesting interfaces in the factory but in this case there aren’t. Of these standard interfaces there are two we care about, the IStandardCollectorAuthorizationService and IStandardCollectorService. Just to cheat slightly I already know that it’s the IStandardCollectorService service we’re interested in, but as the following process is going to be the same for each of the interfaces it doesn’t matter which one we pick first. If you right click the interface treenode and select Properties you can see a bit of information about the registered interface.

There’s not much more information that will help us here, other than we can see there are 8 methods on this interface. As with a lot of COM registration information, this value might be missing or erroneous, but in this case we’ll assume it’s correct. To understand what the methods are we’ll need to track down the implementation of IStandardCollectorService inside the COM server. This knowledge will allow us to target our RE efforts to the correct binary and the correct methods. Doing this for an in-process COM object is relatively easy as we can query for an object’s VTable pointer directly by dereferencing a few pointers. However, for out-of-process it’s more involved. This is because the actual in-process object you’d call is really a proxy for the remote object, as shown in the following diagram:

All is not lost, however; we can still find the the VTable of the OOP object by extracting the information stored about the object in the server process. Start by right clicking the “Diagnostics Hub Standard Collector Service” object tree node and select Create Instance. This will create a new instance of the COM object as shown below:

The instance gives you basic information such as the CLSID for the object which we’ll need later (in this case {42CBFAA7-A4A7-47BB-B422-BD10E9D02700}) as well as the list of supported interfaces. Now we need to ensure we have a connection to the interface we’re interested in. For that select theIStandardCollectorService interface in the lower list, then in the Operations menu at the bottom selectMarshal ⇒ View Properties. If successful you’ll now see the following new view:

There’s a lot of information in this view but the two pieces of most interest are the Process ID of the hosting service and the Interface Pointer Identifier (IPID). In this case the Process ID should be obvious as the service is running in its own process, but this isn’t always the case—sometimes when you create a COM object you’ve no idea which process is actually hosting the COM server so this information is invaluable. The IPID is the unique identifier in the hosting process for the server end of the DCOM object; we can use the Process ID and the IPID in combination to find this server and from that find out the location of the actual VTable implementing the COM methods. It’s worth noting that the maximum Process ID size from the IPID is 16 bits; however, modern versions of Windows can have much larger PIDs so there’s a chance that you’ll have to find the process manually or restart the service multiple times until you get a suitable PID.

Now we’ll use a feature of OVDN which allows us to reach into the memory of the server process and find the IPID information. You can access information about all processes through the main menu Object ⇒ Processes but as we know which process we’re interested in just click the View button next to the Process ID in the marshal view. You do need to be running OVDN as an administrator otherwise you’ll not be able to open the service process. If you’ve not done so already the tool will ask you to configure symbol support as OVDN needs public symbols to find the correct locations in the COM DLLs to parse. You’ll want to use the version of DBGHELP.DLL which comes with WinDBG as that supports remote symbol servers. Configure the symbols similar to the following dialog:

If everything is correctly configured and you’re an administrator you should now see more details about the IPID, as shown below:

 

The two most useful pieces of information here are the Interface pointer, which is the location of the heap allocated object (in case you want to inspect its state), and the VTable pointer for the interface. The VTable address gives us information for where exactly the COM server implementation is located. As we can see here the VTable is located in a different module (DiagnosticsHub.StandardCollector.Runtime) from the main executable (DiagnosticsHub.StandardCollector.Server). We can verify the VTable address is correct by attaching to the service process using WinDBG and dumping the symbols at the VTable address. We also know from before we’re expecting 8 methods so we can take that into account by using the command:

dqs DiagnosticsHub_StandardCollector_Runtime+0x36C78 L8

Note that WinDBG converts periods in a module name to underscores. If successful you’ll see the something similar to the following screenshot:

Extracting out that information we now get the name of the methods (shown below) as well as the address in the binary. We could set breakpoints and see what gets called during normal operation, or take this information and start the RE process.

ATL::CComObject<StandardCollectorService>::QueryInterface

ATL::CComObjectCached<StandardCollectorService>::AddRef

ATL::CComObjectCached<StandardCollectorService>::Release

StandardCollectorService::CreateSession

StandardCollectorService::GetSession

StandardCollectorService::DestroySession

StandardCollectorService::DestroySessionAsync

StandardCollectorService::AddLifetimeMonitorProcessIdForSession

The list of methods looks correct: they start with the 3 standard methods for a COM object, which in this case are implemented by the ATL library. Following those methods are five implemented by theStandardCollectorService class. Being public symbols, this doesn’t tell us what parameters we expect to pass to the COM server. Due to C++ names containing some type information, IDA Pro might be able to extract that information for you, however that won’t necessarily tell you the format of any structures which might be passed to the function. Fortunately due to how COM proxies are implemented using the Network Data Representation (NDR) interpreter to perform marshalling, it’s possible to reverse the NDR bytecode back into a format we can understand. In this case go back to the original service information, right click theIStandardCollectorService treenode and select View Proxy Definition. This will get OVDN to parse the NDR proxy information and display a new view as shown below.

Viewing the proxy definition will also parse out any other interfaces which that proxy library implements. This is likely to be useful for further RE work. The decompiled proxy definition is shown in a C# like pseudo code but it should be easy to convert into working C# or C++ as necessary. Notice that the proxy definition doesn’t contain the names of the methods but we’ve already extracted those out. So applying a bit of cleanup and the method names we get a definition which looks like the following:

[uuid(«0d8af6b7-efd5-4f6d-a834-314740ab8caa»)]
struct IStandardCollectorService : IUnknown {
   HRESULT CreateSession(_In_ struct Struct_24* p0, 
                         _In_ IStandardCollectorClientDelegate* p1,
                         _Out_ ICollectionSession** p2);
   HRESULT GetSession(_In_ GUID* p0, _Out_ ICollectionSession** p1);
   HRESULT DestroySession(_In_ GUID* p0);
   HRESULT DestroySessionAsync(_In_ GUID* p0);
   HRESULT AddLifetimeMonitorProcessIdForSession(_In_ GUID* p0, [In] int p1);
}

There’s one last piece missing; we don’t know the definition of the Struct_24 structure. It’s possible to extract this from the RE process but fortunately in this case we don’t have to. The NDR bytecode must know how to marshal this structure across so OVDN just extracts the structure definition out for us automatically: select the Structures tab and find Struct_24.

As you go through the RE process you can repeat this process as necessary until you understand how everything works. Now let’s get to actually exploiting the DiagHub service and demonstrating its use with a real world exploit.

Example Exploit

So after our efforts of reverse engineering, we’ll discover that in order to to load a DLL from SYSTEM32 we need to do the following steps:

  1. Create a new Diagnostics Session using IStandardCollectorService::CreateSession.
  2. Call the ICollectionSession::AddAgent method on the new session, passing the name of the DLL to load (without any path information).

The simplified loading code for ICollectionSession::AddAgent is as follows:

void EtwCollectionSession::AddAgent(LPWCSTR dll_path, 
                                   REFGUID guid) {
 WCHAR valid_path[MAX_PATH];
 if ( !GetValidAgentPath(dll_path, valid_path)) {
   return E_INVALID_AGENT_PATH;
 HMODULE mod = LoadLibraryExW(valid_path, 
       nullptr, LOAD_WITH_ALTERED_SEARCH_PATH);
 dll_get_class_obj = GetProcAddress(hModule, «DllGetClassObject»);
 return dll_get_class_obj(guid);
}

We can see that it checks that the agent path is valid and returns a full path (this is where the previous EoP bugs existed, insufficient checks). This path is loading using LoadLibraryEx, then the DLL is queried for the exported method DllGetClassObject which is then called. Therefore to easily get code execution all we need is to implement that method and drop the file into SYSTEM32. The implemented DllGetClassObject will be called outside the loader lock so we can do anything we want. The following code (error handling removed) will be sufficient to load a DLL called dummy.dll.

IStandardCollectorService* service;
CoCreateInstance(CLSID_CollectorService, nullptr, CLSCTX_LOCAL_SERVER,IID_PPV_ARGS(&service));

SessionConfiguration config = {};
config.version = 1;
config.monitor_pid = ::GetCurrentProcessId();
CoCreateGuid(&config.guid);
config.path = ::SysAllocString(L»C:\Dummy»);
ICollectionSession* session;
service->CreateSession(&config, nullptr, &session);

GUID agent_guid;
CoCreateGuid(&agent_guid);
session->AddAgent(L»dummy.dll», agent_guid);

All we need now is the arbitrary file write so that we can drop a DLL into SYSTEM32, load it and elevate our privileges. For this I’ll demonstrate using a vulnerability I found in the SvcMoveFileInheritSecurity RPC method in the system Storage Service. This function caught my attention due to its use in an exploit for a vulnerability in ALPC discovered and presented by Clément Rouault & Thomas Imbert at PACSEC 2017. While this method was just a useful exploit primitive for the vulnerability I realized it has not one, but two actual vulnerabilities lurking in it (at least from a normal user privilege). The code prior to any fixes forSvcMoveFileInheritSecurity looked like the following:

void SvcMoveFileInheritSecurity(LPCWSTR lpExistingFileName, 
                               LPCWSTR lpNewFileName, 
                               DWORD dwFlags) {
 PACL pAcl;
 if (!RpcImpersonateClient()) {
   // Move file while impersonating.
   if (MoveFileEx(lpExistingFileName, lpNewFileName, dwFlags)) {
     RpcRevertToSelf();
     // Copy inherited DACL while not.
     InitializeAcl(&pAcl, 8, ACL_REVISION);
     DWORD status = SetNamedSecurityInfo(lpNewFileName, SE_FILE_OBJECT, 
         UNPROTECTED_DACL_SECURITY_INFORMATION | DACL_SECURITY_INFORMATION,
         nullptr, nullptr, &pAcl, nullptr);
       if (status != ERROR_SUCCESS)
         MoveFileEx(lpNewFileName, lpExistingFileName, dwFlags);
   }
   else {
     // Copy file instead…
     RpcRevertToSelf();
   }
 }
}

The purpose of this method seems to be to move a file then apply any inherited ACE’s to the DACL from the new directory location. This would be necessary as when a file is moved on the same volume, the old filename is unlinked and the file is linked to the new location. However, the new file will maintain the security assigned from its original location. Inherited ACEs are only applied when a new file is created in a directory, or as in this case, the ACEs are explicitly applied by calling a function such as SetNamedSecurityInfo.

To ensure this method doesn’t allow anyone to move an arbitrary file while running as the service’s user, which in this case is Local System, the RPC caller is impersonated. The trouble starts immediately after the first call to MoveFileEx, the impersonation is reverted and SetNamedSecurityInfo is called. If that call fails then the code calls MoveFileEx again to try and revert the original move operation. This is the first vulnerability; it’s possible that the original filename location now points somewhere else, such as through the abuse of symbolic links. It’s pretty easy to cause SetNamedSecurityInfo to fail, just add a Deny ACL for Local System to the file’s ACE for WRITE_DAC and it’ll return an error which causes the revert and you get an arbitrary file creation. This was reported as issue 1427.

This is not in fact the vulnerability we’ll be exploiting, as that would be too easy. Instead we’ll exploit a second vulnerability in the same code: the fact that we can get the service to call SetNamedSecurityInfo on any file we like while running as Local System. This can be achieved either by abusing the impersonated device map to redirect the local drive letter (such as C:) when doing the initial MoveFileEx, which then results in lpNewFileName pointing to an arbitrary location, or more interestingly abusing hard links. This was reported as issue 1428. We can exploit this using hard links as follows:

  1. Create a hard link to a target file in SYSTEM32 that we want to overwrite. We can do this as you don’t need to have write privileges to a file to create a hard link to it, at least outside of a sandbox.
  2. Create a new directory location that has an inheritable ACE for a group such as Everyone or Authenticated Users to allow for modification of any new file. You don’t even typically need to do this explicitly; for example, any new directory created in the root of the C: drive has an inherited ACE for Authenticated Users. Then a request can be made to the RPC service to move the hardlinked file to the new directory location. The move succeeds under impersonation as long as we have FILE_DELETE_CHILD access to the original location and FILE_ADD_FILE in the new location, which we can arrange.
  3. The service will now call SetNamedSecurityInfo on the moved hardlink file. SetNamedSecurityInfo will pick up the inherited ACEs from the new directory location and apply them to the hardlinked file. The reason the ACEs are applied to the hardlinked file is from the perspective of SetNamedSecurityInfo the hardlinked file is in the new location, even though the original target file we linked to was in SYSTEM32.

By exploiting this we can modify the security of any file that Local System can access for WRITE_DAC access. Therefore we can modify a file in SYSTEM32, then use the DiagHub service to load it. There is a slight problem, however. The majority of files in SYSTEM32 are actually owned by the TrustedInstaller group and so cannot be modified, even by Local System. We need to find a file we can write to which isn’t owned by TrustedInstaller. Also we’d want to pick a file that won’t cause the OS install to become corrupt. We don’t care about the file’s extension as AddAgent only checks that the file exists and loads it with LoadLibraryEx. There are a number of ways we can find a suitable file, such as using the SysInternals AccessChk utility, but to be 100% certain that the Storage Service’s token can modify the file we’ll use my NtObjectManagerPowerShell module (specifically its Get-AccessibleFile cmdlet, which accepts a process to do the access check from). While the module was designed for checking accessible files from a sandbox, it also works to check for files accessible by privileged services. If you run the following script as an administrator with the module installed the $files variable will contain a list of files that the Storage Service has WRITE_DAC access to.

Import-Module NtObjectManager

Start-Service Name «StorSvc»
Set-NtTokenPrivilege SeDebugPrivilege | Out-Null
$files = Use-NtObject($p = Get-NtProcess ServiceName «StorSvc») {
   Get-AccessibleFile Win32Path C:\Windows\system32 Recurse `
    MaxDepth 1 FormatWin32Path AccessRights WriteDac CheckMode FilesOnly
}

Looking through the list of files I decided to pick on the file license.rtf, which contains a short license statement for Windows. The advantage of this file is it’s very likely to be not be critical to the operation of the system and so overwriting it shouldn’t cause the installation to become corrupted.

So putting it all together:

  1. Use the Storage Service vulnerability to change the security of the license.rtf file inside SYSTEM32.
  2. Copy a DLL, which implements DllGetClassObject over the license.rtf file.
  3. Use the DiagHub service to load our modified license file as a DLL, get code execution as Local System and do whatever we want.

If you’re interested in seeing a fully working example, I’ve uploaded a full exploit to the original issue on the tracker.

Wrapping Up

In this blog post I’ve described a useful exploit primitive for Windows 10, which you can even use from some sandboxed environments such as Edge LPAC. Finding these sorts of primitives makes exploitation much simpler and less error-prone. Also I’ve given you a taste of how you can go about finding your own bugs in similar DCOM implementations.