CVE-2021-1647: Windows Defender mpengine remote code execution

Microsoft Defender Remote Code Execution Vulnerability

Original text by Maddie Stone

The Basics

Disclosure or Patch Date: 12 January 2021

Product: Microsoft Windows Defender

Advisory: https://msrc.microsoft.com/update-guide/vulnerability/CVE-2021-1647

Affected Versions: Version 1.1.17600.5 and previous

First Patched Version: Version 1.1.17700.4

Issue/Bug Report: N/A

Patch CL: N/A

Bug-Introducing CL: N/A

Reporter(s): Anonymous

The Code

Proof-of-concept:

Exploit sample: 6e1e9fa0334d8f1f5d0e3a160ba65441f0656d1f1c99f8a9f1ae4b1b1bf7d788

Did you have access to the exploit sample when doing the analysis? Yes

The Vulnerability

Bug class: Heap buffer overflow

Vulnerability details:

There is a heap buffer overflow when Windows Defender (mpengine.dll) processes the section table while unpacking an ASProtect-packed executable. Each section entry has two values: the virtual address and the size of the section. The code in CAsprotectDLLAndVersion::RetrieveVersionInfoAndCreateObjects only checks whether the next section entry's address is lower than the previous one, not whether they are equal. With a section table such as the one used in this exploit sample, [ (0,0), (0,0), (0x2000,0), (0x2000,0x3000) ], 0 bytes are allocated for the section at address 0x2000; when the code sees the next entry, also at address 0x2000, it simply skips over it without erroring out or updating the size of the section. 0x3000 bytes are then copied to that section during decompression, leading to the heap buffer overflow.

if ( next_sect_addr > sect_addr )// current va is greater than prev (not also eq)
{
    sect_addr = next_sect_addr;
    sect_sz = (next_sect_sz + 0xFFF) & 0xFFFFF000;
} 
// if next_sect_addr <= sect_addr we continue on to next entry in the table 

[...]
			new_sect_alloc = operator new[](sect_sz + sect_addr);// allocate new section
[...]
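
To make the failure mode concrete, here is a minimal, self-contained sketch (not the actual mpengine code; the loop and names are reconstructed from the excerpt above) that walks the section table from the exploit sample and reproduces the stale-size problem:

#include <stdio.h>

/* Hypothetical section entry: virtual address plus size, as described above. */
struct section_entry { unsigned int addr; unsigned int size; };

int main(void) {
    /* Section table from the exploit sample: [ (0,0), (0,0), (0x2000,0), (0x2000,0x3000) ] */
    struct section_entry table[] = {
        { 0x0000, 0x0000 }, { 0x0000, 0x0000 }, { 0x2000, 0x0000 }, { 0x2000, 0x3000 }
    };
    unsigned int sect_addr = 0, sect_sz = 0;

    for (int i = 0; i < 4; i++) {
        unsigned int next_sect_addr = table[i].addr;
        unsigned int next_sect_sz   = table[i].size;
        if (next_sect_addr > sect_addr) {   /* strictly greater: an equal address is silently skipped */
            sect_addr = next_sect_addr;
            sect_sz   = (next_sect_sz + 0xFFF) & 0xFFFFF000;
        }
        /* next_sect_addr <= sect_addr: entry ignored, sect_sz left stale */
    }

    /* The buffer is sized from the stale values: sect_sz + sect_addr = 0x2000 bytes,
       so 0 bytes remain for the section at VA 0x2000, yet the decompressor later
       writes 0x3000 bytes there -> heap buffer overflow. */
    printf("allocation size = 0x%x bytes, last section VA = 0x%x\n",
           sect_sz + sect_addr, sect_addr);
    return 0;
}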

Patch analysis: There are quite a few changes to the function CAsprotectDLLAndVersion::RetrieveVersionInfoAndCreateObjects between version 1.1.17600.5 (vulnerable) and 1.1.17700.4 (patched). The directly related change was to add an else branch to the comparison so that if any entry in the section array has an address less than or equal to the previous entry, the code will error out and exit rather than continuing to decompress.
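
Based on that description, the patched comparison presumably looks something like the following sketch (inferred from the patch analysis above, not taken from the updated binary):

if ( next_sect_addr > sect_addr )           // virtual addresses must be strictly increasing
{
    sect_addr = next_sect_addr;
    sect_sz = (next_sect_sz + 0xFFF) & 0xFFFFF000;
}
else                                        // added in 1.1.17700.4: equal or lower address
{
    goto error;                             // error out and stop instead of continuing to decompress
}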

Thoughts on how this vuln might have been found (fuzzing, code auditing, variant analysis, etc.):

It seems possible that this vulnerability was found through fuzzing or manual code review. If the ASProtect unpacking code was included from an external library, that would have made the process of finding this vulnerability even more straightforward for both fuzzing & review.

(Historical/present/future) context of bug:

The Exploit

(The terms exploit primitive, exploit strategy, exploit technique, and exploit flow are defined here.)

Exploit strategy (or strategies):

  1. The heap buffer overflow is used to overwrite the data in an object stored as the first field in the lfind_switch object which is allocated in the lfind_switch::switch_out function.
  2. The two fields that were overwritten in the object pointed to by the lfind_switch object are used as indices in lfind_switch::switch_in. Due to no bounds checking on these indices, another out-of-bounds write can occur.
  3. The out-of-bounds write in step 2 performs an 'or' operation on the field in the VMM_context_t struct (the virtual memory manager within Windows Defender) that stores the length of a table tracking the virtually mapped pages. This field usually equals the number of pages mapped * 2. By performing the 'or' operations, the value in that field is increased (for example, from 0x0000000C to 0x0003030C). The increased value allows for an additional out-of-bounds read & write, used to modify the memory management struct and gain arbitrary r/w.

Exploit flow:

The exploit uses "primitive bootstrapping" to turn the original buffer overflow into two additional out-of-bounds writes and ultimately gain arbitrary read/write.

Known cases of the same exploit flow: Unknown.

Part of an exploit chain? Unknown.

The Next Steps

Variant analysis

Areas/approach for variant analysis (and why):

  • Review ASProtect unpacker for additional parsing bugs.
  • Review and/or fuzz other unpacking code for parsing and memory issues.

Found variants: N/A

Structural improvements

What are structural improvements such as ways to kill the bug class, prevent the introduction of this vulnerability, mitigate the exploit flow, make this type of vulnerability harder to exploit, etc.?

Ideas to kill the bug class:

  • Building mpengine.dll with ASAN enabled should allow for this bug class to be caught.
  • Open sourcing unpackers could allow more folks to find issues in this code, which could potentially detect issues like this more readily.

Ideas to mitigate the exploit flow:

  • Add bounds checking anywhere indices are used (see the sketch below). For example, if there had been bounds checking on the indices used in lfind_switch::switch_in, it would have prevented the second out-of-bounds write that allowed this exploit to modify the VMM_context_t structure.
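
As a rough illustration of that idea (all names here are invented, not Defender internals), the pattern is to validate any attacker-influenced index against the table's element count before it becomes a write offset:

/* Illustrative only: reject out-of-range indices before they are used for a write. */
int set_entry(unsigned int *table, unsigned int count, unsigned int index, unsigned int value)
{
    if (index >= count)     /* the kind of check missing in lfind_switch::switch_in */
        return -1;          /* fail closed instead of writing out of bounds */
    table[index] |= value;  /* the exploit abused an unchecked 'or' write of this shape */
    return 0;
}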

Other potential improvements:

It appears that by default the Windows Defender emulator runs outside of a sandbox. In 2018, an article announced that Windows Defender Antivirus can now run in a sandbox. The article states that when sandboxing is enabled, you will see a content process MsMpEngCp.exe running in addition to MsMpEng.exe. By default, on Windows 10 machines, I only see MsMpEng.exe running as SYSTEM. Sandboxing the anti-malware emulator by default would make this vulnerability more difficult to exploit, because a sandbox escape would then be required in addition to this vulnerability.

0-day detection methods

What are potential detection methods for similar 0-days? Meaning are there any ideas of how this exploit or similar exploits could be detected as a 0-day?

  • Detecting these types of 0-days will be difficult, because the sample simply drops a new file with the characteristics needed to trigger the vulnerability, such as a section table that includes the same virtual address twice. The exploit method also did not require anything that especially stood out.

Other References

Compiling C to WebAssembly without Emscripten

Original text by Surma

A compiler is just a part of Emscripten. What if we stripped away all the bells and whistles and used just the compiler?

Emscripten is a compiler toolchain for C/C++ targeting WebAssembly. But it does so much more than just compiling. Emscripten's goal is to be a drop-in replacement for your off-the-shelf C/C++ compiler and make code that was not written for the web run on the web. To achieve this, Emscripten emulates an entire POSIX operating system for you. If your program uses fopen(), Emscripten will bundle the code to emulate a filesystem. If you use OpenGL, Emscripten will bundle code that creates a C-compatible GL context backed by WebGL. That requires a lot of work and also amounts to a lot of code that you need to send over the wire. What if we just… didn't?

The compiler in Emscripten’s toolchain, the program that translates C code to WebAssembly byte-code, is LLVM. LLVM is a modern, modular compiler framework. LLVM is modular in the sense that it never compiles one language straight to machine code. Instead, it has a front-end compiler that compiles your code to an intermediate representation (IR). This IR is called LLVM, as the IR is modeled around a Low-level Virtual Machine, hence the name of the project. The back-end compiler then takes care of translating the IR to the host’s machine code. The advantage of this strict separation is that adding support for a new architecture “merely” requires adding a new back-end compiler. WebAssembly, in that sense, is just one of many targets that LLVM supports and has been available behind a flag for a while. Since version 8 of LLVM the WebAssembly target is available by default. If you are on MacOS, you can install LLVM using homebrew:

$ brew install llvm
$ brew link --force llvm

To make sure you have WebAssembly support, we can go and check the back-end compiler:

$ llc --version
LLVM (http://llvm.org/):
  LLVM version 8.0.0
  Optimized build.
  Default target: x86_64-apple-darwin18.5.0
  Host CPU: skylake

  Registered Targets:
    # … OMG so many architectures …
    systemz    - SystemZ
    thumb      - Thumb
    thumbeb    - Thumb (big endian)
    wasm32     - WebAssembly 32-bit # 🎉🎉🎉
    wasm64     - WebAssembly 64-bit
    x86        - 32-bit X86: Pentium-Pro and above
    x86-64     - 64-bit X86: EM64T and AMD64
    xcore      - XCore

Seems like we are good to go!

Compiling C the hard way

Note: We’ll be looking at some low-level file formats like raw WebAssembly here. If you are struggling with that, that is ok. You don’t need to understand this entire blog post to make good use of WebAssembly. If you are here for the copy-pastables, look at the compiler invocation in the “Optimizing” section. But if you are interested, keep going! I also wrote an introduction to Raw Webassembly and WAT previously which covers the basics needed to understand this post.

Warning: I’ll bend over backwards here for a bit and use human-readable formats for every step of the process (as much as possible). Our program for this journey is going to be super simple to avoid edge cases and distractions:

// Filename: add.c
int add(int a, int b) {
  return a*a + b;
}

What a mind-boggling feat of engineering! Especially because it’s called “add” but doesn’t actually add. More importantly: This program makes no use of C’s standard library and only uses `int` as a type.

Turning C into LLVM IR

The first step is to turn our C program into LLVM IR. This is the job of the front-end compiler clang that got installed with LLVM:

clang \
  --target=wasm32 \ # Target WebAssembly
  -emit-llvm \ # Emit LLVM IR (instead of host machine code)
  -c \ # Only compile, no linking just yet
  -S \ # Emit human-readable assembly rather than binary
  add.c

And as a result we get add.ll containing the LLVM IR. I'm only showing this here for completeness' sake. When working with WebAssembly, or even with clang when developing C, you never come into contact with LLVM IR.

; ModuleID = 'add.c'
source_filename = "add.c"
target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
target triple = "wasm32"

; Function Attrs: norecurse nounwind readnone
define hidden i32 @add(i32, i32) local_unnamed_addr #0 {
  %3 = mul nsw i32 %0, %0
  %4 = add nsw i32 %3, %1
  ret i32 %4
}

attributes #0 = { norecurse nounwind readnone "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 8.0.0 (tags/RELEASE_800/final)"}

LLVM IR is full of additional meta data and annotations, allowing the back-end compiler to make more informed decisions when generating machine code.

Turning LLVM IR into object files

The next step is invoking LLVM's back-end compiler llc to turn the LLVM IR into an object file:

llc \
  -march=wasm32 \ # Target WebAssembly
  -filetype=obj \ # Output an object file
  add.ll

The output, add.o, is effectively a valid WebAssembly module and contains all the compiled code of our C file. However, most of the time you won’t be able to run object files as essential parts are still missing.

If we omitted -filetype=obj we'd get LLVM's assembly format for WebAssembly, which is human-readable and somewhat similar to WAT. However, the tool that can consume these files, llvm-mc, does not fully support this text format yet and often fails to consume the output of llc. So instead we'll disassemble the object files after the fact. Object files are target-specific and therefore need a target-specific tool to inspect them. In the case of WebAssembly, the tool is wasm-objdump, which is part of the WebAssembly Binary Toolkit, or wabt for short.

$ brew install wabt # in case you haven’t
$ wasm-objdump -x add.o

add.o:  file format wasm 0x1

Section Details:

Type[1]:
 - type[0] (i32, i32) -> i32
Import[3]:
 - memory[0] pages: initial=0 <- env.__linear_memory
 - table[0] elem_type=funcref init=0 max=0 <- env.__indirect_function_table
 - global[0] i32 mutable=1 <- env.__stack_pointer
Function[1]:
 - func[0] sig=0 <add>
Code[1]:
 - func[0] size=75 <add>
Custom:
 - name: "linking"
  - symbol table [count=2]
   - 0: F <add> func=0 binding=global vis=hidden
   - 1: G <env.__stack_pointer> global=0 undefined binding=global vis=default
Custom:
 - name: "reloc.CODE"
  - relocations for section: 3 (Code) [1]
   - R_WASM_GLOBAL_INDEX_LEB offset=0x000006(file=0x000080) symbol=1 <env.__stack_pointer>

The output shows that our add() function is in this module, but it also contains custom sections filled with metadata and, surprisingly, a couple of imports. In the next phase, called linking, the custom sections will be analyzed and removed and the imports will be resolved by the linker.

Linking

Traditionally, the linker's job is to assemble multiple object files into an executable. LLVM's linker is called lld, but it has to be invoked via one of the target-specific symlinks. For WebAssembly there is wasm-ld.

wasm-ld \
  --no-entry \ # We don’t have an entry function
  --export-all \ # Export everything (for now)
  -o add.wasm \
  add.o

The output is a 262-byte WebAssembly module.

Running it

Of course the most important part is to see that this actually works. As we did in the previous blog post, we can use a couple lines of inline JavaScript to load and run this WebAssembly module.

<!DOCTYPE html>

<script type="module">
  async function init() {
    const { instance } = await WebAssembly.instantiateStreaming(
      fetch("./add.wasm")
    );
    console.log(instance.exports.add(4, 1));
  }
  init();
</script>

If nothing went wrong, you should see a 17 in your DevTools console. We just successfully compiled C to WebAssembly without touching Emscripten. It's also worth noting that no glue code is required to set up and load the WebAssembly module.

Compiling C the slightly less hard way

The number of steps we currently have to take to get from C code to WebAssembly is a bit daunting. As I said, I was bending over backwards for educational purposes. Let's stop doing that, skip all the human-readable, intermediate formats, and use the C compiler as the Swiss Army knife it was designed to be:

clang \
  --target=wasm32 \
  -nostdlib \ # Don’t try and link against a standard library
  -Wl,--no-entry \ # Flags passed to the linker
  -Wl,--export-all \
  -o add.wasm \
  add.c

This will produce the same .wasm file as before, but with a single command.

Optimizing

Let’s take a look at the WAT of our WebAssembly module by running wasm2wat:

(module
  (type (;0;) (func))
  (type (;1;) (func (param i32 i32) (result i32)))
  (func $__wasm_call_ctors (type 0))
  (func $add (type 1) (param i32 i32) (result i32)
    (local i32 i32 i32 i32 i32 i32 i32 i32)
    global.get 0
    local.set 2
    i32.const 16
    local.set 3
    local.get 2
    local.get 3
    i32.sub
    local.set 4
    local.get 4
    local.get 0
    i32.store offset=12
    local.get 4
    local.get 1
    i32.store offset=8
    local.get 4
    i32.load offset=12
    local.set 5
    local.get 4
    i32.load offset=12
    local.set 6
    local.get 5
    local.get 6
    i32.mul
    local.set 7
    local.get 4
    i32.load offset=8
    local.set 8
    local.get 7
    local.get 8
    i32.add
    local.set 9
    local.get 9
    return)
  (table (;0;) 1 1 anyfunc)
  (memory (;0;) 2)
  (global (;0;) (mut i32) (i32.const 66560))
  (global (;1;) i32 (i32.const 66560))
  (global (;2;) i32 (i32.const 1024))
  (global (;3;) i32 (i32.const 1024))
  (export "memory" (memory 0))
  (export "__wasm_call_ctors" (func $__wasm_call_ctors))
  (export "__heap_base" (global 1))
  (export "__data_end" (global 2))
  (export "__dso_handle" (global 3))
  (export "add" (func $add)))

Wowza, that's a lot of WAT. To my surprise, the module uses memory (indicated by the i32.load and i32.store operations), 8 local variables and a couple of globals. If you think you'd be able to write a shorter version by hand, you'd probably be right. The reason this program is so big is that we didn't have any optimizations enabled. Let's change that:

 clang \
   --target=wasm32 \
+  -O3 \ # Aggressive optimizations
+  -flto \ # Add metadata for link-time optimizations
   -nostdlib \
   -Wl,--no-entry \
   -Wl,--export-all \
+  -Wl,--lto-O3 \ # Aggressive link-time optimizations
   -o add.wasm \
   add.c

Note: Technically, link-time optimizations don’t bring us any gains here as we are only linking a single file. In bigger projects, LTO will help you keep your file size down.

After running the commands above, our .wasm file went down from 262 bytes to 197 bytes and the WAT is much easier on the eye, too:

(module
  (type (;0;) (func))
  (type (;1;) (func (param i32 i32) (result i32)))
  (func $__wasm_call_ctors (type 0))
  (func $add (type 1) (param i32 i32) (result i32)
    local.get 0
    local.get 0
    i32.mul
    local.get 1
    i32.add)
  (table (;0;) 1 1 anyfunc)
  (memory (;0;) 2)
  (global (;0;) (mut i32) (i32.const 66560))
  (global (;1;) i32 (i32.const 66560))
  (global (;2;) i32 (i32.const 1024))
  (global (;3;) i32 (i32.const 1024))
  (export "memory" (memory 0))
  (export "__wasm_call_ctors" (func $__wasm_call_ctors))
  (export "__heap_base" (global 1))
  (export "__data_end" (global 2))
  (export "__dso_handle" (global 3))
  (export "add" (func $add)))

Calling into the standard library.

Now, C without the standard library (called "libc") is pretty rough. It seems logical to look into adding a libc as a next step, but I'm going to be honest: It's not going to be easy. I actually won't link against any libc in this blog post. There are a couple of libc implementations out there that we could grab, most notably glibc, musl and dietlibc. However, most of these libraries expect to run on a POSIX operating system, which implements a specific set of syscalls (calls to the system's kernel). Since we don't have a kernel interface in JavaScript, we'd have to implement these POSIX syscalls ourselves, probably by calling out to JavaScript. This is quite the task and I am not going to do that here. The good news is: This is exactly what Emscripten does for you.

Not all of libc's functions rely on syscalls, of course. Functions like strlen(), sin() or even memset() are implemented in plain C. That means you could use these functions or even just copy/paste their implementation from one of the libraries above.
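
For example, a freestanding memset() needs no syscalls at all. The sketch below is a simplified, unoptimized version (real libc implementations are heavily tuned, so copying from musl is still the better option for production use):

#include <stddef.h>

// Byte-by-byte memset: tiny and syscall-free, but much slower than the
// vectorized versions shipped in glibc or musl.
void *memset(void *dest, int c, size_t n) {
  unsigned char *p = dest;
  while (n--) {
    *p++ = (unsigned char)c;
  }
  return dest;
}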

Dynamic memory

With no libc at hand, fundamental C APIs like malloc() and free() are not available. In our unoptimized WAT above we have seen that the compiler will make use of memory if necessary. That means we can’t just use the memory however we like without risking corruption. We need to understand how that memory is used.

LLVM’s memory model

The way the WebAssembly memory is segmented is dictated by wasm-ld and might take C veterans a bit by surprise. Firstly, address 0 is technically valid in WebAssembly, but will often still be handled as an error case by a lot of C code. Secondly, the stack comes first and grows downwards (towards lower addresses) and the heap comes after and grows upwards. The reason for this is that WebAssembly memory can grow at runtime. That means there is no fixed end to place the stack or the heap at.

The layout that wasm-ld uses is the following:

A depiction of the memory layout created by wasm-ld.
The stack grows downwards and the heap grows upwards. The stack starts at __data_end, the heap starts at __heap_base. Because the stack is placed first, it is limited to a maximum size set at compile time, which is __heap_base - __data_end.

If we look back at the globals section in our WAT we can find these symbols defined. __heap_base is 66560 and __data_end is 1024. This means that the stack can grow to a maximum of 64KiB, which is not a lot. Luckily, wasm-ld allows us to configure this value:

 clang \
   --target=wasm32 \
   -O3 \
   -flto \
   -nostdlib \
   -Wl,--no-entry \
   -Wl,--export-all \
   -Wl,--lto-O3 \
+  -Wl,-z,stack-size=$[8 * 1024 * 1024] \ # Set maximum stack size to 8MiB
   -o add.wasm \
   add.c

Building an allocator

We now know that the heap region starts at __heap_base and since there is no malloc() function, we know that the memory region from there on upwards is ours to control. We can place data in there however we like and don't have to fear corruption as the stack is growing the other way. Leaving the heap as a free-for-all can get hairy quickly, though, so usually some sort of dynamic memory management is needed. One option is to pull in a full malloc() implementation like Doug Lea's malloc implementation, which is used by Emscripten today. There are also a couple of smaller implementations with different tradeoffs.

But why don’t we write our own malloc()? We are in this deep, we might as well. One of the simplest allocators is a bump allocator. The advantages: It’s super fast, extremely small and simple to implement. The downside: You can’t free memory. While this seems incredibly useless at first sight, I have encountered use-cases while working on Squoosh where this would have been an excellent choice. The concept of a bump allocator is that we store the start address of unused memory as a global. If the program requests n bytes of memory, we advance that marker by n and return the previous value:

extern unsigned char __heap_base;

unsigned int bump_pointer = (unsigned int)&__heap_base;
void* malloc(int n) {
  unsigned int r = bump_pointer;
  bump_pointer += n;
  return (void *)r;
}

void free(void* p) {
  // lol
}

The globals we saw in the WAT are actually defined by wasm-ld, which means we can access them from our C code as normal variables if we declare them as extern. We just wrote our own malloc() in, like, 5 lines of C 😱

Note: Our bump allocator is not fully compatible with C’s malloc(). For example, we don’t make any alignment guarantees. But it’s good enough and it works, so 🤷‍♂️.
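
If alignment matters for your use case, the allocator only needs one extra line. Here is a variant of the malloc() above that rounds every allocation up to an 8-byte boundary (still a sketch, still no real free()):

extern unsigned char __heap_base;

unsigned int bump_pointer = (unsigned int)&__heap_base;

void* malloc(int n) {
  // Round the bump pointer up to the next multiple of 8 so callers get
  // 8-byte-aligned allocations, then reserve n bytes.
  unsigned int r = (bump_pointer + 7) & ~7u;
  bump_pointer = r + n;
  return (void *)r;
}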

Using dynamic memory

To prove that this actually works, let’s build a C function that takes an arbitrary-sized array of numbers and calculates the sum. Not very exciting, but it does force us to use dynamic memory, as we don’t know the size of the array at build time:

int sum(int a[], int len) {
  int sum = 0;
  for(int i = 0; i < len; i++) {
    sum += a[i];
  }
  return sum;
}

The sum() function is hopefully straightforward. The more interesting question is how we can pass an array from JavaScript to WebAssembly — after all, WebAssembly only understands numbers. The general idea is to use malloc() from JavaScript to allocate a chunk of memory, copy the values into that chunk and pass the address (a number!) of where the array is located:

<!DOCTYPE html>

<script type="module">
  async function init() {
    const { instance } = await WebAssembly.instantiateStreaming(
      fetch("./add.wasm")
    );

    const jsArray = [1, 2, 3, 4, 5];
    // Allocate memory for 5 32-bit integers
    // and get the starting address back.
    const cArrayPointer = instance.exports.malloc(jsArray.length * 4);
    // Turn that sequence of 32-bit integers
    // into a Uint32Array, starting at that address.
    const cArray = new Uint32Array(
      instance.exports.memory.buffer,
      cArrayPointer,
      jsArray.length
    );
    // Copy the values from JS to C.
    cArray.set(jsArray);
    // Run the function, passing the starting address and length.
    console.log(instance.exports.sum(cArrayPointer, cArray.length));
  }
  init();
</script>

When running this you should see a very happy 15 in the DevTools console, which is indeed the sum of all the numbers from 1 to 5.

You made it to the end. Congratulations! Again, if you feel a bit overwhelmed, that’s okay: This is not required reading. You do not need to understand all of this to be a good web developer or even to make good use of WebAssembly. But I did want to share this journey with you as it really makes you appreciate all the work that a project like Emscripten does for you. At the same time, it gave me an understanding of how small purely computational WebAssembly modules can be. The Wasm module for the array summing ended up at just 230 bytes, including an allocator for dynamic memory. Compiling the same code with Emscripten would yield 100 bytes of WebAssembly accompanied by 11K of JavaScript glue code. It took a lot of work to get there, but there might be situations where it is worth it.

Zero-day vulnerability in Desktop Window Manager (CVE-2021-28310) used in the wild

Original text by Costin Raiu Boris Larin Brian Bartholomew

While analyzing the CVE-2021-1732 exploit originally discovered by the DBAPPSecurity Threat Intelligence Center and used by the BITTER APT group, we discovered another zero-day exploit we believe is linked to the same actor. We reported this new exploit to Microsoft in February and after confirmation that it is indeed a zero-day, it received the designation CVE-2021-28310. Microsoft released a patch to this vulnerability as a part of its April security updates.

We believe this exploit is used in the wild, potentially by several threat actors. It is an escalation of privilege (EoP) exploit that is likely used together with other browser exploits to escape sandboxes or get system privileges for further access. Unfortunately, we weren’t able to capture a full chain, so we don’t know if the exploit is used with another browser zero-day, or coupled with known, patched vulnerabilities.

The exploit was initially identified by our advanced exploit prevention technology and related detection records. In fact, over the past few years, we have built a multitude of exploit protection technologies into our products that have detected several zero-days, proving their effectiveness time and again. We will continue to improve defenses for our users by enhancing technologies and working with third-party vendors to patch vulnerabilities, making the internet more secure for everyone. In this blog we provide a technical analysis of the vulnerability and how the bad guys exploited it. More information about BITTER APT and IOCs are available to customers of the Kaspersky Intelligence Reporting service. Contact: intelreports@kaspersky.com.

Technical details

CVE-2021-28310 is an out-of-bounds (OOB) write vulnerability in dwmcore.dll, which is part of Desktop Window Manager (dwm.exe). Due to the lack of bounds checking, attackers are able to create a situation that allows them to write controlled data at a controlled offset using DirectComposition API. DirectComposition is a Windows component that was introduced in Windows 8 to enable bitmap composition with transforms, effects and animations, with support for bitmaps of different sources (GDI, DirectX, etc.). We’ve already published a blogpost about in-the-wild zero-days abusing DirectComposition API. DirectComposition API is implemented by the win32kbase.sys driver and the names of all related syscalls start with the string “NtDComposition”.

DirectComposition syscalls in the win32kbase.sys driver

For exploitation only three syscalls are required: NtDCompositionCreateChannel, NtDCompositionProcessChannelBatchBuffer and NtDCompositionCommitChannel. The NtDCompositionCreateChannel syscall initiates a channel that can be used together with the NtDCompositionProcessChannelBatchBuffer syscall to send multiple DirectComposition commands in one go for processing by the kernel in a batch mode. For this to work, commands need to be written sequentially in a special buffer mapped by NtDCompositionCreateChannel syscall. Each command has its own format with a variable length and list of parameters.

enum DCOMPOSITION_COMMAND_ID
{
    ProcessCommandBufferIterator,
    CreateResource,
    OpenSharedResource,
    ReleaseResource,
    GetAnimationTime,
    CapturePointer,
    OpenSharedResourceHandle,
    SetResourceCallbackId,
    SetResourceIntegerProperty,
    SetResourceFloatProperty,
    SetResourceHandleProperty,
    SetResourceHandleArrayProperty,
    SetResourceBufferProperty,
    SetResourceReferenceProperty,
    SetResourceReferenceArrayProperty,
    SetResourceAnimationProperty,
    SetResourceDeletedNotificationTag,
    AddVisualChild,
    RedirectMouseToHwnd,
    SetVisualInputSink,
    RemoveVisualChild
};

List of command IDs supported by the function DirectComposition::CApplicationChannel::ProcessCommandBufferIterator

While these commands are processed by the kernel, they are also serialized into another format and passed by the Local Procedure Call (LPC) protocol to the Desktop Window Manager (dwm.exe) process for rendering to the screen. This procedure could be initiated by the third syscall – NtDCompositionCommitChannel.

To trigger the vulnerability the discovered exploit uses three types of commands: CreateResource, ReleaseResource and SetResourceBufferProperty.

void CreateResourceCmd(int resourceId)
{
    DWORD *buf = (DWORD *)((PUCHAR)pMappedAddress + BatchLength);
    *buf = CreateResource;
    buf[1] = resourceId;
    buf[2] = PropertySet; // MIL_RESOURCE_TYPE
    buf[3] = FALSE;
    BatchLength += 16;
}

void ReleaseResourceCmd(int resourceId)
{
    DWORD *buf = (DWORD *)((PUCHAR)pMappedAddress + BatchLength);
    *buf = ReleaseResource;
    buf[1] = resourceId;
    BatchLength += 8;
}

void SetPropertyCmd(int resourceId, bool update, int propertyId, int storageOffset, int hidword, int lodword)
{
    DWORD *buf = (DWORD *)((PUCHAR)pMappedAddress + BatchLength);
    *buf = SetResourceBufferProperty;
    buf[1] = resourceId;
    buf[2] = update;
    buf[3] = 20;
    buf[4] = propertyId;
    buf[5] = storageOffset;
    buf[6] = _D2DVector2; // DCOMPOSITION_EXPRESSION_TYPE
    buf[7] = hidword;
    buf[8] = lodword;
    BatchLength += 36;
}

Format of commands used in exploitation

Let’s take a look at the function CPropertySet::ProcessSetPropertyValue in dwmcore.dll. This function is responsible for processing the SetResourceBufferProperty command. We are most interested in the code responsible for handling DCOMPOSITION_EXPRESSION_TYPE = D2DVector2.

int CPropertySet::ProcessSetPropertyValue(CPropertySet *this, …)
{
  …
 
  if (expression_type == _D2DVector2)
  {
    if (!update)
    {
      CPropertySet::AddProperty<D2DVector2>(this, propertyId, storageOffset, _D2DVector2, value);
    }
    else
    {
      if ( storageOffset != this->properties[propertyId]->offset & 0x1FFFFFFF )
      {
        goto fail;
      }
 
      CPropertySet::UpdateProperty<D2DVector2>(this, propertyId, _D2DVector2, value);
    }
  }
 
  …
}
 
int CPropertySet::AddProperty<D2DVector2>(CResource *this, unsigned int propertyId, int storageOffset, int type, _QWORD *value)
{
  int propertyIdAdded;
 
  int result = PropertySetStorage<DynArrayNoZero,PropertySetUserModeAllocator>::AddProperty<D2DVector2>(
     this->propertiesData,
     type,
     value,
     &propertyIdAdded);
  if ( result < 0 )
  {
    return result;
  }
 
  if ( propertyId != propertyIdAdded || storageOffset != this->properties[propertyId]->offset & 0x1FFFFFFF )
  {
    return 0x88980403;
  }
 
  result = CPropertySet::PropertyUpdated<D2DMatrix>(this, propertyId);
  if ( result < 0 )
  {
    return result;
  }
 
  return 0;
}
 
int CPropertySet::UpdateProperty<D2DVector2>(CResource *this, unsigned int propertyId, int type, _QWORD *value)
{
  if ( this->properties[propertyId]->type == type )
  {
    *(_QWORD *)(this->propertiesData + (this->properties[propertyId]->offset & 0x1FFFFFFF)) = *value;
 
    int result = CPropertySet::PropertyUpdated<D2DMatrix>(this, propertyId);
    if ( result < 0 )
    {
      return result;
    }
 
    return 0;
  }
  else
  {
    return 0x80070057;
  }
}

Processing of the SetResourceBufferProperty (D2DVector2) command in dwmcore.dll

For the SetResourceBufferProperty command with the expression type set to D2DVector2, the function CPropertySet::ProcessSetPropertyValue(…) would either call CPropertySet::AddProperty<D2DVector2>(…) or CPropertySet::UpdateProperty<D2DVector2>(…) depending on whether the update flag is set in the command. The first thing that catches the eye is the way the new property is added in the CPropertySet::AddProperty<D2DVector2>(…) function. You can see that it adds a new property to the resource, but it only checks whether the propertyId and storageOffset of the new property are equal to the provided values after the new property is added, and returns an error if that's not the case. Checking something after a job is done is bad coding practice and can result in vulnerabilities. However, the real issue can be found in the CPropertySet::UpdateProperty<D2DVector2>(…) function. No check takes place to ensure that the provided propertyId is less than the count of properties added to the resource. As a result, an attacker can use this function to perform an OOB write past the propertiesData buffer if it manages to bypass two additional checks on data inside the properties array.

(1) storageOffset == this->properties[propertyId]->offset & 0x1FFFFFFF
(2) this->properties[propertyId]->type == type

Conditions which need to be met for exploitation in dwmcore.dll

These checks could be bypassed if an attacker is able to allocate and release objects in the dwm.exe process to groom heap into the desired state and spray memory at specific locations with fake properties. The discovered exploit manages to do this using the CreateResource, ReleaseResource and SetResourceBufferProperty commands.

At the time of writing, we still hadn’t analyzed the updated binaries that are fixing this vulnerability, but to exclude the possibility of other variants for this vulnerability Microsoft would need to check the count of properties for other expression types as well.

Even with the above issues in dwmcore.dll, if the desired memory state is achieved to bypass the previously mentioned checks and a batch of commands are issued to trigger the vulnerability, it still won’t be triggered because there is one more thing preventing it from happening.

As mentioned above, commands are first processed by the kernel and only after that are they sent to Desktop Window Manager (dwm.exe). This means that if you try to send a command with an invalid propertyId, NtDCompositionProcessChannelBatchBuffer syscall will return an error and the command will not be passed to the dwm.exe process. SetResourceBufferProperty commands with expression type set to D2DVector2 are processed in the win32kbase.sys driver with the functions DirectComposition::CPropertySetMarshaler::AddProperty<D2DVector2>(…) and DirectComposition::CPropertySetMarshaler::UpdateProperty<D2DVector2>(…), which are very similar to those present in dwmcore.dll (it’s quite likely they were copy-pasted). However, the kernel version of the UpdateProperty<D2DVector2> function has one notable difference – it actually checks the count of properties added to the resource.

DirectComposition::CPropertySetMarshaler::UpdateProperty<D2DVector2>(…) in win32kbase.sys

The check for propertiesCount in the kernel mode version of the UpdateProperty<D2DVector2> function prevents further processing of a malicious command by its user mode twin and mitigates the vulnerability, but this is where DirectComposition::CPropertySetMarshaler::AddProperty<D2DVector2>(…) comes in to play. The kernel version of the AddProperty<D2DVector2> function works exactly like its user mode variant and it also applies the same behavior of checking property after it has already been added and returns an error if propertyId and storageOffset of the created property do not match the provided values. Because of this, it’s possible to use the AddProperty<D2DVector2> function to add a new property and force the function to return an error and cause inconsistency between the number of properties assigned to the same resource in kernel mode/user mode. The propertiesCount check in the kernel could be bypassed this way and malicious commands would be passed to Desktop Window Manager (dwm.exe).

Inconsistency between the number of properties assigned to the same resource in kernel mode/user mode could be a source of other vulnerabilities, so we recommend that Microsoft change the behavior of the AddProperty function and check properties before they are added.

The whole exploitation process for the discovered exploit is as follows:

  1. Create a large number of resources with properties of specific size to get heap into predictable state.
  2. Create additional resources with properties of specific size and content to spray memory at specific locations with fake properties.
  3. Release resources created at stage 2.
  4. Create additional resources with properties. These resources will be used to perform OOB writes.
  5. Make holes among resources created at stage 1.
  6. Create additional properties for resources created at stage 4. Their buffers are expected to be allocated at specific locations.
  7. Create “special” properties to cause inconsistency between the number of properties assigned to the same resource in kernel mode/user mode for resources created at stage 4.
  8. Use OOB write vulnerability to write shellcode, create an object and get code execution.
  9. Inject additional shellcode into another system process.

Kaspersky products detect this exploit with the verdicts:

  • HEUR:Exploit.Win32.Generic
  • HEUR:Trojan.Win32.Generic
  • PDM:Exploit.Win32.Generic

Process Herpaderping

Process Herpaderping proof of concept, tool, and technical deep dive. Process Herpaderping bypasses security products by obscuring the intentions of a process.

Original text by jxy-s

Process Herpaderping is a method of obscuring the intentions of a process by modifying the content on disk after the image has been mapped. This results in curious behavior by security products and the OS itself.

https://github.com/jxy-s/herpaderping

Summary

Generally, a security product takes action on process creation by registering a callback in the Windows Kernel (PsSetCreateProcessNotifyRoutineEx). At this point, a security product may inspect the file that was used to map the executable and determine if this process should be allowed to execute. This kernel callback is invoked when the initial thread is inserted, not when the process object is created.

Because of this, an actor can create and map a process, modify the content of the file, then create the initial thread. A product that does inspection at the creation callback would see the modified content. Additionally, some products use an on-write scanning approach which consists of monitoring for file writes. A familiar optimization here is recording that the file has been written to and deferring the actual inspection until IRP_MJ_CLEANUP occurs (e.g. the file handle is closed). Thus, an actor using a write -> map -> modify -> execute -> close workflow will subvert on-write scanning that solely relies on inspection at IRP_MJ_CLEANUP.

To abuse this convention, we first write a binary to a target file on disk. Then, we map an image of the target file and provide it to the OS to use for process creation. The OS kindly maps the original binary for us. Using the existing file handle, and before creating the initial thread, we modify the target file content to obscure or fake the file backing the image. Some time later, we create the initial thread to begin execution of the original binary. Finally, we will close the target file handle. Let’s walk through this step-by-step:

  1. Write target binary to disk, keeping the handle open. This is what will execute in memory.
  2. Map the file as an image section (NtCreateSection, SEC_IMAGE).
  3. Create the process object using the section handle (NtCreateProcessEx).
  4. Using the same target file handle, obscure the file on disk.
  5. Create the initial thread in the process (NtCreateThreadEx).
    • At this point the process creation callback in the kernel will fire. The contents on disk do not match what was mapped. Inspection of the file at this point will result in incorrect attribution.
  6. Close the handle. IRP_MJ_CLEANUP will occur here.
    • Since we’ve hidden the contents of what is executing, inspection at this point will result in incorrect attribution.
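
A heavily abbreviated C sketch of that ordering is shown below. It assumes the Process Hacker Native API headers (phnt, credited at the end of this post) for the Nt* prototypes, and it deliberately omits everything the real tool has to do to get the process actually running (writing the payload, resolving the image entry point, building RTL_USER_PROCESS_PARAMETERS, error handling, and so on). It only illustrates where the overwrite happens relative to section, process and thread creation:

#include <phnt_windows.h>
#include <phnt.h>   // Process Hacker Native API headers (assumed available)

// Sketch only, not a working tool: payload writing, entry-point resolution and
// process-parameter setup are all omitted.
void HerpaderpSketch(HANDLE targetFile /* opened earlier with read/write access */)
{
    // 1. The source payload has already been written to targetFile; the handle stays open.

    // 2. Map the file as an image section.
    HANDLE section = NULL;
    NtCreateSection(&section, SECTION_ALL_ACCESS, NULL, NULL,
                    PAGE_READONLY, SEC_IMAGE, targetFile);

    // 3. Create the process object from the section. No thread exists yet, so the
    //    kernel process-creation callback does not fire here.
    HANDLE process = NULL;
    NtCreateProcessEx(&process, PROCESS_ALL_ACCESS, NULL,
                      NtCurrentProcess(), 0, section, NULL, NULL, 0);

    // 4. Using the same file handle, overwrite the file on disk. The image section
    //    cached in step 2 is what will execute, not these new bytes.
    //    (NtWriteFile/WriteFile with a pattern or replacement binary goes here.)

    // 5. Create the initial thread. PspCallProcessNotifyRoutines fires now, and any
    //    inspection of the backing file already sees the modified content.
    PVOID entryPoint = NULL; // would be resolved from the mapped image's PE headers
    HANDLE thread = NULL;
    NtCreateThreadEx(&thread, THREAD_ALL_ACCESS, NULL, process,
                     entryPoint, NULL, 0, 0, 0, 0, NULL);

    // 6. Close the file handle; IRP_MJ_CLEANUP-driven scanning also sees the
    //    modified content.
    NtClose(targetFile);
}
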
@startuml
hide empty description

[*] --> CreateFile
CreateFile --> FileHandle
FileHandle --> Write
FileHandle --> NtCreateSection
Write -[hidden]-> NtCreateSection
NtCreateSection --> SectionHandle
SectionHandle --> NtCreateProcessEx
FileHandle --> Modify
NtCreateProcessEx -[hidden]-> Modify
NtCreateProcessEx --> NtCreateThreadEx
Modify -[hidden]-> NtCreateThreadEx
NtCreateThreadEx --> [*]
FileHandle --> CloseFile
NtCreateThreadEx -[hidden]-> CloseFile
NtCreateThreadEx --> PspCallProcessNotifyRoutines
PspCallProcessNotifyRoutines -[hidden]-> [*]
CloseFile --> IRP_MJ_CLEANUP
IRP_MJ_CLEANUP -[hidden]-> [*]
PspCallProcessNotifyRoutines --> Inspect
PspCallProcessNotifyRoutines -[hidden]-> CloseFile 
IRP_MJ_CLEANUP --> Inspect
Inspect -[hidden]-> [*]

CreateFile : Create target file, keep handle open.
Write : Write source payload into target file.
Modify : Obscure the file on disk.
NtCreateSection : Create section using file handle.
NtCreateProcessEx : Image section for process is mapped and cached in file object.
NtCreateThreadEx : The cached section is used.
NtCreateThreadEx : Process notify routines fire in kernel.
Inspect : The contents on disk do not match what was executed. 
Inspect : Inspection of the file at this point will result in incorrect attribution.
@enduml

Behavior

You’ll see in the demo below, CMD.exe is used as the execution target. The first run overwrites the bytes on disk with a pattern. The second run overwrites CMD.exe with ProcessHacker.exe. The Herpaderping tool fixes up the binary to look as close to ProcessHacker.exe as possible, even retaining the original signature. Note the multiple executions of the same binary and how the process looks to the user compared to what is in the file on disk.

Diving Deeper

We’ve observed the behavior and some of this may be surprising. Let’s try to explain this behavior.

Technical Deep Dive

Background and Motivation

When designing products for securing Windows platforms, many engineers in this field (myself included) have fallen back on preconceived notions with respect to how the OS will handle data. In this scenario, some might expect the file on disk to remain "locked" when the process is created. You can't delete the file. You can't write to it. But you can rename it. Seen here, under the right conditions, you can in fact write to it. Remain vigilant on your assumptions, always question them, and do your research.

The motivation for this research came about when discovering how to do analysis when a file is written. With prior background researching process Hollowing and Doppelganging, I had theorized this might be possible. The goal is to provide better security. You cannot create a better lock without first understanding how to break the old one.

Similar Techniques

Herpaderping is similar to Hollowing and Doppelganging; however, there are some key differences:

Process Hollowing

Process Hollowing involves modifying the mapped section before execution begins, which abstractly looks like: map -> modify section -> execute. This workflow results in the intended execution flow of the Hollowed process diverging into unintended code. Doppelganging might be considered a form of Hollowing. However, Hollowing, in my opinion, is closer to injection in that Hollowing usually involves an explicit write to the already mapped code. This differs from Herpaderping where there are no modified sections.

Process Doppelganging

Process Doppelganging is closer to Herpaderping. Doppelganging abuses transacted file operations and generally involves these steps: transact -> write -> map -> rollback -> execute. In this workflow, the OS will create the image section and account for transactions, so the cached image section ends up being what you wrote to the transaction. The OS has patched this technique. Well, they patched the crash it caused. Maybe they consider this a "legal" use of a transaction. Thankfully, Windows Defender does catch the Doppelganging technique. Doppelganging differs from Herpaderping in that Herpaderping does not rely on transacted file operations. And Defender doesn't catch Herpaderping.

Comparison

For reference, the generalized techniques:

Type            Technique
Hollowing       map -> modify section -> execute
Doppelganging   transact -> write -> map -> rollback -> execute
Herpaderping    write -> map -> modify -> execute -> close

We can see the differences laid out here. While Herpaderping is arguably noisier than Doppelganging, in that the malicious bits do hit the disk, we’ve seen that security products are still incapable of detecting Herpaderping.

Possible Solution

There is not a clear fix here. It seems reasonable that preventing an image section from being mapped/cached when there is write access to the file should close the hole. However, that may or may not be a practical solution.

Another option might be to flush the changes to the file through to the cached image section if it hasn't yet been mapped into a process. However, since the map into the new process occurs at NtCreateProcess, that is probably not a viable solution.

From a detection standpoint, there is not a great way to identify the actual bits that got mapped. Inspection at IRP_MJ_CLEANUP or in a callback registered at PsSetCreateProcessNotifyRoutineEx results in incorrect attribution since the bits on disk have been changed; you would have to rebuild the file from the section that got created. It's worth pointing out that there is a new callback in Windows 10 you may register for, PsSetCreateProcessNotifyRoutineEx2; however, it suffers from the same problem as the previous callback, being called out when the initial thread is executed, not when the process object is created. Microsoft did add PsSetCreateThreadNotifyRoutineEx, which is called out when the initial thread is inserted if registered with PsCreateThreadNotifyNonSystem, as opposed to when it is about to begin execution (as the old callback did). Extending PSCREATEPROCESSNOTIFYTYPE to be called out when the process object is created won't help either; we've seen in the Diving Deeper section that the image section object is cached on the NtCreateSection call, not NtCreateProcess.
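
To make the timing problem concrete, here is a minimal sketch of a driver registering the process-creation callback (standard WDK API; unload logic omitted, and note the driver must be linked with /INTEGRITYCHECK for the registration to succeed). Whatever this routine reads from the backing file is read after the overwrite in step 4 above, so the bytes no longer match what was mapped:

#include <ntddk.h>

// Fires when the initial thread is inserted, i.e. after the file on disk may
// already have been overwritten by the herpaderping workflow.
static VOID CreateProcessNotifyEx(
    PEPROCESS Process,
    HANDLE ProcessId,
    PPS_CREATE_NOTIFY_INFO CreateInfo)
{
    UNREFERENCED_PARAMETER(Process);

    if (CreateInfo == NULL) {
        return; // process exit notification, nothing to inspect
    }

    // CreateInfo->FileObject and CreateInfo->ImageFileName refer to the backing
    // file, not to the cached image section that will actually execute, so any
    // scan done here attributes the process to the modified on-disk bytes.
    if (CreateInfo->ImageFileName != NULL) {
        DbgPrint("Process %p backed by %wZ\n", ProcessId, CreateInfo->ImageFileName);
    }
}

NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
{
    UNREFERENCED_PARAMETER(DriverObject);
    UNREFERENCED_PARAMETER(RegistryPath);
    return PsSetCreateProcessNotifyRoutineEx(CreateProcessNotifyEx, FALSE);
}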

We can’t easily identify what got executed. We’re left with trying to detect the exploitive behavior by the actor, I’ll leave discovery of the behavior indicators as an exercise for the reader.

Known Affected Platforms

Below is a list of products and Windows OSes that have been tested as of (8/31/2020). Tests were carried out with a known malicious binary.

Operating System                        Version            Vulnerable
Windows 7 Enterprise x86                6.1.7601           Yes
Windows 10 Pro x64                      10.0.18363.900     Yes
Windows 10 Pro Insider Preview x64      10.0.20170.1000    Yes
Windows 10 Pro Insider Preview x64      10.0.20201.1000    Yes

Security Product                        Version            Vulnerable
Windows Defender AntiMalware Client     4.18.2006.10       Yes
Windows Defender Engine                 1.1.17200.2        Yes
Windows Defender Antivirus              1.319.1127.0       Yes
Windows Defender Antispyware            1.319.1127.0       Yes
Windows Defender AntiMalware Client     4.18.2007.6        Yes
Windows Defender Engine                 1.1.17300.2        Yes
Windows Defender Antivirus              1.319.1676.0       Yes
Windows Defender Antispyware            1.319.1676.0       Yes
Windows Defender AntiMalware Client     4.18.2007.8        Yes
Windows Defender Engine                 1.1.17400.5        Yes
Windows Defender Antivirus              1.323.267.0        Yes
Windows Defender Antispyware            1.323.267.0        Yes

Responsible Disclosure

This vulnerability was disclosed to the Microsoft Security Response Center (MSRC) on 7/17/2020 and a case was opened by MSRC on 7/22/2020. MSRC concluded their investigation on 8/25/2020 and determined the findings are valid but do not meet their bar for immediate servicing. At this time their case is closed, without resolution, and is marked for future review, with no timeline.

We disagree on the severity of this bug; this was communicated to MSRC on 8/27/2020.

  1. There are similar vulnerabilities in this class (Hollowing and Doppelganging).
  2. The vulnerability is shown to defeat security features inherent to the OS (Windows Defender).
  3. The vulnerability allows an actor to gain execution of arbitrary code.
  4. The user is not notified of the execution of unintended code.
  5. The process information presented to the user does not accurately reflect what is executing.
  6. Facilities to accurately identify the process are either unintuitive or incorrect, even from the kernel.

Source

This repo contains a tool for exercising the Herpaderping method of process obfuscation. Usage is as follows:

Process Herpaderping Tool - Copyright (c) Johnny Shaw
ProcessHerpaderping.exe SourceFile TargetFile [ReplacedWith] [Options...]
Usage:
  SourceFile               Source file to execute.
  TargetFile               Target file to execute the source from.
  ReplacedWith             File to replace the target with. Optional,
                           default overwrites the binary with a pattern.
  -h,--help                Prints tool usage.
  -d,--do-not-wait         Does not wait for spawned process to exit,
                           default waits.
  -l,--logging-mask number Specifies the logging mask, defaults to full
                           logging.
                               0x1   Successes
                               0x2   Informational
                               0x4   Warnings
                               0x8   Errors
                               0x10  Contextual
  -q,--quiet               Runs quietly, overrides logging mask, no title.
  -r,--random-obfuscation  Uses random bytes rather than a pattern for
                           file obfuscation.
  -e,--exclusive           Target file is created with exclusive access and
                           the handle is held open as long as possible.
                           Without this option the handle has full share
                           access and is closed as soon as possible.
  -u,--do-not-flush-file   Does not flush file after overwrite.
  -c,--close-file-early    Closes file before thread creation (before the
                           process notify callback fires in the kernel).
                           Not valid with "--exclusive" option.
  -k,--kill                Terminates the spawned process regardless of
                           success or failure, this is useful in some
                           automation environments. Forces "--do-not-wait"
                           option.

Cloning and Building

The repo uses submodules; after cloning, be sure to init and update the submodules. Project files are targeted to Visual Studio 2019.

git clone https://github.com/jxy-s/herpaderping.git
cd .\herpaderping\
git submodule update --init --recursive
MSBuild .\herpaderping.sln

Credits

The following are used without modification. Credits to their authors.

  • Windows Implementation Libraries (WIL)
    A header-only C++ library created to make life easier for developers on Windows through readable type-safe C++ interfaces for common Windows coding patterns.
  • Process Hacker Native API Headers
    Collection of Native API header files. Gathered from Microsoft header files and symbol files, as well as a lot of reverse engineering and guessing.