The world’s most popular torrent client, uTorrent, contained a security vulnerability, later assigned CVE-2020-8437, that could be exploited by a remote attacker to crash and corrupt any uTorrent instance connected to the internet. As white-hat hackers, my friend (who wishes to remain anonymous) and I reported this vulnerability as soon as we found it, and it was quickly fixed. Now that users have had ample time to update, it’s safe to disclose an overview of the vulnerability and how to exploit it.
Torrent Protocol — What You Need To Know
Torrent downloads utilize simultaneous connections to multiple peers (other people downloading the same file), creating a decentralized download network that benefits the collective peer group. Each peer can upload and download data to and from any other peer, eliminating any single point of failure or bandwidth bottleneck, resulting in a faster and more stable download for all peers. Peers communicate with each other using the BitTorrent protocol, which is initiated with a handshake. We’re going to focus on this handshake and the packet following it because that’s all that’s needed for exploiting the CVE-2020-8437 uTorrent vulnerability. Surprisingly convenient. 😊
BitTorrent Handshake
The handshake packet is the first packet the initiating peer sends to another peer. It has 5 fields in a strictly structured format:
Handshake Packet Format
Name Length — 1 byte unsigned int — The length of the string that follows.
Protocol Name — variable length string — The protocol the initiating peer supports. This field is for future compatibility, but is set to “BitTorrent protocol” in all major implementations.
Reserved Bytes — 8 byte bitfield — Each bit represents a protocol extension (functionality) that was not part of the original BitTorrent specification. Modern torrent clients utilize this field to communicate their advanced capabilities, which are then used for an optimized download. Today, the grand majority of torrent clients support the “Extension Protocol” extension (confusing name, I know), the 20th bit in this bit field, that provides a foundation for exchanging information about other extensions. Yes, you understood that correctly: there is an extension bit that allows for even more extensions. I wonder what such a complicated protocol can lead to 😉.
Info Hash — 20 byte SHA1 — Used to identify the torrent the initiating peer wants to download, this is the hash of all the information needed to download the torrent (torrent name, hashes of file sections, file section size, file section count, etc…).
Peer ID — 20 byte buffer — A self-designated random ID the initiating peer gives itself.
Figure 1. BitTorrent handshake packet #1 as seen in Wireshark
After a peer receives a handshake packet, it replies with its own handshake packet in the exact same format.
If both peers set the Extension Protocol bit in the Reserved Bytes field, the peers then exchange further information about extensions, using an “Extended” message handshake.
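The five handshake fields can be sketched in a few lines of Python. This is a minimal illustration, not code from any torrent client: the function names are made up for this example, and the location of the Extension Protocol flag (mask 0x10 in the sixth reserved byte) follows the published Extension Protocol specification (BEP 10).

```python
import struct

PROTOCOL_NAME = b"BitTorrent protocol"
# Extension Protocol flag: mask 0x10 in reserved byte index 5 (per BEP 10).
EXTENSION_PROTOCOL_MASK = 0x10


def build_handshake(info_hash: bytes, peer_id: bytes, extended: bool = True) -> bytes:
    """Serialize the five handshake fields in wire order (illustrative helper)."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be 20 bytes each")
    reserved = bytearray(8)
    if extended:
        reserved[5] |= EXTENSION_PROTOCOL_MASK
    return (
        struct.pack("B", len(PROTOCOL_NAME))  # Name Length
        + PROTOCOL_NAME                       # Protocol Name
        + bytes(reserved)                     # Reserved Bytes
        + info_hash                           # Info Hash
        + peer_id                             # Peer ID
    )


def parse_handshake(packet: bytes) -> dict:
    """Split a handshake packet back into its fields (illustrative helper)."""
    name_len = packet[0]
    if len(packet) != 1 + name_len + 8 + 20 + 20:
        raise ValueError("unexpected handshake length")
    offset = 1
    protocol = packet[offset:offset + name_len]
    offset += name_len
    reserved = packet[offset:offset + 8]
    offset += 8
    info_hash = packet[offset:offset + 20]
    offset += 20
    peer_id = packet[offset:offset + 20]
    return {
        "protocol": protocol,
        "reserved": reserved,
        "info_hash": info_hash,
        "peer_id": peer_id,
        "supports_extended": bool(reserved[5] & EXTENSION_PROTOCOL_MASK),
    }
```

With the standard protocol name, the packet is always 68 bytes long, which is why the handshake is described as practically statically sized.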
BitTorrent Extended Message Handshake
The Extended Message Handshake is used by peers to share the exact additional extensions they support and other supplemental information. Unlike the BitTorrent handshake packet we previously examined, which was (practically) statically sized, the Extended Message’s Handshake packet can dynamically grow, allowing the packet to transport a multitude of extension data.
Extended Message Handshake Packet Format
Length — 4 bytes unsigned int — the length of the entire message that follows
BitTorrent Message Type — 1 byte — The BitTorrent message ID of this packet. This is set to 20 (0x14) for Extended Messages
BitTorrent Extended Message Type — 1 byte — The Extended Message ID of this extended message. This is set to 0 for an extension exchange.
M — dynamically sized — a bencoded dictionary of the supplemental extensions supported.
Figure 2. Extended Message Extension Exchange
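As a rough sketch of the framing listed above, the following Python wraps an already-encoded dictionary payload in the length and type fields. The helper names are hypothetical, the payload is treated as opaque bytes, and the big-endian length prefix is taken from the BitTorrent wire protocol rather than from this article.

```python
import struct

BT_MSG_EXTENDED = 20    # BitTorrent message ID for Extended Messages (0x14)
EXT_MSG_HANDSHAKE = 0   # Extended Message ID for the extension exchange


def frame_extended_handshake(payload: bytes) -> bytes:
    """Prefix a bencoded payload with the length and the two type bytes."""
    body = struct.pack("BB", BT_MSG_EXTENDED, EXT_MSG_HANDSHAKE) + payload
    # The length field counts everything that follows it.
    return struct.pack(">I", len(body)) + body


def split_extended_handshake(packet: bytes):
    """Return (message type, extended message type, payload)."""
    (length,) = struct.unpack(">I", packet[:4])
    body = packet[4:4 + length]
    if length < 2 or len(body) != length:
        raise ValueError("truncated or malformed extended message")
    return body[0], body[1], body[2:]
```

Because the payload is a bencoded dictionary of arbitrary size, the length prefix is the only thing that tells the receiver where the message ends.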
Bencoded Dictionaries
The M field is a bencoded dictionary, which is a format similar to a Python dictionary: string-type keys are associated with values. However, in contrast to Python dictionaries, bencoded dictionaries include the length of each string before its value, and “d” and “e” are used instead of “{” and “}” respectively. Below is an example of a Python dictionary and its corresponding bencoded dictionary encoding (newlines and spaces inserted in both formats for clarity).
Figure 3. A Bencoded Dictionary Is Very Similar To A Python Dictionary
Additionally, just as a Python dictionary can contain a separate dictionary inside itself (and another dictionary inside that one, etc…), so too can a bencoded dictionary.
Figure 4. Both Formats (But We Only Care About Bencoded Dictionaries) can contain more dictionaries inside themselves
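To make the mapping between the two formats concrete, here is a small encoder sketch. It is an illustration written for this article, not the encoder used by any torrent client, and the example dictionary is made up. It covers the four bencoding types (integers, byte strings, lists, and dictionaries).

```python
def bencode(value) -> bytes:
    """Encode ints, strings/bytes, lists, and dicts in bencoding (sketch)."""
    if isinstance(value, bool):
        raise TypeError("booleans have no bencoded form")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        # Strings carry their length as a prefix: b"spam" becomes 4:spam
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        items = [
            (key.encode("utf-8") if isinstance(key, str) else key, item)
            for key, item in value.items()
        ]
        items.sort(key=lambda pair: pair[0])  # keys are emitted in sorted order
        out = [b"d"]
        for key, item in items:
            out.append(bencode(key))
            out.append(bencode(item))  # nested dictionaries recurse here
        out.append(b"e")
        return b"".join(out)
    raise TypeError("unsupported type: %r" % type(value))
```

Every nested dictionary adds one more "d" … "e" pair around its contents, so the nesting depth of the input is directly visible in the encoded bytes.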
The CVE-2020-8437 Vulnerability
The CVE-2020-8437 vulnerability is in how uTorrent parses bencoded dictionaries — specifically, nested dictionaries. Before the patch (uTorrent 3.5.5 and earlier), uTorrent would use an integer (32 bits) as a bit field to keep track of which layer in the bencoded dictionary it was currently parsing. For example, when uTorrent would parse the first layer, the bit field would hold ‘00000000 00000000 00000000 00000001’, and when uTorrent would parse the second layer, the bit field would hold ‘00000000 00000000 00000000 00000011’. “But what happens if uTorrent parses a bencoded dictionary with more than 32 layers of nested dictionaries?”, my friend and I curiously asked one Thursday night. So we quickly created such a dictionary and fed it into uTorrent’s bencoding dictionary parser. The result:
Figure 5. uTorrent crash message 🥳
Awesome Possum! uTorrent crashed! Further inspection of the crash revealed its source: a null pointer dereference.
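The limitation of tracking depth in a 32-bit field can be illustrated with a toy model. This is only a sketch of the idea described above: the class below is hypothetical and is not uTorrent’s actual parser code. It assumes one bit is shifted in per dictionary layer entered and shifted out per layer left.

```python
MASK32 = 0xFFFFFFFF


class DepthBitfield:
    """Toy model of a 32-bit field with one bit set per open dictionary layer."""

    def __init__(self):
        self.bits = 0

    def enter(self):
        # Shift in a new bit; anything beyond bit 31 is silently dropped.
        self.bits = ((self.bits << 1) | 1) & MASK32

    def leave(self):
        self.bits >>= 1

    @property
    def depth(self):
        return bin(self.bits).count("1")


def tracked_depth_after(real_depth: int) -> int:
    """Depth the bit field believes it is at after entering real_depth layers."""
    tracker = DepthBitfield()
    for _ in range(real_depth):
        tracker.enter()
    return tracker.depth
```

In this model the field reads 0x1 after one layer and 0x3 after two, matching the values above, but it can never represent more than 32 layers. After the 33rd layer the tracked depth and the real depth disagree, which is the kind of inconsistent state a parser is not prepared for.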
Exploiting CVE-2020-8437
There are two easy exploit vectors for CVE-2020-8437: The first is a remote peer sending an Extended Message packet with a malicious bencoded dictionary, and the second is a .torrent file that contains a malicious bencoded dictionary.
Remote Peer Exploit
As described earlier, when two peers that support Extended Messages start communicating with each other, they each send a packet enumerating the various extensions they support. That information about supported extensions is sent as a bencoded dictionary, and since that bencoded dictionary gets parsed by the client, if that dictionary is malicious (having more than 32 nested dictionary layers), it will trigger CVE-2020-8437. 😊
Torrent File Exploit
.torrent files encapsulate the most basic information a client needs to start downloading torrents. These files are openly and commonly shared on torrent websites, downloaded, and then opened by torrent clients, effectively making these files a possible vehicle for triggering vulnerabilities in those torrent clients. Let me take you on a behind-the-scenes-sneak-peek-never-before-seen-on-live-tv look at the internals of a .torrent file, exposing how simple it is to use it to trigger CVE-2020-8437: a .torrent file is simply a bencoded dictionary saved as a file. So to exploit CVE-2020-8437 from a .torrent file, you just need to save a malicious bencoded dictionary to a file and give that file the .torrent extension. Check out my .torrent file exploit. Please enjoy this proof of concept video of crashing uTorrent with the malicious.torrent file linked above 😉
Instagram, with over 100 million photos uploaded every day, is one of the most popular social media platforms. For that reason, we decided to audit the security of the Instagram app for both Android and iOS operating systems. We found a critical vulnerability that can be used to perform remote code execution on a victim’s phone.
Our modus operandi for this research was to examine the 3rd party projects used by Instagram.
Many software developers, regardless of their size, utilize open-source projects in their software. We found a vulnerability in the way that Instagram utilizes Mozjpeg, the open source project used as their JPEG format decoder.
In the attack scenario we describe below, an attacker simply sends an image to the victim via email, WhatsApp or other media exchange platforms. When the victim opens the Instagram app, the exploitation takes place.
Tell me who your friends are and I’ll tell you your vulnerabilities
We all know that even the biggest companies rely on public open-source projects and that those projects are integrated into their apps with little to no modifications.
Most companies using 3rd party open-source projects declare it, but not all libraries appear in the app’s About page. The best way to be sure you see all the libraries is to go to the lib-superpack-zstd folder of the Instagram app:
Figure 1. Shared objects used by Instagram.
In the image below, you can see that when you upload an image using Instagram, three shared objects are loaded: libfb_mozjpeg.so, libjpegutils_moz.so, and libcj_moz.so.
Figure 2. Mozjpeg’s shared objects.
The “moz” suffix is short for “mozjpeg,” which in turn stands for the Mozilla JPEG Encoder Project. But what do these modules do?
What is Mozjpeg?
Let’s start with a brief history of the JPEG format. JPEG is an image file format that’s been around since the early 1990s, and is based on the concept of lossy compression, meaning that some information is lost in the compression process, but this information loss is negligible to the human eye. Libjpeg is the baseline JPEG encoder built into the Windows, Mac and Linux operating systems and is maintained by an informal independent group. This library tries to balance encoding speed and quality with file size.
In contrast, Libjpeg-turbo is a higher performance replacement for libjpeg, and is the default library for most Linux distributions. This library was designed to use less CPU time during encoding and decoding.
On March 5, 2014, Mozilla announced the “Mozjpeg” project, a JPEG encoder built on top of libjpeg-turbo, to provide better compression for web images, at the expense of performance.
The open-source project is specifically for images on the web. Mozilla forked libjpeg-turbo in 2014 so they could focus on reducing file size to lower bandwidth and load web images more quickly.
Instagram decided to split the mozjpeg library into 3 different shared objects:
libfb_mozjpeg.so – Responsible for the Mozilla-specific decompression exported API.
libcj_moz.so – The libjpeg-turbo that parses the image data.
libjpegutils_moz.so – The connector between the two shared objects. It holds the exported API that the JNI calls to trigger the decompression from the Java application side.
Fuzzing
Our team at CPR built a multi-processor fuzzing lab that gave us amazing results with our Adobe Research, so we decided to expand our fuzzing efforts to Mozjpeg as well.
The primary addition made by Mozilla on top of libjpeg-turbo was the compression algorithm, so that is where we set our sights.
AFL was our weapon of choice, so naturally we had to write a harness for it.
To write the harness, we had to understand how to instrument the Mozjpeg decompression function.
Fortunately, Mozjpeg comes with a code sample explaining how to use the library:
METHODDEF(int)
do_read_JPEG_file(struct jpeg_decompress_struct *cinfo, char *filename)
{
  struct my_error_mgr jerr;
  /* More stuff */
  FILE *infile;                 /* source file */
  JSAMPARRAY buffer;            /* Output row buffer */
  int row_stride;               /* physical row width in output buffer */

  if ((infile = fopen(filename, "rb")) == NULL) {
    fprintf(stderr, "can't open %s\n", filename);
    return 0;
  }

  /* Step 1: allocate and initialize JPEG decompression object */
  /* We set up the normal JPEG error routines, then override error_exit. */
  cinfo->err = jpeg_std_error(&jerr.pub);
  jerr.pub.error_exit = my_error_exit;
  /* Establish the setjmp return context for my_error_exit to use. */
  if (setjmp(jerr.setjmp_buffer)) {
    jpeg_destroy_decompress(cinfo);
    fclose(infile);
    return 0;
  }
  /* Now we can initialize the JPEG decompression object. */
  jpeg_create_decompress(cinfo);

  /* Step 2: specify data source (eg, a file) */
  jpeg_stdio_src(cinfo, infile);

  /* Step 3: read file parameters with jpeg_read_header() */
  (void)jpeg_read_header(cinfo, TRUE);

  /* Step 4: set parameters for decompression */
  /* In this example, we don't need to change any of the defaults set by
   * jpeg_read_header(), so we do nothing here.
   */

  /* Step 5: Start decompressor */
  (void)jpeg_start_decompress(cinfo);
  /* JSAMPLEs per row in output buffer */
  row_stride = cinfo->output_width * cinfo->output_components;
  /* Make a one-row-high sample array that will go away when done with image */
  buffer = (*cinfo->mem->alloc_sarray)
    ((j_common_ptr)cinfo, JPOOL_IMAGE, row_stride, 1);

  /* Step 6: while (scan lines remain to be read) */
  /*           jpeg_read_scanlines(...); */
  while (cinfo->output_scanline < cinfo->output_height) {
    (void)jpeg_read_scanlines(cinfo, buffer, 1);
    /* Assume put_scanline_someplace wants a pointer and sample count. */
    put_scanline_someplace(buffer[0], row_stride);
  }

  /* Step 7: Finish decompression */
  (void)jpeg_finish_decompress(cinfo);

  /* Step 8: Release JPEG decompression object */
  jpeg_destroy_decompress(cinfo);
  fclose(infile);
  return 1;
}
However, to make sure any crash we find in Mozjpeg impacts Instagram itself, we need to see how Instagram integrated Mozjpeg into their code.
Luckily, below you can see that Instagram copy-pasted the best practice for using the library:
Figure 3. Instagram’s implementation for using Mozjpeg.
As you can see, the only thing they really changed was to replace the put_scanline_someplace dummy function from the example code with read_jpg_copy_loop which utilizes memcpy.
Our harness receives generated image files from AFL and sends them to the wrapped Mozjpeg decompression function.
We ran the fuzzer for only a single day with 30 CPU cores, and AFL notified us about 447 “unique” crashes.
After triaging the results, we found an interesting crash related to the parsing of the image dimensions of JPEG. The crash was an out-of-bounds write and we decided to focus on it.
CVE-2020-1895
The vulnerable function is read_jpg_copy_loop, which is prone to an integer overflow during the decompression process.
Figure 4. Read_jpg_copy_loop code snippet from IDA.
The vulnerable function handles the image dimensions when parsing JPEG image files. Here is pseudocode based on the original vulnerable code:
width = rect->right - rect->bottom;
height = rect->top - rect->left;
allocated_address = __wrap_malloc(width * height * cinfo->output_components); // <--- Integer overflow
bytes_copied = 0;
while (1) {
    output_scanline = cinfo->output_scanline;
    if ((unsigned int)output_scanline >= cinfo->output_height)
        break;
    // reads one line from the file into the cinfo buffer
    jpeg_read_scanlines(cinfo, line_buffer, 1);
    if (output_scanline >= rect->left && output_scanline < rect->top)
    {
        memcpy(allocated_address + bytes_copied, line_buffer, width * cinfo->output_components); // <-- Oops
        bytes_copied += width * cinfo->output_components;
    }
}
First, let’s understand what this code does.
The __wrap_malloc function allocates a memory chunk whose size is based on 3 parameters derived from the image dimensions. Both width and height are 16-bit integers (uint16_t) that are parsed from the file.
cinfo->output_components tells us how many bytes represent each pixel.
This variable can vary from 1 for Greyscale, to 3 for RGB, to 4 for RGB + Alpha, CMYK, etc.
In addition to height and width, output_components is also completely controlled by the attacker. It is parsed from the file and is not validated with regards to the remaining data available in the file.
__wrap_malloc expects its parameters to be passed in 32-bit registers! That means if we can cause the allocation size to exceed 2^32 bytes, we have an integer overflow that leads to a much smaller allocation than expected.
The allocated size is calculated by multiplying the image’s width, height and output_components. Those sizes are unchecked and in our control. When abused, they lead to an integer overflow.
Data of size (width*output_components) is then copied height times.
It’s a promising-looking bug from an exploitation perspective: a linear heap-overflow gives the attacker control over the size of the allocation, the amount of overflow, and the contents of the overflowed memory region.
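The arithmetic behind the overflow can be checked with a few lines of Python. This is a worked illustration only: the helper names and the sample dimensions (0x8000 by 0x8001 with 4 bytes per pixel) are chosen for this example and do not come from the original research.

```python
MASK32 = 0xFFFFFFFF


def allocation_size_32bit(width: int, height: int, components: int) -> int:
    """Size that reaches the allocator once the product is truncated to 32 bits."""
    return (width * height * components) & MASK32


def bytes_written(width: int, height: int, components: int) -> int:
    """Total bytes the copy loop writes: one row of width*components per line."""
    return width * components * height


# Example dimensions, both within the 16-bit range of the image header.
width, height, components = 0x8000, 0x8001, 4
allocated = allocation_size_32bit(width, height, components)
written = bytes_written(width, height, components)
print("allocated: 0x%x bytes, written: 0x%x bytes" % (allocated, written))
```

For these values the true product is 2^32 + 0x20000, so only 0x20000 bytes are allocated. Each copied row is also 0x20000 bytes, which means the first row fills the buffer exactly and every following row lands outside it.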
Wild Copy Exploitation
To cause the memory corruption, we need to overflow the integer determining the allocation size; our calculation must exceed 32 bits. We are dealing with a wildcopy which means we are trying to copy data that is larger than 2^32 (4GB). Therefore, there is an extremely high probability the program will crash when the loop reaches an unmapped page:
Figure 5. Segfault caused by our wildcopy.
So how can we exploit this?
Before we dive into wildcopy exploitation techniques, we need to differentiate our case from the classic case of wildcopy like in the Stagefright bug. The classic case usually involves one memcpy that writes 4GB of data.
However, in our case there is a loop that tries to copy X bytes Y times, where X * Y is 4GB.
When we try to exploit such a memory corruption vulnerability, we need to ask ourselves a few important questions:
Can we control (even partially) the content of the data we are corrupting with?
Can we control the length of the data we are corrupting with?
Can we control the size of the allocated chunk we overflow?
This last question is especially important because in Jemalloc/LFH (or every bucket-based allocator), if we can’t control the size of the chunk we are corrupting from, it might be difficult to shape the heap such that we could corrupt a specific target structure, if that structure is of a significantly different size.
At first glance, it seems clear that the answer to the first question, about our ability to control the content, is “yes”, because we control the content of the image data.
Now, moving on to the second question: controlling the length of the data we corrupt with. The answer here is also clearly “yes” because the memcpy loop copies the file line by line, and the size of each line copied is the product of width and output_components, both of which are controlled by the attacker.
The answer to the 3rd question, about the size of the buffer we corrupt, is trivial.
As it is controlled by `width * height * cinfo->output_components`, we wrote a small Python script that gives us what these 3 parameters should be, according to the chunk size we wish to allocate, considering the effect of the integer overflow:
import sys


def main(low=None, high=None):
    # Brute-force (width, height) pairs whose allocation size, after the
    # 32-bit overflow, falls into the requested range. The constant 4
    # corresponds to output_components == 4.
    res = []
    print("brute forcing...")
    for a in range(0xffff):
        for b in range(0xffff):
            x = 4 * (a + 1) * (b + 1) - 2**32
            if 0 < x <= 0x100000:  # our limit
                if (not low or x > low) and (not high or x <= high):
                    res.append((x, a + 1, b + 1))
    for s, x, y in sorted(res, key=lambda i: i[0]):
        print("0x%06x, 0x%08x, 0x%08x" % (s, x, y))


if __name__ == '__main__':
    # Usage: script.py [low] high   (sizes given in hex)
    high = None
    low = None
    if len(sys.argv) == 2:
        high = int(sys.argv[1], 16)
    elif len(sys.argv) == 3:
        high = int(sys.argv[2], 16)
        low = int(sys.argv[1], 16)
    main(low, high)
Now that we have our prerequisites for exploiting a wildcopy, let’s see how we can utilize them.
To trigger the vulnerability, we must specify a size larger than 2^32 bytes. In practice, we need to stop the wildcopy before we reach the unmapped memory.
We have a number of options:
Rely on a race condition – While the wildcopy corrupts some useful target structures or memory, we can race a different thread to use that now corrupted data to do something before the wildcopy crashes (e.g., construct other primitives, terminate the wildcopy, etc.).
If the wildcopy loop has some logic that can stop the loop under certain conditions, we can mess with these checks and stop after it corrupts enough data.
If the wildcopy loop has a call to a virtual function on every iteration, and that pointer to a function is in a structure in heap memory (or at another memory address we can corrupt during the wildcopy), the exploit can use the loop to overwrite and divert execution during the wildcopy.
Sadly, the first option isn’t applicable here because we are attacking from an image vector. Therefore, we don’t have any control over threads so the race condition option does not help.
To use the second approach, we looked for a kill-switch to stop the wildcopy. We tried cutting the file in half while keeping the same size in the image header. However, we found out that if the library reaches an EOF marker, it just adds another EOF marker, so we end up in an infinite loop of EOF markers.
We also tried looking for an ERREXIT function that could stop the decompression process at runtime, but we learned that no matter what we do, we can never reach a path that leads to ERREXIT in this code. Therefore, the second option isn’t applicable either.
To use the third option, we need to look for a virtual function that gets called on every iteration of our wildcopy loop.
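Let’s go back to the loop logic where the memcpy copy occurs. On every iteration, the loop calls jpeg_read_scanlines, which in turn invokes the process_data function pointer stored in cinfo->main.
process_data points to another function called process_data_simple_main: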
process_data_simple_main(j_decompress_ptr cinfo, JSAMPARRAY output_buf,
                         JDIMENSION *out_row_ctr, JDIMENSION out_rows_avail)
{
  my_main_ptr main_ptr = (my_main_ptr)cinfo->main;
  JDIMENSION rowgroups_avail;

  /* Read input data if we haven't filled the main buffer yet */
  if (!main_ptr->buffer_full) {
    if (!(*cinfo->coef->decompress_data) (cinfo, main_ptr->buffer))
      return;
    main_ptr->buffer_full = TRUE;
  }

  rowgroups_avail = (JDIMENSION)cinfo->_min_DCT_scaled_size;

  /* Feed the postprocessor */
  (*cinfo->post->post_process_data) (cinfo, main_ptr->buffer,
                                     &main_ptr->rowgroup_ctr, rowgroups_avail,
                                     output_buf, out_row_ctr, out_rows_avail);

  /* Has postprocessor consumed all the data yet? If so, mark buffer empty */
  if (main_ptr->rowgroup_ctr >= rowgroups_avail) {
    main_ptr->buffer_full = FALSE;
    main_ptr->rowgroup_ctr = 0;
  }
}
From process_data_simple_main, we can identify 2 more virtual functions that get called in every iteration. They all have a cinfo struct as a common denominator.
What is this cinfo?
Cinfo is a struct that is passed around between Mozjpeg’s various functions. It holds crucial members, function pointers and image metadata.
Let’s look at the cinfo struct from jpeglib.h:
struct jpeg_decompress_struct {
  struct jpeg_error_mgr *err;
  struct jpeg_memory_mgr *mem;
  struct jpeg_progress_mgr *progress;
  void *client_data;
  boolean is_decompressor;
  int global_state;
  struct jpeg_source_mgr *src;
  JDIMENSION image_width;
  JDIMENSION image_height;
  int num_components;
  ...
  J_COLOR_SPACE out_color_space;
  unsigned int scale_num;
  ...
  JDIMENSION output_width;
  JDIMENSION output_height;
  int out_color_components;
  int output_components;
  int rec_outbuf_height;
  int actual_number_of_colors;
  ...
  boolean saw_JFIF_marker;
  UINT8 JFIF_major_version;
  UINT8 JFIF_minor_version;
  UINT8 density_unit;
  UINT16 X_density;
  UINT16 Y_density;
  ...
  int unread_marker;
  struct jpeg_decomp_master *master;
  struct jpeg_d_main_controller *main;   /* <-- there’s a function pointer here */
  struct jpeg_d_coef_controller *coef;   /* <-- there’s a function pointer here */
  struct jpeg_d_post_controller *post;   /* <-- there’s a function pointer here */
  struct jpeg_input_controller *inputctl;
  struct jpeg_marker_reader *marker;
  struct jpeg_entropy_decoder *entropy;
  ...
  struct jpeg_upsampler *upsample;
  struct jpeg_color_deconverter *cconvert;
  ...
};
In the cinfo struct, we can see 3 pointers to functions that we can try to overwrite during the overwrite loop and divert the execution flow.
It turns out that the third option is applicable in our case!
Jemalloc 101
Before we dive into the Jemalloc exploitation concepts, we need to understand how Android’s heap allocator works, as well as the terms that we focus on in the next chapter: chunks, runs, and regions.
Jemalloc is a bucket-based allocator that divides memory into chunks, always of the same size, and uses these chunks to store all of its other data structures (and user-requested memory as well). Chunks are further divided into ‘runs’ that are responsible for requests/allocations up to certain sizes. A run keeps track of free and used ‘regions’ of these sizes. Regions are the heap items returned on user allocations (malloc calls). Finally, each run is associated with a ‘bin.’ Bins are responsible for storing structures (trees) of free regions.
Figure 6. Jemalloc basic design.
Controlling the PC register
We found 3 good function pointers that we can use to divert execution during the wildcopy and control the PC register.
Mozjpeg has its own memory manager. The JPEG library’s memory manager controls allocating and freeing memory, and it manages large “virtual” data arrays. All memory and temporary file allocation within the library is done via the memory manager. This approach helps prevent storage-leak bugs, and it speeds up operations whenever malloc/free are slow.
The memory manager creates “pools” of free storage, and a whole pool can be freed at once.
Some data is allocated “permanently” and is not freed until the JPEG object is destroyed.
Most of the data is allocated “per image” and is freed by jpeg_finish_decompress or jpeg_abort functions.
For example, let’s look at one of the allocations that Mozjpeg did as part of the image decoding process. When Mozjpeg asks to allocate 0x108 bytes, in reality malloc is called with the size 0x777. As you can see, the requested size and the actual size allocated are different.
Let’s analyze this behavior.
Mozjpeg uses wrapper functions for small and large allocations: alloc_small and alloc_large.
The allocated “pools” are managed by alloc_small and the other wrapper functions which maintain a set of members that help them monitor the state of the “pools.” Therefore, whenever there is an allocation request, the wrapper functions check if there is enough space left in the “pool.”
If there is space available, the alloc_small function returns an address from the current “pool” and advances the pointer that points to the free space.
When the “pool” runs out of space, it allocates another “pool” using predefined sizes that it reads from the first_pool_slop array, which in our case are 1600 and 16000.
static const size_t first_pool_slop[JPOOL_NUMPOOLS] = {
  1600,                         /* first PERMANENT pool */
  16000                         /* first IMAGE pool */
};
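The pool bookkeeping described above can be modeled roughly as follows. This is a simplified sketch, not Mozjpeg’s code: the class is hypothetical, it always applies the first-pool slop (the real library uses a separate value for later pools), and the header size of 16 and alignment of 32 are assumptions chosen so that a 0x108-byte request reproduces the 0x777-byte malloc mentioned earlier.

```python
FIRST_POOL_SLOP = {"permanent": 1600, "image": 16000}


class PoolModel:
    """Simplified model of a pool that hands out small allocations."""

    def __init__(self, pool_id: str, header_size: int = 16, align_size: int = 32):
        self.slop = FIRST_POOL_SLOP[pool_id]
        self.header_size = header_size  # assumed size of the pool header
        self.align_size = align_size    # assumed alignment granularity
        self.bytes_left = 0
        self.malloc_sizes = []          # sizes that would be passed to malloc

    def alloc_small(self, size: int) -> None:
        if size > self.bytes_left:
            # No room left: request a new pool big enough for this object
            # plus extra "slop" for the allocations that follow.
            request = self.header_size + size + self.align_size - 1 + self.slop
            self.malloc_sizes.append(request)
            self.bytes_left = size + self.slop
        # Hand out space from the current pool and advance the free pointer.
        self.bytes_left -= size


pool = PoolModel("permanent")
pool.alloc_small(0x108)  # triggers a real malloc
pool.alloc_small(0x100)  # served from the space left in the same pool
print([hex(size) for size in pool.malloc_sizes])
```

The point of the model is that the size seen by malloc depends on the pool constants far more than on the size of the object being requested, and that most requests never reach malloc at all.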
Now that we understand how Mozjpeg’s memory manager works, we need to figure out which “pool” of memory holds our targeted virtual function pointers.
As part of the decompression process, there are two major functions that decode the image metadata and prepare the environment for later processing. These two functions, jpeg_read_header and jpeg_start_decompress, are the only functions that allocate memory before we reach our wildcopy loop.
jpeg_read_header parses the different markers from the file.
While parsing those markers, the second and largest “pool” of size 16000 (0x3e80) gets allocated by the Mozjpeg memory manager. The sizes of the “pools” are const values from the first_pool_slop array (from the code snippet above), which means that Mozjpeg’s internal allocator has already used all of the space in the first pool.
We know that our targeted main, coef and post structures get allocated from within the jpeg_start_decompress function. We can therefore safely assume that the rest of the allocations (until we reach our wildcopy loop) will end up being in the second big “pool” including the main, coef and post structures that we want to override!
Now let’s have a closer look at how Jemalloc deals with allocations of this size class.
Using Shadow to put some light
Allocations returned by Jemalloc are divided into three size classes: small, large, and huge.
Small/medium: These regions are smaller than the page size (typically 4KB).
Large: These regions are between small/medium and huge (between page size to chunk size).
Huge: These are bigger than the chunk size. They are dealt with separately and not managed by arenas; they have a global allocator tree.
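The three classes can be summarized as a simple threshold check. This is a rough sketch under stated assumptions: a 4KB page and a 2MB chunk are used as example values, the function name is made up, and the exact boundaries between classes differ between Jemalloc versions.

```python
PAGE_SIZE = 4096               # assumed page size
CHUNK_SIZE = 2 * 1024 * 1024   # assumed chunk size; varies by Android version


def size_class(size: int, page: int = PAGE_SIZE, chunk: int = CHUNK_SIZE) -> str:
    """Classify a request as small, large, or huge (approximate thresholds)."""
    if size < page:
        return "small"
    if size <= chunk:
        return "large"
    return "huge"


for request in (0xe0, 16000, 4 * 1024 * 1024):
    print(hex(request), size_class(request))
```

Under these assumptions the 16000-byte image pool is a large allocation, which matters because large allocations get a dedicated run instead of sharing one with other regions.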
Memory returned by the OS is divided into chunks, the highest abstraction used in Jemalloc’s design. In Android, those chunks have different sizes for different versions. They are usually around 2MB/4MB. Each chunk is associated with an arena.
A run can be used to host either one large allocation or multiple small allocations.
Large regions have their own runs, i.e. each large allocation has a dedicated run.
We know that our targeted “pool” size is 16,000 bytes (0x3e80), which is bigger than the page size (4K) and smaller than the Android chunk size. Therefore, Jemalloc allocates a large run of size 0x5000 each time!
Let’s take a closer look.
(gdb)info registers X0
X0 0x3fc7
(gdb)bt
#0 0x0000007e6a0cbd44 in malloc () from target:/system/lib64/libc.so
#1 0x0000007e488b3e3c in alloc_small () from target:/data/data/com.instagram.android/lib-superpack-zstd/libfb_mozjpeg.so
#2 0x0000007e488ab1e8 in get_sof () from target:/data/data/com.instagram.android/lib-superpack-zstd/libfb_mozjpeg.so
#3 0x0000007e488aa9b8 in read_markers () from target:/data/data/com.instagram.android/lib-superpack-zstd/libfb_mozjpeg.so
#4 0x0000007e488a92bc in consume_markers () from target:/data/data/com.instagram.android/lib-superpack-zstd/libfb_mozjpeg.so
#5 0x0000007e488a354c in jpeg_consume_input () from target:/data/data/com.instagram.android/lib-superpack-zstd/libfb_mozjpeg.so
#6 0x0000007e488a349c in jpeg_read_header () from target:/data/data/com.instagram.android/lib-superpack-zstd/libfb_mozjpeg.so
We can see that the size actually passed to malloc is indeed 0x3fc7. This matches the large “pool” size of 16000 (0x3e80) plus the size of Mozjpeg’s large_pool_hdr, the actual size of the object that was supposed to be allocated, and ALIGN_SIZE (16/32) - 1.
One thing which can really make a huge difference when implementing heap shaping for an exploit is having a way to visualize the heap: to see the various allocations in the context of the heap.
For this we use a simple tool which allows us to inspect the heap state for a target process during exploit development. We used a tool called “shadow” that argp and vats wrote for visualizing the Jemalloc heap.
We performed a debugging session using shadow over gdb to verify our assumptions regarding the large run that we wish to override.
Our goal is to exploit an integer overflow that leads to a heap buffer overflow.
Exploiting these kinds of bugs is all about precise positioning of heap objects. We want to force certain objects to be allocated in specific locations in the heap, so we can form useful adjacencies for memory corruption.
To achieve this adjacency, we need to shape the heap so our exploitable object is allocated just before our targeted object.
Unfortunately, we have no control over free operations. According to Mozjpeg documentation, “most of the data is allocated “per image” and is freed by jpeg_finish_decompress, or jpeg_abort.” This means that all of the free operations occur at the end of the decompression process using jpeg_finish_decompress, or jpeg_abort which is only called after we have finished overriding memory with our wildcopy loop.
However, in our case we don’t need any free operations because we have control over a function which performs a raw malloc with a size that we control. This gives us the power to choose where we want to place our overflowed buffer on the heap.
We want to position the object containing our overflowed buffer just before the large (0x5000) object containing the main/post/coef data structures, through which the function pointers are called.
Figure 7. Visualizing Jemalloc objects on the heap.
Therefore, the simplest way for us to exploit this is to shape the heap so that the overflowed buffer is allocated right before our targeted large (0x5000) object, and then (use the bug to) overwrite the main/post/coef virtual functions address to our own. This gives us full control of the virtual table that redirects any method to any code address.
We know that the targeted object always has the same large size (0x5000), and because Jemalloc allocates large sizes from top to bottom, the only thing we need to do is place our overflow object at the bottom of the same chunk where the large target object is located.
Jemalloc’s chunk size is 2MB in our tested Android version.
The distance (in bytes) between the objects doesn’t matter because we have a wildcopy loop that can copy enormous amounts of data line by line (we control the size of the line). The data that is copied is ultimately larger than 2MB, so we know for sure that we will end up corrupting every object on the chunk that is located after our overflow object.
As we don’t have any control over free operations, we cannot create holes for our object to fall into. (A hole is one or more free regions in a run.) Instead, we looked for holes that appear anyway as part of the image decompression flow, searching for sizes that repeat every time during debugging.
Let’s use the shadow tool to examine our chunk’s layout in memory:
(gdb) jechunk 0x72a6200000
This chunk belongs to the arena at 0x72c808fc00.
addr info size usage
------------------------------------------------------------
0x72a6200000 headers 0xd000 -
0x72a620d000 large run 0x1b000 -
0x72a6227000 large run 0x1b000 -
0x72a6228000 small run (0x180) 0x3000 10/32
0x72a622b000 small run (0x200) 0x1000 8/8
...
...
0x72a638f000 small run (0x80) 0x1000 6/32
0x72a6390000 small run (0x60) 0x3000 12/128
0x72a6393000 small run (0xc00) 0x3000 4/4
0x72a6396000 small run (0xc00) 0x3000 4/4
0x72a6399000 small run (0x200) 0x1000 2/8
0x72a639a000 small run (0xe0) 0x7000 6/128 <===== The run we want to hit!!!
0x72a63a1000 small run (0x1000) 0x1000 1/1
0x72a63a2000 small run (0x1000) 0x1000 1/1
0x72a63a3000 small run (0x1000) 0x1000 1/1
0x72a63a4000 small run (0x1000) 0x1000 1/1
0x72a63a5000 large run 0x5000 - <===== Large targeted object!!!
We are looking for runs with holes, and those runs must be located before the large targeted buffer we want to overwrite. A run can be used to host either one large allocation, or multiple small/medium allocations.
Runs that host small allocations are divided into regions. A region is synonymous to a small allocation. Each small run hosts regions of just one size. In other words, a small run is associated with exactly one region size class.
Runs that host medium allocations are also divided into regions, but as the name indicates, they are bigger than the small allocations. Therefore, the runs that host medium allocations are divided into bigger size class regions that take up more space.
For example, a small run of size class 0xe0 is divided into 128 regions:
0x72a639a000 small run (0xe0) 0x7000 6/128
Medium runs of size class 0x200 are divided into 8 regions:
0x72a6399000 small run (0x200) 0x1000 2/8
Small allocations are the most common allocations, and most likely the ones you need to manipulate/control/overflow. As small allocations are divided into more regions, they are easier to control as it is less likely that other threads will allocate all of the remaining regions.
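The region counts in the usage column follow directly from the run size and the region size. Below is a minimal sketch of that arithmetic using values from the jechunk listing above. The helper names are ours and are not part of shadow, and we assume the usage column reads used/total.

```python
def regions_per_run(run_size: int, region_size: int) -> int:
    """Number of equally sized regions that fit in one small run."""
    return run_size // region_size


def free_regions(used: int, run_size: int, region_size: int) -> int:
    """Holes left in a run, assuming the usage column means used/total."""
    return regions_per_run(run_size, region_size) - used


# (region size, run size, used regions), copied from the jechunk listing
runs = [
    (0xe0, 0x7000, 6),    # the run we want to hit
    (0x200, 0x1000, 2),
    (0x180, 0x3000, 10),
]

for region_size, run_size, used in runs:
    total = regions_per_run(run_size, region_size)
    holes = free_regions(used, run_size, region_size)
    print(f"size class {region_size:#x}: {total} regions, {holes} free")
```

The 0xe0 run has by far the most free regions of the three, which is part of what makes it a convenient place to land our allocation.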
Therefore, to cause the overflowable object to be allocated before the large targeted object, we use our Python script from the Wild Copy Exploitation section. The script helps us generate the dimensions that will cause the malloc to allocate our overflowable object in our targeted small size class.
We constructed a new JPEG image with dimensions that trigger an allocation in the small size class (0xe0), and set a breakpoint on libjepgutils_moz.so+0x918.
The address we got back from malloc is the address of our overflowable object (0x72a639ac40). Let’s examine its location on the heap using the jeinfo method from the shadow framework.
(gdb) jeinfo 0x72a639ac40
parent address size
--------------------------------------
arena 0x72c808fc00 -
chunk 0x72a6200000 0x200000
run 0x72a639a000 0x7000
region 0x72a639ac40 0xe0
We are in the same chunk (0x72a6200000) as our targeted large object! Let’s look at the chunk’s layout again to make sure that our overflowable buffer is in the small size class (0xe0) that we aimed to hit.
(gdb) jechunk 0x72a6200000
This chunk belongs to the arena at 0x72c808fc00.
…
...
0x72a639a000 small run (0xe0) 0x7000 7/128 <-----hit!!!
0x72a63a1000 small run (0x1000) 0x1000 1/1
0x72a63a2000 small run (0x1000) 0x1000 1/1
0x72a63a3000 small run (0x1000) 0x1000 1/1
0x72a63a4000 small run (0x1000) 0x1000 1/1
0x72a63a5000 large run 0x5000 - <------Large targeted object!!!
Yesss! Now let’s continue the execution and see what happens when we overwrite the large targeted object.
(gdb) c
Continuing.
[New Thread 29767.30462]
Thread 93 "IgExecutor #19" received signal SIGBUS, Bus error.
0xff9d9588ff989083 in ?? ()
BOOM! Exactly what we were aiming for: the crash occurred while the code tried to call through a function pointer that now holds data copied from our overflowable object. We got a bus error (SIGBUS; the signal number is platform-dependent, 7 on Linux and Android), which is raised when a process tries to access memory that the CPU cannot physically address. In other words, the address the program jumped to is not a valid memory address, because it is made of data from our image that replaced the real function pointer, and that led to this crash!
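The adjacency seen in the debugger output above can also be checked with plain address arithmetic. The sketch below uses only the addresses printed by jechunk and jeinfo. It assumes that chunks are aligned to the 2MB chunk size, so a chunk base can be recovered by masking. The variable and function names are ours.

```python
CHUNK_SIZE = 0x200000  # 2MB Jemalloc chunk, as reported by jeinfo

# Addresses copied from the gdb/shadow output
OVERFLOW_OBJ = 0x72a639ac40   # address returned by malloc
RUN_START = 0x72a639a000      # start of the 0xe0 small run
TARGET_OBJ = 0x72a63a5000     # large (0x5000) targeted object
REGION_SIZE = 0xe0


def chunk_base(addr: int) -> int:
    """Chunk base, assuming chunks are aligned to CHUNK_SIZE."""
    return addr & ~(CHUNK_SIZE - 1)


same_chunk = chunk_base(OVERFLOW_OBJ) == chunk_base(TARGET_OBJ)
region_index = (OVERFLOW_OBJ - RUN_START) // REGION_SIZE
distance_to_target = TARGET_OBJ - OVERFLOW_OBJ
bytes_to_chunk_end = chunk_base(OVERFLOW_OBJ) + CHUNK_SIZE - OVERFLOW_OBJ

print(f"same chunk:         {same_chunk}")
print(f"region index:       {region_index}")
print(f"distance to target: {distance_to_target:#x}")
print(f"bytes to chunk end: {bytes_to_chunk_end:#x}")
```

Both the distance to the target and the distance to the end of the chunk are far below 2MB, so a copy that runs past 2MB necessarily passes over the targeted object.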
Putting everything together
We have a controlled function call. All that is missing for a reliable exploit is to redirect execution to a convenient gadget to stack pivot, and then build an ROP stack.
Now we need to put everything together and (1) construct an image with malformed dimensions that (2) triggers the bug, which then (3) leads to a copy of our controlled payload that (4) diverts the execution to an address that we control.
We need to generate a corrupted JPEG with our controlled data. Therefore, our next step was to determine exactly which image formats the Mozjpeg platform supports. We can figure that out from the piece of code below, where out_color_space is the output color space and out_color_components is the number of color components per pixel that it implies.
switch (cinfo->out_color_space) {
case JCS_GRAYSCALE:
  cinfo->out_color_components = 1;
  break;
case JCS_RGB:
case JCS_EXT_RGB:
case JCS_EXT_RGBX:
case JCS_EXT_BGR:
case JCS_EXT_BGRX:
case JCS_EXT_XBGR:
case JCS_EXT_XRGB:
case JCS_EXT_RGBA:
case JCS_EXT_BGRA:
case JCS_EXT_ABGR:
case JCS_EXT_ARGB:
  /* per-color-space lookup table: 3 or 4 components per pixel */
  cinfo->out_color_components = rgb_pixelsize[cinfo->out_color_space];
  break;
case JCS_YCbCr:
case JCS_RGB565:
  cinfo->out_color_components = 3;
  break;
case JCS_CMYK:
case JCS_YCCK:
  cinfo->out_color_components = 4;
  break;
default:
  /* otherwise, keep the component count of the input image */
  cinfo->out_color_components = cinfo->num_components;
  break;
}
We used a simple Python library called PIL to construct an RGB BMP file. We chose the RGB format because it is familiar to us, and we filled it with “AAA” as the payload. This file is the base image that we use to create our malicious compressed JPEG.
from PIL import Image

# Build a 100x100 RGB image in which every pixel is "AAA" (0x41, 0x41, 0x41)
img = Image.new('RGB', (100, 100))
pixels = img.load()
for i in range(img.size[0]):
    for j in range(img.size[1]):
        pixels[i, j] = (0x41, 0x41, 0x41)
img.save('rgb100.bmp')
We then used the cjpeg tool from the Mozjpeg project to compress our bmp file into a JPEG file.
Next, we examined the compressed output file to check our assumptions. We know that the RGB format is 3 bytes per pixel.
We verified that the code does set cinfo->out_color_space = 0x2 (JCS_RGB) correctly. However, when we checked our controlled allocation, we saw that the height and width arguments in the overflowing multiplication are still multiplied by out_color_components, which is equal to 4, even though we started with an RGB format that uses 3×8 bits per pixel. It seems that Mozjpeg prefers to convert our image to a 4×8 bits per pixel format.
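To see why the component count matters for the overflow, here is a small sketch of the size computation. It assumes the allocation size is width × height × out_color_components truncated to 32 bits. The dimensions are purely illustrative and are not the ones used in our research, and the function name is ours.

```python
UINT32_MASK = 0xFFFFFFFF  # assumption: the size is computed in a 32-bit integer


def wrapped_alloc_size(width: int, height: int, components: int) -> int:
    """Allocation size after 32-bit truncation of width * height * components."""
    return (width * height * components) & UINT32_MASK


# Illustrative dimensions only (each fits in a 16-bit JPEG dimension field)
width, height = 32768, 32775

size_3 = wrapped_alloc_size(width, height, 3)
size_4 = wrapped_alloc_size(width, height, 4)

print(f"3 components: {size_3:#x}")  # product still fits in 32 bits
print(f"4 components: {size_4:#x}")  # product wraps around to a small value
```

With the same dimensions, 3 components per pixel produce no wraparound at all, while 4 components wrap to a much smaller size. This is why the dimensions have to be computed against the component count that is actually used at allocation time.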
We then turned to a 4×8-bit pixels format that is supported by the Mozjpeg platform, and the CMYK format met the criteria. We used the CMYK format as a base image to give us full control over all 4 bytes. We filled the image with “AAAA” as the payload.
We compressed it to a JPEG format and added the dimensions that trigger the bug. To our delight, we got the following crash!
Thread 93 "IgExecutor #19" received signal SIGBUS, Bus error.
0xff414141ff414141 in ?? ()
However, we got a weird 0xFF byte in every fourth position of our controlled address, even though we constructed a 4×8 bits per pixel image: the fourth component did not come from our payload.
Bitmap file formats that support transparency include GIF, PNG, BMP, TIFF, and JPEG 2000, through either a transparent color or an alpha channel.
Bitmap-based images are technically characterized by the width and height of the image in pixels and by the number of bits per pixel.
Therefore, we decided to construct an RGBA BMP file with our own controlled alpha channel value (0x61) using the PIL library.
from PIL import Image

# Same 100x100 image, now RGBA with a controlled alpha value of 0x61
img = Image.new('RGBA', (100, 100))
pixels = img.load()
for i in range(img.size[0]):
    for j in range(img.size[1]):
        pixels[i, j] = (0x41, 0x41, 0x41, 0x61)
img.save('rgba100.bmp')
Surprisingly, we got the same results as when we used the CMYK malicious JPEG. We still got an alpha channel of 0xFF as part of our controlled address, even though we used an RGBA format as the base for the compressed JPEG and the file carried our own alpha channel value (0x61). How did this happen? Let’s go back to the code and understand the reason for that odd behavior.
We found the answer in this little piece of code below:
Figure 8. Setting cinfo->out_color_space to RGBA(0xC) as seen in the IDA disassembly snippet.
We found that Instagram decided to add their own const value after jpeg_read_header finished and before calling jpeg_start_decompress.
We used the RGB format from the first test and we saw that Mozjpeg does correctly set cinfo->out_color_space = 0x2 (JCS_RGB). However, from Instagram’s code (see Figure 3) we can see that this value is overwritten by a const value of 0xc which represents the (JCS_EXT_RGBA) format.
This also explains the weird 0xFF alpha channel that we got even though we used a 3×8 bits per pixel RGB image.
After diving further into the code, we saw that the value of the alpha channel (0xFF) is hardcoded as a const value. When Instagram sets cinfo->out_color_space = 0xc to select the JCS_EXT_RGBA format, the code copies 3 bytes from our input base file, and then the 4th byte copied is always the hardcoded alpha channel value.
#ifdef RGB_ALPHA
outptr[RGB_ALPHA] = 0xFF;
#endif
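The effect of that hardcoded assignment can be modeled in a few lines. This is only a sketch of the behavior described above, not Mozjpeg's actual conversion routine. The function name is ours, and we assume the alpha byte sits at index 3 of each output pixel (RGBA order).

```python
RGB_ALPHA = 3  # assumed index of the alpha byte within a 4-byte RGBA pixel


def expand_to_rgba(rgb: bytes) -> bytes:
    """Copy 3 payload bytes per pixel, then append a constant 0xFF alpha."""
    if len(rgb) % 3 != 0:
        raise ValueError("input must contain whole 3-byte pixels")
    out = bytearray()
    for i in range(0, len(rgb), 3):
        pixel = bytearray(4)
        pixel[0:3] = rgb[i:i + 3]   # attacker-controlled color bytes
        pixel[RGB_ALPHA] = 0xFF     # hardcoded, not taken from the input
        out += pixel
    return bytes(out)


payload = b"AAA" * 2
print(expand_to_rgba(payload).hex())
```

Whatever the input file contains, only three out of every four output bytes are under our control.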
Now that we have put everything together, we came to the conclusion that no matter what image format is used as the base of the compressed JPEG, Instagram always converts the output to the RGBA format.
Because 0xFF is always written as the last byte of each 4-byte pixel, we could have achieved our goal in a big-endian environment, where that byte would not end up as the most significant byte of the address.
Little-endian systems store the least-significant byte of a word at the smallest memory address. Because we’re dealing with a little-endian system, the alpha channel value is always written as the MSB (Most Significant Byte) of our controlled address. As we’re trying to exploit the bug in user mode, and an address that starts with 0xFF belongs to the kernel address space, it foils our plans.
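The byte-order effect is easy to reproduce. The sketch below takes two RGBA pixels as they would sit in memory (our “AAA” payload plus the constant alpha) and reads them as one 64-bit pointer in both byte orders. It uses only Python's standard struct module.

```python
import struct

# Two adjacent output pixels in memory: payload bytes followed by constant alpha
pixels = bytes([0x41, 0x41, 0x41, 0xFF]) * 2

# Interpret the same 8 bytes as a 64-bit pointer in each byte order
(little,) = struct.unpack("<Q", pixels)
(big,) = struct.unpack(">Q", pixels)

print(f"little-endian pointer: {little:#018x}")
print(f"big-endian pointer:    {big:#018x}")
print(f"top byte (little):     {little >> 56:#04x}")
print(f"top byte (big):        {big >> 56:#04x}")
```

The little-endian value matches the faulting address from the crash above, with the uncontrolled 0xFF as its most significant byte.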
Is exploitation possible?
We lost our quick win. One lesson we can learn from this is that real life is not a CTF game, and sometimes one crucial const value set by a developer can ruin everything from an exploitation perspective.
Let’s recall the content from the main website of the Mozilla foundation about Mozjpeg:
“Mozjpeg’s sole purpose is to reduce the size of JPEG files that are served up on the web.”
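From what we saw, Instagram increases the memory used by each image we upload by a third (3 bytes per pixel become 4), and a quarter of the resulting buffer is just the constant alpha value! That’s for about 100 million images per day!
To quote one sentence from a lecture that Halvar Flake gave at the last OffensiveCon: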
“The only person in computing that is paid to actually understand the system from top to bottom is the attacker! Everybody else usually gets paid to do their parts.”
At this point, Facebook already patched the vulnerability so we stopped our exploitation effort even though we weren’t quite finished with it.
We still have a 3-byte overwrite (3 controlled bytes out of every 4), and in theory we could invest more time to find more useful primitives that could help us exploit this bug. However, we decided that we had done enough and had made the important point that we wanted to convey.
The Mozjpeg project on Instagram is just the tip of the iceberg when talking about Mozjpeg. The Mozilla-based project is still widely used in many other projects across the web, in particular Firefox, and it is also used by popular open-source projects such as sharp and libvips (on GitHub alone, they have more than 20k stars combined).
Conclusion & Recommendations
Our blog post describes how image parsing code, as a third party library, ends up being the weakest point of Instagram’s large system. Fuzzing the exposed code turned up some new vulnerabilities which have since been fixed. It is likely that, given enough effort, one of these vulnerabilities can be exploited for RCE in a zero-click attack scenario. Unfortunately, it is also likely that other bugs remain or will be introduced in the future. As such, continuous fuzz-testing of this and similar media format parsing code, both in operating system libraries and third party libraries, is absolutely necessary. We also recommend reducing the attack surface by restricting the receiver to a small number of supported image formats.
This field has been researched a lot by various appreciated independent security researchers as well as nationally-sponsored security researchers. Media format parsing remains an important issue. See also other researcher and vendor advisories:
Facebook’s advisory described this vulnerability as an “Integer Overflow leading to Heap Buffer Overflow – large heap overflow could occur in Instagram for Android when attempting to upload an image with specially crafted dimensions. This affects versions prior to 128.0.0.26.128.”
We at Check Point responsibly disclosed the vulnerability to Facebook, who released a patch on February 10, 2020. Facebook acknowledged the vulnerability and assigned it CVE-2020-1895. The bug was tested on both the 32-bit and 64-bit versions of the Instagram app.
Many thanks to my colleagues Eyal Itkin (@EyalItkin), Oleg Ilushin, Omri Herscovici (@omriher) for their help in this research.