Extracting kernel stack function arguments from Linux x86-64 kernel crash dumps


Introduction

It’s common, when analysing a kernel crash dump, to look at kernel tasks’ stack backtraces in order to see what the tasks are doing, e.g. what nested function calls led to the current position; this is easily displayed by the crash utility. We often also want to know the arguments to those function calls; unfortunately these are not so easily displayed.

This blog will illustrate some techniques for extracting kernel function call arguments, where possible, from the crash dump. Several worked examples are given. The examples are from the Oracle UEK kernel, but the techniques are applicable to any Linux kernel.

Note: The Python-Crash API toolkit pykdump includes the command fregs, which automates some of this process. However, it is useful to study how to do it manually, in order to understand what’s going on, and to be able to do it when pykdump may not be available, or if fregs fails to produce the desired result.

Basics

This section gives the minimum detail needed to use the techniques. Background explanatory detail will be given in a subsequent section.

You need to know a little bit of the x86 instruction set, but not much. You can get started knowing just that mov %r12,%rdi places the contents of cpu register %r12 into register %rdi, and that mov 0x8(%r14),%rcx takes the contents of register %r14, adds 8 to it, takes the result as the address of a memory location, reads the value at that memory location and puts it into register %rcx.
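For example, read in isolation (an annotated sketch of those two instruction forms):

mov %r12,%rdi          # register-to-register: copy the contents of %r12 into %rdi
mov 0x8(%r14),%rcx     # memory load: read the value at address (%r14 + 8) into %rcx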

For the Linux kernel running on the x86-64 architecture, kernel function arguments are normally passed using 64-bit cpu registers. The first six arguments to a function are passed in the following cpu registers, respectively: %rdi, %rsi, %rdx, %rcx, %r8, %r9.

To slightly complicate matters, these 64-bit register contents may be accessed via shorter subsets under different names (for compatibility with previous 32/16/8-bit instruction sets), as shown in the following table:

  64-bit    32-bit    16-bit    8-bit
  %rdi      %edi      %di       %dil
  %rsi      %esi      %si       %sil
  %rdx      %edx      %dx       %dl
  %rcx      %ecx      %cx       %cl
  %r8       %r8d      %r8w      %r8b
  %r9       %r9d      %r9w      %r9b

The contents of these registers are not preserved (for every kernel function call) in the crash dump. However, it is often possible to extract the values that were placed into these registers, from the kernel stack, or from memory.

In order to find out the arguments supplied to a given function, we need to look at the disassembled code for either the calling function, the called function, or both.

If we disassemble the caller’s code, we can see where it obtained the values to place into the registers used to pass the arguments to the called function. We can also disassemble the called function, to see if it stored the values it was passed in those registers, to its stack or to memory.

We may then be able to extract those values from the same place, if that is either on the kernel stack, or in kernel memory (if it has not been subsequently changed), both of which are normally included in the crash dump.

The techniques shown cover the different ways that the compiler might have chosen to obtain or store the values that are passed in the registers (a short illustrative sketch follows the list below):

  • The calling function might retrieve the values from:
    • Memory
    • The calling function’s stack, via an offset from the (fixed) base of the stack
    • Another register, which got its value from one of these above in turn
  • The called function might save the values it received in the registers, to:
    • Memory
    • The called function’s stack, via push
    • Another register, which might itself then be saved in turn

Therefore the technique we use is to:

  • Disassemble either or both of:
    • The calling function, leading up to the callq instruction, to see from where it obtains the values it passes to the called function in the registers
    • The called function, to see whether it puts the values from the registers onto its stack, or into memory
  • Inspect those areas (caller/callee stack and/or memory) to see if the values may be extracted

In some cases, it might not be possible to use any of the above methods to find the arguments passed. If so, consider looking at another level of the stack: it’s quite common for the same value to be passed down from one function to the next. So although you might be unsuccessful in recovering that argument’s value for your function of interest, the same value might be passed down to further functions (or might itself have been passed down from earlier functions), and you might have more luck finding it by applying the same methods to one of those other functions. The value might also be contained in another structure, which may itself be passed as a function argument. A knowledge of the code in question obviously helps in this case.

Finding a function’s stack frame base pointer

For some of the methods noted above, we will need to know how to find the (fixed) base of a kernel function’s stack. Whilst the function is executing, this is stored in the %rbp register, the function’s stack frame base pointer. We may find it in two places in the kernel task’s stack backtrace.

To show a kernel task’s stack backtrace, use bt -sx:

Note: -sx tells bt to try to show addresses as a symbol name plus a hex offset.

The above lists the function calls found in the stack; we also want to see the actual stack content, i.e. the content of the stack frames, for each function call. Let’s say we are interested in the arguments passed to the mutex_lock call; we may therefore need to look at its caller, which is do_last, so let’s concentrate on its stack frame:

Note: -FF tells bt to show all the data for a stack frame, symbolically, with its slab cache name, if appropriate.

The stack frame of the calling function do_last is shown above. Its stack frame base pointer 0xffff88180dc37d58 appears in two locations, shown highlighted with ***.

The stack frame base pointer, for a function, may be found:

  • As the second-last value in the stack frame above the function (i.e. above in the bt output)
  • As the location of the second-last value in the stack frame for the function

For now, just use the above to find the value of the stack frame base pointer, for a function, if you need it. The structure of the stack frame will be explained in the following section.

Summary of steps

  1. Note which registers you need, corresponding to the position of the called function’s arguments you need
    1. Refer to the register-naming table above, in case the quantities passed are smaller than 64-bit, e.g. integers, other non-pointer types. The 1st argument will be passed in %rdi, %edi, %di or %dil. Note that all the names contain "di".
  2. Disassemble the calling function, and inspect the instructions leading up to where it calls the function you’re interested in. Note from where the compiler gets the values it places in those registers
    1. If from the stack, find the caller’s stack frame base pointer, and from there find the value in the stack frame
    2. If from memory, can you calculate the memory address used? If so, read the value from memory
    3. If from another register, from where was that register’s contents obtained? And see case 3.3 below.
  3. Disassemble the first part of the called function. Note where it stores the values passed in the registers you need
    1. If onto the stack, find the called function’s stack frame base pointer, and find the value in the stack frame
    2. If from memory, can you calculate the memory address used? If so, read the value from memory
    3. If the calling function obtained the value from another register (case 2.3 above) does the called function save that register to stack/memory?
  4. If none of the above gave a usable result, see if the values you need are passed to another function call further up or down the stack, or may be derived from a different value.
    1. For example the structure you want is referenced from another structure that is passed to a function elsewhere in the stack trace
  5. Once you’ve obtained answers, perform a sanity check
    1. Is the value obtained on a slab cache? If so, is the cache of the expected type?
    2. Is the value, or what it points to, of the expected type?
    3. If the value is a pointer to a structure, does the structure content look correct? e.g. pointers where pointers are expected, function op pointers pointing to real functions, etc
  6. Read the Caveats section, to understand whether you can rely on the answer you’ve found

At this point, you may either skip directly to the Worked Examples, or read on for more detail.

In more depth

This section gives more background. If you’re in a hurry, skip directly to the Worked Examples, and come back and read this later; it may help in understanding what’s going on, and in identifying edge cases and other apparently odd behaviour.

In Linux on x86-64, the kernel stack grows down, from larger towards smaller memory addresses. That is, a particular function’s stack frame grows downwards from its fixed stack frame base pointer %rbp, with new elements added via the push instruction. A push first decrements the current stack pointer %rsp (which points to the item on the "top" (lowest memory address) of the stack) by the size of the element being pushed, then copies its operand to the stack location now pointed at by %rsp, leaving %rsp pointing at the new item on top of the stack.
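For example (a sketch), a single push of a 64-bit register does the equivalent of:

push %r13        # %rsp = %rsp - 8, then the value of %r13 is stored at the address now in %rsp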

However, the bt command shows the stack in ascending memory order as you read down the page. Therefore we may imagine the bt display of a stack frame as being like a pile of magazines stacked up on a table. The top line shown is the top of the stack, which is stored in the stack pointer register %rsp, and is where new items are pushed onto the stack (magazines added to the pile). The bottom line is the stack frame base (the table), which is fixed, stored in the stack frame base pointer %rbp (yes, I’m neglecting the function’s return address here).

Kernel function stack frame layout

The stack consists of multiple frames, one per function. A kernel function’s stack frame (described here in the ascending memory order in which bt displays it) is built up in stages, as follows. When a function is called, the callq instruction does two things:

  • Pushes the address of the instruction following the callq instruction onto the stack (still the caller’s stack frame, at this point). This will be the return address for the called function.
  • Jumps to the first instruction in the called function.

At this point, what will become the stack frame for the called function contains just that return address.

The compiler inserts the following preamble instructions at the start of most called functions:

push   %rbp
mov    %rsp,%rbp

The push puts the caller’s stack frame base pointer on top of the stack frame, just above the return address.

The mov then changes the stack frame base pointer register %rbp to also point to the top of the stack, i.e. at that saved copy of the caller’s %rbp.

From now on, %rsp gets decremented as we add things to the top of the stack, but %rbp remains fixed, denoting the base of the stack. Below that, at the very bottom, is the return address, which records the location of the instruction after the callq instruction, in the calling function (this is the instruction to which this called function will return, via the retq instruction). From now on, the called function’s stack frame looks like this:
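A sketch of that layout, in the ascending-address order that bt uses:

  ... values pushed by the called function (push %r13, push %r12, ...)   <- %rsp moves up here with each push
  saved %rbp of the calling function      <- the address of this slot is the called function's %rbp
  return address (the instruction after the caller's callq); below this, the caller's own frame continues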

Caveats

  • The description here applies to arguments of simple type, e.g. integer, long, pointer, etc. Things are not quite the same for more complex types, e.g. struct (as a struct, not a pointer to a struct), float, etc. For more detail, refer to the References.
  • Remember that the crash dump contains data from when the system crashed, not from the point of execution of the instruction you may be looking at. For example:
    • Memory content will be that of when the system crashed, which may be many function calls deeper in the stack below where you are looking, some of which may have overwritten that area of memory
    • Stack frame content will be that of when the function (whose stack frame you’re looking at) called the next-deeper function. If the function you’re looking at went on to modify that stack location, before calling the next-deeper function, that is what you will see when you look at the stack frame
  • Remember that there may be more than one code path branch within a function, leading to a callq instruction. The different paths may populate the function-call registers from different sources, and/or with different values. Your linear reading of the instructions leading up to the callq may not be the path that the code took in every instance.


References

  1. https://cs61.seas.harvard.edu/site/2018/Asm2/
  2. https://www.wikiwand.com/en/X86_calling_conventions#/x86-64_calling_conventions

Worked examples

Example 1

In this example, we have a hanging task, stuck trying to fsync a file. We want to obtain the struct file pointer, and fl_owner for the file in question.

Let’s start by trying to find it via the arguments to filp_close.

Note: bt -l shows file and line number of each stack trace function call.

int filp_close(struct file *filp, fl_owner_t id)
typedef void *fl_owner_t;

The compiler will use registers %rdi & %rsi, respectively, to pass the two arguments to filp_close.

Let’s look at the full stack frame for filp_close:

Let’s disassemble the calling function put_files_struct, to see where the compiler obtains the values it will pass in registers to filp_close:

The first argument, the struct file pointer, will be passed in register %rdi. The compiler fills that register in this way:

We can’t easily retrieve the first argument using this method, since we don’t know the values of %rcx or %rax.

So how about the second argument? That is passed in register %rsi, which is populated from another register %r13:

0xffffffff8122d5e9 <put_files_struct+0x89>: mov %r13,%rsi

That’s not immediately helpful, until we notice what the called function filp_close does, immediately after being called:

crash7latest> dis -x filp_close | head
0xffffffff8120b1b0 <filp_close>: push %rbp
0xffffffff8120b1b1 <filp_close+0x1>: mov %rsp,%rbp
0xffffffff8120b1b4 <filp_close+0x4>: push %r13

Notice that filp_close pushes %r13 onto its stack. This is the first push instruction that is done by filp_close after its initial push of %rbp. Let’s look again at the stack frame for filp_close:

#13 [ffff8807b1f1fc20] filp_close+0x36 at ffffffff8120b1e6
ffff8807b1f1fc28: ffffffffffffffff 000000000000ffff
ffff8807b1f1fc38: 0000000000000000 [ffff8800c0b21b80:files_cache]
ffff8807b1f1fc48: ffff8807b1f1fc98 put_files_struct+145

Referring back to the Basics section, we can identify the stack frame base pointer %rbp for filp_close as 0xffff8807b1f1fc48.

Referring back to the stack frame layout section, we can see that the stack for filp_close starts at the very bottom with its return address put_files_struct+145. The next address "up" (in the bt display) is location 0xffff8807b1f1fc48, which is filp_close's stack frame base pointer %rbp. It contains a pointer to the parent (put_files_struct) stack frame base pointer 0xffff8807b1f1fc98. From then on "up" are the normal stack pushes done by filp_close. Since the push of %r13 is the first push (following the preamble push of %rbp), we find it next: 0xffff8800c0b21b80, which is the value of fl_owner_t id.

To find on the stack the content of push number n (following the preamble push of %rbp), we calculate the address: %rbp - (n * 8). In this case, n == 1, the first push, so:

crash7latest> px (0xffff8807b1f1fc48 - 1*8)
$1 = 0xffff8807b1f1fc40

Note: px prints the expression in hex; we then read the contents at that address:

crash7latest> rd 0xffff8807b1f1fc40
ffff8807b1f1fc40: ffff8800c0b21b80

Thus we find the value of fl_owner_t id == 0xffff8800c0b21b80.

We could, of course, simply have walked "up" the stack frame visually, counting pushes, rather than manually calculating the address.

We still need to find the first argument, the struct file, but we may find that elsewhere, in another function on the stack… it is also the first argument of:

int vfs_fsync(struct file *file, int datasync)

and so will be passed in register %rdi.

Here’s the relevant extract from the stack backtrace:

#11 [ffff8807b1f1fbf0] vfs_fsync+0x1c at ffffffff8124123c
#12 [ffff8807b1f1fc00] nfs_file_flush+0x80 at ffffffffc02d2630 [nfs]

Let’s disassemble the caller, leading up to the call:

In the caller’s disassembly we see that register %rdi — the first argument to vfs_fsync — is populated from register %rbx. (Whilst we’re here, note that the second argument is passed in register %esi, which is the 32-bit subset of the 64-bit register %rsi, since the second argument is an integer: int datasync)

Now disassemble the called function:

We see that vfs_fsync does not save %rbx on its stack, but nor does it alter it before calling vfs_fsync_range. Now disassemble the latter:

crash7latest> dis -x vfs_fsync_range | head
0xffffffff81241170 <vfs_fsync_range>: push %rbp
0xffffffff81241171 <vfs_fsync_range+0x1>: mov %rsp,%rbp
0xffffffff81241174 <vfs_fsync_range+0x4>: push %r14
0xffffffff81241176 <vfs_fsync_range+0x6>: push %r13
0xffffffff81241178 <vfs_fsync_range+0x8>: push %r12
0xffffffff8124117a <vfs_fsync_range+0xa>: push %rbx

We see that vfs_fsync_range saves %rbx to its stack. It’s the fourth push (after the preamble).

Find vfs_fsync_range’s stack frame base pointer: 0xffff8807b1f1fbe8. Use the method shown in the Basics section.

Find the value four 8-byte values (four pushes) up from the stack frame base:

crash7latest> px (0xffff8807b1f1fbe8 - 4*8)
$4 = 0xffff8807b1f1fbc8
crash7latest> rd 0xffff8807b1f1fbc8
ffff8807b1f1fbc8: ffff8807eb507b00

We have found our value:

struct file *file = 0xffff8807eb507b00

Perform a sanity check; let’s check that the file structure’s ops pointers point to an NFS function:

crash7latest> struct -p file.f_op ffff8807eb507b00 | grep llseek
  llseek = 0xffffffffc034d2c0,

crash7latest> dis 0xffffffffc034d2c0 1
0xffffffffc034d2c0 <nfs4_file_llseek>: push %rbp

Example 2

Here’s a UEK4 dump, where a process is hung, blocked waiting for a mutex.

We want to find the mutex on which it is waiting. Let’s see how do_last calls mutex_lock:

dis -r (reverse) displays all instructions from the start of the routine up to and including the designated address.

dis -x overrides the default output format with hexadecimal format.

The first arg to mutex_lock is passed in %rdi, which is populated like this:

0xffffffff8121409d <do_last+0x36d>: mov -0x48(%rbp),%rax
0xffffffff812140a1 <do_last+0x371>: mov 0x30(%rax),%rax
0xffffffff812140a5 <do_last+0x375>: lea 0xa8(%rax),%rdi

We need to start with do_last‘s stack frame (base) pointer %rbp.

That may be found here:

crash7latest> bt -FFsx
...
    ffff88180dc37ca8: ***ffff88180dc37d58*** do_last+901
 #5 [ffff88180dc37cb0] do_last+0x385 at ffffffff812140b5

do_last's %rbp is ffff88180dc37d58: the highlighted value, stored at stack location ffff88180dc37ca8 just above do_last's frame in the bt output.

Then we can emulate the effects of the mov/lea instructions, to arrive at the value that do_last put into %rdi:
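As a sketch of that emulation (the crash $ result number is illustrative): do_last's %rbp is 0xffff88180dc37d58, so -0x48(%rbp) is:

crash7latest> px (0xffff88180dc37d58 - 0x48)
$2 = 0xffff88180dc37d10
crash7latest> rd 0xffff88180dc37d10       <- this read returns the dentry pointer that the mov places into %rax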

We can also note that since we’ve offset from the stack frame pointer %rbp, this value is on the stack, and bt will tell us more about it, specifically whether it’s part of a slab cache and, if so, which one:

Address 0xffff88180dc37d10 contains a pointer to something from the dentry slab cache, i.e. a dentry.

At this point, we have the dentry pointer in %rax. The next instruction offsets 0x30 from the dentry:

struct -o shows member offsets when displaying structure definitions; if used with an address or symbol argument, each member will be preceded by its virtual address.

struct -x overrides default output format with hexadecimal format.
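As a sketch (assuming this kernel's struct dentry layout, in which the member at offset 0x30 is d_inode), the offset can be checked like this:

crash7latest> struct -ox dentry | grep d_inode
  [0x30] struct inode *d_inode;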

So what this instruction loads into %rax is the inode (the dentry’s d_inode member).

The next instruction offsets 0xa8 from the inode:
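Again as a sketch (assuming this kernel's struct inode layout, in which the member at offset 0xa8 is the inode mutex, i_mutex):

crash7latest> struct -ox inode | grep i_mutex
  [0xa8] struct mutex i_mutex;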

So the above is the mutex, and this (0xffff881d4603e6a8) is what ends up in %rdi, which becomes the first arg to mutex_lock, as expected:

void __sched mutex_lock(struct mutex *lock)

Having found the mutex, we would likely want to find its owner:
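A sketch of that lookup (the same pattern is used for real in Example 3 below); the owner field holds a pointer to the task_struct of the holding task:

crash7latest> mutex.owner 0xffff881d4603e6a8
  owner = <task_struct pointer of the holding task>
crash7latest> task <that pointer> | head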

Example 3

In this example, we look at a UEK3 crash dump, from a system where processes were spending a lot of time in ‘D’ state waiting for an NFS server to respond. The system crashed since hung_task_panic was set (which is not a good idea on a production system, and should never be set on an NFS client or server).

Looking at the hung task:

From the hung task traceback, we can see that nfs_getattr is stuck waiting on a mutex. It wants to write back dirty pages before performing the getattr call, and it grabs the inode mutex to keep other writers out whilst the write-back is in progress. So, we need to find out who’s got that inode mutex.

Let’s see how nfs_getattr calls mutex_lock:

The first arg to mutex_lock is the mutex, which is passed in %rdi.

We can see that nfs_getattr fills %rdi from %rdx, before calling mutex_lock, but it also stores %rdx at an offset from its stack frame base pointer %rbp:

0xffffffffc09e5dc5 <nfs_getattr+0x1a5>: mov %rdx,-0x38(%rbp)

We can get nfs_getattr’s %rbp here:

So let’s calculate that offset:

crash7latest> px (0xffff9c88b79bfe30-0x38)
$1 = 0xffff9c88b79bfdf8

and read the value there, which will be the value of %rdx, i.e. the address of the mutex:

crash7latest> rd -x 0xffff9c88b79bfdf8
ffff9c88b79bfdf8: ffff9c8ed6bb4808

Note: rd reads memory

Let’s see who owns it:

crash7latest> mutex.owner 0xffff9c8ed6bb4808
  owner = 0xffff9c5f0ab25140
crash7latest> task 0xffff9c5f0ab25140 | head
PID: 30365  TASK: ffff9c5f0ab25140  CPU: 5  COMMAND: "ls"

and what is that ls task doing?

So, the mutex is held by another NFS getattr task, that is in the process of performing the write-back. This is likely just part of the normal NFS writeback, blocked by congestion, a slow server, or some other interruption.

As mentioned already, it is in general advised never to set hung_task_panic on a production NFS system (client or server).

Mutation XSS via namespace confusion – DOMPurify < 2.0.17 bypass

Original text by MICHAŁ BENTKOWSKI

In this blogpost I’ll explain my recent bypass in DOMPurify – the popular HTML sanitizer library. In a nutshell, DOMPurify’s job is to take an untrusted HTML snippet, supposedly coming from an end-user, and remove all elements and attributes that can lead to Cross-Site Scripting (XSS).

This is the bypass:

<form><math><mtext>
</form><form><mglyph>
<style></math><img src onerror=alert(1)>

Believe me that there’s not a single element in this snippet that is superfluous 🙂

To understand why this particular code worked, I need to give you a ride through some interesting features of HTML specification that I used to make the bypass work.

Usage of DOMPurify

Let’s begin with the basics, and explain how DOMPurify is usually used. Assuming that we have untrusted HTML in htmlMarkup and we want to assign it to a certain div, we use the following code to sanitize it using DOMPurify and assign it to the div:

div.innerHTML = DOMPurify.sanitize(htmlMarkup)

In terms of parsing and serializing HTML as well as operations on the DOM tree, the following operations happen in the short snippet above:

  1. htmlMarkup is parsed into the DOM Tree.
  2. DOMPurify sanitizes the DOM Tree (in a nutshell, the process is about walking through all elements and attributes in the DOM tree, and deleting all nodes that are not in the allow-list).
  3. The DOM tree is serialized back into the HTML markup.
  4. After assignment to innerHTML, the browser parses the HTML markup again.
  5. The parsed DOM tree is appended into the DOM tree of the document.

Let’s see that on a simple example. Assume that our initial markup is A<img src=1 onerror=alert(1)>B. In the first step it is parsed into the following tree:

Then, DOMPurify sanitizes it, leaving the following DOM tree:

Then it is serialized to:

A<img src="1">B

And this is what DOMPurify.sanitize returns. Then the markup is parsed again by the browser on assignment to innerHTML:

The DOM tree is identical to the one that DOMPurify worked on, and it is then appended to the document.

So to put it shortly, we have the following order of operations: parsing ➡️ serialization ➡️ parsing. The intuition may be that serializing a DOM tree and parsing it again should always return the initial DOM tree. But this is not true at all. There’s even a warning in the HTML spec in a section about serializing HTML fragments:

It is possible that the output of this algorithm [serializing HTML], if parsed with an HTML parser, will not return the original tree structure. Tree structures that do not roundtrip a serialize and reparse step can also be produced by the HTML parser itself, although such cases are typically non-conforming.

The important take-away is that serialize-parse roundtrip is not guaranteed to return the original DOM tree (this is also a root cause of a type of XSS known as mutation XSS). While usually these situations are a result of some kind of parser/serializer error, there are at least two cases of spec-compliant mutations.

Nesting FORM element

One of these cases is related to the FORM element. It is quite a special element in HTML, because it cannot be nested within itself. The specification is explicit that it cannot have any descendant that is also a FORM:

This can be confirmed in any browser, with the following markup:

<form id=form1>INSIDE_FORM1<form id=form2>INSIDE_FORM2

Which would yield the following DOM tree:

The second form is completely omitted from the DOM tree, just as if it was never there.

Now comes the interesting part. If we keep reading the HTML specification, it actually gives an example showing that, with slightly broken markup containing mis-nested tags, it is possible to create nested forms. Here it comes (taken directly from the spec):

<form id="outer"><div></form><form id="inner"><input>

It yields the following DOM tree, which contains a nested form element:

This is not a bug in any particular browser; it results directly from the HTML spec, and is described in the algorithm of parsing HTML. Here’s the general idea:

  • When you open a <form> tag, the parser keeps a record of the fact that it was opened, using the form element pointer (that’s what it is called in the spec). If the pointer is not null, then a new form element cannot be created.
  • When a </form> end tag is seen, the form element pointer is always set back to null.

Thus, going back to the snippet:

<form id="outer"><div></form><form id="inner"><input>

In the beginning, the form element pointer is set to the one with id="outer". Then, a div is started, and the </form> end tag sets the form element pointer to null. Because it is null, the next form, with id="inner", can be created; and because we’re currently within the div, we effectively have a form nested in a form.

Now, if we try to serialize the resulting DOM tree, we’ll get the following markup:

<form id="outer"><div><form id="inner"><input></form></div></form>

Note that this markup no longer has any mis-nested tags. And when the markup is parsed again, the following DOM tree is created:

So this is a proof that serialize-reparse roundtrip is not guaranteed to return the original DOM tree. And even more interestingly, this is basically a spec-compliant mutation.
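This is easy to confirm from the browser’s DevTools console (a minimal sketch; the variable names are arbitrary):

const div1 = document.createElement('div');
div1.innerHTML = '<form id="outer"><div></form><form id="inner"><input>';
console.log(div1.querySelectorAll('form').length);  // 2 - the parser created nested forms
// serialize and reparse, which is exactly what a second innerHTML assignment does
const div2 = document.createElement('div');
div2.innerHTML = div1.innerHTML;
console.log(div2.querySelectorAll('form').length);  // 1 - the inner form is gone after the roundtrip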

Since the very moment I was made aware of this quirk, I’ve been pretty sure that it must be possible to somehow abuse it to bypass HTML sanitizers. And after a long time of not getting any ideas of how to make use of it, I finally stumbled upon another quirk in HTML specification. But before going into the specific quirk itself, let’s talk about my favorite Pandora’s box of the HTML specification: foreign content.

Foreign content

Foreign content is like a Swiss Army knife for breaking parsers and sanitizers. I used it in my previous DOMPurify bypass as well as in a bypass of the Ruby sanitize library.

The HTML parser can create a DOM tree with elements of three namespaces:

  • HTML namespace (http://www.w3.org/1999/xhtml)
  • SVG namespace (http://www.w3.org/2000/svg)
  • MathML namespace (http://www.w3.org/1998/Math/MathML)

By default, all elements are in the HTML namespace; however, if the parser encounters an <svg> or <math> element, it “switches” to the SVG or MathML namespace respectively. And both of these namespaces make foreign content.

In foreign content, markup is parsed differently than in ordinary HTML. This can be shown most clearly with the parsing of the <style> element. In the HTML namespace, <style> can only contain text; it has no descendants, and HTML entities are not decoded. The same is not true in foreign content: foreign content’s <style> can have child elements, and entities are decoded.

Consider the following markup:

<style><a>ABC</style><svg><style><a>ABC

It is parsed into the following DOM tree

Note: from now on, all elements in the DOM tree in this blogpost will contain a namespace. So html style means that it is a <style> element in HTML namespace, while svg style means that it is a <style> element in SVG namespace.

The resulting DOM tree proves my point: html style has only text content, while svg style is parsed just like an ordinary element.
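The namespaces are easy to verify from the console (a sketch):

const d = document.createElement('div');
d.innerHTML = '<style><a>ABC</style><svg><style><a>ABC';
console.log(d.querySelector('style').namespaceURI);        // http://www.w3.org/1999/xhtml
console.log(d.querySelector('svg style').namespaceURI);    // http://www.w3.org/2000/svg
console.log(d.querySelector('svg style').children.length); // 1 - the <a> inside it is a real child element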

Moving on, it may be tempting to make a certain observation: that if we are inside <svg> or <math>, then all elements are also in a non-HTML namespace. But this is not true. There are certain elements in the HTML specification called MathML text integration points and HTML integration points. The children of these elements have HTML namespace (with certain exceptions I’m listing below).

Consider the following example:

<math><style></style><mtext><style></style>

It is parsed into the following DOM tree:

Note how the style element that is a direct child of math is in MathML namespace, while the style element in mtext is in HTML namespace. And this is because mtext is a MathML text integration point and makes the parser switch namespaces.

MathML text integration points are:

  • math mi
  • math mo
  • math mn
  • math ms

HTML integration points are:

  • math annotation-xml if it has an attribute called encoding whose value is equal to either text/html or application/xhtml+xml
  • svg foreignObject
  • svg desc
  • svg title

I always assumed that all children of MathML text integration points or HTML integration points have HTML namespace by default. How wrong was I! The HTML specification says that children of MathML text integration points are by default in HTML namespace with two exceptions: mglyph and malignmark. And this only happens if they are a direct child of a MathML text integration point.

Let’s check that with the following markup:

<math><mtext><mglyph></mglyph><a><mglyph>

Notice that the mglyph that is a direct child of mtext is in MathML namespace, while the one that is a child of the html a element is in HTML namespace.
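Again, quickly checkable from the console (a sketch):

const el = document.createElement('div');
el.innerHTML = '<math><mtext><mglyph></mglyph><a><mglyph>';
const glyphs = el.querySelectorAll('mglyph');
console.log(glyphs[0].namespaceURI);  // http://www.w3.org/1998/Math/MathML - direct child of mtext
console.log(glyphs[1].namespaceURI);  // http://www.w3.org/1999/xhtml - child of the <a> element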

Assume that we have a “current element”, and we’d like to determine its namespace. I’ve compiled some rules of thumb:

  • The current element is in the namespace of its parent, unless one of the conditions below is met.
  • If the current element is <svg> or <math> and its parent is in the HTML namespace, then the current element is in the SVG or MathML namespace respectively.
  • If the parent of the current element is an HTML integration point, then the current element is in the HTML namespace unless it is <svg> or <math>.
  • If the parent of the current element is a MathML text integration point, then the current element is in the HTML namespace unless it is <svg>, <math>, <mglyph> or <malignmark>.
  • If the current element is one of <b>, <big>, <blockquote>, <body>, <br>, <center>, <code>, <dd>, <div>, <dl>, <dt>, <em>, <embed>, <h1>, <h2>, <h3>, <h4>, <h5>, <h6>, <head>, <hr>, <i>, <img>, <li>, <listing>, <menu>, <meta>, <nobr>, <ol>, <p>, <pre>, <ruby>, <s>, <small>, <span>, <strong>, <strike>, <sub>, <sup>, <table>, <tt>, <u>, <ul>, <var>, or <font> with color, face or size attributes defined, then all elements on the stack are closed until a MathML text integration point, an HTML integration point, or an element in the HTML namespace is seen. Then, the current element is also in the HTML namespace.

When I found this gem about mglyph in the HTML spec, I immediately knew that it was what I’d been looking for in terms of abusing the html form mutation to bypass the sanitizer.

DOMPurify bypass

So let’s get back to the payload that bypasses DOMPurify:

<form><math><mtext></form><form><mglyph><style></math><img src onerror=alert(1)>

The payload makes use of the mis-nested html form elements, and also contains an mglyph element. It produces the following DOM tree:

This DOM tree is harmless. All elements are in the allow-list of DOMPurify. Note that mglyph is in HTML namespace. And the snippet that looks like an XSS payload is just text within html style. Because there’s a nested html form, we can be pretty sure that this DOM tree is going to be mutated on reparsing.

So DOMPurify has nothing to do here, and returns a serialized HTML:

<form><math><mtext><form><mglyph><style></math><img src onerror=alert(1)></style></mglyph></form></mtext></math></form>

This snippet has nested form tags. So when it is assigned to innerHTML, it is parsed into the following DOM tree:

So now the second html form is not created and mglyph is now a direct child of mtext, meaning it is in MathML namespace. Because of that, style is also in MathML namespace, hence its content is not treated as text. Then </math> closes the <math> element, and now img is created in HTML namespace, leading to XSS.

Summary

To summarize, this bypass was possible because of a few factors:

  • The typical usage of DOMPurify makes the HTML markup be parsed twice.
  • The HTML specification has a quirk making it possible to create nested form elements. However, on reparsing, the second form will be gone.
  • mglyph and malignmark are special elements in the HTML spec, in that they are in MathML namespace if they are a direct child of a MathML text integration point, even though all other tags are in HTML namespace there by default.
  • Using all of the above, we can create markup that has two form elements and an mglyph element that is initially in HTML namespace, but on reparsing ends up in MathML namespace, making the subsequent style tag be parsed differently and leading to XSS.

After Cure53 pushed an update fixing my bypass, another one was found and posted on Twitter:

I leave it as an exercise for the reader to figure out why that payload worked. Hint: the root cause is the same as in the bug I found.

The bypass also made me realize that the pattern of

div.innerHTML = DOMPurify.sanitize(html)

is prone to mutation XSS by design, and it’s just a matter of time until another instance is found. I strongly suggest that you pass the RETURN_DOM or RETURN_DOM_FRAGMENT options to DOMPurify, so that the serialize-parse roundtrip is not executed.
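For example (a sketch, reusing htmlMarkup and div from the earlier snippet):

// Ask DOMPurify for a DocumentFragment instead of a string, so no serialize-reparse roundtrip happens
const fragment = DOMPurify.sanitize(htmlMarkup, { RETURN_DOM_FRAGMENT: true });
div.appendChild(fragment);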

As a final note, I found the DOMPurify bypass when preparing materials for my upcoming remote training called XSS Academy. While it hasn’t been officially announced yet, details (including agenda) will be published within two weeks. I will teach about interesting XSS tricks with lots of emphasis on breaking parsers and sanitizers. If you already know that you’re interested, please contact us on training@securitum.com and we’ll have your seat booked!