[Case study] Decrypt strings using Dumpulator

Original text by m4n0w4r

1. References
2. Code analysis

I received a suspicious Dll that needs to be analyzed. This Dll is packed. After unpacking it and throwing the Dll into IDA, IDA successfully analyzed it with over 7000 functions (including API/library function calls). Upon quickly examining at the Strings tab, I came across numerous strings in the following format:

Based on the information provided, I believe these strings have definitely been encrypted. Going through the code snippet using an arbitrary string, I found the corresponding assembly code and pseudocode as follows (function and variable names have been changed accordingly):

With the image above, it is easy to see:

  • The 
    <mark><strong>EAX</strong>&nbsp;</mark>
    register will hold the address of the encrypted string.
  • The 
    <mark><strong>EDX</strong>&nbsp;</mark>
    register will hold the address of the string after decryption.
  • The 
    <mark><strong>mw_decrypt_str_wrap</strong>&nbsp;</mark>
    function performs the task of decrypting the string.

Here, if any of you have the same idea of analyzing the 

<mark><strong>mw_decrypt_str_wrap</strong> </mark>
function to rewrite the IDApython code for decryption, congratulations to you 🙂 You share the same thought as me! The 
<mark><strong>mw_decrypt_str_wrap</strong> </mark>
function will call the 
<mark><strong>mw_decrypt_str</strong> </mark>
function.

After going around various functions and thinking about how to code, I started feeling increasingly discouraged. Moreover, when examining the cross-references to the 

<mark>mw_decrypt_str_wrap </mark>
function, I noticed that it was called over 4000 times to decrypt strings… WTF 😐

3. Use dumpulator

As shown in the above image, there are too many function calls to the decryption function. Moreover, rewriting this decryption function would be time-consuming and require code debugging for verification. I think I need to find a way to emulate this function to perform the decryption step and retrieve the decrypted string. Several solutions came to mind, and I also asked my brother, who suggested using x or y solutions. After some trial and error, I decided to try using dumpulator. To be able to use dumpulator, we first need to create a minidump file of this DLL (dump when halted at DllEntryPoint). After obtaining the dump file, I tested the following code snippet:

from dumpulator import Dumpulator
 
dec_str_fn = 0x02FE08C0
enc_str_offset = 0x02FD9988
 
dp = Dumpulator("mal_dll.dmp", quiet=True)
tmp_addr = dp.allocate(256)
dp.call(dec_str_fn, [], regs={'eax':enc_str_offset , 'edx': tmp_addr})
dec_str = dp.read_str(dp.read_long(tmp_addr))
print(f"Encrypted string: '{dp.read_str(enc_str_offset)}'")
print(f"Decrypted string: '{dec_str}'")

Result when executing the above code:

0ly Sh1T… 😂 that’s exactly what I wanted.

Next, I will rewrite the code according to my intention as follows:

  • Use regex to search for patterns and extract all encoded string addresses.
  • Filter out addresses that match the pattern but are not decryption functions or undefined addresses and add them to the 
    <mark>BLACK_LIST</mark>
    .

Here’s a lame code snippet that meets my needs:

import re
import struct
import pefile
from dumpulator import Dumpulator
 
dump_image_base = 0x2F80000
dec_str_fn = 0x02FE08C0
 
BLACK_LIST = [0x3027520, 0x30380b6, 0x30380d0, 0x3039a08, 0x3039169, 0x303a6b6, 0x303aa0e, 0x303ab5c, 0x303bbf3, 0x3066075, 0x306661b, 0x3083e50,
              0x3084373, 0x30856d1, 0x30858aa, 0x308c7ac, 0x308d02d, 0x30acbfd, 0x30cd12e, 0x30cd187, 0x30cd670, 0x30cd6d4, 0x30cfe2f, 0x30d4cc4,
              0x3106da0]
 
FILE_PATH = 'dumped_dll.dll'
dp = Dumpulator("mal_dll.dmp", quiet=True)
 
file_data = open(FILE_PATH, 'rb').read()
pe = pefile.PE(data=file_data)
 
egg = rb'\x8D\x55.\xB8(....)\xE8....\x8b.'
tmp_addr = dp.allocate(256)
 
def decrypt_str(xref_addr, enc_str_offset):    
    print(f"Processing xref address at: {hex(xref_addr)}")
    print(f"Encryped string offset: {hex(enc_str_offset)}")
    dp.call(dec_str_fn, [], regs={'eax': enc_str_offset, 'edx': tmp_addr})
    dec_str = dp.read_str(dp.read_long(tmp_addr))
    print(f"{hex(xref_addr)}: {dec_str}\n")
    return dec_str
     
for m in re.finditer(egg, file_data):
    enc_str_offset = struct.unpack('<I', m.group(1))[0]
    inst_offset = m.start() 
    enc_str_offset_in_dmp = enc_str_offset - 0x400000 + dump_image_base
    call_fn_addr = inst_offset + 8 - 0x400 + dump_image_base + 0x1000
    if call_fn_addr not in BLACK_LIST:
        str_ret =  decrypt_str(call_fn_addr, enc_str_offset_in_dmp)
 
print(f"H0lY SH1T... IT's D0NE!!!")

Result when executing the above script:

No errors whatsoever 😈!!! As a final step, I added a code snippet to this script that will output a Python file. This file will contain the 

<mark><strong>idc.set_cmt</strong>&nbsp;</mark>
commands to set comment for the decrypted strings above at the address where the decrypt function is called. 

The final result is as follows:

End.

m4n0w4r