Advanced CyberChef Techniques For Malware Analysis — Detailed Walkthrough and Examples

Original by Matthew

We’re all used to the regular CyberChef operations like «From Base64», «From Decimal» and the occasional Magic decode or XOR. But what happens when we need to do something more advanced?

CyberChef contains many advanced operations that are often ignored in favour of Python scripting. Few are aware of the more complex operations of which CyberChef is capable. These include things like Flow Control, Registers and various Regular Expression capabilities.

In this post, we will break down some of the more advanced CyberChef operations and how these can be applied to develop a configuration extractor for a multi-stage malware loader.

Examples of Advanced Operations in CyberChef

Before we dive in, let’s look at a quick summary of the operations we will demonstrate. 

  • Registers 
  • Regular Expressions and Capture Groups
  • Flow Control Via Forking and Merging
  • Merging
  • Subtraction
  • AES Decryption

After demonstrating these individually to show the concepts, we will combine them all to develop a configuration extractor for a multi-stage malware sample.

Obtaining the Sample 

The sample demonstrated can be found on Malware Bazaar with


Advanced Operation 1 — Registers

Registers allow us to create variables within the CyberChef session and later reference them when needed. 

Registers are defined via a regular expression capture group and allow us to create a variable with an unknown value that fits a known pattern within the code. 

How To Use Registers in CyberChef

Below we have a Powershell script utilising AES decryption. 

Traditionally, this is easy to decode using CyberChef by manually copying out the key value and pasting it into an «AES Decrypt» Operation.

We can see the key copied into an AES Decrypt operation.

Manually copying out the key works, but it means that the key is «hardcoded» and the recipe will not apply to similar samples using the same technique.

If another sample utilises a different key, then this new key will need to be manually updated for the CyberChef recipe to work. 

Registers Example 1 

By utilising a «Register» operation, we can develop a regular expression to match the structure of the AES key and later access this via a register variable like $R0.

The AES key, in this case, is a 44-character base64 string, hence we can use a base64 regular expression of 44-46 characters to extract the AES Key. 

We can later access this via the $R0 variable inside of the AES Decrypt operation.
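For readers who script alongside CyberChef, the Register idea can be sketched in Python as a regex capture group. This is only an illustration: the script fragment, variable names and key below are invented, and only the pattern (a 44-character base64 string inside single quotes) comes from the sample described above.

```python
import base64
import re

# Hypothetical key: 32 bytes always encode to 44 base64 characters.
sample_key = base64.b64encode(bytes(range(32))).decode()

# Hypothetical PowerShell fragment; the variable names are invented.
script = f"$k = '{sample_key}'; $mode = 'short'"

# Capture a base64 run of exactly 44 characters inside single quotes.
KEY_PATTERN = re.compile(r"'([A-Za-z0-9+/=]{44})'")

match = KEY_PATTERN.search(script)
r0 = match.group(1)                 # plays the role of CyberChef's $R0
key_bytes = base64.b64decode(r0)    # the raw 32-byte key

print(r0 == sample_key, len(key_bytes))
```

Because the pattern describes the shape of the key rather than its value, the same code keeps working when a sibling sample ships a different key.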

Registers Example 2

In a previous stage of the same sample, the malware utilises a basic subtract operation to create ASCII char codes from an array of large integers.

Traditionally, this would be decoded by manually copying out the 787 value and applying this to a subtract operation. 

However, again, this causes issues if another sample utilises the same technique but with a different value. 

A better method is to create another register with a regular expression that matches the 787 value. 

Here we can see an example of this, where a Register has been used to locate and store the 787 value inside of $R0. This can later be referenced in a subtract operation by referencing $R0.

Regular Expressions

Regular expressions are frustrating, tedious and difficult to learn. But they are extremely powerful and you should absolutely learn them in order to improve your CyberChef and malware analysis capability.

In the development of this configuration extractor, regular expressions are applied in 10 separate operations. 

Regular Expressions — Use Case 1 (Registers)

The first use of regular expressions is inside of the initial register operation. 

Here, we have applied a regex to extract a key value used later as part of the deobfuscation process. 

The key use of regex here is to generically capture keys related to the decoding process, avoiding the need to hardcode values and allowing the recipe to work across multiple samples.

How To Use Regular Expressions to Isolate Text

The second use of regular expressions in this recipe is to isolate the main array of integers containing the second stage of the malware. 

The second stage is stored inside a large array of decimal values separated by commas and contained in round brackets. 

By specifying this inside of a regex, we can extract and isolate the large array and effectively ignore the rest of the code. This is in contrast to manually copying out the array and starting a new recipe.

A key benefit here is the ability to isolate portions of the code without needing to copy and paste. This enables you to continue working inside of the same recipe.

Regular Expressions — Use Case 3 (Appending Values)

Occasionally you will need to append values to individual lines of output. 

In these cases, a regular expression can be utilised to capture an entire line and then replace it with the same value (via capture group referenced in $1) followed by another value (our initial register).

The key use case is the ability to easily capture and append data, which is essential for operations like the subtract operator which will be later used in this recipe.

Regular Expressions — Use Case 4 (Extracting Encryption Keys)

We can utilise regular expressions inside of register operations to extract encryption keys and store these inside of variables. 

Here, we can see the 44-character AES key stored inside of the $R1 register. 

This is effective as the key is stored in a particular format across samples. Leveraging regex allows us to capture this format (44 char base64 inside single quotes) without needing to worry about the exact value.

Using Regular Expressions To Extract Base64 Text

Regular expressions can be used to isolate base64 text containing content of interest. 

This particular sample stores the final malware stage inside of a large AES Encrypted and Base64 encoded blob. 

Since we have already extracted the AES key via registers, we can apply the regex to isolate the primary base64 blob and later perform the AES Decryption.

Regular Expressions — Use Case 6 (Extracting Initial Characters)

This sample utilises the first 16 bytes of the base64 decoded content to create an IV for the AES decryption. 

We can leverage regular expressions and registers to extract the first 16 bytes of the decoded content by matching the first 16 characters of the data inside a capture group.

This enables us to capture the IV and later reference it via a register to perform the AES Decryption.

Using Regular Expressions To Remove Trailing Null-Bytes

Regular expressions can be used to remove trailing null bytes from the end of data. 

This is particularly useful as sometimes we only want to remove null bytes at the «end» of data, whereas a traditional «remove null bytes» will remove null bytes everywhere in the code.

In the sample here, there are trailing null bytes that are breaking a portion of the decryption process.

By applying a null byte search (\0+$), we can use a find/replace to remove these trailing null bytes.

In this case, the \0+ looks for one or more null bytes, and the $ specifies that this must be at the end of the data.

After applying this operation, the trailing null bytes are now removed from the end of the data.
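The difference between an anchored removal and a blanket removal can be sketched in Python. The byte string is made up for illustration, and \Z is used as Python's "very end of data" anchor in place of $.

```python
import re

# Hypothetical data: one null byte in the middle, three at the end.
data = b"AB\x00CD\x00\x00\x00"

# Only the trailing run of null bytes is removed; \Z anchors at the very end.
trimmed = re.sub(rb"\x00+\Z", b"", data)

# A blanket removal also destroys the null byte in the middle of the data.
all_removed = data.replace(b"\x00", b"")

print(trimmed)      # b'AB\x00CD'
print(all_removed)  # b'ABCD'
```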

How To Use a Fork in CyberChef

Forking allows us to separate values and act on each independently as if they were a separate recipe. 

In the use case below, we have a large array of decimal values, and we need to subtract 787 from every single one. A major issue here is that in order to subtract 787, we need to append 787 after every single decimal value in the screenshot below. This would be a nightmare to do by hand.

As the data is structured and separated by commas, we can apply a forking operation with a split delimiter of commas and a merge delimiter of newlines. 

The split delimiter is whatever separates the values in your data, but the merge delimiter is how you want your new data structured.

At this point, every new line represents a new input data, and all future operations will act on each line independently.

If we now apply a find-replace operation, we can see that the operation has affected each line individually.

If we had applied the same concept without a fork, only a single 787 would have been added to the end of the entire blob of decimal data.

After applying the find/replace, we can continue to apply a subtraction operation and a «From Decimal». 

This reveals the decoded text and the next stage of the malware.

Note that the «Merge Delimiter» mentioned previously is purely a matter of formatting. 

Once you have decoded your content, as in the screenshot above, you will want to remove the merge delimiter to ensure that all the decoded content is together. 

We can see the full script after removing the merge delimiter.
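As a rough Python analogue of the fork, find/replace, subtract and merge sequence above, the following sketch treats each comma-separated value as its own branch. The three-value array is invented for illustration; only the 787 value and the order of steps come from the walkthrough.

```python
KEY = 787  # subtraction value from the sample described above

# Hypothetical array: the char codes of "Hi!" each shifted up by 787.
blob = ",".join(str(ord(c) + KEY) for c in "Hi!")

# Fork: split on the comma delimiter so each value is handled on its own.
branches = blob.split(",")

# Find/replace: append the key to every branch, separated by a space.
branches = [f"{value} {KEY}" for value in branches]

# Subtract (space delimiter) followed by From Decimal, per branch.
decoded = []
for branch in branches:
    left, right = branch.split(" ")
    decoded.append(chr(int(left) - int(right)))

# Merge: rejoin with an empty delimiter to get one string back.
result = "".join(decoded)
print(result)
```

Joining on an empty string at the end corresponds to clearing the merge delimiter once the per-value work is done.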

How To Apply a Merge Operation in CyberChef

The Merge operation is essentially an «undo» for fork operations.

After successfully decoding content using a fork, you should apply a Merge to ensure that the new content can be analysed appropriately.

Without a merge, all future operations would affect only a single character and not the entire script.

CyberChef is capable of AES Decryption via the AES Decrypt operation.

To utilise AES decryption, look for the key indicators of AES inside of malware code and align all the variables with the AES operation. 

For example, align the Key, Mode, and IV. Then, plug these values into CyberChef.

Eventually, you can effectively automate this using Regular Expressions and Registers, as previously shown.

Configuration Extractor Walkthrough (22 Operations)

Utilising all of these techniques, we can develop a configuration extractor for a NetSupport Loader with 3 separate scripts that can all be decoded within the same recipe. 

This requires a total of 22 operations which will be demonstrated below.

The initial script is obfuscated using a large array of decimal integers. 

For each of these decimal values, the number 787 is subtracted and then the result is used as an ASCII charcode.

To decode this component, we must

  • Use a Register to extract the subtraction value
  • Use a Regular Expression to extract the decimal array
  • Use Forking to Separate each of the decimal values
  • Use a regular expression to append the 787 value stored in our register. 
  • Apply a Subtract operation to produce ASCII char codes
  • Apply a «From Decimal» to produce the 2nd stage
  • Use a Merge operation to enable analysis of the 2nd stage script. 

Operation 1 — Extracting Subtraction Value

The initial subtraction value can be extracted with a register operation and regular expression. 

This must be done prior to additional analysis and decoding to ensure that the subtraction value is stored inside of a register.

Operation 2 — Extracting the Decimal Array

The second step is to extract out the main decimal array using a regular expression and a capture group. 

The capture group ensures that we are isolating the decimal values and ignoring any script content surrounding it. 

This regex looks for a run of digits or commas, 1000 or more characters long, that is surrounded by round brackets.

The inner brackets without escapes form the capture group.
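One plausible way to write such a pattern is shown below in Python. The exact regex used in the recipe is not reproduced here, so treat the pattern as an assumption that matches the description; the script fragment and array contents are invented.

```python
import re

# Hypothetical stage-1 script: a short decoy array plus one long array.
long_array = ",".join(["859"] * 400)   # 400 values, 1599 characters in total
script = f"$a = (1,2,3); $b = ({long_array}); iex $b"

# \( and \) match literal brackets; the unescaped pair is the capture group.
ARRAY_PATTERN = re.compile(r"\(([\d,]{1000,})\)")

# findall returns only the captured group, i.e. the array without brackets.
arrays = ARRAY_PATTERN.findall(script)
print(len(arrays), len(arrays[0]))
```

The minimum length keeps short bracketed expressions such as the decoy array out of the result.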

Operation 3 — Separating the Decimal Values

The third operation leverages a Fork to separate the decimal values and act on each of them independently. 

The Fork defines a delimiter at the commas present in the original code, and specifies a Merge Delimiter of a newline to improve readability.

Operation 4 — Appending the Subtraction Value

The fourth operation uses a regex find/replace to append the 787 value to the end of each line created by the forking operation. 

Note that we have used a capture group to capture the original decimal value, and have then used $1 to access it again. The $R0 is used to access the register that was created in Operation 1.

Operation 5 — Subtracting the Values

We can now perform the subtraction operation after appending the 787 value in Operation 4. 

This produces the original ASCII char codes that form the second stage script. 

Note that we have specified a space delimiter, as this is what separates our decimal values from our subtraction values in operation 4.

Operation 6 — Decoding ASCII Code In CyberChef

We can now decode the ASCII codes using a «From Decimal» operation. 

This produces the original script. However, the values are separated via a newline due to our previous Fork operation.

Operation 7 — Merging the Result

We now want to act on the new script in its entirety; we do not want to act on each character independently.

Hence, we will undo our forking operation by applying a Merge Operation and modifying the «Merge Delimiter» of our previous fork to an empty space.

Stage 2 — Powershell Script With AES Encryption (8 Operations)

After 7 operations, we have now uncovered a 2nd stage Powershell script that utilises AES Encryption to unravel an additional stage. 

The key points in this script that are needed for decrypting are highlighted below.

To Decode this stage, we must be able to

  • Use Registers to Extract the AES Key
  • Use Regex to extract the Base64 blob
  • Decode the Base64 blob
  • Use Registers to extract an Initialization Vector
  • Remove the IV from the output
  • Perform the AES Decryption, referencing our registers
  • Use Regex to Remove Trailing NullBytes
  • Perform a GZIP Decompression to unlock stage 3

CyberChef Operation 8 — Extracting an AES Key

We must now extract the AES Key and store it using a Register operation. 

We can do this by applying a Register and creating a regex for base64 characters that are exactly 44 characters in length and surrounded by single quotes. (We could also adjust this to be a range from 42 to 46)

We now have the AES key stored inside of the $R1 register.

Operation 9 — Extracting the Base64 Blob

Now that we have the AES key, we can isolate the primary base64 blob that contains the next stage of the Malware. 

We can do this with a regular expression for Base64 text that is 100 or more characters in length. 

We’re also making sure to change the output format to «List Matches», as we only want the text that matches our regular expression.

Operation 10 — Decoding The Base64

This is a straightforward operation to decode the Base64 blob prior to the main AES Decryption.
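Operations 9 and 10 can be approximated in Python as a "list matches" regex followed by a base64 decode. The script fragment, the short decoy string and the payload are all hypothetical; only the "base64 text of 100 or more characters" rule comes from the walkthrough.

```python
import base64
import re

# Hypothetical payload and stage-2 fragment (names are invented).
payload = bytes(range(256))
blob = base64.b64encode(payload).decode()   # 344 characters
script = f"$k = 'c2hvcnQ='; $enc = '{blob}'; Invoke-Something $enc"

# Equivalent of a Regular expression operation set to "List matches".
BLOB_PATTERN = re.compile(r"[A-Za-z0-9+/=]{100,}")
matches = BLOB_PATTERN.findall(script)

# From Base64 on the single surviving match.
decoded = base64.b64decode(matches[0])
print(len(matches), len(decoded))
```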

Operation 11 — Extracting Initialization Vector

The first 16 bytes of the current data form the initialization vector for the AES decryption. 

We can extract this using another Register operation and specifying a regex that grabs the first 16 characters from the current blob of data.

We know that these bytes are the IV due to this code in the original script. 

Note how the first 16 bytes are taken after base64 decoding, and then this is set to the IV.

Operation 12 — Dropping the Initial 16 Bytes

The initial 16 bytes are ignored when the actual AES decryption process takes place. 

Hence, we need to remove them by using a «Drop bytes» operation with a length of 16.

We know this is the case because the script begins the decryption from an offset of 16 from the data.

This can be confirmed with the official documentation for TransformFinalBlock.
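In Python terms, Operations 11 and 12 amount to two slices of the decoded blob. The blob below is a stand-in built for illustration, not data from the sample.

```python
# Hypothetical decoded blob: 16 IV bytes followed by 32 bytes of ciphertext.
decoded = bytes(range(16)) + b"\xaa" * 32

iv = decoded[:16]          # Register: keep the first 16 bytes for later
ciphertext = decoded[16:]  # Drop bytes: discard those 16 bytes from the input

print(iv.hex(), len(ciphertext))
```

Keeping the IV in its own variable before dropping it mirrors the order of the recipe: the Register must run while the bytes are still present.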

Operation 13 — AES Decryption

Now that the Key and IV for the AES Decryption have been extracted and stored in registers, we can go ahead and apply an AES Decrypt operation. 

Note how we can access our key and IV via the $R1 and $R2 registers that were previously created. We do not need to specify a key here.

Also note that we do need to specify base64 and utf8 for the key and IV, respectively, as these were their formats at the time when we extracted them.

We can also note that ECB mode was chosen, as this is the mode specified in the script.

Operation 14 — Removing Trailing Null Bytes

The current data after AES Decryption is compressed using GZIP. 

However, Gunzip fails to execute due to some random null bytes that are present at the end of the data after AES Decryption.

Operation 14 involves removing these trailing null bytes using a Regular expression for «one or more null bytes \0+ at the end of the data $»

We will leave the «Replace» value empty as we want to remove the trailing null bytes.

Operation 15 — GZIP Decompression

We can now apply a Gunzip operation to perform the GZIP Decompression. 

This will reveal stage 3 of the malicious content, which is another Powershell script.

Note that we know Gzip was used as it is referenced in stage 2 after the AES Decryption process.

Stage 3 — Powershell Script (7 Operations)

We now have a stage 3 PowerShell script that leverages a very similar technique to stage 1. 

The obfuscated data is again stored in large decimal arrays, with the number 4274 subtracted from each value. 

Note that in this case, there are 4 total arrays of integers.

To Decode stage 3, we must perform the following actions

  • Use Registers to Extract The Subtraction Value
  • Use Regex to extract the decimal arrays
  • Use Forking to Separate the arrays
  • Use another Fork to Separate the individual decimal values
  • Use a find/replace to append the subtraction value
  • Perform the Subtraction
  • Restore the text from the resulting ASCII codes

Operation 16 — Extracting The Subtraction Value with Registers

Our first step of stage 3 is to extract the subtraction value and store it inside a register. 

We can do this by creating another register and implementing a regular expression to capture the value. We can specify a dollar sign, followed by characters, followed by equals, followed by integers, followed by a semicolon.

Apply a capture group (round brackets) to the decimal component, as we want to store and use this later.
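A minimal Python sketch of that pattern follows. The variable names and the surrounding script text are invented, and \w+ is an assumption for "characters"; only the overall shape and the 4274 value come from the description above.

```python
import re

# Hypothetical stage-3 fragment; the variable names are invented.
script = "$tqzvk=4274;$arr=(4346,4379);"

# Dollar sign, a variable name, equals, captured digits, semicolon.
VALUE_PATTERN = re.compile(r"\$\w+=(\d+);")

match = VALUE_PATTERN.search(script)
subtraction_value = int(match.group(1))   # stored for the later subtract step
print(subtraction_value)
```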

Operation 17 — Extracting and Isolating the Decimal Arrays

Now that we have the subtraction key, we can go ahead and use a regular expression to isolate the decimal arrays.

We have chosen a regex that looks for round brackets containing long sequences of integers and commas (at least 30). The inside of the brackets has been converted to a capture group by adding round brackets without escapes. 

We have also selected «List capture groups» to list only the captured decimal values and commas.

Operation 18 — Separating the Arrays With Forking

We can now separate the decimal arrays by applying a fork operation. 

The current arrays are separated by a new line, so we can specify this as our split delimiter. 

In the interests of readability, we can specify our merge delimiter as a double newline. The double newline does nothing except make the output easier to read.

Operation 19 — Separating the Decimal Values With another Fork

Now that we’ve isolated the arrays, we need to isolate the individual integer values so that we can append the subtraction value. 

We can do this with another Fork operation, specifying a comma delimiter (as this is what separates our decimal values) and a merge delimiter of newline. Again, this new line does nothing but improve readability.

Operation 20 — Appending Subtraction Values

With the decimal values isolated, we can use a previous technique to capture each line and append the subtraction key currently stored in the register created in Operation 16.

We can see the subtraction key appended to each line containing a decimal value.

Operation 21 — Applying the Subtraction Operation

We can now apply a subtract operation to subtract the value appended in the previous step. 

This restores the original ASCII char codes so we can decode them in the next step.

Operation 22 — Decoding the ASCII Codes

With the ASCII codes restored in their original decimal form, we can apply a from decimal operation to restore the original text. 

We can see the 

 string, albeit it is spaced out over newlines due to our forking operation.

Final Result — Extracting Malicious URLs

Now that the content is decoded, we can remove the readability step we added in Operation 19. 

That is, we can remove the «Merge Delimiter» that was added to improve the readability of steps 20 and 21.

With the «Merge Delimiter» removed, the output of the four decimal arrays will now be displayed.
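The nested forks of stage 3 can be pictured in Python as an outer loop over arrays and an inner loop over values. This is a sketch under assumptions: the two arrays and their placeholder URLs are invented, and only the 4274 value and the order of steps follow the walkthrough.

```python
KEY = 4274  # subtraction value from stage 3

def encode(text):
    # Helper used only to build hypothetical sample arrays.
    return ",".join(str(ord(c) + KEY) for c in text)

# Output of the capture-group listing: one array per line.
# The URLs are placeholders, not values from the sample.
listing = "\n".join([encode("http://a.example/x"), encode("http://b.example/y")])

decoded_arrays = []
for array in listing.split("\n"):                           # outer Fork
    chars = [chr(int(v) - KEY) for v in array.split(",")]   # inner Fork + Subtract
    decoded_arrays.append("".join(chars))                   # inner merge delimiter removed

for line in decoded_arrays:
    print(line)
```

Only the inner join uses an empty delimiter, so each array still decodes to its own line of output.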

A blueprint for evading industry leading endpoint protection in 2022

original text by vivami

About two years ago I quit being a full-time red team operator. However, it still is a field of expertise that stays very close to my heart. A few weeks ago, I was looking for a new side project and decided to pick up an old red teaming hobby of mine: bypassing/evading endpoint protection solutions.

In this post, I’d like to lay out a collection of techniques that together can be used to bypass industry leading enterprise endpoint protection solutions. This is purely for educational purposes for (ethical) red teamers and the like, so I’ve decided not to publicly release the source code. The aim for this post is to be accessible to a wide audience in the security industry, but not to drill down to the nitty gritty details of every technique. Instead, I will refer to writeups of others that deep dive better than I can.

In adversary simulations, a key challenge in the “initial access” phase is bypassing the detection and response capabilities (EDR) on enterprise endpoints. Commercial command and control frameworks provide unmodifiable shellcode and binaries to the red team operator that are heavily signatured by the endpoint protection industry. In order to execute that implant, the signatures (both static and behavioural) of that shellcode need to be obfuscated.

In this post, I will cover the following techniques, with the ultimate goal of executing malicious shellcode, also known as a (shellcode) loader:

1. Shellcode encryption

Let’s start with a basic but important topic, static shellcode obfuscation. In my loader, I leverage an XOR or RC4 encryption algorithm, because it is easy to implement and doesn’t leave a lot of external indicators of encryption activities performed by the loader. AES encryption to obfuscate static signatures of the shellcode leaves traces in the import address table of the binary, which increases suspicion. I’ve had Windows Defender specifically trigger on AES decryption functions (e.g. 

 etc.) in earlier versions of this loader.

Output of dumpbin /imports, an easy giveaway of only AES decryption functions being used in the binary.

2. Reducing entropy

Many AV/EDR solutions consider binary entropy in their assessment of an unknown binary. Since we’re encrypting the shellcode, the entropy of our binary is rather high, which is a clear indicator of obfuscated parts of code in the binary.

There are several ways of reducing the entropy of our binary, two simple ones that work are:

  1. Adding low entropy resources to the binary, such as (low entropy) images.
  2. Adding strings, such as the English dictionary or some of the output of
    "strings C:\Program Files\Google\Chrome\Application\100.0.4896.88\chrome.dll".

A more elegant solution would be to design and implement an algorithm that would obfuscate (encode/encrypt) the shellcode into English words (low entropy). That would kill two birds with one stone.

3. Escaping the (local) AV sandbox

Many EDR solutions will run the binary in a local sandbox for a few seconds to inspect its behaviour. To avoid compromising on the end user experience, they cannot afford to inspect the binary for longer than a few seconds (I’ve seen Avast taking up to 30 seconds in the past, but that was an exception). We can abuse this limitation by delaying the execution of our shellcode. Simply calculating a large prime number is my personal favourite. You can go a bit further and deterministically calculate a prime number and use that number as (a part of) the key to your encrypted shellcode.

4. Import table obfuscation

We want to prevent suspicious Windows API (WINAPI) calls from ending up in our IAT (import address table). This table consists of an overview of all the Windows APIs that our binary imports from other system libraries. A list of suspicious (and therefore oftentimes inspected by EDR solutions) APIs can be found here. Typically, these are 

 etc. Running dumpbin /imports <binary.exe> will list all the imports. For the most part, we’ll use direct system calls to bypass EDR hooks (refer to section 7) of suspicious WINAPI calls, but for less suspicious API calls this method works just fine.

We add the function signature of the WINAPI call, get the address of the WINAPI in 

 and then create a function pointer to that address:

typedef BOOL (WINAPI * pVirtualProtect)(LPVOID lpAddress, SIZE_T dwSize, DWORD flNewProtect, PDWORD lpflOldProtect);
pVirtualProtect fnVirtualProtect;

unsigned char sVirtualProtect[] = { 'V','i','r','t','u','a','l','P','r','o','t','e','c','t', 0x0 };
unsigned char sKernel32[] = { 'k','e','r','n','e','l','3','2','.','d','l','l', 0x0 };

fnVirtualProtect = (pVirtualProtect) GetProcAddress(GetModuleHandle((LPCSTR) sKernel32), (LPCSTR)sVirtualProtect);
// call VirtualProtect
fnVirtualProtect(address, dwSize, PAGE_READWRITE, &oldProt);

Obfuscating strings using a character array cuts the string up in smaller pieces making them more difficult to extract from a binary.

The call will still be to an 

 WINAPI, and will not bypass any hooks in WINAPIs in 
, but is purely to remove suspicious functions from the IAT.

5. Disabling Event Tracing for Windows (ETW)

Many EDR solutions leverage Event Tracing for Windows (ETW) extensively, in particular Microsoft Defender for Endpoint (formerly known as Microsoft ATP). ETW allows for extensive instrumentation and tracing of a process’ functionality and WINAPI calls. ETW has components in the kernel, mainly to register callbacks for system calls and other kernel operations, but also consists of a userland component that is part of ntdll.dll (ETW deep dive and attack vectors). Since ntdll.dll is a DLL loaded into the process of our binary, we have full control over this DLL and therefore the ETW functionality. There are quite a few different bypasses for ETW in userspace, but the most common one is patching the function EtwEventWrite, which is called to write/log ETW events. We fetch its address in ntdll.dll and replace its first instructions with instructions to return 0:

void disableETW(void) {
    // return 0
    unsigned char patch[] = { 0x48, 0x33, 0xc0, 0xc3};     // xor rax, rax; ret
    ULONG oldprotect = 0;
    size_t size = sizeof(patch);
    HANDLE hCurrentProc = GetCurrentProcess();
    unsigned char sEtwEventWrite[] = { 'E','t','w','E','v','e','n','t','W','r','i','t','e', 0x0 };
    void *pEventWrite = GetProcAddress(GetModuleHandle((LPCSTR) sNtdll), (LPCSTR) sEtwEventWrite);
    NtProtectVirtualMemory(hCurrentProc, &pEventWrite, (PSIZE_T) &size, PAGE_READWRITE, &oldprotect);
    memcpy(pEventWrite, patch, size / sizeof(patch[0]));
    NtProtectVirtualMemory(hCurrentProc, &pEventWrite, (PSIZE_T) &size, oldprotect, &oldprotect);
    FlushInstructionCache(hCurrentProc, pEventWrite, size);
}

I’ve found the above method to still work on the two tested EDRs, but this is a noisy ETW patch.

6. Evading common malicious API call patterns

Most behavioural detection is ultimately based on detecting malicious patterns. One of these patterns is the order of specific WINAPI calls in a short timeframe. The suspicious WINAPI calls briefly mentioned in section 4 are typically used to execute shellcode and therefore heavily monitored. However, these calls are also used for benign activity (the 

 pattern in combination with a memory allocation and write of ~250KB of shellcode) and so the challenge for EDR solutions is to distinguish benign from malicious calls. Filip Olszak wrote a great blog post leveraging delays and smaller chunks of allocating and writing memory to blend in with benign WINAPI call behaviour. In short, his method adjusts the following behaviour of a typical shellcode loader:

  1. Instead of allocating one large chunk of memory and directly writing the ~250KB implant shellcode into that memory, allocate small contiguous chunks of e.g. <64KB memory and mark them as 
    . Then write the shellcode in a similar chunk size to the allocated memory pages.
  2. Introduce delays between each of the above-mentioned operations. This will increase the time required to execute the shellcode, but will also make the consecutive execution pattern stand out much less.

One catch with this technique is to make sure you find a memory location that can fit your entire shellcode in consecutive memory pages. Filip’s DripLoader implements this concept.

The loader I’ve built does not inject the shellcode into another process but instead starts the shellcode in a thread in its own process space using 

. An unknown process (our binary will de facto have low prevalence) injecting into other processes (typically Windows-native ones) is suspicious activity that stands out (recommended read “Fork&Run – you’re history”). It is much easier to blend into the noise of benign thread executions and memory operations within a process when we run the shellcode within a thread in the loader’s process space. The downside however is that any crashing post-exploitation modules will also crash the process of the loader and therefore the implant. Persistence techniques as well as running stable and reliable BOFs can help to overcome this downside.

7. Direct system calls and evading “mark of the syscall”

The loader leverages direct system calls for bypassing any hooks put in 

 by the EDRs. I want to avoid going into too much detail on how direct syscalls work, since it’s not the purpose of this post and a lot of great posts have been written about it (e.g. Outflank).

In short, a direct syscall is a WINAPI call directly to the kernel system call equivalent. Instead of calling the 

 we call its kernel equivalent 
 defined in the Windows kernel. This is great because we’re bypassing any EDR hooks used to monitor calls to (in this example) 
 defined in 

In order to call a system call directly, we fetch the syscall ID of the system call we want to call from 

, use the function signature to push the correct order and types of function arguments to the stack, and call the 
syscall <id>
 instruction. There are several tools that arrange all this for us; SysWhispers2 and SysWhispers3 are two great examples. From an evasion perspective, there are two issues with calling direct system calls:

  1. Your binary ends up with having the 
     instruction, which is easy to statically detect (a.k.a “mark of the syscall”, more in “SysWhispers is dead, long live SysWhispers!”).
  2. Unlike benign use of a system call that is called through its 
     equivalent, the return address of the system call does not point to 
    . Instead, it points to our code from where we called the syscall, which resides in memory regions outside of 
    . This is an indicator of a system call that is not called through 
    , which is suspicious.

To overcome these issues we can do the following:

  1. Implement an egg hunter mechanism. Replace the 
     instruction with the 
     (some random unique identifiable pattern) and at runtime, search for this 
     in memory and replace it with the 
     instruction using the 
     WINAPI calls. Thereafter, we can use direct system calls normally. This technique has been implemented by klezVirus.
  2. Instead of calling the 
     instruction from our own code, we search for the 
     instruction in 
     and jump to that memory address once we’ve prepared the stack to call the system call. This will result in a return address in RIP that points to 
     memory regions.

Both techniques are part of SysWhispers3.

8. Removing hooks in 

Another nice technique to evade EDR hooks in 

 is to overwrite the loaded 
 that is loaded by default (and hooked by the EDR) with a fresh copy from 
 is the first DLL that gets loaded by any Windows process. EDR solutions make sure their DLL is loaded shortly after, which puts all the hooks in place in the loaded 
 before our own code will execute. If our code loads a fresh copy of 
 in memory afterwards, those EDR hooks will be overwritten. RefleXXion is a C++ library that implements the research done for this technique by MDSec. RefleXXion uses direct system calls 
 to get a handle to a clean 
 (registry path with previously loaded DLLs). It then overwrites the 
 section of the loaded 
, which flushes out the EDR hooks.

I recommend adjusting the RefleXXion library to use the same trick as described above in section 7.

9. Spoofing the thread call stack

The next two sections cover two techniques that provide evasions against detecting our shellcode in memory. Due to the beaconing behaviour of an implant, the implant spends most of its time sleeping, waiting for incoming tasks from its operator. During this time the implant is vulnerable to memory scanning techniques from the EDR. The first of the two evasions described in this post is spoofing the thread call stack.

When the implant is sleeping, its thread return address is pointing to our shellcode residing in memory. By examining the return addresses of threads in a suspicious process, our implant shellcode can be easily identified. In order to avoid this, we want to break this connection between the return address and shellcode. We can do so by hooking the 

 function. When that hook is called (by the implant/beacon shellcode), we overwrite the return address with 
 and call the original 
 function. When 
 returns, we put the original return address back in place so the thread returns to the correct address to continue execution. Mariusz Banach has implemented this technique in his ThreadStackSpoofer project. This repo provides much more detail on the technique and also outlines some caveats.
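
The check that this spoofing is meant to defeat can be sketched in a few lines. The following is an illustrative model only, not how any particular EDR works: it assumes we already have a list of loaded module ranges and the return addresses of a thread's call stack, and it flags any return address that no module on disk backs. The `Module` type, the function names and all addresses are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Module(NamedTuple):
    name: str
    base: int  # load address of the module
    size: int  # size of the mapped image in bytes


def backing_module(addr: int, modules: List[Module]) -> Optional[str]:
    """Return the name of the module whose range contains addr, or None."""
    for m in modules:
        if m.base <= addr < m.base + m.size:
            return m.name
    return None


def unbacked_frames(return_addrs: List[int], modules: List[Module]) -> List[int]:
    """Return the return addresses that do not fall inside any loaded module."""
    return [a for a in return_addrs if backing_module(a, modules) is None]


# Made-up module layout and call stack, for illustration only.
modules = [
    Module("ntdll.dll", 0x7FF800000000, 0x200000),
    Module("kernel32.dll", 0x7FF700000000, 0x100000),
]
stack = [0x7FF800001234, 0x1F0000ABCD, 0x7FF700000010]
print([hex(a) for a in unbacked_frames(stack, modules)])
```

In this model the middle frame is the one that stands out, which is exactly the link between return address and shellcode that the technique tries to break.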

We can observe the result of spoofing the thread call stack in the two screenshots below, where the non-spoofed call stack points to non-backed memory locations and a spoofed thread call stack points to our hooked Sleep (

) function and “cuts off” the rest of the call stack.

Default beacon thread call stack.
Spoofed beacon thread call stack.

10. In-memory encryption of beacon

The other evasion for in-memory detection is to encrypt the implant’s executable memory regions while sleeping. Using the same sleep hook as described in the section above, we can obtain the shellcode memory segment by examining the caller address (the beacon code that calls 

 and therefore our 
 hook). If the caller memory region is 
 and roughly the size of our shellcode, then the memory segment is encrypted with a XOR function and 
 is called. When 
 returns, it decrypts the memory segment and returns to it.
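
The reason a single XOR routine can serve for both steps is that XOR with the same key is its own inverse. The toy sketch below shows only that property on an ordinary Python buffer; it does not touch process memory, and the function name, key and buffer contents are made up for illustration.

```python
from itertools import cycle


def xor_in_place(buf: bytearray, key: bytes) -> None:
    """XOR every byte of buf with the repeating key, modifying buf in place."""
    if not key:
        raise ValueError("key must not be empty")
    for i, k in zip(range(len(buf)), cycle(key)):
        buf[i] ^= k


region = bytearray(b"example payload bytes")
original = bytes(region)

xor_in_place(region, b"\x5a\xa5")  # "encrypt" before sleeping
scrambled = bytes(region)
xor_in_place(region, b"\x5a\xa5")  # "decrypt" after waking up
print(scrambled != original, bytes(region) == original)
```

Calling the function twice with the same key restores the original bytes, so no separate decryption routine is needed.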

Another technique is to register a Vectored Exception Handler (VEH) that handles 

 violation exceptions, decrypts the memory segments and changes the permissions to 
. Then just before sleeping, mark the memory segments as 
, so that when 
 returns, it throws a memory access violation exception. Because we registered a VEH, the exception is handled within that thread context and can be resumed at the exact same location the exception was thrown. The VEH can simply decrypt and change the permissions back to RX and the implant can continue execution. This technique prevents a detectable 
 hook being in place when the implant is sleeping.

Mariusz Banach has also implemented this technique in ShellcodeFluctuation.

11. A custom reflective loader

The beacon shellcode that we execute in this loader ultimately is a DLL that needs to be executed in memory. Many C2 frameworks leverage Stephen Fewer’s ReflectiveLoader. There are many well-written explanations of how exactly a reflective DLL loader works, and Stephen Fewer’s code is also well documented, but in short a Reflective Loader does the following:

  1. Resolve addresses to necessary 
     WINAPIs required for loading the DLL (e.g. 
  2. Write the DLL and its sections to memory
  3. Build up the DLL import table, so the DLL can call 
  4. Load any additional libraries and resolve their respective imported function addresses
  5. Call the DLL entrypoint

Cobalt Strike added support for a custom way of reflectively loading a DLL in memory, which allows a red team operator to customize how a beacon DLL gets loaded and to add evasion techniques. Bobby Cooke and Santiago P built a stealthy loader (BokuLoader) using Cobalt Strike’s UDRL, which I’ve used in my loader. BokuLoader implements several evasion techniques:

  • Limit calls to 
     (commonly EDR hooked WINAPI call to resolve a function address, as we do in section 4)
  • AMSI & ETW bypasses
  • Use only direct system calls
  • Use only 
    , and no 
    ) permissions
  • Removes beacon DLL headers from memory

Make sure to uncomment the two defines to leverage direct system calls via HellsGate & HalosGate and bypass ETW and AMSI (not really necessary, as we’ve already disabled ETW and are not injecting the loader into another process).

12. OpSec configurations in your Malleable profile

In your Malleable C2 profile, make sure the following options are configured, which limit the use of 

 marked memory (suspicious and easily detected) and clean up the shellcode after beacon has started.

    set startrwx        "false";
    set userwx          "false";
    set cleanup         "true";
    set stomppe         "true";
    set obfuscate       "true";
    set sleep_mask      "true";
    set smartinject     "true";
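
Since these are plain `set` statements, a quick sanity check before deploying a profile is easy to script. The sketch below is an assumption-laden helper, not part of Cobalt Strike: it only looks for `set <option> "<value>";` lines with a regular expression, ignores which block an option lives in, does not skip comments, and keeps the last value if an option appears more than once. The names `EXPECTED`, `check_profile` and `SAMPLE` are hypothetical.

```python
import re
from typing import Dict

# Option values taken from the listing above.
EXPECTED = {
    "startrwx": "false",
    "userwx": "false",
    "cleanup": "true",
    "stomppe": "true",
    "obfuscate": "true",
    "sleep_mask": "true",
    "smartinject": "true",
}

# Matches lines of the form: set name "value";
SET_LINE = re.compile(r'^\s*set\s+(\w+)\s+"([^"]*)"\s*;', re.MULTILINE)


def check_profile(text: str) -> Dict[str, str]:
    """Return {option: problem} for each expected option that is missing or differs."""
    found = dict(SET_LINE.findall(text))
    problems = {}
    for option, wanted in EXPECTED.items():
        actual = found.get(option)
        if actual is None:
            problems[option] = "missing"
        elif actual != wanted:
            problems[option] = f'is "{actual}", expected "{wanted}"'
    return problems


SAMPLE = """
    set startrwx        "false";
    set userwx          "true";
    set stomppe         "true";
"""
print(check_profile(SAMPLE))
```

An empty result means every option from the listing was found with the expected value; anything else names the option to revisit.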


Combining these techniques allows you to bypass (among others) Microsoft Defender for Endpoint and CrowdStrike Falcon with 0 detections (tested mid April 2022), which together with SentinelOne lead the endpoint protection industry.

CrowdStrike Falcon with 0 alerts.
Windows Defender (and also Microsoft Defender for Endpoint, not screenshotted) with 0 alerts.

Of course, this is just the first step in fully compromising an endpoint, and this doesn’t mean “game over” for the EDR solution. Depending on what post-exploitation activity/modules the red team operator chooses next, it can still be “game over” for the implant. In general, either run BOFs, or tunnel post-ex tools through the implant’s SOCKS proxy feature. Also consider putting the EDR hooks patches back in place in our 

 hook to avoid detection of unhooking, as well as removing the ETW/AMSI patches.

It’s a cat and mouse game, and the cat is undoubtedly getting better.