original text by vivami
About two years ago I quit being a full-time red team operator. However, it remains a field of expertise very close to my heart. A few weeks ago, I was looking for a new side project and decided to pick up an old red teaming hobby of mine: bypassing/evading endpoint protection solutions.
In this post, I’d like to lay out a collection of techniques that together can be used to bypass industry-leading enterprise endpoint protection solutions. This is purely for educational purposes for (ethical) red teamers and the like, so I’ve decided not to publicly release the source code. The aim of this post is to be accessible to a wide audience in the security industry, not to drill down into the nitty-gritty details of every technique. Instead, I will refer to writeups by others that dive deeper than I can.
In adversary simulations, a key challenge in the “initial access” phase is bypassing the endpoint detection and response (EDR) capabilities on enterprise endpoints. Commercial command and control frameworks provide the red team operator with unmodifiable shellcode and binaries that are heavily signatured by the endpoint protection industry. In order to execute that implant, its signatures (both static and behavioural) need to be obfuscated.
In this post, I will cover the following techniques, with the ultimate goal of building a program that executes malicious shellcode, also known as a (shellcode) loader:
1. Shellcode encryption
Let’s start with a basic but important topic: static shellcode obfuscation. In my loader, I leverage an XOR or RC4 encryption algorithm, because it is easy to implement and doesn’t leave many external indicators of encryption activity performed by the loader. Using AES to obfuscate the shellcode’s static signatures leaves traces in the import address table of the binary, which increases suspicion. I’ve had Windows Defender specifically trigger on AES decryption functions (e.g.

2. Reducing entropy
Many AV/EDR solutions consider binary entropy in their assessment of an unknown binary. Since we’re encrypting the shellcode, the entropy of our binary is rather high, which is a clear indicator of obfuscated parts of code in the binary.
There are several ways of reducing the entropy of our binary, two simple ones that work are:
- Adding low entropy resources to the binary, such as (low entropy) images.
- Adding strings, such as the English dictionary or part of the output of strings "C:\Program Files\Google\Chrome\Application\100.0.4896.88\chrome.dll".
A more elegant solution would be to design and implement an algorithm that would obfuscate (encode/encrypt) the shellcode into English words (low entropy). That would kill two birds with one stone.
3. Escaping the (local) AV sandbox
Many EDR solutions will run the binary in a local sandbox for a few seconds to inspect its behaviour. To avoid compromising on the end user experience, they cannot afford to inspect the binary for longer than a few seconds (I’ve seen Avast taking up to 30 seconds in the past, but that was an exception). We can abuse this limitation by delaying the execution of our shellcode. Simply calculating a large prime number is my personal favourite. You can go a bit further and deterministically calculate a prime number and use that number as (a part of) the key to your encrypted shellcode.
4. Import table obfuscation
You want to prevent suspicious Windows APIs (WINAPIs) from ending up in your IAT (import address table). This table lists all the Windows APIs that your binary imports from other system libraries. A list of suspicious APIs (which are therefore often inspected by EDR solutions) can be found here. Typically, these are
We add the function signature of the WINAPI call, get the address of the WINAPI in
typedef BOOL (WINAPI * pVirtualProtect)(LPVOID lpAddress, SIZE_T dwSize, DWORD flNewProtect, PDWORD lpflOldProtect);
pVirtualProtect fnVirtualProtect;
unsigned char sVirtualProtect[] = { 'V','i','r','t','u','a','l','P','r','o','t','e','c','t', 0x0 };
unsigned char sKernel32[] = { 'k','e','r','n','e','l','3','2','.','d','l','l', 0x0 };
fnVirtualProtect = (pVirtualProtect) GetProcAddress(GetModuleHandle((LPCSTR) sKernel32), (LPCSTR)sVirtualProtect);
// call VirtualProtect
fnVirtualProtect(address, dwSize, PAGE_READWRITE, &oldProt);
Obfuscating strings using a character array cuts the string up into smaller pieces, making them more difficult to extract from a binary.
The call will still be to an
5. Disabling Event Tracing for Windows (ETW)
Many EDR solutions leverage Event Tracing for Windows (ETW) extensively, in particular Microsoft Defender for Endpoint (formerly known as Microsoft ATP). ETW allows for extensive instrumentation and tracing of a process’ functionality and WINAPI calls. ETW has components in the kernel, mainly to register callbacks for system calls and other kernel operations, but also consists of a userland component that is part of
void disableETW(void) {
// return 0
unsigned char patch[] = { 0x48, 0x33, 0xc0, 0xc3}; // xor rax, rax; ret
ULONG oldprotect = 0;
size_t size = sizeof(patch);
HANDLE hCurrentProc = GetCurrentProcess();
unsigned char sEtwEventWrite[] = { 'E','t','w','E','v','e','n','t','W','r','i','t','e', 0x0 };
void *pEventWrite = GetProcAddress(GetModuleHandle((LPCSTR) sNtdll), (LPCSTR) sEtwEventWrite);
NtProtectVirtualMemory(hCurrentProc, &pEventWrite, (PSIZE_T) &size, PAGE_READWRITE, &oldprotect);
memcpy(pEventWrite, patch, size / sizeof(patch[0]));
NtProtectVirtualMemory(hCurrentProc, &pEventWrite, (PSIZE_T) &size, oldprotect, &oldprotect);
FlushInstructionCache(hCurrentProc, pEventWrite, size);
}
I’ve found the above method to still work on the two tested EDRs, but this is a noisy ETW patch.
6. Evading common malicious API call patterns
Most behavioural detection is ultimately based on detecting malicious patterns. One of these patterns is the order of specific WINAPI calls in a short timeframe. The suspicious WINAPI calls briefly mentioned in section 4 are typically used to execute shellcode and are therefore heavily monitored. However, these calls are also used for benign activity (the
- Instead of allocating one large chunk of memory and directly writing the ~250KB implant shellcode into that memory, allocate small contiguous chunks of e.g. <64KB memory and mark them as NO_ACCESS. Then write the shellcode in similarly sized chunks to the allocated memory pages.
- Introduce delays between each of the above-mentioned operations. This will increase the time required to execute the shellcode, but will also make the consecutive execution pattern stand out much less.
One catch with this technique is that you need to find a memory location that can fit your entire shellcode in consecutive memory pages. Filip’s DripLoader implements this concept.
The loader I’ve built does not inject the shellcode into another process but instead starts the shellcode in a thread in its own process space using
7. Direct system calls and evading “mark of the syscall”
The loader leverages direct system calls for bypassing any hooks put in
In short, a direct syscall is a WINAPI call directly to the kernel system call equivalent. Instead of calling the
In order to call a system call directly, we fetch the syscall ID of the system call we want to call from
- Your binary ends up containing the syscall instruction, which is easy to detect statically (a.k.a. “mark of the syscall”; more in “SysWhispers is dead, long live SysWhispers!”).
- Unlike benign use of a system call that is called through its ntdll.dll equivalent, the return address of the system call does not point to ntdll.dll. Instead, it points to our code from where we called the syscall, which resides in memory regions outside of ntdll.dll. This is an indicator of a system call that is not called through ntdll.dll, which is suspicious.
To overcome these issues we can do the following:
- Implement an egg hunter mechanism. Replace the syscall instruction with the egg (some random, uniquely identifiable pattern) and at runtime, search for this egg in memory and replace it with the syscall instruction using the ReadProcessMemory and WriteProcessMemory WINAPI calls. Thereafter, we can use direct system calls normally. This technique has been implemented by klezVirus.
- Instead of calling the syscall instruction from our own code, we search for the syscall instruction in ntdll.dll and jump to that memory address once we’ve prepared the stack to call the system call. This will result in a return address in RIP that points to ntdll.dll memory regions.
Both techniques are part of SysWhispers3.
8. Removing hooks in ntdll.dll
Another nice technique to evade EDR hooks in
I recommend adjusting the RefleXXion library to use the same trick as described above in section 7.
9. Spoofing the thread call stack
The next two sections cover two techniques that provide evasions against detecting our shellcode in memory. Due to the beaconing behaviour of an implant, the implant spends most of its time sleeping, waiting for incoming tasks from its operator. During this time the implant is vulnerable to memory scanning techniques from the EDR. The first of the two evasions described in this post is spoofing the thread call stack.
When the implant is sleeping, its thread return address points to our shellcode residing in memory. By examining the return addresses of threads in a suspicious process, our implant shellcode can be easily identified. To avoid this, we want to break the connection between the return address and the shellcode. We can do so by hooking the
We can observe the result of spoofing the thread call stack in the two screenshots below, where the non-spoofed call stack points to non-backed memory locations and a spoofed thread call stack points to our hooked Sleep (


10. In-memory encryption of beacon
The other evasion for in-memory detection is to encrypt the implant’s executable memory regions while sleeping. Using the same sleep hook as described in the section above, we can obtain the shellcode memory segment by examining the caller address (the beacon code that calls
Another technique is to register a Vectored Exception Handler (VEH) that handles
Mariusz Banach has also implemented this technique in ShellcodeFluctuation.
11. A custom reflective loader
The beacon shellcode that we execute in this loader is ultimately a DLL that needs to be executed in memory. Many C2 frameworks leverage Stephen Fewer’s ReflectiveLoader. There are many well-written explanations of how exactly a reflective DLL loader works, and Stephen Fewer’s code is also well documented, but in short a Reflective Loader does the following:
- Resolve the addresses of the necessary kernel32.dll WINAPIs required for loading the DLL (e.g. VirtualAlloc, LoadLibraryA, etc.)
- Write the DLL and its sections to memory
- Build up the DLL import table, so the DLL can call ntdll.dll and kernel32.dll WINAPIs
- Load any additional libraries and resolve their respective imported function addresses
- Call the DLL entrypoint
Cobalt Strike added support for a custom way for reflectively loading a DLL in memory that allows a red team operator to customize the way a beacon DLL gets loaded and add evasion techniques. Bobby Cooke and Santiago P built a stealthy loader (BokuLoader) using Cobalt Strike’s UDRL which I’ve used in my loader. BokuLoader implements several evasion techniques:
- Limit calls to GetProcAddress() (a commonly EDR-hooked WINAPI call to resolve a function address, as we do in section 4)
- AMSI & ETW bypasses
- Use only direct system calls
- Use only RW or RX, and no RWX (EXECUTE_READWRITE) permissions
- Removes beacon DLL headers from memory
Make sure to uncomment the two defines to leverage direct system calls via HellsGate & HalosGate and bypass ETW and AMSI (not really necessary, as we’ve already disabled ETW and are not injecting the loader into another process).
12. OpSec configurations in your Malleable profile
In your Malleable C2 profile, make sure the following options are configured, which limit the use of
set startrwx "false";
set userwx "false";
set cleanup "true";
set stomppe "true";
set obfuscate "true";
set sleep_mask "true";
set smartinject "true";
Conclusions
Combining these techniques allows you to bypass (among others) Microsoft Defender for Endpoint and CrowdStrike Falcon with 0 detections (tested mid April 2022), which together with SentinelOne lead the endpoint protection industry.


Of course, this is just the first step in fully compromising an endpoint, and it doesn’t mean “game over” for the EDR solution. Depending on which post-exploitation activity/modules the red team operator chooses next, it can still be “game over” for the implant. In general, either run BOFs or tunnel post-ex tools through the implant’s SOCKS proxy feature. Also consider putting the EDR hooks patches back in place in our
It’s a cat and mouse game, and the cat is undoubtedly getting better.