Obfuscation

The following material is a crash course in code obfuscation by example. First, let's look at some reasons why someone might want to obfuscate code:

All programmers are hopefully familiar with typical "Hello World" programs. Let's begin our code obfuscation example with the following simple "Hello World" program written in C:

#include <stdio.h>

int main() {
   printf("Hello World\n");
}

As we progress through the various examples, you can compile and test them with a simple compilation:

   cc <source-name>

And then run the compiled code using the command:

   ./a.out

Each obfuscated program has the following goals:


helloworld-1.c

The first obfuscated example is a bit trivial. The user must rename the executable file to "Hello World" (including the space). The code just prints out the program name. OK, so I admit it's a cheap trick, and one could argue it violates the rules I stated above, but one should be aware of such devious possibilities.

#include <stdio.h>

int main(int argc, char *argv[])
{
   printf("%s\n",argv[0]);
}

helloworld-2.c

In the next example we use two obfuscation tricks: a.) string encoding and b.) macro substitution. The string variable bar gets a series of characters where every other character is a member of our target string (spelled "hello world!" in this listing). The for loop then extracts the proper characters into the string variable foo. The last line, THIS IS OBFUSCATION, is really three #define macros that yield our printf() statement.

#include <stdio.h>
#include <string.h>

#define THIS printf(
#define IS "%s\n"
#define OBFUSCATION ,foo);

int main(int argc, char *argv[])
{
   int a = 0; char bar[32]; char foo[32];

   strcpy(bar,"@h3eglhl1o. >w%o#rtlwdl!S");
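   /* Copy the odd-indexed characters; bar[25] is the terminating '\0'.
      (Writing foo[a/2] = bar[++a] reads and modifies a in one expression,
      which is undefined behavior in C.) */
   for (a = 0; a < 26; a += 2) { foo[a/2] = bar[a+1]; }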

   THIS IS OBFUSCATION
}

helloworld-3.c

Example 3 takes the obfuscation a step further by introducing recursive calls to the main() function. Remember that main() is a function just like any other, so you can call it again later in the program if you so desire. Initially, the value of c will be greater than or equal to one (1), because at minimum the program name is passed in the argument list to main() (as we exploited in the cheap example 1).

During execution, the default case runs first and passes along our character-encoded string. Next, case 34123 runs, since we are now in the second recursion level of main(). We end up with our macro'ed print statement executing in case 0 at recursion level 3 of main(). The program then unwinds the recursion and ends. Easy as pi :-p. Note that main(c, v) char *v; int c; is an old-style (K&R) function definition. C23 removed that form, so a recent compiler may need an older standard selected, for example cc -std=gnu89 <source-name>.

#include <stdio.h>

#define THIS printf(
#define IS "%s\n"
#define OBFUSCATION ,v);

int main(c, v) char *v; int c; {
   int a = 0; char f[32];
   switch (c) {
      case 0:
         THIS IS OBFUSCATION
         break;
      case 34123:
         for (a = 0; a < 13; a++) { f[a] = v[a*2+1];};
         main(0,f);
         break;
      default:
         main(34123,"@h3eglhl1o. >w%o#rtlwdl!S\0m");
         break;
      }
}

helloworld-4.c

Ok, now for some real dark-arts fun. To write the code in example 4, I actually first had to execute another program:

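#include <stdio.h>
#include <string.h>   /* declares strcpy() */

int main(){
  double str[2];
  strcpy((char*)str,"Hello World");   /* copy the text over the doubles' bytes */
  printf("%f\n%f\n",str[0],str[1]);   /* print those bytes back as two numbers */
}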

This little program gave me a numerical encoding of my target string. Confused? OK, let's break it down. A double typically has 8 bytes of data in it, so a double array of 2 elements has 16 bytes, which is plenty of room for the characters of the string "Hello World". The C language allows us to typecast pointers at will, so we pass the double array to strcpy() cast as a string. The computer doesn't care, since it's all 1's and 0's, so we get the right bits for "Hello World" assigned to our 2-element double array. Keep in mind that the resulting numbers depend on the machine's floating-point format and byte order. I can then take the two double values and reverse the process in my obfuscated code:

#include <stdio.h>

int main(_, v)
   char *v;
   int _;
{
   int a = 0;
   char f[32];
   double h[2];
   h[0] = 219144411970696341534563910188240261707095231701777609973207594594368003940730721250187042904090067214633883393830365943923774063516050085581303035749237268288785805461648960544158982974043306599507665​0229152079883597110973562880.000000;
   h[1] = 1867980801.569119;
   printf("%s\n",(char *)h);
}

helloworld-5.c

OK, so let's now combine the above character encoding trick with the tricks from example 3. Notice also that I am now using the variable name _ (that is, an underscore). Check your C/C++ syntax rules: this is a valid variable name. So are __ and ___, etc., at least syntactically (the standards reserve names that begin with two underscores for the implementation, so strictly speaking those are off-limits to user code). One could REALLY confuse things by replacing all variable names with different lengths of underscores.

#include <stdio.h>
#define THIS printf(
#define IS "%s\n"
#define OBFUSCATION ,v);
double h[2];

int main(_, v) char *v; int _; {
   int a = 0; char f[32];
   h[2%2] = 219144411970696341534563910188240261707095231701777609973207594594368003940730721250187042904090067214633883393830365943923774063516050085581303035749237268288785805461648960544158982974043306599507665​0229152079883597110973562880.000000;
   h[4%3] = 1867980801.569119;

   switch (_) {
      case 0:
         THIS IS OBFUSCATION
         break;
      default:
         main(0,(char *)h);
         break;
      }
}

helloworld-6.c

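Last but not least, take out all the nice line breaks and indentation. Pretty code has no place in real obfuscation. The only lines that must stay separate are the preprocessor directives, since each #include and #define has to sit on a line of its own.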

#include <stdio.h>
#define THIS printf(
#define IS "%s\n"
#define OBFUSCATION ,v);
double h[2]; int main(_, v) char *v; int _; { int a = 0; char f[32]; h[2%2] = 219144411970696341534563910188240261707095231701777609973207594594368003940730721250187042904090067214633883393830365943923774063516050085581303035749237268288785805461648960544158982974043306599507665​0229152079883597110973562880.000000; h[4%3] = 1867980801.569119; switch (_) { case 0: THIS IS OBFUSCATION break; default: main(0,(char *)h); break; } }