The third assignment of the SLAE64 exam states:

  • Study about the Egg Hunter shellcode
  • Create a working demo of the Egg Hunter
  • It should be configurable for different payloads

I for one had not heard before of the concept of an egg hunter so a little searching around led me to a (the?) paper by skape called Safely Searching Process Virtual Address Space published in 2004.

In a nutshell an egg hunter is a piece of code that searches the virtual address space (VAS) of a process looking for a predefined marker, called an egg. Once this marker has been found it jumps to it and the code that was marked by the egg is executed in the context of the application.

The situations where one would use an egg hunter are situations where an attacker is able to inject code of arbitrary length into the application’s memory but cannot predict its final location. An overflow elsewhere which would otherwise not allow uploading the larger piece of code could then be utilized to insert the egg hunter. It then hunts for the egg in memory to find the rest of the exploit.

The paper defines three criteria an egg hunter should satisfy:

  1. It should be robust
  2. It must be small
  3. It should be fast

The paper then makes a number of observations which I’ll summarize here:

  • the egg should be 8 bytes; this is due to the fact the egg hunter would typically store the egg as a 4 byte key and it may otherwise end up finding it’s own key rather than the actual egg when searching memory;
  • the pattern of the egg should be easy to recognize by the hunter without requiring a lot of code to verify the “something” is actually the egg (criterium 2);
  • the key could be valid code; for example: 0x50905090 (thus the egg is the dword 0x50905090 placed in memory twice in a row) which decodes to: nop;push eax;nop;push eax. When the hunter jumps to it the key mustn’t contain invalid instructions or the application would crash, or, the hunter should account for this and jump beyond the marker to first bytes of actual code.
  • iterating through memory addresses and potentially dereferencing uninitialized/unavailabale memory could lead to a segmentation fault. Registering a “fake” SIGSEGV handler requires too much code according to the author so it is argued it’s better to rely on standard syscalls such as access(2) and sigaction(2) to catch the EFAULT error upon touching uninitialized memory.

Note however that while the paper focusses on Linux/x86, the described concepts do generally apply to Linux/x86_64 too. However it should be noted that the 64-bit shellcode cannot really come close to their 32-bit equivalents of 30-odd bytes with a runtime below 10 seconds. To me this effectively renders this (interesting) technique not feasible for practical usage on Linux/x86_64 as the haystack to search for our needle is so much bigger.

The second implementation described by the author eliminates the need for an executable egg by using the scasd instruction which actually does the incrementing of EDI by four bytes per comparison. Thus bypassing the need to jump to the beginning of the egg or explicitly accounting for 8 bytes of non-executable egg.

For this assignment I have based my code on skape’s access(2) revisited implementation. I thought this implementation was very clear in what it’s doing without making sacrifices in size. In the end it comes in at a total of 45 bytes.

I have used the egg value of 0x90509050 which is actually valid code for:

nop
push eax
nop
push eax

but due to using the scasd call we don’t actually end up jumping to the very beginning of our egg. RDI has been advanced by scasd until right where our actual payload would begin.

One way we could speed this up can be learned from pmap $PID where $PID is the process ID of our shellcode wrapper. For example:

root@slae64:~# pmap 20182
20182:   ./egghunter
00005635d4eb4000      4K r-x-- egghunter
00005635d50b4000      4K r-x-- egghunter
00005635d50b5000      4K rwx-- egghunter
00005635d5c24000    132K rwx--   [ anon ]
00007f40cc854000   1948K r-x-- libc-2.27.so
00007f40cca3b000   2048K ----- libc-2.27.so
00007f40ccc3b000     16K r-x-- libc-2.27.so
00007f40ccc3f000      8K rwx-- libc-2.27.so
00007f40ccc41000     16K rwx--   [ anon ]
00007f40ccc45000    156K r-x-- ld-2.27.so
00007f40cce63000      8K rwx--   [ anon ]
00007f40cce6c000      4K r-x-- ld-2.27.so
00007f40cce6d000      4K rwx-- ld-2.27.so
00007f40cce6e000      4K rwx--   [ anon ]
00007ffc5e1bc000    132K rwx--   [ stack ]
00007ffc5e1f2000     12K r----   [ anon ]
00007ffc5e1f5000      8K r-x--   [ anon ]
ffffffffff600000      4K r-x--   [ anon ]
 total             4512K

As you can see the program is mapped into memory starting at 0x5635d4eb4000 yet our egg hunter will spend a considerable amount of time iterating through lower memory before getting to that address. One optimization at the cost of code size might be to determine the base address and start searching from there.

In order to verify my code is correct I forced my shellcode wrapper to place the .text section at a relatively low address so the egg would be found quite quickly. I used the -Wl,-Ttext-segment,0x40000000 compiler flag for that.

asciicast

Wrapping up

I have uploaded my code to jasperla/slae64 on GitHub:

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification. Student ID: SLAE64-1614