16. First Shellcode - Hello World!
Now that we understand the theory, let’s create our first working shellcode! We’ll build a simple “Hello World” shellcode that writes directly to stdout using system calls.
The Goal
We want to create shellcode that does the same thing as this C program:
#include <unistd.h>

int main() {
    write(1, "Hello World!\n", 13);
    _exit(0);
}
But in pure assembly, without any library dependencies.
Step 1: Understanding the System Calls
We start with the write system call:
write(1, "Hello World!\n", 13) - Output our message
On success, write returns the number of bytes actually written to the file or device associated with the file descriptor. On failure, the C wrapper returns -1, while the raw syscall returns a negative error code in rax.
Looking up the Linux x86_64 syscall numbers:
write = 1
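To see the same call from C without going through the write() library function, here is a minimal sketch that uses the generic syscall() wrapper. The helper name raw_write is ours, made up for illustration; SYS_write comes from <sys/syscall.h> and expands to 1 on x86_64.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper (not part of any library): invoke the write syscall
 * by number through syscall(2) and return the number of bytes written,
 * or -1 on error. */
long raw_write(int fd, const void *buf, size_t len) {
    return syscall(SYS_write, fd, buf, len);
}
```

Calling raw_write(1, "Hello World!\n", 13) from your own main() should behave like the write() line in the C program above. The arguments map onto the registers we are about to load by hand: the syscall number goes in rax, then fd, buffer, and length go in rdi, rsi, and rdx.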
Step 2: The Assembly Implementation
Here’s our first attempt (hello.s):
section .text
    global _start

_start:
    ; write(1, message, 13)
    mov rax, 1          ; syscall number for write
    mov rdi, 1          ; file descriptor (stdout)
    mov rsi, message    ; pointer to message
    mov rdx, 13         ; message length
    syscall             ; invoke kernel

section .data
message:
    db "Hello World!", 0x0a    ; "Hello World!" + "\n"
Step 3: Assembling and Testing
Let’s build this as a normal executable first:
nasm -f elf64 hello.s -o hello.o
ld -o hello hello.o
When we run it, the message prints, but then we get a segmentation fault!
$ ./hello
Hello World!
Segmentation fault (core dumped)
This is because after the write syscall completes, the CPU continues executing whatever bytes come next in memory:
Memory layout:
[Our assembly code] → [Data section]         → [Random memory] → [Invalid memory]
         ↓                     ↓                       ↓                  ↓
write() executes    → "Hello World!" bytes   → garbage         → SEGFAULT!
The Execution Flow:
- write syscall executes successfully - “Hello World!” prints
- CPU continues to the next instruction - but we have no more code!
- Interprets whatever follows as code - depending on how the linker laid out the file, that is padding or the "Hello World!" string decoded as CPU instructions
- Hits invalid instructions - bytes that were never meant as code don’t form meaningful x86-64 instructions and soon touch memory they shouldn’t
- Segmentation fault - the CPU raises a protection fault and the kernel kills the process
What if we add jmp _start?
section .text
    global _start

_start:
    ; write(1, message, 13)
    mov rax, 1          ; syscall number for write
    mov rdi, 1          ; file descriptor (stdout)
    mov rsi, message    ; pointer to message
    mov rdx, 13         ; message length
    syscall             ; invoke kernel
    jmp _start

section .data
message:
    db "Hello World!", 0x0a    ; "Hello World!" + "\n"
If we add a jmp _start instruction to the end of our program, it will continuously jump back to the beginning, causing "Hello World!" to print indefinitely.
This creates an infinite loop that continues until the program is manually stopped (for example, by pressing Ctrl+C).
Let’s replace the infinite jump with a clean exit syscall so the program prints once and terminates properly.
section .text
    global _start

_start:
    ; write(1, message, 13)
    mov rax, 1          ; syscall number for write
    mov rdi, 1          ; file descriptor (stdout)
    mov rsi, message    ; pointer to message
    mov rdx, 13         ; message length
    syscall             ; invoke kernel to print message

    ; exit(10)
    mov rax, 60         ; syscall number for exit
    mov rdi, 0xa        ; status code 10
    syscall             ; invoke kernel to exit program

section .data
message:
    db "Hello World!", 0x0a    ; "Hello World!\n"
After assembling and linking, we can run it and check the exit status:
$ nasm -f elf64 hello.s -o hello.o
$ ld -o hello hello.o
$ ./hello
Hello World!
$ echo $?
10
Our code exits gracefully with return code 10.
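The status that echo $? prints is the value the process handed to the exit syscall. As a rough sketch of that round trip in C, the hypothetical helper below (observed_exit_status is our own name) forks a child that leaves through the raw exit syscall and reports what the parent sees.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that terminates through the raw exit
 * syscall with the given status, then return the status the parent
 * observes. Returns -1 if fork or waitpid fails, or if the child did not
 * exit normally. */
int observed_exit_status(int status) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        syscall(SYS_exit, status);  /* same request as mov rax, 60 + syscall */
        _exit(127);                 /* not reached */
    }
    int wstatus = 0;
    if (waitpid(pid, &wstatus, 0) < 0) {
        return -1;
    }
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

With this sketch, observed_exit_status(10) plays the role of running ./hello followed by echo $?: the shell is the parent, and it reads the child's status the same way.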
Now that we have working assembly code, let’s transform it into actual shellcode and learn how to deploy it in real exploits.
Step 1: Generating Raw Shellcode Bytes
We will extract shellcode from our assembly:
# Assemble and link normally
nasm -f elf64 hello.s -o hello.o
ld -o hello hello.o
# Extract the raw bytes and use a script to format it for C
objdump -d hello | grep -Po '\s\K[a-f0-9]{2}(?=\s)' | tr -d '\n' | sed 's/\(..\)/\\x\1/g'
Running this gives us:
\xb8\x01\x00\x00\x00\xbf\x01\x00\x00\x00\x48\xbe\x00\x20\x40\x00\x00\x00\x00\x00\xba\x0d\x00\x00\x00\x0f\x05\xb8\x3c\x00\x00\x00\xbf\x0a\x00\x00\x00\x0f\x05
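The sed step at the end of that pipeline only turns each pair of hex digits into a \xNN escape. If you would rather do that formatting in C, a small sketch could look like this; format_shellcode is a hypothetical helper of ours, not part of any toolchain.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: render len raw bytes as "\xNN" escapes into out.
 * Returns the number of characters written (not counting the terminator),
 * or -1 if out is too small to hold 4 characters per byte plus the
 * terminator. */
int format_shellcode(const unsigned char *bytes, size_t len,
                     char *out, size_t out_size) {
    if (out_size < len * 4 + 1) {
        return -1;
    }
    for (size_t i = 0; i < len; i++) {
        /* Each byte becomes exactly four characters: \ x h h */
        snprintf(out + i * 4, 5, "\\x%02x", bytes[i]);
    }
    out[len * 4] = '\0';
    return (int)(len * 4);
}
```

Feeding it the first five bytes of our dump (b8 01 00 00 00) should produce the same leading text as the line above, ready to paste into a C string literal.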
Step 2: The Shellcode Loader Template
Once you have the shellcode bytes, test them with a loader:
#include <stdio.h>
#include <string.h>

// Generated shellcode
unsigned char shellcode[] =
    /* Put shellcode here as a string literal */
    "";

int main() {
    printf("Shellcode length: %zu bytes\n", strlen((const char *)shellcode));
    printf("Shellcode address: %p\n", (void *)shellcode);

    // Cast the array to a function pointer and call it.
    // This only works if the memory holding the array is executable.
    int (*func)() = (int(*)())shellcode;
    func();
    return 0;
}
If this doesn’t work (modern systems mark the data section as non-executable, so the call crashes), we can use an mmap-based loader that allocates executable memory explicitly.
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

const char shellcode[] = "\xb8\x01\x00\x00\x00\xbf\x01\x00\x00\x00\x48\xbe\x00\x20\x40\x00\x00\x00\x00\x00\xba\x0d\x00\x00\x00\x0f\x05\xb8\x3c\x00\x00\x00\xbf\x0a\x00\x00\x00\x0f\x05";

int main() {
    printf("Shellcode Length: %zu bytes\n", strlen(shellcode));

    // Allocate executable memory
    void *exec_mem = mmap(NULL, sizeof(shellcode),
                          PROT_READ | PROT_WRITE | PROT_EXEC,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (exec_mem == MAP_FAILED) {
        perror("mmap failed");
        return 1;
    }
    printf("Allocated executable memory at: %p\n", exec_mem);

    // Copy shellcode to executable memory
    memcpy(exec_mem, shellcode, sizeof(shellcode));
    printf("Executing shellcode...\n");

    /* PROBLEM: calling the array directly would execute the data section */
    // int (*ret)() = (int(*)())shellcode;
    // ret();

    // Cast to function pointer and execute
    int (*func)() = (int(*)())exec_mem;
    func();

    // Cleanup (though we may not reach this)
    munmap(exec_mem, sizeof(shellcode));
    return 0;
}
But this fails!
$ gcc loader.c -o loader
$ ./loader
Shellcode Length: 2 bytes
Allocated executable memory at: 0x748e81250000
Executing shellcode...
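We can see that the reported length is 2 bytes, and no message was printed. This is strange!
The Problem: Null Bytes in Shellcode!
strlen() stops counting at the first null byte (\x00), and our shellcode is full of them: every mov of a small constant is padded with zeros (mov rax, 1 was assembled as b8 01 00 00 00). Our loader copies with memcpy() and sizeof, so all the bytes still arrive, but in a real exploit the payload usually travels through string functions such as strcpy(), which would cut it off after the second byte.
There is a second problem hiding in those bytes: mov rsi, message was assembled with the absolute address 0x402000 (48 be 00 20 40 ...), which pointed into the .data section of the hello binary. objdump -d only dumps the code, so the message itself is not part of the shellcode, and in the loader process that address does not hold our string. The write call has nothing valid to print, and the shellcode exits silently.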
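Before pasting shellcode into a loader, it can help to scan it for null bytes. The following is a minimal sketch; both helper names (first_null_offset and count_null_bytes) are hypothetical and exist only for this illustration. They take an explicit length, because strlen() is exactly the function we cannot trust here.

```c
#include <stddef.h>

/* Hypothetical helper: return the offset of the first 0x00 byte in
 * buf[0..len), or -1 if there is none. */
long first_null_offset(const unsigned char *buf, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0x00) {
            return (long)i;
        }
    }
    return -1;
}

/* Hypothetical helper: count every 0x00 byte in buf[0..len). */
size_t count_null_bytes(const unsigned char *buf, size_t len) {
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0x00) {
            count++;
        }
    }
    return count;
}
```

For a string literal, pass sizeof(shellcode) - 1 as the length so the terminator added by the compiler is not counted. On our first shellcode, the first null sits at offset 2, which is why strlen() reported a length of 2.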
We can fix this by making our shellcode free of null bytes and independent of any hardcoded address:
section .text
    global _start

_start:
    jmp short get_message

shellcode:
    ; Get message address using JMP-CALL-POP
    pop rsi

    ; write(1, message, 13) - NO NULL BYTES!
    xor rax, rax        ; Clear RAX
    mov al, 1           ; SYS_write = 1 (only sets low byte)
    xor rdi, rdi        ; Clear RDI
    mov dil, 1          ; fd = 1 (only sets low byte)
    xor rdx, rdx        ; Clear RDX
    mov dl, 13          ; length = 13 (only sets low byte)
    syscall

    ; exit(0) - NO NULL BYTES!
    xor rax, rax        ; Clear RAX
    mov al, 60          ; SYS_exit = 60 (only sets low byte)
    xor rdi, rdi        ; status = 0 (no null bytes!)
    syscall

get_message:
    call shellcode
    message db "Hello World!", 0x0a
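The trick here is JMP-CALL-POP: we jump forward to a call placed right before the message, the call pushes the address of the next byte (the start of the message) as its return address, and pop rsi picks that address up. No absolute address is needed, and the message travels inside the shellcode itself. When you assemble and link this code, then dump the output, you’ll find the generated shellcode contains no null bytes.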
$ objdump -d null_free | grep -Po '\s\K[a-f0-9]{2}(?=\s)' | tr -d '\n' | sed 's/\(..\)/\\x\1/g'
\xeb\x1d\x5e\x48\x31\xc0\xb0\x01\x48\x31\xff\x40\xb7\x01\x48\x31\xd2\xb2\x0d\x0f\x05\x48\x31\xc0\xb0\x3c\x48\x31\xff\x0f\x05\xe8\xde\xff\xff\xff\x48\x65\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64\x21\x0a
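We can check the JMP-CALL-POP arithmetic directly against these bytes. The sketch below decodes the two relative branches; the helper names are ours and hypothetical, and the code assumes the byte at the given offset really is the opcode named in the comment.

```c
#include <stddef.h>
#include <stdint.h>

/* Target of a 2-byte "jmp short" (opcode 0xeb) located at offset `at`.
 * The signed 8-bit displacement is relative to the next instruction. */
long jmp_short_target(const unsigned char *code, size_t at) {
    return (long)at + 2 + (int8_t)code[at + 1];
}

/* Target of a 5-byte "call rel32" (opcode 0xe8) located at offset `at`.
 * The signed little-endian 32-bit displacement is relative to the next
 * instruction. */
long call_rel32_target(const unsigned char *code, size_t at) {
    uint32_t raw = (uint32_t)code[at + 1]
                 | (uint32_t)code[at + 2] << 8
                 | (uint32_t)code[at + 3] << 16
                 | (uint32_t)code[at + 4] << 24;
    return (long)at + 5 + (int32_t)raw;
}
```

For the dump above, eb 1d at offset 0 jumps to 0x02 + 0x1d = 0x1f, where the call (e8 de ff ff ff) sits. The call goes back to 0x24 - 0x22 = 0x02, the pop rsi (5e), and pushes 0x24 as its return address, which is the offset of the first message byte (48, the letter H). Because both displacements are relative, the bytes work wherever the loader places them.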
Let’s use this shellcode in our loader program and run it.
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

const char shellcode[] = "\xeb\x1d\x5e\x48\x31\xc0\xb0\x01\x48\x31\xff\x40\xb7\x01\x48\x31\xd2\xb2\x0d\x0f\x05\x48\x31\xc0\xb0\x3c\x48\x31\xff\x0f\x05\xe8\xde\xff\xff\xff\x48\x65\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64\x21\x0a";

int main() {
    printf("Shellcode Length: %zu bytes\n", strlen(shellcode));

    // Allocate executable memory
    void *exec_mem = mmap(NULL, sizeof(shellcode),
                          PROT_READ | PROT_WRITE | PROT_EXEC,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (exec_mem == MAP_FAILED) {
        perror("mmap failed");
        return 1;
    }
    printf("Allocated executable memory at: %p\n", exec_mem);

    // Copy shellcode to executable memory
    memcpy(exec_mem, shellcode, sizeof(shellcode));
    printf("Executing shellcode...\n");

    /* PROBLEM: calling the array directly would execute the data section */
    // int (*ret)() = (int(*)())shellcode;
    // ret();

    // Cast to function pointer and execute
    int (*func)() = (int(*)())exec_mem;
    func();

    // Cleanup (though we may not reach this)
    munmap(exec_mem, sizeof(shellcode));
    return 0;
}
On compiling and running, the shellcode executes successfully:
$ gcc loader.c -o loader
$ ./loader
Shellcode Length: 49 bytes
Allocated executable memory at: 0x7d824fbad000
Executing shellcode...
Hello World!
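The loader above maps its memory as readable, writable, and executable all at once. Some hardened systems refuse such mappings, so here is a hedged variant, assuming Linux with mmap and mprotect available: it writes first and only then switches the pages to executable. The name run_code is hypothetical, and this is a sketch rather than a replacement for the loader above.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copy code into a fresh anonymous mapping, flip the
 * mapping from writable to executable, call it, and return its result.
 * Returns -1 if the mapping cannot be set up. Code that ends in an exit
 * syscall, like our Hello World shellcode, never returns here. */
int run_code(const unsigned char *code, size_t len) {
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        return -1;
    }
    memcpy(mem, code, len);

    /* Drop write permission before gaining execute permission. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    int (*entry)(void) = (int (*)(void))mem;
    int result = entry();

    munmap(mem, len);
    return result;
}
```

Passing our 49-byte shellcode to run_code() should print the message and terminate the process from inside the shellcode, so the cleanup only runs for code that returns with ret.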
I have developed an automation tool that streamlines the whole process, from compilation through shellcode extraction to execution.
Workflow:
Assembly (.s) → Binary → Shellcode Extraction → Loader Generation → Execution
You can find it here