16. First Shellcode - Hello World!

Now that we understand the theory, let’s create our first working shellcode! We’ll build a simple “Hello World” shellcode that writes directly to stdout using system calls.

The Goal

We want to create shellcode that accomplishes the same as this C program:

#include <unistd.h>

int main() {
    write(1, "Hello World!\n", 13);
    _exit(0);
}

But in pure assembly, without any library dependencies.

Step 1: Understanding the System Calls

We start with a single system call, write:

write(1, "Hello World!\n", 13) - Output our message

On success, write returns the number of bytes actually written to the file or device associated with the file descriptor. On error it returns -1.

Looking up the Linux x86_64 syscall number:

  • write = 1
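To see the return value described above from C, here is a minimal sketch that invokes write through the generic syscall(2) wrapper instead of the libc write() function. The helper name raw_write is hypothetical and not part of the original post; on x86_64 Linux the SYS_write constant expands to the syscall number 1 listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper (not from the post): issue the write system call
 * through syscall(2). Returns the number of bytes written, or -1 with
 * errno set on failure. */
long raw_write(int fd, const void *buf, size_t len)
{
    assert(buf != NULL);
    return syscall(SYS_write, fd, buf, len);
}
```

Calling raw_write(1, "Hello World!\n", 13) prints the message and returns 13 when stdout accepts all the bytes; an invalid descriptor such as -1 makes it return -1.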

Step 2: The Assembly Implementation

Here’s our first attempt (hello.s):

section .text
    global _start

_start:
    ; write(1, message, 13)
    mov rax, 1          ; syscall number for write
    mov rdi, 1          ; file descriptor (stdout)
    mov rsi, message    ; pointer to message
    mov rdx, 13         ; message length
    syscall             ; invoke kernel

section .data
message:
    db "Hello World!", 0x0a  ; "Hello World!" + "\n"

Step 3: Assembling and Testing

Let’s build this as a normal executable first:

nasm -f elf64 hello.s -o hello.o
ld -o hello hello.o

When we run it, the message prints, but then we get a segmentation fault!

$ ./hello
Hello World!
Segmentation fault (core dumped)

This is because after the write syscall returns, the CPU simply keeps executing whatever bytes come next in memory:

Memory layout:
[Our assembly code] → [Leftover bytes after our code] → [Invalid memory access]
     ↓                          ↓                               ↓
write() executes    →   decoded as instructions      →      SEGFAULT!

The Execution Flow:

  1. write syscall executes successfully - “Hello World!” prints
  2. CPU continues to the next address - but we have no more code!
  3. Interprets leftover bytes as code - whatever follows our last syscall (typically padding, not instructions we wrote) gets decoded as x86-64 instructions
  4. Executes garbage - those accidental instructions read, write, or jump to addresses we never set up
  5. Segmentation fault - the kernel kills the process as soon as one of them touches memory it is not allowed to access

With a typical ld layout the .data section is mapped on a separate, non-executable page, so execution normally faults before it ever reaches the "Hello World!" string itself.

What if we add jmp _start?

section .text
    global _start

_start:
    ; write(1, message, 13)
    mov rax, 1          ; syscall number for write
    mov rdi, 1          ; file descriptor (stdout)
    mov rsi, message    ; pointer to message
    mov rdx, 13         ; message length
    syscall             ; invoke kernel

    jmp _start          ; jump back to the beginning

section .data
message:
    db "Hello World!", 0x0a  ; "Hello World!" + "\n"

If we add a jmp _start instruction to the end of our program, it will continuously jump back to the beginning, causing "Hello World!" to print indefinitely.

This creates an infinite loop that continues until the program is manually stopped (for example, by pressing Ctrl+C).

Let’s replace the infinite jump with a clean exit syscall so the program prints once and terminates properly.

section .text
    global _start

_start:
    ; write(1, message, 13)
    mov     rax, 1          ; syscall number for write
    mov     rdi, 1          ; file descriptor (stdout)
    mov     rsi, message    ; pointer to message
    mov     rdx, 13         ; message length
    syscall                 ; invoke kernel to print message

    ; exit(10) - a non-zero status so we can see it in the shell
    mov     rax, 60         ; syscall number for exit
    mov     rdi, 0xa        ; status code 10
    syscall                 ; invoke kernel to exit program

section .data
message:
    db "Hello World!", 0x0a ; "Hello World!\n"

Assembling, linking, and running it gives:

$ nasm -f elf64 hello.s -o hello.o
$ ld -o hello hello.o
$ ./hello 
Hello World!
$ echo $?
10

Our code exits gracefully with return code 10.
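The value that echo $? prints is the exit status the parent process collects from the kernel. As a rough illustration of that mechanism from the parent's side, the sketch below forks a child that exits with a chosen code and reads it back with waitpid. The helper name observed_exit_status is an assumption made for this example.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run a child that exits with `code` and return the
 * status the parent observes, or -1 on failure. Only the low 8 bits of an
 * exit status reach the parent, so we restrict `code` to 0..255. */
int observed_exit_status(int code)
{
    assert(code >= 0 && code <= 255);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);            /* child: ends like our exit syscall */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With code set to 10, the parent sees 10, which is the same number the shell reported for our assembly program.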

Now that we have working assembly code, let’s transform it into actual shellcode and learn how to deploy it in real exploits.


Step 1: Generating Raw Shellcode Bytes

We will extract shellcode from our assembly:

# Assemble and link normally
nasm -f elf64 hello.s -o hello.o
ld -o hello hello.o

# Extract the raw bytes and use a script to format it for C
objdump -d hello | grep -Po '\s\K[a-f0-9]{2}(?=\s)' | tr -d '\n' | sed 's/\(..\)/\\x\1/g'

Running this gives:

\xb8\x01\x00\x00\x00\xbf\x01\x00\x00\x00\x48\xbe\x00\x20\x40\x00\x00\x00\x00\x00\xba\x0d\x00\x00\x00\x0f\x05\xb8\x3c\x00\x00\x00\xbf\x0a\x00\x00\x00\x0f\x05
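The last stage of that pipeline only prefixes every byte with \x so the result can be pasted into a C string literal. A small C sketch of the same formatting step follows; the function name format_as_c_escapes and its buffer-size convention are assumptions for illustration, not part of the original workflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: render `len` raw bytes as "\xNN" escapes into `out`.
 * Returns the number of characters written (excluding the terminating NUL),
 * or -1 if `out` is too small (it needs 4 * len + 1 bytes). */
int format_as_c_escapes(const unsigned char *bytes, size_t len,
                        char *out, size_t out_size)
{
    assert(bytes != NULL && out != NULL);
    if (out_size < len * 4 + 1)
        return -1;

    out[0] = '\0';                       /* valid result even for len == 0 */
    for (size_t i = 0; i < len; i++)
        snprintf(out + i * 4, 5, "\\x%02x", (unsigned)bytes[i]);
    return (int)(len * 4);
}
```

Feeding it the two bytes 0x0f 0x05 (the syscall instruction) produces the text \x0f\x05.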

Step 2: The Shellcode Loader Template

Once you have the shellcode bytes, test them with a loader:

#include <stdio.h>
#include <string.h>

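// Generated shellcode
unsigned char shellcode[] = "";  // Put shellcode between the quotes

int main(void) {
    printf("Shellcode length: %zu bytes\n", strlen((const char *)shellcode));
    printf("Shellcode address: %p\n", (void *)shellcode);

    // Treat the array as a function and call it.
    // Nothing here makes the memory executable, so this only works
    // if the section holding shellcode[] is already executable.
    int (*func)(void) = (int (*)(void))shellcode;
    func();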
    
    return 0;
}
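Whether the direct call in this template works depends on whether the memory holding shellcode[] is executable. One way to check on Linux is to look the address up in /proc/self/maps. The sketch below does that; the helper name region_perms is hypothetical, and the parsing assumes the usual "start-end perms ..." line format of that file.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: find the mapping that contains `addr` in
 * /proc/self/maps and copy its permission string (for example "rw-p")
 * into `perms`. Returns 0 on success, -1 if the address was not found
 * or the file could not be read. Linux only. */
int region_perms(const void *addr, char perms[5])
{
    assert(addr != NULL && perms != NULL);

    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    char line[512];
    int found = -1;
    while (fgets(line, sizeof line, maps) != NULL) {
        unsigned long start, end;
        char p[5] = {0};
        if (sscanf(line, "%lx-%lx %4s", &start, &end, p) != 3)
            continue;
        if ((uintptr_t)addr >= start && (uintptr_t)addr < end) {
            for (int i = 0; i < 5; i++)
                perms[i] = p[i];
            found = 0;
            break;
        }
    }
    fclose(maps);
    return found;
}
```

If the third character of the result for shellcode is '-' rather than 'x', the page is not executable and calling into it will crash.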

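On most modern systems this crashes, because the memory holding shellcode[] is marked non-executable (NX). In that case we use an mmap-based loader, which explicitly asks the kernel for executable memory: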

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

const char shellcode[] = "\xb8\x01\x00\x00\x00\xbf\x01\x00\x00\x00\x48\xbe\x00\x20\x40\x00\x00\x00\x00\x00\xba\x0d\x00\x00\x00\x0f\x05\xb8\x3c\x00\x00\x00\xbf\x0a\x00\x00\x00\x0f\x05";

int main() {
    printf("Shellcode Length: %zu bytes\n", strlen(shellcode));
    
    // Allocate executable memory
    void *exec_mem = mmap(NULL, sizeof(shellcode), 
                         PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    
    if (exec_mem == MAP_FAILED) {
        perror("mmap failed");
        return 1;
    }
    
    printf("Allocated executable memory at: %p\n", exec_mem);
    
    // Copy shellcode to executable memory
    memcpy(exec_mem, shellcode, sizeof(shellcode));
    
    printf("Executing shellcode...\n");
    
    // Note: calling shellcode[] directly would mean executing from a
    // non-executable data section, so we jump to the mmap'd copy instead.

    // Cast to function pointer and execute
    int (*func)() = (int(*)())exec_mem;
    func();
    
    // Cleanup (though we may not reach this)
    munmap(exec_mem, sizeof(shellcode));
    
    return 0;
}

But this fails!

$ gcc loader.c -o loader
$ ./loader 
Shellcode Length: 2 bytes
Allocated executable memory at: 0x748e81250000
Executing shellcode...

The reported length is only 2 bytes, even though we pasted 39. The culprit is the null byte!

The Problem: Null Bytes in Shellcode!

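strlen() stops counting at the first null byte (\x00), and our shellcode is full of them. Our loader copies with sizeof, so the bytes still arrive intact here, but in a real exploit the payload usually travels through string functions such as strcpy, which would cut it off after 2 bytes.

There is a second problem hiding in these bytes: 48 be 00 20 40 00 ... encodes mov rsi, 0x402000, an absolute address in the original binary’s .data section. The message itself was never extracted (objdump -d only dumps code), so inside the loader that address does not point at our string.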
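To make this concrete, here is a small sketch for inspecting a payload: it counts the null bytes, reports how many bytes a C string function would see, and decodes the 8-byte immediate of a mov rsi, imm64 instruction (opcode bytes 48 be), which is how the first payload carries the address of message. All three function names are hypothetical helpers written for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count the 0x00 bytes in a payload of `len` bytes. */
size_t count_null_bytes(const unsigned char *code, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (code[i] == 0x00)
            n++;
    return n;
}

/* Length a C string function would see: bytes before the first 0x00. */
size_t string_visible_length(const unsigned char *code, size_t len)
{
    const unsigned char *nul = memchr(code, 0x00, len);
    return nul ? (size_t)(nul - code) : len;
}

/* Decode the little-endian 64-bit immediate of a `48 be` instruction
 * (mov rsi, imm64) that starts at code[off]. */
uint64_t mov_rsi_imm64(const unsigned char *code, size_t off)
{
    assert(code[off] == 0x48 && code[off + 1] == 0xbe);
    uint64_t value = 0;
    for (int i = 7; i >= 0; i--)
        value = (value << 8) | code[off + 2 + i];
    return value;
}
```

Applied to the 39-byte payload above, these report 21 null bytes, a string-visible length of 2, and an embedded address of 0x402000 (the instruction starts at offset 10). That address belongs to the .data section of the original hello binary, not to anything inside the payload.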

Let’s rewrite the shellcode so that it contains no null bytes:

section .text
    global _start

_start:
    jmp short get_message

shellcode:
    ; Get message address using JMP-CALL-POP
    pop rsi
    
    ; write(1, message, 13) - NO NULL BYTES!
    xor rax, rax        ; Clear RAX
    mov al, 1           ; SYS_write = 1 (only sets low byte)
    xor rdi, rdi        ; Clear RDI  
    mov dil, 1          ; fd = 1 (only sets low byte)
    xor rdx, rdx        ; Clear RDX
    mov dl, 13          ; length = 13 (only sets low byte)
    syscall

    ; exit(0) - NO NULL BYTES!
    xor rax, rax        ; Clear RAX
    mov al, 60          ; SYS_exit = 60 (only sets low byte)
    xor rdi, rdi        ; status = 0 (no null bytes!)
    syscall

get_message:
    call shellcode
    message db "Hello World!", 0x0a

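When you assemble and link this code, then dump the output, you’ll find the generated shellcode contains no null bytes. The JMP-CALL-POP trick also makes it position-independent: call pushes the address of the byte that follows it (our message) onto the stack, and pop rsi loads that address at run time, so no absolute address is baked in. The message now lives in .text as well, so it is extracted along with the code.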

$ objdump -d null_free | grep -Po '\s\K[a-f0-9]{2}(?=\s)' | tr -d '\n' | sed 's/\(..\)/\\x\1/g'
\xeb\x1d\x5e\x48\x31\xc0\xb0\x01\x48\x31\xff\x40\xb7\x01\x48\x31\xd2\xb2\x0d\x0f\x05\x48\x31\xc0\xb0\x3c\x48\x31\xff\x0f\x05\xe8\xde\xff\xff\xff\x48\x65\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64\x21\x0a
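The first two bytes (eb 1d) and the five bytes starting with e8 are the jmp short and call of the JMP-CALL-POP pattern, and both use displacements relative to the next instruction. The sketch below decodes such displacements so the targets can be checked by hand; the function names are hypothetical and only cover these two opcodes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Target offset of a `jmp short` (opcode 0xEB, 1-byte signed displacement)
 * located at code[off]. The displacement counts from the next instruction. */
long jmp_short_target(const unsigned char *code, size_t off)
{
    assert(code[off] == 0xeb);
    return (long)off + 2 + (int8_t)code[off + 1];
}

/* Target offset of a near `call` (opcode 0xE8, 4-byte little-endian signed
 * displacement) located at code[off]. */
long call_rel32_target(const unsigned char *code, size_t off)
{
    assert(code[off] == 0xe8);
    uint32_t raw = (uint32_t)code[off + 1]
                 | (uint32_t)code[off + 2] << 8
                 | (uint32_t)code[off + 3] << 16
                 | (uint32_t)code[off + 4] << 24;
    return (long)off + 5 + (int32_t)raw;
}

/* Offset that `call` pushes as its return address: the byte right after it. */
long call_return_offset(size_t off)
{
    return (long)off + 5;
}
```

For the 49-byte payload above, the jmp at offset 0 lands on offset 31 (the call), the call jumps back to offset 2 (pop rsi), and the pushed return address is offset 36, which is exactly where the bytes of "Hello World!\n" begin.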

Let’s use this shellcode in our loader program and run it.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

const char shellcode[] = "\xeb\x1d\x5e\x48\x31\xc0\xb0\x01\x48\x31\xff\x40\xb7\x01\x48\x31\xd2\xb2\x0d\x0f\x05\x48\x31\xc0\xb0\x3c\x48\x31\xff\x0f\x05\xe8\xde\xff\xff\xff\x48\x65\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64\x21\x0a";

int main() {
    printf("Shellcode Length: %zu bytes\n", strlen(shellcode));
    
    // Allocate executable memory
    void *exec_mem = mmap(NULL, sizeof(shellcode), 
                         PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    
    if (exec_mem == MAP_FAILED) {
        perror("mmap failed");
        return 1;
    }
    
    printf("Allocated executable memory at: %p\n", exec_mem);
    
    // Copy shellcode to executable memory
    memcpy(exec_mem, shellcode, sizeof(shellcode));
    
    printf("Executing shellcode...\n");
    
    // Note: calling shellcode[] directly would mean executing from a
    // non-executable data section, so we jump to the mmap'd copy instead.

    // Cast to function pointer and execute
    int (*func)() = (int(*)())exec_mem;
    func();
    
    // Cleanup (though we may not reach this)
    munmap(exec_mem, sizeof(shellcode));
    
    return 0;
}

On compiling and running, the shellcode executes successfully:

$ gcc loader.c -o loader
$ ./loader 
Shellcode Length: 49 bytes
Allocated executable memory at: 0x7d824fbad000
Executing shellcode...
Hello World!

I have developed an automation tool that streamlines the process of compilation to shellcode extraction and execution.

Workflow:

Assembly (.s) → Binary → Shellcode Extraction → Loader Generation → Execution

You can find it here

This post is licensed under CC BY 4.0 by the author.