4. Assembly Language Basics
Now that we understand how computers represent data, it’s time to start speaking their language. This chapter introduces the fundamental concepts of x86-64 assembly syntax, your first program, and the core instructions that will form the building blocks of everything we do.
Assembly Syntax: AT&T vs. Intel
x86-64 assembly has two major syntax formats. We’ll be using Intel syntax (used by NASM), but you should be aware of both:
AT&T Syntax (GAS - GNU Assembler)
movq $0x10, %rax # Immediate value with $, register with %
addl %ebx, %eax # Source first, destination second
movl 8(%rbp), %edi # Memory operand: displacement(base,index,scale)
Intel Syntax (NASM, MASM)
mov rax, 10h # No special symbols for immediates/registers
add eax, ebx # Destination first, source second
mov edi, [rbp+8] # Memory operand: [base + displacement]
Your First Assembly Program
Let’s create the classic “Hello World” program to see assembly in action:
; hello.s - Hello World in x86-64 Assembly (Intel Syntax)
section .data
    message db "Hello, World!", 10, 0 ; String with newline and null terminator
section .text
    global _start ; Make entry point visible to linker
_start:
    ; Write system call (sys_write)
    mov rax, 1 ; system call number (sys_write)
    mov rdi, 1 ; file descriptor (stdout)
    mov rsi, message ; message pointer
    mov rdx, 14 ; message length
    syscall ; invoke system call
    ; Exit system call (sys_exit)
    mov rax, 60 ; system call number (sys_exit)
    mov rdi, 0 ; exit status (0 = success)
    syscall ; invoke system call
To install NASM (shown here for Debian/Ubuntu), then assemble, link, and run the program:
sudo apt install nasm
nasm -f elf64 hello.s -o hello.o
ld hello.o -o hello
./hello
Explanation
; indicates a comment. Everything after ; on this line is for humans to read and is ignored by the assembler.
section .data - This defines the “data section” of our program. This is where we put all our pre-defined variables and constants that have initial values.
message db "Hello, World!", 10, 0
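- message - This is a label (a name we give to a memory location). Think of it like a variable name.
- db - Stands for “define byte”. It tells the assembler to emit the sequence of bytes that follows.
- "Hello, World!" - The actual text string we want to print (13 bytes).
- 10 - The ASCII code for a newline character (\n).
- 0 - A null terminator (value 0). C-style string functions use it to find the end of a string. sys_write does not need it, because we pass the length explicitly: the 14 bytes we write are the 13 characters plus the newline, and the terminator is not included.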
section .text defines the “text section” where our actual program code lives.
global _start makes the _start label visible to the linker. The linker needs to know where our program begins execution.
_start: is our program’s entry point. When the program runs, execution begins here. The colon : defines it as a label.
The syscall instruction hands control to the kernel. The kernel reads rax to decide which system call to run and takes that call’s arguments from rdi, rsi, and rdx.
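Hard-coding the length 14 is easy to get wrong when the text changes. As a minimal sketch, the variant below lets the assembler compute the length with NASM's `equ` directive and the `$` symbol (the address of the current line). The file name `hello_len.s` and the name `message_len` are chosen here for illustration and are not part of the original program; the unused null terminator is dropped.

```nasm
; hello_len.s - variant of hello.s that computes the message length at assembly time
section .data
    message     db "Hello, World!", 10   ; 13 characters plus a newline
    message_len equ $ - message          ; current address minus start of message = 14

section .text
    global _start

_start:
    mov rax, 1              ; system call number (sys_write)
    mov rdi, 1              ; file descriptor (stdout)
    mov rsi, message        ; pointer to the bytes to write
    mov rdx, message_len    ; byte count, filled in by the assembler
    syscall

    mov rax, 60             ; system call number (sys_exit)
    mov rdi, 0              ; exit status (0 = success)
    syscall
```

It assembles and links with the same commands as before (`nasm -f elf64 hello_len.s -o hello_len.o`, then `ld hello_len.o -o hello_len`). Because `equ` defines a constant rather than reserving memory, `message_len` takes up no space in the data section.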
Now that you’ve seen your first assembly program, let’s understand the magic behind the syscall instruction and how programs communicate with the operating system.
What are System Calls?
System calls are the interface between your program and the operating system kernel. They allow user programs to request services from the OS, such as:
- Input/Output (reading/writing to files, printing to screen)
- Memory management (allocating/freeing memory)
- Process control (creating/ending processes)
- Device communication (accessing hardware)
Think of system calls as “privileged functions” that only the OS can perform for security and stability reasons.
In x86-64 Linux, we use the syscall instruction to invoke system calls:
mov rax, 1 ; System call number (sys_write = 1)
mov rdi, 1 ; First argument (file descriptor = stdout)
mov rsi, message ; Second argument (pointer to message)
mov rdx, 14 ; Third argument (message length)
syscall ; Trigger the system call
What happens during syscall:
- CPU switches to kernel mode
- Kernel checks the system call number in rax
- Kernel reads arguments from registers
- Kernel performs the requested operation
- CPU switches back to user mode
- Result is returned in rax
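To see the returned value for yourself, one option is to pass it straight to sys_exit as the exit status. This is only a sketch: the file name `write_ret.s` is made up for illustration, and it assumes the write to stdout succeeds in full.

```nasm
; write_ret.s - use the return value of sys_write as the program's exit status
section .data
    message db "Hello, World!", 10   ; 14 bytes in total

section .text
    global _start

_start:
    mov rax, 1          ; system call number (sys_write)
    mov rdi, 1          ; file descriptor (stdout)
    mov rsi, message    ; pointer to message
    mov rdx, 14         ; message length
    syscall             ; on return, rax holds the number of bytes written

    mov rdi, rax        ; copy the result before rax is overwritten below
    mov rax, 60         ; system call number (sys_exit)
    syscall
```

After assembling and linking, running `./write_ret; echo $?` in a shell should print the greeting followed by 14, the byte count that sys_write reported. Note the order of the last two `mov` instructions: loading 60 into rax first would destroy the value we want to report.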
For a searchable reference of Linux system call numbers and arguments, see syscall.sh.
The calling convention for system calls on x86-64 Linux is straightforward:
| Register | Purpose |
|---|---|
| rax | System call number |
| rdi | First argument |
| rsi | Second argument |
| rdx | Third argument |
| r10 | Fourth argument |
| r8 | Fifth argument |
| r9 | Sixth argument |
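The return value is always placed in rax. On failure, rax holds a negative error code (the negated errno value, in the range -4095 to -1). The syscall instruction itself also overwrites rcx and r11, so don’t rely on those two registers keeping their values across a system call.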
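The hello program only needs three arguments, so r10, r8, and r9 never appear in it. As an illustrative sketch, sys_mmap (number 9) takes six arguments and uses every register in the table. The file name `mmap_demo.s` is hypothetical, the flag values are the standard Linux x86-64 constants written out as numbers, and the code assumes the call succeeds rather than checking for an error.

```nasm
; mmap_demo.s - a six-argument system call, using all argument registers
section .text
    global _start

_start:
    mov rax, 9              ; system call number (sys_mmap)
    mov rdi, 0              ; 1st argument: address hint (0 = let the kernel choose)
    mov rsi, 4096           ; 2nd argument: length in bytes (one 4 KiB page)
    mov rdx, 3              ; 3rd argument: PROT_READ | PROT_WRITE
    mov r10, 0x22           ; 4th argument: MAP_PRIVATE | MAP_ANONYMOUS
    mov r8, -1              ; 5th argument: file descriptor (-1 = no backing file)
    mov r9, 0               ; 6th argument: offset into the file (unused here)
    syscall                 ; rax = address of the new memory (assumed successful)

    mov byte [rax], 42      ; store one byte in the freshly mapped page
    movzx rdi, byte [rax]   ; read it back, zero-extended, as the exit status

    mov rax, 60             ; system call number (sys_exit)
    syscall
```

If the mapping succeeds, `./mmap_demo; echo $?` should print 42. A more careful program would test whether rax is negative before touching the memory.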
Basic Operand Types
Immediate Values:
mov rax, 100 ; Decimal
mov rbx, 0x64 ; Hexadecimal
mov rcx, 0b1100100 ; Binary
Register Operands:
mov rax, rbx ; Register to register
add rcx, rax ; Register arithmetic
Memory Operands:
mov rax, [variable] ; Direct addressing
mov rbx, [rsi] ; Register indirect
mov rcx, [rsi + 8] ; Register + displacement
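The three memory forms can be tried together in one small program. This is a sketch under a few assumptions: the file name `operands.s` and the label `values` are invented for the example, and the result is reported through the exit status because printing numbers needs more code than we have covered so far.

```nasm
; operands.s - direct, register-indirect, and register + displacement addressing
section .data
    values dq 10, 20        ; two 64-bit values stored back to back

section .text
    global _start

_start:
    mov rax, [values]       ; direct addressing: rax = 10
    mov rsi, values         ; rsi = address of the first value (no brackets)
    mov rbx, [rsi]          ; register indirect: rbx = 10
    mov rcx, [rsi + 8]      ; register + displacement: rcx = 20 (8 bytes further on)

    add rax, rbx            ; rax = 10 + 10 = 20
    add rax, rcx            ; rax = 20 + 20 = 40

    mov rdi, rax            ; exit status = 40
    mov rax, 60             ; system call number (sys_exit)
    syscall
```

After assembling and linking as before, `./operands; echo $?` should print 40. The key distinction is the brackets: `mov rsi, values` loads the address itself, while `mov rax, [values]` loads the data stored at that address.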
Essential Instructions
Data Movement:
mov dest, src ; Move data
lea dest, [src] ; Load effective address (calculate address)
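The difference between mov and lea is easiest to see side by side: with brackets, mov reads memory, while lea only computes the address expression and never touches memory. The sketch below uses hypothetical names (`lea_demo.s`, `values`) and again reports its result through the exit status.

```nasm
; lea_demo.s - lea computes an address; mov with brackets reads from it
section .data
    values dq 7, 9          ; two 64-bit values

section .text
    global _start

_start:
    mov rsi, values         ; rsi = address of the first value
    lea rbx, [rsi + 8]      ; rbx = rsi + 8 (address of the second value, no memory read)
    mov rcx, [rbx]          ; rcx = 9 (now memory is actually read)

    mov rax, 8
    lea rdx, [rax + rax*4]  ; rdx = 8 + 8*4 = 40 (lea used as plain arithmetic)
    add rdx, rcx            ; rdx = 40 + 9 = 49

    mov rdi, rdx            ; exit status = 49
    mov rax, 60             ; system call number (sys_exit)
    syscall
```

Running `./lea_demo; echo $?` should print 49. Using lea for arithmetic, as in the second half, is a common idiom because it can add and scale in one instruction without changing the flags.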
Arithmetic:
add dest, src ; Addition
sub dest, src ; Subtraction
inc dest ; Increment
dec dest ; Decrement
Comparison:
cmp op1, op2 ; Compare (sets flags without storing result)
Control Flow:
jmp label ; Unconditional jump
je label ; Jump if equal
jne label ; Jump if not equal
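Put together, cmp and the jump instructions are enough to build a loop. As a minimal sketch, the program below adds the numbers 1 through 10 and exits with the total. The file name `sum_loop.s` and the labels `sum_loop` and `done` are illustrative choices, not names from the original text.

```nasm
; sum_loop.s - sum the integers 1..10 using cmp, je, and jmp
section .text
    global _start

_start:
    mov rax, 0          ; running total
    mov rcx, 1          ; counter, starts at 1

sum_loop:
    cmp rcx, 11         ; has the counter gone past 10?
    je done             ; yes: leave the loop
    add rax, rcx        ; no: add the counter to the total
    inc rcx             ; move on to the next number
    jmp sum_loop        ; unconditional jump back to the comparison

done:
    mov rdi, rax        ; exit status = 1 + 2 + ... + 10 = 55
    mov rax, 60         ; system call number (sys_exit)
    syscall
```

Running `./sum_loop; echo $?` should print 55. The same loop could be written with the comparison at the bottom and a `jne sum_loop` to jump back while the counter is not yet 11, which saves one jump per iteration.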
Don’t worry if you don’t get everything right away! Some of these concepts like memory addressing, system calls, and program structure might feel unfamiliar at first. In the upcoming chapters, we’ll break everything down step by step and these concepts will become second nature.
For now, just focus on getting comfortable with the tools - try to assemble, link, and run the example program. If you can make “Hello, World!” appear on your screen, you’re already winning!
Next Up: We’ll explore how to work with different data sizes and perform arithmetic operations while understanding how they affect the CPU’s status flags.