
4. Assembly Language Basics

Now that we understand how computers represent data, it’s time to start speaking their language. This chapter introduces x86-64 assembly syntax, walks through your first program, and covers the core instructions that will form the building blocks of everything we do.

Assembly Syntax: AT&T vs. Intel

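x86-64 assembly comes in two major syntax styles. We’ll be using Intel syntax (the style NASM uses), but you should be able to read both. The three lines in each block below are the same three instructions, written once in each syntax: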

AT&T Syntax (GAS - GNU Assembler)

movq $0x10, %rax      # Immediate value with $, register with %; q suffix = 64-bit
addl %ebx, %eax       # Source first, destination second; l suffix = 32-bit
movl 8(%rbp), %edi    # Memory operand: displacement(base,index,scale)

Intel Syntax (NASM, MASM)

mov rax, 10h          ; No special symbols for immediates/registers; 10h is hex (0x10)
add eax, ebx          ; Destination first, source second
mov edi, [rbp+8]      ; Memory operand: [base + displacement]
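
Both styles describe the same address calculation, base + index*scale + displacement; only the spelling differs. As a rough model, the hypothetical C helper `effective_address` below computes the address such an operand names, without touching memory (C is used here only because it can be compiled and checked on its own):

```c
#include <stdint.h>

/* Hypothetical helper: the address an x86-64 memory operand refers to.
   AT&T:  disp(base, index, scale)
   Intel: [base + index*scale + disp]
   On real hardware scale must be 1, 2, 4, or 8. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           uint64_t scale, int64_t disp) {
    /* Unsigned arithmetic wraps, so a negative disp subtracts correctly */
    return base + index * scale + (uint64_t)disp;
}
```

For instance, `effective_address(rbp, 0, 1, 8)` is the address that both `8(%rbp)` and `[rbp+8]` read from.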

Your First Assembly Program

Let’s create the classic “Hello World” program to see assembly in action:

; hello.s - Hello World in x86-64 Assembly (Intel Syntax)

section .data
    message db "Hello, World!", 10, 0   ; String with newline and null terminator

section .text
    global _start                       ; Make entry point visible to linker

_start:
    ; Write system call (sys_write)
    mov rax, 1                          ; system call number (sys_write)
    mov rdi, 1                          ; file descriptor (stdout)
    mov rsi, message                    ; message pointer
    mov rdx, 14                         ; message length
    syscall                             ; invoke system call

    ; Exit system call (sys_exit)
    mov rax, 60                         ; system call number (sys_exit)
    mov rdi, 0                          ; exit status (0 = success)
    syscall                             ; invoke system call

To assemble and run (using NASM):

sudo apt install nasm
nasm -f elf64 hello.s -o hello.o
ld hello.o -o hello
./hello

Explanation

; starts a comment. Everything after ; on the same line is for humans to read and is ignored by the assembler.

section .data - This defines the “data section” of our program. This is where we put all our pre-defined variables and constants that have initial values.

message db "Hello, World!", 10, 0

  • message - This is a label (a name we give to a memory location). Think of it like a variable name.
  • db - Stands for “define bytes”. It tells the assembler we’re defining a sequence of bytes.
  • "Hello, World!" - The actual text string we want to print.
  • 10 - The ASCII code for a newline character (\n).
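  • 0 - A null terminator (value 0). C-style string functions use it to find the end of a string. sys_write does not look for it, because we pass the length explicitly; that is why rdx is 14 (13 characters plus the newline) rather than 15.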
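
The byte layout is easy to check from C, whose string literals also end in a 0 byte. This is a sketch for verification only; the array and function names are hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* C mirror of: message db "Hello, World!", 10, 0
   The compiler appends the final 0 byte to the string literal automatically. */
static const char message[] = "Hello, World!\n";

/* Bytes the data section holds: 13 characters + newline + null = 15 */
size_t message_total_bytes(void) { return sizeof message; }

/* Bytes passed to sys_write in rdx: strlen stops at the null, giving 14 */
size_t message_write_len(void) { return strlen(message); }

/* 1 if byte 13 is the newline (ASCII 10) and byte 14 is the terminator */
int message_layout_ok(void) { return message[13] == 10 && message[14] == 0; }
```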

section .text defines the “text section” where our actual program code lives.

global _start makes the _start label visible to the linker. The linker needs to know where our program begins execution.

_start: is our program’s entry point. When the program runs, execution begins here. The colon : defines it as a label.

syscall is the instruction that hands control to the kernel. The kernel reads rax to learn which system call to run, and rdi, rsi, and rdx for its parameters.

Now that you’ve seen your first assembly program, let’s understand the magic behind the syscall instruction and how programs communicate with the operating system.

What are System Calls?

System calls are the interface between your program and the operating system kernel. They allow user programs to request services from the OS, such as:

  • Input/Output (reading/writing to files, printing to screen)
  • Memory management (allocating/freeing memory)
  • Process control (creating/ending processes)
  • Device communication (accessing hardware)

Think of system calls as “privileged functions” that only the OS can perform for security and stability reasons.

In x86-64 Linux, we use the syscall instruction to invoke system calls:

mov rax, 1        ; System call number (sys_write = 1)
mov rdi, 1        ; First argument (file descriptor = stdout)
mov rsi, message  ; Second argument (pointer to message)
mov rdx, 14       ; Third argument (message length)
syscall           ; Trigger the system call
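
For comparison, glibc on Linux exposes the same mechanism through its `syscall()` wrapper, which loads the registers for you. `SYS_write` comes from `<sys/syscall.h>` and equals 1 on x86-64 Linux; the function name `write_hello` is only illustrative, and the sketch assumes Linux with glibc:

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

static const char message[] = "Hello, World!\n";

/* Same request as the assembly above:
   rax = SYS_write, rdi = 1 (stdout), rsi = message, rdx = 14 (length).
   Long literals (1L, 14L) match the long-sized arguments syscall() reads.
   Returns the byte count the kernel left in rax, or -1 (errno set) on failure. */
long write_hello(void) {
    return syscall(SYS_write, 1L, message, 14L);
}
```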

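What happens during syscall:

  1. The CPU switches to kernel mode and jumps to the kernel’s system call entry point
  2. The kernel reads the system call number from rax
  3. The kernel reads the arguments from the argument registers
  4. The kernel performs the requested operation
  5. The kernel places the result in rax
  6. The CPU switches back to user mode, and execution continues at the instruction after syscall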
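
The result in rax also reports failures: the kernel signals an error by leaving a small negative number, the negated error code, in rax. A minimal C check, assuming Linux with glibc (whose `syscall()` wrapper turns that negative value into -1 and copies the code into `errno`); `bad_fd_errno` is a hypothetical name:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Asks the kernel to write to fd -1, which is never a valid descriptor.
   The raw system call leaves -EBADF in rax; glibc's syscall() wrapper
   converts that into a return value of -1 and stores EBADF in errno. */
int bad_fd_errno(void) {
    errno = 0;
    long ret = syscall(SYS_write, -1L, "x", 1L);
    return ret == -1 ? errno : 0;
}
```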

For a lookup table of Linux system call numbers and their arguments, refer to syscall.sh.

The calling convention for system calls on x86-64 Linux is straightforward:

Register   Purpose
rax        System call number
rdi        First argument
rsi        Second argument
rdx        Third argument
r10        Fourth argument
r8         Fifth argument
r9         Sixth argument

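The return value is always placed in rax. On failure, the raw system call returns a negative error code (for example, -9, which is -EBADF, for an invalid file descriptor). The syscall instruction itself overwrites rcx and r11, which is why the fourth argument goes in r10 rather than rcx as in the regular function-calling convention.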
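
To see all six argument registers in use, here is mmap, which takes six arguments, called from C through glibc's `syscall()` wrapper. The comments show which register each argument occupies on x86-64 Linux; the function name and the 4096-byte length are illustrative choices, and the sketch assumes 64-bit Linux with glibc:

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical demo of a six-argument system call (mmap).
   Returns 1 if an anonymous page was mapped, written, and read back. */
int map_one_page(void) {
    long length = 4096;
    void *p = (void *)syscall(SYS_mmap,
                              0L,                                  /* rdi: address hint (0 = kernel chooses) */
                              length,                              /* rsi: length in bytes */
                              (long)(PROT_READ | PROT_WRITE),      /* rdx: protection */
                              (long)(MAP_PRIVATE | MAP_ANONYMOUS), /* r10: flags */
                              -1L,                                 /* r8:  fd (-1: no file) */
                              0L);                                 /* r9:  file offset */
    if (p == MAP_FAILED)
        return 0;
    char *bytes = p;
    bytes[0] = 42;                       /* the new page is writable */
    int ok = (bytes[0] == 42);
    syscall(SYS_munmap, p, length);      /* rdi: address, rsi: length */
    return ok;
}
```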

Basic Operand Types

Immediate Values:

mov rax, 100         ; Decimal
mov rbx, 0x64        ; Hexadecimal  
mov rcx, 0b1100100   ; Binary
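
All three lines load the same value, 100; the prefix only tells the assembler how to read the digits. The C standard library's `strtol` can confirm the conversion (helper names are hypothetical; binary is passed without the 0b prefix, since support for that prefix in `strtol` varies between C libraries):

```c
#include <stdlib.h>

/* Parse the three spellings from the example: decimal, hex, and binary.
   strtol with base 16 accepts an optional 0x prefix. */
long imm_decimal(void) { return strtol("100", NULL, 10); }
long imm_hex(void)     { return strtol("0x64", NULL, 16); }
long imm_binary(void)  { return strtol("1100100", NULL, 2); }
```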

Register Operands:

mov rax, rbx         ; Register to register
add rcx, rax         ; Register arithmetic

Memory Operands:

mov rax, [variable]  ; Direct addressing
mov rbx, [rsi]       ; Register indirect
mov rcx, [rsi + 8]   ; Register + displacement
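
The difference between these three forms is where the address comes from. A rough C model, with hypothetical names and demo data standing in for a labeled quadword and an array:

```c
#include <stdint.h>
#include <string.h>

int64_t variable = 7;               /* plays the role of a labeled quadword in .data */
int64_t table[3] = {10, 20, 30};    /* demo data: three 8-byte values in a row */

/* mov rax, [variable] : read the 8 bytes stored at the label's address */
int64_t load_direct(void) { return variable; }

/* mov rbx, [rsi] : rsi holds an address; read the value it points to */
int64_t load_indirect(const int64_t *rsi) { return *rsi; }

/* mov rcx, [rsi + 8] : add 8 *bytes* to the address in rsi, then read.
   Going through a char pointer keeps the byte offset explicit. */
int64_t load_disp8(const int64_t *rsi) {
    int64_t value;
    memcpy(&value, (const char *)rsi + 8, sizeof value);
    return value;
}
```

Because each element is 8 bytes wide, `[rsi + 8]` on `table` lands on the second element.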

Essential Instructions

Data Movement:

mov dest, src        ; Move data
lea dest, [src]      ; Load effective address (calculate address)
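
The key distinction is that mov reads memory while lea only computes the address. A C sketch of both (names and demo data are hypothetical), plus the common trick of using lea for arithmetic:

```c
#include <stdint.h>

int64_t pair[2] = {100, 200};   /* demo data */

/* lea rax, [rsi + 8] : only computes the address; memory is not read */
const int64_t *lea_disp8(const int64_t *rsi) {
    return (const int64_t *)((const char *)rsi + 8);
}

/* mov rax, [rsi + 8] : computes the same address, then reads memory there */
int64_t mov_disp8(const int64_t *rsi) { return *lea_disp8(rsi); }

/* Because lea never touches memory, it doubles as cheap arithmetic:
   lea rax, [rdi + rdi*4] leaves rdi * 5 in rax. */
uint64_t lea_times5(uint64_t rdi) { return rdi + rdi * 4; }
```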

Arithmetic:

add dest, src        ; Addition
sub dest, src        ; Subtraction
inc dest             ; Increment
dec dest             ; Decrement
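
Each of these overwrites the destination with the result, and since a register has a fixed width, the result wraps around on overflow. A C model using `uint64_t` to stand in for a 64-bit register (function names are hypothetical; the model ignores flags, and note that inc and dec leave the carry flag unchanged while add and sub update it):

```c
#include <stdint.h>

/* uint64_t behaves like a 64-bit register: results wrap modulo 2^64. */
uint64_t add64(uint64_t dest, uint64_t src) { return dest + src; }  /* add dest, src */
uint64_t sub64(uint64_t dest, uint64_t src) { return dest - src; }  /* sub dest, src */
uint64_t inc64(uint64_t dest) { return dest + 1; }                  /* inc dest */
uint64_t dec64(uint64_t dest) { return dest - 1; }                  /* dec dest */
```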

Comparison:

cmp op1, op2         ; Compare: computes op1 - op2, sets flags, discards the result
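
cmp performs the same subtraction as sub but keeps only the flags. A hypothetical C model of the four flags most conditional jumps test, for 64-bit operands:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool zf;  /* zero flag: difference is 0, i.e. op1 == op2 */
    bool sf;  /* sign flag: top bit of the difference */
    bool cf;  /* carry flag: unsigned borrow, i.e. op1 < op2 as unsigned */
    bool of;  /* overflow flag: signed difference did not fit in 64 bits */
} Flags;

/* Model of `cmp op1, op2`: compute op1 - op2, record flags, drop the result */
Flags cmp64(uint64_t op1, uint64_t op2) {
    uint64_t r = op1 - op2;
    Flags f;
    f.zf = (r == 0);
    f.sf = (r >> 63) & 1;
    f.cf = op1 < op2;
    /* Signed overflow: operands differ in sign and the result's sign differs from op1 */
    f.of = (((op1 ^ op2) & (op1 ^ r)) >> 63) & 1;
    return f;
}
```

In this model, je would jump when `zf` is true and jne when it is false.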

Control Flow:

jmp label            ; Unconditional jump
je label             ; Jump if equal (ZF = 1)
jne label            ; Jump if not equal (ZF = 0)
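
Combining cmp with a conditional jump gives you a loop. The sketch below is C written with goto so that each statement lines up with one instruction of the hypothetical assembly in its comment, which sums n + (n-1) + ... + 1 with n in rdi:

```c
/* Hypothetical loop, one C statement per instruction:

       mov rcx, rdi    ; rcx = n (counter)
       xor eax, eax    ; rax = 0 (running sum)
       cmp rcx, 0
       je done         ; nothing to add when n == 0
   top:
       add rax, rcx
       dec rcx
       cmp rcx, 0
       jne top         ; repeat while rcx != 0
   done:
*/
unsigned long sum_down(unsigned long rdi) {
    unsigned long rcx = rdi;      /* mov rcx, rdi */
    unsigned long rax = 0;        /* xor eax, eax */
    if (rcx == 0) goto done;      /* cmp rcx, 0 / je done */
top:
    rax += rcx;                   /* add rax, rcx */
    rcx--;                        /* dec rcx */
    if (rcx != 0) goto top;       /* cmp rcx, 0 / jne top */
done:
    return rax;
}
```

The early `je done` matters: without it, n = 0 would make dec wrap rcx around to the maximum value and the loop would run far too long.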

Don’t worry if you don’t get everything right away! Some of these concepts, like memory addressing, system calls, and program structure, might feel unfamiliar at first. In the upcoming chapters, we’ll break everything down step by step, and these concepts will become second nature.

For now, just focus on getting comfortable with the tools: try to assemble, link, and run the example program. If you can make “Hello, World!” appear on your screen, you’re already winning!

Next Up: We’ll explore how to work with different data sizes and perform arithmetic operations while understanding how they affect the CPU’s status flags.

This post is licensed under CC BY 4.0 by the author.