9. Array and Addressing Modes

Now that we can make decisions and create loops, let’s learn how to work with collections of data. Arrays are fundamental to programming, and x86-64 provides powerful addressing modes to access them efficiently.

We were introduced to addressing modes earlier in this series; this post applies them to arrays.

What are Arrays?

In assembly, as in computer science generally, an array is simply a contiguous block of memory holding multiple elements of the same data type.

Here is how you define an array:

section .data
    bytes    db 10, 20, 30, 40, 50          ; Array of 5 bytes
    words    dw 1000, 2000, 3000, 4000      ; Array of 4 words (2 bytes each)
    dwords   dd 1, 2, 3, 4, 5, 6            ; Array of 6 doublewords (4 bytes each)
    qwords   dq 100, 200, 300, 400          ; Array of 4 quadwords (8 bytes each)
    
    ; String is also an array!
    message  db 'Hello', 0                  ; Array of characters

Addressing Modes for Array Access

x86-64 provides several ways to calculate memory addresses for array elements.

1. Direct Addressing

Access elements using fixed offsets from the array base.

mov al, [bytes]        ; First element (bytes[0] = 10)
mov bl, [bytes + 1]    ; Second element (bytes[1] = 20)
mov cl, [bytes + 2]    ; Third element (bytes[2] = 30)
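Fixed offsets are measured in bytes, not in elements, which matters as soon as the elements are wider than one byte. A minimal sketch, assuming NASM on 64-bit Linux and reusing the words array from the definitions above:

```nasm
; Sketch: direct addressing into an array of 2-byte words.
section .data
    words dw 1000, 2000, 3000, 4000    ; 4 elements, 2 bytes each

section .text
global _start
_start:
    mov ax, [words]          ; words[0] = 1000 (byte offset 0)
    mov bx, [words + 2]      ; words[1] = 2000 (byte offset 1 * 2)
    mov cx, [words + 2 * 2]  ; words[2] = 3000 (the assembler folds 2 * 2 into 4)

    mov eax, 60              ; sys_exit
    xor edi, edi             ; exit status 0
    syscall
```

Reading [words + 1] here would not give the second element; it would load a value straddling the first two.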

2. Register Indirect

Use a register as a pointer to traverse the array.

mov rsi, bytes         ; RSI points to start of array
mov al, [rsi]          ; bytes[0]
inc rsi                ; Move to next element
mov bl, [rsi]          ; bytes[1]
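inc rsi advances by one byte, which is only correct for byte arrays. For wider elements the pointer has to move by the element size. A small sketch under the same assumptions (NASM, 64-bit Linux), using the dwords array defined earlier:

```nasm
; Sketch: pointer traversal over 4-byte elements.
section .data
    dwords dd 1, 2, 3, 4, 5, 6    ; 6 elements, 4 bytes each

section .text
global _start
_start:
    mov rsi, dwords          ; RSI points to dwords[0]
    mov eax, [rsi]           ; dwords[0] = 1
    add rsi, 4               ; advance by one element (4 bytes)
    mov ebx, [rsi]           ; dwords[1] = 2
    add rsi, 4
    mov ecx, [rsi]           ; dwords[2] = 3

    mov eax, 60              ; sys_exit
    xor edi, edi             ; exit status 0
    syscall
```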

3. Indexed Addressing

The most flexible method: it combines a base, an index, and a scale factor in a single memory operand.

Syntax: [base + index * scale + displacement]

Where:

  • base: Base address register
  • index: Index register (the array subscript)
  • scale: Element size (1, 2, 4, or 8)
  • displacement: Constant offset

section .data
    arr dd 10, 20, 30, 40, 50    ; 32-bit integers (4 bytes each)

section .text
global _start
_start:
    mov rbx, arr        ; Base address
    mov rsi, 2          ; Index (we want arr[2])
    
    ; Access arr[2] = 30
    mov eax, [rbx + rsi * 4]    ; Base + (index * element_size)

    mov rax, 60         ; sys_exit (overwrites the value loaded above)
    xor edi, edi        ; Exit status 0
    syscall
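The displacement part of the syntax is easy to overlook. The sketch below, assuming NASM on 64-bit Linux and a non-PIE static link, shows a constant displacement, a label used as the displacement, and lea computing the same address without reading memory. The file name demo.asm and the choice to return a result through the exit status are illustrative only.

```nasm
; Sketch: variations of [base + index * scale + displacement].
section .data
    arr dd 10, 20, 30, 40, 50    ; 32-bit integers (4 bytes each)

section .text
global _start
_start:
    mov rbx, arr                    ; base address
    mov rsi, 2                      ; index

    mov eax, [rbx + rsi * 4]        ; arr[2] = 30
    mov ecx, [rbx + rsi * 4 + 4]    ; arr[3] = 40 (displacement skips one element)
    mov edx, [arr + rsi * 4]        ; arr[2] = 30 (label acts as the displacement)

    lea rdi, [rbx + rsi * 4]        ; RDI = address of arr[2], no memory access
    mov edi, [rdi]                  ; EDI = 30
    add edi, ecx                    ; EDI = 30 + 40 = 70

    mov eax, 60                     ; sys_exit
    syscall                         ; exit status = EDI
```

Built with nasm -f elf64 demo.asm -o demo.o and ld demo.o -o demo, running ./demo followed by echo $? should print 70.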

Let’s look at some practical array examples.

Example 1: Summing a Byte Array

section .data
    numbers db 5, 10, 15, 20, 25, 30
    length equ $ - numbers        ; Calculate array length

section .text
global _start

_start:
    mov rsi, numbers    ; Pointer to array start
    mov rcx, length     ; Counter = number of elements
    xor rax, rax        ; Sum = 0

sum_loop:
    movzx rbx, byte [rsi]  ; Load byte, zero-extend to 64 bits
    add rax, rbx        ; Add to sum
    inc rsi             ; Move to next element
    loop sum_loop       ; Decrement RCX and loop if not zero
    
    ; RAX now contains the sum (5+10+15+20+25+30 = 105)
    
    mov rax, 60         ; sys_exit (overwrites the sum)
    mov rdi, 0          ; Exit status 0
    syscall

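You may have noticed the $ - numbers expression.

length equ $ - numbers is evaluated once, when the program is assembled, not at runtime:

  • $ is the assembler's current position: the address at the start of the line it appears on, which here is the first byte after the array
  • numbers is the start address of the array
  • $ - numbers calculates: (address after array) - (address of array start) = total bytes
  • equ makes length a constant equal to 6 (since we have 6 bytes); because each element is one byte, the byte count is also the element count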
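For elements wider than one byte, $ - label still yields a size in bytes, so it needs to be divided by the element size to get an element count. A minimal sketch, assuming NASM on 64-bit Linux; the names values and values_len are made up for this illustration:

```nasm
; Sketch: element count of a dword array, computed at assembly time.
section .data
    values     dd 10, 20, 30, 40, 50
    values_len equ ($ - values) / 4    ; 20 bytes / 4 bytes per element = 5

section .text
global _start
_start:
    mov edi, values_len      ; EDI = 5 (an immediate, not a memory load)
    mov eax, 60              ; sys_exit
    syscall                  ; exit status = 5
```

The equ line has to come directly after the array, because $ refers to wherever the assembler currently is.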

The syntax byte [rsi] is a memory operand with an explicit size specifier. Writing byte [...] tells the assembler to read exactly 1 byte from that memory location.

Without the size specifier the operand is ambiguous: movzx accepts either a byte or a word source, so the assembler cannot tell how many bytes to read from [rsi] and reports an error. An instruction such as mov al, [rsi] needs no specifier, because the 8-bit register al already fixes the size.

There is also a new instruction here: movzx (Move with Zero eXtend). It takes a small value and places it in a larger register, filling the upper bits with zeros.

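But why do we need it? We are reading a byte (8 bits) but adding it to rax (64 bits), and add requires both operands to be the same size. Loading the byte with mov bl, [rsi] would leave the upper 56 bits of rbx unchanged, so add rax, rbx could add leftover garbage to the sum. movzx clears those bits, so rbx holds exactly the byte's value.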
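The difference can be seen by loading the same byte both ways into registers. This is only a sketch, assuming NASM on 64-bit Linux; the label value, the junk constant, and the use of setne to report the comparison are choices made for this illustration.

```nasm
; Sketch: plain byte load vs. movzx.
section .data
    value db 200

section .text
global _start
_start:
    mov rbx, 0x1122334455667700   ; pretend RBX holds leftover data
    mov bl, [value]               ; replaces only the low 8 bits:
                                  ; RBX = 0x11223344556677C8
    movzx rcx, byte [value]       ; RCX = 0x00000000000000C8 = 200

    xor edi, edi                  ; EDI = 0
    cmp rbx, rcx                  ; the two registers differ
    setne dil                     ; DIL = 1 when they are not equal

    mov eax, 60                   ; sys_exit
    syscall                       ; exit status = 1
```

An exit status of 1 (visible with echo $? in a shell) means the plain mov left stale upper bits behind, while movzx produced the clean value 200.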

Example 2: String Length Calculation

section .data
    string db "Hello, Assembly!", 0

section .text
global _start

_start:
    mov rdi, string     ; String pointer
    xor rcx, rcx        ; Counter = 0
    
strlen_loop:
    cmp byte [rdi], 0   ; Check for null terminator
    je strlen_done      ; Found end of string
    inc rdi             ; Move to next character
    inc rcx             ; Increment count
    jmp strlen_loop
    
strlen_done:
    ; RCX now contains string length (16)
    
    mov rax, 60         ; sys_exit
    mov rdi, 0          ; Exit status 0
    syscall
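The same loop can be written with indexed addressing, so that the counter doubles as the index and the pointer never moves. A sketch of that variation, assuming NASM on 64-bit Linux; returning the length as the exit status is just a convenient way to inspect it.

```nasm
; Sketch: string length using [base + index].
section .data
    string db "Hello, Assembly!", 0

section .text
global _start
_start:
    mov rdi, string          ; base address, never modified
    xor rcx, rcx             ; index and count = 0

strlen_loop:
    cmp byte [rdi + rcx], 0  ; is string[rcx] the null terminator?
    je strlen_done
    inc rcx                  ; next character, one increment per iteration
    jmp strlen_loop

strlen_done:
    mov rdi, rcx             ; RDI = length (16)
    mov eax, 60              ; sys_exit
    syscall                  ; exit status = 16
```

This version needs one increment per character instead of two, and rdi still points to the start of the string when the loop ends.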

Addressing modes make array access elegant and efficient! The ability to combine base addresses, indexes, and scaling factors in a single instruction is what makes x86-64 assembly powerful for data processing.

Next Up: We’ll learn about Multiplication and Division Instructions - essential for more complex calculations and working with arrays of different element sizes!

This post is licensed under CC BY 4.0 by the author.