12. Strings

String instructions are specialized x86-64 operations designed for efficient processing of strings and memory blocks. They automatically handle pointer increment/decrement and can repeat operations, making them ideal for working with sequences of bytes, words, doublewords, or quadwords.

What are String Instructions?

String instructions use specific register combinations:

  • RSI - Source Index (points to source data)
  • RDI - Destination Index (points to destination)
  • RCX - Counter (number of iterations)
  • RAX - Value for comparisons and storage

They automatically update pointers based on the Direction Flag (DF).

Direction Flag Control

The Direction Flag determines whether pointers increment or decrement:

cld                 ; Clear Direction Flag (DF=0) - increment pointers
std                 ; Set Direction Flag (DF=1) - decrement pointers

Always set the Direction Flag explicitly before using string instructions!

  • Do not rely on the Direction Flag having a particular initial state: earlier code may have left it set
  • String instructions behave differently based on DF:

    • DF=0 (cld): Pointers increment by the element size (RSI++, RDI++ for byte operations)
    • DF=1 (std): Pointers decrement by the element size (RSI--, RDI-- for byte operations)
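To make the pointer update rule concrete, here is a tiny Python model of it. This is only a sketch for reasoning about the behavior, not something the CPU or assembler provides; the function name step is made up for this post.

```python
def step(pointer: int, element_size: int, df: int) -> int:
    """Return the pointer value after one string instruction.

    element_size is 1, 2, 4 or 8 (B, W, D, Q); df is the Direction Flag.
    """
    if element_size not in (1, 2, 4, 8):
        raise ValueError("element size must be 1, 2, 4 or 8 bytes")
    delta = -element_size if df else element_size
    # Registers are 64 bits wide, so wrap around like the hardware does.
    return (pointer + delta) & 0xFFFFFFFFFFFFFFFF


# MOVSB with DF=0 moves forward by 1; MOVSQ with DF=1 moves back by 8.
print(hex(step(0x402000, 1, 0)))  # 0x402001
print(hex(step(0x402000, 8, 1)))  # 0x401ff8
```

The point to take away is that the step is the element size, so MOVSQ moves the pointers by 8, not by 1.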

Basic String Instructions

1. MOVSB / MOVSW / MOVSD / MOVSQ - Move String

Syntax: MOVSx where x = B(byte), W(word), D(doubleword), Q(quadword)

How it works:

  • Copies data from [RSI] to [RDI]
  • Updates RSI and RDI based on DF and data size

section .data
    src db 'Hello', 0
    dst times 6 db 0

section .text
global _start
_start:
    mov rsi, src        ; Source pointer
    mov rdi, dst        ; Destination pointer
    mov rcx, 6          ; Number of bytes to copy
    cld                 ; Clear DF (forward direction)
    rep movsb           ; Copy RCX bytes from RSI to RDI
    
    ; Now dst contains 'Hello'

If we assemble and link this and then inspect it in GDB, we can see that dst now holds the string Hello.

In my case the address of dst was 0x402006

pwndbg> x/s 0x402006
0x402006:	"Hello"
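If it helps to see what rep movsb does step by step, the following Python sketch models it on a flat bytearray. It is an illustration only: the function name rep_movsb and the memory layout (src at offset 0, dst right behind it at offset 6, mirroring 0x402000 and 0x402006) are assumptions made for this example.

```python
def rep_movsb(mem: bytearray, rsi: int, rdi: int, rcx: int, df: int = 0):
    """Model REP MOVSB on a flat bytearray; returns the final (rsi, rdi, rcx)."""
    delta = -1 if df else 1
    while rcx != 0:
        mem[rdi] = mem[rsi]   # MOVSB: copy one byte from [RSI] to [RDI]
        rsi += delta          # both pointers move in the direction DF selects
        rdi += delta
        rcx -= 1              # REP decrements RCX after each iteration
    return rsi, rdi, rcx


# Hypothetical layout: src at offset 0, dst right after it at offset 6.
memory = bytearray(b"Hello\x00" + bytes(6))
final_rsi, final_rdi, final_rcx = rep_movsb(memory, rsi=0, rdi=6, rcx=6)
print(memory[6:12], final_rsi, final_rdi, final_rcx)
```

After the call, RCX is 0 and both pointers sit one byte past the data they just processed, which is exactly the state the registers are in after the real instruction.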

Instead of hard-coding the length of the string, we can let the assembler compute it with our good old friend $ - string_name ;)

section .data
    source_string db 'Hello, World!', 0
    string_length equ $ - source_string

section .bss
    dest_string resb string_length

section .text
    global _start

_start:
    ; Set up the source and destination pointers
    cld                     ; Clear Direction Flag (DF=0) for forward processing
    mov rsi, source_string  ; Load address of source string into RSI
    mov rdi, dest_string    ; Load address of destination buffer into RDI

    ; Set up the counter for the string length
    mov rcx, string_length  ; Load the length of the string into RCX

    ; Execute the repeated move operation
    rep movsb               ; Repeat the MOVSB instruction RCX times

    ; calling exit syscall (64-bit Linux ABI)
    mov eax, 60             ; sys_exit system call number
    xor edi, edi            ; Exit code 0
    syscall                 ; Call kernel

By the way, you can also build assembly code with gcc. This is convenient because gcc calls the assembler and linker for you. Keep in mind that gcc uses the GNU Assembler (as), which does not understand NASM syntax, so the file has to be written in GNU Assembler syntax. For an assembly file named hello.s, you can assemble and link it with a single command:

gcc -nostdlib -o hello hello.s

The -nostdlib flag tells gcc not to link the C startup files and standard libraries, which is what you want for a minimal assembly program that defines its own _start symbol.

If you would rather use main as your entry point, drop -nostdlib: the C runtime then supplies _start and calls your main. Write your assembly with a global main label and gcc will produce an executable without the linker complaining about a duplicate or missing _start.

Intel syntax in GNU Assembler is an alternative assembly language notation to the default AT&T syntax. Using .intel_syntax noprefix at the start of your assembly file switches the assembler to Intel syntax mode.

.intel_syntax noprefix    # Switch to Intel syntax without register prefixes

.section .data
    message: .ascii "Hello, World!\n"   # String with newline
    message_len = . - message           # Length of the string

.section .text
    .global _start

_start:
    mov rax, 1                          # syscall: sys_write
    mov rdi, 1                          # file descriptor: stdout
    mov rsi, offset message             # pointer to message (note 'offset')
    mov rdx, message_len                # message length
    syscall                             # make kernel call

    mov rax, 60                         # syscall: sys_exit
    xor rdi, rdi                        # exit code 0
    syscall                             # make kernel call

Important Notes for GNU Assembler with Intel Syntax:

  • Use # for comments in place of ;
  • Use .ascii or .asciz instead of db for string data
  • Use = instead of equ for constants
  • Use offset keyword when loading addresses
  • Labels use colon syntax (message: not message)
  • Directives start with dot (.section, .global)

Let’s talk a bit about the strange word offset that we saw above.

mov rsi, offset message

offset is an operator that tells the assembler: “Use the address of this label as an immediate value”, instead of reading the memory stored at that label. The final address is filled in at link time.

$ gcc -o hello hello.s
/usr/bin/ld: /tmp/cc9n75kl.o: in function `_start':
(.text+0x0): multiple definition of `_start'; /usr/lib/gcc/x86_64-linux-gnu/11/../../../x86_64-linux-gnu/Scrt1.o:(.text+0x0): first defined here
/usr/bin/ld: /tmp/cc9n75kl.o: relocation R_X86_64_32S against `.data' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: failed to set dynamic section sizes: bad value
collect2: error: ld returned 1 exit status

I deliberately triggered these errors so that you can see what we are missing.

ERROR 1: multiple definition of ‘_start’

What’s happening: GCC links with the C runtime library by default, which already provides a _start function. When you define your own _start, you get a conflict.

Solution: Either:

  • Use main instead of _start when using GCC
  • Or use -nostdlib to exclude the C runtime

ERROR 2: PIE Relocation Error

Modern GCC creates Position Independent Executables (PIE) by default. Your assembly code uses absolute addresses that aren’t compatible with PIE.

Solution: Either:

  • Make your code position-independent
  • Or disable PIE with -no-pie

If you’re wondering why our code is position dependent, it’s because of how we’re referencing the message label. The offset operator we used earlier embeds the absolute memory address of the label into the instruction at link time, which creates position dependency.

You can verify this by disassembling the binary with objdump:

$ objdump  -M intel -D hello
#...
Disassembly of section .text:

0000000000401000 <_start>:
  401000:	48 c7 c0 01 00 00 00 	mov    rax,0x1
  401007:	48 c7 c7 01 00 00 00 	mov    rdi,0x1
  40100e:	48 c7 c6 00 20 40 00 	mov    rsi,0x402000
  401015:	48 c7 c2 0e 00 00 00 	mov    rdx,0xe
  40101c:	0f 05                	syscall 
  40101e:	48 c7 c0 3c 00 00 00 	mov    rax,0x3c
  401025:	48 31 ff             	xor    rdi,rdi
  401028:	0f 05                	syscall 
#...

We can see mov rsi, 0x402000: the absolute address is baked into the instruction.

Dumping the .data section shows what is stored at that address:

$ objdump -s -j .data hello

hello:     file format elf64-x86-64

Contents of section .data:
 402000 48656c6c 6f2c2057 6f726c64 210a      Hello, World!. 

Here, the string "Hello, World!\n" is stored starting at address 0x402000 in the .data section of the ELF file. This is the address that the program uses when it moves the pointer into the rsi register with the instruction mov rsi, 0x402000.

But if we try to use dd, hexdump or xxd to dump the file content starting from 0x402000, we won’t get the string as expected.

$ xxd -s 0x402000 -l 16 hello
# We get nothing
$ hexdump -s 0x402000 -n 16 -C hello
00002308

At first glance, it might seem straightforward to dump the contents of the .data section from memory by accessing the address 0x402000. However, when you attempt to dump the contents of the ELF file directly, things aren’t as simple.

The Virtual Address vs File Offset

The issue lies in the difference between virtual memory addresses and file offsets.

  • The address 0x402000 is a virtual memory address, which is where the operating system will load the program’s data into memory at runtime.
  • However, when you look at the ELF file on disk, the actual data is stored at a file offset, not at the virtual memory address. This means if you try to dump the file content at 0x402000 using tools like dd, hexdump, or xxd, you’ll run into problems because the address is not the same as the file offset.

Let’s demonstrate this again with dd

$ dd if=hello of=/dev/stdout bs=1 skip=$((0x402000)) count=16 2>/dev/null | xxd
# No output

As you can see, the content we expect ("Hello, World!\n") does not appear at offset 0x402000 of the file; the file is not even that large. This is because we are dumping data directly from the file, and the offset in the ELF file may differ from the virtual address where the program will load it during execution.

The first step is to check the ELF file’s section headers. We can use readelf to do this:

$ readelf --wide -S hello 
There are 7 section headers, starting at offset 0x2148:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .note.gnu.build-id NOTE            0000000000400120 000120 000024 00   A  0   0  4
  [ 2] .text             PROGBITS        0000000000401000 001000 00002a 00  AX  0   0  1
  [ 3] .data             PROGBITS        0000000000402000 002000 00000e 00  WA  0   0  1
  [ 4] .symtab           SYMTAB          0000000000000000 002010 0000c0 18      5   4  8
  [ 5] .strtab           STRTAB          0000000000000000 0020d0 000038 00      0   0  1
  [ 6] .shstrtab         STRTAB          0000000000000000 002108 00003a 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  D (mbind), l (large), p (processor specific)

From the readelf output, we can see that the .data section starts at file offset 0x2000 (the Off column), which corresponds to the virtual address 0x402000 (the Address column) when the program is loaded into memory.

The address 0x402000 is where the data is loaded into memory during runtime. But in the raw ELF file, it resides at offset 0x2000.

Now that we know the correct file offset is 0x2000, we can dump the contents of the .data section using dd, xxd, or hexdump from the correct file offset.

$ xxd -s 0x2000 -l 16 hello
00002000: 4865 6c6c 6f2c 2057 6f72 6c64 210a 0000  Hello, World!...
$ hexdump -s 0x2000 -n 16 -C hello
00002000  48 65 6c 6c 6f 2c 20 57  6f 72 6c 64 21 0a 00 00  |Hello, World!...|
00002010

$ dd if=hello of=/dev/stdout bs=1 skip=$((0x2000)) count=16 2>/dev/null | xxd
00000000: 4865 6c6c 6f2c 2057 6f72 6c64 210a 0000  Hello, World!...

The Virtual Address (0x402000) represents the memory location where the data will reside during program execution, while the File Offset (0x2000) indicates the position of that data within the ELF file on disk.
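The translation we just did by hand is simple arithmetic on three numbers from the section header. Here is a minimal Python sketch of it; the function name vaddr_to_file_offset is made up for this post, and the constants are the .data values from the readelf output above.

```python
def vaddr_to_file_offset(vaddr: int, sh_addr: int, sh_offset: int, sh_size: int) -> int:
    """Translate a virtual address inside one section to its offset in the file."""
    if not sh_addr <= vaddr < sh_addr + sh_size:
        raise ValueError("address is not inside this section")
    return vaddr - sh_addr + sh_offset


# Values copied from the .data row: Address, Off and Size columns.
DATA_ADDR, DATA_OFF, DATA_SIZE = 0x402000, 0x2000, 0x0E
print(hex(vaddr_to_file_offset(0x402000, DATA_ADDR, DATA_OFF, DATA_SIZE)))  # 0x2000
```

The same subtraction works for any address inside the section, for example the 'W' of World at 0x402007 lives at file offset 0x2007.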

That was too much!

Now let’s come back to our aim of making the binary Position Independent.

PIE executables can be loaded at different memory addresses each time they run, so hardcoded addresses don’t work.

We can achieve this using two methods:

1. RIP-Relative Addressing (Recommended)

.intel_syntax noprefix

.section .data
    message: .ascii "Hello, World!\n"
    message_len = . - message

.section .text
    .global main

main:
    # RIP-relative addressing - calculates address relative to current instruction
    lea rsi, [rip + message]      # Load effective address of message
    
    mov rax, 1                    # sys_write
    mov rdi, 1                    # stdout
    mov rdx, message_len          # message length
    syscall
    
    mov rax, 60                   # sys_exit
    xor rdi, rdi                  # exit code 0
    syscall

Visual Representation

Let’s see visual representation of instruction lea rsi, [rip + message].

Memory Layout (Actual Example):
┌─────────────────┐ 0x401000
│   .text section │ ← Code section (executable)
│                 │
│   main:         │
│   0x401000:     │ ← lea rsi,[rip+0xff9] instruction
│   0x401007:     │ ← RIP points HERE during execution ✓
│   mov rax,1     │
│   0x40100e:     │
│   ...           │
├─────────────────┤ 
│   (gap)         │ ← Other sections may be here
├─────────────────┤ 0x402000  
│   .data section │ ← Data section (read-write)
│                 │
│   message:      │ ← "Hello" string at 0x402000 ✓
│   "Hello"       │
└─────────────────┘

Offset calculation: 
  RIP during execution:   0x401007
  Message address:        0x402000
  Offset: 0x402000 - 0x401007 = 0xFF9

Final encoded instruction: lea rsi, [rip + 0xff9]

When you write [rip + message], you’re not actually adding RIP to the address of message. Instead, the assembler calculates the offset from the current instruction to the message label and encodes that offset into the instruction. At runtime, the CPU adds this offset to RIP to find the actual address. This relative addressing is what makes the code position-independent - the distance between code and data remains constant, even when the entire program is loaded at different memory locations.
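The offset calculation above can be checked with a few lines of Python. This is just a sketch of the arithmetic, using the illustrative addresses from the diagram; the function names and the shift value are hypothetical.

```python
def rip_relative_disp(insn_addr: int, insn_len: int, target: int) -> int:
    """Displacement encoded in a RIP-relative operand.

    RIP already points at the NEXT instruction when the operand is evaluated,
    so the displacement is measured from insn_addr + insn_len.
    """
    return target - (insn_addr + insn_len)


def effective_address(insn_addr: int, insn_len: int, disp: int) -> int:
    """What the CPU computes at run time: next-instruction RIP + displacement."""
    return insn_addr + insn_len + disp


# lea rsi, [rip + disp32] is 7 bytes long, hence RIP = 0x401007.
disp = rip_relative_disp(0x401000, 7, 0x402000)
print(hex(disp))  # 0xff9

# Load the whole image somewhere else (hypothetical shift): the same disp still works.
shift = 0x555555154000
assert effective_address(0x401000 + shift, 7, disp) == 0x402000 + shift
```

Because code and data move together, the encoded 0xff9 never has to change, no matter where the loader places the program.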

2. Use GOT (Global Offset Table)

.intel_syntax noprefix

.section .data
    message: .ascii "Hello, World!\n"
    message_len = . - message

.section .text
    .global main

main:
    # Access through Global Offset Table
    mov rsi, [rip + message@GOTPCREL]  # Get address from GOT
    
    mov rax, 1                    # sys_write
    mov rdi, 1                    # stdout
    mov rdx, message_len          # message length
    syscall
    
    mov rax, 60                   # sys_exit
    xor rdi, rdi                  # exit code 0
    syscall

The Global Offset Table is a special memory area that contains pointers to all global variables and functions. Think of it as an “address book” that gets filled in when the program loads.

Memory Layout will look like this:

┌─────────────────┐ 0x401000
│   .text section │ 
│   main:         │
│   mov rsi,...   │ ← This instruction at 0x401000
│   mov rax, 1    │ ← RIP points HERE during execution (0x401007)
│   ...           │
├─────────────────┤ 0x401800
│   .got section  │ ← Global Offset Table
│   ...           │
│   message_ptr:  │ ← Pointer to message (initially zero)
│   0x00000000    │ ← Will be filled at runtime!
│   ...           │
├─────────────────┤ 0x402000  
│   .data section │ 
│   message:      │ ← "Hello" at 0x402000
│   "Hello"       │
└─────────────────┘

What the Assembler Does

  1. Sees: message@GOTPCREL
  2. Calculates: Where the GOT entry for message will be relative to RIP
  3. Encodes: That offset into the instruction

For our example:

  • Instruction: mov rsi, [rip + message@GOTPCREL] at 0x401000
  • RIP during execution = 0x401007
  • GOT entry address = 0x401800 (where the pointer to message lives)
  • Offset = 0x401800 - 0x401007 = 0x7F9

So the instruction becomes: mov rsi, [rip + 0x7F9]

PCREL = “PC-relative” (RIP-relative)

When the program starts, the dynamic linker:

  1. Finds where message actually is in memory
  2. Writes that address into the GOT entry at 0x401800
  3. So 0x401800 now contains: 0x402000 (actual address of “Hello”)

Memory Map:
.text (0x401000)    →    .got (0x401800)    →    .data (0x402000)
┌─────────────┐          ┌─────────────┐          ┌─────────────┐
│ mov rsi,... │          │ 0x402000    │          │ "Hello"     │
│             │          │ (pointer)   │          │             │
│             │ 0x7F9    │             │          │             │
│             │ ───────→ │             │ ───────→ │             │
│ RIP=0x401007│          │ GOT entry   │          │ message     │
└─────────────┘          └─────────────┘          └─────────────┘
         Access GOT entry              Get actual address

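But why do we use the GOT? Because it is how code reaches symbols whose addresses are only known at run time, such as printf in a shared library: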

Your Program:        mov rsi, [rip + printf@GOTPCREL]
                     call rsi

Memory:
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│ Your code   │ →  │ GOT entry   │ →  │ printf in   │
│             │    │ for printf  │    │ libc.so     │
└─────────────┘    └─────────────┘    └─────────────┘

You can list the shared libraries a binary depends on, and where they were loaded for this run, using ldd:

$ ldd print_str 
	linux-vdso.so.1 (0x00007ffeb26f0000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000781887400000)
	/lib64/ld-linux-x86-64.so.2 (0x000078188768a000)

To recap:

message@GOTPCREL = “The offset to message’s GOT entry from RIP”

So the assembler converts:

mov rsi, [rip + message@GOTPCREL]
↓
mov rsi, [rip + 0x7F9]  # Where 0x7F9 is offset to GOT entry

When you write mov rsi, [rip + message@GOTPCREL], you’re not loading the address of message directly. Instead, you’re accessing the Global Offset Table - a special memory area that contains pointers to all global variables. The @GOTPCREL syntax tells the assembler to calculate the offset to message’s GOT entry. At runtime, the instruction reads from that GOT entry to get the actual address of message, which was filled in by the dynamic linker. This extra indirection enables powerful features like shared libraries and runtime symbol resolution, making it the foundation of dynamic linking on modern systems!
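The extra indirection is easier to see in a toy model. The Python sketch below treats memory as a dict and uses the illustrative addresses from the example (GOT entry at 0x401800, message at 0x402000). It is not how a real loader is written; the function names and load biases are assumptions made for illustration.

```python
# Memory is modeled as a dict from address to 64-bit value (a sketch, not an ELF loader).
GOT_SLOT = 0x401800       # address of message's GOT entry in the example above
MESSAGE = 0x402000        # link-time address of message


def dynamic_link(memory: dict, load_bias: int) -> None:
    """Fill the GOT slot with the run-time address of message."""
    memory[GOT_SLOT + load_bias] = MESSAGE + load_bias


def load_via_got(memory: dict, next_rip: int, disp: int) -> int:
    """Model `mov rsi, [rip + disp]`: read the pointer stored in the GOT slot."""
    return memory[next_rip + disp]


disp = GOT_SLOT - 0x401007            # 0x7F9, fixed at link time
for bias in (0, 0x10000000):          # two hypothetical load addresses
    memory = {}
    dynamic_link(memory, bias)
    rsi = load_via_got(memory, 0x401007 + bias, disp)
    print(hex(bias), hex(rsi))
```

The instruction bytes stay identical in both runs; only the value stored in the GOT slot differs.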

2. CMPSB / CMPSW / CMPSD / CMPSQ - Compare String

Syntax: CMPSx

How it works:

  • Compares [RSI] with [RDI]
  • Sets flags like CMP instruction
  • Updates RSI and RDI

section .data
    str1 db 'Hello', 0
    str2 db 'Hello', 0
    str3 db 'World', 0

section .text
global _start
_start:
    ; Compare two equal strings
    mov rsi, str1
    mov rdi, str2
    mov rcx, 6
    cld
    repe cmpsb          ; Compare while equal
    jz strings_equal    ; Jump if all characters matched
    
    ; Compare different strings  
    mov rsi, str1
    mov rdi, str3
    mov rcx, 6
    cld
    repe cmpsb
    jnz strings_different

REPE is an x86 prefix (not an instruction on its own) that repeats a string instruction as long as the count register is not zero and the Zero Flag (ZF) is set. REPE (Repeat while Equal) is synonymous with REPZ (Repeat while Zero).

Syntax: REPE instruction or REPZ instruction

Operation:

  1. Execute the string instruction
  2. Decrement RCX
  3. Continue only if:

    • RCX ≠ 0 AND
    • ZF = 1 (last comparison was equal/zero)

Stops when:

  • RCX reaches 0, OR
  • ZF becomes 0 (comparison fails)
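The stop conditions above can be traced with a short Python model of repe cmpsb (forward direction only). This is a sketch for intuition; the function name repe_cmpsb and its return tuple are made up for this post.

```python
def repe_cmpsb(a: bytes, b: bytes, rcx: int, zf: int = 0):
    """Model REPE CMPSB with DF=0; returns (bytes_compared, rcx, zf).

    If RCX is 0 on entry nothing executes and ZF keeps its previous value.
    """
    i = 0
    while rcx != 0:
        zf = 1 if a[i] == b[i] else 0   # CMPSB sets ZF like CMP
        i += 1                          # pointers advance even on a mismatch
        rcx -= 1                        # RCX is decremented every iteration
        if zf == 0:                     # REPE stops once ZF is cleared
            break
    return i, rcx, zf


print(repe_cmpsb(b"Hello\x00", b"Hello\x00", 6))  # (6, 0, 1)
print(repe_cmpsb(b"Hello\x00", b"World\x00", 6))  # (1, 5, 0)
```

Note that after a mismatch the pointers have already moved past the differing byte, so the mismatching characters are at [RSI-1] and [RDI-1].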

3. SCASB / SCASW / SCASD / SCASQ - Scan String

Syntax: SCASx

How it works:

  • Compares AL/AX/EAX/RAX with [RDI]
  • Sets flags
  • Updates RDI

section .data
    message db 'Find the letter X', 0

section .text
global _start
_start:
    mov rdi, message    ; String to scan
    mov al, 'X'         ; Character to find
    mov rcx, 17         ; String length
    cld
    repne scasb         ; Scan while not equal
    jz found            ; Jump if character found
    ; else, not found

Prefix   Condition   Use Case
REPE     ZF=1        Find first mismatch
REPNE    ZF=0        Find first match

4. STOSB / STOSW / STOSD / STOSQ - Store String

Syntax: STOSx

How it works:

  • Stores AL/AX/EAX/RAX into [RDI]
  • Updates RDI

section .bss
    buffer resb 100

section .text
global _start
_start:
    mov rdi, buffer     ; Destination buffer
    mov al, 'A'         ; Character to store
    mov rcx, 100        ; Buffer size
    cld
    rep stosb           ; Fill buffer with 'A'

5. LODSB / LODSW / LODSD / LODSQ - Load String

Syntax: LODSx

How it works:

  • Loads [RSI] into AL/AX/EAX/RAX
  • Updates RSI

section .data
    numbers db 1, 2, 3, 4, 5

section .text
global _start
_start:
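    mov rsi, numbers    ; Source data
    mov rcx, 5          ; Number of elements
    xor eax, eax        ; Clear RAX so the upper bits stay zero after LODSB
    xor ebx, ebx        ; Clear sum
    cld
sum_loop:
    lodsb               ; Load byte into AL, advance RSI
    add rbx, rax        ; Process data (example: add each number to the sum)
    loop sum_loop       ; RBX = 15 when the loop ends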

Repeat Prefixes

String instructions become powerful when combined with repeat prefixes:

Prefix          Condition           Operation
REP             RCX ≠ 0             Repeat while RCX > 0
REPE / REPZ     RCX ≠ 0 and ZF=1    Repeat while equal/zero
REPNE / REPNZ   RCX ≠ 0 and ZF=0    Repeat while not equal/not zero

Now let’s get some hands-on experience with a few practical examples.

Example 1: String Length (Strlen)

section .data
    my_string db 'Hello, World!', 0

section .text
global _start
_start:
    mov rdi, my_string  ; String to measure
    xor rax, rax        ; AL = 0 (null terminator)
    mov rcx, -1         ; Maximum count (2^64 - 1)
    cld
    repne scasb         ; Scan until null byte found
    
    ; Calculate length: -1 - RCX - 1
    mov rax, -2         ; -1 - 1
    sub rax, rcx        ; RAX = string length
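The -2 - RCX trick is easy to get wrong by one, so here is a small Python model that replays the same register arithmetic with 64-bit wraparound. It is a sketch for checking the math only; the function name strlen_via_scasb is made up, and the input is assumed to contain a terminating zero byte.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def strlen_via_scasb(data: bytes) -> int:
    """Model the REPNE SCASB strlen idiom; data must contain a 0 byte."""
    rdi = 0
    rcx = -1 & MASK64                 # mov rcx, -1
    while rcx != 0:
        found = data[rdi] == 0        # SCASB compares AL (0) with [RDI]
        rdi += 1
        rcx = (rcx - 1) & MASK64      # RCX counts every byte scanned, null included
        if found:                     # REPNE stops when ZF=1
            break
    return (-2 - rcx) & MASK64        # mov rax, -2 / sub rax, rcx


print(strlen_via_scasb(b"Hello, World!\x00"))  # 13
```

RCX drops by one for each character plus one more for the null byte, which is why the formula subtracts an extra 1.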

Example 2: Memory Fill

section .bss
    array resd 1000     ; Reserve 1000 doublewords

section .text
global _start
_start:
    mov rdi, array      ; Destination
    mov eax, 0xFFFFFFFF ; Pattern to fill
    mov rcx, 1000       ; Number of doublewords
    cld
    rep stosd           ; Fill with 0xFFFFFFFF

Example 3: String Copy with Length Limit

section .data
    source db 'This is a long string', 0
section .bss
    dest resb 50

section .text
global _start
_start:
    mov rsi, source
    mov rdi, dest
    mov rcx, 49         ; Maximum characters to copy
    cld
    
copy_loop:
    lodsb               ; Load from source
    test al, al         ; Check for null terminator
    jz copy_done        ; Stop if null found
    stosb               ; Store to destination
    loop copy_loop      ; Continue until RCX=0
    
copy_done:
    mov byte [rdi], 0   ; Add null terminator

Example 4: Case Conversion

section .data
    text db 'Hello World', 0
    len equ $ - text - 1

section .text
global _start
_start:
    mov rsi, text
    mov rdi, text
    mov rcx, len
    cld
    
convert_loop:
    lodsb               ; Load character
    cmp al, 'a'
    jb not_lower
    cmp al, 'z'
    ja not_lower
    sub al, 32          ; Convert to uppercase
not_lower:
    stosb               ; Store back
    loop convert_loop

Using Multiple String Instructions:

; Convert string to uppercase and calculate length
section .data
    input db 'hello world', 0

section .text
global _start
_start:
    ; Find length first
    mov rdi, input
    xor rax, rax
    mov rcx, -1
    cld                 ; Forward direction for both passes
    repne scasb
    mov r8, rcx         ; Save length information
    
    ; Convert to uppercase
    mov rsi, input
    mov rdi, input
    not r8              ; ~RCX = length + 1 (counts the null terminator)
    dec r8              ; Adjust for null terminator
    mov rcx, r8
    
convert:
    lodsb
    cmp al, 'a'
    jb store
    cmp al, 'z'
    ja store
    sub al, 32
store:
    stosb
    loop convert

Just remember:

  • MOVSx - Efficient memory copying
  • CMPSx - Fast block comparison
  • SCASx - Rapid scanning and searching
  • STOSx - Quick memory initialization
  • LODSx - Streamlined data loading
  • Repeat prefixes - Automated repetition

You’ve now mastered the powerful string instructions that make x86-64 assembly exceptionally efficient for data processing.

In our next topic, we’ll explore one of the most fundamental concepts in programming - the Stack and Procedures!

This post is licensed under CC BY 4.0 by the author.