12. Strings
String instructions are specialized x86-64 operations designed for efficient processing of strings and memory blocks. They automatically increment or decrement the pointers they use and can repeat an operation, making them ideal for working with sequences of bytes, words, doublewords, or quadwords.
What are String Instructions?
String instructions use specific register combinations:
- RSI - Source Index (points to source data)
- RDI - Destination Index (points to destination)
- RCX - Counter (number of iterations)
- RAX - Value for comparisons and storage
They automatically update pointers based on the Direction Flag (DF).
Direction Flag Control
The Direction Flag determines whether pointers increment or decrement:
cld ; Clear Direction Flag (DF=0) - increment pointers
std ; Set Direction Flag (DF=1) - decrement pointers
Always set the Direction Flag explicitly before using string instructions!
- Do not rely on the Direction Flag’s current state; earlier code may have left it set
String instructions behave differently based on DF:
- DF=0 (cld): Pointers increment (RSI++, RDI++)
- DF=1 (std): Pointers decrement (RSI--, RDI--)
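To make the pointer update rule concrete, here is a minimal Python sketch of it. The function name pointer_step is made up for illustration; it only models the rule described above (the step equals the element size, and DF picks the sign), not real hardware.

```python
def pointer_step(df, element_size):
    """Return the signed amount added to RSI/RDI after one string operation.

    df           -- Direction Flag value (0 after cld, 1 after std)
    element_size -- 1, 2, 4 or 8 bytes (the B/W/D/Q instruction suffix)
    """
    if df not in (0, 1):
        raise ValueError("DF is a single bit: 0 or 1")
    if element_size not in (1, 2, 4, 8):
        raise ValueError("element size must be 1, 2, 4 or 8 bytes")
    # DF=0 moves forward through memory, DF=1 moves backward.
    return element_size if df == 0 else -element_size


# After cld (DF=0), MOVSQ advances both pointers by 8 bytes.
print(pointer_step(0, 8))   # 8
# After std (DF=1), MOVSB moves both pointers back by 1 byte.
print(pointer_step(1, 1))   # -1
```

Note that the step depends on the element size, so a MOVSD moves the pointers by 4 even though it is still "one" operation.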
Basic String Instructions
1. MOVSB / MOVSW / MOVSD / MOVSQ - Move String
Syntax: MOVSx where x = B(byte), W(word), D(doubleword), Q(quadword)
How it works:
- Copies data from [RSI] to [RDI]
- Updates RSI and RDI based on DF and data size
section .data
src db 'Hello', 0
dst times 6 db 0
section .text
global _start
_start:
mov rsi, src ; Source pointer
mov rdi, dst ; Destination pointer
mov rcx, 6 ; Number of bytes to copy
cld ; Clear DF (forward direction)
rep movsb ; Copy RCX bytes from RSI to RDI
; Now dst contains 'Hello'
If we assemble and link this and inspect it in GDB, we can see that dst now holds the string Hello.
In my case the address of dst was 0x402006:
pwndbg> x/s 0x402006
0x402006: "Hello"
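If you want to reason about what rep movsb did step by step, the following Python sketch models it on a flat bytearray. The helper name rep_movsb and the layout (src at offset 0, dst at offset 6) are assumptions chosen to mirror the example above; it is a model of the semantics, not of the CPU's actual implementation.

```python
def rep_movsb(memory, rsi, rdi, rcx, df=0):
    """Model `rep movsb`: copy RCX bytes from [RSI] to [RDI], one at a time.

    Returns the final (rsi, rdi, rcx), just as the registers would look
    after the instruction finishes.
    """
    step = 1 if df == 0 else -1      # cld -> +1, std -> -1
    while rcx != 0:                  # REP repeats until RCX reaches 0
        memory[rdi] = memory[rsi]    # MOVSB: one byte from source to destination
        rsi += step
        rdi += step
        rcx -= 1
    return rsi, rdi, rcx


# Hypothetical flat memory: 'Hello',0 at offset 0, six zero bytes at offset 6.
memory = bytearray(b"Hello\x00" + bytes(6))
rsi, rdi, rcx = rep_movsb(memory, rsi=0, rdi=6, rcx=6)
print(bytes(memory[6:12]))   # b'Hello\x00'
print(rsi, rdi, rcx)         # 6 12 0
```

Both pointers end up one past the last byte copied, and RCX ends at 0, which is worth remembering when you chain several string instructions.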
Instead of hard-coding the length of the string, a more flexible approach is to let the assembler compute it with our good old friend $ - string_name ;)
section .data
source_string db 'Hello, World!', 0
string_length equ $ - source_string
section .bss
dest_string resb string_length
section .text
global _start
_start:
; Set up the source and destination pointers
cld ; Clear Direction Flag (DF=0) for forward processing
mov rsi, source_string ; Load address of source string into RSI
mov rdi, dest_string ; Load address of destination buffer into RDI
; Set up the counter for the string length
mov rcx, string_length ; Load the length of the string into RCX
; Execute the repeated move operation
rep movsb ; Repeat the MOVSB instruction RCX times
; Call the 64-bit exit syscall
mov eax, 60 ; sys_exit system call number on x86-64
xor edi, edi ; Exit code 0
syscall ; Call kernel
By the way, you can also build assembly code with gcc. This is convenient because gcc invokes the assembler and the linker for you, which simplifies the build process. Note that gcc uses the GNU Assembler (GAS), so the source must be written in GAS syntax rather than NASM syntax. For an assembly file named hello.s, you can assemble and link it with a single command:
gcc -nostdlib -o hello hello.s
The -nostdlib flag tells gcc not to link the C startup files and standard libraries. That suits a minimal assembly program that defines its own _start symbol.
If you would rather use main as your entry point, drop -nostdlib. gcc then links the C runtime, whose _start calls your main, so you can write your assembly with a global main label and gcc will produce an executable without the linker warning about a missing _start.
Intel syntax in GNU Assembler is an alternative assembly language notation to the default AT&T syntax. Using .intel_syntax noprefix at the start of your assembly file switches the assembler to Intel syntax mode.
.intel_syntax noprefix # Switch to Intel syntax without register prefixes
.section .data
message: .ascii "Hello, World!\n" # String with newline
message_len = . - message # Length of the string
.section .text
.global _start
_start:
mov rax, 1 # syscall: sys_write
mov rdi, 1 # file descriptor: stdout
mov rsi, offset message # pointer to message (note 'offset')
mov rdx, message_len # message length
syscall # make kernel call
mov rax, 60 # syscall: sys_exit
xor rdi, rdi # exit code 0
syscall # make kernel call
Important Notes for GNU Assembler with Intel Syntax:
- Use # for comments in place of ;
- Use .ascii or .asciz instead of db for string data
- Use = instead of equ for constants
- Use the offset keyword when loading addresses
- Labels use colon syntax (message: not message)
- Directives start with a dot (.section, .global)
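Let’s talk a bit about the strange word offset we are seeing above.
mov rsi, offset message
offset is an operator that tells the assembler: “Use the address of this label as an immediate value, not the contents of memory at that label”. The address itself is resolved at assembly/link time.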
$ gcc -o hello hello.s
/usr/bin/ld: /tmp/cc9n75kl.o: in function `_start':
(.text+0x0): multiple definition of `_start'; /usr/lib/gcc/x86_64-linux-gnu/11/../../../x86_64-linux-gnu/Scrt1.o:(.text+0x0): first defined here
/usr/bin/ld: /tmp/cc9n75kl.o: relocation R_X86_64_32S against `.data' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: failed to set dynamic section sizes: bad value
collect2: error: ld returned 1 exit status
I deliberately generated an error so that you will know what we are missing.
ERROR 1: multiple definition of ‘_start’
What’s happening: GCC links with the C runtime library by default, which already provides a _start function. When you define your own _start, you get a conflict.
Solution: Either:
- Use main instead of _start when using GCC
- Or use -nostdlib to exclude the C runtime
ERROR 2: PIE Relocation Error
Modern GCC creates Position Independent Executables (PIE) by default. Your assembly code uses absolute addresses that aren’t compatible with PIE.
Solution: Either:
- Make your code position-independent
- Or disable PIE with -no-pie
If you’re wondering why our code is position dependent, it’s because of how we’re referencing the message label. The offset operator we used earlier embeds the absolute memory address of the label, fixed at link time, directly in the instruction, which creates position dependency.
You can verify this with objdump:
$ objdump -M intel -D hello
#...
Disassembly of section .text:
0000000000401000 <_start>:
401000: 48 c7 c0 01 00 00 00 mov rax,0x1
401007: 48 c7 c7 01 00 00 00 mov rdi,0x1
40100e: 48 c7 c6 00 20 40 00 mov rsi,0x402000
401015: 48 c7 c2 0e 00 00 00 mov rdx,0xe
40101c: 0f 05 syscall
40101e: 48 c7 c0 3c 00 00 00 mov rax,0x3c
401025: 48 31 ff xor rdi,rdi
401028: 0f 05 syscall
#...
We can see mov rsi, 0x402000: the absolute address of message is encoded directly in the instruction.
Dumping the contents of the .data section shows what lives at that address:
$ objdump -s -j .data hello
hello: file format elf64-x86-64
Contents of section .data:
402000 48656c6c 6f2c2057 6f726c64 210a Hello, World!.
Here, the string "Hello, World!\n" is stored starting at address 0x402000 in the .data section of the ELF file. This is the address that the program uses when it moves the pointer into the rsi register with the instruction mov rsi, 0x402000.
But if we try to dump the content starting from 0x402000 using dd, hexdump or xxd, we won’t get the string as expected.
$ xxd -s 0x402000 -l 16 hello
# We get nothing
$ hexdump -s 0x402000 -n 16 -C hello
00002308
At first glance, it might seem straightforward to dump the contents of the .data section from memory by accessing the address 0x402000. However, when you attempt to dump the contents of the ELF file directly, things aren’t as simple.
The Virtual Address vs File Offset
The issue lies in the difference between virtual memory addresses and file offsets.
- The address 0x402000 is a virtual memory address, which is where the operating system will load the program’s data into memory at runtime.
- However, when you look at the ELF file on disk, the actual data is stored at a file offset, not at the virtual memory address. This means if you try to dump the file content at 0x402000 using tools like dd, hexdump, or xxd, you’ll run into problems because the address is not the same as the file offset.
Let’s demonstrate this again with dd
$ dd if=hello of=/dev/stdout bs=1 skip=$((0x402000)) count=16 2>/dev/null | xxd
# No output
As you can see, the content we expect ("Hello, World!\n") does not appear at the address 0x402000. This is because we are dumping data directly from the file, and the offset in the ELF file may differ from the virtual address where the program will load it during execution.
The first step is to check the ELF file’s section headers. We can use readelf to do this:
$ readelf --wide -S hello
There are 7 section headers, starting at offset 0x2148:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .note.gnu.build-id NOTE 0000000000400120 000120 000024 00 A 0 0 4
[ 2] .text PROGBITS 0000000000401000 001000 00002a 00 AX 0 0 1
[ 3] .data PROGBITS 0000000000402000 002000 00000e 00 WA 0 0 1
[ 4] .symtab SYMTAB 0000000000000000 002010 0000c0 18 5 4 8
[ 5] .strtab STRTAB 0000000000000000 0020d0 000038 00 0 0 1
[ 6] .shstrtab STRTAB 0000000000000000 002108 00003a 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
D (mbind), l (large), p (processor specific)
From the readelf output, we can see that the .data section starts at file offset 0x2000 (the Off column), which corresponds to the virtual address 0x402000 (the Address column) when the program is loaded into memory.
Now that we know the correct file offset is 0x2000, we can dump the contents of the .data section using dd, xxd, or hexdump from the correct file offset.
$ xxd -s 0x2000 -l 16 hello
00002000: 4865 6c6c 6f2c 2057 6f72 6c64 210a 0000 Hello, World!...
$ hexdump -s 0x2000 -n 16 -C hello
00002000 48 65 6c 6c 6f 2c 20 57 6f 72 6c 64 21 0a 00 00 |Hello, World!...|
00002010
$ dd if=hello of=/dev/stdout bs=1 skip=$((0x2000)) count=16 2>/dev/null | xxd
00000000: 4865 6c6c 6f2c 2057 6f72 6c64 210a 0000 Hello, World!...
The Virtual Address (0x402000) represents the memory location where the data will reside during program execution, while the File Offset (0x2000) indicates the position of that data within the ELF file on disk.
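The translation we just did by eye can be written down as a small formula: file offset = virtual address - section address + section file offset. Here is a minimal Python sketch of it. The function name vaddr_to_file_offset is made up for illustration, and the numbers are copied from the .data row of the readelf output above.

```python
def vaddr_to_file_offset(vaddr, sec_addr, sec_offset, sec_size):
    """Translate a virtual address inside one section to its offset in the ELF file.

    sec_addr   -- the section's Address column from `readelf -S`
    sec_offset -- the section's Off column
    sec_size   -- the section's Size column
    """
    if not (sec_addr <= vaddr < sec_addr + sec_size):
        raise ValueError("address is not inside this section")
    # The distance from the start of the section is the same in memory and on disk.
    return vaddr - sec_addr + sec_offset


# .data row from the readelf output: Address 0x402000, Off 0x2000, Size 0x0e
DATA_ADDR, DATA_OFF, DATA_SIZE = 0x402000, 0x2000, 0x0E

print(hex(vaddr_to_file_offset(0x402000, DATA_ADDR, DATA_OFF, DATA_SIZE)))  # 0x2000
```

The result is the value you would pass to xxd -s or hexdump -s. The range check matters: an address outside the section has no meaningful offset under this section's mapping.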
That was too much!
Now let’s come back to our aim of making the binary Position Independent.
PIE executables can be loaded at different memory addresses each time they run, so hardcoded addresses don’t work.
Two common ways to achieve this are:
1. RIP-Relative Addressing (Recommended)
.intel_syntax noprefix
.section .data
message: .ascii "Hello, World!\n"
message_len = . - message
.section .text
.global main
main:
# RIP-relative addressing - calculates address relative to current instruction
lea rsi, [rip + message] # Load effective address of message
mov rax, 1 # sys_write
mov rdi, 1 # stdout
mov rdx, message_len # message length
syscall
mov rax, 60 # sys_exit
xor rdi, rdi # exit code 0
syscall
Visual Representation
Let’s look at a visual representation of the instruction lea rsi, [rip + message].
Memory Layout (Actual Example):
┌─────────────────┐ 0x401000
│ .text section │ ← Code section (executable)
│ │
│ _start: │
│ 0x401000: │ ← lea rsi,[rip+0xff9] instruction
│ 0x401007: │ ← RIP points HERE during execution ✓
│ mov rax,1 │
│ 0x40100e: │
│ ... │
├─────────────────┤
│ (gap) │ ← Other sections may be here
├─────────────────┤ 0x402000
│ .data section │ ← Data section (read-write)
│ │
│ message: │ ← "Hello" string at 0x402000 ✓
│ "Hello" │
└─────────────────┘
Offset calculation:
RIP during execution: 0x401007
Message address: 0x402000
Offset: 0x402000 - 0x401007 = 0xFF9
Final encoded instruction: lea rsi, [rip + 0xff9]
When you write [rip + message], you’re not actually adding RIP to the address of message. Instead, the assembler calculates the offset from the current instruction to the message label and encodes that offset into the instruction. At runtime, the CPU adds this offset to RIP to find the actual address. This relative addressing is what makes the code position-independent - the distance between code and data remains constant, even when the entire program is loaded at different memory locations.
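To convince yourself that the encoded offset really survives relocation, here is a minimal Python sketch of the arithmetic. The function names and the load shift value are made up for illustration; the addresses and the 7-byte length of the lea instruction come from the example above.

```python
def rip_relative_disp(instr_addr, instr_len, target_addr):
    """Displacement the assembler/linker encodes for [rip + symbol].

    RIP already points at the NEXT instruction when the current one executes,
    so the displacement is measured from instr_addr + instr_len.
    """
    next_rip = instr_addr + instr_len
    return target_addr - next_rip


def effective_address(instr_addr, instr_len, disp):
    """Address the CPU computes at run time: RIP (next instruction) + displacement."""
    return instr_addr + instr_len + disp


# Link-time view: lea at 0x401000 (7 bytes long), message at 0x402000.
disp = rip_relative_disp(0x401000, 7, 0x402000)
print(hex(disp))  # 0xff9

# Run-time view: pretend the loader moved the whole image by this (hypothetical) amount.
shift = 0x555555153000
moved = effective_address(0x401000 + shift, 7, disp)
print(moved == 0x402000 + shift)  # True
```

The displacement is computed once, but because code and data move together, the same 0xff9 still lands on message after the shift.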
2. Use GOT (Global Offset Table)
.intel_syntax noprefix
.section .data
message: .ascii "Hello, World!\n"
message_len = . - message
.section .text
.global main
main:
# Access through Global Offset Table
mov rsi, [rip + message@GOTPCREL] # Get address from GOT
mov rax, 1 # sys_write
mov rdi, 1 # stdout
mov rdx, message_len # message length
syscall
mov rax, 60 # sys_exit
xor rdi, rdi # exit code 0
syscall
The Global Offset Table is a special memory area that holds the run-time addresses of the global variables and functions that the code reaches through it. Think of it as an “address book” that gets filled in when the program loads.
Memory Layout will look like this:
┌─────────────────┐ 0x401000
│ .text section │
│ _start: │
│ mov rsi,... │ ← This instruction at 0x401000
│ mov rax, 1 │ ← RIP points HERE during execution (0x401007)
│ ... │
├─────────────────┤ 0x401800
│ .got section │ ← Global Offset Table
│ ... │
│ message_ptr: │ ← Pointer to message (initially zero)
│ 0x00000000 │ ← Will be filled at runtime!
│ ... │
├─────────────────┤ 0x402000
│ .data section │
│ message: │ ← "Hello" at 0x402000
│ "Hello" │
└─────────────────┘
What the Assembler Does
- Sees: message@GOTPCREL
- Calculates: Where the GOT entry for message will be relative to RIP
- Encodes: That offset into the instruction
For our example:
- Instruction: mov rsi, [rip + message@GOTPCREL] at 0x401000
- RIP during execution = 0x401007
- GOT entry address = 0x401800 (where the pointer to message lives)
- Offset = 0x401800 - 0x401007 = 0x7F9
So the instruction becomes: mov rsi, [rip + 0x7F9]
PCREL = “PC-relative” (RIP-relative)
When the program starts, the dynamic linker:
- Finds where message actually is in memory
- Writes that address into the GOT entry at 0x401800
- So 0x401800 now contains: 0x402000 (actual address of “Hello”)
Memory Map:
.text (0x401000) → .got (0x401800) → .data (0x402000)
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ mov rsi,... │ │ 0x402000 │ │ "Hello" │
│ │ │ (pointer) │ │ │
│ │ 0x7F9 │ │ │ │
│ │ ───────→ │ │ ───────→ │ │
│ RIP=0x401007│ │ GOT entry │ │ message │
└─────────────┘ └─────────────┘ └─────────────┘
Access GOT entry Get actual address
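But why do we use the GOT? Because the address of a symbol in a shared library, such as printf in libc, is only known after the library has been loaded. The code reaches it through a GOT entry that the dynamic linker fills in: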
Your Program: mov rsi, [rip + printf@GOTPCREL]
call rsi
Memory:
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Your code │ → │ GOT entry │ → │ printf in │
│ │ │ for printf │ │ libc.so │
└─────────────┘ └─────────────┘ └─────────────┘
You can check which shared libraries a binary is linked against using ldd:
$ ldd print_str
linux-vdso.so.1 (0x00007ffeb26f0000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000781887400000)
/lib64/ld-linux-x86-64.so.2 (0x000078188768a000)
To summarize:
message@GOTPCREL = “The offset to message’s GOT entry from RIP”
So the assembler converts:
mov rsi, [rip + message@GOTPCREL]
↓
mov rsi, [rip + 0x7F9] # Where 0x7F9 is offset to GOT entry
When you write mov rsi, [rip + message@GOTPCREL], you’re not loading the address of message directly. Instead, you’re accessing the Global Offset Table - a special memory area that contains pointers to all global variables. The @GOTPCREL syntax tells the assembler to calculate the offset to message’s GOT entry. At runtime, the instruction reads from that GOT entry to get the actual address of message, which was filled in by the dynamic linker. This extra indirection enables powerful features like shared libraries and runtime symbol resolution, making it the foundation of dynamic linking on modern systems!
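The extra indirection is easier to see in a toy model. The Python sketch below treats the GOT as a dictionary from slot address to stored pointer. The names load_via_got and GOT_SLOT are made up, and the addresses are the illustrative ones from the diagrams above; a real dynamic linker does considerably more than a single assignment.

```python
def load_via_got(got, rip, disp):
    """Model `mov rsi, [rip + disp]`: read the pointer stored in a GOT slot.

    got  -- dict mapping a slot address to the 8-byte value stored there
    rip  -- address of the next instruction
    disp -- displacement encoded for symbol@GOTPCREL
    """
    slot_address = rip + disp        # step 1: locate the GOT entry (RIP-relative)
    return got[slot_address]         # step 2: load the pointer stored in it


GOT_SLOT = 0x401800
got = {GOT_SLOT: 0}                  # slot is empty before the program is loaded

# Load time: the dynamic linker writes the real address of message into the slot.
got[GOT_SLOT] = 0x402000

# Run time: the instruction at 0x401000 executes with RIP = 0x401007.
rsi = load_via_got(got, rip=0x401007, disp=0x7F9)
print(hex(rsi))  # 0x402000
```

Notice that the code never contains the address of message itself. Only the slot's contents change if message lives somewhere else.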
2. CMPSB / CMPSW / CMPSD / CMPSQ - Compare String
Syntax: CMPSx
How it works:
- Compares [RSI] with [RDI]
- Sets flags like the CMP instruction
- Updates RSI and RDI
section .data
str1 db 'Hello', 0
str2 db 'Hello', 0
str3 db 'World', 0
section .text
global _start
_start:
; Compare two equal strings
mov rsi, str1
mov rdi, str2
mov rcx, 6
cld
repe cmpsb ; Compare while equal
jz strings_equal ; Jump if all characters matched
; Compare different strings
mov rsi, str1
mov rdi, str3
mov rcx, 6
cld
repe cmpsb
jnz strings_different
REPE is a prefix that repeats a string instruction as long as the count register (RCX) is not zero and the Zero Flag (ZF) is set. REPE (Repeat while Equal) is a synonym for REPZ (Repeat while Zero).
Syntax: REPE instruction or REPZ instruction
Operation:
- Execute the string instruction
- Decrement RCX
Continue only if:
- RCX ≠ 0 AND
- ZF = 1 (last comparison was equal/zero)
Stops when:
- RCX reaches 0, OR
- ZF becomes 0 (comparison fails)
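The stop conditions above can be traced with a short Python model of repe cmpsb. The function name repe_cmpsb is made up for illustration, it assumes DF=0, and it models only ZF (the real instruction updates the other arithmetic flags too).

```python
def repe_cmpsb(a, b, rcx, zf=0):
    """Model `repe cmpsb` with DF=0.

    a, b -- bytes at [RSI] and [RDI]
    rcx  -- maximum number of byte pairs to compare
    zf   -- ZF before the instruction (left untouched if rcx starts at 0)
    Returns (bytes_compared, rcx, zf) as they would be afterwards.
    """
    index = 0
    while rcx != 0:
        zf = 1 if a[index] == b[index] else 0   # CMPSB: ZF=1 when the bytes match
        index += 1                               # RSI and RDI both advance
        rcx -= 1                                 # RCX is decremented every iteration
        if zf == 0:                              # REPE stops at the first mismatch
            break
    return index, rcx, zf


print(repe_cmpsb(b"Hello\x00", b"Hello\x00", 6))  # (6, 0, 1)
print(repe_cmpsb(b"Hello\x00", b"World\x00", 6))  # (1, 5, 0)
```

In the first call every pair matched, so the loop ran out of count with ZF=1 and jz would be taken. In the second call the very first pair differs, so it stops after one comparison with ZF=0.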
3. SCASB / SCASW / SCASD / SCASQ - Scan String
Syntax: SCASx
How it works:
- Compares AL/AX/EAX/RAX with [RDI]
- Sets flags
- Updates RDI
section .data
message db 'Find the letter X', 0
section .text
global _start
_start:
mov rdi, message ; String to scan
mov al, 'X' ; Character to find
mov rcx, 17 ; String length
cld
repne scasb ; Scan while not equal
jz found ; Jump if character found
; else, not found
| Prefix | Condition | Use Case |
|---|---|---|
| REPE | ZF=1 | Find first mismatch |
| REPNE | ZF=0 | Find first match |
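Here is a matching Python model of repne scasb, which also shows why the example tests ZF rather than RCX after the scan. The function name repne_scasb is made up for illustration, and it assumes DF=0.

```python
def repne_scasb(buf, al, rcx, zf=0):
    """Model `repne scasb` with DF=0.

    buf -- bytes starting at [RDI]
    al  -- byte value to search for
    rcx -- maximum number of bytes to scan
    Returns (rdi_advance, rcx, zf) as they would be afterwards.
    """
    rdi = 0
    while rcx != 0:
        zf = 1 if buf[rdi] == al else 0   # SCASB: compare AL with [RDI]
        rdi += 1                           # RDI advances even on a match
        rcx -= 1
        if zf == 1:                        # REPNE stops at the first match
            break
    return rdi, rcx, zf


message = b"Find the letter X\x00"
rdi, rcx, zf = repne_scasb(message, ord("X"), 17)
print(rdi, rcx, zf)              # 17 0 1
print(chr(message[rdi - 1]))     # X
```

Two details are easy to miss: RDI ends one byte past the match, so the match is at RDI - 1, and RCX can be 0 even when the byte was found (here it was the last byte scanned). That is why the code uses jz found instead of checking RCX.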
4. STOSB / STOSW / STOSD / STOSQ - Store String
Syntax: STOSx
How it works:
- Stores AL/AX/EAX/RAX into [RDI]
- Updates RDI
section .bss
buffer resb 100
section .text
global _start
_start:
mov rdi, buffer ; Destination buffer
mov al, 'A' ; Character to store
mov rcx, 100 ; Buffer size
cld
rep stosb ; Fill buffer with 'A'
5. LODSB / LODSW / LODSD / LODSQ - Load String
Syntax: LODSx
How it works:
- Loads [RSI] into AL/AX/EAX/RAX
- Updates RSI
section .data
numbers db 1, 2, 3, 4, 5
section .text
global _start
_start:
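mov rsi, numbers ; Source data
mov rcx, 5 ; Number of elements
xor eax, eax ; Clear RAX so its upper bits stay zero after LODSB
xor edx, edx ; Clear the running sum (kept in RDX)
cld
sum_loop:
lodsb ; Load next byte into AL, RSI advances by 1
add rdx, rax ; Add the element to the sum
loop sum_loop
; RDX now holds 1 + 2 + 3 + 4 + 5 = 15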
Repeat Prefixes
String instructions become powerful when combined with repeat prefixes:
| Prefix | Condition | Operation |
|---|---|---|
| REP | RCX ≠ 0 | Repeat while RCX > 0 |
| REPE / REPZ | RCX ≠ 0 and ZF=1 | Repeat while equal/zero |
| REPNE / REPNZ | RCX ≠ 0 and ZF=0 | Repeat while not equal/not zero |
Now let’s get some hands-on experience with a few practical examples.
Example 1: String Length (Strlen)
section .data
my_string db 'Hello, World!', 0
section .text
global _start
_start:
mov rdi, my_string ; String to measure
xor rax, rax ; AL = 0 (null terminator)
mov rcx, -1 ; Maximum count (2^64 - 1 as an unsigned value)
cld
repne scasb ; Scan until null byte found
; Calculate length: -1 - RCX - 1
mov rax, -2 ; -1 - 1
sub rax, rcx ; RAX = string length
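The -2 - RCX trick looks odd at first, so here is a Python sketch that replays the register arithmetic with 64-bit wraparound. The function name strlen_via_scasb is made up for illustration, and it assumes the input really contains a null byte.

```python
MASK64 = (1 << 64) - 1   # registers wrap around at 64 bits


def strlen_via_scasb(data):
    """Replay the strlen example: repne scasb with AL=0, then RAX = -2 - RCX."""
    rcx = -1 & MASK64                  # mov rcx, -1  (all 64 bits set)
    rdi = 0
    while rcx != 0:                    # repne scasb
        byte = data[rdi]
        rdi += 1
        rcx = (rcx - 1) & MASK64
        if byte == 0:                  # stop once the null terminator is scanned
            break
    # The scan touched length + 1 bytes, so RCX = -1 - (length + 1) = -length - 2.
    rax = (-2 - rcx) & MASK64          # mov rax, -2 ; sub rax, rcx
    return rax


print(strlen_via_scasb(b"Hello, World!\x00"))  # 13
```

The scan counts the null terminator too, which is where the extra -1 in the formula comes from.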
Example 2: Memory Fill
section .bss
array resd 1000 ; Reserve 1000 doublewords
section .text
global _start
_start:
mov rdi, array ; Destination
mov eax, 0xFFFFFFFF ; Pattern to fill
mov rcx, 1000 ; Number of doublewords
cld
rep stosd ; Fill with 0xFFFFFFFF
Example 3: String Copy with Length Limit
section .data
source db 'This is a long string', 0
section .bss
dest resb 50
section .text
global _start
_start:
mov rsi, source
mov rdi, dest
mov rcx, 49 ; Maximum characters to copy
cld
copy_loop:
lodsb ; Load from source
test al, al ; Check for null terminator
jz copy_done ; Stop if null found
stosb ; Store to destination
loop copy_loop ; Continue until RCX=0
copy_done:
mov byte [rdi], 0 ; Add null terminator
Example 4: Case Conversion
section .data
text db 'Hello World', 0
len equ $ - text - 1
section .text
global _start
_start:
mov rsi, text
mov rdi, text
mov rcx, len
cld
convert_loop:
lodsb ; Load character
cmp al, 'a'
jb not_lower
cmp al, 'z'
ja not_lower
sub al, 32 ; Convert to uppercase
not_lower:
stosb ; Store back
loop convert_loop
Using Multiple String Instructions:
; Convert string to uppercase and calculate length
section .data
input db 'hello world', 0
section .text
global _start
_start:
; Find length first
mov rdi, input
xor rax, rax ; AL = 0 (null terminator)
mov rcx, -1
cld ; Forward direction for the scan
repne scasb
mov r8, rcx ; Save length information
; Convert to uppercase
mov rsi, input
mov rdi, input
not r8 ; Convert negative length to positive
dec r8 ; Adjust for null terminator
mov rcx, r8
cld
convert:
lodsb
cmp al, 'a'
jb store
cmp al, 'z'
ja store
sub al, 32
store:
stosb
loop convert
Just remember -
- MOVSx - Efficient memory copying
- CMPSx - Fast block comparison
- SCASx - Rapid scanning and searching
- STOSx - Quick memory initialization
- LODSx - Streamlined data loading
- Repeat prefixes - Automated repetition
You’ve now mastered the powerful string instructions that make x86-64 assembly exceptionally efficient for data processing.
In our next topic, we’ll explore one of the most fundamental concepts in programming - the Stack and Procedures!