3. Data Representation
At the lowest level, computers understand only two states: on and off, represented as 1 and 0. This binary system forms the foundation of all data representation. In this chapter, we’ll explore how numbers, text, and other information are encoded in binary form for the computer to process.
Binary and Hexadecimal: The Language of Machines
Binary (Base-2)
- Uses only two digits: 0 and 1
- Each binary digit is called a bit
- Groups of 8 bits form a byte
- Example: 1101 1010 (1 byte)
Hexadecimal (Base-16)
- More compact representation of binary
- Uses digits 0-9 and letters A-F
- One hex digit represents 4 bits (a nibble)
- Example: 0xDA represents 1101 1010
Why Hexadecimal?
Reading long binary strings like 1101101010111100 is error-prone. The hexadecimal equivalent 0xDABC is much easier to work with and is commonly used in assembly programming.
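
The nibble-to-hex-digit correspondence can be checked mechanically. The following is a minimal Python sketch, used here purely for illustration; the helper name binary_to_hex is hypothetical and not part of any tool discussed in this chapter. It splits a bit string into 4-bit groups and maps each group to one hex digit.

```python
def binary_to_hex(bits: str) -> str:
    """Convert a bit string to hex by mapping each 4-bit nibble to one hex digit."""
    bits = bits.replace(" ", "")
    # Pad on the left so the length is a multiple of 4.
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    nibbles = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "0x" + "".join(format(int(nibble, 2), "X") for nibble in nibbles)


print(binary_to_hex("1101 1010"))         # 0xDA
print(binary_to_hex("1101101010111100"))  # 0xDABC
```

Because each hex digit depends only on its own four bits, the conversion never needs to look at the number as a whole, which is what makes hex convenient to read by eye.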
Representing Integers
Unsigned Integers
- All bits contribute to the magnitude, so every value is zero or positive
- Range for n bits: 0 to 2ⁿ - 1
- Example (8-bit): 1111 1111 = 255
Signed Integers (Two’s Complement)
- Most significant bit indicates sign (0 = positive, 1 = negative)
- Range for n bits: -2ⁿ⁻¹ to +2ⁿ⁻¹ - 1
To get the representation of a negative number, start from the bit pattern of its positive counterpart:
- Invert all bits
- Add 1
Examples (8-bit):
5 = 0000 0101
-5 = 1111 1011 (invert 0000 0101 → 1111 1010, then add 1)
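
The invert-and-add-one procedure can be sketched in a few lines of Python. This is an illustrative sketch only; the function names negate_bits and to_signed are hypothetical, and the default width of 8 bits is chosen to match the example above.

```python
def negate_bits(value: int, bits: int = 8) -> str:
    """Return the two's-complement pattern of -value via invert-and-add-one."""
    mask = (1 << bits) - 1
    inverted = ~value & mask          # step 1: invert all bits
    negated = (inverted + 1) & mask   # step 2: add 1, discarding any carry out
    return format(negated, f"0{bits}b")


def to_signed(pattern: str) -> int:
    """Interpret a bit string as a two's-complement signed integer."""
    bits = len(pattern)
    raw = int(pattern, 2)
    # If the most significant bit is set, the value is negative.
    return raw - (1 << bits) if raw & (1 << (bits - 1)) else raw


print(negate_bits(5))         # 11111011
print(to_signed("11111011"))  # -5
```

Applying negate_bits to the result again returns the original pattern, which is one reason two's complement is convenient: negation is the same operation in both directions.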
x86-64 Data Types and Sizes
Understanding data sizes is crucial in assembly, as instructions often specify the operand size.
| Data Type | Size (bits) | Size (bytes) | Range (Signed) | Range (Unsigned) |
|---|---|---|---|---|
| Byte | 8 | 1 | -128 to +127 | 0 to 255 |
| Word | 16 | 2 | -32,768 to +32,767 | 0 to 65,535 |
| Doubleword | 32 | 4 | -2³¹ to +2³¹-1 | 0 to 2³²-1 |
| Quadword | 64 | 8 | -2⁶³ to +2⁶³-1 | 0 to 2⁶⁴-1 |
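
The ranges in the table follow directly from the formulas given earlier for n-bit signed and unsigned integers. As a quick cross-check, here is a small Python sketch (the function names are hypothetical) that computes them for each size:

```python
def signed_range(bits: int) -> tuple[int, int]:
    """Two's-complement range for the given width: -2^(n-1) to 2^(n-1) - 1."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def unsigned_range(bits: int) -> tuple[int, int]:
    """Unsigned range for the given width: 0 to 2^n - 1."""
    return 0, (1 << bits) - 1


for name, bits in [("Byte", 8), ("Word", 16), ("Doubleword", 32), ("Quadword", 64)]:
    print(name, signed_range(bits), unsigned_range(bits))
```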
Register Naming Conventions:
- AL/BL/CL/DL - Low byte of 16-bit register
- AH/BH/CH/DH - High byte of 16-bit register
- AX/BX/CX/DX - 16-bit word registers
- EAX/EBX/ECX/EDX - 32-bit doubleword registers
- RAX/RBX/RCX/RDX - 64-bit quadword registers
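
These names are overlapping views of the same physical register: AL, AH, AX, and EAX all refer to parts of RAX. The Python sketch below models that relationship with bit masks and shifts. It is only an illustration of which bits each name covers, not a simulation of the CPU; the function name sub_registers and the sample value are made up for this example.

```python
def sub_registers(rax: int) -> dict[str, int]:
    """Show which bits of a 64-bit RAX value each narrower name refers to."""
    return {
        "RAX": rax & 0xFFFFFFFFFFFFFFFF,  # all 64 bits
        "EAX": rax & 0xFFFFFFFF,          # low 32 bits
        "AX": rax & 0xFFFF,               # low 16 bits
        "AH": (rax >> 8) & 0xFF,          # bits 8-15 (high byte of AX)
        "AL": rax & 0xFF,                 # bits 0-7 (low byte of AX)
    }


for name, value in sub_registers(0x1122334455667788).items():
    print(f"{name:>3} = {value:#x}")
```

With the sample value above, EAX is 0x55667788, AX is 0x7788, AH is 0x77, and AL is 0x88.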
Character Representation: ASCII
Characters are represented using standardized encoding systems, primarily ASCII (American Standard Code for Information Interchange).
Key ASCII Ranges:
- 0-31: Control characters (non-printable)
- 32-47: Punctuation and symbols
- 48-57: Digits 0-9
- 65-90: Uppercase letters A-Z
- 97-122: Lowercase letters a-z
Examples:
'A' = 65 = 0x41 = 0100 0001
'a' = 97 = 0x61 = 0110 0001
'0' = 48 = 0x30 = 0011 0000
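
The same mappings can be reproduced with Python's built-in ord function, which returns a character's code point. The sketch below is illustrative only; the helper name describe is hypothetical.

```python
def describe(ch: str) -> str:
    """Format one character as decimal, hex, and nibble-grouped binary."""
    code = ord(ch)
    bits = format(code, "08b")
    return f"'{ch}' = {code} = 0x{code:02X} = {bits[:4]} {bits[4:]}"


for ch in "Aa0":
    print(describe(ch))

# Because the digits occupy the contiguous range 48-57, subtracting the
# code for '0' converts a digit character to its numeric value.
print(ord("7") - ord("0"))  # 7
```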
If you often need to look up ASCII values or character codes, Linux has a handy command-line tool called ascii. On Debian-based distributions, you can install it with apt:
$ sudo apt install ascii
$ ascii A
ASCII 4/1 is decimal 065, hex 41, octal 101, bits 01000001: prints as `A'
Official name: Majuscule A
Other names: Capital A, Uppercase A
ASCII 0/10 is decimal 010, hex 0a, octal 012, bits 00001010: called ^J, LF, NL
Official name: Line Feed
Other names: Newline, \n
To convert a decimal number to binary or hexadecimal from the Linux command line, you can use standard utilities such as bc.
Decimal to Binary
# echo "obase=2; DECIMAL_NUMBER" | bc
$ echo "obase=2; 15" | bc
1111
Decimal to Hexadecimal
# echo "obase=16; DECIMAL_NUMBER" | bc
$ echo "obase=16; 255" | bc
FF
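
To see what such a conversion involves, here is a minimal Python sketch of the textbook repeated-division method. It is not a description of how bc works internally; the function name to_base is hypothetical, and the sketch handles only non-negative integers and bases 2 through 16.

```python
DIGITS = "0123456789ABCDEF"


def to_base(number: int, base: int) -> str:
    """Convert a non-negative integer to a string in the given base (2-16)."""
    if not 2 <= base <= 16:
        raise ValueError("base must be between 2 and 16")
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        # Each remainder is the next digit, least significant first.
        number, remainder = divmod(number, base)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))


print(to_base(15, 2))    # 1111
print(to_base(255, 16))  # FF
```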
Hexadecimal to String (ASCII/Text)
Use xxd or printf for this:
# echo HEXSTRING | xxd -r -p && echo ''
$ echo 68656c6c6f | xxd -r -p
hello
$ printf '\x68\x65\x6c\x6c\x6f'
hello
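
For comparison, the same round trip can be done in Python with the standard bytes.fromhex and bytes.hex methods. This is a minimal sketch that assumes the hex string encodes plain ASCII text; the function names are hypothetical.

```python
def hex_to_text(hex_string: str) -> str:
    """Decode a string of hex digit pairs into ASCII text."""
    return bytes.fromhex(hex_string).decode("ascii")


def text_to_hex(text: str) -> str:
    """Encode ASCII text as a string of hex digit pairs."""
    return text.encode("ascii").hex()


print(hex_to_text("68656c6c6f"))  # hello
print(text_to_hex("hello"))       # 68656c6c6f
```

Each pair of hex digits is one byte, and each byte is one ASCII character: 68 is 'h', 65 is 'e', and so on.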
With this essential context in place, you’re now ready to start writing instructions that operate on this data.