3. Data Representation

At the lowest level, computers understand only two states: on and off, represented as 1 and 0. This binary system forms the foundation of all data representation. In this chapter, we’ll explore how numbers, text, and other information are encoded in binary form for the computer to process.

Binary and Hexadecimal: The Language of Machines

Binary (Base-2)

  • Uses only two digits: 0 and 1
  • Each binary digit is called a bit
  • Groups of 8 bits form a byte
  • Example: 1101 1010 (1 byte)
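
As a quick check of the example byte: each bit is worth twice the bit to its right, so 1101 1010 is 128 + 64 + 16 + 8 + 2 = 218. The sketch below computes this from positional weights. It assumes bash, because it relies on bash arithmetic. `bin_to_dec` is a hypothetical helper name, not a standard command.

```shell
#!/usr/bin/env bash
# Compute a binary string's value from positional weights (bash).
# bin_to_dec is a hypothetical helper used only for illustration.
bin_to_dec() {
  local bits=${1// /}   # drop the space used to group nibbles
  local value=0 i
  for (( i = 0; i < ${#bits}; i++ )); do
    # Shift the running total left one place (x2), then add the next bit
    value=$(( value * 2 + ${bits:i:1} ))
  done
  echo "$value"
}

bin_to_dec "1101 1010"   # prints 218
echo $(( 2#11011010 ))   # bash's built-in base#digits notation also prints 218
```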

Hexadecimal (Base-16)

  • More compact representation of binary
  • Uses digits 0-9 and letters A-F
  • One hex digit represents 4 bits (a nibble)
  • Example: 0xDA represents 1101 1010

Why Hexadecimal?
Reading long binary strings like 1101101010111100 is error-prone. The hexadecimal equivalent 0xDABC is much easier to work with and is commonly used in assembly programming.
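
Each hex digit stands for exactly four bits, so converting binary to hex only requires splitting the string into nibbles. The following bash sketch shows that idea. It uses a hypothetical `bin_to_hex` helper and assumes the bit count is a multiple of 4.

```shell
#!/usr/bin/env bash
# Convert a binary string to hex one 4-bit group (nibble) at a time (bash).
# bin_to_hex is a hypothetical helper; input length must be a multiple of 4.
bin_to_hex() {
  local bits=${1// /} hex="" i
  for (( i = 0; i < ${#bits}; i += 4 )); do
    # Each nibble maps to exactly one hex digit, 0 through F
    hex+=$(printf '%X' "$(( 2#${bits:i:4} ))")
  done
  echo "0x$hex"
}

bin_to_hex "1101 1010"          # prints 0xDA
bin_to_hex "1101101010111100"   # prints 0xDABC
```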

Representing Integers

Unsigned Integers

  • All bits contribute to the magnitude, so only zero and positive values can be represented
  • Range for n bits: 0 to 2ⁿ - 1
  • Example (8-bit): 1111 1111 = 255
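
The unsigned range formula can be checked with bash arithmetic. This sketch assumes bash integers are 64-bit signed, which holds on common platforms. For that reason it stops at 32 bits instead of computing 2⁶⁴ - 1. `unsigned_max` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# The largest unsigned n-bit value is 2^n - 1, i.e. n one-bits.
# unsigned_max is a hypothetical helper; n = 64 is skipped because bash's
# (typically 64-bit signed) integers cannot hold 2^64 - 1.
unsigned_max() {
  echo $(( (1 << $1) - 1 ))
}

for n in 8 16 32; do
  printf '%2d bits: 0 to %d\n' "$n" "$(unsigned_max "$n")"
done
# Expected upper bounds: 255, 65535, 4294967295
```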

Signed Integers (Two’s Complement)

  • Most significant bit indicates sign (0 = zero or positive, 1 = negative)
  • Range for n bits: -2ⁿ⁻¹ to +2ⁿ⁻¹ - 1
  • To negate a value (for example, to get -5 from 5):

    1. Invert all bits
    2. Add 1

Examples (8-bit):

 5 = 0000 0101
-5 = 1111 1011  (invert 0000 0101 → 1111 1010, then add 1)
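
The invert-and-add-1 rule can be reproduced in bash. Bash integers are much wider than 8 bits, so this sketch masks every result with 0xFF to keep only the low byte. `negate8` and `to_signed8` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# 8-bit two's complement in bash, masking with 0xFF to stay within one byte.
# negate8 and to_signed8 are hypothetical helpers.
negate8() {
  # Step 1: invert all bits (~). Step 2: add 1. Then keep the low 8 bits.
  echo $(( (~$1 + 1) & 0xFF ))
}

to_signed8() {
  # Read an 8-bit pattern as signed: if the MSB (bit 7) is set, subtract 2^8
  local v=$(( $1 & 0xFF ))
  (( v & 0x80 )) && v=$(( v - 256 ))
  echo "$v"
}

pattern=$(negate8 5)
printf 'pattern: 0x%02X\n' "$pattern"              # 0xFB, i.e. 1111 1011
printf 'signed:  %d\n' "$(to_signed8 "$pattern")"  # -5
```

Running `negate8 128` gives 128 again (1000 0000), which `to_signed8` reads as -128. The most negative 8-bit value has no positive counterpart, which is why the signed range is asymmetric.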

x86-64 Data Types and Sizes

Understanding data sizes is crucial in assembly, as instructions often specify the operand size.

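| Data Type  | Size (bits) | Size (bytes) | Range (Signed)      | Range (Unsigned) |
|------------|-------------|--------------|---------------------|------------------|
| Byte       | 8           | 1            | -128 to +127        | 0 to 255         |
| Word       | 16          | 2            | -32,768 to +32,767  | 0 to 65,535      |
| Doubleword | 32          | 4            | -2³¹ to +2³¹ - 1    | 0 to 2³² - 1     |
| Quadword   | 64          | 8            | -2⁶³ to +2⁶³ - 1    | 0 to 2⁶⁴ - 1     |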
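
Operand size matters because the same value has a different bit pattern at each width. The following sketch assumes bash with 64-bit arithmetic and uses a hypothetical `truncate_to` helper. It keeps only the low n bits of a value, so -1 becomes an all-ones pattern at every size, and a value too large for the width loses its upper bits.

```shell
#!/usr/bin/env bash
# Keep only the low n bits of a value, roughly as storing it in an n-bit
# operand would. truncate_to is a hypothetical helper.
truncate_to() {
  local bits=$1 value=$2
  echo $(( value & ((1 << bits) - 1) ))
}

for bits in 8 16 32; do
  # -1 becomes all ones: 0xFF, 0xFFFF, 0xFFFFFFFF
  printf '%2d-bit: 0x%X\n' "$bits" "$(truncate_to "$bits" -1)"
done

# 300 (0x12C) does not fit in a byte: only 300 - 256 = 44 (0x2C) survives
truncate_to 8 300
```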

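Register Naming Conventions:

The general-purpose registers can be accessed at several sizes, and each name refers to part of the same register. For example, AL, AH, AX, EAX, and RAX all overlap within RAX: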

  • AL/BL/CL/DL - Low byte (bits 0-7) of AX/BX/CX/DX
  • AH/BH/CH/DH - High byte (bits 8-15) of AX/BX/CX/DX
  • AX/BX/CX/DX - 16-bit word registers (the low 16 bits of EAX/EBX/ECX/EDX)
  • EAX/EBX/ECX/EDX - 32-bit doubleword registers (the low 32 bits of RAX/RBX/RCX/RDX)
  • RAX/RBX/RCX/RDX - 64-bit quadword registers
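
The following bash sketch shows how these names overlap. It models a 64-bit register as a plain integer and extracts each sub-register with shifts and masks. It only illustrates bit positions and does not touch real CPU registers. The sample value 0x1122334455667788 is arbitrary.

```shell
#!/usr/bin/env bash
# Model RAX as an integer and slice out the overlapping sub-registers.
# Illustration only: no real register is read. The sample value is arbitrary.
rax=0x1122334455667788
eax=$(( rax & 0xFFFFFFFF ))    # low 32 bits
ax=$((  rax & 0xFFFF ))        # low 16 bits
ah=$(( (rax >> 8) & 0xFF ))    # bits 8-15
al=$((  rax & 0xFF ))          # bits 0-7

printf 'RAX = 0x%016X\n' "$(( rax ))"
printf 'EAX = 0x%08X\n' "$eax"
printf 'AX  = 0x%04X\n' "$ax"
printf 'AH  = 0x%02X\n' "$ah"
printf 'AL  = 0x%02X\n' "$al"
```

With this value, EAX is 0x55667788, AX is 0x7788, AH is 0x77, and AL is 0x88.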

Character Representation: ASCII

Characters are represented using standardized encoding systems, primarily ASCII (American Standard Code for Information Interchange).

Key ASCII Ranges:

  • 0-31: Control characters (non-printable)
  • 32-47: Space, punctuation, and symbols (more symbols appear at 58-64, 91-96, and 123-126)
  • 48-57: Digits 0-9
  • 65-90: Uppercase letters A-Z
  • 97-122: Lowercase letters a-z
  • 127: DEL (a control character)

Examples:

'A' = 65 = 0x41 = 0100 0001
'a' = 97 = 0x61 = 0110 0001
'0' = 48 = 0x30 = 0011 0000
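
You can explore these codes from bash. This sketch relies on the printf rule that a numeric argument starting with a quote yields that character's code. For the reverse direction it uses bash's \xHH escapes. `char_code` and `code_char` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Character <-> code conversions with printf (bash).
# char_code and code_char are hypothetical helpers.
char_code() { printf '%d' "'$1"; }                   # "'A" -> 65
code_char() { printf "\\x$(printf '%02x' "$1")"; }   # 65 -> \x41 -> A

echo "'A' = $(char_code A), 'a' = $(char_code a), '0' = $(char_code 0)"
# Upper- and lowercase letters differ by 32 (0x20), a single bit: bit 5
echo "a - A = $(( $(char_code a) - $(char_code A) ))"
# A digit character minus '0' (48) gives its numeric value
echo "'7' - '0' = $(( $(char_code 7) - 48 ))"
echo "code 66 is $(code_char 66)"
```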

If you often need to look up ASCII values or character codes, the ascii command-line tool is handy. On Debian-based distributions such as Ubuntu, you can install it with apt:

$ sudo apt install ascii

$ ascii A
ASCII 4/1 is decimal 065, hex 41, octal 101, bits 01000001: prints as `A'
Official name: Majuscule A
Other names: Capital A, Uppercase A 

ASCII 0/10 is decimal 010, hex 0a, octal 012, bits 00001010: called ^J, LF, NL
Official name: Line Feed
Other names: Newline, \n 

The second entry appears because ascii also tries to read the argument as a number: A interpreted as hexadecimal is 0x0A (decimal 10), the Line Feed character.

To convert decimal numbers to binary or hexadecimal from the Linux command line, you can use standard utilities such as bc.

Decimal to Binary

# echo "obase=2; DECIMAL_NUMBER" | bc
$ echo "obase=2; 15" | bc
1111

Decimal to Hexadecimal

# echo "obase=16; DECIMAL_NUMBER" | bc
$ echo "obase=16; 255" | bc
FF
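
If bc is not installed, bash built-ins can handle the hexadecimal cases. The following minimal sketch uses the hypothetical helper names `dec_to_hex` and `hex_to_dec`. Bash's printf has no binary conversion, so decimal-to-binary output still needs bc or a loop.

```shell
#!/usr/bin/env bash
# Hex conversions using only bash built-ins (no bc).
# dec_to_hex and hex_to_dec are hypothetical helpers.
dec_to_hex() { printf '%X\n' "$1"; }    # decimal -> hex
hex_to_dec() { echo $(( 16#$1 )); }     # hex -> decimal; a-f and A-F both work

dec_to_hex 255     # FF
hex_to_dec FF      # 255
hex_to_dec dabc    # 55996
```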

Hexadecimal to String (ASCII/Text)

You can use xxd or printf for this:

# echo HEXSTRING | xxd -r -p && echo ''
$ echo 68656c6c6f | xxd -r -p && echo ''
hello

$ printf '\x68\x65\x6c\x6c\x6f\n'
hello
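
The same decoding can also be done in pure bash, which makes the underlying idea visible: every two hex digits form one byte. This sketch assumes ASCII text and an even number of hex digits. `hex_to_text` and `text_to_hex` are hypothetical helper names. The escape handling relies on bash's printf `%b` accepting \xHH, which POSIX sh does not guarantee.

```shell
#!/usr/bin/env bash
# Hex <-> text without xxd (bash). hex_to_text and text_to_hex are
# hypothetical helpers; input is assumed to be ASCII.
hex_to_text() {
  local hex=$1 escaped="" i
  for (( i = 0; i < ${#hex}; i += 2 )); do
    escaped+="\\x${hex:i:2}"    # "68" becomes the escape \x68
  done
  printf '%b' "$escaped"        # %b expands the \xHH escapes into bytes
}

text_to_hex() {
  local text=$1 i
  for (( i = 0; i < ${#text}; i++ )); do
    printf '%02x' "'${text:i:1}"   # leading quote: print the character's code
  done
  echo
}

hex_to_text 68656c6c6f && echo   # hello
text_to_hex hello                # 68656c6c6f
```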

With this essential context in place, you’re now ready to start writing instructions that operate on this data.

This post is licensed under CC BY 4.0 by the author.