Computer Encoding
I. In-Class Exercises
Programming Exercises
II. Knowledge Summary
✨ Core Concept of Computer Encoding
Encoding is a fundamental operation in information processing:
- Broadly: Converting information from one format to another
- Narrowly: Representing information using binary numbers
- Purpose: Facilitating storage, transmission, and parsing
Main categories of encoding:
- Numeric encoding: Representing numbers in binary (sign-magnitude, ones' complement, two's complement)
- Character encoding: Representing characters in binary (ASCII, Unicode)
- Multimedia encoding: Representing images, audio, and video in binary
- Compression encoding: Encoding methods that reduce data size
✨ Numeric Encoding
Inside computers, signed integers can be encoded in three ways: sign-magnitude, ones' complement, and two's complement (the encoding modern hardware actually uses). For positive numbers, all three representations are identical; the difference lies in how negative numbers are represented.
Sign-Magnitude
Sign-magnitude is the most intuitive binary representation:
- The sign-magnitude of a positive number is simply its binary representation
- The sign-magnitude of a negative number is the positive number's binary with the most significant bit changed to 1
Using 8-bit sign-magnitude as an example:
| Encoding Type | Value | Encoding |
|---|---|---|
| Sign-magnitude | 5 | 00000101 |
| Sign-magnitude | -5 | 10000101 |
Ones' Complement
The rules for ones' complement are:
- The ones' complement of a positive number is simply its binary representation (same as sign-magnitude)
- The ones' complement of a negative number is obtained by flipping every bit of the corresponding positive number's binary
Using 8-bit ones' complement as an example:
| Encoding Type | Value | Encoding |
|---|---|---|
| Ones' complement | 5 | 00000101 |
| Ones' complement | -5 | 11111010 |
Two's Complement
Two's complement is the most commonly used numeric encoding in computers:
- The two's complement of a positive number is simply its binary representation (same as sign-magnitude)
- The two's complement of a negative number is its ones' complement plus 1
Using 8-bit two's complement as an example:
| Encoding Type | Value | Encoding |
|---|---|---|
| Two's complement | 5 | 00000101 |
| Two's complement | -5 | 11111011 |
Summary of the Three Numeric Encodings
Memory aid:
- Positive numbers: all three encodings are the same
- Sign-magnitude for negatives: flip the leftmost bit (set MSB to 1)
- Ones' complement for negatives: flip every bit
- Two's complement for negatives: ones' complement + 1
Using 8-bit encoding as an example, the complete comparison is:
| Encoding Type | Value | Encoding |
|---|---|---|
| Sign-magnitude | 5 | 00000101 |
| Sign-magnitude | -5 | 10000101 |
| Ones' complement | 5 | 00000101 |
| Ones' complement | -5 | 11111010 |
| Two's complement | 5 | 00000101 |
| Two's complement | -5 | 11111011 |
✨ Bitwise Operations
Bitwise operations operate on individual bits in binary numbers and are an efficient computation method inside computers. They can be combined with assignment operators to form bitwise assignment operators.
| Operation | Symbol | Type | Meaning | Example |
|---|---|---|---|---|
| AND | `&` | Bitwise AND (binary) | Result is 1 only when both bits are 1 | `5 & 6` = (100)_2 = 4 |
| OR | `\|` | Bitwise OR (binary) | Result is 1 when at least one bit is 1 | `5 \| 6` = (111)_2 = 7 |
| XOR | `^` | Bitwise XOR (binary) | Result is 1 only when the two bits differ | `5 ^ 6` = (011)_2 = 3 |
| NOT | `~` | Bitwise NOT (unary) | Flips every bit of the two's complement | `~5` = -6, `~(-5)` = 4 |
| Left shift | `<<` | Bitwise left shift (binary) | Shifts the bits left by i positions | `5 << 1` = (00001010)_2 = 10 |
| Right shift | `>>` | Bitwise right shift (binary) | Shifts the bits right by i positions | `5 >> 1` = (00000010)_2 = 2 |
✨ Character Encoding
Character encoding maps characters to numbers. Common character encoding standards include:
| Encoding Standard | Coverage | Number of Characters | Description |
|---|---|---|---|
| ASCII | English letters, digits, symbols | 128 | The most basic character encoding, 1 byte |
| GBK | Chinese (simplified and traditional) | ~21,000 characters | Chinese national standard encoding, 2 bytes per character |
| BIG5 | Traditional Chinese | ~13,000 characters | Used in Hong Kong and Taiwan |
| Unicode | Most of the world's writing systems | Over 140,000 characters | Universal encoding; UTF-8/UTF-16/UTF-32 are its encoding schemes |
Common ASCII values:
| Character | ASCII Value | Character | ASCII Value | Character | ASCII Value |
|---|---|---|---|---|---|
| \0 (null) | 0 | 0 | 48 | A | 65 |
| space | 32 | 9 | 57 | Z | 90 |
| \n (newline) | 10 | a | 97 | z | 122 |
Memory tip:
'0' < 'A' < 'a', meaning digits < uppercase letters < lowercase letters. The difference between an uppercase letter and its corresponding lowercase letter is 32 (e.g., A (65) and a (97)).
✨ Multimedia Encoding
Encoding Formats
| Type | Format | Characteristics |
|---|---|---|
| Image | jpg (jpeg) | Lossy compression, suitable for photos |
| Image | png | Lossless compression, supports transparency |
| Image | gif | Supports animation, limited colors (256) |
| Audio | mp3 | Lossy compression, small file size, most common |
| Audio | aac | Lossy compression, better quality than mp3 |
| Audio | wav | Lossless format, large file size, best quality |
| Video | mp4 | Most universal video format |
| Video | avi | Older format, good compatibility |
| Video | mkv | Open format, supports multiple audio tracks/subtitles |
Encoding Standards
| Standard | Representative Format | Application |
|---|---|---|
| MPEG-1 | mp3 | Audio compression |
| MPEG-4 | mp4 | Video compression |
| H.264/AVC | - | HD video transmission (Blu-ray, streaming) |
| H.265/HEVC | - | Ultra HD video (4K, 8K), ~50% better compression than H.264 |
✨ Compression Encoding
Compression encoding reduces data size through specific algorithms for easier storage and transmission:
| Compression Algorithm | Corresponding Format | Compression Type | Description |
|---|---|---|---|
| DEFLATE | ZIP | Lossless | Most common general-purpose compression algorithm |
| RAR | RAR | Lossless | Compression ratio typically slightly higher than ZIP |
| Huffman coding | - | Lossless | Based on character frequency; high-frequency characters use shorter codes |
| LZ77/LZ78 | - | Lossless | Based on replacing repeated strings; foundation of DEFLATE |
✨ Execution Examples for Computer Encoding
Find the sign-magnitude, ones' complement, and two's complement of -13 (8-bit)
Step 1: Find the sign-magnitude
- First write 13 in binary: 13 = 8 + 4 + 1 = 00001101
- Sign-magnitude of -13: change the MSB to 1 -> 10001101
Step 2: Find the ones' complement
- Flip every bit of 13's binary: 00001101 -> 11110010
Step 3: Find the two's complement
- Ones' complement plus 1: 11110010 + 1 = 11110011
Complete comparison table:
| Encoding | +13 | -13 |
|---|---|---|
| Sign-magnitude | 00001101 | 10001101 |
| Ones' complement | 00001101 | 11110010 |
| Two's complement | 00001101 | 11110011 |
Bitwise Operation Execution: 5 & 6, 5 | 6, 5 ^ 6
First convert 5 and 6 to binary:
- 5 = 101
- 6 = 110
Bitwise AND (&): result is 1 only when both bits are 1
101 (5)
& 110 (6)
-----
  100 (4)
Bitwise OR (|): result is 1 when at least one bit is 1
101 (5)
| 110 (6)
-----
  111 (7)
Bitwise XOR (^): result is 1 only when the two bits differ
101 (5)
^ 110 (6)
-----
  011 (3)
Left Shift and Right Shift Execution
- 5 << 1: Shift 101 left by 1 position -> 1010 = 10 (equivalent to multiplying by 2)
- 5 >> 1: Shift 101 right by 1 position -> 10 = 2 (equivalent to dividing by 2, rounding down)
- 5 << 2: Shift 101 left by 2 positions -> 10100 = 20 (equivalent to multiplying by 4)
Rule: Left shifting by k positions is equivalent to multiplying by 2^k; right shifting by k positions is equivalent to dividing by 2^k (rounding down).
✨ Problem-Solving Steps for Computer Encoding
Example: Find the 8-bit two's complement of -20
Steps:
- Find 20 in binary: 20 = 16 + 4 = 00010100
- Find the ones' complement (flip every bit): 00010100 -> 11101011
- Ones' complement plus 1 gives two's complement: 11101011 + 1 = 11101100
Verification: Convert the two's complement back to decimal
- Two's complement 11101100, MSB is 1, indicating a negative number
- Flip all bits: 00010011, add 1: 00010100 = 20
- So the original number is -20
Example: Use bitwise operations to determine if an integer is odd or even
if (n & 1) {
cout << "odd" << endl;
} else {
cout << "even" << endl;
}
Principle: The least significant bit of any integer's binary representation is 1 for odd numbers and 0 for even numbers. n & 1 extracts the LSB for checking, which is more efficient than n % 2.
✨ Common Mistakes with Computer Encoding
- Confusing sign-magnitude and ones' complement: Sign-magnitude changes "only the MSB to 1," while ones' complement "flips every bit." For example, the sign-magnitude of -5 is 10000101 (only MSB changed), and the ones' complement is 11111010 (all bits flipped) — they are completely different
- Forgetting carry in two's complement calculation: Adding 1 to the ones' complement may produce a carry. For example, ones' complement 11111111 + 1 = 100000000; the 8-bit two's complement keeps only the lower 8 bits: 00000000
- Thinking the three encodings differ for positive numbers: For positive numbers, the sign-magnitude, ones' complement, and two's complement are all identical; only negative numbers require conversion
- Bitwise operator precedence trap: In C++, the precedence of `&`, `|`, `^` is lower than that of comparison operators. `if (a & 1 == 1)` is actually `if (a & (1 == 1))`; it should be written as `if ((a & 1) == 1)`
- Left shift overflow: Shifting by too many positions may cause overflow. For example, `1 << 31` overflows a 32-bit int and yields a negative value in practice
✨ Gray Code
Concept
Gray code was invented by Frank Gray of Bell Labs in 1947. Also known as reflected binary code, it is a special binary encoding system.
Characteristics of Gray code:
- Single-bit change: Two consecutive values differ by only one binary bit
- Cyclic property: The maximum and minimum values also differ by only one binary bit
Main applications: Digital system error detection, analog-to-digital conversion, position encoding.
Gray Code Construction Methods
Manual Construction Method
For k-bit Gray code, starting from all zeros, alternate between the following two steps:
- Flip the least significant bit to get the next Gray code
- Flip the bit to the left of the rightmost 1 to get the next Gray code
Repeat steps 1 and 2 a total of 2^k - 1 times to complete the encoding.
Mirror Construction Method
Starting with 1-bit Gray codes 0 and 1, gradually expand:
- Mirror the (k-1)-bit Gray codes vertically (top-to-bottom reflection), doubling the list
- Prepend 0 to each code in the original (top) half
- Prepend 1 to each code in the mirrored (bottom) half
Repeat the mirror-and-prepend step k-1 times to complete the encoding.
Mirror construction process example:
| 1-bit (start) | 2-bit (mirror + prepend) | 3-bit (mirror + prepend) |
|---|---|---|
| 0 | 00 | 000 |
| 1 | 01 | 001 |
| | 11 | 011 |
| | 10 | 010 |
| | | 110 |
| | | 111 |
| | | 101 |
| | | 100 |
Conversion Between Gray Code and Binary
Binary to Gray Code
How it works:
- Keep the most significant bit of the binary number as the most significant bit of the Gray code
- For each subsequent bit: XOR that bit with the bit to its left to get the next Gray code bit
g(n) = n ^ (n >> 1);
Gray Code to Binary
How it works:
- Keep the most significant bit of the Gray code as the most significant bit of the binary number
- For each subsequent bit: XOR that Gray code bit with the previous binary bit to get the current binary bit
bin[0] = gray[0];
for (int i = 1; i < k; ++i) {
bin[i] = bin[i - 1] ^ gray[i];
}