Edit Distance
I. In-Class Exercises
Programming Exercises
- Edit Distance Problem: L3471
II. Knowledge Summary
✨ Core Idea
Edit Distance (also known as Levenshtein Distance) is a method for measuring the difference between two strings. Specifically, the edit distance is the minimum number of edit operations required to transform one string into another.
Allowed edit operations include:
- Insert: Insert a character into the string
- Delete: Delete a character from the string
- Replace: Replace one character in the string with another
✨ Applications
Edit distance has wide applications in many fields, including natural language processing, computer vision, bioinformatics (such as DNA sequence analysis), and other areas requiring text comparison or pattern recognition.
✨ Dynamic Programming Solution
Let the two strings be str1 and str2, with lengths m and n respectively. Define dp[i][j] as the minimum number of edit operations needed to transform the first i characters of str1 into the first j characters of str2.
Initialization
- When
i = 0,dp[0][j] = j, meaning converting an empty string to a string of length j requires j insert operations - When
j = 0,dp[i][0] = i, meaning converting a string of length i to an empty string requires i delete operations
State Transition Equation
For i > 0 and j > 0:
- If
str1[i-1] == str2[j-1], thendp[i][j] = dp[i-1][j-1](current characters match, no additional operation needed) - If
str1[i-1] != str2[j-1], thendp[i][j] = min(dp[i-1][j-1], dp[i][j-1], dp[i-1][j]) + 1, where:dp[i-1][j-1] + 1represents a replace operationdp[i][j-1] + 1represents an insert operationdp[i-1][j] + 1represents a delete operation
Final Result
dp[m][n] is the minimum edit distance from str1 to str2.
1#include <iostream>
2#include <vector>
3#include <string>
4#include <algorithm> // For std::min
5
6int editDistance(const std::string& str1, const std::string& str2) {
7 int m = str1.size();
8 int n = str2.size();
9 std::vector<std::vector<int>> dp(m + 1, std::vector<int>(n + 1, 0));
10
11 // Initialize the dp array
12 for (int i = 0; i <= m; ++i) {
13 dp[i][0] = i; // First column: progressively delete all characters
14 }
15 for (int j = 0; j <= n; ++j) {
16 dp[0][j] = j; // First row: progressively insert all characters
17 }
18
19 // Fill the dp array
20 for (int i = 1; i <= m; ++i) {
21 for (int j = 1; j <= n; ++j) {
22 if (str1[i - 1] == str2[j - 1]) {
23 dp[i][j] = dp[i - 1][j - 1]; // Characters match, no operation needed
24 } else {
25 dp[i][j] = std::min(dp[i - 1][j - 1], // Replace operation
26 std::min(dp[i - 1][j], // Delete operation
27 dp[i][j - 1])) // Insert operation
28 + 1;
29 }
30 }
31 }
32
33 return dp[m][n];
34}
35
36int main() {
37 std::string str1 = "kitten";
38 std::string str2 = "sitting";
39 std::cout << "Edit Distance between '" << str1 << "' and '" << str2 << "' is "
40 << editDistance(str1, str2) << std::endl;
41 return 0;
42}✨ Execution Example
Using str1 = "cat" and str2 = "cut" as an example, let's trace the dp table filling process:
Initialization:
| dp | "" | c | u | t |
|---|---|---|---|---|
| "" | 0 | 1 | 2 | 3 |
| c | 1 | |||
| a | 2 | |||
| t | 3 |
Step-by-step filling:
dp[1][1]: str1[0]='c' == str2[0]='c', characters match
- dp[1][1] = dp[0][0] = 0
dp[1][2]: str1[0]='c' != str2[1]='u'
- Replace: dp[0][1] + 1 = 2
- Delete: dp[0][2] + 1 = 3
- Insert: dp[1][1] + 1 = 1
- dp[1][2] = min(2, 3, 1) = 1
dp[1][3]: str1[0]='c' != str2[2]='t'
- dp[1][3] = min(dp[0][2]+1, dp[0][3]+1, dp[1][2]+1) = min(3, 4, 2) = 2
dp[2][1]: str1[1]='a' != str2[0]='c'
- dp[2][1] = min(dp[1][0]+1, dp[1][1]+1, dp[2][0]+1) = min(2, 1, 3) = 1
dp[2][2]: str1[1]='a' != str2[1]='u'
- dp[2][2] = min(dp[1][1]+1, dp[1][2]+1, dp[2][1]+1) = min(1, 2, 2) = 1
dp[2][3]: str1[1]='a' != str2[2]='t'
- dp[2][3] = min(dp[1][2]+1, dp[1][3]+1, dp[2][2]+1) = min(2, 3, 2) = 2
dp[3][1]: str1[2]='t' != str2[0]='c'
- dp[3][1] = min(dp[2][0]+1, dp[2][1]+1, dp[3][0]+1) = min(3, 2, 4) = 2
dp[3][2]: str1[2]='t' != str2[1]='u'
- dp[3][2] = min(dp[2][1]+1, dp[2][2]+1, dp[3][1]+1) = min(2, 2, 3) = 2
dp[3][3]: str1[2]='t' == str2[2]='t', characters match
- dp[3][3] = dp[2][2] = 1
Final dp table:
| dp | "" | c | u | t |
|---|---|---|---|---|
| "" | 0 | 1 | 2 | 3 |
| c | 1 | 0 | 1 | 2 |
| a | 2 | 1 | 1 | 2 |
| t | 3 | 2 | 2 | 1 |
Answer: dp[3][3] = 1, only need to replace 'a' with 'u', i.e., "cat" -> "cut".
✨ Step-by-Step Problem Solving Guide
Steps for solving the Edit Distance problem:
- Identify the three operations: Insert, delete, replace — each operation costs 1
- Build the dp table: Rows correspond to characters of str1, columns to characters of str2, with an extra row and column for the empty string
- Initialize the boundaries: Converting an empty string to a string of length j requires j inserts; converting a string of length i to an empty string requires i deletes
- Fill the table: If characters match, no operation needed (take the upper-left value); if different, take the minimum of three directions + 1
- Meaning of the three directions:
- Upper-left
dp[i-1][j-1]+ 1 -> Replace - Above
dp[i-1][j]+ 1 -> Delete a character from str1 - Left
dp[i][j-1]+ 1 -> Insert a character into str1
- Upper-left
✨ Common Mistakes
- Confusing the directions for the three operations: Replace corresponds to upper-left, delete to above, insert to left — mixing them up causes confusion
- Adding 1 when characters match: When
str1[i-1] == str2[j-1],dp[i][j] = dp[i-1][j-1], no +1 needed - Missing initialization: The first row and first column must be initialized to 0, 1, 2, 3, ... — forgetting this causes all results to be wrong
- String index offset: In the dp table, dp[i][j] corresponds to the first i characters of str1, but string indices start at 0, so you should compare
str1[i-1]andstr2[j-1] - min function only compares two values: In C++,
minaccepts only two parameters by default — for three values, use nested calls:min(a, min(b, c))
✨ Algorithm Evaluation
- Time Complexity: O(m x n), as the entire dp table needs to be filled
- Space Complexity: O(m x n), for storing the dp array. Can be optimized to O(min(m, n)) by keeping only two rows