3/6/2024
Longest Common Subsequence
The process I took to solve this problem is like any dynamic programming problem involving three steps.
The first step is recursion. Basically, if two original strings have the same first character, the length of the longest subsequence between them is equal to the length of the longest subsequence between their substrings, each starts from index 1 and ends at the end of each string. If their first character are not the same, the longest subsequence between them is either the longest subsequence between the first string and the substring start from 1 of the second string or the substring start from 1 of the first string and the second string, whichever is greater. We use this principle to work recursively until we reach the end of either one of two strings. At that time, we return 0 because the length of the subsequence between a string and an empty string is always 0.
The second step is using a hash table. The problem with recursion is that it takes too long to compute. It makes too many redundant calculations. So, we only need it to make the calculation once, store the result at the hash table, then if it reaches the same case again, we just need to return the value at the hash table. In this case, we can create a 2D array that has the number of rows equals to the length of the first string and the number of columns equals to the length of the second string. Then, we store the length of the longest subsequence of the two substrings, one starts from index i of the first string, the other starts from index j of the second string, to the element at [i][j]
of the hash array. Then we return the value at arr[0][0]
because it is the length of the longest subsequence of the two substrings starting at index 0
of both string, which are the original strings themselves.
The third step is only building the hash table. Because everything is boiled down to building the hash table and return the value at arr[0][0]
, there is another way, which is faster, to get the result: just build the hash table from the beginning and don't bother about the resursions. That is exactly what my code is doing.
Revised by ChatGPT
- Recursion:
- If the first characters of both strings are the same, the length of the longest common subsequence (LCS) is 1 plus the LCS of the substrings starting from index 1.
- If the first characters are different, the LCS is the maximum of the LCS between the first string and the substring of the second string starting from index 1, and vice versa.
- The recursion continues until one of the strings is empty, in which case the LCS is 0.
- Memoization (Using a Hash Table):
- Recursion alone leads to redundant calculations. To optimize this, we use memorization to store the results of subproblems in a 2D array.
- The element at
arr[i][j]
represents the length of the LCS of the substrings starting from indexi
of the first string and indexj
of the second string. - The final answer is stored at
arr[0][0]
, which represents the LCS of the original strings.
- Dynamic Programming Table (Bottom-Up Approach):
- Instead of using recursion, we can directly build the dynamic programming table in a bottom-up manner.
- This approach iteratively fills the table based on the recurrence relation derived from the recursive step, without the overhead of recursive calls.
My code implements the third step, building the dynamic programming table in a bottom-up approach to efficiently find the length of the longest common subsequence.