Eat Study Love

먹고 공부하고 사랑하라

Coding_Practice

Edit Distance(애증의 Dynamic Programming)

eatplaylove 2024. 11. 16. 17:45

https://www.geeksforgeeks.org/edit-distance-dp-5/

 

Edit Distance - GeeksforGeeks

A Computer Science portal for geeks. It contains well written, well thought and well explained computer science and programming articles, quizzes and practice/competitive programming/company interview Questions.

www.geeksforgeeks.org

Given two strings s1 and s2 of lengths m and n respectively and below operations that can be performed on s1. Find the minimum number of edits (operations) to convert ‘s1‘ into ‘s2‘.  

  • Insert: Insert any character before or after any index of s1
  • Remove: Remove a character of s1
  • Replace: Replace a character at any index of s1 with some other character.

Note: All of the above operations are of equal cost. 

Examples: 

Input:   s1 = “geek”, s2 = “gesek”
Output:  1
Explanation: We can convert s1 into s2 by inserting a ‘s’ between two consecutive ‘e’ in s2.

Input:   s1 = “cat”, s2 = “cut”
Output:  1
Explanation: We can convert s1 into s2 by replacing ‘a’ with ‘u’.

Input:   s1 = “sunday”, s2 = “saturday”
Output:  3
Explanation: Last three and first characters are same.  We basically need to convert “un” to “atur”.  This can be done using below three operations. Replace ‘n’ with ‘r’, insert t, insert a

Recommended Problem
Companies:
Show Topics 
Solve Problem
Hard
35.14%
2.2L

Illustration of Edit Distance:

Let’s suppose we have s1=”GEEXSFRGEEKKS” and s2=”GEEKSFORGEEKS”
Now to convert s1 into s2 we would require 3 minimum operations:
Operation 1: Replace ‘X‘ to ‘K
Operation 2: Insert ‘O‘ between ‘F‘ and ‘R
Operation 3: Remove second last character i.e. ‘K

Refer the below image for better understanding.

Using Recursion:

Subproblems in Edit Distance:

The idea is to process all characters one by one starting from either from left or right sides of both strings. 
Let us process from the right end of the strings, there are two possibilities for every pair of characters being traversed, either they match or they don’t match. If last characters of both string matches then we simply recursively calculate the answer for rest of part of the strings. When last characters do not match, we perform all three operations to match the last characters, i.e. insert, replace, and remove. We then recursively calculate the result for the remaining part of the string. Upon completion of these operations, we will select the minimum answer and add 1 to it.

Below is the recursive tree for this problem considering the case when the last characters never match.

When the last characters of strings matches. Make a recursive call editDist(m-1, n-1) to calculate the answer for remaining part of the strings.

When the last characters of strings don’t matches. Make three recursive calls as show below:

  • Insert s2[n-1] at last of s1 : editDist(m, n-1)
  • Replace s1[m-1] with s2[n-1] : editDist(m-1, n-1)
  • Remove s1[m-1] : editDist(m-1, n)

Recurrence Relations for Edit Distance:

  • Case 1: When the last character of both the strings are same. editDist(s1, s2, m, n) = editDist(s1, s2, m-1, n-1)
  • Case 2: When the last characters are different
    • editDist(s1, s2, m, n) = 1 + Minimum{ editDist(s1, s2 ,m,n-1), editDist(s1, s2 ,m-1, n), editDist(s1, s2 , m-1, n-1)}

Base Case for Edit Distance:

  • Case 1: When s1 becomes empty i.e. m=0
    • return n, as it require n insertions to convert an empty string to s2 of size n
  • Case 2: When s2 becomes empty i.e. n=0
    • return m, as it require m removals to convert s1 of size m to an empty string.

Below is the implementation of the above recursive solution.

 

애증의 Dynamic Programming 문제를 시험에서 맞닥뜨렸는데, 이게 참 머리로는 풀리는데 코드로 짜는게 왜이렇게 어려운 것인가 ㅠㅠ

 

어느 정도 풀긴 풀었는데 문제를 Naive하게만 풀었고 심지어 모든 경우의 수를 다 Cover하지도 못했다.

 

여러가지 방법이 있다.

 

일단 C로 프로그래밍을 할 것이고, DP를 쓰면 아래와 같다.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int min(int x, int y, int z) {
    if (x < y && x < z) return x;
    if (y < z) return y;
    return z;
}

int editDistance(char word1[], char word2[]) {
    int n1 = strlen(word1);
    int n2 = strlen(word2);

    // Step 1: Create DP table
    int dp[n1 + 1][n2 + 1];

    // Step 2: Initialize base cases
    for (int i = 0; i <= n1; i++) {
        for (int j = 0; j <= n2; j++) {
            if (i == 0) {
                dp[i][j] = j; // If first string is empty, insert all characters of second string
            } else if (j == 0) {
                dp[i][j] = i; // If second string is empty, remove all characters of first string
            } else if (word1[i - 1] == word2[j - 1]) {
                dp[i][j] = dp[i - 1][j - 1]; // If characters are the same, no cost
            } else {
                dp[i][j] = 1 + min(dp[i - 1][j],    // Remove
                                   dp[i][j - 1],    // Insert
                                   dp[i - 1][j - 1]); // Replace
            }
        }
    }

    // Step 3: Return result
    return dp[n1][n2];
}

int main(int argc, char *argv[]) {
    if (argc != 3) {
        printf("Usage: %s <word1> <word2>\n", argv[0]);
        return 1;
    }

    char *word1 = argv[1];
    char *word2 = argv[2];
    int result = editDistance(word1, word2);
    printf("Edit Distance: %d\n", result);

    return 0;
}

 

경의롭다 진짜..

아니 이거 누가 생각해낸 개념이냐 ㅡㅡ

 

내 방법을 개선한 건 요놈

int min(int x, int y, int z) {
    return (x < y) ? ((x < z) ? x : z) : ((y < z) ? y : z);
}

// Helper function for recursion with explicit indices
int editDistanceHelper(char word1[], char word2[], int len1, int len2) {
    // Base Cases
    if (len1 == 0) return len2; // If word1 is empty, insert all characters of word2
    if (len2 == 0) return len1; // If word2 is empty, remove all characters of word1

    // If last characters are the same, move to the next
    if (word1[len1 - 1] == word2[len2 - 1]) {
        return editDistanceHelper(word1, word2, len1 - 1, len2 - 1);
    }

    // Case 1: Replace the last character
    int replace = editDistanceHelper(word1, word2, len1 - 1, len2 - 1);

    // Case 2: Insert a character into word1
    int insert = editDistanceHelper(word1, word2, len1, len2 - 1);

    // Case 3: Remove a character from word1
    int remove = editDistanceHelper(word1, word2, len1 - 1, len2);

    // Return the minimum of these three operations + 1 for the current operation
    return 1 + min(replace, insert, remove);
}

// Main recursive function to calculate Edit Distance
int editDistanceRecursive(char word1[], char word2[]) {
    int len1 = strlen(word1);
    int len2 = strlen(word2);
    return editDistanceHelper(word1, word2, len1, len2);
}

// Main function for testing
int main(int argc, char *argv[]) {
    if (argc != 3) {
        printf("Usage: %s <word1> <word2>\n", argv[0]);
        return 1;
    }

    char *word1 = argv[1];
    char *word2 = argv[2];
    int result = editDistanceRecursive(word1, word2);
    printf("Edit Distance: %d\n", result);

    return 0;
}

 

Recursive를 써야했구만..

 

int min(int a, int b, int c){
    if(a<b&&a<c) return a;
    else if(b<a&&b<c) return b;
    else return c;
}

int helper(char w1[],char w2[], int l1, int l2){
    if(l1==0) return l2;
    if(l2==0) return l1;
    if(w1[l1-1]==w2[l2-1]) return helper(w1,w2,l1-1,l2-1);

    int remove = helper(w1,w2,l1-1,l2);
    int add = helper(w1,w2,l1,l2-1);
    int replace = helper(w1,w2,l1-1,l2-1);
    
    return 1+min(remove,add,replace);
}

int editDistanceRecursive(char word1[], char word2[]){
    int n1 = strlen(word1);
    int n2 = strlen(word2);
    return helper(word1,word2,n1,n2);
}

int main(int argc, char *argv[]) {
    if (argc != 3) {
        printf("Usage: %s <word1> <word2>\n", argv[0]);
        return 1;
    }

    char *word1 = argv[1];
    char *word2 = argv[2];
    int result = editDistanceRecursive(word1, word2);
    printf("Edit Distance: %d\n", result);

    return 0;
}

 

골자는, Recursive를 쓰려면 뒤에서부터 idx를 tracking 해야한다.

 

아니네..

 

걍 이렇게 바꿔주면 idx를 0부터 count할 수도 있다.

 

int min(int a, int b, int c){
    if(a<b&&a<c) return a;
    else if(b<a&&b<c) return b;
    else return c;
}

int helper(char w1[],char w2[], int l1, int l2){
    int n1 = strlen(w1);
    int n2 = strlen(w2);
    if(l1==n1) return n2-l2;
    if(l2==n2) return n1-l1;
    if(w1[l1]==w2[l2]) return helper(w1,w2,l1+1,l2+1);

    int remove = helper(w1,w2,l1+1,l2);
    int add = helper(w1,w2,l1,l2+1);
    int replace = helper(w1,w2,l1+1,l2+1);

    return 1+min(remove,add,replace);
}

int editDistanceRecursive(char word1[], char word2[]){
    return helper(word1,word2,0,0);
}

int main(int argc, char *argv[]) {
    if (argc != 3) {
        printf("Usage: %s <word1> <word2>\n", argv[0]);
        return 1;
    }

    char *word1 = argv[1];
    char *word2 = argv[2];
    int result = editDistanceRecursive(word1, word2);
    printf("Edit Distance: %d\n", result);

    return 0;
}