blob | fc281597fd2fd5ad2299eab0848da99355e383fc |

4 /*

5 * This function implements the Damerau-Levenshtein algorithm to

6 * calculate a distance between strings.

7 *

8 * Basically, it says how many letters need to be swapped, substituted,

9 * deleted from, or added to string1, at least, to get string2.

10 *

11 * The idea is to build a distance matrix for the substrings of both

12 * strings. To avoid a large space complexity, only the last three rows

13 * are kept in memory (if swaps had the same or higher cost as one deletion

14 * plus one insertion, only two rows would be needed).

15 *

16 * At any stage, "i + 1" denotes the length of the current substring of

17 * string1 that the distance is calculated for.

18 *

19 * row2 holds the current row, row1 the previous row (i.e. for the substring

20 * of string1 of length "i"), and row0 the row before that.

21 *

22 * In other words, at the start of the big loop, row2[j + 1] contains the

23 * Damerau-Levenshtein distance between the substring of string1 of length

24 * "i" and the substring of string2 of length "j + 1".

25 *

26 * All the big loop does is determine the partial minimum-cost paths.

27 *

28 * It does so by calculating the costs of the path ending in characters

29 * i (in string1) and j (in string2), respectively, given that the last

30 * operation is a substitution, a swap, a deletion, or an insertion.

31 *

32 * This implementation allows the costs to be weighted:

33 *

34 * - w (as in "sWap")

35 * - s (as in "Substitution")

36 * - a (for insertion, AKA "Add")

37 * - d (as in "Deletion")

38 *

39 * Note that this algorithm calculates a distance _iff_ d == a.

40 */

43 {

57 /* substitution */

59 /* swap */

64 /* deletion */

67 /* insertion */

70 }

76 }

84 }