3 <sect1 id=
"fuzzystrmatch">
4 <title>fuzzystrmatch
</title>
6 <indexterm zone=
"fuzzystrmatch">
7 <primary>fuzzystrmatch
</primary>
11 The
<filename>fuzzystrmatch<
/> module provides several
12 functions to determine similarities and distance between strings.
16 <title>Soundex
</title>
19 The Soundex system is a method of matching similar-sounding names
20 by converting them to the same code. It was initially used by the
21 United States Census in
1880,
1900, and
1910. Note that Soundex
22 is not very useful for non-English names.
26 The
<filename>fuzzystrmatch<
/> module provides two functions
27 for working with Soundex codes:
31 soundex(text) returns text
32 difference(text, text) returns int
36 The
<function>soundex<
/> function converts a string to its Soundex code.
37 The
<function>difference<
/> function converts two strings to their Soundex
38 codes and then reports the number of matching code positions. Since
39 Soundex codes have four characters, the result ranges from zero to four,
40 with zero being no match and four being an exact match. (Thus, the
41 function is misnamed
— <function>similarity<
/> would have been
46 Here are some usage examples:
50 SELECT soundex('hello world!');
52 SELECT soundex('Anne'), soundex('Ann'), difference('Anne', 'Ann');
53 SELECT soundex('Anne'), soundex('Andrew'), difference('Anne', 'Andrew');
54 SELECT soundex('Anne'), soundex('Margaret'), difference('Anne', 'Margaret');
56 CREATE TABLE s (nm text);
58 INSERT INTO s VALUES ('john');
59 INSERT INTO s VALUES ('joan');
60 INSERT INTO s VALUES ('wobbly');
61 INSERT INTO s VALUES ('jack');
63 SELECT * FROM s WHERE soundex(nm) = soundex('john');
65 SELECT * FROM s WHERE difference(s.nm, 'john')
> 2;
70 <title>Levenshtein
</title>
73 This function calculates the Levenshtein distance between two strings:
77 levenshtein(text source, text target, int ins_cost, int del_cost, int sub_cost) returns int
78 levenshtein(text source, text target) returns int
82 Both
<literal>source
</literal> and
<literal>target
</literal> can be any
83 non-null string, with a maximum of
255 bytes. The cost parameters
84 specify how much to charge for a character insertion, deletion, or
85 substitution, respectively. You can omit the cost parameters, as in
86 the second version of the function; in that case they all default to
1.
94 test=# SELECT levenshtein('GUMBO', 'GAMBOL');
100 test=# SELECT levenshtein('GUMBO', 'GAMBOL',
2,
1,
1);
109 <title>Metaphone
</title>
112 Metaphone, like Soundex, is based on the idea of constructing a
113 representative code for an input string. Two strings are then
114 deemed similar if they have the same codes.
118 This function calculates the metaphone code of an input string:
122 metaphone(text source, int max_output_length) returns text
126 <literal>source
</literal> has to be a non-null string with a maximum of
127 255 characters.
<literal>max_output_length
</literal> sets the maximum
128 length of the output metaphone code; if longer, the output is truncated
137 test=# SELECT metaphone('GUMBO',
4);
146 <title>Double Metaphone
</title>
149 The Double Metaphone system computes two
<quote>sounds like<
/> strings
150 for a given input string
— a
<quote>primary<
/> and an
151 <quote>alternate<
/>. In most cases they are the same, but for non-English
152 names especially they can be a bit different, depending on pronunciation.
153 These functions compute the primary and alternate codes:
157 dmetaphone(text source) returns text
158 dmetaphone_alt(text source) returns text
162 There is no length limit on the input strings.
170 test=# select dmetaphone('gumbo');