Fix spelling error in docs.
[PostgreSQL.git] / doc / src / sgml / fuzzystrmatch.sgml
blob5411bea9d51e4e4acdd91eab8f982e683e024df5
1 <!-- $PostgreSQL$ -->
3 <sect1 id="fuzzystrmatch">
4 <title>fuzzystrmatch</title>
6 <indexterm zone="fuzzystrmatch">
7 <primary>fuzzystrmatch</primary>
8 </indexterm>
10 <para>
11 The <filename>fuzzystrmatch</> module provides several
12 functions to determine similarities and distance between strings.
13 </para>
15 <sect2>
16 <title>Soundex</title>
18 <para>
19 The Soundex system is a method of matching similar-sounding names
20 by converting them to the same code. It was initially used by the
21 United States Census in 1880, 1900, and 1910. Note that Soundex
22 is not very useful for non-English names.
23 </para>
25 <para>
26 The <filename>fuzzystrmatch</> module provides two functions
27 for working with Soundex codes:
28 </para>
30 <programlisting>
31 soundex(text) returns text
32 difference(text, text) returns int
33 </programlisting>
35 <para>
36 The <function>soundex</> function converts a string to its Soundex code.
37 The <function>difference</> function converts two strings to their Soundex
38 codes and then reports the number of matching code positions. Since
39 Soundex codes have four characters, the result ranges from zero to four,
40 with zero being no match and four being an exact match. (Thus, the
41 function is misnamed &mdash; <function>similarity</> would have been
42 a better name.)
43 </para>
45 <para>
46 Here are some usage examples:
47 </para>
49 <programlisting>
50 SELECT soundex('hello world!');
52 SELECT soundex('Anne'), soundex('Ann'), difference('Anne', 'Ann');
53 SELECT soundex('Anne'), soundex('Andrew'), difference('Anne', 'Andrew');
54 SELECT soundex('Anne'), soundex('Margaret'), difference('Anne', 'Margaret');
56 CREATE TABLE s (nm text);
58 INSERT INTO s VALUES ('john');
59 INSERT INTO s VALUES ('joan');
60 INSERT INTO s VALUES ('wobbly');
61 INSERT INTO s VALUES ('jack');
63 SELECT * FROM s WHERE soundex(nm) = soundex('john');
65 SELECT * FROM s WHERE difference(s.nm, 'john') &gt; 2;
66 </programlisting>
67 </sect2>
69 <sect2>
70 <title>Levenshtein</title>
72 <para>
73 This function calculates the Levenshtein distance between two strings:
74 </para>
76 <programlisting>
77 levenshtein(text source, text target, int ins_cost, int del_cost, int sub_cost) returns int
78 levenshtein(text source, text target) returns int
79 </programlisting>
81 <para>
82 Both <literal>source</literal> and <literal>target</literal> can be any
83 non-null string, with a maximum of 255 bytes. The cost parameters
84 specify how much to charge for a character insertion, deletion, or
85 substitution, respectively. You can omit the cost parameters, as in
86 the second version of the function; in that case they all default to 1.
87 </para>
89 <para>
90 Examples:
91 </para>
93 <programlisting>
94 test=# SELECT levenshtein('GUMBO', 'GAMBOL');
95 levenshtein
96 -------------
98 (1 row)
100 test=# SELECT levenshtein('GUMBO', 'GAMBOL', 2,1,1);
101 levenshtein
102 -------------
104 (1 row)
105 </programlisting>
106 </sect2>
108 <sect2>
109 <title>Metaphone</title>
111 <para>
112 Metaphone, like Soundex, is based on the idea of constructing a
113 representative code for an input string. Two strings are then
114 deemed similar if they have the same codes.
115 </para>
117 <para>
118 This function calculates the metaphone code of an input string:
119 </para>
121 <programlisting>
122 metaphone(text source, int max_output_length) returns text
123 </programlisting>
125 <para>
126 <literal>source</literal> has to be a non-null string with a maximum of
127 255 characters. <literal>max_output_length</literal> sets the maximum
128 length of the output metaphone code; if longer, the output is truncated
129 to this length.
130 </para>
132 <para>
133 Example:
134 </para>
136 <programlisting>
137 test=# SELECT metaphone('GUMBO', 4);
138 metaphone
139 -----------
141 (1 row)
142 </programlisting>
143 </sect2>
145 <sect2>
146 <title>Double Metaphone</title>
148 <para>
149 The Double Metaphone system computes two <quote>sounds like</> strings
150 for a given input string &mdash; a <quote>primary</> and an
151 <quote>alternate</>. In most cases they are the same, but for non-English
152 names especially they can be a bit different, depending on pronunciation.
153 These functions compute the primary and alternate codes:
154 </para>
156 <programlisting>
157 dmetaphone(text source) returns text
158 dmetaphone_alt(text source) returns text
159 </programlisting>
161 <para>
162 There is no length limit on the input strings.
163 </para>
165 <para>
166 Example:
167 </para>
169 <programlisting>
170 test=# select dmetaphone('gumbo');
171 dmetaphone
172 ------------
174 (1 row)
175 </programlisting>
176 </sect2>
178 </sect1>