Some clean-up and simplification for JobObject class.
[MUtilities.git] / src / 3rd_party / strnatcmp / readme.htm
blobc443f3624b9f1878162fa249f1cc45d3b0124a58
1 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
2 <html><head>
3 <meta http-equiv="content-type" content="text/html; charset=windows-1252">
4 <title>Natural Order String Comparison</title>
5 </head>
6 <body>
8 <h1>Natural Order String Comparison</h1>
9 <p>by <a href="http://sourcefrog.net/">Martin Pool</a>
11 </p><p>Computer string sorting algorithms generally don't order strings
12 containing numbers in the same way that a human would do. Consider:
14 </p><blockquote><pre>rfc1.txt
15 rfc2086.txt
16 rfc822.txt
17 </pre></blockquote>
18 <p>It would be more friendly if the program listed the files as
20 </p><blockquote><pre>rfc1.txt
21 rfc822.txt
22 rfc2086.txt
23 </pre></blockquote>
25 <p>Filenames sort properly if people insert leading zeros, but they
26 don't always do that.
28 </p><p>I've written a subroutine that compares strings according to this
29 natural ordering. You can use this routine in your own software, or
30 download a patch to add it to your favourite Unix program.
33 </p><h2>Sorting</h2>
35 <p>Strings are sorted as usual, except that decimal integer substrings
36 are compared on their numeric value. For example,
38 </p><blockquote>
39 a &lt; a0 &lt; a1 &lt; a1a &lt; a1b &lt; a2 &lt; a10 &lt; a20
40 </blockquote>
42 <p>Strings can contain several number parts:
44 </p><blockquote>
45 x2-g8 &lt; x2-y7 &lt; x2-y08 &lt; x8-y8
46 </blockquote>
48 in which case numeric fields are separated by nonnumeric characters.
49 Leading spaces are ignored. This works very well for IP addresses
50 from log files, for example.
52 <p>
53 Leading zeros are <u>not</u> ignored, which tends to give more
54 reasonable results on decimal fractions.
55 </p>
57 <blockquote>
58 1.001 &lt; 1.002 &lt; 1.010 &lt; 1.02 &lt; 1.1 &lt; 1.3
59 </blockquote>
61 <p>Some applications may wish to change this by modifying the test
62 that calls <code>isspace</code>.
65 </p><p>
66 Performance is linear: each character of the string is scanned
67 at most once, and only as many characters as necessary to decide
68 are considered.
69 </p>
71 <p><a href="http://sourcefrog.net/projects/natsort/example-out.txt">Longer example of the results</a>
74 </p><h2>Licensing</h2>
76 <p>This software is copyright by Martin Pool, and made available under
77 the same licence as zlib:
79 </p><blockquote>
80 <p> This software is provided 'as-is', without any express or implied
81 warranty. In no event will the authors be held liable for any damages
82 arising from the use of this software.
84 </p><p> Permission is granted to anyone to use this software for any purpose,
85 including commercial applications, and to alter it and redistribute it
86 freely, subject to the following restrictions:
88 </p><p> 1. The origin of this software must not be misrepresented; you must not
89 claim that you wrote the original software. If you use this software
90 in a product, an acknowledgment in the product documentation would be
91 appreciated but is not required.
92 </p><p> 2. Altered source versions must be plainly marked as such, and must not be
93 misrepresented as being the original software.
94 </p><p> 3. This notice may not be removed or altered from any source distribution.
95 </p></blockquote>
97 <p>This licence applies only to the C implementation. You are free to
98 reimplement the idea fom scratch in any language.
100 </p><h2>Related Work</h2>
104 POSIX sort(1) has the -n option to sort numbers, but this doesn't
105 work if there is a non-numeric prefix.
106 </p>
109 GNU ls(1) has the <tt>--sort=version</tt> option, which works
110 the same way.
111 </p>
114 The PHP scripting language now has a
115 <a href="http://us3.php.net/manual/en/function.strnatcmp.php">strnatcmp</a>
116 function based on this code.
117 The PHP wrapper was done by Andrei Zimievsky.
118 </p>
121 <a href="http://www.naturalordersort.org/">Stuart
122 Cheshire</a> has a Macintosh <q>system extension</q> to do natural ordering.
123 I indepdendently reinvented the algorithm, but Stuart had it
124 first. I borrowed the term <q>natural sort</q> from him.
126 </p>
129 <a href="http://search.cpan.org/src/EDAVIS/Sort-Versions-1.4/README"><tt>Sort::Versions</tt></a>
130 in Perl. "The code has some special magic to deal with common
131 conventions in program version numbers, like the difference between
132 'decimal' versions (eg perl 5.005) and the Unix kind (eg perl 5.6.1)."
134 </p><p><a href="http://www.cpan.org/modules/by-module/Sort/Sort-Naturally-1.01.readme"><tt>Sort::Naturally</tt></a>
135 is also in Perl, by Sean M. Burke. It uses locale-sensitive character classes to sort words and numeric substrings
136 in a way similar to natsort.
138 </p><p>
139 Ed Avis wrote <a href="http://membled.com/work/apps/todo/numsort">something similar in Haskell</a>.
142 </p><p>
143 Pierre-Luc Paour wrote a <a href="http://pierre-luc.paour.9online.fr/NaturalOrderComparator.java"><tt>NaturalOrderComparator</tt>
144 in Java</a>
146 </p><p>Kristof Coomans wrote a <a href="http://sourcefrog.net/projects/natsort/natcompare.js">natural sort comparison in Javascript</a></p>
148 <p>Alan Davies wrote
149 <a href="http://sourcefrog.net/projects/natsort/natcmp.rb"><tt>natcmp.rb</tt></a>,
150 an implementation in <a href="http://www.ruby-lang.org/">Ruby</a>.
152 </p><p><a href="http://sourceforge.net/projects/numacomp">Numacomp</a>
153 - similar thing in Python.
155 </p><p><a href="http://code.google.com/p/as3natcompare/">as3natcompare</a>
156 implementation in Flash ActionScript 3.
158 </p><h2>Get It!</h2>
160 <ul>
161 <li><a href="http://sourcefrog.net/projects/natsort/strnatcmp.c">strnatcmp.c</a>,
162 <a href="http://sourcefrog.net/projects/natsort/strnatcmp.h">strnatcmp.h</a> - the algorithm itself
164 </li><li><a href="http://sourcefrog.net/projects/natsort/natsort.c">natsort.c</a> - example driver program.
165 (Try <tt>ls -F /proc | natsort</tt>)
167 </li><li><a href="http://sourcefrog.net/projects/natsort/textutils.diff">textutils.diff</a> - patch to add
168 natural sort to sort(1) from GNU textutils-2.0; use the new
169 <tt>-N</tt> option.</li>
171 <li>Natural ordering is now in PHP4rc2, through the <a href="http://php.net/manual/html/function.strnatcasecmp.html">strnatcasecmp</a>
172 and <a href="http://php.net/manual/html/function.strnatcmp.html">strnatcmp</a>
173 functions.</li>
174 </ul>
177 <h2>To Do</h2>
180 Comparison of characters is purely numeric, without taking
181 character set or locale into account. So it is only correct for
182 ASCII. This should probably be a separate function because doing
183 the comparisons will probably introduce a dependency on the OS
184 mechanism for finding the locale and comparing characters.
187 </p><p>
188 It might be good to support multibyte character sets too.
190 </p><p>
191 If you fix either of these, please mail me. They should not be
192 very hard.
196 </p></body></html>