1 @node Searching and Sorting, Pattern Matching, Message Translation, Top
2 @c %MENU% General searching and sorting functions
3 @chapter Searching and Sorting
5 This chapter describes functions for searching and sorting arrays of
6 arbitrary objects. You pass the appropriate comparison function to be
7 applied as an argument, along with the size of the objects in the array
8 and the total number of elements.
11 * Comparison Functions:: Defining how to compare two objects.
12 Since the sort and search facilities
13 are general, you have to specify the
15 * Array Search Function:: The @code{bsearch} function.
16 * Array Sort Function:: The @code{qsort} function.
17 * Search/Sort Example:: An example program.
18 * Hash Search Function:: The @code{hsearch} function.
19 * Tree Search Function:: The @code{tsearch} function.
22 @node Comparison Functions
23 @section Defining the Comparison Function
24 @cindex Comparison Function
26 In order to use the sorted array library functions, you have to describe
27 how to compare the elements of the array.
29 To do this, you supply a comparison function to compare two elements of
30 the array. The library will call this function, passing as arguments
31 pointers to two array elements to be compared. Your comparison function
32 should return a value the way @code{strcmp} (@pxref{String/Array
33 Comparison}) does: negative if the first argument is ``less'' than the
34 second, zero if they are ``equal'', and positive if the first argument
37 Here is an example of a comparison function which works with an array of
38 numbers of type @code{double}:
42 compare_doubles (const void *a, const void *b)
44 const double *da = (const double *) a;
45 const double *db = (const double *) b;
47 return (*da > *db) - (*da < *db);
51 The header file @file{stdlib.h} defines a name for the data type of
52 comparison functions. This type is a GNU extension.
56 @tindex comparison_fn_t
58 int comparison_fn_t (const void *, const void *);
61 @node Array Search Function
62 @section Array Search Function
63 @cindex search function (for arrays)
64 @cindex binary search function (for arrays)
65 @cindex array search function
67 Generally searching for a specific element in an array means that
68 potentially all elements must be checked. @Theglibc{} contains
69 functions to perform linear search. The prototypes for the following
70 two functions can be found in @file{search.h}.
74 @deftypefun {void *} lfind (const void *@var{key}, const void *@var{base}, size_t *@var{nmemb}, size_t @var{size}, comparison_fn_t @var{compar})
75 @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
76 The @code{lfind} function searches in the array with @code{*@var{nmemb}}
77 elements of @var{size} bytes pointed to by @var{base} for an element
78 which matches the one pointed to by @var{key}. The function pointed to
79 by @var{compar} is used decide whether two elements match.
81 The return value is a pointer to the matching element in the array
82 starting at @var{base} if it is found. If no matching element is
83 available @code{NULL} is returned.
85 The mean runtime of this function is @code{*@var{nmemb}}/2. This
86 function should only be used if elements often get added to or deleted from
87 the array in which case it might not be useful to sort the array before
93 @deftypefun {void *} lsearch (const void *@var{key}, void *@var{base}, size_t *@var{nmemb}, size_t @var{size}, comparison_fn_t @var{compar})
94 @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
95 @c A signal handler that interrupted an insertion and performed an
96 @c insertion itself would leave the array in a corrupt state (e.g. one
97 @c new element initialized twice, with parts of both initializations
98 @c prevailing, and another uninitialized element), but this is just a
99 @c special case of races on user-controlled objects, that have to be
102 @c In case of cancellation, we know the array won't be left in a corrupt
103 @c state; the new element is initialized before the element count is
104 @c incremented, and the compiler can't reorder these operations because
105 @c it can't know that they don't alias. So, we'll either cancel after
106 @c the increment and the initialization are both complete, or the
107 @c increment won't have taken place, and so how far the initialization
108 @c got doesn't matter.
109 The @code{lsearch} function is similar to the @code{lfind} function. It
110 searches the given array for an element and returns it if found. The
111 difference is that if no matching element is found the @code{lsearch}
112 function adds the object pointed to by @var{key} (with a size of
113 @var{size} bytes) at the end of the array and it increments the value of
114 @code{*@var{nmemb}} to reflect this addition.
116 This means for the caller that if it is not sure that the array contains
117 the element one is searching for the memory allocated for the array
118 starting at @var{base} must have room for at least @var{size} more
119 bytes. If one is sure the element is in the array it is better to use
120 @code{lfind} so having more room in the array is always necessary when
121 calling @code{lsearch}.
124 To search a sorted array for an element matching the key, use the
125 @code{bsearch} function. The prototype for this function is in
126 the header file @file{stdlib.h}.
131 @deftypefun {void *} bsearch (const void *@var{key}, const void *@var{array}, size_t @var{count}, size_t @var{size}, comparison_fn_t @var{compare})
132 @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
133 The @code{bsearch} function searches the sorted array @var{array} for an object
134 that is equivalent to @var{key}. The array contains @var{count} elements,
135 each of which is of size @var{size} bytes.
137 The @var{compare} function is used to perform the comparison. This
138 function is called with two pointer arguments and should return an
139 integer less than, equal to, or greater than zero corresponding to
140 whether its first argument is considered less than, equal to, or greater
141 than its second argument. The elements of the @var{array} must already
142 be sorted in ascending order according to this comparison function.
144 The return value is a pointer to the matching array element, or a null
145 pointer if no match is found. If the array contains more than one element
146 that matches, the one that is returned is unspecified.
148 This function derives its name from the fact that it is implemented
149 using the binary search algorithm.
152 @node Array Sort Function
153 @section Array Sort Function
154 @cindex sort function (for arrays)
155 @cindex quick sort function (for arrays)
156 @cindex array sort function
158 To sort an array using an arbitrary comparison function, use the
159 @code{qsort} function. The prototype for this function is in
165 @deftypefun void qsort (void *@var{array}, size_t @var{count}, size_t @var{size}, comparison_fn_t @var{compare})
166 @safety{@prelim{}@mtsafe{}@assafe{}@acunsafe{@acucorrupt{}}}
167 The @code{qsort} function sorts the array @var{array}. The array
168 contains @var{count} elements, each of which is of size @var{size}.
170 The @var{compare} function is used to perform the comparison on the
171 array elements. This function is called with two pointer arguments and
172 should return an integer less than, equal to, or greater than zero
173 corresponding to whether its first argument is considered less than,
174 equal to, or greater than its second argument.
176 @cindex stable sorting
177 @strong{Warning:} If two objects compare as equal, their order after
178 sorting is unpredictable. That is to say, the sorting is not stable.
179 This can make a difference when the comparison considers only part of
180 the elements. Two elements with the same sort key may differ in other
183 Although the object addresses passed to the comparison function lie
184 within the array, they need not correspond with the original locations
185 of those objects because the sorting algorithm may swap around objects
186 in the array before making some comparisons. The only way to perform
187 a stable sort with @code{qsort} is to first augment the objects with a
188 monotonic counter of some kind.
190 Here is a simple example of sorting an array of doubles in numerical
191 order, using the comparison function defined above (@pxref{Comparison
199 qsort (array, size, sizeof (double), compare_doubles);
203 The @code{qsort} function derives its name from the fact that it was
204 originally implemented using the ``quick sort'' algorithm.
206 The implementation of @code{qsort} in this library might not be an
207 in-place sort and might thereby use an extra amount of memory to store
211 @node Search/Sort Example
212 @section Searching and Sorting Example
214 Here is an example showing the use of @code{qsort} and @code{bsearch}
215 with an array of structures. The objects in the array are sorted
216 by comparing their @code{name} fields with the @code{strcmp} function.
217 Then, we can look up individual objects based on their names.
219 @comment This example is dedicated to the memory of Jim Henson. RIP.
221 @include search.c.texi
224 @cindex Kermit the frog
225 The output from this program looks like:
236 Sweetums, the monster
237 Dr. Strangepork, the pig
238 Link Hogthrob, the pig
240 Dr. Bunsen Honeydew, the human
242 Swedish Chef, the human
247 Dr. Bunsen Honeydew, the human
248 Dr. Strangepork, the pig
252 Link Hogthrob, the pig
256 Swedish Chef, the human
257 Sweetums, the monster
262 Couldn't find Janice.
266 @node Hash Search Function
267 @section The @code{hsearch} function.
269 The functions mentioned so far in this chapter are for searching in a sorted
270 or unsorted array. There are other methods to organize information
271 which later should be searched. The costs of insert, delete and search
272 differ. One possible implementation is using hashing tables.
273 The following functions are declared in the header file @file{search.h}.
277 @deftypefun int hcreate (size_t @var{nel})
278 @safety{@prelim{}@mtunsafe{@mtasurace{:hsearch}}@asunsafe{@ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}}
279 @c hcreate @mtasurace:hsearch @ascuheap @acucorrupt @acsmem
280 @c hcreate_r dup @mtsrace:htab @ascuheap @acucorrupt @acsmem
281 The @code{hcreate} function creates a hashing table which can contain at
282 least @var{nel} elements. There is no possibility to grow this table so
283 it is necessary to choose the value for @var{nel} wisely. The method
284 used to implement this function might make it necessary to make the
285 number of elements in the hashing table larger than the expected maximal
286 number of elements. Hashing tables usually work inefficiently if they are
287 filled 80% or more. The constant access time guaranteed by hashing can
288 only be achieved if few collisions exist. See Knuth's ``The Art of
289 Computer Programming, Part 3: Searching and Sorting'' for more
292 The weakest aspect of this function is that there can be at most one
293 hashing table used through the whole program. The table is allocated
294 in local memory out of control of the programmer. As an extension @theglibc{}
295 provides an additional set of functions with a reentrant
296 interface which provide a similar interface but which allow to keep
297 arbitrarily many hashing tables.
299 It is possible to use more than one hashing table in the program run if
300 the former table is first destroyed by a call to @code{hdestroy}.
302 The function returns a non-zero value if successful. If it return zero
303 something went wrong. This could either mean there is already a hashing
304 table in use or the program runs out of memory.
309 @deftypefun void hdestroy (void)
310 @safety{@prelim{}@mtunsafe{@mtasurace{:hsearch}}@asunsafe{@ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}}
311 @c hdestroy @mtasurace:hsearch @ascuheap @acucorrupt @acsmem
312 @c hdestroy_r dup @mtsrace:htab @ascuheap @acucorrupt @acsmem
313 The @code{hdestroy} function can be used to free all the resources
314 allocated in a previous call of @code{hcreate}. After a call to this
315 function it is again possible to call @code{hcreate} and allocate a new
316 table with possibly different size.
318 It is important to remember that the elements contained in the hashing
319 table at the time @code{hdestroy} is called are @emph{not} freed by this
320 function. It is the responsibility of the program code to free those
321 strings (if necessary at all). Freeing all the element memory is not
322 possible without extra, separately kept information since there is no
323 function to iterate through all available elements in the hashing table.
324 If it is really necessary to free a table and all elements the
325 programmer has to keep a list of all table elements and before calling
326 @code{hdestroy} s/he has to free all element's data using this list.
327 This is a very unpleasant mechanism and it also shows that this kind of
328 hashing tables is mainly meant for tables which are created once and
329 used until the end of the program run.
332 Entries of the hashing table and keys for the search are defined using
335 @deftp {Data type} {struct ENTRY}
336 Both elements of this structure are pointers to zero-terminated strings.
337 This is a limiting restriction of the functionality of the
338 @code{hsearch} functions. They can only be used for data sets which use
339 the NUL character always and solely to terminate the records. It is not
340 possible to handle general binary data.
344 Pointer to a zero-terminated string of characters describing the key for
345 the search or the element in the hashing table.
347 Pointer to a zero-terminated string of characters describing the data.
348 If the functions will be called only for searching an existing entry
349 this element might stay undefined since it is not used.
355 @deftypefun {ENTRY *} hsearch (ENTRY @var{item}, ACTION @var{action})
356 @safety{@prelim{}@mtunsafe{@mtasurace{:hsearch}}@asunsafe{}@acunsafe{@acucorrupt{/action==ENTER}}}
357 @c hsearch @mtasurace:hsearch @acucorrupt/action==ENTER
358 @c hsearch_r dup @mtsrace:htab @acucorrupt/action==ENTER
359 To search in a hashing table created using @code{hcreate} the
360 @code{hsearch} function must be used. This function can perform simple
361 search for an element (if @var{action} has the @code{FIND}) or it can
362 alternatively insert the key element into the hashing table. Entries
365 The key is denoted by a pointer to an object of type @code{ENTRY}. For
366 locating the corresponding position in the hashing table only the
367 @code{key} element of the structure is used.
369 If an entry with matching key is found the @var{action} parameter is
370 irrelevant. The found entry is returned. If no matching entry is found
371 and the @var{action} parameter has the value @code{FIND} the function
372 returns a @code{NULL} pointer. If no entry is found and the
373 @var{action} parameter has the value @code{ENTER} a new entry is added
374 to the hashing table which is initialized with the parameter @var{item}.
375 A pointer to the newly added entry is returned.
378 As mentioned before the hashing table used by the functions described so
379 far is global and there can be at any time at most one hashing table in
380 the program. A solution is to use the following functions which are a
381 GNU extension. All have in common that they operate on a hashing table
382 which is described by the content of an object of the type @code{struct
383 hsearch_data}. This type should be treated as opaque, none of its
384 members should be changed directly.
388 @deftypefun int hcreate_r (size_t @var{nel}, struct hsearch_data *@var{htab})
389 @safety{@prelim{}@mtsafe{@mtsrace{:htab}}@asunsafe{@ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}}
390 @c Unlike the lsearch array, the htab is (at least in part) opaque, so
391 @c let's make it absolutely clear that ensuring exclusive access is a
392 @c caller responsibility.
394 @c Cancellation is unlikely to leave the htab in a corrupt state: the
395 @c last field to be initialized is the one that tells whether the entire
396 @c data structure was initialized, and there's a function call (calloc)
397 @c in between that will often ensure all other fields are written before
398 @c the table. However, should this call be inlined (say with LTO), this
399 @c assumption may not hold. The calloc call doesn't cross our library
400 @c interface barrier, so let's consider this could happen and mark this
401 @c with @acucorrupt. It's no safety loss, since we already have
402 @c @ascuheap anyway...
404 @c hcreate_r @mtsrace:htab @ascuheap @acucorrupt @acsmem
406 @c calloc dup @ascuheap @acsmem
407 The @code{hcreate_r} function initializes the object pointed to by
408 @var{htab} to contain a hashing table with at least @var{nel} elements.
409 So this function is equivalent to the @code{hcreate} function except
410 that the initialized data structure is controlled by the user.
412 This allows having more than one hashing table at one time. The memory
413 necessary for the @code{struct hsearch_data} object can be allocated
414 dynamically. It must be initialized with zero before calling this
417 The return value is non-zero if the operation was successful. If the
418 return value is zero, something went wrong, which probably means the
419 programs ran out of memory.
424 @deftypefun void hdestroy_r (struct hsearch_data *@var{htab})
425 @safety{@prelim{}@mtsafe{@mtsrace{:htab}}@asunsafe{@ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}}
426 @c The table is released while the table pointer still points to it.
427 @c Async cancellation is thus unsafe, but it already was because we call
428 @c free(). Using the table in a handler while it's being released would
429 @c also be dangerous, but calling free() already makes it unsafe, and
430 @c the requirement on the caller to ensure exclusive access already
431 @c guarantees this doesn't happen, so we don't get @asucorrupt.
433 @c hdestroy_r @mtsrace:htab @ascuheap @acucorrupt @acsmem
434 @c free dup @ascuheap @acsmem
435 The @code{hdestroy_r} function frees all resources allocated by the
436 @code{hcreate_r} function for this very same object @var{htab}. As for
437 @code{hdestroy} it is the programs responsibility to free the strings
438 for the elements of the table.
443 @deftypefun int hsearch_r (ENTRY @var{item}, ACTION @var{action}, ENTRY **@var{retval}, struct hsearch_data *@var{htab})
444 @safety{@prelim{}@mtsafe{@mtsrace{:htab}}@assafe{}@acunsafe{@acucorrupt{/action==ENTER}}}
445 @c Callers have to ensure mutual exclusion; insertion, if cancelled,
446 @c leaves the table in a corrupt state.
448 @c hsearch_r @mtsrace:htab @acucorrupt/action==ENTER
451 The @code{hsearch_r} function is equivalent to @code{hsearch}. The
452 meaning of the first two arguments is identical. But instead of
453 operating on a single global hashing table the function works on the
454 table described by the object pointed to by @var{htab} (which is
455 initialized by a call to @code{hcreate_r}).
457 Another difference to @code{hcreate} is that the pointer to the found
458 entry in the table is not the return value of the functions. It is
459 returned by storing it in a pointer variables pointed to by the
460 @var{retval} parameter. The return value of the function is an integer
461 value indicating success if it is non-zero and failure if it is zero.
462 In the latter case the global variable @var{errno} signals the reason for
467 The table is filled and @code{hsearch_r} was called with a so far
468 unknown key and @var{action} set to @code{ENTER}.
470 The @var{action} parameter is @code{FIND} and no corresponding element
471 is found in the table.
476 @node Tree Search Function
477 @section The @code{tsearch} function.
479 Another common form to organize data for efficient search is to use
480 trees. The @code{tsearch} function family provides a nice interface to
481 functions to organize possibly large amounts of data by providing a mean
482 access time proportional to the logarithm of the number of elements.
483 @Theglibc{} implementation even guarantees that this bound is
484 never exceeded even for input data which cause problems for simple
485 binary tree implementations.
487 The functions described in the chapter are all described in the @w{System
488 V} and X/Open specifications and are therefore quite portable.
490 In contrast to the @code{hsearch} functions the @code{tsearch} functions
491 can be used with arbitrary data and not only zero-terminated strings.
493 The @code{tsearch} functions have the advantage that no function to
494 initialize data structures is necessary. A simple pointer of type
495 @code{void *} initialized to @code{NULL} is a valid tree and can be
496 extended or searched. The prototypes for these functions can be found
497 in the header file @file{search.h}.
501 @deftypefun {void *} tsearch (const void *@var{key}, void **@var{rootp}, comparison_fn_t @var{compar})
502 @safety{@prelim{}@mtsafe{@mtsrace{:rootp}}@asunsafe{@ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}}
503 @c The tree is not modified in a thread-safe manner, and rotations may
504 @c leave the tree in an inconsistent state that could be observed in an
505 @c asynchronous signal handler (except for the caller-synchronization
506 @c requirement) or after asynchronous cancellation of the thread
507 @c performing the rotation or the insertion.
508 The @code{tsearch} function searches in the tree pointed to by
509 @code{*@var{rootp}} for an element matching @var{key}. The function
510 pointed to by @var{compar} is used to determine whether two elements
511 match. @xref{Comparison Functions}, for a specification of the functions
512 which can be used for the @var{compar} parameter.
514 If the tree does not contain a matching entry the @var{key} value will
515 be added to the tree. @code{tsearch} does not make a copy of the object
516 pointed to by @var{key} (how could it since the size is unknown).
517 Instead it adds a reference to this object which means the object must
518 be available as long as the tree data structure is used.
520 The tree is represented by a pointer to a pointer since it is sometimes
521 necessary to change the root node of the tree. So it must not be
522 assumed that the variable pointed to by @var{rootp} has the same value
523 after the call. This also shows that it is not safe to call the
524 @code{tsearch} function more than once at the same time using the same
525 tree. It is no problem to run it more than once at a time on different
528 The return value is a pointer to the matching element in the tree. If a
529 new element was created the pointer points to the new data (which is in
530 fact @var{key}). If an entry had to be created and the program ran out
531 of space @code{NULL} is returned.
536 @deftypefun {void *} tfind (const void *@var{key}, void *const *@var{rootp}, comparison_fn_t @var{compar})
537 @safety{@prelim{}@mtsafe{@mtsrace{:rootp}}@assafe{}@acsafe{}}
538 The @code{tfind} function is similar to the @code{tsearch} function. It
539 locates an element matching the one pointed to by @var{key} and returns
540 a pointer to this element. But if no matching element is available no
541 new element is entered (note that the @var{rootp} parameter points to a
542 constant pointer). Instead the function returns @code{NULL}.
545 Another advantage of the @code{tsearch} function in contrast to the
546 @code{hsearch} functions is that there is an easy way to remove
551 @deftypefun {void *} tdelete (const void *@var{key}, void **@var{rootp}, comparison_fn_t @var{compar})
552 @safety{@prelim{}@mtsafe{@mtsrace{:rootp}}@asunsafe{@ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}}
553 To remove a specific element matching @var{key} from the tree
554 @code{tdelete} can be used. It locates the matching element using the
555 same method as @code{tfind}. The corresponding element is then removed
556 and a pointer to the parent of the deleted node is returned by the
557 function. If there is no matching entry in the tree nothing can be
558 deleted and the function returns @code{NULL}. If the root of the tree
559 is deleted @code{tdelete} returns some unspecified value not equal to
565 @deftypefun void tdestroy (void *@var{vroot}, __free_fn_t @var{freefct})
566 @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
567 If the complete search tree has to be removed one can use
568 @code{tdestroy}. It frees all resources allocated by the @code{tsearch}
569 function to generate the tree pointed to by @var{vroot}.
571 For the data in each tree node the function @var{freefct} is called.
572 The pointer to the data is passed as the argument to the function. If
573 no such work is necessary @var{freefct} must point to a function doing
574 nothing. It is called in any case.
576 This function is a GNU extension and not covered by the @w{System V} or
577 X/Open specifications.
580 In addition to the function to create and destroy the tree data
581 structure, there is another function which allows you to apply a
582 function to all elements of the tree. The function must have this type:
585 void __action_fn_t (const void *nodep, VISIT value, int level);
588 The @var{nodep} is the data value of the current node (once given as the
589 @var{key} argument to @code{tsearch}). @var{level} is a numeric value
590 which corresponds to the depth of the current node in the tree. The
591 root node has the depth @math{0} and its children have a depth of
592 @math{1} and so on. The @code{VISIT} type is an enumeration type.
594 @deftp {Data Type} VISIT
595 The @code{VISIT} value indicates the status of the current node in the
596 tree and how the function is called. The status of a node is either
597 `leaf' or `internal node'. For each leaf node the function is called
598 exactly once, for each internal node it is called three times: before
599 the first child is processed, after the first child is processed and
600 after both children are processed. This makes it possible to handle all
601 three methods of tree traversal (or even a combination of them).
605 The current node is an internal node and the function is called before
606 the first child was processed.
608 The current node is an internal node and the function is called after
609 the first child was processed.
611 The current node is an internal node and the function is called after
612 the second child was processed.
614 The current node is a leaf.
620 @deftypefun void twalk (const void *@var{root}, __action_fn_t @var{action})
621 @safety{@prelim{}@mtsafe{@mtsrace{:root}}@assafe{}@acsafe{}}
622 For each node in the tree with a node pointed to by @var{root}, the
623 @code{twalk} function calls the function provided by the parameter
624 @var{action}. For leaf nodes the function is called exactly once with
625 @var{value} set to @code{leaf}. For internal nodes the function is
626 called three times, setting the @var{value} parameter or @var{action} to
627 the appropriate value. The @var{level} argument for the @var{action}
628 function is computed while descending the tree with increasing the value
629 by one for the descend to a child, starting with the value @math{0} for
632 Since the functions used for the @var{action} parameter to @code{twalk}
633 must not modify the tree data, it is safe to run @code{twalk} in more
634 than one thread at the same time, working on the same tree. It is also
635 safe to call @code{tfind} in parallel. Functions which modify the tree
636 must not be used, otherwise the behavior is undefined.