4 **LibHashSet** is a [*hash set*](https://en.wikipedia.org/wiki/Hash_table) and [*hash map*](https://en.wikipedia.org/wiki/Hash_table) implementation for C99. It uses open addressing and double hashing.
6 At this time, the *only* types of elements supported are `uint16_t`, `uint32_t` and `uint64_t`.
8 This hash set implementation has successfully been tested to *efficiently* handle several billions of items 😏
14 Here is a simple example of how to use LibHashSet in your application:
23 uintptr_t cursor = 0U;
25 /* create new hash set instance */
26 hash_set64_t* const hash_set = hash_set_create64(0U, -1.0, 42U);
29 fputs("Allocation has failed!\n", stderr);
33 /* add a number of items to the hash set, the set will grow as needed */
34 puts("Insertign items, please wait...");
35 while (have_more_items())
37 const errno_t error = hash_set_insert64(hash_set, get_next_item());
40 fprintf(stderr, "Insert operation has failed! (error: %d)\n", error);
46 /* print total number of items in the hash set*/
47 printf("Total number of items: %zu\n\n", hash_set_size64(hash_set));
49 /* print all items in the set */
50 while (hash_set_iterate64(hash_set, &cursor, &item) == 0)
52 printf("Item: %016llX\n", item);
55 /* destroy the hash set, when it is no longer needed! */
56 hash_set_destroy64(hash_set);
65 This section describes the LibHashSet programming interface. The functions for managing *hash sets* are declared in `<hash_set.h>`, whereas the functions for managing to *hash maps* are declared in `<hash_map.h>`.
67 LibHashSet supports sets and maps containing elements of different integral types. For each element type, separate functions are provided. The functions for `uint16_t`, `uint32_t` and `uint64_t` can be distinguished by the suffix `…16`, `…32` and `…64`, respectively. In the following, the functions are described in their "generic" form.
69 ***Note:*** On Microsoft Windows (Visual C++), when using LibHashSet as a "shared" library (DLL file), the macro `HASHSET_DLL` must be defined *before* including the `<hash_set.h>` or `<hash_map.h>` header files!
76 A `struct` that represents a LibHashSet *hash set* instance. Hash set instances can be allocated and de-allocated via the [hash_set_create()](#hash_set_create) and [hash_set_destroy()](#hash_set_destroy) functions, respectively.
78 ***Note:*** Application code shall treat this `struct` as opaque. The internals may change in future versions!
81 typedef struct _hash_set hash_set_t;
86 A `struct` that represents a LibHashSet *hash map* instance. Hash map instances can be allocated and de-allocated via the [hash_map_create()](#hash_map_create) and [hash_map_destroy()](#hash_map_destroy) functions, respectively.
88 ***Note:*** Application code shall treat this `struct` as opaque! The internals may change in future versions!
91 typedef struct _hash_map hash_map_t;
97 ### Version information
99 The *major*, *minor* and *patch* version of the LibHashSet library:
102 extern const uint16_t HASHSET_VERSION_MAJOR;
103 extern const uint16_t HASHSET_VERSION_MINOR;
104 extern const uint16_t HASHSET_VERSION_PATCH;
107 ### Build information
109 The build *date* and *time* of the LibHashSet library:
112 extern const char *const HASHSET_BUILD_DATE;
113 extern const char *const HASHSET_BUILD_TIME;
119 This section describes all functions for managing `hash_set_t` instances.
121 ### hash_set_create()
123 Allocates a new hash set instance. The new hash set instance is empty initially.
126 hash_set_t *hash_set_create(
127 const size_t initial_capacity,
128 const double load_factor,
136 The initial capacity of the hash set (number of items). The given count will be rounded to the next power of two. If the number of items to be inserted into the hash set can be estimated beforehand, then the initial capacity should be adjusted accordingly to avoid unnecessary re-allocations. In any case, the hash set will be able to grow dynamically as needed. If this parameter is set to *zero*, the the *default* initial capacity (8192) is used.
139 The load factor to be applied to the hash set. The given load factor will be clipped to the **0.1** to **1.0** range. Generally, the default load factor (0.75) offers a good trade-off between performance and memory usage. Higher load factors decrease the memory overhead, but also may increase the time required for insert, lookup and remove operations. If this parameter is less than or equal to *zero*, the *default* load factor is used.
142 The "seed" value that is used to tweak the internal hash computation. The application should set this parameter to a value that is hard to predict and that is unlikely to repeat (e.g., a high-resolution timer is suitable here).
146 On success, this function returns a pointer to a new hash set instance. On error, a `NULL` pointer is returned.
148 ***Note:*** To avoid a memory leak, the returned pointer must be de-allocated by the application using the [hash_set_destroy()](#hash_set_destroy) function, as soon as the instance is *not* needed anymore!
150 ### hash_set_destroy()
152 De-allocates an existing hash set instance. All items in the hash set are discarded.
155 void hash_set_destroy(
163 A pointer to the hash set instance that is to be destroyed, as returned by the [hash_set_create()](#hash_set_create) function.
164 ***Note:*** The given pointer is *invalidated* by this function, and it **must not** be used afterwards!
166 ### hash_set_insert()
168 Tries to insert the given item into the hash set. The operation fails, if the set already contains the given item.
170 ***Note:*** If the item is actually inserted, then the hash set *may* need to grow.
173 errno_t hash_set_insert(
174 hash_set_t *const instance,
182 A pointer to the hash set instance to be modified, as returned by the [hash_set_create()](#hash_set_create) function.
185 The item to be inserted into the hash set.
189 On success, this function returns *zero*. On error, the appropriate error code is returned. Possible error codes include:
192 An invalid argument was given, e.g. `instance` was set to `NULL`.
195 The given item was *not* inserted into the hash set (again), because it was already present.
198 The set failed to grow, because the required amount of memory could *not* be allocated (out of memory).
201 The set needs to grow, but doing so would exceed the maximum size supported by the underlying system.
204 Something else went wrong. This usually indicates an internal error and is *not* supposed to happen.
206 ### hash_set_remove()
208 Tries to remove the given item from the hash set. The operation fails, if the set does *not* contain the given item.
210 ***Note:*** If the item is actually removed, then the hash set *may* shrink.
213 errno_t hash_set_remove(
214 hash_set_t *const instance,
222 A pointer to the hash set instance to be modified, as returned by the [hash_set_create()](#hash_set_create) function.
225 The item to be removed from the hash set.
229 On success, this function returns *zero*. On error, the appropriate error code is returned. Possible error codes include:
232 An invalid argument was given, e.g. `instance` was set to `NULL`.
235 The given item could *not* be removed from the hash set, because *no* such item was present.
238 Something else went wrong. This usually indicates an internal error and is *not* supposed to happen.
242 Discards *all* items from the hash set at once.
245 errno_t hash_set_clear(
246 hash_set_t *const instance
252 A pointer to the hash set instance to be modified, as returned by the [hash_set_create()](#hash_set_create) function.
256 On success, this function returns *zero*. On error, the appropriate error code is returned. Possible error codes include:
259 An invalid argument was given, e.g. `instance` was set to `NULL`.
262 The hash set was *not* cleared, because it already was empty. Please try again later!
265 Something else went wrong. This usually indicates an internal error and is *not* supposed to happen.
267 ### hash_set_contains()
269 Tests whether the hash set contains an item. The operation fails, if the set does *not* contain the given item.
272 errno_t hash_set_contains(
273 const hash_set_t *const instance,
281 A pointer to the hash set instance to be examined, as returned by the [hash_set_create()](#hash_set_create) function.
284 The item to be searched in the hash set.
288 On success, this function returns *zero*. On error, the appropriate error code is returned. Possible error codes include:
291 An invalid argument was given, e.g. `instance` was set to `NULL`.
294 The hash set does *not* contain the specified item.
297 Something else went wrong. This usually indicates an internal error and is *not* supposed to happen.
299 ### hash_set_iterate()
301 Iterates through the items stored in the hash set. The elements are iterated in **no** particular order.
303 This function returns one item at a time. It should be called *repeatedly*, until the end of the set is encountered.
305 ***Warning:*** The result is *undefined*, if the set is modified while the iteration is in progress!
308 errno_t hash_set_iterate(
309 const hash_set_t *const instance,
310 size_t *const cursor,
318 A pointer to the hash set instance to be examined, as returned by the [hash_set_create()](#hash_set_create) function.
321 A pointer to a variable of type `size_t` where the current iterator state (position) is saved.
322 This variable **must** be initialized to the value `0U`, by the calling application, prior to the the *first* invocation!
323 Each invocation will update the value of `*cursor`. This value **shall not** be altered by the application.
326 A pointer to a variable of type `value_t` where the next item in the set is stored on success.
327 The content of the variable should be considered *undefined*, if the invocation has failed.
331 On success, this function returns *zero*. On error, the appropriate error code is returned. Possible error codes include:
334 An invalid argument was given, e.g. `instance` was set to `NULL`.
337 No more items. The end of the set has been encountered.
340 Something else went wrong. This usually indicates an internal error and is *not* supposed to happen.
344 Returns the current number of (distinct) items in the hash set.
347 size_t hash_set_size(
348 const hash_set_t *const instance
355 A pointer to the hash set instance to be examined, as returned by the [hash_set_create()](#hash_set_create) function.
359 This function returns the number of (distinct) items in the hash set.
363 Returns technical information about the hash set.
366 errno_t hash_set_info(
367 const hash_set_t *const instance,
368 size_t *const capacity,
370 size_t *const deleted,
378 A pointer to the hash set instance to be examined, as returned by the [hash_set_create()](#hash_set_create) function.
381 A pointer to a variable of type `size_t` where the current total *capacity* of the hash set is stored.
382 This value will always be greater than or equal to the sum of the *valid* and *deleted* entries.
385 A pointer to a variable of type `size_t` where the current number of *valid* entries in the hash set is stored.
386 This value is equivalent to the return value of the [hash_set_size()](#hash_set_size) function.
389 A pointer to a variable of type `size_t` where the current number of *deleted* entries in the hash set is stored.
390 For technical reasons, entires are *not* removed from the set immediately, but are marked as "deleted".
393 A pointer to a variable of type `size_t` where the current "grow" *limit* of the hash set is stored.
394 The hash set is grown automatically, as soon as the sum of the *valid* and *deleted* entries exceeds this limit.
398 On success, this function returns *zero*. On error, the appropriate error code is returned. Possible error codes include:
401 An invalid argument was given, e.g. `instance` was set to `NULL`.
404 Something else went wrong. This usually indicates an internal error and is *not* supposed to happen.
408 Dump the current status and content of all "slots" of the hash set.
411 errno_t hash_set_dump(
412 const hash_set_t *const instance,
413 const hash_map_callback_t callback
420 A pointer to the hash set instance to be examined, as returned by the [hash_set_create()](#hash_set_create) function.
423 A pointer to the callback function that will be invoked once for every "slot" in the hash set.
425 The callback function is defined as follows:
427 typedef int (*hash_map_callback_t)(
438 The index of the current "slot" within the hash set.
441 Indicates the status of the current "slot":
442 - `'u'` – the slot is *unused*
443 - `'v'` – the slot is *valid*
444 - `'d'` – the slot is *deleted*
447 The item that is stored at the current "slot" index.
451 If the function returns a *non-zero* value, the iteration continues; otherwise it is cancelled.
455 On success, this function returns *zero*. On error, the appropriate error code is returned. Possible error codes include:
458 An invalid argument was given, e.g. `instance` was set to `NULL`.
461 The operation was cancelled by the calling application.
464 Something else went wrong. This usually indicates an internal error and is *not* supposed to happen.
469 This section describes all functions for managing `hash_map_t` instances.
471 ### hash_map_create()
473 Allocates a new hash map instance. The new hash map instance is empty initially.
476 hash_map_t *hash_map_create(
477 const size_t initial_capacity,
478 const double load_factor,
486 The initial capacity of the hash map. The given count will be rounded to the next power of two. If the number of key-value pairs to be inserted into the hash map can be estimated beforehand, then the initial capacity should be adjusted accordingly to avoid unnecessary re-allocations. In any case, the hash map will be able to grow dynamically as needed. If this parameter is map to *zero*, the the *default* initial capacity (8192) is used.
489 The load factor to be applied to the hash map. The given load factor will be clipped to the **0.1** to **1.0** range. Generally, the default load factor (0.75) offers a good trade-off between performance and memory usage. Higher load factors decrease the memory overhead, but also may increase the time required for insert, lookup and remove operations. If this parameter is less than or equal to *zero*, the *default* load factor is used.
492 The "seed" value that is used to tweak the internal hash computation. The application should set this parameter to a value that is hard to predict and that is unlikely to repeat (e.g., a high-resolution timer is suitable here).
496 On success, this function returns a pointer to a new hash map instance. On error, a `NULL` pointer is returned.
498 ***Note:*** To avoid a memory leak, the returned pointer must be de-allocated by the application using the [hash_map_destroy()](#hash_map_destroy) function, as soon as the instance is *not* needed anymore!
500 ### hash_map_destroy()
502 De-allocates an existing hash map instance. All key-value pairs in the hash map are discarded.
505 void hash_map_destroy(
513 A pointer to the hash map instance that is to be destroyed, as returned by the [hash_map_create()](#hash_map_create) function.
514 ***Note:*** The given pointer is *invalidated* by this function, and it **must not** be used afterwards!
516 ### hash_map_insert()
518 Tries to insert the given key-value pair into the hash map.
520 ***Note:*** If the key is actually inserted, then the hash map *may* need to grow.
523 errno_t hash_map_insert(
524 hash_map_t *const instance,
534 A pointer to the hash map instance to be modified, as returned by the [hash_map_create()](#hash_map_create) function.
537 The key to be inserted into the hash map.
540 The value to be associated with the given key.
543 If the map already contains the specified key, then if this parameter is *non-zero*, the value associated with the existing key will be updated; otherwise, the value currently associated with key remains unchanged.
547 On success, this function returns *zero*. On error, the appropriate error code is returned. Possible error codes include:
550 An invalid argument was given, e.g. `instance` was map to `NULL`.
553 The given key was *not* inserted into the hash map (again), because it was already present.
554 Nonetheless, if `update` was non-zero, the value associated with the existing key has been updated.
557 The map failed to grow, because the required amount of memory could *not* be allocated (out of memory).
560 The map needs to grow, but doing so would exceed the maximum size supported by the underlying system.
563 Something else went wrong. This usually indicates an internal error and is *not* supposed to happen.
565 ### hash_map_remove()
567 Tries to remove the given key from the hash map. The operation fails, if the map does *not* contain the given key.
569 The value associated with the removed key is discarded.
571 ***Note:*** If the key is actually removed, then the hash map *may* shrink.
574 errno_t hash_map_remove(
575 hash_map_t *const instance,
584 A pointer to the hash map instance to be modified, as returned by the [hash_map_create()](#hash_map_create) function.
587 The key to be removed from the hash map.
590 A pointer to a variable of type `value_t` where the value that was associated with the deleted key is stored.
591 The content of the variable should be considered *undefined*, if the key could *not* be removed.
592 ***Note:*** This parameter can be set to `NULL`, in which case the value will *not* be reported to the application.
596 On success, this function returns *zero*. On error, the appropriate error code is returned. Possible error codes include:
599 An invalid argument was given, e.g. `instance` was map to `NULL`.
602 The given key could *not* be removed from the hash map, because *no* such key was present.
605 Something else went wrong. This usually indicates an internal error and is *not* supposed to happen.
609 Discards *all* key-value pairs from the hash map at once.
612 errno_t hash_map_clear(
613 hash_map_t *const instance
619 A pointer to the hash map instance to be modified, as returned by the [hash_map_create()](#hash_map_create) function.
623 On success, this function returns *zero*. On error, the appropriate error code is returned. Possible error codes include:
626 An invalid argument was given, e.g. `instance` was map to `NULL`.
629 The hash map was *not* cleared, because it already was empty. Please try again later!
632 Something else went wrong. This usually indicates an internal error and is *not* supposed to happen.
634 ### hash_map_contains()
636 Tests whether the hash map contains a key. The operation fails, if the map does *not* contain the given key.
639 errno_t hash_map_contains(
640 const hash_map_t *const instance,
648 A pointer to the hash map instance to be examined, as returned by the [hash_map_create()](#hash_map_create) function.
651 The key to be searched in the hash map.
655 On success, this function returns *zero*. On error, the appropriate error code is returned. Possible error codes include:
658 An invalid argument was given, e.g. `instance` was map to `NULL`.
661 The hash map does *not* contain the specified key.
664 Something else went wrong. This usually indicates an internal error and is *not* supposed to happen.
668 Retrieves the value that is associated with the given key. The operation fails, if the map does *not* contain the key.
671 errno_t hash_map_get(
672 const hash_map_t *const instance,
681 A pointer to the hash map instance to be examined, as returned by the [hash_map_create()](#hash_map_create) function.
684 The key to be searched in the hash map.
687 A pointer to a variable of type `value_t` where the value associated with the key is stored on success.
688 The content of the variable should be considered *undefined*, if the invocation has failed.
692 On success, this function returns *zero*. On error, the appropriate error code is returned. Possible error codes include:
695 An invalid argument was given, e.g. `instance` was map to `NULL`.
698 The hash map does *not* contain the specified key.
701 Something else went wrong. This usually indicates an internal error and is *not* supposed to happen.
703 ### hash_map_iterate()
705 Iterates through the key-value pairs stored in the hash map. The entries are iterated in **no** particular order.
707 This function returns one key-value pair at a time. It should be called *repeatedly*, until the end of the map is encountered.
709 ***Warning:*** The result is *undefined*, if the map is modified while the iteration is in progress!
712 errno_t hash_map_iterate(
713 const hash_map_t *const instance,
714 size_t *const cursor,
723 A pointer to the hash map instance to be examined, as returned by the [hash_map_create()](#hash_map_create) function.
726 A pointer to a variable of type `size_t` where the current iterator state (position) is saved.
727 This variable **must** be initialized to the value `0U`, by the calling application, prior to the the *first* invocation!
728 Each invocation will update the value of `*cursor`. This value **shall not** be altered by the application.
731 A pointer to a variable of type `value_t` where the next key in the map is stored on success.
732 The content of the variable should be considered *undefined*, if the invocation has failed.
735 A pointer to a variable of type `value_t` where the next value in the map is stored on success.
736 The content of the variable should be considered *undefined*, if the invocation has failed.
740 On success, this function returns *zero*. On error, the appropriate error code is returned. Possible error codes include:
743 An invalid argument was given, e.g. `instance` was map to `NULL`.
746 No more entries. The end of the map has been encountered.
749 Something else went wrong. This usually indicates an internal error and is *not* supposed to happen.
753 Returns the current number of (distinct) keys in the hash map.
756 size_t hash_map_size(
757 const hash_map_t *const instance
764 A pointer to the hash map instance to be examined, as returned by the [hash_map_create()](#hash_map_create) function.
768 This function returns the number of (distinct) keys in the hash map.
772 Returns technical information about the hash map.
775 errno_t hash_map_info(
776 const hash_map_t *const instance,
777 size_t *const capacity,
779 size_t *const deleted,
787 A pointer to the hash map instance to be examined, as returned by the [hash_map_create()](#hash_map_create) function.
790 A pointer to a variable of type `size_t` where the current total *capacity* of the hash map is stored.
791 This value will always be greater than or equal to the sum of the *valid* and *deleted* entries.
794 A pointer to a variable of type `size_t` where the current number of *valid* entries in the hash map is stored.
795 This value is equivalent to the return value of the [hash_map_size()](#hash_map_size) function.
798 A pointer to a variable of type `size_t` where the current number of *deleted* entries in the hash map is stored.
799 For technical reasons, entires are *not* removed from the map immediately, but are marked as "deleted".
802 A pointer to a variable of type `size_t` where the current "grow" *limit* of the hash map is stored.
803 The hash map is grown automatically, as soon as the sum of the *valid* and *deleted* entries exceeds this limit.
807 On success, this function returns *zero*. On error, the appropriate error code is returned. Possible error codes include:
810 An invalid argument was given, e.g. `instance` was map to `NULL`.
813 Something else went wrong. This usually indicates an internal error and is *not* supposed to happen.
817 Dump the current status and content of all "slots" of the hash map.
820 errno_t hash_map_dump(
821 const hash_map_t *const instance,
822 const hash_map_callback_t callback
829 A pointer to the hash map instance to be examined, as returned by the [hash_map_create()](#hash_map_create) function.
832 A pointer to the callback function that will be invoked once for every "slot" in the hash map.
834 The callback function is defined as follows:
836 typedef int (*hash_map_callback_t)(
847 The index of the current "slot" within the hash map.
850 Indicates the status of the current "slot":
851 - `'u'` – the slot is *unused*
852 - `'v'` – the slot is *valid*
853 - `'d'` – the slot is *deleted*
856 The key that is stored at the current "slot" index.
859 The value that is stored at the current "slot" index.
863 If the function returns a *non-zero* value, the iteration continues; otherwise it is cancelled.
867 On success, this function returns *zero*. On error, the appropriate error code is returned. Possible error codes include:
870 An invalid argument was given, e.g. `instance` was map to `NULL`.
873 The operation was cancelled by the calling application.
876 Something else went wrong. This usually indicates an internal error and is *not* supposed to happen.
881 LibHashSet is ***thread-safe***, in the sense that all public functions operate *exclusively* on the given `hash_set_t` or `hash_map_t` instance; there is **no** implicit shared "global" state. This means that **no** synchronization is required in multi-threaded applications, provided that each instance is created and accessed only by a ***single*** thread.
883 However, LibHashSet does **nothing** to synchronize access to a particular `hash_set_t` or `hash_map_t` instance! Consequently, in situations where the *same* instance needs to be shared across *multiple* concurrent threads, the calling application is responsible for serializing all access to the "shared" instance, e.g. by using a [*mutex*](https://pubs.opengroup.org/onlinepubs/007908799/xsh/pthread_mutex_lock.html) lock!
888 The "official" Git repository is mirrored at:
890 * `git clone https://github.com/lordmulder/HashSet.git` ([Browse](https://github.com/lordmulder/HashSet))
892 * `git clone https://gitlab.com/lord_mulder/libhashset.git` ([Browse](https://gitlab.com/lord_mulder/libhashset))
894 * `git clone https://repo.or.cz/HashSet.git` ([Browse](https://repo.or.cz/HashSet.git))
896 * `git clone https://punkindrublic.mooo.com:3000/Muldersoft/LibHashSet.git` ([Browse](https://punkindrublic.mooo.com:3000/Muldersoft/LibHashSet))
902 This section describes how to build LibHashSet from the source codes.
907 On Microsoft Windows, project/solution files are provided to build LibHashSet with [Microsoft Visual Studio](https://visualstudio.microsoft.com/).
909 LibHashSet also can be built using *GCC* or *Clang* via [MSYS2/Mingw-w64](https://www.msys2.org/) or [Cygwin](https://www.cygwin.com/). Be sure that *GCC* (or *Clang*) and *GNU make* are installed. Then run `make` from the project base directory, in the MSYS2 (or Cygwin) shell!
911 Linux, *BSD and MacOS X
912 ---------------------------
914 On Linux, *BSD or MacOS X, building LibHashSet with *GCC* or *Clang* is supported. *GNU make* is required to build.
916 Simply run `make` (or `gmake`, if on *BSD) from the project base directory. That's it!
918 ### Influential environment variables
920 The following environment variables can be used to control the build process:
922 * `CC` – specifies the C compiler (default is `cc`)
924 * `STRIP` – set to a non-zero value in order to *strip* the generated binaries
926 * `STATIC` – set to a non-zero value in order to enable *static* linking for the example/test programs
928 * `FLTO` – set to a non-zero value in order to enable *link-time optimizer* (`-flto`)
930 * `DEBUG` – set to a non-zero value in order to enable "debug" build
932 * `ASAN` – set to a non-zero value in order to enable the [address-sanitizer](https://en.wikipedia.org/wiki/AddressSanitizer)
938 This work has been released under the **CC0 1.0 Universal** license.
940 For details, please refer to:
941 <https://creativecommons.org/publicdomain/zero/1.0/legalcode>