descriptionSimple hash set implementation for C99
homepage URLhttp://muldersoft.com/
ownermulder2@gmx.de
last changeTue, 6 Dec 2022 14:15:46 +0000 (6 15:15 +0100)
content tags
add:
README.md

Introduction

LibHashSet is a hash set and hash map implementation for C99. It uses open addressing and double hashing.

At this time, the only types of elements supported are uint16_t, uint32_t and uint64_t.

This hash set implementation has successfully been tested to efficiently handle several billions of items 😏

Getting Started

Here is a simple example of how to use LibHashSet in your application:

#include <hash_set.h>
#include <stdio.h>

int main(void)
{
        uint64_t item;
        uintptr_t cursor = 0U;

        /* create new hash set instance */
        hash_set64_t* const hash_set = hash_set_create64(0U, -1.0, 42U);
        if (!hash_set)
        {
                fputs("Allocation has failed!\n", stderr);
                return EXIT_FAILURE;
        }

        /* add a number of items to the hash set, the set will grow as needed */
        puts("Insertign items, please wait...");
        while (have_more_items())
        {
                const errno_t error = hash_set_insert64(hash_set, get_next_item());
                if (error)
                {
                        fprintf(stderr, "Insert operation has failed! (error: %d)\n", error);
                        return EXIT_FAILURE;
                }
        }
        puts("Done.\n");

        /* print total number of items in the hash set*/
        printf("Total number of items: %zu\n\n", hash_set_size64(hash_set));

        /* print all items in the set */
        while (hash_set_iterate64(hash_set, &cursor, &item) == 0)
        {
                printf("Item: %016llX\n", item);
        }

        /* destroy the hash set, when it is no longer needed! */
        hash_set_destroy64(hash_set);
        return EXIT_SUCCESS;
}

API Reference

This section describes the LibHashSet programming interface. The functions for managing hash sets are declared in <hash_set.h>, whereas the functions for managing to hash maps are declared in <hash_map.h>.

LibHashSet supports sets and maps containing elements of different integral types. For each element type, separate functions are provided. The functions for uint16_t, uint32_t and uint64_t can be distinguished by the suffix …16, …32 and …64, respectively. In the following, the functions are described in their "generic" form.

Note: On Microsoft Windows (Visual C++), when using LibHashSet as a "shared" library (DLL file), the macro HASHSET_DLL must be defined before including the <hash_set.h> or <hash_map.h> header files!

Types

hash_set_t

A struct that represents a LibHashSet hash set instance. Hash set instances can be allocated and de-allocated via the hash_set_create() and hash_set_destroy() functions, respectively.

Note: Application code shall treat this struct as opaque. The internals may change in future versions!

typedef struct _hash_set hash_set_t;

hash_map_t

A struct that represents a LibHashSet hash map instance. Hash map instances can be allocated and de-allocated via the hash_map_create() and hash_map_destroy() functions, respectively.

Note: Application code shall treat this struct as opaque! The internals may change in future versions!

typedef struct _hash_map hash_map_t;

Globals

Version information

The major, minor and patch version of the LibHashSet library:

extern const uint16_t HASHSET_VERSION_MAJOR;
extern const uint16_t HASHSET_VERSION_MINOR;
extern const uint16_t HASHSET_VERSION_PATCH;

Build information

The build date and time of the LibHashSet library:

extern const char *const HASHSET_BUILD_DATE;
extern const char *const HASHSET_BUILD_TIME;

Set Functions

This section describes all functions for managing hash_set_t instances.

hash_set_create()

Allocates a new hash set instance. The new hash set instance is empty initially.

hash_set_t *hash_set_create(
        const size_t initial_capacity,
        const double load_factor,
        const uint64_t seed
);

Parameters

Return value

On success, this function returns a pointer to a new hash set instance. On error, a NULL pointer is returned.

Note: To avoid a memory leak, the returned pointer must be de-allocated by the application using the hash_set_destroy() function, as soon as the instance is not needed anymore!

hash_set_destroy()

De-allocates an existing hash set instance. All items in the hash set are discarded.

void hash_set_destroy(
        hash_set_t *instance
);

Parameters

hash_set_insert()

Tries to insert the given item into the hash set. The operation fails, if the set already contains the given item.

Note: If the item is actually inserted, then the hash set may need to grow.

errno_t hash_set_insert(
        hash_set_t *const instance,
        const value_t item
);

Parameters

Return value

On success, this function returns zero. On error, the appropriate error code is returned. Possible error codes include:

hash_set_remove()

Tries to remove the given item from the hash set. The operation fails, if the set does not contain the given item.

Note: If the item is actually removed, then the hash set may shrink.

errno_t hash_set_remove(
        hash_set_t *const instance,
        const value_t item
);

Parameters

Return value

On success, this function returns zero. On error, the appropriate error code is returned. Possible error codes include:

hash_set_clear()

Discards all items from the hash set at once.

errno_t hash_set_clear(
        hash_set_t *const instance
);

Parameters

Return value

On success, this function returns zero. On error, the appropriate error code is returned. Possible error codes include:

hash_set_contains()

Tests whether the hash set contains an item. The operation fails, if the set does not contain the given item.

errno_t hash_set_contains(
        const hash_set_t *const instance,
        const value_t item
);

Parameters

Return value

On success, this function returns zero. On error, the appropriate error code is returned. Possible error codes include:

hash_set_iterate()

Iterates through the items stored in the hash set. The elements are iterated in no particular order.

This function returns one item at a time. It should be called repeatedly, until the end of the set is encountered.

Warning: The result is undefined, if the set is modified while the iteration is in progress!

errno_t hash_set_iterate(
        const hash_set_t *const instance,
        size_t *const cursor,
        value_t *const item
);

Parameters

Return value

On success, this function returns zero. On error, the appropriate error code is returned. Possible error codes include:

hash_set_size()

Returns the current number of (distinct) items in the hash set.

size_t hash_set_size(
        const hash_set_t *const instance
);

Parameters

Return value

This function returns the number of (distinct) items in the hash set.

hash_set_info()

Returns technical information about the hash set.

errno_t hash_set_info(
        const hash_set_t *const instance,
        size_t *const capacity,
        size_t *const valid,
        size_t *const deleted,
        size_t *const limit
);

Parameters

Return value

On success, this function returns zero. On error, the appropriate error code is returned. Possible error codes include:

hash_set_dump()

Dump the current status and content of all "slots" of the hash set.

errno_t hash_set_dump(
        const hash_set_t *const instance,
        const hash_map_callback_t callback
);

Parameters

typedef int (*hash_map_callback_t)(
        const size_t index,
        const char status,
        const value_t key,
        const value_t value
);

##### Parameters

Return value

On success, this function returns zero. On error, the appropriate error code is returned. Possible error codes include:

Map Functions

This section describes all functions for managing hash_map_t instances.

hash_map_create()

Allocates a new hash map instance. The new hash map instance is empty initially.

hash_map_t *hash_map_create(
        const size_t initial_capacity,
        const double load_factor,
        const uint64_t seed
);

Parameters

Return value

On success, this function returns a pointer to a new hash map instance. On error, a NULL pointer is returned.

Note: To avoid a memory leak, the returned pointer must be de-allocated by the application using the hash_map_destroy() function, as soon as the instance is not needed anymore!

hash_map_destroy()

De-allocates an existing hash map instance. All key-value pairs in the hash map are discarded.

void hash_map_destroy(
        hash_map_t *instance
);

Parameters

hash_map_insert()

Tries to insert the given key-value pair into the hash map.

Note: If the key is actually inserted, then the hash map may need to grow.

errno_t hash_map_insert(
        hash_map_t *const instance,
        const value_t key,
        const value_t value,
        const int update
);

Parameters

Return value

On success, this function returns zero. On error, the appropriate error code is returned. Possible error codes include:

hash_map_remove()

Tries to remove the given key from the hash map. The operation fails, if the map does not contain the given key.

The value associated with the removed key is discarded.

Note: If the key is actually removed, then the hash map may shrink.

errno_t hash_map_remove(
        hash_map_t *const instance,
        const value_t key,
        value_t *const value
);

Parameters

Return value

On success, this function returns zero. On error, the appropriate error code is returned. Possible error codes include:

hash_map_clear()

Discards all key-value pairs from the hash map at once.

errno_t hash_map_clear(
        hash_map_t *const instance
);

Parameters

Return value

On success, this function returns zero. On error, the appropriate error code is returned. Possible error codes include:

hash_map_contains()

Tests whether the hash map contains a key. The operation fails, if the map does not contain the given key.

errno_t hash_map_contains(
        const hash_map_t *const instance,
        const value_t key
);

Parameters

Return value

On success, this function returns zero. On error, the appropriate error code is returned. Possible error codes include:

hash_map_get()

Retrieves the value that is associated with the given key. The operation fails, if the map does not contain the key.

errno_t hash_map_get(
        const hash_map_t *const instance,
        const value_t key,
        value_t *const value
);

Parameters

Return value

On success, this function returns zero. On error, the appropriate error code is returned. Possible error codes include:

hash_map_iterate()

Iterates through the key-value pairs stored in the hash map. The entries are iterated in no particular order.

This function returns one key-value pair at a time. It should be called repeatedly, until the end of the map is encountered.

Warning: The result is undefined, if the map is modified while the iteration is in progress!

errno_t hash_map_iterate(
        const hash_map_t *const instance,
        size_t *const cursor,
        value_t *const key,
        value_t *const value
);

Parameters

Return value

On success, this function returns zero. On error, the appropriate error code is returned. Possible error codes include:

hash_map_size()

Returns the current number of (distinct) keys in the hash map.

size_t hash_map_size(
        const hash_map_t *const instance
);

Parameters

Return value

This function returns the number of (distinct) keys in the hash map.

hash_map_info()

Returns technical information about the hash map.

errno_t hash_map_info(
        const hash_map_t *const instance,
        size_t *const capacity,
        size_t *const valid,
        size_t *const deleted,
        size_t *const limit
);

Parameters

Return value

On success, this function returns zero. On error, the appropriate error code is returned. Possible error codes include:

hash_map_dump()

Dump the current status and content of all "slots" of the hash map.

errno_t hash_map_dump(
        const hash_map_t *const instance,
        const hash_map_callback_t callback
);

Parameters

typedef int (*hash_map_callback_t)(
        const size_t index,
        const char status,
        const value_t key,
        const value_t value
);

##### Parameters

Return value

On success, this function returns zero. On error, the appropriate error code is returned. Possible error codes include:

Thread Safety

LibHashSet is thread-safe, in the sense that all public functions operate exclusively on the given hash_set_t or hash_map_t instance; there is no implicit shared "global" state. This means that no synchronization is required in multi-threaded applications, provided that each instance is created and accessed only by a single thread.

However, LibHashSet does nothing to synchronize access to a particular hash_set_t or hash_map_t instance! Consequently, in situations where the same instance needs to be shared across multiple concurrent threads, the calling application is responsible for serializing all access to the "shared" instance, e.g. by using a mutex lock!

Source Code

The "official" Git repository is mirrored at:

Build Instructions

This section describes how to build LibHashSet from the source codes.

Windows

On Microsoft Windows, project/solution files are provided to build LibHashSet with Microsoft Visual Studio.

LibHashSet also can be built using GCC or Clang via MSYS2/Mingw-w64 or Cygwin. Be sure that GCC (or Clang) and GNU make are installed. Then run make from the project base directory, in the MSYS2 (or Cygwin) shell!

Linux, *BSD and MacOS X

On Linux, *BSD or MacOS X, building LibHashSet with GCC or Clang is supported. GNU make is required to build.

Simply run make (or gmake, if on *BSD) from the project base directory. That's it!

Influential environment variables

The following environment variables can be used to control the build process:

License

This work has been released under the CC0 1.0 Universal license.

For details, please refer to:
<https://creativecommons.org/publicdomain/zero/1.0/legalcode>

shortlog
2022-12-06 LoRd_MuldeRUpdated README file.master1.2.1
2022-12-06 LoRd_MuldeRAdded build helper script for MacOS X.
2022-12-05 LoRd_MuldeRAdded missing line breaks.
2022-12-05 LoRd_MuldeRExplicitly set LD_LIBRARY_PATH (or PATH, if running...
2022-12-04 LoRd_MuldeRFixed a circular dependency in Makefile.
2022-12-04 LoRd_MuldeRFixed build on MacOS X.
2022-12-04 LoRd_MuldeRReturn error EFBIG, if the set or map cannot grow any...
2022-12-04 LoRd_MuldeRhash_map_insert(): Added parameter 'update' to specify...
2022-12-04 LoRd_MuldeRSmall improvement to Linux build script.
2022-12-04 LoRd_MuldeRAdded helper script for Linux and MinGW/Cygwin build.1.2.0
2022-12-03 LoRd_MuldeRUpdated README file.
2022-12-03 LoRd_MuldeRSmall improvement to map entry handling.
2022-12-03 LoRd_MuldeRUpdated README file.
2022-12-03 LoRd_MuldeRMSVC: Exclude DllMain() function from the "static"...
2022-12-03 LoRd_MuldeRFixed a few warnings of MSVC at "EnableAllWarnings...
2022-12-03 LoRd_MuldeRFixed detection of Clang compiler on Windows platform.
...
tags
16 months ago 1.2.1 Version 1.2.1 [2022-12-06]
16 months ago 1.2.0 Version 1.2.0 [2022-12-04]
16 months ago 1.1.0 Version 1.1.0 [2022-11-29]
17 months ago 1.0.0 Version 1.0.0 [2022-11-25]
heads
16 months ago master