Generic data structures in C

Static typing is a great tool. It reduces a lot of bugs. However, having too many types hinders the uniformity of your code.

Modern languages have tooling that does generic types automatically, but C does not. I don’t always agree that creating and using generics is a good idea, but I use generics in C.

How C programmers normally do it

Usually, C programmers use void* and wrap functions around the operations. Doing this creates friction, and it has no type safety.

Note that in cases where you don’t need type safety or easy top-level operations, void* may be an excellent tool.

Example of a void* based linked list:

typedef struct generic_linked_list generic_linked_list;
struct generic_linked_list {
    generic_linked_list* next;
    void* data;
};

generic_linked_list* create_node(void* data, size_t data_size_bytes) {
    generic_linked_list* res = malloc(sizeof(*res));
    res->next = 0;
    res->data = malloc(data_size_bytes);
    memcpy(res->data, data, data_size_bytes);
}

void llist_stack_push(generic_linked_list** first_node, void *data, size_t data_size_bytes)
{
    generic_linked_list* new_node = create_node(data, data_size_bytes);
    new_node->next = *first_node;
    *first_node = new_node;
}

The simple way

Instead of plastering void* everywhere, create macros that assume member names.

#define llist_queue_push(f,l,n) ((f)==0) ? (f)=(l)=(n) : ((l)->next=(n), (l)=(n), (n)->next=0)
#define llist_queue_pop(f,l) ((f)==(l)) ? ((f)=(l)=0) : ((f)=(f)->next)

#define llist_stack_push(f,n) ((n)->next=(f), (f)=(n))
#define llist_stack_pop(f) ((f)==0) ? 0 : ((f)=(f->next))

Having generic operations encourages defining the structures yourself.

typedef struct llist_int llist_int;
struct llist_int {
    llist_int* next;
    int val;
}

Then you use it like below.

llist_int* list_first;

llist_int* n = malloc((sizeof(*n)));
n->val = 10;
llist_stack_push(list_first, n);

It’s a lot simpler. It’s type safe and gives the user control of memory management.

Meanwhile, with the void* method, merely reading the list requires typing *(int*)list_first->data.

Retaining control over data types also makes it straightforward to compose large structures.

typedef struct tree_node tree_node;
struct tree_node {
    tree_node* parent;

    tree_node* child_first;
    tree_node* child_last;

    tree_node* next;
    tree_node* prev;
    
    int val;
}

We can use this structure as is. We don’t need the boilerplate that comes with a different solution.

The “hide away” method

You can also conceal the header information and provide the type in a friendlier way.

typedef struct {
    int sz;
} _dyn_arr_header;
#define dyn_arr(T) T*

#define dyn_arr_get_sz(arr) ((arr) ? ((_dyn_arr_header*)((u8*)(arr) - sizeof(_dyn_arr_header)))->sz : 0)

#define dyn_arr_push(arr, new_val) (                                      \
        arr = realloc(arr, sizeof(_dyn_arr_header) + (dyn_arr_get_sz(arr) + 1) * sizeof(*arr)), \
        arr[dyn_arr_get_sz(arr)-1] = new_val                              \
    )
dyn_arr(float) my_arr = NULL;

dyn_arr_push(my_arr, 10.0f);
dyn_arr_push(my_arr, 20.0f);

assert(my_arr[1] == 20.0f);;

I don’t use dynamic arrays anymore, and that’s mostly where this technique is helpful. This approach creates more obfuscated and harder-to-read code than exposing types.

Going a bit crazy

Although using generic operations rather than types is the takeaway from this post, I figured I should demonstrate an extreme example. Let’s make a hash table.

To ease use, we can provide macros that generate the correct structures.

#define htable_def_ex(_htable_entry_T)          \
    struct {                                    \
        u64 table_sz;                           \
        _htable_entry_T** table;                \
    }
    
// If you don't care about having anonymous types, you can use this:
#define htable_def(_T) htable_def_ex(struct {u64 hash; void* next; _T val;})

We wrap our operations in generic macros like before. This time, using internal functions and processing the input and output of them.

typedef struct {u64 hash; void* next;} null_htable_entry_t;
typedef htable_def_ex(null_htable_entry_t) _null_htable_t;

#define htable_find(_HTABLE, _KEY)                                   \
    ((_HTABLE) ? (                                                   \
        (typeof((_HTABLE)->table[0]))_htable_find((_HTABLE), (_KEY)) \
    ) : NULL)

void* _htable_find(void* htable, uint64_t key)
{
    _null_htable_t* ht = htable;
    uint64_t index = key % ht->table_sz;

    for (null_htable_entry_header_t* it = ht->table[index]; it; it = it->next) {
        if (it->hash == key) return (u8*)it + sizeof(null_htable_entry_t);
    }
    return NULL;
}

#define htable_find_or_create(_HTABLE, _KEY)\
    ((typeof((_HTABLE)->table[0]))_htable_find_or_create(_HTABLE, sizeof(*(_HTABLE)->tmp_list), _KEY))

void* _htable_find_or_create(void* htable, size_t entry_sz, uint64_t key)
{
    null_htable_t* ht = htable;
    null_htable_entry_t* result = _htable_find(htable, key);

    if (!result) {
        uint64_t index = key % ht->table_sz;

        result = malloc(sizeof(null_htable_entry_t) + entry_sz);

        result->next = ht->table[index];
        result->hash = key;
        ht->table[index] = result;
    }
    return result;
}

Usage code:

htable_def(int) int_htable;
int_htable id_table = {
    .table_sz = 1001,
    .table = malloc(1001 * sizeof(*id_table.table)),
};

{
    int* res = htable_find_or_create(&id_table, key);
    *res = 10;
}

{
    int* res = htable_find(&id_table, key);
    assert(res);
    assert(res == 10);
}

If you wish to avoid using typeof, you can make another struct member to use as the “return” value. Like this:

#define htable_def_ex(_htable_entry_T)          \
    struct {                                    \
        u64 table_sz;                           \
        _htable_entry_T** table;                \
        _htable_entry_T* last_find;             \
    }

#define htable_find(_HTABLE, _KEY)                             \
    ((_HTABLE) ? (                                             \
        (_HTABLE)->last_find = _htable_find((_HTABLE), (_KEY)) \
    ) : NULL)

What you should remember

The best generic data structure is no generic data structure. Nowadays, I almost exclusively use linked lists. They are flexible, pointer stable, and thereby play nicely with arenas. The hash table we made also plays well with Arenas.

I rarely long for a dynamic array, a hash table, or other more complicated data structures. In the cases where you actually would benefit from having a reusable generic data structure, don’t do it the void* way. Try doing the types raw and focus on making the operations “generic”.

You will be surprised at how much it reduces friction and boilerplate.


For the non-C people reading, you might have become thankful that your tooling does generics automatically. I think of it differently. One of the beauties of C is that complicated things are hard to do. Experiencing that hardship directs you toward simplicity.

· writing, programming, C