Generic data structures in C
Static typing is a great tool. It reduces a lot of bugs. However, having too many types hinders the uniformity of your code.
Modern languages have tooling that does generic types automatically, but C does not. I don’t always agree that creating and using generics is a good idea, but I use generics in C.
How C programmers normally do it
Usually, C programmers use void*
and wrap functions around the operations. Doing this creates friction, and it has no type safety.
Note that in cases where you don’t need type safety or easy top-level operations,
void*
may be an excellent tool.
Example of a void*
based linked list:
typedef struct generic_linked_list generic_linked_list;
struct generic_linked_list {
generic_linked_list* next;
void* data;
};
generic_linked_list* create_node(void* data, size_t data_size_bytes) {
generic_linked_list* res = malloc(sizeof(*res));
res->next = 0;
res->data = malloc(data_size_bytes);
memcpy(res->data, data, data_size_bytes);
}
void llist_stack_push(generic_linked_list** first_node, void *data, size_t data_size_bytes)
{
generic_linked_list* new_node = create_node(data, data_size_bytes);
new_node->next = *first_node;
*first_node = new_node;
}
The simple way
Instead of plastering void*
everywhere, create macros that assume member names.
#define llist_queue_push(f,l,n) ((f)==0) ? (f)=(l)=(n) : ((l)->next=(n), (l)=(n), (n)->next=0)
#define llist_queue_pop(f,l) ((f)==(l)) ? ((f)=(l)=0) : ((f)=(f)->next)
#define llist_stack_push(f,n) ((n)->next=(f), (f)=(n))
#define llist_stack_pop(f) ((f)==0) ? 0 : ((f)=(f->next))
Having generic operations encourages defining the structures yourself.
typedef struct llist_int llist_int;
struct llist_int {
llist_int* next;
int val;
}
Then you use it like below.
llist_int* list_first;
llist_int* n = malloc((sizeof(*n)));
n->val = 10;
llist_stack_push(list_first, n);
It’s a lot simpler. It’s type safe and gives the user control of memory management.
Meanwhile, with the void*
method, merely reading the list requires typing *(int*)list_first->data
.
Retaining control over data types also makes it straightforward to compose large structures.
typedef struct tree_node tree_node;
struct tree_node {
tree_node* parent;
tree_node* child_first;
tree_node* child_last;
tree_node* next;
tree_node* prev;
int val;
}
We can use this structure as is. We don’t need the boilerplate that comes with a different solution.
The “hide away” method
You can also conceal the header information and provide the type in a friendlier way.
typedef struct {
int sz;
} _dyn_arr_header;
#define dyn_arr(T) T*
#define dyn_arr_get_sz(arr) ((arr) ? ((_dyn_arr_header*)((u8*)(arr) - sizeof(_dyn_arr_header)))->sz : 0)
#define dyn_arr_push(arr, new_val) ( \
arr = realloc(arr, sizeof(_dyn_arr_header) + (dyn_arr_get_sz(arr) + 1) * sizeof(*arr)), \
arr[dyn_arr_get_sz(arr)-1] = new_val \
)
dyn_arr(float) my_arr = NULL;
dyn_arr_push(my_arr, 10.0f);
dyn_arr_push(my_arr, 20.0f);
assert(my_arr[1] == 20.0f);;
I don’t use dynamic arrays anymore, and that’s mostly where this technique is helpful. This approach creates more obfuscated and harder-to-read code than exposing types.
Going a bit crazy
Although using generic operations rather than types is the takeaway from this post, I figured I should demonstrate an extreme example. Let’s make a hash table.
To ease use, we can provide macros that generate the correct structures.
#define htable_def_ex(_htable_entry_T) \
struct { \
u64 table_sz; \
_htable_entry_T** table; \
}
// If you don't care about having anonymous types, you can use this:
#define htable_def(_T) htable_def_ex(struct {u64 hash; void* next; _T val;})
We wrap our operations in generic macros like before. This time, using internal functions and processing the input and output of them.
typedef struct {u64 hash; void* next;} null_htable_entry_t;
typedef htable_def_ex(null_htable_entry_t) _null_htable_t;
#define htable_find(_HTABLE, _KEY) \
((_HTABLE) ? ( \
(typeof((_HTABLE)->table[0]))_htable_find((_HTABLE), (_KEY)) \
) : NULL)
void* _htable_find(void* htable, uint64_t key)
{
_null_htable_t* ht = htable;
uint64_t index = key % ht->table_sz;
for (null_htable_entry_header_t* it = ht->table[index]; it; it = it->next) {
if (it->hash == key) return (u8*)it + sizeof(null_htable_entry_t);
}
return NULL;
}
#define htable_find_or_create(_HTABLE, _KEY)\
((typeof((_HTABLE)->table[0]))_htable_find_or_create(_HTABLE, sizeof(*(_HTABLE)->tmp_list), _KEY))
void* _htable_find_or_create(void* htable, size_t entry_sz, uint64_t key)
{
null_htable_t* ht = htable;
null_htable_entry_t* result = _htable_find(htable, key);
if (!result) {
uint64_t index = key % ht->table_sz;
result = malloc(sizeof(null_htable_entry_t) + entry_sz);
result->next = ht->table[index];
result->hash = key;
ht->table[index] = result;
}
return result;
}
Usage code:
htable_def(int) int_htable;
int_htable id_table = {
.table_sz = 1001,
.table = malloc(1001 * sizeof(*id_table.table)),
};
{
int* res = htable_find_or_create(&id_table, key);
*res = 10;
}
{
int* res = htable_find(&id_table, key);
assert(res);
assert(res == 10);
}
If you wish to avoid using typeof
, you can make another struct member to use as the “return” value. Like this:
#define htable_def_ex(_htable_entry_T) \
struct { \
u64 table_sz; \
_htable_entry_T** table; \
_htable_entry_T* last_find; \
}
#define htable_find(_HTABLE, _KEY) \
((_HTABLE) ? ( \
(_HTABLE)->last_find = _htable_find((_HTABLE), (_KEY)) \
) : NULL)
What you should remember
The best generic data structure is no generic data structure. Nowadays, I almost exclusively use linked lists. They are flexible, pointer stable, and thereby play nicely with arenas. The hash table we made also plays well with Arenas.
I rarely long for a dynamic array, a hash table, or other more complicated data structures. In the cases where you actually would benefit from having a reusable generic data structure, don’t do it the void*
way. Try doing the types raw and focus on making the operations “generic”.
You will be surprised at how much it reduces friction and boilerplate.
For the non-C people reading, you might have become thankful that your tooling does generics automatically. I think of it differently. One of the beauties of C is that complicated things are hard to do. Experiencing that hardship directs you toward simplicity.