Lite³
A JSON-Compatible Zero-Copy Serialization Format
The previous guide was about reading messages. This guide will be about strings.
Strings are just pointers. This guide will show how to safely read strings that point into live message data using the LITE3_STR() macro.
This guide is based on an example inside the Lite³ repository found in examples/context_api/03-strings.c:
Output:
We will walk through the example code step-by-step, explaining the use of Lite³ library functions.
Lite³ messages are just bytes, stored contiguously inside a buffer. If you want to allocate these messages inside your own custom allocators, you can do so using the Buffer API. In this guide, however, we will be using the Context API so that memory is managed automatically.
Note that Lite³ is a binary format, but the examples print message data as JSON to stdout for better readability.
As explained in the first guide, we use contexts to store Lite³ buffers (see: Context API):
We then insert some basic fields:
At this point, the message looks like this:
Now we enter a curious sequence:
1. Get a reference to the email string and store it in a variable
2. Insert a "phone" field
3. Read the email string through the stored reference
Let's see the code first:
Output:
If, however, we uncomment the code snippet to overwrite the reference after the mutation has occurred:
Now the string is valid again, and the macro returns a direct pointer.
Note: Lite³ appends a NULL character to strings, although it is not included in the length count. This makes sure that Lite³ strings are compatible with functions expecting C strings.
Strings in C are typically declared as const char * or just char *. So why then do we bother with this lite3_str struct?
To understand why, imagine we were using just a simple char pointer. When we read a string field, we store the char pointer somewhere, and read it when we want. Right?
Wrong.
As the code sample above shows, inserting data mutates the buffer and changes its internal structure. Since this affects the underlying data, we can no longer guarantee that a previously obtained pointer is still valid. If we're lucky, the string is still located in the same place. If we're unlucky, we run into a dangling pointer scenario.
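To make the dangling pointer risk concrete, here is a minimal sketch of one way it can happen: a growable heap buffer whose allocation moves when it grows. The toy_buf type and toy_append() function are hypothetical and only illustrate the hazard; they are not how Lite³ lays out or grows its buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer (illustration only, not Lite³). */
typedef struct {
    char  *data;
    size_t len;
    size_t cap;
} toy_buf;

/* Append n bytes, growing the allocation when needed.
 * Returns the offset of the appended bytes, or (size_t)-1 on failure. */
static size_t toy_append(toy_buf *b, const void *src, size_t n)
{
    assert(b != NULL && src != NULL);
    if (b->len + n > b->cap) {
        size_t new_cap = (b->len + n) * 2;
        char *p = realloc(b->data, new_cap);  /* may move the whole block */
        if (p == NULL)
            return (size_t)-1;
        b->data = p;
        b->cap = new_cap;
    }
    memcpy(b->data + b->len, src, n);
    size_t ofs = b->len;
    b->len += n;
    return ofs;
}
```

A char pointer taken as b.data + ofs before a later toy_append() call may point into freed memory afterwards, because realloc() is allowed to relocate the block. Only a pointer derived again from the current b.data is safe to read.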
With most other serialization formats, the data, once serialized, is immutable. Therefore it is trivial to hand out char pointers. Lite³ however is mutable 'by design'. Data is always simultaneously serialized and mutable. If we hand out string references, we cannot possibly keep track of all their individual lifetimes, let alone their validity.
One solution would be to always copy string data. This way, the copy remains safe to read. However, this would require extra memory allocation and violate our 'zero-copy philosophy'. Being able to read data directly in-place is an incredibly powerful optimization. But how then can we provide this ability without compromising on safety or usability?
For this, Lite³ implements a safety mechanism called 'generational pointers', also known as 'generational references'.
Basically, every pointer stores an extra 'generation' field. Pointers always point to some data source, which also contains this field. When a pointer is obtained, its generation matches that of the source. As long as its generation matches that of the source, it is considered 'valid'. If however the data source is changed or modified in any way, we increment the source generation (new data, new generation). All outstanding references to the previous generation are now outdated and therefore 'invalid'. Attempting to read such a reference will return NULL.
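The idea can be sketched in a few lines of plain C. Everything below (toy_source, toy_ref, toy_get(), toy_set(), toy_read()) is a hypothetical toy model written for this guide; it is not the Lite³ implementation and does not reflect the real lite3_str layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy data source: the generation counter lives next to the data
 * and is bumped on every mutation. */
typedef struct {
    uint32_t gen;
    char     data[64];
} toy_source;

/* Toy generational reference: a pointer plus the generation
 * at which it was obtained. */
typedef struct {
    uint32_t    gen;
    const char *ptr;
} toy_ref;

/* Obtain a reference; it records the source's current generation. */
static toy_ref toy_get(const toy_source *src)
{
    assert(src != NULL);
    toy_ref ref = { src->gen, src->data };
    return ref;
}

/* Mutate the source: new data, new generation. */
static void toy_set(toy_source *src, const char *text)
{
    assert(src != NULL && text != NULL);
    strncpy(src->data, text, sizeof src->data - 1);
    src->data[sizeof src->data - 1] = '\0';
    src->gen++;
}

/* Resolve a reference: the pointer if it is still current, else NULL. */
static const char *toy_read(const toy_source *src, toy_ref ref)
{
    assert(src != NULL);
    return (ref.gen == src->gen) ? ref.ptr : NULL;
}
```

In this model, a reference obtained with toy_get() resolves to the string until the next toy_set() call. After that, toy_read() returns NULL until the reference is obtained again, which mirrors the sequence shown earlier with the email string.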
This becomes visible through the lite3_str struct members:
The generation count is not read manually, but using a macro:
ctx->buf is the buffer pointer (uint8_t *), pointing to the start of the serialized message stored inside the context.
email is an instance of lite3_str.
Every Lite³ message stores an internal generation count that is incremented on buffer mutation, typically by Object Set functions. The macro will compare the generation count of the buffer with that of the string reference. If they match, the string pointer is returned. Otherwise, it returns NULL.
This mechanism allows for safe references that will automatically invalidate when the underlying data is changed. It provides safety and peace of mind, knowing that pointers are safe to dereference. Application developers are highly encouraged to use this pattern throughout their codebases.
For those worried about the runtime performance impact of this macro, it is almost nonexistent. The check is branchless and essentially compiles down to a single cmov.
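As an illustration of what a branchless validity check can look like, here is a minimal sketch using a bit mask. The select_if_equal() function is hypothetical and is not the definition of LITE3_STR(); the generated machine code also depends on the compiler and target.

```c
#include <stddef.h>
#include <stdint.h>

/* Return ptr when the two generation counts match, NULL otherwise,
 * without an if/else in the source.
 * Assumption: the null pointer is represented as all-bits-zero,
 * which holds on mainstream platforms. */
static const char *select_if_equal(uint32_t ref_gen, uint32_t src_gen,
                                   const char *ptr)
{
    /* (ref_gen == src_gen) is 1 or 0; negating it in unsigned
     * arithmetic gives a mask of all ones or all zeros. */
    uintptr_t mask = (uintptr_t)0 - (uintptr_t)(ref_gen == src_gen);
    return (const char *)((uintptr_t)ptr & mask);
}
```

A plain ternary expression such as (a == b) ? ptr : NULL is usually compiled to the same kind of conditional move by optimizing compilers, so the explicit mask is mainly useful for showing the idea.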
All examples until now have used lite3_ctx_set_str(). This function inserts a string into a Lite³ message. Behind the scenes, a call to strlen() is made to reserve enough space. This means the entire string must be scanned to find out its length. But what if you already know the length beforehand?
Then you can use lite3_ctx_set_str_n():
It just takes one extra parameter: str_len. This is the length of the string, excluding the NULL terminator.
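The relationship between the two variants can be sketched with a toy model. The toy_slot type and the toy_set_str() / toy_set_str_n() functions below are hypothetical stand-ins, not the Lite³ functions or their signatures; they only show how a length-aware setter avoids the strlen() scan while still storing a terminator that is not counted in the length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical storage slot (not the Lite³ wire format). */
typedef struct {
    size_t len;        /* length in bytes, terminator NOT counted */
    char   bytes[32];  /* string data followed by a terminating '\0' */
} toy_slot;

/* Length-aware variant: copies exactly str_len bytes, then terminates.
 * Returns 0 on success, -1 if the string does not fit. */
static int toy_set_str_n(toy_slot *slot, const char *str, size_t str_len)
{
    assert(slot != NULL && str != NULL);
    if (str_len >= sizeof slot->bytes)
        return -1;
    memcpy(slot->bytes, str, str_len);
    slot->bytes[str_len] = '\0';
    slot->len = str_len;
    return 0;
}

/* Convenience variant: pays for a strlen() scan, then delegates. */
static int toy_set_str(toy_slot *slot, const char *str)
{
    assert(str != NULL);
    return toy_set_str_n(slot, str, strlen(str));
}
```

The length-aware variant is also handy when only a prefix of a larger buffer should be stored, since the source does not need to be terminated at that position.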
Finally after all insertions, we can see what the data looks like:
Output:
We are good citizens, so we clean up after ourselves:
This destroys the context, freeing all the internal buffers so that the memory is released. When you create contexts, don't forget to destroy them, or else you will be leaking memory.
This was the third guide showing how to use strings.
In the next guide we will start seeing nested objects: Next Guide: Nesting