|
Lite³
A JSON-Compatible Zero-Copy Serialization Format
|
The previous guide was about nesting. This guide will be cover arrays.
Basics first: arrays are ordered, while objects are unordered. Arrays also don't have keys, but indexes.
The performance characteristics of arrays in Lite³ are somewhat unusual. The following operations:
In Lite³ all have logarithmic time complexity of O(log N). The underlying datastructure is a B-tree where indexes are used as keys.
Typically, arrays are implemented as a contiguous set of elements. For fixed-sized elements, this provides constant time access O(1) for individual elements. However, JSON-like data often contains variable sized elements like strings. Also, appending is also not always possible due to neighbouring data and may require relocating elements. As a result, the above operations could all degrade to O(N) for 'naive arrays'.
Since Lite³ is mutable 'by design', O(N) time for single-element operations is not permittable. The O(log N) B-tree guarantee enables efficient mutation of serialized data, such that latency remains predictable even in a realtime setting. The implementation for objects and arrays is also shared inside the source code, both using the same B-tree structure. Despite this implementation, the user API is no different to what programmers are used to for arrays.
This guide is based on an example inside the Lite³ repository found in examples/context_api/05-arrays.c:
Output:
We will walk through the example code step-by-step, explaining the use of Lite³ library functions.
Lite³ messages are just bytes, stored contiguously inside a buffer. If you want to allocate these messages inside your own custom allocators, you can using the Buffer API. However in this guide, we will be using the Context API so that memory is managed automatically.
Note that Lite³ is a binary format, but the examples print message data as JSON to stdout for better readability.
As explained in the first guide, we use contexts to store Lite³ buffers (see: Context API):
Normally, we initilize the buffer as an object. This time for a change, it will be an array:
Converting Lite³ to JSON shows us what this data looks like:
Output:
Array functions, for the most part, have 'arr' inside the name. Here is how to get an element by index:
Output:
For the function lite3_ctx_arr_get_str(), the index is the 3rd parameter. The 2nd parameter is the ofs or 'offset' parameter (see Nesting).
Output:
One interesting fact about lite3_ctx_count() is that it does not have 'arr' in the name. This is because it works on both objects and arrays. For objects, the number of entries (key-value pairs) is returned. For arrays, the number of elements.
(uint32_t), while buffer offsets and lengths are (size_t).Any real programmer should know that the first index is 0 and the last is (element_count - 1).
Luckily, if you are reading this then you probably are a real programmer:
Output:
Now we get to the juicy part. Recall that Lite³ data is simulateously serialized and mutable. A read or write to any individual value can be satisfied without requiring parsing or processing of a message in full. This includes appending to- and setting elements inside arrays.
Let's demonstrate this by replacing the element at index 2 with the string "gnu":
Here is the code:
Output:
Notice that the string got replaced. But also, that the buffer length (message size) stayed entirely unchanged at 168 bytes, which it also was before. This is the power of Lite³. No parsing. No processing. Change any value just like that. Serialized data is mutable.
Unfortunately, there is no magic. In this case, the new string value was smaller than the previous ("gnu" vs "buffalo"). If we override with a string that larger than the existing one, where do we get the extra space from? Inevitably, message size will have to increase.
Let's try the same thing, but instead we replace "lion" with "springbok":
Output:
The operation succeeded, though at the cost of increasing message size.
Internally, situations like this cause the new data to be appended to the buffer. Unfortunately, this leaves an empty gap at the previous storage location of "lion". More unfortunate, this wasted space will never be recovered. Recovering it would require a sophisticated defragmentation or memory allocation system. This means that if we keep replacing variable sized strings, the buffer will eventually fragment and grow indefinitely.
To rid the buffer of these fragments, it must be rebuilt from scratch. Applications can choose to increase message size until the cost of carrying unused bytes outweighs the cost of rebuilding. If we had an array of integers or doubles however, we can keep overriding them forever without ever increasing buffer size. This works because those types are fixed-size.
All the above is equally valid for overwriting of values inside objects.
One function commonly provided by dynamic programming languages is 'insert' in the middle of an array. That is, to set an alement at a target index and 'shift' all elements after it to make room.
This operation is not (yet) supported by Lite³. A workaround is to set all the elements individually, or to completely replace the array.
We are good citizens, so we clean up after ourselves:
This destroys the context, freeing all the internal buffers so that the memory is released. When you create contexts, don't forget to destroy them, or else you will be leaking memory.
This was the fifth guide covering arrays.
The next guide will show iterators: Next Guide: Iterators