Lite³
A JSON-Compatible Zero-Copy Serialization Format
Parse no more—the wire format is the memory format.
Lite³ is a JSON-compatible zero-copy serialization format able to encode semi-structured data in a lightweight binary form, suitable for embedded and no-malloc environments. The flagship feature is the ability to apply mutations directly on the serialized form. With Lite³, you can insert any arbitrary key, with any arbitrary value, directly into a serialized message. Essentially, it functions as a serialized dictionary.
Some other formats provide this, but only limited to in-place updates of fixed-size values. As of writing (Nov 15th, 2025), this is a capability not provided by any other format. Additionally, Lite³ implements this without additional memory allocations, always guaranteeing O(log n) amortized time complexity with predictable latency for IOPS.
As a result, the serialization boundary has been broken: 'parsing' or 'serializing' in the traditional sense is no longer necessary. Lite³ structures can be read and mutated much like hashmaps or binary trees, and since they exist in a single contiguous buffer, they always remain ready to send. Other state-of-the-art binary and zero-copy formats still require full message reserialization for any non-trivial mutation.
Compared to other binary formats, Lite³ is also schemaless, self-describing (no IDL or schema definitions required) and fully compatible with JSON, enabling seamless conversion between the two formats. This ensures compatibility with many existing datasets and APIs while also allowing for easy debugging/inspection of messages.
Example to illustrate:
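A minimal sketch of the idea (the function names lite3_init, lite3_set_str and lite3_get_str are illustrative assumptions, not necessarily the verbatim Lite³ API):

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical prototypes, standing in for the real Lite³ Buffer API. */
size_t      lite3_init(unsigned char *buf, size_t cap);
int         lite3_set_str(unsigned char *buf, size_t *len,
                          const char *key, const char *val);
const char *lite3_get_str(const unsigned char *buf, const char *key);

int main(void)
{
    unsigned char buf[256];                     /* caller-provided, no malloc() */
    size_t len = lite3_init(buf, sizeof buf);   /* empty serialized message    */

    /* Insert an arbitrary key directly into the serialized form. */
    lite3_set_str(buf, &len, "user", "alice");

    /* Zero-copy lookup: read the value straight out of the buffer. */
    printf("%s\n", lite3_get_str(buf, "user"));

    /* 'buf' is already wire-ready: send(sock, buf, len, 0); */
    return 0;
}
```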
Lite³ blurs the line between memory and wire formats. It eliminates several steps typically required in computer communications, unlocking new potential for realtime, embedded and high-performance applications.
| Format name | Schemaless | Zero-copy reads[^1] | Zero-copy writes[^2] | Human-readable[^3] |
|---|---|---|---|---|
| Lite³ | ✅ | ✅ O(log n) | ✅ O(log n) | ⚠️ (convertible to JSON) |
| JSON | ✅ | ❌ | ❌ | ✅ |
| BSON | ✅ | ❌ | ❌ | ⚠️ (convertible to JSON) |
| MessagePack | ✅ | ❌ | ❌ | ⚠️ (convertible to JSON) |
| CBOR | ✅ | ❌ | ❌ | ⚠️ (convertible to JSON) |
| Smile | ✅ | ❌ | ❌ | ⚠️ (convertible to JSON) |
| Ion (Amazon) | ✅ | ❌ | ❌ | ⚠️ (convertible to JSON) |
| Protobuf (Google) | ❌ | ❌ | ❌ | ❌[^4] |
| Apache Arrow (based on Flatbuffers) | ❌ | ✅ O(1) | ❌ (immutable) | ❌ |
| Flatbuffers (Google) | ❌ | ✅ O(1) | ❌ (immutable) | ❌ |
| Flexbuffers (Google) | ✅ | ✅[^5] | ❌ (immutable) | ⚠️ (convertible to JSON) |
| Cap'n Proto (Cloudflare) | ❌ | ✅ O(1) | ⚠️ (in-place only) | ❌ |
| Thrift (Facebook) | ❌ | ❌ | ❌ | ❌ |
| Avro (Apache) | ❌ | ❌ | ❌ | ❌ |
| Bond (Microsoft, discontinued) | ❌ | ⚠️ (limited) | ❌ | ❌ |
| DER (ASN.1) | ❌ | ⚠️ (limited) | ❌ | ❌ |
| SBE | ❌ | ✅ O(1) | ⚠️ (in-place only) | ❌ |
[^1]: Zero-copy reads: The ability to perform arbitrary lookups inside the structure without deserializing or parsing it first.
[^2]: Zero-copy writes: The ability to perform arbitrary mutations inside the structure without deserializing or parsing it first.
[^3]: To be considered human-readable, all necessary information must be provided in-band (no outside schema).
[^4]: Protobuf can optionally send messages in 'ProtoJSON' format for debugging, but in production systems they are still sent as binary and are not inspectable without a schema. Other binary formats support similar features, however we do not consider these formats 'human-readable' since they rely on out-of-band information.
[^5]: Flexbuffer access to scalars and vectors (ints, floats, etc.) is O(1). For maps, access is O(log n).
Remember that we judge the behavior of formats by their implementations rather than by their official specs, because we cannot judge the behavior of hypothetical, non-existent implementations.
This benchmark by the authors of the official simdjson repository was created to compare JSON parsing performance across different C/C++ libraries.
An input dataset twitter.json is used, consisting of ~632 kB of real Twitter API data, on which a number of tasks are performed, each with its own category:

- top_tweet: find the most-retweeted tweet
- partial_tweets: extract a subset of fields from each tweet into a std::vector
- find_tweet: locate the tweet with a given ID
- distinct_user_id: collect all unique user IDs into a std::vector<uint64_t>

While these tasks are intended to compare JSON parsing performance, they represent real patterns inside applications in which data might be queried.
Text formats do not contain enough information for a parser to know the structure of the document immediately. This structure must be 'discovered' by finding brackets, commas, colons etc. Through this process, the parser acquires the information necessary for traversal. An unfortunate result is that typically the entire dataset must be fed through the CPU, even if a query is only interested in a subset or a single field.
A zero-copy format approaches each problem in a different way. It already contains all the information necessary to find internal fields. Only the index structure must be read, along with the fields of interest. The rest of the dataset is irrelevant to the CPU and might never even enter cache. Therefore, to answer a query like 'find tweet by ID', the actual bytes read may be counted in the hundreds or low thousands, out of ~632 kB.
Converting the dataset to Lite³ (a zero-copy format) to answer the exact same queries presents an opportunity to quantify this advantage and reveal something about the cost of text formats.
| Format | top_tweet | partial_tweets | find_tweet | distinct_user_id |
|---|---|---|---|---|
| yyjson | 205426 ns | - | 203147 ns | 207233 ns |
| simdjson On-Demand | 91184 ns | 91090 ns | 53937 ns | 85036 ns |
| simdjson DOM | 147264 ns | 153397 ns | 143567 ns | 150541 ns |
| RapidJSON | 1081987 ns | 1091551 ns | 1075215 ns | 1085541 ns |
| Lite³ Context API | 2285 ns | 17820 ns | 456 ns | 11869 ns |
| Lite³ Buffer API | 2221 ns | 17659 ns | 448 ns | 11699 ns |
To be clear: the other formats are parsing JSON.
Lite³ operates on the same dataset, but converted to binary Lite³ format in order to show the potential.
This benchmark is open source and can be replicated here.
A somewhat popular benchmark comparing the performance of different programming languages. In the JSON category, a ~115 MB JSON document is generated, consisting of many floating point numbers representing coordinates. The program is timed on how long it takes to sum all the numbers.
The aim of this test is similar: quantifying the advantage of a zero-copy format. This time, reading the entire dataset is unavoidable to produce a correct result. So instead, the emphasis is on text-to-binary conversion. Because Lite³ stores numbers natively in 64 bits, there is no need to parse and convert ASCII decimals. This conversion is notoriously tricky for floating point numbers in particular.
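To make the difference concrete, compare parsing a decimal from text with loading a native 64-bit value (plain C, independent of any library):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* Text format: every number arrives as ASCII and must be parsed. */
    double parsed = strtod("12.34567", NULL);

    /* Binary format: the 8-byte IEEE-754 value is stored natively and
     * can simply be loaded, with no digit-by-digit conversion. */
    unsigned char wire[8];
    memcpy(wire, &parsed, sizeof parsed);   /* pretend this arrived on the wire */

    double loaded;
    memcpy(&loaded, wire, sizeof loaded);   /* a plain load, no parsing */

    printf("%f %f\n", parsed, loaded);
    return 0;
}
```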
| Language / Library | Execution Time | Memory Usage |
|---|---|---|
| C++/g++ (DAW JSON Link) | 0.094 s | 113 MB |
| C++/g++ (RapidJSON) | 0.1866 s | 238 MB |
| C++/g++ (gason) | 0.1462 s | 209 MB |
| C++/g++ (simdjson DOM) | 0.1515 s | 285 MB |
| C++/g++ (simdjson On-Demand) | 0.0759 s | 173 MB |
| C/gcc (lite3) | 0.027 s | 203 MB |
| C/gcc (lite3_context_api) | 0.027 s | 203 MB |
| Go (Sonic) | 0.2246 s | 121 MB |
| Rust (Serde Custom) | 0.113 s | 111 MB |
| Zig | 0.2493 s | 147 MB |
To be clear: the other formats are parsing JSON.
Lite³ operates on the same dataset, but converted to binary Lite³ format in order to show the potential.
This benchmark is open source and can be replicated here.
It is to be expected that binary formats perform well compared to text formats. The comparison, however, is not entirely unwarranted. Pure binary formats occupy another category, typically requiring schema files and extra tooling. They are chosen by those who value performance over other considerations; in doing so, trade-offs are made in usability and flexibility.
Lite³, though also a binary format, instead opts for a schemaless design. This produces a more balanced set of trade-offs, with the notable feature of JSON compatibility.
Performance, of course, remains a strong selling point. This next benchmark originates from the Cista++ serialization library and compares several binary formats, including zero-copy formats. The measurements cover the time required to serialize, deserialize and traverse a graph consisting of nodes and edges. The Cista++ authors created three variants of their format, notably the 'offset' and 'offset slim' variants, where edges use indices to reference nodes instead of pointers.
| Name | Serialize + Deserialize | Deserialize | Serialize | Traverse | Deserialize and traverse | Message size |
|---|---|---|---|---|---|---|
| Cap’n Proto | 66.55 ms | 0 ms | 66.55 ms | 210.1 ms | 211 ms | 50.5093 MB |
| cereal | 229.16 ms | 98.76 ms | 130.4 ms | 79.17 ms | 180.7 ms | 37.829 MB |
| Cista++ (offset) | 913.2 ms | 274.1 ms | 639.1 ms | 79.59 ms | 80.02 ms | 176.378 MB |
| Cista++ (offset slim) | 3.96 ms | 0.17 ms | 3.79 ms | 79.99 ms | 80.46 ms | 25.317 MB |
| Cista++ (raw) | 947.4 ms | 289.2 ms | 658.2 ms | 81.53 ms | 113.3 ms | 176.378 MB |
| Flatbuffers | 1887.49 ms | 41.69 ms | 1845.8 ms | 90.53 ms | 90.35 ms | 62.998 MB |
| Lite³ Buffer API | 7.79 ms | 4.77 ms | 3.02 ms | 79.39 ms | 84.92 ms | 38.069 MB |
| Lite³ Context API | 7.8 ms | 4.76 ms | 3.04 ms | 79.59 ms | 84.13 ms | 38.069 MB |
| zpp::bits | 4.66 ms | 1.9 ms | 2.76 ms | 78.66 ms | 81.21 ms | 37.8066 MB |
This benchmark is open source and can be replicated here.
Lite³ is a binary format, but the examples print message data as JSON to stdout for better readability.
Here is an example with error handling omitted for brevity, taken from examples/buffer_api/01-building-messages.c:
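(The original listing is not reproduced here; the following is a sketch of the general shape of such a program, using assumed function names rather than the exact Lite³ API — consult include/lite3.h for the real interface.)

```c
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed names for illustration only. */
size_t lite3_init(unsigned char *buf, size_t cap);
int    lite3_set_str(unsigned char *buf, size_t *len,
                     const char *key, const char *val);
int    lite3_set_i64(unsigned char *buf, size_t *len,
                     const char *key, int64_t val);
int    lite3_print_json(const unsigned char *buf);  /* dump message as JSON */

int main(void)
{
    unsigned char buf[1024];                  /* caller supplies the buffer */
    size_t len = lite3_init(buf, sizeof buf);

    lite3_set_str(buf, &len, "name", "alice");
    lite3_set_i64(buf, &len, "age", 30);

    lite3_print_json(buf);                    /* print the message as JSON */
    return 0;
}
```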
Output:
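```json
{"name": "alice", "age": 30}
```

(Illustrative output for the sketch above; the real example's output will differ.)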
Lite³ provides an alternative API called the 'Context API' where memory management is abstracted away from the user.
This example is taken from examples/context_api/04-nesting.c. Again, with error handling omitted for brevity:
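(Again a sketch with assumed names, here lite3_ctx_*, standing in for the real Context API; consult include/lite3_context_api.h for the actual interface.)

```c
#include <stdio.h>

/* Assumed names for illustration only. */
typedef struct lite3_ctx lite3_ctx;
lite3_ctx *lite3_ctx_new(void);                    /* allocations hidden   */
int        lite3_ctx_set_str(lite3_ctx *ctx, const char *key, const char *val);
lite3_ctx *lite3_ctx_set_obj(lite3_ctx *ctx, const char *key); /* nesting */
int        lite3_ctx_print_json(lite3_ctx *ctx);
void       lite3_ctx_free(lite3_ctx *ctx);

int main(void)
{
    lite3_ctx *msg = lite3_ctx_new();          /* memory managed internally */

    lite3_ctx_set_str(msg, "user", "alice");

    /* Create a nested object and insert a key into it. */
    lite3_ctx *addr = lite3_ctx_set_obj(msg, "address");
    lite3_ctx_set_str(addr, "city", "Amsterdam");

    lite3_ctx_print_json(msg);
    lite3_ctx_free(msg);
    return 0;
}
```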
Output:
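```json
{"user": "alice", "address": {"city": "Amsterdam"}}
```

(Illustrative output for the sketch above; the real example's output will differ.)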
For a complete How-to Guide with examples, see the documentation.
| Command | Description |
|---|---|
| make all | Build the static library with -O2 optimizations (default) |
| make tests | Build and run all tests |
| make examples | Build all examples |
| make install | Install library in /usr/local (for pkg-config) |
| make uninstall | Uninstall library |
| make clean | Remove all build artifacts |
| make help | Show this help message |
A gcc or clang compiler is required due to the use of various builtins.
First clone the repository:
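```sh
# Repository URL and directory name are placeholders; use the project's actual URL.
git clone <repository-url>
cd lite3
```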
Then choose between installation via pkg-config or manual linking.
Inside the project root, run:
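```sh
# Builds the library and installs it to /usr/local (sudo may be required).
sudo make install
```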
This will build the static library, then install it to /usr/local and refresh the pkg-config cache. If installation was successful, you should be able to check the library version like so:
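```sh
# Package name 'lite3' is assumed; adjust if the installed .pc file differs.
pkg-config --modversion lite3
```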
You can now compile using these flags:
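```sh
pkg-config --cflags --libs lite3
```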
For example, to compile a single file main.c:
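```sh
gcc main.c $(pkg-config --cflags --libs lite3) -o main
```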
First build the library inside project root:
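```sh
make all
```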
Then in your main program:
- build/liblite3.a
- include/lite3.h + include/lite3_context_api.h

For example, to compile a single file main.c:
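```sh
# Paths are relative to the project root; adjust to where you cloned the library.
gcc main.c -Iinclude build/liblite3.a -o main
```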
The Buffer API provides the most control, utilizing caller-supplied buffers to support environments with custom allocation patterns, avoiding the use of malloc().
The Context API is a wrapper around the Buffer API where memory allocations are hidden from the user, presenting a more accessible interface. If you are using Lite³ for the first time, it is recommended to start with the Context API.
There is no need to include both headers, only the API you intend to use.
By default, library error messages are disabled. However, it is recommended to enable them to receive feedback during development. To do this, either:

- Uncomment // #define LITE3_ERROR_MESSAGES inside the header file include/lite3.h, or
- Pass -DLITE3_ERROR_MESSAGES as a compiler flag.

If you installed using pkg-config, you may need to reinstall the library to apply the changes. To do this, run:
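```sh
sudo make uninstall
sudo make install
```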
Examples can be found in separate directories for each API:
- examples/buffer_api/*
- examples/context_api/*

To build the examples, inside the project root run:
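```sh
make examples
```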
To run an example:
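```sh
# Binary name and location are assumed; check the build output for actual paths.
./build/examples/context_api/04-nesting
```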
tech's so clever and so refined, yet still parses text like it's '99 🎶🎵
To understand Lite³, let us first look at JSON.
JSON is a text format. This makes it very convenient for humans. We can read it in any text editor, understand it and modify it if necessary.
Unfortunately, a computer cannot directly work with numbers in text form. It can display them, sure. But to actually do useful calculations like addition, multiplication etc., it must convert them to binary, because processors can only operate on numbers in native word sizes (8-bit, 16-bit, 32-bit, 64-bit). This conversion from text to binary must be done through parsing. It happens inside your browser when you visit a website, and all around the world, millions or billions of times per second, by computers communicating with each other through all kinds of protocols and APIs.
There are broadly 3 strategies or 'ways' to parse JSON:

1. DOM-based parsing: the entire document is parsed up front into an in-memory tree (the DOM), which the program can then traverse freely. This is the most common and most convenient approach.
2. Event-based (SAX-style) parsing: the parser invokes a user-supplied callback such as foo(str) every time a string is encountered. This approach allows for capturing only the information that the program needs, ignoring everything else. While more efficient, it is also very inconvenient, because you might discover your program 'missed' an event it now wants to read and has to go back and restart the iterator. The programmer must constantly keep track of their logical position within the document, which can lead to complexity and bugs. Therefore, this approach is not very popular.
3. On-demand parsing: the document is parsed lazily as the program navigates it, an approach taken by libraries such as simdjson On-Demand.

Despite the DOM-based approach being the easiest for the programmer, for the computer it represents a non-trivial amount of work. The text must be parsed to find commas, brackets, decimals etc. A block of memory must be allocated and then the tree must be built. The same data is now duplicated, stored in two different representations: as DOM-tree and as string. All these operations add memory overhead, runtime cost and increased latency.
So we spend a lot of time constructing and serializing (separate) DOM-trees. Wouldn't it be great if we could, say, encode the tree directly inside the message?
That is exactly what we do. In Lite³, the DOM-tree index is embedded directly inside the structure of the message itself. The underlying data structure is a B-tree. As a result, lookups, inserts and deletions (get(), set() & delete()) can be performed directly on the serialized form. Encoding to a separate representation becomes unnecessary. Since the message is already serialized, it can be sent over the wire at any time. This works because Lite³ structures are always contiguous in memory. Think of the 'dictionary itself' being sent.
We can perform zero-copy lookups, meaning that we do not need to process the entire message, just the field we are interested in. Similarly, we can insert and delete data while reading only the index and nothing else.
Another side effect of having a fully functioning B-tree inside a serialized message is that data can be mutated directly in serialized form. Tree rebalancing, key inserts etc. all happen inside the message data. Typically, to mutate serialized data en route you would have to:
Receive JSON string -> Parse -> Mutate -> Stringify -> Send JSON string
With Lite³, this simply becomes:
Receive Lite³ message -> Mutate -> Send Lite³ message
Many internal datacenter communications follow patterns where messages are received, a few things are modified, and the result is sent on to other services. In such patterns the design of Lite³ shines, because the reserialization overhead is entirely avoided. It is possible to insert entirely new keys and values, or to override existing values, while completely avoiding reserialization.
With JSON, if you change just a single field inside a 1 MB document, you typically cannot avoid reserializing the entire 1 MB document. But with Lite³, you call an API function to traverse the message, find the field and change it in-place. Such an operation can be done in tens of nanoseconds on modern hardware given the data is present in cache.
Lite³ also implements a BYTES type for native encoding of raw bytes, which JSON does not support. When converting from Lite³ to JSON, bytes are automatically converted to a base64-encoded string.
Lite³ internally uses a B-tree data structure. B-trees are well-known, performant data structures, as evidenced by the many popular databases using them (SQLite, MySQL, PostgreSQL, MongoDB & DynamoDB). The reason for their popularity is that they allow dynamic insertions and deletions while keeping the structure balanced, always guaranteeing O(log n) lookups.
However, B-trees are rarely found in a memory or serialization context. For memory-only data structures, common wisdom rather opts for hashmaps or classic binary trees; B-trees are seen as too 'heavyweight'. But we can do the same in memory, though we need a much more compact 'micro B-tree' specifically adapted for fast memory operations. Databases typically use 4 kB per tree node, matching disk page sizes. Since we are in memory, we work with cache lines, not disk pages. Therefore the node size in Lite³ is set to a (configurable) 96 bytes, or 1.5 cache lines, by default. Literature suggests that modern machines have nearly identical latency for 64-byte and 256-byte memory accesses, though larger nodes also increase message size. CPU performance in the current age is all about access patterns and cache friendliness. It may surprise some people that, as a result, in-memory B-trees actually outperform classic binary trees and red-black trees.
An algorithmic observer will note that B-trees have logarithmic time complexity, versus hashmaps' constant-time lookups. But typical serialized messages are often small (70% of JSON messages are < 10 kB) and only read once, so the overhead of other operations dominates, except with very frequent updates on large structures (larger than the last-level cache). More importantly, using Lite³ structures completely eliminates an O(n) parsing and serialization step.
In Lite³, all tree nodes and data entries are serialized inside a single byte buffer. Tree hierarchies are established using pointers; all parent nodes have pointers to their child nodes:
Nodes consist of key hashes stored alongside pointers to key-value pairs (data entries), as well as child node pointers:
The key-value pairs are stored anywhere between nodes, or at the end of the buffer:
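As a rough illustration (field names, widths and the branching factor here are assumptions for exposition, not the actual layout), a node sized around the default 96 bytes could look like:

```c
#include <stdint.h>

#define KEYS_PER_NODE 7   /* assumed branching factor, chosen to fit ~96 bytes */

/* Illustrative sketch of a compact 'micro B-tree' node. All references are
 * 32-bit offsets into the message buffer rather than raw pointers. */
typedef struct {
    uint32_t key_hash[KEYS_PER_NODE];       /* hashes of the keys in this node  */
    uint32_t entry_off[KEYS_PER_NODE];      /* offsets of key-value data entries */
    uint32_t child_off[KEYS_PER_NODE + 1];  /* offsets of child nodes (0 = none) */
    uint16_t nkeys;                         /* number of keys currently in use  */
} lite3_node_sketch;   /* ~90 bytes + padding, on the order of 1.5 cache lines */
```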
Over time, deleted entries cause the contiguous byte buffer to accumulate 'holes'. For a serialization format, this is undesirable as data should be compact to occupy less bandwidth. Lite³ will use a defragmentation system to lazily insert new values into the gaps whenever possible. This way, fragmentation is kept under control without excessive data copying or tree rebuilding (STILL WIP, NOT YET IMPLEMENTED).
> NOTE: By default, deleted values are overwritten with NULL bytes (0x00). This is a safety feature, since not doing so would leave 'deleted' entries intact inside the data structure until they are overwritten by other values. If the user wishes to maximize performance at the cost of leaking deleted data, LITE3_ZERO_MEM_DELETED should be disabled.
Also, Lite³ does not store raw pointers, but rather 32-bit indexes relative to the buffer pointer. The buffer pointer always points to the zero-index, and the root node is always stored at the zero-index.
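In code, resolving such an index is just pointer arithmetic on the buffer base (a sketch; the function name is illustrative):

```c
#include <stdint.h>

/* A 32-bit buffer-relative index resolves to an address by adding it to
 * the buffer base. Index 0 is the root node at the start of the buffer. */
static inline void *lite3_resolve_sketch(unsigned char *buf, uint32_t idx)
{
    return buf + idx;
}
```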
Despite being a binary format, Lite³ is schemaless and can be converted to/from JSON. For a more complete explanation of the design, see Design and Limitations.
A number of formats advertise themselves as 'binary JSON'. Instead of brackets and commas, they typically use a system of tags (TLV: type-length-value) to encode types and values. Being binary also means they store numbers natively, avoiding the parsing of ASCII floating point decimals, which is known to be a performance problem for text formats.
The notable contenders are:

- BSON (MongoDB), which adds extended types such as Date objects
- MessagePack
- CBOR
- Smile
- Ion (Amazon)

While these formats avoid the conversion of numbers from text, they do not eliminate parsing entirely. All of them still require a separate memory representation such as a DOM-tree to support meaningful mutations, including the associated reserialization overhead.
Fundamentally, this flaw arises from the fact that values are stored contiguously, like arrays, meaning these formats suffer from all the downsides of arrays. To find an element, typically an O(n) linear search is required. This is particularly problematic for random access on large element counts. Additionally, the contiguous layout means that a change to an internal element requires (partially) rewriting the document.
In contrast, Lite³ is a zero-copy format storing all internal elements within a B-tree structure, guaranteeing O(log n) amortized time complexity for access and modification of any internal value, even inside arrays. The modification of an internal element will also never trigger a rewrite of the document. Only the target element might require reallocation and updating of the corresponding reference. Even throughout modifications, zero-copy access is maintained.
These formats may therefore be interesting from the perspective of compactness or rich typing. From the standpoint of encode/decode performance, however, they sit in a lower category.
By making a number of assumptions about the structure of a serialized message, it is possible to (greatly) accelerate the process of encoding/decoding to/from the serialized form. This is where so-called 'binary formats' come in.
But if binary formats exist and are much faster, why is everyone still using JSON?
The answer lies in the fact that most binary formats require a schema file written in an IDL; basically an instruction manual for how to read a message. A binary message doesn't actually tell you anything. It is literally just a bunch of bytes. Only with the schema does it acquire meaning. Note that 'binary' does not necessarily mean schema-only, though in practice this is often implied.
When sending messages between systems you control, you can create your own schemas. But communicating with other people's servers? Now you need to use their schemas as well. And if you want to communicate with the whole world? Well you better start collecting schemas. Relying on out-of-band information eventually takes its toll. Imagine needing an instruction manual for every person you wanted to talk to. Crazy right?
Because of these restrictions, schema-only formats reside in their own special category, notably distinct from schemaless and self-describing formats like JSON, which can be directly read and interpreted without the requirement of extra outside information.
That said, these formats come in 3 primary forms:

1. Compile-time code generation: schema files (e.g. Protobuf's .proto) are compiled using the external protoc Protobuf compiler into encoding/decoding source code (C++, Go, Java) for each type of message. This is also the approach taken by Rust's "serde" crate, although the compilation is performed by the Rust compiler, not an external tool.
2. Runtime schema interpretation: the schema is loaded and interpreted at runtime, with messages encoded/decoded dynamically rather than through generated code.
3. Hybrid approaches, where the schema is distributed alongside the messages or negotiated between the communicating parties.

Schema-only formats tend to be brittle and require simultaneous end-to-end upgrades to handle change. Although backwards-compatible evolution is possible, it requires recompilation and synchronization of IDL specifications. But updating all clients and servers simultaneously can be challenging. Major changes like renaming, removing fields or changing types can lead to silent data loss or incompatibility if not handled correctly. In some cases it is better to define a new API endpoint and deprecate the old one.
Here is a section taken from the Simple Binary Encoding's "Design Principles" page:
> Backwards Compatibility
>
> In a large enterprise, or across enterprises, it is not always possible to upgrade all systems at the same time. For communication to continue working the message formats have to be backwards compatible, i.e. an older system should be able to read a newer version of the same message and vice versa.
>
> An extension mechanism is designed into SBE which allows for the introduction of new optional fields within a message that the new systems can use while the older systems ignore them until upgrade. If new mandatory fields are required or a fundamental structural change is required then a new message type must be employed because it is no longer a semantic extension of an existing message type.
For self-describing formats like Lite³ and JSON, adding and removing fields is easy; it is only positional formats that have this problem.
There sometimes exists ambiguity and confusion around the term 'zero-copy'. What does it mean for a data format to be 'zero-copy', or any system in general?
In most contexts, zero-copy refers to a method of accessing and using data without physically moving or duplicating it from its original location. The guiding principle being that compute resources should be spent on real work or calculations, not wasting CPU cycles on unnecessarily relocating and moving bytes around. Applications, operating systems and programming languages may all use zero-copy techniques in various forms, under various names:
- Linux: the splice() and sendfile() system calls
- Rust: &[u8] slices, std::io::Cursor and bytes::Bytes
- C++: std::string_view and std::span
- Python: the memoryview object
- Java: java.nio.ByteBuffer and slice()
- C#/.NET: Span<T>, Memory<T> and ReadOnlySpan<T>
- Go: slices (mySlice := myArray[1:4])
- JavaScript: Buffer.subarray() and TypedArray views on an ArrayBuffer

Moving around data is a real cost. In the best case, performance degrades linearly with the size of the data being moved. In reality, memory allocations, cache misses, garbage collection and other overhead mean that these costs can multiply non-linearly.
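In C, the same idea is simply a pointer plus a length into an existing buffer (a sketch; the type and function names are illustrative):

```c
#include <stddef.h>

/* A zero-copy 'slice': references a region of an existing buffer
 * without moving or duplicating any bytes. */
typedef struct {
    const unsigned char *ptr;
    size_t               len;
} slice_t;

static inline slice_t subslice(const unsigned char *buf, size_t off, size_t len)
{
    return (slice_t){ buf + off, len };   /* no bytes are copied */
}
```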
When we talk about 'zero-copy serialization formats', the format should support reading values from their original location, i.e. directly from the serialized message. If a format requires that its contents first be transformed into an alternative memory representation (i.e. a DOM-tree), then it does not qualify as zero-copy.
In some cases, a format may support 'zero-copy' references to string or byte objects. However, the ability to access some member field by reference does not immediately make a format 'zero-copy'. Instead, every member field must be accessible without requiring any parsing or transformation step on a received message.
The 4 most notable existing zero-copy formats are:

- Flatbuffers (Google)
- Flexbuffers (Google)
- Cap'n Proto (Cloudflare)
- SBE (Simple Binary Encoding)
Note that all of these formats except Flexbuffers require rigid, pre-defined schema compiled into your application. Also, none of these formats support arbitrary mutation of serialized data. If a single field must be changed, then the entire message must be re-serialized. Only the latter 2 formats support trivial in-place mutation of fixed-sized values.
As of writing, JSON remains the global standard for data serialization. Reasons include ease of use, human readability and interoperability.
Though it comes with one primary drawback: performance. When deploying services at scale using JSON, parsing/serialization can become a serious bottleneck.
The need for performance is ever-present in today's world of large-scale digital infrastructure. For parties involved, cloud and electricity costs are significant factors which cannot be ignored. Based on a report by the IEA, data centres in 2024 used 415 terawatt hours (TWh) or about 1.5% of global electricity consumption. This is expected to double and reach 945 TWh by 2030.
Building systems that scale to millions of users requires being mindful of cloud costs. According to a paper from 2021, protobuf operations constitute 9.6% of fleet-wide CPU cycles in Google’s infrastructure. Microservices at Meta (Facebook) also spend between 4-13% of CPU cycles on (de)serialization alone. Similar case studies of Atlassian and LinkedIn show the need to step away from JSON for performance reasons.
JSON is truly widespread and ubiquitous. If we estimate that inefficient communication formats account for 1-2% of datacenter infrastructure, this amounts to several TWh annually, comparable to the energy consumption of a small country like Latvia (7.17 TWh in 2023) or Albania (8.09 TWh in 2023). True figures are hard to obtain, but for a comprehensive picture, all devices outside datacenters must also be considered. Not just big tech, but also hardware devices, IoT and a myriad of other applications across different sectors have spawned a variety of 'application-specific' binary formats to answer the performance question.
But many binary formats are domain-specific. Or they require rigid schema definitions, typically written in some IDL and required by both sender and receiver, which must stay in sync at all times to avoid communication errors. And if the schema must be changed (so-called 'schema evolution'), preserving backwards compatibility is often a complex and fragile task. This, combined with lacking browser integration, means many developers avoid binary formats despite the performance benefits.
Purely schemaless formats are simply easier to work with, a fact evidenced by the popularity of JSON. For systems talking to each other, fragmented communications and a lack of standards become problematic, especially when conversion steps are required between different formats. In many cases, systems still fall back to JSON for interoperability.
Despite being schemaless, Lite³ directly competes with the performance of binary formats.
Lite³ is designed to handle untrusted messages. Being a pointer-chasing format, special attention is paid to security.
If you suspect you have found a security vulnerability, please contact the developer.
Q: Should I use this instead of JSON in my favorite programming language?
A: If you care about performance and can directly interface with C code, then go ahead. If not, wait for better language bindings.
Q: Should I use this instead of Protocol Buffers (or any other binary format)?
A: In terms of encode/decode performance, Lite³ outperforms Protobuf due to the zero-copy advantage. But Lite³ must encode field names to be self-describing, so messages take up more space over the wire. Choose Lite³ if you are CPU-constrained. Bandwidth-constrained? Then choose Protocol Buffers and be prepared to accept extra tooling, an IDL and ABI-breaking evolution to minimize message size.
Q: Can I use this in production?
A: The format is developed for use in the field, though keep in mind this is a new project and the API is unstable. Also: understand the limitations. Experiment first and decide if it suits your needs.
Q: Can I use this in embedded / ARM?
A: Yes, but your platform should support the int64_t type, 8-byte doubles and a suitable C11 gcc/clang compiler, though downgrading to C99 is possible by removing all static assertions. The format has not yet been tested on ARM.
If you would like to be part of developer discussions with the project author, consider joining the mailing list:
devlist@fastserial.com
To join, send a mail to devlist-subscribe@fastserial.com with a non-empty subject. You will receive an email with instructions to confirm your subscription.
Reply-to is set to the entire list, though moderation is enabled.
To quit the mailing list, simply mail devlist-unsubscribe@fastserial.com
This project was inspired by a paper published in 2024 as Lite²:
Tianyi Chen, Xiaotong Guan, Shi Shuai, Cuiting Huang and Michal Aibin (2024). Lite²: A Schemaless Zero-Copy Serialization Format. https://doi.org/10.3390/computers13040089
The paper authors in turn got their idea from SQL databases. They noticed that a SQL database allows inserting arbitrary keys, making it schemaless, and that a key lookup can be performed without loading the entire database into memory, making it zero-copy.
They theorized that it would be possible to remove all the overhead associated with a full-fledged database system, such that it would be lightweight enough to be used as a serialization format. They chose the name Lite² since their format is lighter than SQLite.
Despite showing benchmarks, the paper authors did not include code artifacts.
The Lite³ project is an independent interpretation and implementation, with no affiliations or connections to the authors of the original Lite² paper.
The name Lite³ was chosen since it is lighter than Lite².
> TIP: To type ³ on your keyboard on Linux, hold Ctrl+Shift+U, then type 00B3. On Windows, use Alt+(numpad)0179.
Lite³ is released under the MIT License. Refer to the LICENSE file for details.
For JSON conversion, Lite³ also includes yyjson, the fastest JSON library in C. yyjson is written by YaoYuan and also released under the MIT License.