Kelp is the base database for Baratine. The core requirements for the database are:
- Reliable, minimizing corruption
- Reasonably fast writes
- Reasonably fast loads
- Reasonably fast queries
- Support medium and large blobs
- Support clustering and sharding (managed by Kraken)
- Designed to work together with Baratine data services.
- Btree key/value where value is a fixed-size row with attached blobs.
- Single-thread/writer Baratine service for each table.
- Journal used for reliability and to reduce db writes.
- Btree nodes organized as numbered pages.
- In-memory and on-disk organized as append-only with deltas.
- Fixed-size rows, with small blobs such as strings embedded in the leaf nodes.
- Only leaves are written to disk; the tree is reconstructed on start.
- Writes go to variable-sized segments (64k to 64m), depending on table size.
- Sequence number and page id used to recover most recent data.
- GC used to recover segments.
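The on-disk layout in the points above can be sketched as a log of segments, each tagged with a sequence number and holding full leaf pages and deltas. This is a minimal illustrative model; the names and structures are assumptions, not the actual Kelp API:

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    """One record in a segment: a full leaf page or a delta against it."""
    page_id: int
    kind: str        # "page" (full sorted copy) or "delta" (changed key/values)
    data: dict

@dataclass
class Segment:
    """Append-only on-disk segment; size varies from 64k to 64m by table size."""
    sequence: int
    entries: list = field(default_factory=list)

    def append(self, entry: Entry) -> None:
        # Append-only: entries are never rewritten in place.
        self.entries.append(entry)

# A full copy of leaf #24 followed by a delta, both in segment #319.
seg = Segment(sequence=319)
seg.append(Entry(page_id=24, kind="page", data={"a": 1, "b": 2}))
seg.append(Entry(page_id=24, kind="delta", data={"b": 3}))
```

The sequence number on each segment is what later lets recovery decide which copy of a page is the most recent.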
Baratine’s data services are the primary clients of the Kelp/Kraken database.
Data services have several specific behaviors:
- The data is primarily serialized documents representing objects.
- Each key/value row is owned entirely by a data service item. Only that item will ever write the data.
- Data is typically loaded once and written multiple times, until the item is LRU'ed out of memory; on re-access, the pattern repeats.
- Queries can retrieve copies of the data, but only the data owner can modify it.
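The ownership rule above can be sketched as follows. A hypothetical data-service item is the sole writer of its row; queries hand out copies, so mutating a query result cannot touch the stored data:

```python
import copy

class DataServiceItem:
    """Sole owner and writer of its key/value row; readers get copies.
    Illustrative only -- not the Baratine data-service API."""

    def __init__(self, key):
        self.key = key
        self._row = {}

    def update(self, field_name, value):
        # Only the owning item ever mutates the row.
        self._row[field_name] = value

    def query(self):
        # Queries receive a copy; changing it cannot affect the stored row.
        return copy.deepcopy(self._row)

item = DataServiceItem("user:17")
item.update("name", "alice")
snapshot = item.query()
snapshot["name"] = "mallory"   # mutating the copy...
# ...leaves the owner's row unchanged.
```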
Updated btree leaf pages and blobs are written to the current open segment, which can vary in size from 64k to 64m depending on the table size. Deltas are written after the initial node.
Leaf pages are written to the open segment, which has a sequence
value. A specific leaf, say #24 with keys ranging from
0xaff_ffff, will be written to the open segment
with sequence #319.
The first copy of the leaf will be complete, with its keys sorted. Subsequent updates write deltas, where each delta is the list of changed key/values. If the leaf changes significantly, a new copy of the entire node may be written. Each segment therefore contains several updated leaf nodes plus medium or large blobs.
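A minimal sketch of this write pattern, assuming a simple "rewrite the whole node after N deltas" policy (the threshold and the policy itself are illustrative, not Kelp's actual heuristic):

```python
class LeafWriter:
    """Writes a leaf: first a full sorted copy, then deltas; rewrites the
    whole node once the accumulated deltas grow large."""
    REWRITE_THRESHOLD = 4   # illustrative policy, not Kelp's real heuristic

    def __init__(self, page_id):
        self.page_id = page_id
        self.rows = {}
        self.pending_deltas = 0
        self.log = []  # stands in for the open segment

    def put(self, key, value):
        self.rows[key] = value
        if not self.log:
            # First write: the complete leaf with keys sorted.
            self.log.append(("page", dict(sorted(self.rows.items()))))
            return
        self.pending_deltas += 1
        if self.pending_deltas >= self.REWRITE_THRESHOLD:
            # Changed significantly: write a fresh full copy, keys sorted.
            self.log.append(("page", dict(sorted(self.rows.items()))))
            self.pending_deltas = 0
        else:
            # Small change: append only the changed key/value.
            self.log.append(("delta", {key: value}))

w = LeafWriter(page_id=24)
for i, k in enumerate("edcba"):
    w.put(k, i)
```

After five puts the log holds the initial full page, three deltas, and a rewritten full page with all keys in sorted order.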
On startup, Kelp sorts all segments by their sequence number and loads the offset and deltas for each most-recent update of a page. The offset and delta locations will be used to load the page when an application needs the data.
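The recovery scan can be sketched like this, assuming a simplified segment representation (a newer full copy of a page supersedes both older copies and older deltas):

```python
def recover(segments):
    """Replay segments in sequence order; for each page, record the location
    of its most recent full copy plus the deltas written after it."""
    latest = {}  # page_id -> {"seq": ..., "page": offset, "deltas": [offsets]}
    for seg in sorted(segments, key=lambda s: s["sequence"]):
        for offset, (page_id, kind) in enumerate(seg["entries"]):
            if kind == "page":
                # A newer full copy supersedes older copies and older deltas.
                latest[page_id] = {"seq": seg["sequence"],
                                   "page": offset, "deltas": []}
            else:
                latest[page_id]["deltas"].append(offset)
    return latest

segments = [
    {"sequence": 319, "entries": [(24, "page"), (24, "delta")]},
    {"sequence": 318, "entries": [(24, "page")]},
]
index = recover(segments)
```

Only the resulting offsets are kept at startup; the page itself is loaded lazily, when an application first needs the data.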
When a leaf is loaded, the in-memory page is also structured as append-only. Although there is only a single writer thread per table, which avoids the need for most locking, multiple reader threads are allowed. The append-only structure avoids the need for locking, since the reader will always see a valid snapshot of the data.
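The in-memory structure can be sketched as an append-only list: the single writer only appends, so a reader scanning the prefix that existed when it started always sees a consistent snapshot, with no locks. (This single-threaded model only illustrates the data-structure invariant, not real concurrent execution.)

```python
class AppendOnlyPage:
    """In-memory leaf: the single writer thread only appends; readers scan a
    stable prefix, so they always see a valid snapshot without locking."""

    def __init__(self):
        self._entries = []  # (key, value) pairs; the newest entry wins

    def put(self, key, value):
        # Writer: append only; never mutate or reorder existing entries.
        self._entries.append((key, value))

    def get(self, key):
        # Reader: capture the current length, then scan backwards within
        # that stable prefix for the newest value of the key.
        n = len(self._entries)
        for i in range(n - 1, -1, -1):
            k, v = self._entries[i]
            if k == key:
                return v
        return None

page = AppendOnlyPage()
page.put("a", 1)
page.put("a", 2)  # newer value appended; the old entry is left untouched
```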
When a page fills up, it’s compacted and sorted. If necessary, it will be split.
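Compaction and splitting can be sketched as two small steps over the append-only entries, with an assumed median-key split point:

```python
def compact(entries):
    """Compact an append-only page: keep the newest value per key, sorted."""
    newest = {}
    for key, value in entries:   # later entries overwrite earlier ones
        newest[key] = value
    return sorted(newest.items())

def split(rows):
    """Split a full page into two halves around the median key."""
    mid = len(rows) // 2
    return rows[:mid], rows[mid:]

entries = [("b", 1), ("a", 2), ("b", 3), ("c", 4)]
rows = compact(entries)
left, right = split(rows)
```

Compaction discards superseded entries and restores sorted order; the split yields two pages whose key ranges partition the original.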