Inside the engine.

Typed records, relationship tables, vector indexes, provenance, MVCC, WAL/qlog recovery, and daemon commands on one storage layer behind one transaction boundary. Every layer below maps to an architecture decision record in the repo.

Schematic

┌──────────────────────────────────────────────────────────────────┐
        │  query surface           typed Cypher subset (parser + planner)  │
        ├──────────────────────────────────────────────────────────────────┤
        │  execution               factorized vector groups · morsels      │
        │                          NodeScan · Filter · Project · Limit     │
        │                          ExtendOut · ExtendIn · Count            │
        ├──────────────────────────────────────────────────────────────────┤
        │  index                   B+-tree (primary, secondary)            │
        │                          HNSW vector index                       │
        ├──────────────────────────────────────────────────────────────────┤
        │  storage                 node groups · column chunks · CSR rel   │
        │                          MVCC · WAL · pager · buffer pool        │
        └──────────────────────────────────────────────────────────────────┘

Each layer maps to one or more architecture decision records in DECISIONS/, numbered in dependency order; ADR 0001 is the node-group storage layout and the place to start reading.

Layer ledger

LayerIn tree todayState
Pager + buffer poolSingle file, 8 KiB CRC-backed pages, full-file integrity scan, deterministic eviction with hit/miss/write-back counters.shipped
Node groups + column chunks131K tuples per group; null, int, float, dictionary, dense-F32 embedding, and var-bytes chunk codecs with nullable variants.shipped
Relationship tablesA+ double CSR, forward and backward adjacency persisted, gap-preserving rel-lists with 1.1x growth.shipped
MVCC + WALSnapshot isolation, version chains, WAL encode/decode/replay, recovery tested under torn and corrupt tails.shipped
Schema catalogTyped catalog persisted through pager chunks; secondary-index roots and HNSW handles survive close/reopen via the superblock.shipped
Schema migrationsALTER ... ADD/RENAME/DROP PROPERTY with physical backfill chunks for scalar, embedding, Array, and Struct defaults; complex shapes remain.partial
B+-tree indexesPrimary fixed-key and variable-key trees persisted; descent and in-place leaf insert run directly on cached page bytes.shipped
Secondary indexesOn-disk trees with catalog manifests, cost-aware candidate selection across equality/range/string/null predicates; full composite plans pending.partial
HNSW vector indexPersistent (Navix pattern): vector chunks, tombstone/upsert maintenance, compaction, delayed reclaimed-page reuse.shipped
Execution operatorsNodeScan, Filter, Project, Limit, Count, ExtendOut/In, IMJ, HashJoin, OrderBy, Distinct, Union; broader morsel threading remains.partial
PlannerFilter-pushdown rewrites, label-cardinality-aware index costs, DP join ordering with observed-cardinality hooks; broader WCO/adaptive work pending.partial
Cypher runtimeParser, evaluator, executor: OPTIONAL MATCH, WITH, UNWIND, parameters, aggregates, real MERGE, DETACH DELETE, bounded variable-length paths.shipped
Query-log durabilityEvery mutation appends a CRC32-framed qlog record; replay reproduces the durable prefix and tolerates torn tails; compaction and export/import.shipped
Multi-source BFSThen 2014 MS-BFS with u64 seed bitmaps, scale-tested to 1M nodes / 1.55M edges.shipped
ffsd daemonSingle-owner lock, startup integrity scan, 40+ line-oriented commands across query, ingest, validation, snapshots, and metrics; richer clients ahead.partial
Python bindingpyo3 abi3 wheels for Linux/macOS, async driver, temporal and property-bag marshalling tested through the binding core.shipped
Simulation11 deterministic crash/recovery scenarios: WAL torn tails, power-loss prefixes, randomized mutation schedules with minimized failing prefixes.shipped
v1 spine (flag-gated)On-disk read path: node, CSR, and edge shadows persisted to sibling pagers, snapshot visibility from disk, dirty-tail overlay, read-through page cache taking point reads from ~448 to ~1.6M ops/sec. Default-off; the write side still dual-writes.partial

How the B+-tree got fast

The insert path started as a sorted-Vec placeholder losing to sled by 8.8x on lookups. Four changes, each measured before the next:

Real pages. Replaced the in-memory sorted Vec with 8 KiB on-disk pages, splits propagating up to a fresh root, leaf sibling pointers (ADR 0008). Baseline after: 4676 ns insert, 3417 ns lookup.

Buffer pool. Every descent reads a memory-resident page slot instead of issuing seek-plus-read per touch (ADR 0002). Lookups dropped to 999 ns.

Decode-less descent. Binary-search directly on the cached page bytes without materializing a key vector. Lookups dropped to 136 ns, ahead of sled by 2x.

In-place leaf insert. memmove directly on the cached page; decode-and-split only when the leaf actually overflows, under 1% of inserts on this workload. Inserts dropped from 1490 ns to 429 ns. A non-overflowing insert touches exactly one leaf page and makes zero heap allocations.

Speed mechanics

HNSW scratch. Vector search uses epoch-stamped thread-local visited scratch instead of allocating a HashSet per search layer: one allocation up front, O(1) check-and-set per node visit, no hashing. The same change speeds up build, since insert calls search_layer during construction.

CSR traversal. Relationship tables keep gap-preserving CSR rel-lists with 1.1x growth. Appends pay for the bookkeeping (30 ns vs petgraph's 3 ns) and 1-hop reads collect it back: 11 ns vs 102 ns, because neighbors are a dense slice.

Multi-source BFS. cypher::query::bfs implements Then 2014 MS-BFS with u64 seed bitmaps. 15 seeds at depth 4 on a 35K-node graph: 14 ms, against 68.8 s for the per-seed/per-hop driver loop. 129 ms on a 1M-node / 1.55M-edge scale run.

Planner cost. The planner keeps materialized B+-tree and HNSW probe work visible through LIMIT, charges label-cardinality-aware descent, narrows projections before adjacency expansion, and carries observed-cardinality hooks for adaptive join ordering.

Morsel parallelism. Filter selection, Limit truncation, Count reduction, HashJoin build/probe/output, Distinct and OrderBy key extraction, and IMJ row probing can all split large chunks across morsels; broader threading through the executor is in flight.

Write-back durability. Native mutation batches, Cypher writes, vector tombstones/upserts, secondary-index updates, feed checkpoints, and provenance write-back all ride the same qlog/WAL machinery, so durability is one discipline instead of per-store reconciliation.

Operator surface

Command familyWhat exists in tree
QUERY / QUERY_PARAMSLine-oriented Cypher execution, parameter maps, EXPLAIN/PROFILE paths.
APPLY_MUTATIONS / LOAD_MUTATIONSNative durable batch ingest with syntax-and-semantics preflight and qlog receipts.
LOAD_NODES / LOAD_EDGESWhole-file-preflighted CSV loads, including append-only and checkpointed-resume variants.
LOAD_MANIFESTOrdered native load plans: CSV batches, DDL, assertions, verify gates, metrics, snapshots, export/import, compaction steps.
VERIFY / METRICS / QLOG_STATUSStorage and qlog integrity scans, latency buckets with exact rolling percentiles, error counters, replay posture.
SNAPSHOT / EXPORT / IMPORTPager+qlog snapshots and logical dumps, with live-path and collision preflight before anything is copied or truncated.

Why Rust

No GC pause budget. A storage engine touches pages, indexes, buffers, WAL records, and vector graph structures on hot paths. The goals doc names memory safety without GC pauses as a reason for the Rust core.

Page-byte control. The B+-tree speedup above came entirely from operating on cached page bytes: buffer-pool integration, decode-less descent, in-place leaf writes.

Scratch discipline. The HNSW query speedup came from replacing per-call visited HashSet tracking with an epoch-stamped scratch vector that lives across calls.

Unsafe policy. The workspace denies unsafe operations in unsafe functions, warns on undocumented unsafe blocks, and the goals doc requires a safety proof for every unsafe block.

Rust example

use ffs::cypher::{compile_query, parse};
        use ffs::exec::collect_column;
        use ffs::planner::{compile, CompileCtx};
        use ffs::storage::RelTable;

        // A tiny social graph: 0 → 1, 0 → 2, 1 → 2, 2 → 3.
        let mut knows = RelTable::new(0, "KNOWS", 4, 4);
        knows.add_edge(0, 0, 1, 1);
        knows.add_edge(0, 0, 2, 2);
        knows.add_edge(1, 1, 2, 2);
        knows.add_edge(2, 2, 3, 3);

        // One-hop traversal, end-to-end through the engine.
        let q = parse("MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN b").unwrap();
        let plan = compile_query(&q, |_| vec![0, 1, 2, 3]);
        let ctx = CompileCtx::empty().with_rel_table("KNOWS", &knows);
        let mut op = compile(plan, &ctx).unwrap();
        let mut neighbours = collect_column(&mut *op, 0);
        neighbours.sort();
        assert_eq!(neighbours, vec![1, 2, 2, 3]);

The bigger demo — graph, vector, and provenance in one transaction with a mocked LLM — lives at ffs-demo/examples/write_back_loop.rs in the repo.

ffsdb.com — private preview next: evals →