From core engine to public proof.
The labels describe sequence; none of them carry dates. Each entry says, in plain English, what should get better and what evidence has to exist before the claim goes public.
Releases
- v0.5: core exists (shipped)
  Opens a database file, writes, reopens, verifies storage/qlog posture, runs Cypher, searches HNSW vectors, exposes ffsd commands. 920 passing tests and the local comparator evals.
- v0.6: trust + index hardening (due)
  Fewer sharp edges on restart, import/export, schema backfills, secondary indexes, and vector-index maintenance. Proof: crash/reopen tests, index invariant checks, safer planner fallbacks.
- v0.7: execution speed pass (later)
  Bigger scans, joins, traversals, sorts, and write-back paths do less row-at-a-time work. Speed claims wait until before/after numbers exist on measured workloads.
- v0.8: movement-aware planner (later)
  The engine chooses where work happens: scan columns, use a B+-tree, expand graph edges, search vectors first, filter first, stream, materialize, or write back. Proof: EXPLAIN output that names movement cost.
- v0.9: write-back loop (later)
  The signature workflow runs end to end in the engine (retrieve, traverse, derive, write back, update indexes, commit) with no sidecar processes stitched around it.
- v0.10: public proof ladder (later)
  Public, reproducible numbers against well-known systems, with raw commands, machine specs, and losses included. 10GB, 100GB, and 1TB-class ladders where feasible.
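To make the v0.8 idea of "EXPLAIN output that names movement cost" concrete, here is a minimal sketch of ranking candidate access paths by estimated bytes moved. It is not the engine's planner: the `Candidate` fields, the `random_penalty` factor, and the output format are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical access path with rough statistics (all illustrative)."""
    name: str
    rows_touched: int
    bytes_per_row: int
    random_access: bool  # True if the path seeks instead of reading sequentially


def movement_cost(c, random_penalty=4.0):
    """Estimated bytes moved; random access is weighted by an assumed penalty."""
    cost = float(c.rows_touched * c.bytes_per_row)
    return cost * random_penalty if c.random_access else cost


def explain(candidates, random_penalty=4.0):
    """Return the cheapest candidate and an EXPLAIN-style text naming each cost."""
    ranked = sorted(candidates, key=lambda c: movement_cost(c, random_penalty))
    lines = []
    for i, c in enumerate(ranked):
        marker = "->" if i == 0 else "  "  # "->" marks the chosen path
        cost = movement_cost(c, random_penalty)
        lines.append(f"{marker} {c.name}: est. movement {cost:.0f} bytes")
    return ranked[0], "\n".join(lines)
```

The point of the sketch is only that the chosen plan and the rejected ones carry a named, comparable cost, so a test can assert that an obviously bad plan loses.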
Storage convergence (v1)
Alongside the release labels, v1 moves the engine onto disk, stage by stage (ADR 0021/0022/0023 in the repo).

The read side is done: point reads, scans, BFS adjacency, and MVCC snapshot visibility all serve from pager-backed structures plus a dirty-tail overlay, proven equal to the in-memory engine across the full Cypher surface and a reopen.

Read performance is done: a read-through page cache and a direct row/neighbour seek took disk reads from ~10,000x slower than in-memory to ~3x slower. Point reads went from ~448 to ~1.6M ops/sec on the 20k-node bench.

On the write side, two pieces are done: the reads that writes depend on (endpoint existence, the pre-mutation state a SET builds from, MERGE candidate verification) now serve from the disk engine, and edge state (ids, endpoints, property bags) persists and reads back from its own pager file.

The current work: the edge overlay split, persisting the MERGE/uniqueness indexes, and evicting clean entries at flush so the in-memory store becomes exactly the unflushed tail.

Next comes the cut: disk-served reads on by default and the in-memory store deleted. It is irreversible, so it goes last. Everything so far lands flag-gated and default-off, so 0.5.0 behavior is unchanged until the cut.
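The read path described above (pager-backed pages, a read-through page cache, and a dirty-tail overlay of unflushed writes) can be sketched roughly as follows. This is a toy model under stated assumptions, not the engine's code: `Pager`, `ReadThroughCache`, `OverlayReader`, the LRU eviction policy, and the fixed `rows_per_page` layout are all hypothetical names and choices.

```python
from collections import OrderedDict

TOMBSTONE = object()  # marks a row deleted in the unflushed tail


class Pager:
    """Stand-in for a pager file: page id -> {row id -> row}."""

    def __init__(self, pages):
        self.pages = pages
        self.reads = 0  # counts simulated disk reads

    def read_page(self, page_id):
        self.reads += 1
        return self.pages.get(page_id, {})


class ReadThroughCache:
    """On a miss, load the page from the pager and keep it (LRU eviction)."""

    def __init__(self, pager, capacity):
        self.pager = pager
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, page_id):
        if page_id in self.cache:
            self.cache.move_to_end(page_id)  # mark as most recently used
            return self.cache[page_id]
        page = self.pager.read_page(page_id)
        self.cache[page_id] = page
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used page
        return page


class OverlayReader:
    """Reads check the dirty tail first, then fall through to cached pages."""

    def __init__(self, cache, rows_per_page):
        self.cache = cache
        self.rows_per_page = rows_per_page
        self.dirty_tail = {}  # unflushed writes: row id -> row or TOMBSTONE

    def put(self, row_id, row):
        self.dirty_tail[row_id] = row

    def delete(self, row_id):
        self.dirty_tail[row_id] = TOMBSTONE

    def get(self, row_id):
        if row_id in self.dirty_tail:
            row = self.dirty_tail[row_id]
            return None if row is TOMBSTONE else row
        page = self.cache.get(row_id // self.rows_per_page)
        return page.get(row_id)
```

In this model the overlay holds only what has not been flushed, which mirrors the stated goal that the in-memory store becomes exactly the unflushed tail.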
Build phases
The engineering ladder behind the labels, from the repo's ROADMAP.md. Each phase ends at an exit bar.
P0 — trust the core. Make the existing engine boringly correct: catalog recovery across close/reopen, WAL replay for mixed writes, page and index invariant checks, deterministic crash scenarios, fuzzed WAL decode and query parsing. Exit: kill/reopen tests are ordinary; every persisted structure has an invariant checker.
P1 — durable native indexes. Persist HNSW and secondary indexes as engine-native structures, define index rebuild vs WAL replay rules, add stats for planner selectivity, benchmark maintenance cost under writes. Exit: vector and secondary indexes survive restart; maintenance cost is measured, not guessed.
P2 — data movement execution. Finish HashJoin, IMJ, OrderBy, Distinct, and Union; thread morsel execution through scans, joins, traversal, and vector search; SIMD kernels where useful; bulk ingest and write-back paths. Exit: scan/filter/join/traverse/search/write-back run as one physical plan.
P3 — movement-aware planner. A shared physical-plan IR across surfaces; column, fanout, and vector-index stats; cost terms for scan vs lookup vs graph expansion vs vector search, including write amplification. Exit: plans are explainable; obvious bad plans are rejected by tests.
P4 — derived write-back loop. Bulk provenance write-back, embedding refresh APIs, write-back operators in the execution engine, one transaction across derived fact, provenance edges, embedding update, and index maintenance. Exit: the full loop runs without sidecar systems.
P5 — developer surfaces. Stabilize the Rust API for hot paths, improve Python ergonomics, open a SQL/Postgres compatibility track, add migration and inspection commands, typed failures with clear messages. Exit: a developer can open a file and build something useful in minutes.
P6 — proof. Comparator harness against SQLite, DuckDB, Postgres, Kùzu, Neo4j, LanceDB, Qdrant/pgvector, and RocksDB/sled, with published machine specs, commands, datasets, raw outputs, and losses; large workload ladders; crash-during-load scenarios. Exit: users can reproduce the numbers; FFS has a credible first arena where it is clearly better.
P7 — optional server wrapper. Only after the embedded engine earns it: TCP wire wrapper, auth and connection management, observability, backup and restore tooling, replication research. Exit: server mode does not fork the engine; embedded remains the reference path.
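As an illustration of the P0 exit bar (deterministic crash scenarios and an invariant checker for a persisted structure), the sketch below simulates a crash at every possible byte offset of a checksummed log and checks that replay always recovers a clean prefix. The record layout (length plus CRC32 header) and the function names are assumptions for this example only; the source does not describe the engine's actual WAL format.

```python
import struct
import zlib

# Hypothetical record header: payload length, then CRC32 of the payload.
HEADER = struct.Struct("<II")


def encode_record(payload):
    """Frame one payload as header + bytes."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def replay(log):
    """Return every intact record from the start, stopping at the first bad one."""
    records = []
    offset = 0
    while offset + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(log):
            break  # torn write: the payload was cut off by the crash
        payload = log[start:end]
        if zlib.crc32(payload) != crc:
            break  # corrupted record: stop rather than guess
        records.append(payload)
        offset = end
    return records


def check_prefix_invariant(payloads):
    """Simulate a crash at every byte offset; recovery must yield a prefix."""
    log = b"".join(encode_record(p) for p in payloads)
    for cut in range(len(log) + 1):
        recovered = replay(log[:cut])
        assert recovered == payloads[:len(recovered)], f"bad recovery at cut {cut}"
    assert replay(log) == payloads  # the uncut log recovers everything
    return True
```

Enumerating every cut point keeps the scenario deterministic: a failure names the exact offset, so it can be replayed as an ordinary test.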