ffsdb

rust engine

Inside the FFS graph, vector, and write-back engine.

FFS keeps typed records, relationship tables, vector indexes, provenance, MVCC, WAL/qlog recovery, and daemon commands close to one storage substrate and one transaction boundary.

Engine

Storage

Single-file pager, buffer pool, typed node groups, column chunks, relationship tables, catalog roots.

Durability

WAL/MVCC, qlog records, recovery, snapshots, logical export/import, manifest preflight.

Query

Cypher parser, planner, read/write executor, query params, metrics, EXPLAIN/PROFILE paths.

Indexes

B+-tree primary and secondary indexes, persistent HNSW, tombstone/upsert vector maintenance.

Planner / exec

Movement-aware planning tracks scan, projection, traversal, B+-tree/HNSW descent, materialization, hash join, distinct, order, and morsel costs.

Server

ffsd owns the database lock, verifies storage/qlog integrity, runs line-oriented Cypher queries, and exposes native load, metrics, snapshot, export, import, and compaction commands.

Bindings

Rust core first, Python bindings in tree, and product-surface crates for control, studio, and flow while the boundaries are still moving.

Verification

Current verified surface: 920 passing tests across storage, schema, indexes, execution, planner, Cypher, WAL, transactions, vector, GNN/provenance, ffsd, Python, bridge, bench, and simulation crates.

Operator Surface

ffsd wraps the same core

| Command family | What exists in tree |
| --- | --- |
| QUERY / QUERY_PARAMS | Line-oriented Cypher execution through ffsd, including parameter maps and metrics/profile paths. |
| APPLY_MUTATIONS / LOAD_MUTATIONS | Native durable batch ingest with syntax-and-semantics preflight and qlog receipts. |
| LOAD_NODES / LOAD_EDGES | Whole-file-preflighted CSV node and edge loads, including append and checkpointed resume variants. |
| LOAD_MANIFEST | Ordered native load plans with CSV batch defaults, DDL, assertions, verify gates, metrics, qlog status, snapshots, export/import, and compaction steps. |
| VERIFY / METRICS / QLOG_STATUS | Storage/qlog integrity checks, query latency buckets, exact rolling percentiles, error counters, buffer-pool counters, and replay/tail posture. |
| SNAPSHOT / EXPORT / IMPORT | Pager+qlog snapshots and logical qlog export/import with live-path and manifest collision preflight. |

Speed Mechanics

what the repo actually optimized

B+-tree hot path

Primary-index inserts descend over cached page bytes, try an in-place leaf write first, and only decode/split on overflow. README eval: 429 ns random insert, 182 ns lookup, and 796 ns 100-key range scan, versus sled at 1241 ns, 282 ns, and 3134 ns respectively.
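To make the "in-place first, split only on overflow" idea concrete, here is a minimal sketch. The page layout (a 2-byte entry count followed by sorted 16-byte key/value entries), the constants, and every function name are assumptions made for illustration; they are not FFS's actual on-disk format or API.

```rust
use std::cmp::Ordering;

// Hypothetical leaf layout (an assumption, not FFS's real page format):
//   bytes 0..2 = entry count (u16, little-endian)
//   bytes 2..  = sorted entries, each an 8-byte key plus an 8-byte value
const PAGE_SIZE: usize = 4096;
const HEADER: usize = 2;
const ENTRY: usize = 16;
const CAPACITY: usize = (PAGE_SIZE - HEADER) / ENTRY;

#[derive(Debug, PartialEq)]
pub enum LeafInsert {
    Inserted,
    Updated,
    /// The leaf is full; the caller must take the slower decode-and-split path.
    Overflow,
}

fn count(page: &[u8]) -> usize {
    u16::from_le_bytes([page[0], page[1]]) as usize
}

fn read_u64(page: &[u8], off: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&page[off..off + 8]);
    u64::from_le_bytes(buf)
}

pub fn value_at(page: &[u8], i: usize) -> u64 {
    read_u64(page, HEADER + i * ENTRY + 8)
}

/// Binary search straight over the page bytes, without decoding a node struct.
/// Returns Ok(slot) when the key exists, Err(slot) for the insertion point.
pub fn search(page: &[u8], key: u64) -> Result<usize, usize> {
    let (mut lo, mut hi) = (0, count(page));
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        match read_u64(page, HEADER + mid * ENTRY).cmp(&key) {
            Ordering::Less => lo = mid + 1,
            Ordering::Greater => hi = mid,
            Ordering::Equal => return Ok(mid),
        }
    }
    Err(lo)
}

/// Try to write the entry directly into the page bytes.
pub fn insert_in_place(page: &mut [u8], key: u64, value: u64) -> LeafInsert {
    let n = count(page);
    match search(page, key) {
        Ok(i) => {
            let off = HEADER + i * ENTRY + 8;
            page[off..off + 8].copy_from_slice(&value.to_le_bytes());
            LeafInsert::Updated
        }
        Err(_) if n == CAPACITY => LeafInsert::Overflow,
        Err(i) => {
            let off = HEADER + i * ENTRY;
            let end = HEADER + n * ENTRY;
            // Shift the tail right by one entry, then write the new entry.
            page.copy_within(off..end, off + ENTRY);
            page[off..off + 8].copy_from_slice(&key.to_le_bytes());
            page[off + 8..off + 16].copy_from_slice(&value.to_le_bytes());
            page[..2].copy_from_slice(&((n + 1) as u16).to_le_bytes());
            LeafInsert::Inserted
        }
    }
}
```

In this sketch the common case touches only the bytes of one cached page; `Overflow` is the signal to fall back to whatever split logic the real tree uses.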

HNSW scratch

Vector search uses epoch-stamped thread-local visited scratch instead of allocating a HashSet per search layer. README eval at 10K vectors, dim 128, k=10: 113 us/query with 0.966 recall@10 vs instant-distance at 298 us/query and 0.969 recall@10.
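The epoch-stamped scratch technique can be sketched as follows. The type and function names (`VisitedScratch`, `count_unique`) are hypothetical and the real FFS structure may differ; the point is only that starting a new search bumps a counter instead of clearing or reallocating a set.

```rust
use std::cell::RefCell;

/// Visited-set scratch: a node counts as visited when its stamp equals the
/// current epoch. Names and layout are illustrative assumptions.
pub struct VisitedScratch {
    stamps: Vec<u32>,
    epoch: u32,
}

impl VisitedScratch {
    pub fn new() -> Self {
        Self { stamps: Vec::new(), epoch: 0 }
    }

    /// Begin a new search over `n` nodes. O(1) unless the buffer must grow
    /// or the epoch counter wraps around.
    pub fn begin(&mut self, n: usize) {
        if self.stamps.len() < n {
            self.stamps.resize(n, 0);
        }
        self.epoch = self.epoch.wrapping_add(1);
        if self.epoch == 0 {
            // Counter wrapped: stale stamps could collide, so reset once.
            self.stamps.iter_mut().for_each(|s| *s = 0);
            self.epoch = 1;
        }
    }

    /// Mark `node` visited; returns true on the first visit in this epoch.
    pub fn insert(&mut self, node: usize) -> bool {
        let first = self.stamps[node] != self.epoch;
        self.stamps[node] = self.epoch;
        first
    }
}

thread_local! {
    // One scratch buffer per thread, reused across searches.
    static VISITED: RefCell<VisitedScratch> = RefCell::new(VisitedScratch::new());
}

/// Toy stand-in for a search layer: counts distinct candidates.
pub fn count_unique(n: usize, candidates: &[usize]) -> usize {
    VISITED.with(|cell| {
        let mut visited = cell.borrow_mut();
        visited.begin(n);
        candidates.iter().filter(|&&c| visited.insert(c)).count()
    })
}
```

Each membership check is a single indexed load and compare, with no hashing, and back-to-back searches on the same thread reuse the same allocation.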

CSR traversal

Relationship tables use gap-preserving CSR rel-lists with 1.1x growth. That makes appends slower than petgraph in the microbench, but 1-hop reads are 11 ns/op vs 102 ns/op because neighbors are a dense slice.
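One way to read "gap-preserving CSR with 1.1x growth" is sketched below: each source node owns a region of a shared target array with spare slots, and a full region is relocated with roughly 10% more capacity. `GapCsr`, `Region`, and the relocation policy are assumptions for illustration, not the layout of FFS's `RelTable`.

```rust
/// Per-node window into the shared `targets` array (hypothetical layout).
#[derive(Clone, Copy)]
struct Region {
    start: usize,
    len: usize,
    cap: usize,
}

pub struct GapCsr {
    targets: Vec<u64>,
    regions: Vec<Region>,
}

impl GapCsr {
    /// Reserve `initial_cap` slots per node so early appends land in the gap.
    pub fn new(num_nodes: usize, initial_cap: usize) -> Self {
        let regions = (0..num_nodes)
            .map(|i| Region { start: i * initial_cap, len: 0, cap: initial_cap })
            .collect();
        Self { targets: vec![0; num_nodes * initial_cap], regions }
    }

    pub fn add_edge(&mut self, src: usize, dst: u64) {
        let mut r = self.regions[src];
        if r.len == r.cap {
            // Region is full: grow by about 1.1x (at least one slot) and move
            // the list to the tail. The old slots become dead space here.
            let new_cap = (r.cap + r.cap / 10).max(r.cap + 1);
            let new_start = self.targets.len();
            self.targets.extend_from_within(r.start..r.start + r.len);
            self.targets.resize(new_start + new_cap, 0);
            r = Region { start: new_start, len: r.len, cap: new_cap };
        }
        self.targets[r.start + r.len] = dst;
        r.len += 1;
        self.regions[src] = r;
    }

    /// 1-hop read: a contiguous slice, with no pointer chasing per neighbor.
    pub fn neighbors(&self, src: usize) -> &[u64] {
        let r = self.regions[src];
        &self.targets[r.start..r.start + r.len]
    }
}
```

The sketch shows the trade-off the microbench describes: a small growth factor keeps the array dense but makes relocations, and therefore appends, more frequent, while reads stay a plain slice borrow.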

Multi-source BFS

cypher::query::bfs implements the multi-source BFS (MS-BFS) algorithm of Then et al. (2014) with u64 seed bitmaps. STATUS reports 14 ms for 15 seeds x depth 4 on a 35K-node graph vs 68.8 s for the per-seed/per-hop driver loop, and 129 ms on a 1M-node / 1.55M-edge scale run.
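The core of bitmap-based multi-source BFS fits in a few lines. The sketch below is a simplified illustration with a hypothetical `ms_bfs` function over plain adjacency lists; it is not the code in cypher::query::bfs. Bit i of each u64 stands for seed i, so one pass over an edge advances up to 64 traversals at once.

```rust
/// Returns, for every vertex, the bitmap of seeds that reach it within
/// `max_depth` hops (bit i set means seed i reaches the vertex).
pub fn ms_bfs(adj: &[Vec<usize>], seeds: &[usize], max_depth: usize) -> Vec<u64> {
    assert!(seeds.len() <= 64, "one u64 bit per seed");
    let n = adj.len();
    let mut seen = vec![0u64; n];
    let mut frontier = vec![0u64; n];
    let mut next = vec![0u64; n];

    for (i, &s) in seeds.iter().enumerate() {
        seen[s] |= 1u64 << i;
        frontier[s] |= 1u64 << i;
    }

    for _ in 0..max_depth {
        let mut advanced = false;
        for v in 0..n {
            let bits = frontier[v];
            if bits == 0 {
                continue;
            }
            for &w in &adj[v] {
                // Seeds whose frontier is at v and which have not yet seen w.
                let new_bits = bits & !seen[w];
                if new_bits != 0 {
                    seen[w] |= new_bits;
                    next[w] |= new_bits;
                    advanced = true;
                }
            }
        }
        if !advanced {
            break;
        }
        std::mem::swap(&mut frontier, &mut next);
        next.iter_mut().for_each(|b| *b = 0);
    }
    seen
}
```

For example, on the graph 0→1→2→3 plus 4→2 with seeds `[0, 4]` and depth 2, vertex 2 ends up with both bits set because each seed reaches it within two hops.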

Planner cost

The planner keeps materialized B+-tree and HNSW probe work visible through LIMIT, charges label-cardinality-aware descent, narrows projections before adjacency expansion, and has observed-aware binary/hybrid join ordering hooks.

Write-back durability

Native mutation batches, Cypher writes, vector tombstones/upserts, secondary-index updates, feed checkpoints, and provenance write-back all ride the same qlog/WAL posture instead of app-side reconciliation across stores.

Why Rust

storage code, not glue code

No GC pause budget

A storage engine touches pages, indexes, buffers, WAL records, and vector graph structures on hot paths. The goals doc names memory safety without GC pauses as a reason for the Rust core.

Page-byte control

The B+-tree speedup came from buffer-pool integration, decode-less descent over cached page bytes, and in-place leaf insert when the leaf does not split.

Scratch discipline

The HNSW query speedup came from replacing per-call visited HashSet tracking with an epoch-stamped scratch vector: one allocation up front, O(1) checks, no hashing per visit.

Unsafe has to earn it

The workspace denies unsafe operations in unsafe functions, warns on undocumented unsafe blocks, and the goals doc says every unsafe block needs a safety proof.
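As an illustration of what that policy looks like in code, the sketch below applies the rustc lint `unsafe_op_in_unsafe_fn` and the Clippy lint `clippy::undocumented_unsafe_blocks` at item level. Whether the FFS workspace configures exactly these lints, and at which level, is an assumption; the helper functions are hypothetical.

```rust
/// Reads a little-endian u32 at `offset` without bounds checks.
///
/// # Safety
/// The caller must guarantee that `offset + 4 <= page.len()`.
#[deny(unsafe_op_in_unsafe_fn)]
#[warn(clippy::undocumented_unsafe_blocks)]
pub unsafe fn read_u32_unchecked(page: &[u8], offset: usize) -> u32 {
    // SAFETY: the caller guarantees the 4-byte range is in bounds, and
    // `read_unaligned` places no alignment requirement on the pointer.
    let raw = unsafe { page.as_ptr().add(offset).cast::<u32>().read_unaligned() };
    u32::from_le(raw)
}

/// Safe wrapper that establishes the precondition before the unsafe call.
#[warn(clippy::undocumented_unsafe_blocks)]
pub fn read_u32(page: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    if end > page.len() {
        return None;
    }
    // SAFETY: `offset + 4 <= page.len()` was checked just above.
    Some(unsafe { read_u32_unchecked(page, offset) })
}
```

With the first lint denied, the body of an `unsafe fn` no longer counts as an implicit unsafe block, so every unsafe operation needs its own block and, under the second lint, its own `SAFETY` comment.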

Monorepo Surface

Core ffs / ffsd

storage, WAL, planner, execution, graph/vector indexes, server daemon

Control ffs-control

projects, endpoints, branches, deployment metadata, backups, restores

Studio ffs-studio

query, catalog, lineage, dashboards, query history, first wire commands

Flow ffs-flow

pipelines, retrieval, features, models, agent write-back loops

Rust Example

same engine, same relationship table

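use ffs::cypher::{compile_query, parse};
use ffs::exec::collect_column;
use ffs::planner::{compile, CompileCtx};
use ffs::storage::RelTable;

// Wrapped in `main` so the `?` operators have a Result to return into.
// Assumes the ffs error types implement std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build a KNOWS relationship table and add two edges to it.
    let mut knows = RelTable::new(0, "KNOWS", 4, 4);
    knows.add_edge(0, 0, 1, 1);
    knows.add_edge(0, 0, 2, 2);

    // Parse the Cypher text and lower it to a plan.
    let q = parse("MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN b")?;
    let plan = compile_query(&q, |_| vec![0, 1, 2, 3]);

    // Bind the relationship table, then compile the plan into an operator.
    let ctx = CompileCtx::empty().with_rel_table("KNOWS", &knows);
    let mut op = compile(plan, &ctx)?;

    // Drain column 0 of the operator's output (the `b` binding).
    let _neighbours = collect_column(&mut *op, 0);
    Ok(())
}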
