Every number, wins and losses.

cargo run -p ffs-evals --release runs every scenario every time and prints both faster and slower paths. Numbers below are from a single run on Apple M-series silicon; machine spec, methodology, and the rationale for each comparator live in ffs-evals/src/main.rs. Losses are published in the same tables as the wins.

Headline results

Comparator | Shape | Measured path | Result
sled | embedded KV | insert / lookup / 100-key range scan | ffs 2.89x / 1.55x / 3.94x
instant-distance | embedded HNSW | build / query, dim 128, k=10 | ffs 1.13x / 2.64x
petgraph | in-memory graph | 1-hop / 2-hop BFS / edge append | ffs 9.28x / 1.43x · append 9.64x slower
Postgres + pgvector | server vector DB | HNSW query, same algorithm and parameters | ffs 3.01x
LanceDB | embedded vector DB | query vs auto IVF-PQ index / vs brute force | ffs 3.03x / 4.78x
Kùzu | embedded graph DB | bulk load / 1-hop: native API vs Cypher, stack cost* | ffs 18.5x / ~29,000x*
Neo4j | server graph DB | bulk load / 1-hop: native API vs Bolt, stack cost* | ffs ~1,200x / ~195,000x*

* The starred ratios measure architectural stack cost rather than engine parity. "The loud numbers" below explains exactly what they do and do not show.

vs sled

Op | ffs::BTree | sled | Result
insert (random u64) | 429 ns/op | 1241 ns/op | ffs 2.89x
lookup (random u64) | 182 ns/op | 282 ns/op | ffs 1.55x
range scan (100-key) | 796 ns/op | 3134 ns/op | ffs 3.94x

100,000 random inserts and lookups, 1,000 range scans, fresh DB per run. The FFS tree is backed by a 4096-page buffer pool; reads and writes operate directly on cached page bytes with no node materialization on the hot paths. The engine page walks through the four changes that took inserts from 4676 ns to 429 ns.
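A minimal sketch of what operating "directly on cached page bytes" can look like, assuming a hypothetical leaf layout: a little-endian u16 key count followed by sorted 16-byte key/value entries. FFS's real page format is not described here, and `leaf_get`, `leaf_put`, and the layout constants are illustrative names only. Lookups binary-search keys decoded in place, and inserts open a slot by shifting the tail of the page with one `copy_within`, so no node struct is ever built.

```rust
use std::cmp::Ordering;

// Hypothetical leaf layout (illustration only, not FFS's format):
// [u16 key count][entry 0][entry 1]..., each entry = u64 key + u64 value,
// little-endian, kept sorted by key.
const PAGE_SIZE: usize = 4096;
const HEADER: usize = 2;
const ENTRY: usize = 16;

fn read_u64(page: &[u8], off: usize) -> u64 {
    let mut b = [0u8; 8];
    b.copy_from_slice(&page[off..off + 8]);
    u64::from_le_bytes(b)
}

fn write_u64(page: &mut [u8], off: usize, v: u64) {
    page[off..off + 8].copy_from_slice(&v.to_le_bytes());
}

fn count(page: &[u8]) -> usize {
    u16::from_le_bytes([page[0], page[1]]) as usize
}

fn key_at(page: &[u8], i: usize) -> u64 {
    read_u64(page, HEADER + i * ENTRY)
}

/// Binary search over keys decoded in place: Ok(slot) if present,
/// Err(insertion point) otherwise. No node struct is materialized.
fn search(page: &[u8], key: u64) -> Result<usize, usize> {
    let (mut lo, mut hi) = (0, count(page));
    while lo < hi {
        let mid = (lo + hi) / 2;
        match key_at(page, mid).cmp(&key) {
            Ordering::Less => lo = mid + 1,
            Ordering::Greater => hi = mid,
            Ordering::Equal => return Ok(mid),
        }
    }
    Err(lo)
}

fn leaf_get(page: &[u8; PAGE_SIZE], key: u64) -> Option<u64> {
    search(page, key).ok().map(|i| read_u64(page, HEADER + i * ENTRY + 8))
}

/// Insert or overwrite directly in the page bytes. Returns false when the
/// leaf is full; a real B-tree would split the page at that point.
fn leaf_put(page: &mut [u8; PAGE_SIZE], key: u64, value: u64) -> bool {
    match search(page, key) {
        Ok(i) => write_u64(page, HEADER + i * ENTRY + 8, value),
        Err(i) => {
            let n = count(page);
            if HEADER + (n + 1) * ENTRY > PAGE_SIZE {
                return false;
            }
            let at = HEADER + i * ENTRY;
            // Shift entries i..n right by one entry to open a slot.
            page.copy_within(at..HEADER + n * ENTRY, at + ENTRY);
            write_u64(page, at, key);
            write_u64(page, at + 8, value);
            page[0..2].copy_from_slice(&((n + 1) as u16).to_le_bytes());
        }
    }
    true
}
```

With this layout a 4096-byte page holds 255 entries; splitting and buffer-pool pinning are left out so the hot path stays visible.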

vs instant-distance

Op | ffs::Hnsw | instant-distance | Result
build (per insert) | 202 µs/op | 228 µs/op | ffs 1.13x
query (per search) | 113 µs/op | 298 µs/op | ffs 2.64x
recall@10 | 0.966 | 0.969 | comparable

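10,000 random vectors at dim 128, M=16, k=10, recall measured against an exact brute-force oracle over 1,000 queries. The speedup comes from the epoch-stamped visited scratch: the per-search visited set is reused and reset by bumping an epoch counter instead of being cleared. Recall is essentially unchanged (0.966 vs 0.969).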
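One common way to build an epoch-stamped visited scratch, sketched under the assumption that node ids are dense indices; `VisitedScratch` and `reachable_count` are hypothetical names, not the FFS API. Each slot records the epoch in which its node was last visited, so starting a new search is a counter increment instead of clearing or reallocating a set, and a full clear is needed only when the u32 counter wraps.

```rust
/// Visited set reused across searches. Each slot stores the epoch in which
/// its node was last visited; a slot counts as visited only if its stamp
/// equals the current epoch, so "clearing" the set is one increment.
struct VisitedScratch {
    stamps: Vec<u32>,
    epoch: u32,
}

impl VisitedScratch {
    fn new(capacity: usize) -> Self {
        // Stamps start at 0 and live epochs are always >= 1,
        // so fresh slots never look visited.
        VisitedScratch { stamps: vec![0; capacity], epoch: 1 }
    }

    /// Start a new search over `n` nodes.
    fn begin(&mut self, n: usize) {
        if self.stamps.len() < n {
            self.stamps.resize(n, 0);
        }
        self.epoch = self.epoch.wrapping_add(1);
        if self.epoch == 0 {
            // Counter wrapped: old stamps could collide with new epochs,
            // so pay for one full clear roughly every 4 billion searches.
            self.stamps.iter_mut().for_each(|s| *s = 0);
            self.epoch = 1;
        }
    }

    /// Marks `id` visited; returns true if it had not been visited this epoch.
    fn insert(&mut self, id: usize) -> bool {
        if self.stamps[id] == self.epoch {
            false
        } else {
            self.stamps[id] = self.epoch;
            true
        }
    }
}

/// Hypothetical traversal showing one scratch reused across searches. An HNSW
/// layer search likewise checks each neighbour against a visited set before
/// expanding it, though it orders candidates by distance rather than a stack.
fn reachable_count(adj: &[Vec<usize>], start: usize, seen: &mut VisitedScratch) -> usize {
    seen.begin(adj.len());
    seen.insert(start);
    let mut stack = vec![start];
    let mut count = 0;
    while let Some(v) = stack.pop() {
        count += 1;
        for &n in &adj[v] {
            if seen.insert(n) {
                stack.push(n);
            }
        }
    }
    count
}
```

The design choice is to trade 4 bytes per node for an O(1) reset, which pays off when many short searches each touch only a small fraction of the graph.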

vs petgraph

Op | ffs::Csr | petgraph | Result
append edge | 30 ns/op | 3 ns/op | petgraph 9.64x
1-hop out-neighbours | 11 ns/op | 102 ns/op | ffs 9.28x
2-hop BFS reachable set | 1137 ns/op | 1630 ns/op | ffs 1.43x

50,000 vertices, 500,000 random directed edges, 1,000 traversal sources. The intended tradeoff is visible in the first row: FFS pays at append time for gap-preserving rel-list bookkeeping and recoups that cost on every read, because each vertex's neighbors form a dense slice with cache locality an edge list cannot match.
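A minimal sketch of a CSR-style adjacency that keeps slack after each vertex's neighbour run, assuming a simple relocate-and-double policy when a run fills up; `GappedCsr` is a hypothetical name and FFS's actual rel-list bookkeeping may differ. It shows why reads are cheap (`neighbors` returns one contiguous slice) and why appends cost more than pushing onto an edge list (capacity checks, occasional copying, and abandoned holes a real engine would compact).

```rust
/// Per-vertex run: where its neighbours start, how many are live,
/// and how much room (live entries plus gap) it owns.
#[derive(Clone, Copy)]
struct Slot {
    start: usize,
    len: usize,
    cap: usize,
}

/// Hypothetical gap-preserving CSR: all neighbour runs share one Vec.
struct GappedCsr {
    slots: Vec<Slot>,
    targets: Vec<u32>,
}

impl GappedCsr {
    fn new(vertices: usize, initial_gap: usize) -> Self {
        let slots = (0..vertices)
            .map(|v| Slot { start: v * initial_gap, len: 0, cap: initial_gap })
            .collect();
        GappedCsr { slots, targets: vec![0; vertices * initial_gap] }
    }

    /// Append src -> dst. Cheap while the run still has gap space; otherwise
    /// the run moves to the end of `targets` with doubled capacity, leaving
    /// a hole behind (a real engine would compact or rebuild later).
    fn append(&mut self, src: usize, dst: u32) {
        let s = self.slots[src];
        if s.len == s.cap {
            let new_cap = (s.cap * 2).max(1);
            let new_start = self.targets.len();
            self.targets.resize(new_start + new_cap, 0);
            self.targets.copy_within(s.start..s.start + s.len, new_start);
            self.slots[src] = Slot { start: new_start, len: s.len, cap: new_cap };
        }
        let s = &mut self.slots[src];
        self.targets[s.start + s.len] = dst;
        s.len += 1;
    }

    /// 1-hop out-neighbours: one contiguous slice, no pointer chasing.
    fn neighbors(&self, src: usize) -> &[u32] {
        let s = self.slots[src];
        &self.targets[s.start..s.start + s.len]
    }
}
```

Because `neighbors` hands back a borrowed slice, a 1-hop read is a bounds check plus a sequential scan, which is the access pattern the read rows above reward.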

vs Postgres + pgvector

Approach | Build path | Query latency
ffs::Hnsw | per-insert HNSW graph build | 126 µs/query
pgvector, HNSW index | 585 µs/insert + 5.75 s index build | 380 µs/query
pgvector, no index | 9.5 µs/insert | 1877 µs/query

Postgres 17 + pgvector over the default Unix socket, same HNSW algorithm, same M=16 and ef_construction=200 on both sides. This is the cleanest apples-to-apples vector number in the matrix: the 3.01x gap is the SQL parse, libpq round-trip, and row deserialization between the application and the index — same algorithm, different stack shape.

vs LanceDB

Approach | Build path | Query latency
ffs::Hnsw | per-insert HNSW graph build | 115 µs/query
LanceDB, auto IVF-PQ index | 17.6 µs/insert avg + 0.17 s index build | 350 µs/query
LanceDB, brute scan | 523 ns/insert, no index | 550 µs/query

10,000 vectors at dim 128, 200 queries at k=10. LanceDB's bulk load is much cheaper because it builds no index inline; FFS pays at insert time, so every vector is searchable through the index as soon as it is inserted. Which side of that tradeoff matters depends on the workload.

The loud numbers

Op | ffs::Csr | Kùzu | Neo4j (Bolt)
bulk load (per edge) | 21–22 ns/op | 381 ns/op | 26,359 ns/op
1-hop out-neighbours | 10 ns/op | 289,462 ns/op | 1,877,869 ns/op

5,000 vertices, 25,000 edges, 200 traversal sources. These ratios are loud and the README names what they actually measure. FFS here is a native Rust call — csr.neighbors(src) is a slice iterator. Kùzu runs each query through ANTLR parse, binder, planner, executor, and FFI marshalling even with a prepared statement; Neo4j adds a Bolt server round-trip on top. That stack is the cost real applications pay on every query, and FFS's collapsed-into-one-process design avoids it by construction. The apples-to-apples follow-up — FFS Cypher over stored graph data vs Kùzu Cypher — is named in the README as the next comparison to add. Until then, read these rows as stack-overhead evidence and remaining headroom, never as a sibling-database score.

Predicate-aware retrieval

Selectivity | Strategy | Latency | Recall@10
10% | post-filter, 10x oversample | 125 µs/query | 85%
10% | predicate-aware ANN | 618 µs/query | 100%
1% | post-filter, 10x oversample | 126 µs/query | 11%
1% | predicate-aware ANN | 2535 µs/query | 100%

20,000 nodes at dim 128, graph predicate dept_id == 0, top-10 nearest neighbours among matching nodes. The honest read: at low selectivity, post-filtering isn't faster, it's wrong. With k=10 and 10x oversampling the post-filter inspects 100 candidates, and at 1% selectivity only about one of them is expected to match, so 11% recall means the user gets roughly one of the ten results they asked for. Matching predicate-aware recall would need ~100x oversampling (about 1,000 candidates to expect ten matches), which closes the latency gap anyway. The 2.5 ms query is the cost of correctness, paid once, inside one engine; a three-system stack pays serialization, network, and a join on ID on every call and still gets the answer wrong.
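A small sketch of the two strategies, using exact brute-force search as a stand-in for the ANN index. This is an assumption for illustration: a real predicate-aware ANN applies the filter during index traversal rather than over a full scan. The generated vectors, the `i % 100 == 0` predicate standing in for dept_id == 0, and every function name are hypothetical.

```rust
/// Squared Euclidean distance.
fn dist2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| { let d = x - y; d * d }).sum()
}

/// Exact ranking of the candidate ids by distance, ties broken by id.
fn rank(vectors: &[Vec<f32>], query: &[f32], ids: impl Iterator<Item = usize>) -> Vec<usize> {
    let mut scored: Vec<(f32, usize)> = ids.map(|i| (dist2(&vectors[i], query), i)).collect();
    scored.sort_by(|a, b| a.0.total_cmp(&b.0).then(a.1.cmp(&b.1)));
    scored.into_iter().map(|(_, i)| i).collect()
}

/// Post-filter: take the k * oversample nearest nodes overall, then drop the
/// ones failing the predicate. Can return fewer than k results.
fn post_filter(vectors: &[Vec<f32>], query: &[f32], k: usize, oversample: usize,
               pred: impl Fn(usize) -> bool) -> Vec<usize> {
    let mut out: Vec<usize> = rank(vectors, query, 0..vectors.len())
        .into_iter()
        .take(k * oversample)
        .filter(|&i| pred(i))
        .collect();
    out.truncate(k);
    out
}

/// Predicate-aware: only matching nodes compete for the k slots.
fn predicate_aware(vectors: &[Vec<f32>], query: &[f32], k: usize,
                   pred: impl Fn(usize) -> bool) -> Vec<usize> {
    let mut out = rank(vectors, query, (0..vectors.len()).filter(|&i| pred(i)));
    out.truncate(k);
    out
}

/// Fraction of the true top-k that was returned.
fn recall(found: &[usize], truth: &[usize]) -> f32 {
    found.iter().filter(|&&i| truth.contains(&i)).count() as f32 / truth.len() as f32
}

/// Deterministic pseudo-random vectors in [0, 1) from a 64-bit LCG.
fn make_vectors(n: usize, dim: usize, mut seed: u64) -> Vec<Vec<f32>> {
    let mut out = Vec::with_capacity(n);
    for _ in 0..n {
        let mut v = Vec::with_capacity(dim);
        for _ in 0..dim {
            seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
            v.push((seed >> 40) as f32 / (1u64 << 24) as f32);
        }
        out.push(v);
    }
    out
}
```

Because stage one is exact here, post-filter results are always a prefix of the true filtered top-k, so the shortfall shows up purely as missing results; once oversampling covers every node, the two strategies agree.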

Coverage

Comparator | Shape | Status
sled | embedded KV | measured: faster on all three ops
instant-distance | embedded HNSW | measured: faster on build and query
petgraph | in-memory graph | measured: faster on reads, slower on appends
Kùzu | embedded graph DB | measured: native API vs Cypher, stack cost
LanceDB | embedded vector DB | measured: query 3.03x faster than the indexed path, 4.78x faster than brute force
Neo4j | server graph DB | measured: native API vs Bolt, stack cost
Postgres + pgvector | server vector DB | measured: query 3.01x faster, same algorithm and parameters
Memgraph / FalkorDB | server graph | deferred: needs Docker
Postgres + AGE | server graph extension | deferred: needs a source build

Where FFS is slower, the harness measures it. Where a comparator cannot run yet, the table says so. The aim is a comparator matrix covering graph, vector, and typed storage paths rather than one hand-picked headline number.

ffsdb.com (private preview)