Every number, wins and losses.

cargo run -p ffs-evals --release runs every scenario every time and prints both faster and slower paths. Numbers below are from a single run on Apple M-series silicon; machine spec, methodology, and the rationale for each comparator live in ffs-evals/src/main.rs. Losses are published in the same tables as the wins.

Headline results

Comparator	Shape	Measured path	Result
sled	embedded KV	insert / lookup / 100-key range scan	ffs 2.89x / 1.55x / 3.94x
instant-distance	embedded HNSW	build / query, dim 128, k=10	ffs 1.13x / 2.64x
petgraph	in-memory graph	1-hop / 2-hop BFS / edge append	ffs 9.28x / 1.43x · append 9.64x slower
Postgres + pgvector	server vector DB	HNSW query, same algorithm and parameters	ffs 3.01x
LanceDB	embedded vector DB	query vs auto IVF-PQ index / vs brute force	ffs 3.03x / 4.78x
Kùzu	embedded graph DB	bulk load / 1-hop — native API vs Cypher, stack cost*	ffs 18.5x / ~29,000x*
Neo4j	server graph DB	bulk load / 1-hop — native API vs Bolt, stack cost*	ffs ~1,200x / ~195,000x*

* The starred ratios measure architectural stack cost rather than engine parity. "The loud numbers" below explains exactly what they do and do not show.

vs sled

Op	ffs::BTree	sled	Result
insert (random u64)	429 ns/op	1241 ns/op	ffs 2.89x
lookup (random u64)	182 ns/op	282 ns/op	ffs 1.55x
range scan (100-key)	796 ns/op	3134 ns/op	ffs 3.94x

100,000 random inserts and lookups, 1,000 range scans, fresh DB per run. The FFS tree is backed by a 4096-page buffer pool; reads and writes operate directly on cached page bytes with no node materialization on the hot paths. The engine page walks through the four changes that took inserts from 4676 ns to 429 ns.

vs instant-distance

Op	ffs::Hnsw	instant-distance	Result
build (per insert)	202 µs/op	228 µs/op	ffs 1.13x
query (per search)	113 µs/op	298 µs/op	ffs 2.64x
recall@10	0.966	0.969	comparable

10,000 random vectors at dim 128, M=16, k=10, recall measured against an exact brute-force oracle over 1,000 queries. The speedup is the epoch-stamped visited-scratch trick; recall stays even.

vs petgraph

Op	ffs::Csr	petgraph	Result
append edge	30 ns/op	3 ns/op	petgraph 9.64x
1-hop out-neighbours	11 ns/op	102 ns/op	ffs 9.28x
2-hop BFS reachable set	1137 ns/op	1630 ns/op	ffs 1.43x

50,000 vertices, 500,000 random directed edges, 1,000 traversal sources. The intended tradeoff is visible in the first row: FFS pays at append time for gap-preserving rel-list bookkeeping and collects on every read, because neighbors are a dense slice with cache locality an edge-list cannot match.

vs Postgres + pgvector

Approach	Build path	Query latency
ffs::Hnsw	per-insert HNSW graph build	126 µs/query
pgvector, HNSW index	585 µs/insert + 5.75 s index build	380 µs/query
pgvector, no index	9.5 µs/insert	1877 µs/query

Postgres 17 + pgvector over the default Unix socket, same HNSW algorithm, same M=16 and ef_construction=200 on both sides. This is the cleanest apples-to-apples vector number in the matrix: the 3.01x gap is the SQL parse, libpq round-trip, and row deserialization between the application and the index — same algorithm, different stack shape.

vs LanceDB

Approach	Build path	Query latency
ffs::Hnsw	per-insert HNSW graph build	115 µs/query
LanceDB, auto IVF-PQ index	17.6 µs/insert avg + 0.17 s index	350 µs/query
LanceDB, brute scan	523 ns/insert, no index	550 µs/query

10,000 vectors at dim 128, 200 queries at k=10. LanceDB's bulk load is much cheaper because it builds no index inline; FFS pays at insert time and queries are quick from the first insert. Which side of that tradeoff matters depends on the workload.

The loud numbers

Op	ffs::Csr	Kùzu	Neo4j (Bolt)
bulk load (per edge)	21–22 ns/op	381 ns/op	26,359 ns/op
1-hop out-neighbours	10 ns/op	289,462 ns/op	1,877,869 ns/op

5,000 vertices, 25,000 edges, 200 traversal sources. These ratios are loud and the README names what they actually measure. FFS here is a native Rust call — csr.neighbors(src) is a slice iterator. Kùzu runs each query through ANTLR parse, binder, planner, executor, and FFI marshalling even with a prepared statement; Neo4j adds a Bolt server round-trip on top. That stack is the cost real applications pay on every query; FFS runs in one process and skips it. The apples-to-apples follow-up — FFS Cypher over stored graph data vs Kùzu Cypher — is named in the README as the next comparison to add. Until then, read these rows as stack-overhead evidence and remaining headroom, not as a sibling-database score.

Predicate-aware retrieval

Selectivity	Strategy	Latency	Recall@10
10%	post-filter, 10x oversample	125 µs/query	85%
10%	predicate-aware ANN	618 µs/query	100%
1%	post-filter, 10x oversample	126 µs/query	11%
1%	predicate-aware ANN	2535 µs/query	100%

20,000 nodes at dim 128, graph predicate dept_id == 0, top-10 nearest neighbours among matching nodes. At 1% selectivity, post-filtered recall drops to 11%: the user gets one of the ten results they asked for. Matching predicate-aware recall would take roughly 100x oversampling, which closes the latency gap anyway. The 2.5 ms query is what a correct answer costs on this workload.

Coverage

Comparator	Shape	Status
sled	embedded KV	measured — faster on all three ops
instant-distance	embedded HNSW	measured — faster on build and query
petgraph	in-memory graph	measured — faster on reads, slower on appends
Kùzu	embedded graph DB	measured — native API vs Cypher, stack cost
LanceDB	embedded vector DB	measured — query 3x faster than indexed path
Neo4j	server graph DB	measured — native API vs Bolt, stack cost
Postgres + pgvector	server vector DB	measured — 3.01x, same algorithm and parameters
Memgraph / FalkorDB	server graph	deferred — needs Docker
Postgres + AGE	server graph extension	deferred — needs a source build

Slower paths get measured and published; comparators that cannot run yet are marked deferred. The aim is a full matrix across graph, vector, and typed storage paths.

ffsdb.com — private preview next: comparisons →