ADR-0007: Oxigraph as default triple store

Status: Accepted
Date: 2026-04-12

Context

Trails needs a default triple store that works out of the box, with zero ops, for development and for small-to-medium production deployments. Candidates:

Apache Jena + TDB2. Mature, feature-complete. JVM dependency, slow startup, heavyweight for embedded use.
Blazegraph. High-performance; proven at Wikidata scale. Upstream development has effectively stopped; no active maintenance.
Fuseki. Jena-based server; more an HTTP-fronted Jena than a fresh engine. Same JVM concerns.
Neptune. Managed AWS service. Not embeddable, not OSS.
GraphDB. Commercial, though a free tier exists. Not OSS; not embeddable.
Qlever. C++-based, high-performance, strong for large read-heavy datasets (Wikidata-scale). Remote-only, complex setup.
Oxigraph. Rust-native, embeddable, SPARQL 1.1-compliant, dual-licensed MIT/Apache 2.0, actively maintained (Thomas Pellissier Tanon + contributors). Can run as a library or as a server.
pyoxigraph. Python bindings (built with PyO3) over Oxigraph; not a separate engine, but it already solves part of our problem.

Requirements for the v1 default:
- Embeddable (no separate service in dev).
- Apache 2.0 or compatible license.
- SPARQL 1.1-compliant.
- Fast enough for ≤ 10M triples on commodity hardware.
- Actively maintained.
- Rust-native (no cross-language friction with our kernel).

Decision

Oxigraph is the default GraphStore backend.

Embedded by default (trails dev runs Oxigraph in-process with RocksDB persistence, or in-memory for ephemeral use).
Oxigraph server mode for single-node prod deployments.
Remote-store adapters (Qlever, Fuseki) shipped as separate crates for scale-out scenarios.
Neptune and GraphDB as community-contributed adapters (not core).
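A minimal sketch of how a deployment might select among the modes above. `BackendConfig`, `parse_backend`, and the `memory` / `rocksdb:<dir>` / `<adapter>+<url>` spec syntax are hypothetical illustrations, not the trails-graph API; actually opening an Oxigraph store is left out so the example runs with the standard library only.

```rust
use std::path::PathBuf;

/// Hypothetical backend selection; not the actual trails-graph API.
#[derive(Debug, Clone, PartialEq)]
pub enum BackendConfig {
    /// Oxigraph in-process: `None` = in-memory (ephemeral), `Some(dir)` = RocksDB on disk.
    Embedded { data_dir: Option<PathBuf> },
    /// Single-node Oxigraph server mode, reached over HTTP.
    OxigraphServer { endpoint: String },
    /// Remote adapter shipped as a separate crate (Qlever or Fuseki in this sketch).
    Remote { adapter: String, endpoint: String },
}

impl BackendConfig {
    /// True when the store runs inside the Trails process (no external service).
    pub fn is_in_process(&self) -> bool {
        matches!(self, BackendConfig::Embedded { .. })
    }
}

/// Parses a hypothetical spec string: `memory`, `rocksdb:<dir>`,
/// `oxigraph+<url>`, or `<adapter>+<url>` for the separately shipped adapters.
pub fn parse_backend(spec: &str) -> Result<BackendConfig, String> {
    if spec == "memory" {
        return Ok(BackendConfig::Embedded { data_dir: None });
    }
    if let Some(dir) = spec.strip_prefix("rocksdb:") {
        if dir.is_empty() {
            return Err("rocksdb: needs a data directory".to_string());
        }
        return Ok(BackendConfig::Embedded { data_dir: Some(PathBuf::from(dir)) });
    }
    if let Some(url) = spec.strip_prefix("oxigraph+") {
        return Ok(BackendConfig::OxigraphServer { endpoint: url.to_string() });
    }
    match spec.split_once('+') {
        // Only the adapters this sketch treats as first-party; community adapters
        // (e.g. Neptune, GraphDB) would be registered outside core and are rejected here.
        Some((adapter @ ("qlever" | "fuseki"), url)) => Ok(BackendConfig::Remote {
            adapter: adapter.to_string(),
            endpoint: url.to_string(),
        }),
        Some((other, _)) => Err(format!("adapter `{other}` is not built into core")),
        None => Err(format!("unrecognised backend spec `{spec}`")),
    }
}
```

Keeping the choice in one value means `trails dev` and a production deployment differ only in configuration, which is the property the decision relies on.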

Why Oxigraph specifically:
- Rust-native: zero FFI cost from our kernel.
- SPARQL 1.1-compliant: covers our query needs.
- Dual-licensed MIT/Apache 2.0: compatible with Apache 2.0 distribution.
- Actively maintained.
- Both embedded and server modes: the same codebase scales from dev to prod.
- Already Python-exposed (pyoxigraph): precedent for the pattern Trails adopts.

Note: Trails uses Oxigraph as a Rust dependency via trails-graph, not pyoxigraph directly; the Python surface accesses Oxigraph exclusively through the PyO3 FFI layer (trails-ffi).

Consequences

Positive

Zero-ops dev experience. trails dev needs nothing external.
No FFI cost between kernel and graph store — both are Rust.
Single-binary distribution possible in principle (static link), subject to the RocksDB toolchain caveat under Negative.
Migration path to Qlever/Fuseki for read-heavy or federated workloads via trait swap.
License compatibility with Apache 2.0 framework.
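The trait swap behind the migration path can be sketched in a few lines. This is a toy under loud assumptions: the `GraphStore` shown here (triple patterns with wildcards instead of SPARQL) and the two std-only backends `VecStore` and `SetStore` are hypothetical stand-ins, not the real trait or the Oxigraph/Qlever adapters. The point is the shape: application code is generic over the trait, so swapping backends changes construction, not call sites.

```rust
use std::collections::BTreeSet;

type Triple = (String, String, String);

/// Hypothetical, heavily simplified stand-in for the GraphStore trait.
pub trait GraphStore {
    fn insert(&mut self, s: &str, p: &str, o: &str);
    /// `None` in any position acts as a wildcard; results come back sorted.
    fn matching(&self, s: Option<&str>, p: Option<&str>, o: Option<&str>) -> Vec<Triple>;
}

fn fits(t: &Triple, s: Option<&str>, p: Option<&str>, o: Option<&str>) -> bool {
    s.map_or(true, |v| t.0 == v) && p.map_or(true, |v| t.1 == v) && o.map_or(true, |v| t.2 == v)
}

/// Backend A: unsorted vector with set semantics enforced on insert.
#[derive(Default)]
pub struct VecStore {
    triples: Vec<Triple>,
}

impl GraphStore for VecStore {
    fn insert(&mut self, s: &str, p: &str, o: &str) {
        let t = (s.to_string(), p.to_string(), o.to_string());
        if !self.triples.contains(&t) {
            self.triples.push(t); // an RDF graph is a set of triples
        }
    }
    fn matching(&self, s: Option<&str>, p: Option<&str>, o: Option<&str>) -> Vec<Triple> {
        let mut out: Vec<Triple> = self.triples.iter().filter(|t| fits(t, s, p, o)).cloned().collect();
        out.sort(); // same ordering as backend B, so callers cannot tell them apart
        out
    }
}

/// Backend B: ordered set, a different engine behind the same trait.
#[derive(Default)]
pub struct SetStore {
    triples: BTreeSet<Triple>,
}

impl GraphStore for SetStore {
    fn insert(&mut self, s: &str, p: &str, o: &str) {
        self.triples.insert((s.to_string(), p.to_string(), o.to_string()));
    }
    fn matching(&self, s: Option<&str>, p: Option<&str>, o: Option<&str>) -> Vec<Triple> {
        self.triples.iter().filter(|t| fits(t, s, p, o)).cloned().collect()
    }
}

/// Application code: written once against the trait, unaware of the backend.
pub fn friends_of<S: GraphStore>(store: &S, person: &str) -> Vec<String> {
    store
        .matching(Some(person), Some("ex:knows"), None)
        .into_iter()
        .map(|(_, _, o)| o)
        .collect()
}
```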

Negative

Oxigraph's scale ceiling is lower than Qlever's for very large datasets (> 100M triples, Wikidata-class). Mitigated by the Qlever adapter for those workloads.
Smaller community than Jena. Mitigated by active maintenance and Rust ecosystem growth.
No clustering in Oxigraph itself — multi-node writes require application-level partitioning. Mitigated by named-graph sharding conventions.
Persistence backend choice in Oxigraph (RocksDB on-disk vs. in-memory) has performance and durability implications — requires benchmarking on representative workloads.
RocksDB toolchain dependency. RocksDB 10.7+ requires a C++20 compiler, which complicates static single-binary distribution. Mitigation: ship a Dockerfile as the dev distribution unit until a pure-Rust on-disk backend is viable.
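One way to read "named-graph sharding conventions" is deterministic routing of each named graph to a single writer node. The sketch assumes a hypothetical IRI convention, `urn:trails:<tenant>:<graph>`, and shards by tenant so a tenant's graphs stay co-located; any other IRI is hashed whole. It uses FNV-1a rather than std's `DefaultHasher`, because the latter's algorithm is not guaranteed stable across Rust releases and a persisted routing scheme must not move graphs on a toolchain upgrade.

```rust
/// FNV-1a, 64-bit. Its output is fixed by definition,
/// so shard assignments survive compiler and std upgrades.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hypothetical convention: `urn:trails:<tenant>:<graph>` shards by tenant;
/// any other graph IRI is its own shard key.
pub fn shard_key(graph_iri: &str) -> &str {
    match graph_iri.strip_prefix("urn:trails:") {
        Some(rest) => rest.split_once(':').map_or(rest, |(tenant, _)| tenant),
        None => graph_iri,
    }
}

/// Returns the shard in 0..shard_count that owns writes to `graph_iri`.
pub fn shard_for(graph_iri: &str, shard_count: u64) -> u64 {
    assert!(shard_count > 0, "shard_count must be at least 1");
    fnv1a64(shard_key(graph_iri).as_bytes()) % shard_count
}
```

Plain modulo remaps most graphs when `shard_count` changes; if resharding is expected, consistent or rendezvous hashing is the usual replacement.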

Non-consequences

Users can run without Oxigraph (switch to the Qlever remote adapter, or plug in a custom backend); it is the default, not the only option.
Application code written against the GraphStore trait is identical regardless of backend.
Oxigraph does not currently garbage-collect orphaned strings on deletion; long-lived update-heavy deployments should plan periodic snapshot-and-reload as an operational step.
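A toy model of why snapshot-and-reload helps. It assumes, for illustration only, a store that interns every term in a dictionary and never frees entries on delete; `InterningStore` is hypothetical and is not Oxigraph's storage code. Reloading from a snapshot of the live triples rebuilds the dictionary with only the strings still in use. In a real deployment the snapshot would be a dump to a standard RDF serialization (e.g. N-Quads) loaded into a fresh store.

```rust
use std::collections::{BTreeSet, HashMap};

/// Toy store: terms are interned to ids and, mirroring the behaviour described
/// above, are never removed from the dictionary when triples are deleted.
#[derive(Default)]
pub struct InterningStore {
    ids: HashMap<String, u32>, // term -> id (only ever grows)
    strings: Vec<String>,      // id -> term
    triples: BTreeSet<[u32; 3]>,
}

impl InterningStore {
    fn intern(&mut self, term: &str) -> u32 {
        if let Some(&id) = self.ids.get(term) {
            return id;
        }
        let id = self.strings.len() as u32;
        self.strings.push(term.to_string());
        self.ids.insert(term.to_string(), id);
        id
    }

    pub fn insert(&mut self, s: &str, p: &str, o: &str) {
        let triple = [self.intern(s), self.intern(p), self.intern(o)];
        self.triples.insert(triple);
    }

    /// Deletes the triple but leaves its terms in the dictionary (possibly orphaned).
    pub fn remove(&mut self, s: &str, p: &str, o: &str) {
        let ids: Option<Vec<u32>> = [s, p, o].iter().map(|t| self.ids.get(*t).copied()).collect();
        if let Some(ids) = ids {
            self.triples.remove(&[ids[0], ids[1], ids[2]]);
        }
    }

    pub fn dictionary_len(&self) -> usize {
        self.strings.len()
    }

    pub fn triple_count(&self) -> usize {
        self.triples.len()
    }

    /// Live triples as strings (stand-in for a dump to N-Quads).
    pub fn snapshot(&self) -> Vec<(String, String, String)> {
        let term = |id: u32| self.strings[id as usize].clone();
        self.triples.iter().map(|t| (term(t[0]), term(t[1]), term(t[2]))).collect()
    }

    /// Loads the snapshot into a fresh store: only terms still referenced get interned.
    pub fn snapshot_and_reload(&self) -> InterningStore {
        let mut fresh = InterningStore::default();
        for (s, p, o) in self.snapshot() {
            fresh.insert(&s, &p, &o);
        }
        fresh
    }
}
```

In the usage pattern that hurts most (many short-lived subjects inserted then deleted), the triple count stays flat while the dictionary grows without bound until the next reload.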

Revisit conditions

If Oxigraph maintenance slows significantly, evaluate replacing it with Jena + a PyJena bridge (JVM sidecar) or with a community-maintained fork.
If embedded-store performance becomes a bottleneck for representative apps at < 10M triples, tune the Oxigraph deployment first, then consider an in-house engine.
If string-GC behavior or RocksDB C++20 toolchain friction materially hurts operators, revisit backend choice (in-memory + WAL, or alternative embedded RDF store).

Update (2026-04-12)

Per security review (§H-3), SPARQL query complexity MUST be bounded at the GraphStore trait level: every query executes under a wall-clock timeout and a memory cap (defaults: 30 s, 256 MB), enforced by the kernel adapter, not by handler convention. Adapters that cannot honor the bounds refuse to load.

Oxigraph's query evaluator is single-threaded per connection, so one expensive or hostile query blocks all others on that connection. Connection-pool sizing and per-principal concurrency limits are therefore part of the operator contract, not a bug to mitigate in userland.

Added to Non-consequences: Oxigraph does not provide fair-share scheduling across connections; frameworks layering multi-tenant isolation on top must size pools accordingly.
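A minimal sketch of the trait-level contract in this update, under stated assumptions: `QueryBounds`, `AdapterCaps`, `load_adapter`, `Deadline`, and `PrincipalLimiter` are hypothetical names, not the Trails kernel API; the 256 MB default is read as 256 MiB; the timeout is modelled as a cooperative deadline an evaluator would check between steps; and the memory-cap enforcement itself is adapter-specific and not shown.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

/// Hypothetical per-query bounds; defaults follow the update (30 s, 256 MB read as MiB).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct QueryBounds {
    pub timeout: Duration,
    pub memory_bytes: u64,
}

impl Default for QueryBounds {
    fn default() -> Self {
        QueryBounds { timeout: Duration::from_secs(30), memory_bytes: 256 * 1024 * 1024 }
    }
}

/// Hypothetical capability report an adapter hands the kernel at load time.
pub struct AdapterCaps {
    pub name: &'static str,
    pub enforces_timeout: bool,
    pub enforces_memory_cap: bool,
}

/// Kernel-side gate: an adapter that cannot honor both bounds is refused, not warned about.
pub fn load_adapter(caps: &AdapterCaps) -> Result<(), String> {
    if caps.enforces_timeout && caps.enforces_memory_cap {
        Ok(())
    } else {
        Err(format!("adapter `{}` refused: cannot honor query bounds", caps.name))
    }
}

/// Cooperative wall-clock deadline an evaluator would check between steps.
pub struct Deadline {
    end: Instant,
}

impl Deadline {
    pub fn start(bounds: &QueryBounds) -> Self {
        Deadline { end: Instant::now() + bounds.timeout }
    }
    pub fn check(&self) -> Result<(), &'static str> {
        if Instant::now() >= self.end { Err("query exceeded its timeout") } else { Ok(()) }
    }
}

/// Per-principal cap on in-flight queries, shared across connections.
#[derive(Clone)]
pub struct PrincipalLimiter {
    max_per_principal: usize,
    in_flight: Arc<Mutex<HashMap<String, usize>>>,
}

/// RAII permit: releases the principal's slot when dropped.
pub struct Permit {
    principal: String,
    in_flight: Arc<Mutex<HashMap<String, usize>>>,
}

impl PrincipalLimiter {
    pub fn new(max_per_principal: usize) -> Self {
        PrincipalLimiter { max_per_principal, in_flight: Arc::new(Mutex::new(HashMap::new())) }
    }

    /// Non-blocking: returns `None` when the principal is already at its limit.
    pub fn try_acquire(&self, principal: &str) -> Option<Permit> {
        let mut map = self.in_flight.lock().unwrap();
        let count = map.entry(principal.to_string()).or_insert(0);
        if *count >= self.max_per_principal {
            return None;
        }
        *count += 1;
        Some(Permit { principal: principal.to_string(), in_flight: Arc::clone(&self.in_flight) })
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        if let Ok(mut map) = self.in_flight.lock() {
            if let Some(count) = map.get_mut(&self.principal) {
                *count -= 1;
            }
        }
    }
}
```

The limiter rejects immediately rather than queueing, so an over-limit principal cannot pile work onto a connection it is already saturating; a blocking variant (for example with `Condvar`) is an equally valid operator choice.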