ADR-0007: Oxigraph as default triple store
- Status: Accepted
- Date: 2026-04-12
Context
Trails needs a default triple store that works out of the box, with zero ops, for development and small-to-medium production. Candidates:
- Apache Jena + TDB2. Mature, feature-complete. JVM dependency, slow startup, heavyweight for embedded use.
- Blazegraph. High-performance, scaled to Wikidata. Upstream development has stalled; no active maintenance.
- Fuseki. Jena-based server; more an HTTP-fronted Jena than a fresh engine. Same JVM concerns.
- Neptune. Managed AWS service. Not embeddable, not OSS.
- GraphDB. Commercial, though a free tier exists. Not OSS; not embeddable.
- Qlever. C++-based, high-performance, strong for large read-heavy datasets (Wikidata-scale). Remote-only, complex setup.
- Oxigraph. Rust-native, embeddable, SPARQL 1.1-compliant, dual-licensed MIT/Apache 2.0, actively maintained (Thomas Pellissier Tanon + contributors). Can run as a library or as a server.
- pyoxigraph. Python bindings over Oxigraph — already solves part of our problem.
Requirements for v1 default:
- Embeddable (no separate service in dev).
- Apache 2.0 or compatible license.
- SPARQL 1.1-compliant.
- Fast enough for ≤ 10M triples on commodity hardware.
- Active maintenance.
- Rust-native (no cross-language friction with our kernel).
Decision
Oxigraph is the default GraphStore backend.
- Embedded by default (`trails dev` runs Oxigraph in-process with RocksDB persistence, or in-memory for ephemeral use).
- Oxigraph server mode for single-node prod deployments.
- Remote-store adapters (Qlever, Fuseki) shipped as separate crates for scale-out scenarios.
- Neptune and GraphDB as community-contributed adapters (not core).
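All of these deployment modes sit behind a single GraphStore trait, which is what lets application code stay unchanged when a backend is swapped. The std-only sketch below shows that shape under stated assumptions: the trait methods, `BackendConfig`, `open_store`, `count_distinct_subjects`, and the toy `MemoryStore` are hypothetical illustrations, not the trails-graph API, and only the in-memory mode is wired up so the example runs without Oxigraph.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;

/// Serialized (subject, predicate, object); a stand-in for real RDF term types.
pub type Triple = (String, String, String);

/// Hypothetical shape of the GraphStore trait; method names are illustrative.
pub trait GraphStore {
    fn backend_name(&self) -> &'static str;
    fn insert(&mut self, t: Triple) -> Result<(), String>;
    fn triples(&self) -> Vec<Triple>;
}

/// The deployment modes from the Decision section (names are illustrative).
#[derive(Debug, Clone, PartialEq)]
pub enum BackendConfig {
    /// In-process, ephemeral.
    EmbeddedMemory,
    /// In-process, persisted to RocksDB under `path`.
    EmbeddedRocksDb { path: PathBuf },
    /// Oxigraph server mode, single node.
    OxigraphServer { endpoint: String },
    /// Adapter shipped in its own crate, e.g. "qlever" or "fuseki".
    Remote { adapter: String, endpoint: String },
}

/// Toy in-memory backend so the sketch runs with no dependencies.
#[derive(Default)]
pub struct MemoryStore {
    data: BTreeSet<Triple>,
}

impl GraphStore for MemoryStore {
    fn backend_name(&self) -> &'static str {
        "memory"
    }
    fn insert(&mut self, t: Triple) -> Result<(), String> {
        self.data.insert(t);
        Ok(())
    }
    fn triples(&self) -> Vec<Triple> {
        self.data.iter().cloned().collect()
    }
}

/// Maps a config to a backend. Only the in-memory mode is wired up here; the
/// other arms stand in for adapters linked from their own crates.
pub fn open_store(cfg: &BackendConfig) -> Result<Box<dyn GraphStore>, String> {
    match cfg {
        BackendConfig::EmbeddedMemory => Ok(Box::new(MemoryStore::default())),
        other => Err(format!("adapter not linked in this sketch: {other:?}")),
    }
}

/// Application code depends only on the trait, so a backend swap is a config change.
pub fn count_distinct_subjects(store: &dyn GraphStore) -> usize {
    let subjects: BTreeSet<String> = store.triples().into_iter().map(|(s, _, _)| s).collect();
    subjects.len()
}
```

Usage: `open_store(&BackendConfig::EmbeddedMemory)` yields a store that `count_distinct_subjects` can query; pointing the config at another mode changes nothing in the calling function, only which adapter `open_store` returns.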
Why Oxigraph specifically:
- Rust-native — zero FFI cost from our kernel.
- SPARQL 1.1 compliant — covers our query needs.
- Dual-licensed MIT/Apache 2.0; either option is compatible with Apache 2.0 distribution.
- Actively maintained.
- Both embedded and server modes — same codebase scales from dev to prod.
- Already Python-exposed (pyoxigraph) — precedent for the pattern Trails adopts. Note: Trails uses Oxigraph as a Rust dependency via trails-graph, not pyoxigraph directly; the Python surface accesses Oxigraph exclusively through the PyO3 FFI layer (trails-ffi).
Consequences
Positive
- Zero-ops dev experience: `trails dev` needs nothing external.
- No FFI cost between kernel and graph store; both are Rust.
- Single-binary distribution possible in principle (static link), subject to the RocksDB toolchain caveat under Negative.
- Migration path to Qlever/Fuseki for read-heavy or federated workloads via trait swap.
- License compatibility with Apache 2.0 framework.
Negative
- Oxigraph's scale ceiling is below Qlever's for very large datasets (> 100M triples, Wikidata-class). Mitigated by the Qlever adapter for those workloads.
- Smaller community than Jena. Mitigated by active maintenance and Rust ecosystem growth.
- No clustering in Oxigraph itself — multi-node writes require application-level partitioning. Mitigated by named-graph sharding conventions.
- Persistence backend choice in Oxigraph (RocksDB on-disk vs. in-memory) has performance and durability implications — requires benchmarking on representative workloads.
- RocksDB toolchain dependency. RocksDB 10.7+ requires a C++20 compiler, which complicates static single-binary distribution — mitigation: ship a Dockerfile as the dev distribution unit until a pure-Rust on-disk backend is viable.
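The named-graph sharding mitigation listed above can be read as a routing rule: a quad's shard depends only on its named graph IRI, so each graph lives on exactly one node and a write confined to one graph never needs cross-node coordination. The sketch below assumes that reading; `Quad`, `shard_for_graph`, and `partition_by_graph` are hypothetical names. FNV-1a is swapped in as the hash because std's `DefaultHasher` does not promise identical output across Rust releases, and every node must agree on the mapping.

```rust
/// (subject, predicate, object, graph IRI); a stand-in for real quad types.
pub type Quad = (String, String, String, String);

/// 64-bit FNV-1a. Its output is fixed by the algorithm, so every node and
/// every build computes the same shard for the same graph.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET_BASIS;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// The shard is a function of the named graph alone, so a graph never spans nodes.
pub fn shard_for_graph(graph_iri: &str, shard_count: u64) -> u64 {
    assert!(shard_count > 0, "shard_count must be positive");
    fnv1a64(graph_iri.as_bytes()) % shard_count
}

/// Splits a write batch into per-shard batches, one per node.
pub fn partition_by_graph(quads: Vec<Quad>, shard_count: u64) -> Vec<Vec<Quad>> {
    let mut shards: Vec<Vec<Quad>> = (0..shard_count).map(|_| Vec::new()).collect();
    for quad in quads {
        let idx = shard_for_graph(&quad.3, shard_count) as usize;
        shards[idx].push(quad);
    }
    shards
}
```

Plain modulo placement reassigns most graphs when the shard count changes; a deployment expecting to resize would want consistent or rendezvous hashing instead.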
Non-consequences
- Users can run without Oxigraph (switch to Qlever remote, or plug in custom backend) — it's the default, not the only option.
- Application code is identical regardless of backend.
- Oxigraph does not currently garbage-collect orphaned strings on deletion; long-lived update-heavy deployments should plan periodic snapshot-and-reload as an operational step.
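Why snapshot-and-reload reclaims space can be shown with a toy dictionary-encoded store: triples hold integer ids for interned strings, deleting a triple drops only the ids, and the dictionary never shrinks. Rebuilding from a dump of the live triples interns only strings that are still referenced. This is a hypothetical model of the behavior described above (`ToyStore` and its methods are illustrative), not Oxigraph's storage code.

```rust
use std::collections::{BTreeSet, HashMap};

/// Toy dictionary-encoded triple store (hypothetical, not Oxigraph's layout).
/// Strings are interned once; triples hold ids. Removing a triple never removes
/// strings from the dictionary, mirroring the orphaned-string behavior.
#[derive(Default)]
pub struct ToyStore {
    ids: HashMap<String, u64>,
    strings: Vec<String>,
    triples: BTreeSet<(u64, u64, u64)>,
}

impl ToyStore {
    fn intern(&mut self, term: &str) -> u64 {
        if let Some(&id) = self.ids.get(term) {
            return id;
        }
        let id = self.strings.len() as u64;
        self.strings.push(term.to_owned());
        self.ids.insert(term.to_owned(), id);
        id
    }

    pub fn insert(&mut self, s: &str, p: &str, o: &str) {
        let t = (self.intern(s), self.intern(p), self.intern(o));
        self.triples.insert(t);
    }

    /// Deletes the triple's ids; its strings stay in the dictionary.
    pub fn remove(&mut self, s: &str, p: &str, o: &str) -> bool {
        match (self.ids.get(s), self.ids.get(p), self.ids.get(o)) {
            (Some(&si), Some(&pi), Some(&oi)) => self.triples.remove(&(si, pi, oi)),
            _ => false,
        }
    }

    pub fn triple_count(&self) -> usize {
        self.triples.len()
    }

    pub fn dictionary_size(&self) -> usize {
        self.strings.len()
    }

    /// Snapshot: decode live triples back to strings (a stand-in for a full dump).
    pub fn snapshot(&self) -> Vec<(String, String, String)> {
        let term = |id: u64| self.strings[id as usize].clone();
        self.triples.iter().map(|&(s, p, o)| (term(s), term(p), term(o))).collect()
    }

    /// Reload: a fresh store built from the snapshot interns only live strings.
    pub fn reload(snapshot: &[(String, String, String)]) -> ToyStore {
        let mut fresh = ToyStore::default();
        for (s, p, o) in snapshot {
            fresh.insert(s, p, o);
        }
        fresh
    }
}
```

Operationally the equivalent is a full dump of the live store followed by a bulk load into a fresh database that then replaces the old one.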
Revisit conditions
- If Oxigraph maintenance slows significantly, evaluate replacing with Jena + PyJena bridge (JVM sidecar) or a community-maintained fork.
- If embedded-store performance becomes a bottleneck for representative apps at < 10M triples, tune the Oxigraph deployment or consider an in-house engine.
- If string-GC behavior or RocksDB C++20 toolchain friction materially hurts operators, revisit backend choice (in-memory + WAL, or alternative embedded RDF store).
Update (2026-04-12)
Per security review (§H-3), SPARQL query complexity MUST be bounded at the GraphStore trait level. Every query executes under a wall-clock timeout and a memory cap (defaults: 30 s, 256 MB), enforced by the kernel adapter, not by handler convention. Adapters that cannot honor the bounds refuse to load.
Oxigraph's query evaluator is single-threaded per connection, so one expensive or hostile query blocks all others on that connection. Connection-pool sizing and per-principal concurrency limits are therefore part of the operator contract, not a bug to mitigate in userland.
Added to Non-consequences: Oxigraph does not provide fair-share scheduling across connections; frameworks layering multi-tenant isolation on top must size pools accordingly.
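The kernel-level contract can be sketched with std alone: a loader that rejects adapters unable to honor the bounds, a wall-clock timeout around each query, and a per-principal in-flight limit. Every name here (`QueryBounds`, `BoundedAdapter`, `Permit`, `Kernel`) is hypothetical, and the 256 MB default is read as 256 MiB. Two std limits shape the design: a timed-out thread cannot be killed, so the permit travels with the worker and is freed only when evaluation really stops; and memory cannot be capped from outside, so that duty sits with adapters that report `honors_bounds`.

```rust
use std::collections::HashMap;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Per-query bounds. Defaults follow the ADR (30 s wall clock, 256 MB),
/// with 256 MB read as 256 MiB, which is an assumption.
#[derive(Debug, Clone, Copy)]
pub struct QueryBounds {
    pub timeout: Duration,
    pub memory_bytes: u64,
}

impl Default for QueryBounds {
    fn default() -> Self {
        QueryBounds { timeout: Duration::from_secs(30), memory_bytes: 256 * 1024 * 1024 }
    }
}

/// Hypothetical adapter contract. `honors_bounds` is where an adapter declares
/// that it can cancel evaluation and cap memory; std cannot do either for it.
pub trait BoundedAdapter: Send + Sync {
    fn honors_bounds(&self, bounds: &QueryBounds) -> bool;
    fn run(&self, sparql: &str) -> Result<String, String>;
}

type InFlight = Arc<Mutex<HashMap<String, usize>>>;

/// One in-flight query slot for a principal; dropping it frees the slot.
pub struct Permit {
    in_flight: InFlight,
    principal: String,
}

impl Drop for Permit {
    fn drop(&mut self) {
        if let Ok(mut map) = self.in_flight.lock() {
            if let Some(n) = map.get_mut(&self.principal) {
                *n -= 1;
            }
        }
    }
}

pub struct Kernel {
    adapter: Arc<dyn BoundedAdapter>,
    bounds: QueryBounds,
    max_per_principal: usize,
    in_flight: InFlight,
}

impl Kernel {
    /// Adapters that cannot honor the bounds refuse to load.
    pub fn load(adapter: Arc<dyn BoundedAdapter>, bounds: QueryBounds, max_per_principal: usize) -> Result<Self, String> {
        if !adapter.honors_bounds(&bounds) {
            return Err("adapter cannot honor query bounds; refusing to load".to_string());
        }
        Ok(Kernel { adapter, bounds, max_per_principal, in_flight: Arc::default() })
    }

    fn try_acquire(&self, principal: &str) -> Option<Permit> {
        let mut map = self.in_flight.lock().ok()?;
        let n = map.entry(principal.to_owned()).or_insert(0);
        if *n >= self.max_per_principal {
            return None;
        }
        *n += 1;
        Some(Permit { in_flight: Arc::clone(&self.in_flight), principal: principal.to_owned() })
    }

    /// Runs one query under the per-principal limit and the wall-clock timeout.
    pub fn query(&self, principal: &str, sparql: &str) -> Result<String, String> {
        let permit = self.try_acquire(principal).ok_or("per-principal concurrency limit reached")?;
        let (tx, rx) = mpsc::channel();
        let adapter = Arc::clone(&self.adapter);
        let sparql = sparql.to_owned();
        thread::spawn(move || {
            let result = adapter.run(&sparql);
            drop(permit); // free the slot only once evaluation has actually stopped
            let _ = tx.send(result);
        });
        // On timeout the worker keeps running (std threads cannot be killed) and
        // keeps its permit; actually stopping evaluation is the adapter's job.
        rx.recv_timeout(self.bounds.timeout)
            .unwrap_or_else(|_| Err("query exceeded wall-clock timeout".to_string()))
    }
}
```

Pool sizing for Oxigraph's per-connection evaluator is not modeled here; `max_per_principal` is the only concurrency knob in this sketch.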