ADR-0004: Query-time reasoning, opt-in, cached

  • Status: Accepted
  • Date: 2026-04-12

Context

RDFS / OWL-RL entailment lets a triple store answer questions that the explicit data does not directly contain. For example, "is :Patient1 a schema:Person?" can be answered from :Patient1 rdf:type :Patient together with :Patient rdfs:subClassOf schema:Person. Without reasoning, queries miss these implications.
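
To make the example concrete, here is a minimal sketch of subclass entailment over plain Python tuples. It applies only the two RDFS rules about rdfs:subClassOf (rdfs9 and rdfs11), not full RDFS or OWL-RL, and the function name rdfs_subclass_closure is an illustrative assumption rather than anything the framework defines.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def rdfs_subclass_closure(triples):
    """Return the triples entailed by rdfs9/rdfs11 that are not already explicit."""
    explicit = set(triples)
    closed = set(explicit)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in closed:
            if p != SUBCLASS_OF:
                continue
            for s2, p2, o2 in closed:
                # rdfs11: rdfs:subClassOf is transitive
                if p2 == SUBCLASS_OF and s2 == o:
                    new.add((s, SUBCLASS_OF, o2))
                # rdfs9: an instance of the subclass is an instance of the superclass
                if p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))
        if not new <= closed:
            closed |= new
            changed = True
    return closed - explicit


data = {
    (":Patient1", RDF_TYPE, ":Patient"),
    (":Patient", SUBCLASS_OF, "schema:Person"),
}
inferred = rdfs_subclass_closure(data)
print(inferred)
```

A query for instances of schema:Person over data alone finds nothing; over data plus inferred it finds :Patient1.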

Reasoning has a cost. Eagerly materializing entailments on every write can multiply the data written by 10×. Computing them on every query can multiply query latency by 10×. A poor choice here makes the framework unusable in practice.

Placement options:

  1. Eager, pre-handler. On every write, materialize the OWL-RL closure. Handlers see a fully closed graph. Storage and write-latency impact is severe.
  2. Eager, background. Writes are fast; closure is materialized asynchronously. Simpler to reason about, but introduces a stale-data window.
  3. Query-time, per-query. Closure computed on demand for each SPARQL query. Slow for every query.
  4. Query-time with cache + invalidation. Closure computed once per graph, cached in :inferred named graph, invalidated on premise-graph writes. Lazy but fast once warm.
  5. None by default, opt-in per capability. Frame-level: most handlers don't need reasoning; those that do declare it.

Decision

Combination: #4 (query-time, cached, per-named-graph) + #5 (opt-in per capability).

  • Default reasoning mode for a capability is none.
  • Capabilities opt in via @capability(reasoning="rdfs" | "owl-rl").
  • When a reasoning-enabled query executes, the kernel checks whether the :inferred/<graph> cache is current. If so, the query runs against <graph> UNION :inferred/<graph>. If not, the closure is computed and stored first, and then the query runs against the same union.
  • Any write to <graph> marks :inferred/<graph> stale.
  • Cache warms lazily; there is no background reasoner daemon in v1.
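
The flow in the bullets above can be sketched as follows. Everything here is a hypothetical stand-in: Kernel, its write and run methods, the metadata-only capability decorator, and toy_reasoner (a single-rule reasoner, not RDFS or OWL-RL) are assumptions made for illustration, and the sketch keeps one cache per graph without distinguishing the "rdfs" and "owl-rl" modes.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def capability(reasoning: str = "none"):
    """Hypothetical stand-in for @capability: records the requested reasoning mode."""
    def wrap(handler):
        handler.reasoning = reasoning
        return handler
    return wrap


def toy_reasoner(triples: Set[Triple]) -> Set[Triple]:
    """Toy reasoner (rule rdfs9 only): propagate rdf:type one step along rdfs:subClassOf."""
    supers = {(s, o) for s, p, o in triples if p == "rdfs:subClassOf"}
    derived = {
        (inst, "rdf:type", sup)
        for inst, p, cls in triples if p == "rdf:type"
        for sub, sup in supers if sub == cls
    }
    return derived - triples


class Kernel:
    def __init__(self, reasoner):
        self.reasoner = reasoner
        self.graphs: Dict[str, Set[Triple]] = {}
        # graph name -> cached inferred triples; a missing key means "stale"
        self.inferred: Dict[str, Set[Triple]] = {}
        self.closure_runs = 0

    def write(self, graph: str, triple: Triple) -> None:
        self.graphs.setdefault(graph, set()).add(triple)
        self.inferred.pop(graph, None)  # any write marks :inferred/<graph> stale

    def run(self, handler, graph: str):
        explicit = self.graphs.get(graph, set())
        if handler.reasoning == "none":
            return handler(set(explicit))  # default: no reasoning cost at all
        if graph not in self.inferred:  # cache stale or cold: compute and store
            self.inferred[graph] = self.reasoner(explicit)
            self.closure_runs += 1
        return handler(explicit | self.inferred[graph])  # <graph> UNION :inferred/<graph>


@capability(reasoning="rdfs")
def people(view):
    return {s for s, p, o in view if p == "rdf:type" and o == "schema:Person"}


@capability()
def people_explicit(view):
    return {s for s, p, o in view if p == "rdf:type" and o == "schema:Person"}


kernel = Kernel(toy_reasoner)
kernel.write(":g", (":Patient", "rdfs:subClassOf", "schema:Person"))
kernel.write(":g", (":Patient1", "rdf:type", ":Patient"))

first = kernel.run(people, ":g")           # cold cache: closure computed
second = kernel.run(people, ":g")          # warm cache: no recomputation
runs_while_warm = kernel.closure_runs
plain = kernel.run(people_explicit, ":g")  # reasoning mode "none"

kernel.write(":g", (":Patient2", "rdf:type", ":Patient"))  # invalidates the cache
third = kernel.run(people, ":g")
```

In this sketch the closure runs once for the first two reasoning-enabled calls and once more after the write, while the handler that did not opt in never triggers the reasoner.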

Consequences

Positive

  • Zero perf cost for reasoning-unaware capabilities. Most handlers pay nothing.
  • Developers must think about reasoning before using it. Opting in forces the question "do I actually need entailment here?", to which the answer is almost always no.
  • Cache amortizes cost over read-heavy workloads. Typical semweb apps read far more than they write.
  • Named-graph scoping means a noisy write graph doesn't invalidate reasoning over other graphs.

Negative

  • Cache invalidation is genuinely hard. Writes that affect ontology-level premises (e.g., a new rdfs:subClassOf) may require broader invalidation than the naive per-graph approach provides. Mitigated by treating ontology updates as global invalidations, triggered as an explicit operation.
  • First query after invalidation is slow. Mitigated by optional pre-warming in trails onto evolve.
  • Debugging inference issues requires inspecting :inferred graph contents. Mitigated by trails trace showing which inferred triples were used.
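
The difference between per-graph and global invalidation can be sketched as simple bookkeeping. The class InferredCacheIndex and its method names are hypothetical; the sketch only tracks which :inferred/<graph> caches are current and does not store any triples.

```python
class InferredCacheIndex:
    """Hypothetical bookkeeping: which :inferred/<graph> caches are current."""

    def __init__(self):
        self._current = set()

    def mark_current(self, graph):
        self._current.add(graph)

    def is_current(self, graph):
        return graph in self._current

    def on_data_write(self, graph):
        # Ordinary write: only this graph's inferred cache goes stale.
        self._current.discard(graph)

    def invalidate_all(self):
        # Explicit operation for ontology updates, which may change
        # what is entailed in every graph.
        self._current.clear()


index = InferredCacheIndex()
for g in (":patients", ":orders"):
    index.mark_current(g)

index.on_data_write(":orders")
after_write = (index.is_current(":patients"), index.is_current(":orders"))

index.mark_current(":orders")  # cache recomputed by a later query
index.invalidate_all()         # ontology update
after_ontology_update = (index.is_current(":patients"), index.is_current(":orders"))
```

After the data write only :orders is stale; after the ontology update both graphs must recompute their closure on the next reasoning-enabled query.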

Non-consequences

  • Apps that don't opt in never see inference behavior change.
  • Reasoner implementation is abstracted behind a trait, so OWL-RL can be swapped for RDFS or SWRL without framework changes.
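
As a rough illustration of that abstraction, the sketch below uses a Python typing.Protocol in place of the trait. The interface name Reasoner, its infer method, and both implementations are assumptions made for this example; SubclassTypeReasoner covers a single RDFS rule and is not a real RDFS or OWL-RL engine.

```python
from typing import Protocol, Set, Tuple

Triple = Tuple[str, str, str]


class Reasoner(Protocol):
    def infer(self, triples: Set[Triple]) -> Set[Triple]:
        """Return entailed triples that are not already explicit."""
        ...


class NullReasoner:
    """Reasoning mode "none": nothing is inferred."""

    def infer(self, triples: Set[Triple]) -> Set[Triple]:
        return set()


class SubclassTypeReasoner:
    """Toy reasoner covering rule rdfs9 only."""

    def infer(self, triples: Set[Triple]) -> Set[Triple]:
        supers = {(s, o) for s, p, o in triples if p == "rdfs:subClassOf"}
        derived = {
            (inst, "rdf:type", sup)
            for inst, p, cls in triples if p == "rdf:type"
            for sub, sup in supers if sub == cls
        }
        return derived - triples


def reasoning_view(triples: Set[Triple], reasoner: Reasoner) -> Set[Triple]:
    # The caller depends only on the interface, so reasoners are interchangeable.
    return triples | reasoner.infer(triples)


data = {
    (":Patient1", "rdf:type", ":Patient"),
    (":Patient", "rdfs:subClassOf", "schema:Person"),
}
without = reasoning_view(data, NullReasoner())
with_subclass = reasoning_view(data, SubclassTypeReasoner())
```

Swapping the reasoner changes only the argument passed to reasoning_view; the calling code stays the same.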

Deferred

  • Incremental reasoning (recompute only affected portion on write) is v2+. v1 recomputes full closure on invalidation.
  • Backward-chaining (reason at query plan time without materialization) is v2+ exploration.

Revisit conditions

  • If real apps frequently need reasoning on hot paths, add an eager-materialize mode as an opt-in alternative.
  • If OWL-RL proves insufficient (e.g., users need OWL-DL expressivity), evaluate embedding a DL reasoner (HermiT, Pellet); likely v2+.