ADR-0017: ActiveGraph-style ORM on top of the async GraphStore

  • Status: Accepted (2026-04-19)
  • Date: 2026-04-14
  • Supersedes:
  • Superseded by:

Context

Trails positions itself as Rails for agentic-semantic-web apps. The Rails analogy has concrete load-bearing pieces today — conventional project layout, generators, a Python-first surface (ADR-0001), shapes (ADR-0002), always-on provenance (ADR-0009), cost as a primitive (ADR-0012). The model layer is conspicuously missing.

Application authors writing a capability body have two options right now:

  1. Raw SPARQL through sparql_proxy.validate_query + GraphStore.query. Safe (the proxy blocks UPDATE / SERVICE / federation), but the author hand-assembles strings, binds parameters, parses bindings back into Python objects, and re-implements that boilerplate for every capability.
  2. Hand-rolled per-app helpers. Every team invents its own thin wrapper, none of which compose with shapes.py, provenance, cost, or Cedar.

Meanwhile the sibling project Fabrica (TypeScript-primary) makes ActiveGraph its headline developer-ergonomics play: a Rails-ActiveRecord analogue for RDF. Model.where(...), instance.save(), associations traversed as property paths. This is the surface a developer coming from Rails, Django, or Prisma expects the moment they hear the word "framework."

Named cost of inaction: without an ORM, the Rails analogy in Trails' positioning is aspirational, not concrete. "Convention over configuration" is an empty slogan if the model layer is "write SPARQL yourself." Every capability body shipped without an ORM cements raw-SPARQL patterns that will be expensive to migrate away from later.

The framework already has the ingredients:

  • async GraphStore seam (graph.py) — reads via query, writes via add_quads, named-graph scoping as a first-class parameter.
  • SHACL validation (shapes.py) — @shape + predicate() register shapes with the kernel validator.
  • SPARQL safety + parameter binding plumbing (sparql_proxy.py) — the validator + inject_from_named used by any read path.
  • Provenance (ADR-0009) and cost envelopes (ADR-0012) already intercept capability invocation boundaries.

What is missing is the surface convention that binds a Python class to an RDF type and translates pythonic read/write calls into the SPARQL those pieces already run.

Decision

Add an ActiveGraph-style ORM as a new Python module python/src/trails/orm.py. The ORM is a surface convention layered on top of the existing async GraphStore; it is not a new kernel capability and it does not introduce a new storage path. Every ORM call lowers to SPARQL the kernel already runs today.

TypeScript parity is deferred to a follow-on ADR once the Python surface stabilises (see §Scope fence).

Surface

# python/src/trails/orm.py (design — not yet implemented)

@model(type_iri="https://myapp.example/ns/Patient", graph="app:patients")
class Patient:
    name: str = property_(predicate="schema:name")
    age:  int = property_(predicate="schema:age", datatype="xsd:integer")

# Phase 1 — write + single-entity read
p = Patient(name="Ada", age=36)
await p.save()                       # SHACL-validated INSERT
p2 = await Patient.find(iri="https://myapp.example/p/ada")

# Phase 2 — query builder
adults = await (Patient
                .where(age__gte=18)
                .order_by("name")
                .limit(50)
                .fetch())

# Escape hatch — tagged template for raw, parameterised SPARQL
min_age = 18
rows = await sparql"""
    SELECT ?p ?name WHERE {
      ?p a <https://myapp.example/ns/Patient> ;
         schema:name ?name ;
         schema:age  ?age .
      FILTER(?age >= {min_age})
    }
"""

The primitives:

  • @model(type_iri=..., graph=...) — class decorator binding a Python class to an RDF type. Accepts the same prefix/extends machinery as @shape; in fact @model is a superset of @shape — it calls shape() internally so SHACL registration is free.
  • property_(predicate=..., datatype=...): field declaration, assigned as the class attribute's default value rather than applied as a decorator. Named property_ to avoid shadowing the builtin. Reuses shapes.PredicateInfo.
  • Model.find(iri) — single-entity lookup, returns an instance or None. Lowers to a constant SPARQL SELECT keyed on the IRI.
  • Model.where(**filters) — returns a chainable Query with .limit(n), .order_by("field"), .fetch() (async terminal). Filter kwargs use Django-style suffixes (__gte, __lt, __in) and map to SPARQL FILTER clauses.
  • instance.save() — SHACL-validates against the registered shape (via shapes.validate_via_kernel) then issues an idempotent INSERT DATA through GraphStore.add_quads scoped to the model's named graph.
  • sparql"..." — tagged-template helper for parameterised raw SPARQL. Interpolated values are escaped as typed literals or IRIs by the same binder sparql_proxy exposes; the result goes through validate_query before reaching the store. This is the explicit escape hatch for queries the ORM cannot express (complex aggregates, federated reads once policy allows them, analytical queries).
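To make the mechanics concrete, here is a minimal sketch of how @model and property_ might collect field metadata and how save() could lower an instance to quads. It is an illustration under stated assumptions, not the planned orm.py: the InMemoryStore class, the to_quads helper, the PREFIXES table, the __fields__ attribute, and the explicit iri and store arguments are all hypothetical, and SHACL validation is left out.

```python
import asyncio
from dataclasses import dataclass
from typing import Any

# Hypothetical stand-ins; none of these names are the real trails API.
PREFIXES = {
    "schema": "http://schema.org/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def expand(curie: str) -> str:
    """Expand 'schema:name' to a full IRI; leave absolute IRIs untouched."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local if prefix in PREFIXES else curie


@dataclass(frozen=True)
class property_:
    """Field metadata; named property_ to avoid shadowing the builtin."""
    predicate: str
    datatype: str = "xsd:string"


def model(type_iri: str, graph: str):
    """Class decorator: collect property_ fields and attach RDF metadata."""
    def wrap(cls):
        fields = {k: v for k, v in vars(cls).items() if isinstance(v, property_)}

        def __init__(self, iri: str, **values: Any):
            unknown = set(values) - set(fields)
            if unknown:
                raise TypeError(f"unknown fields: {sorted(unknown)}")
            self.iri = iri
            for name in fields:
                setattr(self, name, values.get(name))

        def to_quads(self):
            """Lower the instance to (subject, predicate, object, graph) tuples."""
            quads = [(self.iri, RDF_TYPE, type_iri, graph)]
            for name, info in fields.items():
                value = getattr(self, name)
                if value is not None:
                    obj = (value, expand(info.datatype))
                    quads.append((self.iri, expand(info.predicate), obj, graph))
            return quads

        async def save(self, store):
            # A real save() would SHACL-validate first; omitted in this sketch.
            await store.add_quads(self.to_quads())

        cls.__init__, cls.to_quads, cls.save = __init__, to_quads, save
        cls.__fields__, cls.__type_iri__, cls.__graph__ = fields, type_iri, graph
        return cls
    return wrap


class InMemoryStore:
    """Toy store: set semantics make repeated saves idempotent."""
    def __init__(self):
        self.quads = set()

    async def add_quads(self, quads):
        self.quads.update(quads)


@model(type_iri="https://myapp.example/ns/Patient", graph="app:patients")
class Patient:
    name: str = property_(predicate="schema:name")
    age: int = property_(predicate="schema:age", datatype="xsd:integer")


store = InMemoryStore()
ada = Patient(iri="https://myapp.example/p/ada", name="Ada", age=36)
asyncio.run(ada.save(store))
asyncio.run(ada.save(store))  # saving twice adds no new quads
```

The sketch takes the IRI and the store as explicit arguments only to stay self-contained; how the real surface mints IRIs and locates the GraphStore is not decided here.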

Lowering example

Patient.where(age__gte=18).order_by("name").limit(50).fetch() lowers to:

# Generated by orm.py — do not hand-edit.
SELECT ?iri ?name ?age FROM NAMED <app:patients> WHERE {
  GRAPH <app:patients> {
    ?iri a <https://myapp.example/ns/Patient> ;
         <http://schema.org/name> ?name ;
         <http://schema.org/age>  ?age .
    FILTER(?age >= 18)
  }
}
ORDER BY ?name
LIMIT 50

The FROM NAMED clause is injected by the same inject_from_named path that already enforces tenant scoping; the ORM does not bypass it.
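The suffix-to-FILTER mapping can be sketched in a few lines. This is a rough illustration rather than the generator orm.py will ship: lower_where, lower_filter, literal, and the OPERATORS table are hypothetical names, only int, bool, and str values are handled, and the FROM NAMED clause is deliberately left out because the text above assigns it to inject_from_named.

```python
OPERATORS = {"": "=", "gte": ">=", "gt": ">", "lte": "<=", "lt": "<"}


def literal(value) -> str:
    """Render a bool, int, or str as a SPARQL literal."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                        .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"'
    raise TypeError(f"unsupported filter value: {value!r}")


def lower_filter(key: str, value, fields: dict) -> str:
    """Turn one Django-style kwarg (e.g. age__gte=18) into a FILTER clause."""
    field, _, suffix = key.partition("__")
    if field not in fields:
        raise KeyError(f"unknown field: {field}")
    if suffix == "in":
        items = ", ".join(literal(v) for v in value)
        return f"FILTER(?{field} IN ({items}))"
    if suffix not in OPERATORS:
        raise ValueError(f"unknown suffix: {suffix}")
    return f"FILTER(?{field} {OPERATORS[suffix]} {literal(value)})"


def lower_where(type_iri, graph, fields, filters, order_by=None, limit=None) -> str:
    """Assemble a SELECT; `fields` maps field names to predicate IRIs."""
    select = " ".join(f"?{name}" for name in fields)
    lines = [
        f"SELECT ?iri {select} WHERE {{",
        f"  GRAPH <{graph}> {{",
        f"    ?iri a <{type_iri}> .",
    ]
    lines += [f"    ?iri <{pred}> ?{name} ." for name, pred in fields.items()]
    lines += [f"    {lower_filter(k, v, fields)}" for k, v in filters.items()]
    lines += ["  }", "}"]
    if order_by is not None:
        if order_by not in fields:
            raise KeyError(f"unknown field: {order_by}")
        lines.append(f"ORDER BY ?{order_by}")
    if limit is not None:
        lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines)


PATIENT_FIELDS = {"name": "http://schema.org/name", "age": "http://schema.org/age"}
query = lower_where(
    "https://myapp.example/ns/Patient", "app:patients", PATIENT_FIELDS,
    {"age__gte": 18}, order_by="name", limit=50,
)
```

Field names are checked against the declared fields before they reach the query string, so only values, never identifiers, come from caller input.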

What the ORM reuses (not re-implements)

| Concern | Reused from |
| --- | --- |
| Async read/write primitives | graph.GraphStore (ADR-0001 seam) |
| SHACL validation on write | shapes.validate_via_kernel (ADR-0002) |
| SPARQL safety + FROM NAMED injection | sparql_proxy.validate_query, inject_from_named |
| Parameter escaping for sparql"..." | The binder already used by sparql_proxy |
| Named-graph scoping | GraphStore named_graphs parameter |
| Provenance on writes | Capability-level PROV-O emission (ADR-0009); the ORM does not fork this path |
| Cost envelopes | Capability-level envelope (ADR-0012); unchanged |
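For the parameter-escaping row, the following sketch shows what binding values into a raw query as IRIs or literals could look like. It does not reproduce the sparql_proxy binder, whose interface is not shown in this ADR; the IRI wrapper, render, and bind are hypothetical names, and the placeholder syntax simply mirrors the {min_age} form used in the Surface example.

```python
import re


class IRI(str):
    """Marks a value to be rendered as <iri> instead of a string literal."""


# Characters the SPARQL grammar does not allow inside <...>.
_IRI_FORBIDDEN = re.compile(r'[<>"{}|^`\\\x00-\x20]')
# Only {identifier} is a placeholder; SPARQL group braces are left alone.
_PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def render(value) -> str:
    """Render one Python value as a SPARQL term."""
    if isinstance(value, IRI):
        if _IRI_FORBIDDEN.search(value):
            raise ValueError(f"illegal character in IRI: {value!r}")
        return f"<{value}>"
    if isinstance(value, bool):  # before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                        .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"'
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


def bind(template: str, **params) -> str:
    """Substitute {name} placeholders with escaped SPARQL terms."""
    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing parameter: {name}")
        return render(params[name])
    return _PLACEHOLDER.sub(substitute, template)


bound = bind(
    "SELECT ?p WHERE { ?p a {type} . ?p <http://schema.org/age> ?age . "
    "FILTER(?age >= {min_age}) }",
    type=IRI("https://myapp.example/ns/Patient"),
    min_age=18,
)
```

In the design described above, the bound string would still pass through validate_query before reaching the store; escaping is one layer, not a replacement for that check.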

Scope fence — what the ORM does NOT do

  • No relational joins beyond RDF property paths. There is no JOIN translator. Associations traverse predicates; anything more complex stays in sparql"...".
  • No migration DSL. Schema evolution is trails onto evolve's job.
  • No schema introspection from SHACL. Phase 2 may add helpers that read shapes._SHAPES to generate typed query stubs; the first cut requires explicit property_ declarations.
  • No TypeScript surface in this ADR. TS parity is a follow-on ADR once the Python surface stabilises. The names (@model, property_, where, save) are chosen with a TS counterpart in mind, but nothing here commits the TS design.
  • No lazy loading / identity map / unit-of-work. Each call is a discrete async round-trip. If these patterns prove necessary they earn their own ADR; guessing up front invites the "ORM that leaks everywhere" anti-pattern.

Phased delivery

| Phase | Scope | Gate |
| --- | --- | --- |
| 1 | @model, property_, Model.find, instance.save (write + single-entity read). SHACL validation on save. | Green capability-body test showing one-shape CRUD through the ORM with no raw SPARQL in user code. |
| 2 | Model.where, .limit, .order_by, .fetch (query builder). Django-style filter suffixes. | Two reference capabilities in the demo app converted from raw SPARQL to .where. |
| 3 | Property-path traversal for associations (Patient.where(care_team__lead__name="…") lowering to SPARQL property paths, not joins). | Bench showing property-path lowering stays under NFR-Perf1 on a 10k-triple graph. |
| 4 | TypeScript parity (own follow-on ADR, not covered here). | |
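As an illustration of the Phase 3 idea, the sketch below turns a double-underscore association key into a SPARQL sequence path. It rests on assumptions: lower_association is a hypothetical helper, the flat PREDICATES table and the careTeam and lead IRIs are invented for the example, only equality on a string value is handled, and a real implementation would presumably resolve each hop against the target model rather than a single dictionary.

```python
# Hypothetical field-to-predicate table; the careTeam and lead IRIs are invented.
PREDICATES = {
    "care_team": "https://myapp.example/ns/careTeam",
    "lead": "https://myapp.example/ns/lead",
    "name": "http://schema.org/name",
}


def lower_association(key: str, value: str, predicates: dict) -> str:
    """Lower 'a__b__c' to a triple pattern using the sequence path <a>/<b>/<c>."""
    hops = key.split("__")
    unknown = [hop for hop in hops if hop not in predicates]
    if unknown:
        raise KeyError(f"unknown association segments: {unknown}")
    path = "/".join(f"<{predicates[hop]}>" for hop in hops)
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'?iri {path} "{escaped}" .'


pattern = lower_association("care_team__lead__name", "Grace", PREDICATES)
```

The whole traversal stays inside one triple pattern, so the store evaluates it in a single query instead of the ORM issuing one lookup per hop.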

PROV-O emission on .save() and Cedar-policy interaction with @model(graph=...) are open questions (see Open questions); both are explicitly out of phase 1 so the first cut ships without entangling two ADRs that are still settling.

Consequences

Positive

  • Rails analogy becomes concrete. A developer evaluating Trails can point at a model class and see "this is the Rails part." The positioning document stops being aspirational.
  • Capability bodies stop writing raw SPARQL. Expectation: ≥ 80% of capability bodies in the first reference app use only ORM surface. This is the number to verify after Phase 2 ships.
  • Cooperates with existing decisions rather than competing. Shapes, provenance, cost, and policy remain the authoritative layers; the ORM is a surface.
  • Lowers the "what does a Trails app look like?" onboarding cost. New contributors read orm.py, not seven modules.

Negative

  • New surface to maintain. Every new SPARQL feature the kernel gains (property paths, aggregates, full-text search) raises the question of whether it gets an ORM verb or stays in sparql"...". Each answer is a design call.
  • ORM leaks abstractions. RDF is not SQL. An IRI is not a primary key. A predicate is not a column. The ORM will occasionally surface RDF weirdness (blank nodes, multiple values per predicate, language tags) that pretending-it's-Django will not hide gracefully.
  • Risk of encouraging anti-patterns. N+1 queries over associations. Joins-that-should-be-property-paths. The ORM makes easy things easy, which historically means hard things look easy too, which historically means production incidents.

Mitigations

  • Linter warning (trails doctor rule) when a capability body contains raw GraphStore.query(...) calls. Use sparql"..." or the ORM; raw .query is a framework-internal path.
  • Docs emphasise the escape hatch. Every where / find example ends with "if this doesn't express what you need, drop to sparql." Make the escape hatch load-bearing, not shameful.
  • Benchmark + reference bad pattern. Ship one example in the docs of an N+1 ORM loop and the property-path rewrite beside it, with numbers. Teach the failure mode explicitly.
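A toy version of the N+1 comparison might look like the sketch below. It counts round-trips rather than measuring time, and everything in it is a stand-in: CountingStore, its methods, and the LEAD_OF and NAME_OF tables are hypothetical, with each method call modelling one SPARQL query against the store.

```python
import asyncio

# Toy data: patient IRI -> lead IRI, lead IRI -> name.
LEAD_OF = {f"p/{i}": f"staff/{i % 3}" for i in range(10)}
NAME_OF = {"staff/0": "Grace", "staff/1": "Alan", "staff/2": "Edsger"}


class CountingStore:
    """Hypothetical store stand-in: every method call is one round-trip."""

    def __init__(self):
        self.round_trips = 0

    async def patients(self):
        self.round_trips += 1
        return list(LEAD_OF)

    async def lead_of(self, patient):
        self.round_trips += 1
        return LEAD_OF[patient]

    async def name_of(self, staff):
        self.round_trips += 1
        return NAME_OF[staff]

    async def lead_names(self):
        # Models a single query that follows a property path to the name.
        self.round_trips += 1
        return {patient: NAME_OF[staff] for patient, staff in LEAD_OF.items()}


async def n_plus_one(store):
    """Anti-pattern: one list query, then two lookups per patient."""
    result = {}
    for patient in await store.patients():
        lead = await store.lead_of(patient)
        result[patient] = await store.name_of(lead)
    return result


async def single_query(store):
    """Rewrite: the traversal happens inside one query."""
    return await store.lead_names()


loop_store, path_store = CountingStore(), CountingStore()
loop_result = asyncio.run(n_plus_one(loop_store))
path_result = asyncio.run(single_query(path_store))
```

With ten patients the loop makes 1 + 2 × 10 = 21 round-trips and the rewrite makes one, for identical results; the gap grows linearly with the number of entities.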

Non-consequences

  • Kernel surface (ADR-0001) unchanged — no new Rust crates, no new PyO3 types.
  • Shapes (ADR-0002) unchanged — @model composes with @shape, does not replace it.
  • Provenance always-on (ADR-0009) unchanged — the ORM writes through the same capability boundary that emits PROV-O / ECT (ADR-0013) today.
  • Cost primitive (ADR-0012) unchanged — envelope opens/closes at the capability boundary, above the ORM.
  • Cedar policy (ADR-0006) unchanged: access checks happen at the capability boundary, not inside the ORM. @model(graph=...) is declarative metadata, not an authorisation bypass (see Open questions).

Revisit conditions

  • If Phase 2 ships and the "≥ 80% of capability bodies use only ORM" target is missed by a wide margin, re-evaluate: either the surface is wrong, or the escape hatch is too tempting, or the ORM was the wrong abstraction.
  • If sparql"..." usage in capability bodies also collapses (people abandon both surfaces for hand-rolled helpers), the ORM is in the uncanny valley — treat as a signal to simplify, not extend.
  • If TypeScript adoption stalls on waiting for ORM parity, bring ADR-0018 (the TS ADR) forward.

Alternatives considered

  • Do nothing — keep sparql_proxy as the only read surface. Rejected. The ergonomics gap against Fabrica is real and the Rails analogy suffers with every shipped capability that embeds raw SPARQL.
  • Adopt RDFLib's existing sugar (Graph.triples, Resource, etc). Rejected. RDFLib idioms are synchronous and don't compose cleanly with Trails' async GraphStore, named-graph-first conventions, SHACL validation hook, or cost/provenance interception points. We'd spend more effort adapting RDFLib than writing 300 focused lines.
  • Port Fabrica's ActiveGraph directly. Rejected as a full port; partially adopted at the surface-name level. Fabrica is TS-primary and its internal mapping targets a different store abstraction. Using the same public names (@model, where, save) preserves the cross-framework mental model without coupling implementations.
  • Code-generate from SHACL. Considered for phase 2+ (auto-generate @model classes from declared shapes). Not the v1 surface — explicit declarations give better IDE ergonomics and keep the learning curve shallow.

Open questions

  1. Does save() emit PROV-O automatically, or only when called from inside an @capability body? ADR-0009 mandates always-on provenance at the capability boundary. If save() runs outside a capability (e.g., in a seed script), should it still emit PROV-O? Likely answer: yes, with a synthetic prov:Activity attributed to the caller identity — but this needs the identity module's consent (see ADR-0013).
  2. Is ORM use inside capability bodies the recommended pattern, or just the allowed pattern? If recommended, the @capability docstring in decorators.py should say so; if merely allowed, the docs must make the trade-offs explicit.
  3. How does @model(graph=...) compose with Cedar policies (ADR-0006)? The named graph is often the tenant boundary. If a model hard-codes graph="app:patients" but the caller has policy that restricts them to graph="app:patients:tenant-42", whose declaration wins? Proposed: policy always wins; the model's graph= is a default, overridable by the capability-level tenant scope.
  4. How do multi-valued predicates surface? A schema:keyword might have ten values. Does Patient.keyword return list[str] always, or only when max_count > 1 is declared on the shape? Stance: the shape is the source of truth; property_ inherits min/max from the underlying PredicateInfo.
  5. Transactional semantics. GraphStore.add_quads is idempotent but not transactional across multiple models. If a capability body save()s three related entities and the third fails SHACL, do we roll back the first two? Phase-1 stance: no — each save() is independent; composition is the capability author's job. Revisit if this bites in the reference app.
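Two of the proposed stances above are small enough to sketch. The code assumes one possible reading of each and settles nothing: resolve_graph and hydrate are hypothetical helpers, the "policy always wins" rule is reduced to a simple override, and the cardinality rule assumes a scalar is returned only when the shape declares max_count == 1, with a list in every other case.

```python
from typing import Optional


def resolve_graph(model_graph: str, tenant_scope: Optional[str]) -> str:
    """Question 3: a capability-level tenant scope overrides the model default."""
    return tenant_scope if tenant_scope is not None else model_graph


def hydrate(values: list, max_count: Optional[int]):
    """Question 4: the shape decides the Python type of a field.

    max_count == 1 yields a scalar (or None when absent);
    anything else, including unbounded (None), yields a list.
    """
    if max_count == 1:
        if len(values) > 1:
            raise ValueError("shape allows one value, store returned several")
        return values[0] if values else None
    return list(values)


scoped = resolve_graph("app:patients", "app:patients:tenant-42")
default = resolve_graph("app:patients", None)
single = hydrate(["Ada"], max_count=1)
many = hydrate(["rdf", "orm"], max_count=None)
```

Raising when the store holds more values than the shape permits is itself a design choice; returning the first value silently would hide data that violates the shape.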