ADR-0017: ActiveGraph-style ORM on top of the async GraphStore¶
- Status: Accepted (2026-04-19)
- Date: 2026-04-14
- Supersedes: —
- Superseded by: —
Context¶
Trails positions itself as Rails for agentic-semantic-web apps. The Rails analogy has concrete load-bearing pieces today — conventional project layout, generators, a Python-first surface (ADR-0001), shapes (ADR-0002), always-on provenance (ADR-0009), cost as a primitive (ADR-0012). The model layer is conspicuously missing.
Application authors writing a capability body have two options right now:
- Raw SPARQL through sparql_proxy.validate_query + GraphStore.query. Safe (the proxy blocks UPDATE / SERVICE / federation), but the author hand-assembles strings, binds parameters, parses bindings back into Python objects, and re-implements that boilerplate for every capability.
- Hand-rolled per-app helpers. Every team invents its own thin wrapper, none of which compose with shapes.py, provenance, cost, or Cedar.
Meanwhile the sibling project Fabrica (TypeScript-primary) makes
ActiveGraph its headline developer-ergonomics play: a Rails-ActiveRecord
analogue for RDF. Model.where(...), instance.save(), associations
traversed as property paths. This is the surface a developer coming from
Rails, Django, or Prisma expects the moment they hear the word "framework."
Named cost of inaction: without an ORM, the Rails analogy in Trails' positioning is aspirational, not concrete. "Convention over configuration" is an empty slogan if the model layer is "write SPARQL yourself." Every capability body shipped without an ORM cements raw-SPARQL patterns that will be expensive to migrate away from later.
The framework already has the ingredients:
- async GraphStore seam (graph.py): reads via query, writes via add_quads, named-graph scoping as a first-class parameter.
- SHACL validation (shapes.py): @shape + predicate() register shapes with the kernel validator.
- SPARQL safety + parameter binding plumbing (sparql_proxy.py): the validator + inject_from_named used by any read path.
- Provenance (ADR-0009) and cost envelopes (ADR-0012) already intercept capability invocation boundaries.
What is missing is the surface convention that binds a Python class to an RDF type and translates pythonic read/write calls into the SPARQL those pieces already run.
Decision¶
Add an ActiveGraph-style ORM as a new Python module python/src/trails/orm.py.
The ORM is a surface convention layered on top of the existing async
GraphStore; it is not a new kernel capability and it does not introduce a
new storage path. Every ORM call lowers to SPARQL the kernel already runs
today.
TypeScript parity is deferred to a follow-on ADR once the Python surface stabilises (see §Scope fence).
Surface¶
# python/src/trails/orm.py (design, not yet implemented)
@model(type_iri="https://myapp.example/ns/Patient", graph="app:patients")
class Patient:
    name: str = property_(predicate="schema:name")
    age: int = property_(predicate="schema:age", datatype="xsd:integer")

# Phase 1: write + single-entity read
p = Patient(name="Ada", age=36)
await p.save()  # SHACL-validated INSERT
p2 = await Patient.find(iri="https://myapp.example/p/ada")

# Phase 2: query builder
adults = await (Patient
    .where(age__gte=18)
    .order_by("name")
    .limit(50)
    .fetch())

# Escape hatch: tagged template for raw, parameterised SPARQL
min_age = 18
rows = await sparql"""
    SELECT ?p ?name WHERE {
      ?p a <https://myapp.example/ns/Patient> ;
         schema:name ?name ;
         schema:age ?age .
      FILTER(?age >= {min_age})
    }
"""
The primitives:
- @model(type_iri=..., graph=...): class decorator binding a Python class to an RDF type. Accepts the same prefix/extends machinery as @shape; in fact @model is a superset of @shape, since it calls shape() internally so SHACL registration is free.
- @property_(predicate=..., datatype=...): field decorator. Named property_ to avoid shadowing the builtin. Reuses shapes.PredicateInfo.
- Model.find(iri): single-entity lookup, returns an instance or None. Lowers to a constant SPARQL SELECT keyed on the IRI.
- Model.where(**filters): returns a chainable Query with .limit(n), .order_by("field"), .fetch() (async terminal). Filter kwargs use Django-style suffixes (__gte, __lt, __in) and map to SPARQL FILTER clauses.
- instance.save(): SHACL-validates against the registered shape (via shapes.validate_via_kernel), then issues an idempotent INSERT DATA through GraphStore.add_quads scoped to the model's named graph.
- sparql"...": tagged-template helper for parameterised raw SPARQL. Interpolated values are escaped as typed literals or IRIs by the same binder sparql_proxy exposes; the result goes through validate_query before reaching the store. This is the explicit escape hatch for queries the ORM cannot express (complex aggregates, federated reads once policy allows them, analytical queries).
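To make the write path concrete, here is a minimal sketch of how @model and property_ might record class-to-RDF metadata and how save() could lower an instance to quads. Everything in it is an assumption made for illustration, since orm.py is not yet implemented: the FieldInfo record (standing in for shapes.PredicateInfo), the PREFIXES table, the in-memory FakeStore (standing in for GraphStore), the extra store= argument, and the explicit iri= argument are all hypothetical. SHACL validation is omitted.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional

# Hypothetical prefix table; the real ORM would reuse the @shape prefix machinery.
PREFIXES = {"schema": "http://schema.org/", "xsd": "http://www.w3.org/2001/XMLSchema#"}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def expand(curie: str) -> str:
    """Expand 'schema:name' to a full IRI; pass through unknown prefixes unchanged."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local if prefix in PREFIXES else curie


@dataclass(frozen=True)
class FieldInfo:  # hypothetical stand-in for shapes.PredicateInfo
    predicate: str
    datatype: Optional[str] = None


def property_(predicate: str, datatype: Optional[str] = None) -> FieldInfo:
    return FieldInfo(expand(predicate), expand(datatype) if datatype else None)


class FakeStore:  # in-memory stand-in for the async GraphStore
    def __init__(self) -> None:
        self.quads = set()

    async def add_quads(self, quads) -> None:
        self.quads.update(quads)  # set semantics make a repeated save a no-op


def model(type_iri: str, graph: str, store: FakeStore):
    def decorate(cls):
        # Collect every class attribute declared through property_().
        fields = {k: v for k, v in vars(cls).items() if isinstance(v, FieldInfo)}

        def __init__(self, iri: str, **values):
            self.iri = iri
            for name in fields:
                setattr(self, name, values[name])

        async def save(self):
            # SHACL validation would run here before any write (omitted).
            quads = [(self.iri, RDF_TYPE, type_iri, graph)]
            for name, info in fields.items():
                obj = (getattr(self, name), info.datatype)
                quads.append((self.iri, info.predicate, obj, graph))
            await store.add_quads(quads)

        cls.__init__, cls.save = __init__, save
        return cls

    return decorate


store = FakeStore()


@model(type_iri="https://myapp.example/ns/Patient", graph="app:patients", store=store)
class Patient:
    name: str = property_(predicate="schema:name")
    age: int = property_(predicate="schema:age", datatype="xsd:integer")


p = Patient(iri="https://myapp.example/p/ada", name="Ada", age=36)
asyncio.run(p.save())
asyncio.run(p.save())  # second save adds nothing new
print(len(store.quads))  # 3: one rdf:type quad plus one quad per field
```

The sketch takes the IRI explicitly because the ADR does not say how an instance created as Patient(name=..., age=...) obtains one.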
Lowering example¶
Patient.where(age__gte=18).order_by("name").limit(50).fetch() lowers to:
# Generated by orm.py; do not hand-edit.
SELECT ?iri ?name ?age FROM NAMED <app:patients> WHERE {
  GRAPH <app:patients> {
    ?iri a <https://myapp.example/ns/Patient> ;
         <http://schema.org/name> ?name ;
         <http://schema.org/age> ?age .
    FILTER(?age >= 18)
  }
}
ORDER BY ?name
LIMIT 50
The FROM NAMED clause is injected by the same inject_from_named path
that already enforces tenant scoping; the ORM does not bypass it.
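The lowering above can be pictured with a small builder. This is a sketch under stated assumptions, not the planned implementation: the Query dataclass, the OPS suffix table, and the literal() helper are hypothetical names. The sketch emits only the GRAPH block and leaves FROM NAMED to inject_from_named, as the ADR requires. It also mutates the query in place for brevity, which a real builder might avoid.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed mapping from Django-style suffixes to SPARQL comparison operators.
OPS = {"": "=", "gt": ">", "gte": ">=", "lt": "<", "lte": "<="}


def literal(value) -> str:
    """Render a Python value as a SPARQL literal (bool, int and str only)."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    escaped = (str(value).replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


@dataclass
class Query:
    type_iri: str
    graph: str
    predicates: dict  # field name -> full predicate IRI
    filters: list = field(default_factory=list)
    order: Optional[str] = None
    max_rows: Optional[int] = None

    def _check(self, name: str) -> None:
        if name not in self.predicates:
            raise KeyError(f"unknown field: {name}")

    def where(self, **kwargs) -> "Query":
        for key, value in kwargs.items():
            name, _, suffix = key.partition("__")
            self._check(name)
            if suffix == "in":
                rendered = ", ".join(literal(v) for v in value)
                self.filters.append(f"FILTER(?{name} IN ({rendered}))")
            else:
                self.filters.append(f"FILTER(?{name} {OPS[suffix]} {literal(value)})")
        return self

    def order_by(self, name: str) -> "Query":
        self._check(name)
        self.order = name
        return self

    def limit(self, n: int) -> "Query":
        self.max_rows = int(n)
        return self

    def to_sparql(self) -> str:
        select = " ".join(f"?{n}" for n in self.predicates)
        patterns = [f"a <{self.type_iri}>"]
        patterns += [f"<{iri}> ?{n}" for n, iri in self.predicates.items()]
        lines = [
            f"SELECT ?iri {select} WHERE {{",
            f"  GRAPH <{self.graph}> {{",
            "    ?iri " + " ;\n         ".join(patterns) + " .",
            *[f"    {f}" for f in self.filters],
            "  }",
            "}",
        ]
        if self.order:
            lines.append(f"ORDER BY ?{self.order}")
        if self.max_rows is not None:
            lines.append(f"LIMIT {self.max_rows}")
        return "\n".join(lines)


q = Query(
    type_iri="https://myapp.example/ns/Patient",
    graph="app:patients",
    predicates={"name": "http://schema.org/name", "age": "http://schema.org/age"},
).where(age__gte=18).order_by("name").limit(50)
print(q.to_sparql())
```

Rejecting unknown field names before they reach the query text keeps user-supplied kwargs from becoming arbitrary SPARQL variables.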
What the ORM reuses (not re-implements)¶
| Concern | Reused from |
|---|---|
| Async read/write primitives | graph.GraphStore (ADR-0001 seam) |
| SHACL validation on write | shapes.validate_via_kernel (ADR-0002) |
| SPARQL safety + FROM NAMED injection | sparql_proxy.validate_query, inject_from_named |
| Parameter escaping for sparql"..." | The binder already used by sparql_proxy |
| Named-graph scoping | GraphStore named_graphs parameter |
| Provenance on writes | Capability-level PROV-O emission (ADR-0009) — ORM does not fork this path |
| Cost envelopes | Capability-level envelope (ADR-0012) — unchanged |
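
The parameter-escaping row can be illustrated with a small binder. This is a hedged sketch, not the binder sparql_proxy exposes: the IRI marker class, bind(), and the function-call form sparql(template, **params) are hypothetical. Python has no user-definable string prefixes, so the sketch uses a plain function and only treats braces that wrap a bare identifier as placeholders, which leaves the SPARQL group braces alone.

```python
import re


class IRI(str):
    """Marker type: interpolate as <iri> rather than as a string literal."""


def bind(value) -> str:
    """Render one interpolated value as a SPARQL term."""
    if isinstance(value, IRI):
        # Characters the SPARQL IRIREF production does not allow.
        if any(c in '<>"{}|^`\\' or ord(c) <= 0x20 for c in value):
            raise ValueError(f"illegal character in IRI: {value!r}")
        return f"<{value}>"
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"'
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


PLACEHOLDER = re.compile(r"\{([A-Za-z_]\w*)\}")


def sparql(template: str, **params) -> str:
    """Substitute {name} placeholders with escaped terms; a missing name raises KeyError."""
    return PLACEHOLDER.sub(lambda m: bind(params[m.group(1)]), template)


query = sparql(
    "SELECT ?p WHERE { ?p a {cls} ; schema:age ?age . FILTER(?age >= {min_age}) }",
    cls=IRI("https://myapp.example/ns/Patient"),
    min_age=18,
)
print(query)
print(bind('Ada" } DROP ALL #'))  # the quote is escaped, so the literal stays closed
```

In the design described here, the resulting text would still pass through validate_query before reaching the store; the sketch covers only the escaping step.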
Scope fence — what the ORM does NOT do¶
- No relational joins beyond RDF property paths. There is no JOIN translator. Associations traverse predicates; anything more complex stays in sparql"...".
- No migration DSL. Schema evolution is the job of trails onto evolve.
- No schema introspection from SHACL. Phase 2 may add helpers that read shapes._SHAPES to generate typed query stubs; the first cut requires explicit @property_ declarations.
- No TypeScript surface in this ADR. TS parity is a follow-on ADR once the Python surface stabilises. The names (@model, property_, where, save) are chosen with a TS counterpart in mind, but nothing here commits the TS design.
- No lazy loading / identity map / unit-of-work. Each call is a discrete async round-trip. If these patterns prove necessary they earn their own ADR; guessing up front invites the "ORM that leaks everywhere" anti-pattern.
Phased delivery¶
| Phase | Scope | Gate |
|---|---|---|
| 1 | @model, @property_, Model.find, instance.save (write + single-entity read). SHACL validation on save. | Green capability-body test showing one-shape CRUD through the ORM with no raw SPARQL in user code. |
| 2 | Model.where, .limit, .order_by, .fetch (query builder). Django-style filter suffixes. | Two reference capabilities in the demo app converted from raw SPARQL to .where. |
| 3 | Property-path traversal for associations (Patient.where(care_team__lead__name="…") lowering to SPARQL property paths, not joins). | Bench showing property-path lowering stays under NFR-Perf1 on a 10k-triple graph. |
| 4 | TypeScript parity: own follow-on ADR, not covered here. | |
PROV-O emission on .save() and Cedar-policy interaction with Model(graph=...)
are open questions (see below); both are explicitly out of phase 1 so the
first cut ships without entangling two ADRs that are still settling.
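The Phase 3 lowering can be sketched as a walk over declared associations that emits a SPARQL sequence path. The SCHEMA registry, the CareTeam and Clinician models, and the careTeam and lead predicate IRIs are hypothetical; the ADR names only the care_team__lead__name filter key. Comparison suffixes such as __gte are left out to keep the sketch short.

```python
# Hypothetical registry: model -> field -> (predicate IRI, target model or None).
SCHEMA = {
    "Patient": {
        "name": ("http://schema.org/name", None),
        "care_team": ("https://myapp.example/ns/careTeam", "CareTeam"),
    },
    "CareTeam": {"lead": ("https://myapp.example/ns/lead", "Clinician")},
    "Clinician": {"name": ("http://schema.org/name", None)},
}


def lower_path(model: str, key: str) -> str:
    """Turn 'care_team__lead__name' into a SPARQL sequence path (p1/p2/p3)."""
    steps = []
    current = model
    for part in key.split("__"):
        if current is None:
            raise ValueError(f"{part!r} follows a literal-valued field in {key!r}")
        if part not in SCHEMA[current]:
            raise ValueError(f"{current} has no field {part!r}")
        predicate, current = SCHEMA[current][part]
        steps.append(f"<{predicate}>")
    return "/".join(steps)


def lower_filter(model: str, key: str, value: str) -> str:
    """Emit one triple pattern matching entities whose path ends at the given string."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'?iri {lower_path(model, key)} "{escaped}" .'


print(lower_filter("Patient", "care_team__lead__name", "Grace"))
```

The whole traversal becomes a single triple pattern inside one query, which is the behaviour the Phase 3 gate is meant to benchmark.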
Consequences¶
Positive¶
- Rails analogy becomes concrete. A developer evaluating Trails can point at a model class and see "this is the Rails part." The positioning document stops being aspirational.
- Capability bodies stop writing raw SPARQL. Expectation: ≥ 80% of capability bodies in the first reference app use only ORM surface. This is the number to verify after Phase 2 ships.
- Cooperates with existing decisions rather than competing. Shapes, provenance, cost, and policy remain the authoritative layers; the ORM is a surface.
- Lowers the "what does a Trails app look like?" onboarding cost. New
contributors read
orm.py, not seven modules.
Negative¶
- New surface to maintain. Every new SPARQL feature the kernel gains (property paths, aggregates, full-text search) is a question of "does this get an ORM verb, or stay in sparql"..."?" Each answer is a design call.
- ORM leaks abstractions. RDF is not SQL. An IRI is not a primary key. A predicate is not a column. The ORM will occasionally surface RDF weirdness (blank nodes, multiple values per predicate, language tags) that pretending-it's-Django will not hide gracefully.
- Risk of encouraging anti-patterns. N+1 queries over associations. Joins-that-should-be-property-paths. The ORM makes easy things easy, which historically means hard things look easy too, which historically means production incidents.
Mitigations¶
- Linter warning (trails doctor rule) when a capability body contains raw GraphStore.query(...) calls. Use sparql"..." or the ORM; raw .query is a framework-internal path.
- Docs emphasise the escape hatch. Every where/find example ends with "if this doesn't express what you need, drop to sparql." Make the escape hatch load-bearing, not shameful.
- Benchmark + reference bad pattern. Ship one example in the docs of an N+1 ORM loop and the property-path rewrite beside it, with numbers. Teach the failure mode explicitly.
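As a rough illustration of the failure mode the last mitigation wants documented, the sketch below counts round-trips for an N+1 loop and for a single property-path query. CountingStore and its methods are hypothetical stand-ins for GraphStore and ORM calls, and the figures are round-trip counts on synthetic data, not benchmark timings.

```python
import asyncio


class CountingStore:
    """In-memory stand-in for GraphStore that counts query round-trips."""

    def __init__(self, lead_names: dict) -> None:
        self._lead_names = lead_names  # patient IRI -> care-team lead's name
        self.round_trips = 0

    async def patients(self) -> list:
        self.round_trips += 1
        return sorted(self._lead_names)

    async def lead_name(self, patient: str) -> str:
        self.round_trips += 1
        return self._lead_names[patient]

    async def all_lead_names(self) -> dict:
        self.round_trips += 1  # stands in for one property-path SELECT
        return dict(self._lead_names)


async def n_plus_one(store: CountingStore) -> dict:
    names = {}
    for patient in await store.patients():               # 1 query
        names[patient] = await store.lead_name(patient)  # N more queries
    return names


async def property_path(store: CountingStore) -> dict:
    return await store.all_lead_names()                  # 1 query in total


data = {f"https://myapp.example/p/{i}": f"Lead {i % 3}" for i in range(100)}
slow, fast = CountingStore(data), CountingStore(data)
same = asyncio.run(n_plus_one(slow)) == asyncio.run(property_path(fast))
print(same, slow.round_trips, fast.round_trips)  # True 101 1
```

Both functions return the same mapping; only the number of store round-trips differs.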
Non-consequences¶
- Kernel surface (ADR-0001) unchanged: no new Rust crates, no new PyO3 types.
- Shapes (ADR-0002) unchanged: @model composes with @shape, does not replace it.
- Provenance always-on (ADR-0009) unchanged: the ORM writes through the same capability boundary that emits PROV-O / ECT (ADR-0013) today.
- Cost primitive (ADR-0012) unchanged: envelope opens/closes at the capability boundary, above the ORM.
- Cedar policy (ADR-0006) unchanged: access checks happen at the capability boundary, not inside the ORM. Model(graph=...) is declarative metadata, not an authorisation bypass (see Open questions).
Revisit conditions¶
- If Phase 2 ships and the "≥ 80% of capability bodies use only ORM" target is missed by a wide margin, re-evaluate: either the surface is wrong, or the escape hatch is too tempting, or the ORM was the wrong abstraction.
- If sparql"..." usage in capability bodies also collapses (people abandon both surfaces for hand-rolled helpers), the ORM is in the uncanny valley; treat this as a signal to simplify, not extend.
- If TypeScript adoption stalls while waiting for ORM parity, bring ADR-0018 (the TS ADR) forward.
Alternatives considered¶
- Do nothing: keep sparql_proxy as the only read surface. Rejected. The ergonomics gap against Fabrica is real and the Rails analogy suffers with every shipped capability that embeds raw SPARQL.
- Adopt RDFLib's existing sugar (Graph.triples, Resource, etc.). Rejected. RDFLib idioms are synchronous and don't compose cleanly with Trails' async GraphStore, named-graph-first conventions, SHACL validation hook, or cost/provenance interception points. We'd spend more effort adapting RDFLib than writing 300 focused lines.
- Port Fabrica's ActiveGraph directly. Rejected as a full port; partially adopted at the surface-name level. Fabrica is TS-primary and its internal mapping targets a different store abstraction. Using the same public names (@model, where, save) preserves the cross-framework mental model without coupling implementations.
- Code-generate from SHACL. Considered for phase 2+ (auto-generate @model classes from declared shapes). Not the v1 surface: explicit declarations give better IDE ergonomics and keep the learning curve shallow.
Open questions¶
- Does save() emit PROV-O automatically, or only when called from inside an @capability body? ADR-0009 mandates always-on provenance at the capability boundary. If save() runs outside a capability (e.g., in a seed script), should it still emit PROV-O? Likely answer: yes, with a synthetic prov:Activity attributed to the caller identity, but this needs the identity module's consent (see ADR-0013).
- Is ORM use inside capability bodies the recommended pattern, or just the allowed pattern? If recommended, the @capability docstring in decorators.py should say so; if merely allowed, the docs must make the trade-offs explicit.
- How does Model(graph=...) compose with Cedar policies (ADR-0006)? The named graph is often the tenant boundary. If a model hard-codes graph="app:patients" but the caller has policy that restricts them to graph="app:patients:tenant-42", whose declaration wins? Proposed: policy always wins; the model's graph= is a default, overridable by the capability-level tenant scope.
- How do multi-valued predicates surface? A schema:keyword might have ten values. Does Patient.keyword return list[str] always, or only when max_count > 1 is declared on the shape? Stance: the shape is the source of truth; @property_ inherits min/max from the underlying PredicateInfo.
- Transactional semantics. GraphStore.add_quads is idempotent but not transactional across multiple models. If a capability body save()s three related entities and the third fails SHACL, do we roll back the first two? Phase-1 stance: no; each save() is independent, and composition is the capability author's job. Revisit if this bites in the reference app.
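The stance on multi-valued predicates ("the shape is the source of truth") could look roughly like the sketch below, where the declared cardinality decides whether a field hydrates as a scalar or a list. The Cardinality record and hydrate() are hypothetical; the ADR says only that @property_ inherits min/max from PredicateInfo, so the field names min_count and max_count are assumptions here.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Cardinality:  # hypothetical stand-in for the min/max carried by PredicateInfo
    min_count: int = 0
    max_count: Optional[int] = 1  # None means unbounded


def hydrate(values: list, card: Cardinality) -> Any:
    """Return a scalar when the shape declares max_count == 1, otherwise a list."""
    if len(values) < card.min_count:
        raise ValueError(f"expected at least {card.min_count} value(s), got {len(values)}")
    if card.max_count is not None and len(values) > card.max_count:
        raise ValueError(f"expected at most {card.max_count} value(s), got {len(values)}")
    if card.max_count == 1:
        return values[0] if values else None
    return list(values)


print(hydrate(["Ada"], Cardinality(min_count=1, max_count=1)))             # Ada
print(hydrate(["rdf", "orm"], Cardinality(min_count=0, max_count=None)))  # ['rdf', 'orm']
print(hydrate([], Cardinality(min_count=0, max_count=None)))              # []
```

Under this reading, a field's Python type never depends on how many values happen to be stored, only on what the shape declares.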