ADR-0020: Tiered KG surface (T1 / T2 / T3)

  • Status: Superseded by ADR-0021 (2026-04-14)
  • Date: 2026-04-14
  • Target: M11 — proposed as v4.0.0 (positioning-level change)
  • Supersedes:
  • Superseded by: ADR-0021 (progressive enhancement, not tiered surfaces)
  • Amends (pending): ADR-0002, ADR-0004, ADR-0005, ADR-0006, ADR-0009, ADR-0012, ADR-0017 (see Open questions §5)

Note 2026-04-14: During drafting of amendments ADR-0005a/0006a/0009a it became clear the tier framing added more complexity than it removed. Superseded by ADR-0021 the same day, which replaces tiered surfaces with progressive feature adoption on a single trails surface. The analysis in this ADR of "what a plain-KG app needs vs a semweb app" remains valid as motivation; the three-tier framing and the tier enum do not.

Context

Today Trails' entry price is the entire semantic-web stack. To write even a throwaway agent that stores "this claim has this source with this confidence" a developer must declare a @shape, learn JSON-LD contexts, understand IRI minting (ADR-0003), opt into SHACL validation, read the reasoning ADR (ADR-0004) to be sure they don't want it, and only then start wiring their actual app. The Rails analogy breaks at the front door: Rails never asked you to learn third normal form before rails new.

Surveying the problem space for agentic KG apps:

  • Evidence graphs. Fact / claim / source triples with confidence scores. Need traversal and provenance. Don't need OWL, don't need closed-world SHACL, don't need JSON-LD.
  • Entity graphs. Nodes + relations + JSON-ish properties — the Neo4j shape. People, companies, documents, threads. Typed properties, ad-hoc traversal. Don't need subclass reasoning.
  • Event / temporal graphs. Actions, observations, timestamps. Window queries, causality, ordering. Don't need ontology alignment.
  • Labelled property graphs for prototypes. Someone sketching an agentic app wants graph.create_node(...) and graph.edge(a, b), not a Turtle file.

Honest estimate: ~80% of agentic KG apps live here. The remaining ~20% — the ones Trails was originally designed for (regulated workflows, compliance subgraphs, PROV-O evidence chains, cross-vocabulary alignment) — genuinely need the full semweb surface.

Comparison to Fabrica (TypeScript-primary sibling framework): Fabrica is also semweb-native by default. Neither framework serves the plain-KG use case today. This is an uncontested market segment, not a crowded one.

The current codebase backs this up: graph.py exposes Quad, URIRef, BNode, Literal, and async SPARQL; shapes.py emits SHACL Turtle; there is no surface shorter than "declare a shape, register it with the kernel." That is the correct surface for the 20%, and the wrong one for the 80%.

Decision

Trails exposes a tiered surface with three levels, each composable with the next. The tiers are additive, not exclusive — a single app can hold most of its data at T1 and a compliance subgraph at T3, in the same process, in the same store, with no ETL.

T1 — Plain KG

  • Module: trails.kg (new, Phase 1 of M11).
  • Surface: Node, Edge, Graph. String labels, JSON-typed properties. Cypher-ish methods: Graph.create_node(label, **props), Graph.create_edge(src, dst, rel, **props), Graph.match(label=..., **filters), Graph.traverse(start, rels=[...], depth=N).
  • Lowering to Oxigraph: each Node is an auto-minted IRI in a per-app namespace (ADR-0003 hybrid minting applies); each property is a triple <node> trails-kg:prop/<name> <literal>; each Edge is reified as a named-graph entry — predicate is trails-kg:rel/<label>, edge properties attached via singleton-property reification so that edges can carry properties without RDF* (which Oxigraph does not yet support). The trails-kg:* vocabulary is framework-private.
  • Visible to user: Python objects, string labels, dicts of props. Traversal results are lists of Nodes and Edges. No URIRef, no Quad, no @context.
  • Hidden: the IRI scheme, the reification pattern, the named graph each triple lands in, SHACL, SPARQL, reasoning.
  • Escape hatch: Graph.to_sparql_view(node_or_label) and Graph.sparql(query) for users who want to drop to T3-style queries over T1 data. Documented but not promoted.
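
A minimal sketch of the lowering described above, using plain tuples in place of RDF terms. The per-app base IRI, the counter-based mint helper, the rdf:type trails-kg:Node/<label> triple (borrowed from the proposal in Open question 1), and the rdf:singletonPropertyOf link are assumptions made for illustration. Named-graph placement is omitted, and none of this is the M11 implementation.

```python
from itertools import count

KG = "trails-kg:"                    # framework-private prefix named in this ADR
APP = "https://example.org/my-app/"  # assumed per-app namespace
_ids = count(1)


def mint(kind):
    """Stand-in for ADR-0003 minting: a fresh IRI in the per-app namespace."""
    return f"{APP}{kind}/{next(_ids)}"


def lower_node(label, **props):
    """Return (iri, triples): one type triple (assumed) plus one triple per property."""
    iri = mint("node")
    triples = [(iri, "rdf:type", f"{KG}Node/{label}")]
    triples += [(iri, f"{KG}prop/{name}", value) for name, value in props.items()]
    return iri, triples


def lower_edge(src, dst, rel, **props):
    """Lower one edge with the singleton-property pattern.

    A unique predicate is minted for this edge instance, linked back to the
    generic relation predicate, and edge properties hang off the unique predicate.
    """
    generic = f"{KG}rel/{rel}"
    unique = mint(f"rel/{rel}")
    triples = [(src, unique, dst), (unique, "rdf:singletonPropertyOf", generic)]
    triples += [(unique, f"{KG}prop/{name}", value) for name, value in props.items()]
    return unique, triples


alice, alice_triples = lower_node("Person", name="Alice")
claim, claim_triples = lower_node("Claim", text="It rained.")
edge, edge_triples = lower_edge(alice, claim, "ASSERTED", confidence=0.9)
```

The point of the pattern is that the edge instance itself becomes a subject, so edge properties need no quoted triples.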

T2 — Typed KG

  • Module: trails.kg.types (Phase 2 of M11).
  • Surface: @node_type(name, fields: dict[str, type]), @edge_type(name, source, target, fields: dict[str, type]). Fields validated by JSON Schema at write time (not SHACL). Property domain / range enforcement at write time. Typed queries: Evidence.match(confidence__gte=0.8).
  • Lowering to Oxigraph: same storage as T1 plus one registration triple <type-iri> rdf:type trails-kg:NodeType carrying the JSON Schema as a literal; JSON Schema validation runs in the Python surface against the kernel's PyO3 jsonschema validator (already present).
  • Visible: Python classes, typed fields, typed filters, validation errors with field-level paths.
  • Hidden: SHACL shape emission, JSON-LD, OWL-RL.
  • Escape hatch: a @node_type can be promoted to a @shape (T2→T3) via trails.kg.types.promote(NodeType) — the Python class gains a _trails_shape attribute and is registered with shapes._SHAPES, without re-writing storage. JSON Schema constraints are best-effort translated to SHACL; anything that doesn't translate emits a warning and stays T2-only.
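
The write-time check above could look roughly like the following. This is a standard-library sketch under stated assumptions: schema_for, validate, the ValidationError class, and the Python-to-JSON-Schema type table are hypothetical names, and the hand-rolled check covers only the keywords it emits. The ADR itself delegates validation to the kernel's jsonschema validator.

```python
# Python type -> JSON Schema "type" keyword (assumed mapping for illustration).
JSON_TYPES = {str: "string", float: "number", int: "integer", bool: "boolean"}


class ValidationError(ValueError):
    """Hypothetical error type carrying a field-level path."""

    def __init__(self, path, message):
        super().__init__(f"{path}: {message}")
        self.path = path


def schema_for(name, fields):
    """Build a JSON Schema document from a fields mapping."""
    return {
        "title": name,
        "type": "object",
        "properties": {f: {"type": JSON_TYPES[t]} for f, t in fields.items()},
        "required": sorted(fields),
        "additionalProperties": False,
    }


def _matches(value, json_type):
    # bool is a subclass of int in Python, so it is handled before the numeric checks.
    if json_type == "boolean":
        return isinstance(value, bool)
    if isinstance(value, bool):
        return False
    if json_type == "string":
        return isinstance(value, str)
    if json_type == "integer":
        return isinstance(value, int)
    if json_type == "number":
        return isinstance(value, (int, float))
    return False


def validate(schema, record):
    """Simplified check of type, required, and additionalProperties only."""
    for field in schema["required"]:
        if field not in record:
            raise ValidationError(f"/{field}", "required field is missing")
    for field, value in record.items():
        spec = schema["properties"].get(field)
        if spec is None:
            raise ValidationError(f"/{field}", "unexpected field")
        if not _matches(value, spec["type"]):
            raise ValidationError(
                f"/{field}", f"expected {spec['type']}, got {type(value).__name__}"
            )


schema = schema_for("Evidence", {"source": str, "confidence": float})
validate(schema, {"source": "https://example.org/a", "confidence": 0.82})  # passes
```

Passing source=42 to validate raises ValidationError with path "/source", which is the field-level path the T2 surface promises.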

T3 — Full semweb

  • Module: trails (current surface — shapes, graph, sparql_proxy, capability, orm from ADR-0017). Unchanged.
  • Surface: @shape + predicate(), SHACL export, JSON-LD, OWL, SPARQL, reasoning (ADR-0004), rich capability manifest (ADR-0005), ActiveGraph ORM (ADR-0017).
  • Lowering: the path already documented in ADR-0001, ADR-0002, ADR-0007. Nothing new here.
  • Visible: IRIs, predicates, named graphs, SHACL reports, reasoning modes, PROV-O activities, capability descriptors.
  • Docs: existing T3 documentation is unchanged. This ADR does not re-document T3; it positions T3 as the top tier of a three-tier surface.

Composability guarantee

  • Shared storage. All three tiers write to the same Oxigraph instance, the same named-graph scheme, the same GraphStore seam (ADR-0001). No copy, no ETL, no second database.
  • One process, one budget. Cost envelopes (ADR-0012), provenance (ADR-0009 — see Open questions for how it surfaces at T1), and Cedar policies (ADR-0006 — ditto) wrap the capability boundary for all three tiers.
  • Bridging is a view, not a migration. An app can declare that its T1 Citation nodes have a T3 view as schema:Claim by registering a view adapter; reads from the T3 side see the same triples the T1 side wrote.

Concrete surface sketches

The T1 and T2 examples are complete as shown (≤ 15 content lines each) and target the proposed M11 surface; the T3 entry points to existing documentation.

T1 — plain KG, 4-node graph + 2-hop traversal

from trails.kg import Graph

g = Graph("my-app")
alice = g.create_node("Person", name="Alice")
bob   = g.create_node("Person", name="Bob")
claim = g.create_node("Claim", text="It rained.", confidence=0.9)
src   = g.create_node("Source", url="https://example.org/weather")

g.create_edge(alice, claim, "ASSERTED")
g.create_edge(claim, src,   "CITES")
g.create_edge(bob,   claim, "DISPUTED")

# 2-hop: which source does the claim Alice asserted cite?
hits = g.traverse(alice, rels=["ASSERTED", "CITES"], depth=2)
assert src in [h.target for h in hits]
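
The ADR does not pin down traverse semantics, so the following in-memory stand-in shows one plausible reading: a breadth-first walk over outgoing edges whose label is in rels, stopping after depth hops and returning the edges it crossed. MemoryGraph and its Node and Edge dataclasses are hypothetical; nothing here touches Oxigraph.

```python
from collections import deque
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Node:
    id: int
    label: str


@dataclass(frozen=True)
class Edge:
    source: Node
    target: Node
    rel: str


class MemoryGraph:
    """In-memory stand-in for the proposed Graph (assumption, not the real module)."""

    def __init__(self):
        self._ids = count(1)
        self._out = {}  # node -> list of outgoing edges

    def create_node(self, label, **props):
        node = Node(next(self._ids), label)  # props are ignored in this sketch
        self._out[node] = []
        return node

    def create_edge(self, src, dst, rel):
        edge = Edge(src, dst, rel)
        self._out[src].append(edge)
        return edge

    def traverse(self, start, rels, depth):
        """Breadth-first walk over edges whose rel is in `rels`, at most `depth` hops."""
        allowed, seen, hits = set(rels), {start}, []
        queue = deque([(start, 0)])
        while queue:
            node, hops = queue.popleft()
            if hops == depth:
                continue  # hop budget exhausted on this branch
            for edge in self._out[node]:
                if edge.rel not in allowed:
                    continue
                hits.append(edge)
                if edge.target not in seen:
                    seen.add(edge.target)
                    queue.append((edge.target, hops + 1))
        return hits


g = MemoryGraph()
alice = g.create_node("Person", name="Alice")
bob = g.create_node("Person", name="Bob")
claim = g.create_node("Claim", text="It rained.")
src = g.create_node("Source", url="https://example.org/weather")
g.create_edge(alice, claim, "ASSERTED")
g.create_edge(claim, src, "CITES")
g.create_edge(bob, claim, "DISPUTED")

hits = g.traverse(alice, rels=["ASSERTED", "CITES"], depth=2)
one_hop = g.traverse(alice, rels=["ASSERTED", "CITES"], depth=1)
```

Under this reading, depth=1 stops at the claim and depth=2 reaches the source.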

T2 — typed Evidence node

from trails.kg import Graph
from trails.kg.types import node_type

@node_type("Evidence", fields={"source": str, "confidence": float})
class Evidence: ...

g = Graph("my-app")
e = g.insert(Evidence(source="https://example.org/a", confidence=0.82))
# g.insert(Evidence(source=42, confidence=0.82))
#   -> ValidationError: field 'source' expected str, got int
assert Evidence.match(confidence__gte=0.8).first() == e
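
Typed filters such as confidence__gte=0.8 can be read as a field name plus a lookup suffix. The sketch below shows one way to compile them into a predicate over plain dict records. compile_filters and the LOOKUPS table are hypothetical, and treating a bare field name as equality is an assumption; only the __gte, __lt, and __in suffixes come from this ADR.

```python
import operator

# Lookup suffix -> comparison function.
LOOKUPS = {
    "gte": operator.ge,
    "lt": operator.lt,
    "in": lambda value, choices: value in choices,
}


def compile_filters(**filters):
    """Turn keyword filters like confidence__gte=0.8 into a predicate over dict records."""
    checks = []
    for key, expected in filters.items():
        field, sep, lookup = key.partition("__")
        # No "__" in the key means plain equality (assumed default).
        compare = LOOKUPS[lookup] if sep else operator.eq
        checks.append((field, compare, expected))

    def predicate(record):
        return all(
            field in record and compare(record[field], expected)
            for field, compare, expected in checks
        )

    return predicate


rows = [
    {"source": "https://example.org/a", "confidence": 0.82},
    {"source": "https://example.org/b", "confidence": 0.41},
]
strong = [r for r in rows if compile_filters(confidence__gte=0.8)(r)]
```

An unknown suffix raises KeyError here; a real surface would presumably report it as a typed-filter error.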

T3 — same evidence, full semweb

See existing T3 docs: docs/guides/shapes.md, ADR-0002, ADR-0017. The T3 equivalent declares @shape(iri="schema:Claim", extends=["prov:Entity"]) with predicate(...) fields, opts into reasoning="rdfs" at the capability (ADR-0004), and gets SHACL validation, PROV-O emission, and SPARQL access automatically. Not re-documented here.

Tier bridging — worked example

Scenario. An agentic research app is built in three passes.

  1. Prototype (T1). The team ships in two days. trails.kg.Graph stores Papers, Authors, Citations as plain nodes + edges. Agents traverse citation graphs. No semweb anywhere in the codebase.
  2. Typing (T2). A week in, "citation" needs a typed confidence field and a published_at timestamp. The team adds @node_type for Citation. Existing T1 citation nodes are validated against the JSON Schema the next time they are written; stored rows are untouched (JSON Schema enforcement is write-time, not retroactive). The team runs trails kg doctor to find non-conforming rows.
  3. Compliance (T3). A legal review requires the citation subgraph to emit PROV-O chains and pass SHACL validation for external audit. The team calls trails.kg.types.promote(Citation) — the class gains a SHACL NodeShape, is registered with shapes._SHAPES, and the capability body adds @capability(reasoning="rdfs"). Storage does not change. A SPARQL query against the citations named graph sees the same triples it saw before. T1 agents keep running unchanged on the same nodes; the compliance pipeline queries via T3 SPARQL and SHACL-validates on export.

Key guarantees of the bridge:

  • Storage shared — one Oxigraph instance, one named graph per logical dataset, one copy of each triple.
  • No ETL — promotion is a metadata operation (shape registration + JSON-Schema-to-SHACL best-effort translation). No rewrite of existing quads.
  • No forced migration — T1 call sites keep working. T2 adds write-time validation without changing reads. T3 adds SHACL / reasoning without changing T1 reads.
  • One-way reliability. T1→T2 and T2→T3 are supported. T3→T1 is view-only: a T3 subgraph can be read via trails.kg.Graph wrapped over an existing shape, and any T1 write that targets a T3-shaped subgraph must pass SHACL or it is rejected. The ADR does not promise seamless T3→T1 write demotion.
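
The best-effort JSON-Schema-to-SHACL translation behind promote() can be illustrated as follows. This is a sketch under assumptions, not the promote() implementation: translate, the keyword table, and the choice of xsd:double for JSON "number" are hypothetical, sh:path would be a predicate IRI rather than a bare field name, and pattern is deliberately left untranslated because this ADR lists regex patterns among the constraints that do not map.

```python
import warnings

# JSON Schema keyword -> SHACL constraint property (small, assumed table).
KEYWORD_TO_SHACL = {
    "minimum": "sh:minInclusive",
    "maximum": "sh:maxInclusive",
    "minLength": "sh:minLength",
    "maxLength": "sh:maxLength",
}
TYPE_TO_DATATYPE = {
    "string": "xsd:string",
    "number": "xsd:double",
    "integer": "xsd:integer",
    "boolean": "xsd:boolean",
}


def translate(schema):
    """Translate a flat object schema into SHACL-like property shapes.

    Returns (shapes, dropped). Every constraint without a mapping is recorded
    in `dropped` and also emits a warning, so it is never lost silently.
    """
    shapes, dropped = [], []
    required = set(schema.get("required", []))
    for field, spec in schema.get("properties", {}).items():
        shape = {"sh:path": field}
        if field in required:
            shape["sh:minCount"] = 1
        for keyword, value in spec.items():
            if keyword == "type" and value in TYPE_TO_DATATYPE:
                shape["sh:datatype"] = TYPE_TO_DATATYPE[value]
            elif keyword in KEYWORD_TO_SHACL:
                shape[KEYWORD_TO_SHACL[keyword]] = value
            else:
                dropped.append((field, keyword))
                warnings.warn(f"{field}: '{keyword}' not translated; stays T2-only")
        shapes.append(shape)
    return shapes, dropped


citation_schema = {
    "type": "object",
    "required": ["confidence"],
    "properties": {
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        "doi": {"type": "string", "pattern": "^10\\."},
    },
}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    shapes, dropped = translate(citation_schema)
```

Returning the dropped list alongside the warnings gives an audit step something concrete to inspect.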

Phased delivery (M11 scope)

| Phase | Scope | Gate |
| --- | --- | --- |
| 1 | T1 surface. trails.kg module: Node, Edge, Graph, create_node, create_edge, match, traverse. Oxigraph lowering with the trails-kg:* framework-private vocabulary. Docs + 2 examples. | Reference example: 1000-node graph built and traversed, no RDF/SPARQL in user code. |
| 2 | T2 surface. @node_type / @edge_type decorators, JSON Schema validation on write, typed filters (__gte, __lt, __in), Model.match(...). trails kg doctor for retroactive conformance checks. | Reference example: evidence-graph app with typed Evidence and Source, write-time validation errors surface with field paths. |
| 3 | T1↔T3 interop. Documented SPARQL view of T1 data (the trails-kg:* vocabulary is published and stable). Documented T1 view of T3 data (read-only wrapper over a named graph). promote(NodeType) for T2→T3. Round-trip tests both directions. | Test: an app writes at T1, queries the same data at T3 via SPARQL, and the reverse; both agree on triple counts and values. |
| 4 | Docs rewrite + positioning. docs/00-vision.md reframed as "tiered KG for agentic apps, with semweb as the ceiling not the floor." Tutorial ladder: Start at T1 → add types (T2) → promote to semweb (T3) when you need it. MCP / OpenAPI / capability-manifest projections updated to surface which tier each capability operates at (new field tier with values "T1", "T2", or "T3"; see Open questions). | Vision + README + one end-to-end tutorial covering all three tiers on a single app. |

Non-goals / scope fence

  • Not a Cypher implementation. T1 is a traversal API (.match, .traverse), not a query language. Users who want Cypher use a Cypher engine.
  • Not Neo4j-compatible. No Bolt protocol, no APOC, no Cypher. The Neo4j-shape surface is the ergonomic reference, not the wire reference.
  • Not abandoning semweb. T3 stays the full current surface. This ADR explicitly does not deprecate @shape, SHACL, SPARQL, OWL-RL, JSON-LD, or any existing T3 construct.
  • Not a competing ORM. T2 @node_type is not a rename of the ADR-0017 ActiveGraph ORM. @node_type targets JSON-Schema-validated KG data without RDF types; @model (ADR-0017) targets RDF-typed entities with SHACL. They cohabit (see Open questions §4).
  • Not a new storage engine. Oxigraph remains the default (ADR-0007). T1/T2 lower to Oxigraph; they do not get a separate store.

Consequences

Positive

  • Reach broadens ~5×. Rough estimate based on the KG-apps-to-semweb-apps ratio in the agentic space: the 80% who would have bounced at the front door now have a door their size.
  • The reference compliance application can start at T1/T2. The compliance parts of the reference application pay the semweb tax; the fact-and-source parts don't. Under the current surface the reference application pays the tax everywhere.
  • Rails analogy gets stronger. trails kg new produces a working app in one command with no ontology ceremony. The ceiling (T3) is the same as today; the floor is new.
  • T3 users benefit too. A T3 app can cheaply hold auxiliary data (request logs, cache metadata, UI state) at T1 without inflating the SHACL surface area.
  • Teaching ladder. Onboarding documentation can climb T1 → T2 → T3 progressively instead of dumping the full stack on day one.

Negative

  • Maintaining three surfaces is expensive. Every future feature gate now asks "which tier does this touch?" — e.g., when MCP gets a new field, the T1 projection, T2 projection, and T3 projection all need answers. Docs approximately triple in complexity.
  • Conceptual leakage risk (T3 users). A semweb purist using T3 may resent seeing T1 concepts in the README, the CLI, and the MCP tool list. The framework's identity becomes "tiered KG," not "the semweb agent framework."
  • Conceptual leakage risk (T1 users). If T1 is too thin a veneer, the RDF weirdness leaks through — blank nodes in error messages, trails-kg:* IRIs in SPARQL views users accidentally see. The veneer must be opaque or the T1 audience won't buy it.
  • Promotion ambiguity. promote(NodeType) makes T2→T3 look free; in practice JSON-Schema-to-SHACL translation drops constraints that don't map (regex patterns, conditional schemas, custom formats). Each dropped constraint is a data-quality regression users may not notice until audit.
  • ADR debt. At least seven existing ADRs assume semweb-first (see Open questions §5). Each needs a tiering amendment; none is urgent alone, all are urgent collectively.

Positioning risk

Trails stops being "the semweb agent framework" and becomes "the tiered KG agent framework." This is the biggest surfaced risk of this ADR. The 2026-04-12 vision (docs/00-vision.md) stakes Trails' identity on the semweb-first thesis — that agents need legible, typed, provenance-bearing APIs and that RDF/SHACL/PROV are the right substrate. A tiered surface does not contradict that thesis, but it dilutes the framing: "Rails for agentic apps, from prototype to compliance" is broader than "Rails for agentic semweb apps." Broader targets are harder to reach with messaging tuned to a regulated-industries audience.

This ADR does not resolve that risk; it surfaces it and forces the decision. The alternative (stay semweb-first) costs the 80%.

Alternatives considered

  • Status quo — T3 only. Rejected. Serves the minority. The 80% go elsewhere, usually to Neo4j or plain Postgres + JSON columns. Trails permanently caps at the semweb niche.
  • Drop semweb entirely — T1 only. Rejected. Throws away the interop, reasoning, provenance-chain, and SHACL stories Trails has already invested in. Also kills the reference compliance application's use case and the regulated-industries positioning that the vision document anchors on.
  • Ship T1 as a separate framework (e.g., trails-kg). Rejected. Two frameworks fragment the narrative ("which Trails do I use?"), duplicate the kernel wiring, and lose the tier-bridging guarantee that is the load-bearing feature of this ADR. Without shared storage and shared kernel, T1→T2→T3 promotion is ETL — and ETL is the thing we are specifically promising users they can skip.
  • T1 veneer as a tutorial convention, not a module. Rejected. A tutorial-only T1 means every serious app drifts back to T3 surface; the 80% don't get a supported path, they get a toy.

Open questions

  1. Cedar policy (ADR-0006) on T1 data. Cedar evaluates over typed principals, actions, and resources; T1 data has string labels, not RDF types. Proposed resolution: T1 resources are typed as the framework-private IRI trails-kg:Node/<label>, so Cedar rules referencing label == "Patient" translate to resource-type checks at the named-graph boundary. Requires ADR-0006 amendment to acknowledge label-typed resources alongside RDF-typed ones.
  2. PROV-O (ADR-0009, always-on) for T1 operations. ADR-0009 mandates provenance on every capability invocation. For T1 writes this needs to work without the user declaring a PROV-O Entity. Proposed: the kernel emits a minimal PROV activity with the trails-kg:Node/<label> type as prov:Entity, and the provenance graph is itself T3 (PROV-O is inherently T3). T1 users never see it unless they query the provenance graph explicitly. Requires ADR-0009 amendment acknowledging non-RDF-typed entities.
  3. Rich capability manifest (ADR-0005) and tier declaration. Does @capability gain a tier field in the canonical manifest? MCP, OpenAPI, and JSON-LD projections all need to answer. Proposed: add tier: "T1" | "T2" | "T3" (default T3 for backward compatibility), surface in JSON-LD manifest and MCP _trails:tier extension. Amends ADR-0005 field inventory.
  4. ActiveGraph ORM (ADR-0017) targeting. Does @model from ADR-0017 target T1, T2, or T3? Current ADR-0017 text reads T3-only (calls shape() internally, SHACL validates on save(), lowers to SPARQL with RDF types). Proposed for this ADR: ADR-0017 is T3-only and stays that way; T2 has its own @node_type; T1 has no ORM (it has Graph.create_node(...)). Amends ADR-0017 to state the tier explicitly.
  5. Which existing ADRs need amendments. Current list, each tracked as a separate follow-on change; none blocking this ADR, all required before M11 closes:
       1. ADR-0002 (Python-first shapes): note that shapes are T3, that T2 uses JSON Schema, and that promote() is the bridge.
       2. ADR-0004 (Query-time reasoning): confirm reasoning is T3 only; T1/T2 never materialise :inferred graphs.
       3. ADR-0005 (Rich capability manifest): add tier field (§3).
       4. ADR-0006 (Cedar policy): acknowledge label-typed resources (§1).
       5. ADR-0009 (Provenance always-on): acknowledge non-RDF-typed entities (§2).
       6. ADR-0012 (Cost primitive): confirm cost envelopes wrap all three tiers identically.
       7. ADR-0017 (ActiveGraph ORM): state T3-only targeting (§4).
  6. trails-kg:* vocabulary stability. The framework-private vocabulary used to lower T1/T2 to Oxigraph is exposed the moment a user opens a SPARQL view. Is it stable? Proposed: the vocabulary is documented and SemVer'd starting M11; breaking changes gate on a Trails major version. Without this guarantee the T1↔T3 interop story (Phase 3) is not reliable.
  7. T3→T1 read wrappers and polymorphism. A T3 schema:Person with three subclasses renders as what at T1? A single Person label, three labels, or a Person label with a type property? This ADR defers the answer to the Phase 3 design doc; the answer affects whether T1 views of T3 data are useful or a footgun.
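
The tier field proposed in the capability-manifest open question (tier defaulting to T3, surfaced as _trails:tier in the MCP projection) might be modelled as below. CapabilityEntry and to_mcp are hypothetical names used only to show the backward-compatible default; the actual manifest shape belongs to ADR-0005.

```python
from dataclasses import dataclass
from typing import Literal, get_args

Tier = Literal["T1", "T2", "T3"]


@dataclass(frozen=True)
class CapabilityEntry:
    """Hypothetical manifest entry; only the tier field comes from this ADR."""

    name: str
    tier: Tier = "T3"  # default keeps manifests written before tiering valid

    def __post_init__(self):
        # Literal is not enforced at runtime, so the value is checked explicitly.
        if self.tier not in get_args(Tier):
            raise ValueError(f"tier must be one of {get_args(Tier)}, got {self.tier!r}")

    def to_mcp(self):
        """Project to an MCP-style dict using the proposed _trails:tier extension key."""
        return {"name": self.name, "_trails:tier": self.tier}


legacy = CapabilityEntry("export_audit")             # no tier declared, so T3
scratch = CapabilityEntry("log_request", tier="T1")  # auxiliary data held at T1
```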

Decision owner: (unassigned — target M11 lead). Review gate: amendments §5.1–5.7 must be drafted (not merged) before Phase 1 of M11 begins; Phase 4 blocks on all amendments being merged.