02 — Architecture¶

Trails has one surface. A capability is a Python function; it sees one ctx; it grows from plain nodes and edges into typed nodes, SHACL shapes, and OWL classes by the author adding a feature, never by changing module, import, or API shape. The architecture mirrors that framing: a Python framework (~24K LLOC, 94 modules) that owns all framework logic — ORM, policy evaluation, SHACL validation, reasoning, federation, agents, ingestion, vector search, MCP server, CLI, and admin UI — backed by a Rust-embedded storage engine (~5K LLOC, 7 active crates + 5 archived) that provides in-process Oxigraph access, provenance attachment, panic-safe FFI containment, and a structured error taxonomy. The relationship is comparable to Django and SQLite: the framework is Python, the embedded store is Rust. See ADR-0021 for the north-star decision and concepts/progressive-enhancement.md for the conceptual overview.

Layer diagram¶

+--------------------------------------------------------------+
|  APP                                                         |
|  app/capabilities/  app/shapes/  policies/*.cedar            |
|  ontology/*.ttl (optional)  trails.toml                      |
+--------------------------------------------------------------+
                            |
+--------------------------------------------------------------+
|  FRAMEWORK   (Python, `trails` package — ~24K LLOC)          |
|                                                              |
|  Authoring decorators                                        |
|  @capability   @node_type   @shape   @policy                 |
|  @before  @after  @on_error  @around       (middleware)      |
|  @resource  @prompt                        (MCP)             |
|                                                              |
|  Core modules (all `trails.<name>`, no tier split)           |
|  orm (4,770 LLOC)  kg  shapes  policy  llm  agent            |
|  ingest  vector  testing  observability  rendering           |
|  registry  doctor                                            |
|                                                              |
|  Middleware layer (cross-cutting)                            |
|  before / around / after / on_error on every dispatch.       |
|  Validation, policy, provenance, cost live in the            |
|  coordinator, not in middleware (no skip primitive).        |
|                                                              |
|  Transports                                                  |
|  mcp_server (stdio + SSE)   http_adapter (FastAPI)           |
|  cli (click)                Dev REPL `trails console`        |
+--------------------------------------------------------------+
                            |  PyO3 abi3-py311 FFI (panic-safe)
+--------------------------------------------------------------+
|  STORAGE ENGINE   (Rust, 7 active + 5 archived, ~5K LLOC)    |
|  trails-graph   trails-shapes   trails-reason                |
|  trails-policy  trails-prov     trails-caps                  |
|  trails-identity  trails-cost                                |
|  trails-adapters-fuseki  trails-adapters-qlever              |
|  trails-ffi   trails-wasm                                    |
+--------------------------------------------------------------+
                            |
+--------------------------------------------------------------+
|  BACKENDS   (pluggable via trait impls)                      |
|  Graph:  Oxigraph (embedded, default) | Fuseki | Qlever      |
|  Vector: SqliteVecStore (embedded) | QdrantStore             |
|  Queue:  NATS | embedded                                     |
|  LLM:    Anthropic | Ollama | mock                           |
+--------------------------------------------------------------+

Architectural principles¶

AP1 — Python framework, Rust-embedded store¶

Framework logic lives in Python (DX, ecosystem, rapid iteration); the storage engine lives in Rust (in-process Oxigraph without JVM overhead, panic-safe FFI boundary, PROV-O attachment). The Rust layer is an embedded store, comparable to how Django embeds SQLite, not a traditional kernel. The Rust-accelerated Python pattern is proven by Ruff, uv, Polars, and pydantic-core. See ADR-0001.

AP2 — One module, features opt in¶

There is one trails module, one @capability decorator, one ctx object. Labels, @node_type, @shape, and OWL are additive features that the author enables when the app needs them; no split namespaces, no "tier" choice at project start. Cedar, PROV-O, and SHACL inspect each entity and act on whatever typing is present — labels only, or JSON-Schema type, or SHACL shape, or RDF class ("strongest available type," ADR-0022). See ADR-0021.
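A minimal sketch of the "strongest available type" idea, assuming the strength order follows the order of the list above (labels, then JSON-Schema type, then SHACL shape, then RDF class). `TypingLevel`, `Entity`, and `strongest_available_type` are hypothetical names for illustration; the actual resolver is specified in ADR-0022.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional


class TypingLevel(IntEnum):
    """Typing strength, weakest to strongest (assumed ordering)."""
    LABELS = 1
    JSON_SCHEMA = 2
    SHACL_SHAPE = 3
    RDF_CLASS = 4


@dataclass
class Entity:
    """Hypothetical entity record; every typing field is optional."""
    labels: list[str] = field(default_factory=list)
    json_schema_type: Optional[str] = None
    shacl_shape: Optional[str] = None
    rdf_class: Optional[str] = None


def strongest_available_type(entity: Entity) -> tuple[TypingLevel, str]:
    """Return the strongest typing the entity carries and its identifier."""
    if entity.rdf_class:
        return TypingLevel.RDF_CLASS, entity.rdf_class
    if entity.shacl_shape:
        return TypingLevel.SHACL_SHAPE, entity.shacl_shape
    if entity.json_schema_type:
        return TypingLevel.JSON_SCHEMA, entity.json_schema_type
    # Fall back to plain labels; an entity without labels matches on "".
    return TypingLevel.LABELS, entity.labels[0] if entity.labels else ""
```

The point of the sketch is that the caller never picks a tier: adding a shape or class to an entity changes what the matcher returns without changing the call site.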

AP3 — Opinionated floor, pluggable ceiling¶

The framework raises the floor with strong defaults (Oxigraph, Cedar, PROV-O always on, MCP primary) and raises the ceiling through traits (swappable graph backends, opt-in reasoning, pluggable identity). It holds no opinions where they would lock users in, and strong opinions where they save users from themselves.

AP4 — Capability-first, not route-first¶

The canonical address of a capability is its descriptor (JSON-LD with shape IRIs, preconditions, costs). HTTP routes and MCP tools are projections of that canonical form. See ADR-0005.
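As an illustration of "projection", the sketch below derives an HTTP route and an MCP tool entry from one descriptor. The descriptor fields, the POST-per-capability convention, and both function names are assumptions made for this example, not the Trails descriptor schema.

```python
import re

# Hypothetical descriptor; field names are illustrative only.
descriptor = {
    "@id": "https://example.org/capabilities/invoice.create",
    "name": "invoice.create",
    "inputShape": "https://example.org/shapes/InvoiceInput",
    "cost": {"estimate": 500},
}


def to_http_route(desc: dict) -> dict:
    """Project a descriptor onto an HTTP route (assumed: one POST per capability)."""
    path = "/" + desc["name"].replace(".", "/")
    return {"method": "POST", "path": path, "capability": desc["@id"]}


def to_mcp_tool(desc: dict) -> dict:
    """Project the same descriptor onto an MCP tool entry."""
    # Restrict the tool name to a conservative character set.
    name = re.sub(r"[^A-Za-z0-9_-]", "_", desc["name"])
    return {"name": name, "capability": desc["@id"], "inputShape": desc["inputShape"]}
```

Both outputs carry the descriptor IRI, so either transport can be traced back to the same canonical capability.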

AP5 — IRI is the primary key¶

Every entity has an IRI; every response carries IRIs. Auto-increment IDs never leak across the API boundary. See ADR-0003.

AP6 — Provenance is not optional¶

PROV-O triples are emitted on every write, so the prov: graph is always populated. You can leave it out of queries, but you cannot write without it. See ADR-0009.
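A toy in-memory sketch of "no write without provenance", assuming a single `write` entry point that is the only path into the store. `ProvenanceStore` and the `urn:activity:` IRIs are hypothetical; the `prov:` terms are standard PROV-O properties.

```python
from datetime import datetime, timezone

PROV_GRAPH = "prov:"
Triple = tuple[str, str, str]


class ProvenanceStore:
    """In-memory stand-in: every data write also appends PROV-O triples."""

    def __init__(self) -> None:
        self.graphs: dict[str, list[Triple]] = {PROV_GRAPH: []}

    def write(self, graph: str, triple: Triple, agent: str) -> None:
        # Callers may never target the provenance graph directly.
        if graph == PROV_GRAPH:
            raise PermissionError("prov: graph is written by the store only")
        self.graphs.setdefault(graph, []).append(triple)
        activity = f"urn:activity:{len(self.graphs[PROV_GRAPH])}"
        self.graphs[PROV_GRAPH] += [
            (activity, "rdf:type", "prov:Activity"),
            (activity, "prov:wasAssociatedWith", agent),
            (triple[0], "prov:wasGeneratedBy", activity),
            (activity, "prov:endedAtTime", datetime.now(timezone.utc).isoformat()),
        ]
```

Because provenance is attached inside `write`, a caller has no way to skip it, which is the property this principle describes.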

AP7 — Cost is a framework primitive¶

Every capability has a cost envelope. Budget limits are enforced by the framework, not by userland decorators. Making cost central forces design conversations to stay honest. See ADR-0012.
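A minimal sketch of a cost envelope as a context manager, assuming a reserve-then-settle model. `Budget`, `BudgetExceeded`, and the `usage` dict are hypothetical and stand in for whatever the framework exposes; units are arbitrary.

```python
from contextlib import contextmanager


class BudgetExceeded(Exception):
    """Raised when an estimate does not fit the remaining budget."""


class Budget:
    """Hypothetical budget ledger (tokens, cents, or any other unit)."""

    def __init__(self, limit: float) -> None:
        self.limit = limit
        self.spent = 0.0
        self.reserved = 0.0

    @contextmanager
    def envelope(self, estimate: float):
        """Reserve `estimate` up front and settle the actual cost on exit."""
        if self.spent + self.reserved + estimate > self.limit:
            raise BudgetExceeded(f"estimate {estimate} exceeds remaining budget")
        self.reserved += estimate
        usage = {"actual": 0.0}
        try:
            yield usage  # the handler records what it really spent
        finally:
            # Runs on success and on error, so a reservation never leaks.
            self.reserved -= estimate
            self.spent += usage["actual"]


budget = Budget(limit=100)
with budget.envelope(estimate=40) as usage:
    usage["actual"] = 25
```

Putting the check in the framework rather than in a userland decorator means a handler cannot forget to open or close its envelope.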

Identity is uniform across users and service accounts: DIDs for both, VCs for claims, and ACT / biscuit tokens for capability-bearing auth. There is no separate user table and service-account table. See ADR-0011 and ADR-0013.

Component responsibilities¶

Storage engine components (Rust, 7 active crates + 5 archived, ~5K LLOC)¶

| Crate | Responsibility | Never does |
| --- | --- | --- |
| `trails-graph` | SPARQL query/update, named graphs, transactions, snapshots | Validation, reasoning, policy |
| `trails-shapes` | SHACL shape validation on inputs, outputs, and `ctx.kg` writes | Storage, reasoning, policy |
| `trails-reason` | RDFS + OWL-RL entailment, opt-in via FFI, feature-detected from the loaded ontology | Validation, storage, policy |
| `trails-policy` | Cedar PDP: permit/deny + reasoning trace; strongest-available-type matching (ADR-0022) | Authentication (that's identity) |
| `trails-prov` | PROV-O triple generation, attachment to `prov:` graph; owns the `Assurance` enum | Cost accounting, identity |
| `trails-caps` | Stores capability descriptors, emits MCP / JSON-LD / OpenAPI projections | Dispatch (lives in the framework coordinator) |
| `trails-identity` | DID resolution, VC verification, ACT issue/verify, biscuit attenuation, `Signer` (ECT key custody) | Authorization (that's policy) |
| `trails-cost` | Per-capability envelopes, budget enforcement, `CostScope` nesting, anomaly hooks | Metrics export (that's observability) |
| `trails-adapters-fuseki` | Apache Jena Fuseki HTTP adapter (async-native) | Anything but graph I/O |
| `trails-adapters-qlever` | Qlever HTTP adapter (async-native, read-heavy) | Anything but graph I/O |
| `trails-ffi` | PyO3 bindings; one module per store subsystem under `trails._core`; every entry point wraps `catch_unwind` → `PyErr` | Framework logic (lives in Python) |
| `trails-wasm` | `WasmStore` over `OxigraphStore` for browser / edge targets | Anything but a thin store facade |

Framework components (Python, `trails` package, ~24K LLOC)¶

| Module | Responsibility | Delegates to |
| --- | --- | --- |
| Decorators (`@capability`, `@node_type`, `@shape`, `@policy`, `@before/@after/@on_error/@around`, `@resource`, `@prompt`) | Metadata registration, dispatch wrapping, middleware binding | `trails._core` for store ops |
| `trails.orm` | `@node_type`, `Model`, `QueryBuilder`, `Q` combinators, property-path traversal, sync + async variants | `trails._core.Store` via FFI |
| `trails.kg` | `ctx.kg` namespace: `add / save / find / where / node / edge / match / traverse` | `trails._core.Store` via FFI |
| `trails.shapes` | `@shape` + `predicate()` with `one_of / min_value / max_value / pattern / min_length / max_length`; SHACL validation logic | `trails._core.Store` via FFI |
| `trails.policy` | Cedar `@policy` enforced by `invoke()`; `.cedar` file loader; strongest-available-type resolver (ADR-0022); policy evaluation | `trails._core` via FFI |
| `trails.llm` | `LLMClient` over `anthropic` / `ollama` / `mock`; streaming + tool calls; PROV-O step links | `trails._core` via FFI for provenance |
| `trails.agent` | `Session` (token-windowed, fork / branch / replay, KG persistence); ReAct / Plan-and-Execute / Reflexion planners | `trails.llm`, `trails.kg`, `trails._core` |
| `trails.ingest` | PDF / HTML / Markdown extractors, paragraph chunker, `@node_type` `Document` + `Chunk`, PROV-O per run | `trails.orm`, `trails.kg` |
| `trails.vector` | Embedders (mock, sentence-transformers, OpenAI); `SqliteVecStore` / `QdrantStore`; hybrid SPARQL + vector `retrieve()` | `trails.orm`, optional vector backend |
| `trails.testing` | `isolated_kernel`, `mock_llm`, `capture_events`, `fresh_context`, shape-pinned assertions | Everything (stdlib-only) |
| `trails.observability` | Hook + `Span` / `tracer` / `metrics`; OTel-friendly events on capability + LLM + KG read/write | None (emits; does not collect) |
| `trails.rendering` | Bi-modal (Markdown + JSON-LD) output from Jinja templates | None (pure Python) |
| MCP server | JSON-RPC 2.0 stdio per MCP `2024-11-05`; Tools + Resources (`resources/list/read/subscribe`) + Prompts (`prompts/list/get`); SSE alongside stdio | `trails.registry`, `@resource`, `@prompt` |
| HTTP adapter | FastAPI routes, content negotiation, OpenAPI | `trails.registry` |
| CLI (`trails …`) | Scaffolding, ontology export, dev server, simulation, registry, doctor | Everything below |
| Middleware layer | Cross-cutting `before / after / on_error / around` on every dispatch; glob-pattern targeting | Runtime dispatch coordinator |
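Glob-pattern targeting of middleware can be sketched with the standard library's `fnmatch`. The `before` decorator below is a hypothetical stand-in that only records registrations; the real decorator's signature and matching rules are not specified in this section.

```python
from fnmatch import fnmatchcase
from typing import Callable

Handler = Callable[[str], None]

# (pattern, handler) pairs kept in registration order.
_before_handlers: list[tuple[str, Handler]] = []


def before(pattern: str) -> Callable[[Handler], Handler]:
    """Hypothetical stand-in for a glob-targeted @before decorator."""
    def register(fn: Handler) -> Handler:
        _before_handlers.append((pattern, fn))
        return fn
    return register


def matching_before(capability_name: str) -> list[Handler]:
    """Return handlers whose glob matches, preserving registration order."""
    return [fn for pattern, fn in _before_handlers
            if fnmatchcase(capability_name, pattern)]


@before("invoice.*")
def audit_invoice(name: str) -> None:
    print(f"audit before {name}")


@before("*")
def trace_all(name: str) -> None:
    print(f"trace before {name}")
```

Here `matching_before("invoice.create")` selects both handlers in the order they were registered, while a capability outside the `invoice.` namespace only picks up the catch-all.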

Request lifecycle (trace)¶

The full dispatch runs inside a single coordinator step in Python. The coordinator owns a snapshot-isolated graph transaction (via the Rust store layer) spanning policy evaluation, handler execution, and provenance write, so that policy attributes and handler reads observe the same point-in-time graph state (NFR-Sec10). Middleware (before / around / after / on_error) runs around this spine but cannot skip validation, policy, provenance, or cost.

Happy-path order (strict):

1. framework.render.decode(input)
2. Middleware: matching @around handlers wrap steps 3–11.
3. ACT verification: store.identity.verify_act(ctx.token) (checks signature, jti-replay cache, aud, exp/nbf with ±60 s skew; NFR-Sec5, NFR-Sec6, NFR-Rel5)
4. Identity resolution: store.identity.resolve(ctx.principal) via DID method (TLS-pinned for did:web, NFR-Sec7)
5. Matching @before handlers run in registration order.
6. Input shape validation: store.validator.validate(input, shape) [abort: 400]
7. Policy decision: store.policy.permit?(principal, action, resource) with strongest-available-type match (ADR-0022); decision cached on the coordinator for the handler window [abort: 403]
8. Budget envelope open: store.cost.open_envelope(estimate) [abort: 429]
9. Handler execution (snapshot-isolated): user code runs against ctx (handler reads + writes share the same txn as policy attributes):
   - ctx.kg.add / save / find / where / node / edge / match / traverse
   - ctx.llm if an LLM client is configured
   - store.reasoner.entail? (if an ontology with owl:Class / rdfs:subClassOf is loaded; charged to envelope)
10. Output shape validation: store.validator.validate(output, shape) [abort: 500; rollback]
11. Provenance recording: store.provenance.record(activity); if ProvenanceWriter.assurance == L2|L3 the writer signs the ECT; L3 anchors to the external audit ledger synchronously
12. Matching @after (success) or @on_error (failure) runs in registration order.
13. Budget close: store.cost.close_envelope(actual) (RAII: envelope closes even on panic; NFR-Sec14)
14. Policy decision log: append-only, hash-chained
15. Coordinator commits the graph txn; response envelope ships to the transport layer

Response envelope: { payload, prov_iri, ect?, cost, consent_receipt, trace_id }.
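The core of this spine (validation, policy, budget, handler, then commit or rollback) can be sketched in plain Python. Everything below is a simplified stand-in: `dispatch`, `Abort`, `Txn`, and the callback parameters are hypothetical, the success outcome name is assumed, and token verification, middleware, and output validation are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


class Abort(Exception):
    """Carries the HTTP status and provenance outcome of a failed step."""

    def __init__(self, status: int, outcome: str) -> None:
        super().__init__(outcome)
        self.status = status
        self.outcome = outcome


@dataclass
class Txn:
    """Stand-in for the snapshot-isolated graph transaction."""
    writes: list = field(default_factory=list)
    state: str = "open"


def dispatch(handler: Callable[[Txn, dict], Any], payload: dict, *,
             validate: Callable[[dict], bool],
             permit: Callable[[dict], bool],
             budget_ok: Callable[[dict], bool]) -> dict:
    """Run validation -> policy -> budget -> handler in strict order."""
    txn = Txn()
    prov: dict = {}
    try:
        if not validate(payload):
            raise Abort(400, "validation_failed")
        if not permit(payload):
            raise Abort(403, "denied")
        if not budget_ok(payload):
            raise Abort(429, "budget_exceeded")
        try:
            result = handler(txn, payload)
        except Exception as exc:
            raise Abort(500, "handler_error") from exc
        prov["outcome"] = "success"  # assumed name for the happy path
        txn.state = "committed"
        return {"status": 200, "payload": result, "prov": prov}
    except Abort as abort:
        # Failure branch: discard writes but still record provenance.
        txn.writes.clear()
        txn.state = "rolled_back"
        prov["outcome"] = abort.outcome
        return {"status": abort.status, "payload": None, "prov": prov}
```

Keeping every check inside one function mirrors the single-coordinator design: there is no code path where the handler runs without the earlier steps having passed.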

Deny and failure branches¶

On any abort above, the coordinator:

- Rolls back the graph txn (no partial writes, NFR-Rel1).
- Still emits a provenance record with trails:outcome "denied" (or "validation_failed", "handler_error", "budget_exceeded"); audit posture requires every invocation to produce a PROV-O trail, not just successes (see ADR-0009).
- Closes the cost envelope with actual=estimate so reserved budget is released.
- Writes the policy decision log entry even for deny paths.
- Runs matching @on_error middleware before translating the error per the error taxonomy (§"Error taxonomy") below.
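Both the happy path and the deny path append to the policy decision log, which the lifecycle describes as append-only and hash-chained. A minimal sketch of that technique follows, assuming SHA-256 over canonical JSON and an all-zero genesis value; `DecisionLog` and its entry fields are hypothetical and the on-disk format is not specified here.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed predecessor value for the first entry


def _digest(body: dict) -> str:
    """Hash canonical JSON (sorted keys) so the digest is reproducible."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class DecisionLog:
    """Append-only log where each entry commits to its predecessor's hash."""

    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, principal: str, action: str, decision: str) -> dict:
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        body = {"principal": principal, "action": action,
                "decision": decision, "prev": prev}
        entry = {**body, "hash": _digest(body)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; an edited or reordered entry breaks the chain."""
        prev = GENESIS
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True
```

Tampering with any earlier decision changes its hash, which no longer matches the `prev` value stored in the next entry, so `verify()` fails.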

FFI boundary design¶

- PyO3-based. One Rust module per store subsystem, unified under a Python trails._core package. The Python framework imports from trails._core; application code never does.
- Async-capable. PyO3 0.21+ supports Python async def bridging to Rust async fn via pyo3-async-runtimes. Used for streaming capabilities and SSE transport.
- Zero-copy where it matters. RDF terms are passed as borrowed slices; JSON-LD documents are serialized once on the Rust side and exposed as bytes to Python.
- Error mapping. Rust Result<T, TrailsError> → Python trails.TrailsError with structured context (.field, .constraint, .iri).
- Stable ABI. PyO3 abi3-py311 target; one wheel per platform.

Deployment topologies¶

Dev (default)¶

+--------------------------------------+
|  single process                      |
|  +--------------------------------+  |
|  |  Trails Python framework       |  |
|  |  |- MCP server (stdio/SSE)     |  |
|  |  |- FastAPI on :8000           |  |
|  |  `- _core (PyO3)               |  |
|  |     `- Oxigraph (embedded)     |  |
|  |        `- RocksDB (embedded)   |  |
|  +--------------------------------+  |
+--------------------------------------+

Single-node prod¶

+--------------------------------------------------+
|  Docker host                                     |
|  +--------------+  +--------------+  +--------+  |
|  | Trails app   |<-| NATS         |  | R2/S3  |  |
|  | (n replicas) |  | (queue)      |  | (blobs)|  |
|  +------+-------+  +--------------+  +--------+  |
|         |                                        |
|  +------v-------+  +--------------+              |
|  | Oxigraph     |  | Postgres     |              |
|  | (server)     |  | (relational) |              |
|  +--------------+  +--------------+              |
+--------------------------------------------------+

Cluster (v2)¶

          +-> Trails pod --+
  agents -+-> Trails pod --+--> Qlever cluster (read)
          +-> Trails pod --+    Oxigraph cluster (write)
                           |    Postgres HA
                           |    NATS cluster
                           +--> R2

Cross-cutting concerns¶

Observability¶

- Structured logs (JSON) to stdout, one line per capability invocation with trace ID.
- OTLP traces spanning the FFI boundary (Python → PyO3 → Oxigraph).
- trails.observability emits kg_write / kg_query / capability_start / capability_end / llm_call events.
- The PROV-O graph serves as a queryable audit trail.
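The "one JSON line per invocation" convention can be sketched with the standard library alone. The record fields other than the trace ID and the `capability_end` event name are assumptions, and `log_invocation` is a hypothetical helper rather than part of `trails.observability`.

```python
import json
import sys
import time
import uuid
from typing import Optional, TextIO


def log_invocation(capability: str, outcome: str, duration_ms: float,
                   trace_id: Optional[str] = None,
                   stream: TextIO = sys.stdout) -> dict:
    """Emit exactly one JSON line describing a capability invocation."""
    record = {
        "ts": time.time(),
        "event": "capability_end",
        "capability": capability,
        "outcome": outcome,
        "duration_ms": round(duration_ms, 3),
        # Reuse the caller's trace ID when present so logs and spans correlate.
        "trace_id": trace_id or uuid.uuid4().hex,
    }
    # Compact separators and a single trailing newline keep it to one line.
    stream.write(json.dumps(record, separators=(",", ":")) + "\n")
    return record
```

One line per invocation keeps the log greppable by trace ID and lets a collector parse it without multi-line handling.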

Error taxonomy¶

| Class | HTTP | MCP error | Source |
| --- | --- | --- | --- |
| `ValidationError` | 400 | `InvalidParams` | shape mismatch |
| `AuthenticationError` | 401 | `Unauthorized` | bad token / DID |
| `AuthorizationError` | 403 | `Forbidden` | policy deny |
| `PreconditionError` | 412 | `PreconditionFailed` | capability precondition |
| `BudgetExceededError` | 429 | `RateLimited` | cost envelope |
| `HandlerError` | 500 | `InternalError` | user code |
| `BackendError` | 503 | `ServiceUnavailable` | graph/store failure |
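One way to read the taxonomy is as a class hierarchy in which each error carries its own transport mapping. The sketch below mirrors the class names, status codes, and MCP error names from the table and the structured context fields (`.field`, `.constraint`, `.iri`) from the FFI section; the constructor signature and the `to_http` helper are assumptions, not the framework's API.

```python
from typing import Optional


class TrailsError(Exception):
    """Base error with structured context; defaults to an internal error."""
    http_status = 500
    mcp_error = "InternalError"

    def __init__(self, message: str, *, field: Optional[str] = None,
                 constraint: Optional[str] = None,
                 iri: Optional[str] = None) -> None:
        super().__init__(message)
        self.field = field
        self.constraint = constraint
        self.iri = iri


class ValidationError(TrailsError):
    http_status, mcp_error = 400, "InvalidParams"


class AuthenticationError(TrailsError):
    http_status, mcp_error = 401, "Unauthorized"


class AuthorizationError(TrailsError):
    http_status, mcp_error = 403, "Forbidden"


class PreconditionError(TrailsError):
    http_status, mcp_error = 412, "PreconditionFailed"


class BudgetExceededError(TrailsError):
    http_status, mcp_error = 429, "RateLimited"


class HandlerError(TrailsError):
    http_status, mcp_error = 500, "InternalError"


class BackendError(TrailsError):
    http_status, mcp_error = 503, "ServiceUnavailable"


def to_http(exc: Exception) -> tuple[int, dict]:
    """Translate any exception to (status, body); unknown errors become 500."""
    if not isinstance(exc, TrailsError):
        exc = HandlerError(str(exc))
    body = {"error": type(exc).__name__, "message": str(exc)}
    for key in ("field", "constraint", "iri"):
        value = getattr(exc, key)
        if value is not None:
            body[key] = value
    return exc.http_status, body
```

Keeping the mapping on the class means the HTTP adapter and the MCP server can translate the same exception without a separate lookup table per transport.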

Security model¶

Trails' security posture is enforced at the Rust FFI boundary, not by convention in application code. The controls below map to the 01-requirements NFR-Sec5..14 block and the corresponding deep-dive in the design spec (§3.8 request-time enforcement, §3.11 supply chain). This section is the architect-level summary; normative text lives in requirements + ADRs.

- Token integrity (NFR-Sec5, NFR-Sec6). ACT is the primary capability mandate; biscuit is a chain-delegation attenuation layer. trails-identity maintains a jti-seen cache keyed by token exp + skew and an algorithm allowlist (EdDSA | ES256 | ES384). alg=none and symmetric algs are refused at parse time.
- DID resolution trust (NFR-Sec7). Non-did:key resolvers use TLS 1.3 with pinned CA bundles or DNSSEC; DID documents are TOFU-pinned per content hash with auditable rotation. No silent substitution path.
- Caveat bounds (NFR-Sec8). Biscuit Datalog verification has a hard cap on facts, rules, attenuation depth, and wall-clock time, so a malformed or malicious caveat cannot stall the store engine.
- SPARQL/SHACL bounds (NFR-Sec9). Every query runs under wall-clock + memory + result-row caps; SHACL recursion is depth-limited.
- Snapshot isolation (NFR-Sec10). PDP evaluation, handler reads, and provenance writes share one read-snapshot. TOCTOU gaps between "policy said yes" and "handler acted" are closed at the trait level.
- Provenance write exclusivity (NFR-Sec11). The prov: graph is store-write-only. A handler calling INSERT DATA { GRAPH <prov:> { … } } is denied at the GraphStore boundary before policy even sees it (defense in depth).
- Key rotation bound (NFR-Sec12). Overlap windows are finite; purged keys refuse previously-issued tokens.
- L1 export restriction (NFR-Sec13). Unsigned L1 ECT records stay within the originating trust domain; any cross-domain export requires L2+.
- FFI panic containment (NFR-Sec14). Every PyO3 entry point runs under catch_unwind; a Rust panic becomes a Python TrailsError, never a process abort. The Signer trait owns JOSE key material and performs ECT signing for L2/L3.
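The token-integrity control combines an algorithm allowlist with a replay cache that forgets a jti once its token can no longer be accepted. The sketch below illustrates only those two checks, using the allowlist and the 60 s skew stated in this document; `ReplayCache` is a hypothetical name, and signature verification is deliberately out of scope.

```python
import time
from typing import Optional

ALLOWED_ALGS = frozenset({"EdDSA", "ES256", "ES384"})
SKEW_SECONDS = 60


class ReplayCache:
    """Remembers each jti until exp + skew, after which the token is dead anyway."""

    def __init__(self) -> None:
        self._seen: dict[str, float] = {}  # jti -> eviction time

    def check(self, header: dict, claims: dict, now: Optional[float] = None) -> None:
        """Raise ValueError on a disallowed alg, a stale token, or a replayed jti."""
        now = time.time() if now is None else now
        # "none" and symmetric algorithms are simply absent from the allowlist.
        if header.get("alg") not in ALLOWED_ALGS:
            raise ValueError("algorithm not allowed")
        if now > claims["exp"] + SKEW_SECONDS:
            raise ValueError("token expired")
        if "nbf" in claims and now < claims["nbf"] - SKEW_SECONDS:
            raise ValueError("token not yet valid")
        # Evict entries past their window; those tokens fail the exp check.
        self._seen = {j: t for j, t in self._seen.items() if t >= now}
        if claims["jti"] in self._seen:
            raise ValueError("replayed jti")
        self._seen[claims["jti"]] = claims["exp"] + SKEW_SECONDS
```

Keying eviction on exp + skew bounds the cache size: an entry only needs to live as long as the token it guards could still pass the expiry check.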

Cross-references: ADR-0006 (Cedar + snapshot), ADR-0007 (Oxigraph resource bounds), ADR-0009 (prov write exclusivity, erasure), ADR-0010 (biscuit caveat bounds, rotation), ADR-0011 (DID resolution + VC allowlist), ADR-0013 (ACT/ECT replay + L1 export), ADR-0014 (supply chain), ADR-0021 (progressive enhancement), ADR-0022 (Cedar unified matcher).

Versioning¶

- Ontology: OWL owl:versionIRI, SemVer on ontology bundles.
- Capability: SemVer in descriptor; deprecates field for replacement pointers. version_status ("active"/"deprecated"/"retired") is a registry-admin concern, computable from deprecates pointers rather than a field on the descriptor itself.
- Shape: SHACL supports shape subclassing; framework supports @shape(deprecates=OldShape).
- Node type: @node_type is additive; adding a type to an untyped label never breaks existing writes.
- Framework: SemVer. ADRs supersede each other explicitly.
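To show what "computable from deprecates pointers" could mean, the sketch below derives a status for each capability from the pointers alone. It assumes descriptors are keyed by IRI and that retirement is an explicit registry-admin decision passed in separately, since a pointer by itself cannot distinguish "deprecated" from "retired"; `version_status` here is a hypothetical function.

```python
def version_status(descriptors: dict[str, dict],
                   retired: frozenset = frozenset()) -> dict[str, str]:
    """Derive a status per capability IRI from `deprecates` pointers."""
    # Every IRI that some other descriptor points at has been superseded.
    superseded = {d["deprecates"] for d in descriptors.values()
                  if d.get("deprecates")}
    status: dict[str, str] = {}
    for iri in descriptors:
        if iri in retired:
            status[iri] = "retired"
        elif iri in superseded:
            status[iri] = "deprecated"
        else:
            status[iri] = "active"
    return status


registry = {
    "cap:invoice.create/1": {"version": "1.0.0"},
    "cap:invoice.create/2": {"version": "2.0.0",
                             "deprecates": "cap:invoice.create/1"},
}
```

Because the status is derived, publishing a replacement descriptor is enough to mark the old one deprecated; no one has to edit the old descriptor.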

What lives where (cheat sheet)¶

| Concern | Store engine (Rust) | Framework (Python) | User code |
| --- | --- | --- | --- |
| SPARQL query execution | yes | - | - |
| SHACL validation | yes | - | - |
| Cedar policy evaluation (with strongest-available-type match) | yes | - | - |
| PROV-O triple emission | yes | - | - |
| DID resolution | yes | - | - |
| ACT verify/issue; biscuit attenuation | yes | - | - |
| ECT emission + JOSE signing (L2/L3, via `Signer`) | yes | - | - |
| RDFS / OWL-RL reasoning (opt-in, feature-detected) | yes | - | - |
| Cost budget check | yes | - | - |
| MCP protocol handling (Tools + Resources + Prompts) | - | yes | - |
| FastAPI routes | - | yes | - |
| CLI / scaffolding | - | yes | - |
| Shape declaration | - | yes (decorator) | yes (class) |
| Node-type declaration | - | yes (decorator) | yes (class) |
| Capability declaration | - | yes (decorator) | yes (function) |
| Middleware declaration | - | yes (decorator) | yes (function) |
| Policy definition | - | - | yes (.cedar file) |
| Ontology (`.ttl`) | - | - | yes (optional) |
| Business logic | - | - | yes (handler body) |