
02 — Architecture

Trails has one surface. A capability is a Python function; it sees one ctx; it grows from plain nodes and edges into typed nodes, SHACL shapes, and OWL classes as the author opts into features, never by changing modules, imports, or API shape. The architecture mirrors that framing: a Python framework (~24K LLOC, 94 modules) that owns all framework logic (ORM, policy evaluation, SHACL validation, reasoning, federation, agents, ingestion, vector search, MCP server, CLI, and admin UI), backed by a Rust-embedded storage engine (~5K LLOC, 7 active crates + 5 archived) that provides in-process Oxigraph access, provenance attachment, panic-safe FFI containment, and a structured error taxonomy. The relationship is comparable to Django and SQLite: the framework is Python, the embedded store is Rust. See ADR-0021 for the north-star decision and concepts/progressive-enhancement.md for the conceptual overview.

Layer diagram

+--------------------------------------------------------------+
|  APP                                                         |
|  app/capabilities/  app/shapes/  policies/*.cedar            |
|  ontology/*.ttl (optional)  trails.toml                      |
+--------------------------------------------------------------+
                            |
+--------------------------------------------------------------+
|  FRAMEWORK   (Python, `trails` package — ~24K LLOC)          |
|                                                              |
|  Authoring decorators                                        |
|  @capability   @node_type   @shape   @policy                 |
|  @before  @after  @on_error  @around       (middleware)      |
|  @resource  @prompt                        (MCP)             |
|                                                              |
|  Core modules (all `trails.<name>`, no tier split)           |
|  orm (4,770 LLOC)  kg  shapes  policy  llm  agent            |
|  ingest  vector  testing  observability  rendering           |
|  registry  doctor                                            |
|                                                              |
|  Middleware layer (cross-cutting)                            |
|  before / around / after / on_error on every dispatch.       |
|  Validation, policy, provenance, cost live in the            |
|  coordinator, not in middleware (no skip primitive).         |
|                                                              |
|  Transports                                                  |
|  mcp_server (stdio + SSE)   http_adapter (FastAPI)           |
|  cli (click)                Dev REPL `trails console`        |
+--------------------------------------------------------------+
                            |  PyO3 abi3-py311 FFI (panic-safe)
+--------------------------------------------------------------+
|  STORAGE ENGINE   (Rust, ~5K LLOC, 7 active + 5 archived)    |
|  trails-graph   trails-shapes   trails-reason                |
|  trails-policy  trails-prov     trails-caps                  |
|  trails-identity  trails-cost                                |
|  trails-adapters-fuseki  trails-adapters-qlever              |
|  trails-ffi   trails-wasm                                    |
+--------------------------------------------------------------+
                            |
+--------------------------------------------------------------+
|  BACKENDS   (pluggable via trait impls)                      |
|  Graph:  Oxigraph (embedded, default) | Fuseki | Qlever      |
|  Vector: SqliteVecStore (embedded) | QdrantStore             |
|  Queue:  NATS | embedded                                     |
|  LLM:    Anthropic | Ollama | mock                           |
+--------------------------------------------------------------+

Architectural principles

AP1 — Python framework, Rust-embedded store

Framework logic lives in Python (DX, ecosystem, rapid iteration); the storage engine lives in Rust (in-process Oxigraph without JVM overhead, a panic-safe FFI boundary, PROV-O attachment). The Rust layer is an embedded store, comparable to how Django embeds SQLite, rather than a traditional kernel. The pattern of Rust-accelerated Python tooling is proven by Ruff, uv, Polars, and pydantic-core. See ADR-0001.

AP2 — One module, features opt in

There is one trails module, one @capability decorator, one ctx object. Labels, @node_type, @shape, and OWL are additive features that the author enables when the app needs them; no split namespaces, no "tier" choice at project start. Cedar, PROV-O, and SHACL inspect each entity and act on whatever typing is present — labels only, or JSON-Schema type, or SHACL shape, or RDF class ("strongest available type," ADR-0022). See ADR-0021.
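The "strongest available type" rule can be sketched as a small resolver. This is a hypothetical illustration, not the trails.policy API: the Entity fields and strongest_available_type() are invented names, and the sketch assumes the list above runs from weakest (labels) to strongest (RDF class). ADR-0022 remains the normative source for the ordering.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed ordering, weakest first (ADR-0022 is normative).
TYPE_STRENGTH = ("label", "json_schema", "shacl_shape", "rdf_class")


@dataclass
class Entity:
    """Hypothetical view of the typing information attached to one entity."""
    labels: list[str] = field(default_factory=list)
    json_schema_type: Optional[str] = None
    shacl_shape: Optional[str] = None   # shape IRI
    rdf_class: Optional[str] = None     # class IRI


def strongest_available_type(entity: Entity) -> tuple[str, object]:
    """Return (kind, value) for the strongest typing present on the entity."""
    present = {
        "label": entity.labels or None,
        "json_schema": entity.json_schema_type,
        "shacl_shape": entity.shacl_shape,
        "rdf_class": entity.rdf_class,
    }
    # Walk from strongest to weakest; the first populated kind wins.
    for kind in reversed(TYPE_STRENGTH):
        if present[kind]:
            return kind, present[kind]
    return "untyped", None


# A labels-only node, then the same node once the author adds a SHACL shape.
print(strongest_available_type(Entity(labels=["Invoice"])))
# ('label', ['Invoice'])
print(strongest_available_type(Entity(labels=["Invoice"], shacl_shape="ex:InvoiceShape")))
# ('shacl_shape', 'ex:InvoiceShape')
```

The same node flows through one code path in both cases; only the resolver's answer changes as typing is added.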

AP3 — Opinionated floor, pluggable ceiling

The framework raises the floor with strong defaults (Oxigraph, Cedar, PROV-O always on, MCP as the primary transport) and raises the ceiling via traits (swappable graph backends, opt-in reasoning, pluggable identity). It holds no opinions where they would lock users in, and strong opinions where they save users from themselves.
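As a rough Python analogue of "swappable graph backends via traits," a typing.Protocol can play the role of the Rust trait. GraphStore, InMemoryStore, BACKENDS, and open_store() are hypothetical names; the real backends are Rust trait impls (Oxigraph, Fuseki, Qlever), and this toy only shows the shape of the contract.

```python
from typing import Optional, Protocol, runtime_checkable

Triple = tuple[str, str, str]


@runtime_checkable
class GraphStore(Protocol):
    """Hypothetical Python analogue of the Rust graph-backend trait."""

    def insert(self, graph: str, triples: list[Triple]) -> None: ...

    def match(self, graph: str, s: Optional[str] = None,
              p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]: ...


class InMemoryStore:
    """Toy backend; it satisfies GraphStore structurally, with no inheritance."""

    def __init__(self) -> None:
        self._graphs: dict[str, set[Triple]] = {}

    def insert(self, graph: str, triples: list[Triple]) -> None:
        self._graphs.setdefault(graph, set()).update(triples)

    def match(self, graph: str, s: Optional[str] = None,
              p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
        # None acts as a wildcard in each position.
        return sorted(
            t for t in self._graphs.get(graph, set())
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


# Hypothetical backend registry; remote adapters would be registered alongside.
BACKENDS = {"memory": InMemoryStore}


def open_store(name: str) -> GraphStore:
    return BACKENDS[name]()
```

Callers depend only on the protocol, so swapping the default for another backend is a registry change rather than an application change.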

AP4 — Capability-first, not route-first

URL routes are a projection. The canonical address is the capability descriptor (JSON-LD with shape IRIs, preconditions, costs). HTTP routes and MCP tools are projections of the canonical form. See ADR-0005.
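A minimal sketch of one canonical descriptor with derived projections, assuming a dataclass-shaped descriptor. The field names, the trails: terms, and the POST /capabilities/<name> route scheme are hypothetical; only the MCP tool shape (name, description, inputSchema) follows the MCP tools/list format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilityDescriptor:
    """Hypothetical canonical descriptor; HTTP and MCP forms are derived from it."""
    iri: str
    name: str
    description: str
    input_schema: dict                 # JSON Schema for the input
    input_shape: str                   # SHACL shape IRI
    cost_estimate: float = 0.0
    preconditions: tuple[str, ...] = ()

    def to_jsonld(self) -> dict:
        # Canonical form; the trails: terms are placeholders, not a published vocabulary.
        return {
            "@id": self.iri,
            "@type": "trails:Capability",
            "trails:name": self.name,
            "trails:inputShape": {"@id": self.input_shape},
            "trails:costEstimate": self.cost_estimate,
            "trails:precondition": list(self.preconditions),
        }

    def to_mcp_tool(self) -> dict:
        # Projection 1: an MCP tools/list entry.
        return {
            "name": self.name,
            "description": self.description,
            "inputSchema": self.input_schema,
        }

    def to_http_route(self) -> tuple[str, str]:
        # Projection 2: one POST endpoint per capability (assumed scheme).
        return "POST", f"/capabilities/{self.name}"


create_invoice = CapabilityDescriptor(
    iri="https://example.org/capabilities/create_invoice",
    name="create_invoice",
    description="Create an invoice",
    input_schema={"type": "object", "properties": {"total": {"type": "number"}}},
    input_shape="https://example.org/shapes/InvoiceInput",
)
```

Because both projections are computed from the same object, the HTTP route and the MCP tool cannot drift apart.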

AP5 — IRI is the primary key

Every entity has an IRI; every response carries IRIs. Auto-increment IDs never leak across the API boundary. See ADR-0003.
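One way to keep auto-increment IDs inside the store is to mint opaque IRIs at creation time and strip internal keys when projecting to the API. The base IRI, the uuid4-based scheme, and the underscore-prefixed internal key convention (such as a _rowid column) are assumptions for illustration; ADR-0003 governs the actual IRI policy.

```python
import uuid

BASE_IRI = "https://example.org/id/"   # hypothetical application base


def mint_iri(kind: str, base: str = BASE_IRI) -> str:
    """Mint an opaque, globally unique IRI for a new entity of the given kind."""
    return f"{base}{kind.lower()}/{uuid.uuid4()}"


def to_api(row: dict) -> dict:
    """Project a storage row for the API: keep the IRI, drop internal keys."""
    if "iri" not in row:
        raise ValueError("every entity crossing the API boundary needs an IRI")
    # Keys starting with "_" (such as a hypothetical _rowid) stay inside the store.
    return {k: v for k, v in row.items() if not k.startswith("_")}


row = {"_rowid": 42, "iri": mint_iri("Invoice"), "total": 120}
print(to_api(row))   # the _rowid key is gone; the IRI remains
```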

AP6 — Provenance is not optional

PROV-O triples are emitted on every write; the prov: graph is always populated. You can filter it out of queries, but you cannot write without it. See ADR-0009.
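The "no write without provenance" rule can be sketched as a store whose only write method also appends PROV-O triples. The prov: terms used (prov:Activity, prov:wasAssociatedWith, prov:wasGeneratedBy, prov:endedAtTime) are real W3C PROV-O vocabulary; the ProvenancedStore class, the urn:trails:activity: IRI scheme, and the in-memory graph layout are hypothetical.

```python
from datetime import datetime, timezone
from itertools import count

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
Triple = tuple[str, str, str]


class ProvenancedStore:
    """Toy store: the only write path also appends PROV-O triples to the prov graph."""

    def __init__(self) -> None:
        self.graphs: dict[str, list[Triple]] = {"default": [], "prov": []}
        self._ids = count(1)

    def write(self, agent: str, triples: list[Triple]) -> str:
        activity = f"urn:trails:activity:{next(self._ids)}"   # hypothetical IRI scheme
        self.graphs["default"].extend(triples)
        prov: list[Triple] = [
            (activity, RDF_TYPE, PROV + "Activity"),
            (activity, PROV + "wasAssociatedWith", agent),
            (activity, PROV + "endedAtTime", datetime.now(timezone.utc).isoformat()),
        ]
        # Link every subject touched by this write back to the activity.
        for subject in sorted({s for s, _, _ in triples}):
            prov.append((subject, PROV + "wasGeneratedBy", activity))
        self.graphs["prov"].extend(prov)
        return activity
```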

AP7 — Cost is a framework primitive

Every capability has a cost envelope. Budget limits are enforced by the framework, not by userland decorators. Making cost central forces design conversations to stay honest. See ADR-0012.
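Budget enforcement with nested scopes (the CostScope nesting listed for trails-cost) might look like the following sketch: a charge must fit every enclosing scope, and a rejected charge debits none of them. CostScope and its methods are hypothetical Python stand-ins for the Rust implementation, and cost units are left abstract.

```python
from typing import Optional


class BudgetExceededError(Exception):
    pass


class CostScope:
    """Hypothetical nested budget scope; a charge counts against every ancestor."""

    def __init__(self, limit: float, parent: Optional["CostScope"] = None) -> None:
        self.limit = limit
        self.spent = 0.0
        self.parent = parent

    def _chain(self):
        scope = self
        while scope is not None:
            yield scope
            scope = scope.parent

    def charge(self, amount: float) -> None:
        # Check every enclosing scope first so a rejected charge debits nothing.
        for scope in self._chain():
            if scope.spent + amount > scope.limit:
                raise BudgetExceededError(
                    f"charge of {amount} would exceed limit {scope.limit}")
        for scope in self._chain():
            scope.spent += amount

    def child(self, limit: float) -> "CostScope":
        return CostScope(limit, parent=self)


request = CostScope(limit=1.0)        # whole invocation
llm_step = request.child(limit=0.5)   # nested scope for one LLM call
llm_step.charge(0.3)                  # counted in both scopes
```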

AP8 — Humans and agents share the identity model

DIDs for both. VCs for claims. ACT / biscuit tokens for capability-bearing auth. No separate user table + service account table. See ADR-0011 and ADR-0013.
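To illustrate "one identity model" for humans and agents, here is a hypothetical Principal type where the human/agent distinction is just an attribute on a DID. The regex is a simplified form of the W3C DID syntax (did:<method>:<method-specific-id>) and performs no resolution; the class and field names are assumptions.

```python
import re
from dataclasses import dataclass

# Simplified W3C DID syntax: did:<method>:<method-specific-id>
DID_RE = re.compile(r"^did:([a-z0-9]+):([A-Za-z0-9._%:-]+)$")


@dataclass(frozen=True)
class Principal:
    """Hypothetical principal: humans and agents differ only in the `kind` attribute."""
    did: str
    kind: str   # "human" or "agent"

    def __post_init__(self) -> None:
        if DID_RE.match(self.did) is None:
            raise ValueError(f"not a DID: {self.did!r}")

    @property
    def method(self) -> str:
        return DID_RE.match(self.did).group(1)


alice = Principal("did:web:example.com:users:alice", kind="human")
bot = Principal("did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK", kind="agent")
print(alice.method, bot.method)   # web key
```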

Component responsibilities

Storage engine components (Rust, 7 active crates + 5 archived, ~5K LLOC)

| Crate | Responsibility | Never does |
|---|---|---|
| trails-graph | SPARQL query/update, named graphs, transactions, snapshots | Validation, reasoning, policy |
| trails-shapes | SHACL shape validation on inputs, outputs, and ctx.kg writes | Storage, reasoning, policy |
| trails-reason | RDFS + OWL-RL entailment, opt-in via FFI, feature-detected from the loaded ontology | Validation, storage, policy |
| trails-policy | Cedar PDP: permit/deny + reasoning trace; strongest-available-type matching (ADR-0022) | Authentication (that's identity) |
| trails-prov | PROV-O triple generation, attachment to prov: graph; owns the Assurance enum | Cost accounting, identity |
| trails-caps | Stores capability descriptors, emits MCP / JSON-LD / OpenAPI projections | Dispatch (lives in the framework coordinator) |
| trails-identity | DID resolution, VC verification, ACT issue/verify, biscuit attenuation, Signer (ECT key custody) | Authorization (that's policy) |
| trails-cost | Per-capability envelopes, budget enforcement, CostScope nesting, anomaly hooks | Metrics export (that's observability) |
| trails-adapters-fuseki | Apache Jena Fuseki HTTP adapter (async-native) | Anything but graph I/O |
| trails-adapters-qlever | Qlever HTTP adapter (async-native, read-heavy) | Anything but graph I/O |
| trails-ffi | PyO3 bindings; one module per store subsystem under trails._core; every entry point wraps catch_unwind → PyErr | Framework logic (lives in Python) |
| trails-wasm | WasmStore over OxigraphStore for browser / edge targets | Anything but a thin store facade |

Framework components (Python, trails package, ~24K LLOC)

| Module | Responsibility | Delegates to |
|---|---|---|
| Decorators (@capability, @node_type, @shape, @policy, @before/@after/@on_error/@around, @resource, @prompt) | Metadata registration, dispatch wrapping, middleware binding | trails._core for store ops |
| trails.orm | @node_type, Model, QueryBuilder, Q combinators, property-path traversal, sync + async variants | trails._core.Store via FFI |
| trails.kg | ctx.kg namespace: add / save / find / where / node / edge / match / traverse | trails._core.Store via FFI |
| trails.shapes | @shape + predicate() with one_of / min_value / max_value / pattern / min_length / max_length; SHACL validation logic | trails._core.Store via FFI |
| trails.policy | Cedar @policy enforced by invoke(); .cedar file loader; strongest-available-type resolver (ADR-0022); policy evaluation | trails._core via FFI |
| trails.llm | LLMClient over anthropic / ollama / mock; streaming + tool calls; PROV-O step links | trails._core via FFI for provenance |
| trails.agent | Session (token-windowed, fork / branch / replay, KG persistence); ReAct / Plan-and-Execute / Reflexion planners | trails.llm, trails.kg, trails._core |
| trails.ingest | PDF / HTML / Markdown extractors, paragraph chunker, @node_type Document + Chunk, PROV-O per run | trails.orm, trails.kg |
| trails.vector | Embedders (mock, sentence-transformers, OpenAI); SqliteVecStore / QdrantStore; hybrid SPARQL + vector retrieve() | trails.orm, optional vector backend |
| trails.testing | isolated_kernel, mock_llm, capture_events, fresh_context, shape-pinned assertions | Everything (stdlib-only) |
| trails.observability | Hook + Span / tracer / metrics; OTel-friendly events on capability + LLM + KG read/write | None (emits; does not collect) |
| trails.rendering | Bi-modal (Markdown + JSON-LD) output from Jinja templates | None (pure Python) |
| MCP server | JSON-RPC 2.0 stdio per MCP 2024-11-05; Tools + Resources (resources/list/read/subscribe) + Prompts (prompts/list/get); SSE alongside stdio | trails.registry, @resource, @prompt |
| HTTP adapter | FastAPI routes, content negotiation, OpenAPI | trails.registry |
| CLI (trails …) | Scaffolding, ontology export, dev server, simulation, registry, doctor | Everything below |
| Middleware layer | Cross-cutting before / after / on_error / around on every dispatch; glob-pattern targeting | Runtime dispatch coordinator |
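The decorator-plus-middleware model in the table can be sketched with a plain registry and fnmatch-style glob targeting. capability(), before(), and dispatch() here are hypothetical stand-ins; the real decorators' signatures may differ, and this toy skips validation, policy, provenance, and cost, which the coordinator owns.

```python
from fnmatch import fnmatchcase
from typing import Any, Callable

CAPABILITIES: dict[str, Callable[..., Any]] = {}
BEFORE_HOOKS: list[tuple[str, Callable[[str, dict], None]]] = []


def capability(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical @capability: register the handler under its function name."""
    CAPABILITIES[fn.__name__] = fn
    return fn


def before(pattern: str):
    """Hypothetical @before(pattern): bind a hook to capabilities matching a glob."""
    def register(hook: Callable[[str, dict], None]):
        BEFORE_HOOKS.append((pattern, hook))
        return hook
    return register


def dispatch(name: str, ctx: dict, **kwargs: Any) -> Any:
    # Matching @before hooks run in registration order, then the handler.
    for pattern, hook in BEFORE_HOOKS:
        if fnmatchcase(name, pattern):
            hook(name, ctx)
    return CAPABILITIES[name](ctx, **kwargs)


@capability
def invoice_create(ctx: dict, total: float) -> dict:
    return {"total": total, "audited_by": ctx.get("audit", [])}


@before("invoice_*")
def audit(name: str, ctx: dict) -> None:
    ctx.setdefault("audit", []).append(name)


print(dispatch("invoice_create", {}, total=120))
# {'total': 120, 'audited_by': ['invoice_create']}
```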

Request lifecycle (trace)

The full dispatch runs inside a single coordinator step in Python. The coordinator owns a snapshot-isolated graph transaction (via the Rust store layer) spanning policy evaluation, handler execution, and provenance write, so that policy attributes and handler reads observe the same point-in-time graph state (NFR-Sec10). Middleware (before / around / after / on_error) runs around this spine but cannot skip validation, policy, provenance, or cost.

Happy-path order (strict):

  1. framework.render.decode(input)
  2. Middleware: matching @around handlers wrap steps 3–11.
  3. ACT verification: store.identity.verify_act(ctx.token) (checks signature, jti-replay cache, aud, exp/nbf with ±60 s skew; NFR-Sec5, NFR-Sec6, NFR-Rel5)
  4. Identity resolution: store.identity.resolve(ctx.principal) via DID method (TLS-pinned for did:web, NFR-Sec7)
  5. Matching @before handlers run in registration order.
  6. Input shape validation: store.validator.validate(input, shape) [abort: 400]
  7. Policy decision: store.policy.permit?(principal, action, resource) with strongest-available-type match (ADR-0022); decision cached on the coordinator for the handler window [abort: 403]
  8. Budget envelope open: store.cost.open_envelope(estimate) [abort: 429]
  9. Handler execution (snapshot-isolated): user code runs against ctx (handler reads + writes share the same txn as policy attributes):
      • ctx.kg.add / save / find / where / node / edge / match / traverse
      • ctx.llm if an LLM client is configured
      • store.reasoner.entail? (if an ontology with owl:Class / rdfs:subClassOf is loaded; charged to envelope)
  10. Output shape validation: store.validator.validate(output, shape) [abort: 500; rollback]
  11. Provenance recording: store.provenance.record(activity); if ProvenanceWriter.assurance == L2|L3 the writer signs the ECT; L3 anchors to the external audit ledger synchronously
  12. Matching @after (success) or @on_error (failure) runs in registration order.
  13. Budget close: store.cost.close_envelope(actual) (RAII: envelope closes even on panic; NFR-Sec14)
  14. Policy decision log: append-only, hash-chained
  15. Coordinator commits the graph txn; response envelope ships to the transport layer

Response envelope: { payload, prov_iri, ect?, cost, consent_receipt, trace_id }.
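The strict ordering of the happy path can be expressed as a toy coordinator that records each step as it runs. Everything here is hypothetical: boolean flags stand in for the real ACT, shape, policy, and budget checks, and abort handling (rollback, denial provenance) is deliberately left out so the spine stays visible.

```python
class Abort(Exception):
    """Carries the status the coordinator would translate for the transport."""

    def __init__(self, status: int, reason: str) -> None:
        super().__init__(reason)
        self.status = status


def coordinate(handler, payload, checks: dict, steps: list):
    """Toy spine: run the happy path in strict order, appending each step name.

    `checks` maps hypothetical check names to booleans that stand in for the
    real ACT, shape, policy, and budget calls; a False value aborts.
    """
    def gate(name: str, status: int) -> None:
        if not checks.get(name, True):
            raise Abort(status, f"{name} failed")
        steps.append(name)

    steps.append("decode")                 # 1
    steps.append("around:enter")           # 2: @around wraps 3-11
    gate("verify_act", 401)                # 3
    steps.append("resolve_identity")       # 4
    steps.append("before_hooks")           # 5
    gate("input_shape", 400)               # 6
    gate("policy", 403)                    # 7
    gate("open_envelope", 429)             # 8
    result = handler(payload)              # 9: same txn as the policy reads
    steps.append("handler")
    gate("output_shape", 500)              # 10
    steps.append("record_provenance")      # 11
    steps.append("around:exit")
    steps.append("after_hooks")            # 12
    steps.append("close_envelope")         # 13
    steps.append("decision_log")           # 14
    steps.append("commit")                 # 15
    return result
```

A denied policy check stops the spine before the handler ever runs, which is the property the ordering is meant to guarantee.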

Deny and failure branches

On any abort above, the coordinator:

  • Rolls back the graph txn (no partial writes, NFR-Rel1).
  • Still emits a provenance record with trails:outcome "denied" (or "validation_failed", "handler_error", "budget_exceeded") — audit posture requires every invocation to produce a PROV-O trail, not just successes (see ADR-0009).
  • Closes the cost envelope with actual=estimate so reserved budget is released.
  • Writes the policy decision log entry even for deny paths.
  • Runs matching @on_error middleware before translating the error per the error taxonomy (see "Error taxonomy" below).
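The abort path can be sketched the same way. The exception classes echo the error taxonomy, but handle_abort(), the list-backed transaction and logs, and the urn:activity placeholder subject are hypothetical; the point is the order of operations in the list above.

```python
class TrailsError(Exception): ...
class ValidationError(TrailsError): ...
class AuthorizationError(TrailsError): ...
class BudgetExceededError(TrailsError): ...
class HandlerError(TrailsError): ...


# trails:outcome values named in the deny/failure list.
OUTCOMES = [
    (ValidationError, "validation_failed"),
    (AuthorizationError, "denied"),
    (BudgetExceededError, "budget_exceeded"),
    (HandlerError, "handler_error"),
]


def handle_abort(err, staged_writes, estimate, prov_graph, decision_log, on_error_hooks):
    """Hypothetical abort path, following the bullet order of the deny branch."""
    staged_writes.clear()                                    # roll back: no partial writes
    outcome = next((o for cls, o in OUTCOMES if isinstance(err, cls)), "handler_error")
    prov_graph.append(("urn:activity", "trails:outcome", outcome))   # audit even on failure
    envelope = {"estimate": estimate, "actual": estimate}   # close with actual = estimate
    decision_log.append({"error": type(err).__name__, "outcome": outcome})
    for hook in on_error_hooks:                              # @on_error, then translation
        hook(err)
    return outcome, envelope
```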

FFI boundary design

  • PyO3-based. One Rust module per store subsystem, unified under a Python trails._core package. The Python framework imports from trails._core; application code never does.
  • Async-capable. With PyO3 0.21+, the pyo3-async-runtimes crate bridges Python async def and Rust async fn. Used for streaming capabilities and the SSE transport.
  • Zero-copy where it matters. RDF terms passed as borrowed slices; JSON-LD documents serialized once on the Rust side and exposed as bytes to Python.
  • Error mapping. Rust Result<T, TrailsError> → Python trails.TrailsError with structured context (.field, .constraint, .iri).
  • Stable ABI. PyO3 abi3-py311 target; one wheel per platform.
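On the Python side, the error-mapping contract might look like this sketch. The JSON payload format and from_ffi_payload() are assumptions (the real mapping happens inside the PyO3 bindings), and call_native() is only a Python analogue of the Rust catch_unwind wrapper, which in the actual design converts panics before they cross the boundary.

```python
import json
from typing import Any, Callable, Optional


class TrailsError(Exception):
    """Python-side error with structured context (.field, .constraint, .iri)."""

    def __init__(self, message: str, *, field: Optional[str] = None,
                 constraint: Optional[str] = None, iri: Optional[str] = None) -> None:
        super().__init__(message)
        self.field = field
        self.constraint = constraint
        self.iri = iri


def from_ffi_payload(raw: bytes) -> TrailsError:
    """Rebuild a TrailsError from a JSON error payload (assumed wire format)."""
    data = json.loads(raw)
    return TrailsError(data["message"], field=data.get("field"),
                       constraint=data.get("constraint"), iri=data.get("iri"))


def call_native(fn: Callable[..., Any], *args: Any) -> Any:
    """Python analogue of the catch_unwind wrapper: no failure escapes untyped."""
    try:
        return fn(*args)
    except TrailsError:
        raise
    except Exception as exc:
        raise TrailsError(f"native call failed: {exc}") from exc
```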

Deployment topologies

Dev (default)

+--------------------------------------+
|  single process                      |
|  +--------------------------------+  |
|  |  Trails Python framework       |  |
|  |  |- MCP server (stdio/SSE)     |  |
|  |  |- FastAPI on :8000           |  |
|  |  `- _core (PyO3)               |  |
|  |     `- Oxigraph (embedded)     |  |
|  |        `- RocksDB (embedded)   |  |
|  +--------------------------------+  |
+--------------------------------------+

Single-node prod

+--------------------------------------------------+
|  Docker host                                     |
|  +--------------+  +--------------+  +--------+  |
|  | Trails app   |<-| NATS         |  | R2/S3  |  |
|  | (n replicas) |  | (queue)      |  | (blobs)|  |
|  +------+-------+  +--------------+  +--------+  |
|         |                                        |
|  +------v-------+  +--------------+              |
|  | Oxigraph     |  | Postgres     |              |
|  | (server)     |  | (relational) |              |
|  +--------------+  +--------------+              |
+--------------------------------------------------+

Cluster (v2)

          +-> Trails pod --+
  agents -+-> Trails pod --+--> Qlever cluster (read)
          +-> Trails pod --+    Oxigraph cluster (write)
                           |    Postgres HA
                           |    NATS cluster
                           +--> R2

Cross-cutting concerns

Observability

  • Structured logs (JSON) to stdout, one line per capability invocation with trace ID.
  • OTLP traces spanning the FFI boundary (Python → PyO3 → Oxigraph).
  • trails.observability emits kg_write / kg_query / capability_start / capability_end / llm_call events.
  • PROV-O graph serves as a queryable audit trail.
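A minimal sketch of the one-line-per-invocation log format, assuming the record fields shown. The trace ID uses the W3C Trace Context trace-id shape (32 lowercase hex characters); log_invocation() and the field names other than the capability_end event are hypothetical.

```python
import json
import secrets
import sys
import time


def new_trace_id() -> str:
    # 16 random bytes rendered as 32 lowercase hex chars (W3C Trace Context trace-id shape).
    return secrets.token_hex(16)


def log_invocation(capability: str, outcome: str, duration_ms: float,
                   trace_id: str, stream=sys.stdout) -> dict:
    """Write exactly one JSON line for a finished capability invocation."""
    record = {
        "ts": time.time(),
        "event": "capability_end",
        "capability": capability,
        "outcome": outcome,
        "duration_ms": duration_ms,
        "trace_id": trace_id,
    }
    stream.write(json.dumps(record, separators=(",", ":")) + "\n")
    return record


log_invocation("invoice_create", "success", 12.5, new_trace_id())
```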

Error taxonomy

| Class | HTTP | MCP error | Source |
|---|---|---|---|
| ValidationError | 400 | InvalidParams | shape mismatch |
| AuthenticationError | 401 | Unauthorized | bad token / DID |
| AuthorizationError | 403 | Forbidden | policy deny |
| PreconditionError | 412 | PreconditionFailed | capability precondition |
| BudgetExceededError | 429 | RateLimited | cost envelope |
| HandlerError | 500 | InternalError | user code |
| BackendError | 503 | ServiceUnavailable | graph/store failure |
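The taxonomy maps naturally onto an exception hierarchy in which each class carries its HTTP status and MCP error name. This sketch assumes a common TrailsError base (the FFI section names trails.TrailsError); the to_http()/to_mcp() helpers and their payload shapes are hypothetical, and the MCP error names are the labels from the table rather than numeric JSON-RPC codes.

```python
class TrailsError(Exception):
    http_status = 500
    mcp_error = "InternalError"


class ValidationError(TrailsError):
    http_status, mcp_error = 400, "InvalidParams"


class AuthenticationError(TrailsError):
    http_status, mcp_error = 401, "Unauthorized"


class AuthorizationError(TrailsError):
    http_status, mcp_error = 403, "Forbidden"


class PreconditionError(TrailsError):
    http_status, mcp_error = 412, "PreconditionFailed"


class BudgetExceededError(TrailsError):
    http_status, mcp_error = 429, "RateLimited"


class HandlerError(TrailsError):
    http_status, mcp_error = 500, "InternalError"


class BackendError(TrailsError):
    http_status, mcp_error = 503, "ServiceUnavailable"


def to_http(err: TrailsError) -> tuple[int, dict]:
    # Hypothetical HTTP body shape.
    return err.http_status, {"error": type(err).__name__, "detail": str(err)}


def to_mcp(err: TrailsError) -> dict:
    # Hypothetical MCP error payload keyed by the table's error name.
    return {"error": err.mcp_error, "message": str(err)}
```

Keeping the mapping on the classes means each transport adapter translates errors the same way, without its own lookup table.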

Security model

Trails' security posture is enforced at the Rust FFI boundary, not by convention in application code. The controls below map to the 01-requirements NFR-Sec5..14 block and the corresponding deep-dive in the design spec (§3.8 request-time enforcement, §3.11 supply chain). This section is the architect-level summary; normative text lives in requirements + ADRs.

  • Token integrity (NFR-Sec5, NFR-Sec6). ACT is the primary capability mandate; biscuit is a chain-delegation attenuation layer. trails-identity maintains a jti-seen cache whose entries are retained until the token's exp + skew, and an algorithm allowlist (EdDSA | ES256 | ES384). alg=none and symmetric algs are refused at parse time.
  • DID resolution trust (NFR-Sec7). Non-did:key resolvers use TLS 1.3 with pinned CA bundles or DNSSEC; DID documents are TOFU-pinned per content hash with auditable rotation. No silent substitution path.
  • Caveat bounds (NFR-Sec8). Biscuit Datalog verification has a hard cap on facts, rules, attenuation depth, and wall-clock — a malformed or malicious caveat cannot stall the store engine.
  • SPARQL/SHACL bounds (NFR-Sec9). Every query runs under wall-clock + memory + result-row caps; SHACL recursion is depth-limited.
  • Snapshot isolation (NFR-Sec10). PDP evaluation, handler reads, and provenance writes share one read-snapshot. TOCTOU gaps between "policy said yes" and "handler acted" are closed at the trait level.
  • Provenance write exclusivity (NFR-Sec11). The prov: graph is store-write-only. A handler calling INSERT DATA { GRAPH <prov:> { … } } is denied at the GraphStore boundary before policy even sees it (defense in depth).
  • Key rotation bound (NFR-Sec12). Overlap windows are finite; purged keys refuse previously-issued tokens.
  • L1 export restriction (NFR-Sec13). Unsigned L1 ECT records stay within the originating trust domain; any cross-domain export requires L2+.
  • FFI panic containment (NFR-Sec14). Every PyO3 entry point runs under catch_unwind; a Rust panic becomes a Python TrailsError, never a process abort. The Signer trait owns JOSE key material and performs ECT signing for L2/L3.
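The claim-level token checks from NFR-Sec5/6 (algorithm allowlist, aud, exp/nbf with ±60 s skew, jti replay) can be sketched as follows. Signature verification is assumed to happen before this function; check_claims(), ReplayCache, and TokenRejected are hypothetical names, and aud is treated as a single string for brevity even though JWT permits an array.

```python
import time
from typing import Optional

ALLOWED_ALGS = {"EdDSA", "ES256", "ES384"}
SKEW_S = 60   # the ±60 s window from the request lifecycle


class TokenRejected(Exception):
    pass


class ReplayCache:
    """jti-seen cache; each entry lives until its token's exp + skew."""

    def __init__(self) -> None:
        self._seen: dict[str, float] = {}

    def check_and_add(self, jti: str, exp: float, now: float) -> None:
        # Evict entries whose tokens would now fail the exp check anyway.
        self._seen = {k: until for k, until in self._seen.items() if until >= now}
        if jti in self._seen:
            raise TokenRejected("replayed jti")
        self._seen[jti] = exp + SKEW_S


def check_claims(header: dict, claims: dict, audience: str,
                 cache: ReplayCache, now: Optional[float] = None) -> None:
    """Claim checks only; the signature is assumed to be verified already."""
    now = time.time() if now is None else now
    alg = header.get("alg")
    if alg not in ALLOWED_ALGS:            # refuses "none" and symmetric HS* algs
        raise TokenRejected(f"alg {alg!r} not allowed")
    if claims.get("aud") != audience:
        raise TokenRejected("audience mismatch")
    if now > claims["exp"] + SKEW_S:
        raise TokenRejected("expired")
    if now < claims.get("nbf", now) - SKEW_S:
        raise TokenRejected("not yet valid")
    cache.check_and_add(claims["jti"], claims["exp"], now)
```

Bounding cache entries by exp + skew keeps the replay cache finite: once a token could no longer pass the expiry check, remembering its jti adds nothing.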

Cross-references: ADR-0006 (Cedar + snapshot), ADR-0007 (Oxigraph resource bounds), ADR-0009 (prov write exclusivity, erasure), ADR-0010 (biscuit caveat bounds, rotation), ADR-0011 (DID resolution + VC allowlist), ADR-0013 (ACT/ECT replay + L1 export), ADR-0014 (supply chain), ADR-0021 (progressive enhancement), ADR-0022 (Cedar unified matcher).

Versioning

  • Ontology: OWL owl:versionIRI, SemVer on ontology bundles.
  • Capability: SemVer in descriptor; deprecates field for replacement pointers. version_status ("active"/"deprecated"/"retired") is a registry-admin concern, computable from deprecates pointers rather than a field on the descriptor itself.
  • Shape: SHACL supports shape subclassing; framework supports @shape(deprecates=OldShape).
  • Node type: @node_type is additive: attaching a type to a label that previously had none never breaks existing writes.
  • Framework: SemVer. ADRs supersede each other explicitly.
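Since version_status is computed rather than stored, a registry could derive it from deprecates pointers along these lines. The sketch assumes each descriptor holds at most one deprecates IRI and that "retired" comes from an admin-maintained set, because pointers alone cannot express it; the function and argument names are hypothetical.

```python
from collections.abc import Mapping, Set


def version_status(descriptors: Mapping[str, dict],
                   retired: Set[str] = frozenset()) -> dict[str, str]:
    """Derive "active" / "deprecated" / "retired" from deprecates pointers."""
    # Any descriptor named by another descriptor's `deprecates` field is superseded.
    superseded = {d["deprecates"] for d in descriptors.values() if d.get("deprecates")}
    status = {}
    for iri in descriptors:
        if iri in retired:                 # admin decision, not derivable from pointers
            status[iri] = "retired"
        elif iri in superseded:
            status[iri] = "deprecated"
        else:
            status[iri] = "active"
    return status


registry = {
    "cap:create_invoice@1.0.0": {},
    "cap:create_invoice@2.0.0": {"deprecates": "cap:create_invoice@1.0.0"},
}
print(version_status(registry))
# {'cap:create_invoice@1.0.0': 'deprecated', 'cap:create_invoice@2.0.0': 'active'}
```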

What lives where (cheat sheet)

Concern Store engine (Rust) Framework (Python) User code
SPARQL query execution yes
SHACL validation yes
Cedar policy evaluation (with strongest-available-type match) yes
PROV-O triple emission yes
DID resolution yes
ACT verify/issue; biscuit attenuation yes
ECT emission + JOSE signing (L2/L3, via Signer) yes
RDFS / OWL-RL reasoning (opt-in, feature-detected) yes
Cost budget check yes
MCP protocol handling (Tools + Resources + Prompts) yes
FastAPI routes yes
CLI / scaffolding yes
Shape declaration yes (decorator) yes (class)
Node-type declaration yes (decorator) yes (class)
Capability declaration yes (decorator) yes (function)
Middleware declaration yes (decorator) yes (function)
Policy definition yes (.cedar file)
Ontology (.ttl) yes (optional)
Business logic yes (handler body)