02 — Architecture¶
Trails has one surface. A capability is a Python function; it sees
one ctx; it grows from plain nodes and edges into typed nodes, SHACL
shapes, and OWL classes by the author adding a feature, never by
changing module, import, or API shape. The architecture mirrors that
framing: a Python framework (~24K LLOC, 94 modules) that owns all
framework logic — ORM, policy evaluation, SHACL validation, reasoning,
federation, agents, ingestion, vector search, MCP server, CLI, and admin
UI — backed by a Rust-embedded storage engine (~5K LLOC, 7 active crates + 5 archived) that
provides in-process Oxigraph access, provenance attachment, panic-safe
FFI containment, and a structured error taxonomy. The relationship is
comparable to Django and SQLite: the framework is Python, the embedded
store is Rust. See ADR-0021 for the north-star decision and
concepts/progressive-enhancement.md for the conceptual overview.
Layer diagram¶
+--------------------------------------------------------------+
| APP |
| app/capabilities/ app/shapes/ policies/*.cedar |
| ontology/*.ttl (optional) trails.toml |
+--------------------------------------------------------------+
|
+--------------------------------------------------------------+
| FRAMEWORK (Python, `trails` package — ~24K LLOC) |
| |
| Authoring decorators |
| @capability @node_type @shape @policy |
| @before @after @on_error @around (middleware) |
| @resource @prompt (MCP) |
| |
| Core modules (all `trails.<name>`, no tier split) |
| orm (4,770 LLOC) kg shapes policy llm agent |
| ingest vector testing observability rendering |
| registry doctor |
| |
| Middleware layer (cross-cutting) |
| before / around / after / on_error on every dispatch. |
| Validation, policy, provenance, cost live in the |
| coordinator, not in middleware (no skip primitive). |
| |
| Transports |
| mcp_server (stdio + SSE) http_adapter (FastAPI) |
| cli (click) Dev REPL `trails console` |
+--------------------------------------------------------------+
| PyO3 abi3-py311 FFI (panic-safe)
+--------------------------------------------------------------+
| STORAGE ENGINE (Rust, 7 active crates + 5 archived — ~5K LLOC) |
| trails-graph trails-shapes trails-reason |
| trails-policy trails-prov trails-caps |
| trails-identity trails-cost |
| trails-adapters-fuseki trails-adapters-qlever |
| trails-ffi trails-wasm |
+--------------------------------------------------------------+
|
+--------------------------------------------------------------+
| BACKENDS (pluggable via trait impls) |
| Graph: Oxigraph (embedded, default) | Fuseki | Qlever |
| Vector: SqliteVecStore (embedded) | QdrantStore |
| Queue: NATS | embedded |
| LLM: Anthropic | Ollama | mock |
+--------------------------------------------------------------+
Architectural principles¶
AP1 — Python framework, Rust-embedded store¶
Framework logic in Python (DX, ecosystem, rapid iteration); storage engine in Rust (in-process Oxigraph without JVM overhead, panic-safe FFI boundary, PROV-O attachment). The Rust layer is an embedded store — comparable to how Django embeds SQLite — not a traditional kernel. Pattern of Rust-accelerated Python proven by Ruff, uv, Polars, Pydantic-core. See ADR-0001.
AP2 — One module, features opt in¶
There is one trails module, one @capability decorator, one ctx
object. Labels, @node_type, @shape, and OWL are additive features
that the author enables when the app needs them; no split namespaces,
no "tier" choice at project start. Cedar, PROV-O, and SHACL inspect
each entity and act on whatever typing is present — labels only, or
JSON-Schema type, or SHACL shape, or RDF class ("strongest available
type," ADR-0022). See ADR-0021.
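As a minimal sketch of "strongest available type" resolution: the `Entity` record, the `strongest_available_type` helper, and the weakest-to-strongest ordering below are assumptions drawn from the order in which AP2 lists the typings. They are not the trails API or the ADR-0022 algorithm.

```python
from dataclasses import dataclass, field
from typing import Optional

# Weakest to strongest, following the order listed in AP2 (assumed ordering).
STRENGTH_ORDER = ("labels", "json_schema", "shacl_shape", "rdf_class")


@dataclass
class Entity:
    """Hypothetical stand-in for an entity carrying whatever typing is present."""
    labels: list[str] = field(default_factory=list)
    json_schema: Optional[str] = None
    shacl_shape: Optional[str] = None
    rdf_class: Optional[str] = None


def strongest_available_type(entity: Entity) -> tuple[str, object]:
    """Return (kind, value) for the strongest typing the entity carries."""
    for kind in reversed(STRENGTH_ORDER):
        value = getattr(entity, kind)
        if value:  # None or an empty list means "not present"
            return kind, value
    return "untyped", None
```

An entity with only labels resolves to its labels; once the author adds a shape, the same call resolves to the shape without any change to the calling code.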
AP3 — Opinionated floor, pluggable ceiling¶
Framework raises the floor with strong defaults (Oxigraph, Cedar, PROV-O always on, MCP primary). Raises the ceiling via traits (swappable graph backends, opt-in reasoning, pluggable identity). No opinions where they lock users in; strong opinions where they save users from themselves.
AP4 — Capability-first, not route-first¶
URL routes are a projection. The canonical address is the capability descriptor (JSON-LD with shape IRIs, preconditions, costs). HTTP routes and MCP tools are projections of the canonical form. See ADR-0005.
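To illustrate the idea of projections, here is a small sketch in which one descriptor yields both an HTTP route and an MCP tool entry. The `CapabilityDescriptor` fields, the `/capabilities/<name>` path scheme, and the `x-capability` key are hypothetical; the real descriptor is JSON-LD and its layout is defined in ADR-0005.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilityDescriptor:
    """Hypothetical, simplified canonical form of a capability."""
    iri: str
    name: str
    input_shape: str
    cost_estimate: float


def to_http_route(cap: CapabilityDescriptor) -> dict:
    """Project the descriptor onto an HTTP route (path scheme is assumed)."""
    return {
        "method": "POST",
        "path": f"/capabilities/{cap.name}",
        "x-capability": cap.iri,
    }


def to_mcp_tool(cap: CapabilityDescriptor) -> dict:
    """Project the same descriptor onto an MCP tool entry."""
    return {
        "name": cap.name,
        "description": f"Capability {cap.iri}",
        "inputSchema": {"$ref": cap.input_shape},
    }
```

Both projections are derived from the descriptor and carry its IRI, so neither the route nor the tool is the source of truth.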
AP5 — IRI is the primary key¶
Every entity has an IRI; every response carries IRIs. Auto-increment IDs never leak across the API boundary. See ADR-0003.
AP6 — Provenance is not optional¶
PROV-O triples are emitted on every write; the prov: graph is always populated. You can ignore it at query time, but you cannot write without it. See ADR-0009.
AP7 — Cost is a framework primitive¶
Every capability has a cost envelope. Budget limits are enforced by the framework, not by userland decorators. Making cost central forces design conversations to stay honest. See ADR-0012.
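A rough sketch of framework-enforced budgeting, assuming a hypothetical `Budget` class with an `envelope` context manager. Only the `BudgetExceededError` name comes from this document; the reserve-then-settle behaviour is an illustrative assumption, not the `trails-cost` implementation.

```python
from contextlib import contextmanager


class BudgetExceededError(Exception):
    """Named after the class in the error taxonomy; the body is illustrative."""


class Budget:
    """Hypothetical budget that reserves an estimate and settles the actual cost."""

    def __init__(self, limit: float) -> None:
        self.limit = limit
        self.reserved = 0.0
        self.spent = 0.0

    @contextmanager
    def envelope(self, estimate: float):
        if self.spent + self.reserved + estimate > self.limit:
            raise BudgetExceededError(f"estimate {estimate} exceeds remaining budget")
        self.reserved += estimate
        usage = {"actual": estimate}  # the caller overwrites this with the real cost
        try:
            yield usage
        finally:
            # Release the reservation however the block exits.
            self.reserved -= estimate
            self.spent += usage["actual"]
```

Because the envelope is opened by the framework around the handler, user code cannot forget to close it; an exception inside the block still settles the reservation.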
AP8 — Humans and agents share the identity model¶
DIDs for both. VCs for claims. ACT / biscuit tokens for capability-bearing auth. No separate user table + service account table. See ADR-0011 and ADR-0013.
Component responsibilities¶
Storage engine components (Rust, 7 active crates + 5 archived, ~5K LLOC)¶
| Crate | Responsibility | Never does |
|---|---|---|
| `trails-graph` | SPARQL query/update, named graphs, transactions, snapshots | Validation, reasoning, policy |
| `trails-shapes` | SHACL shape validation on inputs, outputs, and `ctx.kg` writes | Storage, reasoning, policy |
| `trails-reason` | RDFS + OWL-RL entailment, opt-in via FFI, feature-detected from the loaded ontology | Validation, storage, policy |
| `trails-policy` | Cedar PDP: permit/deny + reasoning trace; strongest-available-type matching (ADR-0022) | Authentication (that's identity) |
| `trails-prov` | PROV-O triple generation, attachment to `prov:` graph; owns the `Assurance` enum | Cost accounting, identity |
| `trails-caps` | Stores capability descriptors, emits MCP / JSON-LD / OpenAPI projections | Dispatch (lives in the framework coordinator) |
| `trails-identity` | DID resolution, VC verification, ACT issue/verify, biscuit attenuation, `Signer` (ECT key custody) | Authorization (that's policy) |
| `trails-cost` | Per-capability envelopes, budget enforcement, `CostScope` nesting, anomaly hooks | Metrics export (that's observability) |
| `trails-adapters-fuseki` | Apache Jena Fuseki HTTP adapter (async-native) | Anything but graph I/O |
| `trails-adapters-qlever` | Qlever HTTP adapter (async-native, read-heavy) | Anything but graph I/O |
| `trails-ffi` | PyO3 bindings; one module per store subsystem under `trails._core`; every entry point wraps `catch_unwind` → `PyErr` | Framework logic (lives in Python) |
| `trails-wasm` | `WasmStore` over `OxigraphStore` for browser / edge targets | Anything but a thin store facade |
Framework components (Python, trails package, ~24K LLOC)¶
| Module | Responsibility | Delegates to |
|---|---|---|
| Decorators (`@capability`, `@node_type`, `@shape`, `@policy`, `@before`/`@after`/`@on_error`/`@around`, `@resource`, `@prompt`) | Metadata registration, dispatch wrapping, middleware binding | `trails._core` for store ops |
| `trails.orm` | `@node_type`, `Model`, `QueryBuilder`, `Q` combinators, property-path traversal, sync + async variants | `trails._core.Store` via FFI |
| `trails.kg` | `ctx.kg` namespace: add / save / find / where / node / edge / match / traverse | `trails._core.Store` via FFI |
| `trails.shapes` | `@shape` + `predicate()` with one_of / min_value / max_value / pattern / min_length / max_length; SHACL validation logic | `trails._core.Store` via FFI |
| `trails.policy` | Cedar `@policy` enforced by `invoke()`; `.cedar` file loader; strongest-available-type resolver (ADR-0022); policy evaluation | `trails._core` via FFI |
| `trails.llm` | `LLMClient` over anthropic / ollama / mock; streaming + tool calls; PROV-O step links | `trails._core` via FFI for provenance |
| `trails.agent` | `Session` (token-windowed, fork / branch / replay, KG persistence); ReAct / Plan-and-Execute / Reflexion planners | `trails.llm`, `trails.kg`, `trails._core` |
| `trails.ingest` | PDF / HTML / Markdown extractors, paragraph chunker, `@node_type` `Document` + `Chunk`, PROV-O per run | `trails.orm`, `trails.kg` |
| `trails.vector` | Embedders (mock, sentence-transformers, OpenAI); `SqliteVecStore` / `QdrantStore`; hybrid SPARQL + vector `retrieve()` | `trails.orm`, optional vector backend |
| `trails.testing` | `isolated_kernel`, `mock_llm`, `capture_events`, `fresh_context`, shape-pinned assertions | Everything (stdlib-only) |
| `trails.observability` | `Hook` + `Span` / tracer / metrics; OTel-friendly events on capability + LLM + KG read/write | None (emits; does not collect) |
| `trails.rendering` | Bi-modal (Markdown + JSON-LD) output from Jinja templates | None (pure Python) |
| MCP server | JSON-RPC 2.0 stdio per MCP 2024-11-05; Tools + Resources (resources/list/read/subscribe) + Prompts (prompts/list/get); SSE alongside stdio | `trails.registry`, `@resource`, `@prompt` |
| HTTP adapter | FastAPI routes, content negotiation, OpenAPI | `trails.registry` |
| CLI (`trails …`) | Scaffolding, ontology export, dev server, simulation, registry, doctor | Everything below |
| Middleware layer | Cross-cutting before / after / on_error / around on every dispatch; glob-pattern targeting | Runtime dispatch coordinator |
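The glob-pattern targeting mentioned for the middleware layer could work roughly as follows. This is a sketch using the standard-library `fnmatch` module; the `before(pattern)` signature, the `_BEFORE` registry, and the dotted capability names are assumptions, not the documented decorator API.

```python
from fnmatch import fnmatchcase
from typing import Callable

# (pattern, handler) pairs kept in registration order; hypothetical registry.
_BEFORE: list[tuple[str, Callable[[str], None]]] = []


def before(pattern: str):
    """Register a handler for every capability whose name matches the glob."""
    def register(fn: Callable[[str], None]) -> Callable[[str], None]:
        _BEFORE.append((pattern, fn))
        return fn
    return register


def matching_before(capability_name: str) -> list[Callable[[str], None]]:
    """Return matching handlers, preserving registration order."""
    return [fn for pattern, fn in _BEFORE if fnmatchcase(capability_name, pattern)]


@before("orders.*")
def audit_orders(name: str) -> None:
    print(f"audit {name}")


@before("*")
def trace_all(name: str) -> None:
    print(f"trace {name}")
```

With these two registrations, `matching_before("orders.create")` yields both handlers in registration order, while `matching_before("users.get")` yields only `trace_all`.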
Request lifecycle (trace)¶
The full dispatch runs inside a single coordinator step in Python. The
coordinator owns a snapshot-isolated graph transaction (via the Rust
store layer) spanning policy evaluation, handler execution, and
provenance write, so that policy attributes and handler reads observe
the same point-in-time graph state (NFR-Sec10). Middleware
(before / around / after / on_error) runs around this spine but
cannot skip validation, policy, provenance, or cost.
Happy-path order (strict):
1. `framework.render.decode(input)`
2. Middleware: matching `@around` handlers wrap steps 3–11.
3. ACT verification: `store.identity.verify_act(ctx.token)` (checks signature, `jti`-replay cache, `aud`, `exp`/`nbf` with ±60 s skew; NFR-Sec5, NFR-Sec6, NFR-Rel5)
4. Identity resolution: `store.identity.resolve(ctx.principal)` via DID method (TLS-pinned for `did:web`, NFR-Sec7)
5. Matching `@before` handlers run in registration order.
6. Input shape validation: `store.validator.validate(input, shape)` [abort: 400]
7. Policy decision: `store.policy.permit?(principal, action, resource)` with strongest-available-type match (ADR-0022); decision cached on the coordinator for the handler window [abort: 403]
8. Budget envelope open: `store.cost.open_envelope(estimate)` [abort: 429]
9. Handler execution (snapshot-isolated): user code runs against `ctx` (handler reads + writes share the same txn as policy attributes):
   - `ctx.kg.add / save / find / where / node / edge / match / traverse`
   - `ctx.llm` if an LLM client is configured
   - `store.reasoner.entail?` (if an ontology with `owl:Class` / `rdfs:subClassOf` is loaded; charged to envelope)
10. Output shape validation: `store.validator.validate(output, shape)` [abort: 500; rollback]
11. Provenance recording: `store.provenance.record(activity)`; if `ProvenanceWriter.assurance == L2|L3` the writer signs the ECT; L3 anchors to the external audit ledger synchronously
12. Matching `@after` (success) or `@on_error` (failure) runs in registration order.
13. Budget close: `store.cost.close_envelope(actual)` (RAII: envelope closes even on panic; NFR-Sec14)
14. Policy decision log: append-only, hash-chained
15. Coordinator commits the graph txn; response envelope ships to the transport layer
Response envelope: { payload, prov_iri, ect?, cost, consent_receipt, trace_id }.
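The abortable part of the happy path (input validation, policy decision, budget check, handler) can be sketched as one function. Everything here is illustrative: `dispatch`, `Abort`, the callable parameters, and the `"ok"` outcome label are hypothetical, and ACT verification, identity resolution, middleware, output validation, and provenance signing are left out for brevity.

```python
class Abort(Exception):
    """Carries the HTTP status and the recorded outcome for an aborted step."""

    def __init__(self, status: int, outcome: str) -> None:
        super().__init__(outcome)
        self.status = status
        self.outcome = outcome


def dispatch(handler, payload, *, validate_input, permit, budget_ok):
    """Run the abortable steps in the documented order and return a response dict."""
    steps = []  # stands in for the provenance trail
    try:
        if not validate_input(payload):
            raise Abort(400, "validation_failed")
        steps.append("input_valid")
        if not permit(payload):
            raise Abort(403, "denied")
        steps.append("permitted")
        if not budget_ok(payload):
            raise Abort(429, "budget_exceeded")
        steps.append("envelope_open")
        try:
            result = handler(payload)
        except Exception as exc:
            raise Abort(500, "handler_error") from exc
        steps.append("handled")
        return {"status": 200, "payload": result, "outcome": "ok", "steps": steps}
    except Abort as abort:
        # Abort path: no payload, but the outcome and completed steps are kept.
        return {"status": abort.status, "payload": None,
                "outcome": abort.outcome, "steps": steps}
```

The ordering is the point of the sketch: a policy deny is reported before any budget is reserved, and a handler failure is reported only after all three gates have passed.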
Deny and failure branches¶
On any abort above, the coordinator:
- Rolls back the graph txn (no partial writes, NFR-Rel1).
- Still emits a provenance record with `trails:outcome "denied"` (or `"validation_failed"`, `"handler_error"`, `"budget_exceeded"`); audit posture requires every invocation to produce a PROV-O trail, not just successes (see ADR-0009).
- Closes the cost envelope with `actual=estimate` so reserved budget is released.
- Writes the policy decision log entry even for deny paths.
- Runs matching `@on_error` middleware before translating the error per the error taxonomy (§"Error taxonomy") below.
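The policy decision log is described as append-only and hash-chained, and it is written on deny paths too. One common way to build such a chain is sketched below with SHA-256 over the previous hash plus a canonical JSON body. The entry layout, the `GENESIS` value, and the function names are assumptions, not the store's on-disk format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _digest(prev_hash: str, decision: dict) -> str:
    body = json.dumps(decision, sort_keys=True)  # canonical key order
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list[dict], decision: dict) -> dict:
    """Append a decision, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev_hash, "decision": decision,
             "hash": _digest(prev_hash, decision)}
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["decision"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Permit and deny decisions go through the same `append_entry` call, so an auditor can detect a missing or altered deny record by re-running `verify_chain`.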
FFI boundary design¶
- PyO3-based. One Rust module per store subsystem, unified under a Python `trails._core` package. The Python framework imports from `trails._core`; application code never does.
- Async-capable. PyO3 0.21+ supports Python `async def` bridging to Rust `async fn` via `pyo3-async-runtimes`. Used for streaming capabilities and SSE transport.
- Zero-copy where it matters. RDF terms passed as borrowed slices; JSON-LD documents serialized once on the Rust side and exposed as `bytes` to Python.
- Error mapping. Rust `Result<T, TrailsError>` → Python `trails.TrailsError` with structured context (`.field`, `.constraint`, `.iri`).
- Stable ABI. PyO3 abi3-py311 target; one wheel per platform.
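On the Python side, the error-mapping bullet implies an exception that carries structured context. A sketch of that shape follows: the three attribute names come from the bullet above, while the constructor signature and the `from_ffi_payload` helper (including the dict format it accepts) are assumptions for illustration.

```python
from typing import Optional


class TrailsError(Exception):
    """Illustrative shape only; not the actual trails.TrailsError class."""

    def __init__(self, message: str, *, field: Optional[str] = None,
                 constraint: Optional[str] = None, iri: Optional[str] = None) -> None:
        super().__init__(message)
        self.field = field            # offending property, if any
        self.constraint = constraint  # violated constraint, if any
        self.iri = iri                # entity the error refers to, if any


def from_ffi_payload(payload: dict) -> TrailsError:
    """Build the exception from a dict the FFI layer might hand over (assumed format)."""
    return TrailsError(
        payload.get("message", "store error"),
        field=payload.get("field"),
        constraint=payload.get("constraint"),
        iri=payload.get("iri"),
    )
```

Callers can then branch on `err.field` or `err.constraint` instead of parsing the message text.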
Deployment topologies¶
Dev (default)¶
+--------------------------------------+
| single process |
| +--------------------------------+ |
| | Trails Python framework | |
| | |- MCP server (stdio/SSE) | |
| | |- FastAPI on :8000 | |
| | `- _core (PyO3) | |
| | `- Oxigraph (embedded) | |
| | `- RocksDB (embedded) | |
| +--------------------------------+ |
+--------------------------------------+
Single-node prod¶
+--------------------------------------------------+
| Docker host |
| +--------------+ +--------------+ +--------+ |
| | Trails app |<-| NATS | | R2/S3 | |
| | (n replicas) | | (queue) | | (blobs)| |
| +------+-------+ +--------------+ +--------+ |
| | |
| +------v-------+ +--------------+ |
| | Oxigraph | | Postgres | |
| | (server) | | (relational) | |
| +--------------+ +--------------+ |
+--------------------------------------------------+
Cluster (v2)¶
+-> Trails pod --+
agents -+-> Trails pod --+--> Qlever cluster (read)
+-> Trails pod --+ Oxigraph cluster (write)
| Postgres HA
| NATS cluster
+--> R2
Cross-cutting concerns¶
Observability¶
- Structured logs (JSON) to stdout, one line per capability invocation with trace ID.
- OTLP traces spanning FFI boundary (Python → PyO3 → Oxigraph).
- `trails.observability` emits `kg_write` / `kg_query` / `capability_start` / `capability_end` / `llm_call` events.
- PROV-O graph serves as a queryable audit trail.
Error taxonomy¶
| Class | HTTP | MCP error | Source |
|---|---|---|---|
| `ValidationError` | 400 | `InvalidParams` | shape mismatch |
| `AuthenticationError` | 401 | `Unauthorized` | bad token / DID |
| `AuthorizationError` | 403 | `Forbidden` | policy deny |
| `PreconditionError` | 412 | `PreconditionFailed` | capability precondition |
| `BudgetExceededError` | 429 | `RateLimited` | cost envelope |
| `HandlerError` | 500 | `InternalError` | user code |
| `BackendError` | 503 | `ServiceUnavailable` | graph/store failure |
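The taxonomy reads naturally as a lookup table that a transport can consult. In the sketch below the values are copied from the table above; the `translate` helper and its fallback to the `HandlerError` row for unknown classes are assumptions.

```python
# Error class name -> (HTTP status, MCP error), copied from the taxonomy table.
ERROR_TAXONOMY: dict[str, tuple[int, str]] = {
    "ValidationError": (400, "InvalidParams"),
    "AuthenticationError": (401, "Unauthorized"),
    "AuthorizationError": (403, "Forbidden"),
    "PreconditionError": (412, "PreconditionFailed"),
    "BudgetExceededError": (429, "RateLimited"),
    "HandlerError": (500, "InternalError"),
    "BackendError": (503, "ServiceUnavailable"),
}


def translate(error_class: str) -> dict:
    """Map an error class name to its HTTP status and MCP error.

    Unknown classes fall back to the HandlerError row (an assumption).
    """
    status, mcp_error = ERROR_TAXONOMY.get(error_class, ERROR_TAXONOMY["HandlerError"])
    return {"http_status": status, "mcp_error": mcp_error}
```

Keeping one table for both transports means the HTTP adapter and the MCP server cannot drift apart on how a given failure is reported.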
Security model¶
Trails' security posture is enforced at the Rust FFI boundary, not by convention in application code. The controls below map to the 01-requirements NFR-Sec5..14 block and the corresponding deep-dive in the design spec (§3.8 request-time enforcement, §3.11 supply chain). This section is the architect-level summary; normative text lives in requirements + ADRs.
- Token integrity (NFR-Sec5, NFR-Sec6). ACT is the primary capability mandate; biscuit is a chain-delegation attenuation layer. `trails-identity` maintains a `jti`-seen cache keyed by token `exp + skew` and an algorithm allowlist (EdDSA | ES256 | ES384). `alg=none` and symmetric algs are refused at parse time.
- DID resolution trust (NFR-Sec7). Non-`did:key` resolvers use TLS 1.3 with pinned CA bundles or DNSSEC; DID documents are TOFU-pinned per content hash with auditable rotation. No silent substitution path.
- Caveat bounds (NFR-Sec8). Biscuit Datalog verification has a hard cap on facts, rules, attenuation depth, and wall-clock; a malformed or malicious caveat cannot stall the store engine.
- SPARQL/SHACL bounds (NFR-Sec9). Every query runs under wall-clock + memory + result-row caps; SHACL recursion is depth-limited.
- Snapshot isolation (NFR-Sec10). PDP evaluation, handler reads, and provenance writes share one read-snapshot. TOCTOU gaps between "policy said yes" and "handler acted" are closed at the trait level.
- Provenance write exclusivity (NFR-Sec11). The `prov:` graph is store-write-only. A handler calling `INSERT DATA { GRAPH <prov:> { … } }` is denied at the `GraphStore` boundary before policy even sees it (defense in depth).
- Key rotation bound (NFR-Sec12). Overlap windows are finite; purged keys refuse previously-issued tokens.
- L1 export restriction (NFR-Sec13). Unsigned L1 ECT records stay within the originating trust domain; any cross-domain export requires L2+.
- FFI panic containment (NFR-Sec14). Every PyO3 entry point runs under `catch_unwind`; a Rust panic becomes a Python `TrailsError`, never a process abort. The `Signer` trait owns JOSE key material and performs ECT signing for L2/L3.
Cross-references: ADR-0006 (Cedar + snapshot), ADR-0007 (Oxigraph resource bounds), ADR-0009 (prov write exclusivity, erasure), ADR-0010 (biscuit caveat bounds, rotation), ADR-0011 (DID resolution + VC allowlist), ADR-0013 (ACT/ECT replay + L1 export), ADR-0014 (supply chain), ADR-0021 (progressive enhancement), ADR-0022 (Cedar unified matcher).
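The token-integrity control combines an algorithm allowlist with a `jti` replay cache. A simplified sketch of that combination follows. It performs no signature verification and ignores `nbf` and `aud`; it reads "keyed by token exp + skew" as "retained until exp + skew", which is an interpretation. `ReplayCache` and `accept_token` are hypothetical names, not the `trails-identity` API.

```python
ALLOWED_ALGS = frozenset({"EdDSA", "ES256", "ES384"})  # allowlist from NFR-Sec5/6
SKEW_SECONDS = 60  # clock skew tolerated around exp


class ReplayCache:
    """Remembers each jti until the token could no longer be accepted."""

    def __init__(self) -> None:
        self._seen: dict[str, float] = {}  # jti -> eviction time (exp + skew)

    def check_and_store(self, jti: str, exp: float, now: float) -> bool:
        # Drop entries whose tokens are already past exp + skew.
        self._seen = {j: t for j, t in self._seen.items() if t > now}
        if jti in self._seen:
            return False  # replayed token
        self._seen[jti] = exp + SKEW_SECONDS
        return True


def accept_token(header: dict, claims: dict, cache: ReplayCache, now: float) -> bool:
    """Apply the allowlist, the expiry check, and the replay check, in that order."""
    if header.get("alg") not in ALLOWED_ALGS:
        return False  # rejects alg=none and symmetric algorithms such as HS256
    if now > claims["exp"] + SKEW_SECONDS:
        return False  # expired beyond the allowed skew
    return cache.check_and_store(claims["jti"], claims["exp"], now)
```

Bounding retention by `exp + skew` keeps the cache finite: once a token can no longer pass the expiry check, its `jti` no longer needs to be remembered.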
Versioning¶
- Ontology: OWL `owl:versionIRI`, SemVer on ontology bundles.
- Capability: SemVer in descriptor; `deprecates` field for replacement pointers. `version_status` (`"active"`/`"deprecated"`/`"retired"`) is a registry-admin concern, computable from `deprecates` pointers rather than a field on the descriptor itself.
- Shape: SHACL supports shape subclassing; framework supports `@shape(deprecates=OldShape)`.
- Node type: `@node_type` is additive; adding a type to a previously untyped label never breaks existing writes.
- Framework: SemVer. ADRs supersede each other explicitly.
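Deriving `version_status` from `deprecates` pointers might look like the sketch below. The descriptor dict layout and the explicit `retired` set are assumptions: the pointers alone distinguish "active" from "deprecated", and this sketch treats retirement as a separate registry-admin decision.

```python
def version_status(descriptors: dict[str, dict],
                   retired: frozenset[str] = frozenset()) -> dict[str, str]:
    """Compute a status per capability IRI from `deprecates` pointers.

    `descriptors` maps a capability IRI to its descriptor; a descriptor may
    carry a "deprecates" key naming the IRI it replaces (assumed layout).
    """
    deprecated = {d["deprecates"] for d in descriptors.values() if d.get("deprecates")}
    status: dict[str, str] = {}
    for iri in descriptors:
        if iri in retired:
            status[iri] = "retired"      # explicit admin decision (assumption)
        elif iri in deprecated:
            status[iri] = "deprecated"   # some other descriptor replaces it
        else:
            status[iri] = "active"
    return status
```

Because the status is computed, publishing a replacement capability is enough to mark the old one deprecated; no descriptor has to be edited in place.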
What lives where (cheat sheet)¶
| Concern | Store engine (Rust) | Framework (Python) | User code |
|---|---|---|---|
| SPARQL query execution | yes | — | — |
| SHACL validation | yes | — | — |
| Cedar policy evaluation (with strongest-available-type match) | yes | — | — |
| PROV-O triple emission | yes | — | — |
| DID resolution | yes | — | — |
| ACT verify/issue; biscuit attenuation | yes | — | — |
| ECT emission + JOSE signing (L2/L3, via `Signer`) | yes | — | — |
| RDFS / OWL-RL reasoning (opt-in, feature-detected) | yes | — | — |
| Cost budget check | yes | — | — |
| MCP protocol handling (Tools + Resources + Prompts) | — | yes | — |
| FastAPI routes | — | yes | — |
| CLI / scaffolding | — | yes | — |
| Shape declaration | — | yes (decorator) | yes (class) |
| Node-type declaration | — | yes (decorator) | yes (class) |
| Capability declaration | — | yes (decorator) | yes (function) |
| Middleware declaration | — | yes (decorator) | yes (function) |
| Policy definition | — | — | yes (.cedar file) |
| Ontology (`.ttl`) | — | — | yes (optional) |
| Business logic | — | — | yes (handler body) |