ADR-0009: Provenance (PROV-O) always on
- Status: Accepted
- Date: 2026-04-12
Context
Provenance (who did what, when, to what data, based on what) is fundamental to agent trust. Without it:
- Audit impossible.
- EU AI Act Art. 12 compliance impossible.
- "Why did the agent do this?" cannot be answered.
- Cross-agent trust cannot be established.
- Data corrections can't be propagated (you can't undo what you can't trace).
Frameworks historically treat provenance as a plugin or afterthought. Options:
1. Opt-in plugin. Devs enable provenance per capability. Rails-level: nothing.
2. Opt-out default. On by default, devs can disable. Slight friction.
3. Always on, queryable off. Triples always generated; apps can choose not to query/expose them. No opt-out of generation.
4. Always on, always exposed. Responses always include provenance IRI.
Decision
Option 4: Always on, always exposed. PROV-O triples are emitted on every capability invocation. Every response envelope includes a resolvable provenance IRI. The prov: named graph is always populated.
Specifically:
- Every @capability invocation produces a prov:Activity record.
- Inputs are linked via prov:used; outputs via prov:generated; principal via prov:wasAssociatedWith.
- Activity includes prov:startedAtTime and prov:endedAtTime.
- Derivation links are recorded where a capability derives new entities from existing ones.
- All provenance triples live in a per-app-configurable prov: named graph.
- Response envelopes include provenance: { "@id": "..." } pointing to the activity.
Apps cannot turn provenance generation off. They can choose not to expose the IRI externally (by rewriting the envelope in a custom renderer), but the internal graph always has it.
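A minimal sketch of what one invocation's record could look like, assuming triples are plain (subject, predicate, object) tuples. The function name `activity_triples`, the `urn:uuid:` activity IRI scheme, and the example IRIs are hypothetical and not the Trails API; the prov: terms are the standard PROV-O ones listed above.

```python
from datetime import datetime, timezone
from uuid import uuid4

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def activity_triples(principal, inputs, outputs, started, ended, activity_iri=None):
    """Build the PROV-O triples for one capability invocation (illustrative only)."""
    # Hypothetical IRI scheme: a fresh urn:uuid per invocation unless one is given.
    activity = activity_iri or f"urn:uuid:{uuid4()}"
    triples = [
        (activity, RDF_TYPE, PROV + "Activity"),
        (activity, PROV + "wasAssociatedWith", principal),
        (activity, PROV + "startedAtTime", started.isoformat()),
        (activity, PROV + "endedAtTime", ended.isoformat()),
    ]
    triples += [(activity, PROV + "used", i) for i in inputs]
    triples += [(activity, PROV + "generated", o) for o in outputs]
    # Simplification: treat every output as derived from every input.
    triples += [(o, PROV + "wasDerivedFrom", i) for o in outputs for i in inputs]
    # Envelope fragment pointing at the activity, as described above.
    envelope = {"provenance": {"@id": activity}}
    return triples, envelope


start = datetime(2026, 4, 12, 9, 0, tzinfo=timezone.utc)
triples, envelope = activity_triples(
    "did:example:agent-1",
    ["urn:example:doc/1"],
    ["urn:example:summary/1"],
    start,
    start,
)
```

With one input and one output this yields seven triples, which sits inside the 5–20 range the storage estimate assumes.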
Consequences
Positive
- Regulatory posture. EU AI Act Art. 12 logging is effectively free for Trails apps.
- Agent trust. Receiving agents can verify the chain and decide whether to trust.
- Debuggability. PROV-O graph is a distributed audit trail queryable via SPARQL.
- Data corrections. When a source is discovered bad, derived data is traceable via prov:wasDerivedFrom.
- Framework differentiator. No comparable framework has provenance as a core primitive.
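The data-correction case amounts to a transitive walk over prov:wasDerivedFrom edges. A minimal sketch, assuming the edges have already been pulled out of the graph as (derived, source) pairs; `tainted_entities` and the single-letter entity names are hypothetical, for illustration only.

```python
from collections import defaultdict, deque


def tainted_entities(derivations, bad_source):
    """Return every entity transitively derived from bad_source.

    derivations: iterable of (derived, source) pairs, one pair per
    `derived prov:wasDerivedFrom source` triple.
    """
    # Index edges by source so we can walk downstream from the bad entity.
    children = defaultdict(set)
    for derived, source in derivations:
        children[source].add(derived)

    seen = set()
    queue = deque([bad_source])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in seen and child != bad_source:
                seen.add(child)
                queue.append(child)
    return seen


edges = [("b", "a"), ("c", "b"), ("d", "x")]
affected = tainted_entities(edges, "a")  # {"b", "c"}
```

In SPARQL the same walk can be written as a property path, for example `?e prov:wasDerivedFrom+ <bad-source-iri>`.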
Negative
- Storage cost. Each activity is ~5–20 triples. At high throughput (10k invocations/hour) this is ~50k–200k triples/hour. Mitigated by:
  - Dedicated named graph enables cheap truncation / archival policies.
  - Oxigraph scales well into billions of triples.
  - Apps needing extreme throughput can point the prov: graph at a separate store.
- Write latency impact. Extra triples per capability = extra write. Benchmarked at ~1–2 ms overhead; acceptable per NFR-Perf1.
- Privacy tension. Provenance may itself contain sensitive data (which principal took which action). Mitigated by:
  - Provenance graph is a distinct, auth-gated resource.
  - Principals are DIDs, not human identities, by default.
  - Capability-aware redaction policies supported v1.5+.
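The storage estimate in the first bullet above is plain multiplication. The sketch below reproduces it and extends it to a daily figure, assuming a constant invocation rate; the function name is illustrative and the 5–20 range is the per-activity figure quoted above.

```python
def triples_per_hour(invocations_per_hour, triples_per_activity=(5, 20)):
    """Return the (low, high) number of provenance triples written per hour."""
    low, high = triples_per_activity
    return invocations_per_hour * low, invocations_per_hour * high


low, high = triples_per_hour(10_000)
print(low, high)            # 50000 200000
print(low * 24, high * 24)  # 1200000 4800000 triples/day at a constant rate
```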
Non-consequences
- Apps can still query provenance via SPARQL like any other graph.
- Existing PROV-O tooling (Apache Jena, pyprov) works on the emitted graph.
Revisit conditions
- If at-scale storage becomes a real problem and opt-out cohorts emerge, consider a "provenance sampling" mode (record 10% of activities in full, 100% summarized). Not before v2.
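If such a sampling mode were ever introduced, one way to pick the fully recorded 10% would be to hash the activity IRI, so the decision is deterministic and repeatable across processes. This is a purely hypothetical sketch: the ADR defines no such mode, and `record_in_full` is an invented name.

```python
import hashlib


def record_in_full(activity_iri, rate=0.10):
    """Decide deterministically whether an activity gets a full record."""
    digest = hashlib.sha256(activity_iri.encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate
```

Activities that fall outside the sample would still get the summarized record, so the "100% summarized" half of the idea is unaffected by this choice.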
Update (2026-04-12)
Per ADR-0013, the on-wire / exported form of a capability invocation is now an ECT (Execution Context Token, draft-nennemann-wimse-ect) with assurance level L1/L2/L3 chosen via @capability(assurance=...). PROV-O remains the internal graph representation in the prov: named graph for SPARQL queryability. The kernel maps ECT ↔ PROV-O bidirectionally. Provenance is still always-on; this ADR's core decision is unchanged — only the export format is now a standards-track token instead of a Trails-specific serialization.
Update (2026-04-12): Security hardening
Three normative additions from the security review:
- (a) GDPR Art. 17 erasure. Provenance references may contain principal DIDs that are subject to erasure requests. The framework MUST support a redact_principal(did) operation that rewrites every prov:wasAssociatedWith (and any other DID reference) in the prov: graph to did:redacted:<hash>, preserving chain structure and hash-chain integrity while removing the linkable identifier. Redaction is recorded as a prov:Activity of its own.
- (b) L1 ECT export across trust boundaries forbidden. L1 ECTs are unsigned (integrity inherited from the originating TLS channel). Exporting an L1 ECT outside the originating trust domain (to another MCP peer, a queue, a webhook, a file drop) is forbidden at the framework layer. The kernel enforces an allow_export_across_domain: bool flag per named graph; attempting to export L1 records from a graph where the flag is false raises ExportDenied. Any export across a trust boundary MUST be L2 or higher.
- (c) prov: graph is kernel-write-exclusive. Only the trails-prov writer may write to the prov: named graph. The kernel's GraphStore write path denies all principal-initiated writes whose target is prov: (structural defence, independent of Cedar policy). Handlers that attempt INSERT DATA { GRAPH <prov:> { ... } } receive ProvenanceWriteDenied.
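A minimal sketch of additions (a) and (b), assuming triples are (subject, predicate, object) tuples. The ADR names redact_principal(did), allow_export_across_domain, and ExportDenied; the function signatures, the truncated SHA-256 used for <hash>, and `check_export` are assumptions made for illustration. The sketch does not model hash-chain integrity or the recording of the redaction as its own prov:Activity.

```python
import hashlib


class ExportDenied(Exception):
    """Raised when an L1 record would leave a graph that forbids export."""


def redact_principal(triples, did):
    """Replace every occurrence of `did` with did:redacted:<hash>.

    Assumption: <hash> is the first 16 hex characters of SHA-256(did).
    A plain hash of a known DID can be recomputed and re-linked, so a real
    implementation would likely need a keyed or salted construction.
    """
    token = "did:redacted:" + hashlib.sha256(did.encode("utf-8")).hexdigest()[:16]

    def swap(term):
        return token if term == did else term

    # Rewrite subject and object positions; the shape of the graph is unchanged.
    return [(swap(s), p, swap(o)) for s, p, o in triples]


def check_export(assurance, allow_export_across_domain):
    """Raise ExportDenied for L1 records when the graph's flag is false."""
    if assurance == "L1" and not allow_export_across_domain:
        raise ExportDenied("L1 ECT export is not allowed from this graph")


graph = [("urn:example:act/1", "prov:wasAssociatedWith", "did:example:alice")]
redacted = redact_principal(graph, "did:example:alice")
check_export("L2", allow_export_across_domain=False)  # passes silently
```

Because the same DID always maps to the same token, all activities of the redacted principal remain linked to each other, which is what keeps the chain structure intact.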