ADR-0009: Provenance (PROV-O) always on

Status: Accepted
Date: 2026-04-12

Context

Provenance (who did what, when, to what data, based on what) is fundamental to agent trust. Without it:

- Audit impossible.
- EU AI Act Art. 12 compliance impossible.
- "Why did the agent do this?" cannot be answered.
- Cross-agent trust cannot be established.
- Data corrections can't be propagated (you can't undo what you can't trace).

Frameworks historically treat provenance as a plugin or afterthought. Options:

1. Opt-in plugin. Devs enable provenance per capability. Rails-level: nothing.
2. Opt-out default. On by default, devs can disable. Slight friction.
3. Always on, queryable off. Triples always generated; apps can choose not to query/expose them. No opt-out of generation.
4. Always on, always exposed. Responses always include provenance IRI.

Decision

Option 4: Always on, always exposed. PROV-O triples are emitted on every capability invocation. Every response envelope includes a resolvable provenance IRI. The prov: named graph is always populated.

Specifically:

- Every @capability invocation produces a prov:Activity record.
- Inputs are linked via prov:used; outputs via prov:generated; principal via prov:wasAssociatedWith.
- Activity includes prov:startedAtTime and prov:endedAtTime.
- Derivation links are recorded where a capability derives new entities from existing ones.
- All provenance triples live in a per-app-configurable prov: named graph.
- Response envelopes include provenance: { "@id": "..." } pointing to the activity.
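
The record shape listed above can be sketched with plain tuples standing in for RDF triples. This is a minimal illustration, not the Trails kernel: `record_activity`, `envelope`, the IRI layout, and the `result` key are hypothetical names chosen for the example; only the PROV-O property names come from the decision text and the W3C vocabulary.

```python
from datetime import datetime, timezone
from uuid import uuid4

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def record_activity(base_iri, principal, inputs, outputs, started, ended, derived=None):
    """Return (activity_iri, triples) describing one capability invocation."""
    # Hypothetical IRI scheme: one fresh activity IRI per invocation.
    activity = f"{base_iri}/activity/{uuid4()}"
    triples = [
        (activity, RDF_TYPE, PROV + "Activity"),
        (activity, PROV + "wasAssociatedWith", principal),
        # Literals are modelled as (lexical form, datatype) pairs.
        (activity, PROV + "startedAtTime", (started.isoformat(), XSD_DATETIME)),
        (activity, PROV + "endedAtTime", (ended.isoformat(), XSD_DATETIME)),
    ]
    triples += [(activity, PROV + "used", entity) for entity in inputs]
    triples += [(activity, PROV + "generated", entity) for entity in outputs]
    # Optional derivation links, given as {new_entity: [source_entity, ...]}.
    for new_entity, sources in (derived or {}).items():
        triples += [(new_entity, PROV + "wasDerivedFrom", src) for src in sources]
    return activity, triples


def envelope(result, activity_iri):
    """Wrap a capability result so that it points at its provenance record."""
    return {"result": result, "provenance": {"@id": activity_iri}}


started = datetime(2026, 4, 12, 10, 0, 0, tzinfo=timezone.utc)
ended = datetime(2026, 4, 12, 10, 0, 1, tzinfo=timezone.utc)
activity, triples = record_activity(
    "https://app.example/prov",
    "did:example:alice",
    inputs=["urn:example:invoice-1"],
    outputs=["urn:example:summary-1"],
    started=started,
    ended=ended,
    derived={"urn:example:summary-1": ["urn:example:invoice-1"]},
)
response = envelope({"ok": True}, activity)
```

With one input, one output, and one derivation link, the call yields seven triples, which sits inside the 5–20 triples per activity estimated under Consequences.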

Apps cannot turn provenance generation off. They can choose not to expose the IRI externally (by rewriting the envelope in a custom renderer), but the internal graph always has it.
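
A custom renderer of the kind mentioned here might look like the sketch below. The function name `render_external` and the envelope keys other than `provenance` are assumptions made for illustration; the point is only that hiding happens at render time, while the stored record is left alone.

```python
def render_external(envelope):
    """Return a copy of the envelope without the provenance IRI.

    The internal envelope is not mutated, and nothing is removed from the
    prov: graph; only the outward-facing view changes.
    """
    return {key: value for key, value in envelope.items() if key != "provenance"}


internal = {
    "result": {"total": 42},
    "provenance": {"@id": "https://app.example/prov/activity/123"},
}
external = render_external(internal)
```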

Consequences

Positive

- Regulatory posture. EU AI Act Art. 12 logging is effectively free for Trails apps.
- Agent trust. Receiving agents can verify the chain and decide whether to trust.
- Debuggability. PROV-O graph is a distributed audit trail queryable via SPARQL.
- Data corrections. When a source is discovered bad, derived data is traceable via prov:wasDerivedFrom.
- Framework differentiator. No comparable framework has provenance as a core primitive.
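
To illustrate the data-corrections point, the sketch below walks prov:wasDerivedFrom links to find everything that depends on a bad source. It works on in-memory tuples rather than the real store, and `tainted_by` is a hypothetical helper name.

```python
from collections import defaultdict, deque

WAS_DERIVED_FROM = "http://www.w3.org/ns/prov#wasDerivedFrom"


def tainted_by(triples, bad_source):
    """Return every entity derived, directly or transitively, from bad_source."""
    # Index derivation links in the reverse direction: source -> derived entities.
    derived_from = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == WAS_DERIVED_FROM:
            derived_from[obj].add(subject)

    # Breadth-first walk over the reversed links.
    seen, queue = set(), deque([bad_source])
    while queue:
        node = queue.popleft()
        for child in derived_from[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


graph = [
    ("urn:example:report", WAS_DERIVED_FROM, "urn:example:summary"),
    ("urn:example:summary", WAS_DERIVED_FROM, "urn:example:raw-feed"),
    ("urn:example:other", WAS_DERIVED_FROM, "urn:example:clean-feed"),
]
affected = tainted_by(graph, "urn:example:raw-feed")
```

Against the actual prov: graph, the same closure could presumably be expressed with a SPARQL 1.1 property path such as `?entity prov:wasDerivedFrom+ <source>`.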

Negative

- Storage cost. Each activity is ~5–20 triples. At high throughput (10k invocations/hour) this is ~50–200k triples/hour. Mitigated by:
    - Dedicated named graph enables cheap truncation / archival policies.
    - Oxigraph scales well into billions of triples.
    - Apps needing extreme throughput can point prov: graph at a separate store.
- Write latency impact. Extra triples per capability = extra write. Benchmarked at ~1–2 ms overhead; acceptable per NFR-Perf1.
- Privacy tension. Provenance may itself contain sensitive data (which principal took which action). Mitigated by:
    - Provenance graph is a distinct, auth-gated resource.
    - Principals are DIDs, not human identities, by default.
    - Capability-aware redaction policies supported v1.5+.
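
The storage estimate above is simple multiplication, and extending it to a year shows why archival policies matter. The yearly figures below assume the 10k invocations/hour rate is sustained around the clock, which the ADR does not claim; treat them as an upper-bound sketch.

```python
HOURS_PER_YEAR = 24 * 365


def triples_per_hour(invocations_per_hour, triples_per_activity):
    """Provenance triples written per hour at a given invocation rate."""
    return invocations_per_hour * triples_per_activity


def triples_per_year(invocations_per_hour, triples_per_activity):
    """Same estimate over a year, assuming constant load (an assumption)."""
    return triples_per_hour(invocations_per_hour, triples_per_activity) * HOURS_PER_YEAR


print(triples_per_hour(10_000, 5), triples_per_hour(10_000, 20))
# 50000 200000
print(triples_per_year(10_000, 5), triples_per_year(10_000, 20))
# 438000000 1752000000
```

At the upper bound the graph would pass a billion triples in well under a year without truncation or archival, which is the regime the mitigations are aimed at.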

Non-consequences

- Apps can still query provenance via SPARQL like any other graph.
- Existing PROV-O tooling (Apache Jena, pyprov) works on the emitted graph.

Revisit conditions

If at-scale storage becomes a real problem and opt-out cohorts emerge, consider a "provenance sampling" mode (record 10% of activities in full, 100% summarized). Not before v2.

Update (2026-04-12)

Per ADR-0013, the on-wire / exported form of a capability invocation is now an ECT (Execution Context Token, draft-nennemann-wimse-ect) with assurance level L1/L2/L3 chosen via @capability(assurance=...). PROV-O remains the internal graph representation in the prov: named graph for SPARQL queryability. The kernel maps ECT ↔ PROV-O bidirectionally. Provenance is still always-on; this ADR's core decision is unchanged — only the export format is now a standards-track token instead of a Trails-specific serialization.

Update (2026-04-12): Security hardening

Three normative additions from the security review:

(a) GDPR Art. 17 erasure. Provenance references may contain principal DIDs that are subject to erasure requests. The framework MUST support a redact_principal(did) operation that rewrites every prov:wasAssociatedWith (and any other DID reference) in the prov: graph to did:redacted:<hash>, preserving chain structure and hash-chain integrity while removing the linkable identifier. Redaction is recorded as a prov:Activity of its own.
(b) L1 ECT export across trust boundaries forbidden. L1 ECTs are unsigned (integrity inherited from the originating TLS channel). Exporting an L1 ECT outside the originating trust domain — to another MCP peer, a queue, a webhook, a file drop — is forbidden at the framework layer. The kernel enforces an allow_export_across_domain: bool flag per named graph; attempting to export L1 records from a graph where the flag is false raises ExportDenied. Any export across a trust boundary MUST be L2 or higher.
(c) prov: graph is kernel-write-exclusive. Only the trails-prov writer may write to the prov: named graph. The kernel's GraphStore write path denies all principal-initiated writes whose target is prov: (structural defence, independent of Cedar policy). Handlers that attempt INSERT DATA { GRAPH <prov:> { ... } } receive ProvenanceWriteDenied.
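
The three additions can be sketched as small guards over in-memory tuples. This is an illustration under several assumptions, not the kernel's code: the function signatures, the `key` parameter, the use of a keyed HMAC for `<hash>`, and the `urn:trails:redaction:` IRI scheme are all hypothetical. The export check follows the literal wording of the flag rule, and hash-chain maintenance is not modelled.

```python
import hashlib
import hmac

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PROV_GRAPH = "prov:"          # graph name as written in this ADR
PROV_WRITER = "trails-prov"   # the only writer allowed to touch it


class ExportDenied(Exception):
    pass


class ProvenanceWriteDenied(Exception):
    pass


def redact_principal(triples, did, key):
    """(a) Replace every occurrence of `did` and record the redaction itself."""
    # Assumption: a keyed hash, so the token cannot be recomputed from the DID alone.
    digest = hmac.new(key, did.encode(), hashlib.sha256).hexdigest()[:16]
    token = f"did:redacted:{digest}"
    # Triple shape is preserved, so the activity chain keeps its structure.
    rewritten = [
        tuple(token if term == did else term for term in triple)
        for triple in triples
    ]
    # The redaction is logged as a prov:Activity of its own (hypothetical IRI).
    rewritten.append((f"urn:trails:redaction:{digest}", RDF_TYPE, PROV + "Activity"))
    return rewritten


def check_export(assurance, allow_export_across_domain):
    """(b) Refuse to export unsigned L1 records from a graph whose flag is false."""
    if assurance == "L1" and not allow_export_across_domain:
        raise ExportDenied("L1 ECTs may not leave the originating trust domain")


def check_write(target_graph, writer):
    """(c) Structural guard: only the provenance writer may write to prov:."""
    if target_graph == PROV_GRAPH and writer != PROV_WRITER:
        raise ProvenanceWriteDenied(f"{writer} may not write to {PROV_GRAPH}")


graph = [
    ("urn:example:act-1", PROV + "wasAssociatedWith", "did:example:alice"),
    ("urn:example:act-2", PROV + "wasAssociatedWith", "did:example:bob"),
]
redacted = redact_principal(graph, "did:example:alice", key=b"demo-secret")
```

A keyed hash is used here because a bare digest of the DID could be recomputed by anyone holding a candidate identifier, which would leave the record linkable; whether the framework takes that approach is not stated in this ADR.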