ADR-0009: Provenance (PROV-O) always on

  • Status: Accepted
  • Date: 2026-04-12

Context

Provenance (who did what, when, to what data, based on what) is fundamental to agent trust. Without it:

  • Audit impossible.
  • EU AI Act Art. 12 compliance impossible.
  • "Why did the agent do this?" cannot be answered.
  • Cross-agent trust cannot be established.
  • Data corrections can't be propagated (you can't undo what you can't trace).

Frameworks historically treat provenance as a plugin or afterthought. Options:

  1. Opt-in plugin. Devs enable provenance per capability. Nothing at the framework level.
  2. Opt-out default. On by default, devs can disable. Slight friction.
  3. Always on, queryable off. Triples always generated; apps can choose not to query/expose them. No opt-out of generation.
  4. Always on, always exposed. Responses always include provenance IRI.

Decision

Option 4: Always on, always exposed. PROV-O triples are emitted on every capability invocation. Every response envelope includes a resolvable provenance IRI. The prov: named graph is always populated.

Specifically:

  • Every @capability invocation produces a prov:Activity record.
  • Inputs are linked via prov:used; outputs via prov:generated; principal via prov:wasAssociatedWith.
  • Activity includes prov:startedAtTime and prov:endedAtTime.
  • Derivation links are recorded where a capability derives new entities from existing ones.
  • All provenance triples live in a per-app-configurable prov: named graph.
  • Response envelopes include provenance: { "@id": "..." } pointing to the activity.
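
As a rough illustration of the record shape described above, the following sketch builds the triples for one invocation as plain Python tuples. The helper name `activity_triples`, the `urn:example:` and `did:example:` identifiers, and the `result` key in the envelope are assumptions made for this example; the actual writer and its signature are not shown in this ADR.

```python
from datetime import datetime, timezone

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def activity_triples(activity, principal, inputs, outputs, started, ended):
    """Build (subject, predicate, object) tuples for one invocation.

    Hypothetical helper for illustration; not the Trails kernel API.
    """
    triples = [
        (activity, RDF_TYPE, PROV + "Activity"),
        (activity, PROV + "wasAssociatedWith", principal),
        (activity, PROV + "startedAtTime", started.isoformat()),
        (activity, PROV + "endedAtTime", ended.isoformat()),
    ]
    triples += [(activity, PROV + "used", entity) for entity in inputs]
    triples += [(activity, PROV + "generated", entity) for entity in outputs]
    # Simplification: treat every output as derived from every input.
    triples += [
        (out, PROV + "wasDerivedFrom", src) for out in outputs for src in inputs
    ]
    return triples


started = datetime(2026, 4, 12, 9, 0, 0, tzinfo=timezone.utc)
ended = datetime(2026, 4, 12, 9, 0, 1, tzinfo=timezone.utc)
activity = "urn:example:activity/1"

triples = activity_triples(
    activity,
    principal="did:example:agent-1",
    inputs=["urn:example:doc/1"],
    outputs=["urn:example:summary/1"],
    started=started,
    ended=ended,
)

# Response envelope carrying the pointer to the activity.
envelope = {"result": "ok", "provenance": {"@id": activity}}
```

With one input and one output this yields seven triples, which sits inside the ~5–20 range quoted under Consequences.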

Apps cannot turn provenance generation off. They can choose not to expose the IRI externally (by rewriting the envelope in a custom renderer), but the internal graph always has it.
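
A minimal sketch of that distinction, assuming a custom renderer is simply a function from envelope to envelope (the name `render_external` and the renderer hook shape are hypothetical): the pointer is dropped from the outgoing response, while the internal graph is untouched.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PROV_ACTIVITY = "http://www.w3.org/ns/prov#Activity"


def render_external(envelope):
    """Return a copy of the envelope without the provenance pointer."""
    return {key: value for key, value in envelope.items() if key != "provenance"}


activity = "urn:example:activity/1"

# Stand-in for the prov: named graph; generation has already happened.
internal_graph = {(activity, RDF_TYPE, PROV_ACTIVITY)}

envelope = {"result": "ok", "provenance": {"@id": activity}}
external = render_external(envelope)
```

The renderer only affects what leaves the app; nothing in it can reach back and remove the activity from the graph.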

Consequences

Positive

  • Regulatory posture. EU AI Act Art. 12 logging is effectively free for Trails apps.
  • Agent trust. Receiving agents can verify the chain and decide whether to trust.
  • Debuggability. PROV-O graph is a distributed audit trail queryable via SPARQL.
  • Data corrections. When a source is discovered bad, derived data is traceable via prov:wasDerivedFrom.
  • Framework differentiator. No comparable framework has provenance as a core primitive.
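
To make the data-corrections point concrete, here is a small sketch that finds everything transitively derived from a bad source by walking prov:wasDerivedFrom edges. It works on in-memory tuples rather than a real store, and the function name `tainted_by` and the `urn:example:` identifiers are assumptions for illustration.

```python
from collections import deque

PROV_DERIVED = "http://www.w3.org/ns/prov#wasDerivedFrom"


def tainted_by(triples, bad_source):
    """Return every entity that transitively derives from bad_source."""
    # Index edges in reverse: source -> entities derived from it.
    derived_from = {}
    for subject, predicate, obj in triples:
        if predicate == PROV_DERIVED:
            derived_from.setdefault(obj, set()).add(subject)

    seen = set()
    queue = deque([bad_source])
    while queue:
        for child in derived_from.get(queue.popleft(), ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


graph = [
    ("urn:example:summary", PROV_DERIVED, "urn:example:source"),
    ("urn:example:report", PROV_DERIVED, "urn:example:summary"),
    ("urn:example:other", PROV_DERIVED, "urn:example:other-source"),
]

affected = tainted_by(graph, "urn:example:source")
```

Against a SPARQL endpoint the same traversal would typically be a property path such as `?entity prov:wasDerivedFrom+ <source>`.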

Negative

  • Storage cost. Each activity is ~5–20 triples. At high throughput (10k invocations/hour) this is ~50–200k triples/hour. Mitigated by:
      • Dedicated named graph enables cheap truncation / archival policies.
      • Oxigraph scales well into billions of triples.
      • Apps needing extreme throughput can point prov: graph at a separate store.
  • Write latency impact. Extra triples per capability = extra write. Benchmarked at ~1–2 ms overhead; acceptable per NFR-Perf1.
  • Privacy tension. Provenance may itself contain sensitive data (which principal took which action). Mitigated by:
      • Provenance graph is a distinct, auth-gated resource.
      • Principals are DIDs, not human identities, by default.
      • Capability-aware redaction policies supported v1.5+.
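
The storage estimate above can be checked with simple arithmetic. The sketch below reproduces the hourly range and extends it to a yearly figure, assuming the 10k invocations/hour rate is sustained around the clock and nothing is truncated or archived; that sustained-load assumption is ours, not a measured workload.

```python
def triples_per_hour(invocations_per_hour, triples_per_activity):
    """Triples written per hour for a given invocation rate."""
    return invocations_per_hour * triples_per_activity


HOURS_PER_YEAR = 24 * 365

low_hourly = triples_per_hour(10_000, 5)    # 50_000
high_hourly = triples_per_hour(10_000, 20)  # 200_000

low_yearly = low_hourly * HOURS_PER_YEAR    # 438_000_000
high_yearly = high_hourly * HOURS_PER_YEAR  # 1_752_000_000
```

At the upper bound this is roughly 1.75 billion triples per year, which is why the truncation and separate-store mitigations matter for long-running, high-throughput apps.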

Non-consequences

  • Apps can still query provenance via SPARQL like any other graph.
  • Existing PROV-O tooling (Apache Jena, pyprov) works on the emitted graph.

Revisit conditions

  • If at-scale storage becomes a real problem and opt-out cohorts emerge, consider a "provenance sampling" mode (record 10% of activities in full, 100% summarized). Not before v2.

Update (2026-04-12)

Per ADR-0013, the on-wire / exported form of a capability invocation is now an ECT (Execution Context Token, draft-nennemann-wimse-ect) with assurance level L1/L2/L3 chosen via @capability(assurance=...). PROV-O remains the internal graph representation in the prov: named graph for SPARQL queryability. The kernel maps ECT ↔ PROV-O bidirectionally. Provenance is still always-on; this ADR's core decision is unchanged — only the export format is now a standards-track token instead of a Trails-specific serialization.

Update (2026-04-12) — Security hardening

Three normative additions from the security review:

  • (a) GDPR Art. 17 erasure. Provenance references may contain principal DIDs that are subject to erasure requests. The framework MUST support a redact_principal(did) operation that rewrites every prov:wasAssociatedWith (and any other DID reference) in the prov: graph to did:redacted:<hash>, preserving chain structure and hash-chain integrity while removing the linkable identifier. Redaction is recorded as a prov:Activity of its own.
  • (b) L1 ECT export across trust boundaries forbidden. L1 ECTs are unsigned (integrity inherited from the originating TLS channel). Exporting an L1 ECT outside the originating trust domain — to another MCP peer, a queue, a webhook, a file drop — is forbidden at the framework layer. The kernel enforces an allow_export_across_domain: bool flag per named graph; attempting to export L1 records from a graph where the flag is false raises ExportDenied. Any export across a trust boundary MUST be L2 or higher.
  • (c) prov: graph is kernel-write-exclusive. Only the trails-prov writer may write to the prov: named graph. The kernel's GraphStore write path denies all principal-initiated writes whose target is prov: (structural defence, independent of Cedar policy). Handlers that attempt INSERT DATA { GRAPH <prov:> { ... } } receive ProvenanceWriteDenied.
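
The redaction and write-exclusivity rules above can be sketched on in-memory tuples as follows. Several details are assumptions for illustration: the signatures of `redact_principal` and `check_write`, the use of a keyed HMAC-SHA256 for `<hash>` (the ADR does not specify the construction), and the string stand-ins for the graph and writer names. Hash-chain integrity and the L1 export rule are not modelled here.

```python
import hashlib
import hmac

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PROV_GRAPH = "prov:"           # stand-in for the configured provenance graph
KERNEL_WRITER = "trails-prov"  # the only writer allowed to target it


class ProvenanceWriteDenied(Exception):
    """Raised when a non-kernel writer targets the provenance graph."""


def redact_principal(triples, did, key, redaction_activity):
    """Return triples with `did` replaced by did:redacted:<hash> everywhere.

    Assumption: <hash> is a keyed HMAC-SHA256, so it cannot be recomputed
    from the DID alone.
    """
    digest = hmac.new(key, did.encode("utf-8"), hashlib.sha256).hexdigest()
    redacted = "did:redacted:" + digest

    def swap(term):
        return redacted if term == did else term

    rewritten = [(swap(s), p, swap(o)) for s, p, o in triples]
    # The redaction is itself recorded as a prov:Activity.
    rewritten.append((redaction_activity, RDF_TYPE, PROV + "Activity"))
    return rewritten


def check_write(target_graph, writer):
    """Structural check: only the kernel writer may write to the prov: graph."""
    if target_graph == PROV_GRAPH and writer != KERNEL_WRITER:
        raise ProvenanceWriteDenied(
            f"{writer!r} may not write to {target_graph!r}"
        )


graph = [
    ("urn:example:activity/1", PROV + "wasAssociatedWith", "did:example:alice"),
    ("urn:example:activity/2", PROV + "wasAssociatedWith", "did:example:bob"),
]
redacted_graph = redact_principal(
    graph, "did:example:alice", b"demo-key", "urn:example:activity/redact-1"
)
```

The chain structure is preserved because only the DID term changes: each activity keeps its subject and predicate, and the same DID always maps to the same redacted identifier under a given key.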