ADR-0038: Explainable Provenance – Citation Graphs and Confidence Propagation

Status: Accepted (2026-04-19)
Date: 2026-04-18

Context

Trails records comprehensive PROV-O provenance for every capability invocation (ADR-0009): activities, agents, derivation chains, and timestamps all land in the trails:prov named graph. But the raw provenance triples are machine-readable, not human-readable. When a user asks "why did the system produce this answer?", the framework has no way to surface a coherent explanation without manual SPARQL spelunking.

Concrete gaps:

No citation trail. An agent answer references KG nodes internally, but there is no structured link from the answer back to the specific nodes that informed it. Users cannot audit which data drove a conclusion.
No confidence propagation. If a source node carries a confidence score of 0.8, conclusions derived from it should inherit bounded confidence — but today provenance records derivation without quantifying trust attenuation.
No counterfactual reasoning. "What would change if we removed this source?" is unanswerable without traversing the full derivation graph by hand.
No human-readable explanations. PROV-O triples are precise but opaque. Developers and end-users need natural-language or structured summaries: "This answer was produced by capability X, which read nodes Y and Z."

These gaps are blockers for regulated industries (GDPR right to explanation, EU AI Act transparency requirements) and for any deployment where agent outputs require audit-grade justification.

Decision

1. `trails.explain` module

A new top-level module providing explainability primitives that operate on the existing PROV-O graph. All operations are pure SPARQL queries over the kernel store — no AI/LLM dependency.

Core data structures:

from dataclasses import dataclass

@dataclass
class Citation:
    """A KG node field that informed an answer."""
    node_iri: str
    node_type: str
    field: str
    value: str
    relevance: float  # 0.0–1.0
    accessed_by: str  # capability or SPARQL query that read it

@dataclass
class ReasoningStep:
    """One step in the chain that produced an answer."""
    step_number: int
    action: str
    observation: str
    cited_nodes: list[str]
    confidence_contribution: float

@dataclass
class Explanation:
    """Structured explanation returned by explain_result."""
    answer: str
    citations: list[Citation]
    confidence: float  # clamped to [0.0, 1.0]
    reasoning_chain: list[ReasoningStep]
    provenance_iris: list[str]

Core functions:

explain_result(store, activity_iri) — trace back through the PROV-O graph from an activity and produce an Explanation. Walks prov:used, prov:wasInformedBy, prov:wasDerivedFrom to find all source nodes.
citation_graph(store, activity_iri) — return a mapping of {node_iri: [activities that used it]}. This is the "which data informed this conclusion?" graph.
propagate_confidence(store, node_iri, base_confidence=1.0) — walk the derivation chain from a node, computing confidence as the product of source confidences. Returns {node_iri: confidence} for the whole chain. Confidence is always clamped to [0.0, 1.0].
counterfactual(store, remove_iri, activity_iri) — estimate what would change if a source node were removed. Returns a list of affected activities and their dependency paths.
format_explanation(explanation, format="text"|"markdown"|"json") — render an Explanation in different output formats.
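
As a rough illustration of the rendering step, the sketch below shows one way `format_explanation` might be written. The dataclasses are repeated only so the block runs on its own. The output layout, the default of "text", the ValueError for unknown formats, and the example IRIs are all assumptions made for illustration, not part of this decision.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Citation:
    node_iri: str
    node_type: str
    field: str
    value: str
    relevance: float
    accessed_by: str


@dataclass
class ReasoningStep:
    step_number: int
    action: str
    observation: str
    cited_nodes: list[str]
    confidence_contribution: float


@dataclass
class Explanation:
    answer: str
    citations: list[Citation]
    confidence: float
    reasoning_chain: list[ReasoningStep]
    provenance_iris: list[str]


def format_explanation(explanation: Explanation, format: str = "text") -> str:
    """Render an Explanation as text, markdown, or JSON (layout is illustrative)."""
    if format == "json":
        # asdict() recurses into nested dataclasses and lists.
        return json.dumps(asdict(explanation), indent=2)
    if format not in ("text", "markdown"):
        raise ValueError(f"unsupported format: {format!r}")
    md = format == "markdown"
    bullet = "- " if md else "  * "
    lines = [
        ("**Answer:** " if md else "Answer: ") + explanation.answer,
        f"Confidence: {explanation.confidence:.2f}",
        "Citations:",
    ]
    for c in explanation.citations:
        lines.append(f"{bullet}{c.node_iri} ({c.field} = {c.value}), read by {c.accessed_by}")
    lines.append("Reasoning:")
    for step in explanation.reasoning_chain:
        lines.append(f"{bullet}{step.step_number}. {step.action}: {step.observation}")
    return "\n".join(lines)


# Hypothetical example data; none of these IRIs come from a real deployment.
example = Explanation(
    answer="Order 17 is delayed",
    citations=[Citation("urn:ex:order17", "Order", "status", "delayed", 0.9, "urn:ex:cap/lookup")],
    confidence=0.8,
    reasoning_chain=[ReasoningStep(1, "lookup", "status is delayed", ["urn:ex:order17"], 0.8)],
    provenance_iris=["urn:ex:activity/1"],
)
print(format_explanation(example, "text"))
```

Because the renderer only reads the dataclass fields, it needs no store access and no LLM, which matches the pure-formatting role described above.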

2. Provenance graph traversal strategy

Explanations are derived entirely from existing PROV-O triples:

prov:used — direct data dependencies of an activity
prov:wasInformedBy — inter-activity information flow (agent step chains, planner sub-activities)
prov:wasDerivedFrom — entity-level derivation (output X was derived from input Y)
prov:wasAssociatedWith — which principal/agent performed the activity
prov:wasGeneratedBy — which activity produced an entity

The traversal is bounded: a configurable max_depth (default 10) prevents runaway walks on deeply nested provenance chains. Cycles are detected and broken.
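
A minimal sketch of the bounded walk described above, assuming the relevant PROV-O edges have already been loaded into a plain Python dict. The `Edges` alias, the `walk_sources` name, and the example IRIs are hypothetical; the actual module would issue SPARQL queries against the kernel store rather than walk an in-memory structure.

```python
from collections import deque

# Hypothetical in-memory stand-in for the prov graph: subject -> [(predicate, object)].
Edges = dict[str, list[tuple[str, str]]]

# The three predicates the ADR says explain_result walks.
TRAVERSED = ("prov:used", "prov:wasInformedBy", "prov:wasDerivedFrom")


def walk_sources(edges: Edges, start_iri: str, max_depth: int = 10) -> dict[str, int]:
    """Breadth-first walk from start_iri; returns {reached_iri: depth}."""
    depths = {start_iri: 0}
    queue = deque([start_iri])
    while queue:
        current = queue.popleft()
        if depths[current] >= max_depth:
            continue  # depth bound reached: do not expand this node
        for predicate, target in edges.get(current, []):
            if predicate not in TRAVERSED:
                continue
            if target in depths:
                continue  # already visited, which also breaks cycles
            depths[target] = depths[current] + 1
            queue.append(target)
    del depths[start_iri]
    return depths


graph = {
    "act:answer": [
        ("prov:used", "node:a"),
        ("prov:wasInformedBy", "act:step1"),
        ("prov:wasAssociatedWith", "agent:x"),  # not a data dependency, so not followed
    ],
    "act:step1": [("prov:used", "node:b")],
    "node:b": [("prov:wasDerivedFrom", "node:c")],
    "node:c": [("prov:wasDerivedFrom", "node:b")],  # deliberate cycle
}
print(walk_sources(graph, "act:answer"))
# {'node:a': 1, 'act:step1': 1, 'node:b': 2, 'node:c': 3}
print(walk_sources(graph, "act:answer", max_depth=2))
# {'node:a': 1, 'act:step1': 1, 'node:b': 2}
```

A visited set is enough for cycle handling here because the walk only needs to know which nodes are reachable, not every path to them.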

3. Confidence propagation model

Confidence propagates multiplicatively along derivation chains:

confidence(derived) = confidence(source1) * confidence(source2) * ...

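When a node has no explicit confidence annotation, 1.0 is assumed (full trust). The product is clamped to [0.0, 1.0]. Because every factor lies in [0.0, 1.0], a derived conclusion is never more confident than its least-confident source; in that sense the model is conservative. For example, sources at 0.8 and 0.5 yield a derived confidence of 0.4.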

Confidence annotations are stored as triples:

<node_iri> <https://trails.dev/confidence> "0.8"^^xsd:decimal .
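
The multiplicative model can be sketched in a few lines, assuming the derivation edges and confidence annotations have already been read out of the store into dicts. The function names and example IRIs are hypothetical. Two interpretation choices here are assumptions rather than part of the decision: a node's own annotation is included in its product, and each distinct transitive source is counted once even when it is reachable along several paths. The max_depth bound is omitted for brevity.

```python
from collections import deque


def clamp(value: float) -> float:
    """Clamp a confidence value to [0.0, 1.0]."""
    return max(0.0, min(1.0, value))


def reachable(derived_from: dict[str, list[str]], start: str) -> list[str]:
    """Return start plus every transitive source, each once, in BFS order."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        iri = queue.popleft()
        order.append(iri)
        for source in derived_from.get(iri, []):
            if source not in seen:  # the seen set also breaks cycles
                seen.add(source)
                queue.append(source)
    return order


def propagate_confidence_sketch(
    derived_from: dict[str, list[str]],
    annotations: dict[str, float],
    node_iri: str,
    base_confidence: float = 1.0,
) -> dict[str, float]:
    """Return {iri: confidence} for node_iri and every source it derives from."""
    result = {}
    for iri in reachable(derived_from, node_iri):
        product = base_confidence
        for contributor in reachable(derived_from, iri):
            # A missing annotation counts as 1.0 (full trust).
            product *= annotations.get(contributor, 1.0)
        result[iri] = clamp(product)
    return result


# Hypothetical data: node:answer derives from node:a and node:b; node:b from node:c.
derived_from = {"node:answer": ["node:a", "node:b"], "node:b": ["node:c"]}
annotations = {"node:a": 0.8, "node:b": 0.9}  # node:c and node:answer are unannotated
scores = propagate_confidence_sketch(derived_from, annotations, "node:answer")
print({iri: round(value, 2) for iri, value in scores.items()})
# {'node:answer': 0.72, 'node:a': 0.8, 'node:b': 0.9, 'node:c': 1.0}
```

Counting each source once avoids penalising a conclusion twice for the same upstream node in diamond-shaped derivations; a per-path product would be an equally defensible reading of the formula and gives lower values in that case.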

4. CLI integration

trails explain <activity_iri>          # human-readable explanation
trails explain --citations <iri>       # citation graph
trails explain --confidence <node_iri> # confidence propagation
trails explain --counterfactual <remove_iri> <activity_iri>
trails explain --format json <iri>     # output as JSON

The explain subcommand is registered in the lazy CLI loader alongside existing commands like prov and cred.
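
This ADR does not say which CLI library Trails uses or how the lazy loader registers commands, so the following uses the standard-library `argparse` purely as a stand-in to show how the invocations above could be parsed. Treating `--citations`, `--confidence`, and `--counterfactual` as mutually exclusive options that take IRIs, and the `build_explain_parser` and `select_mode` names, are assumptions.

```python
import argparse


def build_explain_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse mirror of the `trails explain` invocations."""
    parser = argparse.ArgumentParser(prog="trails explain")
    parser.add_argument("activity_iri", nargs="?", help="activity to explain")
    parser.add_argument("--format", choices=("text", "markdown", "json"), default="text")
    # Assumption: the three alternative modes cannot be combined.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--citations", metavar="IRI")
    mode.add_argument("--confidence", metavar="NODE_IRI")
    mode.add_argument("--counterfactual", nargs=2, metavar=("REMOVE_IRI", "ACTIVITY_IRI"))
    return parser


def select_mode(args: argparse.Namespace) -> str:
    """Name the operation a parsed command line asks for."""
    if args.citations:
        return "citations"
    if args.confidence:
        return "confidence"
    if args.counterfactual:
        return "counterfactual"
    if args.activity_iri:
        return "explain"
    raise SystemExit("trails explain: an IRI is required")


parser = build_explain_parser()
args = parser.parse_args(["--counterfactual", "urn:ex:node/a", "urn:ex:activity/1"])
print(select_mode(args), args.counterfactual)
# counterfactual ['urn:ex:node/a', 'urn:ex:activity/1']
```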

5. Integration with the agent runtime

Planners (ReAct, Plan-and-Execute, Reflexion) already record prov:wasInformedBy links between step activities. The explain module leverages these existing links — no changes to planner code are needed.

Future enhancement (not in scope for this ADR): planners could automatically collect citations during tool calls by annotating prov:used edges with the specific KG node IRIs read during each step.

Non-goals

Not an AI explainability tool. This module explains provenance chains, not neural network internals. It does not generate natural-language explanations via LLM — it formats structured provenance data.
No causal inference. Counterfactual queries identify dependency paths, not causal mechanisms. "What would change" means "which activities would lose a dependency", not "what would the output have been."
No real-time explanation streaming. Explanations are computed post-hoc from the provenance graph, not during capability execution.
No UI component. The explain module provides data structures and CLI output. Dashboard integration (ADR-0019) is a separate concern.
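
To make the dependency-path reading of counterfactual queries concrete, here is a small sketch that enumerates the paths from an activity down to the node being removed. It assumes the dependency edges are available as a plain dict; `dependency_paths`, `counterfactual_sketch`, and the example IRIs are hypothetical. For brevity it reports every IRI upstream of the removed node and does not distinguish activities from entities, which the real function would need to do.

```python
def dependency_paths(
    depends_on: dict[str, list[str]],
    activity_iri: str,
    remove_iri: str,
    max_depth: int = 10,
) -> list[list[str]]:
    """All simple paths activity_iri -> ... -> remove_iri, at most max_depth edges long."""
    paths: list[list[str]] = []

    def visit(iri: str, path: list[str]) -> None:
        if iri == remove_iri:
            paths.append(path)
            return
        if len(path) - 1 >= max_depth:
            return  # depth bound reached
        for source in depends_on.get(iri, []):
            if source not in path:  # never revisit a node on the current path
                visit(source, path + [source])

    visit(activity_iri, [activity_iri])
    return paths


def counterfactual_sketch(
    depends_on: dict[str, list[str]],
    remove_iri: str,
    activity_iri: str,
    max_depth: int = 10,
) -> dict[str, list]:
    """Report what would lose a dependency if remove_iri were removed."""
    paths = dependency_paths(depends_on, activity_iri, remove_iri, max_depth)
    affected: list[str] = []
    for path in paths:
        for iri in path[:-1]:  # everything upstream of the removed node
            if iri not in affected:
                affected.append(iri)
    return {"affected": affected, "paths": paths}


# Hypothetical data: the answer depends on node:a via two routes.
depends_on = {
    "act:answer": ["act:step1", "node:b"],
    "act:step1": ["node:a"],
    "node:b": ["node:a"],
}
result = counterfactual_sketch(depends_on, "node:a", "act:answer")
print(result["affected"])
# ['act:answer', 'act:step1', 'node:b']
print(result["paths"])
# [['act:answer', 'act:step1', 'node:a'], ['act:answer', 'node:b', 'node:a']]
```

The result says only which dependencies would disappear. It makes no claim about what the answer would have been, in line with the non-goal stated above.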

Dependencies

ADR	Relationship
ADR-0009 (PROV-O)	Source of all provenance triples; explain reads from the prov graph
ADR-0017 (Context)	`ctx.kg` provides store access for provenance queries
ADR-0021 (Progressive Enhancement)	Explain works with any provenance level — from bare capabilities to full agent plans
ADR-0030 (Verifiable Credentials)	Future: AuditTrailCredential could package an Explanation as a VC

Consequences

Positive

Audit-grade transparency. Every agent output can be traced to its source data with a single function call — critical for GDPR, EU AI Act, and regulated industries.
Zero LLM dependency. Pure SPARQL traversal over existing PROV-O triples. No inference cost, no model latency, no model risk.
Composable with existing primitives. Works with any capability that records provenance — no changes to existing handlers required.
Bounded confidence model. Simple multiplicative propagation is easy to reason about and audit. More sophisticated models can be layered on top.
Counterfactual analysis. "What if we removed this source?" becomes answerable directly from the provenance graph, which enables data quality impact assessment.

Negative

SPARQL query cost. Deep provenance chains require recursive graph traversal. Mitigated by: max_depth bound, result caching for repeated queries on the same activity.
Confidence model simplicity. Multiplicative propagation is lossy — it cannot express "this source contradicts that source" or "confidence increases with corroboration." Mitigated by: the model is a floor, not a ceiling; richer models can override propagate_confidence.
Explanation completeness depends on provenance quality. If a capability does not record prov:used for its data reads, the explanation will be incomplete. Mitigated by: clear docs on provenance best practices; future automatic citation collection in planners.

Revisit conditions

If a standard explanation ontology emerges (beyond PROV-O), consider alignment or adoption.
If LLM-generated natural-language explanations are needed, add an optional explain_natural(ctx, activity_iri) that calls ctx.llm with the structured explanation as context.
If the multiplicative confidence model proves insufficient, design a pluggable confidence propagation strategy.