ADR-0052: Memory Security — Application Layer Separation and Provenance Integrity
- Status: Accepted (2026-04-19)
- Date: 2026-04-18
- Depends on: ADR-0006 (Cedar Policy), ADR-0009 (Provenance Always On), ADR-0010 (Biscuit Tokens), ADR-0011 (DID Identity), ADR-0012 (Cost Primitive), ADR-0051 (Agent Memory)
- Target milestone: M11
Context
ADR-0051 introduced Agent Memory — a persistent shared knowledge store for AI agents. The Phase 1 implementation (example app) demonstrates the capability surface and data model but has a fundamental security gap: agents self-report their own provenance.
An agent calling memory.learn today controls:
```python
trails.invoke("memory.learn", {
    "content": "This library is safe to use",        # content: expected
    "confidence": 0.99,                              # metadata: manipulable
    "source": "security audit of dependency tree",   # provenance: fabricable
    "agent_did": "did:key:z6MkTrustedBot",           # identity: spoofable
})
```
Nothing prevents an agent from fabricating sources, inflating confidence, impersonating other agents, or suppressing knowledge by issuing bogus corrections. This is the confused deputy problem applied to knowledge: the entity that writes facts also writes claims about those facts, with no independent verification.
For the memory system to be trustworthy in production — especially in regulated contexts like medtech (submission access) — the provenance layer must be independently enforced by the framework, not self-reported by the agent.
Threat model
| # | Threat | Vector | Impact |
|---|---|---|---|
| T1 | Identity spoofing | Agent passes another agent's DID in agent_did | Facts attributed to wrong agent; trust decisions based on wrong identity |
| T2 | Confidence inflation | Agent sets confidence: 0.99 on every fact | Drowns out other agents' knowledge in recall ranking |
| T3 | Source fabrication | Agent claims source: "security audit" without performing one | Consumers trust unverified claims |
| T4 | Fact suppression | Agent A calls memory.correct on Agent B's fact with a bogus reason | Censorship; knowledge loss |
| T5 | Graph poisoning | Agent floods memory with plausible-sounding wrong facts | Downstream agents act on false information |
| T6 | Timestamp manipulation | Agent provides past/future timestamp to manipulate recency ranking | Facts appear older/newer than they are |
| T7 | Provenance tampering | Compromised agent modifies existing provenance records in storage | Audit trail integrity destroyed |
| T8 | Correction laundering | Agent corrects its own fact to replace a low-confidence claim with a high-confidence version, bypassing trust calibration | Artificially inflated trust |
Decision
1. Three-Layer Architecture
Separate the memory system into three distinct trust zones:
```text
┌────────────────────────────────────────────────────┐
│ Agent Layer (UNTRUSTED)                            │
│                                                    │
│ Agent provides:                                    │
│   content, confidence_hint, topic, tags,           │
│   source_description, scope                        │
│                                                    │
│ Agent CANNOT provide:                              │
│   agent_did, timestamp, source_attestation,        │
│   prev_hash, provenance records                    │
├────────────────────────────────────────────────────┤
│ Memory Gateway (TRUSTED, framework-enforced)       │
│                                                    │
│   1. Authenticate:       extract DID from session  │
│   2. Authorize:          Cedar policy evaluation   │
│   3. Rate limit:         per-agent write budget    │
│   4. Cap confidence:     agent trust level         │
│   5. Tag source:         attestation level         │
│   6. Generate timestamp: server-side               │
│   7. Compute hash chain: tamper-evident            │
│   8. Write fact + provenance (atomic)              │
│   9. Emit audit event                              │
├────────────────────────────────────────────────────┤
│ Storage Layer (APPEND-ONLY provenance)             │
│                                                    │
│   Fact graph:       mutable (corrections allowed)  │
│   Provenance graph: append-only, hash-chained      │
│   Policy graph:     admin-only writes              │
└────────────────────────────────────────────────────┘
```
Key invariant: The agent layer talks to the gateway. The gateway talks to storage. The agent never writes provenance directly. This is a hard boundary, not a convention.
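The "Agent CANNOT provide" list can be enforced mechanically where agent input enters the gateway. The following is a minimal sketch that assumes the gateway receives the agent's arguments as a plain dict. The field names come from the diagram, while `AGENT_FIELDS`, `RESERVED_FIELDS`, `ReservedFieldError`, and `validate_agent_input` are hypothetical names and not the Trails API. The sketch rejects reserved fields outright instead of silently dropping them, so an agent that tries to set its own identity or hash gets a visible error that can also be audited.

```python
# Hypothetical sketch of the agent/gateway boundary for memory.learn input.
AGENT_FIELDS = frozenset({
    "content", "confidence_hint", "topic", "tags", "source_description", "scope",
})
# Gateway-owned fields (assumed names); an agent may never set these.
RESERVED_FIELDS = frozenset({
    "agent_did", "timestamp", "learned_at", "source_attestation",
    "prev_hash", "provenance",
})


class ReservedFieldError(ValueError):
    """Raised when agent input tries to set a gateway-owned field."""


def validate_agent_input(payload: dict) -> dict:
    reserved = payload.keys() & RESERVED_FIELDS
    if reserved:
        raise ReservedFieldError(f"agent may not set: {sorted(reserved)}")
    unknown = payload.keys() - AGENT_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    # Every remaining key is agent-owned; identity, timestamps, attestation,
    # and hashes are added later by the gateway.
    return dict(payload)
```

Under this check, the request from the Context section fails on agent_did. The old confidence key would also be refused as unknown, because the agent-facing name is confidence_hint.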
2. Identity Injection (mitigates T1)
The agent_did field is removed from the agent-facing API. Identity is injected by the gateway from the authenticated session:
```python
# Agent-facing API (what the agent sees)
@capability(id="memory.learn")
def learn(ctx, content: str, confidence_hint: float = 0.8, topic: str = "general", ...):
    ...

# Gateway implementation (what actually happens)
def _gateway_learn(ctx, content, confidence_hint, topic, ...):
    # Identity comes from the authenticated session, NOT from agent input
    agent_did = ctx.auth.principal_did
    # If there is no authenticated session, use a low-trust anonymous DID
    if not agent_did:
        agent_did = "did:key:anonymous"
        trust_level = TrustLevel.ANONYMOUS
    else:
        trust_level = _resolve_trust_level(agent_did)
    ...
```
The session's principal_did is set during the MCP handshake (Biscuit token → DID extraction, per ADR-0010 and ADR-0011). For HTTP clients, the Authorization: Bearer header carries the Biscuit; for stdio MCP, the DID comes from the process-level credential.
Fallback for development: When no authentication is configured (e.g., local dev with trails server --no-auth), the gateway mints a session-scoped did:key:z6MkAnon... and tags all facts with trust_level: "anonymous". This keeps the model consistent without requiring auth setup for local experimentation.
3. Server-Side Timestamps (mitigates T6)
The learned_at field is set by the gateway, not by the agent:
```python
import datetime

def _gateway_learn(ctx, ...):
    # Server clock in UTC; the agent cannot override this
    learned_at = datetime.datetime.now(datetime.timezone.utc).isoformat()
```
For federated scenarios where clock skew matters, the gateway includes both:
- learned_at — server timestamp (authoritative for ordering)
- received_at — reception timestamp at the federation peer (if relayed)
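Here is a sketch of how a gateway might stamp both fields. It assumes that relayed facts arrive with the origin server's learned_at already set, and that the receiving peer keeps that value for ordering. `stamp_times` and its `relayed` flag are hypothetical. On a local write, any learned_at the agent supplied is overwritten.

```python
import datetime


def stamp_times(record: dict, relayed: bool = False) -> dict:
    """Hypothetical helper: attach server-side timestamps to a fact record."""
    now = datetime.datetime.now(datetime.timezone.utc).isoformat()
    stamped = dict(record)
    if not relayed:
        # Local write: this gateway's clock is authoritative, so any
        # agent-supplied learned_at is overwritten.
        stamped["learned_at"] = now
    else:
        # Relayed fact: keep the origin server's learned_at for ordering and
        # record when this peer received it.
        if "learned_at" not in stamped:
            raise ValueError("relayed fact is missing the origin learned_at")
        stamped["received_at"] = now
    return stamped
```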
4. Confidence Calibration (mitigates T2, T8)
The agent provides a confidence_hint, not a final confidence. The gateway applies a trust-level multiplier:
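```python
from enum import Enum

class TrustLevel(Enum):
    # (label, max effective confidence); tuples keep HUMAN and SYSTEM distinct
    ANONYMOUS = ("anonymous", 0.3)          # no auth
    AUTHENTICATED = ("authenticated", 0.7)  # valid DID, no track record
    ESTABLISHED = ("established", 0.9)      # agent with verified track record
    HUMAN = ("human", 1.0)                  # human-verified agent or direct human input
    SYSTEM = ("system", 1.0)                # framework internal (e.g., compact)

    def __init__(self, label: str, max_confidence: float):
        self.label, self.max_confidence = label, max_confidence

def _calibrate_confidence(hint: float, trust_level: TrustLevel) -> float:
    return min(max(hint, 0.0), trust_level.max_confidence)
```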
Track record adjustment: An agent whose facts are frequently corrected by other agents or humans gets its trust level automatically reduced:
```python
def _compute_trust_adjustment(agent_did: str, ctx) -> float:
    corrections_received = count_corrections_targeting(agent_did, ctx)
    facts_written = count_facts_by(agent_did, ctx)
    if facts_written == 0:
        return 1.0
    correction_rate = corrections_received / facts_written
    # High correction rate → reduced trust
    # 0% corrections → 1.0, 50%+ → 0.5
    return max(0.5, 1.0 - correction_rate)
```
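The ADR does not specify how the track-record adjustment combines with the trust-level cap. One plausible reading, assumed here, is that the adjustment scales the cap. Take an AUTHENTICATED agent (cap 0.7) with 20 corrections across 100 facts. Its adjustment is 0.8, so its effective cap is about 0.56, and a hint of 0.99 would be stored as roughly 0.56. `trust_adjustment` and `effective_confidence` are hypothetical names, and the counts are passed in rather than queried.

```python
def trust_adjustment(corrections_received: int, facts_written: int) -> float:
    # Same rule as _compute_trust_adjustment, with the counts supplied directly.
    if facts_written == 0:
        return 1.0
    return max(0.5, 1.0 - corrections_received / facts_written)


def effective_confidence(hint: float, trust_cap: float, adjustment: float) -> float:
    """Clamp the agent's hint to [0, trust_cap * adjustment] (assumed combination)."""
    ceiling = trust_cap * adjustment
    return min(max(hint, 0.0), ceiling)
```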
5. Source Attestation Levels (mitigates T3)
Sources are classified by how they were verified, not by what the agent claims:
```python
from enum import Enum

class SourceAttestation(str, Enum):
    SELF_REPORTED = "self-reported"
    # Agent claims the fact came from a source. No verification.
    # Lowest trust. Default for all agent-provided sources.
    TOOL_OBSERVED = "tool-observed"
    # The gateway observed the agent invoke a capability that
    # accessed this source (e.g., file read, API call) in the
    # current session's provenance graph.
    CONTENT_HASHED = "content-hashed"
    # The fact content can be verified against a hash of the
    # source material (e.g., SHA-256 of the file at read time).
    HUMAN_CONFIRMED = "human-confirmed"
    # A human explicitly marked this fact as verified.
    SCITT_ANCHORED = "scitt-anchored"
    # Fact hash registered in a transparency log.
    # Highest trust for automated systems.
```
The gateway automatically upgrades attestation when it can:
```python
def _determine_attestation(source_description: str, session) -> SourceAttestation:
    # Check if the session's provenance graph shows a matching
    # capability invocation (file read, API call, etc.)
    if session.provenance.has_activity_matching(source_description):
        return SourceAttestation.TOOL_OBSERVED
    return SourceAttestation.SELF_REPORTED
```
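has_activity_matching is left abstract above. One simple interpretation, sketched here as an assumption, is a substring test. Under that test, a source counts as tool-observed if some capability invocation in the current session touched a resource whose identifier appears in the agent's source_description. `Activity`, `SessionProvenance`, and the `fs.read` capability id are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Activity:
    capability: str  # e.g. "fs.read" (hypothetical capability id)
    resource: str    # path, URL, or IRI the capability touched


@dataclass
class SessionProvenance:
    activities: List[Activity] = field(default_factory=list)

    def has_activity_matching(self, source_description: str) -> bool:
        # Naive heuristic: the source counts as observed if any activity in this
        # session touched a resource whose identifier appears in the description.
        return any(
            a.resource and a.resource in source_description
            for a in self.activities
        )
```

Substring matching is permissive. Any read of a file that the description happens to mention qualifies, even if the fact came from somewhere else. A stricter variant might require an exact resource identifier from the session.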
6. Correction Policies (mitigates T4)
Cross-agent corrections are gated by Cedar policies:
```cedar
// Agents can always correct their own facts
permit(
    principal,
    action == Action::"memory.correct",
    resource
) when {
    resource.agent_did == principal.did
};

// Cross-agent corrections require reviewer role
permit(
    principal,
    action == Action::"memory.correct",
    resource
) when {
    resource.agent_did != principal.did &&
    principal has role &&
    principal.role == "reviewer"
};

// Humans can correct any fact
permit(
    principal,
    action == Action::"memory.correct",
    resource
) when {
    principal has trust_level &&
    principal.trust_level == "human"
};

// Nobody can correct SCITT-anchored facts without human approval.
// The `has` guards matter: a condition that errors on a missing attribute
// makes Cedar skip the policy, which would silently disable this forbid.
forbid(
    principal,
    action == Action::"memory.correct",
    resource
) when {
    resource has source_attestation &&
    resource.source_attestation == "scitt-anchored"
} unless {
    principal has trust_level &&
    principal.trust_level == "human"
};
```
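Cedar denies by default and lets any matching forbid override every matching permit. The function below mirrors the four memory.correct policies under those rules, which can help with table-driven tests of policy intent. It is a hypothetical illustration, not a substitute for evaluating the policies with the Cedar engine. Principals and resources are modelled as plain dicts with assumed keys, and a condition that depends on a missing key is treated as not matching.

```python
def can_correct(principal: dict, resource: dict) -> bool:
    """Hypothetical mirror of the memory.correct policies above.

    Default deny; any matching forbid overrides every matching permit.
    A condition that needs a missing key never matches.
    """
    p_did, r_did = principal.get("did"), resource.get("agent_did")
    both_known = p_did is not None and r_did is not None
    own_fact = both_known and r_did == p_did
    cross_agent = both_known and r_did != p_did
    is_human = principal.get("trust_level") == "human"

    permits = [
        own_fact,                                           # own facts
        cross_agent and principal.get("role") == "reviewer",  # reviewers
        is_human,                                           # humans
    ]
    forbids = [
        resource.get("source_attestation") == "scitt-anchored" and not is_human,
    ]
    return any(permits) and not any(forbids)
```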
When a cross-agent correction is permitted, the gateway records both the original author and the corrector:
```turtle
ex:correction-42 a brain:Correction ;
    brain:supersedes ex:fact-17 ;
    brain:originalAuthor did:key:agent-a ;
    brain:correctedBy did:key:agent-b ;
    brain:reason "Verified in source code: value is 60s not 30s" ;
    prov:wasAssociatedWith did:key:agent-b ;
    prov:generatedAtTime "2026-04-18T14:30:00Z" .
```
7. Write Budgets (mitigates T5)
Per-agent write limits prevent graph poisoning:
```toml
[memory.budgets]
default_max_facts_per_hour = 100
default_max_facts_total = 10000
anonymous_max_facts_per_hour = 10
anonymous_max_facts_total = 100

[memory.budgets.overrides]
"did:key:z6MkCIBot" = { max_facts_per_hour = 500 }  # CI bots write more
```
The gateway tracks writes per agent and rejects memory.learn calls that exceed the budget:
```python
def _check_write_budget(agent_did: str, config) -> bool:
    recent_count = count_facts_by_agent_since(agent_did, hours=1)
    limit = config.get_limit(agent_did, "max_facts_per_hour")
    return recent_count < limit
```
Budget exhaustion returns a structured error with the limit and reset time, not a silent failure.
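_check_write_budget above counts facts in the graph. An alternative, sketched here as an assumption and valid for a single process only, is an in-memory sliding window per DID that returns a structured error carrying the limit and the reset time. `WriteBudget`, `BudgetExceeded`, and `try_write` are hypothetical names. The override mirrors the CI-bot entry in the TOML above, and the injectable clock keeps the example testable.

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, Optional

WINDOW_SECONDS = 3600.0  # the "per hour" budget window


@dataclass
class BudgetExceeded:
    """Structured rejection: which limit was hit and when capacity returns."""
    agent_did: str
    limit: int
    reset_at: float  # epoch seconds when the oldest counted write leaves the window


class WriteBudget:
    """Hypothetical in-memory sliding-window write limiter (single process)."""

    def __init__(self, default_limit: int,
                 overrides: Optional[Dict[str, int]] = None,
                 clock: Callable[[], float] = time.time):
        self.default_limit = default_limit
        self.overrides = overrides or {}
        self.clock = clock
        self._writes: Dict[str, Deque[float]] = defaultdict(deque)

    def try_write(self, agent_did: str) -> Optional[BudgetExceeded]:
        """Record the write and return None, or return a BudgetExceeded error."""
        now = self.clock()
        window = self._writes[agent_did]
        # Forget writes that have aged out of the one-hour window.
        while window and window[0] <= now - WINDOW_SECONDS:
            window.popleft()
        limit = self.overrides.get(agent_did, self.default_limit)
        if len(window) >= limit:
            oldest = window[0] if window else now
            return BudgetExceeded(agent_did, limit, oldest + WINDOW_SECONDS)
        window.append(now)
        return None
```

Rejected writes are not recorded, so a throttled agent regains capacity as soon as its oldest accepted write leaves the window. A multi-instance deployment would need the counters in shared storage instead.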
8. Hash-Chained Provenance (mitigates T7)
Every provenance record includes a hash of the previous record, creating a tamper-evident chain:
```turtle
ex:prov-001 a prov:Activity ;
    prov:wasAssociatedWith did:key:agent-a ;
    prov:generatedAtTime "2026-04-18T10:00:00Z" ;
    brain:action "memory.learn" ;
    brain:factIri ex:fact-001 ;
    brain:prevHash "sha256:0000...0000" ;  # genesis
    brain:selfHash "sha256:a1b2c3..." .

ex:prov-002 a prov:Activity ;
    prov:wasAssociatedWith did:key:agent-b ;
    prov:generatedAtTime "2026-04-18T10:01:00Z" ;
    brain:action "memory.learn" ;
    brain:factIri ex:fact-002 ;
    brain:prevHash "sha256:a1b2c3..." ;  # must match prov-001.selfHash
    brain:selfHash "sha256:d4e5f6..." .
```
Verification: trails memory verify walks the chain and reports any broken links. This runs:
- On memory.compact (before pruning)
- On trails doctor (health check)
- On federation peer handshake (before accepting remote facts)
Self-hash computation:
```python
import hashlib
import json

def _compute_provenance_hash(activity: dict, prev_hash: str) -> str:
    # Canonical JSON (sorted keys) so every verifier hashes identical bytes
    canonical = json.dumps({
        "agent": activity["agent_did"],
        "action": activity["action"],
        "fact": activity["fact_iri"],
        "timestamp": activity["timestamp"],
        "prev_hash": prev_hash,
    }, sort_keys=True)
    return f"sha256:{hashlib.sha256(canonical.encode()).hexdigest()}"
```
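The append and verify steps can be sketched end to end. This minimal in-memory illustration assumes each record is a dict with the keys _compute_provenance_hash reads, plus prev_hash and self_hash. The all-zero GENESIS value is an assumed spelling of the elided genesis hash. `append` and `first_break` are hypothetical stand-ins for the gateway write and for the walk that trails memory verify performs.

```python
import hashlib
import json
from typing import List, Optional

GENESIS = "sha256:" + "0" * 64  # assumed genesis value (64 hex zeros)


def compute_hash(activity: dict, prev_hash: str) -> str:
    # Same canonical form as _compute_provenance_hash above.
    canonical = json.dumps({
        "agent": activity["agent_did"],
        "action": activity["action"],
        "fact": activity["fact_iri"],
        "timestamp": activity["timestamp"],
        "prev_hash": prev_hash,
    }, sort_keys=True)
    return "sha256:" + hashlib.sha256(canonical.encode()).hexdigest()


def append(chain: List[dict], activity: dict) -> dict:
    """Link a new record to the current head of the chain."""
    prev = chain[-1]["self_hash"] if chain else GENESIS
    record = dict(activity, prev_hash=prev, self_hash=compute_hash(activity, prev))
    chain.append(record)
    return record


def first_break(chain: List[dict]) -> Optional[int]:
    """Return the index of the first record that fails verification, else None."""
    prev = GENESIS
    for i, rec in enumerate(chain):
        if rec["prev_hash"] != prev or rec["self_hash"] != compute_hash(rec, prev):
            return i
        prev = rec["self_hash"]
    return None
```

Under the open-question recommendation, every fact from the returned index onward would be flagged integrity: "unverified". A writer with full storage access could still recompute every later hash. The chain is therefore tamper-evident only relative to a head hash kept where that writer cannot rewrite it, such as a SCITT transparency log.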
9. Anomaly Detection (mitigates T5, T2)
The gateway emits events that an anomaly detector can consume:
```python
class MemoryAuditEvent:
    LEARN = "memory.learn"
    CORRECT = "memory.correct"
    FORGET = "memory.forget"
    BUDGET_WARNING = "memory.budget_warning"      # 80% of limit
    BUDGET_EXCEEDED = "memory.budget_exceeded"
    CHAIN_BREAK = "memory.chain_break"            # hash chain integrity failure
    RAPID_WRITES = "memory.rapid_writes"          # burst detection
    CROSS_CORRECTION = "memory.cross_correction"  # agent correcting another
```
Phase 1: events logged to provenance graph (queryable via SPARQL).
Phase 2: hook into trails.observability (ADR-0012) for OpenTelemetry export.
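The budget and burst events both need thresholds. The sketch below assumes the 80% warning level from the event comment above and a hypothetical burst rule of 20 writes within 10 seconds. `budget_event`, `is_burst`, and both burst constants are illustrative, not configured Trails values.

```python
from typing import Iterable, Optional

BUDGET_WARNING_RATIO = 0.8  # "80% of limit", from the event comment above
BURST_WINDOW_S = 10.0       # hypothetical burst window (seconds)
BURST_THRESHOLD = 20        # hypothetical writes per window that count as a burst


def budget_event(writes_this_hour: int, limit: int) -> Optional[str]:
    """Map the current hourly count to a budget audit event, if any."""
    if writes_this_hour >= limit:
        return "memory.budget_exceeded"
    if writes_this_hour >= BUDGET_WARNING_RATIO * limit:
        return "memory.budget_warning"
    return None


def is_burst(write_times: Iterable[float], now: float) -> bool:
    """True if at least BURST_THRESHOLD writes fall in the trailing window."""
    recent = sum(1 for t in write_times if 0.0 <= now - t < BURST_WINDOW_S)
    return recent >= BURST_THRESHOLD
```

Emitting budget_warning on every write above 80% would flood the audit log, so a detector would probably emit it only when the threshold is first crossed in a window.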
10. Fact Quarantine
Facts from low-trust agents (anonymous, new, high correction rate) enter a quarantine zone:
```python
def _determine_initial_scope(trust_level: TrustLevel, requested_scope: str) -> str:
    # Low-trust agents' "shared" facts go to quarantine instead
    low_trust = trust_level in (TrustLevel.ANONYMOUS, TrustLevel.AUTHENTICATED)
    if low_trust and requested_scope == "shared":
        return "pending_review"
    return requested_scope
```
Facts in pending_review scope:
- Are NOT visible to memory.recall with scope: "shared"
- ARE visible to memory.recall with scope: "pending_review" (for reviewers)
- Can be promoted to shared by a human or a trusted agent via memory.promote
- Are automatically promoted after N days without correction (configurable)
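The visibility and auto-promotion rules can be sketched over plain dict records. The field names iri, scope, learned_at, and corrected are assumptions, and `recall_visible` and `auto_promote` are hypothetical. The 7-day default mirrors quarantine_auto_promote_days in the configuration section.

```python
import datetime
from typing import Iterable, List

QUARANTINE_AUTO_PROMOTE_DAYS = 7  # mirrors quarantine_auto_promote_days


def recall_visible(facts: Iterable[dict], scope: str) -> List[dict]:
    # A "shared" recall never sees pending_review facts; reviewers must ask
    # for scope "pending_review" explicitly.
    return [f for f in facts if f["scope"] == scope]


def auto_promote(facts: Iterable[dict], now: datetime.datetime,
                 days: int = QUARANTINE_AUTO_PROMOTE_DAYS) -> List[str]:
    """Promote uncorrected pending facts older than `days`; return their IRIs."""
    cutoff = now - datetime.timedelta(days=days)
    promoted = []
    for fact in facts:
        if (fact["scope"] == "pending_review"
                and not fact.get("corrected", False)
                and datetime.datetime.fromisoformat(fact["learned_at"]) <= cutoff):
            fact["scope"] = "shared"
            promoted.append(fact["iri"])
    return promoted
```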
11. Integration with Existing Trust Stack
| Trails Primitive | Role in Memory Security |
|---|---|
| Biscuit tokens (ADR-0010) | Session authentication; DID extraction; capability attenuation |
| DIDs (ADR-0011) | Agent identity; trust level resolution; cross-instance identity |
| Cedar policies (ADR-0006) | Correction access control; write budget enforcement; scope gating |
| PROV-O (ADR-0009) | Provenance graph; hash chain storage; activity records |
| Cost envelopes (ADR-0012) | Write budget tracking; federated cost attribution |
| SCITT (reference application) | External trust anchor for high-stakes facts |
| Federation (ADR-0023) | Hash chain verification on peer handshake; remote fact quarantine |
12. Configuration
```toml
[memory.security]
# Identity
require_authentication = false          # true in production
allow_anonymous_writes = true           # false in production
anonymous_trust_level = "anonymous"     # caps confidence at 0.3

# Confidence
enable_confidence_calibration = true
enable_trust_adjustment = true          # auto-reduce trust for oft-corrected agents

# Source
default_attestation = "self-reported"
auto_upgrade_tool_observed = true       # check session provenance for matching reads

# Corrections
require_reviewer_for_cross_correction = false  # true in production
protect_scitt_anchored_facts = true

# Write budgets
enable_write_budgets = true
default_max_facts_per_hour = 100

# Integrity
enable_hash_chain = true
verify_chain_on_compact = true
verify_chain_on_federation = true

# Quarantine
enable_quarantine = false               # true in production
quarantine_auto_promote_days = 7
```
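Progressive enhancement: The restrictive settings (required authentication, blocking anonymous writes, reviewer-gated cross-agent corrections, quarantine) default to permissive values for development. Their inline comments mark the values production deployments should use. Calibration, write budgets, and hash chaining are on by default. This follows the Trails principle (ADR-0021): start simple, add constraints when needed.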
Non-goals
- Formal verification of fact content — the gateway ensures provenance integrity, not factual correctness. Whether "this library is safe" is true is outside scope; whether the claim is attributed to the right agent with the right confidence is in scope.
- Encryption at rest — fact content is stored in cleartext in Oxigraph. Encryption belongs to the storage layer configuration, not the memory security model.
- Consensus protocols — no Raft/PBFT between multiple memory instances. Federation is eventually consistent with policy gating, not consensus-based.
- Replacing human judgment — the system provides tools (attestation levels, correction policies, anomaly events) for humans to make trust decisions. It does not automate "is this fact true?"
Consequences
Positive¶
- Provenance integrity guaranteed by architecture, not by agent cooperation
- Zero-trust agent model — new agents are untrusted by default, earn trust through track record
- Progressive security — start permissive for dev, harden for production
- Auditable — every security decision (trust level, confidence cap, quarantine, correction policy) is recorded in the provenance graph
- Compatible with submission access — SCITT anchoring and hash chains directly support the reference application's regulatory use case
Negative¶
- Latency overhead — gateway adds hash computation, policy evaluation, and trust resolution per write. Mitigation: these are O(1) operations; the graph query is still the bottleneck.
- Complexity for simple use cases — a single developer using memory locally doesn't need quarantine or trust levels. Mitigation: everything defaults to OFF; the Phase 1 example app continues to work unchanged.
- Trust bootstrapping — new agents start with low trust, which may feel restrictive. Mitigation: configurable initial trust level; fast promotion path for authenticated agents.
Neutral¶
- The agent-facing API changes minimally: agent_did becomes injected rather than passed, and confidence becomes confidence_hint. All other parameters remain the same.
- Existing facts in a memory instance without hash chains can be back-filled by running trails memory seal, which computes the chain from existing provenance records.
Revisit conditions
- When a second production consumer deploys memory (beyond the submission-access use case): validate that the trust levels and quarantine thresholds work for a non-regulated use case.
- When trails.vector integrates with memory recall: ensure vector similarity scores don't bypass confidence calibration.
- When multi-instance federation is tested under adversarial conditions: validate hash chain verification across network partitions.
Alternatives considered
- Fully trust agents (status quo): Rejected — works for single-user dev but breaks in any shared or adversarial context.
- Blockchain for provenance: Rejected — too heavy for the use case. Hash chains provide tamper evidence without consensus overhead. SCITT anchoring covers the external trust requirement.
- Capability-based access control only (no identity): Rejected — you need to know WHO wrote a fact, not just WHETHER they were allowed to. Identity attribution is essential for trust calibration and correction policies.
- Central authority for fact verification: Rejected — introduces a single point of failure and doesn't scale to federated scenarios. Distributed trust (DID + attestation levels + Cedar policies) is more resilient.
Open questions
- Q: Should confidence_hint be renamed to just confidence, with the calibration happening transparently? Recommendation: Keep it as confidence_hint in the gateway API to make it explicit that the agent's stated confidence is not the final value. The stored confidence field is the calibrated result.
- Q: Should hash chain verification be synchronous (blocking writes) or asynchronous (background check)? Recommendation: Synchronous for the chain append (cheap: one hash computation); asynchronous for full chain verification (expensive: it walks the entire chain). trails memory verify is the explicit full-check command.
- Q: How should the system handle a detected hash chain break? Recommendation: Log a CHAIN_BREAK audit event, flag all facts after the break as integrity: "unverified", and alert the operator. Do NOT auto-repair: a break indicates either a bug or tampering, and both require human investigation.
- Q: Should quarantine apply to corrections too, or only to new facts? Recommendation: Apply it to cross-agent corrections from low-trust agents. Self-corrections (an agent correcting its own facts) bypass quarantine.