ADR-0052: Memory Security — Application Layer Separation and Provenance Integrity

Status: Accepted (2026-04-19)
Date: 2026-04-18
Depends on: ADR-0006 (Cedar Policy), ADR-0009 (Provenance Always On), ADR-0010 (Biscuit Tokens), ADR-0011 (DID Identity), ADR-0012 (Cost Primitive), ADR-0051 (Agent Memory)
Target milestone: M11

Context

ADR-0051 introduced Agent Memory — a persistent shared knowledge store for AI agents. The Phase 1 implementation (example app) demonstrates the capability surface and data model but has a fundamental security gap: agents self-report their own provenance.

An agent calling memory.learn today controls:

trails.invoke("memory.learn", {
    "content": "This library is safe to use",       # content — expected
    "confidence": 0.99,                              # metadata — manipulable
    "source": "security audit of dependency tree",   # provenance — fabricable
    "agent_did": "did:key:z6MkTrustedBot",          # identity — spoofable
})

Nothing prevents an agent from fabricating sources, inflating confidence, impersonating other agents, or suppressing knowledge by issuing bogus corrections. This is the confused deputy problem applied to knowledge: the entity that writes facts also writes claims about those facts, with no independent verification.

For the memory system to be trustworthy in production — especially in regulated contexts like medtech (submission access) — the provenance layer must be independently enforced by the framework, not self-reported by the agent.

Threat model

#	Threat	Vector	Impact
T1	Identity spoofing	Agent passes another agent's DID in `agent_did`	Facts attributed to wrong agent; trust decisions based on wrong identity
T2	Confidence inflation	Agent sets `confidence: 0.99` on every fact	Drowns out other agents' knowledge in recall ranking
T3	Source fabrication	Agent claims `source: "security audit"` without performing one	Consumers trust unverified claims
T4	Fact suppression	Agent A calls `memory.correct` on Agent B's fact with bogus reason	Censorship; knowledge loss
T5	Graph poisoning	Agent floods memory with plausible-sounding wrong facts	Downstream agents act on false information
T6	Timestamp manipulation	Agent provides past/future timestamp to manipulate recency ranking	Facts appear older/newer than they are
T7	Provenance tampering	Compromised agent modifies existing provenance records in storage	Audit trail integrity destroyed
T8	Correction laundering	Agent corrects its own fact to replace low-confidence claim with high-confidence version, bypassing trust calibration	Artificially inflated trust

Decision

1. Three-Layer Architecture

Separate the memory system into three distinct trust zones:

┌────────────────────────────────────────────────────┐
│  Agent Layer (UNTRUSTED)                           │
│                                                    │
│  Agent provides:                                   │
│    content, confidence_hint, topic, tags,           │
│    source_description, scope                       │
│                                                    │
│  Agent CANNOT provide:                             │
│    agent_did, timestamp, source_attestation,        │
│    prev_hash, provenance records                   │
├────────────────────────────────────────────────────┤
│  Memory Gateway (TRUSTED — framework-enforced)     │
│                                                    │
│  1. Authenticate: extract DID from session         │
│  2. Authorize: Cedar policy evaluation             │
│  3. Rate limit: per-agent write budget             │
│  4. Cap confidence: agent trust level              │
│  5. Tag source: attestation level                  │
│  6. Generate timestamp: server-side                │
│  7. Compute hash chain: tamper-evident             │
│  8. Write fact + provenance (atomic)               │
│  9. Emit audit event                               │
├────────────────────────────────────────────────────┤
│  Storage Layer (APPEND-ONLY provenance)            │
│                                                    │
│  Fact graph: mutable (corrections allowed)         │
│  Provenance graph: append-only, hash-chained       │
│  Policy graph: admin-only writes                   │
└────────────────────────────────────────────────────┘

Key invariant: The agent layer talks to the gateway. The gateway talks to storage. The agent never writes provenance directly. This is a hard boundary, not a convention.
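One way to make this boundary concrete is to reject gateway-owned fields at the agent-facing entry point. The following is a minimal sketch, assuming the field names from the diagram above; `validate_agent_payload` and `GatewayOwnedFieldError` are hypothetical names used only for illustration, not part of Trails.

```python
# Hypothetical sketch: enforce the agent/gateway boundary on incoming payloads.
AGENT_FIELDS = {"content", "confidence_hint", "topic", "tags",
                "source_description", "scope"}
GATEWAY_FIELDS = {"agent_did", "timestamp", "source_attestation",
                  "prev_hash", "provenance"}


class GatewayOwnedFieldError(ValueError):
    """Raised when an agent tries to supply a field the gateway must set."""


def validate_agent_payload(payload: dict) -> dict:
    """Return a copy of the payload if it holds only agent-settable fields."""
    forbidden = GATEWAY_FIELDS.intersection(payload)
    if forbidden:
        raise GatewayOwnedFieldError(
            f"gateway-owned fields in agent input: {sorted(forbidden)}")
    unknown = set(payload) - AGENT_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return dict(payload)
```

Rejecting rather than silently dropping a spoofed `agent_did` is a design choice of this sketch: it surfaces misbehaving agents early and gives the audit trail something to record.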

2. Identity Injection (mitigates T1)

The agent_did field is removed from the agent-facing API. Identity is injected by the gateway from the authenticated session:

# Agent-facing API (what the agent sees)
@capability(id="memory.learn")
def learn(ctx, content: str, confidence_hint: float = 0.8, topic: str = "general", ...):
    ...

# Gateway implementation (what actually happens)
def _gateway_learn(ctx, content, confidence_hint, topic, ...):
    # Identity from authenticated session — NOT from agent input
    agent_did = ctx.auth.principal_did

    # If no authenticated session, use a low-trust anonymous DID
    if not agent_did:
        agent_did = "did:key:anonymous"
        trust_level = TrustLevel.ANONYMOUS
    else:
        trust_level = _resolve_trust_level(agent_did)
    ...

The session's principal_did is set during MCP handshake (Biscuit token → DID extraction, per ADR-0010 and ADR-0011). For HTTP clients, the Authorization: Bearer header carries the Biscuit. For stdio MCP, the DID comes from the process-level credential.

Fallback for development: When no authentication is configured (e.g., local dev with trails server --no-auth), the gateway mints a session-scoped did:key:z6MkAnon... and tags all facts with trust_level: "anonymous". This keeps the model consistent without requiring auth setup for local experimentation.
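The resolution order described above (authenticated DID first, session-scoped anonymous identity otherwise) can be sketched as follows. The `Session` dataclass and `resolve_identity` are hypothetical stand-ins, the anonymous identifier is a placeholder rather than a valid did:key, and returning "authenticated" for every authenticated principal is a simplification of the trust-level lookup.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    """Stand-in for the authenticated session (hypothetical shape)."""
    principal_did: Optional[str] = None
    anonymous_did: Optional[str] = None  # minted lazily, once per session


def resolve_identity(session: Session) -> tuple[str, str]:
    """Return (agent_did, trust_level) without consulting agent input."""
    if session.principal_did:
        return session.principal_did, "authenticated"
    if session.anonymous_did is None:
        # Placeholder only: a real gateway would mint a proper did:key here.
        session.anonymous_did = f"did:key:anon-{secrets.token_hex(8)}"
    return session.anonymous_did, "anonymous"
```

Caching the minted identifier on the session keeps all facts from one unauthenticated session attributable to the same anonymous principal, while separate sessions stay distinguishable.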

3. Server-Side Timestamps (mitigates T6)

The learned_at field is set by the gateway, not by the agent:

def _gateway_learn(ctx, ...):
    learned_at = datetime.datetime.now(datetime.timezone.utc).isoformat()
    # Agent cannot override this

For federated scenarios where clock skew matters, the gateway includes both:
- learned_at: server timestamp (authoritative for ordering)
- received_at: reception timestamp at the federation peer (if relayed)
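The relationship between the two timestamps can be illustrated with a small sketch. It assumes facts are plain dictionaries and injects the clock so the behavior is testable; `stamp_fact` and `relay_fact` are hypothetical helper names. The sketch drops agent-supplied time fields, although a gateway could equally reject them.

```python
import datetime
from typing import Callable

Clock = Callable[[], datetime.datetime]


def utc_now() -> datetime.datetime:
    return datetime.datetime.now(datetime.timezone.utc)


def stamp_fact(agent_input: dict, clock: Clock = utc_now) -> dict:
    """Origin gateway: discard any agent-supplied time and set learned_at."""
    fact = {k: v for k, v in agent_input.items()
            if k not in ("learned_at", "received_at", "timestamp")}
    fact["learned_at"] = clock().isoformat()
    return fact


def relay_fact(fact: dict, clock: Clock = utc_now) -> dict:
    """Federation peer: keep learned_at, record the local reception time."""
    relayed = dict(fact)
    relayed["received_at"] = clock().isoformat()
    return relayed
```

Because `relay_fact` never touches `learned_at`, ordering stays anchored to the origin gateway's clock even when peers disagree about the current time.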

4. Confidence Calibration (mitigates T2, T8)

The agent provides a confidence_hint, not a final confidence. The gateway caps the hint at a maximum determined by the agent's trust level:

class TrustLevel:
    # Each constant is the maximum effective confidence for that level
    ANONYMOUS = 0.3     # no auth, max effective confidence = 0.3
    AUTHENTICATED = 0.7 # valid DID, no track record
    ESTABLISHED = 0.9   # agent with verified track record
    HUMAN = 1.0         # human-verified agent or direct human input
    SYSTEM = 1.0        # framework internal (e.g., compact)

def _calibrate_confidence(hint: float, trust_cap: float) -> float:
    # trust_cap is one of the TrustLevel constants above
    return min(hint, trust_cap)

Track record adjustment: An agent whose facts are frequently corrected by other agents or humans gets its trust level automatically reduced:

def _compute_trust_adjustment(agent_did: str, ctx) -> float:
    corrections_received = count_corrections_targeting(agent_did, ctx)
    facts_written = count_facts_by(agent_did, ctx)
    if facts_written == 0:
        return 1.0
    correction_rate = corrections_received / facts_written
    # High correction rate → reduced trust
    # 0% corrections → 1.0, 50%+ → 0.5
    return max(0.5, 1.0 - correction_rate)
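A short worked example may help show how the cap and the track-record factor interact. The caps are taken from the TrustLevel values above. Combining the two by multiplication is an assumption of this sketch, since the ADR does not specify how the adjustment is applied, and `effective_confidence` is a hypothetical name.

```python
# Caps mirror the TrustLevel values; multiplying the cap by the track-record
# factor is an assumption of this sketch, not something the ADR specifies.
TRUST_CAPS = {"anonymous": 0.3, "authenticated": 0.7,
              "established": 0.9, "human": 1.0, "system": 1.0}


def trust_adjustment(corrections_received: int, facts_written: int) -> float:
    """Same formula as _compute_trust_adjustment, with counts passed in."""
    if facts_written == 0:
        return 1.0
    return max(0.5, 1.0 - corrections_received / facts_written)


def effective_confidence(hint: float, trust_level: str,
                         corrections_received: int,
                         facts_written: int) -> float:
    cap = TRUST_CAPS[trust_level] * trust_adjustment(corrections_received,
                                                     facts_written)
    clamped_hint = min(max(hint, 0.0), 1.0)  # keep out-of-range hints sane
    return round(min(clamped_hint, cap), 4)
```

Under these assumptions, an anonymous agent hinting 0.99 is stored at 0.3, and an established agent with 20 corrections against 100 facts is capped at 0.9 × 0.8 = 0.72.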

5. Source Attestation Levels (mitigates T3)

Sources are classified by how they were verified, not by what the agent claims:

class SourceAttestation:
    SELF_REPORTED = "self-reported"
    # Agent claims it from a source. No verification.
    # Lowest trust. Default for all agent-provided sources.

    TOOL_OBSERVED = "tool-observed"
    # The gateway observed the agent invoke a capability that
    # accessed this source (e.g., file read, API call) in the
    # current session's provenance graph.

    CONTENT_HASHED = "content-hashed"
    # The fact content can be verified against a hash of the
    # source material (e.g., SHA-256 of the file at read time).

    HUMAN_CONFIRMED = "human-confirmed"
    # A human explicitly marked this fact as verified.

    SCITT_ANCHORED = "scitt-anchored"
    # Fact hash registered in a transparency log.
    # Highest trust for automated systems.

The gateway automatically upgrades attestation when it can:

def _determine_attestation(source_description: str, session) -> SourceAttestation:
    # Check if the session's provenance graph shows a matching
    # capability invocation (file read, API call, etc.)
    if session.provenance.has_activity_matching(source_description):
        return SourceAttestation.TOOL_OBSERVED

    return SourceAttestation.SELF_REPORTED
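The matching step can be sketched with simple gateway-side bookkeeping. This sketch assumes the gateway records a digest for every source a session actually reads and that matching is exact string equality on a source identifier. Both are simplifications, and `record_read` and `determine_attestation` are hypothetical names.

```python
import hashlib
from typing import Optional


def record_read(observed_reads: dict[str, str], source_id: str,
                content: bytes) -> None:
    """Gateway-side bookkeeping: remember what the session actually read."""
    observed_reads[source_id] = "sha256:" + hashlib.sha256(content).hexdigest()


def determine_attestation(source_description: str,
                          observed_reads: dict[str, str]
                          ) -> tuple[str, Optional[str]]:
    """Return (attestation level, digest recorded at read time or None)."""
    digest = observed_reads.get(source_description)
    if digest is None:
        return "self-reported", None
    return "tool-observed", digest
```

The important property is that the upgrade depends only on what the gateway itself observed. A free-text claim such as "security audit of dependency tree" matches nothing the gateway recorded and therefore stays self-reported.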

6. Correction Policies (mitigates T4)

Cross-agent corrections are gated by Cedar policies:

// Agents can always correct their own facts
permit(
    principal,
    action == Action::"memory.correct",
    resource
) when {
    resource.agent_did == principal.did
};

// Cross-agent corrections require reviewer role
permit(
    principal,
    action == Action::"memory.correct",
    resource
) when {
    resource.agent_did != principal.did &&
    principal has role &&
    principal.role == "reviewer"
};

// Humans can correct any fact
permit(
    principal,
    action == Action::"memory.correct",
    resource
) when {
    principal has trust_level &&
    principal.trust_level == "human"
};

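// Nobody can correct SCITT-anchored facts without human approval.
// The `has` guards matter: a policy whose condition errors on a missing
// attribute is skipped, which would silently disable this forbid.
forbid(
    principal,
    action == Action::"memory.correct",
    resource
) when {
    resource has source_attestation &&
    resource.source_attestation == "scitt-anchored" &&
    !(principal has trust_level && principal.trust_level == "human")
};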

When a cross-agent correction is permitted, the gateway records both the original author and the corrector:

ex:correction-42 a brain:Correction ;
    brain:supersedes ex:fact-17 ;
    brain:originalAuthor did:key:agent-a ;
    brain:correctedBy did:key:agent-b ;
    brain:reason "Verified in source code: value is 60s not 30s" ;
    prov:wasAssociatedWith did:key:agent-b ;
    prov:generatedAtTime "2026-04-18T14:30:00Z" .
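For unit tests or documentation, the intended outcome of the four policies can be restated as a plain function. This is an illustrative mirror of the rules, not a Cedar evaluation: `may_correct` is a hypothetical name, principals and facts are modeled as dictionaries, and a principal without a trust_level attribute is treated as non-human.

```python
def may_correct(principal: dict, fact: dict) -> bool:
    """Mirror of the correction policies: a matching forbid beats any permit."""
    is_human = principal.get("trust_level") == "human"

    # forbid: SCITT-anchored facts can only be corrected by a human principal
    if fact.get("source_attestation") == "scitt-anchored" and not is_human:
        return False

    own_fact = fact["agent_did"] == principal["did"]
    is_reviewer = principal.get("role") == "reviewer"

    # permit: own facts, reviewers on other agents' facts, humans on any fact.
    # With no matching permit the request is denied by default.
    return own_fact or is_reviewer or is_human
```

A table of such cases (own fact, cross-agent without role, reviewer, anchored fact) makes a convenient regression suite to run against the real policy set.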

7. Write Budgets (mitigates T5)

Per-agent write limits prevent graph poisoning:

[memory.budgets]
default_max_facts_per_hour = 100
default_max_facts_total = 10000
anonymous_max_facts_per_hour = 10
anonymous_max_facts_total = 100

[memory.budgets.overrides]
"did:key:z6MkCIBot" = { max_facts_per_hour = 500 }  # CI bots write more

The gateway tracks writes per agent and rejects memory.learn calls that exceed the budget:

def _check_write_budget(agent_did: str, config) -> bool:
    recent_count = count_facts_by_agent_since(agent_did, hours=1)
    limit = config.get_limit(agent_did, "max_facts_per_hour")
    return recent_count < limit

Budget exhaustion returns a structured error with the limit and reset time, not a silent failure.
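A structured result carrying the limit and the reset time might look like the following sketch. It assumes a sliding one-hour window over the agent's previous write timestamps. `BudgetDecision` and `check_hourly_budget` are hypothetical names, and the ADR does not prescribe this exact shape.

```python
import datetime
from dataclasses import dataclass
from typing import Optional


@dataclass
class BudgetDecision:
    allowed: bool
    limit: int
    used: int
    resets_at: Optional[datetime.datetime] = None  # set only when rejected


def check_hourly_budget(write_times: list[datetime.datetime], limit: int,
                        now: datetime.datetime) -> BudgetDecision:
    """Sliding one-hour window over an agent's previous write timestamps."""
    window = datetime.timedelta(hours=1)
    recent = sorted(t for t in write_times if now - t < window)
    if limit <= 0:
        return BudgetDecision(False, limit, len(recent))
    if len(recent) < limit:
        return BudgetDecision(True, limit, len(recent))
    # A slot frees up once enough of the oldest writes leave the window.
    blocking = recent[len(recent) - limit]
    return BudgetDecision(False, limit, len(recent), blocking + window)
```

With three recent writes and a limit of three, the decision is a rejection whose `resets_at` is one hour after the oldest write still inside the window, which gives the caller a concrete retry time.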

8. Hash-Chained Provenance (mitigates T7)

Every provenance record includes a hash of the previous record, creating a tamper-evident chain:

ex:prov-001 a prov:Activity ;
    prov:wasAssociatedWith did:key:agent-a ;
    prov:generatedAtTime "2026-04-18T10:00:00Z" ;
    brain:action "memory.learn" ;
    brain:factIri ex:fact-001 ;
    brain:prevHash "sha256:0000...0000" ;  # genesis
    brain:selfHash "sha256:a1b2c3..." .

ex:prov-002 a prov:Activity ;
    prov:wasAssociatedWith did:key:agent-b ;
    prov:generatedAtTime "2026-04-18T10:01:00Z" ;
    brain:action "memory.learn" ;
    brain:factIri ex:fact-002 ;
    brain:prevHash "sha256:a1b2c3..." ;   # must match prov-001.selfHash
    brain:selfHash "sha256:d4e5f6..." .

Verification: trails memory verify walks the chain and reports any broken links. This runs:
- On memory.compact (before pruning)
- On trails doctor (health check)
- On federation peer handshake (before accepting remote facts)

Self-hash computation:

import hashlib
import json

def _compute_provenance_hash(activity: dict, prev_hash: str) -> str:
    # sort_keys gives a stable key order, so equal records hash identically
    canonical = json.dumps({
        "agent": activity["agent_did"],
        "action": activity["action"],
        "fact": activity["fact_iri"],
        "timestamp": activity["timestamp"],
        "prev_hash": prev_hash,
    }, sort_keys=True)
    return f"sha256:{hashlib.sha256(canonical.encode('utf-8')).hexdigest()}"
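The chain walk performed during verification can be sketched as follows. Records are modeled as dictionaries, the genesis hash is passed in rather than assumed, and `append_record` and `first_broken_link` are hypothetical names. The hashing function repeats the self-hash computation so the block is self-contained.

```python
import hashlib
import json
from typing import Optional


def provenance_hash(activity: dict, prev_hash: str) -> str:
    canonical = json.dumps({
        "agent": activity["agent_did"],
        "action": activity["action"],
        "fact": activity["fact_iri"],
        "timestamp": activity["timestamp"],
        "prev_hash": prev_hash,
    }, sort_keys=True)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_record(chain: list[dict], activity: dict, genesis_hash: str) -> dict:
    """Link a new record to the current tail (or to genesis if empty)."""
    prev_hash = chain[-1]["self_hash"] if chain else genesis_hash
    record = {**activity, "prev_hash": prev_hash,
              "self_hash": provenance_hash(activity, prev_hash)}
    chain.append(record)
    return record


def first_broken_link(chain: list[dict], genesis_hash: str) -> Optional[int]:
    """Return the index of the first record failing verification, or None."""
    expected_prev = genesis_hash
    for index, record in enumerate(chain):
        if record["prev_hash"] != expected_prev:
            return index  # link to predecessor is broken (deletion, reorder)
        if provenance_hash(record, record["prev_hash"]) != record["self_hash"]:
            return index  # record content was modified after sealing
        expected_prev = record["self_hash"]
    return None
```

Returning the index of the first failure, rather than a boolean, lets a caller tell which records precede the break and which follow it.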

9. Anomaly Detection (mitigates T5, T2)

The gateway emits events that an anomaly detector can consume:

class MemoryAuditEvent:
    LEARN = "memory.learn"
    CORRECT = "memory.correct"
    FORGET = "memory.forget"
    BUDGET_WARNING = "memory.budget_warning"   # 80% of limit
    BUDGET_EXCEEDED = "memory.budget_exceeded"
    CHAIN_BREAK = "memory.chain_break"         # hash chain integrity failure
    RAPID_WRITES = "memory.rapid_writes"       # burst detection
    CROSS_CORRECTION = "memory.cross_correction"  # agent correcting another

Phase 1: events logged to provenance graph (queryable via SPARQL). Phase 2: hook into trails.observability (ADR-0012) for OpenTelemetry export.
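A consumer of these events could be as simple as the sketch below. The event dictionary shape (`type`, `agent_did`), the function names, and the cross-correction threshold are assumptions made for illustration. Only the 80% warning level comes from the event list above.

```python
from collections import Counter
from typing import Iterable, Optional


def budget_event(used: int, limit: int) -> Optional[str]:
    """Map hourly usage to a budget event name, or None below the 80% mark."""
    if used >= limit:
        return "memory.budget_exceeded"
    if used >= 0.8 * limit:
        return "memory.budget_warning"
    return None


def suspicious_agents(events: Iterable[dict],
                      max_cross_corrections: int = 5) -> set[str]:
    """Flag agents with budget overruns or many cross-agent corrections."""
    cross_corrections = Counter()
    flagged = set()
    for event in events:
        if event["type"] == "memory.budget_exceeded":
            flagged.add(event["agent_did"])
        elif event["type"] == "memory.cross_correction":
            cross_corrections[event["agent_did"]] += 1
    flagged.update(did for did, count in cross_corrections.items()
                   if count > max_cross_corrections)
    return flagged
```

Keeping detection outside the gateway, as a consumer of its events, means thresholds like `max_cross_corrections` can be tuned without touching the write path.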

10. Fact Quarantine

Facts from low-trust agents (anonymous, new, high correction rate) enter a quarantine zone:

def _determine_initial_scope(trust_level: TrustLevel, requested_scope: str) -> str:
    low_trust = trust_level in (TrustLevel.ANONYMOUS, TrustLevel.AUTHENTICATED)
    # Low-trust agents' "shared" facts go to quarantine
    if low_trust and requested_scope == "shared":
        return "pending_review"
    return requested_scope

Facts in pending_review scope:
- Are NOT visible to memory.recall with scope: "shared"
- ARE visible to memory.recall with scope: "pending_review" (for reviewers)
- Can be promoted to shared by a human or a trusted agent via memory.promote
- Are automatically promoted after N days without correction (configurable)
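The visibility and auto-promotion rules can be sketched over an in-memory list of facts. The fact dictionary shape (`id`, `scope`, `learned_at`, `corrected`) and the helper names are assumptions of this sketch. A real implementation would query the fact graph instead.

```python
import datetime


def visible_facts(facts: list[dict], scope: str) -> list[dict]:
    """A recall sees only facts stored under the requested scope."""
    return [fact for fact in facts if fact["scope"] == scope]


def auto_promote(facts: list[dict], now: datetime.datetime,
                 auto_promote_days: int = 7) -> list[str]:
    """Promote quarantined facts that went N days without a correction."""
    threshold = datetime.timedelta(days=auto_promote_days)
    promoted = []
    for fact in facts:
        if (fact["scope"] == "pending_review"
                and not fact.get("corrected", False)
                and now - fact["learned_at"] >= threshold):
            fact["scope"] = "shared"
            promoted.append(fact["id"])
    return promoted
```

Running `auto_promote` on a schedule would implement the last rule in the list, and corrected facts stay in quarantine until a reviewer decides.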

11. Integration with Existing Trust Stack

Trails Primitive	Role in Memory Security
Biscuit tokens (ADR-0010)	Session authentication; DID extraction; capability attenuation
DIDs (ADR-0011)	Agent identity; trust level resolution; cross-instance identity
Cedar policies (ADR-0006)	Correction access control; write budget enforcement; scope gating
PROV-O (ADR-0009)	Provenance graph; hash chain storage; activity records
Cost envelopes (ADR-0012)	Write budget tracking; federated cost attribution
SCITT (reference application)	External trust anchor for high-stakes facts
Federation (ADR-0023)	Hash chain verification on peer handshake; remote fact quarantine

12. Configuration

[memory.security]
# Identity
require_authentication = false       # true in production
allow_anonymous_writes = true        # false in production
anonymous_trust_level = "anonymous"  # caps confidence at 0.3

# Confidence
enable_confidence_calibration = true
enable_trust_adjustment = true       # auto-reduce trust for oft-corrected agents

# Source
default_attestation = "self-reported"
auto_upgrade_tool_observed = true    # check session provenance for matching reads

# Corrections
require_reviewer_for_cross_correction = false  # true in production
protect_scitt_anchored_facts = true

# Write budgets
enable_write_budgets = true
default_max_facts_per_hour = 100

# Integrity
enable_hash_chain = true
verify_chain_on_compact = true
verify_chain_on_federation = true

# Quarantine
enable_quarantine = false            # true in production
quarantine_auto_promote_days = 7

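Progressive enhancement: The access-restricting features (authentication, reviewer-gated cross-agent corrections, quarantine) default to OFF (permissive) for development, while confidence calibration, write budgets, and hash chaining default to ON in the configuration above. Production deployments enable the features they need. This follows the Trails principle (ADR-0021): start simple, add constraints when needed.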
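A deployment check can catch settings that are still at their development values. The sketch below assumes the [memory.security] table has already been loaded into a dictionary. The expected values come from the "true in production" and "false in production" comments in the configuration above, and `permissive_settings` is a hypothetical helper name.

```python
# Values the configuration comments mark as the production setting.
PRODUCTION_VALUES = {
    "require_authentication": True,
    "allow_anonymous_writes": False,
    "require_reviewer_for_cross_correction": True,
    "enable_quarantine": True,
}


def permissive_settings(security_config: dict) -> list[str]:
    """List settings whose value differs from the production recommendation."""
    return sorted(key for key, expected in PRODUCTION_VALUES.items()
                  if security_config.get(key) != expected)
```

An empty result means every setting flagged for production has been hardened. A missing key is reported too, so an incomplete configuration does not pass silently.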

Non-goals

Formal verification of fact content — the gateway ensures provenance integrity, not factual correctness. Whether "this library is safe" is true is outside scope; whether the claim is attributed to the right agent with the right confidence is in scope.
Encryption at rest — fact content is stored in cleartext in Oxigraph. Encryption belongs to the storage layer configuration, not the memory security model.
Consensus protocols — no Raft/PBFT between multiple memory instances. Federation is eventually consistent with policy gating, not consensus-based.
Replacing human judgment — the system provides tools (attestation levels, correction policies, anomaly events) for humans to make trust decisions. It does not automate "is this fact true?"

Consequences

Positive

Provenance integrity guaranteed by architecture, not by agent cooperation
Zero-trust agent model — new agents are untrusted by default, earn trust through track record
Progressive security — start permissive for dev, harden for production
Auditable — every security decision (trust level, confidence cap, quarantine, correction policy) is recorded in the provenance graph
Compatible with submission access — SCITT anchoring and hash chains directly support the reference application's regulatory use case

Negative

Latency overhead — gateway adds hash computation, policy evaluation, and trust resolution per write. Mitigation: these are O(1) operations; the graph query is still the bottleneck.
Complexity for simple use cases — a single developer using memory locally doesn't need quarantine or trust levels. Mitigation: authentication, reviewer-gated corrections, and quarantine default to OFF; the Phase 1 example app continues to work unchanged.
Trust bootstrapping — new agents start with low trust, which may feel restrictive. Mitigation: configurable initial trust level; fast promotion path for authenticated agents.

Neutral

The agent-facing API changes minimally: agent_did becomes injected rather than passed, confidence becomes confidence_hint, and source becomes source_description. All other parameters remain the same.
Existing facts in a memory instance without hash chains can be back-filled by running trails memory seal, which computes the chain from existing provenance records.

Revisit conditions

When a second production consumer deploys memory (beyond the submission-access use case): validate that the trust levels and quarantine thresholds work for a non-regulated use case.
When trails.vector integrates with memory recall: ensure vector similarity scores don't bypass confidence calibration.
When multi-instance federation is tested under adversarial conditions: validate hash chain verification across network partitions.

Alternatives considered

Fully trust agents (status quo): Rejected — works for single-user dev but breaks in any shared or adversarial context.
Blockchain for provenance: Rejected — too heavy for the use case. Hash chains provide tamper evidence without consensus overhead. SCITT anchoring covers the external trust requirement.
Capability-based access control only (no identity): Rejected — you need to know WHO wrote a fact, not just WHETHER they were allowed to. Identity attribution is essential for trust calibration and correction policies.
Central authority for fact verification: Rejected — introduces a single point of failure and doesn't scale to federated scenarios. Distributed trust (DID + attestation levels + Cedar policies) is more resilient.

Open questions

Q: Should confidence_hint be renamed to just confidence with the calibration happening transparently? Recommendation: Keep as confidence_hint in the gateway API to make it explicit that the agent's stated confidence is not the final value. The stored confidence field is the calibrated result.
Q: Should hash chain verification be synchronous (blocking writes) or asynchronous (background check)? Recommendation: Synchronous for the chain append (cheap — one hash computation). Asynchronous for full chain verification (expensive — walks entire chain). trails memory verify is the explicit full-check command.
Q: How should the system handle a detected hash chain break? Recommendation: Log a CHAIN_BREAK audit event, flag all facts after the break as integrity: "unverified", and alert the operator. Do NOT auto-repair — a break indicates either a bug or tampering, both requiring human investigation.
Q: Should quarantine apply to corrections too, or only to new facts? Recommendation: Apply to cross-agent corrections from low-trust agents. Self-corrections (an agent correcting its own fact) bypass quarantine.