ADR-0052: Memory Security — Application Layer Separation and Provenance Integrity

  • Status: Accepted (2026-04-19)
  • Date: 2026-04-18
  • Depends on: ADR-0006 (Cedar Policy), ADR-0009 (Provenance Always On), ADR-0010 (Biscuit Tokens), ADR-0011 (DID Identity), ADR-0012 (Cost Primitive), ADR-0051 (Agent Memory)
  • Target milestone: M11

Context

ADR-0051 introduced Agent Memory — a persistent shared knowledge store for AI agents. The Phase 1 implementation (example app) demonstrates the capability surface and data model but has a fundamental security gap: agents self-report their own provenance.

An agent calling memory.learn today controls:

trails.invoke("memory.learn", {
    "content": "This library is safe to use",       # content — expected
    "confidence": 0.99,                              # metadata — manipulable
    "source": "security audit of dependency tree",   # provenance — fabricable
    "agent_did": "did:key:z6MkTrustedBot",          # identity — spoofable
})

Nothing prevents an agent from fabricating sources, inflating confidence, impersonating other agents, or suppressing knowledge by issuing bogus corrections. This is the confused deputy problem applied to knowledge: the entity that writes facts also writes claims about those facts, with no independent verification.

For the memory system to be trustworthy in production — especially in regulated contexts like medtech (submission access) — the provenance layer must be independently enforced by the framework, not self-reported by the agent.

Threat model

| #  | Threat | Vector | Impact |
|----|--------|--------|--------|
| T1 | Identity spoofing | Agent passes another agent's DID in agent_did | Facts attributed to wrong agent; trust decisions based on wrong identity |
| T2 | Confidence inflation | Agent sets confidence: 0.99 on every fact | Drowns out other agents' knowledge in recall ranking |
| T3 | Source fabrication | Agent claims source: "security audit" without performing one | Consumers trust unverified claims |
| T4 | Fact suppression | Agent A calls memory.correct on Agent B's fact with bogus reason | Censorship; knowledge loss |
| T5 | Graph poisoning | Agent floods memory with plausible-sounding wrong facts | Downstream agents act on false information |
| T6 | Timestamp manipulation | Agent provides past/future timestamp to manipulate recency ranking | Facts appear older/newer than they are |
| T7 | Provenance tampering | Compromised agent modifies existing provenance records in storage | Audit trail integrity destroyed |
| T8 | Correction laundering | Agent corrects its own fact to replace low-confidence claim with high-confidence version, bypassing trust calibration | Artificially inflated trust |

Decision

1. Three-Layer Architecture

Separate the memory system into three distinct trust zones:

┌────────────────────────────────────────────────────┐
│  Agent Layer (UNTRUSTED)                           │
│                                                    │
│  Agent provides:                                   │
│    content, confidence_hint, topic, tags,           │
│    source_description, scope                       │
│                                                    │
│  Agent CANNOT provide:                             │
│    agent_did, timestamp, source_attestation,        │
│    prev_hash, provenance records                   │
├────────────────────────────────────────────────────┤
│  Memory Gateway (TRUSTED — framework-enforced)     │
│                                                    │
│  1. Authenticate: extract DID from session         │
│  2. Authorize: Cedar policy evaluation             │
│  3. Rate limit: per-agent write budget             │
│  4. Cap confidence: agent trust level              │
│  5. Tag source: attestation level                  │
│  6. Generate timestamp: server-side                │
│  7. Compute hash chain: tamper-evident             │
│  8. Write fact + provenance (atomic)               │
│  9. Emit audit event                               │
├────────────────────────────────────────────────────┤
│  Storage Layer (APPEND-ONLY provenance)            │
│                                                    │
│  Fact graph: mutable (corrections allowed)         │
│  Provenance graph: append-only, hash-chained       │
│  Policy graph: admin-only writes                   │
└────────────────────────────────────────────────────┘

Key invariant: The agent layer talks to the gateway. The gateway talks to storage. The agent never writes provenance directly. This is a hard boundary, not a convention.
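
As a minimal sketch of how that boundary could be enforced at the gateway's entry point, the function below rejects any payload that tries to set a gateway-owned field and drops anything outside the agent allowlist. The names AGENT_FIELDS, GATEWAY_ONLY_FIELDS, and sanitize_agent_input are hypothetical, and the "provenance" key is an assumed stand-in for "provenance records" in the diagram.

```python
# Hypothetical helper; field names are taken from the three-layer diagram above.
AGENT_FIELDS = {"content", "confidence_hint", "topic", "tags",
                "source_description", "scope"}
# "provenance" is an assumed key name for the diagram's "provenance records".
GATEWAY_ONLY_FIELDS = {"agent_did", "timestamp", "source_attestation",
                       "prev_hash", "provenance"}


def sanitize_agent_input(payload: dict) -> dict:
    """Reject gateway-owned fields, then keep only allowlisted agent fields."""
    forbidden = GATEWAY_ONLY_FIELDS.intersection(payload)
    if forbidden:
        # Fail loudly instead of silently ignoring a spoofing attempt.
        raise PermissionError(f"agent may not set: {sorted(forbidden)}")
    return {key: value for key, value in payload.items() if key in AGENT_FIELDS}
```

Rejecting rather than silently stripping gateway-owned fields is one possible design choice; it makes spoofing attempts visible to the audit trail.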

2. Identity Injection (mitigates T1)

The agent_did field is removed from the agent-facing API. Identity is injected by the gateway from the authenticated session:

# Agent-facing API (what the agent sees)
@capability(id="memory.learn")
def learn(ctx, content: str, confidence_hint: float = 0.8, topic: str = "general", ...):
    ...

# Gateway implementation (what actually happens)
def _gateway_learn(ctx, content, confidence_hint, topic, ...):
    # Identity from authenticated session — NOT from agent input
    agent_did = ctx.auth.principal_did

    # If no authenticated session, use a low-trust anonymous DID
    if not agent_did:
        agent_did = "did:key:anonymous"
        trust_level = TrustLevel.ANONYMOUS
    else:
        trust_level = _resolve_trust_level(agent_did)
    ...

The session's principal_did is set during the MCP handshake (Biscuit token → DID extraction, per ADR-0010 and ADR-0011). For HTTP clients, the Authorization: Bearer header carries the Biscuit. For stdio MCP, the DID comes from the process-level credential.

Fallback for development: When no authentication is configured (e.g., local dev with trails server --no-auth), the gateway mints a session-scoped did:key:z6MkAnon... and tags all facts with trust_level: "anonymous". This keeps the model consistent without requiring auth setup for local experimentation.

3. Server-Side Timestamps (mitigates T6)

The learned_at field is set by the gateway, not by the agent:

def _gateway_learn(ctx, ...):
    learned_at = datetime.datetime.now(datetime.timezone.utc).isoformat()
    # Agent cannot override this

For federated scenarios where clock skew matters, the gateway includes both:

  • learned_at: server timestamp (authoritative for ordering)
  • received_at: reception timestamp at the federation peer (if relayed)

4. Confidence Calibration (mitigates T2, T8)

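The agent provides a confidence_hint, not a final confidence. The gateway caps the hint at a ceiling determined by the agent's trust level (a cap, not a multiplier: hints below the ceiling pass through unchanged):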

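import enum

class TrustLevel(enum.Enum):
    # Value is (label, confidence cap); the label keeps HUMAN and SYSTEM distinct.
    ANONYMOUS = ("anonymous", 0.3)          # no auth, max effective confidence = 0.3
    AUTHENTICATED = ("authenticated", 0.7)  # valid DID, no track record
    ESTABLISHED = ("established", 0.9)      # agent with verified track record
    HUMAN = ("human", 1.0)                  # human-verified agent or direct human input
    SYSTEM = ("system", 1.0)                # framework internal (e.g., compact)

    @property
    def cap(self) -> float:
        return self.value[1]

def _calibrate_confidence(hint: float, trust_level: TrustLevel) -> float:
    # The hint is capped, never raised.
    return min(hint, trust_level.cap)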

Track record adjustment: An agent whose facts are frequently corrected by other agents or humans gets its trust level automatically reduced:

def _compute_trust_adjustment(agent_did: str, ctx) -> float:
    corrections_received = count_corrections_targeting(agent_did, ctx)
    facts_written = count_facts_by(agent_did, ctx)
    if facts_written == 0:
        return 1.0
    correction_rate = corrections_received / facts_written
    # High correction rate → reduced trust
    # 0% corrections → 1.0, 50%+ → 0.5
    return max(0.5, 1.0 - correction_rate)
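
The ADR does not spell out how the track-record adjustment combines with the trust-level cap. One plausible reading, sketched here under that assumption, is that the adjustment scales the cap before the hint is clamped. The function name effective_confidence and the clamping of out-of-range hints are illustrative, not part of the specified API.

```python
def effective_confidence(hint: float, trust_cap: float, correction_rate: float) -> float:
    """Cap the agent's hint by its trust level, scaled by its track record.

    Assumption: the adjustment multiplies the cap. The ADR only says that a
    frequently corrected agent has its trust level reduced.
    """
    # Same formula as _compute_trust_adjustment: 0% -> 1.0, 50%+ -> 0.5.
    adjustment = max(0.5, 1.0 - correction_rate)
    # Clamp out-of-range hints into [0, 1] before applying the cap.
    hint = min(max(hint, 0.0), 1.0)
    return min(hint, trust_cap * adjustment)
```

Under this reading, an established agent (cap 0.9) with half of its facts corrected ends up with a ceiling of 0.45, while an anonymous agent hinting 0.99 is still held to 0.3.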

5. Source Attestation Levels (mitigates T3)

Sources are classified by how they were verified, not by what the agent claims:

class SourceAttestation:
    SELF_REPORTED = "self-reported"
    # Agent claims it from a source. No verification.
    # Lowest trust. Default for all agent-provided sources.

    TOOL_OBSERVED = "tool-observed"
    # The gateway observed the agent invoke a capability that
    # accessed this source (e.g., file read, API call) in the
    # current session's provenance graph.

    CONTENT_HASHED = "content-hashed"
    # The fact content can be verified against a hash of the
    # source material (e.g., SHA-256 of the file at read time).

    HUMAN_CONFIRMED = "human-confirmed"
    # A human explicitly marked this fact as verified.

    SCITT_ANCHORED = "scitt-anchored"
    # Fact hash registered in a transparency log.
    # Highest trust for automated systems.

The gateway automatically upgrades attestation when it can:

def _determine_attestation(source_description: str, session) -> SourceAttestation:
    # Check if the session's provenance graph shows a matching
    # capability invocation (file read, API call, etc.)
    if session.provenance.has_activity_matching(source_description):
        return SourceAttestation.TOOL_OBSERVED

    return SourceAttestation.SELF_REPORTED
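
The CONTENT_HASHED level depends on recording a digest of the source material at read time and comparing against it later. A minimal sketch of that check, using only the standard library, might look like the following. The helper names content_digest and matches_recorded_digest are hypothetical, and the "sha256:" prefix simply mirrors the notation used for provenance hashes in this ADR.

```python
import hashlib
import hmac


def content_digest(data: bytes) -> str:
    """Digest of the source material as it was read (e.g. file bytes)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def matches_recorded_digest(data: bytes, recorded: str) -> bool:
    """True if the material still hashes to the digest recorded at read time."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(content_digest(data), recorded)
```

A fact would only qualify for content-hashed attestation while the check passes; if the source changes, the digest no longer matches and the claim falls back to a lower level.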

6. Correction Policies (mitigates T4)

Cross-agent corrections are gated by Cedar policies:

// Agents can always correct their own facts
permit(
    principal,
    action == Action::"memory.correct",
    resource
) when {
    resource.agent_did == principal.did
};

// Cross-agent corrections require reviewer role
permit(
    principal,
    action == Action::"memory.correct",
    resource
) when {
    resource.agent_did != principal.did &&
    principal has role &&
    principal.role == "reviewer"
};

// Humans can correct any fact
permit(
    principal,
    action == Action::"memory.correct",
    resource
) when {
    principal has trust_level &&
    principal.trust_level == "human"
};

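// Nobody can correct SCITT-anchored facts without human approval.
// The `has` guards matter: a policy that errors on a missing attribute
// is skipped, which would silently drop this forbid.
forbid(
    principal,
    action == Action::"memory.correct",
    resource
) when {
    resource has source_attestation &&
    resource.source_attestation == "scitt-anchored" &&
    !(principal has trust_level && principal.trust_level == "human")
};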

When a cross-agent correction is permitted, the gateway records both the original author and the corrector:

ex:correction-42 a brain:Correction ;
    brain:supersedes ex:fact-17 ;
    brain:originalAuthor did:key:agent-a ;
    brain:correctedBy did:key:agent-b ;
    brain:reason "Verified in source code: value is 60s not 30s" ;
    prov:wasAssociatedWith did:key:agent-b ;
    prov:generatedAtTime "2026-04-18T14:30:00Z" .
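
For unit tests that should not depend on a policy engine, the intent of the four correction policies can be mirrored in plain Python. This is only a sketch of the decision logic (forbid overrides permit, default deny), not a substitute for Cedar evaluation. The function name may_correct and the dict-shaped principal and resource are assumptions.

```python
def may_correct(principal: dict, resource: dict) -> bool:
    """Mirror of the correction policies: forbid wins, then any permit, else deny."""
    is_human = principal.get("trust_level") == "human"

    # Forbid: SCITT-anchored facts can only be corrected by humans.
    if resource.get("source_attestation") == "scitt-anchored" and not is_human:
        return False

    # Permit: agents can always correct their own facts.
    if resource["agent_did"] == principal["did"]:
        return True
    # Permit: cross-agent corrections require the reviewer role.
    if principal.get("role") == "reviewer":
        return True
    # Permit: humans can correct any fact. Everything else is denied by default.
    return is_human
```

Note that the forbid also blocks a non-human agent from correcting its own SCITT-anchored fact, which matches the policy set as written.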

7. Write Budgets (mitigates T5)

Per-agent write limits prevent graph poisoning:

[memory.budgets]
default_max_facts_per_hour = 100
default_max_facts_total = 10000
anonymous_max_facts_per_hour = 10
anonymous_max_facts_total = 100

[memory.budgets.overrides]
"did:key:z6MkCIBot" = { max_facts_per_hour = 500 }  # CI bots write more

The gateway tracks writes per agent and rejects memory.learn calls that exceed the budget:

def _check_write_budget(agent_did: str, config) -> bool:
    recent_count = count_facts_by_agent_since(agent_did, hours=1)
    limit = config.get_limit(agent_did, "max_facts_per_hour")
    return recent_count < limit

Budget exhaustion returns a structured error with the limit and reset time, not a silent failure.
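
A structured budget error could carry the limit and the reset time as attributes. The sketch below assumes a sliding one-hour window over in-memory write timestamps. BudgetExceeded and check_write_budget are hypothetical names, and a real gateway would count writes in the provenance graph instead.

```python
import datetime


class BudgetExceeded(Exception):
    """Structured error: carries the limit and when the next write is allowed."""

    def __init__(self, limit: int, reset_at: datetime.datetime):
        super().__init__(f"write budget of {limit} exceeded; resets at {reset_at.isoformat()}")
        self.limit = limit
        self.reset_at = reset_at


def check_write_budget(write_times, limit: int, now: datetime.datetime,
                       window: datetime.timedelta = datetime.timedelta(hours=1)) -> None:
    """Raise BudgetExceeded if `limit` writes already fall inside the window."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    recent = sorted(t for t in write_times if now - t < window)
    if len(recent) < limit:
        return
    # The budget frees up once enough of the oldest writes leave the window
    # for the recent count to drop below the limit.
    reset_at = recent[len(recent) - limit] + window
    raise BudgetExceeded(limit=limit, reset_at=reset_at)
```

A caller can catch BudgetExceeded and surface exc.limit and exc.reset_at in the capability's error payload.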

8. Hash-Chained Provenance (mitigates T7)

Every provenance record includes a hash of the previous record, creating a tamper-evident chain:

ex:prov-001 a prov:Activity ;
    prov:wasAssociatedWith did:key:agent-a ;
    prov:generatedAtTime "2026-04-18T10:00:00Z" ;
    brain:action "memory.learn" ;
    brain:factIri ex:fact-001 ;
    brain:prevHash "sha256:0000...0000" ;  # genesis
    brain:selfHash "sha256:a1b2c3..." .

ex:prov-002 a prov:Activity ;
    prov:wasAssociatedWith did:key:agent-b ;
    prov:generatedAtTime "2026-04-18T10:01:00Z" ;
    brain:action "memory.learn" ;
    brain:factIri ex:fact-002 ;
    brain:prevHash "sha256:a1b2c3..." ;   # must match prov-001.selfHash
    brain:selfHash "sha256:d4e5f6..." .

Verification: trails memory verify walks the chain and reports any broken links. This runs:

  • On memory.compact (before pruning)
  • On trails doctor (health check)
  • On federation peer handshake (before accepting remote facts)

Self-hash computation:

import hashlib
import json

def _compute_provenance_hash(activity: dict, prev_hash: str) -> str:
    # Canonical form: fixed field set with sorted keys, so the same record
    # always serializes to the same bytes before hashing.
    canonical = json.dumps({
        "agent": activity["agent_did"],
        "action": activity["action"],
        "fact": activity["fact_iri"],
        "timestamp": activity["timestamp"],
        "prev_hash": prev_hash,
    }, sort_keys=True)
    return f"sha256:{hashlib.sha256(canonical.encode()).hexdigest()}"
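
To illustrate what walking the chain could involve, the sketch below appends records and then locates the first broken link. It re-implements the hash function so the block is self-contained. GENESIS_HASH, append_record, and first_broken_link are hypothetical names, and the 64-zero genesis value is a placeholder, since the ADR elides the exact string.

```python
import hashlib
import json

# Assumed genesis marker: the ADR elides the exact value, so this is a placeholder.
GENESIS_HASH = "sha256:" + "0" * 64


def provenance_hash(activity: dict, prev_hash: str) -> str:
    """Same canonicalization as the self-hash computation above."""
    canonical = json.dumps({
        "agent": activity["agent_did"],
        "action": activity["action"],
        "fact": activity["fact_iri"],
        "timestamp": activity["timestamp"],
        "prev_hash": prev_hash,
    }, sort_keys=True)
    return "sha256:" + hashlib.sha256(canonical.encode()).hexdigest()


def append_record(chain: list, activity: dict) -> dict:
    """Link a new record to the current tail of the chain."""
    prev_hash = chain[-1]["self_hash"] if chain else GENESIS_HASH
    record = dict(activity, prev_hash=prev_hash,
                  self_hash=provenance_hash(activity, prev_hash))
    chain.append(record)
    return record


def first_broken_link(chain: list):
    """Return the index of the first record that fails verification, or None."""
    expected_prev = GENESIS_HASH
    for index, record in enumerate(chain):
        if (record["prev_hash"] != expected_prev
                or record["self_hash"] != provenance_hash(record, expected_prev)):
            return index
        expected_prev = record["self_hash"]
    return None
```

Editing any hashed field of a stored record changes its recomputed hash, so the walk stops at that record. Rewriting self_hash to match would instead break the prev_hash link of the next record.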

9. Anomaly Detection (mitigates T5, T2)

The gateway emits events that an anomaly detector can consume:

class MemoryAuditEvent:
    LEARN = "memory.learn"
    CORRECT = "memory.correct"
    FORGET = "memory.forget"
    BUDGET_WARNING = "memory.budget_warning"   # 80% of limit
    BUDGET_EXCEEDED = "memory.budget_exceeded"
    CHAIN_BREAK = "memory.chain_break"         # hash chain integrity failure
    RAPID_WRITES = "memory.rapid_writes"       # burst detection
    CROSS_CORRECTION = "memory.cross_correction"  # agent correcting another

Phase 1: events logged to provenance graph (queryable via SPARQL). Phase 2: hook into trails.observability (ADR-0012) for OpenTelemetry export.
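
As one illustration of how the budget events might be selected, the helper below maps a write count to an event name, using the 80% warning threshold from the event list. The function name budget_event is hypothetical, and it assumes recent_count is the number of writes already inside the window when the call arrives.

```python
from typing import Optional


def budget_event(recent_count: int, limit: int) -> Optional[str]:
    """Pick the audit event, if any, for an agent's current write count."""
    if recent_count >= limit:
        # The incoming write is rejected, matching the budget check.
        return "memory.budget_exceeded"
    # Integer arithmetic for the 80% threshold avoids float rounding surprises.
    if recent_count * 5 >= limit * 4:
        return "memory.budget_warning"
    return None
```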

10. Fact Quarantine

Facts from low-trust agents (anonymous, new, high correction rate) enter a quarantine zone:

def _determine_initial_scope(trust_level: TrustLevel, requested_scope: str) -> str:
    if trust_level in (TrustLevel.ANONYMOUS, TrustLevel.AUTHENTICATED):
        # Low-trust agents' "shared" facts go to quarantine
        if requested_scope == "shared":
            return "pending_review"
        return requested_scope
    return requested_scope

Facts in pending_review scope:

  • Are NOT visible to memory.recall with scope: "shared"
  • ARE visible to memory.recall with scope: "pending_review" (for reviewers)
  • Can be promoted to shared by a human or a trusted agent via memory.promote
  • Are automatically promoted after N days without correction (configurable)
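
The auto-promotion rule can be sketched as a pure predicate over a fact record. The dict keys used here (scope, corrected, learned_at) and the function name should_auto_promote are assumptions for illustration. The 7-day default mirrors quarantine_auto_promote_days in the configuration.

```python
import datetime


def should_auto_promote(fact: dict, now: datetime.datetime,
                        auto_promote_days: int = 7) -> bool:
    """True if a quarantined, uncorrected fact has aged past the threshold."""
    if fact["scope"] != "pending_review" or fact["corrected"]:
        # Only uncorrected facts still in quarantine are candidates.
        return False
    age = now - fact["learned_at"]
    return age >= datetime.timedelta(days=auto_promote_days)
```

A periodic job could run this over quarantined facts and move the matches to the shared scope, recording the promotion in the provenance graph.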

11. Integration with Existing Trust Stack

| Trails Primitive | Role in Memory Security |
|------------------|-------------------------|
| Biscuit tokens (ADR-0010) | Session authentication; DID extraction; capability attenuation |
| DIDs (ADR-0011) | Agent identity; trust level resolution; cross-instance identity |
| Cedar policies (ADR-0006) | Correction access control; write budget enforcement; scope gating |
| PROV-O (ADR-0009) | Provenance graph; hash chain storage; activity records |
| Cost envelopes (ADR-0012) | Write budget tracking; federated cost attribution |
| SCITT (reference application) | External trust anchor for high-stakes facts |
| Federation (ADR-0023) | Hash chain verification on peer handshake; remote fact quarantine |

12. Configuration

[memory.security]
# Identity
require_authentication = false       # true in production
allow_anonymous_writes = true        # false in production
anonymous_trust_level = "anonymous"  # caps confidence at 0.3

# Confidence
enable_confidence_calibration = true
enable_trust_adjustment = true       # auto-reduce trust for oft-corrected agents

# Source
default_attestation = "self-reported"
auto_upgrade_tool_observed = true    # check session provenance for matching reads

# Corrections
require_reviewer_for_cross_correction = false  # true in production
protect_scitt_anchored_facts = true

# Write budgets
enable_write_budgets = true
default_max_facts_per_hour = 100

# Integrity
enable_hash_chain = true
verify_chain_on_compact = true
verify_chain_on_federation = true

# Quarantine
enable_quarantine = false            # true in production
quarantine_auto_promote_days = 7

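Progressive enhancement: The restrictive controls (required authentication, reviewer-gated cross-agent corrections, quarantine) default to OFF (permissive) for development, while low-friction protections such as confidence calibration, write budgets, and hash chaining default to ON. Production deployments enable the remaining features they need. This follows the Trails principle (ADR-0021): start simple, add constraints when needed.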

Non-goals

  • Formal verification of fact content — the gateway ensures provenance integrity, not factual correctness. Whether "this library is safe" is true is outside scope; whether the claim is attributed to the right agent with the right confidence is in scope.
  • Encryption at rest — fact content is stored in cleartext in Oxigraph. Encryption belongs to the storage layer configuration, not the memory security model.
  • Consensus protocols — no Raft/PBFT between multiple memory instances. Federation is eventually consistent with policy gating, not consensus-based.
  • Replacing human judgment — the system provides tools (attestation levels, correction policies, anomaly events) for humans to make trust decisions. It does not automate "is this fact true?"

Consequences

Positive

  • Provenance integrity guaranteed by architecture, not by agent cooperation
  • Zero-trust agent model — new agents are untrusted by default, earn trust through track record
  • Progressive security — start permissive for dev, harden for production
  • Auditable — every security decision (trust level, confidence cap, quarantine, correction policy) is recorded in the provenance graph
  • Compatible with submission access — SCITT anchoring and hash chains directly support the reference application's regulatory use case

Negative

  • Latency overhead — gateway adds hash computation, policy evaluation, and trust resolution per write. Mitigation: these are O(1) operations; the graph query is still the bottleneck.
  • Complexity for simple use cases: a single developer using memory locally doesn't need quarantine or trust levels. Mitigation: the restrictive controls (required authentication, reviewer-gated corrections, quarantine) default to OFF; the Phase 1 example app continues to work unchanged.
  • Trust bootstrapping — new agents start with low trust, which may feel restrictive. Mitigation: configurable initial trust level; fast promotion path for authenticated agents.

Neutral

  • The agent-facing API changes minimally: agent_did becomes injected rather than passed, confidence becomes confidence_hint, and source becomes source_description. All other parameters remain the same.
  • Existing facts in a memory instance without hash chains can be back-filled by running trails memory seal which computes the chain from existing provenance records.

Revisit conditions

  • When a second production consumer deploys memory (beyond the submission-access use case): validate that the trust levels and quarantine thresholds work for a non-regulated use case.
  • When trails.vector integrates with memory recall: ensure vector similarity scores don't bypass confidence calibration.
  • When multi-instance federation is tested under adversarial conditions: validate hash chain verification across network partitions.

Alternatives considered

  1. Fully trust agents (status quo): Rejected — works for single-user dev but breaks in any shared or adversarial context.
  2. Blockchain for provenance: Rejected — too heavy for the use case. Hash chains provide tamper evidence without consensus overhead. SCITT anchoring covers the external trust requirement.
  3. Capability-based access control only (no identity): Rejected — you need to know WHO wrote a fact, not just WHETHER they were allowed to. Identity attribution is essential for trust calibration and correction policies.
  4. Central authority for fact verification: Rejected — introduces a single point of failure and doesn't scale to federated scenarios. Distributed trust (DID + attestation levels + Cedar policies) is more resilient.

Open questions

  • Q: Should confidence_hint be renamed to just confidence with the calibration happening transparently? Recommendation: Keep as confidence_hint in the gateway API to make it explicit that the agent's stated confidence is not the final value. The stored confidence field is the calibrated result.
  • Q: Should hash chain verification be synchronous (blocking writes) or asynchronous (background check)? Recommendation: Synchronous for the chain append (cheap — one hash computation). Asynchronous for full chain verification (expensive — walks entire chain). trails memory verify is the explicit full-check command.
  • Q: How should the system handle a detected hash chain break? Recommendation: Log a CHAIN_BREAK audit event, flag all facts after the break as integrity: "unverified", and alert the operator. Do NOT auto-repair — a break indicates either a bug or tampering, both requiring human investigation.
  • Q: Should quarantine apply to corrections too, or only to new facts? Recommendation: Apply to cross-agent corrections from low-trust agents. Self-corrections (an agent correcting its own facts) bypass quarantine.