ADR-0018: Agent runtime — planning strategies, LLM client, session context

  • Status: Accepted (2026-04-19)
  • Date: 2026-04-14
  • Supersedes:
  • Superseded by:
  • Target milestone: M9 (proposed v3.0.0)

Context

Trails' capability dispatch surface (@capability + invoke(), see python/src/trails/decorators.py, runtime.py) is pure function dispatch: a typed, provenance-stamped, budgeted call from principal to handler. That is the right shape for the kernel — small, boring, auditable. It is not an agent.

Building an actual agent on top of Trails today looks like this:

  1. Write an outer loop (ReAct, Plan-and-Execute, whatever).
  2. Pick an LLM client (Anthropic SDK / OpenAI SDK / Ollama / LiteLLM / LangChain) and wire retries, caching, prompt caching, model selection.
  3. Thread a conversation-history / tool-invocation / auth-state object through the loop by hand.
  4. Remember to feed the MCP tools/list — or the richer /.well-known/capabilities manifest (ADR-0005) — back into the planner prompt.
  5. Call trails.invoke() from inside step 2's tool-call handler.
  6. Hope that the cost of step 2 lands in CostTracker (cost.py) — it does not, by default, because invoke() only tracks per-capability costs.
  7. Hope that the PROV-O graph (ADR-0009) captures the reasoning trace — it does not, because PROV-O only fires on @capability invocations.

Every Trails application so far has reinvented these seven pieces. This is exactly the friction Rails eliminated for web apps (routing, controllers, sessions, CSRF, rendering), and it is exactly the friction Fabrica's AgentExecutor eliminates for its ecosystem by baking in ReAct / Plan-and-Execute / Reflexion as first-class.

Without a framework-owned agent runtime:

  • Cost telemetry is partial. The LLM-side spend (empirically 80–95% of agent-app budgets) bypasses trails.cost. ADR-0012 is fulfilled at the capability boundary but violated at the LLM boundary.
  • Provenance is partial. The reasoning steps ("the model decided to call patient.intake because …") are not in the prov: graph. ADR-0009 is fulfilled at the capability boundary but violated at the planning boundary.
  • The positioning claim is aspirational. "Rails for agentic KG apps" with no planning loop is the same category error as "Rails without ActionController" — the component the claim advertises is missing.

The main alternatives (see "Alternatives considered" below) are: ship dispatch-only and document patterns; depend on LangChain; or ship only an LLM client without planners. The first two fail the cost/provenance ADRs; the third still leaves every user reinventing ReAct.

Decision

Trails ships three composable primitives as one coordinated surface, introduced together in M9. They extend @capability / invoke(); they do not replace it. Apps that already have a planning loop keep using raw invoke() — the opt-out path stays clean.

1. Planning strategies — trails.agent.planners

Surface sketch.

trails.agent.planners
    PlanningStrategy          (Protocol / ABC)
    react(ctx, capabilities, goal, *, max_steps=...)
    plan_and_execute(ctx, capabilities, goal, *, replan_after=...)
    reflexion(ctx, capabilities, goal, *, critique_every=...)

Semantics. A planning strategy is a function (or callable) that consumes a Context, a list of discovered Capability descriptors (from /.well-known/capabilities, ADR-0005), and a user goal, and emits a sequence of trails.invoke() calls interleaved with LLM reasoning turns. Each strategy owns its loop shape: ReAct (think/act/observe cycles until goal or budget), Plan-and-Execute (LLM plan up-front, kernel executes steps, LLM replans on error or completion), Reflexion (ReAct cycles with an explicit self-critique phase between them). Strategies are pure orchestration — they produce no side effects the kernel does not already stamp. Selection happens in trails.toml ([agent] strategy = "react") or per-session (Session(strategy="reflexion")).
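The ReAct loop shape described above can be sketched with stubs. This is a minimal illustration, not the Trails implementation: `Decision`, `StubContext`, `scripted_think`, and `stub_invoke` are hypothetical names, and the stub context stands in for whatever the real `ctx` exposes for reasoning turns and for `trails.invoke()`.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Decision:
    """One reasoning turn: either name a capability to call, or finish."""
    thought: str
    capability_id: Optional[str] = None   # None means the model is done
    args: dict = field(default_factory=dict)
    answer: Optional[str] = None


@dataclass
class StubContext:
    """Hypothetical stand-in for the planner-facing Context."""
    think: Callable[..., Decision]   # stands in for an LLM reasoning turn
    invoke: Callable[..., Any]       # stands in for trails.invoke()


def react(ctx, capabilities, goal, *, max_steps=5):
    """Think/act/observe until the model finishes or the step budget runs out."""
    observations = []
    for _ in range(max_steps):
        decision = ctx.think(goal, capabilities, observations)       # think
        if decision.capability_id is None:
            return decision.answer, observations
        result = ctx.invoke(decision.capability_id, **decision.args)  # act
        observations.append((decision.capability_id, result))         # observe
    return None, observations   # step budget exhausted without an answer


def scripted_think(goal, capabilities, observations):
    # A scripted "model": fetch once, then answer from the observation.
    if not observations:
        return Decision("need the document", "evidence.fetch", {"doc_id": "d1"})
    return Decision("have enough", answer=f"summary of {observations[-1][1]}")


def stub_invoke(capability_id, **args):
    return f"{capability_id}({args['doc_id']})"


ctx = StubContext(think=scripted_think, invoke=stub_invoke)
answer, trace = react(ctx, ["evidence.fetch"], "summarize d1", max_steps=3)
```

The loop itself performs no side effects; everything observable happens inside `ctx.think` and `ctx.invoke`, which is where the kernel would stamp cost and provenance.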

Relationship to existing ADRs.

  • ADR-0005 (rich manifest): planners consume the canonical JSON-LD descriptor, not the MCP projection; they read cost, preconditions, side_effects to make cheaper / safer plans. The manifest finally has a first-party consumer inside Trails.
  • ADR-0009 (provenance): each planner step that triggers invoke() produces the usual PROV-O activity; the planner additionally emits one prov:Activity per LLM reasoning turn, linked via prov:wasInformedBy from each resulting capability activity.
  • ADR-0012 (cost): each planner step passes through CostAccountant; a budget-exhausted principal raises BudgetExceeded from inside the loop, terminating the strategy cleanly.
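The provenance link described for ADR-0009 can be illustrated with plain triples. This sketch assumes a Python set of tuples as the graph; the `urn:example:` type IRIs are placeholders, since the ADR does not give the namespace behind `llm:Completion`. Only the PROV and RDF IRIs are the real W3C ones.

```python
import uuid

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Placeholder type IRIs (assumptions, not Trails vocabulary).
LLM_COMPLETION = "urn:example:llm#Completion"
CAPABILITY_CALL = "urn:example:trails#CapabilityInvocation"


def new_activity(graph, kind):
    """Mint an activity IRI and type it as prov:Activity plus a specific kind."""
    iri = f"urn:uuid:{uuid.uuid4()}"
    graph.add((iri, RDF_TYPE, PROV + "Activity"))
    graph.add((iri, RDF_TYPE, kind))
    return iri


def record_step(graph):
    """One planner step: a reasoning turn followed by the capability call it chose."""
    llm_iri = new_activity(graph, LLM_COMPLETION)
    cap_iri = new_activity(graph, CAPABILITY_CALL)
    # The capability activity points back at the reasoning turn that caused it.
    graph.add((cap_iri, PROV + "wasInformedBy", llm_iri))
    return llm_iri, cap_iri


graph = set()
llm_iri, cap_iri = record_step(graph)
```

Each step adds five triples: two type assertions per activity and one `prov:wasInformedBy` edge from the capability activity to the reasoning turn.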

2. LLM client — trails.llm

Surface sketch.

trails.llm
    LLMClient                 (thin adapter)
    select_model(task, *, budget_hint=...) -> ModelSpec
    complete(prompt, *, model, cache=True, principal=...) -> Response
    stream(prompt, ...)       -> Iterator[Chunk]
    Response                  (dataclass: text, tool_calls, tokens, usd, cache_hit)

Semantics. A thin adapter, not a new abstraction layer over LiteLLM / LangChain. trails.llm owns exactly the properties the framework must own to honour ADR-0009 and ADR-0012: (a) every complete() / stream() call opens a cost envelope tagged with the acting principal, plugs into trails.cost.CostTracker, and respects per-principal budgets; (b) every call emits a prov:Activity of type llm:Completion linked to the calling capability or planner activity; (c) retry / rate-limit / Anthropic + OpenAI prompt-caching are implemented once here, not in user code; (d) model selection is declarative (ModelSpec) so the planner can cost-compare models the way it cost-compares capabilities. Supported providers at M9: Anthropic (primary), Ollama (local), and a pluggable Adapter interface to let apps add OpenAI / Azure / Bedrock without the core taking a dependency.
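Properties (a) and (b) can be sketched as a wrapper around a provider call. This is an illustration under stated assumptions: `SpendLedger` stands in for trails.cost.CostTracker (whose real API is not shown here), `SketchLLMClient` and `fake_adapter` are hypothetical, provenance is a list of dicts rather than RDF, and budget enforcement, retries, and caching are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    # Field names follow the Response sketch in the surface listing.
    text: str
    tool_calls: list
    tokens: int
    usd: float
    cache_hit: bool


@dataclass
class SpendLedger:
    """Stand-in for a cost tracker: accumulates spend per principal."""
    by_principal: dict = field(default_factory=dict)

    def record(self, principal, usd):
        self.by_principal[principal] = self.by_principal.get(principal, 0.0) + usd


class SketchLLMClient:
    def __init__(self, adapter, ledger, prov_log):
        self.adapter = adapter      # callable(prompt, model) -> Response
        self.ledger = ledger
        self.prov_log = prov_log

    def complete(self, prompt, *, model, principal, parent_activity=None):
        response = self.adapter(prompt, model)
        # (a) attribute the spend to the acting principal
        self.ledger.record(principal, response.usd)
        # (b) record one provenance entry per call, linked to its caller
        self.prov_log.append({
            "type": "llm:Completion",
            "principal": principal,
            "model": model,
            "tokens": response.tokens,
            "informed": parent_activity,
        })
        return response


def fake_adapter(prompt, model):
    # Deterministic fake provider so the sketch runs offline.
    return Response(text=prompt.upper(), tool_calls=[],
                    tokens=len(prompt.split()), usd=0.25, cache_hit=False)


ledger, prov_log = SpendLedger(), []
client = SketchLLMClient(fake_adapter, ledger, prov_log)
first = client.complete("summarize this", model="stub-model", principal="did:key:zAlice")
second = client.complete("and this too", model="stub-model", principal="did:key:zAlice")
```

Because the accounting lives in `complete()` rather than in the adapter, a new provider adapter inherits cost and provenance handling without reimplementing either.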

Relationship to existing ADRs.

  • ADR-0009 (provenance): LLM calls join the prov: graph on equal footing with capability calls. The graph goes from "what did the agent do" to "what did the agent think and do."
  • ADR-0012 (cost): trails.llm is where cost.py's claim of being "a framework primitive" finally covers the dominant cost source.
  • ADR-0008 (MCP primary): trails.llm is deliberately not a transport layer; it is a client for model providers. It does not compete with MCP; planners call MCP-exposed capabilities via invoke(), and call LLMs via trails.llm.

3. Context / Session — trails.agent.context

Surface sketch.

trails.agent.context
    Session                   (user-facing)
        new(*, strategy, principal, budget=None, agent_card=None) -> Session
        run(goal) -> Result
        history: TokenWindow          # ephemeral conversation
        invocations: list[Envelope]   # capability calls this session
        auth: AuthState
        card: AgentCard | None        # ADR-0015 WoT card
        persist(*, graph="session:<uuid>") -> IRI
        @classmethod replay(graph_iri) -> Session
    Context                   (internal, passed to planners + handlers)

Semantics. Session is ephemeral per-session state; it is not conflated with the RDF knowledge graph the application serves. It holds: (a) a token-windowed conversation history (default: model's context window minus a safety margin; eviction strategy pluggable), (b) the ordered list of invoke() envelopes made in this session, (c) the current auth state (principal, biscuit, DID), and (d) the active WoT agent card (ADR-0015) if one is bound. Context is the planner-facing view of Session: handlers already receive a ctx argument today (see decorators._extract_params skipping leading ctx), and M9 gives that ctx real shape. A session is persistable on user request: calling session.persist() writes a replayable projection into a named graph (e.g., session:<uuid>), enabling Session.replay(iri) for audit and regression testing. Persistence is opt-in because most sessions are not worth keeping.
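A token-windowed history with FIFO eviction might look roughly like the following. The class body is an assumption: the ADR names `TokenWindow` but not its interface, and the whitespace token counter is a placeholder for a real tokenizer.

```python
from collections import deque


class TokenWindow:
    """Sketch of a bounded conversation history with oldest-first eviction."""

    def __init__(self, max_tokens, count_tokens=lambda text: len(text.split())):
        self.max_tokens = max_tokens
        self._count = count_tokens          # placeholder tokenizer (assumption)
        self._turns = deque()               # (text, token_count) pairs
        self.total = 0

    def append(self, text):
        tokens = self._count(text)
        self._turns.append((text, tokens))
        self.total += tokens
        # Evict oldest turns until we fit; always keep the newest turn.
        while self.total > self.max_tokens and len(self._turns) > 1:
            _, dropped = self._turns.popleft()
            self.total -= dropped

    def turns(self):
        return [text for text, _ in self._turns]


window = TokenWindow(max_tokens=5)
window.append("a b c")   # 3 tokens
window.append("d e")     # 5 tokens in total, still fits
window.append("f g")     # 7 tokens, so the oldest turn is evicted
```

A pluggable eviction strategy would replace the `while` loop; FIFO is shown here only because it is the simplest policy to reason about.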

Relationship to existing ADRs.

  • ADR-0009 (provenance): session-level prov:Activity becomes the parent activity for every LLM + capability step in that session, linked via prov:wasInformedBy; the graph now has a session root, not just a forest of per-invocation activities.
  • ADR-0015 (WoT agent card): the session's card field is how an inbound agent card enters the runtime; planners use it to decide which capabilities the calling agent is allowed to see.
  • ADR-0013 (ACT/ECT): session persistence into a named graph inherits the export-boundary rules; an L1 session cannot be exported cross-domain (ADR-0009 Update (c)).

Worked example — combined surface (under 35 lines)

# user code                              (U)
# library code                           (L)
from trails import capability, Session   # L
from trails.agent import planners        # L
from trails.llm import select_model      # L

@capability(id="evidence.fetch",         # U
            description="Fetch a document from the evidence graph",
            cost_estimate={"usd": 0.0, "tokens": 0})
def fetch(ctx, doc_id: str):             # U
    return ctx.graph.get(doc_id)         # U (app-level KG read)

@capability(id="evidence.summarize",     # U
            cost_estimate={"usd": 0.02, "tokens": 1500})
def summarize(ctx, text: str):           # U
    return ctx.llm.complete(             # U -> L (auto cost + prov)
        f"Summarize:\n{text}",
        model=select_model("summarize", budget_hint="cheap"),
    ).text

session = Session.new(                   # L (ephemeral, not KG-backed)
    strategy="react",                    # U (picks ReAct planner)
    principal="did:key:zAlice",          # U
    budget={"usd": 0.50, "tokens": 50_000},  # U (ADR-0012)
)
result = session.run(                    # L drives ReAct loop:
    "Summarize every regulatory filing "
    "touching product SKU-7 this quarter"
)                                        # each invoke() -> PROV (ADR-0009)
                                         # each LLM call -> PROV + cost
print(result.answer)                     # U
session.persist(graph="audit:SKU-7-Q1")  # U (opt-in KG write)

Library-owned lines (L) expand; user-owned lines (U) do not. A developer writes capabilities and a goal string; the framework writes the loop, the LLM calls, the cost envelope, the PROV-O graph, and the optional audit persistence.

Phased delivery (M9 scope)

Phase 1 — trails.llm + Context

  • trails.llm.LLMClient with complete / stream, Anthropic and Ollama adapters, retry + Anthropic / OpenAI-style prompt caching, per-call cost envelope plugged into trails.cost, per-call prov:Activity.
  • trails.agent.context.Context + Session skeleton with TokenWindow (default FIFO eviction, strategy pluggable).
  • Unit tests: cost accounting correctness, PROV-O emission shape, token-window eviction bounds.

Phase 2 — ReAct strategy

  • trails.agent.planners.react(...) consuming the rich capability manifest.
  • End-to-end integration with invoke(): each action step routes through the existing dispatch path; no new side-effect plumbing.
  • PROV-O per step: llm:Completion activity + capability prov:Activity, linked via prov:wasInformedBy.
  • Integration test: compliance-shaped goal with two capabilities, asserts budget enforcement terminates the loop on exhaustion.
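The budget-enforcement assertion in the Phase 2 integration test could take roughly this shape. `Budget` and `run_loop` are hypothetical stand-ins for the real accounting and planner; amounts are integer cents to keep the arithmetic exact, which is a choice made for this sketch rather than something the ADR specifies.

```python
class BudgetExceeded(Exception):
    """Raised when a charge would exceed the principal's remaining budget."""


class Budget:
    def __init__(self, cents):
        self.remaining = cents

    def charge(self, cents):
        if cents > self.remaining:
            raise BudgetExceeded(f"need {cents}, have {self.remaining}")
        self.remaining -= cents


def run_loop(budget, step_cost, max_steps):
    """Run up to max_steps charged steps; stop cleanly when the budget runs out."""
    completed = 0
    try:
        for _ in range(max_steps):
            budget.charge(step_cost)   # raises from inside the loop
            completed += 1
    except BudgetExceeded:
        return completed, "budget_exhausted"
    return completed, "max_steps"


exhausted = run_loop(Budget(50), step_cost=20, max_steps=10)
within = run_loop(Budget(50), step_cost=20, max_steps=2)
```

The first run completes two steps and then stops on the third charge; the second run ends on its step limit with budget to spare. Distinguishing the two termination reasons is what lets the test assert that exhaustion, not the step cap, ended the loop.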

Phase 3 — Plan-and-Execute + Reflexion

  • trails.agent.planners.plan_and_execute(...) with explicit plan / execute / replan phases.
  • trails.agent.planners.reflexion(...) with critique-every-N cycles.
  • Strategy chooser: trails.toml [agent] strategy = "..." and per-session override.

Phase 4 — Session persistence to KG

  • Session.persist(graph=...) writes a replayable projection (conversation history, invocation envelope hashes, LLM activity IRIs, agent card ref) into the chosen named graph.
  • Session.replay(graph_iri) reconstructs the ephemeral state for audit / regression; replay is read-only and does not re-emit PROV.
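The persist and replay pair can be sketched against an in-memory store. Everything here is illustrative: a dict keyed by graph IRI stands in for the named graph, the projection is JSON-shaped rather than RDF, and `envelope_hash`, `persist`, and `replay` are hypothetical free functions rather than the Session methods.

```python
import hashlib
import json
import uuid


def envelope_hash(envelope):
    """Hash a canonical JSON form so equal envelopes hash equally."""
    canonical = json.dumps(envelope, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def persist(store, *, history, invocations, llm_activity_iris,
            card_ref=None, graph=None):
    """Write the replayable projection and return the graph IRI used."""
    graph_iri = graph or f"session:{uuid.uuid4()}"
    store[graph_iri] = {
        "history": list(history),
        "invocation_hashes": [envelope_hash(e) for e in invocations],
        "llm_activities": list(llm_activity_iris),
        "agent_card": card_ref,
    }
    return graph_iri


def replay(store, graph_iri):
    """Read-only reconstruction: returns a deep copy, never the stored object."""
    return json.loads(json.dumps(store[graph_iri]))


store = {}
iri = persist(
    store,
    history=["Summarize filings for SKU-7"],
    invocations=[{"capability": "evidence.fetch", "args": {"doc_id": "d1"}}],
    llm_activity_iris=["urn:uuid:0001"],
    graph="audit:SKU-7-Q1",
)
state = replay(store, iri)
state["history"].append("mutated after replay")  # does not touch the store
```

Storing envelope hashes rather than full envelopes keeps the projection small while still letting a replay detect whether a recorded invocation has been altered.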

Non-goals (scope fence)

  • Not a LangChain / LlamaIndex replacement. Trails is narrower and opinionated: PROV-O and cost envelopes on every step, rich-manifest consumption, MCP-first transport. Abstractions that cannot honour those constraints do not belong here.
  • No new LLM model support beyond the adapter targets. M9 ships Anthropic + Ollama + a pluggable Adapter. No fine-tuning, no training-loop integration, no bespoke inference server.
  • No agent-to-agent protocol beyond what MCP already covers (ADR-0008). Planners drive local invoke() calls; agent-to-agent handshake is the MCP surface and its associated WoT agent card (ADR-0015), not a new Trails-owned protocol.
  • Not an evaluation harness. trails sim / evals land elsewhere.

Consequences

Positive

  • "Rails for agentic KG" becomes concrete. The headline claim now points at a component that exists. A developer with capabilities written can reach a working agent in ~10 lines.
  • Cost + provenance apply uniformly. ADR-0009 and ADR-0012 cover the LLM and planning boundaries, not only the capability boundary. The dominant spend source is finally instrumented.
  • Planner can negotiate on the rich manifest. ADR-0005's cost, preconditions, side_effects fields get a first-party consumer inside Trails, not only external agents.
  • Replayable audit trails. Session.persist() gives regulators, debuggers, and CI the same artefact — a named graph they can SPARQL against.

Negative

  • A meaningful surface to maintain. Three primitives, provider adapters, strategy implementations, persistence format.
  • Risk of drifting into a generic agent framework. Mitigated by the scope fence above and by keeping the trails.llm adapter deliberately thin (no chain / graph / DAG abstractions — raw complete / stream only).
  • Provider churn exposure. Model SDKs and prompt-caching semantics change. Mitigated by versioning LLMClient and keeping adapter surface area minimal; an adapter is a ~150-line file, cheap to replace.

Neutral

  • Opt-out path stays clean. Apps that already have a planning loop keep calling trails.invoke() directly and pay no new cost — trails.agent and trails.llm are imported on use, not on import trails.
  • Session persistence competes with ADR-0017 ORM for KG-write attention. Resolved by scoping: sessions write only into the app-configurable session graph (default session:*), never into domain graphs the ORM owns.
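The "imported on use" behaviour mentioned for the opt-out path is commonly achieved with a module-level `__getattr__` (PEP 562). The sketch below shows the mechanism on a synthetic module; whether Trails uses this exact approach is an assumption, `make_lazy_package` is a hypothetical helper, and the standard-library `colorsys` module stands in for a heavy submodule such as trails.llm.

```python
import importlib
import types


def make_lazy_package(name, lazy_submodules):
    """Build a module whose listed attributes are imported on first access."""
    package = types.ModuleType(name)

    def lazy_getattr(attr):
        # Called only when normal attribute lookup on the module fails.
        if attr in lazy_submodules:
            module = importlib.import_module(lazy_submodules[attr])
            setattr(package, attr, module)   # cache so later lookups skip this hook
            return module
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    package.__getattr__ = lazy_getattr
    return package


demo = make_lazy_package("trails_demo", {"llm": "colorsys"})
loaded_before = "llm" in vars(demo)   # nothing bound yet
llm_module = demo.llm                 # first access triggers the import
loaded_after = "llm" in vars(demo)    # now cached on the package
```

In a real package the same function would simply be defined as `__getattr__` in the package's `__init__.py`.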

Alternatives considered

  • Keep dispatch-only; document ReAct / Plan-and-Execute as patterns. Rejected. Every existing Trails prototype app has reinvented these three primitives, none of them compatibly. This is the Fabrica thesis inverted: not shipping the agent loop is the wrong bet, because without it every user reinvents it worse.
  • Depend on LangChain / LlamaIndex for the agent loop; keep Trails dispatch-only. Rejected. LangChain's cost accounting is opaque to Trails' CostAccountant, and its callback machinery does not naturally project into PROV-O. Taking the dependency would break ADR-0009 (provenance always on) and ADR-0012 (cost as primitive) at exactly the boundary those ADRs need to cover.
  • Bake only the LLM client; leave planning strategies out. Rejected. This solves (a) budget-linked LLM calls and (c) prompt-caching but leaves every user still writing a ReAct loop by hand, against inconsistent conventions. The three primitives are co-valuable; shipping them together is cheaper than shipping them serially across three minor versions.
  • Ship only ReAct; add other strategies on demand. Deferred, not rejected. Phase 2 ships only ReAct; Phase 3 adds Plan-and-Execute and Reflexion. If Phase 3 slips past M9, the ADR still stands — the strategy slot exists.

Open questions

  1. LLM-internal provenance granularity. Do we emit a prov:Activity per token, per LLM call, or both (call-level as primary, optional token-level trace linked via prov:wasInformedBy)? Per-token emission would exhaust the NFR-Perf1 latency budget almost immediately; per-call may be too coarse for regulators who want to see the reasoning trace. Leaning toward per-call plus an opt-in token-trace named graph (prov:llm-tokens) that defaults to off and is never exported cross-boundary regardless of ADR-0013 assurance.
  2. Sync vs async LLMClient. The kernel is sync through M1 and moves to async in M1+ (guardrails §3). M9 is post-async, so async by default is the right call, but sync shims matter for scripts and tests. Ship both, or sync-over-async?
  3. Context vs ADR-0017 ORM overlap. Session.invocations holds capability envelopes; ActiveGraph models (ADR-0017) hold domain entities. When a capability returns an ActiveGraph instance, does the session hold the instance, its IRI, or a pickled snapshot? The replay semantics of each choice differ sharply.
  4. Strategy mix mid-session. Can a session switch strategies (e.g., start in Plan-and-Execute, fall back to ReAct on replan failure)? Convenient but complicates PROV graph topology — the root activity's prov:wasInformedBy tree gets messier.
  5. Prompt-cache coordination across sessions. Anthropic's prompt-cache keys are per-account. If two sessions on the same principal share a system prompt, do we key the cache on the system prompt hash (sharing cache) or on session.id (no sharing)? Sharing saves money; not sharing is simpler to reason about for audit replay.
  6. Budget handling when a planner and a capability both want to spend. A single invoke() inside a ReAct loop may itself call ctx.llm.complete() (see worked example). We open two cost envelopes (planner-step, capability-internal). Do they nest, or does the capability one attach to the capability's prov:Activity only? Double-counting must not happen.
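For question 6, the nesting option can be sketched to show why it avoids double-counting. This illustrates one of the two options only and is not a decision; `Envelope` is a hypothetical class and amounts are integer cents for exact arithmetic.

```python
class Envelope:
    """A cost envelope that records only its own spend and rolls children up."""

    def __init__(self, label, parent=None):
        self.label = label
        self.own = 0            # spend recorded directly in this envelope
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def spend(self, cents):
        self.own += cents       # recorded exactly once, at the leaf that spent it

    def total(self):
        return self.own + sum(child.total() for child in self.children)


step = Envelope("planner-step")
step.spend(3)                                   # the planner's reasoning turn
cap = Envelope("evidence.summarize", parent=step)
cap.spend(2)                                    # ctx.llm.complete() inside the capability
```

Because each amount is written to exactly one envelope and parents only sum their children, the planner step reports 5 cents in total while the capability's own activity still reports its 2 cents independently.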