ADR-0018: Agent runtime (planning strategies, LLM client, session context)
- Status: Accepted (2026-04-19)
- Date: 2026-04-14
- Supersedes: —
- Superseded by: —
- Target milestone: M9 (proposed v3.0.0)
Context
Trails' capability dispatch surface (@capability + invoke(), see
python/src/trails/decorators.py, runtime.py) is pure function
dispatch: a typed, provenance-stamped, budgeted call from principal to
handler. That is the right shape for the kernel — small, boring,
auditable. It is not an agent.
Building an actual agent on top of Trails today looks like this:
1. Write an outer loop (ReAct, Plan-and-Execute, whatever).
2. Pick an LLM client (Anthropic SDK / OpenAI SDK / Ollama / LiteLLM / LangChain) and wire retries, caching, prompt caching, model selection.
3. Thread a conversation-history / tool-invocation / auth-state object through the loop by hand.
4. Remember to feed the MCP tools/list, or the richer /.well-known/capabilities manifest (ADR-0005), back into the planner prompt.
5. Call trails.invoke() from inside step 2's tool-call handler.
6. Hope that the cost of step 2 lands in CostTracker (cost.py). It does not, by default, because invoke() only tracks per-capability costs.
7. Hope that the PROV-O graph (ADR-0009) captures the reasoning trace. It does not, because PROV-O only fires on @capability invocations.
Every Trails application so far has reinvented these seven pieces. This
is exactly the friction Rails eliminated for web apps (routing,
controllers, sessions, CSRF, rendering), and it is exactly the friction
Fabrica's AgentExecutor eliminates for its ecosystem by baking in
ReAct / Plan-and-Execute / Reflexion as first-class.
Without a framework-owned agent runtime:
- Cost telemetry is partial. The LLM-side spend (empirically 80–95% of agent-app budgets) bypasses trails.cost. ADR-0012 is fulfilled at the capability boundary but violated at the LLM boundary.
- Provenance is partial. The reasoning steps ("the model decided to call patient.intake because …") are not in the prov: graph. ADR-0009 is fulfilled at the capability boundary but violated at the planning boundary.
- The positioning claim is aspirational. "Rails for agentic KG apps" with no planning loop is the same category error as "Rails without ActionController": the component the claim advertises is missing.
The main alternatives (see "Alternatives considered" below) are: ship dispatch-only and document patterns; depend on LangChain; or ship only an LLM client without planners. The first two fail the cost/provenance ADRs; the third still leaves every user reinventing ReAct.
Decision
Trails ships three composable primitives as one coordinated surface,
introduced together in M9. They extend @capability / invoke();
they do not replace it. Apps that already have a planning loop keep
using raw invoke() — the opt-out path stays clean.
1. Planning strategies: trails.agent.planners
Surface sketch.
trails.agent.planners
    PlanningStrategy (Protocol / ABC)
    react(ctx, capabilities, goal, *, max_steps=...)
    plan_and_execute(ctx, capabilities, goal, *, replan_after=...)
    reflexion(ctx, capabilities, goal, *, critique_every=...)
Semantics. A planning strategy is a function (or callable) that
consumes a Context, a list of discovered Capability descriptors
(from /.well-known/capabilities, ADR-0005), and a user goal, and
emits a sequence of trails.invoke() calls interleaved with LLM
reasoning turns. Each strategy owns its loop shape: ReAct
(think/act/observe cycles until goal or budget), Plan-and-Execute
(LLM plan up-front, kernel executes steps, LLM replans on error or
completion), Reflexion (ReAct cycles with an explicit self-critique
phase between them). Strategies are pure orchestration — they produce
no side effects the kernel does not already stamp. Selection happens in
trails.toml ([agent] strategy = "react") or per-session
(Session(strategy="reflexion")).
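The ReAct loop shape described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the planned implementation: `ScriptedContext`, its `think` and `invoke` methods, and the `Step` record are hypothetical stand-ins for the real `Context`, the LLM reasoning turn, and `trails.invoke()`.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Step:
    """One think/act/observe cycle recorded by the loop (hypothetical record)."""
    thought: str
    capability_id: Optional[str]
    observation: Any = None


class ScriptedContext:
    """Hypothetical stand-in for Context: replays canned model decisions."""

    def __init__(self, script, handlers):
        self._script = iter(script)
        self._handlers = handlers

    def think(self, goal, steps):
        # A real context would call the LLM here; this one replays a script.
        return next(self._script)

    def invoke(self, capability_id, **kwargs):
        # Stand-in for trails.invoke(): dispatch to a plain function.
        return self._handlers[capability_id](**kwargs)


def react(ctx, capabilities, goal, *, max_steps=5):
    """Think/act/observe until the model stops requesting tools or steps run out."""
    steps = []
    known = {cap["id"] for cap in capabilities}
    for _ in range(max_steps):
        thought, capability_id, args = ctx.think(goal, steps)
        if capability_id is None:
            # The model produced a final thought without a tool call: stop.
            steps.append(Step(thought, None))
            break
        if capability_id not in known:
            # Feed the mistake back as an observation instead of crashing.
            steps.append(Step(thought, capability_id, "error: unknown capability"))
            continue
        steps.append(Step(thought, capability_id, ctx.invoke(capability_id, **args)))
    return steps


ctx = ScriptedContext(
    script=[
        ("need the document first", "evidence.fetch", {"doc_id": "d1"}),
        ("enough to answer", None, {}),
    ],
    handlers={"evidence.fetch": lambda doc_id: f"text of {doc_id}"},
)
trace = react(ctx, [{"id": "evidence.fetch"}], "Summarize d1")
```

The loop itself holds no state beyond the step list, which is consistent with the "pure orchestration" constraint: every side effect goes through the `invoke` stand-in.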
Relationship to existing ADRs.
- ADR-0005 (rich manifest): planners consume the canonical JSON-LD
descriptor, not the MCP projection — they read cost, preconditions,
side_effects to make cheaper / safer plans. The manifest finally has
a first-party consumer inside Trails.
- ADR-0009 (provenance): each planner step that triggers invoke()
produces the usual PROV-O activity; the planner additionally emits one
prov:Activity per LLM reasoning turn, linked via prov:wasInformedBy
from each resulting capability activity.
- ADR-0012 (cost): each planner step passes through CostAccountant;
a budget-exhausted principal raises BudgetExceeded from inside the
loop, terminating the strategy cleanly.
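One possible reading of "terminating the strategy cleanly" is sketched below. The names `CostAccountant` and `BudgetExceeded` come from this ADR, but their signatures here (`charge`, a single USD budget) and the `run_steps` helper are assumptions made for illustration. Whether the strategy catches the exception or lets it propagate to the session is not settled by the text; this sketch catches it and reports a status.

```python
from decimal import Decimal


class BudgetExceeded(Exception):
    """Raised when a charge would push spend past the principal's budget."""


class CostAccountant:
    """Hypothetical minimal accountant: one principal, one USD budget."""

    def __init__(self, budget_usd):
        self.budget_usd = Decimal(budget_usd)
        self.spent_usd = Decimal("0")

    def charge(self, usd):
        usd = Decimal(usd)
        if self.spent_usd + usd > self.budget_usd:
            raise BudgetExceeded(f"charge of {usd} USD exceeds remaining budget")
        self.spent_usd += usd


def run_steps(accountant, step_costs):
    """Charge each planner step up front; stop at the first rejected charge."""
    completed = 0
    try:
        for usd in step_costs:
            accountant.charge(usd)
            completed += 1
    except BudgetExceeded:
        return completed, "budget_exceeded"
    return completed, "done"


accountant = CostAccountant("0.05")
outcome = run_steps(accountant, ["0.02", "0.02", "0.02"])
```

Decimal amounts are used so that repeated small charges compare exactly against the budget.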
2. LLM client: trails.llm
Surface sketch.
trails.llm
    LLMClient (thin adapter)
        select_model(task, *, budget_hint=...) -> ModelSpec
        complete(prompt, *, model, cache=True, principal=...) -> Response
        stream(prompt, ...) -> Iterator[Chunk]
    Response (dataclass: text, tool_calls, tokens, usd, cache_hit)
Semantics. A thin adapter, not a new abstraction layer over
LiteLLM / LangChain. trails.llm owns exactly the properties the
framework must own to honour ADR-0009 and ADR-0012: (a) every
complete() / stream() call opens a cost envelope tagged with the
acting principal, plugs into trails.cost.CostTracker, and respects
per-principal budgets; (b) every call emits a prov:Activity of type
llm:Completion linked to the calling capability or planner activity;
(c) retry / rate-limit / Anthropic + OpenAI prompt-caching are
implemented once here, not in user code; (d) model selection is
declarative (ModelSpec) so the planner can cost-compare models the
way it cost-compares capabilities. Supported providers at M9: Anthropic
(primary), Ollama (local), and a pluggable Adapter interface to let
apps add OpenAI / Azure / Bedrock without the core taking a
dependency.
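As a rough sketch of what "thin adapter" could mean in practice, the wrapper below retries transient provider failures and records the cost of each successful call against the acting principal. Everything beyond the names `LLMClient`, `complete`, and `Response` is an assumption: the adapter is modelled as a plain callable, the ledger as a list, and `TransientProviderError` is a hypothetical exception type. The `Response` here carries only a subset of the fields listed in the surface sketch.

```python
import time
from dataclasses import dataclass


@dataclass
class Response:
    text: str
    tokens: int
    usd: float
    cache_hit: bool = False


class TransientProviderError(Exception):
    """Hypothetical error type for rate limits and other retryable failures."""


class LLMClient:
    def __init__(self, adapter, ledger, *, max_retries=2, backoff_s=0.0):
        self.adapter = adapter          # callable: (prompt, model) -> Response
        self.ledger = ledger            # stand-in for trails.cost.CostTracker
        self.max_retries = max_retries
        self.backoff_s = backoff_s

    def complete(self, prompt, *, model, principal):
        for attempt in range(self.max_retries + 1):
            try:
                response = self.adapter(prompt, model)
            except TransientProviderError:
                if attempt == self.max_retries:
                    raise
                # Exponential backoff between attempts.
                time.sleep(self.backoff_s * (2 ** attempt))
                continue
            # Record spend only for the call that actually succeeded.
            self.ledger.append((principal, model, response.usd))
            return response


calls = []


def flaky_adapter(prompt, model):
    """Fails once with a retryable error, then succeeds."""
    calls.append(prompt)
    if len(calls) == 1:
        raise TransientProviderError("rate limited")
    return Response(text="ok", tokens=12, usd=0.001)


ledger = []
client = LLMClient(flaky_adapter, ledger)
reply = client.complete("Summarize: ...", model="small-model", principal="did:key:zAlice")
```

Keeping retry and cost recording in one place is what lets user code call `complete()` without carrying either concern.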
Relationship to existing ADRs.
- ADR-0009 (provenance): LLM calls join the prov: graph on equal
footing with capability calls. The graph goes from "what did the
agent do" to "what did the agent think and do."
- ADR-0012 (cost): trails.llm is where cost.py's claim of being
"a framework primitive" finally covers the dominant cost source.
- ADR-0008 (MCP primary): trails.llm is deliberately not a
transport layer — it is a client for model providers. It does not
compete with MCP; planners call MCP-exposed capabilities via
invoke(), and call LLMs via trails.llm.
3. Context / Session: trails.agent.context
Surface sketch.
trails.agent.context
    Session (user-facing)
        new(*, strategy, principal, budget=None, agent_card=None) -> Session
        run(goal) -> Result
        history: TokenWindow          # ephemeral conversation
        invocations: list[Envelope]   # capability calls this session
        auth: AuthState
        card: AgentCard | None        # ADR-0015 WoT card
        persist(*, graph="session:<uuid>") -> IRI
        @classmethod replay(graph_iri) -> Session
    Context (internal, passed to planners + handlers)
Semantics. Session is ephemeral per-session state — it is
not conflated with the RDF knowledge graph the application serves. It
holds: (a) a token-windowed conversation history (default: model's
context window minus a safety margin; eviction strategy pluggable), (b)
the ordered list of invoke() envelopes made in this session, (c) the
current auth state (principal, biscuit, DID), and (d) the active WoT
agent card (ADR-0015) if one is bound. Context is the planner-facing
view of Session: handlers receive a ctx argument already today (see
decorators._extract_params skipping leading ctx) — M9 gives that
ctx real shape. A session is persistable on user request: calling
session.persist() writes a replayable projection into a named graph
(e.g., session:<uuid>), enabling Session.replay(iri) for audit and
regression testing. Persistence is opt-in because most sessions are not
worth keeping.
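The token-windowed history could look roughly like the sketch below, assuming FIFO eviction and a whitespace token counter as a placeholder for a real tokenizer. The `TokenWindow` name is from this ADR; the constructor arguments, `append`, `messages`, and the `default_window` helper are hypothetical.

```python
from collections import deque


def default_window(context_window, safety_margin):
    """Default budget: the model's context window minus a safety margin."""
    return context_window - safety_margin


class TokenWindow:
    def __init__(self, max_tokens, count_tokens=None):
        self.max_tokens = max_tokens
        # Placeholder counter: whitespace-separated words, not real tokens.
        self._count = count_tokens or (lambda text: len(text.split()))
        self._messages = deque()
        self.total_tokens = 0

    def append(self, message):
        tokens = self._count(message)
        self._messages.append((message, tokens))
        self.total_tokens += tokens
        # FIFO eviction: drop the oldest messages until the window fits,
        # but always keep the most recent message.
        while self.total_tokens > self.max_tokens and len(self._messages) > 1:
            _, evicted = self._messages.popleft()
            self.total_tokens -= evicted

    def messages(self):
        return [message for message, _ in self._messages]


window = TokenWindow(max_tokens=default_window(context_window=8, safety_margin=3))
for message in ["a b c", "d e", "f g"]:
    window.append(message)
```

A pluggable eviction strategy would replace the `while` loop; FIFO is only the simplest choice.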
Relationship to existing ADRs.
- ADR-0009 (provenance): session-level prov:Activity becomes the
parent activity for every LLM + capability step in that session,
linked via prov:wasInformedBy — the graph now has a session root,
not just a forest of per-invocation activities.
- ADR-0015 (WoT agent card): the session's card field is how an
inbound agent card enters the runtime; planners use it to decide
which capabilities the calling agent is allowed to see.
- ADR-0013 (ACT/ECT): session persistence into a named graph
inherits the export-boundary rules — an L1 session cannot be
exported cross-domain (ADR-0009 Update (c)).
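The provenance topology implied by the three primitives can be illustrated with plain triples. This is a sketch only: the IRIs are invented `urn:example:` placeholders, triples are modelled as tuples rather than a real RDF store, and the exact edge set (for example, whether a capability activity links to both its LLM turn and the session root) is an assumption.

```python
def session_provenance(session_iri, turns):
    """Build triples for a session root plus its LLM turns and capability calls.

    `turns` is a list of (llm_activity_iri, capability_activity_iri_or_None).
    """
    triples = [(session_iri, "rdf:type", "prov:Activity")]
    for llm_iri, capability_iri in turns:
        triples.append((llm_iri, "rdf:type", "llm:Completion"))
        triples.append((llm_iri, "prov:wasInformedBy", session_iri))
        if capability_iri is not None:
            triples.append((capability_iri, "rdf:type", "prov:Activity"))
            # The capability call was triggered by this reasoning turn ...
            triples.append((capability_iri, "prov:wasInformedBy", llm_iri))
            # ... and also hangs off the session root.
            triples.append((capability_iri, "prov:wasInformedBy", session_iri))
    return triples


def informed_by(triples, target):
    """All activities that point at `target` via prov:wasInformedBy."""
    return sorted(s for s, p, o in triples if p == "prov:wasInformedBy" and o == target)


triples = session_provenance(
    "urn:example:session:1",
    [("urn:example:llm:1", "urn:example:cap:1"), ("urn:example:llm:2", None)],
)
children = informed_by(triples, "urn:example:session:1")
```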
Worked example: combined surface (under 35 lines)
# user code (U)
# library code (L)
from trails import capability, Session  # L
from trails.agent import planners  # L
from trails.llm import select_model  # L

@capability(id="evidence.fetch",  # U
            description="Fetch a document from the evidence graph",
            cost_estimate={"usd": 0.0, "tokens": 0})
def fetch(ctx, doc_id: str):  # U
    return ctx.graph.get(doc_id)  # U (app-level KG read)

@capability(id="evidence.summarize",  # U
            cost_estimate={"usd": 0.02, "tokens": 1500})
def summarize(ctx, text: str):  # U
    return ctx.llm.complete(  # U -> L (auto cost + prov)
        f"Summarize:\n{text}",
        model=select_model("summarize", budget_hint="cheap"),
    ).text

session = Session.new(  # L (ephemeral, not KG-backed)
    strategy="react",  # U (picks ReAct planner)
    principal="did:key:zAlice",  # U
    budget={"usd": 0.50, "tokens": 50_000},  # U (ADR-0012)
)
result = session.run(  # L drives ReAct loop:
    "Summarize every regulatory filing "
    "touching product SKU-7 this quarter"
)  # each invoke() -> PROV (ADR-0009)
   # each LLM call -> PROV + cost
print(result.answer)  # U
session.persist(graph="audit:SKU-7-Q1")  # U (opt-in KG write)
Library-owned lines (L) expand; user-owned lines (U) do not. A developer writes capabilities and a goal string; the framework writes the loop, the LLM calls, the cost envelope, the PROV-O graph, and the optional audit persistence.
Phased delivery (M9 scope)
Phase 1: trails.llm + Context
- trails.llm.LLMClient with complete / stream, Anthropic and Ollama adapters, retry + Anthropic / OpenAI-style prompt caching, per-call cost envelope plugged into trails.cost, per-call prov:Activity.
- trails.agent.context.Context + Session skeleton with TokenWindow (default FIFO eviction, strategy pluggable).
- Unit tests: cost accounting correctness, PROV-O emission shape, token-window eviction bounds.
Phase 2: ReAct strategy
- trails.agent.planners.react(...) consuming the rich capability manifest.
- End-to-end integration with invoke(): each action step routes through the existing dispatch path; no new side-effect plumbing.
- PROV-O per step: llm:Completion activity + capability prov:Activity, linked via prov:wasInformedBy.
- Integration test: compliance-shaped goal with two capabilities, asserts budget enforcement terminates the loop on exhaustion.
Phase 3: Plan-and-Execute + Reflexion
- trails.agent.planners.plan_and_execute(...) with explicit plan / execute / replan phases.
- trails.agent.planners.reflexion(...) with critique-every-N cycles.
- Strategy chooser: trails.toml [agent] strategy = "..." and per-session override.
Phase 4: Session persistence to KG
- Session.persist(graph=...) writes a replayable projection (conversation history, invocation envelope hashes, LLM activity IRIs, agent card ref) into the chosen named graph.
- Session.replay(graph_iri) reconstructs the ephemeral state for audit / regression; replay is read-only and does not re-emit PROV.
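The replayable projection might be assembled as in the sketch below. It is illustrative only: the projection is a plain dictionary rather than RDF in a named graph, envelope hashes are SHA-256 over canonical JSON (an assumed scheme), and `project_session`, `envelope_hash`, and `replay` are hypothetical helper names.

```python
import hashlib
import json
from types import MappingProxyType


def envelope_hash(envelope):
    """Hash an invocation envelope over a canonical (key-sorted) JSON encoding."""
    canonical = json.dumps(envelope, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def project_session(history, envelopes, llm_activity_iris, agent_card_ref=None):
    """Collect the four pieces the projection is described as containing."""
    return {
        "history": list(history),
        "invocation_hashes": [envelope_hash(e) for e in envelopes],
        "llm_activities": list(llm_activity_iris),
        "agent_card": agent_card_ref,
    }


def replay(projection):
    """Return a read-only view (top level only); nothing is re-emitted."""
    return MappingProxyType(dict(projection))


projection = project_session(
    history=["Summarize d1"],
    envelopes=[{"capability": "evidence.fetch", "args": {"doc_id": "d1"}}],
    llm_activity_iris=["urn:example:llm:1"],
)
# Round-trip through JSON to mimic a write to, and read from, storage.
restored = replay(json.loads(json.dumps(projection)))
```

Storing hashes rather than full envelopes keeps the projection small while still letting an auditor check that a replayed invocation matches the original.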
Non-goals (scope fence)
- Not a LangChain / LlamaIndex replacement. Trails is narrower and opinionated: PROV-O and cost envelopes on every step, rich-manifest consumption, MCP-first transport. Abstractions that cannot honour those constraints do not belong here.
- No new LLM model support beyond the adapter targets. M9 ships Anthropic + Ollama + a pluggable Adapter. No fine-tuning, no training-loop integration, no bespoke inference server.
- No agent-to-agent protocol beyond what MCP already covers (ADR-0008). Planners drive local invoke() calls; agent-to-agent handshake is the MCP surface and its associated WoT agent card (ADR-0015), not a new Trails-owned protocol.
- Not an evaluation harness. trails sim / evals land elsewhere.
Consequences
Positive
- "Rails for agentic KG" becomes concrete. The headline claim now points at a component that exists. A developer with capabilities written can reach a working agent in ~10 lines.
- Cost + provenance apply uniformly. ADR-0009 and ADR-0012 cover the LLM and planning boundaries, not only the capability boundary. The dominant spend source is finally instrumented.
- Planner can negotiate on the rich manifest. ADR-0005's cost, preconditions, side_effects fields get a first-party consumer inside Trails, not only external agents.
- Replayable audit trails. Session.persist() gives regulators, debuggers, and CI the same artefact: a named graph they can query with SPARQL.
Negative
- A meaningful surface to maintain. Three primitives, provider adapters, strategy implementations, persistence format.
- Risk of drifting into a generic agent framework. Mitigated by the scope fence above and by keeping the trails.llm adapter deliberately thin (no chain / graph / DAG abstractions; raw complete / stream only).
- Provider churn exposure. Model SDKs and prompt-caching semantics change. Mitigated by versioning LLMClient and keeping adapter surface area minimal; an adapter is a ~150-line file, cheap to replace.
Neutral
- Opt-out path stays clean. Apps that already have a planning loop keep calling trails.invoke() directly and pay no new cost: trails.agent and trails.llm are imported on use, not on import trails.
- Session persistence competes with ADR-0017 ORM for KG-write attention. Resolved by scoping: sessions write only into the app-configurable session graph (default session:*), never into domain graphs the ORM owns.
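"Imported on use" can be achieved with a module-level `__getattr__` hook (PEP 562). The sketch below shows the mechanism in isolation and makes no claim about how Trails will implement it: `make_lazy_package` is a hypothetical helper, the package is built in memory with `types.ModuleType`, and the standard-library modules `json` and `decimal` stand in for `trails.llm` and `trails.agent`.

```python
import importlib
import types


def make_lazy_package(name, lazy_targets):
    """Build a module whose listed attributes are imported on first access."""
    package = types.ModuleType(name)
    loaded = []  # records which submodules were actually imported

    def __getattr__(attr):
        # Called only when normal attribute lookup on the module fails.
        if attr not in lazy_targets:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        module = importlib.import_module(lazy_targets[attr])
        setattr(package, attr, module)  # cache so the hook runs once per name
        loaded.append(attr)
        return module

    package.__getattr__ = __getattr__
    return package, loaded


# Stand-in targets: stdlib modules play the role of trails.llm / trails.agent.
demo, loaded = make_lazy_package("trails_demo", {"llm": "json", "agent": "decimal"})
before = list(loaded)
first = demo.llm
second = demo.llm
```

In a real package the same hook would live in the package's `__init__.py`, so that `import trails` alone never touches the agent or LLM modules.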
Alternatives considered
- Keep dispatch-only; document ReAct / Plan-and-Execute as patterns. Rejected. Every existing Trails prototype app has reinvented these three primitives, none of them compatibly. This is the Fabrica thesis inverted: not shipping the agent loop is the wrong bet, because without it every user reinvents it worse.
- Depend on LangChain / LlamaIndex for the agent loop; keep Trails dispatch-only. Rejected. LangChain's cost accounting is opaque to Trails' CostAccountant, and its callback machinery does not naturally project into PROV-O. Taking the dependency would break ADR-0009 (provenance always on) and ADR-0012 (cost as primitive) at exactly the boundary those ADRs need to cover.
- Bake only the LLM client; leave planning strategies out. Rejected. This solves (c) prompt-caching and (d) budget-linked LLM calls but leaves every user still writing a ReAct loop by hand, against inconsistent conventions. The three primitives are co-valuable; shipping them together is cheaper than shipping them serially across three minor versions.
- Ship only ReAct; add other strategies on demand. Deferred, not rejected. Phase 2 ships only ReAct; Phase 3 adds Plan-and-Execute and Reflexion. If Phase 3 slips past M9, the ADR still stands — the strategy slot exists.
Open questions
- LLM-internal provenance granularity. Do we emit a prov:Activity per token, per LLM call, or both (call-level as primary, optional token-level trace linked via prov:wasInformedBy)? Per-token blows the NFR-Perf1 latency budget almost immediately; per-call may be too coarse for regulators who want to see the reasoning trace. Leaning toward per call + an opt-in token-trace named graph (prov:llm-tokens) that defaults off and is never exported cross-boundary regardless of ADR-0013 assurance.
- Sync vs async LLMClient. Kernel is sync through M1 and moves async in M1+ (guardrails §3). M9 is post-async; default async is the right call, but sync shims matter for scripts and tests. Ship both, or sync-over-async?
- Context vs ADR-0017 ORM overlap. Session.invocations holds capability envelopes; ActiveGraph models (ADR-0017) hold domain entities. When a capability returns an ActiveGraph instance, does the session hold the instance, its IRI, or a pickled snapshot? The replay semantics of each choice differ sharply.
- Strategy mix mid-session. Can a session switch strategies (e.g., start in Plan-and-Execute, fall back to ReAct on replan failure)? Convenient, but it complicates PROV graph topology: the root activity's prov:wasInformedBy tree gets messier.
- Prompt-cache coordination across sessions. Anthropic's prompt-cache keys are per-account. If two sessions on the same principal share a system prompt, do we key the cache on the system prompt hash (sharing cache) or on session.id (no sharing)? Sharing saves money; not sharing is simpler to reason about for audit replay.
- Budget handling when a planner and a capability both want to spend. A single invoke() inside a ReAct loop may itself call ctx.llm.complete() (see worked example). We open two cost envelopes (planner-step, capability-internal). Do they nest, or does the capability one attach to the capability's prov:Activity only? Double-counting must not happen.
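For the last question, one option that avoids double-counting is to let envelopes nest but attribute every charge only to the innermost open envelope. The sketch below illustrates that option and is not a decision: the `CostTracker` name is from this ADR, while the `envelope` context manager, `record`, and the envelope names are hypothetical.

```python
from contextlib import contextmanager
from decimal import Decimal


class CostTracker:
    def __init__(self):
        self._stack = []                # currently open envelopes, outermost first
        self.by_envelope = {}           # spend attributed directly to each envelope
        self.total_usd = Decimal("0")   # session-wide spend, counted exactly once

    @contextmanager
    def envelope(self, name):
        self.by_envelope.setdefault(name, Decimal("0"))
        self._stack.append(name)
        try:
            yield name
        finally:
            self._stack.pop()

    def record(self, usd):
        if not self._stack:
            raise RuntimeError("record() called outside any cost envelope")
        amount = Decimal(usd)
        # Attribute to the innermost envelope only, never to its parents.
        self.by_envelope[self._stack[-1]] += amount
        self.total_usd += amount


tracker = CostTracker()
with tracker.envelope("planner-step"):
    tracker.record("0.01")                      # planner's own LLM turn
    with tracker.envelope("capability:evidence.summarize"):
        tracker.record("0.02")                  # LLM call inside the capability
    tracker.record("0.005")                     # planner's follow-up turn
```

Because each charge lands in exactly one envelope, the per-envelope figures always sum to the session total; a rolled-up view per planner step can be derived later from the nesting recorded in provenance.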