Chapter 5 — Building Agentic Apps¶
Learning objectives¶
After this chapter you will be able to:
- Explain what makes a Trails app "agentic" versus a plain KG app.
- Create an LLMClient backed by Anthropic, Ollama, or a mock provider.
- Manage multi-turn conversations with Session and token windowing.
- Choose between ReAct, Plan-and-Execute, and Reflexion planners.
- Enforce cost and time budgets on planner runs.
- Trace every agent step through the PROV-O provenance graph.
What makes an app "agentic"?¶
A standard Trails app exposes @capability handlers and answers requests
with a single invoke() call. An agentic app wraps those same
capabilities in a planning loop: an LLM decides which capability to
call next, observes the result, and repeats until a goal is met. The
capabilities do not change; the planner is a thin orchestrator on top.
non-agentic: user → invoke("notes.search", {tag: "urgent"}) → result
agentic:     user → planner(goal="find urgent notes and summarise them")
                      ├─ LLM decides → invoke("notes.search", {tag: "urgent"})
                      ├─ LLM reads observation → invoke("notes.summarize", ...)
                      └─ LLM decides → finish with answer
The key difference: the human writes the goal, not the dispatch sequence. The planner fills in the sequence at runtime.
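The loop above can be sketched in plain Python. This is a minimal illustration rather than Trails code: `CAPABILITIES`, `scripted_llm`, and `run_planner` are hypothetical stand-ins, and the "LLM" is a hard-coded function so the example runs offline.

```python
from typing import Any, Callable, Optional

# Hypothetical capability registry: plain functions keyed by capability id.
CAPABILITIES: dict = {
    "notes.search": lambda tag: ["note-1", "note-2"] if tag == "urgent" else [],
    "notes.summarize": lambda ids: f"{len(ids)} urgent notes found",
}


def scripted_llm(goal: str, observations: list) -> dict:
    """Stand-in for the LLM: picks the next action from what it has seen so far."""
    if not observations:
        return {"action": "notes.search", "input": {"tag": "urgent"}}
    if len(observations) == 1:
        return {"action": "notes.summarize", "input": {"ids": observations[0]}}
    return {"action": "finish", "input": {"answer": observations[-1]}}


def run_planner(goal: str, max_steps: int = 10):
    """Decide, dispatch, observe, and repeat until the 'LLM' says finish."""
    observations: list = []
    trace: list = []
    for _ in range(max_steps):
        decision = scripted_llm(goal, observations)
        if decision["action"] == "finish":
            return decision["input"]["answer"], trace
        handler: Callable[..., Any] = CAPABILITIES[decision["action"]]
        observations.append(handler(**decision["input"]))
        trace.append(decision["action"])
    return None, trace  # ran out of steps without finishing


answer, trace = run_planner("find urgent notes and summarise them")
print(answer)
print(trace)
```

The caller supplies only the goal; the dispatch sequence in `trace` is produced at runtime by the decision function.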
The LLM client¶
Trails ships a single LLMClient class with three factory methods. All
three return a client with the same .complete() interface, so swapping
providers is a one-line change.
Ollama (local, free)¶
from trails.llm import LLMClient
client = LLMClient.ollama(
    model="qwen3:8b",
    base_url="http://localhost:11434",
)
No extra dependencies: the client uses stdlib http.client to talk to
Ollama's REST API. You can run any model locally, for example Qwen,
Llama, Mistral, Phi, or Gemma.
If the server is unreachable you get a clean TrailsError with the hint
"Is ollama serve running?".
Anthropic (cloud)¶
client = LLMClient.anthropic(
    model="claude-sonnet-4-5",
    cache=True,  # enable Anthropic prompt caching
    timeout=30.0,
)
Requires the optional anthropic SDK.
Mock (tests)¶
client = LLMClient.mock(response="42 is the answer")
# Or with a callable that inspects the messages:
client = LLMClient.mock(
    response=lambda msgs: f"You sent {len(msgs)} messages",
    cost_usd=0.0,
)
Mock clients are deterministic and free. Use them in unit tests so your test suite never calls an API.
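To show why a fixed-or-callable reply makes tests deterministic, here is a self-contained sketch. `FakeLLM`, `FakeResponse`, and `answer_question` are hypothetical names written for this example; they imitate the `.complete()` call shape described in this chapter and are not the Trails mock itself.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class FakeResponse:
    text: str
    cost_usd: float = 0.0


class FakeLLM:
    """Hypothetical stand-in: returns a fixed reply or one computed from the messages."""

    def __init__(self, response: Union[str, Callable[[list], str]]):
        self._response = response

    def complete(self, messages: list, **_ignored) -> FakeResponse:
        reply = self._response(messages) if callable(self._response) else self._response
        return FakeResponse(text=reply)


def answer_question(llm, question: str) -> str:
    """Code under test: it relies only on the .complete() call shape."""
    messages = [
        {"role": "system", "content": "Answer briefly."},
        {"role": "user", "content": question},
    ]
    return llm.complete(messages, max_tokens=64).text.strip()


static = answer_question(FakeLLM("  42 is the answer  "), "What is the answer?")
counted = answer_question(FakeLLM(lambda msgs: f"You sent {len(msgs)} messages"), "Hi")
print(static)
print(counted)
```

Because `answer_question` never names a provider, the same function can be exercised with a fake in tests and a real client in production.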
The shared call shape¶
All three clients expose the same .complete() method:
from trails.llm import Message
resp = client.complete(
    messages=[
        Message(role="system", content="You are a helpful analyst."),
        Message(role="user", content="Summarize Q1 filings."),
    ],
    max_tokens=1024,
    temperature=0.0,
    ctx=ctx,  # optional: enables cost + provenance tracking
)
print(resp.text) # the LLM's reply
print(resp.cost_usd) # cost in USD (0.0 for Ollama/mock)
print(resp.usage) # LLMUsage(prompt_tokens=..., completion_tokens=..., total_tokens=...)
Pass ctx=ctx inside a @capability handler and cost tracking plus
PROV-O activity emission happen automatically. Omit it in scripts and
tests.
Sessions: multi-turn conversation state¶
For single-shot LLM calls, pass a messages list directly. For
multi-turn work (chat, agent loops, reasoning chains), use Session:
from trails.agent import Session
sess = Session(
    principal="did:local:alice",
    max_tokens=32_000,  # token budget for the conversation window
    pin_head=1,         # pin the first N messages (usually system prompt)
)
sess.append("system", "You are a research assistant.")
sess.append("user", "Find papers about drug interactions.")
Token windowing¶
Session wraps a TokenWindow — a FIFO sliding window that keeps the
conversation within max_tokens. The first pin_head messages (your
system prompt) are pinned and never evicted. When the window is full,
the oldest non-pinned messages are dropped first.
This means long conversations degrade gracefully: the LLM always sees the system prompt and the most recent turns.
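The eviction rule can be illustrated with a small stand-alone class. This is a sketch under stated assumptions: `SlidingWindow` is a hypothetical name, not the Trails `TokenWindow`, and tokens are approximated as whitespace-separated words rather than counted with a real tokenizer.

```python
from collections import deque


def count_tokens(text: str) -> int:
    """Crude assumption: one token per whitespace-separated word."""
    return len(text.split())


class SlidingWindow:
    """Hypothetical FIFO window with a pinned head."""

    def __init__(self, max_tokens: int, pin_head: int = 1):
        self.max_tokens = max_tokens
        self.pin_head = pin_head
        self.pinned = []       # first pin_head messages, never evicted
        self.recent = deque()  # everything else, oldest on the left

    def append(self, role: str, content: str) -> None:
        if len(self.pinned) < self.pin_head:
            self.pinned.append((role, content))
        else:
            self.recent.append((role, content))
        # Evict the oldest unpinned messages until the window fits again.
        while self.total() > self.max_tokens and self.recent:
            self.recent.popleft()

    def total(self) -> int:
        return sum(count_tokens(c) for _, c in [*self.pinned, *self.recent])

    def messages(self) -> list:
        return [*self.pinned, *self.recent]


win = SlidingWindow(max_tokens=8, pin_head=1)
win.append("system", "You are terse.")     # 3 tokens, pinned
win.append("user", "first question here")  # 3 tokens
win.append("assistant", "first answer")    # 2 tokens, total is now 8
win.append("user", "second question")      # 2 more tokens: oldest unpinned is evicted
print(win.messages())
```

After the fourth append the first user message is gone, while the system prompt and the two most recent turns remain.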
History and replay¶
Session stores the full message history and an invocations list of
raw dispatch envelopes. Reuse the same Session across planner calls to
continue a conversation:
# First goal
result1 = react.run("Find urgent notes.", llm=client, session=sess)
# Second goal — the LLM sees the full conversation so far
result2 = react.run("Now summarize them.", llm=client, session=sess)
Create a fresh Session for each independent goal.
Planners¶
Trails ships three planning strategies. They share the same call
signature and return the same PlanResult shape, so switching is a
one-line import change.
ReAct — think, act, observe¶
from trails.agent.planners import react
result = react.run(
    "Find urgent notes and summarize them.",
    llm=client,
    session=sess,
    max_steps=10,
    temperature=0.0,
)
print(result.answer) # the final answer
print(result.stopped) # "goal_achieved" | "max_steps" | "error"
print(len(result.steps)) # number of think-act-observe turns
On every turn the LLM emits a JSON block with thought, action, and
action_input. The planner dispatches the action through trails.invoke,
feeds the observation back, and loops. When action == "finish", the
planner returns the final_answer.
Tool errors are not fatal -- they become the step's observation so the LLM can self-correct.
Best for: search-then-summarize, classify-then-route, any task where the next step depends on the last observation and fits under ~10 steps.
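A single ReAct turn, including the "errors become observations" rule, might look like the sketch below. `TOOLS` and `run_step` are hypothetical, and placing `final_answer` inside `action_input` is an assumption about the JSON layout, which this chapter does not spell out.

```python
import json

# Hypothetical tool table for illustration only.
TOOLS = {
    "math.divide": lambda a, b: a / b,
}


def run_step(llm_output: str) -> dict:
    """Parse one ReAct turn and dispatch it; tool errors become the observation."""
    step = json.loads(llm_output)
    if step["action"] == "finish":
        # Assumed layout: the final answer travels inside action_input.
        return {"done": True, "answer": step["action_input"]["final_answer"]}
    try:
        observation = TOOLS[step["action"]](**step["action_input"])
    except Exception as exc:  # deliberately broad: any tool failure is fed back
        observation = f"error: {type(exc).__name__}: {exc}"
    return {"done": False, "thought": step["thought"], "observation": observation}


bad = run_step(
    '{"thought": "divide", "action": "math.divide", "action_input": {"a": 1, "b": 0}}'
)
good = run_step(
    '{"thought": "retry", "action": "math.divide", "action_input": {"a": 1, "b": 4}}'
)
end = run_step(
    '{"thought": "done", "action": "finish", "action_input": {"final_answer": "0.25"}}'
)
print(bad["observation"])
print(good["observation"])
print(end)
```

The failed division does not raise out of the loop; it is returned as text the model can read on its next turn and use to correct the call.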
Plan-and-Execute — plan once, execute many¶
from trails.agent.planners import plan_and_execute
result = plan_and_execute.run(
    "Ingest 5 files and validate each.",
    llm=client,
    session=sess,
    max_steps=10,
    max_replans=3,  # replan up to 3 times on failure
)
The LLM emits a full plan in one call (a JSON list of actions), then the kernel executes each step without further LLM calls. If a step fails, the planner asks the LLM for a revised plan and resumes.
Best for: long-horizon workflows with a clear decomposition where each step's success is individually checkable.
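The plan-once, replan-on-failure control flow can be sketched without any LLM. Everything here is hypothetical (`run_plan`, `fake_planner`, `fake_execute`); in particular, replacing the remaining steps with the revised plan is one plausible reading of "resumes", not a statement about the Trails implementation.

```python
from typing import Callable


def run_plan(
    make_plan: Callable[[list], list],
    execute: Callable[[str], str],
    max_replans: int = 3,
) -> dict:
    """Plan once, run each step without the planner, replan only after a failure."""
    failures: list = []
    done: list = []
    replans = 0
    plan = make_plan(failures)  # the single upfront planning call
    while plan:
        step = plan.pop(0)
        try:
            done.append(execute(step))
        except RuntimeError as exc:
            failures.append(f"{step}: {exc}")
            if replans == max_replans:
                return {"stopped": "error", "done": done, "replans": replans}
            replans += 1
            plan = make_plan(failures)  # revised plan replaces the remaining steps
    return {"stopped": "goal_achieved", "done": done, "replans": replans}


def fake_planner(failures: list) -> list:
    """Stand-in for the planning LLM call."""
    if not failures:
        return ["ingest a.csv", "ingest b.bad", "validate"]
    return ["ingest b.csv", "validate"]  # revised plan avoids the broken file


def fake_execute(step: str) -> str:
    if step.endswith(".bad"):
        raise RuntimeError("unreadable file")
    return f"ok: {step}"


result = run_plan(fake_planner, fake_execute)
print(result)
```

Only two planner calls happen in this run (the initial plan and one replan), regardless of how many steps execute.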
Reflexion — self-critique and retry¶
from trails.agent.planners import reflexion
result = reflexion.run(
    "Write a comprehensive analysis of the trial data.",
    llm=client,
    session=sess,
    max_steps=10,
    max_outer_iterations=3,  # max critic loops
)
Reflexion wraps ReAct in a critic loop. After each inner ReAct run, a
separate critic LLM call evaluates the answer. On "accept" the loop
ends. On "retry", the critique is appended to the session so the next
attempt avoids the named defect.
Critic replies are stored in session.metadata["critiques"] for
post-hoc inspection.
Best for: tasks where correctness is hard to verify in one pass -- rubric-matching, soft constraints, answer quality.
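The critic loop can be reduced to a few lines. This is a sketch only: `critic_loop`, `fake_attempt`, and `fake_critic` are hypothetical, the inner ReAct run is replaced by a plain function, and the verdict dictionary shape is an assumption.

```python
def critic_loop(attempt, critic, max_outer_iterations: int = 3) -> dict:
    """Run an attempt, ask a critic, and retry with the critique until accepted."""
    critiques: list = []
    answer = None
    for _ in range(max_outer_iterations):
        answer = attempt(critiques)  # the inner run sees all earlier critiques
        verdict = critic(answer)
        if verdict["decision"] == "accept":
            return {"answer": answer, "critiques": critiques, "accepted": True}
        critiques.append(verdict["critique"])
    return {"answer": answer, "critiques": critiques, "accepted": False}


def fake_attempt(critiques: list) -> str:
    """Stand-in for the inner run: fixes the defect once it has been named."""
    if any("dosage" in c for c in critiques):
        return "Analysis with efficacy and dosage sections."
    return "Analysis with efficacy section."


def fake_critic(answer: str) -> dict:
    """Stand-in for the critic call: checks a single rubric item."""
    if "dosage" in answer:
        return {"decision": "accept", "critique": ""}
    return {"decision": "retry", "critique": "Missing dosage section."}


outcome = critic_loop(fake_attempt, fake_critic)
print(outcome)
```

The first attempt is rejected, its critique is carried into the second attempt, and the second attempt is accepted.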
Tool discovery¶
By default, planners discover every registered @capability
automatically. You can scope the tool set to reduce prompt size and
cost:
# Explicit list — deterministic, lean prompt
react.run(goal, llm=c, session=s, tools=["notes.search", "notes.tag"])
# Top-N most recently registered
react.run(goal, llm=c, session=s, tools=5)
# Keyword filter
react.run(goal, llm=c, session=s, tool_filter="notes search tag")
# Predicate filter
react.run(
    goal, llm=c, session=s,
    tool_filter=lambda meta: "read-only" in meta["description"],
)
At 50+ registered capabilities the system prompt starts dominating
cost. Always scope tools= for production workloads.
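One way the four scoping forms could be resolved against a registry is sketched below. `REGISTRY` and `select_tools` are hypothetical, and the keyword rule (keep a tool if any keyword appears in its id or description) is an assumption; the chapter does not define the matching semantics.

```python
# Hypothetical registry in registration order (oldest first).
REGISTRY = [
    {"id": "notes.search", "description": "Search notes by tag (read-only)"},
    {"id": "notes.read", "description": "Read a note (read-only)"},
    {"id": "notes.tag", "description": "Attach a tag to a note"},
    {"id": "files.ingest", "description": "Ingest a file"},
]


def select_tools(tools=None, tool_filter=None) -> list:
    """Resolve tools= and tool_filter= into a list of capability ids."""
    selected = list(REGISTRY)
    if isinstance(tools, list):
        selected = [m for m in selected if m["id"] in tools]  # explicit list
    elif isinstance(tools, int):
        selected = selected[-tools:] if tools > 0 else []     # most recent N
    if isinstance(tool_filter, str):
        keywords = tool_filter.lower().split()
        selected = [
            m for m in selected
            if any(k in f'{m["id"]} {m["description"]}'.lower() for k in keywords)
        ]
    elif callable(tool_filter):
        selected = [m for m in selected if tool_filter(m)]    # predicate
    return [m["id"] for m in selected]


print(select_tools())
print(select_tools(tools=["notes.search", "notes.tag"]))
print(select_tools(tools=2))
print(select_tools(tool_filter="notes search tag"))
print(select_tools(tool_filter=lambda meta: "read-only" in meta["description"]))
```

Every narrower selection means fewer tool descriptions in the system prompt, which is where the cost saving comes from.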
Budget enforcement¶
Three cumulative-budget kwargs cap a planner run without depending on
max_steps:
| Kwarg | On breach, result.stopped = |
|---|---|
| max_cost_usd=0.50 | "max_cost" |
| max_tokens=100_000 | "max_tokens" |
| max_wall_time_s=30.0 | "max_wall_time" |
All three are None (unlimited) by default and available on all three
planners.
result = react.run(
    "Analyze all documents.",
    llm=client,
    session=sess,
    max_cost_usd=0.25,
    max_wall_time_s=30.0,
)
if result.stopped == "goal_achieved":
    print("Done:", result.answer)
elif result.stopped in {"max_cost", "max_tokens", "max_wall_time"}:
    print(f"Budget hit ({result.stopped}), best so far:", result.answer)
    # Walk the trajectory for structured partial results
    for step in reversed(result.steps):
        if step.observation and not str(step.observation).startswith("error:"):
            print("Last good observation:", step.observation)
            break
Budget checks run at iteration boundaries, not mid-step. The in-flight
operation finishes, the loop aborts cleanly, and a trails:wasTerminatedBy
triple is written to the provenance graph.
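A consequence of checking at iteration boundaries is that a run can overshoot its budget by at most one step. The sketch below makes that visible; `run_with_budget` is hypothetical, the `>=` comparison is an assumption, and the clock is injectable so the wall-time case is deterministic.

```python
import time


def run_with_budget(steps, max_cost_usd=None, max_tokens=None,
                    max_wall_time_s=None, clock=time.monotonic):
    """Run steps in order, checking cumulative budgets between iterations only."""
    started = clock()
    cost, tokens, completed = 0.0, 0, []
    for step in steps:
        # Boundary check: compare what has been spent so far before starting
        # the next step. A step that has started is never interrupted.
        if max_cost_usd is not None and cost >= max_cost_usd:
            return {"stopped": "max_cost", "completed": completed}
        if max_tokens is not None and tokens >= max_tokens:
            return {"stopped": "max_tokens", "completed": completed}
        if max_wall_time_s is not None and clock() - started >= max_wall_time_s:
            return {"stopped": "max_wall_time", "completed": completed}
        cost += step["cost_usd"]
        tokens += step["tokens"]
        completed.append(step["name"])
    return {"stopped": "goal_achieved", "completed": completed}


steps = [{"name": f"step-{i}", "cost_usd": 0.10, "tokens": 1_000} for i in range(1, 6)]

by_cost = run_with_budget(steps, max_cost_usd=0.25)

ticks = iter([0.0, 10.0, 20.0, 40.0])  # fake clock readings, in seconds
by_time = run_with_budget(steps, max_wall_time_s=30.0, clock=lambda: next(ticks))

unlimited = run_with_budget(steps)
print(by_cost, by_time, unlimited["stopped"])
```

With a cap of 0.25 and steps costing 0.10 each, three steps complete: the third starts while spend is still under the cap and finishes above it, and the breach is detected before the fourth begins.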
PROV-O tracking for agent steps¶
When you pass ctx=ctx to a planner, provenance is automatic:
- One trails:ReActPlan root activity per run(), tagged with goal, principal, and session ID.
- One trails:LLMCompletion activity per LLM call (each planner step).
- Each invoke() inside the loop emits its own prov:Activity.
- All activities are linked via prov:wasInformedBy.
Without ctx the planner still runs but without telemetry. Tests can
omit it; production code should always pass it.
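To make the shape of that graph concrete, the sketch below builds the activities as plain tuples and walks the `prov:wasInformedBy` chain backwards. It is illustrative only: `record_run` and `lineage` are hypothetical, the `ex:` predicates and node identifiers are invented for the example, and the direction and granularity of the links are assumptions rather than the documented Trails output.

```python
def record_run(run_id, goal, actions):
    """Build provenance triples for one planner run (plain tuples, no RDF library)."""
    root = f"run:{run_id}"
    triples = [
        (root, "rdf:type", "trails:ReActPlan"),
        (root, "ex:goal", goal),  # hypothetical predicate
    ]
    previous = root
    for i, action in enumerate(actions, start=1):
        llm = f"{root}/llm/{i}"
        call = f"{root}/invoke/{i}"
        triples += [
            (llm, "rdf:type", "trails:LLMCompletion"),
            (llm, "prov:wasInformedBy", previous),
            (call, "rdf:type", "prov:Activity"),
            (call, "ex:capability", action),  # hypothetical predicate
            (call, "prov:wasInformedBy", llm),
        ]
        previous = call
    return triples


def lineage(triples, node):
    """Follow prov:wasInformedBy links from a node back to the root activity."""
    informed_by = {s: o for s, p, o in triples if p == "prov:wasInformedBy"}
    chain = [node]
    while chain[-1] in informed_by:
        chain.append(informed_by[chain[-1]])
    return chain


triples = record_run(1, "Find urgent notes.", ["notes.search", "notes.summarize"])
chain = lineage(triples, "run:1/invoke/2")
print(chain)
```

Walking the chain from the last dispatch answers the audit question "which LLM decisions led to this call?" without inspecting logs.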
When to use which planner¶
How is "goal met?" judged?
├── Step-by-step, next action depends on last observation → ReAct
├── End-to-end, one upfront decomposition is obvious → Plan-and-Execute
└── End-to-end, correctness needs a second pass → Reflexion
Single LLM call enough to produce the answer?
→ Don't use a planner — just call LLMClient.complete directly.
| Axis | ReAct | Plan-and-Execute | Reflexion |
|---|---|---|---|
| LLM calls | N (one per step) | 1 + replans | C x (N_inner + 1 critic) |
| Best for | Short reactive tasks | Multi-step workflows | Quality-critical answers |
| Disappoints when | Long horizons (>10 steps) | Every step depends on the prior | Answer is obviously correct on first try |
| Typical cost | Cheapest | Cheapest for long clean runs | 2-3x ReAct |
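The "LLM calls" row can be turned into a quick estimate. The helper below is a hypothetical convenience, not part of Trails; it simply evaluates the three formulas from the table, reading C as the number of critic loops and N as the number of steps.

```python
def llm_calls(planner: str, steps: int = 0, replans: int = 0, critic_loops: int = 1) -> int:
    """Estimate LLM calls per run from the formulas in the comparison table."""
    if planner == "react":
        return steps                       # N: one call per step
    if planner == "plan_and_execute":
        return 1 + replans                 # one plan, plus one call per replan
    if planner == "reflexion":
        return critic_loops * (steps + 1)  # C x (N_inner + 1 critic)
    raise ValueError(f"unknown planner: {planner}")


react_calls = llm_calls("react", steps=6)
pe_calls = llm_calls("plan_and_execute", replans=1)
reflexion_calls = llm_calls("reflexion", steps=6, critic_loops=2)
print(react_calls, pe_calls, reflexion_calls)
```

For a six-step task this gives 6 calls for ReAct, 2 for Plan-and-Execute with one replan, and 14 for Reflexion with two critic loops, consistent with the table's "2-3x ReAct" estimate.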
For tasks that boil down to one templated LLM call plus one invoke(),
skip the planner entirely:
text = client.complete([Message(role="user", content=prompt)], ctx=ctx).text
result = trails.invoke("notes.tag", {"id": note_id, "tag": text.strip()},
principal="did:local:alice")
Example: building a research assistant¶
Here is a complete example that combines everything from this chapter. The assistant searches a note database, reads matching notes, and produces a summary.
from trails import capability
from trails.agent import Session
from trails.agent.planners import react
from trails.llm import LLMClient
# --- Define capabilities ---
@capability(id="notes.search", description="Search notes by tag. Returns note IDs.")
def search_notes(ctx, tag: str) -> dict:
    results = ctx.kg.query(f"""
        SELECT ?note WHERE {{
            ?note a <trails://app/Note> ;
                  <trails://app/Note/tag> "{tag}" .
        }}
    """)
    return {"hits": [r["note"] for r in results]}
@capability(id="notes.read", description="Read a note's full text by IRI.")
def read_note(ctx, iri: str) -> dict:
    results = ctx.kg.query(f"""
        SELECT ?text WHERE {{
            <{iri}> <trails://app/Note/text> ?text .
        }}
    """)
    return {"text": results[0]["text"] if results else "Not found"}
@capability(id="notes.summarize", description="Summarize a list of texts.")
def summarize(ctx, texts: list) -> dict:
    joined = "\n---\n".join(texts)
    resp = LLMClient.ollama(model="qwen3:8b").complete(
        [{"role": "user", "content": f"Summarize these notes:\n{joined}"}],
        max_tokens=500,
        ctx=ctx,
    )
    return {"summary": resp.text}
# --- Run the agent ---
client = LLMClient.anthropic(model="claude-sonnet-4-5", cache=True)
sess = Session(principal="did:local:researcher", max_tokens=32_000)
result = react.run(
    "Find notes tagged 'drug-interaction' and write a short summary.",
    llm=client,
    session=sess,
    tools=["notes.search", "notes.read", "notes.summarize"],
    max_cost_usd=0.10,
    max_wall_time_s=60.0,
    ctx=ctx,
)
print(f"Answer: {result.answer}")
print(f"Steps: {len(result.steps)}")
print(f"Stopped: {result.stopped}")
# Inspect the trajectory
for i, step in enumerate(result.steps):
    print(f"\nStep {i+1}:")
    print(f"  Thought: {step.thought}")
    print(f"  Action: {step.action}")
    print(f"  Observation: {str(step.observation)[:100]}...")
This example shows the pattern: define capabilities as normal, then let
the planner compose them. The agent discovers the tools automatically
(here scoped with tools=[...] to keep the prompt lean), reasons about
them, and produces a trajectory you can inspect step by step.
Deep dives¶
- LLM Client & Session guide -- full client API, retry policies, prompt caching, cost attribution.
- Agent Runtime guide -- per-strategy prompts, parsing, PROV-O shape, error paths, session reference.
- Agentic Patterns guide -- head-to-head comparison with worked examples and cost analysis.
What's next: Chapter 6 -- Data Integration covers how to load data into your knowledge graph from PDFs, CSVs, and other sources using extractors, RML mappings, and vector search.