Chapter 5 — Building Agentic Apps

Learning objectives

After this chapter you will be able to:

  • Explain what makes a Trails app "agentic" versus a plain KG app.
  • Create an LLMClient backed by Anthropic, Ollama, or a mock provider.
  • Manage multi-turn conversations with Session and token windowing.
  • Choose between ReAct, Plan-and-Execute, and Reflexion planners.
  • Enforce cost and time budgets on planner runs.
  • Trace every agent step through the PROV-O provenance graph.

What makes an app "agentic"?

A standard Trails app exposes @capability handlers and answers requests with a single invoke() call. An agentic app wraps those same capabilities in a planning loop: an LLM decides which capability to call next, observes the result, and repeats until a goal is met. The capabilities do not change; the planner is a thin orchestrator on top.

non-agentic:  user  →  invoke("notes.search", {tag: "urgent"})  →  result

agentic:      user  →  planner(goal="find urgent notes and summarise them")
                        ├─ LLM decides → invoke("notes.search", {tag: "urgent"})
                        ├─ LLM reads observation → invoke("notes.summarize", ...)
                        └─ LLM decides → finish with answer

The key difference: the human writes the goal, not the dispatch sequence. The planner fills in the sequence at runtime.
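The loop in the diagram can be sketched in plain Python without any Trails imports. This is an illustration of the idea only, not the planner's implementation: `CAPABILITIES`, `scripted_llm`, and `run_loop` are hypothetical stand-ins, and the "LLM" is a hard-coded function so the example runs offline.

```python
from typing import Any, Callable

# Hypothetical stand-in for registered @capability handlers.
CAPABILITIES: dict[str, Callable[..., Any]] = {
    "notes.search": lambda tag: ["note-1", "note-2"] if tag == "urgent" else [],
    "notes.summarize": lambda ids: f"{len(ids)} urgent notes need attention",
}

def scripted_llm(goal: str, observations: list) -> dict:
    """Stand-in for the LLM: choose the next action from what has been observed."""
    if not observations:
        return {"action": "notes.search", "input": {"tag": "urgent"}}
    if len(observations) == 1:
        return {"action": "notes.summarize", "input": {"ids": observations[0]}}
    return {"action": "finish", "answer": observations[-1]}

def run_loop(goal: str, decide=scripted_llm, max_steps: int = 10):
    observations: list = []
    for _ in range(max_steps):
        decision = decide(goal, observations)      # the model picks the next call
        if decision["action"] == "finish":
            return decision["answer"]
        handler = CAPABILITIES[decision["action"]]
        observations.append(handler(**decision["input"]))  # observe, then repeat
    return None  # step limit reached without a final answer

answer = run_loop("find urgent notes and summarise them")
print(answer)
```

The caller supplies only the goal string; the sequence search, summarize, finish is chosen inside the loop.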

The LLM client

Trails ships a single LLMClient class with three factory methods. All three return a client with the same .complete() interface, so swapping providers is a one-line change.

Ollama (local, free)

from trails.llm import LLMClient

client = LLMClient.ollama(
    model="qwen3:8b",
    base_url="http://localhost:11434",
)

No extra dependencies. Uses stdlib http.client to talk to Ollama's REST API. Run any model locally — Qwen, Llama, Mistral, Phi, Gemma:

ollama pull qwen3:8b && ollama serve

If the server is unreachable you get a clean TrailsError with the hint "Is ollama serve running?".

Anthropic (cloud)

client = LLMClient.anthropic(
    model="claude-sonnet-4-5",
    cache=True,           # enable Anthropic prompt caching
    timeout=30.0,
)

Requires the optional anthropic SDK:

pip install 'trails[llm]'

Mock (tests)

client = LLMClient.mock(response="42 is the answer")

# Or with a callable that inspects the messages:
client = LLMClient.mock(
    response=lambda msgs: f"You sent {len(msgs)} messages",
    cost_usd=0.0,
)

Mock clients are deterministic and free. Use them in unit tests so your test suite never calls an API.

The shared call shape

All three clients expose the same .complete() method:

from trails.llm import Message

resp = client.complete(
    messages=[
        Message(role="system", content="You are a helpful analyst."),
        Message(role="user", content="Summarize Q1 filings."),
    ],
    max_tokens=1024,
    temperature=0.0,
    ctx=ctx,            # optional — enables cost + provenance tracking
)

print(resp.text)        # the LLM's reply
print(resp.cost_usd)    # cost in USD (0.0 for Ollama/mock)
print(resp.usage)       # LLMUsage(prompt_tokens=..., completion_tokens=..., total_tokens=...)

Pass ctx=ctx inside a @capability handler and cost tracking plus PROV-O activity emission happen automatically. Omit it in scripts and tests.

Sessions: multi-turn conversation state

For single-shot LLM calls, pass a messages list directly. For multi-turn work (chat, agent loops, reasoning chains), use Session:

from trails.agent import Session

sess = Session(
    principal="did:local:alice",
    max_tokens=32_000,    # token budget for the conversation window
    pin_head=1,           # pin the first N messages (usually system prompt)
)

sess.append("system", "You are a research assistant.")
sess.append("user", "Find papers about drug interactions.")

Token windowing

Session wraps a TokenWindow — a FIFO sliding window that keeps the conversation within max_tokens. The first pin_head messages (your system prompt) are pinned and never evicted. When the window is full, the oldest non-pinned messages are dropped first.

This means long conversations degrade gracefully: the LLM always sees the system prompt and the most recent turns.
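To make the eviction rule concrete, here is a minimal sketch of a FIFO window with a pinned head. It is not the TokenWindow source: the `SlidingWindow` class and the `estimate_tokens` helper are hypothetical, and counting one token per whitespace-separated word is a deliberate simplification.

```python
from collections import deque
from dataclasses import dataclass, field

def estimate_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())

@dataclass
class SlidingWindow:
    max_tokens: int
    pin_head: int = 1
    pinned: list = field(default_factory=list)
    rolling: deque = field(default_factory=deque)

    def append(self, role: str, content: str) -> None:
        msg = (role, content)
        if len(self.pinned) < self.pin_head:
            self.pinned.append(msg)       # the first pin_head messages are never evicted
        else:
            self.rolling.append(msg)
        # Drop the oldest non-pinned messages until the window fits again.
        while self.total() > self.max_tokens and self.rolling:
            self.rolling.popleft()

    def total(self) -> int:
        return sum(estimate_tokens(c) for _, c in self.messages())

    def messages(self) -> list:
        return [*self.pinned, *self.rolling]

window = SlidingWindow(max_tokens=8, pin_head=1)
window.append("system", "You are terse.")        # 3 tokens, pinned
window.append("user", "first question here")     # 3 tokens
window.append("assistant", "first answer")       # 2 tokens, total is now 8
window.append("user", "second question")         # pushes total to 10, oldest turn is dropped
print(window.messages())
```

After the last append the system prompt and the two most recent turns remain; only the oldest user turn has been evicted.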

History and replay

Session stores the full message history and an invocations list of raw dispatch envelopes. Reuse the same Session across planner calls to continue a conversation:

# First goal
result1 = react.run("Find urgent notes.", llm=client, session=sess)

# Second goal — the LLM sees the full conversation so far
result2 = react.run("Now summarize them.", llm=client, session=sess)

Create a fresh Session for each independent goal.

Planners

Trails ships three planning strategies. They share the same call signature and return the same PlanResult shape, so switching is a one-line import change.

ReAct — think, act, observe

from trails.agent.planners import react

result = react.run(
    "Find urgent notes and summarize them.",
    llm=client,
    session=sess,
    max_steps=10,
    temperature=0.0,
)

print(result.answer)                # the final answer
print(result.stopped)               # "goal_achieved" | "max_steps" | "error"
print(len(result.steps))            # number of think-act-observe turns

On every turn the LLM emits a JSON block with thought, action, and action_input. The planner dispatches the action through trails.invoke, feeds the observation back, and loops. When action == "finish", the planner returns the final_answer.

Tool errors are not fatal: they become the step's observation, so the LLM can self-correct on the next turn.
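A single turn of that loop might look like the following sketch. The `TOOLS` table and `react_step` function are hypothetical, the exact JSON layout (including where `final_answer` sits) is assumed from the description above, and the `"error:"` prefix simply mirrors the convention used later in this chapter's budget example.

```python
import json

def notes_search(tag: str) -> dict:
    return {"hits": ["note-1"]} if tag == "urgent" else {"hits": []}

TOOLS = {"notes.search": notes_search}  # hypothetical tool table

def react_step(llm_reply: str) -> dict:
    """Parse one LLM turn and dispatch it; any failure becomes the observation."""
    block = json.loads(llm_reply)
    if block["action"] == "finish":
        return {"done": True, "answer": block.get("final_answer")}
    try:
        observation = TOOLS[block["action"]](**block["action_input"])
    except Exception as exc:  # unknown tool, bad arguments, or a handler failure
        observation = f"error: {exc!r}"
    return {"done": False, "thought": block["thought"], "observation": observation}

good = react_step(json.dumps({
    "thought": "look up urgent notes",
    "action": "notes.search",
    "action_input": {"tag": "urgent"},
}))
bad = react_step(json.dumps({
    "thought": "try a tool that is not registered",
    "action": "notes.delete",
    "action_input": {},
}))
final = react_step(json.dumps({
    "thought": "done",
    "action": "finish",
    "final_answer": "1 urgent note",
}))
print(good["observation"], bad["observation"], final["answer"], sep="\n")
```

The failed call does not raise out of the step; its error text is what the model would read on the next turn.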

Best for: search-then-summarize, classify-then-route, any task where the next step depends on the last observation and fits under ~10 steps.

Plan-and-Execute — plan once, execute many

from trails.agent.planners import plan_and_execute

result = plan_and_execute.run(
    "Ingest 5 files and validate each.",
    llm=client,
    session=sess,
    max_steps=10,
    max_replans=3,      # replan up to 3 times on failure
)

The LLM emits a full plan in one call (a JSON list of actions), then the kernel executes each step without further LLM calls. If a step fails, the planner asks the LLM for a revised plan and resumes.
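The plan-then-execute control flow can be sketched as below. Everything here is a hypothetical stand-in: `make_plan` plays the LLM, `execute` plays the dispatcher, and the way a replan resumes (re-planning only the unfinished work) is one plausible reading of "resumes", not a statement about the real planner.

```python
from typing import Callable

def plan_and_execute_sketch(make_plan: Callable[[str, list], list],
                            execute: Callable[[str], str],
                            goal: str, max_replans: int = 3) -> dict:
    done: list = []                    # results of steps that already succeeded
    plan = make_plan(goal, done)       # one planning call up front
    replans = 0
    while plan:
        step = plan.pop(0)
        try:
            done.append(execute(step))             # no LLM call while a step runs
        except RuntimeError as exc:
            if replans == max_replans:
                return {"stopped": "error", "done": done, "error": str(exc)}
            replans += 1
            plan = make_plan(goal, done)           # ask for a revised plan, then resume
    return {"stopped": "goal_achieved", "done": done, "replans": replans}

FILES = ["a.csv", "b.csv", "c.csv"]
failures = {"b.csv": 1}                # b.csv fails once, then succeeds

def make_plan(goal: str, done: list) -> list:
    # Stand-in for the LLM: plan every file that has not been ingested yet.
    return [f for f in FILES if f"ingested {f}" not in done]

def execute(step: str) -> str:
    if failures.get(step, 0) > 0:
        failures[step] -= 1
        raise RuntimeError(f"{step}: transient failure")
    return f"ingested {step}"

outcome = plan_and_execute_sketch(make_plan, execute, "Ingest 3 files")
print(outcome)
```

Here the planner stand-in is consulted twice in total: once for the initial plan and once after the single failure.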

Best for: long-horizon workflows with a clear decomposition where each step's success is individually checkable.

Reflexion — self-critique and retry

from trails.agent.planners import reflexion

result = reflexion.run(
    "Write a comprehensive analysis of the trial data.",
    llm=client,
    session=sess,
    max_steps=10,
    max_outer_iterations=3,    # max critic loops
)

Reflexion wraps ReAct in a critic loop. After each inner ReAct run, a separate critic LLM call evaluates the answer. On "accept" the loop ends. On "retry", the critique is appended to the session so the next attempt avoids the named defect.

Critic replies are stored in session.metadata["critiques"] for post-hoc inspection.
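The critic loop can be sketched as follows. The `attempt` and `critic` functions are hypothetical stand-ins for the inner ReAct run and the critic LLM call, and a plain list takes the place of the session's critique store.

```python
def reflexion_sketch(attempt, critic, goal: str, max_outer_iterations: int = 3) -> dict:
    critiques: list = []
    answer = None
    for _ in range(max_outer_iterations):
        answer = attempt(goal, critiques)          # inner run sees all earlier critiques
        verdict, critique = critic(goal, answer)   # separate evaluation of the answer
        if verdict == "accept":
            return {"answer": answer, "critiques": critiques, "accepted": True}
        critiques.append(critique)                 # carried into the next attempt
    return {"answer": answer, "critiques": critiques, "accepted": False}

def attempt(goal: str, critiques: list) -> str:
    # Stand-in for an inner ReAct run: adds units once a critique asks for them.
    return "dose: 5 mg" if any("units" in c for c in critiques) else "dose: 5"

def critic(goal: str, answer: str):
    # Stand-in for the critic LLM call.
    if "mg" in answer:
        return "accept", ""
    return "retry", "state the units"

outcome = reflexion_sketch(attempt, critic, "Report the dose")
print(outcome)
```

The first attempt is rejected, the critique is recorded, and the second attempt is accepted, so the loop ends after two of the three allowed iterations.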

Best for: tasks where correctness is hard to verify in one pass, such as rubric-matching, soft constraints, and answer quality.

Tool discovery

By default, planners discover every registered @capability automatically. You can scope the tool set to reduce prompt size and cost:

# Explicit list — deterministic, lean prompt
react.run(goal, llm=c, session=s, tools=["notes.search", "notes.tag"])

# Top-N most recently registered
react.run(goal, llm=c, session=s, tools=5)

# Keyword filter
react.run(goal, llm=c, session=s, tool_filter="notes search tag")

# Predicate filter
react.run(
    goal, llm=c, session=s,
    tool_filter=lambda meta: "read-only" in meta["description"],
)

At 50+ registered capabilities the system prompt starts dominating cost. Always scope tools= for production workloads.
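The four scoping styles can be pictured as one selection function over a registry. This sketch is an assumption-heavy illustration: `REGISTRY` and `select_tools` are hypothetical, and the keyword rule used here (keep a tool if any keyword appears in its id or description) is a guess, since the matching rule is not spelled out above.

```python
REGISTRY = [  # hypothetical registry, oldest registration first
    {"id": "notes.search", "description": "Search notes by tag (read-only)"},
    {"id": "notes.tag", "description": "Add a tag to a note"},
    {"id": "files.ingest", "description": "Ingest a file"},
]

def select_tools(tools=None, tool_filter=None) -> list:
    selected = list(REGISTRY)
    if isinstance(tools, list):                  # explicit list of capability ids
        selected = [m for m in selected if m["id"] in tools]
    elif isinstance(tools, int):                 # N most recently registered
        selected = selected[-tools:] if tools > 0 else []
    if isinstance(tool_filter, str):             # assumed rule: any keyword matches
        words = tool_filter.lower().split()
        selected = [
            m for m in selected
            if any(w in f"{m['id']} {m['description']}".lower() for w in words)
        ]
    elif callable(tool_filter):                  # arbitrary predicate over metadata
        selected = [m for m in selected if tool_filter(m)]
    return [m["id"] for m in selected]

print(select_tools(tools=["notes.search", "notes.tag"]))
print(select_tools(tools=2))
print(select_tools(tool_filter="notes search"))
print(select_tools(tool_filter=lambda meta: "read-only" in meta["description"]))
```

Whatever the real matching rules are, the effect is the same: fewer tool descriptions end up in the system prompt.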

Budget enforcement

Three cumulative-budget kwargs cap a planner run without depending on max_steps:

Kwarg                   On breach, result.stopped =
max_cost_usd=0.50       "max_cost"
max_tokens=100_000      "max_tokens"
max_wall_time_s=30.0    "max_wall_time"

All three are None (unlimited) by default and available on all three planners.

result = react.run(
    "Analyze all documents.",
    llm=client,
    session=sess,
    max_cost_usd=0.25,
    max_wall_time_s=30.0,
)

if result.stopped == "goal_achieved":
    print("Done:", result.answer)
elif result.stopped in {"max_cost", "max_tokens", "max_wall_time"}:
    print(f"Budget hit ({result.stopped}), best so far:", result.answer)
    # Walk the trajectory for structured partial results
    for step in reversed(result.steps):
        if step.observation and not str(step.observation).startswith("error:"):
            print("Last good observation:", step.observation)
            break

Budget checks run at iteration boundaries, not mid-step. The in-flight operation finishes, the loop aborts cleanly, and a trails:wasTerminatedBy triple is written to the provenance graph.
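Checking at iteration boundaries means a run can overshoot its budget by at most one step. The sketch below shows that behaviour with fake steps; `run_with_budget` is a hypothetical function, and treating "spent >= limit" as the breach condition is an assumption.

```python
import time

def run_with_budget(steps, max_cost_usd=None, max_tokens=None, max_wall_time_s=None):
    spent_usd, spent_tokens, completed = 0.0, 0, []
    started = time.monotonic()
    for step in steps:
        # Budgets are checked before the next iteration starts, never mid-step.
        if max_cost_usd is not None and spent_usd >= max_cost_usd:
            return {"stopped": "max_cost", "steps": completed}
        if max_tokens is not None and spent_tokens >= max_tokens:
            return {"stopped": "max_tokens", "steps": completed}
        if max_wall_time_s is not None and time.monotonic() - started >= max_wall_time_s:
            return {"stopped": "max_wall_time", "steps": completed}
        cost, tokens, observation = step()   # an in-flight step always completes
        spent_usd += cost
        spent_tokens += tokens
        completed.append(observation)
    return {"stopped": "goal_achieved", "steps": completed}

# Five fake steps, each costing $0.10 and 500 tokens.
fake_steps = [lambda i=i: (0.10, 500, f"observation {i}") for i in range(5)]

capped = run_with_budget(fake_steps, max_cost_usd=0.25)
print(capped["stopped"], len(capped["steps"]))
```

With a $0.25 cap the third step still runs because only $0.20 had been spent when it started; the breach is detected before the fourth step, and the three completed observations are kept.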

PROV-O tracking for agent steps

When you pass ctx=ctx to a planner, provenance is automatic:

  • One trails:ReActPlan root activity per run(), tagged with goal, principal, and session ID.
  • One trails:LLMCompletion activity per LLM call (each planner step).
  • Each invoke() inside the loop emits its own prov:Activity.
  • All activities are linked via prov:wasInformedBy.

Without ctx the planner still runs but without telemetry. Tests can omit it; production code should always pass it.
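As a rough picture of the resulting graph, the sketch below builds the triples for a short run as plain tuples. The IRIs, the `trails:goal` predicate, and the exact chaining of `prov:wasInformedBy` links are all hypothetical; only the activity types and the use of `prov:wasInformedBy` come from the list above.

```python
def provenance_for_run(run_id: str, goal: str, n_steps: int) -> list:
    root = f"urn:run:{run_id}"                     # hypothetical IRI scheme
    triples = [
        (root, "rdf:type", "trails:ReActPlan"),
        (root, "trails:goal", goal),               # hypothetical predicate name
    ]
    previous = root
    for i in range(1, n_steps + 1):
        llm = f"{root}/llm/{i}"
        call = f"{root}/invoke/{i}"
        triples += [
            (llm, "rdf:type", "trails:LLMCompletion"),
            (llm, "prov:wasInformedBy", previous),  # assumed: each LLM call follows the prior activity
            (call, "rdf:type", "prov:Activity"),
            (call, "prov:wasInformedBy", llm),      # assumed: each invoke follows its LLM call
        ]
        previous = call
    return triples

triples = provenance_for_run("demo", "Find urgent notes.", n_steps=2)
for t in triples:
    print(t)
```

Walking the `prov:wasInformedBy` links backwards from the last activity leads to the root, which is what makes a run traceable step by step.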

When to use which planner

How is "goal met?" judged?
├── Step-by-step, next action depends on last observation  →  ReAct
├── End-to-end, one upfront decomposition is obvious       →  Plan-and-Execute
└── End-to-end, correctness needs a second pass            →  Reflexion

Single LLM call enough to produce the answer?
→  Don't use a planner; just call LLMClient.complete directly.

Axis               ReAct                       Plan-and-Execute                  Reflexion
LLM calls          N (one per step)            1 + replans                       C x (N_inner + 1 critic)
Best for           Short reactive tasks        Multi-step workflows              Quality-critical answers
Disappoints when   Long horizons (>10 steps)   Every step depends on the prior   Answer is obviously correct on first try
Typical cost       Cheapest                    Cheapest for long clean runs      2-3x ReAct
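The "LLM calls" figures can be turned into a quick estimate. The helper names below are hypothetical, and the reading of the symbols is an assumption: N is the number of steps, C is the number of outer critic iterations, and N_inner is the number of steps in each inner ReAct run.

```python
def react_calls(steps: int) -> int:
    return steps                                   # N: one LLM call per step

def plan_and_execute_calls(replans: int) -> int:
    return 1 + replans                             # one plan, plus one call per replan

def reflexion_calls(outer_iterations: int, inner_steps: int) -> int:
    return outer_iterations * (inner_steps + 1)    # C x (N_inner + 1 critic)

# A six-step task under each strategy:
print(react_calls(6))                 # 6
print(plan_and_execute_calls(1))      # 2
print(reflexion_calls(2, 6))          # 14
```

For the same six-step task, a Reflexion run that needs two outer iterations makes a little over twice as many LLM calls as a single ReAct run.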

For tasks that boil down to one templated LLM call plus one invoke(), skip the planner entirely:

text = client.complete([Message(role="user", content=prompt)], ctx=ctx).text
result = trails.invoke("notes.tag", {"id": note_id, "tag": text.strip()},
                       principal="did:local:alice")

Example: building a research assistant

Here is a complete example that combines everything from this chapter. The assistant searches a note database, reads matching notes, and produces a summary.

from trails import capability
from trails.agent import Session
from trails.agent.planners import react
from trails.llm import LLMClient, Message

# --- Define capabilities ---

@capability(id="notes.search", description="Search notes by tag. Returns note IDs.")
def search_notes(ctx, tag: str) -> dict:
    # NOTE: tag is interpolated into the query as-is; validate or escape it
    # before running this against untrusted input.
    results = ctx.kg.query(f"""
        SELECT ?note WHERE {{
            ?note a <trails://app/Note> ;
                  <trails://app/Note/tag> "{tag}" .
        }}
    """)
    return {"hits": [r["note"] for r in results]}

@capability(id="notes.read", description="Read a note's full text by IRI.")
def read_note(ctx, iri: str) -> dict:
    # NOTE: the same caution applies to iri.
    results = ctx.kg.query(f"""
        SELECT ?text WHERE {{
            <{iri}> <trails://app/Note/text> ?text .
        }}
    """)
    return {"text": results[0]["text"] if results else "Not found"}

@capability(id="notes.summarize", description="Summarize a list of texts.")
def summarize(ctx, texts: list) -> dict:
    joined = "\n---\n".join(texts)
    # Build the request with Message, matching the shared call shape shown earlier.
    resp = LLMClient.ollama(model="qwen3:8b").complete(
        [Message(role="user", content=f"Summarize these notes:\n{joined}")],
        max_tokens=500,
        ctx=ctx,
    )
    return {"summary": resp.text}

# --- Run the agent ---

client = LLMClient.anthropic(model="claude-sonnet-4-5", cache=True)
sess = Session(principal="did:local:researcher", max_tokens=32_000)

result = react.run(
    "Find notes tagged 'drug-interaction' and write a short summary.",
    llm=client,
    session=sess,
    tools=["notes.search", "notes.read", "notes.summarize"],
    max_cost_usd=0.10,
    max_wall_time_s=60.0,
    ctx=ctx,    # request context from the enclosing handler; omit when running as a standalone script
)

print(f"Answer: {result.answer}")
print(f"Steps: {len(result.steps)}")
print(f"Stopped: {result.stopped}")

# Inspect the trajectory
for i, step in enumerate(result.steps):
    print(f"\nStep {i+1}:")
    print(f"  Thought: {step.thought}")
    print(f"  Action:  {step.action}")
    print(f"  Observation: {str(step.observation)[:100]}...")

This example shows the pattern: define capabilities as normal, then let the planner compose them. The agent discovers the tools automatically (here scoped with tools=[...] to keep the prompt lean), reasons about them, and produces a trajectory you can inspect step by step.

What's next: Chapter 6 -- Data Integration covers how to load data into your knowledge graph from PDFs, CSVs, and other sources using extractors, RML mappings, and vector search.