Observability

Trails exposes a zero-dependency event hook plus an in-memory tracer and metrics store. The hook fires on every @capability invocation, on every LLMClient.complete() call, and on every ctx.kg write or query — so an application can wire OpenTelemetry, Prometheus, or a custom sink without Trails taking a hard dependency on any of them.

Commits of record: 622d533 (capability + LLM events) and bbbac1b (KG events). Cost-side mirror is ADR-0012; runtime-side integration is ADR-0018.

Quickstart

from trails import capability, invoke
from trails.observability import register_observer

def log(kind, event):
    print(kind, event)

register_observer(log)

@capability
def greet(name: str) -> dict:
    return {"msg": f"hello {name}"}

invoke("greet", {"name": "ada"})
# capability_started {'capability_id': 'greet', 'args_keys': ['name'], ...}
# capability_completed {'capability_id': 'greet', 'outcome': 'success', ...}

Observers run in registration order, receive the same dict, and never affect the invoke path: exceptions are caught and logged.

Event kinds

Six kinds are emitted from the runtime today. Every event is a plain dict — pick the fields you need.

capability_started

Fired before the handler runs (and before any @policy check). Fields: capability_id, args_keys (sorted list of top-level arg names — values are NOT leaked), trace_id, principal, started_at (time.monotonic() snapshot).

capability_completed

Fired after provenance is attached, on success only. Fields: capability_id, trace_id, duration_ms, outcome="success".

capability_failed

Fired whenever the handler raises, whenever @policy denies, or when the return value is not JSON-serializable. Fields: capability_id, trace_id, duration_ms, outcome="failed", error_kind (the exception class name, e.g. ValueError, PermissionError), message (str(exc)).
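As an illustration of the three capability events, the sketch below tallies outcomes and failure classes using only the fields listed above. The observer name `tally_capabilities` and the sample events are made up for the example. It assumes every capability_started is eventually followed by one capability_completed or capability_failed with the same trace_id, which the descriptions above suggest but do not state outright.

```python
from collections import Counter

outcomes = Counter()      # "success" / "failed" totals
error_kinds = Counter()   # failures broken down by exception class name
in_flight = set()         # trace_ids that have started but not yet finished

def tally_capabilities(kind, event):
    """Observer: track capability outcomes from the documented fields only."""
    if kind == "capability_started":
        in_flight.add(event["trace_id"])
    elif kind in ("capability_completed", "capability_failed"):
        in_flight.discard(event["trace_id"])
        outcomes[event["outcome"]] += 1
        if kind == "capability_failed":
            error_kinds[event["error_kind"]] += 1

# In an application this would be wired with register_observer(tally_capabilities).
# Here the observer is driven by hand with illustrative events.
tally_capabilities("capability_started", {"capability_id": "greet", "trace_id": "t1"})
tally_capabilities("capability_completed", {
    "capability_id": "greet", "trace_id": "t1",
    "duration_ms": 1.2, "outcome": "success",
})
tally_capabilities("capability_started", {"capability_id": "greet", "trace_id": "t2"})
tally_capabilities("capability_failed", {
    "capability_id": "greet", "trace_id": "t2", "duration_ms": 0.4,
    "outcome": "failed", "error_kind": "PermissionError", "message": "denied",
})
```

Because a policy denial arrives as capability_failed with error_kind set to the exception class name, a single counter keyed on error_kind can separate permission failures from handler bugs.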

llm_call

Fired from LLMClient.complete() after retry resolution, independent of whether a Context was passed. Best-effort — an emit failure only logs a warning. Fields: model, provider ("anthropic" | "ollama" | "mock"), tokens (total prompt + completion), cost_usd, latency_ms.
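A minimal sketch of an llm_call consumer, assuming the event carries exactly the fields listed above. The observer name `track_llm_usage`, the model names, and the numbers are illustrative, not taken from Trails.

```python
from collections import defaultdict

# Running totals keyed by the "model" field of each llm_call event.
totals = defaultdict(lambda: {"calls": 0, "tokens": 0, "cost_usd": 0.0})

def track_llm_usage(kind, event):
    """Observer: accumulate call count, tokens, and cost per model."""
    if kind != "llm_call":
        return  # ignore capability and KG events
    row = totals[event["model"]]
    row["calls"] += 1
    row["tokens"] += event["tokens"]
    row["cost_usd"] += event["cost_usd"]

# Driven by hand here; in an application, register_observer(track_llm_usage).
track_llm_usage("llm_call", {
    "model": "example-model", "provider": "mock",
    "tokens": 120, "cost_usd": 0.25, "latency_ms": 3.0,
})
track_llm_usage("llm_call", {
    "model": "example-model", "provider": "mock",
    "tokens": 80, "cost_usd": 0.5, "latency_ms": 2.0,
})
track_llm_usage("capability_started", {"capability_id": "greet"})  # ignored
```

Since tokens is a single total, this kind of observer cannot split prompt from completion usage.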

kg_write

Fired from ctx.kg.add, ctx.kg.save, and ctx.kg.update. Common fields: trace_id, principal, duration_ms. The add / save paths add op ("add" or "save") and model (the @node_type class name); save additionally carries dirty_fields (the list of attributes that changed since the last flush). The raw SPARQL escape hatch (ctx.kg.update) instead carries sparql_kind="update" — no op, no model.

kg_query

Fired from ctx.kg.query and ctx.kg.match. Common fields: trace_id, principal, row_count, duration_ms. The SPARQL path adds sparql_kind, one of "select", "ask", or "construct" (detected from the first non-prefix token). The match path instead adds op="match", labels (list of strings), and types (class names stringified; class objects are never leaked to observers).
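Because the KG events carry either op or sparql_kind depending on the code path, an observer has to read both defensively. The following sketch shows one way to do that. The helper names `kg_label` and `count_kg` and the sample events are hypothetical.

```python
from collections import Counter

kg_ops = Counter()  # e.g. {"kg_write:save": 3, "kg_query:select": 10}

def kg_label(kind, event):
    """Build a label such as 'kg_write:save' or 'kg_query:select'."""
    # Typed paths (add / save / match) carry "op"; raw SPARQL paths carry
    # "sparql_kind" instead, so fall back from one to the other.
    detail = event.get("op") or event.get("sparql_kind") or "unknown"
    return f"{kind}:{detail}"

def count_kg(kind, event):
    """Observer: count KG reads and writes by operation."""
    if kind in ("kg_write", "kg_query"):
        kg_ops[kg_label(kind, event)] += 1

# Driven by hand with illustrative events.
count_kg("kg_write", {"op": "save", "model": "Person", "dirty_fields": ["name"]})
count_kg("kg_write", {"sparql_kind": "update"})
count_kg("kg_query", {"sparql_kind": "select", "row_count": 3})
count_kg("kg_query", {"op": "match", "labels": [], "types": ["Person"], "row_count": 1})
count_kg("llm_call", {"model": "example-model"})  # not a KG event, ignored
```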

API reference

from trails.observability import (
    Observer, register_observer, unregister_observer,
    clear_observers, emit,
)
  • register_observer(callback) — appends callback to the global observer list. Registering the same callable twice fires it twice; each registration must be removed independently.
  • unregister_observer(callback) — removes the first registration. Silent no-op when not registered (safe in finally blocks).
  • clear_observers() — strips every registration. Intended for tests.
  • emit(kind, **fields) — fires an event to every observer. Takes a snapshot under lock, then releases the lock before invoking callbacks, so a slow observer does not block concurrent emitters. Zero observers → cheap return.

Thread-safety: registration, unregistration, and the snapshot inside emit are all guarded by a module-level threading.Lock. Observer callbacks run outside that lock, so an observer may itself emit events or (un)register observers without self-deadlocking — but the changes apply to the next emit, not the in-flight one.

OpenTelemetry bridge

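The event hook is designed to map cleanly onto OTel spans. The minimal bridge below opens a span on capability_started and ends it on capability_completed or capability_failed, keyed by trace_id. llm_call events are not mapped here, since their documented fields do not include a trace_id that would tie them to a capability span.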

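from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode
from trails.observability import register_observer

otel = trace.get_tracer("my-app")
# Open spans keyed by trace_id; assumes one in-flight capability per trace_id.
spans: dict[str, trace.Span] = {}

def to_otel(kind, event):
    if kind == "capability_started":
        # Open a span named after the capability.
        spans[event["trace_id"]] = otel.start_span(event["capability_id"])
    elif kind in ("capability_completed", "capability_failed"):
        s = spans.pop(event["trace_id"], None)
        if s is not None:
            if kind == "capability_failed":
                # Mark the span as errored and carry the failure message.
                s.set_status(Status(StatusCode.ERROR, event.get("message")))
            s.end()

register_observer(to_otel)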

For direct in-process tracing without OTel, the module also ships a TrailsTracer and a module-level tracer singleton. It stores spans in memory, logs each one as a JSON line on the trails.trace logger, and supports nesting:

from trails.observability import tracer

with tracer.span("my-op", attributes={"version": 2}) as span:
    ...  # nested tracer.span() calls inherit span.trace_id

tracer.list_spans() / tracer.get_spans_by_trace(trace_id) are the query API used by the trails CLI. TrailsTracer.__init__ accepts an otlp_endpoint argument that is reserved for future OTLP export; until then it is a no-op and spans go to the logger.

Cost accounting bridge

llm_call events and the CostTracker surface deliberately record the same data through two separate channels:

  • llm_call events are the thin, user-wirable side. They flow through any observer (OTel, Prometheus, stdout) and fire even when no Context is threaded through.
  • CostTracker (ADR-0012) is the framework-internal budget primitive: one envelope per call, nested correctly via call_id / parent_call_id (commit d2cd31a), enforceable at capability / principal / tenant scope.

Use observers for metrics export, dashboards, and custom alerting; use CostTracker when you need to enforce budget before the next call runs.

Best practices

  • Observers are best-effort. An exception in your callback is caught and logged on trails.observability, never re-raised. Treat observers as fire-and-forget.
  • Do not block. The emit path runs synchronously on the invoke thread. If your sink is remote (Datadog, Honeycomb, OTLP HTTP), queue the event and flush on a background thread. Synchronous observers must stay microseconds-cheap.
  • Observers compose. Register as many as you need; every one receives every event. Call clear_observers() between tests to avoid leakage between cases — see trails.testing.capture_events for a context-manager helper that registers + unregisters an in-memory collector.
  • Zero observers = zero cost. emit exits early when the list is empty; decorator-free handlers pay no observability overhead.
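The "do not block" advice can be sketched as an observer that only enqueues on the invoke thread and ships from a background worker. `QueuedSink` and its `send` parameter are hypothetical names for this illustration; `send` stands in for whatever slow call your sink needs (an HTTP post, for example).

```python
import queue
import threading

class QueuedSink:
    """Observer that enqueues events and ships them from a worker thread."""

    def __init__(self, send, maxsize=1000):
        self._send = send                    # slow function: send(kind, event)
        self._q = queue.Queue(maxsize=maxsize)
        self.dropped = 0                     # approximate count of shed events
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def __call__(self, kind, event):
        try:
            # Shallow-copy the dict and never block the invoke thread.
            self._q.put_nowait((kind, dict(event)))
        except queue.Full:
            self.dropped += 1                # shed load instead of stalling

    def _run(self):
        while True:
            item = self._q.get()
            if item is None:                 # sentinel pushed by close()
                break
            try:
                self._send(*item)
            except Exception:
                pass                         # a real sink would log or retry

    def close(self):
        """Flush queued events, then stop the worker."""
        self._q.put(None)
        self._worker.join()

# Driven by hand; in an application, register_observer(sink).
shipped = []
sink = QueuedSink(send=lambda kind, event: shipped.append((kind, event)))
sink("llm_call", {"model": "example-model", "latency_ms": 3.0})
sink("capability_completed", {"capability_id": "greet", "outcome": "success"})
sink.close()
```

The bounded queue is a design choice: when the remote sink falls behind, events are dropped and counted rather than allowed to slow the invoke path or grow memory without limit.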

Reference table

Every public symbol exported by trails.observability:

Symbol | Kind | Purpose
--- | --- | ---
EventKind | Literal type alias | Enumerates the six emitted kinds
Observer | Callable[[str, dict], None] alias | Callback signature
Span | dataclass | Single trace span (trace_id, span_id, name, attributes, start_time, end_time, status, parent_id)
TrailsTracer | class | In-memory tracer; start_span, end_span, span() ctx manager, list_spans, get_spans_by_trace, clear
TrailsMetrics | class | In-memory counters + latency histograms; record_invocation, record_error, get_summary, clear
tracer | TrailsTracer singleton | Process-wide tracer used by the CLI
metrics | TrailsMetrics singleton | Process-wide metrics registry
register_observer(cb) | function | Append an event observer
unregister_observer(cb) | function | Remove one registration (silent no-op if absent)
clear_observers() | function | Drop every registration (tests)
emit(kind, **fields) | function | Fire an event to every observer

See also: LLM Client & Session for how llm_call is wired, and Testing for capture_events — the canonical in-test observer.