Observability

Trails exposes a zero-dependency event hook plus an in-memory tracer and metrics store. The hook fires on every @capability invocation, on every LLMClient.complete() call, and on every ctx.kg write or query — so an application can wire OpenTelemetry, Prometheus, or a custom sink without Trails taking a hard dependency on any of them.

Commits of record: 622d533 (capability + LLM events) and bbbac1b (KG events). Cost-side mirror is ADR-0012; runtime-side integration is ADR-0018.

Quickstart

```python
from trails import capability, invoke
from trails.observability import register_observer

def log(kind, event):
    print(kind, event)

register_observer(log)

@capability
def greet(name: str) -> dict:
    return {"msg": f"hello {name}"}

invoke("greet", {"name": "ada"})
# capability_started {'capability_id': 'greet', 'args_keys': ['name'], ...}
# capability_completed {'capability_id': 'greet', 'outcome': 'success', ...}
```

Observers run in registration order, receive the same dict, and never affect the invoke path: exceptions are caught and logged.
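Every observer receives the same dict object, so if one observer mutates the event in place, every observer registered after it sees the change. A defensive pattern is to copy the event before enriching it. The sketch below calls the observer directly instead of going through register_observer, so it runs without Trails installed. The `enriched` list and the `service` tag are hypothetical.

```python
from typing import Any

enriched: list = []  # hypothetical in-memory sink

def enriching_observer(kind: str, event: dict[str, Any]) -> None:
    # Shallow copy first: later observers receive this very dict object,
    # so adding keys in place would leak them downstream.
    # (Nested values such as the args_keys list are still shared.)
    record = dict(event)
    record["service"] = "my-app"  # hypothetical extra tag
    enriched.append((kind, record))

# Direct call with an event shaped like the quickstart output.
shared = {"capability_id": "greet", "args_keys": ["name"]}
enriching_observer("capability_started", shared)
```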

Event kinds

Six kinds are emitted from the runtime today. Every event is a plain dict — pick the fields you need.

`capability_started`

Fired before the handler runs (and before any @policy check). Fields: capability_id, args_keys (sorted list of top-level arg names — values are NOT leaked), trace_id, principal, started_at (time.monotonic() snapshot).

`capability_completed`

Fired after provenance is attached, on success only. Fields: capability_id, trace_id, duration_ms, outcome="success".

`capability_failed`

Fired whenever the handler raises, whenever @policy denies, or when the return value is not JSON-serializable. Fields: capability_id, trace_id, duration_ms, outcome="failed", error_kind (the exception class name, e.g. ValueError, PermissionError), message (str(exc)).
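The three capability events can be folded into per-capability counters with a plain function. The sketch below counts outcomes, tallies failures by error_kind, and collects duration_ms for terminal events. It feeds hand-built dicts that follow the field lists above, so the values are made up and Trails is not needed to run it.

```python
from collections import Counter, defaultdict
from typing import Any

outcomes = Counter()           # (capability_id, outcome) -> count
errors = Counter()             # error_kind -> count
durations = defaultdict(list)  # capability_id -> [duration_ms, ...]

def lifecycle_observer(kind: str, event: dict[str, Any]) -> None:
    # capability_started has no outcome or duration; count terminal events only.
    if kind not in ("capability_completed", "capability_failed"):
        return
    cap = event["capability_id"]
    outcomes[(cap, event["outcome"])] += 1
    durations[cap].append(event["duration_ms"])
    if kind == "capability_failed":
        errors[event["error_kind"]] += 1

# Hand-built events shaped like the field lists above (values are made up).
lifecycle_observer("capability_completed", {
    "capability_id": "greet", "trace_id": "t-1",
    "duration_ms": 1.5, "outcome": "success",
})
lifecycle_observer("capability_failed", {
    "capability_id": "greet", "trace_id": "t-2", "duration_ms": 0.5,
    "outcome": "failed", "error_kind": "PermissionError", "message": "denied",
})
```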

`llm_call`

Fired from LLMClient.complete() after retry resolution, independent of whether a Context was passed. Best-effort — an emit failure only logs a warning. Fields: model, provider ("anthropic" | "ollama" | "mock"), tokens (total prompt + completion), cost_usd, latency_ms.
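Because llm_call carries provider, model, tokens, and cost_usd, usage per model needs only a dictionary. The sketch below is plain Python called with made-up events. The model name "example-model" is hypothetical.

```python
from collections import defaultdict
from typing import Any

# (provider, model) -> running totals
usage = defaultdict(lambda: {"calls": 0, "tokens": 0, "cost_usd": 0.0})

def llm_usage_observer(kind: str, event: dict[str, Any]) -> None:
    if kind != "llm_call":
        return
    bucket = usage[(event["provider"], event["model"])]
    bucket["calls"] += 1
    bucket["tokens"] += event["tokens"]  # prompt + completion
    bucket["cost_usd"] += event["cost_usd"]
    # latency_ms is ignored here; feed it to a histogram if you need it.

for cost, tokens in ((0.25, 100), (0.5, 50)):
    llm_usage_observer("llm_call", {
        "model": "example-model", "provider": "anthropic",
        "tokens": tokens, "cost_usd": cost, "latency_ms": 120.0,
    })
```

The updates are not locked. If several invoke threads emit concurrently, guard the bucket update with a `threading.Lock`.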

`kg_write`

Fired from ctx.kg.add, ctx.kg.save, and ctx.kg.update. Common fields: trace_id, principal, duration_ms. Events from add and save also carry op ("add" or "save") and model (the @node_type class name); save events additionally carry dirty_fields (the list of attributes that changed since the last flush). Events from the raw SPARQL escape hatch (ctx.kg.update) carry sparql_kind="update" instead, with no op and no model.
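An observer has to branch on the two kg_write shapes: add/save events carry op and model, while ctx.kg.update events carry sparql_kind. The sketch below tallies writes and dirty fields using hand-built events. The model name "Person" and the principal "alice" are hypothetical.

```python
from collections import Counter
from typing import Any

writes = Counter()  # "add:Person", "save:Person", "sparql:update", ...
dirty = Counter()   # (model, field) -> times a save reported it dirty

def kg_write_observer(kind: str, event: dict[str, Any]) -> None:
    if kind != "kg_write":
        return
    if "op" in event:
        # ctx.kg.add / ctx.kg.save: op and model are present
        writes[f"{event['op']}:{event['model']}"] += 1
        for name in event.get("dirty_fields", ()):  # present on save only
            dirty[(event["model"], name)] += 1
    else:
        # ctx.kg.update (raw SPARQL): sparql_kind instead of op/model
        writes[f"sparql:{event['sparql_kind']}"] += 1

base = {"trace_id": "t-1", "principal": "alice", "duration_ms": 2.0}
kg_write_observer("kg_write", {**base, "op": "add", "model": "Person"})
kg_write_observer("kg_write", {**base, "op": "save", "model": "Person", "dirty_fields": ["email"]})
kg_write_observer("kg_write", {**base, "sparql_kind": "update"})
```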

`kg_query`

Fired from ctx.kg.query and ctx.kg.match. Common fields: trace_id, principal, row_count, duration_ms. Events from the SPARQL path add sparql_kind: one of "select", "ask", or "construct", detected from the first non-PREFIX token of the query. Events from ctx.kg.match add op="match", labels (a list of strings), and types (class names as strings; class objects are never passed to observers).
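Trails' own sparql_kind detector is not shown here. The function below is a hypothetical approximation of "first non-PREFIX token" detection. It skips a leading prologue of PREFIX and BASE declarations, reads the next keyword, and raises on forms other than the three listed. It does not handle SPARQL comments.

```python
import re

# Leading prologue: any mix of "PREFIX pfx: <iri>" and "BASE <iri>" declarations.
_PROLOGUE = re.compile(
    r"\s*(?:(?:PREFIX\s+[A-Za-z0-9_.-]*:\s*<[^>]*>|BASE\s+<[^>]*>)\s*)*",
    re.IGNORECASE,
)
_KEYWORD = re.compile(r"[A-Za-z]+")

def sparql_kind(query: str) -> str:
    # The prologue pattern can match the empty string, so match() never fails.
    body = query[_PROLOGUE.match(query).end():]
    m = _KEYWORD.match(body)
    kind = m.group(0).lower() if m else ""
    if kind not in ("select", "ask", "construct"):
        raise ValueError(f"unsupported SPARQL query form: {kind!r}")
    return kind
```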

API reference

```python
from trails.observability import (
    Observer, register_observer, unregister_observer,
    clear_observers, emit,
)
```

register_observer(callback) — appends callback to the global observer list. Registering the same callable twice fires it twice; each registration must be removed independently.
unregister_observer(callback) — removes the first registration. Silent no-op when not registered (safe in finally blocks).
clear_observers() — strips every registration. Intended for tests.
emit(kind, **fields) — fires an event to every observer. Takes a snapshot under lock, then releases the lock before invoking callbacks, so a slow observer does not block concurrent emitters. Zero observers → cheap return.

Thread-safety: registration, unregistration, and the snapshot inside emit are all guarded by a module-level threading.Lock. Observer callbacks run outside that lock, so an observer may itself emit events or (un)register observers without self-deadlocking — but the changes apply to the next emit, not the in-flight one.
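The registry semantics above can be modelled in a few lines of standard-library Python: duplicate registrations, removal of the first match only, a snapshot taken under a lock, callbacks run outside it, and observer exceptions caught and logged. This is a standalone illustration that mirrors the documented behaviour. It is not Trails' source and should not be used in place of trails.observability.

```python
import logging
import threading
from typing import Any, Callable

Observer = Callable[[str, dict[str, Any]], None]
log = logging.getLogger("observability-sketch")

_lock = threading.Lock()
_observers: list = []

def register_observer(cb: Observer) -> None:
    with _lock:
        _observers.append(cb)  # duplicates allowed: each one fires separately

def unregister_observer(cb: Observer) -> None:
    with _lock:
        try:
            _observers.remove(cb)  # list.remove drops the first match only
        except ValueError:
            pass  # not registered: silent no-op

def emit(kind: str, **fields: Any) -> None:
    if not _observers:  # unlocked fast path when nothing is registered
        return
    with _lock:
        snapshot = list(_observers)  # later (un)registrations affect the next emit
    event = dict(fields)
    for cb in snapshot:  # callbacks run outside the lock, so re-entry is safe
        try:
            cb(kind, event)
        except Exception:
            log.exception("observer %r failed", cb)
```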

OpenTelemetry bridge

The event hook is designed to map cleanly onto OTel spans. The llm_call fields listed above include no trace_id, so the minimal bridge below handles capability events only and turns every invocation into a span:

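```python
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode
from trails.observability import register_observer

otel = trace.get_tracer("my-app")
spans: dict[str, trace.Span] = {}  # open spans, keyed by Trails trace_id

def to_otel(kind, event):
    if kind == "capability_started":
        spans[event["trace_id"]] = otel.start_span(event["capability_id"])
    elif kind in ("capability_completed", "capability_failed"):
        s = spans.pop(event["trace_id"], None)
        if s is None:
            return  # invocation started before this observer was registered
        if kind == "capability_failed":
            # Attribute name is an arbitrary app-side choice.
            s.set_attribute("trails.error_kind", event["error_kind"])
            s.set_status(Status(StatusCode.ERROR, event["message"]))
        s.end()

register_observer(to_otel)
```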

For direct in-process tracing without OTel, the module also ships a TrailsTracer and a module-level tracer singleton. It stores spans in memory, logs each one as a JSON line on the trails.trace logger, and supports nesting:

```python
from trails.observability import tracer

with tracer.span("my-op", attributes={"version": 2}) as span:
    ...  # nested tracer.span() calls inherit span.trace_id
```

tracer.list_spans() / tracer.get_spans_by_trace(trace_id) are the query API used by the trails CLI. TrailsTracer.__init__ accepts an otlp_endpoint argument that is reserved for future OTLP export; until then it is a no-op and spans go to the logger.
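Nesting of this kind is commonly built on `contextvars`. The sketch below shows one way an in-memory tracer could give nested spans the parent's trace_id and record the parent's span_id. SketchTracer and SketchSpan are hypothetical stand-ins, not TrailsTracer; the span fields follow the Span row of the reference table. The "ok"/"error" status values are an assumption.

```python
import contextvars
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Iterator, Optional

@dataclass
class SketchSpan:
    trace_id: str
    span_id: str
    name: str
    attributes: dict[str, Any] = field(default_factory=dict)
    start_time: float = field(default_factory=time.monotonic)
    end_time: Optional[float] = None
    status: str = "ok"  # assumed values: "ok" / "error"
    parent_id: Optional[str] = None

# Innermost open span for the current thread or asyncio task.
_current = contextvars.ContextVar("current_span", default=None)

class SketchTracer:
    def __init__(self) -> None:
        self.spans = []  # finished spans, in completion order

    @contextmanager
    def span(self, name: str, attributes: Optional[dict] = None) -> Iterator[SketchSpan]:
        parent = _current.get()
        s = SketchSpan(
            # A root span starts a new trace; a nested one inherits it.
            trace_id=parent.trace_id if parent else uuid.uuid4().hex,
            span_id=uuid.uuid4().hex[:16],
            name=name,
            attributes=dict(attributes or {}),
            parent_id=parent.span_id if parent else None,
        )
        token = _current.set(s)
        try:
            yield s
        except Exception:
            s.status = "error"
            raise
        finally:
            s.end_time = time.monotonic()
            _current.reset(token)
            self.spans.append(s)

    def get_spans_by_trace(self, trace_id: str) -> list:
        return [s for s in self.spans if s.trace_id == trace_id]
```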

Cost accounting bridge

llm_call events and the CostTracker surface record the same data; the duplication is deliberate:

llm_call events are the thin, user-wirable side. They flow through any observer (OTel, Prometheus, stdout) and fire even when no Context is threaded through.
CostTracker (ADR-0012) is the framework-internal budget primitive: one envelope per call, nested correctly via call_id / parent_call_id (commit d2cd31a), enforceable at capability / principal / tenant scope.

Use observers for metrics export, dashboards, and custom alerting; use CostTracker when you need to enforce budget before the next call runs.
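An observer only sees an llm_call after the call has completed, which is why observers suit alerting but not enforcement. The plain-Python sketch below raises one alert the first time running spend crosses a limit. The `make_budget_alert` factory, the limit, and the `alert` callback are hypothetical; blocking the next call remains CostTracker's job.

```python
import threading
from typing import Any, Callable

def make_budget_alert(limit_usd: float, alert: Callable[[float], None]):
    spent = 0.0
    fired = False
    lock = threading.Lock()  # observers may run on several invoke threads

    def observer(kind: str, event: dict[str, Any]) -> None:
        nonlocal spent, fired
        if kind != "llm_call":
            return
        with lock:
            spent += event["cost_usd"]
            # The call that crossed the limit has already happened;
            # this can only report it, not prevent it.
            should_fire = spent > limit_usd and not fired
            if should_fire:
                fired = True
            total = spent
        if should_fire:
            alert(total)  # called outside the lock

    return observer
```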

Best practices

Observers are best-effort. An exception in your callback is caught and logged on trails.observability, never re-raised. Treat observers as fire-and-forget.
Do not block. The emit path runs synchronously on the invoke thread. If your sink is remote (Datadog, Honeycomb, OTLP HTTP), queue the event and flush on a background thread. Synchronous observers must stay microseconds-cheap.
Observers compose. Register as many as you need; every one receives every event. Call clear_observers() between tests to avoid leakage between cases — see trails.testing.capture_events for a context-manager helper that registers + unregisters an in-memory collector.
Zero observers = zero cost. emit exits early when the list is empty; decorator-free handlers pay no observability overhead.
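The "do not block" advice can be sketched as an observer that only enqueues on the emitting thread and leaves a daemon thread to hand batches to the sink. QueuedObserver, the sink callable, and batch_size are hypothetical names; the sink stands in for a real exporter client.

```python
import logging
import queue
import threading
from typing import Any, Callable

log = logging.getLogger("queued-observer-sketch")

class QueuedObserver:
    """Observer that defers all sink I/O to a background thread."""

    _STOP = object()  # sentinel: worker drains what is queued, then exits

    def __init__(self, sink: Callable[[list], None], batch_size: int = 100) -> None:
        self._q = queue.Queue()  # unbounded, so put() never blocks
        self._sink = sink
        self._batch_size = batch_size
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def __call__(self, kind: str, event: dict[str, Any]) -> None:
        # Runs on the invoke thread: a dict copy and an enqueue, no I/O.
        self._q.put((kind, dict(event)))

    def _flush(self, batch: list) -> None:
        try:
            self._sink(batch)
        except Exception:  # a failing sink must not kill the worker
            log.exception("sink failed; dropped %d events", len(batch))

    def _run(self) -> None:
        batch = []
        while True:
            item = self._q.get()
            if item is self._STOP:
                break
            batch.append(item)
            # Ship when the batch is full or no more events are waiting.
            if len(batch) >= self._batch_size or self._q.empty():
                self._flush(batch)
                batch = []
        if batch:
            self._flush(batch)

    def close(self, timeout: float = 5.0) -> None:
        """Drain queued events, then stop the worker."""
        self._q.put(self._STOP)
        self._worker.join(timeout)
```

The queue is unbounded, so a stalled sink grows memory. A production variant might set `maxsize` and use `put_nowait`, dropping events on `queue.Full` rather than blocking the invoke thread.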

Reference table

Every public symbol exported by trails.observability:

| Symbol | Kind | Purpose |
|---|---|---|
| `EventKind` | `Literal` type alias | Enumerates the six emitted kinds |
| `Observer` | `Callable[[str, dict], None]` alias | Callback signature |
| `Span` | dataclass | Single trace span (`trace_id`, `span_id`, `name`, `attributes`, `start_time`, `end_time`, `status`, `parent_id`) |
| `TrailsTracer` | class | In-memory tracer; `start_span`, `end_span`, `span()` ctx manager, `list_spans`, `get_spans_by_trace`, `clear` |
| `TrailsMetrics` | class | In-memory counters + latency histograms; `record_invocation`, `record_error`, `get_summary`, `clear` |
| `tracer` | `TrailsTracer` singleton | Process-wide tracer used by the CLI |
| `metrics` | `TrailsMetrics` singleton | Process-wide metrics registry |
| `register_observer(cb)` | function | Append an event observer |
| `unregister_observer(cb)` | function | Remove one registration (silent no-op if absent) |
| `clear_observers()` | function | Drop every registration (tests) |
| `emit(kind, **fields)` | function | Fire an event to every observer |

See also: LLM Client & Session for how llm_call is wired, and Testing for capture_events — the canonical in-test observer.