ADR-0039: Live Schema Inference — Streaming Schema Discovery from KG Writes

  • Status: Accepted
  • Date: 2026-04-18

Context

Trails ships deterministic batch schema inference (M14, onto_infer): it scans the entire store, clusters subjects by rdf:type or predicate similarity, and emits @node_type candidates. This works well for one-shot analysis of an existing graph but offers no feedback while an application is running. Common pain points:

  1. No live feedback. A developer adds a new entity type through ctx.kg.add() but doesn't discover until much later that the field types drifted (e.g. a field that used to be int now receives str values).
  2. No proactive suggestions. When new rdf:type values appear that don't match any @node_type, the system stays silent. Developers want a nudge: "You wrote 10 :Bug nodes — here's the @node_type definition."
  3. Cardinality surprises. A field that was always single-valued suddenly receives a list. Today this breaks downstream code silently.
  4. No streaming integration. The batch command trails onto infer re-reads the whole store on every run. For development inner loops (write → check → iterate) an incremental, event-driven approach is more ergonomic.

Decision

1. trails.schema_watcher module

A new module provides streaming schema inference by observing kg_write events via the existing observability hook (trails.observability.register_observer).

Core types:

  • SchemaWatcher — registers as an observer on kg_write events. Maintains per-type statistics (fields seen, value types, cardinality histograms, sample counts). Thread-safe; suitable for long-running servers.
  • SchemaAlert — emitted when the watcher detects a schema anomaly: new unknown type, new field on a known type, type drift (a field's inferred Python type changes), or cardinality change (single → multi-valued).
  • SchemaSuggestion — emitted via get_suggestions() once a type has accumulated enough samples (min_samples, default 5). Includes the inferred field map and a ready-to-paste @node_type(...) code string.
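
The bookkeeping behind these types can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the shipped module: the names Alert, StatsSketch and observe(), and the alert_type strings, are hypothetical, and "known type" is taken here to mean a type that has been seen at least once. Only the four anomaly kinds and the need for thread safety come from this ADR.

```python
from __future__ import annotations

import threading
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical stand-in for SchemaAlert."""
    alert_type: str  # assumed labels: new_type, new_field, type_drift, cardinality_change
    type_name: str
    field_name: str | None = None


class StatsSketch:
    """Per-type statistics; a lock keeps updates consistent across threads."""

    def __init__(self, known_types: set[str]) -> None:
        self._lock = threading.Lock()
        self._known = set(known_types)
        self._samples: dict[str, int] = defaultdict(int)
        # type name -> field name -> names of Python types seen so far
        self._value_types: dict[str, dict[str, set[str]]] = defaultdict(dict)
        # (type name, field name) pairs that have received a multi-valued write
        self._multi: set[tuple[str, str]] = set()

    def observe(self, type_name: str, fields: dict[str, object]) -> list[Alert]:
        """Record one write and return any anomalies it reveals."""
        alerts: list[Alert] = []
        with self._lock:
            first_sample = self._samples[type_name] == 0
            self._samples[type_name] += 1
            if first_sample and type_name not in self._known:
                alerts.append(Alert("new_type", type_name))
            seen = self._value_types[type_name]
            for name, value in fields.items():
                is_multi = isinstance(value, (list, tuple, set))
                values = list(value) if is_multi else [value]
                py_types = {type(v).__name__ for v in values}
                key = (type_name, name)
                if name not in seen:
                    # First sighting of this field: record it as the baseline.
                    seen[name] = set(py_types)
                    if is_multi:
                        self._multi.add(key)
                    if not first_sample:
                        alerts.append(Alert("new_field", type_name, name))
                    continue
                if not py_types <= seen[name]:
                    alerts.append(Alert("type_drift", type_name, name))
                    seen[name] |= py_types
                if is_multi and key not in self._multi:
                    alerts.append(Alert("cardinality_change", type_name, name))
                    self._multi.add(key)
        return alerts
```

Each anomaly is reported once per field: after a drift or cardinality alert the new shape becomes part of the baseline, so repeated writes of the same kind stay quiet.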

2. Progressive / opt-in

The watcher is not started by default. Users enable it explicitly:

from trails.schema_watcher import SchemaWatcher

watcher = SchemaWatcher(ctx, min_samples=5)
watcher.start()
# ... use ctx.kg normally ...
alerts = watcher.get_alerts()
suggestions = watcher.get_suggestions()
watcher.stop()

This preserves ADR-0021's progressive-enhancement promise: the KG write path has zero overhead when the watcher is not active.

3. Alert callback

An optional alert_callback parameter lets users react to alerts as they happen (e.g. log, push to a dashboard, fail-fast in tests):

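from trails.schema_watcher import SchemaAlert, SchemaWatcher

def on_alert(alert: SchemaAlert) -> None:
    # Invoked for each alert as the watcher detects it.
    print(f"SCHEMA: {alert.alert_type} on {alert.type_name}.{alert.field_name}")

watcher = SchemaWatcher(ctx, alert_callback=on_alert)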
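
For the fail-fast-in-tests case, one possible pattern is a callable that collects alerts and is checked at the end of the test, which avoids relying on how the watcher treats exceptions raised inside a callback (this ADR does not specify that). The sketch below is self-contained: FakeAlert is a hypothetical stand-in for SchemaAlert, and AlertCollector and the alert_type strings are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FakeAlert:
    """Hypothetical stand-in exposing the attributes used by the callback above."""
    alert_type: str
    type_name: str
    field_name: str | None = None


class AlertCollector:
    """Callable suitable as an alert_callback; records alerts for a later check."""

    def __init__(self, fatal: tuple[str, ...] = ("type_drift", "cardinality_change")) -> None:
        self.fatal = frozenset(fatal)  # alert kinds that should fail the test
        self.alerts: list = []

    def __call__(self, alert) -> None:
        # list.append is atomic in CPython, so concurrent callbacks are safe here.
        self.alerts.append(alert)

    def assert_clean(self) -> None:
        """Raise AssertionError if any fatal alert was collected."""
        bad = [a for a in self.alerts if a.alert_type in self.fatal]
        if bad:
            details = ", ".join(
                f"{a.alert_type} on {a.type_name}.{a.field_name}" for a in bad
            )
            raise AssertionError(f"schema anomalies detected: {details}")
```

In a test, an AlertCollector instance would be passed as alert_callback, the code under test would write to ctx.kg as usual, and collector.assert_clean() would run after watcher.stop().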

4. CLI surface

Two new subcommands under trails schema:

  • trails schema watch — start the watcher in foreground, print alerts and suggestions as they arrive. Useful during trails serve development loops.
  • trails schema suggest — run batch inference (existing onto_infer) augmented with any live watcher stats, then display suggestions.

5. Integration with existing batch inference

SchemaWatcher.get_suggestions() produces SchemaSuggestion objects that parallel onto_infer.NodeTypeCandidate. A future PR may unify the two into a shared generate_code() path; for now they are independent implementations with compatible output.
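
To make the idea of a shared generate_code() path concrete, here is a rough sketch of turning an inferred field map into a pasteable code string. Everything in it is an assumption made for illustration: the function name render_node_type, the shape of the field map, and above all the decorator form, since this ADR does not show the actual @node_type signature.

```python
from __future__ import annotations


def render_node_type(type_name: str, fields: dict[str, tuple[str, bool]]) -> str:
    """Render a class definition from an inferred field map.

    fields maps a field name to (python_type_name, is_multi_valued).
    The decorator call emitted below is a placeholder, not the real API.
    """
    lines = [f'@node_type("{type_name}")', f"class {type_name}:"]
    if not fields:
        lines.append("    pass")
    # Sort field names so the same statistics always produce the same text.
    for name in sorted(fields):
        py_type, is_multi = fields[name]
        annotation = f"list[{py_type}]" if is_multi else py_type
        lines.append(f"    {name}: {annotation}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_node_type("Bug", {"title": ("str", False), "tags": ("str", True)}))
    # Prints:
    # @node_type("Bug")
    # class Bug:
    #     tags: list[str]
    #     title: str
```

Sorting the fields keeps the output deterministic, which matters if the batch and streaming paths are ever expected to produce identical suggestions for the same data.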

Consequences

  • Positive: Developers get immediate feedback when schema patterns emerge or drift, reducing debugging cycles.
  • Positive: The alert mechanism integrates naturally with the existing observability pipeline — no new event infrastructure needed.
  • Negative: The watcher accumulates per-type/per-field statistics in memory. For stores with thousands of distinct types this could grow; a future refinement may add LRU eviction or sampling.
  • Neutral: The watcher does not modify the store or block writes. All analysis is read-only and best-effort.
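
The LRU eviction mentioned as a possible refinement could look roughly like the sketch below, which bounds the number of tracked types with collections.OrderedDict. The class name LruTypeStats, the max_types parameter, and the shape of each statistics entry are hypothetical; the ADR only names eviction as a future option.

```python
from __future__ import annotations

from collections import OrderedDict


class LruTypeStats:
    """Keeps statistics for at most max_types types, evicting the least recently written."""

    def __init__(self, max_types: int = 1024) -> None:
        if max_types < 1:
            raise ValueError("max_types must be at least 1")
        self._max = max_types
        self._stats: OrderedDict[str, dict[str, int]] = OrderedDict()

    def record(self, type_name: str) -> dict[str, int]:
        """Count one write for type_name and mark it as most recently used."""
        if type_name in self._stats:
            self._stats.move_to_end(type_name)
        else:
            self._stats[type_name] = {"samples": 0}
            if len(self._stats) > self._max:
                # The new entry sits at the end, so this drops the oldest one.
                self._stats.popitem(last=False)
        entry = self._stats[type_name]
        entry["samples"] += 1
        return entry

    def tracked(self) -> list[str]:
        """Type names currently held, from least to most recently written."""
        return list(self._stats)
```

One trade-off of this approach: an evicted type starts again from zero samples, so a rarely written type may never reach min_samples. Sampling would behave differently and is not sketched here.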

References

  • python/src/trails/onto_infer.py — batch inference (M14)
  • python/src/trails/observability.py — event hook infrastructure (M3)
  • python/src/trails/context.py — kg_write event emission
  • ADR-0021 — progressive enhancement (north star)