ADR-0039: Live Schema Inference — Streaming Schema Discovery from KG Writes
- Status: Accepted
- Date: 2026-04-18
Context
Trails ships deterministic batch schema inference (M14 onto_infer):
scan the entire store, cluster subjects by rdf:type or predicate
similarity, emit @node_type candidates. This works well for
one-shot analysis of an existing graph, but offers no feedback while
an application is running. Common pain points:
- No live feedback. A developer adds a new entity type through ctx.kg.add() but doesn't discover until much later that the field types drifted (e.g. a field that used to be int now receives str values).
- No proactive suggestions. When new rdf:type values appear that don't match any @node_type, the system stays silent. Developers want a nudge: "You wrote 10 :Bug nodes; here's the @node_type definition."
- Cardinality surprises. A field that was always single-valued suddenly receives a list. Today this breaks downstream code silently.
- No streaming integration. Batch trails onto infer re-reads the whole store. For development inner loops (write → check → iterate) an incremental, event-driven approach is more ergonomic.
Decision
1. trails.schema_watcher module
A new module provides streaming schema inference by observing
kg_write events via the existing observability hook
(trails.observability.register_observer).
Core types:
- SchemaWatcher: registers as an observer on kg_write events. Maintains per-type statistics (fields seen, value types, cardinality histograms, sample counts). Thread-safe; suitable for long-running servers.
- SchemaAlert: emitted when the watcher detects a schema anomaly: new unknown type, new field on a known type, type drift (a field's inferred Python type changes), or cardinality change (single → multi-valued).
- SchemaSuggestion: emitted via get_suggestions() once a type has accumulated enough samples (min_samples, default 5). Includes the inferred field map and a ready-to-paste @node_type(...) code string.
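The four anomaly kinds listed for SchemaAlert can be illustrated with a small, self-contained sketch. It is a toy stand-in, not the SchemaWatcher implementation: the names FieldStats and TypeStatsTracker, the alert strings, and the tuple-shaped alerts are all assumptions made for illustration, and locking is omitted for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FieldStats:
    """Running statistics for one field of one type (hypothetical shape)."""
    value_types: set = field(default_factory=set)  # Python type names seen so far
    multi_valued: bool = False                     # has a list ever been written?
    samples: int = 0


class TypeStatsTracker:
    """Toy stand-in for the watcher's bookkeeping; not the real SchemaWatcher."""

    def __init__(self, known_types=()):
        self.known_types = set(known_types)
        self.stats = defaultdict(dict)  # type_name -> {field_name: FieldStats}

    def observe(self, type_name, props):
        """Record one write; return (alert_type, type_name, field_name) tuples."""
        alerts = []
        if type_name not in self.known_types and type_name not in self.stats:
            alerts.append(("new_type", type_name, None))
        fields = self.stats[type_name]
        first_write = not fields  # every field is new on the first write
        for name, value in props.items():
            is_list = isinstance(value, list)
            values = value if is_list else [value]
            seen = {type(v).__name__ for v in values}
            fs = fields.get(name)
            if fs is None:
                fs = fields[name] = FieldStats()
                if not first_write:
                    alerts.append(("new_field", type_name, name))
            else:
                if seen - fs.value_types:
                    alerts.append(("type_drift", type_name, name))
                if is_list and not fs.multi_valued:
                    alerts.append(("cardinality_change", type_name, name))
            fs.value_types |= seen
            fs.multi_valued = fs.multi_valued or is_list
            fs.samples += 1
        return alerts
```

In this sketch, writing {"severity": 3} and then {"severity": "high"} for the same type yields a type_drift alert on the second write, and a later {"severity": [1, 2]} yields a cardinality_change alert.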
2. Progressive / opt-in
The watcher is not started by default. Users enable it explicitly:
from trails.schema_watcher import SchemaWatcher
watcher = SchemaWatcher(ctx, min_samples=5)
watcher.start()
# ... use ctx.kg normally ...
alerts = watcher.get_alerts()
suggestions = watcher.get_suggestions()
watcher.stop()
This preserves ADR-0021's progressive-enhancement promise: the KG write path has zero overhead when the watcher is not active.
3. Alert callback
An optional alert_callback parameter lets users react to alerts
as they happen (e.g. log, push to a dashboard, fail-fast in tests):
from trails.schema_watcher import SchemaAlert, SchemaWatcher

def on_alert(alert: SchemaAlert) -> None:
    print(f"SCHEMA: {alert.alert_type} on {alert.type_name}.{alert.field_name}")

watcher = SchemaWatcher(ctx, alert_callback=on_alert)
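For the fail-fast-in-tests use case, a callback could raise instead of printing. The sketch below is self-contained and therefore defines a local stand-in for SchemaAlert carrying only the three attributes used in the example above; SchemaDriftError, FATAL_ALERTS, and the alert_type strings are assumptions, since this ADR does not spell out the actual values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SchemaAlert:
    """Local stand-in carrying only the three attributes used above."""
    alert_type: str
    type_name: str
    field_name: Optional[str] = None


class SchemaDriftError(AssertionError):
    """Raised by the callback so a test can fail at the offending write."""


# Assumed alert_type strings; the real values may be spelled differently.
FATAL_ALERTS = {"type_drift", "cardinality_change"}


def fail_on_drift(alert: SchemaAlert) -> None:
    # Tolerate informational alerts (e.g. a new type); raise on drift.
    if alert.alert_type in FATAL_ALERTS:
        raise SchemaDriftError(
            f"{alert.alert_type} on {alert.type_name}.{alert.field_name}"
        )
```

Whether an exception raised inside the callback propagates to the code performing the write depends on how the watcher invokes callbacks, which this ADR does not specify; a test may need to collect alerts and assert on them after the fact instead.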
4. CLI surface
Two new subcommands under trails schema:
- trails schema watch: start the watcher in foreground, print alerts and suggestions as they arrive. Useful during trails serve development loops.
- trails schema suggest: run batch inference (existing onto_infer) augmented with any live watcher stats, then display suggestions.
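The command layout above can be sketched with the standard library's argparse. This only illustrates the nesting of the two subcommands; the actual Trails CLI may be built with a different framework, and the help strings and the build_parser name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser accepting `trails schema watch` and `trails schema suggest`."""
    parser = argparse.ArgumentParser(prog="trails")
    commands = parser.add_subparsers(dest="command", required=True)

    schema = commands.add_parser("schema", help="schema inference tools")
    actions = schema.add_subparsers(dest="action", required=True)

    # No flags are defined here because the ADR does not list any.
    actions.add_parser("watch", help="print alerts and suggestions as they arrive")
    actions.add_parser("suggest", help="batch inference plus live watcher stats")
    return parser
```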
5. Integration with existing batch inference
SchemaWatcher.get_suggestions() produces SchemaSuggestion objects
that parallel onto_infer.NodeTypeCandidate. A future PR may unify
the two into a shared generate_code() path; for now they are
independent implementations with compatible output.
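A shared generate_code() path might look roughly like the sketch below, which renders a code string from an inferred field map. The output shape is a guess: this ADR only says the suggestion contains a ready-to-paste @node_type(...) string, so the decorator taking the type name as a string and the class-with-annotations body are assumptions, as is the function signature.

```python
def generate_code(type_name: str, fields: dict) -> str:
    """Render a class stub from a {field_name: python_type_name} mapping.

    The decorator form emitted here is assumed, not taken from Trails.
    """
    lines = [f'@node_type("{type_name}")', f"class {type_name}:"]
    if not fields:
        lines.append("    pass")  # keep the stub syntactically valid
    for name in sorted(fields):   # sorted for deterministic output
        lines.append(f"    {name}: {fields[name]}")
    return "\n".join(lines)
```

Sorting the fields keeps the generated string stable across runs, which matters if both the batch and the live path are expected to produce compatible output.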
Consequences
- Positive: Developers get immediate feedback when schema patterns emerge or drift, reducing debugging cycles.
- Positive: The alert mechanism integrates naturally with the existing observability pipeline — no new event infrastructure needed.
- Negative: The watcher accumulates per-type/per-field statistics in memory. For stores with thousands of distinct types the footprint could become significant; a future refinement may add LRU eviction or sampling.
- Neutral: The watcher does not modify the store or block writes. All analysis is read-only and best-effort.
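The LRU eviction mentioned as a possible refinement for the memory-growth consequence could be sketched with collections.OrderedDict. This is an illustration of the technique only; LRUStats, max_types, and touch are hypothetical names and nothing here is part of the accepted design.

```python
from collections import OrderedDict


class LRUStats:
    """Keep statistics for at most max_types types, evicting the least recently written."""

    def __init__(self, max_types: int) -> None:
        if max_types < 1:
            raise ValueError("max_types must be at least 1")
        self.max_types = max_types
        self._data: OrderedDict = OrderedDict()

    def touch(self, type_name: str) -> dict:
        """Return the stats dict for type_name, creating it and evicting if needed."""
        if type_name in self._data:
            self._data.move_to_end(type_name)   # mark as most recently used
        else:
            self._data[type_name] = {}
            if len(self._data) > self.max_types:
                self._data.popitem(last=False)  # drop the least recently used type
        return self._data[type_name]

    def __contains__(self, type_name: str) -> bool:
        return type_name in self._data
```

The trade-off is that an evicted type loses its history, so a later write to it would look like a brand-new type; sampling would bound memory differently, at the cost of missing some writes.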
References
- python/src/trails/onto_infer.py: batch inference (M14)
- python/src/trails/observability.py: event hook infrastructure (M3)
- python/src/trails/context.py: kg_write event emission
- ADR-0021: progressive enhancement (north star)