ADR-0028: Schema Migrations - Versioned Graph Evolution
- Status: Accepted (2026-04-19)
- Date: 2026-04-17
- Depends on: ADR-0017 (ActiveGraph ORM), ADR-0021 (Progressive enhancement), ADR-0025 (Auto-ontology generation)
- Supersedes: —
- Superseded by: —
Context
Trails has `trails onto evolve` (M4), an interactive, exploratory tool
that diffs registered @shape / @node_type definitions against a
previous TTL export and generates SPARQL UPDATE statements
(onto_evolution.py). This works well during development: a single
developer explores shape changes, reviews the diff, and applies the
migration in one sitting.
Production KG apps need something different:
- Reproducibility. A migration must produce the same result on every instance (dev, staging, prod) without interactive prompts. `onto evolve` is interactive by design; piping `--yes` is a footgun.
- History. Teams need to know which schema changes have been applied, when, and by whom. Git tracks the migration files; the KG itself should track which ones ran.
- Ordering. When multiple developers change the schema concurrently, their migrations must be sequenced. `onto evolve` has no concept of sequencing: it diffs "old vs. new" as a single step.
- Reversibility. Rolling back a bad schema change in production requires an explicit reverse operation, not "re-run `onto evolve` with the old export." Reverse operations must be tested, versioned, and auditable.
- Automation. CI/CD pipelines, deployment scripts, and infrastructure-as-code workflows need a single command that idempotently applies all pending schema changes. `trails onto evolve` is not that command.
Rails solved this problem in 2005 with db:migrate. Django solved it
with makemigrations / migrate. Every serious ORM since has followed
the same pattern: numbered migration files, a tracking table, forward
and reverse operations, auto-detection of changes, and a CLI to manage
the lifecycle. Trails needs the same pattern — adapted for knowledge
graphs instead of relational tables.
The existing OntologyEvolution class (onto_evolution.py) already
provides the core diffing engine (ShapeDiff) and SPARQL generation
(generate_migration). What is missing is the lifecycle around it:
files, sequencing, tracking, reversibility, and CLI commands.
Decision
Add a Rails-style migration system for KG schema changes. Migration files are Python modules. Each migration declares its dependencies and a list of operations. The CLI generates, applies, rolls back, and squashes migrations. A dedicated named graph in the KG tracks migration history with PROV-O provenance.
1. Migration files
Migrations live in `migrations/` at the project root (configurable via
the `trails.toml` key `migrations.directory`). Each file is a Python
module named `NNNN_description.py`, where `NNNN` is a zero-padded
sequential number, for example `0001_initial.py` followed by
`0002_add_sentiment_to_article.py`.
Each migration is a class inheriting from `trails.migrate.Migration`:

    from trails.migrate import Migration, ops


    class AddSentimentToArticle(Migration):
        # Names of the migrations this one depends on.
        dependencies = ["0001_initial"]

        # Operations that make up this migration.
        operations = [
            ops.AddField("Article", "sentiment", str, nullable=True),
            ops.RenameField("Article", "author_name", "author"),
            ops.RemoveField("Article", "legacy_id"),
            ops.AddNodeType("Review", fields={"rating": int, "text": str}),
            ops.AddRelation("Article", "reviews", "Review"),
            ops.RunSPARQL("INSERT DATA { ... }"),  # escape hatch
        ]

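The file-naming convention above can be checked mechanically. The following is a minimal sketch, not the Trails implementation: `order_migration_files` and the exact filename pattern (four digits, lowercase description) are assumptions made for illustration. It orders files by sequence number and fails on a duplicated number.

```python
import re
from collections import Counter

# Assumed pattern for NNNN_description.py: four digits, underscore, description.
MIGRATION_FILE = re.compile(r"^(\d{4})_([a-z0-9_]+)\.py$")


def order_migration_files(filenames):
    """Return migration filenames sorted by sequence number.

    Files that do not match the naming convention (for example __init__.py)
    are ignored. Duplicate sequence numbers raise ValueError.
    """
    parsed = []
    for filename in filenames:
        match = MIGRATION_FILE.match(filename)
        if match:
            parsed.append((int(match.group(1)), filename))

    counts = Counter(number for number, _ in parsed)
    duplicates = sorted(number for number, count in counts.items() if count > 1)
    if duplicates:
        raise ValueError(f"duplicate migration numbers: {duplicates}")

    return [filename for _, filename in sorted(parsed)]


ordered = order_migration_files(
    ["0002_add_sentiment_to_article.py", "__init__.py", "0001_initial.py"]
)
```

Failing fast on duplicate numbers matches the non-goal stated later in this ADR: numbering conflicts are resolved by hand, not automatically.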
Operation catalogue (initial set, extensible):
| Operation | Forward | Reverse (auto-generated) |
|---|---|---|
| `AddField(type, name, dtype, nullable)` | Add `sh:property` to shape; emit SHACL constraint | Remove `sh:property`; drop data if `--prune` |
| `RemoveField(type, name)` | Remove `sh:property` from shape | Re-add with original dtype/constraints (stored in migration metadata) |
| `RenameField(type, old, new)` | Update `sh:name` on the property shape | Reverse rename |
| `AlterField(type, name, **changes)` | Modify datatype, cardinality, or constraints | Restore previous values |
| `AddNodeType(name, fields, parent)` | Create `sh:NodeShape` + `rdf:type` declaration; register `@node_type` | Remove shape and type declaration |
| `RemoveNodeType(name)` | Remove shape; optionally remove instances (`--prune`) | Re-create shape (instances are not restored) |
| `AddRelation(from_type, name, to_type)` | Add predicate linking two shapes | Remove predicate |
| `RemoveRelation(from_type, name)` | Remove predicate | Re-add (target type stored in metadata) |
| `RunSPARQL(forward, reverse=None)` | Execute arbitrary SPARQL UPDATE | Execute `reverse` if provided; non-reversible if `reverse` is `None` |
2. CLI commands
trails migrate generate [--name NAME]
Auto-detect changes between registered @node_type / @shape definitions
and the current KG state. Generate a new migration file in migrations/.
Uses onto_infer (ADR-0025) for KG state and OntologyEvolution.diff()
for change detection.
trails migrate run [--to NNNN]
Apply all pending migrations (or up to migration NNNN) in dependency
order. Idempotent — already-applied migrations are skipped.
trails migrate rollback [N]
Reverse the last N applied migrations (default: 1). Non-reversible
operations (RunSPARQL without reverse) block rollback with an error.
trails migrate status
Show applied and pending migrations with timestamps.
trails migrate squash [--from NNNN] [--to MMMM]
Combine multiple sequential migrations into a single migration.
The squashed migration replaces the originals; a mapping entry in
_trails_migrations records the replacement so that instances which
already applied the originals do not re-apply them.
trails migrate check
Dry-run: detect pending migrations and report what would change
without applying anything. Suitable for CI gates.
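The behaviour described for `trails migrate run` (dependency order, skipping applied migrations, an optional upper bound) can be sketched with the standard-library `graphlib` module. This is an illustration under stated assumptions, not the shipped planner: `plan_run` and `ancestors_of` are hypothetical names, `0003_add_review` is an invented migration, and the target is passed as a full migration name where the CLI takes `NNNN`.

```python
from __future__ import annotations

from graphlib import TopologicalSorter


def ancestors_of(target: str, dependencies: dict[str, list[str]]) -> set[str]:
    """Return the target plus everything it transitively depends on."""
    seen: set[str] = set()
    stack = [target]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(dependencies[name])
    return seen


def plan_run(
    dependencies: dict[str, list[str]],
    applied: set[str],
    target: str | None = None,
) -> list[str]:
    """List pending migrations in dependency order.

    `dependencies` maps each migration name to the names it depends on.
    TopologicalSorter raises CycleError if the dependencies are circular.
    """
    order = TopologicalSorter(dependencies).static_order()
    wanted = ancestors_of(target, dependencies) if target else set(dependencies)
    # Idempotence: anything already recorded as applied is skipped.
    return [name for name in order if name in wanted and name not in applied]


deps = {
    "0001_initial": [],
    "0002_add_sentiment_to_article": ["0001_initial"],
    "0003_add_review": ["0002_add_sentiment_to_article"],  # hypothetical
}
pending = plan_run(deps, applied={"0001_initial"})
```

Running the planner a second time after everything has been applied returns an empty list, which is the property that makes the command safe to call from deployment scripts.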
3. Migration tracking
Applied migrations are recorded in a dedicated named graph
<urn:trails:migrations> (the _trails_migrations graph). Each
applied migration is a PROV-O prov:Activity:

    <urn:trails:migrations/0002_add_sentiment_to_article>
        a prov:Activity ;
        prov:startedAtTime "2026-04-17T14:30:00Z"^^xsd:dateTime ;
        prov:endedAtTime "2026-04-17T14:30:01Z"^^xsd:dateTime ;
        prov:wasAssociatedWith <urn:trails:agent/migrate-cli> ;
        trails:migrationName "0002_add_sentiment_to_article" ;
        trails:migrationHash "sha256:abc123..." ;
        trails:operationCount 3 ;
        trails:reversible true .

The migrationHash is a content hash of the migration file, ensuring
that a migration is not silently modified after being applied. If a hash
mismatch is detected during trails migrate run, the CLI halts with an
error and directs the user to trails migrate squash or manual
resolution.
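The tamper check described above can be sketched as follows. This assumes the hash covers the raw bytes of the migration file and is stored as `algorithm:hexdigest`, as in the example record; the ADR does not fix either detail. `migration_hash` and `verify_unchanged` are hypothetical helper names.

```python
import hashlib


def migration_hash(source: bytes, algorithm: str = "sha256") -> str:
    """Hash the migration file content and prefix it with the algorithm name."""
    digest = hashlib.new(algorithm, source).hexdigest()
    return f"{algorithm}:{digest}"


def verify_unchanged(name: str, source: bytes, recorded_hash: str) -> None:
    """Raise if the file content no longer matches the recorded hash."""
    # Re-use the algorithm the record was written with.
    algorithm = recorded_hash.split(":", 1)[0]
    actual = migration_hash(source, algorithm)
    if actual != recorded_hash:
        raise RuntimeError(
            f"migration {name} was modified after it was applied: "
            f"recorded {recorded_hash}, found {actual}"
        )


original = b"class AddSentimentToArticle(Migration): ...\n"
recorded = migration_hash(original)
verify_unchanged("0002_add_sentiment_to_article", original, recorded)  # passes
```

Storing the algorithm name inside the recorded value means a later change to `hash_algorithm` does not invalidate records written with the old algorithm.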
4. Auto-detection
`trails migrate generate` performs the following steps:
- Collect registered types. Walk the Python codebase (same discovery path as `trails onto export`) to find all `@node_type` and `@shape` decorators. These represent the desired state.
- Infer current KG state. Use `onto infer` (ADR-0025 Phase 1) to extract the actual schema from the running KG store. This includes node types, predicates, cardinalities, and datatypes discovered from data.
- Diff. Feed both into `OntologyEvolution.diff()` to produce `ShapeDiff` entries.
- Map diffs to operations. Translate each `ShapeDiff` into the appropriate `ops.*` calls. Heuristics for rename detection: if a field is removed from one type and an identically-typed field is added in the same migration, prompt the user (or use `--auto` to assume rename when types match).
- Write migration file. Generate a numbered Python file with the operation list and a human-readable comment header.
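The rename heuristic in the mapping step can be sketched in a few lines. This is one conservative reading, not the actual `--auto` logic: it assumes the removed and added fields sit on the same node type, and it proposes a rename only when the match is unambiguous. `FieldChange` and `detect_renames` are hypothetical names and do not reflect the real `ShapeDiff` structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldChange:
    node_type: str
    name: str
    dtype: type


def _same_slot(a: FieldChange, b: FieldChange) -> bool:
    """Two changes compete for a rename if node type and datatype agree."""
    return a.node_type == b.node_type and a.dtype == b.dtype


def detect_renames(removed, added):
    """Return (node_type, old_name, new_name) for unambiguous renames only."""
    renames = []
    for old in removed:
        candidates = [new for new in added if _same_slot(old, new)]
        # `rivals` always contains `old` itself, so 1 means no competitor.
        rivals = [other for other in removed if _same_slot(old, other)]
        if len(candidates) == 1 and len(rivals) == 1:
            renames.append((old.node_type, old.name, candidates[0].name))
    return renames


removed = [
    FieldChange("Article", "author_name", str),
    FieldChange("Article", "legacy_id", int),
]
added = [
    FieldChange("Article", "author", str),
    FieldChange("Article", "sentiment", float),
]
renames = detect_renames(removed, added)
```

Anything the sketch does not pair up stays a plain remove plus add, which is the safer default when the type match alone cannot settle the question.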
5. Reversibility
Every operation class implements forward(store, ctx) and
reverse(store, ctx). The reverse method is auto-generated for
declarative operations (AddField, RenameField, etc.) by storing
the pre-change state in the operation's metadata.
RunSPARQL is the escape hatch and requires an explicit reverse
parameter. If reverse is None, the operation is marked
non-reversible. trails migrate rollback refuses to reverse a
migration containing non-reversible operations unless --force is
passed (which skips those operations and logs a warning).
Non-reversible operations are flagged during trails migrate check
and in trails migrate status output.
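The contract above can be illustrated with a toy store. This is a sketch under stated assumptions, not the `trails.migrate` code: the store is a plain dict of node type to fields, `ctx` is a dict that collects executed SPARQL and warnings, and the `RunSPARQL` parameters are renamed to `forward_sparql` / `reverse_sparql` so they do not clash with the `forward` and `reverse` methods.

```python
from dataclasses import dataclass
from typing import Optional


class NotReversible(Exception):
    """Raised when a rollback would need to undo a non-reversible operation."""


@dataclass
class RemoveField:
    node_type: str
    name: str
    saved: Optional[type] = None  # pre-change state, captured by forward()

    reversible = True  # class attribute, not a dataclass field

    def forward(self, store, ctx):
        self.saved = store[self.node_type].pop(self.name)

    def reverse(self, store, ctx):
        store[self.node_type][self.name] = self.saved


@dataclass
class RunSPARQL:
    forward_sparql: str
    reverse_sparql: Optional[str] = None

    @property
    def reversible(self):
        return self.reverse_sparql is not None

    def forward(self, store, ctx):
        ctx.setdefault("executed", []).append(self.forward_sparql)

    def reverse(self, store, ctx):
        if self.reverse_sparql is None:
            raise NotReversible(repr(self))
        ctx.setdefault("executed", []).append(self.reverse_sparql)


def rollback(operations, store, ctx, force=False):
    """Undo operations in reverse order; refuse unless force is set."""
    blocked = [op for op in operations if not op.reversible]
    if blocked and not force:
        raise NotReversible(f"{len(blocked)} non-reversible operation(s)")
    for op in reversed(operations):
        if not op.reversible:
            ctx.setdefault("warnings", []).append(f"skipped {op!r}")
            continue
        op.reverse(store, ctx)


store = {"Article": {"legacy_id": int, "title": str}}
ctx = {}
operations = [RemoveField("Article", "legacy_id"), RunSPARQL("INSERT DATA { ... }")]
for op in operations:
    op.forward(store, ctx)
rollback(operations, store, ctx, force=True)  # skips RunSPARQL, restores the field
```

Capturing the removed datatype inside `forward` is what lets `reverse` be generated without the migration author writing it by hand.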
6. Integration with onto evolve
trails migrate generate replaces the interactive trails onto evolve
for production use. The relationship:
| | `trails onto evolve` | `trails migrate generate` |
|---|---|---|
| Purpose | Exploration, prototyping | Production schema management |
| Mode | Interactive, one-shot | Automated, versioned |
| Output | Raw SPARQL UPDATE (stdout or applied) | Migration file (Python) |
| Tracking | None | _trails_migrations named graph |
| Reversibility | Manual | Built-in per operation |
| Sequencing | N/A | Dependency graph |
onto evolve remains available and unchanged — it is the right tool
for exploring changes during development. trails migrate generate
builds on the same OntologyEvolution.diff() and ShapeDiff engine
but wraps the output in the migration lifecycle.
7. Configuration
`trails.toml` gains a `[migrations]` section:

    [migrations]
    directory = "migrations"                  # relative to project root
    tracking_graph = "urn:trails:migrations"  # named graph IRI
    auto_prune = false                        # whether RemoveField/RemoveNodeType drops data
    hash_algorithm = "sha256"                 # content hash for tamper detection

Consequences
Positive
- Reproducible deployments. The same migration sequence produces the same schema on every instance. CI can gate on `trails migrate check` returning clean.
- Auditable history. PROV-O provenance on every migration means the complete schema evolution history is queryable from the KG itself.
- Rails-familiar workflow. Developers from Rails/Django backgrounds recognise `migrate generate` / `migrate run` / `migrate rollback` immediately. The learning curve is near zero for the happy path.
- Builds on existing code. `OntologyEvolution`, `ShapeDiff`, and `parse_shapes_from_ttl` are reused wholesale; the migration system is a lifecycle wrapper, not a rewrite.
- Progressive enhancement preserved. Migrations work at whatever typing level the app uses. A label-only app migrating to `@node_type` generates the same `AddNodeType` operations as a full SHACL app.
Negative
- New surface to maintain. The operation catalogue will grow as the framework gains features (relations, constraints, indexes). Each new operation needs forward, reverse, and serialisation logic.
- Python-only migration files. TypeScript apps will need to call `trails migrate run` via CLI or subprocess until a TS migration surface is built (follow-on ADR if TS adoption warrants it).
- Rename detection is heuristic. Auto-detection of renames (vs. remove + add) is inherently ambiguous. The interactive prompt mitigates this for `trails migrate generate`; the `--auto` flag uses type-matching heuristics that may guess wrong.
- Named-graph dependency. Migration tracking requires the store to support named graphs. All Trails-supported stores (Oxigraph, Fuseki, QLever) support this, but custom adapters must too.
Non-consequences
- `onto evolve` is not deprecated. It remains the exploration tool.
- The ORM (ADR-0017) is unchanged. Migrations operate on the schema layer (shapes, types); the ORM operates on the data layer (instances).
- Provenance (ADR-0009) unchanged — migration activities use the same PROV-O vocabulary as capability invocations.
- Cost envelopes (ADR-0012) not involved — migrations are admin operations, not capability invocations.
Non-goals
- Not a full data migration framework. Migrations handle schema (shapes, types, constraints). Data transformations (backfill a new field, reformat existing values) use `RunSPARQL` as an escape hatch, not a dedicated scheduling / batching system.
- No automatic conflict resolution for concurrent migrations. If two developers create migrations with the same sequence number, manual renumbering is required (same as Django). A future ADR may add hash-based ordering to eliminate numbering conflicts.
- No cross-instance migration sync. Each instance tracks its own migration state. Federated instances (ADR-0023) do not automatically propagate migrations — that is federation's job, not the migration system's.
- No GUI. Migration management is CLI-only. The admin UI (ADR-0019 M10) may surface migration status as a read-only view in a future phase.
Alternatives considered
- Extend `onto evolve` with tracking and reversibility. Rejected. Adding lifecycle features to an interactive exploration tool would overload its UX. The concerns are genuinely different: exploration wants flexibility; production wants reproducibility. Separate tools sharing the same diff engine is the right split.
- Store migrations as TTL files instead of Python. Rejected. Python migration files can express conditional logic, data transformations, and the `RunSPARQL` escape hatch. TTL is declarative-only and cannot express reverse operations or conditions. The Django/Rails precedent of code-as-migration is well-proven.
- Use git history as the migration ledger (no tracking graph). Rejected. Git tracks what was authored; the KG tracking graph tracks what was applied. These are different questions. A production instance may be several migrations behind HEAD; only the tracking graph knows which ones have actually run.
- Adopt Liquibase/Flyway patterns (XML/SQL changelogs). Rejected. Trails is a Python framework; Python migration files compose with the existing `@node_type` / `@shape` type system and can import project code. XML changelogs would be an alien abstraction.
- Wait for a general-purpose RDF migration tool to emerge. Rejected. The RDF ecosystem has no Rails-migrate equivalent after 20+ years. This is a framework-level concern that Trails must own.
Open questions
- Should `trails migrate generate` require a running KG store, or can it work from a TTL export? The `onto infer` path requires a store; the `parse_shapes_from_ttl` path works from files. Supporting both (store-first, file-fallback) is the likely answer but adds complexity to the generation path.
- How do squashed migrations interact with instances that partially applied the originals? Proposed: the squash records a mapping (`replaces: [0002, 0003, 0004]`); an instance that already applied 0002 and 0003 applies only the delta from the squash. This needs careful design.
- Should migrations be atomic (all-or-nothing)? SPARQL UPDATE does not guarantee transactionality across multiple statements on all stores. Proposed: best-effort per-operation; if an operation fails, the migration is marked as partially applied with a list of completed operations, and `trails migrate run` can resume from the failure point.
- Naming: `trails migrate` or `trails schema`? `migrate` follows Rails convention and is immediately recognisable. `schema` is more descriptive but less action-oriented. Decision: `trails migrate` for the CLI commands; `trails.migrate` for the Python module.
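The proposed answer to the atomicity question (record completed operations, resume from the failure point) can be sketched as below. This only illustrates the proposal and is not a settled design: `MigrationRecord`, its status values, and the callable-based operations are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MigrationRecord:
    name: str
    completed: list = field(default_factory=list)  # indexes of finished operations
    status: str = "pending"  # "pending", "partial" or "applied"


def run_with_resume(record, operations, store):
    """Apply operations in order, skipping those finished in an earlier attempt."""
    for index, operation in enumerate(operations):
        if index in record.completed:
            continue
        try:
            operation(store)
        except Exception:
            # Keep the progress made so far so that a later run can resume.
            record.status = "partial"
            raise
        record.completed.append(index)
    record.status = "applied"


attempts = {"count": 0}


def flaky(store):
    """Fail on the first call only, to simulate a transient store error."""
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("store unavailable")
    store.append("b")


store = []
operations = [lambda s: s.append("a"), flaky, lambda s: s.append("c")]
record = MigrationRecord("0002_add_sentiment_to_article")

try:
    run_with_resume(record, operations, store)
except RuntimeError:
    status_after_failure = record.status

run_with_resume(record, operations, store)  # resumes at the failed operation
```

The sketch assumes that a failed operation leaves no partial effect of its own; a store without per-statement atomicity would weaken that assumption, which is part of why the question remains open.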
Relationship to other ADRs
| ADR | Impact |
|---|---|
| ADR-0001 (Rust kernel + Python surface) | Unchanged. Migrations are pure Python surface; kernel provides GraphStore.update. |
| ADR-0002 (Python-first shapes) | Unchanged. @shape is one of the sources for auto-detection. |
| ADR-0009 (Provenance always on) | Extended: migration activities recorded as prov:Activity in the tracking graph. |
| ADR-0017 (ActiveGraph ORM) | Complementary. ORM is the data layer; migrations are the schema layer. ADR-0017 explicitly deferred migration DSL — this ADR fills that gap. |
| ADR-0021 (Progressive enhancement) | Aligned. Migrations work at every enhancement level (labels → types → shapes → OWL). |
| ADR-0023 (Federation) | Independent. Each instance manages its own migration state; federation does not propagate migrations. |
| ADR-0025 (Auto-ontology) | Dependent. onto infer (ADR-0025 P1) provides the KG-state half of the auto-detection diff. |
| ADR-0026 (Schema transformation) | Complementary. onto transform handles KG→KG schema mapping; migrations handle versioned schema evolution within a single KG. |
Phased delivery
| Phase | Scope | Gate |
|---|---|---|
| 1 | `Migration` base class, operation catalogue (8 ops), `trails migrate run`, `trails migrate status`. Manual migration authoring. | One hand-written migration applied + tracked in `_trails_migrations` graph with PROV-O. |
| 2 | `trails migrate generate` (auto-detection from `@node_type` / `@shape` vs. KG state). `trails migrate check`. | Auto-generated migration matches a hand-written one for the same change set. |
| 3 | `trails migrate rollback`, reversibility for all declarative ops, `RunSPARQL` reverse validation. | Rollback of a 3-operation migration restores the previous KG state. |
| 4 | `trails migrate squash`, hash-based tamper detection, partial-apply resume. | Squash of 3 migrations produces one equivalent migration; partial-apply resumes correctly. |