Schema Transformation
Real-world knowledge-graph projects constantly migrate between schemas: legacy systems to standards, v1 to v2, domain A to domain B. Every schema change triggers a cascade of manual work — writing SPARQL CONSTRUCT queries, mapping field names, coercing types, handling structural mismatches. Trails replaces this grind with a declarative transformation engine: give it a source schema and a target schema, and it produces the mapping plan, the executable code, and the provenance trail. The full design lives in ADR-0026; the progressive-enhancement framing is ADR-0021.
How it works
The transformation engine compares two schemas — a source and a target —
and produces a TransformPlan containing field mappings, type
coercions, structural changes, and placeholder markers for fields that
have no source equivalent.
Three steps, each optional:
- Auto-mapping. Fields with matching names and compatible types are mapped automatically: title: str in the source to title: str in the target requires zero configuration.
- LLM-assisted mapping (optional). Fields with ambiguous correspondence (author_name to creator.full_name, pub_date to published_at) can be proposed by a cheap LLM call. The user confirms or overrides. This step is entirely optional: the engine works without any LLM or API key. Cost-tracked per ADR-0012.
- Code generation. The engine produces SPARQL CONSTRUCT queries (preferred for simple renames and type coercion) or Python transformation functions using the ORM (for complex cases like splitting/merging fields).
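The auto-mapping step can be pictured with a small, self-contained sketch. It is not the engine's implementation: the auto_map function, the COMPATIBLE table, and the plain-dict schema representation are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical compatibility table: source type -> target types accepted without a cast.
COMPATIBLE = {
    str: {str},
    int: {int, float},
    float: {float},
    bool: {bool},
}


def auto_map(
    source: Dict[str, type], target: Dict[str, type]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Pair fields that share a name and a compatible type.

    Returns (mapped pairs, target fields left for the later steps).
    """
    mapped, unresolved = [], []
    for name, target_type in target.items():
        source_type = source.get(name)
        if source_type is not None and target_type in COMPATIBLE.get(source_type, set()):
            mapped.append((name, name))
        else:
            unresolved.append(name)
    return mapped, unresolved


v1 = {"title": str, "author_name": str, "year": str}
v2 = {"title": str, "creator_name": str, "year": int}

mapped, unresolved = auto_map(v1, v2)
print(mapped)      # [('title', 'title')]
print(unresolved)  # ['creator_name', 'year']
```

Only title is resolved here; creator_name and year are left for a rename or type-coercion decision, which is where the optional LLM step or explicit configuration would come in.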
Field mapping types
The engine supports five mapping strategies:
| Strategy | Description | Generated code |
|---|---|---|
| direct | Same name, same type (identity mapping) | No-op (or SPARQL CONSTRUCT pass-through) |
| rename | Different name, same type | SPARQL CONSTRUCT with predicate substitution |
| type_coerce | Same/different name, different type | SPARQL BIND with cast (xsd:integer, etc.) |
| computed | Target derived from multiple source fields | Python function or SPARQL CONCAT/expression |
| placeholder | No source equivalent; marked for enrichment | Field created with (type, placeholder) marker |
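One way to read the table is as a decision procedure over the source fields that feed a single target field. The sketch below encodes that reading; the classify function and the (name, type) tuple representation are hypothetical and not part of the Trails API.

```python
from typing import List, Tuple

Field = Tuple[str, type]  # (field name, Python type)


def classify(sources: List[Field], target: Field) -> str:
    """Return one of the five strategy names for a target field."""
    if not sources:
        return "placeholder"   # nothing to derive the value from
    if len(sources) > 1:
        return "computed"      # derived from several source fields
    (src_name, src_type), (tgt_name, tgt_type) = sources[0], target
    if src_type is not tgt_type:
        return "type_coerce"   # the name may or may not differ
    return "direct" if src_name == tgt_name else "rename"


print(classify([("title", str)], ("title", str)))                # direct
print(classify([("author_name", str)], ("creator_name", str)))   # rename
print(classify([("year", str)], ("year", int)))                  # type_coerce
print(classify([("first_name", str), ("last_name", str)],
               ("full_name", str)))                              # computed
print(classify([], ("sentiment", str)))                          # placeholder
```

The order of the checks matters: a type difference takes precedence over a name difference, which matches the "Same/different name, different type" wording of the type_coerce row.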
Examples of each strategy
Direct mapping — no transformation needed:
source: Article.title (str) → target: Article.title (str)
Strategy: direct
Generated code: none (no-op)
Rename — field name changed, type preserved:
source: Article.author_name (str) → target: Article.creator_name (str)
Strategy: rename
Generated SPARQL:
CONSTRUCT { ?s ex:creator_name ?o }
WHERE { ?s ex:author_name ?o }
Type coercion — value cast to a different type:
source: Article.year (str) → target: Article.year (int)
Strategy: type_coerce
Generated SPARQL:
CONSTRUCT { ?s ex:year ?year_int }
WHERE { ?s ex:year ?year_str . BIND(xsd:integer(?year_str) AS ?year_int) }
Computed — target derived from multiple source fields:
source: Person.first_name + Person.last_name → target: Person.full_name
Strategy: computed
Generated SPARQL:
CONSTRUCT { ?s ex:full_name ?full }
WHERE { ?s ex:first_name ?f ; ex:last_name ?l .
BIND(CONCAT(?f, " ", ?l) AS ?full) }
Placeholder — no source equivalent, marked for later enrichment:
source: (none) → target: Article.sentiment (str)
Strategy: placeholder
Generated code: field marked as (str, placeholder) in @node_type
SPARQL CONSTRUCT generation
For simple transformations (rename, type coercion, merge), the engine generates SPARQL CONSTRUCT queries that run directly against the store. SPARQL CONSTRUCT is preferred because it is declarative, auditable, and does not require Python execution:
# Auto-generated by: trails onto transform
# Source: models/v1.py → Target: models/v2.py

# Rename: author_name → creator_name
CONSTRUCT {
  ?s <https://myapp.example/creator_name> ?name .
}
WHERE {
  ?s a <https://myapp.example/Article> ;
     <https://myapp.example/author_name> ?name .
}

# Type coercion: year (string → integer)
# The xsd: prefix must be declared for the cast to parse.
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
CONSTRUCT {
  ?s <https://myapp.example/year> ?year_int .
}
WHERE {
  ?s a <https://myapp.example/Article> ;
     <https://myapp.example/year> ?year_str .
  BIND(xsd:integer(?year_str) AS ?year_int)
}
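To make the shape of these generated queries concrete, here is a minimal sketch of how such CONSTRUCT text could be assembled from a field mapping. The construct_query helper and its parameters are assumptions for illustration; the engine's real generator is not shown in this page.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def construct_query(base: str, cls: str, source: str, target: str, cast: str = "") -> str:
    """Build a CONSTRUCT query for a rename, optionally casting the value.

    `cast` is an XSD local name such as "integer"; empty means no cast.
    """
    lines = []
    if cast:
        lines.append(f"PREFIX xsd: <{XSD}>")
    value = "?out" if cast else "?v"
    lines += [
        f"CONSTRUCT {{ ?s <{base}{target}> {value} . }}",
        "WHERE {",
        f"  ?s a <{base}{cls}> ;",
        f"     <{base}{source}> ?v .",
    ]
    if cast:
        lines.append(f"  BIND(xsd:{cast}(?v) AS ?out)")
    lines.append("}")
    return "\n".join(lines)


rename = construct_query("https://myapp.example/", "Article", "author_name", "creator_name")
coerce = construct_query("https://myapp.example/", "Article", "year", "year", cast="integer")
print(rename)
print(coerce)
```

The sketch interpolates names directly into IRIs, which is acceptable only when field names come from a trusted schema; a production generator would need to validate or escape them.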
For complex transformations (conditional logic, multi-step derivations, field splitting with regex), the engine generates Python functions using the ActiveGraph ORM:
"""Auto-generated transformation: v1 → v2
Source: models/v1.py
Target: models/v2.py
"""
from trails.orm import node_type


def transform_full_name(node, ctx):
    """Split full_name into first_name + last_name."""
    # Split on the last space only, so middle names stay with first_name.
    parts = node.full_name.rsplit(" ", 1)
    return {
        "first_name": parts[0],
        "last_name": parts[1] if len(parts) > 1 else "",
    }
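The behavior of this split on less tidy names is worth checking before running it over a store. The sketch below repeats the function and calls it on a stand-in object; SimpleNamespace is used here only as an assumption-laden substitute for an ORM node, and ctx is given a default so the function can be called on its own.

```python
from types import SimpleNamespace


def transform_full_name(node, ctx=None):
    """Same split as above: the last space separates the surname."""
    parts = node.full_name.rsplit(" ", 1)
    return {
        "first_name": parts[0],
        "last_name": parts[1] if len(parts) > 1 else "",
    }


for name in ("Ada Lovelace", "Mary Ann Evans", "Plato"):
    print(name, "->", transform_full_name(SimpleNamespace(full_name=name)))
```

Multi-word given names stay together in first_name, and a single-word name yields an empty last_name rather than an error. Names with surname particles ("van Gogh") would still be split at the last space, so such data may need a custom function.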
CLI usage
# Generate transformation plan (dry-run, default — no store changes)
trails onto transform --from models/v1.py --to models/v2.py
# Show the generated SPARQL / Python code without executing
trails onto transform --from old.ttl --to new.ttl --plan-only
# Dry-run: show what would change, with node/triple counts
trails onto transform --from models/v1.py --to models/v2.py --dry-run
# Execute the transformation against the store
trails onto transform --from models/v1.py --to models/v2.py --execute
# Disable LLM-assisted mapping (pure deterministic)
trails onto transform --from old.ttl --to new.ttl --no-llm
# Enable LLM-assisted mapping for ambiguous fields
trails onto transform --from old.ttl --to new.ttl --llm-assist
# Mark unmappable target fields as placeholders for enrichment
trails onto transform --from old.ttl --to new.ttl --enrich
Key flags:
| Flag | Description |
|---|---|
| --from | Source schema (Python module or TTL file) |
| --to | Target schema (same formats) |
| --plan-only | Show the transformation plan without code generation |
| --dry-run | Show what would change without modifying the store |
| --execute | Run the transformation (requires explicit opt-in) |
| --no-llm | Disable LLM-assisted mapping (pure deterministic) |
| --llm-assist | Enable LLM for ambiguous field mappings (cost-tracked) |
| --enrich | Mark unmappable target fields as placeholders |
The default behavior (no --execute) is safe: it generates the plan
and shows it. No store modifications happen without --execute.
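The opt-in design can be illustrated with a standard-library argparse sketch. This is not the Trails CLI source: build_parser and the dest names are assumptions, and only a subset of the flags is modeled.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="trails onto transform")
    parser.add_argument("--from", dest="source", required=True)
    parser.add_argument("--to", dest="target", required=True)
    # store_true defaults to False, so a run never writes unless asked to.
    parser.add_argument("--execute", action="store_true")
    # The two LLM flags share one destination and cannot be combined.
    llm = parser.add_mutually_exclusive_group()
    llm.add_argument("--llm-assist", dest="llm_assist", action="store_true")
    llm.add_argument("--no-llm", dest="llm_assist", action="store_false")
    parser.set_defaults(llm_assist=False)
    return parser


args = build_parser().parse_args(["--from", "models/v1.py", "--to", "models/v2.py"])
print(args.execute, args.llm_assist)  # False False
```

Making the destructive path a flag that defaults to off means a mistyped or incomplete command can only ever produce a preview.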
Python API
from trails.onto_transform import plan_transform, execute_plan

# ctx is not defined in this snippet; it is assumed to be an existing
# Trails context created elsewhere in your application.

# Generate a transformation plan
plan = plan_transform(
    source="models/v1.py",
    target="models/v2.py",
    llm_assist=False,  # no LLM needed
    ctx=ctx,
)

# Inspect the plan before executing
print(f"Mappings: {len(plan.field_mappings)}")
print(f"Placeholders: {len(plan.placeholders)}")
for mapping in plan.field_mappings:
    print(f"  {mapping.source} → {mapping.target} ({mapping.strategy})")
for ph in plan.placeholders:
    print(f"  [placeholder] {ph.field} — no source equivalent")

# View generated SPARQL
for query in plan.sparql_queries:
    print(query)

# Execute the plan against the store
result = execute_plan(plan, ctx)
print(f"Transformed {result.node_count} nodes, {result.triple_count} triples")
print(f"Duration: {result.duration_ms}ms")
LLM-assisted mapping (optional)
When llm_assist=True, the engine sends ambiguous field pairs to a
cheap LLM (Haiku by default) to propose mappings. The user confirms or
overrides each proposal. This is entirely optional — every
transformation works without it.
plan = plan_transform(
    source="models/v1.py",
    target="models/v2.py",
    llm_assist=True,
    model="haiku",  # cheapest adequate model
    ctx=ctx,
)

# LLM-proposed mappings are marked for review
for mapping in plan.field_mappings:
    if mapping.llm_proposed:
        print(f"  [LLM] {mapping.source} → {mapping.target} "
              f"(confidence: {mapping.confidence:.2f})")
The LLM is never called unless you explicitly pass llm_assist=True.
No API key is required for deterministic transformations.
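Since every LLM proposal needs a human decision, it can help to review the least certain ones first. The sketch below shows one way to build such a queue; the FieldMapping dataclass is a stand-in that only mirrors the attribute names from the Reference table, and review_queue is a hypothetical helper, not a Trails function.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FieldMapping:
    """Stand-in carrying the attributes listed in the Reference table."""
    source: str
    target: str
    strategy: str
    llm_proposed: bool = False
    confidence: float = 1.0


def review_queue(mappings: List[FieldMapping]) -> List[FieldMapping]:
    """Return only LLM-proposed mappings, least confident first."""
    proposed = [m for m in mappings if m.llm_proposed]
    return sorted(proposed, key=lambda m: m.confidence)


mappings = [
    FieldMapping("title", "title", "direct"),
    FieldMapping("pub_date", "published_at", "rename", True, 0.93),
    FieldMapping("author_name", "creator.full_name", "rename", True, 0.71),
]

for m in review_queue(mappings):
    print(f"{m.source} -> {m.target} ({m.confidence:.2f})")
```

Deterministic mappings never enter the queue, so a run without llm_assist produces an empty review list. The confidence values shown are made up for the example.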
Complete example: migrating a schema
Suppose you have a v1 schema with articles and want to migrate to v2 with richer metadata:
# models/v1.py
from trails.orm import node_type

@node_type("Article", fields={
    "title": str,
    "author_name": str,
    "pub_date": str,
    "body": str,
})
class Article:
    pass

# models/v2.py
from trails.orm import node_type, placeholder

@node_type("Article", fields={
    "title": str,
    "creator_name": str,               # renamed from author_name
    "published_at": str,               # renamed from pub_date
    "body": str,
    "sentiment": (str, placeholder),   # new, to be enriched later
    "word_count": (int, placeholder),  # new, to be enriched later
})
class Article:
    pass
Run the transformation:
# Preview the plan
trails onto transform --from models/v1.py --to models/v2.py --enrich
# Output:
# Field mappings:
# title → title (direct)
# author_name → creator_name (rename)
# pub_date → published_at (rename)
# body → body (direct)
# Placeholders:
# sentiment — no source (placeholder)
# word_count — no source (placeholder)
# Execute
trails onto transform --from models/v1.py --to models/v2.py --enrich --execute
After transformation, the placeholder fields (sentiment, word_count)
are ready for the enrichment pipeline.
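The effect of this plan on a single article can be sketched with plain dictionaries. This only mimics the outcome of the mappings listed above; it does not touch a store, and the migrate function, the sample record, and the use of None for an unfilled placeholder are all assumptions.

```python
from typing import Any, Dict, List, Tuple

# (source field, target field) pairs taken from the plan output above.
MAPPINGS: List[Tuple[str, str]] = [
    ("title", "title"),
    ("author_name", "creator_name"),
    ("pub_date", "published_at"),
    ("body", "body"),
]
PLACEHOLDERS = ["sentiment", "word_count"]


def migrate(record: Dict[str, Any]) -> Dict[str, Any]:
    """Copy mapped fields under their v2 names; leave placeholders empty."""
    out = {target: record[source] for source, target in MAPPINGS}
    # None stands in for "not yet enriched" in this sketch.
    out.update({name: None for name in PLACEHOLDERS})
    return out


v1_article = {
    "title": "Graph migrations",
    "author_name": "A. Writer",
    "pub_date": "2024-05-01",
    "body": "Text of the article.",
}
v2_article = migrate(v1_article)
print(sorted(v2_article))
```

The v1 field names disappear from the result, and the two new fields exist but carry no value until an enrichment step fills them.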
Reference
| Symbol | Description |
|---|---|
| plan_transform(source, target, *, llm_assist, model, ctx) | Generate a TransformPlan from source and target schemas |
| execute_plan(plan, ctx) | Execute a transformation plan against the store; returns TransformResult |
| TransformPlan | .field_mappings, .placeholders, .sparql_queries, .python_functions |
| FieldMapping | .source, .target, .strategy, .llm_proposed, .confidence |
| Placeholder | .field, .python_type, .target_type |
| TransformResult | .node_count, .triple_count, .duration_ms, .provenance_id |
See also
- ADR-0026: full transformation and enrichment design
- Enrichment Pipeline: fill placeholder fields after transformation
- ActiveGraph ORM: @node_type, the schema format
- Auto-Ontology: infer and generate schemas
- RML Data Mapping: ingest external data; transform reshapes it within the KG