Schema Transformation

Real-world knowledge-graph projects constantly migrate between schemas: legacy systems to standards, v1 to v2, domain A to domain B. Every schema change triggers a cascade of manual work — writing SPARQL CONSTRUCT queries, mapping field names, coercing types, handling structural mismatches. Trails replaces this grind with a declarative transformation engine: give it a source schema and a target schema, and it produces the mapping plan, the executable code, and the provenance trail. The full design lives in ADR-0026; the progressive-enhancement framing is ADR-0021.

How it works

The transformation engine compares two schemas — a source and a target — and produces a TransformPlan containing field mappings, type coercions, structural changes, and placeholder markers for fields that have no source equivalent.

source schema + target schema → TransformPlan → SPARQL / Python code → execute

Three steps, each optional:

1. Auto-mapping. Fields with matching names and compatible types are mapped automatically. Mapping title: str in the source to title: str in the target requires zero configuration.
2. LLM-assisted mapping (optional). Fields with ambiguous correspondence (author_name to creator.full_name, pub_date to published_at) can be proposed by a cheap LLM call, which the user then confirms or overrides. This step is entirely optional: the engine works without any LLM or API key. Costs are tracked per ADR-0012.
3. Code generation. The engine produces SPARQL CONSTRUCT queries (preferred for simple renames and type coercion) or Python transformation functions using the ORM (for complex cases such as splitting or merging fields).
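
The first two steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the engine's implementation: `map_fields`, the `Schema` alias, and the `proposer` callback are hypothetical names, and the callback merely stands in for the optional LLM step.

```python
from typing import Callable, Optional

# Hypothetical stand-in for a schema: field name -> Python type.
Schema = dict[str, type]


def map_fields(
    source: Schema,
    target: Schema,
    proposer: Optional[Callable[[str, list[str]], Optional[str]]] = None,
) -> tuple[dict[str, str], list[str]]:
    """Return (target -> source mapping, target fields left unmapped)."""
    mapping: dict[str, str] = {}

    # Step 1: auto-map fields whose name and type both match.
    for name, typ in target.items():
        if source.get(name) is typ:
            mapping[name] = name

    # Step 2 (optional): ask the proposer about the remaining target fields.
    unused = [s for s in source if s not in mapping.values()]
    unmapped: list[str] = []
    for name in target:
        if name in mapping:
            continue
        guess = proposer(name, unused) if proposer else None
        if guess in unused:
            mapping[name] = guess
            unused.remove(guess)
        else:
            unmapped.append(name)
    return mapping, unmapped


v1 = {"title": str, "author_name": str}
v2 = {"title": str, "creator_name": str}

# Without a proposer, only the exact match is found.
auto_only = map_fields(v1, v2)

# A trivial rule-based proposer standing in for the LLM call.
hints = {"creator_name": "author_name"}
assisted = map_fields(v1, v2, proposer=lambda field, candidates: hints.get(field))
```

Keeping the proposer as an injected callback mirrors the idea that the deterministic path must work on its own, with the assisted path layered on top.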

Field mapping types

The engine supports five mapping strategies:

Strategy	Description	Generated code
`direct`	Same name, same type — identity mapping	No-op (or SPARQL CONSTRUCT pass-through)
`rename`	Different name, same type	SPARQL CONSTRUCT with predicate substitution
`type_coerce`	Same or different name, different type	SPARQL BIND with cast (`xsd:integer`, etc.)
`computed`	Target derived from multiple source fields	Python function or SPARQL CONCAT/expression
`placeholder`	No source equivalent — marked for enrichment	Field created with `(type, placeholder)` marker
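
As a rough illustration of how the single-source strategies in the table relate to each other, the decision can be written as a short function. `classify` is a hypothetical helper, not part of Trails. It deliberately leaves out `computed`, which involves several source fields and cannot be decided from one pair.

```python
from typing import Optional


def classify(
    source_name: Optional[str],
    source_type: Optional[type],
    target_name: str,
    target_type: type,
) -> str:
    """Pick a strategy name for one single-source field pair."""
    if source_name is None:
        return "placeholder"      # nothing to map from
    if source_type is not target_type:
        return "type_coerce"      # the name may or may not differ
    if source_name != target_name:
        return "rename"
    return "direct"


examples = [
    classify("title", str, "title", str),
    classify("author_name", str, "creator_name", str),
    classify("year", str, "year", int),
    classify(None, None, "sentiment", str),
]
print(examples)
```

The type check comes before the name check because `type_coerce` applies whether or not the name changes.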

Examples of each strategy

Direct mapping — no transformation needed:

source: Article.title (str) → target: Article.title (str)
Strategy: direct

Rename — field name changed, type preserved:

source: Article.author_name (str) → target: Article.creator_name (str)
Strategy: rename
Generated SPARQL:
  CONSTRUCT { ?s ex:creator_name ?o }
  WHERE     { ?s ex:author_name ?o }

Type coercion — value cast to a different type:

source: Article.year (str) → target: Article.year (int)
Strategy: type_coerce
Generated SPARQL:
  CONSTRUCT { ?s ex:year ?year_int }
  WHERE     { ?s ex:year ?year_str . BIND(xsd:integer(?year_str) AS ?year_int) }
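
One consequence of this pattern is worth knowing. In SPARQL, a cast that fails inside BIND leaves the variable unbound, and CONSTRUCT omits any triple whose template contains an unbound variable. A value such as "n/a" is therefore dropped silently rather than raising an error. The sketch below mimics that behavior in Python so the effect can be checked ahead of time. `cast_integer` and `coerce_years` are hypothetical helpers, and the regular expression only approximates the xsd:integer lexical form.

```python
import re
from typing import Optional

# Approximation of the xsd:integer lexical form: optional sign, ASCII digits.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def cast_integer(value: str) -> Optional[int]:
    """Return the integer value, or None where the cast would fail."""
    text = value.strip()
    return int(text) if _INTEGER.fullmatch(text) else None


def coerce_years(rows: dict[str, str]) -> tuple[dict[str, int], list[str]]:
    """Split rows into coerced values and subjects that would be dropped."""
    coerced: dict[str, int] = {}
    dropped: list[str] = []
    for subject, raw in rows.items():
        year = cast_integer(raw)
        if year is None:
            dropped.append(subject)
        else:
            coerced[subject] = year
    return coerced, dropped


coerced, dropped = coerce_years({"a1": "2019", "a2": "n/a", "a3": " 2021 "})
print(coerced, dropped)
```

Listing the subjects that would be dropped before running a coercion is one way to catch values that would otherwise disappear from the output graph.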

Computed — target derived from multiple source fields:

source: Person.first_name + Person.last_name → target: Person.full_name
Strategy: computed
Generated SPARQL:
  CONSTRUCT { ?s ex:full_name ?full }
  WHERE     { ?s ex:first_name ?f ; ex:last_name ?l .
              BIND(CONCAT(?f, " ", ?l) AS ?full) }

Placeholder — no source equivalent, marked for later enrichment:

source: (none) → target: Article.sentiment (str)
Strategy: placeholder
Generated code: field marked as (str, placeholder) in @node_type
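
To make the (type, placeholder) convention concrete, here is a small sketch that separates placeholder fields from regular ones in a fields dictionary. The `placeholder` object below is a local sentinel standing in for the one exported by trails.orm, whose actual type is not described on this page. `split_fields` is a hypothetical helper.

```python
# Local sentinel standing in for trails.orm's placeholder marker (assumption).
placeholder = object()


def split_fields(fields: dict) -> tuple[dict, dict]:
    """Separate regular fields from (type, placeholder) fields."""
    regular, pending = {}, {}
    for name, spec in fields.items():
        if isinstance(spec, tuple) and len(spec) == 2 and spec[1] is placeholder:
            pending[name] = spec[0]   # remember the declared type
        else:
            regular[name] = spec
    return regular, pending


regular, pending = split_fields({
    "title": str,
    "sentiment": (str, placeholder),
    "word_count": (int, placeholder),
})
print(sorted(pending))
```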

SPARQL CONSTRUCT generation

For simple transformations (rename, type coercion, merge), the engine generates SPARQL CONSTRUCT queries that run directly against the store. SPARQL CONSTRUCT is preferred because it is declarative, auditable, and does not require Python execution:

# Auto-generated by: trails onto transform
# Source: models/v1.py → Target: models/v2.py

# Rename: author_name → creator_name
CONSTRUCT {
    ?s <https://myapp.example/creator_name> ?name .
}
WHERE {
    ?s a <https://myapp.example/Article> ;
       <https://myapp.example/author_name> ?name .
}

# Type coercion: year (string → integer)
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
CONSTRUCT {
    ?s <https://myapp.example/year> ?year_int .
}
WHERE {
    ?s a <https://myapp.example/Article> ;
       <https://myapp.example/year> ?year_str .
    BIND(xsd:integer(?year_str) AS ?year_int)
}
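
Queries of this shape are regular enough to be produced by string templates. The following is a minimal sketch of that idea, not the engine's actual generator: `rename_query` and `coerce_query` are hypothetical names, and plain string interpolation like this is only reasonable when the identifiers come from a trusted schema.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def rename_query(base: str, node_class: str, old: str, new: str) -> str:
    """Build a CONSTRUCT query that copies `old` values to `new`."""
    return (
        f"CONSTRUCT {{ ?s <{base}{new}> ?o . }}\n"
        f"WHERE {{ ?s a <{base}{node_class}> ; <{base}{old}> ?o . }}"
    )


def coerce_query(base: str, node_class: str, field: str, xsd_type: str) -> str:
    """Build a CONSTRUCT query that casts `field` to an XSD datatype."""
    return (
        f"PREFIX xsd: <{XSD}>\n"
        f"CONSTRUCT {{ ?s <{base}{field}> ?new . }}\n"
        f"WHERE {{ ?s a <{base}{node_class}> ; <{base}{field}> ?old .\n"
        f"        BIND(xsd:{xsd_type}(?old) AS ?new) }}"
    )


BASE = "https://myapp.example/"
rename = rename_query(BASE, "Article", "author_name", "creator_name")
coerce = coerce_query(BASE, "Article", "year", "integer")
print(rename)
print(coerce)
```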

For complex transformations (conditional logic, multi-step derivations, field splitting with regex), the engine generates Python functions using the ActiveGraph ORM:

"""Auto-generated transformation: v1 → v2

Source: models/v1.py
Target: models/v2.py
"""
from trails.orm import node_type

def transform_full_name(node, ctx):
    """Split full_name into first_name + last_name."""
    parts = node.full_name.rsplit(" ", 1)
    return {
        "first_name": parts[0],
        "last_name": parts[1] if len(parts) > 1 else "",
    }
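
To see how the split behaves on edge cases, the function can be exercised without a store. In this sketch `SimpleNamespace` stands in for an ORM node (an assumption made only for illustration), and `ctx` is passed as None because the function does not use it.

```python
from types import SimpleNamespace


def transform_full_name(node, ctx):
    """Split full_name into first_name + last_name (same logic as above)."""
    parts = node.full_name.rsplit(" ", 1)
    return {
        "first_name": parts[0],
        "last_name": parts[1] if len(parts) > 1 else "",
    }


names = ["Ada Lovelace", "Jean Luc Picard", "Madonna"]
results = [transform_full_name(SimpleNamespace(full_name=n), None) for n in names]
for name, fields in zip(names, results):
    print(f"{name!r} -> {fields}")
```

Because the split is taken from the right, everything before the last space lands in first_name, and a single-word name yields an empty last_name.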

CLI usage

# Generate transformation plan (dry-run, default — no store changes)
trails onto transform --from models/v1.py --to models/v2.py

# Show only the transformation plan, without generating SPARQL / Python code
trails onto transform --from old.ttl --to new.ttl --plan-only

# Dry-run: show what would change, with node/triple counts
trails onto transform --from models/v1.py --to models/v2.py --dry-run

# Execute the transformation against the store
trails onto transform --from models/v1.py --to models/v2.py --execute

# Disable LLM-assisted mapping (pure deterministic)
trails onto transform --from old.ttl --to new.ttl --no-llm

# Enable LLM-assisted mapping for ambiguous fields
trails onto transform --from old.ttl --to new.ttl --llm-assist

# Mark unmappable target fields as placeholders for enrichment
trails onto transform --from old.ttl --to new.ttl --enrich

Key flags:

Flag	Description
`--from`	Source schema (Python module or TTL file)
`--to`	Target schema (same formats)
`--plan-only`	Show the transformation plan without code generation
`--dry-run`	Show what would change without modifying the store
`--execute`	Run the transformation (requires explicit opt-in)
`--no-llm`	Disable LLM-assisted mapping — pure deterministic
`--llm-assist`	Enable LLM for ambiguous field mappings (cost-tracked)
`--enrich`	Mark unmappable target fields as placeholders

The default behavior (no --execute) is safe: it generates the plan and shows it. No store modifications happen without --execute.
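
The opt-in design can be modeled with argparse to show why a forgotten flag cannot cause a write. This is a sketch of the pattern only, not the Trails CLI source. The parser name is made up, only a subset of the flags is modeled, and treating --llm-assist and --no-llm as mutually exclusive is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="transform-sketch")
    # "from" is a Python keyword, so store the value under another name.
    parser.add_argument("--from", dest="source", required=True)
    parser.add_argument("--to", dest="target", required=True)
    # Writing to the store is opt-in; the default is a preview.
    parser.add_argument("--execute", action="store_true")
    # LLM assistance is off unless requested (exclusivity is an assumption).
    llm = parser.add_mutually_exclusive_group()
    llm.add_argument("--llm-assist", dest="llm_assist", action="store_true")
    llm.add_argument("--no-llm", dest="llm_assist", action="store_false")
    parser.set_defaults(llm_assist=False)
    return parser


parser = build_parser()
preview = parser.parse_args(["--from", "models/v1.py", "--to", "models/v2.py"])
run = parser.parse_args(
    ["--from", "models/v1.py", "--to", "models/v2.py", "--execute", "--llm-assist"]
)
print(preview.execute, run.execute)
```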

Python API

from trails.onto_transform import plan_transform, execute_plan

# Generate a transformation plan.
# `ctx` must already exist; its construction is not shown in this example.
plan = plan_transform(
    source="models/v1.py",
    target="models/v2.py",
    llm_assist=False,  # no LLM needed
    ctx=ctx,
)

# Inspect the plan before executing
print(f"Mappings: {len(plan.field_mappings)}")
print(f"Placeholders: {len(plan.placeholders)}")

for mapping in plan.field_mappings:
    print(f"  {mapping.source} → {mapping.target} ({mapping.strategy})")

for ph in plan.placeholders:
    print(f"  [placeholder] {ph.field} — no source equivalent")

# View generated SPARQL
for query in plan.sparql_queries:
    print(query)

# Execute the plan against the store
result = execute_plan(plan, ctx)
print(f"Transformed {result.node_count} nodes, {result.triple_count} triples")
print(f"Duration: {result.duration_ms}ms")
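
When a plan is large, a per-strategy summary is easier to scan than the full listing. The sketch below shows one way to build it. The `FieldMapping` dataclass here is a local stand-in carrying a subset of the documented attributes so that the example runs without Trails installed, and `strategy_counts` is a hypothetical helper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FieldMapping:
    """Local stand-in with a subset of the documented attributes."""
    source: str
    target: str
    strategy: str


def strategy_counts(field_mappings: list[FieldMapping]) -> Counter:
    """Count how many mappings use each strategy."""
    return Counter(m.strategy for m in field_mappings)


mappings = [
    FieldMapping("title", "title", "direct"),
    FieldMapping("author_name", "creator_name", "rename"),
    FieldMapping("pub_date", "published_at", "rename"),
    FieldMapping("body", "body", "direct"),
]
counts = strategy_counts(mappings)
for strategy, n in sorted(counts.items()):
    print(f"{strategy}: {n}")
```

With a real plan, the same function could be applied to plan.field_mappings, assuming each mapping exposes a .strategy attribute as listed in the reference.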

LLM-assisted mapping (optional)

When llm_assist=True, the engine sends ambiguous field pairs to a cheap LLM (Haiku by default) to propose mappings. The user confirms or overrides each proposal. This is entirely optional — every transformation works without it.

plan = plan_transform(
    source="models/v1.py",
    target="models/v2.py",
    llm_assist=True,
    model="haiku",  # cheapest adequate model
    ctx=ctx,
)

# LLM-proposed mappings are marked for review
for mapping in plan.field_mappings:
    if mapping.llm_proposed:
        print(f"  [LLM] {mapping.source} → {mapping.target} "
              f"(confidence: {mapping.confidence:.2f})")

The LLM is never called unless you explicitly pass llm_assist=True. No API key is required for deterministic transformations.
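
Since every LLM proposal is meant to be confirmed by a person, it can help to sort out which ones deserve the closest look. The sketch below filters proposals by confidence. The `FieldMapping` dataclass is a local stand-in, `needs_review` is a hypothetical helper, and the 0.8 threshold and the example confidence values are arbitrary. It also assumes confidence is a score between 0 and 1, which this page does not state.

```python
from dataclasses import dataclass


@dataclass
class FieldMapping:
    """Local stand-in mirroring the documented attribute names."""
    source: str
    target: str
    strategy: str
    llm_proposed: bool = False
    confidence: float = 1.0


def needs_review(mappings: list[FieldMapping], threshold: float = 0.8) -> list[FieldMapping]:
    """Return LLM-proposed mappings whose confidence is below the threshold."""
    return [m for m in mappings if m.llm_proposed and m.confidence < threshold]


# Illustrative values only; real confidences come from the plan.
mappings = [
    FieldMapping("title", "title", "direct"),
    FieldMapping("pub_date", "published_at", "rename", llm_proposed=True, confidence=0.93),
    FieldMapping("author_name", "creator.full_name", "rename", llm_proposed=True, confidence=0.62),
]
flagged = needs_review(mappings)
for m in flagged:
    print(f"review: {m.source} -> {m.target} ({m.confidence:.2f})")
```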

Complete example: migrating a schema

Suppose you have a v1 schema with articles and want to migrate to v2 with richer metadata:

# models/v1.py
from trails.orm import node_type

@node_type("Article", fields={
    "title": str,
    "author_name": str,
    "pub_date": str,
    "body": str,
})
class Article:
    pass

# models/v2.py
from trails.orm import node_type, placeholder

@node_type("Article", fields={
    "title": str,
    "creator_name": str,           # renamed from author_name
    "published_at": str,           # renamed from pub_date
    "body": str,
    "sentiment": (str, placeholder),  # new — to be enriched later
    "word_count": (int, placeholder), # new — to be enriched later
})
class Article:
    pass

Run the transformation:

# Preview the plan
trails onto transform --from models/v1.py --to models/v2.py --enrich

# Output:
# Field mappings:
#   title       → title         (direct)
#   author_name → creator_name  (rename)
#   pub_date    → published_at  (rename)
#   body        → body          (direct)
# Placeholders:
#   sentiment   — no source (placeholder)
#   word_count  — no source (placeholder)

# Execute
trails onto transform --from models/v1.py --to models/v2.py --enrich --execute

After transformation, the placeholder fields (sentiment, word_count) are ready for the enrichment pipeline.
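
The effect of this plan on a single record can be pictured with plain dictionaries. This is only an illustration of the mapping semantics. The real transformation runs against the store, `migrate_record` is a hypothetical helper, the sample article is made up, and representing an unfilled placeholder as None is an assumption.

```python
def migrate_record(
    record: dict,
    renames: dict[str, str],
    placeholders: list[str],
) -> dict:
    """Apply direct/rename mappings and add empty placeholder fields."""
    # Fields without a rename entry keep their name (direct mapping).
    migrated = {renames.get(name, name): value for name, value in record.items()}
    for field in placeholders:
        migrated.setdefault(field, None)
    return migrated


v1_article = {
    "title": "Graphs at scale",
    "author_name": "R. Example",
    "pub_date": "2024-01-15",
    "body": "...",
}
v2_article = migrate_record(
    v1_article,
    renames={"author_name": "creator_name", "pub_date": "published_at"},
    placeholders=["sentiment", "word_count"],
)
print(v2_article)
```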

Reference

Symbol	Description
`plan_transform(source, target, *, llm_assist, model, ctx)`	Generate a `TransformPlan` from source and target schemas
`execute_plan(plan, ctx)`	Execute a transformation plan against the store; returns `TransformResult`
`TransformPlan`	`.field_mappings`, `.placeholders`, `.sparql_queries`, `.python_functions`
`FieldMapping`	`.source`, `.target`, `.strategy`, `.llm_proposed`, `.confidence`
`Placeholder`	`.field`, `.python_type`, `.target_type`
`TransformResult`	`.node_count`, `.triple_count`, `.duration_ms`, `.provenance_id`