
Schema Transformation

Real-world knowledge-graph projects constantly migrate between schemas: legacy systems to standards, v1 to v2, domain A to domain B. Every schema change triggers a cascade of manual work — writing SPARQL CONSTRUCT queries, mapping field names, coercing types, handling structural mismatches. Trails replaces this grind with a declarative transformation engine: give it a source schema and a target schema, and it produces the mapping plan, the executable code, and the provenance trail. The full design lives in ADR-0026; the progressive-enhancement framing is ADR-0021.

How it works

The transformation engine compares two schemas — a source and a target — and produces a TransformPlan containing field mappings, type coercions, structural changes, and placeholder markers for fields that have no source equivalent.

source schema + target schema → TransformPlan → SPARQL / Python code → execute

Three steps, each optional:

  1. Auto-mapping. Fields with matching names and compatible types are mapped automatically. title: str in source to title: str in target requires zero configuration.

  2. LLM-assisted mapping (optional). Mappings for fields with ambiguous correspondence (author_name to creator.full_name, pub_date to published_at) can be proposed by a cheap LLM call. The user confirms or overrides each proposal. This step is entirely optional: the engine works without any LLM or API key. LLM calls are cost-tracked per ADR-0012.

  3. Code generation. The engine produces SPARQL CONSTRUCT queries (preferred for simple renames and type coercion) or Python transformation functions using the ORM (for complex cases like splitting/merging fields).
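
The auto-mapping step can be pictured as a plain dictionary comparison. The sketch below is a minimal illustration and not Trails' actual implementation; the function name `auto_map` and the tuple layout are assumptions made for this example.

```python
def auto_map(source_fields, target_fields):
    """Classify each target field by comparing names and types.

    Hypothetical helper for illustration; not part of the Trails API.
    """
    mappings = []    # (source_name, target_name, strategy)
    unresolved = []  # target fields with no same-named source field
    for name, target_type in target_fields.items():
        if name not in source_fields:
            unresolved.append(name)
        elif source_fields[name] is target_type:
            mappings.append((name, name, "direct"))
        else:
            mappings.append((name, name, "type_coerce"))
    return mappings, unresolved


source = {"title": str, "year": str, "author_name": str}
target = {"title": str, "year": int, "creator_name": str}

mappings, unresolved = auto_map(source, target)
print(mappings)    # same-named fields, resolved deterministically
print(unresolved)  # candidates for rename, LLM proposal, or placeholder
```

In this picture, anything left in `unresolved` is what the later steps would have to handle: a user-supplied rename, an LLM proposal, or a placeholder.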

Field mapping types

The engine supports five mapping strategies:

| Strategy | Description | Generated code |
|---|---|---|
| direct | Same name, same type (identity mapping) | No-op (or SPARQL CONSTRUCT pass-through) |
| rename | Different name, same type | SPARQL CONSTRUCT with predicate substitution |
| type_coerce | Same or different name, different type | SPARQL BIND with cast (xsd:integer, etc.) |
| computed | Target derived from multiple source fields | Python function or SPARQL CONCAT/expression |
| placeholder | No source equivalent; marked for enrichment | Field created with (type, placeholder) marker |

Examples of each strategy

Direct mapping — no transformation needed:

source: Article.title (str) → target: Article.title (str)
Strategy: direct

Rename — field name changed, type preserved:

source: Article.author_name (str) → target: Article.creator_name (str)
Strategy: rename
Generated SPARQL:
  CONSTRUCT { ?s ex:creator_name ?o }
  WHERE     { ?s ex:author_name ?o }

Type coercion — value cast to a different type:

source: Article.year (str) → target: Article.year (int)
Strategy: type_coerce
Generated SPARQL:
  CONSTRUCT { ?s ex:year ?year_int }
  WHERE     { ?s ex:year ?year_str . BIND(xsd:integer(?year_str) AS ?year_int) }
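
One consequence of casting inside BIND is worth checking before a real migration. In standard SPARQL, a BIND whose expression raises an error leaves the variable unbound, and CONSTRUCT omits any template triple that contains an unbound variable, so values that cannot be cast produce no target triple. The sketch below is a hypothetical pre-check, not a Trails feature: it flags strings that do not look like integers. The helper name `uncastable` and the tolerance for surrounding whitespace are assumptions.

```python
import re

# Optional sign followed by ASCII digits. Surrounding whitespace is tolerated
# here, which is an assumption about how the store's cast treats whitespace.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def uncastable(values):
    """Return the values that would likely fail an xsd:integer cast."""
    return [v for v in values if not _INTEGER.fullmatch(v.strip())]


years = ["2021", " 1999 ", "circa 1850", "", "20.5"]
bad = uncastable(years)
print(bad)
```

Running a check like this against the source values first shows how many nodes would lose the field during coercion.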

Computed — target derived from multiple source fields:

source: Person.first_name + Person.last_name → target: Person.full_name
Strategy: computed
Generated SPARQL:
  CONSTRUCT { ?s ex:full_name ?full }
  WHERE     { ?s ex:first_name ?f ; ex:last_name ?l .
              BIND(CONCAT(?f, " ", ?l) AS ?full) }

Placeholder — no source equivalent, marked for later enrichment:

source: (none) → target: Article.sentiment (str)
Strategy: placeholder
Generated code: field marked as (str, placeholder) in @node_type

SPARQL CONSTRUCT generation

For simple transformations (rename, type coercion, merge), the engine generates SPARQL CONSTRUCT queries that run directly against the store. SPARQL CONSTRUCT is preferred because it is declarative, auditable, and does not require Python execution:

# Auto-generated by: trails onto transform
# Source: models/v1.py → Target: models/v2.py

# Rename: author_name → creator_name
CONSTRUCT {
    ?s <https://myapp.example/creator_name> ?name .
}
WHERE {
    ?s a <https://myapp.example/Article> ;
       <https://myapp.example/author_name> ?name .
}

# Type coercion: year (string → integer)
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
CONSTRUCT {
    ?s <https://myapp.example/year> ?year_int .
}
WHERE {
    ?s a <https://myapp.example/Article> ;
       <https://myapp.example/year> ?year_str .
    BIND(xsd:integer(?year_str) AS ?year_int)
}
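
To make the generation step concrete, here is a minimal sketch of how rename and type-coercion queries could be rendered from a mapping using string templates. This is an illustration under stated assumptions, not the engine's code generator: `rename_query`, `coerce_query`, and their parameters are hypothetical names.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def rename_query(ns, node_type, old, new):
    """Build a CONSTRUCT query copying values from predicate `old` to `new`."""
    return (
        f"CONSTRUCT {{ ?s <{ns}{new}> ?o . }}\n"
        f"WHERE {{ ?s a <{ns}{node_type}> ; <{ns}{old}> ?o . }}"
    )


def coerce_query(ns, node_type, field, xsd_type):
    """Build a CONSTRUCT query casting `field` to the given XSD type."""
    return (
        f"PREFIX xsd: <{XSD}>\n"
        f"CONSTRUCT {{ ?s <{ns}{field}> ?cast . }}\n"
        f"WHERE {{ ?s a <{ns}{node_type}> ; <{ns}{field}> ?raw .\n"
        f"        BIND(xsd:{xsd_type}(?raw) AS ?cast) }}"
    )


NS = "https://myapp.example/"
print(rename_query(NS, "Article", "author_name", "creator_name"))
print(coerce_query(NS, "Article", "year", "integer"))
```

A real generator would also need to validate field and type names before interpolating them into IRIs, since raw string formatting offers no protection against malformed input.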

For complex transformations (conditional logic, multi-step derivations, field splitting with regex), the engine generates Python functions using the ActiveGraph ORM:

"""Auto-generated transformation: v1 → v2

Source: models/v1.py
Target: models/v2.py
"""
from trails.orm import node_type

def transform_full_name(node, ctx):
    """Split full_name into first_name + last_name."""
    parts = node.full_name.rsplit(" ", 1)
    return {
        "first_name": parts[0],
        "last_name": parts[1] if len(parts) > 1 else "",
    }
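
The split logic can be exercised without a store. The sketch below repeats the function so it runs on its own and uses `SimpleNamespace` as a stand-in for an ORM node, which is an assumption made purely for testing; `ctx` is unused by this function, so `None` is passed.

```python
from types import SimpleNamespace


def transform_full_name(node, ctx):
    """Same logic as the generated function: split on the last space."""
    parts = node.full_name.rsplit(" ", 1)
    return {
        "first_name": parts[0],
        "last_name": parts[1] if len(parts) > 1 else "",
    }


# SimpleNamespace stands in for an ORM node (assumption, for illustration).
two_part = transform_full_name(SimpleNamespace(full_name="Ada Lovelace"), None)
three_part = transform_full_name(SimpleNamespace(full_name="Jean Luc Picard"), None)
single = transform_full_name(SimpleNamespace(full_name="Plato"), None)

print(two_part)
print(three_part)
print(single)
```

Because the split happens on the last space, middle names stay with `first_name`, and a single-word name yields an empty `last_name` rather than raising an error.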

CLI usage

# Generate transformation plan (dry-run, default — no store changes)
trails onto transform --from models/v1.py --to models/v2.py

# Show the transformation plan only, without generating code
trails onto transform --from old.ttl --to new.ttl --plan-only

# Dry-run: show what would change, with node/triple counts
trails onto transform --from models/v1.py --to models/v2.py --dry-run

# Execute the transformation against the store
trails onto transform --from models/v1.py --to models/v2.py --execute

# Disable LLM-assisted mapping (pure deterministic)
trails onto transform --from old.ttl --to new.ttl --no-llm

# Enable LLM-assisted mapping for ambiguous fields
trails onto transform --from old.ttl --to new.ttl --llm-assist

# Mark unmappable target fields as placeholders for enrichment
trails onto transform --from old.ttl --to new.ttl --enrich

Key flags:

| Flag | Description |
|---|---|
| --from | Source schema (Python module or TTL file) |
| --to | Target schema (same formats) |
| --plan-only | Show the transformation plan without code generation |
| --dry-run | Show what would change without modifying the store |
| --execute | Run the transformation (requires explicit opt-in) |
| --no-llm | Disable LLM-assisted mapping (pure deterministic) |
| --llm-assist | Enable LLM for ambiguous field mappings (cost-tracked) |
| --enrich | Mark unmappable target fields as placeholders |

The default behavior (no --execute) is safe: it generates the plan and shows it. No store modifications happen without --execute.

Python API

from trails.onto_transform import plan_transform, execute_plan

# Generate a transformation plan
plan = plan_transform(
    source="models/v1.py",
    target="models/v2.py",
    llm_assist=False,  # no LLM needed
    ctx=ctx,
)

# Inspect the plan before executing
print(f"Mappings: {len(plan.field_mappings)}")
print(f"Placeholders: {len(plan.placeholders)}")

for mapping in plan.field_mappings:
    print(f"  {mapping.source} → {mapping.target} ({mapping.strategy})")

for ph in plan.placeholders:
    print(f"  [placeholder] {ph.field} — no source equivalent")

# View generated SPARQL
for query in plan.sparql_queries:
    print(query)

# Execute the plan against the store
result = execute_plan(plan, ctx)
print(f"Transformed {result.node_count} nodes, {result.triple_count} triples")
print(f"Duration: {result.duration_ms}ms")

LLM-assisted mapping (optional)

When llm_assist=True, the engine sends ambiguous field pairs to a cheap LLM (Haiku by default) to propose mappings. The user confirms or overrides each proposal. This is entirely optional — every transformation works without it.

plan = plan_transform(
    source="models/v1.py",
    target="models/v2.py",
    llm_assist=True,
    model="haiku",  # cheapest adequate model
    ctx=ctx,
)

# LLM-proposed mappings are marked for review
for mapping in plan.field_mappings:
    if mapping.llm_proposed:
        print(f"  [LLM] {mapping.source} → {mapping.target} "
              f"(confidence: {mapping.confidence:.2f})")

The LLM is never called unless you explicitly pass llm_assist=True. No API key is required for deterministic transformations.
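
Since every LLM proposal needs a human decision, it can help to review the least certain ones first. The sketch below is one possible way to build such a review queue; the `FieldMapping` dataclass is a stand-in that mirrors the attributes used above, and `review_queue` and the confidence values are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FieldMapping:
    """Stand-in mirroring the attributes used in the loop above."""
    source: str
    target: str
    strategy: str
    llm_proposed: bool = False
    confidence: float = 1.0


def review_queue(mappings):
    """Return LLM-proposed mappings, least confident first."""
    proposed = [m for m in mappings if m.llm_proposed]
    return sorted(proposed, key=lambda m: m.confidence)


mappings = [
    FieldMapping("title", "title", "direct"),
    FieldMapping("author_name", "creator.full_name", "rename", True, 0.93),
    FieldMapping("pub_date", "published_at", "rename", True, 0.81),
]

queue = review_queue(mappings)
for m in queue:
    print(f"review: {m.source} -> {m.target} ({m.confidence:.2f})")
```

Deterministic mappings never enter the queue, so a run without `llm_assist=True` leaves nothing to review.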

Complete example: migrating a schema

Suppose you have a v1 schema with articles and want to migrate to v2 with richer metadata:

# models/v1.py
from trails.orm import node_type

@node_type("Article", fields={
    "title": str,
    "author_name": str,
    "pub_date": str,
    "body": str,
})
class Article:
    pass

# models/v2.py
from trails.orm import node_type, placeholder

@node_type("Article", fields={
    "title": str,
    "creator_name": str,           # renamed from author_name
    "published_at": str,           # renamed from pub_date
    "body": str,
    "sentiment": (str, placeholder),  # new — to be enriched later
    "word_count": (int, placeholder), # new — to be enriched later
})
class Article:
    pass

Run the transformation:

# Preview the plan
trails onto transform --from models/v1.py --to models/v2.py --enrich

# Output:
# Field mappings:
#   title       → title         (direct)
#   author_name → creator_name  (rename)
#   pub_date    → published_at  (rename)
#   body        → body          (direct)
# Placeholders:
#   sentiment   — no source (placeholder)
#   word_count  — no source (placeholder)

# Execute
trails onto transform --from models/v1.py --to models/v2.py --enrich --execute

After transformation, the placeholder fields (sentiment, word_count) are ready for the enrichment pipeline.

Reference

| Symbol | Description |
|---|---|
| plan_transform(source, target, *, llm_assist, model, ctx) | Generate a TransformPlan from source and target schemas |
| execute_plan(plan, ctx) | Execute a transformation plan against the store; returns TransformResult |
| TransformPlan | .field_mappings, .placeholders, .sparql_queries, .python_functions |
| FieldMapping | .source, .target, .strategy, .llm_proposed, .confidence |
| Placeholder | .field, .python_type, .target_type |
| TransformResult | .node_count, .triple_count, .duration_ms, .provenance_id |

See also