ADR-0079: Graph-as-Code (GaC) — Python-native and YAML Ontology Declarations

  • Status: Accepted (2026-05-26)
  • Date: 2026-05-26
  • Extends: ADR-0002 (Python-first shapes), ADR-0021 (Progressive enhancement)
  • Relates to: ADR-0025 (Auto-ontology generation), ADR-0028 (Schema migrations)
  • Supersedes:
  • Superseded by:

Context

ADR-0021 defines the progressive enhancement ladder:

labels  →  @app.model (JSON-Schema)  →  @shape (SHACL)  →  OWL

ADR-0002 established that shapes are authored Python-first and compiled to SHACL. Both decisions are correct and remain unchanged. However, a friction gap persists at the @shape level and above.

The current pain point: reaching the SHACL tier still requires RDF vocabulary knowledge. Developers must know predicate(), sh:minCount, sh:minLength, sh:in, and so on before they can do something as mundane as "this field is required" or "age must be between 0 and 150." The escape to raw SHACL predicates was intentional in ADR-0002 (full expressiveness), but it blocks everyday usage.

A parallel exists in infrastructure tooling: before Terraform and Pulumi, provisioning infrastructure meant clicking through a cloud console or hand-authoring YAML manifests. Infrastructure-as-Code removed that barrier by letting engineers describe resources in a high-level language that compiled to the underlying primitives. The same move is overdue for knowledge-graph constraint authoring.

Infrastructure-as-Code moment for KGs: Graph-as-Code (GaC) is the principle that ontology declarations, shape constraints, and cross-property business rules live as ordinary Python — no Turtle files, no SHACL vocabulary, no separate toolchain — and compile down to the same runtime structures the framework already uses.

This ADR records the decision to introduce a GaC annotation layer as a first-class part of the @app.model surface.


Decision

Introduce trails.gac with two declaration surfaces that compile to the same PredicateInfo / ShapeMeta runtime representation:

  1. Python surface: Annotated[] constraint markers on class fields; @app.model reads them automatically.
  2. YAML surface: a declarative models.yaml file loaded via app.load_models(path) or load_yaml_models(path, app); no Python class required. Intended for data-engineering workflows (dbt-style pipelines, generated schemas, non-Python tooling).

New API

from typing import Annotated
from trails.gac import required, optional, min_length, max_length
from trails.gac import min_value, max_value, pattern, one_of, unique
from trails.gac import constraint, require

@app.model                        # bare decorator — reads class annotations
class Person:
    name:  Annotated[str, required(), min_length(1)]
    age:   Annotated[int, optional(), min_value(0), max_value(150)]
    email: Annotated[str, pattern(r".+@.+")]
    role:  Annotated[str, one_of("admin", "user", "guest")]

@app.model("Employee")            # explicit IRI name still works
class Employee(Person):
    salary:   Annotated[float, optional(), min_value(0)]
    contract: Annotated[str, optional()]

@constraint(Employee)
def salary_requires_contract(node):
    if node.salary is not None:
        require(node.contract is not None, "salary implies contract")

YAML surface (new)

# models.yaml
models:
  - name: Word
    fields:
      writtenForm: {type: str, required: true, min_length: 1}
      language:    {type: str, optional: true, one_of: [de, en, fr]}
      quality:     {type: str, one_of: [high, benchmark, detected]}

  - name: DriftEvent
    fields:
      year:       {type: int, required: true, min_value: 1000}
      confidence: {type: float, optional: true, min_value: 0, max_value: 1}

app.load_models("models.yaml")   # all models registered, shapes compiled

Requires PyYAML (pip install pyyaml). Each field key maps to the same constraint markers as the Python surface (see Constraint → SHACL Mapping table). Types: str / string, int / integer, float / number, bool / boolean.
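As a rough illustration of how one YAML field entry could be turned into a Python type plus constraint markers, the sketch below works on a dict that has already been parsed (for example by yaml.safe_load). The function name parse_field and the tuple representation of markers are hypothetical stand-ins, not the loader Trails ships.

```python
# Type aliases accepted by the YAML surface, as listed in this ADR.
_TYPES = {
    "str": str, "string": str,
    "int": int, "integer": int,
    "float": float, "number": float,
    "bool": bool, "boolean": bool,
}

def parse_field(spec):
    """Turn one parsed YAML field entry into (python_type, markers).

    Markers are plain tuples here, e.g. ("min_length", 1); the real
    loader presumably builds trails.gac marker objects instead.
    """
    spec = dict(spec)                      # do not mutate the caller's dict
    type_name = spec.pop("type")
    if type_name not in _TYPES:
        raise ValueError(f"unknown field type: {type_name!r}")
    markers = []
    for key, value in spec.items():
        if key in ("required", "optional"):
            if value:                      # a flag set to false adds nothing
                markers.append((key,))
        elif key == "one_of":
            markers.append((key, *value))  # spread the allowed values
        else:
            markers.append((key, value))   # single-argument constraints
    return _TYPES[type_name], markers

py_type, markers = parse_field({"type": "str", "required": True, "min_length": 1})
```

Because the result has the same shape as what the Python surface extracts from Annotated[], both surfaces can feed a single shape-registration step.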

Old API — unchanged

@app.model("Post", fields={"title": str, "body": str})
class Post: pass

All three surfaces coexist. @app.model picks the Python path from what it is given: fields= → legacy; Annotated[] class body → Python GaC. YAML models do not pass through the decorator at all; they are registered by app.load_models (YAML GaC).


Technical Implementation

trails/gac.py

Each constraint is a frozen dataclass marker. Dataclasses compare by value by default, and freezing them additionally makes the markers immutable and hashable, so they can be stored safely in sets or used as dict keys.

_Required       → sh:minCount 1
_Optional       → sh:minCount 0
_MinLength(n)   → sh:minLength n
_MaxLength(n)   → sh:maxLength n
_MinValue(v)    → sh:minInclusive v
_MaxValue(v)    → sh:maxInclusive v
_Pattern(r)     → sh:pattern r
_OneOf(*vs)     → sh:in (v1 v2 …)
_Unique         → ORM unique list (no SHACL PropertyShape)

Public factory functions (required(), min_length(n), …) return the corresponding frozen dataclass instances.
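A minimal sketch of the marker pattern, assuming three of the markers listed above. The class bodies and field names (n, values) are illustrative guesses; only the marker and factory names come from this ADR.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical stand-ins for the private marker classes in trails/gac.py.
@dataclass(frozen=True)
class _Required:
    pass

@dataclass(frozen=True)
class _MinLength:
    n: int

@dataclass(frozen=True)
class _OneOf:
    values: tuple[Any, ...]  # a tuple (not a list) keeps the marker hashable

# Public factories return marker instances.
def required() -> _Required:
    return _Required()

def min_length(n: int) -> _MinLength:
    return _MinLength(n)

def one_of(*values: Any) -> _OneOf:
    return _OneOf(tuple(values))

# Equal markers collapse in a set because they hash and compare by value.
markers = {required(), min_length(1), min_length(1), one_of("admin", "user")}
print(len(markers))  # 3
```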

constraint(cls) is a decorator factory: @constraint(cls) registers the decorated Python validator function against the node type cls. require(cond, msg) raises ConstraintError when cond is False. run_constraints(node) executes all registered validators for the node's type.
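The three functions can be sketched as follows. The registry name _VALIDATORS and the choice to walk the MRO (so that validators registered on a base class also run for subclasses) are assumptions made for this example; the ADR does not specify either.

```python
from collections import defaultdict
from typing import Callable

class ConstraintError(Exception):
    """Raised by require() when a rule does not hold."""

# Hypothetical registry: node class -> list of validator functions.
_VALIDATORS: dict[type, list[Callable]] = defaultdict(list)

def constraint(cls):
    """Return a decorator that registers a validator for `cls`."""
    def register(fn):
        _VALIDATORS[cls].append(fn)
        return fn  # hand the function back so it stays callable and testable
    return register

def require(cond, msg):
    if not cond:
        raise ConstraintError(msg)

def run_constraints(node):
    # Assumption: base-class validators apply to subclasses as well.
    for klass in type(node).__mro__:
        for fn in _VALIDATORS.get(klass, ()):
            fn(node)

# A plain class stands in for a Trails node here.
class Employee:
    def __init__(self, salary=None, contract=None):
        self.salary = salary
        self.contract = contract

@constraint(Employee)
def salary_requires_contract(node):
    if node.salary is not None:
        require(node.contract is not None, "salary implies contract")

run_constraints(Employee(salary=50_000.0, contract="permanent"))  # passes silently
```

Because the validator is an ordinary function, it can also be called directly on any object with salary and contract attributes, which is what makes unit-testing in isolation straightforward.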

_extract_gac_annotations(cls)

Iterates cls.__annotations__, strips Annotated[T, *markers] into (T, [markers]), separates plain-type fields from annotated fields, and returns:

plain_fields: dict[str, type]
constraints:  dict[str, list[marker]]
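A minimal sketch of the extraction step using only the standard typing helpers. It assumes plain_fields carries the base type of every field (annotated or not), which the description above does not spell out, and it uses strings and tuples as placeholder markers so the example runs without trails.gac.

```python
from typing import Annotated, get_args, get_origin

def extract_gac_annotations(cls):
    """Split class annotations into base types and per-field marker lists."""
    plain_fields = {}
    constraints = {}
    for name, hint in cls.__annotations__.items():
        if get_origin(hint) is Annotated:
            # get_args(Annotated[T, m1, m2]) returns (T, m1, m2)
            base, *markers = get_args(hint)
            plain_fields[name] = base
            constraints[name] = markers
        else:
            plain_fields[name] = hint  # unannotated field: type only
    return plain_fields, constraints

class Person:
    name: Annotated[str, "required", ("min_length", 1)]  # placeholder markers
    nickname: str

plain, cons = extract_gac_annotations(Person)
# plain == {"name": str, "nickname": str}
# cons  == {"name": ["required", ("min_length", 1)]}
```

Reading cls.__annotations__ directly does not resolve string annotations (for example under from __future__ import annotations); typing.get_type_hints(cls, include_extras=True) is the usual alternative when that matters.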

_register_gac_shape(node_type_name, constraints_dict)

Converts each (field_name, [markers]) pair to a PredicateInfo object using the same logic as the explicit predicate() descriptor path. Collects them into a ShapeMeta and inserts it directly into the existing _SHAPES and _SHAPES_BY_NODE_TYPE registries. No new registry; no parallel code path for downstream consumers.

app.model() extension

model() gains two changes:

  1. Bare decorator support: @app.model (no call) is detected by checking whether the first argument is a class; if it is, model() applies the decorator to that class directly instead of returning a decorator.
  2. After node_type registration, if _extract_gac_annotations finds any Annotated[] fields, _register_gac_shape is called before the decorated class is returned.

Constraint → SHACL Mapping

Python marker    SHACL / ORM equivalent
required()       sh:minCount 1
optional()       sh:minCount 0
min_length(n)    sh:minLength n
max_length(n)    sh:maxLength n
min_value(v)     sh:minInclusive v
max_value(v)     sh:maxInclusive v
pattern(r)       sh:pattern r
one_of(*vs)      sh:in (v1 v2 …)
unique()         ORM unique list (not a SHACL PropertyShape)

Consequences

Positive

  • No Turtle/SHACL knowledge required for the common constraint patterns (the 90%+ case). Developers stay in Python throughout.
  • Full backward compatibility. The fields= keyword path, explicit predicate() descriptors, and raw @shape are all unchanged. Nothing breaks; teams can migrate field-by-field.
  • Inspectable markers. Frozen dataclasses are hashable and serialisable. Tooling can enumerate the constraints on a class without running the decorator.
  • Debuggable cross-property rules. @constraint validators are plain Python functions — they show up in tracebacks, accept breakpoints, and can be unit-tested in isolation with a mock node object.
  • One runtime representation. GaC compiles to the same ShapeMeta / PredicateInfo structures consumed by SHACL export (trails onto export), federation schema negotiation, and SHACL validation. No GaC-specific validation path; correctness is inherited.
  • Progressive enhancement preserved. GaC sits at level 2 (@app.model) and part of level 3 (@shape). Level 1 (bare labels) and level 4 (OWL) are unaffected. Users who never annotate a field never encounter trails.gac.

Negative / Limitations

  • Complex class expressions (sh:or, sh:xone over multiple properties, sh:node nesting) are not addressable from single-field markers. The predicate() escape hatch remains the path for those cases.
  • OWL axiom compilation (Meta inner class, owl:symmetric, owl:transitive, domain/range restrictions) is deliberately out of scope for this ADR. It is a natural next step and may be tracked as ADR-0080.
  • SPARQL-based SHACL rules (sh:SPARQLConstraint) cannot be expressed from Python annotations; @constraint covers the equivalent Python-side, but the SPARQL form requires explicit Turtle.
  • Marker proliferation risk. Each new constraint type requires a new dataclass, factory function, and mapping entry. A governance rule is needed: markers are added only when the SHACL equivalent is unambiguous and the Python form is materially simpler. Niche SHACL vocabulary stays in the predicate() layer.
  • YAML surface: no compile-time checking. YAML field types and constraint values are validated only at load time (load_yaml_models), not by a type checker or IDE. The Python surface is preferred when type safety matters; YAML is preferred for generated or non-Python-authored schemas.
  • YAML surface: no @constraint cross-property validators. The @constraint decorator is Python-only and has no YAML equivalent. YAML models that need cross-property rules must add them in Python after calling app.load_models.

Alternatives Considered

1 — Pydantic-style Field() reuse

Reuse pydantic.Field as the constraint carrier. Rejected: Pydantic's FieldInfo is not frozen, carries many Pydantic-internal fields, and creates a runtime dependency on Pydantic for users who do not otherwise need it. Trails' markers are a thin, dependency-free layer.

2 — Class Meta inner class (Django-style)

Declare constraints in a Meta inner class, à la Django model Meta. Rejected: it allows multi-field constraints naturally, but it is verbose for per-field rules and does not integrate with Python's Annotated[] type system (no IDE type narrowing, no get_type_hints introspection).

3 — Keep predicate() as the only path

The status quo. Rejected because it gates the SHACL tier behind RDF vocabulary knowledge, violating the spirit of ADR-0021's "one surface, additive features." GaC is the additive feature that makes level 3 accessible to developers without a semweb background.


References

  • ADR-0021: Progressive enhancement, not tiered surfaces
  • ADR-0002: Python-first shapes, emit SHACL
  • ADR-0025: Auto-ontology generation
  • ADR-0028: Schema migrations
  • Terraform documentation — Infrastructure as Code concept
  • Pulumi — "Cloud Infrastructure as Software"
  • W3C SHACL specification (W3C Recommendation, 2017)