ADR-0079: Graph-as-Code (GaC) — Python-native and YAML Ontology Declarations
- Status: Accepted (2026-05-26)
- Date: 2026-05-26
- Extends: ADR-0002 (Python-first shapes), ADR-0021 (Progressive enhancement)
- Relates to: ADR-0025 (Auto-ontology generation), ADR-0028 (Schema migrations)
- Supersedes: —
- Superseded by: —
Context
ADR-0021 defines the progressive enhancement ladder: level 1 (bare labels), level 2 (@app.model), level 3 (@shape), and level 4 (OWL).
ADR-0002 established that shapes are authored Python-first and compiled to
SHACL. Both decisions are correct and remain unchanged. However, a friction
gap persists at the @shape level and above.
The current pain point: reaching the SHACL tier still requires RDF
vocabulary knowledge. Developers must know predicate(), sh:minCount,
sh:minLength, sh:in, and so on before they can do something as mundane as
"this field is required" or "age must be between 0 and 150." The escape to raw
SHACL predicates was intentional in ADR-0002 (full expressiveness), but it
blocks everyday usage.
A parallel exists in infrastructure tooling: before Terraform and Pulumi, provisioning infrastructure meant clicking through a cloud console or hand-authoring YAML manifests. Infrastructure-as-Code removed that barrier by letting engineers describe resources in a high-level language that compiled to the underlying primitives. The same move is overdue for knowledge-graph constraint authoring.
Infrastructure-as-Code moment for KGs: Graph-as-Code (GaC) is the principle that ontology declarations, shape constraints, and cross-property business rules live as ordinary Python — no Turtle files, no SHACL vocabulary, no separate toolchain — and compile down to the same runtime structures the framework already uses.
This ADR records the decision to introduce a GaC annotation layer as a
first-class part of the @app.model surface.
Decision
Introduce trails.gac with two declaration surfaces that compile to the
same PredicateInfo / ShapeMeta runtime representation:
- Python surface: Annotated[] constraint markers on class fields; @app.model reads them automatically.
- YAML surface: a declarative models.yaml file loaded via app.load_models(path) or load_yaml_models(path, app); no Python class required. Intended for data-engineering workflows (dbt-style pipelines, generated schemas, non-Python tooling).
New API

    from typing import Annotated

    from trails.gac import required, optional, min_length, max_length
    from trails.gac import min_value, max_value, pattern, one_of, unique
    from trails.gac import constraint, require

    # app is an existing Trails application instance.

    @app.model  # bare decorator: reads class annotations
    class Person:
        name: Annotated[str, required(), min_length(1)]
        age: Annotated[int, optional(), min_value(0), max_value(150)]
        email: Annotated[str, pattern(r".+@.+")]
        role: Annotated[str, one_of("admin", "user", "guest")]

    @app.model("Employee")  # explicit IRI name still works
    class Employee(Person):
        salary: Annotated[float, optional(), min_value(0)]
        contract: Annotated[str, optional()]

    # Cross-property rule registered against Employee nodes.
    @constraint(Employee)
    def salary_requires_contract(node):
        if node.salary is not None:
            require(node.contract is not None, "salary implies contract")

YAML surface (new)

    # models.yaml
    models:
      - name: Word
        fields:
          writtenForm: {type: str, required: true, min_length: 1}
          language: {type: str, optional: true, one_of: [de, en, fr]}
          quality: {type: str, one_of: [high, benchmark, detected]}
      - name: DriftEvent
        fields:
          year: {type: int, required: true, min_value: 1000}
          confidence: {type: float, optional: true, min_value: 0, max_value: 1}

Requires PyYAML (pip install pyyaml). Each field key maps to the same
constraint markers as the Python surface (see Constraint → SHACL Mapping
table). Types: str / string, int / integer, float / number,
bool / boolean.
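
As an illustration of the load-time translation, the sketch below turns one already-parsed field mapping (the dict PyYAML would produce for an entry under fields) into a Python type plus a list of constraint pairs. The names parse_field_spec, TYPE_ALIASES, and KNOWN_KEYS are hypothetical and not part of trails.gac; the handling of false-valued flags and unknown keys is also an assumption rather than documented behaviour.

```python
# Type aliases listed in this ADR; the lookup table itself is an assumption.
TYPE_ALIASES = {
    "str": str, "string": str,
    "int": int, "integer": int,
    "float": float, "number": float,
    "bool": bool, "boolean": bool,
}

# Constraint keys that mirror the Python markers named in this ADR.
KNOWN_KEYS = {
    "required", "optional", "min_length", "max_length",
    "min_value", "max_value", "pattern", "one_of", "unique",
}


def parse_field_spec(name, spec):
    """Turn one parsed YAML field mapping into (python_type, constraints)."""
    spec = dict(spec)  # copy so the caller's mapping is not mutated
    if "type" not in spec:
        raise ValueError(f"{name}: missing 'type'")
    type_name = spec.pop("type")
    if type_name not in TYPE_ALIASES:
        raise ValueError(f"{name}: unknown type {type_name!r}")
    unknown = set(spec) - KNOWN_KEYS
    if unknown:
        raise ValueError(f"{name}: unknown constraint keys {sorted(unknown)}")
    # Assumption: a flag explicitly set to false (e.g. `required: false`)
    # contributes no constraint. `is not False` keeps numeric 0 values.
    constraints = [(key, value) for key, value in spec.items() if value is not False]
    return TYPE_ALIASES[type_name], constraints


# The `confidence` field from the DriftEvent model above, as parsed YAML.
confidence_type, confidence_constraints = parse_field_spec(
    "confidence",
    {"type": "float", "optional": True, "min_value": 0, "max_value": 1},
)
```

Rejecting unknown keys at this point is one way to compensate for the absence of IDE or type-checker support on the YAML surface.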
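Old API — unchanged
All three surfaces coexist: the legacy fields= keyword, Annotated[] class
bodies, and YAML files. The @app.model decorator takes the legacy path when
fields= is passed and the Python GaC path when the class body contains
Annotated[] fields; YAML models enter through app.load_models.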
Technical Implementation
trails/gac.py
Each constraint is a frozen dataclass marker. Being frozen makes the markers hashable, equality-comparable, and safely storable in sets or as dict keys.
_Required → sh:minCount 1
_Optional → sh:minCount 0
_MinLength(n) → sh:minLength n
_MaxLength(n) → sh:maxLength n
_MinValue(v) → sh:minInclusive v
_MaxValue(v) → sh:maxInclusive v
_Pattern(r) → sh:pattern r
_OneOf(*vs) → sh:in (v1 v2 …)
_Unique → ORM unique list (no SHACL PropertyShape)
Public factory functions (required(), min_length(n), …) return the
corresponding frozen dataclass instances.
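
A minimal sketch of the marker pattern, assuming each marker holds only its argument. The class bodies and field names (n, values) are illustrative guesses; only the names _MinLength, _OneOf, min_length, and one_of come from this ADR.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class _MinLength:
    n: int  # field name is an assumption


@dataclass(frozen=True)
class _OneOf:
    values: tuple  # stored as a tuple so the marker stays hashable


def min_length(n):
    """Public factory: returns the frozen marker instance."""
    return _MinLength(n)


def one_of(*values):
    return _OneOf(tuple(values))


# Equal arguments give equal, hashable markers, so a set deduplicates them.
distinct = {min_length(1), min_length(1), min_length(2)}

# Frozen dataclasses reject attribute assignment after construction.
marker = min_length(1)
try:
    marker.n = 5
except FrozenInstanceError:
    mutation_rejected = True
else:
    mutation_rejected = False
```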
constraint(cls) returns a decorator that registers a Python validator
function against the node type cls. require(cond, msg) raises ConstraintError
when cond is False. run_constraints(node) executes all registered
validators for the node's type.
_extract_gac_annotations(cls)
Iterates cls.__annotations__, unpacks each Annotated[T, *markers] into (T,
[markers]), separates plain-type fields from annotated fields, and returns the
annotated fields with their markers for _register_gac_shape to consume.
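
The unpacking step can be sketched with the standard typing helpers. The function name extract_annotated_fields and its return shape are assumptions, and tuples stand in for the real marker objects. Note one deliberate difference from the description above: typing.get_type_hints also resolves string annotations and includes inherited fields, whereas iterating cls.__annotations__ sees only the class's own.

```python
from typing import Annotated, get_args, get_origin, get_type_hints


def extract_annotated_fields(cls):
    """Return {field_name: (base_type, [markers])} for Annotated fields only."""
    # include_extras=True keeps the Annotated metadata instead of stripping it.
    hints = get_type_hints(cls, include_extras=True)
    result = {}
    for name, hint in hints.items():
        if get_origin(hint) is Annotated:
            # get_args yields the base type first, then the metadata items.
            base, *markers = get_args(hint)
            result[name] = (base, markers)
    return result


class Person:
    # Tuples stand in for trails.gac marker instances in this sketch.
    name: Annotated[str, ("required",), ("min_length", 1)]
    nickname: str  # plain-type field: carries no markers and is skipped


fields = extract_annotated_fields(Person)
```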
_register_gac_shape(node_type_name, constraints_dict)
Converts each (field_name, [markers]) pair to a PredicateInfo object using
the same logic as the explicit predicate() descriptor path. Collects them
into a ShapeMeta and inserts it directly into the existing _SHAPES and
_SHAPES_BY_NODE_TYPE registries. No new registry; no parallel code path for
downstream consumers.
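
To make the "one object, two registries" idea concrete, here is a heavily simplified sketch. Everything in it is hypothetical: PredicateInfoSketch and ShapeMetaSketch are stand-ins whose fields are invented for illustration, and the registry keys (a shape name built from the node type, and the node type itself) are assumptions about how _SHAPES and _SHAPES_BY_NODE_TYPE might be indexed.

```python
from dataclasses import dataclass, field


@dataclass
class PredicateInfoSketch:  # stand-in, not the real PredicateInfo
    name: str
    markers: tuple


@dataclass
class ShapeMetaSketch:  # stand-in, not the real ShapeMeta
    node_type: str
    predicates: list = field(default_factory=list)


# Stand-ins for the existing registries; the key choices are assumptions.
SHAPES = {}
SHAPES_BY_NODE_TYPE = {}


def register_shape(node_type_name, constraints):
    """Build one shape object and expose it through both registries."""
    predicates = [
        PredicateInfoSketch(name, tuple(markers))
        for name, markers in constraints.items()
    ]
    shape = ShapeMetaSketch(node_type_name, predicates)
    SHAPES[f"{node_type_name}Shape"] = shape
    SHAPES_BY_NODE_TYPE[node_type_name] = shape
    return shape


shape = register_shape("Person", {"name": ["required"], "age": ["optional"]})
```

Since both registries hold the same object, downstream consumers see GaC shapes exactly as they see shapes declared through predicate().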
app.model() extension
model() gains two changes:
- Bare decorator support: @app.model (no call) is detected by checking whether the argument is a class; the decorator wraps itself and re-applies.
- After node_type registration, if _extract_gac_annotations finds any Annotated[] fields, _register_gac_shape is called before the decorated class is returned.
Constraint → SHACL Mapping

| Python marker | SHACL / ORM equivalent |
|---|---|
| required() | sh:minCount 1 |
| optional() | sh:minCount 0 |
| min_length(n) | sh:minLength n |
| max_length(n) | sh:maxLength n |
| min_value(v) | sh:minInclusive v |
| max_value(v) | sh:maxInclusive v |
| pattern(r) | sh:pattern r |
| one_of(*vs) | sh:in (v1 v2 …) |
| unique() | ORM unique list (not a SHACL PropertyShape) |

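
The table can be read as a lookup from marker to SHACL term. The sketch below applies it to (name, value) pairs standing in for marker objects; to_shacl, FIXED, and PARAMETRIC are hypothetical names, and the output format (a list of term/value pairs plus a separate unique flag) is an assumption chosen for illustration.

```python
# Markers without an argument map to a fixed (term, value) pair.
FIXED = {
    "required": ("sh:minCount", 1),
    "optional": ("sh:minCount", 0),
}

# Markers with one argument map to a SHACL term that takes that argument.
PARAMETRIC = {
    "min_length": "sh:minLength",
    "max_length": "sh:maxLength",
    "min_value": "sh:minInclusive",
    "max_value": "sh:maxInclusive",
    "pattern": "sh:pattern",
    "one_of": "sh:in",
}


def to_shacl(markers):
    """Split (name, value) markers into SHACL pairs and an ORM-level unique flag."""
    shacl, unique = [], False
    for name, value in markers:
        if name == "unique":
            unique = True  # handled by the ORM; no SHACL pair is emitted
        elif name in FIXED:
            shacl.append(FIXED[name])
        elif name in PARAMETRIC:
            shacl.append((PARAMETRIC[name], value))
        else:
            raise ValueError(f"no SHACL mapping for marker {name!r}")
    return shacl, unique


# The `age` field of Person: optional(), min_value(0), max_value(150).
age_shacl, age_unique = to_shacl(
    [("optional", None), ("min_value", 0), ("max_value", 150)]
)
```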
Consequences
Positive
- No Turtle/SHACL knowledge required for the common 90 %+ of constraint patterns. Developers stay in Python throughout.
- Full backward compatibility. The fields= keyword path, explicit predicate() descriptors, and raw @shape are all unchanged. Nothing breaks; teams can migrate field-by-field.
- Inspectable markers. Frozen dataclasses are hashable and serialisable. Tooling can enumerate the constraints on a class without running the decorator.
- Debuggable cross-property rules. @constraint validators are plain Python functions: they show up in tracebacks, accept breakpoints, and can be unit-tested in isolation with a mock node object.
- One runtime representation. GaC compiles to the same ShapeMeta / PredicateInfo structures consumed by SHACL export (trails onto export), federation schema negotiation, and SHACL validation. No GaC-specific validation path; correctness is inherited.
- Progressive enhancement preserved. GaC sits at level 2 (@app.model) and part of level 3 (@shape). Level 1 (bare labels) and level 4 (OWL) are unaffected. Users who never annotate a field never encounter trails.gac.
Negative / Limitations
- Complex class expressions (sh:or, sh:xone over multiple properties, sh:node nesting) are not addressable from single-field markers. The predicate() escape hatch remains the path for those cases.
- OWL axiom compilation (Meta inner class, owl:symmetric, owl:transitive, domain/range restrictions) is deliberately out of scope for this ADR. It is a natural next step and may be tracked as ADR-0080.
- SPARQL-based SHACL rules (sh:SPARQLConstraint) cannot be expressed from Python annotations; @constraint covers the equivalent Python-side, but the SPARQL form requires explicit Turtle.
- Marker proliferation risk. Each new constraint type requires a new dataclass, factory function, and mapping entry. A governance rule is needed: markers are added only when the SHACL equivalent is unambiguous and the Python form is materially simpler. Niche SHACL vocabulary stays in the predicate() layer.
- YAML surface: no compile-time checking. YAML field types and constraint values are validated only at load time (load_yaml_models), not by a type checker or IDE. The Python surface is preferred when type safety matters; YAML is preferred for generated or non-Python-authored schemas.
- YAML surface: no @constraint cross-property validators. The @constraint decorator is Python-only and has no YAML equivalent. YAML models that need cross-property rules must add them in Python after calling app.load_models.
Alternatives Considered
1 — Pydantic-style Field() reuse
Reuse pydantic.Field as the constraint carrier. Rejected: Pydantic's
FieldInfo is not frozen, carries many Pydantic-internal fields, and creates
a runtime dependency on Pydantic for users who do not otherwise need it.
Trails' markers are a thin, dependency-free layer.
2 — Class Meta inner class (Django-style)
Declare constraints in a Meta inner class, à la Django model Meta. Rejected:
this allows multi-field constraints naturally, but it is verbose for per-field
rules and does not integrate with Python's Annotated[] type system (no IDE
type narrowing, no get_type_hints introspection).
3 — Keep predicate() as the only path
The status quo. Rejected because it gates the SHACL tier behind RDF vocabulary knowledge, violating the spirit of ADR-0021's "one surface, additive features." GaC is the additive feature that makes level 3 accessible to developers without a semweb background.
References
- ADR-0021: Progressive enhancement, not tiered surfaces
- ADR-0002: Python-first shapes, emit SHACL
- ADR-0025: Auto-ontology generation
- ADR-0028: Schema migrations
- Terraform documentation — Infrastructure as Code concept
- Pulumi — "Cloud Infrastructure as Software"
- W3C SHACL specification (W3C Recommendation, 2017)