ADR-0029: KG Test Primitives — Competency Questions to Assertions
- Status: Accepted (2026-04-19)
- Date: 2026-04-17
Context
Trails ships trails.testing with four helpers (isolated_kernel,
mock_llm, capture_events, fresh_context) that solve
infrastructure isolation — clean registries, deterministic LLM
responses, scoped observability. What is missing is graph-state
testing: asserting that the knowledge graph contains the right
triples, instances, and provenance after a capability runs.
Three forces converge to make this the right time:
- ISTQB-shaped test design. The project author holds ISTQB Test Manager and Test Analyst certifications. Structured test design (test conditions derived from requirements, traceability between conditions and test cases, coverage metrics) is non-negotiable. Knowledge graphs have a natural test-condition primitive that the semantic-web community already uses: competency questions (CQs). A CQ is a natural-language question the ontology must be able to answer; it maps directly to an ISTQB test condition.
- Research validates the pipeline. TESTaLOD (Blomqvist et al.) and the ISWC eXtreme Design methodology both show that KG developers want a CQ → SPARQL → assertion pipeline. Developers write competency questions during ontology design, translate them to SPARQL queries, then wrap those queries in executable tests. Today this pipeline is manual and ad hoc. A framework primitive makes it reproducible.
- "Rails for KG" without `rails test` is incomplete. Rails ships test generators, fixtures, assertions, and a test runner built on Minitest (RSpec is a common third-party alternative). Trails currently extends pytest with isolation helpers but provides no KG-specific assertions, no graph fixtures, no snapshot testing, and no CQ coverage reporting. Developers fall back to raw SPARQL in test bodies, the same anti-pattern ADR-0017 addressed in production code.
The existing trails.testing surface (Level 0) is a solid foundation.
This ADR adds four layers on top, following the progressive-enhancement
principle from ADR-0021: each layer is additive, and code at a lower
level keeps working unchanged.
Decision
Trails ships five test layers: the existing Level 0 plus the four added below. Each layer is additive, and no migration is required between layers. Everything extends pytest; there is no custom runner.
Layer 1: Graph assertions — trails.testing.assertions
A module of composable assertion functions that inspect the kernel
store through an existing Context. Every function raises
AssertionError with a KG-aware message (including the triple or
type that failed) so pytest's native reporting picks it up.
```python
from trails.testing import assert_graph

# ctx, x, s, p, o, and instance are supplied by the surrounding test.
assert_graph.has_type(ctx, "Patient")                      # type exists in store
assert_graph.has_instance(ctx, "Patient", id=x)            # instance with id exists
assert_graph.triple_exists(ctx, s, p, o)                   # specific triple present
assert assert_graph.count(ctx, "Patient") == 5             # instance count
assert_graph.field_equals(ctx, instance, "name", "Alice")  # field holds the expected value
assert_graph.shacl_valid(ctx)                              # full SHACL validation passes
assert_graph.no_orphans(ctx)                               # no dangling references
assert_graph.provenance_chain(ctx, instance)               # PROV-O chain intact
```
Design constraints:
- Functions accept a `Context`, not a raw store, consistent with the existing `fresh_context()` helper and every ORM entry point.
- Each function is a standalone callable, not a method on a class, so it composes in plain pytest without subclassing.
- Error messages include the SPARQL query that failed (when applicable) so developers can debug in a SPARQL console.
- `shacl_valid(ctx)` delegates to the kernel validator (ADR-0002); it does not reimplement validation in Python.
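As a rough illustration of these constraints, the sketch below shows a standalone assertion that takes a context and puts an equivalent SPARQL query in the failure message. `FakeContext` and its `triples` set are hypothetical stand-ins for the real `Context` and kernel store, and the message format is an assumption rather than the shipped behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class FakeContext:
    """Hypothetical stand-in for a Trails Context; holds triples in a set."""
    triples: set = field(default_factory=set)


def triple_exists(ctx, s, p, o):
    """Raise AssertionError, quoting an equivalent SPARQL ASK, if the triple is absent."""
    if (s, p, o) not in ctx.triples:
        query = f"ASK {{ <{s}> <{p}> <{o}> }}"
        raise AssertionError(
            f"triple not found: ({s}, {p}, {o})\nequivalent SPARQL: {query}"
        )


EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

ctx = FakeContext({(EX + "alice", RDF_TYPE, EX + "Patient")})
triple_exists(ctx, EX + "alice", RDF_TYPE, EX + "Patient")  # passes silently
```

Because the function is a plain callable that raises `AssertionError`, pytest reports a failure the same way it reports a failed `assert` statement, with no base class or custom runner involved.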
Layer 2: Competency questions — @competency_question decorator
A decorator that marks a test as implementing a competency question. The CQ text is metadata collected at test-discovery time and surfaced in reports.
```python
from trails.testing import competency_question

# Patient is the application's model class.
@competency_question("Which patients have more than 2 encounters?")
def test_frequent_patients(ctx):
    results = Patient.where(encounter_count__gt=2).fetch(ctx)
    assert len(results) > 0
    for p in results:
        assert p.encounter_count > 2
```
ISTQB mapping:
| ISTQB concept | Trails CQ equivalent |
|---|---|
| Test condition | Competency question text |
| Test case | Decorated test function |
| Test coverage | CQ coverage report (see CLI) |
| Test basis | Ontology + domain requirements |
Mechanics:
- The decorator stores the CQ string as `test_func._cq_text` and adds a pytest marker (`@pytest.mark.competency_question`).
- A pytest plugin (shipped in `trails.testing`) collects all marked tests and writes a coverage report at session end.
- `trails test --cq-report` outputs which CQs are covered, which are pending (declared but marked `@pytest.mark.skip`), and which have no test.
- CQs can also be declared without a test body (for planning): `@competency_question("...", pending=True)`.
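The mechanics above can be sketched without pytest. In this minimal version a module-level dictionary stands in for the pytest marker and plugin, and `cq_report` takes an explicit list of required CQs so that it has something to classify as untested. `CQ_REGISTRY`, `cq_report`, and the `required_cqs` argument are assumptions made for illustration, not part of the ADR.

```python
CQ_REGISTRY = {}  # CQ text -> {"test": function name, "pending": bool}


def competency_question(text, pending=False):
    """Record the CQ text on the test function and in a module-level registry."""
    def decorate(test_func):
        test_func._cq_text = text
        CQ_REGISTRY[text] = {"test": test_func.__name__, "pending": pending}
        return test_func
    return decorate


def cq_report(required_cqs):
    """Classify each required CQ as covered, pending, or untested."""
    report = {"covered": [], "pending": [], "untested": []}
    for cq in required_cqs:
        entry = CQ_REGISTRY.get(cq)
        if entry is None:
            report["untested"].append(cq)
        elif entry["pending"]:
            report["pending"].append(cq)
        else:
            report["covered"].append(cq)
    return report


@competency_question("Which patients have more than 2 encounters?")
def test_frequent_patients():
    pass  # test body elided in this sketch


@competency_question("Which encounters lack a diagnosis?", pending=True)
def test_undiagnosed_encounters():
    pass  # declared for planning, not yet implemented


report = cq_report([
    "Which patients have more than 2 encounters?",
    "Which encounters lack a diagnosis?",
    "Which patients have no encounters?",
])
```

The three buckets correspond to the covered, pending, and no-test categories that the coverage report distinguishes.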
Layer 3: Graph fixtures — trails.testing.fixtures
Reusable graph-state factories, analogous to Rails fixtures or pytest factories. Two forms: programmatic and file-based.
Programmatic fixtures:
```python
from trails.testing import graph_fixture

@graph_fixture("patients")
def sample_patients(ctx):
    p1 = Patient(name="Alice", age=30)
    p2 = Patient(name="Bob", age=45)
    ctx.kg.add(p1)
    ctx.kg.add(p2)
    return {"alice": p1, "bob": p2}

def test_something(ctx, patients):
    assert patients["alice"].name == "Alice"
```
File-based fixtures (TTL/N-Triples):
```python
from trails.testing import load_fixture

def test_with_ttl(ctx):
    load_fixture(ctx, "fixtures/patients.ttl")
    patients = Patient.where().fetch(ctx)
    assert len(patients) == 3
```
Isolation guarantee: every test gets a fresh graph context. The
graph_fixture decorator integrates with isolated_kernel to ensure
fixture data does not leak between tests. File-based fixtures are
loaded into the test's scoped store, not the singleton.
Discovery: trails test --fixtures lists all registered
@graph_fixture names with their docstrings.
Layer 4: Snapshot testing — trails.testing.snapshots
Compare the current graph state against a known-good serialization. Useful for migration testing and regression detection.
```python
from trails.testing import assert_graph_snapshot

def test_after_migration(ctx):
    run_migration(ctx)  # the project's own migration entry point
    assert_graph_snapshot(ctx, "after_migration.nt")  # N-Triples snapshot
```
Mechanics:
- Snapshots are stored as N-Triples files (deterministic sort order, blank-node-stable via Oxigraph's canonical serializer).
- On first run with no snapshot file, the test writes the snapshot and passes (like Jest's snapshot behaviour).
- On subsequent runs, the test compares current graph output against the stored snapshot. Diff is shown triple-by-triple.
- `trails test --snapshot-update` regenerates all snapshots from current state.
- Snapshot files live in `tests/__snapshots__/` by convention.
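The write-on-first-run and compare-afterwards behaviour can be sketched with the standard library alone. This version takes N-Triples lines directly instead of a context and ignores blank nodes entirely, so it does not stand in for the canonical serializer. The function name `assert_snapshot` and its `update` flag are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def assert_snapshot(lines, snapshot_path, update=False):
    """Compare N-Triples lines against a stored snapshot file, ignoring order."""
    current = sorted(set(lines))
    path = Path(snapshot_path)
    if update or not path.exists():
        # First run (or explicit update): write the snapshot and pass.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("\n".join(current) + "\n", encoding="utf-8")
        return
    stored = [ln for ln in path.read_text(encoding="utf-8").splitlines() if ln]
    missing = sorted(set(stored) - set(current))
    added = sorted(set(current) - set(stored))
    if missing or added:
        # One line per differing triple: "-" was in the snapshot, "+" is new.
        diff = [f"- {t}" for t in missing] + [f"+ {t}" for t in added]
        raise AssertionError("graph differs from snapshot:\n" + "\n".join(diff))


triples = [
    '<http://example.org/alice> <http://example.org/name> "Alice" .',
    '<http://example.org/bob> <http://example.org/name> "Bob" .',
]
with tempfile.TemporaryDirectory() as tmp:
    snap = Path(tmp) / "__snapshots__" / "after_migration.nt"
    assert_snapshot(triples, snap)            # first run writes the file
    assert_snapshot(reversed(triples), snap)  # same triples, different order
```

Sorting before writing keeps the stored file line-diffable, which is the main reason a line-based format such as N-Triples suits snapshots.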
CLI surface
All commands wrap pytest with Trails context — no custom test runner.
| Command | Effect |
|---|---|
| `trails test` | Run all tests (wraps pytest with the Trails plugin loaded) |
| `trails test --cq-report` | Competency question coverage report |
| `trails test --fixtures` | List registered graph fixtures |
| `trails test --snapshot-update` | Regenerate snapshot files |
| `trails test -k <pattern>` | Pass-through to pytest's `-k` filter |
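A thin wrapper of this kind could translate the Trails flags and forward everything else to pytest untouched. The sketch below only builds the argument list and does not invoke pytest. The plugin module name `trails.testing.plugin` and the `--trails-*` option names are hypothetical; `-p` is pytest's existing flag for loading a plugin by module name.

```python
import argparse


def build_pytest_args(argv):
    """Translate `trails test` flags into a pytest argument list."""
    parser = argparse.ArgumentParser(prog="trails test")
    parser.add_argument("--cq-report", action="store_true")
    parser.add_argument("--fixtures", action="store_true")
    parser.add_argument("--snapshot-update", action="store_true")
    # Anything argparse does not recognise (such as -k <pattern>) is kept
    # in order and forwarded to pytest as-is.
    known, passthrough = parser.parse_known_args(argv)

    args = ["-p", "trails.testing.plugin"]  # hypothetical plugin module
    if known.cq_report:
        args.append("--trails-cq-report")  # hypothetical plugin option
    if known.fixtures:
        # Renamed on the way through because pytest already defines its own
        # --fixtures flag for listing pytest fixtures.
        args.append("--trails-fixtures")  # hypothetical plugin option
    if known.snapshot_update:
        args.append("--trails-snapshot-update")  # hypothetical plugin option
    return args + passthrough


pytest_args = build_pytest_args(["--cq-report", "-k", "patient"])
```

The resulting list could then be handed to `pytest.main`, which keeps discovery, execution, and reporting inside pytest.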
Progressive enhancement levels
| Level | What | Requires |
|---|---|---|
| 0 | Plain pytest + `isolated_kernel` | `trails.testing` (today) |
| 1 | Graph assertions | trails.testing.assertions |
| 2 | Competency questions | @competency_question decorator |
| 3 | Graph fixtures + snapshots | trails.testing.fixtures, trails.testing.snapshots |
| 4 | Coverage reports | trails test --cq-report |
Each level is additive. A project using Level 0 today gains Level 1 by
importing assert_graph — no migration, no config change.
Consequences
Positive
- Structured test design for KGs. Competency questions bridge the gap between ontology design and test automation — the same gap that TESTaLOD and XD methodology identified.
- ISTQB alignment. Test conditions (CQs), test cases (decorated functions), and coverage metrics (CQ report) map directly to ISTQB Foundation concepts. Test managers can read the CQ report without understanding SPARQL.
- Rails parity. `trails test` becomes as natural as `rails test`. Fixtures, assertions, and a test runner that "just works" lower the barrier for developers new to KGs.
- Progressive. ADR-0021 compliance: each layer is additive; no existing test code breaks.
- Debuggable. Graph assertions include the failing SPARQL query in the error message. Snapshot diffs are triple-by-triple. CQ reports show exactly which domain questions are untested.
Negative
- Surface area. Four additions (the `assertions`, `fixtures`, and `snapshots` submodules plus the CQ decorator) increase the API surface of `trails.testing`. Mitigation: each addition is opt-in; the existing helpers remain the default entry point.
- Snapshot maintenance. N-Triples snapshots can be noisy when blank node identifiers change. Mitigation: use Oxigraph's canonical serializer for deterministic blank-node IDs; provide `--snapshot-update` for intentional changes.
- CQ report is only as good as CQ coverage. If developers don't write CQ-annotated tests, the report is empty. Mitigation: `trails new` generators scaffold CQ-annotated tests by default; docs emphasize CQs as the starting point for test design.
Neutral
- Does not replace pytest. All primitives are pytest-native (markers, fixtures, plain assertions). Developers who prefer raw pytest can ignore the CQ layer entirely.
- Does not introduce property-based testing. Projects that want property-based KG testing can use Hypothesis directly with `fresh_context()`; no framework support is needed.
Non-goals
- UI test runner. No browser-based test dashboard. The CLI and pytest's existing reporters are sufficient.
- Property-based testing. Use Hypothesis directly with `fresh_context()`. A Hypothesis strategy for graph generation is a future ADR candidate, not part of this one.
- Custom test runner. `trails test` wraps pytest. It does not implement test discovery, execution, or reporting from scratch.
- SHACL test suites. The W3C SHACL test suite is a conformance tool for validators, not an app-level test primitive. `shacl_valid()` delegates to the kernel validator; it does not run the W3C suite.
Alternatives considered
- Ship only graph assertions (no CQ layer). Rejected. Assertions without CQ metadata lose the structured-test-design story. The CQ decorator is cheap to implement and high-value for traceability.
- Build a custom test runner instead of extending pytest. Rejected. pytest is the Python standard. Fighting it adds maintenance burden and confuses developers who already know pytest.
- Use RDFUnit or similar existing KG test frameworks. RDFUnit (Kontokostas et al.) is a Java framework for SPARQL-based test cases. Its approach (test patterns as RDF resources) is powerful but alien to Python developers. Trails wraps the same concept in Python-native decorators and assertions. The CQ report can export to RDFUnit format as a future extension.
- Defer to "when we have real users." Rejected. Testing primitives shape how developers think about KG quality from day one. Shipping them late means retrofitting test culture; shipping them early means test culture grows with the framework.
Relationship to other ADRs
| ADR | Impact |
|---|---|
| ADR-0001 (Rust kernel + Python surface) | Graph assertions delegate to kernel store via Context. No new FFI surface needed. |
| ADR-0002 (Python-first shapes) | shacl_valid() delegates to the existing SHACL validator. |
| ADR-0009 (Provenance always on) | provenance_chain() assertion verifies PROV-O integrity per ADR-0009's always-on guarantee. |
| ADR-0017 (ActiveGraph ORM) | CQ tests use ORM queries (Patient.where(...)) instead of raw SPARQL — validates ADR-0017's DX promise. |
| ADR-0021 (Progressive enhancement) | Test layers follow the same additive pattern: each layer adds capability without requiring migration. |
Open questions
- Should `@competency_question` support CQ identifiers (e.g., `@competency_question("CQ-01", "Which patients...")`) for traceability to external requirements documents? Recommendation: yes, via an optional `id=` parameter.
- Should snapshot testing support TTL format in addition to N-Triples? Recommendation: N-Triples only initially (deterministic, line-diffable). TTL as a future enhancement if users request it.
- Should the CQ report export to machine-readable formats (JSON, CSV)? Recommendation: yes, `--cq-report --format json` for CI integration. Plain text default for humans.