ADR-0029: KG Test Primitives — Competency Questions to Assertions

Status: Accepted (2026-04-19)
Date: 2026-04-17

Context

Trails ships trails.testing with four helpers (isolated_kernel, mock_llm, capture_events, fresh_context) that solve infrastructure isolation — clean registries, deterministic LLM responses, scoped observability. What is missing is graph-state testing: asserting that the knowledge graph contains the right triples, instances, and provenance after a capability runs.

Three forces converge to make this the right time:

ISTQB-shaped test design. The project author holds ISTQB Test Manager and Test Analyst certifications. Structured test design — test conditions derived from requirements, traceability between conditions and test cases, coverage metrics — is non-negotiable. Knowledge graphs have a natural test-condition primitive that the semantic-web community already uses: competency questions (CQs). A CQ is a natural-language question the ontology must be able to answer; it maps directly to an ISTQB test condition.
Research validates the pipeline. TESTaLOD (Blomqvist et al.) and the ISWC eXtreme Design methodology both show that KG developers want a CQ → SPARQL → assertion pipeline. Developers write competency questions during ontology design, then translate them to SPARQL queries, then wrap those queries in executable tests. Today this pipeline is manual and ad-hoc. A framework primitive makes it reproducible.
"Rails for KG" without rails test is incomplete. Rails ships test generators, fixtures, assertions, and a test runner that wraps Minitest/RSpec. Trails currently extends pytest with isolation helpers but provides no KG-specific assertions, no graph fixtures, no snapshot testing, and no CQ coverage reporting. Developers fall back to raw SPARQL in test bodies — the same anti-pattern ADR-0017 addressed in production code.

The existing trails.testing surface (Level 0) is a solid foundation. This ADR adds four layers on top, following the progressive-enhancement principle from ADR-0021: each layer is additive, and code at a lower level keeps working unchanged.

Decision

Trails ships five test layers, each additive: the existing Level 0 helpers plus the four new layers described below. No migration is required between layers. Everything extends pytest; there is no custom runner.

Layer 1: Graph assertions — `trails.testing.assertions`

A module of composable assertion functions that inspect the kernel store through an existing Context. Every function raises AssertionError with a KG-aware message (including the triple or type that failed) so pytest's native reporting picks it up.

from trails.testing import assert_graph

assert_graph.has_type(ctx, "Patient")              # type exists in store
assert_graph.has_instance(ctx, "Patient", id=x)    # instance with id exists
assert_graph.triple_exists(ctx, s, p, o)           # specific triple present
assert assert_graph.count(ctx, "Patient") == 5     # instance count (returns an int, so it needs an explicit assert)
assert_graph.field_equals(ctx, instance, "name", "Alice")
assert_graph.shacl_valid(ctx)                      # full SHACL validation passes
assert_graph.no_orphans(ctx)                       # no dangling references
assert_graph.provenance_chain(ctx, instance)       # PROV-O chain intact

Design constraints:

Functions accept a Context, not a raw store — consistent with the existing fresh_context() helper and every ORM entry point.
Each function is a standalone callable, not a method on a class — composable in plain pytest without subclassing.
Error messages include the SPARQL query that failed (when applicable) so developers can debug in a SPARQL console.
shacl_valid(ctx) delegates to the kernel validator (ADR-0002); does not reimplement validation in Python.
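The design constraints above can be sketched as follows. This is a minimal illustration, not the Trails implementation: `StubContext` is a hypothetical stand-in for a real `Context`, the store is a plain Python set of triples, and reading `has_type` as "at least one instance of the type exists" is an assumption. The point is the shape: standalone functions that take a context and raise `AssertionError` with a message carrying an equivalent SPARQL query.

```python
from dataclasses import dataclass, field

RDF_TYPE = "rdf:type"


@dataclass
class StubContext:
    """Hypothetical stand-in for a Trails Context; holds triples in a set."""
    triples: set = field(default_factory=set)


def has_type(ctx, type_name):
    """Raise AssertionError unless some subject is typed as type_name."""
    # Equivalent query, included in the message for debugging in a console.
    query = f"ASK {{ ?s rdf:type :{type_name} }}"
    found = any(p == RDF_TYPE and o == type_name for _, p, o in ctx.triples)
    if not found:
        raise AssertionError(
            f"no instance of type {type_name!r} in store\n  SPARQL: {query}"
        )


def triple_exists(ctx, s, p, o):
    """Raise AssertionError unless the exact triple (s, p, o) is present."""
    query = f"ASK {{ {s} {p} {o} }}"
    if (s, p, o) not in ctx.triples:
        raise AssertionError(
            f"triple not found: ({s}, {p}, {o})\n  SPARQL: {query}"
        )


ctx = StubContext({("ex:alice", RDF_TYPE, "Patient")})
has_type(ctx, "Patient")                            # passes silently
triple_exists(ctx, "ex:alice", RDF_TYPE, "Patient")  # passes silently
```

Because each helper is a plain function, a failing call surfaces through pytest's normal `AssertionError` reporting with no base class or plugin involved.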

Layer 2: Competency questions — `@competency_question` decorator

A decorator that marks a test as implementing a competency question. The CQ text is metadata collected at test-discovery time and surfaced in reports.

from trails.testing import competency_question

@competency_question("Which patients have more than 2 encounters?")
def test_frequent_patients(ctx):
    results = Patient.where(encounter_count__gt=2).fetch(ctx)
    assert len(results) > 0
    for p in results:
        assert p.encounter_count > 2

ISTQB mapping:

| ISTQB concept | Trails CQ equivalent |
| --- | --- |
| Test condition | Competency question text |
| Test case | Decorated test function |
| Test coverage | CQ coverage report (see CLI) |
| Test basis | Ontology + domain requirements |

Mechanics:

The decorator stores the CQ string as test_func._cq_text and adds a pytest marker (@pytest.mark.competency_question).
A pytest plugin (shipped in trails.testing) gathers all marked tests during collection and writes a coverage report at session end.
trails test --cq-report outputs which CQs are covered, which are pending (declared with pending=True or marked @pytest.mark.skip), and which have no test.
CQs can also be declared without a test body (for planning): @competency_question("...", pending=True).
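The mechanics above can be sketched without pytest. This is a simplified illustration under stated assumptions: the module-level `_CQ_REGISTRY`, the `_cq_pending` attribute, the `declared` catalogue of CQs, and the status labels are all hypothetical, and the pytest marker that the real decorator would add is omitted to keep the sketch dependency-free.

```python
_CQ_REGISTRY = []  # hypothetical registry of (cq_text, test_name, pending)


def competency_question(text, pending=False):
    """Attach CQ metadata to a test function and record it."""
    def decorate(func):
        func._cq_text = text          # attribute name taken from the ADR
        func._cq_pending = pending    # hypothetical attribute
        _CQ_REGISTRY.append((text, func.__name__, pending))
        return func
    return decorate


def cq_report(declared, registry):
    """Classify each CQ as 'covered', 'pending', or 'untested'."""
    status = {text: "untested" for text in declared}
    for text, _test_name, pending in registry:
        if not pending:
            status[text] = "covered"
        elif status.get(text) != "covered":
            # A pending declaration never downgrades a covered CQ.
            status[text] = "pending"
    return status


@competency_question("Which patients have more than 2 encounters?")
def test_frequent_patients():
    pass


@competency_question("Which encounters lack a diagnosis?", pending=True)
def test_missing_diagnosis():
    pass


# Assumption: the project keeps a catalogue of CQs from ontology design,
# which is what lets the report name CQs that have no test at all.
declared = [
    "Which patients have more than 2 encounters?",
    "Which encounters lack a diagnosis?",
    "Which practitioners treated a given patient?",
]
report = cq_report(declared, _CQ_REGISTRY)
for question, state in report.items():
    print(f"{state:9} {question}")
```

Reporting a CQ as having no test presupposes a source of declared CQs outside the test suite; the sketch models that as a plain list.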

Layer 3: Graph fixtures — `trails.testing.fixtures`

Reusable graph-state factories, analogous to Rails fixtures or pytest factories. Two forms: programmatic and file-based.

Programmatic fixtures:

from trails.testing import graph_fixture

@graph_fixture("patients")
def sample_patients(ctx):
    p1 = Patient(name="Alice", age=30)
    p2 = Patient(name="Bob", age=45)
    ctx.kg.add(p1); ctx.kg.add(p2)
    return {"alice": p1, "bob": p2}

def test_something(ctx, patients):
    assert patients["alice"].name == "Alice"

File-based fixtures (TTL/N-Triples):

from trails.testing import load_fixture

def test_with_ttl(ctx):
    load_fixture(ctx, "fixtures/patients.ttl")
    patients = Patient.where().fetch(ctx)
    assert len(patients) == 3

Isolation guarantee: every test gets a fresh graph context. The graph_fixture decorator integrates with isolated_kernel to ensure fixture data does not leak between tests. File-based fixtures are loaded into the test's scoped store, not the singleton.

Discovery: trails test --fixtures lists all registered @graph_fixture names with their docstrings.
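One way to picture the registration and discovery behaviour is a name-to-factory registry. The sketch below is an assumption-laden illustration, not the Trails implementation: `_FIXTURES`, `list_fixtures`, `build_fixture`, and `StubContext` are hypothetical names, and a real implementation would presumably register with pytest and `isolated_kernel` rather than a module-level dict.

```python
import inspect

_FIXTURES = {}  # hypothetical registry: fixture name -> factory function


def graph_fixture(name):
    """Register a graph-state factory under a unique name."""
    def decorate(factory):
        if name in _FIXTURES:
            raise ValueError(f"graph fixture {name!r} is already registered")
        _FIXTURES[name] = factory
        return factory
    return decorate


def list_fixtures():
    """Return (name, first docstring line) pairs, sorted by name."""
    rows = []
    for name in sorted(_FIXTURES):
        doc = inspect.getdoc(_FIXTURES[name]) or ""
        rows.append((name, doc.splitlines()[0] if doc else "(no docstring)"))
    return rows


def build_fixture(name, ctx):
    """Run the named factory against the context supplied by the caller."""
    return _FIXTURES[name](ctx)


class StubContext:
    """Hypothetical stand-in for a fresh, test-scoped graph context."""
    def __init__(self):
        self.nodes = []


@graph_fixture("patients")
def sample_patients(ctx):
    """Two sample patients, Alice and Bob."""
    ctx.nodes.extend(["alice", "bob"])
    return {"alice": "alice", "bob": "bob"}


# Each test builds the fixture into its own context, so nothing is shared.
first, second = StubContext(), StubContext()
build_fixture("patients", first)
build_fixture("patients", second)
print(list_fixtures())
```

Passing the context into the factory on every call, instead of caching the built data, is what keeps fixture state from leaking between tests in this sketch.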

Layer 4: Snapshot testing — `trails.testing.snapshots`

Compare the current graph state against a known-good serialization. Useful for migration testing and regression detection.

from trails.testing import assert_graph_snapshot

def test_after_migration(ctx):
    run_migration(ctx)
    assert_graph_snapshot(ctx, "after_migration.nt")  # N-Triples snapshot

Mechanics:

Snapshots are stored as N-Triples files (deterministic sort order, blank-node-stable via Oxigraph's canonical serializer).
On first run with no snapshot file, the test writes the snapshot and passes (like Jest's snapshot behaviour).
On subsequent runs, the test compares current graph output against the stored snapshot. Diff is shown triple-by-triple.
trails test --snapshot-update regenerates all snapshots from current state.
Snapshot files live in tests/__snapshots__/ by convention.
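The write-on-first-run and compare-afterwards behaviour can be sketched with the standard library alone. This is an illustration under assumptions: the function here takes an iterable of N-Triples lines instead of a `Context`, the `update` parameter stands in for `--snapshot-update`, and the input is assumed to already carry stable blank-node labels (the ADR relies on the serializer for that).

```python
import tempfile
from pathlib import Path


def assert_graph_snapshot(ntriples_lines, snapshot_path, update=False):
    """Compare N-Triples lines with a stored snapshot, writing it if absent."""
    path = Path(snapshot_path)
    # Sort and de-duplicate so triple order never affects the comparison.
    current = sorted({line.strip() for line in ntriples_lines if line.strip()})
    if update or not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("\n".join(current) + "\n", encoding="utf-8")
        return
    stored = {
        line.strip()
        for line in path.read_text(encoding="utf-8").splitlines()
        if line.strip()
    }
    missing = sorted(stored - set(current))   # in snapshot, not in graph
    added = sorted(set(current) - stored)     # in graph, not in snapshot
    if missing or added:
        report = [f"graph differs from snapshot {path}"]
        report += [f"  - {triple}" for triple in missing]
        report += [f"  + {triple}" for triple in added]
        raise AssertionError("\n".join(report))


graph = [
    '<http://example.org/alice> <http://example.org/name> "Alice" .',
    '<http://example.org/bob> <http://example.org/name> "Bob" .',
]
with tempfile.TemporaryDirectory() as tmp:
    snap = Path(tmp) / "__snapshots__" / "after_migration.nt"
    assert_graph_snapshot(graph, snap)                  # first run: writes
    assert_graph_snapshot(list(reversed(graph)), snap)  # same triples: passes
    try:
        assert_graph_snapshot(graph[:1], snap)          # Bob is missing
    except AssertionError as exc:
        diff_message = str(exc)
```

Comparing sets of lines gives a triple-by-triple diff directly, which is one reason a line-oriented format such as N-Triples suits snapshots.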

CLI surface

All commands wrap pytest with Trails context — no custom test runner.

| Command | Effect |
| --- | --- |
| `trails test` | Run all tests (wraps `pytest` with trails plugin loaded) |
| `trails test --cq-report` | Competency question coverage report |
| `trails test --fixtures` | List registered graph fixtures |
| `trails test --snapshot-update` | Regenerate snapshot files |
| `trails test -k <pattern>` | Pass-through to pytest `-k` filter |
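A thin wrapper of this kind could translate the Trails flags into pytest arguments and forward everything else untouched. The sketch below only builds the argument list and does not invoke pytest. The plugin module path `trails.testing.plugin` and the `--trails-*` option names are hypothetical placeholders; only pytest's `-p` (early plugin loading) and `-k` options are real.

```python
import argparse


def build_pytest_args(argv):
    """Map `trails test` flags to a pytest argument list."""
    parser = argparse.ArgumentParser(
        prog="trails test",
        add_help=False,      # let pytest answer -h / --help
        allow_abbrev=False,  # do not swallow abbreviated pytest options
    )
    parser.add_argument("--cq-report", action="store_true")
    parser.add_argument("--fixtures", action="store_true")
    parser.add_argument("--snapshot-update", action="store_true")
    known, passthrough = parser.parse_known_args(argv)

    # Hypothetical plugin path; -p asks pytest to load a plugin early.
    args = ["-p", "trails.testing.plugin"]
    if known.cq_report:
        args.append("--trails-cq-report")        # hypothetical option
    if known.fixtures:
        args.append("--trails-fixtures")         # hypothetical option
    if known.snapshot_update:
        args.append("--trails-snapshot-update")  # hypothetical option
    return args + passthrough                    # e.g. -k <pattern>


print(build_pytest_args(["--cq-report", "-k", "patient"]))
```

One design point worth noting: pytest already defines its own `--fixtures` flag for listing pytest fixtures, so a wrapper that intercepts `--fixtures` shadows that behaviour unless it maps to a distinct option, as this sketch does.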

Progressive enhancement levels

| Level | What | Requires |
| --- | --- | --- |
| 0 | Plain pytest + `isolated_kernel` | `trails.testing` (today) |
| 1 | Graph assertions | `trails.testing.assertions` |
| 2 | Competency questions | `@competency_question` decorator |
| 3 | Graph fixtures + snapshots | `trails.testing.fixtures`, `trails.testing.snapshots` |
| 4 | Coverage reports | `trails test --cq-report` |

Each level is additive. A project using Level 0 today gains Level 1 by importing assert_graph — no migration, no config change.

Consequences

Positive

Structured test design for KGs. Competency questions bridge the gap between ontology design and test automation — the same gap that TESTaLOD and XD methodology identified.
ISTQB alignment. Test conditions (CQs), test cases (decorated functions), and coverage metrics (CQ report) map directly to ISTQB Foundation concepts. Test managers can read the CQ report without understanding SPARQL.
Rails parity. trails test becomes as natural as rails test. Fixtures, assertions, and a test runner that "just works" lower the barrier for developers new to KGs.
Progressive. ADR-0021 compliance: each layer is additive; no existing test code breaks.
Debuggable. Graph assertions include the failing SPARQL query in the error message. Snapshot diffs are triple-by-triple. CQ reports show exactly which domain questions are untested.

Negative

Surface area. Three new submodules (assertions, fixtures, snapshots) plus the CQ decorator increase the API surface of trails.testing. Mitigation: each addition is opt-in; the existing helpers remain the default entry point.
Snapshot maintenance. N-Triples snapshots can be noisy when blank node identifiers change. Mitigation: use Oxigraph's canonical serializer for deterministic blank-node IDs; provide --snapshot-update for intentional changes.
CQ report is only as good as CQ coverage. If developers don't write CQ-annotated tests, the report is empty. Mitigation: trails new generators scaffold CQ-annotated tests by default; docs emphasize CQs as the starting point for test design.

Neutral

Does not replace pytest. All primitives are pytest-native (markers, fixtures, plain assertions). Developers who prefer raw pytest can ignore the CQ layer entirely.
Does not introduce property-based testing. Projects that want property-based KG testing can use Hypothesis directly with fresh_context() — no framework support needed.

Non-goals

UI test runner. No browser-based test dashboard. The CLI and pytest's existing reporters are sufficient.
Property-based testing. Use Hypothesis directly with fresh_context(). A Hypothesis strategy for graph generation is a future ADR candidate, not part of this one.
Custom test runner. trails test wraps pytest. It does not implement test discovery, execution, or reporting from scratch.
SHACL test suites. The W3C SHACL test suite is a conformance tool for validators, not an app-level test primitive. shacl_valid() delegates to the kernel validator; it does not run the W3C suite.

Alternatives considered

Ship only graph assertions (no CQ layer). Rejected. Assertions without CQ metadata lose the structured-test-design story. The CQ decorator is cheap to implement and high-value for traceability.
Build a custom test runner instead of extending pytest. Rejected. pytest is the Python standard. Fighting it adds maintenance burden and confuses developers who already know pytest.
Use RDFUnit or a similar existing KG test framework. RDFUnit (Kontokostas et al.) is a Java framework for SPARQL-based test cases. Its approach (test patterns as RDF resources) is powerful but alien to Python developers. Trails wraps the same concept in Python-native decorators and assertions. The CQ report can export to RDFUnit format as a future extension.
Defer to "when we have real users." Rejected. Testing primitives shape how developers think about KG quality from day one. Shipping them late means retrofitting test culture; shipping them early means test culture grows with the framework.

Relationship to other ADRs

| ADR | Impact |
| --- | --- |
| ADR-0001 (Rust kernel + Python surface) | Graph assertions delegate to kernel store via `Context`. No new FFI surface needed. |
| ADR-0002 (Python-first shapes) | `shacl_valid()` delegates to the existing SHACL validator. |
| ADR-0009 (Provenance always on) | `provenance_chain()` assertion verifies PROV-O integrity per ADR-0009's always-on guarantee. |
| ADR-0017 (ActiveGraph ORM) | CQ tests use ORM queries (`Patient.where(...)`) instead of raw SPARQL; this validates ADR-0017's DX promise. |
| ADR-0021 (Progressive enhancement) | Test layers follow the same additive pattern: each layer adds capability without requiring migration. |

Open questions

Should @competency_question support CQ identifiers (e.g., @competency_question("CQ-01", "Which patients...")) for traceability to external requirements documents? Recommendation: yes, optional id= parameter.
Should snapshot testing support TTL format in addition to N-Triples? Recommendation: N-Triples only initially (deterministic, line-diffable). TTL as a future enhancement if users request it.
Should the CQ report export to machine-readable formats (JSON, CSV)? Recommendation: yes, --cq-report --format json for CI integration. Plain text default for humans.
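If the first and third recommendations are adopted, a machine-readable report might look like the sketch below. Everything about the schema is an assumption made for illustration: the `CQEntry` fields, the status labels, and the top-level `total` and `covered` keys are hypothetical, since the ADR does not define the JSON layout.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class CQEntry:
    """One row of a hypothetical CQ coverage report."""
    text: str
    status: str                 # "covered", "pending", or "untested"
    test: Optional[str] = None  # name of the implementing test, if any
    id: Optional[str] = None    # optional external identifier, e.g. "CQ-01"


def cq_report_json(entries):
    """Serialize CQ entries plus summary counts as a JSON string."""
    payload = {
        "total": len(entries),
        "covered": sum(1 for entry in entries if entry.status == "covered"),
        "questions": [asdict(entry) for entry in entries],
    }
    return json.dumps(payload, indent=2, sort_keys=True)


entries = [
    CQEntry(
        text="Which patients have more than 2 encounters?",
        status="covered",
        test="test_frequent_patients",
        id="CQ-01",
    ),
    CQEntry(text="Which encounters lack a diagnosis?", status="pending"),
]
output = cq_report_json(entries)
print(output)
```

Keeping `id` optional means existing decorated tests would serialize with a null identifier instead of requiring changes.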