ADR-0029: KG Test Primitives — Competency Questions to Assertions

  • Status: Accepted (2026-04-19)
  • Date: 2026-04-17

Context

Trails ships trails.testing with four helpers (isolated_kernel, mock_llm, capture_events, fresh_context) that solve infrastructure isolation — clean registries, deterministic LLM responses, scoped observability. What is missing is graph-state testing: asserting that the knowledge graph contains the right triples, instances, and provenance after a capability runs.

Three forces converge to make this the right time:

  1. ISTQB-shaped test design. The project author holds ISTQB Test Manager and Test Analyst certifications. Structured test design — test conditions derived from requirements, traceability between conditions and test cases, coverage metrics — is non-negotiable. Knowledge graphs have a natural test-condition primitive that the semantic-web community already uses: competency questions (CQs). A CQ is a natural-language question the ontology must be able to answer; it maps directly to an ISTQB test condition.

  2. Research validates the pipeline. TESTaLOD (Carriero et al.) and the ISWC eXtreme Design methodology both describe a CQ → SPARQL → assertion pipeline: developers write competency questions during ontology design, translate them to SPARQL queries, then wrap those queries in executable tests. Today this pipeline is manual and ad hoc. A framework primitive makes it reproducible.

  3. "Rails for KG" without rails test is incomplete. Rails ships test generators, fixtures, assertions, and a test runner that wraps Minitest/RSpec. Trails currently extends pytest with isolation helpers but provides no KG-specific assertions, no graph fixtures, no snapshot testing, and no CQ coverage reporting. Developers fall back to raw SPARQL in test bodies — the same anti-pattern ADR-0017 addressed in production code.

The existing trails.testing surface (Level 0) is a solid foundation. This ADR adds four layers on top, following the progressive-enhancement principle from ADR-0021: each layer is additive, and code at a lower level keeps working unchanged.

Decision

Trails ships five test levels: the existing Level 0 helpers plus the four layers described below. Each is additive, and no migration is required between them. Everything extends pytest; there is no custom runner.

Layer 1: Graph assertions — trails.testing.assertions

A module of composable assertion functions that inspect the kernel store through an existing Context. Every function raises AssertionError with a KG-aware message (including the triple or type that failed) so pytest's native reporting picks it up.

from trails.testing import assert_graph

assert_graph.has_type(ctx, "Patient")              # type exists in store
assert_graph.has_instance(ctx, "Patient", id=x)    # instance with id exists
assert_graph.triple_exists(ctx, s, p, o)           # specific triple present
assert assert_graph.count(ctx, "Patient") == 5     # count() returns an int; assert on it explicitly
assert_graph.field_equals(ctx, instance, "name", "Alice")
assert_graph.shacl_valid(ctx)                      # full SHACL validation passes
assert_graph.no_orphans(ctx)                       # no dangling references
assert_graph.provenance_chain(ctx, instance)       # PROV-O chain intact

Design constraints:

  • Functions accept a Context, not a raw store — consistent with the existing fresh_context() helper and every ORM entry point.
  • Each function is a standalone callable, not a method on a class — composable in plain pytest without subclassing.
  • Error messages include the SPARQL query that failed (when applicable) so developers can debug in a SPARQL console.
  • shacl_valid(ctx) delegates to the kernel validator (ADR-0002); does not reimplement validation in Python.
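
The constraints above can be illustrated with a minimal sketch. It is not the Trails implementation: the `Context` class here is a hypothetical stand-in that holds triples in a Python set, objects are treated as IRIs only, and the ASK query is built purely for the error message. It shows the shape of a standalone assertion function that takes a context and raises `AssertionError` with the failing triple and a query a developer could paste into a SPARQL console.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Context:
    """Hypothetical stand-in for a Trails Context wrapping a scoped store."""
    triples: set[Triple] = field(default_factory=set)


def triple_exists(ctx: Context, s: str, p: str, o: str) -> None:
    """Raise AssertionError naming the missing triple and an equivalent ASK query."""
    if (s, p, o) not in ctx.triples:
        query = f"ASK {{ <{s}> <{p}> <{o}> }}"
        raise AssertionError(
            f"triple not found: <{s}> <{p}> <{o}>\nSPARQL: {query}"
        )


def count(ctx: Context, type_iri: str) -> int:
    """Return the number of distinct subjects typed as type_iri."""
    return len({s for s, p, o in ctx.triples if p == RDF_TYPE and o == type_iri})


# Usage in a plain pytest-style test body (no subclassing needed).
ctx = Context()
ctx.triples.add(("urn:example:p1", RDF_TYPE, "urn:example:Patient"))
triple_exists(ctx, "urn:example:p1", RDF_TYPE, "urn:example:Patient")
assert count(ctx, "urn:example:Patient") == 1
```

Because each helper is a plain function, a failing call surfaces through pytest's normal assertion reporting without any extra plumbing.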

Layer 2: Competency questions — @competency_question decorator

A decorator that marks a test as implementing a competency question. The CQ text is metadata collected at test-discovery time and surfaced in reports.

from trails.testing import competency_question

@competency_question("Which patients have more than 2 encounters?")
def test_frequent_patients(ctx):
    results = Patient.where(encounter_count__gt=2).fetch(ctx)
    assert len(results) > 0
    for p in results:
        assert p.encounter_count > 2

ISTQB mapping:

| ISTQB concept  | Trails CQ equivalent          |
|----------------|-------------------------------|
| Test condition | Competency question text      |
| Test case      | Decorated test function       |
| Test coverage  | CQ coverage report (see CLI)  |
| Test basis     | Ontology + domain requirements |

Mechanics:

  • The decorator stores the CQ string as test_func._cq_text and adds a pytest marker (@pytest.mark.competency_question).
  • A pytest plugin (shipped in trails.testing) collects all marked tests at session end and writes a coverage report.
  • trails test --cq-report outputs which CQs are covered, which are pending (declared but skipped via @pytest.mark.skip), and which have no test.
  • CQs can also be declared without a test body (for planning): @competency_question("...", pending=True).
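
As a rough sketch of these mechanics, the decorator and report can be modelled with the standard library alone. Several things here are assumptions for illustration: the module-level `_CQ_REGISTRY`, the `cq_report` helper, and the idea that the list of required CQs is passed in explicitly (the ADR does not say where that list comes from). The real decorator would also apply the pytest marker, which is omitted to keep the example self-contained.

```python
from typing import Callable

# Hypothetical in-process registry: CQ text -> "covered" or "pending".
_CQ_REGISTRY: dict[str, str] = {}


def competency_question(text: str, pending: bool = False) -> Callable:
    """Attach CQ text to a test function and record its status."""
    def decorate(func: Callable) -> Callable:
        func._cq_text = text  # same attribute name the ADR describes
        _CQ_REGISTRY[text] = "pending" if pending else "covered"
        return func
    return decorate


def cq_report(declared: list[str]) -> dict[str, list[str]]:
    """Classify each declared CQ as covered, pending, or untested."""
    report: dict[str, list[str]] = {"covered": [], "pending": [], "untested": []}
    for text in declared:
        report[_CQ_REGISTRY.get(text, "untested")].append(text)
    return report


@competency_question("Which patients have more than 2 encounters?")
def test_frequent_patients():
    pass  # real test body would query the graph


@competency_question("Which encounters have no provider?", pending=True)
def test_encounters_without_provider():
    pass  # placeholder for planning


report = cq_report([
    "Which patients have more than 2 encounters?",
    "Which encounters have no provider?",
    "Which providers work at more than one site?",
])
```

In this sketch `report["untested"]` holds the third question, which is the gap a test manager would look for in the coverage report.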

Layer 3: Graph fixtures — trails.testing.fixtures

Reusable graph-state factories, analogous to Rails fixtures or pytest factories. Two forms: programmatic and file-based.

Programmatic fixtures:

from trails.testing import graph_fixture

@graph_fixture("patients")
def sample_patients(ctx):
    p1 = Patient(name="Alice", age=30)
    p2 = Patient(name="Bob", age=45)
    ctx.kg.add(p1); ctx.kg.add(p2)
    return {"alice": p1, "bob": p2}

def test_something(ctx, patients):
    assert patients["alice"].name == "Alice"

File-based fixtures (TTL/N-Triples):

from trails.testing import load_fixture

def test_with_ttl(ctx):
    load_fixture(ctx, "fixtures/patients.ttl")
    patients = Patient.where().fetch(ctx)
    assert len(patients) == 3

Isolation guarantee: every test gets a fresh graph context. The graph_fixture decorator integrates with isolated_kernel to ensure fixture data does not leak between tests. File-based fixtures are loaded into the test's scoped store, not the singleton.

Discovery: trails test --fixtures lists all registered @graph_fixture names with their docstrings.
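
The isolation and discovery behaviour can be sketched without pytest. Everything below is illustrative: `Context` is a hypothetical stand-in holding a plain list, and `use_fixture` and `list_fixtures` are invented helper names, not Trails APIs. The point is that a named factory is registered once, then run against a fresh context each time, so data added in one test cannot leak into the next.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Context:
    """Hypothetical stand-in for a per-test Trails Context."""
    items: list[Any] = field(default_factory=list)


# Hypothetical registry: fixture name -> factory function.
_FIXTURES: dict[str, Callable[[Context], Any]] = {}


def graph_fixture(name: str) -> Callable:
    """Register a factory under a name; the factory itself is returned unchanged."""
    def decorate(factory: Callable[[Context], Any]) -> Callable[[Context], Any]:
        _FIXTURES[name] = factory
        return factory
    return decorate


def use_fixture(name: str) -> tuple[Context, Any]:
    """Build a fresh Context, then populate it via the named factory."""
    ctx = Context()
    return ctx, _FIXTURES[name](ctx)


def list_fixtures() -> dict[str, str]:
    """Map each registered fixture name to its docstring, sorted by name."""
    return {n: (f.__doc__ or "").strip() for n, f in sorted(_FIXTURES.items())}


@graph_fixture("patients")
def sample_patients(ctx: Context) -> dict[str, str]:
    """Two sample patients."""
    ctx.items.extend(["alice", "bob"])
    return {"alice": "alice", "bob": "bob"}


ctx_a, _ = use_fixture("patients")
ctx_a.items.append("mallory")        # mutation inside one test
ctx_b, _ = use_fixture("patients")   # next test starts clean
assert ctx_b.items == ["alice", "bob"]
```

A listing command along the lines of `trails test --fixtures` could be driven by something like `list_fixtures()`, which is why docstrings on fixture factories are worth writing.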

Layer 4: Snapshot testing — trails.testing.snapshots

Compare the current graph state against a known-good serialization. Useful for migration testing and regression detection.

from trails.testing import assert_graph_snapshot

def test_after_migration(ctx):
    run_migration(ctx)
    assert_graph_snapshot(ctx, "after_migration.nt")  # N-Triples snapshot

Mechanics:

  • Snapshots are stored as N-Triples files (deterministic sort order, blank-node-stable via Oxigraph's canonical serializer).
  • On first run with no snapshot file, the test writes the snapshot and passes (like Jest's snapshot behaviour).
  • On subsequent runs, the test compares current graph output against the stored snapshot. Diff is shown triple-by-triple.
  • trails test --snapshot-update regenerates all snapshots from current state.
  • Snapshot files live in tests/__snapshots__/ by convention.
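
The write-on-first-run and compare-afterwards behaviour can be sketched as follows. This is a simplified model under stated assumptions: `compare_to_snapshot` is a hypothetical name, it takes already-serialized N-Triples lines rather than a Context, and it sorts lines lexically instead of using Oxigraph's canonical serializer, so it does not stabilise blank-node labels.

```python
from pathlib import Path
from typing import Iterable


def compare_to_snapshot(
    nt_lines: Iterable[str], snapshot_path: str, update: bool = False
) -> None:
    """Write the snapshot if absent (or update=True); otherwise diff against it."""
    current = sorted(set(nt_lines))
    path = Path(snapshot_path)

    if update or not path.exists():
        # First run or explicit update: record current state and pass.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("".join(line + "\n" for line in current), encoding="utf-8")
        return

    stored = set(path.read_text(encoding="utf-8").splitlines())
    missing = sorted(stored - set(current))   # in snapshot, not in graph
    extra = sorted(set(current) - stored)     # in graph, not in snapshot
    if missing or extra:
        diff = [f"- {t}" for t in missing] + [f"+ {t}" for t in extra]
        raise AssertionError("graph differs from snapshot:\n" + "\n".join(diff))
```

Treating the graph as a set of lines makes the diff order-independent, which is what makes a triple-by-triple report possible in the first place.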

CLI surface

All commands wrap pytest with Trails context — no custom test runner.

| Command                       | Effect                                               |
|-------------------------------|------------------------------------------------------|
| trails test                   | Run all tests (wraps pytest with trails plugin loaded) |
| trails test --cq-report       | Competency question coverage report                  |
| trails test --fixtures        | List registered graph fixtures                       |
| trails test --snapshot-update | Regenerate snapshot files                            |
| trails test -k <pattern>      | Pass-through to pytest -k filter                     |

Progressive enhancement levels

| Level | What                           | Requires                                       |
|-------|--------------------------------|------------------------------------------------|
| 0     | Plain pytest + isolated_kernel | trails.testing (today)                         |
| 1     | Graph assertions               | trails.testing.assertions                      |
| 2     | Competency questions           | @competency_question decorator                 |
| 3     | Graph fixtures + snapshots     | trails.testing.fixtures, trails.testing.snapshots |
| 4     | Coverage reports               | trails test --cq-report                        |

Each level is additive. A project using Level 0 today gains Level 1 by importing assert_graph — no migration, no config change.

Consequences

Positive

  • Structured test design for KGs. Competency questions bridge the gap between ontology design and test automation — the same gap that TESTaLOD and XD methodology identified.
  • ISTQB alignment. Test conditions (CQs), test cases (decorated functions), and coverage metrics (CQ report) map directly to ISTQB Foundation concepts. Test managers can read the CQ report without understanding SPARQL.
  • Rails parity. trails test becomes as natural as rails test. Fixtures, assertions, and a test runner that "just works" lower the barrier for developers new to KGs.
  • Progressive. ADR-0021 compliance: each layer is additive; no existing test code breaks.
  • Debuggable. Graph assertions include the failing SPARQL query in the error message. Snapshot diffs are triple-by-triple. CQ reports show exactly which domain questions are untested.

Negative

  • Surface area. Three new submodules (assertions, fixtures, snapshots) plus the CQ decorator increase the API surface of trails.testing. Mitigation: each addition is opt-in; the existing helpers remain the default entry point.
  • Snapshot maintenance. N-Triples snapshots can be noisy when blank node identifiers change. Mitigation: use Oxigraph's canonical serializer for deterministic blank-node IDs; provide --snapshot-update for intentional changes.
  • CQ report is only as good as CQ coverage. If developers don't write CQ-annotated tests, the report is empty. Mitigation: trails new generators scaffold CQ-annotated tests by default; docs emphasize CQs as the starting point for test design.

Neutral

  • Does not replace pytest. All primitives are pytest-native (markers, fixtures, plain assertions). Developers who prefer raw pytest can ignore the CQ layer entirely.
  • Does not introduce property-based testing. Projects that want property-based KG testing can use Hypothesis directly with fresh_context() — no framework support needed.

Non-goals

  • UI test runner. No browser-based test dashboard. The CLI and pytest's existing reporters are sufficient.
  • Property-based testing. Use Hypothesis directly with fresh_context(). A Hypothesis strategy for graph generation is a future ADR candidate, not part of this one.
  • Custom test runner. trails test wraps pytest. It does not implement test discovery, execution, or reporting from scratch.
  • SHACL test suites. The W3C SHACL test suite is a conformance tool for validators, not an app-level test primitive. shacl_valid() delegates to the kernel validator; it does not run the W3C suite.

Alternatives considered

  1. Ship only graph assertions (no CQ layer). Rejected. Assertions without CQ metadata lose the structured-test-design story. The CQ decorator is cheap to implement and high-value for traceability.

  2. Build a custom test runner instead of extending pytest. Rejected. pytest is the Python standard. Fighting it adds maintenance burden and confuses developers who already know pytest.

  3. Use RDF-Unit or similar existing KG test frameworks. RDF-Unit (Kontokostas et al.) is a Java framework for SPARQL-based test cases. Its approach (test patterns as RDF resources) is powerful but alien to Python developers. Trails wraps the same concept in Python-native decorators and assertions. The CQ report can export to RDF-Unit format as a future extension.

  4. Defer to "when we have real users." Rejected. Testing primitives shape how developers think about KG quality from day one. Shipping them late means retrofitting test culture; shipping them early means test culture grows with the framework.

Relationship to other ADRs

| ADR | Impact |
|-----|--------|
| ADR-0001 (Rust kernel + Python surface) | Graph assertions delegate to kernel store via Context. No new FFI surface needed. |
| ADR-0002 (Python-first shapes) | shacl_valid() delegates to the existing SHACL validator. |
| ADR-0009 (Provenance always on) | provenance_chain() assertion verifies PROV-O integrity per ADR-0009's always-on guarantee. |
| ADR-0017 (ActiveGraph ORM) | CQ tests use ORM queries (Patient.where(...)) instead of raw SPARQL, validating ADR-0017's DX promise. |
| ADR-0021 (Progressive enhancement) | Test layers follow the same additive pattern: each layer adds capability without requiring migration. |

Open questions

  • Should @competency_question support CQ identifiers (e.g., @competency_question("CQ-01", "Which patients...")) for traceability to external requirements documents? Recommendation: yes, optional id= parameter.
  • Should snapshot testing support TTL format in addition to N-Triples? Recommendation: N-Triples only initially (deterministic, line-diffable). TTL as a future enhancement if users request it.
  • Should the CQ report export to machine-readable formats (JSON, CSV)? Recommendation: yes, --cq-report --format json for CI integration. Plain text default for humans.