ADR-0076: Explainable SHACL Validation (xpSHACL)
- Status: Accepted
- Date: 2026-05-25
- Extends: ADR-0002 (Python-first shapes, emit SHACL), ADR-0021 (Progressive enhancement)
- Tracks: trails.shacl_explain
Context
Trails validates RDF graph instances against SHACL shapes derived from @shape and @node_type decorators (ADR-0002). When validation fails, the current validate_instance() raises a ValidationError that surfaces the failing field name and SHACL constraint type, but provides no:
- Human-readable explanation of why the constraint failed (beyond the machine-readable SHACL violation report)
- Actionable fix suggestion ("add a value for this required field", "the value must be a positive integer")
- Contextual awareness of what the shape expects vs. what the instance provided
This is a significant gap in developer experience. A ValidationError that says sh:minCount 1 violated on :Patient/email requires the developer to know SHACL vocabulary to understand what is wrong and how to fix it. For NL2SPARQL errors (shipped in M29), the gap is even larger: a SPARQL syntax error from a generated query gives no guidance on what the natural-language question intended.
Research basis: xpSHACL (Borges et al., VLDB 2025 LLM+Graph Workshop) demonstrates that RAG over SHACL justification trees combined with LLM-generated natural-language explanations significantly improves developer comprehension and fix time. Critically, the paper shows that a rule-based template approach (no LLM required) already provides most of the benefit for standard constraint types — LLM enhancement adds value primarily for complex nested violations and for generating domain-specific fix suggestions.
Goals:
- Rule-based explanations for all standard SHACL constraint types, working offline with no LLM.
- Optional LLM-enhanced explanations for complex violations and domain-specific fix suggestions.
- Violation caching so repeated identical violations (common in batch validation) are explained once.
- SPARQL error explanations for the NL2SPARQL pipeline (M29).
Decision
Introduce trails/shacl_explain.py with the following public surface.
ExplainedViolation
@dataclass
class ExplainedViolation:
    constraint_type: str         # e.g. "sh:MinCountConstraintComponent"
    path: str | None             # property path that failed, e.g. "schema:email"
    shape_class: str | None      # shape that the constraint belongs to, e.g. "PatientShape"
    offending_value: str | None  # the value (or None) that triggered the violation
    focus_node: str              # the IRI/blank node being validated
    explanation: str             # human-readable: what went wrong
    fix_suggestion: str          # actionable: what to change
    severity: str                # "violation" | "warning" | "info"
    source: str                  # "rule_based" | "llm_enhanced"
ExplainedReport
@dataclass
class ExplainedReport:
    valid: bool
    violations: list[ExplainedViolation]
    summary: str  # one-line human-readable summary

    def format(self, *, color: bool = True, max_violations: int = 20) -> str:
        """
        Render a human-readable validation report.

        Format:
        - One-line summary ("3 violations found in PatientShape")
        - Per-violation block: constraint, path, explanation, fix suggestion
        - Color output via ANSI codes when color=True and TTY detected
        - Truncates to max_violations with "N more..." footer
        """

    def format_json(self) -> str:
        """Render as JSON-L for machine consumption / CI output."""

    def has_severity(self, severity: str) -> bool:
        """True if any violation matches the given severity."""
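As a rough illustration of the rendering contract in the format() docstring, the sketch below produces a summary line, one block per violation, optional ANSI color, and a truncation footer. render_report and render_json_lines are hypothetical free functions standing in for ExplainedReport.format() and format_json(); the exact layout, the choice of red, and the one-object-per-line JSON output are assumptions, not the shipped format.

```python
import json
import sys
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class ExplainedViolation:
    constraint_type: str
    path: Optional[str]
    shape_class: Optional[str]
    offending_value: Optional[str]
    focus_node: str
    explanation: str
    fix_suggestion: str
    severity: str = "violation"
    source: str = "rule_based"


def render_report(violations: List[ExplainedViolation], *, color: bool = True,
                  max_violations: int = 20) -> str:
    """Hypothetical stand-in for ExplainedReport.format()."""
    # Only emit ANSI codes when requested and stdout is an interactive terminal.
    use_color = color and sys.stdout.isatty()
    red, reset = ("\033[31m", "\033[0m") if use_color else ("", "")
    count = len(violations)
    lines = [f"{count} violation{'s' if count != 1 else ''} found"]
    for v in violations[:max_violations]:
        lines.append(f"{red}{v.constraint_type}{reset} on {v.path or '(no path)'}")
        lines.append(f"  what: {v.explanation}")
        lines.append(f"  fix:  {v.fix_suggestion}")
    hidden = count - max_violations
    if hidden > 0:
        lines.append(f"{hidden} more...")
    return "\n".join(lines)


def render_json_lines(violations: List[ExplainedViolation]) -> str:
    """Hypothetical stand-in for format_json(): one JSON object per line (assumed)."""
    return "\n".join(json.dumps(asdict(v), sort_keys=True) for v in violations)
```

Keeping the TTY check inside the renderer means CI logs stay free of escape codes even when callers leave color=True.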
validate_and_explain()
def validate_and_explain(
    instance: dict | str,
    *,
    ctx: TrailsContext | None = None,
    llm=None,
    shape_class: str | None = None,
    cache: ViolationCache | None = None,
) -> ExplainedReport:
    """
    Validate an instance against SHACL shapes and return explained violations.

    Parameters
    ----------
    instance : dict | str
        The instance to validate. dict is treated as a JSON-LD document;
        str is treated as a Turtle/N-Triples string.
    ctx : TrailsContext, optional
        Live context for shape resolution. When None, shapes are inferred
        from registered @node_type / @shape decorators in the calling module.
    llm : LLM client (trails.llm), optional
        When provided, complex violations get LLM-enhanced explanations and
        domain-specific fix suggestions. When None, rule-based explanations
        only (offline-safe).
    shape_class : str, optional
        Restrict validation to a specific shape class IRI. When None,
        all applicable shapes are evaluated.
    cache : ViolationCache, optional
        Violation explanation cache. Pass a shared instance to reuse
        explanations across batch validation calls.

    Returns
    -------
    ExplainedReport
        Fully explained report. Always returns a report; never raises
        (SHACL parse errors appear as a single ExplainedViolation with
        constraint_type="ParseError").
    """
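The "always returns a report; never raises" contract can be sketched as a thin wrapper that converts a parse failure into a single ParseError violation. Everything here is illustrative: Violation and Report are trimmed-down stand-ins for ExplainedViolation and ExplainedReport, and parse and check are hypothetical callables representing the RDF parser and the SHACL engine. Failures inside check are outside the scope of this sketch.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Violation:
    constraint_type: str
    explanation: str
    fix_suggestion: str


@dataclass
class Report:
    valid: bool
    violations: List[Violation] = field(default_factory=list)
    summary: str = ""


def validate_never_raises(
    instance: Any,
    parse: Callable[[Any], Any],
    check: Callable[[Any], List[Violation]],
) -> Report:
    """Return a Report even when the instance cannot be parsed."""
    try:
        graph = parse(instance)
    except Exception as exc:  # deliberately broad: parse errors become data, not exceptions
        violation = Violation(
            constraint_type="ParseError",
            explanation=f"The instance could not be parsed: {exc}",
            fix_suggestion="Fix the syntax error and validate again.",
        )
        return Report(valid=False, violations=[violation],
                      summary="1 violation found (parse error)")
    violations = check(graph)
    if not violations:
        return Report(valid=True, summary="No violations found")
    return Report(valid=False, violations=violations,
                  summary=f"{len(violations)} violation(s) found")


# json.loads stands in for a real JSON-LD/Turtle parser in this example.
report = validate_never_raises("{not valid", json.loads, lambda graph: [])
print(report.valid, report.violations[0].constraint_type)  # prints: False ParseError
```

Returning parse failures as ordinary violations lets batch callers and CI formatters handle every outcome through one code path.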
explain_sparql_violation()
def explain_sparql_violation(
    sparql: str,
    error_msg: str,
    *,
    ctx: TrailsContext | None = None,
    llm=None,
) -> ExplainedViolation:
    """
    Explain a failed SPARQL query (for NL2SPARQL error feedback).

    Parses the error message to identify common SPARQL error patterns
    (undefined prefix, syntax error at token X, unknown predicate Y)
    and generates a human-readable explanation and fix suggestion.

    When llm= is provided, the explanation includes a suggested corrected
    query. When None, rule-based pattern matching only.

    Used by trails.nl2sparql in its LLM correction loop to provide
    richer error context when re-prompting the LLM for a corrected query.
    """
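The rule-based path described in the docstring amounts to matching the error message against a small table of patterns. The sketch below shows that idea for two of the listed error classes. The regular expressions, the wording of the messages, and the function name explain_sparql_error are assumptions made for illustration; real error strings vary between SPARQL engines and would need patterns tuned to each.

```python
import re
from typing import Tuple

# (pattern, explanation template, fix template); all hypothetical.
_PATTERNS = [
    (
        re.compile(r"(?:undefined|unknown|unresolved) prefix:?\s*['\"]?(?P<name>[\w-]+)", re.I),
        "The query uses the prefix '{name}:' but never declares it.",
        "Add a PREFIX {name}: <...> declaration at the top of the query.",
    ),
    (
        re.compile(r"syntax error.*?\bat (?:token )?['\"]?(?P<name>[^\s'\"]+)", re.I),
        "The parser could not make sense of the query at '{name}'.",
        "Check the spelling and placement of '{name}' and the tokens just before it.",
    ),
]


def explain_sparql_error(error_msg: str) -> Tuple[str, str]:
    """Return (explanation, fix_suggestion) for a SPARQL error message."""
    for pattern, explanation, fix in _PATTERNS:
        match = pattern.search(error_msg)
        if match:
            name = match.group("name")
            return explanation.format(name=name), fix.format(name=name)
    # No pattern matched: pass the raw message through with a generic hint.
    return (
        "The SPARQL engine rejected the query: " + error_msg,
        "Compare the query against the SPARQL 1.1 grammar.",
    )
```

The generic fallback keeps the function total, so the NL2SPARQL correction loop always receives some context to include in the re-prompt.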
Rule-based explanation templates
The rule-based engine maps SHACL constraint component types to explanation templates. The goal is coverage of all standard SHACL Core constraint components; the templates defined in this ADR are:
| Constraint | Template explanation | Template fix suggestion |
|---|---|---|
| sh:MinCountConstraintComponent | "The property {path} is required but has no value." | "Add at least {min_count} value(s) for {path}." |
| sh:MaxCountConstraintComponent | "The property {path} has {actual} values but the maximum is {max_count}." | "Remove {excess} value(s) from {path}." |
| sh:DatatypeConstraintComponent | "The value {value} on {path} has type {actual_type} but {expected_type} is required." | "Change the value to a {expected_type} literal." |
| sh:MinExclusiveConstraintComponent | "The value {value} on {path} must be strictly greater than {min}." | "Use a value greater than {min}." |
| sh:MaxInclusiveConstraintComponent | "The value {value} on {path} must be at most {max}." | "Use a value of {max} or less." |
| sh:PatternConstraintComponent | "The value {value} on {path} does not match the pattern {pattern}." | "Ensure the value matches the regular expression {pattern}." |
| sh:NodeKindConstraintComponent | "The value on {path} must be a {expected_kind} but {value} is a {actual_kind}." | "Change to a {expected_kind}." |
| sh:ClassConstraintComponent | "The value {value} on {path} must be an instance of {class}." | "Ensure the value is declared as rdf:type {class}." |
| sh:ClosedConstraintComponent | "The property {path} is not allowed on {shape_class}." | "Remove the {path} property, or add it to the shape's sh:property list." |
| sh:UniqueLanguageConstraintComponent | "Multiple values of {path} share language tag {lang}." | "Ensure at most one value per language tag." |
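A minimal sketch of how such a template table could be applied, assuming the templates are stored as plain Python format strings keyed by constraint component. TEMPLATES, explain, and the _KeepMissing helper are hypothetical names; only three rows of the table are reproduced, and leaving unknown placeholders intact is one possible design choice rather than documented behavior.

```python
from typing import Optional, Tuple

# Subset of the template table, keyed by SHACL constraint component.
TEMPLATES = {
    "sh:MinCountConstraintComponent": (
        "The property {path} is required but has no value.",
        "Add at least {min_count} value(s) for {path}.",
    ),
    "sh:MaxCountConstraintComponent": (
        "The property {path} has {actual} values but the maximum is {max_count}.",
        "Remove {excess} value(s) from {path}.",
    ),
    "sh:DatatypeConstraintComponent": (
        "The value {value} on {path} has type {actual_type} but {expected_type} is required.",
        "Change the value to a {expected_type} literal.",
    ),
}


class _KeepMissing(dict):
    """Leave a placeholder untouched when no value was supplied for it."""

    def __missing__(self, key: str) -> str:
        return "{" + key + "}"


def explain(constraint_type: str, **params) -> Optional[Tuple[str, str]]:
    """Return (explanation, fix_suggestion), or None for an unknown constraint."""
    template = TEMPLATES.get(constraint_type)
    if template is None:
        return None  # caller may fall back to a generic message or the LLM path
    explanation, fix = template
    values = _KeepMissing(params)
    return explanation.format_map(values), fix.format_map(values)
```

For example, explain("sh:MaxCountConstraintComponent", path="schema:email", actual=3, max_count=1, excess=2) yields "The property schema:email has 3 values but the maximum is 1." and "Remove 2 value(s) from schema:email."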
ViolationCache
class ViolationCache:
    """
    Process-lifetime LRU cache for violation explanations.

    Cache key: SHA-256(constraint_type + ":" + (path or "") + ":" + (shape_class or ""))
    Cache value: (explanation, fix_suggestion, source)

    The cache key covers the structural identity of the violation:
    the same constraint on the same path on the same shape class reuses the
    explanation regardless of which specific instance triggered it.
    The offending_value and focus_node are not part of the key, so they
    are interpolated into the cached template at retrieval time.
    """

    def __init__(self, maxsize: int = 512):
        ...

    def get(self, key: str) -> tuple[str, str, str] | None:
        ...

    def put(self, key: str, explanation: str, fix_suggestion: str, source: str) -> None:
        ...

    @staticmethod
    def make_key(constraint_type: str, path: str | None, shape_class: str | None) -> str:
        """Compute the SHA-256 cache key."""
A module-level default cache is created at import time (_DEFAULT_CACHE). Callers can pass a custom cache instance for isolation (e.g., in tests) or share the default for efficiency.
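One way the ViolationCache interface above could be filled in is with collections.OrderedDict as the LRU store. This is a sketch under assumptions, not the module's actual implementation: the eviction details and the explain_cached helper (including its explain callback and the {value} placeholder convention) are hypothetical.

```python
import hashlib
from collections import OrderedDict
from typing import Callable, Optional, Tuple

Entry = Tuple[str, str, str]  # (explanation, fix_suggestion, source)


class ViolationCache:
    def __init__(self, maxsize: int = 512):
        self._maxsize = maxsize
        self._entries: "OrderedDict[str, Entry]" = OrderedDict()

    def get(self, key: str) -> Optional[Entry]:
        entry = self._entries.get(key)
        if entry is not None:
            self._entries.move_to_end(key)  # mark as most recently used
        return entry

    def put(self, key: str, explanation: str, fix_suggestion: str, source: str) -> None:
        self._entries[key] = (explanation, fix_suggestion, source)
        self._entries.move_to_end(key)
        if len(self._entries) > self._maxsize:
            self._entries.popitem(last=False)  # evict the least recently used entry

    @staticmethod
    def make_key(constraint_type: str, path: Optional[str], shape_class: Optional[str]) -> str:
        raw = constraint_type + ":" + (path or "") + ":" + (shape_class or "")
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()


_DEFAULT_CACHE = ViolationCache()


def explain_cached(cache: ViolationCache, constraint_type: str, path: Optional[str],
                   shape_class: Optional[str], value: object,
                   explain: Callable[[str, Optional[str], Optional[str]], Entry]) -> Entry:
    """Call explain() only on a cache miss; fill in the instance value afterwards."""
    key = ViolationCache.make_key(constraint_type, path, shape_class)
    entry = cache.get(key)
    if entry is None:
        entry = explain(constraint_type, path, shape_class)  # returns templates
        cache.put(key, *entry)
    explanation, fix, source = entry
    # The offending value is not part of the key, so it is interpolated here.
    return explanation.replace("{value}", str(value)), fix, source
```

With this shape, validating 1000 instances that all hit the same three violation patterns invokes explain three times; every other call is served from the cache and only differs in the interpolated value.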
LLM-enhanced explanations
When llm= is provided and a violation is not satisfactorily explained by the rule-based templates (or when the constraint type is custom / non-standard), validate_and_explain() sends a structured prompt to the LLM:
You are a SHACL validation assistant. A constraint violation occurred.
Constraint type: {constraint_type}
Property path: {path}
Shape: {shape_class}
Offending value: {offending_value}
Focus node: {focus_node}
SHACL shape definition: {shape_ttl_snippet}
Explain in one sentence what is wrong, and in one sentence what the
developer should do to fix it. Be specific and avoid SHACL vocabulary
unless necessary.
The LLM response is validated for length and format before being set as the explanation and fix_suggestion. If the LLM returns an unusable response, the rule-based template is used as fallback.
LLM-enhanced explanations are cached identically to rule-based ones — if the same violation structure is explained by the LLM, subsequent occurrences use the cached result.
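The validate-then-fall-back step might look like the sketch below. The ADR does not specify the response format or length limit, so both are assumptions here: the sketch expects two labeled lines ("Explanation:" and "Fix:") and caps each at MAX_CHARS characters. parse_llm_response and choose_explanation are hypothetical helper names.

```python
import re
from typing import Optional, Tuple

MAX_CHARS = 300  # assumed limit; the ADR does not state one

# Assumed response shape: "Explanation: ..." on one line, "Fix: ..." on the next.
_RESPONSE = re.compile(
    r"^\s*Explanation:\s*(?P<explanation>.+?)\s*\n\s*Fix:\s*(?P<fix>.+?)\s*$",
    re.S,
)


def parse_llm_response(text: Optional[str]) -> Optional[Tuple[str, str]]:
    """Return (explanation, fix) if the response is usable, else None."""
    match = _RESPONSE.match(text or "")
    if match is None:
        return None
    explanation, fix = match["explanation"], match["fix"]
    if "\n" in explanation or "\n" in fix:
        return None  # each part should be a single sentence on one line
    if len(explanation) > MAX_CHARS or len(fix) > MAX_CHARS:
        return None
    return explanation, fix


def choose_explanation(llm_text: Optional[str],
                       rule_based: Tuple[str, str]) -> Tuple[str, str, str]:
    """Prefer a valid LLM response; otherwise fall back to the rule-based template."""
    parsed = parse_llm_response(llm_text)
    if parsed is None:
        return (rule_based[0], rule_based[1], "rule_based")
    return (parsed[0], parsed[1], "llm_enhanced")
```

The third element of the returned tuple mirrors the source field of ExplainedViolation, so callers can tell which path produced the text.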
CLI integration
trails validate --explain [--llm] <file_or_iri>
trails kg ask --explain-errors "..." # NL2SPARQL with explained SPARQL errors
The --explain flag on trails validate switches from the current raw SHACL report to the human-readable ExplainedReport.format() output.
Non-goals
- No replacement of the existing validate_instance()/ValidationError path. The explained validation is additive; the raw SHACL report is always available.
- No LLM required. The entire trails.shacl_explain module works offline. LLM enhancement is strictly opt-in via the llm= parameter.
- No GUI for validation results in this milestone. Terminal output only; dashboard integration is a follow-on.
- No streaming explanation for very large graphs. Batch validation with the shared ViolationCache is the performance path.
Consequences
Positive
- Developer experience. Actionable explanations replace cryptic SHACL vocabulary in error output. Fix time for common violations (missing required field, wrong datatype) is expected to drop.
- Offline-safe. Rule-based templates require no LLM, no network, no API key. CI pipelines get useful explanations without cost.
- NL2SPARQL synergy. explain_sparql_violation() feeds richer context into the NL2SPARQL correction loop (M29), potentially reducing correction rounds.
- Caching. Batch validation of 1000 instances with the same 3 violation patterns calls the explanation logic 3 times, not 3000.
- Progressive enhancement. Rule-based → LLM-enhanced follows the framework's north star (ADR-0021): works at every level, gets richer with more investment.
Negative
- Template maintenance. Adding new SHACL constraint components (custom or SHACL-AF) requires adding templates. Mitigation: the LLM path handles unknown constraints; templates are only needed for common/standard ones.
- LLM cost. LLM-enhanced explanations incur LLM API costs. Mitigation: caching ensures each unique violation structure is explained once per process; uncached violations are small, cheap prompts.
- Cache invalidation. The cache key does not include the shape version. If a shape changes (e.g., sh:minCount 1 → sh:minCount 2), cached explanations with the old min value may be stale until they are evicted. Mitigation: cache maxsize=512 + LRU eviction; or clear the cache on schema migration.
Non-consequences
- ADR-0002 (SHACL emit) unchanged. The SHACL shapes are generated identically; explanations are a read-path layer.
- ADR-0021 (progressive enhancement) upheld. The explained validation is a richer variant of existing validation, not a separate tier.
Revisit conditions
- If xpSHACL is extended with a formal justification tree format in a follow-on publication, adopt the format for LLM prompt construction.
- If the SHACL specification adds new standard constraint components (SHACL-AF, SHACL-JS, etc.), add corresponding rule-based templates.
- If LLM cost for explanation becomes significant (e.g., in high-volume ingestion pipelines), introduce a persistent (disk-backed) violation cache that survives process restarts.
References
- Borges, E., Svátek, V., & Gangemi, A. (2025). xpSHACL: Explainable SHACL Validation Using RAG and LLMs. LLM+Graph Workshop at VLDB 2025.
- Garijo, D., Wilcke, X., & Poveda-Villalón, M. (2024). LLMs for Ontology Engineering: A Landscape of Tasks, Methods, and Open Challenges. Preprint.