ADR-0031: Federation Ontology Exchange

  • Status: Accepted
  • Date: 2026-04-18

Context

Trails federation (ADR-0023) allows instances to query each other via SPARQL SERVICE clauses and invoke remote capabilities via MCP relay. However, federated queries are currently blind: Instance A has no idea what node types, predicates, or shapes Instance B exposes. A user writing a SERVICE query against a peer must know the peer's schema out of band — there is no discovery mechanism.

This creates several problems:

  1. Fragile queries. Federated SPARQL queries reference remote IRIs by hardcoded strings. A schema change on the remote peer silently breaks queries.
  2. No tooling support. CLI and IDE tooling cannot autocomplete or validate predicates when the schema is unknown.
  3. Opaque mesh. The mesh manager (Phase 4) tracks health but not what each peer contains. An operator looking at the output of trails federation peers sees URLs and latencies, but not what data lives behind them.

Other knowledge-graph ecosystems (Solid, DCAT, VoID) solve this with schema/dataset description documents. Trails needs an equivalent that fits its progressive-enhancement model (ADR-0021) and integrates with the existing federation, MCP, and CLI surfaces.

Decision

Implement federation ontology exchange in three levels. This ADR covers Levels 1 and 2; Level 3 is deferred.

Level 1: Schema Advertisement

Each Trails instance exposes a schema advertisement — a JSON document listing:

  • Registered @node_type definitions (name, IRI, fields, types)
  • SHACL constraints from @shape declarations
  • Registered capability IDs
  • Data-discovered predicates (from the store, via SPARQL aggregation)
  • Instance metadata (name, base IRI, Trails version, generation timestamp)
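
As a rough illustration of the document shape listed above, the advertisement could be modelled with dataclasses and serialized with the standard json module. The class and function names come from this ADR's module structure; every field name and the sample values (including the "0.0.0" version) are assumptions made for this sketch, not the actual wire format.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class NodeTypeInfo:
    name: str                                   # e.g. "Patient"
    iri: str                                    # e.g. "trails://app-a/Patient"
    fields: dict[str, str] = field(default_factory=dict)  # field name -> type name


@dataclass
class PredicateInfo:
    iri: str
    count: int = 0                              # assumed: usage count from store aggregation


@dataclass
class SchemaAdvertisement:
    # Instance metadata (field names are hypothetical)
    instance_name: str
    base_iri: str
    trails_version: str
    generated_at: str                           # ISO 8601 timestamp
    # Advertised content
    node_types: list[NodeTypeInfo] = field(default_factory=list)
    shapes: list[dict] = field(default_factory=list)       # SHACL constraints, opaque here
    capabilities: list[str] = field(default_factory=list)  # capability IDs
    predicates: list[PredicateInfo] = field(default_factory=list)


def schema_to_json(schema: SchemaAdvertisement) -> str:
    """Serialize an advertisement, including nested dataclasses, to JSON text."""
    return json.dumps(asdict(schema), indent=2, sort_keys=True)


def schema_from_json(text: str) -> SchemaAdvertisement:
    """Rebuild an advertisement from JSON text produced by schema_to_json()."""
    raw = json.loads(text)
    raw["node_types"] = [NodeTypeInfo(**n) for n in raw.get("node_types", [])]
    raw["predicates"] = [PredicateInfo(**p) for p in raw.get("predicates", [])]
    return SchemaAdvertisement(**raw)


example = SchemaAdvertisement(
    instance_name="app-a",
    base_iri="trails://app-a/",
    trails_version="0.0.0",
    generated_at="2026-04-18T00:00:00+00:00",
    node_types=[NodeTypeInfo("Patient", "trails://app-a/Patient", {"name": "str"})],
    capabilities=["example.capability"],
    predicates=[PredicateInfo("trails://app-a/hasName", 12)],
)
round_tripped = schema_from_json(schema_to_json(example))
```

Keeping the structure as plain dataclasses means the same object can back both the HTTP and MCP surfaces, since each only needs the JSON text.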

The advertisement is served at:

  • HTTP: GET /schema on the FastAPI adapter (JSON response)
  • MCP: trails://schema resource (JSON text)

Level 2: Schema Discovery

Instances can fetch and cache remote peer schemas:

  • fetch_peer_schema(peer_url) fetches /schema from a peer
  • SchemaCache provides TTL-based in-memory caching (default 5 min)
  • MeshManager.schema_cache is populated during discover_peers()
  • discover_peers() includes a "schema" key in each peer dict
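
The TTL behaviour described above can be sketched as follows. This is a minimal illustration, not the shipped SchemaCache: the method names (put, get), the constructor parameters, and the injectable clock are assumptions introduced here so that expiry can be exercised without sleeping.

```python
from __future__ import annotations

import threading
import time
from typing import Any, Callable, Optional


class SchemaCache:
    """Thread-safe in-memory cache whose entries expire after ttl_seconds."""

    def __init__(
        self,
        ttl_seconds: float = 300.0,                      # 5 minutes, the documented default
        clock: Callable[[], float] = time.monotonic,     # hypothetical hook for testing
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries: dict[str, tuple[float, Any]] = {}  # peer URL -> (expiry, schema)

    def put(self, peer_url: str, schema: Any) -> None:
        """Store a schema and stamp it with its expiry time."""
        with self._lock:
            self._entries[peer_url] = (self._clock() + self._ttl, schema)

    def get(self, peer_url: str) -> Optional[Any]:
        """Return the cached schema, or None if it is missing or expired."""
        with self._lock:
            entry = self._entries.get(peer_url)
            if entry is None:
                return None
            expires_at, schema = entry
            if self._clock() >= expires_at:
                del self._entries[peer_url]              # drop the stale entry
                return None
            return schema
```

Using a monotonic clock rather than wall-clock time keeps expiry unaffected by system clock adjustments.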

Level 3: Schema Alignment (deferred)

Negotiation of shared vocabularies between peers — e.g. mapping trails://app-a/Patient to trails://app-b/Subject via owl:sameAs or SKOS mappings. This requires consensus protocols and is out of scope for this increment.

CLI Surface

  • trails federation schema — show local schema advertisement
  • trails federation schema --peer warehouse — fetch and show a remote peer's schema
  • trails federation schema --json — raw JSON output
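
The three invocations above map onto a single subcommand with two optional flags. The sketch below uses argparse purely for illustration; the CLI framework Trails actually uses is not stated in this ADR, and the names build_parser and as_json are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser accepting: trails federation schema [--peer NAME] [--json]."""
    parser = argparse.ArgumentParser(prog="trails")
    commands = parser.add_subparsers(dest="command", required=True)

    federation = commands.add_parser("federation")
    fed_commands = federation.add_subparsers(dest="subcommand", required=True)

    schema = fed_commands.add_parser("schema", help="show a schema advertisement")
    # Without --peer the local advertisement is shown.
    schema.add_argument("--peer", default=None, help="name of a remote peer")
    # Stored as as_json so the attribute name states its meaning.
    schema.add_argument("--json", action="store_true", dest="as_json",
                        help="print raw JSON instead of a formatted view")
    return parser


args = build_parser().parse_args(["federation", "schema", "--peer", "warehouse"])
```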

Module Structure

New module: trails.federation_schema containing:

  • SchemaAdvertisement, NodeTypeInfo, PredicateInfo — dataclasses
  • build_local_schema() — combines ORM, shapes, capabilities, and store data into an advertisement
  • schema_to_json() / schema_from_json() — serialization
  • fetch_peer_schema() — HTTP fetch with error handling
  • SchemaCache — thread-safe TTL cache
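
One plausible shape for fetch_peer_schema() is sketched below using only the standard library. Returning None on any failure, the timeout default, and the opener parameter are all assumptions of this sketch; the ADR only states that the function performs an HTTP fetch with error handling.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any, Callable, Optional


def fetch_peer_schema(
    peer_url: str,
    timeout: float = 5.0,                               # assumed default
    opener: Callable[..., Any] = urllib.request.urlopen,  # hypothetical hook for testing
) -> Optional[dict[str, Any]]:
    """Fetch <peer_url>/schema and return the parsed JSON, or None on failure."""
    url = peer_url.rstrip("/") + "/schema"
    try:
        with opener(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError (e.g. 404 from a peer without
        # /schema), timeouts, and connection failures. ValueError covers
        # malformed URLs, invalid JSON, and undecodable bytes.
        return None
```

Treating a 404 the same as any other failure matches the additive intent of this ADR: a peer that does not advertise a schema is simply a peer with no schema information.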

Integration points (minimal changes to existing modules):

  • http_adapter.py — new GET /schema route
  • mcp_resources.py — register_schema_resource() for trails://schema
  • federation_mesh.py — SchemaCache on MeshManager, populated during discover_peers()
  • cli/federation.py — schema subcommand
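
To show how these pieces might fit together, the following sketch attaches a "schema" key to each peer dict and fills a cache along the way. It is written as a free function with an injected fetcher and a plain dict standing in for SchemaCache; the real discover_peers() is a MeshManager method whose signature is not given in this ADR, so the parameter names and the "name" and "url" keys are assumptions.

```python
from __future__ import annotations

from typing import Any, Callable, Optional


def discover_peers(
    peers: dict[str, str],                               # peer name -> base URL
    fetch_schema: Callable[[str], Optional[dict]],       # e.g. fetch_peer_schema
    cache: dict[str, dict],                              # stand-in for SchemaCache
) -> list[dict[str, Any]]:
    """Return one dict per peer, each with a "schema" key (None if unavailable)."""
    results = []
    for name, url in peers.items():
        schema = cache.get(url)
        if schema is None:
            schema = fetch_schema(url)
            if schema is not None:
                cache[url] = schema                      # populate cache on success only
        # A peer without /schema still appears in the result; only the
        # schema information is missing.
        results.append({"name": name, "url": url, "schema": schema})
    return results
```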

Consequences

Positive

  • Federated queries are no longer blind — peers can inspect each other's schemas before writing SERVICE clauses.
  • CLI tooling can display what a peer offers (trails federation schema --peer X), reducing trial-and-error.
  • The mesh view becomes richer: operators see not just "is it up?" but "what does it have?"
  • Schema caching avoids redundant fetches during health check rounds.
  • The advertisement is purely additive: instances that don't expose /schema simply return 404 and federation continues to work as before.

Negative

  • Schema advertisements reveal the internal structure of an instance. In security-sensitive deployments, the /schema endpoint should be gated by Cedar policy (future work).
  • The advertisement is a point-in-time snapshot. Dynamic schema changes (for example, a new @node_type registered at runtime) become visible to a peer only after its cached copy expires and it fetches again. The cache TTL bounds this staleness but does not eliminate it.
  • Level 3 alignment is deferred, so cross-instance vocabulary mapping remains manual.

Risks

  • Schema drift. If a peer changes its schema between advertisement fetches, cached information becomes stale. The TTL (default 5 min) bounds the staleness window.
  • Large schemas. An instance with hundreds of node types produces a large JSON document. Pagination is not implemented in Levels 1 and 2 but can be added if needed.

See Also

  • ADR-0023 — federation and instance mesh design
  • ADR-0021 — progressive enhancement (schema exchange is additive)
  • trails.federation_schema — implementation module
  • docs/guides/federation.md — user-facing federation guide