Vector Retrieval¶
trails.vector is the M10 Phase 2 surface for embeddings, vector
storage, and hybrid graph+vector retrieval. It is the third primitive
in the app-builder trilogy specified by
ADR-0019 — complementing
trails.ingest (Phase 1) and the forthcoming trails-admin UI
(Phase 3).
The module follows the same "thin adapter, strong defaults" shape as
the rest of Trails: the embedder and the vector store are pluggable
protocols; two embedders (local + API) and two stores (zero-ops +
scale) ship in-tree; optional dependencies are lazy-imported and
surface a clean TrailsError when the extra is not installed.
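The lazy-import behaviour can be pictured with a small sketch. This is not the Trails implementation: the TrailsError class below is a local stand-in, and the helper name require_optional and the message wording are assumptions made for illustration.

```python
import importlib


class TrailsError(Exception):
    """Local stand-in for the real Trails error type (assumption)."""


def require_optional(module_name: str, extra: str):
    """Import an optional dependency on first use, or fail with a clear message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise TrailsError(
            f"'{module_name}' is not installed; install the '{extra}' extra to use it"
        ) from exc


# A module that is present imports normally...
json_mod = require_optional("json", extra="stdlib")

# ...while a missing one surfaces as TrailsError rather than a raw ImportError.
try:
    require_optional("definitely_not_installed_pkg", extra="vector")
except TrailsError as err:
    message = str(err)
```

Deferring the import to first use means that importing the package itself never fails just because an optional backend is absent.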
Quickstart¶
from trails.vector import (
    MockEmbedder,
    SqliteVecStore,
    retrieve,
)
embedder = MockEmbedder(dim=16, seed=1)  # deterministic, for tests
store = SqliteVecStore(path=":memory:", dim=16)  # or path="vectors.db"
# Index a few chunks
chunks = [
    ("trails://app/Chunk/1", "Rails is a web framework"),
    ("trails://app/Chunk/2", "Elephants are large mammals"),
    ("trails://app/Chunk/3", "Ruby on Rails was released in 2004"),
]
for iri, text in chunks:
    store.add(
        id=iri,
        vector=embedder.embed(text),
        metadata={"iri": iri, "snippet": text},
    )
# Pure vector retrieval
hits = retrieve(
    "what is Rails",
    mode="vector",
    k=3,
    vector_store=store,
    embedder=embedder,
)
for h in hits:
    print(h.iri, h.score, h.snippet)
In production you will swap MockEmbedder for
SentenceTransformerEmbedder (local, no API cost) or OpenAIEmbedder
(API). The surface is identical.
Embedders¶
SentenceTransformerEmbedder — local default¶
from trails.vector import SentenceTransformerEmbedder
embedder = SentenceTransformerEmbedder(model="all-MiniLM-L6-v2")
embedder.dim # 384
Requires the sentence-transformers package.
all-MiniLM-L6-v2 is a small (about 22M parameters, roughly 80 MB on
disk), 384-dim model that runs comfortably on CPU. For higher quality
at the cost of size, try all-mpnet-base-v2 (420 MB, 768-dim).
OpenAIEmbedder — API¶
Requires the openai SDK.
API-key resolution falls back to the OPENAI_API_KEY environment
variable when api_key=None. Supply dim=512 (or any supported size) to
request OpenAI's dimensionality-reduction feature on
text-embedding-3-small / text-embedding-3-large.
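The key fallback described above amounts to a few lines. The following is a sketch only: the function name resolve_api_key and the choice of ValueError for the missing-key case are assumptions, not the adapter's actual code.

```python
import os


def resolve_api_key(api_key=None, env_var="OPENAI_API_KEY"):
    """Return the explicit key if given, else the environment variable's value."""
    if api_key is not None:
        return api_key
    value = os.environ.get(env_var)
    if not value:
        # Assumption: the real adapter may raise a different error type.
        raise ValueError(f"no API key passed and {env_var} is not set")
    return value


# An explicit key always wins over the environment.
explicit = resolve_api_key("sk-explicit")
```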
MockEmbedder — tests¶
Deterministic pseudo-random embedder. embed(text) is a pure function
of (text, seed), so tests can compare embeddings by value without
touching any model:
from trails.vector import MockEmbedder
a = MockEmbedder(dim=16, seed=1)
b = MockEmbedder(dim=16, seed=1)
assert a.embed("hello") == b.embed("hello") # deterministic
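For intuition, here is one way a pure function of (text, seed) could be built. It is a hypothetical sketch, not MockEmbedder's actual algorithm; the function name mock_embed and the hashing scheme are assumptions.

```python
import hashlib
import random


def mock_embed(text: str, dim: int = 16, seed: int = 1) -> list[float]:
    """Derive a pseudo-random vector purely from (text, seed)."""
    # Hash seed and text together so the result is stable across processes.
    digest = hashlib.sha256(f"{seed}:{text}".encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


v1 = mock_embed("hello")
v2 = mock_embed("hello")          # same text, same seed
v3 = mock_embed("hello", seed=2)  # same text, different seed
```

The sketch uses hashlib rather than the built-in hash(), because string hashing in Python is salted per process and would break reproducibility between test runs.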
Anthropic?¶
Anthropic does not currently expose a first-party embedding endpoint
(Anthropic's docs recommend Voyage AI, which ships through its own
SDK). ADR-0019 §"Open questions" #5 tracks this; for now the
"API embedder" slot is filled by OpenAIEmbedder, and a future
VoyageEmbedder can be added without interface churn.
Vector stores¶
SqliteVecStore — zero-ops default¶
from trails.vector import SqliteVecStore
store = SqliteVecStore(path="app-vectors.db", dim=384)
# or path=":memory:" for ephemeral tests
Uses the sqlite-vec loadable
extension — SQLite + a virtual table + zero ops. Included in the
vector extra.
QdrantStore — scale adapter¶
from trails.vector import QdrantStore
store = QdrantStore(
    collection="docs-chunks",
    dim=384,
    url="http://localhost:6333",  # or host=..., port=...
)
Requires the qdrant-client package.
When neither url nor host is given, the adapter falls back to
qdrant-client's in-process ":memory:" sentinel — handy for
local development without a running Qdrant server.
Shared interface¶
Both stores implement the same contract:
| Method | Shape | Notes |
|---|---|---|
| add(id, vector, metadata=None) | id: str, vector: list[float] | Upsert: re-adding the same id overwrites. |
| search(vector, k=10) | returns [{id, score, metadata}] | score is higher-is-better (similarity). |
| delete(id) | returns bool | True when a row was removed. |
| count() | returns int | Exact count. |
| dim | property | Vector dimension. |
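To make the contract concrete, the sketch below writes it as a typing.Protocol and satisfies it with a toy brute-force store. Both class names (VectorStore, InMemoryStore) are hypothetical, and the cosine scoring and the dimension check are assumptions; the shipped adapters may behave differently in those details.

```python
import math
from typing import Optional, Protocol


class VectorStore(Protocol):
    """Structural type mirroring the table above."""

    @property
    def dim(self) -> int: ...
    def add(self, id: str, vector: list[float], metadata: Optional[dict] = None) -> None: ...
    def search(self, vector: list[float], k: int = 10) -> list[dict]: ...
    def delete(self, id: str) -> bool: ...
    def count(self) -> int: ...


class InMemoryStore:
    """Toy store: brute-force cosine similarity over a dict."""

    def __init__(self, dim: int) -> None:
        self._dim = dim
        self._rows = {}  # id -> (vector, metadata)

    @property
    def dim(self) -> int:
        return self._dim

    def add(self, id, vector, metadata=None):
        if len(vector) != self._dim:
            raise ValueError(f"expected {self._dim} dimensions, got {len(vector)}")
        self._rows[id] = (list(vector), dict(metadata or {}))  # upsert

    def search(self, vector, k=10):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        scored = [
            {"id": id_, "score": cosine(vector, vec), "metadata": meta}
            for id_, (vec, meta) in self._rows.items()
        ]
        scored.sort(key=lambda hit: hit["score"], reverse=True)  # higher is better
        return scored[:k]

    def delete(self, id):
        return self._rows.pop(id, None) is not None

    def count(self):
        return len(self._rows)


store: VectorStore = InMemoryStore(dim=2)
store.add("a", [1.0, 0.0], {"iri": "trails://app/Chunk/a"})
store.add("b", [0.0, 1.0])
store.add("a", [0.9, 0.1])  # same id: overwrites the first row
top = store.search([1.0, 0.0], k=1)
```

Because the protocol is structural, any object with these members can be passed wherever a store is expected, without inheriting from a base class.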
retrieve() — one entry point, three modes¶
from trails.vector import retrieve
hits = retrieve(
    query,
    ctx=ctx,              # trails.context.Context
    mode="hybrid",        # "graph" | "vector" | "hybrid"
    k=10,
    sparql_filter=None,   # optional SPARQL narrow
    vector_store=store,
    embedder=embedder,
)
mode="graph" — SPARQL only¶
Runs sparql_filter against ctx.kg and returns the matching IRIs. The
mode exists for uniformity, so an app that sometimes wants graph-only
results can keep a single call site; calling ctx.kg.query(...) directly
is equally valid.
mode="vector" — similarity only¶
Embeds query and asks the store for its top k. No graph
interaction. Requires vector_store and embedder.
mode="hybrid" — SPARQL narrow + vector rerank¶
- Run sparql_filter to produce a candidate IRI set.
- Embed query and search the vector store (over-fetching to compensate for filter drop-out).
- Keep only hits whose metadata["iri"] is in the candidate set.
- Return the top k.
This is the load-bearing path for compliance-shaped apps — "find evidence supporting claim X, ranked by semantic similarity, but only from documents created after 2024-01-01". The ADR-0019 sketch runs verbatim:
hits = retrieve(
    "evidence that the defendant was present at the scene",
    ctx=ctx,
    mode="hybrid",
    k=20,
    sparql_filter="""
        SELECT ?iri WHERE {
            ?iri schema:isPartOf ?doc .
            ?doc schema:dateCreated ?d .
            FILTER (?d >= "2024-01-01"^^xsd:date)
        }
    """,
    vector_store=store,
    embedder=embedder,
)
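The four hybrid steps listed earlier can be sketched in plain Python. This is an illustration under stated assumptions, not the code behind retrieve(): the function name hybrid_retrieve, the overfetch multiplier, and the fake_search stand-in are all hypothetical, and the candidate set is passed in directly instead of being produced by a SPARQL query.

```python
def hybrid_retrieve(query_vector, candidate_iris, search, k=10, overfetch=4):
    """Vector search narrowed to a candidate IRI set.

    `search(vector, k)` must return [{id, score, metadata}] sorted best-first,
    matching the shared store interface. `overfetch` is an assumed knob.
    """
    candidates = set(candidate_iris)                 # step 1: result of the filter
    raw_hits = search(query_vector, k * overfetch)   # step 2: over-fetch
    kept = [
        hit for hit in raw_hits
        if (hit.get("metadata") or {}).get("iri") in candidates  # step 3: keep matches
    ]
    return kept[:k]                                  # step 4: top k


def fake_search(vector, k):
    """Stand-in for store.search that returns fixed, already-ranked hits."""
    hits = [
        {"id": "c1", "score": 0.9, "metadata": {"iri": "trails://app/Chunk/1"}},
        {"id": "c2", "score": 0.8, "metadata": {"iri": "trails://app/Chunk/2"}},
        {"id": "c3", "score": 0.7, "metadata": {"iri": "trails://app/Chunk/3"}},
    ]
    return hits[:k]


allowed = {"trails://app/Chunk/1", "trails://app/Chunk/3"}
result = hybrid_retrieve([0.0], allowed, fake_search, k=2)
# Chunk/2 is dropped by the filter; c1 and c3 remain, in score order.
```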
Why filter-then-rank, not rank-then-filter?¶
We narrow with SPARQL first, then score. The alternative — rank-then-filter (take the vector top-N, intersect with the SPARQL set) — is tempting because it pushes the heavy lifting into the ANN index, but it loses recall catastrophically whenever the candidate set is small relative to the corpus: the top-N from the whole corpus may not intersect the narrow candidate set at all. Filter-then-rank is correct in both regimes.
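A toy example makes the recall argument tangible. The corpus, scores, and cut-off n=50 below are invented for illustration; only the two strategies themselves come from the paragraph above.

```python
# Toy corpus: 1000 chunks whose similarity to the query falls with the index.
scores = {f"chunk-{i}": 1.0 - i / 1000 for i in range(1000)}

# The filter admits only five chunks, none of them near the global top.
candidates = {f"chunk-{i}" for i in (700, 750, 800, 850, 900)}
k = 3


def rank_then_filter(scores, candidates, k, n=50):
    """Take the global top-n by score, then intersect with the candidates."""
    top_n = sorted(scores, key=scores.get, reverse=True)[:n]
    return [cid for cid in top_n if cid in candidates][:k]


def filter_then_rank(scores, candidates, k):
    """Score only the candidates, then take the top-k among them."""
    return sorted(candidates, key=scores.get, reverse=True)[:k]


lost = rank_then_filter(scores, candidates, k)   # [] : top-50 misses every candidate
found = filter_then_rank(scores, candidates, k)  # chunk-700, chunk-750, chunk-800
```

Raising n narrows the gap, but no fixed n is safe for every filter, which is the failure mode described above.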
ADR-0019 flags reciprocal-rank-fusion (RRF) as a follow-up — once we
have real compliance data to tune against, we will ship it under a
fusion= kwarg, without changing the current default.
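For readers unfamiliar with RRF, the generic technique is sketched below. It is not a preview of the planned fusion= option: the function name, the input shape (best-first lists of ids), and the constant k=60 (a commonly used default) are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse best-first id lists: each appearance adds 1 / (k + rank), rank from 1."""
    fused = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            fused[item] = fused.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(fused, key=lambda item: (-fused[item], item))


ranking_one = ["a", "b", "c"]
ranking_two = ["b", "c", "a"]
fused_order = reciprocal_rank_fusion([ranking_one, ranking_two])
# "b" wins: it is ranked 2nd and 1st, beating "a" (1st and 3rd) and "c" (3rd and 2nd).
```

RRF uses only rank positions, so it can combine lists whose raw scores are not comparable.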
Back-references to ingestion¶
The store holds only (id, vector, metadata) — it does not know
anything about the KG. The hybrid path honours a single convention: a
metadata["iri"] entry back-references the graph IRI. When you wire
trails.ingest to trails.vector you should index one vector per
chunk and set metadata["iri"] to the chunk's minted IRI:
for chunk in document.chunks:
    store.add(
        id=chunk.iri,  # same as metadata["iri"] is fine
        vector=embedder.embed(chunk.text),
        metadata={
            "iri": chunk.iri,  # back-reference
            "doc_iri": document.iri,
            "snippet": chunk.text[:280],
        },
    )
metadata["snippet"] and metadata["text"] are both picked up by
RetrievalHit.snippet, so the default result shape already has a
preview ready.
Testing¶
Use MockEmbedder together with trails.testing.fresh_context /
trails.testing.isolated_kernel:
from trails.testing import fresh_context, isolated_kernel
from trails.vector import MockEmbedder, SqliteVecStore, retrieve
def test_my_retrieval():
    with isolated_kernel():
        ctx = fresh_context()
        emb = MockEmbedder(dim=16, seed=1)
        store = SqliteVecStore(":memory:", dim=16)
        # ... index + assert ...
See python/tests/test_vector.py for the full test surface.
ADR composition¶
- ADR-0019 — the app surface that added this module.
- ADR-0007 — Oxigraph remains the default triple store; vector adapters are orthogonal.
- ADR-0005 — registering retrieval as a capability (with cost + shape) arrives in a follow-up; the surface here is deliberately agnostic.
- ADR-0012 — embedding / retrieval cost wiring will land alongside the capability manifest.