Vector Retrieval

trails.vector is the M10 Phase 2 surface for embeddings, vector storage, and hybrid graph+vector retrieval. It is the third primitive in the app-builder trilogy specified by ADR-0019 — complementing trails.ingest (Phase 1) and the forthcoming trails-admin UI (Phase 3).

The module follows the same "thin adapter, strong defaults" shape as the rest of Trails: the embedder and the vector store are pluggable protocols; two embedders (local + API) and two stores (zero-ops + scale) ship in-tree; optional dependencies are lazy-imported and surface a clean TrailsError when the extra is not installed.
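The "pluggable protocol plus lazy optional import" shape can be sketched in plain Python. This is an illustration only, not the Trails source: the `Embedder` protocol, the `lazy_import` helper, `ConstantEmbedder`, and the local `TrailsError` stand-in are all hypothetical names chosen for the example.

```python
import importlib
from typing import Protocol, runtime_checkable


class TrailsError(Exception):
    """Local stand-in for the real TrailsError so the sketch is self-contained."""


@runtime_checkable
class Embedder(Protocol):
    """Hypothetical protocol: anything with a dim and an embed() qualifies."""

    dim: int

    def embed(self, text: str) -> list[float]: ...


def lazy_import(module: str, extra: str):
    """Import an optional dependency only when it is actually needed.

    A missing module is converted into a readable error that names the
    pip extra to install (assumed naming: trails[<extra>]).
    """
    try:
        return importlib.import_module(module)
    except ImportError as exc:
        raise TrailsError(
            f"{module!r} is not installed; try: pip install 'trails[{extra}]'"
        ) from exc


class ConstantEmbedder:
    """Satisfies Embedder structurally; no inheritance is required."""

    dim = 4

    def embed(self, text: str) -> list[float]:
        return [0.0] * self.dim
```

Because the protocol is structural, a third-party embedder only needs the right attributes to plug in, and the heavy dependency is imported inside the adapter rather than at package import time.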

Quickstart

from trails.vector import (
    MockEmbedder,
    SqliteVecStore,
    retrieve,
)

embedder = MockEmbedder(dim=16, seed=1)           # deterministic, for tests
store = SqliteVecStore(path=":memory:", dim=16)   # or path="vectors.db"

# Index a few chunks
chunks = [
    ("trails://app/Chunk/1", "Rails is a web framework"),
    ("trails://app/Chunk/2", "Elephants are large mammals"),
    ("trails://app/Chunk/3", "Ruby on Rails was released in 2004"),
]
for iri, text in chunks:
    store.add(
        id=iri,
        vector=embedder.embed(text),
        metadata={"iri": iri, "snippet": text},
    )

# Pure vector retrieval
hits = retrieve(
    "what is Rails",
    mode="vector",
    k=3,
    vector_store=store,
    embedder=embedder,
)
for h in hits:
    print(h.iri, h.score, h.snippet)

In production you will swap MockEmbedder for SentenceTransformerEmbedder (local, no API cost) or OpenAIEmbedder (API). The surface is identical.

Embedders

SentenceTransformerEmbedder — local default

from trails.vector import SentenceTransformerEmbedder

embedder = SentenceTransformerEmbedder(model="all-MiniLM-L6-v2")
embedder.dim   # 384

Requires sentence-transformers:

pip install 'trails[vector]'

all-MiniLM-L6-v2 is a small 384-dim model (roughly 22M parameters, about 80 MB on disk) that runs comfortably on CPU. For higher quality at the cost of size, try all-mpnet-base-v2 (420 MB, 768-dim).

OpenAIEmbedder — API

from trails.vector import OpenAIEmbedder

embedder = OpenAIEmbedder(model="text-embedding-3-small")

Requires the openai SDK:

pip install 'trails[vector-openai]'

When api_key=None, the API key is read from the OPENAI_API_KEY environment variable. Supply dim=512 (or any supported size) to request OpenAI's dimensionality-reduction feature on text-embedding-3-small / text-embedding-3-large.

MockEmbedder — tests

Deterministic pseudo-random embedder. embed(text) is a pure function of (text, seed), so tests can compare embeddings by value without touching any model:

from trails.vector import MockEmbedder

a = MockEmbedder(dim=16, seed=1)
b = MockEmbedder(dim=16, seed=1)
assert a.embed("hello") == b.embed("hello")  # deterministic
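One way to get an embedder that is a pure function of (text, seed) is to derive a random-number seed from a hash of both values. The sketch below is an assumption about how such an embedder could work, not the actual MockEmbedder implementation; `ToyMockEmbedder` is a hypothetical name.

```python
import hashlib
import random


class ToyMockEmbedder:
    """Hypothetical deterministic embedder for tests (not the Trails class)."""

    def __init__(self, dim: int, seed: int = 0) -> None:
        self.dim = dim
        self.seed = seed

    def embed(self, text: str) -> list[float]:
        # Hash (seed, text) with SHA-256 so the result is stable across
        # processes; Python's built-in hash() is salted per process.
        digest = hashlib.sha256(f"{self.seed}:{text}".encode("utf-8")).digest()
        rng = random.Random(int.from_bytes(digest[:8], "big"))
        return [rng.uniform(-1.0, 1.0) for _ in range(self.dim)]
```

The vectors carry no semantic meaning, so a test built on an embedder like this should assert on plumbing (ids, counts, ordering of exact duplicates) rather than on relevance.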

Anthropic?

Anthropic does not currently expose a first-party embedding endpoint (Anthropic's docs recommend Voyage AI, which ships through its own SDK). ADR-0019 §"Open questions" #5 tracks this; for now OpenAIEmbedder fills the "API embedder" slot, and a future VoyageEmbedder can be added without interface churn.

Vector stores

SqliteVecStore — zero-ops default

from trails.vector import SqliteVecStore

store = SqliteVecStore(path="app-vectors.db", dim=384)
# or path=":memory:" for ephemeral tests

Uses the sqlite-vec loadable extension — SQLite + a virtual table + zero ops. Included in the vector extra.

QdrantStore — scale adapter

from trails.vector import QdrantStore

store = QdrantStore(
    collection="docs-chunks",
    dim=384,
    url="http://localhost:6333",  # or host=..., port=...
)

Requires qdrant-client:

pip install 'trails[vector-qdrant]'

When neither url nor host is given, the adapter falls back to qdrant-client's in-process ":memory:" sentinel — handy for local development without a running Qdrant server.

Shared interface

Both stores implement the same contract:

| Method | Shape | Notes |
| --- | --- | --- |
| add(id, vector, metadata=None) | id: str, vector: list[float] | Upsert: re-adding the same id overwrites. |
| search(vector, k=10) | returns [{id, score, metadata}] | score is higher-is-better (similarity). |
| delete(id) | returns bool | True when a row was removed. |
| count() | returns int | Exact count. |
| dim | property | Vector dimension. |
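To make the contract concrete, here is a minimal in-memory store that follows the method shapes listed above. It is a sketch for illustration, not one of the shipped adapters: `InMemoryStore` is a hypothetical name, and the use of cosine similarity and of a ValueError on dimension mismatch are assumptions.

```python
import math
from typing import Optional


def _cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Hypothetical brute-force store with the same surface as the table."""

    def __init__(self, dim: int) -> None:
        self._dim = dim
        self._rows: dict[str, tuple[list[float], dict]] = {}

    @property
    def dim(self) -> int:
        return self._dim

    def add(self, id: str, vector: list[float], metadata: Optional[dict] = None) -> None:
        if len(vector) != self._dim:
            raise ValueError(f"expected {self._dim} dimensions, got {len(vector)}")
        # Upsert: writing an existing id replaces its vector and metadata.
        self._rows[id] = (list(vector), dict(metadata or {}))

    def search(self, vector: list[float], k: int = 10) -> list[dict]:
        hits = [
            {"id": id_, "score": _cosine(vector, stored), "metadata": meta}
            for id_, (stored, meta) in self._rows.items()
        ]
        hits.sort(key=lambda hit: hit["score"], reverse=True)  # higher is better
        return hits[:k]

    def delete(self, id: str) -> bool:
        return self._rows.pop(id, None) is not None

    def count(self) -> int:
        return len(self._rows)
```

A store like this scans every row on each search, so it is only suitable for unit tests or very small corpora.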

retrieve() — one entry point, three modes

from trails.vector import retrieve

hits = retrieve(
    query,
    ctx=ctx,                   # trails.context.Context
    mode="hybrid",             # "graph" | "vector" | "hybrid"
    k=10,
    sparql_filter=None,        # optional SPARQL narrow
    vector_store=store,
    embedder=embedder,
)

mode="graph" — SPARQL only

Runs sparql_filter against ctx.kg and returns the IRIs. This mode exists for uniformity, so that an app which only sometimes wants graph-only results can keep a single call site; calling ctx.kg.query(...) directly is equally valid.

mode="vector" — similarity only

Embeds query and asks the store for its top k. No graph interaction. Requires vector_store and embedder.

mode="hybrid" — SPARQL narrow + vector rerank

  1. Run sparql_filter to produce a candidate IRI set.
  2. Embed query and search the vector store (over-fetching to compensate for filter drop-out).
  3. Keep only hits whose metadata["iri"] is in the candidate set.
  4. Return the top k.

This is the load-bearing path for compliance-shaped apps — "find evidence supporting claim X, ranked by semantic similarity, but only from documents created after 2024-01-01". The ADR-0019 sketch runs verbatim:

hits = retrieve(
    "evidence that the defendant was present at the scene",
    ctx=ctx,
    mode="hybrid",
    k=20,
    sparql_filter="""
        SELECT ?iri WHERE {
            ?iri schema:isPartOf ?doc .
            ?doc schema:dateCreated ?d .
            FILTER (?d >= "2024-01-01"^^xsd:date)
        }
    """,
    vector_store=store,
    embedder=embedder,
)
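Steps 2 to 4 of the hybrid path can be sketched as a small function. This is a simplified illustration under stated assumptions, not the retrieve() implementation: `hybrid_rerank`, the `overfetch` multiplier, and `fake_search` are hypothetical, the candidate IRI set is assumed to have been produced by the SPARQL step already, and the query is assumed to be embedded by the caller.

```python
from typing import Callable


def hybrid_rerank(
    query_vector: list[float],
    candidate_iris: set[str],
    search: Callable[[list[float], int], list[dict]],
    k: int = 10,
    overfetch: int = 4,
) -> list[dict]:
    """Over-fetch from the store, keep hits inside the candidate set, cut to k."""
    if not candidate_iris:
        return []
    # Ask for more than k results because some will be dropped by the filter.
    raw_hits = search(query_vector, k * overfetch)
    kept = [h for h in raw_hits if h["metadata"].get("iri") in candidate_iris]
    return kept[:k]


def fake_search(vector: list[float], k: int) -> list[dict]:
    """Stand-in for store.search: returns a fixed, already-ranked list."""
    ranked = [
        {"id": "c1", "score": 0.9, "metadata": {"iri": "c1"}},
        {"id": "c2", "score": 0.8, "metadata": {"iri": "c2"}},
        {"id": "c3", "score": 0.7, "metadata": {"iri": "c3"}},
        {"id": "c4", "score": 0.6, "metadata": {"iri": "c4"}},
    ]
    return ranked[:k]
```

With the candidate set {"c2", "c4"} and k=1, an over-fetch factor of 4 lets c2 through, while a factor of 1 fetches only c1 and returns nothing. That is the drop-out the over-fetch is meant to compensate for.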

Why filter-then-rank, not rank-then-filter?

We narrow with SPARQL first, then score. The alternative — rank-then-filter (take the vector top-N, intersect with the SPARQL set) — is tempting because it pushes the heavy lifting into the ANN index, but it loses recall catastrophically whenever the candidate set is small relative to the corpus: the top-N from the whole corpus may not intersect the narrow candidate set at all. Filter-then-rank is correct in both regimes.
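The recall argument can be checked on toy data. The sketch below uses made-up similarity scores and hypothetical function names; it only illustrates the two orderings and says nothing about how Trails implements them.

```python
def rank_then_filter(
    scores: dict[str, float], candidates: set[str], n: int, k: int
) -> list[str]:
    """Take the global top-n by score, then intersect with the candidate set."""
    top_n = sorted(scores, key=scores.get, reverse=True)[:n]
    return [doc for doc in top_n if doc in candidates][:k]


def filter_then_rank(
    scores: dict[str, float], candidates: set[str], k: int
) -> list[str]:
    """Restrict to the candidate set first, then rank what is left."""
    allowed = [doc for doc in candidates if doc in scores]
    return sorted(allowed, key=scores.get, reverse=True)[:k]


# Toy corpus of 1000 documents; doc-0 is the closest match, doc-999 the farthest.
scores = {f"doc-{i}": 1.0 - i / 1000 for i in range(1000)}
# A narrow candidate set whose members all rank well outside the global top 10.
candidates = {"doc-500", "doc-700", "doc-900"}

lossy = rank_then_filter(scores, candidates, n=10, k=3)
complete = filter_then_rank(scores, candidates, k=3)
```

In this setup `lossy` is empty because none of the three candidates appears in the global top 10, whereas `complete` contains all three in score order.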

ADR-0019 flags reciprocal-rank-fusion (RRF) as a follow-up — once we have real compliance data to tune against, we will ship it under a fusion= kwarg, without changing the current default.
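For readers unfamiliar with RRF, the standard formula scores each item as the sum of 1 / (c + rank) over every ranking it appears in. The sketch below shows that formula only; the function name, the `rrf_k` parameter, and the name-based tie-break are assumptions, and nothing here reflects how a future fusion= option would be wired.

```python
def reciprocal_rank_fusion(
    rankings: list[list[str]], rrf_k: int = 60
) -> list[str]:
    """Fuse several ranked lists of ids into one list, best first.

    rrf_k is the smoothing constant from the RRF formula (60 is the
    value commonly used in the literature); it is not a result count.
    """
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            fused[item] = fused.get(item, 0.0) + 1.0 / (rrf_k + rank)
    # Sort by descending fused score; break ties by id for determinism.
    return sorted(fused, key=lambda item: (-fused[item], item))
```

For example, a graph-derived ordering and a vector-derived ordering could be passed as two rankings, and items that place well in both would rise to the top of the fused list.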

Back-references to ingestion

The store holds only (id, vector, metadata) — it does not know anything about the KG. The hybrid path honours a single convention: a metadata["iri"] entry back-references the graph IRI. When you wire trails.ingest to trails.vector you should index one vector per chunk and set metadata["iri"] to the chunk's minted IRI:

for chunk in document.chunks:
    store.add(
        id=chunk.iri,                    # reusing the chunk IRI as the store id is fine
        vector=embedder.embed(chunk.text),
        metadata={
            "iri": chunk.iri,            # back-reference
            "doc_iri": document.iri,
            "snippet": chunk.text[:280],
        },
    )

metadata["snippet"] and metadata["text"] are both picked up by RetrievalHit.snippet, so the default result shape already has a preview ready.
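The mapping from a raw store hit to a result object can be pictured roughly as follows. This is a guess at the shape, not the RetrievalHit source: `ToyHit` and `to_hit` are hypothetical, and both the "snippet before text" precedence and the fallback from metadata["iri"] to the store id are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyHit:
    """Hypothetical result shape with the three fields used in the quickstart."""

    iri: str
    score: float
    snippet: Optional[str] = None


def to_hit(raw: dict) -> ToyHit:
    """Build a result from a store hit of the form {id, score, metadata}."""
    meta = raw.get("metadata") or {}
    return ToyHit(
        # Assumed fallback: use the store id when no back-reference is set.
        iri=meta.get("iri", raw["id"]),
        score=raw["score"],
        # Assumed precedence: prefer "snippet", then fall back to "text".
        snippet=meta.get("snippet") or meta.get("text"),
    )
```

Storing a short preview at index time keeps result rendering independent of the graph, at the cost of duplicating a little text in the vector store.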

Testing

Use MockEmbedder together with trails.testing.fresh_context / trails.testing.isolated_kernel:

from trails.testing import fresh_context, isolated_kernel
from trails.vector import MockEmbedder, SqliteVecStore, retrieve

def test_my_retrieval():
    with isolated_kernel():
        ctx = fresh_context()
        emb = MockEmbedder(dim=16, seed=1)
        store = SqliteVecStore(path=":memory:", dim=16)
        # ... index + assert ...

See python/tests/test_vector.py for the full test surface.

ADR composition

  • ADR-0019 — the app surface that added this module.
  • ADR-0007 — Oxigraph remains the default triple store; vector adapters are orthogonal.
  • ADR-0005 — registering retrieval as a capability (with cost + shape) arrives in a follow-up; the surface here is deliberately agnostic.
  • ADR-0012 — embedding / retrieval cost wiring will land alongside the capability manifest.