Knowledge Base
A RAG layer over your repos, docs, issues, and PRs — Meilisearch for instant full-text search, Qdrant for semantic vector search, Reciprocal Rank Fusion to combine them, and an optional reranker on top.
The knowledge base is ProxifAI’s retrieval layer. Every chat mode draws on it, every @-mention pulls from it, and every agent that needs grounding hits it before the model. It’s a hybrid retrieval system — Meilisearch for full-text speed, Qdrant for vector semantics, Reciprocal Rank Fusion to merge them, and an optional HuggingFace TEI reranker for the final ordering. Source: internal/knowledgebase/.
How it fits together
```
Source content       Ingestion                                Stores              Retrieval
──────────────       ─────────                                ──────              ─────────
git pushes   ─┐                                           Meilisearch        /api/v1/kb/search
docs saved   ─┤ → extraction → chunks ──────────→ index ─►  (instant)               ↑
issues / PRs ─┤                  │                           Qdrant                 │
@mentions    ─┘                  └─ embedding (TEI) → index ─► (semantic)           │
                                                                                    ↓
                                                                               ChatHandler
                                                                               ToolsForMode
                                                                               MCP / @-mentions
```
The pipeline is opt-in via KB_ENABLED=true plus connection details for Qdrant and Meilisearch. Without it, ProxifAI works fine but chat retrieval falls back to whatever the chat agent’s other tools (forge read, code intelligence, MCP) can surface.
Three search modes
POST /api/v1/kb/search accepts a mode field. Implementation in search/search.go.
| Mode | Engine | When |
|---|---|---|
| instant | Meilisearch full-text + typo tolerance | Exact keywords, identifiers, file paths |
| semantic | Qdrant vector similarity | Conceptual queries where wording differs from source |
| hybrid (default) | Both, run in parallel + Reciprocal Rank Fusion | Most queries — gets best of both |
When mode is omitted, hybrid is used. RRF combines the instant and semantic ranks via score = sum(1 / (k + rank_in_each_engine)) (default k=60). The hybrid path is a single round-trip to the API — both engines are queried in parallel, and the results are merged, deduplicated by chunk ID, and truncated to the requested limit.
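For intuition, here is a minimal sketch of the fusion step in Go. It is not the code in search/search.go; the function name, chunk IDs, and the standalone main are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfFuse merges two ranked lists of chunk IDs using Reciprocal Rank Fusion:
// score(id) = sum over engines of 1 / (k + rank_in_engine), ranks starting at 1.
// Deduplication by chunk ID falls out of accumulating scores in a map.
func rrfFuse(instant, semantic []string, k float64) []string {
	scores := map[string]float64{}
	for rank, id := range instant {
		scores[id] += 1.0 / (k + float64(rank+1))
	}
	for rank, id := range semantic {
		scores[id] += 1.0 / (k + float64(rank+1))
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	// Highest fused score first.
	sort.Slice(ids, func(i, j int) bool { return scores[ids[i]] > scores[ids[j]] })
	return ids
}

func main() {
	// chk_b is mid-ranked by full-text but top-ranked by vector search,
	// so with k=60 it wins the fused ordering: [chk_b chk_a chk_d chk_c].
	fmt.Println(rrfFuse(
		[]string{"chk_a", "chk_b", "chk_c"}, // instant (Meilisearch) order
		[]string{"chk_b", "chk_d"},          // semantic (Qdrant) order
		60,
	))
}
```

A chunk that both engines surface collects two reciprocal-rank contributions, which is why hybrid usually beats either engine alone.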
Reranking
If a RerankerClient is configured (HuggingFace TEI, default model bge-reranker), the merged result list is reranked before being returned. The reranker scores each (query, chunk) pair end-to-end and bumps relevant chunks to the top — the trade-off is a ~50–200 ms latency hit. Disable by leaving the reranker URL unset.
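For reference, a TEI reranker typically exposes a POST /rerank route that takes the query plus candidate texts; the shape below is illustrative and the exact fields depend on your TEI version:

```json
{ "query": "auth middleware jwt", "texts": ["candidate chunk A", "candidate chunk B"] }
```

The response pairs each candidate's index with a relevance score, and those scores are what reorder the merged chunk list before it is returned.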
Embeddings
Vector embeddings come from a HuggingFace Text Embeddings Inference instance (embedding/client.go).
| Setting | Default (local dev) | Default (production) |
|---|---|---|
| Model | BAAI/bge-small-en-v1.5 | BAAI/bge-m3 |
| Vector size | 384 | 1024 |
| Configurable via | EMBEDDING_MODEL + VECTOR_SIZE | same |
bge-m3 is multilingual + supports both dense and sparse vectors; bge-small-en-v1.5 is the lightweight English-only choice for laptops and CI. Pick based on the corpus you index — switching mid-deployment requires re-embedding everything (vector size mismatch will reject queries).
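For reference, a TEI embedding request returns one vector per input, and the length of that vector is what VECTOR_SIZE must match; the request shape below is illustrative and exact fields depend on your TEI version:

```json
{ "inputs": "where do we validate JWT bearer tokens?" }
```

With bge-small-en-v1.5 the returned vector has 384 floats; with bge-m3 it has 1024, which is why the two models cannot share a Qdrant collection.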
Ingestion
Content is pushed onto the KNOWLEDGE JetStream stream under kb.ingest.> subjects (ingestion/). The kb-worker subscribes, extracts text per source type, chunks it, embeds each chunk, and writes the results to both the Meilisearch and Qdrant indexes atomically.
Source types ingested:
| Source | Trigger | Chunk strategy |
|---|---|---|
| Code | git.push events | Per-symbol via tree-sitter; falls back to fixed-window for unsupported languages |
| Documents | Save in the docs UI | Heading-aware splitting |
| Issues | issue.* events | Title + description + comments as a single chunk per issue |
| Pull requests | pr.* events | Title + body + diff summary + reviews per chunk |
| Commits | git.push events | Commit message + files changed |
Re-ingestion is incremental — only changed files are re-embedded on each push. The same pipeline is exposed via POST /api/v1/kb/ingest for custom sources.
The ingestion path uses the LLM gateway for embedding calls when configured to do so. Token usage attribution flows through the same usage stream as chat completions.
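Custom producers can either call POST /api/v1/kb/ingest or publish onto the stream directly. Here is a minimal Go sketch of the latter; the subject suffix and the payload fields are illustrative assumptions, since the worker's real contract lives in ingestion/:

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	// Connect to the same NATS server the kb-worker uses (KB_NATS_URL).
	nc, err := nats.Connect("nats://localhost:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Any subject under kb.ingest.> is captured by the KNOWLEDGE stream.
	// The subject suffix and JSON fields below are made up for illustration.
	payload := []byte(`{"sourceType":"doc","sourceId":"runbooks/oncall.md","content":"Escalation steps for the on-call rotation."}`)
	if _, err := js.Publish("kb.ingest.doc", payload); err != nil {
		log.Fatal(err)
	}
	log.Println("queued for ingestion")
}
```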
Search responses
A search call returns chunks plus enough metadata to render a citation:
```json
{
  "query": "auth middleware jwt",
  "mode": "hybrid",
  "totalHits": 42,
  "queryTimeMs": 87,
  "results": [
    {
      "documentId": "doc_…",
      "chunkId": "chk_…",
      "title": "internal/auth/auth.go",
      "content": "func Middleware(next http.Handler) http.Handler { … }",
      "sourceType": "code",
      "sourceId": "github.com/your-org/repo:internal/auth/auth.go",
      "score": 0.87,
      "highlights": { "content": ["…validates <em>JWT</em> Bearer tokens…"] }
    },
    …
  ]
}
```
The chat agent receives results in this shape via the search_knowledge_base tool. The web UI’s chat surface renders sourceType + title as clickable citation chips below each AI message.
Cross-org isolation
Every chunk is tagged with its source orgId. Search queries always include the caller’s org as a filter — Meilisearch via a filterable attribute, Qdrant via a payload filter. There’s no shared knowledge across orgs even on a single deployment.
Within an org, results respect RBAC — chunks from repos a user doesn’t have read access to are filtered out before returning. The filter happens in the API layer, not the index, so an admin browsing the underlying Qdrant or Meilisearch directly sees everything.
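To make the isolation concrete, the Qdrant half of an org-scoped query applies a payload filter roughly like the one below (the request layout, the exact payload key spelling, and the truncated vector are assumptions for illustration; Meilisearch gets an equivalent expression on its filterable orgId attribute):

```json
{
  "vector": [0.012, -0.034, 0.081],
  "limit": 10,
  "filter": {
    "must": [
      { "key": "orgId", "match": { "value": "org_1234" } }
    ]
  }
}
```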
@-mentions
The chat input recognizes @-mention prefixes that pull specific entities into the conversation context as full payloads (not just chunks):
| Mention | Brings in |
|---|---|
| @<repo-name> | Repo summary + recent activity |
| @<file-path> | Full file contents at the current branch HEAD |
| @<issue-id> | Issue title, description, comments, linked PRs |
| @<pr-number> | PR title, body, diff, reviews, CI status |
| @<doc-title> | Full document body |
Mentions are an autocomplete affordance in the chat input — typing @ opens a fuzzy filter over the user’s recent + relevant entities. Internally, mentioned entities are appended to the query’s source list and bypass retrieval ranking (they’re guaranteed to land in context).
The @proxifai mention on issue and PR comments has a different purpose — it triggers the comment-trigger workflow that dispatches the named agent.
Configuration
| Env var | Default | Effect |
|---|---|---|
| KB_ENABLED | false | Master switch — when false, /api/v1/kb/* returns 404 |
| QDRANT_URL | required if KB enabled | Qdrant gRPC/HTTP endpoint |
| QDRANT_API_KEY | optional | Qdrant cloud auth |
| MEILISEARCH_URL | required if KB enabled | Meilisearch endpoint |
| MEILISEARCH_API_KEY | optional | Meilisearch master key |
| EMBEDDING_URL | required if KB enabled | TEI service URL |
| EMBEDDING_MODEL | BAAI/bge-small-en-v1.5 | Tag for ingest metadata |
| VECTOR_SIZE | 384 | Must match the model |
| RERANKER_URL | optional | TEI reranker; if unset, no reranking |
| KB_NATS_URL | inherits | NATS for ingestion stream |
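A minimal local-dev configuration might look like this (hosts and ports are examples; only the variable names come from the table above):

```
KB_ENABLED=true
QDRANT_URL=http://localhost:6333
MEILISEARCH_URL=http://localhost:7700
EMBEDDING_URL=http://localhost:8080
EMBEDDING_MODEL=BAAI/bge-small-en-v1.5
VECTOR_SIZE=384
# RERANKER_URL left unset, so results are returned without reranking
```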
REST endpoints
| Method · Path | Purpose |
|---|---|
| POST /api/v1/kb/search | Search — {query, mode, sourceType, limit, offset} |
| POST /api/v1/kb/ingest | Manually queue a chunk for ingestion |
| GET /api/v1/kb/sources | List ingested source types and counts |
| GET /api/v1/kb/health | Embedding service + index health |
| DELETE /api/v1/kb/documents/{id} | Remove a document and its chunks |
pfai kb search, pfai kb sources, and pfai kb ingest wrap the most common ones.
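For example, a hybrid search scoped to code chunks sends a body like this to POST /api/v1/kb/search (field names from the table above; values are illustrative):

```json
{
  "query": "auth middleware jwt",
  "mode": "hybrid",
  "sourceType": "code",
  "limit": 10,
  "offset": 0
}
```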
See also
The four chat modes that draw on this layer — and which tools each one gets.
Where embedding calls go through — provider routing for the embedding model itself.
A separate graph layer for blast-radius and community detection — complementary, not duplicate.
Listen for git.push / issue.created events to drive your own ingestion pipelines.