Embeddings + Vector Search
- Dense Retrieval
- Bi-Encoder
- ANN
Best for: Semantic search and retrieval Aliases: Dense Retrieval, Bi-Encoder, ANN
How it works
$$\text{sim}(q,d)=\frac{e_q^\top e_d}{\|e_q\|\,\|e_d\|}$$A bi-encoder (e.g. a sentence Transformer) maps each query and document into a dense vector $e\in\mathbb{R}^d$, and ranking reduces to cosine similarity $\text{sim}(q,d)=\frac{e_q^\top e_d}{\|e_q\|\|e_d\|}$ (or dot product) over an approximate-nearest-neighbour index such as HNSW or IVF-PQ. Training uses a contrastive InfoNCE loss $-\log\frac{\exp(e_q^\top e_d^+/\tau)}{\sum_d\exp(e_q^\top e_d/\tau)}$ so that related text lands nearby in the embedding space. A cross-encoder re-ranker then refines the top-$k$ candidates for higher precision.
When to use
Semantic search, RAG, deduplication, and recommendation over large document or item catalogs.
Watch out
Embedding model choice dominates quality; re-ranking matters; index tuning (HNSW/IVF) drives latency.
Common fields
RAG · document search · recommendations