Hybrid Retrieval
This guide explores hybrid retrieval, one of the most effective techniques for improving RAG performance. You will learn how to combine semantic (dense) and keyword (sparse) search, implement multi-strategy pipelines, add reranking, integrate everything into LangGraph, and apply production best practices, with code examples throughout.
What Is Hybrid Retrieval?
Hybrid Retrieval combines dense vector search (semantic similarity via embeddings) with sparse keyword search (BM25 or TF-IDF) to leverage the strengths of both approaches. It then fuses the results using algorithms like Reciprocal Rank Fusion (RRF) or weighted scoring.
This addresses the core weaknesses of pure vector search (missing exact terms, product codes, names) and pure keyword search (no understanding of synonyms or intent).
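To make the weighted-scoring idea concrete, here is a minimal sketch in plain Python. It assumes each retriever returns a dictionary of document IDs to raw scores; the function names, the sample scores, and the 0.7 dense weight are illustrative assumptions, not part of any library. Because dense and sparse scores live on different scales, the sketch min-max normalizes each set before mixing them.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so dense and sparse scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal: treat every document as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_fusion(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    dense_weight: float = 0.7,  # illustrative value, tune on your data
) -> List[Tuple[str, float]]:
    """Combine normalized dense and sparse scores into one ranking."""
    dense_n = min_max_normalize(dense)
    sparse_n = min_max_normalize(sparse)
    fused = {}
    for doc_id in set(dense_n) | set(sparse_n):
        # A document missing from one retriever contributes 0 from that side.
        fused[doc_id] = (
            dense_weight * dense_n.get(doc_id, 0.0)
            + (1.0 - dense_weight) * sparse_n.get(doc_id, 0.0)
        )
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Made-up scores: cosine similarities on the dense side, BM25 on the sparse side.
dense_scores = {"doc_a": 0.82, "doc_b": 0.79, "doc_c": 0.40}
sparse_scores = {"doc_b": 12.5, "doc_d": 9.0, "doc_c": 3.0}
ranking = weighted_fusion(dense_scores, sparse_scores)
```

In this toy data, doc_b ends up first because it scores well in both lists, which is the behavior hybrid retrieval is after.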
Semantic Search vs Keyword Search
| Aspect | Semantic (Dense) | Keyword (Sparse / BM25) |
|---|---|---|
| Strength | Meaning, synonyms, intent | Exact matches, rare terms |
| Weakness | Poor on proper nouns/codes | No semantic understanding |
| Speed | Fast with ANN | Very fast |
| Best For | Natural language questions | Technical docs, error codes |
Combining BM25 with Vector Search
from langchain_community.retrievers import BM25Retriever
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain.retrievers import EnsembleRetriever

# `chunks` is assumed to be a list of LangChain Document objects
# produced by your text splitter.

# Vector (semantic) retriever
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(chunks, embeddings)
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 8})

# Keyword (BM25) retriever; requires the rank_bm25 package
bm25_retriever = BM25Retriever.from_documents(chunks)
bm25_retriever.k = 8

# Hybrid ensemble
hybrid_retriever = EnsembleRetriever(
    retrievers=[vector_retriever, bm25_retriever],
    weights=[0.7, 0.3],  # Tune based on your data
)
Multi-Strategy Retrieval
You can combine more than two retrievers:
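from langchain.retrievers import EnsembleRetriever

# `multi_query_retriever` is assumed to be a third retriever you have already
# built, for example a MultiQueryRetriever wrapping the vector retriever.
ensemble = EnsembleRetriever(
    retrievers=[vector_retriever, bm25_retriever, multi_query_retriever],
    weights=[0.6, 0.25, 0.15],
)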
Hybrid Ranking Systems
Reciprocal Rank Fusion (RRF) is a widely used fusion method: it rewards documents that rank highly in multiple retrievers and works on ranks alone, so it needs no score normalization. LangChain's EnsembleRetriever fuses results with a weighted form of RRF.
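The fusion step itself is small enough to sketch in plain Python. The version below is an illustration of weighted RRF, not LangChain's internal code: each document earns `weight / (k + rank)` from every ranked list it appears in, and the contributions are summed. The function name and sample rankings are assumptions; `k = 60` is the constant commonly used with RRF.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[List[str]],
    weights: Optional[Sequence[float]] = None,
    k: int = 60,
) -> Dict[str, float]:
    """Fuse several ranked lists of document IDs into one scored ranking."""
    if weights is None:
        weights = [1.0] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("Provide exactly one weight per ranking")

    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        # Ranks start at 1, so the top document contributes weight / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)

    # Return documents ordered from highest to lowest fused score.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Made-up rankings from a vector retriever and a BM25 retriever.
vector_ranking = ["doc_a", "doc_b", "doc_c"]
bm25_ranking = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([vector_ranking, bm25_ranking])
```

Only rank positions enter the formula, which is why cosine similarities and BM25 scores can be fused without rescaling either one.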
Metadata + Semantic Retrieval
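# Chroma needs $and to combine conditions and compares $gte on numbers,
# so the date is assumed to be stored as an integer (YYYYMMDD).
vector_retriever = vectorstore.as_retriever(
    search_kwargs={
        "k": 10,
        "filter": {"$and": [{"category": "technical"}, {"date": {"$gte": 20250101}}]},
    }
)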
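Combining metadata filtering with hybrid retrieval is a strong fit for enterprise search. Note that this filter only affects the vector retriever: BM25Retriever builds an in-memory index with no metadata filter, so filter the chunks before indexing if the keyword side needs the same restriction.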
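One way to keep the keyword side consistent with a metadata filter is to select the matching chunks before building the BM25 index. The sketch below uses a hypothetical `Chunk` dataclass as a stand-in for LangChain's Document, and it assumes dates are stored as YYYYMMDD integers; both are illustrative choices rather than anything the library requires.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Chunk:
    """Hypothetical stand-in for a document chunk with metadata."""
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def filter_chunks(
    chunks: List[Chunk], predicate: Callable[[Dict[str, Any]], bool]
) -> List[Chunk]:
    """Keep only the chunks whose metadata satisfies the predicate."""
    return [chunk for chunk in chunks if predicate(chunk.metadata)]


def is_recent_technical(meta: Dict[str, Any]) -> bool:
    # Assumes `date` is an integer in YYYYMMDD form.
    return meta.get("category") == "technical" and meta.get("date", 0) >= 20250101


all_chunks = [
    Chunk("Error E4012: pool exhausted", {"category": "technical", "date": 20250310}),
    Chunk("Legacy install notes", {"category": "technical", "date": 20230115}),
    Chunk("Holiday schedule", {"category": "hr", "date": 20250201}),
]
filtered = filter_chunks(all_chunks, is_recent_technical)
```

With real LangChain documents, the same predicate can be applied to each document's `metadata` before passing the surviving list to `BM25Retriever.from_documents`.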
Dense vs Sparse Retrieval
- Dense: Embeddings (high-dimensional vectors)
- Sparse: BM25, SPLADE, or TF-IDF (term-based)
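The difference is easier to see side by side. The following sketch implements cosine similarity for the dense case and a compact Okapi BM25 scorer for the sparse case, in plain Python. The two-dimensional vectors are invented for illustration (real embeddings have hundreds or thousands of dimensions), and `k1 = 1.5`, `b = 0.75` are common BM25 defaults rather than values taken from this guide.

```python
import math
from collections import Counter
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Dense scoring: the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def bm25_scores(
    query_terms: List[str], corpus: List[List[str]], k1: float = 1.5, b: float = 0.75
) -> List[float]:
    """Sparse scoring: Okapi BM25 over tokenized documents."""
    if not corpus:
        return []
    n_docs = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in corpus for term in set(doc))
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue  # a term absent from the document adds nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


corpus = [
    "error E4012 raised when the connection pool is exhausted".split(),
    "the database could not accept more clients".split(),
    "how to bake sourdough bread".split(),
]
sparse = bm25_scores(["E4012"], corpus)

# Invented 2-d "embeddings": the first two documents are about databases.
doc_vectors = [[0.8, 0.2], [0.85, 0.15], [0.05, 0.95]]
query_vector = [0.9, 0.1]
dense = [cosine_similarity(query_vector, vec) for vec in doc_vectors]
```

BM25 gives a non-zero score only to the document containing the exact token `E4012`, while the dense scores rate both database documents as close to the query even though only one of them contains the code.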
Query Expansion Techniques
Improve both retrievers with:
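from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain.retrievers import MultiVectorRetriever
from langchain_openai import ChatOpenAI

# MultiQuery: an LLM rewrites the question into several variants (model name is an assumption)
llm = ChatOpenAI(model="gpt-4o-mini")
multi_query_retriever = MultiQueryRetriever.from_llm(retriever=vector_retriever, llm=llm)

# HyDE (Hypothetical Document Embeddings)
hyde_prompt = "Write a hypothetical answer to this question..."
# Then embed the hypothetical answer instead of the raw query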
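The HyDE flow can be sketched without any external service by passing the LLM and the embedding model in as functions. Everything named below (`hyde_retrieve`, `fake_generate`, `toy_embed`, the vocabulary, and the prompt wording) is a hypothetical stand-in for illustration; in practice `generate` would call a chat model and `embed` would call your embedding model.

```python
import math
import re
from typing import Callable, Dict, List, Tuple

VOCAB = ["refund", "return", "days", "shipping", "tracking"]


def toy_embed(text: str) -> List[float]:
    """Toy bag-of-words embedding over a tiny fixed vocabulary."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [float(tokens.count(word)) for word in VOCAB]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hyde_retrieve(
    question: str,
    generate: Callable[[str], str],
    embed: Callable[[str], List[float]],
    index: Dict[str, List[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Embed a generated hypothetical answer instead of the raw question."""
    hypothetical = generate(f"Write a short passage that answers: {question}")
    query_vec = embed(hypothetical)
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def fake_generate(prompt: str) -> str:
    # Stand-in for an LLM call; returns a fixed hypothetical answer.
    return "You can return the item for a refund within 30 days."


index = {
    "policy": toy_embed("refund return within 30 days"),
    "shipping": toy_embed("shipping tracking number"),
}
question = "Can I get my money back?"
results = hyde_retrieve(question, fake_generate, toy_embed, index)
```

The raw question shares no vocabulary with the policy document, so embedding it directly would match nothing in this toy setup. The hypothetical answer uses the document's own wording, which is the gap HyDE is meant to close.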
Reranking Retrieved Documents
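Reranking is usually the most powerful upgrade after hybrid retrieval: a second-stage model rescores the fused candidates against the query and keeps only the best few.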
from langchain_cohere import CohereRerank  # the langchain.retrievers import path is deprecated
from langchain.retrievers import ContextualCompressionRetriever

# Requires COHERE_API_KEY in the environment
compressor = CohereRerank(model="rerank-english-v3.0", top_n=5)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=hybrid_retriever,
)

# Alternative: open-source BGE reranker (runs locally)
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder

model = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-large")
# Pass this compressor as base_compressor above to use it instead of Cohere
compressor = CrossEncoderReranker(model=model, top_n=6)
Recommended rerankers in 2026: Cohere Rerank 3, Voyage Rerank, BGE-reranker-large.
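Whichever reranker you choose, the second stage follows the same pattern: score every (query, candidate) pair, sort, and keep the top few. Here is a library-free sketch of that pattern. The `toy_overlap_scorer` is a deliberately crude stand-in for a cross-encoder, and the candidate texts are invented; only the control flow is the point.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    candidates: List[str],
    score_pair: Callable[[str, str], float],
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Score each candidate against the query and keep the best top_n."""
    scored = [(doc, score_pair(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def toy_overlap_scorer(query: str, doc: str) -> float:
    """Stand-in for a cross-encoder: Jaccard overlap of lowercase tokens."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    union = q_tokens | d_tokens
    return len(q_tokens & d_tokens) / len(union) if union else 0.0


# Pretend these came back from the hybrid retriever, in fused order.
candidates = [
    "bread recipes for beginners",
    "how to set breakpoints in langgraph",
    "langgraph supports breakpoints for debugging",
]
top = rerank("langgraph breakpoints", candidates, toy_overlap_scorer, top_n=2)
```

Because the scorer runs once per candidate, latency grows with the size of the candidate list, which is why the first stage should hand over a bounded number of documents.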
Hybrid Retrieval in LangGraph
import operator
from typing import Annotated, List, TypedDict

from langchain_core.documents import Document
from langgraph.graph import StateGraph

class State(TypedDict):
    messages: Annotated[list, operator.add]
    context: List[Document]

def hybrid_retrieve(state: State):
    # Use the latest message as the search query
    query = state["messages"][-1].content
    docs = compression_retriever.invoke(query)  # hybrid + rerank
    return {"context": docs}

# Add to your graph
graph = StateGraph(State)
graph.add_node("retrieve", hybrid_retrieve)
Performance and Accuracy Tradeoffs
- Hybrid + rerank often improves retrieval metrics by 10-30% or more, though the gain depends on your corpus and queries
- Latency increases, since the reranker scores each of the top 20-50 candidates
- Cost increases slightly with API rerankers
- Start with hybrid only → add reranking for high-value queries
Common Hybrid Retrieval Mistakes
- Wrong weights (default 0.5/0.5 often suboptimal)
- Using BM25 on poorly tokenized text (e.g., non-English)
- No reranking after fusion
- Ignoring metadata filtering
- Not tuning fetch_k and top_n
- Over-relying on one retriever
- No evaluation against ground truth
- Using hybrid everywhere instead of routing by query type
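Several of these mistakes come down to not measuring. As a starting point, the sketch below computes recall@k and mean reciprocal rank (MRR) against a hand-labeled ground truth. It is a plain-Python illustration, not the RAGAS metrics, and the queries, document IDs, and function names are made up for the example.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents found in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(
    runs: Dict[str, List[str]], ground_truth: Dict[str, Set[str]], k: int = 5
) -> Dict[str, float]:
    """Average both metrics over every query in the ground truth."""
    if not ground_truth:
        return {"recall@k": 0.0, "mrr": 0.0}
    recalls, ranks = [], []
    for query, relevant in ground_truth.items():
        retrieved = runs.get(query, [])  # a missing query counts as a miss
        recalls.append(recall_at_k(retrieved, relevant, k))
        ranks.append(reciprocal_rank(retrieved, relevant))
    n = len(ground_truth)
    return {"recall@k": sum(recalls) / n, "mrr": sum(ranks) / n}


ground_truth = {"q1": {"doc_b", "doc_d"}, "q2": {"doc_x"}}
runs = {"q1": ["doc_a", "doc_b", "doc_c"], "q2": ["doc_x", "doc_y", "doc_z"]}
metrics = evaluate(runs, ground_truth, k=3)
```

Running the same evaluation for vector-only, BM25-only, and hybrid configurations makes it possible to compare weights and `k` values on evidence rather than intuition.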
Best Practices for Hybrid Retrieval
- Always combine BM25 + Vector as baseline
- Use RRF or learnable fusion when possible
- Add cross-encoder reranking on the top 20-50 results
- Tune weights based on your domain (more BM25 for technical docs)
- Enrich metadata and filter aggressively
- Implement query expansion (MultiQuery, HyDE)
- Evaluate with context_precision/recall (RAGAS)
- Route queries: simple factual → hybrid, complex → agentic multi-hop
- Monitor retrieval quality in production with user feedback
- Abstract your hybrid retriever for easy experimentation
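Weight tuning and query routing can start from a very simple rule before anything learned is introduced. The sketch below is one hypothetical heuristic: if the query contains a code-like token (letters mixed with digits) or a quoted phrase, shift weight toward BM25; otherwise favor the vector retriever. The regular expression and both weight pairs are assumptions to be validated against your own evaluation set.

```python
import re
from typing import Tuple

# Hypothetical heuristic: a token of 3+ characters mixing letters and digits
# (for example an error code), or any double-quoted phrase.
CODE_LIKE = re.compile(r'\b(?=\w*\d)(?=\w*[A-Za-z])\w{3,}\b|"[^"]+"')


def choose_weights(query: str) -> Tuple[float, float]:
    """Return (vector_weight, bm25_weight) for the given query."""
    if CODE_LIKE.search(query):
        # Exact identifiers are where keyword search is strongest.
        return (0.4, 0.6)
    # Natural-language questions lean on semantic similarity.
    return (0.7, 0.3)


code_query_weights = choose_weights("Why does E4012 occur?")
plain_query_weights = choose_weights("How do I reset my password?")
```

The returned pair could be passed as the `weights` argument when constructing an ensemble for that query; a learned classifier can replace the regular expression later without changing the calling code.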
Pro Tip – Production Hybrid Retriever Class
from langchain.retrievers import ContextualCompressionRetriever, EnsembleRetriever
from langchain_chroma import Chroma
from langchain_cohere import CohereRerank
from langchain_community.retrievers import BM25Retriever
from langchain_openai import OpenAIEmbeddings

class ProductionHybridRetriever:
    def __init__(self, chunks, embeddings_model="text-embedding-3-small"):
        vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings(model=embeddings_model))
        self.vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 12})
        self.bm25_retriever = BM25Retriever.from_documents(chunks)
        self.bm25_retriever.k = 12
        self.ensemble = EnsembleRetriever(
            retrievers=[self.vector_retriever, self.bm25_retriever],
            weights=[0.65, 0.35],
        )
        # Add reranker (same model as in the reranking section; adjust as needed)
        self.reranker = CohereRerank(model="rerank-english-v3.0", top_n=6)
        self.final_retriever = ContextualCompressionRetriever(
            base_compressor=self.reranker,
            base_retriever=self.ensemble,
        )

    def invoke(self, query: str):
        return self.final_retriever.invoke(query)

# Usage
retriever = ProductionHybridRetriever(chunks)
docs = retriever.invoke("What are the latest LangGraph breakpoints?")
Hybrid Retrieval is often the single highest-leverage improvement you can make to any RAG system. When combined with good chunking, metadata, and reranking, it turns “okay” retrieval into production-grade reliability.