Hybrid Retrieval
This guide explores Hybrid Retrieval, one of the most effective techniques for improving RAG performance. You will learn how to combine semantic (dense) and keyword (sparse) search, implement multi-strategy pipelines, add reranking, integrate everything into LangGraph, and apply production best practices, with code examples throughout.

What Is Hybrid Retrieval?

Hybrid Retrieval combines dense vector search (semantic similarity via embeddings) with sparse keyword search (BM25 or TF-IDF) to leverage the strengths of both approaches. It then fuses the results using algorithms like Reciprocal Rank Fusion (RRF) or weighted scoring. This addresses the core weaknesses of pure vector search (missing exact terms, product codes, names) and pure keyword search (no understanding of synonyms or intent).

Aspect    | Semantic (Dense)            | Keyword (Sparse / BM25)
----------|-----------------------------|-----------------------------
Strength  | Meaning, synonyms, intent   | Exact matches, rare terms
Weakness  | Poor on proper nouns/codes  | No semantic understanding
Speed     | Fast with ANN               | Very fast
Best For  | Natural language questions  | Technical docs, error codes

from langchain_community.retrievers import BM25Retriever
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain.retrievers import EnsembleRetriever

# `chunks` is assumed to be a list of Document objects produced by your text splitter

# Vector (Semantic) Retriever
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(chunks, embeddings)
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 8})

# Keyword (BM25) Retriever (requires the rank_bm25 package)
bm25_retriever = BM25Retriever.from_documents(chunks)
bm25_retriever.k = 8

# Hybrid Ensemble
hybrid_retriever = EnsembleRetriever(
    retrievers=[vector_retriever, bm25_retriever],
    weights=[0.7, 0.3]          # Tune based on your data
)
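
The introduction also mentions weighted scoring as a fusion option. The sketch below illustrates that idea in plain Python, assuming each retriever exposes raw scores on its own scale. The function names and sample scores are made up for illustration and are not a LangChain API. Scores are min-max normalized per retriever, then combined with the same 0.7/0.3 weights used above.

```python
def min_max_normalize(scores: dict) -> dict:
    """Rescale one retriever's scores to [0, 1] so different scales are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally relevant
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_score_fusion(dense: dict, sparse: dict, w_dense=0.7, w_sparse=0.3):
    """Combine normalized dense and sparse scores into one ranked list."""
    dense_n = min_max_normalize(dense)
    sparse_n = min_max_normalize(sparse)
    fused = {}
    for doc_id in set(dense_n) | set(sparse_n):
        # A document missing from one retriever contributes 0 from that side
        fused[doc_id] = (w_dense * dense_n.get(doc_id, 0.0)
                         + w_sparse * sparse_n.get(doc_id, 0.0))
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative scores: cosine similarities (dense) and BM25 scores (sparse)
dense_scores = {"doc_a": 0.82, "doc_b": 0.79, "doc_c": 0.40}
sparse_scores = {"doc_b": 12.5, "doc_d": 9.1, "doc_c": 2.0}
ranking = weighted_score_fusion(dense_scores, sparse_scores)
```

Unlike rank-based fusion, this approach is sensitive to outlier scores, which is one reason rank-based methods are often preferred as a default.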

Multi-Strategy Retrieval

You can combine more than two retrievers:
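from langchain.retrievers import EnsembleRetriever

# multi_query_retriever is assumed to be a MultiQueryRetriever wrapping
# vector_retriever (built with MultiQueryRetriever.from_llm)
ensemble = EnsembleRetriever(
    retrievers=[vector_retriever, bm25_retriever, multi_query_retriever],
    weights=[0.6, 0.25, 0.15]   # one weight per retriever, in the same order
)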

Hybrid Ranking Systems

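Reciprocal Rank Fusion (RRF) is the most popular fusion method: it scores each document by summing 1 / (c + rank) over the retrievers that returned it (c is a small smoothing constant, commonly 60), so documents that rank highly in multiple retrievers rise to the top without any score normalization.
LangChain’s EnsembleRetriever uses a weighted form of RRF by default, where the weights scale each retriever's contribution.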
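
To make the fusion step concrete, here is a minimal sketch of weighted RRF in plain Python. It is not LangChain's source code: the function name, the sample rankings, and the constant c = 60 (a value commonly used for RRF) are assumptions of this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, weights=None, c=60):
    """Fuse several ranked lists of document IDs with weighted RRF.

    Each document receives weight / (c + rank) from every list it appears in,
    where rank starts at 1 for the top result.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("Provide exactly one weight per ranking")

    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (c + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative rankings from two retrievers (best result first)
vector_ranking = ["doc_a", "doc_b", "doc_c"]
bm25_ranking = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([vector_ranking, bm25_ranking])
```

Only ranks enter the formula, so cosine similarities and BM25 scores never have to be put on a common scale.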

Metadata + Semantic Retrieval

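# Filter syntax is store-specific. With Chroma, multiple conditions go under "$and" and
# range operators expect numbers, so "date" is assumed to be stored as a YYYYMMDD integer.
vector_retriever = vectorstore.as_retriever(
    search_kwargs={
        "k": 10,
        "filter": {"$and": [
            {"category": "technical"},
            {"date": {"$gte": 20250101}},
        ]},
    }
)
Combine metadata filtering with hybrid retrieval to narrow the candidate set before ranking, which is especially useful for enterprise search.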
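
One caveat when pairing this with the ensemble: a filter passed through search_kwargs is applied by the vector store only, while an in-memory BM25 index knows nothing about it. A simple workaround is to pre-filter the chunks before building the BM25 retriever. The sketch below assumes dates are stored as YYYYMMDD integers and uses a small stand-in class instead of LangChain's Document so it runs on its own. `Chunk` and `filter_chunks` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Stand-in for a LangChain Document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def filter_chunks(chunks, category, min_date):
    """Keep chunks in the given category whose integer date is >= min_date."""
    return [
        c for c in chunks
        if c.metadata.get("category") == category
        and c.metadata.get("date", 0) >= min_date
    ]


all_chunks = [
    Chunk("Error E4012: pool exhausted", {"category": "technical", "date": 20250310}),
    Chunk("Q3 marketing plan", {"category": "marketing", "date": 20250401}),
    Chunk("Legacy install guide", {"category": "technical", "date": 20230115}),
]
technical_recent = filter_chunks(all_chunks, "technical", 20250101)
# technical_recent would then be passed to BM25Retriever.from_documents(...)
```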

Dense vs Sparse Retrieval

  • Dense: Embeddings (high-dimensional vectors)
  • Sparse: BM25, SPLADE, or TF-IDF (term-based)
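Many modern vector databases and search engines (Weaviate, Pinecone, Elasticsearch, Qdrant) now support native hybrid search, through built-in BM25 or sparse vectors, so fusion can happen inside the store instead of in application code.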
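
To show what "term-based" means in practice, here is a compact BM25 scorer in plain Python. It is a teaching sketch under simplifying assumptions (whitespace tokenization, lowercase only, common default parameters k1 = 1.5 and b = 0.75), not a replacement for a tuned library implementation. The sample documents are invented.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the BM25 formula."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Average document length; fall back to 1.0 if every document is empty
    avgdl = (sum(len(t) for t in tokenized) / n) or 1.0

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # no exact match, no contribution
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = [
    "error E4012 occurs when the connection pool is exhausted",
    "the service failed because too many clients were connected",
    "restart the worker to clear cached credentials",
]
scores = bm25_scores("E4012 connection", docs)
```

The second document describes the same problem but shares no exact token with the query ("connected" is not "connection"), so it scores zero. That is the gap dense retrieval is meant to cover.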

Query Expansion Techniques

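Improve both retrievers by expanding or rewriting the query before retrieval:
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI

# MultiQuery: an LLM rewrites the question into several variants and the results are merged
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)   # model name is only an example
multi_query_retriever = MultiQueryRetriever.from_llm(retriever=vector_retriever, llm=llm)

# HyDE (Hypothetical Document Embeddings)
hyde_prompt = "Write a hypothetical answer to this question..."
# Then embed the hypothetical answer instead of the raw query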
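
The HyDE flow can be sketched independently of any provider. In the example below, `generate` and `embed` are injected callables standing in for your LLM and embedding model. The function name, the prompt template, and the fake implementations are all assumptions made so the sketch runs without API keys.

```python
from typing import Callable, List

# Illustrative template; adapt the wording to your domain
HYDE_TEMPLATE = (
    "Write a short passage that answers the question.\n"
    "Question: {question}\n"
    "Passage:"
)


def hyde_query_vector(
    question: str,
    generate: Callable[[str], str],
    embed: Callable[[str], List[float]],
    template: str = HYDE_TEMPLATE,
) -> List[float]:
    """Embed an LLM-written hypothetical answer instead of the raw question."""
    prompt = template.format(question=question)
    hypothetical_answer = generate(prompt)
    return embed(hypothetical_answer)


# Fake components so the sketch is runnable offline
seen_prompts = []


def fake_generate(prompt: str) -> str:
    seen_prompts.append(prompt)
    return "Hybrid retrieval fuses BM25 and vector search results."


def fake_embed(text: str) -> List[float]:
    # Toy "embedding": character count and space count
    return [float(len(text)), float(text.count(" "))]


vector = hyde_query_vector("What is hybrid retrieval?", fake_generate, fake_embed)
```

The resulting vector is then used for the similarity search in place of the query embedding. The idea is that a hypothetical answer tends to lie closer to real answer passages than a short question does.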

Reranking Retrieved Documents

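Reranking is usually the most powerful upgrade after hybrid retrieval: a cross-encoder or rerank API rescores the fused candidates against the query and keeps only the best few.
# CohereRerank now lives in the langchain-cohere package; the older
# langchain.retrievers.document_compressors import path is deprecated
from langchain_cohere import CohereRerank
from langchain.retrievers import ContextualCompressionRetriever

compressor = CohereRerank(model="rerank-english-v3.0", top_n=5)

# Wrap the hybrid retriever so every result set is reranked before it is returned
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=hybrid_retriever
)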

# Alternative: Open-source BGE reranker
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder

model = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-large")
compressor = CrossEncoderReranker(model=model, top_n=6)
Recommended rerankers in 2026: Cohere Rerank 3, Voyage Rerank, BGE-reranker-large.
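
Conceptually, every reranker follows the same two-stage pattern: take a generous candidate list, score each (query, document) pair, and keep the top few. The sketch below shows that pattern with a pluggable scoring function. `token_overlap` is a deliberately crude stand-in for a real cross-encoder, and all names and sample texts are illustrative.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    candidates: List[str],
    score_pair: Callable[[str, str], float],
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Score every (query, candidate) pair and return the top_n by score."""
    scored = [(doc, score_pair(query, doc)) for doc in candidates]
    # Python's sort is stable, so ties keep their first-stage order
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def token_overlap(query: str, doc: str) -> float:
    """Toy relevance score: fraction of query tokens present in the document."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / len(q_tokens) if q_tokens else 0.0


candidates = [
    "LangGraph breakpoints pause a graph before a node runs",
    "Graphs are data structures made of nodes and edges",
    "Use breakpoints in LangGraph to inspect state",
]
top = rerank("langgraph breakpoints", candidates, token_overlap, top_n=2)
```

In a real pipeline, `score_pair` would call a cross-encoder model or a rerank API, which reads query and document together and is therefore more accurate, and slower, than first-stage retrieval.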

Hybrid Retrieval in LangGraph

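from typing import TypedDict, Annotated, List
import operator

from langchain_core.documents import Document
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    messages: Annotated[list, operator.add]   # new messages are appended
    context: List[Document]                   # overwritten by the retrieve node

def hybrid_retrieve(state: State):
    # Use the most recent message as the search query
    query = state["messages"][-1].content
    docs = compression_retriever.invoke(query)   # hybrid + rerank
    return {"context": docs}

# Add to your graph
builder = StateGraph(State)
builder.add_node("retrieve", hybrid_retrieve)
builder.add_edge(START, "retrieve")
builder.add_edge("retrieve", END)   # in a full RAG graph, route to your generation node instead
graph = builder.compile()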

Performance and Accuracy Tradeoffs

  • Hybrid + Rerank often improves retrieval metrics by roughly 10-30%, though the gain depends heavily on corpus and query mix, so measure it on your own data
  • Latency increases (reranking top 20–50 candidates)
  • Cost increases slightly with API rerankers
  • Start with hybrid only → add reranking for high-value queries
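
Before deciding whether reranking is worth its cost, it helps to measure what each stage adds in latency. The helper below is a minimal timing sketch using only the standard library. `measure_latency` is a hypothetical name, and `retrieve` can be any callable that takes a query, for example a retriever's invoke method.

```python
import math
import statistics
import time


def measure_latency(retrieve, queries, repeats=3):
    """Time a retrieval callable and report median and p95 latency in milliseconds."""
    if not queries or repeats < 1:
        raise ValueError("Need at least one query and one repeat")
    samples = []
    for query in queries:
        for _ in range(repeats):
            start = time.perf_counter()
            retrieve(query)
            samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank percentile on the sorted samples
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "n": len(samples),
    }


# Stand-in retriever so the sketch runs without any index
stats = measure_latency(lambda q: [q.upper()], ["first query", "second query"], repeats=5)
```

Running the same helper against the hybrid-only retriever and the hybrid-plus-rerank retriever gives a direct view of the latency added by the second stage.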

Common Hybrid Retrieval Mistakes

  • Wrong weights (default 0.5/0.5 often suboptimal)
  • Using BM25 on poorly tokenized text (e.g., non-English)
  • No reranking after fusion
  • Ignoring metadata filtering
  • Not tuning fetch_k and top_n
  • Over-relying on one retriever
  • No evaluation against ground truth
  • Using hybrid everywhere instead of routing by query type
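
Several of these mistakes (wrong weights, untuned k, no ground-truth evaluation) come down to not measuring. A small labelled set of queries with known relevant document IDs is enough to compare configurations. The sketch below computes recall@k and mean reciprocal rank. These are simple ID-based metrics rather than the RAGAS metrics, and the function names and sample data are illustrative.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    relevant = set(relevant)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(retrieve, labelled, k=5):
    """Average recall@k and MRR of a retrieval callable over labelled queries."""
    recalls, rrs = [], []
    for query, relevant in labelled.items():
        retrieved = retrieve(query)
        recalls.append(recall_at_k(retrieved, relevant, k))
        rrs.append(reciprocal_rank(retrieved, relevant))
    n = len(labelled) or 1
    return {"recall_at_k": sum(recalls) / n, "mrr": sum(rrs) / n}


# Ground truth: query -> IDs of the documents that should be retrieved
labelled = {"q1": {"d1", "d2"}, "q2": {"d9"}}
# Canned results standing in for a real retriever
canned = {"q1": ["d1", "d3", "d2"], "q2": ["d4", "d9"]}
metrics = evaluate(lambda q: canned[q], labelled, k=2)
```

Re-running `evaluate` after each change to weights, k, or reranker settings turns tuning into a measurable comparison.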

Best Practices for Hybrid Retrieval

  1. Always combine BM25 + Vector as baseline
  2. Use RRF or learnable fusion when possible
  3. Add cross-encoder reranking over the top 20-50 results
  4. Tune weights based on your domain (more BM25 for technical docs)
  5. Enrich metadata and filter aggressively
  6. Implement query expansion (MultiQuery, HyDE)
  7. Evaluate with context_precision/recall (RAGAS)
  8. Route queries: simple factual → hybrid, complex → agentic multi-hop
  9. Monitor retrieval quality in production with user feedback
  10. Abstract your hybrid retriever for easy experimentation
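
Items 4 and 8 can start as a very small heuristic router. The sketch below shifts weight toward BM25 when a query contains code-like tokens such as error codes, identifiers with underscores, or quoted phrases. The regular expression and both weight pairs are illustrative assumptions to be tuned against your own evaluation set.

```python
import re

# Matches tokens mixing letters and digits (E4012), snake_case identifiers,
# or double-quoted phrases. Purely heuristic.
CODE_LIKE = re.compile(
    r'\b(?=\w*\d)(?=\w*[A-Za-z])\w{3,}\b'   # letters + digits, e.g. E4012
    r'|\w+_\w+'                              # snake_case identifiers
    r'|"[^"]+"'                              # exact phrases in quotes
)


def choose_weights(query: str):
    """Return (vector_weight, bm25_weight) for the ensemble."""
    if CODE_LIKE.search(query):
        return (0.4, 0.6)   # exact-match query: favour keyword search
    return (0.7, 0.3)       # natural-language query: favour semantic search


w_code = choose_weights("What does error E4012 mean?")
w_ident = choose_weights("default value of max_retries")
w_natural = choose_weights("How do I reset my password?")
```

The returned pair can be passed as the weights argument when constructing the ensemble for that query.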
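
Pro Tip – Production Hybrid Retriever Class
from langchain.retrievers import EnsembleRetriever, ContextualCompressionRetriever
from langchain_chroma import Chroma
from langchain_cohere import CohereRerank
from langchain_community.retrievers import BM25Retriever
from langchain_openai import OpenAIEmbeddings

class ProductionHybridRetriever:
    """Hybrid (vector + BM25) retrieval followed by Cohere reranking."""

    def __init__(self, chunks, embeddings_model="text-embedding-3-small"):
        # Dense retriever backed by a Chroma collection
        vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings(model=embeddings_model))
        self.vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 12})

        # Sparse retriever over the same chunks
        self.bm25_retriever = BM25Retriever.from_documents(chunks)
        self.bm25_retriever.k = 12

        self.ensemble = EnsembleRetriever(
            retrievers=[self.vector_retriever, self.bm25_retriever],
            weights=[0.65, 0.35]
        )

        # Add reranker: rescore the fused candidates and keep the best 6
        self.reranker = CohereRerank(model="rerank-english-v3.0", top_n=6)
        self.final_retriever = ContextualCompressionRetriever(
            base_compressor=self.reranker,
            base_retriever=self.ensemble
        )

    def invoke(self, query: str):
        return self.final_retriever.invoke(query)

# Usage (`chunks` is your list of split Document objects)
retriever = ProductionHybridRetriever(chunks)
docs = retriever.invoke("What are the latest LangGraph breakpoints?")
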
Hybrid Retrieval is often the single highest-leverage improvement you can make to any RAG system. When combined with good chunking, metadata, and reranking, it turns “okay” retrieval into production-grade reliability.
