Hybrid Retrieval

Advanced

This guide explores Hybrid Retrieval, one of the most effective ways to improve retrieval quality in RAG systems. You will learn how to combine semantic (dense) and keyword (sparse) search, build multi-strategy pipelines, add reranking, integrate everything into LangGraph, and apply production best practices, with working code examples throughout.

What Is Hybrid Retrieval?

Hybrid Retrieval combines dense vector search (semantic similarity via embeddings) with sparse keyword search (BM25 or TF-IDF) to leverage the strengths of both approaches. It then fuses the results using algorithms like Reciprocal Rank Fusion (RRF) or weighted scoring. This addresses the core weaknesses of pure vector search (missing exact terms, product codes, names) and pure keyword search (no understanding of synonyms or intent).
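Weighted scoring only works once the two score scales are comparable. Cosine similarities fall in [-1, 1], while BM25 scores have no fixed upper bound. The sketch below is plain Python and is a minimal illustration, not any library's implementation. It assumes min-max normalization and a single weight `alpha` for the dense side. Both choices are illustrative, and the scores are made up.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_fusion(dense: dict[str, float], sparse: dict[str, float],
                    alpha: float = 0.7) -> list[tuple[str, float]]:
    """Fuse scores as alpha * dense + (1 - alpha) * sparse after normalization.

    A document missing from one retriever contributes 0 for that side.
    """
    d, s = min_max_normalize(dense), min_max_normalize(sparse)
    fused = {doc_id: alpha * d.get(doc_id, 0.0) + (1 - alpha) * s.get(doc_id, 0.0)
             for doc_id in set(d) | set(s)}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


dense_scores = {"doc_a": 0.82, "doc_b": 0.79, "doc_c": 0.40}   # cosine similarities
sparse_scores = {"doc_c": 12.5, "doc_d": 7.1, "doc_a": 3.0}    # BM25 scores
for doc_id, score in weighted_fusion(dense_scores, sparse_scores):
    print(doc_id, round(score, 3))
```

Min-max normalization maps each retriever's weakest hit to 0, so the fused order is sensitive to how scores are distributed. This is one reason rank-based fusion such as RRF is often preferred.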

Semantic Search vs Keyword Search

Aspect	Semantic (Dense)	Keyword (Sparse / BM25)
Strength	Meaning, synonyms, intent	Exact matches, rare terms
Weakness	Poor on proper nouns/codes	No semantic understanding
Speed	Fast with ANN	Very fast
Best For	Natural language questions	Technical docs, error codes
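The table says keyword search wins on rare exact terms. A compact Okapi BM25 scorer in plain Python shows why. This is a sketch, not LangChain's implementation: k1 = 1.5 and b = 0.75 are common defaults, and it uses the Lucene-style IDF, which stays positive. LangChain's BM25Retriever relies on the rank_bm25 package, whose IDF variant differs slightly. The regex tokenizer and toy corpus are assumptions for illustration. An error code such as `E1042` appears in only one document, so it gets a high IDF and decides the ranking.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Okapi BM25 with the non-negative (Lucene-style) IDF."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        # Document frequency: number of documents containing each term
        df = Counter(term for d in self.docs for term in set(d))
        n = len(self.docs)
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: str, index: int) -> float:
        doc = self.docs[index]
        tf = Counter(doc)
        # Length normalization: longer-than-average documents are penalized
        norm = self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            if term in tf:
                f = tf[term]
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def rank(self, query: str) -> list[int]:
        """Document indices sorted by descending BM25 score."""
        return sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)


corpus = [
    "How to reset your password from the login page",
    "Error E1042 occurs when the API token has expired",
    "Troubleshooting failed sign-in attempts and account lockouts",
]
bm25 = BM25(corpus)
print(bm25.rank("what does E1042 mean"))
```

Documents that do not contain any query term score exactly 0. This is the "no semantic understanding" weakness from the table, seen from the other side.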

Combining BM25 with Vector Search

from langchain_community.retrievers import BM25Retriever   # requires: pip install rank_bm25
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain.retrievers import EnsembleRetriever

# chunks: list[Document] produced by your text splitter

# Vector (Semantic) Retriever
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(chunks, embeddings)
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 8})

# Keyword (BM25) Retriever
bm25_retriever = BM25Retriever.from_documents(chunks)
bm25_retriever.k = 8

# Hybrid Ensemble: fuses both result lists with weighted Reciprocal Rank Fusion
hybrid_retriever = EnsembleRetriever(
    retrievers=[vector_retriever, bm25_retriever],
    weights=[0.7, 0.3]          # Tune based on your data
)

Multi-Strategy Retrieval

You can combine more than two retrievers:

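from langchain.retrievers import EnsembleRetriever
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI

# Third strategy: LLM-generated query variants run against the vector retriever
multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=vector_retriever, llm=ChatOpenAI(model="gpt-4o-mini", temperature=0))

ensemble = EnsembleRetriever(
    retrievers=[vector_retriever, bm25_retriever, multi_query_retriever],
    weights=[0.6, 0.25, 0.15]
)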

Hybrid Ranking Systems

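Reciprocal Rank Fusion (RRF) is a widely used fusion method. It rewards documents that rank highly in several retrievers, and because it uses only ranks, it needs no score normalization.

LangChain’s EnsembleRetriever always fuses with weighted RRF. For every document a retriever returns, that retriever contributes weight / (rank + c), with ranks starting at 1 and c = 60 by default.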
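Here is a plain-Python sketch of weighted RRF. For simplicity it operates on lists of document IDs rather than LangChain `Document` objects, so treat it as an illustration of the formula, not a drop-in replacement for EnsembleRetriever.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists: list[list[str]], weights: list[float],
                 c: int = 60) -> list[tuple[str, float]]:
    """Weighted Reciprocal Rank Fusion: score(d) = sum of weight / (c + rank)."""
    if len(ranked_lists) != len(weights):
        raise ValueError("need exactly one weight per ranked list")
    scores: dict[str, float] = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(docs, start=1):  # ranks start at 1
            scores[doc_id] += weight / (c + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_c", "doc_d", "doc_a"]
for doc_id, score in weighted_rrf([vector_hits, bm25_hits], weights=[0.7, 0.3]):
    print(doc_id, f"{score:.5f}")
```

The fused order is doc_a, doc_c, doc_b, doc_d. doc_c is only third for the vector retriever, but it moves ahead of doc_b because it is also first for BM25. That cross-retriever agreement is exactly what RRF rewards.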

Metadata + Semantic Retrieval

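vector_retriever = vectorstore.as_retriever(
    search_kwargs={
        "k": 10,
        # Chroma needs $and to combine conditions, and its range operators
        # ($gte, $lt, ...) only accept numbers, so dates are assumed to be
        # stored as integers such as 20250101
        "filter": {"$and": [
            {"category": "technical"},
            {"date": {"$gte": 20250101}},
        ]},
    }
)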

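Combine metadata filtering with hybrid retrieval to scope enterprise search precisely. BM25Retriever has no filter parameter, so apply the same conditions to the chunks before building the BM25 index. Otherwise, the keyword side can return documents the vector side excluded.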
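On the keyword side, filtering can be as simple as selecting the chunk list before building the BM25 index. The sketch below is plain Python. The `Chunk` type and the `matches` helper are hypothetical stand-ins for LangChain `Document` objects and their metadata. The YYYYMMDD integer `date` field is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for a LangChain Document."""
    text: str
    metadata: dict = field(default_factory=dict)


def matches(metadata: dict, category: str, min_date: int) -> bool:
    """Hypothetical predicate: exact category match and date on or after min_date."""
    return metadata.get("category") == category and metadata.get("date", 0) >= min_date


chunks = [
    Chunk("Rotate API keys every 90 days", {"category": "technical", "date": 20250301}),
    Chunk("Q3 sales kickoff agenda", {"category": "sales", "date": 20250610}),
    Chunk("Legacy key rotation script", {"category": "technical", "date": 20231115}),
]

# Build the BM25 index from this subset; the vector side uses the store's own filter
filtered = [c for c in chunks if matches(c.metadata, "technical", 20250101)]
print([c.text for c in filtered])
```

Keeping the predicate in one place makes it easier to keep the two retrievers in sync when the filter conditions change.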

Dense vs Sparse Retrieval

Dense: Embeddings (high-dimensional vectors)
Sparse: BM25, SPLADE, or TF-IDF (term-based)

Many modern search engines and vector databases (Weaviate, Pinecone, Elasticsearch, Qdrant) support hybrid search natively. They combine dense vectors with BM25 or with stored sparse vectors.
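The practical difference lies in the representation. A dense embedding stores a value for every dimension. A sparse vector stores only its nonzero term weights, often as a term-to-weight map or as parallel index/value arrays. The sketch below is plain Python and compares similarity for both. The vectors are made up and far smaller than real ones.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def sparse_dot(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product of sparse vectors: only terms present in both contribute."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b[t] for t, w in a.items() if t in b)


# Dense: every dimension has a value (real embeddings have hundreds to thousands)
query_dense = [0.1, 0.3, 0.5, 0.1]
doc_dense = [0.2, 0.2, 0.6, 0.0]

# Sparse: only nonzero term weights are stored (e.g. BM25- or SPLADE-style weights)
query_sparse = {"reset": 1.2, "password": 0.9}
doc_sparse = {"password": 1.1, "expired": 0.7, "token": 0.4}

print(round(cosine(query_dense, doc_dense), 3))
print(round(sparse_dot(query_sparse, doc_sparse), 3))
```

The sparse score is driven entirely by the shared term "password". This is why sparse retrieval is precise on exact vocabulary but blind to synonyms.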

Query Expansion Techniques

Improve both retrievers with:

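from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain.chains import HypotheticalDocumentEmbedder
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# MultiQuery: the LLM writes query variants; each is run through the hybrid retriever and the results are merged
expanded_retriever = MultiQueryRetriever.from_llm(retriever=hybrid_retriever, llm=llm)

# HyDE (Hypothetical Document Embeddings): embeds an LLM-written hypothetical answer instead of the raw query
hyde_embeddings = HypotheticalDocumentEmbedder.from_llm(
    llm, OpenAIEmbeddings(model="text-embedding-3-small"), prompt_key="web_search")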
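At its core, multi-query expansion does two things. It retrieves for each LLM-generated variant of the question, then merges the results without duplicates. The sketch below is plain Python and shows only that merge step. The hand-written variants and the dictionary-backed `retrieve` function are hypothetical stand-ins for LLM output and a real retriever.

```python
from typing import Callable


def multi_query_retrieve(variants: list[str],
                         retrieve: Callable[[str], list[str]]) -> list[str]:
    """Retrieve for each query variant and return the de-duplicated union,
    keeping first-seen order."""
    seen: set[str] = set()
    merged: list[str] = []
    for variant in variants:
        for doc_id in retrieve(variant):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


# Hypothetical toy index keyed by query; an LLM would normally generate the variants
toy_index = {
    "reset password": ["d1", "d2"],
    "recover account access": ["d2", "d5"],
    "forgot login credentials": ["d5", "d8"],
}
variants = list(toy_index)
print(multi_query_retrieve(variants, lambda q: toy_index.get(q, [])))
```

Each variant surfaces documents that the others miss (d1 and d8 appear under only one phrasing). That extra recall is what query expansion buys.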

Reranking Retrieved Documents

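Reranking is often the most impactful upgrade after hybrid retrieval. A cross-encoder reranker reads the query and each candidate document together and scores their relevance. This is more accurate than comparing separately computed embeddings, but too slow to run over a whole corpus, so it is applied only to the fused top candidates.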

from langchain_cohere import CohereRerank          # requires: pip install langchain-cohere
from langchain.retrievers import ContextualCompressionRetriever

# Rerank the fused hybrid candidates and keep the best 5
compressor = CohereRerank(model="rerank-english-v3.0", top_n=5)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=hybrid_retriever
)

# Alternative: open-source BGE reranker (runs locally, no API calls)
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder

model = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-large")
compressor = CrossEncoderReranker(model=model, top_n=6)
local_rerank_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=hybrid_retriever
)

Recommended rerankers in 2026: Cohere Rerank 3, Voyage Rerank, BGE-reranker-large.
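The compression-retriever pattern follows three steps: fetch a wide candidate set, rescore every (query, document) pair, and keep the best top_n. The sketch below is plain Python and shows only that control flow. `overlap_score` is a hypothetical placeholder for a real cross-encoder such as BGE or Cohere Rerank, used only so the example runs offline.

```python
from typing import Callable


def overlap_score(query: str, doc: str) -> float:
    """Hypothetical placeholder scorer: fraction of query words found in the doc.

    A real reranker would run a cross-encoder over the (query, doc) pair instead.
    """
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


def rerank(query: str, candidates: list[str], top_n: int,
           scorer: Callable[[str, str], float] = overlap_score) -> list[str]:
    """Rescore every candidate against the query and keep the top_n."""
    scored = [(scorer(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]


candidates = [
    "LangGraph supports checkpoints for persistence",
    "Breakpoints in LangGraph pause execution before a node",
    "Chroma is an open-source vector database",
]
print(rerank("how do breakpoints work in langgraph", candidates, top_n=2))
```

Because the scorer is a parameter, a real cross-encoder can be swapped in without changing the surrounding pipeline. The cost grows linearly with the number of candidates, which is why candidate depth matters for latency.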

Hybrid Retrieval in LangGraph

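from typing import TypedDict, Annotated, List
from langchain_core.documents import Document
from langchain_core.messages import AnyMessage
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages

class State(TypedDict):
    messages: Annotated[List[AnyMessage], add_messages]
    context: List[Document]

def hybrid_retrieve(state: State):
    # Use the latest message as the retrieval query
    query = state["messages"][-1].content
    docs = compression_retriever.invoke(query)   # hybrid + rerank
    return {"context": docs}

# Add to your graph (in a full app, a generation node would follow "retrieve")
graph = StateGraph(State)
graph.add_node("retrieve", hybrid_retrieve)
graph.add_edge(START, "retrieve")
graph.add_edge("retrieve", END)
app = graph.compile()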

Performance and Accuracy Tradeoffs

Hybrid + Rerank typically improves retrieval metrics, often by 10–30% or more depending on the corpus; measure on your own data
Latency increases, because the reranker scores every candidate (typically the top 20–50)
Cost increases slightly with API rerankers
Start with hybrid only → add reranking for high-value queries

Common Hybrid Retrieval Mistakes

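Wrong weights (equal weights, EnsembleRetriever's default, are often suboptimal)
Using BM25 on poorly tokenized text (e.g., the default whitespace tokenizer on languages such as Chinese or Japanese, which do not separate words with spaces)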
No reranking after fusion
Ignoring metadata filtering
Not tuning candidate depth (k or fetch_k per retriever) and the reranker's top_n
Over-relying on one retriever
No evaluation against ground truth
Using hybrid everywhere instead of routing by query type
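You do not need a framework to start evaluating against ground truth. Given a small labeled set of queries and their known relevant chunk IDs, recall@k and mean reciprocal rank (MRR) take a few lines of plain Python. The IDs and labels below are hypothetical.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant IDs that appear in the top k retrieved IDs."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant hit, or 0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical evaluation set: (retrieved IDs, labeled relevant IDs) per query
runs = [
    (["c7", "c2", "c9"], {"c2", "c5"}),
    (["c1", "c4", "c3"], {"c1"}),
]
k = 3
mean_recall = sum(recall_at_k(r, rel, k) for r, rel in runs) / len(runs)
mrr = sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)
print(f"recall@{k}={mean_recall:.2f}  MRR={mrr:.2f}")
```

Run the same labeled set through vector-only, BM25-only, hybrid, and hybrid + rerank. Comparing the numbers shows whether each stage actually helps on your data.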

Best Practices for Hybrid Retrieval

Always combine BM25 + Vector as baseline
Use RRF or learnable fusion when possible
Add cross-encoder reranking on the top 20–50 fused results
Tune weights based on your domain (more BM25 for technical docs)
Enrich metadata and filter aggressively
Implement query expansion (MultiQuery, HyDE)
Evaluate with context_precision/recall (RAGAS)
Route queries: simple factual → hybrid, complex → agentic multi-hop
Monitor retrieval quality in production with user feedback
Abstract your hybrid retriever for easy experimentation
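Routing by query type can start as a simple heuristic before you invest in an LLM-based classifier. The sketch below uses a hypothetical rule set. Queries with identifier-like tokens (error codes, snake_case names, version numbers) shift weight toward BM25. Queries with several question marks or many words are flagged for a multi-hop path. The regex, thresholds, and weights are all illustrative assumptions to tune on your own evaluation set.

```python
import re
from dataclasses import dataclass

# Hypothetical heuristic: words containing a digit or underscore look like
# error codes, versions, or identifiers (e.g. "E1042", "max_tokens", "v2")
IDENTIFIER = re.compile(r"\b\w*[\d_]\w*\b")


@dataclass
class Route:
    strategy: str                  # "hybrid" or "multi_hop"
    weights: tuple[float, float]   # (vector, bm25) weights for the ensemble


def route_query(query: str) -> Route:
    """Pick a retrieval strategy and ensemble weights from surface features."""
    # Several questions or a long query suggest a complex, multi-hop request;
    # each hop still runs hybrid retrieval with the default weights
    if query.count("?") > 1 or len(query.split()) > 30:
        return Route("multi_hop", (0.7, 0.3))
    if IDENTIFIER.search(query):
        return Route("hybrid", (0.4, 0.6))   # lean on exact keyword matches
    return Route("hybrid", (0.7, 0.3))       # natural-language default


print(route_query("Why does the upload fail with E1042?"))
print(route_query("How can I make answers more concise?"))
```

The returned weights can be passed straight to an EnsembleRetriever built per route, which keeps the routing logic independent of the retrievers themselves.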

Pro Tip – Production Hybrid Retriever Class

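from langchain.retrievers import EnsembleRetriever, ContextualCompressionRetriever
from langchain_chroma import Chroma
from langchain_cohere import CohereRerank
from langchain_community.retrievers import BM25Retriever
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

class ProductionHybridRetriever:
    """Vector + BM25 ensemble (weighted RRF) followed by a Cohere reranker."""

    def __init__(self, chunks: list[Document], embeddings_model: str = "text-embedding-3-small",
                 k: int = 12, weights: tuple[float, float] = (0.65, 0.35), top_n: int = 6):
        # Dense retriever over a Chroma collection
        vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings(model=embeddings_model))
        self.vector_retriever = vectorstore.as_retriever(search_kwargs={"k": k})

        # Sparse BM25 retriever over the same chunks
        self.bm25_retriever = BM25Retriever.from_documents(chunks)
        self.bm25_retriever.k = k

        # Fuse both result lists
        self.ensemble = EnsembleRetriever(
            retrievers=[self.vector_retriever, self.bm25_retriever],
            weights=list(weights),
        )

        # Rerank the fused candidates and keep the best top_n
        self.reranker = CohereRerank(model="rerank-english-v3.0", top_n=top_n)
        self.final_retriever = ContextualCompressionRetriever(
            base_compressor=self.reranker,
            base_retriever=self.ensemble,
        )

    def invoke(self, query: str) -> list[Document]:
        return self.final_retriever.invoke(query)

# Usage
retriever = ProductionHybridRetriever(chunks)
docs = retriever.invoke("What are the latest LangGraph breakpoints?")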

Hybrid Retrieval is often the single highest-leverage improvement you can make to a RAG system. Combined with good chunking, rich metadata, and reranking, it can turn “okay” retrieval into production-grade reliability.
