Intermediate

Retrievers
This guide explores retrievers in depth: the component that connects user queries to relevant context in RAG systems and AI agents. You will learn the main retriever types, advanced architectures, how to combine retrievers in LangGraph, evaluation methods, and production best practices, with code examples throughout.

What Are Retrievers?

A retriever is a component that takes a user query (or an internal agent message) and returns a list of relevant documents or chunks from a vector store or another knowledge source. It is the “search engine” layer of a RAG pipeline. Retrievers hide the search logic behind a single interface, so LangGraph nodes can focus on orchestration, reasoning, and generation.

Retriever Architectures

  • Simple Retriever → VectorStore + similarity search
  • Advanced Retrievers → Multi-query, parent-child, compression, ensemble, agentic
  • LangChain Interface → All retrievers are Runnables: .invoke(query) returns a list of Document objects (the older .get_relevant_documents(query) is deprecated)
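To make the shared contract concrete, here is a minimal, dependency-free sketch: anything with an invoke(query) method that returns a list of documents can be swapped into a pipeline. The Document dataclass, Retriever protocol, KeywordRetriever class and answer_context helper are hypothetical stand-ins written for illustration, not LangChain's classes (LangChain's own Document lives in langchain_core.documents).

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Document:
    """Hypothetical stand-in for a retrieved chunk: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class Retriever(Protocol):
    """The contract shared by every retriever: query in, documents out."""
    def invoke(self, query: str) -> list[Document]: ...


class KeywordRetriever:
    """Toy retriever: ranks documents by how many query words they contain."""

    def __init__(self, docs: list[Document], k: int = 4):
        self.docs = docs
        self.k = k

    def invoke(self, query: str) -> list[Document]:
        terms = set(query.lower().split())
        scored = [
            (len(terms & set(d.page_content.lower().split())), i, d)
            for i, d in enumerate(self.docs)
        ]
        # Highest overlap first; ties keep corpus order; non-matches are dropped
        scored.sort(key=lambda t: (-t[0], t[1]))
        return [d for score, _, d in scored if score > 0][: self.k]


def answer_context(retriever: Retriever, query: str) -> str:
    """Caller code depends only on .invoke(), so retrievers are interchangeable."""
    return "\n".join(d.page_content for d in retriever.invoke(query))
```

Because answer_context only relies on invoke(), a vector, BM25 or ensemble retriever could replace KeywordRetriever without changing the calling code.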

Semantic Retrieval

The foundation of modern retrievers. Uses embeddings to find conceptually similar content.
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma(..., embedding_function=embeddings)

retriever = vectorstore.as_retriever(search_kwargs={"k": 6})
docs = retriever.invoke("How does LangGraph handle state?")

Similarity Search Retrieval

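Default method: returns the top-k chunks whose embeddings are closest to the query embedding. The metric depends on how the store is configured (cosine, inner product, or Euclidean distance); Chroma, for example, defaults to squared L2 distance.
query = "How does LangGraph handle state?"
results = vectorstore.similarity_search(query, k=8)  # list of Documents
# or with scores: list of (Document, score); for Chroma the score is a distance (lower = more similar)
results = vectorstore.similarity_search_with_score(query, k=8)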
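Under the hood, similarity search scores every stored vector against the query vector and keeps the best k. The sketch below does an exact (brute-force) search with cosine similarity over toy 3-dimensional vectors; the vectors, document IDs and the helper name are illustrative assumptions, and real embeddings have hundreds or thousands of dimensions.

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_search_with_score(
    query_vec: list[float], index: dict[str, list[float]], k: int = 4
) -> list[tuple[str, float]]:
    """Exact top-k search: score every stored vector, keep the k highest."""
    scored = ((doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


index = {
    "state": [1.0, 0.0, 0.0],
    "checkpoints": [0.9, 0.1, 0.0],
    "bananas": [0.0, 0.0, 1.0],
}
top = similarity_search_with_score([1.0, 0.0, 0.0], index, k=2)
```

Here top holds "state" with score 1.0, followed by "checkpoints". Note that this sketch returns a similarity (higher is better), whereas some stores report a distance (lower is better).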

k-Nearest Neighbor (kNN) Retrieval

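Core algorithm behind similarity search: find the k stored vectors nearest to the query vector. Exact kNN compares the query with every stored vector, so modern vector DBs use Approximate kNN (ANN) indexes such as HNSW, trading a little recall for much lower latency at scale.
You choose k: a larger k improves recall but uses more of the model's context window and adds more noise.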
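To see where the speed-up of approximate search comes from, here is a sketch of one classic ANN technique, random-hyperplane locality-sensitive hashing (LSH). It is illustrative only and not what any particular vector database does (Chroma, for instance, uses HNSW graphs): each vector is hashed by which side of a few random hyperplanes it lies on, and a query is scored only against vectors in its own bucket. The LSHIndex class and its parameters are hypothetical.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class LSHIndex:
    """Random-hyperplane LSH: vectors pointing in similar directions tend to share a bucket."""

    def __init__(self, dim: int, n_planes: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]
        self.buckets = defaultdict(list)  # signature -> [(doc_id, vector)]
        self.all_items = []

    def _signature(self, vec):
        # One bit per hyperplane: which side of the plane the vector lies on
        return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0 for plane in self.planes)

    def add(self, doc_id, vec):
        self.buckets[self._signature(vec)].append((doc_id, vec))
        self.all_items.append((doc_id, vec))

    def search(self, query, k=3):
        # Only the query's bucket is scored, so results are approximate:
        # a true neighbour that hashed into another bucket is missed.
        candidates = self.buckets.get(self._signature(query), [])
        if not candidates:
            candidates = self.all_items  # fall back to exact search
        ranked = sorted(candidates, key=lambda item: cosine(query, item[1]), reverse=True)
        return [doc_id for doc_id, _ in ranked[:k]]
```

More hyperplanes give smaller buckets (faster, lower recall); fewer give larger buckets (slower, higher recall). That is the same speed/recall trade-off the paragraph above describes.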

Max Marginal Relevance (MMR)

Reduces redundancy by balancing relevance and diversity.
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 8,
        "fetch_k": 20,      # fetch more candidates
        "lambda_mult": 0.7  # 0 = max diversity, 1 = max relevance
    }
)
Best for: Summarization, diverse research, or when documents are very similar.
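The MMR selection rule can be sketched in a few lines. Following the standard formulation, and the lambda_mult semantics in the snippet above, each step picks the candidate that maximizes lambda_mult × sim(query, doc) − (1 − lambda_mult) × max sim(doc, already selected). This is a simplified re-implementation over precomputed vectors for illustration, not LangChain's source; candidate_vecs plays the role of the fetch_k candidates.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr(query_vec, candidate_vecs, k=4, lambda_mult=0.5):
    """Return indices of k candidates chosen by Maximal Marginal Relevance."""
    relevance = [cosine(query_vec, v) for v in candidate_vecs]
    selected: list[int] = []
    remaining = list(range(len(candidate_vecs)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            # Redundancy = similarity to the closest already-selected candidate
            redundancy = max(
                (cosine(candidate_vecs[i], candidate_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.98, 0.2], [0.5, 0.5]]  # candidate 1 nearly duplicates 0
```

With lambda_mult=1.0, mmr(query, candidates, k=2) returns plain relevance order, [0, 1]; lowering lambda_mult to 0.3 swaps the near-duplicate candidate 1 for the more distinct candidate 2, giving [0, 2].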

Metadata-Based Retrieval

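Filter on metadata before or during the vector search, so only matching chunks are ranked. Filter syntax is vector-store specific; this example uses Chroma's operators, which expect a single top-level operator (combine conditions with $and) and numeric values for range comparisons such as $gte.
retriever = vectorstore.as_retriever(
    search_kwargs={
        "k": 6,
        "filter": {
            "$and": [
                {"source": {"$eq": "company_policy.pdf"}},
                # Range operators need numbers, so dates are assumed to be
                # stored as integers in YYYYMMDD form (e.g. 20250101)
                {"date": {"$gte": 20250101}},
                {"department": {"$eq": "engineering"}},
            ]
        },
    }
)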
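Conceptually, a metadata filter is a predicate evaluated against each chunk's metadata, and only chunks that pass are ranked. The evaluator below is a hypothetical, simplified interpreter for a few Chroma-style operators ($and, $or, $eq, $ne, $gt, $gte, $lt, $lte), written to show the semantics; it is not Chroma's implementation and does not validate input the way Chroma does.

```python
import operator

# Hypothetical subset of comparison operators, keyed by Chroma-style names
OPS = {
    "$eq": operator.eq, "$ne": operator.ne,
    "$gt": operator.gt, "$gte": operator.ge,
    "$lt": operator.lt, "$lte": operator.le,
}


def matches(metadata: dict, where: dict) -> bool:
    """Return True if a chunk's metadata satisfies the filter expression."""
    if "$and" in where:
        return all(matches(metadata, clause) for clause in where["$and"])
    if "$or" in where:
        return any(matches(metadata, clause) for clause in where["$or"])
    for key, condition in where.items():
        if key not in metadata:
            return False
        if isinstance(condition, dict):
            # Form {"field": {"$op": value}}
            for op_name, value in condition.items():
                if not OPS[op_name](metadata[key], value):
                    return False
        elif metadata[key] != condition:  # a bare value means equality
            return False
    return True


where = {"$and": [
    {"source": {"$eq": "company_policy.pdf"}},
    {"date": {"$gte": 20250101}},
    {"department": {"$eq": "engineering"}},
]}
chunk_metadata = [
    {"source": "company_policy.pdf", "date": 20250315, "department": "engineering"},
    {"source": "company_policy.pdf", "date": 20241201, "department": "engineering"},
    {"source": "handbook.pdf", "date": 20250315, "department": "engineering"},
]
```

Only the first chunk passes. Applying the predicate before ranking means every returned chunk matches; applying it after a top-k search can leave fewer than k results.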

Contextual Retrieval (Advanced)

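Prepend document-level context (title, summary, section hierarchy) to each chunk's text before it is embedded, so a chunk that is ambiguous on its own still matches queries about its source. Metadata is stored but not embedded, so the context has to go into page_content.
# Add context during chunking (assumes each chunk has "title" and "section" metadata)
for chunk in chunks:
    header = (
        f"Document: {chunk.metadata.get('title', '')}\n"
        f"Section: {chunk.metadata.get('section', '')}"
    )
    chunk.page_content = f"{header}\n\n{chunk.page_content}"

# Then index and retrieve normally
vectorstore.add_documents(chunks)
retriever = vectorstore.as_retriever(search_kwargs={"k": 6})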
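The effect of adding context can be shown with a toy word-overlap scorer standing in for embedding similarity (an assumption so the example runs without a model). Two quarterly-report chunks read almost identically in isolation; only once the document title is part of the indexed text can a query that names the company tell them apart. The company names, chunk texts and helper functions are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, text: str) -> int:
    """Toy relevance score: number of shared words (stand-in for embedding similarity)."""
    return len(tokens(query) & tokens(text))


def contextualize(chunk_text: str, title: str, section: str) -> str:
    """Prepend document-level context so it becomes part of the indexed text."""
    return f"Document: {title}\nSection: {section}\n\n{chunk_text}"


chunks = [
    {"text": "Revenue grew 3% over the previous quarter.", "title": "ACME annual report", "section": "Finance"},
    {"text": "Revenue fell 2% over the previous quarter.", "title": "Globex annual report", "section": "Finance"},
]
query = "ACME revenue"

plain = [overlap_score(query, c["text"]) for c in chunks]
contextual = [
    overlap_score(query, contextualize(c["text"], c["title"], c["section"]))
    for c in chunks
]
```

Without context both chunks score 1 (plain == [1, 1]); with the header prepended, the ACME chunk scores 2 and ranks first (contextual == [2, 1]).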

Multi-Query Retrieval

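An LLM rewrites the query into several alternative phrasings; each one is run against the base retriever and the unique union of the results is returned. This improves recall when the user's wording differs from the wording in the documents.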
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(search_kwargs={"k": 6}),
    llm=llm
)

docs = retriever.invoke("Explain LangGraph checkpoints")
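The fusion step can be sketched without an LLM. MultiQueryRetriever runs every generated query against the base retriever and returns the unique union of the results; below, the query variants are hard-coded (in the real retriever an LLM writes them), and the corpus and keyword base retriever are hypothetical stand-ins.

```python
from typing import Callable


def multi_query_retrieve(
    queries: list[str], base_retrieve: Callable[[str], list[str]]
) -> list[str]:
    """Run every query variant and return the unique union, in first-seen order."""
    seen: set[str] = set()
    results: list[str] = []
    for q in queries:
        for doc in base_retrieve(q):
            if doc not in seen:
                seen.add(doc)
                results.append(doc)
    return results


# Hypothetical corpus and keyword-matching base retriever
CORPUS = [
    "Checkpoints persist graph state after each step",
    "A checkpointer saves snapshots per thread",
    "Time travel replays a graph from a saved snapshot",
    "Nodes are plain Python functions",
]


def base_retrieve(query: str, k: int = 2) -> list[str]:
    words = set(query.lower().split())
    hits = [doc for doc in CORPUS if words & set(doc.lower().split())]
    return hits[:k]


# Variants an LLM might produce for "Explain LangGraph checkpoints"
variants = [
    "explain langgraph checkpoints",
    "how does the checkpointer save state",
    "replay from saved snapshot",
]
docs = multi_query_retrieve(variants, base_retrieve)
```

The original query alone finds only the first document; the three variants together surface the first three, which is the recall gain the technique is after.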

Parent-Child Retrieval

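Search over small child chunks (precise matching) but return the larger parent chunks they came from (richer context).
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Small chunks are embedded and searched; large chunks are returned
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)

# Holds the parent chunks by ID (use a persistent store in production)
store = InMemoryStore()

parent_retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,  # indexes the child chunks; best kept in a dedicated collection
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)

# documents: your full, unsplit source documents (e.g. from a document loader)
parent_retriever.add_documents(documents)
parent_docs = parent_retriever.invoke("How do LangGraph checkpoints work?")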
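The mechanics can be sketched without a vector store: split each parent into children, remember each child's parent ID, search over the children, then swap each hit for its parent and drop duplicates. This mirrors the idea behind ParentDocumentRetriever (parents in the docstore, children in the vector store), but the fixed-size word splitter, word-overlap search and the ParentChildIndex class are simplifying, hypothetical assumptions.

```python
def split_fixed(text: str, size: int) -> list[str]:
    """Naive splitter: consecutive windows of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


class ParentChildIndex:
    def __init__(self, child_size: int = 5):
        self.child_size = child_size
        self.parents: dict[str, str] = {}          # parent_id -> full text (the "docstore")
        self.children: list[tuple[str, str]] = []  # (child_text, parent_id) (the "vector store")

    def add_document(self, parent_id: str, text: str) -> None:
        self.parents[parent_id] = text
        for child in split_fixed(text, self.child_size):
            self.children.append((child, parent_id))

    def invoke(self, query: str, k: int = 3) -> list[str]:
        terms = set(query.lower().split())

        def score(child: tuple[str, str]) -> int:
            return len(terms & set(child[0].lower().split()))

        # Rank the small children (precise matching), keep the top k that match at all
        hits = [c for c in sorted(self.children, key=score, reverse=True)[:k] if score(c) > 0]
        # Return each matching child's parent once, in rank order (richer context)
        parent_ids = list(dict.fromkeys(pid for _, pid in hits))
        return [self.parents[pid] for pid in parent_ids]


index = ParentChildIndex(child_size=4)
index.add_document("p1", "LangGraph checkpoints persist state. Threads group related runs together.")
index.add_document("p2", "Bananas are rich in potassium and taste sweet.")
```

A query for "checkpoints" matches only the four-word child "LangGraph checkpoints persist state." but returns the whole p1 text; a query that hits two children of p2 still returns p2 only once.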

Compression Retrievers

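Reduce token usage by keeping only the parts of each retrieved document that are relevant to the query. LLMChainExtractor makes one LLM call per retrieved document, so it trades extra latency and cost for shorter context.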
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain.retrievers import ContextualCompressionRetriever

compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 10})
)

compressed_docs = compression_retriever.invoke("query here")
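As a cheap, LLM-free illustration of the same idea, the sketch below keeps only the sentences of each retrieved document that share at least one word with the query, and drops documents with nothing left. This keyword heuristic is a stand-in assumption and the compress function is hypothetical; LLMChainExtractor instead asks an LLM what to keep, which also handles paraphrases that word matching misses.

```python
import re


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def compress(query: str, docs: list[str], min_overlap: int = 1) -> list[str]:
    """Keep only query-relevant sentences from each document."""
    terms = _words(query)
    compressed = []
    for doc in docs:
        # Split after sentence-ending punctuation followed by whitespace
        sentences = re.split(r"(?<=[.!?])\s+", doc.strip())
        kept = [s for s in sentences if len(terms & _words(s)) >= min_overlap]
        if kept:  # documents with no relevant sentence are dropped entirely
            compressed.append(" ".join(kept))
    return compressed


retrieved = [
    "LangGraph saves checkpoints after each step. The logo is blue. Checkpoints enable resuming.",
    "Our office is in Paris.",
]
result = compress("how do checkpoints work", retrieved)
```

Here result is a single string containing the two checkpoint sentences; the off-topic sentence and the unrelated document are removed.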

Ensemble Retrievers

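Combine multiple retrievers, typically dense vector search with a sparse keyword retriever such as BM25 (hybrid search). EnsembleRetriever merges their ranked results using weighted Reciprocal Rank Fusion.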
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

# Vector retriever
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 6})

# Keyword retriever (BM25Retriever requires the rank_bm25 package)
bm25_retriever = BM25Retriever.from_documents(chunks)
bm25_retriever.k = 6

ensemble = EnsembleRetriever(
    retrievers=[vector_retriever, bm25_retriever],
    weights=[0.7, 0.3]
)
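Weighted Reciprocal Rank Fusion gives each document weight / (rank + c) from every list it appears in and sorts by the total, so documents ranked well by several retrievers rise to the top. A minimal sketch, assuming 1-based ranks and c = 60 (the constant commonly used for RRF and EnsembleRetriever's default c); plain document IDs stand in for Document objects, and weighted_rrf is a hypothetical helper, not LangChain's code.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists: list[list[str]], weights: list[float], c: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs with weighted Reciprocal Rank Fusion."""
    if len(ranked_lists) != len(weights):
        raise ValueError("need one weight per ranked list")
    scores: dict[str, float] = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(docs, start=1):
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


vector_hits = ["d1", "d2", "d3"]
bm25_hits = ["d4", "d2", "d1"]
fused = weighted_rrf([vector_hits, bm25_hits], [0.7, 0.3])
```

With these inputs and weights [0.7, 0.3], the fused order is d1, d2, d3, d4. Because RRF uses ranks rather than raw scores, it can merge a cosine similarity list with a BM25 list without normalizing their incompatible score scales.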

Retriever Evaluation

Always measure performance.
from ragas import evaluate
from ragas.metrics import context_precision, context_recall

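# dataset: test questions, the contexts your retriever returned, and reference
# answers, in the column format your ragas version expects
results = evaluate(dataset, [context_precision, context_recall])
print(results)
Key retrieval metrics: Recall@K, Precision@K, NDCG, and context relevance (context precision/recall). Faithfulness, which checks whether the generated answer is supported by the retrieved context, measures the generation step and complements rather than replaces these.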
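The rank-based metrics are straightforward to compute yourself once you have, for each test query, the retrieved IDs and the set of IDs judged relevant. A minimal sketch with binary relevance; the function names and document IDs are illustrative.

```python
import math


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top k results that are relevant."""
    if k <= 0:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / k


def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Normalized DCG with binary gains: rewards relevant docs ranked higher."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(retrieved[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
```

For this query, recall@2 and precision@2 are both 0.5, recall@4 is 1.0, and NDCG@4 is about 0.65 because both relevant documents are found but not at the top ranks. Average each metric over all test queries.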

Common Retriever Mistakes

  • Using default k=4 without testing
  • No metadata filtering
  • Pure vector search without hybrid/keyword
  • Ignoring redundancy (no MMR)
  • Over-compressing context
  • Not using parent-child for large documents
  • No evaluation pipeline
  • Hard-coding retriever parameters
  • Forgetting to tune fetch_k in MMR

Best Practices for Retrievers

  1. Start simple (similarity + metadata) → add complexity only when needed
  2. Always use hybrid/ensemble for production
  3. With large-context models, try k in the 8 to 15 range and add compression to keep token usage down
  4. Implement query rewriting or multi-query
  5. Add rich metadata and use it aggressively
  6. Use Parent-Child or Contextual Compression for long documents
  7. Evaluate with domain-specific test sets (RAGAS, ARES, or custom)
  8. Log retrieval latency, scores, and user feedback
  9. Abstract retrievers behind a clean interface in LangGraph
  10. Re-rank top results with a stronger cross-encoder when quality is critical
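Re-ranking is a two-stage pattern: retrieve a generous candidate set cheaply, score each (query, document) pair with a slower but more accurate model, and keep the top few. The sketch below takes the scoring function as a parameter; overlap_scorer is a hypothetical placeholder for a real cross-encoder, which would read the query and document together.

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score_pair: Callable[[str, str], float],
    top_n: int = 3,
) -> list[tuple[str, float]]:
    """Score every (query, candidate) pair and keep the top_n, best first."""
    scored = [(doc, score_pair(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def overlap_scorer(query: str, doc: str) -> float:
    """Placeholder scorer: share of query words found in the document."""
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q) if q else 0.0


candidates = [
    "graph state is saved",
    "state checkpoints are saved per thread",
    "unrelated text",
]
top = rerank("saved state checkpoints", candidates, overlap_scorer, top_n=2)
```

The second candidate contains all three query words and moves to first place. Because the expensive scorer only sees the retrieved candidates, its cost grows with the candidate count, not the corpus size.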
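
Pro Tip – Custom LangGraph Retriever Node
from typing import Annotated, TypedDict
import operator

from langchain_core.documents import Document
from langchain_core.messages import AnyMessage, HumanMessage
from langgraph.graph import END, START, StateGraph


class State(TypedDict):
    messages: Annotated[list[AnyMessage], operator.add]  # appended, not overwritten
    context: list[Document]                              # replaced on each retrieval


def retrieve(state: State):
    query = state["messages"][-1].content
    docs = ensemble.invoke(query)          # or any advanced retriever
    return {"context": docs}


# Wire the node into a graph
builder = StateGraph(State)
builder.add_node("retrieve", retrieve)
builder.add_edge(START, "retrieve")
builder.add_edge("retrieve", END)
graph = builder.compile()

result = graph.invoke({"messages": [HumanMessage(content="How does LangGraph handle state?")]})
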
Retrievers are where most RAG performance is won or lost. Mastering different retriever patterns and combining them intelligently is one of the highest-leverage skills when building reliable AI agents with LangGraph.
