Vector Databases

Intermediate

This guide covers vector databases, the backbone of scalable semantic search and Retrieval-Augmented Generation (RAG) systems. You will learn the core concepts, how to choose a database in 2026, how to integrate one with LangChain/LangGraph, advanced features such as metadata filtering and hybrid search, persistence strategies, and production best practices, with code examples throughout.

What Are Vector Databases?

Vector databases are specialized systems designed to store, index, and query high-dimensional dense vectors (embeddings) at scale. Unlike traditional databases that excel at exact matches or range queries on structured data, vector databases optimize for approximate nearest neighbor (ANN) search based on semantic similarity. They combine vector storage with fast indexing algorithms (HNSW, IVF, DiskANN, etc.) to return the most relevant items in milliseconds even from billions of vectors.
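
To make the idea concrete, here is a minimal sketch of the exact (brute-force) search that ANN indexes approximate. It scans every vector, which is fine for a handful of items but grows linearly with collection size. The function names and the tiny three-dimensional vectors are illustrative assumptions, not part of any library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exact_top_k(query, vectors, k):
    """Return the indices of the k most similar vectors using a full scan."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]

# Toy "embeddings"; real ones have hundreds or thousands of dimensions
vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.0, 0.1]
top2 = exact_top_k(query, vectors, k=2)
print(top2)
```

An ANN index such as HNSW trades a small amount of accuracy for avoiding this full scan.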

Why Vector Databases Matter

In AI agents and RAG systems, you need:

Sub-second semantic search across millions of documents
Efficient metadata filtering (date, source, user, department, etc.)
Hybrid search (vector + keyword)
Scalability, persistence, and observability
Easy integration with LangChain/LangGraph

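Raw in-memory libraries such as FAISS, and workarounds built on relational databases, typically lack production features like rich metadata filtering, concurrent writes, replication, and managed persistence, and they become harder to operate as data grows.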

Storing Embeddings

A typical flow:

Load & chunk documents
Generate embeddings
Store vectors + rich metadata
Query with filters
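
The four steps above can be sketched end to end in plain Python. This is a toy illustration under stated assumptions: `toy_embed` is a hypothetical stand-in for a real embedding model (it hashes words into buckets), the chunker splits on a fixed character count, and the "store" is just a list.

```python
import hashlib
import math

def chunk_text(text, size=40):
    """Split text into fixed-size character chunks (real splitters respect sentence boundaries)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def toy_embed(text, dim=8):
    """Hypothetical embedding: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

store = []  # each record holds a vector, its text, and its metadata

def add(text, metadata):
    """Chunk, embed, and store a document with its metadata."""
    for chunk in chunk_text(text):
        store.append({"vector": toy_embed(chunk), "text": chunk, "metadata": metadata})

def query(text, k=3, where=None):
    """Filter by exact metadata match first, then rank the survivors by dot product."""
    q = toy_embed(text)
    candidates = [
        r for r in store
        if not where or all(r["metadata"].get(key) == val for key, val in where.items())
    ]
    candidates.sort(key=lambda r: sum(a * b for a, b in zip(q, r["vector"])), reverse=True)
    return candidates[:k]

add("Vacation policy: employees accrue leave monthly.", {"source": "policy.pdf"})
add("LangGraph builds stateful agent workflows.", {"source": "notes.md"})
hits = query("vacation leave", k=2, where={"source": "policy.pdf"})
print([h["text"] for h in hits])
```

A real vector store does the same things with a trained embedding model, an ANN index instead of a sort, and durable storage.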

Similarity Search

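Defaults vary by database, so set the metric explicitly: Weaviate defaults to cosine, while Chroma and FAISS (via LangChain) default to Euclidean (L2) distance. Cosine similarity is the usual choice for text embeddings; for unit-normalized vectors, cosine, dot product, and L2 all produce the same ranking. Some databases also support Manhattan (L1) distance.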
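
A short sketch of the metrics named above, written in plain Python. It also checks one useful property: once vectors are normalized to unit length, ranking by cosine, by dot product, or by L2 distance gives the same order. The sample vectors are made up for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

query = normalize([3.0, 4.0])
docs = [normalize(v) for v in ([4.0, 3.0], [1.0, 0.0], [0.0, 1.0])]
indices = range(len(docs))

# Similarities rank high-to-low; distances rank low-to-high
by_cosine = sorted(indices, key=lambda i: -cosine(query, docs[i]))
by_dot = sorted(indices, key=lambda i: -dot(query, docs[i]))
by_l2 = sorted(indices, key=lambda i: l2_distance(query, docs[i]))
print(by_cosine, by_dot, by_l2)
```

With unnormalized vectors the three orderings can disagree, which is why the metric should match how the embedding model was trained.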

Vector Indexing

Key algorithms:

HNSW: Best balance of speed/accuracy (most popular in 2026)
IVF: Good for very large datasets
DiskANN: Disk-based for billion-scale
Quantization (PQ, Scalar, Binary): Reduce memory footprint
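
As an illustration of the last item, here is a minimal sketch of scalar quantization: each float is mapped to one byte (0 to 255) using the vector's own min and max. Compared with 4-byte float32 storage this is roughly a 4x reduction, at the cost of a bounded rounding error. Production systems use more careful schemes; the function names here are assumptions for the example.

```python
def quantize(vec):
    """Map floats to uint8 codes using the vector's min/max range."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0  # avoid a zero scale for constant vectors
    codes = bytes(min(255, max(0, round((x - lo) / scale))) for x in vec)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Reconstruct approximate floats from the uint8 codes."""
    return [lo + c * scale for c in codes]

vec = [0.12, -0.53, 0.98, 0.0, -1.0, 0.44]
codes, lo, scale = quantize(vec)
restored = dequantize(codes, lo, scale)
max_error = max(abs(a - b) for a, b in zip(vec, restored))
print(len(codes), "bytes, max error", max_error)
```

The reconstruction error per component is at most half of `scale`, so a narrower value range gives a more faithful approximation.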

Popular Vector Databases in 2026

Database	Type	Best For	Hybrid Search	Open Source	Scale	Recommendation
Pinecone	Managed SaaS	Production, zero-ops	Yes	No	Billions	Enterprise default
Chroma	Embedded / Server	Prototyping & small apps	Basic	Yes	< 10M vectors	Local development
FAISS	Library	High-performance in-memory	No	Yes	Millions (in-memory)	Speed-critical local use
Weaviate	Self-hosted/Cloud	Hybrid search + rich filtering	Excellent	Yes	Hundreds of millions	Complex RAG & multi-tenant
Milvus	Self-hosted/Cloud	Massive scale & GPU acceleration	Good	Yes	Billions	Large enterprise datasets

Chroma (Best for Development)

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Chroma(
    collection_name="my_knowledge_base",
    embedding_function=embeddings,
    persist_directory="./chroma_db"   # for persistence
)

# Add documents (chunks is a list of LangChain Document objects, e.g. from a text splitter)
vectorstore.add_documents(chunks)

# Similarity search
results = vectorstore.similarity_search("What is LangGraph?", k=5)

# With score
results_with_score = vectorstore.similarity_search_with_score("...", k=5)

Pinecone (Production Managed)

from langchain_pinecone import PineconeVectorStore
from pinecone import Pinecone, ServerlessSpec
import os

pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))

# Create index if not exists
index_name = "rag-index"
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,  # must match the embedding model (text-embedding-3-small outputs 1536 dimensions)
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1")
    )

vectorstore = PineconeVectorStore.from_documents(
    documents=chunks,
    embedding=embeddings,
    index_name=index_name
)

# Query
results = vectorstore.similarity_search("query here", k=6)

FAISS (Fast Local)

from langchain_community.vectorstores import FAISS

vectorstore = FAISS.from_documents(chunks, embeddings)

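# Save / Load (loading unpickles data, so only load index files you trust)
vectorstore.save_local("faiss_index")
loaded = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)

# Merge indexes for incremental updates
new_store = FAISS.from_documents(new_chunks, embeddings)
vectorstore.merge_from(new_store)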

Weaviate (Strong Hybrid & Filtering)

from langchain_weaviate import WeaviateVectorStore
import weaviate

client = weaviate.connect_to_local()   # or cloud

vectorstore = WeaviateVectorStore.from_documents(
    chunks,
    embeddings,
    client=client,
    index_name="MyDocuments"
)

from weaviate.classes.query import Filter

# Hybrid search example (filters use the v4 client Filter syntax)
results = vectorstore.similarity_search(
    "query",
    k=10,
    alpha=0.75,           # 0.0 = keyword only, 1.0 = vector only
    filters=Filter.by_property("source").equal("policy.pdf")
)

Metadata Filtering

Critical for production RAG.

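# Chroma: combine multiple conditions with $and; comparison operators such as
# $gte apply to numeric metadata, so the date is stored as an integer (e.g. 20250101)
results = vectorstore.similarity_search(
    "What are the benefits?",
    k=5,
    filter={"$and": [{"date": {"$gte": 20250101}}, {"category": {"$eq": "technical"}}]}
)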

# Pinecone
results = vectorstore.similarity_search(
    "...",
    filter={"user_id": {"$eq": "user_123"}, "department": "engineering"}
)

Hybrid Metadata + Vector Search

# Weaviate / Qdrant / Milvus excel here

# Example with LangChain (Weaviate, v4 client Filter syntax)
from weaviate.classes.query import Filter

retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={
        "k": 8,
        "score_threshold": 0.75,
        "filters": Filter.by_property("year").greater_or_equal(2024)
    }
)

Persistence and Scaling

Chroma → persist_directory or run as server
Pinecone → Serverless (scales automatically) or pod-based (capacity set by pod type and count)
Weaviate / Milvus → Kubernetes-ready, sharding, replication
FAISS → Save to disk with save_local; no built-in handling of concurrent writes

Updating Vector Stores

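# Incremental add
vectorstore.add_documents(new_chunks)

# Update existing (by ID); update_document is specific to the Chroma integration
vectorstore.update_document(document_id="doc_123", document=new_doc)

# Delete by ID (supported by most integrations)
vectorstore.delete(ids=["doc_456"])
# or by metadata filter, where the store supports it (for example Pinecone)
vectorstore.delete(filter={"source": "old_document.pdf"})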
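
Updates and deletes by ID only work if IDs are predictable. One possible approach, sketched below, is to derive each chunk ID from its source and position and keep a content hash per ID, so a re-ingest can work out which chunks changed and which disappeared. The `source#index` ID scheme and the `plan_sync` helper are assumptions for illustration, not a LangChain API.

```python
import hashlib

def content_hash(text: str) -> str:
    """Fingerprint of a chunk's text, used to detect changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_sync(stored_hashes: dict, fresh_chunks: dict):
    """Compare stored {id: hash} with freshly chunked {id: text}.

    Returns (ids to add or overwrite, ids to delete).
    """
    to_upsert = [
        cid for cid, text in fresh_chunks.items()
        if stored_hashes.get(cid) != content_hash(text)
    ]
    to_delete = [cid for cid in stored_hashes if cid not in fresh_chunks]
    return to_upsert, to_delete

# What the store currently holds for policy.pdf
stored_hashes = {
    "policy.pdf#0": content_hash("Leave accrues monthly."),
    "policy.pdf#1": content_hash("Carry-over is capped at 5 days."),
    "policy.pdf#2": content_hash("Old appendix."),
}
# The re-chunked, edited document
fresh_chunks = {
    "policy.pdf#0": "Leave accrues monthly.",
    "policy.pdf#1": "Carry-over is capped at 10 days.",
}
to_upsert, to_delete = plan_sync(stored_hashes, fresh_chunks)
print(to_upsert, to_delete)
```

The resulting IDs can then be passed as `ids=` to `add_documents` and to `delete(ids=...)`. Whether adding an existing ID overwrites the old record depends on the store, so check that behavior before relying on it.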

Common Vector Database Mistakes

Using Chroma in production at scale
Poor or missing metadata → useless filtering
No hybrid search when keywords matter
Wrong distance metric or unnormalized embeddings
Ignoring index tuning (HNSW efConstruction, M parameters)
No backup / recovery strategy
Storing everything in one collection/namespace
Not monitoring query latency and recall
Over-relying on pure vector search (ignoring BM25)
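
Monitoring recall, mentioned in the list above, needs a metric. A minimal sketch of recall@K follows: the fraction of known-relevant documents that appear in the top K results. The document IDs and relevance labels are made-up examples; in practice the labels come from a hand-built evaluation set or from exact search results.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of relevant documents found in the top-k retrieved results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant) / len(relevant)

retrieved = ["d3", "d1", "d7", "d2"]   # ranked output of the retriever
relevant = {"d1", "d2"}                # ground-truth labels for this query

recall_3 = recall_at_k(retrieved, relevant, k=3)
recall_4 = recall_at_k(retrieved, relevant, k=4)
print(recall_3, recall_4)
```

Averaging this value over a set of representative queries gives a single number to track alongside latency.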

Best Practices for Vector Databases

Start with Chroma/FAISS locally → migrate to Pinecone/Weaviate/Milvus
Always store rich, filterable metadata
Use hybrid search by default for most RAG use cases
Implement proper indexing + quantization for cost/performance
Version your collections (e.g., knowledge_base_v2026_06)
Add observability (query logs, latency, relevance feedback)
Use namespaces/multi-tenancy for different users or domains
Evaluate recall@K and latency with real workloads
Abstract the vector store behind a LangChain Retriever interface
Combine with reranking (Cohere, Voyage, or cross-encoders) for top quality
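
To illustrate the hybrid search recommendation, here is a minimal sketch of Reciprocal Rank Fusion (RRF), one common way to merge a vector ranking with a keyword (BM25) ranking. It is a generic technique, not necessarily what any particular database uses internally. The constant `k=60` is a conventional default, and the document IDs are invented.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked ID lists; each hit contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

vector_hits = ["d1", "d2", "d3"]    # from semantic search
keyword_hits = ["d3", "d1", "d4"]   # from BM25 / keyword search
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, and because only ranks are used, the two scoring scales never need to be made comparable.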

Pro Tip – Unified Vector Store Abstraction

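from typing import Literal

import weaviate
from langchain_chroma import Chroma
from langchain_core.vectorstores import VectorStore
from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain_weaviate import WeaviateVectorStore

def get_vectorstore(provider: Literal["chroma", "pinecone", "weaviate"] = "chroma") -> VectorStore:
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

    if provider == "chroma":
        return Chroma(embedding_function=embeddings, persist_directory="./db")
    elif provider == "pinecone":
        # Reads PINECONE_API_KEY from the environment
        return PineconeVectorStore(index_name="prod", embedding=embeddings)
    elif provider == "weaviate":
        # Assumption: a local Weaviate instance and a "MyDocuments" collection
        # whose text is stored under the "text" property
        client = weaviate.connect_to_local()
        return WeaviateVectorStore(
            client=client, index_name="MyDocuments", text_key="text", embedding=embeddings
        )
    # Fail loudly instead of silently returning None
    raise ValueError(f"Unknown provider: {provider}")

# Easy switching in LangGraph
vectorstore = get_vectorstore("weaviate")
retriever = vectorstore.as_retriever(search_kwargs={"k": 6})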

Vector databases turn static embeddings into dynamic, queryable knowledge. Choosing and configuring the right one is one of the highest-leverage decisions in building reliable AI agents.
