Intermediate

Vector Databases
This guide covers vector databases, the backbone of scalable semantic search and Retrieval-Augmented Generation (RAG) systems. You will learn the core concepts, how to choose a database in 2026, integration examples with LangChain/LangGraph, advanced features such as metadata filtering and hybrid search, persistence strategies, and production best practices, each illustrated with code.


What Are Vector Databases?

Vector databases are specialized systems designed to store, index, and query high-dimensional dense vectors (embeddings) at scale. Unlike traditional databases that excel at exact matches or range queries on structured data, vector databases optimize for approximate nearest neighbor (ANN) search based on semantic similarity. They combine vector storage with fast indexing algorithms (HNSW, IVF, DiskANN, etc.) to return the most relevant items in milliseconds even from billions of vectors.

Why Vector Databases Matter

In AI agents and RAG systems, you need:
  • Sub-second semantic search across millions of documents
  • Efficient metadata filtering (date, source, user, department, etc.)
  • Hybrid search (vector + keyword)
  • Scalability, persistence, and observability
  • Easy integration with LangChain/LangGraph
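Pure in-memory libraries (such as raw FAISS) lack persistence, metadata filtering, and concurrent writes, and ad-hoc relational workarounds (vectors stored in a table without an ANN index) fall back to full scans that do not scale.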

Storing Embeddings

A typical flow:

  1. Load & chunk documents
  2. Generate embeddings
  3. Store vectors + rich metadata
  4. Query with filters
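The four steps above can be sketched end to end without any external service. This is a toy, in-memory illustration under stated assumptions: `embed` is a hypothetical hashed bag-of-words stand-in for a real embedding model, `ToyVectorStore` is an illustrative name rather than a library class, and search is exact brute force instead of an ANN index.

```python
import math
import re
import zlib
from typing import Optional

DIM = 256  # toy dimensionality; real embedding models use hundreds to thousands


def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Step 1: split text into fixed-size character windows that overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> list[float]:
    """Step 2 (hypothetical stand-in for a real model): hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class ToyVectorStore:
    """Illustrative in-memory store: exact (brute-force) search, no ANN index."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, list[float], str, dict]] = []

    def add(self, source: str, text: str, metadata: dict) -> None:
        """Step 3: store every chunk's vector together with its metadata."""
        for i, piece in enumerate(chunk(text)):
            meta = {"source": source, **metadata}
            self.rows.append((f"{source}#{i}", embed(piece), piece, meta))

    def query(self, text: str, k: int = 3, where: Optional[dict] = None) -> list[tuple[float, str]]:
        """Step 4: keep rows whose metadata matches `where`, then rank by cosine similarity."""
        q = embed(text)
        where = where or {}
        hits = []
        for doc_id, vec, _, meta in self.rows:
            if all(meta.get(key) == value for key, value in where.items()):
                score = sum(a * b for a, b in zip(q, vec))  # dot product equals cosine for unit vectors
                hits.append((score, doc_id))
        return sorted(hits, reverse=True)[:k]


store = ToyVectorStore()
store.add("docs", "LangGraph builds stateful agent workflows as graphs.", {"category": "technical"})
store.add("hr", "Our vacation policy grants twenty days per year.", {"category": "policy"})
print(store.query("How does LangGraph build agent graphs?", k=1))
print(store.query("vacation days", where={"category": "policy"}))
```

A real vector database replaces the linear scan in `query` with an index and evaluates the filter inside the index, but the data model (ID, vector, text, metadata) is the same.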
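Cosine similarity is the usual choice for text embeddings, but defaults vary by store (Chroma's default space and LangChain's FAISS wrapper both use L2 distance), so set the metric explicitly. Other supported metrics include Euclidean (L2), dot product, and Manhattan distance. For unit-normalized embeddings, cosine similarity, dot product, and L2 distance produce identical rankings.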
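For unit-normalized embeddings, cosine similarity, dot product, and L2 distance rank results identically, because ||q − d||² = 2 − 2·cos(q, d) whenever ||q|| = ||d|| = 1. A quick check with made-up 3-dimensional vectors (the vectors are arbitrary examples, not real embeddings):

```python
import math


def normalize(v: list[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


def l2(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up vectors, normalized to unit length
query = normalize([0.2, 0.9, 0.1])
docs = {name: normalize(v) for name, v in {"a": [0.1, 1.0, 0.0], "b": [1.0, 0.1, 0.3], "c": [0.5, 0.5, 0.5]}.items()}

by_cosine = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)  # higher = closer
by_dot = sorted(docs, key=lambda d: dot(query, docs[d]), reverse=True)        # higher = closer
by_l2 = sorted(docs, key=lambda d: l2(query, docs[d]))                        # lower = closer
print(by_cosine, by_dot, by_l2)  # all three orderings agree
```

If your embeddings are not normalized, the three metrics can disagree, which is why the metric should match what the embedding model was trained for.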

Vector Indexing

Key algorithms:
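  • HNSW: graph-based; the best general balance of speed and recall (the most widely used index in 2026)
  • IVF: clusters vectors and searches only the closest clusters; memory-efficient for very large datasets, especially combined with PQ
  • DiskANN: graph index kept mostly on SSD, for billion-scale search without holding everything in RAM
  • Quantization (PQ, scalar, binary): compresses vectors to reduce the memory footprint, at some cost in recall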
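To make IVF concrete, here is a minimal plain-Python sketch: k-means splits the vectors into `n_lists` clusters, each vector is stored in the list of its nearest centroid, and a query scans only the `nprobe` closest lists. The parameter names loosely mirror FAISS terminology, but this is an illustration of the idea, not FAISS's implementation. With `nprobe == n_lists` it degenerates to exact search; a smaller `nprobe` trades recall for speed.

```python
import random


def l2sq(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance (same ranking as L2, no sqrt needed)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: list[list[float]], n_lists: int, iters: int = 10, seed: int = 0) -> list[list[float]]:
    """Plain Lloyd's k-means; returns the cluster centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_lists)]
    for _ in range(iters):
        buckets: list[list[list[float]]] = [[] for _ in range(n_lists)]
        for v in vectors:
            buckets[min(range(n_lists), key=lambda c: l2sq(v, centroids[c]))].append(v)
        for c, members in enumerate(buckets):
            if members:  # leave empty clusters where they are
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class IVFIndex:
    def __init__(self, vectors: list[list[float]], n_lists: int = 8, seed: int = 0) -> None:
        self.vectors = vectors
        self.centroids = kmeans(vectors, n_lists, seed=seed)
        # Inverted lists: centroid index -> ids of the vectors assigned to it
        self.lists: list[list[int]] = [[] for _ in self.centroids]
        for i, v in enumerate(vectors):
            self.lists[self._nearest_centroids(v, 1)[0]].append(i)

    def _nearest_centroids(self, v: list[float], n: int) -> list[int]:
        return sorted(range(len(self.centroids)), key=lambda c: l2sq(v, self.centroids[c]))[:n]

    def search(self, q: list[float], k: int = 5, nprobe: int = 2) -> list[int]:
        """Scan only the vectors in the nprobe closest lists, then rank them exactly."""
        candidates = [i for c in self._nearest_centroids(q, nprobe) for i in self.lists[c]]
        return sorted(candidates, key=lambda i: l2sq(q, self.vectors[i]))[:k]


rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(500)]
index = IVFIndex(data, n_lists=8)
query = [0.5] * 8
exact = sorted(range(len(data)), key=lambda i: l2sq(query, data[i]))[:5]
approx = index.search(query, k=5, nprobe=2)
print("recall@5 with nprobe=2:", len(set(approx) & set(exact)) / 5)
```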

Choosing a Vector Database

| Database | Type | Best For | Hybrid Search | Open Source | Scale | Recommendation |
|---|---|---|---|---|---|---|
| Pinecone | Managed SaaS | Production, zero-ops | Yes | No | Billions | Enterprise default |
| Chroma | Embedded / Server | Prototyping & small apps | Basic | Yes | < 10M vectors | Local development |
| FAISS | Library | High-performance in-memory search | No | Yes | Millions (in-memory) | Speed-critical local use |
| Weaviate | Self-hosted / Cloud | Hybrid search + rich filtering | Excellent | Yes | Hundreds of millions | Complex RAG & multi-tenant |
| Milvus | Self-hosted / Cloud | Massive scale & GPU acceleration | Good | Yes | Billions | Large enterprise datasets |

Chroma (Best for Development)

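from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings  # reads OPENAI_API_KEY from the environment

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# `chunks` is the list of Documents produced by your loader/splitter; two examples here
chunks = [
    Document(page_content="LangGraph builds stateful, multi-step agent workflows as graphs.",
             metadata={"source": "langgraph.md", "category": "technical", "published": 20250301}),
    Document(page_content="Employees receive 20 vacation days per year.",
             metadata={"source": "policy.pdf", "category": "hr", "published": 20240115}),
]

vectorstore = Chroma(
    collection_name="my_knowledge_base",
    embedding_function=embeddings,
    persist_directory="./chroma_db",  # data is written to disk here automatically
)

# Add documents
vectorstore.add_documents(chunks)

# Similarity search
results = vectorstore.similarity_search("What is LangGraph?", k=5)

# With score (Chroma returns a distance: lower means more similar)
results_with_score = vectorstore.similarity_search_with_score("...", k=5)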

Pinecone (Production Managed)

from langchain_pinecone import PineconeVectorStore
from pinecone import Pinecone, ServerlessSpec
import os

pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))

# Create the index if it does not exist yet
index_name = "rag-index"
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,   # must equal the embedding size (1536 for text-embedding-3-small)
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1")
    )

# `embeddings` and `chunks` are assumed from the Chroma example
vectorstore = PineconeVectorStore.from_documents(
    documents=chunks,
    embedding=embeddings,
    index_name=index_name
)

# Query
results = vectorstore.similarity_search("query here", k=6)

FAISS (Fast Local)

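from langchain_community.vectorstores import FAISS  # requires the faiss-cpu (or faiss-gpu) package

vectorstore = FAISS.from_documents(chunks, embeddings)

# Save / load (the docstore is pickled: only load index files you created yourself)
vectorstore.save_local("faiss_index")
loaded = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)

# Incremental updates: build a small index for the new chunks and merge it in
new_index = FAISS.from_documents(new_chunks, embeddings)
loaded.merge_from(new_index)
loaded.save_local("faiss_index")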

Weaviate (Strong Hybrid & Filtering)

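from langchain_weaviate import WeaviateVectorStore
from weaviate.classes.query import Filter
import weaviate

client = weaviate.connect_to_local()   # or weaviate.connect_to_weaviate_cloud(...) for Weaviate Cloud

vectorstore = WeaviateVectorStore.from_documents(
    chunks,
    embeddings,
    client=client,
    index_name="MyDocuments"
)

# Hybrid search: langchain-weaviate's similarity_search runs a hybrid (BM25 + vector) query
results = vectorstore.similarity_search(
    "query",
    k=10,
    alpha=0.75,           # 0.0 = keyword (BM25) only, 1.0 = vector only
    filters=Filter.by_property("source").equal("policy.pdf")  # Weaviate v4 filter syntax
)
# Call client.close() when your application shuts down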
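The `alpha` parameter blends keyword and vector relevance. The sketch below shows one common fusion scheme: min-max normalize each score list, then take an `alpha`-weighted sum. It is similar in spirit to Weaviate's relative score fusion, but Weaviate's exact normalization may differ, so treat it as an illustration of what `alpha` controls, not a reproduction of Weaviate's internals. The scores are made up.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; a single distinct value maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {doc: (s - lo) / span if span else 1.0 for doc, s in scores.items()}


def hybrid_fuse(vector_scores: dict[str, float], keyword_scores: dict[str, float],
                alpha: float = 0.75, k: int = 10) -> list[tuple[str, float]]:
    """alpha=1.0 -> vector only, alpha=0.0 -> keyword only; missing scores count as 0."""
    v = min_max(vector_scores)
    kw = min_max(keyword_scores)
    docs = set(v) | set(kw)
    fused = {d: alpha * v.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0) for d in docs}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)[:k]


vector_scores = {"a": 0.91, "b": 0.88, "c": 0.40}   # cosine similarities (made up)
keyword_scores = {"c": 12.0, "b": 7.5, "d": 1.2}    # BM25 scores (made up)
for alpha in (0.0, 0.5, 1.0):
    print(alpha, hybrid_fuse(vector_scores, keyword_scores, alpha=alpha)[0])
```

Here pure keyword search favors `c`, pure vector search favors `a`, and the 50/50 blend surfaces `b`, which scores well on both signals.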

Metadata Filtering

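Metadata filtering is critical for production RAG: it scopes retrieval to the right users, sources, and time ranges. Each store has its own filter syntax.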
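# Chroma: combine conditions with $and; range operators ($gt, $gte, ...) need numeric values,
# so store dates as numbers (e.g. 20250101 or a Unix timestamp)
results = vectorstore.similarity_search(
    "What are the benefits?",
    k=5,
    filter={"$and": [{"published": {"$gte": 20250101}}, {"category": "technical"}]}
)

# Pinecone: multiple keys are combined with AND; a bare value means equality
results = vectorstore.similarity_search(
    "...",
    k=5,
    filter={"user_id": {"$eq": "user_123"}, "department": "engineering"}
)
# Weaviate / Qdrant / Milvus offer especially expressive filtering, each with its own syntax

# Example with LangChain (Weaviate v4 filter syntax)
from weaviate.classes.query import Filter

retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={
        "k": 8,
        "score_threshold": 0.75,
        "filters": Filter.by_property("year").greater_or_equal(2024)
    }
)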
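Native filter support matters because of how filtering interacts with top-k. If you fetch the top k by similarity and filter afterwards (post-filtering), a selective filter can leave you with fewer than k results; applying the filter before or during the search (pre-filtering, which vector databases typically integrate into index traversal) still returns k matches. A toy illustration with hypothetical precomputed scores:

```python
# 100 hypothetical documents; only every 10th one belongs to engineering
docs = [
    {"id": f"doc{i}", "score": 1.0 - i * 0.01, "department": "engineering" if i % 10 == 0 else "sales"}
    for i in range(100)
]


def top_k(rows: list[dict], k: int) -> list[dict]:
    return sorted(rows, key=lambda r: r["score"], reverse=True)[:k]


def post_filter(rows: list[dict], k: int, dept: str) -> list[dict]:
    """Search first, filter second: may return fewer than k rows."""
    return [r for r in top_k(rows, k) if r["department"] == dept]


def pre_filter(rows: list[dict], k: int, dept: str) -> list[dict]:
    """Filter first, search second: returns up to k matching rows."""
    return top_k([r for r in rows if r["department"] == dept], k)


print([r["id"] for r in post_filter(docs, 5, "engineering")])
print([r["id"] for r in pre_filter(docs, 5, "engineering")])
```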

Persistence and Scaling

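  • Chroma → data is written under persist_directory, or run Chroma as a client/server deployment
  • Pinecone → fully managed; serverless indexes scale automatically, pod-based indexes scale by pod type and count
  • Weaviate / Milvus → Kubernetes-ready, with sharding and replication
  • FAISS → save_local/load_local snapshots to disk; no built-in handling of concurrent writers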

Updating Vector Stores

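# Incremental add; pass explicit IDs so chunks can be updated or deleted later
# (illustrative IDs: use stable, unique IDs in practice)
vectorstore.add_documents(new_chunks, ids=[f"doc_{i}" for i in range(len(new_chunks))])

# Update an existing chunk by ID (Chroma's update_document)
vectorstore.update_document(document_id="doc_123", document=new_doc)

# Delete by ID (supported by most LangChain vector stores)
vectorstore.delete(ids=["doc_456"])

# Delete by metadata: syntax is store-specific; Chroma forwards `where` to its collection
vectorstore.delete(where={"source": "old_document.pdf"})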
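Updates are easier to manage when chunk IDs are deterministic, for example a hash of the source name plus chunk text: re-ingesting an unchanged file then yields the same IDs, so you only add new chunks and delete stale ones. A minimal sketch of that bookkeeping; `chunk_id` and `plan_sync` are hypothetical helper names (LangChain's indexing API with a record manager automates a similar workflow).

```python
import hashlib


def chunk_id(source: str, text: str) -> str:
    """Same source + same text always yields the same ID, across runs and machines."""
    return hashlib.sha256(f"{source}\n{text}".encode("utf-8")).hexdigest()[:16]


def plan_sync(source: str, texts: list[str], existing_ids: set[str]) -> tuple[dict[str, str], list[str]]:
    """Return (chunks to add keyed by ID, IDs to delete) for one re-ingested source document."""
    wanted = {chunk_id(source, t): t for t in texts}
    to_add = {cid: t for cid, t in wanted.items() if cid not in existing_ids}
    to_delete = sorted(existing_ids - set(wanted))
    return to_add, to_delete


# First ingestion: everything is new
add_v1, delete_v1 = plan_sync("a.pdf", ["alpha chunk", "beta chunk"], set())
stored = set(add_v1)

# The file changes: "alpha" was removed, "gamma" was added, "beta" is unchanged
add_v2, delete_v2 = plan_sync("a.pdf", ["beta chunk", "gamma chunk"], stored)
print(list(add_v2.values()), delete_v2)
```

The resulting `to_add` IDs and texts map onto `add_documents(..., ids=...)`, and `to_delete` onto `delete(ids=...)`.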

Common Vector Database Mistakes

  • Using Chroma in production at scale
  • Poor or missing metadata → useless filtering
  • No hybrid search when keywords matter
  • Wrong distance metric or unnormalized embeddings
  • Ignoring index tuning (HNSW M, efConstruction, and query-time ef)
  • No backup / recovery strategy
  • Storing everything in one collection/namespace
  • Not monitoring query latency and recall
  • Over-relying on pure vector search (ignoring BM25 keyword matching)
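For reference, this is roughly what BM25 keyword scoring computes. The sketch uses the Okapi BM25 formula with the commonly used defaults k1 = 1.2 and b = 0.75, a Lucene-style IDF, and a naive whitespace tokenizer; production search engines differ in tokenization, IDF variants, and defaults. The tiny corpus is made up.

```python
import math
from collections import Counter


def bm25_scores(query: str, corpus: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document in `corpus` against `query` with Okapi BM25."""
    docs = [doc.lower().split() for doc in corpus]
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            freq = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b)
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores


corpus = [
    "error code E1234 in payment service",
    "payment service overview",
    "how to reset your password",
]
print(bm25_scores("E1234", corpus))    # only the document containing the exact token scores
print(bm25_scores("payment", corpus))  # the shorter matching document scores higher
```

Exact identifiers such as error codes are where keyword scoring tends to complement embeddings.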

Best Practices for Vector Databases

  1. Start with Chroma/FAISS locally → migrate to Pinecone/Weaviate/Milvus
  2. Always store rich, filterable metadata
  3. Use hybrid search by default for most RAG use cases
  4. Implement proper indexing + quantization for cost/performance
  5. Version your collections (e.g., knowledge_base_v2026_06)
  6. Add observability (query logs, latency, relevance feedback)
  7. Use namespaces/multi-tenancy for different users or domains
  8. Evaluate recall@K and latency with real workloads
  9. Abstract the vector store behind a LangChain Retriever interface
  10. Combine with reranking (Cohere, Voyage, or cross-encoders) for top quality
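For practice 8, recall@K for an ANN index is usually measured against ground truth from an exact (brute-force) search over the same vectors; in RAG evaluation it is sometimes defined against human-labeled relevant documents instead. A minimal sketch where both result lists are passed in as plain ID lists (the data is hypothetical):

```python
def recall_at_k(approx: list[str], exact: list[str], k: int) -> float:
    """Share of the exact top-k IDs that also appear in the approximate top-k."""
    truth = set(exact[:k])
    if not truth:
        return 0.0
    return len(truth & set(approx[:k])) / len(truth)


def mean_recall_at_k(runs: list[tuple[list[str], list[str]]], k: int) -> float:
    """Average recall@k over (approx, exact) pairs, one pair per test query."""
    return sum(recall_at_k(a, e, k) for a, e in runs) / len(runs)


runs = [
    (["d1", "d2", "d9"], ["d1", "d2", "d3"]),  # hypothetical: the ANN search missed d3
    (["d4", "d5", "d6"], ["d5", "d4", "d6"]),  # same set in a different order counts fully
]
print(mean_recall_at_k(runs, k=3))
```

Tracking this number alongside query latency while tuning index parameters shows the speed/recall trade-off on your own data.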
Pro Tip – Unified Vector Store Abstraction
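from typing import Literal

import weaviate
from langchain_chroma import Chroma
from langchain_core.vectorstores import VectorStore
from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain_weaviate import WeaviateVectorStore

def get_vectorstore(provider: Literal["chroma", "pinecone", "weaviate"] = "chroma") -> VectorStore:
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

    if provider == "chroma":
        return Chroma(embedding_function=embeddings, persist_directory="./db")
    elif provider == "pinecone":
        return PineconeVectorStore(index_name="prod", embedding=embeddings)
    elif provider == "weaviate":
        client = weaviate.connect_to_local()
        return WeaviateVectorStore(client=client, index_name="MyDocuments", text_key="text", embedding=embeddings)
    raise ValueError(f"Unknown provider: {provider}")

# Switch providers with one argument; downstream code (e.g. a LangGraph node) only sees the retriever
vectorstore = get_vectorstore("weaviate")
retriever = vectorstore.as_retriever(search_kwargs={"k": 6})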
Vector databases turn static embeddings into dynamic, queryable knowledge. Choosing and configuring the right one is one of the highest-leverage decisions in building reliable AI agents.
