Hello curious engineers, welcome to the twenty-first issue of The Main Thread.
In this issue, we will cover five questions anyone must ask before adding a vector database to their application.
Vector databases have become the default answer to questions many teams haven’t fully asked. The moment someone mentions “semantic search” or “RAG pipeline,” the conversation jumps to Pinecone vs Weaviate vs Milvus, skipping past the more fundamental question of whether a dedicated vector database belongs in our architecture at all.
I have seen many engineers add vector databases that solved real problems elegantly. I have also seen engineers spend months integrating specialized infrastructure for workloads that Postgres with pgvector would have handled perfectly well.
The difference usually comes down to whether anyone paused to ask the right questions before making the decision.
Below are the five questions I work through before recommending, or advising against, adding a vector database to a system.
Q1. Do I Actually Need a Dedicated Vector Database?
The vector database market has exploded, but so has vector support in databases we might already be running.
Postgres now offers pgvector, which provides exact and approximate nearest neighbor search with IVF and HNSW indexes.
Elasticsearch added dense vector fields and KNN search.
Redis has vector similarity search.
SQLite has sqlite-vss.
Even MongoDB and Azure Cosmos DB now support vector operations.
I am not questioning whether vector search is useful; it often is. I am saying we need to determine whether the use case demands capabilities that the existing infrastructure can’t provide.
For many applications, the answer is no. If you are building semantic search over tens of thousands of documents, pgvector running alongside your Postgres DB handles that workload without introducing a new system to deploy, monitor, secure, and maintain.
The operational simplicity of keeping vectors in the primary database with unified backup, consistent transactions, and a single connection pool often outweighs the performance advantages of specialized systems.
Dedicated vector databases start earning their complexity budget when we are working at scales of tens of millions of vectors, when we need sub-10-millisecond query latency, when filtered search must combine similarity with complex metadata predicates, or when vectors and primary data have fundamentally different access patterns that benefit from separate scaling.
If your vectors can live in the existing database without causing performance problems, they probably should. Every new database in the architecture is a new thing to break in the middle of the night.
Q2. What Are My Scale and Latency Requirements - Really?
Vector search performance depends heavily on the algorithm used and how it’s tuned. Understanding these trade-offs helps us right-size our infrastructure.
Most vector databases use ANN (Approximate Nearest Neighbors) algorithms rather than exact search. This is because exact search requires comparing the query against every vector in the database, which is computationally prohibitive at scale.
The most common ANN algorithms are HNSW (Hierarchical Navigable Small World) graphs and IVF (Inverted File) indexes, sometimes combined with product quantization to compress vectors.
HNSW builds a multi-layer graph structure that allows traversal from coarse to fine granularity. It offers excellent query performance, typically sub-millisecond for collections under a million vectors, but requires significant memory because the graph structure must be held in RAM for fast traversal.
IVF partitions vectors into clusters and searches only the most relevant clusters, trading some recall accuracy for lower memory requirements.
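As a toy illustration of that trade-off, here is a pure-Python sketch contrasting exact search with a simplified IVF-style index. The clustering (random centroids, no k-means refinement) and the small dimensionality are deliberately minimal; this is not representative of a production index, only of the shape of the algorithm:

```python
import math
import random

random.seed(0)

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

# Exact search: compare the query against every stored vector -> O(n * d).
def exact_search(query, vectors, k=3):
    order = sorted(range(len(vectors)),
                   key=lambda i: cosine_distance(query, vectors[i]))
    return order[:k]

# Toy IVF: partition vectors around random centroids, then scan only the
# closest `nprobe` clusters -> much cheaper, but may miss true neighbors.
def build_ivf(vectors, n_clusters=4):
    centroids = random.sample(vectors, n_clusters)
    clusters = {c: [] for c in range(n_clusters)}
    for i, v in enumerate(vectors):
        nearest = min(range(n_clusters),
                      key=lambda c: cosine_distance(v, centroids[c]))
        clusters[nearest].append(i)
    return centroids, clusters

def ivf_search(query, vectors, centroids, clusters, k=3, nprobe=2):
    order = sorted(range(len(centroids)),
                   key=lambda c: cosine_distance(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in clusters[c]]
    return sorted(candidates,
                  key=lambda i: cosine_distance(query, vectors[i]))[:k]

vectors = [[random.gauss(0, 1) for _ in range(8)] for _ in range(200)]
query = [random.gauss(0, 1) for _ in range(8)]
exact = exact_search(query, vectors)
centroids, clusters = build_ivf(vectors)
approx = ivf_search(query, vectors, centroids, clusters)
```

The recall gap between `exact` and `approx` is exactly what the `nprobe`-style tuning knobs in real systems control: probing more clusters recovers accuracy at the cost of scanning more vectors.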
The practical implications are significant. A million 1536-dimensional vectors require roughly 6 GB just for the raw vector data, before accounting for index structures. HNSW indexes can double or triple that memory requirement.
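That figure falls out of simple arithmetic, assuming 4-byte float32 components, and is worth running before any capacity planning:

```python
def raw_vector_bytes(n_vectors, dims, bytes_per_float=4):
    """Raw storage for float32 vectors, before any index overhead."""
    return n_vectors * dims * bytes_per_float

# 1M vectors x 1536 dims x 4 bytes ~= 6.14 GB of raw data alone;
# an HNSW graph on top can double or triple the resident memory.
gb = raw_vector_bytes(1_000_000, 1536) / 1e9
```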
Before committing to a vector database, we must quantify our actual requirements. How many vectors will we store in year one? Year three? What query latency does our application actually need? What recall accuracy is acceptable?
Many teams over-provision because they haven’t answered these questions precisely. A system designed for 100M vectors with 5ms latency costs dramatically more than one designed for 10M vectors with 50ms latency, and often the latter is what the application requires.
Q3. How Will I Handle Data Consistency and Freshness?
Vector databases typically operate as secondary indexes alongside a primary database. The source of truth might be Postgres or MongoDB holding documents, while vectors live in Pinecone or Weaviate for similarity search. This immediately raises synchronization questions that many engineers underestimate.
When a document is updated in the primary database, how quickly must that change reflect in vector search results? If a user edits a product description, can we tolerate minutes of staleness before the updated embedding is searchable? Hours? For some applications like e-commerce, slight staleness is acceptable. For others, like real-time content moderation, it isn’t.
The synchronization architecture matters. Synchronous embedding generation adds latency to every write operation, since we must call an embedding API and write to the vector database before confirming the transaction.
Asynchronous approaches using Change Data Capture (CDC) or message queues provide better write performance but introduce eventual consistency and failure modes where the primary and vector databases can drift apart.
Consider failure scenarios. If the vector database is temporarily unavailable, can our application function without vector search? If the embedding service fails, do writes to the primary database also fail or do we queue embedding generation for later? If the systems drift out of sync, how do we detect and repair the inconsistency?
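One way to make these failure modes concrete is a minimal in-memory sketch of the asynchronous pattern: writes land in the primary synchronously, embedding jobs go to a queue, and a reconciliation check detects drift. All names here (`save_document`, `fake_embed`, the version counters) are illustrative, not any particular library’s API:

```python
from collections import deque

primary = {}       # source of truth: doc_id -> {"text", "version"}
vector_store = {}  # secondary index that can lag: doc_id -> {"embedding", "version"}
embed_queue = deque()

def fake_embed(text):
    # Stand-in for a real embedding API call.
    return [float(len(text))]

def save_document(doc_id, text):
    # Write to the primary synchronously; queue embedding asynchronously,
    # so an embedding-service outage never blocks the write path.
    version = primary.get(doc_id, {}).get("version", 0) + 1
    primary[doc_id] = {"text": text, "version": version}
    embed_queue.append((doc_id, version))

def process_embed_queue():
    # Runs out-of-band (worker / CDC consumer).
    while embed_queue:
        doc_id, version = embed_queue.popleft()
        doc = primary[doc_id]
        if doc["version"] != version:
            continue  # a newer write superseded this job
        vector_store[doc_id] = {"embedding": fake_embed(doc["text"]),
                                "version": version}

def find_drift():
    # Reconciliation: docs whose indexed version lags the primary.
    return [d for d, doc in primary.items()
            if vector_store.get(d, {}).get("version") != doc["version"]]
```

Between `save_document` and `process_embed_queue`, `find_drift` reports the document as stale; that window is precisely the eventual-consistency gap the architecture must tolerate, and running the drift check periodically is one answer to the detect-and-repair question.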
These questions don’t have universal answers, but they need explicit answers for the use case at hand. The worst outcome is discovering our consistency model is inadequate when a customer reports that freshly updated content isn’t appearing in search results.
Q4. What’s My Embedding Strategy and How Will It Evolve?
Embeddings are not static artifacts. The models that generate them improve continuously, and our choice of model significantly impacts search quality.
When OpenAI released text-embedding-3-small and text-embedding-3-large, they offered better performance than text-embedding-ada-002 on retrieval benchmarks. Organizations using the older model faced a choice: continue with inferior embeddings or re-embed their entire corpus.
Re-embedding a million documents through an API is expensive and operationally complex. It requires either downtime or a careful migration process running old and new embeddings in parallel.
We must plan for embedding model changes from the beginning. The schema should track which model version generated each vector. The infrastructure should support re-embedding without downtime, either through blue-green deployment of vector collections or through gradual migration strategies. Budget for re-embedding costs - both the compute or API expenses and the engineering time to execute migrations.
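A minimal sketch of what that versioned schema buys us, with illustrative model names and a hypothetical `needs_reembedding` helper for driving a gradual migration:

```python
from dataclasses import dataclass, field

@dataclass
class StoredVector:
    # Recording the model alongside each vector keeps a mixed-model corpus
    # auditable and lets migrations proceed incrementally.
    doc_id: str
    model: str       # e.g. "text-embedding-3-small" (illustrative)
    dimensions: int
    embedding: list = field(default_factory=list)

def needs_reembedding(vectors, current_model):
    """Return doc_ids generated by an older model, in migration order."""
    return [v.doc_id for v in vectors if v.model != current_model]

corpus = [
    StoredVector("doc-1", "text-embedding-ada-002", 1536, [0.1] * 1536),
    StoredVector("doc-2", "text-embedding-3-small", 1536, [0.2] * 1536),
]
stale = needs_reembedding(corpus, "text-embedding-3-small")
```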
Dimension changes compound the complexity. Moving from a 1536-dimensional model to a 3072-dimensional model is more than just a re-embedding exercise. It may require schema changes, index rebuilds, and potentially different infrastructure sizing given the increased memory requirements.
The embedding landscape is evolving rapidly, with models from providers like Cohere and Voyage AI, and open-source projects like E5 and BGE, offering competitive alternatives to proprietary APIs. Our architecture should accommodate experimentation and migration rather than locking us into a single embedding provider permanently.
Q5. What’s the Total Cost of Ownership?
Vector database costs extend far beyond the database service itself.
Managed vector database pricing typically combines storage costs, query costs, and sometimes cost per vector indexed. At small scale, these prices seem reasonable. At large scale, they can exceed the cost of the primary database infrastructure.
Embedding generation has its own cost structure. OpenAI charges per token embedded, which seems cheap until we are processing millions of documents. Self-hosted embedding models avoid per-token fees but require GPU infrastructure, and GPUs are expensive to run continuously.
Operational costs accumulate quietly. A new database means new monitoring dashboards, new alerting rules, new runbooks for on-call engineers, new backup verification procedures, and new security review surface area. The team needs to develop expertise in a system they haven’t operated before.
We must calculate the fully loaded cost before committing. For a given workload, compare the cost of a dedicated vector database against the cost of pgvector on a slightly larger Postgres instance. Factor in the engineering hours for integration and ongoing maintenance. Consider whether a simpler architecture frees up engineering time for features that more directly impact the business.
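The comparison itself is simple arithmetic; the work is in gathering honest inputs. A sketch, where every figure below is a placeholder to be replaced with real quotes and your team’s actual loaded hourly rate:

```python
def monthly_tco(db_cost, embedding_cost, eng_hours, hourly_rate=100):
    """Fully loaded monthly cost: service fees + embedding spend +
    engineering time. All inputs are placeholders, not vendor pricing."""
    return db_cost + embedding_cost + eng_hours * hourly_rate

# Hypothetical comparison: managed vector DB (new system: integration,
# on-call, runbooks) vs pgvector on a larger Postgres instance.
dedicated = monthly_tco(db_cost=1500, embedding_cost=300, eng_hours=40)
pgvector = monthly_tco(db_cost=400, embedding_cost=300, eng_hours=10)
```

With these made-up numbers the integrated option wins; at a different scale or latency target the dedicated system might. The point is that the decision comes from the inputs, not the spreadsheet.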
Sometimes the dedicated vector database is clearly worth the investment. Sometimes the analysis reveals that existing infrastructure, appropriately configured, solves the problem at a fraction of the cost and complexity.
The goal is to make that determination deliberately rather than defaulting to the newest technology because it’s receiving the most attention.
The Decision Framework
These five questions won’t generate automatic answers, but they structure the conversation that leads to good architectural decisions.
If vectors fit comfortably in the existing database, scale is measured in hundreds of thousands rather than hundreds of millions, consistency requirements are simple, embedding strategy is stable, and cost sensitivity is high, pgvector or similar integrated solutions deserve serious consideration.
If we are operating at a massive scale with demanding latency requirements, we need sophisticated filtered search capabilities, embeddings are a core competitive differentiator warranting dedicated infrastructure, and the operational cost is justified by business value, then a dedicated vector database makes sense.
Most engineering teams fall somewhere between these extremes, which is exactly why the questions matter more than predetermined answers. The best architecture is the one that solves the problem we actually have, not the one we imagine we might have someday.
That’s it for this issue. I’d love to hear your thoughts, mental models, or frameworks for deciding whether a vector database is needed.
— Anirudh

