Vector Database Sizing: Index, RAM, and Storage Overhead
An in-depth engineering guide to calculating RAM capacity, disk storage overhead, and index multipliers for vector databases like Pinecone, Milvus, Qdrant, and pgvector.
Vector Databases and Sizing Complexity
Vector databases have become the cornerstone of modern AI architectures, serving as the long-term memory for Large Language Models (LLMs) via Retrieval-Augmented Generation (RAG). Unlike traditional databases that store structured tables or documents, vector databases store multi-dimensional floating-point vectors representing semantic embeddings. Because finding similar vectors requires computing distances (such as Cosine or Euclidean distance) across high-dimensional space, indexing and search operations are extremely memory-intensive.
Calculating memory (RAM) and disk storage capacity for vector databases is a critical step in system architecture. Under-provisioning leads to query latency spikes or out-of-memory (OOM) crashes, while over-provisioning results in excessive cloud infrastructure costs. To estimate capacity, developers must understand the mathematical components of vector sizes, index overheads, and vector compression techniques like quantization.
The Mathematics of Vector Sizing
The absolute baseline for any sizing estimate is the raw vector size. This represents the memory required to store the bare array of numbers before applying any search indexing. It depends entirely on the embedding dimensions (determined by the model, such as 1536 for OpenAI's text-embedding-3-small) and the numerical precision format used (such as 32-bit floating point).
For instance, storing 10,000,000 vectors with 1536 dimensions using standard float32 precision requires: 1536 × 4 bytes × 10,000,000 = 61,440,000,000 bytes, or approximately 61.44 GB of raw capacity. However, when loaded into a vector database, this raw footprint represents only a fraction of the total memory required because of index structures.
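The arithmetic above can be sketched as a small helper. This is a minimal illustration, not part of any database's API: the function names `raw_vector_bytes` and `to_gb` are hypothetical, and gigabytes are taken as decimal (10^9 bytes) to match the figures in this guide.

```python
# Bytes per dimension for each precision format discussed in this guide.
BYTES_PER_DIM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "binary": 0.125}


def raw_vector_bytes(num_vectors: int, dimensions: int, precision: str = "float32") -> float:
    """Raw storage for the bare vectors, before any index is built."""
    return num_vectors * dimensions * BYTES_PER_DIM[precision]


def to_gb(num_bytes: float) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_bytes / 1e9


# 10 million vectors, 1536 dimensions, float32
size = raw_vector_bytes(10_000_000, 1536, "float32")
print(f"{to_gb(size):.2f} GB")  # 61.44 GB
```

Swapping the `precision` argument gives the same vector set at other precisions.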
| Dimensions (raw size per 1 million vectors) | Float32 (4 Bytes) | Float16 (2 Bytes) | Int8 (1 Byte) | Binary (1/8 Byte) |
|---|---|---|---|---|
| 384 (e.g. MiniLM) | 1.54 GB | 0.77 GB | 0.38 GB | 0.05 GB |
| 768 (e.g. BGE-Base) | 3.07 GB | 1.54 GB | 0.77 GB | 0.10 GB |
| 1024 (e.g. BGE-Large) | 4.10 GB | 2.05 GB | 1.02 GB | 0.13 GB |
| 1536 (e.g. OpenAI v3) | 6.14 GB | 3.07 GB | 1.54 GB | 0.19 GB |
Understanding HNSW Index RAM Overhead
To query vectors in milliseconds, databases construct Approximate Nearest Neighbor (ANN) index graphs. The most popular algorithm is Hierarchical Navigable Small World (HNSW). HNSW builds a multi-layer graph structure that allows search queries to skip large sections of the database, similar to skip lists. This speed comes at the expense of substantial memory overhead.
An HNSW index must store the graph edges connecting the vectors. The size of the graph is driven mainly by M (the maximum number of connection links per node); efConstruction (the size of the dynamic candidate list evaluated during index construction) mostly affects build time and graph quality rather than the stored size. As a planning rule of thumb, budget the HNSW index at 1.5x to 2.0x the raw vector storage in additional memory. When the index is held entirely in RAM (which is required for maximum speed), the total memory required is the raw vector size plus the HNSW graph overhead.
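As a rough sketch of that rule of thumb, the helper below turns a raw vector size into a low and high RAM estimate. The function name `hnsw_ram_range_gb` is hypothetical, and the multipliers are simply the 1.5x to 2.0x planning range quoted above, not values measured from any particular engine.

```python
def hnsw_ram_range_gb(raw_gb: float, low: float = 1.5, high: float = 2.0) -> tuple[float, float]:
    """Return (low, high) total RAM in GB for raw vectors plus the HNSW graph.

    The graph overhead is modeled as a multiple of the raw vector size,
    so total = raw + raw * multiplier.
    """
    return raw_gb * (1 + low), raw_gb * (1 + high)


# 10 million float32 vectors at 1536 dimensions = 61.44 GB raw
low_total, high_total = hnsw_ram_range_gb(61.44)
print(f"{low_total:.2f} GB to {high_total:.2f} GB")  # 153.60 GB to 184.32 GB
```

These figures exclude any engine-level padding; treat them as a starting point and validate against the memory usage your chosen database actually reports.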
Vector Compression: Scalar Quantization (SQ) and Product Quantization (PQ)
To mitigate high RAM costs, modern vector databases utilize quantization techniques to compress vectors. The two main types are Scalar Quantization (SQ) and Product Quantization (PQ).
Scalar Quantization (SQ8) converts 32-bit floating point numbers (float32) into 8-bit integers (int8). This reduces the vector size by 75% while typically maintaining around 98-99% of the original search recall. Product Quantization (PQ) splits each high-dimensional vector into smaller sub-vectors, clusters the sub-vectors, and stores only a short code identifying the nearest cluster for each one. PQ can compress vectors by up to 95-97% (taking float32 to less than 1 byte per dimension), but it loses more search accuracy and adds query-time decompression overhead.
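To make the two techniques concrete, here is a simplified sketch in plain Python. It assumes a per-vector min/max range for SQ8, whereas production engines usually derive ranges from the whole dataset, and it only computes the stored code size for PQ rather than training codebooks. All function names are hypothetical.

```python
def sq8_quantize(vector: list[float]) -> tuple[bytes, float, float]:
    """Map each float onto one of 256 evenly spaced levels between min and max."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant vectors
    codes = bytes(min(255, round((x - lo) / scale)) for x in vector)
    return codes, lo, scale


def sq8_dequantize(codes: bytes, lo: float, scale: float) -> list[float]:
    """Reconstruct approximate floats from the 1-byte codes."""
    return [lo + c * scale for c in codes]


def pq_bytes_per_vector(num_subvectors: int, bits_per_code: int = 8) -> float:
    """PQ stores one short code per sub-vector instead of the floats themselves."""
    return num_subvectors * bits_per_code / 8


vec = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, lo, scale = sq8_quantize(vec)
approx = sq8_dequantize(codes, lo, scale)
print(len(codes), "bytes instead of", len(vec) * 4)  # 5 bytes instead of 20

# 1536 dimensions split into 192 sub-vectors of 8 dimensions, 8-bit codes
pq_size = pq_bytes_per_vector(192)
print(f"PQ compression: {1 - pq_size / (1536 * 4):.1%}")  # PQ compression: 96.9%
```

The reconstruction error of this SQ8 sketch is bounded by half a quantization step (`scale / 2`) per dimension, which is why recall usually degrades only slightly.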
Vector DB Sizing Recommendations
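The table below gives total RAM planning scenarios for common vector workloads using standard float32 precision and HNSW indexing. The projected graph size is taken as 1.5x the raw vector size, and the total adds 20% padding for database engine overhead: Total Planned RAM = (Raw Vector Size + Projected HNSW Graph Size) × 1.2.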
| Vector Count | Dimensions | Raw Vector Size | Projected HNSW Graph Size | Total Planned RAM |
|---|---|---|---|---|
| 1 Million | 1536 | 6.14 GB | 9.21 GB | 18.43 GB |
| 5 Million | 1536 | 30.72 GB | 46.08 GB | 92.16 GB |
| 10 Million | 1536 | 61.44 GB | 92.16 GB | 184.32 GB |
| 50 Million | 768 | 153.60 GB | 230.40 GB | 460.80 GB |
| 100 Million | 384 | 153.60 GB | 230.40 GB | 460.80 GB |
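The rows above can be reproduced with a short calculation. This is a sketch under the assumptions the table appears to use: float32 vectors, an HNSW graph budgeted at 1.5x the raw size, and 20% engine padding on the combined total. The function name `planned_ram_gb` and its parameter names are hypothetical.

```python
def planned_ram_gb(
    num_vectors: int,
    dimensions: int,
    bytes_per_dim: int = 4,          # float32
    graph_multiplier: float = 1.5,   # assumed HNSW graph size relative to raw
    engine_padding: float = 0.20,    # assumed database engine overhead
) -> dict[str, float]:
    """Return raw, graph, and total planned RAM in decimal GB."""
    raw = num_vectors * dimensions * bytes_per_dim / 1e9
    graph = raw * graph_multiplier
    total = (raw + graph) * (1 + engine_padding)
    return {"raw_gb": raw, "graph_gb": graph, "total_gb": total}


for count, dims in [(1_000_000, 1536), (10_000_000, 1536), (50_000_000, 768)]:
    plan = planned_ram_gb(count, dims)
    print(f"{count:>11,} x {dims}: "
          f"raw {plan['raw_gb']:.2f} GB, "
          f"graph {plan['graph_gb']:.2f} GB, "
          f"total {plan['total_gb']:.2f} GB")
```

Lowering `bytes_per_dim` to 2 or 1 shows how float16 or int8 storage shifts the same workloads.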
Frequently Asked Questions
Why do vector databases require so much RAM compared to relational databases?
To deliver millisecond-level search latencies across millions of vectors, the database must hold the index graph (like HNSW) and sometimes the raw vectors in memory. Reading high-dimensional index structures from disk during search introduces excessive I/O latency.
How does pgvector store vectors and what are its memory requirements?
pgvector is an extension for PostgreSQL that supports the IVFFlat and HNSW index types. IVFFlat has minimal memory overhead, whereas HNSW in pgvector has memory requirements similar to those of standalone vector databases: for fast performance, the server needs enough RAM to keep the HNSW index in memory.
What is the difference between float32 and float16 in vector storage?
Float32 uses 4 bytes per dimension, while float16 uses 2 bytes. Moving from float32 to float16 cuts the raw memory requirement in half, usually with negligible loss in search accuracy, which is why many embedding providers now support float16 formats.
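The size difference and the rounding cost can be checked with Python's standard `struct` module, whose `f` and `e` format characters encode float32 and float16 respectively. The sample values and the helper name `pack_vector` below are made up for illustration.

```python
import struct


def pack_vector(vector: list[float], fmt: str) -> bytes:
    """Serialize a vector little-endian; fmt 'f' is float32, 'e' is float16."""
    return struct.pack(f"<{len(vector)}{fmt}", *vector)


vec = [0.0123, -0.4567, 0.8910, 0.3333]  # illustrative embedding values

as_f32 = pack_vector(vec, "f")
as_f16 = pack_vector(vec, "e")
print(len(as_f32), len(as_f16))  # 16 8

# Round-trip through float16 to see how much precision is lost
restored = struct.unpack(f"<{len(vec)}e", as_f16)
max_error = max(abs(a - b) for a, b in zip(vec, restored))
print(f"max absolute error: {max_error:.6f}")
```

For values in the typical range of normalized embeddings (magnitudes below 1), the float16 rounding error stays below roughly 0.0005 per component.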
Does disk storage need to match RAM capacity?
No. Disk storage needs to be larger than RAM. Disk storage holds the raw vectors, index files, and transaction write-ahead logs (WAL). Generally, provision disk space at 1.5x to 2x the size of the total index and raw vector footprint to allow for system backups and temp files.
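A small sketch of that disk guideline follows. The function name `disk_range_gb` is hypothetical, and the multipliers are just the 1.5x to 2x range stated above, not a vendor requirement.

```python
def disk_range_gb(raw_gb: float, index_gb: float,
                  low: float = 1.5, high: float = 2.0) -> tuple[float, float]:
    """Return (low, high) disk provisioning in GB for raw vectors plus index files."""
    footprint = raw_gb + index_gb
    return footprint * low, footprint * high


# 10 million float32 vectors at 1536 dimensions: 61.44 GB raw + 92.16 GB HNSW graph
low_disk, high_disk = disk_range_gb(61.44, 92.16)
print(f"{low_disk:.1f} GB to {high_disk:.1f} GB")  # 230.4 GB to 307.2 GB
```

The headroom covers the WAL, backups, and temp files.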