How we build Vector Panda: architecture decisions, performance deep dives, and lessons from running a distributed vector search system.
Most vector databases rely on approximate indexes like HNSW or IVF, trading recall for speed. We went the opposite direction: exhaustive search with PCA-based dimensionality reduction and aggressive distribution across workers. Here's why that decision gives us 100% recall at competitive latencies.
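The core idea can be sketched in a few lines of NumPy: project the collection onto its top principal components, then brute-force scan the reduced vectors. This is a minimal illustration, not our production code; the function names and dimensions here are invented for the example.

```python
import numpy as np

def pca_reduce(vectors, target_dim):
    # Center the data, then project onto the top principal components via SVD.
    mean = vectors.mean(axis=0)
    centered = vectors - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:target_dim]
    return centered @ components.T, mean, components

def exhaustive_search(query, reduced, mean, components, k=5):
    # Project the query into the same reduced space, then scan every vector.
    q = (query - mean) @ components.T
    dists = np.linalg.norm(reduced - q, axis=1)
    return np.argsort(dists)[:k]

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 128))
reduced, mean, comps = pca_reduce(data, 32)
top = exhaustive_search(data[7], reduced, mean, comps, k=3)
```

Because every reduced vector is scanned, recall in the reduced space is exact; the reduction step is where any approximation lives, which is a very different trade-off from pruning the candidate set with a graph or inverted index.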
Not all vectors need sub-10ms query times. Our tiered storage model lets you keep frequently-queried data on NVMe (hot), less active data on SSD (warm), and archived data on HDD (paused) at a fraction of the cost. We explain the coordinator logic that routes queries and manages tier transitions.
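A tier decision like the one described above can be modeled as a simple recency policy. The thresholds below are hypothetical placeholders, not our actual coordinator's values:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Illustrative windows only; the real transition policy is more involved.
HOT_WINDOW = timedelta(days=7)    # queried within a week  -> NVMe
WARM_WINDOW = timedelta(days=30)  # queried within a month -> SSD

@dataclass
class Collection:
    name: str
    last_queried: datetime

def pick_tier(c: Collection, now: datetime) -> str:
    age = now - c.last_queried
    if age <= HOT_WINDOW:
        return "hot"     # NVMe
    if age <= WARM_WINDOW:
        return "warm"    # SSD
    return "paused"      # HDD archive

now = datetime(2024, 6, 1)
tier = pick_tier(Collection("embeddings", datetime(2024, 5, 30)), now)
```

The same routine runs in reverse on query arrival: a query against a paused collection triggers a promotion back toward the hot tier.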
Our coordinator distributes vector shards across a fleet of workers connected via WebSocket. We cover the epoch system that ensures consistency, the discovery service that finds workers on the network, and the query fanout that aggregates partial results into a single ranked response.
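The aggregation step of the fanout reduces to a k-way merge: each worker returns its local top-k sorted by distance, and the coordinator merges the sorted lists into a single global top-k. A minimal sketch (the worker result shape is assumed for illustration):

```python
import heapq

def fanout_merge(partial_results, k):
    # Each worker returns its local top-k as sorted (distance, vector_id)
    # pairs; heapq.merge streams them in order and nsmallest keeps k.
    return heapq.nsmallest(k, heapq.merge(*partial_results))

worker_a = [(0.1, "v1"), (0.4, "v9")]
worker_b = [(0.2, "v3"), (0.3, "v7")]
merged = fanout_merge([worker_a, worker_b], 3)
```

Because each worker already truncated to its local top-k, the coordinator only ever touches `workers * k` candidates regardless of collection size.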
We analyzed billing models across Pinecone, Weaviate Cloud, Qdrant, and Zilliz. Most charge per-query or per-compute-unit, making costs unpredictable. We chose per-vector storage billing with unlimited queries included. Here's the math behind that decision and why it works.
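Per-vector billing makes the cost function a one-liner. The rate below is a made-up example, not our price sheet; the point is that the bill depends only on what you store:

```python
def monthly_cost(num_vectors, price_per_million=2.0):
    # Hypothetical rate: flat fee per million stored vectors per month,
    # with queries included at no extra charge.
    return num_vectors / 1_000_000 * price_per_million

cost = monthly_cost(5_000_000)  # 5M vectors at the example rate
```

Under a per-query model the same workload's bill scales with traffic, which is exactly the unpredictability the post digs into.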
Traditional vector databases require setting nprobe, ef_search, M, and dozens of other parameters. Our approach auto-selects index strategies based on collection size, dimensionality, and query patterns. Upload your vectors and search immediately — no configuration needed.
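The flavor of that auto-selection can be shown with a toy heuristic. The thresholds and returned fields here are illustrative assumptions, not the production selector:

```python
def select_strategy(n_vectors, dim):
    # Invented heuristic for illustration: small collections are scanned
    # as-is; larger ones get PCA reduction and are sharded across workers.
    if n_vectors < 100_000:
        return {"index": "exhaustive", "reduce_dim": None, "shards": 1}
    target = min(dim, 64)  # reduce high-dimensional data before scanning
    shards = max(1, n_vectors // 1_000_000)
    return {"index": "exhaustive", "reduce_dim": target, "shards": shards}

plan = select_strategy(3_000_000, 768)
```

The user-visible contract is the important part: the selector's inputs are things the system already knows (size, dimensionality, observed query patterns), so there is nothing left for the user to tune.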
When vectors are appended, deleted, or resharded, every worker needs a consistent view of the data. We use an epoch system where the coordinator bumps a version number and workers transition atomically. This post walks through the protocol messages, the hot-append optimization, and failure recovery.
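The shape of that protocol resembles a two-phase commit keyed by epoch: workers stage the new shard assignment, then switch atomically when told to. A simplified in-memory sketch, with message plumbing and failure handling omitted:

```python
from dataclasses import dataclass, field

@dataclass
class Worker:
    epoch: int = 0
    shards: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)

    def prepare(self, new_epoch, new_shards):
        # Stage the new assignment without serving it yet.
        self.pending[new_epoch] = new_shards

    def commit(self, new_epoch):
        # Atomic switch: after this, queries tagged with an older epoch
        # are rejected, so all workers answer from the same data version.
        if new_epoch in self.pending:
            self.shards = self.pending.pop(new_epoch)
            self.epoch = new_epoch

class Coordinator:
    def __init__(self, workers):
        self.workers = workers
        self.epoch = 0

    def reshard(self, assignments):
        # Bump the version: prepare everywhere, then commit everywhere.
        new_epoch = self.epoch + 1
        for w, shards in zip(self.workers, assignments):
            w.prepare(new_epoch, shards)
        for w in self.workers:
            w.commit(new_epoch)
        self.epoch = new_epoch

workers = [Worker(), Worker()]
coord = Coordinator(workers)
coord.reshard([{"shard_0": ["v1"]}, {"shard_1": ["v2"]}])
```

The post covers what this sketch leaves out: how the hot-append path avoids a full epoch bump, and what happens when a worker fails between prepare and commit.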