Documentation
From a pile of vectors to a served query API in 3 minutes
3-Minute Quickstart
Vector Panda takes your embedding vectors and serves them as a high-performance similarity search API. Upload Parquet, CSV, or binary vector files and go from files to live queries in three steps.
Install the SDK
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ veep
Upload your vectors
from veep import Client
client = Client(api_key="sk_live_your_key_here")
# Upload a Parquet file containing your embeddings
client.upload("my-collection", "embeddings.parquet")
Vector Panda ingests the file, distributes vectors across workers, and begins serving them within seconds.
Query for similar vectors
results = client.query(
    "my-collection",
    vector=[0.1, 0.2, 0.3, ...],
    top_k=10,
    include_metadata=True,
)

for r in results:
    print(f"{r.key}: {r.score:.4f}")
Your vectors are now searchable via the API. No index configuration, no cluster setup, no YAML files.
Installation
Install the veep Python SDK. Requires Python 3.9 or later.
veep is currently published to Test PyPI while in alpha. Install with the command below. It will move to the main PyPI index before GA.
# Install from Test PyPI (alpha)
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ veep
# With Parquet upload support
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ "veep[parquet]"
The core package depends only on requests.
The [parquet] extra adds pyarrow for file validation during uploads.
Authentication
All API access requires an API key. You can find your key in the dashboard after signing in.
from veep import Client
# Pass the key directly
client = Client(api_key="sk_live_your_key_here")
# Or set the VEEP_API_KEY environment variable
# export VEEP_API_KEY=sk_live_your_key_here
client = Client() # reads from VEEP_API_KEY
By default the SDK connects to https://api.vectorpanda.com.
You can also set VEEP_HOST to point at a custom endpoint.
Never commit API keys to version control. Use environment variables or a secrets manager.
Upload Vectors
Vector Panda ingests vectors from multiple file formats. Upload your data and the ingestion pipeline handles the rest.
Supported formats
| Format | Extensions | Description |
|---|---|---|
| Apache Parquet | .parquet | Columnar format with vectors, keys, and metadata columns. Recommended for most use cases. |
| CSV | .csv | Comma-separated values with vector columns. Good for small datasets or exports from other tools. |
| Binary vectors | .fvecs, .bvecs, .ivecs | Standard binary vector formats (float32, uint8, int32). Common in ANN benchmark datasets. |
Basic upload
client.upload("products", "product_embeddings.parquet")
With options (Parquet)
client.upload(
    "products",
    "product_embeddings.parquet",
    vector_column="emb",        # column containing vectors (default: "emb")
    key_column="product_id",    # column for vector keys (auto-generated if omitted)
)
Upload flow
When you call upload():
- The file is uploaded to Vector Panda over HTTPS
- The ingestion pipeline detects the format from the file extension
- Vectors are extracted, distributed across workers, and indexed
- Vectors become queryable within seconds of ingestion completing
You can upload multiple files to the same collection. Each upload adds vectors; it does not replace existing ones.
Deleting uploaded files
client.delete("products", "product_embeddings.parquet")
Query Vectors
Search a collection for the most similar vectors to a query vector.
results = client.query(
    "products",                   # collection name
    vector=[0.1, 0.2, ...],       # query vector (must match collection dimensions)
    top_k=10,                     # max results (default: 10)
    similarity_threshold=0.7,     # minimum score (default: 0.7)
    distance_metric="cosine",     # cosine | euclidean | dot_product
    include_metadata=True,        # include metadata fields (default: False)
)
Working with results
query() returns a QueryResults object. It is iterable and indexable.
# Iterate
for result in results:
    print(result.key, result.score)

# Index
best = results[0]
print(best.key, best.score, best.metadata)

# Length
print(f"Found {len(results)} matches")
Each Result object has three fields:
| Field | Type | Description |
|---|---|---|
| key | str | Vector identifier |
| score | float | Similarity score (higher = more similar) |
| metadata | dict | Metadata fields (empty dict if not requested) |
Index acceleration
For large collections, you can request index-accelerated search:
results = client.query(
    "products",
    vector=query_vec,
    use_index="pca",
    index_params={"reduced_dimensions": 64},
)
Manage Collections
A collection is a namespace for your vectors. Collections are created automatically when you upload your first file. You can list all collections accessible to your API key.
collections = client.collections()
for col in collections:
    print(f"{col.name}: {col.vector_count} vectors, {col.storage_gb:.2f} GB, tier={col.tier}")
Each Collection object has these fields:
| Field | Type | Description |
|---|---|---|
| name | str | Collection name |
| tier | str | Storage tier: hot, warm, or paused |
| is_active | bool | Whether the collection is currently queryable |
| vector_count | int | Total vectors stored |
| storage_gb | float | Storage used in gigabytes |
Tiers & Pausing
Every collection runs in one of three storage tiers. New collections default to warm. Free-tier accounts (no payment method) are limited to warm; hot requires a payment method on file.
| Tier | Storage | Queries | Use case |
|---|---|---|---|
| Hot | NVMe SSD | Unlimited, lowest latency | Production workloads, real-time search |
| Warm | SSD | Unlimited | Development, moderate-traffic apps (default) |
| Paused | HDD | None (must resume first) | Archival, cost savings for inactive collections |
Pausing a collection
Pausing moves data to HDD and stops serving queries. Your index parameters and vector data are preserved. Paused collections bill at $0.09/M vectors (base 512D).
Resuming
Resuming rebuilds artifacts from HDD source, distributes to workers, and promotes the epoch. Typical resume times: 100K vectors <30s, 1M ~1 min, 10M ~3-4 min.
Free-tier collections with no queries for 30 days are automatically paused. You'll receive an email notification 7 days before auto-pause.
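This doc doesn't name the SDK calls that pause or resume a collection, so only the waiting side is sketched here: a small poll loop over the documented is_active field that blocks until a resumed collection is serving again. The get_collection callable is a placeholder for however you fetch the Collection object (e.g. filtering client.collections()).

```python
import time

def wait_until_active(get_collection, timeout_s=300, poll_s=5.0):
    """Block until a collection reports is_active, e.g. after a resume.

    get_collection: zero-arg callable returning the Collection object
    (a placeholder; wire it to client.collections() or similar).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        col = get_collection()
        if col.is_active:
            return col
        time.sleep(poll_s)
    raise TimeoutError(f"collection not active after {timeout_s}s")
```

Given the typical resume times above, a 5-minute timeout comfortably covers collections up to about 10M vectors.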
Metadata
Metadata is stored alongside vectors and can be returned with query results. In Parquet files, any column that is not the vector column or key column is treated as metadata. CSV files also support metadata columns.
Supported metadata types: strings, integers, floats, booleans.
# Query with metadata
results = client.query(
    "products",
    vector=query_vec,
    include_metadata=True,
)
for r in results:
    print(r.metadata)
# {"product_name": "Widget", "price": 29.99, "category": "electronics"}
When include_metadata=False (the default), metadata is not loaded during search,
keeping queries fast. Set it to True only when you need the extra fields.
Distance Metrics
Vector Panda supports three distance metrics for similarity search.
Pass the distance_metric parameter to query().
| Metric | Value | Best for |
|---|---|---|
| Cosine similarity | "cosine" | Text embeddings, normalized vectors (default) |
| Euclidean distance | "euclidean" | Spatial data, image features |
| Dot product | "dot_product" | Maximum inner product search, recommendation scores |
Pricing & Free Tier
Every account gets 250,000 free vector slots at 512 dimensions — no credit card required. The free allowance scales inversely with dimensions: higher dimensions use more storage per vector, so fewer vectors fit in the same space.
Dimension multiplier
All pricing uses 512D as the baseline. The cost multiplier is dimensions / 512.
| Dimensions | Multiplier | Free vectors | Example models |
|---|---|---|---|
| 128 | 0.25x | 1,000,000 | Custom, lightweight |
| 256 | 0.50x | 500,000 | E5-small |
| 384 | 0.75x | 333,333 | all-MiniLM-L6-v2 |
| 512 | 1.00x | 250,000 | CLIP ViT-B/32 |
| 768 | 1.50x | 166,666 | all-mpnet-base-v2 |
| 1024 | 2.00x | 125,000 | Cohere embed-v3 |
| 1536 | 3.00x | 83,333 | OpenAI text-embedding-3-small |
| 3072 | 6.00x | 41,666 | OpenAI text-embedding-3-large |
Storage rates
| Tier | Rate (per M vectors/mo, 512D) |
|---|---|
| Hot | $2.99 |
| Warm | $0.49 |
| Paused | $0.09 |
Cost examples
Free tier: 250K vectors at 512D (or 83K at 1536D) — $0/mo.
1M vectors, 768D, warm: 1M × 1.5x × $0.49 = $0.74/mo (before the free-tier deduction).
5M vectors, 1536D, hot: 5M × 3.0x × $2.99 = $44.85/mo, minus free tier (83K vectors free ≈ $0.75 off).
There is no monthly minimum or platform fee. You pay only for storage above the free tier.
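The arithmetic above can be sketched as a small helper (rates and the 512D baseline are taken from the tables in this section; the function name is illustrative):

```python
RATES_PER_M = {"hot": 2.99, "warm": 0.49, "paused": 0.09}  # $/M vectors/mo at 512D
FREE_SLOTS_512D = 250_000

def monthly_cost(vectors, dims, tier):
    """Monthly storage cost in dollars, after the free-tier deduction."""
    effective = vectors * (dims / 512)              # 512D-equivalent vectors
    billable = max(effective - FREE_SLOTS_512D, 0)  # free slots come off first
    return billable / 1_000_000 * RATES_PER_M[tier]
```

For the 5M × 1536D hot example: 15M effective vectors minus the 250K free slots leaves 14.75M billable, or about $44.10/mo, matching the worked example above.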
Rate Limits
Rate limits are applied per API key. If you exceed the limit, the server returns
HTTP 429 Too Many Requests. Retry after the delay indicated in the
Retry-After header.
| Tier | Queries/sec | Uploads/min |
|---|---|---|
| Free | 10 | 5 |
| Warm | 50 | 20 |
| Hot | 200 | 60 |
# Handle rate limiting
from veep import ServerError
try:
    results = client.query("products", vector=vec)
except ServerError as e:
    if e.status_code == 429:
        print("Rate limited — slow down or upgrade tier")
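To actually back off rather than just report the 429, a small generic wrapper can retry with exponential delay. It takes callables so it assumes nothing beyond the status_code attribute shown above; the Retry-After header isn't exposed in this sketch, so a doubling sleep stands in for it (an assumption).

```python
import time

def with_retry(fn, is_rate_limited, max_attempts=5, sleep=time.sleep):
    """Call fn(); on a rate-limit error, back off exponentially and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as e:
            if not is_rate_limited(e) or attempt == max_attempts - 1:
                raise
            sleep(2 ** attempt)  # 1s, 2s, 4s, ...

# Wiring it to the SDK might look like (illustrative):
# result = with_retry(
#     lambda: client.query("products", vector=vec),
#     lambda e: getattr(e, "status_code", None) == 429,
# )
```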
Error Handling
The SDK raises typed exceptions for different error conditions.
All exceptions inherit from VeepError.
from veep import Client, AuthError, NotFoundError, ServerError, UploadError
try:
    results = client.query("my-collection", vector=vec)
except AuthError:
    print("Invalid API key")
except ServerError as e:
    print(f"Server error (HTTP {e.status_code}): {e}")
| Exception | When |
|---|---|
| AuthError | API key missing, invalid, or rejected (HTTP 401) |
| NotFoundError | Collection or file not found (HTTP 404) |
| ServerError | Unexpected server error (HTTP 5xx). Has .status_code |
| UploadError | File not found, wrong format, or missing vector column |
File Formats
Parquet (recommended)
Apache Parquet is the recommended format. It supports vector columns, key columns, and arbitrary metadata columns in a single file.
# Minimum: a vector column named "emb"
emb: list<float32>[768] # fixed-size list of your embedding dimensions
# Recommended: vector column + key column + metadata
emb: list<float32>[768] # embedding vectors
id: string # unique key per vector
title: string # metadata
category: string # metadata
price: float64 # metadata
Creating a Parquet file
import pyarrow as pa
import pyarrow.parquet as pq
# Your data
vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
keys = ["doc_1", "doc_2"]
titles = ["First document", "Second document"]
table = pa.table({
    "emb": pa.array(vectors, type=pa.list_(pa.float32())),  # match the list<float32> schema above
    "id": keys,
    "title": titles,
})
pq.write_table(table, "my_embeddings.parquet")
CSV
Plain CSV files with vector values. Useful for small datasets or quick experiments.
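The exact CSV column convention isn't specified in this section; one plausible layout, mirroring the Parquet schema above (a key column plus one column per vector dimension), can be produced with the standard library. Treat the column names as assumptions:

```python
import csv

# Hypothetical layout: a key column plus one column per vector dimension.
rows = [
    ("doc_1", 0.1, 0.2, 0.3),
    ("doc_2", 0.4, 0.5, 0.6),
]
with open("embeddings.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "emb_0", "emb_1", "emb_2"])
    writer.writerows(rows)
```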
Binary vector formats (fvecs / bvecs / ivecs)
Standard binary formats used by ANN benchmark datasets (SIFT, GloVe, etc). Each file contains a sequence of vectors prefixed with their dimension count.
- .fvecs — float32 vectors
- .bvecs — uint8 vectors
- .ivecs — int32 vectors
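The record layout described above (an int32 dimension count followed by the vector values) makes these files easy to inspect locally. A minimal NumPy reader for .fvecs, with a tiny round-trip, might look like this (a sketch; uploads still just take the file path):

```python
import struct
import numpy as np

def read_fvecs(path):
    """Read .fvecs: each record is an int32 dim d followed by d float32 values."""
    raw = np.fromfile(path, dtype=np.int32)
    d = int(raw[0])
    records = raw.reshape(-1, d + 1)          # one row per vector, dim prefix first
    return records[:, 1:].copy().view(np.float32)

# Round-trip a tiny two-vector file.
with open("demo.fvecs", "wb") as f:
    for vec in ([0.1, 0.2, 0.3], [0.4, 0.5, 0.6]):
        f.write(struct.pack("<i", len(vec)))
        f.write(struct.pack(f"<{len(vec)}f", *vec))

print(read_fvecs("demo.fvecs").shape)  # (2, 3)
```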
Generating Embeddings
Vector Panda stores and searches vectors. You generate them using any embedding model. Here are examples with popular providers.
OpenAI
from openai import OpenAI
openai = OpenAI()
response = openai.embeddings.create(
    input="Vector databases are fast",
    model="text-embedding-3-small"  # 1536 dimensions
)
vector = response.data[0].embedding
Sentence Transformers
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2") # 384 dimensions
vector = model.encode("Vector databases are fast").tolist()
Common dimension sizes
| Model | Dimensions |
|---|---|
| OpenAI text-embedding-3-small | 1,536 |
| OpenAI text-embedding-3-large | 3,072 |
| all-MiniLM-L6-v2 | 384 |
| CLIP ViT-B/32 | 512 |
| Cohere embed-english-v3 | 1,024 |
Best Practices
Keep dimensions consistent
All vectors in a collection must have the same number of dimensions. Use the same embedding model for all data in a collection and for query vectors.
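A cheap pre-upload guard catches mismatches before they reach the API (a sketch; the expected dimension comes from whichever embedding model you use):

```python
def check_dims(vectors, expected):
    """Raise if any vector's length differs from the expected dimension."""
    for i, v in enumerate(vectors):
        if len(v) != expected:
            raise ValueError(f"vector {i} has {len(v)} dims, expected {expected}")
    return vectors

check_dims([[0.1, 0.2], [0.3, 0.4]], expected=2)  # passes silently
```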
Use cosine similarity for text
Most text embedding models produce normalized vectors. Cosine similarity (the default) is the correct metric for these. Use dot product only if your model documentation specifically recommends it.
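The reason this works: for unit-length vectors, cosine and dot product coincide, since cosine(a, b) = a·b / (|a||b|) and both norms are 1. A quick check:

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

a = [0.6, 0.8]   # unit length
b = [1.0, 0.0]   # unit length
dot = sum(x * y for x, y in zip(a, b))
print(abs(cosine(a, b) - dot) < 1e-12)  # True
```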
Request metadata only when needed
Leave include_metadata=False (the default) for search-only workloads.
Metadata lookup adds a round-trip to the storage layer. Request it when you
need to display or filter on metadata fields.
Batch your uploads
Combine vectors into fewer, larger Parquet files rather than uploading many small files. Each file triggers an ingestion cycle, so larger files are more efficient.
Verify connectivity
Use client.health() to verify connectivity before running queries in scripts or CI pipelines.