Parquet to Vector Search in 5 Minutes
Upload your embedding vectors, confirm your schema, query instantly. No cluster setup, no YAML, no infrastructure.
Step 1 — Install the SDK
Install veep with Parquet support from Test PyPI (alpha). This adds pyarrow for file validation.
!pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ "veep[parquet]"
Successfully installed veep-0.1.0 pyarrow-18.1.0 requests-2.32.3
Step 2 — Create your embeddings file
Vector Panda ingests Parquet, CSV, and binary vector files (.fvecs, .bvecs, .ivecs). Here we'll create a Parquet file with 1,000 random vectors — in practice, these come from your embedding model (OpenAI, Sentence Transformers, CLIP, etc.).
import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq

# Generate 1,000 vectors with 384 dimensions (e.g. all-MiniLM-L6-v2)
np.random.seed(42)
vectors = np.random.randn(1000, 384).astype(np.float32)
vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)  # normalize

# Build a Parquet table with vector, key, and metadata columns
table = pa.table({
    "emb": vectors.tolist(),
    "id": [f"doc_{i}" for i in range(1000)],
    "category": np.random.choice(["science", "sports", "music"], 1000).tolist(),
})
pq.write_table(table, "demo_embeddings.parquet")
print("Created: 1,000 vectors x 384 dims")
Created: 1,000 vectors x 384 dims
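If you'd rather upload a raw binary file, the same vectors can be serialized as .fvecs by hand — in that format, each record is a little-endian int32 dimension count followed by that many float32 values. A minimal sketch (the filename is arbitrary):

```python
import numpy as np

np.random.seed(42)
vectors = np.random.randn(1000, 384).astype(np.float32)

# .fvecs layout: per vector, an int32 dimension count, then the float32 values.
# Reinterpret the int32 dims as float32 so everything packs into one array.
dims = np.full((vectors.shape[0], 1), vectors.shape[1], dtype=np.int32)
records = np.hstack([dims.view(np.float32), vectors])
records.tofile("demo_embeddings.fvecs")

# Read it back to verify the round trip: skip the leading dim column
raw = np.fromfile("demo_embeddings.fvecs", dtype=np.float32).reshape(1000, 385)
restored = raw[:, 1:]
print(np.allclose(restored, vectors))  # True
```

The `dims.view(np.float32)` trick reinterprets the int32 bytes as float32 without converting them, so the on-disk bytes match the .fvecs spec exactly.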
Step 3 — Upload to Vector Panda
Create a client with your API key and upload the file. Collections are created automatically on first upload. This example uses 1,000 vectors at 384 dimensions — well within the free tier of 250K vectors at 512D (equivalently, about 333K vectors at 384D). No credit card needed.
from veep import Client

client = Client(api_key="sk_live_your_key_here")

# Upload — collection "demo" is created automatically
client.upload("demo", "demo_embeddings.parquet")
{'status': 'uploaded', 'collection': 'demo', 'file': 'demo_embeddings.parquet', 'bytes': 1572864}
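The byte count in the response can be sanity-checked against the raw vector data:

```python
# Raw float32 vector data alone: 1,000 vectors x 384 dims x 4 bytes each
num_vectors, dim = 1000, 384
raw_bytes = num_vectors * dim * 4
print(raw_bytes)            # 1536000

# The upload reported 1,572,864 bytes — the ~36 KB difference covers
# Parquet framing plus the id and category columns
print(1572864 - raw_bytes)  # 36864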
Step 4 — Review and confirm schema
After upload, Vector Panda analyzes your file and suggests which columns contain IDs, vectors, and metadata. Review the suggestion and confirm to start indexing. This step ensures your data is interpreted correctly, especially when column names are ambiguous.
# Check the suggested schema
schema = client.schema("demo")
print(f"State: {schema['state']}")
print(f"ID field: {schema['id_field']}")
print(f"Vector: {schema['vector_field']} ({schema['dimension']}D)")
print(f"Metadata: {schema['metadata_fields']}")
State: suggested
ID field: id
Vector: emb (384D)
Metadata: ['category']
# Confirm the schema to start indexing
client.confirm_schema("demo", id_field="id", vector_field="emb")
print("Schema confirmed — vectors are now being indexed")
Schema confirmed — vectors are now being indexed
Step 5 — Query for similar vectors
Pick any vector as a query and search for its nearest neighbors. Results come back sorted by similarity score, with metadata included.
# Use the first vector as our query
query_vector = vectors[0].tolist()

results = client.query(
    "demo",
    vector=query_vector,
    top_k=5,
    include_metadata=True,
)

for r in results:
    print(f"{r.key:10} score={r.score:.4f} category={r.metadata.get('category', '')}")
doc_0 score=1.0000 category=science
doc_472 score=0.8834 category=music
doc_891 score=0.8719 category=sports
doc_204 score=0.8651 category=science
doc_637 score=0.8590 category=music
Step 6 — Verify your collections
Check what's deployed and how much storage you're using.
for col in client.collections():
    print(f"{col.name:15} {col.vector_count:>8} vectors {col.storage_gb:.2f} GB tier={col.tier}")
demo 1000 vectors 0.00 GB tier=hot
Done.
Your vectors are live. Upload more files to add vectors, or query from any Python script, cURL, or HTTP client using the same API key.
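For querying over plain HTTP, a sketch along these lines should work with the `requests` library the SDK already pulls in. The endpoint URL and JSON field names below are illustrative assumptions, not the documented Vector Panda REST API — check your dashboard for the real paths:

```python
import json

# Hypothetical HTTP query sketch — URL and payload shape are assumptions
API_KEY = "sk_live_your_key_here"

payload = {
    "vector": [0.0] * 384,      # replace with a real query embedding
    "top_k": 5,
    "include_metadata": True,
}
headers = {
    "Authorization": f"Bearer {API_KEY}",  # same key the SDK uses
    "Content-Type": "application/json",
}
body = json.dumps(payload)

# import requests
# resp = requests.post(
#     "https://api.vectorpanda.example/v1/collections/demo/query",  # assumed URL
#     headers=headers,
#     data=body,
# )
```

The same request translates directly to cURL: POST the JSON body with the `Authorization: Bearer …` header.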