quickstart.ipynb

Step 1 — Install the SDK

Install veep with Parquet support from Test PyPI (alpha). This adds pyarrow for file validation.

In [1]:
!pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ "veep[parquet]"
Successfully installed veep-0.1.0 pyarrow-18.1.0 requests-2.32.3

Step 2 — Create your embeddings file

Vector Panda ingests Parquet, CSV, and binary vector files (.fvecs, .bvecs, .ivecs). Here we'll create a Parquet file with 1,000 random vectors — in practice, these come from your embedding model (OpenAI, Sentence Transformers, CLIP, etc.).
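If your vectors are already in the classic .fvecs layout (each record is a little-endian int32 dimension count followed by that many float32 values), you can produce or inspect such files with plain NumPy. A minimal sketch, independent of the SDK:

```python
import numpy as np

def write_fvecs(path, vectors):
    """Write an (n, d) float32 array in .fvecs format:
    each record is an int32 dim count, then d float32 values."""
    vectors = np.asarray(vectors, dtype=np.float32)
    n, d = vectors.shape
    with open(path, "wb") as f:
        for row in vectors:
            np.int32(d).tofile(f)
            row.tofile(f)

def read_fvecs(path):
    """Read an .fvecs file back into an (n, d) float32 array."""
    raw = np.fromfile(path, dtype=np.int32)
    d = raw[0]
    # Drop the leading dim column of each record, reinterpret as float32
    return raw.reshape(-1, d + 1)[:, 1:].view(np.float32)

demo = np.random.randn(10, 8).astype(np.float32)
write_fvecs("demo.fvecs", demo)
assert np.array_equal(read_fvecs("demo.fvecs"), demo)
```

The same pattern works for .ivecs (int32 payload) by skipping the final `view`.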

In [2]:
import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq

# Generate 1,000 vectors with 384 dimensions (e.g. all-MiniLM-L6-v2)
np.random.seed(42)
vectors = np.random.randn(1000, 384).astype(np.float32)
vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)  # normalize

# Build a Parquet table with vector, key, and metadata columns
table = pa.table({
    "emb": vectors.tolist(),
    "id": [f"doc_{i}" for i in range(1000)],
    "category": np.random.choice(["science", "sports", "music"], 1000).tolist(),
})

pq.write_table(table, "demo_embeddings.parquet")
print(f"Created: {vectors.shape[0]:,} vectors x {vectors.shape[1]} dims")
Created: 1,000 vectors x 384 dims

Step 3 — Upload to Vector Panda

Create a client with your API key and upload the file. Collections are created automatically on first upload. This example uploads 1,000 vectors at 384 dimensions — well within the free tier, which fits roughly 333K vectors at that dimensionality. No credit card needed.

In [3]:
from veep import Client

client = Client(api_key="sk_live_your_key_here")

# Upload — collection "demo" is created automatically
client.upload("demo", "demo_embeddings.parquet")
{'status': 'uploaded', 'collection': 'demo', 'file': 'demo_embeddings.parquet', 'bytes': 1572864}
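As a sanity check on the reported byte count: the raw float32 payload for 1,000 vectors at 384 dimensions is about 1.5 MB, and Parquet's column encoding and footer add a little on top. This is only a back-of-envelope estimate, not the service's accounting:

```python
# Raw float32 payload: vectors x dimensions x 4 bytes per component
n_vectors, dims, bytes_per_float = 1000, 384, 4
raw_bytes = n_vectors * dims * bytes_per_float
print(f"raw vector payload: {raw_bytes:,} bytes (~{raw_bytes / 1024**2:.2f} MiB)")
# → raw vector payload: 1,536,000 bytes (~1.46 MiB)
```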

Step 4 — Review and confirm schema

After upload, Vector Panda analyzes your file and suggests which columns contain IDs, vectors, and metadata. Review the suggestion and confirm to start indexing. This step ensures your data is interpreted correctly, especially when column names are ambiguous.

In [4]:
# Check the suggested schema
schema = client.schema("demo")
print(f"State:    {schema['state']}")
print(f"ID field: {schema['id_field']}")
print(f"Vector:   {schema['vector_field']} ({schema['dimension']}D)")
print(f"Metadata: {schema['metadata_fields']}")
State:    suggested
ID field: id
Vector:   emb (384D)
Metadata: ['category']
In [5]:
# Confirm the schema to start indexing
client.confirm_schema("demo", id_field="id", vector_field="emb")
print("Schema confirmed — vectors are now being indexed")
Schema confirmed — vectors are now being indexed
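Indexing runs asynchronously after confirmation. If you want to block until it finishes, a small polling helper works; the target state name ("indexed" here) is an assumption, so check the SDK reference for the actual lifecycle states. The demo below exercises the helper with a fake state source:

```python
import time

def wait_until(get_state, target, timeout=60.0, interval=2.0):
    """Poll get_state() until it returns target, or raise on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_state()
        if state == target:
            return state
        time.sleep(interval)
    raise TimeoutError(f"state never reached {target!r}")

# With the real client (hypothetical state name):
#   wait_until(lambda: client.schema("demo")["state"], "indexed")

# Demo with a fake that reaches "indexed" on the third call:
states = iter(["suggested", "indexing", "indexed"])
print(wait_until(lambda: next(states), "indexed", interval=0.01))
# → indexed
```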

Step 5 — Query for similar vectors

Pick any vector as a query and search for its nearest neighbors. Results come back sorted by similarity score, with metadata included.

In [6]:
# Use the first vector as our query
query_vector = vectors[0].tolist()

results = client.query(
    "demo",
    vector=query_vector,
    top_k=5,
    include_metadata=True,
)

for r in results:
    print(f"{r.key:10} score={r.score:.4f}  category={r.metadata.get('category', '')}")
doc_0      score=1.0000  category=science
doc_472    score=0.8834  category=music
doc_891    score=0.8719  category=sports
doc_204    score=0.8651  category=science
doc_637    score=0.8590  category=music
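Because the vectors were L2-normalized in Step 2, cosine similarity reduces to a dot product, which is why the query vector matches itself with a score of 1.0000. You can verify that locally with NumPy (regenerating the same vectors with the same seed):

```python
import numpy as np

np.random.seed(42)
vectors = np.random.randn(1000, 384).astype(np.float32)
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

# Cosine similarity of every vector against the query (vector 0)
scores = vectors @ vectors[0]
print(f"self-similarity: {scores[0]:.4f}")  # → self-similarity: 1.0000

# Highest-scoring indices, best first; index 0 is always on top
top5 = np.argsort(scores)[::-1][:5]
print("top-5 indices:", top5.tolist())
```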

Step 6 — Verify your collections

Check what's deployed and how much storage you're using.

In [7]:
for col in client.collections():
    print(f"{col.name:15} {col.vector_count:>8} vectors  {col.storage_gb:.2f} GB  tier={col.tier}")
demo                1000 vectors  0.00 GB  tier=hot

Done.

Your vectors are live. Upload more files to add vectors, or query from any Python script, cURL, or HTTP client using the same API key.
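The same query can be issued over plain HTTP with any client. The base URL and JSON field names below are illustrative assumptions (see the HTTP API reference for the real ones); only the request construction is shown, with the network call left commented out:

```python
import json

API_KEY = "sk_live_your_key_here"        # same key as the SDK
BASE_URL = "https://api.example.com/v1"  # hypothetical base URL — check the HTTP API docs

# Field names mirror the SDK's query() arguments (an assumption)
payload = {
    "vector": [0.0] * 384,  # your 384-dim query vector
    "top_k": 5,
    "include_metadata": True,
}
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
body = json.dumps(payload)

# import requests
# resp = requests.post(f"{BASE_URL}/collections/demo/query",
#                      headers=headers, data=body)
# print(resp.json())
print(f"request ready: top_k={payload['top_k']}, dim={len(payload['vector'])}")
# → request ready: top_k=5, dim=384
```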

Full Documentation

Distance metrics, metadata, Parquet format, best practices

Python SDK Reference

Complete API for Client, query(), upload(), and all types

HTTP API

REST endpoints with curl examples for direct integration