This is a hands-on tutorial. You will run Python against a local Actian VectorAI DB instance and step through choosing models, ingesting embeddings, and comparing search behavior. By the end, you will have a working multimodel search pipeline that lets you compare retrieval quality across different embedding architectures side by side.

Embedding models convert text (or images, audio, code) into dense numerical vectors that capture semantic meaning. Actian VectorAI DB stores these vectors and retrieves similar ones at scale, but the quality of your search is bounded by the quality of your embeddings. Choosing the right open-source model is one of the most impactful decisions you will make when building a vector search application. A poorly matched model wastes storage on unhelpful dimensions, produces low-recall results, and adds unnecessary latency.

Architecture overview

The diagram below shows how documents flow through model selection, embedding, and storage in Actian VectorAI DB. Each model produces vectors of a different size, stored in separate collections so you can compare retrieval quality across configurations.

Environment setup

Run the following command to install the two packages this tutorial depends on.
pip install actian-vectorai-client sentence-transformers

What this installs

Both packages are required: one communicates with the database, and the other loads and runs embedding models on your machine.
  • actian-vectorai-client — Official Python SDK for Actian VectorAI DB; provides async/sync clients, Filter DSL, and gRPC transport.
  • sentence-transformers — Framework for loading and running open-source embedding models; downloads and caches models from Hugging Face.

Step 1: Understand the model landscape

Before writing any code, review the table below to understand how the available models differ in dimension count, speed, and quality. The model you choose determines the shape of every vector stored in the database.
| Model | Dimensions | Speed | Quality | Best for |
| --- | --- | --- | --- | --- |
| sentence-transformers/all-MiniLM-L6-v2 | 384 | Very fast | Good | Prototyping, low-latency apps |
| sentence-transformers/all-MiniLM-L12-v2 | 384 | Fast | Better | Production with speed constraints |
| sentence-transformers/all-mpnet-base-v2 | 768 | Moderate | High | General production use |
| sentence-transformers/multi-qa-mpnet-base-dot-v1 | 768 | Moderate | High (QA) | Question-answering systems |
| sentence-transformers/all-distilroberta-v1 | 768 | Moderate | High | Diverse text types |
| intfloat/e5-large-v2 | 1024 | Slow | Very high | Maximum quality, offline indexing |
| BAAI/bge-large-en-v1.5 | 1024 | Slow | Very high | Benchmarks, academic use |
| sentence-transformers/clip-ViT-B-32 | 512 | Moderate | High (multi) | Text + image multimodal |

Key trade-offs

Keep the following trade-offs in mind before choosing a model.
  • More dimensions means more storage and slower search, but can give finer semantic resolution.
  • Fewer dimensions means less RAM and faster search, but may lose subtle meaning.
  • Model architecture matters more than dimension count — a well-trained 384-dim model can outperform a poorly trained 768-dim one.

Step 2: Import dependencies and configure

The block below imports every module used across all steps of this tutorial and sets the server address. Run it once at the top of your script or notebook.
import asyncio
import time
from sentence_transformers import SentenceTransformer

from actian_vectorai import (
    AsyncVectorAIClient,
    Distance,
    Field,
    FilterBuilder,
    PointStruct,
    PrefetchQuery,
    VectorParams,
)
from actian_vectorai.models.collections import HnswConfigDiff
from actian_vectorai.models.enums import Fusion
from actian_vectorai.models.points import (
    SearchParams,
    WithPayloadSelector,
)

SERVER = "localhost:6574"

print(f"VectorAI Server: {SERVER}")

Expected output

VectorAI Server: localhost:6574

Step 3: Load multiple models and compare embedding output

The code below loads three models — small, medium, and large — and encodes the same sample sentence with each one. Running it prints each model’s load time, dimension count, and the first five values of the resulting vector, confirming that each model produces a vector of a different size.
# Three models spanning small, medium, and large architectures
models = {
    "minilm": {
        "name": "all-MiniLM-L6-v2",
        "dim": 384,
        "description": "Small, fast, good for prototyping",
    },
    "mpnet": {
        "name": "all-mpnet-base-v2",
        "dim": 768,
        "description": "Balanced quality and speed",
    },
    "e5-large": {
        "name": "intfloat/e5-large-v2",
        "dim": 1024,
        "description": "High quality, slower",
    },
}

loaded_models = {}
for key, info in models.items():
    print(f"Loading {info['name']}...")
    t0 = time.time()
    loaded_models[key] = SentenceTransformer(info["name"])
    elapsed = time.time() - t0
    print(f"  Loaded in {elapsed:.1f}s — {info['dim']} dimensions — {info['description']}")

sample_text = "Vector databases store high-dimensional embeddings for similarity search."
print(f"\nSample: \"{sample_text}\"\n")
for key, m in loaded_models.items():
    vec = m.encode(sample_text)
    print(f"  {key:>10}: dim={len(vec)}, first 5 values={vec[:5].round(4).tolist()}")

Expected output

This block iterates over the three model definitions, MiniLM-L6 (384 dimensions), MPNet-base (768 dimensions), and E5-large-v2 (1024 dimensions), loads each one (downloading it from Hugging Face on first use, then reading it from the local cache), and records the load time. It then encodes the same sample sentence with every loaded model and prints each model’s actual output dimension and the first five vector values, confirming that each model produces embeddings of the expected shape.
Loading all-MiniLM-L6-v2...
  Loaded in <time>s — 384 dimensions — Small, fast, good for prototyping
Loading all-mpnet-base-v2...
  Loaded in <time>s — 768 dimensions — Balanced quality and speed
Loading intfloat/e5-large-v2...
  Loaded in <time>s — 1024 dimensions — High quality, slower

Sample: "Vector databases store high-dimensional embeddings for similarity search."

      minilm: dim=384, first 5 values=[...]
       mpnet: dim=768, first 5 values=[...]
    e5-large: dim=1024, first 5 values=[...]
Load times and vector values vary by hardware, library version, and model revision.

Step 4: Measure embedding speed

The code below encodes a batch of 100 texts with each loaded model and prints total encoding time and throughput in texts per second. Running it shows how much slower larger models are relative to smaller ones, which directly affects ingestion time and real-time query latency.
benchmark_texts = [
    "How to create a collection in VectorAI DB?",
    "Semantic search finds documents by meaning rather than keywords.",
    "HNSW indexing builds a navigable small-world graph for fast search.",
    "Payload filters combine vector similarity with structured conditions.",
    "Quantization reduces memory usage by compressing vector components.",
] * 20  # 100 texts total

print(f"Benchmarking {len(benchmark_texts)} texts:\n")

for key, m in loaded_models.items():
    t0 = time.time()
    vecs = m.encode(benchmark_texts)
    elapsed = time.time() - t0
    throughput = len(benchmark_texts) / elapsed
    print(
        f"  {key:>10}: {elapsed:.3f}s total, "
        f"{throughput:.0f} texts/sec, "
        f"{models[key]['dim']} dims"
    )

Expected output

This block constructs a 100-text corpus and passes the full batch to each model’s encode method. It measures wall-clock time and computes throughput in texts per second, making the latency cost of moving from MiniLM to E5-large directly visible.
Benchmarking 100 texts:

      minilm: <time>s total, <n> texts/sec, 384 dims
       mpnet: <time>s total, <n> texts/sec, 768 dims
    e5-large: <time>s total, <n> texts/sec, 1024 dims
Throughput figures depend on your hardware and whether a GPU is available. Relative ordering — MiniLM fastest, E5-large slowest — is consistent across environments.
MiniLM is significantly faster than E5-large. For real-time query embedding, this difference directly impacts response latency. For batch ingestion, it affects indexing time but not search quality.
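The timing loop above measures a single run, so the first model call can include one-off warm-up cost. The sketch below is one way to get steadier numbers: it makes an untimed warm-up call and keeps the best of several timed runs. The helper name `measure_throughput` and the stand-in `fake_encode` function are hypothetical and exist only so the example runs without downloading a model.

```python
import time
from typing import Callable, Sequence


def measure_throughput(
    encode: Callable[[Sequence[str]], object],
    texts: Sequence[str],
    repeats: int = 3,
) -> float:
    """Return the best texts-per-second figure over `repeats` timed runs."""
    encode(texts[:1])  # warm-up call, excluded from timing
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        encode(texts)
        best = min(best, time.perf_counter() - t0)
    best = max(best, 1e-9)  # guard against a zero reading on very fast calls
    return len(texts) / best


# Hypothetical stand-in encoder: returns one number per text
def fake_encode(texts):
    return [[float(len(t))] for t in texts]


rate = measure_throughput(fake_encode, ["a", "bb", "ccc"] * 100)
print(f"{rate:.0f} texts/sec")
```

With the models loaded in Step 3, passing `m.encode` in place of `fake_encode` should give the per-model throughput under the same conditions.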

Step 5: Match distance metrics to models

Different models are trained with different objectives, and using the wrong distance metric silently degrades retrieval quality. The code below prints the correct metric for each model so that you can verify your collection configuration matches the model.
# Correct distance metric for each model based on its training objective
model_distance_map = {
    "all-MiniLM-L6-v2":            Distance.Cosine,
    "all-MiniLM-L12-v2":           Distance.Cosine,
    "all-mpnet-base-v2":           Distance.Cosine,
    "all-distilroberta-v1":        Distance.Cosine,
    "multi-qa-mpnet-base-dot-v1":  Distance.Dot,
    "intfloat/e5-large-v2":        Distance.Cosine,
    "BAAI/bge-large-en-v1.5":      Distance.Cosine,
}

print("Model → Distance metric:\n")
for model_name, dist in model_distance_map.items():
    print(f"  {model_name:<35}{dist.name}")
The table below describes when each distance metric applies and which models use it.
| Distance | When to use | Models trained with it |
| --- | --- | --- |
| Distance.Cosine | Most general-purpose models, where outputs are normalized or benefit from angular comparison. | MiniLM, MPNet, E5, BGE |
| Distance.Dot | Models trained with dot-product loss, where outputs are not normalized and magnitude matters. | multi-qa-mpnet-base-dot-v1 |
| Distance.Euclid | When absolute distance matters; rare for text, but common for structured or tabular embeddings. | Custom models |
If the model documentation says “cosine similarity”, use Distance.Cosine. If it says “dot product”, use Distance.Dot. When in doubt, use Distance.Cosine.
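To see why the metric matters, the following standard-library sketch computes all three measures on small hand-picked vectors. The vectors are illustrative values, not model output. It shows that dot product and cosine can rank the same two documents differently when magnitudes differ, and that the two agree once vectors are unit-normalized.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(a):
    n = norm(a)
    return [x / n for x in a]


query = [1.0, 2.0, 2.0]
doc_aligned = [0.5, 1.0, 1.0]   # same direction as the query, small magnitude
doc_large = [3.0, 0.0, 4.0]     # different direction, large magnitude

for name, doc in [("aligned", doc_aligned), ("large", doc_large)]:
    print(
        f"{name:>8}: cosine={cosine(query, doc):.4f}  "
        f"dot={dot(query, doc):.4f}  euclid={euclid(query, doc):.4f}"
    )

# After unit-normalization, dot product equals cosine similarity
q_n, d_n = normalize(query), normalize(doc_large)
print(f"normalized dot={dot(q_n, d_n):.4f}  cosine={cosine(query, doc_large):.4f}")
```

Cosine prefers `doc_aligned` because only direction counts, while the raw dot product prefers `doc_large` because its magnitude inflates the score. This is the silent ranking change that a mismatched metric can introduce.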

Step 6: Create collections for different models

The code below creates one collection for each of the three models, each configured with the matching dimension count, distance metric, and HNSW settings. Running it prints a confirmation line per collection.
collection_configs = {
    "embeddings-minilm": {
        "dim": 384,
        "distance": Distance.Cosine,
        "hnsw": HnswConfigDiff(m=16, ef_construct=128),
    },
    "embeddings-mpnet": {
        "dim": 768,
        "distance": Distance.Cosine,
        "hnsw": HnswConfigDiff(m=16, ef_construct=128),
    },
    "embeddings-e5-large": {
        "dim": 1024,
        "distance": Distance.Cosine,
        "hnsw": HnswConfigDiff(m=16, ef_construct=128),
    },
}

async def create_collections():
    for name, cfg in collection_configs.items():
        async with AsyncVectorAIClient(url=SERVER) as client:
            # Delete any stale collection from previous runs before creating fresh
            try:
                await client.collections.delete(name)
            except Exception:
                pass
            await client.collections.get_or_create(
                name=name,
                vectors_config=VectorParams(
                    size=cfg["dim"],
                    distance=cfg["distance"],
                ),
                hnsw_config=cfg["hnsw"],
            )
            # get_info confirms the collection is fully committed on the server
            # before any subsequent operation uses it
            await client.collections.get_info(name)
            print(f"  Collection '{name}' ready: {cfg['dim']}-dim, {cfg['distance'].name}")

asyncio.run(create_collections())

Memory footprint at scale

The table below shows how much memory each model requires at one million documents. Use it to plan your deployment.
| Model | Dims | Bytes/vector (float32) | Memory at 1M docs |
| --- | --- | --- | --- |
| MiniLM | 384 | 1,536 | ~1.5 GB |
| MPNet | 768 | 3,072 | ~3.0 GB |
| E5-large | 1,024 | 4,096 | ~4.0 GB |
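The figures in the table follow from simple arithmetic: dimensions × 4 bytes per float32 component × number of vectors. The sketch below reproduces that calculation so you can plug in your own corpus size. It counts raw vector storage only; HNSW graph links and payloads add overhead that is not estimated here. The function name `vector_memory_bytes` is illustrative.

```python
def vector_memory_bytes(dim: int, num_vectors: int, bytes_per_component: int = 4) -> int:
    """Raw storage for `num_vectors` vectors of `dim` components (float32 by default)."""
    return dim * bytes_per_component * num_vectors


for name, dim in [("MiniLM", 384), ("MPNet", 768), ("E5-large", 1024)]:
    total = vector_memory_bytes(dim, 1_000_000)
    print(f"{name:>8}: {dim * 4:,} bytes/vector, {total / 1e9:.2f} GB at 1M docs")
```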

Expected output

  Collection 'embeddings-minilm' ready: 384-dim, Cosine
  Collection 'embeddings-mpnet' ready: 768-dim, Cosine
  Collection 'embeddings-e5-large' ready: 1024-dim, Cosine
Pattern: Delete any stale collection first, then get_or_create, then call get_info as a synchronisation barrier before closing the connection. This confirms the collection is fully committed on the server. Upsert into the collection only after this sequence completes — in a separate connection if needed.

Step 7: Prepare a shared dataset

The code below defines a list of 20 short passages with topic and difficulty metadata. All three models embed this same dataset so that retrieval quality can be compared directly.
documents = [
    {"text": "HNSW is a graph-based approximate nearest-neighbour index that achieves sub-linear search time.", "topic": "indexing", "difficulty": "intermediate"},
    {"text": "Cosine similarity measures the angle between two vectors and ranges from -1 to 1.", "topic": "distance_metrics", "difficulty": "beginner"},
    {"text": "Payload filters let you combine vector similarity with structured conditions like category equals electronics.", "topic": "filtering", "difficulty": "beginner"},
    {"text": "Scalar quantization compresses each float32 component to int8, reducing memory by 4x.", "topic": "quantization", "difficulty": "intermediate"},
    {"text": "Prefetch queries retrieve candidates from multiple vector spaces and merge them with fusion.", "topic": "search", "difficulty": "advanced"},
    {"text": "Named vectors allow a single collection to store multiple embedding spaces per document.", "topic": "multimodal", "difficulty": "intermediate"},
    {"text": "The FilterBuilder supports must, should, and must_not for complex boolean logic.", "topic": "filtering", "difficulty": "beginner"},
    {"text": "Connection pooling distributes gRPC calls across multiple channels for higher throughput.", "topic": "infrastructure", "difficulty": "advanced"},
    {"text": "Score thresholds discard results below a minimum similarity, preventing low-quality answers.", "topic": "search", "difficulty": "beginner"},
    {"text": "Reciprocal rank fusion merges results from multiple retrieval strategies by rank position.", "topic": "search", "difficulty": "advanced"},
    {"text": "The HNSW m parameter controls how many neighbours each node connects to during index construction.", "topic": "indexing", "difficulty": "intermediate"},
    {"text": "Euclidean distance measures the straight-line distance between two points in vector space.", "topic": "distance_metrics", "difficulty": "beginner"},
    {"text": "Dot product similarity is equivalent to cosine similarity when vectors are unit-normalized.", "topic": "distance_metrics", "difficulty": "intermediate"},
    {"text": "Batch search sends multiple queries in a single RPC call for better throughput.", "topic": "search", "difficulty": "intermediate"},
    {"text": "The VDE namespace provides operational commands like flush, rebuild index, and compact.", "topic": "infrastructure", "difficulty": "advanced"},
    {"text": "The ef parameter at search time controls accuracy versus speed for HNSW queries.", "topic": "indexing", "difficulty": "intermediate"},
    {"text": "Product quantization divides vectors into subspaces and quantizes each independently.", "topic": "quantization", "difficulty": "advanced"},
    {"text": "Geo filters restrict results to points within a radius, bounding box, or polygon.", "topic": "filtering", "difficulty": "intermediate"},
    {"text": "SmartBatcher provides streaming ingestion with automatic size-based and time-based flushing.", "topic": "infrastructure", "difficulty": "advanced"},
    {"text": "Text indexes tokenize string payloads to support full-text search alongside vector similarity.", "topic": "filtering", "difficulty": "intermediate"},
]

Step 8: Embed and ingest with each model

The code below embeds all 20 documents with each model and upserts the resulting vectors into the corresponding collection. Running it prints one line per model showing embedding time, ingestion time, and the total number of points confirmed in the collection after flushing.
model_to_collection = {
    "minilm":    "embeddings-minilm",
    "mpnet":     "embeddings-mpnet",
    "e5-large":  "embeddings-e5-large",
}

async def ingest_with_model(model_key: str, collection_name: str):
    m = loaded_models[model_key]
    texts = [d["text"] for d in documents]

    # Embed before opening the connection — CPU-only, no server needed
    t0 = time.time()
    vectors = m.encode(texts).tolist()
    embed_time = time.time() - t0

    points = [
        PointStruct(
            id=i,
            vector=vec,
            payload={
                "text":       doc["text"],
                "topic":      doc["topic"],
                "difficulty": doc["difficulty"],
                "model":      model_key,
            },
        )
        for i, (doc, vec) in enumerate(zip(documents, vectors))
    ]

    # Open a fresh connection for upsert — collection is fully committed from Step 6
    async with AsyncVectorAIClient(url=SERVER) as client:
        t1 = time.time()
        await client.points.upsert(collection_name, points=points)
        ingest_time = time.time() - t1
        await client.vde.flush(collection_name)
        count = await client.vde.get_vector_count(collection_name)

    print(
        f"  {model_key:>10}: embedded in {embed_time:.3f}s, "
        f"ingested {len(points)} pts in {ingest_time:.3f}s, "
        f"total={count}"
    )

async def ingest_all():
    for key, collection in model_to_collection.items():
        await ingest_with_model(key, collection)

asyncio.run(ingest_all())

Expected output

      minilm: embedded in <time>s, ingested 20 pts in <time>s, total=20
       mpnet: embedded in <time>s, ingested 20 pts in <time>s, total=20
    e5-large: embedded in <time>s, ingested 20 pts in <time>s, total=20
Timing values vary by hardware and network conditions.
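One caveat for the E5 collection: the intfloat/e5-large-v2 model card states that each input should start with "query: " or "passage: ", and that quality can degrade without these prefixes. The ingestion code above encodes raw text, which still runs but may understate what E5 can do. A minimal sketch of a prefix helper follows; the name `with_e5_prefix` is hypothetical.

```python
from typing import List, Sequence


def with_e5_prefix(texts: Sequence[str], kind: str) -> List[str]:
    """Prepend the E5 input prefix; `kind` is 'query' or 'passage'."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return [f"{kind}: {t}" for t in texts]


docs = ["HNSW is a graph-based approximate nearest-neighbour index."]
print(with_e5_prefix(docs, "passage"))
print(with_e5_prefix(["How does HNSW work?"], "query"))
```

Under this approach you would pass the "passage" form to `encode` at ingestion time and the "query" form at search time, for the E5 model only.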

Step 9: Compare search quality across models

The code below runs three test queries against all three collections and prints the top-scoring documents from each. Running it lets you see whether different models surface different documents for the same query, and how confidently each model scores its top result.
test_queries = [
    "How does approximate nearest neighbour search work?",
    "What is the difference between cosine and dot product?",
    "How to filter search results by category?",
]

async def compare_models(query: str, top_k: int = 3):
    print(f"\nQuery: \"{query}\"\n")

    for key, collection in model_to_collection.items():
        m = loaded_models[key]
        vec = m.encode(query).tolist()

        async with AsyncVectorAIClient(url=SERVER) as client:
            results = await client.points.search(
                collection, vector=vec, limit=top_k,
                with_payload=WithPayloadSelector(include=["text", "topic"]),
            ) or []

        print(f"  [{key:>10}]")
        for r in results:
            print(f"    score={r.score:.4f}  [{r.payload['topic']:>16}]  {r.payload['text'][:65]}...")
        print()

for q in test_queries:
    asyncio.run(compare_models(q))

What to look for

Use the same query across collections to see how each model ranks passages.
  • Check whether scores are spread between relevant and irrelevant results. Higher-quality models tend to produce wider separation, making it easier to set a threshold.
  • Check whether the top result is the most relevant document. When models disagree on rank 1, the larger model is often a better reference, though results vary by query and domain.
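  • Check score levels within each model rather than across models. For many sentence-transformers models, cosine scores above roughly 0.7 suggest strong relevance and scores below roughly 0.4 are usually noise, but ranges are model-specific: the E5 model card notes that its cosine scores cluster between about 0.7 and 1.0, so only the relative order is meaningful there.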
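The checks above can be made numeric. The sketch below defines two small helpers: `score_gap` measures how far the top score sits above the rest, and `topk_overlap` measures how much two models agree on which documents belong in the top results. Both names are hypothetical, and the scores and IDs in the example are made-up placeholders, not output from this tutorial.

```python
from typing import Iterable, Sequence


def score_gap(scores: Sequence[float]) -> float:
    """Top score minus the mean of the remaining scores (0.0 if fewer than two)."""
    ranked = sorted(scores, reverse=True)
    if len(ranked) < 2:
        return 0.0
    rest = ranked[1:]
    return ranked[0] - sum(rest) / len(rest)


def topk_overlap(ids_a: Iterable[int], ids_b: Iterable[int]) -> float:
    """Jaccard overlap between two result-ID lists (1.0 means identical sets)."""
    a, b = set(ids_a), set(ids_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Placeholder values for illustration only
example_scores = [0.62, 0.41, 0.39]
ids_model_a = [0, 10, 15]
ids_model_b = [0, 15, 4]

print(f"gap={score_gap(example_scores):.2f}")
print(f"overlap={topk_overlap(ids_model_a, ids_model_b):.2f}")
```

With real results you would pass `[r.score for r in results]` and `[r.id for r in results]` from each collection.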

Step 10: Tune search accuracy with SearchParams

SearchParams controls how the HNSW index is traversed at query time. The code below runs the same query two ways — approximate with a specific hnsw_ef value, and exact brute-force — so you can compare accuracy and latency directly.
async def tuned_vs_exact(query: str, top_k: int = 5):
    m = loaded_models["minilm"]
    vec = m.encode(query).tolist()

    async with AsyncVectorAIClient(url=SERVER) as client:
        # High-accuracy approximate search
        t0 = time.time()
        approx_results = await client.points.search(
            "embeddings-minilm",
            vector=vec,
            limit=top_k,
            with_payload=WithPayloadSelector(include=["text"]),
            params=SearchParams(hnsw_ef=256),
        ) or []
        approx_time = time.time() - t0

        # Exact brute-force — 100% recall, slowest option
        t1 = time.time()
        exact_results = await client.points.search(
            "embeddings-minilm",
            vector=vec,
            limit=top_k,
            with_payload=WithPayloadSelector(include=["text"]),
            params=SearchParams(exact=True),
        ) or []
        exact_time = time.time() - t1

    print(f"Query: \"{query}\"\n")
    print(f"  Approximate (hnsw_ef=256): {approx_time*1000:.1f}ms")
    for r in approx_results[:3]:
        print(f"    id={r.id}  score={r.score:.4f}  {r.payload['text'][:60]}...")

    print(f"\n  Exact (brute-force): {exact_time*1000:.1f}ms")
    for r in exact_results[:3]:
        print(f"    id={r.id}  score={r.score:.4f}  {r.payload['text'][:60]}...")

asyncio.run(tuned_vs_exact("How does graph-based indexing work?"))
| Setting | Speed | Accuracy | When to use |
| --- | --- | --- | --- |
| hnsw_ef=64 | Fastest | Good | High-throughput, latency-sensitive |
| hnsw_ef=256 | Fast | High | Default recommendation for most use cases |
| exact=True | Slowest | Perfect | Benchmarking and establishing ground truth |
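To turn the approximate-versus-exact comparison into a single number, you can compute recall@k: the fraction of the exact top-k IDs that the approximate search also returned. The helper below is a minimal sketch; `recall_at_k` is a hypothetical name and the ID lists are placeholders.

```python
from typing import Sequence


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int], k: int) -> float:
    """Fraction of the exact top-k IDs that also appear in the approximate top-k."""
    truth = set(exact_ids[:k])
    if not truth:
        return 1.0
    return len(truth & set(approx_ids[:k])) / len(truth)


# Placeholder ID lists for illustration only
approx = [0, 10, 15, 3, 7]
exact = [0, 10, 15, 4, 7]
print(f"recall@5 = {recall_at_k(approx, exact, 5):.2f}")
```

Inside `tuned_vs_exact`, the inputs would be `[r.id for r in approx_results]` and `[r.id for r in exact_results]`. On a 20-document collection the two searches will often return the same IDs, so the metric becomes informative mainly at larger scale.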

Step 11: Named vectors — multiple models in one collection

Rather than creating separate collections, you can store embeddings from multiple models in a single collection using named vectors. The code below creates a collection with two named vector spaces, embeds the shared dataset with both models, and uploads each document with both vectors attached.
async def create_multi_model_collection():
    async with AsyncVectorAIClient(url=SERVER) as client:
        await client.collections.get_or_create(
            name="embeddings-multimodel",
            vectors_config={
                "minilm": VectorParams(size=384, distance=Distance.Cosine),
                "mpnet":  VectorParams(size=768, distance=Distance.Cosine),
            },
            hnsw_config=HnswConfigDiff(m=16, ef_construct=128),
        )

        texts = [d["text"] for d in documents]
        minilm_vecs = loaded_models["minilm"].encode(texts).tolist()
        mpnet_vecs  = loaded_models["mpnet"].encode(texts).tolist()

        points = [
            PointStruct(
                id=i,
                vector={"minilm": minilm_vecs[i], "mpnet": mpnet_vecs[i]},
                payload={
                    "text":       doc["text"],
                    "topic":      doc["topic"],
                    "difficulty": doc["difficulty"],
                },
            )
            for i, doc in enumerate(documents)
        ]

        await client.points.upsert("embeddings-multimodel", points=points)
        await client.vde.flush("embeddings-multimodel")
        count = await client.vde.get_vector_count("embeddings-multimodel")

    print(f"Multi-model collection ready: {count} documents, 2 vector spaces")

asyncio.run(create_multi_model_collection())

Expected output

This block creates a single collection with two named vector spaces — "minilm" (384-dim, Cosine) and "mpnet" (768-dim, Cosine) — then encodes all 20 shared documents with both models. Each document is stored as a single PointStruct carrying both embedding vectors alongside its text, topic, and difficulty metadata.
Multi-model collection ready: 20 documents, 2 vector spaces
vde.get_vector_count() returns the total across all named spaces — 20 documents × 2 spaces = 40 indexed vectors.

Step 12: Search individual models and fuse results

Each named space was produced by a different encoder, so you embed the query once per model and pass the vector that matches the using parameter. The code below runs two single-space searches, then one fused query that merges candidate lists with reciprocal rank fusion (RRF).
async def multi_model_search(query: str, top_k: int = 5):
    minilm_vec = loaded_models["minilm"].encode(query).tolist()
    mpnet_vec  = loaded_models["mpnet"].encode(query).tolist()

    async with AsyncVectorAIClient(url=SERVER) as client:
        minilm_results = await client.points.search(
            "embeddings-multimodel",
            vector=minilm_vec, using="minilm", limit=top_k,
            with_payload=WithPayloadSelector(include=["text", "topic"]),
        ) or []

        mpnet_results = await client.points.search(
            "embeddings-multimodel",
            vector=mpnet_vec, using="mpnet", limit=top_k,
            with_payload=WithPayloadSelector(include=["text", "topic"]),
        ) or []

        fused_results = await client.points.query(
            "embeddings-multimodel",
            query={"fusion": Fusion.RRF},
            prefetch=[
                PrefetchQuery(query=minilm_vec, using="minilm", limit=10),
                PrefetchQuery(query=mpnet_vec,  using="mpnet",  limit=10),
            ],
            limit=top_k,
            with_payload=WithPayloadSelector(include=["text", "topic"]),
        )

    print(f"Query: \"{query}\"\n")

    print("  [MiniLM only]")
    for r in (minilm_results or [])[:3]:
        print(f"    score={r.score:.4f}  {r.payload['text'][:60]}...")

    print("\n  [MPNet only]")
    for r in (mpnet_results or [])[:3]:
        print(f"    score={r.score:.4f}  {r.payload['text'][:60]}...")

    print("\n  [RRF fusion: MiniLM + MPNet]")
    for r in (list(fused_results) if fused_results else [])[:3]:
        print(f"    score={r.score:.4f}  {r.payload['text'][:60]}...")

asyncio.run(multi_model_search("How does approximate search indexing work?"))

Why multimodel fusion improves results

Different models can capture different aspects of meaning. One model may score near-lexical matches more strongly, while another may handle paraphrases better, for example matching “ANN search” to “approximate nearest-neighbour.” Documents that rank highly in both models are more likely to be relevant, and RRF fusion naturally promotes those consensus results.
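The mechanics of reciprocal rank fusion are simple enough to sketch in plain Python. Each document receives 1 / (k + rank) from every ranked list it appears in, and the contributions are summed. The constant k = 60 used here is the value commonly cited for RRF; whether the server uses the same constant is an assumption, and the ID lists are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def rrf_fuse(rankings: Sequence[Sequence[int]], k: int = 60) -> List[Tuple[int, float]]:
    """Fuse ranked ID lists; returns (id, score) pairs sorted by descending score."""
    scores: Dict[int, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Placeholder rankings for illustration only
minilm_ranked = [0, 10, 15, 4]
mpnet_ranked = [15, 0, 3, 10]

fused = rrf_fuse([minilm_ranked, mpnet_ranked])
for doc_id, score in fused:
    print(f"id={doc_id:>2}  rrf={score:.5f}")
```

Documents 0 and 15 appear near the top of both lists, so they outrank documents that only one model returned. Because RRF uses rank positions rather than raw scores, it needs no score calibration between models.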

Step 13: Re-embed specific vectors when switching models

To upgrade from one model to another, re-embed the text payloads and re-upsert all affected points. The code below simulates upgrading the MiniLM space from L6 to L12 by fetching all existing points, re-encoding each text with the upgraded model, and upserting the new vectors back.
async def reembed_minilm_space():
    """Simulate upgrading minilm from L6 to L12."""
    upgraded_model = SentenceTransformer("all-MiniLM-L12-v2")

    async with AsyncVectorAIClient(url=SERVER) as client:
        # Retrieve all existing points to access their stored text payloads
        all_points = await client.points.get(
            "embeddings-multimodel",
            ids=list(range(len(documents))),
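            # Fetch every payload field so the upsert below writes the complete payload back
            with_payload=WithPayloadSelector(include=["text", "topic", "difficulty"]),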
            with_vectors=False,
        )

        # Re-encode each document with the upgraded model
        updated = []
        for pt in all_points:
            text = pt.payload.get("text", "")
            new_vec = upgraded_model.encode(text).tolist()
            # Upsert the full point: new minilm vector, plus the mpnet vector recomputed with the same model
            updated.append(PointStruct(
                id=pt.id,
                vector={"minilm": new_vec, "mpnet": loaded_models["mpnet"].encode(text).tolist()},
                payload=pt.payload,
            ))

        await client.points.upsert("embeddings-multimodel", points=updated)
        await client.vde.flush("embeddings-multimodel")

    print(f"Re-embedded {len(updated)} points in 'minilm' space with L12 model.")
    print("MPNet vectors are unchanged.")

asyncio.run(reembed_minilm_space())

Why this pattern matters

Re-upserting lets you upgrade one model’s vectors without losing other data. This matters in the following situations.
  • You want to upgrade one model without re-processing all data from scratch.
  • Different teams own different embedding spaces.
  • You need to keep payload metadata intact during a model upgrade.

Step 14: Run multiple searches efficiently

When you need to run multiple queries against the same collection, run them sequentially within a single client connection to minimise connection overhead.
async def multi_search(queries: list, top_k: int = 3):
    for collection_key in ["minilm", "mpnet"]:
        m = loaded_models[collection_key]
        collection = model_to_collection[collection_key]

        async with AsyncVectorAIClient(url=SERVER) as client:
            print(f"\n[{collection_key}] — {len(queries)} queries:\n")
            for i, query in enumerate(queries):
                vec = m.encode(query).tolist()
                results = await client.points.search(
                    collection, vector=vec, limit=top_k,
                    with_payload=WithPayloadSelector(include=["text"]),
                ) or []
                print(f"  Q{i+1}: \"{query[:50]}...\"")
                for r in results[:2]:
                    print(f"    score={r.score:.4f}  {r.payload['text'][:55]}...")

asyncio.run(multi_search([
    "How does HNSW graph indexing work?",
    "What are the options for compressing vectors?",
    "How to combine vector search with metadata filters?",
]))
Keeping all searches inside a single async with block reuses the same gRPC channel, so connection setup happens once rather than once per query. Each query is still sent as its own request; only the repeated connection overhead is eliminated.

Step 15: Build a model selection helper

The code below defines a recommend_model function that takes corpus size, latency budget, and quality priority as inputs and returns a recommended model with configuration.
from dataclasses import dataclass

@dataclass
class ModelRecommendation:
    model_name: str
    dimension: int
    distance: Distance
    hnsw_m: int
    hnsw_ef_construct: int
    reason: str

def recommend_model(
    corpus_size: int,
    max_latency_ms: float,
    quality_priority: str = "balanced",
) -> ModelRecommendation:
    """Return a model and VectorAI configuration for the given constraints."""

    def est_ram_gb(dim: int) -> float:
        """Raw float32 vector storage in GiB for the given dimension."""
        return (dim * 4 * corpus_size) / (1024**3)

    if quality_priority == "speed" or max_latency_ms < 10:
        return ModelRecommendation(
            model_name="all-MiniLM-L6-v2", dimension=384,
            distance=Distance.Cosine, hnsw_m=16, hnsw_ef_construct=100,
            reason=f"Fastest model. RAM: ~{est_ram_gb(384):.1f} GB for {corpus_size:,} docs.",
        )
    if quality_priority == "quality":
        return ModelRecommendation(
            model_name="intfloat/e5-large-v2", dimension=1024,
            distance=Distance.Cosine, hnsw_m=32, hnsw_ef_construct=256,
            reason=f"Highest quality. RAM: ~{est_ram_gb(1024):.1f} GB float32.",
        )
    return ModelRecommendation(
        model_name="all-mpnet-base-v2", dimension=768,
        distance=Distance.Cosine, hnsw_m=16, hnsw_ef_construct=128,
        reason=f"Balanced quality and speed. RAM: ~{est_ram_gb(768):.1f} GB float32.",
    )

scenarios = [
    {"corpus_size": 10_000,    "max_latency_ms": 5,   "quality_priority": "speed"},
    {"corpus_size": 500_000,   "max_latency_ms": 50,  "quality_priority": "balanced"},
    {"corpus_size": 2_000_000, "max_latency_ms": 200, "quality_priority": "quality"},
]

for s in scenarios:
    rec = recommend_model(**s)
    print(f"\n  Scenario: {s}")
    print(f"  → Model: {rec.model_name} ({rec.dimension}-dim, {rec.distance.name})")
    print(f"    HNSW: m={rec.hnsw_m}, ef_construct={rec.hnsw_ef_construct}")
    print(f"    Reason: {rec.reason}")

Step 16: Report and clean up tutorial collections

The code below lists all tutorial collections with their document counts, flushes each one, then deletes them all.
async def cleanup():
    collections_to_delete = [
        "embeddings-minilm",
        "embeddings-mpnet",
        "embeddings-e5-large",
        "embeddings-multimodel",
    ]

    async with AsyncVectorAIClient(url=SERVER) as client:
        for name in collections_to_delete:
            try:
                count = await client.vde.get_vector_count(name)
                print(f"  '{name}': {count} documents")
                await client.vde.flush(name)
            except Exception:
                pass  # collection may not exist

        print("\nDeleting tutorial collections...")
        for name in collections_to_delete:
            try:
                await client.collections.delete(name)
                print(f"  Deleted '{name}'")
            except Exception:
                pass

asyncio.run(cleanup())

Quick reference: model → VectorAI configuration

| Model | Dims | Distance | HNSW m | HNSW ef_construct |
| --- | --- | --- | --- | --- |
| sentence-transformers/all-MiniLM-L6-v2 | 384 | Cosine | 16 | 100–128 |
| sentence-transformers/all-MiniLM-L12-v2 | 384 | Cosine | 16 | 128 |
| sentence-transformers/all-mpnet-base-v2 | 768 | Cosine | 16 | 128 |
| sentence-transformers/multi-qa-mpnet-base-dot-v1 | 768 | Dot | 16 | 128 |
| sentence-transformers/all-distilroberta-v1 | 768 | Cosine | 16 | 128 |
| intfloat/e5-large-v2 | 1024 | Cosine | 32 | 256 |
| BAAI/bge-large-en-v1.5 | 1024 | Cosine | 32 | 256 |
| sentence-transformers/clip-ViT-B-32 | 512 | Cosine | 16 | 128 |

Actian VectorAI features used

| Feature | API | Purpose |
| --- | --- | --- |
| Collection creation | collections.get_or_create(vectors_config=VectorParams(...)) | Configure dimensions and distance metric |
| Distance metrics | Distance.Cosine, Distance.Dot, Distance.Euclid | Match the metric to the model’s training objective |
| Exact search | SearchParams(exact=True) | Brute-force ground truth for benchmarking |
| hnsw_ef tuning | SearchParams(hnsw_ef=256) | Control accuracy vs speed at query time |
| Batch ingestion | points.upsert(collection, points=[...]) | Insert or update vectors and payloads |
| Named vectors | vectors_config={"minilm": VectorParams(...), "mpnet": VectorParams(...)} | Multiple models stored in one collection |
| Named vector search | points.search(..., using="mpnet") | Search a specific embedding space by name |
| Multi-model fusion | PrefetchQuery(using="minilm") + Fusion.RRF | Combine results from different models |
| Selective payload | WithPayloadSelector(include=["text"]) | Return only needed payload fields |
| Vector count | vde.get_vector_count() | Verify ingestion completed successfully |
| Flush | vde.flush() | Persist pending writes to durable storage |

Next steps