# Use open-source embedding models

> Learn how to choose, configure, and integrate open-source embedding models with Actian VectorAI DB — covering model selection, dimensionality trade-offs, distance metrics, batch ingestion, quantization for large models, named vectors for multimodel search, and reembedding workflows.

This is a hands-on tutorial. You will run Python against a local Actian VectorAI DB instance and step through choosing models, ingesting embeddings, and comparing search behavior. By the end, you will have a working multimodel search pipeline that lets you compare retrieval quality across different embedding architectures side by side.

Embedding models convert text (or images, audio, code) into dense numerical vectors that capture semantic meaning. Actian VectorAI DB stores these vectors and retrieves similar ones at scale, but search quality is bounded by the quality of your embeddings: the database can only rank what the model encodes.

Choosing the right open-source model is one of the most impactful decisions you will make when building a vector search application. The wrong model wastes storage on unhelpful dimensions, produces low-recall results, and adds unnecessary latency.

***

## Architecture overview

Documents flow through three stages: model selection, embedding, and storage in Actian VectorAI DB. Each model produces vectors of a different size, so each model's vectors are stored in a separate collection, which lets you compare retrieval quality across configurations.

***

## Environment setup

Run the following command to install the two packages this tutorial depends on.

```bash theme={null}
pip install actian-vectorai-client sentence-transformers
```

### What this installs

Both packages are required: one communicates with the database, and the other loads and runs embedding models on your machine.

* `actian-vectorai-client` — Official Python SDK for Actian VectorAI DB; provides async/sync clients, Filter DSL, and gRPC transport.
* `sentence-transformers` — Framework for loading and running open-source embedding models; downloads and caches models from Hugging Face.

***

## Step 1: Understand the model landscape

Before writing any code, review the table below to understand how the available models differ in dimension count, speed, and quality. The model you choose determines the shape of every vector stored in the database.

| Model                                              | Dimensions | Speed     | Quality      | Best for                          |
| -------------------------------------------------- | ---------- | --------- | ------------ | --------------------------------- |
| `sentence-transformers/all-MiniLM-L6-v2`           | 384        | Very fast | Good         | Prototyping, low-latency apps     |
| `sentence-transformers/all-MiniLM-L12-v2`          | 384        | Fast      | Better       | Production with speed constraints |
| `sentence-transformers/all-mpnet-base-v2`          | 768        | Moderate  | High         | General production use            |
| `sentence-transformers/multi-qa-mpnet-base-dot-v1` | 768        | Moderate  | High (QA)    | Question-answering systems        |
| `sentence-transformers/all-distilroberta-v1`       | 768        | Moderate  | High         | Diverse text types                |
| `intfloat/e5-large-v2`                             | 1024       | Slow      | Very high    | Maximum quality, offline indexing |
| `BAAI/bge-large-en-v1.5`                           | 1024       | Slow      | Very high    | Benchmarks, academic use          |
| `sentence-transformers/clip-ViT-B-32`              | 512        | Moderate  | High (multi) | Text + image multimodal           |

### Key trade-offs

Keep the following trade-offs in mind before choosing a model.

* More dimensions mean more storage and slower search, but can capture finer semantic distinctions.
* Fewer dimensions mean less RAM and faster search, but may lose subtle meaning.
* Model architecture matters more than dimension count — a well-trained 384-dim model can outperform a poorly trained 768-dim one.

***

## Step 2: Import dependencies and configure

The block below imports every module used across all steps of this tutorial and sets the server address. Run it once at the top of your script or notebook.

```python theme={null}
import asyncio
import time
from sentence_transformers import SentenceTransformer

from actian_vectorai import (
    AsyncVectorAIClient,
    Distance,
    Field,
    FilterBuilder,
    PointStruct,
    PrefetchQuery,
    VectorParams,
)
from actian_vectorai.models.collections import HnswConfigDiff
from actian_vectorai.models.enums import Fusion
from actian_vectorai.models.points import (
    SearchParams,
    WithPayloadSelector,
)

SERVER = "localhost:6574"

print(f"VectorAI Server: {SERVER}")
```

### Expected output

```
VectorAI Server: localhost:6574
```

***

## Step 3: Load multiple models and compare embedding output

The code below loads three models — small, medium, and large — and encodes the same sample sentence with each one. Running it prints each model's load time, dimension count, and the first five values of the resulting vector, confirming that each model produces a vector of a different size.

```python theme={null}
# Three models spanning small, medium, and large architectures
models = {
    "minilm": {
        "name": "all-MiniLM-L6-v2",
        "dim": 384,
        "description": "Small, fast, good for prototyping",
    },
    "mpnet": {
        "name": "all-mpnet-base-v2",
        "dim": 768,
        "description": "Balanced quality and speed",
    },
    "e5-large": {
        "name": "intfloat/e5-large-v2",
        "dim": 1024,
        "description": "High quality, slower",
    },
}

loaded_models = {}
for key, info in models.items():
    print(f"Loading {info['name']}...")
    t0 = time.time()
    loaded_models[key] = SentenceTransformer(info["name"])
    elapsed = time.time() - t0
    print(f"  Loaded in {elapsed:.1f}s — {info['dim']} dimensions — {info['description']}")

sample_text = "Vector databases store high-dimensional embeddings for similarity search."
print(f"\nSample: \"{sample_text}\"\n")
for key, m in loaded_models.items():
    vec = m.encode(sample_text)
    print(f"  {key:>10}: dim={len(vec)}, first 5 values={vec[:5].round(4).tolist()}")
```

### Expected output

This block iterates over the three model definitions, MiniLM-L6 (384 dimensions), MPNet-base (768 dimensions), and E5-large-v2 (1024 dimensions), loads each one (downloading it from Hugging Face on first use, then reading it from the local cache on later runs), and records the load time. It then encodes the same sample sentence with every loaded model and prints each model's actual output dimension and the first five vector values, confirming that the models are producing embeddings of the expected shape.

```
Loading all-MiniLM-L6-v2...
  Loaded in <time>s — 384 dimensions — Small, fast, good for prototyping
Loading all-mpnet-base-v2...
  Loaded in <time>s — 768 dimensions — Balanced quality and speed
Loading intfloat/e5-large-v2...
  Loaded in <time>s — 1024 dimensions — High quality, slower

Sample: "Vector databases store high-dimensional embeddings for similarity search."

      minilm: dim=384, first 5 values=[...]
       mpnet: dim=768, first 5 values=[...]
    e5-large: dim=1024, first 5 values=[...]
```

> Load times and vector values vary by hardware, library version, and model revision.
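
One caveat for the comparison above, offered as a side note rather than part of the original walkthrough: the `intfloat/e5-large-v2` model card on Hugging Face recommends prefixing inputs with `query: ` or `passage: `, because the model was trained that way. Encoding raw text still runs, but it may understate E5's retrieval quality. The sketch below shows one way to add the prefixes; `with_e5_prefix` is a hypothetical helper name, not part of `sentence-transformers`.

```python
def with_e5_prefix(texts, kind):
    """Prepend the E5 input prefix; kind is "query" or "passage"."""
    if kind not in ("query", "passage"):
        raise ValueError(f"kind must be 'query' or 'passage', got {kind!r}")
    return [f"{kind}: {t}" for t in texts]

# Documents are prefixed as passages, search strings as queries
docs = with_e5_prefix(["HNSW is a graph-based index."], "passage")
queries = with_e5_prefix(["How does HNSW work?"], "query")
print(docs[0])
print(queries[0])
```

The prefixed lists would then go to `loaded_models["e5-large"].encode(...)` at ingestion and query time respectively. The MiniLM and MPNet models in this tutorial take raw text.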

***

## Step 4: Measure embedding speed

The code below encodes a batch of 100 texts with each loaded model and prints total encoding time and throughput in texts per second. Running it shows how much slower larger models are relative to smaller ones, which directly affects ingestion time and real-time query latency.

```python theme={null}
benchmark_texts = [
    "How to create a collection in VectorAI DB?",
    "Semantic search finds documents by meaning rather than keywords.",
    "HNSW indexing builds a navigable small-world graph for fast search.",
    "Payload filters combine vector similarity with structured conditions.",
    "Quantization reduces memory usage by compressing vector components.",
] * 20  # 100 texts total

print(f"Benchmarking {len(benchmark_texts)} texts:\n")

for key, m in loaded_models.items():
    t0 = time.time()
    vecs = m.encode(benchmark_texts)
    elapsed = time.time() - t0
    throughput = len(benchmark_texts) / elapsed
    print(
        f"  {key:>10}: {elapsed:.3f}s total, "
        f"{throughput:.0f} texts/sec, "
        f"{models[key]['dim']} dims"
    )
```

### Expected output

This block constructs a 100-text corpus and passes the full batch to each model's `encode` method. It measures wall-clock time and computes throughput in texts per second, making the latency cost of moving from MiniLM to E5-large directly visible.

```
Benchmarking 100 texts:

      minilm: <time>s total, <n> texts/sec, 384 dims
       mpnet: <time>s total, <n> texts/sec, 768 dims
    e5-large: <time>s total, <n> texts/sec, 1024 dims
```

> Throughput figures depend on your hardware and whether a GPU is available. The relative ordering (MiniLM fastest, E5-large slowest) generally holds across environments, because it follows from model size.

MiniLM is significantly faster than E5-large. For real-time query embedding, this difference directly impacts response latency. For batch ingestion, it affects indexing time but not search quality.
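
A single timed call can be skewed by one-off costs, such as lazy initialisation on the first `encode`. The sketch below is a variant under stated assumptions, not the tutorial's method: it runs one untimed warm-up pass and reports throughput from the median of several timed runs. `encode_fn` stands in for a model's `encode` method, and `median_throughput` is a hypothetical helper name.

```python
import statistics
import time

def median_throughput(encode_fn, texts, runs=5):
    """Return texts/sec based on the median wall-clock time of several runs."""
    encode_fn(texts)  # warm-up pass, not timed
    timings = []
    for _ in range(runs):
        t0 = time.perf_counter()
        encode_fn(texts)
        timings.append(time.perf_counter() - t0)
    # Guard against a zero reading on very coarse clocks
    return len(texts) / max(statistics.median(timings), 1e-9)

# Stand-in encoder so the sketch runs without downloading a model
def fake_encode(texts):
    return [[float(len(t))] for t in texts]

rate = median_throughput(fake_encode, ["a", "bb", "ccc"] * 10)
print(f"{rate:.0f} texts/sec")
```

To benchmark a real model, `loaded_models["minilm"].encode` could be passed in place of `fake_encode`.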

***

## Step 5: Match distance metrics to models

Different models are trained with different objectives, and using the wrong distance metric silently degrades retrieval quality. The code below prints the correct metric for each model so that you can verify your collection configuration matches the model.

```python theme={null}
# Correct distance metric for each model based on its training objective
model_distance_map = {
    "all-MiniLM-L6-v2":            Distance.Cosine,
    "all-MiniLM-L12-v2":           Distance.Cosine,
    "all-mpnet-base-v2":           Distance.Cosine,
    "all-distilroberta-v1":        Distance.Cosine,
    "multi-qa-mpnet-base-dot-v1":  Distance.Dot,
    "intfloat/e5-large-v2":        Distance.Cosine,
    "BAAI/bge-large-en-v1.5":      Distance.Cosine,
}

print("Model → Distance metric:\n")
for model_name, dist in model_distance_map.items():
    print(f"  {model_name:<35} → {dist.name}")
```

The table below describes when each distance metric applies and which models use it.

| Distance          | When to use                                                                                     | Models trained with it     |
| ----------------- | ----------------------------------------------------------------------------------------------- | -------------------------- |
| `Distance.Cosine` | Most general-purpose models, where outputs are normalized or benefit from angular comparison.   | MiniLM, MPNet, E5, BGE     |
| `Distance.Dot`    | Models trained with dot-product loss, where outputs are not normalized and magnitude matters.   | multi-qa-mpnet-base-dot-v1 |
| `Distance.Euclid` | When absolute distance matters; rare for text, but common for structured or tabular embeddings. | Custom models              |

If the model documentation says "cosine similarity", use `Distance.Cosine`. If it says "dot product", use `Distance.Dot`. When in doubt, use `Distance.Cosine`.
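
The difference between the two metrics is easiest to see on a pair of vectors that point the same way but differ in length. The sketch below uses plain Python rather than the database, and the helper names are illustrative only. It shows that cosine ignores magnitude, dot product does not, and the two agree once vectors are unit-normalized.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice the length

print(cosine(a, b))                     # depends on direction only
print(dot(a, b))                        # grows with magnitude
print(dot(normalize(a), normalize(b)))  # agrees with cosine once normalized
```

This is why a model trained with a dot-product objective can rank differently under `Distance.Cosine`: the magnitude information it learned to use is discarded.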

***

## Step 6: Create collections for different models

The code below creates one collection for each of the three models, each configured with the matching dimension count, distance metric, and HNSW settings. Running it prints a confirmation line per collection.

```python theme={null}
collection_configs = {
    "embeddings-minilm": {
        "dim": 384,
        "distance": Distance.Cosine,
        "hnsw": HnswConfigDiff(m=16, ef_construct=128),
    },
    "embeddings-mpnet": {
        "dim": 768,
        "distance": Distance.Cosine,
        "hnsw": HnswConfigDiff(m=16, ef_construct=128),
    },
    "embeddings-e5-large": {
        "dim": 1024,
        "distance": Distance.Cosine,
        "hnsw": HnswConfigDiff(m=16, ef_construct=128),
    },
}

async def create_collections():
    for name, cfg in collection_configs.items():
        async with AsyncVectorAIClient(url=SERVER) as client:
            # Delete any stale collection from previous runs before creating fresh
            try:
                await client.collections.delete(name)
            except Exception:
                pass
            await client.collections.get_or_create(
                name=name,
                vectors_config=VectorParams(
                    size=cfg["dim"],
                    distance=cfg["distance"],
                ),
                hnsw_config=cfg["hnsw"],
            )
            # get_info confirms the collection is fully committed on the server
            # before any subsequent operation uses it
            await client.collections.get_info(name)
            print(f"  Collection '{name}' ready: {cfg['dim']}-dim, {cfg['distance'].name}")

asyncio.run(create_collections())
```

### Memory footprint at scale

The table below shows how much memory the raw float32 vectors for each model require at one million documents, in decimal gigabytes. The HNSW graph and payloads add overhead on top of these figures, so treat them as a lower bound when planning your deployment.

| Model    | Dims  | Bytes/vector (float32) | Memory at 1M docs |
| -------- | ----- | ---------------------- | ----------------- |
| MiniLM   | 384   | 1,536                  | \~1.5 GB          |
| MPNet    | 768   | 3,072                  | \~3.1 GB          |
| E5-large | 1,024 | 4,096                  | \~4.1 GB          |
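
The per-vector figures follow from dimensions × 4 bytes per float32 component. The sketch below reproduces the arithmetic so you can plug in your own corpus size. `raw_vector_bytes` is a hypothetical helper, and it counts raw vector storage only, not index or payload overhead.

```python
def raw_vector_bytes(dim, num_vectors, bytes_per_component=4):
    """Raw storage for float32 vectors; index and payload overhead excluded."""
    return dim * bytes_per_component * num_vectors

for name, dim in [("MiniLM", 384), ("MPNet", 768), ("E5-large", 1024)]:
    total = raw_vector_bytes(dim, 1_000_000)
    # Show both decimal (GB) and binary (GiB) units to avoid confusion
    print(f"{name:>8}: {total / 1e9:.2f} GB ({total / 2**30:.2f} GiB)")
```

Decimal and binary gigabytes differ by about 7%, which is worth keeping in mind when comparing these numbers against what an operating system or container limit reports.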

### Expected output

```
  Collection 'embeddings-minilm' ready: 384-dim, Cosine
  Collection 'embeddings-mpnet' ready: 768-dim, Cosine
  Collection 'embeddings-e5-large' ready: 1024-dim, Cosine
```

> **Pattern:** Delete any stale collection first, then `get_or_create`, then call `get_info` as a synchronisation barrier before closing the connection. This confirms the collection is fully committed on the server. Upsert into the collection only after this sequence completes — in a separate connection if needed.

***

## Step 7: Prepare a shared dataset

The code below defines a list of 20 short passages with `topic` and `difficulty` metadata. All three models embed this same dataset so that retrieval quality can be compared directly.

```python theme={null}
documents = [
    {"text": "HNSW is a graph-based approximate nearest-neighbour index that achieves sub-linear search time.", "topic": "indexing", "difficulty": "intermediate"},
    {"text": "Cosine similarity measures the angle between two vectors and ranges from -1 to 1.", "topic": "distance_metrics", "difficulty": "beginner"},
    {"text": "Payload filters let you combine vector similarity with structured conditions like category equals electronics.", "topic": "filtering", "difficulty": "beginner"},
    {"text": "Scalar quantization compresses each float32 component to int8, reducing memory by 4x.", "topic": "quantization", "difficulty": "intermediate"},
    {"text": "Prefetch queries retrieve candidates from multiple vector spaces and merge them with fusion.", "topic": "search", "difficulty": "advanced"},
    {"text": "Named vectors allow a single collection to store multiple embedding spaces per document.", "topic": "multimodal", "difficulty": "intermediate"},
    {"text": "The FilterBuilder supports must, should, and must_not for complex boolean logic.", "topic": "filtering", "difficulty": "beginner"},
    {"text": "Connection pooling distributes gRPC calls across multiple channels for higher throughput.", "topic": "infrastructure", "difficulty": "advanced"},
    {"text": "Score thresholds discard results below a minimum similarity, preventing low-quality answers.", "topic": "search", "difficulty": "beginner"},
    {"text": "Reciprocal rank fusion merges results from multiple retrieval strategies by rank position.", "topic": "search", "difficulty": "advanced"},
    {"text": "The HNSW m parameter controls how many neighbours each node connects to during index construction.", "topic": "indexing", "difficulty": "intermediate"},
    {"text": "Euclidean distance measures the straight-line distance between two points in vector space.", "topic": "distance_metrics", "difficulty": "beginner"},
    {"text": "Dot product similarity is equivalent to cosine similarity when vectors are unit-normalized.", "topic": "distance_metrics", "difficulty": "intermediate"},
    {"text": "Batch search sends multiple queries in a single RPC call for better throughput.", "topic": "search", "difficulty": "intermediate"},
    {"text": "The VDE namespace provides operational commands like flush, rebuild index, and compact.", "topic": "infrastructure", "difficulty": "advanced"},
    {"text": "The ef parameter at search time controls accuracy versus speed for HNSW queries.", "topic": "indexing", "difficulty": "intermediate"},
    {"text": "Product quantization divides vectors into subspaces and quantizes each independently.", "topic": "quantization", "difficulty": "advanced"},
    {"text": "Geo filters restrict results to points within a radius, bounding box, or polygon.", "topic": "filtering", "difficulty": "intermediate"},
    {"text": "SmartBatcher provides streaming ingestion with automatic size-based and time-based flushing.", "topic": "infrastructure", "difficulty": "advanced"},
    {"text": "Text indexes tokenize string payloads to support full-text search alongside vector similarity.", "topic": "filtering", "difficulty": "intermediate"},
]
```

***

## Step 8: Embed and ingest with each model

The code below embeds all 20 documents with each model and upserts the resulting vectors into the corresponding collection. Running it prints one line per model showing embedding time, ingestion time, and the total number of points confirmed in the collection after flushing.

```python theme={null}
model_to_collection = {
    "minilm":    "embeddings-minilm",
    "mpnet":     "embeddings-mpnet",
    "e5-large":  "embeddings-e5-large",
}

async def ingest_with_model(model_key: str, collection_name: str):
    m = loaded_models[model_key]
    texts = [d["text"] for d in documents]

    # Embed before opening the connection: encoding runs locally, no server needed
    t0 = time.time()
    vectors = m.encode(texts).tolist()
    embed_time = time.time() - t0

    points = [
        PointStruct(
            id=i,
            vector=vec,
            payload={
                "text":       doc["text"],
                "topic":      doc["topic"],
                "difficulty": doc["difficulty"],
                "model":      model_key,
            },
        )
        for i, (doc, vec) in enumerate(zip(documents, vectors))
    ]

    # Open a fresh connection for upsert — collection is fully committed from Step 6
    async with AsyncVectorAIClient(url=SERVER) as client:
        t1 = time.time()
        await client.points.upsert(collection_name, points=points)
        ingest_time = time.time() - t1
        await client.vde.flush(collection_name)
        count = await client.vde.get_vector_count(collection_name)

    print(
        f"  {model_key:>10}: embedded in {embed_time:.3f}s, "
        f"ingested {len(points)} pts in {ingest_time:.3f}s, "
        f"total={count}"
    )

async def ingest_all():
    for key, collection in model_to_collection.items():
        await ingest_with_model(key, collection)

asyncio.run(ingest_all())
```

### Expected output

```
      minilm: embedded in <time>s, ingested 20 pts in <time>s, total=20
       mpnet: embedded in <time>s, ingested 20 pts in <time>s, total=20
    e5-large: embedded in <time>s, ingested 20 pts in <time>s, total=20
```

> Timing values vary by hardware and network conditions.
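
With 20 points, a single upsert call is fine. For larger corpora, a common approach is to split the points into fixed-size batches and upsert them one batch at a time. The sketch below shows only the batching logic; `chunked` is a hypothetical helper, and a suitable batch size is an assumption you would tune for your payload size and network.

```python
def chunked(items, size):
    """Yield successive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Stand-in for a list of points
batches = list(chunked(list(range(10)), 4))
print([len(b) for b in batches])
```

Inside `ingest_with_model`, each batch would be passed to `client.points.upsert(collection_name, points=batch)` in place of the single call.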

***

## Step 9: Compare search quality across models

The code below runs three test queries against all three collections and prints the top-scoring documents from each. Running it lets you see whether different models surface different documents for the same query, and how confidently each model scores its top result.

```python theme={null}
test_queries = [
    "How does approximate nearest neighbour search work?",
    "What is the difference between cosine and dot product?",
    "How to filter search results by category?",
]

async def compare_models(query: str, top_k: int = 3):
    print(f"\nQuery: \"{query}\"\n")

    for key, collection in model_to_collection.items():
        m = loaded_models[key]
        vec = m.encode(query).tolist()

        async with AsyncVectorAIClient(url=SERVER) as client:
            results = await client.points.search(
                collection, vector=vec, limit=top_k,
                with_payload=WithPayloadSelector(include=["text", "topic"]),
            ) or []

        print(f"  [{key:>10}]")
        for r in results:
            print(f"    score={r.score:.4f}  [{r.payload['topic']:>16}]  {r.payload['text'][:65]}...")
        print()

for q in test_queries:
    asyncio.run(compare_models(q))
```

### What to look for

Use the same query across collections to see how each model ranks passages.

* Check whether scores are spread between relevant and irrelevant results. Higher-quality models tend to produce wider separation, making it easier to set a threshold.
* Check whether the top result is the most relevant document. When models disagree on rank 1, the larger model is often a better reference, though results vary by query and domain.
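* Treat absolute score thresholds as model-specific. For the MiniLM and MPNet models, cosine scores above roughly 0.7 usually indicate strong relevance, and scores below roughly 0.4 are usually noise. E5 models compress scores into a narrow high band (the model card reports roughly 0.7 to 1.0), so compare rankings rather than raw scores across models.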
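
Two of these checks can be turned into numbers. The sketch below computes the overlap between two models' top-k id lists and the margin between the best and second-best score. Both helper names are hypothetical, and the sample values are illustrative, not output from a real run.

```python
def top_k_overlap(ids_a, ids_b):
    """Jaccard overlap between two result lists, ignoring rank order."""
    a, b = set(ids_a), set(ids_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def top1_margin(scores):
    """Gap between the best score and the runner-up (scores sorted descending)."""
    return scores[0] - scores[1] if len(scores) > 1 else 0.0

# Illustrative values only
minilm_ids, mpnet_ids = [0, 10, 15], [0, 15, 4]
print(top_k_overlap(minilm_ids, mpnet_ids))
print(top1_margin([0.82, 0.61, 0.58]))
```

In `compare_models`, the id lists would come from `[r.id for r in results]` and the scores from `[r.score for r in results]`. A low overlap means the models disagree on which documents are relevant. A small margin means the top result is not clearly separated from the rest.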

***

## Step 10: Tune search accuracy with SearchParams

`SearchParams` controls how the HNSW index is traversed at query time. The code below runs the same query two ways — approximate with a specific `hnsw_ef` value, and exact brute-force — so you can compare accuracy and latency directly.

```python theme={null}
async def tuned_vs_exact(query: str, top_k: int = 5):
    m = loaded_models["minilm"]
    vec = m.encode(query).tolist()

    async with AsyncVectorAIClient(url=SERVER) as client:
        # High-accuracy approximate search
        t0 = time.time()
        approx_results = await client.points.search(
            "embeddings-minilm",
            vector=vec,
            limit=top_k,
            with_payload=WithPayloadSelector(include=["text"]),
            params=SearchParams(hnsw_ef=256),
        ) or []
        approx_time = time.time() - t0

        # Exact brute-force — 100% recall, slowest option
        t1 = time.time()
        exact_results = await client.points.search(
            "embeddings-minilm",
            vector=vec,
            limit=top_k,
            with_payload=WithPayloadSelector(include=["text"]),
            params=SearchParams(exact=True),
        ) or []
        exact_time = time.time() - t1

    print(f"Query: \"{query}\"\n")
    print(f"  Approximate (hnsw_ef=256): {approx_time*1000:.1f}ms")
    for r in approx_results[:3]:
        print(f"    id={r.id}  score={r.score:.4f}  {r.payload['text'][:60]}...")

    print(f"\n  Exact (brute-force): {exact_time*1000:.1f}ms")
    for r in exact_results[:3]:
        print(f"    id={r.id}  score={r.score:.4f}  {r.payload['text'][:60]}...")

asyncio.run(tuned_vs_exact("How does graph-based indexing work?"))
```

| Setting       | Speed   | Accuracy | When to use                                |
| ------------- | ------- | -------- | ------------------------------------------ |
| `hnsw_ef=64`  | Fastest | Good     | High-throughput, latency-sensitive         |
| `hnsw_ef=256` | Fast    | High     | Default recommendation for most use cases  |
| `exact=True`  | Slowest | Perfect  | Benchmarking and establishing ground truth |
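
Latency is only half of the comparison. The exact results also serve as ground truth for measuring how much recall an `hnsw_ef` setting gives up. The sketch below computes recall@k from two id lists; `recall_at_k` is a hypothetical helper, and the sample ids are illustrative rather than taken from a real run.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k that the approximate search also returned."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    return len(truth & set(approx_ids[:k])) / len(truth)

# Illustrative id lists only
exact = [0, 10, 15, 4, 7]
approx = [0, 10, 15, 7, 2]
print(recall_at_k(approx, exact, k=5))
```

In `tuned_vs_exact`, the two lists would be `[r.id for r in approx_results]` and `[r.id for r in exact_results]`. On a 20-document collection the two searches will most likely agree, so differences only become visible on larger corpora.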

***

## Step 11: Named vectors — multiple models in one collection

Rather than creating separate collections, you can store embeddings from multiple models in a single collection using named vectors. The code below creates a collection with two named vector spaces, embeds the shared dataset with both models, and uploads each document with both vectors attached.

```python theme={null}
async def create_multi_model_collection():
    async with AsyncVectorAIClient(url=SERVER) as client:
        await client.collections.get_or_create(
            name="embeddings-multimodel",
            vectors_config={
                "minilm": VectorParams(size=384, distance=Distance.Cosine),
                "mpnet":  VectorParams(size=768, distance=Distance.Cosine),
            },
            hnsw_config=HnswConfigDiff(m=16, ef_construct=128),
        )

        texts = [d["text"] for d in documents]
        minilm_vecs = loaded_models["minilm"].encode(texts).tolist()
        mpnet_vecs  = loaded_models["mpnet"].encode(texts).tolist()

        points = [
            PointStruct(
                id=i,
                vector={"minilm": minilm_vecs[i], "mpnet": mpnet_vecs[i]},
                payload={
                    "text":       doc["text"],
                    "topic":      doc["topic"],
                    "difficulty": doc["difficulty"],
                },
            )
            for i, doc in enumerate(documents)
        ]

        await client.points.upsert("embeddings-multimodel", points=points)
        await client.vde.flush("embeddings-multimodel")
        count = await client.vde.get_vector_count("embeddings-multimodel")

    print(f"Multi-model collection ready: {count} documents, 2 vector spaces")

asyncio.run(create_multi_model_collection())
```

### Expected output

This block creates a single collection with two named vector spaces — `"minilm"` (384-dim, Cosine) and `"mpnet"` (768-dim, Cosine) — then encodes all 20 shared documents with both models. Each document is stored as a single `PointStruct` carrying both embedding vectors alongside its text, topic, and difficulty metadata.

```
Multi-model collection ready: 20 documents, 2 vector spaces
```

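> The collection holds 20 points, each carrying two named vectors, so 40 vectors are indexed in total. If `vde.get_vector_count()` reports vectors across all named spaces rather than points, the printed count is 40 instead of 20.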

***

## Step 12: Search individual models and fuse results

Each named space was produced by a different encoder, so you embed the query once per model and pass the vector that matches the `using` parameter. The code below runs two single-space searches, then one fused query that merges candidate lists with reciprocal rank fusion (RRF).

```python theme={null}
async def multi_model_search(query: str, top_k: int = 5):
    minilm_vec = loaded_models["minilm"].encode(query).tolist()
    mpnet_vec  = loaded_models["mpnet"].encode(query).tolist()

    async with AsyncVectorAIClient(url=SERVER) as client:
        minilm_results = await client.points.search(
            "embeddings-multimodel",
            vector=minilm_vec, using="minilm", limit=top_k,
            with_payload=WithPayloadSelector(include=["text", "topic"]),
        ) or []

        mpnet_results = await client.points.search(
            "embeddings-multimodel",
            vector=mpnet_vec, using="mpnet", limit=top_k,
            with_payload=WithPayloadSelector(include=["text", "topic"]),
        ) or []

        fused_results = await client.points.query(
            "embeddings-multimodel",
            query={"fusion": Fusion.RRF},
            prefetch=[
                PrefetchQuery(query=minilm_vec, using="minilm", limit=10),
                PrefetchQuery(query=mpnet_vec,  using="mpnet",  limit=10),
            ],
            limit=top_k,
            with_payload=WithPayloadSelector(include=["text", "topic"]),
        )

    print(f"Query: \"{query}\"\n")

    print("  [MiniLM only]")
    for r in (minilm_results or [])[:3]:
        print(f"    score={r.score:.4f}  {r.payload['text'][:60]}...")

    print("\n  [MPNet only]")
    for r in (mpnet_results or [])[:3]:
        print(f"    score={r.score:.4f}  {r.payload['text'][:60]}...")

    print("\n  [RRF fusion: MiniLM + MPNet]")
    for r in (list(fused_results) if fused_results else [])[:3]:
        print(f"    score={r.score:.4f}  {r.payload['text'][:60]}...")

asyncio.run(multi_model_search("How does approximate search indexing work?"))
```

### Why multimodel fusion improves results

Different models capture different aspects of meaning. As a rule of thumb, the smaller MiniLM leans more on lexical overlap, where "search" closely matches "search", while MPNet is better at capturing paraphrases, where "ANN search" matches "approximate nearest-neighbour." Documents that rank highly under both models are more likely to be relevant, and RRF naturally promotes those consensus results.
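
To make the promotion of consensus results concrete, the sketch below implements reciprocal rank fusion in plain Python: each ranked list contributes `1 / (k + rank)` to a document's fused score. This illustrates the general technique, not the server's implementation. The constant `k=60` comes from the original RRF paper, and the value VectorAI DB uses is not stated in this tutorial. `rrf_fuse` is a hypothetical helper name.

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked id lists: each list adds 1 / (k + rank) to an id's score."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Illustrative rankings: ids 0 and 10 appear in both lists
minilm_ranking = [0, 10, 15]
mpnet_ranking = [10, 4, 0]
fused = rrf_fuse([minilm_ranking, mpnet_ranking])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Documents 10 and 0 appear in both rankings and finish ahead of documents 4 and 15, which each appear in only one. Because RRF uses rank positions rather than raw scores, it does not need the two models' score scales to be comparable.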

***

## Step 13: Re-embed specific vectors when switching models

To upgrade from one model to another, re-embed the text payloads and re-upsert all affected points. The code below simulates upgrading the MiniLM space from L6 to L12 by fetching all existing points, re-encoding each text with the upgraded model, and upserting the new vectors back.

```python theme={null}
async def reembed_minilm_space():
    """Simulate upgrading minilm from L6 to L12."""
    upgraded_model = SentenceTransformer("all-MiniLM-L12-v2")

    async with AsyncVectorAIClient(url=SERVER) as client:
        # Retrieve all existing points with every payload field that must
        # survive the upsert (the point is rewritten in full below)
        all_points = await client.points.get(
            "embeddings-multimodel",
            ids=list(range(len(documents))),
            with_payload=WithPayloadSelector(include=["text", "topic", "difficulty"]),
            with_vectors=False,
        )

        # Re-encode each document with the upgraded model
        updated = []
        for pt in all_points:
            text = pt.payload.get("text", "")
            new_vec = upgraded_model.encode(text).tolist()
            # Upsert the full point: new minilm vector, plus the mpnet vector
            # recomputed with the same (unchanged) MPNet model
            updated.append(PointStruct(
                id=pt.id,
                vector={"minilm": new_vec, "mpnet": loaded_models["mpnet"].encode(text).tolist()},
                payload=pt.payload,
            ))

        await client.points.upsert("embeddings-multimodel", points=updated)
        await client.vde.flush("embeddings-multimodel")

    print(f"Re-embedded {len(updated)} points in 'minilm' space with L12 model.")
    print("MPNet vectors were recomputed with the same model, so their values are unchanged.")

asyncio.run(reembed_minilm_space())
```

### Why this pattern matters

Re-upserting lets you upgrade one model's vectors without losing other data. This matters in the following situations.

* You want to upgrade one model without re-processing all data from scratch.
* Different teams own different embedding spaces.
* You need to keep payload metadata intact during a model upgrade.
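
A replacement model can only be written into an existing named space if it emits vectors of the same size. `all-MiniLM-L12-v2` produces 384 dimensions, the same as L6 (see the Step 1 table), which is why the swap works here. The sketch below shows a guard you could run before upserting; `check_dimensions` is a hypothetical helper, and the vectors are stand-ins for real `encode()` output.

```python
def check_dimensions(vectors, expected_dim):
    """Raise if any vector's length differs from the target space's size."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dim:
            raise ValueError(
                f"vector {i} has {len(vec)} dims, expected {expected_dim}"
            )
    return len(vectors)

# Stand-in vectors; in practice these come from the upgraded model's encode()
ok = check_dimensions([[0.0] * 384, [0.1] * 384], expected_dim=384)
print(f"{ok} vectors match the 384-dim space")
```

Moving to a model with a different dimension count, for example from MiniLM to E5-large, would instead require a new named space or a new collection sized for that model.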

***

## Step 14: Run multiple searches efficiently

When you need to run multiple queries against the same collection, run them sequentially within a single client connection to minimise connection overhead.

```python theme={null}
async def multi_search(queries: list, top_k: int = 3):
    for collection_key in ["minilm", "mpnet"]:
        m = loaded_models[collection_key]
        collection = model_to_collection[collection_key]

        async with AsyncVectorAIClient(url=SERVER) as client:
            print(f"\n[{collection_key}] — {len(queries)} queries:\n")
            for i, query in enumerate(queries):
                vec = m.encode(query).tolist()
                results = await client.points.search(
                    collection, vector=vec, limit=top_k,
                    with_payload=WithPayloadSelector(include=["text"]),
                ) or []
                print(f"  Q{i+1}: \"{query[:50]}...\"")
                for r in results[:2]:
                    print(f"    score={r.score:.4f}  {r.payload['text'][:55]}...")

asyncio.run(multi_search([
    "How does HNSW graph indexing work?",
    "What are the options for compressing vectors?",
    "How to combine vector search with metadata filters?",
]))
```

Keeping all searches inside a single `async with` block reuses the same gRPC channel, so connection setup happens once rather than once per query. Each query still makes its own request to the server; the saving is the repeated connection handshake.

***

## Step 15: Build a model selection helper

The code below defines a `recommend_model` function that takes corpus size, latency budget, and quality priority as inputs and returns a recommended model with configuration.

```python theme={null}
from dataclasses import dataclass

@dataclass
class ModelRecommendation:
    model_name: str
    dimension: int
    distance: Distance
    hnsw_m: int
    hnsw_ef_construct: int
    reason: str

def recommend_model(
    corpus_size: int,
    max_latency_ms: float,
    quality_priority: str = "balanced",
) -> ModelRecommendation:
    """Return a model and VectorAI configuration for the given constraints."""

    def est_ram_gb(dim: int) -> float:
        # Raw float32 vector storage only (4 bytes per dimension).
        return (dim * 4 * corpus_size) / (1024**3)

    if quality_priority == "speed" or max_latency_ms < 10:
        return ModelRecommendation(
            model_name="all-MiniLM-L6-v2", dimension=384,
            distance=Distance.Cosine, hnsw_m=16, hnsw_ef_construct=100,
            reason=f"Fastest model. RAM: ~{est_ram_gb(384):.1f} GB for {corpus_size:,} docs.",
        )
    if quality_priority == "quality":
        return ModelRecommendation(
            model_name="intfloat/e5-large-v2", dimension=1024,
            distance=Distance.Cosine, hnsw_m=32, hnsw_ef_construct=256,
            reason=f"Highest quality. RAM: ~{est_ram_gb(1024):.1f} GB float32.",
        )
    return ModelRecommendation(
        model_name="all-mpnet-base-v2", dimension=768,
        distance=Distance.Cosine, hnsw_m=16, hnsw_ef_construct=128,
        reason=f"Balanced quality and speed. RAM: ~{est_ram_gb(768):.1f} GB float32.",
    )

scenarios = [
    {"corpus_size": 10_000,    "max_latency_ms": 5,   "quality_priority": "speed"},
    {"corpus_size": 500_000,   "max_latency_ms": 50,  "quality_priority": "balanced"},
    {"corpus_size": 2_000_000, "max_latency_ms": 200, "quality_priority": "quality"},
]

for s in scenarios:
    rec = recommend_model(**s)
    print(f"\n  Scenario: {s}")
    print(f"  → Model: {rec.model_name} ({rec.dimension}-dim, {rec.distance.name})")
    print(f"    HNSW: m={rec.hnsw_m}, ef_construct={rec.hnsw_ef_construct}")
    print(f"    Reason: {rec.reason}")
```
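
The RAM figure in each recommendation comes from a single formula: dimensions × 4 bytes × corpus size, divided by 1024³. The standalone sketch below reproduces that arithmetic for the three scenarios. The helper name `raw_vector_gib` is illustrative, and the result covers raw float32 vectors only, not the HNSW graph or payloads.

```python
def raw_vector_gib(dim: int, corpus_size: int, bytes_per_value: int = 4) -> float:
    """Raw vector storage in GiB: dim * bytes_per_value * corpus_size / 1024**3."""
    return dim * bytes_per_value * corpus_size / (1024 ** 3)

# The same (dimension, corpus size) pairs used in the scenarios above.
for dim, n in [(384, 10_000), (768, 500_000), (1024, 2_000_000)]:
    print(f"{dim}-dim x {n:,} docs: {raw_vector_gib(dim, n):.2f} GiB")
```

The smallest scenario needs roughly 0.014 GiB, which the one-decimal format used in `recommend_model` rounds down to `0.0`. Use a finer format or report megabytes when sizing small corpora.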

***

## Step 16: Report and clean up tutorial collections

The code below lists all tutorial collections with their document counts, flushes each one, then deletes them all.

```python theme={null}
async def cleanup():
    collections_to_delete = [
        "embeddings-minilm",
        "embeddings-mpnet",
        "embeddings-e5-large",
        "embeddings-multimodel",
    ]

    async with AsyncVectorAIClient(url=SERVER) as client:
        for name in collections_to_delete:
            try:
                count = await client.vde.get_vector_count(name)
                print(f"  '{name}': {count} documents")
                await client.vde.flush(name)
            except Exception as exc:
                # The collection may not exist if an earlier step was skipped.
                print(f"  '{name}': skipped ({exc})")

        print("\nDeleting tutorial collections...")
        for name in collections_to_delete:
            try:
                await client.collections.delete(name)
                print(f"  Deleted '{name}'")
            except Exception as exc:
                # Report the failure instead of hiding it.
                print(f"  Could not delete '{name}': {exc}")

asyncio.run(cleanup())
```

***

## Quick reference: model → VectorAI configuration

| Model                                              | Dims | Distance | HNSW m | HNSW ef\_construct |
| -------------------------------------------------- | ---- | -------- | ------ | ------------------ |
| `sentence-transformers/all-MiniLM-L6-v2`           | 384  | Cosine   | 16     | 100–128            |
| `sentence-transformers/all-MiniLM-L12-v2`          | 384  | Cosine   | 16     | 128                |
| `sentence-transformers/all-mpnet-base-v2`          | 768  | Cosine   | 16     | 128                |
| `sentence-transformers/multi-qa-mpnet-base-dot-v1` | 768  | Dot      | 16     | 128                |
| `sentence-transformers/all-distilroberta-v1`       | 768  | Cosine   | 16     | 128                |
| `intfloat/e5-large-v2`                             | 1024 | Cosine   | 32     | 256                |
| `BAAI/bge-large-en-v1.5`                           | 1024 | Cosine   | 32     | 256                |
| `sentence-transformers/clip-ViT-B-32`              | 512  | Cosine   | 16     | 128                |
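
If you prefer to keep these settings in code, the table can be expressed as a lookup. The sketch below is one possible shape, not part of the VectorAI client: `MODEL_CONFIGS` and `config_for` are illustrative names, distance metrics are plain strings so the snippet runs without the client installed, and the MiniLM-L6 entry uses 100, the lower bound of the table's 100–128 range.

```python
# Values copied from the quick-reference table above.
MODEL_CONFIGS = {
    "sentence-transformers/all-MiniLM-L6-v2":
        {"dim": 384, "distance": "Cosine", "hnsw_m": 16, "hnsw_ef_construct": 100},
    "sentence-transformers/all-MiniLM-L12-v2":
        {"dim": 384, "distance": "Cosine", "hnsw_m": 16, "hnsw_ef_construct": 128},
    "sentence-transformers/all-mpnet-base-v2":
        {"dim": 768, "distance": "Cosine", "hnsw_m": 16, "hnsw_ef_construct": 128},
    "sentence-transformers/multi-qa-mpnet-base-dot-v1":
        {"dim": 768, "distance": "Dot", "hnsw_m": 16, "hnsw_ef_construct": 128},
    "sentence-transformers/all-distilroberta-v1":
        {"dim": 768, "distance": "Cosine", "hnsw_m": 16, "hnsw_ef_construct": 128},
    "intfloat/e5-large-v2":
        {"dim": 1024, "distance": "Cosine", "hnsw_m": 32, "hnsw_ef_construct": 256},
    "BAAI/bge-large-en-v1.5":
        {"dim": 1024, "distance": "Cosine", "hnsw_m": 32, "hnsw_ef_construct": 256},
    "sentence-transformers/clip-ViT-B-32":
        {"dim": 512, "distance": "Cosine", "hnsw_m": 16, "hnsw_ef_construct": 128},
}

def config_for(model_name: str) -> dict:
    """Return the reference configuration for a model, or raise a clear error."""
    try:
        return MODEL_CONFIGS[model_name]
    except KeyError:
        raise ValueError(f"No reference configuration for {model_name!r}") from None

print(config_for("intfloat/e5-large-v2"))
```

Map the `distance` string to the client's `Distance` enum at the point where you create the collection.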

***

## Actian VectorAI features used

| Feature             | API                                                                        | Purpose                                            |
| ------------------- | -------------------------------------------------------------------------- | -------------------------------------------------- |
| Collection creation | `collections.get_or_create(vectors_config=VectorParams(...))`              | Configure dimensions and distance metric           |
| Distance metrics    | `Distance.Cosine`, `Distance.Dot`, `Distance.Euclid`                       | Match the metric to the model's training objective |
| Exact search        | `SearchParams(exact=True)`                                                 | Brute-force ground truth for benchmarking          |
| hnsw\_ef tuning     | `SearchParams(hnsw_ef=256)`                                                | Control accuracy vs speed at query time            |
| Batch ingestion     | `points.upsert(collection, points=[...])`                                  | Insert or update vectors and payloads              |
| Named vectors       | `vectors_config={"minilm": VectorParams(...), "mpnet": VectorParams(...)}` | Multiple models stored in one collection           |
| Named vector search | `points.search(..., using="mpnet")`                                        | Search a specific embedding space by name          |
| Multimodel fusion   | `PrefetchQuery(using="minilm") + Fusion.RRF`                               | Combine results from different models              |
| Selective payload   | `WithPayloadSelector(include=["text"])`                                    | Return only needed payload fields                  |
| Vector count        | `vde.get_vector_count()`                                                   | Verify ingestion completed successfully            |
| Flush               | `vde.flush()`                                                              | Persist pending writes to durable storage          |
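
The `Fusion.RRF` entry refers to reciprocal rank fusion, which merges ranked lists by summing `1 / (k + rank)` for each document across the lists. The pure-Python sketch below shows the general technique only. `rrf_fuse` is an illustrative helper rather than a client API, and `k=60` is a commonly used constant that is assumed here, not a documented VectorAI setting.

```python
from collections import defaultdict

def rrf_fuse(rankings: list, k: int = 60) -> list:
    """Fuse ranked lists of document IDs with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the top hit in a list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical result lists from two embedding spaces.
minilm_hits = ["doc-a", "doc-b", "doc-c"]
mpnet_hits = ["doc-b", "doc-d", "doc-a"]

fused = rrf_fuse([minilm_hits, mpnet_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Documents that appear in both lists (`doc-a` and `doc-b` here) accumulate two contributions and rise above documents found by only one model. This is why fusion rewards agreement between embedding spaces.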

***

## Next steps

* [Building multimodal systems](/academy/tutorials/multimodel-system) — Add image embeddings with CLIP alongside text models
* [Optimizing retrieval quality](/academy/tutorials/retrieval-quality) — Tune HNSW parameters and search settings
* [Reranking search results](/academy/tutorials/re-ranking) — Improve result relevance with cross-encoders and fusion
* [Similarity search fundamentals](/academy/tutorials/similarity-search) — Master the core search and query workflow
