Architecture overview
The diagram below shows how documents flow through model selection, embedding, and storage in Actian VectorAI DB. Each model produces vectors of a different size, stored in separate collections so you can compare retrieval quality across configurations.
Environment setup
Run the following command to install the two packages this tutorial depends on.
What this installs
Both packages are required: one communicates with the database, and the other loads and runs embedding models on your machine.
- actian-vectorai-client: Official Python SDK for Actian VectorAI DB; provides async/sync clients, Filter DSL, and gRPC transport.
- sentence-transformers: Framework for loading and running open-source embedding models; downloads and caches models from Hugging Face.
Step 1: Understand the model landscape
Before writing any code, review the table below to understand how the available models differ in dimension count, speed, and quality. The model you choose determines the shape of every vector stored in the database.
| Model | Dimensions | Speed | Quality | Best for |
|---|---|---|---|---|
| sentence-transformers/all-MiniLM-L6-v2 | 384 | Very fast | Good | Prototyping, low-latency apps |
| sentence-transformers/all-MiniLM-L12-v2 | 384 | Fast | Better | Production with speed constraints |
| sentence-transformers/all-mpnet-base-v2 | 768 | Moderate | High | General production use |
| sentence-transformers/multi-qa-mpnet-base-dot-v1 | 768 | Moderate | High (QA) | Question-answering systems |
| sentence-transformers/all-distilroberta-v1 | 768 | Moderate | High | Diverse text types |
| intfloat/e5-large-v2 | 1024 | Slow | Very high | Maximum quality, offline indexing |
| BAAI/bge-large-en-v1.5 | 1024 | Slow | Very high | Benchmarks, academic use |
| sentence-transformers/clip-ViT-B-32 | 512 | Moderate | High (multi) | Text + image multimodal |
Key trade-offs
Keep the following trade-offs in mind before choosing a model.
- More dimensions mean more storage and slower search, but better semantic resolution.
- Fewer dimensions mean less RAM and faster search, but may lose subtle meaning.
- Model architecture matters more than dimension count: a well-trained 384-dim model can outperform a poorly trained 768-dim one.
Step 2: Import dependencies and configure
The block below imports every module used across all steps of this tutorial and sets the server address. Run it once at the top of your script or notebook.
Expected output
Step 3: Load multiple models and compare embedding output
The code below loads three models (small, medium, and large) and encodes the same sample sentence with each one. Running it prints each model’s load time, dimension count, and the first five values of the resulting vector, confirming that each model produces a vector of a different size.
Expected output
This block iterates over the three model definitions: MiniLM-L6 (384 dimensions), MPNet-base (768 dimensions), and E5-large-v2 (1024 dimensions). It loads each one from the Hugging Face cache and records the load time. It then encodes the same sample sentence with every loaded model and prints each model’s actual output dimension and the first five vector values, confirming that the models are producing embeddings of the expected shape.
Load times and vector values vary by hardware, library version, and model revision.
Step 4: Measure embedding speed
The code below encodes a batch of 100 texts with each loaded model and prints total encoding time and throughput in texts per second. Running it shows how much slower larger models are relative to smaller ones, which directly affects ingestion time and real-time query latency.
Expected output
This block constructs a 100-text corpus and passes the full batch to each model’s encode method. It measures wall-clock time and computes throughput in texts per second, making the latency cost of moving from MiniLM to E5-large directly visible.
Throughput figures depend on your hardware and whether a GPU is available. The relative ordering (MiniLM fastest, E5-large slowest) is generally consistent across environments.
MiniLM is significantly faster than E5-large. For real-time query embedding, this difference directly impacts response latency. For batch ingestion, it affects indexing time but not search quality.
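The measurement described above can be sketched without loading any model. This is a minimal illustration, not the tutorial's original code: `measure_throughput` and `fake_encode` are hypothetical names, and `fake_encode` is a stand-in for a real model's encode method.

```python
import time
from typing import Callable, Sequence


def measure_throughput(encode: Callable[[Sequence[str]], object],
                       texts: Sequence[str]) -> dict:
    """Time a single batch call to `encode` and report texts per second."""
    start = time.perf_counter()
    encode(texts)
    elapsed = time.perf_counter() - start
    return {
        "texts": len(texts),
        "seconds": elapsed,
        # Guard against a zero reading on very coarse timers.
        "texts_per_sec": len(texts) / elapsed if elapsed > 0 else float("inf"),
    }


def fake_encode(batch: Sequence[str]) -> list:
    # Hypothetical stand-in encoder: returns a tiny dummy vector per text.
    return [[float(len(text))] * 4 for text in batch]


corpus = [f"Sample document number {i}" for i in range(100)]
result = measure_throughput(fake_encode, corpus)
print(f"{result['texts']} texts in {result['seconds']:.4f}s "
      f"({result['texts_per_sec']:.0f} texts/sec)")
```

To time a real model, pass its encode method in place of `fake_encode`. Running one warm-up batch first keeps one-time initialisation cost out of the measurement.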
Step 5: Match distance metrics to models
Different models are trained with different objectives, and using the wrong distance metric silently degrades retrieval quality. The code below prints the correct metric for each model so that you can verify your collection configuration matches the model.
| Distance | When to use | Models trained with it |
|---|---|---|
| Distance.Cosine | Most general-purpose models, where outputs are normalized or benefit from angular comparison. | MiniLM, MPNet, E5, BGE |
| Distance.Dot | Models trained with dot-product loss, where outputs are not normalized and magnitude matters. | multi-qa-mpnet-base-dot-v1 |
| Distance.Euclid | When absolute distance matters; rare for text, but common for structured or tabular embeddings. | Custom models |
Check the model’s documentation: if it says “cosine similarity”, use Distance.Cosine. If it says “dot product”, use Distance.Dot. When in doubt, use Distance.Cosine.
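To see why the metric matters, the sketch below compares cosine similarity and dot product on two toy document vectors that point in the same direction but differ in magnitude. The vectors are made up for illustration and the functions are plain Python, not part of the VectorAI SDK.

```python
import math


def dot(a, b):
    """Dot product: sensitive to both direction and magnitude."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: the dot product of the length-normalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


query = [1.0, 0.0]
short_doc = [0.9, 0.1]
long_doc = [9.0, 1.0]  # same direction as short_doc, ten times the magnitude

# Cosine scores both documents (almost exactly) the same.
print("cosine:", cosine(query, short_doc), cosine(query, long_doc))
# Dot product ranks the longer vector far higher.
print("dot:   ", dot(query, short_doc), dot(query, long_doc))
```

If a model was trained so that magnitude carries meaning, cosine discards that signal. If the model's outputs are normalized, the two metrics produce the same ranking.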
Step 6: Create collections for different models
The code below creates one collection for each of the three models, each configured with the matching dimension count, distance metric, and HNSW settings. Running it prints a confirmation line per collection.
Memory footprint at scale
The table below shows how much memory the raw float32 vectors of each model require at one million documents. Use it to plan your deployment, and allow extra room for the HNSW index and payloads.
| Model | Dims | Bytes/vector (float32) | Memory at 1M docs |
|---|---|---|---|
| MiniLM | 384 | 1,536 | ~1.5 GB |
| MPNet | 768 | 3,072 | ~3.0 GB |
| E5-large | 1,024 | 4,096 | ~4.0 GB |
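The figures in the table follow from simple arithmetic: each float32 value takes 4 bytes, so a vector takes dims × 4 bytes. The sketch below reproduces the calculation; `vector_memory_bytes` is a hypothetical helper name, and the result covers raw vectors only, so it is a lower bound on actual memory use.

```python
def vector_memory_bytes(dims: int, num_vectors: int, bytes_per_value: int = 4) -> int:
    """Raw storage for `num_vectors` vectors of `dims` float32 values."""
    return dims * bytes_per_value * num_vectors


for name, dims in [("MiniLM", 384), ("MPNet", 768), ("E5-large", 1024)]:
    per_vector = vector_memory_bytes(dims, 1)
    total = vector_memory_bytes(dims, 1_000_000)
    # Decimal gigabytes (1 GB = 1e9 bytes); the table rounds these values.
    print(f"{name}: {per_vector:,} bytes/vector, {total / 1e9:.2f} GB at 1M docs")
```

Changing `num_vectors` to your expected corpus size gives a quick first estimate before provisioning.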
Expected output
Pattern: Delete any stale collection first, then call get_or_create, then call get_info as a synchronisation barrier before closing the connection. This confirms the collection is fully committed on the server. Upsert into the collection only after this sequence completes, in a separate connection if needed.
Step 7: Prepare a shared dataset
The code below defines a list of 20 short passages with topic and difficulty metadata. All three models embed this same dataset so that retrieval quality can be compared directly.
Step 8: Embed and ingest with each model
The code below embeds all 20 documents with each model and upserts the resulting vectors into the corresponding collection. Running it prints one line per model showing embedding time, ingestion time, and the total number of points confirmed in the collection after flushing.
Expected output
Timing values vary by hardware and network conditions.
Step 9: Compare search quality across models
The code below runs three test queries against all three collections and prints the top-scoring documents from each. Running it lets you see whether different models surface different documents for the same query, and how confidently each model scores its top result.
What to look for
Use the same query across collections to see how each model ranks passages.
- Check whether scores are spread between relevant and irrelevant results. Higher-quality models tend to produce wider separation, making it easier to set a threshold.
- Check whether the top result is the most relevant document. When models disagree on rank 1, the larger model is often a better reference, though results vary by query and domain.
- Check whether cosine scores fall above 0.7, which as a rough guide indicates strong relevance for most sentence-transformer models. Scores below 0.4 are usually noise.
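The checks above can be turned into a small summary per result list. The sketch below is illustrative only: `summarize_scores` is a hypothetical helper, the example scores are made up, and the 0.7 and 0.4 cut-offs are the rough guides from the list rather than fixed rules.

```python
def summarize_scores(scores, strong=0.7, noise=0.4):
    """Summarize one result list: top score, separation, and rough relevance bands."""
    ranked = sorted(scores, reverse=True)
    return {
        "top": ranked[0],
        # Gap between rank 1 and rank 2: how decisively the model picks a winner.
        "gap_to_second": ranked[0] - ranked[1] if len(ranked) > 1 else None,
        # Spread between best and worst result: wider makes thresholds easier.
        "spread": ranked[0] - ranked[-1],
        "strong": [s for s in ranked if s >= strong],
        "noise": [s for s in ranked if s < noise],
    }


# Made-up cosine scores for one query against one collection.
example = summarize_scores([0.82, 0.55, 0.31])
print(example)
```

Running the same summary on each collection's results for one query puts the models side by side on separation as well as rank order.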
Step 10: Tune search accuracy with SearchParams
SearchParams controls how the HNSW index is traversed at query time. The code below runs the same query two ways — approximate with a specific hnsw_ef value, and exact brute-force — so you can compare accuracy and latency directly.
| Setting | Speed | Accuracy | When to use |
|---|---|---|---|
| hnsw_ef=64 | Fastest | Good | High-throughput, latency-sensitive |
| hnsw_ef=256 | Fast | High | Default recommendation for most use cases |
| exact=True | Slowest | Perfect | Benchmarking and establishing ground truth |
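One common way to quantify the accuracy difference between an approximate and an exact search is recall@k: the fraction of the exact top-k results that the approximate search also returned. The sketch below assumes you have already collected the result IDs from both searches; `recall_at_k` is a hypothetical helper and the IDs are made up.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k IDs that also appear in the approximate top-k."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    return len(truth & set(approx_ids[:k])) / len(truth)


# Made-up result IDs: exact search is the ground truth.
exact_ids = [7, 3, 12, 5, 9]
approx_ids = [7, 12, 3, 9, 1]

print(recall_at_k(approx_ids, exact_ids, k=5))  # 4 of the 5 true neighbours found
```

If recall stays close to 1.0 at a lower hnsw_ef value, the cheaper setting is likely sufficient for that workload.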
Step 11: Named vectors — multiple models in one collection
Rather than creating separate collections, you can store embeddings from multiple models in a single collection using named vectors. The code below creates a collection with two named vector spaces, embeds the shared dataset with both models, and uploads each document with both vectors attached.
Expected output
This block creates a single collection with two named vector spaces, "minilm" (384-dim, Cosine) and "mpnet" (768-dim, Cosine), then encodes all 20 shared documents with both models. Each document is stored as a single PointStruct carrying both embedding vectors alongside its text, topic, and difficulty metadata.
vde.get_vector_count() returns the total across all named spaces — 20 documents × 2 spaces = 40 indexed vectors.
Step 12: Search individual models and fuse results
Each named space was produced by a different encoder, so you embed the query once per model and pass the vector that matches the using parameter. The code below runs two single-space searches, then one fused query that merges candidate lists with reciprocal rank fusion (RRF).
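To make the fusion step concrete, here is a minimal client-side sketch of reciprocal rank fusion. It is not the server's implementation: `rrf_fuse` is a hypothetical helper, the document IDs are made up, and k=60 is the constant commonly used with RRF, which may differ from what Fusion.RRF uses internally.

```python
def rrf_fuse(ranked_lists, k=60):
    """Merge ranked ID lists: each appearance adds 1 / (k + rank) to a document's score."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up rankings from two named vector spaces.
minilm_ranking = ["d3", "d1", "d7"]
mpnet_ranking = ["d1", "d4", "d3"]

fused = rrf_fuse([minilm_ranking, mpnet_ranking])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Documents that appear in both lists (d1 and d3) accumulate two contributions and rise above documents seen by only one model. RRF uses ranks rather than raw scores, so the two models' score scales do not need to be comparable.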
Why multimodel fusion improves results
Different models capture different aspects of meaning. MiniLM is strong at lexical similarity, where “search” closely matches “search.” MPNet better captures paraphrases, where “ANN search” matches “approximate nearest-neighbour.” Documents that rank highly in both models are almost certainly relevant, and RRF fusion naturally promotes those consensus results.
Step 13: Re-embed specific vectors when switching models
To upgrade from one model to another, re-embed the text payloads and re-upsert all affected points. The code below simulates upgrading the MiniLM space from L6 to L12 by fetching all existing points, re-encoding each text with the upgraded model, and upserting the new vectors back.
Why this pattern matters
Re-upserting lets you upgrade one model’s vectors without losing other data. This matters in the following situations.
- You want to upgrade one model without re-processing all data from scratch.
- Different teams own different embedding spaces.
- You need to keep payload metadata intact during a model upgrade.
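The core of the pattern, replacing one named vector while leaving the other vectors and the payload untouched, can be sketched with plain dictionaries. The point layout, `reembed_space`, and `new_encoder` below are hypothetical stand-ins for illustration, not the SDK's PointStruct or a real model.

```python
def reembed_space(points, space, encode):
    """Return copies of `points` with only the `space` vector re-encoded from the text payload."""
    updated = []
    for point in points:
        vectors = dict(point["vectors"])  # copy, so the other spaces are preserved
        vectors[space] = encode(point["payload"]["text"])
        updated.append({"id": point["id"], "vectors": vectors, "payload": point["payload"]})
    return updated


def new_encoder(text):
    # Hypothetical stand-in for the upgraded model's encode call.
    return [float(len(text)), 1.0]


points = [
    {
        "id": 1,
        "vectors": {"minilm": [0.1, 0.2], "mpnet": [0.3, 0.4, 0.5]},
        "payload": {"text": "hello", "topic": "demo"},
    },
]

upgraded = reembed_space(points, "minilm", new_encoder)
print(upgraded[0]["vectors"])
```

Because the function builds new point dictionaries instead of mutating the originals, the old vectors remain available for comparison until the upgraded points have been written back.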
Step 14: Run multiple searches efficiently
When you need to run multiple queries against the same collection, run them sequentially within a single client connection to minimise connection overhead.
The async with block reuses the same gRPC channel, eliminating repeated connection setup for each query. Because the connection is opened once and shared across all queries in the block, the round trips spent on connection setup are paid only once.
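The connection-reuse idea can be shown with a stand-in client. `FakeClient` below is hypothetical and only counts how many times a connection is opened; it does not reflect the real SDK's class names or method signatures.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for a database client; counts connection openings."""

    opened = 0

    async def __aenter__(self):
        FakeClient.opened += 1  # one "connection setup" per async with block
        return self

    async def __aexit__(self, exc_type, exc, tb):
        return False  # do not suppress exceptions

    async def search(self, query):
        await asyncio.sleep(0)  # placeholder for the network call
        return f"results for {query!r}"


async def run_queries(queries):
    # Open the connection once and run every query inside the same block.
    async with FakeClient() as client:
        return [await client.search(q) for q in queries]


results = asyncio.run(run_queries(["hnsw tuning", "named vectors", "rrf fusion"]))
print(len(results), "queries,", FakeClient.opened, "connection opened")
```

Opening a new async with block per query would increment the counter three times instead of once, which is the overhead this step avoids.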
Step 15: Build a model selection helper
The code below defines a recommend_model function that takes corpus size, latency budget, and quality priority as inputs and returns a recommended model with configuration.
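As a rough sketch of what such a helper could look like: the function below uses the three inputs named above, but the parameter names, the accepted priority values, and the thresholds (50 ms and 5 million documents) are illustrative assumptions rather than recommendations from this tutorial. The returned configurations match the quick reference table.

```python
def recommend_model(corpus_size, latency_budget_ms, quality_priority):
    """Pick a model configuration. `quality_priority` is 'speed', 'balanced', or 'quality'."""
    if quality_priority not in {"speed", "balanced", "quality"}:
        raise ValueError(f"unknown quality_priority: {quality_priority!r}")

    small = {"model": "sentence-transformers/all-MiniLM-L6-v2",
             "dims": 384, "distance": "Cosine", "hnsw_m": 16, "ef_construct": 128}
    medium = {"model": "sentence-transformers/all-mpnet-base-v2",
              "dims": 768, "distance": "Cosine", "hnsw_m": 16, "ef_construct": 128}
    large = {"model": "intfloat/e5-large-v2",
             "dims": 1024, "distance": "Cosine", "hnsw_m": 32, "ef_construct": 256}

    # Assumed threshold: a tight latency budget forces the smallest model.
    if quality_priority == "speed" or latency_budget_ms < 50:
        return small
    # Assumed threshold: reserve the largest model for corpora where memory stays manageable.
    if quality_priority == "quality" and corpus_size <= 5_000_000:
        return large
    return medium


print(recommend_model(100_000, 200, "quality")["model"])
print(recommend_model(100_000, 200, "balanced")["model"])
print(recommend_model(10_000, 20, "quality")["model"])
```

Treat the thresholds as placeholders and replace them with numbers from your own throughput and search-quality measurements.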
Step 16: Report and clean up tutorial collections
The code below lists all tutorial collections with their document counts, flushes each one, then deletes them all.
Quick reference: model → VectorAI configuration
| Model | Dims | Distance | HNSW m | HNSW ef_construct |
|---|---|---|---|---|
| sentence-transformers/all-MiniLM-L6-v2 | 384 | Cosine | 16 | 100–128 |
| sentence-transformers/all-MiniLM-L12-v2 | 384 | Cosine | 16 | 128 |
| sentence-transformers/all-mpnet-base-v2 | 768 | Cosine | 16 | 128 |
| sentence-transformers/multi-qa-mpnet-base-dot-v1 | 768 | Dot | 16 | 128 |
| sentence-transformers/all-distilroberta-v1 | 768 | Cosine | 16 | 128 |
| intfloat/e5-large-v2 | 1024 | Cosine | 32 | 256 |
| BAAI/bge-large-en-v1.5 | 1024 | Cosine | 32 | 256 |
| sentence-transformers/clip-ViT-B-32 | 512 | Cosine | 16 | 128 |
Actian VectorAI features used
| Feature | API | Purpose |
|---|---|---|
| Collection creation | collections.get_or_create(vectors_config=VectorParams(...)) | Configure dimensions and distance metric |
| Distance metrics | Distance.Cosine, Distance.Dot, Distance.Euclid | Match the metric to the model’s training objective |
| Exact search | SearchParams(exact=True) | Brute-force ground truth for benchmarking |
| hnsw_ef tuning | SearchParams(hnsw_ef=256) | Control accuracy vs speed at query time |
| Batch ingestion | points.upsert(collection, points=[...]) | Insert or update vectors and payloads |
| Named vectors | vectors_config={"minilm": VectorParams(...), "mpnet": VectorParams(...)} | Multiple models stored in one collection |
| Named vector search | points.search(..., using="mpnet") | Search a specific embedding space by name |
| Multi-model fusion | PrefetchQuery(using="minilm") + Fusion.RRF | Combine results from different models |
| Selective payload | WithPayloadSelector(include=["text"]) | Return only needed payload fields |
| Vector count | vde.get_vector_count() | Verify ingestion completed successfully |
| Flush | vde.flush() | Persist pending writes to durable storage |
Next steps
- Building multimodal systems — Add image embeddings with CLIP alongside text models
- Optimizing retrieval quality — Tune HNSW parameters and search settings
- Reranking search results — Improve result relevance with cross-encoders and fusion
- Similarity search fundamentals — Master the core search and query workflow