How to choose a Sentence Transformers model for semantic search

A semantic search model decides how text becomes vectors before the search index ever sees a query. In Sentence Transformers, that choice affects language coverage, embedding dimension, ranking behavior, inference speed, and whether a query-to-passage workload starts from a retrieval-trained model.

Use the model card and benchmark family to narrow candidates before writing index code. General all-* models are reasonable baselines for short English text, while multi-qa-* and msmarco-* models are built for query-to-passage retrieval. Multilingual models belong in mixed-language corpora, and domain models need a quick task-specific check before they replace a general model.

The first candidate does not need to be final. A small smoke test should load the model, print the embedding dimension that downstream indexes must store, and confirm that a representative query retrieves the intended document above distractors.
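As a rough illustration of why the printed dimension matters to the index, the sketch below estimates raw vector storage for two candidates. It assumes uncompressed float32 vectors with no index overhead, and the one-million-document corpus size is a made-up figure. raw_vector_bytes is a hypothetical helper, not part of Sentence Transformers.

```python
def raw_vector_bytes(num_vectors: int, dimension: int, bytes_per_value: int = 4) -> int:
    """Return the bytes needed to store the vectors uncompressed (float32 by default)."""
    return num_vectors * dimension * bytes_per_value


corpus_size = 1_000_000  # assumed corpus size, for illustration only

for model_id, dimension in [
    ("sentence-transformers/all-MiniLM-L6-v2", 384),
    ("sentence-transformers/all-mpnet-base-v2", 768),
]:
    size_gib = raw_vector_bytes(corpus_size, dimension) / 2**30
    print(f"{model_id}: {dimension} dims -> {size_gib:.2f} GiB")
```

Under these assumptions the 768-dimension model needs twice the raw storage of the 384-dimension model, which is worth knowing before the index schema is fixed.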

Steps to choose a Sentence Transformers model for semantic search:

Define the search workload before selecting a model ID.

Record the query language, corpus language, average document length, latency target, hardware target, and whether the query is a natural question, a keyword phrase, or text similar to the documents.

Shortlist a candidate from the workload shape.

Use sentence-transformers/all-MiniLM-L6-v2 for a small English baseline, sentence-transformers/all-mpnet-base-v2 when general quality matters more than speed, retrieval-trained multi-qa-* or msmarco-* models for question-to-passage search, and multilingual or domain-specific model cards when the corpus is not general English.
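The workload record and the shortlist rule above could be sketched as follows. SearchWorkload and shortlist are hypothetical names, the 50 ms latency cutoff is an arbitrary illustration, and the specific multi-qa, msmarco, and multilingual model IDs are example picks from those families rather than recommendations from this guide. Check each model card before relying on them. Domain-specific corpora are not handled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchWorkload:
    query_language: str        # e.g. "en"
    corpus_language: str       # e.g. "en", or "mixed" for several languages
    query_style: str           # "question", "keyword", or "similar_text"
    avg_document_tokens: int
    latency_target_ms: int
    hardware: str              # "cpu" or "gpu"


def shortlist(workload: SearchWorkload) -> list[str]:
    """Return candidate model IDs to smoke-test, most likely fit first."""
    english = workload.query_language == "en" and workload.corpus_language == "en"
    if not english:
        # Example multilingual candidate (assumption, not from the guide).
        return ["sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"]
    if workload.query_style == "question":
        # Example retrieval-trained candidates (assumption, not from the guide).
        return [
            "sentence-transformers/multi-qa-MiniLM-L6-cos-v1",
            "sentence-transformers/msmarco-distilbert-base-v4",
        ]
    # Arbitrary illustrative rule: favor the small model on CPU or tight latency.
    if workload.hardware == "cpu" or workload.latency_target_ms < 50:
        return ["sentence-transformers/all-MiniLM-L6-v2"]
    return [
        "sentence-transformers/all-mpnet-base-v2",
        "sentence-transformers/all-MiniLM-L6-v2",
    ]


workload = SearchWorkload("en", "en", "similar_text", 120, 30, "cpu")
print(shortlist(workload))
```

The function only narrows the list. Every returned ID still has to pass the smoke test before it is kept.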

Create a reusable smoke-test script for the selected candidate.

model_choose_smoke.py

import os

from sentence_transformers import SentenceTransformer

# Read the candidate from the environment so one script can test every shortlisted model.
model_id = os.environ.get("MODEL_ID", "sentence-transformers/all-MiniLM-L6-v2")
model = SentenceTransformer(model_id)

# The first document is the intended match; the other two are distractors.
query = "How do I index embeddings for fast semantic search?"
documents = [
    "Use FAISS to build a vector index for dense embeddings.",
    "Use a cross-encoder to rerank a short list of retrieved passages.",
    "Fine-tune an embedding model with hard negatives after collecting labeled pairs.",
]

# Normalized embeddings make cosine and dot-product scores agree.
query_embedding = model.encode_query(
    query,
    normalize_embeddings=True,
    convert_to_tensor=True,
    show_progress_bar=False,
)
document_embeddings = model.encode_document(
    documents,
    normalize_embeddings=True,
    convert_to_tensor=True,
    show_progress_bar=False,
)

# similarity() returns one row of scores per query; keep the only row.
scores = model.similarity(query_embedding, document_embeddings)[0]
best_index = int(scores.argmax())

# Report the values that the index schema and chunking strategy depend on.
print(f"model_id={model_id}")
print(f"embedding_dimension={model.get_embedding_dimension()}")
print(f"max_sequence_length={model.max_seq_length}")
print(f"query={query}")
print(f"best_match={documents[best_index]}")
print(f"score={float(scores[best_index]):.4f}")

# Exit with an error when the intended document (index 0) is not ranked first.
if best_index != 0:
    raise SystemExit(f"unexpected best match index: {best_index}")

encode_query() and encode_document() keep the smoke test compatible with models that define separate prompts for queries and documents.
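The script only checks which document ranks first. A possible extension is to look at how far the intended document sits above the best distractor, since a narrow gap can flip on slightly different queries. The sketch below works on plain floats, so the tensor from the script would first be converted with scores.tolist(). top_margin is a hypothetical helper, and both the example scores and the 0.05 threshold are made-up values for illustration.

```python
def top_margin(scores: list[float], expected_index: int) -> float:
    """Return the expected document's score minus the best distractor's score."""
    distractors = [s for i, s in enumerate(scores) if i != expected_index]
    if not distractors:
        raise ValueError("need at least one distractor to compute a margin")
    return scores[expected_index] - max(distractors)


example_scores = [0.60, 0.41, 0.35]  # made-up scores; index 0 is the intended match
margin = top_margin(example_scores, expected_index=0)
print(f"margin={margin:.2f}")

MIN_MARGIN = 0.05  # arbitrary illustrative threshold
if margin < MIN_MARGIN:
    print("intended document wins only narrowly; test more queries")
```

A negative margin means a distractor outranked the intended document, which is the same failure the script already reports.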

Run the smoke test with the candidate model ID.

$ MODEL_ID=sentence-transformers/all-MiniLM-L6-v2 python model_choose_smoke.py
model_id=sentence-transformers/all-MiniLM-L6-v2
embedding_dimension=384
max_sequence_length=256
query=How do I index embeddings for fast semantic search?
best_match=Use FAISS to build a vector index for dense embeddings.
score=0.6025

Replace MODEL_ID with each shortlist candidate. Keep the model only when the top match is the intended document and the printed dimension fits the vector store or index schema.
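One way to repeat that check across a shortlist is to capture the script's key=value output and compare the dimension with the index schema. The sketch below is a minimal version under stated assumptions: run_smoke_test, parse_smoke_output, and fits_index are hypothetical helpers, INDEX_DIMENSION stands in for whatever the vector store actually expects, and the sample text is the output shown above.

```python
import os
import subprocess
import sys


def run_smoke_test(model_id: str, script: str = "model_choose_smoke.py") -> str:
    """Run the smoke-test script for one model and return its stdout."""
    completed = subprocess.run(
        [sys.executable, script],
        env={**os.environ, "MODEL_ID": model_id},
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError when the script exits non-zero
    )
    return completed.stdout


def parse_smoke_output(text: str) -> dict[str, str]:
    """Turn the script's key=value lines into a dictionary."""
    result = {}
    for line in text.splitlines():
        key, separator, value = line.partition("=")
        if separator:
            result[key.strip()] = value.strip()
    return result


def fits_index(result: dict[str, str], index_dimension: int) -> bool:
    """Check that the model's embedding dimension matches the index schema."""
    return int(result["embedding_dimension"]) == index_dimension


INDEX_DIMENSION = 384  # assumed dimension of the existing index schema

sample_output = """model_id=sentence-transformers/all-MiniLM-L6-v2
embedding_dimension=384
max_sequence_length=256
query=How do I index embeddings for fast semantic search?
best_match=Use FAISS to build a vector index for dense embeddings.
score=0.6025
"""

result = parse_smoke_output(sample_output)
print(result["model_id"], "fits index:", fits_index(result, INDEX_DIMENSION))
```

In practice the sample text would be replaced by run_smoke_test(model_id) for each candidate. Because the script exits non-zero on a wrong top match, check=True turns that failure into an exception that drops the candidate.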