A semantic search model decides how text becomes vectors before the search index ever sees a query. In Sentence Transformers, that choice affects language coverage, embedding dimension, ranking behavior, inference speed, and whether a query-to-passage workload starts from a retrieval-trained model.
Use the model card and benchmark family to narrow candidates before writing index code. General all-* models are reasonable baselines for short English text. The multi-qa-* and msmarco-* models are built around query and passage retrieval. Multilingual models belong in mixed-language corpora, and domain models need a quick task-specific check before they replace a general model.
The first candidate does not need to be final. A small smoke test should load the model, print the embedding dimension that downstream indexes must store, and confirm that a representative query retrieves the intended document above distractors.
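The ranking check described above comes down to comparing cosine similarities. The following is a minimal sketch in plain Python, not the library's implementation: the three-dimensional vectors are made up for illustration, and the helper names `normalize` and `rank_documents` are hypothetical. It shows that once vectors are scaled to unit length, a dot product gives the cosine similarity, and the test passes when the intended document ranks first.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def rank_documents(query_vector, document_vectors):
    """Return document indexes ordered from highest to lowest similarity, plus the scores."""
    q = normalize(query_vector)
    scores = [
        sum(a * b for a, b in zip(q, normalize(d)))
        for d in document_vectors
    ]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Made-up toy vectors, for illustration only (real embeddings have hundreds of dimensions).
query_vector = [0.9, 0.1, 0.0]
document_vectors = [
    [0.8, 0.2, 0.1],  # intended match
    [0.1, 0.9, 0.2],  # distractor
    [0.0, 0.2, 0.9],  # distractor
]

order, scores = rank_documents(query_vector, document_vectors)
passed = order[0] == 0  # the intended document must rank above the distractors
print(f"ranking={order} passed={passed}")
```

A real smoke test replaces the toy vectors with model output, but the pass condition stays the same: the intended document sits at the top of the ranking.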
Steps to choose a Sentence Transformers model for semantic search:
- Define the search workload before selecting a model ID.
Record the query language, corpus language, average document length, latency target, hardware target, and whether the query is a natural question, a keyword phrase, or text similar to the documents.
- Shortlist a candidate from the workload shape.
Use sentence-transformers/all-MiniLM-L6-v2 for a small English baseline, sentence-transformers/all-mpnet-base-v2 when general quality matters more than speed, retrieval-trained multi-qa-* or msmarco-* models for question-to-passage search, and multilingual or domain-specific model cards when the corpus is not general English.
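The workload notes and the shortlist rules above can be written down as a small lookup. This is only a sketch under stated assumptions: the `SearchWorkload` fields, the `shortlist_model` function, the `"en"` language code, and the order of the checks are all hypothetical choices for illustration, and the function returns family hints rather than specific model IDs wherever the text names only a family.

```python
from dataclasses import dataclass


@dataclass
class SearchWorkload:
    """Hypothetical record of the workload facts gathered in the first step."""
    query_language: str
    corpus_language: str
    query_style: str          # "question", "keyword", or "document-like"
    latency_sensitive: bool
    domain_specific: bool = False


def shortlist_model(workload):
    """Return a starting model ID or family hint; a suggestion, not a final choice."""
    languages = {workload.query_language, workload.corpus_language}
    if languages != {"en"}:
        return "multilingual model (check model cards)"
    if workload.domain_specific:
        return "domain-specific model (check model cards)"
    if workload.query_style == "question":
        return "multi-qa-* or msmarco-* (retrieval-trained)"
    if workload.latency_sensitive:
        return "sentence-transformers/all-MiniLM-L6-v2"
    return "sentence-transformers/all-mpnet-base-v2"


workload = SearchWorkload(
    query_language="en",
    corpus_language="en",
    query_style="keyword",
    latency_sensitive=True,
)
print(shortlist_model(workload))
```

Writing the rules down this way mainly forces the workload facts to be recorded before a model ID is picked; the returned candidate still has to pass the smoke test.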
- Create a reusable smoke-test script for the selected candidate.
- model_choose_smoke.py
import os

from sentence_transformers import SentenceTransformer

model_id = os.environ.get("MODEL_ID", "sentence-transformers/all-MiniLM-L6-v2")
model = SentenceTransformer(model_id)

query = "How do I index embeddings for fast semantic search?"
documents = [
    "Use FAISS to build a vector index for dense embeddings.",
    "Use a cross-encoder to rerank a short list of retrieved passages.",
    "Fine-tune an embedding model with hard negatives after collecting labeled pairs.",
]

query_embedding = model.encode_query(
    query,
    normalize_embeddings=True,
    convert_to_tensor=True,
    show_progress_bar=False,
)
document_embeddings = model.encode_document(
    documents,
    normalize_embeddings=True,
    convert_to_tensor=True,
    show_progress_bar=False,
)

scores = model.similarity(query_embedding, document_embeddings)[0]
best_index = int(scores.argmax())

print(f"model_id={model_id}")
print(f"embedding_dimension={model.get_embedding_dimension()}")
print(f"max_sequence_length={model.max_seq_length}")
print(f"query={query}")
print(f"best_match={documents[best_index]}")
print(f"score={float(scores[best_index]):.4f}")

if best_index != 0:
    raise SystemExit(f"unexpected best match index: {best_index}")
encode_query() and encode_document() keep the smoke test compatible with models that define separate prompts for queries and documents.
Related: How to install Sentence Transformers with pip
- Run the smoke test with the candidate model ID.
$ MODEL_ID=sentence-transformers/all-MiniLM-L6-v2 python model_choose_smoke.py
model_id=sentence-transformers/all-MiniLM-L6-v2
embedding_dimension=384
max_sequence_length=256
query=How do I index embeddings for fast semantic search?
best_match=Use FAISS to build a vector index for dense embeddings.
score=0.6025
Replace MODEL_ID with each shortlist candidate. Keep the model only when the top match is the intended document and the printed dimension fits the vector store or index schema.
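When several candidates are tested, the key=value lines printed by the smoke test can be checked against the index schema automatically. The sketch below is one possible way to do that, not part of the original script: the helper names `parse_smoke_output` and `fits_index` are hypothetical, and the index dimension of 384 is an assumed schema value chosen to match the sample output shown above.

```python
def parse_smoke_output(text):
    """Turn the key=value lines printed by the smoke test into a dictionary."""
    result = {}
    for line in text.splitlines():
        # Split on the first "=" only, so values may contain "=" themselves.
        key, sep, value = line.partition("=")
        if sep:
            result[key.strip()] = value.strip()
    return result


def fits_index(result, index_dimension):
    """Check that the reported embedding dimension matches the index schema."""
    return int(result["embedding_dimension"]) == index_dimension


# Output copied from the sample run of model_choose_smoke.py.
sample_output = """model_id=sentence-transformers/all-MiniLM-L6-v2
embedding_dimension=384
max_sequence_length=256
query=How do I index embeddings for fast semantic search?
best_match=Use FAISS to build a vector index for dense embeddings.
score=0.6025"""

result = parse_smoke_output(sample_output)
# index_dimension=384 is an assumed schema value for this illustration.
print(result["model_id"], fits_index(result, index_dimension=384))
```

The script already exits with an error when the top match is wrong, so a successful run plus a passing dimension check covers both conditions for keeping a candidate.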
Related: How to check embedding dimensions with Sentence Transformers
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.