How to choose a Sentence Transformers model for semantic search

A semantic search model decides how text becomes vectors before the search index ever sees a query. In Sentence Transformers, that choice affects language coverage, embedding dimension, ranking behavior, and inference speed. It also determines whether a query-to-passage workload starts from a model that was trained for retrieval.

Use the model card and benchmark family to narrow candidates before writing index code. General all-* models are reasonable baselines for short English text. The multi-qa-* and msmarco-* models are trained for query-to-passage retrieval. Multilingual models suit mixed-language corpora. Domain-specific models need a quick task-specific check before they replace a general model.
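
As a rough illustration, the narrowing rules above can be written as a small lookup. This is only a sketch: the Workload fields and the shortlist() function are hypothetical names introduced here, not part of Sentence Transformers, and the returned strings are the model families named in this guide rather than a complete catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a search workload (not a library type)."""

    english_only: bool         # False for mixed-language corpora
    question_to_passage: bool  # True when queries are questions over passages
    domain_specific: bool      # True for specialized vocabularies


def shortlist(workload: Workload) -> list[str]:
    """Return the model families worth testing first for this workload."""
    if not workload.english_only:
        candidates = ["multilingual models"]
    elif workload.question_to_passage:
        candidates = ["multi-qa-*", "msmarco-*"]
    else:
        candidates = ["all-*"]
    if workload.domain_specific:
        # Earlier entries stay on the list so the task-specific check
        # has a general baseline to compare the domain model against.
        candidates.append("domain models (check against the baseline)")
    return candidates


if __name__ == "__main__":
    faq_search = Workload(
        english_only=True, question_to_passage=True, domain_specific=False
    )
    print(shortlist(faq_search))
```

The lookup only orders the first candidates to try. The model card and a smoke test on representative data still decide which model is kept.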

The first candidate does not need to be final. A small smoke test should load the model, print the embedding dimension that downstream indexes must store, and confirm that a representative query retrieves the intended document above distractors.

  1. Define the search workload before selecting a model ID.

    Record the query language, corpus language, average document length, latency target, hardware target, and whether the query is a natural question, a keyword phrase, or text similar to the documents.

  2. Shortlist a candidate from the workload shape.

    Use sentence-transformers/all-MiniLM-L6-v2 for a small English baseline, sentence-transformers/all-mpnet-base-v2 when general quality matters more than speed, retrieval-trained multi-qa-* or msmarco-* models for question-to-passage search, and multilingual or domain-specific model cards when the corpus is not general English.

  3. Create a reusable smoke-test script for the selected candidate.
    Save it as model_choose_smoke.py:
    import os

    from sentence_transformers import SentenceTransformer

    # Read the candidate from the environment so the same script tests every model.
    model_id = os.environ.get("MODEL_ID", "sentence-transformers/all-MiniLM-L6-v2")
    model = SentenceTransformer(model_id)

    # One representative query; documents[0] is the intended match,
    # and the other two documents act as distractors.
    query = "How do I index embeddings for fast semantic search?"
    documents = [
        "Use FAISS to build a vector index for dense embeddings.",
        "Use a cross-encoder to rerank a short list of retrieved passages.",
        "Fine-tune an embedding model with hard negatives after collecting labeled pairs.",
    ]

    # Encode the query and the documents through their separate entry points.
    query_embedding = model.encode_query(
        query,
        normalize_embeddings=True,
        convert_to_tensor=True,
        show_progress_bar=False,
    )
    document_embeddings = model.encode_document(
        documents,
        normalize_embeddings=True,
        convert_to_tensor=True,
        show_progress_bar=False,
    )

    # similarity() returns a 1 x len(documents) matrix; take the row for the query.
    scores = model.similarity(query_embedding, document_embeddings)[0]
    best_index = int(scores.argmax())

    print(f"model_id={model_id}")
    print(f"embedding_dimension={model.get_embedding_dimension()}")
    print(f"max_sequence_length={model.max_seq_length}")
    print(f"query={query}")
    print(f"best_match={documents[best_index]}")
    print(f"score={float(scores[best_index]):.4f}")

    # Exit with an error when the intended document is not ranked first.
    if best_index != 0:
        raise SystemExit(f"unexpected best match index: {best_index}")

    Calling encode_query() and encode_document() rather than encode() keeps the smoke test compatible with models that define separate prompts for queries and documents.
    Related: How to install Sentence Transformers with pip

  4. Run the smoke test with the candidate model ID.
    $ MODEL_ID=sentence-transformers/all-MiniLM-L6-v2 python model_choose_smoke.py
    model_id=sentence-transformers/all-MiniLM-L6-v2
    embedding_dimension=384
    max_sequence_length=256
    query=How do I index embeddings for fast semantic search?
    best_match=Use FAISS to build a vector index for dense embeddings.
    score=0.6025

    Set MODEL_ID to each shortlisted candidate in turn. Keep a model only when its top match is the intended document and the printed dimension fits the vector store or index schema.
    Related: How to check embedding dimensions with Sentence Transformers
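
The acceptance rule in step 4 can also be checked mechanically. The sketch below assumes the smoke test prints the key=value lines shown above. The helper names parse_smoke_output() and keep_candidate() are hypothetical and introduced only for illustration, as is the assumption that the index schema expects 384-dimensional vectors.

```python
# Sample output copied from the run in step 4.
SAMPLE_OUTPUT = """\
model_id=sentence-transformers/all-MiniLM-L6-v2
embedding_dimension=384
max_sequence_length=256
query=How do I index embeddings for fast semantic search?
best_match=Use FAISS to build a vector index for dense embeddings.
score=0.6025
"""


def parse_smoke_output(text: str) -> dict[str, str]:
    """Parse the key=value lines printed by the smoke test into a dict."""
    report = {}
    for line in text.splitlines():
        # Split on the first '=' only, so values may contain '=' themselves.
        key, separator, value = line.strip().partition("=")
        if separator:  # skip lines that are not key=value pairs
            report[key] = value
    return report


def keep_candidate(
    report: dict[str, str], index_dimension: int, expected_match: str
) -> bool:
    """Accept a model only when the top match and the dimension both fit."""
    return (
        report.get("best_match") == expected_match
        and int(report.get("embedding_dimension", "-1")) == index_dimension
    )


if __name__ == "__main__":
    report = parse_smoke_output(SAMPLE_OUTPUT)
    accepted = keep_candidate(
        report,
        index_dimension=384,  # assumed schema dimension for this example
        expected_match="Use FAISS to build a vector index for dense embeddings.",
    )
    print(f"{report['model_id']}: keep={accepted}")
```

To compare several candidates, the script's standard output could be captured for each MODEL_ID (for example with subprocess.run(..., capture_output=True, text=True)) and passed through the same two helpers.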