How to build semantic search with Sentence Transformers

Semantic search compares the embedding of a user query with document embeddings, so a query can match a support note even when the two share few exact words. Sentence Transformers provides bi-encoder models that encode queries and documents into vectors separately; the nearest documents are then ranked by similarity.
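
To make the ranking step concrete, here is a minimal pure-Python sketch of cosine-similarity ranking. The three-dimensional vectors and the function names cosine and rank are made up for illustration; real model embeddings have hundreds of dimensions and would normally be compared with the library's own utilities.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs, top_k=3):
    """Return (doc_index, score) pairs, best match first."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for model output (illustrative values only).
toy_docs = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.2],
    [0.0, 0.2, 0.9],
]
toy_query = [0.2, 0.8, 0.1]

for index, score in rank(toy_query, toy_docs, top_k=2):
    print(index, round(score, 3))
```

The query vector points mostly along the second axis, so the second toy document ranks first even though none of the vectors are identical.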

The small in-memory pattern fits product FAQs, notes, tickets, and other corpora that can be embedded during startup or a batch refresh. Each text keeps an application ID beside it, the model encodes the document texts, and each hit's corpus index is mapped back to the original ID and returned to the application together with a similarity score.

Use encode_document() for corpus text and encode_query() for user queries. Both methods were added in Sentence Transformers v5.0; they apply the model's stored query or document prompt when one is defined and otherwise behave like encode(). For corpora that outgrow exact in-memory search, keep the same embedding boundary and move the stored vectors into FAISS, Qdrant, or another vector index.
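
One way to picture the embedding boundary is a small index interface that only ever sees IDs and vectors, never raw text or the model. The sketch below is an assumption-laden illustration: the class name ExactIndex and its add() and search() methods are hypothetical, not part of Sentence Transformers, FAISS, or Qdrant, and it assumes the vectors are already L2-normalized.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


class ExactIndex:
    """Hypothetical in-memory index; assumes vectors are already L2-normalized."""

    def __init__(self) -> None:
        self._ids: List[str] = []
        self._vectors: List[Vector] = []

    def add(self, ids: Sequence[str], vectors: Sequence[Vector]) -> None:
        if len(ids) != len(vectors):
            raise ValueError("ids and vectors must have the same length")
        self._ids.extend(ids)
        self._vectors.extend(vectors)

    def search(self, query: Vector, top_k: int = 3) -> List[Tuple[str, float]]:
        # For normalized vectors the dot product equals cosine similarity.
        scored = [
            (doc_id, sum(q * d for q, d in zip(query, vec)))
            for doc_id, vec in zip(self._ids, self._vectors)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


# Two unit vectors stand in for encoded documents (illustrative values only).
index = ExactIndex()
index.add(["doc-001", "doc-002"], [[1.0, 0.0], [0.0, 1.0]])
print(index.search([0.6, 0.8], top_k=1))
```

If the corpus later moves to a vector database, a class exposing the same two methods could wrap that store, and the encoding code would not need to change.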

Steps to build semantic search with Sentence Transformers:

  1. Create the semantic search script.
    semantic_search_build.py
    from sentence_transformers import SentenceTransformer, util
     
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
     
    documents = [
        {"id": "doc-001", "text": "Set up billing alerts for monthly cloud spending."},
        {"id": "doc-002", "text": "Reset expired password links from the account security page."},
        {"id": "doc-003", "text": "Rotate SSH keys for production deployment hosts."},
        {"id": "doc-004", "text": "Renew TLS certificates before the web server reload."},
        {"id": "doc-005", "text": "Export customer invoices from the finance dashboard."},
    ]
    query = "password reset link expired"
     
    corpus = [item["text"] for item in documents]
    corpus_embeddings = model.encode_document(
        corpus,
        convert_to_tensor=True,
        normalize_embeddings=True,
        show_progress_bar=False,
    )
    query_embedding = model.encode_query(
        query,
        convert_to_tensor=True,
        normalize_embeddings=True,
        show_progress_bar=False,
    )
     
    hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)[0]
     
    print(f"Corpus embeddings: {tuple(corpus_embeddings.shape)}")
    print(f"Query: {query}")
    for rank, hit in enumerate(hits, start=1):
        item = documents[hit["corpus_id"]]
        print(f"{rank}. {item['id']} score={hit['score']:.4f} text={item['text']}")
     
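    # Sanity check: the password-reset document should rank first.
    top_item = documents[hits[0]["corpus_id"]]
    if top_item["id"] == "doc-002":
        print("Semantic search check: pass")
    else:
        print("Semantic search check: fail")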

    Run the script in a Python environment where Sentence Transformers is already installed.
    Related: How to install Sentence Transformers with pip

  2. Run the semantic search script.
    $ python semantic_search_build.py
    Corpus embeddings: (5, 384)
    Query: password reset link expired
    1. doc-002 score=0.8394 text=Reset expired password links from the account security page.
    2. doc-004 score=0.3238 text=Renew TLS certificates before the web server reload.
    3. doc-001 score=0.0895 text=Set up billing alerts for monthly cloud spending.
    Semantic search check: pass

    The first tuple value is the number of indexed documents, and the second value is the embedding dimension from the selected model.

  3. Confirm that doc-002 appears first and that the script prints Semantic search check: pass.

    The exact score can change with a different model or corpus, but the highest-ranked ID should belong to the record that matches the query intent.
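
Because scores shift between models and corpora, an application may want to map hits back to IDs and drop weak matches before returning them. The sketch below assumes hits in the list-of-dicts shape the script reads (corpus_id and score keys). The function name hits_to_results and the 0.3 cutoff are illustrative choices, not library recommendations, and the sample values are copied from the run in step 2.

```python
def hits_to_results(hits, documents, min_score=0.3):
    """Map hits to application IDs, keeping scores at or above min_score."""
    results = []
    for hit in hits:
        if hit["score"] >= min_score:
            doc = documents[hit["corpus_id"]]
            results.append({"id": doc["id"], "score": round(hit["score"], 4)})
    return results


# Document IDs in corpus order, matching the script in step 1.
documents = [
    {"id": "doc-001"},
    {"id": "doc-002"},
    {"id": "doc-003"},
    {"id": "doc-004"},
    {"id": "doc-005"},
]

# Hits reconstructed from the sample output in step 2.
sample_hits = [
    {"corpus_id": 1, "score": 0.8394},
    {"corpus_id": 3, "score": 0.3238},
    {"corpus_id": 0, "score": 0.0895},
]

for result in hits_to_results(sample_hits, documents):
    print(result["id"], result["score"])
```

A suitable cutoff depends on the model and the corpus, so it is worth tuning against real queries rather than reusing the value shown here.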