How to build semantic search with Sentence Transformers

Semantic search compares a query embedding with document embeddings, so a support note can match a query even when the wording differs. Sentence Transformers provides bi-encoder models that turn queries and documents into vectors, which are then ranked by similarity to find the nearest matches.

This small in-memory pattern fits product FAQs, notes, tickets, and other corpora that can be embedded during startup or a batch refresh. Each text keeps an application ID beside it, the model encodes the document texts, and each search hit carries a corpus position that maps back to the original ID, plus a score that can be returned to the application.
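The ID-beside-text pattern can be sketched without loading a model. The helper name hits_to_results and the hit scores below are hypothetical placeholders; the only assumption about the search output is that each hit is a dict with "corpus_id" (the row position in the encoded corpus) and "score" keys.

```python
def hits_to_results(hits, documents):
    """Map positional hits back to application IDs.

    Assumes each hit is a dict with "corpus_id" (row position in the
    encoded corpus) and "score" keys.
    """
    results = []
    for hit in hits:
        item = documents[hit["corpus_id"]]
        results.append(
            {"id": item["id"], "score": float(hit["score"]), "text": item["text"]}
        )
    return results


documents = [
    {"id": "doc-001", "text": "Set up billing alerts for monthly cloud spending."},
    {"id": "doc-002", "text": "Reset expired password links from the account security page."},
]

# Placeholder hits with made-up scores, standing in for real search output.
example_hits = [
    {"corpus_id": 1, "score": 0.9},
    {"corpus_id": 0, "score": 0.1},
]

results = hits_to_results(example_hits, documents)
for result in results:
    print(result["id"], result["score"])
```

Because the corpus list and the embedding rows share the same order, the position is enough to recover the application ID; the documents list must not be reordered after encoding.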

Use encode_document() for corpus text and encode_query() for user queries when the model supports query/document prompts. For corpora that outgrow exact in-memory search, keep the same embedding boundary and move the stored vectors into FAISS, Qdrant, or another vector index.
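One way to keep that embedding boundary is to hide storage behind a small add/search interface. The sketch below is an assumption, not part of Sentence Transformers: ExactIndex and normalize are hypothetical names, and the two-dimensional vectors are toy values standing in for real embeddings. It performs exact cosine-similarity search in plain Python; a FAISS- or Qdrant-backed class could later expose the same two methods.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        return list(vector)
    return [x / norm for x in vector]


class ExactIndex:
    """Hypothetical exact in-memory index keyed by application IDs."""

    def __init__(self):
        self._ids = []
        self._vectors = []

    def add(self, ids, vectors):
        # Store each vector normalized, next to its application ID.
        for doc_id, vector in zip(ids, vectors):
            self._ids.append(doc_id)
            self._vectors.append(normalize(vector))

    def search(self, query_vector, top_k=3):
        # Score every stored vector, then keep the top_k highest scores.
        query = normalize(query_vector)
        scored = [
            (sum(q * d for q, d in zip(query, vector)), doc_id)
            for doc_id, vector in zip(self._ids, self._vectors)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(doc_id, score) for score, doc_id in scored[:top_k]]


index = ExactIndex()
index.add(["doc-a", "doc-b", "doc-c"], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
results = index.search([2.0, 0.1], top_k=2)
print(results)
```

The calling code only sees IDs and scores, so replacing the storage layer should not require changes to how queries and documents are encoded.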

Steps to build semantic search with Sentence Transformers:

Create the semantic search script.

semantic_search_build.py

from sentence_transformers import SentenceTransformer, util

# Bi-encoder that produces 384-dimensional sentence embeddings.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Each text keeps its application ID beside it.
documents = [
    {"id": "doc-001", "text": "Set up billing alerts for monthly cloud spending."},
    {"id": "doc-002", "text": "Reset expired password links from the account security page."},
    {"id": "doc-003", "text": "Rotate SSH keys for production deployment hosts."},
    {"id": "doc-004", "text": "Renew TLS certificates before the web server reload."},
    {"id": "doc-005", "text": "Export customer invoices from the finance dashboard."},
]
query = "password reset link expired"

# Encode the corpus once; row order matches the order of documents.
corpus = [item["text"] for item in documents]
corpus_embeddings = model.encode_document(
    corpus,
    convert_to_tensor=True,
    normalize_embeddings=True,
    show_progress_bar=False,
)
query_embedding = model.encode_query(
    query,
    convert_to_tensor=True,
    normalize_embeddings=True,
    show_progress_bar=False,
)

# One result list per query; [0] selects the hits for the single query.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)[0]

print(f"Corpus embeddings: {tuple(corpus_embeddings.shape)}")
print(f"Query: {query}")
for rank, hit in enumerate(hits, start=1):
    # corpus_id is the row position, so it indexes straight into documents.
    item = documents[hit["corpus_id"]]
    print(f"{rank}. {item['id']} score={hit['score']:.4f} text={item['text']}")

# Sanity check: the password-reset record should rank first.
top_item = documents[hits[0]["corpus_id"]]
if top_item["id"] == "doc-002":
    print("Semantic search check: pass")
else:
    print(f"Semantic search check: fail (top result was {top_item['id']})")

Use a Python environment where Sentence Transformers is installed before running the script.
Related: How to install Sentence Transformers with pip

Run the semantic search script.

$ python semantic_search_build.py
Corpus embeddings: (5, 384)
Query: password reset link expired
1. doc-002 score=0.8394 text=Reset expired password links from the account security page.
2. doc-004 score=0.3238 text=Renew TLS certificates before the web server reload.
3. doc-001 score=0.0895 text=Set up billing alerts for monthly cloud spending.
Semantic search check: pass

In the Corpus embeddings line, the first tuple value is the number of indexed documents, and the second value is the embedding dimension of the selected model.

Confirm that doc-002 appears first and that the script prints Semantic search check: pass.

The exact score can change with a different model or corpus, but the highest-ranked ID should belong to the record that matches the query intent.
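Because scores shift between models, a check based on the top-ranked ID tends to be more stable than one based on a score value. The following is a minimal sketch under stated assumptions: top1_accuracy and canned_search are hypothetical names, and the canned rankings are placeholders so the example runs without a model. In practice, search would wrap the encode-and-rank logic and return application IDs in ranked order.

```python
def top1_accuracy(search, labeled_queries):
    """Return the fraction of queries whose top-ranked ID is the expected one.

    `search` is any callable that takes a query string and returns a list
    of application IDs in ranked order.
    """
    if not labeled_queries:
        return 0.0
    correct = 0
    for query, expected_id in labeled_queries:
        ranked_ids = search(query)
        if ranked_ids and ranked_ids[0] == expected_id:
            correct += 1
    return correct / len(labeled_queries)


# Placeholder rankings standing in for a real model-backed search function.
CANNED_RANKINGS = {
    "password reset link expired": ["doc-002", "doc-004", "doc-001"],
    "rotate ssh keys": ["doc-003", "doc-004", "doc-002"],
    "download invoices": ["doc-001", "doc-005", "doc-003"],  # deliberate miss
}


def canned_search(query):
    return CANNED_RANKINGS.get(query, [])


labeled_queries = [
    ("password reset link expired", "doc-002"),
    ("rotate ssh keys", "doc-003"),
    ("download invoices", "doc-005"),
]

accuracy = top1_accuracy(canned_search, labeled_queries)
print(f"Top-1 accuracy: {accuracy:.2f}")
```

A small labeled set like this can be rerun after changing the model or refreshing the corpus to see whether the expected records still rank first.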