Embedding search usually narrows a corpus before a reader ever sees the result list. Sentence Transformers can add a second scoring pass with a cross-encoder so the application presents candidates by query-document relevance instead of only embedding distance.
A bi-encoder model such as sentence-transformers/all-MiniLM-L6-v2 encodes the query and documents separately, which makes the first retrieval stage fast. A CrossEncoder reads each query-document pair together, so it is better suited to scoring a short top-k candidate set than scanning a large corpus directly.
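The cost difference between the two stages can be sketched with simple counting. This is a rough illustration rather than a benchmark: the helper name `cross_encoder_pairs` and the corpus size are hypothetical, and the sketch assumes document embeddings are computed once ahead of time, so the cross-encoder is the only stage whose work grows with each query.

```python
from typing import Optional


def cross_encoder_pairs(corpus_size: int, top_k: Optional[int] = None) -> int:
    """Return how many query-document pairs the cross-encoder scores per query.

    With no top_k, every document in the corpus is paired with the query.
    With top_k, only the first-stage candidates are paired.
    """
    if top_k is None:
        return corpus_size
    return min(top_k, corpus_size)


corpus_size = 100_000  # hypothetical corpus size, for illustration only

# Scanning the corpus directly: one cross-encoder forward pass per document.
full_scan = cross_encoder_pairs(corpus_size)

# Reranking only the first-stage candidates: one forward pass per candidate.
reranked_only = cross_encoder_pairs(corpus_size, top_k=50)

print(f"full scan: {full_scan} pairs, rerank top 50: {reranked_only} pairs")
```

The pair count per query stays fixed at `top_k` no matter how large the corpus grows, which is why the cross-encoder is kept for the second stage.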
A small Python script can keep the first-stage candidate list small, rerank those candidates with cross-encoder/ms-marco-MiniLM-L6-v2, and sort the final list by the cross-encoder score. MS MARCO cross-encoder models return logits by default, so compare the scores relative to each other unless the display layer needs probabilities.
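If the display layer does need values between 0 and 1, one common choice is to pass each logit through a sigmoid. The sketch below assumes that approach and uses plain Python on sample logits copied from the example run later in this guide; the `sigmoid` helper is illustrative and not part of the library. Sentence Transformers can also apply an activation function for you, but the exact option depends on the installed version, so check its CrossEncoder documentation before relying on it.

```python
import math


def sigmoid(logit: float) -> float:
    """Map a raw logit to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-logit))


# Sample cross-encoder logits, already sorted from best to worst.
logits = [3.19, -6.19, -7.50, -10.33]

probabilities = [sigmoid(value) for value in logits]

for logit, probability in zip(logits, probabilities):
    print(f"logit={logit:+.2f} -> probability={probability:.4f}")
```

Sigmoid is monotonic, so the ranking order is the same whether the application sorts by logits or by the converted values.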
Steps to rerank Sentence Transformers search results with a cross-encoder:
- Create a Python script that retrieves embedding candidates and reranks them.
rerank_results.py

from sentence_transformers import CrossEncoder, SentenceTransformer, util

query = "How do I restore a Docker volume backup on another host?"

documents = [
    "Create a tar archive from a Docker volume, copy it to the new host, and extract it into a replacement volume.",
    "Use docker compose pull and docker compose up -d to recreate application containers after an image update.",
    "List Docker images and tags before promoting a release to production.",
    "Upload a finished build directory to Amazon S3 with aws s3 sync for release backups.",
    "Inspect container logs with docker logs when a service exits during startup.",
]

# First stage: bi-encoder for fast retrieval. Second stage: cross-encoder for reranking.
retriever = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device="cpu")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2", device="cpu")

# Normalized embeddings make the similarity scores behave as cosine similarity.
document_embeddings = retriever.encode(
    documents,
    normalize_embeddings=True,
    show_progress_bar=False,
)
query_embedding = retriever.encode(
    query,
    normalize_embeddings=True,
    show_progress_bar=False,
)

# Keep only the top 4 embedding hits as candidates for the cross-encoder.
hits = util.semantic_search(query_embedding, document_embeddings, top_k=4)[0]
candidates = [documents[hit["corpus_id"]] for hit in hits]

print("Initial embedding search:")
for position, hit in enumerate(hits, start=1):
    print(f"{position}. cosine={hit['score']:.3f} | {documents[hit['corpus_id']]}")

print()
print("Final reranked order:")
# rank() scores every query-candidate pair and returns them sorted by score.
for position, hit in enumerate(
    reranker.rank(
        query,
        candidates,
        return_documents=True,
        show_progress_bar=False,
    ),
    start=1,
):
    print(f"{position}. score={hit['score']:.2f} | {hit['text']}")
The top_k argument of semantic_search() limits the expensive cross-encoder stage to the top embedding hits, four in this script. Raise top_k when recall matters more than latency.
- Run the reranking script.
$ python3 rerank_results.py
Initial embedding search:
1. cosine=0.740 | Create a tar archive from a Docker volume, copy it to the new host, and extract it into a replacement volume.
2. cosine=0.466 | Use docker compose pull and docker compose up -d to recreate application containers after an image update.
3. cosine=0.364 | Inspect container logs with docker logs when a service exits during startup.
4. cosine=0.272 | List Docker images and tags before promoting a release to production.

Final reranked order:
1. score=3.19 | Create a tar archive from a Docker volume, copy it to the new host, and extract it into a replacement volume.
2. score=-6.19 | Use docker compose pull and docker compose up -d to recreate application containers after an image update.
3. score=-7.50 | Inspect container logs with docker logs when a service exits during startup.
4. score=-10.33 | List Docker images and tags before promoting a release to production.
The final list is sorted by CrossEncoder.rank() scores. The first candidate keeps the top position with a score of 3.19, while the remaining candidates fall between -6.19 and -10.33, a much wider gap than the cosine scores showed.
- Replace the printed output with the application return value after the order is correct.
reranked = reranker.rank(
    query,
    candidates,
    return_documents=True,
    show_progress_bar=False,
)
reranked_documents = [hit["text"] for hit in reranked]
Keep the original corpus IDs beside each candidate when the application needs to return database records, URLs, or metadata instead of only text.
- Remove the sample script when the rerank check is finished.
$ rm rerank_results.py
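The advice in the steps above about keeping original corpus IDs can be sketched as follows. The sketch relies on one detail worth checking against the installed version: the `corpus_id` returned by `CrossEncoder.rank()` indexes the candidate list passed to it, not the full corpus, so it has to be mapped back through the first-stage hits. To stay runnable without downloading models, the two lists below are hand-written stand-ins shaped like the library's return values, with scores copied from the sample run, and the `attach_corpus_ids` helper is a hypothetical name.

```python
# Stand-in for util.semantic_search(...)[0]: corpus_id indexes the full document list.
hits = [
    {"corpus_id": 0, "score": 0.740},
    {"corpus_id": 1, "score": 0.466},
    {"corpus_id": 4, "score": 0.364},
    {"corpus_id": 2, "score": 0.272},
]

# Stand-in for reranker.rank(...): corpus_id indexes the candidate list instead.
reranked = [
    {"corpus_id": 0, "score": 3.19},
    {"corpus_id": 1, "score": -6.19},
    {"corpus_id": 2, "score": -7.50},
    {"corpus_id": 3, "score": -10.33},
]


def attach_corpus_ids(hits, reranked):
    """Translate candidate positions back to positions in the original corpus."""
    results = []
    for item in reranked:
        first_stage = hits[item["corpus_id"]]  # candidate index -> first-stage hit
        results.append(
            {
                "corpus_id": first_stage["corpus_id"],
                "retrieval_score": first_stage["score"],
                "rerank_score": item["score"],
            }
        )
    return results


results = attach_corpus_ids(hits, reranked)
for result in results:
    print(result)
```

Each resulting `corpus_id` can then be used to look up the matching database record, URL, or metadata in whatever store the application uses.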
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.