How to rerank search results with a Sentence Transformers cross-encoder

Embedding search usually narrows a corpus to a short candidate list before a reader sees any results. Sentence Transformers can add a second scoring pass with a cross-encoder, so the application orders those candidates by query-document relevance instead of by embedding similarity alone.

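A bi-encoder model such as sentence-transformers/all-MiniLM-L6-v2 encodes the query and each document separately. Document embeddings can therefore be computed once and reused, which keeps the first retrieval stage fast. A CrossEncoder reads each query-document pair together in one model pass. That generally gives a stronger relevance signal, but the pass must be repeated for every pair, so it is better suited to rescoring a short top-k candidate set than to scanning a large corpus directly.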
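The cost gap can be made concrete by counting how many inputs each strategy has to run through a transformer. The sketch below is plain arithmetic under simplifying assumptions: one model input per text or query-document pair, document embeddings computed once and reused across queries, and no accounting for batching, input length, or hardware. The function name model_inputs() and the corpus size, query count, and top_k values are illustrative, not measurements.

```python
def model_inputs(num_docs: int, num_queries: int, top_k: int) -> dict[str, int]:
    """Count texts or (query, document) pairs each strategy sends through a model.

    Simplified model: one input per text or pair, no batching or length effects.
    """
    return {
        # Bi-encoder: every document is embedded once, plus one embedding per query.
        "bi_encoder_only": num_docs + num_queries,
        # Cross-encoder alone: every query is paired with every document.
        "cross_encoder_full_scan": num_queries * num_docs,
        # Two stages: bi-encoder retrieval, then top_k pairs per query for the cross-encoder.
        "retrieve_then_rerank": num_docs + num_queries + num_queries * top_k,
    }


# Illustration values only, not measurements.
counts = model_inputs(num_docs=1_000_000, num_queries=1_000, top_k=100)
for strategy, total in counts.items():
    print(f"{strategy}: {total:,}")
```

Under those assumptions, the two-stage setup adds 100,000 cross-encoder inputs to the bi-encoder's 1,001,000, while scoring the full corpus with the cross-encoder alone would take 1,000,000,000.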

A short Python script can limit the first-stage candidate list, rerank those candidates with cross-encoder/ms-marco-MiniLM-L6-v2, and sort the final list by cross-encoder score. MS MARCO cross-encoder models return raw logits by default, so compare the scores relative to each other rather than reading them as probabilities.
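When a display layer does need values between 0 and 1, a logistic sigmoid converts each logit without changing the order, because the function is monotonic. The sketch below applies it by hand to the logits printed in the sample run in step 2. It assumes the scores are already plain Python floats and converts them manually rather than relying on any library setting.

```python
import math


def sigmoid(logit: float) -> float:
    """Map a raw logit to (0, 1) without overflowing for large negative inputs."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    exp_logit = math.exp(logit)
    return exp_logit / (1.0 + exp_logit)


# Logits copied from the sample "Final reranked order" output in step 2.
logits = [3.19, -6.19, -7.50, -10.33]
probabilities = [sigmoid(x) for x in logits]

for logit, probability in zip(logits, probabilities):
    print(f"logit={logit:7.2f}  probability={probability:.6f}")
```

Sorting by these probabilities gives the same order as sorting by the raw logits, so the reranking step itself never needs the conversion.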

Steps to rerank Sentence Transformers search results with a cross-encoder:

  1. Create a Python script that retrieves embedding candidates and reranks them.
    rerank_results.py
    from sentence_transformers import CrossEncoder, SentenceTransformer, util
     
    query = "How do I restore a Docker volume backup on another host?"
     
    documents = [
        "Create a tar archive from a Docker volume, copy it to the new host, and extract it into a replacement volume.",
        "Use docker compose pull and docker compose up -d to recreate application containers after an image update.",
        "List Docker images and tags before promoting a release to production.",
        "Upload a finished build directory to Amazon S3 with aws s3 sync for release backups.",
        "Inspect container logs with docker logs when a service exits during startup.",
    ]
     
    # Bi-encoder for fast first-stage retrieval, cross-encoder for reranking.
    retriever = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device="cpu")
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2", device="cpu")
    
    document_embeddings = retriever.encode(
        documents,
        normalize_embeddings=True,
        show_progress_bar=False,
    )
    query_embedding = retriever.encode(
        query,
        normalize_embeddings=True,
        show_progress_bar=False,
    )
    
    # Stage 1: keep only the top 4 embedding hits as rerank candidates.
    hits = util.semantic_search(query_embedding, document_embeddings, top_k=4)[0]
    candidates = [documents[hit["corpus_id"]] for hit in hits]
    
    print("Initial embedding search:")
    for position, hit in enumerate(hits, start=1):
        print(f"{position}. cosine={hit['score']:.3f} | {documents[hit['corpus_id']]}")
    
    # Stage 2: score each (query, candidate) pair; rank() returns them best first.
    reranked = reranker.rank(
        query,
        candidates,
        return_documents=True,
        show_progress_bar=False,
    )
    
    print()
    print("Final reranked order:")
    for position, hit in enumerate(reranked, start=1):
        print(f"{position}. score={hit['score']:.2f} | {hit['text']}")

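    The top_k argument to util.semantic_search() caps how many candidates reach the slower cross-encoder stage. The cross-encoder can only reorder documents that the embedding search already returned, so raise top_k when recall matters more than latency.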

  2. Run the reranking script.
    $ python3 rerank_results.py
    Initial embedding search:
    1. cosine=0.740 | Create a tar archive from a Docker volume, copy it to the new host, and extract it into a replacement volume.
    2. cosine=0.466 | Use docker compose pull and docker compose up -d to recreate application containers after an image update.
    3. cosine=0.364 | Inspect container logs with docker logs when a service exits during startup.
    4. cosine=0.272 | List Docker images and tags before promoting a release to production.
    
    Final reranked order:
    1. score=3.19 | Create a tar archive from a Docker volume, copy it to the new host, and extract it into a replacement volume.
    2. score=-6.19 | Use docker compose pull and docker compose up -d to recreate application containers after an image update.
    3. score=-7.50 | Inspect container logs with docker logs when a service exits during startup.
    4. score=-10.33 | List Docker images and tags before promoting a release to production.

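    CrossEncoder.rank() returns the candidates sorted by descending cross-encoder score. In this sample the order matches the embedding search, but the top candidate is the only one with a positive score (3.19), while every other candidate scores -6.19 or lower. On other queries, the cross-encoder can move a lower embedding hit to the top.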

  3. Once the order looks correct, replace the print loops with a value the application can return.
    reranked = reranker.rank(query, candidates, return_documents=True, show_progress_bar=False)
    reranked_documents = [hit["text"] for hit in reranked]

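    Keep the original corpus IDs beside each candidate when the application needs to return database records, URLs, or metadata instead of only text. The corpus_id in each rank() result is the candidate's position in the candidates list, not its index in documents, so map it back through the first-stage hits with hits[hit["corpus_id"]]["corpus_id"].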

  4. Remove the sample script when the rerank check is finished.
    $ rm rerank_results.py
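The retrieve-then-rerank flow from the steps above can also be sketched without loading a model, which makes the top_k trade-off and the corpus ID bookkeeping easy to test. In the sketch below, retrieve_then_rerank() is a hypothetical helper, not part of Sentence Transformers, and the two scorer functions and their score tables are made-up stand-ins for the embedding model and the cross-encoder.

```python
from typing import Callable, Sequence

# A scorer takes (query, document) and returns a relevance score.
Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    documents: Sequence[str],
    first_stage: Scorer,
    reranker: Scorer,
    top_k: int,
) -> list[dict]:
    """Hypothetical two-stage helper: shortlist with first_stage, reorder with reranker.

    Each result keeps the original corpus index, so callers can look up
    records, URLs, or other metadata after reranking.
    """
    # Stage 1: score every document cheaply and keep the top_k indices.
    shortlist = sorted(
        range(len(documents)),
        key=lambda i: first_stage(query, documents[i]),
        reverse=True,
    )[:top_k]
    # Stage 2: rescore only the shortlist; anything outside it is never seen.
    rescored = [
        {"corpus_id": i, "score": reranker(query, documents[i])} for i in shortlist
    ]
    return sorted(rescored, key=lambda hit: hit["score"], reverse=True)


# Made-up scores standing in for the two models (not real model output).
EMBEDDING_SCORES = {"doc A": 0.71, "doc B": 0.65, "doc C": 0.20}
CROSS_SCORES = {"doc A": -2.0, "doc B": 4.5, "doc C": 6.0}


def embedding_score(query: str, document: str) -> float:
    return EMBEDDING_SCORES[document]


def cross_score(query: str, document: str) -> float:
    return CROSS_SCORES[document]


docs = ["doc A", "doc B", "doc C"]
for k in (1, 2, 3):
    ranked = retrieve_then_rerank("query", docs, embedding_score, cross_score, top_k=k)
    print(k, [(docs[hit["corpus_id"]], hit["score"]) for hit in ranked])
```

With top_k=1 or top_k=2, doc C never reaches the reranker even though it has the highest stand-in cross-encoder score; only top_k=3 lets it move to the top. That is the recall cost of a small top_k.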