How to use a Sentence Transformers ranker in Haystack

Haystack retrieval pipelines often need a second pass after the first-stage retriever collects candidate documents. A Sentence Transformers ranker uses a cross-encoder model to score each candidate against the query and reorders the candidates by that score, so search and RAG pipelines pass better-ordered context to the next component.

The Sentence Transformers similarity ranker component comes from the sentence-transformers-haystack integration package. It receives candidate documents from the retriever through a pipeline connection and the query string through Pipeline.run(), then returns a ranked list of Document objects with the model score stored in each document's score field.
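
The contract described above (score each retrieved document against the query, sort by score, keep the top few) can be sketched in plain Python. This is a minimal illustration, not the integration's code: Doc, rerank, and overlap_score are hypothetical names, and overlap_score is a toy word-overlap scorer standing in for the cross-encoder model.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Doc:
    # Hypothetical stand-in for haystack.Document: a label, content, and a score.
    name: str
    content: str
    score: Optional[float] = None


def overlap_score(query: str, text: str) -> float:
    # Toy lexical scorer used ONLY as a placeholder for a cross-encoder model:
    # the fraction of query words that also appear in the document.
    q = set(query.lower().rstrip("?").split())
    t = set(text.lower().rstrip(".").split())
    return len(q & t) / len(q)


def rerank(
    query: str,
    docs: List[Doc],
    score_fn: Callable[[str, str], float],
    top_k: int,
) -> List[Doc]:
    # Score every (query, document) pair independently, as a cross-encoder does.
    scored = [replace(d, score=score_fn(query, d.content)) for d in docs]
    # Highest score first; list.sort is stable, so ties keep the retriever order.
    scored.sort(key=lambda d: d.score, reverse=True)
    # Only the top_k best-scoring documents leave the ranker.
    return scored[:top_k]


candidates = [
    Doc("munich", "Munich is a city in Germany."),
    Doc("berlin", "Berlin is the capital of Germany."),
    Doc("paris", "Paris is the capital of France."),
    Doc("tokyo", "Tokyo is the capital of Japan."),
]
query = "What is the capital city of Germany?"
for d in rerank(query, candidates, overlap_score, top_k=2):
    print(f"{d.name}: {d.score:.4f}")
```

With this word-overlap stand-in, Paris takes second place because it shares "the capital of" with the query. A trained cross-encoder reads the query and document together instead of counting shared words, which is why its order can differ from any lexical scorer's.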

A small in-memory dataset searched with InMemoryBM25Retriever makes the reranking step easy to observe without a separate vector database. In a production pipeline, the upstream retriever can be BM25, embedding-based, or hybrid; keep the retriever top_k small enough that the cross-encoder model can score every candidate within the request latency budget.
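
A rough way to size the retriever top_k against a latency budget is to measure how long the cross-encoder takes to score one batch of query-document pairs on the target hardware, then count how many batches fit. The sketch below assumes scoring time grows linearly with the number of batches; max_candidates and every number in it are hypothetical placeholders, not measurements from this pipeline.

```python
import math


def max_candidates(budget_ms: float, batch_latency_ms: float, batch_size: int) -> int:
    """Largest candidate count the ranker can score within budget_ms.

    Hypothetical model: each batch of batch_size query-document pairs costs
    batch_latency_ms, and batches run one after another.
    """
    if budget_ms <= 0 or batch_latency_ms <= 0 or batch_size <= 0:
        raise ValueError("all inputs must be positive")
    # Count whole batches only, which errs on the conservative side.
    full_batches = math.floor(budget_ms / batch_latency_ms)
    return full_batches * batch_size


# Hypothetical figures: 120 ms reranking budget, 35 ms per batch of 16 pairs.
print(max_candidates(120, 35, 16))
```

Under those invented figures, three whole batches fit, so a retriever top_k of 48 or fewer stays inside the budget. Re-measure on the real model and hardware before relying on such a number.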

Steps to use a Sentence Transformers ranker in Haystack:

  1. Install the Sentence Transformers Haystack integration in the active Python environment.
    $ python3 -m pip install --upgrade \
      sentence-transformers-haystack
    ##### snipped #####
    Successfully installed haystack-ai-2.30.2
    Successfully installed sentence-transformers-haystack-0.1.0

    The integration package installs haystack-ai and sentence-transformers when they are missing. Use a virtual environment when the project should not share Python packages with the system interpreter.

  2. Create a query pipeline script that places the Sentence Transformers ranker after the retriever.
    $ cat > haystack-ranker.py <<'PY'
    from haystack import Document, Pipeline
    from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
    from haystack.document_stores.in_memory import InMemoryDocumentStore
    from haystack_integrations.components.rankers.sentence_transformers import (
        SentenceTransformersSimilarityRanker as Ranker,
    )
    
    docs = [
        Document(
            content="Munich is a city in Germany.",
            meta={"name": "munich"},
        ),
        Document(
            content="Berlin is the capital of Germany.",
            meta={"name": "berlin"},
        ),
        Document(
            content="Paris is the capital of France.",
            meta={"name": "paris"},
        ),
        Document(
            content="Tokyo is the capital of Japan.",
            meta={"name": "tokyo"},
        ),
    ]
    
    store = InMemoryDocumentStore()
    store.write_documents(docs)
    retriever = InMemoryBM25Retriever(document_store=store)
    ranker = Ranker(
        model="cross-encoder/ms-marco-MiniLM-L6-v2"
    )
    
    query = "What is the capital city of Germany?"
    before = retriever.run(query=query, top_k=4)["documents"]
    print("Retriever order:")
    for doc in before:
        label = doc.meta.get("name")
        print(f"- {label}: {doc.content}")
    
    pipe = Pipeline()
    pipe.add_component("retriever", retriever)
    pipe.add_component("ranker", ranker)
    pipe.connect("retriever.documents", "ranker.documents")
    
    result = pipe.run(
        data={
            "retriever": {"query": query, "top_k": 4},
            "ranker": {"query": query, "top_k": 2},
        }
    )
    
    print()
    print("Reranked top 2:")
    for doc in result["ranker"]["documents"]:
        label = doc.meta.get("name")
        print(f"- {label}: {doc.score:.4f} | {doc.content}")
    
    print()
    print("Pipeline summary:")
    print("Retriever top_k: 4 candidate documents")
    print(f"Ranker top_k: {len(result['ranker']['documents'])} returned documents")
    top_document = result["ranker"]["documents"][0].meta.get("name")
    print(f"Final top document: {top_document}")
    PY

    The first run downloads the cross-encoder model unless it is already cached. Private Hugging Face models need an HF_TOKEN or HF_API_TOKEN environment variable.

  3. Run the pipeline script and confirm that the final output comes from the ranker component.
    $ python3 haystack-ranker.py
    Retriever order:
    - munich: Munich is a city in Germany.
    - berlin: Berlin is the capital of Germany.
    - paris: Paris is the capital of France.
    - tokyo: Tokyo is the capital of Japan.
    
    Reranked top 2:
    - berlin: 0.9998 | Berlin is the capital of Germany.
    - munich: 0.4497 | Munich is a city in Germany.
    
    Pipeline summary:
    Retriever top_k: 4 candidate documents
    Ranker top_k: 2 returned documents
    Final top document: berlin

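    The retriever top_k sets how many candidates the cross-encoder has to score, so it drives reranking latency. The ranker top_k sets how many reranked documents leave the final component, and the ranker cannot return more documents than the retriever passed in.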
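
Before running the script on a new machine, a quick preflight can confirm that the installation step pulled in the expected distributions and whether a Hugging Face token variable is present for private models. This is an optional helper, not part of the integration; installed_versions and token_vars_set are hypothetical names, and the script uses only the Python standard library.

```python
import os
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Sequence

# Distribution names taken from the installation step of this guide.
PACKAGES = ("haystack-ai", "sentence-transformers-haystack", "sentence-transformers")
# Token variables the guide mentions for private Hugging Face models.
TOKEN_VARS = ("HF_TOKEN", "HF_API_TOKEN")


def installed_versions(names: Sequence[str] = PACKAGES) -> Dict[str, Optional[str]]:
    # Hypothetical helper: map each distribution to its version, or None if missing.
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def token_vars_set(names: Sequence[str] = TOKEN_VARS) -> Dict[str, bool]:
    # Hypothetical helper: report only whether each variable is set,
    # never the token value itself.
    return {name: bool(os.environ.get(name)) for name in names}


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
    for name, present in token_vars_set().items():
        print(f"{name}: {'set' if present else 'not set'}")
```

A public model such as the one in the script does not need either token variable, so "not set" is only a problem for private models.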
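
The interaction between the two top_k values can be sketched without Haystack. The ranker only reorders the documents it receives, so a relevant document cut off by the retriever top_k cannot be recovered later, and the ranker cannot return more documents than it was given. two_stage and both score tables below are invented for illustration; they are not the BM25 or cross-encoder scores from the run above.

```python
from typing import Dict, List


def two_stage(
    first_stage: Dict[str, float],
    second_stage: Dict[str, float],
    retriever_top_k: int,
    ranker_top_k: int,
) -> List[str]:
    # Stage 1 (hypothetical retriever): keep the best retriever_top_k candidates.
    candidates = sorted(first_stage, key=first_stage.get, reverse=True)[:retriever_top_k]
    # Stage 2 (hypothetical ranker): reorder only those candidates.
    reranked = sorted(candidates, key=second_stage.get, reverse=True)
    # The ranker can return at most as many documents as it received.
    return reranked[:ranker_top_k]


# Invented scores: the lexical stage prefers munich, the reranker prefers berlin.
bm25 = {"munich": 2.1, "berlin": 1.9, "paris": 1.2, "tokyo": 1.1}
cross = {"munich": 0.45, "berlin": 0.99, "paris": 0.02, "tokyo": 0.01}

print(two_stage(bm25, cross, retriever_top_k=4, ranker_top_k=2))
print(two_stage(bm25, cross, retriever_top_k=1, ranker_top_k=2))
```

With retriever_top_k=4 the reranker promotes berlin to first place; with retriever_top_k=1 berlin never reaches the ranker, and only one document comes back even though ranker_top_k is 2.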