Retrieval-augmented generation depends on finding the right source text before a language model writes an answer. Sentence Transformers can build that retriever by embedding knowledge-base chunks and comparing the embedding of a user question against those vectors.

The retriever layer does not need to call an LLM to prove its own behavior. A small local corpus can show whether the query embedding returns the intended chunks, and the same selected chunks can then be assembled into the context block that a generator prompt will receive.

The prototype in the steps below uses encode_document() for the indexed chunks, encode_query() for the question, and util.semantic_search() for top-k retrieval. The encode_document() and encode_query() methods require Sentence Transformers 5.0 or later. Replace the sample chunks with your application records after the ranking and context formatting are correct.
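To make the retrieval step less of a black box, the ranking can be sketched without the library: score every chunk vector against the query vector with cosine similarity, sort, and keep the best k. The sketch below is an illustration only, not the library's implementation. The helper names (normalize, top_k_by_cosine) and the 3-dimensional toy vectors are assumptions made for the example; a real model emits vectors with hundreds of dimensions.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    length = math.sqrt(sum(value * value for value in vector))
    if length == 0:
        raise ValueError("cannot normalize a zero vector")
    return [value / length for value in vector]


def top_k_by_cosine(query_vector, document_vectors, top_k=2):
    """Return hits shaped like {"corpus_id", "score"}, best score first."""
    query_unit = normalize(query_vector)
    hits = []
    for corpus_id, document_vector in enumerate(document_vectors):
        document_unit = normalize(document_vector)
        # Dot product of two unit vectors is their cosine similarity.
        score = sum(q * d for q, d in zip(query_unit, document_unit))
        hits.append({"corpus_id": corpus_id, "score": score})
    hits.sort(key=lambda hit: hit["score"], reverse=True)
    return hits[:top_k]


# Hypothetical toy embeddings, hand-written for illustration.
toy_document_vectors = [
    [0.9, 0.1, 0.0],  # chunk 0
    [0.0, 1.0, 0.2],  # chunk 1
    [0.1, 0.0, 0.9],  # chunk 2
]
toy_query_vector = [1.0, 0.2, 0.0]

toy_hits = top_k_by_cosine(toy_query_vector, toy_document_vectors, top_k=2)
for hit in toy_hits:
    print(f"corpus_id={hit['corpus_id']} score={hit['score']:.4f}")
```

Each hit carries the index of the chunk in the corpus list plus its score, which is the same shape the prototype reads when it maps hits back to chunk records.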

Steps to build a RAG retriever with Sentence Transformers:

  1. Install Sentence Transformers in the active Python environment.
    $ python -m pip install --upgrade sentence-transformers

    The first model run may download files from Hugging Face. Use the same environment that will run the retrieval code.

  2. Create the retriever prototype script.
    build_rag_retriever.py
    from sentence_transformers import SentenceTransformer, util

    # Load the embedding model; the files are downloaded on first use.
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
     
    chunks = [
        {
            "id": "kb-001",
            "title": "Reset a forgotten password",
            "text": (
                "Users can reset a forgotten password from Account settings. "
                "Send the reset email, open the link, and choose a new password."
            ),
        },
        {
            "id": "kb-002",
            "title": "Download invoice receipts",
            "text": (
                "Billing admins can download invoice receipts from the billing "
                "history screen after payment is processed."
            ),
        },
        {
            "id": "kb-003",
            "title": "Rotate API tokens",
            "text": (
                "Create a replacement API token, update the integration, and revoke "
                "the old token after traffic moves to the new credential."
            ),
        },
        {
            "id": "kb-004",
            "title": "Change notification email",
            "text": (
                "Profile owners can change the notification email address and "
                "confirm the new address before alerts move."
            ),
        },
    ]
     
    query = "How does a user reset a forgotten password?"
    # Embed the title together with the body so both can influence ranking.
    documents = [f"{chunk['title']}: {chunk['text']}" for chunk in chunks]
     
    document_embeddings = model.encode_document(
        documents,
        normalize_embeddings=True,
        convert_to_tensor=True,
        show_progress_bar=False,
    )
    query_embedding = model.encode_query(
        query,
        normalize_embeddings=True,
        convert_to_tensor=True,
        show_progress_bar=False,
    )
     
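    # semantic_search() returns one hit list per query; [0] selects the list
    # for the single query. Each hit has a corpus_id (index into documents)
    # and a similarity score.
    hits = util.semantic_search(query_embedding, document_embeddings, top_k=2)[0]
    context_chunks = [chunks[hit["corpus_id"]] for hit in hits]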
    context = "\n\n".join(
        f"[{chunk['id']}] {chunk['title']}\n{chunk['text']}" for chunk in context_chunks
    )
    prompt = (
        f"Question: {query}\n\n"
        "Use only this context:\n"
        f"{context}\n\n"
        "Answer:"
    )
     
    print(f"indexed chunks: {len(chunks)}")
    print(f"query: {query}")
    print("retrieved chunks:")
    for rank, hit in enumerate(hits, start=1):
        chunk = chunks[hit["corpus_id"]]
        print(f"{rank}. {chunk['id']} score={hit['score']:.4f} title={chunk['title']}")
     
    print("\nrag prompt context:")
    print(context)
     
    # Stop with a non-zero exit status if the expected chunk is not ranked first.
    if context_chunks[0]["id"] != "kb-001":
        raise SystemExit(f"unexpected top chunk: {context_chunks[0]['id']}")
     
    if "Reset a forgotten password" not in prompt:
        raise SystemExit("prompt context is missing the retrieved password chunk")
     
    print("\nverification: PASS retriever context is ready for the generator prompt")

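    Keep stable chunk IDs beside the text so retrieved context can be traced back to the source record, page, ticket, or document section. Setting normalize_embeddings=True scales every document and query vector to unit length, so dot-product and cosine-similarity scoring produce the same values. util.semantic_search() scores with cosine similarity by default.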

  3. Run the retriever script and confirm the password-reset chunk ranks first.
    $ python build_rag_retriever.py
    indexed chunks: 4
    query: How does a user reset a forgotten password?
    retrieved chunks:
    1. kb-001 score=0.7700 title=Reset a forgotten password
    2. kb-003 score=0.0874 title=Rotate API tokens
    
    rag prompt context:
    [kb-001] Reset a forgotten password
    Users can reset a forgotten password from Account settings. Send the reset email, open the link, and choose a new password.
    
    [kb-003] Rotate API tokens
    Create a replacement API token, update the integration, and revoke the old token after traffic moves to the new credential.
    
    verification: PASS retriever context is ready for the generator prompt

    The first retrieved chunk should be the context that answers the question. If another chunk wins, inspect chunk boundaries, model choice, normalization, and whether the query wording matches the corpus domain.
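In the sample run, the second hit (kb-003) scores only 0.0874 yet still lands in the prompt context because top_k=2 always returns two chunks. One possible refinement, sketched below, is to drop hits under a minimum score before assembling the context. The function name select_context_hits and the 0.3 threshold are assumptions for illustration; a usable threshold depends on the model and corpus and should be tuned against real queries.

```python
def select_context_hits(hits, min_score=0.3, max_chunks=2):
    """Keep the best-scoring hits that clear a minimum similarity score."""
    ranked = sorted(hits, key=lambda hit: hit["score"], reverse=True)
    return [hit for hit in ranked if hit["score"] >= min_score][:max_chunks]


# Hits shaped like the semantic_search() results, using the scores from the
# sample run: corpus_id 0 is kb-001 and corpus_id 2 is kb-003.
sample_hits = [
    {"corpus_id": 0, "score": 0.7700},
    {"corpus_id": 2, "score": 0.0874},
]

# min_score=0.3 is a hypothetical starting point, not a recommended value.
kept_hits = select_context_hits(sample_hits, min_score=0.3)
print("kept corpus ids:", [hit["corpus_id"] for hit in kept_hits])

if not kept_hits:
    # With no chunk above the threshold, the caller can skip generation
    # or ask the user to rephrase instead of prompting with weak context.
    print("no chunk cleared the threshold")
```

Filtering this way keeps unrelated chunks out of the generator prompt, at the cost of sometimes returning no context at all, so the calling code needs a path for the empty case.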