Semantic search over a large text collection needs a retrieval path that compares a query against stored document embeddings without building a full all-pairs score matrix. Sentence Transformers can run that exact-search stage with precomputed corpus embeddings and chunked scoring, which keeps the prototype close to the code used later in a vector index or retrieval service.

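The util.semantic_search() helper accepts query embeddings, corpus embeddings, a top_k limit, a score function, and chunk sizes for query and corpus scanning. Each scoring step compares up to query_chunk_size queries with up to corpus_chunk_size documents, so smaller corpus_chunk_size values shrink the temporary score matrix, while larger chunks can run faster when CPU or GPU memory can hold them. The search stays exact either way; the chunk sizes trade memory for speed.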
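To make the chunking idea concrete, here is a minimal NumPy sketch of exact top-k search over corpus slices: score one slice at a time, keep that slice's best candidates, and merge them into a running top-k list. It illustrates the technique under stated assumptions and is not the library's implementation: chunked_top_k is a hypothetical helper, it handles a single query (like query_chunk_size=1), NumPy arrays stand in for the tensors Sentence Transformers returns, and random unit vectors stand in for real embeddings.

```python
import numpy as np


def chunked_top_k(query_emb, corpus_emb, top_k=3, corpus_chunk_size=256):
    """Hypothetical helper: exact top-k dot-product search over corpus slices.

    Each step builds scores for at most corpus_chunk_size documents instead of
    one score per document in the whole corpus.
    """
    best_scores = np.full(top_k, -np.inf)
    best_ids = np.full(top_k, -1, dtype=np.int64)
    for start in range(0, corpus_emb.shape[0], corpus_chunk_size):
        chunk = corpus_emb[start:start + corpus_chunk_size]
        scores = chunk @ query_emb  # temporary scores for this slice only
        k = min(top_k, scores.shape[0])
        local = np.argpartition(-scores, k - 1)[:k]  # best k rows inside the slice
        # Merge the slice's candidates with the running best list, keep top_k.
        merged_scores = np.concatenate([best_scores, scores[local]])
        merged_ids = np.concatenate([best_ids, local + start])
        order = np.argsort(-merged_scores, kind="stable")[:top_k]
        best_scores, best_ids = merged_scores[order], merged_ids[order]
    return best_ids, best_scores


# Random unit vectors stand in for real document and query embeddings.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(2400, 384))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)
query = corpus[17] + 0.01 * rng.normal(size=384)  # query close to row 17
query /= np.linalg.norm(query)

chunked_ids, chunked_scores = chunked_top_k(query, corpus, top_k=3, corpus_chunk_size=256)
# Reference only: score the whole corpus in one pass and take the best 3 rows.
full_ids = np.argsort(-(corpus @ query))[:3]
```

The one-pass reference line exists only to compare against; the chunked loop should return the same row indices and scores while never holding more than 256 scores at once.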

The local sample repeats six support-document topics across 400 shards to build a 2,400-document corpus, searches for a password-reset question, and checks that the top hit is a password-reset document. Before saving results, replace the repeated records with stable application IDs and real text. Move the embeddings into FAISS, Qdrant, or another vector store when persistence, filtering, or approximate nearest-neighbor search becomes the main requirement.
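A sketch of the ID bookkeeping, assuming NumPy's .npz format as a simple stand-in for a real vector store: keep the application IDs in the same row order as the embeddings, so a hit's corpus_id (a row index) maps back to a stable ID after reloading. The records, IDs, file name, and hit are hypothetical, and random vectors replace encode_document() output so the sketch runs offline.

```python
import os
import tempfile

import numpy as np

# Hypothetical application records with stable IDs (not loop counters).
records = [
    {"id": "kb-1001", "text": "Reset a forgotten password from account settings."},
    {"id": "kb-1002", "text": "Export paid invoices as a CSV file for accounting."},
    {"id": "kb-1003", "text": "Rotate API tokens before sharing a new integration."},
]

# Stand-in for model.encode_document(...); row i belongs to records[i].
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(records), 384)).astype(np.float32)
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)
ids = np.array([record["id"] for record in records])

# Save IDs and embeddings in one file so they share the same row order.
path = os.path.join(tempfile.mkdtemp(), "corpus_embeddings.npz")
np.savez(path, ids=ids, embeddings=embeddings)

with np.load(path) as stored:
    stored_ids = stored["ids"].tolist()
    stored_embeddings = stored["embeddings"]

# A hypothetical hit shaped like semantic_search() output; corpus_id is a row index.
hit = {"corpus_id": 1, "score": 0.9}
hit_id = stored_ids[hit["corpus_id"]]
```

A vector store replaces the .npz file in production, but the rule is the same: every embedding row needs an ID that survives re-encoding and reordering.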

Steps to run large-corpus semantic search with Sentence Transformers:

  1. Install Sentence Transformers in the active Python environment.
    $ python -m pip install --upgrade sentence-transformers

    The first model run may download files from Hugging Face. Use the same environment that will encode the production corpus.
    Related: How to install Sentence Transformers with pip

  2. Create the large-corpus semantic search script.
    large_corpus_search.py
    from sentence_transformers import SentenceTransformer, util
     
     
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
     
    topics = [
        (
            "password-reset",
            "Reset a forgotten password from account settings, open the email link, "
            "and choose a new password.",
        ),
        (
            "invoice-export",
            "Export paid invoices from the billing dashboard as a CSV file for accounting.",
        ),
        (
            "api-token-rotation",
            "Rotate API tokens before sharing a new integration with a teammate.",
        ),
        (
            "notification-email",
            "Change the notification email address and confirm the new address for alerts.",
        ),
        (
            "workspace-theme",
            "Change the dashboard color theme for a workspace user profile.",
        ),
        (
            "vector-search",
            "Store dense embeddings in a vector index for semantic search retrieval.",
        ),
    ]
     
    # Repeat the six topics across 400 shards: 6 x 400 = 2,400 records with unique IDs.
    corpus = []
    for shard_id in range(1, 401):
        for topic, text in topics:
            corpus.append(
                {
                    "id": f"{topic}-{shard_id:03d}",
                    "topic": topic,
                    "text": f"{text} Region {shard_id:03d}.",
                }
            )
     
    documents = [item["text"] for item in corpus]
    query = "How does a user reset a forgotten password with an email link?"
     
    # Unit-length vectors let dot_score below return cosine similarities.
    document_embeddings = model.encode_document(
        documents,
        batch_size=128,
        normalize_embeddings=True,
        convert_to_tensor=True,
        show_progress_bar=False,
    )
    query_embedding = model.encode_query(
        query,
        normalize_embeddings=True,
        convert_to_tensor=True,
        show_progress_bar=False,
    )
     
    # Score one query against 256-document corpus slices and keep the 3 best hits.
    query_chunk_size = 1
    corpus_chunk_size = 256
    top_k = 3
     
    # semantic_search() returns one hit list per query; [0] selects this query's hits.
    hits = util.semantic_search(
        query_embedding,
        document_embeddings,
        query_chunk_size=query_chunk_size,
        corpus_chunk_size=corpus_chunk_size,
        top_k=top_k,
        score_function=util.dot_score,
    )[0]
     
    print(f"corpus documents: {len(corpus)}")
    print(f"embedding dimension: {document_embeddings.shape[1]}")
    print(f"query chunk size: {query_chunk_size}")
    print(f"corpus chunk size: {corpus_chunk_size}")
    print(f"top k: {top_k}")
    print(f"query: {query}")
    print("top matches:")
    for rank, hit in enumerate(hits, start=1):
        record = corpus[hit["corpus_id"]]
        print(
            f"{rank}. {record['id']} topic={record['topic']} "
            f"score={hit['score']:.4f}"
        )
     
    # Exit with an error if the best hit is not a password-reset document.
    if corpus[hits[0]["corpus_id"]]["topic"] != "password-reset":
        raise SystemExit("unexpected top topic")
     
    print("verification: PASS password reset documents ranked first")

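    encode_document() and encode_query() (available in Sentence Transformers 5.0 and later) apply a model's document and query prompts when the model defines them, so the script keeps working if all-MiniLM-L6-v2 is swapped for a model that encodes queries and documents differently. Normalized embeddings have unit length, so dot_score returns cosine-similarity values without renormalizing the vectors on every search.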
    Related: How to encode queries and documents with Sentence Transformers

  3. Run the script and confirm the chunked search settings.
    $ python large_corpus_search.py
    corpus documents: 2400
    embedding dimension: 384
    query chunk size: 1
    corpus chunk size: 256
    top k: 3
    query: How does a user reset a forgotten password with an email link?
    top matches:
    1. password-reset-253 topic=password-reset score=0.6977
    2. password-reset-106 topic=password-reset score=0.6953
    3. password-reset-112 topic=password-reset score=0.6937
    verification: PASS password reset documents ranked first

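    With 2,400 documents and corpus_chunk_size=256, semantic_search() scores the query against ten corpus chunks (nine of 256 documents and a final chunk of 96) instead of one 2,400-column score matrix. Because the search is exact, the chunk sizes change memory use and speed but not which documents rank highest. If the expected topic does not rank first, inspect the source text, the text chunk boundaries used when documents were split, the model choice, and the normalization setting before increasing top_k.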

  4. Remove the temporary script after the search behavior is verified.
    $ rm large_corpus_search.py
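Step 2 relies on normalize_embeddings=True plus util.dot_score reproducing cosine similarity. The NumPy sketch below checks that identity on two random vectors, which are stand-ins for real embeddings (384 matches the dimension reported in step 3): once both vectors have unit length, their plain dot product equals the cosine of the raw vectors.

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.normal(size=384)  # stand-in for a query embedding
b = rng.normal(size=384)  # stand-in for a document embedding

# Cosine similarity computed from the raw, unnormalized vectors.
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Normalize each vector once, as normalize_embeddings=True does at encode time.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)

# A plain dot product of the unit vectors gives the same value.
dot_of_units = float(a_unit @ b_unit)
```

Because the corpus is normalized once at encode time, dot_score avoids renormalizing the corpus embeddings on every semantic_search() call, which cos_sim would do.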
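The chunk arithmetic behind step 3 can be checked without loading a model. This sketch lists the half-open row ranges that a chunked scan covers for the 2,400-document sample with corpus_chunk_size=256; the variable names are illustrative and not part of the Sentence Transformers API.

```python
corpus_size = 2400
corpus_chunk_size = 256
query_chunk_size = 1

# Half-open [start, end) corpus row ranges, one per chunked scoring step.
chunks = [
    (start, min(start + corpus_chunk_size, corpus_size))
    for start in range(0, corpus_size, corpus_chunk_size)
]

num_chunks = len(chunks)  # ceil(corpus_size / corpus_chunk_size)
last_chunk_rows = chunks[-1][1] - chunks[-1][0]

# Largest temporary score block in one step: query rows x corpus rows.
peak_scores = query_chunk_size * max(end - start for start, end in chunks)
# Size of the block if every document were scored in a single step.
full_scores = query_chunk_size * corpus_size
```

With a single query the saving is small (256 scores per step instead of 2,400); it matters more as query_chunk_size and the corpus grow, because the temporary block scales with their product.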