How to refresh chatbot embeddings with Sentence Transformers

Chatbot retrieval indexes can return stale answers when a source document changes but the stored embedding still describes the old text. Refreshing embeddings for changed records keeps the chatbot's knowledge index aligned with the latest content without rebuilding every vector on each run.

Sentence Transformers supports retrieval workflows by embedding stored passages and user queries into the same vector space. In a chatbot index, use encode_document() for source text and encode_query() for incoming questions (both methods were added in Sentence Transformers v5.0), and store the document id, content hash, text, and vector together.
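
As a rough illustration of why passages and questions must share one vector space, the sketch below ranks two passages against a question using hand-written three-dimensional vectors. The vectors and the normalize() and dot() helpers are made up for this example; real embeddings come from the model and have hundreds of dimensions.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Return the dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical toy vectors, not real model output.
documents = {
    "billing": normalize([0.9, 0.1, 0.0]),
    "account-security": normalize([0.1, 0.8, 0.6]),
}
query = normalize([0.0, 0.7, 0.7])

# For unit-length vectors the dot product equals cosine similarity.
scores = {doc_id: dot(vec, query) for doc_id, vec in documents.items()}
best_id = max(scores, key=scores.get)
print(best_id)
```

The passage whose vector points in nearly the same direction as the question vector receives the highest score, which is the same ranking rule the refresh script applies to real embeddings.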

This walkthrough uses a small chatbot_docs.jsonl source file and a chatbot_embeddings.npz vector file so that each run reports exactly which records were re-embedded. A changed account-security document is re-embedded, saved, and returned for an API token question; a second run confirms that unchanged source records are reused.
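
The reuse decision itself can be sketched without loading a model. The example below is a minimal illustration, assuming the stored index is reduced to a mapping of document id to SHA-256 hash; the plan_refresh() helper and the shortened sample texts are hypothetical and only mirror the comparison the refresh script performs.

```python
import hashlib


def text_hash(text):
    """Return the SHA-256 hex digest of the UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_refresh(docs, stored_hashes):
    """Split document ids into those needing a new embedding and those reusable.

    docs: list of {"id": ..., "text": ...} records from the source file.
    stored_hashes: mapping of id -> hash from an earlier run (hypothetical shape).
    """
    refresh, reuse = [], []
    for doc in docs:
        if stored_hashes.get(doc["id"]) == text_hash(doc["text"]):
            reuse.append(doc["id"])
        else:
            refresh.append(doc["id"])
    return refresh, reuse


old_docs = [
    {"id": "billing", "text": "Update the billing contact."},
    {"id": "account-security", "text": "Reset an account password."},
]
stored = {doc["id"]: text_hash(doc["text"]) for doc in old_docs}

# Only the account-security text changes between runs.
new_docs = [
    old_docs[0],
    {"id": "account-security", "text": "Rotate an API token."},
]

refresh, reuse = plan_refresh(new_docs, stored)
print("refresh:", refresh)
print("reuse:", reuse)
```

A record missing from the stored mapping is treated as changed, which is why a first run with no index file re-embeds everything.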

Steps to refresh chatbot embeddings with Sentence Transformers:

  1. Create the initial chatbot knowledge source file.
    chatbot_docs.jsonl
    {"id":"billing","text":"Update the billing contact from Account settings after finance approves the change."}
    {"id":"account-security","text":"Reset an account password from the profile security page and send the user a confirmation email."}
  2. Add a refresh script that hashes each document and reuses vectors when the source text is unchanged.
    refresh_chatbot_embeddings.py
    import hashlib
    import json
    import sys
    from pathlib import Path
     
    import numpy as np
    from sentence_transformers import SentenceTransformer
     
    SOURCE = Path("chatbot_docs.jsonl")
    INDEX = Path("chatbot_embeddings.npz")
    MODEL_NAME = "sentence-transformers/paraphrase-albert-small-v2"
     
     
    def load_documents(path):
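        # Read one JSON object per non-empty line; UTF-8 matches the encoding used for hashing.
        return [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines() if line.strip()]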
     
     
    def text_hash(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()
     
     
    def load_existing_index(path):
        if not path.exists():
            return {}
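        # The arrays hold plain strings and floats, so pickle support is not needed.
        # The context manager closes the file before np.savez rewrites it.
        with np.load(path, allow_pickle=False) as data:
            return {
                str(doc_id): {"hash": str(doc_hash), "embedding": embedding}
                for doc_id, doc_hash, embedding in zip(data["ids"], data["hashes"], data["embeddings"])
            }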
     
     
    def refresh_index(query):
        docs = load_documents(SOURCE)
        previous = load_existing_index(INDEX)
        model = SentenceTransformer(MODEL_NAME)
     
        embeddings = []
        hashes = []
        refreshed = []
     
        for doc in docs:
            doc_id = doc["id"]
            doc_hash = text_hash(doc["text"])
            old = previous.get(doc_id)
            if old and old["hash"] == doc_hash:
                embedding = old["embedding"]
            else:
                embedding = model.encode_document(doc["text"], normalize_embeddings=True, show_progress_bar=False)
                refreshed.append(doc_id)
            embeddings.append(embedding)
            hashes.append(doc_hash)
     
        matrix = np.vstack(embeddings)
        np.savez(
            INDEX,
            ids=np.array([doc["id"] for doc in docs]),
            hashes=np.array(hashes),
            texts=np.array([doc["text"] for doc in docs]),
            embeddings=matrix,
        )
     
        query_embedding = model.encode_query(query, normalize_embeddings=True, show_progress_bar=False)
        scores = matrix @ query_embedding
        best = int(np.argmax(scores))
        best_id = docs[best]["id"]
     
        print(f"documents indexed: {len(docs)}")
        print("documents refreshed: " + (", ".join(refreshed) if refreshed else "none"))
        print(f"top result: {best_id} score={float(scores[best]):.4f}")
        print(docs[best]["text"])
     
     
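    if __name__ == "__main__":
        # Exit with a usage message instead of an IndexError when the question is missing.
        if len(sys.argv) != 2:
            sys.exit('usage: python refresh_chatbot_embeddings.py "QUESTION"')
        refresh_index(sys.argv[1])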

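    The script stores each document's SHA-256 hash beside its vector, so a later run can tell whether the source text changed. encode_document() embeds knowledge passages and encode_query() embeds the chatbot question. Because normalize_embeddings=True returns unit-length vectors, the dot product in matrix @ query_embedding equals cosine similarity.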

  3. Build the first chatbot embedding index.
    $ python refresh_chatbot_embeddings.py "How do I reset my password?"
    documents indexed: 2
    documents refreshed: billing, account-security
    top result: account-security score=0.4894
    Reset an account password from the profile security page and send the user a confirmation email.

    The first run refreshes both records because chatbot_embeddings.npz does not exist yet.

  4. Replace the account-security entry in the source file with the updated text.
    chatbot_docs.jsonl
    {"id":"billing","text":"Update the billing contact from Account settings after finance approves the change."}
    {"id":"account-security","text":"Rotate an API token from the profile security page and notify the chatbot owner after the credential is replaced."}
  5. Refresh the chatbot embeddings after the source text changes.
    $ python refresh_chatbot_embeddings.py "How do I rotate the API token?"
    documents indexed: 2
    documents refreshed: account-security
    top result: account-security score=0.5153
    Rotate an API token from the profile security page and notify the chatbot owner after the credential is replaced.

    The line "documents refreshed: account-security" shows that only the changed source record was re-embedded, and "documents indexed: 2" confirms that the stored vector set still contains both chatbot knowledge records.

  6. Run the refresh again to confirm unchanged source records are reused.
    $ python refresh_chatbot_embeddings.py "How do I rotate the API token?"
    documents indexed: 2
    documents refreshed: none
    top result: account-security score=0.5153
    Rotate an API token from the profile security page and notify the chatbot owner after the credential is replaced.

    The line "documents refreshed: none" confirms that both stored vectors were reused. Use the same pattern after a content sync and before reloading any chatbot process that keeps embeddings in memory.
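
One gap worth noting: the script hashes only the document text, so changing MODEL_NAME would silently reuse vectors produced by the previous model. A possible extension, sketched below as an assumption rather than part of the script above, folds the model name into the stored fingerprint. The fingerprint() helper is hypothetical, and the second model name is used only to illustrate a switch.

```python
import hashlib


def fingerprint(model_name, text):
    """Hash the model name together with the text.

    A NUL byte separates the two fields so that, assuming model names
    contain no NUL byte, different (model_name, text) pairs cannot
    produce the same input bytes.
    """
    payload = model_name.encode("utf-8") + b"\x00" + text.encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


text = "Rotate an API token from the profile security page."
current = fingerprint("sentence-transformers/paraphrase-albert-small-v2", text)

# Same model and text: the stored vector could be reused.
same = fingerprint("sentence-transformers/paraphrase-albert-small-v2", text)

# Illustrative model switch: the fingerprint changes, forcing a re-embed.
switched = fingerprint("sentence-transformers/all-MiniLM-L6-v2", text)

print(current == same)
print(current == switched)
```

Storing this value in place of the text hash would leave the rest of the comparison unchanged.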