How to use embedding prompts with Sentence Transformers

Embedding prompts add a short task prefix before Sentence Transformers tokenizes text for an embedding model. Retrieval models such as E5, BGE, and INSTRUCTOR often expect different wording for user queries and corpus passages, so the prompt path becomes part of the embedding pipeline rather than a cosmetic string.

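SentenceTransformer accepts a prompts dictionary that maps prompt names to prompt text, and encode() can take either prompt_name (a key in that dictionary) or prompt (an inline string); when both are given, the inline prompt is used. For retrieval code, encode_query() and encode_document() keep the roles explicit: encode_query() applies the prompt named query when one is configured, while encode_document() applies a document-side prompt, checking the names document, passage, and corpus in that order in recent releases.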
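
The lookup described above can be pictured with a small, library-free sketch. The helper names resolve_prompt and apply_prompt are hypothetical and are not part of Sentence Transformers; the sketch only assumes that an inline prompt takes precedence over a named one and that the chosen text is prepended to each input before tokenization.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the mapping passed to SentenceTransformer(prompts=...).
PROMPTS: Dict[str, str] = {"query": "query: ", "document": "passage: "}


def resolve_prompt(
    prompts: Dict[str, str],
    prompt: Optional[str] = None,
    prompt_name: Optional[str] = None,
) -> str:
    """Return the text to prepend: inline prompt first, then named prompt, else nothing."""
    if prompt is not None:
        return prompt
    if prompt_name is not None:
        if prompt_name not in prompts:
            raise ValueError(f"unknown prompt name: {prompt_name!r}")
        return prompts[prompt_name]
    return ""


def apply_prompt(texts: List[str], prefix: str) -> List[str]:
    """Prepend the resolved prefix to every text, as happens before tokenization."""
    return [prefix + text for text in texts]


if __name__ == "__main__":
    prefix = resolve_prompt(PROMPTS, prompt_name="query")
    print(apply_prompt(["How do I reset a forgotten password?"], prefix))
```

Because the named and inline paths end in the same prepended string, they should produce the same model input whenever the two strings are identical.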

Use the exact prefixes from the selected model card when building a production index. A small local corpus is enough to prove the prompt mapping, embedding shapes, inline prompt equivalence, and top-ranked document without depending on a vector database.

Steps to use Sentence Transformers embedding prompts:

Create the prompt check script in the project.

embedding_prompts.py

from sentence_transformers import SentenceTransformer
import numpy as np


# Register named prompts; replace the prefix text with the model card's values.
model = SentenceTransformer(
    "sentence-transformers/all-MiniLM-L6-v2",
    prompts={
        "query": "query: ",
        "document": "passage: ",
    },
)

query = "How do I reset a forgotten password?"
documents = [
    "Generate quarterly revenue charts from a CSV export.",
    "Reset a lost account password from the profile security page.",
    "Tune the database connection pool for a busy API server.",
]

# encode_query() picks up the "query" prompt from the prompts dictionary.
query_embedding = model.encode_query(
    [query],
    normalize_embeddings=True,
    show_progress_bar=False,
)
# encode_document() picks up the "document" prompt from the prompts dictionary.
document_embeddings = model.encode_document(
    documents,
    normalize_embeddings=True,
    show_progress_bar=False,
)
# Encode the same query with an inline prompt to compare against the named prompt.
inline_query_embedding = model.encode(
    [query],
    prompt="query: ",
    normalize_embeddings=True,
    show_progress_bar=False,
)

# Score the single query against every document and pick the best match.
scores = model.similarity(query_embedding, document_embeddings)[0]
best_index = int(scores.argmax())
# Largest element-wise difference between the named and inline query embeddings.
inline_delta = float(np.max(np.abs(query_embedding - inline_query_embedding)))

print(f"prompt keys: {', '.join(sorted(model.prompts))}")
print(f"query shape: {query_embedding.shape}")
print(f"document shape: {document_embeddings.shape}")
print(f"inline prompt delta: {inline_delta:.6f}")
print(f"top match: doc-{best_index + 1}")
print(f"score: {scores[best_index]:.3f}")
print(f"text: {documents[best_index]}")

The document key is named for the encode_document() role, while its prompt text is the common retrieval prefix "passage: ". Replace both prompt texts with the prefixes required by the model card before encoding real data.
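
When the prefixes are swapped for the model card's values, a small guard can catch a missing role before any real data is encoded. The sketch below is only an illustration: validate_prompts and REQUIRED_ROLES are hypothetical names, and the rule that both roles must be present as strings is an assumption about this retrieval setup, not a requirement of the library.

```python
from typing import Dict, Iterable

# Assumed roles for a retrieval setup that calls encode_query() and encode_document().
REQUIRED_ROLES = ("query", "document")


def validate_prompts(prompts: Dict[str, str], required: Iterable[str] = REQUIRED_ROLES) -> None:
    """Raise ValueError if a required role is missing or its prompt is not a string."""
    for role in required:
        if role not in prompts:
            raise ValueError(f"missing prompt for role {role!r}")
        if not isinstance(prompts[role], str):
            raise ValueError(f"prompt for role {role!r} must be a string")


if __name__ == "__main__":
    validate_prompts({"query": "query: ", "document": "passage: "})
    print("prompt mapping looks complete")
```

An empty string is accepted on purpose, since some model cards specify a prefix for only one side.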

Run the script.

$ python embedding_prompts.py
prompt keys: document, query
query shape: (1, 384)
document shape: (3, 384)
inline prompt delta: 0.000000
top match: doc-2
score: 0.658
text: Reset a lost account password from the profile security page.

Confirm that the named and inline query prompts match.

inline prompt delta should stay at 0.000000 when the inline prompt="query: " and the named query prompt picked up by encode_query() resolve to the same prepended text. A nonzero value means the inline prompt string and the named prompt text differ.
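
The same check can be turned into a pass/fail test instead of a printed number. This is a minimal sketch on plain Python lists; max_abs_delta, prompts_match, and the 1e-6 tolerance are assumptions made for illustration, and the short vectors stand in for real embedding rows.

```python
from typing import Sequence


def max_abs_delta(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


def prompts_match(named_vec: Sequence[float], inline_vec: Sequence[float], tol: float = 1e-6) -> bool:
    """True when the named-prompt and inline-prompt embeddings agree within tol."""
    return max_abs_delta(named_vec, inline_vec) <= tol


if __name__ == "__main__":
    named = [0.12, -0.40, 0.91]    # toy stand-in for the encode_query() row
    inline = [0.12, -0.40, 0.91]   # toy stand-in for the encode(prompt=...) row
    print("prompts match:", prompts_match(named, inline))
```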

Confirm that the query and document embeddings can be compared.

The query matrix has one row, the document matrix has one row per corpus item, and both have 384 columns for the selected model. The top match should be the password-reset passage.
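
To see why matching column counts matter, the comparison can be sketched without any library: each vector is scaled to unit length, and the score is the dot product, which then equals cosine similarity. The function names and the three-dimensional toy vectors below are illustrative assumptions, not real model output.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def rank(query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Return the index of the best-scoring document and all cosine scores."""
    q = normalize(query_vec)
    scores = []
    for doc in doc_vecs:
        if len(doc) != len(q):
            raise ValueError("query and document dimensions differ")
        d = normalize(doc)
        scores.append(sum(a * b for a, b in zip(q, d)))
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    toy_query = [1.0, 0.0, 1.0]
    toy_docs = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.9], [-1.0, 0.0, 0.0]]
    best, scores = rank(toy_query, toy_docs)
    print(f"top match: doc-{best + 1}")
```

A dimension mismatch raises an error here, which mirrors the reason the query and document shapes are printed before ranking.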

Remove the temporary prompt check script after copying the prompt mapping into the retrieval code.
```
$ rm embedding_prompts.py
```

Author: Mohd Shakir Zakaria