Embedding prompts add a short task prefix before Sentence Transformers tokenizes text for an embedding model. Retrieval models such as E5, BGE, and INSTRUCTOR often expect different wording for user queries and corpus passages, so the prompt path becomes part of the embedding pipeline rather than a cosmetic string.
SentenceTransformer accepts a prompts dictionary that maps prompt names to prompt text, and encode() can take either prompt_name to look up a named prompt or prompt to pass the text inline. For retrieval code, encode_query() and encode_document() (available in Sentence Transformers v5.0 and later) keep the roles explicit: encode_query() applies the prompt named query when one is configured, and encode_document() applies the document-side prompt in the same way.
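Conceptually, a prompt is plain text prepended to each input before tokenization. The following is a minimal pure-Python sketch of that resolution step, not the library's implementation. The helper names resolve_prompt and apply_prompt are hypothetical, and the rule that an inline prompt takes precedence over a named one follows the documented behavior of encode().

```python
from typing import Optional

# Hypothetical prompt table mirroring the prompts= dictionary described above.
PROMPTS = {"query": "query: ", "document": "passage: "}


def resolve_prompt(
    prompts: dict[str, str],
    prompt_name: Optional[str] = None,
    prompt: Optional[str] = None,
) -> str:
    """Return the text to prepend; an inline prompt wins over a named one."""
    if prompt is not None:
        return prompt
    if prompt_name is not None:
        if prompt_name not in prompts:
            raise ValueError(f"unknown prompt name: {prompt_name!r}")
        return prompts[prompt_name]
    return ""  # no prompt requested


def apply_prompt(texts: list[str], prefix: str) -> list[str]:
    """Prepend the resolved prompt to every input string."""
    return [prefix + text for text in texts]


question = "How do I reset a forgotten password?"
named = apply_prompt([question], resolve_prompt(PROMPTS, prompt_name="query"))
inline = apply_prompt([question], resolve_prompt(PROMPTS, prompt="query: "))

print(named[0])         # query: How do I reset a forgotten password?
print(named == inline)  # True
```

Because both paths produce the same string, the model sees identical input either way, which is why a named prompt and a matching inline prompt should yield the same embedding.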
Use the exact prefixes from the selected model card when building a production index. A small local corpus is enough to prove the prompt mapping, embedding shapes, inline prompt equivalence, and top-ranked document without depending on a vector database.
from sentence_transformers import SentenceTransformer
import numpy as np

# Named prompts; the keys match the encode_query() and encode_document() roles.
model = SentenceTransformer(
    "sentence-transformers/all-MiniLM-L6-v2",
    prompts={
        "query": "query: ",
        "document": "passage: ",
    },
)

query = "How do I reset a forgotten password?"
documents = [
    "Generate quarterly revenue charts from a CSV export.",
    "Reset a lost account password from the profile security page.",
    "Tune the database connection pool for a busy API server.",
]

# Role-specific encoding: each call picks up its configured prompt.
query_embedding = model.encode_query(
    [query],
    normalize_embeddings=True,
    show_progress_bar=False,
)
document_embeddings = model.encode_document(
    documents,
    normalize_embeddings=True,
    show_progress_bar=False,
)

# Same query encoded with the prompt text passed inline, for comparison.
inline_query_embedding = model.encode(
    [query],
    prompt="query: ",
    normalize_embeddings=True,
    show_progress_bar=False,
)

# Score the query against every document and pick the best one.
scores = model.similarity(query_embedding, document_embeddings)[0]
best_index = int(scores.argmax())

# Largest element-wise difference between the named and inline query embeddings.
inline_delta = float(np.max(np.abs(query_embedding - inline_query_embedding)))

print(f"prompt keys: {', '.join(sorted(model.prompts))}")
print(f"query shape: {query_embedding.shape}")
print(f"document shape: {document_embeddings.shape}")
print(f"inline prompt delta: {inline_delta:.6f}")
print(f"top match: doc-{best_index + 1}")
print(f"score: {scores[best_index]:.3f}")
print(f"text: {documents[best_index]}")
The document key is named for the encode_document() role, while its prompt text uses the common retrieval prefix "passage: ". Replace both prompt texts with the prefixes required by the model card before encoding real data.
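One way to keep model-card prefixes out of the encoding code is a small lookup table keyed by model name. The sketch below is illustrative only: MODEL_PREFIXES and prompts_for are hypothetical names, and the prefix strings are assumptions based on the E5 and BGE v1.5 model cards that should be confirmed against the card for the exact checkpoint in use.

```python
# Prefix pairs per model. These values are assumptions to verify against each
# model card; they are not read from the models themselves.
MODEL_PREFIXES = {
    "intfloat/e5-base-v2": {
        "query": "query: ",
        "document": "passage: ",
    },
    "BAAI/bge-base-en-v1.5": {
        "query": "Represent this sentence for searching relevant passages: ",
        "document": "",  # assumed: no passage-side prefix for this family
    },
}


def prompts_for(model_name: str) -> dict[str, str]:
    """Return a copy of the prompt mapping, failing loudly on unknown models."""
    try:
        return dict(MODEL_PREFIXES[model_name])
    except KeyError:
        raise KeyError(f"no prefixes recorded for {model_name!r}") from None


prompts = prompts_for("intfloat/e5-base-v2")
print(prompts["query"] + "How do I reset a forgotten password?")
```

The returned dictionary can be passed as the prompts argument when constructing SentenceTransformer. Raising on an unknown model avoids silently indexing a corpus with the wrong prefix or with none at all.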
$ python embedding_prompts.py
prompt keys: document, query
query shape: (1, 384)
document shape: (3, 384)
inline prompt delta: 0.000000
top match: doc-2
score: 0.658
text: Reset a lost account password from the profile security page.
The inline prompt delta should stay at 0.000000 when prompt="query: " and prompt_name="query" resolve to the same prepended text. A nonzero value means the inline prompt string and the named prompt text differ, often by something as small as a missing trailing space.
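The delta is the largest element-wise absolute difference between two embeddings. The sketch below reproduces that check in plain Python on made-up three-dimensional vectors, which are not real model output. The max_abs_delta helper and the 1e-6 tolerance are assumptions chosen to match the six decimal places printed by the script.

```python
def max_abs_delta(a: list[float], b: list[float]) -> float:
    """Largest element-wise absolute difference between two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


TOLERANCE = 1e-6  # assumed threshold, matching the :.6f print format

# Toy vectors for illustration only.
named_vec = [0.12, -0.40, 0.91]
inline_vec = [0.12, -0.40, 0.91]   # same prompt text, same vector
other_vec = [0.10, -0.38, 0.92]    # stands in for a different prompt text

print(max_abs_delta(named_vec, inline_vec) <= TOLERANCE)  # True
print(max_abs_delta(named_vec, other_vec) <= TOLERANCE)   # False
```

Comparing against a small tolerance rather than exact zero keeps the check usable if it is later moved into an automated test.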
The query matrix has one row, the document matrix has one row per corpus item, and both have 384 columns for the selected model. The top match should be the password-reset passage.
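Because the script normalizes the embeddings, cosine similarity reduces to a dot product, and the top match is the document row with the highest score. The sketch below shows that ranking step in plain Python on made-up three-dimensional vectors. The vectors and the helper names normalize and dot are illustrative assumptions, not real embeddings or library calls.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy vectors for illustration only: one query row, three document rows.
query_vec = normalize([0.9, 0.1, 0.2])
doc_vecs = [
    normalize([0.1, 0.9, 0.1]),
    normalize([0.8, 0.2, 0.3]),
    normalize([0.2, 0.1, 0.9]),
]

scores = [dot(query_vec, doc) for doc in doc_vecs]
best_index = max(range(len(scores)), key=scores.__getitem__)

print(f"document shape: ({len(doc_vecs)}, {len(doc_vecs[0])})")  # (3, 3)
print(f"top match: doc-{best_index + 1}")                        # doc-2
```

The same shape rule applies at full size: one row per input text and one column per embedding dimension of the selected model.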
$ rm embedding_prompts.py