How to generate sparse embeddings with Sentence Transformers

Sparse text embeddings keep only a small set of weighted vocabulary dimensions active for each input. In Sentence Transformers, SparseEncoder models produce this representation, so SPLADE-style retrieval can be checked before a sparse search index is built.

The script loads a public sparse encoder, encodes documents with encode_document() and a query with encode_query(), then prints tensor shapes, active dimension counts, and decoded token weights. The second dimension of each output is the model vocabulary size, not a dense embedding dimension, so most values should be zero.
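
To make that shape concrete, the following sketch stores one embedding as a mapping from vocabulary index to weight and expands it into a dense row. The width of 30522 matches the sample output shown later in this article, but the indices and weights are invented for illustration and are not produced by any model.

```python
# Hypothetical sparse embedding: vocabulary index -> weight.
# The indices and weights below are made up for illustration.
VOCAB_SIZE = 30522  # width shown in this article's sample output

sparse_row = {1012: 0.8, 4773: 2.0, 8241: 1.9}

# Expanding to a dense row shows how few positions carry any weight.
dense_row = [0.0] * VOCAB_SIZE
for index, weight in sparse_row.items():
    dense_row[index] = weight

active = sum(1 for value in dense_row if value != 0.0)
print(f"width: {len(dense_row)}")
print(f"active dims: {active}")
print(f"zero dims: {len(dense_row) - active}")
```

Storing only the nonzero entries is what keeps a vocabulary-wide vector cheap to hold and to index.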

Use the same model, max_active_dims limit, query/document method split, and sparse tensor setting when moving from a smoke test into a search engine. Matching vocabulary widths, nonzero active dimensions, high sparsity, and an expected top match show that the sparse encoding path is ready for indexing experiments.
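
One way to keep those settings aligned is to record them in a single place and compare the indexing side against the query side. The sketch below is only an illustration: SparseSettings and mismatched_fields are hypothetical names, not part of Sentence Transformers, and the value 128 is an invented mismatch.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SparseSettings:
    """Hypothetical record of the settings that must match across both sides."""

    model_name: str
    max_active_dims: int
    convert_to_sparse_tensor: bool


def mismatched_fields(index_time: SparseSettings, query_time: SparseSettings) -> list:
    """Return the names of settings that differ between the two sides."""
    left, right = asdict(index_time), asdict(query_time)
    return [name for name in left if left[name] != right[name]]


smoke_test = SparseSettings("rasyosef/splade-tiny", 64, True)
# Invented example of a search engine configured with a different limit.
search_engine = SparseSettings("rasyosef/splade-tiny", 128, True)

print(mismatched_fields(smoke_test, search_engine))
```

An empty list means both sides encode with the same settings.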

Steps to generate sparse embeddings with Sentence Transformers:

Create a sparse embedding smoke-test script.

sparse_embeddings_generate.py

from sentence_transformers import SparseEncoder

# Load a small public sparse encoder and cap each embedding at 64 active dimensions.
model = SparseEncoder("rasyosef/splade-tiny", max_active_dims=64)

documents = [
    "Reset expired password links from the account security page.",
    "Renew TLS certificates before the web server reload.",
    "Export customer invoices from the finance dashboard.",
]
query = "web server certificate renewal"

# Encode documents and the query with their dedicated methods, keeping sparse tensors.
document_embeddings = model.encode_document(
    documents,
    convert_to_sparse_tensor=True,
    show_progress_bar=False,
)
# Pass the query as a one-item list so the result stays two-dimensional.
query_embedding = model.encode_query(
    [query],
    convert_to_sparse_tensor=True,
    show_progress_bar=False,
)

# Summarize sparsity, decode the strongest query tokens, and score the query against each document.
document_stats = SparseEncoder.sparsity(document_embeddings)
query_stats = SparseEncoder.sparsity(query_embedding)
query_tokens = model.decode(query_embedding, top_k=4)[0]
scores = model.similarity(query_embedding, document_embeddings)[0]
best_index = int(scores.argmax())

# Report shapes, sparsity statistics, decoded tokens, and the best match.
print(f"document shape: {tuple(document_embeddings.shape)}")
print(f"query shape: {tuple(query_embedding.shape)}")
print(f"document active dims: {document_stats['active_dims']:.1f}")
print(f"query active dims: {query_stats['active_dims']:.1f}")
print(f"query sparsity: {query_stats['sparsity_ratio']:.4f}")
print("top query tokens:")
for token, weight in query_tokens:
    print(f"  {token}: {weight:.3f}")
print(f"top match: doc-{best_index + 1}")
print(f"text: {documents[best_index]}")

rasyosef/splade-tiny keeps the smoke test small. Replace it with the sparse encoder model selected for the corpus before building a production index.
Related: How to choose a Sentence Transformers model for semantic search

Run the script.

$ python sparse_embeddings_generate.py
document shape: (3, 30522)
query shape: (1, 30522)
document active dims: 22.7
query active dims: 15.0
query sparsity: 0.9995
top query tokens:
  certificate: 2.085
  web: 2.036
  server: 1.926
  renewal: 1.857
top match: doc-2
text: Renew TLS certificates before the web server reload.

convert_to_sparse_tensor=True returns sparse tensors instead of dense ones. The second value in each shape is the model's vocabulary width, 30522 in this run.

Confirm that the document and query shapes share the same vocabulary width.

The query is encoded as a one-item list so the output remains a two-dimensional sparse tensor, matching the document batch shape.
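
The same check can be automated on the printed shape tuples. This is a minimal sketch using plain tuples copied from the sample output. The helper names vocab_width and same_vocab_width are hypothetical and not part of the library.

```python
def vocab_width(shape: tuple) -> int:
    """Return the last dimension of a 2-D (batch, vocabulary) shape."""
    if len(shape) != 2:
        raise ValueError(f"expected a 2-D shape, got {shape}")
    return shape[1]


def same_vocab_width(document_shape: tuple, query_shape: tuple) -> bool:
    """Check that both batches were encoded into the same vocabulary space."""
    return vocab_width(document_shape) == vocab_width(query_shape)


# Shapes copied from the sample output above.
document_shape = (3, 30522)
query_shape = (1, 30522)

print(same_vocab_width(document_shape, query_shape))
```

A one-dimensional query shape raises an error here, which points to a query that was passed as a bare string instead of a one-item list.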

Check the active dimension and sparsity lines.

active_dims reports the average number of nonzero dimensions per embedding after max_active_dims=64 is applied, so it should never exceed 64. sparsity_ratio shows that nearly every vocabulary dimension remains zero.
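
The printed ratio can be reproduced by hand. The sketch below assumes sparsity_ratio is one minus the fraction of active dimensions, which is consistent with the sample output. The inputs are copied from that output and from the script's max_active_dims setting.

```python
VOCAB_SIZE = 30522        # second shape value in the sample output
MAX_ACTIVE_DIMS = 64      # limit passed to SparseEncoder in the script
query_active_dims = 15.0  # "query active dims" line in the sample output

# Assumed definition: sparsity is the share of dimensions that stay zero.
query_sparsity = 1 - query_active_dims / VOCAB_SIZE

# Lowest sparsity possible if every allowed dimension were active.
sparsity_floor = 1 - MAX_ACTIVE_DIMS / VOCAB_SIZE

print(f"query sparsity: {query_sparsity:.4f}")
print(f"sparsity floor: {sparsity_floor:.4f}")
```

With these inputs the first line prints 0.9995, matching the script output, and the floor works out to about 0.9979.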

Review the decoded query tokens and top match.

decode() returns weighted token dimensions, and the top match should be the certificate-renewal document for the sample query. For an indexed retrieval flow, reuse the same encode_query() and encode_document() split when building sparse search.
Related: How to build sparse semantic search with Sentence Transformers
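
The ranking can be reasoned about without the model. The sketch below assumes dot-product scoring, where only tokens active in both the query and the document contribute. The query weights are the four decoded values from the sample output, while the document weights are invented for illustration. sparse_dot and top_k are hypothetical helpers, not library functions.

```python
# Four strongest query weights, copied from the sample output above.
query_weights = {"certificate": 2.085, "web": 2.036, "server": 1.926, "renewal": 1.857}

# Hypothetical document weights, invented for illustration only.
document_weights = [
    {"password": 1.9, "reset": 1.7, "account": 1.2},
    {"certificate": 1.8, "tls": 1.6, "server": 1.4, "web": 1.1},
    {"invoice": 2.0, "finance": 1.5, "export": 1.3},
]


def sparse_dot(query: dict, document: dict) -> float:
    """Sum weight products over the tokens that both vectors activate."""
    return sum(weight * document[token] for token, weight in query.items() if token in document)


def top_k(weights: dict, k: int) -> list:
    """Return the k highest-weighted (token, weight) pairs."""
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)[:k]


scores = [sparse_dot(query_weights, document) for document in document_weights]
best_index = max(range(len(scores)), key=scores.__getitem__)

print(top_k(query_weights, 2))
print(f"top match: doc-{best_index + 1}")
```

Documents that share no active tokens with the query score zero, which is why only the certificate document can win in this toy setup.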

Remove the temporary script after copying the pattern into the project.
```
$ rm sparse_embeddings_generate.py
```