Sparse text embeddings keep only a small set of weighted vocabulary dimensions active for each input. In Sentence Transformers, SparseEncoder models expose that representation, so SPLADE-style retrieval can be checked before a sparse search index is built.
The script loads a public sparse encoder, encodes documents with encode_document() and a query with encode_query(), then prints tensor shapes, active dimension counts, and decoded token weights. Each embedding is as wide as the model vocabulary, not a dense embedding dimension, so most values should stay zero.
Use the same model, max_active_dims limit, query/document method split, and sparse tensor setting when moving from a smoke test into a search engine. Matching vocabulary widths, nonzero active dimensions, high sparsity, and an expected top match show that the sparse encoding path is ready for indexing experiments.
from sentence_transformers import SparseEncoder

model = SparseEncoder("rasyosef/splade-tiny", max_active_dims=64)

documents = [
    "Reset expired password links from the account security page.",
    "Renew TLS certificates before the web server reload.",
    "Export customer invoices from the finance dashboard.",
]
query = "web server certificate renewal"

document_embeddings = model.encode_document(
    documents,
    convert_to_sparse_tensor=True,
    show_progress_bar=False,
)
query_embedding = model.encode_query(
    [query],
    convert_to_sparse_tensor=True,
    show_progress_bar=False,
)

document_stats = SparseEncoder.sparsity(document_embeddings)
query_stats = SparseEncoder.sparsity(query_embedding)
query_tokens = model.decode(query_embedding, top_k=4)[0]
scores = model.similarity(query_embedding, document_embeddings)[0]
best_index = int(scores.argmax())

print(f"document shape: {tuple(document_embeddings.shape)}")
print(f"query shape: {tuple(query_embedding.shape)}")
print(f"document active dims: {document_stats['active_dims']:.1f}")
print(f"query active dims: {query_stats['active_dims']:.1f}")
print(f"query sparsity: {query_stats['sparsity_ratio']:.4f}")
print("top query tokens:")
for token, weight in query_tokens:
    print(f" {token}: {weight:.3f}")
print(f"top match: doc-{best_index + 1}")
print(f"text: {documents[best_index]}")
rasyosef/splade-tiny keeps the smoke test small. Replace it with the sparse encoder model selected for the corpus before building a production index.
Related: How to choose a Sentence Transformers model for semantic search
$ python sparse_embeddings_generate.py
document shape: (3, 30522)
query shape: (1, 30522)
document active dims: 22.7
query active dims: 15.0
query sparsity: 0.9995
top query tokens:
 certificate: 2.085
 web: 2.036
 server: 1.926
 renewal: 1.857
top match: doc-2
text: Renew TLS certificates before the web server reload.
convert_to_sparse_tensor=True returns sparse tensors. The second shape value is the vocabulary-width dimension used by the model.
The query is encoded as a one-item list so the output remains a two-dimensional sparse tensor, matching the document batch shape.
active_dims counts nonzero dimensions after max_active_dims=64 is applied, while sparsity_ratio shows that nearly every vocabulary dimension remains zero.
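The arithmetic behind these two statistics can be sketched in plain Python. This is an illustration, not the library's implementation: sparsity_stats is a hypothetical helper, and each embedding row is modeled as a dict mapping a vocabulary index to a nonzero weight. It assumes active_dims is the mean nonzero count per row and sparsity_ratio is the fraction of zero entries, which is consistent with the sample output (15 active dimensions out of 30522).

```python
def sparsity_stats(rows, vocab_size):
    """Return mean active dimensions and the fraction of zero entries.

    rows: list of dicts mapping vocabulary index -> weight, a hypothetical
    stand-in for the rows of a sparse embedding tensor.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    # Count only entries whose weight is actually nonzero.
    active_counts = [sum(1 for w in row.values() if w != 0.0) for row in rows]
    active_dims = sum(active_counts) / len(rows)
    sparsity_ratio = 1.0 - active_dims / vocab_size
    return {"active_dims": active_dims, "sparsity_ratio": sparsity_ratio}


# One query row with 15 nonzero weights in a 30522-wide vocabulary,
# mirroring the query line of the sample output (the weights are made up).
query_row = {index: 1.0 for index in range(15)}
stats = sparsity_stats([query_row], vocab_size=30522)

# The cap from max_active_dims=64 should never be exceeded.
assert stats["active_dims"] <= 64

print(f"active dims: {stats['active_dims']:.1f}")
print(f"sparsity: {stats['sparsity_ratio']:.4f}")
```

With these inputs the sketch reports 15.0 active dimensions and a sparsity of 0.9995, the same figures as the query lines in the sample output.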
decode() returns weighted token dimensions, and the top match should be the certificate-renewal document for the sample query. For an indexed retrieval flow, reuse the same encode_query() and encode_document() split when building sparse search.
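To see why shared token dimensions drive the ranking, here is a minimal sketch in plain Python that scores a query against documents with a sparse dot product. It assumes dot-product similarity, which may differ from the similarity function configured on a given model. The query weights are the four decoded values from the sample output, the document weights are invented for illustration, and sparse_dot is a hypothetical helper, not a Sentence Transformers function.

```python
def sparse_dot(query, document):
    """Dot product over the token dimensions that both vectors share."""
    # Iterate over the smaller dict so the cost tracks the sparser vector.
    if len(query) <= len(document):
        small, large = query, document
    else:
        small, large = document, query
    return sum(weight * large[token] for token, weight in small.items() if token in large)


# Query weights copied from the decoded sample output.
query = {"certificate": 2.085, "web": 2.036, "server": 1.926, "renewal": 1.857}

# Document weights are made up; real values come from encode_document().
documents = {
    "doc-1": {"password": 1.9, "reset": 1.7, "account": 1.2, "security": 1.1},
    "doc-2": {"certificate": 1.8, "tls": 1.6, "web": 1.3, "server": 1.2, "renew": 1.0},
    "doc-3": {"invoice": 1.9, "export": 1.5, "finance": 1.4, "customer": 1.0},
}

scores = {name: sparse_dot(query, weights) for name, weights in documents.items()}
best = max(scores, key=scores.get)
print(f"top match: {best}")
```

Only doc-2 shares token dimensions with the query in this toy data, so it is the only document with a nonzero score. A sparse search index exploits the same property by visiting only the postings for the query's active tokens.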
Related: How to build sparse semantic search with Sentence Transformers
$ rm sparse_embeddings_generate.py