Text similarity scores show how closely one sentence or short document matches another in embedding space. With Sentence Transformers, the same model that creates embeddings can compare them directly, so a developer can inspect sample pairs before building search, clustering, or reranking code.
SentenceTransformer.similarity() compares two embedding batches and returns a score matrix. Each row belongs to one query, each column belongs to one reference text, and the highest score in a row is the closest reference under the selected similarity function.
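To make the row and column layout concrete, the sketch below builds a small cosine score matrix by hand. The two-dimensional vectors and the helper names `cosine` and `score_matrix` are made up for illustration and are not part of Sentence Transformers; real embeddings come from the model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score_matrix(queries, references):
    """Build one row per query and one column per reference."""
    return [[cosine(q, r) for r in references] for q in queries]


# Hypothetical 2-dimensional "embeddings", chosen only for illustration.
query_vectors = [[1.0, 0.0], [0.0, 2.0]]
reference_vectors = [[3.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

scores = score_matrix(query_vectors, reference_vectors)

# Column index of the highest score in each row, i.e. the closest reference.
best_columns = [row.index(max(row)) for row in scores]

print(len(scores), "rows x", len(scores[0]), "columns")
print("closest reference per query:", best_columns)
```

Two queries against three references give a 2 x 3 matrix, and the first query lands on column 0 while the second lands on column 1.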
The default similarity function is cosine. Set similarity_fn_name when code must use dot, euclidean, or manhattan instead, and keep the same scoring function wherever those values are interpreted.
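As a rough illustration of why scores from different functions should not be mixed, the sketch below computes all four for one pair of made-up vectors in plain Python. It assumes that the euclidean and manhattan options report negated distances, so that larger values still mean closer; verify that against the installed library version before relying on it. The function names here are hypothetical and are not the library's own.

```python
import math


def dot_score(a, b):
    """Plain dot product; grows with vector length as well as alignment."""
    return sum(x * y for x, y in zip(a, b))


def cosine_score(a, b):
    """Dot product of the length-normalized vectors; depends only on direction."""
    return dot_score(a, b) / (math.sqrt(dot_score(a, a)) * math.sqrt(dot_score(b, b)))


def euclidean_score(a, b):
    """Negated L2 distance (assumption): 0 is identical, more negative is farther."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_score(a, b):
    """Negated L1 distance (assumption): 0 is identical, more negative is farther."""
    return -sum(abs(x - y) for x, y in zip(a, b))


a = [1.0, 2.0]
b = [2.0, 4.0]  # same direction as a, twice the length

results = {
    "cosine": cosine_score(a, b),
    "dot": dot_score(a, b),
    "euclidean": euclidean_score(a, b),
    "manhattan": manhattan_score(a, b),
}

for name, value in results.items():
    print(f"{name}: {value:.4f}")
```

The two vectors point the same way, so cosine reports a perfect match, while the other three functions also react to the difference in length. A threshold tuned for one function therefore means something different under another.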
Steps to calculate Sentence Transformers text similarity:
- Open a Python environment with Sentence Transformers available.
Use the same environment in which the application loads the model.
Related: How to install Sentence Transformers with pip
- Create a similarity script with reference texts, query texts, and a cosine SentenceTransformer model.
- calculate_similarity.py
    from sentence_transformers import SentenceTransformer, SimilarityFunction

    model = SentenceTransformer(
        "sentence-transformers/all-MiniLM-L6-v2",
        similarity_fn_name=SimilarityFunction.COSINE,
    )

    reference_texts = [
        "Reset an account password.",
        "Schedule a database backup.",
        "Bake sourdough bread.",
    ]
    queries = [
        "Change my account password.",
        "Schedule a backup for the database.",
    ]

    reference_embeddings = model.encode(reference_texts)
    query_embeddings = model.encode(queries)
    scores = model.similarity(query_embeddings, reference_embeddings)

    print("similarity function:", model.similarity_fn_name)

    print("score matrix:")
    for query, row in zip(queries, scores):
        print(query)
        print(" " + " ".join(f"{float(score):.4f}" for score in row))

    print("top matches:")
    for query, row in zip(queries, scores):
        best_index = int(row.argmax())
        best_score = float(row[best_index])
        print(f"{best_score:.4f} | {query} -> {reference_texts[best_index]}")
SimilarityFunction.COSINE is the default, so passing it does not change the scores. Setting it in the script makes the choice explicit, and the printed similarity_fn_name confirms which function produced the values.
- Run the similarity script.
    $ python calculate_similarity.py
    similarity function: cosine
    score matrix:
    Change my account password.
     0.8099 0.2072 0.0756
    Schedule a backup for the database.
     0.3381 0.9309 0.1168
    top matches:
    0.8099 | Change my account password. -> Reset an account password.
    0.9309 | Schedule a backup for the database. -> Schedule a database backup.
- Confirm that each query's highest score points to the intended reference text.
The score columns follow the order of reference_texts in the script. For cosine scoring, higher values mean closer embedding direction, while low or negative values are weaker matches.
- Remove the temporary script after recording the scoring behavior.
$ rm calculate_similarity.py
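To see the weaker matches as well as the top one, each printed row can be sorted per query. The sketch below copies the scores from the sample run above into plain lists, so it runs without loading the model; `rank_references` is a hypothetical helper name used only for illustration.

```python
reference_texts = [
    "Reset an account password.",
    "Schedule a database backup.",
    "Bake sourdough bread.",
]
queries = [
    "Change my account password.",
    "Schedule a backup for the database.",
]

# Scores copied from the sample run; columns follow the order of reference_texts.
scores = [
    [0.8099, 0.2072, 0.0756],
    [0.3381, 0.9309, 0.1168],
]


def rank_references(row, references):
    """Return (score, reference) pairs sorted from closest to weakest."""
    return sorted(zip(row, references), reverse=True)


rankings = {
    query: rank_references(row, reference_texts)
    for query, row in zip(queries, scores)
}

for query, ranked in rankings.items():
    print(query)
    for score, text in ranked:
        print(f"  {score:.4f}  {text}")
```

Sorting the whole row, rather than taking only the maximum, shows how far the best match is ahead of the runner-up, which helps when judging whether a top score is a clear winner.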