Text similarity scores show how closely one sentence or short document matches another in embedding space. With Sentence Transformers, the same model that creates embeddings can compare them directly, so a developer can inspect sample pairs before building search, clustering, or reranking code.
SentenceTransformer.similarity() compares two embedding batches and returns a score matrix. Each row belongs to one query, each column belongs to one reference text, and the highest score in a row is the closest reference under the selected similarity function.
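The row-and-column layout can be illustrated without loading a model. The following is a minimal pure-Python sketch, not the library's implementation: the helper names and the 3-dimensional toy vectors are assumptions made for illustration, standing in for real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score_matrix(query_vectors, reference_vectors):
    """Build one row per query and one column per reference."""
    return [[cosine(q, r) for r in reference_vectors] for q in query_vectors]


# Hypothetical toy vectors standing in for real embeddings.
toy_references = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_queries = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]]

toy_scores = score_matrix(toy_queries, toy_references)

# The column index of the highest score in each row is the closest reference.
best_columns = [row.index(max(row)) for row in toy_scores]
print(best_columns)  # [0, 1]
```

With two queries and three references the matrix has two rows and three columns, which is the same shape the real call produces for the example texts in this article.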
The default similarity function is cosine. Set similarity_fn_name when code must use dot, euclidean, or manhattan instead, and keep the same scoring function wherever those values are interpreted.
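To see why scores from different functions should not be mixed, the sketch below applies four measures to one pair of toy vectors. It is a pure-Python illustration, not the library's code. It assumes that the euclidean and manhattan options report negated distances, so that higher still means closer; treat that as an assumption to confirm against the library documentation. All function and variable names here are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def neg_euclidean(a, b):
    # Assumption: distance is negated so that larger values mean closer.
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neg_manhattan(a, b):
    # Assumption: distance is negated so that larger values mean closer.
    return -sum(abs(x - y) for x, y in zip(a, b))


TOY_FUNCTIONS = {
    "cosine": cosine,
    "dot": dot,
    "euclidean": neg_euclidean,
    "manhattan": neg_manhattan,
}

query = [3.0, 4.0]
reference = [6.0, 8.0]  # same direction as query, twice the length

toy_results = {name: fn(query, reference) for name, fn in TOY_FUNCTIONS.items()}
for name, value in toy_results.items():
    print(f"{name:>9}: {value:.1f}")
```

For this pair, cosine gives 1.0, dot gives 50.0, and the negated euclidean and manhattan distances give -5.0 and -7.0. The scales differ so much that a cutoff or ranking rule written for one function is meaningless under another.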
Run the example in the same Python environment the application uses, with the same model, so the scores match what the application will compute.
Related: How to install Sentence Transformers with pip
from sentence_transformers import SentenceTransformer, SimilarityFunction

# Load the model and state the similarity function explicitly.
model = SentenceTransformer(
    "sentence-transformers/all-MiniLM-L6-v2",
    similarity_fn_name=SimilarityFunction.COSINE,
)

reference_texts = [
    "Reset an account password.",
    "Schedule a database backup.",
    "Bake sourdough bread.",
]
queries = [
    "Change my account password.",
    "Schedule a backup for the database.",
]

# Encode both batches with the same model.
reference_embeddings = model.encode(reference_texts)
query_embeddings = model.encode(queries)

# Rows follow queries; columns follow reference_texts.
scores = model.similarity(query_embeddings, reference_embeddings)

print("similarity function:", model.similarity_fn_name)
print("score matrix:")
for query, row in zip(queries, scores):
    print(query)
    print(" " + " ".join(f"{float(score):.4f}" for score in row))

# Report the highest-scoring reference for each query.
print("top matches:")
for query, row in zip(queries, scores):
    best_index = int(row.argmax())
    best_score = float(row[best_index])
    print(f"{best_score:.4f} | {query} -> {reference_texts[best_index]}")
SimilarityFunction.COSINE is the default, so passing it does not change any score. Setting it explicitly records the choice in the script, and printing model.similarity_fn_name confirms it in the output.
$ python calculate_similarity.py
similarity function: cosine
score matrix:
Change my account password.
 0.8099 0.2072 0.0756
Schedule a backup for the database.
 0.3381 0.9309 0.1168
top matches:
0.8099 | Change my account password. -> Reset an account password.
0.9309 | Schedule a backup for the database. -> Schedule a database backup.
The score columns follow the order of reference_texts in the script. For cosine scoring, higher values mean closer embedding direction, while low or negative values are weaker matches.
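Because the columns follow reference_texts, each row can be mapped back to text and sorted to give a full ranking instead of only the top match. The sketch below is one way to do that, using the values copied from the sample output above in place of a live model call. The rank_references helper is a hypothetical name, not part of the library.

```python
reference_texts = [
    "Reset an account password.",
    "Schedule a database backup.",
    "Bake sourdough bread.",
]
queries = [
    "Change my account password.",
    "Schedule a backup for the database.",
]

# Values copied from the sample output; rows follow queries,
# columns follow reference_texts.
sample_scores = [
    [0.8099, 0.2072, 0.0756],
    [0.3381, 0.9309, 0.1168],
]


def rank_references(row, texts):
    """Pair each score with its reference text, strongest match first."""
    return sorted(zip(row, texts), key=lambda pair: pair[0], reverse=True)


rankings = {
    query: rank_references(row, reference_texts)
    for query, row in zip(queries, sample_scores)
}

for query, ranked in rankings.items():
    print(query)
    for score, text in ranked:
        print(f"  {score:.4f}  {text}")
```

Sorting in descending order assumes a function where higher means closer, as with cosine here.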
$ rm calculate_similarity.py