How to calculate text similarity with Sentence Transformers

Text similarity scores show how closely one sentence or short document matches another in embedding space. With Sentence Transformers, the same model that creates embeddings can compare them directly, so a developer can inspect sample pairs before building search, clustering, or reranking code.

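SentenceTransformer.similarity() compares two embedding batches and returns a score matrix with one row per embedding in the first batch and one column per embedding in the second. When query embeddings are passed first and reference embeddings second, each row belongs to one query, each column belongs to one reference text, and the highest score in a row marks the closest reference under the selected similarity function.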
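
The row and column layout can be illustrated without loading a model. The following is a minimal sketch in plain Python, assuming tiny hand-written vectors in place of real embeddings; the helper names cosine and score_matrix are hypothetical and are not part of the Sentence Transformers API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score_matrix(queries, references):
    """One row per query, one column per reference."""
    return [[cosine(q, r) for r in references] for q in queries]


# Toy 3-dimensional vectors (assumption: stand-ins for real embeddings,
# which have hundreds of dimensions).
toy_queries = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
toy_references = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.9], [0.0, 0.0, 1.0]]

matrix = score_matrix(toy_queries, toy_references)

# The column index of the largest value in each row is the closest reference.
best_columns = [row.index(max(row)) for row in matrix]

print(len(matrix), "rows x", len(matrix[0]), "columns")
print("best reference per query:", best_columns)
```

Two queries against three references give a 2 x 3 matrix, which is the same shape the script in the steps below prints.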

The default similarity function is cosine. Set similarity_fn_name when code must use dot, euclidean, or manhattan instead. Scores from different functions are on different scales and are not comparable, so use the same function everywhere those values are stored, thresholded, or ranked.
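
To show why scores from different functions should not be mixed, here is a small sketch in plain Python that applies four scoring functions to the same toy vectors. The function names and vectors are illustrative assumptions. The distance-based scores are negated here so that higher still means closer, which is how the library is understood to report euclidean and manhattan scores; verify this against the documentation for the installed version.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def neg_euclidean(a, b):
    # Negated distance, so a larger value means a closer pair.
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neg_manhattan(a, b):
    # Negated distance, so a larger value means a closer pair.
    return -sum(abs(x - y) for x, y in zip(a, b))


# Toy vectors (assumption: stand-ins for real embeddings).
query = [1.0, 2.0, 0.0]
near = [2.0, 4.0, 0.0]  # same direction as the query, twice the length
far = [0.0, 1.0, 3.0]

scorers = {
    "cosine": cosine,
    "dot": dot,
    "euclidean": neg_euclidean,
    "manhattan": neg_manhattan,
}

results = {name: (fn(query, near), fn(query, far)) for name, fn in scorers.items()}

for name, (near_score, far_score) in results.items():
    print(f"{name:10s} near={near_score:8.4f}  far={far_score:8.4f}")
```

All four functions rank near above far for these vectors, but the numeric ranges differ, so a cutoff tuned for cosine scores has no meaning for dot or distance-based scores.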

Steps to calculate Sentence Transformers text similarity:

  1. Open a Python environment with Sentence Transformers available.

    Use the same environment the application uses to load the model, so the library and model versions match.
    Related: How to install Sentence Transformers with pip

  2. Create a similarity script with reference texts, query texts, and a SentenceTransformer model configured for cosine similarity.
    calculate_similarity.py
    from sentence_transformers import SentenceTransformer, SimilarityFunction


    # Load the model and state the scoring function explicitly.
    model = SentenceTransformer(
        "sentence-transformers/all-MiniLM-L6-v2",
        similarity_fn_name=SimilarityFunction.COSINE,
    )

    reference_texts = [
        "Reset an account password.",
        "Schedule a database backup.",
        "Bake sourdough bread.",
    ]
    queries = [
        "Change my account password.",
        "Schedule a backup for the database.",
    ]

    # Encode both batches, then score every query against every reference.
    reference_embeddings = model.encode(reference_texts)
    query_embeddings = model.encode(queries)
    # Rows follow queries; columns follow reference_texts.
    scores = model.similarity(query_embeddings, reference_embeddings)

    print("similarity function:", model.similarity_fn_name)
    print("score matrix:")
    for query, row in zip(queries, scores):
        print(query)
        print("  " + "  ".join(f"{float(score):.4f}" for score in row))

    # Report the highest-scoring reference for each query.
    print("top matches:")
    for query, row in zip(queries, scores):
        best_index = int(row.argmax())
        best_score = float(row[best_index])
        print(f"{best_score:.4f} | {query} -> {reference_texts[best_index]}")

    SimilarityFunction.COSINE is the default. Setting it explicitly documents the choice in the script, and the first line of output confirms which function is active.

  3. Run the similarity script.
    $ python calculate_similarity.py
    similarity function: cosine
    score matrix:
    Change my account password.
      0.8099  0.2072  0.0756
    Schedule a backup for the database.
      0.3381  0.9309  0.1168
    top matches:
    0.8099 | Change my account password. -> Reset an account password.
    0.9309 | Schedule a backup for the database. -> Schedule a database backup.

  4. Confirm that each query's highest score points to the intended reference text.

    The score columns follow the order of reference_texts in the script. Cosine scores range from -1 to 1: values near 1 mean the embeddings point in nearly the same direction, while values near 0 or below indicate weak or unrelated matches.
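
The confirmation in this step can also be written as a repeatable check. The sketch below is one possible approach in plain Python, assuming the scores are copied by hand from the sample run above; the expected mapping and the top_reference helper are hypothetical names introduced for illustration.

```python
reference_texts = [
    "Reset an account password.",
    "Schedule a database backup.",
    "Bake sourdough bread.",
]
queries = [
    "Change my account password.",
    "Schedule a backup for the database.",
]

# Scores copied from the sample output in step 3 (rows follow queries,
# columns follow reference_texts).
scores = [
    [0.8099, 0.2072, 0.0756],
    [0.3381, 0.9309, 0.1168],
]

# Hypothetical mapping of each query to the reference it should match.
expected = {
    "Change my account password.": "Reset an account password.",
    "Schedule a backup for the database.": "Schedule a database backup.",
}


def top_reference(row, references):
    """Return the reference text with the highest score in one row."""
    best_index = max(range(len(row)), key=lambda i: row[i])
    return references[best_index]


# Collect every query whose top match differs from the intended reference.
mismatches = [
    (query, top_reference(row, reference_texts))
    for query, row in zip(queries, scores)
    if top_reference(row, reference_texts) != expected[query]
]

if mismatches:
    for query, got in mismatches:
        print(f"MISMATCH: {query} -> {got}")
else:
    print("all queries matched their intended reference")
```

In a real check, the hand-copied scores would be replaced by the matrix returned from model.similarity(), converted to nested lists or indexed directly.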

  5. Remove the temporary script after recording the scoring behavior.
    $ rm calculate_similarity.py