How to check embedding dimensions with Sentence Transformers

Vector stores and similarity indexes need one fixed length for every dense vector in a collection. With Sentence Transformers, that length comes from the loaded model and the encode settings, so checking it before schema creation prevents dimension mismatch errors later.

get_embedding_dimension() reports the output length of encode(), and the shape of an encoded array confirms that value against real input text. Reading both is a quick way to catch a wrong model ID, a stale index setting, or an unintended truncation setting before embeddings are written.

Use the same model ID and encode options that the application will use for indexing and querying. If production code sets truncate_dim, verify that shortened shape instead of the model's full vector length.
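The effect of truncate_dim on the expected length can be sketched as follows. This is a minimal illustration, assuming production code shortens vectors through the truncate_dim argument of the SentenceTransformer constructor; expected_dim and check_truncated are hypothetical helper names, not part of the library.

```python
def expected_dim(full_dim, truncate_dim=None):
    """Return the vector length after optional truncation.

    Truncation keeps the leading values of each vector, so the result
    can never be longer than the model's full output.
    """
    if truncate_dim is None:
        return full_dim
    return min(full_dim, truncate_dim)


def check_truncated(model_id, truncate_dim, sentences):
    """Encode with truncation enabled and return the observed dimension."""
    # Imported here so expected_dim() stays usable without the library.
    from sentence_transformers import SentenceTransformer

    # Assumption: production code passes truncate_dim to the constructor.
    model = SentenceTransformer(model_id, truncate_dim=truncate_dim)
    embeddings = model.encode(sentences)
    return embeddings.shape[1]
```

As an example value only: if production truncates a 384-dimension model to 256, check_truncated(model_id, 256, sentences) should agree with expected_dim(384, 256), and the index schema needs that shortened length rather than 384.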

Steps to check Sentence Transformers embedding dimensions:

Open a Python environment with Sentence Transformers available.

Use the same environment that will build the vector index or write embeddings for the application.
Related: How to install Sentence Transformers with pip

Create a dimension-check script for the selected model.

check_dim.py

```
from sentence_transformers import SentenceTransformer

# Use the exact model ID that will write the production vectors.
org = "sentence-transformers"
model_name = "all-MiniLM-L6-v2"
model_id = f"{org}/{model_name}"
model = SentenceTransformer(model_id)

# Any short sample works; only the output shape matters here.
sentences = [
    "Reset a user password",
    "Create an S3 bucket",
]

# encode() returns a 2-D array: (number of sentences, embedding dimension).
embeddings = model.encode(sentences)
reported_dim = model.get_embedding_dimension()
actual_dim = embeddings.shape[1]
match = actual_dim == reported_dim

print(f"model={model_name}")
print(f"reported_dim={reported_dim}")
print(f"shape={embeddings.shape}")
print(f"match={match}")

# Exit non-zero so a shell script or CI job can detect the mismatch.
if not match:
    raise SystemExit(
        f"mismatch: shape={actual_dim}, dim={reported_dim}"
    )
```

Replace model_id with the model that will write the production vectors.
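Sentence Transformers releases that predate the get_embedding_dimension() name expose the same value as get_sentence_embedding_dimension(). If the check has to run against more than one installed release, a small lookup like the sketch below could replace the direct call. reported_dimension is a hypothetical helper name, and the lookup order is an assumption, not library behavior.

```python
def reported_dimension(model):
    """Return whatever embedding dimension the model reports."""
    # Try the name used in this guide first, then the older name.
    candidates = (
        "get_embedding_dimension",
        "get_sentence_embedding_dimension",
    )
    for name in candidates:
        method = getattr(model, name, None)
        if callable(method):
            return method()
    raise AttributeError("model exposes no embedding-dimension method")
```

In the script, reported_dim = reported_dimension(model) would then work with either method name.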

Run the script and compare the reported dimension with the encoded array shape.
```
$ python check_dim.py
model=all-MiniLM-L6-v2
reported_dim=384
shape=(2, 384)
match=True
```
The first number in shape is the number of input sentences. The second number is the embedding dimension that downstream indexes and vector fields must accept.
Use the reported dimension for vector database collections, FAISS index construction, and tensor shape checks.
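For the FAISS case, building an index from the checked dimension might look like the sketch below. It assumes the faiss and numpy packages are installed and uses IndexFlatL2 purely as an example index type; validate_batch and build_flat_index are hypothetical helper names.

```python
def validate_batch(shape, index_dim):
    """Check a (rows, dim) shape against the index dimension; return rows."""
    if len(shape) != 2:
        raise ValueError(f"expected a 2-D array, got shape={shape}")
    rows, dim = shape
    if dim != index_dim:
        raise ValueError(f"embedding dim {dim} != index dim {index_dim}")
    return rows


def build_flat_index(embeddings, dim):
    """Build an exact L2 index sized to the checked dimension."""
    # Imported here so validate_batch() works without these packages.
    import faiss
    import numpy as np

    validate_batch(embeddings.shape, dim)
    # FAISS expects contiguous float32 rows.
    vectors = np.ascontiguousarray(embeddings, dtype="float32")
    index = faiss.IndexFlatL2(dim)
    index.add(vectors)
    return index
```

Running validate_batch before every later write gives a readable error in application code instead of a failure inside the index library.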

Recreate or rebuild an existing vector index when its configured dimension differs from the new model's output. A collection created for 768-dimensional vectors cannot accept 384-dimensional embeddings.
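One way to make the rebuild decision repeatable is to record the checked dimension next to the index and compare it on every later run. The sketch below assumes a small JSON file is an acceptable place for that record; the file layout and the save_dimension and needs_rebuild names are hypothetical.

```python
import json
from pathlib import Path


def save_dimension(path, model_id, dim):
    """Write the model ID and checked dimension to a JSON record."""
    record = {"model_id": model_id, "dim": dim}
    Path(path).write_text(json.dumps(record))


def needs_rebuild(path, model_id, dim):
    """Return True when no record exists or it differs from the current model."""
    record_file = Path(path)
    if not record_file.exists():
        return True
    record = json.loads(record_file.read_text())
    return record.get("model_id") != model_id or record.get("dim") != dim
```

The model ID is compared as well as the dimension because two different models can share a vector length while producing embeddings that are not comparable with each other.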
Remove the temporary script after recording the dimension.
```
$ rm check_dim.py
```

Author: Mohd Shakir Zakaria