Sentence Transformers truncates long input text before the model creates embeddings. Setting a shorter maximum sequence length caps the number of tokens each input contributes, which helps with documents that have already been chunked, latency-sensitive tests, and workloads where only the beginning of each text should be embedded.
The max_seq_length property belongs to the loaded SentenceTransformer model object. Each model starts with its own limit, and assigning a lower value before calling encode() changes how much tokenized text reaches the transformer for that model object in the current Python process.
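To make the truncation step concrete, here is a minimal pure-Python sketch. It assumes a toy whitespace tokenizer (toy_tokenize is a hypothetical name) in place of the model's real subword tokenizer, and BERT-style [CLS] and [SEP] markers that count toward the limit, so a capped sequence keeps only max_seq_length minus two content tokens.

```python
# Minimal sketch of max_seq_length-style truncation.
# Assumption: a toy whitespace "tokenizer" stands in for the model's real
# subword tokenizer; [CLS]/[SEP] mimic BERT-style special tokens.

def toy_tokenize(text: str, max_seq_length: int) -> list[str]:
    """Split on whitespace, then truncate so the sequence, including
    the two special tokens, never exceeds max_seq_length."""
    if max_seq_length < 2:
        raise ValueError("max_seq_length must leave room for [CLS] and [SEP]")
    words = text.split()
    body = words[: max_seq_length - 2]  # reserve two slots for special tokens
    return ["[CLS]", *body, "[SEP]"]


text = "one two three four five six seven eight nine ten"
print(toy_tokenize(text, 256))  # short text: nothing is dropped
print(toy_tokenize(text, 6))    # ['[CLS]', 'one', 'two', 'three', 'four', '[SEP]']
```

Everything past the cap is discarded before the transformer sees it, which is why the end of a long document has no influence on its embedding.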
Use the setting as a truncation guard, not as a replacement for document chunking. Raising the value above what the underlying transformer supports does not make that transformer accept longer inputs, and heavily clipped documents can lose important information near the end of the text.
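When the tail of a document matters, splitting it into token windows before encoding keeps every part of the text. The sketch below assumes the text is already tokenized; chunk_tokens and its budget and overlap parameters are hypothetical names, not a Sentence Transformers API. Each window would then be encoded separately, with the budget set low enough that a window plus any special tokens fits within max_seq_length.

```python
# Hypothetical helper: split a long token sequence into overlapping windows
# that each fit a per-chunk token budget, instead of clipping the tail.

def chunk_tokens(tokens: list[str], budget: int, overlap: int = 0) -> list[list[str]]:
    """Return windows of at most `budget` tokens; consecutive windows share
    `overlap` tokens so context is not cut at a hard boundary."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    if not 0 <= overlap < budget:
        raise ValueError("overlap must be in [0, budget)")
    step = budget - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start : start + budget])
        if start + budget >= len(tokens):
            break  # this window already reaches the end of the sequence
    return chunks


words = [f"w{i}" for i in range(10)]
for chunk in chunk_tokens(words, budget=4, overlap=1):
    print(chunk)
```

The overlap repeats tokens at each boundary so context near a split appears in both neighboring windows.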
Steps to set the Sentence Transformers maximum sequence length:
- Open a Python environment with Sentence Transformers available.
- Check the current maximum sequence length for the selected model.
$ python3 - <<'PY'
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
print(model.max_seq_length)
PY
256
Use model.max_seq_length directly. get_max_seq_length() remains for older code, but the package reference marks it deprecated in favor of the property.
- Create a smoke-test script that lowers the model sequence length before preprocessing and encoding.
- set_max_seq_length.py
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Build a text that is far longer than 128 tokens.
sentence = (
    "Sentence Transformers truncates input tokens when a document is longer "
    "than the configured sequence length. "
)
long_text = sentence * 20

print(f"original max_seq_length={model.max_seq_length}")

# Lower the limit on this model object before tokenizing or encoding.
model.max_seq_length = 128
features = model.preprocess([long_text])

print(f"updated max_seq_length={model.max_seq_length}")
# input_ids has shape (batch_size, sequence_length).
print(f"tokenized sequence length={features['input_ids'].shape[1]}")

embedding = model.encode([long_text], show_progress_bar=False)
print(f"embedding shape={embedding.shape}")
The preprocess() call is there only to expose the tokenized input length after the limit changes. Application code can set max_seq_length and call encode() without calling preprocess().
- Run the script and confirm the tokenized input length matches the configured value.
$ python3 set_max_seq_length.py
original max_seq_length=256
updated max_seq_length=128
tokenized sequence length=128
embedding shape=(1, 384)
The embedding width stays 384 for all-MiniLM-L6-v2 because sequence length controls input truncation, not output vector dimension.
- Move the assignment into the application code immediately after the model load.
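from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
model.max_seq_length = 128  # set once, right after loading
documents = ["first document text", "second document text"]  # placeholder: use the application's own texts
embeddings = model.encode(documents, show_progress_bar=False)
The value persists on the model object, so set it once, before any encode() call that depends on this truncation policy. Changing document truncation after a vector index is built means newly embedded documents no longer match the stored embeddings, which can change retrieval scores; re-embed the stored documents when the limit changes.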
- Remove the temporary smoke-test script.
$ rm set_max_seq_length.py
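The embedding width stays 384 in the smoke test because a mean-pooling model such as all-MiniLM-L6-v2 averages over token positions. A minimal sketch, assuming random numbers in place of real transformer outputs and plain (unmasked) mean pooling; the real pooling layer also uses the attention mask to skip padding tokens. fake_token_vectors and mean_pool are hypothetical helpers.

```python
# Sketch: mean pooling collapses a (sequence_length x hidden_size) matrix of
# token vectors into one hidden_size vector, so the embedding width does not
# depend on how many tokens survived truncation.
# Assumption: random numbers stand in for real transformer token outputs.
import random

HIDDEN_SIZE = 384  # output width of all-MiniLM-L6-v2


def fake_token_vectors(seq_len: int, hidden: int = HIDDEN_SIZE) -> list[list[float]]:
    """Hypothetical stand-in for per-token transformer outputs."""
    rng = random.Random(seq_len)  # deterministic per sequence length
    return [[rng.uniform(-1.0, 1.0) for _ in range(hidden)] for _ in range(seq_len)]


def mean_pool(token_vectors: list[list[float]]) -> list[float]:
    """Average each hidden dimension across all token positions."""
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]


for seq_len in (128, 256):
    embedding = mean_pool(fake_token_vectors(seq_len))
    print(seq_len, len(embedding))  # width stays 384 for both lengths
```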
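Because changing the limit after indexing makes new embeddings inconsistent with stored ones, one option is to record the truncation settings alongside the index and check them before encoding. This is a hypothetical guard: TruncationPolicy and ensure_same_policy are illustrative names, not part of Sentence Transformers or any vector database. In application code the current policy would be built from the model name and model.max_seq_length.

```python
# Hypothetical guard: store the truncation settings next to a vector index
# and refuse to embed under different settings without a rebuild.
from dataclasses import dataclass


@dataclass(frozen=True)
class TruncationPolicy:
    """Settings that must match between the stored index and new embeddings."""
    model_name: str
    max_seq_length: int


def ensure_same_policy(index_policy: TruncationPolicy, current: TruncationPolicy) -> None:
    """Raise if new embeddings would use a different truncation policy."""
    if current != index_policy:
        raise ValueError(
            f"index was built with {index_policy}, current settings are {current}; "
            "re-embed the stored documents before mixing policies"
        )


stored = TruncationPolicy("sentence-transformers/all-MiniLM-L6-v2", 128)

# Same settings: the check passes silently.
ensure_same_policy(stored, TruncationPolicy("sentence-transformers/all-MiniLM-L6-v2", 128))

# A different limit is rejected.
try:
    ensure_same_policy(stored, TruncationPolicy("sentence-transformers/all-MiniLM-L6-v2", 256))
except ValueError as err:
    print(err)
```

Storing the policy as a frozen dataclass keeps the comparison to a single equality check, so adding another setting later only means adding a field.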
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.