How to save and reload a Sentence Transformers model

A saved Sentence Transformers directory is the handoff point between training, packaging, and later embedding jobs. Saving the model locally keeps the tokenizer, pooling configuration, and weight files together so a later Python process can reload the same embedding behavior without depending on the original model object.

The save_pretrained() method writes the model modules, configuration, tokenizer files, and weights to a directory. Loading that saved path with SentenceTransformer() should recreate the model stack, and local_files_only=True makes the reload fail if the path is incomplete instead of falling back to a remote download.
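To see which module stack a reload will rebuild, you can read modules.json from a saved directory. The sketch below assumes modules.json is a JSON list of entries with idx, path, and type keys, which is how recent Sentence Transformers releases write it. The helper name list_module_stack is hypothetical and is not part of the library.

```python
import json
from pathlib import Path


def list_module_stack(save_dir):
    """Return (path, type) pairs from modules.json, ordered by idx.

    Assumes modules.json is a JSON list of objects with "idx", "path",
    and "type" keys, as written by recent Sentence Transformers releases.
    """
    entries = json.loads((Path(save_dir) / "modules.json").read_text())
    entries.sort(key=lambda entry: entry["idx"])
    # An empty path means the module is stored at the directory root,
    # so it is reported as "." for readability.
    return [(entry["path"] or ".", entry["type"]) for entry in entries]
```

On the directory exported in this guide, the first entry should be the Transformer module at the root, followed by the pooling module in 1_Pooling.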

A public MiniLM embedding model can stand in for a fine-tuned checkpoint during smoke testing. The export path should not exist before the save, and the pass condition is a matching embedding shape plus a max difference of effectively zero (at most 1e-6) after reload.

Steps to save and reload a Sentence Transformers model:

Choose the source model and local directory for the saved copy.

Use the model ID and local path defined in the script, and pick a path that does not exist yet; the script stops if it does. A path that already contains another model can leave stale tokenizer, pooling, or weight files beside the new export.
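The script refuses any existing path. If your workflow pre-creates output directories, a looser check that also accepts an empty directory could look like the sketch below. is_fresh_export_target is a hypothetical helper, not a Sentence Transformers function.

```python
from pathlib import Path


def is_fresh_export_target(path):
    """Return True if path is missing or is an empty directory.

    Hypothetical helper: a non-empty directory could hold stale tokenizer,
    pooling, or weight files from an earlier export, so it is rejected.
    """
    target = Path(path)
    if not target.exists():
        return True
    # A regular file at the target path cannot hold a model directory.
    if not target.is_dir():
        return False
    # Any entry at all (file or subdirectory) makes the directory unsafe.
    return not any(target.iterdir())
```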

Create a save and reload smoke-test script.

save_reload.py

```python
from pathlib import Path

import numpy as np
from sentence_transformers import SentenceTransformer

# Public model used as a stand-in for a fine-tuned checkpoint.
model_id = "sentence-transformers/all-MiniLM-L6-v2"
save_path = Path("models/support-embeddings")
texts = [
    "reset password",
    "change billing address",
]
# Files the saved directory must contain after export.
files = [
    "modules.json",
    "config_sentence_transformers.json",
    "model.safetensors",
    "1_Pooling/config.json",
]

# Refuse an existing path so stale files cannot mix with the new export.
if save_path.exists():
    raise SystemExit(f"{save_path} exists; choose a new path")

model = SentenceTransformer(model_id)
before = model.encode(texts)

model.save_pretrained(str(save_path), safe_serialization=True)

missing = [name for name in files if not (save_path / name).exists()]
if missing:
    raise SystemExit("missing saved files: " + ", ".join(missing))

# local_files_only=True blocks any fallback download from the Hub.
reloaded = SentenceTransformer(str(save_path), local_files_only=True)
after = reloaded.encode(texts)

# Compare shapes before subtracting: mismatched arrays may not broadcast.
if before.shape != after.shape:
    raise SystemExit("shape mismatch")
diff = float(np.max(np.abs(before - after)))
if diff > 1e-6:
    raise SystemExit("embedding mismatch")

print("saved path: ok")
print("checked files:", len(files))
print("shape before:", before.shape)
print("shape after:", after.shape)
print("max diff:", f"{diff:.8f}")
```

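safe_serialization=True writes the model weights as model.safetensors. Keep it enabled unless a downstream runtime specifically requires legacy PyTorch weight files (pytorch_model.bin); if you disable it, replace model.safetensors with pytorch_model.bin in the script's files list.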
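To confirm the weights really are in safetensors format without loading PyTorch, you can read the file header directly. The format starts with an 8-byte little-endian unsigned integer giving the header length, followed by a JSON header that maps tensor names to dtype, shape, and data_offsets, plus an optional __metadata__ entry. The helper below is a minimal stdlib sketch of that layout, not the safetensors library API.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path):
    """Return the tensor entries from a .safetensors header as a dict.

    Minimal sketch of the documented layout: an 8-byte little-endian u64
    header size, then that many bytes of UTF-8 JSON, then raw tensor data.
    """
    with Path(path).open("rb") as handle:
        (header_size,) = struct.unpack("<Q", handle.read(8))
        header = json.loads(handle.read(header_size).decode("utf-8"))
    # __metadata__ holds free-form strings, not a tensor entry.
    header.pop("__metadata__", None)
    return header
```

For example, len(read_safetensors_header("models/support-embeddings/model.safetensors")) reports how many tensors the export stored, and a parse failure suggests the file is not safetensors at all.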

Run the smoke-test script.
```
$ python save_reload.py
saved path: ok
checked files: 4
shape before: (2, 384)
shape after: (2, 384)
max diff: 0.00000000
```
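The printed shapes should match the number of input texts and the embedding dimension of the saved model: (2, 384) here, because all-MiniLM-L6-v2 produces 384-dimensional embeddings. A max difference above the 1e-6 tolerance stops the script with embedding mismatch, which means the reload did not reproduce the same embeddings.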
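The same pass condition can be written as a standalone check, which is handy if you want it in a test suite without the rest of the script. This sketch assumes embeddings arrive as nested Python lists (call .tolist() on NumPy output first) and reuses the script's 1e-6 tolerance; compare_embeddings is a hypothetical name.

```python
def compare_embeddings(before, after, tol=1e-6):
    """Return (shape_before, shape_after, max_diff, passed).

    before and after are lists of equal-length float lists, one row per text.
    Shapes are compared first so mismatched rows are never subtracted.
    """
    shape_before = (len(before), len(before[0]) if before else 0)
    shape_after = (len(after), len(after[0]) if after else 0)
    if shape_before != shape_after:
        return shape_before, shape_after, None, False
    # Largest absolute element-wise difference across all rows.
    max_diff = max(
        (abs(a - b) for row_a, row_b in zip(before, after) for a, b in zip(row_a, row_b)),
        default=0.0,
    )
    return shape_before, shape_after, max_diff, max_diff <= tol
```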
Create a separate local-path reload script for the application handoff.

load_model.py

```python
from sentence_transformers import SentenceTransformer

# Load only from the saved directory; fail instead of downloading if it is missing or incomplete.
model = SentenceTransformer("models/support-embeddings", local_files_only=True)
embeddings = model.encode(["reset password"])
print("local shape:", embeddings.shape)
```

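Run the local-path reload script from the same project directory, because the relative path models/support-embeddings resolves against the current working directory.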
```
$ python load_model.py
local shape: (1, 384)
```
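If the application may be launched from a different working directory, one option is to anchor the model path to the script's own location instead. The sketch below assumes the models directory sits next to the script and that modules.json marks a saved model; resolve_model_dir is a hypothetical helper.

```python
from pathlib import Path


def resolve_model_dir(base_dir, relative="models/support-embeddings"):
    """Return an absolute model directory anchored at base_dir.

    Raises FileNotFoundError when modules.json is missing, so a wrong
    location fails before SentenceTransformer() is ever called.
    """
    model_dir = (Path(base_dir) / relative).resolve()
    if not (model_dir / "modules.json").is_file():
        raise FileNotFoundError(f"no saved model at {model_dir}")
    return model_dir
```

In load_model.py you would then pass str(resolve_model_dir(Path(__file__).parent)) to SentenceTransformer() together with local_files_only=True.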
Remove the smoke-test scripts after the saved model has passed the reload checks.
```
$ rm save_reload.py \
  load_model.py
```
This removes only the temporary validation scripts. Keep models/support-embeddings as the saved model artifact.
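Because models/support-embeddings is the artifact handed to later jobs, recording a checksum for each file lets the receiving side detect a partial copy or an edited file. This is a hypothetical stdlib sketch, not a Sentence Transformers feature; where you store the resulting manifest depends on your packaging setup.

```python
import hashlib
from pathlib import Path


def artifact_manifest(model_dir):
    """Map each file's path (relative to model_dir) to its SHA-256 hex digest."""
    root = Path(model_dir)
    manifest = {}
    for file_path in sorted(root.rglob("*")):
        if file_path.is_file():
            digest = hashlib.sha256()
            # Read in 1 MiB chunks so large weight files are not loaded at once.
            with file_path.open("rb") as handle:
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[file_path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest
```

Compare the manifest computed right after export with one computed where the model is consumed; a missing key or a differing digest means the directory changed in transit.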