A saved Sentence Transformers directory is the handoff point between training, packaging, and later embedding jobs. Saving the model locally keeps the tokenizer, pooling configuration, and weight files together so a later Python process can reload the same embedding behavior without depending on the original model object.
The save_pretrained() method writes the model modules, configuration, tokenizer files, and weights to a directory. Loading that saved path with SentenceTransformer() should recreate the model stack, and local_files_only=True makes the reload fail if the path is incomplete instead of falling back to a remote download.
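To make the saved layout concrete, the sketch below reads modules.json from a saved directory and lists the module stack in load order. It assumes the file is a JSON list whose entries carry idx, type, and path keys, which matches what recent Sentence Transformers releases write but is worth confirming against your own export. The helper names list_modules and demo_modules are hypothetical, and the demo uses a stand-in file rather than a real model.

```python
import json
import tempfile
from pathlib import Path


def list_modules(model_dir):
    """Return (idx, type, path) tuples from modules.json, sorted by idx.

    Assumption: every entry has "idx", "type", and "path" keys.
    """
    raw = (Path(model_dir) / "modules.json").read_text(encoding="utf-8")
    entries = sorted(json.loads(raw), key=lambda entry: entry["idx"])
    return [(e["idx"], e["type"], e["path"]) for e in entries]


def demo_modules():
    """Write a stand-in modules.json (not a real export) and list it."""
    sample = [
        {
            "idx": 1,
            "name": "1",
            "path": "1_Pooling",
            "type": "sentence_transformers.models.Pooling",
        },
        {
            "idx": 0,
            "name": "0",
            "path": "",
            "type": "sentence_transformers.models.Transformer",
        },
    ]
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "modules.json"
        target.write_text(json.dumps(sample), encoding="utf-8")
        return list_modules(tmp)


if __name__ == "__main__":
    for idx, module_type, path in demo_modules():
        # An empty path means the module's files sit in the model root.
        print(idx, module_type, path or "(model root)")
```

Pointing list_modules at a real export such as models/support-embeddings should show the transformer module first and the pooling module after it, which is the stack the reload has to rebuild.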
A public MiniLM embedding model can stand in for a fine-tuned checkpoint during smoke testing. The target path should not exist before export, and the pass condition is matching embedding shapes plus a max difference within the script's 1e-6 tolerance (ideally zero) after reload.
Use the model ID and local path defined in the script, and start from a path that does not exist yet; the script exits rather than writing into an existing one. A path that already contains another model can leave stale tokenizer, pooling, or weight files beside the new export.
from pathlib import Path

import numpy as np
from sentence_transformers import SentenceTransformer

model_id = "sentence-transformers/all-MiniLM-L6-v2"
save_path = Path("models/support-embeddings")
texts = [
    "reset password",
    "change billing address",
]
# Files a complete export is expected to contain.
files = [
    "modules.json",
    "config_sentence_transformers.json",
    "model.safetensors",
    "1_Pooling/config.json",
]

# Refuse to write into an existing path so stale files cannot mix in.
if save_path.exists():
    raise SystemExit(f"{save_path} exists; choose a path that does not exist")

model = SentenceTransformer(model_id)
before = model.encode(texts)

model.save_pretrained(str(save_path), safe_serialization=True)

missing = [name for name in files if not (save_path / name).exists()]
if missing:
    raise SystemExit("missing saved files: " + ", ".join(missing))

# Reload strictly from disk, with no fallback to a remote download.
reloaded = SentenceTransformer(str(save_path), local_files_only=True)
after = reloaded.encode(texts)

# Check shapes first so the subtraction below cannot broadcast or fail.
if before.shape != after.shape:
    raise SystemExit("shape mismatch")

diff = float(np.max(np.abs(before - after)))
if diff > 1e-6:
    raise SystemExit("embedding mismatch")

print("saved path: ok")
print("checked files:", len(files))
print("shape before:", before.shape)
print("shape after:", after.shape)
print("max diff:", f"{diff:.8f}")
safe_serialization=True writes the model weights as .safetensors. Keep it enabled unless a downstream runtime specifically requires legacy PyTorch weight files.
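One way to confirm that a weight file really is in safetensors format is to read its header without loading any tensors. The sketch below relies on the published safetensors layout: an 8-byte little-endian header length followed by a JSON header that maps tensor names to dtype, shape, and data offsets. The helper names are hypothetical, and the demo writes a tiny stub file rather than touching a real model.

```python
import json
import struct
import tempfile
from pathlib import Path


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading weights."""
    with open(path, "rb") as fh:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", fh.read(8))
        return json.loads(fh.read(header_len).decode("utf-8"))


def write_stub_safetensors(path):
    """Write a stand-in file holding one float32 tensor of shape (2,)."""
    header = {
        "demo.weight": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}
    }
    header_bytes = json.dumps(header).encode("utf-8")
    payload = struct.pack("<2f", 1.0, 2.0)
    prefix = struct.pack("<Q", len(header_bytes))
    Path(path).write_bytes(prefix + header_bytes + payload)


def demo_header():
    """Round-trip the stub file and return {tensor name: shape}."""
    with tempfile.TemporaryDirectory() as tmp:
        stub = Path(tmp) / "model.safetensors"
        write_stub_safetensors(stub)
        header = read_safetensors_header(stub)
    # "__metadata__" is an optional free-form entry, not a tensor.
    return {
        name: info["shape"]
        for name, info in header.items()
        if name != "__metadata__"
    }


if __name__ == "__main__":
    print(demo_header())
```

Against a real export, read_safetensors_header("models/support-embeddings/model.safetensors") would list the saved tensors; a legacy PyTorch weight file would typically fail to parse as this layout.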
$ python save_reload.py
saved path: ok
checked files: 4
shape before: (2, 384)
shape after: (2, 384)
max diff: 0.00000000
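The printed shapes should match the number of input texts and the embedding dimension of the saved model, here 2 texts and 384 dimensions. A max difference above the 1e-6 tolerance means the reload did not reproduce the same embeddings, and the script exits with "embedding mismatch" instead of printing the summary.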
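The pass condition can also be packaged as a small reusable check. The sketch below is one possible shape for it, not part of the script above: compare_embeddings is a hypothetical helper, the 1e-6 default mirrors the script's tolerance, and the arrays are synthetic stand-ins for encode() output rather than real embeddings.

```python
import numpy as np


def compare_embeddings(before, after, tol=1e-6):
    """Return (ok, reason, max_diff) for two embedding arrays."""
    before = np.asarray(before)
    after = np.asarray(after)
    # Compare shapes first; subtracting mismatched arrays could
    # broadcast silently or raise.
    if before.shape != after.shape:
        return False, "shape mismatch", None
    max_diff = float(np.max(np.abs(before - after)))
    if max_diff > tol:
        return False, "embedding mismatch", max_diff
    return True, "ok", max_diff


# Synthetic stand-ins: 2 texts, 384 dimensions.
reference = np.zeros((2, 384), dtype=np.float32)
drifted = reference.copy()
drifted[0, 0] = 0.01  # one value moved well past the tolerance

print(compare_embeddings(reference, reference.copy()))
print(compare_embeddings(reference, drifted)[1])
print(compare_embeddings(reference, reference[:1])[1])
```

Returning a reason string instead of exiting keeps the check usable inside a test suite or a packaging step that needs to report several failures at once.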
from sentence_transformers import SentenceTransformer

# Load only from the saved directory; never fall back to a download.
model = SentenceTransformer(
    "models/support-embeddings",
    local_files_only=True,
)
embeddings = model.encode(["reset password"])
print("local shape:", embeddings.shape)
$ python load_model.py
local shape: (1, 384)
$ rm save_reload.py load_model.py
This removes only the temporary validation scripts. Keep models/support-embeddings as the saved model artifact.