A saved Sentence Transformers directory is the handoff point between training, packaging, and later embedding jobs. Saving the model locally keeps the tokenizer, pooling configuration, and weight files together so a later Python process can reload the same embedding behavior without depending on the original model object.

The save_pretrained() method writes the model modules, configuration, tokenizer files, and weights to a directory. Loading that saved path with SentenceTransformer() should recreate the model stack, and local_files_only=True makes the reload fail if the path is incomplete instead of falling back to a remote download.
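To see which modules a saved directory would rebuild, the module list can be read directly from modules.json. The following is a minimal sketch, assuming the layout written by recent Sentence Transformers releases (a JSON list whose entries carry idx, path, and type keys). list_saved_modules is a hypothetical helper, and the demo writes a stand-in modules.json to a temporary directory rather than exporting a real model.

```python
import json
import tempfile
from pathlib import Path


def list_saved_modules(model_dir):
    """Return (idx, path, type) tuples from modules.json, ordered by idx.

    Hypothetical helper; assumes each entry has "idx", "path", and "type".
    """
    entries = json.loads((Path(model_dir) / "modules.json").read_text())
    return [
        (entry["idx"], entry["path"], entry["type"])
        for entry in sorted(entries, key=lambda entry: entry["idx"])
    ]


def demo():
    # Stand-in modules.json for illustration only, not a real export.
    sample = [
        {"idx": 1, "name": "1", "path": "1_Pooling",
         "type": "sentence_transformers.models.Pooling"},
        {"idx": 0, "name": "0", "path": "",
         "type": "sentence_transformers.models.Transformer"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "modules.json").write_text(json.dumps(sample))
        return list_saved_modules(tmp)


if __name__ == "__main__":
    for idx, path, module_type in demo():
        print(idx, path or ".", module_type)
```

An empty path in an entry refers to the top level of the saved directory, which is where the transformer weights and tokenizer files normally sit.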

A public MiniLM embedding model can stand in for a fine-tuned checkpoint during smoke testing. The local directory should not exist before export, and the pass condition is a matching embedding shape plus a max difference within a small tolerance after reload.
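The pass condition can be expressed as a small comparison function. This is a dependency-free sketch that works on nested lists (a NumPy array can be converted with .tolist()). The helper names and the 1e-6 default tolerance are illustrative assumptions, and the sample values are toy numbers rather than real embeddings.

```python
def shape_of(rows):
    """(number of rows, row length) for a rectangular list of lists."""
    return (len(rows), len(rows[0]) if rows else 0)


def max_abs_diff(before, after):
    """Largest element-wise absolute difference between two 2-D lists."""
    return max(
        abs(x - y)
        for row_a, row_b in zip(before, after)
        for x, y in zip(row_a, row_b)
    )


def embeddings_match(before, after, tol=1e-6):
    """True when shapes are equal and no element differs by more than tol."""
    if shape_of(before) != shape_of(after):
        return False
    if not before or not before[0]:
        return True  # nothing to compare
    return max_abs_diff(before, after) <= tol


# Toy values for illustration only.
before = [[0.1, 0.2], [0.3, 0.4]]
same = [[0.1, 0.2], [0.3, 0.4]]
shifted = [[0.1, 0.2], [0.3, 0.41]]

if __name__ == "__main__":
    print(embeddings_match(before, same))
    print(embeddings_match(before, shifted))
```

Checking the shape first keeps a mismatched row count from being silently truncated by zip() during the element-wise comparison.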

Steps to save and reload a Sentence Transformers model:

  1. Choose the source model and local directory for the saved copy.

    Use the model ID and local path defined in the script, and start with a path that does not exist yet; the script exits if the path is already present. A path that already contains another model can leave stale tokenizer, pooling, or weight files beside the new export.

  2. Create a save and reload smoke-test script.
    save_reload.py
    from pathlib import Path
     
    import numpy as np
    from sentence_transformers import (
        SentenceTransformer as ST,
    )
     
    model_id = (
        "sentence-transformers/"
        "all-MiniLM-L6-v2"
    )
    save_path = Path(
        "models/"
        "support-embeddings"
    )
    texts = [
        "reset password",
        "change billing address",
    ]
    files = [
        "modules.json",
        (
            "config_sentence_"
            "transformers.json"
        ),
        "model.safetensors",
        "1_Pooling/config.json",
    ]
     
    if save_path.exists():
        message = (
            f"{save_path} exists; "
            "choose an empty path"
        )
        raise SystemExit(message)
     
    model = ST(model_id)
    before = model.encode(
        texts,
    )
     
    model.save_pretrained(
        str(save_path),
        safe_serialization=True,
    )
     
    missing = []
    for name in files:
        if not (save_path / name).exists():
            missing.append(name)
    if missing:
        raise SystemExit(
            "missing saved files: "
            + ", ".join(missing)
        )
     
    reloaded = ST(
        str(save_path),
        local_files_only=True,
    )
    after = reloaded.encode(
        texts,
    )
     
    if before.shape != after.shape:
        raise SystemExit(
            "shape mismatch"
        )
    # Compare values only after the
    # shapes are known to match.
    delta = np.abs(before - after)
    diff = float(
        np.max(delta)
    )
    if diff > 1e-6:
        raise SystemExit(
            "embedding mismatch"
        )
     
    print("saved path: ok")
    print(
        "checked files:",
        len(files),
    )
    print(
        "shape before:",
        before.shape,
    )
    print(
        "shape after:",
        after.shape,
    )
    print(
        "max diff:",
        f"{diff:.8f}",
    )

    safe_serialization=True writes the model weights as .safetensors. Keep it enabled unless a downstream runtime specifically requires legacy PyTorch weight files.

  3. Run the smoke-test script.
    $ python save_reload.py
    saved path: ok
    checked files: 4
    shape before: (2, 384)
    shape after: (2, 384)
    max diff: 0.00000000

    The printed shapes should match the number of input texts and the embedding dimension of the saved model. The script exits with "embedding mismatch" when the max difference exceeds 1e-6, which means the reload did not reproduce the same embeddings.

  4. Create a separate local-path reload script for the application handoff.
    load_model.py
    from sentence_transformers import (
        SentenceTransformer as ST,
    )
     
    model = ST(
        "models/"
        "support-embeddings",
        local_files_only=True,
    )
    embeddings = model.encode(
        ["reset password"],
    )
    print(
        "local shape:",
        embeddings.shape,
    )

  5. Run the local-path reload script from the same project directory.
    $ python load_model.py
    local shape: (1, 384)

  6. Remove the smoke-test scripts after the saved model has passed the reload checks.
    $ rm save_reload.py \
      load_model.py

    This removes only the temporary validation scripts. Keep models/support-embeddings as the saved model artifact.
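Because the validation scripts are removed, an application that loads the kept artifact may want its own pre-load file check. The sketch below reuses the file names from the smoke test. missing_model_files and REQUIRED_FILES are hypothetical names, the list should be adjusted for models with a different module layout, and the demo builds a deliberately incomplete stand-in directory instead of touching a real model.

```python
import tempfile
from pathlib import Path

# Same file names the smoke test checks; adjust for other model layouts.
REQUIRED_FILES = (
    "modules.json",
    "config_sentence_transformers.json",
    "model.safetensors",
    "1_Pooling/config.json",
)


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the required relative paths that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


def demo():
    # Build a deliberately incomplete stand-in directory (empty files only).
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "modules.json").touch()
        (root / "1_Pooling").mkdir()
        (root / "1_Pooling" / "config.json").touch()
        return missing_model_files(root)


if __name__ == "__main__":
    print("missing:", demo())
```

In an application, calling missing_model_files("models/support-embeddings") before constructing the model gives a clearer error than a failed load, while local_files_only=True still guards against a silent remote download.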