Sentence Transformers can run embedding inference through PyTorch, ONNX Runtime, or OpenVINO. Choosing the backend before wiring the model into an application keeps the first deployment aligned with the available hardware, installed runtime packages, and export format.
The backend argument controls the inference runtime used by SentenceTransformer. torch is the default path and usually fits development, GPU-backed experiments, and the first working local baseline. onnx and openvino add runtime-specific dependencies and may export the model on first load when a matching exported file is not already present.
Treat the backend check as a short smoke test, not as a substitute for benchmarking. A backend is ready for the workload when the runtime package is present, the model loads with that backend name, and a sample encode call returns the expected embedding shape.
$ python - <<'PY'
import importlib.util
import torch
# torch is listed unconditionally because this script already imports it.
available = ["torch"]
# find_spec returns None when the named package is not installed.
if importlib.util.find_spec("onnxruntime"):
    available.append("onnx")
if importlib.util.find_spec("openvino"):
    available.append("openvino")
print("available_backends=" + ",".join(available))
print(f"cuda_available={torch.cuda.is_available()}")
PY
available_backends=torch
cuda_available=False
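torch is always listed because the script imports PyTorch, which Sentence Transformers requires. onnx is listed when the onnxruntime package is importable, and openvino is listed when the openvino package is importable. The check only confirms that the runtime package is present; loading the model with that backend name is still the real test.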
Choose torch for the first local baseline, GPU-backed PyTorch deployments, or model work that still changes often. Choose onnx when the service already standardizes on ONNX Runtime. Choose openvino for CPU-focused deployments on hardware where OpenVINO is the intended inference runtime.
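The guidance above can be written down as a small decision helper. This is only a sketch: choose_backend and its parameter names are hypothetical and not part of Sentence Transformers, and giving onnx precedence over openvino when both flags are set is an assumption made for illustration.

```python
def choose_backend(
    *,
    standardized_on_onnxruntime: bool = False,
    openvino_cpu_target: bool = False,
    available: tuple = ("torch",),
) -> str:
    """Hypothetical helper that mirrors the backend guidance in this guide.

    `available` is the list printed by the availability check; a backend
    that is not installed is never returned, and torch is the fallback.
    """
    # Service already standardizes on ONNX Runtime and the runtime is installed.
    if standardized_on_onnxruntime and "onnx" in available:
        return "onnx"
    # CPU-focused deployment where OpenVINO is the intended runtime.
    if openvino_cpu_target and "openvino" in available:
        return "openvino"
    # First local baseline, GPU-backed PyTorch, or a model that still changes often.
    return "torch"


if __name__ == "__main__":
    print(choose_backend(standardized_on_onnxruntime=True, available=("torch", "onnx")))
```

Falling back to torch when the requested runtime is missing keeps the first deployment working, at the cost of silently ignoring the preference; a stricter variant could raise an error instead.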
$ pip install -U "sentence-transformers[onnx]"
Use sentence-transformers[onnx-gpu] when ONNX Runtime needs GPU providers, or sentence-transformers[openvino] when the selected backend is openvino. Skip this step for the default torch backend.
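The mapping from backend to install target is easy to get wrong in deployment scripts, so it may help to keep it in one place. The sketch below uses the extras named above; pip_requirement is a hypothetical helper name, and treating the gpu flag as meaningful only for onnx is an assumption.

```python
def pip_requirement(backend: str, gpu: bool = False) -> str:
    """Return the pip requirement string for a backend (hypothetical helper)."""
    if backend not in ("torch", "onnx", "openvino"):
        raise ValueError(f"unknown backend: {backend!r}")
    if backend == "torch":
        # The default backend needs no extra.
        return "sentence-transformers"
    # Only the onnx backend has a separate GPU extra in this guide.
    extra = "onnx-gpu" if backend == "onnx" and gpu else backend
    return f"sentence-transformers[{extra}]"


if __name__ == "__main__":
    for name in ("torch", "onnx", "openvino"):
        print(name, "->", pip_requirement(name))
```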
$ BACKEND=torch python - <<'PY'
import os
from sentence_transformers import SentenceTransformer
backend = os.environ["BACKEND"]
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", backend=backend)
embeddings = model.encode(["backend choice check", "short text"], convert_to_numpy=True)
print(f"selected_backend={backend}")
print(f"embedding_shape={embeddings.shape}")
PY
selected_backend=torch
embedding_shape=(2, 384)
Replace BACKEND=torch with BACKEND=onnx or BACKEND=openvino after installing the matching extra. The first onnx or openvino load can take longer when Sentence Transformers must export the model before running inference.
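To repeat the same check across several backends and see how long the first load takes, the smoke test can be wrapped in a function. This is a minimal sketch, not a benchmark: smoke_test and its load_model parameter are hypothetical names, and the loader is passed in as a callable so the function itself does not import Sentence Transformers.

```python
import time
from typing import Callable, Sequence


def smoke_test(
    load_model: Callable[[str], object],
    backend: str,
    texts: Sequence[str],
    expected_dim: int,
) -> dict:
    """Load a model with the given backend name and check the embedding shape.

    `load_model` is a caller-supplied function (an assumption of this sketch)
    that takes a backend name and returns an object with an encode method.
    """
    start = time.perf_counter()
    # For onnx or openvino this may include a one-time model export.
    model = load_model(backend)
    load_seconds = time.perf_counter() - start

    embeddings = model.encode(list(texts), convert_to_numpy=True)
    shape = tuple(embeddings.shape)
    return {
        "backend": backend,
        "load_seconds": load_seconds,
        "shape": shape,
        "ok": shape == (len(texts), expected_dim),
    }


# Example wiring, assuming the model used earlier in this guide:
#   from sentence_transformers import SentenceTransformer
#   loader = lambda b: SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", backend=b)
#   print(smoke_test(loader, "torch", ["backend choice check", "short text"], 384))
```

The load time reported on the first onnx or openvino run includes any export work, so compare it only against a later run of the same backend, not across backends.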