Sentence Transformers can run embedding inference through OpenVINO when CPU deployment needs an exported inference graph instead of a live PyTorch transformer. The openvino backend keeps SentenceTransformer.encode() as the application entry point while the transformer component runs from an OpenVINO IR file.
The backend is enabled by installing sentence-transformers[openvino] and passing backend="openvino" when loading the model. A repository or local directory that already contains an OpenVINO model is used directly; otherwise Sentence Transformers can export the model on first load.
Keep the exported directory as the deployment artifact when startup time matters. The OpenVINO XML covers the transformer graph, while Sentence Transformers still handles tokenization, pooling, and optional embedding normalization around it.
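The division of work described above can be illustrated with a toy example of the steps that stay outside the exported graph. This is a minimal sketch, assuming mean pooling over token vectors followed by L2 normalization; the helper names mean_pool and l2_normalize are hypothetical and this is not the library's own implementation.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask is 1, skipping padding tokens."""
    dim = len(token_vectors[0])
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def l2_normalize(vector):
    """Scale a vector to unit length, as normalize_embeddings=True would."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


# Toy transformer output for one text: three token vectors, the last is padding.
token_vectors = [[1.0, 2.0], [5.0, 6.0], [9.0, 9.0]]
attention_mask = [1, 1, 0]

pooled = mean_pool(token_vectors, attention_mask)
unit = l2_normalize(pooled)

print(pooled)  # [3.0, 4.0]
print(unit)    # [0.6, 0.8]
```

In the real pipeline the token vectors would come from the OpenVINO graph; only the surrounding arithmetic is sketched here.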
$ source .venv/bin/activate
(.venv) $ python -m pip install --upgrade "sentence-transformers[openvino]"
The extra installs OpenVINO, Optimum Intel, and related export dependencies.
# openvino_backend_check.py
from pathlib import Path

from sentence_transformers import SentenceTransformer

model_id = "sentence-transformers/all-MiniLM-L6-v2"
save_dir = Path("all-minilm-l6-v2-openvino")
openvino_file = "openvino/openvino_model.xml"

# First load: export the model to OpenVINO during loading.
model = SentenceTransformer(
    model_id,
    backend="openvino",
    model_kwargs={
        "file_name": openvino_file,
        "export": True,
    },
)

embeddings = model.encode(
    [
        "billing question about an invoice",
        "password reset problem",
    ],
    normalize_embeddings=True,
    show_progress_bar=False,
)

# Save the exported model so later loads can skip the conversion.
model.save_pretrained(save_dir)

# Reload from the saved directory only, without exporting again.
reloaded = SentenceTransformer(
    str(save_dir),
    backend="openvino",
    model_kwargs={
        "file_name": openvino_file,
        "export": False,
    },
    local_files_only=True,
)

reloaded_embedding = reloaded.encode(
    ["billing invoice question"],
    show_progress_bar=False,
)

print(f"backend: {model.get_backend()}")
print(f"embedding shape: {embeddings.shape}")
print(f"saved OpenVINO XML: {save_dir / openvino_file}")
print(f"saved XML exists: {(save_dir / openvino_file).exists()}")
print(f"reloaded backend: {reloaded.get_backend()}")
print(f"reloaded shape: {reloaded_embedding.shape}")
export=True makes the first-load conversion explicit. Use export=False when reloading a directory that already contains the exported XML file.
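The choice of export flag can also be derived from what is on disk. The following is a minimal sketch, assuming a local model directory and the openvino/openvino_model.xml layout used in this guide; openvino_model_kwargs is a hypothetical helper, not part of Sentence Transformers, and it does not cover models hosted in a remote repository.

```python
from pathlib import Path


def openvino_model_kwargs(model_dir, file_name="openvino/openvino_model.xml"):
    """Build model_kwargs that request an export only when the XML is absent."""
    xml_path = Path(model_dir) / file_name
    return {"file_name": file_name, "export": not xml_path.is_file()}


# Hypothetical usage: the result would be passed as model_kwargs=... when loading.
kwargs = openvino_model_kwargs("all-minilm-l6-v2-openvino")
print(kwargs)
```

When the helper reports export=True for a directory that does not exist yet, the model still has to be loaded from its original model ID rather than from that directory.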
$ python openvino_backend_check.py
##### snipped #####
backend: openvino
embedding shape: (2, 384)
saved OpenVINO XML: all-minilm-l6-v2-openvino/openvino/openvino_model.xml
saved XML exists: True
reloaded backend: openvino
reloaded shape: (1, 384)
backend: openvino confirms the runtime selection. The first value of embedding shape (2) matches the two input texts, and 384 is the embedding dimension of all-MiniLM-L6-v2. The reloaded shape confirms that the saved directory can be loaded without another export.
$ rm openvino_backend_check.py
Keep all-minilm-l6-v2-openvino when the application should reuse the exported OpenVINO model directory.
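Before shipping the directory, it may be worth checking that the exported files are actually present. The sketch below assumes the usual OpenVINO IR layout, where the weights are stored in a .bin file next to the .xml file with the same stem; missing_ir_files is a hypothetical helper, and the .bin location is an assumption rather than something shown in the output above.

```python
from pathlib import Path


def missing_ir_files(model_dir, xml_name="openvino/openvino_model.xml"):
    """Return the expected IR files (XML plus assumed BIN) that are not on disk."""
    xml_path = Path(model_dir) / xml_name
    bin_path = xml_path.with_suffix(".bin")  # assumption: weights sit beside the XML
    return [str(path) for path in (xml_path, bin_path) if not path.is_file()]


missing = missing_ir_files("all-minilm-l6-v2-openvino")
if missing:
    print("missing files:", missing)
else:
    print("OpenVINO IR files present")
```

An empty list suggests the directory can be reloaded with export=False; a non-empty list points at the files to regenerate.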