How to use the OpenVINO backend with Sentence Transformers

Sentence Transformers can run embedding inference through OpenVINO when CPU deployment needs an exported inference graph instead of a live PyTorch transformer. The openvino backend keeps SentenceTransformer.encode() as the application entry point while the transformer component runs from an OpenVINO IR file.

The backend is enabled by installing sentence-transformers[openvino] and passing backend="openvino" when loading the model. A repository or local directory that already contains an OpenVINO model is used directly; otherwise Sentence Transformers can export the model on first load.

Keep the exported directory as the deployment artifact when startup time matters. The OpenVINO XML covers the transformer graph, while Sentence Transformers still handles tokenization, pooling, and optional embedding normalization around it.
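
To make that split concrete, the sketch below reproduces in plain Python the kind of work that stays outside the exported graph: masked mean pooling over token vectors, followed by L2 normalization. It assumes mean pooling, which may not match every model, and the function names and toy numbers are illustrative only; they are not part of Sentence Transformers.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average token vectors, skipping padding positions (mask == 0)."""
    dim = len(token_vectors[0])
    totals = [0.0] * dim
    kept = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            kept += 1
            for i, value in enumerate(vector):
                totals[i] += value
    return [total / max(kept, 1) for total in totals]


def l2_normalize(vector):
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(v * v for v in vector))
    return list(vector) if norm == 0 else [v / norm for v in vector]


# Toy stand-in for the transformer output: 3 token vectors, the last is padding.
token_vectors = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
attention_mask = [1, 1, 0]

pooled = mean_pool(token_vectors, attention_mask)
embedding = l2_normalize(pooled)
print(pooled)
print(embedding)
```

In a real pipeline the token vectors come from the OpenVINO graph and these steps run inside model.encode(), so application code normally never calls anything like this directly.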

Steps to use the OpenVINO backend with Sentence Transformers:

Activate the Python environment that will run OpenVINO inference.
```
$ source .venv/bin/activate
```
Install the OpenVINO backend extra in the active environment.
```
(.venv) $ python -m pip install --upgrade "sentence-transformers[openvino]"
```
The extra installs OpenVINO, Optimum Intel, and related export dependencies.
Related: How to install Sentence Transformers with pip
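
A quick way to confirm the install took effect is to check that the expected packages can be found before loading any model. The following is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical, and the import names sentence_transformers, optimum.intel, and openvino are assumed to be what the extra provides.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package of a dotted name is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


# Import names assumed for the packages the extra is expected to provide.
required = ["sentence_transformers", "optimum.intel", "openvino"]
print("missing modules:", missing_modules(required) or "none")
```

An empty result only shows that the packages are discoverable in the active environment; it does not prove that a model loads or exports.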

Create an OpenVINO backend smoke-test script.

openvino_backend_check.py

```
from pathlib import Path

from sentence_transformers import SentenceTransformer


model_id = "sentence-transformers/all-MiniLM-L6-v2"
save_dir = Path("all-minilm-l6-v2-openvino")
openvino_file = "openvino/openvino_model.xml"

model = SentenceTransformer(
    model_id,
    backend="openvino",
    model_kwargs={
        "file_name": openvino_file,
        "export": True,
    },
)

embeddings = model.encode(
    [
        "billing question about an invoice",
        "password reset problem",
    ],
    normalize_embeddings=True,
    show_progress_bar=False,
)

model.save_pretrained(save_dir)

reloaded = SentenceTransformer(
    str(save_dir),
    backend="openvino",
    model_kwargs={
        "file_name": openvino_file,
        "export": False,
    },
    local_files_only=True,
)
reloaded_embedding = reloaded.encode(
    ["billing invoice question"],
    show_progress_bar=False,
)

print(f"backend: {model.get_backend()}")
print(f"embedding shape: {embeddings.shape}")
print(f"saved OpenVINO XML: {save_dir / openvino_file}")
print(f"saved XML exists: {(save_dir / openvino_file).exists()}")
print(f"reloaded backend: {reloaded.get_backend()}")
print(f"reloaded shape: {reloaded_embedding.shape}")
```

export=True makes the first-load conversion explicit. Use export=False when reloading a directory that already contains the exported XML file.
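
The export choice can also be derived from what is already on disk. The sketch below is one possible way to do that, not something Sentence Transformers provides: openvino_model_kwargs and load_openvino_model are hypothetical helper names, and the check only looks at local paths, so a Hub model ID is treated as not yet exported.

```python
from pathlib import Path


def openvino_model_kwargs(model_path, file_name="openvino/openvino_model.xml"):
    """Build model_kwargs, exporting only when the XML file is not already on disk."""
    xml_path = Path(model_path) / file_name
    return {"file_name": file_name, "export": not xml_path.is_file()}


def load_openvino_model(model_path):
    """Load with the OpenVINO backend using the kwargs chosen above."""
    # Imported here so the helper above stays usable without the library installed.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(
        str(model_path),
        backend="openvino",
        model_kwargs=openvino_model_kwargs(model_path),
    )


# Prints export=True until the directory from the smoke test has been saved.
print(openvino_model_kwargs("all-minilm-l6-v2-openvino"))
```

Deriving the flag this way keeps a single loading path for both the first run and later runs, at the cost of hiding whether an export actually happened; setting the flag by hand, as the script does, is more explicit.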

Run the smoke-test script.

```
$ python openvino_backend_check.py
##### snipped #####
backend: openvino
embedding shape: (2, 384)
saved OpenVINO XML: all-minilm-l6-v2-openvino/openvino/openvino_model.xml
saved XML exists: True
reloaded backend: openvino
reloaded shape: (1, 384)
```

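backend: openvino confirms the runtime selection. The first value in each shape is the number of input texts (2 for the first call, 1 for the reloaded model), and 384 is the embedding size this model produces. The reloaded shape also confirms that the saved directory loads without another export.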
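
The same checks can be automated rather than read off the output. The helper below is a small sketch with a hypothetical name, check_embeddings, that verifies row count, vector size, and (optionally) unit length, which is what normalize_embeddings=True should produce. It uses toy two-dimensional vectors so it runs without a model.

```python
import math


def check_embeddings(embeddings, expected_rows, expected_dim, unit_norm=False, tol=1e-3):
    """Return a list of problems found; an empty list means the batch looks as expected."""
    problems = []
    rows = [list(row) for row in embeddings]
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(rows)}")
    for index, row in enumerate(rows):
        if len(row) != expected_dim:
            problems.append(f"row {index}: expected {expected_dim} values, got {len(row)}")
        norm = math.sqrt(sum(float(v) ** 2 for v in row))
        if unit_norm and abs(norm - 1.0) > tol:
            problems.append(f"row {index}: L2 norm {norm:.4f} is not 1")
    return problems


# Toy 2-dimensional stand-ins; with the script's model the call would be
# check_embeddings(embeddings, 2, 384, unit_norm=True).
sample = [[0.6, 0.8], [1.0, 0.0]]
print(check_embeddings(sample, expected_rows=2, expected_dim=2, unit_norm=True))
```

Because the function only iterates over rows and values, it should accept the NumPy array returned by encode() as well as plain lists.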

Remove the smoke-test script after the saved directory reloads.
```
$ rm openvino_backend_check.py
```
Keep all-minilm-l6-v2-openvino when the application should reuse the exported OpenVINO model directory.
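
Before shipping the directory, it can help to confirm that the full IR pair is present, since an OpenVINO IR model consists of an XML topology file and a BIN weights file. This sketch assumes the weights sit beside the XML under the same base name; openvino_ir_files is a hypothetical helper, not a library function.

```python
from pathlib import Path


def openvino_ir_files(model_dir, xml_name="openvino/openvino_model.xml"):
    """Return (xml_path, bin_path, complete) for the IR pair expected under model_dir."""
    xml_path = Path(model_dir) / xml_name
    bin_path = xml_path.with_suffix(".bin")  # assumed location of the weights file
    return xml_path, bin_path, xml_path.is_file() and bin_path.is_file()


xml_path, bin_path, complete = openvino_ir_files("all-minilm-l6-v2-openvino")
print(f"{xml_path}: {'found' if xml_path.is_file() else 'missing'}")
print(f"{bin_path}: {'found' if bin_path.is_file() else 'missing'}")
print(f"IR pair complete: {complete}")
```

The check covers only the transformer graph; the tokenizer and module configuration files saved alongside it are still needed for the directory to load.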

Author: Mohd Shakir Zakaria