How to use Sentence Transformers embeddings in LangChain

Local embedding models compute vectors inside the Python process instead of sending text to an external embedding API, which suits retrieval prototypes that should stay on the local machine. LangChain can call Sentence Transformers models through the HuggingFaceEmbeddings integration, which returns vectors in the form that LangChain vector stores and retrievers expect.

The integration lives in the separate langchain-huggingface package. It loads a model from the Hugging Face Hub or a local model path, then exposes embed_query() for one search string and embed_documents() for batches of source text.

Use a small public model first so the setup can be tested on CPU before moving to a larger model. The first run downloads model files into the normal Hugging Face or Sentence Transformers cache, and later runs reuse that cache unless the model name or cache settings change.
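
To check whether a model is already cached before the first run, the expected cache folder can be worked out from environment variables. The following is a minimal sketch that assumes the default Hugging Face Hub cache layout (HF_HUB_CACHE, then HF_HOME, then ~/.cache/huggingface) and no custom cache_folder or SENTENCE_TRANSFORMERS_HOME setting. The helper names expected_hub_cache_dir and expected_model_dir are hypothetical and exist only for this illustration.

```python
import os
from pathlib import Path


def expected_hub_cache_dir(env=None):
    """Resolve the likely Hugging Face Hub cache directory (assumes default layout)."""
    env = os.environ if env is None else env
    # HF_HUB_CACHE, when set, points directly at the hub cache.
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    # Otherwise the hub cache is the "hub" folder under HF_HOME.
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser() / "hub"
    # HF_HOME itself defaults to <XDG_CACHE_HOME or ~/.cache>/huggingface.
    base = env.get("XDG_CACHE_HOME") or "~/.cache"
    return Path(base).expanduser() / "huggingface" / "hub"


def expected_model_dir(model_name, env=None):
    """Return the folder where the hub cache would keep one model repository."""
    folder = "models--" + model_name.replace("/", "--")
    return expected_hub_cache_dir(env) / folder


if __name__ == "__main__":
    model_dir = expected_model_dir("sentence-transformers/all-MiniLM-L6-v2")
    print(f"expected cache folder: {model_dir}")
    print(f"already downloaded: {model_dir.is_dir()}")
```

If the folder is missing, the first run of the embedding script is the one that downloads the model files.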

Steps to use Sentence Transformers embeddings in LangChain:

Open an activated Python project environment.

Related: venv-create

Install the LangChain Hugging Face integration.
```
$ python3 -m pip install --upgrade langchain-huggingface
```
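langchain-huggingface installs the LangChain wrapper. Depending on the release, it may or may not pull in sentence-transformers for local model loading; if the script later reports that sentence-transformers cannot be imported, install it with python3 -m pip install sentence-transformers.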
Related: pip-install

Create the embedding test script.

```
$ cat > langchain-sentence-transformers-embeddings-use.py <<'PY'
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_huggingface import HuggingFaceEmbeddings

# Load a small public model on CPU and return unit-length vectors.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": True},
)

texts = [
    "Reset password requests should go to the identity team.",
    "Invoice export issues belong to the billing queue.",
]

# Embed one query string and a batch of documents directly.
query_vector = embeddings.embed_query("How do I index support tickets?")
document_vectors = embeddings.embed_documents(texts)

# Index the same texts in an in-memory vector store and fetch the closest one.
vector_store = InMemoryVectorStore.from_texts(texts, embedding=embeddings)
match = vector_store.similarity_search("password help", k=1)[0]

print(f"query dimensions: {len(query_vector)}")
print(f"documents encoded: {len(document_vectors)}")
print(f"first document dimensions: {len(document_vectors[0])}")
print(f"query norm: {sum(value * value for value in query_vector) ** 0.5:.3f}")
print(f"best match: {match.page_content}")
PY
```

embed_query() accepts one search string. embed_documents() accepts a list of document strings and returns one vector per item.
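
The input and output shapes of the two methods can be illustrated without downloading a model. The sketch below uses a hypothetical ToyEmbeddings class that only imitates the method names and return shapes. It hashes tokens into a small fixed-size vector and is not a real embedding model or part of LangChain.

```python
import hashlib
import math


class ToyEmbeddings:
    """Hypothetical stand-in that mimics the two method shapes; not a real model."""

    def __init__(self, dimensions=8):
        self.dimensions = dimensions

    def _encode(self, text):
        # Hash each lowercase token into one of `dimensions` buckets.
        vector = [0.0] * self.dimensions
        for token in text.lower().split():
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            vector[digest[0] % self.dimensions] += 1.0
        # Scale to unit length, similar in spirit to normalize_embeddings=True.
        norm = math.sqrt(sum(v * v for v in vector)) or 1.0
        return [v / norm for v in vector]

    def embed_query(self, text):
        # One string in, one vector (list of floats) out.
        return self._encode(text)

    def embed_documents(self, texts):
        # List of strings in, list of vectors out, in the same order.
        return [self._encode(text) for text in texts]


toy = ToyEmbeddings()
query_vector = toy.embed_query("password help")
document_vectors = toy.embed_documents(["reset password", "invoice export"])

print(len(query_vector))      # length of a single vector
print(len(document_vectors))  # one vector per input text
```

Passing a single string to embed_documents() would iterate over its characters in a stand-in like this one, so wrap a lone document in a list.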

Run the script.

```
$ python3 langchain-sentence-transformers-embeddings-use.py
query dimensions: 384
documents encoded: 2
first document dimensions: 384
query norm: 1.000
best match: Reset password requests should go to the identity team.
```

The first run may print model download progress before the final lines.
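
The query norm of 1.000 follows from normalize_embeddings=True, which scales each vector to unit length. For unit-length vectors the dot product equals the cosine similarity, which is one reason normalized embeddings are convenient for similarity search. The sketch below shows this with made-up three-dimensional vectors and plain Python. It assumes a cosine-style score and is not the vector store's actual implementation.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed without assuming unit length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Made-up 3-dimensional vectors; real all-MiniLM-L6-v2 vectors have 384 values.
query = normalize([3.0, 4.0, 0.0])
doc_a = normalize([6.0, 8.0, 1.0])
doc_b = normalize([0.0, 1.0, 5.0])

print(f"query norm: {math.sqrt(dot(query, query)):.3f}")

# For unit vectors the dot product and the cosine similarity agree.
print(math.isclose(dot(query, doc_a), cosine(query, doc_a)))

# Picking the highest score mirrors a k=1 similarity search.
best = max([("doc_a", doc_a), ("doc_b", doc_b)], key=lambda item: dot(query, item[1]))
print(f"best match: {best[0]}")
```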

Remove the temporary script.

```
$ rm langchain-sentence-transformers-embeddings-use.py
```

Author: Mohd Shakir Zakaria