Local embedding models keep retrieval prototypes inside the Python process instead of sending text to an external embedding API. LangChain can call Sentence Transformers models through the HuggingFaceEmbeddings integration, which returns vectors that fit LangChain retrievers and vector stores.
The integration lives in the separate langchain-huggingface package. It loads a model from the Hugging Face Hub or a local model path, then exposes embed_query() for one search string and embed_documents() for batches of source text.
Use a small public model first so the setup can be tested on CPU before moving to a larger model. The first run downloads model files into the normal Hugging Face or Sentence Transformers cache, and later runs reuse that cache unless the model name or cache settings change.
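To check where the first-run download is likely to land, the cache lookup can be sketched with the standard library alone. This is a rough approximation, not the libraries' actual resolution code: it assumes that SENTENCE_TRANSFORMERS_HOME takes priority when set, that HF_HUB_CACHE comes next, and that the fallback is the hub directory under HF_HOME (default ~/.cache/huggingface). The helper name resolve_embedding_cache() is hypothetical.

```python
import os
from pathlib import Path


def resolve_embedding_cache() -> Path:
    """Guess where downloaded model files are stored (assumed lookup order)."""
    # Assumption: a Sentence Transformers specific cache override wins if set.
    st_home = os.environ.get("SENTENCE_TRANSFORMERS_HOME")
    if st_home:
        return Path(st_home).expanduser()
    # Assumption: otherwise an explicit Hugging Face Hub cache path applies.
    hub_cache = os.environ.get("HF_HUB_CACHE")
    if hub_cache:
        return Path(hub_cache).expanduser()
    # Assumption: the default hub cache lives under HF_HOME/hub.
    hf_home = os.environ.get("HF_HOME", "~/.cache/huggingface")
    return Path(hf_home).expanduser() / "hub"


if __name__ == "__main__":
    cache_dir = resolve_embedding_cache()
    print(f"expected cache directory: {cache_dir}")
    print(f"exists yet: {cache_dir.is_dir()}")
```

If the printed directory does not exist yet, the next model load will probably trigger a download. Changing any of these variables between runs is one way the cache settings mentioned above can change.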
Steps to use Sentence Transformers embeddings in LangChain:
- Open an activated Python project environment.
Related: venv-create
- Install the LangChain Hugging Face integration.
$ python3 -m pip install --upgrade langchain-huggingface
langchain-huggingface installs the LangChain wrapper and pulls in sentence-transformers for local model loading.
Related: pip-install
- Create the embedding test script.
$ cat > langchain-sentence-transformers-embeddings-use.py <<'PY'
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": True},
)

texts = [
    "Reset password requests should go to the identity team.",
    "Invoice export issues belong to the billing queue.",
]

query_vector = embeddings.embed_query("How do I index support tickets?")
document_vectors = embeddings.embed_documents(texts)
vector_store = InMemoryVectorStore.from_texts(texts, embedding=embeddings)
match = vector_store.similarity_search("password help", k=1)[0]

print(f"query dimensions: {len(query_vector)}")
print(f"documents encoded: {len(document_vectors)}")
print(f"first document dimensions: {len(document_vectors[0])}")
print(f"query norm: {sum(value * value for value in query_vector) ** 0.5:.3f}")
print(f"best match: {match.page_content}")
PY
embed_query() accepts one search string. embed_documents() accepts a list of document strings and returns one vector per item.
- Run the script.
$ python3 langchain-sentence-transformers-embeddings-use.py
query dimensions: 384
documents encoded: 2
first document dimensions: 384
query norm: 1.000
best match: Reset password requests should go to the identity team.
The first run may print model download progress before the final lines.
- Remove the temporary script.
$ rm langchain-sentence-transformers-embeddings-use.py
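The test script sets normalize_embeddings to True and reports a query norm of 1.000. One practical reason to care is that, for unit-length vectors, a plain dot product gives the same value as cosine similarity. The following is a minimal sketch of that relationship in pure Python. The three-dimensional toy vectors and the helper names l2_normalize(), dot(), and cosine() are illustrative assumptions, not part of LangChain or Sentence Transformers.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else list(vector)


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy 3-dimensional vectors standing in for real 384-dimensional embeddings.
query = [3.0, 4.0, 0.0]
documents = {"doc_a": [6.0, 8.0, 0.0], "doc_b": [0.0, 1.0, 1.0]}

unit_query = l2_normalize(query)
scores = {}
for name, vector in documents.items():
    unit_doc = l2_normalize(vector)
    scores[name] = dot(unit_query, unit_doc)
    # After normalization the dot product matches cosine similarity.
    assert math.isclose(scores[name], cosine(query, vector))

best = max(scores, key=scores.get)
print(f"best match: {best}")
```

Here doc_a points in the same direction as the query, so it ranks first regardless of its larger magnitude. Real embeddings behave the same way once normalized, only with more dimensions.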
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.