Local embedding models keep the embedding step of a retrieval prototype inside the Python process instead of sending text to an external embedding API. LangChain can call Sentence Transformers models through the HuggingFaceEmbeddings integration, which returns vectors as plain lists of floats that LangChain retrievers and vector stores accept.
The integration lives in the separate langchain-huggingface package. It loads a model from the Hugging Face Hub or a local model path, then exposes embed_query() for one search string and embed_documents() for batches of source text.
Use a small public model first so the setup can be tested on CPU before moving to a larger model. The first run downloads model files into the normal Hugging Face or Sentence Transformers cache, and later runs reuse that cache unless the model name or cache settings change.
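To check whether the model files are already cached, the following sketch resolves the default Hugging Face Hub cache directory from the HF_HUB_CACHE, HF_HOME, and XDG_CACHE_HOME environment variables. The helper names default_hub_cache and cached_model_dir are hypothetical, and the sketch assumes the default Hub cache layout. It does not account for SENTENCE_TRANSFORMERS_HOME or a cache_folder argument passed to the integration.

```python
import os
from pathlib import Path


def default_hub_cache(env=None):
    """Resolve the Hub cache directory from the documented environment variables."""
    env = os.environ if env is None else env
    # HF_HUB_CACHE points directly at the hub cache when it is set.
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    # Otherwise the hub cache is the "hub" folder under HF_HOME.
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser() / "hub"
    # HF_HOME itself defaults to ~/.cache/huggingface (or $XDG_CACHE_HOME/huggingface).
    base = env.get("XDG_CACHE_HOME") or "~/.cache"
    return Path(base).expanduser() / "huggingface" / "hub"


def cached_model_dir(model_name, env=None):
    """Hypothetical helper: expected cache folder for a Hub model id."""
    # The Hub cache stores a model repository as models--<org>--<name>.
    return default_hub_cache(env) / ("models--" + model_name.replace("/", "--"))


if __name__ == "__main__":
    path = cached_model_dir("sentence-transformers/all-MiniLM-L6-v2")
    print(path, "(cached)" if path.exists() else "(not downloaded yet)")
```

If the printed folder exists, a later run should reuse it rather than download the model again.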
$ python3 -m pip install --upgrade langchain-huggingface
langchain-huggingface installs the LangChain wrapper and pulls in sentence-transformers for local model loading.
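A quick way to confirm the install is to check that both modules can be found before running anything that loads a model. This is a minimal sketch using only the standard library. The helper name missing_modules is hypothetical, and the module names langchain_huggingface and sentence_transformers are the import names of the two packages.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    required = ["langchain_huggingface", "sentence_transformers"]
    missing = missing_modules(required)
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all required modules are importable")
```

If sentence_transformers is reported missing on a given release, installing sentence-transformers explicitly with pip should resolve it.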
$ cat > langchain-sentence-transformers-embeddings-use.py <<'PY'
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_huggingface import HuggingFaceEmbeddings

# Load a small public model on CPU and return unit-length vectors.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": True},
)

texts = [
    "Reset password requests should go to the identity team.",
    "Invoice export issues belong to the billing queue.",
]

# Embed one query string and a batch of documents directly.
query_vector = embeddings.embed_query("How do I index support tickets?")
document_vectors = embeddings.embed_documents(texts)

# Index the same texts in an in-memory vector store and fetch the closest one.
vector_store = InMemoryVectorStore.from_texts(texts, embedding=embeddings)
match = vector_store.similarity_search("password help", k=1)[0]

print(f"query dimensions: {len(query_vector)}")
print(f"documents encoded: {len(document_vectors)}")
print(f"first document dimensions: {len(document_vectors[0])}")
print(f"query norm: {sum(value * value for value in query_vector) ** 0.5:.3f}")
print(f"best match: {match.page_content}")
PY
embed_query() accepts one search string. embed_documents() accepts a list of document strings and returns one vector per item.
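The shape of that two-method contract can be sketched without downloading a model. The class below, HashingEmbeddings, is a hypothetical stand-in written for illustration only. It is not part of LangChain, and its hash-based vectors carry no semantic meaning.

```python
import hashlib


class HashingEmbeddings:
    """Hypothetical stand-in with the same two methods; not a real model."""

    def __init__(self, size=8):
        self.size = size

    def embed_query(self, text):
        # One string in, one vector (a list of floats) out.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [byte / 255.0 for byte in digest[: self.size]]

    def embed_documents(self, texts):
        # A list of strings in, one vector per string out, in the same order.
        return [self.embed_query(text) for text in texts]


fake = HashingEmbeddings()
vectors = fake.embed_documents(["first document", "second document"])
print(len(vectors), len(vectors[0]))  # prints "2 8"
```

A stand-in like this may be useful for testing surrounding code paths quickly, but any similarity ranking it produces is arbitrary.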
$ python3 langchain-sentence-transformers-embeddings-use.py
query dimensions: 384
documents encoded: 2
first document dimensions: 384
query norm: 1.000
best match: Reset password requests should go to the identity team.
The first run may print model download progress before the final lines.
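The query norm of 1.000 follows from normalize_embeddings being set to True. For unit-length vectors, a plain dot product gives the same value as cosine similarity. The sketch below illustrates this with small made-up vectors rather than real model output. The helper names l2_normalize, dot, and cosine are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(value * value for value in vector))
    return [value / norm for value in vector]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed from raw, unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Made-up example vectors, not model output.
raw_query = [3.0, 4.0, 0.0]
raw_doc = [1.0, 2.0, 2.0]
query, doc = l2_normalize(raw_query), l2_normalize(raw_doc)

print(f"norm: {math.sqrt(dot(query, query)):.3f}")           # norm: 1.000
print(f"cosine on raw: {cosine(raw_query, raw_doc):.3f}")    # cosine on raw: 0.733
print(f"dot on normalized: {dot(query, doc):.3f}")           # dot on normalized: 0.733
```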
$ rm langchain-sentence-transformers-embeddings-use.py