Vector search moves semantic matching out of an in-memory prototype and into a collection that stores each vector with an ID and a payload and answers nearest-neighbor queries. Pairing Sentence Transformers with Qdrant lets a Python application encode support documents, upload their vectors, and retrieve the closest record for a natural-language query.
Sentence Transformers produces the dense embeddings through encode_document() and encode_query(). Using these methods instead of plain encode() keeps the retrieval code ready for models that define separate document and query prompts. Qdrant stores each vector with payload fields, so a hit maps back to its source document instead of returning only an array position.
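With normalize_embeddings=True every embedding has unit length, so the cosine score used by a Distance.COSINE collection is the same as a dot product. The NumPy sketch below ranks documents that way. The 3-dimensional vectors are hypothetical toy values chosen for illustration, not real Sentence Transformers output, and cosine_rank is a name invented for this sketch.

```python
import numpy as np


def cosine_rank(query: np.ndarray, documents: np.ndarray, limit: int = 2) -> list[tuple[int, float]]:
    """Return (row index, cosine score) pairs for the closest document rows."""
    # Normalize rows to unit length; after this, cosine similarity is a plain dot product.
    docs = documents / np.linalg.norm(documents, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = docs @ q
    # Sort by descending score and keep the top `limit` rows.
    order = np.argsort(-scores)[:limit]
    return [(int(i), float(scores[i])) for i in order]


# Hypothetical toy vectors standing in for document embeddings.
toy_documents = np.array(
    [
        [0.9, 0.1, 0.0],
        [0.0, 1.0, 0.2],
        [0.5, 0.0, 0.5],
    ],
    dtype="float32",
)
toy_query = np.array([1.0, 0.2, 0.0], dtype="float32")

for index, score in cosine_rank(toy_query, toy_documents):
    print(index, round(score, 4))
```

Qdrant performs the equivalent scoring inside the collection; the sketch only makes the arithmetic visible.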
A local in-memory Qdrant client keeps the smoke test reproducible without a running database container. Persistent Qdrant servers and Qdrant Cloud use the same create_collection(), upload_points(), and query_points() calls once the client is pointed at the service.
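One way to let the same script run in either mode is to build the client arguments from the environment. The sketch below is an assumption-laden example: QDRANT_URL and QDRANT_API_KEY are hypothetical variable names chosen here, and qdrant_client_kwargs is an invented helper whose result would be passed as QdrantClient(**qdrant_client_kwargs()).

```python
import os
from typing import Mapping


def qdrant_client_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Build keyword arguments for qdrant_client.QdrantClient.

    QDRANT_URL and QDRANT_API_KEY are hypothetical variable names for this sketch.
    Without QDRANT_URL, the client runs in-process with ":memory:" storage.
    """
    url = env.get("QDRANT_URL")
    if not url:
        # Local smoke test: no server, data is lost when the process exits.
        return {"location": ":memory:"}
    kwargs = {"url": url}
    api_key = env.get("QDRANT_API_KEY")
    if api_key:
        # Secured servers and Qdrant Cloud expect an API key.
        kwargs["api_key"] = api_key
    return kwargs


if __name__ == "__main__":
    # Print only the argument names so an API key never reaches the logs.
    print(sorted(qdrant_client_kwargs()))
```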
```console
$ python -m pip install --upgrade sentence-transformers qdrant-client
```
The first model run may download model files from Hugging Face.
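encode_document(), encode_query(), and query_points() arrived in relatively recent releases, so an older environment can fail only after the model download. The sketch below probes for the methods this page calls instead of comparing version numbers; missing_features and the REQUIRED table are hypothetical names for illustration.

```python
import importlib

# Module -> (class name, methods the indexing script calls on that class).
REQUIRED = {
    "sentence_transformers": ("SentenceTransformer", ("encode_document", "encode_query")),
    "qdrant_client": ("QdrantClient", ("create_collection", "upload_points", "query_points", "count")),
}


def missing_features(required=REQUIRED) -> list[str]:
    """Return human-readable problems; an empty list means every method was found."""
    problems = []
    for module_name, (class_name, methods) in required.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            problems.append(f"{module_name} is not installed")
            continue
        cls = getattr(module, class_name, None)
        if cls is None:
            problems.append(f"{module_name}.{class_name} not found")
            continue
        for method in methods:
            if not hasattr(cls, method):
                problems.append(f"{module_name}.{class_name}.{method} is missing; upgrade the package")
    return problems


if __name__ == "__main__":
    issues = missing_features()
    print("\n".join(issues) if issues else "all required methods found")
```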
```python
# build_qdrant_index.py
import numpy as np
from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer

collection_name = "support_docs"
query = "password reset instructions"

# Source records; each dict becomes the payload of one Qdrant point.
corpus = [
    {
        "doc_id": "doc-001",
        "title": "Reset a forgotten password",
        "text": "Reset a forgotten password from account settings and confirm the email link.",
    },
    {
        "doc_id": "doc-002",
        "title": "Create an invoice receipt",
        "text": "Create a billing invoice and download a PDF receipt.",
    },
    {
        "doc_id": "doc-003",
        "title": "Rotate API tokens",
        "text": "Rotate API tokens before sharing a new integration with a teammate.",
    },
    {
        "doc_id": "doc-004",
        "title": "Store semantic vectors",
        "text": "Qdrant stores Sentence Transformers embeddings for semantic search.",
    },
]

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Encode document texts into unit-length vectors (cosine similarity == dot product).
documents = [item["text"] for item in corpus]
document_embeddings = model.encode_document(
    documents,
    normalize_embeddings=True,
    convert_to_numpy=True,
    show_progress_bar=False,
)
document_embeddings = np.asarray(document_embeddings, dtype="float32")
dimension = document_embeddings.shape[1]

# In-process Qdrant instance; nothing is written to disk.
client = QdrantClient(":memory:")
client.create_collection(
    collection_name=collection_name,
    vectors_config=models.VectorParams(
        size=dimension,  # must equal the embedding model's output size
        distance=models.Distance.COSINE,
    ),
)

# One point per record; the payload keeps doc_id, title, and text next to the vector.
client.upload_points(
    collection_name=collection_name,
    points=[
        models.PointStruct(
            id=index,
            vector=vector.tolist(),
            payload=item,
        )
        for index, (item, vector) in enumerate(zip(corpus, document_embeddings), start=1)
    ],
)

# Encode the query with the same model, using the query-side method.
query_embedding = model.encode_query(
    query,
    normalize_embeddings=True,
    convert_to_numpy=True,
    show_progress_bar=False,
)

hits = client.query_points(
    collection_name=collection_name,
    query=query_embedding.tolist(),
    limit=2,
    with_payload=True,
).points

point_count = client.count(collection_name=collection_name, exact=True).count

print(f"collection: {collection_name}")
print(f"vector size: {dimension}")
print(f"points: {point_count}")
print(f"query: {query}")
print("top matches:")
for rank, hit in enumerate(hits, start=1):
    payload = hit.payload
    print(
        f"{rank}. {payload['doc_id']} score={hit.score:.4f} "
        f"title={payload['title']}"
    )

# Fail loudly if the query returns nothing or the wrong top document.
if not hits or hits[0].payload["doc_id"] != "doc-001":
    top = hits[0].payload["doc_id"] if hits else "no results"
    raise SystemExit(f"unexpected top match: {top}")
print("verification: PASS query returned the password reset document")
```
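The script numbers points with enumerate(), so each point ID depends on list order. Qdrant accepts unsigned integers or UUIDs as point IDs, and uploading a point with an existing ID overwrites it. A minimal sketch, assuming every doc_id is unique and never reused, derives a deterministic UUID from doc_id with uuid.uuid5 so a rerun updates the same points. DOC_NAMESPACE and point_id_for are hypothetical names, and the namespace string is an arbitrary constant.

```python
import uuid

# Arbitrary fixed namespace for this sketch; it must never change once points exist.
DOC_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/support-docs")


def point_id_for(doc_id: str) -> str:
    """Map a stable document ID to a deterministic UUID string usable as a Qdrant point ID."""
    return str(uuid.uuid5(DOC_NAMESPACE, doc_id))


if __name__ == "__main__":
    for doc_id in ("doc-001", "doc-002"):
        print(doc_id, point_id_for(doc_id))
```

In the indexing script, the PointStruct would then use id=point_id_for(item["doc_id"]) and iterate over zip(corpus, document_embeddings) without enumerate().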
Replace the in-memory client with QdrantClient(url="https://qdrant.example.com", api_key="qdrant-api-key") when the collection must persist outside the Python process. The collection's vector size must equal the embedding model's output dimension, and query vectors must come from that same model.
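A persistent collection can outlive the model that filled it, so a later run might point a different model at it. The guard below compares the collection's configured size with the model's output dimension before any upload. In a real script the inputs would come from client.get_collection(collection_name).config.params.vectors and model.get_sentence_embedding_dimension(); check_vector_size and the SimpleNamespace stand-in are hypothetical.

```python
from types import SimpleNamespace
from typing import Any, Optional


def check_vector_size(vectors: Any, embedding_dim: int, vector_name: Optional[str] = None) -> None:
    """Raise ValueError if a collection's vector size differs from the embedding dimension.

    `vectors` is assumed to look like Qdrant's collection vector config: either an object
    with a `size` attribute (one unnamed vector) or a dict of name -> such objects.
    """
    if isinstance(vectors, dict):
        if vector_name is None:
            raise ValueError("collection uses named vectors; pass vector_name")
        vectors = vectors[vector_name]
    size = vectors.size
    if size != embedding_dim:
        raise ValueError(
            f"collection expects {size}-dimensional vectors, model produces {embedding_dim}"
        )


if __name__ == "__main__":
    # Stand-in for the collection config created with the 384-dimensional model on this page.
    check_vector_size(SimpleNamespace(size=384), embedding_dim=384)
    print("vector size matches")
```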
```console
$ python build_qdrant_index.py
collection: support_docs
vector size: 384
points: 4
query: password reset instructions
top matches:
1. doc-001 score=0.5070 title=Reset a forgotten password
2. doc-003 score=0.1281 title=Rotate API tokens
verification: PASS query returned the password reset document
```
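doc-001 ranks first because its embedding has the highest cosine similarity to the query embedding. Qdrant scores the vectors, not the payload; the payload comes back with the hit so the result can be traced to its source document. The point count should match the number of source records before an application starts using the collection.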
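A single query is a thin check. A slightly broader smoke test runs several labeled queries and reports how often the expected document lands in the top k. The sketch below accepts any search function that returns doc_id lists, so it could wrap query_points(). hit_rate_at_k, fake_search, and the labeled queries are hypothetical, and the canned results are invented for illustration, not real model output.

```python
from typing import Callable, Mapping, Sequence


def hit_rate_at_k(
    labeled_queries: Mapping[str, str],
    search: Callable[[str, int], Sequence[str]],
    k: int = 2,
) -> float:
    """Fraction of queries whose expected doc_id appears in the top-k results."""
    if not labeled_queries:
        raise ValueError("need at least one labeled query")
    hits = sum(
        1
        for query, expected in labeled_queries.items()
        if expected in list(search(query, k))[:k]
    )
    return hits / len(labeled_queries)


def fake_search(query: str, k: int) -> list[str]:
    """Hypothetical stand-in for a Qdrant-backed search with canned results."""
    canned = {
        "password reset instructions": ["doc-001", "doc-003"],
        "download a receipt": ["doc-002", "doc-001"],
        "vector database for embeddings": ["doc-001", "doc-002"],
    }
    return canned.get(query, [])[:k]


labeled = {
    "password reset instructions": "doc-001",
    "download a receipt": "doc-002",
    "vector database for embeddings": "doc-004",
}
print(f"hit rate@2: {hit_rate_at_k(labeled, fake_search):.2f}")
```

Against the real collection, search could be a function that encodes the query with model.encode_query(), calls client.query_points(...) with limit=k, and returns [p.payload["doc_id"] for p in result.points].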