Local semantic search needs two things: dense vectors from an embedding model, and an index that can compare a query vector against them quickly. Sentence Transformers creates the vectors from text. FAISS holds them in an in-memory index that can be saved to and loaded from disk, so nearest-neighbor search runs without a separate database service.
The example uses the sentence-transformers/all-MiniLM-L6-v2 model and a FAISS IndexFlatIP index, which performs exact inner-product search over every stored vector. Because both the document and query embeddings are normalized to unit length, the inner-product scores returned by IndexFlatIP equal the cosine similarity between the query and each indexed text.
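The equivalence between inner product and cosine similarity for unit-length vectors can be checked by hand. The following is a minimal sketch in plain Python; the two vectors and the helper names (dot, normalize, cosine) are made up for illustration and are not part of FAISS or Sentence Transformers.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity computed directly on the raw vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Illustrative vectors, not real embeddings.
doc_vec = [3.0, 4.0, 0.0]
query_vec = [1.0, 2.0, 2.0]

ip_after_normalizing = dot(normalize(doc_vec), normalize(query_vec))
cos_on_raw = cosine(doc_vec, query_vec)

# Both values are 11/15, about 0.733333.
print(f"inner product of unit vectors:    {ip_after_normalizing:.6f}")
print(f"cosine similarity of raw vectors: {cos_on_raw:.6f}")
```

If either side skips normalization, the inner product is scaled by that vector's length and the scores stop being comparable as cosine values.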
A FAISS index stores vectors by row position rather than the original document text. Keep a sidecar metadata file in the same order as the vectors so each returned row ID can be mapped back to the document, title, URL, or internal record ID that the application needs.
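Mapping row IDs back to records is a plain list lookup, with one detail worth handling: FAISS fills unused result slots with a row ID of -1 when the index holds fewer than k vectors. The sketch below assumes a metadata list in index order; resolve_hits is a hypothetical helper name, and the plain lists stand in for one row of the scores and row_ids arrays returned by a search.

```python
def resolve_hits(scores, row_ids, metadata):
    """Pair each returned row ID with its metadata record, skipping empty slots."""
    hits = []
    for score, row_id in zip(scores, row_ids):
        row_id = int(row_id)
        if row_id < 0:
            # -1 marks a result slot that FAISS could not fill.
            continue
        if row_id >= len(metadata):
            # The index has more rows than the sidecar file describes.
            raise IndexError(f"row {row_id} has no metadata record")
        hits.append({"id": metadata[row_id]["id"], "score": float(score)})
    return hits


# Stand-in data: two records, a search that asked for three neighbors.
metadata = [{"id": "doc-001"}, {"id": "doc-002"}]
hits = resolve_hits([0.9, 0.4, 0.0], [1, 0, -1], metadata)
print(hits)
```

Raising on an out-of-range row ID surfaces a stale metadata file immediately instead of returning the wrong document.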
$ python -m pip install --upgrade sentence-transformers faiss-cpu
Use faiss-gpu only in an environment where the matching CUDA stack is supported. Keep only one FAISS package in the environment.
# build_faiss_index.py
import json
from pathlib import Path

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Sample corpus; list order defines the FAISS row order.
corpus = [
    {
        "id": "doc-001",
        "text": "Sentence Transformers converts text into dense embeddings.",
    },
    {
        "id": "doc-002",
        "text": "FAISS stores vectors and searches nearest neighbors locally.",
    },
    {
        "id": "doc-003",
        "text": "Cross-encoders rerank a small set of retrieved passages.",
    },
    {
        "id": "doc-004",
        "text": "Qdrant stores vectors behind a database service API.",
    },
]

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Embed the documents as unit-length float32 vectors, the dtype FAISS expects.
texts = [item["text"] for item in corpus]
document_embeddings = model.encode_document(
    texts,
    normalize_embeddings=True,
    convert_to_numpy=True,
)
document_embeddings = np.asarray(document_embeddings, dtype="float32")

# Build an exact inner-product index and add one row per document.
dimension = document_embeddings.shape[1]
index = faiss.IndexFlatIP(dimension)
index.add(document_embeddings)

# Persist the index and the sidecar metadata in the same row order.
faiss.write_index(index, "support-faq.faiss")
Path("support-faq.json").write_text(json.dumps(corpus, indent=2), encoding="utf-8")

# Reload the index from disk and embed the query the same way.
loaded_index = faiss.read_index("support-faq.faiss")
query_embedding = model.encode_query(
    ["Which library searches vectors nearest neighbors locally?"],
    normalize_embeddings=True,
    convert_to_numpy=True,
)
query_embedding = np.asarray(query_embedding, dtype="float32")

# search() returns one row of scores and row IDs per query vector.
scores, row_ids = loaded_index.search(query_embedding, k=2)
metadata = json.loads(Path("support-faq.json").read_text(encoding="utf-8"))

print(f"embedding dimension: {dimension}")
print(f"indexed vectors: {loaded_index.ntotal}")
print("top matches:")
for rank, (score, row_id) in enumerate(zip(scores[0], row_ids[0]), start=1):
    record = metadata[int(row_id)]
    print(f"{rank}. {record['id']} score={score:.4f} text={record['text']}")
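encode_document() and encode_query() keep the retrieval code ready for models that define different document and query prompts. Both methods were added in Sentence Transformers v5.0; for a model that defines no such prompts, they behave like a plain encode() call. The metadata file keeps the application IDs in the same order as the FAISS rows.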
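Where the installed Sentence Transformers release might predate these methods, one possible approach is to fall back to encode(). The sketch below is an assumption about how that could be wired, not a library feature: pick_encoders is a hypothetical helper, and the two small classes are stand-ins so the example runs without downloading a model.

```python
def pick_encoders(model):
    """Return (document_encoder, query_encoder), falling back to model.encode."""
    encode_document = getattr(model, "encode_document", model.encode)
    encode_query = getattr(model, "encode_query", model.encode)
    return encode_document, encode_query


class LegacyModel:
    """Stand-in for a model object that only offers encode()."""

    def encode(self, texts):
        return [f"plain:{text}" for text in texts]


class PromptAwareModel(LegacyModel):
    """Stand-in for a model object with separate document and query methods."""

    def encode_document(self, texts):
        return [f"doc:{text}" for text in texts]

    def encode_query(self, texts):
        return [f"query:{text}" for text in texts]


for model in (LegacyModel(), PromptAwareModel()):
    encode_document, encode_query = pick_encoders(model)
    print(type(model).__name__, encode_document(["d"]), encode_query(["q"]))
```

With a real SentenceTransformer instance, the returned callables accept the same keyword arguments the script passes, such as normalize_embeddings=True.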
$ python build_faiss_index.py
embedding dimension: 384
indexed vectors: 4
top matches:
1. doc-002 score=0.6675 text=FAISS stores vectors and searches nearest neighbors locally.
2. doc-004 score=0.3530 text=Qdrant stores vectors behind a database service API.
The first run may download the embedding model before printing the search output.
To confirm that the saved index and the metadata file are still aligned, compare their row counts.
$ python - <<'PY'
import faiss, json
from pathlib import Path
index = faiss.read_index("support-faq.faiss")
records = json.loads(Path("support-faq.json").read_text(encoding="utf-8"))
print(f"index rows: {index.ntotal}")
print(f"metadata rows: {len(records)}")
PY
index rows: 4
metadata rows: 4
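The same comparison can run as a guard when the application starts, so a mismatched index and metadata file fail loudly before any search is served. This is a minimal sketch: check_alignment is a hypothetical helper, and it takes the row count as a plain integer (index.ntotal in the real application) so the example runs without FAISS installed.

```python
def check_alignment(index_rows, records):
    """Raise ValueError if the metadata cannot describe every index row."""
    if index_rows != len(records):
        raise ValueError(
            f"index has {index_rows} rows but metadata has {len(records)} records"
        )
    ids = [record["id"] for record in records]
    if len(set(ids)) != len(ids):
        raise ValueError("metadata contains duplicate record IDs")


# Stand-in metadata; a real caller would load support-faq.json instead.
records = [{"id": "doc-001"}, {"id": "doc-002"}]

check_alignment(2, records)  # Passes silently when the counts match.

try:
    check_alignment(3, records)
except ValueError as exc:
    print(f"rejected: {exc}")
```

Matching counts do not prove the rows are in the same order, so the index and its metadata file are best written together in one step, as the build script does.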