Local semantic search needs dense vectors from an embedding model and an index that can compare a query vector against them quickly. Sentence Transformers creates those vectors from text, while FAISS keeps the vectors in memory or on disk for nearest-neighbor search without a separate database service.
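Conceptually, nearest-neighbor search scores every stored vector against the query and keeps the best k. The sketch below shows that idea in plain Python on made-up three-dimensional vectors. `top_k_inner_product` is a hypothetical helper for illustration, not a FAISS function, and FAISS performs the equivalent work in optimized native code.

```python
import heapq


def top_k_inner_product(query, vectors, k):
    """Return (score, row_id) pairs for the k highest inner products."""
    scored = (
        (sum(q * v for q, v in zip(query, row)), row_id)
        for row_id, row in enumerate(vectors)
    )
    # nlargest returns the pairs sorted from best score to worst.
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


# Toy vectors, invented for this example; real embeddings have hundreds of dimensions.
vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
]
query = [0.0, 1.0, 0.0]

results = top_k_inner_product(query, vectors, k=2)
for score, row_id in results:
    print(f"row {row_id}: score={score:.2f}")
```

The row IDs in the result are positions in the stored list, which is the same convention a FAISS index uses for the vectors added to it.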
The sample corpus uses sentence-transformers/all-MiniLM-L6-v2 and a FAISS IndexFlatIP index. The document and query embeddings are normalized to unit length, so the inner product scores that IndexFlatIP returns equal the cosine similarity between the query and each indexed text.
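The link between unit-length vectors and cosine similarity can be checked with a few lines of arithmetic. The following is a minimal sketch in plain Python. The two vectors are made up, and `l2_normalize`, `inner_product`, and `cosine_similarity` are illustrative helper names rather than part of FAISS or Sentence Transformers.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine similarity computed directly from unnormalized vectors."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


# Made-up vectors with lengths 5 and 3.
doc = [3.0, 4.0, 0.0]
query = [1.0, 2.0, 2.0]

ip_of_unit_vectors = inner_product(l2_normalize(doc), l2_normalize(query))
cosine = cosine_similarity(doc, query)

print(f"inner product of unit vectors: {ip_of_unit_vectors:.4f}")
print(f"cosine similarity:             {cosine:.4f}")
```

Both values agree up to floating-point rounding, which is why normalizing before indexing lets an inner-product index rank by cosine similarity.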
A FAISS index stores vectors by row position rather than the original document text. Keep a sidecar metadata file in the same order as the vectors so each returned row ID can be mapped back to the document, title, URL, or internal record ID that the application needs.
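The row-to-record lookup can be sketched as a small function. This is an illustrative helper (`rows_to_records` is a hypothetical name) that assumes the metadata list is in the same order as the indexed vectors. It also skips negative row IDs, because FAISS fills unused result slots with -1 when fewer than k neighbors are available. The scores and row IDs below are sample values, not real search output.

```python
def rows_to_records(scores, row_ids, metadata):
    """Pair each valid row ID with its metadata record and score."""
    matches = []
    for score, row_id in zip(scores, row_ids):
        row_id = int(row_id)
        if row_id < 0:
            # FAISS uses -1 for result slots it could not fill.
            continue
        if row_id >= len(metadata):
            raise IndexError(
                f"row {row_id} has no metadata; sidecar has {len(metadata)} rows"
            )
        matches.append({"score": float(score), **metadata[row_id]})
    return matches


# Sidecar records in the same order as the vectors were added.
metadata = [
    {"id": "doc-001", "title": "Embeddings"},
    {"id": "doc-002", "title": "FAISS search"},
    {"id": "doc-003", "title": "Reranking"},
    {"id": "doc-004", "title": "Qdrant"},
]

# Sample values shaped like one row of search output with k=3.
# The last slot is padded, so its score carries no meaning.
scores = [0.67, 0.35, -3.4e38]
row_ids = [1, 3, -1]

for match in rows_to_records(scores, row_ids, metadata):
    print(match["id"], match["title"], f"{match['score']:.2f}")
```

Raising on an out-of-range row makes a stale or truncated sidecar fail loudly instead of silently returning the wrong document.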
Steps to build a FAISS index with Sentence Transformers:
- Install the Python packages in the active environment.
$ python -m pip install --upgrade sentence-transformers faiss-cpu
Use faiss-gpu only in an environment where a matching CUDA stack is available. Keep only one FAISS package in the environment, because faiss-cpu and faiss-gpu both provide the same faiss module.
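One way to check for conflicting FAISS packages is to query the installed distribution metadata. The sketch below uses only the standard library. The tuple of distribution names is an assumption covering the two packages named above, so extend it if the environment uses a differently named build.

```python
from importlib import metadata as importlib_metadata

# Assumed distribution names; adjust for other FAISS builds.
FAISS_DISTRIBUTIONS = ("faiss-cpu", "faiss-gpu")


def installed_faiss_distributions(names=FAISS_DISTRIBUTIONS):
    """Return {distribution name: version} for each installed candidate."""
    found = {}
    for name in names:
        try:
            found[name] = importlib_metadata.version(name)
        except importlib_metadata.PackageNotFoundError:
            continue
    return found


found = installed_faiss_distributions()
if len(found) > 1:
    print(f"conflicting FAISS packages installed: {found}")
elif found:
    print(f"FAISS package installed: {found}")
else:
    print("no FAISS package installed")
```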
- Create the index-building script.
- build_faiss_index.py
import json
from pathlib import Path

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

corpus = [
    {
        "id": "doc-001",
        "text": "Sentence Transformers converts text into dense embeddings.",
    },
    {
        "id": "doc-002",
        "text": "FAISS stores vectors and searches nearest neighbors locally.",
    },
    {
        "id": "doc-003",
        "text": "Cross-encoders rerank a small set of retrieved passages.",
    },
    {
        "id": "doc-004",
        "text": "Qdrant stores vectors behind a database service API.",
    },
]

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

texts = [item["text"] for item in corpus]
document_embeddings = model.encode_document(
    texts,
    normalize_embeddings=True,
    convert_to_numpy=True,
)
document_embeddings = np.asarray(document_embeddings, dtype="float32")

dimension = document_embeddings.shape[1]
index = faiss.IndexFlatIP(dimension)
index.add(document_embeddings)

faiss.write_index(index, "support-faq.faiss")
Path("support-faq.json").write_text(json.dumps(corpus, indent=2), encoding="utf-8")

loaded_index = faiss.read_index("support-faq.faiss")

query_embedding = model.encode_query(
    ["Which library searches vectors nearest neighbors locally?"],
    normalize_embeddings=True,
    convert_to_numpy=True,
)
query_embedding = np.asarray(query_embedding, dtype="float32")

scores, row_ids = loaded_index.search(query_embedding, k=2)
metadata = json.loads(Path("support-faq.json").read_text(encoding="utf-8"))

print(f"embedding dimension: {dimension}")
print(f"indexed vectors: {loaded_index.ntotal}")
print("top matches:")
for rank, (score, row_id) in enumerate(zip(scores[0], row_ids[0]), start=1):
    record = metadata[int(row_id)]
    print(f"{rank}. {record['id']} score={score:.4f} text={record['text']}")
encode_document() and encode_query() keep retrieval code ready for models that define different document and query prompts. The metadata file keeps application IDs beside the FAISS row order.
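Older Sentence Transformers releases may not provide encode_query() and encode_document(). A script that has to run in such an environment could fall back to the general encode() method, as in the sketch below. `encode_queries` and `encode_documents` are hypothetical wrapper names, and the two stub classes only stand in for real models so the example runs without downloading anything.

```python
def encode_queries(model, texts, **kwargs):
    """Use encode_query() when the model has it, else plain encode()."""
    encode = getattr(model, "encode_query", None) or model.encode
    return encode(texts, **kwargs)


def encode_documents(model, texts, **kwargs):
    """Use encode_document() when the model has it, else plain encode()."""
    encode = getattr(model, "encode_document", None) or model.encode
    return encode(texts, **kwargs)


class LegacyModel:
    """Stand-in for a model object that only offers encode()."""

    def encode(self, texts, **kwargs):
        return [("generic", text) for text in texts]


class PromptAwareModel(LegacyModel):
    """Stand-in for a model object with separate query and document methods."""

    def encode_query(self, texts, **kwargs):
        return [("query", text) for text in texts]

    def encode_document(self, texts, **kwargs):
        return [("document", text) for text in texts]


for model in (LegacyModel(), PromptAwareModel()):
    print(type(model).__name__, encode_queries(model, ["where is my order?"]))
    print(type(model).__name__, encode_documents(model, ["Orders ship in two days."]))
```

With a real model, keyword arguments such as normalize_embeddings=True pass through the wrappers unchanged.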
- Run the script to build, save, reload, and search the index.
$ python build_faiss_index.py
embedding dimension: 384
indexed vectors: 4
top matches:
1. doc-002 score=0.6675 text=FAISS stores vectors and searches nearest neighbors locally.
2. doc-004 score=0.3530 text=Qdrant stores vectors behind a database service API.
The first run may download the embedding model before printing the search output.
- Verify that the saved index and the metadata sidecar have the same row count.
$ python - <<'PY'
import faiss, json
from pathlib import Path
index = faiss.read_index("support-faq.faiss")
records = json.loads(Path("support-faq.json").read_text())
print(f"index rows: {index.ntotal}")
print(f"metadata rows: {len(records)}")
PY
index rows: 4
metadata rows: 4
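The same comparison can be wrapped in a reusable check that an application runs before serving queries. The following is a sketch under the assumption that each sidecar record carries a unique "id" field, as in the sample corpus. `check_alignment` is a hypothetical helper that takes the index row count as a plain integer (for example, the value of index.ntotal), so it runs without FAISS installed.

```python
from collections import Counter


def check_alignment(index_rows, records, id_key="id"):
    """Return a list of problems; an empty list means the sidecar looks consistent."""
    problems = []
    if index_rows != len(records):
        problems.append(
            f"index has {index_rows} rows but metadata has {len(records)}"
        )
    ids = [record.get(id_key) for record in records]
    missing = sum(1 for value in ids if value is None)
    if missing:
        problems.append(f"{missing} record(s) lack the '{id_key}' field")
    counts = Counter(value for value in ids if value is not None)
    duplicates = sorted(value for value, count in counts.items() if count > 1)
    if duplicates:
        problems.append(f"duplicate ids: {duplicates}")
    return problems


records = [
    {"id": "doc-001"},
    {"id": "doc-002"},
    {"id": "doc-003"},
    {"id": "doc-004"},
]

print(check_alignment(4, records))      # consistent sidecar
print(check_alignment(5, records[:3]))  # row counts disagree
```

A matching row count does not prove the order is right, so the sidecar should always be written in the same pass that adds the vectors to the index.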
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.