How to query nearest neighbors with SciPy KDTree

Nearest-neighbor queries find the closest rows in a coordinate or feature array to one or more target points. In SciPy, a KDTree indexes numeric point data so spatial lookups, embedding checks, and clustering prefilters do not need to compare every row manually.

scipy.spatial.KDTree stores the input as an n by m array, where each row is one point and each column is one coordinate or feature. Its query() method returns two aligned results: distances to the matched rows and indexes back into the original array.

A two-dimensional sensor dataset keeps the indexes and Euclidean distances easy to check by hand. Keep the source array unchanged after building the tree unless copy_data=True is used, because the tree may reference the original data buffer.

Steps to query nearest neighbors with SciPy KDTree:

Create a Python script named nearest_neighbor_query.py.

nearest_neighbor_query.py

import numpy as np
from scipy.spatial import KDTree
 
np.set_printoptions(precision=3, suppress=True)
 
points = np.array([
    [0.0, 0.0],
    [2.0, 3.0],
    [2.0, 4.0],
    [5.0, 4.0],
    [8.0, 2.0],
])
labels = np.array(["depot", "sensor-a", "sensor-b", "sensor-c", "sensor-d"])
 
queries = np.array([
    [2.2, 3.9],
    [7.4, 2.2],
])
 
tree = KDTree(points)
distances, indexes = tree.query(queries, k=2)
 
for query_number, query in enumerate(queries, start=1):
    print(f"query {query_number}: {query}")
    for rank, (distance, index) in enumerate(
        zip(distances[query_number - 1], indexes[query_number - 1]),
        start=1,
    ):
        print(
            f"  {rank}: {labels[index]} index={index} "
            f"point={points[index]} distance={distance:.3f}"
        )
 
manual_distance = np.linalg.norm(points[indexes[0, 0]] - queries[0])
print(f"\nmanual check for query 1 nearest: {manual_distance:.3f}")
 
bounded_distances, bounded_indexes = tree.query(
    queries[0],
    k=3,
    distance_upper_bound=1.0,
)
print("\nwithin distance 1.0 from query 1")
print("distances:", bounded_distances)
print("indexes:", bounded_indexes)

KDTree and cKDTree are functionally identical in current SciPy. KDTree is the clearer default when old SciPy compatibility is not required.

Run the script.

$ python3 nearest_neighbor_query.py
query 1: [2.2 3.9]
  1: sensor-b index=2 point=[2. 4.] distance=0.224
  2: sensor-a index=1 point=[2. 3.] distance=0.922
query 2: [7.4 2.2]
  1: sensor-d index=4 point=[8. 2.] distance=0.632
  2: sensor-c index=3 point=[5. 4.] distance=3.000

manual check for query 1 nearest: 0.224

within distance 1.0 from query 1
distances: [0.224 0.922   inf]
indexes: [2 1 5]

Match each returned index to the original point array.

indexes[0, 0] is 2, so the closest row for the first query is points[2], labeled sensor-b.
Confirm the nearest distance with the manual check line.

The first query is 0.2 units away in x and 0.1 units away in y from sensor-b, so the Euclidean distance sqrt(0.2**2 + 0.1**2) rounds to 0.224.
Use k for the number of neighbors to keep for each query point.

When k=1, SciPy squeezes the neighbor dimension from the result. Use k=[1] if downstream code should keep a one-column result.
Use distance_upper_bound when far matches should be treated as missing.

The bounded query asks for three neighbors within distance 1.0. The third result prints inf with index 5 because the tree has five input rows and no third neighbor met the bound.
Remove the demo script when it was only created for the check.
```
$ rm nearest_neighbor_query.py
```

Author: Mohd Shakir Zakaria
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.