Nearest-neighbor queries find the closest rows in a coordinate or feature array to one or more target points. In SciPy, a KDTree indexes numeric point data so spatial lookups, embedding checks, and clustering prefilters do not need to compare every row manually.
scipy.spatial.KDTree stores the input as an n by m array, where each row is one point and each column is one coordinate or feature. Its query() method returns two aligned results: distances to the matched rows and indexes back into the original array.
A two-dimensional sensor dataset keeps the indexes and Euclidean distances easy to check by hand. Keep the source array unchanged after building the tree unless copy_data=True is used, because the tree may reference the original data buffer.
Steps to query nearest neighbors with SciPy KDTree:
- Create a Python script named nearest_neighbor_query.py.
- nearest_neighbor_query.py
import numpy as np from scipy.spatial import KDTree np.set_printoptions(precision=3, suppress=True) points = np.array([ [0.0, 0.0], [2.0, 3.0], [2.0, 4.0], [5.0, 4.0], [8.0, 2.0], ]) labels = np.array(["depot", "sensor-a", "sensor-b", "sensor-c", "sensor-d"]) queries = np.array([ [2.2, 3.9], [7.4, 2.2], ]) tree = KDTree(points) distances, indexes = tree.query(queries, k=2) for query_number, query in enumerate(queries, start=1): print(f"query {query_number}: {query}") for rank, (distance, index) in enumerate( zip(distances[query_number - 1], indexes[query_number - 1]), start=1, ): print( f" {rank}: {labels[index]} index={index} " f"point={points[index]} distance={distance:.3f}" ) manual_distance = np.linalg.norm(points[indexes[0, 0]] - queries[0]) print(f"\nmanual check for query 1 nearest: {manual_distance:.3f}") bounded_distances, bounded_indexes = tree.query( queries[0], k=3, distance_upper_bound=1.0, ) print("\nwithin distance 1.0 from query 1") print("distances:", bounded_distances) print("indexes:", bounded_indexes)
KDTree and cKDTree are functionally identical in current SciPy. KDTree is the clearer default when old SciPy compatibility is not required.
- Run the script.
$ python3 nearest_neighbor_query.py query 1: [2.2 3.9] 1: sensor-b index=2 point=[2. 4.] distance=0.224 2: sensor-a index=1 point=[2. 3.] distance=0.922 query 2: [7.4 2.2] 1: sensor-d index=4 point=[8. 2.] distance=0.632 2: sensor-c index=3 point=[5. 4.] distance=3.000 manual check for query 1 nearest: 0.224 within distance 1.0 from query 1 distances: [0.224 0.922 inf] indexes: [2 1 5]
- Match each returned index to the original point array.
indexes[0, 0] is 2, so the closest row for the first query is points[2], labeled sensor-b.
- Confirm the nearest distance with the manual check line.
The first query is 0.2 units away in x and 0.1 units away in y from sensor-b, so the Euclidean distance sqrt(0.2**2 + 0.1**2) rounds to 0.224.
- Use k for the number of neighbors to keep for each query point.
When k=1, SciPy squeezes the neighbor dimension from the result. Use k=[1] if downstream code should keep a one-column result.
- Use distance_upper_bound when far matches should be treated as missing.
The bounded query asks for three neighbors within distance 1.0. The third result prints inf with index 5 because the tree has five input rows and no third neighbor met the bound.
- Remove the demo script when it was only created for the check.
$ rm nearest_neighbor_query.py
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.