Pairwise distance calculations turn coordinate or feature rows into numeric separation values for clustering, nearest-neighbor checks, geometry filters, and model diagnostics. SciPy exposes these calculations in scipy.spatial.distance, which can return the result for a NumPy array either as a compact vector or as a full, readable matrix.
pdist() compares every row inside one array and returns a condensed vector that stores the upper triangle of the distance matrix. squareform() expands that vector into a symmetric square matrix with a zero diagonal when the row-to-row layout is easier to inspect.
cdist() compares rows from two separate arrays and returns a matrix with one row per query observation and one column per reference observation. A small two-dimensional dataset keeps Euclidean and cityblock distances easy to check by hand while still showing the vector and matrix shapes used in real code.
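The shapes described above can be checked directly. The following is a minimal sketch that assumes made-up sizes (5 reference rows, 2 query rows, 3 features) and random values; none of these names or numbers come from the demo script.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform

# Hypothetical sizes and random data, chosen only to show the output shapes.
n_rows, n_queries, n_features = 5, 2, 3
rng = np.random.default_rng(0)
data = rng.random((n_rows, n_features))
queries = rng.random((n_queries, n_features))

condensed = pdist(data)          # one entry per unordered pair of rows
square = squareform(condensed)   # full n_rows-by-n_rows matrix
cross = cdist(queries, data)     # one row per query, one column per reference

print(condensed.shape)  # (10,)  because 5 * 4 / 2 = 10 pairs
print(square.shape)     # (5, 5)
print(cross.shape)      # (2, 5)
```

The condensed vector holds n * (n - 1) / 2 values for n rows, which is why it grows much more slowly in memory than the square form.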
Steps to calculate pairwise distances with SciPy:
- Create the pairwise distance demo script.
- pairwise_distance_demo.py
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform

np.set_printoptions(precision=3, suppress=True)

points = np.array([
    [0.0, 0.0],
    [3.0, 4.0],
    [6.0, 8.0],
])

condensed = pdist(points, metric="euclidean")
print("pdist euclidean")
print(condensed)

print("\nsquareform rows")
for row in squareform(condensed):
    print(row)

queries = np.array([
    [0.0, 4.0],
    [9.0, 12.0],
])

print("\ncdist euclidean rows")
for row in cdist(queries, points, metric="euclidean"):
    print(row)

print("\npdist cityblock")
print(pdist(points, metric="cityblock"))
Each row in points is one observation. pdist() compares rows, not columns.
- Run the script.
$ python3 pairwise_distance_demo.py
pdist euclidean
[ 5. 10.  5.]

squareform rows
[ 0.  5. 10.]
[5. 0. 5.]
[10.  5.  0.]

cdist euclidean rows
[4.    3.    7.211]
[15. 10.  5.]

pdist cityblock
[ 7. 14.  7.]
- Match the pdist() vector to the original row pairs.
For three input rows, the condensed vector stores row 0 to row 1, row 0 to row 2, then row 1 to row 2. The printed Euclidean values are 5, 10, and 5 for those three pairs.
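One way to make the pair order explicit is to generate the row pairs with itertools.combinations(), which yields them in the same order as the condensed vector. This is a small sketch that reuses the three demo points; the pairs variable is an illustrative name, not part of the demo script.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist

points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
condensed = pdist(points, metric="euclidean")

# combinations() yields (0, 1), (0, 2), (1, 2): the upper-triangle order
# in which pdist() stores its values.
pairs = list(combinations(range(len(points)), 2))

for (i, j), distance in zip(pairs, condensed):
    print(f"row {i} to row {j}: {distance:.3f}")
```

Pairing the indices this way avoids hand-computing positions in the condensed vector when the input has more than a few rows.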
- Use the squareform() output when row and column labels matter.
The diagonal remains zero because each row has zero distance to itself, and the matrix is symmetric because the distance from row A to row B matches row B to row A for these metrics.
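The zero diagonal and the symmetry can be verified in code rather than by eye. The sketch below assumes the same three demo points and uses hypothetical labels "A", "B", and "C" to show a label-based lookup.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
condensed = pdist(points, metric="euclidean")
square = squareform(condensed)

# Hypothetical labels for the three rows, used only for this illustration.
labels = ["A", "B", "C"]

is_symmetric = np.allclose(square, square.T)
has_zero_diagonal = np.allclose(np.diag(square), 0.0)

# squareform() also converts a valid square matrix back to condensed form.
round_trip = squareform(square)

print("symmetric:", is_symmetric)
print("zero diagonal:", has_zero_diagonal)
print("A to C:", square[labels.index("A"), labels.index("C")])
```

Because squareform() works in both directions, the compact vector can be kept for storage and expanded only when a labeled matrix is needed.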
- Use cdist() for distances from query rows to reference rows.
Both arrays must have the same number of columns. cdist() raises ValueError instead of padding or dropping features when the column counts differ.
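The column-count check can be seen with a deliberately mismatched input. In this sketch, bad_queries is a made-up array with three columns, compared against the two-column demo points.

```python
import numpy as np
from scipy.spatial.distance import cdist

reference = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])  # 2 columns
bad_queries = np.array([[0.0, 4.0, 1.0]])                    # 3 columns (hypothetical)

error_message = None
try:
    cdist(bad_queries, reference, metric="euclidean")
except ValueError as exc:
    # cdist() refuses to guess how to align the features.
    error_message = str(exc)
    print("cdist rejected the input:", error_message)
```

Catching the error at the call site is mainly useful in scripts that load query and reference arrays from different sources.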
- Change the metric argument for the distance definition required by the data.
metric="cityblock" returns Manhattan distances over the same rows. Use built-in SciPy metric names such as euclidean, cityblock, cosine, or correlation when one matches the calculation.
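To compare several metric names side by side, the same array can be passed to pdist() in a loop. This sketch uses a different, made-up set of rows without an all-zero row, because cosine distance is undefined for a zero vector; the rows and the results dictionary are assumptions for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Hypothetical rows; none is all zeros, so cosine distance is defined.
rows = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

results = {}
for name in ("euclidean", "cityblock", "cosine"):
    # Each call returns the condensed vector for pairs (0,1), (0,2), (1,2).
    results[name] = pdist(rows, metric=name)
    print(name, np.round(results[name], 3))
```

The metric names are plain strings, so switching the distance definition does not require any other change to the surrounding code.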
- Remove the demo script when it was only created for the check.
$ rm pairwise_distance_demo.py
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.