How to calculate statistics with NumPy

Statistical summaries reduce a NumPy array to a small set of numbers that describe its center, its spread, and how those values differ per row or per column. They let a Python script sanity-check measurements, scores, sensor readings, or other table-shaped numeric data before plotting or exporting results.

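NumPy reduction functions such as mean(), std(), and np.median() compute over the flattened array when no axis is supplied. For a two-dimensional array, passing axis=0 collapses the rows and returns one value per column, while axis=1 collapses the columns and returns one value per row, so the axis choice should match how records and fields are arranged.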
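The axis behavior is easiest to see on a small array whose result shapes can be checked by eye. The sketch below uses a hypothetical 2 x 3 table (the name table and its values are illustrative only) and prints the result of the same reduction with no axis, axis=0, and axis=1.

```python
import numpy as np

# Hypothetical 2 x 3 table: 2 records (rows), 3 fields (columns).
table = np.array(
    [
        [1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
    ]
)

overall = table.mean()           # no axis: one value over all 6 elements
per_column = table.mean(axis=0)  # collapses the rows: one value per column
per_row = table.mean(axis=1)     # collapses the columns: one value per row

print("overall:", overall)
print("per column:", per_column, "shape:", per_column.shape)
print("per row:", per_row, "shape:", per_row.shape)
```

The result for axis=0 has shape (3,), one entry per column, and the result for axis=1 has shape (2,), one entry per row.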

Spread and missing values each need a decision before the numbers are reported. std() defaults to ddof=0, which divides by N and gives the population standard deviation, while ddof=1 divides by N - 1 and gives the common sample estimate. A single NaN turns an ordinary summary into NaN, so NaN-aware functions such as np.nanmean() skip the missing entries instead.
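To make the ddof choice concrete, the following sketch recomputes both standard deviations by hand and compares them with std(). The readings array and the variable names are hypothetical, chosen only so the arithmetic is easy to follow; the same array is then given one missing value to show how np.mean() and np.nanmean() differ.

```python
import numpy as np

# Hypothetical readings, chosen only for illustration.
readings = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

n = readings.size
squared_deviations = (readings - readings.mean()) ** 2

# The divisor is N - ddof; the square root of the result is the std.
population_manual = np.sqrt(squared_deviations.sum() / n)    # ddof=0
sample_manual = np.sqrt(squared_deviations.sum() / (n - 1))  # ddof=1

print("population std:", readings.std(), population_manual)
print("sample std:", readings.std(ddof=1), sample_manual)

# One missing value propagates through np.mean() but is skipped by np.nanmean().
with_gap = readings.copy()
with_gap[0] = np.nan
print("np.mean:", np.mean(with_gap))
print("np.nanmean:", np.nanmean(with_gap))
```

The sample value is always the larger of the two because it divides the same sum of squared deviations by a smaller number.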


Steps to calculate statistics with NumPy:

Create a script that calculates full-array, axis-based, percentile, and missing-value statistics.

statistics-calculate.py

import numpy as np
 
np.set_printoptions(precision=2, suppress=True)
 
scores = np.array(
    [
        [72.0, 75.0, 79.0],
        [80.0, 82.0, 88.0],
        [68.0, 74.0, 77.0],
        [88.0, 93.0, 96.0],
    ]
)
 
scores_with_missing = scores.copy()
scores_with_missing[1, 2] = np.nan
 
print("dataset shape:", scores.shape)
print("overall mean:", scores.mean())
print("overall median:", np.median(scores))
print("population std:", round(float(scores.std()), 2))
print("sample std:", round(float(scores.std(ddof=1)), 2))
print("column means:", scores.mean(axis=0))
print("column percentiles:")
print(np.percentile(scores, [25, 50, 75], axis=0))
print("row means with NaN ignored:", np.nanmean(scores_with_missing, axis=1))
print("ordinary mean with NaN:", np.mean(scores_with_missing))

For this 4 x 3 array, axis=0 returns one value per column (three values) and axis=1 returns one value per row (four values).

Run the script and compare the scalar, column, percentile, and NaN-aware summaries.

$ python statistics-calculate.py
dataset shape: (4, 3)
overall mean: 81.0
overall median: 79.5
population std: 8.29
sample std: 8.66
column means: [77. 81. 85.]
column percentiles:
[[71.   74.75 78.5 ]
 [76.   78.5  83.5 ]
 [82.   84.75 90.  ]]
row means with NaN ignored: [75.33 81.   73.   92.33]
ordinary mean with NaN: nan

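Each row of the percentile output holds one percentile (25th, 50th, then 75th), and each column matches an input column. population std uses ddof=0, sample std uses ddof=1, and np.nanmean() skips the single missing value in the second row, whereas np.mean() returns nan for the same data.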
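The same NaN-skipping behavior is available for other summaries. As a minimal sketch, assuming the same scores with the one missing entry, np.nanmedian(), np.nanstd(), and np.nanpercentile() can replace their ordinary counterparts; the result variable names below are illustrative.

```python
import numpy as np

# Same scores as the script above, with one entry marked as missing.
scores_with_missing = np.array(
    [
        [72.0, 75.0, 79.0],
        [80.0, 82.0, np.nan],
        [68.0, 74.0, 77.0],
        [88.0, 93.0, 96.0],
    ]
)

# Each function skips NaN entries instead of propagating them.
column_medians = np.nanmedian(scores_with_missing, axis=0)
column_sample_std = np.nanstd(scores_with_missing, axis=0, ddof=1)
column_quartiles = np.nanpercentile(scores_with_missing, [25, 50, 75], axis=0)

print("column medians with NaN ignored:", column_medians)
print("column sample std with NaN ignored:", column_sample_std)
print("column percentiles with NaN ignored:")
print(column_quartiles)
```

The third column is summarized from three values instead of four, so its results rest on less data than the other two columns.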

Author: Mohd Shakir Zakaria
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.