Statistical summaries turn a NumPy array into a small set of numbers that describe its center, its spread, and how individual rows or columns behave. They help Python scripts check measurements, scores, sensor readings, or table-shaped numeric data before plotting or exporting results.
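When no axis is supplied, NumPy reduction functions such as mean() and std() compute over every element, as if the array were flattened. For a two-dimensional array, passing axis=0 returns one summary per column, while axis=1 returns one summary per row, so the axis choice should match how records and fields are arranged.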
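The axis rule is easiest to see from the shape of each result. The following is a minimal sketch on a made-up 2x3 array; the name readings and its values are assumptions chosen for illustration, not data from this guide.

```python
import numpy as np

# Hypothetical data: 2 records (rows) with 3 fields (columns) each.
readings = np.array([[1.0, 2.0, 3.0],
                     [4.0, 5.0, 6.0]])

overall = readings.mean()            # no axis: one value over all six elements
per_column = readings.mean(axis=0)   # collapses the rows: one value per column
per_row = readings.mean(axis=1)      # collapses the columns: one value per row

print("overall:", overall)
print("per column:", per_column, per_column.shape)
print("per row:", per_row, per_row.shape)
```

The per-column result has shape (3,) and the per-row result has shape (2,), which is a quick way to confirm that the chosen axis matches the layout of the data.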
Spread calculations need one extra decision before the numbers are reported. std() uses ddof=0 by default for a population-style standard deviation, ddof=1 switches to the common sample calculation, and NaN-aware functions such as np.nanmean() avoid one missing value turning the whole summary into NaN.
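To make the ddof choice concrete, the sketch below recomputes both standard deviations by hand and compares them with std(). The array values and the names with_gap, population_std, and sample_std are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical sample of five measurements.
values = np.array([2.0, 4.0, 4.0, 5.0, 10.0])

n = values.size
squared_dev = (values - values.mean()) ** 2

# ddof is subtracted from the divisor: N - 0 for population, N - 1 for sample.
population_std = np.sqrt(squared_dev.sum() / n)
sample_std = np.sqrt(squared_dev.sum() / (n - 1))

# The manual results should agree with NumPy's built-in calculation.
print(np.isclose(population_std, values.std()))
print(np.isclose(sample_std, values.std(ddof=1)))

# One NaN turns the ordinary mean into NaN; np.nanmean() skips it.
with_gap = values.copy()
with_gap[0] = np.nan
print(np.mean(with_gap))
print(np.nanmean(with_gap))
```

The only difference between the two spread calculations is the divisor, so the sample value is always at least as large as the population value for the same data.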
Related: Calculate a histogram
Related: Filter NaN values
Related: Sort an array
import numpy as np

np.set_printoptions(precision=2, suppress=True)

scores = np.array(
    [
        [72.0, 75.0, 79.0],
        [80.0, 82.0, 88.0],
        [68.0, 74.0, 77.0],
        [88.0, 93.0, 96.0],
    ]
)

scores_with_missing = scores.copy()
scores_with_missing[1, 2] = np.nan

print("dataset shape:", scores.shape)
print("overall mean:", scores.mean())
print("overall median:", np.median(scores))
print("population std:", round(float(scores.std()), 2))
print("sample std:", round(float(scores.std(ddof=1)), 2))
print("column means:", scores.mean(axis=0))
print("column percentiles:")
print(np.percentile(scores, [25, 50, 75], axis=0))
print("row means with NaN ignored:", np.nanmean(scores_with_missing, axis=1))
print("ordinary mean with NaN:", np.mean(scores_with_missing))
axis=0 summarizes each column. axis=1 summarizes each row.
$ python statistics-calculate.py
dataset shape: (4, 3)
overall mean: 81.0
overall median: 79.5
population std: 8.29
sample std: 8.66
column means: [77. 81. 85.]
column percentiles:
[[71.   74.75 78.5 ]
 [76.   78.5  83.5 ]
 [82.   84.75 90.  ]]
row means with NaN ignored: [75.33 81.   73.   92.33]
ordinary mean with NaN: nan
The three percentile rows hold the 25th, 50th, and 75th percentiles of each input column. The population std line uses ddof=0, the sample std line uses ddof=1, and np.nanmean() ignores the single missing value, whereas np.mean() returns nan for the same data.
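A percentile such as 71 for the first column is not one of the input scores, because np.percentile() interpolates linearly between sorted values by default. The sketch below reproduces that one value by hand. It assumes the default interpolation method, and the helper names ordered, position, and manual are illustrative only.

```python
import numpy as np

# First column of the scores array used in the example above.
column = np.array([72.0, 80.0, 68.0, 88.0])
ordered = np.sort(column)

q = 25
# Fractional 0-based position of the percentile among the sorted values.
position = (ordered.size - 1) * q / 100
lower = int(np.floor(position))
upper = min(lower + 1, ordered.size - 1)
fraction = position - lower

# Linear interpolation between the two neighbouring sorted values.
manual = ordered[lower] + fraction * (ordered[upper] - ordered[lower])

print(manual, np.percentile(column, q))
```

With four values the 25th percentile falls at position 0.75, three quarters of the way from 68 to 72, which gives the 71 shown in the output.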