Statistical summaries turn a NumPy array into a small set of numbers that describe its center, its spread, and how individual rows or columns behave. They help Python scripts check measurements, scores, sensor readings, or table-shaped numeric data before plotting or exporting results.

When no axis is supplied, NumPy reduction functions such as mean() and std() reduce over every element, as if the array had been flattened. For a two-dimensional array, passing axis=0 summarizes each column, while axis=1 summarizes each row, so the axis choice should match how records and fields are arranged.
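A minimal sketch of the axis behavior, using a hypothetical 2x3 array named readings that is not part of the script in the steps. The shape of each result shows which dimension was collapsed.

```python
import numpy as np

# Hypothetical data: two records (rows), three fields (columns).
readings = np.array(
    [
        [1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
    ]
)

overall = readings.mean()           # no axis: one value over all six elements
per_column = readings.mean(axis=0)  # collapses the rows: one value per column
per_row = readings.mean(axis=1)     # collapses the columns: one value per row

print(overall)           # 3.5
print(per_column.shape)  # (3,)
print(per_row.shape)     # (2,)
```

If the result has as many entries as the array has columns, the reduction ran down the rows (axis=0); if it matches the row count, it ran across the columns (axis=1).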

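Spread calculations need an extra decision before the numbers are reported: whether the data is a full population or a sample. std() uses ddof=0 by default, which divides by N and gives a population-style standard deviation, while ddof=1 divides by N - 1 and gives the common sample calculation. Missing values need a similar decision, because a single NaN turns an ordinary summary into NaN. NaN-aware functions such as np.nanmean() skip the missing values instead.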
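The sketch below illustrates both decisions on a small hypothetical array named values. It recomputes the sample standard deviation by hand to show what ddof=1 changes, then compares ordinary and NaN-aware functions after one entry is replaced with NaN. The variable names are illustrative only.

```python
import numpy as np

# Hypothetical sample of eight measurements.
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

population_std = values.std()    # ddof=0: divide the squared deviations by N
sample_std = values.std(ddof=1)  # ddof=1: divide by N - 1

# The same sample calculation written out by hand.
deviations = values - values.mean()
manual_sample_std = np.sqrt((deviations**2).sum() / (values.size - 1))

# Replace one entry with NaN to simulate a missing measurement.
with_gap = values.copy()
with_gap[0] = np.nan

plain_mean = np.mean(with_gap)               # NaN propagates through the mean
nan_aware_mean = np.nanmean(with_gap)        # mean of the seven remaining values
nan_aware_std = np.nanstd(with_gap, ddof=1)  # sample std of the remaining values

print(population_std)                      # 2.0
print(round(float(sample_std), 3))         # 2.138
print(np.isclose(sample_std, manual_sample_std))  # True
print(plain_mean)                          # nan
print(round(float(nan_aware_mean), 3))     # 5.429
```

The NaN-aware functions change the effective sample size, so a summary built from them describes only the values that were present.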

Steps to calculate statistics with NumPy:

  1. Create a script that calculates full-array, axis-based, percentile, and missing-value statistics.
    statistics-calculate.py
    import numpy as np
     
    np.set_printoptions(precision=2, suppress=True)
     
    scores = np.array(
        [
            [72.0, 75.0, 79.0],
            [80.0, 82.0, 88.0],
            [68.0, 74.0, 77.0],
            [88.0, 93.0, 96.0],
        ]
    )
     
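    # Copy the data and mark one score as missing to exercise the NaN-aware path.
    scores_with_missing = scores.copy()
    scores_with_missing[1, 2] = np.nan

    # Whole-array summaries: no axis, so every element contributes to one value.
    print("dataset shape:", scores.shape)
    print("overall mean:", scores.mean())
    print("overall median:", np.median(scores))
    # ddof=0 (the default) divides by N; ddof=1 divides by N - 1.
    print("population std:", round(float(scores.std()), 2))
    print("sample std:", round(float(scores.std(ddof=1)), 2))
    # Axis-based summaries: axis=0 yields one result per column.
    print("column means:", scores.mean(axis=0))
    print("column percentiles:")
    print(np.percentile(scores, [25, 50, 75], axis=0))
    # np.nanmean() skips the NaN; np.mean() propagates it.
    print("row means with NaN ignored:", np.nanmean(scores_with_missing, axis=1))
    print("ordinary mean with NaN:", np.mean(scores_with_missing))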

    For this 4x3 array, axis=0 produces one value per column (three results) and axis=1 produces one value per row (four results).

  2. Run the script and compare the scalar, column, percentile, and NaN-aware summaries.
    $ python statistics-calculate.py
    dataset shape: (4, 3)
    overall mean: 81.0
    overall median: 79.5
    population std: 8.29
    sample std: 8.66
    column means: [77. 81. 85.]
    column percentiles:
    [[71.   74.75 78.5 ]
     [76.   78.5  83.5 ]
     [82.   84.75 90.  ]]
    row means with NaN ignored: [75.33 81.   73.   92.33]
    ordinary mean with NaN: nan

    The three percentile rows hold the 25th, 50th, and 75th percentiles, with one value per input column. The population std line uses ddof=0, the sample std line uses ddof=1, and np.nanmean() ignores the single missing value, while np.mean() returns nan because of it.
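As a short illustration of how to read the percentile output, the sketch below rebuilds the same scores array and unpacks the result row by row. The names quartiles, q25, q50, and q75 are introduced here for illustration and do not appear in the script above.

```python
import numpy as np

scores = np.array(
    [
        [72.0, 75.0, 79.0],
        [80.0, 82.0, 88.0],
        [68.0, 74.0, 77.0],
        [88.0, 93.0, 96.0],
    ]
)

# Requesting three percentiles along axis=0 puts the percentiles on the
# first axis of the result: one row per percentile, one column per input column.
quartiles = np.percentile(scores, [25, 50, 75], axis=0)
print(quartiles.shape)  # (3, 3)

# Unpacking iterates over the first axis, so each name holds one percentile row.
q25, q50, q75 = quartiles

# The 50th percentile row matches the per-column median.
print(np.allclose(q50, np.median(scores, axis=0)))  # True
```

Unpacking the rows into named variables can make later code easier to follow than indexing the result by position.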