How to filter a NumPy array with a boolean mask

Boolean masks turn array comparisons into a selection rule that NumPy can apply without a Python loop. They are useful when the values kept from an array depend on thresholds, validity flags, row-level quality checks, or any condition that should stay tied to the data.

A mask is an array of True and False values. When the mask has the same shape as the target array, array[mask] returns a one-dimensional array of the values whose positions are True. When a one-dimensional mask is applied to a two-dimensional array, it must have one entry per row; it filters rows on the first axis and keeps all columns of each selected row.
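The two cases described above can be sketched on a small array. The names grid, full_mask, and row_mask are illustrative assumptions and do not appear in the script used in the steps.

```python
import numpy as np

# Illustrative data, not taken from the script in the steps below.
grid = np.array([[1, 8, 3],
                 [9, 2, 7]])

# Same-shape mask: the result is a flat 1-D array of the True positions,
# read in row-major order.
full_mask = grid > 5
picked = grid[full_mask]
print(picked.tolist())   # [8, 9, 7]

# 1-D mask with one entry per row: complete rows are kept.
row_mask = np.array([False, True])
rows = grid[row_mask]
print(rows.tolist())     # [[9, 2, 7]]
print(rows.shape)        # (1, 3)
```

The same-shape result loses the two-dimensional layout because the selected positions need not form a rectangle, while the row mask preserves the column count.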

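Wrap each comparison in parentheses, then combine the results with & for element-wise AND or | for element-wise OR. The parentheses matter because & and | bind more tightly than comparison operators such as >= and <. Python's and and or keywords do not combine NumPy boolean arrays element by element; applied to an array with more than one element they raise a ValueError. A mask with the wrong length raises an IndexError instead of silently trimming the data.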
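A minimal sketch of combining masks, assuming a small illustrative array named values that is not part of the script in the steps. It also shows the ~ operator for negating a mask and the error raised when the and keyword is used on arrays.

```python
import numpy as np

# Illustrative data, not taken from the script in the steps below.
values = np.array([3, 12, 7, 20, 15])

# Each comparison is parenthesized before & or | is applied.
in_range = (values >= 5) & (values <= 15)
outside = (values < 5) | (values > 15)
not_in_range = ~in_range   # element-wise NOT

print(values[in_range].tolist())   # [12, 7, 15]
print(values[outside].tolist())    # [3, 20]

# `and` needs a single truth value, which a multi-element array cannot give.
try:
    (values >= 5) and (values <= 15)
    and_error = None
except ValueError as error:
    and_error = str(error)
print("and error:", and_error)
```

Here outside and not_in_range select the same elements, so either form works; the negated mask is often easier to keep consistent when the range changes.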

Steps to filter a NumPy array with a boolean mask:

  1. Create a script that filters individual values, filters table rows, and checks a mismatched mask.
    array-filter-boolean-mask.py
    import numpy as np

    # Value mask: one boolean per temperature.
    temperatures = np.array([18.5, 21.0, 26.5, 31.0, 24.0])
    hot_day_mask = temperatures >= 24
    hot_days = temperatures[hot_day_mask]

    # Each row is [temperature, quality score].
    samples = np.array(
        [
            [18.5, 0.91],
            [26.5, 0.98],
            [31.0, 0.99],
            [21.0, 0.87],
        ]
    )
    # Row mask: one boolean per row, built from two column conditions.
    valid_hot_rows = (samples[:, 0] >= 24) & (samples[:, 1] >= 0.95)
    selected_rows = samples[valid_hot_rows]

    # Guard the mask shapes and the filtered results.
    assert hot_day_mask.shape == temperatures.shape
    assert valid_hot_rows.shape == (samples.shape[0],)
    assert hot_days.tolist() == [26.5, 31.0, 24.0]
    assert selected_rows.shape == (2, 2)

    print("temperatures:", temperatures)
    print("hot day mask:", hot_day_mask)
    print("hot days:", hot_days)
    print("hot day count:", hot_day_mask.sum())
    print("row mask shape:", valid_hot_rows.shape)
    print("selected rows:")
    for row in selected_rows:
        print(" ", row.tolist())

    # A mask of length 2 cannot index an array of length 5.
    try:
        temperatures[np.array([True, False])]
    except IndexError as error:
        print("shape error:", error)

    hot_day_mask has one value per temperature. valid_hot_rows has one value per row, so samples[valid_hot_rows] keeps complete rows.

  2. Run the script and confirm the selected values, row mask shape, and indexing error.
    $ python3 array-filter-boolean-mask.py
    temperatures: [18.5 21.  26.5 31.  24. ]
    hot day mask: [False False  True  True  True]
    hot days: [26.5 31.  24. ]
    hot day count: 3
    row mask shape: (4,)
    selected rows:
      [26.5, 0.98]
      [31.0, 0.99]
    shape error: boolean index did not match indexed array along axis 0; size of axis is 5 but size of corresponding boolean axis is 2

    The assertions stop the script if the value mask, row mask, filtered values, or selected row shape no longer match the intended filter.
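The shape checks in the assertions can also be moved into a small helper so that every row filter is validated before indexing. The sketch below is one possible approach, not part of the script above; the function name filter_rows and its error messages are hypothetical. It also rejects non-boolean masks, because an integer array such as [0, 1] would select rows by position instead of acting as a mask.

```python
import numpy as np

def filter_rows(table, row_mask):
    """Hypothetical helper: keep the rows of `table` where `row_mask` is True."""
    row_mask = np.asarray(row_mask)
    # Integer arrays would trigger positional indexing, so require booleans.
    if row_mask.dtype != np.bool_:
        raise TypeError("row_mask must be a boolean array")
    # The mask needs exactly one entry per row.
    expected = (table.shape[0],)
    if row_mask.shape != expected:
        raise ValueError(f"expected mask shape {expected}, got {row_mask.shape}")
    return table[row_mask]

samples = np.array(
    [
        [18.5, 0.91],
        [26.5, 0.98],
        [31.0, 0.99],
        [21.0, 0.87],
    ]
)
kept = filter_rows(samples, samples[:, 1] >= 0.95)
print(kept.tolist())   # [[26.5, 0.98], [31.0, 0.99]]
```

Passing a two-element mask to this helper raises the ValueError from the shape check before NumPy attempts the indexing.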