How to vectorize a NumPy calculation

A Python loop is an easy way to apply a formula one value at a time, but NumPy arrays are built to apply the same formula to many values at once. Vectorizing a calculation means replacing the scalar-by-scalar loop with an array expression that returns the same value at every input position.

The usual path is native array arithmetic, comparisons, reductions, broadcasting, and ufuncs, which are NumPy functions that operate element by element on arrays. np.vectorize() can make a scalar Python function accept array input, but it is a convenience wrapper that still calls the function once per element, not the normal performance refactor.
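As a minimal sketch of that difference, the example below wraps a scalar function with np.vectorize() and then writes the same rule with the ufuncs np.minimum() and np.maximum(). The overtime_pay() function, its 8-hour threshold, and the sample values are hypothetical and only serve the illustration.

```python
import numpy as np


def overtime_pay(hours, rate):
    """Hypothetical scalar rule: hours above 8 are paid at 1.5x the rate."""
    if hours > 8.0:
        return 8.0 * rate + (hours - 8.0) * rate * 1.5
    return hours * rate


hours = np.array([6.0, 9.0, 8.0, 10.0])
rate = np.array([40.0, 40.0, 50.0, 30.0])

# Convenience wrapper: accepts arrays, but runs the Python function per element.
wrapped_pay = np.vectorize(overtime_pay)(hours, rate)

# Native array form: split the hours into regular and extra parts with ufuncs.
regular_hours = np.minimum(hours, 8.0)
extra_hours = np.maximum(hours - 8.0, 0.0)
native_pay = regular_hours * rate + extra_hours * rate * 1.5

np.testing.assert_allclose(native_pay, wrapped_pay)
print("native pay:", native_pay)
```

The wrapped version is useful as a quick reference result; the native version is the one that removes the per-element Python call.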

Keep a small, trusted loop result as a reference while changing existing code. np.testing.assert_allclose() is a good guard for floating-point formulas, and a shape check catches inputs that no longer line up after the loop is removed.
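One way to combine both guards is a small helper like the sketch below. The name check_against_loop() and its rtol default are assumptions made for this example, not part of NumPy.

```python
import numpy as np


def check_against_loop(vector_result, loop_result, rtol=1e-7):
    """Hypothetical helper: compare a vectorized result with a loop reference."""
    vector_result = np.asarray(vector_result)
    loop_result = np.asarray(loop_result)
    # Check shapes explicitly first: by default assert_allclose() can let a
    # scalar pass when it is compared against an array of matching values.
    if vector_result.shape != loop_result.shape:
        raise ValueError(
            f"shape mismatch: {vector_result.shape} vs {loop_result.shape}"
        )
    np.testing.assert_allclose(vector_result, loop_result, rtol=rtol)


values = np.array([0.1, 0.2, 0.3])
loop_result = np.array([value * 3.0 for value in values])  # trusted reference
vector_result = values * 3.0

check_against_loop(vector_result, loop_result)
print("vectorized result matches the loop")
```

Running the shape check before the value check keeps the failure message specific when an input was accidentally reduced or reshaped.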

Steps to vectorize a NumPy calculation:

  1. Create a script that compares a scalar loop with array arithmetic.
    calculation-vectorize.py
    import numpy as np
     
    np.set_printoptions(precision=2, suppress=True)
     
    hours = np.array([6.0, 7.5, 8.0, 4.0])
    hourly_rate = np.array([42.0, 42.0, 45.0, 40.0])
    bonus = np.array([0.0, 15.0, 20.0, 0.0])
     
    loop_total = np.array(
        [
            hours_i * rate_i + bonus_i
            for hours_i, rate_i, bonus_i in zip(hours, hourly_rate, bonus)
        ]
    )
     
    vector_total = hours * hourly_rate + bonus
     
    np.testing.assert_allclose(vector_total, loop_total)
     
    print("input shape:", hours.shape)
    print("loop total:", loop_total)
    print("vector total:", vector_total)
    print("matches loop:", np.allclose(vector_total, loop_total))
    print("result dtype:", vector_total.dtype)

    The loop is the reference result. The expression hours * hourly_rate + bonus keeps the actual calculation in array form.

  2. Run the script and verify that the vectorized expression matches the loop.
    $ python calculation-vectorize.py
    input shape: (4,)
    loop total: [252. 330. 380. 160.]
    vector total: [252. 330. 380. 160.]
    matches loop: True
    result dtype: float64

  3. Check shape mismatches before trusting arrays from different sources.
    $ python - <<'PY'
    import numpy as np
    
    hours = np.array([6.0, 7.5, 8.0, 4.0])
    hourly_rate = np.array([42.0, 45.0])
    scalar_rate = np.array(42.0)
    
    try:
        hours * hourly_rate
    except ValueError as error:
        print(error)
    print("broadcast shape with scalar rate:", np.broadcast_shapes(hours.shape, scalar_rate.shape))
    PY
    operands could not be broadcast together with shapes (4,) (2,)
    broadcast shape with scalar rate: (4,)

    Use np.broadcast_shapes() to confirm the result shape when a vectorized calculation intentionally mixes scalars, row values, or column values.
    Related: Calculate with broadcasting
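The shape check in step 3 extends to calculations that mix row and column values on purpose. The sketch below assumes a hypothetical layout of 2 workers by 3 days, with one rate per day and one bonus per worker; the array names and values are illustrative only. np.broadcast_shapes() requires NumPy 1.20 or later.

```python
import numpy as np

# Hypothetical layout: rows are workers, columns are days.
hours = np.array([[6.0, 7.5, 8.0],
                  [4.0, 8.0, 5.0]])            # shape (2, 3)
day_rate = np.array([40.0, 42.0, 45.0])        # shape (3,): one rate per day
worker_bonus = np.array([[10.0], [0.0]])       # shape (2, 1): one bonus per worker

# Ask NumPy for the broadcast result shape before doing the arithmetic.
expected_shape = np.broadcast_shapes(
    hours.shape, day_rate.shape, worker_bonus.shape
)

total = hours * day_rate + worker_bonus
assert total.shape == expected_shape

print("expected shape:", expected_shape)
print(total)
```

The (3,) array lines up with the columns and the (2, 1) array lines up with the rows, so the result keeps the (2, 3) shape of hours.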