How to calculate probabilities with SciPy distributions

Probability distribution objects in scipy.stats turn a named statistical model into density values, cumulative probabilities, quantiles, and simulated observations. Python analysis code can use the same object to check how unusual a value is, find a cutoff, or generate values from one parameterized model.

Freezing a distribution stores parameters once. For scipy.stats.norm, loc is the mean and scale is the standard deviation; other distributions can add shape parameters while still exposing the same common method names.
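As a minimal sketch of what freezing saves, the snippet below compares a frozen norm object with the equivalent unfrozen call that repeats loc and scale every time. The gamma example and its parameter values (a=2.0, scale=3.0) are assumptions chosen only to illustrate a distribution that takes a shape parameter.

```python
from scipy.stats import gamma, norm

# Frozen: parameters are stored once on the object.
frozen = norm(loc=70, scale=8)
frozen_cdf = frozen.cdf(75)

# Unfrozen: the same parameters must be passed on every call.
direct_cdf = norm.cdf(75, loc=70, scale=8)

# gamma takes a shape parameter `a` in addition to loc and scale.
# These values are illustrative and do not come from the article.
waits = gamma(a=2.0, scale=3.0)
gamma_mean = waits.mean()  # with loc=0, the gamma mean is a * scale

print(abs(frozen_cdf - direct_cdf) < 1e-12)
print(gamma_mean)
```

Both objects expose the same method names (cdf(), sf(), ppf(), rvs(), and so on), so code written against one frozen distribution usually carries over to another.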

Continuous distributions need careful wording because pdf() is density at a point, not probability mass at exactly that value. Use cdf() for probability at or below a threshold, sf() for upper-tail probability, ppf() for inverse cumulative cutoffs, and rvs() with a NumPy generator when repeatable sample values matter.
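To make the density-versus-probability distinction concrete, here is a small sketch that reuses the norm(loc=70, scale=8) model from this guide. The 65 to 75 range and the 0.1 window width are arbitrary values picked for illustration.

```python
from scipy.stats import norm

scores = norm(loc=70, scale=8)

# Probability that a score falls in (65, 75]: difference of two cdf values.
p_between = scores.cdf(75) - scores.cdf(65)

# For a very narrow window, density times width approximates the probability.
width = 0.1
p_window = scores.cdf(75 + width / 2) - scores.cdf(75 - width / 2)
p_approx = scores.pdf(75) * width

print(f"P(65 < X <= 75): {p_between:.4f}")
print(f"window: {p_window:.6f}  pdf*width: {p_approx:.6f}")
```

The density only becomes a probability after it is integrated over an interval, which is what the cdf() difference does.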

Steps to calculate probabilities with SciPy distributions:

Create a Python script named probability_distribution_demo.py.

probability_distribution_demo.py

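```
import numpy as np
from scipy.stats import norm

# Print NumPy arrays with two decimals and no scientific notation.
np.set_printoptions(precision=2, suppress=True)

# Freeze a normal distribution with mean 70 and standard deviation 8.
scores = norm(loc=70, scale=8)
# Equal-tail bounds that contain the central 90% of the distribution.
lo, hi = scores.interval(0.90)
# Seeded generator so the sample repeats across runs.
rng = np.random.default_rng(20260625)
sample = scores.rvs(size=5, random_state=rng)
# Probability levels for the cdf(ppf(q)) round-trip check.
q = np.array([0.1, 0.5, 0.9])

print(f"mean/std: {scores.mean():.1f} {scores.std():.1f}")
print(f"pdf(75): {scores.pdf(75):.4f}")
print(f"cdf(75): {scores.cdf(75):.4f}")
print(f"sf(85): {scores.sf(85):.4f}")
print(f"central 90%: {lo:.2f} to {hi:.2f}")
print(f"ppf(0.90): {scores.ppf(0.90):.2f}")
print("sample:", sample)
print("cdf(ppf(q)):", np.round(scores.cdf(scores.ppf(q)), 1))
```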

norm(loc=70, scale=8) freezes a normal distribution, so each later method call uses the same mean and standard deviation.

Run the script.

```
$ python3 probability_distribution_demo.py
mean/std: 70.0 8.0
pdf(75): 0.0410
cdf(75): 0.7340
sf(85): 0.0304
central 90%: 56.84 to 83.16
ppf(0.90): 80.25
sample: [78.71 79.82 84.9  63.33 77.79]
cdf(ppf(q)): [0.1 0.5 0.9]
```

Use pdf(75) as density at the selected score.

A density can be greater than 1 for some continuous distributions. It is not the probability of exactly one value.
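A quick way to see a density above 1 is to shrink the standard deviation. The sketch below uses a hypothetical normal distribution with scale=0.1, which is not part of the demo script.

```python
from scipy.stats import norm

# Illustrative narrow distribution (standard deviation 0.1), not from the article.
narrow = norm(loc=0, scale=0.1)

# Density at the mean: larger than 1 because the curve is tall and thin.
peak_density = narrow.pdf(0)
# Probability within one standard deviation of the mean: still below 1.
p_within = narrow.cdf(0.1) - narrow.cdf(-0.1)

print(f"pdf(0): {peak_density:.4f}")
print(f"P(-0.1 < X <= 0.1): {p_within:.4f}")
```

The area under the curve stays at 1 even though the curve itself rises above 1.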

Use cdf(75) and sf(85) for lower-tail and upper-tail probabilities.

sf() is the survival function. It represents the probability above the threshold and can be more accurate than subtracting cdf() from 1 for extreme upper tails.
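The accuracy difference shows up far out in the tail. In this sketch the threshold of 150 (ten standard deviations above the mean of the demo distribution) is an assumption chosen to make the effect visible. In double precision the subtraction is expected to round to zero while sf() still returns a small positive value.

```python
from scipy.stats import norm

scores = norm(loc=70, scale=8)
far = 150  # ten standard deviations above the mean; illustrative threshold

# Upper-tail probability computed directly.
tail_sf = scores.sf(far)
# Same quantity by subtraction: cdf(far) is so close to 1 that the
# difference is lost to floating-point rounding.
tail_sub = 1 - scores.cdf(far)

print(f"sf:      {tail_sf:.3e}")
print(f"1 - cdf: {tail_sub:.3e}")
```

For moderate thresholds such as sf(85) the two approaches agree to many digits, so the choice matters mainly for rare-event calculations.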

Use ppf() or interval() to convert probability levels back to score cutoffs.

ppf(0.90) returns the cutoff where 90 percent of the distribution is at or below that value. interval(0.90) returns equal-tail bounds around the median.
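The relationship between the two methods can be checked directly. This sketch assumes the same norm(loc=70, scale=8) model and shows that a 90 percent equal-tail interval matches the 5th and 95th percentile cutoffs from ppf().

```python
import numpy as np
from scipy.stats import norm

scores = norm(loc=70, scale=8)

lo, hi = scores.interval(0.90)
# Equal tails: 5% below the lower bound and 5% above the upper bound.
lo_ppf, hi_ppf = scores.ppf([0.05, 0.95])
bounds_match = np.allclose([lo, hi], [lo_ppf, hi_ppf])

# The median is the 50% cutoff; for a normal distribution it equals the mean.
median = scores.ppf(0.5)

print(bounds_match)
print(f"median: {median:.1f}")
```

Use ppf() on its own when only a one-sided cutoff is needed, such as a pass mark that 90 percent of scores fall at or below.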

Pass a NumPy generator to rvs() when sample output must repeat across runs.

Use a fixed generator seed only for repeatable checks. Use an unseeded generator or a documented project seed policy for simulation work.
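A short sketch of the seeding behavior follows. The seed value 12345 is hypothetical and carries no meaning beyond this illustration.

```python
import numpy as np
from scipy.stats import norm

scores = norm(loc=70, scale=8)
seed = 12345  # hypothetical seed, chosen only for this illustration

# Two generators built from the same seed produce the same draws.
first = scores.rvs(size=5, random_state=np.random.default_rng(seed))
second = scores.rvs(size=5, random_state=np.random.default_rng(seed))
same_draws = np.array_equal(first, second)

# No seed: NumPy pulls fresh entropy from the operating system,
# so these values are expected to differ from run to run.
unseeded = scores.rvs(size=5, random_state=np.random.default_rng())

print(same_draws)
print(unseeded.shape)
```

Creating one generator and passing it to every rvs() call keeps the whole sequence of draws reproducible without reseeding between calls.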

Verify the inverse cumulative calculation before changing the distribution.
```
q = np.array([0.1, 0.5, 0.9])
scores.cdf(scores.ppf(q))
```
Confirm cdf(ppf(q)) returns the original probability levels apart from normal floating-point rounding.
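One way to automate that confirmation, sketched here with np.allclose, is to compare against a tolerance instead of rounding for display. The score values in x are arbitrary examples.

```python
import numpy as np
from scipy.stats import norm

scores = norm(loc=70, scale=8)
q = np.array([0.1, 0.5, 0.9])

# Forward check: probability levels survive ppf followed by cdf.
round_trip = scores.cdf(scores.ppf(q))
levels_ok = np.allclose(round_trip, q)

# Reverse check: score values survive cdf followed by ppf.
x = np.array([60.0, 70.0, 80.0])  # arbitrary example scores
scores_ok = np.allclose(scores.ppf(scores.cdf(x)), x)

print(levels_ok, scores_ok)
```

A tolerance-based comparison flags real mismatches that rounding to one decimal place could hide.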

Remove the demo script if it was created only for this check.
```
$ rm probability_distribution_demo.py
```

Author: Mohd Shakir Zakaria