Dropping missing values in pandas removes records or fields that cannot be used safely in a cleanup, analysis, or export step. DataFrame.dropna() is the direct cleanup method when incomplete rows should be excluded instead of filled with replacement values.
dropna() returns a new DataFrame by default, so the original variable remains available for comparison unless the result is assigned back. Row cleanup is the default; column cleanup requires axis="columns" and should usually target fields that are completely empty or outside the analysis scope.
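A minimal sketch of the return-a-new-DataFrame behavior, assuming a small hypothetical frame named orders (the frame, its columns, and the name without_missing are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the second order has no amount.
orders = pd.DataFrame({"order_id": ["A1", "A2", "A3"], "amount": [10.0, np.nan, 7.5]})

# dropna() builds a new DataFrame; `orders` still has all three rows.
without_missing = orders.dropna()
print(len(orders), len(without_missing))  # 3 2

# Assign the result back only when the incomplete rows should be discarded for good.
orders = orders.dropna()
print(orders["order_id"].tolist())  # ['A1', 'A3']
```

Keeping both variables until the cleanup has been checked makes it easy to compare row counts before committing to the smaller frame.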
Choose the rule before deleting data. Use subset for required fields, thresh when a row needs a minimum number of present values, and how="all" when only fully empty rows or columns should be removed.
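One way to choose the rule before deleting is to preview the rows it would remove. This sketch assumes a hypothetical frame and treats order_id and total as the required fields; the boolean mask mirrors the check that dropna(subset=...) performs:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: row 1 lacks a total, row 2 lacks an order_id.
df = pd.DataFrame(
    {
        "order_id": ["1001", "1002", None],
        "total": [125.5, np.nan, 42.75],
        "note": [np.nan, "rush", np.nan],
    }
)
required = ["order_id", "total"]

# Preview: rows where any required field is missing.
would_drop = df[df[required].isna().any(axis=1)]
print(would_drop.index.tolist())  # [1, 2]

# The real drop keeps exactly the other rows; "note" is ignored because of subset.
kept = df.dropna(subset=required)
print(kept.index.tolist())  # [0]
```

The same idea works for the other rules: df.isna().all(axis=1) previews how="all", and df.notna().sum(axis=1) < n previews the rows that thresh=n would drop.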
$ python3
Python 3.13.14
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(
... {
... "order_id": pd.Series(["1001", "1002", "1003", None, "1005"], dtype="string"),
... "customer": pd.Series(["Ava", "Ben", "Cy", "Dana", "Eli"], dtype="string"),
... "region": pd.Series(["EMEA", "APAC", pd.NA, "EMEA", "AMER"], dtype="string"),
... "total": [125.50, np.nan, 89.00, 42.75, 0.00],
... "shipped_at": [
... pd.Timestamp("2026-06-01"),
... pd.NaT,
... pd.Timestamp("2026-06-03"),
... pd.Timestamp("2026-06-04"),
... pd.NaT,
... ],
... "legacy_note": [np.nan, np.nan, np.nan, np.nan, np.nan],
... }
... )
When cleaning real data, assign the DataFrame loaded from CSV, Excel, SQL, or Parquet to df so the remaining commands apply unchanged.
>>> df
  order_id customer region   total shipped_at  legacy_note
0     1001      Ava   EMEA  125.50 2026-06-01          NaN
1     1002      Ben   APAC     NaN        NaT          NaN
2     1003       Cy   <NA>   89.00 2026-06-03          NaN
3     <NA>     Dana   EMEA   42.75 2026-06-04          NaN
4     1005      Eli   AMER    0.00        NaT          NaN
>>> df.isna().sum()
order_id       1
customer       0
region         1
total          1
shipped_at     2
legacy_note    5
dtype: int64
dropna() uses the same missing-value detection as isna(): None, NaN, NaT, and pd.NA count as missing, while empty strings remain ordinary values until they are converted to a missing marker.
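If blanks arrived as empty or whitespace-only strings, one possible conversion is a regex replace to NaN before calling dropna(). This is a sketch: the sample frame and the ^\s*$ pattern are assumptions, so adjust the pattern to whatever your source uses for blanks:

```python
import numpy as np
import pandas as pd

# Hypothetical frame where blank customers arrived as "" and "   ".
df = pd.DataFrame({"customer": ["Ava", "", "   ", "Dana"], "total": [1.0, 2.0, 3.0, np.nan]})

# Empty strings are not missing, so only the NaN in "total" is counted here.
print(df.isna().sum().tolist())  # [0, 1]

# Turn empty or whitespace-only strings into NaN, then drop incomplete rows.
normalized = df.replace(r"^\s*$", np.nan, regex=True)
print(normalized.dropna().index.tolist())  # [0]
```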
>>> required = ["order_id", "total"]
>>> clean_orders = df.dropna(subset=required, ignore_index=True)
>>> clean_orders
  order_id customer region  total shipped_at  legacy_note
0     1001      Ava   EMEA  125.5 2026-06-01          NaN
1     1003       Cy   <NA>   89.0 2026-06-03          NaN
2     1005      Eli   AMER    0.0        NaT          NaN
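subset=required limits the check to order_id and total, so rows with a missing region or shipped_at (and the always-empty legacy_note) are kept. ignore_index=True relabels the surviving rows 0, 1, 2 instead of keeping their original labels 0, 2, 4.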
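ignore_index is a pandas 2.x addition to dropna(); on older versions, reset_index(drop=True) gives the same renumbering in two steps. A sketch comparing both on a hypothetical three-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one incomplete row in the middle.
df = pd.DataFrame({"order_id": ["1001", "1002", "1003"], "total": [125.5, np.nan, 89.0]})

# Without renumbering, surviving rows keep their original labels.
print(df.dropna().index.tolist())  # [0, 2]

# Both forms relabel the survivors from zero and produce the same frame.
a = df.dropna(ignore_index=True)
b = df.dropna().reset_index(drop=True)
print(a.index.tolist(), a.equals(b))  # [0, 1] True
```

drop=True matters in the two-step form: without it, reset_index would insert the old labels as a new index column.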
>>> clean_orders[required].isna().sum()
order_id    0
total       0
dtype: int64
>>> df.dropna(thresh=4)
  order_id customer region   total shipped_at  legacy_note
0     1001      Ava   EMEA  125.50 2026-06-01          NaN
2     1003       Cy   <NA>   89.00 2026-06-03          NaN
3     <NA>     Dana   EMEA   42.75 2026-06-04          NaN
4     1005      Eli   AMER    0.00        NaT          NaN
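thresh=4 keeps rows with at least four present (non-missing) cells, so only row 1 is dropped: it has just three (order_id, customer, and region). Do not combine thresh and how in the same dropna() call; pandas rejects that combination with a TypeError.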
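To pick a thresh value, it can help to look at how many present cells each row actually has. A minimal sketch on a hypothetical numeric frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose rows hold 3, 2, and 1 present values.
df = pd.DataFrame(
    {
        "a": [1.0, 2.0, np.nan],
        "b": [1.0, np.nan, np.nan],
        "c": [1.0, 2.0, 3.0],
    }
)

# notna() marks present cells; summing across columns counts them per row.
present = df.notna().sum(axis=1)
print(present.tolist())  # [3, 2, 1]

# thresh=2 keeps the rows whose count is at least 2.
kept = df.dropna(thresh=2)
print(kept.index.tolist())  # [0, 1]
```

The result matches df[present >= 2], which is a convenient cross-check when a threshold feels arbitrary.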
>>> df.dropna(axis="columns", how="all")
  order_id customer region   total shipped_at
0     1001      Ava   EMEA  125.50 2026-06-01
1     1002      Ben   APAC     NaN        NaT
2     1003       Cy   <NA>   89.00 2026-06-03
3     <NA>     Dana   EMEA   42.75 2026-06-04
4     1005      Eli   AMER    0.00        NaT
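With how="all", only legacy_note is removed, because it is the only column with no values at all. By contrast, axis="columns" with how="any" (the default) removes every column that has even one missing cell; on this sample only customer would survive. Use it only when partially populated columns should be discarded.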
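A sketch contrasting the column rules on a hypothetical four-row frame, plus a middle ground that uses thresh with axis="columns" to keep mostly populated columns. The 75% cutoff is an arbitrary illustration, not a recommendation:

```python
import math

import numpy as np
import pandas as pd

# Hypothetical columns: complete, one gap, two gaps, and completely empty.
df = pd.DataFrame(
    {
        "customer": ["Ava", "Ben", "Cy", "Dana"],
        "total": [1.0, np.nan, 3.0, 4.0],
        "region": ["EMEA", None, None, "AMER"],
        "legacy_note": [np.nan, np.nan, np.nan, np.nan],
    }
)

# how="all": only the completely empty column goes.
print(df.dropna(axis="columns", how="all").columns.tolist())
# ['customer', 'total', 'region']

# how="any" (the default): a single gap is enough to drop a column.
print(df.dropna(axis="columns", how="any").columns.tolist())
# ['customer']

# thresh on columns: keep columns with at least 75% present values (3 of 4 here).
min_present = math.ceil(0.75 * len(df))
print(df.dropna(axis="columns", thresh=min_present).columns.tolist())
# ['customer', 'total']
```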