How to filter rows in a pandas DataFrame

Filtering rows in a pandas DataFrame keeps records that meet a condition before analysis, reporting, or export. Boolean masks work well for thresholds, membership checks, status fields, and text matches because each row receives a True or False decision.

Use .loc when the filter also selects output columns, because the expression before the comma chooses rows and the expression after the comma chooses columns. A mask built from a column comparison must align with the DataFrame rows, and pandas preserves the original index labels in the filtered result.

Combine conditions with & and | inside parentheses rather than Python and or or, which cannot reduce a Series of booleans to one truth value. Create a .copy() of the filtered result before changing it independently from the source DataFrame.

Steps to filter pandas DataFrame rows:

Save a short row-filtering script.

filter_rows.py

import pandas as pd
 
orders = pd.DataFrame(
    {
        "order_id": [1001, 1002, 1003, 1004, 1005, 1006],
        "customer": ["Ada", "Lin", "Maya", "Omar", "Nia", "Kai"],
        "region": ["EMEA", "APAC", "AMER", "EMEA", "APAC", "AMER"],
        "category": ["hardware", "software", "hardware", "service", "software", "hardware"],
        "qty": [3, 12, 7, 2, 15, 9],
        "total_usd": [150.0, 240.0, 875.0, 95.0, 360.0, 420.0],
        "status": ["paid", "paid", "open", "paid", "open", "paid"],
    }
)
 
print(f"pandas {pd.__version__}")
print()
 
print("BASE")
print(orders.to_string(index=False))
print()
 
high_value = orders.loc[
    orders["total_usd"] >= 300,
    ["order_id", "customer", "region", "total_usd"],
]
 
print("FILTER_TOTAL")
print(high_value.to_string(index=False))
print()
 
regions = ["EMEA", "APAC"]
priority_mask = (orders["region"].isin(regions)) & (orders["qty"] >= 10)
priority = orders.loc[
    priority_mask,
    ["order_id", "customer", "region", "qty", "total_usd"],
]
 
print("FILTER_REGION_QTY")
print(priority.to_string(index=False))
print()
 
paid_hardware = orders.loc[
    (orders["status"] == "paid") & (orders["category"] == "hardware"),
    ["order_id", "customer", "category", "status", "total_usd"],
]
 
print("FILTER_STATUS_CATEGORY")
print(paid_hardware.to_string(index=False))
print()
 
customer_match = orders.loc[
    orders["customer"].str.contains("a", case=False, regex=False, na=False),
    ["order_id", "customer", "region"],
]
 
print("FILTER_CUSTOMER_TEXT")
print(customer_match.to_string(index=False))
print()
 
print("VERIFY")
print(f"priority rows: {len(priority)}")
print(f"priority index: {priority.index.tolist()}")
print(f"all priority qty >= 10: {priority['qty'].ge(10).all()}")
print(f"source rows unchanged: {len(orders)}")

Replace the sample orders object with the DataFrame already loaded in the working script.

Run the script and confirm that each section returns only matching rows.

$ python3 filter_rows.py
pandas 3.0.3

BASE
 order_id customer region category  qty  total_usd status
     1001      Ada   EMEA hardware    3      150.0   paid
     1002      Lin   APAC software   12      240.0   paid
     1003     Maya   AMER hardware    7      875.0   open
     1004     Omar   EMEA  service    2       95.0   paid
     1005      Nia   APAC software   15      360.0   open
     1006      Kai   AMER hardware    9      420.0   paid

FILTER_TOTAL
 order_id customer region  total_usd
     1003     Maya   AMER      875.0
     1005      Nia   APAC      360.0
     1006      Kai   AMER      420.0

FILTER_REGION_QTY
 order_id customer region  qty  total_usd
     1002      Lin   APAC   12      240.0
     1005      Nia   APAC   15      360.0

FILTER_STATUS_CATEGORY
 order_id customer category status  total_usd
     1001      Ada hardware   paid      150.0
     1006      Kai hardware   paid      420.0

FILTER_CUSTOMER_TEXT
 order_id customer region
     1001      Ada   EMEA
     1003     Maya   AMER
     1004     Omar   EMEA
     1005      Nia   APAC
     1006      Kai   AMER

VERIFY
priority rows: 2
priority index: [1, 4]
all priority qty >= 10: True
source rows unchanged: 6

Filter rows with a numeric threshold.
```
high_value = orders.loc[
    orders["total_usd"] >= 300,
    ["order_id", "customer", "region", "total_usd"],
]
```
The row condition comes before the comma, and the selected output columns come after it.
Related: How to select DataFrame rows and columns with loc and iloc in pandas

Match rows whose value appears in an allowed list.

regions = ["EMEA", "APAC"]
priority_mask = (orders["region"].isin(regions)) & (orders["qty"] >= 10)
priority = orders.loc[
    priority_mask,
    ["order_id", "customer", "region", "qty", "total_usd"],
]

Use parentheses around each comparison when combining masks. Python and and or raise an ambiguous truth-value error with pandas Series objects.

Filter rows by exact status and category values.

paid_hardware = orders.loc[
    (orders["status"] == "paid") & (orders["category"] == "hardware"),
    ["order_id", "customer", "category", "status", "total_usd"],
]

Filter text rows with a string condition.

customer_match = orders.loc[
    orders["customer"].str.contains("a", case=False, regex=False, na=False),
    ["order_id", "customer", "region"],
]

na=False keeps missing text values from matching the filter.

Verify the filtered row count, preserved index labels, and source row count.

print(f"priority rows: {len(priority)}")
print(f"priority index: {priority.index.tolist()}")
print(f"all priority qty >= 10: {priority['qty'].ge(10).all()}")
print(f"source rows unchanged: {len(orders)}")

Copy a filtered result before changing it independently.
```
priority_review = priority.copy()
priority_review["review_required"] = True
```
.copy() makes the filtered DataFrame a separate working object before later assignment.
Related: How to migrate pandas code for Copy-on-Write
Remove the sample script after confirming the filter behavior.
```
$ rm filter_rows.py
```