Filtering rows in a pandas DataFrame keeps records that meet a condition before analysis, reporting, or export. Boolean masks work well for thresholds, membership checks, status fields, and text matches because each row receives a True or False decision.
Use .loc when the filter also selects output columns, because the expression before the comma chooses rows and the expression after the comma chooses columns. A mask built from a column comparison must align with the DataFrame rows, and pandas preserves the original index labels in the filtered result.
Combine conditions with & and | inside parentheses rather than Python and or or, which cannot reduce a Series of booleans to one truth value. Create a .copy() of the filtered result before changing it independently from the source DataFrame.
import pandas as pd orders = pd.DataFrame( { "order_id": [1001, 1002, 1003, 1004, 1005, 1006], "customer": ["Ada", "Lin", "Maya", "Omar", "Nia", "Kai"], "region": ["EMEA", "APAC", "AMER", "EMEA", "APAC", "AMER"], "category": ["hardware", "software", "hardware", "service", "software", "hardware"], "qty": [3, 12, 7, 2, 15, 9], "total_usd": [150.0, 240.0, 875.0, 95.0, 360.0, 420.0], "status": ["paid", "paid", "open", "paid", "open", "paid"], } ) print(f"pandas {pd.__version__}") print() print("BASE") print(orders.to_string(index=False)) print() high_value = orders.loc[ orders["total_usd"] >= 300, ["order_id", "customer", "region", "total_usd"], ] print("FILTER_TOTAL") print(high_value.to_string(index=False)) print() regions = ["EMEA", "APAC"] priority_mask = (orders["region"].isin(regions)) & (orders["qty"] >= 10) priority = orders.loc[ priority_mask, ["order_id", "customer", "region", "qty", "total_usd"], ] print("FILTER_REGION_QTY") print(priority.to_string(index=False)) print() paid_hardware = orders.loc[ (orders["status"] == "paid") & (orders["category"] == "hardware"), ["order_id", "customer", "category", "status", "total_usd"], ] print("FILTER_STATUS_CATEGORY") print(paid_hardware.to_string(index=False)) print() customer_match = orders.loc[ orders["customer"].str.contains("a", case=False, regex=False, na=False), ["order_id", "customer", "region"], ] print("FILTER_CUSTOMER_TEXT") print(customer_match.to_string(index=False)) print() print("VERIFY") print(f"priority rows: {len(priority)}") print(f"priority index: {priority.index.tolist()}") print(f"all priority qty >= 10: {priority['qty'].ge(10).all()}") print(f"source rows unchanged: {len(orders)}")
Replace the sample orders object with the DataFrame already loaded in the working script.
$ python3 filter_rows.py
pandas 3.0.3
BASE
order_id customer region category qty total_usd status
1001 Ada EMEA hardware 3 150.0 paid
1002 Lin APAC software 12 240.0 paid
1003 Maya AMER hardware 7 875.0 open
1004 Omar EMEA service 2 95.0 paid
1005 Nia APAC software 15 360.0 open
1006 Kai AMER hardware 9 420.0 paid
FILTER_TOTAL
order_id customer region total_usd
1003 Maya AMER 875.0
1005 Nia APAC 360.0
1006 Kai AMER 420.0
FILTER_REGION_QTY
order_id customer region qty total_usd
1002 Lin APAC 12 240.0
1005 Nia APAC 15 360.0
FILTER_STATUS_CATEGORY
order_id customer category status total_usd
1001 Ada hardware paid 150.0
1006 Kai hardware paid 420.0
FILTER_CUSTOMER_TEXT
order_id customer region
1001 Ada EMEA
1003 Maya AMER
1004 Omar EMEA
1005 Nia APAC
1006 Kai AMER
VERIFY
priority rows: 2
priority index: [1, 4]
all priority qty >= 10: True
source rows unchanged: 6
high_value = orders.loc[ orders["total_usd"] >= 300, ["order_id", "customer", "region", "total_usd"], ]
The row condition comes before the comma, and the selected output columns come after it.
Related: How to select DataFrame rows and columns with loc and iloc in pandas
regions = ["EMEA", "APAC"] priority_mask = (orders["region"].isin(regions)) & (orders["qty"] >= 10) priority = orders.loc[ priority_mask, ["order_id", "customer", "region", "qty", "total_usd"], ]
Use parentheses around each comparison when combining masks. Python and and or raise an ambiguous truth-value error with pandas Series objects.
paid_hardware = orders.loc[ (orders["status"] == "paid") & (orders["category"] == "hardware"), ["order_id", "customer", "category", "status", "total_usd"], ]
customer_match = orders.loc[ orders["customer"].str.contains("a", case=False, regex=False, na=False), ["order_id", "customer", "region"], ]
na=False keeps missing text values from matching the filter.
print(f"priority rows: {len(priority)}") print(f"priority index: {priority.index.tolist()}") print(f"all priority qty >= 10: {priority['qty'].ge(10).all()}") print(f"source rows unchanged: {len(orders)}")
priority_review = priority.copy() priority_review["review_required"] = True
.copy() makes the filtered DataFrame a separate working object before later assignment.
Related: How to migrate pandas code for Copy-on-Write
$ rm filter_rows.py