Filtering rows in a pandas DataFrame keeps records that meet a condition before analysis, reporting, or export. Boolean masks work well for thresholds, membership checks, status fields, and text matches because each row receives a True or False decision.
Use .loc when the filter also selects output columns, because the expression before the comma chooses rows and the expression after the comma chooses columns. A mask built from a column comparison must align with the DataFrame rows, and pandas preserves the original index labels in the filtered result.
Combine conditions with & and | inside parentheses rather than Python and or or, which cannot reduce a Series of booleans to one truth value. Create a .copy() of the filtered result before changing it independently from the source DataFrame.
Steps to filter pandas DataFrame rows:
- Save a short row-filtering script.
- filter_rows.py
import pandas as pd orders = pd.DataFrame( { "order_id": [1001, 1002, 1003, 1004, 1005, 1006], "customer": ["Ada", "Lin", "Maya", "Omar", "Nia", "Kai"], "region": ["EMEA", "APAC", "AMER", "EMEA", "APAC", "AMER"], "category": ["hardware", "software", "hardware", "service", "software", "hardware"], "qty": [3, 12, 7, 2, 15, 9], "total_usd": [150.0, 240.0, 875.0, 95.0, 360.0, 420.0], "status": ["paid", "paid", "open", "paid", "open", "paid"], } ) print(f"pandas {pd.__version__}") print() print("BASE") print(orders.to_string(index=False)) print() high_value = orders.loc[ orders["total_usd"] >= 300, ["order_id", "customer", "region", "total_usd"], ] print("FILTER_TOTAL") print(high_value.to_string(index=False)) print() regions = ["EMEA", "APAC"] priority_mask = (orders["region"].isin(regions)) & (orders["qty"] >= 10) priority = orders.loc[ priority_mask, ["order_id", "customer", "region", "qty", "total_usd"], ] print("FILTER_REGION_QTY") print(priority.to_string(index=False)) print() paid_hardware = orders.loc[ (orders["status"] == "paid") & (orders["category"] == "hardware"), ["order_id", "customer", "category", "status", "total_usd"], ] print("FILTER_STATUS_CATEGORY") print(paid_hardware.to_string(index=False)) print() customer_match = orders.loc[ orders["customer"].str.contains("a", case=False, regex=False, na=False), ["order_id", "customer", "region"], ] print("FILTER_CUSTOMER_TEXT") print(customer_match.to_string(index=False)) print() print("VERIFY") print(f"priority rows: {len(priority)}") print(f"priority index: {priority.index.tolist()}") print(f"all priority qty >= 10: {priority['qty'].ge(10).all()}") print(f"source rows unchanged: {len(orders)}")
Replace the sample orders object with the DataFrame already loaded in the working script.
- Run the script and confirm that each section returns only matching rows.
$ python3 filter_rows.py pandas 3.0.3 BASE order_id customer region category qty total_usd status 1001 Ada EMEA hardware 3 150.0 paid 1002 Lin APAC software 12 240.0 paid 1003 Maya AMER hardware 7 875.0 open 1004 Omar EMEA service 2 95.0 paid 1005 Nia APAC software 15 360.0 open 1006 Kai AMER hardware 9 420.0 paid FILTER_TOTAL order_id customer region total_usd 1003 Maya AMER 875.0 1005 Nia APAC 360.0 1006 Kai AMER 420.0 FILTER_REGION_QTY order_id customer region qty total_usd 1002 Lin APAC 12 240.0 1005 Nia APAC 15 360.0 FILTER_STATUS_CATEGORY order_id customer category status total_usd 1001 Ada hardware paid 150.0 1006 Kai hardware paid 420.0 FILTER_CUSTOMER_TEXT order_id customer region 1001 Ada EMEA 1003 Maya AMER 1004 Omar EMEA 1005 Nia APAC 1006 Kai AMER VERIFY priority rows: 2 priority index: [1, 4] all priority qty >= 10: True source rows unchanged: 6 - Filter rows with a numeric threshold.
high_value = orders.loc[ orders["total_usd"] >= 300, ["order_id", "customer", "region", "total_usd"], ]
The row condition comes before the comma, and the selected output columns come after it.
Related: How to select DataFrame rows and columns with loc and iloc in pandas - Match rows whose value appears in an allowed list.
regions = ["EMEA", "APAC"] priority_mask = (orders["region"].isin(regions)) & (orders["qty"] >= 10) priority = orders.loc[ priority_mask, ["order_id", "customer", "region", "qty", "total_usd"], ]
Use parentheses around each comparison when combining masks. Python and and or raise an ambiguous truth-value error with pandas Series objects.
- Filter rows by exact status and category values.
paid_hardware = orders.loc[ (orders["status"] == "paid") & (orders["category"] == "hardware"), ["order_id", "customer", "category", "status", "total_usd"], ]
- Filter text rows with a string condition.
customer_match = orders.loc[ orders["customer"].str.contains("a", case=False, regex=False, na=False), ["order_id", "customer", "region"], ]
na=False keeps missing text values from matching the filter.
- Verify the filtered row count, preserved index labels, and source row count.
print(f"priority rows: {len(priority)}") print(f"priority index: {priority.index.tolist()}") print(f"all priority qty >= 10: {priority['qty'].ge(10).all()}") print(f"source rows unchanged: {len(orders)}")
- Copy a filtered result before changing it independently.
priority_review = priority.copy() priority_review["review_required"] = True
.copy() makes the filtered DataFrame a separate working object before later assignment.
Related: How to migrate pandas code for Copy-on-Write - Remove the sample script after confirming the filter behavior.
$ rm filter_rows.py
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.