Merging pandas DataFrame objects combines rows that describe the same records but live in separate tables. It is useful when a transaction table needs lookup columns such as customer segment, region, product attributes, or other keyed context before analysis or export.
DataFrame.merge() performs database-style joins on columns or indexes. A left merge keeps every row from the left object and attaches matching right-side columns, while an inner merge keeps only keys found on both sides.
Choose the join key deliberately before merging. Duplicate lookup keys can multiply rows, so validate=“many_to_one” catches a right-side lookup table that is not unique, and indicator=True adds a _merge column that identifies unmatched records. Missing join keys need a decision before the merge because pandas matches null keys to null keys rather than leaving them unmatched as SQL usually does.
Related: How to create a pandas DataFrame
Related: How to concatenate pandas DataFrames
Related: How to filter rows in a pandas DataFrame
Steps to merge pandas DataFrames:
- Save a short merge script.
- merge_dataframes.py
import pandas as pd orders = pd.DataFrame( { "order_id": [1001, 1002, 1003, 1004], "customer_id": ["C001", "C002", "C001", "C004"], "total_usd": [125.0, 240.0, 90.0, 410.0], } ) customers = pd.DataFrame( { "customer_id": ["C001", "C002", "C003"], "segment": ["SMB", "Enterprise", "SMB"], "region": ["EMEA", "APAC", "AMER"], } ) left_join = orders.merge( customers, on="customer_id", how="left", validate="many_to_one", indicator=True, ) unmatched_orders = left_join.loc[ left_join["_merge"] == "left_only", ["order_id", "customer_id"], ] inner_join = orders.merge( customers, on="customer_id", how="inner", validate="many_to_one", ) print(f"pandas {pd.__version__}") print() print("LEFT_MERGE") print(left_join.to_string(index=False)) print() print("UNMATCHED_ORDERS") print(unmatched_orders.to_string(index=False)) print() print("LEFT_VERIFY") print(f"rows={len(left_join)}") print(f"unmatched_orders={len(unmatched_orders)}") print(left_join["_merge"].value_counts()) print() print("INNER_MERGE") print(inner_join.to_string(index=False)) print() print("INNER_VERIFY") print(f"rows={len(inner_join)}") print(f"all_segments_present={inner_join['segment'].notna().all()}") print() print("SOURCE_VERIFY") print(f"orders rows unchanged={len(orders)}") print(f"customer keys unique={customers['customer_id'].is_unique}")
Replace the small orders and customers objects with the DataFrame objects already loaded in the working script. Keep the join key visible until row counts and unmatched records are checked.
- Run the merge script.
$ python3 merge_dataframes.py pandas 3.0.3 LEFT_MERGE order_id customer_id total_usd segment region _merge 1001 C001 125.0 SMB EMEA both 1002 C002 240.0 Enterprise APAC both 1003 C001 90.0 SMB EMEA both 1004 C004 410.0 NaN NaN left_only UNMATCHED_ORDERS order_id customer_id 1004 C004 LEFT_VERIFY rows=4 unmatched_orders=1 _merge both 3 left_only 1 right_only 0 Name: count, dtype: int64 INNER_MERGE order_id customer_id total_usd segment region 1001 C001 125.0 SMB EMEA 1002 C002 240.0 Enterprise APAC 1003 C001 90.0 SMB EMEA INNER_VERIFY rows=3 all_segments_present=True SOURCE_VERIFY orders rows unchanged=4 customer keys unique=True - Merge the left DataFrame with the customer lookup.
left_join = orders.merge( customers, on="customer_id", how="left", validate="many_to_one", indicator=True, )
Rows with missing key values can match other missing key values in pandas. Remove or fill missing keys first when null keys should stay unmatched.
- Inspect the rows that did not find a right-side match.
unmatched_orders = left_join.loc[ left_join["_merge"] == "left_only", ["order_id", "customer_id"], ]
indicator=True creates the _merge column with both, left_only, and right_only values.
- Run an inner merge when only matched keys should remain.
inner_join = orders.merge( customers, on="customer_id", how="inner", validate="many_to_one", )
Use how=“outer” when keys from both sides should stay in the output, and use how=“left” when the left-side records are the main table.
- Verify the merged row counts, unmatched records, and lookup key uniqueness.
print(f"rows={len(left_join)}") print(f"unmatched_orders={len(unmatched_orders)}") print(left_join["_merge"].value_counts()) print(f"customer keys unique={customers['customer_id'].is_unique}")
- Remove the sample script after the merge behavior is confirmed.
$ rm merge_dataframes.py
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.