Merging pandas DataFrame objects combines rows that describe the same records but live in separate tables. It is useful when a transaction table needs lookup columns such as customer segment, region, product attributes, or other keyed context before analysis or export.

DataFrame.merge() performs database-style joins on columns or indexes. A left merge keeps every row from the left object and attaches matching right-side columns, while an inner merge keeps only keys found on both sides.

Choose the join key deliberately before merging. Duplicate lookup keys can multiply rows, so validate=“many_to_one” catches a right-side lookup table that is not unique, and indicator=True adds a _merge column that identifies unmatched records. Missing join keys need a decision before the merge because pandas matches null keys to null keys rather than leaving them unmatched as SQL usually does.

Steps to merge pandas DataFrames:

  1. Save a short merge script.
    merge_dataframes.py
    import pandas as pd
     
    orders = pd.DataFrame(
        {
            "order_id": [1001, 1002, 1003, 1004],
            "customer_id": ["C001", "C002", "C001", "C004"],
            "total_usd": [125.0, 240.0, 90.0, 410.0],
        }
    )
     
    customers = pd.DataFrame(
        {
            "customer_id": ["C001", "C002", "C003"],
            "segment": ["SMB", "Enterprise", "SMB"],
            "region": ["EMEA", "APAC", "AMER"],
        }
    )
     
    left_join = orders.merge(
        customers,
        on="customer_id",
        how="left",
        validate="many_to_one",
        indicator=True,
    )
     
    unmatched_orders = left_join.loc[
        left_join["_merge"] == "left_only",
        ["order_id", "customer_id"],
    ]
     
    inner_join = orders.merge(
        customers,
        on="customer_id",
        how="inner",
        validate="many_to_one",
    )
     
    print(f"pandas {pd.__version__}")
    print()
     
    print("LEFT_MERGE")
    print(left_join.to_string(index=False))
    print()
     
    print("UNMATCHED_ORDERS")
    print(unmatched_orders.to_string(index=False))
    print()
     
    print("LEFT_VERIFY")
    print(f"rows={len(left_join)}")
    print(f"unmatched_orders={len(unmatched_orders)}")
    print(left_join["_merge"].value_counts())
    print()
     
    print("INNER_MERGE")
    print(inner_join.to_string(index=False))
    print()
     
    print("INNER_VERIFY")
    print(f"rows={len(inner_join)}")
    print(f"all_segments_present={inner_join['segment'].notna().all()}")
    print()
     
    print("SOURCE_VERIFY")
    print(f"orders rows unchanged={len(orders)}")
    print(f"customer keys unique={customers['customer_id'].is_unique}")

    Replace the small orders and customers objects with the DataFrame objects already loaded in the working script. Keep the join key visible until row counts and unmatched records are checked.

  2. Run the merge script.
    $ python3 merge_dataframes.py
    pandas 3.0.3
    
    LEFT_MERGE
     order_id customer_id  total_usd    segment region    _merge
         1001        C001      125.0        SMB   EMEA      both
         1002        C002      240.0 Enterprise   APAC      both
         1003        C001       90.0        SMB   EMEA      both
         1004        C004      410.0        NaN    NaN left_only
    
    UNMATCHED_ORDERS
     order_id customer_id
         1004        C004
    
    LEFT_VERIFY
    rows=4
    unmatched_orders=1
    _merge
    both          3
    left_only     1
    right_only    0
    Name: count, dtype: int64
    
    INNER_MERGE
     order_id customer_id  total_usd    segment region
         1001        C001      125.0        SMB   EMEA
         1002        C002      240.0 Enterprise   APAC
         1003        C001       90.0        SMB   EMEA
    
    INNER_VERIFY
    rows=3
    all_segments_present=True
    
    SOURCE_VERIFY
    orders rows unchanged=4
    customer keys unique=True
  3. Merge the left DataFrame with the customer lookup.
    left_join = orders.merge(
        customers,
        on="customer_id",
        how="left",
        validate="many_to_one",
        indicator=True,
    )

    Rows with missing key values can match other missing key values in pandas. Remove or fill missing keys first when null keys should stay unmatched.

  4. Inspect the rows that did not find a right-side match.
    unmatched_orders = left_join.loc[
        left_join["_merge"] == "left_only",
        ["order_id", "customer_id"],
    ]

    indicator=True creates the _merge column with both, left_only, and right_only values.

  5. Run an inner merge when only matched keys should remain.
    inner_join = orders.merge(
        customers,
        on="customer_id",
        how="inner",
        validate="many_to_one",
    )

    Use how=“outer” when keys from both sides should stay in the output, and use how=“left” when the left-side records are the main table.

  6. Verify the merged row counts, unmatched records, and lookup key uniqueness.
    print(f"rows={len(left_join)}")
    print(f"unmatched_orders={len(unmatched_orders)}")
    print(left_join["_merge"].value_counts())
    print(f"customer keys unique={customers['customer_id'].is_unique}")
  7. Remove the sample script after the merge behavior is confirmed.
    $ rm merge_dataframes.py