Concatenating pandas DataFrame objects combines repeated extracts, monthly files, or aligned feature tables into one object before analysis or export. pandas.concat() is the right tool when the inputs already have compatible columns or indexes and need to be stacked or placed side by side.

The default axis=0 stacks rows and keeps the original row labels unless ignore_index=True asks pandas to build a fresh integer index. That reset is useful for repeated extracts where each input reused labels such as 0 and 1.
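
As a minimal sketch of that difference, the snippet below stacks two small extracts with and without ignore_index=True. The frames first and second are hypothetical stand-ins for repeated extracts that both use the default labels 0 and 1.

```python
import pandas as pd

# Two hypothetical extracts that both reuse the default labels 0 and 1.
first = pd.DataFrame({"order_id": ["A100", "A101"]})
second = pd.DataFrame({"order_id": ["A102", "A103"]})

# Default behavior: the original row labels are kept, so they repeat.
kept = pd.concat([first, second])

# ignore_index=True: the labels are replaced with a fresh 0..n-1 index.
fresh = pd.concat([first, second], ignore_index=True)

print(kept.index.tolist())    # [0, 1, 0, 1]
print(fresh.index.tolist())   # [0, 1, 2, 3]
print(kept.index.is_unique)   # False
```

Repeated labels are not an error in themselves, but label-based lookups such as kept.loc[0] then return more than one row, which is usually not what later code expects.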

Resolve column differences before the result feeds later cleaning or analysis. The default join="outer" keeps every column and fills the gaps with missing values (NaN) where an input does not have that column, while join="inner" keeps only the columns shared by every input. Use axis=1 for side-by-side concatenation only when the row indexes are already aligned.
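
The alignment warning can be illustrated with a short sketch. The frames orders and flags are hypothetical, and their row labels overlap only partly on purpose, so the example shows what axis=1 does when the indexes are not aligned.

```python
import pandas as pd

# Hypothetical inputs whose row labels only partly overlap (1 is shared).
orders = pd.DataFrame({"order_id": ["A100", "A101"]}, index=[0, 1])
flags = pd.DataFrame({"loyalty_tier": ["gold", "silver"]}, index=[1, 2])

# Default join="outer": every label from either input becomes a row,
# and cells with no matching label are filled with missing values.
side_by_side = pd.concat([orders, flags], axis=1)

# join="inner": only labels present in every input are kept.
matched_only = pd.concat([orders, flags], axis=1, join="inner")

print(side_by_side.shape)           # (3, 2)
print(side_by_side.index.tolist())  # [0, 1, 2]
print(matched_only.index.tolist())  # [1]
```

Neither result raises an error, so a shape or index check after the call is the practical way to notice that the inputs were not aligned.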

Steps to concatenate pandas DataFrames:

  1. Save a short concat script as concat_dataframes.py.
    import pandas as pd
     
    january = pd.DataFrame(
        {
            "order_id": ["A100", "A101"],
            "region": ["EMEA", "APAC"],
            "total": [125.0, 98.5],
        },
        index=["jan-0", "jan-1"],
    )
     
    february = pd.DataFrame(
        {
            "order_id": ["A102", "A103"],
            "region": ["EMEA", "AMER"],
            "total": [143.0, 87.0],
        },
        index=["feb-0", "feb-1"],
    )
     
    print(f"pandas {pd.__version__}")
    print()
     
    row_concat = pd.concat([january, february], ignore_index=True)
     
    print("ROW_CONCAT")
    print(row_concat)
    print()
    print("ROW_VERIFY")
    print(f"shape={row_concat.shape}")
    print(f"index={row_concat.index.tolist()}")
    print()
     
    february_extra = february.assign(discount=[5.0, 0.0])
     
    outer_columns = pd.concat([january, february_extra], ignore_index=True, sort=False)
    inner_columns = pd.concat(
        [january, february_extra],
        ignore_index=True,
        join="inner",
    )
     
    print("OUTER_COLUMNS")
    print(outer_columns)
    print()
    print("INNER_COLUMNS")
    print(inner_columns)
    print()
    print("INNER_VERIFY")
    print(inner_columns.columns.tolist())
    print()
     
    customer_flags = pd.DataFrame(
        {"loyalty_tier": ["gold", "silver", "gold", "bronze"]},
        index=row_concat.index,
    )
     
    column_concat = pd.concat([row_concat, customer_flags], axis=1)
     
    print("COLUMN_CONCAT")
    print(column_concat)
    print()
    print("COLUMN_VERIFY")
    print(f"shape={column_concat.shape}")
    print(column_concat.loc[:, ["order_id", "loyalty_tier"]])

    Replace the small DataFrame objects with the loaded extracts or feature tables from the working script. Keep key columns such as order_id visible until the concatenated result has been checked.

  2. Run the concat script.
    $ python3 concat_dataframes.py
    pandas 3.0.3
    
    ROW_CONCAT
      order_id region  total
    0     A100   EMEA  125.0
    1     A101   APAC   98.5
    2     A102   EMEA  143.0
    3     A103   AMER   87.0
    
    ROW_VERIFY
    shape=(4, 3)
    index=[0, 1, 2, 3]
    
    OUTER_COLUMNS
      order_id region  total  discount
    0     A100   EMEA  125.0       NaN
    1     A101   APAC   98.5       NaN
    2     A102   EMEA  143.0       5.0
    3     A103   AMER   87.0       0.0
    
    INNER_COLUMNS
      order_id region  total
    0     A100   EMEA  125.0
    1     A101   APAC   98.5
    2     A102   EMEA  143.0
    3     A103   AMER   87.0
    
    INNER_VERIFY
    ['order_id', 'region', 'total']
    
    COLUMN_CONCAT
      order_id region  total loyalty_tier
    0     A100   EMEA  125.0         gold
    1     A101   APAC   98.5       silver
    2     A102   EMEA  143.0         gold
    3     A103   AMER   87.0       bronze
    
    COLUMN_VERIFY
    shape=(4, 4)
      order_id loyalty_tier
    0     A100         gold
    1     A101       silver
    2     A102         gold
    3     A103       bronze
  3. Stack repeated extracts by row.
    row_concat = pd.concat([january, february], ignore_index=True)

    ignore_index=True discards the original labels from the concatenation axis and returns a fresh RangeIndex.

  4. Keep only shared columns when inputs have different schemas.
    inner_columns = pd.concat(
        [january, february_extra],
        ignore_index=True,
        join="inner",
    )

    Use the default join="outer" when columns that exist in only one input should remain in the output with missing values for the other inputs.

  5. Create the aligned feature table with the same index.
    customer_flags = pd.DataFrame(
        {"loyalty_tier": ["gold", "silver", "gold", "bronze"]},
        index=row_concat.index,
    )
  6. Concatenate the feature table beside the order rows.
    column_concat = pd.concat([row_concat, customer_flags], axis=1)

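    axis=1 aligns rows by index label, so a label that appears in only one input produces an extra row filled with missing values instead of an error. Reset or verify the indexes first when the side-by-side values must stay on the same records.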

  7. Verify the final shape, index, columns, and record identifiers.
    print(row_concat.shape)
    print(row_concat.index.tolist())
    print(inner_columns.columns.tolist())
    print(column_concat.loc[:, ["order_id", "loyalty_tier"]])
  8. Remove the concat script after the behavior is confirmed.
    $ rm concat_dataframes.py
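
In a working script, the checks from the side-by-side and verification steps can be written as guards and assertions instead of printed output. The following is a minimal sketch under that assumption. The helper name concat_side_by_side is hypothetical and is not part of pandas; it only wraps pd.concat() with two index checks.

```python
import pandas as pd


def concat_side_by_side(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: concatenate columns only when row labels match."""
    if not left.index.is_unique or not right.index.is_unique:
        raise ValueError("row labels must be unique before axis=1 concatenation")
    if not left.index.equals(right.index):
        raise ValueError("row indexes differ; reset or reindex before concatenating")
    return pd.concat([left, right], axis=1)


# Small stand-ins for the stacked order rows and the feature table.
orders = pd.DataFrame({"order_id": ["A100", "A101"], "total": [125.0, 98.5]})
flags = pd.DataFrame({"loyalty_tier": ["gold", "silver"]}, index=orders.index)

combined = concat_side_by_side(orders, flags)

# Row count is unchanged and the column count is the sum of both inputs.
assert combined.shape == (len(orders), orders.shape[1] + flags.shape[1])
# Record identifiers are still in their original order.
assert combined["order_id"].tolist() == orders["order_id"].tolist()
assert combined.columns.tolist() == ["order_id", "total", "loyalty_tier"]
```

Raising on a mismatch is one design choice among several. A script that expects partly overlapping labels could pass join="inner" to pd.concat() instead and compare the row counts afterwards.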