Concatenating pandas DataFrame objects combines repeated extracts, monthly files, or aligned feature tables into one object before analysis or export. pandas.concat() is the right tool when the inputs already have compatible columns or indexes and need to be stacked or placed side by side.
The default axis=0 stacks rows and keeps the original row labels unless ignore_index=True asks pandas to build a fresh integer index. That reset is useful for repeated extracts where each input reused labels such as 0 and 1.
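A minimal sketch of that difference, assuming two hypothetical extracts named first and second that both reuse the default labels 0 and 1:

```python
import pandas as pd

# Hypothetical extracts; both carry the default row labels 0 and 1.
first = pd.DataFrame({"order_id": ["A100", "A101"]})
second = pd.DataFrame({"order_id": ["A102", "A103"]})

# Default behaviour: the original row labels are kept, so they repeat.
kept = pd.concat([first, second])
print(kept.index.tolist())    # [0, 1, 0, 1]
print(kept.index.is_unique)   # False

# ignore_index=True builds a fresh integer index instead.
reset = pd.concat([first, second], ignore_index=True)
print(reset.index.tolist())   # [0, 1, 2, 3]
print(reset.index.is_unique)  # True
```

Repeated labels are not an error, but label-based lookups such as kept.loc[0] then return more than one row.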
Check column differences before the result feeds later cleaning or analysis. The default join="outer" keeps every column and fills missing values where an input does not have that column, while join="inner" keeps only the columns shared by every input. Use axis=1 only when the row indexes are already aligned for side-by-side concatenation.
Related: How to create a pandas DataFrame
Related: How to merge pandas DataFrames
Related: How to read CSV files with pandas
import pandas as pd

# Two monthly extracts with the same columns and distinct row labels.
january = pd.DataFrame(
    {
        "order_id": ["A100", "A101"],
        "region": ["EMEA", "APAC"],
        "total": [125.0, 98.5],
    },
    index=["jan-0", "jan-1"],
)
february = pd.DataFrame(
    {
        "order_id": ["A102", "A103"],
        "region": ["EMEA", "AMER"],
        "total": [143.0, 87.0],
    },
    index=["feb-0", "feb-1"],
)

print(f"pandas {pd.__version__}")
print()

# Stack rows and replace the original labels with a fresh integer index.
row_concat = pd.concat([january, february], ignore_index=True)
print("ROW_CONCAT")
print(row_concat)
print()
print("ROW_VERIFY")
print(f"shape={row_concat.shape}")
print(f"index={row_concat.index.tolist()}")
print()

# Add a column to one input to compare outer and inner column handling.
february_extra = february.assign(discount=[5.0, 0.0])
outer_columns = pd.concat([january, february_extra], ignore_index=True, sort=False)
inner_columns = pd.concat(
    [january, february_extra],
    ignore_index=True,
    join="inner",
)
print("OUTER_COLUMNS")
print(outer_columns)
print()
print("INNER_COLUMNS")
print(inner_columns)
print()
print("INNER_VERIFY")
print(inner_columns.columns.tolist())
print()

# Build a feature table on the same index, then place it side by side.
customer_flags = pd.DataFrame(
    {"loyalty_tier": ["gold", "silver", "gold", "bronze"]},
    index=row_concat.index,
)
column_concat = pd.concat([row_concat, customer_flags], axis=1)
print("COLUMN_CONCAT")
print(column_concat)
print()
print("COLUMN_VERIFY")
print(f"shape={column_concat.shape}")
print(column_concat.loc[:, ["order_id", "loyalty_tier"]])
Replace the small DataFrame objects with the loaded extracts or feature tables from the working script. Keep key columns such as order_id visible until the concatenated result has been checked.
$ python3 concat_dataframes.py
pandas 3.0.3

ROW_CONCAT
  order_id region  total
0     A100   EMEA  125.0
1     A101   APAC   98.5
2     A102   EMEA  143.0
3     A103   AMER   87.0

ROW_VERIFY
shape=(4, 3)
index=[0, 1, 2, 3]

OUTER_COLUMNS
  order_id region  total  discount
0     A100   EMEA  125.0       NaN
1     A101   APAC   98.5       NaN
2     A102   EMEA  143.0       5.0
3     A103   AMER   87.0       0.0

INNER_COLUMNS
  order_id region  total
0     A100   EMEA  125.0
1     A101   APAC   98.5
2     A102   EMEA  143.0
3     A103   AMER   87.0

INNER_VERIFY
['order_id', 'region', 'total']

COLUMN_CONCAT
  order_id region  total loyalty_tier
0     A100   EMEA  125.0         gold
1     A101   APAC   98.5       silver
2     A102   EMEA  143.0         gold
3     A103   AMER   87.0       bronze

COLUMN_VERIFY
shape=(4, 4)
  order_id loyalty_tier
0     A100         gold
1     A101       silver
2     A102         gold
3     A103       bronze
row_concat = pd.concat([january, february], ignore_index=True)
ignore_index=True discards the original labels from the concatenation axis and returns a fresh RangeIndex.
inner_columns = pd.concat(
    [january, february_extra],
    ignore_index=True,
    join="inner",
)
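join="inner" keeps only the columns present in every input, so the discount column from february_extra is dropped. Use the default join="outer" when columns that exist in only one input should remain in the output with missing values for the other inputs.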
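One way to see which columns an inner join would discard is to compare the columns of both results before choosing. The sketch below uses simplified one-row stand-ins for january and february_extra; the dropped list is an illustrative helper, not part of the original script.

```python
import pandas as pd

# Simplified stand-ins for the monthly extracts (assumed data).
january = pd.DataFrame({"order_id": ["A100"], "total": [125.0]})
february_extra = pd.DataFrame(
    {"order_id": ["A102"], "total": [143.0], "discount": [5.0]}
)

frames = [january, february_extra]
outer = pd.concat(frames, ignore_index=True)
inner = pd.concat(frames, ignore_index=True, join="inner")

# Columns that survive the outer join but not the inner join.
dropped = [col for col in outer.columns if col not in inner.columns]
print(dropped)                         # ['discount']

# The outer result keeps the column and marks the january row as missing.
print(outer["discount"].isna().sum())  # 1
```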
customer_flags = pd.DataFrame(
    {"loyalty_tier": ["gold", "silver", "gold", "bronze"]},
    index=row_concat.index,
)
column_concat = pd.concat([row_concat, customer_flags], axis=1)
axis=1 aligns rows by index label. Reset or verify indexes first when the side-by-side values must stay on the same records.
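The following sketch shows what happens when the labels do not match, and one possible fix. It assumes two hypothetical two-row tables, orders and flags, whose rows are already in the same order; resetting both indexes is only safe under that assumption.

```python
import pandas as pd

# Hypothetical inputs with the same number of rows but different labels.
orders = pd.DataFrame({"order_id": ["A100", "A101"]}, index=["jan-0", "jan-1"])
flags = pd.DataFrame({"loyalty_tier": ["gold", "silver"]}, index=["feb-0", "feb-1"])

# Labels do not match, so pandas keeps the union of labels and pads with NaN.
misaligned = pd.concat([orders, flags], axis=1)
print(misaligned.shape)             # (4, 2)
print(misaligned.isna().sum().sum())  # 4

# Assumption: row order already matches, so positional labels are safe.
aligned = pd.concat(
    [orders.reset_index(drop=True), flags.reset_index(drop=True)],
    axis=1,
)
print(aligned.shape)                # (2, 2)
print(aligned.isna().any().any())   # False
```

When row order is not guaranteed, a key-based merge on a column such as order_id is usually the safer choice than resetting indexes.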
print(row_concat.shape)
print(row_concat.index.tolist())
print(inner_columns.columns.tolist())
print(column_concat.loc[:, ["order_id", "loyalty_tier"]])
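In a script that runs unattended, the same checks could be written as assertions so that a bad concatenation stops the run instead of printing output that nobody reads. This is a sketch with simplified stand-in data; treating order_id as a unique key is an assumption about the extracts.

```python
import pandas as pd

# Simplified stand-ins for the monthly extracts (assumed data).
january = pd.DataFrame({"order_id": ["A100", "A101"], "total": [125.0, 98.5]})
february = pd.DataFrame({"order_id": ["A102", "A103"], "total": [143.0, 87.0]})

row_concat = pd.concat([january, february], ignore_index=True)

# Every input row should appear exactly once in the result.
assert len(row_concat) == len(january) + len(february)

# ignore_index=True should leave a clean 0..n-1 index.
assert row_concat.index.equals(pd.RangeIndex(len(row_concat)))

# Assumption: order_id is a unique key, so overlapping extracts show up here.
assert row_concat["order_id"].is_unique

print("concat checks passed")
```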
$ rm concat_dataframes.py