Concatenating pandas DataFrame objects combines repeated extracts, monthly files, or aligned feature tables into one object before analysis or export. pandas.concat() is the right tool when the inputs already have compatible columns or indexes and need to be stacked or placed side by side.
The default axis=0 stacks rows and keeps the original row labels unless ignore_index=True asks pandas to build a fresh integer index. That reset is useful for repeated extracts where each input reused labels such as 0 and 1.
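As a minimal sketch of that difference, assume two hypothetical extracts, `first` and `second`, that both carry the default labels 0 and 1. The frame names and values are illustrative only.

```python
import pandas as pd

# Hypothetical extracts; both use the default labels 0 and 1.
first = pd.DataFrame({"order_id": ["A100", "A101"]})
second = pd.DataFrame({"order_id": ["A102", "A103"]})

# Default behavior: the original row labels are kept, so they repeat.
kept = pd.concat([first, second])
print(kept.index.tolist())    # [0, 1, 0, 1]
print(kept.index.is_unique)   # False

# ignore_index=True replaces the labels with a fresh 0..n-1 index.
fresh = pd.concat([first, second], ignore_index=True)
print(fresh.index.tolist())   # [0, 1, 2, 3]
```

Repeated labels are not an error on their own, but a later label-based lookup such as `kept.loc[0]` would return two rows instead of one.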
Check column differences between the inputs before the result feeds later cleaning or analysis. The default join="outer" keeps every column and fills NaN where an input lacks that column, while join="inner" keeps only the columns shared by every input. Use axis=1 only when the row indexes are already aligned for side-by-side concatenation.
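The effect of the two join modes can be sketched with two hypothetical frames where only the second has a `discount` column. The frame names and values below are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical inputs: only the second frame has a "discount" column.
january = pd.DataFrame({"order_id": ["A100", "A101"], "total": [125.0, 98.5]})
february = pd.DataFrame(
    {"order_id": ["A102", "A103"], "total": [143.0, 87.0], "discount": [5.0, 0.0]}
)

# join="outer" is the default, so it is not spelled out here.
outer = pd.concat([january, february], ignore_index=True)
inner = pd.concat([january, february], ignore_index=True, join="inner")

# Columns present in only some inputs survive the outer join...
only_in_outer = [c for c in outer.columns if c not in inner.columns]
print(only_in_outer)                        # ['discount']

# ...and hold NaN for the rows that came from inputs without them.
print(int(outer["discount"].isna().sum()))  # 2
```

Comparing the two column lists this way is one quick check for schema drift between extracts before deciding which join mode fits.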
Related: How to create a pandas DataFrame
Related: How to merge pandas DataFrames
Related: How to read CSV files with pandas
Steps to concatenate pandas DataFrames:
- Save a short concat script.
concat_dataframes.py
import pandas as pd

january = pd.DataFrame(
    {
        "order_id": ["A100", "A101"],
        "region": ["EMEA", "APAC"],
        "total": [125.0, 98.5],
    },
    index=["jan-0", "jan-1"],
)
february = pd.DataFrame(
    {
        "order_id": ["A102", "A103"],
        "region": ["EMEA", "AMER"],
        "total": [143.0, 87.0],
    },
    index=["feb-0", "feb-1"],
)

print(f"pandas {pd.__version__}")
print()

row_concat = pd.concat([january, february], ignore_index=True)
print("ROW_CONCAT")
print(row_concat)
print()
print("ROW_VERIFY")
print(f"shape={row_concat.shape}")
print(f"index={row_concat.index.tolist()}")
print()

february_extra = february.assign(discount=[5.0, 0.0])
outer_columns = pd.concat([january, february_extra], ignore_index=True, sort=False)
inner_columns = pd.concat(
    [january, february_extra],
    ignore_index=True,
    join="inner",
)
print("OUTER_COLUMNS")
print(outer_columns)
print()
print("INNER_COLUMNS")
print(inner_columns)
print()
print("INNER_VERIFY")
print(inner_columns.columns.tolist())
print()

customer_flags = pd.DataFrame(
    {"loyalty_tier": ["gold", "silver", "gold", "bronze"]},
    index=row_concat.index,
)
column_concat = pd.concat([row_concat, customer_flags], axis=1)
print("COLUMN_CONCAT")
print(column_concat)
print()
print("COLUMN_VERIFY")
print(f"shape={column_concat.shape}")
print(column_concat.loc[:, ["order_id", "loyalty_tier"]])
Replace the small DataFrame objects with the loaded extracts or feature tables from the working script. Keep key columns such as order_id visible until the concatenated result has been checked.
- Run the concat script.
$ python3 concat_dataframes.py
pandas 3.0.3

ROW_CONCAT
  order_id region  total
0     A100   EMEA  125.0
1     A101   APAC   98.5
2     A102   EMEA  143.0
3     A103   AMER   87.0

ROW_VERIFY
shape=(4, 3)
index=[0, 1, 2, 3]

OUTER_COLUMNS
  order_id region  total  discount
0     A100   EMEA  125.0       NaN
1     A101   APAC   98.5       NaN
2     A102   EMEA  143.0       5.0
3     A103   AMER   87.0       0.0

INNER_COLUMNS
  order_id region  total
0     A100   EMEA  125.0
1     A101   APAC   98.5
2     A102   EMEA  143.0
3     A103   AMER   87.0

INNER_VERIFY
['order_id', 'region', 'total']

COLUMN_CONCAT
  order_id region  total loyalty_tier
0     A100   EMEA  125.0         gold
1     A101   APAC   98.5       silver
2     A102   EMEA  143.0         gold
3     A103   AMER   87.0       bronze

COLUMN_VERIFY
shape=(4, 4)
  order_id loyalty_tier
0     A100         gold
1     A101       silver
2     A102         gold
3     A103       bronze
- Stack repeated extracts by row.
row_concat = pd.concat([january, february], ignore_index=True)
ignore_index=True discards the original labels from the concatenation axis and returns a fresh RangeIndex.
- Keep only shared columns when inputs have different schemas.
inner_columns = pd.concat(
    [january, february_extra],
    ignore_index=True,
    join="inner",
)
Use the default join="outer" when columns that exist in only one input should remain in the output with missing values for the other inputs.
- Create the aligned feature table with the same index.
customer_flags = pd.DataFrame(
    {"loyalty_tier": ["gold", "silver", "gold", "bronze"]},
    index=row_concat.index,
)
- Concatenate the feature table beside the order rows.
column_concat = pd.concat([row_concat, customer_flags], axis=1)
axis=1 aligns rows by index label. Reset or verify indexes first when the side-by-side values must stay on the same records.
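To illustrate why the index check matters, the sketch below uses a hypothetical feature table whose labels do not match the order rows. It assumes the two frames are already in the same row order, which is the only case where rebuilding positional labels with `reset_index(drop=True)` is safe.

```python
import pandas as pd

orders = pd.DataFrame({"order_id": ["A100", "A101"]}, index=[0, 1])
# Hypothetical feature table whose labels do not match the orders.
flags = pd.DataFrame({"loyalty_tier": ["gold", "silver"]}, index=[10, 11])

# Mismatched labels: the default outer join keeps the union of labels
# and pads each side with NaN instead of raising an error.
misaligned = pd.concat([orders, flags], axis=1)
print(misaligned.shape)                 # (4, 2)

# Assumption: both frames are in the same row order, so positional
# labels can be rebuilt on each side before concatenating.
aligned = pd.concat(
    [orders.reset_index(drop=True), flags.reset_index(drop=True)],
    axis=1,
)
print(aligned.shape)                    # (2, 2)
print(int(aligned.isna().sum().sum()))  # 0
```

A result with more rows than either input is a typical sign that the labels did not line up.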
- Verify the final shape, index, columns, and record identifiers.
print(row_concat.shape)
print(row_concat.index.tolist())
print(inner_columns.columns.tolist())
print(column_concat.loc[:, ["order_id", "loyalty_tier"]])
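For a script that runs unattended, the same checks could be written as assertions instead of printed output. The sketch below is one possible shape for that. The helper name `check_concat` and its parameters are hypothetical, and the small frames stand in for the real extracts.

```python
import pandas as pd


def check_concat(result: pd.DataFrame, expected_rows: int, key: str) -> None:
    """Raise AssertionError if the concatenated frame fails a basic check."""
    # The row count should equal the sum of the input row counts.
    assert len(result) == expected_rows, f"expected {expected_rows} rows, got {len(result)}"
    # Row labels should not repeat when ignore_index=True was used.
    assert result.index.is_unique, "row labels repeat"
    # The record identifier should be present and complete.
    assert key in result.columns, f"missing column {key!r}"
    assert result[key].notna().all(), f"missing values in {key!r}"


# Stand-in data; replace with the real extracts.
january = pd.DataFrame({"order_id": ["A100", "A101"], "total": [125.0, 98.5]})
february = pd.DataFrame({"order_id": ["A102", "A103"], "total": [143.0, 87.0]})
row_concat = pd.concat([january, february], ignore_index=True)

check_concat(row_concat, expected_rows=len(january) + len(february), key="order_id")
print("concat checks passed")
```

Failing fast with an assertion keeps a malformed result from reaching a later export or analysis step unnoticed.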
- Remove the concat script after the behavior is confirmed.
$ rm concat_dataframes.py
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.