Reshaping a pandas DataFrame from wide to long turns repeated measurement columns into row values. Analysts use that shape before grouping by period, plotting categories, or joining measurements to lookup tables.
DataFrame.melt() keeps identifier columns such as region and product while unpivoting selected measurement columns such as q1_sales, q2_sales, and q3_sales. The output contains one row for each identifier and measured column, with separate fields for the source column name and the measured value.
Name the output fields with var_name and value_name so the long table matches the downstream data model. If the original index carries information, move it into a column before melting or set ignore_index=False and confirm that repeated index labels will not confuse later joins.
Related: How to create a pandas DataFrame
Related: How to aggregate data with pandas groupby
Related: How to create a pivot table in pandas
import pandas as pd wide = pd.DataFrame( { "region": ["North", "South"], "product": ["Widget", "Widget"], "q1_sales": [120, 90], "q2_sales": [135, 104], "q3_sales": [148, 110], } ) id_columns = ["region", "product"] value_columns = ["q1_sales", "q2_sales", "q3_sales"] long = wide.melt( id_vars=id_columns, value_vars=value_columns, var_name="quarter", value_name="sales_usd", ) expected_rows = len(wide) * len(value_columns) print(f"pandas {pd.__version__}") print() print("WIDE_DATA") print(wide.to_string(index=False)) print() print("LONG_DATA") print(long.to_string(index=False)) print() print("VERIFY_SHAPE") print(f"wide rows: {len(wide)}") print(f"value columns: {len(value_columns)}") print(f"expected long rows: {expected_rows}") print(f"actual long rows: {len(long)}") print(f"columns: {', '.join(long.columns)}") print() print("SALES_BY_QUARTER") print(long.groupby("quarter", sort=False)["sales_usd"].sum().to_string())
id_columns stay fixed, while value_columns become rows. Keep value_name different from any existing column label.
$ python3 melt_sales.py pandas 3.0.3 WIDE_DATA region product q1_sales q2_sales q3_sales North Widget 120 135 148 South Widget 90 104 110 LONG_DATA region product quarter sales_usd North Widget q1_sales 120 South Widget q1_sales 90 North Widget q2_sales 135 South Widget q2_sales 104 North Widget q3_sales 148 South Widget q3_sales 110 VERIFY_SHAPE wide rows: 2 value columns: 3 expected long rows: 6 actual long rows: 6 columns: region, product, quarter, sales_usd SALES_BY_QUARTER quarter q1_sales 210 q2_sales 239 q3_sales 258
id_columns = ["region", "product"] long = wide.melt( id_vars=id_columns, value_vars=value_columns, var_name="quarter", value_name="sales_usd", )
Identifier columns are copied into each long-form row. Use customer, product, timestamp, or other keys that explain each measurement.
value_columns = ["q1_sales", "q2_sales", "q3_sales"]
If value_vars is omitted, pandas melts every column not listed in id_vars.
var_name="quarter", value_name="sales_usd"
value_name cannot match an existing column label.
expected_rows = len(wide) * len(value_columns) print(f"expected long rows: {expected_rows}") print(f"actual long rows: {len(long)}") print(f"columns: {', '.join(long.columns)}")
The long table should have one row for every original row and selected measurement column.
print(long.groupby("quarter", sort=False)["sales_usd"].sum().to_string())
$ rm melt_sales.py