Creating columns in pandas adds calculated fields, labels, or fixed values to an existing DataFrame before analysis, aggregation, or export. A new column can come from arithmetic between existing columns, a scalar repeated for every row, or a vectorized expression that marks rows matching a condition.
Direct assignment with df["name"] = value updates the current DataFrame and appends the column at the end. DataFrame.assign() returns a new DataFrame, which fits method chains and can build multiple columns in order when a later column depends on an earlier one.
Use a scalar, full-length list or array, aligned Series, or vectorized expression for the new values. With pandas 3 Copy-on-Write, chained assignment such as df["col"][mask] = value does not update the parent DataFrame; use .loc for conditional column values instead.
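The difference between a full-length list and an aligned Series can be sketched as follows. This is a minimal illustration, not part of the main script: the non-default index labels (10, 11, 12), the partial Series, and the bad column name are assumptions chosen to make the alignment visible.

```python
import pandas as pd

# Hypothetical frame with non-default index labels so alignment is visible.
df = pd.DataFrame({"item": ["notebook", "pencil", "eraser"]}, index=[10, 11, 12])

# A full-length list is matched to rows by position.
df["qty"] = [3, 10, 5]

# A Series is matched by index label, not by position.
# Label 12 has no entry in the Series, so that row receives NaN.
partial = pd.Series([2.5, 0.4], index=[11, 10])
df["unit_price"] = partial

# A list whose length differs from the number of rows is rejected.
length_error = None
try:
    df["bad"] = [1, 2]
except ValueError as exc:
    length_error = str(exc)

print(df)
print(length_error)
```

Because a Series aligns on labels, reordering it does not change which row gets which value, while a plain list depends entirely on row order.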
Related: How to create a pandas DataFrame
Related: How to rename columns in pandas
Related: How to convert data types in pandas
Steps to create pandas DataFrame columns:
- Save a short column creation script.
- create_columns.py
import pandas as pd

df = pd.DataFrame(
    {
        "item": ["notebook", "pencil", "eraser"],
        "qty": [3, 10, 5],
        "unit_price": [2.5, 0.4, 0.8],
    }
)

print(f"pandas {pd.__version__}")
print()
print("BASE")
print(df)
print()

df["line_total"] = df["qty"] * df["unit_price"]
print("LINE_TOTAL")
print(df)
print()

df["currency"] = "USD"
print("CURRENCY")
currency_cols = ["item", "line_total", "currency"]
print(df[currency_cols])
print()

df["bulk_order"] = False
print("BULK_DEFAULT")
bulk_cols = ["item", "qty", "bulk_order"]
print(df[bulk_cols])
print()

df.loc[df["qty"] >= 10, "bulk_order"] = True
print("BULK_LOC")
print(df[bulk_cols])
print()

result = df.assign(
    discount=lambda data: (data["line_total"] * 0.10).where(data["bulk_order"], 0),
    net_total=lambda data: data["line_total"] - data["discount"],
)
print("ASSIGN")
summary_cols = ["item", "line_total", "discount", "net_total"]
print(result[summary_cols])
print()

result.insert(1, "sku", ["N-100", "P-200", "E-300"])
print("INSERT")
insert_cols = ["item", "sku", "qty", "line_total", "net_total"]
print(result[insert_cols])
print()

print("VERIFY_COLUMNS")
print(result.columns.tolist())
print()

print("VERIFY_DTYPES")
dtype_cols = ["line_total", "bulk_order", "net_total"]
print(result[dtype_cols].dtypes)
Replace the small df with the DataFrame already loaded in the working script. Keep column names unique unless duplicate labels are intentional.
- Run the script and confirm the new columns, values, and dtypes.
$ python3 create_columns.py
pandas 3.0.3

BASE
       item  qty  unit_price
0  notebook    3         2.5
1    pencil   10         0.4
2    eraser    5         0.8

LINE_TOTAL
       item  qty  unit_price  line_total
0  notebook    3         2.5         7.5
1    pencil   10         0.4         4.0
2    eraser    5         0.8         4.0

CURRENCY
       item  line_total currency
0  notebook         7.5      USD
1    pencil         4.0      USD
2    eraser         4.0      USD

BULK_DEFAULT
       item  qty  bulk_order
0  notebook    3       False
1    pencil   10       False
2    eraser    5       False

BULK_LOC
       item  qty  bulk_order
0  notebook    3       False
1    pencil   10        True
2    eraser    5       False

ASSIGN
       item  line_total  discount  net_total
0  notebook         7.5       0.0        7.5
1    pencil         4.0       0.4        3.6
2    eraser         4.0       0.0        4.0

INSERT
       item    sku  qty  line_total  net_total
0  notebook  N-100    3         7.5        7.5
1    pencil  P-200   10         4.0        3.6
2    eraser  E-300    5         4.0        4.0

VERIFY_COLUMNS
['item', 'sku', 'qty', 'unit_price', 'line_total', 'currency', 'bulk_order', 'discount', 'net_total']

VERIFY_DTYPES
line_total    float64
bulk_order       bool
net_total     float64
dtype: object
- Add a calculated column with direct assignment when the current DataFrame should be updated.
df["line_total"] = df["qty"] * df["unit_price"]
Direct assignment appends the column to the end of df.columns when the label is new.
- Add a scalar column when every row needs the same value.
df["currency"] = "USD"
pandas broadcasts a scalar to every row in the DataFrame.
- Initialize a conditional column before setting selected rows.
df["bulk_order"] = False
- Set selected rows with .loc.
df.loc[df["qty"] >= 10, "bulk_order"] = True
.loc updates the parent DataFrame in one assignment, which matches pandas 3 Copy-on-Write rules.
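When the flag depends only on a comparison, the default-then-.loc pattern can often be collapsed into a single vectorized expression. The sketch below is one possible alternative, not the method used in the main script; the order_size column and its "bulk" and "standard" labels are hypothetical names added for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"item": ["notebook", "pencil", "eraser"], "qty": [3, 10, 5]}
)

# The comparison returns a boolean Series aligned with df,
# so no default value has to be assigned first.
df["bulk_order"] = df["qty"] >= 10

# numpy.where picks one of two values per row based on the condition.
# "order_size" and its labels are hypothetical, used only for this example.
df["order_size"] = np.where(df["qty"] >= 10, "bulk", "standard")

print(df)
```

The .loc form remains the better fit when only some rows should change and the rest must keep their existing values.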
Related: How to migrate pandas code for Copy-on-Write
- Use DataFrame.assign() when the result should stay in a method chain.
result = df.assign(
    discount=lambda data: (data["line_total"] * 0.10).where(data["bulk_order"], 0),
    net_total=lambda data: data["line_total"] - data["discount"],
)
DataFrame.assign() returns a new DataFrame. Reassign it back to df if the original variable should include the new columns.
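A small sketch of that behavior, using a reduced two-row frame as an assumption: the original df keeps its columns, and a later keyword in the same assign() call can read a column created by an earlier keyword. The doubled column is a hypothetical name used only here.

```python
import pandas as pd

df = pd.DataFrame({"qty": [3, 10], "unit_price": [2.5, 0.4]})

result = df.assign(
    line_total=lambda data: data["qty"] * data["unit_price"],
    # Keywords are evaluated in order, so this lambda can use line_total.
    # "doubled" is a hypothetical column for illustration.
    doubled=lambda data: data["line_total"] * 2,
)

print(df.columns.tolist())      # original frame is unchanged
print(result.columns.tolist())  # new frame carries the added columns
```

Writing df = df.assign(...) instead of result = df.assign(...) is the usual way to keep working under the original variable name.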
- Insert a column at a specific position when column order matters.
result.insert(1, "sku", ["N-100", "P-200", "E-300"])
DataFrame.insert() modifies the DataFrame in place and raises ValueError when the column label already exists, unless allow_duplicates=True is passed.
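The duplicate-label behavior can be checked with a short sketch. The two-row frame and the replacement values "X" and "Y" are assumptions for illustration; the example also shows that insert() changes the DataFrame in place rather than returning a new one.

```python
import pandas as pd

df = pd.DataFrame({"item": ["notebook", "pencil"], "qty": [3, 10]})

# insert() works in place and returns None.
returned = df.insert(1, "sku", ["N-100", "P-200"])

# Inserting the same label again is rejected by default.
duplicate_error = None
try:
    df.insert(0, "sku", ["X", "Y"])
except ValueError as exc:
    duplicate_error = str(exc)

# Duplicates are only accepted when requested explicitly (done on a copy here).
dup = df.copy()
dup.insert(0, "sku", ["X", "Y"], allow_duplicates=True)

print(df.columns.tolist())
print(dup.columns.tolist())
```

Duplicate labels make df["sku"] return a DataFrame instead of a Series, so the default refusal is usually the safer behavior.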
- Verify the final column order and important dtypes.
print(result.columns.tolist())
dtype_cols = ["line_total", "bulk_order", "net_total"]
print(result[dtype_cols].dtypes)
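In a longer script, the same verification can be done with checks instead of reading printed output. The sketch below assumes a reduced stand-in for result and a hypothetical expected_order list; the dtype helpers come from pandas.api.types.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_float_dtype

# Reduced stand-in for the result frame, assumed for illustration.
result = pd.DataFrame(
    {
        "item": ["notebook", "pencil"],
        "line_total": [7.5, 4.0],
        "bulk_order": [False, True],
    }
)

# Hypothetical expected column order for this reduced frame.
expected_order = ["item", "line_total", "bulk_order"]

order_ok = result.columns.tolist() == expected_order
dtypes_ok = is_float_dtype(result["line_total"]) and is_bool_dtype(result["bulk_order"])

print(order_ok, dtypes_ok)
```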
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.