Creating columns in pandas adds calculated fields, labels, or fixed values to an existing DataFrame before analysis, aggregation, or export. A new column can come from arithmetic between existing columns, a scalar repeated for every row, or a vectorized expression that marks rows matching a condition.
Direct assignment with df["name"] = value updates the current DataFrame and appends the column at the end. DataFrame.assign() returns a new DataFrame, which fits method chains and can build multiple columns in order when a later column depends on an earlier one.
Use a scalar, full-length list or array, aligned Series, or vectorized expression for the new values. With pandas 3 Copy-on-Write, chained assignment such as df["col"][mask] = value does not update the parent DataFrame; use .loc for conditional column values instead.
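An aligned Series is matched on index labels rather than on row position, which can surprise when the index is not the default 0..n-1 range. The following is a minimal sketch of that behavior; the index labels and the from_list and from_series column names are made up for illustration.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"item": ["notebook", "pencil", "eraser"]}, index=[10, 20, 30])

# A list is assigned by position and must have one value per row.
df["from_list"] = [1.0, 2.0, 3.0]

# A Series is aligned on index labels: label 30 comes first here, and
# label 20 is missing, so that row receives NaN.
values = pd.Series([3.0, 1.0], index=[30, 10])
df["from_series"] = values

print(df)
```

If positional assignment is wanted for a Series, converting it with .to_numpy() or .tolist() first is one way to sidestep alignment.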
Related: How to create a pandas DataFrame
Related: How to rename columns in pandas
Related: How to convert data types in pandas
import pandas as pd

df = pd.DataFrame(
    {
        "item": ["notebook", "pencil", "eraser"],
        "qty": [3, 10, 5],
        "unit_price": [2.5, 0.4, 0.8],
    }
)

print(f"pandas {pd.__version__}")
print()

print("BASE")
print(df)
print()

df["line_total"] = df["qty"] * df["unit_price"]
print("LINE_TOTAL")
print(df)
print()

df["currency"] = "USD"
print("CURRENCY")
currency_cols = ["item", "line_total", "currency"]
print(df[currency_cols])
print()

df["bulk_order"] = False
print("BULK_DEFAULT")
bulk_cols = ["item", "qty", "bulk_order"]
print(df[bulk_cols])
print()

df.loc[df["qty"] >= 10, "bulk_order"] = True
print("BULK_LOC")
print(df[bulk_cols])
print()

result = df.assign(
    discount=lambda data: (data["line_total"] * 0.10).where(data["bulk_order"], 0),
    net_total=lambda data: data["line_total"] - data["discount"],
)
print("ASSIGN")
summary_cols = ["item", "line_total", "discount", "net_total"]
print(result[summary_cols])
print()

result.insert(1, "sku", ["N-100", "P-200", "E-300"])
print("INSERT")
insert_cols = ["item", "sku", "qty", "line_total", "net_total"]
print(result[insert_cols])
print()

print("VERIFY_COLUMNS")
print(result.columns.tolist())
print()

print("VERIFY_DTYPES")
dtype_cols = ["line_total", "bulk_order", "net_total"]
print(result[dtype_cols].dtypes)
Replace the small df with the DataFrame already loaded in the working script. Keep column names unique unless duplicate labels are intentional.
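Before adding columns to an existing DataFrame, it can help to confirm that the current labels are unique. The sketch below uses a small made-up frame with an accidental duplicate label; Index.is_unique and Index.duplicated() are standard pandas attributes, while the data itself is hypothetical.

```python
import pandas as pd

# Hypothetical frame with an accidental duplicate label.
df = pd.DataFrame([[1, 2, 3]], columns=["qty", "price", "qty"])

# Index.is_unique is False when any label repeats.
print(df.columns.is_unique)

# Index.duplicated() marks the second and later occurrences of a label.
duplicates = df.columns[df.columns.duplicated()].tolist()
print(duplicates)
```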
$ python3 create_columns.py
pandas 3.0.3

BASE
       item  qty  unit_price
0  notebook    3         2.5
1    pencil   10         0.4
2    eraser    5         0.8

LINE_TOTAL
       item  qty  unit_price  line_total
0  notebook    3         2.5         7.5
1    pencil   10         0.4         4.0
2    eraser    5         0.8         4.0

CURRENCY
       item  line_total  currency
0  notebook         7.5       USD
1    pencil         4.0       USD
2    eraser         4.0       USD

BULK_DEFAULT
       item  qty  bulk_order
0  notebook    3       False
1    pencil   10       False
2    eraser    5       False

BULK_LOC
       item  qty  bulk_order
0  notebook    3       False
1    pencil   10        True
2    eraser    5       False

ASSIGN
       item  line_total  discount  net_total
0  notebook         7.5       0.0        7.5
1    pencil         4.0       0.4        3.6
2    eraser         4.0       0.0        4.0

INSERT
       item    sku  qty  line_total  net_total
0  notebook  N-100    3         7.5        7.5
1    pencil  P-200   10         4.0        3.6
2    eraser  E-300    5         4.0        4.0

VERIFY_COLUMNS
['item', 'sku', 'qty', 'unit_price', 'line_total', 'currency', 'bulk_order', 'discount', 'net_total']

VERIFY_DTYPES
line_total    float64
bulk_order       bool
net_total     float64
dtype: object
df["line_total"] = df["qty"] * df["unit_price"]
Direct assignment appends the column to the end of df.columns when the label is new.
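When the label already exists, direct assignment replaces the values and the column keeps its position. A minimal sketch of both cases, using a made-up note column as a placeholder for other existing data:

```python
import pandas as pd

# Hypothetical data; "note" stands in for any other existing column.
df = pd.DataFrame({"qty": [3, 10], "unit_price": [2.5, 0.4], "note": ["a", "b"]})

# New label: the column is appended after the existing ones.
df["line_total"] = df["qty"] * df["unit_price"]

# Existing label: the values are replaced and the column keeps its position.
df["unit_price"] = df["unit_price"] * 2

print(df.columns.tolist())
```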
df["currency"] = "USD"
pandas broadcasts a scalar to every row in the DataFrame.
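A list, unlike a scalar, is not broadcast: it must supply one value per row. The sketch below contrasts the two cases; the region column and its values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"item": ["notebook", "pencil", "eraser"]})

# A scalar is repeated for every row.
df["currency"] = "USD"

# A list must supply exactly one value per row; a shorter list raises ValueError.
try:
    df["region"] = ["EU", "US"]
except ValueError as exc:
    error_message = str(exc)
    print(f"ValueError: {error_message}")
```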
df["bulk_order"] = False
df.loc[df["qty"] >= 10, "bulk_order"] = True
.loc updates the parent DataFrame in one assignment, which matches pandas 3 Copy-on-Write rules.
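When the flag depends on a single comparison, the default-then-update pattern can be collapsed into one vectorized expression. This is one possible alternative sketch, not a replacement for .loc in general; numpy.where is a standard NumPy function, and the order_size column and its labels are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"item": ["notebook", "pencil", "eraser"], "qty": [3, 10, 5]})

# The comparison already yields a boolean Series, so it can be assigned directly.
df["bulk_order"] = df["qty"] >= 10

# numpy.where picks one of two values per row; "order_size" is a made-up column.
df["order_size"] = np.where(df["qty"] >= 10, "bulk", "regular")

print(df)
```

The .loc form remains the better fit when only some rows of an existing column should change.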
Related: How to migrate pandas code for Copy-on-Write
result = df.assign(
    discount=lambda data: (data["line_total"] * 0.10).where(data["bulk_order"], 0),
    net_total=lambda data: data["line_total"] - data["discount"],
)
DataFrame.assign() returns a new DataFrame. Reassign it back to df if the original variable should include the new columns.
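The following sketch illustrates both points on a smaller frame: the original DataFrame keeps its columns, and a later keyword can read a column created by an earlier one. The doubled column is a hypothetical name used only for this example.

```python
import pandas as pd

df = pd.DataFrame({"qty": [3, 10], "unit_price": [2.5, 0.4]})

# assign() evaluates keyword arguments in order, so "doubled" can read
# the "line_total" column created just before it. Both names are illustrative.
result = df.assign(
    line_total=lambda data: data["qty"] * data["unit_price"],
    doubled=lambda data: data["line_total"] * 2,
)

print(df.columns.tolist())      # the original is unchanged
print(result.columns.tolist())  # the copy carries both new columns
```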
result.insert(1, "sku", ["N-100", "P-200", "E-300"])
DataFrame.insert() modifies the DataFrame in place and raises ValueError when the column label already exists, unless allow_duplicates=True is passed.
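Because insert() changes the DataFrame in place, rerunning the same cell or script can hit the duplicate-label error. A minimal sketch of the failure and one possible guard, assuming the same sku label; the duplicate_rejected flag and the X-1 and X-2 values are only there for illustration.

```python
import pandas as pd

df = pd.DataFrame({"item": ["notebook", "pencil"], "qty": [3, 10]})
df.insert(1, "sku", ["N-100", "P-200"])

# Inserting the same label a second time raises ValueError by default.
try:
    df.insert(0, "sku", ["X-1", "X-2"])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

# Guarding on the label avoids the exception in rerunnable scripts.
if "sku" not in df.columns:
    df.insert(1, "sku", ["N-100", "P-200"])

print(duplicate_rejected, df.columns.tolist())
```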
print(result.columns.tolist())
dtype_cols = ["line_total", "bulk_order", "net_total"]
print(result[dtype_cols].dtypes)
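In a script that runs unattended, the same verification could be expressed as assertions instead of printed output. This is a rough sketch under the assumption that the three columns should be float, bool, and float; the small result frame here is a stand-in for the real one, and the helpers come from pandas.api.types.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_float_dtype

# Stand-in for the real result DataFrame; values are illustrative.
result = pd.DataFrame(
    {"line_total": [7.5, 4.0], "bulk_order": [False, True], "net_total": [7.5, 3.6]}
)

# Hypothetical expectations; adjust the labels to the real DataFrame.
expected = ["line_total", "bulk_order", "net_total"]
missing = [name for name in expected if name not in result.columns]

assert not missing, f"missing columns: {missing}"
assert is_float_dtype(result["line_total"])
assert is_bool_dtype(result["bulk_order"])
```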