How to create columns in pandas

Creating columns in pandas adds calculated fields, labels, or fixed values to an existing DataFrame before analysis, aggregation, or export. A new column can come from arithmetic between existing columns, a scalar repeated for every row, or a vectorized expression that marks rows matching a condition.

Direct assignment with df["name"] = value modifies the current DataFrame and appends the column at the end when the label is new. DataFrame.assign() returns a new DataFrame instead, which fits method chains. It creates its keyword columns in order, so a later column can depend on an earlier one.
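The difference can be sketched on a hypothetical three-row DataFrame: assign() leaves the original untouched, while direct assignment changes it.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({"qty": [3, 10, 5], "unit_price": [2.5, 0.4, 0.8]})

# assign() builds and returns a new DataFrame; df keeps its original columns
with_total = df.assign(line_total=lambda d: d["qty"] * d["unit_price"])
print(df.columns.tolist())          # ['qty', 'unit_price']
print(with_total.columns.tolist())  # ['qty', 'unit_price', 'line_total']

# Direct assignment modifies df itself and appends the new label last
df["line_total"] = df["qty"] * df["unit_price"]
print(df.columns.tolist())          # ['qty', 'unit_price', 'line_total']
```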

Use a scalar, a list or array with one value per row, a Series aligned on the DataFrame index, or a vectorized expression for the new values. With pandas 3 Copy-on-Write, chained assignment such as df["col"][mask] = value does not update the parent DataFrame; use a single .loc assignment for conditional column values instead.
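The sketch below shows each kind of value on hypothetical data with string index labels. The detail worth noticing is that a Series is matched to rows by index label rather than by position, so a label missing from the Series becomes NaN, while a list or array of the wrong length is rejected.

```python
import numpy as np
import pandas as pd

# Hypothetical data with string index labels so alignment is visible
df = pd.DataFrame({"qty": [3, 10, 5]}, index=["a", "b", "c"])

df["scalar"] = 1                                 # scalar: repeated on every row
df["from_list"] = [7, 8, 9]                      # list: one value per row, in order
df["from_array"] = np.array([0.1, 0.2, 0.3])     # array: same length rule as a list

# Series: matched by index label, not position; "c" is absent, so it gets NaN
df["from_series"] = pd.Series([100, 200], index=["b", "a"])

df["is_large"] = df["qty"] > 4                   # vectorized expression -> bool column
print(df)

# A list whose length differs from the number of rows raises ValueError
try:
    df["bad"] = [1, 2]
except ValueError as exc:
    print("ValueError:", exc)
```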

Steps to create pandas DataFrame columns:

Save a short column creation script.

create_columns.py

```
import pandas as pd

df = pd.DataFrame(
    {
        "item": ["notebook", "pencil", "eraser"],
        "qty": [3, 10, 5],
        "unit_price": [2.5, 0.4, 0.8],
    }
)

print(f"pandas {pd.__version__}")
print()

print("BASE")
print(df)
print()

df["line_total"] = df["qty"] * df["unit_price"]

print("LINE_TOTAL")
print(df)
print()

df["currency"] = "USD"

print("CURRENCY")
currency_cols = ["item", "line_total", "currency"]
print(df[currency_cols])
print()

df["bulk_order"] = False

print("BULK_DEFAULT")
bulk_cols = ["item", "qty", "bulk_order"]
print(df[bulk_cols])
print()

df.loc[df["qty"] >= 10, "bulk_order"] = True

print("BULK_LOC")
print(df[bulk_cols])
print()

result = df.assign(
    discount=lambda data: (data["line_total"] * 0.10).where(data["bulk_order"], 0),
    net_total=lambda data: data["line_total"] - data["discount"],
)

print("ASSIGN")
summary_cols = ["item", "line_total", "discount", "net_total"]
print(result[summary_cols])
print()

result.insert(1, "sku", ["N-100", "P-200", "E-300"])

print("INSERT")
insert_cols = ["item", "sku", "qty", "line_total", "net_total"]
print(result[insert_cols])
print()

print("VERIFY_COLUMNS")
print(result.columns.tolist())
print()

print("VERIFY_DTYPES")
dtype_cols = ["line_total", "bulk_order", "net_total"]
print(result[dtype_cols].dtypes)
```

Replace the small df with the DataFrame already loaded in the working script. Keep column names unique unless duplicate labels are intentional.
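One reason to keep labels unique: when a label is duplicated, df["name"] returns a DataFrame instead of a Series, which breaks code that expects a single column. A small check like this sketch, shown on a hypothetical frame, can catch the problem after loading real data.

```python
import pandas as pd

# Hypothetical frame that accidentally has two "qty" columns
df = pd.DataFrame([[3, 7.5, 4]], columns=["qty", "line_total", "qty"])

print(df.columns.is_unique)                          # False
print(df.columns[df.columns.duplicated()].tolist())  # ['qty']

# Selecting a duplicated label returns every matching column as a DataFrame
print(type(df["qty"]).__name__)                      # DataFrame
```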

Run the script and confirm the new columns, values, and dtypes.

```
$ python3 create_columns.py
pandas 3.0.3

BASE
       item  qty  unit_price
0  notebook    3         2.5
1    pencil   10         0.4
2    eraser    5         0.8

LINE_TOTAL
       item  qty  unit_price  line_total
0  notebook    3         2.5         7.5
1    pencil   10         0.4         4.0
2    eraser    5         0.8         4.0

CURRENCY
       item  line_total currency
0  notebook         7.5      USD
1    pencil         4.0      USD
2    eraser         4.0      USD

BULK_DEFAULT
       item  qty  bulk_order
0  notebook    3       False
1    pencil   10       False
2    eraser    5       False

BULK_LOC
       item  qty  bulk_order
0  notebook    3       False
1    pencil   10        True
2    eraser    5       False

ASSIGN
       item  line_total  discount  net_total
0  notebook         7.5       0.0        7.5
1    pencil         4.0       0.4        3.6
2    eraser         4.0       0.0        4.0

INSERT
       item    sku  qty  line_total  net_total
0  notebook  N-100    3         7.5        7.5
1    pencil  P-200   10         4.0        3.6
2    eraser  E-300    5         4.0        4.0

VERIFY_COLUMNS
['item', 'sku', 'qty', 'unit_price', 'line_total', 'currency', 'bulk_order', 'discount', 'net_total']

VERIFY_DTYPES
line_total    float64
bulk_order       bool
net_total     float64
dtype: object
```

Add a calculated column with direct assignment when the current DataFrame should be updated.
```
df["line_total"] = df["qty"] * df["unit_price"]
```
Direct assignment appends the column to the end of df.columns when the label is new.
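When the label already exists, the same syntax replaces that column instead, and the column keeps its position. A quick check on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"qty": [3, 10, 5], "unit_price": [2.5, 0.4, 0.8]})

df["line_total"] = df["qty"] * df["unit_price"]  # new label: appended last
df["qty"] = df["qty"] * 2                        # existing label: replaced where it is

print(df.columns.tolist())        # ['qty', 'unit_price', 'line_total']
print(df["qty"].tolist())         # [6, 20, 10]
print(df["line_total"].tolist())  # [7.5, 4.0, 4.0]
```

line_total still holds the values computed before qty changed: an assigned column stores the result once and is not recalculated when its source columns change.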
Add a scalar column when every row needs the same value.
```
df["currency"] = "USD"
```
pandas broadcasts a scalar to every row in the DataFrame.
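The broadcast column takes its dtype from the scalar, which is why bulk_order later reports bool. A short sketch on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"item": ["notebook", "pencil", "eraser"]})

df["count"] = 0      # int scalar -> integer column
df["rate"] = 0.1     # float scalar -> float64 column
df["flag"] = False   # bool scalar -> bool column

print(df.dtypes)
```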
Initialize a conditional column before setting selected rows.
```
df["bulk_order"] = False
```
Set selected rows with .loc.
```
df.loc[df["qty"] >= 10, "bulk_order"] = True
```
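.loc selects the rows and the column in a single assignment, so the change is written to df itself. That keeps it working under pandas 3 Copy-on-Write, where chained assignment does not update the parent DataFrame.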
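When every row gets one of two values, the condition can also build the column in one step. This sketch, on hypothetical data, assigns the boolean comparison directly and uses numpy.where for a two-way text label; both are alternatives to the default-then-.loc pattern, which remains the better fit when only some rows should change and the rest must keep their existing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"item": ["notebook", "pencil", "eraser"], "qty": [3, 10, 5]})

# The comparison already yields one True/False value per row
df["bulk_order"] = df["qty"] >= 10

# np.where takes the second argument where the condition holds, else the third
df["size_label"] = np.where(df["qty"] >= 10, "bulk", "single")

print(df)
```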

Use DataFrame.assign() when the result should stay in a method chain.

```
result = df.assign(
    discount=lambda data: (data["line_total"] * 0.10).where(data["bulk_order"], 0),
    net_total=lambda data: data["line_total"] - data["discount"],
)
```

DataFrame.assign() returns a new DataFrame. Reassign it back to df if the original variable should include the new columns.
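Inside a chain, the intermediate DataFrame has no variable name, which is why assign() accepts callables: each lambda receives the frame as it exists at that point in the chain. A minimal sketch on hypothetical data; the filter step is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"item": ["notebook", "pencil", "eraser"], "qty": [3, 10, 5], "unit_price": [2.5, 0.4, 0.8]}
)

report = (
    df.assign(line_total=lambda d: d["qty"] * d["unit_price"])
    .loc[lambda d: d["qty"] >= 5]  # keep larger orders only
    .assign(share=lambda d: d["line_total"] / d["line_total"].sum())  # d is the filtered frame
)

print(report)
print("line_total" in df.columns)  # False: df itself was not modified
```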

Insert a column at a specific position when column order matters.
```
result.insert(1, "sku", ["N-100", "P-200", "E-300"])
```
DataFrame.insert() modifies the DataFrame in place and returns None, so do not assign its result. It raises ValueError when the column label already exists unless allow_duplicates=True is passed.
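A short sketch of both behaviors on hypothetical data: insert() works in place, rejects an existing label by default, and accepts it only with allow_duplicates=True.

```python
import pandas as pd

df = pd.DataFrame({"item": ["notebook", "pencil"], "qty": [3, 10]})

ret = df.insert(1, "sku", ["N-100", "P-200"])  # modifies df in place
print(ret)                   # None
print(df.columns.tolist())   # ['item', 'sku', 'qty']

try:
    df.insert(0, "sku", ["X", "Y"])            # "sku" already exists
except ValueError as exc:
    print("ValueError:", exc)

# allow_duplicates=True permits a second column with the same label
df.insert(0, "sku", ["X", "Y"], allow_duplicates=True)
print(df.columns.tolist())   # ['sku', 'item', 'sku', 'qty']
```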

Verify the final column order and important dtypes.

```
print(result.columns.tolist())
dtype_cols = ["line_total", "bulk_order", "net_total"]
print(result[dtype_cols].dtypes)
```
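To turn the visual check into something a script can enforce, the expected columns and dtypes can be written down and compared. A minimal sketch: the expected mapping is a hypothetical example to adapt, and the small result frame stands in for the real one.

```python
import pandas as pd

# Stand-in for the real result DataFrame
result = pd.DataFrame(
    {"line_total": [7.5, 4.0, 4.0], "bulk_order": [False, True, False], "net_total": [7.5, 3.6, 4.0]}
)

# Hypothetical expectations: column label -> dtype name
expected = {"line_total": "float64", "bulk_order": "bool", "net_total": "float64"}

missing = [col for col in expected if col not in result.columns]
actual = {col: str(dtype) for col, dtype in result.dtypes.items()}
mismatched = {
    col: actual[col] for col in expected if col in actual and actual[col] != expected[col]
}

# Fail loudly instead of relying on someone reading the printed dtypes
if missing or mismatched:
    raise ValueError(f"missing={missing} mismatched={mismatched}")
print("columns and dtypes OK")
```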

Author: Mohd Shakir Zakaria
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.