Creating columns in pandas adds calculated fields, labels, or fixed values to an existing DataFrame before analysis, aggregation, or export. A new column can come from arithmetic between existing columns, a scalar repeated for every row, or a vectorized expression that marks rows matching a condition.

Direct assignment with df["name"] = value updates the current DataFrame in place; a new label is appended as the last column, while an existing label has its values replaced. DataFrame.assign() returns a new DataFrame, which fits method chains and can build multiple columns in order when a later column depends on an earlier one.

Use a scalar, full-length list or array, aligned Series, or vectorized expression for the new values. With pandas 3 Copy-on-Write, chained assignment such as df["col"][mask] = value does not update the parent DataFrame; use .loc for conditional column values instead.
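
The difference between a list and a Series as the new value is easy to miss, so here is a minimal sketch of it. The labels "a", "b", "c" and the stock column are made up for illustration. A Series is matched to rows by index label, while a list is matched by position and must have exactly one value per row.

```python
import pandas as pd

# Hypothetical frame with string index labels to make alignment visible.
df = pd.DataFrame({"qty": [3, 10, 5]}, index=["a", "b", "c"])

# A Series is aligned on index labels, not on position.
stock = pd.Series({"c": 50, "a": 20})
df["stock"] = stock

# Row "b" has no matching label in the Series, so it receives NaN.
print(df)

# A list is positional and must have the same length as the DataFrame.
length_error = None
try:
    df["bad"] = [1, 2]
except ValueError as exc:
    length_error = exc
    print(f"ValueError: {exc}")
```

If the values are already in row order and the index should be ignored, passing series.to_numpy() or a plain list avoids the label alignment.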

Steps to create pandas DataFrame columns:

  1. Save a short column creation script.
    create_columns.py
    import pandas as pd
     
    df = pd.DataFrame(
        {
            "item": ["notebook", "pencil", "eraser"],
            "qty": [3, 10, 5],
            "unit_price": [2.5, 0.4, 0.8],
        }
    )
     
    print(f"pandas {pd.__version__}")
    print()
     
    print("BASE")
    print(df)
    print()
     
    df["line_total"] = df["qty"] * df["unit_price"]
     
    print("LINE_TOTAL")
    print(df)
    print()
     
    df["currency"] = "USD"
     
    print("CURRENCY")
    currency_cols = ["item", "line_total", "currency"]
    print(df[currency_cols])
    print()
     
    df["bulk_order"] = False
     
    print("BULK_DEFAULT")
    bulk_cols = ["item", "qty", "bulk_order"]
    print(df[bulk_cols])
    print()
     
    df.loc[df["qty"] >= 10, "bulk_order"] = True
     
    print("BULK_LOC")
    print(df[bulk_cols])
    print()
     
    result = df.assign(
        discount=lambda data: (data["line_total"] * 0.10).where(data["bulk_order"], 0),
        net_total=lambda data: data["line_total"] - data["discount"],
    )
     
    print("ASSIGN")
    summary_cols = ["item", "line_total", "discount", "net_total"]
    print(result[summary_cols])
    print()
     
    result.insert(1, "sku", ["N-100", "P-200", "E-300"])
     
    print("INSERT")
    insert_cols = ["item", "sku", "qty", "line_total", "net_total"]
    print(result[insert_cols])
    print()
     
    print("VERIFY_COLUMNS")
    print(result.columns.tolist())
    print()
     
    print("VERIFY_DTYPES")
    dtype_cols = ["line_total", "bulk_order", "net_total"]
    print(result[dtype_cols].dtypes)

    Replace the small df with the DataFrame already loaded in the working script. Keep column names unique unless duplicate labels are intentional.

  2. Run the script and confirm the new columns, values, and dtypes.
    $ python3 create_columns.py
    pandas 3.0.3
    
    BASE
           item  qty  unit_price
    0  notebook    3         2.5
    1    pencil   10         0.4
    2    eraser    5         0.8
    
    LINE_TOTAL
           item  qty  unit_price  line_total
    0  notebook    3         2.5         7.5
    1    pencil   10         0.4         4.0
    2    eraser    5         0.8         4.0
    
    CURRENCY
           item  line_total currency
    0  notebook         7.5      USD
    1    pencil         4.0      USD
    2    eraser         4.0      USD
    
    BULK_DEFAULT
           item  qty  bulk_order
    0  notebook    3       False
    1    pencil   10       False
    2    eraser    5       False
    
    BULK_LOC
           item  qty  bulk_order
    0  notebook    3       False
    1    pencil   10        True
    2    eraser    5       False
    
    ASSIGN
           item  line_total  discount  net_total
    0  notebook         7.5       0.0        7.5
    1    pencil         4.0       0.4        3.6
    2    eraser         4.0       0.0        4.0
    
    INSERT
           item    sku  qty  line_total  net_total
    0  notebook  N-100    3         7.5        7.5
    1    pencil  P-200   10         4.0        3.6
    2    eraser  E-300    5         4.0        4.0
    
    VERIFY_COLUMNS
    ['item', 'sku', 'qty', 'unit_price', 'line_total', 'currency', 'bulk_order', 'discount', 'net_total']
    
    VERIFY_DTYPES
    line_total    float64
    bulk_order       bool
    net_total     float64
    dtype: object

  3. Add a calculated column with direct assignment when the current DataFrame should be updated.
    df["line_total"] = df["qty"] * df["unit_price"]

    Direct assignment appends the column to the end of df.columns when the label is new.
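
    As a small sketch of that behavior, using a reduced copy of the sample data: assigning to a new label appends the column, while assigning to an existing label replaces its values and keeps its position. The variable names order_after_add and order_after_overwrite are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"qty": [3, 10, 5], "unit_price": [2.5, 0.4, 0.8]})

# New label: the column is appended after the existing ones.
df["line_total"] = df["qty"] * df["unit_price"]
order_after_add = df.columns.tolist()

# Existing label: the values are replaced and the column keeps its position.
df["qty"] = df["qty"] * 2
order_after_overwrite = df.columns.tolist()

print(order_after_add)
print(order_after_overwrite)
print(df)
```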

  4. Add a scalar column when every row needs the same value.
    df["currency"] = "USD"

    pandas broadcasts a scalar to every row in the DataFrame.

  5. Initialize a conditional column before setting selected rows.
    df["bulk_order"] = False

  6. Set selected rows with .loc.
    df.loc[df["qty"] >= 10, "bulk_order"] = True

    .loc selects the rows and the column in a single indexing operation, so the assignment updates df itself, which matches pandas 3 Copy-on-Write rules.
    Related: How to migrate pandas code for Copy-on-Write
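
    When the flag is a plain True/False test, the initialize-then-.loc pair can also be written as one vectorized comparison. The sketch below shows that alternative, followed by a .loc variant for a label column with more than two values. The size column and its labels are hypothetical and not part of the script above.

```python
import pandas as pd

df = pd.DataFrame(
    {"item": ["notebook", "pencil", "eraser"], "qty": [3, 10, 5]}
)

# A comparison already yields a boolean Series, one value per row.
df["bulk_order"] = df["qty"] >= 10

# For more than two outcomes, start from a default and overwrite
# selected rows with .loc, from the widest condition to the narrowest.
df["size"] = "small"
df.loc[df["qty"] >= 5, "size"] = "medium"
df.loc[df["qty"] >= 10, "size"] = "large"

print(df)
```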

  7. Use DataFrame.assign() when the result should stay in a method chain.
    result = df.assign(
        discount=lambda data: (data["line_total"] * 0.10).where(data["bulk_order"], 0),
        net_total=lambda data: data["line_total"] - data["discount"],
    )

    DataFrame.assign() returns a new DataFrame. Reassign it back to df if the original variable should include the new columns.
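
    A short sketch of that behavior, with made-up values and a hypothetical doubled column: the original DataFrame is unchanged after assign(), a later keyword can use a column created by an earlier one, and reassigning to the same name keeps the new columns.

```python
import pandas as pd

df = pd.DataFrame({"qty": [3, 10], "unit_price": [2.5, 0.5]})

# Each lambda receives the DataFrame as built so far, so "doubled"
# can read the "line_total" column created just before it.
result = df.assign(
    line_total=lambda data: data["qty"] * data["unit_price"],
    doubled=lambda data: data["line_total"] * 2,
)

print(df.columns.tolist())      # df itself has no new columns
print(result.columns.tolist())  # result carries both new columns

# Reassign to keep a new column under the original name.
df = df.assign(currency="USD")
print(df.columns.tolist())
```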

  8. Insert a column at a specific position when column order matters.
    result.insert(1, "sku", ["N-100", "P-200", "E-300"])

    DataFrame.insert() modifies the DataFrame in place and returns None. It raises ValueError when the column label already exists, unless allow_duplicates=True is passed.
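
    The sketch below illustrates both points on a two-row frame: insert() changes the DataFrame in place and returns None, and a second insert with the same label is rejected. The replacement values "X" and "Y" are placeholders.

```python
import pandas as pd

df = pd.DataFrame({"item": ["notebook", "pencil"], "qty": [3, 10]})

# insert() works in place; there is no new DataFrame to capture.
returned = df.insert(1, "sku", ["N-100", "P-200"])
print(returned)
print(df.columns.tolist())

# Inserting the same label again is rejected by default.
duplicate_rejected = False
try:
    df.insert(0, "sku", ["X", "Y"])
except ValueError as exc:
    duplicate_rejected = True
    print(f"ValueError: {exc}")
```

    Because insert() returns None, writing df = df.insert(...) would replace the DataFrame with None, so call it as a standalone statement.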

  9. Verify the final column order and important dtypes.
    print(result.columns.tolist())
    dtype_cols = ["line_total", "bulk_order", "net_total"]
    print(result[dtype_cols].dtypes)
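
    In a script that runs unattended, the same checks can be written as assertions instead of printed output. This is a sketch under the assumption that the expected column order is known in advance; the small result frame and the expected_order list are stand-ins for the real data.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_float_dtype

# Stand-in for the DataFrame produced by the earlier steps.
result = pd.DataFrame(
    {
        "item": ["notebook", "pencil"],
        "line_total": [7.5, 4.0],
        "bulk_order": [False, True],
        "net_total": [7.5, 3.6],
    }
)

expected_order = ["item", "line_total", "bulk_order", "net_total"]

# Fail early if a column is missing, misplaced, or duplicated.
assert result.columns.tolist() == expected_order
assert result.columns.is_unique

# Check dtype families rather than exact dtype names.
assert is_float_dtype(result["line_total"])
assert is_bool_dtype(result["bulk_order"])
assert is_float_dtype(result["net_total"])

print("column checks passed")
```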