How to create a pandas DataFrame

Creating a pandas DataFrame from in-memory Python data gives rows and columns labels before any filtering, joining, cleaning, or exporting. The DataFrame constructor suits data that already exists in Python as dictionaries, lists, arrays, or similar objects, rather than data read from a file.

The pd.DataFrame() constructor accepts dictionaries, list-like records, arrays, and other DataFrames. For row-oriented data, a list of dictionaries keeps each input row readable. The columns argument fixes the output column order, and the index argument supplies row labels when the default numeric index is not meaningful.
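The other input types can be sketched as follows. This is a minimal illustration rather than part of the script below; the names from_tuples, from_array, and from_frame and the sample values are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# List-like records: each tuple is one row, so columns must name the fields.
from_tuples = pd.DataFrame(
    [(1001, "EMEA"), (1002, "APAC")],
    columns=["order_id", "region"],
)

# 2-D NumPy array: the array has a single dtype, so both columns become float64.
from_array = pd.DataFrame(
    np.array([[1001, 149.5], [1002, 89.0]]),
    columns=["order_id", "total_usd"],
)

# Existing DataFrame: copy=True requests data independent of the source frame.
from_frame = pd.DataFrame(from_tuples, copy=True)
from_frame.loc[0, "region"] = "AMER"

print(from_tuples.shape)
print(from_array.dtypes.tolist())
print(from_tuples.loc[0, "region"])  # the source frame keeps its original value
```

An array carries no field names and only one dtype, which is why a list of dictionaries or tuples is usually the better fit for mixed-type rows.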

A finished check should show the expected shape, column order, row labels, inferred dtypes, and at least one selected row. In pandas 3, string columns are inferred as str by default, so dtype checks can differ from older examples that showed object for text columns.
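A dtype check that should hold on both older and newer pandas versions can test for "string-like" instead of comparing against a literal dtype name. The sketch below assumes a small sample frame; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"customer": ["Ada Lovelace", "Lin Chen"], "total_usd": [149.5, 89.0]}
)

# is_string_dtype is True for the pandas 3 str dtype and also for an
# object column whose values are all strings, as older versions infer.
text_ok = pd.api.types.is_string_dtype(df["customer"])
number_ok = pd.api.types.is_float_dtype(df["total_usd"])

print(pd.__version__, df["customer"].dtype)
print(text_ok, number_ok)
```

The printed dtype name differs between versions, while the two boolean checks stay the same.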

Steps to create a pandas DataFrame:

Save a short DataFrame creation script.

create_dataframe.py

```
import pandas as pd

records = [
    {
        "order_id": 1001,
        "customer": "Ada Lovelace",
        "region": "EMEA",
        "total_usd": 149.50,
        "paid": True,
    },
    {
        "order_id": 1002,
        "customer": "Lin Chen",
        "region": "APAC",
        "total_usd": 89.00,
        "paid": False,
    },
    {
        "order_id": 1003,
        "customer": "Maya Patel",
        "region": "AMER",
        "total_usd": 212.00,
        "paid": True,
    },
]

column_order = ["order_id", "customer", "region", "total_usd", "paid"]
row_labels = ["order-1001", "order-1002", "order-1003"]

df = pd.DataFrame(records, columns=column_order, index=row_labels)

print(f"pandas {pd.__version__}")
print()

print("DATAFRAME")
print(df)
print()

print("VERIFY_SHAPE")
print(df.shape)
print()

print("VERIFY_COLUMNS")
print(df.columns.tolist())
print()

print("VERIFY_INDEX")
print(df.index.tolist())
print()

print("VERIFY_DTYPES")
print(df.dtypes)
print()

print("VERIFY_ROW")
print(df.loc["order-1002", ["customer", "total_usd"]])
```

Use a list of dictionaries when each input item represents one row. If the source is column-oriented, pass a dictionary of equal-length lists instead.
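For comparison, the same orders in column-oriented form might look like the sketch below. The names columns_data and df_columns are illustrative. The last few lines show that lists of unequal length are rejected.

```python
import pandas as pd

# Column-oriented source: each key is a column, each list holds that column's values.
columns_data = {
    "order_id": [1001, 1002, 1003],
    "customer": ["Ada Lovelace", "Lin Chen", "Maya Patel"],
    "region": ["EMEA", "APAC", "AMER"],
    "total_usd": [149.50, 89.00, 212.00],
    "paid": [True, False, True],
}
row_labels = ["order-1001", "order-1002", "order-1003"]

df_columns = pd.DataFrame(columns_data, index=row_labels)

print(df_columns.shape)
print(df_columns.columns.tolist())

# Lists of different lengths cannot be aligned into rows, so pandas raises ValueError.
try:
    pd.DataFrame({"order_id": [1001, 1002], "paid": [True]})
    lengths_rejected = False
except ValueError:
    lengths_rejected = True

print(lengths_rejected)
```

With a dictionary of lists, column order follows the dictionary's key order, so a separate columns argument is optional.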

Run the script and confirm the table and verification output.

```
$ python3 create_dataframe.py
pandas 3.0.3

DATAFRAME
            order_id      customer region  total_usd   paid
order-1001      1001  Ada Lovelace   EMEA      149.5   True
order-1002      1002      Lin Chen   APAC       89.0  False
order-1003      1003    Maya Patel   AMER      212.0   True

VERIFY_SHAPE
(3, 5)

VERIFY_COLUMNS
['order_id', 'customer', 'region', 'total_usd', 'paid']

VERIFY_INDEX
['order-1001', 'order-1002', 'order-1003']

VERIFY_DTYPES
order_id       int64
customer         str
region           str
total_usd    float64
paid            bool
dtype: object

VERIFY_ROW
customer     Lin Chen
total_usd        89.0
Name: order-1002, dtype: object
```

Create the DataFrame from the record list.
```
df = pd.DataFrame(records, columns=column_order, index=row_labels)
```
The columns argument keeps the output order explicit. The index argument replaces the default RangeIndex with labels that .loc can look up.

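With a list of dictionaries, columns also selects fields as well as ordering them. The sketch below uses a hypothetical "note" key and a "paid" column that no record supplies, to show both effects.

```python
import pandas as pd

records = [
    {"order_id": 1001, "region": "EMEA", "note": "gift"},
    {"order_id": 1002, "region": "APAC", "note": "rush"},
]

# columns selects and orders fields: "note" is dropped because it is not listed,
# and "paid" becomes an all-missing column because no record has that key.
df = pd.DataFrame(
    records,
    columns=["region", "order_id", "paid"],
    index=["order-1001", "order-1002"],
)

print(df.columns.tolist())
print(df["paid"].isna().all())
print(df.loc["order-1002", "region"])
```

A misspelled name in columns therefore produces an empty column instead of an error, which is one reason to verify the result after construction.
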
Verify the row and column count.
```
print(df.shape)
```
The shape tuple is (rows, columns), so (3, 5) means three records and five fields.

Verify the column and row labels.

```
print(df.columns.tolist())
print(df.index.tolist())
```
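In a script that runs unattended, the printed checks can be replaced by assertions. The sketch below assumes a two-row sample frame; expected_columns and expected_index are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    [{"order_id": 1001, "paid": True}, {"order_id": 1002, "paid": False}],
    index=["order-1001", "order-1002"],
)

expected_columns = ["order_id", "paid"]
expected_index = ["order-1001", "order-1002"]

# Unpack shape into row and column counts, then compare with the expected labels.
n_rows, n_cols = df.shape
assert (n_rows, n_cols) == (len(expected_index), len(expected_columns))
assert df.columns.tolist() == expected_columns
assert df.index.tolist() == expected_index
assert df.index.is_unique  # duplicate labels would make .loc return several rows

print("labels verified")
```

An AssertionError stops the script at the first mismatch, so later steps never run against a malformed frame.
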

Check inferred dtypes before analysis continues.
```
print(df.dtypes)
```
Use .astype() after construction when a column needs a specific dtype before calculations, joins, or exports.

Related: How to convert data types in pandas

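A minimal sketch of such a conversion, assuming a small sample frame. The target dtypes here are examples only, not a recommendation for this data.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "order_id": [1001, 1002],
        "region": ["EMEA", "APAC"],
        "total_usd": [149.5, 89.0],
    }
)

# astype returns a new DataFrame; a dict converts only the named columns.
converted = df.astype(
    {"order_id": "int32", "region": "category", "total_usd": "float32"}
)

print(converted.dtypes)
print(df["order_id"].dtype)  # the original frame keeps its inferred dtype
```

Passing a dictionary leaves unlisted columns untouched, so one call can adjust several columns without restating the rest.
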
Select one known row to confirm representative values.
```
print(df.loc["order-1002", ["customer", "total_usd"]])
```
.loc selects by row label and column label, which makes it a direct check that the custom index and field names match the intended data.
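The same idea works for a single value, and a label that does not exist fails loudly. The sketch below assumes a two-row sample frame; "order-9999" is a deliberately missing, hypothetical label.

```python
import pandas as pd

df = pd.DataFrame(
    {"customer": ["Ada Lovelace", "Lin Chen"], "total_usd": [149.5, 89.0]},
    index=["order-1001", "order-1002"],
)

# One row label and one column label return a single value.
total = df.loc["order-1002", "total_usd"]

# A label that is not in the index raises KeyError, which exposes a wrong index.
try:
    df.loc["order-9999", "total_usd"]
    missing_label_found = True
except KeyError:
    missing_label_found = False

print(total, missing_label_found)
```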