Creating a pandas DataFrame from in-memory Python data gives rows and columns a labeled tabular structure before you filter, join, clean, or export them. The DataFrame constructor fits data that already lives in Python as dictionaries, lists, arrays, or similar objects rather than in a file.
The pd.DataFrame() constructor accepts dictionaries, list-like records, arrays, and existing DataFrames. For row-oriented data, a list of dictionaries keeps each input row readable; the columns argument fixes the output order, and index supplies row labels when the default numeric index is not meaningful.
A finished check should show the expected shape, column order, row labels, inferred dtypes, and at least one selected row. In pandas 3, string columns are inferred as str by default, so dtype checks can differ from older examples that showed object for text columns.
Related: How to read CSV files with pandas
Related: How to create columns in pandas
Related: How to convert data types in pandas
import pandas as pd

records = [
    {
        "order_id": 1001,
        "customer": "Ada Lovelace",
        "region": "EMEA",
        "total_usd": 149.50,
        "paid": True,
    },
    {
        "order_id": 1002,
        "customer": "Lin Chen",
        "region": "APAC",
        "total_usd": 89.00,
        "paid": False,
    },
    {
        "order_id": 1003,
        "customer": "Maya Patel",
        "region": "AMER",
        "total_usd": 212.00,
        "paid": True,
    },
]

column_order = ["order_id", "customer", "region", "total_usd", "paid"]
row_labels = ["order-1001", "order-1002", "order-1003"]

df = pd.DataFrame(records, columns=column_order, index=row_labels)

print(f"pandas {pd.__version__}")
print()
print("DATAFRAME")
print(df)
print()
print("VERIFY_SHAPE")
print(df.shape)
print()
print("VERIFY_COLUMNS")
print(df.columns.tolist())
print()
print("VERIFY_INDEX")
print(df.index.tolist())
print()
print("VERIFY_DTYPES")
print(df.dtypes)
print()
print("VERIFY_ROW")
print(df.loc["order-1002", ["customer", "total_usd"]])
Use a list of dictionaries when each input item represents one row. If the source is column-oriented, pass a dictionary of equal-length lists instead.
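For a column-oriented source, a minimal sketch might look like the following. The columns_data name and the two-row sample are illustrative assumptions, not part of the script above.

```python
import pandas as pd

# Hypothetical column-oriented input: each key is a column name and
# each list holds that column's values, one entry per row.
columns_data = {
    "order_id": [1001, 1002],
    "customer": ["Ada Lovelace", "Lin Chen"],
    "total_usd": [149.50, 89.00],
}

# The lists must all have the same length; otherwise pandas raises ValueError.
df_cols = pd.DataFrame(columns_data, index=["order-1001", "order-1002"])

print(df_cols.shape)
print(df_cols.columns.tolist())
```

Here the dictionary keys become column names, so a separate columns argument is only needed to reorder or select columns.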
$ python3 create_dataframe.py
pandas 3.0.3

DATAFRAME
            order_id      customer  region  total_usd   paid
order-1001      1001  Ada Lovelace    EMEA      149.5   True
order-1002      1002      Lin Chen    APAC       89.0  False
order-1003      1003    Maya Patel    AMER      212.0   True

VERIFY_SHAPE
(3, 5)

VERIFY_COLUMNS
['order_id', 'customer', 'region', 'total_usd', 'paid']

VERIFY_INDEX
['order-1001', 'order-1002', 'order-1003']

VERIFY_DTYPES
order_id       int64
customer         str
region           str
total_usd    float64
paid            bool
dtype: object

VERIFY_ROW
customer     Lin Chen
total_usd        89.0
Name: order-1002, dtype: object
df = pd.DataFrame(records, columns=column_order, index=row_labels)
columns keeps the output order explicit. index replaces the default RangeIndex with labels that can be used with .loc.
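To see what these two arguments change, the sketch below builds a frame with and without them. The two-record sample list is a made-up input for illustration only.

```python
import pandas as pd

# Hypothetical records whose keys are not in the desired output order.
sample = [
    {"customer": "Ada Lovelace", "order_id": 1001},
    {"customer": "Lin Chen", "order_id": 1002},
]

# Without columns or index: column order follows the dictionary keys
# and the rows get a default RangeIndex starting at 0.
default_df = pd.DataFrame(sample)
print(default_df.columns.tolist())
print(default_df.index)

# With columns and index: the column order is set explicitly and
# each row can be addressed by its label.
labeled_df = pd.DataFrame(
    sample,
    columns=["order_id", "customer"],
    index=["order-1001", "order-1002"],
)
print(labeled_df.columns.tolist())
print(labeled_df.loc["order-1002", "customer"])
```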
print(df.shape)
The shape tuple is (rows, columns), so (3, 5) means three records and five fields.
print(df.columns.tolist())
print(df.index.tolist())
print(df.dtypes)
Use .astype() after construction when a column needs a specific dtype before calculations, joins, or exports.
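As a rough illustration of that conversion step, the sketch below casts several columns in one call. The orders frame and the target dtypes are assumptions chosen for the example; pick the dtypes your own calculations or exports actually require.

```python
import pandas as pd

# Hypothetical frame with default inferred dtypes.
orders = pd.DataFrame(
    {
        "order_id": [1001, 1002, 1003],
        "region": ["EMEA", "APAC", "AMER"],
        "total_usd": [149.50, 89.00, 212.00],
    }
)

# astype accepts a mapping of column name to target dtype and
# returns a new DataFrame; the original frame is left unchanged.
converted = orders.astype(
    {
        "order_id": "int32",
        "region": "category",
        "total_usd": "float32",
    }
)

print(converted.dtypes)
```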
Related: How to convert data types in pandas
print(df.loc["order-1002", ["customer", "total_usd"]])
.loc selects by row label and column label, which makes it a direct check that the custom index and field names match the intended data.
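The checks above print values for a person to read. If the same checks need to run unattended, one possible approach is to turn them into assertions, as in the sketch below. The check_orders helper and the two-row small_df frame are hypothetical names introduced for this example. The dtype checks use type-family helpers from pandas.api.types so they do not depend on whether text columns report object or str.

```python
import pandas as pd
from pandas.api import types as ptypes

# Hypothetical two-row frame built the same way as the main example.
small_df = pd.DataFrame(
    [
        {"order_id": 1001, "customer": "Ada Lovelace", "total_usd": 149.50},
        {"order_id": 1002, "customer": "Lin Chen", "total_usd": 89.00},
    ],
    columns=["order_id", "customer", "total_usd"],
    index=["order-1001", "order-1002"],
)


def check_orders(frame):
    """Raise AssertionError if the frame does not match the expected layout."""
    assert frame.shape == (2, 3)
    assert frame.columns.tolist() == ["order_id", "customer", "total_usd"]
    assert frame.index.tolist() == ["order-1001", "order-1002"]
    # Type-family checks avoid hard-coding a specific text dtype name.
    assert ptypes.is_integer_dtype(frame["order_id"])
    assert ptypes.is_string_dtype(frame["customer"])
    assert ptypes.is_float_dtype(frame["total_usd"])
    # Label-based lookup confirms the index and column names line up.
    assert frame.loc["order-1002", "customer"] == "Lin Chen"
    return True


print(check_orders(small_df))
```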