Parquet files store tabular data in a columnar format that pandas can read and write without flattening every value into CSV text. The format preserves column types and lets a reader load only the columns it needs, which helps when DataFrame output feeds another Python or analytics job.
Passing engine="pyarrow" makes the read and write path explicit, and compression="snappy" matches the pandas default compression for Parquet writes. Setting index=False keeps the written file limited to data columns instead of adding an index field that non-pandas consumers would have to handle.
A round-trip read should return the same rows and the expected dtypes, and only the selected columns when the columns argument limits the read. Keep the same checks when replacing the small DataFrame with a production export, especially when indexes, categorical columns, or object-heavy columns could change the file schema.
Related: How to read CSV files with pandas
Related: How to write a CSV file with pandas
Related: How to read and write JSON with pandas
$ python3 -m pip install pyarrow
pandas needs either pyarrow or fastparquet installed to read and write Parquet files. Passing engine="pyarrow" in each call pins the engine instead of relying on automatic selection.
Related: How to install pandas with pip
from pathlib import Path

import pandas as pd

path = Path("orders.parquet")

orders = pd.DataFrame(
    {
        "order_id": ["A100", "A101", "A102"],
        "customer": ["Ada", "Lin", "Maya"],
        "region": ["EMEA", "APAC", "AMER"],
        "total_usd": [149.50, 88.00, 212.25],
    }
)

orders.to_parquet(
    path,
    engine="pyarrow",
    compression="snappy",
    index=False,
)

round_trip = pd.read_parquet(path, engine="pyarrow")
selected = pd.read_parquet(
    path,
    engine="pyarrow",
    columns=["order_id", "total_usd"],
)

print(round_trip.to_string(index=False))
print()
print(round_trip.dtypes)
print()
print(f"rows match: {len(round_trip) == len(orders)}")
print(f"columns: {', '.join(round_trip.columns)}")
print(f"selected columns: {', '.join(selected.columns)}")
print(f"order IDs match: {round_trip['order_id'].tolist() == orders['order_id'].tolist()}")
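index=False omits the DataFrame index from the Parquet file. Set index=True only when the index carries business data that another reader needs. With the argument omitted, pandas stores a default RangeIndex as metadata only and writes any other index as columns.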
$ python3 parquet_roundtrip.py
order_id customer region total_usd
A100 Ada EMEA 149.50
A101 Lin APAC 88.00
A102 Maya AMER 212.25
order_id str
customer str
region str
total_usd float64
dtype: object
rows match: True
columns: order_id, customer, region, total_usd
selected columns: order_id, total_usd
order IDs match: True
The row check, selected-column check, and matching order IDs confirm that to_parquet() wrote the file and read_parquet() loaded the expected data.
$ rm orders.parquet parquet_roundtrip.py