Parquet stores tabular data in a columnar format that pandas can write and read without flattening every value into CSV text. The format keeps column types and lets a reader load only the columns it needs, which helps when DataFrame output feeds another Python or analytics job.

Passing engine="pyarrow" makes the read and write path explicit, and compression="snappy" matches the pandas default for Parquet writes. Setting index=False writes only the data columns, so non-pandas consumers do not see an extra index field.

A round-trip read should return the same rows and the expected dtypes, and only the requested columns when the columns argument limits the read. Keep the same checks when replacing the small DataFrame with a production export, especially when indexes, categorical columns, or object-heavy columns could change the file schema.

Steps to read and write Parquet files with pandas:

  1. Install the Parquet engine package if the active Python environment does not already have one.
    $ python3 -m pip install pyarrow

    pandas needs either pyarrow or fastparquet to read and write Parquet files. Passing engine="pyarrow" in the script avoids depending on whichever engine the default engine="auto" selects.
    Related: How to install pandas with pip

  2. Create a Parquet round-trip script.
    parquet_roundtrip.py
    from pathlib import Path

    import pandas as pd

    # Output file, created in the current working directory.
    path = Path("orders.parquet")

    # Small sample DataFrame standing in for a real export.
    orders = pd.DataFrame(
        {
            "order_id": ["A100", "A101", "A102"],
            "customer": ["Ada", "Lin", "Maya"],
            "region": ["EMEA", "APAC", "AMER"],
            "total_usd": [149.50, 88.00, 212.25],
        }
    )

    # Write the data columns only; index=False leaves the index out of the file.
    orders.to_parquet(
        path,
        engine="pyarrow",
        compression="snappy",
        index=False,
    )

    # Read the whole file back, then read only two of its columns.
    round_trip = pd.read_parquet(path, engine="pyarrow")
    selected = pd.read_parquet(
        path,
        engine="pyarrow",
        columns=["order_id", "total_usd"],
    )

    # Print the restored rows and dtypes, followed by the round-trip checks.
    print(round_trip.to_string(index=False))
    print()
    print(round_trip.dtypes)
    print()
    print(f"rows match: {len(round_trip) == len(orders)}")
    print(f"columns: {', '.join(round_trip.columns)}")
    print(f"selected columns: {', '.join(selected.columns)}")
    print(f"order IDs match: {round_trip['order_id'].tolist() == orders['order_id'].tolist()}")

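    index=False omits the DataFrame index from the Parquet file. Drop the argument or set index=True only when the index carries business data that another reader needs. With the default index=None, pandas saves the index and stores a plain RangeIndex as metadata instead of a column.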

  3. Run the script to write the Parquet file and read it back.
    $ python3 parquet_roundtrip.py
    order_id customer region  total_usd
        A100      Ada   EMEA     149.50
        A101      Lin   APAC      88.00
        A102     Maya   AMER     212.25
    
    order_id         str
    customer         str
    region           str
    total_usd    float64
    dtype: object
    
    rows match: True
    columns: order_id, customer, region, total_usd
    selected columns: order_id, total_usd
    order IDs match: True

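    The row check, selected-column check, and matching order IDs confirm that to_parquet() wrote the file and read_parquet() loaded the expected data. String columns print as str on pandas versions that use the dedicated string dtype by default; older versions print object.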

  4. Remove the temporary files after the round-trip behavior is confirmed.
    $ rm orders.parquet parquet_roundtrip.py