How to read and write JSON with pandas

Reading and writing JSON with pandas is useful when Python data work has to exchange records with APIs, log exports, or small handoff files. A DataFrame can load record-shaped JSON, inspect the columns and values, and write the result back to a file that another program can read.

JSON Lines keeps one JSON object on each non-blank line, which maps directly to DataFrame rows without wrapping the whole file in one large array. In pandas, read_json(…, lines=True) reads that shape, while to_json(…, orient=“records”, lines=True) writes the same row-per-line layout.

Identifiers such as order numbers, account numbers, and ticket IDs should be read as strings when exact text matters. Date columns are clearer when listed explicitly in convert_dates, and date_format=“iso” keeps exported timestamps readable during the next import.

Steps to read and write JSON records with pandas:

  1. Create a JSON Lines input file named orders.jsonl.
    orders.jsonl
    {"order_id":"A100","customer":"Ada","total":125.5,"ordered_at":"2026-06-01T09:30:00Z"}
    {"order_id":"A101","customer":"Lin","total":88.0,"ordered_at":"2026-06-01T10:15:00Z"}
    {"order_id":"A102","customer":"Mira","total":142.25,"ordered_at":"2026-06-02T14:05:00Z"}

    Use one complete JSON object per non-blank line. When checking an external file first, choose NDJSON / JSON Lines in the validator.
    Tool: JSON Validator

  2. Create a Python script named json_roundtrip.py.
    json_roundtrip.py
    from pathlib import Path
     
    import pandas as pd
     
    source = Path("orders.jsonl")
    output = Path("orders-out.jsonl")
     
    orders = pd.read_json(
        source,
        lines=True,
        dtype={"order_id": "string"},
        convert_dates=["ordered_at"],
    )
     
    orders.to_json(
        output,
        orient="records",
        lines=True,
        date_format="iso",
    )
     
    round_trip = pd.read_json(
        output,
        lines=True,
        dtype={"order_id": "string"},
        convert_dates=["ordered_at"],
    )
     
    print(orders.loc[:, ["order_id", "customer", "total"]])
    print()
    print(orders.dtypes.reindex(["order_id", "customer", "total", "ordered_at"]))
    print(f"written rows: {len(round_trip)}")
    print(f"same order IDs: {round_trip['order_id'].tolist() == orders['order_id'].tolist()}")

    lines=True tells pandas to treat each line as a record. orient=“records” is required when to_json() writes line-delimited JSON.

  3. Run the script.
    $ python3 json_roundtrip.py
      order_id customer   total
    0     A100      Ada  125.50
    1     A101      Lin   88.00
    2     A102     Mira  142.25
    
    order_id                   string
    customer                      str
    total                     float64
    ordered_at    datetime64[us, UTC]
    dtype: object
    written rows: 3
    same order IDs: True
  4. Check the written JSON Lines file before handing it to another tool.
    $ cat orders-out.jsonl
    {"order_id":"A100","customer":"Ada","total":125.5,"ordered_at":"2026-06-01T09:30:00.000Z"}
    {"order_id":"A101","customer":"Lin","total":88.0,"ordered_at":"2026-06-01T10:15:00.000Z"}
    {"order_id":"A102","customer":"Mira","total":142.25,"ordered_at":"2026-06-02T14:05:00.000Z"}

    The timestamp values are ISO strings because the write step uses date_format=“iso”.

  5. Remove the sample files when they were created only for testing.
    $ rm orders.jsonl orders-out.jsonl json_roundtrip.py