How to sort a pandas DataFrame

A pandas DataFrame often needs sorting before review, ranking, export, or downstream grouping. sort_values() orders rows from one or more column values so totals, dates, regions, or statuses appear in the order the next analysis expects.

By default, sort_values() returns a new DataFrame and leaves the source row order available. The ascending argument can apply one direction to all sort keys or a separate direction for each column, and na_position controls whether missing values sort before or after non-missing values.

Index labels stay attached to their rows unless ignore_index=True resets the output labels to a new integer range. Use sort_index() when row labels, not data columns, should determine the order, especially after setting an index from a business key or date.

Steps to sort pandas DataFrame rows:

Save a short sorting script.

sort_dataframe.py

import pandas as pd
 
orders = pd.DataFrame(
    {
        "order_id": [1005, 1001, 1004, 1002, 1006, 1003],
        "customer": ["Nia", "Ada", "Omar", "Lin", "Kai", "Maya"],
        "region": ["APAC", "EMEA", "EMEA", "APAC", "AMER", "AMER"],
        "order_date": pd.to_datetime(
            [
                "2026-02-03",
                "2026-02-01",
                "2026-02-04",
                "2026-02-02",
                "2026-02-05",
                "2026-02-01",
            ]
        ),
        "ship_date": pd.to_datetime(
            [
                None,
                "2026-02-03",
                "2026-02-06",
                "2026-02-04",
                "2026-02-08",
                "2026-02-05",
            ]
        ),
        "total_usd": [360.0, 150.0, 95.0, 240.0, 420.0, 875.0],
    }
).set_index("order_id", drop=False)
 
print(f"pandas {pd.__version__}")
print()
 
print("BASE")
print(
    orders[
        ["customer", "region", "order_date", "ship_date", "total_usd"]
    ].to_string()
)
print()
 
by_total = orders.sort_values("total_usd", ascending=False)
print("SORT_TOTAL_DESC")
print(by_total.loc[:, ["customer", "region", "total_usd"]].to_string())
print()
 
by_region_date = orders.sort_values(
    ["region", "order_date"],
    ascending=[True, False],
)
print("SORT_REGION_DATE")
print(by_region_date.loc[:, ["customer", "region", "order_date", "total_usd"]].to_string())
print()
 
missing_ship_first = orders.sort_values("ship_date", na_position="first")
print("SORT_MISSING_SHIP_FIRST")
print(missing_ship_first.loc[:, ["customer", "ship_date", "total_usd"]].to_string())
print()
 
ranked = orders.sort_values("total_usd", ascending=False, ignore_index=True)
print("SORT_RESET_INDEX")
print(ranked.loc[:, ["order_id", "customer", "total_usd"]].to_string())
print()
 
by_index = orders.sort_index()
print("SORT_INDEX")
print(by_index.loc[:, ["customer", "region", "total_usd"]].to_string())
print()
 
print("VERIFY")
print(f"highest total order_id: {by_total.iloc[0]['order_id']}")
print(f"region/date order_ids: {by_region_date['order_id'].tolist()}")
print(f"missing ship_date first: {missing_ship_first.iloc[0]['ship_date']}")
print(f"reset index labels: {ranked.index.tolist()}")
print(f"source index order unchanged: {orders.index.tolist()}")

Replace the orders object with the DataFrame already loaded in the working script. Keep a visible row key when sorted output must still identify each record.

Run the script and confirm the sorted sections and verification lines.

$ python3 sort_dataframe.py
pandas 3.0.3

BASE
         customer region order_date  ship_date  total_usd
order_id                                                 
1005          Nia   APAC 2026-02-03        NaT      360.0
1001          Ada   EMEA 2026-02-01 2026-02-03      150.0
1004         Omar   EMEA 2026-02-04 2026-02-06       95.0
1002          Lin   APAC 2026-02-02 2026-02-04      240.0
1006          Kai   AMER 2026-02-05 2026-02-08      420.0
1003         Maya   AMER 2026-02-01 2026-02-05      875.0

SORT_TOTAL_DESC
         customer region  total_usd
order_id                           
1003         Maya   AMER      875.0
1006          Kai   AMER      420.0
1005          Nia   APAC      360.0
1002          Lin   APAC      240.0
1001          Ada   EMEA      150.0
1004         Omar   EMEA       95.0

SORT_REGION_DATE
         customer region order_date  total_usd
order_id                                      
1006          Kai   AMER 2026-02-05      420.0
1003         Maya   AMER 2026-02-01      875.0
1005          Nia   APAC 2026-02-03      360.0
1002          Lin   APAC 2026-02-02      240.0
1004         Omar   EMEA 2026-02-04       95.0
1001          Ada   EMEA 2026-02-01      150.0

SORT_MISSING_SHIP_FIRST
         customer  ship_date  total_usd
order_id                               
1005          Nia        NaT      360.0
1001          Ada 2026-02-03      150.0
1002          Lin 2026-02-04      240.0
1003         Maya 2026-02-05      875.0
1004         Omar 2026-02-06       95.0
1006          Kai 2026-02-08      420.0

SORT_RESET_INDEX
   order_id customer  total_usd
0      1003     Maya      875.0
1      1006      Kai      420.0
2      1005      Nia      360.0
3      1002      Lin      240.0
4      1001      Ada      150.0
5      1004     Omar       95.0

SORT_INDEX
         customer region  total_usd
order_id                           
1001          Ada   EMEA      150.0
1002          Lin   APAC      240.0
1003         Maya   AMER      875.0
1004         Omar   EMEA       95.0
1005          Nia   APAC      360.0
1006          Kai   AMER      420.0

VERIFY
highest total order_id: 1003
region/date order_ids: [1006, 1003, 1005, 1002, 1004, 1001]
missing ship_date first: NaT
reset index labels: [0, 1, 2, 3, 4, 5]
source index order unchanged: [1005, 1001, 1004, 1002, 1006, 1003]

Sort rows by one numeric column in descending order.
```
by_total = orders.sort_values("total_usd", ascending=False)
```
sort_values() returns a new DataFrame unless inplace=True is used. Assign the returned object when the original row order should remain available.
Sort rows by multiple columns with separate sort directions.
```
by_region_date = orders.sort_values(
    ["region", "order_date"],
    ascending=[True, False],
)
```
The ascending list must match the columns passed to by. In this case, region sorts A to Z and order_date sorts newest to oldest within each region.
Move missing dates to the beginning of the sorted output.
```
missing_ship_first = orders.sort_values("ship_date", na_position="first")
```
na_position=“last” is the default. Use na_position=“first” when blank, NaN, or NaT values need review before complete rows.
Reset the output labels when the sorted table should use a fresh integer index.
```
ranked = orders.sort_values("total_usd", ascending=False, ignore_index=True)
```
ignore_index=True changes the output index to 0, 1, 2, .... Keep identifier columns such as order_id in the data when sorted exports still need record IDs.
Sort by existing row labels when the index carries the intended order.
```
by_index = orders.sort_index()
```
sort_index() uses index labels instead of column values. It is useful after set_index() moves order IDs, dates, or grouped keys into the row index.
Related: How to set an index in pandas

Verify the top sorted row, reset labels, and unchanged source order.

print(f"highest total order_id: {by_total.iloc[0]['order_id']}")
print(f"reset index labels: {ranked.index.tolist()}")
print(f"source index order unchanged: {orders.index.tolist()}")

Remove the temporary sorting script after confirming the sort behavior.
```
$ rm sort_dataframe.py
```

Author: Mohd Shakir Zakaria
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.