Checking disk health in Linux helps catch failing media, worn flash, and controller-visible read errors before they turn into filesystem damage, a degraded array, or an unplanned outage. smartctl is the standard command for reading that health data while the disk is still online.

Most ATA, SATA, SAS, and NVMe devices keep health counters, error logs, and self-test history in firmware. On Linux, smartctl from the smartmontools package reads that data from the whole-disk device, shows a quick summary with --health, and exposes the fuller logs and self-test controls needed to judge whether the drive is trending toward replacement.

These checks usually need sudo, and some storage layers do not pass usable S.M.A.R.T. data through to the running system. USB bridges, hardware RAID controllers, cloud volumes, and guest-visible virtual disks can return "SMART support is: Unavailable - device lacks SMART capability" even when the physical disk is healthy. In that case, run smartctl with --scan-open to identify a usable device type, and move the check to the host or controller layer when pass-through is absent.

Steps to check disk health in Linux:

  1. List the whole-disk device nodes so the health check runs against the disk itself instead of a partition.
    $ lsblk -d -e 7,11 -o NAME,PATH,SIZE,MODEL,TRAN
    NAME PATH     SIZE MODEL                TRAN
    sda  /dev/sda 477G Samsung SSD 870 EVO  sata

    Use a whole disk such as /dev/sda in the next commands, not a partition such as /dev/sda1. For NVMe, use the device path that smartctl reports with --scan-open, such as /dev/nvme0 or /dev/nvme0n1. If smartctl is missing, first install the smartmontools package with the distribution's package manager.

  2. Read the device identity and current health summary.
    $ sudo smartctl --info --health /dev/sda
    === START OF INFORMATION SECTION ===
    Device Model:     Samsung SSD 870 EVO 500GB
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    A passing summary means the firmware is not reporting an immediate failure condition, but it does not replace the detailed counters in the next step. If the result says "Unavailable - device lacks SMART capability", skip to the last step and retry with the detected device type, or run the check on the host or controller that owns the physical disk.
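
    For scripted checks, the exit status of smartctl can stand in for reading the summary line by eye. The smartctl(8) manual describes that status as a bitmask, and the sketch below decodes it into short messages. The function name describe_smartctl_status and the message wording are hypothetical and not part of smartmontools.

```shell
# Hypothetical helper (not part of smartmontools): explain a smartctl exit status.
# Bit meanings follow the EXIT STATUS section of smartctl(8).
describe_smartctl_status() {
    rc=$1
    if [ "$rc" -eq 0 ]; then
        echo "ok"
        return 0
    fi
    [ $((rc & 1)) -ne 0 ]   && echo "command line did not parse"
    [ $((rc & 2)) -ne 0 ]   && echo "device open failed"
    [ $((rc & 4)) -ne 0 ]   && echo "a SMART or ATA command failed"
    [ $((rc & 8)) -ne 0 ]   && echo "drive reports DISK FAILING"
    [ $((rc & 16)) -ne 0 ]  && echo "prefail attribute at or below threshold"
    [ $((rc & 32)) -ne 0 ]  && echo "attribute was at or below threshold in the past"
    [ $((rc & 64)) -ne 0 ]  && echo "error log contains errors"
    [ $((rc & 128)) -ne 0 ] && echo "self-test log contains errors"
    return 0
}
```

    One possible use, once the function is defined in the current shell:
    $ sudo smartctl --info --health /dev/sda >/dev/null; describe_smartctl_status $?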

  3. Review the full SMART report for the counters and logs that usually show trouble before the summary line changes.
    $ sudo smartctl --xall /dev/sda
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    SMART Attributes Data Structure revision number: 16
    ID# ATTRIBUTE_NAME          VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      5 Reallocated_Sector_Ct  100   100   010    Pre-fail  Always       -       0
    194 Temperature_Celsius     69    58   000    Old_age   Always       -       31
    197 Current_Pending_Sector 100   100   000    Old_age   Always       -       0
    198 Offline_Uncorrectable  100   100   000    Old_age   Offline      -       0

    On ATA disks, growing raw values for Reallocated_Sector_Ct, Current_Pending_Sector, or Offline_Uncorrectable are stronger replacement signals than the single health line by itself. On NVMe devices, the equivalent signals are a rising Media and Data Integrity Errors count or Available Spare dropping to its threshold.

    --xall is the fuller report. On ATA disks it includes logs and capability data that --all does not enable.
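
    To watch the ATA sector counters without reading the whole report, the attribute table can be filtered. The sketch below is one way to do that with awk. The function name flag_bad_sectors is hypothetical, and the filter assumes the raw value is the last field on the line, which holds for attributes 5, 197, and 198 in the sample above but not for every attribute or vendor.

```shell
# Hypothetical helper: print watched ATA attributes whose raw value is nonzero.
# Reads smartctl attribute-table text on stdin.
# Assumption: the raw value is a plain integer in the last field of the line.
flag_bad_sectors() {
    awk '$1 ~ /^(5|197|198)$/ && $NF ~ /^[0-9]+$/ && $NF + 0 > 0 { print $2 "=" $NF }'
}
```

    A healthy disk prints nothing. One possible use:
    $ sudo smartctl --attributes /dev/sda | flag_bad_sectors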

  4. Start a short self-test when the disk reports that self-tests are supported and the workload can tolerate a background read check.
    $ sudo smartctl --test=short /dev/sda
    === START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
    Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
    Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
    Testing has begun.
    Please wait 2 minutes for test to complete.

    A short self-test is non-destructive, but it still reads the device in the background and can add latency on a busy system. If the command reports unavailable SMART support or an unsupported test type, keep the summary and report steps and run the deeper diagnostic from the host, storage controller, or a maintenance environment that exposes the physical disk directly.
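
    A script that starts the test needs to know how long to wait before reading the log. The sketch below pulls the estimate out of the "Please wait N minutes" line shown above. The function name selftest_wait_minutes is hypothetical, and the pattern assumes the message wording matches the sample output.

```shell
# Hypothetical helper: extract the wait estimate, in minutes, from the text
# that smartctl prints after starting a self-test. Reads that text on stdin.
# Prints nothing when no "Please wait N minutes" line is present.
selftest_wait_minutes() {
    sed -n 's/.*Please wait \([0-9][0-9]*\) minutes.*/\1/p'
}
```

    One possible use:
    $ minutes=$(sudo smartctl --test=short /dev/sda | selftest_wait_minutes)
    $ sleep "$((minutes * 60))"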

  5. Read the self-test log after the reported wait time to confirm that the latest diagnostic finished cleanly.
    $ sudo smartctl --log=selftest /dev/sda
    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Short offline       Completed without error       00%          18423         -

    A completed short test with no recorded error confirms that the drive's latest built-in diagnostic finished without finding a fault. If the status shows a read failure or an LBA_of_first_error value, copy data off the disk and plan replacement instead of repeating the same test cycle.
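
    The same check can be scripted by looking only at entry # 1, the most recent test. The sketch below succeeds when that entry reads "Completed without error" and fails otherwise. The function name latest_selftest_ok is hypothetical, and the pattern assumes the ATA self-test log layout shown above. NVMe self-test logs use a different layout.

```shell
# Hypothetical helper: succeed only if the newest ATA self-test entry ("# 1")
# reports "Completed without error". Reads the self-test log text on stdin.
latest_selftest_ok() {
    awk '
        /^ *# *1 / { found = 1; ok = ($0 ~ /Completed without error/) }
        END        { exit !(found && ok) }
    '
}
```

    One possible use:
    $ sudo smartctl --log=selftest /dev/sda | latest_selftest_ok && echo "latest self-test clean"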

  6. Retry through the detected device type when an adapter or controller hides the disk behind another storage layer.
    $ sudo smartctl --scan-open
    /dev/sda -d sat # /dev/sda [SAT], ATA device
    
    $ sudo smartctl --xall --device=sat /dev/sda
    ##### snipped #####

    --scan-open shows the device type that smartctl can actually open. Common examples are --device=sat for many USB-to-SATA bridges, --device=nvme for NVMe devices, and controller-specific forms such as --device=megaraid,N. If the disk is a guest-visible VM disk or a cloud volume and the command still reports unavailable SMART support, run the health check on the host or storage platform that owns the physical device.
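
    The retry can be scripted by reading the device type out of the --scan-open line instead of copying it by hand. The sketch below is a minimal example that assumes the "DEVICE -d TYPE # comment" layout shown above. The function name detected_device_type is hypothetical.

```shell
# Hypothetical helper: print the -d TYPE that --scan-open reported for one device.
# Reads "smartctl --scan-open" output on stdin; $1 is the device path to look up.
# Prints nothing if the device is not listed.
detected_device_type() {
    awk -v dev="$1" '$1 == dev && $2 == "-d" { print $3; exit }'
}
```

    One possible use:
    $ devtype=$(sudo smartctl --scan-open | detected_device_type /dev/sda)
    $ sudo smartctl --xall --device="$devtype" /dev/sda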