Checking disk health in Linux helps catch failing media, worn flash, and controller-visible read errors before they turn into filesystem damage, a degraded array, or an unplanned outage. smartctl is the standard command for reading that health data while the disk is still online.
Most ATA, SATA, SAS, and NVMe devices keep health counters, error logs, and self-test history in firmware. On Linux, smartctl from the smartmontools package reads that data from the whole-disk device, shows a quick summary with --health, and exposes the fuller logs and self-test controls needed to judge whether the drive is trending toward replacement.
These checks usually need sudo, and some storage layers do not pass usable SMART data through to the running system. USB bridges, hardware RAID controllers, cloud volumes, and guest-visible virtual disks can return SMART support is: Unavailable - device lacks SMART capability even when the physical disk is healthy. In that case, run smartctl with --scan-open to identify a usable device type, and move the check to the host or controller layer when pass-through is absent.
Steps to check disk health in Linux:
- List the whole-disk device nodes so the health check runs against the disk itself instead of a partition.
$ lsblk -d -e 7,11 -o NAME,PATH,SIZE,MODEL,TRAN
NAME PATH     SIZE MODEL               TRAN
sda  /dev/sda 477G Samsung SSD 870 EVO sata
Use a whole disk such as /dev/sda in the next commands, not a partition such as /dev/sda1. For NVMe, use the device path that smartctl reports with --scan-open, such as /dev/nvme0 or /dev/nvme0n1. If smartctl is missing, install the smartmontools package with the distribution package manager first.
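The device selection above can be sketched as a small filter. This is a minimal sketch, not a definitive script: the function names whole_disks and have_smartctl are hypothetical, and it assumes the installed lsblk supports the PATH and TYPE columns.

```shell
# Hypothetical helper: keep only rows whose TYPE column is "disk".
# Expects "PATH TYPE" pairs on stdin, for example from:
#   lsblk -n -e 7,11 -o PATH,TYPE
whole_disks() {
    awk '$2 == "disk" { print $1 }'
}

# Hypothetical helper: succeeds only when smartctl is on the PATH.
have_smartctl() {
    command -v smartctl >/dev/null 2>&1
}

# Example use (commented out so the sketch has no side effects):
#   have_smartctl || echo "install the smartmontools package first" >&2
#   lsblk -n -e 7,11 -o PATH,TYPE | whole_disks
```

Partitions show up in lsblk with the type part, so they are dropped and only paths such as /dev/sda remain.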
- Read the device identity and current health summary.
$ sudo smartctl --info --health /dev/sda
=== START OF INFORMATION SECTION ===
Device Model: Samsung SSD 870 EVO 500GB
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
A passing summary means the firmware is not reporting an immediate failure condition, but it does not replace the detailed counters in the next step. If the result says Unavailable - device lacks SMART capability, skip to the last step and retry with the detected device type or run the check on the host or controller that owns the physical disk.
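For scripted checks, the exit status of smartctl carries more detail than the PASSED line. The sketch below decodes that status as a bitmask, following the bit meanings listed in the smartctl manual page; the function name decode_smartctl_status is hypothetical, the descriptions are shortened paraphrases, and the bit list should be confirmed against the installed smartmontools version.

```shell
# Hypothetical helper: print one line per bit set in a smartctl exit status.
# Bit meanings are paraphrased from the smartctl manual page.
decode_smartctl_status() {
    status=$1
    bit=0
    while [ "$bit" -le 7 ]; do
        if [ $(( status & (1 << bit) )) -ne 0 ]; then
            case $bit in
                0) meaning="command line did not parse" ;;
                1) meaning="device open failed" ;;
                2) meaning="a SMART or other command to the disk failed" ;;
                3) meaning="SMART status check returned DISK FAILING" ;;
                4) meaning="prefail attributes at or below threshold" ;;
                5) meaning="attributes were at or below threshold in the past" ;;
                6) meaning="device error log contains errors" ;;
                7) meaning="self-test log contains errors" ;;
            esac
            echo "bit $bit: $meaning"
        fi
        bit=$(( bit + 1 ))
    done
}

# Example use (commented out; needs root and a real disk):
#   sudo smartctl --info --health /dev/sda >/dev/null
#   decode_smartctl_status "$?"
```

A status of 0 prints nothing, which matches a clean run with no flagged condition.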
- Review the full SMART report for the counters and logs that usually show trouble before the summary line changes.
$ sudo smartctl --xall /dev/sda
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART Attributes Data Structure revision number: 16
ID# ATTRIBUTE_NAME          VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   100   100   010    Pre-fail Always  -           0
194 Temperature_Celsius     69    58    000    Old_age  Always  -           31
197 Current_Pending_Sector  100   100   000    Old_age  Always  -           0
198 Offline_Uncorrectable   100   100   000    Old_age  Offline -           0
On ATA disks, growing raw values for Reallocated_Sector_Ct, Current_Pending_Sector, or Offline_Uncorrectable are stronger replacement signals than the single health line by itself. On NVMe devices, watch for a rising Media and Data Integrity Errors count or Available Spare dropping to its threshold.
--xall is the fuller report. On ATA disks it includes logs and capability data that --all does not enable.
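The ATA counters named above can be pulled out of the attribute table with a short awk filter. This is a rough sketch under stated assumptions: the function name flag_sector_counters is hypothetical, the raw value is assumed to be a single number in the last column for attributes 5, 197, and 198, and the filter only flags non-zero values, so spotting growth still means comparing runs over time.

```shell
# Hypothetical helper: flag non-zero raw values for attributes 5, 197, 198.
# Reads an ATA attribute table on stdin, for example from:
#   sudo smartctl --attributes /dev/sda
# Assumption: for these attributes the raw value is the last field.
flag_sector_counters() {
    awk '
        $1 ~ /^(5|197|198)$/ && $NF + 0 > 0 {
            printf "WARN %s raw=%s\n", $2, $NF
            bad = 1
        }
        END { exit (bad ? 1 : 0) }   # non-zero exit when anything was flagged
    '
}
```

The function prints nothing and returns success for the sample report in this step, because all three raw values are 0.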
- Start a short self-test when the disk reports that self-tests are supported and the workload can tolerate a background read check.
$ sudo smartctl --test=short /dev/sda
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
A short self-test is non-destructive, but it still reads the device in the background and can add latency on a busy system. If the command reports unavailable SMART support or an unsupported test type, keep the summary and report steps and run the deeper diagnostic from the host, storage controller, or a maintenance environment that exposes the physical disk directly.
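When the test is started from a script, the suggested wait time can be read from the command output instead of being hard-coded. The sketch below assumes the ATA-style "Please wait N minutes" line shown in this step; SAS and NVMe devices may word it differently, and the function name selftest_wait_minutes is hypothetical.

```shell
# Hypothetical helper: print the number of minutes smartctl asks to wait.
# Reads the output of `smartctl --test=short` on stdin.
# Prints nothing when no "Please wait N minutes" line is present.
selftest_wait_minutes() {
    sed -n 's/^Please wait \([0-9][0-9]*\) minutes.*/\1/p'
}

# Example use (commented out; starts a real self-test):
#   minutes=$(sudo smartctl --test=short /dev/sda | selftest_wait_minutes)
#   [ -n "$minutes" ] && sleep "$(( minutes * 60 ))"
```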
- Read the self-test log after the reported wait time to confirm that the latest diagnostic finished cleanly.
$ sudo smartctl --log=selftest /dev/sda
SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed without error  00%        18423            -
A completed short test with no recorded error confirms that the drive finished its latest built-in diagnostic. If the status shows a read failure or an LBA_of_first_error value, copy data off the disk and plan replacement instead of repeating the same test cycle.
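The check on the newest log entry can be automated in a few lines. This is a minimal sketch that assumes the ATA self-test log layout shown in this step, where the latest entry is the line starting with "# 1"; the function name latest_selftest_ok is hypothetical.

```shell
# Hypothetical helper: succeed only if the newest self-test entry is clean.
# Reads `smartctl --log=selftest` output on stdin.
latest_selftest_ok() {
    awk '
        /^# *1 / {
            found = 1
            ok = ($0 ~ /Completed without error/)
        }
        END { exit ((found && ok) ? 0 : 1) }
    '
}

# Example use (commented out; needs root and a real disk):
#   sudo smartctl --log=selftest /dev/sda | latest_selftest_ok \
#       || echo "latest self-test missing or not clean" >&2
```

An empty log also returns failure, so a disk that has never run a self-test is not reported as clean.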
- Retry through the detected device type when an adapter or controller hides the disk behind another storage layer.
$ sudo smartctl --scan-open
/dev/sda -d sat # /dev/sda [SAT], ATA device
$ sudo smartctl --xall --device=sat /dev/sda
##### snipped #####
--scan-open shows the device type that smartctl can actually open. Common examples are --device=sat for many USB-to-SATA bridges, --device=nvme for NVMe devices, and controller-specific forms such as --device=megaraid,N. If the disk is a guest-visible VM disk or a cloud volume and the command still reports unavailable SMART support, run the health check on the host or storage platform that owns the physical device.
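The scan output can be turned into device and type pairs for the retry. The sketch below assumes each usable line has the form shown in this step, with the device path first and the type after -d, and it skips lines that begin with #; the function name scan_device_types is hypothetical.

```shell
# Hypothetical helper: print "DEVICE TYPE" for each line of scan output.
# Reads `smartctl --scan-open` output on stdin.
scan_device_types() {
    awk '
        /^#/ { next }    # skip lines that are entirely commented out
        {
            # find the -d option and print the word that follows it
            for (i = 2; i < NF; i++)
                if ($i == "-d") { print $1, $(i + 1); break }
        }
    '
}

# Example use (commented out; needs root):
#   sudo smartctl --scan-open | scan_device_types |
#   while read -r dev type; do
#       sudo smartctl --health --device="$type" "$dev"
#   done
```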
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.
