Disk Imaging Problems

Disk imaging is not a trivial task. Degraded drives and drives whose physical defects have been repaired often yield a high read error rate, or read instability. Read instability means that multiple readings of the same data tend to yield different results each time. Read instability differs from cases where data originally written in error is read consistently each time, or where a physical scratch makes an entire part of the disk unreadable no matter how many read attempts are made.

What Causes Read Instability?

Repaired drives may have high read instability for a number of reasons. Donor parts, while nominally identical to the original failed parts, differ slightly due to tolerances in the manufacturing process. The majority of modern hard disk drives are individually tested at the factory and have optimized adaptive parameters burned into the read-only memory. If one or more parts are later replaced, the adaptive parameters are no longer optimized to the new configuration. Because modern drives are fine-tuned to obtain maximum performance and capacity, even small deviations from optimum can introduce a high rate of read errors.

In drives that have not been repaired, physical degradation due to normal use also means that adaptive parameters are no longer optimum, thus leading to read instability. For example, wear in the actuator bearings causes slight deviations in the distance the arm moves across the platter. As a result, the current that was once required to move the actuator to a certain track is no longer sufficient to move the arm precisely. Read errors can also be caused by dust infiltration into the disk drive over the course of its life, which may make the read process electronically noisy.

The important point is that failed drives tend to have read instability no matter what the reason for failure.

For the purposes of imaging a disk, read instability is not a major obstacle in principle, since each sector can be read several times and statistical methods can be applied to obtain the most likely correct value for each byte. However, because of the way that the basic input/output system (BIOS), operating system (OS), and drive firmware behave, it is impossible to obtain data during drive read instability without specialized hardware and software.

Why System Software Can’t Handle Read Instability

System software consists of a computer’s BIOS and OS. The primary reason that computers can’t handle read instability without specialized tools is that the system software reads each disk sector presuming that the sector is not corrupted—that is, that the data is read-stable. A sector on a disk typically consists of a header, 512 bytes of data, and error correction code (ECC) bytes (Figure 1). Other bytes are present to facilitate read/write synchronization.

For simplicity, the ECC bytes can be thought of as a checksum to validate data integrity. Most current hard disk drives are Advanced Technology Attachment (ATA) compliant devices. The system software reads the drive sector-by-sector, using the standard ATA read sector command. If a sector has a wrong ECC checksum, it returns no data, just an error, despite the fact that most of the bytes in the sector are likely correct and all are actually readable, even if some of the data is corrupt. The system software is unable to access data byte-by-byte, using other ATA read commands that read data regardless of ECC status.

 Example of Sector Format.

The system software reads drives in this way because the mandate of computer architecture is to create a reliable, stable machine that runs at the highest possible speed. The system software cannot allow errors to be propagated from the hard drive to the computer’s random access memory (RAM). These errors might be part of an executable file that would then cause the computer to stop responding. Thus, system software has a low tolerance for read instability and assumes that the drive is working correctly without delving into drive errors except to flag whole sectors as bad and to avoid reading them.

Also, in many cases the system software is unable to handle drive errors, a situation that may cause the system to stop responding.

Using the system software to read a drive is also time-consuming and involves intense drive processing, leading to further wear. This wear increases the likelihood of further disk degradation during the imaging process. See “Processing All Bytes in Sectors with Errors ” for more information on how disk imaging is by its nature very demanding on the drive, involving multiple reads on the entire disk and modifying its firmware data. Since progressive disk degradation is common (that is, read instability increases with use), it is desirable to reduce the
demands on the drive.

Thus, even a relatively low level of read instability, which is common for drives in recovery, can lead to the computer not being able to read the drive at all, despite the fact that the drive is still readable by other means.