What Are the Common Causes of Hard Disk Drive Failure?

1. Electronic Component Failure
2. Motor Failure
3. Read / Write Head Failure
4. Media Damage
5. Firmware Corruption
6. Logical Failure

One or more of the above primary causes may be evident when diagnosing a failed hard disk drive.

Electronic Component Failure

Electronic components may fail due to voltage transients, heat or poor handling. Substitution, repair and re-programming are generally required in order to recover data stored on the hard disk. However, PCB assemblies are finely tuned to the individual drive at the manufacturing stage, so specialist re-programming and calibration are then required to restore the hard disk to a working condition.

Motor Failure

Hard disk spindle motors have fluid bearings; sometimes this fluid leaks or overheats and becomes ineffective. The motor then seizes and the hard disk platters fail to rotate. Relocating the platters and components to another hard disk assembly is required to effect a repair and recover the data.

Read / Write Head Failure

Read / write heads are aerodynamically designed to “fly” at nanometer distances above the surface of the platters. Ceramic thin film sensors at their tips detect the magnetic information (data) stored on the surface of the platter. Occasionally the atmosphere in the hard disk enclosure becomes contaminated, or vibration disturbs the dynamics of the head. This disturbance causes the read / write process to malfunction, resulting in bad read / write cycles and eventual failure.

This type of failure usually manifests itself as a distinct clicking noise as the head actuator makes repeated failed attempts to locate data at the same platter track location.

Media Damage

All computer hard disk magnetic storage media are manufactured with imperfections, but at acceptable and controlled levels. During normal operation the imperfections sometimes increase above the predefined acceptable level. This can be due to heat, vibration, head crash, shock or other factors. The operating system will flag errors or fail to boot, and data files will then become inaccessible. Read / write head replacement and file repair allow the data file structures to be examined and assessed for validity.

Firmware Corruption

Hard disk firmware holds precise parameters relevant to the configuration of the assembly at the time of manufacture. Occasionally the firmware becomes corrupt or will “roll back” to an incorrect set of parameters. Under these conditions the location of the stored data as reported to the operating system will be lost. Simple restoration of the correct parameters will allow the hard disk to function correctly. What causes this corruption? Possible causes include software bugs between the operating system and the drive and control bus protocol failures. The exact cause is difficult to determine, but such failures do occur.

Logical Failure

Data files are stored at logical locations that relate to a number of physical locations on the surface of the hard disk platters. These logical locations are held in tables by the operating system and indexed when running specific software applications. Operating system errors, reloads or incorrectly applied upgrades will sometimes corrupt these tables, and data will become inaccessible. This is generally referred to as a logical failure. Logical errors can be repaired with software tools available from the internet. Be cautious, however: if you run a fix utility on your disk you can inadvertently damage these tables irreparably, and your data will be unrecoverable. This is especially true when running ScanDisk and Chkdsk on a damaged hard drive.
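
The relationship between the logical tables and the physical data can be pictured with a toy model. The sketch below is only an illustration: the names `blocks`, `file_table` and `read_file` are hypothetical, and real file systems are far more complex. It shows that losing a table entry makes a file inaccessible even though its data is still on the platter, and that a careless repair which overwrites a block destroys that data for good.

```python
# Toy model of a logical failure. All names are hypothetical and the
# structure is greatly simplified compared with a real file system.

# Physical block number -> raw contents stored on the platter.
blocks = {
    7: b"Hello, ",
    3: b"hard ",
    9: b"disk!",
}

# Logical table: file name -> ordered list of physical blocks.
file_table = {
    "note.txt": [7, 3, 9],
}


def read_file(name, table, storage):
    """Follow the table entry for `name` and join its blocks in order."""
    if name not in table:
        raise FileNotFoundError(name)
    return b"".join(storage[number] for number in table[name])


# Healthy disk: the table leads to the data.
intact = read_file("note.txt", file_table, blocks)

# Logical failure: the table entry is lost, but the blocks are untouched.
corrupted_table = {}
try:
    read_file("note.txt", corrupted_table, blocks)
    accessible = True
except FileNotFoundError:
    accessible = False

# A careless "fix" that treats block 3 as free space and overwrites it.
blocks_after_bad_fix = dict(blocks)
blocks_after_bad_fix[3] = b"\x00" * 5
recoverable_after_bad_fix = blocks_after_bad_fix[3] == blocks[3]
```

In this model the file can still be rebuilt after the table is lost, because blocks 7, 3 and 9 are intact; once block 3 has been overwritten, no table repair can bring the original contents back.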

Hard Drive Failure Signs

  • Your computer “freezes” too often (the picture on the screen is still and does not react to the mouse or keyboard).
  • Regular boot problems. These may be a sign of bad sectors on the disk holding corrupted boot data.
  • Your computer is terribly slow while accessing, saving and opening files.
  • The usual sound produced by your hard drive is louder than before.
  • Regular appearance of a BSOD (Blue Screen of Death), or of “Operating system not found”, “Missing Operating System” or “your hard drive is not formatted” messages at startup.

Even if you haven’t backed up your files yet, these signs give you a chance and some time to copy the data before the drive crashes.

Far more ominous signs are:

  • Your computer is still running normally, but you can hear unusual metallic sounds (grinding, clicking, whirring, scratching, buzzing). That is a very bad sign that may imply mechanical damage.
  • You cannot hear any hard drive sounds at all. The disk spins and produces sounds when information is written to or read from it, and you are probably accustomed to these normal sounds. For example, a hard drive becomes silent when its internal components expand and get stuck because of overheating.
  • Your hard drive is clicking or producing grinding metallic sounds, and your computer won’t recognize the hard disk. This is a sign that hard drive failure has already happened.

If any of the above occurs, shut down immediately and contact a disk recovery service! If you keep your computer running, the platters may be damaged and your files will be unrecoverable. Also, if your hard drive has suffered mechanical damage or was exposed to water, fire, smoke or high temperatures, don’t try to power it up. Contact a disk recovery service.
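
The two groups of signs above amount to a simple decision rule, which can be sketched as follows. This is a minimal illustration only: the symptom labels and the `triage` function are hypothetical names chosen for this example, and the sketch is no substitute for a proper diagnosis.

```python
# Hypothetical symptom labels based on the two lists in this article.
WARNING_SIGNS = {
    "freezes",
    "boot_problems",
    "slow_file_access",
    "louder_than_usual",
    "startup_error_messages",
}
CRITICAL_SIGNS = {
    "metallic_noise",
    "no_drive_sound",
    "drive_not_recognized",
    "physical_damage",  # mechanical damage, water, fire, smoke, heat
}


def triage(symptoms):
    """Return the action suggested by the article for the given symptoms."""
    observed = set(symptoms)
    # Critical signs take priority: keeping the drive running may
    # damage the platters.
    if observed & CRITICAL_SIGNS:
        return "shut down and contact a disk recovery service"
    if observed & WARNING_SIGNS:
        return "copy your data now"
    return "no action suggested"
```

For example, `triage(["slow_file_access", "metallic_noise"])` gives the shut-down advice, because a single critical sign outweighs any number of milder ones.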

Preventive recovery action in hard disk drives

1. A method in a data processing system for minimizing read/write errors caused by impaired performance of a hard disk drive during runtime operation of said hard disk drive, said runtime operation including an active mode during which read/write operations are performed and a standby mode during which no read/write operation is underway, said method comprising the steps of: monitoring at least one performance parameter of a hard disk drive during said standby mode of operation; and in response to detecting a degraded value of said at least one performance parameter during said monitoring, performing preventive recovery action only during said standby mode of operation, wherein said preventive recovery action includes restoring said performance parameter to an acceptable value without interfering with hard disk drive operation during an active mode.

2. The method of claim 1 wherein said performance parameter is signal resolution, and wherein said step of performing preventive recovery action comprises the step of adjusting a fly height of a read/write head within said hard disk drive, such that said signal resolution is maintained at an acceptable level.

3. The method of claim 1, wherein said data processing system includes a disk drive controller associated with said disk drive, said method further comprising the steps of: during said step of monitoring at least one performance parameter, detecting a degradation of said performance parameter beyond a pre-determined value; and in response to detecting a degradation of said performance parameter, performing preventive recovery action during said standby mode, wherein said preventive recovery action instructs said disk drive controller to undertake corrective action to rectify the degraded performance parameter.

4. The method of claim 1, further comprising the steps of: detecting a read/write error during said active mode of operation, said error having a cause that is correlated to said performance parameter; and in response to detecting a read/write error during said active mode of operation, examining said performance parameter during said standby mode, such that said cause may be diagnosed and further read/write errors prevented.

5. The method of claim 4, further comprising the step of correlating said preventive recovery action to said cause of said read/write error, such that said cause may be corrected.

6. The method of claim 4, wherein said step of examining said at least one performance parameter is preceded by the steps of: initiating a data recovery procedure during said active mode; and upon completion of said data recovery procedure, initiating preventive recovery action during said standby mode, such that a subsequent read/write error may be prevented.

7. The method of claim 6, wherein the step of initiating preventive recovery action during said standby mode is followed by the steps of: determining whether said cause has been corrected by said preventive recovery action; in response to said cause having been corrected, continuing said runtime operation of said hard disk drive; and in response to said cause having not been corrected, utilizing predictive failure analysis to issue a warning, such that said hard disk drive may be taken off-line.
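
The method claims above can be pictured as a small control loop. The sketch below is a loose, simplified reading of claims 1, 2 and 7 and not the patented implementation: the `Drive` class, the `ACCEPTABLE_RESOLUTION` threshold and the idea that a fly-height adjustment fully restores signal resolution are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Assumed threshold for the monitored parameter; not taken from the claims.
ACCEPTABLE_RESOLUTION = 0.80


@dataclass
class Drive:
    """Hypothetical stand-in for a hard disk drive and its controller."""
    signal_resolution: float            # the monitored performance parameter
    mode: str = "standby"               # "active" or "standby"
    fly_height_adjustable: bool = True  # whether recovery can succeed
    log: list = field(default_factory=list)


def preventive_recovery(drive):
    """Assumed recovery step: adjust fly height to restore resolution."""
    if drive.fly_height_adjustable:
        drive.signal_resolution = ACCEPTABLE_RESOLUTION
    drive.log.append("recovery attempted")


def standby_check(drive):
    """Run one monitoring pass; act only while the drive is in standby."""
    if drive.mode != "standby":
        return "skipped"  # never interfere with active read/write operations
    if drive.signal_resolution >= ACCEPTABLE_RESOLUTION:
        return "ok"
    preventive_recovery(drive)
    if drive.signal_resolution >= ACCEPTABLE_RESOLUTION:
        return "recovered"  # runtime operation continues
    # Recovery did not help: warn so the drive can be taken off-line.
    drive.log.append("predictive failure warning")
    return "warn"
```

A degraded drive in standby, such as `Drive(0.5)`, is recovered silently. The same drive with `mode="active"` is left alone. One whose fly height cannot be adjusted ends with a predictive failure warning in its log.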

8. A system for preventing read/write failures within a hard disk drive during runtime operation of said hard disk drive, said runtime operation including an active mode during which read/write operations are performed and a standby mode during which no read/write operation is underway, said hard disk drive including a controller for providing electromechanical control of said hard disk drive, said system comprising: means within a disk controller for monitoring a performance parameter of said hard disk drive during said standby mode of operation; means responsive to a detected degradation of said performance parameter for producing an error signal indicative of a potential hard disk drive failure; and means responsive to receiving said error signal for initiating preventive recovery action only during a standby mode of operation, wherein said preventive recovery action includes restoring said performance parameter to an acceptable value without interfering with hard disk drive operation during an active mode.

9. The system of claim 8, wherein said means for monitoring a performance parameter of a hard disk drive and said means for producing an error signal in response to detection of a potential hard disk drive failure, are predictive failure analysis instruction means.

10. The system of claim 9, further comprising: a controller for providing electromechanical control of said hard disk drive, said controller receiving and executing said predictive failure analysis instructions.

11. The system of claim 9, wherein said means for initiating preventive recovery action only during a standby mode of operation are preventive recovery action instruction means included within said controller.
