Hard disc reliability

In the midst of my posts about the reliability of optical media, I want to stop for a minute to look at the reliability of magnetic hard disc drives. Two interesting studies on this subject were published recently, which makes it a timely topic to comment on.

The first publication is a paper from Google's research lab titled “Failure Trends in a Large Disk Drive Population”. It is interesting because it relies on the analysis of more than 100,000 drives installed across Google. The authors used a lot of data, collected mostly through SMART, and drew several interesting or counter-intuitive conclusions, such as:

  • High temperatures and heavy use patterns do not correlate positively with failure rates (on the contrary, hotter drives seem to be slightly more reliable).
  • Only four SMART attributes (scan errors, reallocation counts, offline reallocations and probational counts) relate to the reliability of the drive, and even they cannot predict individual failures: 56% of the failed drives gave no such warning.

Another report, from Carnegie Mellon University, is titled “Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you?” and was presented at the 5th USENIX Conference on File and Storage Technologies (FAST '07) in San Jose, CA. It points out that the reliability announced (or promised) by HD manufacturers as MTTF figures is grossly overestimated. Bianca Schroeder and Garth A. Gibson also collected data from about 100,000 hard disc drives. They note that the usual manufacturer MTTF of 1,000,000 to 1,500,000 hours should translate into replacement rates of at most 0.88% per year. In the field, however, they observed that the rate is usually more than 1%, that 2-4% is common, and that the maximum runs as high as 13%.
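
To see where the 0.88% comes from, here is a minimal Python sketch of the arithmetic. It assumes the drives are powered on around the clock (8,760 hours per year) and that the failure rate is constant over time; the function name `annualized_failure_rate` is my own and does not come from the paper.

```python
# Assumption: drives run 24 hours a day, 365 days a year.
HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def annualized_failure_rate(mttf_hours: float) -> float:
    """Nominal fraction of drives expected to fail per year for a given MTTF.

    Assumes a constant failure rate, so the yearly rate is simply
    (hours in a year) / (mean time to failure in hours).
    """
    return HOURS_PER_YEAR / mttf_hours


# The two datasheet MTTF values quoted above.
for mttf in (1_000_000, 1_500_000):
    rate = annualized_failure_rate(mttf)
    print(f"MTTF {mttf:>9,} h -> {rate:.2%} per year")
```

An MTTF of 1,000,000 hours gives about 0.88% per year and 1,500,000 hours gives about 0.58%, so the observed 2-4% is several times the datasheet figure.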

All in all, it seems that magnetic media are still quite reliable, but the reliability figures given by the manufacturers are clearly too optimistic. Is this one more reason to think about installing RAID-based storage solutions?