Backblaze report set to examine impact of temperature spikes on hard drives
Backblaze, a specialised storage cloud platform, has released its latest "Drive Stats" report, which will now analyse the impact of temperature spikes on the failures of hard drives in data centres. This could provide crucial insights for IT decision-making, particularly regarding resilience and data centre cooling costs.
The report, "Drive Stats Q3 2023," continues Backblaze's decade-long analysis of hard drive statistics. The results are compiled by monitoring 263,992 hard disk drives (HDDs) and solid-state drives (SSDs) across Backblaze's global data centres.
Of these, 3,242 are SSDs and 1,217 are HDDs used as boot drives; the failure rates of both groups are analysed separately. The key focus of this report, however, is the failure rates of the remaining 259,533 HDDs used as data drives.
Backblaze's review covers the data drives' quarterly and lifetime failure rates as of Q3 2023. For the first time, the drive failure rates are broken down by data centre, with each drive's location reported through the fields vault_id, pod_id, datacentre, cluster_id, and pod_slot_num, as part of the company's ongoing commitment to transparency.
The Q3 2023 Drive Stats reveal significant developments in drive performance. The first notable change is the introduction of the WDC 22TB drives (model: WUH722222ALE6L4), with a Backblaze Vault comprising 1,200 of these drives now operational. Installed on September 29, the drives each had only one day of service in this reporting period, and no failures have been reported so far.
On the opposite end of the time-in-service spectrum, the 6TB Seagate drives (model: ST6000DX000) showcased resilience with an average of 101 months in operation. This cohort, consisting of 883 drives, reported zero failures in Q3 2023, resulting in a lifetime Annualised Failure Rate (AFR) of 0.88%.
Six different drive models achieved zero failures during the quarter. However, only the 6TB Seagate model surpassed the minimum of 50,000 drive days in the quarter, the threshold Backblaze uses to ensure there is enough data for a reliable AFR calculation.
Conversely, four drive models reported one failure each during Q3. After applying the same 50,000 drive-day threshold, two drive models stood out in this category, providing insight into their reliability under real-world conditions.
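The article does not spell out how an AFR is derived, so the following is a minimal Python sketch, assuming the commonly used definition of failures per drive-year of service expressed as a percentage. The function names and the five-failure example are hypothetical; the 50,000 drive-day threshold and the 1,200 drive days for the new WDC 22TB drives come from the figures above.

```python
from typing import Optional

# Threshold quoted in the article: more than 50,000 drive days are needed
# before an AFR is treated as meaningful.
MIN_DRIVE_DAYS = 50_000


def annualised_failure_rate(failures: int, drive_days: int) -> float:
    """Return the AFR in percent: failures per drive-year of service."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365
    return failures / drive_years * 100


def afr_if_enough_data(
    failures: int, drive_days: int, min_drive_days: int = MIN_DRIVE_DAYS
) -> Optional[float]:
    """Return the AFR, or None when the cohort has too few drive days."""
    if drive_days <= min_drive_days:
        return None
    return annualised_failure_rate(failures, drive_days)


# Hypothetical cohort: 5 failures over 365,000 drive days (1,000 drive-years).
established = afr_if_enough_data(failures=5, drive_days=365_000)

# From the article: 1,200 new drives with one day of service each.
new_vault = afr_if_enough_data(failures=0, drive_days=1_200 * 1)

print(established)  # roughly 0.5 (percent)
print(new_vault)    # None: too few drive days to report an AFR
```

Returning None for small cohorts, rather than 0%, keeps a one-day-old vault with zero failures from looking more reliable than drives with years of service.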
Backblaze also revealed that during Q3 2023, 354 individual drives, out of the 259,533 data drives in operation, exceeded their maximum manufacturer temperature at least once. Only two of these drives failed, leaving 352 drives still operational. Despite this, in anticipation of increasingly hot summers, Backblaze's data centre teams are investigating the root causes and planning for future temperature management.
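To put these counts in proportion, the short Python sketch below works out the share of data drives that ran too hot and the share of those that failed. It uses only the figures quoted above; the variable names are illustrative, and the percentages are simple ratios rather than annualised failure rates.

```python
# Figures quoted in the article for Q3 2023.
data_drives = 259_533   # HDD data drives in operation
hot_drives = 354        # drives that exceeded their maximum temperature at least once
hot_failures = 2        # hot drives that subsequently failed

# Share of the fleet that exceeded the manufacturer's maximum temperature.
hot_share_pct = hot_drives / data_drives * 100

# Share of those hot drives that failed during the quarter.
hot_failed_pct = hot_failures / hot_drives * 100

# Hot drives still in service.
still_operational = hot_drives - hot_failures

print(f"Hot drives: {hot_share_pct:.2f}% of data drives")
print(f"Failed hot drives: {hot_failed_pct:.2f}% of hot drives")
print(f"Still operational: {still_operational}")
```

With only two failures in the group, the sample is too small to say whether heat raised the failure rate.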
Backblaze's analysis suggests that a data centre's location could affect hard drive performance. For instance, the data centre sac0 had the highest AFR, possibly due to its ageing drive models and storage pods.
Simultaneously, the data centre has grown considerably since its inception a year ago, thanks to new data and customers using Backblaze's cloud replication capability.
Finally, the report commented on the lifetime AFR data: "You might have noticed the AFR for all drives hasn't changed much from quarter to quarter. It has fluctuated between 1.39% and 1.45% for the last two years."
"We have lots of drives with lots of time-in-service, so moving the needle up or down is hard. While the lifetime stats for individual drive models can be handy, the lifetime AFR for all drives will probably get less and less attractive as we add more and more drives."
"Of course, a few hundred thousand drives that never fail could arrive, so we will continue to calculate and present the lifetime AFR," the Backblaze report stated.
Furthermore, Backblaze will introduce a separate cohort of drives, dubbed 'Hot Drives', from Q4 2023 onwards. The company will track drives that have exceeded their maximum temperature and compare their failure rates with those of drives operating within their manufacturer's specifications. The initiative is intended to show whether high temperatures lead to more frequent drive failures.