
Enterprise Disk Drive Scrubbing Based on Mondrian Conformal Predictors (2306.17169v1)

Published 1 Jun 2023 in cs.DC, cs.AI, and cs.LG

Abstract: Disk scrubbing is a process aimed at resolving read errors on disks by reading data from the disk. However, scrubbing the entire storage array at once can adversely impact system performance, particularly during periods of high input/output operations. Additionally, the continuous reading of data from disks when scrubbing can result in wear and tear, especially on larger capacity disks, due to the significant time and energy consumption involved. To address these issues, we propose a selective disk scrubbing method that enhances the overall reliability and power efficiency in data centers. Our method employs a Machine Learning model based on Mondrian Conformal prediction to identify specific disks for scrubbing, by proactively predicting the health status of each disk in the storage pool, forecasting n-days in advance, and using an open-source dataset. For disks predicted as non-healthy, we mark them for replacement without further action. For healthy drives, we create a set and quantify their relative health across the entire storage pool based on the predictor's confidence. This enables us to prioritize selective scrubbing for drives with established scrubbing frequency based on the scrub cycle. The method we propose provides an efficient and dependable solution for managing enterprise disk drives. By scrubbing just 22.7% of the total storage disks, we can achieve optimized energy consumption and reduce the carbon footprint of the data center.
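
The selection step described in the abstract can be illustrated with a minimal sketch of class-conditional (Mondrian) conformal prediction. This is not the authors' implementation. The 0-100 risk score, the nonconformity function, the calibration values, the disk IDs, the significance level, the "uncertain" bucket, and the rule of scrubbing the least confident healthy disks first are all assumptions made for illustration. Only the general recipe is standard: compute one p-value per label using calibration examples of that label alone, keep the labels whose p-value exceeds the significance level, and take confidence as one minus the p-value of the rejected label.

```python
from collections import defaultdict

LABELS = ("healthy", "failing")

def nonconformity(risk, label):
    # Assumption: `risk` is a hypothetical model output in 0..100,
    # where a higher value means the disk looks more likely to fail.
    return risk if label == "healthy" else 100 - risk

def mondrian_p_values(calibration, risk):
    # Mondrian (class-conditional) conformal p-values: each label is
    # calibrated only against calibration examples carrying that label.
    scores = defaultdict(list)
    for cal_risk, cal_label in calibration:
        scores[cal_label].append(nonconformity(cal_risk, cal_label))
    p_values = {}
    for label in LABELS:
        alpha = nonconformity(risk, label)
        at_least = sum(1 for s in scores[label] if s >= alpha)
        p_values[label] = (at_least + 1) / (len(scores[label]) + 1)
    return p_values

def triage(calibration, pool, significance=0.25):
    # Split the pool into scrub candidates, replacements, and uncertain disks.
    scrub, replace, uncertain = [], [], []
    for disk_id, risk in pool.items():
        p = mondrian_p_values(calibration, risk)
        predicted = {label for label in LABELS if p[label] > significance}
        if predicted == {"healthy"}:
            # Confidence of a singleton prediction: 1 - p-value of the other label.
            scrub.append((1 - p["failing"], disk_id))
        elif predicted == {"failing"}:
            replace.append(disk_id)
        else:
            # Empty or two-label prediction set (hypothetical handling).
            uncertain.append(disk_id)
    scrub.sort()  # assumption: least confident healthy disks are scrubbed first
    return [disk_id for _, disk_id in scrub], replace, uncertain

# Hypothetical calibration data: (risk score, true label).
CALIBRATION = (
    [(r, "healthy") for r in (1, 3, 5, 8, 10, 12, 15, 20, 30)]
    + [(r, "failing") for r in (15, 30, 40, 50, 60, 70, 80, 90, 95)]
)
# Hypothetical storage pool: disk ID -> risk score.
POOL = {"d1": 4, "d2": 18, "d3": 75, "d4": 25}

scrub_order, to_replace, unsure = triage(CALIBRATION, POOL)
print("scrub first to last:", scrub_order)
print("replace:", to_replace)
print("uncertain:", unsure)
```

In this toy pool only two of the four disks end up in the scrub queue, which mirrors the idea of scrubbing a fraction of the pool rather than the whole array. How confidence maps to an actual scrub frequency is left open here.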

