Performance Modeling of Data Storage Systems using Generative Models (2307.02073v2)

Published 5 Jul 2023 in cs.LG, cs.AI, and cs.PF

Abstract: High-precision modeling of systems is one of the main areas of industrial data analysis. Models of systems, known as digital twins, are used to predict system behavior under various conditions. We have developed several models of a storage system using machine learning-based generative models. The system consists of several components: hard disk drive (HDD) and solid-state drive (SSD) storage pools with different RAID schemes, and a cache. Each storage component is represented by a probabilistic model that describes the probability distribution of the component's performance in terms of IOPS and latency, conditioned on its configuration and on the parameters of the external data load. The experiments show errors of 4-10% for IOPS predictions and 3-16% for latency predictions, depending on the system component and the model used. The predictions show a Pearson correlation of up to 0.99 with Little's law, which can be used for unsupervised reliability checks of the models. In addition, we present novel data sets that can be used for benchmarking regression algorithms, conditional generative models, and uncertainty estimation methods in machine learning.
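The reliability check mentioned in the abstract rests on Little's law, L = λW: under a closed-loop load, the mean number of in-flight I/O requests L equals throughput λ (IOPS) times mean latency W. The Python sketch below shows one way such a check, and a percentage-error metric, might be computed. It is not the authors' code. The abstract does not name its error metric, so mean absolute percentage error (MAPE) is an assumption here. The helper functions and the load fields `iodepth` and `numjobs` are hypothetical, and the number of outstanding I/Os is assumed to equal `iodepth * numjobs`, as in fio-style load descriptions.

```python
import numpy as np


def implied_outstanding_io(iops, latency_s):
    """Little's law L = lambda * W: mean number of in-flight I/Os implied by
    throughput (IOPS, requests per second) times mean latency (seconds)."""
    return np.asarray(iops, dtype=float) * np.asarray(latency_s, dtype=float)


def littles_law_correlation(iops, latency_s, outstanding_io):
    """Pearson correlation between the Little's-law estimate and the number of
    outstanding I/Os fixed by the load. No measured performance is needed, so a
    model whose predicted IOPS and latency disagree with each other stands out."""
    implied = implied_outstanding_io(iops, latency_s)
    outstanding = np.asarray(outstanding_io, dtype=float)
    return float(np.corrcoef(implied, outstanding)[0, 1])


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumed metric)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(100.0 * np.mean(np.abs(y_pred - y_true) / np.abs(y_true)))


if __name__ == "__main__":
    # Hypothetical closed-loop loads: outstanding I/Os = iodepth * numjobs.
    iodepth = np.array([1, 2, 4, 8, 16, 32])
    numjobs = np.array([1, 1, 2, 2, 4, 4])
    outstanding = iodepth * numjobs

    # Hypothetical model predictions: latencies consistent with Little's law
    # up to a +/-5% multiplicative error.
    iops_pred = np.array([5000, 9000, 20000, 30000, 40000, 42000], dtype=float)
    factor = np.array([1.05, 0.95, 1.05, 0.95, 1.05, 0.95])
    latency_pred = outstanding / iops_pred * factor

    r = littles_law_correlation(iops_pred, latency_pred, outstanding)
    print(f"Pearson r with Little's law: {r:.4f}")
    print(f"MAPE on a toy pair: {mape([100, 200], [110, 180]):.1f}%")
```

Because the check compares the model's own IOPS and latency outputs against a quantity set by the load, it needs no ground-truth measurements, which is what would make it usable as an unsupervised sanity check.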
