Performance Modeling of Data Storage Systems using Generative Models (2307.02073v2)
Abstract: High-precision modeling of systems is one of the main areas of industrial data analysis. Models of systems, known as digital twins, are used to predict system behavior under various conditions. We have developed several models of a storage system using machine learning-based generative models. The system consists of several components: hard disk drive (HDD) and solid-state drive (SSD) storage pools with different RAID schemes, and a cache. Each storage component is represented by a probabilistic model that describes the probability distribution of the component's performance, in terms of IOPS and latency, as a function of its configuration and external data load parameters. Experiments show prediction errors of 4-10% for IOPS and 3-16% for latency, depending on the component and the model. The predictions show up to 0.99 Pearson correlation with Little's law, which can be used for unsupervised reliability checks of the models. In addition, we present novel data sets that can be used for benchmarking regression algorithms, conditional generative models, and uncertainty estimation methods in machine learning.
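The Little's law check mentioned in the abstract can be illustrated with a minimal sketch. Little's law states L = λW: the mean number of requests in flight equals throughput times mean latency. The sketch assumes that predicted IOPS play the role of λ, predicted latency (taken here to be in milliseconds) plays the role of W, and the implied L is compared against the queue depth set by the load generator. The abstract does not spell out this procedure, so this is one plausible reading and not the authors' implementation; the names `configured_depth`, `pred_iops`, and `pred_latency_ms` are hypothetical, and the numbers are made up for illustration.

```python
import math


def littles_law_depth(iops, latency_ms):
    """Implied mean number of in-flight requests, L = lambda * W.

    iops is the throughput in requests per second (lambda);
    latency_ms is the mean latency in milliseconds (W, converted to seconds).
    """
    return [x * (w / 1000.0) for x, w in zip(iops, latency_ms)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical model outputs for four load configurations (illustrative only).
configured_depth = [1, 4, 16, 64]                   # outstanding requests set by the load
pred_iops = [5000.0, 16000.0, 40000.0, 64000.0]     # predicted throughput
pred_latency_ms = [0.2, 0.25, 0.4, 1.0]             # predicted mean latency

implied = littles_law_depth(pred_iops, pred_latency_ms)
r = pearson(configured_depth, implied)
print("implied queue depths:", implied)
print("Pearson r:", r)
```

Because the check needs only the model's own predictions and the known load parameters, it requires no ground-truth measurements, which is what makes it usable as an unsupervised sanity check. The made-up numbers above were chosen to satisfy the law, so the correlation comes out close to 1; predictions that violate the law would lower it.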