ML-based Modeling to Predict I/O Performance on Different Storage Sub-systems (2312.06131v2)
Abstract: Parallel applications can spend a significant amount of time performing I/O on large-scale supercomputers. Fast near-compute storage accelerators called burst buffers can reduce the time a processor spends performing I/O and mitigate I/O bottlenecks. However, determining whether a given application can be accelerated using burst buffers is not straightforward, even for storage experts. The relationship between an application's I/O characteristics (such as I/O volume, number of processes involved, etc.) and the best storage sub-system for it can be complicated. As a result, adapting parallel applications to use burst buffers efficiently is a trial-and-error process. In this work, we present a Python-based tool called PrismIO that enables programmatic analysis of I/O traces. Using PrismIO, we identify bottlenecks on burst buffers and parallel file systems and explain why certain I/O patterns perform poorly. Further, we use machine learning to model the relationship between I/O characteristics and burst buffer selection. We run IOR (an I/O benchmark) with various I/O characteristics on different storage systems, collect the performance data, and use it to train the model. The model predicts whether a file of an application should be placed on burst buffers, with an accuracy of 94.47% for unseen IOR scenarios and 95.86% for four real applications.
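A minimal sketch of the kind of prediction pipeline the abstract describes: a classifier trained on per-run I/O characteristics, labeled by the storage tier that performed better. The feature names, the toy data, and the choice of a random forest are illustrative assumptions, not the paper's actual model or dataset.

```python
# Hypothetical sketch: train a classifier on IOR-derived I/O characteristics
# to predict whether a file should go to burst buffers (BB) or the parallel
# file system (PFS). Features, data, and model choice are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Each row represents one IOR run configuration and the faster tier observed.
data = pd.DataFrame({
    "num_processes":    [64, 128, 256, 512, 64, 256],
    "io_volume_mb":     [1024, 4096, 16384, 65536, 512, 32768],
    "transfer_size_kb": [64, 256, 1024, 4096, 64, 2048],
    "files_per_proc":   [1, 1, 4, 4, 1, 4],
    "best_tier":        ["BB", "BB", "PFS", "PFS", "BB", "PFS"],  # label
})

X = data.drop(columns=["best_tier"])
y = data["best_tier"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Evaluate on held-out (unseen) run configurations.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

In practice the training data would come from many IOR runs on both storage sub-systems, with the label derived from measured I/O time on each, rather than the toy rows shown here.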