
I/O in Machine Learning Applications on HPC Systems: A 360-degree Survey (2404.10386v1)

Published 16 Apr 2024 in cs.DC, cs.AI, and cs.LG

Abstract: High-Performance Computing (HPC) systems excel at managing distributed workloads, and the growing interest in AI has resulted in a surge in demand for faster methods of Machine Learning (ML) model training and inference. In the past, research on HPC I/O focused on optimizing the underlying storage system for modeling and simulation applications and on checkpointing their results, making writes the dominant I/O operation. These applications typically access large portions of the data written by simulations or experiments. ML workloads, in contrast, perform small I/O reads spread across a large number of randomly selected files. This shift in I/O access patterns poses several challenges to HPC storage systems. In this paper, we survey I/O in ML applications on HPC systems, targeting literature within a six-year window from 2019 to 2024. We provide an overview of the common phases of ML, review available profilers and benchmarks, examine the I/O patterns encountered during ML training, explore I/O optimizations used in modern ML frameworks and proposed in recent literature, and lastly present gaps requiring further R&D. We seek to summarize the common practices used in accessing data by ML applications and to expose research gaps that could spawn further work.
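
The access pattern the abstract contrasts with simulation checkpointing, many small reads scattered across a large number of files visited in random order, is easiest to see in a training framework's data-loading loop. The sketch below is a minimal illustration, assuming PyTorch's torch.utils.data API and a hypothetical file-per-sample dataset directory; it is not code from the paper.

# Minimal sketch (assumes PyTorch) of the file-per-sample access pattern the
# abstract describes: each training step issues small reads against randomly
# chosen files, unlike the large sequential writes of simulation checkpoints.
# The directory path and .npy layout are illustrative assumptions.
import os
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class FilePerSampleDataset(Dataset):
    """Each sample lives in its own small .npy file on the parallel file system."""
    def __init__(self, root_dir):
        self.paths = sorted(
            os.path.join(root_dir, f)
            for f in os.listdir(root_dir)
            if f.endswith(".npy")
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # One small, effectively random read per requested sample.
        sample = np.load(self.paths[idx])
        return torch.from_numpy(sample)

if __name__ == "__main__":
    ds = FilePerSampleDataset("/path/to/dataset")  # hypothetical location
    # shuffle=True randomizes which files are touched each epoch;
    # num_workers overlaps these small reads with compute on the accelerator.
    loader = DataLoader(ds, batch_size=64, shuffle=True, num_workers=4)
    for batch in loader:
        pass  # training step would go here

With shuffle=True, the set of files read, and the order they are read in, changes every epoch, which is precisely the random small-read workload that stresses HPC storage systems tuned for large sequential writes.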

