I/O in Machine Learning Applications on HPC Systems: A 360-degree Survey (2404.10386v1)
Abstract: High-Performance Computing (HPC) systems excel at managing distributed workloads, and the growing interest in AI has led to a surge in demand for faster methods of Machine Learning (ML) model training and inference. In the past, research on HPC I/O focused on optimizing the underlying storage system for modeling and simulation applications and on checkpointing their results, making writes the dominant I/O operation. These applications typically access large portions of the data written by simulations or experiments. ML workloads, in contrast, issue many small reads spread across a large number of randomly selected files. This shift in I/O access patterns poses several challenges to HPC storage systems. In this paper, we survey I/O in ML applications on HPC systems, targeting literature within a six-year window from 2019 to 2024. We provide an overview of the common phases of ML, review available profilers and benchmarks, examine the I/O patterns encountered during ML training, explore I/O optimizations used in modern ML frameworks and proposed in recent literature, and, lastly, present gaps requiring further R&D. We seek to summarize the common practices ML applications use to access data and to expose research gaps that could spawn further R&D.
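To make the contrast with simulation-style I/O concrete, the following is a minimal sketch of the access pattern described above: one small read per sample, drawn in shuffled order from a directory containing many per-sample files. It assumes PyTorch is available; the data directory and the per-sample payload are illustrative placeholders, not part of the surveyed paper.

```python
# Sketch of the ML training I/O pattern: many small, randomly ordered reads,
# one per sample file, rather than a few large sequential writes.
import os
from torch.utils.data import Dataset, DataLoader

class FilePerSampleDataset(Dataset):
    """One small file per sample, read on demand in shuffled order."""
    def __init__(self, root_dir):
        # Assumes a flat directory of per-sample files (placeholder path below).
        self.paths = sorted(
            os.path.join(root_dir, name) for name in os.listdir(root_dir)
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # A single small read per sample -- the pattern that stresses
        # HPC parallel file systems tuned for large sequential I/O.
        with open(self.paths[index], "rb") as f:
            raw = f.read()
        # Return only the byte count to keep the sketch free of decoders;
        # real pipelines would decode and augment the sample here.
        return len(raw)

if __name__ == "__main__":
    dataset = FilePerSampleDataset("/path/to/training_samples")  # placeholder
    # shuffle=True randomizes which files are touched each epoch;
    # num_workers overlaps the small reads with computation.
    loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)
    for batch in loader:
        pass  # the training step would consume the batch here
```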