
Formal Definitions and Performance Comparison of Consistency Models for Parallel File Systems (2402.14105v2)

Published 21 Feb 2024 in cs.DC and cs.OS

Abstract: The semantics of HPC storage systems are defined by the consistency models by which they abide. Storage consistency models have been less studied than their counterparts in memory systems, with the exception of the POSIX standard and its strict consistency model. The use of POSIX consistency imposes a performance penalty that becomes more significant as the scale of parallel file systems increases and the access time to storage devices, such as node-local solid-state storage devices, decreases. While some efforts have been made to adopt relaxed storage consistency models, these models are often defined informally and ambiguously as by-products of a particular implementation. In this work, we establish a connection between memory consistency models and storage consistency models and revisit the key design choices of storage consistency models from a high-level perspective. Further, we propose a formal and unified framework for defining storage consistency models and a layered implementation that can be used to easily evaluate their relative performance for different I/O workloads. Finally, we conduct a comprehensive performance comparison of two relaxed consistency models on a range of commonly seen parallel I/O workloads, such as checkpoint/restart of scientific applications and random reads of deep learning applications. We demonstrate that for certain I/O scenarios, a weaker consistency model can significantly improve the I/O performance. For instance, for the small random reads typically found in deep learning applications, session consistency achieved a 5x improvement in I/O bandwidth compared to commit consistency, even at small scales.
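
The contrast between the two relaxed models mentioned above can be made concrete with a short sketch. The snippet below is an illustration, not code from the paper: it uses plain POSIX-style file operations via Python's os module, with comments marking where each model is generally understood to publish updates (session consistency at close/open boundaries, in the spirit of NFS close-to-open semantics; commit consistency at explicit commit points). The file path is a hypothetical placeholder.

```python
import os

path = "/tmp/ckpt_rank0.dat"  # hypothetical checkpoint file used for illustration

# Writer side: under session consistency, the data written here is only
# guaranteed to be visible to other clients once the file is closed,
# i.e., when the write session ends.
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
os.write(fd, b"checkpoint block")
os.close(fd)  # end of the write session publishes the updates

# Reader side: opening the file after the writer's close starts a new session,
# so the read below is guaranteed to observe the writer's completed writes.
fd = os.open(path, os.O_RDONLY)
data = os.read(fd, 1 << 20)
os.close(fd)

# Under commit consistency, visibility would instead be tied to an explicit
# commit operation (an fsync-like publication point) rather than to
# close/open pairs.
```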
