The Landscape and Challenges of HPC Research and LLMs (2402.02018v3)

Published 3 Feb 2024 in cs.LG

Abstract: Recently, language models (LMs), especially large language models (LLMs), have revolutionized the field of deep learning. Both encoder-decoder models and prompt-based techniques have shown immense potential for natural language processing and code-based tasks. Over the past several years, many research labs and institutions have invested heavily in high-performance computing, approaching or breaching exascale performance levels. In this paper, we posit that adapting and utilizing such LLM-based techniques for tasks in high-performance computing (HPC) would be very beneficial. This study presents our reasoning behind the aforementioned position and highlights how existing ideas can be improved and adapted for HPC tasks.

Authors (17)
  1. Nesreen K. Ahmed (76 papers)
  2. Akash Dutta (8 papers)
  3. Arijit Bhattacharjee (4 papers)
  4. Sixing Yu (12 papers)
  5. Quazi Ishtiaque Mahmud (5 papers)
  6. Waqwoya Abebe (6 papers)
  7. Hung Phan (9 papers)
  8. Aishwarya Sarkar (6 papers)
  9. Branden Butler (2 papers)
  10. Niranjan Hasabnis (21 papers)
  11. Gal Oren (38 papers)
  12. Vy A. Vo (11 papers)
  13. Juan Pablo Munoz (4 papers)
  14. Theodore L. Willke (21 papers)
  15. Tim Mattson (10 papers)
  16. Ali Jannesari (56 papers)
  17. Le Chen (71 papers)
Citations (9)
