ACPO: AI-Enabled Compiler-Driven Program Optimization (2312.09982v2)
Abstract: The key to performance optimization of a program is to decide correctly when a certain transformation should be applied by a compiler. This is an ideal opportunity to apply machine-learning models to speed up the tuning process; while this realization has been around since the late 1990s, only recent advancements in ML have enabled a practical application of ML to compilers as an end-to-end framework. This paper presents ACPO: AI-Enabled Compiler-driven Program Optimization, a novel framework that provides LLVM with simple and comprehensive tools to benefit from employing ML models for different optimization passes. We first showcase the high-level view, class hierarchy, and functionalities of ACPO and subsequently demonstrate a couple of use cases of ACPO by ML-enabling the Loop Unroll and Function Inlining passes, and describe how ACPO can be leveraged to optimize other passes. Experimental results reveal that the ACPO model for Loop Unroll gains on average 4% compared to LLVM's O3 optimization when deployed on Polybench. Furthermore, by adding the Inliner model as well, ACPO provides gains of up to 4.5% and 2.4% on Polybench and Cbench, respectively, compared with LLVM's O3 optimization.
- Amir H. Ashouri
- Muhammad Asif Manzoor
- Duc Minh Vu
- Raymond Zhang
- Ziwen Wang
- Angel Zhang
- Bryan Chan
- Tomasz S. Czajkowski
- Yaoqing Gao