The Parallel Semantics Program Dependence Graph (2402.00986v1)
Abstract: A compiler's intermediate representation (IR) defines a program's execution plan by encoding its instructions and their relative order. Compiler optimizations aim to replace a given execution plan with a semantically equivalent one that increases the program's performance on the target architecture. Alternative representations of an IR, like the Program Dependence Graph (PDG), aid this process by capturing the minimum set of constraints that semantically equivalent execution plans must satisfy. Parallel programming models like OpenMP extend a sequential execution plan by adding the possibility of running instructions in parallel, creating a parallel execution plan. Recently introduced parallel IRs, like TAPIR, explicitly encode a parallel execution plan. These new IRs finally make it possible for compilers to change the parallel execution plan expressed by programmers to better fit the target parallel architecture. Unfortunately, parallel IRs do not help compilers identify the set of parallel execution plans that preserve the original semantics. In other words, we still lack an alternative representation of parallel IRs that captures the minimum set of constraints that parallel execution plans must satisfy to be semantically equivalent. The PDG is not an ideal candidate for this task because it was designed for sequential code. We propose the Parallel Semantics Program Dependence Graph (PS-PDG) to precisely capture the salient program constraints that all semantically equivalent parallel execution plans must satisfy. This paper defines the PS-PDG, justifies the necessity of each extension to the PDG, and demonstrates the increased optimization power of the PS-PDG over an existing PDG-based automatic-parallelizing compiler. Compilers can now rely on the PS-PDG to select different parallel execution plans while maintaining the same original semantics.
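To make the idea of "the minimum set of constraints that semantically equivalent execution plans must satisfy" concrete, here is a minimal sketch of a sequential dependence graph and the orderings it admits. It is an illustration only, not the paper's PS-PDG or any compiler's implementation: the three-instruction program, the names `instructions`, `dependences`, `respects`, and `valid_plans`, and the brute-force enumeration are all assumptions made for this example.

```python
from itertools import permutations

# Hypothetical three-instruction program (not taken from the paper):
#   i1: a = load(x)
#   i2: b = load(y)
#   i3: c = a + b
instructions = ["i1", "i2", "i3"]

# Dependence edges (producer, consumer): i3 reads the values that i1 and i2 define.
dependences = {("i1", "i3"), ("i2", "i3")}


def respects(order, deps):
    """Return True if every producer appears before its consumer in `order`."""
    position = {inst: idx for idx, inst in enumerate(order)}
    return all(position[src] < position[dst] for src, dst in deps)


def valid_plans(insts, deps):
    """Enumerate every sequential order that satisfies all dependence edges."""
    return [order for order in permutations(insts) if respects(order, deps)]


plans = valid_plans(instructions, dependences)
```

In this toy setting `i1` and `i2` have no edge between them, so both relative orders are admitted, which is the kind of freedom a dependence graph exposes to an optimizer. The sketch covers only sequential orderings; the abstract's point is that describing the admissible parallel plans requires extensions beyond this.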