The Parallel Semantics Program Dependence Graph (2402.00986v1)

Published 1 Feb 2024 in cs.PL

Abstract: A compiler's intermediate representation (IR) defines a program's execution plan by encoding its instructions and their relative order. Compiler optimizations aim to replace a given execution plan with a semantically equivalent one that improves the program's performance on the target architecture. Alternative representations of an IR, like the Program Dependence Graph (PDG), aid this process by capturing the minimum set of constraints that semantically equivalent execution plans must satisfy. Parallel programming models like OpenMP extend a sequential execution plan with the possibility of running instructions in parallel, creating a parallel execution plan. Recently introduced parallel IRs, like Tapir, explicitly encode a parallel execution plan. These new IRs finally make it possible for compilers to change the parallel execution plan expressed by programmers to better fit the target parallel architecture. Unfortunately, parallel IRs do not help compilers identify the set of parallel execution plans that preserve the original semantics. In other words, we still lack an alternative representation of parallel IRs that captures the minimum set of constraints parallel execution plans must satisfy to be semantically equivalent. The PDG is not an ideal candidate for this task, however, because it was designed for sequential code. We propose the Parallel Semantics Program Dependence Graph (PS-PDG) to precisely capture the salient program constraints that all semantically equivalent parallel execution plans must satisfy. This paper defines the PS-PDG, justifies the necessity of each extension to the PDG, and demonstrates the increased optimization power of the PS-PDG over an existing PDG-based automatically parallelizing compiler. Compilers can now rely on the PS-PDG to select different parallel execution plans while maintaining the same original semantics.
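To make the idea of a "minimum set of constraints" concrete, here is a minimal sketch of the classic PDG intuition, assuming a toy straight-line program in which each statement is modeled only by hypothetical read and write sets. It builds just the data-dependence edges (read-after-write, write-after-read, write-after-write) and omits control dependences entirely. It then checks whether a reordering respects those edges and whether two statements are unordered and could therefore run in parallel. This is an illustration, not the PS-PDG and not the authors' implementation; the names `data_dependences`, `respects`, and `may_run_in_parallel` are hypothetical.

```python
from itertools import combinations

# Toy model (assumption): each statement is (name, vars read, vars written).
# Real PDGs are built over IR instructions and also carry control
# dependences, which this sketch deliberately omits.
Stmt = tuple[str, frozenset[str], frozenset[str]]


def data_dependences(stmts: list[Stmt]) -> set[tuple[str, str]]:
    """Return edges (a, b) meaning statement a must execute before b."""
    edges = set()
    for i, j in combinations(range(len(stmts)), 2):  # i < j in program order
        a, reads_a, writes_a = stmts[i]
        b, reads_b, writes_b = stmts[j]
        raw = writes_a & reads_b   # b reads a value that a wrote
        war = reads_a & writes_b   # b overwrites a value that a read
        waw = writes_a & writes_b  # both write the same variable
        if raw or war or waw:
            edges.add((a, b))
    return edges


def respects(order: list[str], edges: set[tuple[str, str]]) -> bool:
    """True if a sequential execution plan satisfies every dependence edge."""
    pos = {name: k for k, name in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in edges)


def may_run_in_parallel(a: str, b: str, edges: set[tuple[str, str]]) -> bool:
    """True if neither statement transitively depends on the other."""
    succ: dict[str, set[str]] = {}
    for x, y in edges:
        succ.setdefault(x, set()).add(y)

    def reaches(src: str, dst: str) -> bool:
        # Iterative depth-first search over dependence edges.
        stack, seen = [src], set()
        while stack:
            n = stack.pop()
            if n == dst:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(succ.get(n, ()))
        return False

    return not reaches(a, b) and not reaches(b, a)


prog = [
    ("S1", frozenset({"a"}), frozenset({"x"})),       # x = a + 1
    ("S2", frozenset({"b"}), frozenset({"y"})),       # y = b * 2
    ("S3", frozenset({"x", "y"}), frozenset({"z"})),  # z = x + y
]
edges = data_dependences(prog)
print(sorted(edges))                           # [('S1', 'S3'), ('S2', 'S3')]
print(respects(["S2", "S1", "S3"], edges))     # True
print(respects(["S3", "S1", "S2"], edges))     # False
print(may_run_in_parallel("S1", "S2", edges))  # True
```

Any order consistent with the edges is an equally valid sequential plan, which is the role the abstract attributes to the PDG. This toy graph only reasons about orderings of sequential code; the abstract's point is that such a graph alone does not capture the constraints of explicitly parallel programs, which is what the PS-PDG's extensions are meant to address.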
