Code Generation for a Variety of Accelerators for a Graph DSL (2401.02472v1)
Abstract: Sparse graphs are ubiquitous in real and virtual worlds. With the phenomenal growth in semi-structured and unstructured data, the sizes of the underlying graphs have grown rapidly over the years. Analyzing such large structures necessitates parallel processing, which is challenged by the intrinsic irregularity of sparse computation, memory access, and communication. Ideally, programmers and domain experts would focus only on the sequential computation while a compiler auto-generates the parallel code. On the other hand, there is a wide variety of target hardware devices, and achieving optimal performance often demands coding in specific languages or frameworks. Our goal in this work is to focus on a graph DSL that allows domain experts to write almost-sequential code, and to generate parallel code for different accelerators from the same algorithmic specification. In particular, we illustrate code generation from the StarPlat graph DSL for NVIDIA, AMD, and Intel GPUs using the CUDA, OpenCL, SYCL, and OpenACC programming languages. Using a suite of ten large graphs and four popular algorithms, we present the efficacy of StarPlat's versatile code generator.