Lessons Learned Migrating CUDA to SYCL: A HEP Case Study with ROOT RDataFrame (2401.13310v1)

Published 24 Jan 2024 in cs.DC

Abstract: The world's largest particle accelerator, located at CERN, produces petabytes of data that must be analysed efficiently to study the fundamental structures of our universe. ROOT is an open-source C++ data analysis framework developed for this purpose. Its high-level data analysis interface, RDataFrame, currently supports only CPU parallelism. Given the increasing heterogeneity of computing facilities, it becomes crucial to support GPGPUs efficiently in order to take advantage of the available resources. SYCL allows for a single-source implementation, which enables support for different architectures. In this paper, we describe a CUDA implementation and its migration to SYCL, focusing on histogramming, a core high-energy physics operation in RDataFrame. We detail the challenges that we faced when integrating SYCL into a large and complex code base. Furthermore, we perform an extensive comparative performance analysis of two SYCL compilers, AdaptiveCpp and DPC++, against the reference CUDA implementation. We highlight the performance bottlenecks that we encountered and the methodology used to detect them. Based on our findings, we provide actionable insights for developers of SYCL applications.
