
Analyzing and Improving Hardware Modeling of Accel-Sim (2401.10082v1)

Published 18 Jan 2024 in cs.AR

Abstract: GPU architectures have become popular for executing general-purpose programs. Their many-core architecture supports a large number of concurrently running threads, which hides the latency between dependent instructions. In modern GPU architectures, each SM/core is typically composed of several sub-cores, each with its own independent pipeline. Simulators are a key tool for investigating novel concepts in computer architecture. They must be performance-accurate and model the target hardware faithfully in order to expose its bottlenecks properly. This paper presents a broad analysis of different parts of Accel-Sim, a popular GPGPU simulator, and some improvements to its model. First, we focus on the front-end and develop a more realistic model. Then, we analyze the way the result bus works and develop a more realistic one. Next, we describe the current memory pipeline model and propose a model for a more cost-effective design. Finally, we discuss other areas of improvement of the simulator.

Authors (3)
  1. Rodrigo Huerta (3 papers)
  2. Mojtaba Abaie Shoushtary (4 papers)
  3. Antonio González (64 papers)
