Analyzing and Improving Hardware Modeling of Accel-Sim (2401.10082v1)
Abstract: GPU architectures have become popular for executing general-purpose programs. Their many-core architecture supports a large number of concurrently running threads, which hides the latency between dependent instructions. In modern GPU architectures, each SM/core is typically composed of several sub-cores, each with its own independent pipeline. Simulators are a key tool for investigating novel concepts in computer architecture. They must be performance-accurate and must model the target hardware faithfully so that its bottlenecks can be explored properly. This paper presents a broad analysis of different parts of Accel-Sim, a popular GPGPU simulator, and proposes some improvements to its model. First, we focus on the front-end and develop a more realistic model. Then, we analyze how the result bus works and develop a more realistic model of it. Next, we describe the current memory pipeline model and propose a model for a more cost-effective design. Finally, we discuss other areas where the simulator could be improved.
- Rodrigo Huerta
- Mojtaba Abaie Shoushtary
- Antonio González