LightningSimV2: Faster and Scalable Simulation for High-Level Synthesis via Graph Compilation and Optimization (2404.09471v2)

Published 15 Apr 2024 in cs.PF and cs.AR

Abstract: High-Level Synthesis (HLS) enables rapid prototyping of complex hardware designs by translating C or C++ code to low-level RTL code. However, the testing and evaluation of HLS designs still typically rely on slow RTL-level simulators that can take hours to provide feedback, especially for complex designs. A recent work, LightningSim, helps address this problem by providing a simulation workflow one to two orders of magnitude faster than RTL simulation. However, it still exhibits inefficiencies due to several types of redundant computation, which make it slow for simulating large designs and for design space exploration (DSE). To address these inefficiencies, we introduce LightningSimV2, a much faster and more scalable simulation tool. LightningSimV2 features three main innovations. First, we perform compile-time static analysis that exploits the repetitive structures in HLS designs, e.g., loops, to reduce the simulation workload. Second, we propose a novel graph-based simulation approach that decouples the simulation graph construction step from the graph traversal step, significantly reducing repeated computation. Third, benefiting from this decoupled approach, LightningSimV2 can perform incremental stall analysis extremely fast, enabling highly efficient DSE over large numbers of complex hardware parameters, e.g., optimal FIFO depths. Moreover, the DSE is well suited to parallel computing, which further improves its efficiency. Compared with LightningSim, LightningSimV2 achieves up to 3.5x speedup in full simulation and up to 577x speedup for incremental DSE. Our code is open-source on GitHub at https://github.com/sharc-lab/LightningSim/tree/v0.2.0.
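The idea of decoupling graph construction from graph traversal, and then reusing the constructed graph to sweep FIFO depths, can be illustrated with a toy model. This is a minimal sketch under our own assumptions, not LightningSimV2's data structures or API: a single producer and a single consumer share one FIFO, each side issues one access every `prod_ii` or `cons_ii` cycles, a token can be read one cycle after it is written, and a write can proceed one cycle after the read that frees its slot. All function names are hypothetical. Events become graph nodes, latencies become edge weights, and each event's earliest cycle is its longest path from the start.

```python
from graphlib import TopologicalSorter

# Hypothetical toy model (not LightningSimV2's code): one producer writes
# n_tokens into a FIFO and one consumer reads them back out.
Node = tuple[str, int]                      # ("w", i) = i-th write, ("r", i) = i-th read
Preds = dict[Node, list[tuple[Node, int]]]  # node -> [(predecessor, latency in cycles)]


def build_static_graph(n_tokens: int, prod_ii: int, cons_ii: int) -> Preds:
    """Construction step: edges that do not depend on the FIFO depth."""
    preds: Preds = {}
    for i in range(n_tokens):
        w, r = ("w", i), ("r", i)
        # The producer issues one write every prod_ii cycles.
        preds[w] = [(("w", i - 1), prod_ii)] if i > 0 else []
        # Assumed: a token can be read one cycle after it is written.
        preds[r] = [(w, 1)]
        if i > 0:
            # The consumer issues one read every cons_ii cycles.
            preds[r].append((("r", i - 1), cons_ii))
    return preds


def add_fifo_edges(static: Preds, n_tokens: int, depth: int) -> Preds:
    """Overlay depth-dependent back-pressure edges on a copy of the static graph."""
    if depth < 1:
        raise ValueError("depth must be >= 1 (depth 0 deadlocks in this model)")
    preds = {node: list(edges) for node, edges in static.items()}
    for i in range(depth, n_tokens):
        # Write i needs a free slot; assumed available one cycle after read i - depth.
        preds[("w", i)].append((("r", i - depth), 1))
    return preds


def traverse(preds: Preds) -> dict[Node, int]:
    """Traversal step: each event's earliest cycle is its longest path from the start."""
    sorter = TopologicalSorter({n: [p for p, _ in edges] for n, edges in preds.items()})
    time: dict[Node, int] = {}
    for node in sorter.static_order():  # predecessors are always yielded first
        time[node] = max((time[p] + lat for p, lat in preds[node]), default=0)
    return time


def explore_depths(n_tokens: int, prod_ii: int, cons_ii: int,
                   depths: list[int]) -> dict[int, tuple[int, int]]:
    """Build the static graph once, then re-traverse it for every candidate depth."""
    static = build_static_graph(n_tokens, prod_ii, cons_ii)
    results = {}
    for d in depths:
        t = traverse(add_fifo_edges(static, n_tokens, d))
        # (cycle of the producer's last write, cycle of the consumer's last read)
        results[d] = (t[("w", n_tokens - 1)], t[("r", n_tokens - 1)])
    return results


print(explore_depths(8, prod_ii=1, cons_ii=3, depths=[1, 2, 4, 8]))
# prints {1: (20, 22), 2: (17, 22), 4: (11, 22), 8: (7, 22)}
```

With a fast producer and a slow consumer, deeper FIFOs let the producer finish earlier (fewer stall cycles) while the consumer's last read stays at cycle 22, so the consumer is the bottleneck. Each depth is evaluated independently against the same static graph, which is why such a sweep parallelizes naturally. Unlike a true incremental stall analysis, this sketch re-traverses the whole graph for every depth.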

References
  1. R. Sarkar and C. Hao, “LightningSim: Fast and accurate trace-based simulation for high-level synthesis,” in 2023 IEEE 31st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). Marina Del Rey, CA, USA: IEEE, May 2023, pp. 1–11.
  2. R. Sarkar, S. Abi-Karam, Y. He, L. Sathidevi, and C. Hao, “FlowGNN: A dataflow architecture for real-time workload-agnostic graph neural network inference,” in 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA). Montreal, QC, Canada: IEEE, Feb. 2023, pp. 1099–1112.
  3. S. Abi-Karam, R. Sarkar, D. Xu, Z. Fan, Z. Wang, and C. Hao, “INR-Arch: A dataflow architecture and compiler for arbitrary-order gradient computations in implicit neural representation processing,” in Proceedings of the 42nd IEEE/ACM International Conference on Computer-Aided Design, ser. ICCAD ’23. San Francisco, CA, USA: Association for Computing Machinery, Nov. 2023.
  4. M. Abderehman, J. Patidar, J. Oza, Y. Nigam, T. A. Khader, and C. Karfa, “FastSim: A fast simulation framework for high-level synthesis,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 41, no. 5, pp. 1371–1385, May 2022.
  5. Y.-K. Choi, Y. Chi, J. Wang, and J. Cong, “FLASH: Fast, parallel, and accurate simulator for HLS,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 12, pp. 4828–4841, Dec. 2020.
  6. J. Zhai, J. Hu, X. Tang, X. Ma, and W. Chen, “Cypress: Combining static and dynamic analysis for top-down communication trace compression,” in SC14: International Conference for High Performance Computing, Networking, Storage and Analysis. New Orleans, LA, USA: IEEE, Nov. 2014, pp. 143–153.
  7. C. Lattner and V. Adve, “LLVM: A compilation framework for lifelong program analysis & transformation,” in Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization, ser. CGO ’04. USA: IEEE Computer Society, Mar. 2004, p. 75.
  8. T. Hilbrich, B. R. de Supinski, M. Schulz, and M. S. Müller, “A graph based approach for MPI deadlock detection,” in Proceedings of the 23rd International Conference on Supercomputing, ser. ICS ’09. New York, NY, USA: Association for Computing Machinery, Jun. 2009, pp. 296–305.
  9. Xilinx, “Basic examples for Vitis HLS,” GitHub, Apr. 2021.
  10. R. Kastner, J. Matai, and S. Neuendorffer, “Parallel programming for FPGAs,” May 2018.
  11. Xilinx, “Vitis accel examples’ repository,” GitHub, Aug. 2022.
  12. X. Zhang, H. Lu, C. Hao, J. Li, B. Cheng, Y. Li, K. Rupnow, J. Xiong, T. Huang, H. Shi, W.-M. Hwu, and D. Chen, “SkyNet: A hardware-efficient method for object detection and tracking on embedded systems,” Proceedings of Machine Learning and Systems, vol. 2, pp. 216–229, Mar. 2020.