HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond
Abstract: Machine learning (ML) techniques have been applied to high-level synthesis (HLS) flows for quality-of-result (QoR) prediction and design space exploration (DSE). Nevertheless, the scarcity of accessible high-quality HLS datasets and the complexity of building such datasets present challenges. Existing datasets are limited in benchmark coverage, design space enumeration, and vendor extensibility, or lack reproducible and extensible software for dataset construction. Many works also lack user-friendly ways to add more designs, limiting wider adoption of such datasets. In response to these challenges, we introduce HLSFactory, a comprehensive framework designed to facilitate the curation and generation of high-quality HLS design datasets. HLSFactory has three main stages: 1) a design space expansion stage to elaborate single HLS designs into large design spaces using various optimization directives across multiple vendor tools, 2) a design synthesis stage to execute HLS and FPGA tool flows concurrently across designs, and 3) a data aggregation stage for extracting standardized data into packaged datasets for ML usage. This three-stage architecture ensures broad design space coverage via design space expansion and supports multiple vendor tools. Users can contribute to each stage with their own HLS designs and synthesis results, and can extend the framework itself with custom frontends and tool flows. We also include an initial set of built-in designs from common HLS benchmarks and curated open-source HLS designs. We showcase the versatility and multi-functionality of our framework through seven case studies: I) ML model for QoR prediction; II) Design space sampling; III) Fine-grained parallelism backend speedup; IV) Targeting Intel's HLS flow; V) Adding new auxiliary designs; VI) Integrating published HLS data; VII) HLS tool version regression benchmarking.
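The three stages named in the abstract can be pictured with a minimal Python sketch. This is an illustration only and does not use or reproduce the HLSFactory API: the directive names in `DIRECTIVE_SPACE`, the design name `gemm`, and the functions `expand_design_space`, `fake_synthesize`, `run_all`, and `aggregate_to_csv` are all hypothetical, and the "synthesis" step is a toy cost model standing in for a real vendor tool invocation.

```python
import csv
import io
import itertools
from concurrent.futures import ThreadPoolExecutor

# Hypothetical directive options for a single design (assumed for illustration).
DIRECTIVE_SPACE = {
    "unroll_factor": [1, 2, 4],
    "pipeline": [False, True],
    "array_partition": [1, 2],
}


def expand_design_space(design_name, space):
    """Stage 1 (sketch): enumerate every combination of directive values."""
    keys = sorted(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield {"design": design_name, **dict(zip(keys, values))}


def fake_synthesize(point):
    """Stage 2 stand-in: a real flow would call a vendor HLS tool here.

    The latency and LUT numbers are a toy model, not real QoR data.
    """
    parallelism = point["unroll_factor"] * point["array_partition"]
    latency = 1024 // parallelism // (2 if point["pipeline"] else 1)
    luts = 100 * parallelism + (50 if point["pipeline"] else 0)
    return {**point, "latency_cycles": latency, "luts": luts}


def run_all(points, max_workers=4):
    """Stage 2 (sketch): run the per-design flow concurrently, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_synthesize, points))


def aggregate_to_csv(rows):
    """Stage 3 (sketch): collect per-design results into one tabular dataset."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


points = list(expand_design_space("gemm", DIRECTIVE_SPACE))
results = run_all(points)
dataset_csv = aggregate_to_csv(results)
```

In this sketch a single design expands into 3 × 2 × 2 = 12 design points, each of which is "synthesized" independently, which is why the synthesis stage parallelizes naturally across designs. A thread pool is used here only because a real flow would mostly wait on external tool processes; that choice is an assumption of the sketch, not a statement about how the framework schedules work.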