An Optimizing Framework on MLIR for Efficient FPGA-based Accelerator Generation (2401.05154v1)

Published 10 Jan 2024 in cs.AR and cs.PL

Abstract: With the increasing demand for computing capability under limited resource and power budgets, it is crucial to deploy applications to customized accelerators like FPGAs. However, FPGA programming is non-trivial. Although existing high-level synthesis (HLS) tools improve productivity to a certain extent, they are limited in scope and capability to support sufficient FPGA-oriented optimizations. This paper focuses on FPGA-based accelerators and proposes POM, an optimizing framework built on multi-level intermediate representation (MLIR). POM has several features which demonstrate its scope and capability of performance optimization. First, most HLS tools depend exclusively on a single-level IR to perform all the optimizations, introducing excessive information into the IR and making debugging an arduous task. In contrast, POM introduces three layers of IR to perform operations at suitable abstraction levels, streamlining the implementation and debugging process and exhibiting greater flexibility, extensibility, and systematic structure. Second, POM integrates the polyhedral model into MLIR, enabling advanced dependence analysis and various FPGA-oriented loop transformations. By representing nested loops with integer sets and maps, loop transformations can be conducted conveniently through manipulations on polyhedral semantics. Finally, to further reduce design effort, POM provides a user-friendly programming interface (DSL) that allows a concise description of computation and includes a rich collection of scheduling primitives. An automatic design space exploration (DSE) engine is provided to search for high-performance optimization schemes efficiently and generate optimized accelerators automatically. Experimental results show that POM achieves a $6.46\times$ average speedup on typical benchmark suites and a $6.06\times$ average speedup on real-world applications compared to the state-of-the-art.
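The abstract's second point, representing nested loops as integer sets and maps so that transformations become manipulations on polyhedral semantics, can be illustrated with a minimal sketch. This is not POM's API; it is a toy Python model of the general polyhedral idea, where a 1-D iteration domain is an explicit integer set and loop tiling is an invertible map on its points (a tool like POM performs the same transformation symbolically on the IR):

```python
# Illustrative sketch (not POM's actual API): polyhedral-style loop tiling
# expressed as a map over an explicit iteration domain.

N, TILE = 8, 4

# Original iteration domain as an integer set: {(i) : 0 <= i < N}.
domain = [(i,) for i in range(N)]

# Tiling map: (i) -> (i // TILE, i % TILE), splitting the loop into an
# outer tile loop and an inner intra-tile loop.
def tile_map(point):
    (i,) = point
    return (i // TILE, i % TILE)

tiled = [tile_map(p) for p in domain]

# The inverse map recovers each original point, so the transformation
# preserves the iteration space -- a legality requirement for any
# polyhedral loop transformation.
def untile(point):
    t, o = point
    return (t * TILE + o,)

assert sorted(untile(p) for p in tiled) == sorted(domain)
```

In a real polyhedral framework these sets and maps are kept symbolic (e.g. as quasi-affine constraints over loop bounds), which is what enables the dependence analysis and FPGA-oriented transformations the abstract describes.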

Authors (6)

  1. Weichuang Zhang
  2. Jieru Zhao
  3. Guan Shen
  4. Quan Chen
  5. Chen Chen
  6. Minyi Guo
