
SAT-based Exact Modulo Scheduling Mapping for Resource-Constrained CGRAs (2402.12834v2)

Published 20 Feb 2024 in cs.AR

Abstract: Coarse-Grain Reconfigurable Arrays (CGRAs) are emerging low-power architectures designed to accelerate Compute-Intensive Loops (CILs). The effectiveness of CGRAs in providing acceleration relies on the quality of mapping: how efficiently the CIL is compiled onto the platform. State of the Art (SoA) compilation techniques utilize modulo scheduling to minimize the Iteration Interval (II) and use graph algorithms like Max-Clique Enumeration to address mapping challenges. Our work approaches the mapping problem through a satisfiability (SAT) formulation. We introduce the Kernel Mobility Schedule (KMS), an ad-hoc schedule used with the Data Flow Graph and CGRA architectural information to generate Boolean statements that, when satisfied, yield a valid mapping. Experimental results show that SAT-MapIt outperforms SoA alternatives in almost 50% of the explored benchmarks. Additionally, we evaluated the mapping results in a synthesizable CGRA design and emphasized the trends in run-time metrics, i.e., energy efficiency and latency, across different CILs and CGRA sizes. We show that a hardware-agnostic analysis performed on compiler-level metrics can optimally prune the architectural design space, while still retaining Pareto-optimal configurations. Moreover, by exploring how implementation details impact cost and performance on real hardware, we highlight the importance of holistic software-to-hardware mapping flows, such as the one presented herein.
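
To make the idea of "Boolean statements that, when satisfied, yield a valid mapping" more concrete, the following is a minimal sketch of the kind of constraints a modulo-scheduling mapper has to satisfy. It is not the KMS encoding from the paper and it does not use a SAT solver: it checks three simplified constraints by exhaustive search on a toy example. The names `neighbors`, `is_valid`, and `find_mapping`, the mesh topology, the single-cycle operations, and the one-hop routing rule are all assumptions made for illustration. The search starts from the standard resource lower bound on the II (number of operations divided by number of processing elements, rounded up) and increases the II until a placement is found.

```python
from itertools import product
from math import ceil


def neighbors(pe, rows, cols):
    """PEs reachable in one hop on a rows x cols mesh, including pe itself."""
    r, c = divmod(pe, cols)
    out = {pe}
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            out.add(nr * cols + nc)
    return out


def is_valid(placement, edges, rows, cols, ii):
    """Check three illustrative constraints on a {node: (pe, time)} placement."""
    # C1: no two nodes share a PE in the same kernel slot (time mod II).
    slots = [(pe, t % ii) for pe, t in placement.values()]
    if len(slots) != len(set(slots)):
        return False
    for src, dst in edges:
        (pe_s, t_s), (pe_d, t_d) = placement[src], placement[dst]
        # C2: a consumer is scheduled strictly after its producer.
        if t_d <= t_s:
            return False
        # C3 (simplified): the consumer sits on the producer's PE or a neighbor.
        if pe_d not in neighbors(pe_s, rows, cols):
            return False
    return True


def find_mapping(nodes, edges, rows, cols, max_ii=8):
    """Return (ii, placement) for the smallest feasible II, or None."""
    n_pes = rows * cols
    horizon = len(nodes)  # enough time steps for any dependency chain
    choices = [(pe, t) for pe in range(n_pes) for t in range(horizon)]
    ii = ceil(len(nodes) / n_pes)  # resource lower bound on the II
    while ii <= max_ii:
        # Exhaustive search stands in for the SAT solver in this sketch.
        for combo in product(choices, repeat=len(nodes)):
            placement = dict(zip(nodes, combo))
            if is_valid(placement, edges, rows, cols, ii):
                return ii, placement
        ii += 1
    return None


if __name__ == "__main__":
    # Toy data-flow graph: c consumes a and b, d consumes c.
    nodes = ["a", "b", "c", "d"]
    edges = [("a", "c"), ("b", "c"), ("c", "d")]
    print(find_mapping(nodes, edges, rows=2, cols=2))
```

On this toy graph and a 2x2 mesh, the resource bound alone suggests an II of 1, but that is infeasible under these constraints: with an II of 1 every node needs its own PE, and `c` would need three distinct neighboring PEs for `a`, `b`, and `d` while a 2x2 mesh offers only two. The search therefore settles on an II of 2. A real mapper would also model register pressure and multi-cycle routing, which this sketch ignores.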
