Efficient deadlock avoidance for 2D mesh NoCs that use OQ or VOQ routers (2303.10526v4)
Abstract: Networks-on-chip (NoCs) are a widely used approach for scaling multi-core designs to many-core ones, as well as for interconnecting other vital system-on-chip (SoC) components. In a 2D mesh-based NoC, each entity has a router, essentially a 5-port switch, that forwards packets between the dimensions as well as to and from the entity itself. With respect to the routing algorithm, there are important trade-offs between routing performance and the efficiency of overcoming potential deadlocks. Common deadlock avoidance techniques, including the turn model, usually restrict the paths a packet can take, at the cost of a higher probability of network congestion. In contrast, deadlock resolution techniques, as well as some avoidance schemes, provide more path flexibility at the expense of hardware complexity, for example by incorporating (or assuming) dedicated buffers. This paper provides a deadlock avoidance algorithm for NoC routers based on output queues (OQs) or virtual output queues (VOQs), with a focus on their use on field-programmable gate arrays (FPGAs). The proposed approach imposes fewer path restrictions than common techniques and can use an existing routing algorithm as its baseline, whether or not that algorithm is deadlock-free. It requires no modification to the queueing topology, and the required logic is minimal. Our algorithm approaches the performance of fully adaptive algorithms while maintaining deadlock freedom.
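To make the path-restriction trade-off concrete, the following is a minimal sketch of dimension-order (XY) routing on a 5-port mesh router. XY routing is a well-known restrictive baseline, not the algorithm proposed in this paper: a packet first exhausts its X offset and only then moves in Y, so it never turns from the Y dimension back to X. The names `Port`, `xy_route` and `trace`, and the coordinate convention (east is +x, north is +y), are assumptions made for illustration.

```python
from enum import Enum


class Port(Enum):
    """The five ports of a 2D mesh router (names are illustrative)."""
    LOCAL = "local"
    NORTH = "north"
    SOUTH = "south"
    EAST = "east"
    WEST = "west"


# Assumed coordinate convention: east is +x, north is +y.
STEP = {
    Port.EAST: (1, 0),
    Port.WEST: (-1, 0),
    Port.NORTH: (0, 1),
    Port.SOUTH: (0, -1),
}


def xy_route(cur, dst):
    """Pick the output port at router `cur` for a packet headed to `dst`.

    The X offset is resolved first; Y movement starts only once X matches,
    so a Y-to-X turn can never occur.
    """
    cx, cy = cur
    dx, dy = dst
    if dx > cx:
        return Port.EAST
    if dx < cx:
        return Port.WEST
    if dy > cy:
        return Port.NORTH
    if dy < cy:
        return Port.SOUTH
    return Port.LOCAL  # arrived: deliver to the attached entity


def trace(src, dst):
    """Return the list of routers a packet visits from `src` to `dst`."""
    path = [src]
    cur = src
    while (port := xy_route(cur, dst)) is not Port.LOCAL:
        ox, oy = STEP[port]
        cur = (cur[0] + ox, cur[1] + oy)
        path.append(cur)
    return path


if __name__ == "__main__":
    print(trace((0, 0), (2, 1)))
```

Because every source and destination pair has exactly one permitted path under this scheme, traffic cannot steer around a congested link, which is the cost of path restriction that the abstract refers to.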