DG-RePlAce: A Dataflow-Driven GPU-Accelerated Analytical Global Placement Framework for Machine Learning Accelerators (2404.13049v2)
Abstract: Global placement is a fundamental step in VLSI physical design. The wide use of 2D processing element (PE) arrays in machine learning accelerators poses new challenges of scalability and Quality of Results (QoR) for state-of-the-art academic global placers. In this work, we develop DG-RePlAce, a new and fast GPU-accelerated global placement framework built on top of the OpenROAD infrastructure, which exploits the inherent dataflow and datapath structures of machine learning accelerators. Experimental results with a variety of machine learning accelerators using a commercial 12nm enablement show that, compared with RePlAce (DREAMPlace), our approach achieves an average reduction in routed wirelength by 10% (7%) and total negative slack (TNS) by 31% (34%), with faster global placement and on-par total runtimes relative to DREAMPlace. Empirical studies on the TILOS MacroPlacement Benchmarks further demonstrate that post-route improvements over RePlAce and DREAMPlace may reach beyond the motivating application to machine learning accelerators.