
Scalable and Effective Arithmetic Tree Generation for Adder and Multiplier Designs (2405.06758v1)

Published 10 May 2024 in cs.LG

Abstract: Across a wide range of hardware scenarios, the computational efficiency and physical size of the arithmetic units significantly influence the speed and footprint of the overall hardware system. Nevertheless, prior arithmetic design techniques prove inadequate: they do not sufficiently optimize speed and area, resulting in a reduced processing rate and larger module size. To boost arithmetic performance, in this work, we focus on the two most common and fundamental arithmetic modules: adders and multipliers. We cast the design tasks as single-player tree generation games, leveraging reinforcement learning techniques to optimize their arithmetic tree structures. Such a tree generation formulation allows us to efficiently navigate the vast search space and discover superior arithmetic designs that improve computational efficiency and hardware size within just a few hours. For adders, our approach discovers designs of 128-bit adders that achieve Pareto optimality in theoretical metrics. Compared with the state-of-the-art PrefixRL, our method decreases computational delay and hardware size by up to 26% and 30%, respectively. For multipliers, when compared to RL-MUL, our approach increases speed and reduces size by as much as 49% and 45%, respectively. Moreover, the inherent flexibility and scalability of our method enable us to deploy our designs into cutting-edge technologies, as we show that they can be seamlessly integrated into 7nm technology. We believe our work will offer valuable insights into hardware design, further accelerating speed and reducing size through the refined search space and our tree generation methodologies. See our introduction video at https://bit.ly/ArithmeticTree. Code is released at https://github.com/laiyao1/ArithmeticTree.
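
To make the "arithmetic tree" idea concrete, here is a minimal Python sketch of an adder described as a prefix tree. It is not the paper's reinforcement learning method or its released code: it only compares two classic hand-built structures, a serial (ripple) tree and a Sklansky-style divide-and-conquer tree. It scores each by node count ("size") and depth ("level"), which this sketch assumes as stand-ins for the theoretical metrics mentioned in the abstract. All function names (`ripple_tree`, `sklansky_tree`, `depth`, `prefix_add`) are hypothetical.

```python
from typing import Dict, Tuple

Span = Tuple[int, int]             # (msb, lsb): a node covering bits lsb..msb
Tree = Dict[Span, Tuple[Span, Span]]  # node -> (upper child, lower child)


def ripple_tree(n: int) -> Tree:
    """Serial structure: span (i, 0) is built from (i, i) and (i-1, 0)."""
    return {(i, 0): ((i, i), (i - 1, 0)) for i in range(1, n)}


def sklansky_tree(n: int) -> Tree:
    """Divide-and-conquer structure; n is assumed to be a power of two."""
    tree: Tree = {}
    step = 1
    while step < n:
        for i in range(n):
            if i & step:
                base = (i // step) * step  # start of i's current block
                tree[(i, base - step)] = ((i, base), (base - 1, base - step))
        step *= 2
    return tree


def depth(tree: Tree) -> int:
    """Longest chain of prefix nodes from any output back to the inputs."""
    def level(span: Span) -> int:
        if span not in tree:       # an input span (i, i) has level 0
            return 0
        return 1 + max(level(child) for child in tree[span])
    return max(level(span) for span in tree)


def prefix_add(tree: Tree, n: int, a: int, b: int) -> int:
    """Add two n-bit integers using the (generate, propagate) prefix tree."""
    gp = {}
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        gp[(i, i)] = (ai & bi, ai ^ bi)

    def solve(span: Span):
        if span not in gp:
            hi, lo = tree[span]
            (g_hi, p_hi), (g_lo, p_lo) = solve(hi), solve(lo)
            gp[span] = (g_hi | (p_hi & g_lo), p_hi & p_lo)
        return gp[span]

    total, carry = 0, 0
    for i in range(n):
        total |= (gp[(i, i)][1] ^ carry) << i  # sum bit = propagate XOR carry-in
        carry = solve((i, 0))[0]               # carry out of bit i
    return total | (carry << n)


if __name__ == "__main__":
    for name, build in (("ripple", ripple_tree), ("sklansky", sklansky_tree)):
        t = build(8)
        print(name, "size:", len(t), "level:", depth(t),
              "200+100 =", prefix_add(t, 8, 200, 100))
```

Both trees compute the same sums but trade size against level differently. A tree-generation search explores that trade-off space, so any automated approach could plug its own tree in place of these two hand-built ones.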
