
Challenges and Opportunities to Enable Large-Scale Computing via Heterogeneous Chiplets (2311.16417v2)

Published 28 Nov 2023 in cs.AR

Abstract: Fast-evolving AI algorithms such as LLMs have been driving ever-increasing computing demands in today's data centers. Heterogeneous computing with domain-specific architectures (DSAs) brings many opportunities for scaling up and scaling out computing systems. In particular, heterogeneous chiplet architectures are favored for continued scaling while reducing the design complexity and cost of traditional monolithic chip design. However, how to interconnect computing resources and orchestrate heterogeneous chiplets is key to success. In this paper, we first discuss the diversity and evolving demands of different AI workloads, and how chiplets bring better cost efficiency and shorter time to market. We then examine the challenges in establishing chiplet interface standards, packaging, and security, and further discuss the software programming challenges in chiplet systems.
