
A Low-Dissipation and Scalable GEMM Accelerator with Silicon Nitride Photonics (2402.11047v1)

Published 16 Feb 2024 in cs.AR, cs.ET, cs.PF, and physics.optics

Abstract: Over the past few years, several microring resonator (MRR)-based analog photonic architectures have been proposed to accelerate general matrix-matrix multiplications (GEMMs), which are found in abundance in deep learning workloads. These architectures have grown dramatically in popularity because they offer exceptional throughput and energy efficiency compared to their electronic counterparts. However, because such architectures are traditionally realized on the silicon-on-insulator (SOI) material platform, they face two shortcomings. First, the high index contrast of the SOI platform incurs high scattering losses, which mandates the provisioning of high optical input power. Second, SOI waveguides are susceptible to two-photon absorption (TPA), which can incur substantial optical signal losses at moderate-to-high signal fan-in. These shortcomings have severely detrimental effects on the achievable parallelism, throughput, and energy efficiency of SOI MRR-based GEMM accelerators. To address these shortcomings, we present a novel Silicon Nitride (SiN)-based photonic GEMM accelerator called SiNPhAR. The SiNPhAR architecture employs SiN-based active and passive devices to implement analog GEMM functions. Since the SiN material exhibits lower index contrast and no TPA, the optical signal losses in our SiNPhAR architecture are very low. This advantage significantly enhances the achievable processing parallelism, throughput, and energy efficiency of the SiNPhAR architecture compared to SOI-based photonic GEMM accelerators from prior work. We quantify and compare these benefits of the SiNPhAR architecture via our cross-layer evaluation for a benchmark workload comprising four modern deep neural network models. From the system-level performance analysis, SiNPhAR demonstrates at least 1.7x better throughput (frames per second, FPS) and at least 2.8x better energy efficiency (FPS/W) than prior SOI-based GEMM accelerators.
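The kernel the abstract targets can be stated concretely. A minimal reference GEMM, with illustrative shapes and values not taken from the paper, shows the computation that an analog photonic accelerator like SiNPhAR maps onto hardware (a fully connected DNN layer is exactly one such product of an activation matrix and a weight matrix):

```python
def gemm(A, B):
    """Dense GEMM over nested lists: C[i][j] = sum_k A[i][k] * B[k][j]."""
    n, k = len(A), len(A[0])
    assert k == len(B), "inner dimensions must match"
    m = len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

# Example: a fully connected layer as a GEMM -- activations (batch x in)
# times weights (in x out). Values are illustrative only.
activations = [[1.0, 2.0],
               [3.0, 4.0]]
weights = [[0.5, -1.0],
           [1.5,  2.0]]
print(gemm(activations, weights))  # [[3.5, 3.0], [7.5, 5.0]]
```

In MRR-based accelerators, each inner-product term of this loop nest is realized optically, which is why per-element signal loss (scattering, TPA) compounds with fan-in and caps the achievable parallelism.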
