A Comparative Analysis of Microring-Based Incoherent Photonic GEMM Accelerators (2402.03149v3)
Abstract: Several microring resonator (MRR) based analog photonic architectures have been proposed to accelerate general matrix-matrix multiplications (GEMMs) in deep neural networks with exceptional throughput and energy efficiency. To implement GEMM functions, these MRR-based architectures generally manipulate optical signals in five ways: (i) splitting (copying) multiple optical signals to achieve a certain fan-out, (ii) aggregating (multiplexing) multiple optical signals to achieve a certain fan-in, (iii) modulating optical signals to imprint input values onto analog signal amplitudes, (iv) weighting the modulated optical signals to achieve analog input-weight multiplication, and (v) summing the optical signals. Existing MRR-based GEMM accelerators perform the first four of these manipulations in arbitrary orders, ignoring the impact that the order of the manipulations can have on performance. In this paper, we conduct a detailed analysis of accelerator organizations with three different orders of these manipulations: (1) Modulation-Aggregation-Splitting-Weighting (MASW), (2) Aggregation-Splitting-Modulation-Weighting (ASMW), and (3) Splitting-Modulation-Weighting-Aggregation (SMWA). We show that these organizations incur crosstalk noise and optical signal losses of different magnitudes, which gives them different levels of processing parallelism at the circuit level and different throughput and energy-area efficiency at the system level. Our evaluation results for four CNN models show that the SMWA organization achieves up to 4.4$\times$, 5$\times$, and 5.2$\times$ better throughput, energy efficiency, and area-energy efficiency, respectively, compared to the ASMW and MASW organizations on average.
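The abstract's five signal manipulations can be pictured with a minimal numerical sketch. The model below is an illustrative assumption, not the paper's simulator: input values are imprinted on signal amplitudes (modulation), each channel is attenuated by an MRR weight (weighting), a small per-channel insertion loss and nearest-neighbor crosstalk perturb the channels, and a photodetector accumulates total power (summation/fan-in). The function name `photonic_dot` and all loss/crosstalk parameter values are hypothetical.

```python
import numpy as np

def photonic_dot(x, w, insertion_loss_db=0.1, crosstalk=0.0):
    """Simplified incoherent photonic dot product x . w.

    Assumptions (illustrative only): x and w are normalized to [0, 1];
    insertion_loss_db is a uniform per-channel optical power loss; and
    crosstalk leaks a fraction of each channel's power into its two
    spectral neighbors (power-conserving redistribution).
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    loss = 10.0 ** (-insertion_loss_db / 10.0)   # dB -> linear power loss
    channels = x * w * loss                      # modulate, then weight
    if crosstalk > 0:
        # Each channel leaks `crosstalk` of its power to each neighbor.
        leak = crosstalk * (np.roll(channels, 1) + np.roll(channels, -1))
        channels = (1.0 - 2.0 * crosstalk) * channels + leak
    return float(channels.sum())                 # photodetector fan-in

# Example: ideal digital dot product vs. the lossy analog estimate.
x = np.array([0.2, 0.5, 0.8])
w = np.array([0.9, 0.4, 0.1])
ideal = float(x @ w)
analog = photonic_dot(x, w, insertion_loss_db=0.5, crosstalk=0.01)
```

In this toy model the nearest-neighbor crosstalk conserves total detected power and only distorts individual channels, while insertion loss scales the summed result down; the paper's point is that the chosen manipulation order (MASW, ASMW, or SMWA) changes how large such loss and crosstalk terms become.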