
Beyond Terabit/s Integrated Neuromorphic Photonic Processor for DSP-Free Optical Interconnects (2504.15044v1)

Published 21 Apr 2025 in physics.optics, cs.AI, and cs.ET

Abstract: The rapid expansion of generative AI drives unprecedented demands for high-performance computing. Training large-scale AI models now requires vast interconnected GPU clusters across multiple data centers. Multi-scale AI training and inference demand uniform, ultra-low latency, and energy-efficient links to enable massive GPUs to function as a single cohesive unit. However, traditional electrical and optical interconnects, relying on conventional digital signal processors (DSPs) for signal distortion compensation, increasingly fail to meet these stringent requirements. To overcome these limitations, we present an integrated neuromorphic optical signal processor (OSP) that leverages deep reservoir computing and achieves DSP-free, all-optical, real-time processing. Experimentally, our OSP achieves a 100 Gbaud PAM4 per lane, 1.6 Tbit/s data center interconnect over a 5 km optical fiber in the C-band (equivalent to over 80 km in the O-band), far exceeding the reach of state-of-the-art DSP solutions, which are fundamentally constrained by chromatic dispersion in IMDD systems. Simultaneously, it reduces processing latency by four orders of magnitude and energy consumption by three orders of magnitude. Unlike DSPs, which introduce increased latency at high data rates, our OSP maintains consistent, ultra-low latency regardless of data rate scaling, making it ideal for future optical interconnects. Moreover, the OSP retains full optical field information for better impairment compensation and adapts to various modulation formats, data rates, and wavelengths. Fabricated using a mature silicon photonic process, the OSP can be monolithically integrated with silicon photonic transceivers, enhancing the compactness and reliability of all-optical interconnects. This research provides a highly scalable, energy-efficient, and high-speed solution, paving the way for next-generation AI infrastructure.

Summary

  • The paper introduces an integrated neuromorphic optical signal processor that leverages a deep time-delay reservoir architecture to overcome latency and energy challenges of conventional DSPs.
  • It demonstrates high-speed transmission up to 1.6 Tbit/s over 5 km fibers with superior impairment compensation using all-optical processing and in-situ training.
  • The processor achieves sub-60 ps latency and over 1,700x energy efficiency improvement compared to DSP-based methods, enhancing scalability for AI and data center applications.

This paper introduces an integrated neuromorphic optical signal processor (OSP) designed to overcome the latency and energy consumption bottlenecks of traditional digital signal processor (DSP)-based optical interconnects, particularly for demanding applications like large-scale AI model training across multiple data centers. The core problem addressed is that conventional DSPs introduce significant latency and power draw for compensating signal distortions (like chromatic dispersion) in high-speed intensity modulation/direct detection (IMDD) links, hindering the scalability of AI infrastructure.

The proposed solution is an all-optical, DSP-free OSP based on a novel deep reservoir computing (RC) architecture implemented on a silicon photonic chip. Key features of the OSP design include:

  1. Deep Time-Delay Reservoir: Utilizes three cascaded photonic reservoir nodes with different feedback delays (9 ps, 18 ps, 36 ps) and tunable strengths (via MZIs). This deep architecture enhances memory capacity and processing power compared to single reservoirs, reducing the complexity needed in the subsequent readout layer.
  2. Input Layer Elimination: Directly feeds the distorted optical signal into the reservoir, removing the need for bandwidth-limiting and energy-intensive digital masks often used as input layers in photonic RC.
  3. Broadband Components: Avoids bandwidth-constrained or wavelength-selective components like lasers or ring resonators within the reservoir, relying on standard waveguides.
  4. Complex-Valued Operations: Incorporates phase shifters within the reservoir feedback loops and readout paths, enabling complex-valued weighting for more effective signal manipulation.
  5. Photonic Readout Layer: Implements an 8-tap delay line filter (5 ps delay interval per tap) with tunable complex weights (MZI + phase shifter).
  6. Integrated Nonlinearity: Leverages the square-law detection ($|\cdot|^2$) of the output photodetector as the final nonlinear activation function.
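The signal path described above can be sketched as a discrete-time toy model: three cascaded delay-feedback nodes, an 8-tap complex delay-line readout, and square-law detection. This is a minimal numerical sketch, not the authors' physical model; the 1 ps sample period, feedback coefficients, and tap weights are illustrative assumptions chosen only so the 9/18/36 ps delays and 5 ps tap spacing map to integer sample counts.

```python
import numpy as np

def delay_node(x, delay, feedback):
    """One time-delay reservoir node: y[n] = x[n] + feedback * y[n - delay].
    `feedback` is a complex coefficient standing in for the tunable MZI
    strength and phase shifter; `delay` is in samples."""
    y = np.zeros_like(x, dtype=complex)
    for n in range(len(x)):
        y[n] = x[n] + (feedback * y[n - delay] if n >= delay else 0)
    return y

def osp_forward(field, delays, feedbacks, taps, tap_spacing):
    """Toy forward pass: cascaded delay nodes, then a complex-weighted
    delay-line readout, then the photodetector's square-law nonlinearity."""
    s = field.astype(complex)
    for d, fb in zip(delays, feedbacks):
        s = delay_node(s, d, fb)
    # 8-tap delay-line filter with complex weights (MZI + phase shifter)
    out = np.zeros(len(s), dtype=complex)
    for k, w in enumerate(taps):
        if k * tap_spacing < len(s):
            out[k * tap_spacing:] += w * s[:len(s) - k * tap_spacing]
    # square-law detection converts the complex field to intensity
    return np.abs(out) ** 2

# With 1 sample = 1 ps, the 9/18/36 ps delays become 9/18/36 samples
rng = np.random.default_rng(0)
field = rng.standard_normal(256) + 1j * rng.standard_normal(256)
intensity = osp_forward(field,
                        delays=[9, 18, 36],
                        feedbacks=[0.3 * np.exp(1j * 0.5)] * 3,
                        taps=rng.standard_normal(8) + 1j * rng.standard_normal(8),
                        tap_spacing=5)
```

Because detection happens after the complex-valued reservoir and readout, the model operates on the full optical field, which is the property the paper credits for compensating dispersion-induced power fading.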

The OSP chip was fabricated using a commercial silicon-on-insulator (SOI) process, allowing monolithic integration with transceivers. Programmability is achieved using 28 microheaters acting as phase shifters, trained in-situ using a particle swarm optimization (PSO) algorithm to minimize mean square error (MSE) between the OSP output and the known transmitted symbols.
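The in-situ training loop can be illustrated with a bare-bones particle swarm optimizer over 28 phase settings. This is a generic PSO sketch, not the authors' implementation; the swarm size, inertia/acceleration constants, and the toy cost function standing in for the chip's measured MSE are all assumptions. Wrapping positions modulo 2π reflects the periodic nature of phase-shifter settings.

```python
import numpy as np

def pso_train(cost, n_params=28, n_particles=20, iters=100, seed=0):
    """Minimal particle swarm optimization over phase settings in [0, 2*pi).
    `cost` maps a parameter vector to a scalar (here, an MSE)."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(0, 2 * np.pi, (n_particles, n_params))
    vel = np.zeros_like(pos)
    pbest, pbest_cost = pos.copy(), np.array([cost(p) for p in pos])
    gbest = pbest[np.argmin(pbest_cost)].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # standard PSO update: inertia + pull toward personal and global bests
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = (pos + vel) % (2 * np.pi)   # phases wrap modulo 2*pi
        costs = np.array([cost(p) for p in pos])
        improved = costs < pbest_cost
        pbest[improved], pbest_cost[improved] = pos[improved], costs[improved]
        gbest = pbest[np.argmin(pbest_cost)].copy()
    return gbest, pbest_cost.min()

# Toy stand-in for the chip: MSE against a hidden target phase configuration
target = np.linspace(0, 2 * np.pi, 28, endpoint=False)
mse = lambda p: np.mean((np.cos(p) - np.cos(target)) ** 2)
best, best_cost = pso_train(mse)
```

PSO suits this setting because it needs only scalar cost evaluations from the physical chip, with no gradient access through the optical hardware.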

Experimental Demonstrations and Key Results:

  • High-Speed DSP-Free Transmission: Achieved 100 Gbaud PAM4 transmission (200 Gbps per lane) over a 5 km single-mode fiber (SMF) in the C-band without any DSP assistance at the receiver. The chromatic dispersion accumulated over this C-band link is equivalent to more than 80 km of O-band fiber, significantly exceeding the ~2 km reach of state-of-the-art DSPs in the O-band at similar speeds. The link also supported 112 Gbaud OOK and 112 Gbaud PAM4 (below SD-FEC).
  • Programmability: Demonstrated adaptability by successfully training and operating the same chip for different modulation formats (OOK, PAM4), baud rates (56-112 Gbaud), and wavelengths (1540 nm to 1565 nm with 400 GHz spacing).
  • Impairment Compensation: Showcased superior compensation compared to DSP algorithms (FFE, FNN). By operating on the full optical field before detection, the OSP effectively learned the inverse channel transfer function, mitigating both linear chromatic dispersion (eliminating spectral power fading) and nonlinear impairments (handling higher launch powers). It achieved a 1.88 dB higher Q-factor than an 885-tap FFE and 0.85 dB higher than a 256x256x1 FNN for 100 Gbaud PAM4.
  • 1.6 Tbit/s WDM: Demonstrated an 8-channel WDM transmission (8 x 100 Gbaud PAM4 = 1.6 Tbit/s aggregate) over 5 km SMF using a single OSP chip.
    • OSP-only: When trained for a single channel (1550 nm), the OSP provided significant improvement across all channels (BERs below SD-FEC), outperforming DSP-only compensation optimized per channel.
    • Hybrid OSP/DSP: Combining the OSP (handling major impairments) with lightweight DSP (<30 taps FFE per channel) achieved BERs below the HD-FEC threshold for all channels, far surpassing the performance of a complex DSP-only approach (>800 taps FFE per channel only reaching SD-FEC).
  • Latency and Energy Efficiency:
    • Latency: Estimated at < 60 ps (based on signal propagation through ~4 mm waveguide), over 14,000 times lower than the > 0.8 µs latency estimated for a highly optimistic DSP scenario at 100 Gbaud. OSP latency is independent of the data rate.
    • Energy: Measured at 108 mW for 100 Gbaud PAM4 (0.54 pJ/bit). For the 1.6T WDM system, efficiency improves to 67.5 fJ/bit, a >1,700x reduction compared to estimated DSP power (>37.5 W per channel). Projections suggest sub-fJ/bit efficiency is possible with advanced phase shifters (MEMS, BTO).
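The headline efficiency and latency figures follow directly from the reported numbers, as a quick back-of-the-envelope check shows. The silicon group index of ~4 used for the latency estimate is our assumption (the paper states only the ~4 mm path and the < 60 ps bound).

```python
# Back-of-the-envelope check of the reported efficiency figures
power_w = 0.108                  # 108 mW total chip power
rate_single = 100e9 * 2          # 100 Gbaud PAM4 = 200 Gbit/s per lane
rate_wdm = 8 * rate_single       # 8-channel WDM = 1.6 Tbit/s aggregate

energy_single = power_w / rate_single    # J/bit for one lane
energy_wdm = power_w / rate_wdm          # J/bit amortized over 8 channels
print(energy_single * 1e12)   # 0.54 pJ/bit
print(energy_wdm * 1e15)      # 67.5 fJ/bit

# Latency from propagation through ~4 mm of waveguide,
# assuming a silicon group index of ~4 (our assumption, not stated)
latency_s = 4e-3 * 4.0 / 3e8
print(latency_s * 1e12)       # ~53 ps, consistent with the < 60 ps estimate
```

The same 108 mW powers all eight WDM channels simultaneously, which is why the per-bit energy drops by 8x in the WDM case while DSP power would scale per channel.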

The paper concludes that the integrated neuromorphic OSP provides a highly scalable, ultra-low latency, and energy-efficient alternative to DSPs for high-speed optical interconnects, directly addressing critical bottlenecks in next-generation AI and data center infrastructure. While acknowledging chip insertion loss (15 dB) as a challenge, it suggests mitigation through monolithic integration with photodetectors and on-chip amplification.
