Nonlinear Photonic Neuromorphic Chips for Spiking Reinforcement Learning

Published 9 Aug 2025 in physics.optics | (2508.06962v1)

Abstract: Photonic computing chips have made significant progress in accelerating linear computations, but nonlinear computations are usually implemented in the digital domain, which introduces additional system latency and power consumption, and hinders the implementation of fully-functional photonic neural network chips. Here, we propose and fabricate a 16-channel programmable incoherent photonic neuromorphic computing chip by co-designing a simplified MZI mesh and distributed feedback lasers with saturable absorber array using different materials, enabling implementation of both linear and nonlinear spike computations in the optical domain. Furthermore, previous studies mainly focused on supervised learning and simple image classification tasks. Here, we propose a photonic spiking reinforcement learning (RL) architecture for the first time, and develop a software-hardware collaborative training-inference framework to address the challenge of training spiking RL models. We achieve large-scale, energy-efficient (photonic linear computation: 1.39 TOPS/W, photonic nonlinear computation: 987.65 GOPS/W) and low-latency (320 ps) end-to-end deployment of an entire layer of photonic spiking RL. Two RL benchmarks, the discrete CartPole task and the continuous Pendulum task, are demonstrated experimentally based on the spiking proximal policy optimization (PPO) algorithm. The hardware-software collaborative computing reward value converges to 200 (-250) for the CartPole (Pendulum) task, respectively, comparable to that of a traditional PPO algorithm. This experimental demonstration addresses the challenge of the absence of large-scale photonic nonlinear spike computation and spiking RL training difficulty, and presents a high-speed and low-latency photonic spiking RL solution with promising application prospects in fields such as real-time decision-making and control for robots and autonomous driving.


Authors (11)

Summary

The paper introduces a photonic chip that performs both the linear and the nonlinear spike computations of spiking reinforcement learning in the optical domain.
It combines a 16-channel MZI mesh with DFB-SA arrays to achieve efficient matrix-vector multiplication and nonlinear spike activations, yielding 1.39 TOPS/W and 987.65 GOPS/W.
The proposed software-hardware collaborative framework, which combines global software pre-training, in-situ hardware training, and hardware-aware software fine-tuning, reaches RL performance comparable to a traditional PPO algorithm on the CartPole and Pendulum tasks.

Overview

The paper "Nonlinear Photonic Neuromorphic Chips for Spiking Reinforcement Learning" (2508.06962) demonstrates a programmable incoherent photonic computing chip that implements both linear and nonlinear spike computations in the optical domain. This is achieved through the co-design of a simplified Mach-Zehnder interferometer (MZI) mesh and an array of distributed feedback lasers with saturable absorbers (DFB-SA). Pairing this hardware with spiking reinforcement learning (RL) targets energy-efficient, low-latency computing for real-time decision-making and control.

Photonic Chip Design and Architecture

A key advancement in this research is the design of a 16-channel photonic chip, incorporating MZI meshes tailored for performing matrix-vector multiplications and DFB-SA arrays for nonlinear spike activations. The combination of these components facilitates the execution of spiking reinforcement learning computations fully in the optical domain. This design addresses the limitations of existing photonic neuromorphic chips that rely on digital conversions for nonlinear operations, thereby reducing latency and power consumption.

The authors employed a simplified architecture for the photonic synapse array to optimize for spiking neural networks (SNNs), which are characterized by sparse weight matrices and spike-based activations. This optimization allows the photonic chips to efficiently execute the sparse operations typical in SNNs with reduced phase shifter complexity and tuning requirements.
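The data flow described above, a sparse weighted sum followed by a spike activation, can be sketched in a few lines. This is only a toy numerical model under stated assumptions: the weight values, the hard threshold of 1.0, and the function names are hypothetical, and a simple threshold stands in for the actual DFB-SA laser dynamics, which the summary does not specify.

```python
def sparse_matvec(weights, spikes):
    """Weighted sum per output channel.

    `weights` is a list of rows, each a dict {input_index: weight},
    so only the nonzero entries of the sparse matrix are stored.
    """
    return [sum(w * spikes[j] for j, w in row.items()) for row in weights]


def threshold_spike(drive, threshold=1.0):
    """Emit a spike (1) when the summed drive reaches the threshold, else 0.

    Assumption: a hard threshold as a stand-in for the nonlinear neuron.
    """
    return [1 if d >= threshold else 0 for d in drive]


# Hypothetical 3-output, 4-input sparse weight matrix (illustrative values).
weights = [
    {0: 0.6, 2: 0.5},
    {1: 0.4},
    {0: 0.3, 3: 0.9},
]
input_spikes = [1, 0, 1, 1]  # binary spike inputs on four channels

drive = sparse_matvec(weights, input_spikes)
output_spikes = threshold_spike(drive)
print(output_spikes)
```

Storing only the nonzero weights mirrors the point made above: a sparse weight matrix needs fewer tunable elements than a dense one.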

Implementation of Spiking Reinforcement Learning

The authors implement the Proximal Policy Optimization (PPO) algorithm with a hybrid architecture: an SNN-based actor network operates on the photonic chip, while an ANN-based critic network evaluates the state-action pairs. This setup allows spiking RL models to be trained while the actor's computations run on energy-efficient photonic hardware.
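For readers unfamiliar with PPO, the standard clipped surrogate objective that the actor maximizes can be sketched as follows. This is the textbook form of PPO, not code from the paper; the function name and the clipping value of 0.2 are assumptions chosen for illustration, and in the architecture described here the advantages would be derived from the critic's estimates.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch of samples.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. clip_eps is an assumed value.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Unchanged policy: the objective equals the mean advantage.
same_policy = ppo_clip_objective([0.0], [0.0], [2.0])
# Ratio of 2 with positive advantage: the gain is capped at 1 + clip_eps.
capped = ppo_clip_objective([math.log(2.0)], [0.0], [1.0])
print(same_policy, capped)
```

The clipping keeps each policy update close to the policy that collected the data, which is one reason PPO is often chosen when training is noisy.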

A crucial aspect of this work is the introduction of a software-hardware collaborative training-inference framework. This framework includes global software pre-training, local hardware in-situ training using stochastic parallel gradient descent (SPGD), and hardware-aware software fine-tuning. This multi-step process ensures accurate mapping of trained weights onto the hardware, compensating for potential hardware imperfections.
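The in-situ training step can be illustrated with one common two-sided form of SPGD: all parameters are perturbed at once with random signs, the cost is measured at both perturbed settings, and the parameters move against the measured difference. The sketch below is an assumption-laden toy, not the authors' procedure: the quadratic `cost` is a hypothetical stand-in for an error that would be measured on the chip, and `gain`, `perturb`, and `steps` are illustrative values.

```python
import random


def spgd_minimize(cost, params, gain, perturb, steps, seed=0):
    """Stochastic parallel gradient descent with two-sided perturbations."""
    rng = random.Random(seed)
    params = list(params)
    for _ in range(steps):
        # Perturb every parameter simultaneously with a random sign.
        delta = [perturb * rng.choice((-1.0, 1.0)) for _ in params]
        plus = cost([p + d for p, d in zip(params, delta)])
        minus = cost([p - d for p, d in zip(params, delta)])
        dj = plus - minus  # measured change in cost
        # Step each parameter against its share of the cost change.
        params = [p - gain * dj * d for p, d in zip(params, delta)]
    return params


# Hypothetical target settings; on hardware the cost would be measured.
target = [0.3, -0.7, 1.2]


def cost(p):
    return sum((a - b) ** 2 for a, b in zip(p, target))


start = [0.0, 0.0, 0.0]
initial_cost = cost(start)
tuned = spgd_minimize(cost, start, gain=20.0, perturb=0.05, steps=200)
final_cost = cost(tuned)
print(initial_cost, final_cost)
```

SPGD needs only two cost measurements per step regardless of the number of parameters, which is why it suits hardware where gradients cannot be computed directly.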

Experimental Results

The experiments use two RL benchmarks, the discrete CartPole task and the continuous Pendulum task, and show that the photonic spiking RL system performs comparably to a traditional PPO algorithm. The reported energy efficiency is 1.39 TOPS/W for photonic linear computation and 987.65 GOPS/W for photonic nonlinear computation, with a latency of 320 ps for end-to-end deployment of an entire layer.
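The efficiency figures can be read as energy per operation, since operations per second per watt is the same as operations per joule. The short calculation below only inverts the two reported numbers; the function name is hypothetical.

```python
def energy_per_op_picojoules(ops_per_second_per_watt):
    """Invert an efficiency in OPS/W (ops per joule) to picojoules per op."""
    return 1e12 / ops_per_second_per_watt


linear_pj = energy_per_op_picojoules(1.39e12)       # 1.39 TOPS/W
nonlinear_pj = energy_per_op_picojoules(987.65e9)   # 987.65 GOPS/W
print(f"linear: {linear_pj:.2f} pJ/op, nonlinear: {nonlinear_pj:.2f} pJ/op")
```

Both figures come out at roughly 0.7 pJ and 1.0 pJ per operation, respectively.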

On CartPole, the hardware-software collaborative reward converges to 200; on Pendulum, it converges to about -250. Both values are reported as comparable to a traditional PPO algorithm, which supports the use of photonic spiking RL in both discrete and continuous action spaces.

Implications and Future Directions

The deployment of photonic SNN layers with optical nonlinear activation for RL tasks is relevant to edge AI applications that demand low-latency, energy-efficient processing. The work motivates future research on scaling up photonic neuromorphic chips and on application scenarios such as autonomous driving and robotics.

Furthermore, the approach sets a precedent for enhancing photonic computing architectures, encouraging exploration into larger and more complex network configurations. Integration with advanced packaging techniques and heterogeneous integration will further augment the scalability and practicality of photonic neuromorphic solutions.

Conclusion

The research demonstrates a photonic neuromorphic computing approach for spiking reinforcement learning tasks. By combining optical linear and nonlinear spike computation with high energy efficiency and low latency, the study indicates how photonics could ease the latency and power costs of performing nonlinear operations in the digital domain. This work is a step toward bridging neuromorphic computing principles and practical, deployable systems.
