- The paper presents optical matrix-vector multipliers to execute Transformer computations with improved energy efficiency and validated accuracy despite analog noise.
- It derives scaling laws showing that optical energy per MAC scales inversely with Transformer width, highlighting significant energy savings for large models.
- Experimentally calibrated simulations compare optical accelerators with state-of-the-art digital processors, underscoring the potential of optical computing in neural network applications.
Optical Transformers: Enhancing Energy Efficiency in Neural Network Computations
The paper "Optical Transformers" explores an approach to addressing the growing computational demands and energy inefficiencies associated with scaling deep learning models, particularly Transformers. In recent years, Transformer architectures have been pivotal in achieving state-of-the-art results across domains including natural language processing and computer vision. However, the exponential growth in model size and FLOP requirements has prompted researchers to seek hardware solutions beyond traditional digital electronics, which have struggled to keep pace with these scaling demands.
The authors propose optical matrix-vector multipliers as a promising alternative for executing the large linear algebra operations at the heart of Transformer models. Optical computing, characterized by its potential for high throughput and low energy consumption, is an attractive option, particularly for running large-scale models. Through a series of optical experiments and simulations, the paper demonstrates that optical hardware can handle Transformer operations while mitigating the noise and error inherent to analog systems.
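To make the noise discussion concrete, here is a minimal sketch of how an analog optical matrix-vector product might be emulated in simulation, assuming a simple shot-noise-like additive Gaussian model; the `noisy_optical_matvec` function and the `photons_per_mac` parameter are illustrative assumptions, not the paper's calibrated hardware model.

```python
import numpy as np

def noisy_optical_matvec(W, x, photons_per_mac=100, rng=None):
    """Emulate an analog optical matrix-vector product y = W @ x.

    The ideal result is corrupted with shot-noise-like Gaussian noise whose
    relative magnitude shrinks as the photon budget grows. The noise model
    and `photons_per_mac` are illustrative, not the paper's calibration.
    """
    rng = np.random.default_rng() if rng is None else rng
    y_ideal = W @ x
    # Approximate shot noise: standard deviation falls as 1/sqrt(photon count).
    noise_std = np.abs(y_ideal).mean() / np.sqrt(photons_per_mac)
    return y_ideal + rng.normal(0.0, noise_std, size=y_ideal.shape)

# Example: one Transformer-style projection executed with analog noise.
d_model = 512
W = np.random.randn(d_model, d_model) / np.sqrt(d_model)
x = np.random.randn(d_model)
y = noisy_optical_matvec(W, x, photons_per_mac=1000)
print("relative error:", np.linalg.norm(y - W @ x) / np.linalg.norm(W @ x))
```

Running such a noisy layer inside a full Transformer forward pass is how one can test, in simulation, whether accuracy survives the analog error levels measured in hardware.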
Key Findings and Contributions
- Experimental Validation: The paper reports small-scale experiments using optical accelerators, verifying that Transformer-related computations can be executed on optical hardware with an acceptable degree of accuracy despite noise-induced challenges. This validation constitutes a critical step in confirming the feasibility of optical Transformers.
- Energy Efficiency and Scaling Laws: A significant contribution is the quantification of the energy-efficiency benefits of scaling models optically. The authors derive and validate scaling laws indicating that optical energy per MAC scales inversely with Transformer width (an illustrative sketch of this scaling follows the list). Consequently, optical implementations can in principle achieve far better energy efficiency than electronic alternatives, with projected advantages reaching thousands of times as models approach the quadrillion-parameter regime.
- Simulation and Modeling: Using simulations calibrated with experimental data, the authors model the energy consumption of prospective optical neural network (ONN) accelerators and compare it to that of state-of-the-art digital processors. Their findings suggest substantial energy-efficiency gains for large-scale models, based on practical modeling scenarios and assumptions about near-future improvements in electronic and optical components.
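As a rough illustration of the inverse-width scaling law and the kind of energy comparison the modeling performs, the sketch below computes optical energy per MAC as a function of Transformer width; all constants (photon energy, photon budget per output, digital energy per MAC) are placeholder assumptions chosen only to exhibit the 1/width trend, not the paper's calibrated figures.

```python
# Illustrative constants -- assumptions for demonstration, not measured values.
PHOTON_ENERGY_J = 1.3e-19          # energy of one ~1550 nm photon
PHOTONS_PER_OUTPUT = 1e5           # assumed photon budget per dot-product output
DIGITAL_ENERGY_PER_MAC_J = 1e-13   # assumed order-of-magnitude digital MAC energy

def optical_energy_per_mac(width):
    """Optical energy per MAC for a width x width matrix-vector product.

    If a roughly fixed number of photons is needed per *output* (per dot
    product), the energy per MAC falls as 1/width, because each output
    aggregates `width` MACs optically. This is the inverse-width scaling law.
    """
    return PHOTONS_PER_OUTPUT * PHOTON_ENERGY_J / width

for d in (512, 4096, 32768, 262144):
    e_opt = optical_energy_per_mac(d)
    print(f"width={d:>7}  optical J/MAC={e_opt:.2e}  "
          f"ratio vs. assumed digital MAC={DIGITAL_ENERGY_PER_MAC_J / e_opt:.0f}x")
```

The wider the model, the more MACs each detected output amortizes, which is why the projected advantage grows with model scale.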
Implications for Future Developments
The research emphasizes the need for further development of high-parallelism optical neural network accelerators that fully exploit optical fan-out/fan-in for data transport. Prospective developments could focus on refining optical memory technology or alternative encoding schemes to improve data reuse, addressing a current bottleneck: repeatedly loading the weights of large models markedly erodes energy efficiency. Additionally, achieving the greatest impact from ONN designs requires model architectures and quantization strategies tailored to low-precision, high-throughput operation (a simple low-precision sketch follows this paragraph).
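As one way to picture the low-precision operating point such designs target, the following sketch applies generic uniform fake-quantization to the weights and activations of a single projection; the `fake_quantize` helper and the 4-bit setting are hypothetical stand-ins, not the paper's quantization recipe.

```python
import numpy as np

def fake_quantize(x, bits=4):
    """Uniform symmetric fake-quantization of an array to `bits` of precision.

    A generic baseline for studying how a Transformer layer tolerates the
    low-precision regime of an analog accelerator; the scheme and bit width
    here are illustrative assumptions.
    """
    levels = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / levels
    if scale == 0:
        return x
    return np.round(x / scale) * scale

# Example: quantize the weights and input of one projection to 4 bits.
d = 512
W = np.random.randn(d, d) / np.sqrt(d)
x = np.random.randn(d)
y_lowprec = fake_quantize(W, bits=4) @ fake_quantize(x, bits=4)
print("relative error:", np.linalg.norm(y_lowprec - W @ x) / np.linalg.norm(W @ x))
```

Measuring end-to-end accuracy under such quantization (and under analog noise) is the kind of study needed before committing a model architecture to a low-precision optical accelerator.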
Theoretical and Practical Prospects
The implications of this research extend to both theory and practice. Reliable optical scaling laws open pathways for further theoretical investigation into optimal model configurations and scaling strategies that leverage the unique capabilities of photonic systems. Practically, the projected efficiency gains present a compelling case for investment and innovation in optical computing infrastructure aimed at accommodating the immense computational needs anticipated in the future of AI.
In conclusion, the paper on Optical Transformers lays out strategies and insights for leveraging optical computing to meet deep learning's growing demands, especially for large Transformer models. By combining experimental validation with robust simulations, the authors pave the way for further development in optical computing, illustrating its potential as a transformative force for accelerating neural network computations efficiently and sustainably.