- The paper demonstrates that DFA scales to diverse deep learning architectures, achieving performance comparable to backpropagation in tasks like neural view synthesis, recommendation systems, and graph learning.
- It applies DFA across a range of experiments and highlights DFA's compatibility with asynchronous, parallel weight updates, which could ease the hardware constraints imposed by BP's sequential backward pass.
- The study underlines DFA's increased biological plausibility and calls for further optimization, especially to enhance performance in transformer models.
An Evaluation of Direct Feedback Alignment: Scalability to Modern Deep Learning Architectures
The paper "Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures" investigates whether Direct Feedback Alignment (DFA) can serve as a viable alternative to the backpropagation (BP) algorithm for training contemporary deep learning architectures. The authors critically assess BP's limitations: its sequential layer-by-layer updates, which hinder parallelization, and its lack of biological plausibility, stemming from the requirement for synaptic symmetry between forward and backward weights (the weight transport problem). They propose DFA, which replaces BP's backward pass with fixed random projections of the global error, as a path toward more biologically plausible and computationally efficient training.
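Concretely, in the standard DFA formulation (introduced by Nøkland and adopted by the paper), the error signal that BP propagates backward through transposed weight matrices is replaced, at each hidden layer $i$, by a fixed random projection of the global output error $e$:

$$
\delta a_i^{\mathrm{BP}} = \big(W_{i+1}^{\top}\,\delta a_{i+1}\big) \odot f'(a_i),
\qquad
\delta a_i^{\mathrm{DFA}} = \big(B_i\, e\big) \odot f'(a_i),
\qquad
\delta W_i = -\,\delta a_i\, h_{i-1}^{\top},
$$

where $a_i$ are pre-activations, $h_{i-1}$ is the previous layer's activation, $f'$ is the derivative of the activation function, $\odot$ is the elementwise product, and each $B_i$ is a random matrix drawn at initialization and kept fixed. Because $B_i e$ depends only on the globally available error, no sequential backward sweep through the network's weights is required.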
Key Experiments and Findings
The authors conduct a comprehensive series of experiments across diverse domains, applying DFA to tasks well beyond its traditional use in computer vision (a minimal implementation sketch of the shared DFA update follows this list):
- Neural View Synthesis: DFA was tested on Neural Radiance Fields (NeRF) models, which use fully connected layers to synthesize novel views of 3D scenes. Notably, DFA-trained NeRF models performed comparably to earlier state-of-the-art view-synthesis methods on synthetic scenes, although they fell short of BP-trained models in rendering quality, exhibiting high-frequency noise.
- Recommender Systems: For tasks such as click-through rate prediction, DFA was used to train the deep components of hybrid recommendation models such as DeepFM and Deep & Cross Network (DCN). The DFA-trained models performed on par with their BP-trained counterparts, underscoring DFA's potential in heterogeneous architectures.
- Geometric Learning with Graph Convolutional Networks (GCNs): Various GCN architectures were trained effectively with DFA on graph-based tasks. For instance, GraphConv and SplineConv models achieved results close to their BP-trained counterparts, and DFA extended even to spectral-domain approaches and attention-based graph models such as GATConv.
- NLP with Transformers: When training Transformers for language modeling, DFA produced successful learning, but optimizer hyperparameters had to be retuned independently of the BP settings, and the resulting perplexity gap indicated that further refinement is needed for DFA to match BP's performance.
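To make the mechanism shared across these experiments concrete, here is a minimal, illustrative NumPy sketch of DFA training for a small fully connected network, the setting closest to the NeRF experiments. The layer sizes, learning rate, and initialization below are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fully connected regression network: 8 -> 64 -> 64 -> 4.
sizes = [8, 64, 64, 4]
Ws = [rng.normal(0.0, np.sqrt(2.0 / m), size=(n, m))
      for m, n in zip(sizes[:-1], sizes[1:])]
# One fixed random feedback matrix B_i per hidden layer, projecting the
# global error (dim 4) directly back to that layer's pre-activations.
Bs = [rng.normal(0.0, 1.0 / np.sqrt(sizes[-1]), size=(n, sizes[-1]))
      for n in sizes[1:-1]]

def dfa_step(x, target, lr=1e-3):
    """One DFA update; returns the squared-error loss."""
    # Forward pass, caching pre-activations and activations.
    h, acts, pres = x, [x], []
    for W in Ws[:-1]:
        a = W @ h
        h = np.tanh(a)
        pres.append(a)
        acts.append(h)
    y = Ws[-1] @ h              # linear output layer
    e = y - target              # global error (gradient of 0.5 * ||y - t||^2)

    # The output layer is updated exactly as under BP.
    Ws[-1] -= lr * np.outer(e, acts[-1])
    # Hidden layers receive the *global* error through the fixed random B_i,
    # instead of a signal propagated through transposed forward weights.
    for i, B in enumerate(Bs):
        delta = (B @ e) * (1.0 - np.tanh(pres[i]) ** 2)  # tanh'(a) = 1 - tanh(a)^2
        Ws[i] -= lr * np.outer(delta, acts[i])
    return 0.5 * float(e @ e)

# Usage on random toy data.
x, t = rng.normal(size=8), rng.normal(size=4)
for _ in range(200):
    loss = dfa_step(x, t)
print(f"final loss: {loss:.4f}")
```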
Implications and Future Directions
This work stands as a thorough empirical validation of DFA's capacity to handle state-of-the-art tasks and architectures, challenging the assumption that training methods without weight symmetry cannot scale. The implications of this research are multifaceted:
- Scalability and Efficiency: DFA's compatibility with asynchronous and parallel weight updates could inspire efficient hardware accelerators, easing the computational constraints tied to BP's sequential backward pass (a toy parallel-update sketch follows this list).
- Biological Plausibility: DFA aligns better with biological learning than BP, a step toward neural networks trained by genuinely brain-like mechanisms. Challenges remain, however, since DFA still relies on a global error pathway that is itself not biologically plausible.
- Further Optimization and Training Techniques: As the initial transformer results indicate, DFA is sensitive to optimizer hyperparameters and training protocols. Dedicated research into DFA-specific tuning procedures, analogous to the mature tuning practices around BP, could close the remaining performance gaps.
- Broader Validation Across Domains: Although DFA has demonstrated surprising versatility, further experimentation in domains not covered by this paper could provide a fuller picture of its general applicability and robustness.
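As a hypothetical illustration of the parallelism point above (reusing the `Ws`, `Bs`, and cached `pres`/`acts` from the earlier sketch; this is a scheduling sketch, not the paper's implementation): once the global error `e` has been broadcast, every hidden layer's DFA update is purely local and can be dispatched concurrently, whereas BP forces layer i to wait for layer i+1's gradient.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def dfa_layer_update(W, B, e, pre_act, h_prev, lr=1e-3):
    # Purely local computation: needs only the broadcast error e plus
    # this layer's cached pre-activation and input activation.
    delta = (B @ e) * (1.0 - np.tanh(pre_act) ** 2)
    W -= lr * np.outer(delta, h_prev)   # in-place update of this layer

def parallel_dfa_update(Ws, Bs, e, pres, acts):
    # Unlike BP's sequential backward sweep, all hidden-layer updates
    # can be launched as soon as e is available.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(dfa_layer_update, Ws[i], Bs[i], e, pres[i], acts[i])
                   for i in range(len(Bs))]
        for f in futures:
            f.result()  # surface any worker exceptions
```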
In conclusion, this research marks a significant step in the exploration and validation of DFA as an alternative to backpropagation for training complex deep learning architectures. Future work will undoubtedly focus on detailed optimizations and refinements that leverage DFA's unique traits, potentially shaping more efficient and biologically inspired neural network models.