Scalable Credit Assignment without Weight Symmetry: An Analytical Perspective
This paper, "Two Routes to Scalable Credit Assignment without Weight Symmetry," examines learning rules for neural networks that depart from standard backpropagation. The central challenge is backpropagation's biologically implausible requirement of instantaneous weight symmetry: the forward and backward weights must be exact transposes of one another at every step. The authors systematically investigate local and non-local learning rules that aim for competitive scalability and performance without this symmetry constraint.
Core Contributions
- Analysis of Local Learning Rules: The authors revisit a recently proposed local learning rule that eliminates the need for weight symmetry. They find it unstable and highly sensitive to metaparameter tuning, which prevents it from transferring across neural architectures. Through detailed mathematical analysis, they propose an improved variant, termed Information Alignment (IA), which stabilizes training and reduces metaparameter sensitivity by adding primitives to its layer-wise regularization function. IA achieves notable performance on large-scale tasks such as ImageNet.
- Non-Local Learning Rules and Weight Estimation: The paper also explores two non-local learning rules, Symmetric Alignment (SA) and Activation Alignment (AA), which replace instantaneous weight transport with a gradual alignment of the backward weights over the course of training. Both are found to perform on par with backpropagation. The authors suggest these strategies could be implemented in biological systems through "weight estimation": a plausible mechanism that infers synaptic strengths from temporal measurements and remains effective even in noisy environments.
- Mathematical Framework for Biological Plausibility: A mathematical framework unifying the various learning-rule strategies is formulated. It encompasses existing methods such as feedback alignment and weight mirror, and it yields the novel learning rules IA, SA, and AA. The framework retains the features required for training deep networks while introducing scalable, neurally plausible alternatives to weight symmetry.
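The unifying framework can be sketched in a few lines; the notation below is a hedged reconstruction (the symbol names and the exact set of primitives are illustrative, not quoted from the paper). Each layer's backward weights B_l are updated by descending a pseudo-gradient of a layer-wise regularization function assembled from simple primitives:

```latex
\Delta B_l \;\propto\; -\,\nabla_{B_l} \mathcal{R},
\qquad
\mathcal{R} \;=\; \sum_{l} \sum_{p} \lambda_p \, \mathcal{P}_p\!\left(W_l, B_l, x_l\right).
```

Different choices of primitives recover different rules: a decay primitive such as ||B_l||_F^2 is purely local, while a Symmetric-Alignment-style choice R_SA = sum_l ||W_l - B_l^T||_F^2 is non-local (it reads the forward weights W_l), and its pseudo-gradient step drives each B_l toward W_l^T over training rather than copying it instantaneously.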
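As a concrete illustration of the non-local route, here is a minimal NumPy sketch of a Symmetric-Alignment-style update for one layer's backward weights. The shapes and learning rate are arbitrary illustrative choices; the sketch assumes SA's regularizer penalizes the mismatch ||W - B^T||_F^2, so each pseudo-gradient step gradually pulls B toward W^T instead of requiring instantaneous weight transport:

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward and backward weights for one layer (illustrative shapes).
W = rng.standard_normal((16, 32))
B = rng.standard_normal((32, 16))

lr = 0.1  # illustrative step size

# Pseudo-gradient descent on the mismatch ||W - B^T||_F^2:
# grad_B = 2 * (B - W^T), so each step contracts B toward W^T.
for _ in range(200):
    B -= lr * 2.0 * (B - W.T)

# After training, the backward weights have aligned with the transpose
# of the forward weights without ever copying them directly.
print(np.allclose(B, W.T, atol=1e-6))  # True
```

The contraction factor per step is (1 - 2*lr), so alignment is geometric in the number of updates, which is what allows the alignment to happen "over time" rather than instantaneously.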
Numerical and Theoretical Results
- Performance Evaluation: IA markedly improves over existing local rules and transfers robustly across architectures. SA and AA go further, matching backpropagation-level performance on deeper architectures and architectural variants, underscoring their robustness under noisy updates.
- Empirical Stability: In SA and AA, the regularization terms yield a pseudo-gradient descent on the backward weights that aligns them with the forward computation, keeping both rules stable across training epochs.
- Scalability and Noise Resilience: Both SA and AA show impressive robustness to Gaussian noise injected into their pseudo-gradient updates, suggesting that noisy biological implementations could adopt these algorithms without catastrophic failure.
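The noise-resilience claim can be illustrated with a small extension of an SA-style update: Gaussian noise corrupts every pseudo-gradient step, yet the backward weights still settle into a small neighborhood of the forward transpose instead of diverging. The noise scale, step size, and shapes here are illustrative assumptions, not the paper's experimental settings:

```python
import numpy as np

rng = np.random.default_rng(1)

W = rng.standard_normal((10, 10))
B = rng.standard_normal((10, 10))

lr, noise_std = 0.1, 0.1  # illustrative values

for _ in range(500):
    grad = 2.0 * (B - W.T)  # pseudo-gradient of ||W - B^T||_F^2
    grad += noise_std * rng.standard_normal(B.shape)  # corrupt every update
    B -= lr * grad

# Despite persistent noise, B fluctuates close to W^T: the contraction
# toward W^T dominates the injected perturbations.
rel_error = np.linalg.norm(B - W.T) / np.linalg.norm(W)
print(rel_error < 0.1)  # True
```

The intuition is that the update is a contraction toward W^T, so noise accumulates only up to a stationary fluctuation level set by the step size and noise variance, rather than compounding over training.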
Implications and Speculative Future Directions
The paper posits two prospective pathways for realizing biologically plausible learning rules:
- Refinement of Local Learning Rules: Further exploration of the stability of local operations and the efficacy of regularization primitives may yield learning algorithms that are robust to architectural changes.
- Development of Scalable Biological Mechanisms: Investigating weight estimation techniques that can be integrated into neuronal circuits to facilitate credit assignment in a neurally-plausible manner opens up a promising avenue of research.
In conclusion, this work sets the stage for a deeper understanding and development of credit assignment methods without the stringent constraints of weight symmetry. The provided framework, alongside the discussed strategies, offers a roadmap not just for effective AI learning protocols, but also for insights into potential mechanisms of biological learning, broadening the applicability of these theories to both artificial and natural intelligent systems.