
Learning compositional functions via multiplicative weight updates (2006.14560v2)

Published 25 Jun 2020 in cs.NE, cs.LG, cs.NA, math.NA, and stat.ML

Abstract: Compositionality is a basic structural feature of both biological and artificial neural networks. Learning compositional functions via gradient descent incurs well known problems like vanishing and exploding gradients, making careful learning rate tuning essential for real-world applications. This paper proves that multiplicative weight updates satisfy a descent lemma tailored to compositional functions. Based on this lemma, we derive Madam -- a multiplicative version of the Adam optimiser -- and show that it can train state of the art neural network architectures without learning rate tuning. We further show that Madam is easily adapted to train natively compressed neural networks by representing their weights in a logarithmic number system. We conclude by drawing connections between multiplicative weight updates and recent findings about synapses in biology.

Summary

  • The paper proposes the Madam optimizer, which uses multiplicative weight updates to stabilize training in compositional neural networks.
  • Utilizing a rigorous descent lemma and deep relative trust analysis, the approach alleviates gradient instability without heavy Hessian computations.
  • Empirical evaluations show Madam’s competitive performance across image classification, GANs, and language modeling, even in low-bit precision settings.

Essay on Learning Compositional Functions via Multiplicative Weight Updates

The paper explores the optimization of compositional functions in neural networks through multiplicative weight updates, presenting Madam, a multiplicative adaptation of the Adam optimizer. This approach addresses well-known issues in gradient descent, such as vanishing and exploding gradients, and reduces the need for intensive learning rate tuning.

Theoretical Insights

The authors provide a rigorous mathematical foundation, demonstrating that multiplicative updates satisfy a descent lemma suited to compositional functions. Central to this insight is the concept of deep relative trust, an analytical model of how the gradient of a multilayer perceptron degrades as the weights of its layers are perturbed. By focusing on relative changes across layers, the paper advances the understanding of how perturbations affect learning stability without the computational burden of computing or storing enormous Hessian matrices.
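
To make the role of relative perturbations concrete, the deep relative trust bound takes roughly the following form (paraphrased here; the precise norms, constants, and assumptions are as stated in the paper). For an L-layer network with weights W_1, ..., W_L and layerwise perturbations ΔW_1, ..., ΔW_L, the relative change in the gradient of layer k is controlled by the product of the per-layer relative perturbations:

    \frac{\lVert \nabla_{W_k} \mathcal{L}(W + \Delta W) - \nabla_{W_k} \mathcal{L}(W) \rVert}
         {\lVert \nabla_{W_k} \mathcal{L}(W) \rVert}
    \;\le\;
    \prod_{l=1}^{L} \left( 1 + \frac{\lVert \Delta W_l \rVert}{\lVert W_l \rVert} \right) - 1

The right-hand side stays small only when every layer's relative perturbation is small, and that relative quantity is exactly what a multiplicative update controls per step, which is why the resulting descent lemma does not depend on per-layer learning rate tuning.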

Madam Optimizer

Madam, the proposed optimizer, is distinguished by its use of multiplicative updates. By leveraging a logarithmic representation of weights, Madam can naturally handle low-bit precision, making it viable for both high-performance architectures and those constrained by hardware limitations. This is implemented using a logarithmic number system, drawing parallels to synaptic structures in biological systems, where weight strengths are relatively sparse and discrete.
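
A minimal sketch of a multiplicative update in the spirit of Madam is shown below. It is not the authors' reference implementation: the hyperparameter names and values (lr, beta, the gradient clip, and the weight-magnitude bound) and the Adam-style second-moment normalisation are illustrative assumptions that mirror the description above.

    import numpy as np

    def madam_style_step(w, grad, v, lr=0.01, beta=0.999, eps=1e-8, max_mag=3.0):
        """One multiplicative update step (illustrative sketch, not the official Madam).

        w    : weight array
        grad : gradient of the loss with respect to w
        v    : running average of squared gradients (same shape as w)
        """
        # Track a running second moment of the gradient, as Adam does.
        v = beta * v + (1.0 - beta) * grad ** 2
        # Normalise the gradient so each coordinate's step is scale-free.
        g_hat = np.clip(grad / (np.sqrt(v) + eps), -1.0, 1.0)
        # Multiplicative update: scale each weight by exp(-lr * sign(w) * g_hat),
        # so the relative change per step is bounded by roughly lr.
        w = w * np.exp(-lr * np.sign(w) * g_hat)
        # Clip weight magnitudes to a fixed range so they can neither collapse to
        # zero nor grow without bound (the paper notes weight clipping as a
        # practical ingredient).
        w = np.sign(w) * np.clip(np.abs(w), 1e-6, max_mag)
        return w, v

Because each weight is multiplied by a factor close to one, its sign never flips and its per-step relative change is bounded, which is the property the descent lemma above exploits.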

Empirical Evaluation

The empirical section benchmarks Madam against standard optimizers such as SGD and Adam across a range of tasks: image classification (CIFAR-10, CIFAR-100, and ImageNet), generative modeling (a CIFAR-10 GAN), and language modeling (a Wikitext-2 Transformer). Notably, the results indicate that Madam performs competitively without the extensive hyperparameter tuning typically required for SGD or Adam.

In FP32 experiments, Madam achieves solid results on the classification tasks and the GAN, showing its versatility across differing neural architectures. In low-precision experiments, it preserves accuracy down to 12 bits per synapse, highlighting the method's suitability for computationally efficient models and its potential relevance to emerging hardware.
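
To illustrate what a bit budget per synapse means in a logarithmic number system, the sketch below quantises weights into a sign bit plus a low-precision integer exponent. The base, bit widths, and function name are illustrative choices for exposition, not the paper's exact number format.

    import numpy as np

    def lns_quantise(w, exp_bits=11, base=2 ** (1 / 16), min_mag=1e-6):
        """Round weights to sign * base**k with an integer exponent k.

        Using exp_bits bits for k plus one sign bit gives exp_bits + 1 bits
        per weight (12 bits per synapse with the defaults here).
        """
        sign = np.sign(w)
        mag = np.maximum(np.abs(w), min_mag)       # avoid taking log of zero
        k = np.round(np.log(mag) / np.log(base))   # nearest representable exponent
        k_max = 2 ** (exp_bits - 1) - 1            # signed integer exponent range
        k = np.clip(k, -k_max, k_max)
        return sign * base ** k

    w = np.random.randn(5).astype(np.float32)
    print(w)
    print(lns_quantise(w))

In such a format, multiplying a weight by a power of the base amounts to adding an integer to its stored exponent, which is part of why multiplicative training pairs naturally with low-precision hardware.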

Implications and Future Directions

The methodological contribution of this paper has pronounced implications for hardware design, especially considering the declining returns on precision improvements in digital electronics. Madam opens opportunities for co-designing learning algorithms and low-precision hardware, aligning with the growing industry trend towards lightweight, energy-efficient AI models.

In theoretical neuroscience, the observations reinforce the hypothesis that synapses update multiplicatively, a notion consistent with empirical findings such as changes in dendritic spine size that scale with the spine's current size. This raises questions about biological learning processes and how such insights can guide the development of more robust artificial neural networks.

The paper discusses potential limitations, such as the influence of weight clipping and minor performance degradation relative to traditional methods in certain scenarios. These aspects deserve further exploration to refine multiplicative strategies and to strike a better balance between stability and expressiveness in neural networks.

In conclusion, this paper advances the study of learning in neural networks through a multiplicative lens, offering both theoretical and practical contributions. The findings have broad implications, from improving hardware efficiency to illuminating biological learning mechanisms, granting a renewed perspective on optimization practices in deep learning.
