- The paper proves that both feedforward and recurrent networks with fixed random weights achieve universal approximation when only their biases are learned.
- The proofs rely on "bias-learning" activation functions, such as ReLU, that let learned biases select effective sub-networks; empirically, bias-learning networks perform comparably to fully trained models on tasks like MNIST.
- The study also finds that bias learning slightly outperforms masking and yields solutions with higher unit variance, with implications for efficient multi-task learning in AI and for non-synaptic learning in neuroscience.
Expressivity of Neural Networks with Random Weights and Learned Biases
The paper "Expressivity of Neural Networks with Random Weights and Learned Biases" by Ezekiel Williams et al. addresses the theoretical foundations and practical implications of neural networks (NNs) where only the biases are trained, while the weights are fixed after random initialization. This paper spans both feedforward neural networks (FFNs) and recurrent neural networks (RNNs), providing insights into their approximation capabilities and applications in machine learning and neuroscience.
Theoretical Contributions
Universal Approximation by FFNs
The paper builds on the classical universal approximation theorems, which state that NNs can approximate any continuous function to arbitrary precision provided the network has sufficiently many hidden units and both weights and biases are trained. Williams et al. extend this theory to show that FFNs with random, fixed weights can still approximate arbitrary continuous functions with high probability when the hidden layers are wide enough and only the biases are learned.
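As a concrete illustration of the setting, here is a minimal PyTorch sketch (not the authors' code; the layer widths and optimizer are placeholder choices of ours) that freezes the randomly initialized weights of a one-hidden-layer ReLU network and trains only its bias vectors:

```python
import torch
import torch.nn as nn

# Minimal sketch, not the authors' code: freeze the random weights of a
# one-hidden-layer ReLU network and train only the bias vectors.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(784, 4096), nn.ReLU(), nn.Linear(4096, 10))

for name, p in model.named_parameters():
    p.requires_grad = name.endswith("bias")   # weights frozen, biases trainable

optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 784)                      # placeholder batch
y = torch.randint(0, 10, (32,))
loss = loss_fn(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()                              # only the biases move
```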
The authors define certain activation functions (e.g., ReLU) as "bias-learning activations," which enable universal function approximation even when the weights are bounded. The core of their proof involves showing that within a large random network, one can find a sub-network that approximates the desired function. This sub-network can be selected via the learned biases, which effectively turn off irrelevant units, a mechanism analogous to the Strong Lottery Ticket Hypothesis.
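A toy numerical example of the sub-network-selection idea (our illustration, not taken from the paper): with ReLU, a sufficiently negative bias keeps a unit's pre-activation below zero for every input in a bounded domain, so the unit outputs zero everywhere and is effectively removed from the network.

```python
import numpy as np

# Toy illustration, not from the paper: with ReLU, a bias below -max|w.x| over a
# bounded input domain silences a unit entirely, so learned biases can carve a
# sub-network out of a large random one.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))                # fixed random weights: 16 inputs -> 8 units
x = rng.uniform(-1, 1, size=(1000, 16))     # inputs from a bounded domain

b = np.zeros(8)
b[:4] = -np.abs(W).sum(axis=1)[:4] - 1.0    # below the worst-case pre-activation

h = np.maximum(x @ W.T + b, 0.0)            # ReLU(Wx + b)
print(h.max(axis=0)[:4])                    # units 0-3 output 0 on every input
```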
Approximation by RNNs
Analogously to the FFN case, the paper proves that RNNs with random weights and learned biases can approximate finite-time trajectories of smooth dynamical systems with high probability, extending the approximation guarantees to temporal sequences and dynamic environments. The proofs follow similar ideas to the FFN case but account for the temporal dependencies inherent in sequence modeling.
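A rough sketch of bias-only training for sequences, assuming a standard discrete-time tanh RNN (the paper's exact parameterization may differ): the recurrent and readout weights stay at their random values, and only the bias is fit to a toy target trajectory.

```python
import math
import torch

# Sketch under our own assumptions (discrete-time tanh RNN): recurrent and readout
# weights are fixed random matrices; only the bias b is trained.
torch.manual_seed(0)
N, T = 256, 100
W = torch.randn(N, N) * 1.5 / N**0.5        # fixed random recurrent weights
W_out = torch.randn(1, N) / N**0.5          # fixed random readout
b = torch.zeros(N, requires_grad=True)      # the only trainable parameter

target = torch.sin(torch.linspace(0, 4 * math.pi, T))   # toy 1-D trajectory
opt = torch.optim.Adam([b], lr=1e-2)

for step in range(300):
    h, outputs = torch.zeros(N), []
    for t in range(T):
        h = torch.tanh(W @ h + b)           # h_{t+1} = tanh(W h_t + b)
        outputs.append(W_out @ h)
    loss = (torch.cat(outputs) - target).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```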
Empirical Validation
Multi-task Learning with FFNs
The empirical validation shows that FFNs with fixed random weights but learned biases perform comparably to fully trained networks on multiple image classification datasets (e.g., MNIST, Fashion MNIST), albeit requiring much wider hidden layers than their fully trained counterparts.
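A minimal sketch of how such a multi-task setup can be implemented, assuming one bias pair per task on top of shared frozen random weights (the layer sizes and training details here are placeholders, not the authors' configuration):

```python
import torch
import torch.nn.functional as F

# Multi-task sketch under our assumptions: shared frozen random weights,
# one trainable bias pair per task.
torch.manual_seed(0)
W1 = torch.randn(4096, 784) / 784**0.5      # shared, never updated
W2 = torch.randn(10, 4096) / 4096**0.5      # shared, never updated

n_tasks = 2                                 # e.g., MNIST and Fashion MNIST
biases = [(torch.zeros(4096, requires_grad=True),
           torch.zeros(10, requires_grad=True)) for _ in range(n_tasks)]
optims = [torch.optim.Adam(list(bs), lr=1e-3) for bs in biases]

def forward(x, task):
    b1, b2 = biases[task]
    return torch.relu(x @ W1.T + b1) @ W2.T + b2

x, y, task = torch.randn(32, 784), torch.randint(0, 10, (32,)), 0  # placeholder batch
loss = F.cross_entropy(forward(x, task), y)
optims[task].zero_grad()
loss.backward()
optims[task].step()                         # updates only task 0's biases
```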
The experiments also reveal an underlying mechanism: task-specific clusters of neurons emerge during bias learning. A task-variance analysis shows that bias learning gives rise to specialized neural subcircuits.
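In the spirit of that analysis, here is a hedged sketch of a per-unit task-variance statistic (the exact measure used by the authors may differ): units whose normalized variance concentrates on a single task form the task-specific clusters described above.

```python
import numpy as np

# Hedged sketch of a per-unit task-variance statistic; the authors' exact
# measure may differ.
def normalized_task_variance(activations_by_task):
    # activations_by_task: list of arrays, each of shape (num_samples, num_units)
    tv = np.stack([a.var(axis=0) for a in activations_by_task])  # (num_tasks, num_units)
    return tv / (tv.sum(axis=0, keepdims=True) + 1e-12)

rng = np.random.default_rng(0)
toy_acts = [rng.normal(scale=s, size=(500, 64)) for s in (1.0, 0.3)]  # fake activations
print(normalized_task_variance(toy_acts).shape)                       # (2, 64)
```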
Bias vs. Mask Learning
The paper compares bias learning with masking, in which specific neurons or weights are selected to remain active. While both approaches achieve similar performance, bias learning slightly outperforms masking, owing to its greater flexibility. The learned solutions also differ: bias learning yields higher unit variance and less sparsity than masking.
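To make the contrast concrete, a small sketch (our illustration) of the two parameterizations on a frozen random layer: mask learning multiplies each unit's activation by a learned gate (a sigmoid relaxation of a binary mask), while bias learning shifts the pre-activation additively.

```python
import torch

# Our illustration of the two parameterizations on a frozen random layer.
torch.manual_seed(0)
W = torch.randn(512, 784) / 784**0.5                 # frozen random weights

b = torch.zeros(512, requires_grad=True)             # bias learning
mask_logits = torch.zeros(512, requires_grad=True)   # mask learning

x = torch.randn(32, 784)
h_bias = torch.relu(x @ W.T + b)                            # h = ReLU(Wx + b)
h_mask = torch.sigmoid(mask_logits) * torch.relu(x @ W.T)   # h = m * ReLU(Wx)
```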
Autonomous and Non-autonomous Dynamical Systems
In the context of RNNs, the paper demonstrates that networks with random weights can effectively model both autonomous and non-autonomous dynamical systems by learning only the biases. For autonomous systems, such as oscillators, the learned biases modulate the Jacobian to produce dynamics like limit cycles. For non-autonomous systems, such as portions of the Lorenz attractor, the bias-learning network accurately forecasts future values given current states and external inputs.
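A back-of-the-envelope illustration of how biases can reshape dynamics, assuming a generic rate RNN of the form dh/dt = -h + W tanh(h + b) (the paper's model may differ): the Jacobian depends on b through the slope of the nonlinearity, so shifting biases moves units between saturated and sensitive regimes and changes the eigenvalues that govern the local dynamics.

```python
import numpy as np

# Sketch under our own assumptions: for dh/dt = -h + W tanh(h + b), the Jacobian at a
# state h is J = -I + W diag(1 - tanh(h + b)^2). Changing only b changes which units
# sit on the steep part of tanh, and thereby reshapes the eigenvalues of J.
rng = np.random.default_rng(0)
N = 100
W = rng.normal(0, 1.5 / np.sqrt(N), size=(N, N))   # fixed random weights

def jacobian(h, b):
    gain = 1.0 - np.tanh(h + b) ** 2               # tanh'(h + b)
    return -np.eye(N) + W * gain[None, :]          # -I + W diag(gain)

h = rng.normal(size=N)
for b_scale in (0.0, 2.0):
    b = b_scale * rng.normal(size=N)
    eig = np.linalg.eigvals(jacobian(h, b))
    print(f"bias scale {b_scale}: max Re(eigenvalue) = {eig.real.max():+.3f}")
```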
Implications and Future Directions
Practical Applications
The ability to fine-tune biases while keeping weights fixed has significant implications for both neuroscience and AI. In neuroscience, it suggests that behavioral adaptations and memory retrieval could occur through mechanisms other than synaptic plasticity, such as changes in neuron firing thresholds or tonic inputs. In AI, it provides a pathway for efficient multi-task learning with significantly fewer trainable parameters.
Theoretical Extensions
The paper opens avenues for further theoretical exploration. Future work could address the hidden layer width requirements, which are currently much larger in practice than suggested by the proofs. Extending these findings to settings beyond high-probability bounds, including almost sure convergence or Lp norms, could also provide deeper insights.
Moreover, examining the interplay between bias and gain modulation offers fertile ground for bridging neural and synaptic plasticity, and exploring which initial weight distributions make bias learning most effective is an intriguing direction that could yield more biologically plausible models of neural computation.
Conclusion
Williams et al.'s work is a foundational step toward understanding the expressivity of neural networks constrained to learn only biases. It bridges theoretical results with practical applications, offering valuable insights for both machine learning paradigms and neuroscientific models of learning and memory. The methodology and results suggest promising directions for optimizing neural computation and understanding non-synaptic learning.