- The paper proves that both feedforward and recurrent networks with fixed random weights achieve universal approximation when only their biases are learned.
- The proofs rely on "bias-learning" activation functions, such as ReLU, that let learned biases select effective sub-networks; empirically, bias-learning networks perform comparably to fully trained models on tasks like MNIST.
- The study also finds that bias learning slightly outperforms masking and yields solutions with higher unit variance, with implications for efficient multi-task learning in AI and for non-synaptic learning in neuroscience.
Expressivity of Neural Networks with Random Weights and Learned Biases
The paper "Expressivity of Neural Networks with Random Weights and Learned Biases" by Ezekiel Williams et al. addresses the theoretical foundations and practical implications of neural networks (NNs) where only the biases are trained, while the weights are fixed after random initialization. This paper spans both feedforward neural networks (FFNs) and recurrent neural networks (RNNs), providing insights into their approximation capabilities and applications in machine learning and neuroscience.
Theoretical Contributions
Universal Approximation by FFNs
The paper builds on the classical universal approximation theorems, which state that NNs can approximate any continuous function to arbitrary precision provided the network has sufficiently many hidden units and both weights and biases are trained. Williams et al. extend this theory to show that FFNs with random, fixed weights can still approximate arbitrary continuous functions with high probability when the hidden layers are wide enough and only the biases are learned.
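As a concrete illustration of the setting, here is a minimal PyTorch sketch (not the authors' code; the layer widths and optimizer are placeholder choices of ours) that freezes the randomly initialized weights of a one-hidden-layer ReLU network and trains only its bias vectors:

```python
import torch
import torch.nn as nn

# Minimal sketch, not the authors' code: freeze the random weights of a
# one-hidden-layer ReLU network and train only the bias vectors.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(784, 4096), nn.ReLU(), nn.Linear(4096, 10))

for name, p in model.named_parameters():
    p.requires_grad = name.endswith("bias")   # weights frozen, biases trainable

optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 784)                      # placeholder batch
y = torch.randint(0, 10, (32,))
loss = loss_fn(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()                              # only the biases move
```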
The authors define certain activation functions (e.g., ReLU) as "bias-learning activations," which enable universal function approximation even when the weights are bounded. The core of their proof involves showing that within a large random network, one can find a sub-network that approximates the desired function. This sub-network can be selected via the learned biases, which effectively turn off irrelevant units, a mechanism analogous to the Strong Lottery Ticket Hypothesis.
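A toy numerical example of the sub-network-selection idea (our illustration, not taken from the paper): with ReLU, a sufficiently negative bias keeps a unit's pre-activation below zero for every input in a bounded domain, so the unit outputs zero everywhere and is effectively removed from the network.

```python
import numpy as np

# Toy illustration, not from the paper: with ReLU, a bias below -max|w.x| over a
# bounded input domain silences a unit entirely, so learned biases can carve a
# sub-network out of a large random one.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))                # fixed random weights: 16 inputs -> 8 units
x = rng.uniform(-1, 1, size=(1000, 16))     # inputs from a bounded domain

b = np.zeros(8)
b[:4] = -np.abs(W).sum(axis=1)[:4] - 1.0    # below the worst-case pre-activation

h = np.maximum(x @ W.T + b, 0.0)            # ReLU(Wx + b)
print(h.max(axis=0)[:4])                    # units 0-3 output 0 on every input
```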
Approximation by RNNs
Analogously to the FFN case, the paper proves that RNNs with random weights and learned biases can approximate finite-time trajectories of smooth dynamical systems with high probability, extending the approximation guarantees to temporal sequences and dynamic environments. The proofs follow similar ideas to the FFN case but account for the temporal dependencies inherent in sequence modeling.
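A rough sketch of bias-only training for sequences, assuming a standard discrete-time tanh RNN (the paper's exact parameterization may differ): the recurrent and readout weights stay at their random values, and only the bias is fit to a toy target trajectory.

```python
import math
import torch

# Sketch under our own assumptions (discrete-time tanh RNN): recurrent and readout
# weights are fixed random matrices; only the bias b is trained.
torch.manual_seed(0)
N, T = 256, 100
W = torch.randn(N, N) * 1.5 / N**0.5        # fixed random recurrent weights
W_out = torch.randn(1, N) / N**0.5          # fixed random readout
b = torch.zeros(N, requires_grad=True)      # the only trainable parameter

target = torch.sin(torch.linspace(0, 4 * math.pi, T))   # toy 1-D trajectory
opt = torch.optim.Adam([b], lr=1e-2)

for step in range(300):
    h, outputs = torch.zeros(N), []
    for t in range(T):
        h = torch.tanh(W @ h + b)           # h_{t+1} = tanh(W h_t + b)
        outputs.append(W_out @ h)
    loss = (torch.cat(outputs) - target).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```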
Empirical Validation
Multi-task Learning with FFNs
The empirical validation shows that FFNs with fixed random weights but learned biases perform comparably to fully trained networks on multiple image classification datasets (e.g., MNIST, Fashion MNIST), albeit requiring much wider hidden layers than their fully trained counterparts.
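A minimal sketch of how such a multi-task setup can be implemented, assuming one bias pair per task on top of shared frozen random weights (the layer sizes and training details here are placeholders, not the authors' configuration):

```python
import torch
import torch.nn.functional as F

# Multi-task sketch under our assumptions: shared frozen random weights,
# one trainable bias pair per task.
torch.manual_seed(0)
W1 = torch.randn(4096, 784) / 784**0.5      # shared, never updated
W2 = torch.randn(10, 4096) / 4096**0.5      # shared, never updated

n_tasks = 2                                 # e.g., MNIST and Fashion MNIST
biases = [(torch.zeros(4096, requires_grad=True),
           torch.zeros(10, requires_grad=True)) for _ in range(n_tasks)]
optims = [torch.optim.Adam(list(bs), lr=1e-3) for bs in biases]

def forward(x, task):
    b1, b2 = biases[task]
    return torch.relu(x @ W1.T + b1) @ W2.T + b2

x, y, task = torch.randn(32, 784), torch.randint(0, 10, (32,)), 0  # placeholder batch
loss = F.cross_entropy(forward(x, task), y)
optims[task].zero_grad()
loss.backward()
optims[task].step()                         # updates only task 0's biases
```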
The experiments also reveal an underlying mechanism: task-specific clusters of neurons emerge during bias learning. A task-variance analysis shows that bias learning gives rise to specialized neural subcircuits.
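In the spirit of that analysis, here is a hedged sketch of a per-unit task-variance statistic (the exact measure used by the authors may differ): units whose normalized variance concentrates on a single task form the task-specific clusters described above.

```python
import numpy as np

# Hedged sketch of a per-unit task-variance statistic; the authors' exact
# measure may differ.
def normalized_task_variance(activations_by_task):
    # activations_by_task: list of arrays, each of shape (num_samples, num_units)
    tv = np.stack([a.var(axis=0) for a in activations_by_task])  # (num_tasks, num_units)
    return tv / (tv.sum(axis=0, keepdims=True) + 1e-12)

rng = np.random.default_rng(0)
toy_acts = [rng.normal(scale=s, size=(500, 64)) for s in (1.0, 0.3)]  # fake activations
print(normalized_task_variance(toy_acts).shape)                       # (2, 64)
```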
Bias vs. Mask Learning
The paper compares bias learning with masking, in which specific neurons or weights are selected to remain active. While both approaches achieve similar performance, bias learning slightly outperforms masking, owing to its greater flexibility. The learned solutions also differ: bias learning yields higher unit variance and less sparsity than masking.
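To make the contrast concrete, a small sketch (our illustration) of the two parameterizations on a frozen random layer: mask learning multiplies each unit's activation by a learned gate (a sigmoid relaxation of a binary mask), while bias learning shifts the pre-activation additively.

```python
import torch

# Our illustration of the two parameterizations on a frozen random layer.
torch.manual_seed(0)
W = torch.randn(512, 784) / 784**0.5                 # frozen random weights

b = torch.zeros(512, requires_grad=True)             # bias learning
mask_logits = torch.zeros(512, requires_grad=True)   # mask learning

x = torch.randn(32, 784)
h_bias = torch.relu(x @ W.T + b)                            # h = ReLU(Wx + b)
h_mask = torch.sigmoid(mask_logits) * torch.relu(x @ W.T)   # h = m * ReLU(Wx)
```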
Autonomous and Non-autonomous Dynamical Systems
In the context of RNNs, the paper demonstrates that networks with random weights can effectively model both autonomous and non-autonomous dynamical systems by learning only the biases. For autonomous systems, such as oscillators, the learned biases modulate the Jacobian to produce dynamics like limit cycles. For non-autonomous systems, such as portions of the Lorenz attractor, the bias-learning network accurately forecasts future values given current states and external inputs.
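A back-of-the-envelope illustration of how biases can reshape dynamics, assuming a generic rate RNN of the form dh/dt = -h + W tanh(h + b) (the paper's model may differ): the Jacobian depends on b through the slope of the nonlinearity, so shifting biases moves units between saturated and sensitive regimes and changes the eigenvalues that govern the local dynamics.

```python
import numpy as np

# Sketch under our own assumptions: for dh/dt = -h + W tanh(h + b), the Jacobian at a
# state h is J = -I + W diag(1 - tanh(h + b)^2). Changing only b changes which units
# sit on the steep part of tanh, and thereby reshapes the eigenvalues of J.
rng = np.random.default_rng(0)
N = 100
W = rng.normal(0, 1.5 / np.sqrt(N), size=(N, N))   # fixed random weights

def jacobian(h, b):
    gain = 1.0 - np.tanh(h + b) ** 2               # tanh'(h + b)
    return -np.eye(N) + W * gain[None, :]          # -I + W diag(gain)

h = rng.normal(size=N)
for b_scale in (0.0, 2.0):
    b = b_scale * rng.normal(size=N)
    eig = np.linalg.eigvals(jacobian(h, b))
    print(f"bias scale {b_scale}: max Re(eigenvalue) = {eig.real.max():+.3f}")
```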
Implications and Future Directions
Practical Applications
The ability to fine-tune biases while keeping weights fixed has significant implications for both neuroscience and AI. In neuroscience, it suggests that behavioral adaptations and memory retrieval could occur through mechanisms other than synaptic plasticity, such as changes in neuron firing thresholds or tonic inputs. In AI, it provides a pathway for efficient multi-task learning with significantly fewer trainable parameters.
Theoretical Extensions
The paper opens avenues for further theoretical exploration. Future work could address the hidden layer width requirements, which are currently much larger in practice than suggested by the proofs. Extending these findings to settings beyond high-probability bounds, including almost sure convergence or Lp norms, could also provide deeper insights.
Moreover, examining the interplay between bias and gain modulation offers fertile ground for bridging neural and synaptic plasticity, and exploring which initial weight distributions make bias learning most effective is an intriguing direction that could yield more biologically plausible models of neural computation.
Conclusion
Williams et al.'s work is a foundational step toward understanding the expressivity of neural networks constrained to learn only biases. It bridges theoretical results with practical applications, offering valuable insights for both machine learning paradigms and neuroscientific models of learning and memory. The methodology and results suggest promising directions for optimizing neural computation and understanding non-synaptic learning.