
Neural networks are a priori biased towards Boolean functions with low entropy

Published 25 Sep 2019 in cs.LG and stat.ML | (1909.11522v3)

Abstract: Understanding the inductive bias of neural networks is critical to explaining their ability to generalise. Here, for one of the simplest neural networks -- a single-layer perceptron with n input neurons, one output neuron, and no threshold bias term -- we prove that upon random initialisation of weights, the a priori probability $P(t)$ that it represents a Boolean function that classifies $t$ points in $\{0,1\}^n$ as 1 has a remarkably simple form: $P(t) = 2^{-n}$ for $0 \leq t < 2^n$. Since a perceptron can express far fewer Boolean functions with small or large values of $t$ (low entropy) than with intermediate values of $t$ (high entropy), there is, on average, a strong intrinsic a priori bias towards individual functions with low entropy. Furthermore, within a class of functions with fixed $t$, we often observe a further intrinsic bias towards functions of lower complexity. Finally, we prove that, regardless of the distribution of inputs, the bias towards low entropy becomes monotonically stronger upon adding ReLU layers, and empirically show that increasing the variance of the bias term has a similar effect.

Citations (34)

Summary

  • The paper reveals that random initialization biases simple neural networks toward low-entropy (simpler) Boolean functions.
  • The paper finds that adding layers and ReLU activation intensifies this bias, leading to a preference for functions that generalize effectively.
  • The paper extends insights from perceptrons to deep models, suggesting that architectural complexity systematically enhances simplicity bias.

Exploring the Inductive Bias of Simple Neural Networks and Its Generalization across Architectures

Introduction to Inductive Bias in Neural Networks

Inductive bias in neural networks fundamentally influences their capability to generalize beyond training data. With deep neural networks (DNNs) performing robustly in overparameterized settings where traditional theories predict generalization failures, it becomes crucial to understand the sources and impacts of these biases. This paper examines the intrinsic bias of neural networks at random initialisation of parameters, analysing how this bias shapes the types of functions these networks are likely to express and its implications for generalization performance.

Overview of Main Contributions

  • Uniform Distribution over t for the Perceptron: Demonstrated that for a single-layer perceptron with randomly initialised weights and no threshold bias term, the probability $P(t)$ that it classifies exactly $t$ of the $2^n$ Boolean inputs as 1 is uniform: $P(t) = 2^{-n}$ for $0 \leq t < 2^n$.
  • Strengthening Low-Entropy Bias with Depth: Proved that adding ReLU layers makes the bias towards low-entropy (simpler) functions monotonically stronger, regardless of the input distribution, and showed empirically that increasing the variance of the threshold bias term has a similar effect.
  • Relevance to Generic Deep Learning Models: These insights extend beyond perceptrons, indicating a pervasive attribute of deeper and more complex models, potentially explaining their notable generalization capabilities in practical applications.
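The first contribution can be probed directly for small n. The sketch below (an illustration written for this summary, not code from the paper) samples random Gaussian weight vectors for a bias-free perceptron on n = 3 inputs and collects the distinct Boolean functions found at each value of t. Far fewer distinct functions turn up at extreme t (e.g. only the constant-0 function at t = 0) than at intermediate t, which is why a uniform distribution over t gives each individual low-entropy function more probability.

```python
import itertools
import numpy as np

def perceptron_function(w, inputs):
    # Represent the Boolean function as a tuple of outputs over all inputs.
    # A point x is classified 1 iff w . x > 0 (no threshold bias term).
    return tuple(int(np.dot(w, x) > 0) for x in inputs)

def distinct_functions_by_t(n=3, samples=100_000, seed=0):
    rng = np.random.default_rng(seed)
    inputs = list(itertools.product([0, 1], repeat=n))
    by_t = {t: set() for t in range(2 ** n)}
    for w in rng.standard_normal((samples, n)):
        f = perceptron_function(w, inputs)
        by_t[sum(f)].add(f)
    return {t: len(fs) for t, fs in by_t.items()}

counts = distinct_functions_by_t()
print(counts)  # t = 0 admits only the constant-0 function
```

Note that t = 2^n never occurs: with no bias term, the all-zeros input always yields w . x = 0 and is classified 0.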

Perceptron's Bias Analysis (No Threshold Bias)

  1. Uniform Probability over t: When a perceptron's weights are initialised randomly (with no threshold bias term), each possible value of t — the number of inputs in {0,1}^n classified as 1 — is equally probable, with P(t) = 2^{-n}.
  2. Entropy-Based Function Bias: Because far fewer Boolean functions have extreme (low or high) values of t than intermediate values, this uniform distribution over t concentrates probability on individual low-entropy functions. On average, a randomly initialised perceptron is therefore strongly biased towards simpler Boolean functions — a bias inherent in the architecture and the probabilistic character of its parameter initialisation.
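The uniform-P(t) claim is easy to check by Monte Carlo for small n. This sketch (illustrative, assuming i.i.d. standard Gaussian weights, which satisfy the paper's symmetry conditions) samples weight vectors for a bias-free perceptron on n = 3 inputs and records how many of the 2^n = 8 Boolean inputs each classifies as 1; every empirical frequency should land close to 2^{-n} = 0.125.

```python
import itertools
import numpy as np

def estimate_P_t(n=3, samples=100_000, seed=0):
    """Empirical distribution of t, the number of inputs in {0,1}^n
    that a randomly initialised bias-free perceptron classifies as 1."""
    rng = np.random.default_rng(seed)
    inputs = np.array(list(itertools.product([0, 1], repeat=n)))
    counts = np.zeros(2 ** n)
    for w in rng.standard_normal((samples, n)):
        t = int(np.sum(inputs @ w > 0))  # x classified 1 iff w . x > 0
        counts[t] += 1
    return counts / samples

print(estimate_P_t())  # each entry should be close to 1/8 = 0.125
```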

Influence of Architecture and Initialization

  • Implications of Depth and Activation Functions: Adding ReLU layers provably skews the distribution of represented functions towards even lower entropy, and this effect strengthens monotonically with depth, regardless of the input distribution — a deepening bias towards simplicity as network complexity grows.
  • Role of Initialisation Hyperparameters: Empirically, increasing the variance of the threshold bias term strengthens the low-entropy bias in a similar way, indicating that initialisation choices can modulate a network's intrinsic bias.
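As a rough empirical illustration of the depth effect (a sketch under assumed i.i.d. Gaussian initialisation, not the paper's own experimental setup), one can compare the mean output entropy H(t/2^n) — the binary entropy of the fraction of inputs classified as 1 — for a bias-free perceptron against a network with one ReLU hidden layer. The ReLU network's mean entropy should come out lower, reflecting the stronger bias towards low-entropy functions.

```python
import itertools
import numpy as np

def entropy(p):
    # Binary entropy of the fraction of inputs classified as 1.
    if p in (0.0, 1.0):
        return 0.0
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def mean_entropy(sample_t, n=3, samples=50_000, seed=0):
    rng = np.random.default_rng(seed)
    inputs = np.array(list(itertools.product([0, 1], repeat=n)))
    ts = [sample_t(rng, inputs, n) for _ in range(samples)]
    return float(np.mean([entropy(t / 2 ** n) for t in ts]))

def perceptron_t(rng, inputs, n):
    w = rng.standard_normal(n)
    return int(np.sum(inputs @ w > 0))

def relu_net_t(rng, inputs, n, width=4):
    W1 = rng.standard_normal((n, width))
    v = rng.standard_normal(width)
    hidden = np.maximum(inputs @ W1, 0.0)   # one ReLU hidden layer
    return int(np.sum(hidden @ v > 0))

perc = mean_entropy(perceptron_t)
relu = mean_entropy(relu_net_t)
print(perc, relu)  # the ReLU network's mean entropy is lower
```

The hidden width of 4 is an arbitrary illustrative choice; the qualitative gap persists across widths.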

Towards Generalizing Networks

  • Expressivity and Complexity: Analysis in the paper outlines that increasing a network's depth (while maintaining or slightly modifying its breadth) allows it to preserve its expressiveness — capable of representing more complex Boolean functions while enhancing simplicity bias.
  • Empirical Evidence: Experiments with standard datasets and architectures corroborate the theoretical assertions, showing that deeper or more complex networks tend to generalize better, possibly because of their stronger simplicity biases.

Conclusion and Future Directions

The paper highlights a crucial aspect of neural network design and initialization — the bias towards simplicity inherent in network architectures and their initialization strategies. Future research could explore:

  • Other Network Architectures: Examining whether similar biases are observable in other types of neural networks, such as those used in unsupervised learning or reinforcement learning contexts.
  • Impact on Learning Dynamics: Further studies on how these biases affect the dynamics of network training, particularly under different optimization scenarios or with non-standard objective functions.

Definitions and Terminology

Terms such as DNNs, activation function (specifically ReLU), and other standard neural network architecture terms are used consistently with established definitions in the machine learning literature. The paper also clearly lays out the specific measures and definitions employed in its analyses, ensuring clarity and precision in its technical arguments.

In summary, this investigation into the intrinsic biases of neural networks, beginning from simple models like perceptrons and extending to more involved architectures, provides valuable insights into why these models often generalize well in practical scenarios, despite theoretical predictions to the contrary.
