Neural Bias Functions

Updated 13 May 2026

Neural bias functions are the inherent preferences of neural networks determined by architecture, initialization, and training protocols, affecting the range of functions they can learn.
They are analyzed using mathematical frameworks like Fourier spectral analysis, rank, entropy, and geometric simplicity to quantify biases toward low-frequency and low-dimensional representations.
Tuning neural bias through activation functions, network depth, and regularization strategies can enhance sample efficiency, robustness, and overall generalization performance.

A neural bias function characterizes the inherent functional preference or inductive bias imposed by the architecture, initialization, parameterization, and training protocol of a neural network. In contemporary theory, neural bias functions are formalized as the subset of target functions that a network finds easiest to fit, or equivalently, the functions the network is most likely to represent or generalize to, under constraints such as finite data, overparameterization, and specific regularization or optimization methods. The mathematical and empirical frameworks for neural bias functions relate these preferences to spectral, geometric, algebraic, and combinatorial properties of neural function classes.

1. Mathematical Formalism of Neural Bias in Function Space

Let $f: X \to Y$ be a function realized by a neural network with a given architecture $\mathcal{A}$, initialization distribution $\mathcal{P}_0$, and (optionally) regularization scheme $\mathcal{R}$. The neural bias function $B$ is a real-valued functional over the hypothesis space, explicitly or implicitly minimized by the network's training dynamics:

$f^* = \arg\min_{f\,:\,\mathcal{L}_{\mathrm{train}}(f) = 0} B[f],$

or, in regularized settings,

$f^* = \arg\min_{f} \big\{ \mathcal{L}_{\mathrm{train}}(f) + \lambda B[f] \big\}.$

Here, $B$ may quantify functional complexity, spectral content, rank, entropy, sensitivity, or other structural properties. This framework underlies both theoretical analyses and meta-learning approaches that extract neural bias functions from models of biological neural circuits and from artificial networks (Dorrell et al., 2022).
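The implicit form $f^* = \arg\min_{f:\mathcal{L}_{\mathrm{train}}(f)=0} B[f]$ has a well-known closed-form instance: for an overparameterized linear model trained by gradient descent on the squared loss from zero initialization, the iterates converge to the interpolant of minimum Euclidean norm, so $B$ is the squared weight norm. The minimal NumPy sketch below checks this numerically against the pseudoinverse solution; the helper `gd_interpolant`, the problem sizes, and the random data are illustrative assumptions, not part of the cited work.

```python
import numpy as np

def gd_interpolant(X, y, steps=5000):
    """Hypothetical helper: gradient descent on (1/2) * mean squared error, starting from w = 0."""
    n, d = X.shape
    w = np.zeros(d)
    # 1 / lambda_max of the loss Hessian X^T X / n is a stable step size.
    lr = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / n
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 100))   # 20 samples, 100 parameters: infinitely many interpolants
y = rng.standard_normal(20)

w_gd = gd_interpolant(X, y)
w_min_norm = np.linalg.pinv(X) @ y   # argmin ||w||_2 subject to X w = y

print(np.abs(X @ w_gd - y).max())          # training residual, near machine precision
print(np.linalg.norm(w_gd - w_min_norm))   # distance to the minimum-norm interpolant, near 0
```

Every gradient step lies in the row space of `X`, so the iterates never acquire a null-space component; that is why the minimum-norm interpolant is the one selected.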

2. Spectral Bias: Fourier and Walsh–Hadamard Characterizations

Neural networks trained by gradient descent exhibit a pronounced spectral bias: they learn the low-frequency (smooth) components of a target function before its high-frequency (rapidly varying) components. For Boolean inputs $x \in \{0,1\}^d$, any function $f$ admits a Walsh–Hadamard (Boolean Fourier) expansion $f(x) = \sum_{S \subseteq [d]} \hat{f}(S)\, \chi_S(x)$, with $\chi_S(x) = (-1)^{\sum_{i \in S} x_i}$ and $\hat{f}(S) = 2^{-d} \sum_{x} f(x)\, \chi_S(x)$. The degree $|S|$ indexes the frequency; neural nets typically fit low-degree components rapidly while severely underfitting higher-degree interactions (Gorji et al., 2023, Rahaman et al., 2018). This low-degree spectral bias directly limits the network's ability to generalize in tasks where high-order interactions matter, but may provide a useful inductive bias for natural, smooth signals. The spectral bias is tightly connected to the eigenstructure of the network's Neural Tangent Kernel (NTK), whose eigenvalues typically decay rapidly for high-frequency basis functions (Choraria et al., 2022, Geifman et al., 2023).
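The Walsh–Hadamard coefficients $\hat{f}(S) = 2^{-d}\sum_x f(x)\,(-1)^{\sum_{i\in S} x_i}$ can be computed with a fast Walsh–Hadamard transform. A minimal NumPy sketch, assuming $f$ is given as a table of its $2^d$ values whose integer index $x$ encodes the bit vector; `walsh_hadamard_coefficients` and `energy_by_degree` are hypothetical helper names. Grouping squared coefficients by degree $|S|$ gives a per-degree spectrum, one simple way to inspect which degrees a fitted function contains.

```python
import numpy as np

def walsh_hadamard_coefficients(values):
    """Hypothetical helper: f_hat(S) = 2^{-d} * sum_x f(x) * (-1)^{popcount(S & x)}.

    values[x] holds f(x); the integer index x encodes the bit vector (x_0, ..., x_{d-1}).
    """
    a = np.array(values, dtype=float)
    n = a.size
    assert n > 0 and (n & (n - 1)) == 0, "need a table of length 2^d"
    h = 1
    while h < n:
        # Butterfly: merge pairs of length-h blocks into one length-2h block.
        for start in range(0, n, 2 * h):
            u = a[start:start + h].copy()
            v = a[start + h:start + 2 * h].copy()
            a[start:start + h] = u + v
            a[start + h:start + 2 * h] = u - v
        h *= 2
    return a / n

def energy_by_degree(coeffs):
    """Sum of squared coefficients grouped by degree |S| = popcount(S)."""
    d = int(np.log2(coeffs.size))
    degrees = np.array([bin(s).count("1") for s in range(coeffs.size)])
    return np.array([np.sum(coeffs[degrees == k] ** 2) for k in range(d + 1)])

d = 4
parity = np.array([(-1.0) ** bin(x).count("1") for x in range(2 ** d)])  # chi_S with S = [d]
dictator = np.array([(-1.0) ** (x & 1) for x in range(2 ** d)])          # chi_S with S = {0}
print(energy_by_degree(walsh_hadamard_coefficients(parity)))    # all energy at degree 4
print(energy_by_degree(walsh_hadamard_coefficients(dictator)))  # all energy at degree 1
```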

3. Rank, Entropy, and Geometric Simplicity Biases

Beyond spectral bias, neural networks can exhibit functional biases encoded as preferences for:

Low-rank function structure: In fully connected networks with homogeneous nonlinearities, the implicit bias under $L_2$ (weight-norm) regularization or depth-scaling promotes solutions of minimal (nonlinear) rank, defined via bottleneck or Jacobian rank factorizations (Jacot, 2022). This drives networks towards single- or multi-index models and low-dimensional subspace variation (Parkinson et al., 2023).
Low-entropy Boolean functions: The Boolean functions most likely to be realized by randomly initialized perceptrons are those with extremely low or high bias (the fraction of inputs mapped to 1), i.e. functions that classify nearly all points as 0 or nearly all as 1, so the prior is strongly biased towards such low-entropy functions. This effect is magnified in deeper networks built by stacking ReLU layers, and by certain bias-term settings (Mingard et al., 2019).
Geometric simplicity: Random shallow ReLU networks tend to output functions with few "kinks" (breakpoints), with these located preferentially near the origin. This geometric bias makes it extremely unlikely for such networks to approximate highly oscillatory or evenly kink-distributed targets, regardless of their Kolmogorov complexity (Holmes, 2023).
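The entropy bias has an exact statement for the simplest case: Mingard et al. (2019) prove that for a perceptron with no threshold bias term and weights drawn from a distribution symmetric under reflections of the coordinate axes, the probability $P(t)$ of realizing a function that maps exactly $t$ of the $2^n$ inputs to 1 is uniform, $P(t) = 2^{-n}$ for $0 \le t < 2^n$. Since $t = 0$ is a single function, and (because the all-zeros input always maps to 0) $t = 2^n - 1$ is also realized by a single function, while intermediate values of $t$ are shared among multiple functions, each extreme low-entropy function receives more prior mass. The Monte Carlo sketch below assumes standard normal weights; `sample_t_distribution` is a hypothetical helper.

```python
import itertools
import numpy as np

def sample_t_distribution(n, num_samples=200_000, seed=0):
    """Hypothetical helper: Monte Carlo estimate of P(t) for f(x) = 1[w . x > 0], x in {0,1}^n.

    t counts the inputs mapped to 1; there is no threshold bias term, and w ~ N(0, I)
    is assumed (it satisfies the reflection symmetry the theorem requires).
    """
    rng = np.random.default_rng(seed)
    inputs = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)  # (2^n, n)
    w = rng.standard_normal((num_samples, n))
    outputs = (w @ inputs.T) > 0            # one truth table per sampled perceptron
    t = outputs.sum(axis=1)
    return np.bincount(t, minlength=2 ** n + 1) / num_samples

p = sample_t_distribution(3)
print(np.round(p, 3))   # entries t = 0..7 near 1/8; t = 8 is impossible (x = 0 always maps to 0)
```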

These effects are summarized in the following table:

Bias Type	Mathematical Metric	Structural Preference
Spectral bias	Fourier degree, eigenvalues	Low-frequency content
Rank bias	Nonlinear bottleneck rank	Low-dimensional subspace
Entropy bias	Output distribution entropy	Majority-class output functions
Geometric simplicity	Breakpoints, kink density	Few/clustered nonlinearity
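The geometric-simplicity row can be illustrated in one dimension. For a one-hidden-layer ReLU network $g(x) = \sum_i a_i \max(0, w_i x + b_i)$, each unit with $a_i \ne 0$ contributes a kink at $x = -b_i / w_i$. If $w_i$ and $b_i$ are independent standard normals (an assumption made here for illustration, not a reproduction of Holmes, 2023), $-b_i / w_i$ is standard Cauchy, so breakpoints cluster near the origin: half of them lie in $[-1, 1]$. The sketch pools kinks from many independent units; `random_relu_breakpoints` is a hypothetical helper.

```python
import numpy as np

def random_relu_breakpoints(num_units, seed=0):
    """Hypothetical helper: kink locations x = -b / w of random ReLU units max(0, w * x + b).

    Weights and biases are i.i.d. standard normal (an illustrative assumption).
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(num_units)
    b = rng.standard_normal(num_units)
    return -b / w

kinks = random_relu_breakpoints(200_000)
frac_unit = np.mean(np.abs(kinks) <= 1.0)    # standard Cauchy: 1/2 in expectation
frac_ten = np.mean(np.abs(kinks) <= 10.0)    # (2 / pi) * arctan(10), roughly 0.94 in expectation
print(frac_unit, frac_ten)
```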

4. Bias Control: Architecture, Activation, and Regularization

The magnitude and nature of neural bias functions can be influenced, or even tuned, by architectural and training choices:

Activation function: The choice of nonlinearity is a key determinant. Piecewise-linear ReLU enforces a heavy low-frequency bias; replacing it with higher-order B-splines ("Hat" functions) or variable-periodic activations (e.g., FINER, with bias scaling) can suppress or flexibly tune spectral bias (Hong et al., 2022, Liu et al., 2023).
Depth and linear layers: Adding linear layers before the ReLU layer changes the representation cost of a function, imposing a stronger low-rank (quasi-norm) penalty that favors single- and multi-index models (Parkinson et al., 2023).
Regularization in function space: Techniques such as direct $\ell_1$-norm sparsification of Fourier (Walsh–Hadamard) coefficients or spectrum reweighting in NTK/KRR regimes enable explicit control over which frequencies/components the network captures, with scalable algorithms such as the Hashed Walsh–Hadamard regularizer (Gorji et al., 2023, Geifman et al., 2023).
Bias strategies: Neural expressivity can be modulated by learning bias terms even when weights are fixed, as demonstrated in universal approximation theorems for bias-only learning and in architectures with dense per-connection biases (e.g., DAC units) (Williams et al., 2024, Metta et al., 2023).
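For small $d$, the function-space $\ell_1$ regularizer can be written out exactly by evaluating the model on all $2^d$ inputs; the hashed regularizer of Gorji et al. (2023) is designed to avoid this enumeration for large $d$ and is not reproduced here. The sketch computes the penalty $\lambda \sum_S |\hat{f}(S)|$ from a table of predictions, plus a soft-thresholding step on the coefficients, the shrinkage an $\ell_1$ coefficient penalty induces. `hadamard_matrix`, `wh_l1_penalty`, and `shrink_wh_coefficients` are hypothetical names.

```python
import numpy as np

def hadamard_matrix(d):
    """Sylvester Hadamard matrix with H[S, x] = (-1)^{popcount(S & x)}, shape (2^d, 2^d)."""
    H = np.array([[1.0]])
    for _ in range(d):
        H = np.block([[H, H], [H, -H]])
    return H

def wh_l1_penalty(predictions, lam):
    """Exact (non-hashed) penalty lam * sum_S |f_hat(S)| from predictions on all of {0,1}^d."""
    n = predictions.size
    coeffs = hadamard_matrix(int(np.log2(n))) @ predictions / n
    return lam * np.abs(coeffs).sum()

def shrink_wh_coefficients(predictions, threshold):
    """Soft-threshold the WH coefficients, then map back to function values (f = H @ f_hat)."""
    n = predictions.size
    H = hadamard_matrix(int(np.log2(n)))
    coeffs = H @ predictions / n
    shrunk = np.sign(coeffs) * np.maximum(np.abs(coeffs) - threshold, 0.0)
    return H @ shrunk

# A degree-1 signal plus a small spurious degree-4 component, tabulated on {0,1}^4.
d = 4
dictator = np.array([(-1.0) ** (x & 1) for x in range(2 ** d)])
parity = np.array([(-1.0) ** bin(x).count("1") for x in range(2 ** d)])
f = dictator + 0.05 * parity
print(wh_l1_penalty(f, lam=1.0))           # approximately 1.05
print(shrink_wh_coefficients(f, 0.1))      # degree-4 term removed, degree-1 term shrunk to 0.9
```

In a training loop, the penalty would be added to the data loss so that gradients flow through the predictions; the shrinkage function only shows which components an $\ell_1$ coefficient penalty suppresses first.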

5. Empirical and Theoretical Consequences

The practical impact of neural bias functions is manifold:

Generalization and sample efficiency: The alignment between the class of target functions and the network's bias function ultimately determines generalization capability in limited data regimes. Networks biased toward the relevant function space—e.g., true low-degree, low-rank, or low-entropy tasks—achieve significantly higher sample efficiency.
Spurious modes and failure cases: Spectral bias not only underrepresents needed high-degree components but often "hallucinates" spurious low-degree modes absent from the target, leading to systematic overfitting and generalization failures if not properly controlled (Gorji et al., 2023).
Regularization and robustness: Architectural biases can enhance robustness (e.g., low-sensitivity bias of transformers for adversarial stability (Vasudeva et al., 2024)), but may also render models fragile on out-of-distribution or high-frequency tasks if the bias is mismatched to the problem.
Empirical validation: Controlled experiments routinely demonstrate that, without bias-controlling interventions, deep networks fit low-degree Fourier terms early (as seen in spectral-evolution heatmaps), fit high-frequency components slowly, and perform poorly on artificial high-oscillation or high-entropy targets (Gorji et al., 2023, Rahaman et al., 2018, Choraria et al., 2022, Liu et al., 2023).
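A toy version of such experiments fits in a few lines. The sketch below is an assumption-laden stand-in rather than a reproduction of the cited studies: it freezes a random ReLU hidden layer and trains only the output weights by gradient descent (a random-features, kernel-regime model), fits a target containing one low-frequency and one high-frequency sinusoid on a uniform grid, and reports how much of each component remains in the residual. The helper name, frequencies, width, and step count are arbitrary choices. Under spectral bias, the low-frequency amplitude should shrink much faster than the high-frequency one.

```python
import numpy as np

def residual_frequency_amplitudes(steps=2000, n=256, width=512, low=1, high=8, seed=0):
    """Residual amplitudes of a low- and a high-frequency target component after training.

    Hypothetical helper. Model: f(x) = Phi(x) @ a with frozen random ReLU features
    relu(w * (x - c)); only the output weights `a` are trained by gradient descent.
    """
    rng = np.random.default_rng(seed)
    x = np.arange(n) / n                           # uniform grid on [0, 1)
    s_low = np.sin(2 * np.pi * low * x)
    s_high = np.sin(2 * np.pi * high * x)
    y = s_low + s_high                             # target with both components

    w = rng.standard_normal(width)                 # random slopes (sign picks the hinge side)
    c = rng.uniform(0.0, 1.0, width)               # kink locations spread over [0, 1)
    phi = np.maximum(w * (x[:, None] - c), 0.0) / np.sqrt(width)  # shape (n, width)

    # Step size 1 / lambda_max keeps every residual mode's factor (1 - lr * lambda) in [0, 1].
    lr = 1.0 / np.linalg.eigvalsh(phi @ phi.T / n).max()
    a = np.zeros(width)
    for _ in range(steps):
        residual = phi @ a - y
        a -= lr * phi.T @ residual / n             # gradient of (1/2) * mean squared error

    residual = phi @ a - y
    # On this grid, 2 * mean(r * sin(2 pi k x)) recovers the amplitude of frequency k in r.
    return abs(2 * np.mean(residual * s_low)), abs(2 * np.mean(residual * s_high))

low_amp, high_amp = residual_frequency_amplitudes()
print(f"low-frequency residual: {low_amp:.3f}, high-frequency residual: {high_amp:.3f}")
```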

6. Meta-Learning, Neuroscience, and Interpretability

Meta-learning frameworks allow for the data-driven extraction of neural bias functions from black-box learners, including highly nonlinear circuits or biological networks. By constructing meta-objective loops, one can characterize the functional templates to which a given neural system generalizes most readily, providing a method to empirically reveal and interpret the network's inductive priors (Dorrell et al., 2022). This methodology forms a bridge to neuroscientific theories where bias functions help explain rapid, robust generalization in neural circuits, the effects of architectural motifs (such as clustered connectivity or sign constraints), and the emergence of efficient behavioral computations.

7. Future Directions and Open Problems

Research on neural bias functions continues to advance:

Development of scalable, function-space regularizers for complex, high-dimensional domains (Gorji et al., 2023).
Characterization of bias in nonstandard architectures (transformers, spiking nets, convolutional ensembles) under different optimization regimes (Vasudeva et al., 2024, Dorrell et al., 2022).
Further connections between functional bias, sample complexity, and PAC–Bayesian generalization bounds (Palma et al., 2018, Mingard et al., 2019).
Extension of bias-matrix methods to other domains such as sequence analysis in neuroscience, where pairwise order biases capture complex temporal structure (Roth, 2016).
Engineering of activations and network motifs to tightly control the spectrum, rank, or geometric properties of the network's function space in order to optimize both approximation power and data efficiency (Hong et al., 2022, Lucey, 2025, Liu et al., 2023).

In sum, the neural bias function is central to the theoretical understanding and practical engineering of neural network generalization. It links architectural design, parameterization, regularization, and optimization to the precise manner in which a neural system "prefers" solutions in function space.

References:

"A Scalable Walsh-Hadamard Regularizer to Overcome the Low-degree Spectral Bias of Neural Networks" (Gorji et al., 2023)
"Implicit Bias of Large Depth Networks: a Notion of Rank for Nonlinear Functions" (Jacot, 2022)
"Neural networks are a priori biased towards Boolean functions with low entropy" (Mingard et al., 2019)
"FINER: Flexible spectral-bias tuning in Implicit NEural Representation by Variable-periodic Activation Functions" (Liu et al., 2023)
"The Spectral Bias of Polynomial Neural Networks" (Choraria et al., 2022)
"Meta-Learning the Inductive Biases of Simple Neural Circuits" (Dorrell et al., 2022)
"Controlling the Inductive Bias of Wide Neural Networks by Modifying the Kernel's Spectrum" (Geifman et al., 2023)
"Points of non-linearity of functions generated by random neural networks" (Holmes, 2023)
"On the Spectral Bias of Neural Networks" (Rahaman et al., 2018)
"Random deep neural networks are biased towards simple functions" (Palma et al., 2018)
"On the Activation Function Dependence of the Spectral Bias of Neural Networks" (Hong et al., 2022)
"Expressivity of Neural Networks with Random Weights and Learned Biases" (Williams et al., 2024)
"Increasing biases can be more efficient than increasing weights" (Metta et al., 2023)
"Transformers Learn Low Sensitivity Functions: Investigations and Implications" (Vasudeva et al., 2024)
"Analysis of neuronal sequences using pairwise biases" (Roth, 2016)
"ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models" (Parkinson et al., 2023)
"Gradient Descent as a Shrinkage Operator for Spectral Bias" (Lucey, 2025)