- The paper presents bias-variance alignment where squared bias approximates variance on correctly classified samples, challenging the classical trade-off view.
- It uses quantitative log-scale regression analysis to show that this alignment strengthens in over-parameterized models across datasets such as ImageNet and CIFAR.
- The study links calibration and neural collapse theories to bias-variance dynamics, offering a novel method for model validation and selection.
Insights into Bias-Variance Alignment in Deep Learning Models
The paper "It's an Alignment, Not a Trade-off:" challenges the classical view of bias-variance trade-off in machine learning by presenting empirical and theoretical evidence for a phenomenon termed as "bias-variance alignment." This paper provides a detailed analysis, suggesting that for large, over-parameterized deep learning models, bias and variance align at a sample level, thereby exhibiting a structure different from the traditionally assumed trade-off.
Empirical Observations and Methodologies
The authors demonstrate through extensive empirical analysis that, for correctly classified samples, ensembles of deep learning models exhibit squared bias approximately equal to variance, revealing a strong linear relationship on a logarithmic scale. This relationship, termed bias-variance alignment, is observed consistently across model architectures and datasets, including ImageNet and the CIFAR collections. Notably, the alignment becomes more pronounced as model size increases, indicating that it is characteristic of over-parameterized models.
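Concretely, the per-sample bias and variance studied here are estimated from an ensemble of independently trained models. The snippet below is a minimal sketch of one common way to compute these quantities under the squared loss with one-hot labels; the function and variable names are illustrative, and the paper's exact estimation protocol may differ.

```python
import numpy as np

def per_sample_bias_variance(probs, labels):
    """Estimate per-sample squared bias and variance from ensemble outputs.

    probs:  (n_models, n_samples, n_classes) softmax outputs from independently
            trained models (e.g., different random seeds or data orderings).
    labels: (n_samples,) integer class labels.
    Uses the squared loss against one-hot labels.
    """
    n_models, n_samples, n_classes = probs.shape
    one_hot = np.eye(n_classes)[labels]        # (n_samples, n_classes)
    mean_pred = probs.mean(axis=0)             # ensemble-average prediction

    # Squared bias: squared distance from the mean prediction to the label.
    sq_bias = np.sum((mean_pred - one_hot) ** 2, axis=-1)
    # Variance: average squared deviation of each model from the ensemble mean.
    variance = np.mean(np.sum((probs - mean_pred) ** 2, axis=-1), axis=0)

    # The alignment is reported on samples the ensemble classifies correctly.
    correct = mean_pred.argmax(axis=-1) == labels
    return sq_bias, variance, correct
```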
To quantify this alignment, the paper performs regression analysis in log scale, reporting high coefficients of determination when variance is regressed on squared bias. Scatter plots support the finding, especially for larger models: plotted on a linear scale, bias and variance form a cone-shaped distribution in which the spread of variance widens as bias grows.
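One simple way to reproduce this kind of measurement is to regress log-variance on log-squared-bias and inspect the slope and the coefficient of determination; a slope and R² close to one on the correctly classified subset would indicate alignment. The sketch below illustrates the general idea and is not the paper's exact regression specification.

```python
import numpy as np

def alignment_regression(sq_bias, variance, eps=1e-12):
    """Fit log10(variance) ~ a * log10(sq_bias) + b and report slope and R^2."""
    x = np.log10(sq_bias + eps)        # eps guards against log(0)
    y = np.log10(variance + eps)
    a, b = np.polyfit(x, y, deg=1)     # least-squares line in log-log space
    residuals = y - (a * x + b)
    r_squared = 1.0 - residuals.var() / y.var()
    return a, b, r_squared
```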
Theoretical Contributions
From a theoretical perspective, the paper explores two main lines of reasoning:
- Calibration Perspective: The authors connect model calibration, in which predicted probabilities match the true conditional label probabilities, with bias-variance alignment: if a model is well calibrated, then bias and variance align. Theoretical bounds show that the discrepancy between squared bias and variance can be bounded in terms of the calibration error. This builds on prior work linking calibration to generalization metrics, while making a novel contribution by tying calibration directly to bias-variance dynamics (see the decomposition sketched after this list).
- Neural Collapse Theory: Neural collapse describes the tendency of last-layer features in deep networks to converge to a highly symmetric geometric structure, with within-class features concentrating around their class means. The paper leverages this phenomenon to construct a statistical model in which ensemble predictions naturally exhibit bias-variance alignment. For binary classification in particular, the authors derive explicit bounds on the ratio of squared bias to variance, consistent with the approximations found in the empirical results (a toy simulation probing this ratio appears after this list).
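To make the calibration argument concrete, recall the standard per-sample decomposition of the expected squared loss, with the expectation taken over the randomness of training (data sampling, initialization, and so on). The display below sketches this textbook decomposition rather than restating the paper's exact theorem.

```latex
% Per-sample decomposition for a one-hot label y and a randomly trained predictor f(x):
\mathbb{E}\left[\lVert f(x) - y \rVert^2\right]
  = \underbrace{\lVert \mathbb{E}[f(x)] - y \rVert^2}_{\mathrm{Bias}^2(x)}
  + \underbrace{\mathbb{E}\left[\lVert f(x) - \mathbb{E}[f(x)] \rVert^2\right]}_{\mathrm{Var}(x)}
```

Bias-variance alignment is then the statement that Bias²(x) ≈ Var(x) on correctly classified inputs, and the calibration perspective bounds the gap between the two terms by how far the ensemble's confidence deviates from the true conditional label probability.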
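As a complementary illustration of the binary case, the toy Monte Carlo sketch below models ensemble predictions as sigmoids of Gaussian logits and inspects the squared-bias-to-variance ratio. The Gaussian-logit setup and all parameter values are assumptions made for illustration; they are inspired by, but not identical to, the paper's neural-collapse-based model.

```python
import numpy as np

def simulate_binary_ratio(n_models=256, n_samples=2000, mu=2.0, sigma=1.0, seed=0):
    """Toy binary-classification ensemble: each sample has a true logit near mu,
    and each 'model' perturbs it with Gaussian noise before applying a sigmoid
    to produce a probability for the (always positive) true class."""
    rng = np.random.default_rng(seed)
    true_logits = mu + sigma * rng.standard_normal(n_samples)
    logits = true_logits[None, :] + sigma * rng.standard_normal((n_models, n_samples))
    probs = 1.0 / (1.0 + np.exp(-logits))      # P(positive class) per model and sample

    mean_p = probs.mean(axis=0)
    sq_bias = (mean_p - 1.0) ** 2              # positive-class coordinate only;
    variance = probs.var(axis=0)               # the two-class ratio would be identical
    correct = mean_p > 0.5                     # correctly classified by the ensemble mean

    ratio = sq_bias[correct] / np.maximum(variance[correct], 1e-12)
    print(f"median squared-bias / variance on correct samples: {np.median(ratio):.2f}")

simulate_binary_ratio()
```

Varying mu and sigma gives a feel for how the ratio behaves across signal-to-noise regimes.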
Implications and Potential Applications
The findings of this paper have significant implications for model validation and selection. Because squared bias tracks variance on correctly classified samples, the variance of an ensemble's predictions can serve as a label-free proxy for bias, and hence for generalization error, allowing practitioners to validate models even when true labels are unavailable. This has practical value for developing more reliable AI systems and for understanding why deep, over-parameterized models generalize effectively despite their size.
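For instance, a label-free validation signal could be computed as in the sketch below; this is a heuristic built on the alignment claim, with illustrative names, not an estimator prescribed by the paper.

```python
import numpy as np

def label_free_risk_proxy(probs):
    """Mean per-sample prediction variance of an ensemble on *unlabeled* data.

    probs: (n_models, n_samples, n_classes) softmax outputs.
    Under bias-variance alignment, squared bias roughly equals variance on
    correctly classified points, so the per-sample error there is roughly twice
    the variance; the mean variance can then rank candidate models or ensembles
    without access to labels.
    """
    mean_pred = probs.mean(axis=0)
    per_sample_var = np.mean(np.sum((probs - mean_pred) ** 2, axis=-1), axis=0)
    return per_sample_var.mean()
```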
Furthermore, the paper suggests future explorations into how bias-variance alignment might inform dynamic model routing and ensemble selection strategies, providing fertile ground for developing more efficient and effective learning systems.
Conclusion
Overall, this paper extends the understanding of generalization in deep learning, presenting bias-variance alignment as a nuanced perspective that is specific to contemporary, large-scale neural networks. Such insights contribute to a deeper theoretical understanding of neural network behavior, with immediate implications for both the design and evaluation of machine learning systems. The paper encourages a rethinking of the classical bias-variance trade-off, inviting future research to further explore the boundaries and applications of this alignment.