Classical Statistical (In-Sample) Intuitions Don't Generalize Well: A Note on Bias-Variance Tradeoffs, Overfitting and Moving from Fixed to Random Designs (2409.18842v1)

Published 27 Sep 2024 in stat.ML and cs.LG

Abstract: The sudden appearance of modern ML phenomena like double descent and benign overfitting may leave many classically trained statisticians feeling uneasy -- these phenomena appear to go against the very core of statistical intuitions conveyed in any introductory class on learning from data. The historical lack of earlier observation of such phenomena is usually attributed to today's reliance on more complex ML methods, overparameterization, interpolation and/or higher data dimensionality. In this note, we show that there is another reason why we observe behaviors today that appear at odds with intuitions taught in classical statistics textbooks, which is much simpler to understand yet rarely discussed explicitly. In particular, many intuitions originate in fixed design settings, in which in-sample prediction error (under resampling of noisy outcomes) is of interest, while modern ML evaluates its predictions in terms of generalization error, i.e. out-of-sample prediction error in random designs. Here, we highlight that this simple move from fixed to random designs has (perhaps surprisingly) far-reaching consequences on textbook intuitions relating to the bias-variance tradeoff, and comment on the resulting (im)possibility of observing double descent and benign overfitting in fixed versus random designs.

Summary

  • The paper demonstrates that classical bias-variance intuitions, derived in fixed design settings, need not hold under random design evaluation, helping to explain modern ML phenomena such as double descent.
  • It uses a simple k-NN estimator to show that, in random designs, increasing model complexity can lower bias and variance simultaneously.
  • It argues that benign overfitting in overparameterized models can only be observed under out-of-sample (random design) evaluation, not under fixed design assumptions.

Classical Statistical (In-Sample) Intuitions Don't Generalize Well: Insights Into Bias-Variance Tradeoffs and Modern ML Phenomena

This paper, authored by Alicia Curth from the University of Cambridge, addresses a critical and often overlooked issue in the field of statistical and ML methodologies. It examines why certain modern ML phenomena, such as double descent and benign overfitting, appear to contradict classical statistical intuitions. The author attributes this contradiction to a fundamental shift from fixed design settings, typically addressed in classical statistics, to random design settings, which are more common in modern ML.

Introduction

The efficacy of overparameterized ML models trained to zero training loss, increasingly observed today, seemingly contradicts traditional statistical teachings on overfitting. Historically, the absence of such phenomena was often attributed to classical statistics' reliance on simpler methods and lower-dimensional data. This paper posits that a key reason lies instead in a shift from analyzing in-sample prediction error in fixed designs to evaluating generalization error in random designs. This subtle yet consequential shift has significant implications for our understanding of the bias-variance tradeoff and for the occurrence of phenomena such as double descent and benign overfitting.

Fixed vs. Random Designs

In classical statistics, the focus was on fixed design settings, where in-sample prediction error is of primary interest. In contrast, modern ML emphasizes generalization from training samples to new, unseen data, making out-of-sample prediction error the critical evaluation metric. The paper distinguishes the two settings (formalized below):

  • Fixed design settings: the classical setting, in which test inputs coincide with the training inputs but the noisy outcomes are resampled.
  • Random design settings: test inputs are drawn anew from the input distribution, and performance is evaluated as out-of-sample prediction error.
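
As a rough formalization (the notation here is mine, not necessarily the paper's): with training inputs x_1, …, x_n, noisy outcomes y_i = f(x_i) + ε_i, and a fitted predictor f̂, the two evaluation criteria can be written as follows.

```latex
% Fixed design / in-sample error: same inputs, freshly resampled noisy outcomes y_i'
\mathrm{Err}_{\mathrm{in}}
  = \frac{1}{n}\sum_{i=1}^{n}
    \mathbb{E}_{y_i'}\!\left[\bigl(y_i' - \hat f(x_i)\bigr)^2\right],
  \qquad y_i' = f(x_i) + \varepsilon_i' .

% Random design / out-of-sample (generalization) error: a fresh pair (x_0, y_0)
\mathrm{Err}_{\mathrm{out}}
  = \mathbb{E}_{(x_0,\, y_0)}\!\left[\bigl(y_0 - \hat f(x_0)\bigr)^2\right].
```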

Revisiting the Bias-Variance Tradeoff

Using a simple k-Nearest Neighbor (k-NN) estimator, Curth demonstrates that the classical bias-variance tradeoff intuition does not always carry over to random design settings. In the classical fixed design analysis, increasing model complexity (decreasing k) increases variance while decreasing bias. In random design settings, however, both bias and variance can decrease as complexity increases, so the traditional tradeoff need not appear. This insight is illustrated by contrasting in-sample and out-of-sample prediction errors.
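
A minimal simulation sketch of this contrast (an illustrative setup of my own, not the paper's exact experiment) is to fit k-NN regression on noisy data and compare the in-sample error, computed on the same inputs with resampled noise, against the out-of-sample error on fresh inputs, as k varies:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
n, sigma = 100, 0.5
f = lambda x: np.sin(4 * x)                          # true regression function

X_train = rng.uniform(0, 1, (n, 1))
y_train = f(X_train.ravel()) + sigma * rng.normal(size=n)

for k in [50, 20, 10, 5, 1]:                         # smaller k = higher complexity
    model = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)

    # Fixed design: same inputs, freshly resampled noisy outcomes.
    y_resampled = f(X_train.ravel()) + sigma * rng.normal(size=n)
    err_in = np.mean((y_resampled - model.predict(X_train)) ** 2)

    # Random design: fresh inputs (and outcomes) from the same distribution.
    X_test = rng.uniform(0, 1, (2000, 1))
    y_test = f(X_test.ravel()) + sigma * rng.normal(size=2000)
    err_out = np.mean((y_test - model.predict(X_test)) ** 2)

    print(f"k={k:2d}  in-sample MSE={err_in:.3f}  out-of-sample MSE={err_out:.3f}")
```

The snippet only contrasts the two evaluation criteria; decomposing each error into bias and variance (by averaging over many resampled training sets) would make the paper's point about the tradeoff itself explicit.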

Double Descent

Double descent, where out-of-sample error first follows the familiar U-shape as model complexity grows, peaks around the interpolation threshold, and then decreases again in the overparameterized regime, has garnered attention for contradicting classical views. The paper argues that the historical absence of double descent in the statistical literature is not just due to the lack of overparameterized models, but also because fixed design evaluation cannot exhibit it: in the evidence presented, double descent appears only in out-of-sample prediction error curves and not in their in-sample counterparts.
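
This can be reproduced qualitatively with a compact simulation (again a sketch under my own assumptions, using minimum-norm least squares on random Fourier features rather than any specific model from the paper): sweep the number of features p past the interpolation threshold p = n and record both error types.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 40, 0.5
f = lambda x: np.sin(4 * x)

x_train = rng.uniform(0, 1, n)
y_train = f(x_train) + sigma * rng.normal(size=n)
x_test = rng.uniform(0, 1, 2000)
y_test = f(x_test) + sigma * rng.normal(size=2000)

def features(x, p, freqs, phases):
    # Random Fourier features: one column per (frequency, phase) pair.
    return np.cos(np.outer(x, freqs[:p]) + phases[:p])

max_p = 200
freqs = rng.uniform(0, 20, max_p)
phases = rng.uniform(0, 2 * np.pi, max_p)

for p in [5, 10, 20, 39, 40, 41, 60, 100, 200]:      # interpolation threshold near p = n
    Phi_train = features(x_train, p, freqs, phases)
    beta = np.linalg.pinv(Phi_train) @ y_train        # minimum-norm least-squares fit

    # Fixed design: same inputs, freshly resampled noisy outcomes.
    y_resampled = f(x_train) + sigma * rng.normal(size=n)
    err_in = np.mean((y_resampled - Phi_train @ beta) ** 2)

    # Random design: fresh test inputs.
    err_out = np.mean((y_test - features(x_test, p, freqs, phases) @ beta) ** 2)
    print(f"p={p:3d}  in-sample={err_in:.2f}  out-of-sample={err_out:.2f}")
```

In typical runs the out-of-sample error peaks near p = n and improves again beyond it, while the in-sample error of an interpolating fit stays pinned near twice the noise variance, so no second descent can appear there.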

Benign Overfitting

Benign overfitting, where models generalize well despite fitting the training data perfectly, also appears at odds with classical wisdom. The paper clarifies the term "overfitting," emphasizing the distinction between fitting the training data perfectly (interpolation) and genuinely suffering from overfitting (poor generalization). In fixed design settings, interpolation cannot be benign: an interpolating fit reproduces the training noise exactly, so the bias-variance decomposition of its in-sample error carries the full noise variance on top of the irreducible error. In random design settings, however, interpolation can be benign if the model behaves differently at training inputs than at new inputs. This is demonstrated using examples where interpolating models generalize surprisingly well because they are spiked around the training data but smooth elsewhere.
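
The "spiked-smooth" idea can be sketched with a toy construction (my own illustration, in the spirit of the paper's discussion rather than its exact example): take any smooth fit, then add very narrow spikes that absorb the residuals, so the model interpolates the training data exactly while remaining essentially smooth away from the training inputs.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 50, 0.5
f = lambda x: np.sin(4 * x)

x_train = rng.uniform(0, 1, n)
y_train = f(x_train) + sigma * rng.normal(size=n)

# Smooth base fit: ordinary polynomial least squares.
coef = np.polyfit(x_train, y_train, deg=5)
smooth = lambda x: np.polyval(coef, x)

def spiked_interpolator(x, width=1e-6):
    """Smooth fit plus narrow Gaussian spikes carrying the training residuals."""
    resid = y_train - smooth(x_train)
    spikes = np.exp(-((x[:, None] - x_train[None, :]) ** 2) / (2 * width ** 2))
    # Assumes training inputs are well separated relative to `width`,
    # so each spike only affects its own training point.
    return smooth(x) + spikes @ resid

# Interpolates the noisy training data (up to numerical precision) ...
print("max train error:  ", np.max(np.abs(spiked_interpolator(x_train) - y_train)))

# ... yet generalizes essentially like the smooth fit on fresh inputs.
x_test = rng.uniform(0, 1, 2000)
y_test = f(x_test) + sigma * rng.normal(size=2000)
print("smooth fit MSE:   ", np.mean((y_test - smooth(x_test)) ** 2))
print("interpolator MSE: ", np.mean((y_test - spiked_interpolator(x_test)) ** 2))
```

In a typical run the interpolator's training error is numerically zero while its out-of-sample MSE is close to the smooth fit's, illustrating how interpolation can be (near-)benign under random design evaluation even though it cannot be in a fixed design.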

Conclusion

The paper concludes that classical statistical intuitions about the bias-variance tradeoff need to be revisited in light of the shift towards generalization-focused evaluations in modern ML. It underscores that the move from fixed to random design considerations is crucial for understanding phenomena like double descent and benign overfitting. The findings suggest that classical overfitting concerns are less critical in contexts where ML models interpolate training data benignly, although specific applications still require caution.

Implications and Future Directions

The implications of this research are significant for both theoretical and practical advancements in ML:

  • Theoretical: The paper prompts a re-evaluation of long-held statistical principles, suggesting a need for updated pedagogical approaches in statistical learning.
  • Practical: For practitioners, the insights aid in understanding when to be concerned about overfitting and when it might be benign, impacting model selection and evaluation strategies.

The findings provoke further research into other classical intuitions that may be affected by the move to random design settings and advance the dialogue on the interplay between classical statistics and modern ML methodologies.

Acknowledgements

The paper acknowledges contributions and discussions that enriched the research, highlighting collaborative efforts that underpin academic progress.

References

The paper cites key references that contextualize its contribution within the broader landscape of statistical and ML research, including foundational texts and recent advancements that explore bias-variance tradeoffs, overfitting, and generalization phenomena.

In summary, this paper offers a nuanced exploration of how statistical intuitions intersect with modern ML practices, providing a clearer lens through which to examine the surprising behaviors increasingly observed in complex, overparameterized models.