Precise Asymptotics of Bagging Regularized M-estimators (2409.15252v2)

Published 23 Sep 2024 in math.ST, stat.ML, and stat.TH

Abstract: We characterize the squared prediction risk of ensemble estimators obtained through subagging (subsample bootstrap aggregating) regularized M-estimators and construct a consistent estimator for the risk. Specifically, we consider a heterogeneous collection of $M \ge 1$ regularized M-estimators, each trained with (possibly different) subsample sizes, convex differentiable losses, and convex regularizers. We operate under the proportional asymptotics regime, where the sample size $n$, feature size $p$, and subsample sizes $k_m$ for $m \in [M]$ all diverge with fixed limiting ratios $n/p$ and $k_m/n$. Key to our analysis is a new result on the joint asymptotic behavior of correlations between the estimator and residual errors on overlapping subsamples, governed through a (provably) contractible nonlinear system of equations. Of independent interest, we also establish convergence of trace functionals related to degrees of freedom in the non-ensemble setting (with $M = 1$) along the way, extending previously known cases for square loss and ridge, lasso regularizers. When specialized to homogeneous ensembles trained with a common loss, regularizer, and subsample size, the risk characterization sheds some light on the implicit regularization effect due to the ensemble and subsample sizes $(M,k)$. For any ensemble size $M$, optimally tuning subsample size yields sample-wise monotonic risk. For the full-ensemble estimator (when $M \to \infty$), the optimal subsample size $k^\star$ tends to be in the overparameterized regime $(k^\star \le \min\{n,p\})$, when explicit regularization is vanishing. Finally, joint optimization of subsample size, ensemble size, and regularization can significantly outperform regularizer optimization alone on the full data (without any subagging).

Summary

  • The paper provides a precise asymptotic risk analysis of ensemble regularized M-estimators trained via subagging on overlapping subsamples.
  • It introduces a consistent observable risk estimator for tuning ensemble hyperparameters and uncovers implicit regularization effects in overparameterized settings.
  • Findings demonstrate that larger homogeneous ensembles and optimal subsample sizes significantly improve predictive performance and risk minimization.

Precise Asymptotics of Bagging Regularized M-estimators

In the paper "Precise Asymptotics of Bagging Regularized M-estimators," Koriyama, Patil, Du, Tan, and Bellec present an in-depth analysis of the squared prediction risk of ensemble estimators obtained through subsample bootstrap aggregating (subagging) regularized M-estimators. These estimators are trained with convex differentiable losses and convex regularizers. The paper is grounded in the proportional asymptotics regime, where the sample size $n$, feature size $p$, and subsample sizes $k_m$ for $m \in [M]$ all diverge with fixed limiting ratios $n/p$ and $k_m/n$.

Motivation and Main Contributions

The analysis of ensemble methods, particularly those involving subagging, is fueled by their ability to enhance predictive performance, especially in overparameterized regimes. While significant theoretical work exists for ensembles of specific estimators such as ridge and lasso, this paper generalizes those findings to a broader class of regularized M-estimators. The main contributions are:

  1. Asymptotic Risk Analysis: The authors characterize the squared prediction risk of ensembles of regularized M-estimators, providing consistent estimators for this risk. This involves deriving new results on the joint asymptotic behavior of correlations between the estimator and residual errors on overlapping subsamples.
  2. Homogeneous Ensembles: The paper explores the special case of homogeneous ensembles, where component models share the same loss function, regularizer, and subsample size. This examination reveals insights about the implicit regularization effects due to the ensemble and subsample sizes.
  3. Implications for Overparameterization: The paper explores subagging in the context of vanishing regularization and contrasts it with explicitly regularized models, showing the advantages of joint optimization of subsample size, ensemble size, and regularization parameter.

Key Theoretical Insights

Non-Homogeneous Case

The authors consider a collection of $M \ge 1$ regularized M-estimators, each trained with potentially different subsample sizes. The risk analysis hinges on the joint asymptotic behavior of the estimator errors and residuals. Two crucial systems of nonlinear equations are introduced to characterize these behaviors:

  1. Non-Ensemble Setting ($M = 1$): The parameters $(\alpha, \beta, \kappa, \nu)$ in this setting are defined based on the asymptotic behavior of the individual regularized M-estimators. This system extends the known results from prior literature to general losses and regularizers.
  2. Full-Ensemble Setting ($M \rightarrow \infty$): The correlation parameters $(\eta_G, \eta_H)$ govern the asymptotic behavior of overlaps between ensemble components. These parameters arise as the fixed point of a contraction map, ensuring existence and uniqueness under mild conditions; a generic fixed-point iteration is sketched below.
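Such fixed points are typically not available in closed form and are computed by iterating the map until successive iterates agree. The snippet below is a minimal, generic sketch of fixed-point iteration on an assumed two-dimensional contraction; the toy map `T` is purely illustrative and is not the paper's actual system, which is defined through expectations involving the loss, regularizer, and subsample overlaps.

```python
import numpy as np

def fixed_point(T, x0, tol=1e-10, max_iter=10_000):
    """Iterate a (presumed) contraction map T until successive iterates agree."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = np.asarray(T(x), dtype=float)
        if np.max(np.abs(x_new - x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")

# Toy 2-d contraction standing in for the correlation-parameter system; the
# paper's map involves expectations over the loss, regularizer, and subsample
# overlaps, not this illustrative choice.
T = lambda v: np.array([0.5 * np.tanh(v[1]) + 0.1,
                        0.3 * np.cos(v[0]) + 0.2])

eta = fixed_point(T, x0=[0.0, 0.0])
print("fixed point:", eta)
```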

Homogeneous Ensembles

For homogeneous ensembles, where all components share the same loss and regularization functions:

  1. Monotonicity in Ensemble Size: The paper proves that increasing the ensemble size $M$ reduces the risk, i.e., $\mathcal{R}_{M+1} < \mathcal{R}_M$. This supports using larger ensembles when computational resources allow.
  2. Optimal Subsample Size: Interestingly, when explicit regularization vanishes, the optimal subsample size often lies in the overparameterized regime. Hence, even in originally underparameterized settings (where $n > p$), the optimal subsample size satisfies $k < p$. A minimal simulation sketch of a homogeneous subagging ensemble follows this list.
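To make the homogeneous setting concrete, here is a minimal simulation sketch of a subagged estimator that averages $M$ ridge-regularized least-squares fits on size-$k$ subsamples drawn without replacement. The Gaussian design, signal, noise level, and the choice of ridge as the component M-estimator are illustrative assumptions, not the paper's exact experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_fit(X, y, lam):
    """Ridge-regularized least squares: argmin_b ||y - X b||^2 / (2n) + lam ||b||^2 / 2."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

def subagging_estimator(X, y, k, M, lam):
    """Average M ridge fits, each trained on a size-k subsample drawn without replacement."""
    n = X.shape[0]
    fits = []
    for _ in range(M):
        idx = rng.choice(n, size=k, replace=False)
        fits.append(ridge_fit(X[idx], y[idx], lam))
    return np.mean(fits, axis=0)

# Illustrative data: isotropic Gaussian design, linear signal, Gaussian noise.
n, p, sigma = 600, 300, 1.0
beta = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = X @ beta + sigma * rng.normal(size=n)

# Excess prediction risk ||b_hat - beta||^2 (valid for isotropic features).
for M in (1, 5, 50):
    b_hat = subagging_estimator(X, y, k=200, M=M, lam=1e-3)
    print(f"M={M:3d}  risk={np.sum((b_hat - beta) ** 2):.3f}")
```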

Practical Risk Estimation

A significant contribution is the development of an observable, data-driven risk estimator that approximates the prediction risk from the training data alone, enabling practical tuning of ensemble hyperparameters. This estimator is consistent for the prediction risk, allowing effective model selection even when the noise distribution has heavy tails.
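The estimator itself is built from trace functionals tied to degrees of freedom and is not reproduced here. As a rough classical analogue only, the sketch below shows a generalized cross-validation (GCV) style risk estimate for a single ridge fit, where the degrees of freedom is the trace of the smoother matrix; the paper's construction extends this kind of correction to general losses, regularizers, and overlapping subsamples.

```python
import numpy as np

def ridge_gcv_risk(X, y, lam):
    """GCV-style risk estimate for a single ridge fit.

    df is the trace of the smoother matrix X (X'X/n + lam I)^{-1} X'/n; this is
    the classical single-estimator correction, shown only as an analogue of the
    degrees-of-freedom-based quantities discussed in the paper.
    """
    n, p = X.shape
    G = np.linalg.inv(X.T @ X / n + lam * np.eye(p))
    beta_hat = G @ (X.T @ y / n)
    residuals = y - X @ beta_hat
    df = np.trace(X @ G @ X.T) / n
    return np.mean(residuals ** 2) / (1.0 - df / n) ** 2

# Illustrative usage on synthetic data (assumed setup, not the paper's).
rng = np.random.default_rng(1)
n, p = 500, 100
beta = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)
for lam in (1e-3, 1e-1, 1.0):
    print(f"lam={lam:6.3f}  estimated risk={ridge_gcv_risk(X, y, lam):.3f}")
```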

Numerical Results

The authors provide extensive numerical experiments to validate their theoretical findings. For instance, they demonstrate that:

  • The risk of the ensemble estimator decreases monotonically with the ensemble size $M$, illustrating the practical benefits of ensembling.
  • Optimal ensembles often benefit from overparameterization, even in settings where the full dataset is underparameterized.
  • Joint optimization of subsample size and regularization outperforms optimizing regularization alone, highlighting the additional regularization effect induced by subsampling and ensembling (a rough tuning sketch follows this list).
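As a rough illustration of the last point, the sketch below grid-searches the regularization level alone on the full data versus jointly over subsample size and regularization for a subagged ridge ensemble. The grids, data-generating process, and the use of the true coefficients to score risk are simulation conveniences (in practice one would use the paper's data-driven risk estimator), and ridge is just one convenient instance of a regularized M-estimator.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)

def ridge_fit(X, y, lam):
    """Ridge-regularized least squares on the given (sub)sample."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

def subagged_ridge(X, y, k, M, lam):
    """Average M ridge fits on size-k subsamples drawn without replacement."""
    n = X.shape[0]
    fits = []
    for _ in range(M):
        idx = rng.choice(n, size=k, replace=False)
        fits.append(ridge_fit(X[idx], y[idx], lam))
    return np.mean(fits, axis=0)

# Illustrative data: underparameterized full-data problem (n > p).
n, p = 400, 300
beta = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)
risk = lambda b: np.sum((b - beta) ** 2)   # oracle risk, usable only in simulation

# Tune the regularization level alone on the full data (no subagging).
lams = (1e-4, 1e-2, 1e-1, 1.0)
best_full = min(risk(ridge_fit(X, y, lam)) for lam in lams)

# Jointly tune subsample size and regularization for a modest ensemble.
best_joint = min(risk(subagged_ridge(X, y, k, M=20, lam=lam))
                 for k, lam in product((100, 200, 300, 400), lams))

print(f"best full-data ridge risk:        {best_full:.3f}")
print(f"best jointly tuned subagged risk: {best_joint:.3f}")
```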

Future Directions

This work opens several avenues for future research:

  1. Extending analysis to non-differentiable losses and relaxing assumptions on regularizers.
  2. Generalizing the scope of features to non-Gaussian and anisotropic designs.
  3. Exploring alternative resampling strategies, including sampling with replacement and other ensemble methods beyond subagging.

Conclusion

The paper "Precise Asymptotics of Bagging Regularized M-estimators" contributes significantly to the theoretical understanding of ensemble methods in high-dimensional settings. It generalizes existing results to a wider class of regularized M-estimators, provides practical risk estimators for ensemble tuning, and demonstrates the utility of subagging in achieving implicit regularization, especially in overparameterized regimes. This work is valuable for researchers aiming to optimize ensemble methods in machine learning and statistics.
