- The paper provides a precise asymptotic risk analysis of ensemble regularized M-estimators trained via subagging on overlapping subsamples.
- It introduces a consistent observable risk estimator for tuning ensemble hyperparameters and uncovers implicit regularization effects in overparameterized settings.
- Findings show that larger homogeneous ensembles and well-chosen subsample sizes substantially reduce prediction risk, with the optimal subsample size often lying in the overparameterized regime.
Precise Asymptotics of Bagging Regularized M-estimators
In the paper "Precise Asymptotics of Bagging Regularized M-estimators," Koriyama, Patil, Du, Tan, and Bellec present an in-depth analysis of the squared prediction risk of ensemble estimators obtained through subsample bootstrap aggregating (subagging) of regularized M-estimators. These estimators are trained with convex differentiable losses and convex regularizers. The analysis is carried out in the proportional asymptotics regime, where the sample size n, feature dimension p, and subsample sizes $k_m$ for $m \in [M]$ all diverge proportionally, with fixed limiting ratios n/p and $k_m/n$.
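As a concrete illustration of this setup, the following minimal sketch averages regularized M-estimators fit on random subsamples drawn without replacement. Ridge regression (squared loss with a ridge penalty) stands in for a generic convex differentiable loss with a convex regularizer, and the sizes n, p, k, the ensemble size M, and the penalty level are illustrative assumptions rather than values from the paper.

```python
# Hedged sketch of subagging: average M regularized M-estimators, each fit on a
# random subsample of size k drawn without replacement. Ridge is used here as
# one concrete instance of a convex differentiable loss plus convex regularizer.
import numpy as np
from sklearn.linear_model import Ridge

def subagged_coefficients(X, y, k, M, lam, rng):
    """Return the averaged coefficient vector over M subsample fits."""
    n, p = X.shape
    coef_sum = np.zeros(p)
    for _ in range(M):
        idx = rng.choice(n, size=k, replace=False)           # subsample of size k
        fit = Ridge(alpha=lam, fit_intercept=False).fit(X[idx], y[idx])
        coef_sum += fit.coef_
    return coef_sum / M                                       # ensemble average

# Illustrative data in the proportional regime (n and p of the same order).
rng = np.random.default_rng(0)
n, p = 600, 300
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta + rng.standard_normal(n)
beta_hat = subagged_coefficients(X, y, k=200, M=20, lam=1.0, rng=rng)
```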
Motivation and Main Contributions
The analysis of ensemble methods, particularly those based on subagging, is motivated by their ability to enhance predictive performance, especially in overparameterized regimes. While substantial theoretical work exists for ensembles of specific estimators such as ridge and lasso, this paper generalizes those results to a broad class of regularized M-estimators. The main contributions are:
- Asymptotic Risk Analysis: The authors characterize the squared prediction risk of ensembles of regularized M-estimators, providing consistent estimators for this risk. This involves deriving new results on the joint asymptotic behavior of correlations between the estimator and residual errors on overlapping subsamples.
- Homogeneous Ensembles: The paper explores the special case of homogeneous ensembles, where component models share the same loss function, regularizer, and subsample size. This examination reveals insights about the implicit regularization effects due to the ensemble and subsample sizes.
- Implications for Overparameterization: The paper explores subagging in the context of vanishing regularization and contrasts it with explicitly regularized models, showing the advantages of joint optimization of subsample size, ensemble size, and regularization parameter.
Key Theoretical Insights
Non-Homogeneous Case
The authors consider a collection of M≥1 regularized M-estimators, each trained with potentially different subsample sizes. The risk analysis hinges on the joint asymptotic behavior of the estimator errors and residuals. Two crucial systems of nonlinear equations are introduced to characterize these behaviors:
- Non-Ensemble Setting (M=1): The parameters (α,β,κ,ν) in this setting are defined based on the asymptotic behavior of the individual regularized M-estimators. This system extends the known results from prior literature to general losses and regularizers.
- Full-Ensemble Setting (M→∞): The correlation parameters $(\eta_G, \eta_H)$ govern the asymptotic behavior of overlaps between ensemble components. These parameters solve a contraction-map system, which guarantees existence and uniqueness under mild conditions (a numerical fixed-point sketch follows this list).
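In practice, limiting parameters of this kind are obtained by solving the fixed-point system numerically. The sketch below shows a generic damped fixed-point iteration of the sort one might use; the map `toy_map` is a hypothetical two-parameter contraction standing in for the paper's actual system, whose equations involve proximal operators and expectations over the limiting Gaussian laws.

```python
# Generic damped fixed-point solver of the kind used to compute limiting
# parameters such as (alpha, beta, kappa, nu) or the overlap parameters.
import numpy as np

def solve_fixed_point(F, x0, damping=0.5, tol=1e-10, max_iter=10_000):
    """Iterate x <- (1 - damping) * x + damping * F(x) until convergence."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = (1 - damping) * x + damping * np.asarray(F(x), dtype=float)
        if np.max(np.abs(x_new - x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical two-parameter contraction standing in for the overlap system.
def toy_map(x):
    eta_g, eta_h = x
    return np.array([0.5 / (1.0 + eta_h), 0.3 / (1.0 + eta_g)])

eta_g, eta_h = solve_fixed_point(toy_map, x0=[0.1, 0.1])
```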
Homogeneous Ensembles
For homogeneous ensembles, where all components share the same loss and regularization functions:
- Monotonicity in Ensemble Size: The paper proves that increasing the ensemble size M reduces the risk, i.e., $\mathcal{R}_{M+1} < \mathcal{R}_M$, so larger ensembles are beneficial whenever computational resources allow (see the simulation sketch after this list).
- Optimal Subsample Size: Interestingly, for ensembles with vanishing explicit regularization (regularization parameter λ→0), the optimal subsample size often lies in the overparameterized regime. Hence, even when the full dataset is underparameterized (n>p), the optimal subsample size satisfies k<p, so each component is fit on an overparameterized subproblem.
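A small simulation of the following form illustrates both points: the running test risk of a near-ridgeless subagged estimator tends to decrease as components are added, with each component fit on a subsample of size k < p. The data model, the problem sizes, and the tiny ridge penalty used to approximate the ridgeless limit are assumptions for illustration only, and single-replication estimates are noisy at small M.

```python
# Simulation sketch: running test risk of a near-ridgeless subagged estimator
# as the ensemble size M grows; each component uses a subsample of size k < p.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n, p, k = 600, 300, 200                       # full data underparameterized (n > p)
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta + rng.standard_normal(n)
X_test = rng.standard_normal((2000, p))
y_test = X_test @ beta + rng.standard_normal(2000)

coef_sum = np.zeros(p)
for M in range(1, 21):
    idx = rng.choice(n, size=k, replace=False)
    fit = Ridge(alpha=1e-6, fit_intercept=False, solver="svd").fit(X[idx], y[idx])
    coef_sum += fit.coef_
    risk = np.mean((y_test - X_test @ (coef_sum / M)) ** 2)  # running ensemble risk
    print(f"M={M:2d}  test risk {risk:.3f}")
```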
Practical Risk Estimation
A significant contribution is the development of an observable risk estimator $\widehat{\mathcal{R}}$ that approximates the prediction risk from the data alone, enabling practical tuning of ensemble hyperparameters. The estimator is consistent for the prediction risk, supporting effective model selection even when the noise distribution has heavy tails.
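To make the tuning workflow concrete, the sketch below uses a plain out-of-subsample squared-error proxy; this is a stand-in for illustration, not the paper's consistent estimator (which is built from the asymptotic characterization of subsample overlaps), and it proxies the risk of a single component rather than the full ensemble. All names, grids, and sizes are hypothetical.

```python
# Illustrative out-of-subsample risk proxy for hyperparameter tuning.
# NOTE: a simple stand-in, not the paper's consistent risk estimator.
import numpy as np
from sklearn.linear_model import Ridge

def oos_risk_proxy(X, y, k, M, lam, rng):
    """Average squared error on points left out of each subsample."""
    n, _ = X.shape
    total_err, total_cnt = 0.0, 0
    for _ in range(M):
        idx = rng.choice(n, size=k, replace=False)
        mask = np.ones(n, dtype=bool)
        mask[idx] = False                                     # left-out points
        fit = Ridge(alpha=lam, fit_intercept=False).fit(X[idx], y[idx])
        resid = y[mask] - X[mask] @ fit.coef_
        total_err += np.sum(resid ** 2)
        total_cnt += resid.size
    return total_err / total_cnt

# Usage: scan a grid of (k, lam) and keep the pair with the smallest proxy value.
rng = np.random.default_rng(1)
n, p = 600, 300
X = rng.standard_normal((n, p))
y = X @ (rng.standard_normal(p) / np.sqrt(p)) + rng.standard_normal(n)
best = min(((k, lam, oos_risk_proxy(X, y, k, 10, lam, rng))
            for k in (150, 300, 450) for lam in (0.01, 0.1, 1.0)),
           key=lambda t: t[2])
```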
Numerical Results
The authors provide extensive numerical experiments to validate their theoretical findings. For instance, they demonstrate that:
- The risk of the ensemble estimator decreases monotonically with the ensemble size M, illustrating the practical benefits of ensembling.
- Optimal ensembles often benefit from overparameterization, even in settings where the full dataset is underparameterized.
- Joint optimization of subsample size and regularization outperforms optimizing regularization alone, highlighting the additional regularization effect induced by subsampling and ensembling (a grid-search sketch follows this list).
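A sketch of that comparison on simulated data: tune the regularization level alone with each component fit on the full sample (k = n), versus tune the subsample size and regularization jointly, evaluating both on a common held-out test set. The grids, sizes, and data model are illustrative assumptions.

```python
# Hedged comparison sketch: tune lambda only (k = n) versus tune (k, lambda)
# jointly, both evaluated on the same held-out test set.
import numpy as np
from sklearn.linear_model import Ridge

def ensemble_test_risk(X, y, X_te, y_te, k, M, lam, rng):
    """Test risk of the subagged ridge ensemble with M components of size k."""
    n, p = X.shape
    coef = np.zeros(p)
    for _ in range(M):
        idx = rng.choice(n, size=k, replace=False)
        coef += Ridge(alpha=lam, fit_intercept=False).fit(X[idx], y[idx]).coef_
    return np.mean((y_te - X_te @ (coef / M)) ** 2)

rng = np.random.default_rng(2)
n, p = 400, 300
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta + rng.standard_normal(n)
X_te = rng.standard_normal((2000, p))
y_te = X_te @ beta + rng.standard_normal(2000)

lams = (1e-3, 1e-2, 1e-1, 1.0)
# lambda-only tuning: k = n, so every component sees the full sample (M = 1 suffices).
risk_lam_only = min(ensemble_test_risk(X, y, X_te, y_te, n, 1, lam, rng) for lam in lams)
# joint tuning over subsample size and regularization level.
risk_joint = min(ensemble_test_risk(X, y, X_te, y_te, k, 20, lam, rng)
                 for k in (100, 200, 300, 400) for lam in lams)
print(risk_lam_only, risk_joint)   # jointly tuned (k, lambda) is typically no worse
```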
Future Directions
This work opens several avenues for future research:
- Extending the analysis to non-differentiable losses and relaxing assumptions on the regularizers.
- Generalizing the feature model to non-Gaussian and anisotropic designs.
- Exploring alternative resampling strategies, including sampling with replacement and other ensemble methods beyond subagging.
Conclusion
The paper "Precise Asymptotics of Bagging Regularized M-estimators" contributes significantly to the theoretical understanding of ensemble methods in high-dimensional settings. It generalizes existing results to a wider class of regularized M-estimators, provides practical risk estimators for ensemble tuning, and demonstrates the utility of subagging in achieving implicit regularization, especially in overparameterized regimes. This work is valuable for researchers aiming to optimize ensemble methods in machine learning and statistics.