Robust variance-regularized risk minimization with concomitant scaling (2301.11584v2)
Published 27 Jan 2023 in stat.ML and cs.LG
Abstract: Under losses which are potentially heavy-tailed, we consider the task of minimizing sums of the loss mean and standard deviation, without trying to accurately estimate the variance. By modifying a technique for variance-free robust mean estimation to fit our problem setting, we derive a simple learning procedure which can be easily combined with standard gradient-based solvers for use in traditional machine learning workflows. Empirically, we verify that our proposed approach, despite its simplicity, performs as well as or better than even the best-performing candidates derived from alternative criteria such as CVaR (conditional value-at-risk) or DRO (distributionally robust optimization) risks on a variety of datasets.
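The abstract does not spell out the procedure itself. As a hedged illustration of what "concomitant scaling" can look like, the Python sketch below uses two elementary identities to rewrite mean plus standard deviation as a joint minimization. The first is Var(l) = min over θ of E[(l - θ)^2]. The second is sqrt(v) = min over σ > 0 of v/(2σ) + σ/2. Together they give E[l] + std(l) = min over (θ, σ) of E[l] + E[(l - θ)^2]/(2σ) + σ/2, so the model weights, an auxiliary location θ, and an auxiliary scale σ can all be updated by plain gradient descent. The sketch then swaps the squared term for Huber's ρ and takes the scale constant from Huber's "proposal 2". This is an assumption-laden sketch, not the paper's algorithm. The absolute-error loss, the linear model, the step size, and every function name (huber_rho, huber_psi, proposal2_beta, concomitant_objective, fit_linear) are hypothetical choices made here.

```python
import math
import numpy as np


def huber_rho(u, c=None):
    """Huber's rho; c=None means the plain quadratic u**2 / 2."""
    if c is None:
        return 0.5 * u**2
    a = np.abs(u)
    return np.where(a <= c, 0.5 * u**2, c * a - 0.5 * c**2)


def huber_psi(u, c=None):
    """Derivative of huber_rho: u itself, or u clipped to [-c, c]."""
    return u if c is None else np.clip(u, -c, c)


def proposal2_beta(c=None):
    """beta = E[psi(Z)**2] / 2 for Z ~ N(0, 1) (Huber's 'proposal 2' constant).

    With c=None this is 1/2, which makes the quadratic case exactly mean + std.
    """
    if c is None:
        return 0.5
    Phi = 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))
    phi = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)
    # E[Z^2; |Z| <= c] + c^2 * P(|Z| > c)
    second_moment = (2.0 * Phi - 1.0) - 2.0 * c * phi + 2.0 * c * c * (1.0 - Phi)
    return 0.5 * second_moment


def concomitant_objective(losses, theta, sigma, c=None, beta=None):
    """Hypothetical surrogate: mean(l) + mean(sigma * rho((l - theta)/sigma)) + beta * sigma."""
    if beta is None:
        beta = proposal2_beta(c)
    u = (losses - theta) / sigma
    return losses.mean() + np.mean(sigma * huber_rho(u, c)) + beta * sigma


def fit_linear(X, y, c=1.0, lr=0.05, steps=500, sigma_min=1e-3):
    """Joint gradient descent on weights w, location theta and scale sigma.

    Uses the absolute-error loss l_i = |x_i . w - y_i| (an illustrative choice).
    """
    n, d = X.shape
    beta = proposal2_beta(c)
    w, theta, sigma = np.zeros(d), 0.0, 1.0
    history = []
    for _ in range(steps):
        r = X @ w - y
        losses = np.abs(r)
        history.append(concomitant_objective(losses, theta, sigma, c, beta))
        u = (losses - theta) / sigma
        psi = huber_psi(u, c)
        # dJ/dl_i = (1 + psi_i) / n and dl_i/dw = sign(r_i) * x_i
        grad_w = X.T @ ((1.0 + psi) * np.sign(r)) / n
        grad_theta = -psi.mean()
        grad_sigma = np.mean(huber_rho(u, c) - u * psi) + beta
        w -= lr * grad_w
        theta -= lr * grad_theta
        # Project the scale back onto sigma >= sigma_min to keep it positive.
        sigma = max(sigma - lr * grad_sigma, sigma_min)
    return w, theta, sigma, history
```

With c = 1, each example's gradient weight 1 + ψ(u) lies in [0, 2]. Examples whose loss sits well above θ count double, and the lowest-loss examples drop out. Passing c=None to concomitant_objective restores the quadratic case, whose minimum over (θ, σ) is exactly the plain (non-robust) mean plus standard deviation.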