Robust variance-regularized risk minimization with concomitant scaling (2301.11584v2)

Published 27 Jan 2023 in stat.ML and cs.LG

Abstract: Under potentially heavy-tailed losses, we consider the task of minimizing the sum of the loss mean and standard deviation, without trying to accurately estimate the variance. By modifying a technique for variance-free robust mean estimation to fit our problem setting, we derive a simple learning procedure that can easily be combined with standard gradient-based solvers and used in traditional machine learning workflows. Empirically, we verify that our proposed approach, despite its simplicity, performs as well as or better than even the best-performing candidates derived from alternative criteria such as CVaR or DRO risks on a variety of datasets.
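
The criterion described above is the sum of the mean and the standard deviation of the loss distribution. As a point of reference, here is a minimal sketch of the naive plug-in version of that objective, minimized with plain gradient descent on a linear regression problem. Everything concrete in it (the weight `lam`, the step size, the heavy-tailed synthetic data) is an illustrative assumption, and it implements the simple empirical baseline, not the paper's variance-free robust procedure.

```python
import numpy as np

def meanstd_risk_grad(w, X, y, lam):
    """Naive plug-in objective: mean(loss) + lam * std(loss) for squared loss.

    This is the straightforward empirical version of the mean-plus-standard-
    deviation criterion, NOT the paper's robust, variance-free procedure.
    """
    resid = X @ w - y                 # per-example residuals
    loss = resid ** 2                 # per-example squared losses
    grads = 2.0 * resid[:, None] * X  # per-example loss gradients, shape (n, d)

    mu = loss.mean()
    sd = loss.std() + 1e-12           # population std; epsilon avoids divide-by-zero

    risk = mu + lam * sd
    # d/dw std(loss) = mean((loss - mu) * grad_loss) / std(loss)
    grad = grads.mean(axis=0) + lam * ((loss - mu)[:, None] * grads).mean(axis=0) / sd
    return risk, grad

# Illustrative synthetic setup: linear signal plus heavy-tailed Student-t noise.
rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + rng.standard_t(df=2.5, size=n)

w = np.zeros(d)
lam, step = 1.0, 0.01
for _ in range(200):                  # plain gradient descent
    risk, grad = meanstd_risk_grad(w, X, y, lam)
    w -= step * grad
print(f"final mean+std risk: {risk:.4f}")
```

Under heavy-tailed losses, the empirical standard deviation plugged in here is itself a fragile estimate; avoiding that explicit variance estimate is precisely the gap the paper's variance-free construction is meant to address.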
