
Bayesian Nonparametrics Meets Data-Driven Distributionally Robust Optimization (2401.15771v5)

Published 28 Jan 2024 in stat.ML and cs.LG

Abstract: Training machine learning and statistical models often involves optimizing a data-driven risk criterion. The risk is usually computed with respect to the empirical data distribution, but this may result in poor and unstable out-of-sample performance due to distributional uncertainty. In the spirit of distributionally robust optimization, we propose a novel robust criterion by combining insights from Bayesian nonparametric (i.e., Dirichlet process) theory and a recent decision-theoretic model of smooth ambiguity-averse preferences. First, we highlight novel connections with standard regularized empirical risk minimization techniques, including Ridge and LASSO regression. Then, we theoretically establish favorable finite-sample and asymptotic statistical guarantees on the performance of the robust optimization procedure. For practical implementation, we propose and study tractable approximations of the criterion based on well-known Dirichlet process representations. We also show that the smoothness of the criterion naturally leads to standard gradient-based numerical optimization. Finally, we provide insights into the workings of our method by applying it to a variety of tasks based on simulated and real datasets.
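The abstract does not give formulas, so the following is only a rough sketch of the kind of approximation it describes, not the authors' implementation. It assumes the criterion has the form of an average, over random distributions drawn from a Dirichlet process posterior, of a convex increasing function applied to the expected loss, and it approximates each draw with a truncated stick-breaking representation. The names `stick_breaking_weights`, `sample_posterior_atoms`, `robust_criterion`, and `smooth_phi`, the exponential choice of the convex function, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def stick_breaking_weights(concentration, truncation, rng):
    """Truncated stick-breaking weights for a Dirichlet process."""
    betas = rng.beta(1.0, concentration, size=truncation)
    betas[-1] = 1.0  # close the stick so the weights sum to one
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

def sample_posterior_atoms(data, alpha, base_sampler, truncation, rng):
    """Draw atoms from the posterior base measure: a mixture of the
    prior base measure (weight alpha / (alpha + n)) and the data."""
    n = len(data)
    atoms = data[rng.integers(0, n, size=truncation)]
    from_base = rng.random(truncation) < alpha / (alpha + n)
    n_base = int(from_base.sum())
    if n_base:
        atoms[from_base] = base_sampler(n_base, rng)
    return atoms

def smooth_phi(t, beta=1.0):
    """Assumed convex, increasing transform (hypothetical choice)."""
    return beta * (np.exp(t / beta) - 1.0)

def robust_criterion(theta, data, loss, alpha, base_sampler, phi,
                     n_draws=200, truncation=200, seed=0):
    """Monte Carlo estimate of E_p[ phi( E_{xi ~ p}[ loss(theta, xi) ] ) ],
    with p drawn from a truncated Dirichlet process posterior."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = len(data)
    values = np.empty(n_draws)
    for i in range(n_draws):
        # Posterior concentration is alpha + n; the truncation level
        # should be large relative to it for a good approximation.
        w = stick_breaking_weights(alpha + n, truncation, rng)
        atoms = sample_posterior_atoms(data, alpha, base_sampler,
                                       truncation, rng)
        values[i] = phi(np.dot(w, loss(theta, atoms)))
    return values.mean()

if __name__ == "__main__":
    data = np.random.default_rng(0).normal(size=20)
    squared_loss = lambda theta, xi: (theta - xi) ** 2
    normal_base = lambda k, rng: rng.normal(size=k)
    print(robust_criterion(0.0, data, squared_loss, 1.0,
                           normal_base, smooth_phi))
```

Because the estimate is a smooth function of `theta` whenever the loss and the transform are smooth, it could in principle be passed to a gradient-based optimizer, for example after porting to an automatic differentiation library. Fixing the seed keeps the random draws identical across values of `theta`, which makes the objective deterministic during optimization.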

