
Hedging Complexity in Generalization via a Parametric Distributionally Robust Optimization Framework (2212.01518v2)

Published 3 Dec 2022 in math.OC and cs.LG

Abstract: Empirical risk minimization (ERM) and distributionally robust optimization (DRO) are popular approaches for solving stochastic optimization problems that appear in operations management and machine learning. Existing generalization error bounds for these methods depend on either the complexity of the cost function or the dimension of the random perturbations. Consequently, the performance of these methods can be poor for high-dimensional problems with complex objective functions. We propose a simple approach in which the distribution of random perturbations is approximated using a parametric family of distributions. This mitigates both sources of complexity, but it introduces a model misspecification error. We show that this new source of error can be controlled by suitable DRO formulations. Our proposed parametric DRO approach achieves significantly improved generalization bounds over existing ERM and DRO methods, as well as parametric ERM, in a wide variety of settings. Our method is particularly effective under distribution shifts and applies broadly in contextual optimization. We also illustrate the superior performance of our approach on both synthetic and real-data portfolio optimization and regression tasks.
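The two-step idea in the abstract (fit a parametric model, then robustify against its misspecification) can be illustrated with a minimal sketch. This is not the paper's formulation: it assumes a Gaussian fit and a simple mean-uncertainty ball of radius `rho`, for which the worst-case expected loss of a portfolio has a closed form; all data, names, and parameters here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic returns for 3 assets (illustrative data, not from the paper).
n, d = 200, 3
true_mu = np.array([0.05, 0.03, 0.01])
X = rng.multivariate_normal(true_mu, 0.02 * np.eye(d), size=n)

# Step 1 (parametric approximation): fit a Gaussian to the perturbations
# by maximum likelihood.
mu_hat = X.mean(axis=0)
Sigma_hat = np.cov(X, rowvar=False)

# Step 2 (robustification): hedge the misspecification/estimation error by
# taking the worst case over mean vectors in a ball of radius rho around
# mu_hat.  For the loss -w @ mu, this worst case equals
#     -w @ mu_hat + rho * ||w||_2.
rho, lam = 0.02, 1.0

def erm_obj(w):
    # Plug-in (parametric ERM) objective: mean-variance trade-off.
    return -w @ mu_hat + lam * w @ Sigma_hat @ w

def dro_obj(w):
    # Parametric DRO objective: ERM plus the robustness penalty rho*||w||.
    return erm_obj(w) + rho * np.linalg.norm(w)

# Crude minimization over the probability simplex via random search
# (a sketch only; a real solver would be used in practice).
candidates = rng.dirichlet(np.ones(d), size=5000)
w_erm = min(candidates, key=erm_obj)
w_dro = min(candidates, key=dro_obj)
```

The robust term `rho * ||w||` acts as a regularizer on the decision, which is the mechanism by which DRO formulations of this kind trade a small amount of nominal performance for protection against a misspecified fitted distribution.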

