Metric Entropy-Free Sample Complexity Bounds for Sample Average Approximation in Convex Stochastic Programming (2401.00664v6)

Published 1 Jan 2024 in math.OC, cs.LG, math.PR, math.ST, and stat.TH

Abstract: This paper studies sample average approximation (SAA) for solving convex or strongly convex stochastic programming (SP) problems. In estimating SAA's sample efficiency, the state-of-the-art sample complexity bounds entail metric entropy terms (such as the logarithm of the feasible region's covering number), which often grow polynomially with problem dimensionality. While metric entropy-free complexity rates are known to be attainable under a uniform Lipschitz condition, such an assumption can be overly restrictive for many important SP problem settings. In response, this paper presents perhaps the first set of metric entropy-free sample complexity bounds for SAA under standard SP assumptions, in the absence of the uniform Lipschitz condition. The new results often yield an $O(d)$ improvement in the complexity rate over the state-of-the-art. An important revelation from the newly established bounds is that SAA and the canonical stochastic mirror descent (SMD) method, the two mainstream solution approaches to SP, achieve almost identical rates of sample efficiency, narrowing a theoretical gap between SAA and SMD, again by an order of $O(d)$. Furthermore, the paper explores non-Lipschitzian scenarios where SAA maintains provable efficacy while the corresponding results for SMD remain largely unexplored, indicating SAA's potentially better applicability in some irregular settings. Numerical experiments on SAA for a simulated SP problem align with these theoretical findings.
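
To make the SAA-versus-SMD comparison concrete, below is a minimal sketch of the two approaches on a simple strongly convex problem. The objective $F(x,\xi) = \tfrac{1}{2}\lVert x - \xi\rVert^2$, the Euclidean-ball feasible region, the sample size, and the step-size schedule are all illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch: SAA vs. Euclidean stochastic mirror descent (projected SGD)
# on an illustrative strongly convex objective F(x, xi) = 0.5 * ||x - xi||^2
# with xi ~ N(mu, I). All problem data and parameters here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 500                              # dimension and sample size
mu = rng.normal(size=d)                     # unknown mean of xi
radius = 1.0                                # feasible region {x : ||x||_2 <= radius}

def project_ball(x, r):
    """Euclidean projection onto the ball of radius r."""
    nrm = np.linalg.norm(x)
    return x if nrm <= r else (r / nrm) * x

xi = mu + rng.normal(size=(n, d))           # i.i.d. sample of size n
x_star = project_ball(mu, radius)           # true minimizer of E[F(x, xi)] over the ball

# SAA: minimize the sample-average objective f_n(x) = (1/n) * sum_i F(x, xi_i)
# by projected gradient descent (f_n is 1-smooth and 1-strongly convex).
x_saa = np.zeros(d)
for _ in range(200):
    grad = x_saa - xi.mean(axis=0)          # exact gradient of f_n
    x_saa = project_ball(x_saa - grad, radius)

# SMD with the Euclidean mirror map (i.e., projected SGD), one pass over the
# same n samples, using the classical O(1/t) step size for strong convexity.
x_smd = np.zeros(d)
for t, xi_t in enumerate(xi, start=1):
    g = x_smd - xi_t                        # stochastic gradient from one sample
    x_smd = project_ball(x_smd - g / t, radius)

print("||x_SAA - x*|| =", np.linalg.norm(x_saa - x_star))
print("||x_SMD - x*|| =", np.linalg.norm(x_smd - x_star))
```

On this toy problem both estimates approach the projected mean at a comparable statistical accuracy as $n$ grows, which is the qualitative behavior the abstract describes; the paper's contribution is establishing this alignment in sample complexity without metric entropy terms or a uniform Lipschitz condition.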
