Constrained Online Two-stage Stochastic Optimization: Algorithm with (and without) Predictions (2401.01077v1)

Published 2 Jan 2024 in cs.LG

Abstract: We consider an online two-stage stochastic optimization problem with long-term constraints over a finite horizon of $T$ periods. At each period, we take a first-stage action, observe a model parameter realization, and then take a second-stage action from a feasible set that depends on both the first-stage decision and the model parameter. We aim to minimize the cumulative objective value while guaranteeing that the long-term average second-stage decision belongs to a given set. We develop online algorithms for the online two-stage problem from adversarial learning algorithms, and the regret bound of our algorithm reduces to the regret bound of the embedded adversarial learning algorithms. Based on this framework, we obtain new results under various settings. When the model parameters are drawn from unknown non-stationary distributions and we are given machine-learned predictions of the distributions, we develop a new algorithm from our framework with a regret $O(W_T+\sqrt{T})$, where $W_T$ measures the total inaccuracy of the machine-learned predictions. We then develop another algorithm that works when no machine-learned predictions are given and characterize its performance.
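The per-period protocol described in the abstract (first-stage action, parameter observation, second-stage action, long-term constraint) can be illustrated with a minimal sketch. This is not the paper's algorithm; it is a hedged toy instance in which the long-term constraint is $(1/T)\sum_t y_t \le b$ and a single dual variable, updated by online gradient ascent (a common device in this literature), penalizes constraint violation. The cost function, feasible set, and parameter distribution below are all invented for illustration.

```python
import numpy as np

def online_two_stage_sketch(T=1000, b=0.5, eta=None, seed=0):
    """Toy two-stage online loop: at each period take a first-stage
    action x, observe a parameter theta, take a second-stage action y
    from a set depending on (x, theta), and update a dual variable lam
    that penalizes violation of the long-term constraint avg(y) <= b."""
    rng = np.random.default_rng(seed)
    eta = eta if eta is not None else 1.0 / np.sqrt(T)  # standard O(1/sqrt(T)) step size
    lam = 0.0        # dual variable for the long-term constraint
    total_cost = 0.0
    y_sum = 0.0
    for _ in range(T):
        # First-stage action, chosen before theta is revealed;
        # a larger dual price lam makes the action more conservative.
        x = 1.0 / (1.0 + lam)
        theta = rng.uniform(0.0, 1.0)  # model parameter realization
        # Second-stage action: here the Lagrangian-minimizing choice
        # over the (invented) feasible set [x * theta, 1] is its lower end.
        y = x * theta
        total_cost += theta * (x + y)
        y_sum += y
        # Projected online gradient ascent on the dual variable.
        lam = max(0.0, lam + eta * (y - b))
    return total_cost / T, y_sum / T

avg_cost, avg_y = online_two_stage_sketch()
```

The dual update mirrors the role the embedded adversarial learning algorithm plays in the paper's framework: the constraint violation $y_t - b$ acts as the (adversarial) loss gradient fed to an online learner over the dual variable.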

Citations (1)
