
Semi-Bandit Learning for Monotone Stochastic Optimization (2312.15427v1)

Published 24 Dec 2023 in cs.LG and cs.DS

Abstract: Stochastic optimization is a widely used approach for optimization under uncertainty, where uncertain input parameters are modeled by random variables. Exact or approximation algorithms have been obtained for several fundamental problems in this area. However, a significant limitation of this approach is that it requires full knowledge of the underlying probability distributions. Can we still get good (approximation) algorithms if these distributions are unknown, and the algorithm needs to learn them through repeated interactions? In this paper, we resolve this question for a large class of "monotone" stochastic problems, by providing a generic online learning algorithm with $\sqrt{T \log T}$ regret relative to the best approximation algorithm (under known distributions). Importantly, our online algorithm works in a semi-bandit setting, where in each period, the algorithm only observes samples from the r.v.s that were actually probed. Our framework applies to several fundamental problems in stochastic optimization such as prophet inequality, Pandora's box, stochastic knapsack, stochastic matchings and stochastic submodular optimization.

Semi-Bandit Learning for Monotone Stochastic Optimization

The paper "Semi-Bandit Learning for Monotone Stochastic Optimization" addresses a fundamental question in stochastic optimization: how can effective algorithms be designed when underlying probability distributions of stochastic inputs are unknown? Unlike traditional methods that assume full distributional knowledge, this paper focuses on scenarios requiring algorithms to learn these distributions through repeated interactions. Specifically, the authors develop an online learning framework tailored for a class of problems termed "monotone" stochastic problems, offering a novel semi-bandit setting that allows for more practical learning when only partial feedback is available.

Key Contributions

The core contribution of this research is an online learning algorithm with a regret bound of $\sqrt{T \log T}$ relative to the best approximation algorithm under known distributions. This is significant: despite the absence of full distributional knowledge, the proposed approach asymptotically approaches the performance of that benchmark. The versatility of the framework is demonstrated on several canonical problems in stochastic optimization, including the prophet inequality, Pandora's box, stochastic knapsack, stochastic matchings, and stochastic submodular optimization.
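Since the offline benchmark is itself an approximation algorithm, the regret here compares against that benchmark rather than the true optimum. In a standard formalization (our notation, not necessarily the paper's), if $\mathrm{ALG}^*$ denotes the benchmark's expected per-round value under the true distributions $\mathcal{D}$ and $\mathrm{ALG}_t$ the learner's value in round $t$, the guarantee reads

$$\mathrm{Reg}(T) = T \cdot \mathbb{E}[\mathrm{ALG}^*(\mathcal{D})] - \sum_{t=1}^{T} \mathbb{E}[\mathrm{ALG}_t] = O\big(\sqrt{T \log T}\big),$$

so the average per-round gap to the benchmark vanishes at rate $\tilde{O}(1/\sqrt{T})$.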

The paper lays out a general procedure for transforming offline approximation algorithms into online learning algorithms suitable for unknown distributions. This transformation hinges critically on a designed method to construct "optimistic" empirical distributions that stochastically dominate the true unknown distributions, a principle grounded in the notion of optimism in the face of uncertainty.
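To make the optimism step concrete, here is a minimal sketch (our own illustration of a standard construction, not necessarily the paper's exact estimator) that shifts the empirical CDF downward by a Dvoretzky-Kiefer-Wolfowitz (DKW) confidence width; with probability at least $1-\delta$ the shifted CDF lies below the true CDF everywhere, so the corresponding distribution stochastically dominates the truth.

```python
import math
import random


def optimistic_cdf(samples, delta):
    """Optimistic CDF estimate from i.i.d. samples of an unknown r.v.

    By the DKW inequality, the empirical CDF F_hat satisfies
    sup_x |F_hat(x) - F(x)| <= eps with probability >= 1 - delta for
    eps = sqrt(log(2/delta) / (2n)).  Shifting F_hat down by eps therefore
    puts it below the true CDF everywhere w.h.p., i.e. the corresponding
    distribution (first-order) stochastically dominates the true one.
    """
    n = len(samples)
    eps = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    xs = sorted(samples)

    def cdf(x):
        f_hat = sum(1 for s in xs if s <= x) / n  # empirical CDF at x
        # Optimistic downward shift, clipped at 0; the eps of mass this
        # removes can be placed at a known upper bound of the support to
        # make the estimate a proper distribution.
        return max(0.0, f_hat - eps)

    return cdf


# Example: 200 samples of a 0/1-valued reward.
samples = [float(random.random() < 0.5) for _ in range(200)]
F_opt = optimistic_cdf(samples, delta=0.05)
print(F_opt(0.0), F_opt(1.0))  # optimistic CDF values at 0 and 1
```

Because the optimistic distribution over-promises value for under-sampled items, feeding it to the offline approximation algorithm automatically steers exploration toward exactly those items, which is the engine behind the regret bound.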

Regret Analysis

A primary feature of this work is a detailed regret analysis. The regret, which measures the performance gap between the algorithm and an oracle with full distributional knowledge, is shown to grow only as $\sqrt{T \log T}$ in the number of rounds $T$, which is nearly optimal. The analysis exploits a distinctive feature of the semi-bandit setting: the probability that a particular item is probed governs how quickly its distribution can be learned, and the algorithm uses this to balance exploration against exploitation dynamically.
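In standard semi-bandit analyses of this flavor (written here in our own notation, as a generic form rather than a quote from the paper), the confidence width for item $i$ after it has been probed $n_i(t)$ times shrinks as

$$\epsilon_i(t) = \sqrt{\frac{c \log T}{n_i(t)}}$$

for a constant $c$; since an item's estimate tightens only when it is actually probed, each round's error is charged to the items probed in that round, and summing these widths over $T$ rounds yields the $O(\sqrt{T \log T})$ bound.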

Practical Implications

The results have broad applications in settings where decision-making under uncertainty is crucial and full feedback is impractical. Particularly relevant domains include online advertising, adaptive multi-armed bandit style sequential decision problems, and economic models where acquiring full information incurs costs or delays. The emphasis on semi-bandit feedback is pragmatic: it makes the algorithms applicable to real-world systems where only partial data is observable during learning.

Future Directions

The paper opens several avenues for future exploration. One direction is extending the framework to broader classes of stochastic problems beyond monotone ones while maintaining efficient regret bounds. Another is refining the empirical distribution estimates used by the learning strategy to further improve computational performance and scalability.

In summary, the semi-bandit learning framework for monotone stochastic optimization stands as a robust contribution to the field of online learning, offering promising pathways for efficient decision-making in uncertain environments. Theoretically, it narrows the gulf between full-information algorithms and those with restricted feedback, framing a compelling narrative for further inquiry and innovation in adaptive learning systems.

Authors (3)
  1. Arpit Agarwal
  2. Rohan Ghuge
  3. Viswanath Nagarajan