Combinatorial Stochastic-Greedy Bandit (2312.08057v1)
Abstract: We propose a novel combinatorial stochastic-greedy bandit (SGB) algorithm for combinatorial multi-armed bandit problems in which no information is observed beyond the joint reward of the selected set of $n$ arms at each time step $t \in [T]$. SGB adopts an optimized stochastic-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms. Unlike existing methods that explore the entire set of unselected base arms during each selection step, our SGB algorithm samples only an optimized proportion of unselected arms and selects actions from this subset. We prove that our algorithm achieves a $(1-1/e)$-regret bound of $\mathcal{O}(n^{\frac{1}{3}} k^{\frac{2}{3}} T^{\frac{2}{3}} \log(T)^{\frac{2}{3}})$ for monotone stochastic submodular rewards, which improves on the state-of-the-art in its dependence on the cardinality constraint $k$. Furthermore, we empirically evaluate the performance of our algorithm on online constrained social influence maximization. Our results demonstrate that the proposed approach consistently outperforms the baseline algorithms, with the performance gap widening as $k$ grows.
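To make the selection rule concrete, below is a minimal Python sketch of a stochastic-greedy explore-then-commit loop of the kind the abstract describes. It assumes a full-bandit oracle `play(S)` that returns one noisy sample of the joint reward of the arm set $S$; the subset size `s` and the per-candidate repetition count `m` are illustrative choices, not the paper's optimized sampling proportion or exploration schedule.

```python
import math
import random

def stochastic_greedy_bandit(play, n, k, T, eps=0.1):
    """Sketch: stochastic-greedy selection under full-bandit feedback.

    play(S) -> float : one noisy sample of the joint reward of arm set S
    n : number of base arms, k : cardinality constraint, T : horizon.
    (Hypothetical interface; the paper's tuned constants are not used.)
    """
    assert k <= n
    # Per greedy step, scan only s randomly sampled unselected arms
    # instead of all of them (the stochastic-greedy idea).
    s = max(1, math.ceil((n / k) * math.log(1 / eps)))
    # Spend roughly half the horizon on exploration (illustrative split).
    m = max(1, T // (2 * k * s))

    S, t = set(), 0
    for _ in range(k):  # exploration: k stochastic-greedy steps
        remaining = sorted(set(range(n)) - S)
        candidates = random.sample(remaining, min(s, len(remaining)))
        best_arm, best_mean = None, -math.inf
        for a in candidates:
            # Estimate the value of S U {a} from m repeated noisy plays.
            mean = sum(play(S | {a}) for _ in range(m)) / m
            t += m
            if mean > best_mean:
                best_arm, best_mean = a, mean
        S.add(best_arm)

    while t < T:  # commit: play the chosen set for the remaining rounds
        play(S)
        t += 1
    return S
```

The design point is visible in the inner loop: each greedy step evaluates only `s` sampled candidates rather than all $n - |S|$ unselected arms, which is what the abstract credits for the improved dependence on $k$. For example, with a simulator `play` returning a noisy coverage value, `stochastic_greedy_bandit(play, n=1000, k=10, T=100_000)` explores for roughly half the horizon before committing.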
Authors:
- Fares Fourati
- Christopher John Quinn
- Mohamed-Slim Alouini
- Vaneet Aggarwal